date:20120524

2012/5/23 Robert Haas :
> On Wed, May 23, 2012 at 3:45 PM, Kohei KaiGai  wrote:
>> I wanted to have discussion to handle this problem.
>>
>> Unlike leaky-view problem, we don't need to worry about unexpected
>> qualifier distribution on either side of join, because a scan on table
>> never contains any join. Thus, all we need to care about is order of
>> qualifiers chained on a particular scan node; being reordered by
>> the cost to invoke functions.
>>
>> How about an idea to track FuncExpr come from the security policy
>> and enforce "0" on its cost? Regular functions never reach zero
>> cost, so the security policy must be re-ordered to the head.
>
> Hmm.  That would disregard the relative costs of multiple qualifiers
> all of which were injected by the security policy, which I suspect is
> not a good idea.
>  Furthermore, I think that we should not assume that
> there is no join involved.  I would expect a fairly popular RLS qual
> to be something of the form "WHERE NOT EXISTS (SELECT 1 FROM hide_me
> WHERE hide_me.pk = thistab.pk)".
>
Please ignore the approach to track cost value of qualifiers.
I believe it does not work well without something fundamental updates.

> Perhaps when we see that RLS
> applies, we should replace the reference to the original table with a
> subquery RTE that has the security_barrier flag set - essentially
> treating a table with RLS as if it were a security view.
>
I become to think it is a better approach than tracking origin of each
qualifiers. One problem is case handling on update or delete statement.

It may be possible to rewrite the update / delete query as follows:

From:
  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
To:
  UPDATE tbl SET X = X + 1 WHERE ctid = (
  SELECT * FROM (
  SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*)
should have security-barrier
  ) AS tbl_subqry WHERE f_leak(Y)
  );

Expanded sub-queries will have security-barrier flag, so it enforces
the "uname = getpgusername()" being checked earlier than f_leak(Y).
We may need to measure the performance impact due to the reform.

> Also, suppose that Bob applies an RLS policy to a table, and, later,
> Alice selects from the table.  How do we keep Bob from usurping
> Alice's privileges?  If we insist that Bob's RLS policy function runs
> as Bob, then it defeats inlining; but if it runs as Alice, then Bob
> can steal Alice's credentials.  One idea is to apply the security
> policy only if Alice's access to the table is granted by Bob.  That
> way, if Alice is (for example) the superuser, she's immune to RLS.
> But that doesn't seem to completely solve the problem, because Alice
> might merely be some other relatively unprivileged user and we still
> don't want Bob to be able to walk off with her access.
>
I think, this situation is similar to a case when we reference a view
without privileges to underlying tables. If Bob set up a view with
something "tricky" function, it allows Bob to reference credentials
of users who reference the view.
More or less, it might be a problem when a user try to invoke
a user defined function declared by others.
(Thus, sepgsql policy does not allow users to invoke a function
declared by another one in different domain; without DBA's checks.)

I think it is a good idea not to apply RLS when current user has
superuser privilege from perspective of security model consistency,
but it is inconsistent to check privileges underlying tables.

> Another idea is to set things up so that the RLS policy function isn't
> applied to each row directly; instead, it's invoked once per query and
> *returns* a WHERE clause.  This would be a lot more powerful than the
> proposed design, because now the table owner can write a function that
> imposes quals on some people but not others, which seems very useful.
>
Sorry, I don't favor this idea. Even if table owner set up a function to
generate additional qualifiers, it also has no guarantee the qualifiers
are invoked prior to user-given one.
It seems to me this approach will have same problem...

>>> Also, if the point here is to provide security for tables not views,
>>> it seems like you really need to have (at least a design for) RLS
>>> security on insert/update/delete operations.  Just adding the same
>>> filter condition might be adequate for deletes, but I don't think it
>>> works at all for inserts.  And for updates, what prevents the user from
>>> updating columns he shouldn't, or updating them to key values he
>>> shouldn't be able to use?
>>>
>> If we also apply the security policy to newer version of tuples on
>> update and insert, one idea is to inject a before-row-(update|insert)
>> trigger to check whether it satisfies the security policy.
>> For same reason, the trigger should be executed at the end of
>> trigger chain.
>
> It's not clear to me that there is any need for built-in server
> functionality here.  If the table owner wants to enforce some sort of
> policy regarding INSERT or UPDATE or DELET

Re: [HACKERS] [RFC] Interface of Row Level Security

On May23, 2012, at 22:12 , Robert Haas wrote:
> Also, suppose that Bob applies an RLS policy to a table, and, later,
> Alice selects from the table.  How do we keep Bob from usurping
> Alice's privileges?

That problem isn't restricted to RLW, though. Bob could just as well
have booby-trapped any other SQL object, e.g. could have added
bobs_malicious_function() to a view Alice selects from, or attached
it as a trigger to a table Alice inserts to.

It would be nice if there was (optional) protection against these
kinds of attacks, but it's not really something that RLS should be
burdened with.

BTW, I've wondered in the past how tight our protection against some
trying to take over the postgres role during pg_dump is. On the surface,
it seems that we're safe because pg_dump doesn't invoke user-defined
functions except output functions (which require superuser privileges
to modify). But that's not exactly a solid line of defense...

>> If we also apply the security policy to newer version of tuples on
>> update and insert, one idea is to inject a before-row-(update|insert)
>> trigger to check whether it satisfies the security policy.
>> For same reason, the trigger should be executed at the end of
>> trigger chain.
> 
> It's not clear to me that there is any need for built-in server
> functionality here.  If the table owner wants to enforce some sort of
> policy regarding INSERT or UPDATE or DELETE, they can already do that
> today just by attaching a trigger to the table.  And they can enforce
> whatever policy they like that way.  Before designing any new
> mechanism, what's wrong with the existing one?

Yeah, applying the security policy to the new row (for UPDATES
and INSERTS) seems weird - the policy determines what you can see,
not what you can store, which might be two different things.

But the security policy should still apply to the old rows, i.e.
you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Changing the concept of a DATABASE

On May24, 2012, at 11:39 , Susanne Ebrecht wrote:
> There are lots of offices / departments creating maps. Topography maps,
> pipeline maps, nature conservancy (e.g. where are the nests from endangered
> birds?), mineral resources, wire maps, street maps, bicycle / jogging maps,
> tourists maps, tree maps, cadastral land register, and so on.
> 
> All this departments have their own databases for their own maps.
> They only map their own stuff.
> 
> Towns / states / regions have a department where all these maps get collected.

The question is, how do they get collected? If they use some home-grown 
replication,
they might just as well collect them into schemas instead of databases. The 
same is
possible with slony, I think. And if they use WAL-based replication, they have 
no
choice but to collect them in different clusters, so cross-database queries 
within
a cluster wouldn't help.

I think that you're right that reporting would one of the main use-cases for 
cross-
database queries. But reporting is also, I think, one of the main uses-cases for
WAL-based replication. So having cross-database queries with don't allow queries
across multiple replicas will leave quite a few people out in the cold.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_stat_statements temporary file

If pg_stat_statements is set to store it's data across restarts, it
stores it in global/pg_stat_statements.stat. This causes some
interesting things to happen in combination with a base backup -
namely, if you take a base backup *after* you have restarted th
emaster, the slave will get a snapshot of whatever was in the
temporary file at the time of the restart. This is quite unpredictable
- particularly in relation to a slave where it gets a snapshot from
the last restart, not from the base backup, after which it diverges.
AFAICT, it also has the property that if the server crashes, it will
reload the latest snapshot - not reset to 0 or anything like that.

Finally, if the server were to crash *while* the file is being
written, it will get a corrupt file (I haven't tested this part, but
it's rather obvious from the code). I'm pretty sure this could lead to
a situation where the database wouldn't restart.

Fixing the last part is easy - we need to write the file to a
temporary file and then rename() it into place, like we do with the
stats collectors file.

Fixing the first one, I can think of a few things:

1) unlink() the file after we've read it.
2) forcibly exclude the file from base backups taken with
pg_basebackup. We'd still have the problem when it comes to backups
taken manually.
3) avoid loading the file on a standby (that wouldn't fix the similar
problem on the master of course)

And perhaps some other solution I haven't thought of?

If we want to go with option 2, we have another problem - it's in the
global directory. But the name is dependent on what's in a contrib
module. Meaning we'd have to teach core postgresql about a contrib
modules filename, which pretty much breaks the abstraction layer.
Perhaps a better choice here would be to create another directory
under the data directory that is always excluded from base backup, and
store it there? That would also work for third party modules that core
can never learn about...

In general, should a contrib module really store data in the global/
directory? Seems pretty ugly to me...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

2012/5/24 Florian Pflug :
>>> If we also apply the security policy to newer version of tuples on
>>> update and insert, one idea is to inject a before-row-(update|insert)
>>> trigger to check whether it satisfies the security policy.
>>> For same reason, the trigger should be executed at the end of
>>> trigger chain.
>>
>> It's not clear to me that there is any need for built-in server
>> functionality here.  If the table owner wants to enforce some sort of
>> policy regarding INSERT or UPDATE or DELETE, they can already do that
>> today just by attaching a trigger to the table.  And they can enforce
>> whatever policy they like that way.  Before designing any new
>> mechanism, what's wrong with the existing one?
>
> Yeah, applying the security policy to the new row (for UPDATES
> and INSERTS) seems weird - the policy determines what you can see,
> not what you can store, which might be two different things.
>
> But the security policy should still apply to the old rows, i.e.
> you shouldn't be after to UPDATE or DELETE rows you cannot see, no?
>
The case of INSERT / DELETE are simple; All we need to apply is
checks on either new or old tuples.

In case of UPDATE, we need to check on the old tuple whether use can
see, and on the new tuple whether use can store them.
Indeed, these are different checks, however, it seems like a black hole
if the new tuple is allowed to write but no reader privileges.
I expect most use cases choose same policy on reader timing and
writer times at UPDATE statement.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_stat_statments queryid

Was there any actual reason why we didn't end up exposing queryid in
the pg_stat_statements view?

It would be highly useful when tracking changes over time. Right now I
see people doing md5(query) to do that, which is a lot more ugly (and
obviously uses more space and is slow, too).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] LISTEN/NOTIFY Security and the docs

On Fri, May 18, 2012 at 5:08 PM, Chander Ganesan  wrote:
> Hi All,
>
> I just realized that anyone can listen for notifications (using listen) so
> long as they know the "channel" name.  This means that a user could receive
> and view the payload for another user.
>
> Perhaps it would be good to note this in the documentation (i.e., there
> should be no expectation of privacy/security when using listen/notify, so
> any user that can connect to a database could issue and receive
> notifications for any channel.)

Might be worth a note, yes. The lack of a note really should tell you
that it's a broadcast, but it wouldn't hurt to have an extra one.

Want to prepare a patch?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On May24, 2012, at 12:43 , Kohei KaiGai wrote:
> The case of INSERT / DELETE are simple; All we need to apply is
> checks on either new or old tuples.
> 
> In case of UPDATE, we need to check on the old tuple whether use can
> see, and on the new tuple whether use can store them.
> Indeed, these are different checks, however, it seems like a black hole
> if the new tuple is allowed to write but no reader privileges.
> I expect most use cases choose same policy on reader timing and
> writer times at UPDATE statement.

I don't think preventing block holes is sensible here - it might,
in fact, be *just* what the user wants.

Imagine a messaging system. A reasonable RLS policy would be to allow
a user to see messages addressed to him. Yet you wouldn't want to prevent
her from creating messages to other people - cause what good is a messaging
system that only allows you to send messages to yourself. What you
probably *would* want to do, though, is to check that she did put herself in
as the sender when she creates a message. And you'd probably wanna forbit
updates entirely. So you'd have

  - A RLS policy that checks current_user = ANY(recipients)
  - An ON INSERT trigger which checks current_user = sender
  - An ON UPDATE trigger which errors out

If RLS policy applies to INSERTEed rows also, how would you do that?

Another example, although in the realm of filesystem permissions, is Mac OS X.
Per default, every user has a "Drop Box" folder, which anybody can write to, yet
only the owner can read. This allows you to easily transfer files from one
user to another without allowing a third party to read it.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements temporary file

On 24 May 2012 11:43, Magnus Hagander  wrote:
> In general, should a contrib module really store data in the global/
> directory? Seems pretty ugly to me...

I think the case could be made for moving pg_stat_statements into
core, as an optionally enabled view, like pg_stat_user_functions,
since pg_stat_statements is now rather a lot more useful than it used
to be. That would solve that problem, as well as putting
pg_stat_statements into the hands of the largest possible number of
people, which would be a positive development, in my humble and fairly
predictable opinion.

However, pg_stat_statements will not prevent the database from
starting if the file is corrupt. It makes some basic attempts to
detect that within pgss_shmem_startup(), and will simply log the
problem and unlink the file in the event of detecting corruption.
Otherwise, I suppose you might get garbage values in
pg_stat_statements, which, while rather annoying and possibly
unacceptable, is hardly the end of the world.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements temporary file

On Thu, May 24, 2012 at 1:36 PM, Peter Geoghegan  wrote:
> On 24 May 2012 11:43, Magnus Hagander  wrote:
>> In general, should a contrib module really store data in the global/
>> directory? Seems pretty ugly to me...
>
> I think the case could be made for moving pg_stat_statements into
> core, as an optionally enabled view, like pg_stat_user_functions,
> since pg_stat_statements is now rather a lot more useful than it used
> to be. That would solve that problem, as well as putting
> pg_stat_statements into the hands of the largest possible number of
> people, which would be a positive development, in my humble and fairly
> predictable opinion.

Well, it would solve the problem for this specific case - but there
will always be yet another extension. Actually, it would only solve
the *ugliness*, and not the actual problem.

(That's not to say tha tI don't agree that moving it into core would
be a good idea, but that's not happening for 9.2 - and the problem
exists in 9.1 as well)


> However, pg_stat_statements will not prevent the database from
> starting if the file is corrupt. It makes some basic attempts to
> detect that within pgss_shmem_startup(), and will simply log the
> problem and unlink the file in the event of detecting corruption.
> Otherwise, I suppose you might get garbage values in
> pg_stat_statements, which, while rather annoying and possibly
> unacceptable, is hardly the end of the world.

Ok. I was worried it might crash on loading the data when it was
corrupt - say a size field that ended up specifying gigabytes that it
then tries to allocate, or something like that.

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_receivexlog stops upon server restart

On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown  wrote:
> On 10 April 2012 21:07, Magnus Hagander  wrote:
>> On Friday, April 6, 2012, Thom Brown wrote:
>>>
>>> Hi,
>>>
>>> I've tried out pg_receivexlog and have noticed that when restarting
>>> the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
>>> This is surprising as the DBA would have to remember to start
>>> pg_receivexlog up again.
>>>
>>
>> This is intentional as far as that's how the code was written, there's not a
>> malfunctioning piece of code somewhere.
>>
>> It would probably make sense to have an auto-reconnect feature, and to have
>> an option to turn it on/off.
>>
>> If you haven't already (my wifi here is currently quite useless, which is
>> why I'm working on my email backlog, so I can't check), please add it to the
>> open items list.
>
> I think it would also be useful to add a paragraph to the
> documentation stating use-cases for this feature, and its advantages.

Attached is a patch that implements this. Seems reasonable?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


pg_receivexlog_loop.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] shared_preload_libraries path

Would i make sense to have a postgresql.conf parameter that would add
to LD_LIBRARY_PATH when loading libraries from
shared_preload_libraries (and other library loads). To make it
possible to configure it without having to  mess around with the
operating system configuration? Or is that too much reimplementing OS
functionality?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements temporary file

On 24 May 2012 12:42, Magnus Hagander  wrote:
> What actually happens if it tries to repalloc() something huge? palloc
> will throw an elog(ERROR), and since this happens during postmaster
> startup, are you sure it won't prevent the server from starting?

Oh, yes, missed that.

/* Previous incarnation might have had a larger query_size */
if (temp.query_len >= buffer_size)
{
buffer = (char *) repalloc(buffer, temp.query_len + 1);
buffer_size = temp.query_len + 1;
}

Here, "temp" receives its value from an fread().

This could probably be coded to be defensive against such things, but
a better fix would be preferred. I have to wonder how much of a
problem corruption is likely to be though, given that we only save to
disk in a corresponding pgss_shmem_shutdown() call, which actually has
more protections against corruption. The window for the saved file to
be corrupt seems rather small, though I accept that a better window
would be zero.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statements temporary file

On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan  wrote:
> On 24 May 2012 12:42, Magnus Hagander  wrote:
>> What actually happens if it tries to repalloc() something huge? palloc
>> will throw an elog(ERROR), and since this happens during postmaster
>> startup, are you sure it won't prevent the server from starting?
>
> Oh, yes, missed that.
>
>                /* Previous incarnation might have had a larger query_size */
>                if (temp.query_len >= buffer_size)
>                {
>                        buffer = (char *) repalloc(buffer, temp.query_len + 1);
>                        buffer_size = temp.query_len + 1;
>                }
>
> Here, "temp" receives its value from an fread().
>
> This could probably be coded to be defensive against such things, but
> a better fix would be preferred. I have to wonder how much of a
> problem corruption is likely to be though, given that we only save to
> disk in a corresponding pgss_shmem_shutdown() call, which actually has
> more protections against corruption. The window for the saved file to
> be corrupt seems rather small, though I accept that a better window
> would be zero.

Right. But writing to a temp file and rename()ing it into place is trivial.

It's really the other issues raised that are bigger ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] shared_preload_libraries path

On 24 May 2012 13:09, Magnus Hagander  wrote:
> Would i make sense to have a postgresql.conf parameter that would add
> to LD_LIBRARY_PATH when loading libraries from
> shared_preload_libraries (and other library loads). To make it
> possible to configure it without having to  mess around with the
> operating system configuration? Or is that too much reimplementing OS
> functionality?

-1, I'm afraid.

I found this blog post to be insightful:

https://blogs.oracle.com/rie/entry/tt_ld_library_path_tt

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] shared_preload_libraries path

On 24 May 2012 13:22, Peter Geoghegan  wrote:
> I found this blog post to be insightful:
>
> https://blogs.oracle.com/rie/entry/tt_ld_library_path_tt

This one might be more useful, and itself refers to the
aforementioned, earlier post:

https://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_receivexlog stops upon server restart

On 24 May 2012 13:05, Magnus Hagander  wrote:
> On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown  wrote:
>> On 10 April 2012 21:07, Magnus Hagander  wrote:
>>> On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.

>>>
>>> This is intentional as far as that's how the code was written, there's not a
>>> malfunctioning piece of code somewhere.
>>>
>>> It would probably make sense to have an auto-reconnect feature, and to have
>>> an option to turn it on/off.
>>>
>>> If you haven't already (my wifi here is currently quite useless, which is
>>> why I'm working on my email backlog, so I can't check), please add it to the
>>> open items list.
>>
>> I think it would also be useful to add a paragraph to the
>> documentation stating use-cases for this feature, and its advantages.
>
> Attached is a patch that implements this. Seems reasonable?

s/non fatal/non-fatal/

Yes, this solves the problem for me, except you forgot to translate
noloop in long_options[] . :)

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_receivexlog stops upon server restart

On Thu, May 24, 2012 at 2:34 PM, Thom Brown  wrote:
> On 24 May 2012 13:05, Magnus Hagander  wrote:
>> On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown  wrote:
>>> On 10 April 2012 21:07, Magnus Hagander  wrote:
 On Friday, April 6, 2012, Thom Brown wrote:
>
> Hi,
>
> I've tried out pg_receivexlog and have noticed that when restarting
> the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
> This is surprising as the DBA would have to remember to start
> pg_receivexlog up again.
>

 This is intentional as far as that's how the code was written, there's not 
 a
 malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to 
 the
 open items list.
>>>
>>> I think it would also be useful to add a paragraph to the
>>> documentation stating use-cases for this feature, and its advantages.
>>
>> Attached is a patch that implements this. Seems reasonable?
>
> s/non fatal/non-fatal/
>
> Yes, this solves the problem for me, except you forgot to translate
> noloop in long_options[] . :)
Fixed :-)

Did you test it, or just assumed it worked? ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_receivexlog stops upon server restart

On 24 May 2012 13:37, Magnus Hagander  wrote:
> On Thu, May 24, 2012 at 2:34 PM, Thom Brown  wrote:
>> On 24 May 2012 13:05, Magnus Hagander  wrote:
>>> On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown  wrote:
 On 10 April 2012 21:07, Magnus Hagander  wrote:
> On Friday, April 6, 2012, Thom Brown wrote:
>>
>> Hi,
>>
>> I've tried out pg_receivexlog and have noticed that when restarting
>> the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
>> This is surprising as the DBA would have to remember to start
>> pg_receivexlog up again.
>>
>
> This is intentional as far as that's how the code was written, there's 
> not a
> malfunctioning piece of code somewhere.
>
> It would probably make sense to have an auto-reconnect feature, and to 
> have
> an option to turn it on/off.
>
> If you haven't already (my wifi here is currently quite useless, which is
> why I'm working on my email backlog, so I can't check), please add it to 
> the
> open items list.

 I think it would also be useful to add a paragraph to the
 documentation stating use-cases for this feature, and its advantages.
>>>
>>> Attached is a patch that implements this. Seems reasonable?
>>
>> s/non fatal/non-fatal/
>>
>> Yes, this solves the problem for me, except you forgot to translate
>> noloop in long_options[] . :)
> Fixed :-)
>
> Did you test it, or just assumed it worked? ;)

How very dare you. Of course I tested it.  It successfully reconnects
on multiple restarts, checks intermittently when I've stopped the
server, showing the connection error message, successfully continues
when I eventually bring the server back up, and doesn't attempt a
reconnect when using -n.

So looks good to me.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Sandro Santilli

On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
> On 16.05.2012 15:42, Sandro Santilli wrote:
> >But CHECK_FOR_INTERRUPTS doesn't return, right ?
> >Is there another macro for just checking w/out yet acting upon it ?
> 
> Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
> variable, but on Windows it also checks for
> UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
> not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
> think InterruptPending can be set spuriously (even if that's not
> possible today, I wouldn't rely on it), and if you're in a
> HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
> even if InterruptPending is true.
> 
> The only sane way to make 3rd party code interruptible is to add
> CHECK_FOR_INTERRUPTS() to it, in safe places.

No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
How could caller code cleanup on interruption ?

--strk; 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile


Hi,

I've been running some tests on pg 9.2beta1 and in particular a set
of queries like

create table _tmp0 as select * from (
select *, (select healpixid from idt_match as m where
m.transitid=o.transitid)
as x from idt_photoobservation as o offset 0
  ) as y where x%16=ZZZ order by x;

(where ZZZ is some number integer number 0<=ZZZ<16)

With the following plan:
--
 Sort  (cost=3228814504.96..3228815137.21 rows=252902 width=498)
   Sort Key: y.x
   ->  Subquery Scan on y  (cost=0.00..3228791809.25 rows=252902 width=498)
 Filter: ((y.x % 16::bigint) = 0)
 ->  Limit  (cost=0.00..3228033102.41 rows=50580456 width=490)
   ->  Seq Scan on idt_photoobservation o  
(cost=0.00..3228033102.41 rows=50580456 width=490)
 SubPlan 1
   ->  Index Scan using idt_match_transitid_idx on 
idt_match m  (cost=0.00..63.74 rows=1 width=8)
 Index Cond: (transitid = o.transitid)

The schema of the tables are:

e2es2=> \d idt_match
 Table "public.idt_match"
  Column   |   Type   | Modifiers
---+--+---
 sourceid  | bigint   |
 transitid | bigint   |
 healpixid | bigint   |
 flagsxm   | smallint |
Indexes:
"idt_match_idx" btree (healpixid)
"idt_match_transitid_idx" btree (transitid)

Table "public.idt_photoobservation"
  Column   |   Type   | Modifiers
---+--+---
 transitid | bigint   |
 fluxbp0   | real |

more columns

Indexes:
"idt_photoobservation_idx" btree (transitid)

And I noticed than when I run the query like the one shown above in parallel
(in multiple connections for ZZZ=0...8) the performance of each query 
drops down significantly (factor of 2)(despite the fact that during the 
execution of the query postgres is mostly CPU bound).


So I tried to oprofile it and strace it, And that's what I saw:

The strace -c of each process  shows something like that
#
Process 18660 detached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 76.250.001342   0268987   semop
 23.750.000418   0 61694   read
  0.000.00   0   138   lseek
  0.000.00   0   355   select
  0.000.00   0 3   kill
-- --- --- - - 
100.000.001760331177   total
###

And the oprofile shows this on top:
---
2863981  25.7117  ReleasePredicateLocks
  2863981  100.000  ReleasePredicateLocks [self]
---
1246629  11.1917  LocalBufferAlloc
  1246629  100.000  LocalBufferAlloc [self]
---
1135393  10.1931  CheckForSerializableConflictIn
  1135393  100.000  CheckForSerializableConflictIn [self]


So there is a lot of locking for some reason, And I was surprised to see 
anything related to SSI in the profile at all, because I'm not running 
serializable transactions (I was just running my queries from multiple 
psql sessions).


Is there a reasonable explanation for what I'm seeing  ?

Thanks in advance,
Sergey

More info: there is no other concurrent activity on the machine.
shared_buffers is 10G, The machine has ~ 60G of RAM, 24 cores and proper 
RAID


Another note is that the oprofile and strace were obtained during the 
stage when the idt_photo.. table was scanned and data was being stored in 
the pgsql_tmp (not during the last bit when the data in pgsql_tmp were

actually being sorted).


*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Heikki Linnakangas


On 24.05.2012 16:04, Sandro Santilli wrote:

On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:

On 16.05.2012 15:42, Sandro Santilli wrote:

But CHECK_FOR_INTERRUPTS doesn't return, right ?
Is there another macro for just checking w/out yet acting upon it ?


Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
variable, but on Windows it also checks for
UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
think InterruptPending can be set spuriously (even if that's not
possible today, I wouldn't rely on it), and if you're in a
HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
even if InterruptPending is true.

The only sane way to make 3rd party code interruptible is to add
CHECK_FOR_INTERRUPTS() to it, in safe places.


No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
How could caller code cleanup on interruption ?


It would only be safe to call it in places where no cleanup is 
necessary. I don't know if there are any such places in the geos library.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

2012-05-24 Thread Joachim Wieland

On Tue, May 22, 2012 at 9:50 AM, Robert Haas  wrote:
> Hmm.  I think that if you do it this way, the minimum recovery point
> won't be respected, which could leave you with a corrupted database.
> Now, if all the WAL files that you need are present in pg_xlog anyway,
> then they ought to get replayed anyway, but I think that if you are
> using restore_command (as opposed to streaming replication) we restore
> WAL segments under a different file name, which might cause this
> problem.

Uhm, maybe I add some more details, so you get a better idea of what I
did: The idea was to promote the standby to be the new master. There
was streaming replication active but at some time I had to take the
master down. IIRC from the log I saw that after the master went down,
the standby continued recovering from a bunch of archived log files
(via recovery_command), I had suspected that either the standby was
lagging behind a bit or that the master archived them during shutdown.
When the standby didn't have anything else left to recover from
(saying both "xlog file foo doesn't exist" and "cannot connect to
master"), I deleted recovery.conf on the standby and restarted it.

I wouldn't have assumed any corruption was possible given that I did
clean shutdowns on both sides...

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Sandro Santilli

On Thu, May 24, 2012 at 04:55:34PM +0300, Heikki Linnakangas wrote:
> On 24.05.2012 16:04, Sandro Santilli wrote:
> >On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
> >>On 16.05.2012 15:42, Sandro Santilli wrote:
> >>>But CHECK_FOR_INTERRUPTS doesn't return, right ?
> >>>Is there another macro for just checking w/out yet acting upon it ?
> >>
> >>Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
> >>variable, but on Windows it also checks for
> >>UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
> >>not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
> >>think InterruptPending can be set spuriously (even if that's not
> >>possible today, I wouldn't rely on it), and if you're in a
> >>HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
> >>even if InterruptPending is true.
> >>
> >>The only sane way to make 3rd party code interruptible is to add
> >>CHECK_FOR_INTERRUPTS() to it, in safe places.
> >
> >No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
> >How could caller code cleanup on interruption ?
> 
> It would only be safe to call it in places where no cleanup is
> necessary. I don't know if there are any such places in the geos
> library.

Sure, right before starting and after finishing.
That is nowhere useful for a premature interruption...

The whole point of the work I'm doing these days is letting the
users interrupt GEOS calls which often take a long time and a lot
of memory resources.

The current plan is to provide custom allocators to have automatic
garbage collection on interruption. Turned out not to be an easy task
to bend GEOS into using palloc/pfree, but there's progress in that
direction.

--strk;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [9.2] crash on regex

2012-05-24 Thread Marko Kreen

Following query crashes backend on 9.2:

  select substring('asd TO foo' from ' TO (([a-z0-9._]+|"([^"]+|"")+")+)');

It is supposed to load potentially quoted table name from
CREATE RULE definition.  Works fine on 8.3 .. 9.1

Backtrace:
Program terminated with signal 11, Segmentation fault.
#0  ExecMakeFunctionResult (fcache=0x, econtext=0x953f3e8,
isNull=0xbffb8d1f "\b", isDone=0x0) at execQual.c:1833
1833*isNull = fcinfo->isnull;
(gdb) bt
#0  ExecMakeFunctionResult (fcache=0x, econtext=0x953f3e8,
isNull=0xbffb8d1f "\b", isDone=0x0) at execQual.c:1833
#1  0x08203104 in ExecEvalExprSwitchContext (expression=0x953ef60,
econtext=0x953f3e8, isNull=0xbffb8d1f "\b", isDone=0x0) at
execQual.c:4187
#2  0x082844d7 in evaluate_expr (expr=,
result_type=25, result_typmod=,
result_collation=100) at clauses.c:4408
...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

2012/5/24 Florian Pflug :
> On May24, 2012, at 12:43 , Kohei KaiGai wrote:
>> The case of INSERT / DELETE are simple; All we need to apply is
>> checks on either new or old tuples.
>>
>> In case of UPDATE, we need to check on the old tuple whether use can
>> see, and on the new tuple whether use can store them.
>> Indeed, these are different checks, however, it seems like a black hole
>> if the new tuple is allowed to write but no reader privileges.
>> I expect most use cases choose same policy on reader timing and
>> writer times at UPDATE statement.
>
> I don't think preventing block holes is sensible here - it might,
> in fact, be *just* what the user wants.
>
> Imagine a messaging system. A reasonable RLS policy would be to allow
> a user to see messages addressed to him. Yet you wouldn't want to prevent
> her from creating messages to other people - cause what good is a messaging
> system that only allows you to send messages to yourself. What you
> probably *would* want to do, though, is to check that she did put herself in
> as the sender when she creates a message. And you'd probably wanna forbit
> updates entirely. So you'd have
>
>  - A RLS policy that checks current_user = ANY(recipients)
>  - An ON INSERT trigger which checks current_user = sender
>  - An ON UPDATE trigger which errors out
>
> If RLS policy applies to INSERTEed rows also, how would you do that?
>
> Another example, although in the realm of filesystem permissions, is Mac OS X.
> Per default, every user has a "Drop Box" folder, which anybody can write to, 
> yet
> only the owner can read. This allows you to easily transfer files from one
> user to another without allowing a third party to read it.
>
Indeed, you are right. We have no special reason why to enforce same rules
on both of reader and writer stage on UPDATE statement.

So, the proposed interface might be revised as follows:
  ALTER TABLE  ADD SECURITY POLICY
  (, ...) [FOR SELECT |
 INSERT |
 [BEFORE|AFTER] UPDATE |
 DELETE];

In case of INSERT or AFTER UPDATE, I assume the check shall be applied
on the tail of before-row triggers.

(*) I don't check whether it conflicts syntax or not yet.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

On 24 May 2012 11:50, Magnus Hagander  wrote:
> Was there any actual reason why we didn't end up exposing queryid in
> the pg_stat_statements view?
>
> It would be highly useful when tracking changes over time. Right now I
> see people doing md5(query) to do that, which is a lot more ugly (and
> obviously uses more space and is slow, too).

Right. I continue to maintain that this is a good idea. I raised the
issue more than once. However, my proposal was not accepted by Tom and
Robert, apparently on the basis that queryId's actual value was
partially dictated by things like the endianness of the architecture
used, and the value of OIDs when serialising and subsequently hashing
the post-analysis tree.

What I'd like to be able to do is aggregate this information over time
and/or across standbys in a cluster, as queries are evicted and
subsequently re-entered into pg_stat_statement's shared hash table.
Now, there are situations were this isn't going to work, like when a
third-party logical replication system is used. That's unfortunate,
but I wouldn't expect it makes the information any less useful to the
large majority of people. I'd also credit our users with being
discerning enough to realise that they should not jump to the
conclusion that the value will be stable according to any particular
standard.

Arguments against including an internal value in the view could
equally well be applied to any of the internal statistic collector
views, many of which have oid columns, despite the fact that various
"name" columns already unambiguously identify tuples in most cases. I
see no reason for the inconsistency, particularly given that the
pg_stat_statements.query column *is* still somewhat ambiguous, as
described in the docs, and given that the query hash value referred to
in the docs anyway.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] shared_preload_libraries path

Peter Geoghegan  writes:
> On 24 May 2012 13:09, Magnus Hagander  wrote:
>> Would i make sense to have a postgresql.conf parameter that would add
>> to LD_LIBRARY_PATH when loading libraries from
>> shared_preload_libraries (and other library loads). To make it
>> possible to configure it without having to mess around with the
>> operating system configuration? Or is that too much reimplementing OS
>> functionality?

> -1, I'm afraid.

I think the same.  Aside from the problems Peter points out, there are
too many different ways to spell that variable name on different
platforms.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Interrupting long external library calls

On May24, 2012, at 15:04 , Sandro Santilli wrote:
> On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
>> On 16.05.2012 15:42, Sandro Santilli wrote:
>>> But CHECK_FOR_INTERRUPTS doesn't return, right ?
>>> Is there another macro for just checking w/out yet acting upon it ?
>> 
>> Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
>> variable, but on Windows it also checks for
>> UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
>> not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
>> think InterruptPending can be set spuriously (even if that's not
>> possible today, I wouldn't rely on it), and if you're in a
>> HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
>> even if InterruptPending is true.
>> 
>> The only sane way to make 3rd party code interruptible is to add
>> CHECK_FOR_INTERRUPTS() to it, in safe places.
> 
> No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
> How could caller code cleanup on interruption ?

The postgres way is to use PG_TRY/PG_CATCH to make sure stuff gets cleaned
up if an error or an interrupts occurs. You could use those to make the
third-party library exception safe, but it'll probably be a quite
invasive change :-(.

Alternatively, you could replicate the check CHECK_FOR_INTERRUPTS() does,
but then don't actually call ProcessInterrupts() but instead just make
the third-party code abort signalling an error, and call
CHECK_FOR_INTERRUPTS() after the third-party code has returned. On unix-like
systems that'd be a simple as aborting if InterruptPending is set, while
on windows you'd have to do the if(UNBLOCKED_SIGNAL_QUEUE())… stuff first,
since otherwise InterruptPending will never get set. The following function
should do the trick

  bool have_pending_interrupts() {
  #ifdef WIN32
if (UNBLOCKED_SIGNAL_QUEUE())
  pgwin32_dispatch_queued_signals();
  #endif

return InterruptPending && !InterruptHoldoffCount && !CritSectionCount;
  }

The third-party could would then do

  if (have_pending_interrupts())
return some_error;

and you'd invoke in with

  state = third_party_code();
  CHECK_FOR_INTERRUPTS();
  if (state != success)
ereport(…);

There might be slim chance for false positives with that approach, since
ProcessInterrupts() might not always ereport(), even if InterruptHoldoffCount
and CritSectionCount are zero. But they'll simply get turned into spurious
ereport() by the if(state != success) check after CHECK_FOR_INTERRUPTS, and
should be very rare, and happen only shortly after a query cancel request
was received.

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

On Thu, May 24, 2012 at 4:26 PM, Peter Geoghegan  wrote:
> On 24 May 2012 11:50, Magnus Hagander  wrote:
>> Was there any actual reason why we didn't end up exposing queryid in
>> the pg_stat_statements view?
>>
>> It would be highly useful when tracking changes over time. Right now I
>> see people doing md5(query) to do that, which is a lot more ugly (and
>> obviously uses more space and is slow, too).
>
> Right. I continue to maintain that this is a good idea. I raised the
> issue more than once. However, my proposal was not accepted by Tom and
> Robert, apparently on the basis that queryId's actual value was
> partially dictated by things like the endianness of the architecture
> used, and the value of OIDs when serialising and subsequently hashing
> the post-analysis tree.
>
> What I'd like to be able to do is aggregate this information over time
> and/or across standbys in a cluster, as queries are evicted and
> subsequently re-entered into pg_stat_statement's shared hash table.

That's exactly the usecase I'm looking at here, except it's not
actually across standbys in this case.


> Now, there are situations were this isn't going to work, like when a
> third-party logical replication system is used. That's unfortunate,
> but I wouldn't expect it makes the information any less useful to the
> large majority of people. I'd also credit our users with being
> discerning enough to realise that they should not jump to the
> conclusion that the value will be stable according to any particular
> standard.

As long as it's documented as such, I don't see a problem with that at all.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

On Thu, May 24, 2012 at 10:26 AM, Peter Geoghegan  wrote:
> On 24 May 2012 11:50, Magnus Hagander  wrote:
>> Was there any actual reason why we didn't end up exposing queryid in
>> the pg_stat_statements view?
>>
>> It would be highly useful when tracking changes over time. Right now I
>> see people doing md5(query) to do that, which is a lot more ugly (and
>> obviously uses more space and is slow, too).
>
> Right. I continue to maintain that this is a good idea. I raised the
> issue more than once. However, my proposal was not accepted by Tom and
> Robert, apparently on the basis that queryId's actual value was
> partially dictated by things like the endianness of the architecture
> used, and the value of OIDs when serialising and subsequently hashing
> the post-analysis tree.

No, my concern was more that I wasn't clear on what you hoped to do with it.

But I think this explanation is enough to convince me that it might be
worthwhile:

> What I'd like to be able to do is aggregate this information over time
> and/or across standbys in a cluster, as queries are evicted and
> subsequently re-entered into pg_stat_statement's shared hash table.
> Now, there are situations were this isn't going to work, like when a
> third-party logical replication system is used. That's unfortunate,
> but I wouldn't expect it makes the information any less useful to the
> large majority of people. I'd also credit our users with being
> discerning enough to realise that they should not jump to the
> conclusion that the value will be stable according to any particular
> standard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On May24, 2012, at 16:19 , Kohei KaiGai wrote:
> So, the proposed interface might be revised as follows:
>  ALTER TABLE  ADD SECURITY POLICY
>  (, ...) [FOR SELECT |
> INSERT |
> [BEFORE|AFTER] UPDATE 
> |
> DELETE];
> 
> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
> on the tail of before-row triggers.

I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
UPDATE case to regular triggers, for two reasons

First, it's conceptually much simpler, since the policy always just adds
an implicit WHERE clause, period. This of course assumes that DELETE and
(BEFORE) UPDATE simply skips rows for which the policy function returns false,
instead of reporting 'permission denied' or something. But that's the most
reasonable behaviour anyway, I think, because otherwise you'd make batch
UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
of tripping over some invisible row and getting and error.

And second, it avoids mimicking functionality that is already provided
by an existing feature, namely triggers.

People will have to deal with the trigger ordering issue, but that's nothing
new, and I bet most people have a system in place for that. I usually prefix
my trigger names with 'a_' to 'z_', for example, to make the ordering explicit.

Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
created references to rows which are invisible to you, or should FOREIGN KEY
constraints be exempt from security policies? I'd say they shouldn't be, i.e.
the policy WHERE clause should be added to constraint checking queries like
usual. But maybe I'm missing some reason why that'd be undesirable…

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

Robert Haas  writes:
> On Thu, May 24, 2012 at 10:26 AM, Peter Geoghegan  
> wrote:
> But I think this explanation is enough to convince me that it might be
> worthwhile:

>> What I'd like to be able to do is aggregate this information over time
>> and/or across standbys in a cluster, as queries are evicted and
>> subsequently re-entered into pg_stat_statement's shared hash table.

It appears to me that the above ...

>> ... I'd also credit our users with being
>> discerning enough to realise that they should not jump to the
>> conclusion that the value will be stable according to any particular
>> standard.

... is in direct contradiction to this.  The proposed usage absolutely
requires that the hash be stable "over time and/or across standbys".
I do not want to promise that it's stable over any timeframe longer than
a server reboot.  Aside from the OID dependence problem, we might well
change the way the hash is calculated in minor releases, for example by
adding or removing struct fields.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.2] crash on regex

On Thu, May 24, 2012 at 10:16 AM, Marko Kreen  wrote:
> Following query crashes backend on 9.2:
>
>  select substring('asd TO foo' from ' TO (([a-z0-9._]+|"([^"]+|"")+")+)');

I spent some time trying to reduce this to the simplest case that
still causes a crash, and came up with this:

select substring('a' from '((a))+');

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland  wrote:
> On Tue, May 22, 2012 at 9:50 AM, Robert Haas  wrote:
>> Hmm.  I think that if you do it this way, the minimum recovery point
>> won't be respected, which could leave you with a corrupted database.
>> Now, if all the WAL files that you need are present in pg_xlog anyway,
>> then they ought to get replayed anyway, but I think that if you are
>> using restore_command (as opposed to streaming replication) we restore
>> WAL segments under a different file name, which might cause this
>> problem.
>
> Uhm, maybe I add some more details, so you get a better idea of what I
> did: The idea was to promote the standby to be the new master. There
> was streaming replication active but at some time I had to take the
> master down. IIRC from the log I saw that after the master went down,
> the standby continued recovering from a bunch of archived log files
> (via recovery_command), I had suspected that either the standby was
> lagging behind a bit or that the master archived them during shutdown.
> When the standby didn't have anything else left to recover from
> (saying both "xlog file foo doesn't exist" and "cannot connect to
> master"), I deleted recovery.conf on the standby and restarted it.
>
> I wouldn't have assumed any corruption was possible given that I did
> clean shutdowns on both sides...

The thing that's worrying me is that there's not really any such thing
as a "clean" shutdown on a standby.  When you shut down the master, it
checkpoints.  When you shut down the standby, it can't checkpoint, so
I think it's still going to enter recovery at startup.  It'd be
interesting to know where that recovery began and ended as compared
with the minimum recovery point just before the shutdown.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 8:24 AM, Sergey Koposov  wrote:
> Hi,
>
> I've been running some tests on pg 9.2beta1 and in particular a set
> of queries like
>
> create table _tmp0 as select * from (
>        select *, (select healpixid from idt_match as m where
>                                        m.transitid=o.transitid)
>                as x from idt_photoobservation as o offset 0
>          ) as y where x%16=ZZZ order by x;
>
> (where ZZZ is some number integer number 0<=ZZZ<16)
>
> With the following plan:
> --
>  Sort  (cost=3228814504.96..3228815137.21 rows=252902 width=498)
>   Sort Key: y.x
>   ->  Subquery Scan on y  (cost=0.00..3228791809.25 rows=252902 width=498)
>         Filter: ((y.x % 16::bigint) = 0)
>         ->  Limit  (cost=0.00..3228033102.41 rows=50580456 width=490)
>               ->  Seq Scan on idt_photoobservation o
>  (cost=0.00..3228033102.41 rows=50580456 width=490)
>                     SubPlan 1
>                       ->  Index Scan using idt_match_transitid_idx on
> idt_match m  (cost=0.00..63.74 rows=1 width=8)
>                             Index Cond: (transitid = o.transitid)
>
> The schema of the tables are:
>
> e2es2=> \d idt_match
>     Table "public.idt_match"
>  Column   |   Type   | Modifiers
> ---+--+---
>  sourceid  | bigint   |
>  transitid | bigint   |
>  healpixid | bigint   |
>  flagsxm   | smallint |
> Indexes:
>    "idt_match_idx" btree (healpixid)
>    "idt_match_transitid_idx" btree (transitid)
>
> Table "public.idt_photoobservation"
>  Column   |   Type   | Modifiers
> ---+--+---
>  transitid | bigint   |
>  fluxbp0   | real     |
> 
> more columns
> 
> Indexes:
>    "idt_photoobservation_idx" btree (transitid)
>
> And I noticed than when I run the query like the one shown above in parallel
> (in multiple connections for ZZZ=0...8) the performance of each query drops
> down significantly (factor of 2)(despite the fact that during the execution
> of the query postgres is mostly CPU bound).
>
> So I tried to oprofile it and strace it, And that's what I saw:
>
> The strace -c of each process  shows something like that
> #
> Process 18660 detached
> % time     seconds  usecs/call     calls    errors syscall
> -- --- --- - - 
>  76.25    0.001342           0    268987           semop
>  23.75    0.000418           0     61694           read
>  0.00    0.00           0       138           lseek
>  0.00    0.00           0       355           select
>  0.00    0.00           0         3           kill
> -- --- --- - - 
> 100.00    0.001760                331177           total
> ###
>
> And the oprofile shows this on top:
> ---
> 2863981  25.7117  ReleasePredicateLocks
>  2863981  100.000  ReleasePredicateLocks [self]
> ---
> 1246629  11.1917  LocalBufferAlloc
>  1246629  100.000  LocalBufferAlloc [self]
> ---
> 1135393  10.1931  CheckForSerializableConflictIn
>  1135393  100.000  CheckForSerializableConflictIn [self]
> 
>
> So there is a lot of locking for some reason, And I was surprised to see
> anything related to SSI in the profile at all, because I'm not running
> serializable transactions (I was just running my queries from multiple psql
> sessions).

Are you sure?  I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:

/* Nothing to do if this is not a serializable transaction */
if (MySerializableXact == InvalidSerializableXact)
return false;

What's the default isolation mode set to?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

On 24 May 2012 16:06, Tom Lane  wrote:
> I do not want to promise that it's stable over any timeframe longer than
> a server reboot.

You already have though, since pg_stat_statements persistently stores
statistics to disk by default, and can only ever recognise statement
equivalence based on the (dbid, userid, queryid) hash key.

> Aside from the OID dependence problem, we might well
> change the way the hash is calculated in minor releases, for example by
> adding or removing struct fields.

You've already invalidated the saved statistics if you do that, so all
bets are off anyway. If you have to do it, it'll be necessary to bump
PGSS_FILE_HEADER, so that pg_stat_statements will be cleared upon
restart. That will in turn necessitate documenting the issue in the
minor version release notes. I'd hope to avoid that, but it doesn't
seem to me that the situation is made any worse than before by
exposing the value. On the contrary, it could help users to understand
where the problem may have affected them.

If you don't expose the value, users are going to do this sort of
thing anyway, but will be far worse off due to using the query text or
a hash thereof instead of the internal value.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Stephen Frost

* Gurjeet Singh (singh.gurj...@gmail.com) wrote:
> Bruce points out the even simpler case is to build several indexes in
> parallel over the same scan.
> 
> I thought I had posted a patch to that effect long back, but upon searching
> my emails apparently I forgot about the patch.
> 
> Attached is the patch that I developed in Nov. 2010, so expect a lot of bit
> rot. I had tried to make it elegant, but I have to admit its a hack. This
> patch does not imply that it is using any kind of parallelism, the context
> in which that above statement was made. It just helps to avoid scanning the
> same relation multiple times. I performed some tests on it and AFAICR, this
> did not produce a net win. But those tests may have been performed in a
> virtual machine and not on a bare metal, I forget.

I'm not too surprised that it didn't help all that much since you're
doing everything in one backend.  My guess at what Bruce was talking
about is being able to actually have multiple CREATE INDEX's going in
parallel in different backends.  If they didn't all grab an
AccessExclusive lock, wouldn't they be able to use the same ring buffer
to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
isn't this supported already?

I haven't tried this yet, but it sure seems like something we could
probably already claim to support..

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: [HACKERS] [9.2] crash on regex

On 24 May 2012 16:08, Robert Haas  wrote:
> On Thu, May 24, 2012 at 10:16 AM, Marko Kreen  wrote:
>> Following query crashes backend on 9.2:
>>
>>  select substring('asd TO foo' from ' TO (([a-z0-9._]+|"([^"]+|"")+")+)');
>
> I spent some time trying to reduce this to the simplest case that
> still causes a crash, and came up with this:
>
> select substring('a' from '((a))+');

It appears to occur with any quantifier attached to a group that is
more than 1 level deep with nothing between the groups, such as
((a))?, ((a))*, ((a)){1,4}, (((a)))+

Or if the quantifier is the only additional thing between the groups,
such as ((a)+)

But when breaking the groups up, it's fine, so this works: ((a)b)+

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Gurjeet Singh

On Thu, May 24, 2012 at 11:22 AM, Stephen Frost  wrote:

> * Gurjeet Singh (singh.gurj...@gmail.com) wrote:
> > Bruce points out the even simpler case is to build several indexes in
> > parallel over the same scan.
> >
> > I thought I had posted a patch to that effect long back, but upon
> searching
> > my emails apparently I forgot about the patch.
> >
> > Attached is the patch that I developed in Nov. 2010, so expect a lot of
> bit
> > rot. I had tried to make it elegant, but I have to admit its a hack. This
> > patch does not imply that it is using any kind of parallelism, the
> context
> > in which that above statement was made. It just helps to avoid scanning
> the
> > same relation multiple times. I performed some tests on it and AFAICR,
> this
> > did not produce a net win. But those tests may have been performed in a
> > virtual machine and not on a bare metal, I forget.
>
> I'm not too surprised that it didn't help all that much since you're
> doing everything in one backend.  My guess at what Bruce was talking
> about is being able to actually have multiple CREATE INDEX's going in
> parallel in different backends.  If they didn't all grab an
> AccessExclusive lock, wouldn't they be able to use the same ring buffer
> to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
> isn't this supported already?
>
> I haven't tried this yet, but it sure seems like something we could
> probably already claim to support..
>

It'd be great if one of standard utilities like pg_restore supported this,
by spawning every concurrent index build in separate backends. Just a
thought.


-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: [HACKERS] [9.2] crash on regex

On 24 May 2012 16:24, Thom Brown  wrote:
> On 24 May 2012 16:08, Robert Haas  wrote:
>> On Thu, May 24, 2012 at 10:16 AM, Marko Kreen  wrote:
>>> Following query crashes backend on 9.2:
>>>
>>>  select substring('asd TO foo' from ' TO (([a-z0-9._]+|"([^"]+|"")+")+)');
>>
>> I spent some time trying to reduce this to the simplest case that
>> still causes a crash, and came up with this:
>>
>> select substring('a' from '((a))+');
>
> It appears to occur with any quantifier attached to a group that is
> more than 1 level deep with nothing between the groups, such as
> ((a))?, ((a))*, ((a)){1,4}, (((a)))+
>
> Or if the quantifier is the only additional thing between the groups,
> such as ((a)+)
>
> But when breaking the groups up, it's fine, so this works: ((a)b)+

Hmmm... curiously, lazy (non-greedy) quantifiers are stable, such as: ((a))*?

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_statments queryid

On 24 May 2012 16:06, Tom Lane  wrote:
>>> What I'd like to be able to do is aggregate this information over time
>>> and/or across standbys in a cluster, as queries are evicted and
>>> subsequently re-entered into pg_stat_statement's shared hash table.
>
> It appears to me that the above ...
>
>>> ... I'd also credit our users with being
>>> discerning enough to realise that they should not jump to the
>>> conclusion that the value will be stable according to any particular
>>> standard.
>
> ... is in direct contradiction to this.

I simply meant that we ought to be able to trust that people will
actually investigate the stability guarantees of a newly exposed
query_id before going and writing a tool that does some sort of
aggregation - they should and will make informed decisions. If that
guarantee is limited to "we might have to change the walker logic
during a minor release, so the value might change for some queries, so
we don't promise that it will be a stable identifier but will
naturally strive to do our best to avoid invalidating existing
statistics (within postgres and aggregated by external tools)", that
wouldn't put many people off. Still, I don't think it's all that
likely that we'll ever have to adjust the walker logic, since
pg_stat_statements already doesn't strictly promise that collisions
cannot occur, while making them very improbable.

I'm not even asking that possible uses for queryId be documented as
being useful for this sort of thing. I only ask that we expose it and
document it.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Creating multiple indexes in one table scan.

On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh  wrote:
> It'd be great if one of standard utilities like pg_restore supported this,
> by spawning every concurrent index build in separate backends. Just a
> thought.

If parallel restore doesn't already take this into account when doing
job scheduling, that would be a worthwhile improvement to consider.

Personally, I think the big win in this area is likely to be parallel
sort.  There may well be some more we can squeeze out of our existing
sort implementation first, and I'm all in favor of that, but
ultimately if you've got 60GB of data to sort and it's all in cache,
you want to be able to use more than one CPU for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

2012/5/24 Florian Pflug :
> On May24, 2012, at 16:19 , Kohei KaiGai wrote:
>> So, the proposed interface might be revised as follows:
>>  ALTER TABLE  ADD SECURITY POLICY
>>      (, ...) [FOR SELECT |
>>                                                         INSERT |
>>                                                         [BEFORE|AFTER] 
>> UPDATE |
>>                                                         DELETE];
>>
>> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
>> on the tail of before-row triggers.
>
> I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
> UPDATE case to regular triggers, for two reasons
>
> First, it's conceptually much simpler, since the policy always just adds
> an implicit WHERE clause, period. This of course assumes that DELETE and
> (BEFORE) UPDATE simply skips rows for which the policy function returns false,
> instead of reporting 'permission denied' or something. But that's the most
> reasonable behaviour anyway, I think, because otherwise you'd make batch
> UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
> of tripping over some invisible row and getting and error.
>
I definitely agree with starting a new feature from simple implementation.

Although I'm inclined to the approach to replace references to tables with
security policy by sub-queries with security barrier flag, instead of adding
qualifiers of where clause to avoid the leaky-view scenario, it will make its
implementation mush simpler.

> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
> created references to rows which are invisible to you, or should FOREIGN KEY
> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
> the policy WHERE clause should be added to constraint checking queries like
> usual. But maybe I'm missing some reason why that'd be undesirable…
>
I agree. The row level security policy should not be applied during FK checks
(or other internal stuff; to be harmless). At the previous discussion, it was
issued that iteration of FK/PK proving enables malicious one to estimate
existence of invisible tuple and its key value, although they cannot see the
actual values. It is well documented limitation, thus, user should not use row-
level security (or should not use natural key) if they cannot accept
this limitation.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.2] crash on regex

Robert Haas  writes:
> On Thu, May 24, 2012 at 10:16 AM, Marko Kreen  wrote:
>> Following query crashes backend on 9.2:
>> 
>>  select substring('asd TO foo' from ' TO (([a-z0-9._]+|"([^"]+|"")+")+)');

> I spent some time trying to reduce this to the simplest case that
> still causes a crash, and came up with this:

> select substring('a' from '((a))+');

Yeah, I'm looking at it.  Looks like I broke memory management somewhere
in the quantifier revisions --- it seems to be stomping the stack during
cleanup at the end of pg_regexec.  Possibly a multiple-free deal?
Haven't quite found it yet.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

2012-05-24 Thread Heikki Linnakangas


On 24.05.2012 18:16, Robert Haas wrote:

On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland  wrote:

I wouldn't have assumed any corruption was possible given that I did
clean shutdowns on both sides...


The thing that's worrying me is that there's not really any such thing
as a "clean" shutdown on a standby.  When you shut down the master, it
checkpoints.  When you shut down the standby, it can't checkpoint, so
I think it's still going to enter recovery at startup.  It'd be
interesting to know where that recovery began and ended as compared
with the minimum recovery point just before the shutdown.


Perhaps we should introduce the concept of a clean standby shutdown. We 
can't write a checkpoint record, but we could write the same information 
somewhere else. Like in the control file. At startup, we'd see that we 
did a clean shutdown at WAL point X/X, and start up without having to 
read through all the WAL from the last master checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

I'd like to summarize the current design being discussed.

syntax:
  ALTER TABLE  WITH ROW LEVEL SECURITY
  (  ) [FOR (SELECT | UPDATE | DELETE)];
  ALTER TABLE  WITHOUT ROW LEVEL SECURITY;

I tried to patch the parser/gram.y, but here was syntax conflicts
on ADD / DROP sub-command.
And, I noticed "ROW LEVEL SECURITY" allows to implement
without adding new keyword, unlike "SECURITY POLICY".

As we discussed, it causes a problem with approach to append
additional qualifiers to where clause implicitly, because it does
not solve the matter corresponding to the order to execute
qualifiers. So, I'm inclined to the approach to replace reference
to tables with security policy by sub-queries with security barrier
flag.
For example, if "tbl" has security policy, this query shall be rewritten
internally, as follows:

original)
  SELECT * FROM tbl WHERE X > 20 AND f_leak(Y);

rewritten)
  SELECT * FROM (
SELECT * FROM tbl WHERE uname = getpgusername()
  ) AS tbl_subqry WHERE X > 20 AND f_leak(Y);

The sub-query shall have security-barrier flag, so f_leak() is never
pushed down but X > 20 will be pushed down because of leakproof
attribute of the function.

It is a bit complex at UPDATE or DELETE statement, but
what I try to do is same.

  original)
UPDATE tbl SET X = X + 1 WHERE X > 20 AND f_leak(Y);

  rewritten)
UPDATE tbl SET X = X + 1 WHERE ctid = (
SELECT ctid FROM (
SELECT ctid, * FROM uname = getpgusername()
) AS tbl_subqry WHERE X > 20 AND f_leak(Y)
);

That guarantees the security policy ("uname = getpgusername()")
is evaluated prior to user given conditions.

One thing still I'm thinking is whether the security policy should
be provided as a function or a clause. Enough simple sql function
is inlined at simplify_function(), so here is no meaningful difference.
I was afraid of code complexity, but all we should do is to append
configured clause on the where clause of sub-query inside.

|  ALTER TABLE  WITH ROW LEVEL SECURITY
|  (  ) [FOR (SELECT | UPDATE | DELETE)];

So, I tried to put "condition clause" instead of a function, right now.

Regarding to FK constraints, I don't think it is a situation to
apply row-level security policy towards internal queries.
So, I plan to disable during FK checks.

One other issue we didn't have discussed is table inheritance.
In case when a table TBLP has a child table TBLC and only
TBLC has its security policy, what security policy should be
applied when we run "SELECT * FROM TBLP".
My preference is, the security policy is only applied to scan on
TBLC, not TBLP. It is not desirable behavior that visible tuples
are different from way to reference a certain table.
In addition, if and when TBLP and TBLC have their own policy
individually, what is a desirable behavior?
I think, the security policy of both TBLP and TBLC should be
applied on TBLC; in other words, it applies the security policy
of all the parent tables to scan on child table.

Any comments please. Thanks,

2012/5/24 Kohei KaiGai :
> 2012/5/24 Florian Pflug :
>> On May24, 2012, at 16:19 , Kohei KaiGai wrote:
>>> So, the proposed interface might be revised as follows:
>>>  ALTER TABLE  ADD SECURITY POLICY
>>>      (, ...) [FOR SELECT |
>>>                                                         INSERT |
>>>                                                         [BEFORE|AFTER] 
>>> UPDATE |
>>>                                                         DELETE];
>>>
>>> In case of INSERT or AFTER UPDATE, I assume the check shall be applied
>>> on the tail of before-row triggers.
>>
>> I'd go with just "SELECT, UPDATE, DELETE", and leave the INSERT and BEFORE
>> UPDATE case to regular triggers, for two reasons
>>
>> First, it's conceptually much simpler, since the policy always just adds
>> an implicit WHERE clause, period. This of course assumes that DELETE and
>> (BEFORE) UPDATE simply skips rows for which the policy function returns 
>> false,
>> instead of reporting 'permission denied' or something. But that's the most
>> reasonable behaviour anyway, I think, because otherwise you'd make batch
>> UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
>> of tripping over some invisible row and getting and error.
>>
> I definitely agree with starting a new feature from simple implementation.
>
> Although I'm inclined to the approach to replace references to tables with
> security policy by sub-queries with security barrier flag, instead of adding
> qualifiers of where clause to avoid the leaky-view scenario, it will make its
> implementation mush simpler.
>
>> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
>> created references to rows which are invisible to you, or should FOREIGN KEY
>> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
>> the policy WHERE clause should be added to constraint checking queries like
>> usual. But maybe I'm missing some reason why that'd be undesirable…
>>
> I

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

On Thursday, May 24, 2012 06:35:06 PM Heikki Linnakangas wrote:
> On 24.05.2012 18:16, Robert Haas wrote:
> > On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland  wrote:
> >> I wouldn't have assumed any corruption was possible given that I did
> >> clean shutdowns on both sides...
> > 
> > The thing that's worrying me is that there's not really any such thing
> > as a "clean" shutdown on a standby.  When you shut down the master, it
> > checkpoints.  When you shut down the standby, it can't checkpoint, so
> > I think it's still going to enter recovery at startup.  It'd be
> > interesting to know where that recovery began and ended as compared
> > with the minimum recovery point just before the shutdown.
> 
> Perhaps we should introduce the concept of a clean standby shutdown. We
> can't write a checkpoint record, but we could write the same information
> somewhere else. Like in the control file. At startup, we'd see that we
> did a clean shutdown at WAL point X/X, and start up without having to
> read through all the WAL from the last master checkpoint.
The control file currently is not a very good match because of the current 
requirement of staying below 512 bytes. If we would include the list of 
running xacts that wouldn't be enough.
I wondered before if there is more to do to fix that then to do the atomic 
write();fsync();rename();fsync(); dance. I don't see a problem with the cost 
of that personally...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck

2012-05-24 Thread Fujii Masao

On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander  wrote:
> While reviewing and cleaning this patch up a bit I noticed it actually
> broke pg_receivexlog in the renaming.
>
> Here is a new version of the patch, reworked based on the above so
> we're down to a single callback. I moved the "rename last segment file
> even if it's not complete" to be a parameter into ReceiveXlogStream()
> instead of trying to overload a third functionality on the callback
> (which is what broke pg_receivexlog).
>
> How does this look? Have I overlooked any cases?

Thanks for the patch! Looks good to me except the followings:

pg_basebackup.c:233: warning: passing argument 6 of
'ReceiveXlogStream' from incompatible pointer type

I got the above warning on compile. To fix this, the third argument
"segment_finished"
needs to be added to reached_end_position().

It seems confusing that *stream_continue()* returns TRUE when
streaming *cannot continue*, i.e.,
its name seems to be inconsistent with what it does. What about
renaming it to stream_stop?
Similarly, it also seems confusing that *continue_streaming()* returns
TRUE when streaming
*cannot continue*.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai  wrote:
>> Perhaps when we see that RLS
>> applies, we should replace the reference to the original table with a
>> subquery RTE that has the security_barrier flag set - essentially
>> treating a table with RLS as if it were a security view.
>>
> I become to think it is a better approach than tracking origin of each
> qualifiers. One problem is case handling on update or delete statement.
>
> It may be possible to rewrite the update / delete query as follows:
>
> From:
>  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
> To:
>  UPDATE tbl SET X = X + 1 WHERE ctid = (
>      SELECT * FROM (
>          SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*)
> should have security-barrier
>      ) AS tbl_subqry WHERE f_leak(Y)
>  );
>
> Expanded sub-queries will have security-barrier flag, so it enforces
> the "uname = getpgusername()" being checked earlier than f_leak(Y).
> We may need to measure the performance impact due to the reform.

The problem with this is that it introduces an extra instance of tbl
into the query - there are now two rather than one.  UPDATE .. FROM is
supposed to be a way to avoid this, but it's insufficiently general to
handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using
the existing syntax).  Anyway we want to avoid inserting self-joins
for performance reasons if at all possible.  It should be easy to do
that in the case of SELECT; UPDATE and DELETE may need a bit more
work.

> I think, this situation is similar to a case when we reference a view
> without privileges to underlying tables. If Bob set up a view with
> something "tricky" function, it allows Bob to reference credentials
> of users who reference the view.
> More or less, it might be a problem when a user try to invoke
> a user defined function declared by others.
> (Thus, sepgsql policy does not allow users to invoke a function
> declared by another one in different domain; without DBA's checks.)

This is true, but there are still some new threat models.  For
example, currently, pg_dump isn't going to run any user-defined code
just because you do SELECT * FROM table, but that will change with
this patch.  Note that pg_dump need not actually select from views,
only tables.

> I think it is a good idea not to apply RLS when current user has
> superuser privilege from perspective of security model consistency,
> but it is inconsistent to check privileges underlying tables.

Seems like a somewhat random wart, if it's just an exception for
superusers.  I think we need to do better than that.  For example, at
my last company, sales reps A and B were permitted to see all
customers of the company, but sales reps C, D, E, F, G, H, I, and J
were permitted to see only their own accounts.  Those sorts of
policies need to be easy to implement.

>> Another idea is to set things up so that the RLS policy function isn't
>> applied to each row directly; instead, it's invoked once per query and
>> *returns* a WHERE clause.  This would be a lot more powerful than the
>> proposed design, because now the table owner can write a function that
>> imposes quals on some people but not others, which seems very useful.
>>
> Sorry, I don't favor this idea. Even if table owner set up a function to
> generate additional qualifiers, it also has no guarantee the qualifiers
> are invoked prior to user-given one.
> It seems to me this approach will have same problem...

It's not intended to solve the qual-ordering problem, just to allow
additional policy flexibility.

>> It's not clear to me that there is any need for built-in server
>> functionality here.  If the table owner wants to enforce some sort of
>> policy regarding INSERT or UPDATE or DELETE, they can already do that
>> today just by attaching a trigger to the table.  And they can enforce
>> whatever policy they like that way.  Before designing any new
>> mechanism, what's wrong with the existing one?
>>
> Yes, we don't need any new invent to check the value of new tuples.
> But it should be done after all the user-defined triggers. Existing
> trigger does not have a mechanism to enforce order to be invoked.
> So, what I really implement is a mechanism to inject some pseudo
> triggers "at tail of the Trigger array".

Start the trigger names with the letter "z".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On Thu, May 24, 2012 at 6:20 AM, Florian Pflug  wrote:
> But the security policy should still apply to the old rows, i.e.
> you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Peter Eisentraut

clang warns about that newish SvREFCNT_inc(sv) call in plperl_helpers.h
about an unused return value, because the macro expansion of
SvREFCNT_inc(sv) returns sv.  The merit of that warning might be
debatable, but it seems easy to fix by using SvREFCNT_inc_void(sv)
instead.

And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
side effects, but that's optional.

Any concerns?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai  wrote:
>> Another issue, BTW, are FOREIGN KEY constraints. Should you be allowed to
>> created references to rows which are invisible to you, or should FOREIGN KEY
>> constraints be exempt from security policies? I'd say they shouldn't be, i.e.
>> the policy WHERE clause should be added to constraint checking queries like
>> usual. But maybe I'm missing some reason why that'd be undesirable…
>>
> I agree. The row level security policy should not be applied during FK checks
> (or other internal stuff; to be harmless). At the previous discussion, it was
> issued that iteration of FK/PK proving enables malicious one to estimate
> existence of invisible tuple and its key value, although they cannot see the
> actual values. It is well documented limitation, thus, user should not use 
> row-
> level security (or should not use natural key) if they cannot accept
> this limitation.

You say "I agree", but it seems to me that you and Florian are in fact
taking opposite positions.

FWIW, I'm inclined to think that you should NOT be able to create a
row that references an invisible row.  You might end up with that
situation anyway, because we don't know what the semantics of the
security policy are: rows might become visible or invisible after the
fact, and we can't police that.  But I think that if you take the
opposite position that the select queries inside fkey triggers ought
to be exempt from security policy, then you need to build some new
mechanism to make that happen, which seems like extra work for no
benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Alex Hunsaker

On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut  wrote:
> clang warns about that newish SvREFCNT_inc(sv) call in plperl_helpers.h
> about an unused return value, because the macro expansion of
> SvREFCNT_inc(sv) returns sv.  The merit of that warning might be
> debatable, but it seems easy to fix by using SvREFCNT_inc_void(sv)
> instead.
>
> And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
> side effects, but that's optional.
>
> Any concerns?

Hrm I can't seem to find either of those functions in my copy of perl
5.8.1 or 5.16.0. I also looked for "_inc_" in the event I got casing
wrong :-(.

Perhaps I misunderstood, are you proposing to introduce those
functions? Im fine with that. Or doing (void)SvREFCNT_inc().

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC] Interface of Row Level Security

On May24, 2012, at 18:42 , Kohei KaiGai wrote:
> I'd like to summarize the current design being discussed.
> 
> syntax:
>  ALTER TABLE  WITH ROW LEVEL SECURITY
>  (  ) [FOR (SELECT | UPDATE | DELETE)];
>  ALTER TABLE  WITHOUT ROW LEVEL SECURITY;
> 
> I tried to patch the parser/gram.y, but here was syntax conflicts
> on ADD / DROP sub-command.
> And, I noticed "ROW LEVEL SECURITY" allows to implement
> without adding new keyword, unlike "SECURITY POLICY".

Let the bike-shedding begin ;-)

ALTER TABLE … WITH sounds a bit weird. What about

  ALTER TABLE  SET ROW POLICY  FOR { SELECT | UPDATE | 
DELETE }
  ALTER TABLE  RESET ROW POLICY

> As we discussed, it causes a problem with approach to append
> additional qualifiers to where clause implicitly, because it does
> not solve the matter corresponding to the order to execute
> qualifiers. So, I'm inclined to the approach to replace reference
> to tables with security policy by sub-queries with security barrier
> flag.

Since the security barrier flag carries a potentially hefty performance
penalty, I think it should be optional. Application which don't allow
SQL-level access to the database might still benefit from row-level security,
because it saves them from having to manually add the WHERE clause to every
statement, or having to wrap all their tables with views. Yet without direct
SQL-level access, the security barrier thing isn't really necessary, so
it'd be nice if they wouldn't have to pay for it. How about

  ALTER TABLE … SET ROW POLICY … WITH (security_barrier)

> One thing still I'm thinking is whether the security policy should
> be provided as a function or a clause. Enough simple sql function
> is inlined at simplify_function(), so here is no meaningful difference.
> I was afraid of code complexity, but all we should do is to append
> configured clause on the where clause of sub-query inside.
> 
> |  ALTER TABLE  WITH ROW LEVEL SECURITY
> |  (  ) [FOR (SELECT | UPDATE | DELETE)];
> 
> So, I tried to put "condition clause" instead of a function, right now.

A single function seems much easier implementation-wise, since you wouldn't
need to store an arbitrary expression in the catalog, just an oid. It also
delegates the dependency tracking problem to the function. It also simplies
the grammar, because the "FOR … " clause cannot be mistaken to belong to
the condition clause.

> One other issue we didn't have discussed is table inheritance.
> In case when a table TBLP has a child table TBLC and only
> TBLC has its security policy, what security policy should be
> applied when we run "SELECT * FROM TBLP".
> My preference is, the security policy is only applied to scan on
> TBLC, not TBLP.

Agreed.

> It is not desirable behavior that visible tuples
> are different from way to reference a certain table.
> In addition, if and when TBLP and TBLC have their own policy
> individually, what is a desirable behavior?
> I think, the security policy of both TBLP and TBLC should be
> applied on TBLC; in other words, it applies the security policy
> of all the parent tables to scan on child table.

I think security policies should only apply to the table they're
declared for, not their child tables. Mostly because that is how
triggers operate, and security policies and triggers will often
be used together, so having their semantics regarding inheritance
be the same seems to be the least surprising option.

Also, if policies are inherited, how would you define a policy
which applies only to the parent, not to the child?

best regards,
Florian Pflug

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane

Yesterday I had a client that experienced a sudden high load on
one of their servers (8.3.5 - yes, I know. Those of you with
clients will understand). When I checked, almost all connections
were in a "startup" state, very similar to this thread:

http://postgresql.1045698.n5.nabble.com/9-1-3-backends-getting-stuck-in-startup-td5670712.html

Running a strace showed a lot of semop activity, and the logs showed a
successful connection, then a 5 minute plus wait before a query was issued.
So obviously, blocking on something. Unlike the thread above, I *did* find
problems in the system catalogs. For example, both pg_class and pg_index
gave warnings like this for every index during a VACUUM FULL
VERBOSE tablename:

WARNING: index "pg_class_relname_nsp_index" contains 7712 row versions,
but table contains 9471 row versions
HINT: Rebuild the index with REINDEX.

A REINDEX did not solve the problem (initially), as a REINDEX followed
by a VAC showed the same warning and hint.

The next step was dropping to standalone mode, but before that could
be done, the REINDEXes fixed the problem (no warnings, no stalled
connections). So my questions are:

* Why would a REINDEX not fix the problem as the hint suggested?
Why would it then start "working"?

* What exactly is the relationship between bad indexes and shared
memory locks?

* Is there some other emergency fix when it happens, such as killing
all backends and hoping you kill the one that is actually holding
the lock (if any was).

* Did anything in the 8.3 series fix this? I saw nothing relevant in
the release notes for everything up to 8.3.18 (which it will be on soon).

--
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

pgpwbxRekkZQf.pgp
Description: PGP signature

Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Alex Hunsaker

On Thu, May 24, 2012 at 11:31 AM, Alex Hunsaker  wrote:
> On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut  wrote:
>>
>> And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
>> side effects, but that's optional.

> Hrm I can't seem to find either of those functions in my copy of perl
> 5.8.1 or 5.16.0. I also looked for "_inc_" in the event I got casing
> wrong :-(.

Doh, it is indeed there in 5.16.0, looks like it got added in 5.10
:-(. (I was on the wrong branch...).

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile


On Thu, 24 May 2012, Merlin Moncure wrote:


Are you sure?  I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:

   /* Nothing to do if this is not a serializable transaction */
   if (MySerializableXact == InvalidSerializableXact)
   return false;
What's the default isolation mode set to?


Sorry, it was indeed my mistake. I was pointing opreport to the 
binary which was recompiled, while the old version was still running , so 
I guess oprofile was picking up wrong function names


Here is the correct oprofile: when multiple queries are running :
samples  %symbol name
---
952457   13.8715  LWLockAcquire
  952457   100.000  LWLockAcquire [self]
---
779077   11.3464  PinBuffer
  779077   100.000  PinBuffer [self]
---
759898   11.0671  s_lock
  759898   100.000  s_lock [self]
---
6897536.7178  slot_deform_tuple
  689753   100.000  slot_deform_tuple [self]
---
5260025.1230  UnpinBuffer
  526002   100.000  UnpinBuffer [self]

When only one is running:
samples  %symbol name
---
163268   14.0343  slot_deform_tuple
  163268   100.000  slot_deform_tuple [self]
---
126018   10.8324  _bt_compare
  126018   100.000  _bt_compare [self]
---
1139759.7972  ExecProject
  113975   100.000  ExecProject [self]
---
49760 4.2773  FunctionCall2Coll
  49760100.000  FunctionCall2Coll [self]
---
49164 4.2261  LWLockAcquire
  49164100.000  LWLockAcquire [self]
---
43526 3.7414  hash_search_with_hash_value
  43526100.000  hash_search_with_hash_value [self]



I guess there is nothing catastrophically wrong with that, but still I'm 
very suprised that you get severe locking problems (factor of two 
slow-down) when running parallel read-only transactions.


Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [9.2] crash on regex

Thom Brown  writes:
> Hmmm... curiously, lazy (non-greedy) quantifiers are stable, such as: ((a))*?

I've found it.  The triggering conditions are (1) more than one set of
capturing parens in a substring() pattern, and (2) at least one trial
midpoint failing in ccondissect() or one of its siblings.  That results
in zaptreesubs() trying to clear an array entry that isn't there...
trivial to fix fortunately.  So it's not so much what the quantifiers
are as whether the first attempted match succeeds.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Peter Eisentraut

On tor, 2012-05-24 at 14:09 +0200, Magnus Hagander wrote:
> Would i make sense to have a postgresql.conf parameter that would add
> to LD_LIBRARY_PATH when loading libraries from
> shared_preload_libraries (and other library loads).

Well, you could write a library that sets it in its init function, and
load that before the library that needs it set. :-)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Jeff Janes

On Thu, May 24, 2012 at 6:24 AM, Sergey Koposov  wrote:
> Hi,
>
> I've been running some tests on pg 9.2beta1 and in particular a set
> of queries like
...
>
> And I noticed than when I run the query like the one shown above in parallel
> (in multiple connections for ZZZ=0...8) the performance of each query drops
> down significantly (factor of 2)(despite the fact that during the execution
> of the query postgres is mostly CPU bound).

Is this a regression from previous versions (9.0, 9.1), or has the
slow down always been there?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov  wrote:
> I guess there is nothing catastrophically wrong with that, but still I'm
> very suprised that you get severe locking problems (factor of two slow-down)
> when running parallel read-only transactions.

Me, too.  How many concurrent connections are you running, and does
your working set exceed shared_buffers?  Can you provide a
self-contained reproducible test case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] shared_preload_libraries path

On Thu, May 24, 2012 at 7:58 PM, Peter Eisentraut  wrote:
> On tor, 2012-05-24 at 14:09 +0200, Magnus Hagander wrote:
>> Would i make sense to have a postgresql.conf parameter that would add
>> to LD_LIBRARY_PATH when loading libraries from
>> shared_preload_libraries (and other library loads).
>
> Well, you could write a library that sets it in its init function, and
> load that before the library that needs it set. :-)

:-) What's life without workarounds, eh? ;)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Peter Eisentraut

On tor, 2012-05-24 at 11:36 -0600, Alex Hunsaker wrote:
> On Thu, May 24, 2012 at 11:31 AM, Alex Hunsaker  wrote:
> > On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut  wrote:
> >>
> >> And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
> >> side effects, but that's optional.
> 
> > Hrm I can't seem to find either of those functions in my copy of perl
> > 5.8.1 or 5.16.0. I also looked for "_inc_" in the event I got casing
> > wrong :-(.
> 
> Doh, it is indeed there in 5.16.0, looks like it got added in 5.10
> :-(. (I was on the wrong branch...).

It's in ppport.h.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Andrew Dunstan




On 05/24/2012 11:44 AM, Robert Haas wrote:

On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh  wrote:

It'd be great if one of standard utilities like pg_restore supported this,
by spawning every concurrent index build in separate backends. Just a
thought.

If parallel restore doesn't already take this into account when doing
job scheduling, that would be a worthwhile improvement to consider.





Parallel restore allows it in some cases, but not in others. You can't 
create an index-backed constraint concurrently with any other index 
because ALTER TABLE takes an ACCESS EXCLUSIVE lock.


We could get around that changing pg_dump to create the index first and 
then add the constraint using the existing index.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Changing the concept of a DATABASE

2012-05-24 Thread Josh Berkus


> Yes, we do. It would be best to conclude that things I do on hackers
> relate in some way to those goals, even if it isn't immediately clear
> how.

See, now you've got me all curious.  How does inter-DB queries relate to
the New Replication?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Stateful pointers in set-returning functions

2012-05-24 Thread Ian Pye

Hi,

I'm writing a set-returning function which places a file handle into
PG's FuncCallContext's user_fctx space. My problem is that when the
function is ran with a limit clause (SELECT * FROM foo() LIMIT 10) the
server will stop calling the function automatically, not giving me a
chance to close the file handle. Is there a way to get the limit value
inside of foo() and set the max max_calls parameter correctly?

Thanks,

Ian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile


On Thu, 24 May 2012, Robert Haas wrote:


On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov  wrote:

I guess there is nothing catastrophically wrong with that, but still I'm
very suprised that you get severe locking problems (factor of two slow-down)
when running parallel read-only transactions.


Me, too.  How many concurrent connections are you running, and does
your working set exceed shared_buffers?  Can you provide a
self-contained reproducible test case?


The last tests I've been doing were with 8 connections.
And the working set is roughly 30Gig, which is ~ 3x the shared buffers. 
(but ~ 50% of RAM).


Regarding the test-case, I'll try to see whether I can still observe the 
same slowing down if I chop the main table by a factor of few, so I can 
put the data somewhere for download.


S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Stateful pointers in set-returning functions

Ian Pye  writes:
> I'm writing a set-returning function which places a file handle into
> PG's FuncCallContext's user_fctx space. My problem is that when the
> function is ran with a limit clause (SELECT * FROM foo() LIMIT 10) the
> server will stop calling the function automatically, not giving me a
> chance to close the file handle. Is there a way to get the limit value
> inside of foo() and set the max max_calls parameter correctly?

No, and even if there were, this would be a very unsafe practice,
since errors or other issues could result in early termination of the
query.

You would likely be better off using tuplestore return mode so that you
can do all the reading during one call and not have to assume that
you'll get control back again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Changing the concept of a DATABASE

On Thursday, May 24, 2012 08:12:56 PM Josh Berkus wrote:
> > Yes, we do. It would be best to conclude that things I do on hackers
> > relate in some way to those goals, even if it isn't immediately clear
> > how.
> See, now you've got me all curious.  How does inter-DB queries relate to
> the New Replication?
Being able to apply changes from one process to multiple databases reduces 
implementation/runtime overhead...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Stateful pointers in set-returning functions

2012-05-24 Thread Ian Pye

Fair enough -- thanks for the tip.

On Thu, May 24, 2012 at 11:21 AM, Tom Lane  wrote:
> Ian Pye  writes:
>> I'm writing a set-returning function which places a file handle into
>> PG's FuncCallContext's user_fctx space. My problem is that when the
>> function is ran with a limit clause (SELECT * FROM foo() LIMIT 10) the
>> server will stop calling the function automatically, not giving me a
>> chance to close the file handle. Is there a way to get the limit value
>> inside of foo() and set the max max_calls parameter correctly?
>
> No, and even if there were, this would be a very unsafe practice,
> since errors or other issues could result in early termination of the
> query.
>
> You would likely be better off using tuplestore return mode so that you
> can do all the reading during one call and not have to assume that
> you'll get control back again.
>
>                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

Andres Freund  writes:
> The control file currently is not a very good match because of the current 
> requirement of staying below 512 bytes. If we would include the list of 
> running xacts that wouldn't be enough.
> I wondered before if there is more to do to fix that then to do the atomic 
> write();fsync();rename();fsync(); dance. I don't see a problem with the cost 
> of that personally...

The reason for keeping it to one sector is that you're screwed if the
file is broken, so the fewer failure modes the better.

I'm not sure I believe that we can make a recovery resume from an
arbitrary point in WAL anyway, or that it would be worth the trouble.
Can't we just resume from the last restartpoint?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
> Andres Freund  writes:
> > The control file currently is not a very good match because of the
> > current requirement of staying below 512 bytes. If we would include the
> > list of running xacts that wouldn't be enough.
> > I wondered before if there is more to do to fix that then to do the
> > atomic write();fsync();rename();fsync(); dance. I don't see a problem
> > with the cost of that personally...
> 
> The reason for keeping it to one sector is that you're screwed if the
> file is broken, so the fewer failure modes the better.
Yea, sure. But given the amount of software that depends on the above sequence 
to work correctly I don't really see much of a problem...

> I'm not sure I believe that we can make a recovery resume from an
> arbitrary point in WAL anyway, or that it would be worth the trouble.
> Can't we just resume from the last restartpoint?
Well, with a decent sized checkpoint_segments getting up2date can take quite a 
noticeable amount of time...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 2:19 PM, Sergey Koposov  wrote:
> On Thu, 24 May 2012, Robert Haas wrote:
>> On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov 
>> wrote:
>>>
>>> I guess there is nothing catastrophically wrong with that, but still I'm
>>> very suprised that you get severe locking problems (factor of two
>>> slow-down)
>>> when running parallel read-only transactions.
>>
>> Me, too.  How many concurrent connections are you running, and does
>> your working set exceed shared_buffers?  Can you provide a
>> self-contained reproducible test case?
>
> The last tests I've been doing were with 8 connections.
> And the working set is roughly 30Gig, which is ~ 3x the shared buffers. (but
> ~ 50% of RAM).

Given that additional information, I would say these results are
expected.  Unfortunately, our BufFreelistLock is a serious contention
point, and I think that's what you're hitting.  See the graph here:

http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Wed, May 23, 2012 at 2:28 PM, Fujii Masao  wrote:
> On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs  wrote:
>> Send new protocol keepalive messages to standby servers.
>> Allows streaming replication users to calculate transfer latency
>> and apply delay via internal functions. No external functions yet.
>
> Is there plan to implement such external functions before 9.2 release?
> If not, keepalive protocol seems to be almost useless because there is
> no use of it for a user and the increase in the number of packets might
> increase the replication performance overhead slightly. No?

Good point.  IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

Andres Freund  writes:
> On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
>> I'm not sure I believe that we can make a recovery resume from an
>> arbitrary point in WAL anyway, or that it would be worth the trouble.
>> Can't we just resume from the last restartpoint?

> Well, with a decent sized checkpoint_segments getting up2date can take quite 
> a 
> noticeable amount of time...

So?  Slow restart is precisely the price you pay for a large checkpoint
interval.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

Robert Haas  writes:
> On Wed, May 23, 2012 at 2:28 PM, Fujii Masao  wrote:
>> Is there plan to implement such external functions before 9.2 release?
>> If not, keepalive protocol seems to be almost useless because there is
>> no use of it for a user and the increase in the number of packets might
>> increase the replication performance overhead slightly. No?

> Good point.  IMHO, this shouldn't really have been committed like
> this, but since it was, we had better fix it, either by reverting the
> change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future.  I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to.  What's
wrong with leaving well enough alone?  It's not like there is no
unfinished work anywhere else in Postgres ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] "could not open relation with OID" errors after promoting the standby to master

On Thursday, May 24, 2012 08:46:21 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
> >> I'm not sure I believe that we can make a recovery resume from an
> >> arbitrary point in WAL anyway, or that it would be worth the trouble.
> >> Can't we just resume from the last restartpoint?
> > 
> > Well, with a decent sized checkpoint_segments getting up2date can take
> > quite a noticeable amount of time...
> 
> So?  Slow restart is precisely the price you pay for a large checkpoint
> interval.
Well. no. If you stop the master in a orderly fashion it will checkpoint and 
thus have a sensible startup time. There is no such possibility for a standby  
without interaction with the master.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 1:43 PM, Robert Haas  wrote:
> On Thu, May 24, 2012 at 2:19 PM, Sergey Koposov  wrote:
>> On Thu, 24 May 2012, Robert Haas wrote:
>>> On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov 
>>> wrote:

 I guess there is nothing catastrophically wrong with that, but still I'm
 very suprised that you get severe locking problems (factor of two
 slow-down)
 when running parallel read-only transactions.
>>>
>>> Me, too.  How many concurrent connections are you running, and does
>>> your working set exceed shared_buffers?  Can you provide a
>>> self-contained reproducible test case?
>>
>> The last tests I've been doing were with 8 connections.
>> And the working set is roughly 30Gig, which is ~ 3x the shared buffers. (but
>> ~ 50% of RAM).
>
> Given that additional information, I would say these results are
> expected.  Unfortunately, our BufFreelistLock is a serious contention
> point, and I think that's what you're hitting.  See the graph here:
>
> http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html
>
> As you can see, raw performance isn't much worse with the larger data
> sets, but scalability at high connection counts is severely degraded
> once the working set no longer fits in shared_buffers.

Hm, wouldn't the BufFreelistLock issue be ameliorated if
StrategyGetBuffer could reserve multiple buffers so that you'd draw
down your local list and only then go back to the global pool? (easier
said than done obviously).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Archiver not exiting upon crash

2012-05-24 Thread Jeff Janes

On Wed, May 23, 2012 at 2:21 PM, Tom Lane  wrote:
> I wrote:
>> Jeff Janes  writes:
>>> But what happens if the SIGQUIT is blocked before the system(3) is
>>> invoked?  Does the ignore take precedence over the block, or does the
>>> block take precedence over the ignore, and so the signal is still
>>> waiting once the block is reversed after the system(3) is over?  I
>>> could write a test program to see, but that wouldn't be very good
>>> evidence of the portability.
>
>> AFAICT from the POSIX spec for system(3), that would be a bug in
>> system().
>
> Actually, on further thought, it seems like there is *necessarily* a
> race condition in this.  There must be some interval where the child
> process has already exited but the waiting parent hasn't de-ignored the
> signals.

Yup.  the posix man page doesn't say that this is guaranteed to be
atomic, so I assume no such guarantee is even attempted.  For that
matter, the man page for system(3) doesn't even tell how one goes
about signaling the child--unless the signal is sent to the negation
of the group leader the child won't receive it.  (Postgres handles
group-leader thing correctly, I just point it out to show the man page
is not intended to be complete)

> However, I remain unsatisfied with this idea as an explanation for the
> behavior you're seeing.  In the first place, that race condition window
> ought not be wide enough to allow failure probabilities as high as 10%.
> In the second place, that code has been like that for a long while,
> so this theory absolutely does not explain why you're seeing a
> materially higher probability of failure in HEAD than 9.1.  There is
> something else going on.

After a while trying to bisect the behavior, I decided it was a mug's
game.  Both arms of the race (the firing of archive_command and the
engineered crash) are triggered indirectly be the same event, the
start of a checkpoint.  Small changes in the code can lead to small
changes in the timing which make drastic changes in how likely it is
that the two arms collide exactly at the vulnerability.

So my test harness is an inexplicably effective show-case for the
vulnerability, but it is not the reason the vulnerability should be
fixed.

By the way, my archive_command is very fast, as all it does it echo
the date into a log file.  I want postgres to think it is archive
mode, but for this purpose I don't want to actually deal with having
an archive.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Archiver not exiting upon crash

Jeff Janes  writes:
> On Wed, May 23, 2012 at 2:21 PM, Tom Lane  wrote:
>> However, I remain unsatisfied with this idea as an explanation for the
>> behavior you're seeing.  In the first place, that race condition window
>> ought not be wide enough to allow failure probabilities as high as 10%.
>> In the second place, that code has been like that for a long while,
>> so this theory absolutely does not explain why you're seeing a
>> materially higher probability of failure in HEAD than 9.1.  There is
>> something else going on.

> After a while trying to bisect the behavior, I decided it was a mug's
> game.  Both arms of the race (the firing of archive_command and the
> engineered crash) are triggered indirectly be the same event, the
> start of a checkpoint.  Small changes in the code can lead to small
> changes in the timing which make drastic changes in how likely it is
> that the two arms collide exactly at the vulnerability.

Ah.  OK, that sounds more plausible than "it just happened".

> So my test harness is an inexplicably effective show-case for the
> vulnerability, but it is not the reason the vulnerability should be
> fixed.

Agreed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure  wrote:
>> As you can see, raw performance isn't much worse with the larger data
>> sets, but scalability at high connection counts is severely degraded
>> once the working set no longer fits in shared_buffers.
>
> Hm, wouldn't the BufFreelistLock issue be ameliorated if
> StrategyGetBuffer could reserve multiple buffers so that you'd draw
> down your local list and only then go back to the global pool? (easier
> said than done obviously).

hm, looking at the code some more, it looks like the whole point of
the strategy system is to do that.  ISTM bulk insert type queries
would be good candidates for a buffer strategy somehow?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile


On Thu, 24 May 2012, Robert Haas wrote:


As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.


Actually the problem persits even when I trim the dataset size to be within
the shared_buffers.

Here is the dump (0.5 gig in size, tested with shared_buffers=10G,
work_mem=500Mb):
http://www.ast.cam.ac.uk/~koposov/files/dump.gz
And I attach the script

For my toy dataset the performance of a single thread goes down 
from ~6.4 to 18 seconds (~ 3 times worse),


And actually while running the script repeatedly on my main machine, for 
some reason I saw  some variation in terms of how much threaded execution 
is slower than a single thread.


Now I see 25 seconds for multi threaded run vs the same ~ 6 second for a 
single thread.


The oprofile shows
 782355   21.5269  s_lock
  782355   100.000  s_lock [self]
---
709801   19.5305  PinBuffer
  709801   100.000  PinBuffer [self]
---
3264578.9826  LWLockAcquire
  326457   100.000  LWLockAcquire [self]
---
3094378.5143  UnpinBuffer
  309437   100.000  UnpinBuffer [self]
---
2529726.9606  ReadBuffer_common
  252972   100.000  ReadBuffer_common [self]
---
2015585.5460  LockBuffer
  201558   100.000  LockBuffer [self]


It is interesting that On another machine with much smaller shared memory 
(3G), smaller RAM (12G),  smaller number of cpus  and PG 9.1 running I was 
getting consistently ~ 7.2 vs 4.5 sec (for multi vs single thread)


PS Just in case the CPU on the main machine I'm testing is Xeon(R) CPU E7- 
4807 (the total number of real cores is 24)






*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

script.sh
Description: Bourne shell script
drop table _tmpXX ;
\timing
create table _tmpXX as select * from 
  ( select *, 
  (select healpixid from idt_match as m where m.transitid=o.transitid) 
as x from idt_photoobservation_small as o offset 0
  ) as y where x%16=XX order by x;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

Greg Sabino Mullane  writes:
> Yesterday I had a client that experienced a sudden high load on 
> one of their servers (8.3.5 - yes, I know. Those of you with 
> clients will understand). When I checked, almost all connections 
> were in a "startup" state, very similar to this thread:

> http://postgresql.1045698.n5.nabble.com/9-1-3-backends-getting-stuck-in-startup-td5670712.html

> Running a strace showed a lot of semop activity, and the logs showed a
> successful connection, then a 5 minute plus wait before a query was issued.
> So obviously, blocking on something.

Did you check I/O activity?  I looked again at Jeff Frost's report and
now think that what he saw was probably a lot of seqscans on bloated
system catalogs, cf
http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

> Unlike the thread above, I *did* find
> problems in the system catalogs. For example, both pg_class and pg_index
> gave warnings like this for every index during a VACUUM FULL
> VERBOSE tablename:

> WARNING: index "pg_class_relname_nsp_index" contains 7712 row versions,
>   but table contains 9471 row versions
> HINT:  Rebuild the index with REINDEX.

That's fairly interesting, but if it was a bloat situation then it
would've been the VAC FULL that fixed it rather than the REINDEX.
Did you happen to save the VERBOSE output?  It'd be really useful to
know whether there was any major shrinkage of the core catalogs
(esp. pg_class, pg_attribute).

> * Did anything in the 8.3 series fix this?

I think there are probably two independent issues here.  The missing
index entries are clearly bad but it's not clear that they had anything
to do with the startup stall.  There are a couple of fixes in recent
8.3.x releases that might possibly explain the index corruption,
especially if you're in the habit of reindexing the system catalogs
frequently.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Missing optimization when filters are applied after window functions

On Wed, May 16, 2012 at 09:25:13AM -0400, Tom Lane wrote:
> Volker Grabsch  writes:
> > I propose the following general optimization: If all window
> > functions are partitioned by the same first field (here: id),
> > then any filter on that field should be executed before
> > WindowAgg.
> 
> I'm not sure if that rule is correct in detail, but in any case the
> short answer is that window aggregates are a new feature in Postgres
> and we basically haven't done any optimization work on them yet.
> Feel free to work in that area if it interests you...

Is this a TODO?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

On Thu, May 24, 2012 at 2:52 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, May 23, 2012 at 2:28 PM, Fujii Masao  wrote:
>>> Is there plan to implement such external functions before 9.2 release?
>>> If not, keepalive protocol seems to be almost useless because there is
>>> no use of it for a user and the increase in the number of packets might
>>> increase the replication performance overhead slightly. No?
>
>> Good point.  IMHO, this shouldn't really have been committed like
>> this, but since it was, we had better fix it, either by reverting the
>> change or forcing an initdb to expose the functionality.
>
> I see no reason to rip the code out if we have plans to make use of it
> in the near future.  I am also not for going back into development mode
> on 9.2, which is what adding new functions now would amount to.  What's
> wrong with leaving well enough alone?  It's not like there is no
> unfinished work anywhere else in Postgres ...

So, extra TCP overhead for no user-visible benefit doesn't bother you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Jeff Janes

On Thu, May 24, 2012 at 12:46 PM, Merlin Moncure  wrote:
> On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure  wrote:
>>> As you can see, raw performance isn't much worse with the larger data
>>> sets, but scalability at high connection counts is severely degraded
>>> once the working set no longer fits in shared_buffers.
>>
>> Hm, wouldn't the BufFreelistLock issue be ameliorated if
>> StrategyGetBuffer could reserve multiple buffers so that you'd draw
>> down your local list and only then go back to the global pool? (easier
>> said than done obviously).
>
> hm, looking at the code some more, it looks like the whole point of
> the strategy system is to do that.

I thought you were suggesting that the StrategyGetBuffer would
pre-allocate multiple buffers to a backend under the cover of a single
BufFreelistLock.  If that is what you were suggesting, that is not
what the strategy system is currently for.  It is for locally reusing
buffers, not for gang-allocating them.

If a backend could somehow predict that the buffer it is about to read
in is likely going to be a cold buffer, perhaps it would make sense
for each backend to maintain an small ring of its own which it can
reuse for such cold buffers.

> ISTM bulk insert type queries
> would be good candidates for a buffer strategy somehow?

Probably.  There is a code or README comment to that effect that I
stumbled upon just ra couple hours ago, but can't immediately re-find
it.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 3:46 PM, Merlin Moncure  wrote:
> On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure  wrote:
>>> As you can see, raw performance isn't much worse with the larger data
>>> sets, but scalability at high connection counts is severely degraded
>>> once the working set no longer fits in shared_buffers.
>>
>> Hm, wouldn't the BufFreelistLock issue be ameliorated if
>> StrategyGetBuffer could reserve multiple buffers so that you'd draw
>> down your local list and only then go back to the global pool? (easier
>> said than done obviously).
>
> hm, looking at the code some more, it looks like the whole point of
> the strategy system is to do that.  ISTM bulk insert type queries
> would be good candidates for a buffer strategy somehow?

Yep.

commit 85e2cedf985bfecaf43a18ca17433070f439fb0e
Author: Tom Lane 
Date:   Thu Nov 6 20:51:15 2008 +

Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks).  Also, make it work in a sma
ring of buffers to avoid having bulk inserts trash the whole buffer arena.

Robert Haas, after an idea of Simon Riggs'.

But that doesn't help here, since pgbench is only looking up one row per query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 3:35 PM, Robert Haas  wrote:
> On Thu, May 24, 2012 at 3:46 PM, Merlin Moncure  wrote:
>> hm, looking at the code some more, it looks like the whole point of
>> the strategy system is to do that.  ISTM bulk insert type queries
>> would be good candidates for a buffer strategy somehow?
>
> Yep.
>
> commit 85e2cedf985bfecaf43a18ca17433070f439fb0e
> Author: Tom Lane 
> Date:   Thu Nov 6 20:51:15 2008 +
>
>    Improve bulk-insert performance by keeping the current target buffer pinned
>    (but not locked, as that would risk deadlocks).  Also, make it work in a 
> sma
>    ring of buffers to avoid having bulk inserts trash the whole buffer arena.
>
>    Robert Haas, after an idea of Simon Riggs'.
>
> But that doesn't help here, since pgbench is only looking up one row per 
> query.

Wait -- OP's gripe this isn't regarding standard pgbench, but multiple
large concurrent 'insert into foo select...'.  I looked in the code
and it appears that the only bulk insert strategy using operations are
copy, create table as select, and table rewrite operations. Concurrent
INSERT SELECT apparently doesn't get the benefit of a strategy and
should be fighting over the freelist once the pool exhausts.

We don't get to skip wal of course, but we should be able to use a
bulk insert strategy, especially if there was some way of predicting
that a large number of tuples were going to be inserted.  I'm
wondering though of contention on the free list is in fact the OP's
problem.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane

On Thu, May 24, 2012 at 03:54:54PM -0400, Tom Lane wrote:
> Did you check I/O activity?  I looked again at Jeff Frost's report and
> now think that what he saw was probably a lot of seqscans on bloated
> system catalogs, cf
> http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

Thank you for the response. Yes, we did look at I/O, but nothing unusual 
was seen that would explain the load. If it happens again I'll see if the 
system catalogs are getting a lot of seqscans.

> That's fairly interesting, but if it was a bloat situation then it
> would've been the VAC FULL that fixed it rather than the REINDEX.
> Did you happen to save the VERBOSE output?  It'd be really useful to
> know whether there was any major shrinkage of the core catalogs
> (esp. pg_class, pg_attribute).

I did have them in screen, but the home screen box just craashed a 
few hours ago (after weeks of uptime: Murphy's law). It certainly could 
have been VAC FULL as we have processes that do both VAC FULL and REINDEX 
periodically; I simply assumed based on the HINT it was the REINDEX that 
cleared it up. The only snippet I have is:

# vacuum full verbose pg_class;
INFO:  vacuuming "pg_catalog.pg_class"
INFO:  "pg_class": found 43 removable, 10376 nonremovable row versions in 518 
pages
DETAIL:  6078 dead row versions cannot be removed yet.
Nonremovable row versions range from 160 to 628 bytes long.
There were 7367 unused item pointers.
Total free space (including removable row versions) is 157000 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_class_oid_index" now contains 7712 row versions in 32 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
WARNING:  index "pg_class_oid_index" contains 7712 row versions, but table 
contains 9471 row versions
HINT:  Rebuild the index with REINDEX.
INFO:  index "pg_class_relname_nsp_index" now contains 7712 row versions in 113 
pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
WARNING:  index "pg_class_relname_nsp_index" contains 7712 row versions, but 
table contains 9471 row versions
HINT:  Rebuild the index with REINDEX.
VACUUM
Time: 65.635 ms

Actually, since we have full logging, those warnings should still be 
there... (checks) ... yep. Hints elided, the last one seen was:

WARNING: index "pg_class_oid_index" contains 14877 row versions,
  but table contains 66648 row versions
WARNING: index "pg_class_relname_nsp_index" contains 14877 row versions,
  but table contains 66648 row versions
WARNING: index "pg_class_oid_index" contains 60943 row versions,
  but table contains 112714 row versions
WARNING:  index "pg_class_relname_nsp_index" contains 60943 row versions,
  but table contains 112714 row versions

Looks like there are some more going back a ways: I'll gather together 
and send those offlist after this.

> I think there are probably two independent issues here.  The missing
> index entries are clearly bad but it's not clear that they had anything
> to do with the startup stall.  There are a couple of fixes in recent
> 8.3.x releases that might possibly explain the index corruption,
> especially if you're in the habit of reindexing the system catalogs
> frequently.

Yes, we are in that habit. I hope they are not independent: I'd rather 
have one problem to worry about than two. :) I'll certainly report if I 
see either problem pop up again.

Oh, almost forgot: reading your reply to the old thread reminded me of 
something I saw in one of the straces right as it "woke up" and left 
the startup state to do some work. Here's a summary:

12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
(x a gazillion)
...
12:18:40 brk(0x1c0af000)= 0x1c0af000
...(some more semops)...
12:18:40 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0x2ac062c98000
...(handful of semops)...
12:18:40 unlink("base/1554846571/pg_internal.init.11803") = -1 ENOENT (No such 
file or directory)
12:18:40 open("base/1554846571/pg_internal.init.11803", 
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
12:18:40 fstat(13, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
12:18:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) = 0x2ac062cd9000
12:18:40 write(13, ...
...(normalish looking strace output after this)...

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8

pgpgvtyJ9p6Fs.pgp
Description: PGP signature

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane

> I think there are probably two independent issues here.  The missing
> index entries are clearly bad but it's not clear that they had anything
> to do with the startup stall.

On further log digging, I think you are correct, as those index 
warnings go back many days before the startup problems appeared.
Let me know if you'd like any of those warnings from the logs.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8


pgp9P2W8oU7Y8.pgp
Description: PGP signature

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure  wrote:
> Wait -- OP's gripe this isn't regarding standard pgbench, but multiple
> large concurrent 'insert into foo select...'.  I looked in the code
> and it appears that the only bulk insert strategy using operations are
> copy, create table as select, and table rewrite operations. Concurrent
> INSERT SELECT apparently doesn't get the benefit of a strategy and
> should be fighting over the freelist once the pool exhausts.

I think you are right.

> We don't get to skip wal of course, but we should be able to use a
> bulk insert strategy, especially if there was some way of predicting
> that a large number of tuples were going to be inserted.  I'm
> wondering though of contention on the free list is in fact the OP's
> problem.

Not sure.  It might be some other LWLock, but it's hard to tell which
one from the information provided.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Draft release notes complete

On 21 May 2012 19:10, Josh Berkus  wrote:
>
>> For these reasons, it may be timely and appropriate, from a purely
>> advocacy point-of-view, to call our new group commit "group commit" in
>> release notes and documentation, and announce it as a new feature.
>
> First, shouldn't we be having this discussion on -advocacy?

Well, no, because this is a specific discussion about release notes.
In any case, I've given up on the idea that we should market new group
commit as "group commit". I believe that that would be a useful and
fair way of representing the feature, but there doesn't seem to be any
support for that view.

In passing, I noticed this:

"""
E.1.3.12.2. pg_stat_statements

Improve pg_stat_statements to aggregate similar queries (Peter
Geoghegan, Tom Lane)

Improve pg_stat_statements' handling of PREPARE/EXECUTE statements (Tom Lane)

Add dirtied and written block counts to pg_stat_statements (Robert Haas)
"""

I think that the second entry should be listed as a bug fix, or a
compatibility note, rather than an actual feature. At the very least,
it should be listed after "Add dirtied and written block counts". I
also think that we should separately list as a feature
pg_stat_statements new ability to track I/O timings at the query
granularity.

Are we any closer to a list of major features?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] release note item

On Mon, May 21, 2012 at 12:19:27PM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > Regarding the item:
> >   * Properly handle empty arrays returned by PL/Perl functions (Andrew
> > Dunstan) DETAILS?
> 
> > This was a bug fix, not a feature, and in any case is due to Alex 
> > Hunsaker, not me.
> 
> I assume this is in reference to these commits:

Thanks, removed from 9.2 release notes.

---


> 
> Author: Andrew Dunstan 
> Branch: REL9_1_STABLE Release: REL9_1_0 [22a55b3af] 2011-08-17 12:03:26 -0400
> 
> Properly handle empty arrays returned from plperl functions.
> 
> Bug reported by David Wheeler, fix by Alex Hunsaker.
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be ignored, and an empty message aborts the commit.
> # On branch master
> # Changes to be committed:
> #   (use "git reset HEAD ..." to unstage)
> #
> # modified:   src/pl/plperl/plperl.c
> #
> # Untracked files:
> #   (use "git add ..." to include in what will be committed)
> #
> # autom4te.cache/
> # configure.in~
> # doc/src/sgml/ref/grant.sgml~
> # src/backend/port/win32_latch.c~
> # src/bin/psql/command.c~
> # src/include/pg_config.h.win32~
> # src/pl/plpython/plpython.c~
> # src/tools/msvc/pgbison.bat~
> # src/tools/msvc/pgbison.pl.bak
> # src/tools/msvc/pgflex.bat~
> # src/tools/msvc/pgflex.pl.bak
> # src/tools/pgindent/README~
> # src/tools/pgindent/pgindent.pl
> # src/tools/pgindent/pgindent.pl~
> # x
> # yy
> 
> Author: Andrew Dunstan 
> Branch: master [68c903a66] 2011-08-17 11:59:18 -0400
> 
> Properly handle empty arrays returned from plperl functions.
> 
> Bug reported by David Wheeler, fix by Alex Hunsaker.
> 
> git_changelog failed to merge them because of the different commit
> messages, which is probably why Bruce thought it was a HEAD-only
> commit.  AFAICS we should just remove this item, since it was (or
> should've been) documented already in 9.1 release notes.
> 
>   regards, tom lane

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Draft release notes complete

On Thu, May 24, 2012 at 10:34:22PM +0100, Peter Geoghegan wrote:
> In passing, I noticed this:
> 
> """
> E.1.3.12.2. pg_stat_statements
> 
> Improve pg_stat_statements to aggregate similar queries (Peter
> Geoghegan, Tom Lane)
> 
> Improve pg_stat_statements' handling of PREPARE/EXECUTE statements (Tom Lane)
> 
> Add dirtied and written block counts to pg_stat_statements (Robert Haas)
> """
> 
> I think that the second entry should be listed as a bug fix, or a
> compatibility note, rather than an actual feature. At the very least,
> it should be listed after "Add dirtied and written block counts". I
> also think that we should separately list as a feature

OK, item moved down.  We have not have "bug fix" designation.  You have
a suggestion?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] heap metapages

On Tue, May 22, 2012 at 09:52:30AM +0100, Simon Riggs wrote:
> Having pg_upgrade touch data files is both dangerous and difficult to
> back out in case of mistake, so I am wary of putting the metapage at
> block 0. Doing it the way I suggest means the .meta files would be
> wholly new and can be deleted as a back-out. We can also clean away
> any unnecessary .vm/.fsm files as a later step.

Pg_upgrade never modifies the old cluster, except to lock it in link
mode, so there is never anything to back out.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Draft release notes complete

On 24 May 2012 22:57, Bruce Momjian wrote:
> OK, item moved down. We have not have "bug fix" designation. You have
> a suggestion?

I assumed you were going to put it beside the other compatibility note
relating to pg_stat_statements, "Change pg_stat_statements' total_time
column to be measured in milliseconds (Tom Lane)".

The "Improve pg_stat_statements' handling of PREPARE/EXECUTE
statements" is just a way of preventing SQL PREPARE and EXECUTE
utility statements from being double counted in various ways as both
utility statements and optimisable statements. No one actually noticed
this before, and it wouldn't have been feasible to fix in back
branches, I think. Here are the relevant comments:

* If it's an EXECUTE statement, we don't track it and don't increment
* the nesting level. This allows the cycles to be charged to the
* underlying PREPARE instead (by the Executor hooks), which is much more
* useful.
*
* We also don't track execution of PREPARE. If we did, we would get one
* hash table entry for the PREPARE (with hash calculated from the query
* string), and then a different one with the same query string (but hash
* calculated from the query tree) would be used to accumulate costs of
* ensuing EXECUTEs. This would be confusing, and inconsistent with other
* cases where planning time is not included at all.

Also, as I've said, this I/O timings thing certainly deserves to be
separately listed as a new pg_stat_statements feature:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5b4f346611431361339253203d486789e4babb02

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

Robert Haas  writes:
> On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure  wrote:
>> We don't get to skip wal of course, but we should be able to use a
>> bulk insert strategy, especially if there was some way of predicting
>> that a large number of tuples were going to be inserted.  I'm
>> wondering though of contention on the free list is in fact the OP's
>> problem.

> Not sure.  It might be some other LWLock, but it's hard to tell which
> one from the information provided.

Yeah.  It seems quite plausible that Robert's select-only benchmark might
be mainly tripping over the freelist lock, but I'm less convinced about
something that's doing INSERT/SELECT, and therefore is also doing a lot
of WAL activity, index insertions, etc.  I'd want to see some
instrumentation results before assuming we know where the bottleneck is
there.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Per-Database Roles