Re: [HACKERS] Changing the concept of a DATABASE

2012-05-24 Thread Susanne Ebrecht

Am 22.05.2012 15:27, schrieb Albe Laurenz:
If you need different applications to routinely access each other's 
tables, why not assign them to different schemas in one database?


I just saw another use case here.

There are lots of offices / departments creating maps: topography maps,
pipeline maps, nature conservancy maps (e.g. where are the nests of endangered
birds?), mineral resources, wire maps, street maps, bicycle / jogging maps,
tourist maps, tree maps, the cadastral land register, and so on.

All these departments have their own databases for their own maps.
They only map their own stuff.

Towns / states / regions have a department where all these maps get collected.


You can go to your town and ask for weird maps today - e.g. a map with all
jogging routes and waste water pipes but without autobahns.

You could say that you have one database per layer.

As I said, I have seen this construction in the real world. I am pretty sure
that other states may have other solutions, but the described solution exists.

Susanne

--
Dipl. Inf. Susanne Ebrecht - 2ndQuadrant
PostgreSQL Development, 24x7 Support, Training and Services
www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Kohei KaiGai
2012/5/23 Robert Haas robertmh...@gmail.com:
 On Wed, May 23, 2012 at 3:45 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 I wanted to open a discussion on how to handle this problem.

 Unlike the leaky-view problem, we don't need to worry about unexpected
 qualifier distribution on either side of a join, because a scan on a table
 never contains any join. Thus, all we need to care about is the order of
 qualifiers chained on a particular scan node, which are reordered by
 the cost of invoking their functions.

 How about an idea to track FuncExprs that come from the security policy
 and force their cost to 0? Regular functions never reach zero cost, so the
 security-policy qualifiers would always be reordered to the head.

 Hmm.  That would disregard the relative costs of multiple qualifiers
 all of which were injected by the security policy, which I suspect is
 not a good idea.
  Furthermore, I think that we should not assume that
 there is no join involved.  I would expect a fairly popular RLS qual
 to be something of the form WHERE NOT EXISTS (SELECT 1 FROM hide_me
 WHERE hide_me.pk = thistab.pk).

Please ignore the approach of tracking the cost value of qualifiers.
I believe it does not work well without some fundamental updates.

 Perhaps when we see that RLS
 applies, we should replace the reference to the original table with a
 subquery RTE that has the security_barrier flag set - essentially
 treating a table with RLS as if it were a security view.

I have come to think this is a better approach than tracking the origin of each
qualifier. One remaining problem is how to handle UPDATE and DELETE statements.

It may be possible to rewrite the update / delete query as follows:

From:
    UPDATE tbl SET X = X + 1 WHERE f_leak(Y);
To:
    UPDATE tbl SET X = X + 1 WHERE ctid = (
        SELECT * FROM (
            SELECT ctid FROM tbl WHERE uname = getpgusername()  -- (*) should have security-barrier
        ) AS tbl_subqry WHERE f_leak(Y)
    );

The expanded sub-query will have the security-barrier flag, so it enforces
that uname = getpgusername() is checked earlier than f_leak(Y).
We may need to measure the performance impact of this rewrite.

 Also, suppose that Bob applies an RLS policy to a table, and, later,
 Alice selects from the table.  How do we keep Bob from usurping
 Alice's privileges?  If we insist that Bob's RLS policy function runs
 as Bob, then it defeats inlining; but if it runs as Alice, then Bob
 can steal Alice's credentials.  One idea is to apply the security
 policy only if Alice's access to the table is granted by Bob.  That
 way, if Alice is (for example) the superuser, she's immune to RLS.
 But that doesn't seem to completely solve the problem, because Alice
 might merely be some other relatively unprivileged user and we still
 don't want Bob to be able to walk off with her access.

I think this situation is similar to the case where we reference a view
without privileges on the underlying tables. If Bob sets up a view with
some tricky function, it allows Bob to capture the credentials
of users who reference the view.
More generally, it can be a problem whenever a user invokes
a user-defined function declared by someone else.
(Thus, the sepgsql policy does not allow users to invoke a function
declared by another user in a different domain, without the DBA's checks.)

I think it is a good idea not to apply RLS when the current user has
superuser privileges, from the perspective of security-model consistency,
but it would be inconsistent to check privileges on the underlying tables.

 Another idea is to set things up so that the RLS policy function isn't
 applied to each row directly; instead, it's invoked once per query and
 *returns* a WHERE clause.  This would be a lot more powerful than the
 proposed design, because now the table owner can write a function that
 imposes quals on some people but not others, which seems very useful.

Sorry, I don't favor this idea. Even if the table owner sets up a function to
generate additional qualifiers, there is still no guarantee that those
qualifiers are invoked before the user-given ones.
It seems to me this approach would have the same problem...

 Also, if the point here is to provide security for tables not views,
 it seems like you really need to have (at least a design for) RLS
 security on insert/update/delete operations.  Just adding the same
 filter condition might be adequate for deletes, but I don't think it
 works at all for inserts.  And for updates, what prevents the user from
 updating columns he shouldn't, or updating them to key values he
 shouldn't be able to use?

 If we also apply the security policy to newer versions of tuples on
 update and insert, one idea is to inject a before-row-(update|insert)
 trigger to check whether they satisfy the security policy.
 For the same reason, the trigger should be executed at the end of
 the trigger chain.

 It's not clear to me that there is any need for built-in server
 functionality here.  If the table owner wants to enforce some sort of
 policy regarding INSERT or UPDATE or DELETE, they can already do that
 today just by attaching a 

Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Florian Pflug
On May 23, 2012, at 22:12, Robert Haas wrote:
 Also, suppose that Bob applies an RLS policy to a table, and, later,
 Alice selects from the table.  How do we keep Bob from usurping
 Alice's privileges?

That problem isn't restricted to RLS, though. Bob could just as well
have booby-trapped any other SQL object, e.g. he could have added
bobs_malicious_function() to a view Alice selects from, or attached
it as a trigger to a table Alice inserts into.

It would be nice if there was (optional) protection against these
kinds of attacks, but it's not really something that RLS should be
burdened with.

BTW, I've wondered in the past how tight our protection is against someone
trying to take over the postgres role during pg_dump. On the surface,
it seems that we're safe because pg_dump doesn't invoke user-defined
functions except output functions (which require superuser privileges
to modify). But that's not exactly a solid line of defense...

 If we also apply the security policy to newer versions of tuples on
 update and insert, one idea is to inject a before-row-(update|insert)
 trigger to check whether they satisfy the security policy.
 For the same reason, the trigger should be executed at the end of
 the trigger chain.
 
 It's not clear to me that there is any need for built-in server
 functionality here.  If the table owner wants to enforce some sort of
 policy regarding INSERT or UPDATE or DELETE, they can already do that
 today just by attaching a trigger to the table.  And they can enforce
 whatever policy they like that way.  Before designing any new
 mechanism, what's wrong with the existing one?

Yeah, applying the security policy to the new row (for UPDATES
and INSERTS) seems weird - the policy determines what you can see,
not what you can store, which might be two different things.

But the security policy should still apply to the old rows, i.e.
you shouldn't be able to UPDATE or DELETE rows you cannot see, no?

best regards,
Florian Pflug



Re: [HACKERS] Changing the concept of a DATABASE

2012-05-24 Thread Florian Pflug
On May 24, 2012, at 11:39, Susanne Ebrecht wrote:
 There are lots of offices / departments creating maps: topography maps,
 pipeline maps, nature conservancy maps (e.g. where are the nests of endangered
 birds?), mineral resources, wire maps, street maps, bicycle / jogging maps,
 tourist maps, tree maps, the cadastral land register, and so on.
 
 All these departments have their own databases for their own maps.
 They only map their own stuff.
 
 Towns / states / regions have a department where all these maps get collected.

The question is, how do they get collected? If they use some home-grown
replication, they might just as well collect them into schemas instead of
databases. The same is possible with Slony, I think. And if they use WAL-based
replication, they have no choice but to collect them in different clusters,
so cross-database queries within a cluster wouldn't help.

I think that you're right that reporting would be one of the main use-cases
for cross-database queries. But reporting is also, I think, one of the main
use-cases for WAL-based replication. So having cross-database queries that
don't allow queries across multiple replicas will leave quite a few people
out in the cold.

best regards,
Florian Pflug




[HACKERS] pg_stat_statements temporary file

2012-05-24 Thread Magnus Hagander
If pg_stat_statements is set to store its data across restarts, it
stores it in global/pg_stat_statements.stat. This causes some
interesting things to happen in combination with a base backup -
namely, if you take a base backup *after* you have restarted the
master, the slave will get a snapshot of whatever was in the
temporary file at the time of the restart. This is quite unpredictable
- particularly in relation to a slave, where it gets a snapshot from
the last restart, not from the base backup, after which it diverges.
AFAICT, it also has the property that if the server crashes, it will
reload the latest snapshot - not reset to zero or anything like that.

Finally, if the server were to crash *while* the file is being
written, we would be left with a corrupt file (I haven't tested this
part, but it's rather obvious from the code). I'm pretty sure this could
lead to a situation where the database wouldn't restart.

Fixing the last part is easy - we need to write the file to a
temporary file and then rename() it into place, like we do with the
stats collector's file.
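The idiom is the usual atomic-replace pattern; a minimal standalone sketch, assuming invented names (save_stats_file() and the ".tmp" suffix are illustrative choices, not the actual pg_stat_statements code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the write-to-temp-then-rename() idiom described above.
 * Function and path names are invented for illustration. */
static int
save_stats_file(const char *path, const char *data, size_t len)
{
    char    tmppath[1024];
    FILE   *fp;

    snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);

    fp = fopen(tmppath, "wb");
    if (fp == NULL)
        return -1;

    if (fwrite(data, 1, len, fp) != len || fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /* rename() atomically replaces the target on POSIX filesystems, so a
     * crash leaves either the old complete file or the new complete file,
     * never a half-written one. */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

A crash before the rename() leaves only a stale .tmp file behind, which the next startup can simply ignore or unlink.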

Fixing the first one, I can think of a few things:

1) unlink() the file after we've read it.
2) forcibly exclude the file from base backups taken with
pg_basebackup. We'd still have the problem when it comes to backups
taken manually.
3) avoid loading the file on a standby (that wouldn't fix the similar
problem on the master of course)

And perhaps some other solution I haven't thought of?

If we want to go with option 2, we have another problem - the file is in
the global directory, but its name is determined by what's in a contrib
module. That means we'd have to teach core PostgreSQL about a contrib
module's filename, which pretty much breaks the abstraction layer.
Perhaps a better choice here would be to create another directory
under the data directory that is always excluded from base backups, and
store it there? That would also work for third-party modules that core
can never learn about...

In general, should a contrib module really store data in the global/
directory? Seems pretty ugly to me...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Kohei KaiGai
2012/5/24 Florian Pflug f...@phlo.org:
 If we also apply the security policy to newer version of tuples on
 update and insert, one idea is to inject a before-row-(update|insert)
 trigger to check whether it satisfies the security policy.
 For same reason, the trigger should be executed at the end of
 trigger chain.

 It's not clear to me that there is any need for built-in server
 functionality here.  If the table owner wants to enforce some sort of
 policy regarding INSERT or UPDATE or DELETE, they can already do that
 today just by attaching a trigger to the table.  And they can enforce
 whatever policy they like that way.  Before designing any new
 mechanism, what's wrong with the existing one?

 Yeah, applying the security policy to the new row (for UPDATES
 and INSERTS) seems weird - the policy determines what you can see,
 not what you can store, which might be two different things.

 But the security policy should still apply to the old rows, i.e.
 you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

The cases of INSERT / DELETE are simple; all we need to do is apply the
checks to the new or old tuples, respectively.

In the case of UPDATE, we need to check the old tuple for whether the user
can see it, and the new tuple for whether the user can store it.
Indeed, these are different checks; however, it seems like a black hole
if the new tuple may be written with no reader privileges on it.
I expect most use cases will choose the same policy for the reader check
and the writer check of an UPDATE statement.
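The two-check rule described above can be reduced to a toy sketch (Tuple, read_policy() and write_policy() are invented for illustration; real RLS checks are of course arbitrary qualifiers, not owner comparisons):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the UPDATE-time checks discussed above: the old tuple
 * must pass the read policy, the new tuple the write policy. */
typedef struct { int owner_id; } Tuple;

static bool read_policy(const Tuple *t, int user_id)  { return t->owner_id == user_id; }
static bool write_policy(const Tuple *t, int user_id) { return t->owner_id == user_id; }

static bool
update_allowed(const Tuple *old_tup, const Tuple *new_tup, int user_id)
{
    /* INSERT would check only write_policy(new), DELETE only
     * read_policy(old); UPDATE needs both. */
    return read_policy(old_tup, user_id) && write_policy(new_tup, user_id);
}
```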

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



[HACKERS] pg_stat_statments queryid

2012-05-24 Thread Magnus Hagander
Was there any actual reason why we didn't end up exposing queryid in
the pg_stat_statements view?

It would be highly useful when tracking changes over time. Right now I
see people doing md5(query) to do that, which is a lot more ugly (and
obviously uses more space and is slow, too).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] LISTEN/NOTIFY Security and the docs

2012-05-24 Thread Magnus Hagander
On Fri, May 18, 2012 at 5:08 PM, Chander Ganesan chan...@otg-nc.com wrote:
 Hi All,

 I just realized that anyone can listen for notifications (using listen) so
 long as they know the channel name.  This means that a user could receive
 and view the payload for another user.

 Perhaps it would be good to note this in the documentation (i.e., there
 should be no expectation of privacy/security when using listen/notify, so
 any user that can connect to a database could issue and receive
 notifications for any channel.)

Might be worth a note, yes. The lack of any access control there really
should tell you that it's a broadcast, but it wouldn't hurt to have an
explicit note.

Want to prepare a patch?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Florian Pflug
On May 24, 2012, at 12:43, Kohei KaiGai wrote:
 The cases of INSERT / DELETE are simple; all we need to do is apply the
 checks to the new or old tuples, respectively.
 
 In the case of UPDATE, we need to check the old tuple for whether the user
 can see it, and the new tuple for whether the user can store it.
 Indeed, these are different checks; however, it seems like a black hole
 if the new tuple may be written with no reader privileges on it.
 I expect most use cases will choose the same policy for the reader check
 and the writer check of an UPDATE statement.

I don't think preventing black holes is sensible here - it might,
in fact, be *just* what the user wants.

Imagine a messaging system. A reasonable RLS policy would be to allow
a user to see messages addressed to her. Yet you wouldn't want to prevent
her from creating messages to other people - because what good is a
messaging system that only allows you to send messages to yourself? What
you probably *would* want to do, though, is to check that she put herself
in as the sender when she creates a message. And you'd probably want to
forbid updates entirely. So you'd have

  - A RLS policy that checks current_user = ANY(recipients)
  - An ON INSERT trigger which checks current_user = sender
  - An ON UPDATE trigger which errors out

If the RLS policy applied to INSERTed rows also, how would you do that?

Another example, although in the realm of filesystem permissions, is Mac OS X.
Per default, every user has a Drop Box folder, which anybody can write to, yet
only the owner can read. This allows you to easily transfer files from one
user to another without allowing a third party to read it.

best regards,
Florian Pflug




Re: [HACKERS] pg_stat_statements temporary file

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 11:43, Magnus Hagander mag...@hagander.net wrote:
 In general, should a contrib module really store data in the global/
 directory? Seems pretty ugly to me...

I think the case could be made for moving pg_stat_statements into
core, as an optionally enabled view, like pg_stat_user_functions,
since pg_stat_statements is now rather a lot more useful than it used
to be. That would solve that problem, as well as putting
pg_stat_statements into the hands of the largest possible number of
people, which would be a positive development, in my humble and fairly
predictable opinion.

However, pg_stat_statements will not prevent the database from
starting if the file is corrupt. It makes some basic attempts to
detect that within pgss_shmem_startup(), and will simply log the
problem and unlink the file in the event of detecting corruption.
Otherwise, I suppose you might get garbage values in
pg_stat_statements, which, while rather annoying and possibly
unacceptable, is hardly the end of the world.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pg_stat_statements temporary file

2012-05-24 Thread Magnus Hagander
On Thu, May 24, 2012 at 1:36 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 24 May 2012 11:43, Magnus Hagander mag...@hagander.net wrote:
 In general, should a contrib module really store data in the global/
 directory? Seems pretty ugly to me...

 I think the case could be made for moving pg_stat_statements into
 core, as an optionally enabled view, like pg_stat_user_functions,
 since pg_stat_statements is now rather a lot more useful than it used
 to be. That would solve that problem, as well as putting
 pg_stat_statements into the hands of the largest possible number of
 people, which would be a positive development, in my humble and fairly
 predictable opinion.

Well, it would solve the problem for this specific case - but there
will always be yet another extension. Actually, it would only solve
the *ugliness*, and not the actual problem.

(That's not to say that I don't agree that moving it into core would
be a good idea, but that's not happening for 9.2 - and the problem
exists in 9.1 as well.)


 However, pg_stat_statements will not prevent the database from
 starting if the file is corrupt. It makes some basic attempts to
 detect that within pgss_shmem_startup(), and will simply log the
 problem and unlink the file in the event of detecting corruption.
 Otherwise, I suppose you might get garbage values in
 pg_stat_statements, which, while rather annoying and possibly
 unacceptable, is hardly the end of the world.

Ok. I was worried it might crash on loading the data when it was
corrupt - say a size field that ended up specifying gigabytes that it
then tries to allocate, or something like that.

What actually happens if it tries to repalloc() something huge? palloc
will throw an elog(ERROR), and since this happens during postmaster
startup, are you sure it won't prevent the server from starting?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_receivexlog stops upon server restart

2012-05-24 Thread Magnus Hagander
On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown t...@linux.com wrote:
 On 10 April 2012 21:07, Magnus Hagander mag...@hagander.net wrote:
 On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.


 This is intentional, in the sense that that's how the code was written;
 it's not a malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to the
 open items list.

 I think it would also be useful to add a paragraph to the
 documentation stating use-cases for this feature, and its advantages.

Attached is a patch that implements this. Seems reasonable?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


pg_receivexlog_loop.patch
Description: Binary data



[HACKERS] shared_preload_libraries path

2012-05-24 Thread Magnus Hagander
Would it make sense to have a postgresql.conf parameter that would add
to LD_LIBRARY_PATH when loading libraries from
shared_preload_libraries (and other library loads)? That would make it
possible to configure it without having to mess around with the
operating system configuration. Or is that too much reimplementing OS
functionality?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_stat_statements temporary file

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 12:42, Magnus Hagander mag...@hagander.net wrote:
 What actually happens if it tries to repalloc() something huge? palloc
 will throw an elog(ERROR), and since this happens during postmaster
 startup, are you sure it won't prevent the server from starting?

Oh, yes, missed that.

/* Previous incarnation might have had a larger query_size */
if (temp.query_len >= buffer_size)
{
    buffer = (char *) repalloc(buffer, temp.query_len + 1);
    buffer_size = temp.query_len + 1;
}

Here, temp receives its value from an fread().

This could probably be coded to be defensive against such things, but
a better fix would be preferred. I have to wonder how much of a
problem corruption is likely to be though, given that we only save to
disk in a corresponding pgss_shmem_shutdown() call, which actually has
more protections against corruption. The window for the saved file to
be corrupt seems rather small, though I accept that a better window
would be zero.
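One defensive shape, reduced to a standalone sketch (read_query_text() and the 1 MB cap are invented for illustration; the real code reads pgss entries, not bare query strings):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: treat any length read from the on-disk file as
 * untrusted, and reject it before allocating based on it. */
#define MAX_SANE_QUERY_LEN (1024 * 1024)    /* assumed cap for the sketch */

static char *
read_query_text(FILE *fp, long query_len)
{
    char   *buf;

    /* A corrupt file can yield a negative or absurd length; bail out
     * instead of handing it to the allocator. */
    if (query_len < 0 || query_len > MAX_SANE_QUERY_LEN)
        return NULL;

    buf = malloc((size_t) query_len + 1);
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, (size_t) query_len, fp) != (size_t) query_len)
    {
        free(buf);
        return NULL;
    }
    buf[query_len] = '\0';
    return buf;
}
```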

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pg_stat_statements temporary file

2012-05-24 Thread Magnus Hagander
On Thu, May 24, 2012 at 2:16 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 24 May 2012 12:42, Magnus Hagander mag...@hagander.net wrote:
 What actually happens if it tries to repalloc() something huge? palloc
 will throw an elog(ERROR), and since this happens during postmaster
 startup, are you sure it won't prevent the server from starting?

 Oh, yes, missed that.

                 /* Previous incarnation might have had a larger query_size */
                 if (temp.query_len >= buffer_size)
                 {
                         buffer = (char *) repalloc(buffer, temp.query_len + 1);
                         buffer_size = temp.query_len + 1;
                 }

 Here, temp receives its value from an fread().

 This could probably be coded to be defensive against such things, but
 a better fix would be preferred. I have to wonder how much of a
 problem corruption is likely to be though, given that we only save to
 disk in a corresponding pgss_shmem_shutdown() call, which actually has
 more protections against corruption. The window for the saved file to
 be corrupt seems rather small, though I accept that a better window
 would be zero.

Right. But writing to a temp file and rename()ing it into place is trivial.

It's really the other issues raised that are bigger ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 13:09, Magnus Hagander mag...@hagander.net wrote:
 Would it make sense to have a postgresql.conf parameter that would add
 to LD_LIBRARY_PATH when loading libraries from
 shared_preload_libraries (and other library loads)? That would make it
 possible to configure it without having to mess around with the
 operating system configuration. Or is that too much reimplementing OS
 functionality?

-1, I'm afraid.

I found this blog post to be insightful:

https://blogs.oracle.com/rie/entry/tt_ld_library_path_tt

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 13:22, Peter Geoghegan pe...@2ndquadrant.com wrote:
 I found this blog post to be insightful:

 https://blogs.oracle.com/rie/entry/tt_ld_library_path_tt

This one might be more useful, and itself refers to the
aforementioned, earlier post:

https://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pg_receivexlog stops upon server restart

2012-05-24 Thread Thom Brown
On 24 May 2012 13:05, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown t...@linux.com wrote:
 On 10 April 2012 21:07, Magnus Hagander mag...@hagander.net wrote:
 On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.


 This is intentional, in the sense that that's how the code was written;
 it's not a malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to the
 open items list.

 I think it would also be useful to add a paragraph to the
 documentation stating use-cases for this feature, and its advantages.

 Attached is a patch that implements this. Seems reasonable?

s/non fatal/non-fatal/

Yes, this solves the problem for me, except you forgot to translate
noloop in long_options[] . :)

-- 
Thom



Re: [HACKERS] pg_receivexlog stops upon server restart

2012-05-24 Thread Magnus Hagander
On Thu, May 24, 2012 at 2:34 PM, Thom Brown t...@linux.com wrote:
 On 24 May 2012 13:05, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown t...@linux.com wrote:
 On 10 April 2012 21:07, Magnus Hagander mag...@hagander.net wrote:
 On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.


 This is intentional, in the sense that that's how the code was written;
 it's not a malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to
 the open items list.

 I think it would also be useful to add a paragraph to the
 documentation stating use-cases for this feature, and its advantages.

 Attached is a patch that implements this. Seems reasonable?

 s/non fatal/non-fatal/

 Yes, this solves the problem for me, except you forgot to translate
 noloop in long_options[] . :)
Fixed :-)

Did you test it, or just assume it worked? ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_receivexlog stops upon server restart

2012-05-24 Thread Thom Brown
On 24 May 2012 13:37, Magnus Hagander mag...@hagander.net wrote:
 On Thu, May 24, 2012 at 2:34 PM, Thom Brown t...@linux.com wrote:
 On 24 May 2012 13:05, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Apr 19, 2012 at 1:00 PM, Thom Brown t...@linux.com wrote:
 On 10 April 2012 21:07, Magnus Hagander mag...@hagander.net wrote:
 On Friday, April 6, 2012, Thom Brown wrote:

 Hi,

 I've tried out pg_receivexlog and have noticed that when restarting
 the cluster, pg_receivexlog gets cut off... it doesn't keep waiting.
 This is surprising as the DBA would have to remember to start
 pg_receivexlog up again.


 This is intentional, in the sense that that's how the code was written;
 it's not a malfunctioning piece of code somewhere.

 It would probably make sense to have an auto-reconnect feature, and to have
 an option to turn it on/off.

 If you haven't already (my wifi here is currently quite useless, which is
 why I'm working on my email backlog, so I can't check), please add it to the
 open items list.

 I think it would also be useful to add a paragraph to the
 documentation stating use-cases for this feature, and its advantages.

 Attached is a patch that implements this. Seems reasonable?

 s/non fatal/non-fatal/

 Yes, this solves the problem for me, except you forgot to translate
 noloop in long_options[] . :)
 Fixed :-)

 Did you test it, or just assumed it worked? ;)

How very dare you. Of course I tested it.  It successfully reconnects
on multiple restarts, checks intermittently when I've stopped the
server, showing the connection error message, successfully continues
when I eventually bring the server back up, and doesn't attempt a
reconnect when using -n.

So looks good to me.

-- 
Thom



Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Sandro Santilli
On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
 On 16.05.2012 15:42, Sandro Santilli wrote:
 But CHECK_FOR_INTERRUPTS doesn't return, right ?
 Is there another macro for just checking w/out yet acting upon it ?
 
 Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
 variable, but on Windows it also checks for
 UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
 not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
 think InterruptPending can be set spuriously (even if that's not
 possible today, I wouldn't rely on it), and if you're in a
 HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
 even if InterruptPending is true.
 
 The only sane way to make 3rd party code interruptible is to add
 CHECK_FOR_INTERRUPTS() to it, in safe places.

No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
How could caller code cleanup on interruption ?

--strk; 



[HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Sergey Koposov

Hi,

I've been running some tests on pg 9.2beta1 and in particular a set
of queries like

create table _tmp0 as select * from (
select *, (select healpixid from idt_match as m where
m.transitid=o.transitid)
as x from idt_photoobservation as o offset 0
  ) as y where x%16=ZZZ order by x;

(where ZZZ is some integer number, 0 <= ZZZ < 16)

With the following plan:
--
 Sort  (cost=3228814504.96..3228815137.21 rows=252902 width=498)
   Sort Key: y.x
   ->  Subquery Scan on y  (cost=0.00..3228791809.25 rows=252902 width=498)
         Filter: ((y.x % 16::bigint) = 0)
         ->  Limit  (cost=0.00..3228033102.41 rows=50580456 width=490)
               ->  Seq Scan on idt_photoobservation o  (cost=0.00..3228033102.41 rows=50580456 width=490)
                     SubPlan 1
                       ->  Index Scan using idt_match_transitid_idx on idt_match m  (cost=0.00..63.74 rows=1 width=8)
                             Index Cond: (transitid = o.transitid)

The schemas of the tables are:

e2es2= \d idt_match
 Table public.idt_match
  Column   |   Type   | Modifiers
---+--+---
 sourceid  | bigint   |
 transitid | bigint   |
 healpixid | bigint   |
 flagsxm   | smallint |
Indexes:
idt_match_idx btree (healpixid)
idt_match_transitid_idx btree (transitid)

Table public.idt_photoobservation
  Column   |   Type   | Modifiers
---+--+---
 transitid | bigint   |
 fluxbp0   | real |

more columns

Indexes:
idt_photoobservation_idx btree (transitid)

And I noticed that when I run the query like the one shown above in parallel
(in multiple connections for ZZZ=0...8) the performance of each query 
drops down significantly (factor of 2) (despite the fact that during the 
execution of the query postgres is mostly CPU bound).


So I tried to oprofile it and strace it, and that's what I saw:

The strace -c of each process  shows something like that
#
Process 18660 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 76.25    0.001342           0    268987           semop
 23.75    0.000418           0     61694           read
  0.00    0.00               0       138           lseek
  0.00    0.00               0       355           select
  0.00    0.00               0         3           kill
------ ----------- ----------- --------- --------- ----------------
100.00    0.001760                331177           total
###

And the oprofile shows this on top:
---
2863981  25.7117  ReleasePredicateLocks
  2863981  100.000  ReleasePredicateLocks [self]
---
1246629  11.1917  LocalBufferAlloc
  1246629  100.000  LocalBufferAlloc [self]
---
1135393  10.1931  CheckForSerializableConflictIn
  1135393  100.000  CheckForSerializableConflictIn [self]


So there is a lot of locking for some reason, and I was surprised to see 
anything related to SSI in the profile at all, because I'm not running 
serializable transactions (I was just running my queries from multiple 
psql sessions).


Is there a reasonable explanation for what I'm seeing?

Thanks in advance,
Sergey

More info: there is no other concurrent activity on the machine.
shared_buffers is 10G. The machine has ~60G of RAM, 24 cores and proper 
RAID


Another note is that the oprofile and strace were obtained during the 
stage when the idt_photo.. table was scanned and data was being stored in 
the pgsql_tmp (not during the last bit when the data in pgsql_tmp were 
actually being sorted).


*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK



Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Heikki Linnakangas

On 24.05.2012 16:04, Sandro Santilli wrote:

On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:

On 16.05.2012 15:42, Sandro Santilli wrote:

But CHECK_FOR_INTERRUPTS doesn't return, right ?
Is there another macro for just checking w/out yet acting upon it ?


Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
variable, but on Windows it also checks for
UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
think InterruptPending can be set spuriously (even if that's not
possible today, I wouldn't rely on it), and if you're in a
HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
even if InterruptPending is true.

The only sane way to make 3rd party code interruptible is to add
CHECK_FOR_INTERRUPTS() to it, in safe places.


No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
How could caller code cleanup on interruption ?


It would only be safe to call it in places where no cleanup is 
necessary. I don't know if there are any such places in the geos library.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Joachim Wieland
On Tue, May 22, 2012 at 9:50 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm.  I think that if you do it this way, the minimum recovery point
 won't be respected, which could leave you with a corrupted database.
 Now, if all the WAL files that you need are present in pg_xlog anyway,
 then they ought to get replayed anyway, but I think that if you are
 using restore_command (as opposed to streaming replication) we restore
 WAL segments under a different file name, which might cause this
 problem.

Uhm, maybe I should add some more details, so you get a better idea of what I
did: The idea was to promote the standby to be the new master. There
was streaming replication active but at some time I had to take the
master down. IIRC from the log I saw that after the master went down,
the standby continued recovering from a bunch of archived log files
(via recovery_command), I had suspected that either the standby was
lagging behind a bit or that the master archived them during shutdown.
When the standby didn't have anything else left to recover from
(saying both "xlog file foo doesn't exist" and "cannot connect to
master"), I deleted recovery.conf on the standby and restarted it.

I wouldn't have assumed any corruption was possible given that I did
clean shutdowns on both sides...



Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Sandro Santilli
On Thu, May 24, 2012 at 04:55:34PM +0300, Heikki Linnakangas wrote:
 On 24.05.2012 16:04, Sandro Santilli wrote:
 On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
 On 16.05.2012 15:42, Sandro Santilli wrote:
 But CHECK_FOR_INTERRUPTS doesn't return, right ?
 Is there another macro for just checking w/out yet acting upon it ?
 
 Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
 variable, but on Windows it also checks for
 UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
 not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
 think InterruptPending can be set spuriously (even if that's not
 possible today, I wouldn't rely on it), and if you're in a
 HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
 even if InterruptPending is true.
 
 The only sane way to make 3rd party code interruptible is to add
 CHECK_FOR_INTERRUPTS() to it, in safe places.
 
 No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
 How could caller code cleanup on interruption ?
 
 It would only be safe to call it in places where no cleanup is
 necessary. I don't know if there are any such places in the geos
 library.

Sure, right before starting and after finishing.
That is nowhere useful for a premature interruption...

The whole point of the work I'm doing these days is letting the
users interrupt GEOS calls which often take a long time and a lot
of memory resources.

The current plan is to provide custom allocators to have automatic
garbage collection on interruption. Turned out not to be an easy task
to bend GEOS into using palloc/pfree, but there's progress in that
direction.
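
As a very rough sketch of what the PostgreSQL side of such shims might look like (the shim names are invented for illustration; GEOS would need hooks to accept them, which is exactly the work described above):

```c
/*
 * Hypothetical palloc-backed allocator shims.  If the third-party
 * library routes all of its allocations through these, then an
 * ereport(ERROR) that longjmps out of the library still lets
 * PostgreSQL reclaim the memory when the current memory context is
 * reset - i.e. automatic garbage collection on interruption.
 */
#include "postgres.h"

static void *
pg_alloc_shim(size_t size)
{
    return palloc(size);
}

static void
pg_free_shim(void *ptr)
{
    if (ptr != NULL)
        pfree(ptr);
}
```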

--strk;



[HACKERS] [9.2] crash on regex

2012-05-24 Thread Marko Kreen
Following query crashes backend on 9.2:

  select substring('asd TO foo' from ' TO (([a-z0-9._]+|([^]+|)+)+)');

It is supposed to load potentially quoted table name from
CREATE RULE definition.  Works fine on 8.3 .. 9.1

Backtrace:
Program terminated with signal 11, Segmentation fault.
#0  ExecMakeFunctionResult (fcache=0x, econtext=0x953f3e8,
isNull=0xbffb8d1f \b, isDone=0x0) at execQual.c:1833
1833            *isNull = fcinfo->isnull;
(gdb) bt
#0  ExecMakeFunctionResult (fcache=0x, econtext=0x953f3e8,
isNull=0xbffb8d1f \b, isDone=0x0) at execQual.c:1833
#1  0x08203104 in ExecEvalExprSwitchContext (expression=0x953ef60,
econtext=0x953f3e8, isNull=0xbffb8d1f \b, isDone=0x0) at
execQual.c:4187
#2  0x082844d7 in evaluate_expr (expr=<value optimized out>,
result_type=25, result_typmod=<value optimized out>,
result_collation=100) at clauses.c:4408
...

-- 
marko



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Kohei KaiGai
2012/5/24 Florian Pflug f...@phlo.org:
 On May24, 2012, at 12:43 , Kohei KaiGai wrote:
 The case of INSERT / DELETE are simple; All we need to apply is
 checks on either new or old tuples.

 In case of UPDATE, we need to check the old tuple for whether the user can
 see it, and the new tuple for whether the user can store it.
 Indeed, these are different checks; however, it seems like a black hole
 if the user is allowed to write the new tuple but has no privilege to read it.
 I expect most use cases will choose the same policy for the reader and
 writer checks of an UPDATE statement.

 I don't think preventing black holes is sensible here - it might,
 in fact, be *just* what the user wants.

 Imagine a messaging system. A reasonable RLS policy would be to allow
 a user to see messages addressed to her. Yet you wouldn't want to prevent
 her from creating messages to other people - cause what good is a messaging
 system that only allows you to send messages to yourself? What you
 probably *would* want to do, though, is to check that she did put herself in
 as the sender when she creates a message. And you'd probably wanna forbid
 updates entirely. So you'd have

  - A RLS policy that checks current_user = ANY(recipients)
  - An ON INSERT trigger which checks current_user = sender
  - An ON UPDATE trigger which errors out

 If RLS policy applies to INSERTed rows also, how would you do that?

 Another example, although in the realm of filesystem permissions, is Mac OS X.
 Per default, every user has a Drop Box folder, which anybody can write to, 
 yet
 only the owner can read. This allows you to easily transfer files from one
 user to another without allowing a third party to read it.
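
For concreteness, the trigger half of the messaging-system setup described above could be sketched like this (table, column, and trigger names are invented; the read-side policy is emulated with a view, since the RLS syntax under discussion does not exist yet):

```sql
-- Hypothetical schema for the messaging example.
CREATE TABLE messages (
    sender     text   NOT NULL,
    recipients text[] NOT NULL,
    body       text
);

-- ON INSERT trigger: the current user must record herself as the sender.
CREATE FUNCTION messages_check_sender() RETURNS trigger AS $$
BEGIN
    IF NEW.sender <> current_user THEN
        RAISE EXCEPTION 'sender must be %', current_user;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_sender BEFORE INSERT ON messages
    FOR EACH ROW EXECUTE PROCEDURE messages_check_sender();

-- ON UPDATE trigger: error out entirely.
CREATE FUNCTION messages_no_update() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'messages may not be updated';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER no_update BEFORE UPDATE ON messages
    FOR EACH ROW EXECUTE PROCEDURE messages_no_update();

-- The read-side policy (current_user = ANY(recipients)) would come from
-- the proposed RLS feature; until then it can be emulated with a view:
CREATE VIEW my_messages AS
    SELECT * FROM messages WHERE current_user = ANY (recipients);
```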

Indeed, you are right. We have no particular reason to enforce the same rules
at both the reader and writer stages of an UPDATE statement.

So, the proposed interface might be revised as follows:
  ALTER TABLE tblname ADD SECURITY POLICY
  func_name(args, ...) [FOR SELECT |
 INSERT |
 [BEFORE|AFTER] UPDATE |
 DELETE];

In case of INSERT or AFTER UPDATE, I assume the check shall be applied
at the tail of the before-row trigger chain.

(*) I don't check whether it conflicts syntax or not yet.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 11:50, Magnus Hagander mag...@hagander.net wrote:
 Was there any actual reason why we didn't end up exposing queryid in
 the pg_stat_statements view?

 It would be highly useful when tracking changes over time. Right now I
 see people doing md5(query) to do that, which is a lot more ugly (and
 obviously uses more space and is slow, too).

Right. I continue to maintain that this is a good idea. I raised the
issue more than once. However, my proposal was not accepted by Tom and
Robert, apparently on the basis that queryId's actual value was
partially dictated by things like the endianness of the architecture
used, and the value of OIDs when serialising and subsequently hashing
the post-analysis tree.
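
The endianness dependence is easy to see in miniature. This toy snippet (nothing to do with PostgreSQL's actual hashing code) hashes the same logical value serialized in the two byte orders:

```python
import hashlib
import struct

# Toy illustration only: hash the raw bytes of one logical value,
# serialized little-endian and big-endian.  The digests differ, which
# is the essence of why a hash computed over in-memory node trees is
# not portable across architectures.
def toy_hash(buf: bytes) -> str:
    return hashlib.md5(buf).hexdigest()

little = struct.pack('<I', 42)   # b'*\x00\x00\x00'
big = struct.pack('>I', 42)      # b'\x00\x00\x00*'

print(toy_hash(little) != toy_hash(big))  # True: byte order changes the hash
```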

What I'd like to be able to do is aggregate this information over time
and/or across standbys in a cluster, as queries are evicted and
subsequently re-entered into pg_stat_statement's shared hash table.
Now, there are situations where this isn't going to work, like when a
third-party logical replication system is used. That's unfortunate,
but I wouldn't expect it makes the information any less useful to the
large majority of people. I'd also credit our users with being
discerning enough to realise that they should not jump to the
conclusion that the value will be stable according to any particular
standard.

Arguments against including an internal value in the view could
equally well be applied to any of the internal statistic collector
views, many of which have oid columns, despite the fact that various
name columns already unambiguously identify tuples in most cases. I
see no reason for the inconsistency, particularly given that the
pg_stat_statements.query column *is* still somewhat ambiguous, as
described in the docs, and given that the query hash value is referred to
in the docs anyway.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes:
 On 24 May 2012 13:09, Magnus Hagander mag...@hagander.net wrote:
 Would i make sense to have a postgresql.conf parameter that would add
 to LD_LIBRARY_PATH when loading libraries from
 shared_preload_libraries (and other library loads). To make it
 possible to configure it without having to mess around with the
 operating system configuration? Or is that too much reimplementing OS
 functionality?

 -1, I'm afraid.

I think the same.  Aside from the problems Peter points out, there are
too many different ways to spell that variable name on different
platforms.

regards, tom lane



Re: [HACKERS] Interrupting long external library calls

2012-05-24 Thread Florian Pflug
On May24, 2012, at 15:04 , Sandro Santilli wrote:
 On Wed, May 16, 2012 at 07:30:03PM +0300, Heikki Linnakangas wrote:
 On 16.05.2012 15:42, Sandro Santilli wrote:
 But CHECK_FOR_INTERRUPTS doesn't return, right ?
 Is there another macro for just checking w/out yet acting upon it ?
 
 Hmm, no. CHECK_FOR_INTERRUPTS() checks the InterruptPending
 variable, but on Windows it also checks for
 UNBLOCKED_SIGNAL_QUEUE(). And even if InterruptPending is set, it's
 not totally certain that CHECK_FOR_INTERRUPTS() won't return. I
 think InterruptPending can be set spuriously (even if that's not
 possible today, I wouldn't rely on it), and if you're in a
 HOLD/RESUME_INTERRUPTS block, CHECK_FOR_INTERRUPTS() will do nothing
 even if InterruptPending is true.
 
 The only sane way to make 3rd party code interruptible is to add
 CHECK_FOR_INTERRUPTS() to it, in safe places.
 
 No place is safe if CHECK_FOR_INTERRUPTS doesn't return.
 How could caller code cleanup on interruption ?

The postgres way is to use PG_TRY/PG_CATCH to make sure stuff gets cleaned
up if an error or an interrupt occurs. You could use those to make the
third-party library exception-safe, but it'll probably be a quite
invasive change :-(.
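
A minimal sketch of such a wrapper, assuming the wrapped library exposes a cleanup entry point (third_party_begin/run/cleanup are invented names, not a real API):

```c
/*
 * Hypothetical wrapper around a long-running third-party call.
 * PG_TRY/PG_CATCH ensure the library's resources are released even if
 * an ereport(ERROR) - e.g. raised via CHECK_FOR_INTERRUPTS() - longjmps
 * out of the wrapped call.
 */
#include "postgres.h"

static void
call_third_party_interruptibly(void)
{
    void   *state = third_party_begin();

    PG_TRY();
    {
        third_party_run(state);         /* may hit CHECK_FOR_INTERRUPTS() */
    }
    PG_CATCH();
    {
        third_party_cleanup(state);     /* release on error or interrupt */
        PG_RE_THROW();
    }
    PG_END_TRY();

    third_party_cleanup(state);         /* normal-path release */
}
```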

Alternatively, you could replicate the check CHECK_FOR_INTERRUPTS() does,
but then don't actually call ProcessInterrupts() but instead just make
the third-party code abort signalling an error, and call
CHECK_FOR_INTERRUPTS() after the third-party code has returned. On unix-like
systems that'd be as simple as aborting if InterruptPending is set, while
on windows you'd have to do the if(UNBLOCKED_SIGNAL_QUEUE())… stuff first,
since otherwise InterruptPending will never get set. The following function
should do the trick

  bool have_pending_interrupts() {
  #ifdef WIN32
if (UNBLOCKED_SIGNAL_QUEUE())
  pgwin32_dispatch_queued_signals();
  #endif

    return InterruptPending && !InterruptHoldoffCount && !CritSectionCount;
  }

The third-party code would then do

  if (have_pending_interrupts())
return some_error;

and you'd invoke in with

  state = third_party_code();
  CHECK_FOR_INTERRUPTS();
  if (state != success)
ereport(…);

There might be slim chance for false positives with that approach, since
ProcessInterrupts() might not always ereport(), even if InterruptHoldoffCount
and CritSectionCount are zero. But they'll simply get turned into spurious
ereport() by the if(state != success) check after CHECK_FOR_INTERRUPTS, and
should be very rare, and happen only shortly after a query cancel request
was received.

best regards,
Florian Pflug




Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Magnus Hagander
On Thu, May 24, 2012 at 4:26 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 24 May 2012 11:50, Magnus Hagander mag...@hagander.net wrote:
 Was there any actual reason why we didn't end up exposing queryid in
 the pg_stat_statements view?

 It would be highly useful when tracking changes over time. Right now I
 see people doing md5(query) to do that, which is a lot more ugly (and
 obviously uses more space and is slow, too).

 Right. I continue to maintain that this is a good idea. I raised the
 issue more than once. However, my proposal was not accepted by Tom and
 Robert, apparently on the basis that queryId's actual value was
 partially dictated by things like the endianness of the architecture
 used, and the value of OIDs when serialising and subsequently hashing
 the post-analysis tree.

 What I'd like to be able to do is aggregate this information over time
 and/or across standbys in a cluster, as queries are evicted and
 subsequently re-entered into pg_stat_statement's shared hash table.

That's exactly the usecase I'm looking at here, except it's not
actually across standbys in this case.


 Now, there are situations where this isn't going to work, like when a
 third-party logical replication system is used. That's unfortunate,
 but I wouldn't expect it makes the information any less useful to the
 large majority of people. I'd also credit our users with being
 discerning enough to realise that they should not jump to the
 conclusion that the value will be stable according to any particular
 standard.

As long as it's documented as such, I don't see a problem with that at all.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 10:26 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 24 May 2012 11:50, Magnus Hagander mag...@hagander.net wrote:
 Was there any actual reason why we didn't end up exposing queryid in
 the pg_stat_statements view?

 It would be highly useful when tracking changes over time. Right now I
 see people doing md5(query) to do that, which is a lot more ugly (and
 obviously uses more space and is slow, too).

 Right. I continue to maintain that this is a good idea. I raised the
 issue more than once. However, my proposal was not accepted by Tom and
 Robert, apparently on the basis that queryId's actual value was
 partially dictated by things like the endianness of the architecture
 used, and the value of OIDs when serialising and subsequently hashing
 the post-analysis tree.

No, my concern was more that I wasn't clear on what you hoped to do with it.

But I think this explanation is enough to convince me that it might be
worthwhile:

 What I'd like to be able to do is aggregate this information over time
 and/or across standbys in a cluster, as queries are evicted and
 subsequently re-entered into pg_stat_statement's shared hash table.
 Now, there are situations where this isn't going to work, like when a
 third-party logical replication system is used. That's unfortunate,
 but I wouldn't expect it makes the information any less useful to the
 large majority of people. I'd also credit our users with being
 discerning enough to realise that they should not jump to the
 conclusion that the value will be stable according to any particular
 standard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Florian Pflug
On May24, 2012, at 16:19 , Kohei KaiGai wrote:
 So, the proposed interface might be revised as follows:
  ALTER TABLE tblname ADD SECURITY POLICY
  func_name(args, ...) [FOR SELECT |
 INSERT |
 [BEFORE|AFTER] UPDATE 
 |
 DELETE];
 
 In case of INSERT or AFTER UPDATE, I assume the check shall be applied
 on the tail of before-row triggers.

I'd go with just SELECT, UPDATE, DELETE, and leave the INSERT and BEFORE
UPDATE case to regular triggers, for two reasons

First, it's conceptually much simpler, since the policy always just adds
an implicit WHERE clause, period. This of course assumes that DELETE and
(BEFORE) UPDATE simply skip rows for which the policy function returns false,
instead of reporting 'permission denied' or something. But that's the most
reasonable behaviour anyway, I think, because otherwise you'd make batch
UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
of tripping over some invisible row and getting an error.
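
For illustration (hypothetical table and policy; plain SQL, since the policy syntax is still being designed), the implicit-WHERE-clause semantics would mean:

```sql
-- Suppose table messages carries the security policy
--   current_user = ANY(recipients)
-- Then this batch statement:
DELETE FROM messages WHERE created < now() - interval '30 days';

-- would behave as if the user had written:
DELETE FROM messages
 WHERE created < now() - interval '30 days'
   AND current_user = ANY (recipients);
-- i.e. rows invisible to the user are silently skipped, never an error.
```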

And second, it avoids mimicking functionality that is already provided
by an existing feature, namely triggers.

People will have to deal with the trigger ordering issue, but that's nothing
new, and I bet most people have a system in place for that. I usually prefix
my trigger names with 'a_' to 'z_', for example, to make the ordering explicit.

Another issue, BTW, is FOREIGN KEY constraints. Should you be allowed to
create references to rows which are invisible to you, or should FOREIGN KEY
constraints be exempt from security policies? I'd say they shouldn't be, i.e.
the policy WHERE clause should be added to constraint checking queries like
usual. But maybe I'm missing some reason why that'd be undesirable…

best regards,
Florian Pflug




Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 24, 2012 at 10:26 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 But I think this explanation is enough to convince me that it might be
 worthwhile:

 What I'd like to be able to do is aggregate this information over time
 and/or across standbys in a cluster, as queries are evicted and
 subsequently re-entered into pg_stat_statement's shared hash table.

It appears to me that the above ...

 ... I'd also credit our users with being
 discerning enough to realise that they should not jump to the
 conclusion that the value will be stable according to any particular
 standard.

... is in direct contradiction to this.  The proposed usage absolutely
requires that the hash be stable over time and/or across standbys.
I do not want to promise that it's stable over any timeframe longer than
a server reboot.  Aside from the OID dependence problem, we might well
change the way the hash is calculated in minor releases, for example by
adding or removing struct fields.

regards, tom lane



Re: [HACKERS] [9.2] crash on regex

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 10:16 AM, Marko Kreen mark...@gmail.com wrote:
 Following query crashes backend on 9.2:

  select substring('asd TO foo' from ' TO (([a-z0-9._]+|([^]+|)+)+)');

I spent some time trying to reduce this to the simplest case that
still causes a crash, and came up with this:

select substring('a' from '((a))+');

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland j...@mcknight.de wrote:
 On Tue, May 22, 2012 at 9:50 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm.  I think that if you do it this way, the minimum recovery point
 won't be respected, which could leave you with a corrupted database.
 Now, if all the WAL files that you need are present in pg_xlog anyway,
 then they ought to get replayed anyway, but I think that if you are
 using restore_command (as opposed to streaming replication) we restore
 WAL segments under a different file name, which might cause this
 problem.

 Uhm, maybe I add some more details, so you get a better idea of what I
 did: The idea was to promote the standby to be the new master. There
 was streaming replication active but at some time I had to take the
 master down. IIRC from the log I saw that after the master went down,
 the standby continued recovering from a bunch of archived log files
 (via recovery_command), I had suspected that either the standby was
 lagging behind a bit or that the master archived them during shutdown.
 When the standby didn't have anything else left to recover from
 (saying both "xlog file foo doesn't exist" and "cannot connect to
 master"), I deleted recovery.conf on the standby and restarted it.

 I wouldn't have assumed any corruption was possible given that I did
 clean shutdowns on both sides...

The thing that's worrying me is that there's not really any such thing
as a clean shutdown on a standby.  When you shut down the master, it
checkpoints.  When you shut down the standby, it can't checkpoint, so
I think it's still going to enter recovery at startup.  It'd be
interesting to know where that recovery began and ended as compared
with the minimum recovery point just before the shutdown.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Merlin Moncure
On Thu, May 24, 2012 at 8:24 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I've been running some tests on pg 9.2beta1 and in particular a set
 of queries like

 create table _tmp0 as select * from (
        select *, (select healpixid from idt_match as m where
                                        m.transitid=o.transitid)
                as x from idt_photoobservation as o offset 0
          ) as y where x%16=ZZZ order by x;

 (where ZZZ is some integer number, 0 <= ZZZ < 16)

 With the following plan:
 --
  Sort  (cost=3228814504.96..3228815137.21 rows=252902 width=498)
   Sort Key: y.x
   ->  Subquery Scan on y  (cost=0.00..3228791809.25 rows=252902 width=498)
         Filter: ((y.x % 16::bigint) = 0)
         ->  Limit  (cost=0.00..3228033102.41 rows=50580456 width=490)
               ->  Seq Scan on idt_photoobservation o
  (cost=0.00..3228033102.41 rows=50580456 width=490)
                     SubPlan 1
                       ->  Index Scan using idt_match_transitid_idx on
  idt_match m  (cost=0.00..63.74 rows=1 width=8)
                             Index Cond: (transitid = o.transitid)

 The schema of the tables are:

 e2es2= \d idt_match
     Table public.idt_match
  Column   |   Type   | Modifiers
 ---+--+---
  sourceid  | bigint   |
  transitid | bigint   |
  healpixid | bigint   |
  flagsxm   | smallint |
 Indexes:
    idt_match_idx btree (healpixid)
    idt_match_transitid_idx btree (transitid)

 Table public.idt_photoobservation
  Column   |   Type   | Modifiers
 ---+--+---
  transitid | bigint   |
  fluxbp0   | real     |
 
 more columns
 
 Indexes:
    idt_photoobservation_idx btree (transitid)

 And I noticed that when I run the query like the one shown above in parallel
 (in multiple connections for ZZZ=0...8) the performance of each query drops
 down significantly (factor of 2)(despite the fact that during the execution
 of the query postgres is mostly CPU bound).

 So I tried to oprofile it and strace it, and that's what I saw:

 The strace -c of each process  shows something like that
 #
 Process 18660 detached
 % time     seconds  usecs/call     calls    errors syscall
 -- --- --- - - 
  76.25    0.001342           0    268987           semop
  23.75    0.000418           0     61694           read
  0.00    0.00           0       138           lseek
  0.00    0.00           0       355           select
  0.00    0.00           0         3           kill
 -- --- --- - - 
 100.00    0.001760                331177           total
 ###

 And the oprofile shows this on top:
 ---
 2863981  25.7117  ReleasePredicateLocks
  2863981  100.000  ReleasePredicateLocks [self]
 ---
 1246629  11.1917  LocalBufferAlloc
  1246629  100.000  LocalBufferAlloc [self]
 ---
 1135393  10.1931  CheckForSerializableConflictIn
  1135393  100.000  CheckForSerializableConflictIn [self]
 

 So there is a lot of locking for some reason, and I was surprised to see
 anything related to SSI in the profile at all, because I'm not running
 serializable transactions (I was just running my queries from multiple psql
 sessions).

Are you sure?  I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:

/* Nothing to do if this is not a serializable transaction */
if (MySerializableXact == InvalidSerializableXact)
return false;

What's the default isolation mode set to?
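
For anyone following along, a quick way to check is from each session (a sketch; these are standard server settings):

```sql
SHOW default_transaction_isolation;
-- or, from within a transaction:
SELECT current_setting('transaction_isolation');
```

If default_transaction_isolation has been set to serializable in postgresql.conf or via ALTER DATABASE / ALTER ROLE, every plain psql session runs under SSI without any explicit BEGIN ISOLATION LEVEL SERIALIZABLE, which would explain the predicate-lock traffic.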

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 16:06, Tom Lane t...@sss.pgh.pa.us wrote:
 I do not want to promise that it's stable over any timeframe longer than
 a server reboot.

You already have though, since pg_stat_statements persistently stores
statistics to disk by default, and can only ever recognise statement
equivalence based on the (dbid, userid, queryid) hash key.

 Aside from the OID dependence problem, we might well
 change the way the hash is calculated in minor releases, for example by
 adding or removing struct fields.

You've already invalidated the saved statistics if you do that, so all
bets are off anyway. If you have to do it, it'll be necessary to bump
PGSS_FILE_HEADER, so that pg_stat_statements will be cleared upon
restart. That will in turn necessitate documenting the issue in the
minor version release notes. I'd hope to avoid that, but it doesn't
seem to me that the situation is made any worse than before by
exposing the value. On the contrary, it could help users to understand
where the problem may have affected them.

If you don't expose the value, users are going to do this sort of
thing anyway, but will be far worse off due to using the query text or
a hash thereof instead of the internal value.
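
As a sketch of the use case being argued for (hypothetical: it assumes a queryid column exposed by the view and a user-created snapshot table, neither of which exists today):

```sql
-- hypothetical snapshot table populated periodically, assuming an exposed
-- queryid column:
--   INSERT INTO pgss_snapshots
--   SELECT now(), dbid, userid, queryid, calls, total_time
--   FROM pg_stat_statements;

-- aggregate across snapshots (and across standbys) by the stable key,
-- surviving eviction and re-entry of individual statements:
SELECT dbid, userid, queryid,
       max(calls)      AS calls,
       max(total_time) AS total_time
FROM pgss_snapshots
GROUP BY dbid, userid, queryid;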

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Stephen Frost
* Gurjeet Singh (singh.gurj...@gmail.com) wrote:
 Bruce points out the even simpler case is to build several indexes in
 parallel over the same scan.
 
 I thought I had posted a patch to that effect long back, but upon searching
 my emails apparently I forgot about the patch.
 
 Attached is the patch that I developed in Nov. 2010, so expect a lot of bit
 rot. I had tried to make it elegant, but I have to admit it's a hack. This
 patch does not imply any kind of parallelism, which was the context in
 which the above statement was made. It just helps to avoid scanning the
 same relation multiple times. I performed some tests on it and AFAICR, this
 did not produce a net win. But those tests may have been performed in a
 virtual machine and not on bare metal, I forget.

I'm not too surprised that it didn't help all that much since you're
doing everything in one backend.  My guess at what Bruce was talking
about is being able to actually have multiple CREATE INDEX's going in
parallel in different backends.  If they didn't all grab an
AccessExclusive lock, wouldn't they be able to use the same ring buffer
to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
isn't this supported already?

I haven't tried this yet, but it sure seems like something we could
probably already claim to support..
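
A minimal sketch of what that looks like from the client side (table and index names invented; each statement issued from its own psql session/backend):

```sql
-- session 1:
CREATE INDEX CONCURRENTLY idx_a ON big_table (col_a);
-- session 2, at the same time:
CREATE INDEX CONCURRENTLY idx_b ON big_table (col_b);
```

CREATE INDEX CONCURRENTLY takes only a ShareUpdateExclusiveLock rather than an AccessExclusive lock; whether two such builds on the same table actually share a ring buffer for their heap scans is exactly the open question above.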

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [9.2] crash on regex

2012-05-24 Thread Thom Brown
On 24 May 2012 16:08, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 24, 2012 at 10:16 AM, Marko Kreen mark...@gmail.com wrote:
 Following query crashes backend on 9.2:

  select substring('asd TO foo' from ' TO (([a-z0-9._]+|([^]+|)+)+)');

 I spent some time trying to reduce this to the simplest case that
 still causes a crash, and came up with this:

 select substring('a' from '((a))+');

It appears to occur with any quantifier attached to a group that is
more than 1 level deep with nothing between the groups, such as
((a))?, ((a))*, ((a)){1,4}, (((a)))+

Or if the quantifier is the only additional thing between the groups,
such as ((a)+)

But when breaking the groups up, it's fine, so this works: ((a)b)+
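
For comparison, here is the same family of pattern shapes exercised in Python's regex engine (purely illustrative; it demonstrates the pattern class that triggers the crash, not anything about the Postgres regex internals):

```python
import re

# nested capture groups with a quantifier on the outer group --
# the shape reported to crash the 9.2beta1 backend
patterns = ['((a))+', '((a))?', '((a))*', '((a)){1,4}', '(((a)))+', '((a)+)']
for p in patterns:
    m = re.search(p, 'a')
    assert m is not None and m.group(0) == 'a'

# the variant with something between the groups, which did not crash
assert re.search('((a)b)+', 'ab').group(0) == 'ab'
```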

-- 
Thom



Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Gurjeet Singh
On Thu, May 24, 2012 at 11:22 AM, Stephen Frost sfr...@snowman.net wrote:

 * Gurjeet Singh (singh.gurj...@gmail.com) wrote:
  Bruce points out the even simpler case is to build several indexes in
  parallel over the same scan.
 
  I thought I had posted a patch to that effect long back, but upon
 searching
  my emails apparently I forgot about the patch.
 
  Attached is the patch that I developed in Nov. 2010, so expect a lot of
  bit rot. I had tried to make it elegant, but I have to admit it's a hack.
  This patch does not imply any kind of parallelism, which was the context
  in which the above statement was made. It just helps to avoid scanning
  the same relation multiple times. I performed some tests on it and
  AFAICR, this did not produce a net win. But those tests may have been
  performed in a virtual machine and not on bare metal, I forget.

 I'm not too surprised that it didn't help all that much since you're
 doing everything in one backend.  My guess at what Bruce was talking
 about is being able to actually have multiple CREATE INDEX's going in
 parallel in different backends.  If they didn't all grab an
 AccessExclusive lock, wouldn't they be able to use the same ring buffer
 to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
 isn't this supported already?

 I haven't tried this yet, but it sure seems like something we could
 probably already claim to support..


It'd be great if one of the standard utilities like pg_restore supported
this, by spawning each concurrent index build in a separate backend. Just a
thought.


-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] [9.2] crash on regex

2012-05-24 Thread Thom Brown
On 24 May 2012 16:24, Thom Brown t...@linux.com wrote:
 On 24 May 2012 16:08, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 24, 2012 at 10:16 AM, Marko Kreen mark...@gmail.com wrote:
 Following query crashes backend on 9.2:

  select substring('asd TO foo' from ' TO (([a-z0-9._]+|([^]+|)+)+)');

 I spent some time trying to reduce this to the simplest case that
 still causes a crash, and came up with this:

 select substring('a' from '((a))+');

 It appears to occur with any quantifier attached to a group that is
 more than 1 level deep with nothing between the groups, such as
 ((a))?, ((a))*, ((a)){1,4}, (((a)))+

 Or if the quantifier is the only additional thing between the groups,
 such as ((a)+)

 But when breaking the groups up, it's fine, so this works: ((a)b)+

Hmmm... curiously, lazy (non-greedy) quantifiers are stable, such as: ((a))*?

-- 
Thom



Re: [HACKERS] pg_stat_statments queryid

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 16:06, Tom Lane t...@sss.pgh.pa.us wrote:
 What I'd like to be able to do is aggregate this information over time
 and/or across standbys in a cluster, as queries are evicted and
 subsequently re-entered into pg_stat_statement's shared hash table.

 It appears to me that the above ...

 ... I'd also credit our users with being
 discerning enough to realise that they should not jump to the
 conclusion that the value will be stable according to any particular
 standard.

 ... is in direct contradiction to this.

I simply meant that we ought to be able to trust that people will
actually investigate the stability guarantees of a newly exposed
query_id before going and writing a tool that does some sort of
aggregation - they should and will make informed decisions. If that
guarantee is limited to "we might have to change the walker logic
during a minor release, so the value might change for some queries; we
don't promise that it will be a stable identifier, but we will
naturally strive to avoid invalidating existing statistics (within
Postgres and aggregated by external tools)", that
wouldn't put many people off. Still, I don't think it's all that
likely that we'll ever have to adjust the walker logic, since
pg_stat_statements already doesn't strictly promise that collisions
cannot occur, while making them very improbable.

I'm not even asking that possible uses for queryId be documented as
being useful for this sort of thing. I only ask that we expose it and
document it.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:
 It'd be great if one of the standard utilities like pg_restore supported
 this, by spawning each concurrent index build in a separate backend. Just a
 thought.

If parallel restore doesn't already take this into account when doing
job scheduling, that would be a worthwhile improvement to consider.

Personally, I think the big win in this area is likely to be parallel
sort.  There may well be some more we can squeeze out of our existing
sort implementation first, and I'm all in favor of that, but
ultimately if you've got 60GB of data to sort and it's all in cache,
you want to be able to use more than one CPU for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Kohei KaiGai
2012/5/24 Florian Pflug f...@phlo.org:
 On May24, 2012, at 16:19 , Kohei KaiGai wrote:
 So, the proposed interface might be revised as follows:
  ALTER TABLE tblname ADD SECURITY POLICY
      func_name(args, ...)
      [FOR SELECT | INSERT | [BEFORE|AFTER] UPDATE | DELETE];

 In case of INSERT or AFTER UPDATE, I assume the check shall be applied
 on the tail of before-row triggers.

 I'd go with just SELECT, UPDATE, DELETE, and leave the INSERT and BEFORE
 UPDATE case to regular triggers, for two reasons

 First, it's conceptually much simpler, since the policy always just adds
 an implicit WHERE clause, period. This of course assumes that DELETE and
 (BEFORE) UPDATE simply skips rows for which the policy function returns false,
 instead of reporting 'permission denied' or something. But that's the most
 reasonable behaviour anyway, I think, because otherwise you'd make batch
 UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
 of tripping over some invisible row and getting an error.

I definitely agree with starting a new feature from simple implementation.

Although I'm inclined toward the approach of replacing references to tables
with a security policy by sub-queries with the security-barrier flag, instead
of adding qualifiers to the WHERE clause, to avoid the leaky-view scenario,
it will make the implementation much simpler.

 Another issue, BTW, is FOREIGN KEY constraints. Should you be allowed to
 create references to rows which are invisible to you, or should FOREIGN KEY
 constraints be exempt from security policies? I'd say they shouldn't be, i.e.
 the policy WHERE clause should be added to constraint checking queries like
 usual. But maybe I'm missing some reason why that'd be undesirable…

I agree. The row-level security policy should not be applied during FK checks
(or other internal operations; they are harmless). In the previous discussion,
it was pointed out that repeated FK/PK probing enables a malicious user to
infer the existence of an invisible tuple and its key value, even though they
cannot see the actual values. It is a well-documented limitation; thus, users
should not use row-level security (or should not use natural keys) if they
cannot accept this limitation.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] [9.2] crash on regex

2012-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 24, 2012 at 10:16 AM, Marko Kreen mark...@gmail.com wrote:
 Following query crashes backend on 9.2:
 
  select substring('asd TO foo' from ' TO (([a-z0-9._]+|([^]+|)+)+)');

 I spent some time trying to reduce this to the simplest case that
 still causes a crash, and came up with this:

 select substring('a' from '((a))+');

Yeah, I'm looking at it.  Looks like I broke memory management somewhere
in the quantifier revisions --- it seems to be stomping the stack during
cleanup at the end of pg_regexec.  Possibly a multiple-free deal?
Haven't quite found it yet.

regards, tom lane



Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Heikki Linnakangas

On 24.05.2012 18:16, Robert Haas wrote:

On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland j...@mcknight.de  wrote:

I wouldn't have assumed any corruption was possible given that I did
clean shutdowns on both sides...


The thing that's worrying me is that there's not really any such thing
as a clean shutdown on a standby.  When you shut down the master, it
checkpoints.  When you shut down the standby, it can't checkpoint, so
I think it's still going to enter recovery at startup.  It'd be
interesting to know where that recovery began and ended as compared
with the minimum recovery point just before the shutdown.


Perhaps we should introduce the concept of a clean standby shutdown. We 
can't write a checkpoint record, but we could write the same information 
somewhere else. Like in the control file. At startup, we'd see that we 
did a clean shutdown at WAL point X/X, and start up without having to 
read through all the WAL from the last master checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Kohei KaiGai
I'd like to summarize the current design being discussed.

syntax:
  ALTER TABLE tblname WITH ROW LEVEL SECURITY
  ( condition clause ) [FOR (SELECT | UPDATE | DELETE)];
  ALTER TABLE tblname WITHOUT ROW LEVEL SECURITY;

I tried to patch parser/gram.y, but there were syntax conflicts
on the ADD / DROP sub-commands.
Also, I noticed that ROW LEVEL SECURITY can be implemented
without adding a new keyword, unlike SECURITY POLICY.

As we discussed, the approach of implicitly appending additional
qualifiers to the WHERE clause causes a problem, because it does
not address the order in which qualifiers are executed. So, I'm
inclined toward the approach of replacing references to tables
with a security policy by sub-queries with the security-barrier
flag.
For example, if tbl has security policy, this query shall be rewritten
internally, as follows:

original)
  SELECT * FROM tbl WHERE X > 20 AND f_leak(Y);

rewritten)
  SELECT * FROM (
SELECT * FROM tbl WHERE uname = getpgusername()
  ) AS tbl_subqry WHERE X > 20 AND f_leak(Y);

The sub-query shall have the security-barrier flag, so f_leak() is never
pushed down, but X > 20 will be pushed down because of the leakproof
attribute of the function.
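
The behaviour being described can already be observed with a security_barrier view in 9.2, which is what the rewritten sub-query is modelled on (a sketch; tbl, uname, X, Y and f_leak() are the example names from above, and current_user stands in for the internal getpgusername()):

```sql
CREATE VIEW tbl_secured WITH (security_barrier) AS
    SELECT * FROM tbl WHERE uname = current_user;

-- f_leak() cannot be pushed below the barrier; X > 20 can be, provided
-- the > operator's underlying function is marked LEAKPROOF
SELECT * FROM tbl_secured WHERE X > 20 AND f_leak(Y);
```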

It is a bit complex at UPDATE or DELETE statement, but
what I try to do is same.

  original)
    UPDATE tbl SET X = X + 1 WHERE X > 20 AND f_leak(Y);

  rewritten)
    UPDATE tbl SET X = X + 1 WHERE ctid IN (
        SELECT ctid FROM (
            SELECT ctid, * FROM tbl WHERE uname = getpgusername()
        ) AS tbl_subqry WHERE X > 20 AND f_leak(Y)
    );

That guarantees the security policy (uname = getpgusername())
is evaluated prior to user given conditions.

One thing I'm still considering is whether the security policy should
be provided as a function or a clause. A sufficiently simple SQL function
is inlined at simplify_function(), so there is no meaningful difference.
I was afraid of code complexity, but all we need to do is append the
configured clause to the WHERE clause of the inner sub-query.

|  ALTER TABLE tblname WITH ROW LEVEL SECURITY
|  ( condition clause ) [FOR (SELECT | UPDATE | DELETE)];

So, for now, I tried to put a condition clause there instead of a function.


Regarding FK constraints, I don't think the row-level security policy
should be applied to internal queries.
So, I plan to disable it during FK checks.


One other issue we haven't discussed is table inheritance.
In the case where a table TBLP has a child table TBLC and only
TBLC has a security policy, what security policy should be
applied when we run SELECT * FROM TBLP?
My preference is that the security policy is applied only to scans
on TBLC, not TBLP. It would be undesirable for the set of visible
tuples to depend on how a certain table is referenced.
In addition, if and when TBLP and TBLC each have their own policy,
what is the desirable behavior?
I think the security policies of both TBLP and TBLC should be
applied on TBLC; in other words, scans on a child table apply the
security policies of all its parent tables.

Any comments please. Thanks,

2012/5/24 Kohei KaiGai kai...@kaigai.gr.jp:
 2012/5/24 Florian Pflug f...@phlo.org:
 On May24, 2012, at 16:19 , Kohei KaiGai wrote:
 So, the proposed interface might be revised as follows:
   ALTER TABLE tblname ADD SECURITY POLICY
       func_name(args, ...)
       [FOR SELECT | INSERT | [BEFORE|AFTER] UPDATE | DELETE];

 In case of INSERT or AFTER UPDATE, I assume the check shall be applied
 on the tail of before-row triggers.

 I'd go with just SELECT, UPDATE, DELETE, and leave the INSERT and BEFORE
 UPDATE case to regular triggers, for two reasons

 First, it's conceptually much simpler, since the policy always just adds
 an implicit WHERE clause, period. This of course assumes that DELETE and
 (BEFORE) UPDATE simply skips rows for which the policy function returns 
 false,
 instead of reporting 'permission denied' or something. But that's the most
 reasonable behaviour anyway, I think, because otherwise you'd make batch
 UPDATEs and DELETEs pretty much unusable, 'cause there'd always be the risk
  of tripping over some invisible row and getting an error.

 I definitely agree with starting a new feature from simple implementation.

 Although I'm inclined toward the approach of replacing references to tables
 with a security policy by sub-queries with the security-barrier flag,
 instead of adding qualifiers to the WHERE clause, to avoid the leaky-view
 scenario, it will make the implementation much simpler.

 Another issue, BTW, is FOREIGN KEY constraints. Should you be allowed to
 create references to rows which are invisible to you, or should FOREIGN KEY
 constraints be exempt from security policies? I'd say they shouldn't be, i.e.
 the policy WHERE clause should be added to constraint checking queries like
 usual. But maybe I'm missing some reason why that'd be undesirable…

Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Andres Freund
On Thursday, May 24, 2012 06:35:06 PM Heikki Linnakangas wrote:
 On 24.05.2012 18:16, Robert Haas wrote:
  On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland j...@mcknight.de  wrote:
  I wouldn't have assumed any corruption was possible given that I did
  clean shutdowns on both sides...
  
  The thing that's worrying me is that there's not really any such thing
  as a clean shutdown on a standby.  When you shut down the master, it
  checkpoints.  When you shut down the standby, it can't checkpoint, so
  I think it's still going to enter recovery at startup.  It'd be
  interesting to know where that recovery began and ended as compared
  with the minimum recovery point just before the shutdown.
 
 Perhaps we should introduce the concept of a clean standby shutdown. We
 can't write a checkpoint record, but we could write the same information
 somewhere else. Like in the control file. At startup, we'd see that we
 did a clean shutdown at WAL point X/X, and start up without having to
 read through all the WAL from the last master checkpoint.
The control file currently is not a very good match because of the current
requirement of staying below 512 bytes. If we included the list of
running xacts, that wouldn't be enough.
I have wondered before whether there is more to fixing that than doing the
atomic write(); fsync(); rename(); fsync(); dance. I don't see a problem
with the cost of that personally...
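
The dance being referred to, sketched in Python for illustration (POSIX rename semantics assumed; this is the generic pattern, not the actual pg_control code):

```python
import os
import tempfile

def atomic_write(path, data):
    """Durably replace `path` with `data`: write a temp file, fsync it,
    rename it over the target, then fsync the directory so the rename
    itself survives a crash."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, path)          # atomic on POSIX filesystems
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)          # persist the directory entry itself
        finally:
            os.close(dir_fd)
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Readers either see the complete old file or the complete new one, never a torn write, which is the property a clean-standby-shutdown marker would need.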

Andres



Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck

2012-05-24 Thread Fujii Masao
On Wed, May 23, 2012 at 9:25 PM, Magnus Hagander mag...@hagander.net wrote:
 While reviewing and cleaning this patch up a bit I noticed it actually
 broke pg_receivexlog in the renaming.

 Here is a new version of the patch, reworked based on the above so
 we're down to a single callback. I moved the rename last segment file
 even if it's not complete to be a parameter into ReceiveXlogStream()
 instead of trying to overload a third functionality on the callback
 (which is what broke pg_receivexlog).

 How does this look? Have I overlooked any cases?

Thanks for the patch! Looks good to me except the followings:

pg_basebackup.c:233: warning: passing argument 6 of
'ReceiveXlogStream' from incompatible pointer type

I got the above warning on compile. To fix this, the third argument
segment_finished needs to be added to reached_end_position().

It seems confusing that *stream_continue()* returns TRUE when
streaming *cannot continue*, i.e., its name seems to be inconsistent
with what it does. What about renaming it to stream_stop?
Similarly, it also seems confusing that *continue_streaming()*
returns TRUE when streaming *cannot continue*.

Regards,

-- 
Fujii Masao



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 6:11 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Perhaps when we see that RLS
 applies, we should replace the reference to the original table with a
 subquery RTE that has the security_barrier flag set - essentially
 treating a table with RLS as if it were a security view.

 I have come to think it is a better approach than tracking the origin of
 each qualifier. One problem is handling the UPDATE and DELETE cases.

 It may be possible to rewrite the update / delete query as follows:

 From:
  UPDATE tbl SET X = X + 1 WHERE f_leak(Y)
 To:
  UPDATE tbl SET X = X + 1 WHERE ctid IN (
      SELECT * FROM (
          SELECT ctid FROM tbl WHERE uname = getpgusername()  <== (*)
 should have security-barrier
      ) AS tbl_subqry WHERE f_leak(Y)
  );

 Expanded sub-queries will have the security-barrier flag, so it enforces
 that uname = getpgusername() is checked earlier than f_leak(Y).
 We may need to measure the performance impact of this rewriting.

The problem with this is that it introduces an extra instance of tbl
into the query - there are now two rather than one.  UPDATE .. FROM is
supposed to be a way to avoid this, but it's insufficiently general to
handle all the cases (e.g. UPDATE a LEFT JOIN b can't be written using
the existing syntax).  Anyway we want to avoid inserting self-joins
for performance reasons if at all possible.  It should be easy to do
that in the case of SELECT; UPDATE and DELETE may need a bit more
work.

 I think this situation is similar to the case where we reference a view
 without privileges on the underlying tables. If Bob sets up a view with
 some tricky function, it allows Bob to exploit the credentials
 of users who reference the view.
 More or less, it can be a problem whenever a user invokes
 a user-defined function declared by someone else.
 (Thus, sepgsql policy does not allow users to invoke a function
 declared by someone in a different domain without DBA checks.)

This is true, but there are still some new threat models.  For
example, currently, pg_dump isn't going to run any user-defined code
just because you do SELECT * FROM table, but that will change with
this patch.  Note that pg_dump need not actually select from views,
only tables.

 I think it is a good idea not to apply RLS when the current user has
 superuser privileges, from the perspective of security model consistency,
 but it is inconsistent to check privileges on the underlying tables.

Seems like a somewhat random wart, if it's just an exception for
superusers.  I think we need to do better than that.  For example, at
my last company, sales reps A and B were permitted to see all
customers of the company, but sales reps C, D, E, F, G, H, I, and J
were permitted to see only their own accounts.  Those sorts of
policies need to be easy to implement.
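
Using the syntax KaiGai proposed upthread, that policy might be expressed as something like this (table and column names invented for illustration):

```sql
ALTER TABLE customers WITH ROW LEVEL SECURITY (
    current_user IN ('rep_a', 'rep_b')   -- these two see all customers
    OR sales_rep = current_user          -- everyone else sees only their own
);
```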

 Another idea is to set things up so that the RLS policy function isn't
 applied to each row directly; instead, it's invoked once per query and
 *returns* a WHERE clause.  This would be a lot more powerful than the
 proposed design, because now the table owner can write a function that
 imposes quals on some people but not others, which seems very useful.

 Sorry, I don't favor this idea. Even if the table owner sets up a function
 to generate additional qualifiers, there is still no guarantee that those
 qualifiers are invoked before the user-given ones.
 It seems to me this approach will have the same problem...

It's not intended to solve the qual-ordering problem, just to allow
additional policy flexibility.

 It's not clear to me that there is any need for built-in server
 functionality here.  If the table owner wants to enforce some sort of
 policy regarding INSERT or UPDATE or DELETE, they can already do that
 today just by attaching a trigger to the table.  And they can enforce
 whatever policy they like that way.  Before designing any new
 mechanism, what's wrong with the existing one?

 Yes, we don't need any new invention to check the values of new tuples.
 But it should be done after all the user-defined triggers. Existing
 triggers do not have a mechanism to enforce invocation order.
 So, what I would really implement is a mechanism to inject some pseudo
 triggers at the tail of the Trigger array.

Start the trigger names with the letter z.
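
This works because triggers for the same event on the same table fire in alphabetical name order; a sketch (names and functions invented):

```sql
-- fires first:
CREATE TRIGGER aa_user_check BEFORE INSERT OR UPDATE ON tbl
    FOR EACH ROW EXECUTE PROCEDURE user_check_func();

-- fires last, so it sees the row value left by all user triggers:
CREATE TRIGGER zzz_policy_check BEFORE INSERT OR UPDATE ON tbl
    FOR EACH ROW EXECUTE PROCEDURE policy_check_func();
```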

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 6:20 AM, Florian Pflug f...@phlo.org wrote:
 But the security policy should still apply to the old rows, i.e.
 you shouldn't be after to UPDATE or DELETE rows you cannot see, no?

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Peter Eisentraut
clang warns about that newish SvREFCNT_inc(sv) call in plperl_helpers.h
about an unused return value, because the macro expansion of
SvREFCNT_inc(sv) returns sv.  The merit of that warning might be
debatable, but it seems easy to fix by using SvREFCNT_inc_void(sv)
instead.

And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
side effects, but that's optional.

Any concerns?





Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 12:00 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
  Another issue, BTW, is FOREIGN KEY constraints. Should you be allowed to
  create references to rows which are invisible to you, or should FOREIGN KEY
 constraints be exempt from security policies? I'd say they shouldn't be, i.e.
 the policy WHERE clause should be added to constraint checking queries like
 usual. But maybe I'm missing some reason why that'd be undesirable…

 I agree. The row-level security policy should not be applied during FK
 checks (or other internal operations; they are harmless). In the previous
 discussion, it was pointed out that repeated FK/PK probing enables a
 malicious user to infer the existence of an invisible tuple and its key
 value, even though they cannot see the actual values. It is a
 well-documented limitation; thus, users should not use row-level security
 (or should not use natural keys) if they cannot accept this limitation.

You say "I agree", but it seems to me that you and Florian are in fact
taking opposite positions.

FWIW, I'm inclined to think that you should NOT be able to create a
row that references an invisible row.  You might end up with that
situation anyway, because we don't know what the semantics of the
security policy are: rows might become visible or invisible after the
fact, and we can't police that.  But I think that if you take the
opposite position that the select queries inside fkey triggers ought
to be exempt from security policy, then you need to build some new
mechanism to make that happen, which seems like extra work for no
benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Alex Hunsaker
On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut pete...@gmx.net wrote:
 clang warns about that newish SvREFCNT_inc(sv) call in plperl_helpers.h
 about an unused return value, because the macro expansion of
 SvREFCNT_inc(sv) returns sv.  The merit of that warning might be
 debatable, but it seems easy to fix by using SvREFCNT_inc_void(sv)
 instead.

 And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
 side effects, but that's optional.

 Any concerns?

Hrm, I can't seem to find either of those functions in my copy of Perl
5.8.1 or 5.16.0. I also looked for _inc_ in case I got the casing
wrong :-(.

Perhaps I misunderstood; are you proposing to introduce those
functions? I'm fine with that. Or doing (void)SvREFCNT_inc().

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Interface of Row Level Security

2012-05-24 Thread Florian Pflug
On May 24, 2012, at 18:42, Kohei KaiGai wrote:
 I'd like to summarize the current design being discussed.
 
 syntax:
  ALTER TABLE tblname WITH ROW LEVEL SECURITY
  ( condition clause ) [FOR (SELECT | UPDATE | DELETE)];
  ALTER TABLE tblname WITHOUT ROW LEVEL SECURITY;
 
 I tried patching parser/gram.y, but there were syntax conflicts
 with the ADD / DROP sub-commands.
 Also, I noticed that ROW LEVEL SECURITY can be implemented
 without adding a new keyword, unlike SECURITY POLICY.

Let the bike-shedding begin ;-)

ALTER TABLE … WITH sounds a bit weird. What about

  ALTER TABLE tblname SET ROW POLICY condition FOR { SELECT | UPDATE | DELETE }
  ALTER TABLE tblname RESET ROW POLICY

 As we discussed, it causes a problem with the approach of implicitly
 appending additional qualifiers to the WHERE clause, because it does
 not solve the problem of the order in which qualifiers are executed.
 So I'm inclined toward the approach of replacing references to tables
 with a security policy by sub-queries with the security-barrier flag.

Since the security barrier flag carries a potentially hefty performance
penalty, I think it should be optional. Applications which don't allow
SQL-level access to the database might still benefit from row-level security,
because it saves them from having to manually add the WHERE clause to every
statement, or having to wrap all their tables with views. Yet without direct
SQL-level access, the security barrier thing isn't really necessary, so
it'd be nice if they wouldn't have to pay for it. How about

  ALTER TABLE … SET ROW POLICY … WITH (security_barrier)

 One thing I'm still considering is whether the security policy should
 be provided as a function or a clause. A sufficiently simple SQL function
 is inlined at simplify_function(), so there is no meaningful difference.
 I was afraid of code complexity, but all we need to do is append the
 configured clause to the WHERE clause of the inner sub-query.
 
 |  ALTER TABLE tblname WITH ROW LEVEL SECURITY
 |  ( condition clause ) [FOR (SELECT | UPDATE | DELETE)];
 
 So, right now, I have put a condition clause there instead of a function.

A single function seems much easier implementation-wise, since you wouldn't
need to store an arbitrary expression in the catalog, just an oid. It also
delegates the dependency-tracking problem to the function. It also simplifies
the grammar, because the FOR … clause cannot be mistaken as belonging to
the condition clause.

 One other issue we haven't discussed yet is table inheritance.
 In the case where a table TBLP has a child table TBLC and only
 TBLC has a security policy, what security policy should be
 applied when we run SELECT * FROM TBLP?
 My preference is that the security policy is applied only to the
 scan on TBLC, not TBLP.

Agreed.

 It is not desirable for the set of visible tuples to depend on
 how a table is referenced.
 In addition, if and when TBLP and TBLC each have their own policy,
 what is the desirable behavior?
 I think the security policies of both TBLP and TBLC should be
 applied on TBLC; in other words, the security policies of all the
 parent tables apply to a scan on a child table.

I think security policies should only apply to the table they're
declared for, not their child tables. Mostly because that is how
triggers operate, and security policies and triggers will often
be used together, so having their semantics regarding inheritance
be the same seems to be the least surprising option.

Also, if policies are inherited, how would you define a policy
which applies only to the parent, not to the child?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
Yesterday I had a client that experienced a sudden high load on 
one of their servers (8.3.5 - yes, I know. Those of you with 
clients will understand). When I checked, almost all connections 
were in a startup state, very similar to this thread:

http://postgresql.1045698.n5.nabble.com/9-1-3-backends-getting-stuck-in-startup-td5670712.html

Running strace showed a lot of semop activity, and the logs showed a
successful connection, then a five-minute-plus wait before a query was issued.
So it was obviously blocking on something. Unlike the thread above, I *did* find
problems in the system catalogs. For example, both pg_class and pg_index
gave warnings like this for every index during a VACUUM FULL
VERBOSE tablename:

WARNING: index pg_class_relname_nsp_index contains 7712 row versions,
  but table contains 9471 row versions
HINT:  Rebuild the index with REINDEX.

A REINDEX did not solve the problem (initially), as a REINDEX followed
by a VAC showed the same warning and hint.

The next step was dropping to standalone mode, but before that could
be done, the REINDEXes fixed the problem (no warnings, no stalled 
connections). So my questions are:

* Why would a REINDEX not fix the problem as the hint suggested?
Why would it then start working?

* What exactly is the relationship between bad indexes and shared
memory locks?

* Is there some other emergency fix when this happens, such as killing
all backends and hoping you kill the one that is actually holding
the lock (if any is)?

* Did anything in the 8.3 series fix this? I saw nothing relevant in
the release notes for everything up to 8.3.18 (which it will be on soon).

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Alex Hunsaker
On Thu, May 24, 2012 at 11:31 AM, Alex Hunsaker bada...@gmail.com wrote:
 On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut pete...@gmx.net wrote:

 And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
 side effects, but that's optional.

 Hrm I can't seem to find either of those functions in my copy of perl
 5.8.1 or 5.16.0. I also looked for _inc_ in the event I got casing
 wrong :-(.

Doh, it is indeed there in 5.16.0, looks like it got added in 5.10
:-(. (I was on the wrong branch...).

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Sergey Koposov

On Thu, 24 May 2012, Merlin Moncure wrote:


Are you sure?  I looked at all the ReleasePredicateLocks calls and
they appear to be guarded by:

    /* Nothing to do if this is not a serializable transaction */
    if (MySerializableXact == InvalidSerializableXact)
        return false;
What's the default isolation mode set to?


Sorry, it was indeed my mistake. I was pointing opreport at a binary
that had been recompiled while the old version was still running, so
I guess oprofile was picking up the wrong function names.


Here is the correct oprofile when multiple queries are running:
samples  %        symbol name
---
952457   13.8715  LWLockAcquire
  952457   100.000  LWLockAcquire [self]
---
779077   11.3464  PinBuffer
  779077   100.000  PinBuffer [self]
---
759898   11.0671  s_lock
  759898   100.000  s_lock [self]
---
689753   6.7178   slot_deform_tuple
  689753   100.000  slot_deform_tuple [self]
---
526002   5.1230   UnpinBuffer
  526002   100.000  UnpinBuffer [self]

When only one is running:
samples  %        symbol name
---
163268   14.0343  slot_deform_tuple
  163268   100.000  slot_deform_tuple [self]
---
126018   10.8324  _bt_compare
  126018   100.000  _bt_compare [self]
---
113975   9.7972   ExecProject
  113975   100.000  ExecProject [self]
---
49760 4.2773  FunctionCall2Coll
  49760100.000  FunctionCall2Coll [self]
---
49164 4.2261  LWLockAcquire
  49164100.000  LWLockAcquire [self]
---
43526 3.7414  hash_search_with_hash_value
  43526100.000  hash_search_with_hash_value [self]



I guess there is nothing catastrophically wrong with that, but still I'm
very surprised that you get severe locking problems (a factor-of-two
slow-down) when running parallel read-only transactions.


Sergey

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [9.2] crash on regex

2012-05-24 Thread Tom Lane
Thom Brown t...@linux.com writes:
 Hmmm... curiously, lazy (non-greedy) quantifiers are stable, such as: ((a))*?

I've found it.  The triggering conditions are (1) more than one set of
capturing parens in a substring() pattern, and (2) at least one trial
midpoint failing in ccondissect() or one of its siblings.  That results
in zaptreesubs() trying to clear an array entry that isn't there...
trivial to fix fortunately.  So it's not so much what the quantifiers
are as whether the first attempted match succeeds.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Peter Eisentraut
On tor, 2012-05-24 at 14:09 +0200, Magnus Hagander wrote:
 Would it make sense to have a postgresql.conf parameter that would add
 to LD_LIBRARY_PATH when loading libraries from
 shared_preload_libraries (and other library loads)?

Well, you could write a library that sets it in its init function, and
load that before the library that needs it set. :-)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Jeff Janes
On Thu, May 24, 2012 at 6:24 AM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I've been running some tests on pg 9.2beta1 and in particular a set
 of queries like
...

 And I noticed than when I run the query like the one shown above in parallel
 (in multiple connections for ZZZ=0...8) the performance of each query drops
 down significantly (factor of 2)(despite the fact that during the execution
 of the query postgres is mostly CPU bound).

Is this a regression from previous versions (9.0, 9.1), or has the
slow down always been there?

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 I guess there is nothing catastrophically wrong with that, but still I'm
 very surprised that you get severe locking problems (factor of two slow-down)
 when running parallel read-only transactions.

Me, too.  How many concurrent connections are you running, and does
your working set exceed shared_buffers?  Can you provide a
self-contained reproducible test case?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] shared_preload_libraries path

2012-05-24 Thread Magnus Hagander
On Thu, May 24, 2012 at 7:58 PM, Peter Eisentraut pete...@gmx.net wrote:
 On tor, 2012-05-24 at 14:09 +0200, Magnus Hagander wrote:
 Would it make sense to have a postgresql.conf parameter that would add
 to LD_LIBRARY_PATH when loading libraries from
 shared_preload_libraries (and other library loads)?

 Well, you could write a library that sets it in its init function, and
 load that before the library that needs it set. :-)

:-) What's life without workarounds, eh? ;)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plperl_helpers.h fix for clang

2012-05-24 Thread Peter Eisentraut
On tor, 2012-05-24 at 11:36 -0600, Alex Hunsaker wrote:
 On Thu, May 24, 2012 at 11:31 AM, Alex Hunsaker bada...@gmail.com wrote:
  On Thu, May 24, 2012 at 11:19 AM, Peter Eisentraut pete...@gmx.net wrote:
 
  And we could use SvREFCNT_inc_simple_void(sv), since sv doesn't have any
  side effects, but that's optional.
 
  Hrm I can't seem to find either of those functions in my copy of perl
  5.8.1 or 5.16.0. I also looked for _inc_ in the event I got casing
  wrong :-(.
 
 Doh, it is indeed there in 5.16.0, looks like it got added in 5.10
 :-(. (I was on the wrong branch...).

It's in ppport.h.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Creating multiple indexes in one table scan.

2012-05-24 Thread Andrew Dunstan



On 05/24/2012 11:44 AM, Robert Haas wrote:

On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:

It'd be great if one of the standard utilities like pg_restore supported this,
by spawning each concurrent index build in a separate backend. Just a
thought.

If parallel restore doesn't already take this into account when doing
job scheduling, that would be a worthwhile improvement to consider.





Parallel restore allows it in some cases, but not in others. You can't 
create an index-backed constraint concurrently with any other index 
because ALTER TABLE takes an ACCESS EXCLUSIVE lock.


We could get around that by changing pg_dump to create the index first and
then add the constraint using the existing index.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing the concept of a DATABASE

2012-05-24 Thread Josh Berkus

 Yes, we do. It would be best to conclude that things I do on hackers
 relate in some way to those goals, even if it isn't immediately clear
 how.

See, now you've got me all curious.  How do inter-DB queries relate to
the New Replication?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Stateful pointers in set-returning functions

2012-05-24 Thread Ian Pye
Hi,

I'm writing a set-returning function which places a file handle into
PG's FuncCallContext's user_fctx space. My problem is that when the
function is run with a LIMIT clause (SELECT * FROM foo() LIMIT 10), the
server stops calling the function automatically, not giving me a
chance to close the file handle. Is there a way to get the limit value
inside of foo() and set the max_calls parameter correctly?

Thanks,

Ian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Sergey Koposov

On Thu, 24 May 2012, Robert Haas wrote:


On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:

I guess there is nothing catastrophically wrong with that, but still I'm
very surprised that you get severe locking problems (a factor-of-two slow-down)
when running parallel read-only transactions.


Me, too.  How many concurrent connections are you running, and does
your working set exceed shared_buffers?  Can you provide a
self-contained reproducible test case?


The last tests I've been doing were with 8 connections.
And the working set is roughly 30 GB, which is ~3x shared_buffers
(but ~50% of RAM).


Regarding the test case, I'll try to see whether I can still observe the
same slowdown if I shrink the main table by a factor of a few, so I can
put the data somewhere for download.


S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stateful pointers in set-returning functions

2012-05-24 Thread Tom Lane
Ian Pye ian...@gmail.com writes:
 I'm writing a set-returning function which places a file handle into
 PG's FuncCallContext's user_fctx space. My problem is that when the
 function is run with a LIMIT clause (SELECT * FROM foo() LIMIT 10), the
 server stops calling the function automatically, not giving me a
 chance to close the file handle. Is there a way to get the limit value
 inside of foo() and set the max_calls parameter correctly?

No, and even if there were, this would be a very unsafe practice,
since errors or other issues could result in early termination of the
query.

You would likely be better off using tuplestore return mode so that you
can do all the reading during one call and not have to assume that
you'll get control back again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changing the concept of a DATABASE

2012-05-24 Thread Andres Freund
On Thursday, May 24, 2012 08:12:56 PM Josh Berkus wrote:
  Yes, we do. It would be best to conclude that things I do on hackers
  relate in some way to those goals, even if it isn't immediately clear
  how.
 See, now you've got me all curious.  How do inter-DB queries relate to
 the New Replication?
Being able to apply changes from one process to multiple databases reduces 
implementation/runtime overhead...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stateful pointers in set-returning functions

2012-05-24 Thread Ian Pye
Fair enough -- thanks for the tip.

On Thu, May 24, 2012 at 11:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ian Pye ian...@gmail.com writes:
 I'm writing a set-returning function which places a file handle into
 PG's FuncCallContext's user_fctx space. My problem is that when the
 function is run with a LIMIT clause (SELECT * FROM foo() LIMIT 10), the
 server stops calling the function automatically, not giving me a
 chance to close the file handle. Is there a way to get the limit value
 inside of foo() and set the max_calls parameter correctly?

 No, and even if there were, this would be a very unsafe practice,
 since errors or other issues could result in early termination of the
 query.

 You would likely be better off using tuplestore return mode so that you
 can do all the reading during one call and not have to assume that
 you'll get control back again.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 The control file currently is not a very good match because of the current
 requirement of staying below 512 bytes. If we included the list of
 running xacts, that wouldn't be enough.
 I have wondered before whether there is more to fixing that than doing the
 atomic write(); fsync(); rename(); fsync(); dance. I don't see a problem
 with the cost of that personally...

The reason for keeping it to one sector is that you're screwed if the
file is broken, so the fewer failure modes the better.

I'm not sure I believe that we can make a recovery resume from an
arbitrary point in WAL anyway, or that it would be worth the trouble.
Can't we just resume from the last restartpoint?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Andres Freund
On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  The control file currently is not a very good match because of the
  current requirement of staying below 512 bytes. If we included the
  list of running xacts, that wouldn't be enough.
  I have wondered before whether there is more to fixing that than doing
  the atomic write(); fsync(); rename(); fsync(); dance. I don't see a
  problem with the cost of that personally...
 
 The reason for keeping it to one sector is that you're screwed if the
 file is broken, so the fewer failure modes the better.
Yea, sure. But given the amount of software that depends on the above
sequence working correctly, I don't really see much of a problem...

 I'm not sure I believe that we can make a recovery resume from an
 arbitrary point in WAL anyway, or that it would be worth the trouble.
 Can't we just resume from the last restartpoint?
Well, with a decent-sized checkpoint_segments, getting up to date can take
quite a noticeable amount of time...

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 2:19 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 24 May 2012, Robert Haas wrote:
 On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov kopo...@ast.cam.ac.uk
 wrote:

 I guess there is nothing catastrophically wrong with that, but still I'm
 very surprised that you get severe locking problems (factor of two
 slow-down) when running parallel read-only transactions.

 Me, too.  How many concurrent connections are you running, and does
 your working set exceed shared_buffers?  Can you provide a
 self-contained reproducible test case?

 The last tests I've been doing were with 8 connections.
 And the working set is roughly 30 GB, which is ~3x shared_buffers (but
 ~50% of RAM).

Given that additional information, I would say these results are
expected.  Unfortunately, our BufFreelistLock is a serious contention
point, and I think that's what you're hitting.  See the graph here:

http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

As you can see, raw performance isn't much worse with the larger data
sets, but scalability at high connection counts is severely degraded
once the working set no longer fits in shared_buffers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-05-24 Thread Robert Haas
On Wed, May 23, 2012 at 2:28 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Send new protocol keepalive messages to standby servers.
 Allows streaming replication users to calculate transfer latency
 and apply delay via internal functions. No external functions yet.

 Is there a plan to implement such external functions before the 9.2 release?
 If not, the keepalive protocol seems almost useless, because a user has no
 way to make use of it, and the increase in the number of packets might add
 slight replication performance overhead. No?

Good point.  IMHO, this shouldn't really have been committed like
this, but since it was, we had better fix it, either by reverting the
change or forcing an initdb to expose the functionality.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
 I'm not sure I believe that we can make a recovery resume from an
 arbitrary point in WAL anyway, or that it would be worth the trouble.
 Can't we just resume from the last restartpoint?

 Well, with a decent-sized checkpoint_segments, getting up to date can take
 quite a noticeable amount of time...

So?  Slow restart is precisely the price you pay for a large checkpoint
interval.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 23, 2012 at 2:28 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Is there a plan to implement such external functions before the 9.2 release?
 If not, the keepalive protocol seems almost useless, because a user has no
 way to make use of it, and the increase in the number of packets might add
 slight replication performance overhead. No?

 Good point.  IMHO, this shouldn't really have been committed like
 this, but since it was, we had better fix it, either by reverting the
 change or forcing an initdb to expose the functionality.

I see no reason to rip the code out if we have plans to make use of it
in the near future.  I am also not for going back into development mode
on 9.2, which is what adding new functions now would amount to.  What's
wrong with leaving well enough alone?  It's not like there is no
unfinished work anywhere else in Postgres ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] could not open relation with OID errors after promoting the standby to master

2012-05-24 Thread Andres Freund
On Thursday, May 24, 2012 08:46:21 PM Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Thursday, May 24, 2012 08:32:47 PM Tom Lane wrote:
  I'm not sure I believe that we can make a recovery resume from an
  arbitrary point in WAL anyway, or that it would be worth the trouble.
  Can't we just resume from the last restartpoint?
  
  Well, with a decent-sized checkpoint_segments, getting up to date can
  take quite a noticeable amount of time...
 
 So?  Slow restart is precisely the price you pay for a large checkpoint
 interval.
Well, no. If you stop the master in an orderly fashion, it will checkpoint and
thus have a sensible startup time. There is no such possibility for a standby
without interaction with the master.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Merlin Moncure
On Thu, May 24, 2012 at 1:43 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 24, 2012 at 2:19 PM, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 On Thu, 24 May 2012, Robert Haas wrote:
 On Thu, May 24, 2012 at 1:42 PM, Sergey Koposov kopo...@ast.cam.ac.uk
 wrote:

 I guess there is nothing catastrophically wrong with that, but still I'm
 very surprised that you get severe locking problems (factor of two
 slow-down) when running parallel read-only transactions.

 Me, too.  How many concurrent connections are you running, and does
 your working set exceed shared_buffers?  Can you provide a
 self-contained reproducible test case?

 The last tests I've been doing were with 8 connections.
 And the working set is roughly 30 GB, which is ~3x shared_buffers (but
 ~50% of RAM).

 Given that additional information, I would say these results are
 expected.  Unfortunately, our BufFreelistLock is a serious contention
 point, and I think that's what you're hitting.  See the graph here:

 http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html

 As you can see, raw performance isn't much worse with the larger data
 sets, but scalability at high connection counts is severely degraded
 once the working set no longer fits in shared_buffers.

Hm, wouldn't the BufFreelistLock issue be ameliorated if
StrategyGetBuffer could reserve multiple buffers, so that you'd draw
down your local list and only then go back to the global pool? (Easier
said than done, obviously.)

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Archiver not exiting upon crash

2012-05-24 Thread Jeff Janes
On Wed, May 23, 2012 at 2:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Jeff Janes jeff.ja...@gmail.com writes:
 But what happens if the SIGQUIT is blocked before the system(3) is
 invoked?  Does the ignore take precedence over the block, or does the
 block take precedence over the ignore, and so the signal is still
 waiting once the block is reversed after the system(3) is over?  I
 could write a test program to see, but that wouldn't be very good
 evidence of the portability.

 AFAICT from the POSIX spec for system(3), that would be a bug in
 system().

 Actually, on further thought, it seems like there is *necessarily* a
 race condition in this.  There must be some interval where the child
 process has already exited but the waiting parent hasn't de-ignored the
 signals.

Yup. The POSIX man page doesn't say that this is guaranteed to be
atomic, so I assume no such guarantee is even attempted. For that
matter, the man page for system(3) doesn't even say how one goes
about signaling the child: unless the signal is sent to the negation
of the process-group leader, the child won't receive it. (Postgres
handles the group-leader thing correctly; I just point it out to show
the man page is not intended to be complete.)

 However, I remain unsatisfied with this idea as an explanation for the
 behavior you're seeing.  In the first place, that race condition window
 ought not be wide enough to allow failure probabilities as high as 10%.
 In the second place, that code has been like that for a long while,
 so this theory absolutely does not explain why you're seeing a
 materially higher probability of failure in HEAD than 9.1.  There is
 something else going on.

After a while trying to bisect the behavior, I decided it was a mug's
game.  Both arms of the race (the firing of archive_command and the
engineered crash) are triggered indirectly by the same event, the
start of a checkpoint.  Small changes in the code can lead to small
changes in the timing which make drastic changes in how likely it is
that the two arms collide exactly at the vulnerability.

So my test harness is an inexplicably effective show-case for the
vulnerability, but it is not the reason the vulnerability should be
fixed.

By the way, my archive_command is very fast, as all it does is echo
the date into a log file.  I want postgres to think it is in archive
mode, but for this purpose I don't want to actually deal with having
an archive.

Cheers,

Jeff



Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 Yesterday I had a client that experienced a sudden high load on 
 one of their servers (8.3.5 - yes, I know. Those of you with 
 clients will understand). When I checked, almost all connections 
 were in a startup state, very similar to this thread:

 http://postgresql.1045698.n5.nabble.com/9-1-3-backends-getting-stuck-in-startup-td5670712.html

 Running a strace showed a lot of semop activity, and the logs showed a
 successful connection, then a 5 minute plus wait before a query was issued.
 So obviously, blocking on something.

Did you check I/O activity?  I looked again at Jeff Frost's report and
now think that what he saw was probably a lot of seqscans on bloated
system catalogs, cf
http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

 Unlike the thread above, I *did* find
 problems in the system catalogs. For example, both pg_class and pg_index
 gave warnings like this for every index during a VACUUM FULL
 VERBOSE tablename:

 WARNING: index pg_class_relname_nsp_index contains 7712 row versions,
   but table contains 9471 row versions
 HINT:  Rebuild the index with REINDEX.

That's fairly interesting, but if it was a bloat situation then it
would've been the VAC FULL that fixed it rather than the REINDEX.
Did you happen to save the VERBOSE output?  It'd be really useful to
know whether there was any major shrinkage of the core catalogs
(esp. pg_class, pg_attribute).

 * Did anything in the 8.3 series fix this?

I think there are probably two independent issues here.  The missing
index entries are clearly bad but it's not clear that they had anything
to do with the startup stall.  There are a couple of fixes in recent
8.3.x releases that might possibly explain the index corruption,
especially if you're in the habit of reindexing the system catalogs
frequently.

regards, tom lane



Re: [HACKERS] Missing optimization when filters are applied after window functions

2012-05-24 Thread Bruce Momjian
On Wed, May 16, 2012 at 09:25:13AM -0400, Tom Lane wrote:
 Volker Grabsch v...@notjusthosting.com writes:
  I propose the following general optimization: If all window
  functions are partitioned by the same first field (here: id),
  then any filter on that field should be executed before
  WindowAgg.
 
 I'm not sure if that rule is correct in detail, but in any case the
 short answer is that window aggregates are a new feature in Postgres
 and we basically haven't done any optimization work on them yet.
 Feel free to work in that area if it interests you...

Is this a TODO?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 23, 2012 at 2:28 PM, Fujii Masao masao.fu...@gmail.com wrote:
 Is there plan to implement such external functions before 9.2 release?
 If not, keepalive protocol seems to be almost useless because there is
 no use of it for a user and the increase in the number of packets might
 increase the replication performance overhead slightly. No?

 Good point.  IMHO, this shouldn't really have been committed like
 this, but since it was, we had better fix it, either by reverting the
 change or forcing an initdb to expose the functionality.

 I see no reason to rip the code out if we have plans to make use of it
 in the near future.  I am also not for going back into development mode
 on 9.2, which is what adding new functions now would amount to.  What's
 wrong with leaving well enough alone?  It's not like there is no
 unfinished work anywhere else in Postgres ...

So, extra TCP overhead for no user-visible benefit doesn't bother you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Jeff Janes
On Thu, May 24, 2012 at 12:46 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure mmonc...@gmail.com wrote:
 As you can see, raw performance isn't much worse with the larger data
 sets, but scalability at high connection counts is severely degraded
 once the working set no longer fits in shared_buffers.

 Hm, wouldn't the BufFreelistLock issue be ameliorated if
 StrategyGetBuffer could reserve multiple buffers so that you'd draw
 down your local list and only then go back to the global pool? (easier
 said than done obviously).

 hm, looking at the code some more, it looks like the whole point of
 the strategy system is to do that.

I thought you were suggesting that the StrategyGetBuffer would
pre-allocate multiple buffers to a backend under the cover of a single
BufFreelistLock.  If that is what you were suggesting, that is not
what the strategy system is currently for.  It is for locally reusing
buffers, not for gang-allocating them.

If a backend could somehow predict that the buffer it is about to read
in is likely going to be a cold buffer, perhaps it would make sense
for each backend to maintain a small ring of its own which it can
reuse for such cold buffers.



 ISTM bulk insert type queries
 would be good candidates for a buffer strategy somehow?

Probably.  There is a code or README comment to that effect that I
stumbled upon just a couple of hours ago, but can't immediately re-find
it.

Cheers,

Jeff



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 3:46 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 24, 2012 at 2:24 PM, Merlin Moncure mmonc...@gmail.com wrote:
 As you can see, raw performance isn't much worse with the larger data
 sets, but scalability at high connection counts is severely degraded
 once the working set no longer fits in shared_buffers.

 Hm, wouldn't the BufFreelistLock issue be ameliorated if
 StrategyGetBuffer could reserve multiple buffers so that you'd draw
 down your local list and only then go back to the global pool? (easier
 said than done obviously).

 hm, looking at the code some more, it looks like the whole point of
 the strategy system is to do that.  ISTM bulk insert type queries
 would be good candidates for a buffer strategy somehow?

Yep.

commit 85e2cedf985bfecaf43a18ca17433070f439fb0e
Author: Tom Lane t...@sss.pgh.pa.us
Date:   Thu Nov 6 20:51:15 2008 +

Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks).  Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.

Robert Haas, after an idea of Simon Riggs'.

But that doesn't help here, since pgbench is only looking up one row per query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Merlin Moncure
On Thu, May 24, 2012 at 3:35 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, May 24, 2012 at 3:46 PM, Merlin Moncure mmonc...@gmail.com wrote:
 hm, looking at the code some more, it looks like the whole point of
 the strategy system is to do that.  ISTM bulk insert type queries
 would be good candidates for a buffer strategy somehow?

 Yep.

 commit 85e2cedf985bfecaf43a18ca17433070f439fb0e
 Author: Tom Lane t...@sss.pgh.pa.us
 Date:   Thu Nov 6 20:51:15 2008 +

    Improve bulk-insert performance by keeping the current target buffer pinned
    (but not locked, as that would risk deadlocks).  Also, make it work in a small
    ring of buffers to avoid having bulk inserts trash the whole buffer arena.

    Robert Haas, after an idea of Simon Riggs'.

 But that doesn't help here, since pgbench is only looking up one row per 
 query.

Wait -- the OP's gripe isn't regarding standard pgbench, but multiple
large concurrent 'insert into foo select...' statements.  I looked in the code
and it appears that the only operations using a bulk insert strategy are
COPY, CREATE TABLE AS SELECT, and table rewrite operations.  Concurrent
INSERT ... SELECT apparently doesn't get the benefit of a strategy and
should be fighting over the freelist once the pool exhausts.

We don't get to skip WAL of course, but we should be able to use a
bulk insert strategy, especially if there were some way of predicting
that a large number of tuples were going to be inserted.  I'm
wondering, though, if contention on the free list is in fact the OP's
problem.

merlin



Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
On Thu, May 24, 2012 at 03:54:54PM -0400, Tom Lane wrote:
 Did you check I/O activity?  I looked again at Jeff Frost's report and
 now think that what he saw was probably a lot of seqscans on bloated
 system catalogs, cf
 http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

Thank you for the response. Yes, we did look at I/O, but nothing unusual 
was seen that would explain the load. If it happens again I'll see if the 
system catalogs are getting a lot of seqscans.

 That's fairly interesting, but if it was a bloat situation then it
 would've been the VAC FULL that fixed it rather than the REINDEX.
 Did you happen to save the VERBOSE output?  It'd be really useful to
 know whether there was any major shrinkage of the core catalogs
 (esp. pg_class, pg_attribute).

I did have them in screen, but the home screen box just crashed a 
few hours ago (after weeks of uptime: Murphy's law). It certainly could 
have been the VAC FULL, as we have processes that do both VAC FULL and REINDEX 
periodically; I simply assumed based on the HINT that it was the REINDEX that 
cleared it up. The only snippet I have is:

# vacuum full verbose pg_class;
INFO:  vacuuming pg_catalog.pg_class
INFO:  pg_class: found 43 removable, 10376 nonremovable row versions in 518 
pages
DETAIL:  6078 dead row versions cannot be removed yet.
Nonremovable row versions range from 160 to 628 bytes long.
There were 7367 unused item pointers.
Total free space (including removable row versions) is 157000 bytes.
0 pages are or will become empty, including 0 at the end of the table.
0 pages containing 0 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index pg_class_oid_index now contains 7712 row versions in 32 pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
WARNING:  index pg_class_oid_index contains 7712 row versions, but table 
contains 9471 row versions
HINT:  Rebuild the index with REINDEX.
INFO:  index pg_class_relname_nsp_index now contains 7712 row versions in 113 
pages
DETAIL:  6 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
WARNING:  index pg_class_relname_nsp_index contains 7712 row versions, but 
table contains 9471 row versions
HINT:  Rebuild the index with REINDEX.
VACUUM
Time: 65.635 ms

Actually, since we have full logging, those warnings should still be 
there... (checks) ... yep. Hints elided, the last one seen was:

WARNING: index pg_class_oid_index contains 14877 row versions,
  but table contains 66648 row versions
WARNING: index pg_class_relname_nsp_index contains 14877 row versions,
  but table contains 66648 row versions
WARNING: index pg_class_oid_index contains 60943 row versions,
  but table contains 112714 row versions
WARNING:  index pg_class_relname_nsp_index contains 60943 row versions,
  but table contains 112714 row versions

Looks like there are some more going back a ways: I'll gather together 
and send those offlist after this.

 I think there are probably two independent issues here.  The missing
 index entries are clearly bad but it's not clear that they had anything
 to do with the startup stall.  There are a couple of fixes in recent
 8.3.x releases that might possibly explain the index corruption,
 especially if you're in the habit of reindexing the system catalogs
 frequently.

Yes, we are in that habit. I hope they are not independent: I'd rather 
have one problem to worry about than two. :) I'll certainly report if I 
see either problem pop up again.

Oh, almost forgot: reading your reply to the old thread reminded me of 
something I saw in one of the straces right as it woke up and left 
the startup state to do some work. Here's a summary:

12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
(x a gazillion)
...
12:18:40 brk(0x1c0af000)= 0x1c0af000
...(some more semops)...
12:18:40 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
-1, 0) = 0x2ac062c98000
...(handful of semops)...
12:18:40 unlink(base/1554846571/pg_internal.init.11803) = -1 ENOENT (No such 
file or directory)
12:18:40 open(base/1554846571/pg_internal.init.11803, 
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
12:18:40 fstat(13, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
12:18:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0) = 0x2ac062cd9000
12:18:40 write(13, ...
...(normalish looking strace output after this)...

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
 I think there are probably two independent issues here.  The missing
 index entries are clearly bad but it's not clear that they had anything
 to do with the startup stall.

On further log digging, I think you are correct, as those index 
warnings go back many days before the startup problems appeared.
Let me know if you'd like any of those warnings from the logs.

-- 
Greg Sabino Mullane g...@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8




Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Robert Haas
On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure mmonc...@gmail.com wrote:
 Wait -- the OP's gripe isn't regarding standard pgbench, but multiple
 large concurrent 'insert into foo select...' statements.  I looked in the code
 and it appears that the only operations using a bulk insert strategy are
 COPY, CREATE TABLE AS SELECT, and table rewrite operations.  Concurrent
 INSERT ... SELECT apparently doesn't get the benefit of a strategy and
 should be fighting over the freelist once the pool exhausts.

I think you are right.

 We don't get to skip WAL of course, but we should be able to use a
 bulk insert strategy, especially if there were some way of predicting
 that a large number of tuples were going to be inserted.  I'm
 wondering, though, if contention on the free list is in fact the OP's
 problem.

Not sure.  It might be some other LWLock, but it's hard to tell which
one from the information provided.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Draft release notes complete

2012-05-24 Thread Peter Geoghegan
On 21 May 2012 19:10, Josh Berkus j...@agliodbs.com wrote:

 For these reasons, it may be timely and appropriate, from a purely
 advocacy point-of-view, to call our new group commit group commit in
 release notes and documentation, and announce it as a new feature.

 First, shouldn't we be having this discussion on -advocacy?

Well, no, because this is a specific discussion about release notes.
In any case, I've given up on the idea that we should market new group
commit as group commit. I believe that that would be a useful and
fair way of representing the feature, but there doesn't seem to be any
support for that view.

In passing, I noticed this:


E.1.3.12.2. pg_stat_statements

Improve pg_stat_statements to aggregate similar queries (Peter
Geoghegan, Tom Lane)

Improve pg_stat_statements' handling of PREPARE/EXECUTE statements (Tom Lane)

Add dirtied and written block counts to pg_stat_statements (Robert Haas)


I think that the second entry should be listed as a bug fix, or a
compatibility note, rather than an actual feature. At the very least,
it should be listed after the Add dirtied and written block counts item. I
also think that we should separately list as a feature
pg_stat_statements' new ability to track I/O timings at the query
granularity.

Are we any closer to a list of major features?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] release note item

2012-05-24 Thread Bruce Momjian
On Mon, May 21, 2012 at 12:19:27PM -0400, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  Regarding the item:
* Properly handle empty arrays returned by PL/Perl functions (Andrew
  Dunstan) DETAILS?
 
  This was a bug fix, not a feature, and in any case is due to Alex 
  Hunsaker, not me.
 
 I assume this is in reference to these commits:

Thanks, removed from 9.2 release notes.

---


 
 Author: Andrew Dunstan and...@dunslane.net
 Branch: REL9_1_STABLE Release: REL9_1_0 [22a55b3af] 2011-08-17 12:03:26 -0400
 
 Properly handle empty arrays returned from plperl functions.
 
 Bug reported by David Wheeler, fix by Alex Hunsaker.
 # Please enter the commit message for your changes. Lines starting
 # with '#' will be ignored, and an empty message aborts the commit.
 # On branch master
 # Changes to be committed:
 #   (use git reset HEAD file... to unstage)
 #
 # modified:   src/pl/plperl/plperl.c
 #
 # Untracked files:
 #   (use git add file... to include in what will be committed)
 #
 # autom4te.cache/
 # configure.in~
 # doc/src/sgml/ref/grant.sgml~
 # src/backend/port/win32_latch.c~
 # src/bin/psql/command.c~
 # src/include/pg_config.h.win32~
 # src/pl/plpython/plpython.c~
 # src/tools/msvc/pgbison.bat~
 # src/tools/msvc/pgbison.pl.bak
 # src/tools/msvc/pgflex.bat~
 # src/tools/msvc/pgflex.pl.bak
 # src/tools/pgindent/README~
 # src/tools/pgindent/pgindent.pl
 # src/tools/pgindent/pgindent.pl~
 # x
 # yy
 
 Author: Andrew Dunstan and...@dunslane.net
 Branch: master [68c903a66] 2011-08-17 11:59:18 -0400
 
 Properly handle empty arrays returned from plperl functions.
 
 Bug reported by David Wheeler, fix by Alex Hunsaker.
 
 git_changelog failed to merge them because of the different commit
 messages, which is probably why Bruce thought it was a HEAD-only
 commit.  AFAICS we should just remove this item, since it was (or
 should've been) documented already in 9.1 release notes.
 
   regards, tom lane

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-24 Thread Bruce Momjian
On Thu, May 24, 2012 at 10:34:22PM +0100, Peter Geoghegan wrote:
 In passing, I noticed this:
 
 
 E.1.3.12.2. pg_stat_statements
 
 Improve pg_stat_statements to aggregate similar queries (Peter
 Geoghegan, Tom Lane)
 
 Improve pg_stat_statements' handling of PREPARE/EXECUTE statements (Tom Lane)
 
 Add dirtied and written block counts to pg_stat_statements (Robert Haas)
 
 
 I think that the second entry should be listed as a bug fix, or a
 compatibility note, rather than an actual feature. At the very least,
 it should be listed after Add dirtied and written block counts. I
 also think that we should separately list as a feature

OK, item moved down.  We do not have a bug fix designation.  Do you have
a suggestion?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] heap metapages

2012-05-24 Thread Bruce Momjian
On Tue, May 22, 2012 at 09:52:30AM +0100, Simon Riggs wrote:
 Having pg_upgrade touch data files is both dangerous and difficult to
 back out in case of mistake, so I am wary of putting the metapage at
 block 0. Doing it the way I suggest means the .meta files would be
 wholly new and can be deleted as a back-out. We can also clean away
 any unnecessary .vm/.fsm files as a later step.

Pg_upgrade never modifies the old cluster, except to lock it in link
mode, so there is never anything to back out.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Draft release notes complete

2012-05-24 Thread Peter Geoghegan
On 24 May 2012 22:57, Bruce Momjian br...@momjian.us wrote:
 OK, item moved down.  We have not have bug fix designation.  You have
 a suggestion?

I assumed you were going to put it beside the other compatibility note
relating to pg_stat_statements, Change pg_stat_statements' total_time
column to be measured in milliseconds (Tom Lane).

The Improve pg_stat_statements' handling of PREPARE/EXECUTE
statements is just a way of preventing SQL PREPARE and EXECUTE
utility statements from being double counted in various ways as both
utility statements and optimisable statements. No one actually noticed
this before, and it wouldn't have been feasible to fix in back
branches, I think. Here are the relevant comments:

  * If it's an EXECUTE statement, we don't track it and don't increment
  * the nesting level.  This allows the cycles to be charged to the
  * underlying PREPARE instead (by the Executor hooks), which is much more
  * useful.
  *
  * We also don't track execution of PREPARE.  If we did, we would get one
  * hash table entry for the PREPARE (with hash calculated from the query
  * string), and then a different one with the same query string (but hash
  * calculated from the query tree) would be used to accumulate costs of
  * ensuing EXECUTEs.  This would be confusing, and inconsistent with other
  * cases where planning time is not included at all.

Also, as I've said, this I/O timings thing certainly deserves to be
separately listed as a new pg_stat_statements feature:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5b4f346611431361339253203d486789e4babb02

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 24, 2012 at 4:46 PM, Merlin Moncure mmonc...@gmail.com wrote:
 We don't get to skip WAL of course, but we should be able to use a
 bulk insert strategy, especially if there were some way of predicting
 that a large number of tuples were going to be inserted.  I'm
 wondering, though, if contention on the free list is in fact the OP's
 problem.

 Not sure.  It might be some other LWLock, but it's hard to tell which
 one from the information provided.

Yeah.  It seems quite plausible that Robert's select-only benchmark might
be mainly tripping over the freelist lock, but I'm less convinced about
something that's doing INSERT/SELECT, and therefore is also doing a lot
of WAL activity, index insertions, etc.  I'd want to see some
instrumentation results before assuming we know where the bottleneck is
there.

regards, tom lane



Re: [HACKERS] Per-Database Roles

2012-05-24 Thread Bruce Momjian
On Tue, May 22, 2012 at 10:19:12AM -0400, Robert Haas wrote:
 In retrospect, I think the idea of shared catalogs was probably a bad
 idea.  I think we should have made roles and tablespaces database
 objects rather than shared objects, and come up with some ad-hoc
 method of representing the set of available databases.  But that
 decision seems to have been made sometime pre-1996, so the thought of
 changing it now is pretty painful, but I can dream...

Yes, pre-1996.  I think the fact that authentication/user names appear
in pg_hba.conf really locked the user name idea into global objects, and
we have never really been able to make a dent in that.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane g...@endpoint.com writes:
 Oh, almost forgot: reading your reply to the old thread reminded me of 
 something I saw in one of the straces right as it woke up and left 
 the startup state to do some work. Here's a summary:

 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
 (x a gazillion)
 ...
 12:18:40 brk(0x1c0af000)= 0x1c0af000
 ...(some more semops)...
 12:18:40 mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
 -1, 0) = 0x2ac062c98000
 ...(handful of semops)...
 12:18:40 unlink(base/1554846571/pg_internal.init.11803) = -1 ENOENT (No 
 such file or directory)
 12:18:40 open(base/1554846571/pg_internal.init.11803, 
 O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
 12:18:40 fstat(13, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
 12:18:40 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 
 -1, 0) = 0x2ac062cd9000
 12:18:40 write(13, ...
 ...(normalish looking strace output after this)...

Yeah, this is proof that what it was doing is the same as what we saw in
Jeff's backtrace, ie loading up the system catalog relcache entries the
hard way via seqscans on the core catalogs.  So the question to be
answered is why that's suddenly a big performance bottleneck.  It's not
a cheap operation of course (that's why we cache the results ;-)) but
it shouldn't take minutes either.  And, because they are seqscans, it
doesn't seem like messed-up indexes should matter.

The theory I have in mind about Jeff's case is that it was basically an
I/O storm, but it's not clear whether the same explanation works for
your case.  There may be some other contributing factor that we haven't
identified yet.

regards, tom lane



Re: [HACKERS] 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

2012-05-24 Thread Sergey Koposov

Hi,

On Thu, 24 May 2012, Robert Haas wrote:

Not sure.  It might be some other LWLock, but it's hard to tell which
one from the information provided.


If you could tell me the best way to find out the info that you need, 
then I could run it reasonably quickly.


S

*
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/



Re: [HACKERS] Readme of Buffer Management seems to have wrong sentence

2012-05-24 Thread Ants Aasma
On Tue, May 22, 2012 at 11:36 PM, Ants Aasma a...@cybertec.at wrote:
 ... The free list
 itself is a bit trickier, but if it's still necessary/useful then
 SC->firstFreeBuffer and buf->freeNext are in effect a linked-list
 stack; there should be plenty of tested lock-free algorithms floating
 around for that. (btw. lastFreeBuffer looks like dead code, is that
 correct?)

Thinking about it a bit more, if the freelist is mostly empty, a
simpler alternative would be to make an unprotected read to check
SC->firstFreeBuffer and only acquire BufFreelistLock if there's
anything to pop. This would reduce the lock-free parts to just
atomically incrementing a variable.

Ants Aasma
-- 
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



  1   2   >