[HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Guillaume Lelarge
Hi,

I worked on a patch to make PostgreSQL binaries use the new
PQconnectdbParams() libpq functions. I tried to mimic the way Joe Conway
changed my previous patch.

I know I'm way over the deadline for this commitfest. I couldn't do it
before because my previous patch (on this commit fest) proposed two
methods to do the new connection functions (a one-array method and a
two-arrays method). Joe chose the two-arrays method. Anyway, I would
understand if it gets postponed to the first commitfest for 9.1.

Regards.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com
Index: contrib/oid2name/oid2name.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/oid2name/oid2name.c,v
retrieving revision 1.36
diff -c -p -c -r1.36 oid2name.c
*** contrib/oid2name/oid2name.c	11 Jun 2009 14:48:51 -	1.36
--- contrib/oid2name/oid2name.c	31 Jan 2010 01:36:38 -
*** sql_conn(struct options * my_opts)
*** 301,306 
--- 301,308 
  	PGconn	   *conn;
  	char	   *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host","port","dbname","user",
+ 			  "password","application_name",NULL};
  
  	/*
  	 * Start the connection.  Loop until we have a password if requested by
*** sql_conn(struct options * my_opts)
*** 308,321 
  	 */
  	do
  	{
  		new_pass = false;
! 		conn = PQsetdbLogin(my_opts->hostname,
! 			my_opts->port,
! 			NULL,		/* options */
! 			NULL,		/* tty */
! 			my_opts->dbname,
! 			my_opts->username,
! 			password);
  		if (!conn)
  		{
  			fprintf(stderr, "%s: could not connect to database %s\n",
--- 310,327 
  	 */
  	do
  	{
+ const char *values[] = {
+   my_opts->hostname,
+   my_opts->port,
+   my_opts->dbname,
+   my_opts->username,
+   password,
+   "oid2name",
+   NULL
+   };
+ 
  		new_pass = false;
! conn = PQconnectdbParams(keywords, values);
  		if (!conn)
  		{
  			fprintf(stderr, "%s: could not connect to database %s\n",
Index: contrib/pgbench/pgbench.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.96
diff -c -p -c -r1.96 pgbench.c
*** contrib/pgbench/pgbench.c	6 Jan 2010 01:30:03 -	1.96
--- contrib/pgbench/pgbench.c	31 Jan 2010 01:41:45 -
*** doConnect(void)
*** 345,350 
--- 345,352 
  	PGconn	   *conn;
  	static char *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host","port","options","tty","dbname","user",
+ 			  "password","application_name",NULL};
  
  	/*
  	 * Start the connection.  Loop until we have a password if requested by
*** doConnect(void)
*** 352,361 
  	 */
  	do
  	{
  		new_pass = false;
! 
! 		conn = PQsetdbLogin(pghost, pgport, pgoptions, pgtty, dbName,
! 			login, password);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",
--- 354,373 
  	 */
  	do
  	{
+ const char *values[] = {
+   pghost,
+   pgport,
+   pgoptions,
+   pgtty,
+   dbName,
+   login,
+   password,
+   "pgbench",
+   NULL
+   };
+ 
  		new_pass = false;
! conn = PQconnectdbParams(keywords, values);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",
Index: contrib/vacuumlo/vacuumlo.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/vacuumlo/vacuumlo.c,v
retrieving revision 1.44
diff -c -p -c -r1.44 vacuumlo.c
*** contrib/vacuumlo/vacuumlo.c	2 Jan 2010 16:57:33 -	1.44
--- contrib/vacuumlo/vacuumlo.c	31 Jan 2010 01:44:55 -
*** vacuumlo(char *database, struct _param *
*** 70,75 
--- 70,77 
  	int			i;
  	static char *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host","port","dbname","user",
+ 			  "password","application_name",NULL};
  
  	if (param->pg_prompt == TRI_YES && password == NULL)
  		password = simple_prompt("Password: ", 100, false);
*** vacuumlo(char *database, struct _param *
*** 80,94 
  	 */
  	do
  	{
  		new_pass = false;
! 
! 		conn = PQsetdbLogin(param->pg_host,
! 			param->pg_port,
! 			NULL,
! 			NULL,
! 			database,
! 			param->pg_user,
! 			password);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",
--- 82,99 
  	 */
  	do
  	{
+ const char *values[] = {
+   param->pg_host,
+   param->pg_port,
+   database,
+   param->pg_user,
+   password,
+   "vacuumlo",
+   NULL
+   };
+ 
  		new_pass = false;
!

Re: [HACKERS] development setup and libdir

2010-01-31 Thread Euler Taveira de Oliveira
Ivan Sergio Borgonovo escreveu:
 I'm pretty sure that what you're pointing at is not going to work
 unless you specify a bunch of other parameters.
 
Ugh? Are you saying there is something wrong in our *official* documentation?
It is just a normal compilation command; if you're a C programmer you'll have
no problem.

 Once you ask... people direct you to pgxs. pgxs seems a good
 choice... a newcomer like me thinks... well that was made exactly to
 pick up all the information it needs from the environment and build
 my module.
 
Again, PGXS was developed to make packagers' life easier. Why on earth would I
want to install my library in a path different from $libdir? Packagers don't.
Besides, PGXS won't work on Windows unless you're using an installation built
using MinGW.

 connection, but still different version of pgxs gives different
 names to .so.
 
What's the problem with that? If it sets up the right name in the SQL file, that's OK.

IMHO, you're trying to complicate a simple process. If you messed up your
installation you can always do:

$ cd mymodule
$ USE_PGXS=1 make uninstall


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Matteo Beccati

On 30/01/2010 22:18, Joe Conway wrote:

On 01/30/2010 01:14 PM, Dimitri Fontaine wrote:

Matteo Beccati <p...@beccati.com> writes:

I've been following the various suggestions. Please take a look at the
updated archives proof of concept:

http://archives.beccati.org/


I like the features a lot, and the only remarks I can think of are
bikeshedding, so I'll leave it to the web team when they integrate it. It
sure looks like a "when" rather than an "if" as far as I'm concerned.

In short, +1! And thanks a lot!


+1 here too. That looks wonderful!


Thanks guys. Hopefully in the next few days I'll be able to catch up 
with Alvaro to see how we can proceed on this.


Incidentally, I've just found out that the mailing lists are dropping 
some messages. According to my qmail logs the AOX account never received 
Joe's message yesterday, nor quite a few others:


M156252, M156259, M156262, M156273, M156275

and I've verified that it also has happened before. I don't know why, 
but I'm pretty sure that my MTA was contacted only once for those 
messages, while normally I get two connections (my own address + aox 
address).



Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Magnus Hagander
On Sun, Jan 31, 2010 at 09:34, Guillaume Lelarge guilla...@lelarge.info wrote:
 Hi,

 I worked on a patch to make PostgreSQL binaries use the new
 PQconnectdbParams() libpq functions. I tried to mimic the way Joe Conway
 changed my previous patch.

 I know I'm way over the deadline for this commitfest. I couldn't do it
 before because my previous patch (on this commit fest) proposed two
 methods to do the new connection functions (a one-array method and a
 two-arrays method). Joe chose the two-arrays method. Anyway, I would
 understand if it gets postponed to the first commitfest for 9.1.

I think this can reasonably be seen as the final step of that patch,
rather than a completely new feature. Please add it to this CF - we
can always remove it if too many others object ;)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] development setup and libdir

2010-01-31 Thread Ivan Sergio Borgonovo
On Sun, 31 Jan 2010 02:44:13 -0200
Euler Taveira de Oliveira eu...@timbira.com wrote:

 Ivan Sergio Borgonovo escreveu:
  I'm pretty sure that what you're pointing at is not going to work
  unless you specify a bunch of other parameters.

 Ugh? Are you saying there is something wrong in our *official*
 documentation? It is just a normal compilation command; if you're
 a C programmer you'll have no problem.

Well, I think that if I didn't have to compile for 8.3 I'd have been
guided more naturally to a solution, since when make install does more
than just move files, you wonder if it is doing even more.
I still don't know if make install does anything more than
moving/renaming files.

Still, I think the documentation doesn't provide a reasonable path for
*trying* to write Hello world.
Actually there isn't even a suggested path that works unless you
knock at pgsql-hackers and ask for details.
Of course, explanations of how to compile an extension manually
without pgxs, and a template makefile, aren't sufficient.

When I'd like to try something new I'd like to put myself in the
most widespread, standard environment; e.g. one thing I'd like to avoid
is choosing my own flavour of compile options.

So... what's better than providing a template makefile?
And there is one, and it works.

But the postgres makefile reads module.sql.in and outputs module.sql. I'd
consider module.sql.in part of the source of the project, and I'd
think it has to be relocatable without change.

Now you use pgxs and it works great...
You've got your .so there, you look at the pattern used in the .sql, you
adapt it and it still works. Oh look! 8.3 changes the .so name at
make install.
You adapt it once more, but that makes you wonder... is make install
doing something else?
It would be better if:
a) you documented what make install is really doing
or
b) provided that contrib make install just moves stuff around, you let
people specify where they would like to install the lib at make time
so a sensible module.sql is produced

b) looks easier to maintain and easier to use. But as said, I may have
a naïve idea of what it really means to provide a working build system
for many architectures/OSes.

No rocket science indeed. I'm not crying; I've already written
my simple script to just post-process module.sql. I'm describing my
experience so you may be aware of the problems that newcomers face.
It is up to the people who will write the actual code/docs to see if it
is worth it for them to do so.

As soon as I feel comfortable enough to write something worthwhile, I'll
write some docs.

  Once you ask... people direct you to pgxs. pgxs seems a good
  choice... a newcomer like me thinks... well that was made
  exactly to pick up all the information it needs from the
  environment and build my module.

 Again, PGXS was developed to make packagers' life easier. Why on
 earth would I want to install my library in a path different from
 $libdir? Packagers don't. Besides, PGXS won't work on Windows
 unless you're using an installation built using MinGW.

This is like buying a car to go from Paris to Berlin
and not being able to use it to go to Marseilles just because you don't
have a map.
pgxs and debian packages did a nearly perfect job even for my needs.

Being able to compile and install an extension without a full dev
environment and without being root has a non-empty set of reasonable
applications. Making it easier to automate out of the box doesn't
look like a bad thing.

Sorry if it seemed I was complaining; I was trying to give feedback
while learning my way to writing Hello world.

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Magnus Hagander
On Sat, Jan 30, 2010 at 22:43, Matteo Beccati p...@beccati.com wrote:
 On 30/01/2010 17:54, Alvaro Herrera wrote:
 * While I don't personally care, some are going to insist that the site
 works with Javascript disabled.  I didn't try but from your description
 it doesn't seem like it would.  Is this easily fixable?

 Date sorting works nicely even without JS, while thread sorting doesn't at
 all. I've just updated the PoC so that thread sorting is not available when
 JS is not available, while it still is the default otherwise. Hopefully
 that's enough to keep JS haters happy.

I haven't looked at how it actually works, but the general requirement
is that it has to *work* without JS. It doesn't have to work *as
well*. That means serving up a page with zero contents, or a page that
you can't navigate, is not acceptable. Requiring more clicks to get
around the navigation, and things like that, is OK.


 * The old monthly interface /list/-mm/msg*php is not really
 necessary to keep, *except* that we need the existing URLs to redirect
 to the corresponding new message page.  I think we should be able to
 create a database of URL redirects from the old site, using the
 Message-Id URL style.  So each message accessed using the old URL style
 would require two redirects, but I don't think this is a problem.  Do
 you agree?

 Sure. I was just hoping there was an even easier way (restrict to month,
 order by uid limit 1 offset X). I guess it wouldn't be hard to write a
 script that populates a backward compatibility table. No need for double
 redirects, it'd be just a matter of adding a JOIN or two to the query.

Once we go into production on this, we'll need to do some serious
thinking about the caching issues. And in any such scenario we should
very much avoid serving up the same content under different URLs,
since it'll blow away cache space for no reason - it's much better to
throw a redirect.


 * We're using Subversion to keep the current code.  Is your code
 version-controlled?  We'd need to import your code there, I'm afraid.

 I do have a local svn repository. Given it's just a PoC that is going to be
 rewritten I don't think it should live in the official repo, but if you
 think it does, I'll be glad to switch.

Note that the plan is to switch pgweb to git as well. So if you just
want to push the stuff up during development so people can look at it,
register for a repository at git.postgresql.org - or just set one up
at github which is even easier.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Guillaume Lelarge
On 31/01/2010 13:39, Magnus Hagander wrote:
 On Sun, Jan 31, 2010 at 09:34, Guillaume Lelarge guilla...@lelarge.info 
 wrote:
 Hi,

 I worked on a patch to make PostgreSQL binaries use the new
 PQconnectdbParams() libpq functions. I tried to mimic the way Joe Conway
 changed my previous patch.

 I know I'm way over the deadline for this commitfest. I couldn't do it
 before because my previous patch (on this commit fest) proposed two
 methods to do the new connection functions (a one-array method and a
 two-arrays method). Joe chose the two-arrays method. Anyway, I would
 understand if it gets postponed to the first commitfest for 9.1.
 
 I think this can reasonably be seen as the final step of that patch,
 rather than a completely new feature. Please add it to this CF - we
 can always remove it if too many others object ;)
 

Done (https://commitfest.postgresql.org/action/patch_view?id=278). Thanks.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] development setup and libdir

2010-01-31 Thread Andrew Dunstan



Ivan Sergio Borgonovo wrote:


When I'd like to try something new I'd like to put myself in the
most widespread, standard environment; e.g. one thing I'd like to avoid
is choosing my own flavour of compile options.

  


This is just nonsense, as we have explained to you several times, and
you keep ignoring it.


The standard environment you're talking about will not have the 
compilation options enabled which are specifically designed to help 
developers. Why would anyone want to deprive themselves of that support?


You have wasted far more of your time writing these emails than it would 
have taken you to set up a postgres development environment which you 
could have used *with* pgxs, because, despite being told what pgxs is 
for, you keep trying to make it do something it was not designed for. If 
anyone is going from London to Paris via Marseilles, it is you.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Matteo Beccati

On 31/01/2010 13:45, Magnus Hagander wrote:

On Sat, Jan 30, 2010 at 22:43, Matteo Beccati <p...@beccati.com> wrote:

On 30/01/2010 17:54, Alvaro Herrera wrote:

* While I don't personally care, some are going to insist that the site
works with Javascript disabled.  I didn't try but from your description
it doesn't seem like it would.  Is this easily fixable?


Date sorting works nicely even without JS, while thread sorting doesn't at
all. I've just updated the PoC so that thread sorting is not available when
JS is not available, while it still is the default otherwise. Hopefully
that's enough to keep JS haters happy.


I haven't looked at how it actually works, but the general requirement
is that it has to *work* without JS. It doesn't have to work *as
well*. That means serving up a page with zero contents, or a page that
you can't navigate, is not acceptable. Requiring more clicks to get
around the navigation and things like that, are ok.


As it currently stands, date sorting is the default and there are no 
links to the thread view, which would otherwise look empty. We can 
surely build a non-JS thread view as well, I'm just not sure if it's 
worth the effort.



* The old monthly interface /list/-mm/msg*php is not really
necessary to keep, *except* that we need the existing URLs to redirect
to the corresponding new message page.  I think we should be able to
create a database of URL redirects from the old site, using the
Message-Id URL style.  So each message accessed using the old URL style
would require two redirects, but I don't think this is a problem.  Do
you agree?


 Sure. I was just hoping there was an even easier way (restrict to month,
order by uid limit 1 offset X). I guess it wouldn't be hard to write a
script that populates a backward compatibility table. No need for double
redirects, it'd be just a matter of adding a JOIN or two to the query.


Once we go into production on this, we'll need to do some serious
thinking about the caching issues. And in any such scenario we should
very much avoid serving up the same content under different URLs,
since it'll blow away cache space for no reason - it's much better to
throw a redirect.


Yes, that was my point. A single redirect to the only URL for the message.


* We're using Subversion to keep the current code.  Is your code
version-controlled?  We'd need to import your code there, I'm afraid.


I do have a local svn repository. Given it's just a PoC that is going to be
rewritten I don't think it should live in the official repo, but if you
 think it does, I'll be glad to switch.


Note that the plan is to switch pgweb to git as well. So if you just
want to push the stuff up during development so people can look at it,
register for a repository at git.postgresql.org - or just set one up
at github which is even easier.


The only reason why I used svn is that git support in NetBeans is rather 
poor, or at least that was my impression. I think it won't be a problem 
to move to git, I probably just need some directions ;)



Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Memory leak in deferrable index constraints

2010-01-31 Thread Dean Rasheed
Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be
freed otherwise lots of lists (one per row) will build up and not be freed
until the end of the query. This actually accounts for even more memory
than the after-trigger event queue. Patch attached.
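
To illustrate the shape of the fix, here's a minimal hedged sketch (this
is not the attached patch; the call site and the exact signatures of
ExecInsertIndexTuples()/ExecARInsertTriggers() are assumed for
illustration):

	List	   *recheckIndexes = NIL;

	/*
	 * ExecInsertIndexTuples() returns a freshly palloc'd list of the
	 * indexes whose constraints need rechecking.  Free it once the
	 * after-row trigger events have been queued, instead of leaking one
	 * list per row into the per-query memory context until end of query.
	 */
	if (resultRelInfo->ri_NumIndices > 0)
		recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
											   estate, false);

	ExecARInsertTriggers(estate, resultRelInfo, tuple, recheckIndexes);
	list_free(recheckIndexes);	/* the actual fix: release the list */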

Of course the after-trigger queue still uses a lot of memory for large
updates (I was working on a patch for that but ran out of time before
this commitfest started). This fix at least brings deferred index
constraints into line with FK constraints, in terms of memory usage.

Regards,
Dean


memory_leak.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory leak in deferrable index constraints

2010-01-31 Thread Tom Lane
Dean Rasheed dean.a.rash...@googlemail.com writes:
 Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be
 freed otherwise lots of lists (one per row) will build up and not be freed
 until the end of the query. This actually accounts for even more memory
 than the after-trigger event queue. Patch attached.

It seems a bit unlikely that this would be the largest memory leak in
that area.  Can you show a test case that demonstrates this is worth
worrying about?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)

2010-01-31 Thread Tom Lane
Hitoshi Harada umi.tan...@gmail.com writes:
 As far as I know hypothetical set function is used to do what-if
 analysis. rank(val1) within group (order by sk1) chooses the rank
 value so that val1 is equivalent to or just greater than sk1 when you
 calculate rank() over (partition by group order by sk1) within the
 group.

Hmm.  I found this in SQL:2008 4.15:

The hypothetical set functions are related to the window functions RANK,
DENSE_RANK, PERCENT_RANK, and CUME_DIST, and use the same names, though
with a different syntax.  These functions take an argument A and an
ordering of a value expression VE.  VE is evaluated for all rows of the
group.  This collection of values is augmented with A; the resulting
collection is treated as a window partition of the corresponding window
function whose window ordering is the ordering of the value expression.
The result of the hypothetical set function is the value of the
eponymous window function for the hypothetical row that contributes A
to the collection.

It appears that the syntax is meant to be

hypothetical_function(A) WITHIN GROUP (VE)

However this really ought to imply that A contains no variables of the
current query, and I don't see such a restriction mentioned anywhere ---
maybe an oversight in the spec?  If A does contain a variable then there
is no unique value to append as the single additional row.

I still say that Oracle are completely wrong to have adopted this syntax
for listagg, because per spec it does something different than what
listagg needs to do.  In particular it should mean that the listagg
argument can't contain variables --- which is what they want for the
delimiter, perhaps, but not for the expression to be concatenated.

 In other words, the queries can be the same:

 SELECT array_agg(val ORDER BY sk) FROM ...
 SELECT array_agg(val) WITHIN GROUP (ORDER BY sk) FROM ...

One more time: THOSE DON'T MEAN THE SAME THING.  If we ever get
around to implementing the hypothetical set functions, we would
be very unhappy to have introduced such a bogus equivalence.
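
To spell out the difference with a hedged example (spec syntax shown for
illustration only; PostgreSQL does not implement WITHIN GROUP at present,
and table/column names are made up):

    -- Ordered aggregate: aggregates val across ALL rows, ordered by sk.
    SELECT array_agg(val ORDER BY sk) FROM tab;

    -- Hypothetical set function per SQL:2008: evaluates sk for all rows,
    -- appends the constant 42 as one hypothetical row, and returns the
    -- rank that row would get -- val is never aggregated at all.
    SELECT rank(42) WITHIN GROUP (ORDER BY sk) FROM tab;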

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory leak in deferrable index constraints

2010-01-31 Thread Dean Rasheed
On 31 January 2010 16:03, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@googlemail.com writes:
 Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be
 freed otherwise lots of lists (one per row) will build up and not be freed
 until the end of the query. This actually accounts for even more memory
 than the after-trigger event queue. Patch attached.

 It seems a bit unlikely that this would be the largest memory leak in
 that area.  Can you show a test case that demonstrates this is worth
 worrying about?


If I do

create table foo(a int unique deferrable initially deferred);
insert into foo (select * from generate_series(1, 1000));
begin;
update foo set a=a+1;
set constraints all immediate;
commit;

and watch the backend memory usage during the UPDATE, it grows to
around 970MB and then falls back to 370MB as soon as the UPDATE
command finishes. During the whole SET CONSTRAINTS trigger execution
it then remains constant at 370MB.

With this patch, it never grows beyond the 370MB occupied by the
after-triggers queue.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd output in initdb

2010-01-31 Thread Magnus Hagander
On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote:


 Andrew Dunstan wrote:


   initializing dependencies ... WARNING:  pgstat wait timeout
   WARNING:  pgstat wait timeout
   ok
     vacuuming database template1 ... WARNING:  pgstat wait timeout
   WARNING:  pgstat wait timeout
   ok
   copying template1 to template0 ... WARNING:  pgstat wait timeout


 I can't see an obvious culprit in the commit logs right off.

 Actually, on close inspection it looks to me like this commit: "Create
 typedef pgsocket for storing socket descriptors."
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=ea1a4463e9de9662b7c9e13374ec8c7b92ff2f19
 could well be the culprit.


I'm not claiming it's not, but what exactly points to that? Does the
problem go away if you move to a version before that?

Because I'm 99% sure I saw it well before that commit, and we've had
reports on it from 8.4 as well, I'm not so sure... But it may be that
that commit made something more likely to happen...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Tom Lane
Guillaume Lelarge guilla...@lelarge.info writes:

*/
   do
   {
 + const char *values[] = {
 +   my_opts->hostname,
 +   my_opts->port,
 +   my_opts->dbname,
 +   my_opts->username,
 +   password,
 +   "oid2name",
 +   NULL
 +   };
 + 
   new_pass = false;

Is that really legal C89 syntax?  I seem to recall that array
constructors can only be used for static assignments with older
compilers.

Also, as a matter of style, I find it pretty horrid that this isn't
immediately adjacent to the keywords array that it MUST match.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd output in initdb

2010-01-31 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote:
   initializing dependencies ... WARNING:  pgstat wait timeout

 I'm not claiming it's not, but what exactly points to that? Does the
 problem go away if you move to a version before that?

 Because I'm 99% sure I saw it well before that commit, and we've had
 reports on it from 8.4 as well, I'm not so sure... But it may be that
 that commit made something more likely to happen...

I notice pgstat_send is still using "if (pgStatSock < 0)" to detect
PGINVALID_SOCKET ...
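
A hedged sketch of the portable test being suggested (assuming pgStatSock
is declared as pgsocket):

    /*
     * Compare against PGINVALID_SOCKET explicitly instead of testing for
     * a negative value: on Windows a SOCKET is an unsigned handle, so
     * "< 0" is not a reliable way to detect an invalid socket.
     */
    if (pgStatSock == PGINVALID_SOCKET)
        return;                 /* no collector socket; drop the message */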

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Magnus Hagander
On Sat, Jan 30, 2010 at 08:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Michael Glaesemann michael.glaesem...@myyearbook.com writes:
 We came across a regexp that takes very much longer than expected.

 I poked into this a little bit.  What seems to be happening is that the
 use of non-greedy quantifiers plus backreferences turns off most of the
 optimization that the regexp engine usually does, leaving the RE as a
 tree of possibilities that is explored in a fairly dumb fashion.  In
 particular, there is a loop in crevdissect() that tries to locate a
 feasible division point between two concatenated sub-patterns, and
 for each try, a recursive call to crevdissect() tries to locate a
 feasible division point between *its* two sub-patterns, resulting in
 O(N^2) runtime.  With a not very pleasant constant factor, too, because
 of repeated startup/shutdown costs for the DFA matching engine.

 I found a possible patch that seems to improve matters: AFAICS the DFA
 matching is independent of the checking that cdissect() and friends do,
 so we can apply that check first instead of second.  This brings the
 runtime down from minutes to sub-second on my machine.  However I don't
 have a whole lot of faith either in the correctness of this change, or
 that it might not make some other cases slower instead of faster.
 Has anyone got a reasonably messy collection of regexps they'd like
 to try this patch on?

I only have the one, so I don't think I can help all that much with that.

 BTW, I filed the problem upstream with the Tcl project
 https://sourceforge.net/tracker/?func=detail&aid=2942697&group_id=10894&atid=110894
 but I'm not going to hold my breath waiting for useful advice from
 them.  Since Henry Spencer dropped off the net, there doesn't seem
 to be anybody there who understands that code much better than we do.

 Also, we likely still ought to toss some CHECK_FOR_INTERRUPTS calls
 in there ...

Yeah. I have zero experience around that code, so if you have at least
some, I'd much appreciate it if you (or someone who does) could look
at it. Likely to cause a lot less breakage than me :D

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_listener entries deleted under heavy NOTIFY load only on Windows

2010-01-31 Thread Magnus Hagander
On Fri, Jan 22, 2010 at 22:30, Radu Ilie ri...@wsi.com wrote:
 On a Windows server under heavy load of NOTIFY events, entries in
 pg_listener table for some events are deleted. It is like UNLISTEN was
 called.

 PostgreSQL version: 8.3.9.
 Operating System: Windows XP.

 PostgreSQL believes that if it fails to notify a listener (by signaling the
 respective backend), then the backend doesn't exist anymore and so it should
 get rid of the pg_listener entry. The relevant code is in
 src/backend/commands/async.c, function Send_Notify:

 if (kill(listenerPID, SIGUSR2) < 0)
 {
 /*
 * Get rid of pg_listener entry if it refers to a PID that no
 * longer exists.  Presumably, that backend crashed without
 * deleting its pg_listener entries. This code used to only
 * delete the entry if errno==ESRCH, but as far as I can see
 * we should just do it for any failure (certainly at least
 * for EPERM too...)
 */
 simple_heap_delete(lRel, &lTuple->t_self);
 }

 The problem is that under Windows, kill can fail even if the process is
 still alive. PostgreSQL uses named pipes under Windows to send signals to
 backends. The present implementation has a bug that causes a client to fail
 to write data to the named pipe, even though the server process is alive.
 This is because the server doesn't maintain the named pipe at all times. The
 bug is present in file src/backend/port/win32/signal.c, function
 pg_signal_thread.

 The server code stays in a loop in which it continuously creates an instance
 of the named pipe (via CreateNamedPipe) and waits for a client process to
 connect (via ConnectNamedPipe). Once a client connects, the communication
 with the client is handled in a new thread, with the thread procedure
 pg_signal_dispatch_thread. This function is very simple: it reads one byte
 from the named pipe, it writes it back and (very important) closes the
 handle to the named pipe instance. The main loop creates another instance of
 the named pipe and waits for another client to connect.

 Now imagine that the server is under heavy load. There are dozens of
 backends and threads running and the CPU usage is close to 100%. The
 following succession of events is possible:

 1. Server signal handling thread (in function pg_signal_thread) creates the
 first, one and only instance of the named pipe via CreateNamedPipe.
 2. Server code starts waiting for clients to connect with ConnectNamedPipe.
 3. Client wishes to make a transaction on the named pipe and calls
 CallNamedPipe (in file src/port/kill.c, function pgkill).
 4. Server code returns from ConnectNamedPipe. It creates a new thread with
 the thread procedure pg_signal_dispatch_thread.
 5. The signal dispatch thread is scheduled for execution and it runs to
 completion. As you can see, the last thing it does related to the named pipe
 is to close the handle via CloseHandle (in function
 pg_signal_dispatch_thread). This closes the last instance of the named pipe.
 The named pipe is gone. There is no more named pipe. The signal handling
 thread was not yet scheduled by the operating system for execution and thus
 didn't have an opportunity to call CreateNamedPipe.
 6. Another client (or the same one, it doesn't matter) tries to write to the
 named pipe via CallNamedPipe. The call returns ERROR_FILE_NOT_FOUND, because
 the named pipe is gone. The client believes the backend is gone and it
 removes the entry from pg_listener.
 7. The signal handling thread (in function pg_signal_thread) is finally
 scheduled for execution and it calls CreateNamedPipe. We now have an
 instance of the named pipe available.

 So we end up with the server backend alive, the named pipe is there, but the
 row is gone from pg_listener. This is easy to reproduce under Windows. I
 used the scripts posted by Steve Marshall in a similar thread from
 01/15/2009 and the problem appears within one minute all the time. For
 testing I used a Windows XP machine with 2 cores and 2GB of RAM. The CPU
 usage was over 70% during the trials.

Thanks for a *very* detailed analysis. I've finally managed to
reproduce this problem, and test your fix.

Interestingly enough, the race condition is copied from Microsoft's
example of how to write a multi-threaded pipe server. That doesn't
make it any less real of course :-)


 The solution is to create a new instance of the named pipe before launching
 the signal dispatch thread. This means changing the code in
 src/backend/port/win32/signal.c to look like this:

I have made some minor stylistic changes to this, such as some missing
error reporting, and added a lot of comments to explain why we do
this, and applied.
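
For readers following along, a hedged sketch of the corrected pattern
(names and error handling are simplified; this is illustrative, not the
committed code):

    #include <windows.h>
    #include <process.h>

    static unsigned __stdcall pg_signal_dispatch_thread(void *arg);

    static void
    pg_signal_server_loop(const char *pipename)
    {
        HANDLE  hPipe = CreateNamedPipeA(pipename, PIPE_ACCESS_DUPLEX,
                        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                        PIPE_UNLIMITED_INSTANCES, 16, 16, 1000, NULL);

        for (;;)
        {
            HANDLE  hConnected;

            if (!ConnectNamedPipe(hPipe, NULL) &&
                GetLastError() != ERROR_PIPE_CONNECTED)
                continue;       /* real code would report the error */

            hConnected = hPipe;

            /*
             * Create the replacement instance *before* handing the
             * connected one to the dispatch thread (which closes it when
             * done), so an instance of the pipe exists at every instant
             * and a client's CallNamedPipe() can never hit
             * ERROR_FILE_NOT_FOUND.
             */
            hPipe = CreateNamedPipeA(pipename, PIPE_ACCESS_DUPLEX,
                        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                        PIPE_UNLIMITED_INSTANCES, 16, 16, 1000, NULL);

            _beginthreadex(NULL, 0, pg_signal_dispatch_thread,
                           (void *) hConnected, 0, NULL);
        }
    }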

If you can, please verify that the tip of CVS for your version solves
the problem in your real world scenario, and not just my reproduction
test case.


 This will guarantee that we have an instance of the named pipe available at
 any given moment. If we do this, we can also remove the 3 tries loop from
 src/port/kill.c:

I have removed these on HEAD only, and I'm 

Re: [HACKERS] odd output in initdb

2010-01-31 Thread Magnus Hagander
On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote:
   initializing dependencies ... WARNING:  pgstat wait timeout

 I'm not claiming it's not, but what exactly points to that? Does the
 problem go away if you move to a version before that?

 Because I'm 99% sure I saw it well before that commit, and we've had
 reports on it from 8.4 as well, I'm not so sure... But it may be that
 that commit made something more likely to happen...

 I notice pgstat_send is still using "if (pgStatSock < 0)" to detect
 PGINVALID_SOCKET ...

That's certainly so, but that shouldn't have any effect on this -
since on that platform it's defined to -1 anyway. But I'll go ahead
and fix it :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HS/SR and smart shutdown

2010-01-31 Thread Magnus Hagander
On Sat, Jan 30, 2010 at 01:05, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jan 29, 2010 at 7:01 PM, Josh Berkus j...@agliodbs.com wrote:
 It's a good question if that still makes sense with Hot Standby.
 Perhaps we should redefine smart shutdown in standby mode to shut down
 as soon as all read-only connections have died.
 It's clear that smart shutdown doesn't work while something is active.
 Recovery is active and so we shouldn't shutdown. It makes sense, it
 works like this already, lets leave it. Document it if needed.
 I don't think it's clear, or intuitive for users.  In SR, recovery is
 *never* done, so smart shutdown never completes (even if the master is
 shut down, when I tested it).  This is particularly an important issue
 when you consider that some/many service and init scripts only use smart
 shutdown ... so we'll get a lot of bug reports of "postgresql does not
 shut down".

 Absolutely agreed.  The existing smart shutdown behavior makes sense
 from a certain point of view, but it doesn't seem very... what's the
 word I'm looking for?... smart.

Yeah.
How about we change it so it's not the default anymore?

The fact is that for most applications, it's just broken. Consider any
application that uses connection pooling, which happens to be what we
recommend people to do. A smart shutdown will never shut that server
down. But it will make it not accept new connections. Which is
probably the worst possible behavior in most cases.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd output in initdb

2010-01-31 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote:
  I notice pgstat_send is still using "if (pgStatSock < 0)" to detect
 PGINVALID_SOCKET ...

 That's certainly so, but that shouldn't have any effect on this -
 since on that platform it's defined to -1 anyway. But I'll go ahead
 and fix it :-)

I was more worried about the other way around: could a *valid* handle
be < 0?  Although it's still not clear why it'd only recently have
started being a problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd output in initdb

2010-01-31 Thread Magnus Hagander
On Sun, Jan 31, 2010 at 18:33, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote:
  I notice pgstat_send is still using "if (pgStatSock < 0)" to detect
 PGINVALID_SOCKET ...

 That's certainly so, but that shouldn't have any effect on this -
 since on that platform it's defined to -1 anyway. But I'll go ahead
 and fix it :-)

 I was more worried about the other way around: could a *valid* handle
 be < 0?  Although it's still not clear why it'd only recently have
 started being a problem.

I don't think it can.

And no, I can't figure out why that should've changed recently because
of this, since we've checked against that before..

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Guillaume Lelarge
On 31/01/2010 17:35, Tom Lane wrote:
 Guillaume Lelarge guilla...@lelarge.info writes:
 
   */
  do
  {
 + const char *values[] = {
 +   my_opts->hostname,
 +   my_opts->port,
 +   my_opts->dbname,
 +   my_opts->username,
 +   password,
 +   "oid2name",
 +   NULL
 +   };
 + 
  new_pass = false;
 
 Is that really legal C89 syntax?

I don't really know. gcc (4.4.1 release) didn't complain about it,
whereas it warned about invalid syntax when I had the
new_pass = false; statement before the array declaration.

 I seem to recall that array
 constructors can only be used for static assignments with older
 compilers.
 
 Also, as a matter of style, I find it pretty horrid that this isn't
 immediately adjacent to the keywords array that it MUST match.
 

I don't find that horrid. AFAICT, that's the only advantage of the
two-arrays method. By the way, it's that kind of code (keywords
declaration separated from values declaration) that got committed in the
previous patch
(http://archives.postgresql.org/pgsql-committers/2010-01/msg00398.php).
I merely used the same code for the other binaries.


-- 
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries

2010-01-31 Thread Joe Conway
On 01/31/2010 09:42 AM, Guillaume Lelarge wrote:
 I don't find that horrid. AFAICT, that's the only advantage of the
 two-arrays method. By the way, it's that kind of code (keywords
 declaration separated from values declaration) that got commited in the
 previous patch
 (http://archives.postgresql.org/pgsql-committers/2010-01/msg00398.php).
 I merely used the same code for the other binaries.

Yes, I separated them, because otherwise the compiler complained about
the declaration not being at the top of a block. Of course Tom's other
complaint and this one can both be satisfied by not doing the static
assignment in the declaration.

I'll fix the already committed code and take a look at refactoring this
latest patch. I stand by the two-arrays method decision though -- I find
combining them into a single array to be unseemly.
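
A hedged sketch of that refactoring (illustrative only; the two-argument
PQconnectdbParams() call follows the patches in this thread, and the
surrounding declarations of conn, my_opts and password are assumed):

    /*
     * Keep keywords[] and values[] adjacent, and fill values[] with
     * plain assignments after the declarations: C89 requires initializer
     * lists for aggregates to contain only constant expressions, so the
     * in-declaration initializer is avoided.
     */
    const char *keywords[] = {"host", "port", "dbname", "user",
                              "password", "application_name", NULL};
    const char *values[7];

    values[0] = my_opts->hostname;
    values[1] = my_opts->port;
    values[2] = my_opts->dbname;
    values[3] = my_opts->username;
    values[4] = password;
    values[5] = "oid2name";
    values[6] = NULL;

    conn = PQconnectdbParams(keywords, values);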

Joe



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Magnus Hagander
On Sun, Jan 31, 2010 at 15:09, Matteo Beccati p...@beccati.com wrote:
 On 31/01/2010 13:45, Magnus Hagander wrote:

 On Sat, Jan 30, 2010 at 22:43, Matteo Beccati <p...@beccati.com> wrote:

 On 30/01/2010 17:54, Alvaro Herrera wrote:

 * While I don't personally care, some are going to insist that the site
 works with Javascript disabled.  I didn't try but from your description
 it doesn't seem like it would.  Is this easily fixable?

 Date sorting works nicely even without JS, while thread sorting doesn't
 at all. I've just updated the PoC so that thread sorting is not
 available when JS is not available, while it still is the default
 otherwise. Hopefully that's enough to keep JS haters happy.

 I haven't looked at how it actually works, but the general requirement
 is that it has to *work* without JS. It doesn't have to work *as
 well*. That means serving up a page with zero contents, or a page that
 you can't navigate, is not acceptable. Requiring more clicks to get
 around the navigation and things like that, are ok.

 As it currently stands, date sorting is the default and there are no links
 to the thread view, which would otherwise look empty. We can surely build a
 non-JS thread view as well, I'm just not sure if it's worth the effort.

Hmm. I personally think we need some level of thread support for
non-JS as well, that's at least not *too* much of a step backwards
from what we have now. But others may have other thoughts about that?


 * We're using Subversion to keep the current code.  Is your code
 version-controlled?  We'd need to import your code there, I'm afraid.

 I do have a local svn repository. Given it's just a PoC that is going
 to be rewritten I don't think it should live in the official repo, but
 if you think it does, I'll be glad to switch.

 Note that the plan is to switch pgweb to git as well. So if you just
 want to push the stuff up during development so people can look at it,
 register for a repository at git.postgresql.org - or just set one up
 at github which is even easier.

 The only reason why I used svn is that git support in NetBeans is rather
 poor, or at least that was my impression. I think it won't be a problem to
 move to git, I probably just need some directions ;)

:-)

Well, it doesn't matter what type of repo it's in at this point, only
once it goes into production. The reason I suggested git at this point
is that we (the postgresql project) do provide git hosting at
git.postgresql.org, but we don't provide subversion anywhere. And I'm
certainly not going to suggest you use pgfoundry and cvs

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Memory leak in deferrable index constraints

2010-01-31 Thread Tom Lane
Dean Rasheed dean.a.rash...@googlemail.com writes:
 On 31 January 2010 16:03, Tom Lane t...@sss.pgh.pa.us wrote:
 It seems a bit unlikely that this would be the largest memory leak in
 that area.  Can you show a test case that demonstrates this is worth
 worrying about?

 create table foo(a int unique deferrable initially deferred);
 insert into foo (select * from generate_series(1, 1000));
 begin;
 update foo set a=a+1;
 set constraints all immediate;
 commit;

Thanks.  I had forgotten all the work we put into minimizing the size of
the deferred trigger queue.  In this example it's only 16 bytes per
entry, whereas a 1-element List is going to involve 16 bytes for the
header, 8 bytes for the cell, plus two palloc item overheads --- and
double all that on a 64-bit machine.  So yeah, this is a significant
leak.  Patch applied.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The last item on my list before close is making VACUUM FULL and Hot
 Standby play nicely together.
 The options to do this were and still are:
 (1) Add WAL messages for non-transactional relcache invalidations
 (2) Allow system relations to be cluster-ed/vacuum full-ed.
 (1) was how we did it originally and I believe it worked without
 problem. We saw the opportunity to do (2) and it has been worth
 exploring.
 Refresh my memory on why (1) lets us avoid fixing (2)?  
 
 (1) allows us to retain VACUUM FULL INPLACE for system relations, thus
 avoiding the need to do (2). Non-transactional invalidations need to be
 replayed in recovery for the same reason they exist on the primary.
 
 It sounds like a kluge at best ...
 
 (2) isn't a necessary change right now. It is the best design going
 forwards, but its burst radius stretches far beyond Hot Standby. There
 is enough code in HS for us to support, so adding to it makes little
 sense for me, in this release, since there is no functional benefit in
 doing so.

Well, it'll avoid having to support the kludges in HS required for
VACUUM FULL INPLACE.

I'm in favor of (2), unless some unforeseen hard problems come up with
implementing the ideas on that discussed earlier. Yeah, that's pretty
vague, but basically I think it's worth spending some more time doing
(2), than doing (1). For one, if the plan is to do (2) in next release
anyway, we might as well do it now and avoid having to support the
HS+VACUUM FULL INPLACE combination in only 9.0 stable branch for years
to come.

I believe we had a pretty well-thought out plan on how to support system
catalogs with the new VACUUM FULL.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-01-29 at 14:04 -0500, Tom Lane wrote:
 WTF?  Simon, this seems to be getting way way beyond anything the
 community has agreed to.  Particularly, inventing GUCs is not something
 to be doing without consensus.

 If you or anybody else would like me to revoke it then I am happy to do
 that, with no problem or argument. I agree it hasn't had discussion,
 though am happy to have such a discussion.

 The commit is a one-line change, with a parameter to control it, discussed
 by Heikki and myself in December 2008. I stand by the accuracy of the
 change; the parameter is really to ensure we can test during beta. 

Well, I was waiting to see if anyone else had an opinion, but: my
opinion is that a GUC is not appropriate here.  Either test it yourself
enough to be sure it's a win, or don't put it in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 21:00 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  The last item on my list before close is making VACUUM FULL and Hot
  Standby play nicely together.
  The options to do this were and still are:
  (1) Add WAL messages for non-transactional relcache invalidations
  (2) Allow system relations to be cluster-ed/vacuum full-ed.
  (1) was how we did it originally and I believe it worked without
  problem. We saw the opportunity to do (2) and it has been worth
  exploring.
  Refresh my memory on why (1) lets us avoid fixing (2)?  
  
  (1) allows us to retain VACUUM FULL INPLACE for system relations, thus
  avoiding the need to do (2). Non-transactional invalidations need to be
  replayed in recovery for the same reason they exist on the primary.
  
  It sounds like a kluge at best ...
  
  (2) isn't a necessary change right now. It is the best design going
  forwards, but its burst radius stretches far beyond Hot Standby. There
  is enough code in HS for us to support, so adding to it makes little
  sense for me, in this release, since there is no functional benefit in
  doing so.
 
 Well, it'll avoid having to support the kludges in HS required for
 VACUUM FULL INPLACE.
 
 I'm in favor of (2), unless some unforeseen hard problems come up with
 implementing the ideas on that discussed earlier. Yeah, that's pretty
 vague, but basically I think it's worth spending some more time doing
 (2), than doing (1). For one, if the plan is to do (2) in next release
 anyway, we might as well do it now and avoid having to support the
 HS+VACUUM FULL INPLACE combination in only 9.0 stable branch for years
 to come.

That's a good argument, but with respect, it isn't you who is writing
the code, nor will it be you that supports it, AIUI. Right now, HS is
isolated in many ways. If we introduce this change it will affect
everybody and that means I'll be investigating all manner of bug reports
that have zip to do with HS for a long time to come.

What I would say is that for 9.0 we can easily remove the INPLACE option
as an explicit request.

 I believe we had a pretty well-thought out plan on how to support system
 catalogs with the new VACUUM FULL.

I think calling it a "well thought out plan" is, err, overstating
things. We had what looked like a viable sketch of how to proceed and I
agreed to investigate that. Having done so, I'm saying I don't like what
I see going further and wish to backtrack to my known safe solution.

Overall, I don't see any benefit in pursuing that course any further. I
just see risk, on balance with (2), whereas (1) seems easier/faster,
less risk and more isolated.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The options to do this were and still are:
 (1) Add WAL messages for non-transactional relcache invalidations
 (2) Allow system relations to be cluster-ed/vacuum full-ed.

 Refresh my memory on why (1) lets us avoid fixing (2)?  

 (1) allows us to retain VACUUM FULL INPLACE for system relations, thus
 avoiding the need to do (2). Non-transactional invalidations need to be
 replayed in recovery for the same reason they exist on the primary.

Well, I would expect that invalidation events need to be transmitted to
hot-standby slaves no matter what --- backends running queries on an HS
slave need to hear about inval events just as much as backends on the
master do.  So my take on it is that all inval events will have to have
associated WAL records when in HS mode, independently of what we choose
to do about VACUUM.

Anyway, it's still not apparent to me exactly what the connection is
between VACUUM FULL and Hot Standby.  I remember that we said HS didn't
work with VACUUM FULL (INPLACE) but I don't recall why that is, and the
links on the open-items pages are not leading me to any useful
discussion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote:

  The commit is a one-line change, with a parameter to control it, discussed
  by Heikki and myself in December 2008. I stand by the accuracy of the
  change; the parameter is really to ensure we can test during beta. 
 
 Well, I was waiting to see if anyone else had an opinion, but: my
 opinion is that a GUC is not appropriate here.  Either test it yourself
 enough to be sure it's a win, or don't put it in.

I will remove the parameter then, keeping the augmentation. That OK?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Having said that I now realise a few things I didn't before:

 * Approach (2) affects the core of Postgres, even if you don't use Hot
 Standby.
 * I've had to remove 7 sanity checks to get the first few VACUUMs
 working. ISTM that removing various basic checks in the code is not a
 good thing. 
 * There are more special cases than I realised at first: temp,
 shared, with-toast, nailed, shared-and-nailed, pg_class, normal system.

Quite honestly, these statements and the attached patch (which doesn't
even begin to touch the central issue, but does indeed break a lot of
things) demonstrate that *you* are not the guy to implement what was
being discussed.  It needs to be done by someone who understands the
core caching code, which apparently you haven't studied in any detail.

I have a feeling that I should go do this...

regards, tom lane



Re: [HACKERS] development setup and libdir

2010-01-31 Thread Dimitri Fontaine
Hi,

Ivan Sergio Borgonovo m...@webthatworks.it writes:
 But considering that what it's really missing between what I need
 and what I get is renaming a file... it's just a pain I've to set up
 a whole new instance of postgres, install the whole source etc...

Untrue. Get the sources with git clone, then
  ./configure --prefix=~/pgsql/master --enable-debug --enable-cassert
  make
  make install

Now you can add ~/pgsql/master/bin to your PATH, initdb then pg_ctl start,
and use PGXS. All will work just fine, and against a *devel* env. Not a
production one. That's a clear *bonus*.

 But well again... considering I'm a rename away from being able to
 compile and test an extension with my user privileges... it's just a
 pity it doesn't work that way out of the box.

I think you're mixing up the development needs with the packaging
ones. You have to keep separating them even when doing both.

 Another scenario could be I could just svn update on a staging box,
 compile there and then move everything to production easier.
 Not every shop is a multimillion dollars enterprise that can afford
 a dev/staging/production box for every version of postgres it is
 using.
 If you don't need to squeeze out every cpu cycle in a production box
 you may be happy with a pre-packaged postgres.

You want pre-packaged stuff, but that does not mean you develop it on
production servers. Virtual Machines and chroot etc are cheap.

 still I think my need isn't that weird and could lower the entry
 point for many developers quite a bit.

Well you're only missing non-root make install with *packaged*
PostgreSQL. As it stands, the 2 options are not compatible.

But I fail to see how much of a burden it is now that it's easy to develop
on a dedicated *virtual* machine, e.g.

 [1] and yeah I'll need dbg symbols but that's going to happen later.

The development environment and the production one are different. You seem
to want to ease development on the production environment. I'm not
convinced this is a good way of approaching the problem.

Regards,
-- 
dim



Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Stefan Kaltenbrunner

Simon Riggs wrote:

On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote:


The commit is a one line change, with parameter to control it, discussed
by Heikki and myself in December 2008. I stand by the accuracy of the
change; the parameter is really to ensure we can test during beta. 

Well, I was waiting to see if anyone else had an opinion, but: my
opinion is that a GUC is not appropriate here.  Either test it yourself
enough to be sure it's a win, or don't put it in.


I will remove the parameter then, keeping the augmentation. That OK?


Well, how much is the actual hit with this on the master for different 
workloads? Do we have realistic numbers on that? Also, how much of an 
actual win is it in the other direction - as in, under what circumstances 
and workloads does it help in avoiding superfluous cancellations on the 
standby?



Stefan



Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  The options to do this were and still are:
  (1) Add WAL messages for non-transactional relcache invalidations
  (2) Allow system relations to be cluster-ed/vacuum full-ed.
 
  Refresh my memory on why (1) lets us avoid fixing (2)?  
 
  (1) allows us to retain VACUUM FULL INPLACE for system relations, thus
  avoiding the need to do (2). Non-transactional invalidations need to be
  replayed in recovery for the same reason they exist on the primary.
 
 Well, I would expect that invalidation events need to be transmitted to
 hot-standby slaves no matter what --- backends running queries on an HS
 slave need to hear about inval events just as much as backends on the
 master do.  So my take on it is that all inval events will have to have
 associated WAL records when in HS mode, independently of what we choose
 to do about VACUUM.

All transactional invalidations are already handled by HS. It was the
non-transactional invalidations in VACUUM FULL INPLACE that still
require additional handling.

 Anyway, it's still not apparent to me exactly what the connection is
 between VACUUM FULL and Hot Standby.  I remember that we said HS didn't
 work with VACUUM FULL (INPLACE) but I don't recall why that is, and the
 links on the open-items pages are not leading me to any useful
 discussion.

Very little really; not enough to force the sort of changes that I am
now seeing will be required in the way catalogs and caches operate.
There was some difficulty around the fact that VFI issues two commits
for the same transaction, but that is now correctly handled in the code
after discussion.

I'm not known as a risk-averse person but (2) is a step too far for me.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] development setup and libdir

2010-01-31 Thread Dimitri Fontaine
Ivan Sergio Borgonovo m...@webthatworks.it writes:
 Being able to compile and install an extension without a full dev
 environment 

is nonsense.

 and without being root

That's easy on a development environment, but you keep insisting on not
having one. Now that's your problem.

 Sorry if it seemed I was complaining, I was trying to give feedback
 while learning my way to write Hello world.

You seem to have a pretty clear idea of how you want things to work, and
that has nothing to do with how they do. So you're confused at first,
then you want to fix a non-existent problem. Please step back, breathe,
and look at your goal and problem again.

Regards,
-- 
dim



Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 14:44 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Having said that I now realise a few things I didn't before:
 
  * Approach (2) affects the core of Postgres, even if you don't use Hot
  Standby.
  * I've had to remove 7 sanity checks to get the first few VACUUMs
  working. ISTM that removing various basic checks in the code is not a
  good thing. 
  * There are more special cases than I realised at first: temp,
  shared, with-toast, nailed, shared-and-nailed, pg_class, normal system.
 
 Quite honestly, these statements and the attached patch (which doesn't
 even begin to touch the central issue, but does indeed break a lot of
 things) demonstrate that *you* are not the guy to implement what was
 being discussed.  It needs to be done by someone who understands the
 core caching code, which apparently you haven't studied in any detail.

I didn't claim the attached patch began to touch the issues. I was very
clear that it covered only the very simplest use case, that was the
point. You may not wish to acknowledge it, but I *have* studied the core
caching code in detail for many months and that is the basis for my
comments. The way I have written the exclusions in vacuum.c shows that I
have identified each of the sub-cases we are required to handle.

There is nothing wrong with your idea of using a mapping file. That is the
relatively easy part of the problem.

 I have a feeling that I should go do this...

If you wish, but I still think it is an unnecessary change for this
release, whoever does it. We both know that once you start you won't
stop, but that doesn't make it worthwhile or less risky.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote:
 Simon Riggs wrote:
  On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote:
  
  The commit is a one line change, with parameter to control it, discussed
  by Heikki and myself in December 2008. I stand by the accuracy of the
  change; the parameter is really to ensure we can test during beta. 
  Well, I was waiting to see if anyone else had an opinion, but: my
  opinion is that a GUC is not appropriate here.  Either test it yourself
  enough to be sure it's a win, or don't put it in.
  
  I will remove the parameter then, keeping the augmentation. That OK?
 
 Well, how much is the actual hit with this on the master for different 
 workloads? Do we have realistic numbers on that? Also, how much of an 
 actual win is it in the other direction - as in, under what circumstances 
 and workloads does it help in avoiding superfluous cancellations on the 
 standby?

At the moment a btree delete record will cancel every request
1. no matter how long they have been running
2. no matter if they haven't accessed the index being cleaned (they
might later, is the thinking...)

This change improves case (1) in that it will only remove queries that
are older than the oldest snapshot on the primary, should
max_standby_delay be exceeded. Case (2) would have been improved by my
other proposed patch should max_standby_delay be exceeded.

The cost of improving case (1) is that we do the equivalent of taking a
snapshot of the procarray while holding locks on the btree block being
cleaned. That will increase index contention somewhat in applications
that do btree deletes, i.e. non-hot index updates or deletes.
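
In outline, the standby-side conflict test this enables amounts to
comparing the record's xid against each backend's snapshot xmin, and
cancelling only those that could still see the removed tuples. A minimal
sketch (the helper names follow the HS code's style but are illustrative
here, not the exact routine):

static void
resolve_snapshot_conflict(TransactionId latestRemovedXid)
{
	/* only backends whose snapshots are at least this old are in danger */
	VirtualTransactionId *backends =
		GetConflictingVirtualXIDs(latestRemovedXid, InvalidOid);

	while (VirtualTransactionIdIsValid(*backends))
	{
		CancelVirtualTransaction(*backends,
								 PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
		backends++;
	}
}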

Greg Stark's comments have been that he wants to see max_standby_delay =
-1 put back, which would effectively prevent both (1) and (2) from
occurring. I am working on that now.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote:
 Anyway, it's still not apparent to me exactly what the connection is
 between VACUUM FULL and Hot Standby.  I remember that we said HS didn't
 work with VACUUM FULL (INPLACE) but I don't recall why that is, and the

[ sorry, I meant not-INPLACE ]

 links on the open-items pages are not leading me to any useful
 discussion.

 Very little really; not enough to force the sort of changes that I am
 now seeing will be required in the way catalogs and caches operate.
 There was some difficulty around the fact that VFI issues two commits
 for the same transaction, but that is now correctly handled in the code
 after discussion.

If the only benefit of getting rid of VACUUM FULL were simplifying
Hot Standby, I'd agree with you.  But there are numerous other benefits.
The double-commit hack you mention is something we need to get rid of
for general system stability (because of the risk of PANIC if the vacuum
fails after the first commit).  Getting rid of REINDEX-in-place on
shared catalog indexes is another thing that's really safety critical.
Removing V-F related hacks in other places would just be a bonus.

It's something we need to do, so if Hot Standby is forcing our hands,
then let's just do it.

regards, tom lane



Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 15:14 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote:
  Anyway, it's still not apparent to me exactly what the connection is
  between VACUUM FULL and Hot Standby.  I remember that we said HS didn't
  work with VACUUM FULL (INPLACE) but I don't recall why that is, and the
 
 [ sorry, I meant not-INPLACE ]
 
  links on the open-items pages are not leading me to any useful
  discussion.
 
  Very little really; not enough to force the sort of changes that I am
  now seeing will be required in the way catalogs and caches operate.
  There was some difficulty around the fact that VFI issues two commits
  for the same transaction, but that is now correctly handled in the code
  after discussion.
 
 If the only benefit of getting rid of VACUUM FULL were simplifying
 Hot Standby, I'd agree with you.  But there are numerous other benefits.
 The double-commit hack you mention is something we need to get rid of
 for general system stability (because of the risk of PANIC if the vacuum
 fails after the first commit).  Getting rid of REINDEX-in-place on
 shared catalog indexes is another thing that's really safety critical.
 Removing V-F related hacks in other places would just be a bonus.
 
 It's something we need to do, so if Hot Standby is forcing our hands,
 then let's just do it.

That's the point: Hot Standby is *not* forcing our hand to do this.

Doing this will not simplify Hot Standby in any significant way. The
code to support VFI with Hot Standby is, after technical review, much,
much simpler than the code to remove VFI.

I'll do a little work towards step (1) just so we can take a more
informed view once you've had a better look at just what (2) involves. I
had already written the code for the Sept release of HS.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Eliminating VACUUM FULL WAS: remove flatfiles.c

2010-01-31 Thread Tom Lane
Back in September I wrote:
 ... The sticking point --- not only for pg_class,
 but for shared catalogs such as pg_database --- is the lack of a
 way to track relfilenode if it ever changes.  What if we keep
 the relfilenode of these critical tables someplace else?  For
 instance, we could have a map file in each database holding
 the relfilenode of pg_class, and one in $PGDATA/global holding
 the relfilenodes of the shared catalogs and indexes.  It'd be
 possible to update a map file atomically via the rename(2) trick.
 Then we teach relcache or some similar place to believe the map
 files over the contents of pg_class.

Thinking about this some more, I can see one small disadvantage:
for the relations that we use the map file for, pg_class.relfilenode
would not be trustworthy.  This would not affect most of the system
internals (which will be looking at the relcache's copy, which would
be kept valid by the relcache code).  But it would affect user queries,
such as for example attempts to use contrib/oid2name to identify a
file on-disk.  The main case where pg_class.relfilenode would be
likely to be out-of-sync is for shared catalogs.  We could keep it
up to date in most cases for local catalogs, but there's no hope
of reaching into other databases' pg_class when a shared catalog
is relocated.

What I'd suggest doing about this is:

(1) Store zero in pg_class.relfilenode for those catalogs for which
the map is used.  This at least makes it obvious that the value
you're looking at isn't valid.

(2) Provide a SQL function to extract the real relfilenode of any
specified pg_class entry.  We'd have to modify oid2name and
pg_dump to know to use the function instead of looking at the
column.

There might be some other client-side code that would be broken
until it got taught about the function, but hopefully not much.
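
For concreteness, such a function could be little more than a relcache
lookup; a minimal sketch (the name pg_relation_filenode and the assumption
that the relcache hands back the map-aware value are illustrative):

Datum
pg_relation_filenode(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	Relation	rel;
	Oid			result;

	rel = relation_open(relid, AccessShareLock);
	result = rel->rd_node.relNode;	/* believed over pg_class.relfilenode */
	relation_close(rel, AccessShareLock);

	PG_RETURN_OID(result);
}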

Thoughts?

regards, tom lane



Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 15:14 -0500, Tom Lane wrote:

 If the only benefit of getting rid of VACUUM FULL were simplifying
 Hot Standby, I'd agree with you.  But there are numerous other benefits.
 The double-commit hack you mention is something we need to get rid of
 for general system stability (because of the risk of PANIC if the vacuum
 fails after the first commit).  Getting rid of REINDEX-in-place on
 shared catalog indexes is another thing that's really safety critical.
 Removing V-F related hacks in other places would just be a bonus.

I should've agreed with this in my last post, cos I do. I want very,
very much to get rid of VACUUM FULL just because it's such a sump of
ugly, complex code. But there is a limit to how and when to perform what I
now see is a more major surgical operation.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 At the moment a btree delete record will cancel every request
 1. no matter how long they have been running
 2. no matter if they haven't accessed the index being cleaned (they
 might later, is the thinking...)

That seems seriously horrid.  What is the rationale for #2 in
particular?  I would hope that at worst this would affect sessions
that are actively competing for the index being cleaned.

regards, tom lane



Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)

2010-01-31 Thread Hitoshi Harada
2010/2/1 Tom Lane t...@sss.pgh.pa.us:
 Hitoshi Harada umi.tan...@gmail.com writes:
 In other words, the queries can be the same:

 SELECT array_agg(val ORDER BY sk) FROM ...
 SELECT array_agg(val) WITHIN GROUP (ORDER BY sk) FROM ...

 One more time: THOSE DON'T MEAN THE SAME THING.  If we ever get
 around to implementing the hypothetical set functions, we would
 be very unhappy to have introduced such a bogus equivalence.

I completely agree. Although Oracle's syntax can express ordered
aggregates, introducing such syntax now would make it quite complicated
to implement the hypothetical set functions later, given those syntactic
restrictions.


Regards,

-- 
Hitoshi Harada



Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  At the moment a btree delete record will cancel every request
  1. no matter how long they have been running
  2. no matter if they haven't accessed the index being cleaned (they
  might later, is the thinking...)
 
 That seems seriously horrid.  What is the rationale for #2 in
 particular?  I would hope that at worst this would affect sessions
 that are actively competing for the index being cleaned.

That is exactly the feedback I received from many other people and why I
prioritised the relation-specific conflict patch.

It's worse than that because point 2 affects WAL cleanup records for the
heap also.

The rationale is that a session *might* in the future access a table,
and if it did so it would receive the wrong answer *potentially*. This
is the issue I have been discussing for a long time now, in various
forms, starting on-list in Aug 2008.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Stefan Kaltenbrunner

Simon Riggs wrote:

On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote:

Simon Riggs wrote:

On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote:


The commit is a one line change, with parameter to control it, discussed
by Heikki and myself in December 2008. I stand by the accuracy of the
change; the parameter is really to ensure we can test during beta. 

Well, I was waiting to see if anyone else had an opinion, but: my
opinion is that a GUC is not appropriate here.  Either test it yourself
enough to be sure it's a win, or don't put it in.

I will remove the parameter then, keeping the augmentation. That OK?
Well, how much is the actual hit with this on the master for different 
workloads? Do we have realistic numbers on that? Also, how much of an 
actual win is it in the other direction - as in, under what circumstances 
and workloads does it help in avoiding superfluous cancellations on the 
standby?


At the moment a btree delete record will cancel every request
1. no matter how long they have been running
2. no matter if they haven't accessed the index being cleaned (they
might later, is the thinking...)

This change improves case (1) in that it will only remove queries that
are older than the oldest snapshot on the primary, should
max_standby_delay be exceeded. Case (2) would have been improved by my
other proposed patch should max_standby_delay be exceeded.

The cost of improving case (1) is that we do the equivalent of taking a
snapshot of the procarray while holding locks on the btree block being
cleaned. That will increase index contention somewhat in applications
that do btree deletes, i.e. non-hot index updates or deletes.


hmm makes sense from the HS side - but I think to make a case for a GUC 
it would help a lot to see actual numbers (as in benchmarks) showing how 
much of a hit it is to the master.



Stefan



Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote:

 hmm makes sense from the HS side - but I think to make a case for a GUC 
 it would help a lot to see actual numbers (as in benchmarks) showing how 
 much of a hit it is to the master.

Agreed, though my approach to benchmarking was to provide the mechanism
by which it was possible to benchmark. I didn't presume to be able to
cover wide area with benchmarking tests just for this.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote:
 
 hmm makes sense from the HS side - but I think to make a case for a GUC 
  it would help a lot to see actual numbers (as in benchmarks) showing how 
 much of a hit it is to the master.
 
 Agreed, though my approach to benchmarking was to provide the mechanism
 by which it was possible to benchmark. I didn't presume to be able to
 cover wide area with benchmarking tests just for this.

I don't think this patch helps much though. It's setting lastRemovedXid
to GetOldestXmin(), which is still a very pessimistic estimate. In fact,
if there's only one transaction running in the master, it's not any
better than just setting it to InvalidTransactionId and killing all
active queries in the standby.

What we'd want to set it to is the xmin/xmax of the oldest heap tuple
whose pointer was removed from the index page. That could be much much
older than GetOldestXmin(), allowing many more read-only queries to live
in the standby.

IIRC it was Greg Stark who suggested last time this was discussed that
we could calculate the exact value for latestRemovedXid in the standby.
When replaying the deletion record, the standby could look at all the
heap tuples whose index pointers are being removed, to see which one was
newest. That can be pretty expensive, involving random I/O, but it gives
an exact value, and doesn't stress the master at all. And we could skip
it if there's no potentially conflicting read-only queries running.

That seems like the most user-friendly approach to me. Even though
random I/O is expensive, surely it's better than killing queries.
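
A rough sketch of that computation during replay (the function and
parameter names here are illustrative, and error handling and edge cases
are elided):

static TransactionId
get_latest_removed_xid(Relation heapRel, ItemPointer removed, int nremoved)
{
	TransactionId latest = InvalidTransactionId;
	int			i;

	for (i = 0; i < nremoved; i++)
	{
		/* random I/O: fetch the heap page the dead index pointer targets */
		Buffer		buf = ReadBuffer(heapRel,
								ItemPointerGetBlockNumber(&removed[i]));
		Page		page;
		ItemId		itemid;

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		page = BufferGetPage(buf);
		itemid = PageGetItemId(page,
							   ItemPointerGetOffsetNumber(&removed[i]));

		if (ItemIdIsNormal(itemid))
		{
			HeapTupleHeader htup = (HeapTupleHeader) PageGetItem(page, itemid);
			TransactionId xmax = HeapTupleHeaderGetXmax(htup);

			/* the deleting xmax is what conflicts with standby queries */
			if (TransactionIdIsValid(xmax) &&
				(!TransactionIdIsValid(latest) ||
				 TransactionIdFollows(xmax, latest)))
				latest = xmax;
		}
		UnlockReleaseBuffer(buf);
	}
	return latest;
}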

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 IIRC it was Greg Stark who suggested last time this was discussed that
 we could calculate the exact value for latestRemovedXid in the standby.
 When replaying the deletion record, the standby could look at all the
 heap tuples whose index pointers are being removed, to see which one was
 newest.

This assumes that the standby can tell which heap tuples are being
removed, which I'm not sure is true --- consider the case where the
deletion record is carrying a page image instead of a list of deleted
tuples.  But maybe we could rejigger the representation to make sure
that the info is always available.  In general it sounds like a much
nicer answer.

regards, tom lane



[HACKERS] Hot Standby and deadlock detection

2010-01-31 Thread Simon Riggs
Greg Stark has requested that I re-add max_standby_delay = -1.
I deferred that in favour of relation-specific conflict resolution, though
from the comments received that seems too major a change.

As discussed in various other posts, in order to re-add the -1 option we
need to add deadlock detection. I woke up today with a simplifying
assumption and have worked out a solution, the easy parts of which I
have committed earlier.

Part #2 is to make Startup process do deadlock detection. I attach a WIP
patch for comments since signal handling has been a much-discussed area
in recent weeks.

Normal deadlock detection waits for deadlock_timeout before doing the
detection. That is a simple performance tuning mechanism which I think
is probably unnecessary with hot standby, at least in the first
instance.

The way this would work is if Startup waits on a buffer pin we
immediately send out a request to all backends to cancel themselves if
they are holding the buffer pin required and waiting on a lock. We then
sleep until max_standby_delay. When max_standby_delay = -1 we only sleep
until deadlock timeout and then check (on the Startup process).

That keeps the signal handler code simple and reduces the number of test
cases required to confirm everything is solid.

This patch and the last commit together present everything we need to
reenable max_standby_delay = -1, so that change is included here also.

?

-- 
 Simon Riggs   www.2ndQuadrant.com
*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***
*** 272,277  procsignal_sigusr1_handler(SIGNAL_ARGS)
--- 272,280 
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);
  
+ 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
+ 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+ 
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  
*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***
*** 127,132  WaitExceedsMaxStandbyDelay(void)
--- 127,135 
  	long	delay_secs;
  	int		delay_usecs;
  
+ 	if (MaxStandbyDelay == -1)
+ 		return false;
+ 
  	/* Are we past max_standby_delay? */
  	TimestampDifference(GetLatestXLogTime(), GetCurrentTimestamp(),
  		&delay_secs, &delay_usecs);
***
*** 372,378  ResolveRecoveryConflictWithBufferPin(void)
  	 * Signal immediately or set alarm for later.
  	 */
  	if (MaxStandbyDelay == 0)
! 		SendRecoveryConflictWithBufferPin();
  	else
  	{
  		TimestampTz now;
--- 375,391 
  	 * Signal immediately or set alarm for later.
  	 */
  	if (MaxStandbyDelay == 0)
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
! 	else if (MaxStandbyDelay == -1)
! 	{
! 		/*
! 		 * Set the deadlock timer only, we don't timeout in any other case
! 		 */
! 		if (enable_standby_deadlock_alarm())
! 			sig_alarm_enabled = true;
! 		else
! 			elog(FATAL, "could not set timer for process wakeup");
! 	}
  	else
  	{
  		TimestampTz now;
***
*** 386,392  ResolveRecoveryConflictWithBufferPin(void)
  			&standby_delay_secs, &standby_delay_usecs);
  
  		if (standby_delay_secs >= MaxStandbyDelay)
! 			SendRecoveryConflictWithBufferPin();
  		else
  		{
  			TimestampTz fin_time;			/* Expected wake-up time by timer */
--- 399,405 
  			&standby_delay_secs, &standby_delay_usecs);
  
  		if (standby_delay_secs >= MaxStandbyDelay)
! 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  		else
  		{
  			TimestampTz fin_time;			/* Expected wake-up time by timer */
***
*** 394,399  ResolveRecoveryConflictWithBufferPin(void)
--- 407,419 
  			int		timer_delay_usecs = 0;
  
  			/*
+ 			 * Send out a request to check for buffer pin deadlocks before we wait.
+ 			 * This is fairly cheap, so no need to wait for deadlock timeout before
+ 			 * trying to send it out.
+ 			 */
+ 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+ 
+ 			/*
  			 * How much longer we should wait?
  			 */
  			timer_delay_secs = MaxStandbyDelay - standby_delay_secs;
***
*** 435,449  ResolveRecoveryConflictWithBufferPin(void)
  }
  
  void
! SendRecoveryConflictWithBufferPin(void)
  {
  	/*
  	 * We send signal to all backends to ask them if they are holding
  	 * the buffer pin which is delaying the Startup process. We must
  	 * not set the conflict flag yet, since most backends will be innocent.
  	 * Let the SIGUSR1 handling in each backend decide their own fate.
  	 */
! 	CancelDBBackends(InvalidOid, PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, false);
  }
  
  /*
--- 455,472 
  }
  
  void
! SendRecoveryConflictWithBufferPin(ProcSignalReason reason)
  {
+ 	Assert(reason == PROCSIG_RECOVERY_CONFLICT_BUFFERPIN ||
+ 		   reason == 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote:
 IIRC it was Greg Stark who suggested last time this was discussed that
 we could calculate the exact value for latestRemovedXid in the
 standby. When replaying the deletion record, the standby could look at
 all the heap tuples whose index pointers are being removed, to see
 which one was newest. That can be pretty expensive, involving random
 I/O, but it gives an exact value, and doesn't stress the master at
 all. And we could skip it if there's no potentially conflicting
 read-only queries running.
 
 That seems like the most user-friendly approach to me. Even though
 random I/O is expensive, surely it's better than killing queries.

Best solution, no time to do it.

Should I revoke this change? 

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 Should I revoke this change? 

Please.  We can always put it back later if nothing better gets
implemented.

regards, tom lane



Re: [HACKERS] development setup and libdir

2010-01-31 Thread Mark Cave-Ayland

Ivan Sergio Borgonovo wrote:


Of course I can write a script that can workaround this.
It seems that the only thing missing is that pgxs 8.3 used to
prefix .so with lib and then rename them at install time, but pgxs
8.4 builds them directly without the prefix.
I'm just speculating this is the issue and it is the only one I
still have to solve... but... what's going to happen next release?
Wouldn't it be better if make install could install stuff where I
ask so I could put modules in different places *even* for the same
installation of postgres?


FWIW the soon-to-be-released PostGIS 1.5 has an "out of place" 
regression testing feature that allows PostGIS to be built using PGXS 
and regression tested without putting anything in the PostgreSQL 
installation directory itself.


It's a little bit of a hack, but take a look at the PGXS makefile and 
the regression makefile to see how it all works:


http://trac.osgeo.org/postgis/browser/trunk/postgis/Makefile.in
http://trac.osgeo.org/postgis/browser/trunk/regress/Makefile.in


HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] Eliminating VACUUM FULL WAS: remove flatfiles.c

2010-01-31 Thread Robert Haas
On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Back in September I wrote:
 ... The sticking point --- not only for pg_class,
 but for shared catalogs such as pg_database --- is the lack of a
 way to track relfilenode if it ever changes.  What if we keep
 the relfilenode of these critical tables someplace else?  For
 instance, we could have a map file in each database holding
 the relfilenode of pg_class, and one in $PGDATA/global holding
 the relfilenodes of the shared catalogs and indexes.  It'd be
 possible to update a map file atomically via the rename(2) trick.
 Then we teach relcache or some similar place to believe the map
 files over the contents of pg_class.

 Thinking about this some more, I can see one small disadvantage:
 for the relations that we use the map file for, pg_class.relfilenode
 would not be trustworthy.  This would not affect most of the system
 internals (which will be looking at the relcache's copy, which would
 be kept valid by the relcache code).  But it would affect user queries,
 such as for example attempts to use contrib/oid2name to identify a
 file on-disk.  The main case where pg_class.relfilenode would be
 likely to be out-of-sync is for shared catalogs.  We could keep it
 up to date in most cases for local catalogs, but there's no hope
 of reaching into other databases' pg_class when a shared catalog
 is relocated.

 What I'd suggest doing about this is:

 (1) Store zero in pg_class.relfilenode for those catalogs for which
 the map is used.  This at least makes it obvious that the value
 you're looking at isn't valid.

 (2) Provide a SQL function to extract the real relfilenode of any
 specified pg_class entry.  We'd have to modify oid2name and
 pg_dump to know to use the function instead of looking at the
 column.

 There might be some other client-side code that would be broken
 until it got taught about the function, but hopefully not much.

 Thoughts?

Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).

...Robert



Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to

2010-01-31 Thread Simon Riggs
On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Should I revoke this change? 
 
 Please.

Will do, in morning.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [BUG?] strange behavior in ALTER TABLE ... RENAME TO on inherited columns

2010-01-31 Thread KaiGai Kohei
(2010/01/30 3:36), Robert Haas wrote:
 2010/1/28 KaiGai Koheikai...@ak.jp.nec.com:
 (2010/01/29 9:58), KaiGai Kohei wrote:
 (2010/01/29 9:29), Robert Haas wrote:
 2010/1/28 KaiGai Koheikai...@ak.jp.nec.com:
 (2010/01/29 0:46), Robert Haas wrote:
 2010/1/27 KaiGai Koheikai...@ak.jp.nec.com:
 Hmm, indeed, this logic (V3/V5) is busted.

 The idea of V4 patch can also handle this case correctly, although it
 is lesser in performance.
 I wonder whether it is really unacceptable cost in performance, or not.
 Basically, I assume ALTER TABLE RENAME/TYPE is not frequent operations,
 and I don't think this bugfix will damage to the reputation of 
 PostgreSQL.

 Where should we go on the next?

 Isn't the problem here just that the following comment is 100% wrong?

  /*
   * Unlike find_all_inheritors(), we need to walk on child relations
   * that have diamond inheritance tree, because this function has to
   * return correct expected inhcount to the caller.
   */

 It seems to me that the right solution here is to just add one more
 argument to find_all_inheritors(), something like List
 **expected_inh_count.

 Am I missing something?

 The find_all_inheritors() does not walk on child relations more than
 two times, even if a child has multiple parents inherited from common
 origin, because list_concat_unique_oid() ignores the given OID if it
 is already on the list. It means none of the child relations under a
 relation already walked on are checked again. (Of course,
 this assumption is correct for the purpose of find_all_inheritors()
 with minimum cost.)

 What we want to do here is to compute the number of times a certain
 child relation is inherited from a common origin; it shall be the
 expected-inhcount. So, we need an arrangement to the logic.

 For example, see the following diagram.

     T2
    /  \
  T1    T4---T5
    \  /
     T3

 If we call find_all_inheritors() with T1, find_inheritance_children()
 returns T2 and T3 for T1.
 Then, it calls find_inheritance_children() for T2, and it returns T4.
 Then, it calls find_inheritance_children() for T3, and it returns T4, but
 it is already in the rels_list, so list_concat_unique_oid() ignores it.
 Then, it calls find_inheritance_children() for T4, and it returns T5.

 In this example, we want the expected inhcount for T2 and T3 to be 1, and
 for T4 and T5 to be 2. However, it walks on T4 and T5 only once, so
 they will incorrectly end up with 1.
 Even if we count up the ignored OID (T4), find_all_inheritors() does not
 walk on T5 again, because it has obviously already been walked on when T4
 is ignored.

 I think the count for T5 should be 1, and I think that the count for
 T4 can easily be made to be 2 by coding the algorithm correctly.

 Ahh, it is right. I was confused.

 Is it possible to introduce the logic mathematical-strictly?
 Now I'm considering whether the find_all_inheritors() logic is suitable
 for any cases, or not.

 I modified the logic to compute an expected inhcount of the child relations.

 What we want to compute here is the number of times a certain
 relation is inherited directly from any relations derived from a unique
 origin. If the column to be renamed is eventually inherited from a common
 origin, its attinhcount is not larger than the expected inhcount.

 WITH RECURSIVE r AS (
   SELECT 't1'::regclass AS inhrelid
 UNION ALL
   SELECT c.inhrelid FROM pg_inherits c, r WHERE r.inhrelid = c.inhparent
 )  -- r is all the child relations inherited from 't1'
 SELECT inhrelid::regclass, count(*) FROM pg_inherits
   WHERE inhparent IN (SELECT inhrelid FROM r) GROUP BY inhrelid;

 The modified logic increments the expected inhcount of a relation already
 walked on. If we find a child relation twice or more, it means the child
 relation is at a junction of the inheritance tree.
 In this case, we don't walk on its child relations any more; that is not
 necessary, because the first pass already checked them.

 find_all_inheritors() is called from several places other than renameatt(),
 so I added find_all_inheritors_with_inhcount(), which takes the expected
 inhcount list as an argument. find_all_inheritors() was also modified to
 call find_all_inheritors_with_inhcount() with a NULL argument to avoid code
 duplication.
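
 In outline, the counting walk looks like this (a sketch of the approach
 rather than the patch itself; the list API is pg_list.h's, the
 find_inheritance_children() signature is the 8.4-era one, and the body is
 assumed for illustration):

 List *
 find_all_inheritors_with_inhcount(Oid parentrelId, List **expected_inh_count)
 {
 	List	   *rels_list = list_make1_oid(parentrelId);
 	List	   *counts = list_make1_int(0);
 	ListCell   *l;

 	foreach(l, rels_list)		/* lappend below extends the walk */
 	{
 		List	   *children = find_inheritance_children(lfirst_oid(l));
 		ListCell   *lc;

 		foreach(lc, children)
 		{
 			Oid			child = lfirst_oid(lc);
 			ListCell   *lo,
 					   *li;
 			bool		found = false;

 			/* seen before: this child sits at a junction, bump its count */
 			forboth(lo, rels_list, li, counts)
 			{
 				if (lfirst_oid(lo) == child)
 				{
 					lfirst_int(li)++;
 					found = true;
 					break;
 				}
 			}
 			if (!found)
 			{
 				rels_list = lappend_oid(rels_list, child);
 				counts = lappend_int(counts, 1);
 			}
 		}
 	}
 	if (expected_inh_count)
 		*expected_inh_count = counts;
 	return rels_list;
 }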

 It is the result of Bernd's test case.

 - CVS HEAD
   0.895s
   0.903s
   0.884s
   0.896s
   0.892s

 - with V6 patch
   0.972s
   0.941s
   0.961s
   0.949s
   0.946s
 
 Well, that seems a lot better.  Unfortunately, I can't commit this
 code: it's mind-bogglingly ugly.  I don't believe that duplicating
 find_all_inheritors is the right solution (a point I've mentioned
  previously), and it's certainly not right to use typeName->location as
 a place to store an unrelated attribute inheritance count.  There is
 also at least one superfluous variable renaming; and the recursion

[HACKERS] Aggressive freezing versus caching of pg_proc entries

2010-01-31 Thread Tom Lane
There are various places in the backend that rely on comparing a catalog
tuple's TID and XMIN to values saved in a cache entry in order to detect
whether the tuple changed since the cache entry was made.  (So far as
I can find, the only places that do this are looking at pg_proc entries
--- fmgr.c as well as each of the standard PLs use this technique for
checking validity of function-related cache entries.)
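
The pattern in question looks roughly like this (a sketch for
illustration; the real implementations keep these fields in their own
per-function cache structs):

typedef struct CachedFunction
{
	TransactionId fn_xmin;		/* xmin of pg_proc tuple when cached */
	ItemPointerData fn_tid;		/* TID of pg_proc tuple when cached */
	/* ... cached compiled state ... */
} CachedFunction;

static bool
cached_function_still_valid(CachedFunction *cfunc, HeapTuple procTup)
{
	/*
	 * If the tuple has been frozen, its xmin is FrozenTransactionId, so
	 * an updated tuple that lands back on the same TID can become
	 * indistinguishable from the one we cached: the failure described
	 * below.
	 */
	return cfunc->fn_xmin == HeapTupleHeaderGetXmin(procTup->t_data) &&
		ItemPointerEquals(&cfunc->fn_tid, &procTup->t_self);
}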

It strikes me that aggressive freezing of pg_proc entries could break
this logic; where aggressive means freezing a tuple in less than the
inter-reference time of somebody's cache entry.  Consider a sequence
like this:

1. A pg_proc tuple is frozen, so its xmin = FrozenXID.

2. Somebody caches a function definition based on the tuple.

3. Someone else updates the tuple twice; the second update by chance
puts the updated tuple back at its original TID (quite likely with HOT).

4. Aggressive freeze of pg_proc sets the tuple's xmin back to FrozenXID.

5. The first somebody uses the function again.  He'll see same TID and
XMIN as before, hence fail to realize tuple is changed.

Another path to failure is that the tuple could be dropped entirely,
then replaced by one that unluckily has same OID and is placed at same
TID.  Again, freezing destroys all trace that this happened.

The reason this occurred to me is that I was thinking about the
consequences of applying cluster-like VACUUM FULL to pg_proc.  That
creates a third failure path: a single update followed by clustering
that unluckily drops tuple back at its old TID.  But as shown above,
we're already at risk without that.

I'm inclined to think that we should get rid of this caching method
in favor of having fmgr.c and the PLs hook into sinval cache flush
callbacks.  It's not high priority, but given that various people
have advocated aggressive freezing policies, it seems there's some
risk in that.  I also wonder if it wouldn't be better to centralize
this logic instead of having five different implementations of it
(or more --- likely some third-party PLs have copied that logic...)

Anyway, not proposing to fix this now, but maybe it should be on the
TODO list.

regards, tom lane



Re: [HACKERS] odd output in initdb

2010-01-31 Thread Andrew Dunstan



Magnus Hagander wrote:


Actually, on close inspection it looks to me like this commit: "Create
typedef pgsocket for storing socket descriptors."
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=ea1a4463e9de9662b7c9e13374ec8c7b92ff2f19
could well be the culprit.




I'm not claiming it's not, but what exactly points to that? Does the
problem go away if you move to a version before that?

Because I'm 99% sure I saw it well before that commit, and we've had
reports on it from 8.4 as well, I'm not so sure... But it may be that
that commit made something more likely to happen...
  



The buildfarm logs say otherwise. This was first seen on Jan 10, at 
least on my Windows animals:


   pgbfprod=# select sysname, min(snapshot) from build_status_log where
   sysname ~ 'bat' and log_stage ~ 'initdb' and log_text ~ 'pgstat'
   group by sysname;
    sysname  |         min
   ----------+---------------------
    dawn_bat | 2010-01-10 17:30:02
    red_bat  | 2010-01-10 23:30:01
   (2 rows)

cheers

andrew





[HACKERS] contrib\xml2 package's xpath_table function in PostgreSQL

2010-01-31 Thread HX Zheng
Hi,

We have a huge system based on PostgreSQL that uses the xml2 functions. From 
the PostgreSQL 8.4 documentation (F.38.1, Deprecation notice), it looks like 
those functions will be removed. However, our solution is very large, and 
depends heavily on them.

1. Could you please give me some instructions to get them back in 
PostgreSQL 8.4? 

2. Could you please also give me some details on whether there are similar 
functions we can use to replace them in PostgreSQL 8.4?

Thank you in advance.

Best,
Michael

PS. Attached copied from PostgreSQL 8.4 documentation:


F.38. xml2

The xml2 module provides XPath querying and XSLT functionality. 

F.38.1. Deprecation notice

From PostgreSQL 8.3 on, there is XML-related functionality based on the 
SQL/XML standard in the core server. That functionality covers XML syntax 
checking and XPath queries, which is what this module does, and more, but the 
API is not at all compatible. It is planned that this module will be removed 
in PostgreSQL 8.4 in favor of the newer standard API, so you are encouraged to 
try converting your applications. If you find that some of the functionality 
of this module is not available in an adequate form with the newer API, please 
explain your issue to pgsql-hackers@postgresql.org so that the deficiency can 
be addressed.


Re: [HACKERS] mailing list archiver chewing patches

2010-01-31 Thread Alvaro Herrera
Matteo Beccati wrote:

 Incidentally, I've just found out that the mailing lists are
 dropping some messages. According to my qmail logs the AOX account
 never received Joe's message yesterday, nor quite a few others:
 
 M156252, M156259, M156262, M156273, M156275
 
 and I've verified that it also has happened before. I don't know
 why, but I'm pretty sure that my MTA was contacted only once for
 those messages, while normally I get two connections (my own address
 + aox address).

Hmm, I see it here:
http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com
Maybe it was just delayed?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] HS/SR and smart shutdown

2010-01-31 Thread Fujii Masao
On Sat, Jan 30, 2010 at 12:54 PM, Fujii Masao masao.fu...@gmail.com wrote:
 HOWEVER, I do believe this is an issue we could live with for 9.0 if
 it's going to lead to a whole lot of additional debugging of SR.  But if
 it's an easy fix, it'll avoid a lot of complaints on pgsql-general.

 I think that the latter statement is right.

Though we've not reached consensus on smart shutdown during
recovery yet, I wrote a patch that changes its behavior:
shut down the server (including the startup process and the
walreceiver) as soon as all read-only connections have died.
The code is also available in the 'replication' branch in
my git repository.

And, let's discuss whether something like the attached patch
is required for v9.0 or not.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***
*** 278,283  typedef enum
--- 278,284 
  	PM_RECOVERY_CONSISTENT,		/* consistent recovery mode */
  	PM_RUN,		/* normal database is alive state */
  	PM_WAIT_BACKUP,/* waiting for online backup mode to end */
+ 	PM_WAIT_READONLY,			/* waiting for read only backends to exit */
  	PM_WAIT_BACKENDS,			/* waiting for live backends to exit */
  	PM_SHUTDOWN,/* waiting for bgwriter to do shutdown ckpt */
  	PM_SHUTDOWN_2,/* waiting for archiver and walsenders to finish */
***
*** 2165,2171  pmdie(SIGNAL_ARGS)
  /* and the walwriter too */
  if (WalWriterPID != 0)
  	signal_child(WalWriterPID, SIGTERM);
! pmState = PM_WAIT_BACKUP;
  			}
  
  			/*
--- 2166,2173 
  /* and the walwriter too */
  if (WalWriterPID != 0)
  	signal_child(WalWriterPID, SIGTERM);
! /* online backup mode is active only when normal processing */
! pmState = (pmState == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY;
  			}
  
  			/*
***
*** 2840,2845  PostmasterStateMachine(void)
--- 2842,2870 
  	}
  
  	/*
+ 	 * If we are in a state-machine state that implies waiting for read only
+ 	 * backends to exit, see if they're all gone, and change state if so.
+ 	 */
+ 	if (pmState == PM_WAIT_READONLY)
+ 	{
+ 		/*
+ 		 * PM_WAIT_READONLY state ends when we have no regular backends that
+ 		 * have been started during recovery. Since those backends might be
+ 		 * waiting for the WAL record that conflicts with their queries to be
+ 		 * replayed, recovery and replication need to remain until all read
+ 		 * replayed, recovery and replication need to remain active until
+ 		 * all read-only backends have gone away.
+ 		if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+ 		{
+ 			if (StartupPID != 0)
+ signal_child(StartupPID, SIGTERM);
+ 			if (WalReceiverPID != 0)
+ signal_child(WalReceiverPID, SIGTERM);
+ 			pmState = PM_WAIT_BACKENDS;
+ 		}
+ 	}
+ 
+ 	/*
  	 * If we are in a state-machine state that implies waiting for backends to
  	 * exit, see if they're all gone, and change state if so.
  	 */



Re: [HACKERS] Pathological regexp match

2010-01-31 Thread Tom Lane
I wrote:
 I found a possible patch that seems to improve matters: AFAICS the DFA
 matching is independent of the checking that cdissect() and friends do,
 so we can apply that check first instead of second.  This brings the
 runtime down from minutes to sub-second on my machine.  However I don't
 have a whole lot of faith either in the correctness of this change, or
 that it might not make some other cases slower instead of faster.
 Has anyone got a reasonably messy collection of regexps they'd like
 to try this patch on?

The Tcl folk accepted that patch, so I went ahead and applied it to
our code.  It would still be a good idea for us to do any testing we
can on it, though.

 Also, we likely still ought to toss some CHECK_FOR_INTERRUPTS calls
 in there ...

I looked at this a bit and decided that it would be messier than seems
justified in the absence of known problems.  The regex code uses malloc
not palloc for transient data structures, so we can't just stick a
CHECK_FOR_INTERRUPTS() into some inner loop hotspot --- throwing a
longjmp would mean a permanent memory leak.  I looked at integrating
into the error mechanism it does have, but it turns out that that's
not particularly designed to force immediate exit on failure either;
they just set a flag bit that will be tested whenever control does
return from the match.  I think that a good fix would require first
changing the mallocs to pallocs, then adding CHECK_FOR_INTERRUPTS.
But that's not something I have time to mess with at the moment.
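
To illustrate the hazard concretely -- a hypothetical sketch, not actual
regex-engine code (REG_ESPACE is the regex API's out-of-memory code):

    #include <stdlib.h>

    #define REG_ESPACE 12           /* out-of-memory, as in regex.h */

    static int
    match_inner_loop(int nstates)
    {
        /* transient workspace, malloc'd as the regex code does today */
        char   *statevec = malloc(nstates);

        if (statevec == NULL)
            return REG_ESPACE;

        /*
         * A CHECK_FOR_INTERRUPTS() here would be unsafe: if it throws
         * elog(ERROR), control longjmps past the free() below and
         * statevec leaks permanently.  Were the workspace palloc'd in
         * a memory context instead, transaction abort would reclaim
         * it, making the check safe.
         */

        /* ... step the DFA using statevec ... */

        free(statevec);
        return 0;
    }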

regards, tom lane



Re: [HACKERS] Review: listagg aggregate

2010-01-31 Thread Takahiro Itagaki

Robert Haas robertmh...@gmail.com wrote:

  ok - I am the only one who likes the original behavior - so I am
  withdrawing. Robert, if you like, please commit Itagaki's patch, and
  add a note about the behavior when the second parameter isn't stable.
 
 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check whether the docs are
good enough.

I removed a test pattern for variable delimiters from the regression
test because it might be undefined behavior. We might change the
behavior in the future, so we guarantee only constant delimiters.

The transition functions are renamed to string_agg_transfn and
string_agg_delim_transfn. We cannot use string_agg_transfn for
both names because we cast the function name as regproc in bootstrap;
it complains about duplicated function names.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 I'll do a little work towards step (1) just so we can take a more
 informed view once you've had a better look at just what (2) involves.

I spent a couple of hours reading code and believe that I've identified
all the pain points for allowing relocation of system catalogs (ie,
assigning them new relfilenodes during cluster-style VACUUM FULL,
REINDEX, etc).  It's definitely not a trivial project but it's not out
of reach either --- I estimate I could get it done in a couple or three
days of full-time effort.  Given the potential benefits I think it's
worth doing.  Rough notes are attached --- comments?

regards, tom lane


Change pg_class.relfilenode to be 0 for all rels for which a map file should
be used instead.  Define RelationGetFilenode() to look to the physaddr info
instead, and make all internal refs go to that instead of reading
rd_rel->relfilenode directly.  Define pg_relation_filenode(regclass) and fix
external refs (oid2name, pg_dump) to use that instead.
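
A minimal sketch of the accessor being described; RelationMapOidToFilenode
is the kind of map lookup these notes imply, but its name and signature
here are assumptions, not committed code:

    /*
     * Return the relation's physical filenode, consulting the map when
     * the catalog stores zero.
     */
    static Oid
    RelationGetFilenode(Relation relation)
    {
        Form_pg_class classform = relation->rd_rel;

        if (classform->relfilenode != InvalidOid)
            return classform->relfilenode;      /* ordinary relation */

        /* mapped rel: look up the shared or per-database map file */
        return RelationMapOidToFilenode(RelationGetRelid(relation),
                                        classform->relisshared);
    }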

In zeroth cut, just have relcache.c substitute the OID in place of reading
a map file.  Possibly it should always do that during bootstrap.

How should bootstrap decide which rels to insert zero for, anyway?
Shared definitely, pg_class, ... maybe that's enough?  Or do we need
it for all nailed local catalogs?

local state contains
* shared map list
* this database's map list
* list of local overrides to each on-disk map list

At commit: if modified, write out new version; do this as late as possible
before xact commit
At abort: just discard local-override list

Write must include generating a WAL entry as well as sending a shared-inval
signal
* write temp file, fsync it
* emit WAL record containing copy of new file contents, fsync it
* atomically rename temp file into place
* send sinval
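
A sketch of that update sequence (the RelMapFile struct and the function
name are hypothetical; WAL-logging is reduced to a comment):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    typedef struct RelMapFile RelMapFile;   /* fixed-size map contents */

    static bool
    write_relmap_file(const char *mappath, const RelMapFile *map, int maplen)
    {
        char    tmppath[1024];
        int     fd;

        snprintf(tmppath, sizeof(tmppath), "%s.tmp", mappath);

        /* 1. write the new contents to a temp file and fsync it */
        fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0 || write(fd, map, maplen) != maplen || fsync(fd) != 0)
            return false;                   /* caller reports the error */
        close(fd);

        /*
         * 2. emit and fsync a WAL record carrying a full copy of the
         *    new contents, so replay can reconstruct the file
         */

        /* 3. atomically swap the new file into place */
        if (rename(tmppath, mappath) != 0)
            return false;

        /* 4. send sinval so other backends reread the map */
        return true;
    }

The rename is the commit point: a crash before it leaves the old map
intact, and the WAL record emitted before it lets replay reconstruct
the new one.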

During relation open, check for sinval before relying on current cached value
of map contents

Hm, what about concurrent updates of map?  Probably instantiate a new
LWLock or maybe better a HW lock.  So write involves taking lock, check
for updates, finally write --- which means that local modifications
have to be stored in a way that allows re-reading somebody else's mods.
Hence above data structure with separate storage of local modifications.
We assume rel-level locking protects us from concurrent update attempts
on the same map entry, but we don't want to forbid concurrent relocations
of different catalogs.

LWLock would work if we do an unconditional read of the file contents after
getting lock rather than relying on sinval signaling, which seems a small
price considering map updates should be infrequent.  Without that, writers
have to hold the lock till commit which rules out using LWLock.

Hm ... do we want an LWLock per map file, or is one lock to rule them all
sufficient?  LWLock per database seems problematic.  With an HW lock there
wouldn't be a problem with that.  HW lock would allow concurrent updates of
the map files of different DBs, but is that worth the extra cycles?

In a case where a transaction relocates pg_class (or another mapped catalog)
and then makes updates in that catalog, all is well in the sense that the
updates will be preserved iff the xact commits.  HOWEVER we would have
problems during WAL replay?  No, because the WAL entries will command updates
to the catalog's new relfilenode, which will be interesting to anyone else if
and only if the xact gets to commit.  We would need to be careful about the
case of relocating pg_class itself though --- make sure local relcache
references the new pg_class before any such updates happen.  Probably a CCI
is sufficient.

Another issue for clustering a catalog is that sinval catcache signalling
could get confused, since that depends on catalog entry TIDs which would
change --- we'd likely need to have relocation send an sinval signal saying
flush *everything* from that catalog.  (relcache inval on the catalog itself
doesn't do that.)  This action could/should be transactional so no
additional code is needed to propagate the notice to HS standbys.

Once the updated map file is moved into place, the relocation is effectively
committed even if we subsequently abort the transaction.  We can make that
window pretty narrow but not remove it completely.  Risk factors:
* abort would try to remove relation files created by xact, in particular
the new version of the catalog.  Ooops.  Can fix this by telling smgr to
forget about removing those files before installing the new map file ---
better to leak some disk space than destroy catalogs.
* on abort, we'd not send sinval notices, leaving other backends at risk
of using stale info (maybe our own too).  We could fix this by sending
the sinval notice BEFORE renaming into place (if we send it and then fail
to rename, no real harm done, just useless cache flushes).  This requires
keeping a nontransactional-inval path in inval.c, but 

Re: [HACKERS] Hot Standby and VACUUM FULL

2010-01-31 Thread Bruce Momjian

FYI, getting rid of VACUUM FULL in a .0 release does make more sense
than doing it in a .1 release.

---

Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  I'll do a little work towards step (1) just so we can take a more
  informed view once you've had a better look at just what (2) involves.
 
 I spent a couple of hours reading code and believe that I've identified
 all the pain points for allowing relocation of system catalogs (ie,
 assigning them new relfilenodes during cluster-style VACUUM FULL,
 REINDEX, etc).  [... rough notes quoted in full in the previous
 message ...]

Re: [HACKERS] Review: listagg aggregate

2010-01-31 Thread Pavel Stehule
2010/2/1 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 Robert Haas robertmh...@gmail.com wrote:

  ok - I am the only one who likes the original behavior - so I am
  withdrawing. Robert, if you like, please commit Itagaki's patch, and
  add a note about the behavior when the second parameter isn't stable.

 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

 Applied with some editorialization. Please check whether the docs are
 good enough.  [... quoted in full in the previous message ...]

thank you very much

Pavel Stehule








Re: [HACKERS] Largeobject Access Controls (r2460)

2010-01-31 Thread Takahiro Itagaki

KaiGai Kohei kai...@ak.jp.nec.com wrote:

 The attached patch uses one TOC entry for each blob objects.

This patch not only fixes the existing bugs, but also refactors the
dump format of large objects in pg_dump. The new format is more
similar to the format used for tables:

 Section     Tables       New LO      Old LO
 ----------------------------------------------
 Schema      TABLE        BLOB ITEM   BLOBS
 Data        TABLE DATA   BLOB DATA   BLOBS
 Comments    COMMENT      COMMENT     BLOB COMMENTS

We will allocate a BlobInfo in memory for each large object. This might
consume much more memory than former versions if there are many large
objects, but we discussed this and consider it an acceptable change.

As far as I can tell, the patch is almost ready to commit, except for
the following issue with backward compatibility:

 * BLOB DATA
 This section is the same as the existing BLOBS section, except that
 _LoadBlobs() does not create a new large object before opening it with
 INV_WRITE, and lo_truncate() is used instead of lo_unlink() when
 --clean is given.

 The legacy sections (BLOBS and BLOB COMMENTS) remain readable for
 compatibility, but newer pg_dump never creates these sections.
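
To illustrate the lo_truncate() approach quoted above -- a minimal
sketch using the real libpq large-object calls (conn and loid are
assumed to be in scope):

    /*
     * Empty the existing large object rather than unlinking it, so
     * its metadata (owner, ACL) survives the restore.
     */
    int fd = lo_open(conn, loid, INV_WRITE);

    if (fd >= 0)
    {
        lo_truncate(conn, fd, 0);   /* discard the old contents */
        lo_close(conn, fd);
    }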

I wonder whether we need to support older servers in pg_restore. You added
a check PQserverVersion() >= 80500 in CleanupBlobIfExists(), but our
documentation says we cannot use pg_restore 9.0 with 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove that path and raise an error instead?
Also, even if we support older servers in that routine,
the new bytea format will be another problem anyway.


 One remaining issue is how to decide whether blobs are dumped or not.
 Right now, --schema-only eliminates all blob dumps.
 However, I think it should follow the convention of the other object
 classes:

  -a, --data-only    ... only BLOB DATA sections, not BLOB ITEM
  -s, --schema-only  ... only BLOB ITEM sections, not BLOB DATA
  -b, --blobs        ... both BLOB ITEM and BLOB DATA, independently
                         of --data-only and --schema-only?

I cannot imagine a situation that requires a data-only blob dump --
it would mean the database being restored has a filled
pg_largeobject_metadata but an empty or broken pg_largeobject --
and that seems unnatural.

I'd prefer to keep the existing behavior:
  * default or data-only : dump all attributes and data of blobs
  * schema-only  : don't dump any blobs
and have independent options to control blob dumps:
  * -b, --blobs: dump all blobs even if schema-only
  * -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only
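
For example, under this proposal (the -B/--no-blobs option is the new
one; the others exist today; "mydb" is a placeholder):

  pg_dump -s mydb        # schema only: no blobs at all
  pg_dump -s -b mydb     # schema plus all blobs
  pg_dump -B mydb        # proposed: full dump except blobs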

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


