[HACKERS] Using the new libpq connection functions in PostgreSQL binaries
Hi, I worked on a patch to make PostgreSQL binaries use the new PQconnectdbParams() libpq function. I tried to mimic the way Joe Conway changed my previous patch. I know I'm way over the deadline for this commitfest. I couldn't do it before because my previous patch (on this commitfest) proposed two methods for the new connection functions (a one-array method and a two-arrays method). Joe chose the two-arrays method. Anyway, I would understand if it gets postponed to the first commitfest for 9.1. Regards. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com

Index: contrib/oid2name/oid2name.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/oid2name/oid2name.c,v
retrieving revision 1.36
diff -c -p -c -r1.36 oid2name.c
*** contrib/oid2name/oid2name.c	11 Jun 2009 14:48:51 -0000	1.36
--- contrib/oid2name/oid2name.c	31 Jan 2010 01:36:38 -0000
*************** sql_conn(struct options * my_opts)
*** 301,306 ****
--- 301,308 ----
  	PGconn	   *conn;
  	char	   *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host", "port", "dbname", "user",
+ 							  "password", "application_name", NULL};
  
  	/*
  	 * Start the connection. Loop until we have a password if requested by
*************** sql_conn(struct options * my_opts)
*** 308,321 ****
  	 */
  	do
  	{
  		new_pass = false;
! 		conn = PQsetdbLogin(my_opts->hostname,
! 							my_opts->port,
! 							NULL,	/* options */
! 							NULL,	/* tty */
! 							my_opts->dbname,
! 							my_opts->username,
! 							password);
  		if (!conn)
  		{
  			fprintf(stderr, "%s: could not connect to database %s\n",
--- 310,327 ----
  	 */
  	do
  	{
+ 		const char *values[] = {
+ 			my_opts->hostname,
+ 			my_opts->port,
+ 			my_opts->dbname,
+ 			my_opts->username,
+ 			password,
+ 			"oid2name",
+ 			NULL
+ 		};
  
  		new_pass = false;
! 		conn = PQconnectdbParams(keywords, values);
  		if (!conn)
  		{
  			fprintf(stderr, "%s: could not connect to database %s\n",

Index: contrib/pgbench/pgbench.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.96
diff -c -p -c -r1.96 pgbench.c
*** contrib/pgbench/pgbench.c	6 Jan 2010 01:30:03 -0000	1.96
--- contrib/pgbench/pgbench.c	31 Jan 2010 01:41:45 -0000
*************** doConnect(void)
*** 345,350 ****
--- 345,352 ----
  	PGconn	   *conn;
  	static char *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host", "port", "options", "tty", "dbname",
+ 							  "user", "password", "application_name", NULL};
  
  	/*
  	 * Start the connection. Loop until we have a password if requested by
*************** doConnect(void)
*** 352,361 ****
  	 */
  	do
  	{
  		new_pass = false;
! 
! 		conn = PQsetdbLogin(pghost, pgport, pgoptions, pgtty, dbName,
! 							login, password);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",
--- 354,373 ----
  	 */
  	do
  	{
+ 		const char *values[] = {
+ 			pghost,
+ 			pgport,
+ 			pgoptions,
+ 			pgtty,
+ 			dbName,
+ 			login,
+ 			password,
+ 			"pgbench",
+ 			NULL
+ 		};
  
  		new_pass = false;
! 		conn = PQconnectdbParams(keywords, values);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",

Index: contrib/vacuumlo/vacuumlo.c
===================================================================
RCS file: /opt/cvsroot_postgresql/pgsql/contrib/vacuumlo/vacuumlo.c,v
retrieving revision 1.44
diff -c -p -c -r1.44 vacuumlo.c
*** contrib/vacuumlo/vacuumlo.c	2 Jan 2010 16:57:33 -0000	1.44
--- contrib/vacuumlo/vacuumlo.c	31 Jan 2010 01:44:55 -0000
*************** vacuumlo(char *database, struct _param *
*** 70,75 ****
--- 70,77 ----
  	int			i;
  	static char *password = NULL;
  	bool		new_pass;
+ 	const char *keywords[] = {"host", "port", "dbname", "user",
+ 							  "password", "application_name", NULL};
  
  	if (param->pg_prompt == TRI_YES && password == NULL)
  		password = simple_prompt("Password: ", 100, false);
*************** vacuumlo(char *database, struct _param *
*** 80,94 ****
  	 */
  	do
  	{
  		new_pass = false;
! 
! 		conn = PQsetdbLogin(param->pg_host,
! 							param->pg_port,
! 							NULL,
! 							NULL,
! 							database,
! 							param->pg_user,
! 							password);
  		if (!conn)
  		{
  			fprintf(stderr, "Connection to database \"%s\" failed\n",
--- 82,99 ----
  	 */
  	do
  	{
+ 		const char *values[] = {
+ 			param->pg_host,
+ 			param->pg_port,
+ 			database,
+ 			param->pg_user,
+ 			password,
+ 			"vacuumlo",
+ 			NULL
+ 		};
  
  		new_pass = false;
!
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo wrote: I'm pretty sure that what you're pointing at is not going to work unless you specify a bunch of other parameters. Ugh? Are you saying there is something wrong in our *official* documentation? It is just a normal compilation command; if you're a C programmer you'll have no problem. Once you ask... people direct you to pgxs. pgxs seems a good choice... a newcomer like me thinks... well that was made exactly to pick up all the information it needs from the environment and build my module. Again, PGXS was developed to make packagers' life easier. Why on earth I want to install my library in a path different from $libdir? Packagers don't. Besides, PGXS won't work on Windows unless you're using an installation built using MinGW. connection, but still different version of pgxs gives different names to .so. What's the problem with that? If it sets up the right name in the sql file that's OK. IMHO, you're trying to complicate a simple process. If you messed up your installation you can always do: $ cd mymodule $ USE_PGXS=1 make uninstall -- Euler Taveira de Oliveira http://www.timbira.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mailing list archiver chewing patches
On 30/01/2010 22:18, Joe Conway wrote: On 01/30/2010 01:14 PM, Dimitri Fontaine wrote: Matteo Beccati p...@beccati.com writes: I've been following the various suggestions. Please take a look at the updated archives proof of concept: http://archives.beccati.org/ I like the features a lot, and the only remarks I can think about are bikeshedding, so I'll leave it to the web team when they integrate it. It sure looks like a when rather than an if as far as I'm concerned. In short, +1! And thanks a lot! +1 here too. That looks wonderful! Thanks guys. Hopefully in the next few days I'll be able to catch up with Alvaro to see how we can proceed on this. Incidentally, I've just found out that the mailing lists are dropping some messages. According to my qmail logs the AOX account never received Joe's message yesterday, nor quite a few others: M156252, M156259, M156262, M156273, M156275 and I've verified that it also has happened before. I don't know why, but I'm pretty sure that my MTA was contacted only once for those messages, while normally I get two connections (my own address + aox address). Cheers -- Matteo Beccati Development Consulting - http://www.beccati.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries
On Sun, Jan 31, 2010 at 09:34, Guillaume Lelarge guilla...@lelarge.info wrote: Hi, I worked on a patch to make PostgreSQL binaries use the new PQconnectdbParams() libpq functions. I tried to mimic the way Joe Conway changed my previous patch. I know I'm way over the deadline for this commitfest. I couldn't do it before because my previous patch (on this commit fest) proposed two methods to do the new connection functions (a one array method, and a two-arrays method). Joe chose the two arrays method. Anyways, I would understand if it gets postponed to the first commitfest for 9.1. I think this can reasonably be seen as the final step of that patch, rather than a completely new feature. Please add it to this CF - we can always remove it if too many others object ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] development setup and libdir
On Sun, 31 Jan 2010 02:44:13 -0200 Euler Taveira de Oliveira eu...@timbira.com wrote: Ivan Sergio Borgonovo escreveu: I'm pretty sure that what you're pointing at is not going to work unless you specify a bunch of other parameters. Ugh? Are you saying there is something wrong in our *official* documentation? It is just a normal compilation command; if you're a C programmer you'll have no problem. Well I think if I didn't have to compile for 8.3 I'd be more naturally guided to a solution since when make install does more than just moving files, you wonder if it is doing even more. I still don't know if make install is doing something more other than moving/renaming files. Still I think the documentation doesn't provide a reasonable path to *try* to write Hello world. Actually there isn't even a suggested path that works unless you knock at pgsql-hackers and ask for details. Of course explanation on how to compile manually an extension without pgxs and a template makefile aren't sufficient. When I'd like to try something new I'd like to put myself in the most diffused, standard environment eg. one thing I'd like to avoid is choosing my own flavour of compile options. So... what's better than providing a template makefile? And there is one, and it works. But postgres makefile read module.sql.in and output module.sql. I'd consider module.sql.in part of the source of the project and I'd think it has to be relocatable without change. Now you use pgsx and it works great... You've your .so there, you look at the pattern used in the .sql, you adapt it and it still does work. Oh look! 8.3 change the .so name at make install. You adapt it once more but that makes you wonder... is make install doing something else? 
It would be better if: a) you document what make install is really doing or b) provided that contrib make install just moves stuff around, you let people specify where they would like to install the lib at make time so a sensible module.sql is produced. b) looks easier to maintain and easier to use. But as said I may have a naïve idea of what it really means to provide a working build system for many architectures/OSes. No rocket science indeed. I'm not crying, I've already written my simple script to just post-process module.sql. I'm describing my experience so you may be aware of the problems that newcomers face. It is up to the people that will write actual code/docs to see if it is worth it for them to do so. As soon as I feel comfortable writing something worthwhile I'll write some docs. Once you ask... people direct you to pgxs. pgxs seems a good choice... a newcomer like me thinks... well that was made exactly to pick up all the information it needs from the environment and build my module. Again, PGXS was developed to make packagers' life easier. Why on earth I want to install my library in a path different from $libdir? Packagers don't. Besides, PGXS won't work on Windows unless you're using an installation built using MinGW. This is something like you bought a car to go from Paris to Berlin but you can't use it to go to Marseilles just because you don't have a map. pgxs and debian packages did a nearly perfect job even for my needs. Being able to compile and install an extension without a full dev environment and without being root has a non-empty set of reasonable applications. Making it easier to automate out of the box doesn't look like a bad thing. Sorry if it seemed I was complaining, I was trying to give feedback while learning my way to write Hello world. -- Ivan Sergio Borgonovo http://www.webthatworks.it -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mailing list archiver chewing patches
On Sat, Jan 30, 2010 at 22:43, Matteo Beccati p...@beccati.com wrote: On 30/01/2010 17:54, Alvaro Herrera wrote: * While I don't personally care, some are going to insist that the site works with Javascript disabled. I didn't try but from your description it doesn't seem like it would. Is this easily fixable? Date sorting works nicely even without JS, while thread sorting doesn't at all. I've just updated the PoC so that thread sorting is not available when JS is not available, while it still is the default otherwise. Hopefully that's enough to keep JS haters happy. I haven't looked at how it actually works, but the general requirement is that it has to *work* without JS. It doesn't have to work *as well*. That means serving up a page with zero contents, or a page that you can't navigate, is not acceptable. Requiring more clicks to get around the navigation and things like that, are ok. * The old monthly interface /list/-mm/msg*php is not really necessary to keep, *except* that we need the existing URLs to redirect to the corresponding new message page. I think we should be able to create a database of URL redirects from the old site, using the Message-Id URL style. So each message accessed using the old URL style would require two redirects, but I don't think this is a problem. Do you agree? Sure. I was just hoping there was an even easier way (restrict to month, order by uid limit 1 offset X). I guess it wouldn't be hard to write a script that populates a backward compatibility table. No need for double redirects, it'd be just a matter of adding a JOIN or two to the query. Once we go into production on this, we'll need to do some serious thinking about the caching issues. And in any such scenario we should very much avoid serving up the same content under different URLs, since it'll blow away cache space for no reason - it's much better to throw a redirect. * We're using Subversion to keep the current code. Is your code version-controlled?
We'd need to import your code there, I'm afraid. I do have a local svn repository. Given it's just a PoC that is going to be rewritten I don't think it should live in the official repo, but if you think it does, I'll be glad to switch. Note that the plan is to switch pgweb to git as well. So if you just want to push the stuff up during development so people can look at it, register for a repository at git.postgresql.org - or just set one up at github which is even easier. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries
Le 31/01/2010 13:39, Magnus Hagander a écrit : On Sun, Jan 31, 2010 at 09:34, Guillaume Lelarge guilla...@lelarge.info wrote: Hi, I worked on a patch to make PostgreSQL binaries use the new PQconnectdbParams() libpq functions. I tried to mimic the way Joe Conway changed my previous patch. I know I'm way over the deadline for this commitfest. I couldn't do it before because my previous patch (on this commit fest) proposed two methods to do the new connection functions (a one array method, and a two-arrays method). Joe chose the two arrays method. Anyways, I would understand if it gets postponed to the first commitfest for 9.1. I think this can reasonably be seen as the final step of that patch, rather than a completely new feature. Please add it to this CF - we can always remove it if too many others object ;) Done (https://commitfest.postgresql.org/action/patch_view?id=278). Thanks. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo wrote: When I'd like to try something new I'd like to put myself in the most diffused, standard environment eg. one thing I'd like to avoid is choosing my own flavour of compile options. This is just nonsense, as we have explained to you several times and you keep ignoring. The standard environment you're talking about will not have the compilation options enabled which are specifically designed to help developers. Why would anyone want to deprive themselves of that support? You have wasted far more of your time writing these emails than it would have taken you to set up a postgres development environment which you could have used *with* pgxs, because, despite being told what pgxs is for, you keep trying to make it do something it was not designed for. If anyone is going from London to Paris via Marseilles it is you. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mailing list archiver chewing patches
On 31/01/2010 13:45, Magnus Hagander wrote: On Sat, Jan 30, 2010 at 22:43, Matteo Beccati p...@beccati.com wrote: On 30/01/2010 17:54, Alvaro Herrera wrote: * While I don't personally care, some are going to insist that the site works with Javascript disabled. I didn't try but from your description it doesn't seem like it would. Is this easily fixable? Date sorting works nicely even without JS, while thread sorting doesn't at all. I've just updated the PoC so that thread sorting is not available when JS is not available, while it still is the default otherwise. Hopefully that's enough to keep JS haters happy. I haven't looked at how it actually works, but the general requirement is that it has to *work* without JS. It doesn't have to work *as well*. That means serving up a page with zero contents, or a page that you can't navigate, is not acceptable. Requiring more clicks to get around the navigation and things like that, are ok. As it currently stands, date sorting is the default and there are no links to the thread view, which would otherwise look empty. We can surely build a non-JS thread view as well, I'm just not sure if it's worth the effort. * The old monthly interface /list/-mm/msg*php is not really necessary to keep, *except* that we need the existing URLs to redirect to the corresponding new message page. I think we should be able to create a database of URL redirects from the old site, using the Message-Id URL style. So each message accessed using the old URL style would require two redirects, but I don't think this is a problem. Do you agree? Sure. I was just hoping there was an even easier way (restrict to month, order by uid limit 1 offset X). I guess it wouldn't be hard to write a script that populates a backward compatibility table. No need for double redirects, it'd be just a matter of adding a JOIN or two to the query. Once we go into production on this, we'll need to do some serious thinking about the caching issues.
And in any such scenario we should very much avoid serving up the same content under different URLs, since it'll blow away cache space for no reason - it's much better to throw a redirect. Yes, that was my point. A single redirect to the only URL for the message. * We're using Subversion to keep the current code. Is your code version-controlled? We'd need to import your code there, I'm afraid. I do have a local svn repository. Given it's just a PoC that is going to be rewritten I don't think it should live in the official repo, but if you think it does, I'll be glad to switch. Note that the plan is to switch pgweb to git as well. So if you just want to push the stuff up during development so people can look at it, register for a repository at git.postgresql.org - or just set one up at github which is even easier. The only reason why I used svn is that git support in netbeans is rather poor, or at least that was my impression. I think it won't be a problem to move to git, I probably just need some directions ;) Cheers -- Matteo Beccati Development Consulting - http://www.beccati.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Memory leak in deferrable index constraints
Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be freed otherwise lots of lists (one per row) will build up and not be freed until the end of the query. This actually accounts for even more memory than the after-trigger event queue. Patch attached. Of course the after-trigger queue still uses a lot of memory for large updates (I was working on a patch for that but ran out of time before this commitfest started). This fix at least brings deferred index constraints into line with FK constraints, in terms of memory usage. Regards, Dean memory_leak.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak in deferrable index constraints
Dean Rasheed dean.a.rash...@googlemail.com writes: Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be freed otherwise lots of lists (one per row) will build up and not be freed until the end of the query. This actually accounts for even more memory than the after-trigger event queue. Patch attached. It seems a bit unlikely that this would be the largest memory leak in that area. Can you show a test case that demonstrates this is worth worrying about? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)
Hitoshi Harada umi.tan...@gmail.com writes: As far as I know hypothetical set function is used to do what-if analysis. rank(val1) within group (order by sk1) chooses the rank value so that val1 is equivalent to or just greater than sk1 when you calculate rank() over (partition by group order by sk1) within the group. Hmm. I found this in SQL:2008 4.15: The hypothetical set functions are related to the window functions RANK, DENSE_RANK, PERCENT_RANK, and CUME_DIST, and use the same names, though with a different syntax. These functions take an argument A and an ordering of a value expression VE. VE is evaluated for all rows of the group. This collection of values is augmented with A; the resulting collection is treated as a window partition of the corresponding window function whose window ordering is the ordering of the value expression. The result of the hypothetical set function is the value of the eponymous window function for the hypothetical row that contributes A to the collection. It appears that the syntax is meant to be hypothetical_function(A) WITHIN GROUP (VE) However this really ought to imply that A contains no variables of the current query, and I don't see such a restriction mentioned anywhere --- maybe an oversight in the spec? If A does contain a variable then there is no unique value to append as the single additional row. I still say that Oracle are completely wrong to have adopted this syntax for listagg, because per spec it does something different than what listagg needs to do. In particular it should mean that the listagg argument can't contain variables --- which is what they want for the delimiter, perhaps, but not for the expression to be concatenated. In other words, the queries can be the same: SELECT array_agg(val ORDER BY sk) FROM ... SELECT array_agg(val) WITHIN GROUP (ORDER BY sk) FROM ... One more time: THOSE DON'T MEAN THE SAME THING. 
If we ever get around to implementing the hypothetical set functions, we would be very unhappy to have introduced such a bogus equivalence. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory leak in deferrable index constraints
On 31 January 2010 16:03, Tom Lane t...@sss.pgh.pa.us wrote: Dean Rasheed dean.a.rash...@googlemail.com writes: Oops, my fault. The list returned by ExecInsertIndexTuples() needs to be freed otherwise lots of lists (one per row) will build up and not be freed until the end of the query. This actually accounts for even more memory than the after-trigger event queue. Patch attached. It seems a bit unlikely that this would be the largest memory leak in that area. Can you show a test case that demonstrates this is worth worrying about? If I do

create table foo(a int unique deferrable initially deferred);
insert into foo (select * from generate_series(1, 1000));
begin;
update foo set a=a+1;
set constraints all immediate;
commit;

and watch the backend memory usage during the UPDATE, it grows to around 970MB and then falls back to 370MB as soon as the UPDATE command finishes. During the whole SET CONSTRAINTS trigger execution it then remains constant at 370MB. With this patch, it never grows beyond the 370MB occupied by the after-triggers queue. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] odd output in initdb
On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote: Andrew Dunstan wrote: initializing dependencies ... WARNING: pgstat wait timeout WARNING: pgstat wait timeout ok vacuuming database template1 ... WARNING: pgstat wait timeout WARNING: pgstat wait timeout ok copying template1 to template0 ... WARNING: pgstat wait timeout I can't see an obvious culprit in the commit logs right off. Actually, on close inspection it looks to me like this commit: Create typedef pgsocket for storing socket descriptors. http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=ea1a4463e9de9662b7c9e13374ec8c7b92ff2f19 could well be the culprit. I'm not claiming it's not, but what exactly points to that? Does the problem go away if you move to a version before that? Because I'm 99% sure I saw it well before that commit, and we've had reports on it from 8.4 as well, I'm not so sure... But it may be that that commit made something more likely to happen... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries
Guillaume Lelarge guilla...@lelarge.info writes: */ do { + const char *values[] = { + my_opts->hostname, + my_opts->port, + my_opts->dbname, + my_opts->username, + password, + "oid2name", + NULL + }; + new_pass = false; Is that really legal C89 syntax? I seem to recall that array constructors can only be used for static assignments with older compilers. Also, as a matter of style, I find it pretty horrid that this isn't immediately adjacent to the keywords array that it MUST match. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] odd output in initdb
Magnus Hagander mag...@hagander.net writes: On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote: initializing dependencies ... WARNING: pgstat wait timeout I'm not claiming it's not, but what exactly points to that? Does the problem go away if you move to a version before that? Because I'm 99% sure I saw it well before that commit, and we've had reports on it from 8.4 as well, I'm not so sure... But it may be that that commit made something more likely to happen... I notice pgstat_send is still using if (pgStatSock < 0) to detect PGINVALID_SOCKET ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pathological regexp match
On Sat, Jan 30, 2010 at 08:07, Tom Lane t...@sss.pgh.pa.us wrote: Michael Glaesemann michael.glaesem...@myyearbook.com writes: We came across a regexp that takes very much longer than expected. I poked into this a little bit. What seems to be happening is that the use of non-greedy quantifiers plus backreferences turns off most of the optimization that the regexp engine usually does, leaving the RE as a tree of possibilities that is explored in a fairly dumb fashion. In particular, there is a loop in crevdissect() that tries to locate a feasible division point between two concatenated sub-patterns, and for each try, a recursive call to crevdissect() tries to locate a feasible division point between *its* two sub-patterns, resulting in O(N^2) runtime. With a not very pleasant constant factor, too, because of repeated startup/shutdown costs for the DFA matching engine. I found a possible patch that seems to improve matters: AFAICS the DFA matching is independent of the checking that cdissect() and friends do, so we can apply that check first instead of second. This brings the runtime down from minutes to sub-second on my machine. However I don't have a whole lot of faith either in the correctness of this change, or that it might not make some other cases slower instead of faster. Has anyone got a reasonably messy collection of regexps they'd like to try this patch on? I only have the one, so I don't think I can help all that much with that. BTW, I filed the problem upstream with the Tcl project https://sourceforge.net/tracker/?func=detail&aid=2942697&group_id=10894&atid=110894 but I'm not going to hold my breath waiting for useful advice from them. Since Henry Spencer dropped off the net, there doesn't seem to be anybody there who understands that code much better than we do. Also, we likely still ought to toss some CHECK_FOR_INTERRUPTS calls in there ... Yeah.
I have zero experience around that code, so if you have at least some, I'd much appreciate it if you (or someone who does) could look at it. Likely to cause a lot less breakage than me :D -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_listener entries deleted under heavy NOTIFY load only on Windows
On Fri, Jan 22, 2010 at 22:30, Radu Ilie ri...@wsi.com wrote: On a Windows server under heavy load of NOTIFY events, entries in the pg_listener table for some events are deleted. It is like UNLISTEN was called. PostgreSQL version: 8.3.9. Operating System: Windows XP. PostgreSQL believes that if it fails to notify a listener (by signaling the respective backend), then the backend doesn't exist anymore and so it should get rid of the pg_listener entry. The relevant code is in src/backend/commands/async.c, function Send_Notify:

if (kill(listenerPID, SIGUSR2) < 0)
{
    /*
     * Get rid of pg_listener entry if it refers to a PID that no
     * longer exists. Presumably, that backend crashed without
     * deleting its pg_listener entries. This code used to only
     * delete the entry if errno==ESRCH, but as far as I can see
     * we should just do it for any failure (certainly at least
     * for EPERM too...)
     */
    simple_heap_delete(lRel, &lTuple->t_self);
}

The problem is that under Windows, kill can fail even if the process is still alive. PostgreSQL uses named pipes under Windows to send signals to backends. The present implementation has a bug that causes a client to fail to write data to the named pipe, even though the server process is alive. This is because the server doesn't maintain the named pipe at all times. The bug is present in file src/backend/port/win32/signal.c, function pg_signal_thread. The server code stays in a loop in which it continuously creates an instance of the named pipe (via CreateNamedPipe) and waits for a client process to connect (via ConnectNamedPipe). Once a client connects, the communication with the client is handled in a new thread, with the thread procedure pg_signal_dispatch_thread. This function is very simple: it reads one byte from the named pipe, it writes it back and (very important) closes the handle to the named pipe instance. The main loop creates another instance of the named pipe and waits for another client to connect.
Now imagine that the server is under heavy load. There are dozens of backends and threads running and the CPU usage is close to 100%. The following succession of events is possible:

1. The server signal handling thread (in function pg_signal_thread) creates the first, one and only instance of the named pipe via CreateNamedPipe.
2. Server code starts waiting for clients to connect with ConnectNamedPipe.
3. A client wishes to make a transaction on the named pipe and calls CallNamedPipe (in file src/port/kill.c, function pgkill).
4. Server code returns from ConnectNamedPipe. It creates a new thread with the thread procedure pg_signal_dispatch_thread.
5. The signal dispatch thread is scheduled for execution and runs to completion. The last thing it does related to the named pipe is to close the handle via CloseHandle (in function pg_signal_dispatch_thread). This closes the last instance of the named pipe: the named pipe is gone. The signal handling thread was not yet scheduled by the operating system for execution and thus didn't have an opportunity to call CreateNamedPipe.
6. Another client (or the same one, it doesn't matter) tries to write to the named pipe via CallNamedPipe. The call returns ERROR_FILE_NOT_FOUND, because the named pipe is gone. The client believes the backend is gone and removes the entry from pg_listener.
7. The signal handling thread (in function pg_signal_thread) is finally scheduled for execution and calls CreateNamedPipe. We now have an instance of the named pipe available.

So we end up with the server backend alive, the named pipe there, but the row gone from pg_listener. This is easy to reproduce under Windows. I used the scripts posted by Steve Marshall in a similar thread from 01/15/2009 and the problem appears within one minute every time. For testing I used a Windows XP machine with 2 cores and 2GB of RAM. The CPU usage was over 70% during the trials.
Thanks for a *very* detailed analysis. I've finally managed to reproduce this problem, and test your fix. Interestingly enough, the race condition is copied from Microsoft's example of how to write a multi-threaded pipe server. That doesn't make it any less real of course :-) The solution is to create a new instance of the named pipe before launching the signal dispatch thread. This means changing the code in src/backend/port/win32/signal.c to look like this: I have made some minor stylistic changes to this, such as some missing error reporting, and added a lot of comments to explain why we do this, and applied. If you can, please verify that the tip of CVS for your version solves the problem in your real world scenario, and not just my reproduction test case. This will guarantee that we have an instance of the named pipe available at any given moment. If we do this, we can also remove the 3 tries loop from src/port/kill.c: I have removed these on HEAD only, and I'm
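The reordering that fixes the race can be sketched as follows. This is an illustration only, with abbreviated argument lists (the actual committed code differs and adds error handling); function names follow signal.c and the Win32 API. The key invariant: at no point in time are there zero instances of the named pipe.

```c
/* Sketch only -- Windows-specific, not the committed code.
 * Fixed loop: after ConnectNamedPipe returns, create the replacement
 * pipe instance BEFORE handing the just-connected instance to the
 * dispatch thread (which will close its handle when done), so at least
 * one instance of the named pipe always exists for CallNamedPipe clients. */
pipe = CreateNamedPipe(pipename, PIPE_ACCESS_DUPLEX, ...); /* initial instance */
for (;;)
{
    if (ConnectNamedPipe(pipe, NULL) || GetLastError() == ERROR_PIPE_CONNECTED)
    {
        HANDLE newpipe = CreateNamedPipe(pipename, PIPE_ACCESS_DUPLEX, ...);
        _beginthreadex(NULL, 0, pg_signal_dispatch_thread, pipe, 0, NULL);
        pipe = newpipe;     /* the replacement becomes the listening instance */
    }
}
```

With this ordering, step 5 of the race above can no longer destroy the last instance, because the replacement was created before the dispatch thread was launched.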
Re: [HACKERS] odd output in initdb
On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Fri, Jan 29, 2010 at 23:28, Andrew Dunstan and...@dunslane.net wrote: initializing dependencies ... WARNING: pgstat wait timeout I'm not claiming it's not, but what exactly points to that? Does the problem go away if you move to a version before that? Because I'm 99% sure I saw it well before that commit, and we've had reports on it from 8.4 as well, I'm not so sure... But it may be that that commit made something more likely to happen... I notice pgstat_send is still using if (pgStatSock < 0) to detect PGINVALID_SOCKET ... That's certainly so, but that shouldn't have any effect on this - since on that platform it's defined to -1 anyway. But I'll go ahead and fix it :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS/SR and smart shutdown
On Sat, Jan 30, 2010 at 01:05, Robert Haas robertmh...@gmail.com wrote: On Fri, Jan 29, 2010 at 7:01 PM, Josh Berkus j...@agliodbs.com wrote: It's a good question if that still makes sense with Hot Standby. Perhaps we should redefine smart shutdown in standby mode to shut down as soon as all read-only connections have died. It's clear that smart shutdown doesn't work while something is active. Recovery is active and so we shouldn't shut down. It makes sense, it works like this already, let's leave it. Document it if needed. I don't think it's clear, or intuitive for users. In SR, recovery is *never* done, so smart shutdown never completes (even if the master is shut down, when I tested it). This is particularly an important issue when you consider that some/many service and init scripts only use smart shutdown ... so we'll get a lot of bug reports of "postgresql does not shut down". Absolutely agreed. The existing smart shutdown behavior makes sense from a certain point of view, but it doesn't seem very... what's the word I'm looking for?... smart. Yeah. How about we change it so it's not the default anymore? The fact is that for most applications, it's just broken. Consider any application that uses connection pooling, which happens to be what we recommend people to do. A smart shutdown will never shut that server down. But it will make it not accept new connections. Which is probably the worst possible behavior in most cases. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
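For readers following along, the three shutdown modes under discussion are selected with pg_ctl's standard -m switch; init scripts that pass no mode get smart. (Illustration only; requires a running cluster with PGDATA set.)

```shell
# The three pg_ctl shutdown modes:
pg_ctl stop -D "$PGDATA" -m smart      # wait for all clients to disconnect (default; the mode at issue here)
pg_ctl stop -D "$PGDATA" -m fast       # disconnect clients, roll back open transactions, shut down
pg_ctl stop -D "$PGDATA" -m immediate  # abort without checkpoint; crash recovery runs at next start
```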
Re: [HACKERS] odd output in initdb
Magnus Hagander mag...@hagander.net writes: On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote: I notice pgstat_send is still using if (pgStatSock < 0) to detect PGINVALID_SOCKET ... That's certainly so, but that shouldn't have any effect on this - since on that platform it's defined to -1 anyway. But I'll go ahead and fix it :-) I was more worried about the other way around: could a *valid* handle be 0? Although it's still not clear why it'd only recently have started being a problem. regards, tom lane
Re: [HACKERS] odd output in initdb
On Sun, Jan 31, 2010 at 18:33, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Sun, Jan 31, 2010 at 17:56, Tom Lane t...@sss.pgh.pa.us wrote: I notice pgstat_send is still using if (pgStatSock < 0) to detect PGINVALID_SOCKET ... That's certainly so, but that shouldn't have any effect on this - since on that platform it's defined to -1 anyway. But I'll go ahead and fix it :-) I was more worried about the other way around: could a *valid* handle be 0? Although it's still not clear why it'd only recently have started being a problem. I don't think it can. And no, I can't figure out why that should've changed recently because of this, since we've checked against that before. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries
On 31/01/2010 17:35, Tom Lane wrote: Guillaume Lelarge guilla...@lelarge.info writes:

*/ do {
+ const char *values[] = {
+     my_opts->hostname,
+     my_opts->port,
+     my_opts->dbname,
+     my_opts->username,
+     password,
+     "oid2name",
+     NULL
+ };
+ new_pass = false;

Is that really legal C89 syntax? I don't really know. gcc (4.4.1 release) didn't complain about it, whereas it complained with a warning for not-legal syntax when I had the new_pass = false; statement before the array declaration. I seem to recall that array constructors can only be used for static assignments with older compilers. Also, as a matter of style, I find it pretty horrid that this isn't immediately adjacent to the keywords array that it MUST match. I don't find that horrid. AFAICT, that's the only advantage of the two-arrays method. By the way, it's that kind of code (keywords declaration separated from values declaration) that got committed in the previous patch (http://archives.postgresql.org/pgsql-committers/2010-01/msg00398.php). I merely used the same code for the other binaries. -- Guillaume. http://www.postgresqlfr.org http://dalibo.com
Re: [HACKERS] Using the new libpq connection functions in PostgreSQL binaries
On 01/31/2010 09:42 AM, Guillaume Lelarge wrote: I don't find that horrid. AFAICT, that's the only advantage of the two-arrays method. By the way, it's that kind of code (keywords declaration separated from values declaration) that got committed in the previous patch (http://archives.postgresql.org/pgsql-committers/2010-01/msg00398.php). I merely used the same code for the other binaries. Yes, I separated them, because otherwise the compiler complained about the declaration not being at the top of a block. Of course Tom's other complaint and this one can both be satisfied by not doing the static assignment in the declaration. I'll fix the already committed code and take a look at refactoring this latest patch. I stand by the two-arrays method decision though -- I find combining them into a single array to be unseemly. Joe
Re: [HACKERS] mailing list archiver chewing patches
On Sun, Jan 31, 2010 at 15:09, Matteo Beccati p...@beccati.com wrote: On 31/01/2010 13:45, Magnus Hagander wrote: On Sat, Jan 30, 2010 at 22:43, Matteo Beccati p...@beccati.com wrote: On 30/01/2010 17:54, Alvaro Herrera wrote: * While I don't personally care, some are going to insist that the site works with Javascript disabled. I didn't try but from your description it doesn't seem like it would. Is this easily fixable? Date sorting works nicely even without JS, while thread sorting doesn't at all. I've just updated the PoC so that thread sorting is not available when JS is not available, while it still is the default otherwise. Hopefully that's enough to keep JS haters happy. I haven't looked at how it actually works, but the general requirement is that it has to *work* without JS. It doesn't have to work *as well*. That means serving up a page with zero contents, or a page that you can't navigate, is not acceptable. Requiring more clicks to get around the navigation and things like that are OK. As it currently stands, date sorting is the default and there are no links to the thread view, which would otherwise look empty. We can surely build a non-JS thread view as well, I'm just not sure if it's worth the effort. Hmm. I personally think we need some level of thread support for non-JS as well, that's at least not *too* much of a step backwards from what we have now. But others may have other thoughts about that? * We're using Subversion to keep the current code. Is your code version-controlled? We'd need to import your code there, I'm afraid. I do have a local svn repository. Given it's just a PoC that is going to be rewritten I don't think it should live in the official repo, but if you think it does, I'll be glad to switch. Note that the plan is to switch pgweb to git as well. So if you just want to push the stuff up during development so people can look at it, register for a repository at git.postgresql.org - or just set one up at github which is even easier.
The only reason why I used svn is that git support in netbeans is rather poor, or at least that was my impression. I think it won't be a problem to move to git, I probably just need some directions ;) :-) Well, it doesn't matter what type of repo it's in at this point, only once it goes into production. The reason I suggested git at this point is that we (the postgresql project) do provide git hosting at git.postgresql.org, but we don't provide subversion anywhere. And I'm certainly not going to suggest you use pgfoundry and cvs -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Memory leak in deferrable index constraints
Dean Rasheed dean.a.rash...@googlemail.com writes: On 31 January 2010 16:03, Tom Lane t...@sss.pgh.pa.us wrote: It seems a bit unlikely that this would be the largest memory leak in that area. Can you show a test case that demonstrates this is worth worrying about? create table foo(a int unique deferrable initially deferred); insert into foo (select * from generate_series(1, 1000)); begin; update foo set a=a+1; set constraints all immediate; commit; Thanks. I had forgotten all the work we put into minimizing the size of the deferred trigger queue. In this example it's only 16 bytes per entry, whereas a 1-element List is going to involve 16 bytes for the header, 8 bytes for the cell, plus two palloc item overheads --- and double all that on a 64-bit machine. So yeah, this is a significant leak. Patch applied. regards, tom lane
Re: [HACKERS] Hot Standby and VACUUM FULL
Simon Riggs wrote: On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The last item on my list before close is making VACUUM FULL and Hot Standby play nicely together. The options to do this were and still are: (1) Add WAL messages for non-transactional relcache invalidations (2) Allow system relations to be cluster-ed/vacuum full-ed. (1) was how we did it originally and I believe it worked without problem. We saw the opportunity to do (2) and it has been worth exploring. Refresh my memory on why (1) lets us avoid fixing (2)? (1) allows us to retain VACUUM FULL INPLACE for system relations, thus avoiding the need to do (2). Non-transactional invalidations need to be replayed in recovery for the same reason they exist on the primary. It sounds like a kluge at best ... (2) isn't a necessary change right now. It is the best design going forwards, but its burst radius stretches far beyond Hot Standby. There is enough code in HS for us to support, so adding to it makes little sense for me, in this release, since there is no functional benefit in doing so. Well, it'll avoid having to support the kludges in HS required for VACUUM FULL INPLACE. I'm in favor of (2), unless some unforeseen hard problems come up with implementing the ideas on that discussed earlier. Yeah, that's pretty vague, but basically I think it's worth spending some more time doing (2), than doing (1). For one, if the plan is to do (2) in next release anyway, we might as well do it now and avoid having to support the HS+VACUUM FULL INPLACE combination in only 9.0 stable branch for years to come. I believe we had a pretty well-thought out plan on how to support system catalogs with the new VACUUM FULL. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs si...@2ndquadrant.com writes: On Fri, 2010-01-29 at 14:04 -0500, Tom Lane wrote: WTF? Simon, this seems to be getting way way beyond anything the community has agreed to. Particularly, inventing GUCs is not something to be doing without consensus. If you or anybody else would like me to revoke it then I am happy to do that, with no problem or argument. I agree it hasn't had discussion, though am happy to have such a discussion. The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. regards, tom lane
Re: [HACKERS] Hot Standby and VACUUM FULL
On Sun, 2010-01-31 at 21:00 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The last item on my list before close is making VACUUM FULL and Hot Standby play nicely together. The options to do this were and still are: (1) Add WAL messages for non-transactional relcache invalidations (2) Allow system relations to be cluster-ed/vacuum full-ed. (1) was how we did it originally and I believe it worked without problem. We saw the opportunity to do (2) and it has been worth exploring. Refresh my memory on why (1) lets us avoid fixing (2)? (1) allows us to retain VACUUM FULL INPLACE for system relations, thus avoiding the need to do (2). Non-transactional invalidations need to be replayed in recovery for the same reason they exist on the primary. It sounds like a kluge at best ... (2) isn't a necessary change right now. It is the best design going forwards, but its burst radius stretches far beyond Hot Standby. There is enough code in HS for us to support, so adding to it makes little sense for me, in this release, since there is no functional benefit in doing so. Well, it'll avoid having to support the kludges in HS required for VACUUM FULL INPLACE. I'm in favor of (2), unless some unforeseen hard problems come up with implementing the ideas on that discussed earlier. Yeah, that's pretty vague, but basically I think it's worth spending some more time doing (2), than doing (1). For one, if the plan is to do (2) in next release anyway, we might as well do it now and avoid having to support the HS+VACUUM FULL INPLACE combination in only 9.0 stable branch for years to come. That's a good argument, but with respect, it isn't you who is writing the code, nor will it be you that supports it, AIUI. Right now, HS is isolated in many ways. 
If we introduce this change it will affect everybody and that means I'll be investigating all manner of bug reports that have zip to do with HS for a long time to come. What I would say is that for 9.0 we can easily remove the INPLACE option as an explicit request. I believe we had a pretty well-thought out plan on how to support system catalogs with the new VACUUM FULL. I think calling it a well thought out plan is, err, overstating things. We had what looked like a viable sketch of how to proceed and I agreed to investigate that. Having done so, I'm saying I don't like what I see going further and wish to backtrack to my known safe solution. Overall, I don't see any benefit in pursuing that course any further. I just see risk, on balance with (2), whereas (1) seems easier/faster, less risk and more isolated. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Hot Standby and VACUUM FULL
Simon Riggs si...@2ndquadrant.com writes: On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The options to do this were and still are: (1) Add WAL messages for non-transactional relcache invalidations (2) Allow system relations to be cluster-ed/vacuum full-ed. Refresh my memory on why (1) lets us avoid fixing (2)? (1) allows us to retain VACUUM FULL INPLACE for system relations, thus avoiding the need to do (2). Non-transactional invalidations need to be replayed in recovery for the same reason they exist on the primary. Well, I would expect that invalidation events need to be transmitted to hot-standby slaves no matter what --- backends running queries on an HS slave need to hear about inval events just as much as backends on the master do. So my take on it is that all inval events will have to have associated WAL records when in HS mode, independently of what we choose to do about VACUUM. Anyway, it's still not apparent to me exactly what the connection is between VACUUM FULL and Hot Standby. I remember that we said HS didn't work with VACUUM FULL (INPLACE) but I don't recall why that is, and the links on the open-items pages are not leading me to any useful discussion. regards, tom lane
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. I will remove the parameter then, keeping the augmentation. That OK? -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Hot Standby and VACUUM FULL
Simon Riggs si...@2ndquadrant.com writes: Having said that I now realise a few things I didn't before: * Approach (2) effects the core of Postgres, even if you don't use Hot Standby. * I've had to remove 7 sanity checks to get the first few VACUUMs working. ISTM that removing various basic checks in the code is not a good thing. * There are are more special cases than I realised at first: temp, shared, with-toast, nailed, shared-and-nailed, pg_class, normal system. Quite honestly, these statements and the attached patch (which doesn't even begin to touch the central issue, but does indeed break a lot of things) demonstrate that *you* are not the guy to implement what was being discussed. It needs to be done by someone who understands the core caching code, which apparently you haven't studied in any detail. I have a feeling that I should go do this... regards, tom lane
Re: [HACKERS] development setup and libdir
Hi, Ivan Sergio Borgonovo m...@webthatworks.it writes: But considering that what it's really missing between what I need and what I get is renaming a file... it's just a pain I've to set up a whole new instance of postgres, install the whole source etc... Untrue. Get the sources with git clone, then:

./configure --prefix=~/pgsql/master --enable-debug --enable-cassert
make
make install

Now you can add ~/pgsql/master to your PATH, initdb then pg_ctl start, and use PGXS. All will work just fine, and against a *devel* env. Not a production one. That's a clear *bonus*. But well again... considering I'm a rename away from being able to compile and test an extension with my user privileges... it's just a pity it doesn't work that way out of the box. I think you're mixing up the development needs with the packaging ones. You have to keep separating them even when doing both. Another scenario could be I could just svn update on a staging box, compile there and then move everything to production easier. Not every shop is a multimillion dollar enterprise that can afford a dev/staging/production box for every version of postgres it is using. If you don't need to squeeze out every cpu cycle in a production box you may be happy with a pre-packaged postgres. You want pre-packaged stuff, but that does not mean you develop them on production servers. Virtual machines and chroot etc. are cheap. still I think my need isn't that weird and could lower the entry point for many developers quite a bit. Well you're only missing non-root make install with *packaged* PostgreSQL. As it stands, the 2 options are not compatible. But I fail to see how much it's a burden now that it's easy to develop on a dedicated *virtual* machine, e.g. [1] and yeah I'll need dbg symbols but that's going to happen later. Development environment and production one are different. You seem to be wanting to ease the development on production environment. I'm not convinced this is a good way of approaching the problem.
Regards, -- dim
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs wrote: On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. I will remove the parameter then, keeping the augmentation. That OK? Well, how much is the actual hit with this on the master for different workloads? Do we have realistic numbers on that? Also, how much of an actual win is it in the other direction - as in, under what circumstances and workloads does it help in avoiding superfluous cancellations on the standby? Stefan
Re: [HACKERS] Hot Standby and VACUUM FULL
On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Sat, 2010-01-30 at 15:17 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The options to do this were and still are: (1) Add WAL messages for non-transactional relcache invalidations (2) Allow system relations to be cluster-ed/vacuum full-ed. Refresh my memory on why (1) lets us avoid fixing (2)? (1) allows us to retain VACUUM FULL INPLACE for system relations, thus avoiding the need to do (2). Non-transactional invalidations need to be replayed in recovery for the same reason they exist on the primary. Well, I would expect that invalidation events need to be transmitted to hot-standby slaves no matter what --- backends running queries on an HS slave need to hear about inval events just as much as backends on the master do. So my take on it is that all inval events will have to have associated WAL records when in HS mode, independently of what we choose to do about VACUUM. All transactional invalidations are already handled by HS. It was the non-transactional invalidations in VACUUM FULL INPLACE that still require additional handling. Anyway, it's still not apparent to me exactly what the connection is between VACUUM FULL and Hot Standby. I remember that we said HS didn't work with VACUUM FULL (INPLACE) but I don't recall why that is, and the links on the open-items pages are not leading me to any useful discussion. Very little really; not enough to force the sort of changes that I am now seeing will be required in the way catalogs and caches operate. There was some difficulty around the fact that VFI issues two commits for the same transaction, but that is now correctly handled in the code after discussion. I'm not known as a risk-averse person but (2) is a step too far for me. 
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo m...@webthatworks.it writes: Being able to compile and install an extension without a full dev environment is nonsense. and without being root That's easy on a development environment, but you keep insisting on not having one. Now that's your problem. Sorry if it seemed I was complaining, I was trying to give feedback while learning my way to write Hello world. You seem to have a pretty clear idea of how you want things to work, and that has nothing to do with how they do. So you're confused at first, then you want to fix a non existent problem. Please step back, breathe, and look at your goal and problem again. Regards, -- dim
Re: [HACKERS] Hot Standby and VACUUM FULL
On Sun, 2010-01-31 at 14:44 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: Having said that I now realise a few things I didn't before: * Approach (2) effects the core of Postgres, even if you don't use Hot Standby. * I've had to remove 7 sanity checks to get the first few VACUUMs working. ISTM that removing various basic checks in the code is not a good thing. * There are are more special cases than I realised at first: temp, shared, with-toast, nailed, shared-and-nailed, pg_class, normal system. Quite honestly, these statements and the attached patch (which doesn't even begin to touch the central issue, but does indeed break a lot of things) demonstrate that *you* are not the guy to implement what was being discussed. It needs to be done by someone who understands the core caching code, which apparently you haven't studied in any detail. I didn't claim the attached patch began to touch the issues. I was very clear that it covered only the very simplest use case, that was the point. You may not wish to acknowledge it, but I *have* studied the core caching code in detail for many months and that is the basis for my comments. The way I have written the exclusions in vacuum.c shows that I have identified each of the sub-cases we are required to handle. There is nothing wrong with your idea of using a mapping file. That is the relatively easy part of the problem. I have a feeling that I should go do this... If you wish, but I still think it is an unnecessary change for this release, whoever does it. We both know that once you start you won't stop, but that doesn't make it worthwhile or less risky. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote: Simon Riggs wrote: On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. I will remove the parameter then, keeping the augmentation. That OK? Well how much is the actual hit with this on the master for different workloads do we have realistic numbers on that? Also how much of an actual win is it in the other direction - as in under what circumstances and workloads does it help in avoiding superflous cancelations on the standby? At the moment a btree delete record will cancel every request: (1) no matter how long they have been running, and (2) no matter if they haven't accessed the index being cleaned (they might later, is the thinking...). This change improves case (1) in that it will only remove queries that are older than the oldest snapshot on the primary, should max_standby_delay be exceeded. Case (2) would have been improved by my other proposed patch, should max_standby_delay be exceeded. The cost of improving case (1) is that we do the equivalent of taking a snapshot of the procarray while holding locks on the btree block being cleaned. That will increase index contention somewhat in applications that do btree deletes, i.e. non-HOT index updates or deletes. Greg Stark's comments have been that he wants to see max_standby_delay = -1 put back, which would effectively prevent both (1) and (2) from occurring. I am working on that now. -- Simon Riggs www.2ndQuadrant.com
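For context, the standby-side knob being argued over would look like this in the standby's postgresql.conf. Illustrative values only, reflecting the semantics as discussed in this thread (the -1 setting is the one Greg Stark asked to have put back):

```
# Standby-side recovery conflict tuning (illustration only)
max_standby_delay = 30s   # cancel conflicting standby queries once replay
                          # has been held up this long
#max_standby_delay = -1   # never cancel: queries always win, replay may
                          # stall indefinitely behind long-running queries
```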
Re: [HACKERS] Hot Standby and VACUUM FULL
Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote: Anyway, it's still not apparent to me exactly what the connection is between VACUUM FULL and Hot Standby. I remember that we said HS didn't work with VACUUM FULL (INPLACE) but I don't recall why that is, and the [ sorry, I meant not-INPLACE ] links on the open-items pages are not leading me to any useful discussion. Very little really; not enough to force the sort of changes that I am now seeing will be required in the way catalogs and caches operate. There was some difficulty around the fact that VFI issues two commits for the same transaction, but that is now correctly handled in the code after discussion. If the only benefit of getting rid of VACUUM FULL were simplifying Hot Standby, I'd agree with you. But there are numerous other benefits. The double-commit hack you mention is something we need to get rid of for general system stability (because of the risk of PANIC if the vacuum fails after the first commit). Getting rid of REINDEX-in-place on shared catalog indexes is another thing that's really safety critical. Removing V-F related hacks in other places would just be a bonus. It's something we need to do, so if Hot Standby is forcing our hands, then let's just do it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby and VACUUM FULL
On Sun, 2010-01-31 at 15:14 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Sun, 2010-01-31 at 14:35 -0500, Tom Lane wrote: Anyway, it's still not apparent to me exactly what the connection is between VACUUM FULL and Hot Standby. I remember that we said HS didn't work with VACUUM FULL (INPLACE) but I don't recall why that is, and the [ sorry, I meant not-INPLACE ] links on the open-items pages are not leading me to any useful discussion. Very little really; not enough to force the sort of changes that I am now seeing will be required in the way catalogs and caches operate. There was some difficulty around the fact that VFI issues two commits for the same transaction, but that is now correctly handled in the code after discussion. If the only benefit of getting rid of VACUUM FULL were simplifying Hot Standby, I'd agree with you. But there are numerous other benefits. The double-commit hack you mention is something we need to get rid of for general system stability (because of the risk of PANIC if the vacuum fails after the first commit). Getting rid of REINDEX-in-place on shared catalog indexes is another thing that's really safety critical. Removing V-F related hacks in other places would just be a bonus. It's something we need to do, so if Hot Standby is forcing our hands, then let's just do it. That's the point: Hot Standby is *not* forcing our hand to do this. Doing this will not simplify Hot Standby in any significant way. The code to support VFI with Hot Standby is, after technical review, much, much simpler than the code to remove VFI. I'll do a little work towards step (1) just so we can take a more informed view once you've had a better look at just what (2) involves. I had already written the code for the Sept release of HS. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating VACUUM FULL WAS: remove flatfiles.c
Back in September I wrote: ... The sticking point --- not only for pg_class, but for shared catalogs such as pg_database --- is the lack of a way to track relfilenode if it ever changes. What if we keep the relfilenode of these critical tables someplace else? For instance, we could have a map file in each database holding the relfilenode of pg_class, and one in $PGDATA/global holding the relfilenodes of the shared catalogs and indexes. It'd be possible to update a map file atomically via the rename(2) trick. Then we teach relcache or some similar place to believe the map files over the contents of pg_class. Thinking about this some more, I can see one small disadvantage: for the relations that we use the map file for, pg_class.relfilenode would not be trustworthy. This would not affect most of the system internals (which will be looking at the relcache's copy, which would be kept valid by the relcache code). But it would affect user queries, such as for example attempts to use contrib/oid2name to identify a file on-disk. The main case where pg_class.relfilenode would be likely to be out-of-sync is for shared catalogs. We could keep it up to date in most cases for local catalogs, but there's no hope of reaching into other databases' pg_class when a shared catalog is relocated. What I'd suggest doing about this is: (1) Store zero in pg_class.relfilenode for those catalogs for which the map is used. This at least makes it obvious that the value you're looking at isn't valid. (2) Provide a SQL function to extract the real relfilenode of any specified pg_class entry. We'd have to modify oid2name and pg_dump to know to use the function instead of looking at the column. There might be some other client-side code that would be broken until it got taught about the function, but hopefully not much. Thoughts? 
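[Editorial note: the rename(2) trick mentioned above can be sketched as follows — a minimal illustration, not PostgreSQL code; the map-file name, format, and helper name are invented. The key point is that POSIX rename() replaces the target atomically, so a concurrent reader sees either the whole old map or the whole new map, never a torn one.]

```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

/*
 * Write the new map to a temporary file, fsync it, then rename() it
 * over the old one.  Names and format are hypothetical.
 */
static int
update_map_file(const char *path, const char *data, size_t len)
{
    char    tmp[1024];
    int     fd;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);

    fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* the map must be durable before we make it visible */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmp);
        return -1;
    }

    /* the atomic swap: readers never see a half-written map */
    return rename(tmp, path);
}
```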
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby and VACUUM FULL
On Sun, 2010-01-31 at 15:14 -0500, Tom Lane wrote: If the only benefit of getting rid of VACUUM FULL were simplifying Hot Standby, I'd agree with you. But there are numerous other benefits. The double-commit hack you mention is something we need to get rid of for general system stability (because of the risk of PANIC if the vacuum fails after the first commit). Getting rid of REINDEX-in-place on shared catalog indexes is another thing that's really safety critical. Removing V-F related hacks in other places would just be a bonus. I should've agreed with this in my last post, cos I do. I want very, very much to get rid of VACUUM FULL just because it's such a sump of ugly, complex code. But there is a limit to how and when to perform what I now see is a rather major surgical operation. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs si...@2ndquadrant.com writes: At the moment a btree delete record will cancel every request 1. no matter how long they have been running 2. no matter if they haven't accessed the index being cleaned (they might later, is the thinking...) That seems seriously horrid. What is the rationale for #2 in particular? I would hope that at worst this would affect sessions that are actively competing for the index being cleaned. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ordered aggregates using WITHIN GROUP (was Re: can somebody execute this query on Oracle 11.2g and send result?)
2010/2/1 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: In other words, the queries can be the same:
SELECT array_agg(val ORDER BY sk) FROM ...
SELECT array_agg(val) WITHIN GROUP (ORDER BY sk) FROM ...
One more time: THOSE DON'T MEAN THE SAME THING. If we ever get around to implementing the hypothetical set functions, we would be very unhappy to have introduced such a bogus equivalence. I completely agree. Although Oracle's syntax can express ordered aggregates, introducing such syntax now would make it quite complicated to implement the hypothetical set functions in the future, given those syntactic restrictions and that design. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: At the moment a btree delete record will cancel every request 1. no matter how long they have been running 2. no matter if they haven't accessed the index being cleaned (they might later, is the thinking...) That seems seriously horrid. What is the rationale for #2 in particular? I would hope that at worst this would affect sessions that are actively competing for the index being cleaned. That is exactly the feedback I received from many other people and why I prioritised the relation-specific conflict patch. It's worse than that, because point 2 also affects WAL cleanup records for the heap. The rationale is that a session *might* in the future access a table, and if it did so it would receive the wrong answer *potentially*. This is the issue I have been discussing for a long time now, in various forms, starting on-list in Aug 2008. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs wrote: On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote: Simon Riggs wrote: On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. I will remove the parameter then, keeping the augmentation. That OK? Well, how much is the actual hit with this on the master for different workloads? Do we have realistic numbers on that? Also, how much of an actual win is it in the other direction - as in, under what circumstances and workloads does it help in avoiding superfluous cancellations on the standby? At the moment a btree delete record will cancel every request 1. no matter how long they have been running 2. no matter if they haven't accessed the index being cleaned (they might later, is the thinking...) This change improves case (1) in that it will only remove queries that are older than the oldest snapshot on the primary, should max_standby_delay be exceeded. Case (2) would have been improved by my other proposed patch, should max_standby_delay be exceeded. The cost of improving case (1) is that we do the equivalent of taking a snapshot of the procarray while holding locks on the btree block being cleaned. That will increase index contention somewhat in applications that do btree deletes, i.e. non-HOT index updates or deletes. Hmm, makes sense from the HS side - but I think to make a case for a GUC it would help a lot to see actual numbers (as in benchmarks) showing how much of a hit it is to the master. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote: Hmm, makes sense from the HS side - but I think to make a case for a GUC it would help a lot to see actual numbers (as in benchmarks) showing how much of a hit it is to the master. Agreed, though my approach to benchmarking was to provide the mechanism by which it was possible to benchmark. I didn't presume to be able to cover a wide area with benchmarking tests just for this. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs wrote: On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote: Hmm, makes sense from the HS side - but I think to make a case for a GUC it would help a lot to see actual numbers (as in benchmarks) showing how much of a hit it is to the master. Agreed, though my approach to benchmarking was to provide the mechanism by which it was possible to benchmark. I didn't presume to be able to cover a wide area with benchmarking tests just for this. I don't think this patch helps much though. It's setting latestRemovedXid to GetOldestXmin(), which is still a very pessimistic estimate. In fact, if there's only one transaction running in the master, it's not any better than just setting it to InvalidTransactionId and killing all active queries in the standby. What we'd want to set it to is the xmin/xmax of the oldest heap tuple whose pointer was removed from the index page. That could be much, much older than GetOldestXmin(), allowing many more read-only queries to live in the standby. IIRC it was Greg Stark who suggested, last time this was discussed, that we could calculate the exact value for latestRemovedXid in the standby. When replaying the deletion record, the standby could look at all the heap tuples whose index pointers are being removed, to see which one was newest. That can be pretty expensive, involving random I/O, but it gives an exact value, and doesn't stress the master at all. And we could skip it if there are no potentially conflicting read-only queries running. That seems like the most user-friendly approach to me. Even though random I/O is expensive, surely it's better than killing queries. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
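[Editorial note: the suggestion above can be sketched as follows — a toy model, not PostgreSQL code. Real code would fetch each removed index pointer's heap tuple and compare xids with a wraparound-safe TransactionIdFollows(); here the removed tuples' xmin/xmax values are simply passed in as an array, and plain comparison stands in for the wraparound-safe one.]

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/*
 * Return the newest xid among the heap tuples whose index pointers are
 * being removed by the deletion record.  Only standby queries whose
 * snapshots could still see a transaction up to this value conflict
 * with the record; everyone else can keep running.
 */
static TransactionId
latest_removed_xid(const TransactionId *tuple_xids, size_t ntuples)
{
    TransactionId latest = InvalidTransactionId;

    for (size_t i = 0; i < ntuples; i++)
        if (tuple_xids[i] > latest)     /* wraparound ignored in this sketch */
            latest = tuple_xids[i];
    return latest;
}
```

The random I/O cost mentioned above comes from visiting each heap tuple to read those xids in the first place, which is why the message suggests skipping the computation when no conflicting read-only query is running.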
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: IIRC it was Greg Stark who suggested last time this was discussed that we could calculate the exact value for latestRemovedXid in the standby. When replaying the deletion record, the standby could look at all the heap tuples whose index pointers are being removed, to see which one was newest. This assumes that the standby can tell which heap tuples are being removed, which I'm not sure is true --- consider the case where the deletion record is carrying a page image instead of a list of deleted tuples. But maybe we could rejigger the representation to make sure that the info is always available. In general it sounds like a much nicer answer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hot Standby and deadlock detection
Greg Stark has requested that I re-add max_standby_delay = -1. I deferred that in favour of relation-specific conflict resolution, though that seems too major a change, judging from comments received. As discussed in various other posts, in order to re-add the -1 option we need to add deadlock detection. I woke up today with a simplifying assumption and have worked out a solution, the easy parts of which I have committed earlier. Part #2 is to make the Startup process do deadlock detection. I attach a WIP patch for comments, since signal handling has been a much-discussed area in recent weeks. Normal deadlock detection waits for deadlock_timeout before doing the detection. That is a simple performance tuning mechanism which I think is probably unnecessary with hot standby, at least in the first instance. The way this would work is: if Startup waits on a buffer pin, we immediately send out a request to all backends to cancel themselves if they are holding the buffer pin required while waiting on a lock. We then sleep until max_standby_delay. When max_standby_delay = -1 we only sleep until deadlock_timeout and then check (on the Startup process). That keeps the signal handler code simple and reduces the number of test cases required to confirm everything is solid. This patch and the last commit together present everything we need to re-enable max_standby_delay = -1, so that change is included here also.
-- Simon Riggs www.2ndQuadrant.com

*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***************
*** 272,277 **** procsignal_sigusr1_handler(SIGNAL_ARGS)
--- 272,280 ----
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_SNAPSHOT);

+ 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK))
+ 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);

*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***************
*** 127,132 **** WaitExceedsMaxStandbyDelay(void)
--- 127,135 ----
  	long		delay_secs;
  	int			delay_usecs;

+ 	if (MaxStandbyDelay == -1)
+ 		return false;
+
  	/* Are we past max_standby_delay? */
  	TimestampDifference(GetLatestXLogTime(), GetCurrentTimestamp(),
  						&delay_secs, &delay_usecs);
***************
*** 372,378 **** ResolveRecoveryConflictWithBufferPin(void)
  	 * Signal immediately or set alarm for later.
  	 */
  	if (MaxStandbyDelay == 0)
! 		SendRecoveryConflictWithBufferPin();
  	else
  	{
  		TimestampTz now;
--- 375,391 ----
  	 * Signal immediately or set alarm for later.
  	 */
  	if (MaxStandbyDelay == 0)
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
! 	else if (MaxStandbyDelay == -1)
! 	{
! 		/*
! 		 * Set the deadlock timer only, we don't timeout in any other case
! 		 */
! 		if (enable_standby_deadlock_alarm())
! 			sig_alarm_enabled = true;
! 		else
! 			elog(FATAL, "could not set timer for process wakeup");
! 	}
  	else
  	{
  		TimestampTz now;
***************
*** 386,392 **** ResolveRecoveryConflictWithBufferPin(void)
  							&standby_delay_secs, &standby_delay_usecs);

  		if (standby_delay_secs >= MaxStandbyDelay)
! 			SendRecoveryConflictWithBufferPin();
  		else
  		{
  			TimestampTz fin_time;	/* Expected wake-up time by timer */
--- 399,405 ----
  							&standby_delay_secs, &standby_delay_usecs);

  		if (standby_delay_secs >= MaxStandbyDelay)
! 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  		else
  		{
  			TimestampTz fin_time;	/* Expected wake-up time by timer */
***************
*** 394,399 **** ResolveRecoveryConflictWithBufferPin(void)
--- 407,419 ----
  			int			timer_delay_usecs = 0;

+ 			/*
+ 			 * Send out a request to check for buffer pin deadlocks before we wait.
+ 			 * This is fairly cheap, so no need to wait for deadlock timeout before
+ 			 * trying to send it out.
+ 			 */
+ 			SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+
  			/*
  			 * How much longer we should wait?
  			 */
  			timer_delay_secs = MaxStandbyDelay - standby_delay_secs;
***************
*** 435,449 **** ResolveRecoveryConflictWithBufferPin(void)
  }

  void
! SendRecoveryConflictWithBufferPin(void)
  {
  	/*
  	 * We send signal to all backends to ask them if they are holding
  	 * the buffer pin which is delaying the Startup process. We must
  	 * not set the conflict flag yet, since most backends will be innocent.
  	 * Let the SIGUSR1 handling in each backend decide their own fate.
  	 */
! 	CancelDBBackends(InvalidOid, PROCSIG_RECOVERY_CONFLICT_BUFFERPIN, false);
  }

--- 455,472 ----
  }

  void
! SendRecoveryConflictWithBufferPin(ProcSignalReason reason)
  {
+ 	Assert(reason == PROCSIG_RECOVERY_CONFLICT_BUFFERPIN ||
+ 		   reason ==
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote: IIRC it was Greg Stark who suggested last time this was discussed that we could calculate the exact value for latestRemovedXid in the standby. When replaying the deletion record, the standby could look at all the heap tuples whose index pointers are being removed, to see which one was newest. That can be pretty expensive, involving random I/O, but it gives an exact value, and doesn't stress the master at all. And we could skip it if there's no potentially conflicting read-only queries running. That seems like the most user-friendly approach to me. Even though random I/O is expensive, surely it's better than killing queries. Best solution, no time to do it. Should I revoke this change? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs si...@2ndquadrant.com writes: Should I revoke this change? Please. We can always put it back later if nothing better gets implemented. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo wrote: Of course I can write a script that can workaround this. It seems that the only thing missing is that pgxs 8.3 used to prefix .so with lib and then rename them at install time, but pgxs 8.4 build them directly without prefix. I'm just speculating this is the issue and it is the only one I still have to solve... but... what's going to happen next release? Wouldn't it be better if make install could install stuff where I ask so I could put modules in different places *even* for the same installation of postgres? FWIW the soon-to-be-released PostGIS 1.5 has an out of place regression testing feature that allows PostGIS to be built using PGXS and regression tested without putting anything in the PostgreSQL installation directory itself. It's a little bit of a hack, but take a look at the PGXS makefile and the regression makefile to see how it all works: http://trac.osgeo.org/postgis/browser/trunk/postgis/Makefile.in http://trac.osgeo.org/postgis/browser/trunk/regress/Makefile.in HTH, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Eliminating VACUUM FULL WAS: remove flatfiles.c
On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Back in September I wrote: ... The sticking point --- not only for pg_class, but for shared catalogs such as pg_database --- is the lack of a way to track relfilenode if it ever changes. What if we keep the relfilenode of these critical tables someplace else? For instance, we could have a map file in each database holding the relfilenode of pg_class, and one in $PGDATA/global holding the relfilenodes of the shared catalogs and indexes. It'd be possible to update a map file atomically via the rename(2) trick. Then we teach relcache or some similar place to believe the map files over the contents of pg_class. Thinking about this some more, I can see one small disadvantage: for the relations that we use the map file for, pg_class.relfilenode would not be trustworthy. This would not affect most of the system internals (which will be looking at the relcache's copy, which would be kept valid by the relcache code). But it would affect user queries, such as for example attempts to use contrib/oid2name to identify a file on-disk. The main case where pg_class.relfilenode would be likely to be out-of-sync is for shared catalogs. We could keep it up to date in most cases for local catalogs, but there's no hope of reaching into other databases' pg_class when a shared catalog is relocated. What I'd suggest doing about this is: (1) Store zero in pg_class.relfilenode for those catalogs for which the map is used. This at least makes it obvious that the value you're looking at isn't valid. (2) Provide a SQL function to extract the real relfilenode of any specified pg_class entry. We'd have to modify oid2name and pg_dump to know to use the function instead of looking at the column. There might be some other client-side code that would be broken until it got taught about the function, but hopefully not much. Thoughts? 
Seems reasonable to me (assuming there's no way to avoid changing the relfilenode, which I assume is the case but don't actually know the code well enough to say with certainty). ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: Should I revoke this change? Please. Will do, in morning. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG?] strange behavior in ALTER TABLE ... RENAME TO on inherited columns
(2010/01/30 3:36), Robert Haas wrote: 2010/1/28 KaiGai Koheikai...@ak.jp.nec.com: (2010/01/29 9:58), KaiGai Kohei wrote: (2010/01/29 9:29), Robert Haas wrote: 2010/1/28 KaiGai Koheikai...@ak.jp.nec.com: (2010/01/29 0:46), Robert Haas wrote: 2010/1/27 KaiGai Koheikai...@ak.jp.nec.com: Hmm, indeed, this logic (V3/V5) is busted. The idea of the V4 patch can also handle this case correctly, although it performs worse. I wonder whether that cost in performance is really unacceptable, or not. Basically, I assume ALTER TABLE RENAME/TYPE are not frequent operations, and I don't think this bugfix will damage the reputation of PostgreSQL. Where should we go next? Isn't the problem here just that the following comment is 100% wrong?

/*
 * Unlike find_all_inheritors(), we need to walk on child relations
 * that have a diamond inheritance tree, because this function has to
 * return the correct expected inhcount to the caller.
 */

It seems to me that the right solution here is to just add one more argument to find_all_inheritors(), something like List **expected_inh_count. Am I missing something? The find_all_inheritors() does not walk on child relations more than once, even if a child has multiple parents inherited from a common origin, because list_concat_unique_oid() ignores the given OID if it is already on the list. It means that the child relations under a relation already walked on are not checked anywhere. (Of course, this assumption is correct for the purpose of find_all_inheritors() with minimum cost.) What we want to do here is to compute the number of times a certain child relation is inherited from a common origin; that shall be the expected inhcount. So, we need an arrangement of the logic. For example, see the following diagram.

     T2
    /  \
 T1     T4---T5
    \  /
     T3

If we call find_all_inheritors() with T1, find_inheritance_children() returns T2 and T3 for T1. Then, it calls find_inheritance_children() for T2, and it returns T4.
Then, it calls find_inheritance_children() for T3, and it returns T4, but T4 is already in rels_list, so list_concat_unique_oid() ignores it. Then, it calls find_inheritance_children() for T4, and it returns T5. In this example, we want the expected inhcount for T2 and T3 to be 1, and for T4 and T5 to be 2. However, it walks on T4 and T5 only once, so they will incorrectly get 1. Even if we count the ignored OID (T4), find_all_inheritors() does not walk on T5, because T5 was obviously already walked on when T4 was ignored. I think the count for T5 should be 1, and I think that the count for T4 can easily be made to be 2 by coding the algorithm correctly. Ahh, that is right. I was confused. Is it possible to define the logic in a mathematically strict way? Now I'm considering whether the find_all_inheritors() logic is suitable for all cases, or not. I modified the logic to compute an expected inhcount for the child relations. What we want to compute here is the number of times a certain relation is inherited directly from any of the relations derived from a unique origin. If the column to be renamed is eventually inherited from a common origin, its attinhcount is not larger than the expected inhcount.

WITH RECURSIVE r AS (
    SELECT 't1'::regclass AS inhrelid
    UNION ALL
    SELECT c.inhrelid FROM pg_inherits c, r
     WHERE r.inhrelid = c.inhparent
)
-- r is all the child relations inherited from 't1'
SELECT inhrelid::regclass, count(*) FROM pg_inherits
 WHERE inhparent IN (SELECT inhrelid FROM r)
 GROUP BY inhrelid;

The modified logic increments the expected inhcount of a relation that has already been walked on. If we find a child relation twice or more, it means the child relation is at a junction of the inheritance tree. In this case, we don't walk on its child relations any more, but that is not necessary anyway, because the first pass already checked them.
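[Editorial note: the expected-inhcount rule for the T1..T5 example above can be modelled in a few lines of C — a toy model with invented names; the real code walks pg_inherits. First compute the set of relations inheriting from the root, then count, for each relation, the direct-inheritance edges arriving from inside that set.]

```c
#include <stdbool.h>
#include <stddef.h>

#define NREL 5          /* relations T1..T5, numbered 0..4 */

typedef struct { int parent, child; } InhEdge;

static void
expected_inhcount(const InhEdge *edges, size_t nedges,
                  int root, int counts[NREL])
{
    bool    reachable[NREL] = {false};
    bool    changed = true;

    /* transitive closure: mark every relation inheriting from root */
    reachable[root] = true;
    while (changed)
    {
        changed = false;
        for (size_t i = 0; i < nedges; i++)
            if (reachable[edges[i].parent] && !reachable[edges[i].child])
            {
                reachable[edges[i].child] = true;
                changed = true;
            }
    }

    /* count direct-inheritance edges whose parent lies inside the set */
    for (int r = 0; r < NREL; r++)
        counts[r] = 0;
    for (size_t i = 0; i < nedges; i++)
        if (reachable[edges[i].parent])
            counts[edges[i].child]++;
}
```

For the diamond in the example this yields 1 for T2, T3 and T5, and 2 for T4, which matches the corrected expectation in the thread (T4 is reached via both T2 and T3, T5 only via T4).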
The find_all_inheritors() is called from several points other than renameatt(), so I added find_all_inheritors_with_inhcount(), which takes an argument for the expected inhcount list. And find_all_inheritors() was also modified to call find_all_inheritors_with_inhcount() with a NULL argument, to avoid code duplication. Here are the results of Bernd's test case:

- CVS HEAD:      0.895s 0.903s 0.884s 0.896s 0.892s
- with V6 patch: 0.972s 0.941s 0.961s 0.949s 0.946s

Well, that seems a lot better. Unfortunately, I can't commit this code: it's mind-bogglingly ugly. I don't believe that duplicating find_all_inheritors is the right solution (a point I've mentioned previously), and it's certainly not right to use typeName->location as a place to store an unrelated attribute inheritance count. There is also at least one superfluous variable renaming; and the recursion
[HACKERS] Aggressive freezing versus caching of pg_proc entries
There are various places in the backend that rely on comparing a catalog tuple's TID and XMIN to values saved in a cache entry in order to detect whether the tuple changed since the cache entry was made. (So far as I can find, the only places that do this are looking at pg_proc entries --- fmgr.c as well as each of the standard PLs use this technique for checking validity of function-related cache entries.) It strikes me that aggressive freezing of pg_proc entries could break this logic; where aggressive means freezing a tuple in less than the inter-reference time of somebody's cache entry. Consider a sequence like this:
1. A pg_proc tuple is frozen, so its xmin = FrozenXID.
2. Somebody caches a function definition based on the tuple.
3. Someone else updates the tuple twice; the second update by chance puts the updated tuple back at its original TID (quite likely with HOT).
4. Aggressive freeze of pg_proc sets the tuple's xmin back to FrozenXID.
5. The first somebody uses the function again. He'll see same TID and XMIN as before, hence fail to realize tuple is changed.
Another path to failure is that the tuple could be dropped entirely, then replaced by one that unluckily has same OID and is placed at same TID. Again, freezing destroys all trace that this happened. The reason this occurred to me is that I was thinking about the consequences of applying cluster-like VACUUM FULL to pg_proc. That creates a third failure path: a single update followed by clustering that unluckily drops tuple back at its old TID. But as shown above, we're already at risk without that. I'm inclined to think that we should get rid of this caching method in favor of having fmgr.c and the PLs hook into sinval cache flush callbacks. It's not high priority, but given that various people have advocated aggressive freezing policies, it seems there's some risk in that.
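[Editorial note: the validity test being described, and the way freezing defeats it, can be sketched like this — simplified structures with invented names; the real code compares an ItemPointerData and the tuple header's xmin.]

```c
#include <stdint.h>
#include <stdbool.h>

#define FrozenTransactionId ((uint32_t) 2)

typedef struct
{
    uint32_t    block;
    uint16_t    offset;
} ItemPointer;

typedef struct
{
    ItemPointer tid;        /* where the pg_proc tuple lives */
    uint32_t    xmin;       /* inserting transaction of that tuple */
} FuncCacheKey;

/* the check fmgr.c and the PLs rely on: same TID and same xmin */
static bool
cache_entry_still_valid(FuncCacheKey cached, FuncCacheKey current)
{
    return cached.tid.block == current.tid.block &&
           cached.tid.offset == current.tid.offset &&
           cached.xmin == current.xmin;
}
```

The failure mode in the sequence above is that once two different tuple versions both carry FrozenTransactionId and land at the same TID, this check can no longer tell them apart, which is why the message proposes sinval flush callbacks instead.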
I also wonder if it wouldn't be better to centralize this logic instead of having five different implementations of it (or more --- likely some third-party PLs have copied that logic...) Anyway, not proposing to fix this now, but maybe it should be on the TODO list. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] odd output in initdb
Magnus Hagander wrote: Actually, on close inspection it looks to me like this commit: Create typedef pgsocket for storing socket descriptors. http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=ea1a4463e9de9662b7c9e13374ec8c7b92ff2f19 could well be the culprit. I'm not claiming it's not, but what exactly points to that? Does the problem go away if you move to a version before that? Because I'm 99% sure I saw it well before that commit, and we've had reports on it from 8.4 as well, I'm not so sure... But it may be that that commit made something more likely to happen...

The buildfarm logs say otherwise. This was first seen on Jan 10, at least on my Windows animals:

    pgbfprod=# select sysname, min(snapshot) from build_status_log
               where sysname ~ 'bat' and log_stage ~ 'initdb'
               and log_text ~ 'pgstat' group by sysname;
     sysname  |         min
    ----------+---------------------
     dawn_bat | 2010-01-10 17:30:02
     red_bat  | 2010-01-10 23:30:01
    (2 rows)

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] contrib\xml2 package's xpath_table function in PostgreSQL
Hi, We have a huge system based on PostgreSQL with xml2 functions. From the PostgreSQL 8.4 documentation, F.38.1. Deprecation notice, it looks like those functions are removed. However, our solution is very huge, and heavily depends on them.

1. Could you please give me some instructions to get them back in PostgreSQL 8.4?
2. Could you please also give me some details on whether we can find similar functions to replace them in PostgreSQL 8.4?

Thank you in advance. Best, Michael

PS. Attached, copied from the PostgreSQL 8.4 documentation: F.38. xml2 The xml2 module provides XPath querying and XSLT functionality. F.38.1. Deprecation notice From PostgreSQL 8.3 on, there is XML-related functionality based on the SQL/XML standard in the core server. That functionality covers XML syntax checking and XPath queries, which is what this module does, and more, but the API is not at all compatible. It is planned that this module will be removed in PostgreSQL 8.4 in favor of the newer standard API, so you are encouraged to try converting your applications. If you find that some of the functionality of this module is not available in an adequate form with the newer API, please explain your issue to pgsql-hackers@postgresql.org so that the deficiency can be addressed.
Re: [HACKERS] mailing list archiver chewing patches
Matteo Beccati wrote: Incidentally, I've just found out that the mailing lists are dropping some messages. According to my qmail logs the AOX account never received Joe's message yesterday, nor quite a few others: M156252, M156259, M156262, M156273, M156275 and I've verified that it also has happened before. I don't know why, but I'm pretty sure that my MTA was contacted only once for those messages, while normally I get two connections (my own address + aox address). Hmm, I see it here: http://archives.postgresql.org/message-id/4B64A238.1050307%40joeconway.com Maybe it was just delayed? -- Alvaro Herrera        http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HS/SR and smart shutdown
On Sat, Jan 30, 2010 at 12:54 PM, Fujii Masao masao.fu...@gmail.com wrote: HOWEVER, I do believe this is an issue we could live with for 9.0 if it's going to lead to a whole lot of additional debugging of SR. But if it's an easy fix, it'll avoid a lot of complaints on pgsql-general. I think that the latter statement is right. Though we've not reached consensus on smart shutdown during recovery yet, I wrote the patch that changes its behavior: shut down the server (including the startup process and the walreceiver) as soon as all read-only connections have died. The code is also available in the 'replication' branch in my git repository. And, let's discuss whether something like the attached patch is required for v9.0 or not. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/backend/postmaster/postmaster.c --- b/src/backend/postmaster/postmaster.c *** *** 278,283 typedef enum --- 278,284 PM_RECOVERY_CONSISTENT, /* consistent recovery mode */ PM_RUN, /* normal database is alive state */ PM_WAIT_BACKUP,/* waiting for online backup mode to end */ + PM_WAIT_READONLY, /* waiting for read only backends to exit */ PM_WAIT_BACKENDS, /* waiting for live backends to exit */ PM_SHUTDOWN,/* waiting for bgwriter to do shutdown ckpt */ PM_SHUTDOWN_2,/* waiting for archiver and walsenders to finish */ *** *** 2165,2171 pmdie(SIGNAL_ARGS) /* and the walwriter too */ if (WalWriterPID != 0) signal_child(WalWriterPID, SIGTERM); ! pmState = PM_WAIT_BACKUP; } /* --- 2166,2173 /* and the walwriter too */ if (WalWriterPID != 0) signal_child(WalWriterPID, SIGTERM); ! /* online backup mode is active only when normal processing */ ! pmState = (pmState == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY; } /* *** *** 2840,2845 PostmasterStateMachine(void) --- 2842,2870 } /* + * If we are in a state-machine state that implies waiting for read only + * backends to exit, see if they're all gone, and change state if so. 
+ */ + if (pmState == PM_WAIT_READONLY) + { + /* + * PM_WAIT_READONLY state ends when we have no regular backends that + * have been started during recovery. Since those backends might be + * waiting for the WAL record that conflicts with their queries to be + * replayed, recovery and replication need to remain until all read + * only backends have been gone away. + */ + if (CountChildren(BACKEND_TYPE_NORMAL) == 0) + { + if (StartupPID != 0) + signal_child(StartupPID, SIGTERM); + if (WalReceiverPID != 0) + signal_child(WalReceiverPID, SIGTERM); + pmState = PM_WAIT_BACKENDS; + } + } + + /* * If we are in a state-machine state that implies waiting for backends to * exit, see if they're all gone, and change state if so. */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG?] strange behavior in ALTER TABLE ... RENAME TO on inherited columns
(2010/02/01 8:41), KaiGai Kohei wrote:
(2010/01/30 3:36), Robert Haas wrote:
2010/1/28 KaiGai Kohei kai...@ak.jp.nec.com:
(2010/01/29 9:58), KaiGai Kohei wrote:
(2010/01/29 9:29), Robert Haas wrote:
2010/1/28 KaiGai Kohei kai...@ak.jp.nec.com:
(2010/01/29 0:46), Robert Haas wrote:
2010/1/27 KaiGai Kohei kai...@ak.jp.nec.com:

Hmm, indeed, this logic (V3/V5) is busted. The idea of the V4 patch can also handle this case correctly, although it performs worse. I wonder whether that performance cost is really unacceptable, or not. Basically, I assume ALTER TABLE RENAME/TYPE are not frequent operations, and I don't think this bugfix will damage the reputation of PostgreSQL. Where should we go next?

Isn't the problem here just that the following comment is 100% wrong?

    /*
     * Unlike find_all_inheritors(), we need to walk on child relations
     * that have diamond inheritance tree, because this function has to
     * return correct expected inhcount to the caller.
     */

It seems to me that the right solution here is to just add one more argument to find_all_inheritors(), something like List **expected_inh_count. Am I missing something?

The find_all_inheritors() does not walk on child relations more than two times, even if a child has multiple parents inherited from a common origin, because list_concat_unique_oid() ignores the given OID if it is already on the list. It means all the child relations under the relation already walked on are not checked anywhere. (Of course, this assumption is correct for the purpose of find_all_inheritors() with minimum cost.)

What we want to do here is to compute the number of times a certain child relation is inherited from a common origin; it shall be the expected inhcount. So, we need an arrangement to the logic. For example, see the following diagram.

        T2
       /  \
     T1    T4---T5
       \  /
        T3

If we call find_all_inheritors() with T1, find_inheritance_children() returns T2 and T3 for T1. 
Then, it calls find_inheritance_children() for T2, and it returns T4. Then, it calls find_inheritance_children() for T3, and it returns T4 again, but it is already in rels_list, so list_concat_unique_oid() ignores it. Then, it calls find_inheritance_children() for T4, and it returns T5.

In this example, we want the expected inhcount for T2 and T3 to be 1, and for T4 and T5 to be 2. However, it walks on T4 and T5 only once, so they will incorrectly have 1. Even if we count up the ignored OID (T4), find_all_inheritors() does not walk on T5, because it was obviously already walked on when T4 was ignored.

I think the count for T5 should be 1, and I think that the count for T4 can easily be made to be 2 by coding the algorithm correctly.

Ahh, that is right. I was confused. Is it possible to introduce the logic with mathematical strictness? Now I'm considering whether the find_all_inheritors() logic is suitable for all cases, or not.

I modified the logic to compute an expected inhcount of the child relations. What we want to compute here is the number of times a certain relation is inherited directly from any relations derived from a unique origin. If the column to be renamed is eventually inherited from a common origin, its attinhcount is not larger than the expected inhcount.

    WITH RECURSIVE r AS (
        SELECT 't1'::regclass AS inhrelid
        UNION ALL
        SELECT c.inhrelid FROM pg_inherits c, r
         WHERE r.inhrelid = c.inhparent
    )
    -- r is all the child relations inherited from 't1'
    SELECT inhrelid::regclass, count(*) FROM pg_inherits
     WHERE inhparent IN (SELECT inhrelid FROM r)
     GROUP BY inhrelid;

The modified logic increments the expected inhcount of a relation already walked on. If we find a child relation twice or more, it means the child relation is at a junction of the inheritance tree. In this case, we don't walk on its child relations any more, but that is not necessary, because the first pass already checked them. 
find_all_inheritors() is called from several points other than renameatt(), so I added find_all_inheritors_with_inhcount(), which takes an argument for the expected inhcount list. And find_all_inheritors() was also modified to call find_all_inheritors_with_inhcount() with a NULL argument to avoid code duplication.

Here is the result of Bernd's test case:

    - CVS HEAD:      0.895s 0.903s 0.884s 0.896s 0.892s
    - with V6 patch: 0.972s 0.941s 0.961s 0.949s 0.946s

Well, that seems a lot better. Unfortunately, I can't commit this code: it's mind-bogglingly ugly. I don't believe that duplicating find_all_inheritors is the right solution (a point I've mentioned previously), and it's certainly not right to use typeName->location as a place to store an unrelated attribute inheritance count. There is also at least one superfluous variable renaming; and the recursion
Re: [HACKERS] Pathological regexp match
I wrote: I found a possible patch that seems to improve matters: AFAICS the DFA matching is independent of the checking that cdissect() and friends do, so we can apply that check first instead of second. This brings the runtime down from minutes to sub-second on my machine. However I don't have a whole lot of faith either in the correctness of this change, or that it might not make some other cases slower instead of faster. Has anyone got a reasonably messy collection of regexps they'd like to try this patch on? The Tcl folk accepted that patch, so I went ahead and applied it to our code. It would still be a good idea for us to do any testing we can on it, though. Also, we likely still ought to toss some CHECK_FOR_INTERRUPTS calls in there ... I looked at this a bit and decided that it would be messier than seems justified in the absence of known problems. The regex code uses malloc not palloc for transient data structures, so we can't just stick a CHECK_FOR_INTERRUPTS() into some inner loop hotspot --- throwing a longjmp would mean a permanent memory leak. I looked at integrating into the error mechanism it does have, but it turns out that that's not particularly designed to force immediate exit on failure either; they just set a flag bit that will be tested whenever control does return from the match. I think that a good fix would require first changing the mallocs to pallocs, then add CHECK_FOR_INTERRUPTS. But that's not something I have time to mess with at the moment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Robert Haas robertmh...@gmail.com wrote:

ok - I am the only one who likes the original behavior - so I am withdrawing. Robert, if you like, please commit Itagaki's patch. + add a note about the behavior when the second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check that the docs are good enough.

I removed a test pattern for variable delimiters from the regression tests because it might be undefined behavior. We might change the behavior in the future, so we guarantee only constant delimiters.

The transition functions are renamed to string_agg_transfn and string_agg_delim_transfn. We cannot use string_agg_transfn for both names because we cast the function name as regproc in bootstrap; it complains about duplicated function names.

Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby and VACUUM FULL
Simon Riggs si...@2ndquadrant.com writes: I'll do a little work towards step (1) just so we can take a more informed view once you've had a better look at just what (2) involves.

I spent a couple of hours reading code and believe that I've identified all the pain points for allowing relocation of system catalogs (ie, assigning them new relfilenodes during cluster-style VACUUM FULL, REINDEX, etc). It's definitely not a trivial project but it's not out of reach either --- I estimate I could get it done in a couple or three days of full-time effort. Given the potential benefits I think it's worth doing. Rough notes are attached --- comments?

regards, tom lane

Change pg_class.relfilenode to be 0 for all rels for which a map file should be used instead. Define RelationGetFilenode() to look to the physaddr info instead, and make all internal refs go to that instead of reading rd_rel->relfilenode directly. Define pg_relation_filenode(regclass) and fix external refs (oid2name, pg_dump) to use that instead.

In zeroth cut, just have relcache.c substitute the OID in place of reading a map file. Possibly it should always do that during bootstrap. How should bootstrap decide which rels to insert zero for, anyway? Shared definitely, pg_class, ... maybe that's enough? Or do we need it for all nailed local catalogs?

local state contains
* shared map list
* this database's map list
* list of local overrides to each on-disk map list

At commit: if modified, write out new version; do this as late as possible before xact commit
At abort: just discard local-override list

Write must include generating a WAL entry as well as sending a shared-inval signal:
* write temp file, fsync it
* emit WAL record containing copy of new file contents, fsync it
* atomically rename temp file into place
* send sinval

During relation open, check for sinval before relying on current cached value of map contents.

Hm, what about concurrent updates of map? 
Probably instantiate a new LWLock or maybe better a HW lock. So write involves taking lock, check for updates, finally write --- which means that local modifications have to be stored in a way that allows re-reading somebody else's mods. Hence above data structure with separate storage of local modifications. We assume rel-level locking protects us from concurrent update attempts on the same map entry, but we don't want to forbid concurrent relocations of different catalogs. LWLock would work if we do an unconditional read of the file contents after getting lock rather than relying on sinval signaling, which seems a small price considering map updates should be infrequent. Without that, writers have to hold the lock till commit, which rules out using LWLock.

Hm ... do we want an LWLock per map file, or is one lock to rule them all sufficient? LWLock per database seems problematic. With an HW lock there wouldn't be a problem with that. HW lock would allow concurrent updates of the map files of different DBs, but is that worth the extra cycles?

In a case where a transaction relocates pg_class (or another mapped catalog) and then makes updates in that catalog, all is well in the sense that the updates will be preserved iff the xact commits. HOWEVER, would we have problems during WAL replay? No, because the WAL entries will command updates to the catalog's new relfilenode, which will be interesting to anyone else if and only if the xact gets to commit. We would need to be careful about the case of relocating pg_class itself though --- make sure the local relcache references the new pg_class before any such updates happen. Probably a CCI is sufficient.

Another issue for clustering a catalog is that sinval catcache signalling could get confused, since that depends on catalog entry TIDs, which would change --- we'd likely need to have relocation send an sinval signal saying flush *everything* from that catalog. (relcache inval on the catalog itself doesn't do that.) This action could/should be transactional, so no additional code is needed to propagate the notice to HS standbys.

Once the updated map file is moved into place, the relocation is effectively committed even if we subsequently abort the transaction. We can make that window pretty narrow but not remove it completely. Risk factors:

* abort would try to remove relation files created by the xact, in particular the new version of the catalog. Ooops. Can fix this by telling smgr to forget about removing those files before installing the new map file --- better to leak some disk space than destroy catalogs.

* on abort, we'd not send sinval notices, leaving other backends at risk of using stale info (maybe our own too). We could fix this by sending the sinval notice BEFORE renaming into place (if we send it and then fail to rename, no real harm done, just useless cache flushes). This requires keeping a nontransactional-inval path in inval.c, but
Re: [HACKERS] Hot Standby and VACUUM FULL
FYI, getting rid of VACUUM FULL in a .0 release does make more sense than doing it in a .1 release.

--- Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes: I'll do a little work towards step (1) just so we can take a more informed view once you've had a better look at just what (2) involves.

I spent a couple of hours reading code and believe that I've identified all the pain points for allowing relocation of system catalogs (ie, assigning them new relfilenodes during cluster-style VACUUM FULL, REINDEX, etc). It's definitely not a trivial project but it's not out of reach either --- I estimate I could get it done in a couple or three days of full-time effort. Given the potential benefits I think it's worth doing. Rough notes are attached --- comments?

regards, tom lane
Re: [HACKERS] Review: listagg aggregate
2010/2/1 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp: Applied with some editorialization. Please check the docs are good enough.

thank you very much

Pavel Stehule

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Largeobject Access Controls (r2460)
KaiGai Kohei kai...@ak.jp.nec.com wrote:

The attached patch uses one TOC entry for each blob object. This patch not only fixes the existing bugs, but also refactors the dump format of large objects in pg_dump. The new format is more similar to the format of tables:

    Section     Tables       New LO       Old LO
    ----------------------------------------------
    Schema      TABLE        BLOB ITEM    BLOBS
    Data        TABLE DATA   BLOB DATA    BLOBS
    Comments    COMMENT      COMMENT      BLOB COMMENTS

We will allocate BlobInfo in memory for each large object. It might consume much more memory compared with former versions if we have many large objects, but we discussed it and it is an acceptable change.

As far as I read, the patch is almost ready to commit except the following issue about backward compatibility:

* BLOB DATA

This section is the same as the existing BLOBS section, except that _LoadBlobs() does not create a new large object before opening it with INV_WRITE, and lo_truncate() will be used instead of lo_unlink() when --clean is given. The legacy sections (BLOBS and BLOB COMMENTS) remain readable for compatibility, but newer pg_dump never creates these sections.

I wonder whether we need to support older versions in pg_restore. You add a check PQserverVersion >= 80500 in CleanupBlobIfExists(), but our documentation says we cannot use pg_restore 9.0 for 8.4 or older servers: http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html

| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such a path and raise an error instead? Also, even if we support the older servers in the routine, the new bytea format will be another problem anyway.

One remaining issue is the way to decide whether blobs are to be dumped, or not. Right now, --schema-only eliminates all the blob dumps. However, I think it should follow the manner of the other object classes:

    -a, --data-only   ... only BLOB DATA sections, not BLOB ITEM
    -s, --schema-only ... only BLOB ITEM sections, not BLOB DATA
    -b, --blobs       ... both BLOB ITEM and BLOB DATA, independently
                          from --data-only and --schema-only?

I cannot imagine situations that require data-only dumps -- for example, a restored database that has a filled pg_largeobject_metadata and an empty or broken pg_largeobject -- but it seems to be unnatural. I'd prefer to keep the existing behavior:

* default or data-only : dump all attributes and data of blobs
* schema-only : don't dump any blobs

and have independent options to control blob dumps:

* -b, --blobs    : dump all blobs even if schema-only
* -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers