Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
On Sun, Jan 15, 2012 at 05:44, Greg Smith g...@2ndquadrant.com wrote:

On 01/14/2012 10:49 PM, Gurjeet Singh wrote: So let's make it easy for the patch submitter to start the process. I propose that we have a page in the CF application where people can upload/attach the patch, and the app posts the patch to -hackers and uses the post URL to create the CF entry.

That would be nice, but there are at least two serious problems with it, which I would guess are both unsolvable without adding an unsupportable amount of work to the current PostgreSQL web team. First, it is technically risky for a web application hosted on postgresql.org to be e-mailing this list. There are some things in the infrastructure that do that already--I believe the pgsql-committers list being driven from commits is the busiest such bot. But all of the ones that currently exist are either moderated, have a limited number of approved submitters, or both.

It's not really a problem from that perspective, as long as it requires the user to be logged in. The mail would be sent from the user's account, with that one as a sender, and thus be exposed to the same moderation rules as the rest of the list posts.

If it were possible for a bot to create a postgresql.org community account, then trigger an e-mail to pgsql-hackers just by filling out a web form, I'd give it maybe six months before it has to be turned off for a bit--because there are thousands of messages queued up once the first bored spammer figures that out.

Said bot can already use the bug report form *without* having to sign up for an account. Or said bot could submit news or events, which trigger an email to at least some lists, and that hasn't been done. It's supposedly not easy for a bot to sign up for a community account, since it requires you to have access to the email address it's registered on. If that doesn't work, it's a bug and needs to be fixed regardless.

Securing web to e-mail gateways is a giant headache, and everyone working on the PostgreSQL infrastructure who might work on that is already overloaded with community volunteer work. There's an element of zero-sum

We've already solved that problem for other situations, and given how the infrastructure is built, that's fairly easy to replicate to another node. I think the bigger problem is who'll write it. AFAIK, the CF app *itself* is even more person- and time-constrained to senior developers (Robert Haas only) than the infrastructure, and that's a bigger problem. There are already a bunch of things that are a lot simpler than this that have been pending on that one for well over half a year.

Second, e-mail provides some level of validation that patches being submitted are coming from the person they claim. We currently reject patches that are only shared with the community on the web, via places like github. The process around this mailing list tries to make it clear that sending patches here is a code submission under the PostgreSQL license. And e-mail nowadays keeps increasing the number of checks that confirm it's coming from the person it claims sent it. I can go check the DKIM credentials your Gmail message to the list contained if I'd like, to help confirm it really came from your account. E-mail headers are certainly not

I think DKIM was a bad example, because AFAIK our lists mangle DKIM and thus actually show them as *invalid* for at least the majority of messages...

One unicorn I would like to have here would give the CF app a database of recent e-mails to pgsql-hackers. I log in to the CF app, click on Add recent submission, and anything matching my e-mail address appears with a checkbox next to it. Click on the patch submissions, and then something like you described would happen. That would save me the annoying work of looking up message IDs so much.

That would be neat. And FWIW, I'd find it a lot more useful for the CF app to have the ability to post *reviews* in it, so that they would end up being properly threaded. The way it is now, half the reviewers create a *new* thread to post their reviews on, making it a PITA to keep track of those patches on the list at all, which somewhat takes away from the whole idea of mail being the primary way to track it. Not saying it's critical, but I'd put it a lot higher on the list than being able to post the initial patch.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Vacuum rate limit in KBps
So far the reaction I've gotten from my recent submission to make autovacuum log its read/write rate in MB/s has been rather positive. I've been surprised at the unprecedented (to me at least) amount of backporting onto big production systems it's gotten. There is a whole lot of pent-up frustration among larger installs over not having good visibility into how changing cost-based vacuum parameters turns into real-world units.

That got me thinking: if MB/s is what everyone wants to monitor, can we provide a UI to set these parameters that way too? The attached patch is a bit rough still, but it does that. The key was recognizing that the cost delay plus cost limit can be converted into an upper limit on cost units per second, presuming the writes themselves are free. If you then also assume the worst case--that everything will end up dirty--by throwing in the block size, too, you compute a maximum rate in MB/s. That represents the fastest you can possibly write. If you then turn that equation around, making the maximum write rate the input, for any given cost delay and dirty page cost you can solve for the cost limit--the parameter in fictitious units everyone hates. It works like this, with the computation internals logged every time they run for now:

#vacuum_cost_rate_limit = 4000		# maximum write rate in kilobytes/second

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20

That's the same cost limit that was there before, except now it's derived from that maximum write rate figure. vacuum_cost_limit is gone as a GUC, replaced with this new vacuum_cost_rate_limit. Internally, the cost limit hasn't gone anywhere though. All of the entry points into vacuum and autovacuum derive an internal-only VacuumCostLimit as part of any setup or rebalance operation. But there's no change to the underlying cost management code; the cost limit is budgeted and accounted for in exactly the same way as it always was.

Why is this set in kilobytes/second rather than using something based on a memory unit? That decision was made after noting these values can also be set in relation options. Making relation options aware of memory unit math seemed ambitious relative to its usefulness, and it's not like KB/s is hard to work with in this context. OK, I lied; technically this is set in kibibytes per second right now. I ran out of energy before I got to confirming that was consistent with all similar GUC settings; I will put on my pedantic hat later to check that.

One nice thing that falls out of this is that the *vacuum_cost_delay settings essentially turn into a boolean. If the delay is 0, cost limits are off; set it to any other value, and the rate you get is driven almost entirely by vacuum_cost_rate_limit (disclaimer mainly because issues like sleep time accuracy are possible). You can see that at work in these examples:

LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=4000 KB/s delay=10 dirty cost=20
LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG:  cost limit=100 based on rate limit=2000 KB/s delay=20 dirty cost=20

Halve the delay to 10, and the cost limit drops in half too, to keep the same I/O rate. Halve the rate limit instead, and the cost limit halves with it. Most sites will never need to change the delay figure from 20ms; they can just focus on tuning the more human-readable rate limit figure instead.
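[Editor's note] To make the arithmetic above concrete, here is a standalone sketch of the conversion. It is only an illustration under stated assumptions: the function name and structure are mine rather than the patch's, it assumes the worst case where every page processed is dirtied, and it assumes a default 8 KB block size.

#include <stdio.h>

/* Derive the fictitious-unit cost limit from a human-readable write rate
 * cap, a sleep interval, and the dirty page cost. */
static int
rate_limit_to_cost_limit(int rate_limit_kbps, int cost_delay_ms,
                         int dirty_page_cost)
{
    const int block_kb = 8;     /* BLCKSZ / 1024 on a default build */

    /* most pages/second the rate limit allows if all of them are dirtied */
    double max_pages_per_sec = (double) rate_limit_kbps / block_kb;

    /* cost units consumed per second when running flat out */
    double cost_per_sec = max_pages_per_sec * dirty_page_cost;

    /* budget available per sleep interval of cost_delay_ms milliseconds */
    return (int) (cost_per_sec * cost_delay_ms / 1000.0);
}

int
main(void)
{
    /* reproduces the cost limits in the log lines quoted above */
    printf("%d\n", rate_limit_to_cost_limit(4000, 20, 20));    /* 200 */
    printf("%d\n", rate_limit_to_cost_limit(4000, 10, 20));    /* 100 */
    printf("%d\n", rate_limit_to_cost_limit(2000, 20, 20));    /* 100 */
    return 0;
}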
The main reason I thought of to keep the delay around as an integer still is sites trying to minimize power use; they might increase it from the normally used 20ms. I'm not as worried about postgresql.conf settings bloat to support a valid edge use case, so long as most sites find a setting unnecessary to tune. And the autovacuum side of cost delay should fall into that category with this change.

Here's a full autovacuum log example. This shows how close to the KBps rate the server actually got, along with the autovacuum cost balancing working the same old way (this is after running the boring autovac-big.sql test case attached here too):

2012-01-15 02:10:51.905 EST: LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
2012-01-15 02:10:51.906 EST: DEBUG:  autovac_balance_cost(pid=13054 db=16384, rel=16444, cost_rate_limit=4000, cost_limit=200, cost_limit_base=200, cost_delay=20)
2012-01-15 02:11:05.127 EST: DEBUG:  "t": removed 499 row versions in 22124 pages
2012-01-15 02:11:05.127 EST: DEBUG:  "t": found 499 removable, 501 nonremovable row versions in 44248 out of 44248 pages
2012-01-15 02:11:05.127 EST: DETAIL:  0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.27s/0.97u sec elapsed 131.73 sec.
2012-01-15 02:11:05.127 EST: LOG:  automatic vacuum of table "gsmith.public.t": index scans: 0 pages: 0
Re: [HACKERS] Our poll() based WaitLatch implementation is broken
On 15 January 2012 07:26, Peter Geoghegan pe...@2ndquadrant.com wrote: Build Postgres master, on Linux or another platform that will use the poll() implementation rather than the older select(). Send the Postmaster SIGKILL. Observe that the WAL Writer lives on, representing a denial of service as it stays attached to shared memory, busy waiting (evident from the fact that it quickly leaks memory).

Correction: The CPU usage % approaches 100.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
On 01/15/2012 03:17 AM, Magnus Hagander wrote: And FWIW, I'd find it a lot more useful for the CF app to have the ability to post *reviews* in it, that would end up being properly threaded.

Next you'll be saying we should have some sort of web application to help with the whole review process, show the work on an integrated sort of Review Board or something. What crazy talk.

My contribution toward patch review ease for this week is that peg is quite a bit smarter about referring to the correct part of the origin git repo now, when you've been working on a branch for a while and then create a new one: https://github.com/gregs1104/peg

Last week's was that I confirmed that on a Mac using Homebrew for package management, after "brew install postgresql" to get the dependencies in, you can then use peg to set up a PostgreSQL in your home directory for patch testing or development. Works fine out of the box; you just won't have things like all the PLs installed. Yes, I am aware I'm going at this bottom-up.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
On Sun, Jan 15, 2012 at 09:37, Greg Smith g...@2ndquadrant.com wrote: On 01/15/2012 03:17 AM, Magnus Hagander wrote: And FWIW, I'd find it a lot more useful for the CF app to have the ability to post *reviews* in it, that would end up being properly threaded.

Next you'll be saying we should have some sort of web application to help with the whole review process, show the work on an integrated sort of Review Board or something. What crazy talk.

Well, it's early in the morning for it being Sunday; I blame that.

My contribution toward patch review ease for this week is that peg is quite a bit smarter about referring to the correct part of the origin git repo now, when you've been working on a branch for a while and then create a new one: https://github.com/gregs1104/peg

Being able to refer to a git branch is one of those things that have been on the todo list for the CF app since PGCon last year...

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] pgstat documentation tables
Right now we have a single table on http://www.postgresql.org/docs/devel/static/monitoring-stats.html#MONITORING-STATS-VIEWS that lists all our statistics views. It makes for difficult-to-parse descriptions like "One row only, showing cluster-wide statistics from the background writer: number of scheduled checkpoints, requested checkpoints, buffers written by checkpoints and cleaning scans, and the number of times the background writer stopped a cleaning scan because it had written too many buffers. Also includes statistics about the shared buffer pool, including buffers written by backends (that is, not by the background writer), how many times those backends had to execute their own fsync calls (normally the background writer handles those even when the backend does its own write), total buffers allocated, and time of last statistics reset."

I'd like to turn that into one table for each view, with two columns, one being the name and the other one being the description. That'll also make it possible to expand on the descriptions without making it completely unreadable, should we want to.

Comments? Objections?

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] IDLE in transaction introspection
On 01/12/2012 11:57 AM, Scott Mead wrote: Pretty delayed, but please find the attached patch that addresses all the issues discussed.

The docs on this v4 look like they suffered a patch order problem here. In the v3, you added a whole table describing the pg_stat_activity documentation in more detail than before. v4 actually tries to remove those new docs, a change which won't even apply as they don't exist upstream. My guess is you committed v3 to somewhere, applied the code changes for v4, but not the documentation ones. It's easy to do that and end up with a patch that removes a bunch of docs the previous patch added. I have to be careful to always do something like "git diff origin/master" to avoid this class of problem; until I got into that habit, I did this sort of thing regularly.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] pgstat documentation tables
On 01/15/2012 03:51 AM, Magnus Hagander wrote: I'd like to turn that into one table for each view, with two columns, one being the name and the other one being the description. That'll also make it possible to expand on the descriptions without making it completely unreadable, should we want to.

Scott Mead's pg_stat_activity refactoring patch already was forced toward doing something similar for its documentation to be readable (maybe exactly the same as you were thinking; I can't tell because my SGML-fu is weak). I feel doing this for every view in there is inevitable; it's just a matter of when to bite on the problem. This whole section is barely usable in its current form.

I strongly hope the pg_stat_activity patch goes in, which will be close to a compelling event to motivate fixing the rest of them in a similar way. I would advocate leading with that one until it seems right, then applying the same style of reorg to the rest until they're all converted.

-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
[HACKERS] Review of: pg_stat_statements with query tree normalization
I've *finally* gotten around to reviewing this patch. My first step was to de-bitrot it very slightly. More on that in a moment.

After that, I tried using it. Installation worked nicely -- I did CREATE EXTENSION and then tried reading from pg_stat_statements. I was then given an error message to add pg_stat_statements into shared_preload_libraries, and so I did. Given its use of shared memory, there is not much other choice, but it's neat that I got it running in just a few minutes without reading any documentation.

I ran a non-trivial but not-enormous application against the pg_stat_statements-enabled database, and the results were immediate and helpful, being correctly normalized with statistics apparently accrued. 'make check' also worked fine. I ran with assertions and debugging enabled.

This query I suspect will quickly become a new crowd-pleaser:

  SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20;

The output looks like this (fixed width looks best):

 userid              | 16385
 dbid                | 16386
 query               | SELECT * FROM lock_head($1, $2);
 calls               | 12843
 total_time          | 19.795586
 rows                | 8825
 shared_blks_hit     | 194583
 shared_blks_read    | 19
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0

A couple of nit-picks: a query execution is probably not often referred to as a "call", and total_time doesn't signify its units in the name, but I eyeballed it to be seconds.

I cannot see superuser query text when I'm in a non-superuser role; however, I can see the execution statistics:

 userid              | 10
 dbid                | 16386
 query               | <insufficient privilege>
 calls               | 1448
 total_time          | 0.696188
 rows                | 205616
 shared_blks_hit     | 13018
 shared_blks_read    | 14
 shared_blks_written | 0
 local_blks_hit      | 0
 local_blks_read     | 0
 local_blks_written  | 0
 temp_blks_read      | 0
 temp_blks_written   | 0

Prepared statements are less informative, unless one knows their naming convention:

 query               | execute foo;

I don't know what JDBC users or ActiveRecord 3 users are going to think about that. However, just about everyone else will be happy.

The patch very nicely handles its intended task: folding together queries that differ only in their literals. This is the key feature that makes the output really useful, and it is quite unique: the next best thing I know of is pgfouine, which uses some heuristics and needs to be run over specially formatted logs, usually offline. This is much better, much faster, more interactive, and much more sound when it comes to normalizing queries.

Onto the mechanism: the patch is both a contrib and changes to Postgres. The changes to Postgres are mechanical in nature, but voluminous. The key change is to not only remember the position of Const nodes in the query tree, but also their length, and this change is really extensive although repetitive. It was this swathe of change that bitrotted the source, when new references to some flex constructs were added. Every such reference now needs to explicitly refer to '.begins', which is the beginning position of the const -- this used to be the only information tracked. Because .length was never required by Postgres in the past, it fastidiously bookkeeps that information without ever referring to it internally: only pg_stat_statements does. There is a performance test program (written in Python) conveniently included in the contrib to allay fears that such const-length bookkeeping may be too expensive. I have not run it yet myself.
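[Editor's note] To spell out the mechanism being reviewed, here is a hypothetical sketch -- the struct and function names are mine, not the patch's -- of why keeping a length next to the existing '.begins' offset is what makes normalization possible:

/* Where a Const's literal sits in the original query text. The '.begins'
 * offset was already tracked by the grammar; the patch adds the length. */
typedef struct ConstLocation
{
    int begins;     /* byte offset of the literal in the query string */
    int length;     /* number of bytes the literal occupies */
} ConstLocation;

/* With both offset and length known, a literal can be blanked out so that
 * queries differing only in their constants collapse to one entry. */
static void
normalize_const(char *query, ConstLocation loc)
{
    for (int i = 0; i < loc.length; i++)
        query[loc.begins + i] = '?';
}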
There is also a regression test -- also a Python program -- presumably because it would be more difficult to test this the normal way via pg_regress, at least to get a full integration test.

I'm a little surprised that pg_stat_statements.c uses direct recursion instead of the mutually recursive walker interface, which makes it quite a bit different than most passes in the optimizer I've seen. I'm still looking at the mechanism in more detail, but I am having a fairly easy time following it -- there's just a lot of it, to cover the litany of Nodes.

I'll submit my editorializations of whitespace and anti-bitrot to Peter Geoghegan via git (actually, that information is below, for any curious onlookers). If anyone wants to play with this anti-bitrotted copy against a very recent master, please see:

$ git fetch https://github.com/fdr/postgres.git pg_stat_statements_norm:pg_stat_statements_norm
$ git checkout pg_stat_statements_norm

All in all, it looks and works well, inside and out.

-- fdr
Re: [HACKERS] Review of: pg_stat_statements with query tree normalization
On 15 January 2012 11:41, Daniel Farina dan...@heroku.com wrote: I've *finally* gotten around to reviewing this patch. My first step was to de-bitrot it very slightly. More on that in a moment.

Thanks.

Prepared statements are less informative, unless one knows their naming convention: query | execute foo; I don't know what JDBC users or ActiveRecord 3 users are going to think about that. However, just about everyone else will be happy.

I should point out, though I think you might be aware of this already, that the patch actually behaves in the same way as the existing implementation here. It will only normalise prepared queries that are prepared with PQprepare() or its underlying wire-protocol facility. If you run the example in the pg_stat_statements docs, where pgbench is passed -M prepared, that will work just as well as before. Perhaps it's something that the patch should be able to do. That said, it seems pretty likely that client libraries won't be dynamically generating SQL Prepare/Execute statements under the hood.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] exit() calls in libraries
On Mon, 2011-12-05 at 15:04 -0300, Alvaro Herrera wrote: Having had to battle some exit() calls in the PHP interpreter back when I was working on PL/php, I agree that they shouldn't be there -- abort() seems more appropriate if the system is truly busted. As for the fe-print.c code, I'm not really sure why we don't just remove it.

This is the patch that would get rid of all exit() calls in the libraries.

diff --git i/src/interfaces/libpq/fe-print.c w/src/interfaces/libpq/fe-print.c
index 5fa3be0..a6c2959 100644
--- i/src/interfaces/libpq/fe-print.c
+++ w/src/interfaces/libpq/fe-print.c
@@ -112,17 +112,17 @@ PQprint(FILE *fout, const PGresult *res, const PQprintOpt *po)
 	if (!(fieldNames = (const char **) calloc(nFields, sizeof(char *))))
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	if (!(fieldNotNum = (unsigned char *) calloc(nFields, 1)))
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	if (!(fieldMax = (int *) calloc(nFields, sizeof(int))))
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	for (numFieldName = 0;
 		 po->fieldName && po->fieldName[numFieldName];
@@ -203,7 +203,7 @@ PQprint(FILE *fout, const PGresult *res, const PQprintOpt *po)
 		if (!(fields = (char **) calloc(nFields * (nTups + 1), sizeof(char *))))
 		{
 			fprintf(stderr, libpq_gettext("out of memory\n"));
-			exit(1);
+			abort();
 		}
 	}
 	else if (po->header && !po->html3)
@@ -391,7 +391,7 @@ do_field(const PQprintOpt *po, const PGresult *res,
 		if (!(fields[i * nFields + j] = (char *) malloc(plen + 1)))
 		{
 			fprintf(stderr, libpq_gettext("out of memory\n"));
-			exit(1);
+			abort();
 		}
 		strcpy(fields[i * nFields + j], pval);
 	}
@@ -462,7 +462,7 @@ do_header(FILE *fout, const PQprintOpt *po, const int nFields, int *fieldMax,
 	if (!border)
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	p = border;
 	if (po->standard)
@@ -605,7 +605,7 @@ PQdisplayTuples(const PGresult *res,
 	if (!fLength)
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	for (j = 0; j < nFields; j++)
@@ -704,7 +704,7 @@ PQprintTuples(const PGresult *res,
 	if (!tborder)
 	{
 		fprintf(stderr, libpq_gettext("out of memory\n"));
-		exit(1);
+		abort();
 	}
 	for (i = 0; i <= width; i++)
 		tborder[i] = '-';
diff --git i/src/interfaces/libpq/libpq-int.h w/src/interfaces/libpq/libpq-int.h
index 64dfcb2..a0d93e0 100644
--- i/src/interfaces/libpq/libpq-int.h
+++ w/src/interfaces/libpq/libpq-int.h
@@ -483,7 +483,7 @@ extern pgthreadlock_t pg_g_threadlock;
 #define PGTHREAD_ERROR(msg) \
 	do { \
 		fprintf(stderr, "%s\n", msg); \
-		exit(1); \
+		abort(); \
 	} while (0)
diff --git i/src/port/path.c w/src/port/path.c
index 9cb0b01..de345c2 100644
--- i/src/port/path.c
+++ w/src/port/path.c
@@ -445,7 +445,7 @@ get_progname(const char *argv0)
 	if (progname == NULL)
 	{
 		fprintf(stderr, "%s: out of memory\n", nodir_name);
-		exit(1);	/* This could exit the postmaster */
+		abort();	/* This could exit the postmaster */
 	}
 
 #if defined(__CYGWIN__) || defined(WIN32)
Re: [HACKERS] age(xid) on hot standby
On Wed, 2011-12-28 at 14:35 -0500, Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Peter Eisentraut's message of Wed Dec 28 15:04:09 -0300 2011: On a hot standby, this fails with: ERROR: cannot assign TransactionIds during recovery

I think we could just have xid_age call GetCurrentTransactionIdIfAny, and if that returns InvalidXid, use ReadNewTransactionId instead. That xid_age assigns a transaction ID seems more of an accident than really intended.

The trouble with using ReadNewTransactionId is that it makes the results volatile, not stable as the function is declared to be.

Could we alleviate that problem with some caching within the function?
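[Editor's note] A sketch of the suggested fallback, with a cache bolted on, might look like this; the names are made up for illustration, and it deliberately glosses over the hard part -- the cached value would still need to be reset at the right moment (say, end of statement) for the function to honor its STABLE marking:

static TransactionId
xid_age_reference_xid(void)
{
    /* would need resetting between statements to keep STABLE semantics */
    static TransactionId cached_xid = InvalidTransactionId;

    TransactionId xid = GetCurrentTransactionIdIfAny();

    if (!TransactionIdIsValid(xid))
    {
        /*
         * On a hot standby we cannot assign an XID; fall back to the next
         * XID to be assigned, cached so that repeated calls within one
         * statement agree with each other.
         */
        if (!TransactionIdIsValid(cached_xid))
            cached_xid = ReadNewTransactionId();
        xid = cached_xid;
    }
    return xid;
}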
Re: [HACKERS] lots of unused variable warnings in assert-free builds
On 01/15/2012 01:37 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I see that in some places our code already uses #ifdef USE_ASSERT_CHECKING, presumably to hide similar issues. But in most cases using this would significantly butcher the code. I found that adding __attribute__((unused)) is cleaner. Attached is a patch that cleans up all the warnings I encountered.

Surely this will fail entirely on most non-gcc compilers? Not to mention that next month's gcc may complain "hey, you used this 'unused' variable". I think #ifdef USE_ASSERT_CHECKING is really the only way if you care about quieting these warnings. (Personally, I don't.) It would possibly have some documentary value too.

Just looking very quickly at Peter's patch, I don't really understand his assertion that this would significantly butcher the code. The worst effect would be that in a few cases we'd have to break up multiple declarations where one of the variables was in this class. That doesn't seem like a tragedy. I like software that compiles in normal use with few or no warnings. I should have thought that would appeal to most packagers, too.

cheers

andrew
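[Editor's note] For onlookers, a minimal illustration of the two styles being weighed; compute_nitems() and the Assert shim are stand-ins of mine, not Postgres code:

#include <assert.h>

#ifdef USE_ASSERT_CHECKING
#define Assert(condition) assert(condition)
#else
#define Assert(condition) ((void) 0)
#endif

static int
compute_nitems(void)
{
    return 1;
}

/* Style 1: annotate the variable; gcc-specific, and kept in all builds */
static void
with_attribute(void)
{
    int nitems __attribute__((unused)) = compute_nitems();

    Assert(nitems > 0);     /* expands to nothing in assert-free builds */
}

/* Style 2: only compile the variable when assertions are enabled */
static void
with_ifdef(void)
{
#ifdef USE_ASSERT_CHECKING
    int nitems = compute_nitems();

    Assert(nitems > 0);
#endif
}

int
main(void)
{
    with_attribute();
    with_ifdef();
    return 0;
}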
Re: [HACKERS] lots of unused variable warnings in assert-free builds
On Sun, Jan 15, 2012 at 1:14 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/15/2012 01:37 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I see that in some places our code already uses #ifdef USE_ASSERT_CHECKING, presumably to hide similar issues. But in most cases using this would significantly butcher the code. I found that adding __attribute__((unused)) is cleaner. Attached is a patch that cleans up all the warnings I encountered.

Surely this will fail entirely on most non-gcc compilers? Not to mention that next month's gcc may complain "hey, you used this 'unused' variable". I think #ifdef USE_ASSERT_CHECKING is really the only way if you care about quieting these warnings. (Personally, I don't.) It would possibly have some documentary value too.

Just looking very quickly at Peter's patch, I don't really understand his assertion that this would significantly butcher the code. The worst effect would be that in a few cases we'd have to break up multiple declarations where one of the variables was in this class. That doesn't seem like a tragedy. I like software that compiles in normal use with few or no warnings. I should have thought that would appeal to most packagers, too.

Sounds good, but let's not do it yet, because we have a few patches to commit first. It would be good to minimise bit rot during the CF.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgstat documentation tables
On Sun, Jan 15, 2012 at 10:35, Greg Smith g...@2ndquadrant.com wrote: On 01/15/2012 03:51 AM, Magnus Hagander wrote: I'd like to turn that into one table for each view, with two columns, one being the name and the other one being the description. That'll also make it possible to expand on the descriptions without making it completely unreadable, should we want to.

Scott Mead's pg_stat_activity refactoring patch already was forced toward doing something similar for its documentation to be readable (maybe exactly the same as you were thinking; I can't tell because my SGML-fu is weak). I feel doing this for every view in there is inevitable; it's just a matter of when to bite on the problem. This whole section is barely usable in its current form. I strongly hope the pg_stat_activity patch goes in, which will be close to a compelling event to motivate fixing the rest of them in a similar way. I would advocate leading with that one until it seems right, then applying the same style of reorg to the rest until they're all converted.

Yes, the pg_stat_activity patch will make it more important - but I want to do it for *all* of them... I'll be happy to do the mechanical work of turning them into tables, but yes, I was planning to look at the pg_stat_activity patch first all along :)

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Patch to allow users to kill their own queries
On Fri, Jan 13, 2012 at 14:46, Magnus Hagander mag...@hagander.net wrote: On Fri, Jan 13, 2012 at 14:42, Greg Smith g...@2ndquadrant.com wrote: On 01/03/2012 12:59 PM, Tom Lane wrote: Noah Misch n...@leadboat.com writes: Regarding the other message, avoid composing a translated message from independently-translated parts.

Yes. I haven't looked at the patch, but I wonder whether it wouldn't be better to dodge both of these problems by having the subroutine return a success/failure result code, so that the actual ereport can be done at an outer level where the full message (and hint) can be written directly.

Between your and Noah's comments I understand the issue and likely direction to sort this out now. Bounced forward to the next CommitFest and back on my plate again. Guess I'll be learning new and exciting things about translation this week.

I should say that I have more or less finished this one off, since I figured you were working on more important things. Just a final round of cleanup and commit left, really, so you can take it off your plate again if you want to :-)

Finally, to end the bikeshedding, I've applied this patch after another round of fairly extensive changes.

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers
On Sat, Jan 14, 2012 at 6:48 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Jan 14, 2012 at 5:25 AM, Hitoshi Harada umi.tan...@gmail.com wrote: The patch looks ok, though I wonder if we could have a way to release the lock on the namespace much before the end of the transaction.

Well, that would kind of miss the point, wouldn't it? I mean, the race is that the process dropping the schema might not see the newly created object. The only way to prevent that is to hold some sort of lock until the newly created object is visible, which doesn't happen until commit.

I know it's a limited situation, though. Maybe the right way would be to check the namespace at the end of the transaction if any object is created, rather than locking it.

And then what? If you find that you created an object in a namespace that's been subsequently dropped, what do you do about that? As far as I can see, your only real choice would be to roll back the transaction that uncovers this problem, which is likely to produce more rollbacks than just letting the deadlock detector sort it out.

Right. I thought this patch introduced a lock on the schema in SELECT, but actually we'd been locking the schema with SELECT since before the patch. So the behavior becomes consistent (between SELECT and CREATE) now with it. And I agree that my insistence is pointless after looking more.

Thanks,

-- Hitoshi Harada
Re: [HACKERS] JSON for PG 9.2
On 01/14/2012 03:06 PM, Andrew Dunstan wrote: OK, here's a patch that does both query_to_json and array_to_json, along with docs and regression tests. It includes Robert's original patch, although I can produce a differential patch if required. It can also be pulled from https://bitbucket.org/adunstan/pgdevel

Here's an update that adds row_to_json, plus a bit more cleanup. Example:

andrew=# SELECT row_to_json(q)
         FROM (SELECT $$a$$ || x AS b,
                      y AS c,
                      ARRAY[ROW(x.*,ARRAY[1,2,3]), ROW(y.*,ARRAY[4,5,6])] AS z
               FROM generate_series(1,2) x,
                    generate_series(4,5) y) q;
                            row_to_json
--------------------------------------------------------------------
 {"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
 {"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
(4 rows)

(This might be more to Robert's taste than query_to_json() :-) )

cheers

andrew

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 152ef2f..f45b10b 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -269,6 +269,12 @@
        <entry></entry>
        <entry>XML data</entry>
       </row>
+
+      <row>
+       <entry><type>json</type></entry>
+       <entry></entry>
+       <entry>JSON data</entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -4169,6 +4175,21 @@ SET xmloption TO { DOCUMENT | CONTENT };
  </sect2>
 </sect1>

+<sect1 id="datatype-json">
+ <title><acronym>JSON</> Type</title>
+
+ <indexterm zone="datatype-json">
+  <primary>JSON</primary>
+ </indexterm>
+
+ <para>
+  The <type>json</type> data type can be used to store JSON data. Such
+  data can also be stored as <type>text</type>, but the
+  <type>json</type> data type has the advantage of checking that each
+  stored value is a valid JSON value.
+ </para>
+</sect1>
+
 &array;
 &rowtypes;
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2e06346..9368739 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9615,6 +9615,77 @@ table2-mapping
  </sect2>
 </sect1>

+<sect1 id="functions-json">
+ <title>JSON functions</title>
+
+ <indexterm zone="datatype-json">
+  <primary>JSON</primary>
+  <secondary>Functions and operators</secondary>
+ </indexterm>
+
+ <para>
+  This section describes the functions that are available for creating
+  JSON (see <xref linkend="datatype-json">) data.
+ </para>
+
+ <table id="functions-json-table">
+  <title>JSON Support Functions</title>
+  <tgroup cols="4">
+   <thead>
+    <row>
+     <entry>Function</entry>
+     <entry>Description</entry>
+     <entry>Example</entry>
+     <entry>Example Result</entry>
+    </row>
+   </thead>
+   <tbody>
+    <row>
+     <entry>
+      <indexterm>
+       <primary>query_to_json</primary>
+      </indexterm>
+      <literal>query_to_json(text, boolean)</literal>
+     </entry>
+     <entry>
+      Returns the result of running the query as JSON. If the
+      second parameter is true, there will be a line feed between records.
+     </entry>
+     <entry><literal>query_to_json('select 1 as a, $$foo$$ as b', false)</literal></entry>
+     <entry><literal>[{"a":1,"b":"foo"}]</literal></entry>
+    </row>
+    <row>
+     <entry>
+      <indexterm>
+       <primary>array_to_json</primary>
+      </indexterm>
+      <literal>array_to_json(anyarray)</literal>
+     </entry>
+     <entry>
+      Returns the array as JSON. A Postgres multi-dimensional array becomes a JSON
+      array of arrays.
+     </entry>
+     <entry><literal>array_to_json('{{1,5},{99,100}}'::int[])</literal></entry>
+     <entry><literal>[[1,5],[99,100]]</literal></entry>
+    </row>
+    <row>
+     <entry>
+      <indexterm>
+       <primary>row_to_json</primary>
+      </indexterm>
+      <literal>row_to_json(record)</literal>
+     </entry>
+     <entry>
+      Returns the row as JSON.
+     </entry>
+     <entry><literal>row_to_json(row(1,'foo'))</literal></entry>
+     <entry><literal>{"f1":1,"f2":"foo"}</literal></entry>
+    </row>
+   </tbody>
+  </tgroup>
+ </table>
+
+</sect1>
+
 <sect1 id="functions-sequence">
  <title>Sequence Manipulation Functions</title>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8b48105..ddb2784 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -24,6 +24,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -99,7 +100,6 @@ static void ExplainDummyGroup(const char *objtype, const char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
-static void escape_json(StringInfo buf, const char *str);
 static void escape_yaml(StringInfo buf, const char
[HACKERS] WAL Restore process during recovery
WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of the next WAL file to reduce time delays in file-based archive recovery. Handles the file-only and streaming/file cases correctly.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..e8b0b69 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -40,6 +40,7 @@
 #include "pgstat.h"
 #include "postmaster/bgwriter.h"
 #include "postmaster/startup.h"
+#include "postmaster/walrestore.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
 #include "storage/bufmgr.h"
@@ -187,7 +188,6 @@ static bool InArchiveRecovery = false;
 static bool restoredFromArchive = false;

 /* options taken from recovery.conf for archive recovery */
-static char *recoveryRestoreCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -575,8 +575,8 @@ bool reachedConsistency = false;

 static bool InRedo = false;

-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
+/* Have we launched background procs during archive recovery yet? */
+static bool ArchRecoveryBgProcsActive = false;

 /*
  * Information logged when we detect a change in one of the parameters
@@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt,
 			bool randAccess);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
-static bool RestoreArchivedFile(char *path, const char *xlogfname,
-				const char *recovername, off_t expectedSize);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 				bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,

 	XLogFileName(xlogfname, tli, log, seg);

+#define TMPRECOVERYXLOG RECOVERYXLOG
+
 	switch (source)
 	{
 		case XLOG_FROM_ARCHIVE:
+			/*
+			 * Check to see if the WALRestore process has already put the
+			 * next file in place while we were working. If so, use that.
+			 * If not, get it ourselves. This makes it easier to handle
+			 * initial state before the WALRestore is active, and also
+			 * handles the stop/start logic correctly when we have both
+			 * streaming and file based replication active.
+			 *
+			 * We queue up the next task for WALRestore after we've begun to
+			 * use this file later in XLogFileRead().
+			 *
+			 * If the WALRestore process is still active, the lock wait makes
+			 * us wait, which is just like we were executing the command
+			 * ourselves and so doesn't alter the logic elsewhere.
+			 */
+			if (XLogFileIsNowFullyRestored(tli, log, seg))
+			{
+				snprintf(path, MAXPGPATH, XLOGDIR "/%s", TMPRECOVERYXLOG);
+				restoredFromArchive = true;
+				break;
+			}
+
 			/* Report recovery progress in PS display */
 			snprintf(activitymsg, sizeof(activitymsg), "waiting for %s",
 					 xlogfname);
 			set_ps_display(activitymsg, false);

 			restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-													  RECOVERYXLOG,
+													  TMPRECOVERYXLOG,
 													  XLogSegSize);
+			if (!restoredFromArchive)
+			{
+				LWLockRelease(WALRestoreCommandLock);
 				return -1;
+			}
 			break;

 		case XLOG_FROM_PG_XLOG:
@@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 		if (stat(xlogfpath, &statbuf) == 0)
 		{
 			if (unlink(xlogfpath) != 0)
+			{
+				LWLockRelease(WALRestoreCommandLock);
 				ereport(FATAL,
 						(errcode_for_file_access(),
 						 errmsg("could not remove file \"%s\": %m",
 								xlogfpath)));
+			}
 			reload = true;
 		}
 		if (rename(path, xlogfpath) < 0)
+		{
+			LWLockRelease(WALRestoreCommandLock);
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not rename file \"%s\" to \"%s\": %m",
 							path, xlogfpath)));
+		}
+
+		/*
+		 * Make sure we recover from the new filename, so we can reuse the
+		 * temporary filename for asynchronous restore actions.
+		 */
+		strcpy(path, xlogfpath);
+
+		/*
+		 * Tell the WALRestore process to get the next file now.
+		 * Hopefully it will be ready for use in time for the next call the
+		 * Startup process makes to XLogFileRead().
+		 *
+		 * It might seem like we should do that earlier but then there is a
+		 * race condition that might lead to replacing RECOVERYXLOG with
+		 * another file before we've copied it.
+		 */
+		SetNextWALRestoreLogSeg(tli, log, seg);
+		LWLockRelease(WALRestoreCommandLock);

 		/*
		 * If the existing segment was replaced, since walsenders might have
@@ -2911,8 +2961,11 @@ XLogFileClose(void)
  * For fixed-size files, the caller may pass the expected
Re: [HACKERS] pgstat documentation tables
Magnus Hagander mag...@hagander.net writes: Right now we have a single table on http://www.postgresql.org/docs/devel/static/monitoring-stats.html#MONITORING-STATS-VIEWS that lists all our statistics views ... I'd like to turn that into one table for each view,

Please follow the style already used for system catalogs; ie, I think there should be a summary table with one entry per view, and then a separate description and table-of-columns for each view.

regards, tom lane
Re: [HACKERS] 9.2 Reviewfest - logging hooks patch
On 01/13/2012 05:29 AM, Christopher Maujean wrote: Patch http://archives.postgresql.org/message-id/4edbefbb.9070...@gmail.com In reviewing the referenced patch, we found that it has no developer documentation and no regression tests. While it compiles and installs, there is nothing to tell us how to use (or test) it.

Thanks for reviewing! I think the regression tests should be part of the module that uses the hooks. Alas, there are none that we'd want to ship with Postgres at this time.

The best I could offer for testing the hooks is a log forwarding module I've been working on. Basically, what it does is install the hooks, look at the message emitted by the backend and, if needed, send it away via UDP. Available at: https://github.com/mpihlak/pg_logforward

To install, run "make install" and modify postgresql.conf to include:

shared_preload_libraries = 'pg_logforward'
logforward.target_names = 'jsonsrv'
logforward.jsonsrv_host = '127.0.0.1'
logforward.jsonsrv_port = 23456

Now, after restarting postgres, you could use "nc 23456 -u -l" to receive the log. Alternatively, use testing/test_logserver.py as a log server.

About documentation - it sure would be nice to have all the hooks written up somewhere. Any suggestions as to where to put it?

regards, Martin
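[Editor's note] For anyone wanting to experiment, a skeleton of a module that installs a logging hook looks roughly like the following. This is a sketch of the usual Postgres hook pattern; the hook variable name (emit_log_hook) and its ErrorData-based signature are my assumption of what the patch provides, so check the patch itself for the real interface:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static emit_log_hook_type prev_log_hook = NULL;

/* Called once for every message the server emits. */
static void
my_log_hook(ErrorData *edata)
{
    /* inspect or forward edata->elevel, edata->message, etc. here */

    /* chain to any hook installed before us */
    if (prev_log_hook)
        prev_log_hook(edata);
}

void
_PG_init(void)
{
    prev_log_hook = emit_log_hook;
    emit_log_hook = my_log_hook;
}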
Re: [HACKERS] libpq: PQcmdStatus, PQcmdTuples signatures can be painlessly improved
On Tue, 2012-01-10 at 14:10 -0500, Alex Goncharov wrote:
| Note that const PGresult * would only warn against changing the
| fields

It would not warn, it would err (the compilation should fail).

No, const violations generally only produce warnings.

| of the PGresult struct. It doesn't do anything about changing the data
| pointed to by pointers in the PGresult struct. So what you are saying
| doesn't follow.

By this logic, passing 'const struct foo *' doesn't have any point and value, for any function.

It pretty much doesn't.

But we know that this is done (and thank you for that) in many cases -- a good style, self-documentation and some protection.

I'm not disagreeing, but I'm pointing out that it won't do what you expect.
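[Editor's note] To spell out why the protection is shallow, a toy example (a hypothetical struct, not libpq's actual PGresult layout):

/* A const qualifier on the pointed-to struct protects its members, but
 * not data reached through pointer members. */
struct result
{
    int   ntuples;
    char *cmd_status;
};

void
demo(const struct result *r)
{
    /* r->ntuples = 0;          rejected: member of a read-only object */

    r->cmd_status[0] = 'X';     /* accepted: only the pointer member is
                                 * const-qualified, not what it points to */
}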
Re: [HACKERS] Dry-run mode for pg_archivecleanup
Hi Josh,

On 15/01/12 01:13, Josh Kupershmidt wrote: I have signed on to review this patch for the 2012-01 CF. The patch applies cleanly, includes the necessary documentation, and implements a useful feature. [...] I think the actual debugging line: + fprintf(stdout, "%s\n", WALFilePath);

Thank you for dedicating your time to this matter.

My actual intention was to have the filename as output of the command, in order to easily pipe it to another script. Hence my first choice was to use the stdout channel, considering also that pg_archivecleanup in dry-run mode is harmless and does not touch the content of the directory.

Oh, and I think the "removing file..." debug message above should not be printed in dry-run mode, lest we confuse the admin.

Yes, I agree with you here.

Let me know what you think about my reasons for the stdout channel, then I will send v2. Thanks again!

Ciao, Gabriele

-- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it
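[Editor's note] For the sake of discussion, the shape being argued for is something like this sketch; the variable and function names are illustrative, not necessarily pg_archivecleanup's actual ones:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static bool dryrun = true;      /* set from a hypothetical -n switch */
static bool debug = false;
static const char *progname = "pg_archivecleanup";

static void
cleanup_one_file(const char *WALFilePath)
{
    if (dryrun)
    {
        /* file name only, on stdout, so output pipes cleanly to a script */
        printf("%s\n", WALFilePath);
        return;
    }

    if (debug)
        fprintf(stderr, "%s: removing file \"%s\"\n", progname, WALFilePath);

    if (unlink(WALFilePath) < 0)
        fprintf(stderr, "%s: could not remove file \"%s\": %s\n",
                progname, WALFilePath, strerror(errno));
}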
Re: [HACKERS] pg_trigger_depth() v3 (was: TG_DEPTH)
Kevin Grittner wrote: Florian Pflug wrote: The trigger depth is incremented before calling the trigger function in ExecCallTriggerFunc() and decremented right afterwards, which seems fine - apart from the fact that the decrement is skipped in case of an error. The patch handles that by saving respectively restoring the value of pg_depth in PushTransaction() respectively {Commit,Abort}SubTransaction(). While I can't come up with a failure case for that approach, it seems rather fragile - who's to say that nested trigger invocations correspond that tightly to sub-transactions? I believe the same effect could be achieved more cleanly by a TRY/CATCH block in ExecCallTriggerFunc().

Done that way in this version.

* Other in-core PLs: As it stands, the patch only exports tg_depth to plpgsql functions, not to functions written in one of the other in-core PLs. It'd be good to change that, I believe - otherwise the other PLs become second-class citizens in the long run.

Are you suggesting that this be implemented as a special trigger variable in every PL, or that it simply be a regular function that returns zero when not in a trigger and some positive value when called from a trigger? The latter seems like a pretty good idea to me. If that is done, is there any point to *also* having a TG_DEPTH trigger variable in plpgsql? (I don't think there is.)

I dropped the TG_DEPTH name and the patch just supports a pg_trigger_depth() function now. Useful values are probably:

0: No trigger active on the connection.
1: Top level trigger. Useful to restrict certain DML to be allowed only from triggers.
>1: OK to do trigger-restricted DML.
greater than expected maximum depth: warn before hard crash

[questions about code coverage in regression tests]

I altered the tests to improve code coverage. In addition, since this is no longer just a plpgsql feature, I moved the tests to the triggers.sql file.

-Kevin

tg_depth.v3.patch Description: Binary data
Re: [HACKERS] Our poll() based WaitLatch implementation is broken
On 15.01.2012 09:26, Peter Geoghegan wrote: Build Postgres master, on Linux or another platform that will use the poll() implementation rather than the older select(). Send the Postmaster SIGKILL. Observe that the WAL Writer lives on, representing a denial of service as it stays attached to shared memory, busy waiting (evident from the fact that it quickly leaks memory).

The poll()-based implementation checked for POLLIN on the postmaster-alive pipe, just like we check for the fd to become readable in the select() implementation. But poll() has a separate POLLHUP event code for that; it does not set POLLIN on the fd but POLLHUP. Fixed, to check POLLHUP. I still kept the check for POLLIN; the pipe should never become readable, so if it does, something is badly wrong. I also threw in a check for POLLNVAL, which means "Invalid request: fd not open". That should definitely not happen, but if it does, it seems good to treat it as postmaster death too. Even if the postmaster isn't dead yet, we could no longer detect when it does die.

The rationale for introducing the poll()-based implementation where available was that it performed better than a select()-based one. I wonder, how compelling a win is that expected to be?

Ganesh Venkitachalam did some micro-benchmarking after the latch patch was committed: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01609.php. I don't think it makes any meaningful difference in real applications, but poll() also doesn't have an arbitrary limit on the range of fd numbers that can be used.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
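[Editor's note] Condensed to its essentials, the fixed wait looks like this sketch; it is simplified from the shape of the latch code, not the exact committed diff:

#include <poll.h>

/* Returns 1 once the postmaster is known (or presumed) dead. */
static int
postmaster_died(int postmaster_alive_fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = postmaster_alive_fd;   /* read end of the life-sign pipe */
    pfd.events = POLLIN;            /* POLLHUP/POLLERR/POLLNVAL are always
                                     * reported without being requested */
    pfd.revents = 0;

    if (poll(&pfd, 1, timeout_ms) > 0)
    {
        /*
         * POLLHUP: the writer (the postmaster) closed its end, i.e. died.
         * POLLIN: the pipe became readable, which should never happen,
         * so treat it as something badly wrong. POLLNVAL: fd not open,
         * so we could no longer detect death anyway.
         */
        if (pfd.revents & (POLLHUP | POLLIN | POLLNVAL))
            return 1;
    }
    return 0;
}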
Re: [HACKERS] pg_basebackup is not checking IDENTIFY_SYSTEM numbre of columns
On Wed, Jan 11, 2012 at 5:11 PM, Magnus Hagander mag...@hagander.net wrote: No, no reason. Adding such a check would be a good idea.

ok. patch attached; it also adds a few PQclear() calls before disconnect_and_exit().

btw, in BaseBackup() in line 1149 (after the patch is applied) there is an exit() instead of disconnect_and_exit() and that is probably a typo too

-- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
new file mode 100644
index 4007680..22d906a
*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
*************** BaseBackup(void)
*** 916,927 ****
  	{
  		fprintf(stderr, _("%s: could not identify system: %s\n"),
  				progname, PQerrorMessage(conn));
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1)
  	{
! 		fprintf(stderr, _("%s: could not identify system, got %i rows\n"),
! 				progname, PQntuples(res));
  		disconnect_and_exit(1);
  	}
  	sysidentifier = strdup(PQgetvalue(res, 0, 0));
--- 916,929 ----
  	{
  		fprintf(stderr, _("%s: could not identify system: %s\n"),
  				progname, PQerrorMessage(conn));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1 || PQnfields(res) != 3)
  	{
! 		fprintf(stderr, _("%s: could not identify system, got %i rows and %i fields\n"),
! 				progname, PQntuples(res), PQnfields(res));
! 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	sysidentifier = strdup(PQgetvalue(res, 0, 0));
*************** BaseBackup(void)
*** 954,965 ****
--- 956,969 ----
  	{
  		fprintf(stderr, _("%s: could not initiate base backup: %s"),
  				progname, PQerrorMessage(conn));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (PQntuples(res) != 1)
  	{
  		fprintf(stderr, _("%s: no start point returned from server\n"),
  				progname);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	strcpy(xlogstart, PQgetvalue(res, 0, 0));
*************** BaseBackup(void)
*** 976,986 ****
--- 980,992 ----
  	{
  		fprintf(stderr, _("%s: could not get backup header: %s"),
  				progname, PQerrorMessage(conn));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (PQntuples(res) < 1)
  	{
  		fprintf(stderr, _("%s: no data returned from server\n"), progname);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
*************** BaseBackup(void)
*** 1010,1015 ****
--- 1016,1022 ----
  	{
  		fprintf(stderr, _("%s: can only write single tablespace to stdout, database has %d\n"),
  				progname, PQntuples(res));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
*************** BaseBackup(void)
*** 1051,1062 ****
--- 1058,1071 ----
  	{
  		fprintf(stderr, _("%s: could not get WAL end position from server\n"),
  				progname);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (PQntuples(res) != 1)
  	{
  		fprintf(stderr, _("%s: no WAL end position returned from server\n"),
  				progname);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	strcpy(xlogend, PQgetvalue(res, 0, 0));
*************** BaseBackup(void)
*** 1069,1074 ****
--- 1078,1084 ----
  	{
  		fprintf(stderr, _("%s: final receive failed: %s"),
  				progname, PQerrorMessage(conn));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
*************** BaseBackup(void)
*** 1089,1094 ****
--- 1099,1105 ----
  	{
  		fprintf(stderr, _("%s: could not send command to background pipe: %s\n"),
  				progname, strerror(errno));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
*************** BaseBackup(void)
*** 1098,1121 ****
--- 1109,1136 ----
  	{
  		fprintf(stderr, _("%s: could not wait for child process: %s\n"),
  				progname, strerror(errno));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (r != bgchild)
  	{
  		fprintf(stderr, _("%s: child %i died, expected %i\n"),
  				progname, r, bgchild);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (!WIFEXITED(status))
  	{
  		fprintf(stderr, _("%s: child process did not exit normally\n"),
  				progname);
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (WEXITSTATUS(status) != 0)
  	{
  		fprintf(stderr, _("%s: child process exited with error %i\n"),
  				progname, WEXITSTATUS(status));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	/* Exited normally, we're happy! */
*************** BaseBackup(void)
*** 1130,1135 ****
--- 1145,1151 ----
  	{
  		fprintf(stderr, _("%s: could not parse xlog end position \"%s\"\n"),
  				progname, xlogend);
+ 		PQclear(res);
  		exit(1);
  	}
  	InterlockedIncrement(&has_xlogendptr);
*************** BaseBackup(void)
*** 1140,1145 ****
--- 1156,1162 ----
  		_dosmaperr(GetLastError());
  		fprintf(stderr, _("%s: could not wait for child thread: %s\n"),
  				progname, strerror(errno));
+ 		PQclear(res);
  		disconnect_and_exit(1);
  	}
  	if (GetExitCodeThread((HANDLE) bgchild, &status) == 0)
*************** BaseBackup(void)
*** 1147,1158 ****
--- 1164,1177
Re: [HACKERS] [PATCH] Support for foreign keys with arrays
Hello,

On Sun, 2011-12-11 at 19:45 -0500, Noah Misch wrote: On Sat, Dec 10, 2011 at 09:47:53AM +0100, Gabriele Bartolini wrote: So, here is a summary:

 ----------------+---------+---------
                 |   ON    |   ON
  Action         | DELETE  | UPDATE
 ----------------+---------+---------
  CASCADE        | Row     | Error
  SET NULL       | Row     | Row
  SET DEFAULT    | Row     | Row
  ARRAY CASCADE  | Element | Element
  ARRAY SET NULL | Element | Element
  NO ACTION      | -       | -
  RESTRICT       | -       | -
 ----------------+---------+---------

If that's fine with you guys, Marco and I will refactor the development based on these assumptions.

Looks fine.

This is our latest version of the patch. Gabriele, Gianni and I have discussed a lot and decided to send an initial patch which uses EACH REFERENCES instead of ARRAY REFERENCES. The reason behind this is that ARRAY REFERENCES generates an ambiguous grammar, and we all agreed that EACH REFERENCES makes sense (and at the same time does not introduce any new keyword). This is however open for discussion.

The patch now includes the following clauses for the delete/update actions - as per previous emails:

 ----------------+---------+---------
                 |   ON    |   ON
  Action         | DELETE  | UPDATE
 ----------------+---------+---------
  CASCADE        | Row     | Error
  SET NULL       | Row     | Row
  SET DEFAULT    | Row     | Row
  ARRAY CASCADE  | Element | Element
  ARRAY SET NULL | Element | Element
  NO ACTION      | -       | -
  RESTRICT       | -       | -
 ----------------+---------+---------

We will resubmit the patch for the 2012-01 commit fest.

Thank you, Marco

-- Marco Nenciarini - System manager @ Devise.IT marco.nenciar...@devise.it | http://www.devise.it

foreign-key-array-v2.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] Vacuum rate limit in KBps
On 15.01.2012 10:24, Greg Smith wrote:

That got me thinking: if MB/s is what everyone wants to monitor, can we provide a UI to set these parameters that way too? The attached patch is a bit rough still, but it does that. The key was recognizing that the cost delay plus cost limit can be converted into an upper limit on cost units per second, presuming the writes themselves are free. If you then also assume the worst case--that everything will end up dirty--by throwing in the block size, too, you compute a maximum rate in MB/s. That represents the fastest you can possibly write.

+1. I've been thinking we should do that for a long time, but haven't gotten around to it.

I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes?

I was thinking of something like this, in postgresql.conf:

# - Vacuum Throttling -

#vacuum_cost_page_miss = 1.0    # measured on an arbitrary scale
#vacuum_cost_page_dirty = 2.0   # same scale as above
#vacuum_cost_page_hit = 0.1     # same scale as above
#vacuum_rate_limit = 8MB        # max reads per second

This is now similar to the cost settings for the planner, which is good.

There's one serious concern I don't have a quick answer to. What do we do with in-place upgrade of relations that specified a custom vacuum_cost_limit? I can easily chew on getting the right logic to convert those to equals in the new setting style, but I am not prepared to go solely on the hook for all in-place upgrade work one might do here. Would this be easiest to handle as one of those dump/restore transformations?

It needs to be handled at dump/restore time. I'm not sure where that transformation belongs, though. Do we have any precedent for this? I think we have two options:

1. Accept the old autovacuum_cost_limit setting in CREATE TABLE, and transform it immediately into the corresponding autovacuum_rate_limit setting.

2. Transform in pg_dump, so that the CREATE TABLE statements in the dump use the new autovacuum_rate_limit setting.

The advantage of option 1 is that dumps taken with old 9.1 pg_dump still work on a 9.2 server. We usually try to preserve that backwards-compatibility, although we always recommend using the pg_dump from the newer version on upgrade. However, you need to know the vacuum_cost_page_miss setting effective in the old server to do the transformation correctly (or vacuum_cost_page_dirty, if we use the write max rate as the main knob as you suggested), and we don't have access to that when restoring a dump.

My guess is that's more sensible than the alternative of making an on-read converter that only writes in the new format, then worrying about upgrading all old pages before moving forward.

This doesn't require any page format changes, so I don't think the above sentence makes any sense.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
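To make the cost-to-rate conversion discussed above concrete, here is the arithmetic using the long-standing defaults (vacuum_cost_limit = 200, vacuum_cost_page_dirty = 20, a 20 ms cost delay, 8 kB blocks); this is just illustrative math, not code from the patch:

SELECT (200 / 20.0)             -- pages allowed per cost cycle (cost_limit / page_dirty)
     * (1000 / 20.0)            -- cost cycles per second at a 20 ms delay
     * 8192 / (1024 * 1024.0)   -- 8 kB pages converted to MB
       AS worst_case_write_mb_per_sec;   -- => about 3.9 MB/s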
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On 01/13/2012 07:09 PM, Alex Hunsaker wrote: On Fri, Jan 13, 2012 at 16:07, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 09:28 PM, Alex Hunsaker wrote:

Util.c/o not depending on plperl_helpers.h was also throwing me for a loop, so I fixed it and SPI.c... Thoughts?

Basically looks good, but I'm confused by this:

do language plperl $$ elog('NOTICE', ${^TAINT}); $$;

Why is NOTICE quoted here? It's a constant that we've been careful to set up.

No reason. [Err, well, it was just part of me flailing around while trying to figure out why elog was still crashing even though I had the issue fixed. The real problem was that Util.o was not being recompiled, due to Util.c not depending on plperl_helpers.h.]

Except that it doesn't work. The above produces no output in the regression tests. I've committed it with that changed and the result file changed accordingly.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 13, 2012 at 9:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Jeff Janes jeff.ja...@gmail.com writes: I don't understand why this is controversial.

In the current code, if you have a master and a single sync standby, and the master disappears and you promote the standby, now the new master is running *without a standby*. If you configured it to use sync rep, it won't accept any transactions until you give it a standby. If you configured it not to, then it's you that has changed the replication requirements.

Sure, but isn't that a very common usage? Maybe my perceptions are out of whack, but I commonly hear about fail-over and rarely hear about using more than one slave so that you can fail over and still have a positive number of slaves.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 13, 2012 at 10:12 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

Jeff Janes jeff.ja...@gmail.com wrote: I don't understand why this is controversial.

I'm having a hard time seeing why this is considered a feature. It seems to me what is being proposed is a mode with no higher integrity guarantee than asynchronous replication, but latency equivalent to synchronous replication.

There are never 100% guarantees. You could always have two independent failures (the WAL disk of the master and of the slave) nearly simultaneously. If you look at weaker guarantees, then with asynchronous replication you are almost guaranteed to lose transactions on a fail-over of a busy server, and with the proposed option you are almost guaranteed not to, as long as disconnections are rare.

As far as latency, I think there are many cases when a small latency is pretty much equivalent to zero latency. A human on the other end of a commit is unlikely to notice a latency of 0.1 seconds.

I can see where it's tempting to want to think it gives something more in terms of integrity guarantees, but when I think it through, I'm not really seeing any actual benefit.

I think the value of having a synchronously replicated commit is greater than zero but less than infinite. I don't think it is outrageous to think that that value could be approximately expressed in seconds you are willing to wait for that replicated commit before going ahead without it.

If this fed into something such that people got jabber messages, emails, or telephone calls any time it switched between synchronous and stand-alone mode, that would make it a built-in monitoring, fail-over, and alert system -- which *would* have some value. But in the past we've always recommended external tools for such features.

Since synchronous_standby_names cannot be changed without bouncing the server, we do not provide the tools for an external tool to make this change cleanly.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dry-run mode for pg_archivecleanup
On Sun, Jan 15, 2012 at 3:02 PM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote:

My actual intention was to have the filename as output of the command, in order to easily pipe it to another script. Hence my first choice was to use the stdout channel, considering also that pg_archivecleanup in dry-run mode is harmless and does not touch the content of the directory.

Oh, right - I should have re-read your initial email before diving into the patch. That all makes sense given your intended purpose. I guess your goal of constructing some simple way to pass the files which would be removed on to another script is a little different than what I initially thought the patch would be useful for, namely as a testing/debugging aid for an admin.

Perhaps both goals could be met by making use of '--debug' together with '--dry-run'. If they are both on, then an additional message like "pg_archivecleanup: would remove file ..." would be printed to stderr, along with just the filename printed to stdout as you already have.

Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Group commit, revised
Attached is a patch that myself and Simon Riggs collaborated on. I took the group commit patch that Simon posted to the list back in November, and partially rewrote it. Here is that original thread: http://archives.postgresql.org/pgsql-hackers/2011-11/msg00802.php

I've also attached the results of a pgbench-tools driven benchmark, which are quite striking (just the most relevant image - e-mail me privately if you'd like a copy of the full report, as I don't want to send a large PDF file to the list as a courtesy to others). Apart from the obvious improvement in throughput, there is also a considerable improvement in average and worst latency at all client counts.

To recap, the initial impetus to pursue this idea came from the observation that with sync rep, we could get massive improvements in transactions-per-second throughput by simply adding clients. Greg Smith performed a benchmark while in Amsterdam for the PgConf.EU conference, which was discussed in a talk there. Over an inter-continental connection from Amsterdam to his office in Baltimore on the U.S. east coast, he saw TPS as reported by pgbench on what I suppose was either an insert or update workload grow from a mere 10 TPS for a single connection to over 2000 TPS for about 300 connections. That was with the large, inherent latency imposed by those sorts of distances (3822 miles / 6150 km, about a 100 ms ping time on a decent connection). Quite obviously, as clients were added, the server was able to batch increasing numbers of commits in each confirmation message, resulting in this effect.

The main way that I've added value here is by refactoring and fixing bugs. There were some tricky race conditions that caused the regression tests to fail for that early draft patch, but there were a few other small bugs too. There is an unprecedented latch pattern introduced by this patch: two processes (the WAL Writer and any given connection backend) have a mutual dependency. Only one may sleep, in which case the other is responsible for waking it. Much of the difficulty in debugging this patch, and much of the complexity that I've added, came from preventing both from simultaneously sleeping, even in the face of various known failure modes like postmaster death. If this happens, it does of course represent a denial-of-service, so that's something that reviewers are going to want to heavily scrutinise. I suppose that sync rep should be fine here, as it waits on walsenders, but I haven't actually comprehensively tested the patch, so there may be undiscovered unpleasant interactions with other xlog-related features. I can report that it passes the regression tests successfully, and on an apparently consistent basis - I battled with intermittent failures for a time.

Simon's original patch largely copied the syncrep.c code as an expedient to prove the concept. Obviously this design was never intended to get committed, and I've done some commonality and variability analysis, refactoring to considerably slim down the new groupcommit.c file by exposing some previously module-private functions from syncrep.c.

I encourage others to reproduce my benchmark here. I attach a pgbench-tools config. You can get the latest version of the tool at: https://github.com/gregs1104/pgbench-tools

I also attach hdparm information for the disk that was used during these benchmarks. Note that I have not disabled the write cache. It's a Linux box, with ext4, running a recent kernel.
The benefits (and, admittedly, the controversies) of this patch go beyond mere batching of commits: it also largely, though not entirely, obviates the need for user backends to directly write/flush WAL, and the WAL Writer may never sleep if it continually finds work to do - wal_writer_delay is obsoleted, as are commit_siblings and commit_delay. I suspect that they will not be missed. Of course, it does all this to facilitate group commit itself.

The group commit feature does not have a GUC to control it, as this seems like something that would be fairly pointless to turn off. FWIW, this is currently the case for the recently introduced MariaDB group commit implementation.

Auxiliary processes cannot use group commit. The changes made prevent them from availing of commit_siblings/commit_delay parallelism, because those settings no longer exist.

Group commit is sometimes throttled, which seems appropriate - if a backend requests that the WAL Writer flush an LSN deemed too far from the known flushed point, that request is rejected and the backend goes through another path, where XLogWrite() is called. Currently the group commit infrastructure decides that on the sole basis of there being a volume of WAL equivalent in size to checkpoint_segments between the two points. This is probably a fairly horrible heuristic, not least since it overloads checkpoint_segments, but it is of course only a first-pass effort. Bright ideas are always welcome.

Thoughts?

--
Peter Geoghegan
[HACKERS] SKIP LOCKED DATA
Hi

Apologies for posting about new vapourware features for distant future releases at a very busy time in the cycle for 9.2...

I am wondering out loud whether I am brave enough to try to propose SKIP LOCKED DATA support and would be grateful for any feedback and/or {en|dis}couragement. I don't see it on the todo list, and didn't find signs of others working on this (did I miss something?), but there are examples of users asking for this feature (by various names) on the mailing lists. Has the idea already been rejected, is it fundamentally infeasible for some glaring reason, or far too complicated for new players?

What it is:

Like the existing NOWAIT option, it means your query doesn't wait for other sessions when trying to get an exclusive lock. However, rather than returning an error if it would block, it simply skips over the rows that couldn't be locked.

What it looks like in other RDBMSs:

DB2 (z/OS only): FOR UPDATE SKIP LOCKED DATA
Oracle: FOR UPDATE SKIP LOCKED
Sybase: FOR UPDATE READPAST
MS SQL Server: FOR UPDATE WITH (READPAST)

(I'm not 100% sure about the last two, which I found by googling for equivalents of the first two, and there are no doubt subtle differences among these.)

What it's for:

A common usage for this is to increase parallelism in systems with multiple workers taking jobs from a queue. I've used it for this purpose myself on another RDBMS, having seen it recommended for some types of work queue implementation (see the usage sketch after this message). It may have other uses.

How it might be implemented in PostgreSQL:

1. Extend the grammar and parser to support SKIP LOCKED DATA (or some other choice of words) in the same place that NOWAIT can appear.

2. Modify heap_lock_tuple so that the boolean 'nowait' argument is replaced by an enumeration LockWaitPolicy with values LOCK_WAIT_POLICY_WAIT (= what false currently does), LOCK_WAIT_POLICY_LOCK_OR_ERROR (= what true currently does), and LOCK_WAIT_POLICY_LOCK_OR_SKIP (= new behaviour). Where 'nowait' is currently handled, the new case would also be handled, cleaning up resources and returning a new HTSU_Result enumerator HeapTupleWouldBlock.

3. Modify ExecLockRows to pass the appropriate value to heap_lock_tuple (presumably received via ExecRowMark, as nowait is received currently). Modify the switch on the result of that call, treating the new case HeapTupleWouldBlock the same way that HeapTupleSelfUpdated is treated -- that is, goto lnext to fetch the next tuple.

4. Probably some changes to handle table-level locks too.

5. Probably many other things that I'm not aware of right now and won't discover until I dig/ask further and/or run into a brick wall!

Useful? Doable?

Thanks,

Thomas Munro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
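A sketch of the queue pattern referred to above, written with the proposed syntax; job_queue is a hypothetical table, and none of this exists in PostgreSQL today:

BEGIN;
SELECT id, payload
  FROM job_queue
 WHERE NOT done
 ORDER BY id
 LIMIT 1
   FOR UPDATE SKIP LOCKED DATA;  -- skip rows other workers already locked
-- ... perform the job for the id returned ...
UPDATE job_queue SET done = true WHERE id = 1;   -- the id claimed above
COMMIT;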
Re: [HACKERS] disable prompting by default in createuser
On Thu, Dec 22, 2011 at 2:26 PM, Peter Eisentraut pete...@gmx.net wrote: On Sat, 2011-11-26 at 01:28 +0200, Peter Eisentraut wrote:

I propose that we change createuser so that it does not prompt for anything by default. We can arrange options so that you can get prompts for whatever is missing, but by default, a call to createuser should just run CREATE USER with default options. The fact that createuser issues prompts is always annoying when you create setup scripts, because you have to be careful to specify all the necessary options, and they are inconsistent and different between versions (although the last change about that was a long time ago), and the whole behavior seems contrary to the behavior of all other utilities. I don't think there'd be a real loss by not prompting by default; after all, we don't really want to encourage users to create more superusers, do we? Comments?

Patch attached. I'll add it to the next commitfest.

I looked at this patch for the 2012-01 CF. I like the idea; the interactive-by-default behavior of createuser has always bugged me as well.

I see this patch includes a small change to dropuser, to make the 'username' argument mandatory if --interactive is not set, for symmetry with createuser's new behavior. That's dandy, though IMO we shouldn't have -i be shorthand for --interactive with dropuser and something different with createuser (i.e. we should just get rid of the -i alias for dropuser).

Another little inconsistency I see with the behavior when no username to create or drop is given:

$ createuser
createuser: creation of new role failed: ERROR:  role "josh" already exists
$ dropuser
dropuser: missing required argument role name
Try "dropuser --help" for more information.

i.e. createuser tries taking either $PGUSER or the current username as a default user to create, while dropuser just bails out. Personally, I prefer just bailing out if no create/drop user is specified, but either way I think they should be consistent.

Oh, and I think the leading whitespace of this help message:

printf(_("  --interactive             prompt for missing role name and attributes rather\n"

should be made the same as for other commands with no short alias, e.g.

printf(_("  --replication             role can initiate replication\n"));

Other than those little complaints, everything looks good.

Josh

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
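For reference, the non-prompting default behavior being reviewed amounts to createuser issuing a statement along these lines ('alice' is a placeholder name; the option list reflects CREATE ROLE defaults):

CREATE ROLE alice NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;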
[HACKERS] Memory usage during sorting
In tuplesort.c, it initially reads tuples into memory until availMem is exhausted. It then switches to the tape sort algorithm, and allocates buffer space for each tape it will use. This substantially overruns allowedMem, and drives availMem negative. It works off this deficit by writing tuples to tape and pfree-ing their spots in the heap; that freed space is more or less randomly scattered. Once this is done it proceeds to alternate between freeing more space in the heap and adding things to the heap (in a nearly strict 1+1 alternation if the tuples are constant size).

The space freed up by the initial round of pfrees, where it is working off the space deficit from inittapes, is never re-used. It also cannot be paged out by the VM system, because it is scattered among actively used memory. I don't think that small chunks can be reused from one memory context to another, but I haven't checked. Even if they can be, during a big sort like an index build the backend process doing the build may have no other contexts which need to use them. So having overrun workMem and stomped all over it to ensure no one else can re-use it, we then scrupulously refuse to benefit from that over-run amount ourselves.

The attached patch allows it to reuse that memory. On my meager system it reduced the time to build an index on an integer column in a skinny, 200 million row, totally randomly ordered table by about 3% from a baseline of 25 minutes.

I think it would be better to pre-deduct the tape overhead amount we will need if we decide to switch to tape sort from the availMem before we even start reading (and then add it back if we do indeed make that switch). That way we wouldn't over-run the memory in the first place. However, that would cause apparent regressions in which sorts that previously fit into maintenance_work_mem no longer do. Boosting maintenance_work_mem to a level that was actually being used previously would fix those regressions, but pointing out that the previous behavior was not optimal doesn't change the fact that people are used to it and perhaps tuned to it. So the attached patch seems more backwards-friendly.

Cheers,

Jeff

sort_mem_usage_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
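The switch from the in-memory heap to the tape algorithm that Jeff describes can be watched via the trace_sort parameter (available when built with TRACE_SORT, which is defined by default); the table name and log line here are illustrative:

SET trace_sort = on;
SET client_min_messages = log;
CREATE INDEX ON big_random (val);
-- LOG:  switching to external sort with 7 tapes: CPU 0.81s/2.33u sec elapsed 3.86 sec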
Re: [HACKERS] pgstat documentation tables
On 01/15/2012 12:20 PM, Tom Lane wrote:

Please follow the style already used for system catalogs; ie I think there should be a summary table with one entry per view, and then a separate description and table-of-columns for each view.

Yes, that's a perfect precedent. I think the easiest path forward here is to tweak the updated pg_stat_activity documentation, since that's being refactored first anyway. That can be reformatted until it looks just like the system catalog documentation. And then once that's done, the rest of them can be converted over to follow the same style.

I'd be willing to work on doing that in a way that improves what is documented, too. The difficulty of working with the existing tables has been the deterrent for improving that section for me.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] archive_keepalive_command
On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote:

archive_command and restore_command describe how to ship WAL files to/from an archive. When there is nothing to ship, we delay sending WAL files. When there are no WAL files, the standby has no information at all.

To provide some form of keepalive on quiet systems, the archive_keepalive_command provides a generic hook to implement keepalives. This is implemented as a separate command to avoid storing keepalive messages in the archive, or at least to allow overwrites using a single filename like "keepalive". Examples:

archive_keepalive_command = 'arch_cmd keepalive'       # sends a file called keepalive to archive, overwrites allowed
archive_keepalive_command = 'arch_cmd %f.%t.keepalive' # sends a file like 0001000ABFE.20111216143517.keepalive

If there is no WAL file to send, then we send a keepalive file instead. The keepalive is a small file that contains the same contents as a streaming keepalive message (re: other patch on that).

If no WAL file is available and we are attempting to restore in standby_mode, then we execute restore_keepalive_command to see if a keepalive file is available. It checks for a file in the specific keepalive format and then uses that to update last received info from the master, e.g.

restore_keepalive_command = 'restore_cmd keepalive'    # gets a file called keepalive from the archive, overwrites allowed

Patch.

--
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..fab288c 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -43,6 +43,13 @@
 #
 #restore_command = ''		# e.g. 'cp /mnt/server/archivedir/%f %p'
 #
+# restore_keepalive_command
+#
+# specifies an optional shell command to download keepalive files
+# e.g. archive_keepalive_command = 'cp -f %p $ARCHIVE/keepalive </dev/null'
+# e.g. restore_keepalive_command = 'cp $ARCHIVE/keepalive %p'
+#
+#restore_keepalive_command = ''
 #
 # archive_cleanup_command
 #
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..2729141 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -73,8 +73,10 @@ int			CheckPointSegments = 3;
 int			wal_keep_segments = 0;
 int			XLOGbuffers = -1;
 int			XLogArchiveTimeout = 0;
+int			XLogArchiveKeepaliveTimeout = 10;	/* XXX set to 60 before commit */
 bool		XLogArchiveMode = false;
 char	   *XLogArchiveCommand = NULL;
+char	   *XLogArchiveKeepaliveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		log_checkpoints = false;
@@ -188,6 +190,7 @@ static bool restoredFromArchive = false;
 
 /* options taken from recovery.conf for archive recovery */
 static char *recoveryRestoreCommand = NULL;
+static char *recoveryRestoreKeepaliveCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -634,6 +637,7 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static bool RestoreArchivedFile(char *path, const char *xlogfname,
 					const char *recovername, off_t expectedSize);
+static void RestoreKeepaliveFile(void);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 					   bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2718,7 +2722,10 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 										  RECOVERYXLOG, XLogSegSize);
 			if (!restoredFromArchive)
+			{
+				RestoreKeepaliveFile();
 				return -1;
+			}
 			break;
 
 		case XLOG_FROM_PG_XLOG:
@@ -3179,6 +3186,192 @@ not_available:
 	return false;
 }
 
+static void
+RestoreKeepaliveFile(void)
+{
+	char		keepalivepath[MAXPGPATH];
+	char		keepaliveRestoreCmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+	bool		signaled;
+	struct stat stat_buf;
+
+	/* In standby mode, restore_command might not be supplied */
+	if (recoveryRestoreKeepaliveCommand == NULL)
+		return;
+
+	snprintf(keepalivepath, MAXPGPATH, XLOGDIR "/archive_status/KEEPALIVE");
+
+	/*
+	 * Make sure there is no existing file in keepalivepath
+	 */
+	if (stat(keepalivepath, &stat_buf) == 0)
+	{
+		if (unlink(keepalivepath) != 0)
+			ereport(FATAL,
+					(errcode_for_file_access(),
+					 errmsg("could not remove file \"%s\": %m",
+							keepalivepath)));
+	}
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = keepaliveRestoreCmd;
+	endp = keepaliveRestoreCmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = recoveryRestoreKeepaliveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of target file */
+					sp++;
+
Re: [HACKERS] Vacuum rate limit in KBps
On 01/15/2012 04:17 PM, Heikki Linnakangas wrote:

I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes?

I haven't had the I/O rate logging available for long enough to have a good feel for which is more important to emphasize. I'm agnostic on this. I'd have no problem accepting the argument that exposing the larger of the two rates--which is the read one--makes for a cleaner UI. Or that it is the one more like the other knobs that set precedents here. My guess is that the changed documentation will actually be a bit cleaner that way. I give an example in the patch of how read and write rate are related, based on the ratio of the values for dirty vs. hit. I wasn't perfectly happy with how that was written yet, and I think it could be cleaner if the read rate is the primary tunable.

We usually try to preserve that backwards-compatibility, although we always recommend using the pg_dump from the newer version on upgrade. However, you need to know the vacuum_cost_page_miss setting effective in the old server to do the transformation correctly (or vacuum_cost_page_dirty, if we use the write max rate as the main knob as you suggested), and we don't have access when restoring a dump.

If someone does a storage parameter change to autovacuum_vacuum_cost_limit but doesn't touch autovacuum_vacuum_cost_delay there, I think it's possible to need the GUC value for autovacuum_vacuum_cost_delay too, which can then refer to vacuum_cost_delay as well.

I don't think that tweaking these parameters, particularly at the storage options level, is a popular thing to do. I've run into customers who made changes there while trying to navigate the complexity of autovacuum tuning, but not very many. My guess as I think about that history is that I've ended up reverting them as often, or maybe even slightly more often, than I've ended up keeping them around. It's been hard to do well. And the level of PostgreSQL deployment that reaches that stage of tuning, where they managed to tweak those productively, doesn't seem likely to do a blind upgrade to me.

One of the reasons I thought now was a good time to work on this change is because there are already things brewing that are going to make 9.2 break a few more things than anyone would like, all for long-term positive benefits. recovery.conf and pg_stat_activity changes are the two of those I've been tracking the closest.

My current thinking on this is that we ought to learn a lesson from the 8.3 casting breakage saga and provide a clear 9.2 Upgrade Migration Guide that goes beyond just listing what changed in the release notes. Aim more toward having a checklist of things to look for and tools to help find them. In this case, having the migration guide include a query that pokes through the catalogs looking for this particular customization would be helpful. It would be easy to produce a bit of post-9.2 upgrade SQL that converted a customization here if you started by running something against the existing installation. And that should be a wiki page so it can be extended as new things are discovered, some of which will include application-specific guidance.
The rough idea in this direction I put together for 8.3 (it didn't catch on for later versions) is at http://wiki.postgresql.org/wiki/Version_History

Note how that grew to include tsearch2 migration info and MediaWiki-specific advice at one point (some of that was then lost in a Planet fire, but you can get the idea just from the article titles). Needing a version migration guide with specific details about issues like this is something I think PostgreSQL needs to accept as part of the release cycle. Expecting that pg_upgrade can transparently handle every possible change would be setting a high bar to clear, higher than I think is expected by the database industry at large.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
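A sketch of the kind of catalog query such a migration guide could carry, assuming the per-table settings keep their current storage parameter names (the new names were still being discussed above):

SELECT c.oid::regclass AS relation, c.reloptions
  FROM pg_class c
 WHERE array_to_string(c.reloptions, ',')
       ~ 'autovacuum_vacuum_cost_(limit|delay)';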
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 01/15/2012 05:14 PM, Ants Aasma wrote:

I hope that having a tool to measure the overhead and check the sanity of clock sources is enough to answer the worries about the potential performance hit. We could also check that the clock source is fast enough on start-up/when the GUC is changed, but that seems a bit too much and leaves open the question about what is fast enough.

I've been thinking along those same lines--check at startup, provide some guidance on the general range of what's considered fast vs. slow in both the code and documentation. What I'm hoping to do here is split your patch in half and work on the pg_test_timing contrib utility first. That part answers some overdue questions about when EXPLAIN ANALYZE can be expected to add a lot of overhead, which means it's useful all on its own. I'd like to see that utility go into 9.2, along with a new documentation section covering that topic. I'll write the new documentation bit.

As far as the buffer timing goes, there is a lot of low-level timing information I'd like to see the database collect. To pick a second example with very similar mechanics, I'd like to know which queries spend a lot of their time waiting on locks. The subset of time a statement spends waiting just for commit-related things is a third. The industry-standard term for these is "wait events", as seen in Oracle, MySQL, MS SQL Server, etc. That's so standard I don't see an intellectual property issue with PostgreSQL using the same term. Talk with a random person who is converting from Oracle to PostgreSQL and ask them about their performance concerns. At least 3/4 of those conversations I have mention being nervous about not having wait event data.

Right now, I feel the biggest hurdle to performance tuning PostgreSQL is not having good enough built-in query log analysis tools. If the pg_stat_statements normalization upgrade in the CF queue is committed, that's enough to make me bump that to "solved well enough". After clearing that hurdle, figuring out how to log, analyze, and manage storage of wait events is the next biggest missing piece. One of my top goals for 9.3 was to make sure that happened.

I don't think the long-term answer for how to manage wait event data is to collect it as part of pg_stat_statements, though. But I don't have a good alternate proposal, while you've submitted a patch that actually does something useful right now. I'm going to think some more about how to reconcile all that. There is an intermediate point to consider as well, which is just committing something that adjusts the core code to make the buffer wait event data available. pg_stat_statements is easy enough to continue work on outside of core. I could see a path where that happens in parallel with adding a better core wait event infrastructure, just to get the initial buffer wait info into people's hands earlier.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
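The overhead question pg_test_timing is meant to quantify can already be eyeballed from psql; t here is any convenient large table:

\timing on
SELECT count(*) FROM t;                  -- bare execution time
EXPLAIN ANALYZE SELECT count(*) FROM t;  -- same scan plus per-row timing calls
-- the gap between the two reported times is the instrumentation overhead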
Re: [HACKERS] separate initdb -A options for local and host
On Sat, Jan 14, 2012 at 5:18 PM, Peter Eisentraut pete...@gmx.net wrote: On Sat, 2011-11-26 at 01:20 +0200, Peter Eisentraut wrote:

I think it would be useful to have separate initdb -A options for local and host entries. In 9.1, we went out of our way to separate the peer and ident methods, but we have moved the confusion into the initdb -A option, where ident sometimes means peer, and peer sometimes means ident. Moreover, having separate options would allow what I think would be a far more common use case, namely having local peer and host something other than ident, such as md5. I'm thinking, we could keep the existing -A option, but add long options such as --auth-local and --auth-host, to specify more detail.

Here is a patch that implements exactly that.

I reviewed this patch. It looks OK to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] triggered_change_notification v3
Attached is a version of a previously posted patch which has been modified based on on-list feedback from Álvaro.

This is a generalized trigger function which can be used as an AFTER EACH ROW trigger on any table which has a primary key, and will send notifications of the operations for which the trigger is attached, with a payload indicating the table, the operation, and the primary key. A channel can be specified in the CREATE TRIGGER command, but will default to "tcn" if omitted.

I had previously submitted this as a core function, but with a mature extensions feature now present, I reworked it as a contrib extension. If it's not accepted into contrib, I can post it to PGXN, but I thought it seemed a reasonable candidate for contrib.

I took a shot at creating docs similar to other contrib extensions, but couldn't quite figure out how to get them to build. If someone can give me pointers on that, I'll polish that up. I find it hard to work on the docs in SGML without being able to build and review the HTML output.

-Kevin

tcn-v3.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
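A usage sketch based on the description above, with the extension, function, and default channel names as described in the patch (table t and the payload shown are illustrative):

CREATE EXTENSION tcn;

CREATE TRIGGER t_tcn
  AFTER INSERT OR UPDATE OR DELETE ON t
  FOR EACH ROW EXECUTE PROCEDURE triggered_change_notification();

LISTEN tcn;   -- "tcn" is the default channel when CREATE TRIGGER names none

UPDATE t SET val = val + 1 WHERE id = 42;
-- delivers a payload along the lines of:  "t",U,"id"='42'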
Re: [HACKERS] SKIP LOCKED DATA
Thomas Munro mu...@ip9.org writes:

I am wondering out loud whether I am brave enough to try to propose SKIP LOCKED DATA support and would be grateful for any feedback and/or {en|dis}couragement. I don't see it on the todo list, and didn't find signs of others working on this (did I miss something?), but there are examples of users asking for this feature (by various names) on the mailing lists. Has the idea already been rejected, is it fundamentally infeasible for some glaring reason, or far too complicated for new players?

It sounds to me like "silently give the wrong answers". Are you sure there are not better, more deterministic ways to solve your problem? (Or in other words: the fact that Oracle has it isn't enough to persuade me it's a good idea.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
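An illustration of the hazard Tom points at, again using the proposed syntax and a hypothetical accounts table:

SELECT * FROM accounts WHERE branch = 7 FOR UPDATE SKIP LOCKED DATA;
-- any row some other transaction happens to hold a lock on at this instant
-- is silently omitted, so two otherwise-identical runs can disagree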
Re: [HACKERS] patch : Allow toast tables to be moved to a different tablespace
On Tue, Dec 13, 2011 at 12:29 PM, Julien Tachoires jul...@gmail.com wrote:

OK, considering that, I don't see any way to handle the case raised by Jaime :(

Did you consider what Álvaro suggested? Anyway, it seems too late for this commitfest. Are you intending to resume work on this for the next cycle? Do we consider this a useful thing?

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the three most useful bits of data, previously available only from log_checkpoints, out to pg_stat_bgwriter. Easiest to just show an example:

$ createdb pgbench
$ psql -c "select pg_stat_reset_shared('bgwriter')"
$ pgbench -i -s 10 pgbench
$ psql -c checkpoint
$ tail $PGLOG
LOG:  checkpoint complete: wrote 590 buffers (14.4%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.117 s, total=1.746 s; sync files=31, longest=0.104 s, average=0.003 s
$ psql -c "select write_ms,sync_ms,sync_files from pg_stat_bgwriter"
 write_ms | sync_ms | sync_files
----------+---------+------------
        2 |     117 |         31

Like the other checkpoint statistics, these are updated in one lump when the checkpoint finishes. This includes rough documentation, and the code isn't very complicated; a catversion bump is required. Some open concerns I still have about this:

- The total amount of time in the sync phase is not the same as the total of all sync wait times. I have published the total elapsed sync time. The total of the times spent waiting on individual sync calls is also available at the point when the above data is collected, and it could be published as well. I don't think that's necessary yet, since they are almost identical right now. But any feature that staggers sync calls around might make that distinction important.

- The total checkpoint time is larger than the total of write+sync. You can see that here: 2 ms write + 117 ms sync = 119 ms; total checkpoint time is actually 1746 ms though. While I don't normally find the total useful from a tuning perspective, it may be sensible to publish it just to further reduce any perceived need for log scraping here.

- None of the debug1-level information about individual sync calls can be published this way usefully. I'd like that to come out somewhere eventually, but really what's needed there is a performance event logging history collector, not pg_stats. I'm content that this patch gets the big-picture data out; it's sufficient to let you rule out checkpoint problems if they aren't happening. If you only have to drill into the logs when long times start showing up in this data, that's not so bad to live with.

- I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off. I think I will have to fake the logging in this area--make it log some things even when log_checkpoints is off--to confirm it gives the right results. I haven't done that yet.

I would like to be able to tell people they need never turn on log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets a good precedent for what direction the database is going. It would be nice for pg_stat_bgwriter's format to settle down for a long period too. In addition to my small list of concerns here, there seem to be an increasing number of write-path feature patches floating around here lately. That all works against this being stable. The main consumer of this data I use regularly is the pg_stat_bgwriter graphs that Munin produces, and my expectation is that myself and/or Magnus will make sure it's compatible with any changes made here for 9.2.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..674091c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1182,6 +1182,34 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
      </row>
 
      <row>
+      <entry><literal><function>pg_stat_get_bgwriter_write_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are written to disk, in milliseconds.
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are synchronized to disk, in milliseconds.
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total number of files that have been synchronized to disk during
+       checkpoint processing.
+      </entry>
+     </row>
+
+     <row>
       <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
       <entry><type>setof record</type></entry>
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 01/16/2012 01:28 AM, Greg Smith wrote:

- I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off.

...and there's at least one spot I missed already located: inside of md.c. I'd forgotten how many spots where timing calls are optimized out are floating around this code path.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Checkpoint sync pause
Last year at this point, I submitted an increasingly complicated checkpoint sync spreading feature. I wasn't able to prove any repeatable drop in sync time latency from those patches. While that was going on, and continuing into recently, the production server that started all this with its sync time latency issues didn't stop having that problem. Data collection continued, and new patches were tried.

There was a really simple triage step Simon and I made before getting into the complicated ones: just delay for a few seconds between every single sync call made during a checkpoint. That approach is still hanging around that server's patched PostgreSQL package set, and it still works better than anything more complicated we've tried so far. The recent split of background writer and checkpointer makes that whole thing even easier to do without rippling out to have unexpected consequences.

In order to be able to tune this usefully, you need to know how many files a typical checkpoint syncs. That could be available without needing log scraping using the "Publish checkpoint timing and sync files summary data to pg_stat_bgwriter" addition I just submitted. People who set this new checkpoint_sync_pause value too high can face checkpoints running over schedule, but you can measure how bad your exposure is with the new view information.

I owe the community a lot of data to prove this is useful before I'd expect it to be taken seriously. I was planning to leave this whole area alone until 9.3. But since recent submissions may pull me back into trying various ways of rearranging the write path for 9.2, I wanted to have my own miniature horse in that race. It works simply:

...
2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync: number=34 file=base/16385/11766 time=0.006 msec
2012-01-16 02:39:01.184 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=3
2012-01-16 02:39:01.284 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=2
2012-01-16 02:39:01.385 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=1
2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync: number=35 file=global/12007 time=375.710 msec
2012-01-16 02:39:01.860 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=3
2012-01-16 02:39:01.961 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=2
2012-01-16 02:39:02.061 EST [25052]: DEBUG:  checkpoint sync delay: seconds left=1
2012-01-16 02:39:02.161 EST [25052]: DEBUG:  checkpoint sync: number=36 file=base/16385/11754 time=0.008 msec
2012-01-16 02:39:02.555 EST [25052]: LOG:  checkpoint complete: wrote 2586 buffers (63.1%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=2.422 s, sync=13.282 s, total=16.123 s; sync files=36, longest=1.085 s, average=0.040 s

No docs yet; we really need a better guide to tuning checkpoints as they exist now before there's a place to attach a discussion of this to.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 0b792d2..54da69a 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -142,6 +142,7 @@ static BgWriterShmemStruct *BgWriterShmem;
 int			CheckPointTimeout = 300;
 int			CheckPointWarning = 30;
 double		CheckPointCompletionTarget = 0.5;
+int			CheckPointSyncPause = 0;
 
 /*
  * Flags set by interrupt handlers for later service in the main loop.
@@ -157,6 +158,8 @@ static bool am_checkpointer = false;
 
 static bool ckpt_active = false;
 
+static int	checkpoint_flags = 0;
+
 /* these values are valid when ckpt_active is true: */
 static pg_time_t ckpt_start_time;
 static XLogRecPtr ckpt_start_recptr;
@@ -643,6 +646,9 @@ CheckpointWriteDelay(int flags, double progress)
 	if (!am_checkpointer)
 		return;
 
+	/* Cache this value for a later sync delay */
+	checkpoint_flags = flags;
+
 	/*
 	 * Perform the usual duties and take a nap, unless we're behind
 	 * schedule, in which case we just try to catch up as quickly as possible.
@@ -685,6 +691,72 @@ CheckpointWriteDelay(int flags, double progress)
 }
 
 /*
+ * CheckpointSyncDelay -- control rate of checkpoint sync stage
+ *
+ * This function is called after each relation sync performed by mdsync().
+ * It delays for a fixed period while still making sure to absorb
+ * incoming fsync requests.
+ *
+ * Due to where this is called within the md layer, it's not practical
+ * for it to be directly passed the checkpoint flags.  It's expected
+ * they will have been stashed within the checkpointer's local state
+ * by a call to CheckpointWriteDelay.
+ */
+void
+CheckpointSyncDelay()
+{
+	static int	absorb_counter = WRITES_PER_ABSORB;
+	int			sync_delay_secs = CheckPointSyncPause;
+
+	/* Do nothing if checkpoint is being executed by non-checkpointer process */
+	if