Re: [HACKERS] ERRCODE_READ_ONLY_SQL_TRANSACTION
Hi,

Tom Lane <t...@sss.pgh.pa.us> writes:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> Hot Standby returns ERRCODE_READ_ONLY_SQL_TRANSACTION in most cases for illegal actions on a standby.
>
> I don't think I like this patch: you are promoting what are and ought to be very low-level internal sanity checks into user-facing errors (which among other things will require translation effort for the messages).

So it seems the last-9.2-CF deadline is making us a little too hasty. Apparently, as you're saying, there's no way to exercise those code paths from an SQL connection on a Hot Standby short of deploying a C-coded extension calling either GetNewTransactionId() or XLogInsert(), which means it's out of scope.

My quest was figuring out whether ERRCODE_READ_ONLY_SQL_TRANSACTION really is trustworthy as a signal that a “proxy” of some sort could transparently redirect the transaction to the master when it sees that error. I felt that we were missing something simple here, but after review I think we in fact already have all the pieces to achieve that with the current 9.2 code base.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
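For what it's worth, a proxy built on libpq could detect that condition by checking the SQLSTATE of a failed result. A minimal sketch, assuming the proxy already holds separate connections to the standby and the master (the redirect policy itself is not shown):

    #include <stdbool.h>
    #include <string.h>
    #include "libpq-fe.h"

    /*
     * Return true if the failed result reports SQLSTATE 25006,
     * i.e. ERRCODE_READ_ONLY_SQL_TRANSACTION.
     */
    static bool
    is_read_only_error(const PGresult *res)
    {
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        return sqlstate != NULL && strcmp(sqlstate, "25006") == 0;
    }

On seeing that error, the proxy would replay the statement (or the buffered transaction) on its master connection instead of the standby one.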
Re: [HACKERS] ERRCODE_READ_ONLY_SQL_TRANSACTION
On Fri, Jan 13, 2012 at 8:55 AM, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
> I felt that we were missing something simple here, but after review I think we finally have all the pieces to achieve that with current 9.2 code base in fact.

Good, patch revoked. No time wasted, it was worth checking.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench post-connection command
On Thu, Jan 12, 2012 at 8:04 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Simon Riggs <si...@2ndquadrant.com> writes:
>> On Thu, Jan 12, 2012 at 5:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> More like \once ... any SQL command or meta command here ... if we want to extend the scripting language. But I'd be perfectly happy with a command-line switch that specifies a script file to be run once.
>>
>> Once per connection, yes?
>
> Hmmm ... good question. Heikki was speculating about doing CREATE TABLE or similar, which you'd want done only once period. But I see no very strong reason why cases like that couldn't be handled outside of pgbench. So yeah, once per connection.

OK, it's a TODO item, but I don't think I'll have time for it before the next CF. If we need it during testing of other patches then I'll write it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Inlining comparators as a performance optimisation
On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> On 21.09.2011 18:46, Tom Lane wrote:
>>> The idea that I was toying with was to allow the regular SQL-callable comparison function to somehow return a function pointer to the alternate comparison function,
>>
>> You could have a new function with a pg_proc entry, that just returns a function pointer to the qsort-callback.
>
> Yeah, possibly. That would be a much more invasive change, but cleaner in some sense. I'm not really prepared to do all the legwork involved in that just to get to a performance-testable patch though.

A few years ago I looked for a way to speed up COPY operations, and it turned out that COPY TO has a good optimization opportunity. At that time, for each datum, COPY TO would:

- test for nullness
- call an outfunc through fmgr
- the outfunc pallocs() a bytea or text, fills it with data, and returns it (sometimes it uses an extensible string buffer which may be repalloc()'d several times)
- COPY memcpy()s the returned data to a buffer and eventually flushes the buffer to the client socket.

I introduced a special write buffer with an on-flush callback (i.e., a close relative of the existing string buffer) -- in this case the callback was "flush to client socket" -- and several outfuncs (one per type) which took that buffer as an argument, besides the datum to output, and simply put the datum into the buffer, with the appropriate transformations (such as converting to bytea or text), flushing if needed. Then COPY TO BINARY of a constant-size datum turned into:

- one test for nullness
- one C function call
- one test to ensure appropriate space is available in the buffer (flush if needed)
- one htonl() and a memcpy of constant size, which the compiler turns into a couple of simple instructions

I recall measuring speedups of 2x-8x on COPY BINARY, less for text, but still large gains. Although eliminating fmgr call and palloc overhead was an important part of it, another large part was getting rid of memcpy()s, which the compiler turned into simple movs for known-size types -- a transformation that can be done only if the buffer write functions are inlined inside the outfuncs. Compilers love constants...

Additionally, code size growth was minimal, since I moved the old outfunc code into the new outfuncs and replaced the old fmgr-callable outfuncs with "create buffer with on-full callback = extend_and_repalloc(), pass to new outfunc(buffer, datum), return buffer". That is basically equivalent to the previous palloc()-based code, maybe with a few extra instructions.

When I submitted the patch for review, Tom rightfully pointed out that my way of obtaining the C function pointer sucked very badly (I don't remember how I did it, only that it was butt-ugly), but the idea was to get a quick measurement of what could be gained, and the result was positive. Unfortunately I had no time available to finish it and make it into a real patch; I'm sorry about that.

So why do I post in this sorting topic? It seems that, by bypassing fmgr for functions which are small, simple, and called lots of times, there is a large gain to be made, not only because of fmgr overhead but also because of the opportunity for new compiler optimizations, palloc removal, etc. However, in my experiment the argument and return types of the new functions were DIFFERENT from the old functions: the new ones do the same thing, but in a different manner.

One manner was suited to SQL-callable functions (i.e., palloc and return a bytea) and another one to writing large amounts of data (direct buffer write). Since both have very different requirements, being fast at both is impossible for the same function.

Anyway, all that rant boils down to: some functions could benefit from having two versions (while sharing almost all the code between them):

- a user-callable (fmgr) version (the current one)
- a C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer to the second one. Maybe an extra column in pg_proc would do (but then proargtypes and friends would describe only the SQL-callable version)? Or an extra table, pg_cproc? Or an in-memory hash: hashtable[fmgr-callable function] = C version -- but what happens if a C function has no SQL-callable equivalent? Or (ugly) introduce an extra per-type function type_get_function_ptr(function_kind) which returns the requested function pointer.

If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

Regards,
Pierre
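To make the "special write buffer" idea concrete, the shape of it was roughly as follows -- a from-memory sketch, not the original patch, and all of the names here are invented:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    /*
     * A write buffer that calls a caller-supplied callback when it fills up.
     * For COPY TO the callback sent the buffer contents to the client socket;
     * for the fmgr-callable wrapper it just repalloc()'d a larger buffer.
     * The callback is expected to leave len at 0 (or grown capacity).
     */
    typedef struct OutBuffer
    {
        char   *data;
        int     len;
        int     capacity;
        void  (*flush) (struct OutBuffer *buf);
    } OutBuffer;

    /* Make sure "needed" bytes fit, flushing first if necessary. */
    static inline void
    out_buffer_reserve(OutBuffer *buf, int needed)
    {
        if (buf->len + needed > buf->capacity)
            buf->flush(buf);
    }

    /*
     * Binary output of an int4 straight into the buffer: no fmgr call, no
     * palloc, and a constant-size memcpy the compiler turns into a few movs.
     */
    static inline void
    int4_send_buf(OutBuffer *buf, int32_t value)
    {
        uint32_t    netval = htonl((uint32_t) value);

        out_buffer_reserve(buf, (int) sizeof(netval));
        memcpy(buf->data + buf->len, &netval, sizeof(netval));
        buf->len += sizeof(netval);
    }

The old fmgr-callable outfunc then becomes a thin wrapper that points the flush callback at a repalloc()-based grow function, so both call paths share the same body.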
[HACKERS] replay_location indicates incorrect location
Hi,

When I looked at pg_stat_replication just after starting the standby, before executing any write transactions on the master, I found that replay_location indicated an incorrect location, different from sent/write/flush_location. Then, once I ran a write transaction on the master, replay_location indicated the same location as the others.

The cause of this problem is that XLogCtl->recoveryLastRecPtr, which is what replay_location reports, is initialized with the wrong variable, ReadRecPtr. Instead, it should be initialized with EndRecPtr. The attached patch does that. This needs to be backported to 9.0.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** StartupXLOG(void)
*** 6407,6413 ****
  	 */
  	SpinLockAcquire(&xlogctl->info_lck);
  	xlogctl->replayEndRecPtr = ReadRecPtr;
! 	xlogctl->recoveryLastRecPtr = ReadRecPtr;
  	xlogctl->recoveryLastXTime = 0;
  	xlogctl->currentChunkStartTime = 0;
  	xlogctl->recoveryPause = false;
--- 6407,6413 ----
  	 */
  	SpinLockAcquire(&xlogctl->info_lck);
  	xlogctl->replayEndRecPtr = ReadRecPtr;
! 	xlogctl->recoveryLastRecPtr = EndRecPtr;
  	xlogctl->recoveryLastXTime = 0;
  	xlogctl->currentChunkStartTime = 0;
  	xlogctl->recoveryPause = false;
Re: [HACKERS] Standalone synchronous master
At this point I feel that this new functionality might be a bit overkill for Postgres; maybe it's better to stay lean and mean rather than add a controversial feature like this.

I also agree that a more general replication timeout variable would be more useful to a larger audience, but that would in my view add more complexity to the replication code, which is quite simple and understandable right now ...

Anyway, my backup plan was to achieve the same thing by triggering on the logging produced on the primary server and switching to async mode when detecting that the standby replication link has failed (and then back again when it is restored). In effect I would put a replication monitor on the outside of the server instead of embedding it.

/A
[HACKERS] CLONE TABLE DATA TO new_table
Hello,

I wonder if it would be possible to have a fast table clone function (data only) that copies the corresponding data files instead of using the CREATE TABLE AS approach. pg_upgrade seems to have such a mechanism, though it requires first stopping the server...

This would of course require locking the complete table and ensuring that all latest changes are flushed to the platters. I don't know what the plans are about switching from UNLOGGED to LOGGED tables, but I guess this might be required so that logging of the table starts only after the copy.

Background: I have daily tables with hourly imports which may contain 100 million rows and require 7 indexes on them. In order to improve import performance, I first make a copy of the active table, import the new data, and rebuild the indexes.

Thanks for your great job,

Marc Mamin
Re: [HACKERS] replay_location indicates incorrect location
On Fri, Jan 13, 2012 at 10:04 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> When I looked at pg_stat_replication just after starting the standby before executing any write transactions on the master, I found that replay_location indicated incorrect location different from sent/write/flush_location. Then, if I ran write transaction on the master, replay_location indicated the same location as the others.
>
> The cause of this problem is that XLogCtl->recoveryLastRecPtr which points to replay_location is initialized with wrong variable ReadRecPtr. Instead, it should be initialized with EndRecPtr. Attached patch does that. This needs to be backported to 9.0.

Agreed. Will commit.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] read transaction and sync rep
Hi,

I found that even a read transaction waits for sync rep when it generates WAL records, even if no XID is assigned. For example, imagine the case where a SELECT query does a heap clean operation and generates an XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because it's not visible to the client...

Can we skip waiting for sync rep if no XID is assigned?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] read transaction and sync rep
On Fri, Jan 13, 2012 at 11:30 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
> I found that even read transaction waits for sync rep when it generates WAL records even if XID is not assigned. For example, imagine the case where SELECT query does a heap clean operation and generates XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because that's not visible to the client... Can we skip waiting for sync rep if XID is not assigned?

Your example of XLOG_HEAP2_CLEAN records is a good one, but there are other record types and circumstances, as described in the comment near the top of RecordTransactionCommit():

    /*
     * If we didn't create XLOG entries, we're done here; otherwise we
     * should flush those entries the same as a commit record.  (An
     * example of a possible record that wouldn't cause an XID to be
     * assigned is a sequence advance record due to nextval() --- we want
     * to flush that to disk before reporting commit.)
     */
    if (!wrote_xlog)
        goto cleanup;

Perhaps there is a case for saying that sequences don't need to be flushed if all they did was nextval() with no change associated with that; I'm not sure.

So I think there is a case for optimisation using finer-grained decision making.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] New replication mode: write
On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>>> Thought? Comments?
>>
>> This is almost exactly the same as my patch series syncrep_queues.v[1,2].patch earlier this year. Which I know because I was updating that patch myself last night for 9.2. I'm about half way through doing that, since you and I agreed in Ottawa I would do this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work together and there's no real reason to. You know what you're doing, so I'll leave you to do it. Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had previously referred to as RECV) was that no fsync of the WAL contents takes place. In that case we are applying changes using un-fsynced WAL data and in case of crash this would cause a problem.

My patch has not changed the execution order of WAL flush and replay. WAL records are always replayed after they are flushed by walreceiver, so such a problem doesn't happen. But that means a transaction might need to wait for a WAL flush caused by a previous transaction even if WRITE mode is chosen, which limits the performance gain from WRITE mode and should be improved later, I think.

> I was going to make the WalWriter available during recovery to cater for that. Do you not think that is no longer necessary?

That's still necessary to improve the performance of sync rep further, I think. What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to disk.
* Add a new GUC parameter specifying whether to allow the standby to defer WAL flush. If the parameter is false, walreceiver flushes WAL whenever it receives WAL (i.e., the same as the current behavior). If true, walreceiver doesn't flush WAL at all; instead, walwriter, a backend, or the startup process does that. Walwriter periodically checks whether there is an un-flushed WAL file and flushes it if one exists. When a buffer page is written out, the backend or startup process forces a WAL flush up to the buffer's LSN.

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait for any WAL flush on the standby at all. Also, the frequency of WAL flush on the standby would become lower, which significantly reduces the I/O load. All in all, the performance of sync rep would improve very much.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] New replication mode: write
On Fri, Jan 13, 2012 at 12:27 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> In my patch, the reason I avoided doing WRITE mode (which we had previously referred to as RECV) was that no fsync of the WAL contents takes place. In that case we are applying changes using un-fsynced WAL data and in case of crash this would cause a problem.
>
> My patch has not changed the execution order of WAL flush and replay. WAL records are always replayed after they are flushed by walreceiver. So, such a problem doesn't happen. But which means that transaction might need to wait for WAL flush caused by previous transaction even if WRITE mode is chosen. Which limits the performance gain by WRITE mode, and should be improved later, I think.

If the WALreceiver still flushes, that is OK. The latency would be smoother and lower if the WALwriter were active.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench post-connection command
There's one part of this that's still fuzzy in the spec that I'd like to clarify, if nothing else than for my own memory's sake--as the person most likely to review a random pgbench patch.

Simon gave an example like this:

  pgbench -x "SET synchronous_commit = off"

All are agreed this should take the name of a script instead on the command line:

  pgbench -x nosync

And execute that script as part of the initial setup of every connection.

Now: "nosync" might be a shell script, called similarly to \shell. That's more flexible in terms of its ability to do complicated setup things, say a more ambitious iteration along the 'create a table per connection' trail. But it can't really prime connection parameters anymore, so that's pretty worthless.

It seems it must then be a pgbench script like -f specifies instead. It will execute SQL and pgbench's \ meta commands. And I think that's OK so long as two things are nailed down (as in, tested to work and hopefully alluded to in the documentation):

1) The pgbench connection setup script can call \shell or \setshell. Then we've got a backdoor to also handle the complicated external script situation.

2) The connection setup script can set variables and they will still be active after passing control to the main test script. Returning to the table-per-connection sort of idea, that might be:

setup:
  \setrandom tablename 1 100
  CREATE TABLE test_:tablename;

main:
  SELECT count(*) FROM test_:tablename;

I would use this feature all the time once it was added, so I'm glad to see the idea pop up. I also have a long-standing personal TODO to write a doc update in this area: suggest how to use environment variables to sneak settings into things. There have been a couple of ideas for pgbench proposed that were blocked with "you can already do that by setting PGxxx before calling pgbench". That is true, but not obvious, and periodically it gets reinvented again.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] CLONE TABLE DATA TO new_table
On Fri, Jan 13, 2012 at 5:56 AM, Marc Mamin <m.ma...@intershop.de> wrote:
> I wonder if it would be possible to have a fast table clone function (data only) while copying the corresponding data files instead of using the CREATE TABLE AS way. pg_upgrade seems to have such a mechanisms, though it requires to first stop the server... This would of course require to lock the complete table and ensure that all latest changes are flushed to the plates.

I think it would be possible to implement this. In fact, it could probably be done as a contrib module.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] CLONE TABLE DATA TO new_table
This would be great, but I can't C :-(

Marc

-----Original Message-----
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Friday, January 13, 2012 14:12
To: Marc Mamin
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CLONE TABLE DATA TO new_table

On Fri, Jan 13, 2012 at 5:56 AM, Marc Mamin <m.ma...@intershop.de> wrote:
> I wonder if it would be possible to have a fast table clone function (data only) while copying the corresponding data files instead of using the CREATE TABLE AS way. pg_upgrade seems to have such a mechanisms, though it requires to first stop the server... This would of course require to lock the complete table and ensure that all latest changes are flushed to the plates.

I think it would be possible to implement this. In fact, it could probably be done as a contrib module.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
[HACKERS] Disabled features on Hot Standby
The open items list for 9.2 says

  "Index-only scans need to be disabled when running under Hot Standby"

There is no explanation, so please explain here so we can change it, if possible.

I'm not sure it's great policy to keep implementing things that don't work on HS, while not giving other people a proper chance to fix that.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq
On 01/12/2012 11:07 PM, Nick Roosevelt wrote:
> Just reviewed the patch for adding URI connection string support for libpq. ... Seems like the file has changed since this patch was created. Please recreate the patch.

Thanks for the review.

In the CommitFest application, it looks like someone marked this patch "Returned with Feedback" after adding your review. Whoever did that, it wasn't the right next step for this one. "Returned with Feedback" suggests a patch has been pushed out of the CommitFest as not possible to commit yet. It's a final state--normally a submission going there is finished with, until the next CommitFest starts. In a case like this one, where a problem is identified but there's still time to fix it, the right next step is "Waiting on Author".

There are some more details about this at http://wiki.postgresql.org/wiki/Running_a_CommitFest , which we don't expect everyone to know because it isn't one of the popular pages to read. I've sorted out the problem for this one already; I'm just letting whoever made that change know about the process here.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Poorly thought out code in vacuum
On Fri, Jan 6, 2012 at 12:45 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Oh, that's brilliant. OK, I'll go try that.

All right, that test does in fact reveal the broken-ness of the current code, and the patch I committed upthread does seem to fix it, so I've committed that.

After further reflection, I'm thinking that skipping the *second* heap pass when a cleanup lock is not immediately available is safe even during an anti-wraparound vacuum (i.e. scan_all = true), because the only thing the second pass is doing is changing dead line pointers to unused, and failing to do that doesn't prevent relfrozenxid advancement: the dead line pointer doesn't contain an XID. The next vacuum will clean it up: it'll see that there's a dead line pointer, add that to the list of TIDs to be killed if seen during the index vacuum (it won't be, since we got that far the first time, but vacuum doesn't know or care), and then try again during the second heap pass to mark that dead line pointer unused. This might be worth a code comment, since it's not entirely obvious.

At any rate, it seems that the worst thing that happens from skipping a block during the second pass is that we postpone reclaiming a line pointer whose tuple storage has already been recovered, which is pretty far down in the weeds.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Patch to allow users to kill their own queries
On 01/03/2012 12:59 PM, Tom Lane wrote:
> Noah Misch <n...@leadboat.com> writes:
>> Regarding the other message, avoid composing a translated message from independently-translated parts.
>
> Yes. I haven't looked at the patch, but I wonder whether it wouldn't be better to dodge both of these problems by having the subroutine return a success/failure result code, so that the actual ereport can be done at an outer level where the full message (and hint) can be written directly.

Between your and Noah's comments I understand the issue and likely direction to sort this out now. Bounced forward to the next CommitFest and back on my plate again. Guess I'll be learning new and exciting things about translation this week.

--
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Patch to allow users to kill their own queries
On Fri, Jan 13, 2012 at 14:42, Greg Smith <g...@2ndquadrant.com> wrote:
> On 01/03/2012 12:59 PM, Tom Lane wrote:
>> Noah Misch <n...@leadboat.com> writes:
>>> Regarding the other message, avoid composing a translated message from independently-translated parts.
>>
>> Yes. I haven't looked at the patch, but I wonder whether it wouldn't be better to dodge both of these problems by having the subroutine return a success/failure result code, so that the actual ereport can be done at an outer level where the full message (and hint) can be written directly.
>
> Between your and Noah's comments I understand the issue and likely direction to sort this out now. Bounced forward to the next CommitFest and back on my plate again. Guess I'll be learning new and exciting things about translation this week.

I should say that I have more or less finished this one off, since I figured you were working on more important things. Just a final round of cleanup and commit left, really, so you can take it off your plate again if you want to :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 8:27 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> The open items list for 9.2 says "Index-only scans need to be disabled when running under Hot Standby". There is no explanation, so please explain here so we can change it, if possible. Not sure its great policy to keep implementing things that don't work on HS, while not giving a proper chance for other people to fix that.

Well, I agree that it's not a great policy to implement lots of things that don't work on Hot Standby, but this particular issue is an old one.

http://archives.postgresql.org/pgsql-hackers/2008-10/msg01422.php

Since the xmin horizon on the standby may precede the xmin horizon on the master, we can't assume that a page is all-visible on the standby just because it's all-visible on the master.

The crash-safe visibility map code is in some ways a step forward for Hot Standby, because in previous releases we didn't WAL-log setting visibility map bits at all, and therefore a just-promoted Hot Standby server would have no bits set apart from those which had been continuously set since the base backup was taken, or those it happened to get by chance in full-page images. Even in old releases that's not particularly good, because it means that (1) sequential scans on a just-promoted standby will be slower, since those can leverage PD_ALL_VISIBLE to avoid retail MVCC checks on every tuple, and (2) the first VACUUM on each table on the standby after promotion will possibly need to scan a lot more of the table than was being scanned by a typical VACUUM on the old master, owing to the relevant bits in the visibility map not being set. 9.2 will be better in this regard, because the bits will propagate from master to standby as they're set on the master, but it's still not safe to rely on them until we've exited recovery.

I'm not sure I have a brilliant idea for how to fix this. We could skip replay of any WAL record that sets a visibility map or PD_ALL_VISIBLE bit unless the page is also all-visible under the standby's xmin horizon, but we'd need some way to know that, and we'd need to worry not only about the records emitted by vacuum that explicitly set the bit, but also about any place where we send the whole page and the bit goes along with it; we'd need to add code to go in and clear the bit. That seems a bit complex and error-prone. Or, we could try to lift this restriction in the special case when hot_standby_feedback is set, though I have a feeling that's not really robust - any time you lose the connection to the master, it'll lose your xmin holdback and possibly mark some things all-visible that really aren't on the standby, and then you reconnect and replay those WAL records and bad things happen.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Since the xmin horizon on the standby may precede the xmin horizon on the master

When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens.

Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] show Heap Fetches in EXPLAIN for index-only scans
I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output.

This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

explain-heap-fetches.patch
Description: Binary data
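The patch itself is attached rather than inlined; for readers of the archive, the shape of such instrumentation is roughly the following sketch (not necessarily what the attached patch does, which may differ in detail):

    /*
     * IndexOnlyScanState grows a counter, bumped in IndexOnlyNext() whenever
     * the visibility map bit is unset and we must visit the heap:
     */
    node->ioss_HeapFetches++;

    /* explain.c then reports it for Index Only Scan nodes under ANALYZE: */
    if (es->analyze)
        ExplainPropertyLong("Heap Fetches",
                            ((IndexOnlyScanState *) planstate)->ioss_HeapFetches,
                            es);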
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 2:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Since the xmin horizon on the standby may precede the xmin horizon on the master
>
> When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens. Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.

How do you respond to the concerns I raised about that approach in the last sentence of my previous email?

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 16:21, Robert Haas <robertmh...@gmail.com> wrote:
> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Agreed.

Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Disabled features on Hot Standby
On Jan 13, 2012, at 8:03 AM, Robert Haas wrote:
> Or, we could try to lift this restriction in the special case when hot_standby_feedback is set, though I have a feeling that's not really robust - any time you lose the connection to the master, it'll lose your xmin holdback and possibly mark some things all-visible that really aren't on the standby, and then you reconnect and replay those WAL records and bad things happen.

Also, what happens if you started off with hot_standby_feedback turned off? New stuff won't just magically start working when you turn it on (or is that parameter settable only on restart?).

ISTM that hot standbys need some ability to store interim data that is only needed by the hot standby while older transactions are running. IIRC we've seen other places where we have this problem too. Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?

--
Jim C. Nasby, Database Architect                   j...@nasby.net
512.569.9461 (cell)                                http://jim.nasby.net
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On 13 January 2012 15:21, Robert Haas <robertmh...@gmail.com> wrote:
> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Good idea. The fact that that information wasn't available did bother me.

--
 Peter Geoghegan       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
> On Fri, Jan 13, 2012 at 16:21, Robert Haas <robertmh...@gmail.com> wrote:
>> I think that people who are using index-only scans are going to want some way to find out the degree to which the scans are in fact index-only. So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.
>
> Agreed. Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)

Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 10:31 AM, Jim Nasby <j...@nasby.net> wrote:
> Perhaps it would be possible to keep older copies of pages around when there are older transactions running on the standby?

I've thought about that... it's basically a rollback segment, which for that matter might be judged superior to what we do on the master (i.e. VACUUM). Needless to say, that would be just a teensy bit more work than we're likely to have time for in 9.2. But maybe for 10.2...

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On 13 January 2012 15:31, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
>> Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)
>
> Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.

Maybe the ioss_HeapFetches field should be a uint32? That's all it's going to be on some platforms anyway.

--
 Peter Geoghegan       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Text comparison suddenly can't find collation?
Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> writes:
> Why would a string comparison work in one case and not another? In the following example, it works to compare a and b, but not a and d. This is in a C module which calls
>   DirectFunctionCall2( text_le, d1, d2 );

As of 9.1, I'd expect that coding to fail every time. text_le needs to be passed a collation, and you aren't doing so. You need to be using DirectFunctionCall2Coll. Where to get the collation from might be an interesting question too, but without more context it's hard to guess what will be appropriate for you.

			regards, tom lane
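For illustration, a minimal sketch of the corrected call, assuming the database default collation is acceptable for the comparison (where to obtain a better collation OID depends on the caller's context):

    #include "postgres.h"
    #include "catalog/pg_collation.h"
    #include "utils/builtins.h"

    /* Compare two text datums, passing an explicit collation this time. */
    static bool
    text_datum_le(Datum d1, Datum d2)
    {
        return DatumGetBool(DirectFunctionCall2Coll(text_le,
                                                    DEFAULT_COLLATION_OID,
                                                    d1, d2));
    }

Inside a function reached through fmgr, PG_GET_COLLATION() is normally the right source for that OID; the hard-wired DEFAULT_COLLATION_OID above is only a placeholder.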
Re: [HACKERS] TG_DEPTH patch v1
[reviving discussion of another old patch]

In post http://archives.postgresql.org/pgsql-hackers/2011-06/msg00870.php Florian Pflug <f...@phlo.org> wrote:

>> Updated patch attached.
>
> Thanks. The trigger depth is incremented before calling the trigger function in ExecCallTriggerFunc() and decremented right afterwards, which seems fine - apart from the fact that the decrement is skipped in case of an error. The patch handles that by saving respectively restoring the value of pg_depth in PushTransaction() respectively {Commit,Abort}SubTransaction(). While I can't come up with a failure case for that approach, it seems rather fragile - who's to say that nested trigger invocations correspond that tightly to sub-transactions?

Good question -- is it reasonable to assume that if an error is thrown in a trigger, the state restored when the error is caught will be at the same trigger depth as when the transaction or subtransaction started? FWIW, the patch, using this technique, has been in production use with about 3,000 OLTP users for about six months, with no apparent problems. On the other hand, we don't do a lot with explicit subtransactions.

> At the very least this needs a comment explaining why this is safe,

Agreed.

> but I believe the same effect could be achieved more cleanly by a TRY/CATCH block in ExecCallTriggerFunc().

That occurred to me, but I was concerned about the cost of setting and clearing a longjmp target for every trigger call. It seems like I've seen comments about that being relatively expensive. Tracking one more int in the current subtransaction structures is pretty cheap, if that works.

> * Other in-core PLs
>
> As it stands, the patch only exports tg_depth to plpgsql functions, not to functions written in one of the other in-core PLs. It'd be good to change that, I believe - otherwise the other PLs become second-class citizens in the long run.

Are you suggesting that this be implemented as a special trigger variable in every PL, or that it simply be a regular function that returns zero when not in a trigger and some positive value when called from a trigger? The latter seems like a pretty good idea to me. If that is done, is there any point to *also* having a TG_DEPTH trigger variable in plpgsql? (I don't think there is.)

-Kevin
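For concreteness, the TRY/CATCH alternative being weighed would presumably look something like this inside ExecCallTriggerFunc() -- just a sketch to frame the cost question, with the depth variable name invented here:

    /* hypothetical counter tracking trigger nesting depth */
    static int  MyTriggerDepth = 0;

        MyTriggerDepth++;
        PG_TRY();
        {
            result = FunctionCallInvoke(&fcinfo);
        }
        PG_CATCH();
        {
            MyTriggerDepth--;
            PG_RE_THROW();
        }
        PG_END_TRY();
        MyTriggerDepth--;

Every trigger call would then pay a sigsetjmp(), which is exactly the overhead being weighed here against carrying one more int in the subtransaction state.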
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
On Fri, Jan 13, 2012 at 10:41 AM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> On 13 January 2012 15:31, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jan 13, 2012 at 10:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
>>> Would also be good to have counters in pg_stat_* for this, since you'd usually want to look at this kind of data over time as well. In your plans? ;)
>>
>> Not really. I don't have a clear enough idea about what that should look like, and I expect a vigorous debate over the distributed cost of another counter. But I'm happy to have someone who feels more strongly about it than I do take up the cause.
>
> Maybe the ioss_HeapFetches field should be a uint32? That's all it's going to be on some platforms anyway.

I originally had it that way, but then it didn't match what ExplainPropertyLong() was looking for. I didn't think it was worth further complicating explain.c for this...

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 3:22 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jan 13, 2012 at 10:17 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> When hot_standby_feedback is turned on, the xmin of the standby is provided to the master to ensure the situation you describe never happens. Surely that means the problem is solved, if we can confirm the setting of the parameter. So this seems like a cleanup issue that can/should be resolved.
>
> How do you respond to the concerns I raised about that approach in the last sentence of my previous email?

I think it should be you that comes up with a fix, not for me to respond to your concerns about how hard it is. Many things that don't fully work are rejected for that reason.

Having said that, I have input that seems to solve the problem.

Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats->latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the visibility map that was written when the xmin was later than on the standby. If replication disconnects briefly and a vismap bit is updated in a way that would cause a problem, that is the same situation that already causes a problem with other record types.

If the problem is solved that way, we can enable IndexOnlyScans on the standby.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
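Concretely, the idea would be roughly along these lines (a sketch only; the extra field does not exist today and its name here is an assumption):

    /* xl_heap_visible would carry the horizon used to mark the page all-visible */
    typedef struct xl_heap_visible
    {
        RelFileNode   node;
        BlockNumber   block;
        TransactionId latestRemovedXid;     /* new field (sketch) */
    } xl_heap_visible;

    /* and heap_xlog_visible() would raise a snapshot conflict before replaying */
    if (InHotStandby)
        ResolveRecoveryConflictWithSnapshot(xlrec->latestRemovedXid,
                                            xlrec->node);

That would mirror how XLOG_HEAP2_CLEAN and btree deletion records already cancel conflicting standby queries.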
Re: [HACKERS] show Heap Fetches in EXPLAIN for index-only scans
Robert Haas <robertmh...@gmail.com> writes:
> So here's a 5-line patch that adds the number of heap fetches to the EXPLAIN ANALYZE output. This might not be all the instrumentation we'll ever want here, but I think we at least want this much.

Cosmetic gripes:

1. Please initialize the counter in ExecInitIndexOnlyScan. We don't generally rely on node fields to init as zeroes.

2. Project style is to use foo++, not ++foo, in contexts where it doesn't actually matter which is used.

			regards, tom lane
Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq
> There are some more details about this at http://wiki.postgresql.org/wiki/Running_a_CommitFest , which we don't expect everyone to know because it isn't one of the popular pages to read. I've sorted out the problem for this one already, just letting whoever made that change know about the process here.

Yeah, I need to go through the new reviewers' comments. Several of them got ahead of the group and started marking things up on the CF page; should be easily cleaned up though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
[HACKERS] Review of: explain / allow collecting row counts without timing info
Eric's review follows:

Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine, from a 2012-01-12 git checkout. The patch applied fine.

'make check' results in failures when this patch is put into place: 6 of 128 tests failed. Here are the relevant failures.

parallel group (2 tests):  create_view create_index
     create_index              ... FAILED
parallel group (9 tests):  create_cast create_aggregate drop_if_exists typed_table vacuum constraints create_table_like triggers inherit
     inherit                   ... FAILED
parallel group (20 tests):  select_distinct_on btree_index update random select_distinct select_having namespace delete case hash_index union select_implicit select_into transactions portals subselect arrays aggregates join prepared_xacts
     join                      ... FAILED
     aggregates                ... FAILED
parallel group (15 tests):  combocid portals_p2 advisory_lock xmlmap tsdicts guc functional_deps dependency select_views cluster tsearch window foreign_data foreign_key bitmapops
     foreign_data              ... FAILED
parallel group (19 tests):  limit prepare copy2 conversion xml plancache returning temp sequence without_oid with rowtypes truncate polymorphism domain largeobject rangefuncs alter_table plpgsql
     alter_table               ... FAILED

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> I think it should be you that comes up with a fix, not for me to respond to your concerns about how hard it is. Many things that don't fully work are rejected for that reason.

Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on all-visible info. I'm sorry to butt heads with you on this one, but this limitation has been long known and discussed many times before on pgsql-hackers, and I'm not going to drop everything and start working on this just because you seem to think that I should.

> Having said that, I have input that seems to solve the problem. Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats->latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the vis map that was written when the xmin was later than on the standby.

That could create a lot of recovery conflicts when hot_standby_feedback=off, I think, but it might work when hot_standby_feedback=on. I don't fully understand the latestRemovedXid machinery, but I guess the idea would be to kill any standby transaction whose proc->xmin precedes the oldest committed xmin or xmax on the page. If hot_standby_feedback=on then there shouldn't be any, except in the case where it's just been enabled or the SR connection is bouncing.

Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits.

I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily.

--
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
Re: [HACKERS] checkpoint writeback via sync_file_range
On Thu, Jan 12, 2012 at 7:26 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> On 1/11/12 9:25 AM, Andres Freund wrote:
>> The heavy pressure putting it directly in the writeback queue leads to less efficient io because quite often it won't reorder sensibly with other io anymore and thelike. At least that was my experience in using it with in another application.
>
> Sure, this is one of the things I was cautioning about in the Double Writes thread, with VACUUM being the worst such case I've measured. The thing to realize here is that the data we're talking about must be flushed to disk in the near future. And Linux will happily cache gigabytes of it. Right now, the database asks for that to be forced to disk via fsync, which means in chunks that can be as large as a gigabyte.
>
> Let's say we have a traditional storage array and there's competing activity. 10MB/s would be a good random I/O write rate in that situation. A single fsync that forces 1GB out at that rate will take *100 seconds*. And I've seen exactly that when trying to--about 80 seconds is my current worst checkpoint stall ever. And we don't have a latency vs. throughput knob any finer than that. If one is added, and you turn it too far toward latency, throughput is going to tank for the reasons you've also seen. Less reordering, elevator sorting, and write combining. If the database isn't going to micro-manage the writes, it needs to give the OS room to do that work for it.

Are there any IO benchmarking tools out there that benchmark the effects of reordering, elevator sorting, write combining, etc.? What I've seen is basically either completely sequential or completely random, with not much in between.

Cheers,

Jeff
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits.
>
> I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily.

An FPI can't change the all-visible flag. If it did, it would imply that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE record, or that we're doing crash recovery/initial startup and HS is not yet enabled.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Remembering bug #6123
Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
> You were right. One of the isolation tests failed an assertion; things pass with the attached change, setting the cmax conditionally. Some comments updated. I hope this is helpful.

I worked over this patch a bit, mostly cosmetically; updated version attached. However, we're not there yet. I have a couple of remaining concerns:

1. I think the error message needs more thought. I believe it's possible to trigger this error not only with BEFORE triggers, but also with a volatile function used in the query that reaches around and updates row(s) of the target table. Now, I don't have the slightest qualms about breaking any apps that do anything so dirty, but should we try to generalize the message text to cope with that?

2. The HINT needs work too, as it seems pretty useless as it stands. I'd have expected some short reference to the technique of re-executing the update in the trigger and then returning NULL. (BTW, does this patch require any documentation changes, and if so where?)

3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? And you did not touch the other two callers, which definitely have got issues. Here is an example, which is basically the reference count case refactored into a single self-referencing table, so that we can hit both parent and child rows in one operation:

create table test (
  id int primary key,
  parent int references test,
  data text,
  nchildren int not null default 0
);

create function test_ins_func() returns trigger language plpgsql as $$
begin
  if new.parent is not null then
    update test set nchildren = nchildren + 1 where id = new.parent;
  end if;
  return new;
end;
$$;

create trigger test_ins_trig before insert on test
  for each row execute procedure test_ins_func();

create function test_del_func() returns trigger language plpgsql as $$
begin
  if old.parent is not null then
    update test set nchildren = nchildren - 1 where id = old.parent;
  end if;
  return old;
end;
$$;

create trigger test_del_trig before delete on test
  for each row execute procedure test_del_func();

insert into test values (1, null, 'root');
insert into test values (2, 1, 'root child A');
insert into test values (3, 1, 'root child B');
insert into test values (4, 2, 'grandchild 1');
insert into test values (5, 3, 'grandchild 2');

update test set data = 'root!' where id = 1;
select * from test;
delete from test;
select * from test;

drop table test;
drop function test_ins_func();
drop function test_del_func();

When you run this, with or without the current patch, you get:

 id | parent |     data     | nchildren
----+--------+--------------+-----------
  2 |      1 | root child A |         1
  4 |      2 | grandchild 1 |         0
  3 |      1 | root child B |         1
  5 |      3 | grandchild 2 |         0
  1 |        | root!        |         2
(5 rows)

DELETE 4

 id | parent | data  | nchildren
----+--------+-------+-----------
  1 |        | root! |         0
(1 row)

the reason being that when the outer delete arrives at the last ("root!") row, GetTupleForTrigger fails because the tuple is already self-updated, and we treat that as canceling the outer delete action. I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run.
Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. regards, tom lane diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 5f6ac2ec1fdaf7598084150d017d8c85bfeeebe5..eb94060371a41913c63b2187fb4b66012ebfaed3 100644 *** a/src/backend/access/heap/heapam.c --- b/src/backend/access/heap/heapam.c *** simple_heap_insert(Relation relation, He *** 2317,2343 * * relation - table to be modified (caller must hold suitable lock) * tid - TID of tuple to be deleted - * ctid - output parameter, used only for failure case (see below) - * update_xmax - output parameter, used only for failure case (see below) * cid - delete command ID (used for visibility test, and stored into * cmax if successful) * crosscheck - if not InvalidSnapshot, also check tuple against this * wait - true if should wait for any conflicting update to commit/abort * * Normal, successful return value is HeapTupleMayBeUpdated, which * actually
Re: [HACKERS] Standalone synchronous master
On Fri, Jan 13, 2012 at 2:30 AM, Alexander Björnhagen alex.bjornha...@gmail.com wrote: At this point I feel that this new functionality might be a bit overkill for postgres, maybe it's better to stay lean and mean rather than add a controversial feature like this. I don't understand why this is controversial. In the current code, if you have a master and a single sync standby, and the master disappears and you promote the standby, now the new master is running *without a standby*. If you are willing to let the new master run without a standby, why are you not willing to let the old one do so if it were the standby which failed in the first place? Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Jeff Janes jeff.ja...@gmail.com writes: I don't understand why this is controversial. In the current code, if you have a master and a single sync standby, and the master disappears and you promote the standby, now the new master is running *without a standby*. If you configured it to use sync rep, it won't accept any transactions until you give it a standby. If you configured it not to, then it's you that has changed the replication requirements. If you are willing to let the new master run without a standby, why are you not willing to let the old one do so if it were the standby which failed in the first place? Doesn't follow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] read transaction and sync rep
On Fri, Jan 13, 2012 at 3:44 AM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 13, 2012 at 11:30 AM, Fujii Masao masao.fu...@gmail.com wrote: I found that even read transaction waits for sync rep when it generates WAL records even if XID is not assigned. For example, imagine the case where SELECT query does a heap clean operation and generates XLOG_HEAP2_CLEAN record. ISTM that such a read transaction doesn't need to wait for sync rep because that's not visible to the client... Can we skip waiting for sync rep if XID is not assigned? I think this applies not only to sync rep, but also the xlog sync on the master as well. That is, the transaction could just commit asynchronously. Your example of XLOG_HEAP2_CLEAN records is a good one but there are other record types and circumstances, as described in the comment near the top of RecordTransactionCommit /* * If we didn't create XLOG entries, we're done here; otherwise we * should flush those entries the same as a commit record. (An * example of a possible record that wouldn't cause an XID to be * assigned is a sequence advance record due to nextval() --- we want * to flush that to disk before reporting commit.) */ if (!wrote_xlog) goto cleanup; Perhaps there is a case to say that sequences don't need to be flushed if all they did was do nextval but no change was associated with that, I'm not sure. I think there that that is the case. If one wants to argue that someone could take the sequence value and do something with it outside the database and so that sequence can't be repeated after recovery, then the commit is the wrong place to make that argument. The flush would have to occur before the value is handed out, not before the transaction which obtained the value commits. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Jeff Janes jeff.ja...@gmail.com wrote: I don't understand why this is controversial. I'm having a hard time seeing why this is considered a feature. It seems to me what is being proposed is a mode with no higher integrity guarantee than asynchronous replication, but latency equivalent to synchronous replication. I can see where it's tempting to want to think it gives something more in terms of integrity guarantees, but when I think it through, I'm not really seeing any actual benefit. If this fed into something such that people got jabber messages, emails, or telephone calls any time it switched between synchronous and stand-alone mode, that would make it a built-in monitoring, fail-over, and alert system -- which *would* have some value. But in the past we've always recommended external tools for such features. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: 3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? I made that change when working on the approach where safe conflicts (like a DELETE from within the BEFORE DELETE trigger) were quietly ignored. Without that change, it didn't see the end of the ctid chain, and didn't behave correctly. I've been wondering if it is still needed. I had been planning to create a test case to try to show that it was. Maybe an UPDATE with a join to force multiple row updates from the primary statement, followed by a triggered UPDATE to the row. If that doesn't need the EvalPlanQualFetch change, I don't know what would. The EvalPlanQual code isn't really exercised by any existing regression tests, because it is only triggered when two sessions concurrently update the same row, something that we avoid for reproducibility's sake in the regression tests. I think we could now test it using the isolation test scaffolding, and maybe it would be a good idea to add some tests there. I did verify that whether that part of your patch is included or not makes no difference to any existing test. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: I worked over this patch a bit, mostly cosmetically; updated version attached. Thanks! However, we're not there yet. I have a couple of remaining concerns: 1. I think the error message needs more thought. I believe it's possible to trigger this error not only with BEFORE triggers, but also with a volatile function used in the query that reaches around and updates row(s) of the target table. Now, I don't have the slightest qualms about breaking any apps that do anything so dirty, but should we try to generalize the message text to cope with that? I hadn't thought of that, but it does seem possible. Maybe after dealing with the other points, I'll work on a test case to show that behavior. I'm also fine with generating an error for such dirty tricks, and I agree that if that's indeed possible we should make the message general enough to cover that case. Nothing comes to mind at the moment, but I'll think on it. 2. The HINT needs work too, as it seems pretty useless as it stands. I'd have expected some short reference to the technique of re-executing the update in the trigger and then returning NULL. In the previous (rather long) thread on the topic, it seemed that most cases where people have hit this, the problem was easily solved by moving the offending code to the AFTER trigger. The technique of re-issuing the command didn't turn up until much later. I would bet it will be the right thing to do 20% of the time when people get such an error. I don't want to leave the 80% solution out of the hint, but if you don't think it's too verbose, I guess it would be a good thing to mention the 20% solution, too. (BTW, does this patch require any documentation changes, and if so where?) I've been wondering that. Perhaps a paragraph or two with an example on this page?: http://www.postgresql.org/docs/devel/static/trigger-definition.html 3. I modified heap_lock_tuple to also return a HeapUpdateFailureData, principally on the grounds that its API should be largely parallel to heap_update, but having done that I can't help wondering whether its callers are okay with skipping self-updated tuples. I see that you changed EvalPlanQualFetch, but I'm not entirely sure that that is right; shouldn't it continue to ignore rows modified by the current command itself? I made that change when working on the approach where safe conflicts (like a DELETE from within the BEFORE DELETE trigger) were quietly ignored. Without that change, it didn't see the end of the ctid chain, and didn't behave correctly. I've been wondering if it is still needed. I had been planning to create a test case to try to show that it was. Maybe an UPDATE with a join to force multiple row updates from the primary statement, followed by a triggered UPDATE to the row. If that doesn't need the EvalPlanQualFetch change, I don't know what would. And you did not touch the other two callers, which definitely have got issues. Here is an example [example] I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. It will take me some time to work though the example and review the relevant code. 
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reprise: pretty print viewdefs
On 01/13/2012 12:31 AM, Hitoshi Harada wrote: So my conclusion is it's better than nothing, but we could do better job here. From timeline perspective, it'd be ok to apply this patch and improve more later in 9.3+. I agree, let's look at the items other than the target list during 9.3. But I do think this addresses the biggest pain point. Thanks for the review. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Multithread Query Planner
Hi folks. Is there any restriction on creating and starting threads inside Postgres? I'm trying to develop a multithreaded planner, and sometimes a memory access exception is raised. I'm debugging the code to see if there is a bug in the planner, but so far I haven't found one. I tried to use the same memory context as the root process, and also to create a new context for each new thread, but neither worked. Any tips? Att, Fred Sent via iPad -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithread Query Planner
On Fri, Jan 13, 2012 at 3:14 PM, Frederico zepf...@gmail.com wrote: Hi folks. Is there any restriction on creating and starting threads inside Postgres? I'm trying to develop a multithreaded planner, and sometimes a memory access exception is raised. I'm debugging the code to see if there is a bug in the planner, but so far I haven't found one. I tried to use the same memory context as the root process, and also to create a new context for each new thread, but neither worked. Any tips?

Yes, don't try to use threads. http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

... threads are not currently used instead of multiple processes for backends because:

- Historically, threads were poorly supported and buggy.
- An error in one backend can corrupt other backends if they're threads within a single process.
- Speed improvements using threads are small compared to the remaining backend startup time.
- The backend code would be more complex.
- Terminating backend processes allows the OS to cleanly and quickly free all resources, protecting against memory and file descriptor leaks and making backend shutdown cheaper and faster.
- Debugging threaded programs is much harder than debugging worker processes, and core dumps are much less useful.
- Sharing of read-only executable mappings and the use of shared_buffers means processes, like threads, are very memory efficient.
- Regular creation and destruction of processes helps protect against memory fragmentation, which can be hard to manage in long-running processes.

There's a pretty large burden of reasons *not* to use threads, and while some of them have diminished in importance, most have not. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: [rearranging so results directly follow statements]

select * from test;

 id | parent |     data     | nchildren
----+--------+--------------+-----------
  2 |      1 | root child A |         1
  4 |      2 | grandchild 1 |         0
  3 |      1 | root child B |         1
  5 |      3 | grandchild 2 |         0
  1 |        | root!        |         2
(5 rows)

delete from test;
DELETE 4

select * from test;

 id | parent | data  | nchildren
----+--------+-------+-----------
  1 |        | root! |         0
(1 row)

And other minor updates to the data column can result in totally different sets of rows left over after a DELETE from the table with no WHERE clause. It makes me pretty queasy whenever the semantics of a statement can depend on the order of tuples in the heap. It's pretty hard to view this particular case as anything other than a bug.

the reason being that when the outer delete arrives at the last (root!) row, GetTupleForTrigger fails because the tuple is already self-updated, and we treat that as canceling the outer delete action. I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows.

I'm still trying to sort through what could be done at the source code level, but from a user level I would much rather see an error than such surprising and unpredictable behavior.

-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: I'm not sure what to do about this. If we throw an error there, there will be no way that the trigger can override the error because it will never get to run. Possibly we could plow ahead with the expectation of throwing an error later if the trigger doesn't cancel the update/delete, but is it safe to do so if we don't hold lock on the tuple? In any case that idea doesn't help with the remaining caller, ExecLockRows. I'm still trying to sort through what could be done at the source code level, but from a user level I would much rather see an error than such surprising and unpredictable behavior. I don't object to throwing an error by default. What I'm wondering is what would have to be done to make such updates work safely. AFAICT, throwing an error in GetTupleForTrigger would preclude any chance of coding a trigger to make this work, which IMO greatly weakens the argument that this whole approach is acceptable. In this particular example, I think it would work just as well to do the reference-count updates in AFTER triggers, and maybe the short answer is to tell people they have to do it like that instead of in BEFORE triggers. However, I wonder what use-case led you to file bug #6123 to begin with. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: I'm also fine with generating an error for such dirty tricks, and I agree that if that's indeed possible we should make the message general enough to cover that case. Nothing comes to mind at the moment, but I'll think on it. What do you think of ERROR: tuple to be updated was already modified by an operation triggered by the UPDATE command HINT: Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows. (s/update/delete/ for the DELETE case of course) The phrase triggered by seems slippery enough to cover cases such as a volatile function executed by the UPDATE. The HINT doesn't cover that case of course, but we have a ground rule that HINTs can be wrong. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: What do you think of ERROR: tuple to be updated was already modified by an operation triggered by the UPDATE command HINT: Consider using an AFTER trigger instead of a BEFORE trigger to propagate changes to other rows. (s/update/delete/ for the DELETE case of course) The phrase triggered by seems slippery enough to cover cases such as a volatile function executed by the UPDATE. The HINT doesn't cover that case of course, but we have a ground rule that HINTs can be wrong. Looks good to me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3 feature proposal: vacuumdb -j #
Hackers, It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. Any reason not to hack on this for 9.3? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrent CREATE TABLE/DROP SCHEMA leaves inconsistent leftovers
On Tue, Nov 29, 2011 at 10:10 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Nov 29, 2011 at 3:37 AM, Daniel Farina dan...@heroku.com wrote: Hmm, just to prod this thread: has any fix for this been committed? After Nikhil confirmed that this bug could cause pg_dump to not be able to operate without direct catalog surgery I have encountered this bug (and treated its symptoms in the same manner) twice. I tried to do my homework in a backbranch, but am not seeing anything beaming out at me. I have plans to try to improve this, but it's one of those things that I care about more than the people who write the checks do, so it hasn't quite gotten to the top of the priority list yet. All right... I have a patch that I think fixes this, at least so far as relations are concerned. I rewrote RangeVarGetAndCheckCreationNamespace() extensively, did surgery on its callers, and then modified CREATE OR REPLACE VIEW and ALTER relkind .. SET SCHEMA to make use of it rather than doing things as they did before. So this patch prevents (1) concurrently dropping a schema in which a new relation is being created, (2) concurrently dropping a schema into which an existing relation is being moved, and (3) using CREATE OR REPLACE VIEW to attempt to obtain a lock on a relation you don't own (the command would eventually fail, of course, but if there were, say, an outstanding AccessShareLock on the relation, you'd queue up for the lock and thus prevent any further locks from being granted, despite your lack of any rights on the target). It doesn't fix any of the non-relation object types. That would be nice to do, but I think it's material for a separate patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company concurrent-drop-schema.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 13.01.2012 22:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. Any reason not to hack on this for 9.3? I don't see any reason not to do it, but plenty of reasons to do it. Right now I have systems hosting many databases that I need to vacuum full from time to time. I have wrapped vacuumdb with a shell script to actually use all the capacity that is available. A vacuumdb -faz just isn't that useful on large machines anymore. Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On Friday, January 13, 2012 10:50:32 PM Josh Berkus wrote: Hackers, It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. That doesn't sound like a good idea. It's way too likely that you will end up with one backend doing all the work because it got some big tables. I don't think this task deserves using threads or subprocesses. Multiple connections from one process seem way more sensible and mostly avoid the above problem. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 13-01-2012 18:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. It is in the middle of my TODO list. reindexdb is in the plans too. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On Fri, Jan 13, 2012 at 4:50 PM, Josh Berkus j...@agliodbs.com wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. For example, just today I needed to manually analyze a database with over 500 tables, on a server with 24 cores. And I needed to know when the analyze was done, because it was part of a downtime. I had to resort to a python script. I'm picturing doing this in the simplest way possible: get the list of tables and indexes, divide them by the number of processes, and give each child process its own list. I think simplest isn't *quite* best... There's the risk that all the big tables get tied to one child, and so the one child is doing them serially. Better: Have two logical tasks: a) A process that manages the list, and b) Child processes doing vacuums. Each time a child completes a table, it asks the parent for another one. So the tendency will be that if there are 8 big tables, and 12 child processes, it's *certain* that the 8 big tables will be spread across the children. It guarantees that the child processes will all be busy until there are fewer tables left than there are child processes. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
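A minimal sketch of the dispatcher scheme described in the message above (one process, several asynchronous libpq connections, each idle connection handed the next table) might look like the following. This is only an illustration, not proposed vacuumdb code: run_parallel_vacuum and the other names are invented for the example, the table names are assumed to be already quoted, and real code would need better error recovery.

/*
 * Hypothetical sketch: one dispatcher process, N async libpq connections.
 * Any connection that finishes its table immediately gets the next one,
 * so large tables naturally spread across the workers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include "libpq-fe.h"

static void
run_parallel_vacuum(const char *conninfo, int nconns, char **tables, int ntables)
{
	PGconn **conns = calloc(nconns, sizeof(PGconn *));
	int *busy = calloc(nconns, sizeof(int));
	int next = 0, pending = 0, i;

	for (i = 0; i < nconns; i++)
	{
		conns[i] = PQconnectdb(conninfo);
		if (PQstatus(conns[i]) != CONNECTION_OK)
		{
			fprintf(stderr, "connection failed: %s", PQerrorMessage(conns[i]));
			exit(1);
		}
	}

	while (next < ntables || pending > 0)
	{
		fd_set rfds;
		int maxfd = -1;

		/* hand the next table to every idle connection */
		for (i = 0; i < nconns && next < ntables; i++)
		{
			if (!busy[i])
			{
				char sql[1024];

				/* tables[] is assumed to hold already-quoted names */
				snprintf(sql, sizeof(sql), "VACUUM ANALYZE %s;", tables[next++]);
				if (!PQsendQuery(conns[i], sql))
					exit(1);
				busy[i] = 1;
				pending++;
			}
		}

		/* sleep until at least one busy connection has something to read */
		FD_ZERO(&rfds);
		for (i = 0; i < nconns; i++)
			if (busy[i])
			{
				FD_SET(PQsocket(conns[i]), &rfds);
				if (PQsocket(conns[i]) > maxfd)
					maxfd = PQsocket(conns[i]);
			}
		if (maxfd >= 0 && select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
			continue;			/* e.g. EINTR: just retry */

		/* drain whichever connections have finished their command */
		for (i = 0; i < nconns; i++)
		{
			if (!busy[i])
				continue;
			if (!PQconsumeInput(conns[i]))
			{
				fprintf(stderr, "%s", PQerrorMessage(conns[i]));
				exit(1);
			}
			while (!PQisBusy(conns[i]))
			{
				PGresult *res = PQgetResult(conns[i]);

				if (res == NULL)	/* command fully done, worker is idle */
				{
					busy[i] = 0;
					pending--;
					break;
				}
				PQclear(res);
			}
		}
	}

	for (i = 0; i < nconns; i++)
		PQfinish(conns[i]);
	free(conns);
	free(busy);
}

The point of the structure is that no connection sits idle while work remains: as soon as PQgetResult() drains one VACUUM, the outer loop hands that connection the next table from the shared list, which is the behaviour the manager/worker description above is after.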
Re: [HACKERS] 9.3 feature proposal: vacuumdb -j #
On 1/13/12 2:12 PM, Euler Taveira de Oliveira wrote: On 13-01-2012 18:50, Josh Berkus wrote: It occurs to me that I would find it quite personally useful if the vacuumdb utility was multiprocess capable. It is in the mid of my TODO list. reindexdb is in the plans too. I'm even happier to have someone else do it. ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Robert Haas robertmh...@gmail.com writes: Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. It seems to me that we could design a fairly useful set of functionality around AFTER-CREATE, BEFORE-DROP, and maybe even AFTER-ALTER triggers without exposing any parse tree details. Please find attached v5 of the patch, with nodeToString() support removed. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support command-trigger.v5.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Monday, December 19, 2011 07:37:46 PM Robert Haas wrote: Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
Robert Haas robertmh...@gmail.com writes: Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on I'm very sorry to read that, because I sure feel like that's exactly what we non-committers are asked to be doing when cooking any patch. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Standalone synchronous master
Kevin Grittner kevin.gritt...@wicourts.gov writes: I'm having a hard time seeing why this is considered a feature. It seems to me what is being proposed is a mode with no higher integrity guarantee than asynchronous replication, but latency equivalent to synchronous replication. I can see where it's tempting to want to think it gives something more in terms of integrity guarantees, but when I think it through, I'm not really seeing any actual benefit. Same here, so what I think is that the new recv and write modes that Fujii is working on could maybe be demoted from being sync variants, while not really being async ones. Maybe “eager” or some other term. It seems to me that would answer the OP's use case and your remark here. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithread Query Planner
Christopher Browne cbbro...@gmail.com writes: Yes, don't try to use threads. http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F ... threads are not currently used instead of multiple processes for backends because: I would only add that the backend code is really written from a process-based perspective, with a giant number of private variables that are in fact global variables. Trying to “clean” that out in order to get to threads… wow. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Andres Freund and...@anarazel.de writes: I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. If you target C coded triggers then all you need to do is provide a pointer to the Node *parsetree, I would think. What else? The drawback though is still the same, the day you do that you've proposed a public API and changing the parsetree stops being internal refactoring. The way around this problem is that if you want a command trigger in C, just write an extension that implements the Process Utility hook. Bonus, you can have that working with already released versions of PostgreSQL. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
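For concreteness, a minimal sketch of the ProcessUtility-hook extension route mentioned above could look like this. It is an illustration only: the hook signature shown is the one from the 9.1/9.2-era tcop/utility.h, and everything prefixed my_ is an invented name; a real extension would do something more interesting than logging.

/*
 * Hypothetical sketch of a "command trigger" done today as a
 * ProcessUtility_hook extension.  Build as a shared library and load it
 * via shared_preload_libraries.  Hook signature per the 9.1/9.2 sources;
 * all my_* names are invented for this example.
 */
#include "postgres.h"
#include "fmgr.h"
#include "tcop/utility.h"

PG_MODULE_MAGIC;

void _PG_init(void);
void _PG_fini(void);

static ProcessUtility_hook_type prev_ProcessUtility = NULL;

static void
my_ProcessUtility(Node *parsetree, const char *queryString,
				  ParamListInfo params, bool isTopLevel,
				  DestReceiver *dest, char *completionTag)
{
	/*
	 * "Before" behaviour: we see the raw parse tree of every utility
	 * statement, so we can log it, inspect nodeTag(parsetree), or error out.
	 */
	ereport(LOG, (errmsg("utility statement: %s", queryString)));

	/* chain to the previous hook, or to the standard implementation */
	if (prev_ProcessUtility)
		prev_ProcessUtility(parsetree, queryString, params,
							isTopLevel, dest, completionTag);
	else
		standard_ProcessUtility(parsetree, queryString, params,
								isTopLevel, dest, completionTag);

	/* "after" behaviour could go here, once the command has run */
}

void
_PG_init(void)
{
	prev_ProcessUtility = ProcessUtility_hook;
	ProcessUtility_hook = my_ProcessUtility;
}

void
_PG_fini(void)
{
	ProcessUtility_hook = prev_ProcessUtility;
}

Chaining through prev_ProcessUtility is what lets several such libraries coexist at all; as noted elsewhere in the thread, stacking hooks from independent projects is fragile and they cannot be installed purely from SQL, which is part of the motivation for a first-class CREATE COMMAND TRIGGER.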
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On 13.1.2012 18:07, Josh Berkus wrote: Eric's review follows: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 git checkout. Patch applied fine. 'make check' results in failures when this patch is put into place. 6 of 128 tests failed. Here are the relevant failures. parallel group (2 tests): create_view create_index create_index ... FAILED parallel group (9 tests): create_cast create_aggregate drop_if_exists typed_table vacuum constraints create_table_like triggers inherit inherit ... FAILED parallel group (20 tests): select_distinct_on btree_index update random select_distinct select_having namespace delete case hash_index union select_implicit select_into transactions portals subselect arrays aggregates join prepared_xacts join ... FAILED aggregates ... FAILED parallel group (15 tests): combocid portals_p2 advisory_lock xmlmap tsdicts guc functional_deps dependency select_views cluster tsearch window foreign_data foreign_key bitmapops foreign_data ... FAILED parallel group (19 tests): limit prepare copy2 conversion xml plancache returning temp sequence without_oid with rowtypes truncate polymorphism domain largeobject rangefuncs alter_table plpgsql alter_table ... FAILED Fixed. The default value of TIMING option did not work as intended, it was set to true even for plain EXPLAIN (without ANALYZE). In that case the EXPLAIN failed. Tomas diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 61da6a2..9fc0ae1 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -23,6 +23,7 @@ static intauto_explain_log_min_duration = -1; /* msec or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; +static bool auto_explain_log_timing = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; static bool auto_explain_log_nested_statements = false; @@ -133,6 +134,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable(auto_explain.log_timing, +Collect timing data, not just row counts., +NULL, + auto_explain_log_timing, +true, +PGC_SUSET, +0, +NULL, +NULL, +NULL); + EmitWarningsOnPlaceholders(auto_explain); /* Install hooks. */ @@ -170,7 +182,12 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags) /* Enable per-node instrumentation iff log_analyze is required. */ if (auto_explain_log_analyze (eflags EXEC_FLAG_EXPLAIN_ONLY) == 0) { - queryDesc-instrument_options |= INSTRUMENT_TIMER; + if (auto_explain_log_timing) + queryDesc-instrument_options |= INSTRUMENT_TIMER; + else + queryDesc-instrument_options |= INSTRUMENT_ROWS; + + if (auto_explain_log_buffers) queryDesc-instrument_options |= INSTRUMENT_BUFFERS; } diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index 8a9c9de..4488956 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -40,6 +40,7 @@ EXPLAIN [ ANALYZE ] [ VERBOSE ] replaceable class=parameterstatement/replac VERBOSE [ replaceable class=parameterboolean/replaceable ] COSTS [ replaceable class=parameterboolean/replaceable ] BUFFERS [ replaceable class=parameterboolean/replaceable ] +TIMING [ replaceable class=parameterboolean/replaceable ] FORMAT { TEXT | XML | JSON | YAML } /synopsis /refsynopsisdiv @@ -172,6 +173,20 @@ ROLLBACK; /varlistentry varlistentry +termliteralTIMING/literal/term +listitem + para + Include information on timing for each node. Specifically, include the + actual startup time and time spent in the node. 
This may involve a lot + of gettimeofday calls, and on some systems this means significant + slowdown of the query. + This parameter may only be used when literalANALYZE/literal is also + enabled. It defaults to
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On 01/12/2012 09:28 PM, Alex Hunsaker wrote: Util.c/o not depending on plperl_helpers.h was also throwing me for a loop so I fixed it and SPI.c... Thoughts? Basically looks good, but I'm confused by this: do language plperl $$ elog('NOTICE', ${^TAINT}); $$; Why is NOTICE quoted here? It's a constant that we've been careful to set up. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Tom Lane t...@sss.pgh.pa.us wrote: In this particular example, I think it would work just as well to do the reference-count updates in AFTER triggers, and maybe the short answer is to tell people they have to do it like that instead of in BEFORE triggers. I think that is quite often the right answer. However, I wonder what use-case led you to file bug #6123 to begin with. In our Circuit Court software, we have about 1600 trigger functions on about 400 tables, and this summer we converted them from our Java middle tier framework to native PostgreSQL triggers. Since we had been writing them in our interpretation of ANSI SQL trigger code, parsing that, and using the parse tree to build a Java class for each trigger, we were able to generate the PostgreSQL trigger functions and CREATE TRIGGER statement mechanically (from the parse tree), with pretty good success. In testing, though, our business analysts noticed a number of situations where an attempt to delete a row actually deleted related rows which should have gone away with the row they were directly trying to delete, but the target row was still there. A few days of investigation, including stepping through query execution in gdb, turned up this issue. Having identified (at least one flavor of) the problem, we grepped the source code for the BEFORE triggers for UPDATE and DELETE statements, and were able to fix a number of them by moving code to AFTER triggers or setting values into NEW fields rather than running an UPDATE. So far, so good. But there were a number of situations where the DELETE of a row needed to cause related rows in other tables to be deleted, and for one reason or another a foreign key with ON DELETE CASCADE was not an option. At the same time, triggers on some of those related tables needed to update summary or redundant data in other tables for performance reasons. Because a number of tables could be involved, and some of the triggers (at the lower levels) could be AFTER triggers and still contribute to the problem, (1) we had no reliable way to ensure we would find all of the cases of this on all code paths, and (2) due to referential integrity and other trigger- based validations, it would be hard to restructure such that the DELETE of the child rows was not done on the BEFORE DELETE trigger of the parent. The patch we've been using in production throws errors if the row for a BEFORE UPDATE trigger is updated by another statement. (Well, OK, you showed me that it really is throwing an error if the row is updated and there has been another statement executed, but as long as it is *more* strict than we actually need, we won't corrupt data -- and the current rule hasn't been hard for us to live with.) It allows the DELETE to proceed if the tuple is updated from within the BEFORE DELETE trigger. We would need to tweak some triggers to move to the approach embodied in the recent patch drafts, but the IF FOUND logic suggested by Florian looks like it will cover all of our use cases, and they should be fairly easy to find with grep. Hopefully this answers your question. I went looking for details on particular failures here, but didn't have luck with so far. I can try again if more detail like that would help. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Fix breakage from earlier plperl fix.
On Fri, Jan 13, 2012 at 16:07, Andrew Dunstan and...@dunslane.net wrote: On 01/12/2012 09:28 PM, Alex Hunsaker wrote: Util.c/o not depending on plperl_helpers.h was also throwing me for a loop so I fixed it and SPI.c... Thoughts? Basically looks good, but I'm confused by this: do language plperl $$ elog('NOTICE', ${^TAINT}); $$; Why is NOTICE quoted here? It's a constant that we've been careful to set up. No reason. [ Err, well, it was just part of me flailing around while trying to figure out why elog was still crashing even though I had the issue fixed. The real problem was Util.o was not being recompiled due to Util.c not depending on plperl_helpers.h. ] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remembering bug #6123
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: However, I wonder what use-case led you to file bug #6123 to begin with. [ details ] Hopefully this answers your question. I went looking for details on particular failures here, but didn't have luck with so far. I can try again if more detail like that would help. Well, the bottom line that's concerning me here is whether throwing errors is going to push anyone's application into an unfixable corner. I'm somewhat encouraged that your Circuit Courts software can adapt to it, since that's certainly one of the larger and more complex applications out there. Or at least I would be if you had actually verified that the CC code was okay with the recently-proposed patch versions. Do you have any thorough tests you can run against whatever we end up with? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Friday, January 13, 2012 11:53:32 PM Dimitri Fontaine wrote: Andres Freund and...@anarazel.de writes: I personally think this is an error and those details should at least be available on the c level (e.g. some pg_command_trigger_get_plan() function, only available via C) to allow sensible playing around with that knowledge. I don't really see making progress towards a nice interface unless we get something to play around with out there. If you target C coded triggers then all you need to do is provide a pointer to the Node *parsetree, I would think. What else? Yes. Being able to turn that into a statement again is still valuable imo. The drawback though is still the same, the day you do that you've proposed a public API and changing the parsetree stops being internal refactoring. Yes, sure. I don't particularly care though actually. Changing some generic guts of trigger functions is not really that much of a problem compared to the other stuff involved in a version migration. The way around this problem is that if you want a command trigger in C, just write an extension that implements the Process Utility hook. The point is that with CREATE COMMAND TRIGGER only the internal part of the triggers needs to change, not the external interface. Which is a big difference from my pov. Also hooks are relatively hard to stack, i.e. it's hard to use them sensibly from multiple independent projects. They also cannot be purely installed via extensions/sql. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 12:08:05PM -0500, Robert Haas wrote: On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs si...@2ndquadrant.com wrote: Many WAL records have latestRemovedXid on them. We can use the same idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the latest vacrelstats-latestRemovedXid. That then creates a recovery snapshot conflict that would cancel any query that might then see a page of the vis map that was written when the xmin was later than on the standby. If replication disconnects briefly and a vimap bit is updated that would cause a problem, just as the same situation would cause a problem because of other record types. That could create a lot of recovery conflicts when hot_standby_feedback=off, I think, but it might work when hot_standby_feedback=on. I don't fully understand the latestRemovedXid machinery, but I guess the idea would be to kill any standby transaction whose proc-xmin precedes the oldest committed xmin or xmax on the page. If hot_standby_feedback=on then there shouldn't be any, except in the case where it's just been enabled or the SR connection is bouncing. FWIW, Tom aired the same idea here: http://archives.postgresql.org/message-id/27743.1291135...@sss.pgh.pa.us While reviewing the crash-safe visibility map patch, I echoed the idea and explained why the extra conflicts would be immaterial: http://archives.postgresql.org/message-id/20110618133703.ga1...@tornado.leadboat.com Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits. Simon spoke to the FPI side of the question. For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. When, someday, they do, we might emit a separate WAL record to force the recovery conflict. However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. (Again only with hot_standby_feedback=off.) Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Speed dblink using alternate libpq tuple storage
One patch that fell off the truck during a turn in the November CommitFest was Allow substitute allocators for PGresult. Re-reading the whole thing again, it actually turned into a rather different submission in the middle, and I know I didn't follow that shift correctly then. I'm replying to its thread but have changed the subject to reflect that change. From a procedural point of view, I don't feel right kicking this back to its author on a Friday night when the deadline for resubmitting it would be Sunday. Instead I've refreshed the patch myself and am adding it to the January CommitFest. The new patch is a single file; it's easy enough to split out the dblink changes if someone wants to work with the pieces separately. After my meta-review I think we should get another reviewer familiar with using dblink to look at this next. This is fundamentally a performance patch now. Some results and benchmarking code were submitted along with it; the other issues are moot if those aren't reproducible. The secondary goal for a new review here is to provide another opinion on the naming issues and abstraction concerns raised so far. To clear out the original line of thinking, this is not a replacement low-level storage allocator anymore. The idea of using such a mechanism to help catch memory leaks has also been dropped. Instead this adds and documents a new path for libpq callers to more directly receive tuples, for both improved speed and lower memory usage. dblink has been modified to use this new mechanism. Benchmarking by the author suggests no significant change in libpq speed when only that change was made, while the modified dblink using the new mechanism was significantly faster. It jumped from 332K tuples/sec to 450K, a 35% gain, and had a lower memory footprint too. Test methodology and those results are at http://archives.postgresql.org/pgsql-hackers/2011-12/msg8.php Robert Haas did a quick code review of this already; it, along with author responses mixed in, is at http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php I see two areas of contention there:

- There are several naming bits no one is happy with yet. Robert didn't like some of them, but neither did Kyotaro. I don't have an opinion myself. Is it the case that some changes to the existing code's terminology are what's actually needed to make this all better? Or is this just fundamentally warty and there's nothing to be done about it. Dunno.

- There is an abstraction wrapper vs. coding convenience trade-off centering around PGresAttValue. It sounded to me like it raised always fun questions like where's the right place for the line between libpq-fe.h and libpq-int.h to be?

dblink is pretty popular, and this is a big performance win for it. If naming and API boundary issues are the worst problems here, this sounds like something well worth pursuing as part of 9.2's still-advancing performance theme.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 36a8e3e..f48fe4f 100644 *** a/contrib/dblink/dblink.c --- b/contrib/dblink/dblink.c *** typedef struct remoteConn *** 63,73 bool newXactForCursor; /* Opened a transaction for a cursor */ } remoteConn; /* * Internal declarations */ static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async); - static void materializeResult(FunctionCallInfo fcinfo, PGresult *res); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); --- 63,85 bool newXactForCursor; /* Opened a transaction for a cursor */ } remoteConn; + typedef struct storeInfo + { + Tuplestorestate *tuplestore; + int nattrs; + AttInMetadata *attinmeta; + MemoryContext oldcontext; + char *attrvalbuf; + void **valbuf; + size_t *valbufsize; + bool error_occurred; + bool nummismatch; + } storeInfo; + /* * Internal declarations */ static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); *** static char *escape_param_str(const char *** 90,95 --- 102,111 static void validate_pkattnums(Relation rel, int2vector *pkattnums_arg, int32 pknumatts_arg, int **pkattnums, int *pknumatts); + static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo); + static void finishStoreInfo(storeInfo *sinfo); + static void *addTuple(PGresult *res, AddTupFunc func, int id, size_t size); + /* Global */ static remoteConn *pconn = NULL; *** dblink_fetch(PG_FUNCTION_ARGS) *** 503,508
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 5:37 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Well, I disagree. The fact that all-visible info can't be trusted in standby mode is a problem that has existed since Hot Standby was committed, and I don't feel obliged to fix it just because I was involved in developing a new feature that happens to rely on I'm very sorry to read that, because I sure feel like that's exactly what us non commiters are asked to be doing when cooking any patch. There is obviously room for disagreement over what should or should not be within the scope of any particular development project, and we may not always agree on that. However, I do try to act in good faith and apply, as much as I can, the same standard to other people's patches as I do to my own. In general, when I ask for the scope of something to be expanded, I try to limit myself to something that I feel can be accomplished in a day or two of development and which are closely related to the core of what the patch is about. There are exceptions, of course, where I feel that more than that is needed, but there are also many examples on file of me arguing for a narrower scope - either that an already-written patch should be divided into pieces so that the uncontroversial pieces can be committed, or that a project need not include every element that anyone wants in order to be acceptable. I try very hard to be sensitive to the fact that non-committers also have stuff they want to get done, which is why I am one of the most prolific committers of the work of non-committers, even extending in numerous instances (most recently, today) to expending rather large amounts of time rewriting stuff that neither I nor my employer have a huge stake in to reach the quality of code that I think is needed for commit, or to satisfy the objections of other community members that I may not personally share. In this particular case, I am not even the person who committed the patch. Tom committed index-only scans well before I thought it was ready for prime time, and then did a whole lot of work to improve it afterwards. Even now, I believe there are significant issues remaining. There is no EXPLAIN support (for which I supported a patch today); the statistics might need work (as Magnus complained about in response to my EXPLAIN patch); we don't yet support index-only quals (i.e. even though you know you'll eventually need the heap tuple, evaluate the quals you can using just the index tuple in the hopes of avoiding some heap fetches); and I'm pretty worried that we'll run across cases where too much of the heap is not-all-visible and we either don't use index only scans or we use them but they don't perform well - a problem which I suspect will require adjustment of our vacuuming algorithm to avoid. With the exception of EXPLAIN support which I think is merely an oversight, all of those issues, including the problems in Hot Standby mode, remain because nobody knows exactly what we ought to do to fix them. When somebody figures it out, I predict they'll get fixed pretty quickly. Now, whether or not that will be in time for 9.2, or whether I'll be the one to write the code, I don't know. But in the meantime, there are really only two choices here: we can understand that the feature we have has some limitations, or we can revert it. 
Considering the amount of time that was put into it, particular by Tom, I can't imagine that we really want to revert it, but then again I'm relatively shocked that I'm being asked, one day before final CommitFest starts, to drop everything and work on a limitation that I thought had been well-understood by everyone for three years and that it's not clear we know how to lift. It may be that I didn't do a sufficiently good job communicating that this limitation was bound to exist, and I apologize for that. I did mention it on-list multiple times, as did Heikki, and I thought it was understood, but apparently not. If people who didn't object to committing the patch would have done so had they realized this, and don't feel that I made it clear enough, I am sincerely sorry. I would have made the same argument then that I am making now, but at least we would have hashed it out one way or the other before moving forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch n...@leadboat.com wrote: Simon spoke to the FPI side of the question. For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. When, someday, they do, we might emit a separate WAL record to force the recovery conflict. However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. (Again only with hot_standby_feedback=off.) Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not? Other than that, it seems like we might be converging on a workable solution: if hot_standby_feedback=off, disable index-only scans for snapshots taken during recovery; if hot_standby_feedback=on, generate recovery conflicts when a snapshot's xmin precedes the youngest xmin on a page marked all-visible, but allow the use of index-only scans, and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my head, I don't see a hole in that logic... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Fri, Jan 13, 2012 at 12:18 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Jan 13, 2012 at 5:08 PM, Robert Haas robertmh...@gmail.com wrote: Also, what happens if an all-visible bit gets set on the standby through some other mechanism - e.g. restored from an FPI or XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the visibility map page itself, but we certainly do it for the heap pages. So it might be that this infrastructure would (somewhat bizarrely) trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm hoping Heikki or Tom will comment on this thread, because I think there are a bunch of subtle issues here and that we could easily screw it up by trying to plow through the problem too hastily. An FPI can't change the all visible flag. If it did, it would imply that we'd skipped generation or replay of an XLOG_HEAP2_VISIBLE record, or that we're doing crash recovery/inital startup and HS is not yet enabled. Interesting. I hadn't considered that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disabled features on Hot Standby
On Sat, Jan 14, 2012 at 12:42:02AM -0500, Robert Haas wrote: On Fri, Jan 13, 2012 at 8:02 PM, Noah Misch n...@leadboat.com wrote: Simon spoke to the FPI side of the question. ?For heap pages, the XLOG_HEAP_NEWPAGE consumers are CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE. ?For the last, we will have already logged any PD_ALL_VISIBLE bits through normal channels. ?CLUSTER and VACUUM FULL never set PD_ALL_VISIBLE or visibility map bits. ?When, someday, they do, we might emit a separate WAL record to force the recovery conflict. ?However, CLUSTER/VACUUM FULL already remove tuples still-visible to standby snapshots without provoking a recovery conflict. ?(Again only with hot_standby_feedback=off.) Is the big about CLUSTER/VACUUM FULL a preexisting bug? If not, why not? I suspect it goes back to 9.0, yes. I'm on the fence regarding whether to call it a bug or an unimplemented feature. In any case, +1 for improving it. Other than that, it seems like we might be converging on a workable solution: if hot_standby_feedback=off, disable index-only scans for snapshots taken during recovery; if hot_standby_feedback=on, generate recovery conflicts when a snapshot's xmin precedes the youngest xmin on a page marked all-visible, but allow the use of index-only scans, and allow sequential scans to trust PD_ALL_VISIBLE. Off the top of my head, I don't see a hole in that logic... I wouldn't check hot_standby_feedback. Rather, mirror what we do for XLOG_HEAP2_CLEAN. Unconditionally add an xid to xl_heap_visible bearing the youngest xmin on the page (alternately, some convenient upper bound thereof). Have heap_xlog_visible() call ResolveRecoveryConflictWithSnapshot() on that xid. Now, unconditionally trust PD_ALL_VISIBLE and permit index-only scans. The user's settings of hot_standby_feedback, vacuum_defer_cleanup_age, max_standby_streaming_delay and max_standby_archive_delay will drive the consequential trade-off: nothing, query cancellation, or recovery delay. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers