Re: [HACKERS] Hot standby, recovery infra
Simon Riggs wrote:
> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> I would suggest that at end of recovery we write the last LSN to the
>>> control file, so if we crash recover then we will always end archive
>>> recovery at the same place again should we re-enter it. So we would
>>> have a recovery_target_lsn that overrides recovery_target_xid etc..
>> Hmm, we don't actually want to end recovery at the same point again:
>> if there's some updates right after the database came up, but before
>> the first checkpoint and crash, those actions need to be replayed too.
> I think we do need to. Crash recovery is supposed to return us to the
> same state. Where we ended ArchiveRecovery is part of that state. It
> isn't for crash recovery to decide that it saw a few more transactions
> and decided to apply them anyway. The user may have specifically ended
> recovery to avoid those additional changes from taking place.

Those additional changes were made in the standby, after ending recovery. So the sequence of events is:

1. Standby performs archive recovery.
2. End of archive (or stop point) reached. Open for connections, read-write. Request an online checkpoint. Online checkpoint begins.
3. A user connects and does INSERT INTO foo VALUES (123). Commits; commit returns.
4. pg_ctl stop -m immediate. The checkpoint started in step 2 hasn't finished yet.
5. Restart the server.

The server will now re-enter archive recovery. But this time, we want to replay the INSERT too.

(let's do the startup checkpoint for now, as discussed elsewhere in this thread)

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Auto-updated fields
Christopher Browne wrote:
> On Wed, Feb 4, 2009 at 1:23 PM, Bruce Momjian <br...@momjian.us> wrote:
>> Robert Treat wrote:
>>> On Wednesday 21 January 2009 20:21:41 Bruce Momjian wrote:
>>>> CREATE FUNCTION last_updated() RETURNS trigger AS $$
>>>> BEGIN
>>>>     NEW.last_update = CURRENT_TIMESTAMP;
>>>>     RETURN NEW;
>>>> END $$ LANGUAGE plpgsql;
>>> It requires you name your column last_update, which is what the
>>> naming convention is in pagila, but might not work for everyone.
>>> Can someone work with that and move forward? Or maybe give a more
>>> specific pointer to the generic trigger stuff (I've not looked at
>>> it before)
>> Well, I thought it was a good idea, but no one seems to want to do
>> the work.
> I'd like to see more options than that, which, it seems to me,
> establishes a need for more design work. Another perspective on
> temporality is to have a transaction column which points (via foreign
> key) to a transaction table, where you would use
> currval('transaction_sequence') as the value instead of
> CURRENT_TIMESTAMP.

I use the following:

CREATE OR REPLACE FUNCTION lastupdate() RETURNS TRIGGER AS $$
BEGIN
  IF OLD.lastupdate = NEW.lastupdate THEN
    NEW.lastupdate := CURRENT_TIMESTAMP;
  ELSIF OLD.lastupdate IS NULL OR NEW.lastupdate IS NULL THEN
    RAISE EXCEPTION 'Concurrent modification of table %', TG_ARGV[0];
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE PLPGSQL;

Which allows detection of concurrent updates on the same page (if the lastupdate value is being fetched before the update-template is filled).

-- 
Sincerely, Stephen R. van den Berg.
Auto repair rates: basic labor $40/hour; if you wait, $60; if you watch, $80; if you ask questions, $100; if you help, $120; if you laugh, $140.
Re: [HACKERS] Hot standby, recovery infra
Simon Riggs wrote:
> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>> I've changed the way minRecoveryPoint is updated now anyway, so it no
>> longer happens every XLogFileRead().
> Care to elucidate?

I got rid of minSafeStartPoint, advancing minRecoveryPoint instead. And it's advanced in XLogFlush instead of XLogFileRead. I'll post an updated patch soon.

> Why do you think XLogFlush is called less frequently than XLogFileRead?

It's not, but we only need to update the control file when we're flushing an LSN that's greater than the current minRecoveryPoint. And when we do update minRecoveryPoint, we can update it to the LSN of the last record we've read from the archive.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 10:07 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>>> Simon Riggs wrote:
>>>> I would suggest that at end of recovery we write the last LSN to the
>>>> control file, so if we crash recover then we will always end archive
>>>> recovery at the same place again should we re-enter it. So we would
>>>> have a recovery_target_lsn that overrides recovery_target_xid etc..
>>> Hmm, we don't actually want to end recovery at the same point again:
>>> if there's some updates right after the database came up, but before
>>> the first checkpoint and crash, those actions need to be replayed too.
>> I think we do need to. Crash recovery is supposed to return us to the
>> same state. Where we ended ArchiveRecovery is part of that state. It
>> isn't for crash recovery to decide that it saw a few more transactions
>> and decided to apply them anyway. The user may have specifically ended
>> recovery to avoid those additional changes from taking place.
> Those additional changes were made in the standby, after ending
> recovery. So the sequence of events is:
> 1. Standby performs archive recovery
> 2. End of archive (or stop point) reached. Open for connections,
>    read-write. Request an online checkpoint. Online checkpoint begins.
> 3. A user connects, and does INSERT INTO foo VALUES (123). Commits,
>    commit returns.
> 4. pg_ctl stop -m immediate. The checkpoint started in step 2 hasn't
>    finished yet
> 5. Restart the server. The server will now re-enter archive recovery.
>    But this time, we want to replay the INSERT too.

I agree with this, so let me restate both comments together.

When the server starts it begins a new timeline. When recovering we must switch to that timeline at the same point we did previously when we ended archive recovery. We currently don't record when that is, and we need to.

Yes, we must also replay the records in the new timeline once we have switched to it, as you say. But we must not replay any further in the older timeline(s).
-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 10:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>>> I've changed the way minRecoveryPoint is updated now anyway, so it
>>> no longer happens every XLogFileRead().
>> Care to elucidate?
> I got rid of minSafeStartPoint, advancing minRecoveryPoint instead.
> And it's advanced in XLogFlush instead of XLogFileRead. I'll post an
> updated patch soon.
>> Why do you think XLogFlush is called less frequently than
>> XLogFileRead?
> It's not, but we only need to update the control file when we're
> flushing an LSN that's greater than current minRecoveryPoint. And when
> we do update minRecoveryPoint, we can update it to the LSN of the last
> record we've read from the archive.

So we might end up flushing more often *and* we will be doing it potentially in the code path of other users.

This change seems speculative and also against what has previously been agreed with Tom. If he chooses not to comment on your changes, that's up to him, but I don't think you should remove things quietly that have been put there through the community process, as if they caused problems. I feel like I'm in the middle here.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 09:26 +0000, Simon Riggs wrote:
> This change seems speculative and also against what has previously
> been agreed with Tom. If he chooses not to comment on your changes,
> that's up to him, but I don't think you should remove things quietly
> that have been put there through the community process, as if they
> caused problems. I feel like I'm in the middle here.

This is only a problem because of two independent reviewers.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby, recovery infra
Simon Riggs wrote:
> On Thu, 2009-02-05 at 10:31 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-02-05 at 09:28 +0200, Heikki Linnakangas wrote:
>>>> I got rid of minSafeStartPoint, advancing minRecoveryPoint instead.
>>>> And it's advanced in XLogFlush instead of XLogFileRead. I'll post
>>>> an updated patch soon.
>>> Why do you think XLogFlush is called less frequently than
>>> XLogFileRead?
>> It's not, but we only need to update the control file when we're
>> flushing an LSN that's greater than current minRecoveryPoint. And
>> when we do update minRecoveryPoint, we can update it to the LSN of
>> the last record we've read from the archive.
> So we might end up flushing more often *and* we will be doing it
> potentially in the code path of other users.

For example, imagine a database that fits completely in shared buffers. If we update at every XLogFileRead, we have to fsync every 16MB of WAL. If we update in XLogFlush the way I described, you only need to update when we flush a page from the buffer cache, which will only happen at restartpoints. That's far fewer updates.

Expanding that example to a database that doesn't fit in cache, you're still replacing the pages from the buffer cache that have been untouched for longest. Such pages will have an old LSN, too, so we shouldn't need to update very often. I'm sure you can come up with an example of where we end up fsyncing more often, but it doesn't seem like the common case to me.

> This change seems speculative and also against what has previously
> been agreed with Tom. If he chooses not to comment on your changes,
> that's up to him, but I don't think you should remove things quietly
> that have been put there through the community process, as if they
> caused problems. I feel like I'm in the middle here.

I'd like to have the extra protection that this approach gives. If we let safeStartPoint be ahead of the actual WAL we've replayed, we have to just assume we're fine if we reach end of WAL before reaching that point.

That assumption falls down if, e.g., recovery is stopped and you go and remove the last few WAL segments from the archive before restarting it, or signal pg_standby to trigger failover too early. Tracking the real safe starting point and enforcing it always protects you from that.

(we did discuss this a week ago: http://archives.postgresql.org/message-id/4981f6e0.2040...@enterprisedb.com)

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> So we might end up flushing more often *and* we will be doing it
>> potentially in the code path of other users.
> For example, imagine a database that fits completely in shared
> buffers. If we update at every XLogFileRead, we have to fsync every
> 16MB of WAL. If we update in XLogFlush the way I described, you only
> need to update when we flush a page from the buffer cache, which will
> only happen at restartpoints. That's far less updates.

Oh, did you change the bgwriter so it doesn't do normal page cleaning?

General thoughts: the latest HS patch has a CPU profile within 1-2% of current, and the use of ProcArrayLock is fairly minimal now. The additional CPU is recoveryStopsHere(), which enables the manual control of recovery, so the trade-off seems worth it. The major CPU hog remains RecordIsValid, which is the CRC checks.

Startup is still I/O bound. Specific avoidable I/O hogs are (1) checkpoints and (2) page cleaning, so I hope we can avoid both of those. I also hope to minimise the I/O cost of keeping track of the consistency point. If that was done as infrequently as each restartpoint then I would certainly be very happy, but that won't happen in the proposed scheme if we do page cleaning.

> Expanding that example to a database that doesn't fit in cache, you're
> still replacing pages from the buffer cache that have been untouched
> for longest. Such pages will have an old LSN, too, so we shouldn't
> need to update very often.

They will tend to be written in ascending LSN order, which will mean we continually update the control file. Anything out of order does skip a write. The better the cache is at finding LRU blocks, the more writes we will make.

> I'm sure you can come up with an example of where we end up fsyncing
> more often, but it doesn't seem like the common case to me.

I'm not trying to come up with counterexamples...

>> This change seems speculative and also against what has previously
>> been agreed with Tom. If he chooses not to comment on your changes,
>> that's up to him, but I don't think you should remove things quietly
>> that have been put there through the community process, as if they
>> caused problems. I feel like I'm in the middle here.
> I'd like to have the extra protection that this approach gives. If we
> let safeStartPoint to be ahead of the actual WAL we've replayed, we
> have to just assume we're fine if we reach end of WAL before reaching
> that point. That assumption falls down if e.g. recovery is stopped,
> and you go and remove the last few WAL segments from the archive
> before restarting it, or signal pg_standby to trigger failover too
> early. Tracking the real safe starting point and enforcing it always
> protects you from that.

Doing it this way will require you to remove the existing specific error messages about ending before the end time of the backup, replacing them with more general ones that say consistency was not reached, which makes it harder to figure out what to do about it.

> (we did discuss this a week ago:
> http://archives.postgresql.org/message-id/4981f6e0.2040...@enterprisedb.com)

Yes, we need to update it in that case. Though that is in no way agreement to the other changes, nor does it require them.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Hot standby, recovery infra
Simon Riggs wrote:
> On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> So we might end up flushing more often *and* we will be doing it
>>> potentially in the code path of other users.
>> For example, imagine a database that fits completely in shared
>> buffers. If we update at every XLogFileRead, we have to fsync every
>> 16MB of WAL. If we update in XLogFlush the way I described, you only
>> need to update when we flush a page from the buffer cache, which will
>> only happen at restartpoints. That's far less updates.
> Oh, did you change the bgwriter so it doesn't do normal page cleaning?

No. Ok, that wasn't completely accurate. The page cleaning by bgwriter will perform XLogFlushes, but that should be pretty insignificant. When there's little page replacement going on, bgwriter will do a small trickle of page cleaning, which won't matter much. If there's more page replacement going on, bgwriter is cleaning up pages that will soon be replaced, so it's just offsetting work from other backends (or the startup process in this case).

>> Expanding that example to a database that doesn't fit in cache,
>> you're still replacing pages from the buffer cache that have been
>> untouched for longest. Such pages will have an old LSN, too, so we
>> shouldn't need to update very often.
> They will tend to be written in ascending LSN order which will mean we
> continually update the control file. Anything out of order does skip a
> write. The better the cache is at finding LRU blocks out the more
> writes we will make.

When minRecoveryPoint is updated, it's not updated to just the LSN that's being flushed; it's updated to the recptr of the most recently read WAL record. That's an important point to avoid that behavior. Just like XLogFlush normally always flushes all of the outstanding WAL, not just up to the requested LSN.

>> I'd like to have the extra protection that this approach gives. If we
>> let safeStartPoint to be ahead of the actual WAL we've replayed, we
>> have to just assume we're fine if we reach end of WAL before reaching
>> that point. That assumption falls down if e.g. recovery is stopped,
>> and you go and remove the last few WAL segments from the archive
>> before restarting it, or signal pg_standby to trigger failover too
>> early. Tracking the real safe starting point and enforcing it always
>> protects you from that.
> Doing it this way will require you to remove existing specific error
> messages about ending before end time of backup, to be replaced by
> more general ones that say consistency not reached which is harder to
> figure out what to do about it.

Yeah. If that's an important distinction, we could still save the original backup stop location somewhere, just so that we can give the old error message when we've not passed that location. But perhaps a message like "WAL ends before reaching a consistent state", with a hint "Make sure you archive all the WAL created during backup", or something like that would suffice.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 13:18 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
>>> Simon Riggs wrote:
>>>> So we might end up flushing more often *and* we will be doing it
>>>> potentially in the code path of other users.
>>> For example, imagine a database that fits completely in shared
>>> buffers. If we update at every XLogFileRead, we have to fsync every
>>> 16MB of WAL. If we update in XLogFlush the way I described, you only
>>> need to update when we flush a page from the buffer cache, which
>>> will only happen at restartpoints. That's far less updates.
>> Oh, did you change the bgwriter so it doesn't do normal page cleaning?
> No. Ok, that wasn't completely accurate. The page cleaning by bgwriter
> will perform XLogFlushes, but that should be pretty insignificant.
> When there's little page replacement going on, bgwriter will do a
> small trickle of page cleaning, which won't matter much.

Yes, that case is good, but it wasn't the use case we're trying to speed up by having the bgwriter active during recovery. We're worried about I/O bound recoveries.

> If there's more page replacement going on, bgwriter is cleaning up
> pages that will soon be replaced, so it's just offsetting work from
> other backends (or the startup process in this case).

Which only needs to be done if we follow this route, so is not an argument in favour. Using more I/O in I/O bound cases makes the worst case even worse.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Synch Replication
Hi Niranjan,

On Thu, Feb 5, 2009 at 3:34 PM, K, Niranjan (NSN - IN/Bangalore) <niranja...@nsn.com> wrote:
>> I added the further logging codes into the patch, and uploaded the
>> source. If you have time, please try it and report the results.
> Refer attached logs.

Thanks, but I have not identified the cause yet. Sorry.

I changed the code to dump all messages for replication, so please try it and report the result. The messages which the standby server receives are logged in the file (*1). Please also send that file.

http://senduit.com/d48e0a

(*1) installation_directory/sbydata/walreceiver_trace.out

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Hot standby, recovery infra
Simon Riggs wrote:
> On Thu, 2009-02-05 at 13:18 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
>>>> Simon Riggs wrote:
>>>>> So we might end up flushing more often *and* we will be doing it
>>>>> potentially in the code path of other users.
>>>> For example, imagine a database that fits completely in shared
>>>> buffers. If we update at every XLogFileRead, we have to fsync every
>>>> 16MB of WAL. If we update in XLogFlush the way I described, you
>>>> only need to update when we flush a page from the buffer cache,
>>>> which will only happen at restartpoints. That's far less updates.
>>> Oh, did you change the bgwriter so it doesn't do normal page cleaning?
>> No. Ok, that wasn't completely accurate. The page cleaning by
>> bgwriter will perform XLogFlushes, but that should be pretty
>> insignificant. When there's little page replacement going on,
>> bgwriter will do a small trickle of page cleaning, which won't matter
>> much.
> Yes, that case is good, but it wasn't the use case we're trying to
> speed up by having the bgwriter active during recovery. We're worried
> about I/O bound recoveries.

Ok, let's do the math.

By updating minRecoveryPoint in XLogFileRead, you're fsyncing the control file once every 16MB of WAL. By updating in XLogFlush, the frequency depends on the amount of shared_buffers available to buffer the modified pages, the average WAL record size, and the cache hit ratio.

Let's determine the worst case. The smallest WAL record that dirties a page is a heap deletion record; it contains just enough information to locate the tuple. If I'm reading the headers right, that record is 48 bytes long (28 bytes of xlog header + 18 bytes of payload + padding). Assuming that the WAL is full of just those records, that there are no full page images, and that the cache hit ratio is 0%, we would need (16 MB / 48 B) * 8 kB = 2730 MB of shared_buffers to get down to one fsync per 16 MB of WAL.

So if you have a shared_buffers setting lower than 2.7 GB, you can have more frequent fsyncs this way in the worst case. In the typical case, you're probably not doing all deletes, and you have a non-zero cache hit ratio, so you achieve the same frequency with a much lower shared_buffers setting. And if you're really that I/O bound, I doubt the few extra fsyncs matter much.

Also note that when the control file is updated in XLogFlush, it's typically the bgwriter doing it as it cleans buffers ahead of the clock hand, not the startup process.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Auto-updated fields
Christopher Browne wrote:
> I think this is a TODO, but not sure who is working on it or what
> needs to be done. The generic version in pagila is perhaps not generic
> enough:
>
> CREATE FUNCTION last_updated() RETURNS trigger AS $$
> BEGIN
>     NEW.last_update = CURRENT_TIMESTAMP;
>     RETURN NEW;
> END $$ LANGUAGE plpgsql;
>
> It requires you name your column last_update, which is what the naming
> convention is in pagila, but might not work for everyone. Can someone
> work with that and move forward? Or maybe give a more specific pointer
> to the generic trigger stuff (I've not looked at it before)
> Well, I thought it was a good idea, but no one seems to want to do the
> work.
> I'd like to see more options than that, which, it seems to me,
> establishes a need for more design work.

At the very least it should not have a hard-coded field name in it. You should pass the field name to be set as a parameter in the trigger setup. That's probably a lot more doable if the trigger is written in C, and in any case I think any prepackaged triggers we provide should be written in C.

cheers

andrew
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 14:18 +0200, Heikki Linnakangas wrote:
> when the control file is updated in XLogFlush, it's typically the
> bgwriter doing it as it cleans buffers ahead of the clock hand, not
> the startup process

That is the key point. Let's do it your way.

-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Auto-updated fields
> At the very least it should not have a hard-coded field name in it.
> You should pass the field name to be set as a parameter in the trigger
> setup. That's probably a lot more doable if the trigger is written in
> C, and in any case I think any prepackaged triggers we provide should
> be written in C.

+1. Although I'm not sure there's much point in providing a prepackaged trigger that does something you could accomplish just as well with a single line of PL/pgsql, even if we do rewrite it in C.

...Robert
[HACKERS] Re: [BUGS] BUG #4516: FOUND variable does not work after RETURN QUERY
Pavel Stehule wrote:
> I am sending patch, that adds FOUND and GET DIAGNOSTICS support for
> RETURN QUERY statement

Updated patch attached and applied. Thanks.

---------------------------------------------------------------------------

> Regards
> Pavel Stehule
>
> 2008/11/10 Andrew Gierth <and...@tao11.riddles.org.uk>:
>> >>>>> "Pavel" == Pavel Stehule <pavel.steh...@gmail.com> writes:
>>
>> > Well, changing the semantics of an already-released statement
>> > carries a risk of breaking existing apps that aren't expecting it
>> > to change FOUND. So I'd want to see a pretty strong case why this
>> > is important --- not just that it didn't meet someone's
>> > didn't-read-the-manual expectation.
>>
>> Pavel> It's should do some problems, but I belive much less than
>> Pavel> change of casting or tsearch2 integration. And actually it's
>> Pavel> not ortogonal. Every not dynamic statement change FOUND
>> Pavel> variable.
>>
>> Regardless of what you think of FOUND, a more serious problem is this:
>>
>> postgres=# create function test(n integer) returns setof integer
>>            language plpgsql as $f$
>>            declare rc bigint;
>>            begin
>>              return query (select i from generate_series(1,n) i);
>>              get diagnostics rc = row_count;
>>              raise notice 'rc = %', rc;
>>            end;
>>            $f$;
>> CREATE FUNCTION
>> postgres=# select test(3);
>> NOTICE:  rc = 0
>>  test
>> ------
>>     1
>>     2
>>     3
>> (3 rows)
>>
>> Since GET DIAGNOSTICS is documented as working for every SQL query
>> executed in the function, rather than for a specific list of
>> constructs, this is clearly a bug.
>>
>> --
>> Andrew (irc:RhodiumToad)
>
> [ Attachment, skipping... ]

-- 
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup.
Index: doc/src/sgml/plpgsql.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/plpgsql.sgml,v
retrieving revision 1.137
diff -c -c -r1.137 plpgsql.sgml
*** doc/src/sgml/plpgsql.sgml	4 Feb 2009 21:30:41 -0000	1.137
--- doc/src/sgml/plpgsql.sgml	5 Feb 2009 15:15:03 -0000
***************
*** 1356,1361 ****
--- 1356,1369 ----
       execution of other statements within the loop body.
      </para>
     </listitem>
+    <listitem>
+     <para>
+      A <command>RETURN QUERY</command> and <command>RETURN QUERY
+      EXECUTE</command> statements set <literal>FOUND</literal>
+      true if the query returns at least one row, false if no row
+      is returned.
+     </para>
+    </listitem>
    </itemizedlist>

    <literal>FOUND</literal> is a local variable within each

Index: src/pl/plpgsql/src/pl_exec.c
===================================================================
RCS file: /cvsroot/pgsql/src/pl/plpgsql/src/pl_exec.c,v
retrieving revision 1.231
diff -c -c -r1.231 pl_exec.c
*** src/pl/plpgsql/src/pl_exec.c	21 Jan 2009 11:13:14 -0000	1.231
--- src/pl/plpgsql/src/pl_exec.c	5 Feb 2009 15:15:03 -0000
***************
*** 2286,2291 ****
--- 2286,2292 ----
  					 PLpgSQL_stmt_return_query *stmt)
  {
  	Portal		portal;
+ 	uint32		processed = 0;

  	if (!estate->retisset)
  		ereport(ERROR,
***************
*** 2327,2332 ****
--- 2328,2334 ----
  			HeapTuple	tuple = SPI_tuptable->vals[i];

  			tuplestore_puttuple(estate->tuple_store, tuple);
+ 			processed++;
  		}

  		MemoryContextSwitchTo(old_cxt);
***************
*** 2336,2341 ****
--- 2338,2346 ----
  	SPI_freetuptable(SPI_tuptable);
  	SPI_cursor_close(portal);

+ 	estate->eval_processed = processed;
+ 	exec_set_found(estate, processed != 0);
+
  	return PLPGSQL_RC_OK;
  }

Index: src/test/regress/expected/plpgsql.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/plpgsql.out,v
retrieving revision 1.66
diff -c -c -r1.66 plpgsql.out
*** src/test/regress/expected/plpgsql.out	18 Jul 2008 03:32:53 -0000	1.66
--- src/test/regress/expected/plpgsql.out	5 Feb 2009 15:15:03 -0000
***************
*** 3666,3668 ****
--- 3666,3700 ----
  (2 rows)

  drop function tftest(int);
+ create or replace function rttest()
+ returns setof int as $$
+ declare rc int;
+ begin
+   return query values(10),(20);
+   get diagnostics rc = row_count;
+   raise notice '% %', found, rc;
+   return query select * from (values(10),(20)) f(a) where false;
+   get diagnostics rc = row_count;
+   raise notice '% %', found, rc;
+   return query execute 'values(10),(20)';
+   get diagnostics rc = row_count;
+   raise notice '% %', found, rc;
+   return query execute 'select * from
[HACKERS] Fixing Grittner's planner issues
I looked into the planning problems in HEAD discussed here:
http://archives.postgresql.org/pgsql-hackers/2009-02/msg00120.php

As I suspected, there are really two independent bugs there, though both are in the new semi/antijoin planning code.

The first problem is the one already mentioned: cost_mergejoin fails to penalize semi/anti joins for nonunique merge keys, although nonunique keys require rescanning of parts of the inner relation and can result in huge runtime increases. After unsuccessfully fooling with some quick and dirty fixes, I remembered why I'd tried to dodge the problem: the correct way to estimate the amount of rescanning done involves estimating the merge condition selectivity as if it were a plain inner join condition, not a semi or anti join. We have code to do that, certainly, but this means that we will sometimes demand the selectivity of the same RestrictInfo under JOIN_INNER rules and sometimes under SEMI or ANTI rules. And there's only space to cache one selectivity there. And we *really* need to cache both, because cost_mergejoin tends to get done a large number of times in a complex join problem. The only good solution I can see is to add another selectivity cache field to RestrictInfo so that we can cache both the JOIN_INNER selectivity and the native selectivity of the qual's context. This is slightly annoying from a space consumption point of view, but it's not fatal.

The second problem is specific to the particular structure of Kevin's query:

SELECT ...
  FROM Case C
  LEFT OUTER JOIN CaseDispo CD
    ON (CD.caseNo = C.caseNo)
   AND (CD.countyNo = C.countyNo)
   AND (NOT (EXISTS (SELECT 1 FROM CaseDispo CD2
                     WHERE (CD2.caseNo = CD.caseNo)
                       AND (CD2.countyNo = CD.countyNo)
                       AND (CD2.dispoDate > CD.dispoDate))))
 WHERE some-clause-that-selects-just-a-few-C-rows

that is, the EXISTS clause is part of the ON condition of an outer join.
If it referred to any variables of the left side of the upper join (ie, C here) then we couldn't convert it to a separate join at all. Since it refers only to CD, it's semantically legitimate to push it down into the right-hand side of the outer join, and then we can convert it to a lower outer join. The problem is that having done so, the resulting anti join doesn't commute with the upper outer join. So we're forced to evaluate it first, and in this particular example this results in a much worse plan than leaving it as a SubLink would produce. A SubLink isn't evaluated after the upper join, exactly, but from a performance point of view we get that effect because it only gets evaluated after all the other join conditions have succeeded for a particular pair of C and CD rows. In this case we only need the answer for a few C/CD rows so that's a win. (If we needed the answer for most CD rows, it's not a win; but I don't think we can optimize that case at the expense of getting miserably slower for cases we used to be good at.) I think the only fix for this that's practical in the 8.4 time frame is to give up trying to flatten EXISTS/NOT EXISTS that occur within the ON condition of an outer join, ie, revert to 8.3's level of intelligence about this case. It seems like a general purpose solution would involve somehow being able to separate the semantic effects of an outer join (in particular, null-insertion) from the time at which it's performed, and I've really got no idea how to do that or what consequences it would have for both the planner and the executor. Reflecting on this further, I suspect there are also some bugs in the planner's rules about when semi/antijoins can commute with other joins; but that's not directly causing Kevin's problem, because the rules do make the correct decision for this case. So I've got a few must fix items here, but before I go off to start doing that, I wondered if anyone had any comments or better ideas. 
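For readers following along, the evaluation-order point above can be sketched outside the planner. The toy Python below (invented sample data, not PostgreSQL internals) implements the flattened form: the NOT EXISTS becomes an anti join of CD against CD2 that runs first, producing only each case's latest disposition, before the outer join to C ever happens.

```python
# Toy sketch of the flattened plan shape. has_later() plays the role of
# the NOT EXISTS probe; the anti join filters CD up front.

case = [{"caseNo": 1, "countyNo": 10}, {"caseNo": 2, "countyNo": 10}]
dispo = [
    {"caseNo": 1, "countyNo": 10, "dispoDate": "2009-01-01"},
    {"caseNo": 1, "countyNo": 10, "dispoDate": "2009-02-01"},  # latest for case 1
    {"caseNo": 2, "countyNo": 10, "dispoDate": "2009-01-15"},
]

def has_later(cd, rel):
    """The NOT EXISTS condition: is there a later dispo for the same case?"""
    return any(r["caseNo"] == cd["caseNo"] and r["countyNo"] == cd["countyNo"]
               and r["dispoDate"] > cd["dispoDate"] for r in rel)

# Flattened form: anti join CD against CD2 first...
latest = [cd for cd in dispo if not has_later(cd, dispo)]

# ...then LEFT OUTER JOIN C to the surviving rows.
result = []
for c in case:
    matches = [cd for cd in latest
               if cd["caseNo"] == c["caseNo"] and cd["countyNo"] == c["countyNo"]]
    if matches:
        result.extend((c, cd) for cd in matches)
    else:
        result.append((c, None))  # null-extended row

print([(c["caseNo"], cd and cd["dispoDate"]) for c, cd in result])
# -> [(1, '2009-02-01'), (2, '2009-01-15')]
```

The SubLink form, by contrast, would run `has_later()` only for C/CD pairs that already passed the other join conditions, which is why it wins when the WHERE clause selects just a few C rows.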
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing Grittner's planner issues
Tom Lane t...@sss.pgh.pa.us wrote: SELECT ... FROM Case C LEFT OUTER JOIN CaseDispo CD ON (CD.caseNo = C.caseNo) AND (CD.countyNo = C.countyNo) AND (NOT (EXISTS (SELECT 1 FROM CaseDispo CD2 WHERE (CD2.caseNo = CD.caseNo) AND (CD2.countyNo = CD.countyNo) AND (CD2.dispoDate > CD.dispoDate)))) WHERE some-clause-that-selects-just-a-few-C-rows that is, the EXISTS clause is part of the ON condition of an outer join. If it referred to any variables of the left side of the upper join (ie, C here) then we couldn't convert it to a separate join at all. I wondered if anyone had any comments The only thing that comes to mind for me that seems possibly helpful is that we have typically considered it obvious that in the context of the NOT EXISTS clause we have already established that (CD.caseNo = C.caseNo) AND (CD.countyNo = C.countyNo) and have not been at all consistent about whether we used C or CD to compare to CD2. Our operating assumption has been that it didn't matter in that context. -Kevin
Re: [HACKERS] Fixing Grittner's planner issues
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: If it referred to any variables of the left side of the upper join (ie, C here) then we couldn't convert it to a separate join at all. The only thing that comes to mind for me that seems possibly helpful is that we have typically considered it obvious that in the context of the NOT EXISTS clause we have already established that (CD.caseNo = C.caseNo) AND (CD.countyNo = C.countyNo) and have not been at all consistent about whether we used C or CD to compare to CD2. Our operating assumption has been that it didn't matter in that context. Yeah, I had thought about suggesting that you could use that to determine which type of plan got chosen, but immediately dismissed it as too bletcherous for words. In any case the same type of problem could occur in scenarios where the upper and lower join conditions aren't so neatly related. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot standby, recovery infra
On Thu, 2009-02-05 at 21:54 +0200, Heikki Linnakangas wrote: - If bgwriter is performing a restartpoint when recovery ends, the startup checkpoint will be queued up behind the restartpoint. And since it uses the same smoothing logic as checkpoints, it can take quite some time for that to finish. The original patch had some code to hurry up the restartpoint by signaling the bgwriter if LWLockConditionalAcquire(CheckPointLock) fails, but there's a race condition with that if a restartpoint starts right after that check. We could let the bgwriter do the checkpoint too, and wait for it, but bgwriter might not be running yet, and we'd have to allow bgwriter to write WAL while disallowing it for all other processes, which seems quite complex. Seems like we need something like the LWLockConditionalAcquire approach, but built into CreateCheckPoint to eliminate the race condition. Seems straightforward? Hold the lock longer. - If you perform a fast shutdown while startup process is waiting for the restore command, startup process sometimes throws a FATAL error which escalates into an immediate shutdown. That leads to different messages in the logs, and skipping of the shutdown restartpoint that we now otherwise perform. Sometimes? - It's not clear to me if the rest of the xlog flushing related functions, XLogBackgroundFlush, XLogNeedsFlush and XLogAsyncCommitFlush, need to work during recovery, and what they should do. XLogNeedsFlush should always return false when InRecoveryProcessingMode() is true. The WAL is already in the WAL files, not in wal_buffers anymore. XLogAsyncCommitFlush should contain Assert(!InRecoveryProcessingMode()) since it is called during a VACUUM FULL only. XLogBackgroundFlush should never be called during recovery because the WALWriter is never active in recovery. That should just be documented. 
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
[HACKERS] new GUC var: autovacuum_process_all_tables
Hi, Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. Opinions? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote: Hi, Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. Opinions? So you are inverting the option? What I mean is you are giving the option of either: A. Process everything unless false B. Process nothing unless true If I am understanding what you wrote correctly I am not sure I like the idea as a whole. I think we should just always have it on and not have it be optional. The rule of thumb should be, we autovacuum everything, unless there is an extremely good reason not to, and I think you should have to explicitly turn off autovacuum for a relation. Sincerely, Joshua D. Drake -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote: Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. I would prefer it if that behaviour was enabled by putting a special entry into pg_autovacuum, e.g. ALL TABLES, autovacuum_enabled=false I don't really want more GUCs for every nuance of AV behaviour. If you do this we'd want it to be selectable by database and schema as well. Perhaps by inserting the oid of the relevant database or schema? e.g. if we want to turn off AV for database X, which has oid x then we insert into pg_autovacuum(x, false, ) or to make the default no-autovacuum for all tables in all databases insert into pg_autovacuum(0, false, ) It would be useful if all but the first two columns were nullable also, to avoid having to specify -1. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Alvaro Herrera alvhe...@commandprompt.com writes: Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. When would that be? I can follow the use-case for vacuuming a selected set of tables via cron-driven commands or whatever, and then excluding those tables from autovacuum's purview. But there isn't a command to vacuum all tables except these. Without such a command available to the cron-job, a switch such as you suggest is merely a foot-gun, because it's dead certain some tables are going to get left out of both manual and autovacuum processing. And before anyone suggests it, I don't want to invent vacuum all tables except these. It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. (BTW, autovac does vacuum tables to prevent wraparound even if you try to tell it to skip them, right?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, Feb 5, 2009 at 3:45 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. Opinions? Sounds horribly confusing. It's not very difficult to write a script to set this value for every table in the database, if that's what you want to do. Having autovacuum_enabled potentially mean two different things depending on the value of some GUC sounds like a recipe for confusion (and no I don't like it any better if we put the global switch somewhere other than a GUC). ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Simon Riggs wrote: On Thu, 2009-02-05 at 17:45 -0300, Alvaro Herrera wrote: For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. I would prefer it if that behaviour was enabled by putting a special entry into pg_autovacuum, So you're not aware that we're doing away with pg_autovacuum for good? It's going to be replaced by reloptions, i.e. ALTER TABLE foo SET (autovacuum_enabled = false); Obviously there's no way to add a catchall setting. e.g. ALL TABLES, autovacuum_enabled=false I don't really want more GUCs for every nuance of AV behaviour. In any case I fail to see how this is much different from a new GUC var. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote: It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. We need that as well, not instead of. People want to be able to specify (as an example) * autovac these problem tables anytime required * for all other tables disable AV, except on sundays [non-busy-times] If we're going to build in scheduling featured for AV, I'd like to make those scheduling features available to user defined tasks also. No need to limit ourselves to just AV. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote: So you're not aware that we're doing away with pg_autovacuum for good? It's going to be replaced by reloptions, i.e. ALTER TABLE foo SET (autovacuum_enabled = false); Obviously there's no way to add a catchall setting. Seems like a bad plan then. How do you reconcile those conflicting requirements? e.g. ALL TABLES, autovacuum_enabled=false I don't really want more GUCs for every nuance of AV behaviour. In any case I fail to see how is this much different from a new GUC var. Rows in a table v. new parameters. We can allow endless table driven complexity. Adding my_little_nuance=on|off strains most people's patience. How would I specify that database A wants AV turned off, but database B wants it on? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Alvaro Herrera wrote: Hi, Right now, when autovacuum is turned on we always assume it's supposed to process all tables except those that have autovacuum_enabled=false. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. Opinions? What about 'autovacuum_mode'? Values could be 'all' and 'reloption'. If we don't want to add another GUC, I'll go changing the 'autovacuum' GUC. Values would be: 'on' means enable autovacuum and process all tables, 'off' means disable autovacuum and 'reloption' (?) means only process those tables that have reloption autovacuum_enabled=true. The con is that we couldn't implement a per-{schema,database} switch for autovacuum. So I prefer the first one. -- Euler Taveira de Oliveira http://www.timbira.com/
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Tom Lane wrote: (BTW, autovac does vacuum tables to prevent wraparound even if you try to tell it to skip them, right?) Yes. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Simon Riggs wrote: On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote: So you're not aware that we're doing away with pg_autovacuum for good? It's going to be replaced by reloptions, i.e. ALTER TABLE foo SET (autovacuum_enabled = false); Obviously there's no way to add a catchall setting. Seems like a bad plan then. How do you reconcile those conflicting requirements? I don't see them as conflicting; I see yours as a missing feature, namely the ability to add tables to an autovacuum group, which could have settings attached. Being able to do that is the whole point of moving settings to reloptions. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Simon Riggs si...@2ndquadrant.com writes: On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote: It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. We need that as well, not instead of. I disagree; adding every frammish anyone could ever think of is not an overall improvement to the system. My feeling is that we should be trying to eliminate use-cases for cron-driven vacuuming, not trying to make sure that cron-driven scripts can do anything autovacuum can. The main remaining use-case seems to me to make vacuuming work adhere to some business-determined schedule, hence maintenance windows seem like the next thing to do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 18:54 -0300, Alvaro Herrera wrote: Simon Riggs wrote: On Thu, 2009-02-05 at 18:25 -0300, Alvaro Herrera wrote: So you're not aware that we're doing away with pg_autovacuum for good? It's going to be replaced by reloptions, i.e. ALTER TABLE foo SET (autovacuum_enabled = false); Obviously there's no way to add a catchall setting. Seems like a bad plan then. How do you reconcile those conflicting requirements? I don't see them as conflicting; I see yours as a missing feature, namely the ability to add tables to an autovacuum group, which could have settings attached. Being able to do that is the whole point of moving settings to reloptions. So your changes will allow these? ALTER DATABASE foo SET (autovacuum_enabled = false); ALTER SCHEMA foo SET (autovacuum_enabled = false); CREATE TABLE GROUP foo_group; ALTER TABLE foo SET TABLE GROUP foo_group; ALTER TABLE foo2 SET TABLE GROUP foo_group; ALTER TABLE GROUP SET (autovacuum_enabled = false); Hopefully the grouping of tables is not purely related to AV? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 17:08 -0500, Tom Lane wrote: I disagree; adding every frammish anyone could ever think of is not an overall improvement to the system. :) My feeling is that we should be trying to eliminate use-cases for cron-driven vacuuming, not trying to make sure that cron-driven scripts can do anything autovacuum can. Agreed. IMO, the user should only have to think about vacuum in an abstract sense. With the exception being those few tables that need the customized configuration (thus reloptions). The main remaining use-case seems to me to make vacuuming work adhere to some business-determined schedule, hence maintenance windows seem like the next thing to do. Also agreed. Sincerely, Joshua D. Drake regards, tom lane -- PostgreSQL - XMPP: jdr...@jabber.postgresql.org Consulting, Development, Support, Training 503-667-4564 - http://www.commandprompt.com/ The PostgreSQL Company, serving since 1997 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote: It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. We need that as well, not instead of. I disagree; adding every frammish anyone could ever think of is not an overall improvement to the system. Agreed, let's get this capability out in 8.4 and we can always adjust it based on user demand. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 17:57 -0500, Bruce Momjian wrote: Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote: It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. We need that as well, not instead of. I disagree; adding every frammish anyone could ever think of is not an overall improvement to the system. Agreed, let's get this capability out in 8.4 and we can always adjust it based on user demand. Oh, I agree to limiting what we do for 8.4, but we need more later. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 17:08 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2009-02-05 at 16:29 -0500, Tom Lane wrote: It'd make more sense to put the effort into developing better scheduling control over autovacuum, such as a concept of maintenance windows. We need that as well, not instead of. I disagree; adding every frammish anyone could ever think of is not an overall improvement to the system. I like your word frammish and am watchful of such things myself. My feeling is that we should be trying to eliminate use-cases for cron-driven vacuuming, Agreed. not trying to make sure that cron-driven scripts can do anything autovacuum can. I'm not in favour of limiting our capability to internal actions only. If we add a capability for scheduling work, we can easily make it capable of scheduling many kinds of work. Writing an application maintenance utility in PL/pgSQL is much better than having to write it for all the different servers an application may need to run on. We can't ignore that many people use Windows. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, Feb 5, 2009 at 11:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Writing an application maintenance utility in PL/pgSQL is much better than having to write it for all the different servers an application may need to run on. Welcome to the suction effect. If your scheduler is in the database then you're stuck with the interfaces the database provides. When you use those interfaces you're going to be stuck with whatever tools work with them. Imagine trying to compose MIME email in plpgsql or do dns lookups or interface with your C application code. Plpgsql is singularly unsuited for anything other than database work. Yes we have other languages but there are still relatively few and having them running within a PL interface makes integrating with the rest of their systems more awkward. And more dangerous -- consider what a simple memory management bug can do if it's in a database backend instead of a network client. We can't ignore that many people use Windows. I think that logic is backwards. People choose their development and server environment based on what works best for them. It makes no sense to engineer the system around the assumption that they don't like developing using the best native tools. Our reimplementation of the OS is always going to be second-rate by comparison and it's doing nothing for them but imposing the disadvantages of the restrictions being stuck in a database backend brings. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Agreed, let's get this capability out in 8.4 and we can always adust it based on user demand. Oh, I agree to limiting what we do for 8,4, but we need more later. Thinking about this a little more, the biggest problem I have with this feature is that it makes autovacuum_enabled mean two different things depending on context. But maybe we should change the name of the reloption to autovacuum and have three values for it: default|enabled|disabled. Then we could add a GUC called autovacuum_by_default = on|off, and we could later insert per-schema or per-database or per-table-group defaults without introducing any backwards-incompatibility (default would still mean default, though the default might come from a different source). ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
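Robert's three-valued scheme could resolve roughly like this. The sketch below is hypothetical Python, not a committed PostgreSQL API; the function and parameter names are invented to mirror his proposed reloption and GUC.

```python
# Hypothetical resolution of Robert's proposal: a per-table reloption
# with three values ('default' | 'enabled' | 'disabled') plus a global
# GUC, so 'default' can later resolve through per-schema or
# per-database layers without a backwards-incompatible change.

def autovacuum_enabled(reloption, autovacuum_by_default):
    """reloption: the proposed three-valued 'autovacuum' table setting;
    autovacuum_by_default: the proposed boolean GUC."""
    if reloption == "enabled":
        return True
    if reloption == "disabled":
        return False
    return autovacuum_by_default  # 'default' falls through to the GUC

# With autovacuum_by_default = off, only explicitly enabled tables are
# processed -- the behaviour Alvaro asked for, without overloading the
# meaning of autovacuum_enabled.
assert autovacuum_enabled("enabled", False) is True
assert autovacuum_enabled("default", False) is False
assert autovacuum_enabled("default", True) is True
```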
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Simon Riggs wrote: not trying to make sure that cron-driven scripts can do anything autovacuum can. I'm not in favour of limiting our capability to internal actions only. If we add a capability for scheduling work, we can easily make it capable of scheduling many kinds of work. Writing an application maintenance utility in PL/pgSQL is much better than having to write it for all the different servers an application may need to run on. We can't ignore that many people use Windows. I'm not sure what you're saying here. Windows has a scheduler (in my setup, that's how my buildfarm members run). And there are third party cron utilities as well. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
Alvaro, First off, with over 200 GUC variables currently active, in general we should be looking to *eliminate* variables rather than adding them. So my personal bar for endorsing a new GUC is set pretty high. Now, sometimes it might make more sense to keep it enabled but have it only check for certain tables, and leave the majority of them disabled. For this we'd have a separate GUC parameter, as in $SUBJECT (I'm not wedded to the name), and have the user set autovacuum_enabled=true via reloptions to enable it. I can't imagine, nor have I encountered in the 3 years of consulting I did since Autovacuum became available, such a use case. Unless there's a real, critical use case for this which is common, I'm opposed to this GUC. On the other hand, I'd be keen on a runtime suset autovacuum=on/off which we could call from a cron job or the pgadmin scheduler in order to have maintenance windows. Unless that's already becoming possible? --Josh
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, Feb 5, 2009 at 7:13 PM, Robert Haas robertmh...@gmail.com wrote: Thinking about this a little more, the biggest problem I have with this feature is that it makes autovacuum_enabled mean two different things depending on context. But maybe we should change the name of the reloption to autovacuum and have three values for it: default|enabled|disabled. Then we could add a GUC called autovacuum_by_default = on|off, and we could later insert per-schema or per-database or per-table-group defaults without introducing any backwards-incompatibility (default would still mean default, though the default might come from a different source). In fact (he said to himself), we could take this a step further and call both the reloption and GUC autovacuum_policy. Then we could have two policies for this release (always and never) plus allow default for the reloption. Then future releases could allow users to define additional policies, like off-hours. Just thinking out loud here folks... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum and reloptions
Alvaro Herrera wrote: So here's what looks like a committable patch. Note to self: remember to remove src/include/catalog/pg_autovacuum.h and to bump catversion. Works for me. Just a few comments. (i) I don't like this construction by entries by changing storage parameters. I prefer by changing storage parameters or by entries in pg_class.reloptions; (ii) I think we should change the expression storage parameters for something else because autovacuum is related to maintenance. My suggestion is a general expression like relation parameters; (iii) I noticed that GUC defaults and relopt defaults are different (autovacuum_cost_delay and autovacuum_cost_limit). Is there any reason for not using -1? (iv) Maybe we should document that pg_dump will only dump reloptions like toast.foo iff the relation has an associated TOAST table. This seems obvious but ... -- Euler Taveira de Oliveira http://www.timbira.com/
Re: [HACKERS] Synch Replication
Hi Niranjan, On Thu, Feb 5, 2009 at 10:50 PM, K, Niranjan (NSN - IN/Bangalore) niranja...@nsn.com wrote: I changed the code to dump all messages for replication, so please try it and report the result. The messages which the standby server receives are logged in the file (*1). Please send also that file. (*1) installation_directory/sbydata/walreceiver_trace.out Refer attachments. Thanks for interesting results! According to the logs, walreceiver receives 'c' message (which is invalid for walreceiver) though walsender doesn't send it. So, next, I should diagnose more carefully the messages between those processes. Please test synch-rep again and report the following information. * server log * walreceiver_trace.out * netstat -nap (before and after replication crashes) * tcpdump -i lo -n tcp Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.4 release planning
Hi, On Tue, Feb 3, 2009 at 5:08 AM, Bruce Momjian br...@momjian.us wrote: o Others We will focus on all the other items on the commit fest page, and that will determine our time-line for 8.4 beta, i.e. the first three items will not delay our beta release. Simon is assigned as reviewer of the PITR performance improvement patch, but I think that he is too busy with HS to check the code. Should more reviewers be assigned to it? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.4 release planning
I understand Simon is extremely busy with his own patch. I'd appreciate it if anyone could help with the review. 2009/2/6 Fujii Masao masao.fu...@gmail.com: Hi, On Tue, Feb 3, 2009 at 5:08 AM, Bruce Momjian br...@momjian.us wrote: o Others We will focus on all the other items on the commit fest page, and that will determine our time-line for 8.4 beta, i.e. the first three items will not delay our beta release. Simon is assigned as reviewer of the PITR performance improvement patch, but I think that he is too busy with HS to check the code. Should more reviewers be assigned to it? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing Grittner's planner issues
I think the only fix for this that's practical in the 8.4 time frame is to give up trying to flatten EXISTS/NOT EXISTS that occur within the ON condition of an outer join, ie, revert to 8.3's level of intelligence about this case. It seems like a general purpose solution would involve somehow being able to separate the semantic effects of an outer join (in particular, null-insertion) from the time at which it's performed, and I've really got no idea how to do that or what consequences it would have for both the planner and the executor. I think that A LEFT JOIN (B ANTI JOIN C) is equivalent to (A LEFT JOIN B) LEFT JOIN (UNIQUE C) if you rewrite each attribute provided by b as CASE WHEN (no matching tuple found in C) THEN b.attribute END. On a related note, I have some vague unease about planning A SEMI JOIN B as A INNER JOIN (UNIQUE B), as make_one_rel currently attempts to do. For a merge join or nested loop, I don't see how this can ever be a win over teaching the executor to just not rescan B. For a hash join, it can be a win if B turns out to have duplicates, but then again you could also just teach the executor to skip the insertion of the duplicate into the table in the first place (it has to hash 'em anyway...). I think maybe I'm not understanding something about the logic here. It seems like you might want some infrastructure for cases where we know we are just probing the inner side of the relation for a match. If you find at least one, you either make a decision as to whether or not to filter the tuple (regular semi/anti-join) or whether or not to force certain fields to NULL (semi/anti-join under ON-clause). But all these cases can share the we-don't-care-about-multiple-inner-matches logic. Reflecting on this further, I suspect there are also some bugs in the planner's rules about when semi/antijoins can commute with other joins; but that's not directly causing Kevin's problem, because the rules do make the correct decision for this case. 
One thing I notice is that src/backend/optimizer/README should probably be updated with the rules for commuting SEMI and ANTI joins; it currently only mentions INNER, LEFT, RIGHT, and FULL. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
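[Editorial note: the semi-join equivalence discussed above — planning A SEMI JOIN B as A INNER JOIN (UNIQUE B) — can be sketched in ordinary code. This is an illustrative Python sketch of the relational identity, not PostgreSQL executor code:]

```python
# A semi join emits each row of A at most once if it has at least one match
# in B. Deduplicating B on the join key first makes a plain inner join
# produce exactly the same rows, which is the equivalence the planner uses.

def semi_join(a, b, key):
    # Probe B for a match and stop at the first one: no rescan of B needed.
    b_keys = {key(row) for row in b}
    return [row for row in a if key(row) in b_keys]

def inner_join_unique(a, b, key):
    # Equivalent formulation: unique-ify B on the join key, then inner join.
    seen = set()
    unique_b = [row for row in b if key(row) not in seen and not seen.add(key(row))]
    return [ra for ra in a for rb in unique_b if key(ra) == key(rb)]

a = [(1, 'x'), (2, 'y'), (3, 'z')]
b = [(1, 'p'), (1, 'q'), (3, 'r')]   # note the duplicate join key 1
k = lambda row: row[0]
assert semi_join(a, b, k) == inner_join_unique(a, b, k)  # [(1, 'x'), (3, 'z')]
```

The point Robert raises is visible here: once B is deduplicated (or the probe stops at the first match), multiple inner matches can never multiply rows of A, so the two plans are interchangeable.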
[HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1522)
The following patches are the updated ones:
[1/5] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1522.patch
[2/5] http://sepgsql.googlecode.com/files/sepostgresql-utils-8.4devel-3-r1522.patch
[3/5] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1522.patch
[4/5] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1522.patch
[5/5] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1522.patch
- List of updates:
* The PGACE facilities are removed.
* The row-level access control facilities are separated out.
* The security attribute management facilities are separated out.
  - The pg_security system catalog, the idea of a security identifier and the security_label system column are included.
  - The AVC now accepts text-form security contexts.
  - pg_class, pg_attribute, pg_database and pg_proc got a new field to store the text-form security context.
* A few security hooks are integrated into pg_xxx_aclcheck():
  - sepgsqlCheckProcedureExecute() from pg_proc_aclmask()
  - sepgsqlCheckDatabaseAccess() from pg_database_aclmask()
* Access controls on large objects are separated out.
* The baseline security policy module is omitted, so the 3rd patch provides only the developer's policy.
* Descriptions of PGACE and row-level access controls are separated out.
* Testcases are reworked.
* Anyway, most of the patches are reworked!
- Scale of patches: It may seem that the updated version is not smaller than the previous one, but more than half of the affected lines come from changes in the system catalogs.
* The previous full-functional version (r1467):
$ diffstat sepostgresql-sepgsql-8.4devel-3-r1467.patch : 110 files changed, 9813 insertions(+), 16 deletions(-), 924 modifications(!)
* Current version (r1522):
$ diffstat sepostgresql-sepgsql-8.4devel-3-r1522.patch :
src/include/catalog/pg_attribute.h |  500 !!!
src/include/catalog/pg_class.h     |   12
src/include/catalog/pg_database.h  |    6
src/include/catalog/pg_proc.h      | 4207 !!
: 65 files changed, 4737 insertions(+), 11 deletions(-), 4908 modifications(!) Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Thu, 2009-02-05 at 19:23 -0500, Andrew Dunstan wrote: Simon Riggs wrote: not trying to make sure that cron-driven scripts can do anything autovacuum can. I'm not in favour of limiting our capability to internal actions only. If we add a capability for scheduling work, we can easily make it capable of scheduling many kinds of work. Writing an application maintenance utility in PL/pgSQL is much better than having to write it for all the different servers an application may need to run on. We can't ignore that many people use Windows. I'm not sure what you're saying here. Windows has a scheduler (in my setup, that's how my buildfarm members run). And there are third party cron utilities as well. All I'm saying is *if* we put scheduling inside Postgres for autovacuum *then* we should make it general-purpose scheduling. If anybody uses the argument "we have external schedulers, so don't put them in the database", then that argument applies equally to scheduling autovacuum. It's easy to turn autovacuum on/off via an external scheduler, yet look upthread and see how many people think it should be in the database. Whichever way you think the decision should go, the same arguments apply to scheduling autovacuum and scheduling other database maintenance tasks. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new GUC var: autovacuum_process_all_tables
On Fri, 2009-02-06 at 00:07 +, Greg Stark wrote: On Thu, Feb 5, 2009 at 11:57 PM, Simon Riggs si...@2ndquadrant.com wrote: Writing an application maintenance utility in PL/pgSQL is much better than having to write it for all the different servers an application may need to run on. Welcome to the suction effect. If your scheduler is in the database then you're stuck with the interfaces the database provides. When you use those interfaces you're going to be stuck with whatever tools work with them. Imagine trying to compose MIME email in plpgsql or do dns lookups or interface with your C application code. Plpgsql is singularly unsuited for anything other than database work. Yes we have other languages but there are still relatively few and having them running within a PL interface makes integrating with the rest of their systems more awkward. And more dangerous -- consider what a simple memory management bug can do if it's in a database backend instead of a network client. You're saying that because it would be wrong for some things, we're not going to allow it at all. Re-read what you've written and you'll see that same argument would ban all PLs, on the basis that we have external language environments. I don't suppose you believe that. There are many in-database-only actions that people want to schedule, not just autovacuum. End of month data deletion, partition setup, pre-joined or summary table creation, FY rollover calculations etc etc. None of those have anything to do with the world outside the DB. If writing database functions is possible and sometimes desirable in Postgres, then scheduling them is also sometimes desirable also. I would hope that we can give our users the choice. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
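[Editorial note: the "in-database-only actions" Simon lists are trivial to express as PL/pgSQL functions; the open question in this thread is only what schedules them. A minimal sketch, with a hypothetical table and column:]

```plpgsql
-- Hypothetical example: an end-of-month purge written as a database function.
-- The table name (audit_log) and column (logged_at) are illustrative only.
CREATE OR REPLACE FUNCTION purge_old_rows() RETURNS void AS $$
BEGIN
    DELETE FROM audit_log
     WHERE logged_at < now() - interval '12 months';
END;
$$ LANGUAGE plpgsql;
```

Today such a function is invoked by an external scheduler (cron, Windows Task Scheduler) via psql; the proposal under discussion is whether the database itself should be able to schedule the call.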