Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 11 June 2013 04:36, Sergey Konoplev gray...@gmail.com wrote: Hi, On Thu, May 9, 2013 at 7:28 PM, Sergey Konoplev gray...@gmail.com wrote: On Tue, Apr 2, 2013 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote: The attached patch fixes this, although I don't like the way knowledge of the point up to which StartupSUBTRANS zeroes pages is handled. One month has passed since the patched version was installed in our production environment and I can confirm that everything works perfectly. Thank you very much for your prompt help, Andres. Are there any plans to commit this patch, and which version is it going to be committed to? Thank you. I'll be committing this soon, since we're likely coming up to the next point release soon. Thanks for the reminder. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [HACKERS] [BUGS] COPY .... (FORMAT binary) syntax doesn't work
On 27 May 2013 15:31, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 26 May 2013 17:10, Tom Lane t...@sss.pgh.pa.us wrote: More readable would be to invent an intermediate nonterminal falling between ColId and ColLabel, whose expansion would be IDENT | unreserved_keyword | col_name_keyword | type_func_name_keyword, and then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. Any thoughts about a name for that new nonterminal? Do you think complicating the parser in that way is worth the trouble for this case? Could that slow down parsing? It makes the grammar tables a bit larger (1% or so IIRC). There would be some distributed penalty for that, but probably not much. Of course there's always the slippery-slope argument about that. We don't actually have to fix it; clearly not too many people are bothered, since no complaints in 3 years. Documenting 'binary' seems better. Well, my thought is there are other cases. For instance: regression=# create role binary; ERROR: syntax error at or near "binary" LINE 1: create role binary; ^ regression=# create user cross; ERROR: syntax error at or near "cross" LINE 1: create user cross; ^ If we don't have to treat type_func_name_keywords as reserved in these situations, shouldn't we avoid doing so? Seems a reasonable argument, so +1. Sorry for the delayed reply. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
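For readers outside the grammar internals, the distinction being discussed can be sketched in a few lines. This is an illustrative model only, with tiny made-up keyword samples rather than the real lists from kwlist.h: the proposed nonterminal would accept everything ColId does plus the type_func_name_keywords such as BINARY and CROSS, while fully reserved words stay rejected.

```python
# Illustrative sketch (not PostgreSQL source): how a nonterminal between
# ColId and ColLabel would widen what can appear as a name.
# The keyword sets below are hypothetical samples, not the real lists.

unreserved_keywords = {"abort", "format"}
col_name_keywords = {"between", "coalesce"}
type_func_name_keywords = {"binary", "cross"}   # reserved for ColId today
reserved_keywords = {"select", "table"}         # fully reserved

def accepted_as_colid(word):
    # ColId today: plain identifiers, unreserved and col_name keywords
    return word not in type_func_name_keywords and word not in reserved_keywords

def accepted_as_nonreservedword(word):
    # Proposed nonterminal: additionally admits type_func_name keywords
    return word not in reserved_keywords

print(accepted_as_colid("binary"))            # False -> today's syntax error
print(accepted_as_nonreservedword("binary"))  # True  -> CREATE ROLE binary would parse
print(accepted_as_nonreservedword("select"))  # False -> still reserved
```

The point of the thread is exactly this asymmetry: "binary" and "cross" fail in places where nothing forces them to be reserved.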
Re: [BUGS] BUG #8192: On very large tables the concurrent update with vacuum lag the hot_standby replica
On 30 May 2013 17:43, feder...@brandwatch.com wrote: It seems that on very large tables, a concurrent update with vacuum (or autovacuum), when the slave is in hot standby mode, generates long read loops on a single WAL segment during the recovery process. Not all aspects of hot standby are optimally tuned, as yet. Some room for improvement exists. In some cases, conscious choices were made to keep processing on the master fast at the expense of speed on the standby. There doesn't seem to be anything which is in itself an error here. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work
On 26 May 2013 16:35, Heikki Linnakangas hlinnakan...@vmware.com wrote: My attempts to fix that look pretty ugly, so I'm not even going to post them. I can stop the error on binary by causing errors on csv and text, obviously not a fix. Any grammar-based fix looks like it would restrict the list of formats, which breaks the original intention of the syntax change. This seems to work: This was almost exactly the fix I described above that only fixes that specific case and then breaks others.

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
 			{ $$ = makeDefElem($1, $2); }
+			| ColLabel BINARY
+			{
+				$$ = makeDefElem($1, (Node *) makeString("binary"));
+			}

So, no, that doesn't work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [BUGS] COPY .... (FORMAT binary) syntax doesn't work
On 26 May 2013 17:10, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: More readable would be to invent an intermediate nonterminal falling between ColId and ColLabel, whose expansion would be IDENT | unreserved_keyword | col_name_keyword | type_func_name_keyword, and then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. Any thoughts about a name for that new nonterminal? Do you think complicating the parser in that way is worth the trouble for this case? Could that slow down parsing? We don't actually have to fix it; clearly not too many people are bothered, since no complaints in 3 years. Documenting 'binary' seems better. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[BUGS] COPY .... (FORMAT binary) syntax doesn't work
This works fine... COPY pgbench_accounts TO '/tmp/acc' BINARY; This new format does not COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY); ERROR: syntax error at or near "BINARY" at character 47 which looks like I've mistyped something. Until you realise that this statement gives a completely different error message. COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname); ERROR: COPY format "anyname" not recognized and we also note that there are no examples in the docs, nor regression tests to cover this situation. So I conclude that this hasn't ever worked since it was introduced in 9.0. The cause is that there is an overlap between the old and the new COPY syntax, relating to the word BINARY. It's the grammar generating the error, not post-parse analysis. My attempts to fix that look pretty ugly, so I'm not even going to post them. I can stop the error on binary by causing errors on csv and text, obviously not a fix. Any grammar-based fix looks like it would restrict the list of formats, which breaks the original intention of the syntax change. I'm advised that COPY pgbench_accounts FROM '/tmp/acc' (FORMAT 'BINARY'); works fine. Though that is not documented and I doubt anyone much uses that. My conclusion is that we should *not* fix this bug, but just alter the manual slightly to explain what the correct usage is (use quotes around 'binary'). Reason for that suggestion is that nobody ever reported this bug, so either few people use binary mode or they use the old syntax. Of course, that is not a normal suggestion, so feel free to walk up and down my spine with boots on for suggesting it. Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages
On 30 March 2013 17:21, Andres Freund and...@2ndquadrant.com wrote: So if the xid is later than latestObservedXid we extend subtrans one by one. So far so good. But we initialize it in ProcArrayApplyRecoveryInfo() when consistency is initially reached: latestObservedXid = running->nextXid; TransactionIdRetreat(latestObservedXid); Before that, subtrans has initially been started up with: if (wasShutdown) oldestActiveXID = PrescanPreparedTransactions(xids, nxids); else oldestActiveXID = checkPoint.oldestActiveXid; ... StartupSUBTRANS(oldestActiveXID); That means it's only initialized up to checkPoint.oldestActiveXid. As it can take some time till we reach consistency, it seems rather plausible that there will now be a gap in initialized pages, from checkPoint.oldestActiveXid to running->nextXid, if there are pages in between. That was an old bug. StartupSUBTRANS() now explicitly fills that gap. Are you saying it does that incorrectly? How? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
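A toy model of the gap Andres describes may help. Everything here is invented for illustration (the page size, the xid values, and the simplified "initialize one page at each end" behaviour); it only shows why pages between the checkpoint's oldestActiveXid and the running-xacts record's nextXid could be left unzeroed:

```python
# Toy model (not real slru.c/subtrans.c code) of the initialization gap:
# startup zeroes the page containing oldestActiveXid, while later extension
# begins from latestObservedXid (set from running->nextXid), leaving any
# pages between the two untouched.

XIDS_PER_PAGE = 32  # hypothetical page size for the illustration

def page_of(xid):
    return xid // XIDS_PER_PAGE

oldest_active_xid = 100   # from the checkpoint record
running_next_xid = 1000   # from the running-xacts record, replayed much later

initialized = {page_of(oldest_active_xid)}   # StartupSUBTRANS start point
extended = {page_of(running_next_xid)}       # one-by-one extension begins here

needed = set(range(page_of(oldest_active_xid), page_of(running_next_xid) + 1))
gap = needed - initialized - extended
print(sorted(gap))  # non-empty: pages in between were never zeroed
```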
Re: [BUGS] postgres 9.2.2 point conversion from polygon doesn't always give accurate center
On 1 February 2013 17:54, Colin Dunklau colin.dunk...@gmail.com wrote: For the below two queries, I expect to get a result of (0.5, 0.5). cdunklau=# select point( polygon '((0,0),(0,1),(1,1),(0,1))'); point ------------- (0.25,0.75) (1 row) I think you simply mistyped the coordinates... sriggs=# select point( polygon '((0,0),(0,1),(1,1),(1,0))'); point ----------- (0.5,0.5) (1 row) Your last point is a duplicate of the 2nd point, so you have a 4-pointed triangle and hence a strange centre. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
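The surprising output is consistent with point(polygon) effectively averaging the vertices (a sketch of the behaviour, not a claim about the exact geo_ops.c implementation), so the duplicated vertex drags the "centre" toward itself:

```python
# Sketch: the reported (0.25,0.75) is exactly the vertex average of the
# mistyped polygon, and the fixed polygon averages to the expected (0.5,0.5).
def vertex_average(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

broken = [(0, 0), (0, 1), (1, 1), (0, 1)]   # (0,1) appears twice; (1,0) is missing
fixed = [(0, 0), (0, 1), (1, 1), (1, 0)]

print(vertex_average(broken))  # (0.25, 0.75) -- matches the surprising output
print(vertex_average(fixed))   # (0.5, 0.5)   -- the expected centre
```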
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 15 January 2013 05:12, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: We added a REPLICATION privilege onto user accounts to control access. Perhaps we should add a CASCADE privilege as well, so that we can control whether we can connect to a master and/or a standby. Syntax would be ALTER USER foo [MASTER | CASCADE] REPLICATION REPLICATION allows both master and cascaded replication (same as now) MASTER REPLICATION allows master only CASCADE REPLICATION allows cascaded replication only NOREPLICATION allows neither option Is someone working on it already? If not, may I try to implement it? Please do. It looks fairly short. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 11 January 2013 08:40, Heikki Linnakangas hlinnakan...@vmware.com wrote: This makes me wonder if there should be a GUC to forbid cascading replication, though. If you don't want to do cascading replication (which is quite rare, I'd say), you could just disable it to avoid a situation like this. Connection from a standby is disabled by default. Don't enable it... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 11 January 2013 12:18, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: I'm not sure, but what about adding a parameter (cascade_mode) to recovery.conf? The parameter represents a will to connect to a standby server when starting as a standby. If the parameter is set to on, connect to a server forcibly like PostgreSQL 9.2, and if the parameter is set to off, connecting to another standby server is refused like PostgreSQL 9.1. We added a REPLICATION privilege onto user accounts to control access. Perhaps we should add a CASCADE privilege as well, so that we can control whether we can connect to a master and/or a standby. Syntax would be ALTER USER foo [MASTER | CASCADE] REPLICATION REPLICATION allows both master and cascaded replication (same as now) MASTER REPLICATION allows master only CASCADE REPLICATION allows cascaded replication only NOREPLICATION allows neither option -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
On 8 January 2013 19:24, bri...@openroadtech.com wrote: Simply stated, pg_xlog grows out of control on a streaming-replication backup server with a high volume of writes on the master server. This occurs only with checkpoint_completion_target > 0 and very large (eg. 8GB) shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me). All of this appears to be working as designed. It will issue a restartpoint every checkpoint_timeout seconds on the standby. checkpoint_segments is ignored on the standby. Detail: I have two new/fast servers using streaming replication, with 9.2.2 from the PostgreSQL yum repository. The servers are connected via a 10G network, so the failover receives data as fast as it is created. It appears that to trigger this problem you need large shared_buffers (mine is set to 8GB) and checkpoint_completion_target > 0 (.8 or .9 will do). I have checkpoint_timeout=5min (though setting this to 30s *does not* alleviate the problem). The failover machine doesn't seem to be concerned with checkpoint_timeout or checkpoint_segments. When doing large writes (like pg_restore) the failover machine's pg_xlog does not clear files. The reason appears to be that it is not checkpointing (verified by log_checkpoints=on and examining the log). If I run 'checkpoint;' manually on the failover in psql (hot_standby=on), it immediately cleans up pg_xlog (after a pause to actually perform the checkpoint). To generate the data, you can just run pgbench -i -s1000 on the master server, then watch pg_xlog grow on the failover. This produces ~10GB of data, then creates indices. If I run du on the failover after the data is loaded, I see: # du -s base pg_xlog | sort -n 10928276 pg_xlog 13144536 base I have a 10GB pg_xlog. The load completes in less than 5 minutes, which may be relevant here, since eventually the failover will perform a checkpoint. On the primary server, pg_xlog caps at 1.2GB.
If checkpoint_completion_target=0, or shared_buffers=256MB, pg_xlog grows to around 1GB and stays there. This guess may be off target, but it appears that the database only checkpoints approximately every checkpoint_completion_target * checkpoint_timeout seconds. So, for a .9 completion target and the default 5min timeout, the failover will go for 4+ minutes without executing a single checkpoint, regardless of how much data arrives (verified with log_checkpoints=on). All the while the master server is spitting out 'checkpoints are occurring too frequently' messages, which I assume is to be expected during a reload. This means that the failover machine needs 2x the database size to not crash during a database reload. We run VMs with limited disk allocation where this is not the case. We periodically need to dump/restore, and I'm concerned this simple operation will crash our failover machines. I can avoid the problem by setting checkpoint_completion_target=0, which I will do for now. Here are settings that differ from the default initdb install: shared_buffers = 8GB work_mem = 64MB maintenance_work_mem = 256MB max_stack_depth = 8MB wal_level = hot_standby synchronous_commit = off wal_buffers = 64MB checkpoint_segments = 10 checkpoint_completion_target = 0 max_wal_senders = 2 hot_standby = on effective_cache_size = 8GB log_checkpoints = on -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
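A rough back-of-envelope model of why the standby's pg_xlog balloons here, using invented numbers and a deliberately simplified formula (real restartpoint scheduling is more subtle): since the standby recycles WAL only when a restartpoint completes, and restartpoints are paced by checkpoint_timeout stretched by the completion target rather than by checkpoint_segments, the WAL retained is roughly the write rate times that interval.

```python
# Back-of-envelope sketch (illustrative numbers, not measured; the interval
# formula is a simplification): WAL the standby must retain grows with the
# restartpoint interval, not with checkpoint_segments.
def standby_pg_xlog_estimate_mb(wal_mb_per_s, checkpoint_timeout_s, completion_target):
    # A restartpoint stretched by the completion target delays WAL recycling
    # by roughly this long, so roughly this much WAL accumulates in pg_xlog.
    interval_s = checkpoint_timeout_s * (1 + completion_target)
    return wal_mb_per_s * interval_s

# ~18 MB/s of WAL (a fast bulk load), 5 min timeout, completion target 0.9
print(standby_pg_xlog_estimate_mb(18, 300, 0.9) / 1024)  # ~10 (GB), same order as the report
```

With completion target 0 the same model gives roughly 5 GB retained, which is why dropping the target to 0 tames the growth in the report above.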
Re: [BUGS] BUG #7752: FATAL btree error on PITR
On 12 December 2012 01:55, m.sakre...@gmail.com wrote: The following bug has been logged on the website: Bug reference: 7752 Logged by: Maciek Sakrejda Email address: m.sakre...@gmail.com PostgreSQL version: 9.1.6 Operating system: Ubuntu 10.04 LTS 64-bit Description: Ran into the following error in trying to perform a PITR: FATAL: btree level 66135134 not found in index 436254 This happened when the PITR was almost complete (judging by on-disk database size). I guess this may be related to one of the index corruption issues fixed in 9.1.7, but if that's the case, would it perhaps make sense to complete the PITR without the corrupt index(es) and deal with the index issue separately? Clearly, just a warning is dangerously likely to be skipped, but maybe a mechanism like backup_label, that prevents normal startup before the issue is resolved? Yes, I've thought about allowing recovery to skip corrupt indexes, but that feature hasn't been implemented yet. We'd need to track such things as recovery proceeds and then mark them as invalid when recovery completes. Patches welcome. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 00:35, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: So apparently this is something we broke since Nov 18. Don't know what yet --- any thoughts? Further experimentation shows that reverting commit ffc3172e4e3caee0327a7e4126b5e7a3c8a1c8cf makes it work. So there's something wrong/incomplete about that fix. This is a bit urgent since we now have to consider whether to withdraw 9.2.2 and issue a hasty 9.2.3. Do we have a regression here since 9.2.1, and if so how bad is it? I'll look at this now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 02:27, Tom Lane t...@sss.pgh.pa.us wrote: Andres Freund and...@2ndquadrant.com writes: But the key is, the database was not actually consistent at that point, and so opening hot standby was a dangerous thing to do. The bug that allowed the database to open early (the original topic of this email chain) was masking this secondary issue. Could you check whether the attached patch fixes the behaviour? Yeah, I had just come to the same conclusion: the bug is not with Heikki's fix, but with the pause logic. The comment says that it shouldn't pause unless users can connect to un-pause, but the actual implementation of that test is several bricks shy of a load. OK, I've had a look at this now. Andres correctly identified the bug above, which is that the backup end is noted by the XLOG_BACKUP_END record. recoveryStopsHere() was not changed when that record type was added, so it ignores the situation where we are waiting for the end of backup and stops anyway. So I concur with Jeff that there is a bug, and think that Andres has provided the clue to a fix. I'll patch that now. Aboriginal bug extends back to 9.0. Pause has got nothing at all to do with this, even if there are other problems there. Your patch is better, but it's still missing a bet: what we need to be sure of is not merely that we *could* have told the postmaster to start hot standby, but that we *actually have done so*. Otherwise, it's flow-of-control-dependent whether it works or not; we could fail if the main loop hasn't gotten to CheckRecoveryConsistency since the conditions became true. Looking at the code in CheckRecoveryConsistency, testing LocalHotStandbyActive seems appropriate. I also thought it was pretty dangerous that this absolutely critical test was not placed in recoveryPausesHere() itself, rather than relying on the call sites to remember to do it. So the upshot is that I propose a patch more like the attached. I've reworked pause logic along the lines you suggest.
Attached here for further discussion. This is not a regression because the pause logic has been broken this same way since 9.1. So I no longer think that we need a rewrap. I think we do need a rewrap, since the bug is not in the pause logic. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services recovery_pause_refactor.v1.patch
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 13:34, Simon Riggs si...@2ndquadrant.com wrote: Aboriginal bug extends back to 9.0. I don't see any bug in 9.0 and 9.1, just 9.2+ -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 14:33, Andres Freund and...@2ndquadrant.com wrote: Independent of this patch, I am slightly confused about the whole stop logic. Isn't the idea that you can stop/start/stop/start/... recovery? Because if !recoveryApply we break out of the whole recovery loop and are done with things. You can, but by shutting down the server, updating the recovery target, then restarting the server. That's not a beautiful design; it's just waiting for someone to make recovery targets changeable, which is best done when recovery.conf parameters are in postgresql.conf -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 16:40, Tom Lane t...@sss.pgh.pa.us wrote: The real question here probably needs to be "what is the point of recoveryPauseAtTarget in the first place?". I find it hard to envision what's the point of pausing unless the user has an opportunity to make a decision about whether to continue applying WAL. As Simon mentioned, we seem to be lacking some infrastructure that would let the user adjust the recovery_target parameters before resuming WAL processing. But, assuming for the moment that our workaround for that is "shut down the server, adjust recovery.conf, and restart", is the pause placed in a useful spot for that? BTW, could we make this more convenient by letting recoveryPausesHere() reread recovery.conf? Also, shouldn't the code re-evaluate recoveryStopsHere() after that? The recovery target and the consistency point are in some ways in conflict. If the recovery target is before the consistency point there is no point in stopping there, whether or not we pause. What we should do is say "recovery target reached, yet recovery not yet consistent, continuing". So ISTM that we should make recoveryStopsHere() return false while we are inconsistent. Problems solved. We can re-read parameters after we wake from a pause. Presumably only in HEAD, or do you mean in 9.2/9.1 also? My earlier patch to rearrange pause logic did a few things, nothing very dramatic * Put the logic for whether we're consistent inside recoveryPausesHere() as requested * Rearrange logic so setting EndPtr is the very last thing we do before redo for the current record * Fix the bug that we don't pause in the correct place if recoveryApply = true The comment on that new pause location was slightly wrong -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
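Simon's proposal can be sketched in a few lines. The function and parameter names below are hypothetical, not the real xlog.c code: while recovery is not yet consistent, a recovery target cannot safely be honoured, so the stop check reports "keep going" and logs instead of stopping.

```python
# Sketch of the proposed behaviour (hypothetical names, not real xlog.c):
# a reached target is only honoured once recovery is consistent.
def recovery_stops_here(target_reached, reached_consistency, log=print):
    if target_reached and not reached_consistency:
        log("recovery target reached, yet recovery not yet consistent, continuing")
        return False
    return target_reached

print(recovery_stops_here(True, False))  # logs the message, then prints False
print(recovery_stops_here(True, True))   # True: safe to stop at the target
```

Note that later in the thread this design is questioned: Tom and Andres prefer erroring out over silently continuing past the user's target.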
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 17:17, Simon Riggs si...@2ndquadrant.com wrote: The recovery target and the consistency point are in some ways in conflict. If the recovery target is before the consistency point there is no point in stopping there, whether or not we pause. What we should do is say "recovery target reached, yet recovery not yet consistent, continuing". So ISTM that we should make recoveryStopsHere() return false while we are inconsistent. Problems solved. Patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services recovery_dont_stop_when_inconsistent.v1.patch
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 18:48, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: Andres Freund and...@2ndquadrant.com writes: On 2012-12-05 17:24:42 +, Simon Riggs wrote: So ISTM that we should make recoveryStopsHere() return false while we are inconsistent. Problems solved. I prefer the previous (fixed) behaviour where we error out if we reach a recovery target before we are consistent: I agree. Silently ignoring the user's specification is not good. (I'm not totally sure about ignoring the pause spec, either, but there is no good reason to change the established behavior for the recovery target spec.) On further thought, it seems like recovery_pause_at_target is rather misdesigned anyway, and taking recovery target parameters from recovery.conf is an obsolete API that was designed in a world before hot standby. What I suggest people really want, if they're trying to interactively determine how far to roll forward, is this: (1) A recovery.conf parameter that specifies pause when hot standby opens up (that is, as soon as we have consistency). Agreed. (2) A SQL command/function that releases the pause mode *and* specifies a new target stop point (ie, an interactive way of setting the recovery target parameters). The startup process then rolls forward to that target and pauses again. That was my original patch, IIRC, with different functions for each target type, plus an advance N records function for use in debugging/automated testing. Having a single function would be possible, taking a single TEXT parameter that we parse just like the recovery.conf, taking semi-colons as newlines. (3) A SQL command/function that releases the pause mode and specifies coming up normally, ie not following the archived WAL any further (I imagine this would force a timeline switch). The existing pause now function could still fit into this framework; but it seems to me to have mighty limited usefulness, considering the speed of WAL replay versus human reaction time. 
I think that was in there also. Can't remember why we didn't go for the full API last time. I'll have another go, in HEAD. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 21:15, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 5 December 2012 18:48, Tom Lane t...@sss.pgh.pa.us wrote: On further thought, it seems like recovery_pause_at_target is rather misdesigned anyway, and taking recovery target parameters from recovery.conf is an obsolete API that was designed in a world before hot standby. What I suggest people really want, if they're trying to interactively determine how far to roll forward, is this: ... Can't remember why we didn't go for the full API last time. I'll have another go, in HEAD. That's fine, but the immediate question is what are we doing to fix the back branches. I think everyone is clear that we should be testing LocalHotStandbyActive rather than precursor conditions to see if a pause is allowed, but are we going to do anything more than that? No The only other thing I really wanted to do is not have the in-loop pause occur after we've taken actions that are effectively part of the WAL apply step. I think ideally it should happen just before or after the recoveryStopsHere stanza. Last night I worried about adding an extra spinlock acquire to make that work, but today I wonder if we couldn't get away with just a naked if (xlogctl->recoveryPause) recoveryPausesHere(); The argument for this is that although we might fetch a slightly stale value of the shared variable, it can't be very stale --- certainly no older than the spinlock acquisition near the bottom of the previous iteration of the loop. And this is a highly asynchronous feature anyhow, so fuzziness of plus or minus one WAL record in when the pause gets honored is not going to be detectable. Hence an extra spinlock acquisition is not worth the cost. Yep, that's fine. Are you doing this or do you want me to? Don't mind either way.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] PITR potentially broken in 9.2
On 5 December 2012 22:23, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 5, 2012 at 4:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: The argument for this is that although we might fetch a slightly stale value of the shared variable, it can't be very stale --- certainly no older than the spinlock acquisition near the bottom of the previous iteration of the loop. And this is a highly asynchronous feature anyhow, so fuzziness of plus or minus one WAL record in when the pause gets honored is not going to be detectable. Hence an extra spinlock acquisition is not worth the cost. I wonder whether the cost of an extra spinlock acquire/release cycle is really noticeable here. That can get expensive in a hurry when lots of processes are contending for the spinlock ... but I think that shouldn't be the case here; most of the accesses will be coming from the startup process. Of course atomic operations are much more expensive than ordinary CPU instructions even under the best of circumstances, but is that really material here? I'm just wondering whether this is premature optimization that's going to potentially bite us later in the form of subtle, hard-to-reproduce bugs. I have the same doubt about whether it's really significant. However, I think it's much more likely that we'd find out it is significant than that we'd find out anybody could detect the lack of a memory barrier there. Agreed. And any future logic to stop at a specific point will be exactly precise because the decision and action will be taken in the same process. Patch looks good. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7710: Xid epoch is not updated properly during checkpoint
On 1 December 2012 23:10, Andres Freund and...@anarazel.de wrote: So barring objections, I'm going to remove LogStandbySnapshot's behavior of returning the updated nextXid. I don't see any reason why it would be bad to remove this. I think the current behaviour could actually even delay getting to an active state slightly in the presence of prepared transactions, because it's used to initialize the KnownAssignedXid machinery in xlog_redo. If the prepared xacts are suboverflowed it's a *good* thing to have an old ->nextXid. That particular coding was pretty weird. Please change. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7710: Xid epoch is not updated properly during checkpoint
On 1 December 2012 22:56, Tom Lane t...@sss.pgh.pa.us wrote: tar...@gmail.com writes: [ txid_current can show a bogus value near XID wraparound ] This happens only if wal_level=hot_standby. I believe what is happening here is (1) CreateCheckPoint sets up checkPoint.nextXid and checkPoint.nextXidEpoch, near xlog.c line 7070 in HEAD. At this point, nextXid is still a bit less than the wrap point. (2) After performing the checkpoint, at line 7113, CreateCheckPoint calls LogStandbySnapshot() which helpfully updates checkPoint.nextXid to the latest value. Which by now has wrapped around. But it doesn't fix checkPoint.nextXidEpoch, so the checkpoint that gets written out has effectively lost the epoch bump that should have happened. While we could add some more logic to try to correct the epoch value in this scenario, I think it's a much better idea to just stop having LogStandbySnapshot update the nextXid. That seems to me to be useless complication. I also quite dislike the fact that we're effectively redefining the checkpoint nextXid from being taken before the main body of the checkpoint to being taken afterwards, but *only* in XLogStandbyInfoActive mode. If that inconsistency isn't already causing bugs (besides this one) today, it'll probably cause them in the future. I agree that the coding looks weird and agree it shouldn't be there. The meaning of the checkpoint values should not differ because wal_level has changed. So barring objections, I'm going to remove LogStandbySnapshot's behavior of returning the updated nextXid. Removing it may cause other bugs, but if so, those other bugs need to be solved in the right way, not by having a too-far-forwards nextxid on the checkpoint record. Having said that, I can't see any bugs that would be caused by this. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
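To see why overwriting checkPoint.nextXid without touching nextXidEpoch loses the wraparound, consider a minimal model of the 32-bit XID counter plus its epoch. This Python sketch is illustrative only; the function names are invented, not PostgreSQL's:

```python
XID_MODULUS = 2 ** 32  # XIDs are 32-bit counters that wrap to low values


def bump_epoch_if_wrapped(epoch, old_next_xid, new_next_xid):
    """Advance the epoch when nextXid is replaced by a value that has
    wrapped past 2^32 (visible as an apparent decrease of the counter)."""
    if new_next_xid < old_next_xid:
        epoch += 1
    return epoch


def full_xid(epoch, xid):
    """Combine epoch and 32-bit XID into one monotonic 64-bit value."""
    return epoch * XID_MODULUS + xid
```

In the bug above, LogStandbySnapshot effectively stores the post-wraparound nextXid together with the pre-wraparound epoch, so the combined value goes backwards instead of forwards.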
Re: [BUGS] BUG #7710: Xid epoch is not updated properly during checkpoint
On 2 December 2012 12:51, Simon Riggs si...@2ndquadrant.com wrote: On 1 December 2012 22:56, Tom Lane t...@sss.pgh.pa.us wrote: tar...@gmail.com writes: [ txid_current can show a bogus value near XID wraparound ] This happens only if wal_level=hot_standby. I believe what is happening here is (1) CreateCheckPoint sets up checkPoint.nextXid and checkPoint.nextXidEpoch, near xlog.c line 7070 in HEAD. At this point, nextXid is still a bit less than the wrap point. (2) After performing the checkpoint, at line 7113, CreateCheckPoint calls LogStandbySnapshot() which helpfully updates checkPoint.nextXid to the latest value. Which by now has wrapped around. But it doesn't fix checkPoint.nextXidEpoch, so the checkpoint that gets written out has effectively lost the epoch bump that should have happened. While we could add some more logic to try to correct the epoch value in this scenario, I think it's a much better idea to just stop having LogStandbySnapshot update the nextXid. That seems to me to be useless complication. I also quite dislike the fact that we're effectively redefining the checkpoint nextXid from being taken before the main body of the checkpoint to being taken afterwards, but *only* in XLogStandbyInfoActive mode. If that inconsistency isn't already causing bugs (besides this one) today, it'll probably cause them in the future. I agree that the coding looks weird and agree it shouldn't be there. The meaning of the checkpoint values should not differ because wal_level has changed. So barring objections, I'm going to remove LogStandbySnapshot's behavior of returning the updated nextXid. Removing it may cause other bugs, but if so, those other bugs need to be solved in the right way, not by having a too-far-forwards nextxid on the checkpoint record. Having said that, I can't see any bugs that would be caused by this. Andres' patch is invasive and is not to my liking in back branches.
Tom's recommended change goes further still and is not one I'm comfortable risking, even though I agree with the root cause/change. I've applied an absolutely minimal fix on this, which introduces no other changes that could cause unforeseen consequences. Others may wish to go further, overriding my patches, as they choose. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7710: Xid epoch is not updated properly during checkpoint
On 2 December 2012 15:25, Tom Lane t...@sss.pgh.pa.us wrote: I can point right now to one misbehavior this causes: if you run a point-in-time recovery with a stop point somewhere in the middle of the checkpoint, you should end up with a nextXid corresponding to the stop point. This hack in LogStandbySnapshot causes you to end up with a much later nextXid, if you were running hot-standby. True, though that does not cause any problem. Others may wish to go further, overriding my patches, as they choose. Okay, I will take the responsibility for changing this, but it needs to change. OK. At least we have the minimal coding to fall back on if need be. This coding was ill-considered from the word go. Agreed, but then I don't have a clear reason why it is that way and yet I'm sure I did it for some reason. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7710: Xid epoch is not updated properly during checkpoint
On 2 December 2012 16:44, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 2 December 2012 15:25, Tom Lane t...@sss.pgh.pa.us wrote: This coding was ill-considered from the word go. Agreed, but then I don't have a clear reason why it is that way and yet I'm sure I did it for some reason. I think you just did it because it looked easy, and you didn't think very hard about the consequences. I don't typically think such thoughts. Your ability to see more deeply into certain topics is a credit to you, not an implication of laziness on my part. I'm sure I believed it a necessary or useful thing to do. As far as the concern about new bugs is concerned: if we start replay from this checkpoint, we will certainly not consider that the DB is consistent until we reach the checkpoint's physical position. And by that point we'll have replayed the XLOG_RUNNING_XACTS record emitted by LogStandbySnapshot, so our idea of the nextXid should be fully up to date anyway. The same goes for checkpoints encountered later in the replay run --- they'd just be duplicative of the preceding XLOG_RUNNING_XACTS record. There is no reason to put the same XID into the checkpoint record. Exactly, the end result is identical, but the intermediate states may differ. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7572: virtualxid lock held by bgwriter on promoted slaves
On 27 September 2012 22:29, Simon Riggs si...@2ndquadrant.com wrote: On 26 September 2012 22:33, daniele.varra...@gmail.com wrote: The following bug has been logged on the website: Bug reference: 7572 Logged by: Daniele Varrazzo Email address: daniele.varra...@gmail.com PostgreSQL version: 9.1.4 Operating system: Linux Description: Hello, when a slave is promoted, the pgwriter keeps holding a lock with virtualxid 1/1 and virtualtransaction -1/0. Such a lock stops pg_reorg from running (as reported here: http://pgfoundry.org/tracker/index.php?func=detail&aid=1011203&group_id=1000411&atid=1376), and I've verified the same condition on 9.1.4. Is it possible to free that lock on slave promotion? Is it safe to ignore that lock for pg_reorg's sake? The program is probably waiting for all the transactions to finish before swapping the table in the system catalog, but I'm not sure about that yet. This one is mine. I don't think it's important that we hold that lock, but will check. This was an interesting bug to track down and certainly had me stumped for some time. Nothing at all to do with the WALwriter. It turns out that the Startup process never ended its VirtualTransaction, which caused 2 separate bugs in 9.0/9.1 and 9.2/HEAD. The bugs in 9.2+ were only visible because of lack of initialisation of the fp fields, also fixed. Thanks for the bug report. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7627: Bad query plan when using LIMIT
On 30 October 2012 05:27, Tom Lane t...@sss.pgh.pa.us wrote: edw...@clericare.com writes: The following two queries differ only by adding LIMIT 1, but the one with the limit gets radically worse performance. I've done VACUUM FULL, VACUUM ANALYZE, and REINDEX DATABASE and there are no modifications since. EXPLAIN ANALYZE SELECT * FROM commits WHERE id IN (SELECT id FROM commits ORDER BY tree_high LIMIT 605 ) AND tree_other IS NULL ORDER BY tree_high DESC; EXPLAIN ANALYZE SELECT * FROM commits WHERE id IN (SELECT id FROM commits ORDER BY tree_high LIMIT 605 ) AND tree_other IS NULL ORDER BY tree_high DESC LIMIT 1; I think what's happening there is that the planner supposes that an indexscan in tree_high DESC order will find rows matching the IN condition uniformly distributed in the scan order --- but, because of the construction of the IN clause, they're actually going to be pessimally located at the very end of that scan order. So it ends up forming all of the nestloop result, when it had expected to have to compute only 1/595 of it. We've discussed dialing down the planner's optimism about limit plans to not assume perfect independence of filter conditions, but I don't think anyone would advocate for having it assume the worst possible case, which is what you've got here unfortunately. Very bad plans with LIMIT are frequent. This is bad for us because adding LIMIT usually makes, and is supposed to make, queries faster, not slower. We need to do something. LIMIT optimistically assumes the very best case, and that best case gets better with the number of rows filtered away. If we are scanning N rows with LIMIT M, we must assume that the action relates in some way to N, rather than assuming the LIMIT is more and more effective as N/M increases.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
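The planner's optimism described above can be made concrete. Under the uniform-distribution assumption, a scan that must return M rows out of K qualifying rows in a table of N expects to read about N*M/K rows; if the qualifying rows are in fact clustered at the end of the scan order, it must read nearly all N. A hypothetical Python sketch of that estimate (not the actual costing code):

```python
def expected_rows_scanned(total_rows, qualifying_rows, limit):
    """Planner-style estimate: if qualifying rows are spread uniformly
    through the scan order, we expect to read roughly
    total_rows * limit / qualifying_rows rows before the LIMIT is
    satisfied, capped at the whole table."""
    if qualifying_rows <= 0:
        return total_rows
    return min(total_rows, total_rows * limit / qualifying_rows)
```

With the 1/595 figure from Tom's analysis, the planner expects to read a tiny fraction of the input, while the pathological placement in this report forces it to form almost all of the nestloop result.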
Re: [BUGS] Pg_stat_replication shows sync standby with flush location behind primary in 9.1.5
On 4 October 2012 05:32, Mark Kirkwood mark.kirkw...@catalyst.net.nz wrote: I am seeing the situation where the reported flush location for the sync standby (standby1 below) is *behind* the reported current xlog location of the primary. This is Postgres 9.1.5, and I was under the impression that transactions initiated on the master do not commit until the corresponding wal is flushed on the sync standby. Now the standby is definitely working in sync mode, because stopping it halts all write transactions on the primary (sync_standby_names contains only standby1). So is the reported lag in flush location merely an artifact of timing in the query, or is there something else going on? [1] The writing of new WAL is independent of the wait that occurs on commit, so it is entirely possible, even desirable, that the observed effect occurs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #7572: virtualxid lock held by bgwriter on promoted slaves
On 26 September 2012 22:33, daniele.varra...@gmail.com wrote: The following bug has been logged on the website: Bug reference: 7572 Logged by: Daniele Varrazzo Email address: daniele.varra...@gmail.com PostgreSQL version: 9.1.4 Operating system: Linux Description: Hello, when a slave is promoted, the pgwriter keeps holding a lock with virtualxid 1/1 and virtualtransaction -1/0. Such a lock stops pg_reorg from running (as reported here: http://pgfoundry.org/tracker/index.php?func=detail&aid=1011203&group_id=1000411&atid=1376), and I've verified the same condition on 9.1.4. Is it possible to free that lock on slave promotion? Is it safe to ignore that lock for pg_reorg's sake? The program is probably waiting for all the transactions to finish before swapping the table in the system catalog, but I'm not sure about that yet. This one is mine. I don't think it's important that we hold that lock, but will check. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6239: Looking for a technical contact point for PostgreSQL compatibility issue on Windows8
On 4 October 2011 02:42, Seiko Ishida v-sei...@microsoft.com wrote: The following bug has been logged online: Bug reference: 6239 Logged by: Seiko Ishida Email address: v-sei...@microsoft.com PostgreSQL version: 8.2.4 Operating system: Windows 8 Description: Looking for a technical contact point for PostgreSQL compatibility issue on Windows8 Details: Hello, I am a Program Manager with the Ecosystem Engineering team at Microsoft. I am looking for a technical contact point to notify of compatibility issues with PostgreSQL. Could you please connect me to the appropriate individual for this? Regards, Seiko Ishida Microsoft ISV Readiness, EcoSystem Engineering Team Program Manager Ref : 341057 As a general check, I'd like to raise the topic of Windows 8 and Windows Server 2012 support. PostgreSQL 9.2 will be released in the next few weeks/months, since we are now at release candidate stage. Can I confirm that both Microsoft and the PostgreSQL community think Windows 8/Server 2012 is fully supported? I can't see build farm members running either of those releases, and there have been no further comments on this issue here. Is somebody doing private testing on behalf of the community? Should we be mentioning Windows 8 and Server 2012 support in our release notes? It seems like it could be a feature, or at very least it will be a popular question. OTOH, we don't mention "tuned for the Linux 3 kernel", but then perhaps we should be doing that as well... -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6239: Looking for a technical contact point for PostgreSQL compatibility issue on Windows8
On 1 September 2012 08:10, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Possibly Microsoft could donate boxen to enable PostgreSQL to be tested on their O/S's? Microsoft doesn't need to donate boxes; they can set up their own build farm members and contribute directly. Postgres runs a distributed build farm that anybody can join, no fees, no restrictions. http://www.pgbuildfarm.org/cgi-bin/register-form.pl Community assistance would be available to help. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] primary and hot standby database don' work now
On 22 July 2012 07:00, leo xu leoxu8...@gmail.com wrote: hello everyone: my pg version is 9.1.2 running on Red Hat 5.4 x64 servers. One day I found that no database indexes worked; after REINDEX DATABASE they worked again, but some tables with primary keys contained the same record twice. primary database error: 2012-07-19 21:18:36.133 CST,,,476,,4fde7dd3.1dc,22,,2012-06-18 09:01:07 CST,1/0,0,WARNING,01000,specified item offset is too large,xlog redo insert: rel 17673/16386/16780; tid 409/151 There's something badly wrong with your WAL records. I'd suggest you move to a later version of 9.1 and retry. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6736: Hot-standby replica crashed after failover:
On 13 July 2012 14:38, maxim.bo...@gmail.com wrote: 2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []PANIC: WAL contains references to invalid pages The problem is not connected with Hot Standby. The issue relates to index code that contains problems, leaving invalid pages. These are not reported until the end of recovery. That has now been changed so they are reported earlier, making it clearer that this situation exists. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6661: out-of-order XID insertion in KnownAssignedXids
On 7 June 2012 17:49, Andres Freund and...@2ndquadrant.com wrote: A patch implementing that is attached. Unfortunately not really tested yet, because it's kinda hard to hit that code-path. Valentine, can you test that patch? Patch looks good, so as soon as we confirm it we can apply it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6661: out-of-order XID insertion in KnownAssignedXids
On 8 June 2012 12:40, Andres Freund and...@2ndquadrant.com wrote: Hi All, On Friday, June 08, 2012 12:42:00 PM Andres Freund wrote: On Friday, June 08, 2012 12:25:25 PM Valentine Gogichashvili wrote: unfortunately I did not manage to keep the snapshot of the database that had this issue. Ok, after some playing I could reproduce the issue: master: S1: CREATE FUNCTION insertone() RETURNS void LANGUAGE plpgsql AS $$BEGIN INSERT INTO data(data) VALUES (random()); EXCEPTION WHEN division_by_zero THEN RAISE NOTICE 'huh'; END;$$; S1: BEGIN; S1: SELECT insertone() FROM generate_series(1, 1000); --subxid overflow S2: BEGIN; S2: SELECT insertone(); --allocate xid standby: pg_basebackup LOG:  0: consistent recovery state reached at 0/5C0001C0 LOG:  0: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4752 (now 3750) master: S1: commit; S3: checkpoint; standby: before patch: crash in assert/elog after:  LOG:  0: recovery snapshots are now enabled So I can confirm the above patch fixes at least that bug. Andres Committed, with minor changes in wording of comments and error messages. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6629: Creating a gist index fails with too many LWLocks taken
On 11 May 2012 11:07, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I wonder if we should reserve a few of the lwlock slots for critical sections, to make this less likely to happen. Not only in this case, but in general. We haven't seen this problem often, but it would be quite trivial to reserve a few slots. Why reserve them solely for critical sections? What is the downside of having 100 slots for general use? ISTM we should have 250 slots and log a warning if we ever hit 50 or more. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6629: Creating a gist index fails with too many LWLocks taken
On 11 May 2012 15:14, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.05.2012 16:56, Simon Riggs wrote: On 11 May 2012 11:07, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I wonder if we should reserve a few of the lwlock slots for critical sections, to make this less likely to happen. Not only in this case, but in general. We haven't seen this problem often, but it would be quite trivial to reserve a few slots. Why reserve them solely for critical sections? Because if you run out of lwlocks in a critical section, you get a PANIC. Yes, but why reserve them solely for critical sections? If you have an escape hatch, you use it, always. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6629: Creating a gist index fails with too many LWLocks taken
On 11 May 2012 17:48, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.05.2012 18:18, Simon Riggs wrote: On 11 May 2012 15:14, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.05.2012 16:56, Simon Riggs wrote: On 11 May 2012 11:07, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I wonder if we should reserve a few of the lwlock slots for critical sections, to make this less likely to happen. Not only in this case, but in general. We haven't seen this problem often, but it would be quite trivial to reserve a few slots. Why reserve them solely for critical sections? Because if you run out of lwlocks in a critical section, you get a PANIC. Yes, but why reserve them solely for critical sections? If you have an escape hatch you use it, always Well, no, because a PANIC is much worse than an ERROR. Think of this as a spare oxygen tank while diving, rather than an escape hatch. A spare tank can save your life if you run out of oxygen while ascending, but if you run out of oxygen on the way down, you don't continue going down with just the spare tank. Imagine that you have a process that does something like this: for (i = 0; i < 99; i++) LWLockAcquire(foolock[i]); START_CRIT_SECTION(); XLogInsert(...); END_CRIT_SECTION(); What happens at the moment is that XLogInsert hits the limit when it tries to acquire WALInsertLock, so you get a PANIC. If we reserved, say, 5 locks for critical sections, so that you could hold 95 locks while outside a critical section, and 5 more within one, you would get an error earlier, outside the critical section, while acquiring the foolocks. Or if the number of foolocks acquired was less than 95, you would not get an error at all. That avoids the PANIC. You can argue for just raising the limit, but that just moves the problem around. It's still possible to hit the limit within a critical section, and PANIC.
Likewise, if we lower the limit, that helps us find the problems earlier in the development cycle, but doesn't change the fact that if you run too close to the edge, you run out of locks within a critical section and PANIC. Of course, nothing stops you from writing (bad) code that acquires a lot of locks within a critical section, in which case you're screwed anyway. OK, I agree. User code can create and hold lwlocks as it chooses, so we need to protect the server as a whole from bad user code. The implementation I would prefer would be to put the check in START_CRIT_SECTION(), so we actually fail before we enter the section. That way we don't need extra locks, which is good since any additional amount we pick can still be exceeded by user code. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
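The check being proposed (fail on entering the critical section while slots remain, so you get an ERROR there instead of a PANIC inside it) might look roughly like this. A Python toy model with invented names and limits; the real constants and error handling live in PostgreSQL's C code:

```python
MAX_SIMUL_LWLOCKS = 100    # illustrative cap, not the real constant
CRIT_SECTION_HEADROOM = 5  # lock slots kept in reserve for critical sections


class LockTracker:
    """Toy model of per-process held-lock accounting."""

    def __init__(self):
        self.held = 0

    def acquire(self):
        # The existing check: refuse outright once the hard cap is hit.
        if self.held >= MAX_SIMUL_LWLOCKS:
            raise RuntimeError("too many LWLocks taken")
        self.held += 1

    def start_crit_section(self):
        # The proposed check: fail *before* entering the critical
        # section if too few slots remain, turning a would-be PANIC
        # inside the section into an ERROR outside it.
        if self.held > MAX_SIMUL_LWLOCKS - CRIT_SECTION_HEADROOM:
            raise RuntimeError("too many LWLocks held to enter a critical section")
```

With 90 locks held entry succeeds; with 96 held, entry is refused while it is still safe to error out.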
Re: [BUGS] in hot standby database execute select * from pg_class indicate error
On 10 May 2012 06:41, leo xu leoxu8...@gmail.com wrote: hello: when I execute any operation in the hot standby database, it reports that index pg_class_relname_nsp_index contains an unexpected zero page at block 0 and tells me to reindex it. But the hot standby server only allows reads, not writes. When I connect to the primary server, I can execute any operation. The standby connects to the primary using streaming replication, synchronously. My database server is 9.1.2; the machine is Linux 5.4 64-bit. I know that I can recreate the standby database and it will work, but is there any other way? Run REINDEX on the primary server. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6619: Misleading output from slave when host is not running
On Fri, Apr 27, 2012 at 8:47 AM, petteri.r...@aalto.fi wrote: LOG:  entering standby mode WARNING:  WAL was generated with wal_level=minimal, data may be missing HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup. FATAL:  hot standby is not possible because wal_level was not set to hot_standby on the master server HINT:  Either set wal_level to hot_standby on the master, or turn off hot_standby here. LOG:  startup process (PID 28761) exited with exit code 1 LOG:  aborting startup due to startup process failure The error message above on the FATAL line is wrong (or at least misleading). The real problem should be that it can't connect to the master. The wal_level on the master is hot_standby (captured after I started it): The HINT that we should simply set something on the master is a little misleading with respect to timing. However, if the master and the standby aren't even connected and you know that, how did you expect there to be a causal link between the setting on the master and the state of the standby? What do you suggest the messages say? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple
On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: I have not gotten very far with the coredump, except to observe that gdb says the Assert ought to have passed: ... This suggests very strongly that indeed the buffer was changing under us. I probably ought to let the test case run overnight before concluding anything, but at this point it's run for two-plus hours with no errors after applying this patch:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cce87a3..b128bfd 100644
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** RestoreBkpBlocks(XLogRecPtr lsn, XLogRec
*** 3716,3724 ****
          }
          else
          {
-             /* must zero-fill the hole */
-             MemSet((char *) page, 0, BLCKSZ);
              memcpy((char *) page, blk, bkpb.hole_offset);
              memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
                     blk + bkpb.hole_offset,
                     BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
--- 3716,3724 ----
          }
          else
          {
              memcpy((char *) page, blk, bkpb.hole_offset);
+             /* must zero-fill the hole */
+             MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
              memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
                     blk + bkpb.hole_offset,
                     BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));

The existing code makes the page state transiently invalid (all zeroes) for no particularly good reason, and consumes useless cycles to do so, so this would be a good change in any case.  The reason it is relevant to our current problem is that even though RestoreBkpBlocks faithfully takes exclusive lock on the buffer, *that is not enough to guarantee that no one else is touching that buffer*.  Another backend that has already located a visible tuple on a page is entitled to keep accessing that tuple with only a buffer pin.
So the existing code transiently wipes the data from underneath the other backend's pin. It's clear how this explains the symptoms. Yes, that looks like the murder weapon. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
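The essence of the fix is the order of operations when reconstructing a page from a backup block whose all-zero "hole" was compressed away: copy the bytes before the hole, zero only the hole itself, then copy the bytes after it, so the page never passes through an all-zeroes state under a concurrent reader's pin. A byte-level Python model of that reconstruction (illustrative, not the C code):

```python
def restore_backup_block(page, blk, hole_offset, hole_length):
    """Rebuild `page` in place from backup-block image `blk`, which
    stores the page with `hole_length` bytes elided at `hole_offset`.
    Follows the fixed ordering: prefix, then zero the hole, then suffix,
    so the whole page is never wiped first."""
    blcksz = len(page)
    # bytes before the hole
    page[:hole_offset] = blk[:hole_offset]
    # zero-fill only the hole
    page[hole_offset:hole_offset + hole_length] = bytes(hole_length)
    # bytes after the hole
    page[hole_offset + hole_length:] = blk[hole_offset:blcksz - hole_length]
```

For example, an 8-byte toy page restored from a 6-byte image with a 2-byte hole at offset 3 ends up with zeroes only in the hole.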
Re: [BUGS] pg_archivecleanup outputs errors messages which are not error messages per se
On Mon, Jan 9, 2012 at 9:55 AM, Benjamin Henrion b...@udev.org wrote: I am using pg_archivecleanup in a shell script that sends STDERR to a log file, and I see the following: pg_archivecleanup: keep WAL file /db/current/wal/0009001700DD and later pg_archivecleanup: removing file /db/current/wal/0009001700DC pg_archivecleanup: removing file /db/current/wal/0009001700DB Should those messages be considered errors? I don't think so. This is a feature, not a bug, since it allows the messages to appear in the PG log when the utility is used as an archive_cleanup_command. If you'd like to submit a patch to allow a -s (silent) mode or similar, we'd be happy to receive it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] Incorrect comment in heapam.c
On Tue, Dec 20, 2011 at 5:50 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Line 834 of heapam.c has the following comment: /*  * This is formatted so oddly so that the correspondence to the macro  * definition in access/heapam.h is maintained.  */ In fact, that macro is defined in access/htup.h...should it be? IMHO the comment is wrong; the code is in the right place. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Fri, Nov 25, 2011 at 6:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maxim Boguk maxim.bo...@gmail.com writes: I know GIST on intarray[] does not have that problem. Very likely the problem is limited to intarray[] GIN indexes only (but I am going to test some other not-well-known GIN indexes tomorrow). Broken FTS indexes on Hot Standby would have been known about years before. You might think that, but you'd be wrong :-(. Yes, that did sound ominous. ginRedoUpdateMetapage is failing to restore the contents of the pending-list correctly, which means this is broken for all types of GIN indexes. Will fix. Great detective work as ever, Tom, much appreciated. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Thu, Nov 24, 2011 at 11:12 PM, Maksym Boguk maxim.bo...@gmail.com wrote: postgres=# SELECT * from test where sections '{2000}';  id | sections +-- (0 rows) Ooops. Can you see if this is just intarray or if there are other failing cases? It would be good to get more info on this before I start investigating. Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6269: Anomaly detection
On Tue, Oct 25, 2011 at 10:51 AM, Paul Stapersma paul.staper...@gmail.com wrote: PostgreSQL version: 8.3.3 For a project at my University, we compared PostgreSQL with MySQL's InnoDB. In this research, we found several cases in which anomalies were detected at isolation levels that are guaranteed not to have these anomalies. It looks to me as though you've made an honest attempt at this comparison. Thank you for noting points such as PostgreSQL's overall performance being better than InnoDB's at all isolation levels. To allow this type of study to be widely accepted, you need to declare who you are and where you are from, and to allow your results to be confirmed. Accountability and verifiability are essential aspects of good science. I would also like you to state your funding source, so that it is clear whether or not marketing funds inspire or reward such studies. Full disclosure is appreciated. Using InnoDB 5.0 and PostgreSQL 8.3 isn't very useful, and I'm surprised you haven't used the latest production releases of both products, especially when we have groundbreaking features in exactly this area in the latest release. Perhaps I misunderstand - you don't actually say which release of InnoDB you used, though you have URLs linking to both the 5.0 and 5.1 versions. As others have observed, some other points need more careful examination, and it would be good to provide exact details of your tests to allow any such results to be reproduced. If problems do exist, we will be very interested. There are many people here who would be happy to help if you find problems, so please don't be put off by these responses to your work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6264: Superuser does not have inherent Replication permission
On Tue, Oct 25, 2011 at 1:39 PM, Magnus Hagander mag...@hagander.net wrote: On Mon, Oct 24, 2011 at 16:37, Keith Fiske ke...@omniti.com wrote: On Sat, Oct 22, 2011 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote: Keith Fiske ke...@omniti.com writes: If you create a user as a NONsuperuser, then later ALTER them to be one, they will NOT have the replication permission and cannot be used as a replication user until you explicitly grant that permission. That doesn't sound to me like a bug.  These flags are independent, we just provide a certain default at role creation time. That is not what the documentation, as written, would lead people to believe. I'd be more than happy to help with clarifying the documentation myself if needed. Just let me know how. This part I agree with - it makes sense for ALTER to set both flags when it enables superuser. +1 Changing the behaviour to match the docs makes the most sense to me. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
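The report can be restated as a short session (role name invented for illustration):

```sql
CREATE ROLE standby_conn LOGIN;       -- ordinary role: no REPLICATION attribute
ALTER ROLE standby_conn SUPERUSER;    -- per the report, REPLICATION is NOT added here
ALTER ROLE standby_conn REPLICATION;  -- the explicit grant that is currently required
```

By contrast, per the discussion above, a role created as a superuser in the first place does get the replication flag by default, which is the inconsistency under debate.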
Re: [BUGS] char(0)
On Mon, Oct 17, 2011 at 8:31 AM, Susanne Ebrecht susa...@2ndquadrant.com wrote: PostgreSQL doesn't support CHAR(0). What does the SQL Standard say? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6200: standby bad memory allocations on SELECT
On Thu, Sep 8, 2011 at 11:33 PM, Daniel Farina dan...@heroku.com wrote:  ERROR: invalid memory alloc request size 18446744073709551613 At least once, a hot standby was promoted to a primary and the errors seem to discontinue, but then reappear on a newly-provisioned standby. So the failing query scans a btree index on a hot standby. I don't fully accept it as an HS bug, but let's assume that it is and analyse what could cause it. The MO is certain user queries, only observed in HS. So certain queries might be related to the way we use indexes, or not. There is a single, small difference between how a btree index operates in HS and in normal operation, which relates to whether we kill tuples in the index. That's simple code, and there are no obvious bugs there, nor even anything that specifically allocates memory. So the only bug that springs to mind is something related to how we navigate hot chains with/without killed tuples, i.e. the bug is not actually HS related, but is only observed under conditions typical in HS. HS touches almost nothing else in user space, apart from snapshots. So there could be a bug there also, maybe in CopySnapshot(). -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6136: Perfomance dies when using PK on 64-bit field
On Sun, Jul 31, 2011 at 2:47 PM, Robert robert.ayrapet...@gmail.com wrote: I've found strange behavior in my pg installation (tested both 8.4 and 9.0 - they behave the same) on the FreeBSD platform. In short - when a table has a PK on a bigint field - COPY to that table from a file becomes slower and slower as the table grows. When the table reaches ~5GB - a COPY of 100k records may take up to 20 mins. I've experimented with all params in the configs, moved indexes to a separate hdd etc - nothing made any improvement. However, once I drop the 64-bit PK - a COPY of 100k records passes in seconds. Interesting thing - the same table has other indexes, including composite ones, but none of them include bigint fields, which is why I concluded that the bug is connected with indexes on bigint fields only. In terms of IO the picture is the following: after the copy starts, gstat shows 100% load on the index partition (as I mentioned above - I've tried a separate hdd to keep the index tablespace), a large queue (over 2k elements), and constant slow writes at a speed of ~2MB/s. The hdd becomes completely unresponsive; even ls on an empty folder hangs for a minute or so. To avoid thoughts like your hdd is slow, you haven't tuned postgresql.conf etc - all slowness disappears with the drop of the bigint PK, while other indexes on the same table remain alive. And yes - I've tried drop PK / recreate PK, vacuum full analyze and all other things - nothing helped, only the drop helps. Sounds weird. Looking at this now. Could be that it's storing the wrong kind of plan on the RI trigger for the PK. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6136: Perfomance dies when using PK on 64-bit field
On Mon, Aug 1, 2011 at 8:25 AM, simon si...@2ndquadrant.com wrote: Could be that it's storing the wrong kind of plan on the RI trigger for the PK. No, not that. Will look elsewhere. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6094: Streaming replication does not catch up when writing enough data
On Thu, Jul 7, 2011 at 1:05 PM, David Hartveld david.hartv...@mendix.com wrote: The following bug has been logged online: Bug reference:    6094 Logged by:      David Hartveld Email address:    david.hartv...@mendix.com PostgreSQL version: 9.1-beta2 Operating system:  Debian GNU/Linux 6.0.2 Squeeze Description:     Streaming replication does not catch up when writing enough data Details: After creation of two new clusters, and setting them up as master and slave (in async mode, according to the current 9.1 docs), the execution of a large SQL script (creating a db, tables, sequences, etc., filling them with data through COPY) runs properly on the master, but does not stream to the slave, i.e. the slave does not catch up. In the master log, the following line is printed many times: Your output indicates that there is a problem in your replication setup, and this is why the slave does not catch up. This is not a performance issue. It is either a bug in replication, or a user configuration issue. Since few things have changed in 9.1 in this area, at the moment the balance of probability is user error. If you can provide a more isolated bug report we may be able to investigate. This is being discussed in a thread on the General list and there is no reason to post twice. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6041: Unlogged table was created bad in slave node
On Wed, Jun 1, 2011 at 7:28 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, May 26, 2011 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Euler Taveira de Oliveira's message of jue may 26 12:00:05 -0400 2011: I think we should emit the real cause in those cases, if possible (not too much overhead). The message would be Unlogged table content is not available in standby server. I guess what it should do is create an empty file in the slave. Probably it should, because won't the table malfunction after the slave is promoted to master, if there's no file at all there?  Or will the process of coming live create an empty file even if there was none? Coming live creates an empty file. But Euler is pointing out a different issue, which is usability.  If the slave just acts like the table is present but empty, we are likely to get bug reports about that too.  An error telling you you aren't allowed to access such a table on slaves would be more user-friendly, if we can do it without too much pain. I looked into this a bit.  A few observations: (1) This problem is actually not confined to unlogged tables; temporary tables have the same issue.  For example, if you create a temporary table on the master and then, on the slave, do SELECT * FROM  pg_temp_3.hi_mom (or whatever the name of the temp schema where the temp table is) you get the same error.  In fact I suspect if you took a base backup that included the temporary relation and matched the backend ID you could even manage to read out the old contents (modulo any fun and exciting XID wraparound issues).  But the problem is of course more noticeable for unlogged tables since they're not hidden away in a special funny schema. Seems like you're trying to fix the problem directly, which as you say, has problems. At some point we resolve from a word mentioned in the FROM clause to a relfilenode. 
Surely somewhere in there we can notice it's unlogged before we end up down in the guts of smgr? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] BUG #6042: unlogged table with Streaming Replication
On Fri, May 27, 2011 at 7:15 AM, Tomonari Katsumata katsumata.tomon...@po.ntts.co.jp wrote: I had thought that if there is no standby, the primary would behave as if stand-alone... sorry, this was my misunderstanding. It is a common misunderstanding. The programmed behaviour leads to the most trusted level of robustness. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS] Warm Standby startup process unconditionally hangs
a report about this before now. (You replied there had been one). ProcSendSignal() doesn't do anything at all when called with the Startup proc's pid, because BackendPidGetProc() returns NULL. The root cause appears to be the separation of the two parts of the Hot Standby patch that occurred late in 8.4 cycle. Looks like the protection intended to allow the bgwriter to coexist with the startup process was never applied. Similar issue exists in 9.0 also when HS not enabled. (Fixed) The patches are slightly different because the infrastructure isn't all there in 8.4. Please can you test these before I commit to 8.4. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services publish_startup_info_84.v1.patch Description: Binary data
Re: [BUGS] BUG #5938: PostgreSQL Installer outputs log file with superuser password in clear text
On Tue, Mar 22, 2011 at 4:09 PM, Dave Page dp...@pgadmin.org wrote: On Tue, Mar 22, 2011 at 3:45 PM, Dave Page dp...@pgadmin.org wrote: On Tue, Mar 22, 2011 at 5:10 AM, Craig Sacco craig.sa...@gmail.com wrote: The following bug has been logged online: Bug reference:    5938 Logged by:      Craig Sacco Email address:    craig.sa...@gmail.com PostgreSQL version: 9.0.3 Operating system:  Microsoft Windows (all variants, 32 and 64 bit) Description:     PostgreSQL Installer outputs log file with superuser password in clear text Details: The PostgreSQL installer outputs a log file to the temporary directory with the superuser password in clear text. We are deploying PostgreSQL as part of a commercial product and would like to ensure that the password is not available to ordinary users. This has been fixed for the next releases. For the sake of the archives, it should also be noted that the file is in a secure directory, much as a .pgpass file would be, so this is generally only an issue for the situation described above, and not when a user installs a copy himself. I accept it's not a worst-case problem, but we should rate the problem A-D as with other security issues. All cases should get a rating so we know what we're dealing with. The problem is that the password is disclosed in a surprising way. .pgpass files are explicitly put there by a user, so they know what they've done. Putting a password in cleartext somewhere is an issue if people don't know about it. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [BUGS][PATCH] crash in 9.1's psql:describeOneTableDetails
On Sat, 2011-02-12 at 15:39 +0100, Andres Freund wrote: On Thursday, February 10, 2011 10:50:38 AM Andres Freund wrote: Hi, Sorry, I unfortunately had the wrong window focused when hitting enter... # \d category column number 2 is out of range 0..1 results in: Program received signal SIGSEGV, Segmentation fault. Ok, found the bug, simple oversight in invalid foreign key patch. Andres Thanks, will apply. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
[BUGS] [Fwd: DBD::Pg on HP-UX 11.31 64bit]
FW Forwarded Message From: H.Merijn Brand h.m.br...@xs4all.nl To: DBI Developers Mailing List dbi-...@perl.org, Simon Riggs si...@2ndquadrant.com Subject: DBD::Pg on HP-UX 11.31 64bit Date: Tue, 14 Dec 2010 13:35:49 +0100 I have postgres running on most of my HP-UX varieties, ranging from HP-UX 10.20/32bit through 11.31/64bit. It works fine everywhere, except on HP-UX 11.31-ipf in 64bit mode. Note that this is the Itanium architecture. postgres' own test suite passes, but all connects fail with DBD::Pg. The compiled psql works fine. Same for 8.4.5 and 9.0.1. I'm working with 8.4.5 now to make it as stable as possible to compare to my other architectures. Everything is a success when I use the same perl built in 32bit mode. I've run truss (HP-UX's version of trace) and narrowed the failing part to 141 lines by being as explicit as possible (e.g. setting PGHOST to an IP address to prevent calls to the resolver). Or to 56 lines for the version run with gcc-compiled perl 5.12. I now need guidance or help to nail down where things actually cause these failures. Here are two traces. 
The first for perl-5.10.1 compiled with HP's C-ANSI-C, the second for perl-5.12.2 compiled with gcc-4.2.4 This is perl, v5.10.1 (*) built for IA64.ARCHREV_0-LP64 open (/pro/pgsql/lib/libpq.so.5, O_RDONLY|0x800, 0) = 3 fstat (3, 0x9fffdd70) = 0 read (3, 7f454c46020201010100 | ELF .., 64) = 64 pread (3, 7f454c46020201010100 | ELF .., 1024, 0) = 1024 stat (/usr/lib/hpux64/dpd, 0x9fffd3d0) = 0 open (/usr/lib/hpux64/dpd/libpq.so.5.bpd, O_RDONLY|0x800, 0) ERR#2 ENOENT getuid () = 203 (203) getgid () = 200 (200) mmap (NULL, 428816, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 3, 0) = 0xc9d28000 mmap (NULL, 9860, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 3, 458752) = 0x9fffbf764000 close (3) = 0 getuid () = 203 (203) getgid () = 200 (200) mmap (NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)= 0x9fffbf76c000 getuid () = 203 (203) getgid () = 200 (200) open (/pro/local/lib/libcrypto.so, O_RDONLY|0x800, 0) ERR#2 ENOENT open (/usr/local/ssl/lib/libcrypto.so, O_RDONLY|0x800, 0) = 3 fstat (3, 0x9fffdd20) = 0 read (3, 7f454c46020201010100 | ELF .., 64) = 64 pread (3, 7f454c46020201010100 | ELF .., 1024, 0) = 1024 stat (/usr/lib/hpux64/dpd, 0x9fffd380) = 0 open (/usr/lib/hpux64/dpd/libcrypto.so.bpd, O_RDONLY|0x800, 0) ERR#2 ENOENT getuid () = 203 (203) getgid () = 200 (200) mmap (NULL, 3530960, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 3, 0) = 0xc6fe8000 mmap (NULL, 11552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_SHLIB, -1, 0) = 0x9fffbf76 mmap (0x9fffbf731000, 191296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_SHLIB, 3, 3538944) = 0x9fffbf731000 close (3) = 0 getuid () = 203 (203) getgid () = 200 (200) mmap (NULL, 32768, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x9fffbf728000 getuid () = 203 (203) getgid () = 200 (200) open (/pro/local/lib/libssl.so, O_RDONLY|0x800, 0) ERR#2 ENOENT open (/usr/local/ssl/lib/libssl.so, O_RDONLY|0x800, 0) = 3 fstat (3, 0x9fffdd20
Re: [BUGS] BUG #5727: Indexes broken in streaming replication
On Tue, 2010-10-26 at 14:08 -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: * Operations on hash indexes are not presently WAL-logged, so replay will not update these indexes. Hash indexes will not be used for query plans during recovery. The initial patch indeed had a special-case in the planner to ignore hash indexes during hot standby, but it was left out because the lack of WAL-logging is a general problem with hash indexes, not a hot standby issue. Yeah, and also the index would still be broken after the slave exits hot standby and becomes live; so that hack didn't cure the problem anyway. OK, that's a good argument. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [BUGS] Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
On Thu, 2010-08-12 at 01:31 -0400, Tom Lane wrote: So the DBA is just flying blind as to whether the slave is trustworthy yet. I can't prove that that's what burnt the original complainant, but it fits the symptoms. The safest approach is to:

1. run pg_start_backup() on master, remember LSN
2. copy backup_label from master to standby
3. wait for starting LSN to be applied on standby
4. run backup on standby
5. run pg_stop_backup() on master

That ensures we don't run without full page writes during backup and that we have an explicit consistency point to work from. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
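The five steps above might look like this as commands. An illustrative sketch only, with invented hostnames and paths, using the 9.0-era function names:

```
# 1. on the master: start the backup and note the returned LSN
psql -h master -c "SELECT pg_start_backup('standby_base_backup')"

# 2. copy backup_label from the master's data directory to the standby's
scp master:/pgdata/backup_label standby:/pgdata/

# 3. on the standby: poll until replay has passed the LSN from step 1
psql -h standby -c "SELECT pg_last_xlog_replay_location()"

# 4. take the filesystem backup from the standby (tar, rsync, etc.)

# 5. on the master: finish the backup
psql -h master -c "SELECT pg_stop_backup()"
```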
Re: [BUGS] Re: BUG #5602: Recovering from Hot-Standby file backup leads to the currupted indexes
On Thu, 2010-08-05 at 11:28 -0700, valgog wrote: This process seems almost entirely unrelated to the documented way of doing it; I'm not surprised that you end up with some files not in sync. Please see pg_start_backup and friends. It was done as documented in http://www.postgresql.org/docs/9.0/static/backup-incremental-updated.html The procedure used does differ from that documented. However, IMHO the procedure *documented* is *not* safe and could lead to corrupt indexes in the way described, since the last recovered point might be mid-way between two halves of an index split record, which will never be corrected during HS. What I find surprising is that the technique the OP describes should be safe, assuming step 5 waits for the correct point of consistency before attempting to run queries. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [BUGS] BUG #5566: High levels of savepoint nesting trigger stack overflow in AssignTransactionId
On Wed, 2010-07-21 at 21:20 +0200, Andres Freund wrote: If you want, I can write a patch for that as well; it seems trivial enough. Updated patch attached. hm. I don't want to push - just to ask: is any committer looking either at the patch or the bug? Patch looks OK, though IMHO I wouldn't be looking to backpatch this. I'll look at this in more detail next week, unless others wish to. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
Re: [BUGS] PD_ALL_VISIBLE flag error on 9.0 alpha 4
On Sat, 2010-03-13 at 11:29 -0800, Josh Berkus wrote: That's better, I was worried you'd gone all complimentary on me. *grin* Never fear that! Was that setting originally part of your design for HS? If so, why did you back off from it? We all agreed it's a kluge, that's why. It's also my 3rd choice of solution, behind fine-grained lock conflicts (1st), which would avoid many issues, and master/standby in lock step (2nd). Having said that, I'm much in favour of providing a range of options and then letting users tell us what works for them. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] PD_ALL_VISIBLE flag error on 9.0 alpha 4
On Wed, 2010-03-10 at 17:12 -0800, Josh Berkus wrote: On 3/10/10 3:26 PM, Simon Riggs wrote: OK, that's enough to not remove it. I was aware of more negative thoughts and conscious of my own feelings about it being a kluge. Well, it *is* a kludge, but it may be the best one for people who want to use HS/SR to support web applications. So I think we should work on making it less kludgy. That's better, I was worried you'd gone all complimentary on me. Ultimately we're going to need publish-XID-to-master, but that's not realistic for 9.0. Still some open items that I'm working on, but not that. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] PD_ALL_VISIBLE flag error on 9.0 alpha 4
On Tue, 2010-03-09 at 22:02 -0800, Josh Berkus wrote: 1. Set up 9.0a4 doing SR replication with a 2nd 9.0a4 2. Ran pgbench for a while. 3. Aborted pgbench with Ctrl-C 4. Changed vacuum_defer_cleanup_age in postgresql.conf and reloaded 5. Ran pgbench again, and got: Sidney-Stratton:pg90 josh$ pgbench -c 2 -T 300 bench starting vacuum...WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation pgbench_branches page 0 WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation pgbench_branches page 1 WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation pgbench_tellers page 0 WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation pgbench_tellers page 1 Understandable. Time to remove vacuum_defer_cleanup_age, I think. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] PD_ALL_VISIBLE flag error on 9.0 alpha 4
On Wed, 2010-03-10 at 23:08 +0200, Heikki Linnakangas wrote: Time to remove vacuum_defer_cleanup_age, I think. Umm, so what's the bug? Whether you call it a bug or just an annoyance is debatable, but the source of it is clear. Given the lack of effectiveness, I propose removing it. Would you agree or disagree with the suggested removal? -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] PD_ALL_VISIBLE flag error on 9.0 alpha 4
On Wed, 2010-03-10 at 17:55 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: Time to remove vacuum_defer_cleanup_age, I think. Umm, so what's the bug? Whether you call it a bug or just an annoyance is debatable, but the source of it is clear. Maybe to you, but the rest of us would like to know. If vacuum_defer_cleanup_age is set higher, this causes the xmin to go backwards, leading to the "PD_ALL_VISIBLE flag was incorrectly set" warning. Having this false xmin move backwards doesn't endanger the standby, since the xids arrive and are checked normally. If they stop arriving, that is fine. Having the false xmin go backwards is not a serious issue on the primary, because the actual xmin does not go backwards. No observer loses information as a result of this; it is only about whether cleanup records are generated later than normal, or not. Given the lack of effectiveness, I propose removing it. I read Josh's recent report at http://archives.postgresql.org/message-id/4b973c3f.9070...@agliodbs.com to say that it's quite effective. I think you're being way too hasty to decide that it can just be dropped. OK, that's enough to not remove it. I was aware of more negative thoughts and conscious of my own feelings about it being a kluge. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] BUG #5357: PGAdmin: SQL Query Editor does not (always) open files
On Tue, 2010-03-02 at 08:32 -0700, Joshua Tolley wrote: On Tue, Mar 02, 2010 at 10:08:55AM +, Julien wrote: Description: PGAdmin: SQL Query Editor does not (always) open files A better place for pgadmin problems is the pgadmin-support list: http://www.pgadmin.org/support/list.php I think we need to work on a distribution mechanism for such requests, rather than trouble bug reporters with our bureaucracy. pgAdmin has been bundled with Postgres for some time now. -- Simon Riggs www.2ndQuadrant.com
[BUGS] Assertion failure on reload of GUC_SUPERUSER_ONLY parms
TRAP: BadState("!(((bool) ((CurrentUserId) != ((Oid) 0))))", File: "miscinit.c", Line: 295) Gabriele Bartolini originally reported this to me as a bug in Hot Standby, though I have now been able to reproduce it on CVS head. We were just unlucky enough to hit it while doing thorough HS testing. backtrace #0 0x2b7beafd2b45 in raise () from /lib64/libc.so.6 #1 0x2b7beafd40e0 in abort () from /lib64/libc.so.6 #2 0x006c016d in ExceptionalCondition ( conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=<value optimized out>) at assert.c:57 #3 0x006ccf4d in GetUserId () at miscinit.c:295 #4 0x006db1b9 in superuser () at superuser.c:48 #5 0x006d5a2b in GetConfigOption (name=0xb2b6e8 "log_filename") at guc.c:5226 #6 0x006d8a2c in ProcessConfigFile (context=PGC_SIGHUP) at guc-file.l:319 #7 0x005e49e2 in SIGHUP_handler ( postgres_signal_arg=<value optimized out>) at postmaster.c:2062 #8 signal handler called #9 0x2b7beb060bc3 in select () from /lib64/libc.so.6 #10 0x005e1bf7 in ServerLoop () at postmaster.c:1360 #11 0x005e2b14 in PostmasterMain (argc=3, argv=0xb246a0) at postmaster.c:1065 It turns out that when you modify a GUC_SUPERUSER_ONLY parameter, such as log_filename, in postgresql.conf and then reload, you will get the assertion failure. It looks to me like the correct fix would be to use GetUserIdAndContext() instead, though I would suggest inventing GetUserIdIfAny(), which would skip the assertion test, for use in superuser(). -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] GIN needs tonic
On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: In recovery of GIN operations using CVS HEAD I consistently see TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File: "tablespace.c", Line: 116) Looking at the code, the new list page WAL record is a GIN record type, and at line 115 in gin/ginfast.c I see that the value of the node is unset. That means XLOG_GIN_INSERT_LISTPAGE always has spcNode == 0 and so triggers the assertion failure. Yeah, so it seems. The fix should be as simple as: I can't speak for anything more than that it stops it from failing with an assertion. That is very good, since it allows me to bypass it and continue with my own testing. This means that the WAL replay of that record type has never been tested correctly :-(. This must have been added after mid-Feb this year. I notice there are a few places where functionality is tested against temp tables, which may mask other non-recoverable issues in this and other rmgrs. We should make it standard practice to use only non-temp tables to cover functionality other than specific temp table commands. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] GIN needs tonic
On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-09-15 at 09:41 +0300, Heikki Linnakangas wrote: This means that the WAL replay of that record type has never been tested correctly :-(. This must have been added after mid-Feb this year. I notice there are a few places where functionality is tested against temp tables, which may mask other non-recoverable issues in this and other rmgrs. We should make it standard practice to include only non-temp tables to cover functionality other than specific temp table commands. I've pointed out before that the regression tests are not particularly meant to provide an exhaustive test of WAL recovery. In this particular case, so far as I can tell the bug is only observable with full_page_writes turned off --- otherwise XLogInsert will invariably decide to log the full page, because it's going to see a zeroed-out LSN in the passed-in buffer. Yes, I was testing with full_page_writes = off at that point. -- Simon Riggs www.2ndQuadrant.com
Re: [BUGS] GIN needs tonic
On Tue, 2009-09-15 at 15:34 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Tue, 2009-09-15 at 14:31 -0400, Tom Lane wrote: I've pointed out before that the regression tests are not particularly meant to provide an exhaustive test of WAL recovery. In this particular case, so far as I can tell the bug is only observable with full_page_writes turned off --- otherwise XLogInsert will invariably decide to log the full page, because it's going to see a zeroed-out LSN in the passed-in buffer. Yes, I was testing with full_page_writes = off at that point. In further testing, I've found that the CVS HEAD regression tests actually will expose the error if you run them with full_page_writes off and then force a WAL replay. (8.4 would not have, though, because of the DROP TABLESPACE at the end of the run.) In any case, I stand by the opinion that the standard regression tests aren't really meant to exercise WAL recovery. Yes, that's exactly how I located the first bug. Sorry if that wasn't clear. BTW, there's more than one bug here :-(. Heikki found one, but the code is also attaching the buffer indicator to the wrong rdata entry --- the record header, not the workspace, is what gets suppressed if the full page is logged. Well, whether or not they were meant to, they do, and to a much greater extent than any other body of tests that I'm aware of. Even if we had another test suite, I would still expect recovery to handle a run of the full regression tests on the primary without problems. I look for multiple ways of testing and that is just one. I guess if you or another committer spends some time writing a test framework that is useful and that you can trust, I'm sure many people will add to it. That'll be true for any of the major/complex areas not covered by public test suites: concurrency, recovery and the optimizer.
-- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] GIN needs tonic
On Tue, 2009-09-15 at 12:09 -0400, Tom Lane wrote: I'm working on this now. Thanks to you and Heikki for fixing this while I worked around it. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] GIN needs tonic
In recovery of GIN operations using CVS HEAD I see consistently TRAP: FailedAssertion("!(((bool) ((spcNode) != ((Oid) 0))))", File: "tablespace.c", Line: 116) Looking at the code, the new-list-page WAL record is a GIN record type, and at line 115 in gin/ginfast.c I see that the value of the node is unset. That means XLOG_GIN_INSERT_LISTPAGE always has spcNode == 0 and so triggers the assertion failure. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Fri, 2009-06-26 at 16:48 -0400, Tom Lane wrote: * I find the RecoveryInProgress test in XLogNeedsFlush rather dubious. Why is it okay to disable that? For at least one of the two callers (SetHintBits) it seems like the safe answer is true not false. This doesn't matter too much yet but it will for hot standby. IIUC you think it is safest to *not* write hint bits during Hot Standby? So we would treat all commits as async commits? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 12:55 +, Fujii Masao wrote: The following bug has been logged online: Bug reference: 4879 Logged by: Fujii Masao Email address: masao.fu...@gmail.com PostgreSQL version: 8.4dev Operating system: RHEL5.1 x86_64 Description:bgwriter fails to fsync the file in recovery mode Details: Looking at it now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 12:55 +, Fujii Masao wrote: The restartpoint by bgwriter in recovery mode caused the following error. ERROR: could not fsync segment 0 of relation base/11564/16422_fsm: No such file or directory I think I see a related bug also. register_dirty_segment() run by Startup process doesn't differentiate correctly whether the bgwriter is active. md.c line 1194. So it will continue to register fsync requests into its own private pendingOpsTable when it should be forwarding them to bgwriter. When bgwriter runs mdsync() the contents of startup's pendingOpsTable will be ignored. That looks to me to be a serious bug. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 17:35 +0300, Heikki Linnakangas wrote: Heikki Linnakangas wrote: Hmm, what happens when the startup process performs a write, and bgwriter is not running? Do the fsync requests queue up in the shmem queue until the end of recovery when bgwriter is launched? I guess I'll have to try it out... Oh dear, doesn't look good. The startup process has a pendingOpsTable of its own. bgwriter won't fsync() files that the startup process has written itself. That needs to be fixed, or you can lose data when an archive recovery crashes after a restartpoint. Yes, that's what I see also. Patch attached. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support

Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.343
diff -c -r1.343 xlog.c
*** src/backend/access/transam/xlog.c	11 Jun 2009 14:48:54 -0000	1.343
--- src/backend/access/transam/xlog.c	25 Jun 2009 15:00:13 -0000
***************
*** 5472,5479 ****
  	 * process in addition to postmaster!
  	 */
  	if (InArchiveRecovery && IsUnderPostmaster)
  		SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
! 
  	/*
  	 * main redo apply loop
  	 */
--- 5472,5482 ----
  	 * process in addition to postmaster!
  	 */
  	if (InArchiveRecovery && IsUnderPostmaster)
+ 	{
  		SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
! 		SetForwardFsyncRequests();
! 	}
! 
  	/*
  	 * main redo apply loop
  	 */

Index: src/backend/storage/smgr/md.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.146
diff -c -r1.146 md.c
*** src/backend/storage/smgr/md.c	11 Jun 2009 14:49:02 -0000	1.146
--- src/backend/storage/smgr/md.c	25 Jun 2009 15:02:26 -0000
***************
*** 141,146 ****
--- 141,147 ----
  static HTAB *pendingOpsTable = NULL;
  static List *pendingUnlinks = NIL;
+ static bool remember_fsync_requests_locally = false;
  
  static CycleCtr mdsync_cycle_ctr = 0;
  static CycleCtr mdckpt_cycle_ctr = 0;
***************
*** 199,204 ****
--- 200,206 ----
  							  100L, &hash_ctl,
  							  HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
+ 		remember_fsync_requests_locally = true;
  		pendingUnlinks = NIL;
  	}
  }
***************
*** 1180,1185 ****
--- 1182,1202 ----
  }
  
  /*
+  * InArchiveRecovery we need to forward requests to the bgwriter.
+  */
+ void
+ SetForwardFsyncRequests(void)
+ {
+ 	remember_fsync_requests_locally = false;
+ 
+ 	/*
+ 	 * In case caller has any pending ops
+ 	 */
+ 	if (pendingOpsTable)
+ 		mdsync();
+ }
+ 
+ /*
   * register_dirty_segment() -- Mark a relation segment as needing fsync
   *
   * If there is a local pending-ops table, just make an entry in it for
***************
*** 1191,1197 ****
  static void
  register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
  {
! 	if (pendingOpsTable)
  	{
  		/* push it into local pending-ops table */
  		RememberFsyncRequest(reln->smgr_rnode, forknum, seg->mdfd_segno);
--- 1208,1214 ----
  static void
  register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
  {
! 	if (remember_fsync_requests_locally)
  	{
  		/* push it into local pending-ops table */
  		RememberFsyncRequest(reln->smgr_rnode, forknum, seg->mdfd_segno);
***************
*** 1219,1225 ****
  static void
  register_unlink(RelFileNode rnode)
  {
! 	if (pendingOpsTable)
  	{
  		/* push it into local pending-ops table */
  		RememberFsyncRequest(rnode, MAIN_FORKNUM, UNLINK_RELATION_REQUEST);
--- 1236,1242 ----
  static void
  register_unlink(RelFileNode rnode)
  {
! 	if (remember_fsync_requests_locally)
  	{
  		/* push it into local pending-ops table */
  		RememberFsyncRequest(rnode, MAIN_FORKNUM, UNLINK_RELATION_REQUEST);
***************
*** 1377,1383 ****
  void
  ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
  {
! 	if (pendingOpsTable)
  	{
  		/* standalone backend or startup process: fsync state is local */
  		RememberFsyncRequest(rnode, forknum, FORGET_RELATION_FSYNC);
--- 1394,1400 ----
  void
  ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum)
  {
! 	if (remember_fsync_requests_locally)
  	{
  		/* standalone backend or startup process: fsync state is local */
  		RememberFsyncRequest(rnode, forknum, FORGET_RELATION_FSYNC);
***************
*** 1416,1422 ****
  	rnode.spcNode = 0;
  	rnode.relNode = 0;
  
! 	if (pendingOpsTable)
  	{
  		/* standalone backend or startup process: fsync state is local */
  		RememberFsyncRequest(rnode, InvalidForkNumber, FORGET_DATABASE_FSYNC);
--- 1433,1439 ----
  	rnode.spcNode = 0;
  	rnode.relNode = 0;
  
! 	if (remember_fsync_requests_locally)
  	{
  		/* standalone backend or startup process: fsync state is local */
  		RememberFsyncRequest(rnode, InvalidForkNumber, FORGET_DATABASE_FSYNC);
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 17:02 +0300, Heikki Linnakangas wrote: I think the real problem is this in mdunlink(): /* Register request to unlink first segment later */ if (!isRedo && forkNum == MAIN_FORKNUM) register_unlink(rnode); When we replay the unlink of the relation, we don't tell the bgwriter about it. Normally we do, so the bgwriter knows that if the fsync() fails with ENOENT, it's OK since the file was deleted. It's tempting to just remove the !isRedo condition, but then we have another problem: if bgwriter hasn't been started yet, and the shmem queue is full, we get stuck in register_unlink() trying to send the message and failing. In archive recovery, we always start bgwriter at the beginning of WAL replay. In crash recovery, we don't start bgwriter until the end of WAL replay. So we could change the !isRedo condition to !InArchiveRecovery. It's not a very clean solution, but it's simple. That seems to work for me, though I have some doubts as to the way two-phase commit is coded. 2PC seems to assume that if a file still exists we must be in recovery and it's OK to ignore. Clean? We've changed the conditions under which the unlink needs to be registered, and !InArchiveRecovery defines the changed conditions precisely. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 11:31 -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Heikki Linnakangas wrote: Hmm, what happens when the startup process performs a write, and bgwriter is not running? Do the fsync requests queue up in the shmem queue until the end of recovery when bgwriter is launched? I guess I'll have to try it out... Oh dear, doesn't look good. The startup process has a pendingOpsTable of its own. bgwriter won't fsync() files that the startup process has written itself. That needs to be fixed, or you can lose data when an archive recovery crashes after a restartpoint. Ouch. I'm beginning to think that the best thing is to temporarily revert the change that made bgwriter active during recovery. That seems the safest course, to avoid derailing the schedule. Please define temporarily. Will it be re-enabled in 8.4.1, assuming we find an acceptable fix? It's obviously not been adequately thought through or tested. Hmmm... -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 18:12 +0300, Heikki Linnakangas wrote: A short fix would be to have bgwriter do the shutdown checkpoint instead in archive recovery. I don't recall if there was a reason it wasn't coded like that to begin with, though. I think the problem was that it was coded both ways at different stages of patch evolution. The decision to retain the shutdown checkpoint by the startup process was taken in January, IIRC. Having the startup process issue this

    if (InArchiveRecovery)
        RequestCheckpoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
    else
        ...

should make the startup process wait for bgwriter to complete the checkpoint. But we need to set LocalRecoveryInProgress appropriately also. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 18:57 +0300, Heikki Linnakangas wrote: I came up with the attached patch, which includes Simon's patch to have all fsync requests forwarded to bgwriter during archive recovery. To fix the startup checkpoint issue, startup process requests a forced restartpoint, which will flush any fsync requests bgwriter has accumulated, before doing the actual checkpoint in the startup process. That looks fine. This is completely untested still, but does anyone immediately see any more problems? Nothing seen. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 12:33 -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: This is completely untested still, but does anyone immediately see any more problems? Isn't SetForwardFsyncRequests going to cause the final mdsync to fail? Yes it will, as it is now. I didn't read the parts of the patch that were described as being from my patch. Heikki, please tell me if you change things... it means I have to be both patch author and reviewer. Perhaps just separate patches, so we can review them independently. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 12:43 -0400, Tom Lane wrote: What about revert the patch? That's probably just as dangerous. Much easier to just avoid that state altogether, if you must. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 19:37 +0300, Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: This is completely untested still, but does anyone immediately see any more problems? Isn't SetForwardFsyncRequests going to cause the final mdsync to fail? Yes, I just noticed that myself, testing it for the first time. That check needs to be suppressed in the startup checkpoint. Better to do it as it was in my patch. We can turn off/on then. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 13:25 -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: It's tempting to just remove the !isRedo condition, but then we have another problem: if bgwriter hasn't been started yet, and the shmem queue is full, we get stuck in register_unlink() trying to send the message and failing. In archive recovery, we always start bgwriter at the beginning of WAL replay. In crash recovery, we don't start bgwriter until the end of wAL replay. So we could change the !isRedo condition to !InArchiveRecovery. It's not a very clean solution, but it's simple. I looked through the callers of smgrdounlink(), and found that FinishPreparedTransaction passes isRedo = true. I'm not sure at the moment what the reasoning behind that was, but it does seem to mean that checking InArchiveRecovery instead of isRedo may not be quite right. I think that's because FinishPreparedTransaction() implicitly assumes that if a file still exists we are in recovery. I can't comment on whether that is necessarily true for some reason, but it doesn't sound like it is a safe assumption. I would guess it's using isRedo as an implicit override rather than using the correct meaning of the variable. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 14:46 -0400, Tom Lane wrote: On the other point: are we going to eliminate mdunlink's isRedo parameter? Maybe the better thing is to have its callers pass the value of InArchiveRecovery? We have one caller that is using a parameter incorrectly. It seems we should correct the caller rather than change the API and potentially affect all callers. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 21:59 +0300, Heikki Linnakangas wrote: I think my initial analysis of this bug was bogus. Perhaps, but the pendingOpsTable issue is a serious one, so we haven't wasted our time by fixing it. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 15:10 -0400, Tom Lane wrote: Are you (Heikki and Simon) in a position to finish these things today? Certainly will try. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 15:10 -0400, Tom Lane wrote: So to summarize the state of play, it seems we have these issues:

* need to delete startup process's local pendingOpsTable once bgwriter is launched, so that requests go to bgwriter instead

Need to ensure that fsync requests are directed to the process that will act on the fsync requests.

* need to push end-of-recovery checkpoint into bgwriter

That's probably the easiest thing to do, but the issue is that we must fsync all files mentioned in the pendingOpsTable in *any* process that has been accumulating such requests.

* need to do something about InRecovery tests that will now be executed in bgwriter

Yes

* need to fix mistaken code in FinishPreparedTransaction

Yes

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 22:29 +0300, Heikki Linnakangas wrote: Tom Lane wrote: While nosing around the problem areas, I think I've found yet another issue here. The global bool InRecovery is only maintained correctly in the startup process, which wasn't a problem before 8.4. However, if we are making the bgwriter execute the end-of-recovery checkpoint, there are multiple places where it is tested that are going to be executed by bgwriter. I think (but am not 100% sure) that these are all the at-risk references: XLogFlush, CheckPointMultiXact, CreateCheckPoint (2 places). Heikki's latest patch deals with the tests in CreateCheckPoint (rather klugily IMO) but not the others. I think it might be better to fix things so that InRecovery is maintained correctly in the bgwriter too. We could set InRecovery=true in CreateCheckPoint if it's a startup checkpoint, and reset it afterwards. That seems like a bad idea. As you said earlier, On Thu, 2009-06-25 at 23:15 +0300, Heikki Linnakangas wrote: Well, we have RecoveryInProgress() now that answers the question "is recovery still in progress in the system?". InRecovery now means "am I a process that's performing WAL replay?". so it would be a mistake to do as you propose above, because it changes the meaning of these well-defined parts of the system. * XLogFlush mentions InRecovery right at the end, and the correct usage there would be RecoveryInProgress() rather than InRecovery. * The usage of InRecovery in CheckPointMultiXact() should also be replaced with RecoveryInProgress(). * The two usages of InRecovery in CreateCheckPoint can also be replaced by RecoveryInProgress(). So, yes, there are some places where InRecovery is used in code executed by the bgwriter, but the correct fix is to use RecoveryInProgress(). It is a hack to try to set the InRecovery state flag in bgwriter, and one that would change the clear meaning of InRecovery, to no good purpose. Subsequent discussion on this subthread may no longer be relevant.
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 22:29 +0300, Heikki Linnakangas wrote: Hmm, I see another small issue. We now keep track of the minimum recovery point. Whenever a data page is flushed, we set minimum recovery point to the LSN of the page in XLogFlush(), instead of fsyncing WAL like we do in normal operation. During the end-of-recovery checkpoint, however, RecoveryInProgress() returns false, so we don't update minimum recovery point in XLogFlush(). You're unlikely to be bitten by that in practice; you would need to crash during the end-of-recovery checkpoint, and then set the recovery target to an earlier point. It should be fixed nevertheless. RecoveryInProgress returns false only because you set LocalRecoveryInProgress at the start of the checkpoint, not at the end. It only needs to be set immediately prior to the call for XLogInsert() in CreateCheckpoint(). If we do that then XLogFlush() acts as advertised and we have no worries. Tom: Heikki's work on MinRecoveryPoint seems sound to me and altering it now seems like something too dangerous to attempt at this stage. I see no need; we have no new bug, just a minor point of how we code the fix to the bug we do have. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 17:11 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: So, yes, there are some places where InRecovery is used in code executed by the bgwriter, but the correct fix is to use RecoveryIsInProgress(). Agreed, but this gets us no closer to solving the real problem, which is that when we perform the end-of-recovery checkpoint, we need to act like we are *not* in recovery anymore, for at least some purposes. Not for some purposes, just for *one* purpose: the end of recovery checkpoint needs to run XLogInsert() which has specific protection against being run during recovery. Most notably, to allow us to write a WAL entry at all; but I am suspicious that pretty much every InRecovery/RecoveryIsInProgress test that that checkpoint might execute should behave as if we're not in recovery. You are right to question whether we should revoke the patch. ISTM that we are likely to decrease code robustness by doing that at this stage. I think we have the problems pretty much solved now. If we want to increase robustness, I would suggest we add a recovery.conf parameter to explicitly enable use of bgwriter during recovery, off by default. In addition to the fixes being worked on currently. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 17:17 -0400, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: We do store the minimum recovery point required by the base backup in control file, but it's not advanced once the recovery starts. So if you start recovery, starting from say 123, let it progress to location 456, kill recovery and start it again, but only let it go up to 300, you end up with a corrupt database. I don't believe that you have the ability to do that. Once the recovery process is launched, the user does not have the ability to control where the WAL scan proceeds from. Or at least that's how it used to work, and if someone has tried to change it, it's broken for exactly this reason. The behavior you seem to have in mind would be completely disastrous from a performance standpoint, as we'd be writing and fsyncing pg_control constantly during a recovery. I wouldn't consider that a good idea from a reliability standpoint either --- the more writes to pg_control, the more risk of fatal corruption of that file. This is sounding more and more like something that needs to be reverted. AFAICS the problem Heikki is worried about exists in 8.2+. If you stop recovery, edit recovery.conf to an earlier recovery target and then re-run recovery, it is possible that data that would not exist until after the (new) recovery point has made its way to disk. The code in 8.4 does a few things to improve that, but the base problem persists and revoking code won't change that. We should describe the issue in the docs and leave it at that --- there is no particular reason to believe anybody would want to do such a thing. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Fri, 2009-06-26 at 00:37 +0300, Heikki Linnakangas wrote: Ok, I've committed the above fixes everyone agreed on. Sorry, but I haven't agreed to very much of what you just committed. There are some unnecessary and confusing hacks that really don't help anybody sort out why any of this has been done. At this stage of RC, I expected you to post the patch and have the other 2 people working actively on this issue review it before you commit. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 18:40 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Fri, 2009-06-26 at 00:37 +0300, Heikki Linnakangas wrote: Ok, I've committed the above fixes everyone agreed on. At this stage of RC, I expected you to post the patch and have the other 2 people working actively on this issue review it before you commit. We're going to slip release a day to allow time to review this properly. Given that, I have no problem with the proposed fix being in CVS --- it makes it a shade easier for other people to test it. I wasn't suggesting that we post the patch and leave it for days, I was thinking more of the 3 people actively working together on the problem looking at the code before its committed. Describing it as stuff we agreed seems particularly strange in that light. On the patch, the kluge to set InRecovery is unnecessary and confusing. If I submitted that you'd kick my ass all around town. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 19:16 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On the patch, the kluge to set InRecovery is unnecessary and confusing. I'll look into a better way to do that tonight. Did you have any specific improvement in mind? Yes, all already mentioned on this thread. I'm unable to spend time on this tomorrow, but I think the main elements of the solution are there now. The bug was found in my feature, so mea culpa. Thanks to everybody for helping identify and fix it. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #4879: bgwriter fails to fsync the file in recovery mode
On Thu, 2009-06-25 at 20:25 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2009-06-25 at 19:16 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On the patch, the kluge to set InRecovery is unnecessary and confusing. I'll look into a better way to do that tonight. Did you have any specific improvement in mind? Yes, all already mentioned on this thread. After looking at this a bit more, the easy solution mentioned upthread doesn't work. We want the end-of-recovery checkpoint to be able to write WAL (obviously), but it *must not* try to call TruncateMultiXact or TruncateSUBTRANS, because those subsystems aren't initialized yet. The check in XLogFlush needs to behave as though InRecovery were true too. So the idea of testing RecoveryInProgress() rather than InRecovery cannot handle all these cases. Yes it can, but only if LocalRecoveryInProgress is set immediately before XLogInsert() in CreateCheckPoint(). Then it is correct. However, I still agree with your thought that having InRecovery true only in the process that's applying WAL records is probably the cleanest definition. And it also seems to me that having crystal-clear definitions of these states is essential, because not being clear about them is exactly what got us into this mess. Yes What I am thinking is that instead of the hack of clearing LocalRecoveryInProgress to allow the current process to write WAL, we should have a separate test function WALWriteAllowed() with a state variable LocalWALWriteAllowed, and the hack should set that state without playing any games with LocalRecoveryInProgress. Then RecoveryInProgress() remains true during the end-of-recovery checkpoint and we can condition the TruncateMultiXact and TruncateSUBTRANS calls on that. Meanwhile the various places that check RecoveryInProgress to decide if WAL writing is allowed should call WALWriteAllowed() instead. No need.
I have not yet gone through all the code sites to make sure this is a consistent way to do it, but we clearly need more states than we have now if we are to avoid weird overloadings of the state meanings. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs