Re: [HACKERS] Fast promotion failure
On 21 May 2013 09:26, Simon Riggs wrote: > I'm OK with that principle... Well, after fighting some more with that, I've gone with the, er, principle of slightly less ugliness. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 21 May 2013 07:46, Heikki Linnakangas wrote: > On 21.05.2013 00:00, Simon Riggs wrote: >> >> When we set the new timeline we should be updating all values that >> might be used elsewhere. If we do that, then no matter when or how we >> run GetXLogReplayRecPtr, it can't ever get it wrong in any backend. >> >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -5838,8 +5838,16 @@ StartupXLOG(void) >> } >> >> /* Save the selected TimeLineID in shared memory, too */ >> - XLogCtl->ThisTimeLineID = ThisTimeLineID; >> - XLogCtl->PrevTimeLineID = PrevTimeLineID; >> + { >> + /* use volatile pointer to prevent code rearrangement */ >> + volatile XLogCtlData *xlogctl = XLogCtl; >> + >> + SpinLockAcquire(&xlogctl->info_lck); >> + XLogCtl->ThisTimeLineID = ThisTimeLineID; >> + XLogCtl->lastReplayedTLI = ThisTimeLineID; >> + XLogCtl->PrevTimeLineID = PrevTimeLineID; >> + SpinLockRelease(&xlogctl->info_lck); >> + } > > > Hmm. lastReplayedTLI is supposed to be the timeline of the last record that > was replayed, ie. the timeline corresponding lastReplayedEndRecPtr. I think > it would work, but it feels like it could be an easy source of bugs in the > future. I'm OK with that principle, as long as we don't touch ThisTimeLineID, which has been the source of multiple bugs. So we should set the TLI explicitly before installing, like attached patch. Otherwise we'd need multiple permanent TLIs which would be overkill. I feel there are problems because we set the newly selected TLI from startup process into shared memory, then some time later we set SharedRecoveryInProgress = false. That timing window isn't good, but I don't see a different way. > It might be a good idea to change walsender to not store that in > ThisTimeLineID, though. It used to make sense for ThisTimeLineID to track > the current recovery timeline in 9.2 and below, but now that walsender can > be sending any older timeline, using ThisTimeLineID to store the latest one > seems confusing. Agreed, but looks like too much code to touch that lightly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services install_xlog_right.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 21.05.2013 00:00, Simon Riggs wrote: When we set the new timeline we should be updating all values that might be used elsewhere. If we do that, then no matter when or how we run GetXLogReplayRecPtr, it can't ever get it wrong in any backend. --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5838,8 +5838,16 @@ StartupXLOG(void) } /* Save the selected TimeLineID in shared memory, too */ - XLogCtl->ThisTimeLineID = ThisTimeLineID; - XLogCtl->PrevTimeLineID = PrevTimeLineID; + { + /* use volatile pointer to prevent code rearrangement */ + volatile XLogCtlData *xlogctl = XLogCtl; + + SpinLockAcquire(&xlogctl->info_lck); + XLogCtl->ThisTimeLineID = ThisTimeLineID; + XLogCtl->lastReplayedTLI = ThisTimeLineID; + XLogCtl->PrevTimeLineID = PrevTimeLineID; + SpinLockRelease(&xlogctl->info_lck); + } Hmm. lastReplayedTLI is supposed to be the timeline of the last record that was replayed, ie. the timeline corresponding lastReplayedEndRecPtr. I think it would work, but it feels like it could be an easy source of bugs in the future. It might be a good idea to change walsender to not store that in ThisTimeLineID, though. It used to make sense for ThisTimeLineID to track the current recovery timeline in 9.2 and below, but now that walsender can be sending any older timeline, using ThisTimeLineID to store the latest one seems confusing. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 20 May 2013 20:40, Heikki Linnakangas wrote: > On 20.05.2013 22:18, Simon Riggs wrote: >> >> On 20 May 2013 18:47, Heikki Linnakangas wrote: >>> >>> Not sure what the best fix would be. Perhaps change the code in >>> >>> CreateRestartPoint() to do something like this instead: >>> >>> GetXLogReplayRecPtr(&replayTLI); >>> if (RecoveryInProgress()) >>>ThisTimeLineID = replayTLI; >> >> >> Installing a few extra WAL files doesn't seem to be a good reason to >> mess with such an important variable as ThisTimeLineID, especially >> since its such a rare event and hardly worth optimising for. >> >> I would prefer it if we didn't set ThisTimeLineID at all in that way, >> or anywhere else. If we do, it needs to be done safely, otherwise any >> caller could make the same mistake. > > >> Suggested patch attached. > > >> @@ -7466,14 +7468,6 @@ CreateRestartPoint(int flags) >> >> * not be used, and will go wasted until recycled on the >> next >> * restartpoint. We'll live with that. >> */ >> - (void) GetXLogReplayRecPtr(&ThisTimeLineID); >> - >> - RemoveOldXlogFiles(_logSegNo, endptr); >> - >> - /* >> -* Make more log segments if needed. (Do this after >> recycling old log >> -* segments, since that may supply some of the needed >> files.) >> -*/ >> PreallocXlogFiles(endptr); >> } > > > That's essentially reverting commit 343ee00b7, resurrecting the bug that > that fixed. Hmm, OK. Then we should using the correct value, not the one that patch set. It certainly worked at that time, but not in a principled way. When we set the new timeline we should be updating all values that might be used elsewhere. If we do that, then no matter when or how we run GetXLogReplayRecPtr, it can't ever get it wrong in any backend. Patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services set_lastReplayedTLI.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 20.05.2013 22:18, Simon Riggs wrote: On 20 May 2013 18:47, Heikki Linnakangas wrote: Not sure what the best fix would be. Perhaps change the code in CreateRestartPoint() to do something like this instead: GetXLogReplayRecPtr(&replayTLI); if (RecoveryInProgress()) ThisTimeLineID = replayTLI; Installing a few extra WAL files doesn't seem to be a good reason to mess with such an important variable as ThisTimeLineID, especially since its such a rare event and hardly worth optimising for. I would prefer it if we didn't set ThisTimeLineID at all in that way, or anywhere else. If we do, it needs to be done safely, otherwise any caller could make the same mistake. Suggested patch attached. @@ -7466,14 +7468,6 @@ CreateRestartPoint(int flags) * not be used, and will go wasted until recycled on the next * restartpoint. We'll live with that. */ - (void) GetXLogReplayRecPtr(&ThisTimeLineID); - - RemoveOldXlogFiles(_logSegNo, endptr); - - /* -* Make more log segments if needed. (Do this after recycling old log -* segments, since that may supply some of the needed files.) -*/ PreallocXlogFiles(endptr); } That's essentially reverting commit 343ee00b7, resurrecting the bug that that fixed. @@ -9098,10 +9092,10 @@ GetXLogReplayRecPtr(TimeLineID *replayTLI) SpinLockAcquire(&xlogctl->info_lck); recptr = xlogctl->lastReplayedEndRecPtr; tli = xlogctl->lastReplayedTLI; + if (replayTLI && xlogctl->SharedRecoveryInProgress) + *replayTLI = tli; SpinLockRelease(&xlogctl->info_lck); - if (replayTLI) - *replayTLI = tli; return recptr; } That breaks the only remaining caller that passes a replayTLI pointer, GetStandbyFlushRecPtr(); *replayTLI points to a local variable in that function, which is left uninitialized if !xlogctl->SharedRecoveryInProgress. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 20 May 2013 18:47, Heikki Linnakangas wrote: > On 19.05.2013 17:25, Simon Riggs wrote: >> So while I believe that the checkpointer might have an incorrect TLI >> and that you've seen a bug, what isn't clear is that the checkpointer >> is the only process that would see an incorrect TLI, or why such >> processes see an incorrect TLI. It seems equally likely at this point >> that the TLI may be set incorrectly somehow and that is why it is >> being read incorrectly. > > > Yeah. I spent some more time debugging this, and found the culprit. In > CreateRestartPoint, we do this: > >> /* >> * Update ThisTimeLineID to the timeline we're currently >> replaying, >> * so that we install any recycled segments on that >> timeline. >> * >> * There is no guarantee that the WAL segments will be >> useful on the >> * current timeline; if recovery proceeds to a new >> timeline right >> * after this, the pre-allocated WAL segments on this >> timeline will >> * not be used, and will go wasted until recycled on the >> next >> * restartpoint. We'll live with that. >> */ >> (void) GetXLogReplayRecPtr(&ThisTimeLineID); > > > Here's what happens: > > 0. System is in recovery > 1. checkpointer process starts creating a restartpoint. > 2. Startup process ends recovery, creates end-of-recovery record, sets the > shared ThisTimeLineID value to 2, and exits. > 3. checkpointer process calls RecoveryInProgess() while performing the > restartpoint (there are RecoveryInProgress() calls in many places, e.g in > XLogNeedsFlush()). It sets LocalRecoveryInProgress = true, and initializes > ThisTimeLineID to 2. > 4. At the end of restartpoint, the checkpointer process does the above > GetXLogReplayRecPtr(&ThisTimeLineID). That overwrites ThisTimeLineID back to > 1. > 5. Checkpointer calls RecoveryInProgress, which returns true immediately, as > LocalRecoveryInProgress is already set. > 6. Checkpointer performs the first checkpoint, with ThisTimeLineID set > incorrectly to 1. That description looks correct to me from the code. > Not sure what the best fix would be. Perhaps change the code in > CreateRestartPoint() to do something like this instead: > > GetXLogReplayRecPtr(&replayTLI); > if (RecoveryInProgress()) > ThisTimeLineID = replayTLI; Installing a few extra WAL files doesn't seem to be a good reason to mess with such an important variable as ThisTimeLineID, especially since its such a rare event and hardly worth optimising for. I would prefer it if we didn't set ThisTimeLineID at all in that way, or anywhere else. If we do, it needs to be done safely, otherwise any caller could make the same mistake. Suggested patch attached. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services avoid_setting_tli.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 19.05.2013 17:25, Simon Riggs wrote: However, there is a call to RecoveryInProgress() at the top of the main loop of the checkpointer, which does explicitly state that it "initializes TimeLineID if it's not set yet." The checkpointer makes the decision about whether to run a restartpoint or a checkpoint directly from the answer to that, modified only in the case of an CHECKPOINT_END_OF_RECOVERY. So the appropiate call is made and I don't agree with the statement that this "doesn't get executed with fast promotion", or the title of the thread. Hmm, I see. So while I believe that the checkpointer might have an incorrect TLI and that you've seen a bug, what isn't clear is that the checkpointer is the only process that would see an incorrect TLI, or why such processes see an incorrect TLI. It seems equally likely at this point that the TLI may be set incorrectly somehow and that is why it is being read incorrectly. Yeah. I spent some more time debugging this, and found the culprit. In CreateRestartPoint, we do this: /* * Update ThisTimeLineID to the timeline we're currently replaying, * so that we install any recycled segments on that timeline. * * There is no guarantee that the WAL segments will be useful on the * current timeline; if recovery proceeds to a new timeline right * after this, the pre-allocated WAL segments on this timeline will * not be used, and will go wasted until recycled on the next * restartpoint. We'll live with that. */ (void) GetXLogReplayRecPtr(&ThisTimeLineID); Here's what happens: 0. System is in recovery 1. checkpointer process starts creating a restartpoint. 2. Startup process ends recovery, creates end-of-recovery record, sets the shared ThisTimeLineID value to 2, and exits. 3. checkpointer process calls RecoveryInProgess() while performing the restartpoint (there are RecoveryInProgress() calls in many places, e.g in XLogNeedsFlush()). It sets LocalRecoveryInProgress = true, and initializes ThisTimeLineID to 2. 4. At the end of restartpoint, the checkpointer process does the above GetXLogReplayRecPtr(&ThisTimeLineID). That overwrites ThisTimeLineID back to 1. 5. Checkpointer calls RecoveryInProgress, which returns true immediately, as LocalRecoveryInProgress is already set. 5. Checkpointer performs the first checkpoint, with ThisTimeLineID set incorrectly to 1. Not sure what the best fix would be. Perhaps change the code in CreateRestartPoint() to do something like this instead: GetXLogReplayRecPtr(&replayTLI); if (RecoveryInProgress()) ThisTimeLineID = replayTLI; I see that the comment in InitXLOGAccess() is incorrect "ThisTimeLineID doesn't change so we need no lock to copy it", which seems worth correcting since there's no need to save time in a once only function. Hmm, the code seems right to me, XLogCtl->ThisTimeLineID indeed doesn't change, once it's set. And InitXLOGAccess() isn't called until it is set. The comment could explain that better, though. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 7 May 2013 10:57, Heikki Linnakangas wrote: > While testing the bug from the "Assertion failure at standby promotion", I > bumped into a different bug in fast promotion. When the first checkpoint > after fast promotion is performed, there is no guarantee that the > checkpointer process is running with the correct, new, ThisTimeLineID. In > CreateCheckPoint(), we have this: > >> /* >> * An end-of-recovery checkpoint is created before anyone is >> allowed to >> * write WAL. To allow us to write the checkpoint record, >> temporarily >> * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is >> * initialized, which we need here and in AdvanceXLInsertBuffer.) >> */ >> if (flags & CHECKPOINT_END_OF_RECOVERY) >> LocalSetXLogInsertAllowed(); > > > That ensures that ThisTimeLineID is updated when performing an > end-of-recovery checkpoint, but it doesn't get executed with fast promotion. > The consequence is that the checkpoint is created with the old timeline, and > subsequent recovery from it will fail. LocalSetXLogInsertAllowed() is called by CreateEndOfRecoveryRecord(), but in fast promotion this is called by Startup process, not Checkpointer process. So there is no longer an explicit call to InitXLOGAccess() in the checkpointer process via a checkpoint with flag CHECKPOINT_END_OF_RECOVERY. However, there is a call to RecoveryInProgress() at the top of the main loop of the checkpointer, which does explicitly state that it "initializes TimeLineID if it's not set yet." The checkpointer makes the decision about whether to run a restartpoint or a checkpoint directly from the answer to that, modified only in the case of an CHECKPOINT_END_OF_RECOVERY. So the appropiate call is made and I don't agree with the statement that this "doesn't get executed with fast promotion", or the title of the thread. So while I believe that the checkpointer might have an incorrect TLI and that you've seen a bug, what isn't clear is that the checkpointer is the only process that would see an incorrect TLI, or why such processes see an incorrect TLI. It seems equally likely at this point that the TLI may be set incorrectly somehow and that is why it is being read incorrectly. I see that the comment in InitXLOGAccess() is incorrect "ThisTimeLineID doesn't change so we need no lock to copy it", which seems worth correcting since there's no need to save time in a once only function. Continuing to investigate further. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Thursday, May 16, 2013 11:33 AM Kyotaro HORIGUCHI wrote: > Hello, > > > > >> Is the point of this discussion that the patch may leave out > some > > > >> glich about timing of timeline-related changing and Heikki saw > an > > > >> egress of that? > > > > > > > > AFAIU, the committed patch has some gap in overall scenario which > is > > > the > > > > fast promotion issue. > > > > > > Right, the fast promotion issue is still there. > > > > > > Just to get us all on the same page again: Does anyone see a > problem > > > with a fresh git checkout, with the fast-promotion-quick-fix.patch > > > applied? > > > (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). > If > > > you do, please speak up. As far as I know, the already-committed > patch, > > > together with fast-promotion-quick-fix.patch, should fix all known > > > issues (*). > > Shared XLogCtl->ThisTimeLineID is written and read without > fencing by spinlock unlike some other XLogCtl members. Can this > break coherency of its memory between different processors? It > is quite reasonable that I cannot find the trouble if it is the > cause. I didn't see the issue even without > fast-promotion-quick-fix.patch. > > > The patch provided will un-necessarily call InitXLOGAccess() 2 times > for End > > of recovery checkpoint, it doesn't matter w.r.t performance but > actually the > > purpose will > > be almost same for calling LocalSetXLogInsertAllowed() and > InitXLOGAccess(), > > or am I missing something. > > > > One more thing, I think after fast promotion, either it should set > timeline > > or give error in CreateCheckPoint() function before it reaches the > check > > mentioned by you in your initial mail. > > if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == > 0) > > elog(ERROR, "can't create a checkpoint during > recovery"); > > Shouldn't it set timeline in above check (RecoveryInProgress()) or > when > > RecoveryInProgress() is called before CreateCheckPoint()? > > Thinking of checkpointer, it does RecoveryInProgress() far > earlier to there, in waiting loop in CheckPointerMain where to > decide which to do between checkpoint and restartpoint. So > InitXLogAccess() has been already done when checkpoint is choosed > there for the first time. And before that, ThisTimeLineID in the > startup process gets incremented and is copied onto > XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress > becomes false in StartupXLog(). I don't think it is possible for > checkpointer to run on older timeline id on codition that all > processes looks exactly the same memory image. I also think the same, but now one difference with fast promotion is the request for checkpoint is done after recovery; so some operations can happen before checkpoint with new timeline. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 16 May 2013 07:02, Kyotaro HORIGUCHI wrote: >> > > fast promotion issue. Excuse me for not joining the thread earlier. I'm not available today, but will join in later in my evening. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
Hello, > > >> Is the point of this discussion that the patch may leave out some > > >> glich about timing of timeline-related changing and Heikki saw an > > >> egress of that? > > > > > > AFAIU, the committed patch has some gap in overall scenario which is > > the > > > fast promotion issue. > > > > Right, the fast promotion issue is still there. > > > > Just to get us all on the same page again: Does anyone see a problem > > with a fresh git checkout, with the fast-promotion-quick-fix.patch > > applied? > > (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If > > you do, please speak up. As far as I know, the already-committed patch, > > together with fast-promotion-quick-fix.patch, should fix all known > > issues (*). Shared XLogCtl->ThisTimeLineID is written and read without fencing by spinlock unlike some other XLogCtl members. Can this break coherency of its memory between different processors? It is quite reasonable that I cannot find the trouble if it is the cause. I didn't see the issue even without fast-promotion-quick-fix.patch. > The patch provided will un-necessarily call InitXLOGAccess() 2 times for End > of recovery checkpoint, it doesn't matter w.r.t performance but actually the > purpose will > be almost same for calling LocalSetXLogInsertAllowed() and InitXLOGAccess(), > or am I missing something. > > One more thing, I think after fast promotion, either it should set timeline > or give error in CreateCheckPoint() function before it reaches the check > mentioned by you in your initial mail. > if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0) > elog(ERROR, "can't create a checkpoint during recovery"); > Shouldn't it set timeline in above check (RecoveryInProgress()) or when > RecoveryInProgress() is called before CreateCheckPoint()? Thinking of checkpointer, it does RecoveryInProgress() far earlier to there, in waiting loop in CheckPointerMain where to decide which to do between checkpoint and restartpoint. So InitXLogAccess() has been already done when checkpoint is choosed there for the first time. And before that, ThisTimeLineID in the startup process gets incremented and is copied onto XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress becomes false in StartupXLog(). I don't think it is possible for checkpointer to run on older timeline id on codition that all processes looks exactly the same memory image. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Monday, May 13, 2013 1:13 PM Heikki Linnakangas wrote: > On 13.05.2013 06:07, Amit Kapila wrote: > > On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote: > >> Heikki said in the fist message in this thread that he suspected > >> the cause of the failure he had seen to be wrong TLI on whitch > >> checkpointer runs. Nevertheless, the patch you suggested for me > >> looks fixing it. Moreover (one of?) the failure from the same > >> cause looks fixed with the patch. > > > > There were 2 problems: > > 1. There was some issue in walsender logic due to which after > promotion in > > some cases it hits assertion or error > > 2. During fast promotion, checkpoint gets created with wrong TLI > > > > He has provided 2 different patches > > fix-standby-promotion-assert-fail-2.patch and > > fast-promotion-quick-fix.patch. > > Among 2, he has already committed fix-standby-promotion-assert-fail- > 2.patch > > > (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa > 66f49 > > 75c99e52984f7ee81b47d137b5b4751) > > That's correct. > > >> Is the point of this discussion that the patch may leave out some > >> glich about timing of timeline-related changing and Heikki saw an > >> egress of that? > > > > AFAIU, the committed patch has some gap in overall scenario which is > the > > fast promotion issue. > > Right, the fast promotion issue is still there. > > Just to get us all on the same page again: Does anyone see a problem > with a fresh git checkout, with the fast-promotion-quick-fix.patch > applied? > (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If > you do, please speak up. As far as I know, the already-committed patch, > together with fast-promotion-quick-fix.patch, should fix all known > issues (*). > > I haven't committed a fix for the issue I reported in this thread, > because I'm not 100% on what the right fix for it would be. > fast-promotion-quick-fix.patch seems to do the trick, but at least the > comments need to be updated, and I'm not sure if there some related > corner cases that it doesn't handle. Simon? The patch provided will un-necessarily call InitXLOGAccess() 2 times for End of recovery checkpoint, it doesn't matter w.r.t performance but actually the purpose will be almost same for calling LocalSetXLogInsertAllowed() and InitXLOGAccess(), or am I missing something. One more thing, I think after fast promotion, either it should set timeline or give error in CreateCheckPoint() function before it reaches the check mentioned by you in your initial mail. if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0) elog(ERROR, "can't create a checkpoint during recovery"); Shouldn't it set timeline in above check (RecoveryInProgress()) or when RecoveryInProgress() is called before CreateCheckPoint()? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On 13.05.2013 06:07, Amit Kapila wrote: On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote: Heikki said in the fist message in this thread that he suspected the cause of the failure he had seen to be wrong TLI on whitch checkpointer runs. Nevertheless, the patch you suggested for me looks fixing it. Moreover (one of?) the failure from the same cause looks fixed with the patch. There were 2 problems: 1. There was some issue in walsender logic due to which after promotion in some cases it hits assertion or error 2. During fast promotion, checkpoint gets created with wrong TLI He has provided 2 different patches fix-standby-promotion-assert-fail-2.patch and fast-promotion-quick-fix.patch. Among 2, he has already committed fix-standby-promotion-assert-fail-2.patch (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f49 75c99e52984f7ee81b47d137b5b4751) That's correct. Is the point of this discussion that the patch may leave out some glich about timing of timeline-related changing and Heikki saw an egress of that? AFAIU, the committed patch has some gap in overall scenario which is the fast promotion issue. Right, the fast promotion issue is still there. Just to get us all on the same page again: Does anyone see a problem with a fresh git checkout, with the fast-promotion-quick-fix.patch applied? (http://www.postgresql.org/message-id/51894942.4080...@vmware.com). If you do, please speak up. As far as I know, the already-committed patch, together with fast-promotion-quick-fix.patch, should fix all known issues (*). I haven't committed a fix for the issue I reported in this thread, because I'm not 100% on what the right fix for it would be. fast-promotion-quick-fix.patch seems to do the trick, but at least the comments need to be updated, and I'm not sure if there some related corner cases that it doesn't handle. Simon? (*) Well, almost. This one is still pending: http://www.postgresql.org/message-id/cab7npqrhucuud012gcb_taafrixx2wion_zfxqcvlurab8d...@mail.gmail.com - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote: > 2013/05/10 20:01 "Amit Kapila" : > > > > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data > > > from WAL stream: > > > > Is there any chance, that there is any network glitch caused this one > time > > error. > > Unix domam sockets are hardly likely to have such troubles. This > test ran within single host. > > > > I'm get confused, the patch seems to me ensureing the "first > > > checkpoint after fast promotion is performed" to use the > > > "correct, new, ThisTimeLineID". > > > > What is your confusion? > > Heikki said in the fist message in this thread that he suspected > the cause of the failure he had seen to be wrong TLI on whitch > checkpointer runs. Nevertheless, the patch you suggested for me > looks fixing it. Moreover (one of?) the failure from the same > cause looks fixed with the patch. There were 2 problems: 1. There was some issue in walsender logic due to which after promotion in some cases it hits assertion or error 2. During fast promotion, checkpoint gets created with wrong TLI He has provided 2 different patches fix-standby-promotion-assert-fail-2.patch and fast-promotion-quick-fix.patch. Among 2, he has already committed fix-standby-promotion-assert-fail-2.patch (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f49 75c99e52984f7ee81b47d137b5b4751) > Is the point of this discussion that the patch may leave out some > glich about timing of timeline-related changing and Heikki saw an > egress of that? AFAIU, the committed patch has some gap in overall scenario which is the fast promotion issue. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
2013/05/10 20:01 "Amit Kapila" : > > > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data > > from WAL stream: > > Is there any chance, that there is any network glitch caused this one time > error. Unix domam sockets are hardly likely to have such troubles. This test ran within single host. > > I'm get confused, the patch seems to me ensureing the "first > > checkpoint after fast promotion is performed" to use the > > "correct, new, ThisTimeLineID". > > What is your confusion? Heikki said in the fist message in this thread that he suspected the cause of the failure he had seen to be wrong TLI on whitch checkpointer runs. Nevertheless, the patch you suggested for me looks fixing it. Moreover (one of?) the failure from the same cause looks fixed with the patch. Is the point of this discussion that the patch may leave out some glich about timing of timeline-related changing and Heikki saw an egress of that? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Friday, May 10, 2013 2:07 PM Kyotaro HORIGUCHI wrote: > Thank you for noticing me of that. > > > It seems to me, it is the same problem as discussed and fixed in > below > > thread. > > http://www.postgresql.org/message-id/51894942.4080...@vmware.com > > > > Could you try with fixes given by heikki. > > The first one settles the timeline transition problem for the > present. Besides, I have no longer found any recovery failure > except unrepeatable one shown below. > > > C 2013-05-10 15:30:29.589 JST 9242 LOG: restarted WAL streaming at > 0/500 on timeline 2 > > B 2013-05-10 15:30:33.789 JST 9233 LOG: restartpoint complete: > (snip.) > > B 2013-05-10 15:30:33.789 JST 9233 LOG: recovery restart point at > 0/328 > > B 2013-05-10 15:30:33.789 JST 9233 LOG: checkpoint starting: > > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data > from WAL stream: Is there any chance, that there is any network glitch caused this one time error. > I'm get confused, the patch seems to me ensureing the "first > checkpoint after fast promotion is performed" to use the > "correct, new, ThisTimeLineID". What is your confusion? With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
Thank you for noticing me of that. > It seems to me, it is the same problem as discussed and fixed in below > thread. > http://www.postgresql.org/message-id/51894942.4080...@vmware.com > > Could you try with fixes given by heikki. The first one settles the timeline transition problem for the present. Besides, I have no longer found any recovery failure except unrepeatable one shown below. > C 2013-05-10 15:30:29.589 JST 9242 LOG: restarted WAL streaming at 0/500 > on timeline 2 > B 2013-05-10 15:30:33.789 JST 9233 LOG: restartpoint complete: (snip.) > B 2013-05-10 15:30:33.789 JST 9233 LOG: recovery restart point at 0/328 > B 2013-05-10 15:30:33.789 JST 9233 LOG: checkpoint starting: > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data from WAL > stream: I'm get confused, the patch seems to me ensureing the "first checkpoint after fast promotion is performed" to use the "correct, new, ThisTimeLineID". At Thu, 9 May 2013 11:10:23 +0530, Amit Kapila wrote in <005501ce4c77$b45f0b70$1d1d2250$@kap...@huawei.com> > Without fast-promotion, it will request/perform End of Recovery checkpoint > while still in recovery (before setting xlogctl->SharedRecoveryInProgress), > So I think before any new operation can start, it can make sure that > Checkpoint with new timeline is performed. > > However with fast promotion, the request for checkpoint is done after > recovery; so some operations can happen before checkpoint with new timeline. > I think it can so happen that last checkpoint is with old timeline and there > are operations with new timeline which might have caused the problem Heikki > has seen. I don't see any problem (in my view :) that something writes WAL runs before the first checkpoint starts - as described in the comment just above the checkpoint request.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Thursday, May 09, 2013 2:14 PM Kyotaro HORIGUCHI wrote: > With printing some additinal logs, the situation should be more > clear.. > > It seems that Sby-B failes to promote to TLI= 2; nevertheless the > history file for TLI = 2 is somehow sent to sby-C. So sby-B > remains on TLI=1 but sby-C solely switches onto TLI=2. > > # Come to think of this, I suspect that the additional logs is not so > useful :( It seems to me, it is the same problem as discussed and fixed in below thread. http://www.postgresql.org/message-id/51894942.4080...@vmware.com Could you try with fixes given by heikki. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
With printing some additinal logs, the situation should be more clear.. It seems that Sby-B failes to promote to TLI= 2; nevertheless the history file for TLI = 2 is somehow sent to sby-C. So sby-B remains on TLI=1 but sby-C solely switches onto TLI=2. # Come to think of this, I suspect that the additional logs is not so useful :( > B 2013-05-09 17:29:53.380 JST 32258 ERROR: server switched off timeline 1 at > 0/53F8B60, but walsender already streamed up to 0/53FA000 > C 2013-05-09 17:29:53.380 JST 32257 FATAL: could not receive data from WAL > stream: ERROR: server switched off timeline 1 at 0/53F8B60, but walsender > already streamed up to 0/53FA000 > > B 2013-05-09 17:29:53.380 JST 32244 LOG: database system is ready to accept > connections .. > C 2013-05-09 17:30:08.395 JST 32256 LOG: Reading page on Timeline ID = 1 > C 2013-05-09 17:30:08.398 JST 32274 LOG: fetching timeline history file for > timeline 2 from primary server > C 2013-05-09 17:30:08.448 JST 32274 LOG: started streaming WAL from primary > at 0/500 on timeline 1 > C 2013-05-09 17:30:08.452 JST 32274 LOG: replication terminated by primary > server > C 2013-05-09 17:30:08.452 JST 32274 DETAIL: End of WAL reached on timeline 1 > at 0/53F8B60 > C 2013-05-09 17:30:08.452 JST 32256 LOG: new target timeline is 2 > C 2013-05-09 17:30:08.452 JST 32256 LOG: Reading page on Timeline ID = 1 > C 2013-05-09 17:30:08.452 JST 32256 LOG: Reading page on Timeline ID = 1 > C 2013-05-09 17:30:08.453 JST 32274 LOG: restarted WAL streaming at > 0/500 on timeline 2 > B 2013-05-09 17:30:10.913 JST 32248 LOG: This checkpoint record is on > TimelineID = 1, loc is about 0/53F8C30 > B 2013-05-09 17:30:10.953 JST 32248 LOG: checkpoint complete: wrote 637 > buffers (3.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; > write=13.502 s, sync=0.105 s, total=13.733 s; sync files=2, longest=0.089 s, > average=0.052 s > B 2013-05-09 17:30:10.953 JST 32248 LOG: checkpoint starting: immediate > force wait > B 2013-05-09 17:30:10.963 JST 32248 LOG: This checkpoint record is on > TimelineID = 1, loc is about 0/53F8CD0 > B 2013-05-09 17:30:11.003 JST 32248 LOG: checkpoint complete: wrote 0 > buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; > write=0.000 s, sync=0.000 s, total=0.049 s; sync files=0, longest=0.000 s, > average=0.000 s > B 2013-05-09 17:30:11.096 JST 32248 LOG: checkpoint starting: immediate > force wait > B 2013-05-09 17:30:11.909 JST 32248 LOG: This checkpoint record is on > TimelineID = 1, loc is about 0/540BEF8 > C 2013-05-09 17:30:11.929 JST 32256 LOG: invalid magic number in log > segment 00010005, offset 4169728 > C 2013-05-09 17:30:11.929 JST 32274 FATAL: terminating walreceiver process > due to administrator command > B 2013-05-09 17:30:11.951 JST 32248 LOG: checkpoint complete: wrote 18 > buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; > write=0.017 s, sync=0.785 s, total=0.855 s; sync files=13, longest=0.235 s, > average=0.060 s > CHECKPOINT > C 2013-05-09 17:30:13.931 JST 32256 LOG: Reading page on Timeline ID = 2 > C 2013-05-09 17:30:13.931 JST 32256 LOG: record with zero length at 0/53F8B90 > C 2013-05-09 17:30:13.931 JST 32256 LOG: Reading page on Timeline ID = 2 > C 2013-05-09 17:30:13.931 JST 32256 LOG: record with zero length at 0/53F8B90 > C 2013-05-09 17:30:18.936 JST 32256 LOG: Reading page on Timeline ID = 2 > C 2013-05-09 17:30:18.936 JST 32256 LOG: Reading page on Timeline ID = 2 > C 2013-05-09 17:30:18.936 JST 32256 LOG: record with zero length at 0/53F8B90 regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
Hello, > I think it can so happen that last checkpoint is with old timeline and there > are operations with new timeline which might have caused the problem Heikki > has seen. I suppose to have seen that. After adding an SQL command to modify the DB on standby-B after passive(propagated?) promotion of standby-C, < pg_ctl -D data-standbyB promote --- > > pg_ctl -D data-standbyB -w promote > sleep 20 > psql -p 5433 postgres -c 'create table t (a int); checkpoint;' I saw following complaint from standby-C. > B 2013-05-09 17:00:06.019 JST 30268 LOG: checkpoint starting: immediate > force wait > C 2013-05-09 17:00:06.211 JST 30277 LOG: invalid magic number in log > segment 00010005, offset 4169728 > C 2013-05-09 17:00:06.211 JST 30295 FATAL: terminating walreceiver process > due to administrator command > B 2013-05-09 17:00:06.219 JST 30268 LOG: checkpoint complete: wrote 18 > buffers (0.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; > write=0.000 s, sync=0.157 s, total=0.199 s; sync files=13, longest=0.041 s, > average=0.012 s > CHECKPOINT > C 2013-05-09 16:41:33.974 JST 29624 LOG: record with zero length at 0/53F8B90 > C 2013-05-09 16:41:33.974 JST 29624 LOG: record with zero length at 0/53F8B90 > C 2013-05-09 16:41:38.980 JST 29624 LOG: record with zero length at 0/53F8B90 .. repeats forever .. This seems to be caused from a kind of timeline inconsistency at first glance.. Explicit checkpoint before the modification on Sby-B does not help. > psql -p 5433 postgres -c 'checkpoint;create table t (a int); checkpoint;' regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Thursday, May 09, 2013 6:29 AM Fujii Masao wrote: > On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas > wrote: > > While testing the bug from the "Assertion failure at standby > promotion", I > > bumped into a different bug in fast promotion. When the first > checkpoint > > after fast promotion is performed, there is no guarantee that the > > checkpointer process is running with the correct, new, > ThisTimeLineID. In > > CreateCheckPoint(), we have this: > > > >> /* > >> * An end-of-recovery checkpoint is created before anyone is > >> allowed to > >> * write WAL. To allow us to write the checkpoint record, > >> temporarily > >> * enable XLogInsertAllowed. (This also ensures > ThisTimeLineID is > >> * initialized, which we need here and in > AdvanceXLInsertBuffer.) > >> */ > >> if (flags & CHECKPOINT_END_OF_RECOVERY) > >> LocalSetXLogInsertAllowed(); > > > > > > That ensures that ThisTimeLineID is updated when performing an > > end-of-recovery checkpoint, but it doesn't get executed with fast > promotion. > > The consequence is that the checkpoint is created with the old > timeline, and > > subsequent recovery from it will fail. > > > > I ran into this with the attached script. It sets up a master (M), a > standby > > (B), and a cascading standby (C). I'm not sure why, but when I tried > to > > simplify the script by removing the cascading standby, it started to > work. > > The bug occurs in standby B, so I'm not sure why the presence of the > > cascading standby makes any difference. Maybe it just affects the > timing. > > Can this really happen? ISTM that the checkpointer should detect that > the recovery mode ends and call RecoveryInProgress()->InitXLOGAccess() > before calling CreateCheckPoint(). Without fast-promotion, it will request/perform End of Recovery checkpoint while still in recovery (before setting xlogctl->SharedRecoveryInProgress), So I think before any new operation can start, it can make sure that Checkpoint with new timeline is performed. However with fast promotion, the request for checkpoint is done after recovery; so some operations can happen before checkpoint with new timeline. I think it can so happen that last checkpoint is with old timeline and there are operations with new timeline which might have caused the problem Heikki has seen. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast promotion failure
On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas wrote: > While testing the bug from the "Assertion failure at standby promotion", I > bumped into a different bug in fast promotion. When the first checkpoint > after fast promotion is performed, there is no guarantee that the > checkpointer process is running with the correct, new, ThisTimeLineID. In > CreateCheckPoint(), we have this: > >> /* >> * An end-of-recovery checkpoint is created before anyone is >> allowed to >> * write WAL. To allow us to write the checkpoint record, >> temporarily >> * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is >> * initialized, which we need here and in AdvanceXLInsertBuffer.) >> */ >> if (flags & CHECKPOINT_END_OF_RECOVERY) >> LocalSetXLogInsertAllowed(); > > > That ensures that ThisTimeLineID is updated when performing an > end-of-recovery checkpoint, but it doesn't get executed with fast promotion. > The consequence is that the checkpoint is created with the old timeline, and > subsequent recovery from it will fail. > > I ran into this with the attached script. It sets up a master (M), a standby > (B), and a cascading standby (C). I'm not sure why, but when I tried to > simplify the script by removing the cascading standby, it started to work. > The bug occurs in standby B, so I'm not sure why the presence of the > cascading standby makes any difference. Maybe it just affects the timing. Can this really happen? ISTM that the checkpointer should detect that the recovery mode ends and call RecoveryInProgress()->InitXLOGAccess() before calling CreateCheckPoint(). Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fast promotion failure
While testing the bug from the "Assertion failure at standby promotion", I bumped into a different bug in fast promotion. When the first checkpoint after fast promotion is performed, there is no guarantee that the checkpointer process is running with the correct, new, ThisTimeLineID. In CreateCheckPoint(), we have this: /* * An end-of-recovery checkpoint is created before anyone is allowed to * write WAL. To allow us to write the checkpoint record, temporarily * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is * initialized, which we need here and in AdvanceXLInsertBuffer.) */ if (flags & CHECKPOINT_END_OF_RECOVERY) LocalSetXLogInsertAllowed(); That ensures that ThisTimeLineID is updated when performing an end-of-recovery checkpoint, but it doesn't get executed with fast promotion. The consequence is that the checkpoint is created with the old timeline, and subsequent recovery from it will fail. I ran into this with the attached script. It sets up a master (M), a standby (B), and a cascading standby (C). I'm not sure why, but when I tried to simplify the script by removing the cascading standby, it started to work. The bug occurs in standby B, so I'm not sure why the presence of the cascading standby makes any difference. Maybe it just affects the timing. - Heikki fast-promotion-bug.sh Description: application/shellscript -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers