Re: Problem while setting the fpw with SIGHUP
On Fri, Sep 28, 2018 at 5:16 PM Amit Kapila wrote: > > On Fri, Sep 28, 2018 at 4:23 AM Michael Paquier wrote: > > > > On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote: > > > Okay, I am planning to commit the attached patch tomorrow unless you > > > or anybody else has any objections to it. > > > > None from here. Thanks for taking care of it. > > > > Thanks, pushed! I have backpatched till 9.5 as this has been > introduced by the commit 2c03216d83 which added the XLOG machinery > initialization in RecoveryInProgress code path. > I have marked the originally reported issue as fixed in the open-items list. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Fri, Sep 28, 2018 at 4:23 AM Michael Paquier wrote: > > On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote: > > Okay, I am planning to commit the attached patch tomorrow unless you > > or anybody else has any objections to it. > > None from here. Thanks for taking care of it. > Thanks, pushed! I have backpatched till 9.5 as this has been introduced by the commit 2c03216d83 which added the XLOG machinery initialization in RecoveryInProgress code path. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 07:38:31PM +0530, Amit Kapila wrote: > Okay, I am planning to commit the attached patch tomorrow unless you > or anybody else has any objections to it. None from here. Thanks for taking care of it. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 6:22 PM Michael Paquier wrote: > > On Thu, Sep 27, 2018 at 04:19:02PM +0530, Amit Kapila wrote: > > I think this is mostly fine, but it seems "if the instance just got > > out of recovery" doesn't fit well because it can happen anytime after > > recovery, this code gets called from checkpointer. I think we can > > slightly tweak it as below: > > "Perform this outside critical section so that the WAL insert > > initialization done by RecoveryInProgress() doesn't trigger an > > assertion failure." > > > > What do you say? > > Fine for me. > Okay, I am planning to commit the attached patch tomorrow unless you or anybody else has any objections to it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_fpwupdate.2.patch Description: Binary data
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 04:19:02PM +0530, Amit Kapila wrote: > I think this is mostly fine, but it seems "if the instance just got > out of recovery" doesn't fit well because it can happen anytime after > recovery, this code gets called from checkpointer. I think we can > slightly tweak it as below: > "Perform this outside critical section so that the WAL insert > initialization done by RecoveryInProgress() doesn't trigger an > assertion failure." > > What do you say? Fine for me. >> Sure, feel free to if you have some room. I am fine to take care of it >> as well, so that's up to you to decide. > > Okay, I will take care of it. Thanks. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 1:32 PM Michael Paquier wrote: > > On Thu, Sep 27, 2018 at 11:18:02AM +0530, Amit Kapila wrote: > > Your proposed solution makes sense to me. IIUC, this is quite similar > > to what Dilip has also proposed [1]. > > Indeed. I would just add with the patch a comment like that: > "Perform this call outside the critical section so as if the instance > just got out of recovery, the upcoming WAL insert initialization does > not trigger an assertion failure." > I think this is mostly fine, but it seems "if the instance just got out of recovery" doesn't fit well because it can happen anytime after recovery, this code gets called from checkpointer. I think we can slightly tweak it as below: "Perform this outside critical section so that the WAL insert initialization done by RecoveryInProgress() doesn't trigger an assertion failure." What do you say? > Sure, feel free to if you have some room. I am fine to take care of it > as well, so that's up to you to decide. Okay, I will take care of it. > Adding a comment like what I > proposed upthread is necessary in my opinion. Agreed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 01:30:13PM +0530, Amit Kapila wrote: > I can take care of committing something along the lines of Dilip's > patch if you are okay. Sure, feel free to if you have some room. I am fine to take care of it as well, so that's up to you to decide. Adding a comment like what I proposed upthread is necessary in my opinion. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 11:18:02AM +0530, Amit Kapila wrote: > Your proposed solution makes sense to me. IIUC, this is quite similar > to what Dilip has also proposed [1]. Indeed. I would just add with the patch a comment like that: "Perform this call outside the critical section so as if the instance just got out of recovery, the upcoming WAL insert initialization does not trigger an assertion failure." If that sounds fine to you, I propose that we commit Dilip's patch with the comment addition. That will take care of (a). -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 11:18 AM Amit Kapila wrote: > > On Thu, Sep 27, 2018 at 10:34 AM Michael Paquier wrote: > > > > On Thu, Sep 27, 2018 at 10:03:59AM +0530, Amit Kapila wrote: > > > I think, in this case, it might be advisable to just fix the problem > > > (a) which is what has been reported originally in the thread and > > > AFAICS, the fix for that is clear as compared to the problem (b). If > > > you agree, then we can discuss what is the best fix for the first > > > problem (a). > > > > Okay, thanks for the input. The fix for (a) would be in my opinion to > > just move the call to RecoveryInProgress() out of the critical section, > > then save the result into a variable, and use the variable within the > > critical section to avoid the potential palloc() problems. What do you > > think? > > > > Your proposed solution makes sense to me. IIUC, this is quite similar > to what Dilip has also proposed [1]. > I can take care of committing something along the lines of Dilip's patch if you are okay. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 10:34 AM Michael Paquier wrote: > > On Thu, Sep 27, 2018 at 10:03:59AM +0530, Amit Kapila wrote: > > I think, in this case, it might be advisable to just fix the problem > > (a) which is what has been reported originally in the thread and > > AFAICS, the fix for that is clear as compared to the problem (b). If > > you agree, then we can discuss what is the best fix for the first > > problem (a). > > Okay, thanks for the input. The fix for (a) would be in my opinion to > just move the call to RecoveryInProgress() out of the critical section, > then save the result into a variable, and use the variable within the > critical section to avoid the potential palloc() problems. What do you > think? > Your proposed solution makes sense to me. IIUC, this is quite similar to what Dilip has also proposed [1]. [1] - https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 27, 2018 at 10:03:59AM +0530, Amit Kapila wrote: > I think, in this case, it might be advisable to just fix the problem > (a) which is what has been reported originally in the thread and > AFAICS, the fix for that is clear as compared to the problem (b). If > you agree, then we can discuss what is the best fix for the first > problem (a). Okay, thanks for the input. The fix for (a) would be in my opinion to just move the call to RecoveryInProgress() out of the critical section, then save the result into a variable, and use the variable within the critical section to avoid the potential palloc() problems. What do you think? -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Wed, Sep 19, 2018 at 10:50 AM Michael Paquier wrote: > > Agreed. "If we need to do that in the start process," we need to > > change the shared flag and issue FPW_CHANGE always when the > > database state is different from configuration file, regardless > > of what StartXLOG() did until the point. > > Definitely my mistake here. Attached is a patch to show what I have in > mind by making sure that the startup process generates a record even > after switching full_page_writes when started normally. This removes > the condition based on InRecovery, and uses a new argument for > UpdateFullPageWrites() instead. Your test case,as well as what I do > manually for testing, pass without triggering the assertion. > This will fix the previous problem reported by me but will lead to some other behavior change. If somebody toggles the full_page_writes flag before crash recovery, then it will log the XLOG_FPW_CHANGE record, but that was not the case without your patch. > When I see your patch, I actually see the same kind of logic as what I > propose which is summarized in two points, and that's a good thing: > a) Avoid the allocation in the critical section. > b) Avoid two processes to call UpdateFullPageWrites at the same time. I think, in this case, it might be advisable to just fix the problem (a) which is what has been reported originally in the thread and AFAICS, the fix for that is clear as compared to the problem (b). If you agree, then we can discuss what is the best fix for the first problem (a). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Tue, Sep 18, 2018 at 12:46 PM Kyotaro HORIGUCHI wrote: > > At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila > wrote in > > On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier > > wrote: > > > > > > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote: > > > > /* > > > > * Properly accept or ignore signals the postmaster might send us. > > > > */ > > > > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config > > > > file */ > > > > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > > > > > > > I am finally coming back to this patch set, and that's one of the first > > > > things I am going to help moving on for this CF. And this bit from the > > > > last patch series is not acceptable as there are some parameters which > > > > are used by the startup process which can be reloaded. One of them is > > > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. > > > > > > So, I have been working on this problem again and I have reviewed the > > > thread, and there have been many things discussed in the last couple of > > > months: > > > 1) We do not want to initialize XLogInsert stuff unconditionally for all > > > processes at the moment recovery begins, but we just want to initialize > > > it once WAL write is open for business. > > > 2) Both the checkpointer and the startup process can call > > > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get > > > incorrect values. > > > > Can you share the steps to reproduce this problem? > > The window for the issue is small. > > It happens if checkpointer first looks SharedRecoveryInProgress > is turned off at the beginning of the CheckPointerMain loop. > The window is needed be widen to make sure the issue happens. > > Applying the first patch attched, the following steps will cause > the issue with high reliability. > > 1. initdb (full_page_writes = on by default) > > 2. put recovery.conf so that server can accept promote signal > > 3. touch /tmp/hoge >change full_page_write to off in postgresql.conf > > 4. pg_ctl_promote > > The attached second file is a script do the above steps. > Server writes the following log message and die. > > | 2018-09-18 13:55:49.928 JST [16405] LOG: database system is ready to > accept connections > | TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731) > | 2018-09-18 13:55:50.546 JST [16405] LOG: checkpointer process (PID 16407) > was terminated by signal 6: Aborted > > We can preallocating the XLogInsert buffer just to prevent the > crash. This is done by calling RecoveryInProgress() before > UpdateFullPageWrites() finds it turned to false. This is an > isolated problem. > Yes, till this point the problem is quite clear and can be reproduced, however, my question was for the next part for which there doesn't seem to be a concrete test case. > But it has another issue that FPW_CHANGE record > can be missing or wrong FPW state after recovery end. > > It comes from the fact that responsibility to update the flag is > not atomically passed from startup to checkpointer. (The window > is after UpdateFullPageWrites() call and until setting > SharedRecoveryInProgress to false.) > I understand your concern about missing FPW_CHANGE WAL record during the above window but is that really a problem because till the promotion is complete, it is not expected that we write any WAL record. Now, the other part of the problem you mentioned is "wrong FPW state" which can happen because we don't read Insert->fullPageWrites under lock. If you are concerned about that then I think we can solve it with a much less invasive change. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Wed, Sep 19, 2018 at 02:20:34PM +0900, Michael Paquier wrote: > On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote: >> My latest patch tries to remove the window by imposing all >> responsibility to apply config file changes to the shared FPW >> flag on the checkpointer. RecoveryInProgress() is changed to be >> called prior to UpdateFullPageWrites on the way doing that. > > I still need to look at that in details. That may be better than what I > am proposing. At quick glance what I propose is more simple, and does > not need enforcing a checkpoint using SIGINT. I have finally looked at the patch set from Horiguchi-san here: https://www.postgresql.org/message-id/20180828.193436.253621888.horiguchi.kyot...@lab.ntt.co.jp And actually this is very close to what my proposal does, except for a couple of caveats. Here are my notes: 1) The issue with palloc() happening in critical sections is able to go away, by making the logic behind UpdateFullPageWrites() more complicated than necessary. With the proposed patch, UpdateFullPageWrites() never gets called by the checkpointer until the startup process has done its business. I would have believed that keeping the check to RecoveryInProgress() directly in UpdateFullPageWrites() makes the logic more simple. That's actually what my proposal does. With your patch, it would be even possible to add an assertion so as this never gets called while in recovery. 2) At the end of recovery, if there is a crash before the checkpointer is able to update the shared parameters it needs to work on away, then no XLOG_CHANGE_PARAMETER record is generated. This is not a problem for normal cases, but there is one scenario where this is a problem: at the end of recovery if the bgwriter is not started, then the startup process creates a checkpoint by itself, and it would miss the record generation. 3) SIGINT is abused of, in such a way that the checkpointer may generate two checkpoints where only one is needed post-recovery. 4) SyncRepUpdateSyncStandbysDefined() would get called even without SIGHUP being reached. This feels also like a future trap waiting to bite as well. When I see your patch, I actually see the same kind of logic as what I propose which is summarized in two points, and that's a good thing: a) Avoid the allocation in the critical section. b) Avoid two processes to call UpdateFullPageWrites at the same time. Now the points mentioned above make what you are proposing weaker in my opinion, and 2) is an actual bug, side effect of the proposed patch. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Tue, Sep 18, 2018 at 04:15:42PM +0900, Kyotaro HORIGUCHI wrote: > My latest patch tries to remove the window by imposing all > responsibility to apply config file changes to the shared FPW > flag on the checkpointer. RecoveryInProgress() is changed to be > called prior to UpdateFullPageWrites on the way doing that. I still need to look at that in details. That may be better than what I am proposing. At quick glance what I propose is more simple, and does not need enforcing a checkpoint using SIGINT. > InRecoery is turned off after the last UpdateFullPageWrite() and > before SharedRecoveryInProgress is turned off. So it is still > leaving the window. Thus, checkpointer stops flipping the value > before SharedRecoveryInProgress's turning off. (I don't > understand InRecovery condition..) However, this lets > checkpointer omit reloading after UpdateFullPagesWrite() in > startup until SharedRecoveryInProgress is tunred off. That won't matter in this case, as RecoveryInProgress() gets called out of the critical section in the previous patch I sent, so there is no failure. Let's not forget that the first issue is the allocation in the critical section. The second issue is that UpdateFullPageWrites may be called concurrently across multiple processes, which is not something it is designed for. The second issue gets addressed in my proposal my making sure that the checkpointer never tries to update full_page_writes by himself until recovery has finished, and that the startup process is the only updater once recovery ends. > Agreed. "If we need to do that in the start process," we need to > change the shared flag and issue FPW_CHANGE always when the > database state is different from configuration file, regardless > of what StartXLOG() did until the point. Definitely my mistake here. Attached is a patch to show what I have in mind by making sure that the startup process generates a record even after switching full_page_writes when started normally. This removes the condition based on InRecovery, and uses a new argument for UpdateFullPageWrites() instead. Your test case,as well as what I do manually for testing, pass without triggering the assertion. + /* DEBUG: cause a reload */ + { + struct stat b; + if (stat("/tmp/hoge", &b) == 0) + { + elog(LOG, "STARTUP SLEEP FOR 1s"); + sleep(1); + elog(LOG, "DONE."); + DirectFunctionCall1(pg_reload_conf, 0); + } + } The way you patch the backend this way is always nice to see so as it is easy to reproduce bugs, and it actually helps in reproducing the assertion failure correctly ;) -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3025d0badb..079e1814c1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7719,10 +7719,15 @@ StartupXLOG(void) * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE * record before resource manager writes cleanup WAL records or checkpoint * record is written. + * + * It is safe to check the shared full_page_writes without the lock, + * because there is no concurrently running process able to update it. + * The only other process able to update full_page_writes is the + * checkpointer, still it is unable to do so until recovery finishes. */ Insert->fullPageWrites = lastFullPageWrites; LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); + UpdateFullPageWrites(true); LocalXLogInsertAllowed = -1; if (InRecovery) @@ -9693,14 +9698,28 @@ XLogReportParameters(void) * Update full_page_writes in shared memory, and write an * XLOG_FPW_CHANGE record if necessary. * - * Note: this function assumes there is no other process running - * concurrently that could update it. + * Note: this can be called from the checkpointer, or the startup process + * at the end of recovery. One could think that this routine should be + * careful with its lock handling, however this is a no-op for the + * checkpointer until the startup process marks the end of recovery, + * so only one of them can do the work of this routine at once. */ void -UpdateFullPageWrites(void) +UpdateFullPageWrites(bool force) { XLogCtlInsert *Insert = &XLogCtl->Insert; + /* + * Check if recovery is still in progress before entering this critical + * section, as some memory allocation could happen at the end of + * recovery. There is nothing to do for a system still in recovery. + * Note that the caller may still want to enforce an update to happen. + * One such example is the startup process, which updates full_page_writes + * at the end of recovery. + */ + if (RecoveryInProgress() && !force) + return; + /* * Do nothing if full_page_writes has not been changed. * @@ -9731,7 +9750,7 @@ UpdateFullPageWrites(void) * Write an XLOG_FPW_CHANGE record. This allows us to keep track of * full_page_writes during archive recovery, if required. */ - if (XL
Re: Problem while setting the fpw with SIGHUP
At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila wrote in > On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier wrote: > > > > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote: > > > /* > > > * Properly accept or ignore signals the postmaster might send us. > > > */ > > > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file > > > */ > > > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > > > > > I am finally coming back to this patch set, and that's one of the first > > > things I am going to help moving on for this CF. And this bit from the > > > last patch series is not acceptable as there are some parameters which > > > are used by the startup process which can be reloaded. One of them is > > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. > > > > So, I have been working on this problem again and I have reviewed the > > thread, and there have been many things discussed in the last couple of > > months: > > 1) We do not want to initialize XLogInsert stuff unconditionally for all > > processes at the moment recovery begins, but we just want to initialize > > it once WAL write is open for business. > > 2) Both the checkpointer and the startup process can call > > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get > > incorrect values. > > Can you share the steps to reproduce this problem? The window for the issue is small. It happens if checkpointer first looks SharedRecoveryInProgress is turned off at the beginning of the CheckPointerMain loop. The window is needed be widen to make sure the issue happens. Applying the first patch attched, the following steps will cause the issue with high reliability. 1. initdb (full_page_writes = on by default) 2. put recovery.conf so that server can accept promote signal 3. touch /tmp/hoge change full_page_write to off in postgresql.conf 4. pg_ctl_promote The attached second file is a script do the above steps. Server writes the following log message and die. | 2018-09-18 13:55:49.928 JST [16405] LOG: database system is ready to accept connections | TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731) | 2018-09-18 13:55:50.546 JST [16405] LOG: checkpointer process (PID 16407) was terminated by signal 6: Aborted We can preallocating the XLogInsert buffer just to prevent the crash. This is done by calling RecoveryInProgress() before UpdateFullPageWrites() finds it turned to false. This is an isolated problem. But it has another issue that FPW_CHANGE record can be missing or wrong FPW state after recovery end. It comes from the fact that responsibility to update the flag is not atomically passed from startup to checkpointer. (The window is after UpdateFullPageWrites() call and until setting SharedRecoveryInProgress to false.) My latest patch tries to remove the window by imposing all responsibility to apply config file changes to the shared FPW flag on the checkpointer. RecoveryInProgress() is changed to be called prior to UpdateFullPageWrites on the way doing that. > > 3) We do not want a palloc() in a critical section because of > > RecoveryinProgress being called. > > > > And the root issue here is 2), because the checkpointer tries to update > > Insert->fullPageWrites but it does not need to do so until recovery has > > been finished. So in order to fix the original issue I am proposing a > > simple fix: let's make sure that the checkpointer does not update > > Insert->fullPageWrites until recovery finishes, and let's have the > > startup process do the first update once it finishes recovery and > > inserts by itself the XLOG_PARAMETER_CHANGE. This way the order of > > events is purely sequential and we don't need to worry about having the > > checkpointer and the startup process eat on each other's plate because > > the checkpointer would only try to work on updating the shared memory > > value of full_page_writes once SharedRecoveryInProgress is switched to > > true, and that happens after the startup process does its initial call > > to UpdateFullPageWrites(). I have improved as well all the comments > > around to make clear the behavior wanted. > > > > Thoughts? InRecoery is turned off after the last UpdateFullPageWrite() and before SharedRecoveryInProgress is turned off. So it is still leaving the window. Thus, checkpointer stops flipping the value before SharedRecoveryInProgress's turning off. (I don't understand InRecovery condition..) However, this lets checkpointer omit reloading after UpdateFullPagesWrite() in startup until SharedRecoveryInProgress is tunred off. > UpdateFullPageWrites(void) > { > XLogCtlInsert *Insert = &XLogCtl->Insert; > + /* > + * Check if recovery is still in progress before entering this critical > + * section, as some memory allocation could happen at the end of > + * recovery. There is nothing to do for a system still in recovery. > + * Note that we need to process things here at
Re: Problem while setting the fpw with SIGHUP
On Tue, Sep 18, 2018 at 01:06:09PM +0900, Kyotaro HORIGUCHI wrote: > I was wrong here. It was handled in HandleStartupProcInterrupts > called from StartupXLOG. So, it should be just removed from the > set. Sorry for the bogus patch. Thanks for confirming. Still, it looks like a waste to abuse on SIGINT just to forcibly wake up the checkpointer and request from it a checkpoint... And you could just have used a new parameter for the checkpointer appended with CHECKPOINT_FORCE. I think that my approach of just making the set of events purely ordered will save from any kind of race conditions, while I suspect that what you propose here does not close all the holes. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Fri, Sep 14, 2018 at 04:30:37PM +0530, Amit Kapila wrote: > On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier wrote: >> So, I have been working on this problem again and I have reviewed the >> thread, and there have been many things discussed in the last couple of >> months: >> 1) We do not want to initialize XLogInsert stuff unconditionally for all >> processes at the moment recovery begins, but we just want to initialize >> it once WAL write is open for business. >> 2) Both the checkpointer and the startup process can call >> UpdateFullPageWrites() which can cause Insert->fullPageWrites to get >> incorrect values. > > Can you share the steps to reproduce this problem? This refers to the first problem reported on this thread: https://www.postgresql.org/message-id/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com In order to reproduce the problem, you can for example stop the server in immediate mode. Then attach a debugger to it and add a breakpoint to UpdateFullPageWrites. You can check that XLOG insert has not been initialized yet by looking at xloginsert_cxt ot ThisTimeLineID. On a second session, switch full_page_writes to on or off, reload the parameters and then trigger a checkpoint. The important point is to trigger an inconsistency between XLogCtl->Insert->fullPageWrites and the value of fullPageWrites within the checkpointer context. With the checkpoint triggered, the debugger will stop at UpdateFullPageWrites immediately. At this point, you can simply check if fullPageWrites Insert->fullPageWrites have the same value or a different one. If the values match, simply switch full_page_writes and reload again, with the checkpointer still waiting at the beginning of UpdateFullPageWrites. SIGHUP will make the checkpointer process hang a bit, and then it will move on. At this point you will be able to see the failure: TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731) 2018-09-18 15:06:39 JST [7396]: [11-1] db=,user=,app=,client= LOG: checkpointer process (PID 7399) was terminated by signal 6: Aborted > On a regular startup when there is no recovery, it won't allow us to > log the WAL record (XLOG_FPW_CHANGE) which can happen without above > change. You can check that by setting full_page_writes=off and start > the system. Oh, good point, InRecovery is set to false in this case so that would be skipped. We can simply fix that by adding a flag, say "force" to UpdateFullPageWrites to allow a process to enforce the update of FPW even if RecoveryInProgress returns true, which would be the case for the startup process. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
Hello. At Tue, 18 Sep 2018 11:38:50 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180918.113850.164570138.horiguchi.kyot...@lab.ntt.co.jp> > At Thu, 6 Sep 2018 16:37:28 -0700, Michael Paquier > wrote in <20180906233728.gr2...@paquier.xyz> > > I am finally coming back to this patch set, and that's one of the first > > things I am going to help moving on for this CF. And this bit from the > > last patch series is not acceptable as there are some parameters which > > are used by the startup process which can be reloaded. One of them is > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. > > The third patch actually is not mandatory in the patch set. The > only motive of that is it doesn't nothing. The handler for SIGHUP > just sets got_SIGHUP then wakes up the process, and the variable > is not looked up by no one. If you mind the change, it can be > removed with no side effect. I was wrong here. It was handled in HandleStartupProcInterrupts called from StartupXLOG. So, it should be just removed from the set. Sorry for the bogus patch. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
At Thu, 6 Sep 2018 16:37:28 -0700, Michael Paquier wrote in <20180906233728.gr2...@paquier.xyz> > On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote: > > Thanks for prompting. The difference is in a comment and I'm fine > > with the change. > > /* > * Properly accept or ignore signals the postmaster might send us. > */ > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > I am finally coming back to this patch set, and that's one of the first > things I am going to help moving on for this CF. And this bit from the > last patch series is not acceptable as there are some parameters which > are used by the startup process which can be reloaded. One of them is > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. The third patch actually is not mandatory in the patch set. The only motive of that is it doesn't nothing. The handler for SIGHUP just sets got_SIGHUP then wakes up the process, and the variable is not looked up by no one. If you mind the change, it can be removed with no side effect. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier wrote: > > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote: > > /* > > * Properly accept or ignore signals the postmaster might send us. > > */ > > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ > > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > > > I am finally coming back to this patch set, and that's one of the first > > things I am going to help moving on for this CF. And this bit from the > > last patch series is not acceptable as there are some parameters which > > are used by the startup process which can be reloaded. One of them is > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. > > So, I have been working on this problem again and I have reviewed the > thread, and there have been many things discussed in the last couple of > months: > 1) We do not want to initialize XLogInsert stuff unconditionally for all > processes at the moment recovery begins, but we just want to initialize > it once WAL write is open for business. > 2) Both the checkpointer and the startup process can call > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get > incorrect values. Can you share the steps to reproduce this problem? > 3) We do not want a palloc() in a critical section because of > RecoveryinProgress being called. > > And the root issue here is 2), because the checkpointer tries to update > Insert->fullPageWrites but it does not need to do so until recovery has > been finished. So in order to fix the original issue I am proposing a > simple fix: let's make sure that the checkpointer does not update > Insert->fullPageWrites until recovery finishes, and let's have the > startup process do the first update once it finishes recovery and > inserts by itself the XLOG_PARAMETER_CHANGE. This way the order of > events is purely sequential and we don't need to worry about having the > checkpointer and the startup process eat on each other's plate because > the checkpointer would only try to work on updating the shared memory > value of full_page_writes once SharedRecoveryInProgress is switched to > true, and that happens after the startup process does its initial call > to UpdateFullPageWrites(). I have improved as well all the comments > around to make clear the behavior wanted. > > Thoughts? > UpdateFullPageWrites(void) { XLogCtlInsert *Insert = &XLogCtl->Insert; + /* + * Check if recovery is still in progress before entering this critical + * section, as some memory allocation could happen at the end of + * recovery. There is nothing to do for a system still in recovery. + * Note that we need to process things here at the end of recovery for + * the startup process, which is why this checks after InRecovery. + */ + if (RecoveryInProgress() && !InRecovery) + return; + On a regular startup when there is no recovery, it won't allow us to log the WAL record (XLOG_FPW_CHANGE) which can happen without above change. You can check that by setting full_page_writes=off and start the system. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote: > /* > * Properly accept or ignore signals the postmaster might send us. > */ > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > I am finally coming back to this patch set, and that's one of the first > things I am going to help moving on for this CF. And this bit from the > last patch series is not acceptable as there are some parameters which > are used by the startup process which can be reloaded. One of them is > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. So, I have been working on this problem again and I have reviewed the thread, and there have been many things discussed in the last couple of months: 1) We do not want to initialize XLogInsert stuff unconditionally for all processes at the moment recovery begins, but we just want to initialize it once WAL write is open for business. 2) Both the checkpointer and the startup process can call UpdateFullPageWrites() which can cause Insert->fullPageWrites to get incorrect values. 3) We do not want a palloc() in a critical section because of RecoveryinProgress being called. And the root issue here is 2), because the checkpointer tries to update Insert->fullPageWrites but it does not need to do so until recovery has been finished. So in order to fix the original issue I am proposing a simple fix: let's make sure that the checkpointer does not update Insert->fullPageWrites until recovery finishes, and let's have the startup process do the first update once it finishes recovery and inserts by itself the XLOG_PARAMETER_CHANGE. This way the order of events is purely sequential and we don't need to worry about having the checkpointer and the startup process eat on each other's plate because the checkpointer would only try to work on updating the shared memory value of full_page_writes once SharedRecoveryInProgress is switched to true, and that happens after the startup process does its initial call to UpdateFullPageWrites(). I have improved as well all the comments around to make clear the behavior wanted. Thoughts? -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3025d0badb..69912e6a22 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7719,6 +7719,11 @@ StartupXLOG(void) * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE * record before resource manager writes cleanup WAL records or checkpoint * record is written. + * + * It is safe to check the shared full_page_writes without the lock, + * because there is no concurrently running process able to update it. + * The only other process able to update full_page_writes is the + * checkpointer, still it is unable to do so until recovery finishes. */ Insert->fullPageWrites = lastFullPageWrites; LocalSetXLogInsertAllowed(); @@ -9693,14 +9698,27 @@ XLogReportParameters(void) * Update full_page_writes in shared memory, and write an * XLOG_FPW_CHANGE record if necessary. * - * Note: this function assumes there is no other process running - * concurrently that could update it. + * Note: this can be called from the checkpointer, or the startup process + * at the end of recovery. One could think that this routine should be + * careful with its lock handling, however this is a no-op for the + * checkpointer until the startup process marks the end of recovery, + * so only one of them can do the work of this routine at once. */ void UpdateFullPageWrites(void) { XLogCtlInsert *Insert = &XLogCtl->Insert; + /* + * Check if recovery is still in progress before entering this critical + * section, as some memory allocation could happen at the end of + * recovery. There is nothing to do for a system still in recovery. + * Note that we need to process things here at the end of recovery for + * the startup process, which is why this checks after InRecovery. + */ + if (RecoveryInProgress() && !InRecovery) + return; + /* * Do nothing if full_page_writes has not been changed. * @@ -9731,7 +9749,7 @@ UpdateFullPageWrites(void) * Write an XLOG_FPW_CHANGE record. This allows us to keep track of * full_page_writes during archive recovery, if required. */ - if (XLogStandbyInfoActive() && !RecoveryInProgress()) + if (XLogStandbyInfoActive()) { XLogBeginInsert(); XLogRegisterData((char *) (&fullPageWrites), sizeof(bool)); signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Sep 13, 2018 at 04:38:30PM +0530, Amit Kapila wrote: > So, the problem started appearing after some rearrangement of code in > both the above-mentioned commits. I verified that this problem > doesn't exist in versions <=9.4, so backpatch-through 9.5. Thanks Amit for taking care of this first problem. I am going to send another patch which is able to take care of concurrent updates of Insert->fullPageWrites for the checkpointer and the startup process to fix the original issue reported by Dilip Kumar, so as we are able to close definitely the loop on this thread. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Mon, Sep 10, 2018 at 11:54 AM Amit Kapila wrote: > > Thanks, but what I wanted you to verify is the commit that broke it in > 9.5. On again looking at it, I think it is below code in commit > 2076db2aea that caused this problem. If possible, can you once test > it before and at this commit or at least do the manual review of same > to cross-verify? > I have myself investigated this further and found that this problem started occuring due to code rearrangement in commits 2c03216d83 and 2076db2aea. In commit 2076db2aea, below check for comparing the old and new value for fullPageWrites got changed: > - if ((Insert->fullPageWrites || Insert->forcePageWrites) && > !doPageWrites) > + if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && > doPageWrites) > { However, it alone didn't lead to this problem because XLogRecordAssemble() gives the valid value of fpw_lsn irrespective of whether full-page-writes was enabled or not. Later in commit 2c03216d83, we changed XLogRecordAssemble() such that it will give the valid value of fpw_lsn only if doPageWrites is enabled, basically this part of the commit: + /* Determine if this block needs to be backed up */ + if (regbuf->flags & REGBUF_FORCE_IMAGE) + needs_backup = true; + else if (regbuf->flags & REGBUF_NO_IMAGE) + needs_backup = false; + else if (!doPageWrites) + needs_backup = false; + else { - /* Simple data, just include it */ - len += rdt->len; + /* + * We assume page LSN is first data on *every* page that can be + * passed to XLogInsert, whether it has the standard page layout + * or not. + */ + XLogRecPtr page_lsn = PageGetLSN(regbuf->page); + + needs_backup = (page_lsn <= RedoRecPtr); + if (!needs_backup) + { + if (*fpw_lsn == InvalidXLogRecPtr || page_lsn < *fpw_lsn) + *fpw_lsn = page_lsn; + } } So, the problem started appearing after some rearrangement of code in both the above-mentioned commits. I verified that this problem doesn't exist in versions <=9.4, so backpatch-through 9.5. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Tue, Aug 28, 2018 at 4:05 PM Kyotaro HORIGUCHI wrote: > > Hello. > > At Sat, 25 Aug 2018 14:50:53 +0530, Amit Kapila > wrote in > > On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI > > wrote: > > > > > > Thank you, Amit, Michael. > > > > > > > Can you verify the first patch that I have posted above [1]? We can > > commit it separately. > > Thanks for prompting. The difference is in a comment and I'm fine > with the change. > Thanks, but what I wanted you to verify is the commit that broke it in 9.5. On again looking at it, I think it is below code in commit 2076db2aea that caused this problem. If possible, can you once test it before and at this commit or at least do the manual review of same to cross-verify? + doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites); - /* -* Also check to see if fullPageWrites or forcePageWrites was just turned -* on; if we weren't already doing full-page writes then go back and -* recompute. (If it was just turned off, we could recompute the record -* without full pages, but we choose not to bother.) -*/ - if ((Insert->fullPageWrites || Insert->forcePageWrites) && !doPageWrites) + if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites) { - /* Oops, must redo it with full-page data. */ + /* +* Oops, some buffer now needs to be backed up that the caller +* didn't back up. Start over. +*/ WALInsertLockRelease(); END_CRIT_SECTION(); - rdt_lastnormal->next = NULL; - info = info_orig; - goto begin; + return InvalidXLogRecPtr; } -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote: > Thanks for prompting. The difference is in a comment and I'm fine > with the change. /* * Properly accept or ignore signals the postmaster might send us. */ - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ I am finally coming back to this patch set, and that's one of the first things I am going to help moving on for this CF. And this bit from the last patch series is not acceptable as there are some parameters which are used by the startup process which can be reloaded. One of them is wal_retrieve_retry_interval for tuning when fetching WAL at recovery. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
Hello. At Sat, 25 Aug 2018 14:50:53 +0530, Amit Kapila wrote in > On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI > wrote: > > > > Thank you, Amit, Michael. > > > > Can you verify the first patch that I have posted above [1]? We can > commit it separately. Thanks for prompting. The difference is in a comment and I'm fine with the change. > > It's a long time ago.. Let me have a bit of time to blow dust off.. > > > > https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp > > > > ...done..i working.. > > > > The original problem here was UpdateFullPageWrites() can call > > RecoveryInProgress(), which does palloc in a critical > > section. This happens when standy is commanded to reload with > > change of full_pages_writes during recovery. > > > > AFAIU from the original problem reported by Dilip, it can happen > during checkpoint without standby as well. Yes, standby is not needed but archive (streaming) recovery is required to start checkpointer. > > While exploring it, I found that update of fullPageWrite flag is > > updated concurrently against its design. A race condition happens > > in the following steps in my diagnosis. > > > > 1. While the startup process is running recovery, we didn't > > consider that checkpointer may be running concurrently, but it > > happens for standby. > > > > 2. Checkpointer considers itself (or designed) as the *only* > > updator of shared config including fillPageWrites. In reality > > the startup is another concurent updator on standby. Addition to > > that, checkpointer assumes that it is started under WAL-emitting > > state, that is, InitXLogInsert() has been called elsewhere, but > > it is not the case on standby. > > > > Note that checkpointer mustn't update FPW flag before recovery > > ends because startup will overrides the flag. > > > > 3. As the result, when standby gets full_page_writes changed and > > SIGHUP during recovery, checkpointer tries to update FPW flag > > and calls RecoveryInProgress() on the way. bang! > > > > > > With the 0002-step1.patch, checkpointer really becomes the only > > updator of the FPW flag after recovery ends. StartupXLOG() > > updates it just once before checkpointer starts to update it. > > > > - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE > - * record before resource manager writes cleanup WAL records or checkpoint > - * record is written. > + * Update full_page_writes in shared memory with the lastest value before > + * resource manager writes cleanup WAL records or checkpoint record is > + * written. We don't need to write XLOG_FPW_CHANGE since this just > + * reflects the status at the last redo'ed record. No lock is required > + * since startup is the only updator of the flag at this > + * point. Checkpointer will take over after SharedRecoveryInProgress is > + * turned to false. > */ > Insert->fullPageWrites = lastFullPageWrites; > - LocalSetXLogInsertAllowed(); > - UpdateFullPageWrites(); > - LocalXLogInsertAllowed = -1; > > lastFullPageWrites will contain the latest value among the replayed > WAL records. It can still be different fullPageWrites which is > updated by UpdateFullPageWrites(). So, I don't think it is advisable > to remove it and rely on checkpointer to update it. I think when it > is called from this code path, it will anyway not write > XLOG_FPW_CHANGE because of the !RecoveryInProgress() check. Unfortunately startup doesn't process reloads by SIGHUP. So just letting startup process set sharedFPW doesn't work correctly. I don't think reload during redo loop will be apparently safe. Forcibly reloading config without SIGHUP just before UpdateFullPageWrites() in StartupXLOG breaks the reload semantics. # The comment for StartupPorcessSigHagndler is wrong in the sense.. > > Under the normal(slow?) promotion mode, checkpointer is woken up > > explicitly to make the config setting effective as soon as > > possible. (This might be unnecessary.) > > > > I am not sure this is the right approach. If we are worried about > concurrency issues, then we can use lock to protect it. Since only checkpointer knows the right current value of the flag, it is responsible for the final (just after recovery end) setup of the flag. Actually we don't need to wake up checkpoiner as soon as the end of recovery, but it must UpdateFullPageWrites() before the first checkpoint starts without receiving SIGHUP. > > In checkpointer, RecoveryInProgress() is called preior to > > UpdateFPW() to disable update FPW during recovery, so the crash > > that is the issue here is fixed. > > > > It seems to me that you are trying to resolve two different problems > in the same patch. I think we can have a patch to deal with > concurrency issue if any and a separate patch to call > RecoveryInProgress outside critical section. Hmm. Perhaps right. The change of UpdateShareMemoryConfig alone resolves the initial issue and others are needed to have t
Re: Problem while setting the fpw with SIGHUP
On Wed, Aug 1, 2018 at 12:56 PM Kyotaro HORIGUCHI wrote: > > Thank you, Amit, Michael. > Can you verify the first patch that I have posted above [1]? We can commit it separately. > > It's a long time ago.. Let me have a bit of time to blow dust off.. > > https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp > > ...done..i working.. > > The original problem here was UpdateFullPageWrites() can call > RecoveryInProgress(), which does palloc in a critical > section. This happens when standy is commanded to reload with > change of full_pages_writes during recovery. > AFAIU from the original problem reported by Dilip, it can happen during checkpoint without standby as well. > While exploring it, I found that update of fullPageWrite flag is > updated concurrently against its design. A race condition happens > in the following steps in my diagnosis. > > 1. While the startup process is running recovery, we didn't > consider that checkpointer may be running concurrently, but it > happens for standby. > > 2. Checkpointer considers itself (or designed) as the *only* > updator of shared config including fillPageWrites. In reality > the startup is another concurent updator on standby. Addition to > that, checkpointer assumes that it is started under WAL-emitting > state, that is, InitXLogInsert() has been called elsewhere, but > it is not the case on standby. > > Note that checkpointer mustn't update FPW flag before recovery > ends because startup will overrides the flag. > > 3. As the result, when standby gets full_page_writes changed and > SIGHUP during recovery, checkpointer tries to update FPW flag > and calls RecoveryInProgress() on the way. bang! > > > With the 0002-step1.patch, checkpointer really becomes the only > updator of the FPW flag after recovery ends. StartupXLOG() > updates it just once before checkpointer starts to update it. > - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. + * Update full_page_writes in shared memory with the lastest value before + * resource manager writes cleanup WAL records or checkpoint record is + * written. We don't need to write XLOG_FPW_CHANGE since this just + * reflects the status at the last redo'ed record. No lock is required + * since startup is the only updator of the flag at this + * point. Checkpointer will take over after SharedRecoveryInProgress is + * turned to false. */ Insert->fullPageWrites = lastFullPageWrites; - LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; lastFullPageWrites will contain the latest value among the replayed WAL records. It can still be different fullPageWrites which is updated by UpdateFullPageWrites(). So, I don't think it is advisable to remove it and rely on checkpointer to update it. I think when it is called from this code path, it will anyway not write XLOG_FPW_CHANGE because of the !RecoveryInProgress() check. > Under the normal(slow?) promotion mode, checkpointer is woken up > explicitly to make the config setting effective as soon as > possible. (This might be unnecessary.) > I am not sure this is the right approach. If we are worried about concurrency issues, then we can use lock to protect it. > In checkpointer, RecoveryInProgress() is called preior to > UpdateFPW() to disable update FPW during recovery, so the crash > that is the issue here is fixed. > It seems to me that you are trying to resolve two different problems in the same patch. I think we can have a patch to deal with concurrency issue if any and a separate patch to call RecoveryInProgress outside critical section. [1] - https://www.postgresql.org/message-id/CAA4eK1JvKxsHfM6GCHoKNas-7vDSniLgaAm%3DcG_OaQaOYRNc3w%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
Thank you, Amit, Michael. At Sun, 29 Jul 2018 08:19:11 +0900, Michael Paquier wrote in <20180728231911.gb1...@paquier.xyz> > On Sat, Jul 28, 2018 at 07:10:24PM +0530, Amit Kapila wrote: > > I have just responded to your first patch (0001). Can you once again > > summarize what the 0002 exactly accomplishes? I think one of the > > goals is to fix the original problem reported in this thread and other > > is you have found the concurrency issue. Is it possible to have > > separate patches for those or you think they are interrelated and > > needs to be fixed together? > > That would be nice. The last time I read this thread I have been rather > confused about what was being discussed, what were the set of problems, > and what was being fixed. Speaking of which, this is one of the bugfix > patches I wanted to look at once I have untanggled the autovacuum one > for temporary relations and the DOS issues with lock lookups. It's a long time ago.. Let me have a bit of time to blow dust off.. https://www.postgresql.org/message-id/20180420.173354.43303926.horiguchi.kyot...@lab.ntt.co.jp ...done..i working.. The original problem here was UpdateFullPageWrites() can call RecoveryInProgress(), which does palloc in a critical section. This happens when standy is commanded to reload with change of full_pages_writes during recovery. While exploring it, I found that update of fullPageWrite flag is updated concurrently against its design. A race condition happens in the following steps in my diagnosis. 1. While the startup process is running recovery, we didn't consider that checkpointer may be running concurrently, but it happens for standby. 2. Checkpointer considers itself (or designed) as the *only* updator of shared config including fillPageWrites. In reality the startup is another concurent updator on standby. Addition to that, checkpointer assumes that it is started under WAL-emitting state, that is, InitXLogInsert() has been called elsewhere, but it is not the case on standby. Note that checkpointer mustn't update FPW flag before recovery ends because startup will overrides the flag. 3. As the result, when standby gets full_page_writes changed and SIGHUP during recovery, checkpointer tries to update FPW flag and calls RecoveryInProgress() on the way. bang! With the 0002-step1.patch, checkpointer really becomes the only updator of the FPW flag after recovery ends. StartupXLOG() updates it just once before checkpointer starts to update it. Under the normal(slow?) promotion mode, checkpointer is woken up explicitly to make the config setting effective as soon as possible. (This might be unnecessary.) In checkpointer, RecoveryInProgress() is called preior to UpdateFPW() to disable update FPW during recovery, so the crash that is the issue here is fixed. FYI I think that 0002-tesp2.patch is rejected. Please find the attacehd revised patch. It is rebased and provided with a commit message. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From c839e039d4ebb06c0dd345f7992a460a27827805 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 1 Aug 2018 15:29:01 +0900 Subject: [PATCH] Fix FPW flags updates during recovery. Each of checkpointer and startup processes thinks it is the only updator of fullPageWrites flag ignorantly of each other. This causes race condition and leads checkpionter to a crash in a certain case. This patch makes chackpionter the only updator and organize fix the sequence around recovery end in order to fullPageWrite flag is correctly updated. --- src/backend/access/transam/xlog.c | 19 - src/backend/postmaster/checkpointer.c | 50 --- src/include/postmaster/bgwriter.h | 1 + 3 files changed, 48 insertions(+), 22 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 493f1db7b9..f315d11745 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7692,14 +7692,15 @@ StartupXLOG(void) XLogCtl->LogwrtRqst.Flush = EndOfLog; /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. + * Update full_page_writes in shared memory with the lastest value before + * resource manager writes cleanup WAL records or checkpoint record is + * written. We don't need to write XLOG_FPW_CHANGE since this just + * reflects the status at the last redo'ed record. No lock is required + * since startup is the only updator of the flag at this + * point. Checkpointer will take over after SharedRecoveryInProgress is + * turned to false. */ Insert->fullPageWrites = lastFullPageWrites; - LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; if (InRecovery) { @@ -7941,10 +7942,14 @@ StartupXLOG(void) * If this was a fast promotion, request an (online) checkpoint now
Re: Problem while setting the fpw with SIGHUP
On Sat, Jul 28, 2018 at 07:10:24PM +0530, Amit Kapila wrote: > I have just responded to your first patch (0001). Can you once again > summarize what the 0002 exactly accomplishes? I think one of the > goals is to fix the original problem reported in this thread and other > is you have found the concurrency issue. Is it possible to have > separate patches for those or you think they are interrelated and > needs to be fixed together? That would be nice. The last time I read this thread I have been rather confused about what was being discussed, what were the set of problems, and what was being fixed. Speaking of which, this is one of the bugfix patches I wanted to look at once I have untanggled the autovacuum one for temporary relations and the DOS issues with lock lookups. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Tue, Apr 24, 2018 at 7:10 AM, Kyotaro HORIGUCHI wrote: > At Tue, 24 Apr 2018 08:52:17 +0900, Michael Paquier > wrote in <20180423235217.gb1...@paquier.xyz> >> On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote: >> > Fine, but that doesn't answer the question of whether we actually need >> > to or should change the behavior in the first place. >> >> Per the last arguments that would be "No, we don't want to change it as >> it would surprise some users": >> https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz > > The answer is that the change of behavior is not required to fix > the bug. So I'm fine with applying only (0001 and) 0002 here. > I have just responded to your first patch (0001). Can you once again summarize what the 0002 exactly accomplishes? I think one of the goals is to fix the original problem reported in this thread and other is you have found the concurrency issue. Is it possible to have separate patches for those or you think they are interrelated and needs to be fixed together? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Fri, Apr 20, 2018 at 6:06 PM, Amit Kapila wrote: > On Fri, Apr 20, 2018 at 11:40 AM, Kyotaro HORIGUCHI > wrote: >> By the way, I think I found a bug of FPW. >> >> The following steps yields INSERT record that doesn't have a FPI >> after a checkpoint. >> >> (Start server with full_page_writes = off) >> CREATE TABLE t (a int); >> CHECKPOINT; >> INSERT INTO t VALUES (1); >> ALTER SYSTEM SET full_page_writes TO on; >> SELECT pg_reload_conf(); >> CHECKPOINT; >> INSERT INTO t VALUES (1); >> >> The last insert is expected to write a record with FPI but it >> doesn't actually. No FPI will be written for the page after that. >> >> It seems that the reason is that XLogInsertRecord is forgetting >> to check doPageWrites' update. >> >> In the failure case, fpw_lsn has been set by XLogRecordAssemble >> but doPageWrite is false at the time and it considers that no FPI >> is required then it sets fpw_lsn to InvalidXLogRecPtr. >> >> After that, XLogInsertRecord receives the record but it thinks >> that doPageWrites is true since it looks the shared value. >> >>> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites) >> >> So this line thinks that "no FPI is omitted in this record" but >> actually the record is just forgotten to attach them. >> >> The attached patch lets XLogInsertRecord check if doPageWrites >> has been turned on after the last call and cause recomputation in >> the case. >> > > I think you have correctly spotted the problem and your fix looks good > to me. As this is a separate problem and fix is different from what > we are discussing here, I think this can be committed it separately. > AFAICS, this problem has been introduced by commit 2c03216d831160bedd72d45f712601b6f7d03f1c, so we should backpatch till 9.5. Please find attached the slightly modified patch for this bug. I have modified one of the comments in the patch and the proposed commit message. Can you please once cross-verify if the problem exits till 9.5 and that this patch fixes it? Also, I don't see an easy way to write a test for it, do you have anything in mind? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com Correctly-attach-FPI-to-the-first-record.2.patch Description: Binary data
Re: Problem while setting the fpw with SIGHUP
At Tue, 24 Apr 2018 08:52:17 +0900, Michael Paquier wrote in <20180423235217.gb1...@paquier.xyz> > On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote: > > Fine, but that doesn't answer the question of whether we actually need > > to or should change the behavior in the first place. > > Per the last arguments that would be "No, we don't want to change it as > it would surprise some users": > https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz The answer is that the change of behavior is not required to fix the bug. So I'm fine with applying only (0001 and) 0002 here. https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz One reason that I introduced the "restriction" was that it was useful to avoid concurrent udpate of the flag. It is now avoided in another way. Just my opinion on the behavior follows. I don't think anyone confirms that FPI come back after switching full_page_writes (one of the reason is it needs pg_waldump). pg_start/stop_backup() on standby says that "You need to turn on FPW, then do checkpoint". It suggests that FPI's don't work before the next checkpoint. If we keep the current behavior, the documentation might need an additional phrase something like "FPW ensures that data is protected from partial writes after the next chackpoint starts". On the other hand honestly I don't have confidence that the WAL reduction worth the additional complexity by 0003. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote: > Fine, but that doesn't answer the question of whether we actually need > to or should change the behavior in the first place. Per the last arguments that would be "No, we don't want to change it as it would surprise some users": https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Wed, Apr 18, 2018 at 9:49 PM, Michael Paquier wrote: > On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote: >> I would just document the risks. If the documentation says that you >> can't rely on the value until after the next checkpoint, or whatever >> the rule is, then I think we're fine. I don't think that we really >> have the infrastructure to do any better; if we try, we'll just end up >> with odd warts. Documenting the current set of warts is less churn >> and less work. > > The last version of the patch proposed has eaten this diff which was > part of one of the past versions (v2-0001-Change-FPW-handling.patch from > https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp): > +The default is on. The change of the parameter > takes > +effect at the next checkpoint time. > So there were some documentation about the beHavior change for what it's > worth. Fine, but that doesn't answer the question of whether we actually need to or should change the behavior in the first place. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Problem while setting the fpw with SIGHUP
On Fri, Apr 20, 2018 at 11:40 AM, Kyotaro HORIGUCHI wrote: > By the way, I think I found a bug of FPW. > > The following steps yields INSERT record that doesn't have a FPI > after a checkpoint. > > (Start server with full_page_writes = off) > CREATE TABLE t (a int); > CHECKPOINT; > INSERT INTO t VALUES (1); > ALTER SYSTEM SET full_page_writes TO on; > SELECT pg_reload_conf(); > CHECKPOINT; > INSERT INTO t VALUES (1); > > The last insert is expected to write a record with FPI but it > doesn't actually. No FPI will be written for the page after that. > > It seems that the reason is that XLogInsertRecord is forgetting > to check doPageWrites' update. > > In the failure case, fpw_lsn has been set by XLogRecordAssemble > but doPageWrite is false at the time and it considers that no FPI > is required then it sets fpw_lsn to InvalidXLogRecPtr. > > After that, XLogInsertRecord receives the record but it thinks > that doPageWrites is true since it looks the shared value. > >> if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites) > > So this line thinks that "no FPI is omitted in this record" but > actually the record is just forgotten to attach them. > > The attached patch lets XLogInsertRecord check if doPageWrites > has been turned on after the last call and cause recomputation in > the case. > I think you have correctly spotted the problem and your fix looks good to me. As this is a separate problem and fix is different from what we are discussing here, I think this can be committed it separately. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
I noticed that the previous patch is a mixture with another patch.. sorry. At Thu, 19 Apr 2018 19:11:43 +0530, Amit Kapila wrote in > On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier wrote: > > On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote: > >> I would just document the risks. If the documentation says that you > >> can't rely on the value until after the next checkpoint, or whatever > >> the rule is, then I think we're fine. I don't think that we really > >> have the infrastructure to do any better; if we try, we'll just end up > >> with odd warts. Documenting the current set of warts is less churn > >> and less work. > > > > The last version of the patch proposed has eaten this diff which was > > part of one of the past versions (v2-0001-Change-FPW-handling.patch from > > https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp): > > +The default is on. The change of the parameter > > takes > > +effect at the next checkpoint time. > > So there were some documentation about the beHavior change for what it's > > worth. > > > > And, er, actually, I was thinking again about the case where a user > > wants to disable full_page_writes temporarily to do some bulk load and > > then re-enable it. With the patch proposed to actually update the FPW > > effect at checkpoint time, then a user would need to issue a manual > > checkpoint after updating the configuration and reloading, which may > > create more I/O than he'd want to pay for, then a second checkpoint > > would need to be issued after the configuration comes back again. > > > > Why a second checkpoint? One checkpoint either manual or automatic > should be enough to make the setting effective. One significant point in my first patch is anyway the FPW is useless untill the second checkpoint starts. In the sense of the timing when *useful* FPW comes back, the second checkpoint is required regardless of the patch. As Amit said, there is no difference whether it is manual or automatic. On the other hand the timing when FPW is actually turned off is different. Focusing on user's view, one can run bulkload immediately under the current behavior but should wait for the next checkpoint starts with the first patch, which can be caused manually but may be delayed after the currently running checkpoint if any. I think that starting the *first* checkpoint before bulkload is not such a bother but on the other hand the behavior can lead to FPW flood for those who are accostomed to the current behavior. The attached first patch is the bugfix proposed in this thread. The second fixes the cocurrent update problem only. The third changes the behavior so that turning-on happenes only on checkpoints and turning-off happens any time. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 9b3496ceb6f39b017698225fef289974bd01fb07 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 20 Apr 2018 15:01:26 +0900 Subject: [PATCH 1/3] Correctly attach FPI to the first record after a checkpoint XLogInsert fails to attach a required FPIs to the first record after a checkpoint if no other record has been written after full_page_writes was turned on. This makes incosistency between fpw flag of the checkpoint record and the following record's FPW status. This patch makes XLogInsertRecord to cause a recomputation when the given record is generated during FPW is off but found that the flag has been turned on. --- src/backend/access/transam/xlog.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 08dc9ba031..27753f7321 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -937,7 +937,7 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); * * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this * WAL record applies to, that were not included in the record as full page - * images. If fpw_lsn >= RedoRecPtr, the function does not perform the + * images. If fpw_lsn <= RedoRecPtr, the function does not perform the * insertion and returns InvalidXLogRecPtr. The caller can then recalculate * which pages need a full-page image, and retry. If fpw_lsn is invalid, the * record is always inserted. @@ -970,6 +970,7 @@ XLogInsertRecord(XLogRecData *rdata, info == XLOG_SWITCH); XLogRecPtr StartPos; XLogRecPtr EndPos; + bool prevDoPageWrites = doPageWrites; /* we assume that all of the record header is in the first chunk */ Assert(rdata->len >= SizeOfXLogRecord); @@ -1022,7 +1023,8 @@ XLogInsertRecord(XLogRecData *rdata, * This can only happen just after a checkpoint, so it's better to be slow * in this case and fast otherwise. * - * If we aren't doing full-page writes then RedoRecPtr doesn't actually + * If doPageWrites was just turned on, we force a recomputation. Otherwis
Re: Problem while setting the fpw with SIGHUP
By the way, I think I found a bug of FPW. The following steps yields INSERT record that doesn't have a FPI after a checkpoint. (Start server with full_page_writes = off) CREATE TABLE t (a int); CHECKPOINT; INSERT INTO t VALUES (1); ALTER SYSTEM SET full_page_writes TO on; SELECT pg_reload_conf(); CHECKPOINT; INSERT INTO t VALUES (1); The last insert is expected to write a record with FPI but it doesn't actually. No FPI will be written for the page after that. It seems that the reason is that XLogInsertRecord is forgetting to check doPageWrites' update. In the failure case, fpw_lsn has been set by XLogRecordAssemble but doPageWrite is false at the time and it considers that no FPI is required then it sets fpw_lsn to InvalidXLogRecPtr. After that, XLogInsertRecord receives the record but it thinks that doPageWrites is true since it looks the shared value. > if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites) So this line thinks that "no FPI is omitted in this record" but actually the record is just forgotten to attach them. The attached patch lets XLogInsertRecord check if doPageWrites has been turned on after the last call and cause recomputation in the case. > * If there are any registered buffers, and a full-page image was not taken > * of all of them, *fpw_lsn is set to the lowest LSN among such pages. This > * signals that the assembled record is only good for insertion on the > * assumption that the RedoRecPtr and doPageWrites values were up-to-date. And the patch fixes one comment typo of XLogInsertRecord. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 40ce98bba0496b1eb0a982134eae9cec01d532a8 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 20 Apr 2018 15:01:26 +0900 Subject: [PATCH] Correctly attach FPI to the first record after a checkpoint XLogInsert fails to attach a required FPIs to the first record after a checkpoint if no other record has been written after full_page_writes was turned on. This makes incosistency between fpw flag of the checkpoint record and the following record's FPW status. This patch makes XLogInsertRecord to cause a recomputation when the given record is generated during FPW is off but found that the flag has been turned on. --- src/backend/access/transam/xlog.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 08dc9ba031..27753f7321 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -937,7 +937,7 @@ static void WALInsertLockUpdateInsertingAt(XLogRecPtr insertingAt); * * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this * WAL record applies to, that were not included in the record as full page - * images. If fpw_lsn >= RedoRecPtr, the function does not perform the + * images. If fpw_lsn <= RedoRecPtr, the function does not perform the * insertion and returns InvalidXLogRecPtr. The caller can then recalculate * which pages need a full-page image, and retry. If fpw_lsn is invalid, the * record is always inserted. @@ -970,6 +970,7 @@ XLogInsertRecord(XLogRecData *rdata, info == XLOG_SWITCH); XLogRecPtr StartPos; XLogRecPtr EndPos; + bool prevDoPageWrites = doPageWrites; /* we assume that all of the record header is in the first chunk */ Assert(rdata->len >= SizeOfXLogRecord); @@ -1022,7 +1023,8 @@ XLogInsertRecord(XLogRecData *rdata, * This can only happen just after a checkpoint, so it's better to be slow * in this case and fast otherwise. * - * If we aren't doing full-page writes then RedoRecPtr doesn't actually + * If doPageWrites was just turned on, we force a recomputation. Otherwise + * if we aren't doing full-page writes then RedoRecPtr doesn't actually * affect the contents of the XLOG record, so we'll update our local copy * but not force a recomputation. (If doPageWrites was just turned off, * we could recompute the record without full pages, but we choose not to @@ -1035,7 +1037,9 @@ XLogInsertRecord(XLogRecData *rdata, } doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites); - if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites) + if (doPageWrites && + (!prevDoPageWrites || + (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr))) { /* * Oops, some buffer now needs to be backed up that the caller didn't -- 2.16.3
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 19, 2018 at 07:11:43PM +0530, Amit Kapila wrote: > On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier wrote: >> And, er, actually, I was thinking again about the case where a user >> wants to disable full_page_writes temporarily to do some bulk load and >> then re-enable it. With the patch proposed to actually update the FPW >> effect at checkpoint time, then a user would need to issue a manual >> checkpoint after updating the configuration and reloading, which may >> create more I/O than he'd want to pay for, then a second checkpoint >> would need to be issued after the configuration comes back again. > > Why a second checkpoint? One checkpoint either manual or automatic > should be enough to make the setting effective. I was thinking about cases where users have say hourly cron jobs in charge of doing some maintenance of update cleanups, where they would need to be sure that full_page_writes are back online after doing the bulk-load. In this case an extra checkpoint would be necessary to make the parameter update effective. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 19, 2018 at 7:19 AM, Michael Paquier wrote: > On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote: >> I would just document the risks. If the documentation says that you >> can't rely on the value until after the next checkpoint, or whatever >> the rule is, then I think we're fine. I don't think that we really >> have the infrastructure to do any better; if we try, we'll just end up >> with odd warts. Documenting the current set of warts is less churn >> and less work. > > The last version of the patch proposed has eaten this diff which was > part of one of the past versions (v2-0001-Change-FPW-handling.patch from > https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp): > +The default is on. The change of the parameter > takes > +effect at the next checkpoint time. > So there were some documentation about the beHavior change for what it's > worth. > > And, er, actually, I was thinking again about the case where a user > wants to disable full_page_writes temporarily to do some bulk load and > then re-enable it. With the patch proposed to actually update the FPW > effect at checkpoint time, then a user would need to issue a manual > checkpoint after updating the configuration and reloading, which may > create more I/O than he'd want to pay for, then a second checkpoint > would need to be issued after the configuration comes back again. > Why a second checkpoint? One checkpoint either manual or automatic should be enough to make the setting effective. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote: > I would just document the risks. If the documentation says that you > can't rely on the value until after the next checkpoint, or whatever > the rule is, then I think we're fine. I don't think that we really > have the infrastructure to do any better; if we try, we'll just end up > with odd warts. Documenting the current set of warts is less churn > and less work. The last version of the patch proposed has eaten this diff which was part of one of the past versions (v2-0001-Change-FPW-handling.patch from https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp): +The default is on. The change of the parameter takes +effect at the next checkpoint time. So there were some documentation about the beHavior change for what it's worth. And, er, actually, I was thinking again about the case where a user wants to disable full_page_writes temporarily to do some bulk load and then re-enable it. With the patch proposed to actually update the FPW effect at checkpoint time, then a user would need to issue a manual checkpoint after updating the configuration and reloading, which may create more I/O than he'd want to pay for, then a second checkpoint would need to be issued after the configuration comes back again. That would cause a regression which could surprise a class of users. WAL and FPW overhead is a problem which shows up a lot when doing bulk-loading of data. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Wed, Apr 18, 2018 at 10:37 AM, Amit Kapila wrote: > On Fri, Apr 13, 2018 at 10:36 PM, Robert Haas wrote: >> On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier wrote: >>> Still does it matter when the change is effective? >> >> I don't really care deeply about when the change takes effect, but I >> do care about whether the time when the system *says* the change took >> effect is the same as when it *actually* took effect. If those aren't >> the same, it's confusing. >> > > So, what in your opinion is the way to deal with this? If we make it > a PGC_POSTMASTER parameter, it will have a very clear behavior and > users don't need to bother whether they have a risk of torn page > problem or not and as a side-impact the code will be simplified as > well. However, as Michael said the people who get the benefit of this > option by disabling/enabling this parameter might complain. Keeping > it as a SIGHUP option has the drawback that even after the user has > enabled it, there is a risk of torn pages. I would just document the risks. If the documentation says that you can't rely on the value until after the next checkpoint, or whatever the rule is, then I think we're fine. I don't think that we really have the infrastructure to do any better; if we try, we'll just end up with odd warts. Documenting the current set of warts is less churn and less work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Problem while setting the fpw with SIGHUP
On Fri, Apr 13, 2018 at 10:36 PM, Robert Haas wrote: > On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier wrote: >> Still does it matter when the change is effective? > > I don't really care deeply about when the change takes effect, but I > do care about whether the time when the system *says* the change took > effect is the same as when it *actually* took effect. If those aren't > the same, it's confusing. > So, what in your opinion is the way to deal with this? If we make it a PGC_POSTMASTER parameter, it will have a very clear behavior and users don't need to bother whether they have a risk of torn page problem or not and as a side-impact the code will be simplified as well. However, as Michael said the people who get the benefit of this option by disabling/enabling this parameter might complain. Keeping it as a SIGHUP option has the drawback that even after the user has enabled it, there is a risk of torn pages. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 12, 2018 at 9:29 PM, Michael Paquier wrote: > Still does it matter when the change is effective? I don't really care deeply about when the change takes effect, but I do care about whether the time when the system *says* the change took effect is the same as when it *actually* took effect. If those aren't the same, it's confusing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Problem while setting the fpw with SIGHUP
Sorry, the patch attached to the previous main is slightly old. The attached is the correct one. # They differ only in some phrase in a comment. At Fri, 13 Apr 2018 17:28:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180413.172840.228724367.horiguchi.kyot...@lab.ntt.co.jp> At Fri, 13 Apr 2018 13:47:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180413.134751.76149471.horiguchi.kyot...@lab.ntt.co.jp> > At Fri, 13 Apr 2018 08:31:02 +0530, Amit Kapila > wrote in > > On Fri, Apr 13, 2018 at 6:59 AM, Michael Paquier > > wrote: > > > On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote: > > >> I think it may actually be confusing. If you run pg_ctl reload and it > > >> reports that the value has changed, you'll expect it to have taken > > >> effect. But really, it will take effect at some later time. > > > > > > > +1. I also think it is confusing and it could be difficult for end > > users to know when the setting is effective. > > > > > It is true that sometimes some people like to temporarily disable > > > full_page_writes particularly when doing some bulk load of data to > > > minimize the effort on WAL, and then re-enable it just after doing > > > the inserting this data. > > > > > > Still does it matter when the change is effective? By disabling > > > full_page_writes even temporarily, you accept the fact that this > > > instance would not be safe until the next checkpoint completes. The > > > instance even finishes by writing less unnecessary WAL data if the > > > change is only effective at the next checkpoint. Well, it is true that > > > this increases potential torn pages problems but the user is already > > > accepting that risk if a crash happens until the next checkpoint then it > > > exposes itself to torn pages anyway as it chose to disable > > > full_page_writes. > > I still don't think that enabling FPW anytime is useful but > disabling seems useful as I mentioned upthread. > > The problem was checkpointer changes the flag anytime including > recovery time. Startup process updates the same flag at the end > of recovery but before publicated. Letting checkpointer change > the flag only at checkpoint time is a straightforward way to > avoid conflicts with startup process. > > I reconsider a bit and came up with the thought that we could > just skip changing shared FPW in checkpointer until recovery > ends, then update the flag after recovery end (perhaps at > checkpoint time in major cases). In this case, FPI is attached > from REDO point of the first checkpoint (not restartpoint) or a > bit earlier, then FPW can be flipped at any time. I'll come up > with that soon. Please find the attached. The most significant change is that UpdateSharedMemoryConfig skips updating of shared fullPageWrites during recovery. The original crash is fixed since this guarantees that XLog working area is initializeed before reaching UpdateFullPageWrites(). Addition to that, I changed CheckpointerMain so that it tries update of shared FPW regardless of SIGHUP and provided new function to just wakeup checkpointer. StartupXLOG wakes up checkpointer either checkpoint is required or not and checkpointer makes the first update of shared FPW at the time. After this point, everything works as the same as the current behavior. > > I think this means that is will be difficult for end users to predict > > unless they track the next checkpoint which isn't too bad, but won't > > be convenient either. > > Looking checkpiont record is enough to know wheter the checkpoint > is protected by FPW eough, but I agree that such strictness is > not crutial. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From b17158ce5cb8f29c587fcf8b47ac1ebb8bac6a70 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 13 Apr 2018 16:29:25 +0900 Subject: [PATCH] Inhibit update shared FPW during recovery The shared full_pages_writes is currently updated by checkpointer even during recovery. This leads to unexpected concurrent updates of the flag or a crash by using uninitialized memory. The update is needless in the first place so this patch inhibits checkpointer from updating it during recovery. Instead, startup process wakes up checkpointer at the end of recovery so as to the shared flag is updated just after. --- src/backend/access/transam/xlog.c | 4 +++ src/backend/postmaster/checkpointer.c | 58 ++- src/include/postmaster/bgwriter.h | 1 + 3 files changed, 42 insertions(+), 21 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 08dc9ba031..d518cc4a60 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7826,9 +7826,13 @@ StartupXLOG(void) * isn't required for consistency, but the last restartpoint might be far * back, and in case of a crash, recovering from it might take a longer * than is appropriate now that we're not
Re: Problem while setting the fpw with SIGHUP
At Fri, 13 Apr 2018 13:47:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180413.134751.76149471.horiguchi.kyot...@lab.ntt.co.jp> > At Fri, 13 Apr 2018 08:31:02 +0530, Amit Kapila > wrote in > > On Fri, Apr 13, 2018 at 6:59 AM, Michael Paquier > > wrote: > > > On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote: > > >> I think it may actually be confusing. If you run pg_ctl reload and it > > >> reports that the value has changed, you'll expect it to have taken > > >> effect. But really, it will take effect at some later time. > > > > > > > +1. I also think it is confusing and it could be difficult for end > > users to know when the setting is effective. > > > > > It is true that sometimes some people like to temporarily disable > > > full_page_writes particularly when doing some bulk load of data to > > > minimize the effort on WAL, and then re-enable it just after doing > > > the inserting this data. > > > > > > Still does it matter when the change is effective? By disabling > > > full_page_writes even temporarily, you accept the fact that this > > > instance would not be safe until the next checkpoint completes. The > > > instance even finishes by writing less unnecessary WAL data if the > > > change is only effective at the next checkpoint. Well, it is true that > > > this increases potential torn pages problems but the user is already > > > accepting that risk if a crash happens until the next checkpoint then it > > > exposes itself to torn pages anyway as it chose to disable > > > full_page_writes. > > I still don't think that enabling FPW anytime is useful but > disabling seems useful as I mentioned upthread. > > The problem was checkpointer changes the flag anytime including > recovery time. Startup process updates the same flag at the end > of recovery but before publicated. Letting checkpointer change > the flag only at checkpoint time is a straightforward way to > avoid conflicts with startup process. > > I reconsider a bit and came up with the thought that we could > just skip changing shared FPW in checkpointer until recovery > ends, then update the flag after recovery end (perhaps at > checkpoint time in major cases). In this case, FPI is attached > from REDO point of the first checkpoint (not restartpoint) or a > bit earlier, then FPW can be flipped at any time. I'll come up > with that soon. Please find the attached. The most significant change is that UpdateSharedMemoryConfig skips updating of shared fullPageWrites during recovery. The original crash is fixed since this guarantees that XLog working area is initializeed before reaching UpdateFullPageWrites(). Addition to that, I changed CheckpointerMain so that it tries update of shared FPW regardless of SIGHUP and provided new function to just wakeup checkpointer. StartupXLOG wakes up checkpointer either checkpoint is required or not and checkpointer makes the first update of shared FPW at the time. After this point, everything works as the same as the current behavior. > > I think this means that is will be difficult for end users to predict > > unless they track the next checkpoint which isn't too bad, but won't > > be convenient either. > > Looking checkpiont record is enough to know wheter the checkpoint > is protected by FPW eough, but I agree that such strictness is > not crutial. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 70cf6a0c0fb4c4e421d8895c197b1bf7fecdaa8f Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 13 Apr 2018 16:29:25 +0900 Subject: [PATCH] Inhibit update shared FPW during recovery The shared full_pages_writes is currently updated by checkpointer even during recovery. This leads to unexpected concurrent updates of the flag or a crash by using uninitialized memory. The update is needless in the first place so this patch inhibits checkpointer from updating it during recovery. Instead, startup process wakes up checkpointer at the end of recovery so as to the shared flag is updated just after. --- src/backend/access/transam/xlog.c | 4 +++ src/backend/postmaster/checkpointer.c | 57 ++- src/include/postmaster/bgwriter.h | 1 + 3 files changed, 41 insertions(+), 21 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 08dc9ba031..581844904d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7826,9 +7826,13 @@ StartupXLOG(void) * isn't required for consistency, but the last restartpoint might be far * back, and in case of a crash, recovering from it might take a longer * than is appropriate now that we're not in standby mode anymore. + * If no checkpoint need, request checkpointer to process shared + * configuration instead. */ if (fast_promoted) RequestCheckpoint(CHECKPOINT_FORCE); + else + WakeupCheckpointer(); } /* diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postma
Re: Problem while setting the fpw with SIGHUP
At Fri, 13 Apr 2018 08:31:02 +0530, Amit Kapila wrote in > On Fri, Apr 13, 2018 at 6:59 AM, Michael Paquier wrote: > > On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote: > >> I think it may actually be confusing. If you run pg_ctl reload and it > >> reports that the value has changed, you'll expect it to have taken > >> effect. But really, it will take effect at some later time. > > > > +1. I also think it is confusing and it could be difficult for end > users to know when the setting is effective. > > > It is true that sometimes some people like to temporarily disable > > full_page_writes particularly when doing some bulk load of data to > > minimize the effort on WAL, and then re-enable it just after doing > > the inserting this data. > > > > Still does it matter when the change is effective? By disabling > > full_page_writes even temporarily, you accept the fact that this > > instance would not be safe until the next checkpoint completes. The > > instance even finishes by writing less unnecessary WAL data if the > > change is only effective at the next checkpoint. Well, it is true that > > this increases potential torn pages problems but the user is already > > accepting that risk if a crash happens until the next checkpoint then it > > exposes itself to torn pages anyway as it chose to disable > > full_page_writes. I still don't think that enabling FPW anytime is useful but disabling seems useful as I mentioned upthread. The problem was checkpointer changes the flag anytime including recovery time. Startup process updates the same flag at the end of recovery but before publicated. Letting checkpointer change the flag only at checkpoint time is a straightforward way to avoid conflicts with startup process. I reconsider a bit and came up with the thought that we could just skip changing shared FPW in checkpointer until recovery ends, then update the flag after recovery end (perhaps at checkpoint time in major cases). In this case, FPI is attached from REDO point of the first checkpoint (not restartpoint) or a bit earlier, then FPW can be flipped at any time. I'll come up with that soon. > I think this means that is will be difficult for end users to predict > unless they track the next checkpoint which isn't too bad, but won't > be convenient either. Looking checkpiont record is enough to know wheter the checkpoint is protected by FPW eough, but I agree that such strictness is not crutial. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Fri, Apr 13, 2018 at 6:59 AM, Michael Paquier wrote: > On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote: >> I think it may actually be confusing. If you run pg_ctl reload and it >> reports that the value has changed, you'll expect it to have taken >> effect. But really, it will take effect at some later time. > +1. I also think it is confusing and it could be difficult for end users to know when the setting is effective. > It is true that sometimes some people like to temporarily disable > full_page_writes particularly when doing some bulk load of data to > minimize the effort on WAL, and then re-enable it just after doing > the inserting this data. > > Still does it matter when the change is effective? By disabling > full_page_writes even temporarily, you accept the fact that this > instance would not be safe until the next checkpoint completes. The > instance even finishes by writing less unnecessary WAL data if the > change is only effective at the next checkpoint. Well, it is true that > this increases potential torn pages problems but the user is already > accepting that risk if a crash happens until the next checkpoint then it > exposes itself to torn pages anyway as it chose to disable > full_page_writes. > I think this means that is will be difficult for end users to predict unless they track the next checkpoint which isn't too bad, but won't be convenient either. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 12, 2018 at 02:55:53PM -0400, Robert Haas wrote: > I think it may actually be confusing. If you run pg_ctl reload and it > reports that the value has changed, you'll expect it to have taken > effect. But really, it will take effect at some later time. It is true that sometimes some people like to temporarily disable full_page_writes particularly when doing some bulk load of data to minimize the effort on WAL, and then re-enable it just after doing the inserting this data. Still does it matter when the change is effective? By disabling full_page_writes even temporarily, you accept the fact that this instance would not be safe until the next checkpoint completes. The instance even finishes by writing less unnecessary WAL data if the change is only effective at the next checkpoint. Well, it is true that this increases potential torn pages problems but the user is already accepting that risk if a crash happens until the next checkpoint then it exposes itself to torn pages anyway as it chose to disable full_page_writes. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Wed, Apr 11, 2018 at 7:09 AM, Heikki Linnakangas wrote: > I think the new behavior where the GUC only takes effect at next checkpoint > is OK. It seems quite intuitive. I think it may actually be confusing. If you run pg_ctl reload and it reports that the value has changed, you'll expect it to have taken effect. But really, it will take effect at some later time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 12, 2018 at 04:59:10PM +0900, Kyotaro HORIGUCHI wrote: > At Thu, 12 Apr 2018 14:07:53 +0900, Michael Paquier > wrote in <20180412050753.ga19...@paquier.xyz> >> I have been able to spend a couple of hours on your patch, wrapping my >> mind on your stuff. So what I had in mind was something like this type >> of scenario: > > Thank for the precise explanation. Just to be clear and to avoid incorrect conclusion. This is the type of scenarios I imagined about when I read your previous email, concluding such scenarios those cannot apply per the strong assumption on SharedRecoveryInProgress your patch heavily relies on. In short I have no objections. >> (The latest patch is a mix of two patches.) > > Sorry I counld get this. The patch called v2-0001-Change-FPW-handling.patch posted on https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp, which is the latest version available, is a mix of the patch you are creating for this thread and of a patch aimed a fixing an issue with partition range handling. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
Hello. At Thu, 12 Apr 2018 14:07:53 +0900, Michael Paquier wrote in <20180412050753.ga19...@paquier.xyz> > On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote: > > Checkpointer never calls CreateCheckPoint while > > RecoveryInProgress() == true. In other words, checkpointer is not > > an updator of shared FPW at the time StartupXLOG calls > > CreateCheckPoint for fallback_promote. > > I have been able to spend a couple of hours on your patch, wrapping my > mind on your stuff. So what I had in mind was something like this type > of scenario: Thank for the precise explanation. The scenario that CreateCheckPoint is called simultaneously in different prcoesses seems broken by itself in the first place but I put that aside for now. > 1) The startup process requires a restart point. > 2) The checkpointer receives the request, and blocks before reading > RecoveryInProgress(). RecoveryInProgress doesn't take lock. But I assume here that checkpointer is taking a long time after entering RecoveryInProgress and haven't actually read SharedRecoveryInProgress. > 3) A fallback_promote is triggered, making the startup process call > CreateCheckpoint(). I'm confused here. It seems to me that StartupXLOG calls CreateCheckPoint only in bootstrap or standalone cases. No concurrent processe is running in the cases. Even if CreateCheckPoint is called there in the IsUnderPostmater case, checkpointer never calls CreateCheckPoint during the checkpoint. This is safe in regard to shared FPW. (but checkpoint is being blocked in this scenario.) Assuming that RequestCheckpoint() is called here, the request is merged with the previous request above and the function returns after the checkpoint ends. (That is, checkpointer must continue to run in this case.) > 4) Startup process finishes checkpoint, updates Insert->fullPageWrites. According to this scenario, checkpionter is still stalling now. So it is not a concurrent update. > 5) Checkpoint reads RecoveryInProgress to false, moves on with its > checkpoint. If checkpointer sees SharedRecoveryInProgress being false, Create(or Request)CheckPoint called in (3) must have finished and StartupXLOG() no longer updates shared FPW. There's no concurrent update. > > The comment may be somewhat confusing that it is written > > there. The point is that checkpointer and StartupXLOG are > > mutually excluded on updating shared FPW by > > SharedRecoveryInProgress flag. > > Indeed. I can see that it is the main key point of the patch. > > > | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before > > | * the flag actually takes effect. Checkpointer never calls this function > > | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no > > | * window where checkpointer and startup processes - the only updators of > > | * the flag - can update shared FPW simultaneously. Thus no lock is > > | * required here. Both shared and local fullPageWrites do not change > > | * before the next reading below. > > Yeah, this reduces the confusion. Thanks^^; > (The latest patch is a mix of two patches.) Sorry I counld get this. > +The default is on. The change of the parmeter > takes > +effect at the next checkpoint time. > s/parmeter/parameter/ > > By the way, I would vote for keeping track in WAL of full_page_writes > switched from off to on. This is not used in the backend, but that's > still useful for debugging end-user issues. Agreed and I tried that. The problem on that is that some records can be written after REDO point before XLOG_FPW_CHANGE(true) is written. However this is no problem for the FPW-related stuff to work properly (since no one looks it), the FPW record suggests that the current checkpoint loses FPI in the first several records. This has a far larger impact with this patch because shared FPW is always turned on just at REDO point. So I choosed not to write XLOG_FPW_CHANGE(false) rather than writing bogus records. > Actually, I was wondering why a spin lock is not taken in > RecoveryInProgress when reading SharedRecoveryInProgress, but that's > from 1a3d1044 which added a memory barrier instead as guarantee... Maybe it doesn't need barrier, since the flag is initialized as true and becomes false just once and delay in reading by other processes doesn't no harm. I think that bool doesn't suffer atomicity. Even all these are true, some description is needed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote: > Checkpointer never calls CreateCheckPoint while > RecoveryInProgress() == true. In other words, checkpointer is not > an updator of shared FPW at the time StartupXLOG calls > CreateCheckPoint for fallback_promote. I have been able to spend a couple of hours on your patch, wrapping my mind on your stuff. So what I had in mind was something like this type of scenario: 1) The startup process requires a restart point. 2) The checkpointer receives the request, and blocks before reading RecoveryInProgress(). 3) A fallback_promote is triggered, making the startup process call CreateCheckpoint(). 4) Startup process finishes checkpoint, updates Insert->fullPageWrites. 5) Checkpoint reads RecoveryInProgress to false, moves on with its checkpoint. > The comment may be somewhat confusing that it is written > there. The point is that checkpointer and StartupXLOG are > mutually excluded on updating shared FPW by > SharedRecoveryInProgress flag. Indeed. I can see that it is the main key point of the patch. > | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before > | * the flag actually takes effect. Checkpointer never calls this function > | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no > | * window where checkpointer and startup processes - the only updators of > | * the flag - can update shared FPW simultaneously. Thus no lock is > | * required here. Both shared and local fullPageWrites do not change > | * before the next reading below. Yeah, this reduces the confusion. (The latest patch is a mix of two patches.) +The default is on. The change of the parmeter takes +effect at the next checkpoint time. s/parmeter/parameter/ By the way, I would vote for keeping track in WAL of full_page_writes switched from off to on. This is not used in the backend, but that's still useful for debugging end-user issues. Actually, I was wondering why a spin lock is not taken in RecoveryInProgress when reading SharedRecoveryInProgress, but that's from 1a3d1044 which added a memory barrier instead as guarantee... -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
Hello. Thanks to Heikkit for picking this up and thanks for the commnet to Michael. # The attached is changed only in a comment, and rebased. At Thu, 12 Apr 2018 05:24:14 +0900, Michael Paquier wrote in <20180411202414.ga32...@paquier.xyz> > On Wed, Apr 11, 2018 at 02:09:48PM +0300, Heikki Linnakangas wrote: > > I think the new behavior where the GUC only takes effect at next checkpoint > > is OK. It seems quite intuitive. > > > > > [rebased patch version] > > > > Looks good at a quick glance. Assuming no objections from others, I'll take > > a closer look and commit tomorrow. Thanks! > > Sorry for not following up closely this thread lately. > > + /* > +* If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before > +* the flag actually takes effect. No lock is required since checkpointer > +* is the only updator of shared fullPageWrites after recovery is > +* finished. Both shared and local fullPageWrites do not change before the > +* next reading below. > +*/ > + if (Insert->fullPageWrites && !fullPageWrites) > + { > + XLogBeginInsert(); > + XLogRegisterData((char *) (&fullPageWrites), sizeof(bool)); > + XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE); > + } > > This is not actually true. If a fallback_promote is used, then > CreateCheckPoint() is called by the startup process which is in charge > of issuing the end-of-recovery checkpoint, and not the checkpointer. So > I still fail to see how a no-lock approach is fine except if we remove > fallback_promote? Checkpointer never calls CreateCheckPoint while RecoveryInProgress() == true. In other words, checkpointer is not an updator of shared FPW at the time StartupXLOG calls CreateCheckPoint for fallback_promote. The comment may be somewhat confusing that it is written there. The point is that checkpointer and StartupXLOG are mutually excluded on updating shared FPW by SharedRecoveryInProgress flag. | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before | * the flag actually takes effect. Checkpointer never calls this function | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no | * window where checkpointer and startup processes - the only updators of | * the flag - can update shared FPW simultaneously. Thus no lock is | * required here. Both shared and local fullPageWrites do not change | * before the next reading below. regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From f6d4857356508fa16dc5d54b92d0177dbeaae3e2 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 6 Apr 2018 13:57:48 +0900 Subject: [PATCH] Change FPW handling The GUC full_pages_writes currently has an effect immediately. That makes a race condition between config reload on checkpointer and StartupXLOG. But since full page images are meaningful only when they are attached to all WAL records covers a checkpoint, there is no problem if we update the shared FPW only at REDO point. On the other hand, online backup mechanism on standby requires to know if FPW is turned off before the next checkpoint record comes. As the result, with this patch, changing of full_page_writes takes effect at REDO point and additional XLOG_FPW_CHANGE is written only for turning-off. These are sufficient for standby-backup to work properly, reduces complexity and prevent the race condition. --- doc/src/sgml/config.sgml | 4 +- src/backend/access/transam/xlog.c | 116 - src/backend/optimizer/util/clauses.c | 4 +- src/backend/parser/gram.y | 78 ++- src/backend/parser/parse_agg.c | 10 +++ src/backend/parser/parse_expr.c| 5 ++ src/backend/parser/parse_func.c| 3 + src/backend/parser/parse_utilcmd.c | 72 +++--- src/backend/postmaster/checkpointer.c | 6 -- src/include/access/xlog.h | 1 - src/include/catalog/pg_control.h | 2 +- src/include/optimizer/clauses.h| 2 + src/include/parser/parse_node.h| 1 + src/include/utils/rel.h| 6 ++ src/test/regress/expected/create_table.out | 36 + src/test/regress/sql/create_table.sql | 21 +- 16 files changed, 192 insertions(+), 175 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d5f2d23c4..7ea42c25e2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2598,7 +2598,9 @@ include_dir 'conf.d' This parameter can only be set in the postgresql.conf file or on the server command line. -The default is on. + +The default is on. The change of the parmeter takes +effect at the next checkpoint time. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4a47395174..e251cc108b 100644 --- a/src/backend/access/transam/xlog.c +
Re: Problem while setting the fpw with SIGHUP
On Wed, Apr 11, 2018 at 02:09:48PM +0300, Heikki Linnakangas wrote: > I think the new behavior where the GUC only takes effect at next checkpoint > is OK. It seems quite intuitive. > > > [rebased patch version] > > Looks good at a quick glance. Assuming no objections from others, I'll take > a closer look and commit tomorrow. Thanks! Sorry for not following up closely this thread lately. + /* +* If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before +* the flag actually takes effect. No lock is required since checkpointer +* is the only updator of shared fullPageWrites after recovery is +* finished. Both shared and local fullPageWrites do not change before the +* next reading below. +*/ + if (Insert->fullPageWrites && !fullPageWrites) + { + XLogBeginInsert(); + XLogRegisterData((char *) (&fullPageWrites), sizeof(bool)); + XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE); + } This is not actually true. If a fallback_promote is used, then CreateCheckPoint() is called by the startup process which is in charge of issuing the end-of-recovery checkpoint, and not the checkpointer. So I still fail to see how a no-lock approach is fine except if we remove fallback_promote? -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On 09/04/18 11:13, Kyotaro HORIGUCHI wrote: At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila wrote in On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI wrote: Hello. At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp> In general, I was wondering why in the first place this variable (full_page_writes) is a SIGHUP variable? I think if the user tries to switch it to 'on' from 'off', it won't guarantee the recovery from torn pages. Yeah, one can turn it to 'off' from 'on' without any problem, but as the reverse doesn't guarantee anything, it can confuse users. What do you think? I tend to agree with you. It works as expected after the next checkpoint. So define the variable as "it can be changed any time but has an effect at the next checkpoint time", then remove XLOG_FPW_CHANGE record. And that eliminates the problem of concurrent calls since the checkpointer becomes the only modifier of the variable. And the problematic function UpdateFullPageWrites also no longer needs to write a WAL record. The information is conveyed only by checkpoint records. I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by pg_start/stop_backup to know FPW's turning-off without waiting for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not required since no one uses the information. It seems even harmful when it is written at the incorrect place. In the attached patch, shared fullPageWrites is updated only at REDO point I am not completely sure if that is the right option because this would mean that we are defining the new scope for a GUC variable. I guess we should take the input of others as well. I am not sure what is the right way to do that, but maybe we can start a new thread with a proper subject and description rather than discussing this under some related bug fix patch discussion. I guess we can try that after CF unless some other people pitch in and share their feedback. I think the new behavior where the GUC only takes effect at next checkpoint is OK. It seems quite intuitive. [rebased patch version] Looks good at a quick glance. Assuming no objections from others, I'll take a closer look and commit tomorrow. Thanks! - Heikki
Re: Problem while setting the fpw with SIGHUP
At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila wrote in > On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI > wrote: > > Hello. > > > > At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI > > wrote in > > <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp> > >> > In general, I was wondering why in the first place this variable > >> > (full_page_writes) is a SIGHUP variable? I think if the user tries to > >> > switch it to 'on' from 'off', it won't guarantee the recovery from > >> > torn pages. Yeah, one can turn it to 'off' from 'on' without any > >> > problem, but as the reverse doesn't guarantee anything, it can confuse > >> > users. What do you think? > >> > >> I tend to agree with you. It works as expected after the next > >> checkpoint. So define the variable as "it can be changed any time > >> but has an effect at the next checkpoint time", then remove > >> XLOG_FPW_CHANGE record. And that eliminates the problem of > >> concurrent calls since the checkpointer becomes the only modifier > >> of the variable. And the problematic function > >> UpdateFullPageWrites also no longer needs to write a WAL > >> record. The information is conveyed only by checkpoint records. > > > > I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by > > pg_start/stop_backup to know FPW's turning-off without waiting > > for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not > > required since no one uses the information. It seems even harmful > > when it is written at the incorrect place. > > > > In the attached patch, shared fullPageWrites is updated only at > > REDO point > > > > I am not completely sure if that is the right option because this > would mean that we are defining the new scope for a GUC variable. I > guess we should take the input of others as well. I am not sure what > is the right way to do that, but maybe we can start a new thread with > a proper subject and description rather than discussing this under > some related bug fix patch discussion. I guess we can try that after > CF unless some other people pitch in and share their feedback. I'd like to refrain from making a new thread since this topic is registered as an open item (in Old Bugs section). Or making a new thread then relinking it from the page is preferable? I'm surprised a bit that this got confilcted so soon. On the way rebasing, for anyone's information, I considered comment and documentation fix but all comments and documentation can be read as both old and new behavior. That being said, the patch contains a small addtion about the new behavior. regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From fe0401335e0ac1a9ae36dbdb5c2db82f9081a1a3 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 6 Apr 2018 13:57:48 +0900 Subject: [PATCH] Change FPW handling The GUC full_pages_writes currently has an effect immediately. That makes a race condition between config reload on checkpointer and StartupXLOG. But since full page images are meaningful only when they are attached to all WAL records covers a checkpoint, there is no problem if we update the shared FPW only at REDO point. On the other hand, online backup mechanism on standby requires to know if FPW is turned off before the next checkpoint record comes. As the result, with this patch, changing of full_page_writes takes effect at REDO point and additional XLOG_FPW_CHANGE is written only for turning-off. These are sufficient for standby-backup to work properly, reduces complexity and prevent the race condition. --- doc/src/sgml/config.sgml | 4 +- src/backend/access/transam/xlog.c | 114 +- src/backend/postmaster/checkpointer.c | 6 -- src/include/access/xlog.h | 1 - src/include/catalog/pg_control.h | 2 +- 5 files changed, 34 insertions(+), 93 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 5d5f2d23c4..7ea42c25e2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2598,7 +2598,9 @@ include_dir 'conf.d' This parameter can only be set in the postgresql.conf file or on the server command line. -The default is on. + +The default is on. The change of the parmeter takes +effect at the next checkpoint time. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4a47395174..ba9cf07d38 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -202,14 +202,6 @@ static XLogRecPtr LastRec; static XLogRecPtr receivedUpto = 0; static TimeLineID receiveTLI = 0; -/* - * During recovery, lastFullPageWrites keeps track of full_page_writes that - * the replayed WAL records indicate. It's initialized with full_page_writes - * that the recovery starting checkpoint record indicates, and then updated - * each time XLOG_FPW_CHANGE record is replayed. - *
Re: Problem while setting the fpw with SIGHUP
On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI wrote: > Hello. > > At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI > wrote in > <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp> >> > In general, I was wondering why in the first place this variable >> > (full_page_writes) is a SIGHUP variable? I think if the user tries to >> > switch it to 'on' from 'off', it won't guarantee the recovery from >> > torn pages. Yeah, one can turn it to 'off' from 'on' without any >> > problem, but as the reverse doesn't guarantee anything, it can confuse >> > users. What do you think? >> >> I tend to agree with you. It works as expected after the next >> checkpoint. So define the variable as "it can be changed any time >> but has an effect at the next checkpoint time", then remove >> XLOG_FPW_CHANGE record. And that eliminates the problem of >> concurrent calls since the checkpointer becomes the only modifier >> of the variable. And the problematic function >> UpdateFullPageWrites also no longer needs to write a WAL >> record. The information is conveyed only by checkpoint records. > > I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by > pg_start/stop_backup to know FPW's turning-off without waiting > for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not > required since no one uses the information. It seems even harmful > when it is written at the incorrect place. > > In the attached patch, shared fullPageWrites is updated only at > REDO point > I am not completely sure if that is the right option because this would mean that we are defining the new scope for a GUC variable. I guess we should take the input of others as well. I am not sure what is the right way to do that, but maybe we can start a new thread with a proper subject and description rather than discussing this under some related bug fix patch discussion. I guess we can try that after CF unless some other people pitch in and share their feedback. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
Hello. At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20180404.172646.238325981.horiguchi.kyot...@lab.ntt.co.jp> > > In general, I was wondering why in the first place this variable > > (full_page_writes) is a SIGHUP variable? I think if the user tries to > > switch it to 'on' from 'off', it won't guarantee the recovery from > > torn pages. Yeah, one can turn it to 'off' from 'on' without any > > problem, but as the reverse doesn't guarantee anything, it can confuse > > users. What do you think? > > I tend to agree with you. It works as expected after the next > checkpoint. So define the variable as "it can be changed any time > but has an effect at the next checkpoint time", then remove > XLOG_FPW_CHANGE record. And that eliminates the problem of > concurrent calls since the checkpointer becomes the only modifier > of the variable. And the problematic function > UpdateFullPageWrites also no longer needs to write a WAL > record. The information is conveyed only by checkpoint records. I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by pg_start/stop_backup to know FPW's turning-off without waiting for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not required since no one uses the information. It seems even harmful when it is written at the incorrect place. In the attached patch, shared fullPageWrites is updated only at REDO point and XLOG_FPW_CHANGE is written only for FPW's turning off before REDO point. Disabling FPW at any time makes sense when we need to slow down the speed of WAL urgently, but... regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 10f6865cf1e96e8d883ed7e03e1fafb6e73ec935 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 6 Apr 2018 13:57:48 +0900 Subject: [PATCH] Change FPW handling The GUC full_pages_writes currently has an effect immediately. That makes a race condition between config reload on checkpointer and StartupXLOG. But since full page images are meaningful only when they are attached to all WAL records covers a checkpoint, there is no problem if we update the shared FPW only at REDO point. On the other hand, online backup mechanism on standby requires to know if FPW is turned off before the next checkpoint record comes. As the result, with this patch, changing of full_page_writes takes effect at REDO point and additional XLOG_FPW_CHANGE is written only for turning-off. These are sufficient for standby-backup to work properly, reduces complexity and prevent the race condition. --- src/backend/access/transam/xlog.c | 114 +- src/backend/postmaster/checkpointer.c | 6 -- src/include/access/xlog.h | 1 - src/include/catalog/pg_control.h | 2 +- 4 files changed, 31 insertions(+), 92 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b4fd8395b7..0a81650c26 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -202,14 +202,6 @@ static XLogRecPtr LastRec; static XLogRecPtr receivedUpto = 0; static TimeLineID receiveTLI = 0; -/* - * During recovery, lastFullPageWrites keeps track of full_page_writes that - * the replayed WAL records indicate. It's initialized with full_page_writes - * that the recovery starting checkpoint record indicates, and then updated - * each time XLOG_FPW_CHANGE record is replayed. - */ -static bool lastFullPageWrites; - /* * Local copy of SharedRecoveryInProgress variable. True actually means "not * known, need to check the shared state". @@ -6776,11 +6768,7 @@ StartupXLOG(void) */ restoreTwoPhaseData(); - lastFullPageWrites = checkPoint.fullPageWrites; - RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo; - doPageWrites = lastFullPageWrites; - if (RecPtr < checkPoint.redo) ereport(PANIC, (errmsg("invalid redo in checkpoint record"))); @@ -7575,16 +7563,6 @@ StartupXLOG(void) /* Pre-scan prepared transactions to find out the range of XIDs present */ oldestActiveXID = PrescanPreparedTransactions(NULL, NULL); - /* - * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE - * record before resource manager writes cleanup WAL records or checkpoint - * record is written. - */ - Insert->fullPageWrites = lastFullPageWrites; - LocalSetXLogInsertAllowed(); - UpdateFullPageWrites(); - LocalXLogInsertAllowed = -1; - if (InRecovery) { /* @@ -7808,6 +7786,13 @@ StartupXLOG(void) ControlFile->state = DB_IN_PRODUCTION; ControlFile->time = (pg_time_t) time(NULL); + /* + * Set the initial value of shared fullPageWrites. Once + * SharedRecoveryInProgress is turned false, checkpointer will update this + * value. + */ + XLogCtl->Insert.fullPageWrites = checkPoint.fullPageWrites; + SpinLockAcquire(&XLogCtl->info_lck); XLogCtl->SharedRecoveryInProgress = false; SpinLockRelease(&XLogCtl->info_lck); @@ -8669,6 +8654,20 @@ CreateChe
Re: Problem while setting the fpw with SIGHUP
At Sat, 31 Mar 2018 17:43:58 +0530, Amit Kapila wrote in > On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI > wrote: > > At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier > > wrote in <20180327130226.ga1...@paquier.xyz> > >> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote: > >> > The current UpdateFullPageWrites is safe on standby and promotion > >> > so what we should consider is only the non-standby case. I think > >> > what we should do is just calling RecoveryInProgress() at the > >> > beginning of CheckPointerMain, which is just the same thing with > >> > InitPostgres, but before setting up signal handler to avoid > >> > processing SIGHUP before being ready to insert xlog. > >> > >> Your proposal does not fix the issue for a checkpointer process started > >> on a standby. After a promotion, if SIGHUP is issued with a change in > >> full_page_writes, then the initialization of InitXLogInsert() would > >> happen again in the critical section of UpdateFullPageWrites(). The > >> window is rather small for normal promotions as the startup process > >> requests a checkpoint which would do the initialization, and much larger > >> for fallback_promote where the startup process is in charge of doing the > >> end-of-recovery checkpoint. > > > > Yeah. I realized that after sending the mail. > > > > I looked closer and found several problems there. > > > > - On standby, StartupXLOG calls UpdateFullPageWrites and > > checkpointer can call the same function simultaneously, but it > > doesn't assume concurrent call. > > > > - StartupXLOG can make a concurrent write to > > Insert->fullPageWrite so it needs to be locked. > > > > - At the time of the very end of recovery, the startup process > > ignores possible change of full_page_writes GUC. It sticks with > > the startup value. It leads to loss of > > XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by > > reload) > > > > Won't this be covered by checkpointer process? Basically, the next > time checkpointer sees that it has received the sighup, it will call > UpdateFullPageWrites which will log the record if required. Right. Checkpointer is doing the right thing, but without writing XLOG_FPW_CHANGE records during recovery. The problem is in StartupXLOG. It doesn't read shared FPW flag during recovery and updates local flag according to WAL records. Then it tries to issue XLOG_FPW_CHANGE if its local status and shared flag are different. This is correct. But after that, checkpointer still cannot write XLOG (before SharedRecovoeryInProgress becomes false) but checkpointer can change shared fullPagesWrites without writing WAL and the WAL record is eventually lost. > In general, I was wondering why in the first place this variable > (full_page_writes) is a SIGHUP variable? I think if the user tries to > switch it to 'on' from 'off', it won't guarantee the recovery from > torn pages. Yeah, one can turn it to 'off' from 'on' without any > problem, but as the reverse doesn't guarantee anything, it can confuse > users. What do you think? I tend to agree with you. It works as expected after the next checkpoint. So define the variable as "it can be changed any time but has an effect at the next checkpoint time", then remove XLOG_FPW_CHANGE record. And that eliminates the problem of concurrent calls since the checkpointer becomes the only modifier of the variable. And the problematic function UpdateFullPageWrites also no longer needs to write a WAL record. The information is conveyed only by checkpoint records. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI wrote: > At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier > wrote in <20180327130226.ga1...@paquier.xyz> >> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote: >> > The current UpdateFullPageWrites is safe on standby and promotion >> > so what we should consider is only the non-standby case. I think >> > what we should do is just calling RecoveryInProgress() at the >> > beginning of CheckPointerMain, which is just the same thing with >> > InitPostgres, but before setting up signal handler to avoid >> > processing SIGHUP before being ready to insert xlog. >> >> Your proposal does not fix the issue for a checkpointer process started >> on a standby. After a promotion, if SIGHUP is issued with a change in >> full_page_writes, then the initialization of InitXLogInsert() would >> happen again in the critical section of UpdateFullPageWrites(). The >> window is rather small for normal promotions as the startup process >> requests a checkpoint which would do the initialization, and much larger >> for fallback_promote where the startup process is in charge of doing the >> end-of-recovery checkpoint. > > Yeah. I realized that after sending the mail. > > I looked closer and found several problems there. > > - On standby, StartupXLOG calls UpdateFullPageWrites and > checkpointer can call the same function simultaneously, but it > doesn't assume concurrent call. > > - StartupXLOG can make a concurrent write to > Insert->fullPageWrite so it needs to be locked. > > - At the time of the very end of recovery, the startup process > ignores possible change of full_page_writes GUC. It sticks with > the startup value. It leads to loss of > XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by > reload) > Won't this be covered by checkpointer process? Basically, the next time checkpointer sees that it has received the sighup, it will call UpdateFullPageWrites which will log the record if required. In general, I was wondering why in the first place this variable (full_page_writes) is a SIGHUP variable? I think if the user tries to switch it to 'on' from 'off', it won't guarantee the recovery from torn pages. Yeah, one can turn it to 'off' from 'on' without any problem, but as the reverse doesn't guarantee anything, it can confuse users. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
At Wed, 28 Mar 2018 15:59:48 +0900, Michael Paquier wrote in <20180328065948.gm1...@paquier.xyz> > On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote: > > The attached does that. I don't like that it uses ControlFileLock > > to exlucde concurrent UpdateFullPageWrites and StartupXLOG but > > WALInsertLock cannot be used since UpdateFullPageWrites may take > > the same lock. > > You visibly forgot your patch. Mmm, someone must have eaten that. I'm sure it is attached this time. I don't like UpdateFullPageWrites is using ControlFileLock to exclusion.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index cb9c2a29cb..d2bb5e16ac 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7558,9 +7558,11 @@ StartupXLOG(void) /* * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE * record before resource manager writes cleanup WAL records or checkpoint - * record is written. + * record is written, ignoreing the change of full_page_write GUC so far. */ + WALInsertLockAcquireExclusive(); Insert->fullPageWrites = lastFullPageWrites; + WALInsertLockRelease(); LocalSetXLogInsertAllowed(); UpdateFullPageWrites(); LocalXLogInsertAllowed = -1; @@ -7783,6 +7785,9 @@ StartupXLOG(void) * itself, we use the info_lck here to ensure that there are no race * conditions concerning visibility of other recent updates to shared * memory. + * + * ControlFileLock excludes concurrent update of shared fullPageWrites in + * addition to its ordinary use. */ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); ControlFile->state = DB_IN_PRODUCTION; @@ -7790,11 +7795,25 @@ StartupXLOG(void) SpinLockAcquire(&XLogCtl->info_lck); XLogCtl->SharedRecoveryInProgress = false; + lastFullPageWrites = Insert->fullPageWrites; SpinLockRelease(&XLogCtl->info_lck); UpdateControlFile(); LWLockRelease(ControlFileLock); + /* + * Shared fullPageWrites at the end of recovery differs to our last known + * value. The change has been made by checkpointer but we haven't made + * corresponding XLOG_FPW_CHANGE record. We reread GUC change if any and + * try update shared fullPageWrites by myself. It ends with doing nothing + * if checkpointer already did the same thing. + */ + if (lastFullPageWrites != fullPageWrites) + { + HandleStartupProcInterrupts(); + UpdateFullPageWrites(); + } + /* * If there were cascading standby servers connected to us, nudge any wal * sender processes to notice that we've been promoted. @@ -9525,8 +9544,10 @@ XLogReportParameters(void) * Update full_page_writes in shared memory, and write an * XLOG_FPW_CHANGE record if necessary. * - * Note: this function assumes there is no other process running - * concurrently that could update it. + * This function assumes called concurrently from multiple processes and + * called concurrently with changing of shared fullPageWrites. See + * StartupXLOG(). + * */ void UpdateFullPageWrites(void) @@ -9536,13 +9557,25 @@ UpdateFullPageWrites(void) /* * Do nothing if full_page_writes has not been changed. * - * It's safe to check the shared full_page_writes without the lock, - * because we assume that there is no concurrently running process which - * can update it. + * We acquire ControlFileLock to exclude other concurrent call and change + * of shared fullPageWrites. */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + WALInsertLockAcquireExclusive(); if (fullPageWrites == Insert->fullPageWrites) + { + WALInsertLockRelease(); + LWLockRelease(ControlFileLock); return; + } + WALInsertLockRelease(); + /* + * Need to set up XLogInsert working area before entering critical section + * if we are not in recovery mode. + */ + (void) RecoveryInProgress(); + START_CRIT_SECTION(); /* @@ -9578,6 +9611,8 @@ UpdateFullPageWrites(void) WALInsertLockRelease(); } END_CRIT_SECTION(); + + LWLockRelease(ControlFileLock); } /*
Re: Problem while setting the fpw with SIGHUP
On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote: > The attached does that. I don't like that it uses ControlFileLock > to exlucde concurrent UpdateFullPageWrites and StartupXLOG but > WALInsertLock cannot be used since UpdateFullPageWrites may take > the same lock. You visibly forgot your patch. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier wrote in <20180327130226.ga1...@paquier.xyz> > On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote: > > The current UpdateFullPageWrites is safe on standby and promotion > > so what we should consider is only the non-standby case. I think > > what we should do is just calling RecoveryInProgress() at the > > beginning of CheckPointerMain, which is just the same thing with > > InitPostgres, but before setting up signal handler to avoid > > processing SIGHUP before being ready to insert xlog. > > Your proposal does not fix the issue for a checkpointer process started > on a standby. After a promotion, if SIGHUP is issued with a change in > full_page_writes, then the initialization of InitXLogInsert() would > happen again in the critical section of UpdateFullPageWrites(). The > window is rather small for normal promotions as the startup process > requests a checkpoint which would do the initialization, and much larger > for fallback_promote where the startup process is in charge of doing the > end-of-recovery checkpoint. Yeah. I realized that after sending the mail. I looked closer and found several problems there. - On standby, StartupXLOG calls UpdateFullPageWrites and checkpointer can call the same function simultaneously, but it doesn't assume concurrent call. - StartupXLOG can make a concurrent write to Insert->fullPageWrite so it needs to be locked. - At the time of the very end of recovery, the startup process ignores possible change of full_page_writes GUC. It sticks with the startup value. It leads to loss of XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by reload) - If checkpointer calls UpdateFullPageWrites concurrently with change of SharedRecoveryInProgress to false in StartupXLOG the change may lose corresponding XLOG_CHANGE_FPW. So, if we don't accept the current behavior, what I think we should do are all of the follows. A. In StartupXLOG, protect write to Insert->fullPageWrites with wal insert exlusive lock. Do the same thing for read in UpdateFullPageWrites. B. Surround the whole UpdateFullPageWrites with any kind of lock to exclude concurrent calls. The attached uses ControlFileLock. This also exludes the function with chaging of SharedRecoveryInProgress to avoid loss of XLOG_CHANGE_FPW. C. After exiting recovery mode, call UpdateFullPageWrites from StartupXLOG if shared fullPageWrites is found changed from the last known value. If checkponiter did the same thing at the same time, one of them completes the work. D. Call RecoveryInProgress to set up xlog working area. The attached does that. I don't like that it uses ControlFileLock to exlucde concurrent UpdateFullPageWrites and StartupXLOG but WALInsertLock cannot be used since UpdateFullPageWrites may take the same lock. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Problem while setting the fpw with SIGHUP
On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote: > The current UpdateFullPageWrites is safe on standby and promotion > so what we should consider is only the non-standby case. I think > what we should do is just calling RecoveryInProgress() at the > beginning of CheckPointerMain, which is just the same thing with > InitPostgres, but before setting up signal handler to avoid > processing SIGHUP before being ready to insert xlog. Your proposal does not fix the issue for a checkpointer process started on a standby. After a promotion, if SIGHUP is issued with a change in full_page_writes, then the initialization of InitXLogInsert() would happen again in the critical section of UpdateFullPageWrites(). The window is rather small for normal promotions as the startup process requests a checkpoint which would do the initialization, and much larger for fallback_promote where the startup process is in charge of doing the end-of-recovery checkpoint. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
At Tue, 27 Mar 2018 16:46:30 +0900, Michael Paquier wrote in <20180327074630.gd9...@paquier.xyz> > I have finally been able to spend more time on this issue, and checked > for a couple of hours all the calls to RecoveryInProgress() that could > be triggered within a critical section to see if the move I suggested > previously is worth it ot not as this would cost some memory for > standbys all the time, which would suck for many read-only sessions. > > There are a couple of calls potentially happening in critical sections, > however except for the one in UpdateFullPageWrites() those are used for > sanity checks in code paths that should never trigger it, take > XLogInsertBegin() for example. So with assertions this would trigger > a failure before the real elog(ERROR) message shows up. > > Hence, I am changing opinion still I think that instead of the patch you > proposed first we could just do a call to InitXLogInsert() before > entering the critical section. This is also more consistent with what > CreateCheckpoint() does. That's also less risky for a backpatch as your > patch increases the window between the beginning of the critical section > and the real moment where the check for RecoveryInProgress is needed. A > comment explaining why the initialization needs to happen is also > essential. > > All in all, this would give the simple patch attached. > > Thoughts? At the first look I was uneasy that the patch initializes xlog working area earlier than required. The current UpdateFullPageWrites is safe on standby and promotion so what we should consider is only the non-standby case. I think what we should do is just calling RecoveryInProgress() at the beginning of CheckPointerMain, which is just the same thing with InitPostgres, but before setting up signal handler to avoid processing SIGHUP before being ready to insert xlog. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index 4b452e7cee..c446e81299 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -197,6 +197,13 @@ CheckpointerMain(void) CheckpointerShmem->checkpointer_pid = MyProcPid; + /* + * We need to call InitXLOGAccess(), if the system isn't in hot-standby + * mode. This is handled by calling RecoveryInProgress and ignoring the + * result. This needs to be done before SIGHUP handler is set up. + */ + (void) RecoveryInProgress(); + /* * Properly accept or ignore signals the postmaster might send us *
Re: Problem while setting the fpw with SIGHUP
On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote: > On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier > wrote: > In StartupXLOG, just before the CreateCheckPoint() call, we are calling > LocalSetXLogInsertAllowed(). So, I am thinking can we just remove > InitXLogInsert from CreateCheckpoint. And, we don't need to move it to > bootstrap.c. Or am I missing something? I have finally been able to spend more time on this issue, and checked for a couple of hours all the calls to RecoveryInProgress() that could be triggered within a critical section to see if the move I suggested previously is worth it ot not as this would cost some memory for standbys all the time, which would suck for many read-only sessions. There are a couple of calls potentially happening in critical sections, however except for the one in UpdateFullPageWrites() those are used for sanity checks in code paths that should never trigger it, take XLogInsertBegin() for example. So with assertions this would trigger a failure before the real elog(ERROR) message shows up. Hence, I am changing opinion still I think that instead of the patch you proposed first we could just do a call to InitXLogInsert() before entering the critical section. This is also more consistent with what CreateCheckpoint() does. That's also less risky for a backpatch as your patch increases the window between the beginning of the critical section and the real moment where the check for RecoveryInProgress is needed. A comment explaining why the initialization needs to happen is also essential. All in all, this would give the simple patch attached. Thoughts? -- Michael From 0fc3878375890aef5c0a6dc57d73c30aa97c88df Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 27 Mar 2018 16:38:37 +0900 Subject: [PATCH] Fix WAL insert when updating full_page_writes for checkpointer At startup or when receiving a SIGHUP signal to update its configuration, the checkpointer may need to update the shared memory for full_page_writes and needs to write a WAL record about it. However, just after a promotion, there is a small window where it is possible that the memory context in charge of building WAL records has not been correctly initialized, leading to a crash of the checkpointer. With assertions enabled, this causes a failure when allocating memory in a critical section. And this fails when trying to build the FPW record without assertions. Report and patch by Dilip Kumar, reviewed by Michael Paquier. Discussion: https://postgr.es/m/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com --- src/backend/access/transam/xlog.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index cb9c2a29cb..aae1131977 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9543,6 +9543,16 @@ UpdateFullPageWrites(void) if (fullPageWrites == Insert->fullPageWrites) return; + /* + * Initialize InitXLogInsert working areas before entering the critical + * section. Normally, this is done by the first call to + * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but it can + * be possible that this has not been initialized yet for a checkpointer + * which updates the value of full_page_writes for a received SIGHUP or + * at process startup. + */ + InitXLogInsert(); + START_CRIT_SECTION(); /* -- 2.16.3 signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier wrote: > On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote: > > Yeah, you are right. Fixed. > > So I have been spending a couple of hours playing with your patch, and > tested various configurations manually, like switch the fpw switch to on > and off while running in parallel pgbench. I have also tested > promotions, etc. > > While the patch does its job, it is possible to simplify a bit more the > calls to InitXLogInsert(). Particularly, the one in CreateCheckpoint() > is basically useless for the checkpointer, still it is useful for the > startup process when trigerring an end-in-recovery checkpoint. So one > additional cleanup would be to move the call in CreateCheckpoint() to > bootstrap.c for the startup process. In StarupXLOG, just before the CreateCheckPoint() call, we are calling LocalSetXLogInsertAllowed(). So, I am thinking can we just remove InitXLogInsert from CreateCheckpoint. And, we don't need to move it to bootstrap.c. Or am I missing something? > In order to test that, please make > sure to create fallback_promote at the root of the data folder before > sending SIGUSR2 to the startup process which would trigger the pre-9.3 > promotion where the end-of-recovery checkpoint is triggered by the > startup process itself. > > + /* Initialize the working areas for constructing WAL records. */ > + InitXLogInsert(); > Instead of having the same comment for each process calling > InitXLogInsert() multiple times, I think that it would be better to > complete the comment in bootstrap.c where is written "XLOG operations". > > Here is a suggestion: > /* > * Initialize WAL-related operations and enter the main loop of each > * process. InitXLogInsert is called for each process which can > * generate WAL. While this is wasteful for processes started on > * a standby, this gives the guarantee that initialization of WAL > * insertion areas is able to happen in a consistent way and out of > * any critical sections so as the facility is usable when a promotion > * is triggered. > */ > > What do you think? > Looks good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Problem while setting the fpw with SIGHUP
On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote: > Yeah, you are right. Fixed. So I have been spending a couple of hours playing with your patch, and tested various configurations manually, like switch the fpw switch to on and off while running in parallel pgbench. I have also tested promotions, etc. While the patch does its job, it is possible to simplify a bit more the calls to InitXLogInsert(). Particularly, the one in CreateCheckpoint() is basically useless for the checkpointer, still it is useful for the startup process when trigerring an end-in-recovery checkpoint. So one additional cleanup would be to move the call in CreateCheckpoint() to bootstrap.c for the startup process. In order to test that, please make sure to create fallback_promote at the root of the data folder before sending SIGUSR2 to the startup process which would trigger the pre-9.3 promotion where the end-of-recovery checkpoint is triggered by the startup process itself. + /* Initialize the working areas for constructing WAL records. */ + InitXLogInsert(); Instead of having the same comment for each process calling InitXLogInsert() multiple times, I think that it would be better to complete the comment in bootstrap.c where is written "XLOG operations". Here is a suggestion: /* * Initialize WAL-related operations and enter the main loop of each * process. InitXLogInsert is called for each process which can * generate WAL. While this is wasteful for processes started on * a standby, this gives the guarantee that initialization of WAL * insertion areas is able to happen in a consistent way and out of * any critical sections so as the facility is usable when a promotion * is triggered. */ What do you think? -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Tue, Mar 20, 2018 at 11:26 AM, Michael Paquier wrote: > On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote: > > I think like WALWriterProcess, we need to call InitXLogInsert for the > > CheckpointerProcess as well as for the BgWriterProcess > > because earlier they were calling InitXLogInsert while check > > RecoveryInProgress before inserting the WAL. > > /* don't set signals, bgwriter has its own agenda */ > + InitXLOGAccess(); > + InitXLogInsert() > > This is wrong, as the checkpointer is started as well on standbys, and > that InitXLOGAccess initializes things for WAL generation like > ThisTimeLineID. So you should just call InitXLogInsert(), and a comment > would be welcome for both the bgwriter and the checkpointer. > Yeah, you are right. Fixed. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com xlog-insert-critical-v3.patch Description: Binary data
Re: Problem while setting the fpw with SIGHUP
On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote: > I think like WALWriterProcess, we need to call InitXLogInsert for the > CheckpointerProcess as well as for the BgWriterProcess > because earlier they were calling InitXLogInsert while check > RecoveryInProgress before inserting the WAL. /* don't set signals, bgwriter has its own agenda */ + InitXLOGAccess(); + InitXLogInsert() This is wrong, as the checkpointer is started as well on standbys, and that InitXLOGAccess initializes things for WAL generation like ThisTimeLineID. So you should just call InitXLogInsert(), and a comment would be welcome for both the bgwriter and the checkpointer. -- Michael signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Fri, Mar 16, 2018 at 10:53 AM, Michael Paquier wrote: > On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote: > > Instead of doing what you are suggesting, why not moving > > InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as > > the allocations for WAL inserts is done unconditionally? This has > > the cost of also making this allocation even for backends which are > > started during recovery, still we are talking about allocating a couple > > of bytes in exchange of addressing completely all race conditions in > > this area. InitXLogInsert() does not depend on any post-recovery data > > like ThisTimeLineId, so a split is possible. > > I have been hacking things this way, and it seems to me that it takes > care of all this class of problems. CreateCheckPoint() actually > mentions that InitXLogInsert() cannot be called in a critical section, > so the comments don't match the code. I also think that we still want > to be able to use RecoveryInProgress() in critical sections to do > decision-making for the generation of WAL records > Thanks for the patch, the idea looks good to me. Please find some comments and updated patch. I think like WALWriterProcess, we need to call InitXLogInsert for the CheckpointerProcess as well as for the BgWriterProcess because earlier they were calling InitXLogInsert while check RecoveryInProgress before inserting the WAL. see below crash: #0 0x7f89273a65d7 in raise () from /lib64/libc.so.6 #1 0x7f89273a7cc8 in abort () from /lib64/libc.so.6 #2 0x009fd24e in errfinish (dummy=0) at elog.c:556 #3 0x009ff70b in elog_finish (elevel=20, fmt=0xac0d1d "too much WAL data") at elog.c:1378 #4 0x00558766 in XLogRegisterData (data=0xf3efac "\001", len=1) at xloginsert.c:330 #5 0x0055080e in UpdateFullPageWrites () at xlog.c:9569 #6 0x007ea831 in UpdateSharedMemoryConfig () at checkpointer.c:1360 #7 0x007e95b1 in CheckpointerMain () at checkpointer.c:370 #8 0x00561680 in AuxiliaryProcessMain (argc=2, argv=0x7fffcfd4bec0) at bootstrap.c:447 I have modified you patch and called InitXLogInsert for CheckpointerProcess and BgWriterProcess also. After that the issue is solved and fpw is getting set properly. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com xlog-insert-critical-v2.patch Description: Binary data
Re: Problem while setting the fpw with SIGHUP
On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote: > Instead of doing what you are suggesting, why not moving > InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as > the allocations for WAL inserts is done unconditionally? This has > the cost of also making this allocation even for backends which are > started during recovery, still we are talking about allocating a couple > of bytes in exchange of addressing completely all race conditions in > this area. InitXLogInsert() does not depend on any post-recovery data > like ThisTimeLineId, so a split is possible. I have been hacking things this way, and it seems to me that it takes care of all this class of problems. CreateCheckPoint() actually mentions that InitXLogInsert() cannot be called in a critical section, so the comments don't match the code. I also think that we still want to be able to use RecoveryInProgress() in critical sections to do decision-making for the generation of WAL records. -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 47a6c4d895..cff238ee2a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8049,6 +8049,9 @@ LocalSetXLogInsertAllowed(void) /* Initialize as RecoveryInProgress() would do when switching state */ InitXLOGAccess(); + + /* Also initialize the working areas for constructing WAL records */ + InitXLogInsert(); } /* @@ -8178,9 +8181,6 @@ InitXLOGAccess(void) (void) GetRedoRecPtr(); /* Also update our copy of doPageWrites. */ doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites); - - /* Also initialize the working areas for constructing WAL records */ - InitXLogInsert(); } /* @@ -8582,11 +8582,11 @@ CreateCheckPoint(int flags) /* * Initialize InitXLogInsert working areas before entering the critical - * section. Normally, this is done by the first call to - * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but when creating - * an end-of-recovery checkpoint, the LocalSetXLogInsertAllowed call is - * done below in a critical section, and InitXLogInsert cannot be called - * in a critical section. + * section. Normally, this is done at backend startup or when calling + * LocalSetXLogInsertAllowed(), but when creating an end-of-recovery + * checkpoint, the LocalSetXLogInsertAllowed call is done below in a + * critical section, and InitXLogInsert cannot be called in a critical + * section. */ InitXLogInsert(); diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c index 28ff2f0979..4d20b9712f 100644 --- a/src/backend/bootstrap/bootstrap.c +++ b/src/backend/bootstrap/bootstrap.c @@ -452,6 +452,7 @@ AuxiliaryProcessMain(int argc, char *argv[]) case WalWriterProcess: /* don't set signals, walwriter has its own agenda */ InitXLOGAccess(); + InitXLogInsert(); WalWriterMain(); proc_exit(1); /* should never return */ diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index d8f45b3c43..8db377c1cf 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -618,6 +618,15 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, */ if (IsUnderPostmaster) { + /* + * Initialize the working areas for constructing WAL records. + * This is done even for a standby instance to avoid initialization + * of this machinery after a promotion, which could happen in a + * critical section and InitXLogInsert() cannot be called in such + * a code path. + */ + InitXLogInsert(); + /* * The postmaster already started the XLOG machinery, but we need to * call InitXLOGAccess(), if the system isn't in hot-standby mode. signature.asc Description: PGP signature
Re: Problem while setting the fpw with SIGHUP
On Fri, Mar 09, 2018 at 01:42:04PM +0530, Dilip Kumar wrote: > While setting the full_page_write with SIGHUP I hit an assert in checkpoint > process. And, that is because inside a CRITICAL section we are calling > RecoveryInProgress which intern allocates memory. So I have moved > RecoveryInProgress call out of the CRITICAL section and the problem got > solved. Indeed, I can see how this is possible. If you apply the following you can also have way more fun, but that's overdoing it: --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7918,6 +7918,8 @@ CheckRecoveryConsistency(void) bool RecoveryInProgress(void) { + Assert(CritSectionCount == 0); Anyway, it seems to me that you are not taking care of all possible race conditions here. RecoveryInProgress() could as well be called in XLogFlush(), and that's a code path taken during redo. Instead of doing what you are suggesting, why not moving InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as the allocations for WAL inserts is done unconditionally? This has the cost of also making this allocation even for backends which are started during recovery, still we are talking about allocating a couple of bytes in exchange of addressing completely all race conditions in this area. InitXLogInsert() does not depend on any post-recovery data like ThisTimeLineId, so a split is possible. Thoughts? -- Michael signature.asc Description: PGP signature
Problem while setting the fpw with SIGHUP
While setting the full_page_write with SIGHUP I hit an assert in checkpoint process. And, that is because inside a CRITICAL section we are calling RecoveryInProgress which intern allocates memory. So I have moved RecoveryInProgress call out of the CRITICAL section and the problem got solved. command executed: killall -SIGHUP postgres Crash call stack: #0 0x7fa19560d5d7 in raise () from /lib64/libc.so.6 #1 0x7fa19560ecc8 in abort () from /lib64/libc.so.6 #2 0x009fc991 in ExceptionalCondition (conditionName=0xc5ab1c "!(CritSectionCount == 0)", errorType=0xc5a739 "FailedAssertion", fileName=0xc5a8a5 "mcxt.c", lineNumber=635) at assert.c:54 #3 0x00a34e56 in MemoryContextCreate (node=0x192edc0, tag=T_AllocSetContext, size=216, nameoffset=216, methods=0xc58620 , parent=0x18fe1b0, name=0xac1137 "WAL record construction", flags=0) at mcxt.c:635 #4 0x00a2aaa1 in AllocSetContextCreateExtended (parent=0x18fe1b0, name=0xac1137 "WAL record construction", flags=0, minContextSize=0, initBlockSize=8192, maxBlockSize=8388608) at aset.c:463 #5 0x0055983c in InitXLogInsert () at xloginsert.c:1033 #6 0x0054e4e5 in InitXLOGAccess () at xlog.c:8183 #7 0x0054df71 in RecoveryInProgress () at xlog.c:7952 #8 0x005507f6 in UpdateFullPageWrites () at xlog.c:9566 #9 0x007ea821 in UpdateSharedMemoryConfig () at checkpointer.c:1366 #10 0x007e95a1 in CheckpointerMain () at checkpointer.c:383 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com fix_fpwupdate.patch Description: Binary data