At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote in <caa4ek1+xfx5jd2chmlpnqxeozqrlkg9tnr8wfs3-cpf9ln9...@mail.gmail.com>
> On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier <mich...@paquier.xyz> wrote:
> >
> > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> > > 	/*
> > > 	 * Properly accept or ignore signals the postmaster might send us.
> > > 	 */
> > > -	pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> > > +	pqsignal(SIGHUP, SIG_IGN);	/* ignore reload config */
> > >
> > > I am finally coming back to this patch set, and that's one of the first
> > > things I am going to help moving on for this CF. And this bit from the
> > > last patch series is not acceptable as there are some parameters which
> > > are used by the startup process which can be reloaded. One of them is
> > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
> >
> > So, I have been working on this problem again and I have reviewed the
> > thread, and there have been many things discussed in the last couple of
> > months:
> > 1) We do not want to initialize XLogInsert stuff unconditionally for all
> > processes at the moment recovery begins, but we just want to initialize
> > it once WAL write is open for business.
> > 2) Both the checkpointer and the startup process can call
> > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
> > incorrect values.
>
> Can you share the steps to reproduce this problem?
The window for the issue is small.  It happens when the checkpointer
first finds SharedRecoveryInProgress turned off at the beginning of
the CheckpointerMain() loop.  The window needs to be widened to make
the issue happen reliably.  After applying the first attached patch,
the following steps cause the issue with high reliability.

1. initdb (full_page_writes = on by default)
2. put a recovery.conf so that the server can accept a promote signal
3. touch /tmp/hoge, then change full_page_writes to off in postgresql.conf
4. pg_ctl promote

The second attached file is a script that does the above steps.  The
server writes the following log messages and dies.

| 2018-09-18 13:55:49.928 JST [16405] LOG:  database system is ready to accept connections
| TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
| 2018-09-18 13:55:50.546 JST [16405] LOG:  checkpointer process (PID 16407) was terminated by signal 6: Aborted

We can preallocate the XLogInsert buffer just to prevent the crash,
by making sure RecoveryInProgress() (which performs the allocation
when it finds that recovery has finished) is called before
UpdateFullPageWrites() enters its critical section.  This is an
isolated problem.

But there is another issue: an XLOG_FPW_CHANGE record can be missing,
or the FPW state can be wrong, after recovery ends.  It comes from
the fact that the responsibility to update the flag is not atomically
passed from the startup process to the checkpointer.  (The window is
after the UpdateFullPageWrites() call and until
SharedRecoveryInProgress is set to false.)

My latest patch tries to remove the window by placing all
responsibility for applying config file changes to the shared FPW
flag on the checkpointer.  On the way to doing that,
RecoveryInProgress() is changed to be called prior to
UpdateFullPageWrites().
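To illustrate the ordering I mean (just a sketch of the idea, not the
patch itself; got_SIGHUP and ProcessConfigFile() are the existing
names in checkpointer.c, and in the real code UpdateFullPageWrites()
is reached via UpdateSharedMemoryConfig()):

	/* in CheckpointerMain()'s main loop -- sketch only */
	if (got_SIGHUP)
	{
		got_SIGHUP = false;
		ProcessConfigFile(PGC_SIGHUP);

		/*
		 * Calling RecoveryInProgress() first matters: its first call
		 * after recovery has ended runs InitXLOGAccess(), which
		 * allocates the XLogInsert working areas outside of any
		 * critical section.  While recovery is in progress the
		 * checkpointer leaves the shared FPW flag alone; afterwards
		 * it is the only process applying config changes to it.
		 */
		if (!RecoveryInProgress())
			UpdateFullPageWrites();
	}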
> > 3) We do not want a palloc() in a critical section because of
> > RecoveryInProgress being called.
> >
> > And the root issue here is 2), because the checkpointer tries to update
> > Insert->fullPageWrites but it does not need to do so until recovery has
> > been finished. So in order to fix the original issue I am proposing a
> > simple fix: let's make sure that the checkpointer does not update
> > Insert->fullPageWrites until recovery finishes, and let's have the
> > startup process do the first update once it finishes recovery and
> > inserts by itself the XLOG_PARAMETER_CHANGE. This way the order of
> > events is purely sequential and we don't need to worry about having the
> > checkpointer and the startup process eat on each other's plate because
> > the checkpointer would only try to work on updating the shared memory
> > value of full_page_writes once SharedRecoveryInProgress is switched to
> > true, and that happens after the startup process does its initial call
> > to UpdateFullPageWrites(). I have improved as well all the comments
> > around to make clear the behavior wanted.
> >
> > Thoughts?

InRecovery is turned off after the last UpdateFullPageWrites() call
and before SharedRecoveryInProgress is turned off, so that still
leaves the window: the checkpointer stops flipping the value before
SharedRecoveryInProgress is turned off.  (I don't understand the
InRecovery condition..)  However, this lets the checkpointer omit
reloading after the startup process's UpdateFullPageWrites() until
SharedRecoveryInProgress is turned off.

> UpdateFullPageWrites(void)
> {
> 	XLogCtlInsert *Insert = &XLogCtl->Insert;
> +	/*
> +	 * Check if recovery is still in progress before entering this critical
> +	 * section, as some memory allocation could happen at the end of
> +	 * recovery. There is nothing to do for a system still in recovery.
> +	 * Note that we need to process things here at the end of recovery for
> +	 * the startup process, which is why this checks after InRecovery.
> +	 */
> +	if (RecoveryInProgress() && !InRecovery)
> +		return;
> +
>
> On a regular startup when there is no recovery, it won't allow us to
> log the WAL record (XLOG_FPW_CHANGE) which can happen without above
> change. You can check that by setting full_page_writes=off and start
> the system.

Agreed.  If we need to do that in the startup process, we need to
change the shared flag and issue XLOG_FPW_CHANGE whenever the
database state differs from the configuration file, regardless of
what StartupXLOG() did up to that point.
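Concretely, something like this near the end of StartupXLOG() should
work (again only a sketch; LocalSetXLogInsertAllowed() and
LocalXLogInsertAllowed are the existing helpers in xlog.c):

	/*
	 * Reconcile the shared flag with the configuration file value
	 * unconditionally.  UpdateFullPageWrites() is a no-op when the
	 * GUC already matches the shared value; otherwise it updates the
	 * shared value and, with XLogStandbyInfoActive(), emits
	 * XLOG_FPW_CHANGE.  So this also behaves correctly on a regular
	 * startup with no recovery.
	 */
	LocalSetXLogInsertAllowed();
	UpdateFullPageWrites();
	LocalXLogInsertAllowed = -1;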
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3025d0badb..c34a6f6f6b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7961,6 +7961,17 @@ StartupXLOG(void)
 	 */
 	WalSndWakeup();
 
+	/* DEBUG: cause a reload */
+	{
+		struct stat b;
+		if (stat("/tmp/hoge", &b) == 0)
+		{
+			elog(LOG, "STARTUP SLEEP FOR 1s");
+			sleep(1);
+			elog(LOG, "DONE.");
+			DirectFunctionCall1(pg_reload_conf, 0);
+		}
+	}
 	/*
 	 * If this was a fast promotion, request an (online) checkpoint now. This
 	 * isn't required for consistency, but the last restartpoint might be far
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 1a033093c5..26da229ca1 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -354,6 +354,9 @@ CheckpointerMain(void)
 		 */
 		AbsorbFsyncRequests();
 
+		elog(LOG, "CHECKPOINTER SLEEP FOR 3s");
+		sleep(3);
+		elog(LOG, "DONE.");
 		if (got_SIGHUP)
 		{
 			got_SIGHUP = false;
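(About the debug patch above: the first hunk makes the startup
process sleep one second and then trigger a config reload just after
promotion when /tmp/hoge exists, and the second hunk delays each
iteration of the checkpointer's loop by three seconds, widening the
timing window described above.)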
#! /bin/bash

set -e

#PGDATA=<somewhere>
if [ "$PGDATA" = "" ]; then
    echo set pgdata
    exit 1
fi

# rm -rf $PGDATA  ## danger!!
rm -f /tmp/hoge
initdb
cat << EOF > $PGDATA/recovery.conf
standby_mode='yes'
primary_conninfo='host=/tmp port=9999'
EOF
cat << EOF >> $PGDATA/postgresql.conf
restart_after_crash = off
EOF
pg_ctl start
sleep 5
touch /tmp/hoge
cat << EOF >> $PGDATA/postgresql.conf
full_page_writes = off
EOF
pg_ctl promote
sleep 10
pg_ctl stop -m i
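To run the script, set PGDATA to a fresh (nonexistent or empty)
directory and put the server binaries on PATH, for example (the
script name here is arbitrary):

	$ PGDATA=/tmp/fpwtest PATH=/path/to/pg/bin:$PATH ./repro.sh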