At Fri, 14 Sep 2018 16:30:37 +0530, Amit Kapila <amit.kapil...@gmail.com> wrote 
in <caa4ek1+xfx5jd2chmlpnqxeozqrlkg9tnr8wfs3-cpf9ln9...@mail.gmail.com>
> On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier <mich...@paquier.xyz> wrote:
> >
> > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> > > /*
> > >  * Properly accept or ignore signals the postmaster might send us.
> > >  */
> > > -       pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> > > +       pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> > >
> > > I am finally coming back to this patch set, and that's one of the first
> > > things I am going to help moving on for this CF.  And this bit from the
> > > last patch series is not acceptable as there are some parameters which
> > > are used by the startup process which can be reloaded.  One of them is
> > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery.
> >
> > So, I have been working on this problem again and I have reviewed the
> > thread, and there have been many things discussed in the last couple of
> > months:
> > 1) We do not want to initialize XLogInsert stuff unconditionally for all
> > processes at the moment recovery begins, but we just want to initialize
> > it once WAL write is open for business.
> > 2) Both the checkpointer and the startup process can call
> > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
> > incorrect values.
> 
> Can you share the steps to reproduce this problem?

The window for the issue is small.

It happens when the checkpointer first observes that
SharedRecoveryInProgress has been turned off at the beginning of the
CheckpointerMain loop. The window needs to be widened to make the
issue happen reliably.

With the first attached patch applied, the following steps cause
the issue with high reliability.

1. initdb  (full_page_writes = on by default)

2. put recovery.conf in place so that the server starts as a standby
   and can accept a promote signal

3. touch /tmp/hoge
   change full_page_writes to off in postgresql.conf

4. pg_ctl promote

The second attached file is a script that performs the above steps.
The server writes the following log messages and dies.

| 2018-09-18 13:55:49.928 JST [16405] LOG:  database system is ready to accept connections
| TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 731)
| 2018-09-18 13:55:50.546 JST [16405] LOG:  checkpointer process (PID 16407) was terminated by signal 6: Aborted

We can preallocate the XLogInsert buffer just to prevent the
crash. This is done by calling RecoveryInProgress() before
UpdateFullPageWrites() finds the shared flag turned off. As an
isolated problem that fixes the crash, but there is another issue:
the FPW_CHANGE record can be missing, or the FPW state can be wrong,
after recovery ends.
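
To illustrate, here is a minimal sketch of the checkpointer's reload
path with that fix (simplified; not the actual patch):

    /*
     * Sketch only: RecoveryInProgress() detects the end of recovery
     * and runs InitXLOGAccess() -> InitXLogInsert(), allocating the
     * WAL-insertion working area outside any critical section.
     * Otherwise the first palloc() happens inside the critical
     * section taken by UpdateFullPageWrites(), tripping the
     * CritSectionCount assertion shown above.
     */
    (void) RecoveryInProgress();
    UpdateFullPageWrites();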

It comes from the fact that the responsibility for updating the flag
is not passed atomically from the startup process to the
checkpointer. (The window lasts from the startup process's
UpdateFullPageWrites() call until SharedRecoveryInProgress is set to
false.)

My latest patch tries to remove the window by placing all
responsibility for applying config file changes to the shared FPW
flag on the checkpointer. On the way there, RecoveryInProgress() is
changed to be called prior to UpdateFullPageWrites().
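
In outline, the checkpointer's SIGHUP handling would then look
roughly like this (a sketch of the description above, not the patch
itself):

    if (got_SIGHUP)
    {
        got_SIGHUP = false;
        ProcessConfigFile(PGC_SIGHUP);

        /*
         * Detect the end of recovery (and initialize XLogInsert)
         * before touching the shared FPW flag; once recovery has
         * ended, only the checkpointer applies full_page_writes
         * changes from the configuration file.
         */
        if (!RecoveryInProgress())
            UpdateFullPageWrites();
    }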


> > 3) We do not want a palloc() in a critical section because of
> > RecoveryInProgress being called.
> >
> > And the root issue here is 2), because the checkpointer tries to update
> > Insert->fullPageWrites but it does not need to do so until recovery has
> > been finished.  So in order to fix the original issue I am proposing a
> > simple fix: let's make sure that the checkpointer does not update
> > Insert->fullPageWrites until recovery finishes, and let's have the
> > startup process do the first update once it finishes recovery and
> > inserts by itself the XLOG_PARAMETER_CHANGE.  This way the order of
> > events is purely sequential and we don't need to worry about having the
> > checkpointer and the startup process eat on each other's plate because
> > the checkpointer would only try to work on updating the shared memory
> > value of full_page_writes once SharedRecoveryInProgress is switched to
> > false, and that happens after the startup process does its initial call
> > to UpdateFullPageWrites().  I have improved as well all the comments
> > around to make clear the behavior wanted.
> >
> > Thoughts?

InRecovery is turned off after the last UpdateFullPageWrites() and
before SharedRecoveryInProgress is turned off, so the window is still
there. Thus, the checkpointer stops flipping the value before
SharedRecoveryInProgress is turned off. (I don't understand the
InRecovery condition..) However, this lets the checkpointer omit a
reload between the startup process's UpdateFullPageWrites() and the
moment SharedRecoveryInProgress is turned off.
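
For reference, the ordering near the end of StartupXLOG() is roughly
the following (simplified from xlog.c):

    LocalSetXLogInsertAllowed();
    UpdateFullPageWrites();       /* startup's last call             */
    LocalXLogInsertAllowed = -1;

    InRecovery = false;           /* local to the startup process    */

    /* ... other end-of-recovery actions ... */

    XLogCtl->SharedRecoveryInProgress = false;
                                  /* only now do other processes see
                                   * recovery as finished; a reload
                                   * landing in between is skipped   */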

>  UpdateFullPageWrites(void)
>  {
>   XLogCtlInsert *Insert = &XLogCtl->Insert;
> + /*
> + * Check if recovery is still in progress before entering this critical
> + * section, as some memory allocation could happen at the end of
> + * recovery.  There is nothing to do for a system still in recovery.
> + * Note that we need to process things here at the end of recovery for
> + * the startup process, which is why this checks after InRecovery.
> + */
> + if (RecoveryInProgress() && !InRecovery)
> + return;
> +
> 
> On a regular startup when there is no recovery, it won't allow us to
> log the WAL record (XLOG_FPW_CHANGE) which can happen without the above
> change.  You can check that by setting full_page_writes=off and starting
> the system.

Agreed. "If we need to do that in the start process," we need to
change the shared flag and issue FPW_CHANGE always when the
database state is different from configuration file, regardless
of what StartXLOG() did until the point.
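
A sketch of what that would mean, assuming it is done at the end of
StartupXLOG() (names as in xlog.c):

    /*
     * Sketch only: reconcile the shared flag with the current GUC
     * unconditionally once WAL insertion becomes possible, so that
     * an XLOG_FPW_CHANGE record is written whenever the database
     * state differs from the configuration file.
     */
    LocalSetXLogInsertAllowed();
    UpdateFullPageWrites();   /* logs XLOG_FPW_CHANGE on any change */
    LocalXLogInsertAllowed = -1;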


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3025d0badb..c34a6f6f6b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7961,6 +7961,17 @@ StartupXLOG(void)
 	 */
 	WalSndWakeup();
 
+	/* DEBUG: cause a reload */
+	{
+		struct stat b;
+		if (stat("/tmp/hoge", &b) == 0)
+		{
+			elog(LOG, "STARTUP SLEEP FOR 1s");
+			sleep(1);
+			elog(LOG, "DONE.");
+			DirectFunctionCall1(pg_reload_conf, 0);
+		}
+	}
 	/*
 	 * If this was a fast promotion, request an (online) checkpoint now. This
 	 * isn't required for consistency, but the last restartpoint might be far
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 1a033093c5..26da229ca1 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -354,6 +354,9 @@ CheckpointerMain(void)
 		 */
 		AbsorbFsyncRequests();
 
+		elog(LOG, "CHECKPOINTER SLEEP FOR 3s");
+		sleep(3);
+		elog(LOG, "DONE.");
 		if (got_SIGHUP)
 		{
 			got_SIGHUP = false;
#! /bin/bash

set -e

#PGDATA=<somewhere>
if [ "$PGDATA" = "" ]; then
  echo set pgdata
  exit 1;
fi

# rm -rf $PGDATA  ## danger!!
rm -f /tmp/hoge
initdb

cat << EOF > $PGDATA/recovery.conf
standby_mode='yes'
primary_conninfo='host=/tmp port=9999'
EOF
cat << EOF >> $PGDATA/postgresql.conf
restart_after_crash = off
EOF

pg_ctl start
sleep 5
touch /tmp/hoge

cat << EOF >> $PGDATA/postgresql.conf
full_page_writes = off
EOF

pg_ctl promote
sleep 10
pg_ctl stop -m i

