Tom Lane wrote: > Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes: >> Heikki Linnakangas wrote: >>> Hmm, what happens when the startup process performs a write, and >>> bgwriter is not running? Do the fsync requests queue up in the shmem >>> queue until the end of recovery when bgwriter is launched? I guess I'll >>> have to try it out... > >> Oh dear, doesn't look good. The startup process has a pendingOpsTable of >> its own. bgwriter won't fsync() files that the startup process has >> written itself. That needs to be fixed, or you can lose data when an >> archive recovery crashes after a restartpoint. > > Ouch. I'm beginning to think that the best thing is to temporarily > revert the change that made bgwriter active during recovery. It's > obviously not been adequately thought through or tested.
That was my first thought too, but unfortunately we now rely on bgwriter to perform restartpoints :-(. I came up with the attached patch, which includes Simon's patch to have all fsync requests forwarded to bgwriter during archive recovery. To fix the startup checkpoint issue, startup process requests a forced restartpoint, which will flush any fsync requests bgwriter has accumulated, before doing the actual checkpoint in the startup process. This is completely untested still, but does anyone immediately see any more problems? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 4fd9d41..5114664 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5156,6 +5156,7 @@ StartupXLOG(void) XLogRecord *record; uint32 freespace; TransactionId oldestActiveXID; + bool bgwriterLaunched = false; XLogCtl->SharedRecoveryInProgress = true; @@ -5472,7 +5473,11 @@ StartupXLOG(void) * process in addition to postmaster! */ if (InArchiveRecovery && IsUnderPostmaster) + { + SetForwardFsyncRequests(); SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED); + bgwriterLaunched = true; + } /* * main redo apply loop @@ -5742,7 +5747,16 @@ StartupXLOG(void) * assigning a new TLI, using a shutdown checkpoint allows us to have * the rule that TLI only changes in shutdown checkpoints, which * allows some extra error checking in xlog_redo. + * + * If bgwriter is active, we can't do the necessary fsyncs in the + * startup process because bgwriter has accumulated fsync requests + * into its private memory. So we request a last forced restart point, + * which will flush them to disk, before performing the actual + * checkpoint. */ + if (bgwriterLaunched) + RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | + CHECKPOINT_WAIT); CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE); /* @@ -6219,27 +6233,11 @@ CreateCheckPoint(int flags) /* * Acquire CheckpointLock to ensure only one checkpoint happens at a time. - * During normal operation, bgwriter is the only process that creates - * checkpoints, but at the end of archive recovery, the bgwriter can be - * busy creating a restartpoint while the startup process tries to perform - * the startup checkpoint. + * (This is just pro forma, since in the present system structure there is + * only one process that is allowed to issue checkpoints at any given + * time.) */ - if (!LWLockConditionalAcquire(CheckpointLock, LW_EXCLUSIVE)) - { - Assert(InRecovery); - - /* - * A restartpoint is in progress. Wait until it finishes. This can - * cause an extra restartpoint to be performed, but that's OK because - * we're just about to perform a checkpoint anyway. Flushing the - * buffers in this restartpoint can take some time, but that time is - * saved from the upcoming checkpoint so the net effect is zero. - */ - ereport(DEBUG2, (errmsg("hurrying in-progress restartpoint"))); - RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT); - - LWLockAcquire(CheckpointLock, LW_EXCLUSIVE); - } + LWLockAcquire(CheckpointLock, LW_EXCLUSIVE); /* * Prepare to accumulate statistics. @@ -6660,8 +6658,9 @@ CreateRestartPoint(int flags) * restartpoint. It's assumed that flushing the buffers will do that as a * side-effect. */ - if (XLogRecPtrIsInvalid(lastCheckPointRecPtr) || - XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo)) + if (!(flags & CHECKPOINT_FORCE) && + (XLogRecPtrIsInvalid(lastCheckPointRecPtr) || + XLByteLE(lastCheckPoint.redo, ControlFile->checkPointCopy.redo))) { XLogRecPtr InvalidXLogRecPtr = {0, 0}; diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index d42e86a..b01b2ab 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -204,6 +204,21 @@ mdinit(void) } /* + * In archive recovery, we rely on bgwriter to do fsyncs(), but we don't + * know that we do archive recovery at process startup when pendingOpsTable + * has already been created. Calling this function drops pendingOpsTable + * and causes any subsequent requests to be forwarded to bgwriter. + */ +void +SetForwardFsyncRequests(void) +{ + /* Perform any pending ops we may have queued up */ + if (pendingOpsTable) + mdsync(); + pendingOpsTable = NULL; +} + +/* * mdexists() -- Does the physical file exist? * * Note: this will return true for lingering files, with pending deletions diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h index 7556b14..fd79d7b 100644 --- a/src/include/storage/smgr.h +++ b/src/include/storage/smgr.h @@ -109,6 +109,7 @@ extern void mdpreckpt(void); extern void mdsync(void); extern void mdpostckpt(void); +extern void SetForwardFsyncRequests(void); extern void RememberFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno); extern void ForgetRelationFsyncRequests(RelFileNode rnode, ForkNumber forknum);
-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs