On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier <michael.paqu...@gmail.com> wrote: > > On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao.fu...@gmail.com> wrote: >> >> In this case, the patch seems to make the restartpoint recycle even WAL files >> which have .ready files and will have to be archived later. Thought? > > The real problem currently is that it is possible to have a segment file not > marked as .done during recovery when stream connection is abruptly cut when > this segment is switched, marking it as .ready in archive_status and simply > letting this segment in pg_xlog because it will neither be recycled nor > removed. I have not been able to look much at this code these days, so I am > not sure how invasive it would be in back-branches, but perhaps we should try > to improve code such as when a segment file is switched and connection to the > is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work... First, the problem can be reproduced most of the time by running this simple command: psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate This will enforce a segment file switch and restart the master in crash recovery. This has as effect to immediately cut the WAL stream on slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0. For example, let's imagine that stream fails when switching from 000000010000000000000003 to the next segment, then the the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for example 0/3100000, and walrcv->latestWalEnd is 0/4000000. When stream restarts it will begin once again from 0/4000000, ignoring that 000000010000000000000003 should be marked as .done, ultimately marking it in .ready state when old segment files are recycled or removed. There is nothing that can really be done to enforce the creation of a .done file before the FATAL of libpqrcv_receive because we cannot predict the stream failure.. Now, we can do better than what we have now by looking at WAL start position used when starting streaming in WAL receiver and enforce .done if the start position is the last one of previous segment. Hence, in the case of start position 0/4000000 that was found previously, the file that will be enforced to .done is 000000010000000000000003. I have written the patch attached that implements this idea and fixes the problem. Now let's see if you guys see any flaws in this simple logic which uses a sniper gun instead of a bazooka as in the previous patches sent. Regards, -- Michael
From ce7e1eec97dbe7b1648b4f56a1f9825eeba7ebed Mon Sep 17 00:00:00 2001 From: Michael Paquier <mpaqu...@vmware.com> Date: Mon, 20 Oct 2014 14:40:37 +0900 Subject: [PATCH] Mark segment file .done for last WAL position at stream start When stream connection between a standby and its root node is unstable, WAL stream errors make this standby node fail with FATAL errors because of the WAL receiver, forcing it to restart in crash-recovery mode. Now, when the crash occurs exactly when a switch to a new segment file is done, it is possible that the WAL receiver restarts from a position located on the next segment file, while the previous file is let as is, marked ultimately in .ready state once old WAL files are recycled or removed. Note that this file should have been marked as .done by the WAL receiver itself. With this patch, WAL receiver checks for the presence of the previous segment file of start position if this WAL position is the last one of the previous segment file and enforces it to .done, preventing .ready files from being accumulated on standbys. This failure can be easily reproducible with the following command: psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate --- src/backend/access/transam/xlogarchive.c | 26 ++++++++++++++++++++++++++ src/backend/replication/walreceiver.c | 12 ++++++++++++ src/include/access/xlog_internal.h | 1 + 3 files changed, 39 insertions(+) diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 047efa2..25a153f 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -553,6 +553,32 @@ XLogArchiveNotifySeg(XLogSegNo segno) XLogArchiveNotify(xlog); } + +/* + * XLogArchiveForceDoneIfPresent + * + * Wrapper of XLogArchiveForceDone, used conditionally based on the presence + * of given XLOG segment file. + */ +void +XLogArchiveForceDoneIfPresent(TimeLineID tli, XLogSegNo segno) +{ + struct stat stat_buf; + char xlogfname[MAXFNAMELEN]; + char xlogfpath[MAXPGPATH]; + + XLogFilePath(xlogfpath, tli, segno); + + /* File is missing, nothing to do */ + if (stat(xlogfpath, &stat_buf) != 0) + return; + + /* All is fine, process... */ + XLogFileName(xlogfname, tli, segno); + XLogArchiveForceDone(xlogfname); +} + + /* * XLogArchiveForceDone * diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index c2d4ed3..6857a05 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -361,6 +361,7 @@ WalReceiverMain(void) slotname[0] != '\0' ? slotname : NULL)) { bool endofwal = false; + XLogSegNo startPointSegNo; if (first_stream) ereport(LOG, @@ -383,6 +384,17 @@ WalReceiverMain(void) last_recv_timestamp = GetCurrentTimestamp(); ping_sent = false; + /* + * Check if current start point is located exactly at the end of + * a segment file and mark this file as already archived. It is + * possible that if WAL stream connection has been abruptly cut + * exactly during a segment file switch that this file is still + * present but not marked as .done even if it should be. + */ + XLByteToPrevSeg(startpoint, startPointSegNo); + if (!XLByteInSeg(startpoint, startPointSegNo)) + XLogArchiveForceDoneIfPresent(startpointTLI, startPointSegNo); + /* Loop until end-of-streaming or error */ while (!endofwal) { diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h index 27b9899..3b1a99e 100644 --- a/src/include/access/xlog_internal.h +++ b/src/include/access/xlog_internal.h @@ -286,6 +286,7 @@ extern void ExecuteRecoveryCommand(char *command, char *commandName, extern void KeepFileRestoredFromArchive(char *path, char *xlogfname); extern void XLogArchiveNotify(const char *xlog); extern void XLogArchiveNotifySeg(XLogSegNo segno); +extern void XLogArchiveForceDoneIfPresent(TimeLineID tli, XLogSegNo segno); extern void XLogArchiveForceDone(const char *xlog); extern bool XLogArchiveCheckDone(const char *xlog); extern bool XLogArchiveIsBusy(const char *xlog); -- 2.1.2
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers