On Fri, Mar 09, 2018 at 08:22:49AM +0000, Tsunakawa, Takayuki wrote:
> Thanks for reviewing.  All done.

Thanks.  Please be careful with the indentation of the patch.  Attached
is a slightly-updated version with a small modification in
remove_target_file to make the code cleaner, a proposal of commit
message and pgindent applied on the code.  I am switching that as ready
for committer.
--
Michael
From 62950c615d87e3e58998f295ce446eb5c80707e4 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Sun, 11 Mar 2018 22:49:55 +0900
Subject: [PATCH] Fix handling of removed files on target server with pg_rewind

After processing the filemap to build the list of chunks that will be
fetched from the source to rewing the target server, it is possible that
a file which was previously processed is removed from the source.  A
simple example of such an occurence is a WAL segment which gets recycled
on the target in-between.  When the filemap is processed, files not
categorized as relation files are first truncated to prepare for its
full copy of which is going to be taken from the source, divided into a
set of junks.  However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size.  With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.

In order to fix the problem, make sure that files which are found as
removed on the source when receiving chunks of them are as well deleted
on the target server for consistency.  This ensures that no empty junk
files remain.  If a file has a size so large that it needs more than one
chunk to be fully copied, it could be possible that the deletion is
attempted more than once.  It could be possible as well that the file
has gone away after already copying some chunks on the target.  For
those reasons, the file removals done when processing the file chunks
are lossy, and ignore missing entries.

Patch and report from Tsunakawa Takayuki, reviewed by Michael Paquier.

Discussion: https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05
---
 src/bin/pg_rewind/file_ops.c    | 16 ++++++++++++----
 src/bin/pg_rewind/file_ops.h    |  1 +
 src/bin/pg_rewind/libpq_fetch.c | 10 +++++++---
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_rewind/file_ops.c b/src/bin/pg_rewind/file_ops.c
index 705383d184..f491ed7f5c 100644
--- a/src/bin/pg_rewind/file_ops.c
+++ b/src/bin/pg_rewind/file_ops.c
@@ -29,7 +29,6 @@
 static int	dstfd = -1;
 static char dstpath[MAXPGPATH] = "";
 
-static void remove_target_file(const char *path);
 static void create_target_dir(const char *path);
 static void remove_target_dir(const char *path);
 static void create_target_symlink(const char *path, const char *link);
@@ -134,7 +133,7 @@ remove_target(file_entry_t *entry)
 			break;
 
 		case FILE_TYPE_REGULAR:
-			remove_target_file(entry->path);
+			remove_target_file(entry->path, false);
 			break;
 
 		case FILE_TYPE_SYMLINK:
@@ -165,8 +164,12 @@ create_target(file_entry_t *entry)
 	}
 }
 
-static void
-remove_target_file(const char *path)
+/*
+ * Remove a file from target data directory.  If missing_ok is true, it
+ * is fine for the target file to not exist.
+ */
+void
+remove_target_file(const char *path, bool missing_ok)
 {
 	char		dstpath[MAXPGPATH];
 
@@ -175,8 +178,13 @@ remove_target_file(const char *path)
 
 	snprintf(dstpath, sizeof(dstpath), "%s/%s", datadir_target, path);
 	if (unlink(dstpath) != 0)
+	{
+		if (errno == ENOENT && missing_ok)
+			return;
+
 		pg_fatal("could not remove file \"%s\": %s\n",
 				 dstpath, strerror(errno));
+	}
 }
 
 void
diff --git a/src/bin/pg_rewind/file_ops.h b/src/bin/pg_rewind/file_ops.h
index be580ee4db..9d26cf4f77 100644
--- a/src/bin/pg_rewind/file_ops.h
+++ b/src/bin/pg_rewind/file_ops.h
@@ -15,6 +15,7 @@
 extern void open_target_file(const char *path, bool trunc);
 extern void write_target_range(char *buf, off_t begin, size_t size);
 extern void close_target_file(void);
+extern void remove_target_file(const char *path, bool missing_ok);
 extern void truncate_target_file(const char *path, off_t newsize);
 extern void create_target(file_entry_t *t);
 extern void remove_target(file_entry_t *t);
diff --git a/src/bin/pg_rewind/libpq_fetch.c b/src/bin/pg_rewind/libpq_fetch.c
index 8f8d504455..5914b15017 100644
--- a/src/bin/pg_rewind/libpq_fetch.c
+++ b/src/bin/pg_rewind/libpq_fetch.c
@@ -311,15 +311,19 @@ receiveFileChunks(const char *sql)
 		chunk = PQgetvalue(res, 0, 2);
 
 		/*
-		 * It's possible that the file was deleted on remote side after we
-		 * created the file map. In this case simply ignore it, as if it was
-		 * not there in the first place, and move on.
+		 * If a file has been deleted on the source, remove it on the target
+		 * as well.  Note that multiple unlink() calls may happen on the same
+		 * file if multiple data chunks are associated with it, hence ignore
+		 * unconditionally anything missing.  If this file is not a relation
+		 * data file, then it has been already truncated when creating the
+		 * file chunk list at the previous execution of the filemap.
 		 */
 		if (PQgetisnull(res, 0, 2))
 		{
 			pg_log(PG_DEBUG,
 				   "received null value for chunk for file \"%s\", file has been deleted\n",
 				   filename);
+			remove_target_file(filename, true);
 			pg_free(filename);
 			PQclear(res);
 			continue;
-- 
2.16.2

Attachment: signature.asc
Description: PGP signature

Reply via email to