Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-03 Thread Michael Paquier
On Tue, Sep 03, 2019 at 10:58:18AM +0900, Michael Paquier wrote:
> Attached is an updated patch, which needs to go down to v10.

Committed, after doing more double-checks on it.  Thanks for the
report, Takahashi-san.
--
Michael


signature.asc
Description: PGP signature


Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 05:38:56PM +0900, Michael Paquier wrote:
> Thinking wider, don't we have the same problem with wal_sender_timeout
> in the case where a sync request takes longer than the time it would
> take the backend to terminate the connection?

I have been able to work more on that, and that can indeed happen with
wal_sender_timeout.  While reviewing the code, I have noticed that
there is little point to enable do_sync when fetching WAL segments.
This actually led to too many fsyncs done for the plain format as each
WAL segment is fsync'd first by walmethods.c, then fsync'd again by
fsync_pgdata() in pg_wal/.

Attached is an updated patch, which needs to go down to v10.
--
Michael
From ec75a75514d9601d1b1c3b929ac1eb870a208aa2 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 3 Sep 2019 10:48:45 +0900
Subject: [PATCH v2] Delay fsync of pg_basebackup tar format until the end

Since the addition of fsync requests in bc34223 to make base backup data
consistent on disk once pg_basebackup finishes, each tablespace tar file
is individually flushed once completed, with an additional flush of the
parent directory when the base backup finishes.  While holding a
connection to the server, a fsync request taking a long time may cause a
failure of the base backup.  A recent example is tcp_user_timeout, but
wal_sender_timeout can cause similar problems.

While reviewing the code, there was a second issue causing too many
fsync requests to be done when completing WAL records for the tar or
plain format. As recursive fsyncs are done at the end for both formats
for everything written, it is fine to disable fsync when fetching or
streaming WAL files.

Reported-by: Ryohei Takahashi
Author: Michael Paquier
Reviewed-by: Ryohei Takahashi
Discussion: https://postgr.es/m/osbpr01mb4550dae2f8c9502894a45aab82...@osbpr01mb4550.jpnprd01.prod.outlook.com
Backpatch-through: 10
---
 src/bin/pg_basebackup/pg_basebackup.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9207109ba3..da475319b6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -486,15 +486,18 @@ LogStreamerMain(logstreamer_param *param)
 #endif
 	stream.standby_message_timeout = standby_message_timeout;
 	stream.synchronous = false;
-	stream.do_sync = do_sync;
+	/* fsync happens at the end of pg_basebackup for all data */
+	stream.do_sync = false;
 	stream.mark_done = true;
 	stream.partial_suffix = NULL;
 	stream.replication_slot = replication_slot;
 
 	if (format == 'p')
-		stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0, do_sync);
+		stream.walmethod = CreateWalDirectoryMethod(param->xlog, 0,
+	stream.do_sync);
 	else
-		stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel, do_sync);
+		stream.walmethod = CreateWalTarMethod(param->xlog, compresslevel,
+			  stream.do_sync);
 
 	if (!ReceiveXlogStream(param->bgconn, ))
 
@@ -1346,9 +1349,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	if (copybuf != NULL)
 		PQfreemem(copybuf);
 
-	/* sync the resulting tar file, errors are not considered fatal */
-	if (do_sync && strcmp(basedir, "-") != 0)
-		(void) fsync_fname(filename, false);
+	/*
+	 * Do not sync the resulting tar file yet, all files are synced once at
+	 * the end.
+	 */
 }
 
 
@@ -2138,9 +2142,9 @@ BaseBackup(void)
 
 	/*
 	 * Make data persistent on disk once backup is completed. For tar format
-	 * once syncing the parent directory is fine, each tar file created per
-	 * tablespace has been already synced. In plain format, all the data of
-	 * the base directory is synced, taking into account all the tablespaces.
+	 * sync the parent directory and all its contents as each tar file was not
+	 * synced after being completed.  In plain format, all the data of the
+	 * base directory is synced, taking into account all the tablespaces.
 	 * Errors are not considered fatal.
 	 */
 	if (do_sync)
@@ -2150,7 +2154,7 @@ BaseBackup(void)
 		if (format == 't')
 		{
 			if (strcmp(basedir, "-") != 0)
-(void) fsync_fname(basedir, true);
+(void) fsync_dir_recurse(basedir);
 		}
 		else
 		{
-- 
2.23.0



signature.asc
Description: PGP signature


Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 08:06:22AM +, r.takahash...@fujitsu.com wrote:
> I'm not sure whether this patch should be applied to postgres below 11
> since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the 
> same error.

Thinking wider, don't we have the same problem with wal_sender_timeout
in the case where a sync request takes longer than the time it would
take the backend to terminate the connection?
--
Michael


signature.asc
Description: PGP signature


RE: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread r.takahash...@fujitsu.com
Hi Michael-san,


> Attached is a patch to do that, which should go down to v12 where
> tcp_user_timeout has been introduced.  Takahashi-san, what do you
> think?

Thank you for creating the patch.
This patch is what I expected.

I'm not sure whether this patch should be applied to postgres below 11
since I'm not sure whether the OS parameters (ex. tcp_retries2) cause the same 
error.

Regards,
Ryohei Takahashi





Re: pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-02 Thread Michael Paquier
On Mon, Sep 02, 2019 at 04:42:55AM +, r.takahash...@fujitsu.com wrote:
> I think fsync() for each tablespace is not necessary.
> Like pg_basebackup -F p, I think fsync() is necessary only once at the end.

Yes, I agree that we overlooked that part when introducing
tcp_user_timeout.  It is possible to sync all the contents of
pg_basebackup's tar format once at the end with fsync_dir_recurse().

Looking at the original discussion that brought bc34223b, the proposed
patches did what we have now on HEAD but we did not really exchange
about doing a fsync() just at the end with all the result base
directory contents:
https://www.postgresql.org/message-id/cab7npqql0fcp0edcvd6+3+je24xeapu14vkz_pbpna0stpw...@mail.gmail.com

Attached is a patch to do that, which should go down to v12 where
tcp_user_timeout has been introduced.  Takahashi-san, what do you
think?
--
Michael
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9207109ba3..f9f27fbf76 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1346,9 +1346,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 	if (copybuf != NULL)
 		PQfreemem(copybuf);
 
-	/* sync the resulting tar file, errors are not considered fatal */
-	if (do_sync && strcmp(basedir, "-") != 0)
-		(void) fsync_fname(filename, false);
+	/*
+	 * Do not sync the resulting tar file yet, all files are synced once
+	 * at the end.
+	 */
 }
 
 
@@ -2138,8 +2139,8 @@ BaseBackup(void)
 
 	/*
 	 * Make data persistent on disk once backup is completed. For tar format
-	 * once syncing the parent directory is fine, each tar file created per
-	 * tablespace has been already synced. In plain format, all the data of
+	 * sync the parent directory and all its contents as each tar file was
+	 * not synced after being completed.  In plain format, all the data of
 	 * the base directory is synced, taking into account all the tablespaces.
 	 * Errors are not considered fatal.
 	 */
@@ -2150,7 +2151,7 @@ BaseBackup(void)
 		if (format == 't')
 		{
 			if (strcmp(basedir, "-") != 0)
-(void) fsync_fname(basedir, true);
+(void) fsync_dir_recurse(basedir);
 		}
 		else
 		{


signature.asc
Description: PGP signature


pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout

2019-09-01 Thread r.takahash...@fujitsu.com
Hi


pg_basebackup -F t fails when fsync spends more time than tcp_user_timeout in 
following environment.

[Environment]
Postgres 13dev (master branch)
Red Hat Enterprise Postgres 7.4

[Error]
$ pg_basebackup -F t --progress --verbose -h  -D 
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5A60 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_15647"
pg_basebackup: error: could not read COPY data: server closed the connection 
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

[Analysis]
- pg_basebackup -F t creates a tar file and does fsync() for each tablespace.
  (Otherwise, -F p does fsync() only once at the end.)
- While doing fsync() for a tar file for one tablespace, wal sender sends the 
content of the next tablespace.
  When fsync() spends long time, the tcp socket of pg_basebackup returns "zero 
window" packets to wal sender.
  This means the tcp socket buffer of pg_basebackup is exhausted since 
pg_basebackup cannot receive during fsync().
- The socket of wal sender retries to send the packet, but resets connection 
after tcp_user_timeout.
  After wal sender resets connection, pg_basebackup cannot receive data and 
fails with above error.

[Solution]
I think fsync() for each tablespace is not necessary.
Like pg_basebackup -F p, I think fsync() is necessary only once at the end.


Could you give me any comment?


Regards,
Ryohei Takahashi