Re: [HACKERS] pg_basebackup may fail to send feedbacks.

Kyotaro HORIGUCHI Tue, 10 Mar 2015 01:32:26 -0700

Hi, attached is the v5 patch.

- Call feGetCurrentTimestamp() only when necessary.
- Rebased to current master

At Mon, 2 Mar 2015 20:21:36 +0900, Fujii Masao <masao.fu...@gmail.com> wrote in 
<cahgqgwg1tjhpg03ozgwokxt5wyd5v4s3hutgsx7rotbhhnj...@mail.gmail.com>
> On Tue, Feb 24, 2015 at 6:44 PM, Kyotaro HORIGUCHI
> <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > Hello, the attached is the v4 patch that checks feedback timing
> > every WAL segments boundary.
..
> > I said that checking whether to send feedback every boundary
> > between WAL segments seemed too long but after some rethinking, I
> > changed my mind.
> >
> >  - The most large possible delay source in the busy-receive loop
> >    is fsyncing at closing WAL segment file just written, this can
> >    take several seconds.  Freezing longer than the timeout
> >    interval could not be saved and is not worth saving anyway.
> >
> >  - 16M bytes-disk-writes intervals between gettimeofday() seems
> >    to be gentle enough even on platforms where gettimeofday() is
> >    rather heavy.
> 
> Sounds reasonable to me.
> 
> So we don't need to address the problem in walreceiver side so proactively
> because it tries to send the feedback every after flushing the WAL records.
> IOW, the problem you observed is less likely to happen. Right?
> 
> +            now = feGetCurrentTimestamp();
> +            if (standby_message_timeout > 0 &&

Right, it would hardly ever happen, especially with an ordinary
configuration.

However, continuously receiving the replication stream is quite
normal behavior, even if it rarely happens.  So the receiver's
logic should guarantee that feedback is sent as long as it is
working normally, provided the code for the special case does
not become too large or take too long :).

The current walreceiver does not appear to try to send feedback
after flushing each WAL record. HandleCopyStream processes the
records of a gapless replication stream without pause, sending
feedback only when a keepalive packet arrives. It is true that
responding to keepalives helps greatly, but that is not what
keepalives are for. They are intended to confirm whether a silent
receiver is still alive.

Even with this fix, the case where a single write or flush takes
longer than the feedback interval cannot be rescued, but that is
acceptable since it should be regarded as an abnormal condition.
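
To put the check granularity into numbers: the check runs once per
16 MB segment, so at a steady receive rate feedback can be late by
roughly one segment's receive time on top of the timeout. Below is a
small sketch of that arithmetic. The helper names are made up for
illustration, the receive rate is assumed constant, and fsync stalls
are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default WAL segment size (16 MB); the real code uses XLOG_SEG_SIZE. */
#define SEG_SIZE_BYTES (16 * 1024 * 1024)

/* Hypothetical helper: true when the position is on a segment boundary. */
static bool
at_segment_boundary(uint64_t blockpos)
{
	return blockpos % SEG_SIZE_BYTES == 0;
}

/* Hypothetical helper: seconds between two boundary checks at a steady rate. */
static double
seconds_per_segment(double bytes_per_sec)
{
	return (double) SEG_SIZE_BYTES / bytes_per_sec;
}

/*
 * Hypothetical helper: upper bound on the gap between two feedbacks,
 * assuming a steady rate and no stall in write or fsync.  Feedback goes
 * out at the first boundary reached after the timeout has elapsed.
 */
static double
max_feedback_gap(double timeout_sec, double bytes_per_sec)
{
	return timeout_sec + seconds_per_segment(bytes_per_sec);
}
```

For example, with a 10 second timeout and a stream arriving at 8 MB/s,
a segment takes 2 seconds, so the gap stays within about 12 seconds.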

> Minor comment: should feGetCurrentTimestamp() be called after the check of
> standby_message_timeout > 0, to avoid unnecessary calls of that?

Ah, you're right. I've fixed it.
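
The point of the reordering can be shown in isolation: test the
timeout first, and read the clock only when feedback is enabled. This
is a minimal sketch with stub helpers standing in for
feGetCurrentTimestamp() and feTimestampDifferenceExceeds(). The stub
names and the call counter are assumptions for illustration, not code
from receivelog.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the frontend timestamp helpers. */
static int	clock_calls = 0;		/* how often the clock was read */
static int64_t fake_now_us = 0;		/* value the stub clock returns */

/* Stub clock (microseconds) that counts its own invocations. */
static int64_t
stub_current_timestamp(void)
{
	clock_calls++;
	return fake_now_us;
}

/* True if at least msec milliseconds lie between start and stop. */
static bool
stub_difference_exceeds(int64_t start_us, int64_t stop_us, int msec)
{
	return (stop_us - start_us) >= (int64_t) msec * 1000;
}

/*
 * Returns true when feedback is due and records the send time.  The
 * timeout is tested first, so the clock is never read when feedback
 * is disabled (timeout_ms <= 0).
 */
static bool
feedback_due(int64_t *last_status_us, int timeout_ms)
{
	int64_t		now;

	if (timeout_ms <= 0)
		return false;

	now = stub_current_timestamp();
	if (!stub_difference_exceeds(*last_status_us, now, timeout_ms))
		return false;

	*last_status_us = now;
	return true;
}
```

With the timeout set to 0 the counter stays at zero, which is the
saving the comment asks for.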

>  ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
>                     XLogRecPtr *blockpos, uint32 timeline,
>                     char *basedir, stream_stop_callback stream_stop,
> -                   char *partial_suffix, bool mark_done)
> +                   char *partial_suffix, bool mark_done,
> +                   int standby_message_timeout, int64 *last_status)
> 
> Maybe it's time to refactor this ugly coding (i.e., currently many arguments
> need to be given to each functions. Looks ugly)...

I'm increasing the ugliness :(

The XLog code seems to need to share a lot of state widely in order
to work. But to me, the function's parameter list still looks
bearable at this size :).
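
For reference, one shape such a refactoring could take is to bundle
the shared state into a single struct and pass a pointer to it. This
is only a rough sketch: the struct name, its fields, and the two
helpers are hypothetical and are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context struct gathering state that is passed around. */
typedef struct StreamState
{
	uint32_t	timeline;
	const char *basedir;
	const char *partial_suffix;
	bool		mark_done;
	int			standby_message_timeout;	/* in ms, 0 disables feedback */
	int64_t		last_status;	/* time the last feedback was sent */
} StreamState;

/* A callee now takes one pointer instead of a long argument list. */
static bool
feedback_enabled(const StreamState *st)
{
	return st->standby_message_timeout > 0;
}

/* Updating shared state no longer needs an extra out-parameter. */
static void
note_feedback_sent(StreamState *st, int64_t now)
{
	st->last_status = now;
}
```

The trade-off is that every function sees all fields, so it is less
obvious from a signature which state a function actually touches.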

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

From ef7b04c9ddf351ca99736d9ec9fa1954383cd124 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Tue, 24 Feb 2015 17:52:01 +0900
Subject: [PATCH] Make effort to send feedback regularly under heavy load.

pg_basebackup and pg_receivexlog might be kept from sending feedback
for a long time by a continuous replication stream, possibly caused by
heavy load on the receiver side. Keepalives from the server could be
delayed in such a situation. This patch makes them try to send
feedback in that situation. At every boundary between WAL segments,
feedback is sent if it is due, just after syncing and closing the
segment just finished.
---
 src/bin/pg_basebackup/receivelog.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 8caedff..df51f9d 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -45,7 +45,8 @@ static bool ProcessKeepaliveMsg(PGconn *conn, char *copybuf, int len,
 static bool ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 							   XLogRecPtr *blockpos, uint32 timeline,
 							   char *basedir, stream_stop_callback stream_stop,
-							   char *partial_suffix, bool mark_done);
+							   char *partial_suffix, bool mark_done,
+							   int standby_message_timeout, int64 *last_status);
 static PGresult *HandleEndOfCopyStream(PGconn *conn, char *copybuf,
 									   XLogRecPtr blockpos, char *basedir, char *partial_suffix,
 									   XLogRecPtr *stoppos, bool mark_done);
@@ -906,7 +907,8 @@ HandleCopyStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
 			{
 				if (!ProcessXLogDataMsg(conn, copybuf, r, &blockpos,
 										timeline, basedir, stream_stop,
-										partial_suffix, mark_done))
+										partial_suffix, mark_done,
+										standby_message_timeout, &last_status))
 					goto error;
 
 				/*
@@ -1115,7 +1117,8 @@ static bool
 ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 				   XLogRecPtr *blockpos, uint32 timeline,
 				   char *basedir, stream_stop_callback stream_stop,
-				   char *partial_suffix, bool mark_done)
+				   char *partial_suffix, bool mark_done,
+				   int standby_message_timeout, int64 *last_status)
 {
 	int			xlogoff;
 	int			bytes_left;
@@ -1223,12 +1226,31 @@ ProcessXLogDataMsg(PGconn *conn, char *copybuf, int len,
 		/* Did we reach the end of a WAL segment? */
 		if (*blockpos % XLOG_SEG_SIZE == 0)
 		{
+			int64 now;
 			if (!close_walfile(basedir, partial_suffix, *blockpos, mark_done))
 				/* Error message written in close_walfile() */
 				return false;
 
 			xlogoff = 0;
 
+			/*
+			 * A continuous input stream might leave a long interval since the
+			 * previous feedback. This is a good point to check whether it is
+			 * time to send feedback, because the fsync done in close_walfile()
+			 * might have taken a long time.
+			 */
+			if (standby_message_timeout > 0)
+			{
+				now = feGetCurrentTimestamp();
+				if (feTimestampDifferenceExceeds(*last_status, now,
+												 standby_message_timeout))
+				{
+					if (!sendFeedback(conn, *blockpos, now, false))
+						return false;
+					*last_status = now;
+				}
+			}
+
 			if (still_sending && stream_stop(*blockpos, timeline, true))
 			{
 				if (PQputCopyEnd(conn, NULL) <= 0 || PQflush(conn))
-- 
2.1.0.GIT
