Re: recovery_target immediate timestamp

2020-11-13 Thread Fujii Masao




On 2020/11/13 8:39, Euler Taveira wrote:

On Wed, 11 Nov 2020 at 22:40, Fujii Masao mailto:masao.fu...@oss.nttdata.com>> wrote:


On 2020/11/12 6:00, Euler Taveira wrote:

 > The first patch adds a new message that prints the latest completed 
checkpoint
 > when the consistent state is reached.

I'm not sure how useful this information is in practice.

Fujii, thanks for reviewing it. It provides the same information as the "last
completed transaction was" message.

 > It also exposes the checkpoint timestamp
 > in debug messages.

                         ereport(DEBUG1,
                                         (errmsg("checkpoint record is at 
%X/%X",
                                                         (uint32) (checkPointLoc 
>> 32), (uint32) checkPointLoc)));
+                       ereport(DEBUG1,
+                                       (errmsg("checkpoint time is %s", 
str_time(checkPoint.time;

The above first debug message displays the LSN of the checkpoint record.
OTOH, the second message displays the time when the checkpoint started
(not the time when checkpoint record was written at the end of checkpoint).
So isn't it confusing to display those inconsistent information together?

Indeed the checkpoint timestamp is from before it determines the REDO LSN. Are
you saying the this checkpoint timestamp is inconsistent because it is not near
the point it saves the RedoRecPtr? If so, let's move checkPoint.time a few
lines below.


No. What I'd like to say is; checkPointLoc that the first debug message
outputs is the LSN of checkpoint record, not the checkpoint REDO location.
The checkpoint REDO location is determined at the early stage of
checkpointing. OTOH, the location of checkpoint record is determined
at the end of checkpointing. They are different.

The checkpoint time that the second debug message you added outputs is
the timestamp determined at the beginning of checkpointing. So it seems
not reasonable to display the location of checkpoint record and
the checkpoint time because they are determined at the different timing.
Am I missing something?




     /*
      * Here we update the shared RedoRecPtr for future XLogInsert calls; this
      * must be done while holding all the insertion locks.
      *
      * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
      * pointing past where it really needs to point.  This is okay; the only
      * consequence is that XLogInsert might back up whole buffers that it
      * didn't really need to.  We can't postpone advancing RedoRecPtr because
      * XLogInserts that happen while we are dumping buffers must assume that
      * their buffer changes are not included in the checkpoint.
      */
     RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
     checkPoint.time = (pg_time_t) time(NULL);

I realized that I was using the wrong variable in one of the debug messages.

 > The second patch provides the checkpoint timestamp in the pg_waldump 
output and
 > also when you enable wal_debug parameter. The pg_waldump output looks 
like

+1

+#ifdef FRONTEND
+               strftime(checkpointstr, sizeof(checkpointstr), "%c", 
localtime(&time_tmp));
+#else
+               pg_strftime(checkpointstr, sizeof(checkpointstr), "%c", 
pg_localtime(&time_tmp, log_timezone));
+#endif

You can simplify the code by using timestamptz_to_str() here instead, like 
xact_desc_commit() does.

I have the same idea until I realized that checkPoint.time is pg_time_t and not
TimestampTz. [digging the code a bit...] I figure out there is a function that
converts from pg_time_t to TimestampTz: time_t_to_timestamptz(). I removed that
ugly code but have to duplicate this function into compat.c. I don't have a
strong preference but I'm attaching a new patch.


Thanks for updating the patch! At least for me this approach looks better.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: recovery_target immediate timestamp

2020-11-12 Thread Euler Taveira
On Wed, 11 Nov 2020 at 22:40, Fujii Masao 
wrote:

>
> On 2020/11/12 6:00, Euler Taveira wrote:
>
> > The first patch adds a new message that prints the latest completed
> checkpoint
> > when the consistent state is reached.
>
> I'm not sure how useful this information is in practice.
>
> Fujii, thanks for reviewing it. It provides the same information as the
"last
completed transaction was" message.


> > It also exposes the checkpoint timestamp
> > in debug messages.
>
> ereport(DEBUG1,
> (errmsg("checkpoint record is at
> %X/%X",
> (uint32)
> (checkPointLoc >> 32), (uint32) checkPointLoc)));
> +   ereport(DEBUG1,
> +   (errmsg("checkpoint time is %s",
> str_time(checkPoint.time;
>
> The above first debug message displays the LSN of the checkpoint record.
> OTOH, the second message displays the time when the checkpoint started
> (not the time when checkpoint record was written at the end of checkpoint).
> So isn't it confusing to display those inconsistent information together?
>
> Indeed the checkpoint timestamp is from before it determines the REDO LSN.
Are
you saying the this checkpoint timestamp is inconsistent because it is not
near
the point it saves the RedoRecPtr? If so, let's move checkPoint.time a few
lines below.

/*
 * Here we update the shared RedoRecPtr for future XLogInsert calls;
this
 * must be done while holding all the insertion locks.
 *
 * Note: if we fail to complete the checkpoint, RedoRecPtr will be left
 * pointing past where it really needs to point.  This is okay; the only
 * consequence is that XLogInsert might back up whole buffers that it
 * didn't really need to.  We can't postpone advancing RedoRecPtr
because
 * XLogInserts that happen while we are dumping buffers must assume that
 * their buffer changes are not included in the checkpoint.
 */
RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
checkPoint.time = (pg_time_t) time(NULL);

I realized that I was using the wrong variable in one of the debug messages.


> > The second patch provides the checkpoint timestamp in the pg_waldump
> output and
> > also when you enable wal_debug parameter. The pg_waldump output looks
> like
>
> +1
>
> +#ifdef FRONTEND
> +   strftime(checkpointstr, sizeof(checkpointstr), "%c",
> localtime(&time_tmp));
> +#else
> +   pg_strftime(checkpointstr, sizeof(checkpointstr), "%c",
> pg_localtime(&time_tmp, log_timezone));
> +#endif
>
> You can simplify the code by using timestamptz_to_str() here instead, like
> xact_desc_commit() does.
>
> I have the same idea until I realized that checkPoint.time is pg_time_t
and not
TimestampTz. [digging the code a bit...] I figure out there is a function
that
converts from pg_time_t to TimestampTz: time_t_to_timestamptz(). I removed
that
ugly code but have to duplicate this function into compat.c. I don't have a
strong preference but I'm attaching a new patch.

At the end, I asked myself if it is worth changing this type from pg_time_t
to
TimestampTz.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From df8f4d295ebc205d46e77c5037c399afda17c546 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 11 Nov 2020 11:24:25 -0300
Subject: [PATCH 1/2] Report latest completed checkpoint timestamp

A new detail message prints the latest completed checkpoint when
reaching the consistent state. This information is useful when you use
recovery_target = immediate because it is not clear what timestamp it
stops applying the WAL.

It also adds new debug messages that report the checkpoint timestamp.
It might be useful for debug purposes.
---
 src/backend/access/transam/xlog.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa63f37615..2ff83c49e2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6524,6 +6524,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(checkPoint.time;
 			InRecovery = true;	/* force recovery even if SHUTDOWNED */
 
 			/*
@@ -6657,6 +6659,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(ControlFile->checkPointCopy.time;
 		}
 		else
 		{
@@ -8033,7 +8037,9 @@ CheckRecoveryConsistency(void)
 		ereport(LOG,
 (errmsg("consistent recovery state reached at %X/%X",
 		(uint32) (lastReplayedEndRecPtr >> 32),
-	

Re: recovery_target immediate timestamp

2020-11-11 Thread Fujii Masao




On 2020/11/12 6:00, Euler Taveira wrote:

Hi,

While restoring a backup using recovery_target immediate, I noticed that there
isn't a timestamp available.

LOG:  consistent recovery state reached at 0/A000100
LOG:  recovery stopping after reaching consistency
LOG:  pausing at the end of recovery
HINT:  Execute pg_wal_replay_resume() to promote.
LOG:  database system is ready to accept read only connections

if you decide to use one of the targets or just recover until the end of the
WAL, you get a (last completed transaction) timestamp.

LOG:  redo done at 0/10FFEC38 system usage: CPU: user: 0.10 s, system: 0.05 s, 
elapsed: 1.65 s
LOG:  last completed transaction was at log time 2020-11-11 17:27:31.715251-03
LOG:  restored log file "00010010" from archive
cp: cannot stat '/a/pgarchive/0002.history': No such file or directory
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
cp: cannot stat '/a/pgarchive/0001.history': No such file or directory
LOG:  database system is ready to accept connections

I dig into the pg_waldump output to figure out the timestamp, however, the
checkpoint timestamp isn't printed by pg_waldump. The checkpoint timestamp
might be useful information at least when you set large values for a checkpoint
or need to investigate a performance/corruption issue.

The first patch adds a new message that prints the latest completed checkpoint
when the consistent state is reached.


I'm not sure how useful this information is in practice.


It also exposes the checkpoint timestamp
in debug messages.


ereport(DEBUG1,
(errmsg("checkpoint record is at %X/%X",
(uint32) (checkPointLoc 
>> 32), (uint32) checkPointLoc)));
+   ereport(DEBUG1,
+   (errmsg("checkpoint time is %s", 
str_time(checkPoint.time;

The above first debug message displays the LSN of the checkpoint record.
OTOH, the second message displays the time when the checkpoint started
(not the time when checkpoint record was written at the end of checkpoint).
So isn't it confusing to display those inconsistent information together?



LOG:  consistent recovery state reached at 0/A000100
DETAIL:  Last completed checkpoint was at log time 2020-11-11 17:31:50 -03.
LOG:  recovery stopping after reaching consistency
LOG:  pausing at the end of recovery
HINT:  Execute pg_wal_replay_resume() to promote.
LOG:  database system is ready to accept read only connections
.
.
.
DEBUG:  checkpoint record is at 0/A60
DEBUG:  checkpoint time is 2020-11-11 17:34:19 -03

The second patch provides the checkpoint timestamp in the pg_waldump output and
also when you enable wal_debug parameter. The pg_waldump output looks like


+1

+#ifdef FRONTEND
+   strftime(checkpointstr, sizeof(checkpointstr), "%c", 
localtime(&time_tmp));
+#else
+   pg_strftime(checkpointstr, sizeof(checkpointstr), "%c", 
pg_localtime(&time_tmp, log_timezone));
+#endif

You can simplify the code by using timestamptz_to_str() here instead, like 
xact_desc_commit() does.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




recovery_target immediate timestamp

2020-11-11 Thread Euler Taveira
Hi,

While restoring a backup using recovery_target immediate, I noticed that
there
isn't a timestamp available.

LOG:  consistent recovery state reached at 0/A000100
LOG:  recovery stopping after reaching consistency
LOG:  pausing at the end of recovery
HINT:  Execute pg_wal_replay_resume() to promote.
LOG:  database system is ready to accept read only connections

if you decide to use one of the targets or just recover until the end of the
WAL, you get a (last completed transaction) timestamp.

LOG:  redo done at 0/10FFEC38 system usage: CPU: user: 0.10 s, system: 0.05
s, elapsed: 1.65 s
LOG:  last completed transaction was at log time 2020-11-11
17:27:31.715251-03
LOG:  restored log file "00010010" from archive
cp: cannot stat '/a/pgarchive/0002.history': No such file or directory
LOG:  selected new timeline ID: 2
LOG:  archive recovery complete
cp: cannot stat '/a/pgarchive/0001.history': No such file or directory
LOG:  database system is ready to accept connections

I dig into the pg_waldump output to figure out the timestamp, however, the
checkpoint timestamp isn't printed by pg_waldump. The checkpoint timestamp
might be useful information at least when you set large values for a
checkpoint
or need to investigate a performance/corruption issue.

The first patch adds a new message that prints the latest completed
checkpoint
when the consistent state is reached. It also exposes the checkpoint
timestamp
in debug messages.

LOG:  consistent recovery state reached at 0/A000100
DETAIL:  Last completed checkpoint was at log time 2020-11-11 17:31:50 -03.
LOG:  recovery stopping after reaching consistency
LOG:  pausing at the end of recovery
HINT:  Execute pg_wal_replay_resume() to promote.
LOG:  database system is ready to accept read only connections
.
.
.
DEBUG:  checkpoint record is at 0/A60
DEBUG:  checkpoint time is 2020-11-11 17:34:19 -03

The second patch provides the checkpoint timestamp in the pg_waldump output
and
also when you enable wal_debug parameter. The pg_waldump output looks like

rmgr: XLOGlen (rec/tot):114/   114, tx:  0, lsn:
0/0A60, prev 0/0A28, desc: CHECKPOINT_ONLINE redo 0/A28;
timestamp qua 11 nov 2020 17:34:19 -03; tli 1; prev tli 1; fpw true; xid
0:519; oid 24576; multi 1; offset 0; oldest xid 501 in DB 1; oldest multi 1
in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 519;
online

and the debug messages are

DEBUG:  checkpoint record is at 0/A60
DEBUG:  checkpoint time is 2020-11-11 17:37:47 -03
LOG:  REDO @ 0/A60; LSN 0/AD8: prev 0/A28; xid 0; len 88 -
XLOG/CHECKPOINT_ONLINE: redo 0/A28; timestamp Wed Nov 11 17:37:47 2020;
tli 1; prev tli 1; fpw true; xid 0:519; oid 24576; multi 1; offset 0;
oldest xid 501 in DB 1; oldest multi 1 in DB 1; oldest/newest commit
timestamp xid: 0/0; oldest running xid 519; online


Regards,

-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 4b725a45dfb89ee9a463e6ee7987cd8b660053cd Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 11 Nov 2020 11:24:25 -0300
Subject: [PATCH 1/2] Report latest completed checkpoint timestamp

A new detail message prints the latest completed checkpoint when
reaching the consistent state. This information is useful when you use
recovery_target = immediate because it is not clear what timestamp it
stops applying the WAL.

It also adds new debug messages that report the checkpoint timestamp.
It might be useful for debug purposes.
---
 src/backend/access/transam/xlog.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index aa63f37615..cf70950932 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6524,6 +6524,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(checkPoint.time;
 			InRecovery = true;	/* force recovery even if SHUTDOWNED */
 
 			/*
@@ -6657,6 +6659,8 @@ StartupXLOG(void)
 			ereport(DEBUG1,
 	(errmsg("checkpoint record is at %X/%X",
 			(uint32) (checkPointLoc >> 32), (uint32) checkPointLoc)));
+			ereport(DEBUG1,
+	(errmsg("checkpoint time is %s", str_time(checkPoint.time;
 		}
 		else
 		{
@@ -8033,7 +8037,9 @@ CheckRecoveryConsistency(void)
 		ereport(LOG,
 (errmsg("consistent recovery state reached at %X/%X",
 		(uint32) (lastReplayedEndRecPtr >> 32),
-		(uint32) lastReplayedEndRecPtr)));
+		(uint32) lastReplayedEndRecPtr),
+ errdetail("Last completed checkpoint was at log time %s.",
+		   str_time(ControlFile->checkPointCopy.time;
 	}
 
 	/*
-- 
2.20.1

From 9b7b3effad8c860a003dc60eb1ea53aa15a26ad9 Mon Sep 17 00:00:00 2001
From: Euler Taveira 
Date: Wed, 11 Nov 202