On Sat, Sep 15, 2018 at 12:29 AM, Alvaro Herrera
<alvhe...@2ndquadrant.com> wrote:
> On 2018-Sep-15, Masahiko Sawada wrote:
>
>> On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong...@qunar.com> wrote:
>
>> > When I enable the parameter track_commit_timestamp in postgresql.conf of a
>> > Base Backup (making a Base Backup from a standby and the
>> > track_commit_timestamp is off on it),
>>
>> In addition to the above operation, I've reproduced this issue by
>> replaying a commit WAL record that sets the timestamp to a new page
>> during the crash recovery (or from restart).
>>
>> It seems to me that the cause of this is that we could not extend
>> commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the
>> standby server whose track_commit_timestamp is off. So during
>> replaying the commit WAL record the startup process fails since the
>> corresponding commitTs page doesn't exist.
>
> Hmm, wow.  I wonder if it's possible to detect the config difference
> early enough that the zeropage WAL records are emitted, instead.  But
> even this might not work, since some transactions need to have their
> commitTS in pages that will not have been zeroed anyway, because the
> page threshold was crossed in the old primary.
>
>> To fix that maybe we can disable commitTs if
>> controlFile->track_commit_timestamp == false and the
>> track_commit_timestamp == true even in crash recovery.
>
> Hmm, so keep it off while crash recovery runs, and once it's out of that
> then enable it automatically?

Yes. The attached patch changes it to check
controlFile->track_commit_timestamp even the crash recover case. If
track_commit_timestamp is set to true in the config file, it's enabled
at end of the recovery.

> That might work -- by definition we don't
> care about the commit TSs of the transaction replayed during crash
> recovery, since they were executed in the primary that didn't have
> commitTS enable anyway.
>
> It seems like the first thing we need is TAP cases that reproduce these
> two crash scenarios.

I attached TAP test that reproduces this issue. We can reproduce it
even with single server; making postgres replay a commit WAL in the
crash recovery after consumed transactions and enabled
track_commit_timestamp.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 493f1db..805bbe1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6809,11 +6809,12 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto commit timestamps.  In both a standby and a master, we do it if
+	 * setting is enabled in ControlFile since we don't care about the commit
+	 * timestamps of the transaction that were executed when the commit
+	 * timestamp is disabled.
 	 */
-	if (ArchiveRecoveryRequested ?
-		ControlFile->track_commit_timestamp : track_commit_timestamp)
+	if (ControlFile->track_commit_timestamp)
 		StartupCommitTs();
 
 	/*
diff --git a/src/test/modules/commit_ts/t/005_recovery.pl b/src/test/modules/commit_ts/t/005_recovery.pl
new file mode 100644
index 0000000..cdc1cf8
--- /dev/null
+++ b/src/test/modules/commit_ts/t/005_recovery.pl
@@ -0,0 +1,53 @@
+# Recovery test
+
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf',
+				  "track_commit_timestamp = off");
+$node->start;
+
+# When we start firstly from the initdb the PARAMETER_CHANGES
+# is emitted at end of the recovery, which disables the
+# track_commit_timestamp if the crash recovery replay that
+# WAL. Therefore we restart the server so that we can recovery
+# from the point where doesn't contain that WAL.
+$node->restart;
+
+# Consume 2000 XIDs to beyond the commitTS page boundary.
+$node->safe_psql(
+	'postgres',
+	qq(
+CREATE PROCEDURE comsume_xid(cnt int)
+AS \$\$
+DECLARE
+	i int;
+BEGIN
+	FOR i in 1..cnt LOOP
+		EXECUTE 'SELECT txid_current()';
+		COMMIT;
+	END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node->safe_psql('postgres', 'CALL comsume_xid(2000)');
+
+$node->teardown_node;
+
+# Enable track_commit_tiemstamp
+$node->append_conf('postgresql.conf',
+				  "track_commit_timestamp = on");
+
+# During the crash recovery we replay the commit WAL that sets
+# the commit timestamp to a new page.
+$node->start;
+
+# Check if the server launched.
+is($node->psql('postgres', qq(SELECT 1)), 0,
+   'started from the crash recovery');

Reply via email to