On Sat, Sep 15, 2018 at 12:29 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > On 2018-Sep-15, Masahiko Sawada wrote: > >> On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong...@qunar.com> wrote: > >> > When I enable the parameter track_commit_timestamp in postgresql.conf of a >> > Base Backup (making a Base Backup from a standby and the >> > track_commit_timestamp is off on it), >> >> In addition to the above operation, I've reproduced this issue by >> replaying a commit WAL record that sets the timestamp to a new page >> during the crash recovery (or from restart). >> >> It seems to me that the cause of this is that we could not extend >> commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the >> standby server whose track_commit_timestamp is off. So during >> replaying the commit WAL record the startup process fails since the >> corresponding commitTs page doesn't exist. > > Hmm, wow. I wonder if it's possible to detect the config difference > early enough that the zeropage WAL records are emitted, instead. But > even this might not work, since some transactions need to have their > commitTS in pages that will not have been zeroed anyway, because the > page threshold was crossed in the old primary. > >> To fix that maybe we can disable commitTs if >> controlFile->track_commit_timestamp == false and the >> track_commit_timestamp == true even in crash recovery. > > Hmm, so keep it off while crash recovery runs, and once it's out of that > then enable it automatically?
Yes. The attached patch changes it to check controlFile->track_commit_timestamp even the crash recover case. If track_commit_timestamp is set to true in the config file, it's enabled at end of the recovery. > That might work -- by definition we don't > care about the commit TSs of the transaction replayed during crash > recovery, since they were executed in the primary that didn't have > commitTS enable anyway. > > It seems like the first thing we need is TAP cases that reproduce these > two crash scenarios. I attached TAP test that reproduces this issue. We can reproduce it even with single server; making postgres replay a commit WAL in the crash recovery after consumed transactions and enabled track_commit_timestamp. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 493f1db..805bbe1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6809,11 +6809,12 @@ StartupXLOG(void) StartupMultiXact(); /* - * Ditto commit timestamps. In a standby, we do it if setting is enabled - * in ControlFile; in a master we base the decision on the GUC itself. + * Ditto commit timestamps. In both a standby and a master, we do it if + * setting is enabled in ControlFile since we don't care about the commit + * timestamps of the transaction that were executed when the commit + * timestamp is disabled. */ - if (ArchiveRecoveryRequested ? - ControlFile->track_commit_timestamp : track_commit_timestamp) + if (ControlFile->track_commit_timestamp) StartupCommitTs(); /*
diff --git a/src/test/modules/commit_ts/t/005_recovery.pl b/src/test/modules/commit_ts/t/005_recovery.pl new file mode 100644 index 0000000..cdc1cf8 --- /dev/null +++ b/src/test/modules/commit_ts/t/005_recovery.pl @@ -0,0 +1,53 @@ +# Recovery test + +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; + +my $node = get_new_node('test'); +$node->init; +$node->append_conf('postgresql.conf', + "track_commit_timestamp = off"); +$node->start; + +# When we start firstly from the initdb the PARAMETER_CHANGES +# is emitted at end of the recovery, which disables the +# track_commit_timestamp if the crash recovery replay that +# WAL. Therefore we restart the server so that we can recovery +# from the point where doesn't contain that WAL. +$node->restart; + +# Consume 2000 XIDs to beyond the commitTS page boundary. +$node->safe_psql( + 'postgres', + qq( +CREATE PROCEDURE comsume_xid(cnt int) +AS \$\$ +DECLARE + i int; +BEGIN + FOR i in 1..cnt LOOP + EXECUTE 'SELECT txid_current()'; + COMMIT; + END LOOP; +END; +\$\$ +LANGUAGE plpgsql; +)); +$node->safe_psql('postgres', 'CALL comsume_xid(2000)'); + +$node->teardown_node; + +# Enable track_commit_tiemstamp +$node->append_conf('postgresql.conf', + "track_commit_timestamp = on"); + +# During the crash recovery we replay the commit WAL that sets +# the commit timestamp to a new page. +$node->start; + +# Check if the server launched. +is($node->psql('postgres', qq(SELECT 1)), 0, + 'started from the crash recovery');