Re: [HACKERS] pg_rewind test race condition..?

Heikki Linnakangas Tue, 28 Apr 2015 17:37:50 -0700

On 04/28/2015 11:02 AM, Stephen Frost wrote:

Heikki,


   Not sure if anyone else is seeing this, but I'm getting regression
   test failures when running the pg_rewind tests pretty consistently
   with 'make check'.  Specifically with "basic remote", I'm getting:

source and target cluster are on the same timeline
Failure, exiting

   in regress_log/pg_rewind_log_basic_remote.

   If I throw a "sleep(5);" into t/001_basic.pl before the call to
   RewindTest::run_pg_rewind($test_mode); then everything works fine.

The problem seems to be that when the standby is promoted, it's a
so-called "fast promotion", where it writes an end-of-recovery record
and starts accepting queries before creating a real checkpoint.
pg_rewind looks at the TLI in the latest checkpoint, as it's in the
control file, but that isn't updated until the checkpoint completes. I
don't see it on my laptop normally, but I can reproduce it if I insert a
"sleep(5)" in StartupXLOG, just before it requests the checkpoint:


--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7173,7 +7173,10 @@ StartupXLOG(void)
         * than is appropriate now that we're not in standby mode anymore.
         */
        if (fast_promoted)
+       {
+               sleep(5);
                RequestCheckpoint(CHECKPOINT_FORCE);
+       }
 }

The simplest fix would be to force a checkpoint in the regression test,
before running pg_rewind. It's a bit of a cop-out, since you'd still get
the same issue when you tried to do the same thing in the real world. It
should be rare in practice - you'd not normally run pg_rewind
immediately after promoting the standby - but a better error message at
least would be nice.
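
To make the race concrete, here is a toy model in C. It is only a
sketch, not PostgreSQL code: ToyCluster, ToyControlFile,
toy_fast_promote, toy_checkpoint and toy_same_timeline are hypothetical
names made up for illustration, and the real control file and promotion
logic are far more involved. The only assumption carried over from the
above is that the timeline check compares the TLI stored with the latest
checkpoint, and that this value changes only when a checkpoint completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical stand-in for the on-disk control file. */
typedef struct ToyControlFile
{
    TimeLineID  checkpoint_tli; /* TLI recorded by the latest checkpoint */
} ToyControlFile;

/* Hypothetical stand-in for a running cluster. */
typedef struct ToyCluster
{
    TimeLineID      current_tli;    /* timeline the server writes WAL on */
    ToyControlFile  control;
} ToyCluster;

/*
 * Fast promotion: the server switches to a new timeline and accepts
 * queries right away. The control file is NOT touched here.
 */
static void
toy_fast_promote(ToyCluster *c)
{
    assert(c != NULL);
    c->current_tli++;
}

/* A completed checkpoint is what copies the TLI into the control file. */
static void
toy_checkpoint(ToyCluster *c)
{
    assert(c != NULL);
    c->control.checkpoint_tli = c->current_tli;
}

/*
 * Toy version of the check that produces "source and target cluster
 * are on the same timeline": it sees only the control files.
 */
static bool
toy_same_timeline(const ToyCluster *source, const ToyCluster *target)
{
    return source->control.checkpoint_tli == target->control.checkpoint_tli;
}
```

In this model, calling toy_fast_promote() on the standby and then
toy_same_timeline() straight away still reports "same timeline", because
the control file holds the old TLI. Calling toy_checkpoint() in between,
which is what forcing a checkpoint in the test amounts to, makes the
check see the new timeline.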

- Heikki


