Hello Amit and Hou-San,

11.07.2024 13:21, Amit Kapila wrote:

We don't wait for the xmin to catch up corresponding to this insert
and I don't know if there is a way to do that. So, we should move this
Insert to after the call to pg_sync_replication_slots(). It won't
impact the general test of pg_createsubscriber.

Thanks to Hou-San for helping me in the analysis of this BF failure.

Thank you for investigating that issue!

May I ask you to look at another failure of the test occurred today [1]?
The failure log contains:
recovery_target_lsn = '0/30098D0'
pg_createsubscriber: starting the subscriber
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
2024-07-11 07:40:10.837 UTC [2948531][postmaster][:0] LOG:  received fast 
shutdown request

Though what I'm seeing after a successful run is:
recovery_target_lsn = '0/3009860'
pg_createsubscriber: starting the subscriber
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
2024-07-11 15:19:40.733 UTC [207517] 040_pg_createsubscriber.pl LOG:  
statement: SELECT pg_catalog.pg_is_in_recovery()
2024-07-11 15:19:41.635 UTC [207514] LOG:  recovery stopping after WAL location (LSN) 
"0/3009860"
2024-07-11 15:19:41.635 UTC [207514] LOG:  redo done at 0/3009860 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 21.00 s

I've managed to reproduce the failure locally. With the bgwriter mod:
-#define LOG_SNAPSHOT_INTERVAL_MS 15000
+#define LOG_SNAPSHOT_INTERVAL_MS 150

and wal_debug=on, when running 5 test instances with parallel, I get the
failure with the following log:
recovery_target_lsn = '0/3009A20'
pg_createsubscriber: starting the subscriber

2024-07-11 14:40:04.551 UTC [205589:72][startup][33/0:0] LOG:  REDO @ 0/30099E8; LSN 0/3009A20: prev 0/30099B0; xid 0; len 24 - Standby/RUNNING_XACTS: nextXid 747 latestCompletedXid 746 oldestRunningXid 747
# ^^^ the last REDO record in the log
...
pg_createsubscriber: server was started
pg_createsubscriber: waiting for the target server to reach the consistent state
...
pg_createsubscriber: server was stopped
pg_createsubscriber: error: recovery timed out
...
[14:43:06.011](181.800s) not ok 30 - run pg_createsubscriber on node S
[14:43:06.012](0.000s)
[14:43:06.012](0.000s) #   Failed test 'run pg_createsubscriber on node S'
#   at t/040_pg_createsubscriber.pl line 356.

$ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata/pg_wal/ 000000010000000000000003 000000010000000000000003 | tail -2 rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747

Whilst
$ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_p_data/pgdata/pg_wal/ 000000010000000000000003 000000010000000000000003 | tail rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Heap2       len (rec/tot):     60/    60, tx:        747, lsn: 0/03009A20, prev 0/030099E8, desc: NEW_CID rel: 1663/16384/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295 rmgr: Heap        len (rec/tot):     54/    54, tx:        747, lsn: 0/03009A60, prev 0/03009A20, desc: DELETE xmax: 747, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16384/6104 blk 0 rmgr: Transaction len (rec/tot):     78/    78, tx:        747, lsn: 0/03009A98, prev 0/03009A60, desc: INVALIDATION ; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Transaction len (rec/tot):     98/    98, tx:        747, lsn: 0/03009AE8, prev 0/03009A98, desc: COMMIT 2024-07-11 14:43:05.994561 UTC; relcache init file inval dbid 16384 tsid 1663; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Heap2       len (rec/tot):     60/    60, tx:        748, lsn: 0/03009B50, prev 0/03009AE8, desc: NEW_CID rel: 1663/16385/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295 rmgr: Heap        len (rec/tot):     54/    54, tx:        748, lsn: 0/03009B90, prev 0/03009B50, desc: DELETE xmax: 748, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16385/6104 blk 0 rmgr: Transaction len (rec/tot):     78/    78, tx:        748, lsn: 0/03009BC8, prev 0/03009B90, desc: INVALIDATION ; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Transaction len (rec/tot):     98/    98, tx:        748, lsn: 0/03009C18, prev 0/03009BC8, desc: COMMIT 2024-07-11 14:43:06.008619 UTC; relcache init file inval dbid 16385 tsid 1663; inval msgs: catcache 49 catcache 46 relcache 0

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-07-11%2007%3A25%3A12

Best regards,
Alexander


Reply via email to