Dear Alexander, > Unfortunately, the testing procedure I shared above still produces failures > with the patched 009_twophase.pl.
Hmm, I ran the test for hours, but I could nor reproduce the failure. But let
me analyze
based on your log.
> With wal_debug = on
> in /tmp/temp.config, I can see:
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:
> statement: PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG:
> INSERT @ 0/030227F8: - Transaction/PREPARE: gid xact_009_10: 2026-02-17
> 07:06:44.313023+02; subxacts: 790
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:
> PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl LOG: xlog
> flush request 0/030227F8; write 0/00000000; flush 0/00000000
> 2026-02-17 07:06:44.313 EET client backend[754908] 009_twophase.pl STATEMENT:
> PREPARE TRANSACTION 'xact_009_10';
> 2026-02-17 07:06:44.313 EET background writer[754333] LOG: INSERT @
> 0/03022838: - Standby/RUNNING_XACTS: nextXid 791 latestCompletedXid 788
> oldestRunningXid 789; 1 xacts: 789; 1 subxacts: 790
> 2026-02-17 07:06:44.322 EET client backend[754949] 009_twophase.pl LOG:
> statement: SELECT pg_current_wal_flush_lsn()
> 2026-02-17 07:06:44.330 EET client backend[754974] 009_twophase.pl LOG:
> statement: SELECT '0/030227F8' <= replay_lsn AND state = 'streaming'
> FROM pg_catalog.pg_stat_replication
> WHERE application_name IN ('paris', 'walreceiver')
> 2026-02-17 07:06:44.337 EET postmaster[754306] LOG: received immediate
> shutdown request
> 2026-02-17 07:06:44.339 EET postmaster[754306] LOG: database system is shut
> down
I have few experience to see the wal_debug output, but background writer seems
to
generate the RUNNING_XACTS record. It's different from my expectation. To
confirm,
did you really enable the injection point? For now 009_twophase can work without
the `-Dinjection_points=true` but it should be set to avoid random failures.
Another possibility is that bgwriter has already been passed
`IS_INJECTION_POINT_ATTACHED("skip-log-running-xacts")`
before attaching the injection point. In this case background writer would
generate the RUNNING_XACTS records at the bad timing. One possible workaround
is to attach the injeciton_point bit earlier, it does not solve the root cause
though. Attached patch applies the band-aid.
> Regarding the approach with "skip-log-running-xacts", I wonder if there is
> a strong guarantee that no other WAL write can happen in the same window
> (e.g., from autovacuum)?
Good point, autovacuum and checkpoint should be also avoided. IIUC it can be
done
by setting GUCs.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
v2-0001-Stabilize-009_twophase.pl.patch
Description: v2-0001-Stabilize-009_twophase.pl.patch
