Hi ChangAo,

Thanks for the v3. The commit message, in-line comment, and the
rewind_source.h note all look good.

On the test front: I don't think a hang-detection test can be made
reliable. The bug requires the source's insert LSN to be exactly
segment_boundary + SizeOfXLogLongPHD with no further WAL activity, but
bgwriter's periodic LogStandbySnapshot emits a RUNNING_XACTS record,
which can advance the insert LSN nondeterministically between
pg_switch_wal() and the rewind. In my
reproduction bgwriter ended the hang after ~9s; that's the kind of timing
we don't want in CI.

The deterministic alternative is to parse pg_controldata on the target
after pg_rewind and assert that minRecoveryPoint does not land at
"boundary + SizeOfXLogLongPHD". That's a direct check on the patched
behavior, independent of source idleness or replay timing. However, it
doesn't exercise the integration property that the rewound node reaches
consistency without further upstream WAL, so I am not sure it is a
complete test case for our scenario.
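
To make the controldata check concrete, here is a rough sketch in Python
(an actual regression test would more likely live in the Perl TAP suite).
The helper names are hypothetical, and I am assuming SizeOfXLogLongPHD is
40 bytes, as on typical 64-bit builds, and that the pg_controldata labels
are "Minimum recovery ending location" and "Bytes per WAL segment":

```python
# Assumption: SizeOfXLogLongPHD is 40 bytes (typical 64-bit build).
SIZE_OF_XLOG_LONG_PHD = 40


def parse_lsn(text):
    """Convert an LSN printed as 'X/Y' (two hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def controldata_field(output, label):
    """Return the value of one 'label: value' line of pg_controldata output."""
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == label:
            return value.strip()
    raise KeyError(label)


def min_recovery_at_long_header(output):
    """True if minRecoveryPoint sits at segment boundary + long page header."""
    lsn = parse_lsn(
        controldata_field(output, "Minimum recovery ending location"))
    seg_size = int(controldata_field(output, "Bytes per WAL segment"))
    return lsn % seg_size == SIZE_OF_XLOG_LONG_PHD
```

The test would run pg_controldata on the target after pg_rewind and fail
if min_recovery_at_long_header() returns true for its output.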


Regards,
Surya Poondla
