Hi Kyotaro,

On Wed, 13 Mar 2024 at 03:56, Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
> I identified the cause of the second issue. When I tried to replay the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it had not been read by recovery). In addition to that,
> on the second standby, there's a time window where the timeline
> increased but the first segment of the new timeline is not available
> yet. In this case, the second standby successfully reads the
> page-spanning record in the old timeline even after the second standby
> noticed that the timeline ID has been increased, thanks to the
> robustness of XLogFileReadAnyTLI().
>

Hmm, I don't think it could really be prevented. There is always a chance that a standby that is not ahead of the other standbys gets promoted, for reasons such as:
1. The HA configuration doesn't allow certain nodes to be promoted.
2. This is an async standby (its name isn't listed in synchronous_standby_names) and it was ahead of the promoted sync standby. There is no data loss from the client's point of view.

> Of course, regardless of the changes above, if recovery on the second
> standby had reached the end of the page-spanning record before
> redirection to the first standby, it would need pg_rewind to connect
> to the first standby.
>

Correct. IMO pg_rewind is the right way of solving it.

Regards,
--
Alexander Kukushkin