Hi Kyotaro,

On Wed, 13 Mar 2024 at 03:56, Kyotaro Horiguchi <horikyota....@gmail.com>
wrote:

> I identified the cause of the second issue. When I tried to replay the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it had not been read by recovery). In addition to that,
> on the second standby, there's a time window where the timeline
> increased but the first segment of the new timeline is not available
> yet. In this case, the second standby successfully reads the
> page-spanning record in the old timeline even after the second standby
> noticed that the timeline ID has been increased, thanks to the
> robustness of XLogFileReadAnyTLI().
>

Hmm, I don't think it can really be prevented.
There is always a chance that a standby which is not ahead of the other
standbys gets promoted, for reasons like:
1. The HA configuration doesn't allow certain nodes to be promoted.
2. The standby that is ahead is an async standby (its name isn't listed in
synchronous_standby_names), and the promoted one is a sync standby. There is
no data loss from the client's point of view.


> Of course, regardless of the changes above, if recovery on the second
> standby had reached the end of the page-spanning record before
> redirection to the first standby, it would need pg_rewind to connect
> to the first standby.
>

Correct, IMO pg_rewind is the right way to solve it.
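
To make the condition concrete, here is a rough sketch (not PostgreSQL code;
parse_lsn and needs_rewind are names I made up for illustration). It assumes
the standby can follow the new timeline only if it has not replayed
old-timeline WAL past the point where the new timeline forked off, and it
relies on the usual textual LSN format of two hexadecimal numbers separated
by a slash (high 32 bits / low 32 bits).

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '0/3000148' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def needs_rewind(replayed_lsn: str, switch_lsn: str) -> bool:
    """Hypothetical helper: True if the standby has replayed WAL of the old
    timeline beyond the point where the new timeline begins."""
    return parse_lsn(replayed_lsn) > parse_lsn(switch_lsn)
```

With made-up LSNs, needs_rewind("0/3002000", "0/3000148") is true (the
standby went past the fork point), while needs_rewind("0/3000148",
"0/3000148") is false and the standby can simply follow the new timeline.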

Regards,
--
Alexander Kukushkin
