On Thu, Jan 11, 2024 at 4:35 PM vignesh C <vignes...@gmail.com> wrote:
>
> On further analysis, it was found that in the failing test,
> CHECKPOINT_SHUTDOWN was started in a new page, so there was the WAL
> page header present just before the CHECKPOINT_SHUTDOWN which was
> causing the failure. We could alternatively reproduce the issue by
> switching the WAL file before restarting the server like in the
> attached test change patch.
> There are a couple of ways to fix this issue a) one by switching the
> WAL before the insertion of records so that the CHECKPOINT_SHUTDOWN
> does not get inserted in a new page as in the attached test_fix.patch
> b) by using pg_walinspect to check that the next WAL record is
> CHECKPOINT_SHUTDOWN. I have to try this approach.
>
> Thanks to Bharath and Kuroda-san for offline discussions and helping
> in getting to the root cause.

IIUC, the problem the commit e0b2eed tries to solve is to ensure there
are no left-over decodable WAL records between confirmed_flush LSN and
a shutdown checkpoint, which is what it is expected from the
t/038_save_logical_slots_shutdown.pl. How about we have a PG function
returning true if there are any decodable WAL records between the
given start_lsn and end_lsn? Usage of this new function will make the
tests more concrete and stable. This function doesn't have to be
something really new, we can just turn
binary_upgrade_logical_slot_has_caught_up to a general, non-binary PG
function; this idea has come up before
https://www.postgresql.org/message-id/CAA4eK1KZXaBgVOAdV8ZfG6AdDbKYFVz7teDa7GORgQ3RVYS93g%40mail.gmail.com.
If okay, I can offer to write a patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to