Chao Li <[email protected]> wrote: > While reading logical replication code, I found an issue in > LogicalConfirmReceivedLocation(). > > In LogicalConfirmReceivedLocation(), updated_restart is tracked independently > from updated_xmin, and the slot is marked dirty and saved when either one > changed. But after that, ReplicationSlotsComputeRequiredLSN() is still only > called inside "if (updated_xmin)”. > > So for the restart-only case: > > * updated_restart = true > * updated_xmin = false > * ReplicationSlotSave() runs > * ReplicationSlotsComputeRequiredLSN() does not run because updated_xmin is > false > > That means the global retention point managed by > XLogSetReplicationSlotMinimumLSN() can stay stale until some later unrelated > event recomputes it. Since ReplicationSlotsComputeRequiredLSN() derives the > global minimum from slot restat_lsn, skipping it after a restart-only advance > can retain excess WAL and may lead to WAL bloat. > > This patch fixes the problem by moving ReplicationSlotsComputeRequiredLSN() > under “if (updated_restart)”.
FYI, this overlaps with another post in the REPACK thread [1]. > Looks like this issue has been there for a long time, so if this analysis is > correct, it may also be worth back-patching. As REPACK in PG 19 does not let xmin advance (that should be fixed in the future), I think makes sense to apply [1] to v19. However, during logical replication, xmin (IMO) gets updated rather often, so the problem should not be that severe in earlier versions. [1] https://www.postgresql.org/message-id/TYRPR01MB14195633567DA00ABD42570B794592%40TYRPR01MB14195.jpnprd01.prod.outlook.com -- Antonin Houska Web: https://www.cybertec-postgresql.com
