On Fri, Jan 9, 2026 at 10:03 AM Masahiko Sawada <[email protected]> wrote: > > On Thu, Jan 8, 2026 at 6:59 PM Chao Li <[email protected]> wrote: > > > > > > > > I just looked into v3. Basically, it now does a shared WAL scan to find the > > newest decodable LSN and uses that to compare with all slots’ > > confirmed_flush_lsn, which significantly reduces WAL scan effort when there > > are many slots. > > Thank you for reviewing the patch! > > > > > One thing I'm thinking about is that if all slots are far behind, the > > shared scan may still take a long time. Before this change, a scan could > > stop as soon as it found a pending WAL. So after the change, when there are > > only a few slots and they are far behind, the scan might end up doing more > > work than before. > > That's a valid concern. > > > As a possible optimization, maybe we could also pass the newest > > confirmed_flush_lsn to the scan. Once it finds a decodable WAL record newer > > than that confirmed_flush_lsn, we already know all slots are behind, so the > > scan could stop at that point. > > Sounds like a reasonable idea. I'll give it a try and see how it's worthwhile. > > > > > WRT the code change, I got a few comments: > > > > 1 > > ··· > > + * otherwise false. If last_pending_wal_p is set by the caller, it > > continues > > + * scanning WAL even after detecting a decodable WAL record, in order to > > + * get the last decodable WAL record's LSN. > > */ > > bool > > -LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal) > > +LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal, > > + > > XLogRecPtr *last_pending_wal_p) > > { > > bool has_pending_wal = false; > > > > Assert(MyReplicationSlot); > > > > + if (last_pending_wal_p != NULL) > > + *last_pending_wal_p = InvalidXLogRecPtr; > > ··· > > > > The header comment seems to conflict to the code. last_pending_wal_p is > > unconditionally set to InvalidXLogRecPtr, so whatever a caller set is > > overwritten. I think you meant to say “if last_pending_wal_p is not NULL”. > > Agreed. > > > > > 2 > > ``` > > @@ -286,9 +287,9 @@ > > binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS) > > { > > Name slot_name; > > XLogRecPtr end_of_wal; > > - bool found_pending_wal; > > + XLogRecPtr last_pending_wal; > > ``` > > > > The function header comment still says “returns true if …”, that should be > > updated. > > > > Also, with the change, the function name becomes misleading, where “has” > > implies a boolean result, but now it will return the newest docodeable wal > > when no catching up. The function name doesn’t reflect to the actual > > behavior. Looks like the function is only used by pg_upgrade, so maybe > > rename it. > > Agreed, I'll incorporate the comment in the next version patch. >
I've attached the updated patch that addressed all review comments. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
v4-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch
Description: Binary data
