Re: pg_upgrade: optimize replication slot caught-up check

Masahiko Sawada Fri, 09 Jan 2026 13:33:41 -0800

On Fri, Jan 9, 2026 at 10:03 AM Masahiko Sawada <[email protected]> wrote:
>
> On Thu, Jan 8, 2026 at 6:59 PM Chao Li <[email protected]> wrote:
> >
> >
> >
> > I just looked into v3. Basically, it now does a shared WAL scan to find the 
> > newest decodable LSN and uses that to compare with all slots’ 
> > confirmed_flush_lsn, which significantly reduces WAL scan effort when there 
> > are many slots.
>
> Thank you for reviewing the patch!
>
> >
> > One thing I'm thinking about is that if all slots are far behind, the 
> > shared scan may still take a long time. Before this change, a scan could 
> > stop as soon as it found a pending WAL. So after the change, when there are 
> > only a few slots and they are far behind, the scan might end up doing more 
> > work than before.
>
> That's a valid concern.
>
> > As a possible optimization, maybe we could also pass the newest 
> > confirmed_flush_lsn to the scan. Once it finds a decodable WAL record newer 
> > than that confirmed_flush_lsn, we already know all slots are behind, so the 
> > scan could stop at that point.
>
> Sounds like a reasonable idea. I'll give it a try and see how it's worthwhile.
>
> >
> > WRT the code change, I got a few comments:
> >
> > 1
> > ···
> > + * otherwise false. If last_pending_wal_p is set by the caller, it 
> > continues
> > + * scanning WAL even after detecting a decodable WAL record, in order to
> > + * get the last decodable WAL record's LSN.
> >   */
> >  bool
> > -LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
> > +LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal,
> > +                                                                       
> > XLogRecPtr *last_pending_wal_p)
> >  {
> >         bool            has_pending_wal = false;
> >
> >         Assert(MyReplicationSlot);
> >
> > +       if (last_pending_wal_p != NULL)
> > +               *last_pending_wal_p = InvalidXLogRecPtr;
> > ···
> >
> > The header comment seems to conflict to the code. last_pending_wal_p is 
> > unconditionally set to InvalidXLogRecPtr, so whatever a caller set is 
> > overwritten. I think you meant to say “if last_pending_wal_p is not NULL”.
>
> Agreed.
>
> >
> > 2
> > ```
> > @@ -286,9 +287,9 @@ 
> > binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
> >  {
> >         Name            slot_name;
> >         XLogRecPtr      end_of_wal;
> > -       bool            found_pending_wal;
> > +       XLogRecPtr      last_pending_wal;
> > ```
> >
> > The function header comment still says “returns true if …”, that should be 
> > updated.
> >
> > Also, with the change, the function name becomes misleading, where “has” 
> > implies a boolean result, but now it will return the newest docodeable wal 
> > when no catching up. The function name doesn’t reflect to the actual 
> > behavior. Looks like the function is only used by pg_upgrade, so maybe 
> > rename it.
>
> Agreed, I'll incorporate the comment in the next version patch.
>


I've attached the updated patch that addressed all review comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

v4-0001-pg_upgrade-Optimize-replication-slot-caught-up-ch.patch
Description: Binary data

Re: pg_upgrade: optimize replication slot caught-up check

Reply via email to