Re: pg_upgrade: optimize replication slot caught-up check

Masahiko Sawada Fri, 09 Jan 2026 10:04:47 -0800

On Thu, Jan 8, 2026 at 6:59 PM Chao Li <[email protected]> wrote:
>
>
>
> I just looked into v3. Basically, it now does a shared WAL scan to find the 
> newest decodable LSN and uses that to compare with all slots’ 
> confirmed_flush_lsn, which significantly reduces WAL scan effort when there 
> are many slots.


Thank you for reviewing the patch!

>
> One thing I'm thinking about is that if all slots are far behind, the shared 
> scan may still take a long time. Before this change, a scan could stop as 
> soon as it found a pending WAL. So after the change, when there are only a 
> few slots and they are far behind, the scan might end up doing more work than 
> before.

That's a valid concern.

> As a possible optimization, maybe we could also pass the newest 
> confirmed_flush_lsn to the scan. Once it finds a decodable WAL record newer 
> than that confirmed_flush_lsn, we already know all slots are behind, so the 
> scan could stop at that point.

Sounds like a reasonable idea. I'll give it a try and see how it's worthwhile.

>
> WRT the code change, I got a few comments:
>
> 1
> ···
> + * otherwise false. If last_pending_wal_p is set by the caller, it continues
> + * scanning WAL even after detecting a decodable WAL record, in order to
> + * get the last decodable WAL record's LSN.
>   */
>  bool
> -LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal)
> +LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal,
> +                                                                       
> XLogRecPtr *last_pending_wal_p)
>  {
>         bool            has_pending_wal = false;
>
>         Assert(MyReplicationSlot);
>
> +       if (last_pending_wal_p != NULL)
> +               *last_pending_wal_p = InvalidXLogRecPtr;
> ···
>
> The header comment seems to conflict to the code. last_pending_wal_p is 
> unconditionally set to InvalidXLogRecPtr, so whatever a caller set is 
> overwritten. I think you meant to say “if last_pending_wal_p is not NULL”.

Agreed.

>
> 2
> ```
> @@ -286,9 +287,9 @@ 
> binary_upgrade_logical_slot_has_caught_up(PG_FUNCTION_ARGS)
>  {
>         Name            slot_name;
>         XLogRecPtr      end_of_wal;
> -       bool            found_pending_wal;
> +       XLogRecPtr      last_pending_wal;
> ```
>
> The function header comment still says “returns true if …”, that should be 
> updated.
>
> Also, with the change, the function name becomes misleading, where “has” 
> implies a boolean result, but now it will return the newest docodeable wal 
> when no catching up. The function name doesn’t reflect to the actual 
> behavior. Looks like the function is only used by pg_upgrade, so maybe rename 
> it.

Agreed, I'll incorporate the comment in the next version patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: pg_upgrade: optimize replication slot caught-up check

Reply via email to