Re: pg_upgrade and logical replication

Michael Paquier Wed, 06 Sep 2023 23:34:34 -0700

On Mon, Sep 04, 2023 at 02:12:58PM +0530, Amit Kapila wrote:
> Yeah, I agree that could be hacked quickly but note I haven't reviewed
> in detail if there are other design issues in this patch. Note that we
> thought first to support the upgrade of the publisher node, otherwise,
> immediately after upgrading the subscriber and publisher, the
> subscriptions won't work and start giving errors as they are dependent
> on slots in the publisher. One other point that needs some thought is
> that the LSN positions we are going to copy in the catalog may no
> longer be valid after the upgrade (of the publisher) because we reset
> WAL. Does that need some special consideration or are we okay with
> that in all cases?

In pg_upgrade, copy_xact_xlog_xid() puts the new node ahead of the old
cluster by 8 segments on TLI 1, so how would it be a problem if the
subscribers keep a remote confirmed LSN lower than that in their
catalogs?  (You've mentioned that to me offline, but I forgot the
details in the code.)

> As of now, things are quite safe as documented in
> pg_dump doc page that it will be the user's responsibility to set up
> replication after dump/restore. I think it would be really helpful if
> you could share your thoughts on the publisher-side matter as we are
> facing a few tricky questions to be answered. For example, see a new
> thread [1].

In my experience, users are quite used to upgrading standbys *first*,
even in simple scenarios like minor upgrades, because that's the only
way to do things safely.  For example, updating and/or upgrading
primaries before the standbys could be a problem if an update
introduces a slight change in the WAL record format, such that the
primary could generate records that a standby cannot process.  We've
done such tweaks in some records in the past for bug fixes that
had to be backpatched to stable branches.

IMO, the upgrade of subscriber nodes and the upgrade of publisher
nodes need to be treated as two independent processing problems, dealt
with separately.

As you mentioned to me earlier offline, these two have, from what I
understand, one dependency: during a publisher upgrade we need to make
sure that there are no invalid slots when beginning to run pg_upgrade,
and that the confirmed LSN of every slot used by the subscribers
matches the shutdown checkpoint's LSN, ensuring that the
subscribers would not lose any data because everything's already been
consumed by them when the publisher gets to be upgraded.
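
To make that precondition concrete, here is a rough sketch of the check
in Python.  It is only an illustration, not what pg_upgrade does: the
Slot record and slots_blocking_upgrade() are hypothetical names, the
fields are modeled on pg_replication_slots, and treating a wal_status
of "lost" as "invalid slot" is an assumption made for the example.

```python
from dataclasses import dataclass


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'XXXXXXXX/YYYYYYYY' text form to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass
class Slot:
    # Hypothetical record; fields are modeled on pg_replication_slots.
    name: str
    confirmed_flush_lsn: str
    wal_status: str  # assumption: "lost" marks an invalid slot


def slots_blocking_upgrade(slots, shutdown_checkpoint_lsn):
    """Return (slot name, reason) pairs for slots that fail the precondition."""
    target = parse_lsn(shutdown_checkpoint_lsn)
    problems = []
    for slot in slots:
        if slot.wal_status == "lost":
            problems.append((slot.name, "invalid slot"))
        elif parse_lsn(slot.confirmed_flush_lsn) != target:
            # The subscriber has not confirmed everything up to the
            # shutdown checkpoint, so upgrading now could lose changes.
            problems.append((slot.name, "confirmed LSN differs from shutdown checkpoint"))
    return problems
```

An empty list would mean that every slot is valid and fully caught up,
which is the state described above as safe for upgrading the publisher.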

> The point raised by Jonathan for not having an option for pg_upgrade
> is that it will be easy for users, otherwise, users always need to
> enable this option. Consider a replication setup, wouldn't users want
> by default it to be upgraded? Asking them to do that via an option
> would be an inconvenience. So, that was the reason, we wanted to have
> an --exclude option and by default allow slots to be upgraded. I think
> the same theory applies here.
> 
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1LV3%2B76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg%40mail.gmail.com

I saw this thread, and have some thoughts to share.  Will reply there.
--
Michael
