On Mon, Sep 04, 2023 at 02:12:58PM +0530, Amit Kapila wrote:
> Yeah, I agree that could be hacked quickly but note I haven't reviewed
> in detail if there are other design issues in this patch. Note that we
> thought first to support the upgrade of the publisher node, otherwise,
> immediately after upgrading the subscriber and publisher, the
> subscriptions won't work and start giving errors as they are dependent
> on slots in the publisher. One other point that needs some thought is
> that the LSN positions we are going to copy in the catalog may no
> longer be valid after the upgrade (of the publisher) because we reset
> WAL. Does that need some special consideration or are we okay with
> that in all cases?
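
To make "LSN positions" concrete: an LSN printed as hi/lo is a 64-bit
WAL position, so asking whether a stored LSN is lower than the point
where a reset cluster starts is a plain integer comparison. Below is a
minimal sketch of that comparison, not code from the patch. All the
function names are hypothetical, the 16MB segment size is the default
and assumed here, and the number of skipped segments is a parameter
rather than something read from pg_upgrade.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16MB WAL segments


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (two hex numbers) to a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Print a 64-bit WAL position back in the 'hi/lo' hex form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_of(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Number of the WAL segment that contains this position."""
    return lsn // seg_size


def segment_start(segno: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """First WAL position of the given segment."""
    return segno * seg_size


def is_behind_new_start(confirmed_lsn: str, old_end_lsn: str,
                        skipped_segments: int) -> bool:
    """True if a stored LSN is lower than the start of the segment that lies
    `skipped_segments` (hypothetical parameter) past the old cluster's end."""
    new_segno = segment_of(parse_lsn(old_end_lsn)) + skipped_segments
    return parse_lsn(confirmed_lsn) < segment_start(new_segno)
```

With these helpers, is_behind_new_start("0/3000028", "0/3000060", 8)
compares the stored position against the start of segment 11, i.e.
0/B000000.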

In pg_upgrade, copy_xact_xlog_xid() puts the new node ahead of the old
cluster by 8 segments on TLI 1, so how would it be a problem if the
subscribers keep a remote confirmed LSN lower than that in their
catalogs? (You've mentioned that to me offline, but I forgot the
details in the code.)

> As of now, things are quite safe as documented in
> pg_dump doc page that it will be the user's responsibility to set up
> replication after dump/restore. I think it would be really helpful if
> you could share your thoughts on the publisher-side matter as we are
> facing a few tricky questions to be answered. For example, see a new
> thread [1].

In my experience, users are quite used to upgrading standbys *first*,
even in simple scenarios like minor upgrades, because that's the only
way to do things safely. For example, updating and/or upgrading
primaries before the standbys could be a problem if an update
introduces a slight change in the WAL record format that could be
generated by the primary but not be processed by a standby, and we've
done such tweaks in some records in the past for some bug fixes that
had to be backpatched to stable branches.

IMO, the upgrade of subscriber nodes and the upgrade of publisher
nodes need to be treated as two independent processing problems, dealt
with separately.

As you have mentioned to me earlier offline, these two have, from what
I understand, one dependency: during a publisher upgrade we need to
make sure that there are no invalid slots when beginning to run
pg_upgrade, and that the confirmed LSN of all the slots used by the
subscribers match the shutdown checkpoint's LSN, ensuring that the
subscribers would not lose any data because everything's already been
consumed by them when the publisher gets to be upgraded.

> The point raised by Jonathan for not having an option for pg_upgrade
> is that it will be easy for users, otherwise, users always need to
> enable this option.
> Consider a replication setup, wouldn't users want
> by default it to be upgraded? Asking them to do that via an option
> would be an inconvenience. So, that was the reason, we wanted to have
> an --exclude option and by default allow slots to be upgraded. I think
> the same theory applies here.
>
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1LV3%2B76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg%40mail.gmail.com

I saw this thread, and have some thoughts to share. Will reply there.
--
Michael