On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> > > > > Another idea (which might have already been discussed, though) is that we
> > > > > check if the latest shutdown checkpoint LSN in the control file 
> > > > > matches the confirmed_flush_lsn in pg_replication_slots view. That 
> > > > > way, we can ensure that the slot has consumed all WAL records before 
> > > > > the last shutdown. We don't need to worry about WAL records generated 
> > > > > after starting the old cluster during the upgrade, at least for 
> > > > > logical replication slots.
> > > > >
> > > >
> > > > Right, this is somewhat closer to what the patch is already doing. But
> > > > remember that in this case we need to record and use the latest
> > > > checkpoint from the control file before the old cluster is started,
> > > > because otherwise the latest checkpoint location could be updated
> > > > during the upgrade. So, instead of reading from WAL, we need to change
> > > > the patch so that it relies on the control file's latest checkpoint LSN.
> > >
> > > Yes, I was thinking the same idea.
> > >
> > > But it works only for replication slots used for logical replication. Do we
> > > want to check if no meaningful WAL records are generated after the
> > > latest shutdown checkpoint, for manually created slots (or non-logical
> > > replication slots)? If so, we would need to have something reading WAL
> > > records in the end.
> > >
> >
> > This feature only targets logical replication slots. I don't see a
> > reason to be different for manually created logical replication slots.
> > Is there something particular that you think we could be missing?
>
> Sorry I was not clear. I meant the logical replication slots that are
> *not* used by logical replication, i.e., are created manually and used
> by third party tools that periodically consume decoded changes. As we
> discussed before, these slots will never be able to pass that
> confirmed_flush_lsn check.
>

I think normally one would have a background process to periodically
consume changes. Couldn't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol? Also, I feel it is the plugin author's responsibility to
consume changes or advance the slot to the required position before
shutdown.

> After some thought, one thing we might
> need to consider is that in practice, the upgrade project is performed
> during a maintenance window and has a backup plan that reverts the
> upgrade process in case something bad happens. If we require the
> users to drop such logical replication slots, they cannot resume
> using the old cluster in that case, since they would need to create new
> slots, missing some changes.
>

Can't one take the backup before removing the slots?

> Other checks in pg_upgrade seem to be
> compatibility checks that would eventually be required for the upgrade
> anyway. Do we need to consider this case? For example, we could do that
> confirmed_flush_lsn check only for the slots with the pgoutput plugin.
>

I think one is allowed to use the pgoutput plugin even for manually
created slots. So, such a check may not work.
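
To make the check under discussion concrete, here is a minimal Python
sketch. It is only an illustration of the idea, not what the patch does.
It assumes the shutdown checkpoint LSN has already been read from the
control file before the old cluster was started, and that
(slot_name, plugin, confirmed_flush_lsn) rows have already been fetched
from pg_replication_slots. The helper names parse_lsn and
slots_not_caught_up are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# (slot_name, plugin, confirmed_flush_lsn); the LSN may be NULL in the view.
SlotRow = Tuple[str, str, Optional[str]]


def parse_lsn(text: str) -> int:
    """Convert an LSN string such as '0/16B3748' to a 64-bit integer."""
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slots_not_caught_up(shutdown_checkpoint_lsn: str,
                        slots: Iterable[SlotRow]) -> List[str]:
    """Return names of slots whose confirmed_flush_lsn does not match
    the shutdown checkpoint LSN recorded in the control file.

    The plugin name is deliberately not used as a filter: a manually
    created slot may also use pgoutput, so the plugin alone does not
    tell us who consumes the slot.
    """
    target = parse_lsn(shutdown_checkpoint_lsn)
    lagging = []
    for name, _plugin, confirmed_flush_lsn in slots:
        if confirmed_flush_lsn is None or parse_lsn(confirmed_flush_lsn) != target:
            lagging.append(name)
    return lagging
```

A slot driven by a third-party tool that last consumed changes some time
before shutdown would show up in the returned list, regardless of which
plugin it uses.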

-- 
With Regards,
Amit Kapila.

