Dear Amit, Sawada-san,

I have also reproduced the failure on PG15 with some debug logging, and I agree
that some process changed procArray->replication_slot_xmin to
InvalidTransactionId.

> > The same assertion failure has been reported on another thread[1].
> > Since I could reproduce this issue several times in my environment
> > I've investigated the root cause.
> >
> > I think there is a race condition of updating
> > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > LogicalConfirmReceivedLocation().
> >
> > What I observed in the test was that a walsender process called:
> > SnapBuildProcessRunningXacts()
> >   LogicalIncreaseXminForSlot()
> >     LogicalConfirmReceivedLocation()
> >       ReplicationSlotsComputeRequiredXmin(false).
> >
> > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > was no wal sender having effective_xmin.
> >
> 
> What about the current walsender process which is processing
> running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> slot's effective_xmin have a non-zero value? If not, then why?

Normal walsenders that are not used for tablesync create their replication slot
with the NOEXPORT_SNAPSHOT option. I think in this case
CreateInitDecodingContext() is called with need_full_snapshot = false, so
slot->effective_xmin is not updated. It is set to InvalidTransactionId in
ReplicationSlotCreate(), and no other function updates it afterwards. Hence the
slot acquired by the walsender may have an invalid effective_xmin.
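
To illustrate why this yields 0 as the minimum, here is a minimal standalone
sketch. It is a simplified model, not the PostgreSQL source: ModelSlot,
xid_is_valid(), xid_precedes() and model_compute_required_xmin() are
hypothetical names, only effective_xmin is modeled, and the comparison ignores
special XIDs. It assumes the aggregation skips slots whose effective_xmin is
invalid, as I understand ReplicationSlotsComputeRequiredXmin() to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, stripped-down slot: only the field discussed above. */
typedef struct ModelSlot
{
    bool          in_use;
    TransactionId effective_xmin;
} ModelSlot;

static bool
xid_is_valid(TransactionId xid)
{
    return xid != InvalidTransactionId;
}

/* Wraparound-aware "a precedes b" using modulo-2^32 arithmetic. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Return the oldest valid effective_xmin among in-use slots, or
 * InvalidTransactionId if no in-use slot has a valid one.
 */
static TransactionId
model_compute_required_xmin(const ModelSlot *slots, int nslots)
{
    TransactionId agg_xmin = InvalidTransactionId;

    assert(nslots >= 0);

    for (int i = 0; i < nslots; i++)
    {
        TransactionId xmin = slots[i].effective_xmin;

        if (!slots[i].in_use)
            continue;

        /* Slots with an invalid effective_xmin contribute nothing. */
        if (xid_is_valid(xmin) &&
            (!xid_is_valid(agg_xmin) || xid_precedes(xmin, agg_xmin)))
            agg_xmin = xmin;
    }

    return agg_xmin;
}
```

In this model, a single in-use slot whose effective_xmin is
InvalidTransactionId makes the function return InvalidTransactionId, which
corresponds to the 0 observed above. Once any in-use slot carries a valid
effective_xmin, the oldest such value is returned instead.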

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
