On Sat, Jan 28, 2023 at 11:54 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote:
>
> Dear Amit, Sawada-san,
>
> I have also reproduced the failure on PG15 with some debug log, and I agreed
> that somebody changed procArray->replication_slot_xmin to InvalidTransactionId.
>
> > > The same assertion failure has been reported on another thread[1].
> > > Since I could reproduce this issue several times in my environment
> > > I've investigated the root cause.
> > >
> > > I think there is a race condition of updating
> > > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > > LogicalConfirmReceivedLocation().
> > >
> > > What I observed in the test was that a walsender process called:
> > > SnapBuildProcessRunningXacts()
> > > LogicalIncreaseXminForSlot()
> > > LogicalConfirmReceivedLocation()
> > > ReplicationSlotsComputeRequiredXmin(false).
> > >
> > > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > > was no wal sender having effective_xmin.
> > >
> >
> > What about the current walsender process which is processing
> > running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> > slot's effective_xmin have a non-zero value? If not, then why?
>
> Normal walsenders which are not for tablesync create a replication slot with
> NOEXPORT_SNAPSHOT option. I think in this case, CreateInitDecodingContext() is
> called with need_full_snapshot = false, and slot->effective_xmin is not
> updated.

Right. This is how we create a slot used by an apply worker.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com