On Mon, Jan 30, 2023 at 11:34 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> I have reproduced it manually. For this, I had to manually make the
> debugger call ReplicationSlotsComputeRequiredXmin(false) via path
> SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot()->LogicalConfirmReceivedLocation()
> ->ReplicationSlotsComputeRequiredXmin(false) for the apply worker. The
> sequence of events is something like (a) the replication_slot_xmin for
> tablesync worker is overridden by apply worker as zero as explained in
> Sawada-San's email, (b) another transaction happened on the publisher
> that will increase the value of ShmemVariableCache->nextXid (c)
> tablesync worker invokes
> SnapBuildInitialSnapshot()->GetOldestSafeDecodingTransactionId() which
> will return an oldestSafeXid which is higher than snapshot's xmin.
> This happens because replication_slot_xmin has an InvalidTransactionId
> value and we won't consider replication_slot_catalog_xmin because
> catalogOnly flag is false and there is no other open running
> transaction. I think we should try to get a simplified test to
> reproduce this problem if possible.
>
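To make steps (a) to (c) concrete, below is a simplified Python model of how the oldest safe xid is derived. It is only a sketch, not PostgreSQL code: ProcArrayModel, oldest_safe_decoding_xid and all XID values are hypothetical, wraparound is ignored (plain `<` instead of TransactionIdPrecedes()), and recovery is not modelled. It shows that once replication_slot_xmin is zero and nextXid has advanced, a catalogOnly=false call falls back to nextXid, which follows the snapshot's xmin.

```python
from dataclasses import dataclass, field

INVALID_XID = 0  # models InvalidTransactionId


@dataclass
class ProcArrayModel:
    """Hypothetical, simplified stand-in for the shared state consulted by
    GetOldestSafeDecodingTransactionId(); not PostgreSQL's actual structs."""
    next_xid: int
    replication_slot_xmin: int = INVALID_XID
    replication_slot_catalog_xmin: int = INVALID_XID
    running_xids: list[int] = field(default_factory=list)


def oldest_safe_decoding_xid(pa: ProcArrayModel, catalog_only: bool) -> int:
    # Start from the next XID that would be assigned ...
    oldest = pa.next_xid
    # ... lowered by the slot xmin, but only when it is valid (non-zero).
    if pa.replication_slot_xmin != INVALID_XID and pa.replication_slot_xmin < oldest:
        oldest = pa.replication_slot_xmin
    # The catalog xmin is only taken into account when catalog_only is true.
    if (catalog_only and pa.replication_slot_catalog_xmin != INVALID_XID
            and pa.replication_slot_catalog_xmin < oldest):
        oldest = pa.replication_slot_catalog_xmin
    # Any still-running transaction's XID also holds the horizon back.
    for xid in pa.running_xids:
        if xid < oldest:
            oldest = xid
    return oldest


# Replay of the race with made-up XID values.
snap_xmin = 740  # xmin of the snapshot being built
pa = ProcArrayModel(next_xid=740, replication_slot_xmin=740,
                    replication_slot_catalog_xmin=735)

pa.replication_slot_xmin = INVALID_XID  # (a) concurrently overwritten with zero
pa.next_xid += 1                        # (b) another transaction takes an XID and commits

safe_xid = oldest_safe_decoding_xid(pa, catalog_only=False)  # (c)
print(f"safe_xid={safe_xid} snap_xmin={snap_xmin} follows={safe_xid > snap_xmin}")
# prints: safe_xid=741 snap_xmin=740 follows=True
```

With the slot's xmin still at 740 the same call would return 740, and with catalog_only=True the catalog xmin (735 here) would hold the horizon back; neither applies in the sequence above.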
Here are the steps to reproduce it manually with the help of a debugger:

Session-1
=========
select pg_create_logical_replication_slot('s', 'test_decoding');
create table t2(c1 int);
select pg_replication_slot_advance('s', pg_current_wal_lsn());
-- Debug this statement. Stop in ProcArraySetReplicationSlotXmin() before taking ProcArrayLock.

Session-2
=========
psql -d postgres
Begin;

Session-3
=========
psql -d "dbname=postgres replication=database"
begin transaction isolation level repeatable read read only;
CREATE_REPLICATION_SLOT slot1 LOGICAL test_decoding USE_SNAPSHOT;
-- Debug this statement. Stop in SnapBuildInitialSnapshot() before taking ProcArrayLock.

Session-1
=========
Continue debugging and finish the execution of ProcArraySetReplicationSlotXmin().
Verify that procArray->replication_slot_xmin is zero.

Session-2
=========
Select txid_current();
Commit;

Session-3
=========
Continue debugging. Verify that safeXid follows snap->xmin. This leads to
an assertion failure (in back branches) or an error (in HEAD).

--
With Regards,
Amit Kapila.