On Fri, Sep 2, 2022 at 6:26 AM osumi.takami...@fujitsu.com
<osumi.takami...@fujitsu.com> wrote:
>
>
> I've met an assertion failure of logical decoding with below scenario on HEAD.
>
> ---
> <preparation>
> create table tab1 (val integer);
> select 'init' from  pg_create_logical_replication_slot('regression_slot', 
> 'test_decoding');
>
> <session1>
> begin;
> savepoint sp1;
> insert into tab1 values (1);
>
> <session2>
> checkpoint; -- for RUNNING_XACT
> select data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
> 'include-xids', '0', 'skip-empty-xacts', '1');
>
> <session1>
> truncate tab1; -- for NEW_CID
> commit;
> begin;
> insert into tab1 values (3);
>

BTW, if I just change the truncate statement to "Analyze tab1" in your
entire test then I am getting a different assertion failure:

postgres.exe!ExceptionalCondition(const char * conditionName, const
char * errorType, const char * fileName, int lineNumber) Line 70 C
postgres.exe!AssertTXNLsnOrder(ReorderBuffer * rb) Line 902 C
postgres.exe!ReorderBufferTXNByXid(ReorderBuffer * rb, unsigned int
xid, bool create, bool * is_new, unsigned __int64 lsn, bool
create_as_top) Line 681 C
postgres.exe!ReorderBufferAddNewTupleCids(ReorderBuffer * rb, unsigned
int xid, unsigned __int64 lsn, RelFileLocator locator, ItemPointerData
tid, unsigned int cmin, unsigned int cmax, unsigned int combocid) Line
3188 C
postgres.exe!SnapBuildProcessNewCid(SnapBuild * builder, unsigned int
xid, unsigned __int64 lsn, xl_heap_new_cid * xlrec) Line 823 C
postgres.exe!heap2_decode(LogicalDecodingContext * ctx,
XLogRecordBuffer * buf) Line 408 C
postgres.exe!LogicalDecodingProcessRecord(LogicalDecodingContext *
ctx, XLogReaderState * record) Line 119 C
postgres.exe!pg_logical_slot_get_changes_guts(FunctionCallInfoBaseData
* fcinfo, bool confirm, bool binary) Line 274 C
postgres.exe!pg_logical_slot_get_changes(FunctionCallInfoBaseData *
fcinfo) Line 339 C

This is matching with call stack we see intermittently in the BF
[1][2]. The difference with your scenario is that the Truncate
statement generates an additional WAL XLOG_STANDBY_LOCK prior to
XLOG_HEAP2_NEW_CID.

I think we can fix this in the below ways:
a. Assert(prev_first_lsn <= cur_txn->first_lsn); -- Explain in
comments that it is possible when subtransaction and transaction are
not previously logged as it happened in this scenario
b. track txn of prev_first_lsn (say as prev_txn) and check if
prev_txn's toptxn is the same as cur_txn or cur_txn's toptxn is the
same as the prev_txn then perform assert mentioned in (a) else, keep
the current Assert.

It seems (b) will be more robust.

Thoughts?

Note: I added Sawada-San as sometime back we had an offlist discussion
on this intermittent BF failure but we were not able to reach the
exact test which can show this failure.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-08-20%2002%3A45%3A34
[2] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09

-- 
With Regards,
Amit Kapila.


Reply via email to