Hi,

I found a crash in the logical replication sequence synchronization worker
when the publisher returns NULL sequence data for a sequence after at least
one sequence in the same sync batch has already been processed.

One way to reproduce this is to use a subscription that connects to the
publisher as a replication role that can read one published sequence but
cannot read another one. pg_get_sequence_data() returns NULLs for the
inaccessible sequence. In get_and_validate_seq_info(), that path returns
COPYSEQ_SKIPPED before assigning a new value to the Relation output
argument, so copy_sequences() still sees the Relation pointer left over
from the previous row and calls table_close() on it a second time. On a
cassert build, this trips:

    TRAP: failed Assert("rel->rd_refcnt > 0");  "relcache.c" file

The attached patch clears the output Relation pointer at the start of
get_and_validate_seq_info() and clears the local pointer in copy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
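
To make the failure mode easier to see, here is a standalone toy model of
the pattern. It is only a sketch and not the PostgreSQL code: ToyRel and
all toy_* names are made up for illustration, the reference count is a
plain int, and toy_close() reports an invalid close instead of asserting.
The apply_fix flag switches on the two changes described above (clear the
output pointer on entry, clear the local pointer after closing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a relcache entry; only the refcount matters. */
typedef struct ToyRel
{
    int refcnt;
} ToyRel;

typedef enum { TOY_COPIED, TOY_SKIPPED } ToyResult;

static void
toy_open(ToyRel *rel)
{
    assert(rel != NULL);
    rel->refcnt++;
}

/* Returns false where the real code would hit Assert(rd_refcnt > 0). */
static bool
toy_close(ToyRel *rel)
{
    if (rel->refcnt <= 0)
        return false;
    rel->refcnt--;
    return true;
}

/*
 * Toy analogue of the validation step: when the row is not readable it
 * returns early without assigning *rel_out, unless clear_first is set.
 */
static ToyResult
toy_validate(ToyRel *rel, bool readable, bool clear_first, ToyRel **rel_out)
{
    if (clear_first)
        *rel_out = NULL;        /* fix part 1: reset the output argument */

    if (!readable)
        return TOY_SKIPPED;     /* early return, *rel_out otherwise stale */

    toy_open(rel);
    *rel_out = rel;
    return TOY_COPIED;
}

/* Toy analogue of the batch loop; returns the number of invalid closes. */
static int
toy_copy_batch(ToyRel *rels, const bool *readable, size_t n, bool apply_fix)
{
    ToyRel *rel = NULL;
    int     bad_closes = 0;

    for (size_t i = 0; i < n; i++)
    {
        (void) toy_validate(&rels[i], readable[i], apply_fix, &rel);

        if (rel != NULL)
        {
            if (!toy_close(rel))
                bad_closes++;
            if (apply_fix)
                rel = NULL;     /* fix part 2: drop the local pointer */
        }
    }
    return bad_closes;
}
```

With a two-row batch where the first row is readable and the second is
not, toy_copy_batch(..., false) closes the first row's relation twice,
while toy_copy_batch(..., true) closes it exactly once.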

The patch also adds a TAP test to 036_sequences.pl.

Thoughts?

Regards,
Ayush

Attachment: v1-0001-Fix-stale-relation-close-in-sequence-synchronization.patch