Hi,

I found a crash in the logical replication sequence synchronization worker
when the publisher returns NULL sequence data for a sequence after at least
one sequence in the same sync batch has already been processed.

One way to reproduce this is to use a subscription that connects to the
publisher as a replication role that can read one published sequence but
cannot read another one. pg_get_sequence_data() returns NULLs for the
inaccessible sequence. In get_and_validate_seq_info(), that path returns
COPYSEQ_SKIPPED before assigning a new value to the Relation output
argument.

copy_sequences() then still sees the Relation pointer left from the previous
row and calls table_close() on it again. On a cassert build, this trips:

TRAP: failed Assert("rel->rd_refcnt > 0"); "relcache.c" file

The attached patch clears the output Relation pointer at the start of
get_and_validate_seq_info() and clears the local pointer in copy_sequences()
after closing it. That keeps early returns from reusing a relation from a
previous row.
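
To make the failure mode easier to see outside the server, here is a minimal
standalone sketch of the pattern. It is not the PostgreSQL code and not the
patch: MockRel, SeqStatus, mock_close(), validate_row() and process_batch()
are hypothetical stand-ins I made up for illustration, and the loop is a
simplified assumption about how the caller treats the output pointer. It
processes a batch of two rows, where the first is readable and the second
takes the early "skipped" return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a relcache entry; NOT PostgreSQL's RelationData. */
typedef struct MockRel
{
    int refcnt;
} MockRel;

typedef enum { SEQ_OK, SEQ_SKIPPED } SeqStatus;

/* Counts closes attempted on a relation whose refcount was already zero. */
static int bad_closes = 0;

/* Mock close: a real cassert build would trip an Assert instead of counting. */
static void
mock_close(MockRel *rel)
{
    if (rel->refcnt <= 0)
    {
        bad_closes++;
        return;
    }
    rel->refcnt--;
}

/*
 * Mimics the validation step. On the skip path it returns before assigning
 * *rel_out. With clear_first set, it resets *rel_out on entry, which is the
 * first half of the fix described above.
 */
static SeqStatus
validate_row(MockRel *row_rel, bool readable, bool clear_first,
             MockRel **rel_out)
{
    assert(rel_out != NULL);

    if (clear_first)
        *rel_out = NULL;

    if (!readable)
        return SEQ_SKIPPED;     /* early return: *rel_out not assigned */

    row_rel->refcnt++;          /* "open" the relation */
    *rel_out = row_rel;
    return SEQ_OK;
}

/* Mimics the batch loop; returns how many closes hit a zero refcount. */
static int
process_batch(bool with_fix)
{
    MockRel  rels[2] = {{0}, {0}};
    bool     readable[2] = {true, false};
    MockRel *rel = NULL;

    bad_closes = 0;
    for (int i = 0; i < 2; i++)
    {
        (void) validate_row(&rels[i], readable[i], with_fix, &rel);

        if (rel != NULL)
        {
            mock_close(rel);
            if (with_fix)
                rel = NULL;     /* second half of the fix: drop the pointer */
        }
    }
    return bad_closes;
}
```

In this sketch, process_batch(false) closes the first row's relation a
second time when the second row is skipped, while process_batch(true) never
sees the stale pointer.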

The patch also adds a TAP test to 036_sequences.pl.

Thoughts?

Regards,
Ayush

v1-0001-Fix-stale-relation-close-in-sequence-synchronization.patch