Confine RI fast-path batching to the top transaction level

The FK fast-path batching added in b7b27eb41a5 buffers rows in a
transaction-lived cache (ri_fastpath_cache) keyed by constraint OID.
Running user-defined cast and equality functions during a batch flush,
together with the cache's lifetime and iteration, exposed two defects
reachable by an unprivileged table owner.

First, on subtransaction abort ri_FastPathSubXactCallback discarded the
entire cache.  An entry's batch holds rows buffered by the enclosing
transaction, not just the aborting subxact -- the cache is keyed by
constraint, so a single entry can mix rows from multiple subxact levels.
An internal subxact abort during after-trigger firing (e.g. a PL/pgSQL
BEGIN ... EXCEPTION block) therefore dropped buffered rows of the outer
transaction without running their FK checks, letting orphan rows commit
behind a constraint that still reported itself valid.  The discard also
left relations opened by the batch unclosed, producing "resource was not
closed" warnings.

Second, ri_FastPathEndBatch flushes by iterating the cache with
hash_seq_search.  If flush-time user code inserts into a different
fast-path FK table, a new entry is added to the cache mid-scan; it may
land in a bucket the scan has already passed and never be reached, and
ri_FastPathTeardown then destroys the cache without flushing it,
silently dropping that check.

Cleanly unwinding the cache on subxact abort would require tracking the
originating subxact of each buffered row, since rows from different
levels share an entry (the cache is keyed by constraint) and deferred
constraints cannot be flushed early at a subxact boundary.  Rather than
add that bookkeeping, confine batching to the top transaction level: in
RI_FKey_check, when GetCurrentTransactionNestLevel() > 1, use the
per-row fast path (ri_FastPathCheck) instead of buffering.  Rows checked
inside a subtransaction are then verified immediately and roll back
cleanly with their subtransaction, and the cache only ever holds
top-level rows.  With the cache confined to the top level, a
subtransaction abort has nothing of its own to discard, so
ri_FastPathSubXactCallback is removed along with its registration.

For the second defect, add a cache-wide flag (ri_fastpath_flushing) set
while ri_FastPathEndBatch iterates the cache.  A re-entrant FK check
arriving while the flag is set takes the per-row path rather than adding
an entry to the cache being scanned, so no entry can be missed and torn
down unflushed.  The flag is cleared in a PG_FINALLY so a flush that
throws (a reported violation or an error from user code) does not leave
it stuck.  As defensive insurance it is also cleared in
ri_FastPathXactCallback() at transaction end.

The per-row fast path still bypasses SPI and stays well ahead of the
pre-19 SPI-based check.  A fuller fix that preserves batching across
subtransactions -- whether by tracking the originating subxact of each
buffered row or by per-subxact cache stacks merged into the parent on
commit -- is left for a future release.

The subtransaction-abort case is covered by a new regression test.  The
mid-scan cross-table case depends on hash bucket placement and so is not
reliably reproducible in a portable test, but the flag prevents it by
construction.

Reported-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Nikolay Samokhvalov <[email protected]>
Reviewed-by: Ayush Tiwari <[email protected]>
Reviewed-by: Junwang Zhao <[email protected]>
Discussion: 
https://postgr.es/m/CAM527d9exRCdWrhJOnAxk_vACg7sr_yPoaJp_+uCFY0qP8v=a...@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/4113873a5ab0fb83a6f772f455b2842359d5ec50

Modified Files
--------------
src/backend/utils/adt/ri_triggers.c       | 86 +++++++++++++++++++++----------
src/test/regress/expected/foreign_key.out | 24 +++++++++
src/test/regress/sql/foreign_key.sql      | 23 +++++++++
3 files changed, 105 insertions(+), 28 deletions(-)

Reply via email to