Re: Eliminating SPI / SQL from some RI triggers - take 3

Amit Langote Thu, 19 Feb 2026 01:22:20 -0800

Hi Junwang,

On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> As Amit has already stated, we are approaching a hybrid "fast-path + fallback"
> design.
>
> 0001 adds a fast path optimization for foreign key constraint checks
> that bypasses the SPI executor, the fast path applies when the referenced
> table is not partitioned, and the constraint does not involve temporal
> semantics.
>
> With the following test:
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
>
> head:
>
> [local] zhjwpku@postgres:5432-90419=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 13516.177 ms (00:13.516)
>
> [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 15057.638 ms (00:15.058)
>
> patched:
>
> [local] zhjwpku@postgres:5432-98673=# insert into fk select
> generate_series(1, 2000000, 2);
> INSERT 0 1000000
> Time: 8248.777 ms (00:08.249)
>
> [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> UPDATE 1000000
> Time: 10117.002 ms (00:10.117)
>
> 0002 cache fast-path metadata used by the index probe, at the current
> time only comparison operator hash entries, operator function OIDs
> and strategy numbers and subtypes for index scans. But this cache
> doesn't buy any performance improvement.
>
> Caching additional metadata should improve performance for foreign key checks.
>
> Amit suggested introducing a mechanism for ri_triggers.c to register a
> cleanup callback in the EState, which AfterTriggerEndQuery() could then
> invoke to release per-statement cached metadata (such as the IndexScanDesc).
> However, I haven't been able to implement this mechanism yet.


Thanks for working on this.  I've taken your patches as a starting
point and reworked the series into two patches (attached): 1st is your
0001+0002 as the core patch that adds a gated fast-path alternative to
SPI and 2nd where I added per-statement resource caching.  Doing the
latter turned out to be not so hard thanks to the structure you chose
to build the core fast path.  Good call on adding the RLS and ACL test
cases, btw.

So, 0001 is a functionally complete fast path: concurrency handling,
REPEATABLE READ crosscheck, cross-type operators, security context,
and metadata caching. 0002 implements the per-statement resource
caching we discussed, though instead of sharing the EState between
trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
mechanism that fires at the end of each trigger-firing cycle
(per-statement for immediate constraints, or until COMMIT for deferred
ones). It layers resource caching on top so that the PK relation,
index, scan descriptor, and snapshot stay open across all FK trigger
invocations within a single trigger-firing cycle rather than being
opened and closed per row.

Note that phe previous 0002 (metadata caching) is folded into 0001,
and most of the new fast-path logic added in 0001 now lives in
ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
RI_FKey_check diff is just the gating call and SPI fallback.

I re-ran the benchmarks (same test as yours, different machine):

create table pk (a numeric primary key);
create table fk (a bigint references pk);
insert into pk select generate_series(1, 2000000);
insert into fk select generate_series(1, 2000000, 2);

master: 2444 ms  (median of 3 runs)
0001: 1382 ms  (43% faster)
0001+0002: 1202 ms  (51% faster, 13% over 0001 alone)

Also, with int PK / int FK (1M rows):

create table pk (a int primary key);
create table fk (a int references pk);
insert into pk select generate_series(1, 1000000);
insert into fk select generate_series(1, 1000000);

master: 1000 ms
0001: 520 ms  (48% faster)
0001+0002: 432 ms  (57% faster, 17% over 0001 alone)

The incremental gain from 0002 comes from eliminating per-row relation
open/close, scan begin/end, slot alloc/free, and replacing per-row
GetSnapshotData() with only curcid adjustment on the registered
snapshot copy in the cache.

The two current limitations are partitioned referenced tables and
temporal foreign keys. Partitioned PKs are relatively uncommon in
practice, so the non-partitioned case should cover most FK workloads,
so I'm not sure it's worth the added complexity to support them.
Temporal FKs are inherently multi-row, so they're a poor fit for a
single-probe fast path.

David Rowley mentioned off-list that it might be worth batching
multiple FK values into a single index probe, leveraging the
ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
to buffer FK values across trigger invocations in the per-constraint
cache (0002 already has the right structure for this), build a
SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
pages in one sorted traversal instead of one tree descent per row. The
locking and recheck would still be per-tuple, but the index traversal
cost drops significantly. Single-column FKs are the obvious starting
point. That seems worth exploring but can be done as a separate patch
on top of this.

I think the series is in reasonable shape but would appreciate extra
eyeballs, especially on the concurrency handling in ri_LockPKTuple()
in 0001 and the snapshot lifecycle in 0002. Or anything else that
catches one's eye.

--
Thanks, Amit Langote

v3-0001-Add-fast-path-for-foreign-key-constraint-checks.patch
Description: Binary data

v3-0002-Cache-per-batch-resources-for-fast-path-foreign-k.patch
Description: Binary data

Re: Eliminating SPI / SQL from some RI triggers - take 3

Reply via email to