Hi Amit,
On Thu, Feb 19, 2026 at 5:21 PM Amit Langote <[email protected]> wrote:
>
> Hi Junwang,
>
> On Mon, Dec 1, 2025 at 3:09 PM Junwang Zhao <[email protected]> wrote:
> > As Amit has already stated, we are approaching a hybrid "fast-path +
> > fallback"
> > design.
> >
> > 0001 adds a fast path optimization for foreign key constraint checks
> > that bypasses the SPI executor. The fast path applies when the
> > referenced table is not partitioned and the constraint does not
> > involve temporal semantics.
> >
> > With the following test:
> >
> > create table pk (a numeric primary key);
> > create table fk (a bigint references pk);
> > insert into pk select generate_series(1, 2000000);
> >
> > head:
> >
> > [local] zhjwpku@postgres:5432-90419=# insert into fk select
> > generate_series(1, 2000000, 2);
> > INSERT 0 1000000
> > Time: 13516.177 ms (00:13.516)
> >
> > [local] zhjwpku@postgres:5432-90419=# update fk set a = a + 1;
> > UPDATE 1000000
> > Time: 15057.638 ms (00:15.058)
> >
> > patched:
> >
> > [local] zhjwpku@postgres:5432-98673=# insert into fk select
> > generate_series(1, 2000000, 2);
> > INSERT 0 1000000
> > Time: 8248.777 ms (00:08.249)
> >
> > [local] zhjwpku@postgres:5432-98673=# update fk set a = a + 1;
> > UPDATE 1000000
> > Time: 10117.002 ms (00:10.117)
> >
> > 0002 caches fast-path metadata used by the index probe; for now that
> > is only the comparison operator hash entries, operator function OIDs,
> > and the strategy numbers and subtypes for index scans. This cache
> > alone doesn't buy any measurable performance improvement, though.
> >
> > Caching additional metadata should improve performance for foreign key
> > checks.
> >
> > Amit suggested introducing a mechanism for ri_triggers.c to register a
> > cleanup callback in the EState, which AfterTriggerEndQuery() could then
> > invoke to release per-statement cached metadata (such as the IndexScanDesc).
> > However, I haven't been able to implement this mechanism yet.
>
> Thanks for working on this. I've taken your patches as a starting
> point and reworked the series into two patches (attached): the 1st is
> your 0001+0002 as the core patch that adds a gated fast-path
> alternative to SPI, and the 2nd adds per-statement resource caching.
> Doing the latter turned out to be not so hard thanks to the structure
> latter turned out to be not so hard thanks to the structure you chose
> to build the core fast path. Good call on adding the RLS and ACL test
> cases, btw.
>
> So, 0001 is a functionally complete fast path: concurrency handling,
> REPEATABLE READ crosscheck, cross-type operators, security context,
> and metadata caching. 0002 implements the per-statement resource
> caching we discussed, though instead of sharing the EState between
> trigger.c and ri_triggers.c it uses a new AfterTriggerBatchCallback
> mechanism that fires at the end of each trigger-firing cycle
> (per-statement for immediate constraints, or until COMMIT for deferred
> ones). It layers resource caching on top so that the PK relation,
> index, scan descriptor, and snapshot stay open across all FK trigger
> invocations within a single trigger-firing cycle rather than being
> opened and closed per row.
>
> Note that the previous 0002 (metadata caching) is folded into 0001,
> and most of the new fast-path logic added in 0001 now lives in
> ri_FastPathCheck() rather than inline in RI_FKey_check(), so the
> RI_FKey_check diff is just the gating call and SPI fallback.
>
> I re-ran the benchmarks (same test as yours, different machine):
>
> create table pk (a numeric primary key);
> create table fk (a bigint references pk);
> insert into pk select generate_series(1, 2000000);
> insert into fk select generate_series(1, 2000000, 2);
>
> master: 2444 ms (median of 3 runs)
> 0001: 1382 ms (43% faster)
> 0001+0002: 1202 ms (51% faster, 13% over 0001 alone)
I get a similar improvement on my old Intel Mac:
master: 12963.993 ms
0001: 6641.692 ms, 48.8% faster
0001+0002: 5771.703 ms, 55.5% faster
>
> Also, with int PK / int FK (1M rows):
>
> create table pk (a int primary key);
> create table fk (a int references pk);
> insert into pk select generate_series(1, 1000000);
> insert into fk select generate_series(1, 1000000);
>
> master: 1000 ms
> 0001: 520 ms (48% faster)
> 0001+0002: 432 ms (57% faster, 17% over 0001 alone)
On my machine, with the same int/int test:
master: 11134.583 ms
0001: 5240.298 ms, 52.9% faster
0001+0002: 4554.215 ms, 59.1% faster
>
> The incremental gain from 0002 comes from eliminating per-row relation
> open/close, scan begin/end, slot alloc/free, and replacing per-row
> GetSnapshotData() with only curcid adjustment on the registered
> snapshot copy in the cache.
>
> The two current limitations are partitioned referenced tables and
> temporal foreign keys. Partitioned PKs are relatively uncommon in
> practice, so the non-partitioned case should cover most FK workloads;
> I'm not sure it's worth the added complexity to support them.
> Temporal FKs are inherently multi-row, so they're a poor fit for a
> single-probe fast path.
>
> David Rowley mentioned off-list that it might be worth batching
> multiple FK values into a single index probe, leveraging the
> ScalarArrayOp btree improvements from PostgreSQL 17. The idea would be
> to buffer FK values across trigger invocations in the per-constraint
> cache (0002 already has the right structure for this), build a
> SK_SEARCHARRAY scan key, and let the btree AM walk the matching leaf
> pages in one sorted traversal instead of one tree descent per row. The
> locking and recheck would still be per-tuple, but the index traversal
> cost drops significantly. Single-column FKs are the obvious starting
> point. That seems worth exploring but can be done as a separate patch
> on top of this.
I will take a look at this in the coming weeks.
>
> I think the series is in reasonable shape but would appreciate extra
> eyeballs, especially on the concurrency handling in ri_LockPKTuple()
> in 0001 and the snapshot lifecycle in 0002. Or anything else that
> catches one's eye.
>
> --
> Thanks, Amit Langote
I don't have any additional comments on the patch except one minor nit:
maybe merge the following two "if (use_cache)" conditions into one. Not
a strong opinion though.
if (use_cache)
{
    /*
     * The snapshot was registered once when the cache entry was created.
     * We just patch curcid to reflect the new command counter.
     * SnapshotSetCommandId() only patches process-global statics, not
     * registered copies, so we do it directly.
     *
     * The xmin/xmax/xip fields don't need refreshing: within a single
     * statement batch, only curcid changes between rows.
     */
    Assert(fpentry && fpentry->snapshot != NULL);
    snapshot = fpentry->snapshot;
    snapshot->curcid = GetCurrentCommandId(false);
}
else
    snapshot = RegisterSnapshot(GetLatestSnapshot());

if (use_cache)
{
    pk_rel = fpentry->pk_rel;
    idx_rel = fpentry->idx_rel;
    scandesc = fpentry->scandesc;
    slot = fpentry->slot;
}
else
{
    pk_rel = table_open(riinfo->pk_relid, RowShareLock);
    idx_rel = index_open(riinfo->conindid, AccessShareLock);
    scandesc = index_beginscan(pk_rel, idx_rel,
                               snapshot, NULL,
                               riinfo->nkeys, 0);
    slot = table_slot_create(pk_rel, NULL);
}
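For concreteness, the merged version I have in mind would be the same
code with a single branch, roughly (fragment only, not compiled):

```c
if (use_cache)
{
    /*
     * The snapshot was registered once when the cache entry was created;
     * only curcid needs refreshing between rows.
     */
    Assert(fpentry && fpentry->snapshot != NULL);
    snapshot = fpentry->snapshot;
    snapshot->curcid = GetCurrentCommandId(false);
    pk_rel = fpentry->pk_rel;
    idx_rel = fpentry->idx_rel;
    scandesc = fpentry->scandesc;
    slot = fpentry->slot;
}
else
{
    snapshot = RegisterSnapshot(GetLatestSnapshot());
    pk_rel = table_open(riinfo->pk_relid, RowShareLock);
    idx_rel = index_open(riinfo->conindid, AccessShareLock);
    scandesc = index_beginscan(pk_rel, idx_rel, snapshot, NULL,
                               riinfo->nkeys, 0);
    slot = table_slot_create(pk_rel, NULL);
}
```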
--
Regards
Junwang Zhao