Re: Eliminating SPI / SQL from some RI triggers - take 3

Chao Li Thu, 09 Apr 2026 03:26:18 -0700

> On Apr 9, 2026, at 16:40, Amit Langote <[email protected]> wrote:
> 
> Hi,
> 
> On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
>>> On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> wrote:
>>> On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
>>>> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]> 
>>>> wrote:
>>>>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
>>>>> <[email protected]> wrote:
>>>>>> The patch also adds a test module (test_spi_func) with a C function
>>>>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
>>>>>> crash cannot be triggered from PL/pgSQL. The test exercises the
>>>>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
>>>>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>>>> 
>>>> I applied only the test module changes and it passes (without
>>>> crashing) even without your proposed fix. It seems that's because the
>>>> C function in test_spi_func calling SPI is using the same resource
>>>> owner as the parent SELECT. I think you'd need to create a resource
>>>> owner manually in the spi_exec() C function to reproduce the crash, as
>>>> done in the attached 0001, which contains the src/test changes
>>>> extracted from your patch modified as described, including renaming
>>>> the C function to spi_exec_sql().
>>>> 
>>>> Also, the test cases that call spi_exec() (_sql()) directly from a
>>>> SELECT don't actually exercise the crash path because there is no
>>>> outer trigger-firing loop active. query_depth is 0 inside the inner
>>>> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
>>>> callback there anyway. The critical case requires spi_exec_sql() to be
>>>> called from inside an AFTER trigger, where query_depth > 0 causes the
>>>> guard to defer the callback past the inner resource owner's lifetime.
>>>> I've added that test case. I kept your original test cases as they
>>>> still provide useful coverage of C-level SPI FK behavior even if they
>>>> don't exercise the crash path specifically.  Maybe your original
>>>> PostGIS test suite that hit the crash did have the right structure,
>>>> but that's not reflected in the patch as far as I can tell.
>>>> 
>>>> I've also renamed the module to test_spi_resowner to better reflect
>>>> what it's about.
>>>> 
>>>> For the fix, I have a different proposal. As you observed, the
>>>> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
>>>> that the nested SPI's callbacks get called under the outer resource
>>>> owner, which may not be the same as the one that SPI used. I think it
>>>> was a mistake to have that early return in the first place. Instead we
>>>> could remember for each callback what firing level it should be called
>>>> at, so the nested SPI's callbacks fire before returning to the parent
>>>> level and parent-level callbacks fire when the parent level completes.
>>>> I have implemented that in the attached 0002 along with transaction
>>>> boundary cleanup of callbacks, which passes the check-world for me,
>>>> but I'll need to stare some more at it before committing.
>>>> 
>>>> Let me know if this also fixes your own in-house test suite or if you
>>>> have any other suggestions or if you think I am missing something.
>>> 
>>> One more cleanup patch attached as 0003: afterTriggerFiringDepth was
>>> added by commit 5c54c3ed1 as a file-static variable, which in
>>> hindsight should have been a field in AfterTriggersData alongside the
>>> other per-transaction after-trigger state. This patch makes that
>>> correction.
>>> 
>>> One alternative design worth considering for 0002: storing
>>> batch_callbacks per query level in AfterTriggersQueryData rather than
>>> as a single list in AfterTriggersData, so callbacks naturally live at
>>> the query level where they were registered and get cleaned up with
>>> AfterTriggerFreeQuery on abort. Deferred constraints still need a
>>> top-level list in AfterTriggersData since they fire outside any query
>>> level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
>>> caller passes either the query-level or top-level list as appropriate.
>>> This eliminates the need for firing_depth-matched firing entirely. I
>>> did that in 0004.  I think I like it over 0002.  Will look more
>>> closely tomorrow morning.
>> A few comments on v3:
> 
> Thanks for the review.
> 
>> 1 - 0002
>> ```
>> static void
>> FireAfterTriggerBatchCallbacks(void)
>> {
>> +       List       *remaining = NIL;
>> +       List       *to_fire = NIL;
>>        ListCell   *lc;
>> 
>> -       if (afterTriggers.query_depth > 0)
>> -               return;
>> +       /* remaining and to_fire lists must survive until callbacks complete 
>> */
>> +       MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
>> ```
>> 
>> I think remaining and to_fire should stay in the same context of 
>> afterTriggers.batch_callbacks, so instead of hard coding 
>> TopTransactionContext, we can use 
>> GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the 
>> intention explicit.
> 
> I'm dropping 0002 or have merged 0004 into it so this memory context
> switch is no longer present.
> 
>> 2 - 0004, I noticed one potential problem, although I am not sure whether it 
>> can really happen in practice. This version stores callback items at the 
>> individual query depth, and FireAfterTriggerBatchCallbacks() now iterates 
>> the callback list for that depth and invokes each callback directly. My 
>> concern is that if one of those callbacks needs to register a new callback, 
>> that would append a new item to the same list while it is being iterated. 
>> That seems unsafe to me, because list append may create a new list structure 
>> underneath. If that happens, we may end up modifying the list being 
>> traversed, which does not look safe.
>> 
>> This problem doesn’t exist in 0002, because 0002 splits 
>> afterTriggers.batch_callbacks into remaining and to_fire, and reset 
>> afterTriggers.batch_callbacks = remaining before running callbacks. But the 
>> problem is, if a callback registers a new callback, the new callback goes to 
>> afterTriggers.batch_callbacks, so it won’t get executed.
>> 
>> From this perspective, I would assume a callback should not be allowed to 
>> register a new callback. Can you please help confirm?
> 
> Good point on the re-entrant registration concern. I've added a
> firing_batch_callbacks flag to AfterTriggersData that prevents
> callbacks from registering new callbacks during
> FireAfterTriggerBatchCallbacks(), with an Assert in
> RegisterAfterTriggerBatchCallback() to enforce it. That should keep
> the list being iterated from being modified.
> 
> The attached patches are updated accordingly. 0001 is the main fix
> incorporating the per-query-level storage design, the transaction
> boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
> followup that moves afterTriggerFiringDepth into AfterTriggersData as
> a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
> commit 0001 and 0002 shortly. For 0003, I need to check on the policy
> around adding new test modules during feature freeze before committing
> it.
> 
> -- 
> Thanks, Amit Langote
> <v4-0002-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch><v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch><v4-0003-Add-test-module-for-RI-fast-path-FK-checks-under-.patch>

0001 and 0002 look good to me. I didn’t review 0003 and don’t intend to review 
it.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Re: Eliminating SPI / SQL from some RI triggers - take 3

Reply via email to