> On Apr 9, 2026, at 16:40, Amit Langote <[email protected]> wrote:
>
> Hi,
>
> On Thu, Apr 9, 2026 at 4:40 PM Chao Li <[email protected]> wrote:
>>> On Apr 8, 2026, at 22:26, Amit Langote <[email protected]> wrote:
>>> On Wed, Apr 8, 2026 at 6:58 PM Amit Langote <[email protected]> wrote:
>>>> On Wed, Apr 8, 2026 at 10:23 AM Amit Langote <[email protected]>
>>>> wrote:
>>>>> On Tue, Apr 7, 2026 at 10:00 PM Evan Montgomery-Recht
>>>>> <[email protected]> wrote:
>>>>>> The patch also adds a test module (test_spi_func) with a C function
>>>>>> that executes SQL via SPI_connect/SPI_execute/SPI_finish, since this
>>>>>> crash cannot be triggered from PL/pgSQL. The test exercises the
>>>>>> C-level SPI INSERT with multiple FK constraints, FK violations, and
>>>>>> nested PL/pgSQL-calls-C-SPI (matching the PostGIS call pattern).
>>>>
>>>> I applied only the test module changes and it passes (without
>>>> crashing) even without your proposed fix. It seems that's because the
>>>> C function in test_spi_func calling SPI is using the same resource
>>>> owner as the parent SELECT. I think you'd need to create a resource
>>>> owner manually in the spi_exec() C function to reproduce the crash, as
>>>> done in the attached 0001, which contains the src/test changes
>>>> extracted from your patch modified as described, including renaming
>>>> the C function to spi_exec_sql().
>>>>
>>>> Also, the test cases that call spi_exec() (_sql()) directly from a
>>>> SELECT don't actually exercise the crash path because there is no
>>>> outer trigger-firing loop active. query_depth is 0 inside the inner
>>>> SPI's AfterTriggerEndQuery, so the old guard wouldn't suppress the
>>>> callback there anyway. The critical case requires spi_exec_sql() to be
>>>> called from inside an AFTER trigger, where query_depth > 0 causes the
>>>> guard to defer the callback past the inner resource owner's lifetime.
>>>> I've added that test case. I kept your original test cases as they
>>>> still provide useful coverage of C-level SPI FK behavior even if they
>>>> don't exercise the crash path specifically. Maybe your original
>>>> PostGIS test suite that hit the crash did have the right structure,
>>>> but that's not reflected in the patch as far as I can tell.
>>>>
>>>> I've also renamed the module to test_spi_resowner to better reflect
>>>> what it's about.
>>>>
>>>> For the fix, I have a different proposal. As you observed, the
>>>> query_depth > 0 early return in FireAfterTriggerBatchCallbacks() means
>>>> that the nested SPI's callbacks get called under the outer resource
>>>> owner, which may not be the same as the one that SPI used. I think it
>>>> was a mistake to have that early return in the first place. Instead we
>>>> could remember for each callback what firing level it should be called
>>>> at, so the nested SPI's callbacks fire before returning to the parent
>>>> level and parent-level callbacks fire when the parent level completes.
>>>> I have implemented that in the attached 0002 along with transaction
>>>> boundary cleanup of callbacks, which passes the check-world for me,
>>>> but I'll need to stare some more at it before committing.
>>>>
>>>> Let me know if this also fixes your own in-house test suite or if you
>>>> have any other suggestions or if you think I am missing something.
>>>
>>> One more cleanup patch attached as 0003: afterTriggerFiringDepth was
>>> added by commit 5c54c3ed1 as a file-static variable, which in
>>> hindsight should have been a field in AfterTriggersData alongside the
>>> other per-transaction after-trigger state. This patch makes that
>>> correction.
>>>
>>> One alternative design worth considering for 0002: storing
>>> batch_callbacks per query level in AfterTriggersQueryData rather than
>>> as a single list in AfterTriggersData, so callbacks naturally live at
>>> the query level where they were registered and get cleaned up with
>>> AfterTriggerFreeQuery on abort. Deferred constraints still need a
>>> top-level list in AfterTriggersData since they fire outside any query
>>> level. FireAfterTriggerBatchCallbacks() takes a list parameter and the
>>> caller passes either the query-level or top-level list as appropriate.
>>> This eliminates the need for firing_depth-matched firing entirely. I
>>> did that in 0004. I think I like it over 0002. Will look more
>>> closely tomorrow morning.
>> A few comments on v3:
>
> Thanks for the review.
>
>> 1 - 0002
>> ```
>> static void
>> FireAfterTriggerBatchCallbacks(void)
>> {
>> + List *remaining = NIL;
>> + List *to_fire = NIL;
>> ListCell *lc;
>>
>> - if (afterTriggers.query_depth > 0)
>> - return;
>> + /* remaining and to_fire lists must survive until callbacks complete
>> */
>> + MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);
>> ```
>>
>> I think remaining and to_fire should stay in the same context of
>> afterTriggers.batch_callbacks, so instead of hard coding
>> TopTransactionContext, we can use
>> GetMemoryChunkContext(afterTriggers.batch_callbacks), which makes the
>> intention explicit.
>
> I'm dropping 0002 or have merged 0004 into it so this memory context
> switch is no longer present.
>
>> 2 - 0004, I noticed one potential problem, although I am not sure whether it
>> can really happen in practice. This version stores callback items at the
>> individual query depth, and FireAfterTriggerBatchCallbacks() now iterates
>> the callback list for that depth and invokes each callback directly. My
>> concern is that if one of those callbacks needs to register a new callback,
>> that would append a new item to the same list while it is being iterated.
>> That seems unsafe to me, because list append may create a new list structure
>> underneath. If that happens, we may end up modifying the list being
>> traversed, which does not look safe.
>>
>> This problem doesn’t exist in 0002, because 0002 splits
>> afterTriggers.batch_callbacks into remaining and to_fire, and reset
>> afterTriggers.batch_callbacks = remaining before running callbacks. But the
>> problem is, if a callback registers a new callback, the new callback goes to
>> afterTriggers.batch_callbacks, so it won’t get executed.
>>
>> From this perspective, I would assume a callback should not be allowed to
>> register a new callback. Can you please help confirm?
>
> Good point on the re-entrant registration concern. I've added a
> firing_batch_callbacks flag to AfterTriggersData that prevents
> callbacks from registering new callbacks during
> FireAfterTriggerBatchCallbacks(), with an Assert in
> RegisterAfterTriggerBatchCallback() to enforce it. That should keep
> the list being iterated from being modified.
>
> The attached patches are updated accordingly. 0001 is the main fix
> incorporating the per-query-level storage design, the transaction
> boundary cleanup, and the firing_batch_callbacks guard. 0002 is a
> followup that moves afterTriggerFiringDepth into AfterTriggersData as
> a minor cleanup of 5c54c3ed1b9. Barring further feedback I plan to
> commit 0001 and 0002 shortly. For 0003, I need to check on the policy
> around adding new test modules during feature freeze before committing
> it.
>
> --
> Thanks, Amit Langote
> <v4-0002-Move-afterTriggerFiringDepth-into-AfterTriggersDa.patch><v4-0001-Fix-RI-fast-path-crash-under-nested-C-level-SPI.patch><v4-0003-Add-test-module-for-RI-fast-path-FK-checks-under-.patch>
0001 and 0002 look good to me. I didn’t review 0003 and don’t intend to review
it.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/