Hi -hackers,

We've received a report of a rare crash in EDB's PostgreSQL fork. It revealed that PostgreSQL's ReinitializeParallelDSM() allocates memory from the wrong memory context (ExecutorState) instead of TopTransactionContext.

We currently do not have a public reproducer for this in the PG community version. However, when using Parallel Query, both pcxt->worker and pcxt->worker[i].error_mqh of the ParallelContext are initially allocated from TopTransactionContext, and that appears to be correct. This holds only initially. Later on, when reuse of DSM/workers comes into play and more data is involved (technically, when ExecParallelReinitialize() -> ReinitializeParallelDSM() is involved), pcxt->worker[i].error_mqh ends up being reinitialized from the ExecutorState memory context, and that's the bug.

In layman's terms, ReinitializeParallelDSM() is usually used for a Nested Loop with Gather. (Useful side note: this can be triggered more easily with enable_material = off, as that reaches ReinitializeParallelDSM() much faster.)

Normally this does not cause problems. It can be problematic when the query is cancelled, because ExecutorState might be pfreed depending on what is inside PG_CATCH(). If there is an SPI_finish() there, ExecutorState will be pfreed, and pcxt->worker[i].error_mqh might then be accessed later while the SIGINT is being handled, via AbortTransaction() -> DestroyParallelContext(), which leads to a use-after-free. The stack would be similar to:

    longjmp -> AbortCurrentTransaction() -> AbortTransaction() -> AtEOXact_Parallel() -> DestroyParallelContext() -> shm_mq_detach()

SPI_finish() is just an example here, where we have caught it (it releases ExecutorState).

To sum up, it occurs under the following conditions:
1. parallel query involved, with nested loop/gather
2. ReinitializeParallelDSM() being used
3. query cancellation
4. PG_CATCH() pfreeing ExecutorState

We have attached the patch. The issue is within shm_mq_attach(), but we want to protect the whole function just in case, just like in InitializeParallelDSM().

Thanks to Jeevan Chalke and Robert Haas for help while debugging this.

-J.
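For readers less familiar with memory contexts, the four conditions above can be illustrated with a small standalone toy model. This is only a sketch and not PostgreSQL code: every toy_* name is hypothetical, a context "reset" is modelled by bumping a generation counter rather than really freeing memory, and a handle plays the role of pcxt->worker[i].error_mqh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory context (hypothetical, NOT the real MemoryContext). */
typedef struct ToyContext
{
	const char *name;
	int			generation;		/* bumped whenever the context is reset */
} ToyContext;

/* Toy stand-in for an object such as error_mqh, allocated in some context. */
typedef struct ToyHandle
{
	ToyContext *owner;			/* context the handle was allocated in */
	int			generation;		/* owner's generation at allocation time */
} ToyHandle;

static ToyContext toy_top_transaction = {"TopTransactionContext", 0};
static ToyContext toy_executor_state = {"ExecutorState", 0};
static ToyContext *toy_current = &toy_top_transaction;

/* Models MemoryContextSwitchTo(): install ctx and return the previous context. */
static ToyContext *
toy_switch_to(ToyContext *ctx)
{
	ToyContext *old = toy_current;

	assert(ctx != NULL);
	toy_current = ctx;
	return old;
}

/* Models shm_mq_attach(): the result lives in whatever context is current. */
static ToyHandle
toy_attach(void)
{
	ToyHandle	h = {toy_current, toy_current->generation};

	return h;
}

/* Models error cleanup that releases everything allocated in ctx. */
static void
toy_reset(ToyContext *ctx)
{
	ctx->generation++;
}

/* A handle is dangling once its owning context has been reset. */
static bool
toy_handle_is_valid(const ToyHandle *h)
{
	return h->owner->generation == h->generation;
}

/* Reinitialization with (patched) or without the context switch. */
static ToyHandle
toy_reinitialize(bool patched)
{
	ToyContext *old = NULL;
	ToyHandle	h;

	if (patched)
		old = toy_switch_to(&toy_top_transaction);
	h = toy_attach();
	if (patched)
		toy_switch_to(old);
	return h;
}

/* Returns true if the handle is still valid when abort processing reaches it. */
static bool
toy_handle_survives_cancel(bool patched)
{
	/* The executor runs with the ExecutorState-like context current. */
	ToyContext *old = toy_switch_to(&toy_executor_state);
	ToyHandle	h = toy_reinitialize(patched);

	toy_switch_to(old);
	toy_reset(&toy_executor_state); /* cleanup in the error path */
	return toy_handle_is_valid(&h);	/* later access during abort */
}
```

In this model, toy_handle_survives_cancel(false) reports a dangling handle, while toy_handle_survives_cancel(true) reports a valid one, which mirrors why the patch wraps the whole function in a switch to TopTransactionContext.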
From 83641b878d8732b5eec4614acf1f9df79d29be0f Mon Sep 17 00:00:00 2001
From: Jakub Wartak <[email protected]>
Date: Mon, 8 Dec 2025 10:39:35 +0530
Subject: [PATCH v1] Parallel query: Use TopTransactionContext for
 ReinitializeParallelDSM()

When reinitializing the dynamic shared memory (DSM) segment for a
parallel context in ReinitializeParallelDSM(), we failed to switch to
the long-lived TopTransactionContext for necessary memory allocations.
This deviates from the established pattern used in
InitializeParallelDSM(). Allocations were instead made in the current,
potentially short-lived memory context.

This could lead to a server crash (segmentation fault) when a pointer
allocated in the short-lived context was prematurely freed. Subsequent
cleanup in DestroyParallelContext() could then result in a
use-after-free error.

This commit fixes the breakage by ensuring that memory for the parallel
context is always allocated in TopTransactionContext during
reinitialization.

Author: Jakub Wartak <[email protected]>
Co-authored-by: Jeevan Chalke <[email protected]>
Reviewed-by:
Discussion:
---
 src/backend/access/transam/parallel.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 94db1ec3012..5e6d21969e1 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -507,8 +507,12 @@ InitializeParallelDSM(ParallelContext *pcxt)
 void
 ReinitializeParallelDSM(ParallelContext *pcxt)
 {
+	MemoryContext oldcontext;
 	FixedParallelState *fps;
 
+	/* We might be running in a very short-lived memory context. */
+	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
+
 	/* Wait for any old workers to exit. */
 	if (pcxt->nworkers_launched > 0)
 	{
@@ -546,6 +550,9 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 			pcxt->worker[i].error_mqh = shm_mq_attach(mq, pcxt->seg, NULL);
 		}
 	}
+
+	/* Restore previous memory context. */
+	MemoryContextSwitchTo(oldcontext);
 }
 
 /*
-- 
2.43.0
