On Sat, 2026-04-11 at 14:42 +1200, David Rowley wrote:
> I had imagined we'd have an Assert to ensure there's no other
> MemoryPool in the context hierarchy. I guess it would be possible for
> a MemoryPool to have a MemoryPool *next field and chain them, but
> since we've no need for that right now, I imagine it's fine to
> disallow that.
The thread started with a user-defined aggregate making a memory
context per group. If it was doing some more complex things with
MemoryContextSetParent, are we sure we can just disallow it? It's hard
for me to say for sure what a user-defined aggregate might want to do.
> Since we won't want to add a MemoryPool to the parent of
> those contexts, we might need some way to have an existing context
> "join" an existing MemoryPool. That would mean an API more like:
> MemoryPool *MemoryPoolCreate(size_t bytes_limit); then void
> MemoryContextJoinPool(MemoryContext ctx, MemoryPool *pool);
Right.
> All the code that currently does "+= mem_allocated" or "-=
> mem_allocated" would need extra code to do "if (context->memory_pool
> != NULL) context->memory_pool->memory_allocated (+/-)= blksize;".
> I.e., no looping up the hierarchy.
OK.
> I imagined we'd document that a MemoryPool is designed to more easily
> keep track of malloc'd memory for a given (or group of) MemoryContext
> and all of its child contexts so that the total memory allocated can
> easily be checked without recursively visiting the memory_used fields
> in all child contexts. Also, that it is intended for short-lived
> contexts. Any longer-lived usages would mean we'd have MemoryPools in
> contexts that live longer than the query, or an executor node and
> that
> would mean we'd have to code up the ability to have multiple
> MemoryPools in the hierarchy, which seems more complex than what we
> need today.
Yeah, I hope we don't need to go that far.
> Maybe in the future there'd be some need to have MemoryPools with a
> hard limit that ERRORs when it goes above a threshold, but that's not
> what we need for nodeAgg.c. That seems to be more along the lines of
> what Tomas was mentioning in regards to the "Add the ability to limit
> the amount of memory that can be allocated to backends" thread.
Agreed.
Attached v2-0001 simply tracks the totals for all contexts, and works
across MemoryContextSetParent(). I was (and still am) slightly worried
that it will cause a regression, but some basic pgbench runs didn't
show any slowdown. Can you suggest some settings that might focus more
on allocation performance?
Assuming we only want to track for a subtree, then the two approaches
suggested are:
1. Have some "invalid" marker that only tracks the memory upward as far
as it's needed/enabled, then stops. This adds a bit of complexity
because:
a. to enable it after the context is created, we need a new memory
context method to give the current allocated total so we can properly
initialize the size
b. when doing MemoryContextSetParent(), we need to subtract from the
old subtree and add to the new subtree, and if the current subtree
isn't already tracked, then we need to recalculate
2. The memory pools as you suggest. The complexity here is:
a. in MemoryContextSetParent() for the same reason as above
b. mixing subtrees gets messy, and I'm not sure we can just disallow
that. Does that mean MemoryContextSetParent() would just fail?
c. joining in to an existing pool -- not terribly complex, but adds
to the API
I think if we are taking on the complexity with memory pools, we should
do so with the idea that they'll be useful beyond just optimizing the
size calculation. For instance, carrying around information that it's
part of a "work mem" context, which can do things like limit the max
block size.
Regards,
Jeff Davis
From 0d5673e156554c9d95a72c6c7949be9b2d5adb3e Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Tue, 7 Apr 2026 15:47:32 -0700
Subject: [PATCH v2] Account for memory context totals at the time of
allocation.
When a new block is allocated by the underlying allocator, add it to
the current context as well as the totals for all ancestors.
Previously, we tracked only the memory for the current context, and
calculated the total recursively when needed. That led to N^2 behavior
in HashAgg when each group created its own subcontext.
Discussion: https://postgr.es/m/[email protected]
---
src/backend/utils/mmgr/aset.c | 24 +++++++++++++-----------
src/backend/utils/mmgr/bump.c | 9 +++++----
src/backend/utils/mmgr/generation.c | 9 +++++----
src/backend/utils/mmgr/mcxt.c | 27 +++++++++++++++------------
src/backend/utils/mmgr/slab.c | 10 ++++++----
src/include/nodes/memnodes.h | 13 +++++++++++++
6 files changed, 57 insertions(+), 35 deletions(-)
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 6a9ea367107..0f280089bc3 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -422,8 +422,8 @@ AllocSetContextCreateInternal(MemoryContext parent,
parent,
name);
- ((MemoryContext) set)->mem_allocated =
- KeeperBlock(set)->endptr - ((char *) set);
+ MemoryContextUpdateAllocation((MemoryContext) set,
+ KeeperBlock(set)->endptr - ((char *) set));
return (MemoryContext) set;
}
@@ -525,7 +525,7 @@ AllocSetContextCreateInternal(MemoryContext parent,
parent,
name);
- ((MemoryContext) set)->mem_allocated = firstBlockSize;
+ MemoryContextUpdateAllocation((MemoryContext) set, firstBlockSize);
return (MemoryContext) set;
}
@@ -589,7 +589,8 @@ AllocSetReset(MemoryContext context)
else
{
/* Normal case, release the block */
- context->mem_allocated -= block->endptr - ((char *) block);
+ MemoryContextUpdateAllocation(context,
+ -(block->endptr - ((char *) block)));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -696,7 +697,8 @@ AllocSetDelete(MemoryContext context)
AllocBlock next = block->next;
if (!IsKeeperBlock(set, block))
- context->mem_allocated -= block->endptr - ((char *) block);
+ MemoryContextUpdateAllocation(context,
+ -(block->endptr - ((char *) block)));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -758,7 +760,7 @@ AllocSetAllocLarge(MemoryContext context, Size size, int flags)
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
block->aset = set;
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -967,7 +969,7 @@ AllocSetAllocFromNewBlock(MemoryContext context, Size size, int flags,
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, ALLOC_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
block->aset = set;
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
@@ -1144,7 +1146,8 @@ AllocSetFree(void *pointer)
if (block->next)
block->next->prev = block->prev;
- set->header.mem_allocated -= block->endptr - ((char *) block);
+ MemoryContextUpdateAllocation((MemoryContext) set,
+ -(block->endptr - ((char *) block)));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->freeptr - ((char *) block));
@@ -1307,9 +1310,8 @@ AllocSetRealloc(void *pointer, Size size, int flags)
VALGRIND_MEMPOOL_CHANGE(set, block, newblock, ALLOC_BLOCKHDRSZ);
block = newblock;
- /* updated separately, not to underflow when (oldblksize > blksize) */
- set->header.mem_allocated -= oldblksize;
- set->header.mem_allocated += blksize;
+ MemoryContextUpdateAllocation((MemoryContext) set,
+ (ssize_t) blksize - (ssize_t) oldblksize);
block->freeptr = block->endptr = ((char *) block) + blksize;
diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
index bfb5a114147..1af3a62755b 100644
--- a/src/backend/utils/mmgr/bump.c
+++ b/src/backend/utils/mmgr/bump.c
@@ -235,7 +235,7 @@ BumpContextCreate(MemoryContext parent, const char *name, Size minContextSize,
MemoryContextCreate((MemoryContext) set, T_BumpContext, MCTX_BUMP_ID,
parent, name);
- ((MemoryContext) set)->mem_allocated = allocSize;
+ MemoryContextUpdateAllocation((MemoryContext) set, allocSize);
return (MemoryContext) set;
}
@@ -341,7 +341,7 @@ BumpAllocLarge(MemoryContext context, Size size, int flags)
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
/* the block is completely full */
block->freeptr = block->endptr = ((char *) block) + blksize;
@@ -481,7 +481,7 @@ BumpAllocFromNewBlock(MemoryContext context, Size size, int flags,
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, Bump_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
/* initialize the new block */
BumpBlockInit(set, block, blksize);
@@ -626,7 +626,8 @@ BumpBlockFree(BumpContext *set, BumpBlock *block)
/* release the block from the list of blocks */
dlist_delete(&block->node);
- ((MemoryContext) set)->mem_allocated -= ((char *) block->endptr - (char *) block);
+ MemoryContextUpdateAllocation((MemoryContext) set,
+ -((char *) block->endptr - (char *) block));
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, ((char *) block->endptr - (char *) block));
diff --git a/src/backend/utils/mmgr/generation.c b/src/backend/utils/mmgr/generation.c
index 609c9bdc9a6..2e84a4fba93 100644
--- a/src/backend/utils/mmgr/generation.c
+++ b/src/backend/utils/mmgr/generation.c
@@ -273,7 +273,7 @@ GenerationContextCreate(MemoryContext parent,
parent,
name);
- ((MemoryContext) set)->mem_allocated = firstBlockSize;
+ MemoryContextUpdateAllocation((MemoryContext) set, firstBlockSize);
return (MemoryContext) set;
}
@@ -388,7 +388,7 @@ GenerationAllocLarge(MemoryContext context, Size size, int flags)
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
/* block with a single (used) chunk */
block->context = set;
@@ -513,7 +513,7 @@ GenerationAllocFromNewBlock(MemoryContext context, Size size, int flags,
/* Make a vchunk covering the new block's header */
VALGRIND_MEMPOOL_ALLOC(set, block, Generation_BLOCKHDRSZ);
- context->mem_allocated += blksize;
+ MemoryContextUpdateAllocation(context, blksize);
/* initialize the new block */
GenerationBlockInit(set, block, blksize);
@@ -697,7 +697,8 @@ GenerationBlockFree(GenerationContext *set, GenerationBlock *block)
/* release the block from the list of blocks */
dlist_delete(&block->node);
- ((MemoryContext) set)->mem_allocated -= block->blksize;
+ MemoryContextUpdateAllocation((MemoryContext) set,
+ -(ssize_t) block->blksize);
#ifdef CLOBBER_FREED_MEMORY
wipe_mem(block, block->blksize);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 073bdb35d2a..65b4514d77e 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -696,6 +696,7 @@ MemoryContextSetParent(MemoryContext context, MemoryContext new_parent)
if (context->parent)
{
MemoryContext parent = context->parent;
+ MemoryContext ancestor;
if (context->prevchild != NULL)
context->prevchild->nextchild = context->nextchild;
@@ -707,11 +708,17 @@ MemoryContextSetParent(MemoryContext context, MemoryContext new_parent)
if (context->nextchild != NULL)
context->nextchild->prevchild = context->prevchild;
+
+ /* subtract this context's total from old parent chain */
+ for (ancestor = parent; ancestor != NULL; ancestor = ancestor->parent)
+ ancestor->total_allocated -= context->total_allocated;
}
/* And relink */
if (new_parent)
{
+ MemoryContext ancestor;
+
Assert(MemoryContextIsValid(new_parent));
context->parent = new_parent;
context->prevchild = NULL;
@@ -719,6 +726,10 @@ MemoryContextSetParent(MemoryContext context, MemoryContext new_parent)
if (new_parent->firstchild != NULL)
new_parent->firstchild->prevchild = context;
new_parent->firstchild = context;
+
+ /* add this context's total to new parent chain */
+ for (ancestor = new_parent; ancestor != NULL; ancestor = ancestor->parent)
+ ancestor->total_allocated += context->total_allocated;
}
else
{
@@ -810,21 +821,12 @@ MemoryContextIsEmpty(MemoryContext context)
Size
MemoryContextMemAllocated(MemoryContext context, bool recurse)
{
- Size total = context->mem_allocated;
-
Assert(MemoryContextIsValid(context));
if (recurse)
- {
- for (MemoryContext curr = context->firstchild;
- curr != NULL;
- curr = MemoryContextTraverseNext(curr, context))
- {
- total += curr->mem_allocated;
- }
- }
-
- return total;
+ return context->total_allocated;
+ else
+ return context->mem_allocated;
}
/*
@@ -1166,6 +1168,7 @@ MemoryContextCreate(MemoryContext node,
node->parent = parent;
node->firstchild = NULL;
node->mem_allocated = 0;
+ node->total_allocated = 0;
node->prevchild = NULL;
node->name = name;
node->ident = NULL;
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index dd1db9566d1..a38860fbc28 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -461,7 +461,7 @@ SlabReset(MemoryContext context)
VALGRIND_MEMPOOL_FREE(slab, block);
free(block);
- context->mem_allocated -= slab->blockSize;
+ MemoryContextUpdateAllocation(context, -(ssize_t) slab->blockSize);
}
/* walk over blocklist and free the blocks */
@@ -481,7 +481,8 @@ SlabReset(MemoryContext context)
VALGRIND_MEMPOOL_FREE(slab, block);
free(block);
- context->mem_allocated -= slab->blockSize;
+ MemoryContextUpdateAllocation(context,
+ -(ssize_t) slab->blockSize);
}
}
@@ -597,7 +598,7 @@ SlabAllocFromNewBlock(MemoryContext context, Size size, int flags)
VALGRIND_MEMPOOL_ALLOC(slab, block, Slab_BLOCKHDRSZ);
block->slab = slab;
- context->mem_allocated += slab->blockSize;
+ MemoryContextUpdateAllocation(context, slab->blockSize);
/* use the first chunk in the new block */
chunk = SlabBlockGetChunk(slab, block, 0);
@@ -836,7 +837,8 @@ SlabFree(void *pointer)
VALGRIND_MEMPOOL_FREE(slab, block);
free(block);
- slab->header.mem_allocated -= slab->blockSize;
+ MemoryContextUpdateAllocation((MemoryContext) slab,
+ -(ssize_t) slab->blockSize);
}
/*
diff --git a/src/include/nodes/memnodes.h b/src/include/nodes/memnodes.h
index 4b24778fe1e..ca5b63b07d8 100644
--- a/src/include/nodes/memnodes.h
+++ b/src/include/nodes/memnodes.h
@@ -123,6 +123,7 @@ typedef struct MemoryContextData
bool isReset; /* T = no space allocated since last reset */
bool allowInCritSection; /* allow palloc in critical section */
Size mem_allocated; /* track memory allocated for this context */
+ Size total_allocated; /* memory including child contexts */
const MemoryContextMethods *methods; /* virtual function table */
MemoryContext parent; /* NULL if no parent (toplevel context) */
MemoryContext firstchild; /* head of linked list of children */
@@ -136,6 +137,18 @@ typedef struct MemoryContextData
/* utils/palloc.h contains typedef struct MemoryContextData *MemoryContext */
+/*
+ * Add allocation to the given context and all ancestors.
+ */
+static inline void
+MemoryContextUpdateAllocation(MemoryContext context, ssize_t amount)
+{
+ context->mem_allocated += amount;
+
+ for (MemoryContext curr = context; curr != NULL; curr = curr->parent)
+ curr->total_allocated += amount;
+}
+
/*
* MemoryContextIsValid
* True iff memory context is valid.
--
2.43.0