Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Amul Sul
On Tue, Apr 16, 2024 at 3:44 PM David Rowley  wrote:

> On Tue, 16 Apr 2024 at 17:13, Amul Sul  wrote:
> > Attached is a small patch adding the missing BumpContext description to
> the
> > README.
>
> Thanks for noticing and working on the patch.
>
> There were a few things that were not quite accurate or misleading:
>
> 1.
>
> > +These three memory contexts aim to free memory back to the operating
> system
>
> That's not true for bump. It's the worst of the 4.  Worse than aset.
> It only returns memory when the context is reset/deleted.
>
> 2.
>
> "These memory contexts were initially developed for ReorderBuffer, but
> may be useful elsewhere as long as the allocation patterns match."
>
> The above isn't true for bump.  It was written for tuplesort.  I think
> we can just remove that part now.  Slab and generation are both old
> enough not to care why they were conceived.
>
> Also since adding bump, I think the choice of which memory context to
> use is about 33% harder than it used to be when there were only 3
> context types.  I think this warrants giving more detail on what these
> 3 special-purpose memory allocators are good for.  I've added more
> details in the attached patch.  This includes more details about
> freeing malloc'd blocks.
>
> I've tried to detail out enough of the specialities of the context
> type without going into extensive detail. My hope is that there will
> be enough detail for someone to choose the most suitable looking one
> and head over to the corresponding .c file to find out more.
>
> Is that about the right level of detail?
>

Yes, it looks much better now, thank you.

Regards,
Amul


Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread David Rowley
On Mon, 8 Apr 2024 at 00:37, David Rowley  wrote:
> I've now pushed all 3 patches.   Thank you for all the reviews on
> these and for the extra MemoryContextMethodID bit, Matthias.

I realised earlier today when working on [1] that bump makes a pretty
brain-dead move when adding dedicated blocks to the blocks list.  The
problem is that I opted to not have a current block field in
BumpContext and just rely on the head pointer of the blocks list to be
the "current" block.  The head block is the block we look at to see if
we've any space left when new allocations come in.  The problem there
is when adding a dedicated block in BumpAllocLarge(), the code adds
this to the head of the blocks list so that when a new allocation
comes in that's normal-sized, the block at the top of the list is full
and we have to create a new block for the allocation.

The attached fixes this by pushing these large/dedicated blocks to the
*tail* of the blocks list.  This means the partially filled block
remains at the head and is available for any new allocation which will
fit.  This behaviour is evident from the regression test change that I
added earlier today when working on [1].  The 2nd and smaller
allocation in that test goes onto the keeper block rather than a new
block.
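
The head-as-current scheme and the tail fix can be sketched with a toy
allocator (hypothetical names and sizes; not the real bump.c, which uses
PostgreSQL's dlist and a keeper block):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE  1024   /* toy standard block size */
#define LARGE_LIMIT  256   /* toy allocChunkLimit */

typedef struct Block
{
    struct Block *next;
    char   *freeptr;          /* next unused byte */
    char   *endptr;           /* one past the end of the block */
    char    data[];
} Block;

typedef struct BumpCtx
{
    Block  *head;             /* the head doubles as the "current" block */
} BumpCtx;

static Block *
block_new(size_t datasize)
{
    Block  *b = malloc(offsetof(Block, data) + datasize);

    b->next = NULL;
    b->freeptr = b->data;
    b->endptr = b->data + datasize;
    return b;
}

static void
push_head(BumpCtx *ctx, Block *b)
{
    b->next = ctx->head;
    ctx->head = b;
}

static void
push_tail(BumpCtx *ctx, Block *b)
{
    Block **p = &ctx->head;

    while (*p != NULL)
        p = &(*p)->next;
    *p = b;
}

/*
 * Large requests get a dedicated, already-full block.  Pushing it to the
 * *tail* leaves the partially filled head block current, so the next
 * normal-sized request can still use it -- the fix described above.
 */
void *
bump_alloc(BumpCtx *ctx, size_t size)
{
    if (size > LARGE_LIMIT)
    {
        Block  *b = block_new(size);

        b->freeptr = b->endptr;   /* dedicated block has no free space */
        push_tail(ctx, b);        /* was push_head, wasting the head block */
        return b->data;
    }
    if (ctx->head == NULL ||
        (size_t) (ctx->head->endptr - ctx->head->freeptr) < size)
        push_head(ctx, block_new(BLOCK_SIZE));

    void   *p = ctx->head->freeptr;

    ctx->head->freeptr += size;   /* the "bump": no chunk header, no pfree */
    return p;
}

int
bump_nblocks(BumpCtx *ctx)
{
    int     n = 0;

    for (Block *b = ctx->head; b != NULL; b = b->next)
        n++;
    return n;
}
```

With push_tail, a small/large/small allocation sequence ends up on two
blocks; with the original push_head it needed three, which is the same
total_nblocks change visible in the sysviews regression output.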

I plan to push this tomorrow.

David

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bea97cd02ebb347ab469b78673c2b33a72109669
From 39a60420a83e5a059de50869c0981be9732909ab Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Tue, 16 Apr 2024 22:26:00 +1200
Subject: [PATCH v1] Push dedicated BumpBlocks to the tail of the blocks list

BumpContext relies on using the head block from its 'blocks' field to
use as the current block to allocate new chunks to.  When we receive an
allocation request larger than allocChunkLimit, we place these chunks on
a new dedicated block and push this block onto the head of the 'blocks'
list.  This behavior caused the previous bump block to no longer be
available for new normal-sized (non-large) allocations and would result
in wasting space on blocks when a large request arrived before the
current block was full.

Here adjust the code to push these dedicated blocks onto the *tail* of
the blocks list so that the head block remains intact and available to
be used by normal allocation request sizes.

Discussion: 
https://postgr.es/m/caaphdvp9___r-ayjj0nz6gd3mecgwgz0_6zptwpwj+zqhtm...@mail.gmail.com
---
 src/backend/utils/mmgr/bump.c  | 8 ++--
 src/test/regress/expected/sysviews.out | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
index 449bd29344..83798e2245 100644
--- a/src/backend/utils/mmgr/bump.c
+++ b/src/backend/utils/mmgr/bump.c
@@ -342,8 +342,12 @@ BumpAllocLarge(MemoryContext context, Size size, int flags)
randomize_mem((char *) MemoryChunkGetPointer(chunk), size);
 #endif
 
-   /* add the block to the list of allocated blocks */
-   dlist_push_head(&set->blocks, &block->node);
+   /*
+    * Add the block to the tail of allocated blocks list.  The current block
+    * is left at the head of the list as it may still have space for
+    * non-large allocations.
+    */
+   dlist_push_tail(&set->blocks, &block->node);
 
 #ifdef MEMORY_CONTEXT_CHECKING
/* Ensure any padding bytes are marked NOACCESS. */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 634dc8d8b8..2f3eb4e7f1 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -47,7 +47,7 @@ select name, parent, total_bytes > 0, total_nblocks, free_bytes > 0, free_chunks
 from pg_backend_memory_contexts where name = 'Caller tuples';
      name      |     parent     | ?column? | total_nblocks | ?column? | free_chunks 
 ---------------+----------------+----------+---------------+----------+-------------
- Caller tuples | TupleSort sort | t        |             3 | t        |           0
+ Caller tuples | TupleSort sort | t        |             2 | t        |           0
 (1 row)
 
 rollback;
-- 
2.40.1



Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread David Rowley
On Tue, 16 Apr 2024 at 17:13, Amul Sul  wrote:
> Attached is a small patch adding the missing BumpContext description to the
> README.

Thanks for noticing and working on the patch.

There were a few things that were not quite accurate or misleading:

1.

> +These three memory contexts aim to free memory back to the operating system

That's not true for bump. It's the worst of the 4.  Worse than aset.
It only returns memory when the context is reset/deleted.

2.

"These memory contexts were initially developed for ReorderBuffer, but
may be useful elsewhere as long as the allocation patterns match."

The above isn't true for bump.  It was written for tuplesort.  I think
we can just remove that part now.  Slab and generation are both old
enough not to care why they were conceived.

Also since adding bump, I think the choice of which memory context to
use is about 33% harder than it used to be when there were only 3
context types.  I think this warrants giving more detail on what these
3 special-purpose memory allocators are good for.  I've added more
details in the attached patch.  This includes more details about
freeing malloc'd blocks.

I've tried to detail out enough of the specialities of the context
type without going into extensive detail. My hope is that there will
be enough detail for someone to choose the most suitable looking one
and head over to the corresponding .c file to find out more.

Is that about the right level of detail?

David
diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index f484f7d6f5..695088bb66 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -471,25 +471,32 @@ thrashing.
 Alternative Memory Context Implementations
 ------------------------------------------
 
-aset.c is our default general-purpose implementation, working fine
-in most situations. We also have two implementations optimized for
-special use cases, providing either better performance or lower memory
-usage compared to aset.c (or both).
-
-* slab.c (SlabContext) is designed for allocations of fixed-length
-  chunks, and does not allow allocations of chunks with different size.
-
-* generation.c (GenerationContext) is designed for cases when chunks
-  are allocated in groups with similar lifespan (generations), or
-  roughly in FIFO order.
-
-Both memory contexts aim to free memory back to the operating system
-(unlike aset.c, which keeps the freed chunks in a freelist, and only
-returns the memory when reset/deleted).
-
-These memory contexts were initially developed for ReorderBuffer, but
-may be useful elsewhere as long as the allocation patterns match.
-
+aset.c (AllocSetContext) is our default general-purpose allocator.  Three other
+allocator types also exist which are special-purpose:
+
+* slab.c (SlabContext) is designed for allocations of fixed-sized
+  chunks.  The fixed chunk size must be specified when creating the context.
+  New chunks are allocated to the fullest block, keeping used chunks densely
+  packed together to avoid memory fragmentation.  This also increases the
+  chances that pfree'ing a chunk will result in a block becoming empty of all
+  chunks and allow it to be free'd back to the operating system.
+
+* generation.c (GenerationContext) is best suited for cases when chunks are
+  allocated in groups with similar lifespan (generations), or roughly in FIFO
+  order.  No attempt is made to reuse space left by pfree'd chunks.  Blocks
+  are returned to the operating system when all chunks on them have been
+  pfree'd.
+
+* bump.c (BumpContext) is best suited for use cases that require densely
+  allocated chunks of memory that never need to be individually pfree'd or
+  repalloc'd.  These operations are unsupported due to BumpContext chunks
+  having no chunk header.  No chunk header means more densely packed chunks,
+  which is especially useful for workloads that perform lots of small
+  allocations.  Blocks are only free'd back to the operating system when the
+  context is reset or deleted.
+
+For further details, please read the header comment in the corresponding .c
+file.
 
 Memory Accounting
 -----------------
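
The slab behaviour the README hunk describes -- fixed-size chunks with
pfree'd space reused first -- can be illustrated with a toy allocator
(hypothetical and far simpler than slab.c, which also ranks blocks by
fullness and returns empty blocks to the OS):

```c
#include <assert.h>
#include <stdlib.h>

#define SLAB_CHUNKS_PER_BLOCK 64   /* toy value */

typedef struct SlabToy
{
    size_t  chunk_size;     /* fixed at creation, as with SlabContext */
    void   *freelist;       /* chain of freed chunks, reused first */
    char   *unused;         /* bump pointer into the current block */
    size_t  unused_left;    /* chunks never yet handed out */
} SlabToy;

SlabToy *
slab_create(size_t chunk_size)
{
    SlabToy *s = malloc(sizeof(SlabToy));

    /* a chunk must be able to hold a freelist link */
    s->chunk_size = chunk_size < sizeof(void *) ? sizeof(void *) : chunk_size;
    s->freelist = NULL;
    s->unused = NULL;
    s->unused_left = 0;
    return s;
}

void *
slab_alloc(SlabToy *s)
{
    if (s->freelist != NULL)        /* reuse a freed chunk first */
    {
        void   *p = s->freelist;

        s->freelist = *(void **) p; /* next link stored in the chunk itself */
        return p;
    }
    if (s->unused_left == 0)        /* start a new block of fixed chunks */
    {
        s->unused = malloc(s->chunk_size * SLAB_CHUNKS_PER_BLOCK);
        s->unused_left = SLAB_CHUNKS_PER_BLOCK;
    }

    void   *p = s->unused;

    s->unused += s->chunk_size;
    s->unused_left--;
    return p;
}

void
slab_free(SlabToy *s, void *p)
{
    *(void **) p = s->freelist;     /* chunk holds the freelist link */
    s->freelist = p;
}
```

Because every chunk has the same size, a freed chunk can satisfy any later
allocation, which keeps used chunks densely packed -- the property the
README text highlights.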


Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Daniel Gustafsson
> On 16 Apr 2024, at 07:12, Amul Sul  wrote:
> 
> Attached is a small patch adding the missing BumpContext description to the
> README.

Nice catch, we should add it to the README.

+  pfree'd or realloc'd.
I think it's best to avoid mixing APIs, "pfree'd or repalloc'd" keeps it to
functions in our API instead.

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-15 Thread Amul Sul
Attached is a small patch adding the missing BumpContext description to the
README.

Regards,
Amul


0001-Add-BumpContext-description-to-mmgr-README.patch
Description: Binary data


Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Daniel Gustafsson
> On 8 Apr 2024, at 01:04, Andres Freund  wrote:

> What makes you think that's unrelated? To me that looks like the same issue?

Nvm, I misread the assert, ETOOLITTLESLEEP. Sorry for the noise.

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread David Rowley
On Mon, 8 Apr 2024 at 09:09, Andres Freund  wrote:
> I suspect that KeeperBlock() isn't returning true, because IsKeeperBlock 
> misses
> the MAXALIGN(). I think that about fits with:

Thanks for investigating that.

I've just pushed a fix for the macro and also adjusted a location
which was *correctly* calculating the keeper block address manually to
use the macro. If I'd used the macro there to start with the Assert
likely wouldn't have failed, but there'd have been memory alignment
issues.

David




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Andres Freund
Hi,

On 2024-04-08 00:55:42 +0200, Daniel Gustafsson wrote:
> > On 8 Apr 2024, at 00:41, Tom Lane  wrote:
> > 
> > Tomas Vondra  writes:
> >> Yup, changing it to this:
> > 
> >> #define KeeperBlock(set) ((BumpBlock *) ((char *) (set) +
> >> MAXALIGN(sizeof(BumpContext))))
> > 
> >> fixes the issue for me.
> > 
> > Mamba is happy with that change, too.
> 
> Unrelated to that one, seems like turaco ran into another issue:
> 
> running bootstrap script ... TRAP: failed Assert("total_allocated == 
> context->mem_allocated"), File: "bump.c", Line: 808, PID: 7809
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=turaco&dt=2024-04-07%2022%3A42%3A54

What makes you think that's unrelated? To me that looks like the same issue?

Greetings,

Andres Freund




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Daniel Gustafsson
> On 8 Apr 2024, at 00:41, Tom Lane  wrote:
> 
> Tomas Vondra  writes:
>> Yup, changing it to this:
> 
>> #define KeeperBlock(set) ((BumpBlock *) ((char *) (set) +
>> MAXALIGN(sizeof(BumpContext))))
> 
>> fixes the issue for me.
> 
> Mamba is happy with that change, too.

Unrelated to that one, seems like turaco ran into another issue:

running bootstrap script ... TRAP: failed Assert("total_allocated == 
context->mem_allocated"), File: "bump.c", Line: 808, PID: 7809

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=turaco&dt=2024-04-07%2022%3A42%3A54

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Tom Lane
Tomas Vondra  writes:
> Yup, changing it to this:

> #define KeeperBlock(set) ((BumpBlock *) ((char *) (set) +
> MAXALIGN(sizeof(BumpContext))))

> fixes the issue for me.

Mamba is happy with that change, too.

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Tomas Vondra



On 4/7/24 23:09, Andres Freund wrote:
> Hi,
> 
> On 2024-04-07 22:35:47 +0200, Tomas Vondra wrote:
>> I haven't investigated, but considering it works on 64-bit, I guess
>> it's not considering alignment somewhere. I can dig more if needed.
> 
> I think I may have found the problem:
> 
> 
> #define KeeperBlock(set) ((BumpBlock *) ((char *) (set) + 
> sizeof(BumpContext)))
> #define IsKeeperBlock(set, blk) (KeeperBlock(set) == (blk))
> 
> BumpContextCreate():
> ...
>   /* Fill in the initial block's block header */
>   block = (BumpBlock *) (((char *) set) + MAXALIGN(sizeof(BumpContext)));
>   /* determine the block size and initialize it */
>   firstBlockSize = allocSize - MAXALIGN(sizeof(BumpContext));
>   BumpBlockInit(set, block, firstBlockSize);
> ...
>   ((MemoryContext) set)->mem_allocated = allocSize;
> 
> void
> BumpCheck(MemoryContext context)
> ...
>   if (IsKeeperBlock(bump, block))
>   total_allocated += block->endptr - (char *) bump;
> ...
> 
> I suspect that KeeperBlock() isn't returning true, because IsKeeperBlock 
> misses
> the MAXALIGN(). I think that about fits with:
> 
>> #4  0x008f0088 in BumpCheck (context=0x131e330) at bump.c:808
>> 808 Assert(total_allocated == context->mem_allocated);
>> (gdb) p total_allocated
>> $1 = 8120
>> (gdb) p context->mem_allocated
>> $2 = 8192
> 

Yup, changing it to this:

#define KeeperBlock(set) ((BumpBlock *) ((char *) (set) +
MAXALIGN(sizeof(BumpContext))))

fixes the issue for me.
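
The mismatch is easy to reproduce in isolation: when the header size is not
already a multiple of MAXIMUM_ALIGNOF (as with BumpContext on those 32-bit
animals), adding sizeof() and adding MAXALIGN(sizeof()) land on different
addresses, so the IsKeeperBlock() comparison can never be true.  A minimal
sketch (the 68-byte header size is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(len) \
    (((uintptr_t) (len) + (MAXIMUM_ALIGNOF - 1)) & \
     ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

/* Stand-in for a context header whose size is not a multiple of 8. */
enum { TOY_HDR_SIZE = 68 };

/* What the buggy KeeperBlock() macro computed... */
uintptr_t
keeper_unaligned(uintptr_t set)
{
    return set + TOY_HDR_SIZE;
}

/* ...versus where BumpContextCreate() actually placed the keeper block. */
uintptr_t
keeper_maxaligned(uintptr_t set)
{
    return set + MAXALIGN(TOY_HDR_SIZE);
}
```

The two addresses differ whenever the header size needs rounding up, which
is why the keeper block was never recognised and BumpCheck()'s
total_allocated came up short of mem_allocated.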


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Andres Freund
Hi,

On 2024-04-07 22:35:47 +0200, Tomas Vondra wrote:
> I haven't investigated, but considering it works on 64-bit, I guess
> it's not considering alignment somewhere. I can dig more if needed.

I think I may have found the problem:


#define KeeperBlock(set) ((BumpBlock *) ((char *) (set) + sizeof(BumpContext)))
#define IsKeeperBlock(set, blk) (KeeperBlock(set) == (blk))

BumpContextCreate():
...
/* Fill in the initial block's block header */
block = (BumpBlock *) (((char *) set) + MAXALIGN(sizeof(BumpContext)));
/* determine the block size and initialize it */
firstBlockSize = allocSize - MAXALIGN(sizeof(BumpContext));
BumpBlockInit(set, block, firstBlockSize);
...
((MemoryContext) set)->mem_allocated = allocSize;

void
BumpCheck(MemoryContext context)
...
if (IsKeeperBlock(bump, block))
total_allocated += block->endptr - (char *) bump;
...

I suspect that KeeperBlock() isn't returning true, because IsKeeperBlock misses
the MAXALIGN(). I think that about fits with:

> #4  0x008f0088 in BumpCheck (context=0x131e330) at bump.c:808
> 808 Assert(total_allocated == context->mem_allocated);
> (gdb) p total_allocated
> $1 = 8120
> (gdb) p context->mem_allocated
> $2 = 8192


Greetings,

Andres Freund




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Tomas Vondra




On 4/7/24 22:35, Tomas Vondra wrote:
> On 4/7/24 14:37, David Rowley wrote:
>> On Sun, 7 Apr 2024 at 22:05, John Naylor  wrote:
>>>
>>> On Sat, Apr 6, 2024 at 7:37 PM David Rowley  wrote:

>>>  I'm planning on pushing these, pending a final look at 0002 and 0003
 on Sunday morning NZ time (UTC+12), likely in about 10 hours time.
>>>
>>> +1
>>
>> I've now pushed all 3 patches.   Thank you for all the reviews on
>> these and for the extra MemoryContextMethodID bit, Matthias.
>>
>>> I haven't looked at v6, but I've tried using it in situ, and it seems
>>> to work as well as hoped:
>>>
>>> https://www.postgresql.org/message-id/CANWCAZZQFfxvzO8yZHFWtQV%2BZ2gAMv1ku16Vu7KWmb5kZQyd1w%40mail.gmail.com
>>
>> I'm already impressed with the radix tree work.  Nice to see bump
>> allowing a little more memory to be saved for TID storage.
>>
>> David
> 
> There seems to be some issue with this on 32-bit machines. A couple
> animals (grison, mamba) already complained about an assert in
> BumpCheck() during initdb, I get the same crash on my rpi5 running
> 32-bit debian - see the backtrace attached.
> 
> I haven't investigated, but considering it works on 64-bit, I guess
> it's not considering alignment somewhere. I can dig more if needed.
> 

I did try running it under valgrind, and there doesn't seem to be
anything particularly wrong - just a bit of noise about calculating CRC
on uninitialized bytes.

regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Tomas Vondra
On 4/7/24 14:37, David Rowley wrote:
> On Sun, 7 Apr 2024 at 22:05, John Naylor  wrote:
>>
>> On Sat, Apr 6, 2024 at 7:37 PM David Rowley  wrote:
>>>
>>  I'm planning on pushing these, pending a final look at 0002 and 0003
>>> on Sunday morning NZ time (UTC+12), likely in about 10 hours time.
>>
>> +1
> 
> I've now pushed all 3 patches.   Thank you for all the reviews on
> these and for the extra MemoryContextMethodID bit, Matthias.
> 
>> I haven't looked at v6, but I've tried using it in situ, and it seems
>> to work as well as hoped:
>>
>> https://www.postgresql.org/message-id/CANWCAZZQFfxvzO8yZHFWtQV%2BZ2gAMv1ku16Vu7KWmb5kZQyd1w%40mail.gmail.com
> 
> I'm already impressed with the radix tree work.  Nice to see bump
> allowing a little more memory to be saved for TID storage.
> 
> David

There seems to be some issue with this on 32-bit machines. A couple
animals (grison, mamba) already complained about an assert in
BumpCheck() during initdb, I get the same crash on my rpi5 running
32-bit debian - see the backtrace attached.

I haven't investigated, but considering it works on 64-bit, I guess
it's not considering alignment somewhere. I can dig more if needed.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

warning: Can't open file /SYSV003414ed (deleted) during file-backed mapping
note processing
[New LWP 20363]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Core was generated by 
`/mnt/data/postgres/tmp_install/home/debian/pg-master/bin/postgres --boot -F 
-c'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=4152716800, signo=6, 
no_tid=) at pthread_kill.c:44
44  pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=4152716800, signo=6, 
no_tid=) at pthread_kill.c:44
#1  0xf721bcfc in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#2  0xf72050a0 in __GI_abort () at abort.c:79
#3  0x008b29fc in ExceptionalCondition (conditionName=0xb1c3cc "total_allocated 
== context->mem_allocated", fileName=0xb1bf98 "bump.c", lineNumber=808) at 
assert.c:66
#4  0x008f0088 in BumpCheck (context=0x131e330) at bump.c:808
#5  0x008ef1bc in BumpReset (context=0x131e330) at bump.c:258
#6  0x008ef310 in BumpDelete (context=0x131e330) at bump.c:287
#7  0x008fab58 in MemoryContextDeleteOnly (context=0x131e330) at mcxt.c:528
#8  0x008faa00 in MemoryContextDelete (context=0x131e330) at mcxt.c:482
#9  0x008fac1c in MemoryContextDeleteChildren (context=0x131c328) at mcxt.c:548
#10 0x008fa75c in MemoryContextReset (context=0x131c328) at mcxt.c:389
#11 0x0090a1a8 in tuplesort_free (state=0x131a3b8) at tuplesort.c:958
#12 0x0090a1fc in tuplesort_end (state=0x131a3b8) at tuplesort.c:973
#13 0x00149250 in _bt_spooldestroy (btspool=0x1222b20) at nbtsort.c:517
#14 0x00149200 in _bt_spools_heapscan (heap=0xebf9f250, index=0xebf426e0, 
buildstate=0xffde9ed0, indexInfo=0x12e8f90) at nbtsort.c:504
#15 0x00148e30 in btbuild (heap=0xebf9f250, index=0xebf426e0, 
indexInfo=0x12e8f90) at nbtsort.c:321
#16 0x001fd164 in index_build (heapRelation=0xebf9f250, 
indexRelation=0xebf426e0, indexInfo=0x12e8f90, isreindex=false, parallel=false) 
at index.c:3021
#17 0x001e0404 in build_indices () at bootstrap.c:962
#18 0x001da92c in boot_yyparse () at bootparse.y:388
#19 0x001de988 in BootstrapModeMain (argc=6, argv=0x119c114, check_only=false) 
at bootstrap.c:357
#20 0x004389e4 in main (argc=7, argv=0x119c110) at main.c:186
(gdb) up
#1  0xf721bcfc in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
26  ../sysdeps/posix/raise.c: No such file or directory.
(gdb) up
#2  0xf72050a0 in __GI_abort () at abort.c:79
79  abort.c: No such file or directory.
(gdb) up
#3  0x008b29fc in ExceptionalCondition (conditionName=0xb1c3cc "total_allocated 
== context->mem_allocated", fileName=0xb1bf98 "bump.c", lineNumber=808) at 
assert.c:66
66  abort();
(gdb) up
#4  0x008f0088 in BumpCheck (context=0x131e330) at bump.c:808
808 Assert(total_allocated == context->mem_allocated);
(gdb) p total_allocated
$1 = 8120
(gdb) p context->mem_allocated
$2 = 8192


Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread David Rowley
On Sun, 7 Apr 2024 at 22:05, John Naylor  wrote:
>
> On Sat, Apr 6, 2024 at 7:37 PM David Rowley  wrote:
> >
>  I'm planning on pushing these, pending a final look at 0002 and 0003
> > on Sunday morning NZ time (UTC+12), likely in about 10 hours time.
>
> +1

I've now pushed all 3 patches.   Thank you for all the reviews on
these and for the extra MemoryContextMethodID bit, Matthias.

> I haven't looked at v6, but I've tried using it in situ, and it seems
> to work as well as hoped:
>
> https://www.postgresql.org/message-id/CANWCAZZQFfxvzO8yZHFWtQV%2BZ2gAMv1ku16Vu7KWmb5kZQyd1w%40mail.gmail.com

I'm already impressed with the radix tree work.  Nice to see bump
allowing a little more memory to be saved for TID storage.

David




Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread John Naylor
On Sat, Apr 6, 2024 at 7:37 PM David Rowley  wrote:
>
 I'm planning on pushing these, pending a final look at 0002 and 0003
> on Sunday morning NZ time (UTC+12), likely in about 10 hours time.

+1

I haven't looked at v6, but I've tried using it in situ, and it seems
to work as well as hoped:

https://www.postgresql.org/message-id/CANWCAZZQFfxvzO8yZHFWtQV%2BZ2gAMv1ku16Vu7KWmb5kZQyd1w%40mail.gmail.com




Re: Add bump memory context type and use it for tuplesorts

2024-04-06 Thread Matthias van de Meent
On Sun, 7 Apr 2024, 01:59 David Rowley,  wrote:

> On Sun, 7 Apr 2024 at 05:45, Matthias van de Meent
>  wrote:
> > Malloc's docs specify the minimum chunk size at 4*sizeof(void*) and
> itself uses , so using powers of 2 for chunks would indeed fail to detect
> 1s in the 4th bit. I suspect you'll get different results when you check
> the allocation patterns of multiples of 8 bytes, starting from 40,
> especially on 32-bit arm (where MALLOC_ALIGNMENT is 8 bytes, rather than
> the 16 bytes on i386 and 64-bit architectures, assuming  [0] is accurate)


> I'd hazard a guess that
> there are more instances of Postgres running on Windows today than on
> 32-bit CPUs and we don't seem too worried about the bit-patterns used
> for Windows.
>

Yeah, that is something I had some thoughts about too, but didn't check if
there was historical context around. I don't think it's worth bothering
right now though.

>> Another reason not to make it 5 bits is that I believe that would make
> >> the mcxt_methods[] array 2304 bytes rather than 576 bytes.  4 bits
> >> makes it 1152 bytes, if I'm counting correctly.
> >
> >
> > I don't think I understand why this would be relevant when only 5 of the
> contexts are actually in use (thus in caches). Is that size concern about
> TLB entries then?
>
> It's a static const array. I don't want to bloat the binary with
> something we'll likely never need.  If we one day need it, we can
> reserve another bit using the same overlapping method.
>

Fair points.

>> I revised the patch to simplify hdrmask logic.  This started with me
> >> having trouble finding the best set of words to document that the
> >> offset is "half the bytes between the chunk and block".  So, instead
> >> of doing that, I've just made it so these two fields effectively
> >> overlap. The lowest bit of the block offset is the same bit as the
> >> high bit of what MemoryChunkGetValue returns.
> >
> >
> > Works for me, I suppose.
>
> hmm. I don't detect much enthusiasm for it.
>

I had a tiring day leaving me short on enthusiasm, after which I realised
there were some things to this patch that would need fixing.

I could've worded this better, but nothing against this code.

-Matthias


Re: Add bump memory context type and use it for tuplesorts

2024-04-06 Thread David Rowley
On Sun, 7 Apr 2024 at 05:45, Matthias van de Meent
 wrote:
> Malloc's docs specify the minimum chunk size at 4*sizeof(void*) and itself 
> uses , so using powers of 2 for chunks would indeed fail to detect 1s in the 
> 4th bit. I suspect you'll get different results when you check the allocation 
> patterns of multiples of 8 bytes, starting from 40, especially on 32-bit arm 
> (where MALLOC_ALIGNMENT is 8 bytes, rather than the 16 bytes on i386 and 
> 64-bit architectures, assuming  [0] is accurate)

I'm prepared to be overruled, but I just don't have strong feelings
that 32-bit is worth making these reservations for. Especially so
given the rate we're filling these slots.  The only system that I see
the 4th bit change is Cygwin. It doesn't look like a very easy system
to protect against pfreeing of malloc'd chunks as the prior 8-bytes
seem to vary depending on the malloc'd size and I see all bit patterns
there, including the ones we use for our memory contexts.

Since we can't protect against every possible bit pattern there, we
need to draw the line somewhere.  I don't think 32-bit systems are
worth reserving these precious slots for. I'd hazard a guess that
there are more instances of Postgres running on Windows today than on
32-bit CPUs and we don't seem too worried about the bit-patterns used
for Windows.

> In your updated 0001, you don't seem to fill the RESERVED_GLIBC memctx array 
> entries with BOGUS_MCTX().

Oops. Thanks

>> Another reason not to make it 5 bits is that I believe that would make
>> the mcxt_methods[] array 2304 bytes rather than 576 bytes.  4 bits
>> makes it 1152 bytes, if I'm counting correctly.
>
>
> I don't think I understand why this would be relevant when only 5 of the 
> contexts are actually in use (thus in caches). Is that size concern about TLB 
> entries then?

It's a static const array. I don't want to bloat the binary with
something we'll likely never need.  If we one day need it, we can
reserve another bit using the same overlapping method.
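
The byte counts in this subthread follow directly from the entry size:
assuming 64-bit function pointers and a methods struct of nine of them
(9 x 8 = 72 bytes, which is what 576/8 implies), each extra ID bit doubles
the table.  A quick check (the struct shape is illustrative, not the real
MemoryContextMethods):

```c
#include <assert.h>

/* Illustrative stand-in: nine 8-byte function pointers per entry. */
typedef struct MethodsLike
{
    void    (*fn[9]) (void);
} MethodsLike;

/* Table size for a given MemoryContextMethodID bit width. */
unsigned long
table_bytes(int id_bits)
{
    return (1UL << id_bits) * sizeof(MethodsLike);
}
```

So 3 bits gives 576 bytes, 4 bits gives 1152, and 5 bits gives 2304,
matching the figures quoted above.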

>> I revised the patch to simplify hdrmask logic.  This started with me
>> having trouble finding the best set of words to document that the
>> offset is "half the bytes between the chunk and block".  So, instead
>> of doing that, I've just made it so these two fields effectively
>> overlap. The lowest bit of the block offset is the same bit as the
>> high bit of what MemoryChunkGetValue returns.
>
>
> Works for me, I suppose.

hmm. I don't detect much enthusiasm for it.

Personally, I quite like the method as it adds no extra instructions
when encoding the MemoryChunk and only a simple bitwise-AND when
decoding it.  Your method added extra instructions in the encode and
decode.  I went to great lengths to make this code as fast as
possible, so I know which method that I prefer.  We often palloc and
never do anything that requires the chunk header to be decoded, so not
adding extra instructions on the encoding stage is a big win.

The only method I see to avoid adding instructions in encoding and
decoding is to reduce the bit-space for the MemoryChunkGetValue field
to 29 bits. Effectively, that means non-external chunks can only be
512MB rather than 1GB.  As far as I know, that just limits slab.c to
only being able to do 512MB pallocs as generation.c and aset.c use
external chunks well below that threshold. Restricting slab to 512MB
is probably far from the end of the world. Anything close to that
would be a terrible use case for slab.  I was just less keen on using
a bit from there as that's a field we allow the context implementation
to do what they like with. Having bitspace for 2^30 possible values in
there just seems nice given that it can store any possible value from
zero up to MaxAllocSize.

David




Re: Add bump memory context type and use it for tuplesorts

2024-04-06 Thread Matthias van de Meent
On Sat, 6 Apr 2024, 14:36 David Rowley,  wrote:

> On Sat, 6 Apr 2024 at 02:30, Matthias van de Meent
>  wrote:
> >
> > On Thu, 4 Apr 2024 at 22:43, Tom Lane  wrote:
> > >
> > > Matthias van de Meent  writes:
> > > > It extends memory context IDs to 5 bits (32 values), of which
> > > > - 8 have glibc's malloc pattern of 001/010;
> > > > - 1 is unused memory's 0
> > > > - 1 is wipe_mem's 1
> > > > - 4 are used by existing contexts
> (Aset/Generation/Slab/AlignedRedirect)
> > > > - 18 are newly available.
> > >
> > > This seems like it would solve the problem for a good long time
> > > to come; and if we ever need more IDs, we could steal one more bit
> > > by requiring the offset to the block header to be a multiple of 8.
> > > (Really, we could just about do that today at little or no cost ...
> > > machines with MAXALIGN less than 8 are very thin on the ground.)
> >
> > Hmm, it seems like a decent idea, but I didn't want to deal with the
> > repercussions of that this late in the cycle when these 2 bits were
> > still relatively easy to get hold of.
>
> Thanks for writing the patch.
>
> I think 5 bits is 1 too many. 4 seems fine. I also think you've
> reserved too many slots in your patch as I disagree that we need to
> reserve the glibc malloc pattern anywhere but in the 1 and 2 slots of
> the mcxt_methods[] array.  I looked again at the 8 bytes prior to a
> glibc malloc'd chunk and I see the lowest 4 bits of the headers
> consistently set to 0001 for all powers of 2 starting at 8 up to
> 65536.


Malloc's docs specify the minimum chunk size at 4*sizeof(void*) and itself
uses , so using powers of 2 for chunks would indeed fail to detect 1s in
the 4th bit. I suspect you'll get different results when you check the
allocation patterns of multiples of 8 bytes, starting from 40, especially
on 32-bit arm (where MALLOC_ALIGNMENT is 8 bytes, rather than the 16 bytes
on i386 and 64-bit architectures, assuming  [0] is accurate)

> 131072 seems to vary and beyond that, they seem to be set to
> 0010.
>

In your updated 0001, you don't seem to fill the RESERVED_GLIBC memctx
array entries with BOGUS_MCTX().

> With that, there's no increase in the number of reserved slots from
> what we have reserved today. Still 4.  So having 4 bits instead of 3
> bits gives us a total of 12 slots rather than 4 slots.  Having 3x
> slots seems enough.  We might need an extra bit for something else
> sometime. I think keeping it up our sleeve is a good idea.
>
> Another reason not to make it 5 bits is that I believe that would make
> the mcxt_methods[] array 2304 bytes rather than 576 bytes.  4 bits
> makes it 1152 bytes, if I'm counting correctly.
>

I don't think I understand why this would be relevant when only 5 of the
contexts are actually in use (thus in caches). Is that size concern about
TLB entries then?


> I revised the patch to simplify hdrmask logic.  This started with me
> having trouble finding the best set of words to document that the
> offset is "half the bytes between the chunk and block".  So, instead
> of doing that, I've just made it so these two fields effectively
> overlap. The lowest bit of the block offset is the same bit as the
> high bit of what MemoryChunkGetValue returns.


Works for me, I suppose.

> I also updated src/backend/utils/mmgr/README to explain this and
> adjust the mentions of 3-bits and 61-bits to 4-bits and 60-bits.  I
> also explained the overlapping part.
>

Thanks!

[0] https://sourceware.org/glibc/wiki/MallocInternals#Platform-specific_Thresholds_and_Constants

>


Re: Add bump memory context type and use it for tuplesorts

2024-04-05 Thread David Rowley
On Sat, 6 Apr 2024 at 03:24, Tom Lane  wrote:
> OK.  I did not read the patch very closely, but at least in principle
> I have no further objections.  David, are you planning to take point
> on getting this in?

Yes. I'll be looking soon.

David




Re: Add bump memory context type and use it for tuplesorts

2024-04-05 Thread Tom Lane
Matthias van de Meent  writes:
> On Thu, 4 Apr 2024 at 22:43, Tom Lane  wrote:
>> The only objection I can think of is that perhaps this would slow
>> things down a tad by requiring more complicated shifting/masking.
>> I wonder if we could redo the performance checks that were done
>> on the way to accepting the current design.

> I didn't do very extensive testing, but the light performance tests
> that I did with the palloc performance benchmark patch & script shared
> above didn't indicate an observable negative effect.

OK.  I did not read the patch very closely, but at least in principle
I have no further objections.  David, are you planning to take point
on getting this in?

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-04-05 Thread Matthias van de Meent
On Thu, 4 Apr 2024 at 22:43, Tom Lane  wrote:
>
> Matthias van de Meent  writes:
> > On Mon, 25 Mar 2024 at 22:44, Tom Lane  wrote:
> >> Basically, I'm not happy with consuming the last reasonably-available
> >> pattern for a memory context type that has little claim to being the
> >> Last Context Type We Will Ever Want.  Rather than making a further
> >> dent in our ability to detect corrupted chunks, we should do something
> >> towards restoring the expansibility that existed in the original
> >> design.  Then we can add bump contexts and whatever else we want.
>
> > So, would something like the attached make enough IDs available so
> > that we can add the bump context anyway?
>
> > It extends memory context IDs to 5 bits (32 values), of which
> > - 8 have glibc's malloc pattern of 001/010;
> > - 1 is unused memory's 0
> > - 1 is wipe_mem's 1
> > - 4 are used by existing contexts (Aset/Generation/Slab/AlignedRedirect)
> > - 18 are newly available.
>
> This seems like it would solve the problem for a good long time
> to come; and if we ever need more IDs, we could steal one more bit
> by requiring the offset to the block header to be a multiple of 8.
> (Really, we could just about do that today at little or no cost ...
> machines with MAXALIGN less than 8 are very thin on the ground.)

Hmm, it seems like a decent idea, but I didn't want to deal with the
repercussions of that this late in the cycle when these 2 bits were
still relatively easy to get hold of.

> The only objection I can think of is that perhaps this would slow
> things down a tad by requiring more complicated shifting/masking.
> I wonder if we could redo the performance checks that were done
> on the way to accepting the current design.

I didn't do very extensive testing, but the light performance tests
that I did with the palloc performance benchmark patch & script shared
above didn't indicate an observable negative effect.
An adapted version of the test that uses repalloc() to check
performance differences in MCXT_METHOD() doesn't show a significant
performance difference from master either. That test case is attached
as repalloc-performance-test-function.patch.txt.

The full set of patches would then accumulate to the attached v5 of
the patchset.
0001 is an update of my patch from yesterday, in which I update
MemoryContextMethodID infrastructure for more IDs, and use a new
naming scheme for unused/reserved IDs.
0002 and 0003 are David's patches, with minor changes to work with
0001 (rebasing, and I moved the location around to keep function
declaration in order with memctx ids)

Kind regards,

Matthias van de Meent
From b65320736cf63684b4710b3d63e243a0862d37c6 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Fri, 5 Apr 2024 15:13:52 +0200
Subject: [PATCH] repalloc performance test function

---
 src/backend/utils/adt/mcxtfuncs.c | 231 ++
 src/include/catalog/pg_proc.dat   |  12 ++
 2 files changed, 243 insertions(+)

diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index 4d4a70915b..88de4bbcb5 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,8 +15,11 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
+#include "miscadmin.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
@@ -185,3 +188,231 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 
PG_RETURN_BOOL(true);
 }
+
+typedef struct AllocateTestNext
+{
+   struct AllocateTestNext *next;  /* ptr to the next allocation */
+} AllocateTestNext;
+
+/* #define ALLOCATE_TEST_DEBUG */
+/*
+ * pg_allocate_memory_test
+ * Used to test the performance of a memory context types
+ */
+Datum
+pg_allocate_memory_test(PG_FUNCTION_ARGS)
+{
+   int32   chunk_size = PG_GETARG_INT32(0);
+   int64   keep_memory = PG_GETARG_INT64(1);
+   int64   total_alloc = PG_GETARG_INT64(2);
+   text   *context_type_text = PG_GETARG_TEXT_PP(3);
+   char   *context_type;
+   int64   curr_memory_use = 0;
+   int64   remaining_alloc_bytes = total_alloc;
+   MemoryContext context;
+   MemoryContext oldContext;
+   AllocateTestNext   *next_free_ptr = NULL;
+   AllocateTestNext   *last_alloc = NULL;
+   clock_t start, end;
+
+   if (chunk_size < sizeof(AllocateTestNext))
+   elog(ERROR, "chunk_size (%d) must be at least %ld bytes", chunk_size,
+sizeof(AllocateTestNext));
+   if (keep_memory > total_alloc)
+   elog(ERROR, "keep_memory (" INT64_FORMAT ") must be less than 
total_alloc (" INT64_FORMAT ")",
+keep_memory, total_alloc);
+
+   context_type = text_to_cstring(context_type_text);
+
+   start = clock();
+
+   if (strcmp(context_type, "generation") == 0)
+   context = 

Re: Add bump memory context type and use it for tuplesorts

2024-04-04 Thread Tom Lane
Matthias van de Meent  writes:
> On Mon, 25 Mar 2024 at 22:44, Tom Lane  wrote:
>> Basically, I'm not happy with consuming the last reasonably-available
>> pattern for a memory context type that has little claim to being the
>> Last Context Type We Will Ever Want.  Rather than making a further
>> dent in our ability to detect corrupted chunks, we should do something
>> towards restoring the expansibility that existed in the original
>> design.  Then we can add bump contexts and whatever else we want.

> So, would something like the attached make enough IDs available so
> that we can add the bump context anyway?

> It extends memory context IDs to 5 bits (32 values), of which
> - 8 have glibc's malloc pattern of 001/010;
> - 1 is unused memory's 0
> - 1 is wipe_mem's 1
> - 4 are used by existing contexts (Aset/Generation/Slab/AlignedRedirect)
> - 18 are newly available.

This seems like it would solve the problem for a good long time
to come; and if we ever need more IDs, we could steal one more bit
by requiring the offset to the block header to be a multiple of 8.
(Really, we could just about do that today at little or no cost ...
machines with MAXALIGN less than 8 are very thin on the ground.)

The only objection I can think of is that perhaps this would slow
things down a tad by requiring more complicated shifting/masking.
I wonder if we could redo the performance checks that were done
on the way to accepting the current design.

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-04-04 Thread Matthias van de Meent
On Mon, 25 Mar 2024 at 22:44, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Tue, 26 Mar 2024 at 03:53, Tom Lane  wrote:
> >> Could we move the knowledge of exactly which context type it is out
> >> of the per-chunk header and keep it in the block header?
>
> > I wasn't 100% clear on your opinion about using 010 vs expanding the
> > bit-space. Based on the following it sounded like you were not
> > outright rejecting the idea of consuming the 010 pattern.
>
> What I said earlier was that 010 was the least bad choice if we
> fail to do any expansibility work; but I'm not happy with failing
> to do that.

Okay.

> Basically, I'm not happy with consuming the last reasonably-available
> pattern for a memory context type that has little claim to being the
> Last Context Type We Will Ever Want.  Rather than making a further
> dent in our ability to detect corrupted chunks, we should do something
> towards restoring the expansibility that existed in the original
> design.  Then we can add bump contexts and whatever else we want.

So, would something like the attached make enough IDs available so
that we can add the bump context anyway?

It extends memory context IDs to 5 bits (32 values), of which
- 8 have glibc's malloc pattern of 001/010;
- 1 is unused memory's 0
- 1 is wipe_mem's 1
- 4 are used by existing contexts (Aset/Generation/Slab/AlignedRedirect)
- 18 are newly available.

Kind regards,

Matthias
From 4c0b4b1af98fcfecf80a30ea1834668b59d543a5 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent 
Date: Thu, 4 Apr 2024 21:34:46 +0200
Subject: [PATCH v0] Add bitspace for more memory context types in
 MemoryChunk's hdrmask

Assuming we don't want to use patterns from unused memory, common
glibc malloc patterns, and wipe_mem-ed memory, we now have 18 new IDs
to work with again, from 0.

In passing, simplify/clean up initialization of unused memory contexts
in the mcxt_methods array.

Inspiration: https://postgr.es/m/CAApHDvqGSpCU95TmM%3DBp%3D6xjL_nLys4zdZOpfNyWBk97Xrdj2w%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c| 55 +++-
 src/include/utils/memutils_internal.h| 35 ---
 src/include/utils/memutils_memorychunk.h | 15 ---
 3 files changed, 73 insertions(+), 32 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 5d426795d9..9373f782ce 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -102,26 +102,41 @@ static const MemoryContextMethods mcxt_methods[] = {
 * seems sufficient to provide routines for the methods that might get
 * invoked from inspection of a chunk (see MCXT_METHOD calls below).
 */
-
-   [MCTX_UNUSED1_ID].free_p = BogusFree,
-   [MCTX_UNUSED1_ID].realloc = BogusRealloc,
-   [MCTX_UNUSED1_ID].get_chunk_context = BogusGetChunkContext,
-   [MCTX_UNUSED1_ID].get_chunk_space = BogusGetChunkSpace,
-
-   [MCTX_UNUSED2_ID].free_p = BogusFree,
-   [MCTX_UNUSED2_ID].realloc = BogusRealloc,
-   [MCTX_UNUSED2_ID].get_chunk_context = BogusGetChunkContext,
-   [MCTX_UNUSED2_ID].get_chunk_space = BogusGetChunkSpace,
-
-   [MCTX_UNUSED3_ID].free_p = BogusFree,
-   [MCTX_UNUSED3_ID].realloc = BogusRealloc,
-   [MCTX_UNUSED3_ID].get_chunk_context = BogusGetChunkContext,
-   [MCTX_UNUSED3_ID].get_chunk_space = BogusGetChunkSpace,
-
-   [MCTX_UNUSED4_ID].free_p = BogusFree,
-   [MCTX_UNUSED4_ID].realloc = BogusRealloc,
-   [MCTX_UNUSED4_ID].get_chunk_context = BogusGetChunkContext,
-   [MCTX_UNUSED4_ID].get_chunk_space = BogusGetChunkSpace,
+#define BOGUS_MCTX(id) \
+   [id].free_p = BogusFree, \
+   [id].realloc = BogusRealloc, \
+   [id].get_chunk_context = BogusGetChunkContext, \
+   [id].get_chunk_space = BogusGetChunkSpace
+
+   BOGUS_MCTX(MCTX_UNUSED1_ID),
+   BOGUS_MCTX(MCTX_UNUSED2_ID),
+   BOGUS_MCTX(MCTX_UNUSED3_ID),
+   BOGUS_MCTX(MCTX_UNUSED4_ID),
+   BOGUS_MCTX(MCTX_UNUSED5_ID),
+   BOGUS_MCTX(MCTX_UNUSED6_ID),
+   BOGUS_MCTX(MCTX_UNUSED7_ID),
+   BOGUS_MCTX(MCTX_UNUSED8_ID),
+   BOGUS_MCTX(MCTX_UNUSED9_ID),
+   BOGUS_MCTX(MCTX_UNUSED10_ID),
+   BOGUS_MCTX(MCTX_UNUSED11_ID),
+   BOGUS_MCTX(MCTX_UNUSED12_ID),
+   BOGUS_MCTX(MCTX_UNUSED13_ID),
+   BOGUS_MCTX(MCTX_UNUSED14_ID),
+   BOGUS_MCTX(MCTX_UNUSED15_ID),
+   BOGUS_MCTX(MCTX_UNUSED16_ID),
+   BOGUS_MCTX(MCTX_UNUSED17_ID),
+   BOGUS_MCTX(MCTX_UNUSED18_ID),
+   BOGUS_MCTX(MCTX_UNUSED19_ID),
+   BOGUS_MCTX(MCTX_UNUSED20_ID),
+   BOGUS_MCTX(MCTX_UNUSED21_ID),
+   BOGUS_MCTX(MCTX_UNUSED22_ID),
+   BOGUS_MCTX(MCTX_UNUSED23_ID),
+   BOGUS_MCTX(MCTX_UNUSED24_ID),
+   BOGUS_MCTX(MCTX_UNUSED25_ID),
+   BOGUS_MCTX(MCTX_UNUSED26_ID),
+   BOGUS_MCTX(MCTX_UNUSED27_ID),
+   BOGUS_MCTX(MCTX_UNUSED28_ID)
+#undef BOGUS_MCTX
 };
 
 /*
diff --git a/src/include/utils/memutils_internal.h 

Re: Add bump memory context type and use it for tuplesorts

2024-03-25 Thread Tom Lane
David Rowley  writes:
> On Tue, 26 Mar 2024 at 03:53, Tom Lane  wrote:
>> Could we move the knowledge of exactly which context type it is out
>> of the per-chunk header and keep it in the block header?

> I wasn't 100% clear on your opinion about using 010 vs expanding the
> bit-space. Based on the following it sounded like you were not
> outright rejecting the idea of consuming the 010 pattern.

What I said earlier was that 010 was the least bad choice if we
fail to do any expansibility work; but I'm not happy with failing
to do that.

Basically, I'm not happy with consuming the last reasonably-available
pattern for a memory context type that has little claim to being the
Last Context Type We Will Ever Want.  Rather than making a further
dent in our ability to detect corrupted chunks, we should do something
towards restoring the expansibility that existed in the original
design.  Then we can add bump contexts and whatever else we want.

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-03-25 Thread David Rowley
On Tue, 26 Mar 2024 at 03:53, Tom Lane  wrote:
> I agree with this completely.  However, the current design for chunk
> headers is mighty restrictive about how many kinds of contexts we can
> have.  We need to open that back up.

Andres mentioned how we could do this in [1].  One possible issue with
that is that slab.c has no external chunks so would restrict slab to
512MB chunks.  I doubt that's ever going to realistically be an issue.
That's just not a good use case for slab, so I'd be ok with that.

> Could we move the knowledge of exactly which context type it is out
> of the per-chunk header and keep it in the block header?  This'd
> require that every context type have a standardized way of finding
> the block header from a chunk.  We could repurpose the existing
> MemoryContextMethodID bits to allow having a small number of different
> ways, perhaps.

I wasn't 100% clear on your opinion about using 010 vs expanding the
bit-space. Based on the following it sounded like you were not
outright rejecting the idea of consuming the 010 pattern.

On Sat, 17 Feb 2024 at 12:14, Tom Lane  wrote:
> If we do kick this can down the road, then I concur with eating 010
> next, as it seems the least likely to occur in glibc-malloced
> chunks.

David

[1] https://postgr.es/m/20240217200845.ywlwenjrlbyoc...@awork3.anarazel.de




Re: Add bump memory context type and use it for tuplesorts

2024-03-25 Thread Tom Lane
Tomas Vondra  writes:
> IMHO the idea of having a general purpose memory context and then also
> specialized memory contexts for particular allocation patterns is great,
> and we should embrace it. Adding more and more special cases into
> AllocSet seems to go directly against that idea, makes the code more
> complex, and I don't quite see how is that better or easier to use than
> having a separate BumpContext ...

I agree with this completely.  However, the current design for chunk
headers is mighty restrictive about how many kinds of contexts we can
have.  We need to open that back up.

Could we move the knowledge of exactly which context type it is out
of the per-chunk header and keep it in the block header?  This'd
require that every context type have a standardized way of finding
the block header from a chunk.  We could repurpose the existing
MemoryContextMethodID bits to allow having a small number of different
ways, perhaps.

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-03-25 Thread Tomas Vondra
On 3/25/24 12:41, David Rowley wrote:
> On Tue, 5 Mar 2024 at 15:42, David Rowley  wrote:
>> The query I ran was:
>>
>> select chksz,mtype,pg_allocate_memory_test_reset(chksz,
>> 1024*1024,1024*1024*1024, mtype) from (values(8),(16),(32),(64))
>> sizes(chksz),(values('aset'),('generation'),('slab'),('bump'))
>> cxt(mtype) order by mtype,chksz;
> 
> Andres and I were discussing this patch offlist in the context of
> "should we have bump".  Andres wonders if it would be better to have a
> function such as palloc_nofree() (we didn't actually discuss the
> name), which for aset, would forego rounding up to the next power of 2
> and not bother checking the freelist and only have a chunk header for
> MEMORY_CONTEXT_CHECKING builds. For non-MEMORY_CONTEXT_CHECKING
> builds, the chunk header could be set to some other context type such
> as one of the unused ones or perhaps a dedicated new one that does
> something along the lines of BogusFree() which raises an ERROR if
> anything tries to pfree or repalloc it.
> 
> An advantage of having this instead of bump would be that it could be
> used for things such as the parser, where we make a possibly large
> series of small allocations and never free them again.
> 

I may be missing something, but I don't quite understand how this would
be simpler to use in places like parser. Wouldn't it require all the
places to start explicitly calling palloc_nofree()? How is that better
than having a specialized memory context?

> Andres asked me to run some benchmarks to mock up AllocSetAlloc() to
> have it not check the freelist to see how the performance of it
> compares to BumpAlloc().  I did this in the attached 2 patches.  The
> 0001 patch just #ifdefs that part of AllocSetAlloc out, however
> properly implementing this is more complex as aset.c currently stores
> the freelist index in the MemoryChunk rather than the chunk_size.  I
> did this because it saved having to call AllocSetFreeIndex() in
> AllocSetFree() which made a meaningful performance improvement in
> pfree().  The 0002 patch effectively reverses that change out so that
> the chunk_size is stored.  Again, these patches are only intended to
> demonstrate the performance and check how it compares to bump.
> 
> I'm yet uncertain why, but I find that the first time I run the query
> quoted above, the aset results are quite a bit slower than on
> subsequent runs.  Other context types don't seem to suffer from this.
> The previous results I sent in [1] were of the initial run after
> starting up the database.
> 
> The attached graph shows the number of seconds it takes to allocate a
> total of 1GBs of memory in various chunk sizes, resetting the context
> after 1MBs has been allocated, so as to keep the test sized so it fits
> in CPU caches.
> 

Yeah, strange and interesting. My guess is it's some sort of caching
effect, where the first run has to initialize stuff that's not in any of
the CPU caches yet, likely something specific to AllocSet (freelist?).
Or maybe memory prefetching does not work that well for AllocSet?

I'd try perf-stat, that might tell us more ... but who knows.

Alternatively, it might be some interaction with the glibc allocator.
Have you tried using jemalloc using LD_PRELOAD, or tweaking the glibc
parameters using environment variables? If you feel adventurous, you
might even try the memory pool stuff, although I'm not sure that can
help with the first run.

> I'm not drawing any particular conclusion from the results aside from
> it's not quite as fast as bump.   I also have some reservations about
> how easy it would be to actually use something like palloc_nofree().
> For example heap_form_minimal_tuple() does palloc0().  What if I
> wanted to call ExecCopySlotMinimalTuple() and use palloc0_nofree().
> Would we need new versions of various functions to give us control
> over this?
> 

That's kinda the problem that I mentioned above - is this really any
simpler/better than just having a separate memory context type? I don't
see what benefits this is supposed to have.

IMHO the idea of having a general purpose memory context and then also
specialized memory contexts for particular allocation patterns is great,
and we should embrace it. Adding more and more special cases into
AllocSet seems to go directly against that idea, makes the code more
complex, and I don't quite see how is that better or easier to use than
having a separate BumpContext ...

Having an AllocSet that mixes chunks that may be freed and chunks that
can't be freed, and have a different context type in the chunk header,
seems somewhat confusing and "not great" for debugging, for example.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-03-25 Thread David Rowley
On Tue, 5 Mar 2024 at 15:42, David Rowley  wrote:
> The query I ran was:
>
> select chksz,mtype,pg_allocate_memory_test_reset(chksz,
> 1024*1024,1024*1024*1024, mtype) from (values(8),(16),(32),(64))
> sizes(chksz),(values('aset'),('generation'),('slab'),('bump'))
> cxt(mtype) order by mtype,chksz;

Andres and I were discussing this patch offlist in the context of
"should we have bump".  Andres wonders if it would be better to have a
function such as palloc_nofree() (we didn't actually discuss the
name), which for aset, would forego rounding up to the next power of 2
and not bother checking the freelist and only have a chunk header for
MEMORY_CONTEXT_CHECKING builds. For non-MEMORY_CONTEXT_CHECKING
builds, the chunk header could be set to some other context type such
as one of the unused ones or perhaps a dedicated new one that does
something along the lines of BogusFree() which raises an ERROR if
anything tries to pfree or repalloc it.

An advantage of having this instead of bump would be that it could be
used for things such as the parser, where we make a possibly large
series of small allocations and never free them again.

Andres asked me to run some benchmarks to mock up AllocSetAlloc() to
have it not check the freelist to see how the performance of it
compares to BumpAlloc().  I did this in the attached 2 patches.  The
0001 patch just #ifdefs that part of AllocSetAlloc out, however
properly implementing this is more complex as aset.c currently stores
the freelist index in the MemoryChunk rather than the chunk_size.  I
did this because it saved having to call AllocSetFreeIndex() in
AllocSetFree() which made a meaningful performance improvement in
pfree().  The 0002 patch effectively reverses that change out so that
the chunk_size is stored.  Again, these patches are only intended to
demonstrate the performance and check how it compares to bump.

I'm yet uncertain why, but I find that the first time I run the query
quoted above, the aset results are quite a bit slower than on
subsequent runs.  Other context types don't seem to suffer from this.
The previous results I sent in [1] were of the initial run after
starting up the database.

The attached graph shows the number of seconds it takes to allocate a
total of 1GBs of memory in various chunk sizes, resetting the context
after 1MBs has been allocated, so as to keep the test sized so it fits
in CPU caches.

I'm not drawing any particular conclusion from the results aside from
it's not quite as fast as bump.   I also have some reservations about
how easy it would be to actually use something like palloc_nofree().
For example heap_form_minimal_tuple() does palloc0().  What if I
wanted to call ExecCopySlotMinimalTuple() and use palloc0_nofree().
Would we need new versions of various functions to give us control
over this?

David

[1] https://www.postgresql.org/message-id/CAApHDvr_hGT=kap0yxbksnztbrx+6hukiecwen2bulww1ut...@mail.gmail.com
From 3980d35a11d7f27d18c57586a7f7539d86dafae3 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Tue, 26 Mar 2024 00:06:55 +1300
Subject: [PATCH 2/2] Modify aset.c to store the chunk size in the hdr

---
 src/backend/utils/mmgr/aset.c | 51 +--
 1 file changed, 19 insertions(+), 32 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index c683e2a928..27006662c4 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -772,7 +772,7 @@ AllocSetAllocLarge(MemoryContext context, Size size, int flags)
  */
 static inline void *
 AllocSetAllocChunkFromBlock(MemoryContext context, AllocBlock block,
-   Size size, Size chunk_size, int fidx)
+   Size size, Size chunk_size)
 {
MemoryChunk *chunk;
 
@@ -785,7 +785,7 @@ AllocSetAllocChunkFromBlock(MemoryContext context, AllocBlock block,
Assert(block->freeptr <= block->endptr);
 
/* store the free list index in the value field */
-   MemoryChunkSetHdrMask(chunk, block, fidx, MCTX_ASET_ID);
+   MemoryChunkSetHdrMask(chunk, block, chunk_size, MCTX_ASET_ID);
 
 #ifdef MEMORY_CONTEXT_CHECKING
chunk->requested_size = size;
@@ -817,14 +817,13 @@ AllocSetAllocChunkFromBlock(MemoryContext context, AllocBlock block,
 pg_noinline
 static void *
 AllocSetAllocFromNewBlock(MemoryContext context, Size size, int flags,
- int fidx)
+ Size chunk_size)
 {
AllocSetset = (AllocSet) context;
AllocBlock  block;
Sizeavailspace;
Sizeblksize;
Sizerequired_size;
-   Sizechunk_size;
 
/* due to the keeper block set->blocks should always be valid */
Assert(set->blocks != NULL);
@@ -869,7 +868,7 @@ AllocSetAllocFromNewBlock(MemoryContext context, Size size, int

Re: Add bump memory context type and use it for tuplesorts

2024-03-15 Thread Tomas Vondra
On 3/15/24 03:21, David Rowley wrote:
> On Tue, 12 Mar 2024 at 23:57, Tomas Vondra
>  wrote:
>> Attached is an updated version of the mempool patch, modifying all the
>> memory contexts (not just AllocSet), including the bump context. And
>> then also PDF with results from the two machines, comparing results
>> without and with the mempool. There's very little impact on small reset
>> values (128kB, 1MB), but pretty massive improvements on the 8MB test
>> (where it's a 2x improvement).
> 
> I think it would be good to have something like this.  I've done some
> experiments before with something like this [1]. However, mine was
> much more trivial.
> 

Interesting. My thing is a bit more complex because it was not meant to
be a cache initially, but more a way to limit the memory allocated by a
backend (discussed in [1]), or perhaps even a smaller part of a plan.

I only added the caching after I ran into some bottlenecks [2], and
malloc turned out to be a scalability issue.

> One thing my version did was get rid of the *context* freelist stuff
> in aset.  I wondered if we'd need that anymore as, if I understand
> correctly, it's just there to stop malloc/free thrashing, which is
> what the patch aims to do anyway. Aside from that, it's now a little
> weird that aset.c has that but generation.c and slab.c do not.
> 

True. I think the "memory pool" shared by all memory contexts would be a
more principled way to do this - not only it works for all memory
context types, but it's also part of the "regular" cache eviction like
everything else (which the context freelist is not).

> One thing I found was that in btbeginscan(), we have "so =
> (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));", which on this
> machine is 27344 bytes and results in a call to AllocSetAllocLarge()
> and therefore a malloc().  Ideally, there'd be no malloc() calls in a
> standard pgbench run, at least once the rel and cat caches have been
> warmed up.
> 

Right. That's exactly what I found in [2], where it's a massive problem
with many partitions and many concurrent connections.

> I think there are a few things in your patch that could be improved,
> here's a quick review.
> 
> 1. MemoryPoolEntryIndex() could follow the lead of
> AllocSetFreeIndex(), which is quite well-tuned and has no looping. I
> think you can get rid of MemoryPoolEntrySize() and just have
> MemoryPoolEntryIndex() round up to the next power of 2.
> 
> 2. The following could use "result = Max(MEMPOOL_MIN_BLOCK,
> pg_nextpower2_size_t(size));"
> 
> + * should be very low, though (less than MEMPOOL_SIZES, i.e. 14).
> + */
> + result = MEMPOOL_MIN_BLOCK;
> + while (size > result)
> + result *= 2;
> 
> 3. "MemoryPoolFree": I wonder if this is a good name for such a
> function.  Really you want to return it to the pool. "Free" sounds
> like you're going to free() it.  I went for "Fetch" and "Release"
> which I thought was less confusing.
> 
> 4. MemoryPoolRealloc(), could this just do nothing if the old and new
> indexes are the same?
> 
> 5. It might be good to put a likely() around this:
> 
> + /* only do this once every MEMPOOL_REBALANCE_DISTANCE allocations */
> + if (pool->num_requests < MEMPOOL_REBALANCE_DISTANCE)
> + return;
> 
> Otherwise, if that function is inlined then you'll bloat the functions
> that inline it for not much good reason.  Another approach would be to
> have a static inline function which checks and calls a noinline
> function that does the work so that the rebalance stuff is never
> inlined.
> 

Yes, I agree with all of that. I was a bit lazy when doing the PoC, so I
ignored these things.

> Overall, I wonder if the rebalance stuff might make performance
> testing quite tricky.  I see:
> 
> +/*
> + * How often to rebalance the memory pool buckets (number of allocations).
> + * This is a tradeoff between the pool being adaptive and more overhead.
> + */
> +#define MEMPOOL_REBALANCE_DISTANCE 25000
> 
> Will TPS take a sudden jump after 25k transactions doing the same
> thing?  I'm not saying this shouldn't happen, but... benchmarking is
> pretty hard already. I wonder if there's something more fine-grained
> that can be done which makes the pool adapt faster but not all at
> once.  (I've not studied your algorithm for the rebalance.)
> 

I don't think so, or at least I haven't observed anything like that. My
intent was to make the rebalancing fairly frequent but incremental, with
each increment doing only a tiny amount of work.

It does not do any more malloc/free calls than without the cache - it
may just delay them a bit, and the assumption is the rest of the
rebalancing (walking the size slots and adjusting counters based on
activity since the last run) is super cheap.

So it shouldn't be the case that the rebalancing is so expensive to
cause a measurable drop in throughput, or something like that. I can
imagine spreading it even more (doing it in smaller steps), but on the
other hand the interval must not be too short - we need to do 

Re: Add bump memory context type and use it for tuplesorts

2024-03-14 Thread David Rowley
On Tue, 12 Mar 2024 at 23:57, Tomas Vondra
 wrote:
> Attached is an updated version of the mempool patch, modifying all the
> memory contexts (not just AllocSet), including the bump context. And
> then also PDF with results from the two machines, comparing results
> without and with the mempool. There's very little impact on small reset
> values (128kB, 1MB), but pretty massive improvements on the 8MB test
> (where it's a 2x improvement).

I think it would be good to have something like this.  I've done some
experiments before with something like this [1]. However, mine was
much more trivial.

One thing my version did was get rid of the *context* freelist stuff
in aset.  I wondered if we'd need that anymore as, if I understand
correctly, it's just there to stop malloc/free thrashing, which is
what the patch aims to do anyway. Aside from that, it's now a little
weird that aset.c has that but generation.c and slab.c do not.

One thing I found was that in btbeginscan(), we have "so =
(BTScanOpaque) palloc(sizeof(BTScanOpaqueData));", which on this
machine is 27344 bytes and results in a call to AllocSetAllocLarge()
and therefore a malloc().  Ideally, there'd be no malloc() calls in a
standard pgbench run, at least once the rel and cat caches have been
warmed up.

I think there are a few things in your patch that could be improved,
here's a quick review.

1. MemoryPoolEntryIndex() could follow the lead of
AllocSetFreeIndex(), which is quite well-tuned and has no looping. I
think you can get rid of MemoryPoolEntrySize() and just have
MemoryPoolEntryIndex() round up to the next power of 2.

> 2. The following could use "result = Max(MEMPOOL_MIN_BLOCK,
> pg_nextpower2_size_t(size));"

+ * should be very low, though (less than MEMPOOL_SIZES, i.e. 14).
+ */
+ result = MEMPOOL_MIN_BLOCK;
+ while (size > result)
+ result *= 2;

3. "MemoryPoolFree": I wonder if this is a good name for such a
function.  Really you want to return it to the pool. "Free" sounds
like you're going to free() it.  I went for "Fetch" and "Release"
which I thought was less confusing.

4. MemoryPoolRealloc(), could this just do nothing if the old and new
indexes are the same?

5. It might be good to put a likely() around this:

+ /* only do this once every MEMPOOL_REBALANCE_DISTANCE allocations */
+ if (pool->num_requests < MEMPOOL_REBALANCE_DISTANCE)
+ return;

Otherwise, if that function is inlined then you'll bloat the functions
that inline it for not much good reason.  Another approach would be to
have a static inline function which checks and calls a noinline
function that does the work so that the rebalance stuff is never
inlined.

Overall, I wonder if the rebalance stuff might make performance
testing quite tricky.  I see:

+/*
+ * How often to rebalance the memory pool buckets (number of allocations).
+ * This is a tradeoff between the pool being adaptive and more overhead.
+ */
+#define MEMPOOL_REBALANCE_DISTANCE 25000

Will TPS take a sudden jump after 25k transactions doing the same
thing?  I'm not saying this shouldn't happen, but... benchmarking is
pretty hard already. I wonder if there's something more fine-grained
that can be done which makes the pool adapt faster but not all at
once.  (I've not studied your algorithm for the rebalance.)

David

[1] https://github.com/david-rowley/postgres/tree/malloccache




Re: Add bump memory context type and use it for tuplesorts

2024-03-12 Thread John Naylor
On Tue, Mar 12, 2024 at 6:41 AM David Rowley  wrote:
> Thanks for trying this out.  I didn't check if the performance was
> susceptible to the memory size before the reset.  It certainly would
> be once the allocation crosses some critical threshold of CPU cache
> size, but probably it will also be to some extent regarding the number
> of actual mallocs that are required underneath.

I neglected to mention it, but the numbers I chose did have the L2/L3
cache in mind, but the reset frequency didn't seem to make much
difference.

> I see there's some discussion of bump in [1].  Do you still have a
> valid use case for bump for performance/memory usage reasons?

Yeah, that was part of my motivation for helping test, although my
interest is in saving memory in cases of lots of small allocations. It
might help if I make this a little more concrete, so I wrote a
quick-and-dirty function to measure the bytes used by the proposed TID
store and the vacuum's current array.

Using bitmaps really shines with a high number of offsets per block,
e.g. with about a million sequential blocks, and 49 offsets per block
(last parameter is a bound):

select * from tidstore_memory(0,1*1001*1000, 1,50);
 array_mem |  ts_mem
-----------+----------
 294294000 | 42008576

The break-even point with this scenario is around 7 offsets per block:

select * from tidstore_memory(0,1*1001*1000, 1,8);
 array_mem |  ts_mem
-----------+----------
  42042000 | 42008576

Below that, the array gets smaller, but the bitmap just has more empty
space. Here, 8 million bytes are used by the chunk header in bitmap
allocations, so the bump context would help there (I haven't actually
tried). Of course, the best allocation is no allocation at all, and I
have a draft patch to store up to 3 offsets in the last-level node's
pointer array, so for 2 or 3 offsets per block we're smaller than the
array again:

select * from tidstore_memory(0,1*1001*1000, 1,4);
 array_mem | ts_mem
-----------+---------
  18018000 | 8462336

Sequential blocks are not the worst case scenario for memory use, but
this gives an idea of what's involved. So, with aset, on average I
still expect to use quite a bit less memory, with some corner cases
that use more. The bump context would be some extra insurance to
reduce those corner cases, where there are a large number of blocks in
play.




Re: Add bump memory context type and use it for tuplesorts

2024-03-11 Thread David Rowley
On Mon, 11 Mar 2024 at 22:09, John Naylor  wrote:
> I ran the test function, but using 256kB and 3MB for the reset
> frequency, and with 8,16,24,32 byte sizes (patched against a commit
> after the recent hot/cold path separation). Images attached. I also
> get a decent speedup with the bump context, but not quite as dramatic
> as on your machine. It's worth noting that slab is the slowest for me.
> This is an Intel i7-10750H.

Thanks for trying this out.  I didn't check if the performance was
susceptible to the memory size before the reset.  It certainly would
be once the allocation crosses some critical threshold of CPU cache
size, but probably it will also be to some extent regarding the number
of actual mallocs that are required underneath.

I see there's some discussion of bump in [1].  Do you still have a
valid use case for bump for performance/memory usage reasons?

The reason I ask is due to what Tom mentioned in [2] ("It's not
apparent to me that the "bump context" idea is valuable enough to
foreclose ever adding more context types"). So, I'm just probing to
find other possible use cases that reinforce the usefulness of bump.
It would be interesting to try it in a few places to see what
performance gains could be had.  I've not done much scouting around
the codebase for other uses other than non-bounded tuplesorts.

David

[1] 
https://postgr.es/m/canwcazbxxhysytrpyz-wzbdtvrpwoete7rqm1g_+4cb8z6k...@mail.gmail.com
[2] https://postgr.es/m/3537323.1708125...@sss.pgh.pa.us




Re: Add bump memory context type and use it for tuplesorts

2024-03-11 Thread David Rowley
On Tue, 12 Mar 2024 at 12:25, Tomas Vondra
 wrote:
> (b) slab is considerably slower

It would be interesting to modify SlabReset() to, instead of free()ing
the blocks, push the first SLAB_MAXIMUM_EMPTY_BLOCKS of them onto the
emptyblocks list.

That might give us an idea of how much overhead comes from malloc/free.

Having something like this as an option when creating a context might
be a good idea.  generation.c now keeps 1 "freeblock" which currently
does not persist during context resets. Some memory context usages
might suit having an option like this.  Maybe something like the
executor's per-tuple context, which perhaps (c|sh)ould be a generation
context...  However, saying that, I see you measure it to be slightly
slower than aset.

David




Re: Add bump memory context type and use it for tuplesorts

2024-03-11 Thread John Naylor
On Tue, Mar 5, 2024 at 9:42 AM David Rowley  wrote:
> performance against the recently optimised aset, generation and slab
> contexts.  The attached graph shows the time it took in seconds to
> allocate 1GB of memory performing a context reset after 1MB. The
> function I ran the test on is in the attached
> pg_allocate_memory_test.patch.txt file.
>
> The query I ran was:
>
> select chksz,mtype,pg_allocate_memory_test_reset(chksz,
> 1024*1024,1024*1024*1024, mtype) from (values(8),(16),(32),(64))
> sizes(chksz),(values('aset'),('generation'),('slab'),('bump'))
> cxt(mtype) order by mtype,chksz;

I ran the test function, but using 256kB and 3MB for the reset
frequency, and with 8,16,24,32 byte sizes (patched against a commit
after the recent hot/cold path separation). Images attached. I also
get a decent speedup with the bump context, but not quite as dramatic
as on your machine. It's worth noting that slab is the slowest for me.
This is an Intel i7-10750H.


Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
On Wed, 21 Feb 2024 at 00:02, Matthias van de Meent
 wrote:
>
> On Tue, 20 Feb 2024 at 11:02, David Rowley  wrote:
> > On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
> >  wrote:
> > > >> +allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
> > > >> +if (minContextSize != 0)
> > > >> +allocSize = Max(allocSize, minContextSize);
> > > >> +else
> > > >> +allocSize = Max(allocSize, initBlockSize);
> > >
> > > Shouldn't this be the following, considering the meaning of 
> > > "initBlockSize"?
> >
> > No, we want to make the blocksize exactly initBlockSize if we can. Not
> > initBlockSize plus all the header stuff.  We do it that way for all
> > the other contexts and I agree that it's a good idea as it keeps the
> > malloc request sizes powers of 2.
>
> One part of the reason for my comment was that initBlockSize was
> ignored in favour of minContextSize if that was configured, regardless
> of the value of initBlockSize. Is it deliberately ignored when
> minContextSize is set?

Ok, it's a good question. It's to allow finer-grained control over the
initial block as it allows it to be a fixed given size without
affecting the number that we double for the subsequent blocks.

e.g BumpContextCreate(64*1024, 8*1024, 1024*1024);

would make the first block 64K and the next block 16K, followed by
32K, 64K ... 1MB.

Whereas, BumpContextCreate(0, 8*1024, 1024*1024) will start at 8K, 16K ... 1MB.

It seems useful as you might have a good idea of how much memory the
common case has and want to do that without having to allocate
subsequent blocks, but if slightly more memory is required sometimes,
you probably don't want the next malloc to be double the common size,
especially if the common size is large.

Apart from slab.c, this is how all the other contexts work.  It seems
best to keep this and not to go against the grain on this as there's
more to consider if we opt to change the context types of existing
contexts.

David




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
On Tue, 20 Feb 2024 at 23:52, Matthias van de Meent
 wrote:
> What I meant was that
>
> > (char *) block + Bump_BLOCKHDRSZ
> vs
> >  ((char *) block) + Bump_BLOCKHDRSZ
>
> , when combined with my little experience with pointer addition and
> precedence, and a lack of compiler at the ready at that point in time,
> I was afraid that "(char *) block + Bump_BLOCKHDRSZ" would be parsed
> as "(char *) (block + Bump_BLOCKHDRSZ)", which would get different
> offsets across the two statements.
> Godbolt has since helped me understand that both are equivalent.

I failed to notice this.  I've made them the same regardless to
prevent future questions from being raised about the discrepancy
between the two.

David




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread Matthias van de Meent
On Tue, 20 Feb 2024 at 11:02, David Rowley  wrote:
> On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
>  wrote:
> > >> +allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
> > >> +if (minContextSize != 0)
> > >> +allocSize = Max(allocSize, minContextSize);
> > >> +else
> > >> +allocSize = Max(allocSize, initBlockSize);
> >
> > Shouldn't this be the following, considering the meaning of "initBlockSize"?
>
> No, we want to make the blocksize exactly initBlockSize if we can. Not
> initBlockSize plus all the header stuff.  We do it that way for all
> the other contexts and I agree that it's a good idea as it keeps the
> malloc request sizes powers of 2.

One part of the reason for my comment was that initBlockSize was
ignored in favour of minContextSize if that was configured, regardless
of the value of initBlockSize. Is it deliberately ignored when
minContextSize is set?

> > >> + * BumpFree
> > >> + *Unsupported.
> > >> [...]
> > >> + * BumpRealloc
> > >> + *Unsupported.
> >
> > Rather than the error, can't we make this a no-op (potentially
> > optionally, or in a different memory context?)
>
> Unfortunately not.  There are no MemoryChunks on bump chunks so we've
> no way to determine the context type a given pointer belongs to.  I've
> left the MemoryChunk on there for debug builds so we can get the
> ERRORs to allow us to fix the broken code that is pfreeing these
> chunks.
>
> > I understand that allowing pfree/repalloc in bump contexts requires
> > each allocation to have a MemoryChunk prefix in overhead, but I think
> > it's still a valid use case to have a very low overhead allocator with
> > no-op deallocator (except context reset). Do you have performance
> > comparison results between with and without the overhead of
> > MemoryChunk?
>
> Oh right, you've taken this into account.  I was hoping not to have
> the headers, otherwise the only gain we'd see over using generation.c is
> that the allocation function is faster.
>
> [...]  The smaller the tuple, the
> more that will be noticeable as the chunk header is a larger portion
> of the overall allocation with those.

I see. Thanks for the explanation.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread Matthias van de Meent
On Tue, 20 Feb 2024 at 10:41, David Rowley  wrote:
> On Tue, 7 Nov 2023 at 07:55, Matthias van de Meent
>  wrote:
> > > +++ b/src/backend/utils/mmgr/bump.c
> > > +BumpBlockIsEmpty(BumpBlock *block)
> > > +{
> > > +/* it's empty if the freeptr has not moved */
> > > +return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
> > > [...]
> > > +static inline void
> > > +BumpBlockMarkEmpty(BumpBlock *block)
> > > +{
> > > +#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
> > > +char   *datastart = ((char *) block) + Bump_BLOCKHDRSZ;
> >
> > These two use different definitions of the start pointer. Is that 
> > deliberate?
>
> hmm, I'm not sure if I follow what you mean.  Are you talking about
> the "datastart" variable and the assignment of block->freeptr (which
> you've not quoted?)

What I meant was that

> (char *) block + Bump_BLOCKHDRSZ
vs
>  ((char *) block) + Bump_BLOCKHDRSZ

, when combined with my little experience with pointer addition and
precedence, and a lack of compiler at the ready at that point in time,
I was afraid that "(char *) block + Bump_BLOCKHDRSZ" would be parsed
as "(char *) (block + Bump_BLOCKHDRSZ)", which would get different
offsets across the two statements.
Godbolt has since helped me understand that both are equivalent.

Kind regards,

Matthias van de Meent




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
Thanks for taking an interest in this.

On Sat, 17 Feb 2024 at 11:46, Tomas Vondra
 wrote:
> I wasn't paying much attention to these memcontext reworks in 2022, so
> my instinct was simply to use one of those "UNUSED" IDs. But after
> looking at the 80ef92675823 a bit more, are those IDs really unused? I
> mean, we're relying on those values to detect bogus pointers, right? So
> if we instead start using those values for a new memory context, won't
> we lose the ability to detect those issues?

I wouldn't say we're "relying" on them.  Really they're just there to
improve debuggability.  If we call any code that tries to look at the
MemoryChunk header of a malloc'd chunk, then we can expect bad things
to happen. We no longer have any code which does this.
MemoryContextContains() did, and it's now gone.

> Maybe I'm completely misunderstanding the implication of those limits,
> but doesn't this mean the claim that we can support 8 memory context
> types is not quite true, and the limit is 4, because the 4 IDs are
> already used for malloc stuff?

I think we all expected a bit more pain from the memory context
change.  I was happy that Tom did the extra work to look at the malloc
patterns of glibc, but I think very little has gone wrong.
The reserved MemoryContextMethodIDs do seem to have allowed [1] to be
found, but I guess there'd have been a segfault instead of an ERROR
without the reserved IDs.

I've attached version 2, now split into 2 patches.

0001 for the bump allocator
0002 to use the new allocator for tuplesorts

David

[1] https://postgr.es/m/796b65c3-57b7-bddf-b0d5-a8afafb8b...@gmail.com
From 3270fbbb9bceb3b082a0b27f7fd1759c2b1c4150 Mon Sep 17 00:00:00 2001
From: David Rowley 
Date: Tue, 20 Feb 2024 21:49:57 +1300
Subject: [PATCH v3 1/2] Introduce a bump memory allocator

---
 src/backend/nodes/gen_node_support.pl |   2 +-
 src/backend/utils/mmgr/Makefile   |   1 +
 src/backend/utils/mmgr/bump.c | 747 ++
 src/backend/utils/mmgr/mcxt.c |  19 +-
 src/backend/utils/mmgr/meson.build|   1 +
 src/include/nodes/memnodes.h  |   3 +-
 src/include/utils/memutils.h  |   7 +
 src/include/utils/memutils_internal.h |  21 +-
 src/tools/pgindent/typedefs.list  |   2 +
 9 files changed, 794 insertions(+), 9 deletions(-)
 create mode 100644 src/backend/utils/mmgr/bump.c

diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 2f0a59bc87..9401f8d7a1 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -149,7 +149,7 @@ my @abstract_types = qw(Node);
 # they otherwise don't participate in node support.
 my @extra_tags = qw(
   IntList OidList XidList
-  AllocSetContext GenerationContext SlabContext
+  AllocSetContext GenerationContext SlabContext BumpContext
   TIDBitmap
   WindowObjectData
 );
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index dae3432c98..01a1fb8527 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
alignedalloc.o \
aset.o \
+   bump.o \
dsa.o \
freepage.o \
generation.o \
diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
new file mode 100644
index 00..015e66709f
--- /dev/null
+++ b/src/backend/utils/mmgr/bump.c
@@ -0,0 +1,747 @@
+/*-
+ *
+ * bump.c
+ *   Bump allocator definitions.
+ *
+ * Bump is a MemoryContext implementation designed for memory usages which
+ * require allocating a large number of chunks, none of which ever need to be
+ * pfree'd or realloc'd.  Chunks allocated by this context have no chunk header
+ * and operations which ordinarily require looking at the chunk header cannot
+ * be performed.  For example, pfree, realloc, GetMemoryChunkSpace and
+ * GetMemoryChunkContext are all not possible with bump allocated chunks.  The
+ * only way to release memory allocated by this context type is to reset or
+ * delete the context.
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/bump.c
+ *
+ *
+ * Bump is best suited to cases which require a large number of short-lived
+ * chunks where performance matters.  Because bump allocated chunks don't
+ * have a chunk header, it can fit more chunks on each block.  This means we
+ * can do more with less memory and fewer cache lines.  The reason it's best
+ * suited for short-lived usages of memory is that ideally, pointers to bump
+ * allocated chunks won't be visible to a large amount of code.  The more
+ * code that operates on memory allocated by this allocator, the more chances
+ * that some code will try to perform a pfree or one of the other operations
+ * which are made 

Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
 wrote:
> >> +allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
> >> +if (minContextSize != 0)
> >> +allocSize = Max(allocSize, minContextSize);
> >> +else
> >> +allocSize = Max(allocSize, initBlockSize);
>
> Shouldn't this be the following, considering the meaning of "initBlockSize"?

No, we want to make the blocksize exactly initBlockSize if we can. Not
initBlockSize plus all the header stuff.  We do it that way for all
the other contexts and I agree that it's a good idea as it keeps the
malloc request sizes powers of 2.

> >> + * BumpFree
> >> + *Unsupported.
> >> [...]
> >> + * BumpRealloc
> >> + *Unsupported.
>
> Rather than the error, can't we make this a no-op (potentially
> optionally, or in a different memory context?)

Unfortunately not.  There are no MemoryChunks on bump chunks so we've
no way to determine the context type a given pointer belongs to.  I've
left the MemoryChunk on there for debug builds so we can get the
ERRORs to allow us to fix the broken code that is pfreeing these
chunks.

> I understand that allowing pfree/repalloc in bump contexts requires
> each allocation to have a MemoryChunk prefix in overhead, but I think
> it's still a valid use case to have a very low overhead allocator with
> no-op deallocator (except context reset). Do you have performance
> comparison results between with and without the overhead of
> MemoryChunk?

Oh right, you've taken this into account.  I was hoping not to have
the headers, otherwise the only gain we'd see over using generation.c is
that the allocation function is faster.

I certainly did do benchmarks in [1] and saw the 338% increase due to
the reduction in memory.  That massive jump happened by accident as
the sort on tenk1 went from not fitting into default 4MB work_mem to
fitting in, so what I happened to measure there was the difference of
spilling to disk and not. The same could happen for this case, so the
overhead of having the chunk headers really depends on what the test
is. "Potentially large" is likely a good way to describe the
overhead of having chunk headers.  However, to a lesser extent, there
will be a difference for large sorts as we'll be able to fit more
tuples per batch and do fewer batches.  The smaller the tuple, the
more that will be noticeable as the chunk header is a larger portion
of the overall allocation with those.

David

[1] 
https://postgr.es/m/caaphdvoh4aszsaoyhcxkuy01qf++8jj0paw+03dk+w25tqe...@mail.gmail.com




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
On Wed, 26 Jul 2023 at 12:11, Nathan Bossart  wrote:
> I think it'd be okay to steal those bits.  AFAICT it'd complicate the
> macros in memutils_memorychunk.h a bit, but that doesn't seem like such a
> terrible price to pay to allow us to keep avoiding the glibc bit patterns.

I've not adjusted anything here and I've kept the patch using the
>128KB glibc bit pattern.  I think it was a good idea to make our
lives easier if someone came to us with a bug report, but it's not
like the reserved patterns are guaranteed to cover all malloc
implementations.  What's there is just to cover the likely candidates.
I'd like to avoid adding any bit shift instructions in the code that
decodes the hdrmask.

> > + if (base->sortopt & TUPLESORT_ALLOWBOUNDED)
> > + tuplen = GetMemoryChunkSpace(tuple);
> > + else
> > + tuplen = MAXALIGN(tuple->t_len);
>
> nitpick: I see this repeated in a few places, and I wonder if it might
> deserve a comment.

I ended up adding a macro and a comment in each location that does this.

> I haven't had a chance to try out your benchmark, but I'm hoping to do so
> in the near future.

Great. It would be good to get a 2nd opinion.

David




Re: Add bump memory context type and use it for tuplesorts

2024-02-20 Thread David Rowley
Thanks for having a look at this.

On Tue, 7 Nov 2023 at 07:55, Matthias van de Meent
 wrote:
> I think it would make sense to split the "add a bump allocator"
> changes from the "use the bump allocator in tuplesort" patches.

I've done this and will post updated patches after replying to the
other comments.

> Tangent: Do we have specific notes on worst-case memory usage of
> memory contexts with various allocation patterns? This new bump
> allocator seems to be quite efficient, but in a worst-case allocation
> pattern it can still waste about 1/3 of its allocated memory due to
> never using free space on previous blocks after an allocation didn't
> fit on that block.
> It probably isn't going to be a huge problem in general, but this
> seems like something that could be documented as a potential problem
> when you're looking for which allocator to use and compare it with
> other allocators that handle different allocation sizes more
> gracefully.

It might be a good idea to document this. The more memory allocator
types we add, the harder it is to decide which one to use when writing
new code.

> > +++ b/src/backend/utils/mmgr/bump.c
> > +BumpBlockIsEmpty(BumpBlock *block)
> > +{
> > +/* it's empty if the freeptr has not moved */
> > +return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
> > [...]
> > +static inline void
> > +BumpBlockMarkEmpty(BumpBlock *block)
> > +{
> > +#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
> > +char   *datastart = ((char *) block) + Bump_BLOCKHDRSZ;
>
> These two use different definitions of the start pointer. Is that deliberate?

hmm, I'm not sure if I follow what you mean.  Are you talking about
the "datastart" variable and the assignment of block->freeptr (which
you've not quoted?)

> > +++ b/src/include/utils/tuplesort.h
> > @@ -109,7 +109,8 @@ typedef struct TuplesortInstrumentation
> >  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
> >  * which is a separate palloc chunk --- we assume it is just one chunk and
> >  * can be freed by a simple pfree() (except during merge, when we use a
> > - * simple slab allocator).  SortTuples also contain the tuple's first key
> > + * simple slab allocator and when performing a non-bounded sort where we
> > + * use a bump allocator).  SortTuples also contain the tuple's first key
>
> I'd go with something like the following:
>
> + * ...(except during merge *where* we use a
> + * simple slab allocator, and during a non-bounded sort where we
> + * use a bump allocator).

Adjusted.




Re: Add bump memory context type and use it for tuplesorts

2024-02-17 Thread Andres Freund
Hi,

On 2024-02-16 20:10:48 -0500, Tom Lane wrote:
> Tomas Vondra  writes:
> > On 2/17/24 00:14, Tom Lane wrote:
> >> The conclusion was that the specific invalid values didn't matter as
> >> much on the other platforms as they do with glibc.  But right now you
> >> have a fifty-fifty chance that a pointer to garbage will look valid.
> >> Do we want to increase those odds?
> 
> > Not sure. The ability to detect bogus pointers seems valuable, but is
> > the difference between 4/8 and 3/8 really qualitatively different? If it
> > is, maybe we should try to increase it by simply adding a bit.
> 
> I think it'd be worth taking a fresh look at the bit allocation in the
> header word to see if we can squeeze another bit without too much
> pain.  There's basically no remaining headroom in the current design,
> and it starts to seem like we want some.

I think we could fairly easily "move" some bits around, by restricting the
maximum size of a non-external chunk (i.e. allocations coming out of a larger
block, not a separate allocation). Right now we reserve 30 bits for the offset
from the block header to the allocation.

It seems unlikely that it's ever worth having an undivided 1GB block. Even if
we wanted something that large - say because we want to use 1GB huge pages to
back the block - we could just add a few block headers every couple hundred
MBs.

Another avenue is that presumably the chunk<->block header offset always has
at least the two lower bits set to zero, so perhaps we could just shift
blockoffset right by two bits in MemoryChunkSetHdrMask() and left in
MemoryChunkGetBlock()?


> (I'm also wondering whether the palloc_aligned stuff should have been done
> some other way than by consuming a context type ID.)

Possibly, I just don't quite know how.

Greetings,

Andres Freund




Re: Add bump memory context type and use it for tuplesorts

2024-02-16 Thread Tom Lane
Tomas Vondra  writes:
> On 2/17/24 00:14, Tom Lane wrote:
>> The conclusion was that the specific invalid values didn't matter as
>> much on the other platforms as they do with glibc.  But right now you
>> have a fifty-fifty chance that a pointer to garbage will look valid.
>> Do we want to increase those odds?

> Not sure. The ability to detect bogus pointers seems valuable, but is
> the difference between 4/8 and 3/8 really qualitatively different? If it
> is, maybe we should try to increase it by simply adding a bit.

I think it'd be worth taking a fresh look at the bit allocation in the
header word to see if we can squeeze another bit without too much
pain.  There's basically no remaining headroom in the current design,
and it starts to seem like we want some.  (I'm also wondering whether
the palloc_aligned stuff should have been done some other way than
by consuming a context type ID.)

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-02-16 Thread Tomas Vondra



On 2/17/24 00:14, Tom Lane wrote:
> Tomas Vondra  writes:
>> Maybe I'm completely misunderstanding the implication of those limits,
>> but doesn't this mean the claim that we can support 8 memory context
>> types is not quite true, and the limit is 4, because the 4 IDs are
>> already used for malloc stuff?
> 
> Well, correct code would still work, but we will take a hit in our
> ability to detect bogus chunk pointers if we convert any of the four
> remaining bit-patterns to valid IDs.  That has costs for debugging.
> The particular bit patterns we left unused were calculated to make it
> likely that we could detect a malloced-instead-of-palloced chunk (at
> least with glibc); but in general, reducing the number of invalid
> patterns makes it more likely that a just-plain-bad pointer would
> escape detection.
> 
> I am less concerned about that than I was in 2022, because people
> have already had some time to flush out bugs associated with the
> GUC malloc->palloc conversion.  Still, maybe we should think harder
> about whether we can free up another ID bit before we go eating
> more ID types.  It's not apparent to me that the "bump context"
> idea is valuable enough to foreclose ever adding more context types,
> yet it will be pretty close to doing that if we commit it as-is.
> 
> If we do kick this can down the road, then I concur with eating 010
> next, as it seems the least likely to occur in glibc-malloced
> chunks.
> 

I don't know if the bump context for tuplesorts alone is worth it, but
I've been thinking it's not the only place doing something like that.
I'm aware of two other places doing this "dense allocation" - spell.c
and nodeHash.c. And in those cases it definitely made a big difference
(ofc, the question is how big the difference would be now, with all the
palloc improvements).

But maybe we could switch all those places to a proper memcontext
(instead of something built on top of a memory context) ... Of course,
the code in spell.c/nodeHash.c is quite stable, so the custom code does
not cost much.

>> One thing that confuses me a bit is that the comments introduced by
>> 80ef92675823 talk about glibc, but the related discussion in [1] talks a
>> lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).
> 
> The conclusion was that the specific invalid values didn't matter as
> much on the other platforms as they do with glibc.  But right now you
> have a fifty-fifty chance that a pointer to garbage will look valid.
> Do we want to increase those odds?
> 

Not sure. The ability to detect bogus pointers seems valuable, but is
the difference between 4/8 and 3/8 really qualitatively different? If it
is, maybe we should try to increase it by simply adding a bit.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-02-16 Thread Tom Lane
Tomas Vondra  writes:
> Maybe I'm completely misunderstanding the implication of those limits,
> but doesn't this mean the claim that we can support 8 memory context
> types is not quite true, and the limit is 4, because the 4 IDs are
> already used for malloc stuff?

Well, correct code would still work, but we will take a hit in our
ability to detect bogus chunk pointers if we convert any of the four
remaining bit-patterns to valid IDs.  That has costs for debugging.
The particular bit patterns we left unused were calculated to make it
likely that we could detect a malloced-instead-of-palloced chunk (at
least with glibc); but in general, reducing the number of invalid
patterns makes it more likely that a just-plain-bad pointer would
escape detection.

I am less concerned about that than I was in 2022, because people
have already had some time to flush out bugs associated with the
GUC malloc->palloc conversion.  Still, maybe we should think harder
about whether we can free up another ID bit before we go eating
more ID types.  It's not apparent to me that the "bump context"
idea is valuable enough to foreclose ever adding more context types,
yet it will be pretty close to doing that if we commit it as-is.

If we do kick this can down the road, then I concur with eating 010
next, as it seems the least likely to occur in glibc-malloced
chunks.

> One thing that confuses me a bit is that the comments introduced by
> 80ef92675823 talk about glibc, but the related discussion in [1] talks a
> lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).

The conclusion was that the specific invalid values didn't matter as
much on the other platforms as they do with glibc.  But right now you
have a fifty-fifty chance that a pointer to garbage will look valid.
Do we want to increase those odds?

regards, tom lane




Re: Add bump memory context type and use it for tuplesorts

2024-02-16 Thread Tomas Vondra
On 11/6/23 19:54, Matthias van de Meent wrote:
>
> ...
>
> Tangent: Do we have specific notes on worst-case memory usage of
> memory contexts with various allocation patterns? This new bump
> allocator seems to be quite efficient, but in a worst-case allocation
> pattern it can still waste about 1/3 of its allocated memory due to
> never using free space on previous blocks after an allocation didn't
> fit on that block.
> It probably isn't going to be a huge problem in general, but this
> seems like something that could be documented as a potential problem
> when you're looking for which allocator to use and compare it with
> other allocators that handle different allocation sizes more
> gracefully.
> 

I don't think it's documented anywhere, but I agree it might be an
interesting piece of information. It probably did not matter too much
when we had just AllocSet, but now we have 3 very different allocators,
so maybe we should explain this.

When implementing these allocators, it didn't feel that important,
because the new allocators started as intended for a very specific part
of the code (as in "This piece of code has a very unique allocation
pattern, let's develop a custom allocator for it."), but if we feel we
want to make it simpler to use the allocators elsewhere ...

I think there are two obvious places where to document this - either in
the header of each memory context .c file, or a README in the mmgr
directory. Or some combination of it.

At some point I was thinking about writing a "proper paper" comparing
these allocators in a more scientific / thorough way, but I never got to
do it. I wonder if that'd be interesting for enough people.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-02-16 Thread Tomas Vondra
Hi,

I wanted to take a look at this patch but it seems to need a rebase,
because of a seemingly trivial conflict in MemoryContextMethodID:

--- src/include/utils/memutils_internal.h
+++ src/include/utils/memutils_internal.h
@@ -123,12 +140,13 @@ typedef enum MemoryContextMethodID
 {
   MCTX_UNUSED1_ID,  /* 000 occurs in never-used memory */
   MCTX_UNUSED2_ID,  /* glibc malloc'd chunks usually match 001 */
-  MCTX_UNUSED3_ID,  /* glibc malloc'd chunks > 128kB match 010 */
+  MCTX_BUMP_ID,/* glibc malloc'd chunks > 128kB match 010
+ * XXX? */
   MCTX_ASET_ID,
   MCTX_GENERATION_ID,
   MCTX_SLAB_ID,
   MCTX_ALIGNED_REDIRECT_ID,
-  MCTX_UNUSED4_ID/* 111 occurs in wipe_mem'd memory */
+  MCTX_UNUSED3_ID/* 111 occurs in wipe_mem'd memory */
 } MemoryContextMethodID;


I wasn't paying much attention to these memcontext reworks in 2022, so
my instinct was simply to use one of those "UNUSED" IDs. But after
looking at the 80ef92675823 a bit more, are those IDs really unused? I
mean, we're relying on those values to detect bogus pointers, right? So
if we instead start using those values for a new memory context, won't
we lose the ability to detect those issues?

Maybe I'm completely misunderstanding the implication of those limits,
but doesn't this mean the claim that we can support 8 memory context
types is not quite true, and the limit is 4, because the 4 IDs are
already used for malloc stuff?

One thing that confuses me a bit is that the comments introduced by
80ef92675823 talk about glibc, but the related discussion in [1] talks a
lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).
So how portable are those unused IDs, actually?

Or am I just too caffeine-deprived and missing something obvious?

regards


[1] https://postgr.es/m/2910981.1665080...@sss.pgh.pa.us

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add bump memory context type and use it for tuplesorts

2024-01-25 Thread Matthias van de Meent
On Mon, 6 Nov 2023 at 19:54, Matthias van de Meent
 wrote:
>
> On Tue, 11 Jul 2023 at 01:51, David Rowley  wrote:
>>
>> On Tue, 27 Jun 2023 at 21:19, David Rowley  wrote:
>>> I've attached the bump allocator patch and also the script I used to
>>> gather the performance results in the first 2 tabs in the attached
>>> spreadsheet.
>>
>> I've attached a v2 patch which changes the BumpContext a little to
>> remove some of the fields that are not really required.  There was no
>> need for the "keeper" field as the keeper block always comes at the
>> end of the BumpContext as these are allocated in a single malloc().
>> The pointer to the "block" also isn't really needed. This is always
>> the same as the head element in the blocks dlist.

>> +++ b/src/backend/utils/mmgr/bump.c
>> [...]
>> +MemoryContext
>> +BumpContextCreate(MemoryContext parent,
>> +  const char *name,
>> +  Size minContextSize,
>> +  Size initBlockSize,
>> +  Size maxBlockSize)
>> [...]
>> +/* Determine size of initial block */
>> +allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
>> +if (minContextSize != 0)
>> +allocSize = Max(allocSize, minContextSize);
>> +else
>> +allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

+allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+Bump_CHUNKHDRSZ + initBlockSize;
+if (minContextSize != 0)
+allocSize = Max(allocSize, minContextSize);

>> + * BumpFree
>> + *Unsupported.
>> [...]
>> + * BumpRealloc
>> + *Unsupported.

Rather than the error, can't we make this a no-op (potentially
optionally, or in a different memory context?)

What I mean is, I get that it is an easy validator check that the code
that uses this context doesn't accidentally leak memory through
assumptions about pfree, but this does make this memory context
unusable for more generic operations where leaking a little memory is
preferred over the overhead of other memory contexts, as
MemoryContextReset is quite cheap in the grand scheme of things.

E.g. using a bump context in btree descent could speed up queries when
we use compare operators that do allocate memory (e.g. numeric, text),
because btree operators must not leak memory and thus always have to
manually keep track of all allocations, which can be expensive.

I understand that allowing pfree/repalloc in bump contexts requires
each allocation to have a MemoryChunk prefix in overhead, but I think
it's still a valid use case to have a very low overhead allocator with
no-op deallocator (except context reset). Do you have performance
comparison results between with and without the overhead of
MemoryChunk?



Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Add bump memory context type and use it for tuplesorts

2023-11-06 Thread Matthias van de Meent
On Tue, 11 Jul 2023 at 01:51, David Rowley  wrote:
>
> On Tue, 27 Jun 2023 at 21:19, David Rowley  wrote:
> > I've attached the bump allocator patch and also the script I used to
> > gather the performance results in the first 2 tabs in the attached
> > spreadsheet.
>
> I've attached a v2 patch which changes the BumpContext a little to
> remove some of the fields that are not really required.  There was no
> need for the "keeper" field as the keeper block always comes at the
> end of the BumpContext as these are allocated in a single malloc().
> The pointer to the "block" also isn't really needed. This is always
> the same as the head element in the blocks dlist.

Neat idea, +1.

I think it would make sense to split the "add a bump allocator"
changes from the "use the bump allocator in tuplesort" patches.

Tangent: Do we have specific notes on worst-case memory usage of
memory contexts with various allocation patterns? This new bump
allocator seems to be quite efficient, but in a worst-case allocation
pattern it can still waste about 1/3 of its allocated memory due to
never using free space on previous blocks after an allocation didn't
fit on that block.
It probably isn't going to be a huge problem in general, but this
seems like something that could be documented as a potential problem
when you're looking for which allocator to use and compare it with
other allocators that handle different allocation sizes more
gracefully.

> +++ b/src/backend/utils/mmgr/bump.c
> +BumpBlockIsEmpty(BumpBlock *block)
> +{
> +/* it's empty if the freeptr has not moved */
> +return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
> [...]
> +static inline void
> +BumpBlockMarkEmpty(BumpBlock *block)
> +{
> +#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
> +char   *datastart = ((char *) block) + Bump_BLOCKHDRSZ;

These two use different definitions of the start pointer. Is that deliberate?

> +++ b/src/include/utils/tuplesort.h
> @@ -109,7 +109,8 @@ typedef struct TuplesortInstrumentation
>  * a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
>  * which is a separate palloc chunk --- we assume it is just one chunk and
>  * can be freed by a simple pfree() (except during merge, when we use a
> - * simple slab allocator).  SortTuples also contain the tuple's first key
> + * simple slab allocator and when performing a non-bounded sort where we
> + * use a bump allocator).  SortTuples also contain the tuple's first key

I'd go with something like the following:

+ * ...(except during merge *where* we use a
+ * simple slab allocator, and during a non-bounded sort where we
+ * use a bump allocator).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Add bump memory context type and use it for tuplesorts

2023-07-25 Thread Nathan Bossart
On Tue, Jun 27, 2023 at 09:19:26PM +1200, David Rowley wrote:
> Because of all of what is mentioned above about the current state of
> tuplesort, there does not really seem to be much need to have chunk
> headers in memory we allocate for tuples at all.  Not having these
> saves us a further 8 bytes per tuple.
> 
> In the attached patch, I've added a bump memory allocator which
> allocates chunks without any chunk header.  This means the chunks
> cannot be pfree'd or realloc'd.  That seems fine for the use case for
> storing tuples in tuplesort. I've coded bump.c in such a way that when
> built with MEMORY_CONTEXT_CHECKING, we *do* have chunk headers.  That
> should allow us to pick up any bugs that are introduced by any code
> which accidentally tries to pfree a bump.c chunk.

This is a neat idea.

> In terms of performance of tuplesort, there's a small (~5-6%)
> performance gain. Not as much as I'd hoped, but I'm also doing a bit
> of other work on tuplesort to make it more efficient in terms of CPU,
> so I suspect the cache efficiency improvements might be more
> pronounced after those.

Nice.

> One thing that might need more thought is that we're running a bit low
> on MemoryContextMethodIDs. I had to use an empty slot that has a bit
> pattern like glibc malloc'd chunks sized 128kB.  Maybe it's worth
> freeing up a bit from the block offset in MemoryChunk. This is
> currently 30 bits allowing 1GB offset, but these offsets are always
> MAXALIGNED, so we could free up a couple of bits since those 2
> lowest-order bits will always be 0 anyway.

I think it'd be okay to steal those bits.  AFAICT it'd complicate the
macros in memutils_memorychunk.h a bit, but that doesn't seem like such a
terrible price to pay to allow us to keep avoiding the glibc bit patterns.

> + if (base->sortopt & TUPLESORT_ALLOWBOUNDED)
> + tuplen = GetMemoryChunkSpace(tuple);
> + else
> + tuplen = MAXALIGN(tuple->t_len);

nitpick: I see this repeated in a few places, and I wonder if it might
deserve a comment.

I haven't had a chance to try out your benchmark, but I'm hoping to do so
in the near future.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add bump memory context type and use it for tuplesorts

2023-07-10 Thread David Rowley
On Tue, 27 Jun 2023 at 21:19, David Rowley  wrote:
> I've attached the bump allocator patch and also the script I used to
> gather the performance results in the first 2 tabs in the attached
> spreadsheet.

I've attached a v2 patch which changes the BumpContext a little to
remove some of the fields that are not really required.  There was no
need for the "keeper" field as the keeper block always comes at the
end of the BumpContext as these are allocated in a single malloc().
The pointer to the "block" also isn't really needed. This is always
the same as the head element in the blocks dlist.

David
diff --git a/src/backend/nodes/gen_node_support.pl 
b/src/backend/nodes/gen_node_support.pl
index 72c7963578..a603f1d549 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -149,7 +149,7 @@ my @abstract_types = qw(Node);
 # they otherwise don't participate in node support.
 my @extra_tags = qw(
   IntList OidList XidList
-  AllocSetContext GenerationContext SlabContext
+  AllocSetContext GenerationContext SlabContext BumpContext
   TIDBitmap
   WindowObjectData
 );
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index dae3432c98..01a1fb8527 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
alignedalloc.o \
aset.o \
+   bump.o \
dsa.o \
freepage.o \
generation.o \
diff --git a/src/backend/utils/mmgr/bump.c b/src/backend/utils/mmgr/bump.c
new file mode 100644
index 00..015e66709f
--- /dev/null
+++ b/src/backend/utils/mmgr/bump.c
@@ -0,0 +1,747 @@
+/*-
+ *
+ * bump.c
+ *   Bump allocator definitions.
+ *
+ * Bump is a MemoryContext implementation designed for memory usages which
+ * require allocating a large number of chunks, none of which ever need to be
+ * pfree'd or realloc'd.  Chunks allocated by this context have no chunk header
+ * and operations which ordinarily require looking at the chunk header cannot
+ * be performed.  For example, pfree, realloc, GetMemoryChunkSpace and
+ * GetMemoryChunkContext are all not possible with bump allocated chunks.  The
+ * only way to release memory allocated by this context type is to reset or
+ * delete the context.
+ *
+ * Portions Copyright (c) 2023, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/bump.c
+ *
+ *
+ * Bump is best suited to cases which require a large number of short-lived
+ * chunks where performance matters.  Because bump allocated chunks don't
+ * have a chunk header, it can fit more chunks on each block.  This means we
+ * can do more with less memory and fewer cache lines.  The reason it's best
+ * suited for short-lived usages of memory is that ideally, pointers to bump
+ * allocated chunks won't be visible to a large amount of code.  The more
+ * code that operates on memory allocated by this allocator, the more chances
+ * that some code will try to perform a pfree or one of the other operations
+ * which are made impossible due to the lack of chunk header.  In order to
+ * detect accidental usage of the various disallowed operations, we do add
+ * a MemoryChunk chunk header in MEMORY_CONTEXT_CHECKING builds and have the
+ * various disallowed functions raise an ERROR.
+ *
+ * Allocations are MAXALIGNed.
+ *
+ *-
+ */
+
+#include "postgres.h"
+
+#include "lib/ilist.h"
+#include "port/pg_bitutils.h"
+#include "utils/memdebug.h"
+#include "utils/memutils.h"
+#include "utils/memutils_memorychunk.h"
+#include "utils/memutils_internal.h"
+
+#define Bump_BLOCKHDRSZ	MAXALIGN(sizeof(BumpBlock))
+
+/* No chunk header unless built with MEMORY_CONTEXT_CHECKING */
+#ifdef MEMORY_CONTEXT_CHECKING
+#define Bump_CHUNKHDRSZ	sizeof(MemoryChunk)
+#else
+#define Bump_CHUNKHDRSZ	0
+#endif
+
+#define Bump_CHUNK_FRACTION	8
+
+/* The keeper block is allocated in the same allocation as the set */
+#define KeeperBlock(set) ((BumpBlock *) ((char *) (set) + sizeof(BumpContext)))
+#define IsKeeperBlock(set, blk) (KeeperBlock(set) == (blk))
+
+typedef struct BumpBlock BumpBlock; /* forward reference */
+
+typedef struct BumpContext
+{
+   MemoryContextData header;   /* Standard memory-context fields */
+
+   /* Bump context parameters */
+   uint32  initBlockSize;  /* initial block size */
+   uint32  maxBlockSize;   /* maximum block size */
+   uint32  nextBlockSize;  /* next block size to allocate */
+   uint32  allocChunkLimit;/* effective chunk size limit */
+
+   dlist_head  blocks; /* list of blocks with the 
block currently
+