On Mon, 5 Dec 2022 at 23:18, John Naylor <john.nay...@enterprisedb.com> wrote:
>
>
> On Mon, Dec 5, 2022 at 3:02 PM David Rowley <dgrowle...@gmail.com> wrote:

> > Going by [2], the instructions are very different with each method, so
> > other machines with different latencies on those instructions might
> > show something different. I attached what I used to test if anyone
> > else wants a go.
>
> I get about 0.1% difference on my machine.  Both ways boil down to (on gcc) 3 
> instructions with low latency. The later ones need the prior results to 
> execute, which I think is what the XXX comment "isn't great" was referring 
> to. The new coding is more mysterious (does it do the right thing on all 
> platforms?), so I guess the original is still the way to go unless we get a 
> better idea.

I don't think it would work well on a one's complement machine, but
going by the comments above RIGHTMOST_ONE in bitmapset.c, we don't
support those anyway.  In any case, I found that it wasn't any faster
than what Andres wrote. In fact, even changing the code there to
"index = (freecount > 0);" seems to do very little to increase
performance. I do see that having 3 freelist items performs a decent
amount better than having 9. However, the workload being run may alter
that result, assuming that keeping new allocations on fuller blocks is
a winning strategy for the CPU's caches.
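
(For context, RIGHTMOST_ONE relies on the classic
isolate-the-rightmost-one-bit trick.  A minimal sketch of what the
bitmapset.c macro boils down to; the real macro casts through a signed
bitmapword type:)

/*
 * Isolate the least significant 1 bit of w.  The negation relies on
 * two's complement arithmetic: on a one's complement machine -w is
 * just ~w, so w & -w would always evaluate to zero.
 */
#define RIGHTMOST_ONE(w) ((w) & (-(w)))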

I've now done quite a bit more work on Andres' patch to try to get it
into (hopefully) something close to a committable shape.

I'm fairly happy with what's there now. It does seem to perform much
better than current master, especially so when the workload would have
caused master to continually malloc and free an entire block when
freeing and allocating a single chunk.

I've done some basic benchmarking, mostly using the attached alloc_bench patch.

If I run:

select *, round(slab_result / aset_result * 100 - 100, 1)::text || '%'
       as slab_slower_by
from (
select
    chunk_size,
    keep_chunks,
    sum(pg_allocate_memory_test(chunk_size, chunk_size * keep_chunks,
        1024*1024*1024, 'slab'))::numeric(1000,3) as slab_result,
    sum(pg_allocate_memory_test(chunk_size, chunk_size * keep_chunks,
        1024*1024*1024, 'aset'))::numeric(1000,3) as aset_result
from
    (values(1),(10),(50),(100),(200),(300),(400),(500),(1000),(2000),
           (3000),(4000),(5000),(10000)) v1(keep_chunks),
    (values(64),(128),(256),(512),(1024)) v2(chunk_size)
group by rollup(1,2)
);
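
For reference, the function's arguments (per the C code in the
attached patch) are chunk_size, keep_memory, total_alloc and
context_type, and it returns the elapsed time in seconds, so a single
data point from the grid above looks like:

select pg_allocate_memory_test(64, 64 * 100, 1024*1024*1024, 'slab');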

The results for the 64-byte chunk are shown in the attached chart.
It's not as fast as aset, but much faster than master's slab.c.  The
first blue bar of the chart extends well beyond the top of the
vertical axis: master took 1118958 ms for that test, while the
attached patch took 194 ms. The rest of the tests put the patched code
somewhere in the middle between the unpatched code and aset.c's
performance.

The benchmark I did was entirely a FIFO workload that keeps
"keep_chunks" chunks allocated at once before starting to free the
oldest ones.  I know Tomas has some LIFO and random benchmarks. I
edited that code [1] a little to add support for other context types
so that a comparison could be done more easily. However, I'm getting
very weird performance results where sometimes it runs about twice as
fast (Tomas mentioned he got this too). I'm not seeing that with my
own benchmarking function, so I'm wondering if there's something weird
going on with the benchmark itself rather than with the slab.c code.

I've likely made many more changes than I can list here, but here are
a few of the more major ones:

1. Make use of dclist for empty blocks
2. In SlabAlloc() allocate chunks from the freelist before the unused list.
3. Added output for showing information about empty blocks in the
SlabStats output.
4. Renamed the context freelists to blocklist[].  The old name was
likely just confusing things with the block-level freelist.  In any
case, it seemed weird to have freelist[0] store full blocks. Not much
free there!
5. I did a round of changing the order of the fields in SlabBlock.
This seems to affect performance quite a bit. Having the context first
seems to improve performance. Having the blocklist[] node last also
helps.
6. Removed nblocks and headerSize from SlabContext. headerSize is no
longer needed. nblocks was only really used for Asserts and
SlabIsEmpty.  I changed the Asserts to use a local count of blocks and
changed SlabIsEmpty to look at the context's mem_allocated.
7. There's now no integer division in any of the alloc and free code;
the blocklist index is computed with shifts (see the sketch after this
list).  The only time we divide by fullChunkSize is in the check
function.
8. When using a block from the emptyblock list, I changed the code to
not re-init the block. It's now used as it was left previously. This
means no longer having to build the freelist again.
9. Updated all comments to reflect the current state of the code.
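
To illustrate #7, here's a standalone sketch of what the patch's
division-free blocklist index computation boils down to.  The
constants and logic are lifted from the patch; chunksPerBlock = 33 is
just an arbitrary example value:

#include <stdio.h>

#define SLAB_BLOCKLIST_COUNT 3

/* compute the shift the same way SlabContextCreate() does */
static int
blocklist_shift(int chunksPerBlock)
{
    int     shift = 0;

    while ((chunksPerBlock >> shift) >= (SLAB_BLOCKLIST_COUNT - 1))
        shift++;
    return shift;
}

/* division-free bucket lookup, as in SlabBlocklistIndex() */
static int
blocklist_index(int nfree, int shift)
{
    /* effectively ceil(nfree / 2^shift); 0 stays 0 for full blocks */
    return (nfree + (1 << shift) - 1) >> shift;
}

int
main(void)
{
    int     shift = blocklist_shift(33);

    /* prints: shift=5 idx(0)=0 idx(1)=1 idx(32)=1 idx(33)=2 */
    printf("shift=%d idx(0)=%d idx(1)=%d idx(32)=%d idx(33)=%d\n",
           shift, blocklist_index(0, shift), blocklist_index(1, shift),
           blocklist_index(32, shift), blocklist_index(33, shift));
    return 0;
}

So full blocks always land on blocklist[0], and the partially-full
blocks are spread over the remaining SLAB_BLOCKLIST_COUNT - 1 lists.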

Some things I thought about but didn't do:
a. Change the size of SlabContext's chunkSize, fullChunkSize and
blockSize to be uint32 instead of Size.  It might be possible to get
SlabContext below 128 bytes with a bit more work.
b. I could have done a bit more experimentation with unlikely() and
likely() to move less frequently accessed code off into a cold area.

For #2 above, I didn't really see much change in performance when I
swapped the order of what we allocate from first. I expected the free
chunks to be the better choice, as they've been used before and so
seem more likely to be in some CPU cache than one of the unused
chunks.  I might need a different allocation pattern than the one I
used to highlight that fact though.
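
To make that concrete, the per-chunk choice in SlabAlloc() after the
patch boils down to roughly the following.  This is a simplified
sketch, not the actual function: asserts, valgrind hooks and the
blocklist moves are omitted, and it assumes the SlabContext/SlabBlock
definitions and the SlabChunkGetPointer macro from the patched slab.c:

static MemoryChunk *
slab_get_chunk(SlabContext *slab, SlabBlock *block)
{
    MemoryChunk *chunk;

    if (block->freehead != NULL)
    {
        /*
         * Pop the head of the block's free list.  The pointer to the
         * next free chunk is stored in the free chunk's own memory.
         */
        chunk = block->freehead;
        block->freehead = *(MemoryChunk **) SlabChunkGetPointer(chunk);
    }
    else
    {
        /* consume the next chunk which has never been handed out */
        chunk = block->unused;
        block->unused = (MemoryChunk *)
            ((char *) block->unused + slab->fullChunkSize);
        block->nunused--;
    }

    block->nfree--;
    return chunk;
}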

David

[1] https://github.com/david-rowley/postgres/tree/alloc_bench_contrib
From b970576654bbce5c57690b8ab49d0b4376d1b5d3 Mon Sep 17 00:00:00 2001
From: David Rowley <dgrow...@gmail.com>
Date: Wed, 12 Oct 2022 09:30:24 +1300
Subject: [PATCH v4] Improve the performance of the slab memory allocator

Author: Andres Freund, David Rowley
Discussion: 
https://postgr.es/m/20210717194333.mr5io3zup3kxa...@alap3.anarazel.de
---
 src/backend/utils/mmgr/slab.c | 750 ++++++++++++++++++++++------------
 1 file changed, 490 insertions(+), 260 deletions(-)

diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index 6df0839b6a..2b95a6c061 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -3,8 +3,8 @@
  * slab.c
  *       SLAB allocator definitions.
  *
- * SLAB is a MemoryContext implementation designed for cases where large
- * numbers of equally-sized objects are allocated (and freed).
+ * SLAB is a MemoryContext implementation designed for cases where large
+ * numbers of equally-sized objects can be allocated and freed efficiently.
  *
  *
  * Portions Copyright (c) 2017-2022, PostgreSQL Global Development Group
@@ -19,33 +19,41 @@
  *     into chunks of exactly the right size (plus alignment), not wasting any
  *     memory.
  *
- *     The information about free chunks is maintained both at the block level 
and
- *     global (context) level. This is possible as the chunk size (and thus 
also
- *     the number of chunks per block) is fixed.
+ *     Slab can also help reduce memory fragmentation in cases where chunks
+ *     remain allocated on blocks with few or no other allocated chunks.  If this
+ *     happens then the entire block cannot be freed until the last remaining
+ *     chunk is freed.  Slab works around this problem by placing newly
+ *     allocated chunks on the fullest blocks first.
+ *     This makes it more likely that blocks with only a small number of
+ *     remaining chunks eventually get freed when the final remaining chunk is
+ *     freed.
  *
- *     On each block, free chunks are tracked in a simple linked list. Contents
- *     of free chunks is replaced with an index of the next free chunk, forming
- *     a very simple linked list. Each block also contains a counter of free
- *     chunks. Combined with the local block-level freelist, it makes it 
trivial
- *     to eventually free the whole block.
+ *     We are easily able to find the fullest block to store a new chunk on as 
we
+ *     maintain a list of all blocks and partition that list by the block's
+ *     number of free chunks.  This does mean having to possibly move the block
+ *     onto another list whenever we allocate or free a chunk from it.  
However,
+ *     this problem is significantly reduced by only having a small fixed 
number
+ *     of lists for blocks, each of which allows storage of blocks with ranges 
of
+ *     free chunks.  The block only needs to be moved to another list when the
+ *     number of free chunks crosses the range boundary with another block 
list.
  *
- *     At the context level, we use 'freelist' to track blocks ordered by 
number
- *     of free chunks, starting with blocks having a single allocated chunk, 
and
- *     with completely full blocks on the tail.
+ *     Within each block, we maintain a list of chunks which are free to be 
used
+ *     by new allocations.  This is done in the form of a linked list where we
+ *     store a pointer to the MemoryChunk that's next in the free list within 
the
+ *     previous chunk's memory.  The first such chunk is pointed to by the
+ *     block's "freehead" pointer.
  *
- *     This also allows various optimizations - for example when searching for
- *     free chunk, the allocator reuses space from the fullest blocks first, in
- *     the hope that some of the less full blocks will get completely empty 
(and
- *     returned back to the OS).
- *
- *     For each block, we maintain pointer to the first free chunk - this is 
quite
- *     cheap and allows us to skip all the preceding used chunks, eliminating
- *     a significant number of lookups in many common usage patterns. In the 
worst
- *     case this performs as if the pointer was not maintained.
- *
- *     We cache the freelist index for the blocks with the fewest free chunks
- *     (minFreeChunks), so that we don't have to search the freelist on every
- *     SlabAlloc() call, which is quite expensive.
+ *     When we allocate a new block, technically all chunks are free, however, 
to
+ *     avoid having to write out the entire block to set the linked list for 
the
+ *     free chunks for every chunk in the block, we instead store a pointer to
+ *     the next "unused" chunk on the block and keep track of how many of these
+ *     unused chunks there are.  In a newly allocated block, instead of
+ *     populating the block's freelist, we simply consume the first unused 
chunk.
+ *     It's only when these chunks are later freed that they go onto the 
block's
+ *     freelist.  When a block has both unused chunks and chunks on the 
freelist,
+ *     we give priority to using the freelist chunks as the memory of these
+ *     chunks is more likely to be in the CPU's caches as they've previously
+ *     been used, whereas the unused ones have not.
  *
  *-------------------------------------------------------------------------
  */
@@ -60,6 +68,26 @@
 
 #define Slab_BLOCKHDRSZ        MAXALIGN(sizeof(SlabBlock))
 
+#ifdef MEMORY_CONTEXT_CHECKING
+/*
+ * Size of the memory required to store the SlabContext.
+ * MEMORY_CONTEXT_CHECKING builds need some extra space for the freechunks
+ * array.
+ */
+#define Slab_CONTEXT_HDRSZ(cpb)        (sizeof(SlabContext) + ((cpb) * 
sizeof(bool)))
+#else
+#define Slab_CONTEXT_HDRSZ(cpb)        sizeof(SlabContext)
+#endif
+
+/*
+ * The number of partitions to divide the used blocks list into based on
+ * their number of free chunks.  There must be at least 2.
+ */
+#define SLAB_BLOCKLIST_COUNT 3
+
+/* The maximum number of completely empty blocks to keep around to recycle */
+#define SLAB_MAXIMUM_EMPTY_BLOCKS 10
+
 /*
  * SlabContext is a specialized implementation of MemoryContext.
  */
@@ -67,56 +95,94 @@ typedef struct SlabContext
 {
        MemoryContextData header;       /* Standard memory-context fields */
        /* Allocation parameters for this context: */
-       Size            chunkSize;              /* chunk size */
-       Size            fullChunkSize;  /* chunk size including header and 
alignment */
-       Size            blockSize;              /* block size */
-       Size            headerSize;             /* allocated size of context 
header */
-       int                     chunksPerBlock; /* number of chunks per block */
-       int                     minFreeChunks;  /* min number of free chunks in 
any block */
-       int                     nblocks;                /* number of blocks 
allocated */
+       Size            chunkSize;              /* the requested (non-aligned) 
chunk size */
+       Size            fullChunkSize;  /* chunk size with chunk header and 
alignment */
+       Size            blockSize;              /* the size to make each block 
of chunks */
+       int32           chunksPerBlock; /* number of chunks per block */
+       int32           curBlocklistIndex;      /* index into the blocklist[] 
element
+                                                                        * 
containing the fullest blocks */
 #ifdef MEMORY_CONTEXT_CHECKING
        bool       *freechunks;         /* bitmap of free chunks in a block */
 #endif
-       /* blocks with free space, grouped by number of free chunks: */
-       dlist_head      freelist[FLEXIBLE_ARRAY_MEMBER];
+
+       int32           blocklist_shift;        /* number of bits to shift the 
nfree count
+                                                                        * by 
to get the index into blocklist[] */
+       dclist_head emptyblocks;        /* Up to SLAB_MAXIMUM_EMPTY_BLOCKS 
blocks
+                                                                * which are 
empty and ready to be reused */
+
+       /*
+        * Blocks with free space, grouped by the number of free chunks they
+        * contain.  Completely full blocks are stored in the 0th element.
+        * Completely empty blocks are freed or stored in emptyblocks.
+        */
+       dlist_head      blocklist[SLAB_BLOCKLIST_COUNT];
 } SlabContext;
 
 /*
  * SlabBlock
- *             Structure of a single block in SLAB allocator.
+ *             Structure of a single slab block.
  *
- * node: doubly-linked list of blocks in global freelist
- * nfree: number of free chunks in this block
- * firstFreeChunk: index of the first free chunk
+ * slab: pointer back to the owning MemoryContext
+ * nfree: number of chunks on the block which are unallocated
+ * nunused: number of chunks on the block which are unallocated and not on
+ * the block's freelist.
+ * freehead: linked-list header storing a pointer to the first free chunk on
+ * the block.  Subsequent pointers are stored in the chunk's memory.  NULL
+ * indicates the end of the list.
+ * unused: pointer to the next chunk which has yet to be used.
+ * node: doubly-linked list node for the context's blocklist
  */
 typedef struct SlabBlock
 {
-       dlist_node      node;                   /* doubly-linked list */
-       int                     nfree;                  /* number of free 
chunks */
-       int                     firstFreeChunk; /* index of the first free 
chunk in the block */
        SlabContext *slab;                      /* owning context */
+       int32           nfree;                  /* number of chunks on freelist 
+ unused */
+       int32           nunused;                /* number of unused chunks */
+       MemoryChunk *freehead;          /* pointer to the first free chunk */
+       MemoryChunk *unused;            /* pointer to the next unused chunk */
+       dlist_node      node;                   /* doubly-linked list for 
blocklist[] */
 } SlabBlock;
 
 
 #define Slab_CHUNKHDRSZ sizeof(MemoryChunk)
-#define SlabPointerGetChunk(ptr)       \
-       ((MemoryChunk *)(((char *)(ptr)) - sizeof(MemoryChunk)))
 #define SlabChunkGetPointer(chk)       \
-       ((void *)(((char *)(chk)) + sizeof(MemoryChunk)))
-#define SlabBlockGetChunk(slab, block, idx) \
+       ((void *) (((char *) (chk)) + sizeof(MemoryChunk)))
+
+/*
+ * SlabBlockGetChunk
+ *             Obtain a pointer to the Nth chunk in the block
+ */
+#define SlabBlockGetChunk(slab, block, N) \
        ((MemoryChunk *) ((char *) (block) + Slab_BLOCKHDRSZ    \
-                                       + (idx * slab->fullChunkSize)))
-#define SlabBlockStart(block)  \
-       ((char *) block + Slab_BLOCKHDRSZ)
+                                       + ((N) * (slab)->fullChunkSize)))
+
+#ifdef MEMORY_CONTEXT_CHECKING
+
+/*
+ * SlabChunkIndex
+ *             Get the 0-based index of how many chunks into the block the 
given
+ *             chunk is.
+*/
 #define SlabChunkIndex(slab, block, chunk)     \
-       (((char *) chunk - SlabBlockStart(block)) / slab->fullChunkSize)
+       (((char *) (chunk) - (char *) SlabBlockGetChunk(slab, block, 0)) / \
+       (slab)->fullChunkSize)
+
+/*
+ * SlabChunkMod
+ *             A MemoryChunk should always be at an address which is a 
multiple of
+ *             fullChunkSize starting from the 0th chunk position.  This will 
return
+ *             non-zero if it's not.
+ */
+#define SlabChunkMod(slab, block, chunk)       \
+       (((char *) (chunk) - (char *) SlabBlockGetChunk(slab, block, 0)) % \
+       (slab)->fullChunkSize)
+
+#endif
 
 /*
  * SlabIsValid
  *             True iff set is valid slab allocation set.
  */
-#define SlabIsValid(set) \
-       (PointerIsValid(set) && IsA(set, SlabContext))
+#define SlabIsValid(set) (PointerIsValid(set) && IsA(set, SlabContext))
 
 /*
  * SlabBlockIsValid
@@ -125,6 +191,59 @@ typedef struct SlabBlock
 #define SlabBlockIsValid(block) \
        (PointerIsValid(block) && SlabIsValid((block)->slab))
 
+/*
+ * SlabBlocklistIndex
+ *             Determine the blocklist index that a block should be in for the 
given
+ *             number of free chunks.
+ */
+static inline int32
+SlabBlocklistIndex(SlabContext *slab, int nfree)
+{
+       int32           index;
+       int32           blocklist_shift = slab->blocklist_shift;
+
+       Assert(nfree <= slab->chunksPerBlock);
+
+       /*
+        * Determine the blocklist index based on the number of free chunks.  We
+        * must ensure that 0 free chunks is dedicated to index 0.  Everything 
else
+        * must be >= 1 and < SLAB_BLOCKLIST_COUNT.
+        */
+       index = (nfree + (1 << blocklist_shift) - 1) >> blocklist_shift;
+
+       if (nfree == 0)
+               Assert(index == 0);
+       else
+               Assert(index >= 1 && index < SLAB_BLOCKLIST_COUNT);
+
+       return index;
+}
+
+/*
+ * SlabFindNextBlockListIndex
+ *             Search the blocklists for one containing at least one block with
+ *             free chunks and return its index.  If no such blocklist exists,
+ *             return 0.
+ *
+ * Note: We give priority to fuller blocks so that these are filled before
+ * emptier blocks.  This is done to increase the chances that mostly-empty
+ * blocks will eventually become completely empty so they can be freed.
+ */
+static int32
+SlabFindNextBlockListIndex(SlabContext *slab)
+{
+       /* start at 1.  blocklist[0] is for full blocks. */
+       for (int i = 1; i < SLAB_BLOCKLIST_COUNT; i++)
+       {
+               if (dlist_is_empty(&slab->blocklist[i]))
+                       continue;
+
+               return i;
+       }
+
+       /* no blocks with free space */
+       return 0;
+}
 
 /*
  * SlabContextCreate
@@ -145,8 +264,6 @@ SlabContextCreate(MemoryContext parent,
 {
        int                     chunksPerBlock;
        Size            fullChunkSize;
-       Size            freelistSize;
-       Size            headerSize;
        SlabContext *slab;
        int                     i;
 
@@ -155,11 +272,14 @@ SlabContextCreate(MemoryContext parent,
                                         "sizeof(MemoryChunk) is not 
maxaligned");
        Assert(MAXALIGN(chunkSize) <= MEMORYCHUNK_MAX_VALUE);
 
-       /* Make sure the linked list node fits inside a freed chunk */
-       if (chunkSize < sizeof(int))
-               chunkSize = sizeof(int);
+       /*
+        * Ensure there's enough space to store the pointer to the next free 
chunk
+        * in the memory of the (otherwise) unused allocation.
+        */
+       if (chunkSize < sizeof(MemoryChunk *))
+               chunkSize = sizeof(MemoryChunk *);
 
-       /* chunk, including SLAB header (both addresses nicely aligned) */
+       /* length of the maxaligned chunk including the chunk header */
 #ifdef MEMORY_CONTEXT_CHECKING
        /* ensure there's always space for the sentinel byte */
        fullChunkSize = Slab_CHUNKHDRSZ + MAXALIGN(chunkSize + 1);
@@ -167,36 +287,17 @@ SlabContextCreate(MemoryContext parent,
        fullChunkSize = Slab_CHUNKHDRSZ + MAXALIGN(chunkSize);
 #endif
 
-       /* Make sure the block can store at least one chunk. */
-       if (blockSize < fullChunkSize + Slab_BLOCKHDRSZ)
-               elog(ERROR, "block size %zu for slab is too small for %zu 
chunks",
-                        blockSize, chunkSize);
-
-       /* Compute maximum number of chunks per block */
+       /* compute the number of chunks that will fit on each block */
        chunksPerBlock = (blockSize - Slab_BLOCKHDRSZ) / fullChunkSize;
 
-       /* The freelist starts with 0, ends with chunksPerBlock. */
-       freelistSize = sizeof(dlist_head) * (chunksPerBlock + 1);
-
-       /*
-        * Allocate the context header.  Unlike aset.c, we never try to combine
-        * this with the first regular block; not worth the extra complication.
-        */
+       /* Make sure the block can store at least one chunk. */
+       if (chunksPerBlock == 0)
+               elog(ERROR, "block size %zu for slab is too small for %zu-byte 
chunks",
+                        blockSize, chunkSize);
 
-       /* Size of the memory context header */
-       headerSize = offsetof(SlabContext, freelist) + freelistSize;
 
-#ifdef MEMORY_CONTEXT_CHECKING
 
-       /*
-        * With memory checking, we need to allocate extra space for the bitmap 
of
-        * free chunks. The bitmap is an array of bools, so we don't need to 
worry
-        * about alignment.
-        */
-       headerSize += chunksPerBlock * sizeof(bool);
-#endif
-
-       slab = (SlabContext *) malloc(headerSize);
+       slab = (SlabContext *) malloc(Slab_CONTEXT_HDRSZ(chunksPerBlock));
        if (slab == NULL)
        {
                MemoryContextStats(TopMemoryContext);
@@ -216,19 +317,33 @@ SlabContextCreate(MemoryContext parent,
        slab->chunkSize = chunkSize;
        slab->fullChunkSize = fullChunkSize;
        slab->blockSize = blockSize;
-       slab->headerSize = headerSize;
        slab->chunksPerBlock = chunksPerBlock;
-       slab->minFreeChunks = 0;
-       slab->nblocks = 0;
+       slab->curBlocklistIndex = 0;
 
-       /* initialize the freelist slots */
-       for (i = 0; i < (slab->chunksPerBlock + 1); i++)
-               dlist_init(&slab->freelist[i]);
+       /*
+        * Compute a shift that guarantees that shifting chunksPerBlock with it 
is
+        * < SLAB_BLOCKLIST_COUNT - 1.  The reason that we subtract 1 from
+        * SLAB_BLOCKLIST_COUNT in this calculation is that we reserve the 0th
+        * blocklist element for blocks which have no free chunks.
+        *
+        * We calculate the number of bits to shift by rather than a divisor to
+        * divide by as performing division each time we need to find the
+        * blocklist index would be much slower.
+        */
+       slab->blocklist_shift = 0;
+       while ((slab->chunksPerBlock >> slab->blocklist_shift) >= 
(SLAB_BLOCKLIST_COUNT - 1))
+               slab->blocklist_shift++;
+
+       /* initialize the list to store empty blocks to be recycled */
+       dclist_init(&slab->emptyblocks);
+
+       /* initialize the blocklist slots */
+       for (i = 0; i < SLAB_BLOCKLIST_COUNT; i++)
+               dlist_init(&slab->blocklist[i]);
 
 #ifdef MEMORY_CONTEXT_CHECKING
-       /* set the freechunks pointer right after the freelists array */
-       slab->freechunks
-               = (bool *) slab + offsetof(SlabContext, freelist) + 
freelistSize;
+       /* set the freechunks pointer right after the end of the context */
+       slab->freechunks = (bool *) ((char *) slab + sizeof(SlabContext));
 #endif
 
        /* Finally, do the type-independent part of context creation */
@@ -252,6 +367,7 @@ void
 SlabReset(MemoryContext context)
 {
        SlabContext *slab = (SlabContext *) context;
+       dlist_mutable_iter miter;
        int                     i;
 
        Assert(SlabIsValid(slab));
@@ -261,12 +377,24 @@ SlabReset(MemoryContext context)
        SlabCheck(context);
 #endif
 
-       /* walk over freelists and free the blocks */
-       for (i = 0; i <= slab->chunksPerBlock; i++)
+       /* release any retained empty blocks */
+       dclist_foreach_modify(miter, &slab->emptyblocks)
        {
-               dlist_mutable_iter miter;
+               SlabBlock  *block = dlist_container(SlabBlock, node, miter.cur);
+
+               dclist_delete_from(&slab->emptyblocks, miter.cur);
 
-               dlist_foreach_modify(miter, &slab->freelist[i])
+#ifdef CLOBBER_FREED_MEMORY
+               wipe_mem(block, slab->blockSize);
+#endif
+               free(block);
+               context->mem_allocated -= slab->blockSize;
+       }
+
+       /* walk over blocklist and free the blocks */
+       for (i = 0; i < SLAB_BLOCKLIST_COUNT; i++)
+       {
+               dlist_foreach_modify(miter, &slab->blocklist[i])
                {
                        SlabBlock  *block = dlist_container(SlabBlock, node, 
miter.cur);
 
@@ -276,14 +404,12 @@ SlabReset(MemoryContext context)
                        wipe_mem(block, slab->blockSize);
 #endif
                        free(block);
-                       slab->nblocks--;
                        context->mem_allocated -= slab->blockSize;
                }
        }
 
-       slab->minFreeChunks = 0;
+       slab->curBlocklistIndex = 0;
 
-       Assert(slab->nblocks == 0);
        Assert(context->mem_allocated == 0);
 }
 
@@ -311,12 +437,12 @@ SlabAlloc(MemoryContext context, Size size)
        SlabContext *slab = (SlabContext *) context;
        SlabBlock  *block;
        MemoryChunk *chunk;
-       int                     idx;
 
        Assert(SlabIsValid(slab));
 
-       Assert((slab->minFreeChunks >= 0) &&
-                  (slab->minFreeChunks < slab->chunksPerBlock));
+       /* sanity check that this is pointing to a valid blocklist */
+       Assert(slab->curBlocklistIndex >= 0);
+       Assert(slab->curBlocklistIndex <= SlabBlocklistIndex(slab, 
slab->chunksPerBlock));
 
        /* make sure we only allow correct request size */
        if (size != slab->chunkSize)
@@ -324,114 +450,153 @@ SlabAlloc(MemoryContext context, Size size)
                         size, slab->chunkSize);
 
        /*
-        * If there are no free chunks in any existing block, create a new block
-        * and put it to the last freelist bucket.
-        *
-        * slab->minFreeChunks == 0 means there are no blocks with free chunks,
-        * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+        * Handle the case when there are no partially filled blocks available.
+        * SlabFree() will have updated the curBlocklistIndex setting it to zero
+        * to indicate that it has freed the final block.  Also later in
+        * SlabAlloc() we will set the curBlocklistIndex to zero if we end up
+        * filling the final block.
         */
-       if (slab->minFreeChunks == 0)
+       if (unlikely(slab->curBlocklistIndex == 0))
        {
-               block = (SlabBlock *) malloc(slab->blockSize);
+               dlist_head *blocklist;
+               int                     blocklist_idx;
 
-               if (block == NULL)
-                       return NULL;
+               /* to save allocating a new one, first check the empty blocks 
list */
+               if (dclist_count(&slab->emptyblocks) > 0)
+               {
+                       dlist_node *node = 
dclist_pop_head_node(&slab->emptyblocks);
 
-               block->nfree = slab->chunksPerBlock;
-               block->firstFreeChunk = 0;
-               block->slab = slab;
+                       block = dlist_container(SlabBlock, node, node);
 
-               /*
-                * Put all the chunks on a freelist. Walk the chunks and point 
each
-                * one to the next one.
-                */
-               for (idx = 0; idx < slab->chunksPerBlock; idx++)
-               {
-                       chunk = SlabBlockGetChunk(slab, block, idx);
-                       *(int32 *) MemoryChunkGetPointer(chunk) = (idx + 1);
+                       /*
+                        * SlabFree() should have left this block in a valid 
state with
+                        * all chunks free.  Ensure that's the case.
+                        */
+                       Assert(block->nfree == slab->chunksPerBlock);
+
+                       if (block->freehead != NULL)
+                       {
+                               chunk = block->freehead;
+
+                               /*
+                                * Pop the chunk from the linked list of free 
chunks.  The
+                                * pointer to the next free chunk is stored in 
the chunk
+                                * itself.
+                                */
+                               
VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(MemoryChunk *));
+                               block->freehead = *(MemoryChunk **) 
SlabChunkGetPointer(chunk);
+
+                               Assert(block->freehead == NULL ||
+                                          ((char *) block->freehead >= (char 
*) block &&
+                                               (char *) block->freehead < 
(char *) block + slab->blockSize &&
+                                               SlabChunkMod(slab, block, 
block->freehead) == 0));
+                       }
+                       else
+                       {
+                               Assert(block->nunused > 0);
+
+                               chunk = block->unused;
+                               block->unused = (MemoryChunk *) (((char *) 
block->unused) + slab->fullChunkSize);
+                               block->nunused--;
+                       }
                }
+               else
+               {
+                       block = (SlabBlock *) malloc(slab->blockSize);
 
-               /*
-                * And add it to the last freelist with all chunks empty.
-                *
-                * We know there are no blocks in the freelist, otherwise we 
wouldn't
-                * need a new block.
-                */
-               Assert(dlist_is_empty(&slab->freelist[slab->chunksPerBlock]));
+                       if (unlikely(block == NULL))
+                               return NULL;
 
-               dlist_push_head(&slab->freelist[slab->chunksPerBlock], 
&block->node);
+                       block->slab = slab;
+                       context->mem_allocated += slab->blockSize;
 
-               slab->minFreeChunks = slab->chunksPerBlock;
-               slab->nblocks += 1;
-               context->mem_allocated += slab->blockSize;
-       }
+                       /* use the first chunk in the new block */
+                       chunk = SlabBlockGetChunk(slab, block, 0);
 
-       /* grab the block from the freelist (even the new block is there) */
-       block = dlist_head_element(SlabBlock, node,
-                                                          
&slab->freelist[slab->minFreeChunks]);
+                       block->nfree = slab->chunksPerBlock;
+                       block->unused = SlabBlockGetChunk(slab, block, 1);
+                       block->freehead = NULL;
+                       block->nunused = slab->chunksPerBlock - 1;
+               }
 
-       /* make sure we actually got a valid block, with matching nfree */
-       Assert(block != NULL);
-       Assert(slab->minFreeChunks == block->nfree);
-       Assert(block->nfree > 0);
+               /* find the blocklist element for storing blocks with 1 used 
chunk */
+               blocklist_idx = SlabBlocklistIndex(slab, slab->chunksPerBlock - 
1);
+               blocklist = &slab->blocklist[blocklist_idx];
 
-       /* we know index of the first free chunk in the block */
-       idx = block->firstFreeChunk;
+               /* this better be empty.  We just added a block thinking it was 
*/
+               Assert(dlist_is_empty(blocklist));
 
-       /* make sure the chunk index is valid, and that it's marked as empty */
-       Assert((idx >= 0) && (idx < slab->chunksPerBlock));
+               dlist_push_head(blocklist, &block->node);
 
-       /* compute the chunk location block start (after the block header) */
-       chunk = SlabBlockGetChunk(slab, block, idx);
+               slab->curBlocklistIndex = blocklist_idx;
+       }
+       else
+       {
+               dlist_head *blocklist = 
&slab->blocklist[slab->curBlocklistIndex];
+               int                     new_blocklist_idx;
 
-       /*
-        * Update the block nfree count, and also the minFreeChunks as we've
-        * decreased nfree for a block with the minimum number of free chunks
-        * (because that's how we chose the block).
-        */
-       block->nfree--;
-       slab->minFreeChunks = block->nfree;
+               Assert(!dlist_is_empty(blocklist));
 
-       /*
-        * Remove the chunk from the freelist head. The index of the next free
-        * chunk is stored in the chunk itself.
-        */
-       VALGRIND_MAKE_MEM_DEFINED(MemoryChunkGetPointer(chunk), sizeof(int32));
-       block->firstFreeChunk = *(int32 *) MemoryChunkGetPointer(chunk);
+               /* grab the block from the blocklist */
+               block = dlist_head_element(SlabBlock, node, blocklist);
 
-       Assert(block->firstFreeChunk >= 0);
-       Assert(block->firstFreeChunk <= slab->chunksPerBlock);
+               /* make sure we actually got a valid block, with matching nfree 
*/
+               Assert(block != NULL);
+               Assert(slab->curBlocklistIndex == SlabBlocklistIndex(slab, 
block->nfree));
+               Assert(block->nfree > 0);
 
-       Assert((block->nfree != 0 &&
-                       block->firstFreeChunk < slab->chunksPerBlock) ||
-                  (block->nfree == 0 &&
-                       block->firstFreeChunk == slab->chunksPerBlock));
+               /*
+                * Use previously free'd chunks first. Unused chunks are less 
likely
+                * to be cached by the CPU.
+                */
+               if (block->freehead != NULL)
+               {
+                       chunk = block->freehead;
 
-       /* move the whole block to the right place in the freelist */
-       dlist_delete(&block->node);
-       dlist_push_head(&slab->freelist[block->nfree], &block->node);
+                       /*
+                        * Pop the chunk from the linked list of free chunks.  
The pointer
+                        * to the next free chunk is stored in the chunk itself.
+                        */
+                       VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), 
sizeof(MemoryChunk *));
+                       block->freehead = *(MemoryChunk **) 
SlabChunkGetPointer(chunk);
 
-       /*
-        * And finally update minFreeChunks, i.e. the index to the block with 
the
-        * lowest number of free chunks. We only need to do that when the block
-        * got full (otherwise we know the current block is the right one). 
We'll
-        * simply walk the freelist until we find a non-empty entry.
-        */
-       if (slab->minFreeChunks == 0)
-       {
-               for (idx = 1; idx <= slab->chunksPerBlock; idx++)
+                       Assert(block->freehead == NULL ||
+                                  ((char *) block->freehead >= (char *) block 
&&
+                                       (char *) block->freehead < (char *) 
block + slab->blockSize &&
+                                       SlabChunkMod(slab, block, 
block->freehead) == 0));
+               }
+               else
                {
-                       if (dlist_is_empty(&slab->freelist[idx]))
-                               continue;
+                       Assert(block->nunused > 0);
 
-                       /* found a non-empty freelist */
-                       slab->minFreeChunks = idx;
-                       break;
+                       chunk = block->unused;
+                       block->unused = (MemoryChunk *) (((char *) 
block->unused) + slab->fullChunkSize);
+                       block->nunused--;
+               }
+
+               /* get the new blocklist index based on the new free chunk 
count */
+               new_blocklist_idx = SlabBlocklistIndex(slab, block->nfree - 1);
+
+               /*
+                * Handle the case where the blocklist index changes.  This 
also deals
+                * with blocks becoming full as only full blocks go at index 0.
+                */
+               if (unlikely(slab->curBlocklistIndex != new_blocklist_idx))
+               {
+                       dlist_delete_from(blocklist, &block->node);
+                       dlist_push_head(&slab->blocklist[new_blocklist_idx], 
&block->node);
+
+                       if (dlist_is_empty(blocklist))
+                               slab->curBlocklistIndex = 
SlabFindNextBlockListIndex(slab);
                }
        }
 
-       if (slab->minFreeChunks == slab->chunksPerBlock)
-               slab->minFreeChunks = 0;
+       /* check that the chunk pointer is actually somewhere on the block */
+       Assert(chunk >= SlabBlockGetChunk(slab, block, 0));
+       Assert(chunk <= SlabBlockGetChunk(slab, block, slab->chunksPerBlock));
+
+       /* update the block nfree count */
+       block->nfree--;
 
        /* Prepare to initialize the chunk header. */
        VALGRIND_MAKE_MEM_UNDEFINED(chunk, Slab_CHUNKHDRSZ);
@@ -453,8 +618,6 @@ SlabAlloc(MemoryContext context, Size size)
        randomize_mem((char *) MemoryChunkGetPointer(chunk), size);
 #endif
 
-       Assert(slab->nblocks * slab->blockSize == context->mem_allocated);
-
        return MemoryChunkGetPointer(chunk);
 }
 
@@ -468,7 +631,8 @@ SlabFree(void *pointer)
        MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
        SlabBlock  *block = MemoryChunkGetBlock(chunk);
        SlabContext *slab;
-       int                     idx;
+       int                     curBlocklistIdx;
+       int                     newBlocklistIdx;
 
        /*
         * For speed reasons we just Assert that the referenced block is good.
@@ -486,63 +650,82 @@ SlabFree(void *pointer)
                         slab->header.name, chunk);
 #endif
 
-       /* compute index of the chunk with respect to block start */
-       idx = SlabChunkIndex(slab, block, chunk);
+       /* push this chunk onto the head of the free list */
+       *(MemoryChunk **) pointer = block->freehead;
+       block->freehead = chunk;
 
-       /* add chunk to freelist, and update block nfree count */
-       *(int32 *) pointer = block->firstFreeChunk;
-       block->firstFreeChunk = idx;
        block->nfree++;
 
        Assert(block->nfree > 0);
        Assert(block->nfree <= slab->chunksPerBlock);
 
 #ifdef CLOBBER_FREED_MEMORY
-       /* XXX don't wipe the int32 index, used for block-level freelist */
-       wipe_mem((char *) pointer + sizeof(int32),
-                        slab->chunkSize - sizeof(int32));
+       /* don't wipe the free list MemoryChunk pointer stored in the chunk */
+       wipe_mem((char *) pointer + sizeof(MemoryChunk *),
+                        slab->chunkSize - sizeof(MemoryChunk *));
 #endif
 
-       /* remove the block from a freelist */
-       dlist_delete(&block->node);
+       curBlocklistIdx = SlabBlocklistIndex(slab, block->nfree - 1);
+       newBlocklistIdx = SlabBlocklistIndex(slab, block->nfree);
 
        /*
-        * See if we need to update the minFreeChunks field for the slab - we 
only
-        * need to do that if there the block had that number of free chunks
-        * before we freed one. In that case, we check if there still are blocks
-        * in the original freelist and we either keep the current value (if 
there
-        * still are blocks) or increment it by one (the new block is still the
-        * one with minimum free chunks).
-        *
-        * The one exception is when the block will get completely free - in 
that
-        * case we will free it, se we can't use it for minFreeChunks. It 
however
-        * means there are no more blocks with free chunks.
+        * Check if the block needs to be moved to another element on the
+        * blocklist based on it now having 1 more free chunk.
         */
-       if (slab->minFreeChunks == (block->nfree - 1))
+       if (unlikely(curBlocklistIdx != newBlocklistIdx))
        {
-               /* Have we removed the last chunk from the freelist? */
-               if (dlist_is_empty(&slab->freelist[slab->minFreeChunks]))
+               /* do the move */
+               dlist_delete_from(&slab->blocklist[curBlocklistIdx], 
&block->node);
+               dlist_push_head(&slab->blocklist[newBlocklistIdx], 
&block->node);
+
+               /*
+                * It's possible that we've no blocks in the blocklist at the
+                * curBlocklistIndex position.  When this happens we must find 
the
+                * next blocklist index which contains blocks.  We can be 
certain
+                * we'll find a block as at least one must exist for the chunk 
we're
+                * currently freeing.
+                */
+               if (slab->curBlocklistIndex == curBlocklistIdx &&
+                       dlist_is_empty(&slab->blocklist[curBlocklistIdx]))
                {
-                       /* but if we made the block entirely free, we'll free 
it */
-                       if (block->nfree == slab->chunksPerBlock)
-                               slab->minFreeChunks = 0;
-                       else
-                               slab->minFreeChunks++;
+                       slab->curBlocklistIndex = 
SlabFindNextBlockListIndex(slab);
+                       Assert(slab->curBlocklistIndex > 0);
                }
        }
 
-       /* If the block is now completely empty, free it. */
-       if (block->nfree == slab->chunksPerBlock)
+       /* Handle when a block becomes completely empty */
+       if (unlikely(block->nfree == slab->chunksPerBlock))
        {
-               free(block);
-               slab->nblocks--;
-               slab->header.mem_allocated -= slab->blockSize;
-       }
-       else
-               dlist_push_head(&slab->freelist[block->nfree], &block->node);
+               /* remove the block */
+               dlist_delete_from(&slab->blocklist[newBlocklistIdx], 
&block->node);
 
-       Assert(slab->nblocks >= 0);
-       Assert(slab->nblocks * slab->blockSize == slab->header.mem_allocated);
+               /*
+                * To avoid thrashing malloc/free, we keep a list of empty 
blocks that
+                * we can reuse instead of having to malloc a new one.
+                */
+               if (dclist_count(&slab->emptyblocks) < 
SLAB_MAXIMUM_EMPTY_BLOCKS)
+                       dclist_push_head(&slab->emptyblocks, &block->node);
+               else
+               {
+                       /*
+                        * When we have enough empty blocks stored already, we 
actually
+                        * free the block.
+                        */
+#ifdef CLOBBER_FREED_MEMORY
+                       wipe_mem(block, slab->blockSize);
+#endif
+                       free(block);
+                       slab->header.mem_allocated -= slab->blockSize;
+               }
+
+               /*
+                * Check if we need to reset the blocklist index.  This is 
required
+                * when the blocklist we're using has become completely empty.
+                */
+               if (slab->curBlocklistIndex == newBlocklistIdx &&
+                       dlist_is_empty(&slab->blocklist[newBlocklistIdx]))
+                       slab->curBlocklistIndex = 
SlabFindNextBlockListIndex(slab);
+       }
 }
 
 /*
@@ -622,11 +805,9 @@ SlabGetChunkSpace(void *pointer)
 bool
 SlabIsEmpty(MemoryContext context)
 {
-       SlabContext *slab = (SlabContext *) context;
-
-       Assert(SlabIsValid(slab));
+       Assert(SlabIsValid((SlabContext *) context));
 
-       return (slab->nblocks == 0);
+       return (context->mem_allocated == 0);
 }
 
 /*
@@ -654,13 +835,16 @@ SlabStats(MemoryContext context,
        Assert(SlabIsValid(slab));
 
        /* Include context header in totalspace */
-       totalspace = slab->headerSize;
+       totalspace = Slab_CONTEXT_HDRSZ(slab->chunksPerBlock);
 
-       for (i = 0; i <= slab->chunksPerBlock; i++)
+       /* Add the space consumed by blocks in the emptyblocks list */
+       totalspace += dclist_count(&slab->emptyblocks) * slab->blockSize;
+
+       for (i = 0; i < SLAB_BLOCKLIST_COUNT; i++)
        {
                dlist_iter      iter;
 
-               dlist_foreach(iter, &slab->freelist[i])
+               dlist_foreach(iter, &slab->blocklist[i])
                {
                        SlabBlock  *block = dlist_container(SlabBlock, node, 
iter.cur);
 
@@ -676,9 +860,9 @@ SlabStats(MemoryContext context,
                char            stats_string[200];
 
                snprintf(stats_string, sizeof(stats_string),
-                                "%zu total in %zu blocks; %zu free (%zu 
chunks); %zu used",
-                                totalspace, nblocks, freespace, freechunks,
-                                totalspace - freespace);
+                                "%zu total in %zu blocks; %u empty blocks; %zu 
free (%zu chunks); %zu used",
+                                totalspace, nblocks, 
dclist_count(&slab->emptyblocks),
+                                freespace, freechunks, totalspace - freespace);
                printfunc(context, passthru, stats_string, print_to_stderr);
        }
 
@@ -707,31 +891,45 @@ SlabCheck(MemoryContext context)
 {
        SlabContext *slab = (SlabContext *) context;
        int                     i;
+       int                     nblocks = 0;
        const char *name = slab->header.name;
+       dlist_iter      iter;
 
        Assert(SlabIsValid(slab));
        Assert(slab->chunksPerBlock > 0);
 
-       /* walk all the freelists */
-       for (i = 0; i <= slab->chunksPerBlock; i++)
+       /*
+        * Have a look at the empty blocks.  These should have all their chunks
+        * marked as free.  Ensure that's the case.
+        */
+       dclist_foreach(iter, &slab->emptyblocks)
+       {
+               SlabBlock  *block = dlist_container(SlabBlock, node, iter.cur);
+
+               if (block->nfree != slab->chunksPerBlock)
+                       elog(WARNING, "problem in slab %s: empty block %p should have %d free chunks but only has %d",
+                                name, block, slab->chunksPerBlock, block->nfree);
+       }
+
+       /* walk all the block lists */
+       for (i = 0; i < SLAB_BLOCKLIST_COUNT; i++)
        {
                int                     j,
                                        nfree;
-               dlist_iter      iter;
 
-               /* walk all blocks on this freelist */
-               dlist_foreach(iter, &slab->freelist[i])
+               /* walk all blocks on this blocklist */
+               dlist_foreach(iter, &slab->blocklist[i])
                {
-                       int                     idx;
                        SlabBlock  *block = dlist_container(SlabBlock, node, 
iter.cur);
+                       MemoryChunk *cur_chunk;
 
                        /*
                         * Make sure the number of free chunks (in the block 
header)
-                        * matches position in the freelist.
+                        * matches position in the blocklist.
                         */
-                       if (block->nfree != i)
-                               elog(WARNING, "problem in slab %s: number of 
free chunks %d in block %p does not match freelist %d",
-                                        name, block->nfree, block, i);
+                       if (SlabBlocklistIndex(slab, block->nfree) != i)
+                               elog(WARNING, "problem in slab %s: block %p is 
on blocklist %d but should be on blocklist %d",
+                                        name, block, i, 
SlabBlocklistIndex(slab, block->nfree));
 
                        /* make sure the slab pointer correctly points to this 
context */
                        if (block->slab != slab)
@@ -740,28 +938,55 @@ SlabCheck(MemoryContext context)
 
                        /* reset the bitmap of free chunks for this block */
                        memset(slab->freechunks, 0, (slab->chunksPerBlock * 
sizeof(bool)));
-                       idx = block->firstFreeChunk;
+                       nfree = 0;
 
                        /*
-                        * Now walk through the chunks, count the free ones and 
also
-                        * perform some additional checks for the used ones. As 
the chunk
-                        * freelist is stored within the chunks themselves, we 
have to
-                        * walk through the chunks and construct our own bitmap.
+                        * Walk through the free list chunks and count the 
number of free
+                        * chunks.
                         */
+                       cur_chunk = block->freehead;
+                       while (cur_chunk != NULL)
+                       {
+                               int                     chunkidx = 
SlabChunkIndex(slab, block, cur_chunk);
+                               intptr_t        chunkmod = SlabChunkMod(slab, 
block, cur_chunk);
+
+                               /*
+                                * Make sure the free list link points to 
something on the
+                                * block
+                                */
+                               if ((char *) cur_chunk <= (char *) block ||
+                                       (char *) cur_chunk >= (char *) block + 
slab->blockSize ||
+                                       chunkmod != 0)
+                                       elog(WARNING, "problem in slab %s: 
bogus free list link %p in block %p",
+                                                name, cur_chunk, block);
 
-                       nfree = 0;
-                       while (idx < slab->chunksPerBlock)
+                               /* count the chunk as free, add it to the 
bitmap */
+                               nfree++;
+                               slab->freechunks[chunkidx] = true;
+
+                               /* read pointer of the next free chunk */
+                               
VALGRIND_MAKE_MEM_DEFINED(MemoryChunkGetPointer(cur_chunk), sizeof(MemoryChunk 
*));
+                               cur_chunk = *(MemoryChunk **) 
SlabChunkGetPointer(cur_chunk);
+                       }
+
+                       /*
+                        * count the remaining free chunks that have yet to 
make it onto
+                        * the block's free list.
+                        */
+                       cur_chunk = block->unused;
+                       for (j = 0; j < block->nunused; j++)
                        {
-                               MemoryChunk *chunk;
+                               int                     chunkidx = 
SlabChunkIndex(slab, block, cur_chunk);
+
+                               /* make sure we've not stepped off the end of 
the block */
+                               Assert((char *) cur_chunk < (char *) block + 
slab->blockSize);
 
                                /* count the chunk as free, add it to the 
bitmap */
                                nfree++;
-                               slab->freechunks[idx] = true;
+                               slab->freechunks[chunkidx] = true;
 
-                               /* read index of the next free chunk */
-                               chunk = SlabBlockGetChunk(slab, block, idx);
-                               
VALGRIND_MAKE_MEM_DEFINED(MemoryChunkGetPointer(chunk), sizeof(int32));
-                               idx = *(int32 *) MemoryChunkGetPointer(chunk);
+                               /* move forward 1 chunk */
+                               cur_chunk = (MemoryChunk *) (((char *) 
cur_chunk) + slab->fullChunkSize);
                        }
 
                        for (j = 0; j < slab->chunksPerBlock; j++)
@@ -795,10 +1020,15 @@ SlabCheck(MemoryContext context)
                        if (nfree != block->nfree)
                                elog(WARNING, "problem in slab %s: number of 
free chunks %d in block %p does not match bitmap %d",
                                         name, block->nfree, block, nfree);
+
+                       nblocks++;
                }
        }
 
-       Assert(slab->nblocks * slab->blockSize == context->mem_allocated);
+       /* the stored empty blocks are tracked in mem_allocated too */
+       nblocks += dclist_count(&slab->emptyblocks);
+
+       Assert(nblocks * slab->blockSize == context->mem_allocated);
 }
 
 #endif                                                 /* 
MEMORY_CONTEXT_CHECKING */
-- 
2.35.1.windows.2

diff --git a/src/backend/utils/adt/mcxtfuncs.c 
b/src/backend/utils/adt/mcxtfuncs.c
index 4add219553..8d04e8aad3 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -15,6 +15,8 @@
 
 #include "postgres.h"
 
+#include <time.h>
+
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "mb/pg_wchar.h"
@@ -193,3 +195,214 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
 
        PG_RETURN_BOOL(true);
 }
+
+typedef struct AllocateTestNext
+{
+       struct AllocateTestNext *next;          /* ptr to the next allocation */
+} AllocateTestNext;
+
+/* #define ALLOCATE_TEST_DEBUG */
+/*
+ * pg_allocate_memory_test
+ *             Used to test the performance of the various memory context types
+ */
+Datum
+pg_allocate_memory_test(PG_FUNCTION_ARGS)
+{
+       int32   chunk_size = PG_GETARG_INT32(0);
+       int64   keep_memory = PG_GETARG_INT64(1);
+       int64   total_alloc = PG_GETARG_INT64(2);
+       text   *context_type_text = PG_GETARG_TEXT_PP(3);
+       char   *context_type;
+       int64   curr_memory_use = 0;
+       int64   remaining_alloc_bytes = total_alloc;
+       MemoryContext context;
+       MemoryContext oldContext;
+       AllocateTestNext           *next_free_ptr = NULL;
+       AllocateTestNext           *last_alloc = NULL;
+       clock_t start, end;
+
+       if (chunk_size < sizeof(AllocateTestNext))
+               elog(ERROR, "chunk_size (%d) must be at least %ld bytes", 
chunk_size,
+                        sizeof(AllocateTestNext));
+       if (keep_memory > total_alloc)
+               elog(ERROR, "keep_memory (" INT64_FORMAT ") must be less than 
total_alloc (" INT64_FORMAT ")",
+                        keep_memory, total_alloc);
+
+       context_type = text_to_cstring(context_type_text);
+
+       start = clock();
+
+       if (strcmp(context_type, "generation") == 0)
+               context = GenerationContextCreate(CurrentMemoryContext,
+                                                                               
  "pg_allocate_memory_test",
+                                                                               
  ALLOCSET_DEFAULT_SIZES);
+       else if (strcmp(context_type, "aset") == 0)
+               context = AllocSetContextCreate(CurrentMemoryContext,
+                                                                               
"pg_allocate_memory_test",
+                                                                               
ALLOCSET_DEFAULT_SIZES);
+       else if (strcmp(context_type, "slab") == 0)
+               context = SlabContextCreate(CurrentMemoryContext,
+                                                                       
"pg_allocate_memory_test",
+                                                                       
ALLOCSET_DEFAULT_MAXSIZE,
+                                                                       
chunk_size);
+       else
+               elog(ERROR, "context_type must be \"generation\", \"aset\" or 
\"slab\"");
+
+       oldContext = MemoryContextSwitchTo(context);
+
+       while (remaining_alloc_bytes > 0)
+       {
+               AllocateTestNext *curr_alloc;
+
+               CHECK_FOR_INTERRUPTS();
+
+               /* Allocate the memory and update the counters */
+               curr_alloc = (AllocateTestNext *) palloc(chunk_size);
+               remaining_alloc_bytes -= chunk_size;
+               curr_memory_use += chunk_size;
+
+#ifdef ALLOCATE_TEST_DEBUG
+               elog(NOTICE, "alloc %p (curr_memory_use " INT64_FORMAT " bytes, remaining_alloc_bytes " INT64_FORMAT ")", curr_alloc, curr_memory_use, remaining_alloc_bytes);
+#endif
+
+               /*
+                * Point the last allocation at this one so that we can free
+                * allocations starting with the oldest first.
+                */
+               curr_alloc->next = NULL;
+               if (last_alloc != NULL)
+                       last_alloc->next = curr_alloc;
+
+               if (next_free_ptr == NULL)
+               {
+                       /*
+                        * Remember the first chunk to free.  We will follow the
+                        * ->next pointers to find the next chunk to free when
+                        * freeing memory.
+                        */
+                       next_free_ptr = curr_alloc;
+               }
+
+               /*
+                * If the currently allocated memory has reached or exceeded the
+                * amount of memory we want to keep allocated at once, then we'd
+                * better free some.  Since all allocations are the same size, we
+                * only need to free one allocation per loop.
+                */
+               if (curr_memory_use >= keep_memory)
+               {
+                       AllocateTestNext         *next = next_free_ptr->next;
+
+                       /* free the memory and update the current memory usage */
+                       pfree(next_free_ptr);
+                       curr_memory_use -= chunk_size;
+
+#ifdef ALLOCATE_TEST_DEBUG
+                       elog(NOTICE, "free %p (curr_memory_use " INT64_FORMAT " bytes, remaining_alloc_bytes " INT64_FORMAT ")", next_free_ptr, curr_memory_use, remaining_alloc_bytes);
+#endif
+                       /* get the next chunk to free */
+                       next_free_ptr = next;
+               }
+
+               if (curr_memory_use > 0)
+                       last_alloc = curr_alloc;
+               else
+                       last_alloc = NULL;
+       }
+
+       /* cleanup loop -- pfree remaining memory */
+       while (next_free_ptr != NULL)
+       {
+               AllocateTestNext         *next = next_free_ptr->next;
+
+               /* free the memory and update the current memory usage */
+               pfree(next_free_ptr);
+               curr_memory_use -= chunk_size;
+
+#ifdef ALLOCATE_TEST_DEBUG
+               elog(NOTICE, "free %p (curr_memory_use " INT64_FORMAT " bytes, remaining_alloc_bytes " INT64_FORMAT ")", next_free_ptr, curr_memory_use, remaining_alloc_bytes);
+#endif
+
+               next_free_ptr = next;
+       }
+
+       MemoryContextSwitchTo(oldContext);
+
+       end = clock();
+
+       PG_RETURN_FLOAT8((double) (end - start) / CLOCKS_PER_SEC);
+}
+
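+/*
+ * pg_allocate_memory_test_reset
+ *             Like pg_allocate_memory_test, but releases memory by resetting
+ *             the context once the keep_memory threshold is reached rather
+ *             than by pfreeing chunks individually.
+ */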
+Datum
+pg_allocate_memory_test_reset(PG_FUNCTION_ARGS)
+{
+       int32   chunk_size = PG_GETARG_INT32(0);
+       int64   keep_memory = PG_GETARG_INT64(1);
+       int64   total_alloc = PG_GETARG_INT64(2);
+       text   *context_type_text = PG_GETARG_TEXT_PP(3);
+       char   *context_type;
+       int64   curr_memory_use = 0;
+       int64   remaining_alloc_bytes = total_alloc;
+       MemoryContext context;
+       MemoryContext oldContext;
+       clock_t start, end;
+
+       if (chunk_size < 1)
+               elog(ERROR, "chunk_size (%d) must be greater than 0", chunk_size);
+       if (keep_memory > total_alloc)
+               elog(ERROR, "keep_memory (" INT64_FORMAT ") must not exceed total_alloc (" INT64_FORMAT ")",
+                        keep_memory, total_alloc);
+
+       context_type = text_to_cstring(context_type_text);
+
+       start = clock();
+
+       if (strcmp(context_type, "generation") == 0)
+               context = GenerationContextCreate(CurrentMemoryContext,
+                                                 "pg_allocate_memory_test",
+                                                 ALLOCSET_DEFAULT_SIZES);
+       else if (strcmp(context_type, "aset") == 0)
+               context = AllocSetContextCreate(CurrentMemoryContext,
+                                               "pg_allocate_memory_test",
+                                               ALLOCSET_DEFAULT_SIZES);
+       else if (strcmp(context_type, "slab") == 0)
+               context = SlabContextCreate(CurrentMemoryContext,
+                                           "pg_allocate_memory_test",
+                                           ALLOCSET_DEFAULT_MAXSIZE,
+                                           chunk_size);
+       else
+               elog(ERROR, "context_type must be \"generation\", \"aset\" or \"slab\"");
+
+       oldContext = MemoryContextSwitchTo(context);
+
+       while (remaining_alloc_bytes > 0)
+       {
+               CHECK_FOR_INTERRUPTS();
+
+               /* Allocate the memory and update the counters */
+               (void) palloc(chunk_size);
+               remaining_alloc_bytes -= chunk_size;
+               curr_memory_use += chunk_size;
+
+#ifdef ALLOCATE_TEST_DEBUG
+               elog(NOTICE, "alloc (curr_memory_use " INT64_FORMAT " bytes, remaining_alloc_bytes " INT64_FORMAT ")", curr_memory_use, remaining_alloc_bytes);
+#endif
+
+               /*
+                * If the currently allocated memory has reached or exceeded the
+                * amount of memory we want to keep allocated at once, then
+                * reset the context.
+                */
+               if (curr_memory_use >= keep_memory)
+               {
+                       curr_memory_use = 0;
+                       MemoryContextReset(context);
+               }
+       }
+
+       MemoryContextSwitchTo(oldContext);
+       MemoryContextDelete(context);
+
+       end = clock();
+
+       PG_RETURN_FLOAT8((double) (end - start) / CLOCKS_PER_SEC);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 719599649a..6bb53529a0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8143,6 +8143,18 @@
   prorettype => 'bool', proargtypes => 'int4',
   prosrc => 'pg_log_backend_memory_contexts' },
 
+# just for testing memory context allocation speed
+{ oid => '9319', descr => 'for testing performance of allocation and freeing',
+  proname => 'pg_allocate_memory_test', provolatile => 'v',
+  prorettype => 'float8', proargtypes => 'int4 int8 int8 text',
+  prosrc => 'pg_allocate_memory_test' },
+
+# just for testing memory context allocation speed
+{ oid => '9320', descr => 'for testing performance of allocation and resetting',
+  proname => 'pg_allocate_memory_test_reset', provolatile => 'v',
+  prorettype => 'float8', proargtypes => 'int4 int8 int8 text',
+  prosrc => 'pg_allocate_memory_test_reset' },
+
 # non-persistent series generator
 { oid => '1066', descr => 'non-persistent series generator',
   proname => 'generate_series', prorows => '1000',

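In case anyone wants to try this out without the full rollup query, the
simplest way to exercise the two new functions is something like the
following. The parameter values here are just for illustration (64-byte
chunks, keep 100 chunks live, allocate 1GB in total); they're not meant to
reproduce any of the numbers I posted:

select pg_allocate_memory_test(64, 64 * 100, 1024*1024*1024, 'slab');
select pg_allocate_memory_test_reset(64, 64 * 100, 1024*1024*1024, 'slab');

The first frees chunks oldest-first as it goes once keep_memory is
reached; the second lets memory accumulate to the threshold and then does
a MemoryContextReset. Both return the elapsed time in seconds.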