On 11/12/2016 08:12 PM, Andres Freund wrote:
Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *       src/backend/utils/mmgr/slab.c
+ *
+ *
+ *     The constant allocation size allows significant simplification and various
+ *     optimizations that are not possible in AllocSet. Firstly, we can get rid
+ *     of the doubling and carve the blocks into chunks of exactly the right size
+ *     (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can quite easily be made
without a concrete reference to behaviour elsewhere.


Yeah, that's probably true.

+ *
+ *     At the context level, we use a 'freelist' array to track blocks grouped by
+ *     the number of free chunks. For example freelist[0] is a list of completely
+ *     full blocks, freelist[1] is a list of blocks with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?

So there'd be one huge list of blocks, sorted by the number of empty chunks? Hmm, that might work, I guess.

I don't think the combination of large blocks with small allocations is particularly sensible, though - what exactly would be the benefit of such a combination? I would even consider enforcing an upper limit on the number of chunks per block - say, 256.
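To illustrate what the array scheme involves when a block's free-chunk count changes - a simplified sketch, not the patch code (SlabMoveBlock is a made-up helper name, and I'm using explicit struct pointers):

    static void
    SlabMoveBlock(SlabContext *set, SlabBlockData *block,
                  int oldFree, int newFree)
    {
        /* unlink from the bucket matching the old free-chunk count */
        if (block->prev != NULL)
            block->prev->next = block->next;
        else
            set->freelist[oldFree] = block->next;
        if (block->next != NULL)
            block->next->prev = block->prev;

        /* push onto the bucket matching the new free-chunk count */
        block->prev = NULL;
        block->next = set->freelist[newFree];
        if (block->next != NULL)
            block->next->prev = block;
        set->freelist[newFree] = block;
    }

Each alloc/free moves a block by at most one bucket, so this stays O(1).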



+ *     About MEMORY_CONTEXT_CHECKING:
+ *
+ *     Since we usually round request sizes up to the next power of 2, there
+ *     is often some unused space immediately after a requested data area.

I assume the "round up" stuff is copy-paste?


Yeah, sorry about that.


+ *     Thus, if someone makes the common error of writing past what they've
+ *     requested, the problem is likely to go unnoticed ... until the day when
+ *     there *isn't* any wasted space, perhaps because of different memory
+ *     ...
+ *
+ *     See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.


Yes, I planned to do that for the next version of patch. Laziness.


+typedef struct SlabBlockData *SlabBlock;               /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.


Why is it a bad pattern?
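Genuine question, btw. The usual complaint I'm aware of is that the pointer-ness disappears at the use site, e.g.:

    SlabBlock       a, b;       /* both are pointers, but nothing here says so */
    SlabBlockData  *c, *d;      /* pointer-ness visible where it matters */

    /* and const binds to the pointer, not the pointed-to data: */
    const SlabBlock p;          /* SlabBlockData * const - constant pointer */
    const SlabBlockData *q;     /* pointer to constant data */

but I'm curious whether there's more to it.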

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+       MemoryContextData header;       /* Standard memory-context fields */
+       /* Allocation parameters for this context: */
+       Size            chunkSize;              /* chunk size */
+       Size            fullChunkSize;  /* chunk size including header and alignment */
+       Size            blockSize;              /* block size */
+       int                     chunksPerBlock; /* number of chunks per block */
+       int                     minFreeChunks;  /* min number of free chunks in any block */
+       int                     nblocks;                /* number of blocks allocated */
+       /* Info about storage allocated in this context: */
+       SlabBlock       freelist[1];    /* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.


Will fix, thanks.
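For the archives, a sketch of the fix (freelist needs chunksPerBlock + 1 entries, one per possible free-chunk count):

    typedef struct SlabContext
    {
        MemoryContextData header;       /* Standard memory-context fields */
        /* Allocation parameters for this context: */
        Size        chunkSize;          /* chunk size */
        Size        fullChunkSize;      /* chunk size incl. header/alignment */
        Size        blockSize;          /* block size */
        int         chunksPerBlock;     /* number of chunks per block */
        int         minFreeChunks;      /* min free chunks in any block */
        int         nblocks;            /* number of blocks allocated */
        /* blocks with N free chunks, N = 0 .. chunksPerBlock */
        SlabBlock   freelist[FLEXIBLE_ARRAY_MEMBER];
    } SlabContext;

with the allocation sized explicitly:

    Size    headerSize = offsetof(SlabContext, freelist) +
                         (chunksPerBlock + 1) * sizeof(SlabBlock);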

+/*
+ * SlabBlockData
+ *             Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?


You're right that the pointer to the owning context is unnecessary - there's no "standard block header", and we already have the pointer in the standard chunk header. But maybe keeping it at least with MEMORY_CONTEXT_CHECKING would be a good idea?
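Roughly this (a sketch, with the field list per the quoted hunks):

    typedef struct SlabBlockData
    {
    #ifdef MEMORY_CONTEXT_CHECKING
        Slab        slab;               /* owning context, for cross-checks only */
    #endif
        SlabBlock   prev;               /* doubly-linked list of blocks */
        SlabBlock   next;
        int         nfree;              /* number of free chunks */
        int         firstFreeChunk;     /* index of the first free chunk */
        char       *bitmapptr;          /* free-chunk bitmap */
    } SlabBlockData;

That way release builds don't pay for a per-block pointer nobody reads.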

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).


Makes sense.
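Roughly, I suppose (sketch only):

    /* in SlabBlockData, the prev/next pair becomes: */
    dlist_node  node;                   /* linked-list link, see lib/ilist.h */

    /* the freelist entries become dlist_head, and list manipulation goes
     * through the ilist API instead of hand-rolled pointer surgery: */
    dlist_push_head(&set->freelist[block->nfree], &block->node);
    dlist_delete(&block->node);

    /* and getting a block back out of a non-empty list: */
    block = dlist_head_element(SlabBlockData, node, &set->freelist[idx]);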

+/*
+ * SlabChunk
+ *             The prefix of each piece of memory in a SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to add fields in front of the StandardChunkHeader
+ * fields, which is used here to add a pointer to the block.
+ */

Wouldn't that be easier to enforce - particularly around alignment
requirements - by embedding a StandardChunkHeader here? That'd also
avoid redundancies.


Also makes sense.
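Something along these lines, presumably (a sketch - the slab-specific field stays in front, the standard header goes last so that the user data immediately follows a StandardChunkHeader, as mcxt.c expects):

    typedef struct SlabChunkData
    {
        SlabBlock   block;              /* block owning this chunk */

        /* MUST be the last field, directly preceding the user data */
        StandardChunkHeader header;
    } SlabChunkData;

The compiler then takes care of the alignment, and there's no duplicated field list to keep in sync.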

+/* ----------
+ * Debug macros
+ * ----------
+ */
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+                       fprintf(stderr, "SlabFree: %s: %p, %d\n", \
+                               (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+                       fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
+                               (_cxt)->header.name, (_chunk), (_chunk)->size)
+#else
+#define SlabFreeInfo(_cxt, _chunk)
+#define SlabAllocInfo(_cxt, _chunk)
+#endif

Do we really have to copy that stuff from aset.c? Obviously no-one uses
that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
defined:
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
    fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \


I don't really care. Sure, we should fix the warning, but not supporting HAVE_ALLOCINFO in the new allocator(s) seems wrong - we should either support it everywhere, or we should rip it out. That's not the purpose of this patch, though.
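The warning itself is trivial to fix, at least - Size is just size_t, so e.g.:

    #define SlabAllocInfo(_cxt, _chunk) \
        fprintf(stderr, "SlabAlloc: %s: %p, %zu\n", \
                (_cxt)->header.name, (_chunk), (_chunk)->size)

(or cast to unsigned long and use %lu, if we're worried about printf implementations without %zu).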


+#ifdef CLOBBER_FREED_MEMORY
+
+/* Wipe freed memory for debugging purposes */
+static void
+wipe_mem(void *ptr, size_t size)

+#ifdef MEMORY_CONTEXT_CHECKING
+static void
+set_sentinel(void *base, Size offset)
+
+static bool
+sentinel_ok(const void *base, Size offset)
+#endif

+#ifdef RANDOMIZE_ALLOCATED_MEMORY
+static void
+randomize_mem(char *ptr, size_t size)

Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
static inlines.


Yes, next to the valgrind stuff.
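I.e. something like this in the new header - lifted from what aset.c does today, just turned into static inlines (mcxt_internal.h as a name is a straw man):

    /* mcxt_internal.h */
    #include "utils/memdebug.h"

    #ifdef CLOBBER_FREED_MEMORY
    /* Wipe freed memory for debugging purposes */
    static inline void
    wipe_mem(void *ptr, size_t size)
    {
        VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
        memset(ptr, 0x7F, size);
        VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
    }
    #endif

    #ifdef MEMORY_CONTEXT_CHECKING
    static inline void
    set_sentinel(void *base, Size offset)
    {
        char   *ptr = (char *) base + offset;

        VALGRIND_MAKE_MEM_UNDEFINED(ptr, 1);
        *ptr = 0x7E;
        VALGRIND_MAKE_MEM_NOACCESS(ptr, 1);
    }

    static inline bool
    sentinel_ok(const void *base, Size offset)
    {
        const char *ptr = (const char *) base + offset;
        bool        ret;

        VALGRIND_MAKE_MEM_DEFINED(ptr, 1);
        ret = *ptr == 0x7E;
        VALGRIND_MAKE_MEM_NOACCESS(ptr, 1);

        return ret;
    }
    #endif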

+MemoryContext
+SlabContextCreate(MemoryContext parent,
+                                         const char *name,
+                                         Size blockSize,
+                                         Size chunkSize)
+{
+       int             chunksPerBlock;
+       Size    fullChunkSize;
+       Slab    set;
+
+       /* chunk, including SLAB header (both addresses nicely aligned) */
+       fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

+       /* make sure the block can store at least one chunk (plus a bitmap) */
+       if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
+               elog(ERROR, "block size %ld for slab is too small for chunks %ld",
+                                       blockSize, chunkSize);

I assume the 1 is the bitmap size?


Yes, the smallest bitmap is 1 byte.


+       /* so how many chunks can we fit into a block, including header and bitmap? */
+       chunksPerBlock
+               =  (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);

I'm slightly drunk due to bad airline wine, but right now that seems a
bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
pattern elsewhere, but right now I can't follow the - 7.


We need all the bits (header, chunks and bitmap) to fit onto the block, so this needs to hold:

   blockSize >= sizeof(SlabBlockData) +
                chunksPerBlock * fullChunkSize +
                (chunksPerBlock + 7) / 8

solve for 'chunksPerBlock' and you'll get the above formula. Spelled out: multiply both sides by 8 and rearrange,

   8 * blockSize >= 8 * sizeof(SlabBlockData)
                    + chunksPerBlock * (8 * fullChunkSize + 1) + 7

and therefore

   chunksPerBlock <= (8 * (blockSize - sizeof(SlabBlockData)) - 7)
                     / (8 * fullChunkSize + 1)

Moving the 7 to the other side of the inequality is the reason for the minus.

But documenting this is probably a good idea.

+/*
+ * SlabAlloc
+ *             Returns pointer to allocated memory of given size or NULL if
+ *             request could not be completed; memory is added to the set.
+ *
+ * No request may exceed:
+ *             MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
+ * All callers use a much-lower limit.

That seems like a meaningless comment in the context of a slab allocator
with a fixed size.


Why? It might be worth moving this to SlabContextCreate though.


+       /*
+        * If there are no free chunks in any existing block, create a new block
+        * and put it to the last freelist bucket.
+        *
+        * (set->minFreeChunks == 0) means there are no blocks with free chunks,
+        * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+        */
+       if (set->minFreeChunks == 0)
+       {
+               block = (SlabBlock)malloc(set->blockSize);

Space after cast - maybe run pgindent over the file before submission?
Doing that manually helps to avoid ugly damage by the per-release run
later. I'm pretty sure there'll be a significant number of changes.


Will do.



+       if (block->nfree == 0)
+               block->firstFreeChunk = set->chunksPerBlock;
+       else
+       {
+               /* look for the next free chunk in the block, after the first one */
+               while ((++block->firstFreeChunk) < set->chunksPerBlock)
+               {
+                       int byte = block->firstFreeChunk / 8;
+                       int bit  = block->firstFreeChunk % 8;
+
+                       /* stop when you find 0 (unused chunk) */
+                       if (! (block->bitmapptr[byte] & (0x01 << bit)))
+                               break;
+               }

I haven't profiled (or even compiled) this patchset yet, but FWIW, in
the tuple deforming code, I could measure a noticeable speedup by
accessing bitmap-bytes in the native word-size, rather than char. I'm
*NOT* saying you should do that, but if this ever shows up as a
bottleneck, it might be worthwhile to optimize.


OK, will try, although I don't expect this branch to be very hot.
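If it ever does, something like this is what I'd try (a sketch - assumes the bitmap is padded out to uint64 words and a ctz intrinsic like GCC's __builtin_ctzll; the caller still has to check the result against chunksPerBlock, since trailing pad bits read as free):

    static int
    find_free_chunk(const uint64 *bitmap, int nwords)
    {
        int     i;

        for (i = 0; i < nwords; i++)
        {
            uint64  inverted = ~bitmap[i];  /* used = 1, so free bits become 1 */

            if (inverted != 0)
                return i * 64 + __builtin_ctzll(inverted);
        }

        return -1;      /* no free chunk in this block */
    }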

+       /*
+        * And finally update minFreeChunks, i.e. the index to the block with the
+        * lowest number of free chunks. We only need to do that when the block
+        * got full (otherwise we know the current block is the right one).
+        * We'll simply walk the freelist until we find a non-empty entry.
+        */
+       if (set->minFreeChunks == 0)
+               for (idx = 1; idx <= set->chunksPerBlock; idx++)
+                       if (set->freelist[idx])
+                       {
+                               set->minFreeChunks = idx;
+                               break;
+                       }

Yuck. This definitely needs braces.


OK ;-)
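I.e.:

    if (set->minFreeChunks == 0)
    {
        for (idx = 1; idx <= set->chunksPerBlock; idx++)
        {
            if (set->freelist[idx])
            {
                set->minFreeChunks = idx;
                break;
            }
        }
    }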


thanks

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

