On 02/09/2017 10:37 PM, Andres Freund wrote:
Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c     | 115 +--------------------------------
 src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
 src/include/utils/memdebug.h      |  22 +++++++
 4 files changed, 156 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loathe to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of them
them in a header as static inlines (not randomize, but the rest).


Do you have any numbers to support that? AFAICS compilers got really good in inlining stuff on their own.


From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@2ndquadrant.com>
Date: Wed, 30 Nov 2016 15:36:23 +0100
Subject: [PATCH 2/3] slab allocator

---
 src/backend/replication/logical/reorderbuffer.c |  82 +--
 src/backend/utils/mmgr/Makefile                 |   2 +-
 src/backend/utils/mmgr/slab.c                   | 803 ++++++++++++++++++++++++
 src/include/nodes/memnodes.h                    |   2 +-
 src/include/nodes/nodes.h                       |   1 +
 src/include/replication/reorderbuffer.h         |  15 +-
 src/include/utils/memutils.h                    |   9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.


I rather dislike patches that only add a bunch of code, without actually using it anywhere. But if needed, this is trivial to do at commit time - just don't commit the reorderbuffer bits.




+/*-------------------------------------------------------------------------
+ *
+ * slab.c
+ *       SLAB allocator definitions.
+ *
+ * SLAB is a custom MemoryContext implementation designed for cases of
+ * equally-sized objects.
+ *
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Bump, before a committer forgets it.


OK.


+ * IDENTIFICATION
+ *       src/backend/utils/mmgr/slab.c
+ *
+ *
+ *     The constant allocation size allows significant simplification and 
various
+ *     optimizations. Firstly, we can get rid of the doubling and carve the 
blocks
+ *     into chunks of exactly the right size (plus alignment), not wasting 
memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.


OK, rill reword.


+ *     Each block includes a simple bitmap tracking which chunks are used/free.
+ *     This makes it trivial to check if all chunks on the block are free, and
+ *     eventually free the whole block (which is almost impossible with a 
global
+ *     freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?


A block-level counter would be enough to decide if all chunks on the block are free, but it's not sufficient to identify which chunks are free / available for reuse.

The bitmap only has a single bit per chunk, so I find "potentially long-ish" is a bit misleading. Any linked list implementation will require much more per-chunk overhead - as the chunks are fixed-legth, it's possible to use chunk index (instead of 64-bit pointers), to save some space. But with large blocks / small chunks that's still at least 2 or 4 bytes per index, and you'll need two (to implement doubly-linked list, to make add/remove efficient).

For example assume 8kB block and 64B chunks, i.e. 128 chunks. With bitmap that's 16B to track all free space on the block. Doubly linked list would require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B.

I have a hard time believing this the cache efficiency of linked lists (which may or may not be real in this case) out-weights this, but if you want to try, be my guest.


+ *     At the context level, we use 'freelist' to track blocks ordered by 
number
+ *     of free chunks, starting with blocks having a single allocated chunk, 
and
+ *     with completely full blocks on the tail.

Why that way round? Filling chunks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks?  Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?


Possibly. Naming things is hard.

>
Thirdly, isn't that approach going to result in a quite long freelists
array, when you have small items and a decent blocksize? That seems like
a fairly reasonable thing to do?


I'm confused. Why wouldn't that be reasonable. Or rather, what would be a more reasonable way?


+ *     This also allows various optimizations - for example when searching for
+ *     free chunk, we the allocator reuses space from the most full blocks 
first,
+ *     in the hope that some of the less full blocks will get completely empty
+ *     (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.


I haven't really considered tlb/cache very much. The main goal of this design was to free blocks (instead of keeping many partially-used blocks around). If you have comments on this, feel free to add them.


+ *     For each block, we maintain pointer to the first free chunk - this is 
quite
+ *     cheap and allows us to skip all the preceding used chunks, eliminating
+ *     a significant number of lookups in many common usage patters. In the 
worst
+ *     case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.


As I explained above, I don't think linked list is a good solution. IIRC correctly I've initially done that, and ended using the bitmap. If you have idea how to do that, feel free to implement that and then we can do some measurements and compare the patches.


+
+#include "postgres.h"
+
+#include "utils/memdebug.h"
+#include "utils/memutils.h"
+#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.


OK


+/*
+ * SlabPointer
+ *             Aligned pointer which may be a member of an allocation set.
+ */
+typedef void *SlabPointer;
+typedef SlabContext *Slab;

I personally wont commit this whith pointer hiding typedefs.  If
somebody else does, I can live with it, but for me it's bad enough taste
that I wont.


Meh.


+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+       MemoryContextData header;       /* Standard memory-context fields */
+       /* Allocation parameters for this context: */
+       Size            chunkSize;              /* chunk size */
+       Size            fullChunkSize;  /* chunk size including header and 
alignment */
+       Size            blockSize;              /* block size */
+       int                     chunksPerBlock; /* number of chunks per block */
+       int                     minFreeChunks;  /* min number of free chunks in 
any block */
+       int                     nblocks;                /* number of blocks 
allocated */
+       /* blocks with free space, grouped by number of free chunks: */
+       dlist_head      freelist[FLEXIBLE_ARRAY_MEMBER];
+}      SlabContext;
+

Why aren't these ints something unsigned?


Yeah, some of those could be unsigned. Will check.


+/*
+ * SlabIsValid
+ *             True iff set is valid allocation set.
+ */
+#define SlabIsValid(set) PointerIsValid(set)

It's not your fault, but this "iff" is obviously a lot stronger than the
actual test ;).  I seriously doubt this macro is worth anything...


Yeah.


+/*
+ * SlabReset
+ *             Frees all memory which is allocated in the given set.
+ *
+ * The code simply frees all the blocks in the context - we don't keep any
+ * keeper blocks or anything like that.
+ */

Why don't we?  Seems quite worthwhile?   Thinking about this, won't this
result in a drastic increase of system malloc/mmap/brk traffic when
there's lot of short transactions in reorderbuffer?


I haven't seen any significant impact of that during the tests I've done (even with many tiny transactions), but it adding keeper blocks should be trivial.


+static void
+SlabReset(MemoryContext context)
+{
+       /* walk over freelists and free the blocks */
+       for (i = 0; i <= set->chunksPerBlock; i++)
+       {
+               dlist_mutable_iter miter;
+
+               dlist_foreach_modify(miter, &set->freelist[i])
+               {
+                       SlabBlock       block = dlist_container(SlabBlockData, 
node, miter.cur);
+
+                       dlist_delete(miter.cur);
+
+                       /* Normal case, release the block */

What does "normal case" refer to here? Given that there's no alternative
case...


Mem, bogus comment.


+       /*
+        * We need to update index of the next free chunk on the block. If we 
used
+        * the last free chunk on this block, set it to chunksPerBlock (which is
+        * not a valid chunk index). Otherwise look for the next chunk - we know
+        * that it has to be above the current firstFreeChunk value, thanks to 
how
+        * we maintain firstFreeChunk here and in SlabFree().
+        */
+       if (block->nfree == 0)
+               block->firstFreeChunk = set->chunksPerBlock;
+       else
+       {
+               /* look for the next free chunk in the block, after the first 
one */
+               while ((++block->firstFreeChunk) < set->chunksPerBlock)
+               {
+                       int                     byte = block->firstFreeChunk / 
8;
+                       int                     bit = block->firstFreeChunk % 8;
+
+                       /* stop when you find 0 (unused chunk) */
+                       if (!(block->bitmapptr[byte] & (0x01 << bit)))
+                               break;
+               }
+
+               /* must have found the free chunk */
+               Assert(block->firstFreeChunk != set->chunksPerBlock);
+       }

This and previous code just re-affirms my opinion that a bitmap is not
the best structure here.


It'd be great if you could explain why, instead of just making such claims ...


+       /* move the whole block to the right place in the freelist */
+       dlist_delete(&block->node);
+       dlist_push_head(&set->freelist[block->nfree], &block->node);

Hm.  What if we, instead of the array of doubly linked lists, just kept
a single linked list of blocks, and keep that list sorted by number of
free chunks?  Given that freeing / allocation never changes the number
of allocated chunks by more than 1, we'll never have to move an entry
far in that list to keep it sorted.


Only assuming that there'll be only few blocks with the same number of free chunks. If that's not the case, you'll have to walk many blocks to move the block to the right place in the list. The array of lists handles such cases way more efficiently, and I think we should keep it.


+/*
+ * SlabRealloc
+ *             As Slab is designed for allocating equally-sized chunks of 
memory, it
+ *             can't really do an actual realloc.
+ *
+ * We try to be gentle and allow calls with exactly the same size as in that
+ * case we can simply return the same chunk. When the size differs, we fail
+ * with assert failure or return NULL.
+ *
+ * We might be even support cases with (size < chunkSize). That however seems
+ * rather pointless - Slab is meant for chunks of constant size, and moreover
+ * realloc is usually used to enlarge the chunk.
+ *
+ * XXX Perhaps we should not be gentle at all and simply fails in all cases,
+ * to eliminate the (mostly pointless) uncertainty.

I think I'm in favor of that.  This seems more likely to hide a bug than
actually helpful.


OK.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to