On Wed, Sep 28, 2022 at 1:18 PM I wrote:

> Along those lines, one thing I've been thinking about is the number of
> size classes. There is a tradeoff between memory efficiency and number of
> branches when searching/inserting. My current thinking is there is too
> much coupling between size class and data type. Each size class currently
> uses a different data type and a different algorithm to search and set it,
> which in turn requires another branch. We've found that a larger number of
> size classes leads to poor branch prediction [1] and (I imagine) code
> density.
>
> I'm thinking we can use "flexible array members" for the values/pointers,
> and keep the rest of the control data in the struct the same. That way, we
> never have more than 4 actual "kinds" to code and branch on. As a bonus,
> when migrating a node to a larger size class of the same kind, we can
> simply repalloc() to the next size.
While the most important challenge right now is how to best represent and organize the shared memory case, I wanted to get the above idea working and out of the way, to be saved for a future time.

I've attached a rough implementation (applies on top of v9 0003) that splits node32 into 2 size classes. They both share the exact same base data type and hence the same search/set code, so the number of "kind"s is still four, but here there are five "size classes", so a new case in the "unlikely" node-growing path. The smaller instance of node32 is a "node15", because that's currently 160 bytes, corresponding to one of the DSA size classes. This idea can be applied to any other node except the max size, as we see fit. (Adding a singleton size class would bring it back in line with the prototype, at least as far as memory consumption.)

One issue with this patch: The "fanout" member is a uint8, so it can't hold 256 for the largest node kind. That's not an issue in practice, since we never need to grow it, and we only compare that value with the count in an Assert(), so I just set it to zero. That does break an invariant, so it's not great. We could use 2 bytes to be strictly correct in all cases, but that limits what we can do with the smallest node kind.

In the course of working on this, I encountered a pain point. Since it's impossible to repalloc in slab, we have to do alloc/copy/free ourselves. That's fine, but the current coding makes too many assumptions about the use cases: rt_alloc_node and rt_copy_node are too entangled with each other and do too much work unrelated to what the names imply. I seem to remember an earlier version had something like rt_node_copy_common that did only...copying. That was much easier to reason about.

In 0002 I resorted to doing my own allocation to show what I really want to do, because the new use case doesn't need zeroing and setting values. It only needs to...allocate (and increase the stats counter if built that way).
Future optimization work while I'm thinking of it: rt_alloc_node should be always-inlined and the memset done separately (i.e. not *AllocZero). That way the compiler should be able to generate more efficient zeroing code for smaller nodes. I'll test the numbers on this sometime in the future.

--
John Naylor
EDB: http://www.enterprisedb.com
From 6fcc970ae7e31f44fa6b6aface983cadb023cc50 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Thu, 17 Nov 2022 16:10:44 +0700 Subject: [PATCH v901 2/2] Make node32 variable sized Add a size class for 15 elements, which corresponds to 160 bytes, an allocation size used by DSA. When a 16th element is to be inserted, allocate a larger area and memcpy the entire old node to it. NB: Zeroing the new area is only necessary if it's for an inner node128, since insert logic must check for null child pointers. This technique allows us to limit the node kinds to 4, which 1. limits the number of cases in switch statements 2. allows a possible future optimization to encode the node kind in a pointer tag --- src/backend/lib/radixtree.c | 141 +++++++++++++++++++++++++++--------- 1 file changed, 108 insertions(+), 33 deletions(-) diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c index bef1a438ab..f368e750d5 100644 --- a/src/backend/lib/radixtree.c +++ b/src/backend/lib/radixtree.c @@ -130,6 +130,7 @@ typedef enum typedef enum rt_size_class { RT_CLASS_4_FULL = 0, + RT_CLASS_32_PARTIAL, RT_CLASS_32_FULL, RT_CLASS_128_FULL, RT_CLASS_256 @@ -147,6 +148,8 @@ typedef struct rt_node uint16 count; /* Max number of children. 
We can use uint8 because we never need to store 256 */ + /* WIP: if we don't have a variable sized node4, this should instead be in the base + types as needed, since saving every byte is crucial for the smallest node kind */ uint8 fanout; /* @@ -166,6 +169,8 @@ typedef struct rt_node ((node)->base.n.count < (node)->base.n.fanout) /* Base type of each node kinds for leaf and inner nodes */ +/* The base types must be able to accommodate the largest size +class for variable-sized node kinds */ typedef struct rt_node_base_4 { rt_node n; @@ -217,40 +222,40 @@ typedef struct rt_node_inner_4 { rt_node_base_4 base; - /* 4 children, for key chunks */ - rt_node *children[4]; + /* number of children depends on size class */ + rt_node *children[FLEXIBLE_ARRAY_MEMBER]; } rt_node_inner_4; typedef struct rt_node_leaf_4 { rt_node_base_4 base; - /* 4 values, for key chunks */ - uint64 values[4]; + /* number of values depends on size class */ + uint64 values[FLEXIBLE_ARRAY_MEMBER]; } rt_node_leaf_4; typedef struct rt_node_inner_32 { rt_node_base_32 base; - /* 32 children, for key chunks */ - rt_node *children[32]; + /* number of children depends on size class */ + rt_node *children[FLEXIBLE_ARRAY_MEMBER]; } rt_node_inner_32; typedef struct rt_node_leaf_32 { rt_node_base_32 base; - /* 32 values, for key chunks */ - uint64 values[32]; + /* number of values depends on size class */ + uint64 values[FLEXIBLE_ARRAY_MEMBER]; } rt_node_leaf_32; typedef struct rt_node_inner_128 { rt_node_base_128 base; - /* Slots for 128 children */ - rt_node *children[128]; + /* number of children depends on size class */ + rt_node *children[FLEXIBLE_ARRAY_MEMBER]; } rt_node_inner_128; typedef struct rt_node_leaf_128 @@ -260,8 +265,8 @@ typedef struct rt_node_leaf_128 /* isset is a bitmap to track which slot is in use */ uint8 isset[RT_NODE_NSLOTS_BITS(128)]; - /* Slots for 128 values */ - uint64 values[128]; + /* number of values depends on size class */ + uint64 values[FLEXIBLE_ARRAY_MEMBER]; } 
rt_node_leaf_128; /* @@ -307,32 +312,40 @@ typedef struct rt_size_class_elem * from the block. */ #define NODE_SLAB_BLOCK_SIZE(size) \ - Max((SLAB_DEFAULT_BLOCK_SIZE / (size)) * size, (size) * 32) + Max((SLAB_DEFAULT_BLOCK_SIZE / (size)) * (size), (size) * 32) static rt_size_class_elem rt_size_class_info[RT_SIZE_CLASS_COUNT] = { [RT_CLASS_4_FULL] = { .name = "radix tree node 4", .fanout = 4, - .inner_size = sizeof(rt_node_inner_4), - .leaf_size = sizeof(rt_node_leaf_4), - .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_4)), - .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_4)), + .inner_size = sizeof(rt_node_inner_4) + 4 * sizeof(rt_node *), + .leaf_size = sizeof(rt_node_leaf_4) + 4 * sizeof(uint64), + .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_4) + 4 * sizeof(rt_node *)), + .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_4) + 4 * sizeof(uint64)), + }, + [RT_CLASS_32_PARTIAL] = { + .name = "radix tree node 15", + .fanout = 15, + .inner_size = sizeof(rt_node_inner_32) + 15 * sizeof(rt_node *), + .leaf_size = sizeof(rt_node_leaf_32) + 15 * sizeof(uint64), + .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_32) + 15 * sizeof(rt_node *)), + .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_32) + 15 * sizeof(uint64)), }, [RT_CLASS_32_FULL] = { .name = "radix tree node 32", .fanout = 32, - .inner_size = sizeof(rt_node_inner_32), - .leaf_size = sizeof(rt_node_leaf_32), - .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_32)), - .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_32)), + .inner_size = sizeof(rt_node_inner_32) + 32 * sizeof(rt_node *), + .leaf_size = sizeof(rt_node_leaf_32) + 32 * sizeof(uint64), + .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_32) + 32 * sizeof(rt_node *)), + .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_32) + 32 * sizeof(uint64)), }, [RT_CLASS_128_FULL] = { .name = "radix tree node 128", .fanout = 128, - .inner_size = 
sizeof(rt_node_inner_128), - .leaf_size = sizeof(rt_node_leaf_128), - .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_128)), - .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_128)), + .inner_size = sizeof(rt_node_inner_128) + 128 * sizeof(rt_node *), + .leaf_size = sizeof(rt_node_leaf_128) + 128 * sizeof(uint64), + .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_128) + 128 * sizeof(rt_node *)), + .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_128) + 128 * sizeof(uint64)), }, [RT_CLASS_256] = { .name = "radix tree node 256", @@ -922,7 +935,6 @@ rt_free_node(radix_tree *tree, rt_node *node) #ifdef RT_DEBUG /* update the statistics */ - // FIXME for (i = 0; i < RT_SIZE_CLASS_COUNT; i++) { if (node->fanout == rt_size_class_info[i].fanout) @@ -1240,7 +1252,7 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke /* grow node from 4 to 32 */ new32 = (rt_node_inner_32 *) rt_copy_node(tree, (rt_node *) n4, - RT_NODE_KIND_32, RT_CLASS_32_FULL); + RT_NODE_KIND_32, RT_CLASS_32_PARTIAL); chunk_children_array_copy(n4->base.chunks, n4->children, new32->base.chunks, new32->children, n4->base.n.count); @@ -1282,6 +1294,37 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke if (unlikely(!NODE_HAS_FREE_SLOT(n32))) { + Assert(parent != NULL); + + if (n32->base.n.fanout == rt_size_class_info[RT_CLASS_32_PARTIAL].fanout) + { + /* use the same node kind, but expand to the next size class */ + + /* no need to zero the new memory */ + rt_node_inner_32 *new32 = + (rt_node_inner_32 *) MemoryContextAlloc(tree->inner_slabs[RT_CLASS_32_FULL], + rt_size_class_info[RT_CLASS_32_FULL].inner_size); + +// FIXME the API for rt_alloc_node and rt_node_copy are too entangled +#ifdef RT_DEBUG + /* update the statistics */ + tree->cnt[RT_CLASS_32_FULL]++; +#endif + /* copy the entire old node -- the new node is only different in having + additional slots so we only have to change the fanout */ + 
memcpy(new32, n32, rt_size_class_info[RT_CLASS_32_PARTIAL].inner_size); + new32->base.n.fanout = rt_size_class_info[RT_CLASS_32_FULL].fanout; + + rt_replace_node(tree, parent, (rt_node *) n32, (rt_node *) new32, + key); + + /* must update both pointers here */ + node = (rt_node *) new32; + n32 = new32; + goto retry_insert_inner_32; + } + else + { rt_node_inner_128 *new128; /* grow node from 32 to 128 */ @@ -1290,13 +1333,14 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke for (int i = 0; i < n32->base.n.count; i++) node_inner_128_insert(new128, n32->base.chunks[i], n32->children[i]); - Assert(parent != NULL); rt_replace_node(tree, parent, (rt_node *) n32, (rt_node *) new128, key); node = (rt_node *) new128; + } } else { +retry_insert_inner_32: int insertpos = node_32_get_insertpos((rt_node_base_32 *) n32, chunk); int16 count = n32->base.n.count; @@ -1409,12 +1453,10 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, /* grow node from 4 to 32 */ new32 = (rt_node_leaf_32 *) rt_copy_node(tree, (rt_node *) n4, - RT_NODE_KIND_32, RT_CLASS_32_FULL); + RT_NODE_KIND_32, RT_CLASS_32_PARTIAL); chunk_values_array_copy(n4->base.chunks, n4->values, new32->base.chunks, new32->values, n4->base.n.count); - - Assert(parent != NULL); rt_replace_node(tree, parent, (rt_node *) n4, (rt_node *) new32, key); node = (rt_node *) new32; @@ -1451,6 +1493,37 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, if (unlikely(!NODE_HAS_FREE_SLOT(n32))) { + Assert(parent != NULL); + + if (n32->base.n.fanout == rt_size_class_info[RT_CLASS_32_PARTIAL].fanout) + { + /* use the same node kind, but expand to the next size class */ + + /* no need to zero the new memory */ + rt_node_leaf_32 *new32 = + (rt_node_leaf_32 *) MemoryContextAlloc(tree->leaf_slabs[RT_CLASS_32_FULL], + rt_size_class_info[RT_CLASS_32_FULL].leaf_size); + +// FIXME the API for rt_alloc_node and rt_node_copy are too entangled +#ifdef RT_DEBUG + /* update 
the statistics */ + tree->cnt[RT_CLASS_32_FULL]++; +#endif + /* copy the entire old node -- the new node is only different in having + additional slots so we only have to change the fanout */ + memcpy(new32, n32, rt_size_class_info[RT_CLASS_32_PARTIAL].leaf_size); + new32->base.n.fanout = rt_size_class_info[RT_CLASS_32_FULL].fanout; + + rt_replace_node(tree, parent, (rt_node *) n32, (rt_node *) new32, + key); + + /* must update both pointers here */ + node = (rt_node *) new32; + n32 = new32; + goto retry_insert_leaf_32; + } + else + { rt_node_leaf_128 *new128; /* grow node from 32 to 128 */ @@ -1459,13 +1532,14 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, for (int i = 0; i < n32->base.n.count; i++) node_leaf_128_insert(new128, n32->base.chunks[i], n32->values[i]); - Assert(parent != NULL); rt_replace_node(tree, parent, (rt_node *) n32, (rt_node *) new128, key); node = (rt_node *) new128; + } } else { +retry_insert_leaf_32: int insertpos = node_32_get_insertpos((rt_node_base_32 *) n32, chunk); int count = n32->base.n.count; @@ -2189,10 +2263,11 @@ rt_verify_node(rt_node *node) void rt_stats(radix_tree *tree) { - ereport(LOG, (errmsg("num_keys = %lu, height = %u, n4 = %u, n32 = %u, n128 = %u, n256 = %u", + ereport(NOTICE, (errmsg("num_keys = %lu, height = %u, n4 = %u, n15 = %u, n32 = %u, n128 = %u, n256 = %u", tree->num_keys, tree->root->shift / RT_NODE_SPAN, tree->cnt[RT_CLASS_4_FULL], + tree->cnt[RT_CLASS_32_PARTIAL], tree->cnt[RT_CLASS_32_FULL], tree->cnt[RT_CLASS_128_FULL], tree->cnt[RT_CLASS_256]))); -- 2.38.1
From 15e16df13912d265c3b1eda858456de6fe595c33 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Thu, 17 Nov 2022 12:10:31 +0700 Subject: [PATCH v901 1/2] Preparatory refactoring for decoupling kind from size class Rename the current kind info array to refer to size classes, but keep all the contents the same. Add a fanout member to all nodes which stores the max capacity of the node. This is currently set with the same hardcoded value as in the kind info array. In passing, remove outdated reference to node16 in the regression test. --- src/backend/lib/radixtree.c | 147 +++++++++++------- .../modules/test_radixtree/test_radixtree.c | 1 - 2 files changed, 87 insertions(+), 61 deletions(-) diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c index bd58b2bfad..bef1a438ab 100644 --- a/src/backend/lib/radixtree.c +++ b/src/backend/lib/radixtree.c @@ -127,6 +127,16 @@ typedef enum #define RT_NODE_KIND_256 0x03 #define RT_NODE_KIND_COUNT 4 +typedef enum rt_size_class +{ + RT_CLASS_4_FULL = 0, + RT_CLASS_32_FULL, + RT_CLASS_128_FULL, + RT_CLASS_256 + +#define RT_SIZE_CLASS_COUNT (RT_CLASS_256 + 1) +} rt_size_class; + /* Common type for all nodes types */ typedef struct rt_node { @@ -136,6 +146,9 @@ typedef struct rt_node */ uint16 count; + /* Max number of children. We can use uint8 because we never need to store 256 */ + uint8 fanout; + /* * Shift indicates which part of the key space is represented by this * node. 
That is, the key is shifted by 'shift' and the lowest @@ -144,13 +157,13 @@ typedef struct rt_node uint8 shift; uint8 chunk; - /* Size kind of the node */ + /* Node kind, one per search/set algorithm */ uint8 kind; } rt_node; #define NODE_IS_LEAF(n) (((rt_node *) (n))->shift == 0) #define NODE_IS_EMPTY(n) (((rt_node *) (n))->count == 0) -#define NODE_HAS_FREE_SLOT(n) \ - (((rt_node *) (n))->count < rt_node_kind_info[((rt_node *) (n))->kind].fanout) +#define NODE_HAS_FREE_SLOT(node) \ + ((node)->base.n.count < (node)->base.n.fanout) /* Base type of each node kinds for leaf and inner nodes */ typedef struct rt_node_base_4 @@ -190,7 +203,7 @@ typedef struct rt_node_base256 /* * Inner and leaf nodes. * - * There are separate from inner node size classes for two main reasons: + * These are separate for two main reasons: * * 1) the value type might be different than something fitting into a pointer * width type @@ -274,8 +287,8 @@ typedef struct rt_node_leaf_256 uint64 values[RT_NODE_MAX_SLOTS]; } rt_node_leaf_256; -/* Information of each size kinds */ -typedef struct rt_node_kind_info_elem +/* Information for each size class */ +typedef struct rt_size_class_elem { const char *name; int fanout; @@ -287,7 +300,7 @@ typedef struct rt_node_kind_info_elem /* slab block size */ Size inner_blocksize; Size leaf_blocksize; -} rt_node_kind_info_elem; +} rt_size_class_elem; /* * Calculate the slab blocksize so that we can allocate at least 32 chunks @@ -295,9 +308,9 @@ typedef struct rt_node_kind_info_elem */ #define NODE_SLAB_BLOCK_SIZE(size) \ Max((SLAB_DEFAULT_BLOCK_SIZE / (size)) * size, (size) * 32) -static rt_node_kind_info_elem rt_node_kind_info[RT_NODE_KIND_COUNT] = { +static rt_size_class_elem rt_size_class_info[RT_SIZE_CLASS_COUNT] = { - [RT_NODE_KIND_4] = { + [RT_CLASS_4_FULL] = { .name = "radix tree node 4", .fanout = 4, .inner_size = sizeof(rt_node_inner_4), @@ -305,7 +318,7 @@ static rt_node_kind_info_elem rt_node_kind_info[RT_NODE_KIND_COUNT] = { .inner_blocksize = 
NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_4)), .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_4)), }, - [RT_NODE_KIND_32] = { + [RT_CLASS_32_FULL] = { .name = "radix tree node 32", .fanout = 32, .inner_size = sizeof(rt_node_inner_32), @@ -313,7 +326,7 @@ static rt_node_kind_info_elem rt_node_kind_info[RT_NODE_KIND_COUNT] = { .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_32)), .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_32)), }, - [RT_NODE_KIND_128] = { + [RT_CLASS_128_FULL] = { .name = "radix tree node 128", .fanout = 128, .inner_size = sizeof(rt_node_inner_128), @@ -321,9 +334,11 @@ static rt_node_kind_info_elem rt_node_kind_info[RT_NODE_KIND_COUNT] = { .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_128)), .leaf_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_leaf_128)), }, - [RT_NODE_KIND_256] = { + [RT_CLASS_256] = { .name = "radix tree node 256", - .fanout = 256, + /* technically it's 256, but we can't store that in a uint8, + and this is the max size class so it will never grow */ + .fanout = 0, .inner_size = sizeof(rt_node_inner_256), .leaf_size = sizeof(rt_node_leaf_256), .inner_blocksize = NODE_SLAB_BLOCK_SIZE(sizeof(rt_node_inner_256)), @@ -372,17 +387,17 @@ struct radix_tree uint64 max_val; uint64 num_keys; - MemoryContextData *inner_slabs[RT_NODE_KIND_COUNT]; - MemoryContextData *leaf_slabs[RT_NODE_KIND_COUNT]; + MemoryContextData *inner_slabs[RT_SIZE_CLASS_COUNT]; + MemoryContextData *leaf_slabs[RT_SIZE_CLASS_COUNT]; /* statistics */ #ifdef RT_DEBUG - int32 cnt[RT_NODE_KIND_COUNT]; + int32 cnt[RT_SIZE_CLASS_COUNT]; #endif }; static void rt_new_root(radix_tree *tree, uint64 key); -static rt_node *rt_alloc_node(radix_tree *tree, int kind, uint8 shift, uint8 chunk, +static rt_node *rt_alloc_node(radix_tree *tree, int kind, rt_size_class size_class, uint8 shift, uint8 chunk, bool inner); static void rt_free_node(radix_tree *tree, rt_node *node); static void rt_extend(radix_tree *tree, uint64 key); @@ 
-584,7 +599,7 @@ chunk_children_array_copy(uint8 *src_chunks, rt_node **src_children, uint8 *dst_chunks, rt_node **dst_children, int count) { /* For better code generation */ - if (count > rt_node_kind_info[RT_NODE_KIND_4].fanout) + if (count > rt_size_class_info[RT_CLASS_4_FULL].fanout) pg_unreachable(); memcpy(dst_chunks, src_chunks, sizeof(uint8) * count); @@ -596,7 +611,7 @@ chunk_values_array_copy(uint8 *src_chunks, uint64 *src_values, uint8 *dst_chunks, uint64 *dst_values, int count) { /* For better code generation */ - if (count > rt_node_kind_info[RT_NODE_KIND_4].fanout) + if (count > rt_size_class_info[RT_CLASS_4_FULL].fanout) pg_unreachable(); memcpy(dst_chunks, src_chunks, sizeof(uint8) * count); @@ -837,7 +852,7 @@ rt_new_root(radix_tree *tree, uint64 key) int shift = key_get_shift(key); rt_node *node; - node = (rt_node *) rt_alloc_node(tree, RT_NODE_KIND_4, shift, 0, + node = (rt_node *) rt_alloc_node(tree, RT_NODE_KIND_4, RT_CLASS_4_FULL, shift, 0, shift > 0); tree->max_val = shift_get_max_val(shift); tree->root = node; @@ -847,18 +862,19 @@ rt_new_root(radix_tree *tree, uint64 key) * Allocate a new node with the given node kind. 
*/ static rt_node * -rt_alloc_node(radix_tree *tree, int kind, uint8 shift, uint8 chunk, bool inner) +rt_alloc_node(radix_tree *tree, int kind, rt_size_class size_class, uint8 shift, uint8 chunk, bool inner) { rt_node *newnode; if (inner) - newnode = (rt_node *) MemoryContextAllocZero(tree->inner_slabs[kind], - rt_node_kind_info[kind].inner_size); + newnode = (rt_node *) MemoryContextAllocZero(tree->inner_slabs[size_class], + rt_size_class_info[size_class].inner_size); else - newnode = (rt_node *) MemoryContextAllocZero(tree->leaf_slabs[kind], - rt_node_kind_info[kind].leaf_size); + newnode = (rt_node *) MemoryContextAllocZero(tree->leaf_slabs[size_class], + rt_size_class_info[size_class].leaf_size); newnode->kind = kind; + newnode->fanout = rt_size_class_info[size_class].fanout; newnode->shift = shift; newnode->chunk = chunk; @@ -872,7 +888,7 @@ rt_alloc_node(radix_tree *tree, int kind, uint8 shift, uint8 chunk, bool inner) #ifdef RT_DEBUG /* update the statistics */ - tree->cnt[kind]++; + tree->cnt[size_class]++; #endif return newnode; @@ -883,11 +899,11 @@ rt_alloc_node(radix_tree *tree, int kind, uint8 shift, uint8 chunk, bool inner) * count of 'node'. 
*/ static rt_node * -rt_copy_node(radix_tree *tree, rt_node *node, int new_kind) +rt_copy_node(radix_tree *tree, rt_node *node, int new_kind, rt_size_class new_size_class) { rt_node *newnode; - newnode = rt_alloc_node(tree, new_kind, node->shift, node->chunk, + newnode = rt_alloc_node(tree, new_kind, new_size_class, node->shift, node->chunk, node->shift > 0); newnode->count = node->count; @@ -898,14 +914,22 @@ rt_copy_node(radix_tree *tree, rt_node *node, int new_kind) static void rt_free_node(radix_tree *tree, rt_node *node) { + int i; + /* If we're deleting the root node, make the tree empty */ if (tree->root == node) tree->root = NULL; #ifdef RT_DEBUG /* update the statistics */ - tree->cnt[node->kind]--; - Assert(tree->cnt[node->kind] >= 0); + // FIXME + for (i = 0; i < RT_SIZE_CLASS_COUNT; i++) + { + if (node->fanout == rt_size_class_info[i].fanout) + break; + } + tree->cnt[i]--; + Assert(tree->cnt[i] >= 0); #endif pfree(node); @@ -954,7 +978,7 @@ rt_extend(radix_tree *tree, uint64 key) { rt_node_inner_4 *node; - node = (rt_node_inner_4 *) rt_alloc_node(tree, RT_NODE_KIND_4, + node = (rt_node_inner_4 *) rt_alloc_node(tree, RT_NODE_KIND_4, RT_CLASS_4_FULL, shift, 0, true); node->base.n.count = 1; node->base.chunks[0] = 0; @@ -984,7 +1008,7 @@ rt_set_extend(radix_tree *tree, uint64 key, uint64 value, rt_node *parent, rt_node *newchild; int newshift = shift - RT_NODE_SPAN; - newchild = rt_alloc_node(tree, RT_NODE_KIND_4, newshift, + newchild = rt_alloc_node(tree, RT_NODE_KIND_4, RT_CLASS_4_FULL, newshift, RT_GET_KEY_CHUNK(key, node->shift), newshift > 0); rt_node_insert_inner(tree, parent, node, key, newchild); @@ -1216,7 +1240,7 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke /* grow node from 4 to 32 */ new32 = (rt_node_inner_32 *) rt_copy_node(tree, (rt_node *) n4, - RT_NODE_KIND_32); + RT_NODE_KIND_32, RT_CLASS_32_FULL); chunk_children_array_copy(n4->base.chunks, n4->children, new32->base.chunks, new32->children, 
n4->base.n.count); @@ -1262,7 +1286,7 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke /* grow node from 32 to 128 */ new128 = (rt_node_inner_128 *) rt_copy_node(tree, (rt_node *) n32, - RT_NODE_KIND_128); + RT_NODE_KIND_128, RT_CLASS_128_FULL); for (int i = 0; i < n32->base.n.count; i++) node_inner_128_insert(new128, n32->base.chunks[i], n32->children[i]); @@ -1305,7 +1329,7 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke /* grow node from 128 to 256 */ new256 = (rt_node_inner_256 *) rt_copy_node(tree, (rt_node *) n128, - RT_NODE_KIND_256); + RT_NODE_KIND_256, RT_CLASS_256); for (int i = 0; i < RT_NODE_MAX_SLOTS && cnt < n128->base.n.count; i++) { if (!node_128_is_chunk_used((rt_node_base_128 *) n128, i)) @@ -1332,7 +1356,8 @@ rt_node_insert_inner(radix_tree *tree, rt_node *parent, rt_node *node, uint64 ke rt_node_inner_256 *n256 = (rt_node_inner_256 *) node; chunk_exists = node_inner_256_is_chunk_used(n256, chunk); - Assert(chunk_exists || NODE_HAS_FREE_SLOT(n256)); + Assert(n256->base.n.fanout == 0); + Assert(chunk_exists || ((rt_node *) n256)->count < RT_NODE_MAX_SLOTS); node_inner_256_set(n256, chunk, child); break; @@ -1384,7 +1409,7 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, /* grow node from 4 to 32 */ new32 = (rt_node_leaf_32 *) rt_copy_node(tree, (rt_node *) n4, - RT_NODE_KIND_32); + RT_NODE_KIND_32, RT_CLASS_32_FULL); chunk_values_array_copy(n4->base.chunks, n4->values, new32->base.chunks, new32->values, n4->base.n.count); @@ -1430,7 +1455,7 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, /* grow node from 32 to 128 */ new128 = (rt_node_leaf_128 *) rt_copy_node(tree, (rt_node *) n32, - RT_NODE_KIND_128); + RT_NODE_KIND_128, RT_CLASS_128_FULL); for (int i = 0; i < n32->base.n.count; i++) node_leaf_128_insert(new128, n32->base.chunks[i], n32->values[i]); @@ -1473,7 +1498,7 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, 
rt_node *node, /* grow node from 128 to 256 */ new256 = (rt_node_leaf_256 *) rt_copy_node(tree, (rt_node *) n128, - RT_NODE_KIND_256); + RT_NODE_KIND_256, RT_CLASS_256); for (int i = 0; i < RT_NODE_MAX_SLOTS && cnt < n128->base.n.count; i++) { if (!node_128_is_chunk_used((rt_node_base_128 *) n128, i)) @@ -1500,7 +1525,8 @@ rt_node_insert_leaf(radix_tree *tree, rt_node *parent, rt_node *node, rt_node_leaf_256 *n256 = (rt_node_leaf_256 *) node; chunk_exists = node_leaf_256_is_chunk_used(n256, chunk); - Assert(chunk_exists || NODE_HAS_FREE_SLOT(n256)); + Assert(((rt_node *) n256)->fanout == 0); + Assert(chunk_exists || ((rt_node *) n256)->count < 256); node_leaf_256_set(n256, chunk, value); break; @@ -1538,16 +1564,16 @@ rt_create(MemoryContext ctx) tree->num_keys = 0; /* Create the slab allocator for each size class */ - for (int i = 0; i < RT_NODE_KIND_COUNT; i++) + for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++) { tree->inner_slabs[i] = SlabContextCreate(ctx, - rt_node_kind_info[i].name, - rt_node_kind_info[i].inner_blocksize, - rt_node_kind_info[i].inner_size); + rt_size_class_info[i].name, + rt_size_class_info[i].inner_blocksize, + rt_size_class_info[i].inner_size); tree->leaf_slabs[i] = SlabContextCreate(ctx, - rt_node_kind_info[i].name, - rt_node_kind_info[i].leaf_blocksize, - rt_node_kind_info[i].leaf_size); + rt_size_class_info[i].name, + rt_size_class_info[i].leaf_blocksize, + rt_size_class_info[i].leaf_size); #ifdef RT_DEBUG tree->cnt[i] = 0; #endif @@ -1564,7 +1590,7 @@ rt_create(MemoryContext ctx) void rt_free(radix_tree *tree) { - for (int i = 0; i < RT_NODE_KIND_COUNT; i++) + for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++) { MemoryContextDelete(tree->inner_slabs[i]); MemoryContextDelete(tree->leaf_slabs[i]); @@ -2076,7 +2102,7 @@ rt_memory_usage(radix_tree *tree) { Size total = sizeof(radix_tree); - for (int i = 0; i < RT_NODE_KIND_COUNT; i++) + for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++) { total += MemoryContextMemAllocated(tree->inner_slabs[i], true); 
total += MemoryContextMemAllocated(tree->leaf_slabs[i], true); @@ -2166,10 +2192,10 @@ rt_stats(radix_tree *tree) ereport(LOG, (errmsg("num_keys = %lu, height = %u, n4 = %u, n32 = %u, n128 = %u, n256 = %u", tree->num_keys, tree->root->shift / RT_NODE_SPAN, - tree->cnt[0], - tree->cnt[1], - tree->cnt[2], - tree->cnt[3]))); + tree->cnt[RT_CLASS_4_FULL], + tree->cnt[RT_CLASS_32_FULL], + tree->cnt[RT_CLASS_128_FULL], + tree->cnt[RT_CLASS_256]))); } static void @@ -2177,11 +2203,12 @@ rt_dump_node(rt_node *node, int level, bool recurse) { char space[128] = {0}; - fprintf(stderr, "[%s] kind %d, count %u, shift %u, chunk 0x%X:\n", + fprintf(stderr, "[%s] kind %d, fanout %d, count %u, shift %u, chunk 0x%X:\n", NODE_IS_LEAF(node) ? "LEAF" : "INNR", (node->kind == RT_NODE_KIND_4) ? 4 : (node->kind == RT_NODE_KIND_32) ? 32 : (node->kind == RT_NODE_KIND_128) ? 128 : 256, + node->fanout == 0 ? 256 : node->fanout, node->count, node->shift, node->chunk); if (level > 0) @@ -2384,13 +2411,13 @@ rt_dump_search(radix_tree *tree, uint64 key) void rt_dump(radix_tree *tree) { - for (int i = 0; i < RT_NODE_KIND_COUNT; i++) + for (int i = 0; i < RT_SIZE_CLASS_COUNT; i++) fprintf(stderr, "%s\tinner_size%lu\tinner_blocksize %lu\tleaf_size %lu\tleaf_blocksize %lu\n", - rt_node_kind_info[i].name, - rt_node_kind_info[i].inner_size, - rt_node_kind_info[i].inner_blocksize, - rt_node_kind_info[i].leaf_size, - rt_node_kind_info[i].leaf_blocksize); + rt_size_class_info[i].name, + rt_size_class_info[i].inner_size, + rt_size_class_info[i].inner_blocksize, + rt_size_class_info[i].leaf_size, + rt_size_class_info[i].leaf_blocksize); fprintf(stderr, "max_val = %lu\n", tree->max_val); if (!tree->root) diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c index cb3596755d..de1cd6cd70 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -40,7 +40,6 @@ static const bool rt_test_stats 
= false; /* The maximum number of entries each node type can have */ static int rt_node_max_entries[] = { 4, /* RT_NODE_KIND_4 */ - 16, /* RT_NODE_KIND_16 */ 32, /* RT_NODE_KIND_32 */ 128, /* RT_NODE_KIND_128 */ 256 /* RT_NODE_KIND_256 */ -- 2.38.1