On Tue, Sep 20, 2022 at 3:19 PM Masahiko Sawada <sawada.m...@gmail.com>
wrote:
>
> On Fri, Sep 16, 2022 at 4:54 PM John Naylor
> <john.nay...@enterprisedb.com> wrote:

> > Here again, I'd rather put this off and focus on getting the "large
> > details" in good enough shape so we can got towards integrating with
> > vacuum.
>
> Thank you for the comments! These above comments are addressed by
> Nathan in a newly derived thread. I'll work on the patch.

I still seem to be out-voted on when to tackle this particular
optimization, so I've extended the v6 benchmark code with a hackish
function that populates a fixed number of keys, but with different fanouts.
(diff attached as a text file)

I didn't take particular care to make this scientific, but the following
seems pretty reproducible. Note what happens to load and search performance
when node16 has 15 entries versus 16:

 fanout | nkeys  | rt_mem_allocated | rt_load_ms | rt_search_ms
--------+--------+------------------+------------+--------------
     15 | 327680 |          3776512 |         39 |           20
(1 row)
num_keys = 327680, height = 4, n4 = 1, n16 = 23408, n32 = 0, n128 = 0, n256
= 0

 fanout | nkeys  | rt_mem_allocated | rt_load_ms | rt_search_ms
--------+--------+------------------+------------+--------------
     16 | 327680 |          3514368 |         25 |           11
(1 row)
num_keys = 327680, height = 4, n4 = 0, n16 = 21846, n32 = 0, n128 = 0, n256
= 0

In trying to wrap the SIMD code behind layers of abstraction, the latest
patch (and Nathan's cleanup) threw it away in almost all cases. To explain,
we need to talk about how vectorized code deals with the "tail" that is too
small for the register:

1. Use a one-by-one algorithm, like we do for the pg_lfind* variants.
2. Read some junk into the register and mask off false positives from the
result.

There are advantages to both depending on the situation.

Patch v5 and earlier used #2. Patch v6 used #1, so if a node16 has 15
elements or less, it will iterate over them one-by-one exactly like a
node4. Only when full with 16 will the vector path be taken. When another
entry is added, the elements are copied to the next bigger node, so there's
a *small* window where it's fast.

In short, this code needs to be lower level so that we still have full
control while being portable. I will work on this, and also the related
code for node dispatch.

Since v6 has some good infrastructure to do low-level benchmarking, I also
want to do some experiments with memory management.

(I have further comments about the code, but I will put that off until
later)

> I'll consider how to integrate with vacuum as the next step. One
> concern for me is how to limit the memory usage to
> maintenance_work_mem. Unlike using a flat array, memory space for
> adding one TID varies depending on the situation. If we want strictly
> not to allow using memory more than maintenance_work_mem, probably we
> need to estimate the memory consumption in a conservative way.

+1

--
John Naylor
EDB: http://www.enterprisedb.com
commit 18407962e96ccec6c9aeeba97412edd762a5a4fe
Author: John Naylor <john.nay...@postgresql.org>
Date:   Wed Sep 21 11:44:43 2022 +0700

    Add special benchmark function to test effect of fanout

diff --git a/contrib/bench_radix_tree/Makefile 
b/contrib/bench_radix_tree/Makefile
index b8f70e12d1..952bb0ceae 100644
--- a/contrib/bench_radix_tree/Makefile
+++ b/contrib/bench_radix_tree/Makefile
@@ -7,7 +7,7 @@ OBJS = \
 EXTENSION = bench_radix_tree
 DATA = bench_radix_tree--1.0.sql
 
-REGRESS = bench
+REGRESS = bench_fixed_height
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql 
b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
index 6663abe6a4..f2fee15b17 100644
--- a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
+++ b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
@@ -40,3 +40,15 @@ OUT load_ms int8)
 returns record
 as 'MODULE_PATHNAME'
 LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
+
+create function bench_fixed_height_search(
+fanout int4,
+OUT fanout int4,
+OUT nkeys int8,
+OUT rt_mem_allocated int8,
+OUT rt_load_ms int8,
+OUT rt_search_ms int8
+)
+returns record
+as 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
diff --git a/contrib/bench_radix_tree/bench_radix_tree.c 
b/contrib/bench_radix_tree/bench_radix_tree.c
index 5806ef7519..0778da2d7b 100644
--- a/contrib/bench_radix_tree/bench_radix_tree.c
+++ b/contrib/bench_radix_tree/bench_radix_tree.c
@@ -13,6 +13,7 @@
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/radixtree.h"
+#include <math.h>
 #include "miscadmin.h"
 #include "utils/timestamp.h"
 
@@ -24,6 +25,7 @@ PG_MODULE_MAGIC;
 PG_FUNCTION_INFO_V1(bench_seq_search);
 PG_FUNCTION_INFO_V1(bench_shuffle_search);
 PG_FUNCTION_INFO_V1(bench_load_random_int);
+PG_FUNCTION_INFO_V1(bench_fixed_height_search);
 
 static radix_tree *rt = NULL;
 static ItemPointer itemptrs = NULL;
@@ -299,3 +301,108 @@ bench_load_random_int(PG_FUNCTION_ARGS)
        rt_free(rt);
        PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, 
nulls)));
 }
+
+Datum
+bench_fixed_height_search(PG_FUNCTION_ARGS)
+{
+       int     fanout = PG_GETARG_INT32(0);
+       radix_tree *rt;
+       TupleDesc       tupdesc;
+       TimestampTz     start_time,     end_time;
+       long    secs;
+       int             usecs;
+       int64   rt_load_ms, rt_search_ms;
+       Datum   values[5];
+       bool    nulls[5];
+
+       /* test boundary between vector and iteration */
+       const int               n_keys = 5 * 16 * 16 * 16 * 16;
+       uint64          r, h, i, j, k;
+       int key_id;
+
+       /* Build a tuple descriptor for our result type */
+       if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+               elog(ERROR, "return type must be a row type");
+
+       rt = rt_create(CurrentMemoryContext);
+
+       start_time = GetCurrentTimestamp();
+
+       key_id = 0;
+       /*  lower nodes have limited fanout, the top is only limited by 
bits-per-byte */
+       for (r=1;;r++)
+       {
+               for (h = 1; h <= fanout; h++)
+               {
+                       for (i = 1; i <= fanout; i++)
+                       {
+                               for (j = 1; j <= fanout; j++)
+                               {
+                                       for (k = 1; k <= fanout; k++)
+                                       {
+                                               uint64  key;
+                                               key = (r<<32) | (h<<24) | 
(i<<16) | (j<<8) | (k);
+
+                                               CHECK_FOR_INTERRUPTS();
+
+                                               key_id++;
+                                               if (key_id > n_keys)
+                                                       goto finish_set;
+
+                                               rt_set(rt, key, key_id);
+                                       }
+                               }
+                       }
+               }
+       }
+finish_set:
+       end_time = GetCurrentTimestamp();
+       TimestampDifference(start_time, end_time, &secs, &usecs);
+       rt_load_ms = secs * 1000 + usecs / 1000;
+
+       rt_stats(rt);
+
+       /* meaure the search time of the radix tree */
+       start_time = GetCurrentTimestamp();
+
+       key_id = 0;
+       for (r=1;;r++)
+       {
+               for (h = 1; h <= fanout; h++)
+               {
+                       for (i = 1; i <= fanout; i++)
+                       {
+                               for (j = 1; j <= fanout; j++)
+                               {
+                                       for (k = 1; k <= fanout; k++)
+                                       {
+                                               uint64  key, val;
+                                               key = (r<<32) | (h<<24) | 
(i<<16) | (j<<8) | (k);
+
+                                               CHECK_FOR_INTERRUPTS();
+
+                                               key_id++;
+                                               if (key_id > n_keys)
+                                                       goto finish_search;
+
+                                               rt_search(rt, key, &val);
+                                       }
+                               }
+                       }
+               }
+       }
+finish_search:
+       end_time = GetCurrentTimestamp();
+       TimestampDifference(start_time, end_time, &secs, &usecs);
+       rt_search_ms = secs * 1000 + usecs / 1000;
+
+       MemSet(nulls, false, sizeof(nulls));
+       values[0] = Int32GetDatum(fanout);
+       values[1] = Int64GetDatum(rt_num_entries(rt));
+       values[2] = Int64GetDatum(rt_memory_usage(rt));
+       values[3] = Int64GetDatum(rt_load_ms);
+       values[4] = Int64GetDatum(rt_search_ms);
+
+       rt_free(rt);
+       PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, 
nulls)));
+}
diff --git a/contrib/bench_radix_tree/expected/bench_fixed_height.out 
b/contrib/bench_radix_tree/expected/bench_fixed_height.out
new file mode 100644
index 0000000000..c4995afc13
--- /dev/null
+++ b/contrib/bench_radix_tree/expected/bench_fixed_height.out
@@ -0,0 +1,6 @@
+create extension bench_radix_tree;
+\o fixed_height_search.data
+begin;
+select * from bench_fixed_height_search(15);
+select * from bench_fixed_height_search(16);
+commit;
diff --git a/contrib/bench_radix_tree/sql/bench_fixed_height.sql 
b/contrib/bench_radix_tree/sql/bench_fixed_height.sql
new file mode 100644
index 0000000000..0c06570e9a
--- /dev/null
+++ b/contrib/bench_radix_tree/sql/bench_fixed_height.sql
@@ -0,0 +1,7 @@
+create extension bench_radix_tree;
+
+\o fixed_height_search.data
+begin;
+select * from bench_fixed_height_search(15);
+select * from bench_fixed_height_search(16);
+commit;
diff --git a/src/backend/lib/radixtree.c b/src/backend/lib/radixtree.c
index b163eac480..4ce8e9ad9d 100644
--- a/src/backend/lib/radixtree.c
+++ b/src/backend/lib/radixtree.c
@@ -1980,7 +1980,7 @@ rt_verify_node(rt_node *node)
 void
 rt_stats(radix_tree *tree)
 {
-       fprintf(stderr, "num_keys = %lu, height = %u, n4 = %u, n16 = %u,n32 = 
%u, n128 = %u, n256 = %u",
+       fprintf(stderr, "num_keys = %lu, height = %u, n4 = %u, n16 = %u,n32 = 
%u, n128 = %u, n256 = %u\n",
                        tree->num_keys,
                        tree->root->shift / RT_NODE_SPAN,
                        tree->cnt[0],
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 38cc6abf4c..6016d593ee 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -15,7 +15,7 @@
 
 #include "postgres.h"
 
-/* #define RT_DEBUG 1 */
+#define RT_DEBUG 1 
 
 typedef struct radix_tree radix_tree;
 typedef struct rt_iter rt_iter;

Reply via email to