Hi,
On Tue, Feb 7, 2023 at 6:25 PM John Naylor <[email protected]> wrote:
>
>
> On Tue, Jan 31, 2023 at 9:43 PM Masahiko Sawada <[email protected]> wrote:
>
> > I've attached v24 patches. The locking support patch is separated
> > (0005 patch). Also I kept the updates for TidStore and the vacuum
> > integration from v23 separate.
>
> Okay, that's a lot more simple, and closer to what I imagined. For v25, I
> squashed v24's additions and added a couple of my own. I've kept the CF
> status at "needs review" because no specific action is required at the moment.
>
> I did start to review the TID store some more, but that's on hold because
> something else came up: On a lark I decided to re-run some benchmarks to see
> if anything got lost in converting to a template, and that led me down a
> rabbit hole -- some good and bad news on that below.
>
> 0001:
>
> I removed the uint64 case, as discussed. There is now a brief commit message,
> but needs to be fleshed out a bit. I took another look at the Arm
> optimization that Nathan found some month ago, for forming the highbit mask,
> but that doesn't play nicely with how node32 uses it, so I decided against
> it. I added a comment to describe the reasoning in case someone else gets a
> similar idea.
>
> I briefly looked into "separate-commit TODO: move non-SIMD fallbacks to their
> own header to clean up the #ifdef maze.", but decided it wasn't such a clear
> win to justify starting the work now. It's still in the back of my mind, but
> I removed the reminder from the commit message.
The changes make sense to me.
>
> 0003:
>
> The template now requires the value to be passed as a pointer. That was a
> pretty trivial change, but affected multiple other patches, so not sent
> separately. Also adds a forgotten RT_ prefix to the bitmap macros and adds a
> top comment to the *_impl.h headers. There are some comment fixes. The
> changes were either trivial or discussed earlier, so also not sent separately.
Great.
>
> 0004/5: I wanted to measure the load time as well as search time in
> bench_search_random_nodes(). That's kept separate to make it easier to test
> other patch versions.
>
> The bad news is that the speed of loading TIDs in bench_seq/shuffle_search()
> has regressed noticeably. I can't reproduce this in any other bench function
> and was the reason for writing 0005 to begin with. More confusingly, my
> efforts to fix this improved *other* functions, but the former didn't budge
> at all. First the patches:
>
> 0006 adds and removes some "inline" declarations (where it made sense), and
> added some for "pg_noinline" based on Andres' advice some months ago.
Agreed.
>
> 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during
> RT_SET_EXTEND, so it can't possibly find an existing key. This kind of change
> is much easier with the inner/node cases handled together in a template, as
> far as being sure of how those cases are different. I thought about trying
> the search in assert builds and verifying it doesn't exist, but thought yet
> another #ifdef would be too messy.
Agreed.
>
> v25-addendum-try-no-maintain-order.txt -- It makes optional keeping the key
> chunks in order for the linear-search nodes. I believe the TID store no
> longer cares about the ordering, but this is a text file for now because I
> don't want to clutter the CI with a behavior change. Also, the second ART
> paper (on concurrency) mentioned that some locking schemes don't allow these
> arrays to be shifted. So it might make sense to give up entirely on
> guaranteeing ordered iteration, or at least make it optional as in the patch.
I think it's still important for lazy vacuum that an iteration over a
TID store returns TIDs in ascending order, because otherwise a heap
vacuum does random writes. That being said, we can have
RT_ITERATE_NEXT() return key-value pairs in an order regardless of how
the key chunks are stored in a node.
> ========================================
> psql -c "select rt_load_ms, rt_search_ms from bench_seq_search(0, 1 * 1000 *
> 1000)"
> (min load time of three)
>
> v15:
> rt_load_ms | rt_search_ms
> ------------+--------------
> 113 | 455
>
> v25-0005:
> rt_load_ms | rt_search_ms
> ------------+--------------
> 135 | 456
>
> v25-0006 (inlining or not):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 136 | 455
>
> v25-0007 (remove dead code):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 135 | 455
>
> v25-addendum...txt (no ordering):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 134 | 455
>
> Note: The regression seems to have started in v17, which is the first with a
> full template.
>
> Nothing so far has helped here, and previous experience has shown that trying
> to profile 100ms will not be useful. Instead of putting more effort into
> diving deeper, it seems a better use of time to write a benchmark that calls
> the tid store itself. That's more realistic, since this function was intended
> to test load and search of tids, but the tid store doesn't quite operate so
> simply anymore. What do you think, Masahiko?
Yeah, that's more realistic. TidStore now encodes TIDs slightly
differently from the benchmark test.
I've attached the patch that adds a simple benchmark test using
TidStore. With this test, I got similar trends of results to yours
with gcc, but I've not analyzed them in depth yet.
query: select * from bench_tidstore_load(0, 10 * 1000 * 1000)
v15:
load_ms
---------
816
v25-0007 (remove dead code):
load_ms
---------
839
v25-addendum...txt (no ordering):
load_ms
---------
820
BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c.
>
> I'm inclined to keep 0006, because it might give a slight boost, and 0007
> because it's never a bad idea to remove dead code.
Yeah, these two changes make sense to me too.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From e056133360436e115a434a8a21685a99602a5b5d Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Wed, 8 Feb 2023 15:53:14 +0900
Subject: [PATCH] Add bench_tidstore_load()
---
.../bench_radix_tree--1.0.sql | 10 ++++
contrib/bench_radix_tree/bench_radix_tree.c | 46 +++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
index 95eedbbe10..fbf51c1086 100644
--- a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
+++ b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
@@ -75,3 +75,13 @@ OUT rt_sparseload_ms int8
returns record
as 'MODULE_PATHNAME'
LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
+
+create function bench_tidstore_load(
+minblk int4,
+maxblk int4,
+OUT mem_allocated int8,
+OUT load_ms int8
+)
+returns record
+as 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
diff --git a/contrib/bench_radix_tree/bench_radix_tree.c
b/contrib/bench_radix_tree/bench_radix_tree.c
index 7d1e2eee57..3c2caa3b90 100644
--- a/contrib/bench_radix_tree/bench_radix_tree.c
+++ b/contrib/bench_radix_tree/bench_radix_tree.c
@@ -9,6 +9,7 @@
*/
#include "postgres.h"
+#include "access/tidstore.h"
#include "common/pg_prng.h"
#include "fmgr.h"
#include "funcapi.h"
@@ -54,6 +55,7 @@ PG_FUNCTION_INFO_V1(bench_load_random_int);
PG_FUNCTION_INFO_V1(bench_fixed_height_search);
PG_FUNCTION_INFO_V1(bench_search_random_nodes);
PG_FUNCTION_INFO_V1(bench_node128_load);
+PG_FUNCTION_INFO_V1(bench_tidstore_load);
static uint64
tid_to_key_off(ItemPointer tid, uint32 *off)
@@ -168,6 +170,50 @@ vac_cmp_itemptr(const void *left, const void *right)
}
#endif
+Datum
+bench_tidstore_load(PG_FUNCTION_ARGS)
+{
+ BlockNumber minblk = PG_GETARG_INT32(0);
+ BlockNumber maxblk = PG_GETARG_INT32(1);
+ TidStore *ts;
+ OffsetNumber *offs;
+ TimestampTz start_time,
+ end_time;
+ long secs;
+ int usecs;
+ int64 load_ms;
+ TupleDesc tupdesc;
+ Datum values[2];
+ bool nulls[2] = {false};
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ offs = palloc(sizeof(OffsetNumber) * TIDS_PER_BLOCK_FOR_LOAD);
+ for (int i = 0; i < TIDS_PER_BLOCK_FOR_LOAD; i++)
+ offs[i] = i + 1; /* FirstOffsetNumber is 1 */
+
+ ts = tidstore_create(1 * 1024L * 1024L * 1024L, MaxHeapTuplesPerPage,
NULL);
+
+ elog(NOTICE, "sleeping for 2 seconds...");
+ pg_usleep(2 * 1000000L);
+
+ /* load tids */
+ start_time = GetCurrentTimestamp();
+ for (BlockNumber blkno = minblk; blkno < maxblk; blkno++)
+ tidstore_add_tids(ts, blkno, offs, TIDS_PER_BLOCK_FOR_LOAD);
+ end_time = GetCurrentTimestamp();
+ TimestampDifference(start_time, end_time, &secs, &usecs);
+ load_ms = secs * 1000 + usecs / 1000;
+
+ values[0] = Int64GetDatum(tidstore_memory_usage(ts));
+ values[1] = Int64GetDatum(load_ms);
+
+ tidstore_destroy(ts);
+ PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values,
nulls)));
+}
+
static Datum
bench_search(FunctionCallInfo fcinfo, bool shuffle)
{
--
2.31.1