I wrote: > I've reproduced what looks like about the same thing on > my Pi 4 using Fedora 38: just run "make installcheck-parallel" > under valgrind, and kaboom. Definitely needs investigation.
The problem appears to be that RT_ALLOC_NODE doesn't bother to initialize the chunks[] array when making a RT_NODE_16 node. If we fill fewer than RT_FANOUT_16_MAX of the chunks[] entries, then when RT_NODE_16_SEARCH_EQ applies vector operations that read the entire array, it's operating partially on uninitialized data. Now, that's actually OK because of the "mask off invalid entries" step, but aarch64 valgrind complains anyway. I hypothesize that the reason we're not seeing equivalent failures on x86_64 is one of 1. x86_64 valgrind is stupider than aarch64, and fails to track that the contents of the SIMD registers are only partially defined. 2. x86_64 valgrind is smarter than aarch64, and is able to see that the "mask off invalid entries" step removes all the potentially-uninitialized bits. The first attached patch, "radixtree-fix-minimal.patch", is enough to stop the aarch64 valgrind failure for me. However, I think that the coding here is pretty penny-wise and pound-foolish, and that what we really ought to do is the second patch, "radixtree-fix-proposed.patch". I do not believe that asking memset to zero the three-byte RT_NODE structure produces code that's either shorter or faster than having it zero 8 bytes (as for RT_NODE_4) or having it do that and then separately zero some more stuff (as for the larger node types). Leaving RT_NODE_4's chunks[] array uninitialized is going to bite us someday, too, even if it doesn't right now. So I think we ought to just zero the whole fixed-size part of the nodes, which is what radixtree-fix-proposed.patch does. (The RT_NODE_48 case could be optimized a bit if we cared to swap the order of its slot_idxs[] and isset[] arrays; then the initial zeroing could just go up to slot_idxs[]. I don't know if there's any reason why the current order is preferable.) regards, tom lane
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 338e1d741d..e21f7be3f9 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -849,8 +849,14 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c switch (kind) { case RT_NODE_KIND_4: - case RT_NODE_KIND_16: break; + case RT_NODE_KIND_16: + { + RT_NODE_16 *n16 = (RT_NODE_16 *) node; + + memset(n16->chunks, 0, sizeof(n16->chunks)); + break; + } case RT_NODE_KIND_48: { RT_NODE_48 *n48 = (RT_NODE_48 *) node;
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 338e1d741d..4510f7c4cd 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -845,27 +845,25 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c /* initialize contents */ - memset(node, 0, sizeof(RT_NODE)); switch (kind) { case RT_NODE_KIND_4: + memset(node, 0, offsetof(RT_NODE_4, children)); + break; case RT_NODE_KIND_16: + memset(node, 0, offsetof(RT_NODE_16, children)); break; case RT_NODE_KIND_48: { RT_NODE_48 *n48 = (RT_NODE_48 *) node; - memset(n48->isset, 0, sizeof(n48->isset)); + memset(n48, 0, offsetof(RT_NODE_48, children)); memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs)); break; } case RT_NODE_KIND_256: - { - RT_NODE_256 *n256 = (RT_NODE_256 *) node; - - memset(n256->isset, 0, sizeof(n256->isset)); - break; - } + memset(node, 0, offsetof(RT_NODE_256, children)); + break; default: pg_unreachable(); }