On Sat, Dec 10, 2022 at 11:02 AM David Rowley <dgrowle...@gmail.com> wrote:
> [v4]
Thanks for working on this!

I ran an in-situ benchmark using the v13 radix tree patchset ([1], WIP but should be useful enough for testing allocation speed), applying only the first five patches, which are local-memory only. The benchmark is not meant to represent a realistic workload; it primarily stresses traversal and allocation of the smallest node type. Numbers are the minimum of five runs, with turbo boost off, on recent Intel laptop hardware:

v13-0001 to 0005:

# select * from bench_load_random_int(500 * 1000);
 mem_allocated | load_ms
---------------+---------
     151123432 |     222

  47.06%  postgres  postgres   [.] rt_set
  22.89%  postgres  postgres   [.] SlabAlloc
   9.65%  postgres  postgres   [.] rt_node_insert_inner.isra.0
   5.94%  postgres  [unknown]  [k] 0xffffffffb5e011b7
   3.62%  postgres  postgres   [.] MemoryContextAlloc
   2.70%  postgres  libc.so.6  [.] __memmove_avx_unaligned_erms
   2.60%  postgres  postgres   [.] SlabFree

+ v4 slab:

# select * from bench_load_random_int(500 * 1000);
 mem_allocated | load_ms
---------------+---------
     152463112 |     213

  52.42%  postgres  postgres   [.] rt_set
  12.80%  postgres  postgres   [.] SlabAlloc
   9.38%  postgres  postgres   [.] rt_node_insert_inner.isra.0
   7.87%  postgres  [unknown]  [k] 0xffffffffb5e011b7
   4.98%  postgres  postgres   [.] SlabFree

While allocation is markedly improved, freeing looks worse here. The proportion is surprising: only about 2% of nodes are freed during the load, yet freeing takes 10-40% as much time as allocating.

num_keys = 500000, height = 7
n4 = 2501016, n15 = 56932, n32 = 270, n125 = 0, n256 = 257

Sidenote: I don't recall ever seeing vsyscall (I think that's what the 0xffffffffb5e011b7 address refers to) in a profile, so I'm not sure what is happening there.

[1] https://www.postgresql.org/message-id/CAFBsxsHNE621mGuPhd7kxaGc22vMkoSu7R4JW9Zan1jjorGy3g%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com