On Tue, Dec 13, 2022 at 7:50 AM David Rowley <dgrowle...@gmail.com> wrote:
>
> Thanks for testing the patch.
>
> On Mon, 12 Dec 2022 at 20:14, John Naylor <john.nay...@enterprisedb.com> wrote:
> > While allocation is markedly improved, freeing looks worse here. The proportion is surprising because only about 2% of nodes are freed during the load, but doing that takes up 10-40% of the time compared to allocating.
>
> I've tried to reproduce this with the v13 patches applied and I'm not
> really getting the same as you are. To run the function 100 times I
> used:
>
> select x, a.* from generate_series(1,100) x(x), lateral (select * from
> bench_load_random_int(500 * 1000 * (1+x-x))) a;

Simply running over a longer period of time like this makes the SlabFree difference much closer to your results, so it doesn't seem out of line anymore. Here SlabAlloc seems to take maybe 2/3 of the time of current slab, with a 5% reduction in total time:

500k ints:

v13-0001-0005
average of 30: 217ms

  47.61%  postgres  postgres   [.] rt_set
  20.99%  postgres  postgres   [.] SlabAlloc
  10.00%  postgres  postgres   [.] rt_node_insert_inner.isra.0
   6.87%  postgres  [unknown]  [k] 0xffffffffbce011b7
   3.53%  postgres  postgres   [.] MemoryContextAlloc
   2.82%  postgres  postgres   [.] SlabFree

+slab v4
average of 30: 206ms

  51.13%  postgres  postgres   [.] rt_set
  14.08%  postgres  postgres   [.] SlabAlloc
  11.41%  postgres  postgres   [.] rt_node_insert_inner.isra.0
   7.44%  postgres  [unknown]  [k] 0xffffffffbce011b7
   3.89%  postgres  postgres   [.] MemoryContextAlloc
   3.39%  postgres  postgres   [.] SlabFree

It doesn't look mysterious anymore, but I went ahead and took some more perf measurements, including for cache misses.
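
As a rough cross-check of the "2/3" and "5%" estimates above, here is a minimal sketch that multiplies each profile share by the average run time. It assumes the perf percentages apply proportionally to the timed portion of each run, which is only an approximation. The dictionary layout and the helper name est_ms are made up for illustration; the numbers are copied from the report above.

```python
# Figures copied from the perf report and timings above.
runs = {
    "v13-0001-0005": {"total_ms": 217.0, "SlabAlloc": 20.99, "SlabFree": 2.82},
    "+slab v4": {"total_ms": 206.0, "SlabAlloc": 14.08, "SlabFree": 3.39},
}


def est_ms(run, symbol):
    """Estimated time spent in `symbol`: profile share times average run time.

    Assumption: the perf sample share maps linearly onto wall-clock time.
    """
    r = runs[run]
    return r["total_ms"] * r[symbol] / 100.0


old, new = "v13-0001-0005", "+slab v4"

# Ratio of estimated time in each function, new slab versus current slab.
alloc_ratio = est_ms(new, "SlabAlloc") / est_ms(old, "SlabAlloc")
free_ratio = est_ms(new, "SlabFree") / est_ms(old, "SlabFree")

# Fractional reduction in average total run time.
total_reduction = 1.0 - runs[new]["total_ms"] / runs[old]["total_ms"]

print(f"SlabAlloc ratio: {alloc_ratio:.2f}")
print(f"SlabFree ratio:  {free_ratio:.2f}")
print(f"total reduction: {total_reduction:.1%}")
```

Under that assumption the SlabAlloc ratio comes out near 0.64, the SlabFree ratio near 1.14, and the total reduction near 5.1%, which lines up with the estimates in the text.
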
My naive impression is that we're spending a bit more time waiting for data, but having to do less work with it once we get it, which is consistent with your earlier comments:

perf stat -p $pid sleep 2

v13:

          2,001.55 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           311,690      page-faults:u             #  155.724 K/sec
     3,128,740,701      cycles:u                  #    1.563 GHz
     4,739,333,861      instructions:u            #    1.51  insn per cycle
       820,014,588      branches:u                #  409.690 M/sec
         7,385,923      branch-misses:u           #    0.90% of all branches

+slab v4:

          2,001.09 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
           326,017      page-faults:u             #  162.920 K/sec
     3,016,668,818      cycles:u                  #    1.508 GHz
     4,324,863,908      instructions:u            #    1.43  insn per cycle
       761,839,927      branches:u                #  380.712 M/sec
         7,718,366      branch-misses:u           #    1.01% of all branches

perf stat -e LLC-loads,LLC-loads-misses -p $pid sleep 2
min/max of 3 runs:

v13:      LL cache misses: 25.08% - 25.41%
+slab v4: LL cache misses: 25.74% - 26.01%

--
John Naylor
EDB: http://www.enterprisedb.com