On Tue, Dec 13, 2022 at 7:50 AM David Rowley <dgrowle...@gmail.com> wrote:
>
> Thanks for testing the patch.
>
> On Mon, 12 Dec 2022 at 20:14, John Naylor <john.nay...@enterprisedb.com> wrote:

> > While allocation is markedly improved, freeing looks worse here. The
> > proportion is surprising because only about 2% of nodes are freed during
> > the load, but doing that takes up 10-40% of the time compared to allocating.
>
> I've tried to reproduce this with the v13 patches applied and I'm not
> really getting the same as you are. To run the function 100 times I
> used:
>
> select x, a.* from generate_series(1,100) x(x), lateral (select * from
> bench_load_random_int(500 * 1000 * (1+x-x))) a;

Simply running over a longer period of time like this makes the SlabFree
difference much closer to your results, so it doesn't seem out of line
anymore. Here SlabAlloc seems to take maybe 2/3 of the time of current
slab, with a 5% reduction in total time:

500k ints:

v13-0001-0005
average of 30: 217ms

  47.61%  postgres  postgres             [.] rt_set
  20.99%  postgres  postgres             [.] SlabAlloc
  10.00%  postgres  postgres             [.] rt_node_insert_inner.isra.0
   6.87%  postgres  [unknown]            [k] 0xffffffffbce011b7
   3.53%  postgres  postgres             [.] MemoryContextAlloc
   2.82%  postgres  postgres             [.] SlabFree

+slab v4
average of 30: 206ms

  51.13%  postgres  postgres             [.] rt_set
  14.08%  postgres  postgres             [.] SlabAlloc
  11.41%  postgres  postgres             [.] rt_node_insert_inner.isra.0
   7.44%  postgres  [unknown]            [k] 0xffffffffbce011b7
   3.89%  postgres  postgres             [.] MemoryContextAlloc
   3.39%  postgres  postgres             [.] SlabFree
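
As a rough cross-check of the "2/3" and "5%" figures, here is a small
Python sketch. It assumes each perf percentage can be multiplied by the
average run time to estimate per-symbol time, which is only approximately
true because the profiles and the timings may not cover exactly the same
interval. The dictionary layout and the share_ms helper are my own, for
illustration only.

```python
# Figures copied from the two profiles above; nothing new is measured here.
v13 = {"total_ms": 217, "SlabAlloc_pct": 20.99, "SlabFree_pct": 2.82}
slab_v4 = {"total_ms": 206, "SlabAlloc_pct": 14.08, "SlabFree_pct": 3.39}


def share_ms(profile, key):
    """Estimate time in one symbol as its perf share times the average run time.

    Assumption: the perf sample share applies to the whole timed run.
    """
    return profile["total_ms"] * profile[key] / 100.0


# Estimated SlabAlloc and SlabFree time with slab v4, relative to v13.
alloc_ratio = share_ms(slab_v4, "SlabAlloc_pct") / share_ms(v13, "SlabAlloc_pct")
free_ratio = share_ms(slab_v4, "SlabFree_pct") / share_ms(v13, "SlabFree_pct")

# Fractional reduction in average total run time.
total_reduction = 1 - slab_v4["total_ms"] / v13["total_ms"]

# SlabFree share expressed as a fraction of the SlabAlloc share.
free_vs_alloc_v13 = v13["SlabFree_pct"] / v13["SlabAlloc_pct"]
free_vs_alloc_v4 = slab_v4["SlabFree_pct"] / slab_v4["SlabAlloc_pct"]

print(f"SlabAlloc time ratio (v4 / v13): {alloc_ratio:.2f}")
print(f"SlabFree time ratio  (v4 / v13): {free_ratio:.2f}")
print(f"total time reduction:            {total_reduction:.1%}")
print(f"SlabFree / SlabAlloc, v13:       {free_vs_alloc_v13:.1%}")
print(f"SlabFree / SlabAlloc, slab v4:   {free_vs_alloc_v4:.1%}")
```

Under that assumption, SlabFree relative to SlabAlloc falls inside the
10-40% range quoted earlier for both builds.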

It doesn't look mysterious anymore, but I went ahead and took some more
perf measurements, including for cache misses. My naive impression is that
we're spending a bit more time waiting for data, but having to do less work
with it once we get it, which is consistent with your earlier comments:

perf stat -p $pid sleep 2
v13:
          2,001.55 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
           311,690      page-faults:u                    #  155.724 K/sec
     3,128,740,701      cycles:u                         #    1.563 GHz
     4,739,333,861      instructions:u                   #    1.51  insn per cycle
       820,014,588      branches:u                       #  409.690 M/sec
         7,385,923      branch-misses:u                  #    0.90% of all branches

+slab v4:
          2,001.09 msec task-clock:u                     #    1.000 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
           326,017      page-faults:u                    #  162.920 K/sec
     3,016,668,818      cycles:u                         #    1.508 GHz
     4,324,863,908      instructions:u                   #    1.43  insn per cycle
       761,839,927      branches:u                       #  380.712 M/sec
         7,718,366      branch-misses:u                  #    1.01% of all branches
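
The derived columns in the two perf stat reports can be recomputed from
the raw counters with a short Python sketch like the one below. The
counter values are copied from the output; the dictionary layout and the
derived helper are hypothetical names of mine. Since both samples cover a
fixed 2-second window rather than a fixed amount of work, only the ratios
are compared, not the absolute counts.

```python
# Raw counters copied from the two perf stat reports (2-second samples).
counters = {
    "v13": {
        "cycles": 3_128_740_701,
        "instructions": 4_739_333_861,
        "branches": 820_014_588,
        "branch_misses": 7_385_923,
    },
    "slab_v4": {
        "cycles": 3_016_668_818,
        "instructions": 4_324_863_908,
        "branches": 761_839_927,
        "branch_misses": 7_718_366,
    },
}


def derived(c):
    """Return instructions per cycle and branch miss rate (percent)."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "branch_miss_pct": 100.0 * c["branch_misses"] / c["branches"],
    }


metrics = {name: derived(c) for name, c in counters.items()}

for name, m in metrics.items():
    print(f"{name}: IPC {m['ipc']:.2f}, branch misses {m['branch_miss_pct']:.2f}%")
```

The lower instructions-per-cycle figure with slab v4 is what I mean by
spending a bit more time waiting for data.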


perf stat -e LLC-loads,LLC-loads-misses -p $pid sleep 2
min/max of 3 runs:
v13:      LL cache misses: 25.08% - 25.41%
+slab v4: LL cache misses: 25.74% - 26.01%

--
John Naylor
EDB: http://www.enterprisedb.com
