On Sun, Dec 10, 2023 at 2:18 AM Jeff Davis <pg...@j-davis.com> wrote: > > On Sat, 2023-12-09 at 18:52 +0700, John Naylor wrote: > > > I tested using the new hash function APIs for my search path cache, > > > and > > > there's a significant speedup for cases not benefiting from > > > a86c61c9ee. > > > It's enough that we almost don't need a86c61c9ee. So a definite +1 > > > to > > > the new APIs. > > > > Do you have a new test? > > Still using the same basic test here: > > https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com > > What I did is: > > a. add your v5 patches > b. disable optimization in a86c61c9ee > c. add attached patch to use new hash APIs > > I got a slowdown between (a) and (b), and then (c) closed the gap about > halfway. It started to get close to test noise at that point -- I could > get some better numbers out of it if it's helpful.
I tried my variant of the same test [1] (but only 20 seconds per run), which uses pgbench to take the average of a few dozen runs, and doesn't use table I/O (when doing that, it's best to pre-warm the buffers to reduce noise). pgbench -n -T 20 -f bench.sql -M prepared (done three times and take the median, with turbo off) * master at 457428d9e99b6b from Dec 4: latency average = 571.413 ms * v8 (bytewise hash): latency average = 588.942 ms This regression is a bit surprising, since there is no strlen call, and it uses roleid as a seed without a round of mixing (not sure if we should do that, but just trying to verify results). * v8 with chunked interface: latency average = 555.688 ms This starts to improve things for me. * v8 with chunked, and return lower 32 bits of full 64-bit hash: latency average = 556.324 ms This is within the noise level. There doesn't seem to be much downside of using a couple cycles for fasthash's 32-bit reduction. * revert back to master from Dec 4 and then cherry pick a86c61c9ee (save last entry of SearchPathCache) latency average = 545.747 ms So chunked incremental hashing gets within ~2% of that, which is nice. It seems we should use that when removing strlen, when convenient. Updated next steps: * Investigate whether/how to incorporate final length into the calculation when we don't have the length up front. * Add some desperately needed explanatory comments. * Use this in some existing cases where it makes sense. * Get back to GUC hash and dynahash. [1] https://www.postgresql.org/message-id/CANWCAZY7Cr-GdUhrCLoR4%2BJGLChTb0pQxq9ZPi1KTLs%2B_KDFqg%40mail.gmail.com