On Fri, Sep 04, 2020 at 07:01:41AM +0000, Jakub Wartak wrote:
Greetings hackers,

I have mixed feelings about whether this is a welcome contribution, as the potential gain is relatively small in my tests, but I would still like to point out that the HASH_FFACTOR functionality from dynahash.c could be removed or optimized: the default fill factor is always 1, and there is not a single place inside the PostgreSQL repository that uses a custom fill factor other than DEF_FFACTOR=1.

Because the functionality is present, there is a division on every buffer access [BufTableLookup()] and on every smgropen() call (every call to hash_search() is affected, provided it's not ShmemInitHash/HASH_PARTITION). This division is especially visible via perf on the single-process StartupXLOG WAL recovery process on a standby under heavy-duty 100% CPU conditions, where the top-1 entry is inside hash_search:

   0x0000000000888751 <+449>:   idiv   r8
   0x0000000000888754 <+452>:   cmp    rax,QWORD PTR [r15+0x338]   <<-- perf annotate shows 30-40% here, even on default -O2, probably CPU pipeline stalls from the idiv above

I've made a PoC test to skip that division, assuming ffactor would be gone:

         if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
-            hctl->freeList[0].nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+            hctl->freeList[0].nentries >= (long) (hctl->max_bucket + 1) &&

For a stream of 3.7GB of WAL I'm getting a consistent improvement of ~4% (yes, I know it's small; that's why I have mixed feelings):

gcc -O3: 104s -> 100s
gcc -O2: 108s -> 104s

pgbench -S -c 16 -j 4 -T 30 -M prepared (-s 100): stays more or less the same, so no positive impact there.
Hmm, so it's the division that makes the difference? In that case, wouldn't it make sense to just compute a threshold every time the hash table is resized, and then do just the comparison? That is, compute

    nentries_threshold = (long) (hctl->max_bucket + 1) * hctl->ffactor;

and then do just

    hctl->freeList[0].nentries >= nentries_threshold

Of course, this assumes the threshold is calculated only rarely; maybe that's not the case.

Also, what if you lower the fill factor, e.g. to 0.8? Does that improve performance or not? I can't find any discussion about this in the archives, but the dynamic hashing paper [1] seems to suggest it does make a difference (see the tables at the end, but I haven't re-read the paper recently, so maybe it's irrelevant). Anyway, it'd be foolish to remove the ffactor parameter to save on division only to lose the ability to lower the factor and save more than that ...

As for the 4% - it's not much, but it's also not negligible. I'm sure we've done micro-optimizations for smaller gains. The big question is whether this is statistically significant, or whether it's just due to random effects. It could easily be caused by changes in the layout of the binary, for example - that can happen quite easily. See [2] and [3]. The talk actually mentions a tool meant to eliminate this bias by randomization, but I've never tried using it on PostgreSQL, so I'm not sure how compatible it is :-(

[1] https://www.csd.uoc.gr/~hy460/pdf/Dynamic%20Hash%20Tables.pdf
[2] https://www.cis.upenn.edu/~cis501/previous/fall2016/papers/producing-wrong-data.pdf
[3] https://www.youtube.com/watch?v=r-TLSBdHe1A
After removing HASH_FFACTOR, PostgreSQL still compiles... Would removing it break some external API/extensions?

I saw several optimizations for the "idiv" where it could be avoided, e.g. see https://github.com/ridiculousfish/libdivide

Or maybe there is some other idea to expose the bottlenecks of BufTableLookup()? I also saw the codepath PinBuffer() -> GetPrivateRefCountEntry() -> dynahash that could be called pretty often. I have no idea what kind of pgbench stress test could be used to demonstrate the gain (or lack of it).

-Jakub Wartak
I don't think it would break a lot of stuff, but I'm kinda dubious it's a measurable improvement for common workloads ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services