Andres Freund писал 2021-04-26 21:46:
Hi,

On 2021-04-25 01:27:24 +0300, Yura Sokolov wrote:
It is quite interesting result. Simplehash being open-addressing with
linear probing is friendly for cpu cache. I'd recommend to define
SH_FILLFACTOR with value lower than default (0.9). I believe 0.75 is
suitable most for such kind of hash table.

It's not a "plain" linear probing hash table (although it is on the lookup side). During insertions collisions are reordered so that the average distance from the "optimal" position is ~even ("robin hood hashing"). That allows a higher load factor than a plain linear probed hash table (for which IIRC
there's data to show 0.75 to be a good default load factor).

Even for Robin Hood hashing 0.9 fill factor is too high. It leads to too
much movements on insertion/deletion and longer average collision chain.

Note that Robin Hood doesn't optimize average case. Indeed, usually Robin Hood
has worse (longer) average collision chain than simple linear probing.
Robin Hood hashing optimizes worst case, ie it guarantees there is no unnecessary
long collision chain.

See discussion on Rust hash table fill factor when it were Robin Hood:
https://github.com/rust-lang/rust/issues/38003


There of course may still be a benefit in lowering the load factor, but I'd
not start there.

David's test aren't really suited to benchmarking the load factor, but to me the stats he showed didn't highlight a need to lower the load factor. Lowering
the fill factor does influence the cache hit ratio...

Greetings,

Andres Freund

regards,
Yura.


Reply via email to