Hi, On 2020-04-28 18:22:20 -0400, James Coleman wrote: > I cc'd Andres given his commit introduced simplehash, so I figured > he'd probably have a few pointers on when each one might be useful. > [...] > Do you have any thoughts on what the trade-offs/use-cases etc. are for > dynahash versus simple hash? > > From reading the commit message in b30d3ea824c it seems like simple > hash is faster and optimized for CPU cache benefits. The comments at > the top of simplehash.h also discourage it's use in non > performance/space sensitive uses, but there isn't anything I can see > that explicitly tries to discuss when dynahash is useful, etc.
Benefits of dynahash (chained hashtable): - supports partitioning, useful for shared memory accessed under locks - better performance for large entries, as they don't need to be moved around in case of hash conflicts - stable pointers to hash table entries Benefits of simplehash (open addressing hash table): - no indirect function calls, known structure sizes, due to "templated" code generation (these show up substantially in profiles for dynahash) - considerably faster for small entries due to previous point, and due open addressing hash tables having better cache behaviour than chained hashtables - once set-up the interface is type safe and easier to use - no overhead of a separate memory context etc > Given the performance notes in that commit message, I thinking > switching to simple hash is worth it. Seems plausible to me. > But I also wonder if there might be some value in a README or comments > addition that would be a guide to what the various hash > implementations are useful for. If there's interest, I could try to > type something short up so that we have something to make the code > base a bit more discoverable. That'd make sense to me. Greetings, Andres Freund