On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote:

> I’ve been investigating fast hash table implementations. In particular
> hash tables used for counting unique items. For this use case, I
> believe the most performant hash tables are, in C terms, arrays of
> structures with a (boxed) pointer to the key, which is the item that
> we are counting, and an (unboxed) integer which holds the actual
> count.
> 
> I already know of the ‘vector-hashtables’ package which uses two
> separate arrays, for example one boxed to hold the keys and one
> unboxed to hold the counts. However, I believe it can be quite
> important to store all the elements in the same array as that can
> reduce the number of cache misses. Because with random access to two
> arrays there is a higher chance that there will be two cache misses
> even if it immediately finds the right key in the hash table.

Could you use `StablePtr` for the keys?

    
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html

The corresponding `Ptr` can be stored in an unboxed Storable array along
with the count.

This comes at the cost of later having to explicitly free each StablePtr.

    
https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html#v:freeStablePtr

How does the cost of computing object hashes and comparing colliding
objects compare with the potential cache miss cost of using boxed
integers or a separate array?  Would such an "optimisation" be worth
the effort?

-- 
    Viktor.
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to