On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote: > I’ve been investigating fast hash table implementations. In particular > hash tables used for counting unique items. For this use case, I > believe the most performant hash tables are, in C terms, arrays of > structures with a (boxed) pointer to the key, which is the item that > we are counting, and an (unboxed) integer which holds the actual > count. > > I already know of the ‘vector-hashtables’ package which uses two > separate arrays, for example one boxed to hold the keys and one > unboxed to hold the counts. However, I believe it can be quite > important to store all the elements in the same array as that can > reduce the number of cache misses. Because with random access to two > arrays there is a higher chance that there will be two cache misses > even if it immediately finds the right key in the hash table.
Could you use `StablePtr` for the keys? https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html The corresponding `Ptr` can be stored in an unboxed Storable array along with the count. This comes at the cost of later having to explicitly free each StablePtr. https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/GHC-Stable.html#v:freeStablePtr How does the cost of computing object hashes and comparing colliding objects compare with the potential cache miss cost of using boxed integers or a separate array? Would such an "optimisation" be worth the effort? -- Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs