Hi GHC devs,

I’ve been investigating fast hash table implementations. In particular hash 
tables used for counting unique items. For this use case, I believe the most 
performant hash tables are, in C terms, arrays of structures with a (boxed) 
pointer to the key, which is the item that we are counting, and an (unboxed) 
integer which holds the actual count.

I already know of the ‘vector-hashtables’ package which uses two separate 
arrays, for example one boxed to hold the keys and one unboxed to hold the 
counts. However, I believe it can be quite important to store all the elements 
in the same array as that can reduce the number of cache misses. Because with 
random access to two arrays there is a higher chance that there will be two 
cache misses even if it immediately finds the right key in the hash table.

So, I have also been looking at the low level arrays from the ‘primitive’ 
package and even in GHC.Exts, but I don’t believe it is currently possible to 
create a single array that contains both boxed and unboxed elements.

Have I overlooked something? Or else, would it be possible to support this use 
case in a future version of GHC?

Cheers,

Jaro
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

Reply via email to