On Wed 2008-08-27 16:21, Jules Bean wrote: > Bulat Ziganshin wrote: >> Hello haskell-cafe, >> >> solving one more task that uses English dictionary, i've thought: why we >> don't yet have pure hashtable library? There is imperative hashtables, >> pretty complex as they need to rebuild entire table as it grows. There is >> also simple assoc lists and tree/trie implementations, but there is no >> simple non-modifiable hashes. >> >> how should it look: >> * hashtable is represented as an array of assoc lists: Array Int [(a,b)] >> >> * interface to extract data from ht is the same as from assoc list: >> lookup :: HT a b -> a -> Maybe b >> >> * ht may be built from assoc list. we should just know it's size beforehand >> in order to create Array of reasonable size. constructor also need a hashing >> function: >> >> create :: [(a,b)] -> Int -> (a->Int) -> HT a b >> >> >> given these constraints, it should be just a 10-20 lines of code, and >> provide much better efficiency than any tree/trie implementations > > Prove it. > > To get "much better efficient" than a trie, the hash function has to be > so fast that it is faster than following (log n) pointers, and yet also > so "perfect" that it doesn't generate too many collisions.
Many people have probably seen this and it has nothing to do with Haskell, but it is a good performance comparison of a simple hash to an optimized trie. http://www.nothings.org/computer/judy/ The conclusion (we're only interested in lookup times) is that the trie is preferable for sequential lookups, slower for random access hits, and about the same for random access misses. Also, the hash was uniformly better for small sizes (< 10k). BTW a single cache miss has a latency around 250 cycles, you can compute a hell of a hash for that. Jed
pgpBGCqLEkPII.pgp
Description: PGP signature
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe