On 14 October 2005 20:31, Jan-Willem Maessen wrote: > That "5K" number made me immediately suspicious, so I took a look at > the source code to Data.HashTable. Sure enough, it's allocating a > number of large IOArrays, which are filled with pointers. The > practical upshot is that, for a hash table with (say) 24 entries, the > GC must scan an additional 1000 pointers and discover that each one > is []. > > I've seen other implementations of this two-level technique which use > a smallish sEGMENT_SIZE in order to avoid excessive GC overhead for > less-than-gigantic hash tables. This might be worth doing in the > Data.HashTable implementation. > > [Curious: what (if anything) is being used to test Data.HashTable? > I'd be willing to undertake very small amounts of fiddling if I could > be sure I wasn't slowing down something which mattered.]
The few benchmarks I've tried seem to indicate that Data.HashTable performance isn't overwhelming, and can often be bettered by Data.Map. I've never investigated too deeply. GC overhead seems a very plausible cause, though. Please by all means tweak the implementation and try some benchmarks, I'll incorporate any performance fixes that arise. Cheers, Simon _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users