Don Stewart wrote:
Do you have any benchmarks comparing dictionaries against Map ByteString
Int, or Map String Int?

I haven't personally run them, but Mark Wotton has compared [(ByteString,Int)] vs (Map ByteString Int) vs (Trie Int) version 0.1.1 or 0.1.2 and using data from /usr/share/dict/words:


9:22 ~/projects/current/MapBench % ghc --make  Test.hs
../../Microbench.hs && ./Test
[1 of 2] Compiling Microbench ( ../../Microbench.hs, ../../Microbench.o )
[2 of 2] Compiling Main             ( Test.hs, Test.o )
Linking Test ...
* alist lookup:
  160.641ns per iteration / 6225.07 per second.
* map lookup:
  0.881ns per iteration / 1135623.22 per second.
* Trie lookup:
  0.243ns per iteration / 4116930.41 per second.


The benchmarking code requires hacking microbench to use Integers instead of Ints, to avoid counter overflow. I haven't yet worked the benchmarking code into the Darcs repo, but it's available to those interested. I'll add it to the repo soon.

The cost of inserting elements is higher for Trie than Map (I don't have the numbers). One possible reason for this is that the generic alterBy doesn't know whether it's inserting or deleting, so it uses smart constructors along the spine which adds the cost of consistency checks. I've planned to add a dedicated implementation for insertBy (or modifyBy, or some such name) in order to test this hypothesis.

The cost of merging is still unknown, but big-endian patricia trees have better asymptotic complexity than the balanced trees used by Map, so Trie is probably better there too.

Another improvement in the works is a smarter algorithm for splitting strings. Once it's in place, this should speed up all three of the main algorithms.

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to