Dubious performance characteristics of hash-maps

Edward Z. Yang Tue, 23 Feb 2010 04:18:22 -0800

I'd first like to state that I went into this exercise expecting
Bagwell's Hash Array Mapped Tries to blow the competition out of the
water; I'd get a fast functional map and give it to Haskell and people
would rejoice.


Instead, I found a functional hash-map that was approximately four times
slower than Haskell's IntMap.  I find this puzzling, and would love to
puzzle it out with you folks.

First, the test case:

    (ns maptest (:gen-class))

    (defn mk-random-stream []
        (let [r (new ec.util.MersenneTwisterFast)]
                 (repeatedly (fn [] (. r (nextInt))))))

    (defn -main [sn]
      (let [n  (Integer/parseInt sn)
                  rs (take (* 2 n) (mk-random-stream))]
       (time (let [m  (apply hash-map (take (* n 2) (interleave rs rs)))]
            (reduce
                    (fn [s k] (let [v (get m k)]
                                            (+ s (if (nil? v) 0 v))))
                    0
                    (take n (drop (/ n 2) rs)))))))

We generate an n-sized hash tree, and then do n accesses, half of which
match and half of which miss.  We compute the sum of all the values that
catch.  We see the following characteristics:

       n   Clojure   Haskell (HAMT)   Haskell (IntMap)
     32K     .56s            .30s           .09s
     64K     .84s            .69s           .22s
    128K    1.65s           1.50s           .46s
    256K    2.94s           3.23s          1.17s
    512K    7.32s           6.96s          2.70s

I was careful to compile the Clojure code before attempting to run it.
I also pre-allocate the entire random sequence to prevent it from
muddying the execution time (although I am using Sean Luke's excellent
fast Mersenne Twister [1]).  The full string I used to invoke the
program was:

    /usr/lib/jvm/java-6-sun-1.6.0.16/bin/java -Dfile.encoding=UTF-8 \
        -classpath $CLASSES maptest $@

The fact that it's similar in performance to the Haskell
reimplementation (I'd say that the variance is not particularly
significant and both are about on par performance wise) seems to
validate the testing methodology, I hope, though the real proof of the
pudding would lie in implementing Haskell-style IntMap in Java.

You can see the test code for the Haskell versions here:

    http://github.com/ezyang/hamt

So... what's up with these numbers?  Is my methodology wrong?  Is the
garbage collector sucking?  Is the algorithm just not as good as it
makes out to be?

Cheers,
Edward

[1] http://www.cs.gmu.edu/~sean/research/

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Dubious performance characteristics of hash-maps

Reply via email to