Not sure which version of Clojure you are using, but in all versions up to
the latest 'official' release, Clojure 1.8.0, records have their hash value
calculated every time they are needed, with no caching of the calculated
value.  The hash value is needed every time an operation is done in a
Clojure set, and probably a java HashSet, too.

The latest alpha release, Clojure 1.9.0-alpha14, contains a performance
improvement where records cache their hash value after it is first
calculated.  http://dev.clojure.org/jira/browse/CLJ-1224

I don't know if that change would make your version of the code using
records and Java HashSet's competitive with your fastest code or not, but
you may wish to try it.

Andy

On Mon, Nov 21, 2016 at 12:05 PM, Didier <didi...@gmail.com> wrote:

> I experimented with this a lot, and took everyone's advice, and this is
> the fastest I got, even faster then Michael Gardner's answer.
>
> (deftype point5 [^long i ^long j]
>   Object
>   (equals [this that] (and (= (.i this) (.i ^point5 that))
>                            (= (.j this) (.j ^point5 that))))
>   (hashCode [this] (+ (.i this) (* 4000 (.j this)))))
>
> (defn iter-neighbors5 [f ^point5 p]
>   (let [i ^long (.i p)
>         j ^long (.j p)]
>     (f (->point5 (dec i) j))
>     (f (->point5 (inc i) j))
>     (f (->point5 i (dec j)))
>     (f (->point5 i (inc j)))))
>
> (defn nth5 [n p]
>   (loop [n ^long n
>          s1 (HashSet. [p])
>          s2 (HashSet.)]
>     (if (zero? n)
>       s1
>       (let [s0 (HashSet.)]
>         (letfn [(add [p]
>                      (when (not (or (.contains s1 p) (.contains s2 p)))
>                        (.add s0 p)))]
>                (doseq [p s1] (iter-neighbors5 add p))
>                (recur (dec n) s0 s1))))))
>
> This is basically the closest I can get to the F#, Rust and OCaml
> solution. I create a minimal class where I overload equals and hashCode
> using the same logic the F# solution uses. It gives me equal performance to
> F# on my machine: 3s.
>
> Here's the run down from slowest to fastest with a size of 100 using
> criterium. All of them I did my best to avoid reflection and boxed numeric
> and had enabled uncheck math. Granted I did not find that boxing and
> unchecked math really provided a drastic improvement in my case.:
>
>    1. Using a record to hold the point with a java HashSet - 826ms
>    2. Using a struct to hold the point with a java HashSet - 246ms
>    3. Using a volatile with a Clojure HashSet and vectors for point - 50ms
>    4. Using an atom with a Clojure HashSet and vectors for point - 51ms
>    5. Using java.awt Point with a java HashSet and add as a closure - 41ms
>    6. Using vectors with a java HashSet, but with add done inside doseq,
>    without a closure - 11ms
>    7. Using vectors with a java HashSet and add as a closure - 9ms
>    8. Using java.awt Point with a java HashSet, but with add done inside
>    doseq, without a closure (Michael Gardner's solution) - 7.5ms
>    9. Using deftype with overloaded equals and hashCode as points with a
>    java HashSet, but with add done inside doseq, without a closure - 6ms
>    10. Using deftype with overloaded equals and hashCode as points with a
>    java HashSet and add as a closure - 4ms
>
> This is what I gathered from my experiments:
>
>    - Using java's mutable HashSet instead of Clojure's persistent hashset
>    gave a significant speed improvement. From 30s to 5s. This was the most
>    significant speed boost.
>    - Records are really slow, probably because of the time to instantiate
>    them. So slow, that even though it was using a mutable HashSet, it was
>    still slower then Clojure sets with vectors.
>    - Similarly, structs are also pretty slow, which I'd guess is also
>    because of instantiating them. Though they were faster then Records.
>    - Volatile did not noticeably improve performance compared to Atom,
>    but I learned about them today, did not know Clojure had that.
>    - Unsafe Math and type hinting numeric to get rid of boxing did not
>    noticeably improve performance, but I learned about it today too. Speaking
>    of which, is ^long faster then ^int?
>    - Vectors were as fast as java.awt.Point. Though for some reason,
>    putting the java.awt.Point inside a closure slowed the whole thing down
>    quite a bit. Where as without it, using a closure actually performed
>    faster. Maybe I did something wrong here, since I find that a little
>    surprising and confusing.
>    - Defining my own type optimized for my specific use-case was the
>    fastest. deftypes are lightweight, and instantiate quickly, and with the
>    custom equals and hashCode tailored to the problem, they performed best,
>    matching F# and beating out OCaml and Rust.
>
>
> On Tuesday, 15 November 2016 19:39:43 UTC-8, Didier wrote:
>>
>> Hey all,
>>
>> I came upon a benchmark of F#, Rust and OCaml, where F# performs much
>> faster then the other two. I decided for fun to try and port it to Clojure
>> to see how Clojure does. Benchmark link: https://github.com/c-cube/hash
>> set_benchs
>>
>> This is my code for it: https://gist.github.com/didibu
>> s/1fd4c00b69d927745fbce3dcd7ca461a
>>
>> (ns hash-set-bench
>>   "A Benchmark I modified to Clojure from:
>>    https://github.com/c-cube/hashset_benchs";)
>>
>> (defn iterNeighbors [f [i j]]
>>   (f [(dec i) j])
>>   (f [(inc i) j])
>>   (f [i (dec j)])
>>   (f [i (inc j)]))
>>
>> (defn nth* [n p]
>>   (loop [n n s1 #{p} s2 #{}]
>>     (if (= n 0)
>>       s1
>>       (let [s0 (atom #{})]
>>         (letfn [(add [p]
>>                      (when (not (or (contains? s1 p) (contains? s2 p)))
>>                        (reset! s0 (conj @s0 p))))]
>>                (doseq [p s1] (iterNeighbors add p))
>>                (recur (dec n) @s0 s1))))))
>>
>> #_(printf "result is %d" (count (time (nth* 2000 [0 0]))))
>>
>> And here's the F# code: https://github.com/c-cube/hash
>> set_benchs/blob/master/neighbors2.fsx
>>
>> Currently, this takes about 30s in Clojure, while it only takes around 3s
>> for OCaml, Rust and F#.
>>
>> From what I see, the differences between my code and theirs are:
>>
>>    - Lack of a Point struct, I'm just using a vector.
>>    - They use a mutable set, I don't.
>>    - They overrode Hashing for their point struct, as well as equality.
>>    I rely on Clojure's default hashing, and vector equality.
>>
>> I'm not sure if any of these things should really impact performance that
>> much though. And what I could do in Clojure if I wanted to improve it.
>>
>>
>> Any Help?
>>
>>
>> Thanks.
>>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to