awesome news :)

Jim

On 31/10/13 14:19, Andy Fingerhut wrote:
Just a follow up after a fair amount of thinking and experimentation has been done on this problem.

Mark Engelberg has a very nice document describing the existing hash functions used by Clojure, examples with why they can cause collisions so often, and suggestions for changes to them to have fewer collisions in these cases. https://docs.google.com/document/d/10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit

I have made modifications to a fork of the Clojure code base with his suggested hash modifications here, for experimentation purposes: https://github.com/jafingerhut/clojure I also have some modifications to Paul's N-queens Clojure program to print out additional statistics and try out different hash functions here: https://github.com/jafingerhut/chess-clojure

Running Paul's N-queens problem with a 6x6 board without any changes to Clojure takes about 7 mins on one of my computers. With those changes to the hash functions it finishes in about 12 sec.

I haven't let a 6x9 board run to completion without those hash changes, but it takes a long time :-) With those changes to the hash functions it finishes in about 11 minutes.

The reason for the currently slow behavior is a combination of the current hash functions and the implementation of the PersistentHashSet data structure used to implement sets. A PersistentHashSet is, very roughly, a 32-way branching tree with the hash function of the elements used to decide which branch to take. When you hit a leaf in the tree, all elements with the same hash function value are put into an array that is scanned linearly whenever you want to check whether a value is already in the set. So the more hash collisions, the more it is like using an array to store and look up set elements in O(n) time.

With the 6x9 board, there are 20,136,752 solutions. The way those solutions are represented in the program as sets of vectors, those 20 million values only have 17,936 different hash values. Thus the average length of those arrays of values in the tree leaves is 1,122 values. The longest of those arrays has 81,610 values in it, all with the same hash function.

With Mark's proposed hash function changes, those 20,136,752 values hash to 20,089,488 different ints, for an average array length of 1 value, and a longest array length of 3 values. Big speedup.

There is nothing unique about Mark's particular hash functions that achieves this. Other hash modifications can give similar improvements. The Clojure dev team is considering which changes would be good ones to make, in hopes of changing them once and not having to do it again.

Andy


On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <p...@paulbutcher.com <mailto:p...@paulbutcher.com>> wrote:

    I've been playing around with a generalised version of the
    N-Queens problem that handles other chess pieces. I have a Scala
    implementation that uses an eager depth-first search. Given enough
    RAM (around 12G - it's very memory hungry!) it can solve a 6x9
    board with 6 pieces in around 2.5 minutes on my MacBook Pro.

    I then created a Clojure version of the same algorithm with the
    intention of (eventually) using laziness to avoid the memory
    issue. However, the performance of the Clojure version is several
    orders of magnitude worse than the Scala version (I've left it
    running overnight on the problem that the Scala version solves in
    2.5 minutes and it still hasn't completed).

    I'm struggling to see why the performance of the Clojure version
    is so much worse than the Scala. Profiling in YourKit suggests
    that it's spending 99% of its time in PersistentHashSet.cons, but
    I'm at a loss to explain why. I would be grateful for any insight.

    The code for the two different versions is on GitHub:

    https://github.com/paulbutcher/chess-scala
    https://github.com/paulbutcher/chess-clojure

    Notes:

    - I know that an exhaustive depth-first search isn't a great way
    to tackle this problem. But I'd like to understand why I see such
    a dramatic difference between the performance of the Scala and
    Clojure versions.

    - I believe that I've implemented the same algorithm in both Scala
    and Clojure - certainly they both generate the same results for
    small problems (there are 3x3 and 4x4 problems commented out in
    the source). But clearly I can't rule out the possibility that
    I've made a mistake in the Clojure implementation. If anyone can
    spot my stupid mistake, I'd greatly appreciate it.

    Thanks in advance for any insight that anyone can offer.

    --
    paul.butcher->msgCount++

    Snetterton, Castle Combe, Cadwell Park...
    Who says I have a one track mind?

    http://www.paulbutcher.com <http://www.paulbutcher.com/>/
    LinkedIn: http://www.linkedin.com/in/paulbutcher
    MSN: p...@paulbutcher.com <mailto:p...@paulbutcher.com>
    AIM: paulrabutcher
    Skype: paulrabutcher

-- -- You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To post to this group, send email to clojure@googlegroups.com
    <mailto:clojure@googlegroups.com>
    Note that posts from new members are moderated - please be patient
    with your first post.
    To unsubscribe from this group, send email to
    clojure+unsubscr...@googlegroups.com
    <mailto:clojure%2bunsubscr...@googlegroups.com>
    For more options, visit this group at
    http://groups.google.com/group/clojure?hl=en
    ---
    You received this message because you are subscribed to the Google
    Groups "Clojure" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to clojure+unsubscr...@googlegroups.com
    <mailto:clojure%2bunsubscr...@googlegroups.com>.
    For more options, visit https://groups.google.com/groups/opt_out.


--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to