Just a follow up after a fair amount of thinking and experimentation has
been done on this problem.

Mark Engelberg has a very nice document describing the existing hash
functions used by Clojure, examples with why they can cause collisions so
often, and suggestions for changes to them to have fewer collisions in
these cases.
https://docs.google.com/document/d/10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit

I have made modifications to a fork of the Clojure code base with his
suggested hash modifications here, for experimentation purposes:
https://github.com/jafingerhut/clojure   I also have some modifications to
Paul's N-queens Clojure program to print out additional statistics and try
out different hash functions here:
https://github.com/jafingerhut/chess-clojure

Running Paul's N-queens problem with a 6x6 board without any changes to
Clojure takes about 7 mins on one of my computers.  With those changes to
the hash functions it finishes in about 12 sec.

I haven't let a 6x9 board run to completion without those hash changes, but
it takes a long time :-)  With those changes to the hash functions it
finishes in about 11 minutes.

The reason for the currently slow behavior is a combination of the current
hash functions and the implementation of the PersistentHashSet data
structure used to implement sets.  A PersistentHashSet is, very roughly, a
32-way branching tree with the hash function of the elements used to decide
which branch to take.  When you hit a leaf in the tree, all elements with
the same hash function value are put into an array that is scanned linearly
whenever you want to check whether a value is already in the set.  So the
more hash collisions, the more it is like using an array to store and look
up set elements in O(n) time.

With the 6x9 board, there are 20,136,752 solutions.  The way those
solutions are represented in the program as sets of vectors, those 20
million values only have 17,936 different hash values.  Thus the average
length of those arrays of values in the tree leaves is 1,122 values.  The
longest of those arrays has 81,610 values in it, all with the same hash
function.

With Mark's proposed hash function changes, those 20,136,752 values hash to
20,089,488 different ints, for an average array length of 1 value, and a
longest array length of 3 values.  Big speedup.

There is nothing unique about Mark's particular hash functions that
achieves this.  Other hash modifications can give similar improvements.
The Clojure dev team is considering which changes would be good ones to
make, in hopes of changing them once and not having to do it again.

Andy


On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <p...@paulbutcher.com> wrote:

> I've been playing around with a generalised version of the N-Queens
> problem that handles other chess pieces. I have a Scala implementation that
> uses an eager depth-first search. Given enough RAM (around 12G - it's very
> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5
> minutes on my MacBook Pro.
>
> I then created a Clojure version of the same algorithm with the intention
> of (eventually) using laziness to avoid the memory issue. However, the
> performance of the Clojure version is several orders of magnitude worse
> than the Scala version (I've left it running overnight on the problem that
> the Scala version solves in 2.5 minutes and it still hasn't completed).
>
> I'm struggling to see why the performance of the Clojure version is so
> much worse than the Scala. Profiling in YourKit suggests that it's spending
> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain
> why. I would be grateful for any insight.
>
> The code for the two different versions is on GitHub:
>
> https://github.com/paulbutcher/chess-scala
> https://github.com/paulbutcher/chess-clojure
>
> Notes:
>
> - I know that an exhaustive depth-first search isn't a great way to tackle
> this problem. But I'd like to understand why I see such a dramatic
> difference between the performance of the Scala and Clojure versions.
>
> - I believe that I've implemented the same algorithm in both Scala and
> Clojure - certainly they both generate the same results for small problems
> (there are 3x3 and 4x4 problems commented out in the source). But clearly I
> can't rule out the possibility that I've made a mistake in the Clojure
> implementation. If anyone can spot my stupid mistake, I'd greatly
> appreciate it.
>
> Thanks in advance for any insight that anyone can offer.
>
> --
> paul.butcher->msgCount++
>
> Snetterton, Castle Combe, Cadwell Park...
> Who says I have a one track mind?
>
> http://www.paulbutcher.com/
> LinkedIn: http://www.linkedin.com/in/paulbutcher
> MSN: p...@paulbutcher.com
> AIM: paulrabutcher
> Skype: paulrabutcher
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to