Thanks for posting that here Andy - I've been mostly off the grid this week 
and haven't had time to do some of this followup. I would also like to say 
many thanks to Mark and Andy who have spent a lot of time elucidating the 
original issue and working on how to address it.

Alex

On Thursday, January 30, 2014 11:13:56 AM UTC-5, Andy Fingerhut wrote:
>
> Thanks to the work and thought of Mark Engelberg, Alex Miller, Rich 
> Hickey, myself, and likely several others who I should be naming here but 
> am forgetting, the latest (not yet released) Clojure master version has an 
> improved hash function that gives a much better variety of hash values, and 
> thus much improved performance for hash-based collections like hash maps 
> and hash sets, than previous versions of Clojure.
>
> For example, the N-queen program that Paul Butcher wrote and was asking 
> about its performance at the beginning of this thread, improves from 6.7 
> minutes before these hash changes to 12.8 seconds after the changes, when 
> run on the 6x6 board.  For the 6x9 board, the improvement is from 
> "something over 8 hours that I didn't measure precisely because I didn't 
> want to wait" down to 12.8 minutes.  A few more experimental results are 
> given on the design wiki page [1].
>
> [1] http://dev.clojure.org/display/design/Better+hashing
>
> Andy
>
>
> On Sat, Nov 2, 2013 at 9:44 AM, Andy Fingerhut 
> <andy.fi...@gmail.com<javascript:>
> > wrote:
>
>> A few minutes ago I finished copying, pasting, and doing a little 
>> reformatting on Mark Engelberg's document on the subject.
>>
>>     http://dev.clojure.org/display/design/Better+hashing
>>
>> Andy
>>
>>
>> On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller 
>> <al...@puredanger.com<javascript:>
>> > wrote:
>>
>>> Has anyone created a design page on dev.clojure for this yet? 
>>>
>>>
>>> On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote:
>>>
>>>> Just a follow up after a fair amount of thinking and experimentation 
>>>> has been done on this problem.
>>>>
>>>> Mark Engelberg has a very nice document describing the existing hash 
>>>> functions used by Clojure, examples with why they can cause collisions so 
>>>> often, and suggestions for changes to them to have fewer collisions in 
>>>> these cases.  https://docs.google.com/document/d/
>>>> 10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit
>>>>
>>>> I have made modifications to a fork of the Clojure code base with his 
>>>> suggested hash modifications here, for experimentation purposes: 
>>>> https://github.com/jafingerhut/clojure   I also have some 
>>>> modifications to Paul's N-queens Clojure program to print out additional 
>>>> statistics and try out different hash functions here: 
>>>> https://github.com/jafingerhut/chess-clojure
>>>>
>>>> Running Paul's N-queens problem with a 6x6 board without any changes to 
>>>> Clojure takes about 7 mins on one of my computers.  With those changes to 
>>>> the hash functions it finishes in about 12 sec.
>>>>
>>>> I haven't let a 6x9 board run to completion without those hash changes, 
>>>> but it takes a long time :-)  With those changes to the hash functions it 
>>>> finishes in about 11 minutes.
>>>>
>>>> The reason for the currently slow behavior is a combination of the 
>>>> current hash functions and the implementation of the PersistentHashSet 
>>>> data 
>>>> structure used to implement sets.  A PersistentHashSet is, very roughly, a 
>>>> 32-way branching tree with the hash function of the elements used to 
>>>> decide 
>>>> which branch to take.  When you hit a leaf in the tree, all elements with 
>>>> the same hash function value are put into an array that is scanned 
>>>> linearly 
>>>> whenever you want to check whether a value is already in the set.  So the 
>>>> more hash collisions, the more it is like using an array to store and look 
>>>> up set elements in O(n) time.
>>>>
>>>> With the 6x9 board, there are 20,136,752 solutions.  The way those 
>>>> solutions are represented in the program as sets of vectors, those 20 
>>>> million values only have 17,936 different hash values.  Thus the average 
>>>> length of those arrays of values in the tree leaves is 1,122 values.  The 
>>>> longest of those arrays has 81,610 values in it, all with the same hash 
>>>> function.
>>>>
>>>> With Mark's proposed hash function changes, those 20,136,752 values 
>>>> hash to 20,089,488 different ints, for an average array length of 1 value, 
>>>> and a longest array length of 3 values.  Big speedup.
>>>>
>>>> There is nothing unique about Mark's particular hash functions that 
>>>> achieves this.  Other hash modifications can give similar improvements.  
>>>> The Clojure dev team is considering which changes would be good ones to 
>>>> make, in hopes of changing them once and not having to do it again.
>>>>
>>>> Andy
>>>>
>>>>
>>>> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote:
>>>>
>>>>> I've been playing around with a generalised version of the N-Queens 
>>>>> problem that handles other chess pieces. I have a Scala implementation 
>>>>> that 
>>>>> uses an eager depth-first search. Given enough RAM (around 12G - it's 
>>>>> very 
>>>>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5 
>>>>> minutes on my MacBook Pro.
>>>>>
>>>>> I then created a Clojure version of the same algorithm with the 
>>>>> intention of (eventually) using laziness to avoid the memory issue. 
>>>>> However, the performance of the Clojure version is several orders of 
>>>>> magnitude worse than the Scala version (I've left it running overnight on 
>>>>> the problem that the Scala version solves in 2.5 minutes and it still 
>>>>> hasn't completed).
>>>>>
>>>>> I'm struggling to see why the performance of the Clojure version is so 
>>>>> much worse than the Scala. Profiling in YourKit suggests that it's 
>>>>> spending 
>>>>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain 
>>>>> why. I would be grateful for any insight.
>>>>>
>>>>> The code for the two different versions is on GitHub:
>>>>>
>>>>> https://github.com/paulbutcher/chess-scala
>>>>>  https://github.com/paulbutcher/chess-clojure
>>>>>
>>>>> Notes:
>>>>>
>>>>> - I know that an exhaustive depth-first search isn't a great way to 
>>>>> tackle this problem. But I'd like to understand why I see such a dramatic 
>>>>> difference between the performance of the Scala and Clojure versions.
>>>>>
>>>>> - I believe that I've implemented the same algorithm in both Scala and 
>>>>> Clojure - certainly they both generate the same results for small 
>>>>> problems 
>>>>> (there are 3x3 and 4x4 problems commented out in the source). But clearly 
>>>>> I 
>>>>> can't rule out the possibility that I've made a mistake in the Clojure 
>>>>> implementation. If anyone can spot my stupid mistake, I'd greatly 
>>>>> appreciate it.
>>>>>
>>>>> Thanks in advance for any insight that anyone can offer.
>>>>>
>>>>> --
>>>>> paul.butcher->msgCount++
>>>>>
>>>>> Snetterton, Castle Combe, Cadwell Park...
>>>>> Who says I have a one track mind?
>>>>>
>>>>> http://www.paulbutcher.com/
>>>>> LinkedIn: http://www.linkedin.com/in/paulbutcher
>>>>> MSN: pa...@paulbutcher.com
>>>>> AIM: paulrabutcher
>>>>> Skype: paulrabutcher
>>>>>  
>>>>>  -- 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Clojure" group.
>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>
>>>>> Note that posts from new members are moderated - please be patient 
>>>>> with your first post.
>>>>> To unsubscribe from this group, send email to
>>>>> clojure+u...@googlegroups.com
>>>>>
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/clojure?hl=en
>>>>> --- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "Clojure" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to clojure+u...@googlegroups.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>
>>>>
>>>>  -- 
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com<javascript:>
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com <javascript:>
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>> --- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Clojure" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to clojure+u...@googlegroups.com <javascript:>.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to