Thanks for posting that here Andy - I've been mostly off the grid this week and haven't had time to do some of this followup. I would also like to say many thanks to Mark and Andy who have spent a lot of time elucidating the original issue and working on how to address it.
Alex On Thursday, January 30, 2014 11:13:56 AM UTC-5, Andy Fingerhut wrote: > > Thanks to the work and thought of Mark Engelberg, Alex Miller, Rich > Hickey, myself, and likely several others who I should be naming here but > am forgetting, the latest (not yet released) Clojure master version has an > improved hash function that gives a much better variety of hash values, and > thus much improved performance for hash-based collections like hash maps > and hash sets, than previous versions of Clojure. > > For example, the N-queen program that Paul Butcher wrote and was asking > about its performance at the beginning of this thread, improves from 6.7 > minutes before these hash changes to 12.8 seconds after the changes, when > run on the 6x6 board. For the 6x9 board, the improvement is from > "something over 8 hours that I didn't measure precisely because I didn't > want to wait" down to 12.8 minutes. A few more experimental results are > given on the design wiki page [1]. > > [1] http://dev.clojure.org/display/design/Better+hashing > > Andy > > > On Sat, Nov 2, 2013 at 9:44 AM, Andy Fingerhut > <andy.fi...@gmail.com<javascript:> > > wrote: > >> A few minutes ago I finished copying, pasting, and doing a little >> reformatting on Mark Engelberg's document on the subject. >> >> http://dev.clojure.org/display/design/Better+hashing >> >> Andy >> >> >> On Fri, Nov 1, 2013 at 11:28 AM, Alex Miller >> <al...@puredanger.com<javascript:> >> > wrote: >> >>> Has anyone created a design page on dev.clojure for this yet? >>> >>> >>> On Thursday, October 31, 2013 9:19:23 AM UTC-5, Andy Fingerhut wrote: >>> >>>> Just a follow up after a fair amount of thinking and experimentation >>>> has been done on this problem. >>>> >>>> Mark Engelberg has a very nice document describing the existing hash >>>> functions used by Clojure, examples with why they can cause collisions so >>>> often, and suggestions for changes to them to have fewer collisions in >>>> these cases. https://docs.google.com/document/d/ >>>> 10olJzjNHXn1Si1LsSvjNqF_X1O7OTPZLuv3Uo_duY5o/edit >>>> >>>> I have made modifications to a fork of the Clojure code base with his >>>> suggested hash modifications here, for experimentation purposes: >>>> https://github.com/jafingerhut/clojure I also have some >>>> modifications to Paul's N-queens Clojure program to print out additional >>>> statistics and try out different hash functions here: >>>> https://github.com/jafingerhut/chess-clojure >>>> >>>> Running Paul's N-queens problem with a 6x6 board without any changes to >>>> Clojure takes about 7 mins on one of my computers. With those changes to >>>> the hash functions it finishes in about 12 sec. >>>> >>>> I haven't let a 6x9 board run to completion without those hash changes, >>>> but it takes a long time :-) With those changes to the hash functions it >>>> finishes in about 11 minutes. >>>> >>>> The reason for the currently slow behavior is a combination of the >>>> current hash functions and the implementation of the PersistentHashSet >>>> data >>>> structure used to implement sets. A PersistentHashSet is, very roughly, a >>>> 32-way branching tree with the hash function of the elements used to >>>> decide >>>> which branch to take. When you hit a leaf in the tree, all elements with >>>> the same hash function value are put into an array that is scanned >>>> linearly >>>> whenever you want to check whether a value is already in the set. So the >>>> more hash collisions, the more it is like using an array to store and look >>>> up set elements in O(n) time. >>>> >>>> With the 6x9 board, there are 20,136,752 solutions. The way those >>>> solutions are represented in the program as sets of vectors, those 20 >>>> million values only have 17,936 different hash values. Thus the average >>>> length of those arrays of values in the tree leaves is 1,122 values. The >>>> longest of those arrays has 81,610 values in it, all with the same hash >>>> function. >>>> >>>> With Mark's proposed hash function changes, those 20,136,752 values >>>> hash to 20,089,488 different ints, for an average array length of 1 value, >>>> and a longest array length of 3 values. Big speedup. >>>> >>>> There is nothing unique about Mark's particular hash functions that >>>> achieves this. Other hash modifications can give similar improvements. >>>> The Clojure dev team is considering which changes would be good ones to >>>> make, in hopes of changing them once and not having to do it again. >>>> >>>> Andy >>>> >>>> >>>> On Tue, Oct 22, 2013 at 9:28 AM, Paul Butcher <pa...@paulbutcher.com>wrote: >>>> >>>>> I've been playing around with a generalised version of the N-Queens >>>>> problem that handles other chess pieces. I have a Scala implementation >>>>> that >>>>> uses an eager depth-first search. Given enough RAM (around 12G - it's >>>>> very >>>>> memory hungry!) it can solve a 6x9 board with 6 pieces in around 2.5 >>>>> minutes on my MacBook Pro. >>>>> >>>>> I then created a Clojure version of the same algorithm with the >>>>> intention of (eventually) using laziness to avoid the memory issue. >>>>> However, the performance of the Clojure version is several orders of >>>>> magnitude worse than the Scala version (I've left it running overnight on >>>>> the problem that the Scala version solves in 2.5 minutes and it still >>>>> hasn't completed). >>>>> >>>>> I'm struggling to see why the performance of the Clojure version is so >>>>> much worse than the Scala. Profiling in YourKit suggests that it's >>>>> spending >>>>> 99% of its time in PersistentHashSet.cons, but I'm at a loss to explain >>>>> why. I would be grateful for any insight. >>>>> >>>>> The code for the two different versions is on GitHub: >>>>> >>>>> https://github.com/paulbutcher/chess-scala >>>>> https://github.com/paulbutcher/chess-clojure >>>>> >>>>> Notes: >>>>> >>>>> - I know that an exhaustive depth-first search isn't a great way to >>>>> tackle this problem. But I'd like to understand why I see such a dramatic >>>>> difference between the performance of the Scala and Clojure versions. >>>>> >>>>> - I believe that I've implemented the same algorithm in both Scala and >>>>> Clojure - certainly they both generate the same results for small >>>>> problems >>>>> (there are 3x3 and 4x4 problems commented out in the source). But clearly >>>>> I >>>>> can't rule out the possibility that I've made a mistake in the Clojure >>>>> implementation. If anyone can spot my stupid mistake, I'd greatly >>>>> appreciate it. >>>>> >>>>> Thanks in advance for any insight that anyone can offer. >>>>> >>>>> -- >>>>> paul.butcher->msgCount++ >>>>> >>>>> Snetterton, Castle Combe, Cadwell Park... >>>>> Who says I have a one track mind? >>>>> >>>>> http://www.paulbutcher.com/ >>>>> LinkedIn: http://www.linkedin.com/in/paulbutcher >>>>> MSN: pa...@paulbutcher.com >>>>> AIM: paulrabutcher >>>>> Skype: paulrabutcher >>>>> >>>>> -- >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "Clojure" group. >>>>> To post to this group, send email to clo...@googlegroups.com >>>>> >>>>> Note that posts from new members are moderated - please be patient >>>>> with your first post. >>>>> To unsubscribe from this group, send email to >>>>> clojure+u...@googlegroups.com >>>>> >>>>> For more options, visit this group at >>>>> http://groups.google.com/group/clojure?hl=en >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "Clojure" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to clojure+u...@googlegroups.com. >>>>> >>>>> For more options, visit https://groups.google.com/groups/opt_out. >>>>> >>>> >>>> -- >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To post to this group, send email to clo...@googlegroups.com<javascript:> >>> Note that posts from new members are moderated - please be patient with >>> your first post. >>> To unsubscribe from this group, send email to >>> clojure+u...@googlegroups.com <javascript:> >>> For more options, visit this group at >>> http://groups.google.com/group/clojure?hl=en >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to clojure+u...@googlegroups.com <javascript:>. >>> For more options, visit https://groups.google.com/groups/opt_out. >>> >> >> > -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.