OK, I get that if lists have to match the Java hash code, then it's a nonstarter to try to improve it. However, Clojure makes a distinction between hasheq and hashCode, one of which is used within Clojure and one of which is used for Java interop, and I don't think they necessarily need to match.
I don't really consider this to be a contrived case. It's pretty typical, in my experience, for Clojure programmers to choose to represent a 2D array as a map from [x y] to values. That's often more practical to intialize and work with than a vectors-in-vector layout. So a map with keys ranging from [0 0] to [199 199] would be what you'd get for a 200x200 array. Having 6-7 objects per bucket isn't the end of the world, but it means that all your lookups are several times slower than they could be. All that said, I think that improving set hashing is a more pressing issue. On Wed, Oct 23, 2013 at 9:05 PM, Mikera <mike.r.anderson...@gmail.com>wrote: > On Thursday, 24 October 2013 09:12:21 UTC+8, puzzler wrote: > >> Even though vectors aren't as prone to collision between similar-looking >> vectors, I think you are right that the smallness of the numbers is a >> problem: >> >> Consider the following: >> >> => (count (set (for [x (range 200) y (range 200)] (hash [x y])))) >> 6369 >> >> This means that out of 40,000 common ordered pairs, there are only 6369 >> distinct hash values. That's not good. >> >> So yes, upon further thought, I think sequence hashing should be improved >> as well. >> >> I want to emphasize that I don't view this as merely a "theoretical" >> problem. I have noticed for over a year now, when profiling my Clojure >> code, that a surprising amount of time was spent on dealing with hash >> collisions. I chalked it up as the natural behavior of hash tables which >> will occasionally have collisions, and just learned to live with it, but >> the more I play around with the existing hash function on data similar to >> what I use (with small numbers in slightly varying structures), the more >> I'm convinced that Clojure's weak hashing strategy explains the performance >> issues I've had. Would love to see the Clojure community tackle this >> head-on; I think there's a lot of performance to be gained for many real >> apps by doing this. >> > > You can't change the hashCode of lists / vectors without breaking the > contract for java.util.List (which PersistentList and PersistentVector both > implement) - > http://docs.oracle.com/javase/7/docs/api/java/util/List.html#hashCode() > > I don't think breaking this would be a good idea. > > The java.util.List hashCode isn't actually too bad. It's a decent > compromise between an efficient implementation and reasonably effective > hashing for most real-world use cases. > > Sure you can construct artificial examples where you get some collisions, > but that's true of any hash code function. But note that the maximum number > of collisions is still pretty low: > > (apply max (vals (frequencies (for [x (range 200) y (range 200)] (hash [x > y]))))) > => 7 > > That's actually the number you really care about, since it is the number > of objects in the same hash bucket that drives the pathological O(n^2) > behaviours. > > > -- > -- > You received this message because you are subscribed to the Google > Groups "Clojure" group. > To post to this group, send email to clojure@googlegroups.com > Note that posts from new members are moderated - please be patient with > your first post. > To unsubscribe from this group, send email to > clojure+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/clojure?hl=en > --- > You received this message because you are subscribed to the Google Groups > "Clojure" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clojure+unsubscr...@googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.