OK, I get that if lists have to match the Java hash code, then it's a
nonstarter to try to improve it.  However, Clojure makes a distinction
between hasheq and hashCode, one of which is used within Clojure and one of
which is used for Java interop, and I don't think they necessarily need to
match.

I don't really consider this to be a contrived case.  It's pretty typical,
in my experience, for Clojure programmers to choose to represent a 2D array
as a map from [x y] to values.  That's often more practical to intialize
and work with than a vectors-in-vector layout.  So a map with keys ranging
from [0 0] to [199 199] would be what you'd get for a 200x200 array.

Having 6-7 objects per bucket isn't the end of the world, but it means that
all your lookups are several times slower than they could be.

All that said, I think that improving set hashing is a more pressing issue.



On Wed, Oct 23, 2013 at 9:05 PM, Mikera <mike.r.anderson...@gmail.com>wrote:

> On Thursday, 24 October 2013 09:12:21 UTC+8, puzzler wrote:
>
>> Even though vectors aren't as prone to collision between similar-looking
>> vectors, I think you are right that the smallness of the numbers is a
>> problem:
>>
>> Consider the following:
>>
>> => (count (set (for [x (range 200) y (range 200)] (hash [x y]))))
>> 6369
>>
>> This means that out of 40,000 common ordered pairs, there are only 6369
>> distinct hash values.  That's not good.
>>
>> So yes, upon further thought, I think sequence hashing should be improved
>> as well.
>>
>> I want to emphasize that I don't view this as merely a "theoretical"
>> problem.  I have noticed for over a year now, when profiling my Clojure
>> code, that a surprising amount of time was spent on dealing with hash
>> collisions.  I chalked it up as the natural behavior of hash tables which
>> will occasionally have collisions, and just learned to live with it, but
>> the more I play around with the existing hash function on data similar to
>> what I use (with small numbers in slightly varying structures), the more
>> I'm convinced that Clojure's weak hashing strategy explains the performance
>> issues I've had.  Would love to see the Clojure community tackle this
>> head-on; I think there's a lot of performance to be gained for many real
>> apps by doing this.
>>
>
> You can't change the hashCode of lists / vectors without breaking the
> contract for java.util.List (which PersistentList and PersistentVector both
> implement) -
> http://docs.oracle.com/javase/7/docs/api/java/util/List.html#hashCode()
>
> I don't think breaking this would be a good idea.
>
> The java.util.List hashCode isn't actually too bad.  It's a decent
> compromise between an efficient implementation and reasonably effective
> hashing for most real-world use cases.
>
> Sure you can construct artificial examples where you get some collisions,
> but that's true of any hash code function. But note that the maximum number
> of collisions is still pretty low:
>
> (apply max (vals (frequencies (for [x (range 200) y (range 200)] (hash [x
> y])))))
> => 7
>
> That's actually the number you really care about, since it is the number
> of objects in the same hash bucket that drives the pathological O(n^2)
> behaviours.
>
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to