I wonder whether this whole business could be addressed with a better interface. When implementing both hashing and equality, the real question is just "what parts of this object actually matter?" and "how do I compare them?" If equality and hashing were both generically defined in terms of an "important_parts" function, then they couldn't possibly mismatch. Obviously that function needs a better name.
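As a sketch of that idea (the name `important_parts` comes from the message above; the abstract type and everything else here are illustrative assumptions, not an existing Julia interface): types opt in by defining a single function returning the parts that matter, and both `isequal` and `hash` are derived from it, so the two can never disagree.

```julia
# Hypothetical opt-in trait: subtypes define important_parts and get
# consistent isequal/hash for free.
abstract type HasImportantParts end

important_parts(x) = error("define important_parts for $(typeof(x))")

# Both equality and hashing go through important_parts, so a mismatch
# between them is impossible by construction.
Base.isequal(a::T, b::T) where {T<:HasImportantParts} =
    isequal(important_parts(a), important_parts(b))
Base.hash(x::HasImportantParts, h::UInt) = hash(important_parts(x), h)

# The Foo example from the thread, opted in (modern `mutable struct`
# syntax; the 2015-era thread would have written `type Foo`):
mutable struct Foo <: HasImportantParts
    x::Int
end
important_parts(f::Foo) = (f.x,)

a, b = Foo(1), Foo(1)
@assert isequal(a, b) && hash(a) == hash(b)
@assert length(unique([a, b])) == 1  # unique now behaves
```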
On Fri, Jul 17, 2015 at 11:34 AM, Matt Bauman <mbau...@gmail.com> wrote:

> unique() is only broken for custom mutable types that are themselves broken.
> That method of ensuring differentiation is equality. Here's what happened:
>
> * Define mutable type Foo
> * Define equality by Foo's contents
> * But don't define a new hash function, and instead fall back on the
> default implementation (which hashes differently based upon object identity)
>
> Now we're in a situation where two objects that compare as equal have
> different hashes. When we go to insert an object of this type into the keys
> of a dictionary (or a set, as used by unique), here's what happens:
>
> * The dictionary asks for the hash of the key
> * Dictionaries start with 16 "slots" where they can store objects, so the
> dictionary just looks at the last hex digit of the hash to decide which
> slot to use
> * If there's already an object in that slot, the dictionary must determine
> whether it's the object it's looking for or simply a collision (which is
> expected). So it checks whether the existing object is equal to the new one.
> If it's equal, then that's the slot it's looking for. If not, it checks
> the next slot. And so on.
>
> * So now we're in an inconsistent state: most of the time (15 out of 16
> times) objects that claim to be equal get put in different slots because
> their hashes are different. But sometimes they'll happen to end up in the
> same slot, and you get very different behavior (which happened to be what
> you want in this case: the one-element uniqued vector).
>
>
> On Friday, July 17, 2015 at 10:55:31 AM UTC-4, Seth wrote:
>>
>> But doesn't that mean that unique() will stay broken for custom mutable
>> types? That is, if unique() on custom mutable types relies on the fact
>> that the last hash octet is different (and therefore there are no hash
>> collisions), then there has to be SOME method of ensuring
>> differentiation. What am I missing?
>>
>> On Friday, July 17, 2015 at 5:40:42 AM UTC-7, Stefan Karpinski wrote:
>>>
>>> Two reasons:
>>>
>>> 1. That would entail recomputing the hash of any object that's
>>> already in the same table slot (which is not expected to be *that*
>>> rare in a hash table).
>>> 2. The == function (well, isequal, really), if correctly defined,
>>> should always imply that hashes match. If it's possible to get
>>> different hashes when two objects are ==, then either your hash or
>>> your == function is broken.
>>>
>>> If we stored hash values in the hash table, then reason 1 would go away
>>> (at a significant memory cost), so we could warn the user when 2 occurs
>>> and they have a mismatch between their == and hash functions.
>>>
>>> On Fri, Jul 17, 2015 at 3:32 AM, Jeffrey Sarnoff <jeffrey...@gmail.com>
>>> wrote:
>>>
>>>> Just curious: why are the full hashes not compared when the least
>>>> significant octets match (above)?
>>>>
>>>> On Thursday, July 16, 2015 at 12:25:01 PM UTC-4, Matt Bauman wrote:
>>>>
>>>>> On Thursday, July 16, 2015 at 12:19:25 PM UTC-4, milktrader wrote:
>>>>>>
>>>>>> Also, back to the OP question, is the correct solution simply to
>>>>>> define
>>>>>>
>>>>>> Base.hash(f::Foo) = f.x
>>>>>>
>>>>>
>>>>> No, I'd define Base.hash(f::Foo) = hash(f.x, 0x64c74221932dea5b),
>>>>> where I chose the constant by rand(UInt). This way it won't collide
>>>>> with other types. I really need to spend a bit more time with my
>>>>> interfaces chapter and add this info there.
>>>>>
>>>>> On Thursday, July 16, 2015 at 12:07:50 PM UTC-4, Seth wrote:
>>>>>
>>>>>> If I'm reading between the lines correctly, the default/existing
>>>>>> hash function is based on the last byte of the ID? Is there a reason
>>>>>> we don't make the table wider to reduce the chances of collisions
>>>>>> (or would this have bad effects on memory utilization)?
>>>>>>
>>>>>
>>>>> It's not really a hash collision, but rather a hashtable index
>>>>> collision.
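Matt's fix can also be written in the two-argument `hash(x, h::UInt)` form, which composes when a Foo sits inside another hashed container. The seed constant below is the one from the thread; the rest is a minimal sketch, and you would generate your own seed with `rand(UInt)`.

```julia
mutable struct Foo
    x::Int
end
Base.:(==)(a::Foo, b::Foo) = a.x == b.x  # isequal falls back to ==

# Per-type random seed (from the thread, chosen by rand(UInt)) keeps
# Foo(1) from colliding with other types wrapping the same value.
const FOO_SEED = 0x64c74221932dea5b
Base.hash(f::Foo, h::UInt) = hash(f.x, hash(FOO_SEED, h))

# Now == and hash agree, so Dict, Set, and unique all behave:
@assert Foo(1) == Foo(1)
@assert hash(Foo(1)) == hash(Foo(1))
@assert length(unique([Foo(1), Foo(1)])) == 1
```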
>>>>> A new Dict has only 16 slots for its keys, and the slot for an
>>>>> element is simply chosen by the last 4 bits of its hash. If those
>>>>> last 4 bits match those of an object already in the table, there's a
>>>>> collision, and the Dict checks whether the existing element `isequal`
>>>>> to the new one (and doesn't look at the hash again). So the hashes
>>>>> are different, but isequal says the objects are the same. This is why
>>>>> things are wonky.
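The slot arithmetic described above can be sketched directly: with a power-of-two table size, the slot index is just the hash's low bits. This mirrors (but is not) Julia's internal index computation; the function name here is made up for illustration.

```julia
# For a table with a power-of-two number of slots, masking the hash by
# (nslots - 1) keeps its low bits; +1 makes it a 1-based slot index.
slot(h::UInt, nslots::Int) = (h & (nslots - 1)) + 1

# With 16 slots, only the last 4 bits (one hex digit) of the hash matter:
@assert slot(UInt(0x1f), 16) == 16  # low 4 bits are 0xf -> slot 16
@assert slot(UInt(0x20), 16) == 1   # low 4 bits are 0x0 -> slot 1
@assert slot(UInt(0x2f), 16) == slot(UInt(0x3f), 16)  # index collision
```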