Perhaps a @unique macro in front of the type field could specify that this is to be used in equality/hashing?
On Friday, July 17, 2015 at 8:50:22 AM UTC-7, Stefan Karpinski wrote: > > I wonder if this whole business could be addressed better with a better > interface. When doing both hashing and equality, the real question is just > "what parts of this thing actually matter?" and "how do I compare them?" If > you had a "important_parts" function that equality and hashing were both > generically defined in terms of, then they couldn't possibly mismatch. > Obviously that function needs a better name. > > On Fri, Jul 17, 2015 at 11:34 AM, Matt Bauman <mba...@gmail.com > <javascript:>> wrote: > >> unique() is only broken for custom mutable types that are broken. That >> method of ensuring differentiation is equality. Here's what happened: >> >> * Define mutable type Foo >> * Define equality by Foo's contents >> * But don't define a new hash function, and instead fall back on the >> default implementation (which hashes differently based upon object identity) >> >> Now we're in a situation where two objects that compare as equal have >> different hashes. When we go to insert an object of this type into a keys >> of dictionary (or set, as used by unique), here's what happens: >> >> * The dictionary asks for the hash of the key >> * Dictionaries start with 16 "slots" where they can store objects, so it >> just looks at the last hex digit from the hash to decide which one to use >> * If there's already an object in that slot, the dictionary must >> determine if it's the object it's looking for or simply a collision (which >> is expected). So it checks if the existing object is equal to the new one. >> If it's equal, then that's the slot it's looking for. If not, then it >> checks if it's in the next slot. And so on. >> >> * So now we're in this inconsistent state: most of the time (15 out of 16 >> times) objects that claim to be equal get put in different slots because >> their hashes are different. But sometimes they'll happen to end up in the >> same slot, and you get this very different behavior (which happened to be >> what you want in this case — the one-element uniqued vector). >> >> >> On Friday, July 17, 2015 at 10:55:31 AM UTC-4, Seth wrote: >>> >>> But doesn't mean that unique() will stay broken for custom mutable >>> types? That is, if unique() on custom mutable types relies on the fact that >>> the last hash octet is different (and therefore there are no hash >>> collisions), then there has to be SOME method of ensuring differentiation. >>> What am I missing? >>> >>> On Friday, July 17, 2015 at 5:40:42 AM UTC-7, Stefan Karpinski wrote: >>>> >>>> Two reasons: >>>> >>>> 1. That would entail recomputing the hash of any object that's >>>> already in the same table slot (which is not expected to be *that* rare >>>> in >>>> a hash table). >>>> 2. The == function (well, isequal, really), if correctly defined >>>> should always imply that hashes match. If it's possible to get >>>> different >>>> hashes when two objects are == then either your hash or your == >>>> function >>>> are broken. >>>> >>>> If we stored hash values in the hash table, then reason 1 would go away >>>> (at a significant memory cost), so we could warn the user when 2 occurs >>>> and >>>> they have a mismatch between their == and hash functions. >>>> >>>> On Fri, Jul 17, 2015 at 3:32 AM, Jeffrey Sarnoff <jeffrey...@gmail.com> >>>> wrote: >>>> >>>>> just curious, why are the full hashes not compared when least >>>>> significant octets match (above)? >>>>> >>>>> >>>>> >>>>> On Thursday, July 16, 2015 at 12:25:01 PM UTC-4, Matt Bauman wrote: >>>>> >>>>>> On Thursday, July 16, 2015 at 12:19:25 PM UTC-4, milktrader wrote: >>>>>>> >>>>>>> Also, back to the OP question, is the correct solution to simply >>>>>>> define >>>>>>> >>>>>>> Base.hash(f::Foo) = f.x >>>>>>> >>>>>> >>>>>> No, I'd define Base.hash(f::Foo) = hash(f.x, 0x64c74221932dea5b), >>>>>> where I chose the constant by rand(UInt). This way it won't collide >>>>>> with >>>>>> other types. I really need to spend a bit more time with my interfaces >>>>>> chapter and add this info there. >>>>>> >>>>>> On Thursday, July 16, 2015 at 12:07:50 PM UTC-4, Seth wrote: >>>>>> >>>>>>> If I'm reading between the lines correctly, the default/existing >>>>>>> hash function is based on the last byte of the ID? Is there a reason we >>>>>>> don't make the table wider to reduce the chances of collisions (or >>>>>>> would >>>>>>> this have bad effects on memory utilization)? >>>>>>> >>>>>> >>>>>> It's not really a hash collision, but rather a hashtable index >>>>>> collision. A new dict only has 16 slots for its keys, and the index for >>>>>> an >>>>>> element is simply chosen by the last 8 bits of its hash. If those last >>>>>> 8 >>>>>> bits match the hash of an object already in the table, there's a >>>>>> collision, >>>>>> and the Dict checks to see if the existing element `isequal` to the new >>>>>> one >>>>>> (and doesn't look at the hash again). So the hashes are different, but >>>>>> isequal says the objects are the same. This is why things are wonky. >>>>>> >>>>> >>>> >