Re: [julia-users] The unique function and iterables of custom composite types

Seth Fri, 17 Jul 2015 08:53:35 -0700

Perhaps a @unique macro in front of the type field could specify that this 
is to be used in equality/hashing?


On Friday, July 17, 2015 at 8:50:22 AM UTC-7, Stefan Karpinski wrote:
>
> I wonder if this whole business could be addressed better with a better 
> interface. When doing both hashing and equality, the real question is just 
> "what parts of this thing actually matter?" and "how do I compare them?" If 
> you had a "important_parts" function that equality and hashing were both 
> generically defined in terms of, then they couldn't possibly mismatch. 
> Obviously that function needs a better name.
>
> On Fri, Jul 17, 2015 at 11:34 AM, Matt Bauman <mba...@gmail.com 
> <javascript:>> wrote:
>
>> unique() is only broken for custom mutable types that are broken.  That 
>> method of ensuring differentiation is equality.  Here's what happened:
>>
>> * Define mutable type Foo
>> * Define equality by Foo's contents
>> * But don't define a new hash function, and instead fall back on the 
>> default implementation (which hashes differently based upon object identity)
>>
>> Now we're in a situation where two objects that compare as equal have 
>> different hashes.  When we go to insert an object of this type into a keys 
>> of dictionary (or set, as used by unique), here's what happens:
>>
>> * The dictionary asks for the hash of the key
>> * Dictionaries start with 16 "slots" where they can store objects, so it 
>> just looks at the last hex digit from the hash to decide which one to use
>> * If there's already an object in that slot, the dictionary must 
>> determine if it's the object it's looking for or simply a collision (which 
>> is expected).  So it checks if the existing object is equal to the new one. 
>> If it's equal, then that's the slot it's looking for.  If not, then it 
>> checks if it's in the next slot.  And so on.
>>
>> * So now we're in this inconsistent state: most of the time (15 out of 16 
>> times) objects that claim to be equal get put in different slots because 
>> their hashes are different.  But sometimes they'll happen to end up in the 
>> same slot, and you get this very different behavior (which happened to be 
>> what you want in this case — the one-element uniqued vector).
>>
>>
>> On Friday, July 17, 2015 at 10:55:31 AM UTC-4, Seth wrote:
>>>
>>> But doesn't mean that unique() will stay broken for custom mutable 
>>> types? That is, if unique() on custom mutable types relies on the fact that 
>>> the last hash octet is different (and therefore there are no hash 
>>> collisions), then there has to be SOME method of ensuring differentiation. 
>>> What am I missing?
>>>
>>> On Friday, July 17, 2015 at 5:40:42 AM UTC-7, Stefan Karpinski wrote:
>>>>
>>>> Two reasons:
>>>>
>>>>    1. That would entail recomputing the hash of any object that's 
>>>>    already in the same table slot (which is not expected to be *that* rare 
>>>> in 
>>>>    a hash table).
>>>>    2. The == function (well, isequal, really), if correctly defined 
>>>>    should always imply that hashes match. If it's possible to get 
>>>> different 
>>>>    hashes when two objects are == then either your hash or your == 
>>>> function 
>>>>    are broken.
>>>>    
>>>> If we stored hash values in the hash table, then reason 1 would go away 
>>>> (at a significant memory cost), so we could warn the user when 2 occurs 
>>>> and 
>>>> they have a mismatch between their == and hash functions.
>>>>
>>>> On Fri, Jul 17, 2015 at 3:32 AM, Jeffrey Sarnoff <jeffrey...@gmail.com> 
>>>> wrote:
>>>>
>>>>> just curious, why are the full hashes not compared when least 
>>>>> significant octets match (above)?
>>>>>
>>>>>
>>>>>
>>>>> On Thursday, July 16, 2015 at 12:25:01 PM UTC-4, Matt Bauman wrote:
>>>>>
>>>>>> On Thursday, July 16, 2015 at 12:19:25 PM UTC-4, milktrader wrote:
>>>>>>>
>>>>>>> Also, back to the OP question, is the correct solution to simply 
>>>>>>> define 
>>>>>>>
>>>>>>>  Base.hash(f::Foo) = f.x
>>>>>>>
>>>>>>
>>>>>> No, I'd define Base.hash(f::Foo) = hash(f.x, 0x64c74221932dea5b), 
>>>>>> where I chose the constant by rand(UInt).  This way it won't collide 
>>>>>> with 
>>>>>> other types.  I really need to spend a bit more time with my interfaces 
>>>>>> chapter and add this info there.
>>>>>>
>>>>>> On Thursday, July 16, 2015 at 12:07:50 PM UTC-4, Seth wrote:
>>>>>>
>>>>>>> If I'm reading between the lines correctly, the default/existing 
>>>>>>> hash function is based on the last byte of the ID? Is there a reason we 
>>>>>>> don't make the table wider to reduce the chances of collisions (or 
>>>>>>> would 
>>>>>>> this have bad effects on memory utilization)?
>>>>>>>
>>>>>>
>>>>>> It's not really a hash collision, but rather a hashtable index 
>>>>>> collision.  A new dict only has 16 slots for its keys, and the index for 
>>>>>> an 
>>>>>> element is simply chosen by the last 8 bits of its hash.  If those last 
>>>>>> 8 
>>>>>> bits match the hash of an object already in the table, there's a 
>>>>>> collision, 
>>>>>> and the Dict checks to see if the existing element `isequal` to the new 
>>>>>> one 
>>>>>> (and doesn't look at the hash again).  So the hashes are different, but 
>>>>>> isequal says the objects are the same.  This is why things are wonky.
>>>>>>
>>>>>
>>>>
>

Re: [julia-users] The unique function and iterables of custom composite types

Reply via email to