Harald Husum <harald.hu...@gmail.com> added the comment:

I am realising that me not knowing about the hash invariance is likely a 
symptom of something I have in common with most Python users, but not with 
Python maintainers: Having access to a powerful ecosystem, we mostly get our 
classes from 3rd parties, rather than implement them ourselves. When I do 
define my own classes, I usually don't have to touch the `__hash__` or `__eq__` 
implementations, since I am either subclassing, making a plain dataclass, or 
leaning on `attrs` to help me out. I think it is telling that even the pandas 
core devs are able to mess this up, and it suggests to me that this invariance 
isn't emphasised enough.

Here's a go at specifying what I mean with a backlink:

"""
For sequence container types such as list, tuple, or collections.deque,
the expression `x in y` is equivalent to `any(x is e or x == e for e in y)`.
For containers that use hashing, such as dict, set, or frozenset, 
the same equivalence holds, assuming the [hash 
invariance](https://docs.python.org/3/glossary.html#term-hashable).
"""
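
To make the caveat concrete, here is a minimal sketch of how the equivalence holds for a list but fails for a set once the hash invariance is broken. The `BadKey` class and the `linear_contains` helper are made up for illustration and are not part of any proposed wording.

```python
class BadKey:
    """Hypothetical class whose __eq__ and __hash__ disagree."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

    def __hash__(self):
        # Deliberately broken: equal objects get different hashes.
        return id(self)


def linear_contains(x, container):
    """The equivalence quoted above, spelled out as a function."""
    return any(x is e or x == e for e in container)


a, b = BadKey(1), BadKey(1)

in_list = a in [b]                      # lists scan linearly and compare
in_set = a in {b}                       # sets look up by hash before comparing
scan_of_set = linear_contains(a, {b})   # the "equivalent" expression on the set

print(in_list, in_set, scan_of_set)
```

Here `a in {b}` and the `any(...)` expression over the same set give different answers, which is exactly the case the "assuming the hash invariance" clause is meant to exclude.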

I just derived this more or less directly from Hettinger's formulation. It 
could probably be made clearer.

I am realising that this (seemingly famous) hash invariance isn't defined in 
isolation anywhere, making it slightly hard to link to. Any better suggestions 
than the glossary entry for hashable, which has the definition included? To me, 
it seems that such a fundamental assumption/convention/requirement, that isn't 
automatically enforced, should be as easy as possible to point to.

In my search for the definition (prompted by Hettinger) I discovered more 
surprises, by the way.

Surprise 1:
https://docs.python.org/3/library/collections.abc.html?highlight=hashable#collections.abc.Hashable

> ABC for classes that provide the __hash__() method.

Having now discovered the mentioned invariance, I am surprised this isn't 
explicitly formulated (and implemented? haven't checked) as:

"""
ABC for classes that provide the __hash__() and __eq__() methods.
"""

I also think this docstring deserves a backlink to the invariance definition, 
given its importance, and how easy it is to shoot yourself in the foot. The 
current formulation of this docstring actually reflected what I (naively) 
assumed it meant to be hashable, suggesting this is the place in the docs I got 
my understanding of the term from.
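
As a quick check of what the ABC actually tests, the sketch below runs `isinstance(..., Hashable)` against a few objects. The `EqOnly` and `Inconsistent` classes are hypothetical, made up for illustration.

```python
from collections.abc import Hashable


class EqOnly:
    """Hypothetical class: defines __eq__ only, so __hash__ becomes None."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


class Inconsistent:
    """Hypothetical class: has both methods, but they disagree."""

    def __eq__(self, other):
        return isinstance(other, Inconsistent)

    def __hash__(self):
        return id(self)  # equal instances hash differently


checks = {
    "object()": isinstance(object(), Hashable),
    "[]": isinstance([], Hashable),
    "EqOnly()": isinstance(EqOnly(), Hashable),
    "Inconsistent()": isinstance(Inconsistent(), Hashable),
    "([],)": isinstance(([],), Hashable),
}

for name, result in checks.items():
    print(f"isinstance({name}, Hashable) -> {result}")
```

The ABC only reports whether a usable `__hash__` is present. It accepts `Inconsistent()`, which violates the invariance, and also a tuple containing a list, even though calling `hash()` on that tuple raises `TypeError`. Since every class inherits an `__eq__` from `object`, additionally checking for `__eq__` would probably not change any of these results.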

Surprise 2:
https://docs.python.org/3/reference/expressions.html?highlight=hashable#value-comparisons

> The `hash()` result should be consistent with equality. Objects that are 
> equal should either have the same hash value, or be marked as unhashable.

I appreciate that this is mentioned in this section (I was hoping to find it). 
But it feels like a reiteration of the definition of the invariant, and could 
thus be replaced with a backlink, as suggested above. I'd much rather see the 
text real estate be used for a motivating statement (you don't want weird 
behaviour in sets and dicts), and a reminder of the importance of checking the 
__hash__ implementation if you are modifying the __eq__ implementation, in, 
say, some subclass.
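
A minimal sketch of the subclass case, using hypothetical `Point` classes invented for illustration: overriding `__eq__` in a subclass drops the inherited `__hash__`, so the subclass has to restate a hash that agrees with its new notion of equality.

```python
class Point:
    """Hypothetical base class with consistent __eq__ and __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class EqOnlyPoint(Point):
    """Overrides __eq__ but forgets __hash__."""

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (round(self.x), round(self.y)) == (round(other.x), round(other.y))


class FuzzyPoint(EqOnlyPoint):
    """Restores a __hash__ that matches the rounded comparison."""

    def __hash__(self):
        return hash((round(self.x), round(self.y)))


# The inherited hash is gone, so instances cannot go into sets or dicts.
lost_hash = EqOnlyPoint.__hash__ is None

# With a matching __hash__, equal points find each other in a set.
found = FuzzyPoint(1.2, 2.0) in {FuzzyPoint(0.9, 2.1)}

print(lost_hash, found)
```

Python at least turns the forgotten `__hash__` into a loud `TypeError` instead of silently misbehaving, but the subclass author still has to know that the hash must be rewritten to match.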

Surprise 3:
https://docs.python.org/3/reference/datamodel.html#object.__eq__

> See the paragraph on __hash__() for some important notes on creating hashable 
> objects which support custom comparison operations and are usable as 
> dictionary keys.

Another case of the invariance being mentioned (I appreciate it), but in a way 
where it isn't directly evident that extreme care should be taken when 
modifying an __eq__ implementation. Perhaps another case where the invariance 
should be referred to by link, and the text should focus on the consequences of 
breaking it.

Surprise 4:
https://docs.python.org/3/reference/datamodel.html#object.__hash__

Another definition-in-passing of the invariance:

> The only required property is that objects which compare equal have the same 
> hash value.

Also replaceable by backlink?

Thereafter follow descriptions of some (in hindsight very important) 
protection mechanisms.

> User-defined classes have __eq__() and __hash__() methods by default; with 
> them, all objects compare unequal (except with themselves) and x.__hash__() 
> returns an appropriate value such that x == y implies both that x is y and 
> hash(x) == hash(y).

> A class that overrides __eq__() and does not define __hash__() will have its 
> __hash__() implicitly set to None.

But yet again, without some motivating statement for why we care about the 
invariance, all of this seems, well, surprising and weird.
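
The kind of motivating example I have in mind could look something like the sketch below, which contrasts the identity-based defaults with a value class that keeps `__eq__` and `__hash__` in step. The `Plain` and `Version` classes are hypothetical.

```python
class Plain:
    """No __eq__ or __hash__ of its own: inherits identity-based defaults."""


class Version:
    """Hypothetical value class keeping __eq__ and __hash__ in step."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        # Single source of truth for both comparison and hashing.
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


p, q = Plain(), Plain()
default_eq = (p == q)        # default comparison is by identity
default_lookup = q in {p}    # so a distinct instance is never found

v, w = Version(3, 11), Version(3, 11)
value_eq = (v == w)          # compared by value
value_lookup = w in {v}      # found, because equal objects hash equally
hit = {v: "cached"}.get(w)   # dict lookup works with an equal key

print(default_eq, default_lookup, value_eq, value_lookup, hit)
```

Deriving both methods from one `_key()` helper is just one way to keep them consistent. The point is that dict and set lookups only behave like equality-based lookups when the two agree.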

Surprise 5:
https://docs.python.org/3/library/functions.html#hash

Perhaps another location where a backlink would be in order, although I'm not 
sure in this case.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45832>
_______________________________________