On 12/27/2016 07:13 PM, j...@math.brown.edu wrote:
According to the docs [6]:
"""
it is advised to mix together the hash values of the components of the
object that also play a part in comparison of objects by packing them
into a tuple and hashing the tuple. Example:
def __hash__(self):
return hash((self.name, self.nick, self.color))
"""
Applying this advice to the use cases above would require creating an
arbitrarily large tuple in memory before passing it to hash(), which
is then just thrown away. It would be preferable if there were a way
to pass multiple values to hash() in a streaming fashion, such that
the overall hash were computed incrementally, without building up a
large object in memory first.
Part of the reason for creating __hash__ like above is that:
- it's simple
- it's reliable
However, it's not the only way to have a hash algorithm that works; in fact,
the beginning of the sentence you quoted says:
The only required property is that objects which compare equal have
the same hash value;
In other words, objects that do not compare equal can also have the same hash
value (although too much of that will reduce the efficiency of Python's
containers).
(ii) priming the overall hash value with some class-specific initial
value, so that if an instance of a different type of collection, which
comprised the same items but which compared unequal, were to compute
its hash value out of the same constituent items, we make sure our
hash value differs.
This is unnecessary: hashes are compared first as a way to weed out impossible
matches, but when the hashes are the same an actual __eq__ test is still done
[7].
--
~Ethan~
[6] https://docs.python.org/3/reference/datamodel.html#object.__hash__
[7] some test code to prove above points:
--- 8< ------------------------------------------------------------
from unittest import main, TestCase
class Eggs(object):
def __init__(self, value):
self.value = value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.value == other.value
return NotImplemented
class Spam(object):
def __init__(self, value):
self.value = value
def __hash__(self):
return hash(self.value)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.value == other.value
return NotImplemented
e10 = Eggs(1)
e20 = Eggs(2)
e11 = Eggs(1)
e21 = Eggs(2)
s10 = Spam(1)
s20 = Spam(2)
s11 = Spam(1)
s21 = Spam(2)
bag = {}
bag[e10] = 1
bag[s10] = 2
bag[e20] = 3
bag[s20] = 4
class TestEqualityAndHashing(TestCase):
def test_equal(self):
# same class, same value --> equal
self.assertEqual(e10, e11)
self.assertEqual(e20, e21)
self.assertEqual(s10, s11)
self.assertEqual(s20, s21)
def test_not_equal(self):
# different class, same value --> not equal
self.assertEqual(e10.value, s10.value)
self.assertNotEqual(e10, s10)
self.assertEqual(e20.value, s20.value)
self.assertNotEqual(e20, s20)
def test_same_hash(self):
# same class, same value, same hash
self.assertEqual(hash(e10), hash(e11))
self.assertEqual(hash(e20), hash(e21))
self.assertEqual(hash(s10), hash(s11))
self.assertEqual(hash(s20), hash(s21))
# different class, same value, same hash
self.assertEqual(hash(e10), hash(s10))
self.assertEqual(hash(e11), hash(s11))
self.assertEqual(hash(e20), hash(s20))
self.assertEqual(hash(e21), hash(s21))
def test_as_key(self):
# different objects from different classes with same hash should still
be distinct
self.assertEqual(len(bag), 4)
self.assertEqual(bag[e10], 1)
self.assertEqual(bag[s10], 2)
self.assertEqual(bag[e20], 3)
self.assertEqual(bag[s20], 4)
# different objects from same classes with same hash should not be
distinct
self.assertEqual(bag[e11], 1)
self.assertEqual(bag[s11], 2)
self.assertEqual(bag[e21], 3)
self.assertEqual(bag[s21], 4)
main()
--- 8< ------------------------------------------------------------
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/