New submission from lplatypus <l...@deller.id.au>:

The documentation for the hash() function says:
"Numeric values that compare equal have the same hash value (even if they are 
of different types, as is the case for 1 and 1.0)"

This invariant can be violated when a unicode object is compared with its str
equivalent. Here is an example (run with -S, which skips site.py so that
sys.setdefaultencoding() is still available):

C:\>c:\Python27\python -S
Python 2.7a3 (r27a3:78021, Feb  7 2010, 00:00:09) [MSC v.1500 32 bit (Intel)] on win32
>>> import sys; sys.setdefaultencoding('utf-8')
>>> unicodeobj = u'No\xebl'
>>> strobj = str(unicodeobj)
>>> unicodeobj == strobj
True
>>> hash(unicodeobj) == hash(strobj)
False

The last result should be True, not False.

I reproduced this on Python 2.7a3 (Windows), 2.6.4 (Linux) and 2.5.2 (Linux).
The problem does not apply to Python 3.0+.
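A short Python 3 sketch of why the issue disappears there: text is str and encoded data is bytes, and the two types never compare equal, so such a pair can never break the "equal implies same hash" rule. The variable names are illustrative only.

```python
# Python 3 sketch: str and bytes are distinct types that never compare
# equal, so an encoded/decoded pair cannot violate the hash contract.
text = 'No\xebl'
data = text.encode('utf-8')

print(text == data)                              # False: str vs bytes
print(text == data.decode('utf-8'))              # True
print(hash(text) == hash(data.decode('utf-8')))  # True: equal str objects
```

Running under `python -b` makes the `text == data` comparison emit a BytesWarning (and `-bb` turns it into an error), which is how Python 3 flags this kind of mixed comparison.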

Looking at unicodeobject.c:unicode_hash() and stringobject.c:string_hash(), I
think this problem arises for "equal" objects strobj and unicodeobj whenever
the unicode code points do not line up with the encoded bytes, i.e. when:
    map(ord, unicodeobj) != map(ord, strobj)
This means the problem never arises when sys.getdefaultencoding() is 'ascii'
or 'iso8859-1'/'latin1'.
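To make the alignment condition concrete, here is a simplified Python 3 model. It assumes that string_hash() and unicode_hash() run the same multiply/xor recurrence, one over bytes and one over code points, with a 32-bit C long as on win32; `model_hash` is a hypothetical helper, not the actual CPython code.

```python
# Simplified model, ASSUMING string_hash() and unicode_hash() share one
# multiply/xor recurrence over their units (bytes vs code points).
# model_hash is a hypothetical helper, not CPython's implementation.

def model_hash(units):
    """Hash a sequence of integer units like a 32-bit C long loop would."""
    if not units:
        return 0
    x = units[0] << 7
    for unit in units:
        # Emulate 32-bit wraparound of a C long on win32.
        x = ((1000003 * x) ^ unit) & 0xFFFFFFFF
    x ^= len(units)
    if x >= 0x80000000:  # reinterpret as a signed 32-bit value
        x -= 0x100000000
    return -2 if x == -1 else x  # -1 is reserved as an error marker in C


text = u'No\xebl'
code_points = [ord(c) for c in text]         # units unicode_hash() would see
utf8_bytes = list(text.encode('utf-8'))      # units string_hash() sees (utf-8)
latin1_bytes = list(text.encode('latin-1'))  # units string_hash() sees (latin-1)

print(code_points == utf8_bytes)    # False: U+00EB encodes to two bytes
print(code_points == latin1_bytes)  # True: one byte per code point, same values
print(model_hash(code_points) == model_hash(latin1_bytes))  # True
```

With latin-1 the loop sees exactly the same integers for both objects, so the hashes must agree. With utf-8 it sees a different and longer sequence, so matching hashes would only be a coincidence.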

----------
components: Interpreter Core
messages: 99084
nosy: ldeller
severity: normal
status: open
title: equal unicode/str objects can have unequal hash
type: behavior
versions: Python 2.5, Python 2.6, Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7890>
_______________________________________
_______________________________________________
Python-bugs-list mailing list