[issue34145] uuid3 and uuid5 hard to use portably between Python 2 and 3

Bence Romsics Wed, 18 Jul 2018 01:46:17 -0700


New submission from Bence Romsics <[email protected]>:


The issue I'd like to report may not be an outright bug in either CPython 2 
or CPython 3; it is more of a wishlist item to help Python programmers 
write UUID-handling code that is valid in Python 2 and 3 at the same time.

Please consider these one-liners:

    $ python2 -c 'import uuid ; uuid.uuid5(namespace=uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747"), name="foo")'
    $ python3 -c 'import uuid ; uuid.uuid5(namespace=uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747"), name="foo")'

As long as the 'name' argument to uuid.uuid3() or uuid.uuid5() is of the 
native str type of the running Python version, there is no problem. However, 
if you'd like to handle both unicode and non-unicode 'name' input in code 
that is valid on both Python 2 and 3, I find that impossible to express 
without relying on Python version checking.

CPython 2's uuid module is incompatible with unicode input:

    $ python2 -c 'import uuid ; uuid.uuid5(namespace=uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747"), name=u"foo")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/uuid.py", line 589, in uuid5
        hash = sha1(namespace.bytes + name).digest()
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x85 in position 0: ordinal not in range(128)

CPython 3's uuid module is incompatible with non-unicode input:

    $ python3 -c 'import uuid ; uuid.uuid5(namespace=uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747"), name=b"foo")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python3.5/uuid.py", line 608, in uuid5
        hash = sha1(namespace.bytes + bytes(name, "utf-8")).digest()
    TypeError: encoding without a string argument

The reason is obvious looking at the uuid modules' source code:

CPython 2.7:
https://github.com/python/cpython/blob/ea9a0994cd0f4bd37799b045c34097eb21662b3d/Lib/uuid.py#L603

CPython 3.6:
https://github.com/python/cpython/blob/e9e2fd75ccbc6e9a5221cf3525e39e9d042d843f/Lib/uuid.py#L628
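
To make the difference concrete: as the tracebacks above show, both versions 
hash the namespace bytes concatenated with the name, and differ only in how 
the name becomes bytes. The sketch below reproduces that computation by hand. 
It assumes UTF-8 for text input, and manual_uuid5 is a hypothetical name used 
for illustration only, not part of the uuid module.

```python
import hashlib
import uuid

NAMESPACE = uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747")


def manual_uuid5(namespace, name_bytes):
    """Hypothetical helper: build a version 5 UUID from raw name bytes."""
    # SHA-1 over the 16 namespace bytes followed by the name bytes.
    digest = hashlib.sha1(namespace.bytes + name_bytes).digest()
    # Keep the first 16 bytes; version=5 sets the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=5)


# Text input has to be encoded explicitly before hashing (assumption: UTF-8).
print(manual_uuid5(NAMESPACE, u"foo".encode("utf-8")))
```

Since the name is passed as bytes here, the snippet should behave the same 
on Python 2 and Python 3.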

Therefore portable code has to resort to version checking like this:

    import six
    import uuid

    if six.PY2:
        name = name.encode('utf-8')
    uuid.uuid5(namespace=namespace, name=name)
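
For illustration, the branch can also be hidden in a small helper that checks 
the type of 'name' rather than the Python version. This is only a sketch: 
to_native_str and portable_uuid5 are hypothetical names, UTF-8 is assumed as 
the encoding, and the code still has to branch, so it works around the 
inconvenience rather than removing it.

```python
import uuid


def to_native_str(name, encoding="utf-8"):
    """Hypothetical helper: coerce name to the interpreter's own str type."""
    if isinstance(name, str):
        # Already native: a byte string on Python 2, text on Python 3.
        return name
    if isinstance(name, bytes):
        # Only reachable on Python 3, where bytes and str are distinct.
        return name.decode(encoding)
    # Remaining case: a Python 2 unicode object.
    return name.encode(encoding)


def portable_uuid5(namespace, name):
    """Hypothetical wrapper around uuid.uuid5() accepting text or bytes."""
    return uuid.uuid5(namespace, to_native_str(name))


NAMESPACE = uuid.UUID("850aeee8-e173-4da1-9d6b-dd06e4b06747")
# Text and bytes spellings of the same name yield the same UUID.
assert portable_uuid5(NAMESPACE, u"foo") == portable_uuid5(NAMESPACE, b"foo")
```

The same wrapper shape would apply to uuid.uuid3().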

IMHO this inconvenience could be avoided if CPython 2's uuid.uuid3() and 
uuid.uuid5() were changed to also accept unicode 'name' arguments and 
encode() them implicitly.

What do you think?

----------
components: Library (Lib)
messages: 321870
nosy: rubasov
priority: normal
severity: normal
status: open
title: uuid3 and uuid5 hard to use portably between Python 2 and 3
versions: Python 2.7, Python 3.6

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue34145>
_______________________________________