New submission from Marc-Andre Lemburg:
In Python 2, the unicode() constructor does not accept non-ASCII bytes
arguments unless an encoding argument is given:
>>> unicode(u'abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
In Python 3, the str() constructor masks this programming error by returning
the repr() of the bytes object:
>>> str('abcäöü'.encode('utf-8'))
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"
I think it would be more helpful to raise an error that points the programmer
to the encoding argument, which is most likely what is missing.
Also note that you get a different output when the encoding argument is set:
>>> str('abcäöü'.encode('utf-8'), 'utf-8')
'abcäöü'
I know this is documented, but it is still not very helpful and can easily hide
errors.
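To make the proposal concrete, here is a minimal sketch of the stricter behaviour, written as a wrapper rather than a change to str() itself. The name strict_str and its error message are hypothetical and are not part of Python. The sketch only illustrates raising an error when bytes are passed without an encoding:

```python
def strict_str(obj, encoding=None, errors="strict"):
    """Hypothetical helper: like str(), but refuses bytes without an encoding."""
    if isinstance(obj, (bytes, bytearray)):
        if encoding is None:
            # This is the case that str() currently masks by returning the repr.
            raise TypeError(
                "decoding bytes requires an explicit encoding argument"
            )
        return str(obj, encoding, errors)
    # Non-bytes objects are converted as usual; encoding is ignored here.
    return str(obj)


data = "abc\u00e4\u00f6\u00fc".encode("utf-8")

# Current behaviour: str() silently returns the repr of the bytes object.
assert str(data) == "b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

# With an encoding, the bytes are decoded instead.
assert strict_str(data, "utf-8") == "abc\u00e4\u00f6\u00fc"

# Without an encoding, the helper raises instead of hiding the mistake.
try:
    strict_str(data)
except TypeError as exc:
    print("error:", exc)
```

With a helper like this, the missing encoding shows up at the call site instead of as a stray "b'...'" string later on in the output.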
----------
components: Interpreter Core, Unicode
messages: 241800
nosy: ezio.melotti, haypo, lemburg
priority: normal
severity: normal
status: open
title: str(bytes_obj) should raise an error
versions: Python 3.5, Python 3.6
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24025>
_______________________________________