New submission from Marc-Andre Lemburg:

In Python 2, the unicode() constructor rejects non-ASCII bytes arguments unless 
an encoding argument is given, because it falls back to the ASCII codec:

>>> unicode(u'abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal 
not in range(128)

In Python 3, the str() constructor masks this programming error by returning 
the repr() of the bytes object:

>>> str('abcäöü'.encode('utf-8'))
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

I think it would be more helpful to raise an error, which would point the 
programmer to the encoding argument that is most likely missing.
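
As a rough illustration of the proposed behavior (not a patch for str() 
itself), the check could look something like the sketch below. The name 
strict_str and the wording of the error message are made up for this example.

```python
def strict_str(obj, encoding=None, errors="strict"):
    """Hypothetical helper: like str(), but refuses bytes without an encoding."""
    if isinstance(obj, (bytes, bytearray)):
        if encoding is None:
            # Assumed error type and message; the real change might differ.
            raise TypeError(
                "cannot convert bytes to str without an encoding argument")
        # With an encoding, defer to the documented str(bytes, encoding) form.
        return str(obj, encoding, errors)
    # Non-bytes objects keep the usual str() behavior.
    return str(obj)


data = 'abcäöü'.encode('utf-8')

try:
    strict_str(data)
except TypeError as exc:
    result = "TypeError: %s" % exc

decoded = strict_str(data, 'utf-8')
print(result)
print(decoded)
```

With such a check, the missing encoding shows up at the call site instead of 
as a stray "b'...'" somewhere in the output.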

Also note that you get a different output with the encoding argument set:

>>> str('abcäöü'.encode('utf-8'), 'utf-8')
'abcäöü'

I know this is documented, but it is still not very helpful and can easily hide 
errors.
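
To show how easily this slips through, here is a small side-by-side 
comparison. The variable names are only for illustration; the point is that 
the implicit conversion also happens inside string formatting, where nobody 
calls str() explicitly.

```python
payload = 'abcäöü'.encode('utf-8')

# Implicit conversion: silently yields the repr() of the bytes object.
implicit = str(payload)

# Explicit conversions: both decode the bytes to the intended text.
explicit = str(payload, 'utf-8')
via_decode = payload.decode('utf-8')

# Formatting falls back to str(), so the repr leaks into the message.
message = "name: {}".format(payload)

print(implicit)
print(explicit)
print(message)
```

No exception is raised anywhere, so the wrong text only gets noticed if 
someone actually reads the output.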

----------
components: Interpreter Core, Unicode
messages: 241800
nosy: ezio.melotti, haypo, lemburg
priority: normal
severity: normal
status: open
title: str(bytes_obj) should raise an error
versions: Python 3.5, Python 3.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24025>
_______________________________________