[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

tmp12342 Tue, 11 Aug 2015 04:29:32 -0700

tmp12342 added the comment:

Serhiy, I understand the first reason, but 
https://docs.python.org/3/library/codecs.html says
> applicable to text encodings:
> [...]
> This code will then be turned back into the same byte when the 
> 'surrogateescape' error handler is used when encoding the data.
Shouldn't the documentation be corrected? It defines a text encoding as "A codec
which encodes Unicode strings to bytes."
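
For reference, the round trip that the quoted documentation describes can be
sketched as follows. The sample bytes and variable names are arbitrary
illustrations, not taken from the docs.

```python
# Minimal sketch of the 'surrogateescape' round trip with the ASCII codec.
# The input bytes are an arbitrary example chosen for illustration.
raw = b'abc\x80'

# Decoding: the undecodable byte 0x80 is kept as the lone surrogate U+DC80.
text = raw.decode('ascii', 'surrogateescape')
print(ascii(text))

# Encoding with the same handler turns U+DC80 back into the byte 0x80.
restored = text.encode('ascii', 'surrogateescape')
print(restored)
```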



As for the second one, could you explain a bit more? I don't know how to
interpret it.

You say b'\xD8\x00' are invalid ASCII bytes, but of these two only 0xD8 is
invalid. Also, we are talking about encoding here, str -> bytes, so why does it
matter whether the resulting bytes are ASCII-compatible?
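
Both points can be checked with a short sketch, assuming Python 3.4 or later
(where the UTF-16 codecs reject lone surrogates by default and accept the
'surrogatepass' handler). The variable names are illustrative only.

```python
# Which of the two bytes does the ASCII decoder actually reject?
data = b'\xd8\x00'
decoded = data.decode('ascii', 'surrogateescape')
# Only 0xD8 is escaped (to U+DCD8); 0x00 is valid ASCII and decodes to U+0000.
print(ascii(decoded))

# The direction under discussion: encoding a str that holds a lone surrogate.
lone = '\ud800'
try:
    lone.encode('utf-16-be')
    strict_ok = True
except UnicodeEncodeError:
    # The strict handler refuses to encode a lone surrogate.
    strict_ok = False

# 'surrogatepass' encodes the surrogate code point as an ordinary code unit.
passed = lone.encode('utf-16-be', 'surrogatepass')
print(strict_ok, passed)
```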

----------
nosy: +tmp12342

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12892>
_______________________________________
