[issue12892] UTF-16 and UTF-32 codecs should reject (lone) surrogates

STINNER Victor Tue, 29 Nov 2011 12:46:09 -0800

STINNER Victor <[email protected]> added the comment:

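Python 3.3 has a strange behaviour: two surrogate code points that form a
pair come back from a UTF-16 round trip as a single non-BMP character: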


>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'

I would expect text.encode(encoding).decode(encoding) == text, or an encode
or decode error.

So I agree that the encoder should reject lone surrogates.
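
The round-trip expectation can be checked with a small helper. This is only a sketch: the names round_trip and survives_round_trip are made up for illustration, and the 'surrogatepass' error handler is used here to let surrogates through the encoder on interpreters whose strict UTF-16 encoder already rejects them (support for that handler in the UTF-16 codecs is assumed, which may not hold on older versions).

```python
def round_trip(text, encoding, errors="strict"):
    """Encode text with the given error handler, then decode it strictly."""
    data = text.encode(encoding, errors)
    return data.decode(encoding)


def survives_round_trip(text, encoding, errors="strict"):
    """Return True if text comes back unchanged, False if it comes back
    altered, and None if the codec refuses to encode or decode it."""
    try:
        return round_trip(text, encoding, errors) == text
    except UnicodeError:
        return None


# A real non-BMP character is expected to round-trip unchanged.
non_bmp = '\U0010ffff'
ok = survives_round_trip(non_bmp, 'utf-16-le')

# Two surrogate code points pushed through the encoder with
# 'surrogatepass' produce the same bytes as the non-BMP character,
# so the decoder cannot tell them apart.
surrogates = '\uDBFF\uDFFF'
altered = survives_round_trip(surrogates, 'utf-16-le', 'surrogatepass')

# With the default strict handler the result depends on the interpreter:
# None if the encoder rejects surrogates, False if it lets them through.
strict = survives_round_trip(surrogates, 'utf-16-le')
```

A result of None for the strict case is the "encode error" outcome asked for above; False is the silent change of text that this comment objects to.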

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue12892>
_______________________________________