[issue9804] ascii() does not always join surrogate pairs

STINNER Victor Wed, 08 Sep 2010 15:57:50 -0700

STINNER Victor <[email protected]> added the comment:

For unicode, ascii(x) is implemented as repr(x).encode('ascii', 
'backslashreplace').decode('ascii').
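
The equivalence above can be checked directly. A minimal sketch, assuming a current CPython 3 interpreter; ascii_via_repr is a hypothetical helper name used only for illustration, not part of the standard library:

```python
def ascii_via_repr(x):
    """Rebuild ascii(x) from repr(x), as described above (hypothetical helper)."""
    return repr(x).encode('ascii', 'backslashreplace').decode('ascii')


# ASCII, Latin-1, BMP and non-BMP samples.
samples = ['abc', 'caf\xe9', '\u20ac', '\U0001d121', '\U00012fff']
for s in samples:
    # Both spellings should agree for every string.
    assert ascii_via_repr(s) == ascii(s)
```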


repr(x) is "'" + x + "'" for printable characters (e.g. U+1D121), and "'\\U%08x'" 
% ord(x) for non-printable characters (e.g. U+12FFF).
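
The two repr() cases can be sketched as follows, assuming a single non-BMP character and that non-printable ones are written with the \UXXXXXXXX escape, which is what repr() emits. expected_repr is a hypothetical helper; whether a given code point counts as printable depends on the Unicode database shipped with the interpreter, so the sketch asks str.isprintable() instead of hard-coding the answer:

```python
def expected_repr(ch):
    """Predict repr() of a single non-BMP character (hypothetical helper)."""
    if ch.isprintable():
        # Printable: the character is kept as-is between quotes.
        return "'" + ch + "'"
    # Non-printable: eight lowercase hex digits after \U.
    return "'\\U%08x'" % ord(ch)


for ch in ('\U0001d121', '\U00012fff'):
    assert repr(ch) == expected_repr(ch)
```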

As for the unexpected output, the problem is that ascii+backslashreplace encodes 
non-BMP printable characters as b'\\uXXXX\\uXXXX' in narrow builds.

I don't see a simple solution to encode non-BMP characters as b'\\UXXXXXXXX', 
because the principle of an error handler is that it escapes non-encodable 
characters one by one.
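
To illustrate the difference, here is a rough simulation in pure Python. It is only a sketch, not a patch for the codec machinery: to_utf16_units, escape_one_by_one and escape_joining_pairs are hypothetical helpers, and a narrow build is imitated by splitting the string into UTF-16 code units.

```python
def to_utf16_units(s):
    """Split s into UTF-16 code units, imitating narrow-build storage."""
    data = s.encode('utf-16-be', 'surrogatepass')
    return [int.from_bytes(data[i:i + 2], 'big') for i in range(0, len(data), 2)]


def escape_unit(u):
    """Escape one code unit the way backslashreplace does."""
    if u < 0x80:
        return chr(u)
    if u < 0x100:
        return '\\x%02x' % u
    return '\\u%04x' % u


def escape_one_by_one(units):
    """Each code unit is escaped on its own, so pairs stay split."""
    return ''.join(escape_unit(u) for u in units)


def escape_joining_pairs(units):
    """Look one unit ahead and merge a high+low surrogate pair into \\UXXXXXXXX."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append('\\U%08x' % cp)
            i += 2
        else:
            out.append(escape_unit(u))
            i += 1
    return ''.join(out)


units = to_utf16_units('\U0001d121')
print(escape_one_by_one(units))     # \ud834\udd21
print(escape_joining_pairs(units))  # \U0001d121
```

The joining variant needs to see the next code unit before deciding how to escape the current one, which is exactly what a one-character-at-a-time handler does not do.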

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9804>
_______________________________________