[issue9804] ascii() does not always join surrogate pairs

Antoine Pitrou Wed, 08 Sep 2010 16:11:53 -0700

Antoine Pitrou <[email protected]> added the comment:

How about the following solution:


>>> def a(s):
...    s = s.encode('unicode-escape').decode('ascii')
...    s = s.replace("'", r"\'")
...    return "'" + s + "'"
... 
>>> s = "'\0\"\n\r\t abcd\x85é\U00012fff\U0001D121xxx\uD800."
>>> print(ascii(s)); print(a(s)); print(repr(s))
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\ud834\udd21xxx\ud800.'
'\'\x00"\n\r\t abcd\x85\xe9\U00012fff\U0001d121xxx\ud800.'
'\'\x00"\n\r\t abcd\x85é\U00012fff𝄡xxx\ud800.'


(I think I've included everything:
- normal chars
- control chars
- one-byte non-ASCII
- two-byte non-ASCII (and lone surrogate)
- printable and non-printable surrogate pairs
- single and double quotes)
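
For anyone who wants to try this outside the interactive prompt, here is the same idea as a small standalone sketch. The name ascii_via_unicode_escape is made up for illustration and is not part of any proposed patch; the test string is the one from the session above.

```python
def ascii_via_unicode_escape(s):
    """Return an ASCII-only, single-quoted literal for the string s.

    Hypothetical helper, for illustration only: it mirrors the a()
    function from the session above.
    """
    # The unicode-escape codec escapes backslashes, control characters and
    # everything outside ASCII; non-BMP characters come out as one \UXXXXXXXX.
    body = s.encode('unicode-escape').decode('ascii')
    # The codec leaves quotes alone, so escape the single quotes by hand.
    body = body.replace("'", r"\'")
    return "'" + body + "'"


# Same test string as above: quotes, control chars, Latin-1 chars,
# two non-BMP characters and a lone surrogate.
sample = "'\0\"\n\r\t abcd\x85é\U00012fff\U0001D121xxx\uD800."
result = ascii_via_unicode_escape(sample)
```

Note that this sketch always uses single quotes, whereas repr() switches to double quotes when the string contains a single quote but no double quote, so the two can differ on strings such as "it's".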

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue9804>
_______________________________________