Serhiy Storchaka added the comment:

The proposed preliminary patch adds three functions to the codecs module:

convert_surrogates(data, errors) -- handle lone surrogates with the specified
error handler.

>>> codecs.convert_surrogates('a\u20ac\udca4', 'backslashreplace')
'a€\\udca4'
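
For comparison, a similar effect can be had with existing codecs by round-tripping through UTF-8: UTF-8 cannot encode surrogates, so the error handler is invoked for each one. This is only a rough sketch; the name handle_lone_surrogates is made up for illustration and is not part of the patch, and it only works with handlers whose replacement is ordinary encodable text.

```python
def handle_lone_surrogates(data, errors='strict'):
    """Hypothetical stand-in for the proposed convert_surrogates().

    The UTF-8 encoder rejects every surrogate code point, so `errors`
    decides what replaces it; decoding then turns the result back into str.
    """
    return data.encode('utf-8', errors).decode('utf-8')


result = handle_lone_surrogates('a\u20ac\udca4', 'backslashreplace')
print(ascii(result))
```

Handlers such as 'surrogateescape' emit raw bytes instead of text, so the final decode step would fail for them in this sketch.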

convert_surrogateescape(data, errors) -- handle surrogateescaped bytes with
the specified error handler.

>>> codecs.convert_surrogateescape('a\u20ac\udca4', 'backslashreplace')
'a€\\xa4'
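
A comparable result can be approximated without the patch by undoing the surrogateescape and decoding again with a different handler. The sketch below assumes the string originally came from UTF-8 data decoded with 'surrogateescape', and that decoding with 'backslashreplace' is available (Python 3.5 and later). The name handle_surrogateescaped is hypothetical.

```python
def handle_surrogateescaped(data, errors='strict'):
    """Hypothetical stand-in for the proposed convert_surrogateescape().

    Assumes `data` was produced by bytes.decode('utf-8', 'surrogateescape').
    """
    # Turn U+DC80..U+DCFF back into the original undecodable bytes.
    raw = data.encode('utf-8', 'surrogateescape')
    # Decode again, letting `errors` decide what happens to those bytes.
    return raw.decode('utf-8', errors)


result = handle_surrogateescaped('a\u20ac\udca4', 'backslashreplace')
print(ascii(result))
```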

convert_astrals(data, errors) -- handle astral (non-BMP) characters with the
specified error handler.

>>> codecs.convert_astrals('a\u20ac\U000e007f', 'backslashreplace')
'a€\\U000e007f'
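
One way to picture this behaviour with existing APIs is to find each run of non-BMP characters and pass it to a registered error handler obtained from codecs.lookup_error(). This is a sketch under assumptions, not the patch itself: the name convert_astrals_sketch and the 'ucs-2' label in the exception are invented for illustration, and only handlers that return text for a UnicodeEncodeError are supported.

```python
import codecs
import re

# Runs of characters outside the Basic Multilingual Plane.
_ASTRAL_RUN = re.compile('[\U00010000-\U0010ffff]+')


def convert_astrals_sketch(data, errors='strict'):
    """Hypothetical approximation of the proposed convert_astrals()."""
    handler = codecs.lookup_error(errors)

    def replace(match):
        # Describe the run as an encoding error so a standard handler
        # can process it; 'ucs-2' is only a descriptive label here.
        exc = UnicodeEncodeError('ucs-2', data, match.start(), match.end(),
                                 'astral character')
        replacement, _newpos = handler(exc)
        return replacement

    return _ASTRAL_RUN.sub(replace, data)


result = convert_astrals_sketch('a\u20ac\U000e007f', 'backslashreplace')
print(ascii(result))
```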

The names are open to discussion.

I am also thinking about adding two functions or error handlers (that can be
used with convert_surrogates and convert_astrals) for composing astral
characters from surrogate pairs and vice versa.
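
To make the idea concrete, here is a minimal sketch of the two conversions as plain functions, using the standard UTF-16 surrogate arithmetic. The names compose_surrogate_pairs and decompose_astrals are hypothetical, and how such helpers would plug in as error handlers is left open.

```python
import re

_PAIR = re.compile('([\ud800-\udbff])([\udc00-\udfff])')
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def compose_surrogate_pairs(data):
    """Join each high+low surrogate pair into one astral character."""
    def join(match):
        high = ord(match.group(1)) - 0xD800
        low = ord(match.group(2)) - 0xDC00
        return chr(0x10000 + (high << 10) + low)
    return _PAIR.sub(join, data)


def decompose_astrals(data):
    """Split each astral character into a high+low surrogate pair."""
    def split(match):
        offset = ord(match.group()) - 0x10000
        return chr(0xD800 + (offset >> 10)) + chr(0xDC00 + (offset & 0x3FF))
    return _ASTRAL.sub(split, data)


print(ascii(compose_surrogate_pairs('\ud83d\ude00')))
print(ascii(decompose_astrals('\U0001f600')))
```

Lone surrogates are left untouched by compose_surrogate_pairs in this sketch, since only a high surrogate immediately followed by a low one matches.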

----------
components: +Library (Lib)
versions: +Python 3.5 -Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________