Serhiy Storchaka added the comment:

The proposed preliminary patch adds three functions to the codecs module:
convert_surrogates(data, errors) -- handle lone surrogates with the specified error handler.

>>> codecs.convert_surrogates('a\u20ac\udca4', 'backslashreplace')
'a€\\udca4'

convert_surrogateescape(data, errors) -- handle surrogateescaped bytes with the specified error handler.

>>> codecs.convert_surrogateescape('a\u20ac\udca4', 'backslashreplace')
'a€\\xa4'

convert_astrals(data, errors) -- handle astral (non-BMP) characters with the specified error handler.

>>> codecs.convert_astrals('a\u20ac\U000e007f', 'backslashreplace')
'a€\\U000e007f'

The names are open to discussion.

I am also thinking about adding two functions or error handlers (which could be used with convert_surrogates and convert_astrals) for composing astral characters from surrogate pairs and vice versa.

----------
components: +Library (Lib)
versions: +Python 3.5 -Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
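For readers without the patch applied, the behaviour described in the comment can be approximated with APIs that already exist in the standard library. The following is only a rough sketch, not the code from the patch: the helper `_convert` is hypothetical, the three function names are borrowed from the proposal, and the implementation strategy (a regular expression plus `codecs.lookup_error`) is an assumption. It only supports error handlers that return a `str` replacement, and the `backslashreplace` handler accepts decoding errors only on Python 3.5 and later.

```python
import codecs
import re

# Character classes for the three cases discussed in the comment.
_LONE_SURROGATES = re.compile('[\ud800-\udfff]+')
_ESCAPED_BYTES = re.compile('[\udc80-\udcff]+')   # produced by surrogateescape
_ASTRALS = re.compile('[\U00010000-\U0010ffff]+')


def _convert(pattern, data, errors, make_error):
    """Hypothetical helper: pass every match to the named error handler."""
    handler = codecs.lookup_error(errors)

    def repl(match):
        # The handler either raises or returns (replacement, resume_position).
        replacement, _ = handler(make_error(match))
        if not isinstance(replacement, str):
            raise TypeError('this sketch only supports str replacements')
        return replacement

    return pattern.sub(repl, data)


def _encode_error(reason):
    # Build a UnicodeEncodeError covering the matched span of the input.
    def make_error(match):
        return UnicodeEncodeError('utf-8', match.string,
                                  match.start(), match.end(), reason)
    return make_error


def _decode_error(match):
    # Recover the original bytes hidden in U+DC80..U+DCFF and report them
    # as undecodable input.
    raw = bytes(ord(ch) - 0xDC00 for ch in match.group())
    return UnicodeDecodeError('utf-8', raw, 0, len(raw), 'invalid byte')


def convert_surrogates(data, errors='strict'):
    return _convert(_LONE_SURROGATES, data, errors,
                    _encode_error('surrogates not allowed'))


def convert_surrogateescape(data, errors='strict'):
    return _convert(_ESCAPED_BYTES, data, errors, _decode_error)


def convert_astrals(data, errors='strict'):
    return _convert(_ASTRALS, data, errors,
                    _encode_error('non-BMP character'))
```

With `errors='backslashreplace'` these reproduce the three results quoted in the comment. Other handlers such as `'ignore'` or `'replace'` behave as they do for ordinary encoding or decoding errors, and `'strict'` raises the constructed exception.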