Adal Chiriliuc <adal.chiril...@gmail.com> added the comment:

It's an internal web API at the place where I work.
To be able to use it from Python in some form, I wrote a workaround in which I just stripped everything outside the BMP:

# replace characters outside BMP with 'REPLACEMENT CHARACTER' (U+FFFD)
def cesu8_to_utf8(text):
    result = ""
    index = 0
    length = len(text)
    while index < length:
        if text[index] < "\xf0":
            result += text[index]
            index += 1
        else:
            result += "\xef\xbf\xbd" # u"\ufffd".encode("utf8")
            index += 4
    return result

Now that I look at the workaround again, I'm not even sure it's about CESU-8 (it strips Unicode characters encoded as 4 bytes, not as a pair of 3-byte surrogates).

However, I can see why there would be little interest in adding this encoding.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12742>
_______________________________________
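For comparison with the workaround above, here is a minimal Python 3 sketch of what decoding actual CESU-8 input could look like, where each supplementary character arrives as a pair of 3-byte encoded surrogates. It relies only on the standard "surrogatepass" error handler; the function names cesu8_to_str and cesu8_bytes_to_utf8 are hypothetical and not part of any existing codec. The sketch is lenient: it also accepts ordinary 4-byte UTF-8 sequences, which strict CESU-8 would not allow.

```python
def cesu8_to_str(data):
    """Decode CESU-8 bytes to str (hypothetical helper, Python 3 only)."""
    # Step 1: decode as UTF-8 but let 3-byte encoded surrogates through,
    # which leaves the surrogate code points unpaired in the str.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # Step 2: a round trip through UTF-16 joins each high/low surrogate
    # pair into one supplementary code point. An unpaired surrogate
    # makes the final decode raise UnicodeDecodeError.
    utf16 = with_surrogates.encode("utf-16", "surrogatepass")
    return utf16.decode("utf-16")


def cesu8_bytes_to_utf8(data):
    """Re-encode CESU-8 bytes as standard UTF-8 bytes (hypothetical helper)."""
    return cesu8_to_str(data).encode("utf-8")


# U+1F600 in CESU-8 is the surrogates U+D83D and U+DE00, 3 bytes each.
sample = b"a\xed\xa0\xbd\xed\xb8\x80b"
converted = cesu8_bytes_to_utf8(sample)
```

Unlike the workaround, this keeps the characters outside the BMP instead of replacing them with U+FFFD.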