On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
> The way of solving this issue in Python is using an error handler. The
> "surrogateescape" error handler is specially designed for lossless
> reversible decoding. It maps every unassigned byte in the range
> 0x80-0xff to a single character in the range U+dc80-U+dcff. This allows
> you to distinguish correctly decoded characters from the escaped bytes,
> perform character by character processing of the decoded text, and
> encode the result back with the same encoding.
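
For illustration, here is a minimal sketch of the round trip described in the quoted text. The choice of cp1252 and the sample bytes are assumptions made for the example; byte 0x81 is unassigned in Python's cp1252 codec, so it is the one that gets escaped.

```python
# Sample input (an assumption for this sketch): 0xe9 is a valid cp1252
# byte, while 0x81 is unassigned in Python's cp1252 codec.
raw = b"caf\xe9 \x81"

# With errors="surrogateescape" the unassigned byte 0x81 is mapped to the
# lone surrogate U+DC81 instead of raising UnicodeDecodeError.
text = raw.decode("cp1252", errors="surrogateescape")

# Escaped bytes can be told apart from correctly decoded characters,
# because they always fall in the range U+DC80-U+DCFF.
escaped = [ch for ch in text if "\udc80" <= ch <= "\udcff"]

# Encoding with the same codec and handler restores the original bytes.
round_tripped = text.encode("cp1252", errors="surrogateescape")
```

The escaped characters are lone surrogates, so the decoded string cannot be encoded to UTF-8 without using the same error handler again.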
Maybe we need a new error handler that maps unassigned bytes in the range 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the encodings being discussed have behavior other than the "normal" version of the encoding plus what I just described?
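
A rough sketch of what such a handler could look like, built on the standard `codecs.register_error` hook. The handler name "c1passthrough" and the function `c1_passthrough` are hypothetical names invented for this example; nothing like this exists in the standard library.

```python
import codecs


def c1_passthrough(exc):
    """Hypothetical handler: pass unassigned bytes 0x80-0x9f through as
    the C1 control characters U+0080-U+009F, and back again on encoding."""
    bad = exc.object[exc.start:exc.end]
    if isinstance(exc, UnicodeDecodeError):
        # bad is a bytes slice; iterating over it yields ints.
        if all(0x80 <= b <= 0x9F for b in bad):
            return "".join(chr(b) for b in bad), exc.end
    elif isinstance(exc, UnicodeEncodeError):
        # bad is a str slice; map each C1 character back to one byte.
        if all(0x80 <= ord(ch) <= 0x9F for ch in bad):
            return bytes(ord(ch) for ch in bad), exc.end
    # Anything outside the C1 range is still reported as an error.
    raise exc


codecs.register_error("c1passthrough", c1_passthrough)

# 0x80 is assigned in cp1252 (the euro sign); 0x81 is not.
decoded = b"\x80 \x81".decode("cp1252", errors="c1passthrough")
encoded = decoded.encode("cp1252", errors="c1passthrough")
```

One trade-off of this approach compared with "surrogateescape": once decoded, an escaped byte is indistinguishable from a genuine C1 control character in the text.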