On Thu, Jan 11, 2018, at 04:55, Serhiy Storchaka wrote:
> The way of solving this issue in Python is using an error handler. The 
> "surrogateescape" error handler is specially designed for lossless 
> reversible decoding. It maps every unassigned byte in the range 
> 0x80-0xff to a single character in the range U+dc80-U+dcff. This allows 
> you to distinguish correctly decoded characters from the escaped bytes, 
> perform character by character processing of the decoded text, and 
> encode the result back with the same encoding.
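
For reference, here is a minimal sketch of the round trip described above.
It assumes cp1252 as the example encoding (byte 0x81 is unassigned in
Python's cp1252 codec); the sample bytes and variable names are
illustrative only.

```python
# Sample input: 0xe9 is a valid cp1252 byte, 0x81 is unassigned there.
raw = b"caf\xe9 \x81"

# Undecodable bytes become lone surrogates U+DC80-U+DCFF instead of raising.
text = raw.decode("cp1252", errors="surrogateescape")

# Escaped bytes can be told apart from correctly decoded characters.
escaped = [c for c in text if "\udc80" <= c <= "\udcff"]

# Encoding with the same handler restores the original bytes.
round_trip = text.encode("cp1252", errors="surrogateescape")
```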

Maybe we need a new error handler that maps each unassigned byte in the range 
0x80-0x9f to the character with the same value in the range U+0080-U+009F. 
Do any of the encodings being discussed behave differently from the "normal" 
version of the encoding plus the mapping I just described?
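
A rough sketch of what such a handler might look like, built on the existing
codecs.register_error() hook. The handler name "c1passthrough" and the
function c1_passthrough are hypothetical, not an existing or proposed API,
and cp1252 is only an assumed example encoding.

```python
import codecs


def c1_passthrough(exc):
    """Hypothetical handler: map undecodable bytes 0x80-0x9f to U+0080-U+009F."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if all(0x80 <= b <= 0x9F for b in bad):
            # Replace each byte with the code point of the same value.
            return "".join(chr(b) for b in bad), exc.end
    elif isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        if all(0x80 <= ord(c) <= 0x9F for c in bad):
            # Reverse direction: emit the byte with the same value.
            return bytes(ord(c) for c in bad), exc.end
    # Anything outside the C1 range is still an error.
    raise exc


codecs.register_error("c1passthrough", c1_passthrough)

# Assumed sample: curly quotes (0x93, 0x94) are assigned in cp1252, 0x81 is not.
sample = b"\x93hi\x94 \x81"
decoded = sample.decode("cp1252", errors="c1passthrough")
encoded = decoded.encode("cp1252", errors="c1passthrough")
```

Unlike surrogateescape, the result contains only ordinary code points, so
after decoding you can no longer tell an escaped byte from a genuine C1
control character.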
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
