Re: PEP 383: Non-decodable Bytes in System Character Interfaces

MRAB Wed, 22 Apr 2009 05:19:11 -0700

Martin v. Löwis wrote:
[snip]

To convert non-decodable bytes, a new error handler "python-escape" is
introduced, which decodes non-decodable bytes into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that currently exist in Python codecs.
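
As a rough illustration of the mapping described above, a decode error
handler along these lines can be registered with codecs.register_error.
This is only a sketch: the handler name "python-escape-sketch" and the
function name are made up for the example, and reading U+F01xx as
0xF0100 plus the byte value is an assumption about what the draft means.

```python
import codecs

# Assumption: "U+F01xx" means 0xF0100 + the value of the offending byte.
PUA_BASE = 0xF0100

def python_escape_sketch(exc):
    """Map each undecodable byte to a private-use character (hypothetical)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]   # the bytes the codec rejected
    replacement = "".join(chr(PUA_BASE + b) for b in bad)
    return replacement, exc.end           # resume decoding after the bad bytes

# Hypothetical handler name, chosen so it cannot clash with a real one.
codecs.register_error("python-escape-sketch", python_escape_sketch)

decoded = b"abc\xff".decode("ascii", "python-escape-sketch")
```

Here the byte 0xFF, which ASCII cannot decode, ends up as the single
character U+F01FF after the three ordinary letters.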


The error handler interface is extended to allow the encode error
handler to return byte strings immediately, in addition to returning
Unicode strings which then get encoded again.
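
To make the extension concrete, here is a sketch of an encode error
handler that hands back bytes directly instead of a string to be
re-encoded. It assumes a Python 3 codec machinery that accepts bytes
from encode handlers, and it reuses the assumed 0xF0100 + byte mapping;
the handler name "python-unescape-sketch" is hypothetical.

```python
import codecs

# Assumption: escaped bytes live at 0xF0100 + byte value.
PUA_BASE = 0xF0100

def python_unescape_sketch(exc):
    """Turn escaped private-use characters back into raw bytes (hypothetical)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    out = bytearray()
    for ch in exc.object[exc.start:exc.end]:
        value = ord(ch) - PUA_BASE
        if not 0x80 <= value <= 0xFF:
            raise exc                     # a genuinely unencodable character
        out.append(value)
    # Returning bytes: the encoder copies them to the output unchanged.
    return bytes(out), exc.end

codecs.register_error("python-unescape-sketch", python_unescape_sketch)

encoded = "abc\U000F01FF".encode("ascii", "python-unescape-sketch")
```

Without the ability to return bytes, the handler would have to return a
string, and the codec would then try to encode that string again, which
cannot produce a raw 0xFF byte in ASCII.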

If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
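
The U+DC80..U+DCFF mapping can be tried out as follows. The "utf-8b"
codec name comes from the draft; this sketch instead uses the
"surrogateescape" error handler that Python 3.1 and later provide, which
gives the same byte-to-surrogate mapping. The file name is invented for
the example.

```python
# A file name containing a lone Latin-1 byte, which is not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
round_tripped = name.encode("utf-8", "surrogateescape")
```

Because well-formed UTF-8 never decodes to a lone surrogate, the escaped
characters cannot be confused with anything that was really in the input.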

If the byte stream happens to include a sequence which decodes to
U+F01xx, shouldn't that raise an exception?
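
To show the collision I have in mind, here is a small sketch. It
assumes the draft's handler maps a bad byte to 0xF0100 + byte value; the
handler name "pua-escape-demo" is made up, and this is not the PEP's
actual implementation.

```python
import codecs

# Assumption: an undecodable byte b is escaped as chr(0xF0100 + b).
PUA_BASE = 0xF0100

def pua_escape_demo(exc):
    """Hypothetical stand-in for the proposed "python-escape" handler."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end

codecs.register_error("pua-escape-demo", pua_escape_demo)

genuine = "\U000F01FF".encode("utf-8")   # well-formed UTF-8 for U+F01FF
stray = b"\xff"                          # a single undecodable byte

from_genuine = genuine.decode("utf-8", "pua-escape-demo")
from_stray = stray.decode("utf-8", "pua-escape-demo")

# Two different byte strings decode to the same text, so the reverse
# step cannot know which bytes to write back.
collision = from_genuine == from_stray
```

Raising an exception when the input really contains U+F01xx would be one
way to keep the round trip unambiguous.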
--
http://mail.python.org/mailman/listinfo/python-list
