Martin v. Löwis wrote:
MRAB wrote:
Martin v. Löwis wrote:
[snip]
To convert non-decodable bytes, a new error handler "python-escape" is
introduced, which decodes non-decodable bytes using into a private-use
character U+F01xx, which is believed to not conflict with private-use
characters that currently exist in Python codecs.

The error handler interface is extended to allow the encode error
handler to return byte strings immediately, in addition to returning
Unicode strings which then get encoded again.

If the locale's encoding is UTF-8, the file system encoding is set to
a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
(which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.

If the byte stream happens to include a sequence which decodes to
U+F01xx, shouldn't that raise an exception?

I apparently have not expressed it clearly, so please help me improve
the text. What I mean is this:

- if the environment encoding (for lack of better name) is UTF-8,
  Python stops using the utf-8 codec under this PEP, and switches
  to the utf-8b codec.
- otherwise (env encoding is not utf-8), undecodable bytes get decoded
  with the error handler. In this case, U+F01xx will not occur
  in the byte stream, since no other codec ever produces this PUA
  character (this is not fully true - UTF-16 may also produce PUA
  characters, but they can't appear as env encodings).
So the case you are referring to should not happen.

I think what's confusing me is that you talk about mapping non-decodable
bytes to U+F01xx, but you also talk about decoding to half surrogate
codes U+DC80..U+DCFF.

If the bytes are mapped to single half surrogate codes instead of the
normal pairs (low+high), then I can see that decoding could never be
ambiguous and encoding could produce the original bytes.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to