> IMO at the C level all conversions between bytes and Unicode that
> don't specify a conversion should use UTF-8. That's what most of the
> changes made so far do.

I agree. We should specify that somewhere, so we have a recorded
guideline to use in case of doubt.

One function that misbehaves under this spec is
PyUnicode_FromString[AndSize], which assumes the input is Latin-1
(i.e. it maps each input byte to the code point with the same value).

Once it decodes UTF-8 instead, it can fail because of encoding errors
(which it previously couldn't).
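To illustrate the difference, here is a small Python-level sketch (not
the C implementation; the sample bytes and variable names are made up
for the example). It decodes the same bytes under both conventions:
Latin-1 accepts any byte string, while UTF-8 rejects malformed input.

```python
# Illustration only: the same bytes decoded under the two conventions.
# 0xE9 is 'é' in Latin-1, but a truncated multi-byte sequence in UTF-8.
data = b"caf\xe9"

# Latin-1 maps every byte 0x00-0xFF to the code point with the same
# value, so this decode cannot fail for any input.
as_latin1 = data.decode("latin-1")

# UTF-8 only accepts well-formed sequences, so the same input is rejected.
try:
    data.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Well-formed UTF-8 input decodes without error.
as_utf8 = "caf\u00e9".encode("utf-8").decode("utf-8")
```

Callers that used to rely on the conversion never failing would
therefore need to handle an error return.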

> An exception should be made for stuff that explicitly handles
> filenames; there the filesystem encoding should obviously be used.

In most cases, this still follows the rule, as the filename encoding
is specified explicitly. I agree this should also be specified, in
particular when the import code gets fixed (where strings typically
denote file names).
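As a rough Python-level sketch of the filename case (this uses the
os.fsencode/os.fsdecode helpers from current Python 3, which are not
the C code under discussion; the wrapper names are hypothetical):

```python
import os
import sys

def filename_to_bytes(name: str) -> bytes:
    # Encode with the filesystem encoding, not an implicit default.
    return os.fsencode(name)

def filename_from_bytes(raw: bytes) -> str:
    # Decode with the same filesystem encoding.
    return os.fsdecode(raw)

# The encoding in use depends on the platform and locale.
fs_encoding = sys.getfilesystemencoding()

roundtrip = filename_from_bytes(filename_to_bytes("module.py"))
```

The point is only that the encoding is chosen explicitly for file
names, so the general UTF-8 default never comes into play there.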

Regards,
Martin