> IMO at the C level all conversions between bytes and Unicode that
> don't specify a conversion should use UTF-8. That's what most of the
> changes made so far do.
I agree. We should write that down somewhere, so we have a recorded guideline to consult in case of doubt.

One function that violates this rule is PyUnicode_FromString[AndSize], which assumes the input is Latin-1 (i.e. each byte becomes the code point with the same value). Bringing it in line with the rule means it can now fail because of encoding errors, which it previously couldn't.

> An exception should be made for stuff that explicitly handles
> filenames; there the filesystem encoding should obviously be used.

In most cases, this still follows the rule, since the filename encoding is specified explicitly. I agree this should also be written down, in particular when the import code gets fixed (where strings typically denote file names).

Regards,
Martin

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
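To make the point about PyUnicode_FromString[AndSize] concrete, here is a minimal standalone C sketch of why switching from Latin-1 to UTF-8 introduces a failure mode. The functions `latin1_decode` and `utf8_decode` are hypothetical helpers written for illustration only; they are not CPython API and do not reflect how PyUnicode_FromString is actually implemented. Latin-1 decoding maps every byte to a code point, so it can never fail, while a strict UTF-8 decoder has to reject inputs such as a lone `0xFF` byte or a truncated two-byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not CPython API): Latin-1 decoding.
 * Each byte is its own code point, so there is no error path.
 * Returns the number of code points written to out. */
static size_t latin1_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    assert(out != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];
    return len;
}

/* Hypothetical helper (not CPython API): strict UTF-8 decoding.
 * Returns the number of code points written to out, or -1 if the
 * input is not valid UTF-8; Latin-1 decoding never has this case. */
static long utf8_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0;
    long n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t extra; /* number of continuation bytes expected */
        if (b < 0x80)                  { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else
            return -1; /* 0x80-0xC1 and 0xF5-0xFF cannot start a sequence */
        if (extra > len - i - 1)
            return -1; /* sequence truncated by end of input */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return -1; /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong 3/4-byte forms, surrogates, and values past U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        out[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

For example, the bytes `c a f 0xC3 0xA9` decode to five code points under Latin-1 (0xC3 and 0xA9 stay separate characters) but to four under UTF-8 ("café"), while the single byte `0xFF` is valid Latin-1 and is rejected by the UTF-8 decoder.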
