On May 6, 2009, at 5:39 AM, Stephen J. Turnbull wrote:
> Now, with Python's file system encoding == UTF-8 or any packed EUC, and more than a handful of Shift JIS or Big5 characters in file names, one is *almost certain* to encounter ASCII as the second byte of a multibyte sequence. PEP 383 can't handle this
Hm, I haven't tried the implementation, but I thought that what would happen is:

    '\x85a'.decode('utf-8', 'utf8b/surrogate-replace/whateveritscalled') -> u'\uDC85a'
If that indeed doesn't happen, that's certainly a defect and should be remedied.
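For anyone who wants to check the expected behaviour, here is a minimal sketch. It assumes Python 3.1 or later, where the error handler was released under the name "surrogateescape" rather than the placeholder name used above. The helpers decode_name and encode_name are hypothetical names for illustration only, and the Shift JIS sample character is my own choice, not one taken from this thread.

```python
# Sketch only: "surrogateescape" is the handler name in released Python 3,
# not the placeholder name used in the message above.

def decode_name(raw: bytes) -> str:
    """Decode file-name bytes as UTF-8, escaping undecodable bytes."""
    return raw.decode("utf-8", "surrogateescape")


def encode_name(name: str) -> bytes:
    """Reverse decode_name, turning escaped surrogates back into bytes."""
    return name.encode("utf-8", "surrogateescape")


# The case from the message: an invalid byte followed by ASCII 'a'.
raw = b"\x85a"
name = decode_name(raw)
print(ascii(name))

# A Shift JIS name whose second byte falls in the ASCII range
# (assumed sample character, chosen for illustration).
sjis = "表".encode("shift_jis")
print(sjis, ascii(decode_name(sjis)))
```

In both cases the undecodable lead byte becomes a lone surrogate, the ASCII byte after it decodes as itself, and encoding with the same handler gives back the original bytes.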
> , but it is sure to be the most common use case for PEP 383 in East Asia.
Yes.

James