STINNER Victor <victor.stin...@haypocalc.com> added the comment:

> Maybe os.path.supports_unicode_filenames should be deprecated.
> The doc currently says:
> "True if arbitrary Unicode strings can be used as file names
> (within limitations imposed by the file system), and if os.listdir()
> returns Unicode strings for a Unicode argument."
>
> On Linux both the things work, even if the value of
> os.path.supports_unicode_filenames is still False:
> (...)

It depends on the locale encoding:

$ LC_CTYPE=C ./python
Python 3.2a2+ (py3k, Sep 11 2010, 01:48:43)
>>> import sys; sys.getfilesystemencoding()
'ascii'
>>> open('\xe9', 'w').close()
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

With utf-8, surrogates are forbidden. E.g.:

$ ./python
Python 3.2a2+ (py3k, Sep 11 2010, 01:48:43)
>>> import sys; sys.getfilesystemencoding()
'utf-8'
>>> open('\uDC00', 'w').close()
...
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed

On Windows, Python uses the Unicode API, so Unicode support doesn't
depend on the locale encoding (the ANSI code page). Surrogates are
accepted on Windows: '\uDC00' is a valid filename.

I think that supports_unicode_filenames is still useful to check
whether the filesystem API uses bytes (Linux, FreeBSD, Solaris, ...)
or characters (Mac OS X, Windows).

Mac OS X is a special case because the C API uses char* (byte
strings), but the filesystem encoding is fixed to utf-8 and invalid
utf-8 filenames are not accepted. So I would say that
supports_unicode_filenames should be True on Mac OS X (which was the
initial request).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue767645>
_______________________________________
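
The two failures shown in the sessions above can be reproduced without touching the filesystem. What follows is only a rough sketch, not how open() is implemented: the helper name `encodable` is made up for this example, and it assumes that on POSIX systems a filename is encoded with the filesystem encoding and the surrogateescape error handler (the handler os.fsencode() uses there).

```python
import os
import sys


def encodable(name, encoding, errors="surrogateescape"):
    """Return True if `name` can be encoded to bytes with `encoding`.

    Hypothetical helper for illustration; it only encodes the string
    and never creates a file.
    """
    try:
        name.encode(encoding, errors)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Values for the running interpreter; they vary by platform and locale.
    print("supports_unicode_filenames:", os.path.supports_unicode_filenames)
    print("filesystem encoding:", sys.getfilesystemencoding())

    # The cases from the sessions above, with the encoding given explicitly
    # instead of being taken from the locale.
    print(encodable("\xe9", "ascii"))    # non-ASCII character, ASCII locale
    print(encodable("\xe9", "utf-8"))    # same character, UTF-8 locale
    print(encodable("\udc00", "utf-8"))  # lone surrogate, UTF-8 locale
```

Passing the encoding explicitly keeps the result independent of the locale of the machine running the sketch: the first and third calls report False and the second reports True. This models only the bytes-based case (Linux, FreeBSD, Solaris, ...); it says nothing about Windows, where the Unicode API is used and '\uDC00' is accepted.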