Hi all, is there any way to determine the charset of the filenames returned by os.walk()?
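A minimal sketch, written in Python 3 syntax (the post itself concerns Python 2.5): the standard library can at least report which encodings Python assumes. sys.getfilesystemencoding() gives the encoding used for filenames, and locale.getpreferredencoding() gives the one implied by the current locale; both functions also exist in Python 2. The helper decode_arg() is hypothetical, and decoding a byte-string argument with the locale encoding and errors="replace" is an assumed policy, not the only correct one.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python assumes for filenames and for locale text."""
    return {
        # Encoding Python uses to convert str filenames to/from OS bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding implied by the current locale (LANG / LC_ALL / LC_CTYPE);
        # passing False means "don't call setlocale() first".
        "locale": locale.getpreferredencoding(False),
    }


def decode_arg(raw):
    """Hypothetical helper: decode a bytes argument using the locale encoding.

    Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    """
    return raw.decode(locale.getpreferredencoding(False), "replace")


if __name__ == "__main__":
    for kind, enc in report_encodings().items():
        print(kind + ": " + enc)
    # ascii() keeps the output printable even on a non-UTF-8 terminal.
    print(ascii(decode_arg(b"smile\xe2\x98\xba")))
```

With a UTF-8 locale the last line shows 'smile\u263a'; if the preferred encoding is plain ASCII, the non-ASCII bytes are replaced with U+FFFD rather than raising.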
The trouble is: if I pass a <type 'str'> argument to os.walk(), I get the filenames back as byte strings. Possibly UTF-8 encoded Unicode, who knows. OTOH, if I pass a <type 'unicode'> argument to os.walk(), all the filenames I get in the loop are already unicode()d. However, with some locale settings os.walk() dies with, for example:

    Traceback (most recent call last):
      File "tst.py", line 10, in <module>
        for root, dirs, files in filelist:
      File "/usr/lib/python2.5/os.py", line 303, in walk
        for x in walk(path, topdown, onerror):
      File "/usr/lib/python2.5/os.py", line 293, in walk
        if isdir(join(top, name)):
      File "/usr/lib/python2.5/posixpath.py", line 65, in join
        path += '/' + b
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

I can't even skip over these files with os.walk(..., onerror=handler); the handler() is never called. That happens, for instance, when the file names contain non-ASCII characters and the locale is set to ASCII, but reportedly in some other cases as well.

What's the right and safe way to walk the filesystem and get meaningful filenames?

Related question: if the directory name is given on the command line, what's the right way to preprocess the argument before passing it down to os.walk()? For instance, with LANG=en_NZ.UTF-8 (i.e. a UTF-8 system):

 * the directory is called 'smile☺'
 * sys.argv[1] will be 'smile\xe2\x98\xba' (type str)
 * after .decode("utf-8") I get u'smile\u263a' (type unicode)

But how should I decode() it when running on a system where $LANG doesn't end with "UTF-8"? Apparently some locales have non-ASCII default charsets. For instance, zh_TW uses BIG5 by default, ru_RU uses ISO-8859-5, etc. How do I detect that to get the right charset for decode()?

I tend to keep everything internally in Unicode, but it's often unclear how to convert some inputs to Unicode in the first place. What are the best practices for dealing with these charset issues in Python?

Thanks!
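A minimal sketch of one common approach, again in Python 3 syntax and with a hypothetical function name: walk with a bytes path so os.walk() never decodes anything itself, keep the raw bytes for filesystem calls, and decode only a copy for display, with an explicit error policy. Defaulting to sys.getfilesystemencoding() and using errors="replace" are assumptions. Relatedly, Python 3 on POSIX decodes str filenames and sys.argv items with the "surrogateescape" error handler (PEP 383), and os.fsencode() turns such a string back into the original bytes.

```python
import os
import sys


def walk_files(top, encoding=None):
    """Hypothetical helper: yield (raw_path, display_name) for every file under top.

    top should be bytes, so os.walk() hands back undecoded names. raw_path
    stays bytes and can be passed to open() or os.stat() unchanged, while
    display_name is decoded with errors="replace" and is meant only for
    showing to a user (it may contain U+FFFD for undecodable bytes).
    """
    if encoding is None:
        # Assumption: filenames use the encoding Python reports for the filesystem.
        encoding = sys.getfilesystemencoding()
    for root, _dirs, files in os.walk(top):
        for name in files:
            raw_path = os.path.join(root, name)  # bytes + bytes -> bytes
            yield raw_path, raw_path.decode(encoding, "replace")
```

For a directory named on the command line, something like walk_files(os.fsencode(sys.argv[1])) fits this scheme. Note that the display string cannot always be turned back into the real filename, which is why the raw bytes are kept alongside it.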
Michal

--
 * Amazon S3 backup tool -- http://s3tools.logix.cz/s3cmd
--
http://mail.python.org/mailman/listinfo/python-list