hi,

from the documentation (http://docs.python.org/lib/os-file-dir.html) for os.listdir:

"On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects."

i'm on Unix (linux, ubuntu edgy), and it seems that it does not always return unicode filenames. it seems that it tries to decode the filenames using the filesystem's encoding, and if that fails, it simply returns the filename as a byte-string.

so you get back, let's say, a list of 21 filenames, of which 3 are byte-strings and the rest are unicode strings.

after digging around, i found this in the source code:

> #ifdef Py_USING_UNICODE
>     if (arg_is_unicode) {
>         PyObject *w;
>
>         w = PyUnicode_FromEncodedObject(v,
>                 Py_FileSystemDefaultEncoding,
>                 "strict");
>         if (w != NULL) {
>             Py_DECREF(v);
>             v = w;
>         }
>         else {
>             /* fall back to the original byte string, as
>                discussed in patch #683592 */
>             PyErr_Clear();
>         }
>     }
> #endif

so if the conversion to unicode fails, it falls back to the original byte-string. i went and read the patch discussion, and now i'm not sure what to do.

i know that:

1. the documentation is completely wrong. it does not always return unicode filenames.

2. it's true that the documentation does not specify what happens if the filename is not in the filesystem encoding, but i simply expected to get a unicode exception, as everywhere else. you see, exceptions are ok, i can deal with them. but this is just plain wrong. from now on, EVERYWHERE i use os.listdir, i will have to go through all the filenames it returns and check whether they are unicode strings or not.

so basically i'd like to ask here: am i reading something incorrectly? or am i using os.listdir the "wrong way"? how do other people deal with this?

p.s: one additional note. if your code expects os.listdir to return unicode, that usually means that all your code uses unicode strings. which in turn means that those filenames will somehow later interact with unicode strings.
that means the byte-string filename will probably get auto-converted to unicode at some later point, and that auto-conversion will VERY probably fail: the implicit conversion only uses 'ascii' as the encoding, and if the filename could not be decoded inside listdir, it's quite probable that it cannot be decoded as 'ascii' either.

gabor
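
for illustration, here is a minimal sketch of the fallback described in the post and of the per-name check it forces on callers. it is not the actual listdir implementation: it assumes a modern python 3 interpreter (where byte strings are `bytes`), it simulates the decoding step on a hand-written list instead of calling os.listdir, and the helper names decode_or_keep and undecodable are made up for this example.

```python
def decode_or_keep(raw, encoding):
    """Mimic the quoted C fallback: decode strictly, keep the bytes on failure.

    Hypothetical helper for illustration only, not part of the os module.
    """
    try:
        return raw.decode(encoding, "strict")
    except UnicodeDecodeError:
        # Same idea as PyErr_Clear() in the C code: swallow the error
        # and hand back the original byte string.
        return raw


def undecodable(names):
    """Return the entries that are still byte strings (hypothetical helper)."""
    return [name for name in names if isinstance(name, bytes)]


# Simulated raw directory entries. The last one is latin-1 encoded,
# so it is not valid utf-8 and cannot be decoded strictly.
raw_names = [
    b"plain.txt",
    u"caf\u00e9.txt".encode("utf-8"),
    b"caf\xe9.txt",
]

# Assumed filesystem encoding for this sketch: utf-8.
names = [decode_or_keep(raw, "utf-8") for raw in raw_names]

# The result is a mixed list: two text strings and one byte string.
leftovers = undecodable(names)
```

with this shape of result, every caller has to run something like undecodable() over the list before it can safely treat the names as text, which is exactly the extra work the post complains about.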