On Jul 13, 2005, at 6:05 PM, Nick Matsakis wrote:

> What is the best way to deal with non-ASCII paths when working with the
> python standard library? Specifically, when using functions like open()
> and the os and glob modules, what should be passed in? What should I
> expect out?

If you pass unicode in, you get unicode out:

>>> import os
>>> set(map(type, os.listdir('.')))
set([<type 'str'>])
>>> set(map(type, os.listdir(u'.')))
set([<type 'unicode'>])

Otherwise you pass and receive byte strings. The encoding of those
byte strings is fixed:

>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'

> In experimenting with it, it appears that these libraries accept str
> objects containing UTF-8 encoded bytes and similarly that is what they
> return. It would seem better to me if they could be made to accept and
> return unicode objects, but I could see that that might cause backwards
> compatibility problems. Still, is UTF-8 encoded strs really a safe bet?
> Are there circumstances, including non HFS filesystems, where it will
> bite me if I make this assumption?

HFS actually uses UTF-16 internally, but the POSIX layer is UTF-8.

It will bite you if you expect the code to work on other platforms. Not
all platforms use UTF-8 for their filesystem encoding.

-bob
_______________________________________________
Pythonmac-SIG maillist - Pythonmac-SIG@python.org
http://mail.python.org/mailman/listinfo/pythonmac-sig
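
For anyone who wants the "pass unicode in, get unicode out" advice above in reusable form, here is a minimal sketch. The names TEXT_TYPE, to_text_path and list_text_names are hypothetical helpers made up for illustration, not part of the standard library. It assumes that a byte-string path is encoded in whatever sys.getfilesystemencoding() reports, which is exactly the assumption that differs between platforms.

```python
import os
import sys

# Hypothetical helper name: the text type is unicode on Python 2, str on Python 3.
TEXT_TYPE = type(u"")


def to_text_path(path):
    """Return path as text, decoding byte strings with the filesystem encoding.

    Assumption: byte-string paths use sys.getfilesystemencoding(), which is
    'utf-8' on Mac OS X but may be something else on other platforms.
    """
    if isinstance(path, TEXT_TYPE):
        return path
    return path.decode(sys.getfilesystemencoding())


def list_text_names(directory):
    """List a directory by passing a text path, so names come back as text."""
    # os.listdir returns text names when it is given a text path.
    return os.listdir(to_text_path(directory))
```

Calling list_text_names('.') or list_text_names(u'.') should then give text names either way, without hard-coding UTF-8 in the calling code.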