STINNER Victor <victor.stin...@haypocalc.com> added the comment:

On Friday 30 April 2010 15:58:28, you wrote:
> It's better to let the application decide how to solve this problem
> and in order to allow for this, the encodings must be adjustable.

On POSIX, use byte strings to avoid encoding issues. Examples:

    subprocess.call(['env'], env={b'TEST': b'a\xff-'})  # env
    subprocess.call(['echo', b'a\xff-'])                # command line
    open(b'a\xff-')                                     # filename
    os.getenv(b'a\xff-')                                # get env (result as unicode)

Are you talking about issues on Windows?

> By using fsencode() and fsdecode() in stdlib functions, you basically
> prevent this kind of adjustment, ...

Not if you use byte strings. On POSIX, a unicode string is always converted at the end for the system call (using sys.getfilesystemencoding()).

> If you know that e.g. your environment variables are going to have
> Latin-1 data (say some content-type variable has this information),
> but the user's default LANG setting is UTF-8, Python will fetch the
> data as broken Unicode data, you then have to convert it back to bytes
> and then back to Unicode using the correct Latin-1 encoding.
>
> It would be a lot better to have the application provide the
> encoding to the os.getenv() function and have Python do the
> correct decoding right from the start.

You mean that os.getenv() should have an optional argument? Something like:

    def getenv(key, default=None, encoding=None):
        value = environ.get(key, default)
        if encoding:
            value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
            value = value.decode(encoding, 'surrogateescape')
        return value

There are many indirect calls to os.getenv() (e.g. by using os.environ.get()):

 - curses uses TERM
 - webbrowser uses PROGRAMFILES (path)
 - distutils.msvc9compiler uses "VS%0.f0COMNTOOLS" % version (path)
 - wsgiref.util uses HTTP_HOST, SERVER_NAME, SCRIPT_NAME, ... (url)
 - platform uses PROCESSOR_ARCHITEW6432
 - sysconfig uses PYTHONUSERBASE, APPDATA, ... (path)
 - idlelib.PyShell uses IDLESTARTUP and PYTHONSTARTUP (path)
 - ...

How would you specify the correct encoding in indirect calls?

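The encode/decode pair in the getenv() sketch above can be checked in isolation. The following is only a minimal sketch: redecode() is a hypothetical helper name, and both encodings are passed explicitly (instead of calling sys.getfilesystemencoding()) so that the result does not depend on the locale of the machine running it.

```python
def redecode(value, wrong_encoding, right_encoding):
    """Undo a decode done with wrong_encoding, then decode with right_encoding.

    Hypothetical helper: it relies on 'surrogateescape' having preserved
    the undecodable bytes as lone surrogates in `value`.
    """
    raw = value.encode(wrong_encoding, 'surrogateescape')
    return raw.decode(right_encoding, 'surrogateescape')


# Latin-1 bytes for "caf" followed by e-acute.
raw = 'caf\xe9'.encode('latin-1')

# Decoding them as UTF-8 (the assumed "wrong" locale encoding) cannot
# interpret byte 0xE9, so surrogateescape stores it as U+DCE9.
broken = raw.decode('utf-8', 'surrogateescape')

# Round-tripping through bytes recovers the intended text.
fixed = redecode(broken, 'utf-8', 'latin-1')
```

Because surrogateescape keeps the original bytes recoverable, the second decode sees exactly the bytes the environment contained, which is the property the proposed `encoding` argument depends on.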
If your application gets variables in *mixed* encodings, I think that your program should start by re-encoding the variables:

    for name, encoding in (('PATH', 'latin1'), ...):
        value = os.getenv(name)
        value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
        value = value.decode(encoding, 'surrogateescape')
        os.environ[name] = value

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8514>
_______________________________________