Martin v. Löwis <mar...@v.loewis.de> added the comment:

On 10.10.2010 17:51, STINNER Victor wrote:
>
> STINNER Victor <victor.stin...@haypocalc.com> added the comment:
>
>> We run into problems because we have two inconsistent encodings,
>> ...
>
> What? No. We have problems because we don't use the same encoding to
> decode and to encode the same data type. It's not a problem to use a
> different encoding for each data type (stdout, filenames, environment
> variables, ...).

This is exactly the very problem that we face. In particular, the
question is what encoding to use if something is *both* a filename and
an environment variable value, or both a filename and a command line
argument.

> Mac OS X is a special case. Filesystem encoding is utf-8 on this OS,
> whereas the locale encoding depends on LANG variable. If I understood
> MvL proposition correctly, we should not rely on the locale on Mac OS
> X.

"Not rely on" is perhaps a bit harsh. It's not clear (to me) under what
conditions the locale's encoding will be more correct than just assuming
UTF-8 - there may actually be use cases for it. However, with the
surrogate escapes, we could just always decode using UTF-8, and leave
any mojibake problems that may arise from this to the application. I do
think that these problems will be rare, since a) many OSX installations
use UTF-8 anyway, and b) those that don't will likely still get proper
round-tripping from the escape mechanism.

> So the "3rd encoding" and the filesystem encodings should be
> hardcoded to utf-8?

That's an option to consider, yes - I'd like an OSX expert to comment.

> The "third encoding" is no more controllable by a special environment
> variable, only by classic locale environment variables (LC_ALL,
> LC_CTYPE, LANG). Is it a problem? I remember a comment from MAL
> saying that it may be a problem for CGI for the environment variables
> because some (all?) variables are not encoded with the locale
> encoding (but the HTML encoding?). I don't know if Python should
> workaround CGI specific issues. In Python 3.2, we have now
> os.environb: it's now possible to use a different encoding for each
> variable.

I think these problems are sufficiently resolved now: either by PEP
3333, PEP 444, PEP 383, or os.environb. I think you misunderstood MAL's
comment, though: the environment variables are not encoded in *any*
specific encoding.
Instead, they are copied literally from the HTTP request, using
whatever bytes the browser originally put in there - which may or may
not have followed a particular encoding. HTTP is silent on this most of
the time, and HTML is out of scope.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9992>
_______________________________________
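
For illustration, here is a minimal sketch of the round-tripping that the
surrogate escape mechanism (PEP 383) provides, and of the failure Victor
describes when different encodings are used to decode and to encode the
same data. The byte string is a made-up filename, not taken from the issue;
only the standard `bytes.decode` / `str.encode` calls with the
`surrogateescape` error handler are used.

```python
# Hypothetical filename: valid Latin-1, but not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding as UTF-8 with surrogateescape does not fail: the undecodable
# byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and the same handler restores the bytes.
roundtrip = text.encode("utf-8", "surrogateescape")

# Encoding with a different codec (and the default strict handler)
# cannot represent the surrogate, so the data no longer round-trips.
try:
    text.encode("latin-1")
    mismatch_ok = True
except UnicodeEncodeError:
    mismatch_ok = False

print(repr(text), roundtrip == raw, mismatch_ok)
```

The point of the sketch is only that the round trip holds as long as the
same encoding and handler are used in both directions; which encoding
Python should pick on OSX is the open question above.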