Hi,

On Thursday 04 December 2008 21:02:19, Toshio Kuratomi wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)

Yeah, I prefer to discuss such changes on the mailing list.

> These mixed encodings can occur for a variety of reasons. Here's an
> example that isn't too contrived :-)
> (...)
> Furthermore, they don't want to suffer from the space loss of using
> utf-8 to encode Japanese so they use shift-jis everywhere.

"Space loss"? Really? If you configure your server correctly, you should
get UTF-8 even if the file system is Shift-JIS. But it would be much
easier to use UTF-8 everywhere.

Hum... I don't think that the discussion is about one specific server, but
about the lack of bytes environment variables in Python3 :-)

> 1) return mixed unicode and byte types in ...

NO!

> 2) return only byte types in os.environ

Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and
Python3 already uses Unicode everywhere (input(), open(), filenames, ...).

> 3) silently ignore non-decodable value when accessing os.environ['PATH']
> as we do now but allow access to the full information via
> os.environ[b'PATH'] and os.getenvb()

I don't like os.environ[b'PATH']. I prefer to always get the same result
type... But os.listdir() doesn't respect that :-(

   os.listdir(str) -> list of str
   os.listdir(bytes) -> list of bytes

I would prefer a similar API for easier migration from Python2 to Python3
(unicode). os.environb sounds like the best choice to me.

But there are open questions (already asked in the bug tracker):

(a) Should os.environ be updated if os.environb is changed? If yes, how?

   os.environb['PATH'] = b'\xff' (or any byte string that is invalid in
   the system default encoding)
   => os.environ['PATH'] = ???

(b) Should os.environb be updated if os.environ is changed? If yes, how?
The problem comes with non-Unicode locales (eg. latin-1 or ASCII): most
charsets are unable to encode the whole Unicode range (latin-1 stops at
code point 255, ASCII at 127).

   os.environ['PATH'] = chr(0x10000)
   => os.environb['PATH'] = ???

(c) Same question when a key is deleted (del os.environ['PATH']).
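
Questions (a) and (b) both come down to values that have no counterpart in the other type. Here is a minimal sketch of the two failure cases, assuming a UTF-8 locale for (a) and a latin-1 or ASCII locale for (b). The helpers try_decode() and try_encode() are hypothetical names used only for illustration; they are not part of the os module, and this is not a proposal for how os.environ should behave.

```python
def try_decode(raw, encoding):
    """Return raw decoded to str, or None if the bytes are not valid."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def try_encode(text, encoding):
    """Return text encoded to bytes, or None if the charset cannot hold it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Question (a): a bytes value with no str equivalent in a UTF-8 locale.
print(try_decode(b'\xff', 'utf-8'))          # None: what goes in os.environ?

# Question (b): a str value with no bytes equivalent in a narrow locale.
print(try_encode(chr(0x10000), 'latin-1'))   # None: what goes in os.environb?
print(try_encode(chr(0x10000), 'ascii'))     # None

# The easy case: plain ASCII round-trips in every encoding used here.
print(try_decode(b'/usr/bin', 'utf-8'))      # /usr/bin
```

Whatever answer is chosen for (a) and (b) has to say what happens in the None cases: drop the key from the other mapping, raise an exception, or store some replacement value.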

If Python 3.1 gets both os.environ and os.environb, I'm quite sure that
some modules will use os.environ and others will prefer os.environb. If
the two environments differ, the two sets of modules will behave
differently :-/

It would maybe be easier if os.environ supported both bytes and unicode
keys. But we have to keep these assertions:

   os.environ[bytes] -> bytes
   os.environ[str] -> str

> 4) raise an exception when non-decodable values are *accessed* and
> continue as in #3.

I like os.listdir() behaviour: just *ignore* non-decodable files. If you
really want to access these files, use a bytes directory name ;-)

> I think that the ease of debugging is lost when we silently ignore an error.

Guido gave a good example. Suppose your directory contains a non-decodable
filename (eg. "???.txt"). If listing raised an exception, glob('*.py')
would fail because of the evil filename. With the current behaviour,
you're unable to list all files, but glob('*.py') will list all Python
scripts!

And Python3 is released, so it's maybe a bad idea to change the behaviour
(of os.environ) in Python 3.1 :-/

> The bug report I opened suggests creating a PEP to address this issue.

Please try to answer my questions about os.environ and os.environb
consistency.

I also like bytes environment variables. I need them for my fuzzing
program. The lack of bytes variables is a regression from Python2 (for my
program). On UNIX, filenames are bytes and the environment variables are
bytes. For the best interoperability, Python3 should support bytes. But
the default choice should always be characters (unicode), and the bytes
and str types should never be mixed ;-)

---

As usual, it goes faster if someone writes a patch :-) I could try to work
on it.

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev