Hi,

> > But they are open questions (already asked in the bug tracker):
>
> I answered these in the bug tracker. Here are the answers for the
> mailing list:

Oh, sorry. I didn't follow the end of the discussion on the bug tracker.

> > os.environb['PATH'] = '\xff'
> > => os.environ['PATH'] = ???
>
> os.environ['PATH'] => raises KeyError because PATH is not a key in
> the unicode decoded environment.

Ok, good answer :-)

> > os.environ['PATH'] = chr(0x10000)
> > => os.environb['PATH'] = ???
>
> raise UnicodeEncodeError when setting the value.

Ok, it's consistent with the current behaviour.

$ LANG=C ./python
Python 3.0rc3+ (py3k:67498M, Dec 4 2008, 17:45:54)
>>> import os
>>> os.environ['x'] = '\xff'
>>> os.environ['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/py3k/Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/home/haypo/prog/py3k/Lib/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)

Oh, that's strange :-p The error is delayed until we read the value.

> > It would be maybe easier if os.environ supports bytes and unicode keys.
> > But we have to keep these assertions:
> > os.environ[bytes] -> bytes
> > os.environ[str] -> str
>
> I think the same choices have to be made here. If LANG=C, we still have
> to decide what to do when os.environ[str] is set to a non-ASCII string.

If the charset is US-ASCII, os.environ will drop non-ASCII values. But most
variables are ASCII only. Examples with my shell:

$ env
XCURSOR_THEME=kubuntu
LANG=fr_FR.UTF-8
EDITOR=vim
HOME=/home/haypo
...

> Additionally, the subprocess question makes using the key value
> undesirable compared with having a separate os.environb that accesses
> the same underlying data.

The user should be able to choose bytes or unicode. Examples:
 - subprocess.Popen('ls') => use unicode environment (os.environ)
 - subprocess.Popen(b'ls') => use bytes environment (os.environb)

> Here's my problem with it, though.
> With these semantics any program
> that works on arbitrary files and runs on *NIX has to check
> os.listdir(b'') and do the conversion manually.

Only programs that have to support a strange environment like yours (mixing
Shift-JIS and UTF-8) :-) Most programs don't have to support such a charset
mixture. We can imagine a higher-level library working on UNIX and Windows
(bytes or Unicode). But that would come later.

> I think the desired behaviour assuming the existence of a nondecodable
> file is this:

I prefer the current behaviour :-)

> Why do you think that glob.glob('*.py') is special and should not traceback?

It's not special. glob() reuses listdir(), and it was an example to show that
"it just works".

> I just differ in that I think lack of tracebacks when
> UnicodeDecodeErrors are encountered is a wart in python3 that did not
> exist in python2.

Right.

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
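
For illustration only, the os.environ / os.environb semantics discussed in this thread can be modelled with a small toy class. This is a sketch under stated assumptions, not the real os module: the class name DualEnviron and its methods are hypothetical, and the environment is assumed to be a plain bytes-to-bytes mapping decoded with a single strict charset.

```python
class DualEnviron:
    """Toy model (hypothetical): one bytes store with a bytes view and a str view."""

    def __init__(self, raw, encoding="ascii"):
        self.raw = dict(raw)       # bytes -> bytes, stands in for the real environment
        self.encoding = encoding   # assumed locale charset, e.g. "ascii" for LANG=C

    # Bytes view, playing the role of the proposed os.environb.
    def getb(self, key):
        return self.raw[key]

    def setb(self, key, value):
        self.raw[key] = value

    # Unicode view, playing the role of os.environ.
    def get(self, key):
        try:
            return self.raw[key.encode(self.encoding)].decode(self.encoding)
        except UnicodeError:
            # An undecodable entry is simply not visible in the str view.
            raise KeyError(key) from None

    def set(self, key, value):
        # Strict encoding: UnicodeEncodeError is raised when setting the value,
        # not later when it is read back.
        self.raw[key.encode(self.encoding)] = value.encode(self.encoding)


env = DualEnviron({b"HOME": b"/home/haypo"}, encoding="ascii")
env.setb(b"PATH", b"\xff")         # fine in the bytes view
# env.get("PATH")                  # would raise KeyError
# env.set("PATH", chr(0x10000))    # would raise UnicodeEncodeError
```

Both views share the same underlying data, so a value set through the str view is immediately visible as bytes, while an undecodable bytes value stays reachable only through the bytes view.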