Bengt Richter wrote:
> Well, what will be assumed about name after the lines
>
> #-*- coding: latin1 -*-
> name = 'Martin Löwis'
>
> ?

Are you asking what is assumed about the identifier 'name', or the value
bound to that identifier? Currently, the identifier must be encoded in
latin1 in this source code, and it must only consist of letters, digits,
and the underscore.

The value of name will be a string consisting of the bytes
4d 61 72 74 69 6e 20 4c f6 77 69 73

> I know type(name) will be <type 'str'> and in itself contain no encoding
> information now, but why shouldn't the default assumption for
> literal-generated strings be what the coding cookie specified?

That certainly is the assumption: string literals must be in the encoding
given by the source encoding declaration, in the source code file on disk.
If they aren't (and cannot be interpreted that way), you get a syntax
error.

> I know the current implementation doesn't keep track of the different
> encodings that could reasonably be inferred from the source of the
> strings, but we are talking about future stuff here ;-)

Ah, so you want the source encoding to be preserved, say as an attribute
of the string literal. This has been discussed many times, and was always
rejected.

Some people reject it because it is overkill: if you want reliable,
stable representation of characters, you should use Unicode strings.

Others reject it because of semantic difficulties: how would such strings
behave under concatenation, if the encodings are different?

> #-*- coding: latin1 -*-
> name = 'Martin Löwis'
>
> could be that name.encoding == 'latin-1'

That is not at all intuitive. I would have expected name.encoding to be
'latin1'.

> Functions that generate strings, such as chr(), could be assumed to
> create a string with the same encoding as the source code for the
> chr(...) invocation.

What is the source of the chr invocation? If I do chr(param), should I
use the source where param was computed, or the source where the call to
chr occurs? If the latter, how should the interpreter preserve the
encoding of where the call came from?
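
To make the byte values and the concatenation problem concrete, here is a
minimal sketch. It is written for Python 3, where bytes stands in for the
byte strings discussed here; that substitution is an assumption, and the
names NAME, mixed and decodes_as are purely illustrative.

```python
# Python 3 sketch: bytes plays the role of the byte strings discussed above.
NAME = "Martin L\xf6wis"  # \xf6 is o-umlaut, escaped to keep this file ASCII

# Bytes the literal would hold in a source file encoded as latin1.
latin1_bytes = NAME.encode("latin-1")
hex_dump = " ".join(f"{b:02x}" for b in latin1_bytes)

# The same text coming from a UTF-8 source is a different byte sequence.
utf8_bytes = NAME.encode("utf-8")

# Concatenating byte strings that came from different encodings: neither
# of the two encodings decodes the result back to the original text.
mixed = latin1_bytes + utf8_bytes


def decodes_as(data, encoding):
    """Return True if data decodes without error in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Decoding each piece to text first, then concatenating, avoids the problem.
text = latin1_bytes.decode("latin-1") + utf8_bytes.decode("utf-8")

print(hex_dump)
print(decodes_as(mixed, "utf-8"))
print(text == NAME + NAME)
```

Decoding mixed as latin-1 does not raise, but it yields garbled text for
the UTF-8 half, which is the kind of ambiguity a per-string encoding
attribute would have to resolve somehow.
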
What about the many other sources of byte strings (like strings read from
a file, or received via a socket)?

> This is not a fully developed idea, and there has been discussion on
> the topic before (even between us ;-) but I thought another round might
> bring out your current thinking on it ;-)

My thinking still is the same. It cannot really work, and it wouldn't do
any good with what little it could do. Just use Unicode strings.

Regards,
Martin