On Dec 19, 9:34 pm, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Fri, 19 Dec 2008 04:05:12 -0800, digisat...@gmail.com wrote:
> > The below snippet code generates UnicodeDecodeError.
> >
> > #!/usr/bin/env python
> > #--*-- coding: utf-8 --*--
> > s = 'äöü'
> > u = unicode(s)
> >
> > It seems that the system use the default encoding- ASCII to decode the
> > utf8 encoded string literal, and thus generates the error.
> >
> > The question is why the Python interpreter use the default encoding
> > instead of "utf-8", which I explicitly declared in the source.
>
> Because the declaration is only for decoding unicode literals in that
> very source file.
>
> Ciao,
> Marc 'BlackJack' Rintsch
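A minimal sketch of the behavior discussed above, written in Python 3 syntax. This is an assumption on my part, since the thread is about Python 2, where `str` was a byte string and `unicode(s)` decoded it with the ASCII default. The coding line only tells the compiler how to read the source file; a byte string still has to be decoded at run time with its encoding named explicitly. `decode_as` is a hypothetical helper used only for illustration.

```python
# Sketch in Python 3 syntax; the original snippet is Python 2.
# Assumption: the source file is saved as UTF-8, matching its coding line.

# Under Python 2 with a UTF-8 coding line, s = 'äöü' holds these 6 bytes.
s = 'äöü'.encode('utf-8')


def decode_as(data, encoding):
    """Hypothetical helper: decode data, returning None on UnicodeDecodeError."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Python 2's unicode(s) used the default codec (ASCII), which fails here.
ascii_result = decode_as(s, 'ascii')

# Naming the codec explicitly succeeds.
utf8_result = decode_as(s, 'utf-8')

print(len(s))                # 6
print(ascii_result)          # None
print(utf8_result == 'äöü')  # True
```

In Python 2 itself the equivalent fix is `unicode(s, 'utf-8')` or `s.decode('utf-8')`.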
Thanks for the answer.

I believe the declaration is not only for unicode literals; it applies to everything in the source, including comments. Try running a source file that has no encoding declaration and contains a single comment line with non-ASCII characters: it raises a SyntaxError whose message points to the PEP 263 URL. I read PEP 263 and quote it below:

    Python's tokenizer/compiler combo will need to be updated to
    work as follows:

    1. read the file
    2. decode it into Unicode assuming a fixed per-file encoding
    3. convert it into a UTF-8 byte string
    4. tokenize the UTF-8 content
    5. compile it, creating Unicode objects from the given Unicode data
       and creating string objects from the Unicode literal data
       by first reencoding the UTF-8 data into 8-bit string data
       using the given file encoding

The process described above indicates that step 2 uses the declared encoding to decode the whole source file, literals included, while step 5 involves re-encoding with that same encoding. That is why we have to declare an encoding explicitly whenever the source contains non-ASCII characters.

Bruno explained perfectly why we need to specify an encoding when decoding a byte string. Thank you very much.

--
http://mail.python.org/mailman/listinfo/python-list