Pascal DUCHATELLE wrote:
> I am trying to understand character coding and python (unicode...). Can
> someone tell me why in IDLE I have to enter the sequence u'\u03B1' to
> get an alpha greek sign displayed while in the texttest.py file that
> comes with the pyGTK demo package there are some 'cabalistic' characters
> (in the greek example section) that are displayed like greek symbols
> just right.
I could not find texttest.py anywhere, and I have never worked with IDLE, so this is just a general remark:

The Python source code compiler needs to know what encoding is used for the source code in order to build a Unicode string object correctly from an expression like u'umlauts äöü' or u'αβγδ' (the latter should be the first four lower-case letters of the Greek alphabet; I am not sure whether they will be represented properly when I send this mail...). Without any further hint, the source code is, from the perspective of the Python interpreter, just a sequence of bytes: byte values <= 127 are well defined by the ASCII standard, but the meaning of values > 127 depends on the source encoding.

If you have a source code file containing only the line

    print u'umlauts äöü and greek αβγδ'

the Python interpreter will print the warning

    Non-ASCII character '\xc3' in file pytest.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

On the machine I'm writing this on (a fairly standard SuSE 9.3 installation), where most editors use/assume UTF-8 encoding, the script gives this output:

    umlauts Ã¤Ã¶Ã¼ and greek Î±Î²Î³Î´

The editor encodes each umlaut and each Greek character in two bytes (UTF-8), but the Python interpreter does not know this and, I believe, assumes ISO 8859-1 by default, so every such byte pair is shown as two wrong characters.

PEP 263 describes how to fix this:

    #-*- coding: utf-8 -*-
    print u'umlauts äöü and greek αβγδ'

Because this script was written with an editor that saves UTF-8-encoded files, it gives the expected output: the "comment" in the first line tells the interpreter explicitly how to convert string constants into Unicode objects.

Another option is to create a Python unicode object explicitly from a "normal" Python string:

    s = unicode('umlauts äöü and greek αβγδ', 'utf-8')
    print s

A plain (non-u) string literal keeps the raw bytes from the file unchanged, so decoding them explicitly with the 'utf-8' codec yields the correct characters. When this two-line script is run, the Python interpreter still gives the deprecation warning about the missing encoding declaration, but the text is printed correctly.
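The effect described above can be reproduced without writing any files. This is a minimal sketch in Python 3, not the Python 2 of the scripts above: bytes stands in for a "normal" Python 2 string and str for a unicode object. It encodes the sample text the way a UTF-8 editor would, then decodes those bytes once with ISO 8859-1 (the assumed default) and once with UTF-8.

```python
# Sketch in Python 3: bytes plays the role of a Python 2 "normal" string,
# str plays the role of a Python 2 unicode object.

text = "umlauts äöü and greek αβγδ"

# What a UTF-8 editor writes to disk: each umlaut and Greek letter
# becomes two bytes, every other character stays a single ASCII byte.
raw = text.encode("utf-8")

# Decoding with the wrong codec (ISO 8859-1) maps every byte to one
# character, so each two-byte sequence turns into two wrong characters:
# "umlauts Ã¤Ã¶Ã¼ and greek Î±Î²Î³Î´"
wrong = raw.decode("iso8859-1")

# Decoding with the codec the file was really written in (the job that
# unicode(s, 'utf-8') does in Python 2) recovers the original text.
right = raw.decode("utf-8")

print(len(text), len(raw))  # 26 characters, 33 bytes
print(wrong == text)        # False
print(right == text)        # True
```

The byte count is larger than the character count by exactly the number of non-ASCII characters, which is why the wrongly decoded string is seven characters longer than the original.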
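Python 3 still honours the PEP 263 declaration, but falls back to UTF-8 rather than Latin-1 when a file has none. The standard library exposes the lookup as tokenize.detect_encoding(), so a small sketch like the following shows which encoding the interpreter would pick for a file's raw bytes. The helper name declared_encoding and the sample files are hypothetical, chosen only for illustration.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to decode these source bytes.

    tokenize.detect_encoding() looks for a UTF-8 BOM or a PEP 263 coding
    comment in the first two lines; without either it returns UTF-8,
    the Python 3 default.
    """
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Hypothetical sample files, given as the raw bytes an editor would save.
utf8_file = "#-*- coding: utf-8 -*-\nprint(u'greek αβγδ')\n".encode("utf-8")
latin1_file = "# -*- coding: latin-1 -*-\nprint(u'umlauts äöü')\n".encode("latin-1")
ascii_file = b"print(u'plain ASCII')\n"

print(declared_encoding(utf8_file))    # utf-8
print(declared_encoding(latin1_file))  # iso-8859-1 (normalized name)
print(declared_encoding(ascii_file))   # utf-8 (no declaration found)
```

Note that only the first line is examined when it already carries the declaration, which is why PEP 263 requires the coding comment to sit on line one or two.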
Abel

_______________________________________________
pygtk mailing list
http://www.daa.com.au/mailman/listinfo/pygtk
Read the PyGTK FAQ: http://www.async.com.br/faq/pygtk/
