Re: [Tutor] Unicode problems
Ed Singleton wrote:
> On 8/29/06, Kent Johnson [EMAIL PROTECTED] wrote:
> > > The main problem I am having is in getting python not to give an error
> > > when it encounters a sterling currency sign (£, pound sign here in UK),
> > > which I suspect might be some wider problem on the mac as when I type
> > > that character in the terminal it shows a # (but in Python it shows a £).
> >
> > Where is the pound sign coming from? What encoding is it in? What do you
> > mean, in Python it shows £? You said Python gives an error... Fixing your
> > first problem may not help this one without a bit more digging...
> >
> > (BTW in the US a # is sometimes called a 'pound sign', maybe the computer
> > is trying to translate for you ;) - though it is for pound weight, not
> > pound sterling.)
>
> The pound sign is in the source code in a string, or in a text file I was
> reading in. Both should be in utf-8 as I save all files to that by default.
> I think it was (hopefully) just that python was choking on printing the
> character (I was printing everything for debugging purposes).

You also need to tell Python that the file is in UTF-8 by putting an encoding declaration at the top of the file:

    # -*- coding: utf-8 -*-

You probably want to make the strings Unicode strings as well, e.g. u'xxx'.

> If I type £ into a text document and copy and paste it to the python
> console, it comes out as £ (with a space). If I copy and paste it back,
> the space is gone.

Sounds like maybe you are pasting Unicode (two bytes) and the console interprets it as two characters.

> If I type "test £" (without quotes) into a text document and copy and paste
> it to the console it comes out as #test and goes to a new line, as if I had
> pressed enter.

That one is very strange.

By the way, you can explicitly control the conversion on output by using e.g.

    print someString.encode('utf-8')

Finally, please keep the discussion on list.

Kent

> I'll keep digging and trying things out.
>
> Thanks
>
> Ed

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
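The "two bytes seen as two characters" guess above can be checked with a small sketch. It is only an illustration: the variable names are made up here, the thread does not say which encoding the console actually uses (Latin-1 is assumed purely for the demonstration), and the code is written so that it also runs on Python 3, whereas the thread itself uses Python 2.

```python
# -*- coding: utf-8 -*-
# Sketch only: variable names are illustrative and not taken from the thread.

# U+00A3 POUND SIGN, written as an escape so this source file stays pure ASCII.
pound = u'\xa3'

# Explicit conversion on output, as suggested in the message above.
utf8_bytes = pound.encode('utf-8')
print(repr(utf8_bytes))

# One character becomes two bytes in UTF-8.
assert len(pound) == 1
assert len(utf8_bytes) == 2

# ASSUMPTION: a console that decodes those two bytes with a single-byte
# encoding (Latin-1 here) ends up showing two characters instead of one.
misread = utf8_bytes.decode('latin-1')
print(len(misread))
```

Whether this is what happens in the Mac terminal depends on the terminal's own character-set setting, which the thread has not established yet.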
[Tutor] Unicode problems
I've been having unicode problems in python on Mac OS 10.4. I googled for it and found a good page in Dive Into Python that I thought might help (http://www.diveintopython.org/xml_processing/unicode.html).

I tried following the instructions and set my default encoding using a sitecustomize.py, but got the following:

    >>> import sys
    >>> sys.getdefaultencoding()
    'utf-8'
    >>> s = u'La Pe\xf1a'
    >>> print s
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 5: ordinal not in range(128)

As I understand it, that should work. I tried using different character sets (like latin-1, etc), but none of them work.

The main problem I am having is in getting python not to give an error when it encounters a sterling currency sign (£, pound sign here in UK), which I suspect might be some wider problem on the mac as when I type that character in the terminal it shows a # (but in Python it shows a £).

Any help, or hints greatly appreciated.

Thanks

Ed
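The traceback above can be reproduced without a terminal by calling the codec directly. This is a minimal sketch, not the code path `print` takes internally; the names `error_message` and `bad_position` are illustrative, and the snippet is written to run on Python 3 as well as on the Python 2 used in the thread.

```python
# Same string as in the message above: U+00F1 is the n-with-tilde at index 5.
s = u'La Pe\xf1a'

try:
    s.encode('ascii')
    error_message = None
    bad_position = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character the codec rejected.
    error_message = str(exc)
    bad_position = exc.start

print(error_message)
print(bad_position)

# Codecs that can represent U+00F1 succeed where ASCII fails.
latin1_bytes = s.encode('latin-1')  # one byte for the accented character
utf8_bytes = s.encode('utf-8')      # two bytes for the accented character
print(repr(latin1_bytes))
print(repr(utf8_bytes))
```

The error names the 'ascii' codec, which suggests that whatever encoding is applied at print time is not the one set through sitecustomize.py.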
Re: [Tutor] Unicode problems
Ed Singleton wrote:
> I've been having unicode problems in python on Mac OS 10.4. I googled for
> it and found a good page in Dive Into Python that I thought might help
> (http://www.diveintopython.org/xml_processing/unicode.html).
>
> I tried following the instructions and set my default encoding using a
> sitecustomize.py, but got the following:
>
>     >>> import sys
>     >>> sys.getdefaultencoding()
>     'utf-8'
>     >>> s = u'La Pe\xf1a'
>     >>> print s
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 5: ordinal not in range(128)
>
> As I understand it, that should work. I tried using different character
> sets (like latin-1, etc), but none of them work.

I'm not sure Dive into Python is correct. Here is what I get on Windows:

    In [1]: s = u'La Pe\xf1a'
    In [2]: print s
    La Peña
    In [3]: import sys
    In [4]: sys.getdefaultencoding()
    Out[4]: 'ascii'
    In [5]: sys.stdout.encoding
    Out[5]: 'cp437'

I think print converts to the encoding of sys.stdout, not the default encoding. What is the value of sys.stdout.encoding on your machine?

Kent

> The main problem I am having is in getting python not to give an error
> when it encounters a sterling currency sign (£, pound sign here in UK),
> which I suspect might be some wider problem on the mac as when I type that
> character in the terminal it shows a # (but in Python it shows a £).

Where is the pound sign coming from? What encoding is it in? What do you mean, in Python it shows £? You said Python gives an error... Fixing your first problem may not help this one without a bit more digging...

(BTW in the US a # is sometimes called a 'pound sign', maybe the computer is trying to translate for you ;) - though it is for pound weight, not pound sterling.)

Kent
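To compare the two settings mentioned above, something along these lines may help. It is a rough sketch: `encode_for` is a hypothetical helper introduced only for this illustration, not part of Python or of the thread, and the code is written to run on Python 3 as well as Python 2.

```python
import sys

def encode_for(text, encoding):
    """Hypothetical helper: encode text for a given output encoding,
    substituting '?' for characters that encoding cannot represent."""
    return text.encode(encoding or 'ascii', 'replace')

# The two values being compared in the message above.
print(sys.getdefaultencoding())
# sys.stdout.encoding can be None or missing when output is redirected.
print(getattr(sys.stdout, 'encoding', None))

s = u'La Pe\xf1a'

# An ASCII terminal cannot show the accented character at all ...
as_ascii = encode_for(s, 'ascii')
# ... while cp437, the value reported on Windows above, has a byte for it.
as_cp437 = encode_for(s, 'cp437')

print(repr(as_ascii))
print(repr(as_cp437))
```

If `sys.stdout.encoding` turns out to be an ASCII-only value on the Mac, that would be consistent with the 'ascii' codec named in the traceback.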