Re: [Tutor] Unicode problems
Ed Singleton wrote:
> On 8/29/06, Kent Johnson <[EMAIL PROTECTED]> wrote:
>>> The main problem I am having is in getting python not to give an
>>> error when it encounters a sterling currency sign (£, pound sign here
>>> in UK), which I suspect might be some wider problem on the mac as when
>>> I type that character in the terminal it shows a # (but in Python it
>>> shows a £).
>>>
>> Where is the pound sign coming from? What encoding is it in? What do you
>> mean, in Python it shows £? You said Python gives an error...Fixing your
>> first problem may not help this one without a bit more digging... (BTW
>> in the US a # is sometimes called a 'pound sign', maybe the computer is
>> trying to translate for you ;) - though it is for pound weight, not
>> pound sterling.)
>>
>
> The pound sign is in the source code in a string, or in a text file I
> was reading in. Both should be in utf-8 as I save all files to that
> by default. I think it was (hopefully) just that python was choking
> on printing the character (I was printing everything for debugging
> purposes).
>

You also need to tell Python that the file is in UTF-8 by putting an
encoding declaration at the top of the file:

# -*- coding: utf-8 -*-

You probably want to make the strings Unicode strings as well, e.g. u'xxx'.

> If I type "£" into a text document and copy and paste it to the python
> console, it comes out as " £" (with a space). If I copy and paste it
> back, the space is gone.
>

Sounds like maybe you are pasting UTF-8 (two bytes for £) and the
console interprets it as two characters.

> If I type "test £" (without quotes) into a text document and copy and
> paste it to the console it comes out as "#test" and goes to a new
> line, as if I had pressed enter.
>

That one is very strange.

By the way, you can explicitly control the conversion on output by using e.g.

print someString.encode('utf-8')

Finally, please keep the discussion on list.

Kent

> I'll keep digging and trying things out.
>
> Thanks
>
> Ed
>
___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
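As an illustration of the two-byte point in the message above, here is a minimal sketch. It is written for modern Python 3 rather than the Python 2 used in this thread, and the choice of latin-1 as the "wrong" codec is only an assumption for demonstration; the thread does not establish which encoding the console was really using.

```python
# Python 3 sketch (the thread itself uses Python 2 syntax).
pound = "\u00a3"  # the pound sterling sign

# Encoded as UTF-8, the single character becomes two bytes.
utf8_bytes = pound.encode("utf-8")
print(len(utf8_bytes), utf8_bytes)

# ASSUMPTION: latin-1 stands in for whatever single-byte codec a console
# might apply. Each byte is then read as its own character, so one pound
# sign turns into two characters.
misread = utf8_bytes.decode("latin-1")
print(len(misread), ascii(misread))

# Decoding with the codec that was used for encoding restores the original.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == pound)
```

The second character of the misread string is still the pound sign, which would be consistent with seeing one stray character in front of "£" after pasting, though that remains a guess about the setup described here.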
Re: [Tutor] Unicode problems
Ed Singleton wrote:
> I've been having unicode problems in python on Mac OS 10.4.
>
> I googled for it and found a good page in Dive Into Python that I
> thought might help
> (http://www.diveintopython.org/xml_processing/unicode.html).
>
> I tried following the instructions and set my default encoding using a
> sitecustomize.py, but got the following:
>
> >>> import sys
> >>> sys.getdefaultencoding()
> 'utf-8'
> >>> s = u'La Pe\xf1a'
> >>> print s
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
> position 5: ordinal not in range(128)
> >>>
>
> As I understand it, that should work. I tried using different
> character sets (like latin-1, etc), but none of them work.
>

I'm not sure Dive into Python is correct. Here is what I get on Windows:

In [1]: s = u'La Pe\xf1a'

In [2]: print s
La Peña

In [3]: import sys

In [4]: sys.getdefaultencoding()
Out[4]: 'ascii'

In [5]: sys.stdout.encoding
Out[5]: 'cp437'

I think print converts to the encoding of sys.stdout, not the default
encoding. What is the value of sys.stdout.encoding on your machine?

Kent

> The main problem I am having is in getting python not to give an
> error when it encounters a sterling currency sign (£, pound sign here
> in UK), which I suspect might be some wider problem on the mac as when
> I type that character in the terminal it shows a # (but in Python it
> shows a £).

Where is the pound sign coming from? What encoding is it in? What do you
mean, in Python it shows £? You said Python gives an error...Fixing your
first problem may not help this one without a bit more digging... (BTW
in the US a # is sometimes called a 'pound sign', maybe the computer is
trying to translate for you ;) - though it is for pound weight, not
pound sterling.)

Kent
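The idea that print encodes to the output stream's encoding can be sketched as follows. This is Python 3, not the Python 2 of the thread, and it uses in-memory streams as stand-ins for a terminal; write_with_encoding is a hypothetical helper invented for this illustration, not something from the thread or the standard library.

```python
import io


def write_with_encoding(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Hypothetical helper for illustration. Returns the bytes written,
    or the UnicodeEncodeError if the stream's codec cannot represent
    the text.
    """
    raw = io.BytesIO()
    # newline="\n" keeps the output identical across platforms.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


s = "La Pe\xf1a"

# A stream whose encoding is ASCII cannot represent the n-with-tilde.
ascii_result = write_with_encoding(s, "ascii")
print(type(ascii_result).__name__)

# The same string goes through a stream that uses UTF-8 or cp437.
utf8_result = write_with_encoding(s, "utf-8")
cp437_result = write_with_encoding(s, "cp437")
print(utf8_result, cp437_result)
```

In this sketch the text is the same in every call and only the stream's encoding changes, which is the distinction drawn above between sys.stdout.encoding and the default encoding.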
[Tutor] Unicode problems
I've been having unicode problems in python on Mac OS 10.4.

I googled for it and found a good page in Dive Into Python that I
thought might help
(http://www.diveintopython.org/xml_processing/unicode.html).

I tried following the instructions and set my default encoding using a
sitecustomize.py, but got the following:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> s = u'La Pe\xf1a'
>>> print s
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
position 5: ordinal not in range(128)
>>>

As I understand it, that should work. I tried using different
character sets (like latin-1, etc), but none of them work.

The main problem I am having is in getting python not to give an
error when it encounters a sterling currency sign (£, pound sign here
in UK), which I suspect might be some wider problem on the mac as when
I type that character in the terminal it shows a # (but in Python it
shows a £).

Any help, or hints greatly appreciated.

Thanks

Ed