Re: [Tutor] Unicode problems

2006-08-31 Thread Kent Johnson
Ed Singleton wrote:
 On 8/29/06, Kent Johnson [EMAIL PROTECTED] wrote:
 The main problem  I am having is in getting python not to give an
 error when it encounters a sterling currency sign (£, pound sign here
 in UK), which I suspect might be some wider problem on the mac as when
 I type that character in the terminal it shows a # (but in Python it
 shows a £).
   
 Where is the pound sign coming from? What encoding is it in? What do you
 mean, in Python it shows £? You said Python gives an error...Fixing your
 first problem may not help this one without a bit more digging... (BTW
 in the US a # is sometimes called a 'pound sign', maybe the computer is
 trying to translate for you ;) - though it is for pound weight, not
 pound sterling.)
 

 The pound sign is in the source code in a string, or in a text file I
 was reading in.  Both should be in utf-8 as I save all files to that
 by default.  I think it was (hopefully) just that python was choking
 on printing the character (I was printing everything for debugging
 purposes).
   
You also need to tell Python that the file is in UTF-8 by putting an 
encoding declaration at the top of the file.

# -*- coding: utf-8 -*-

You probably want to make the strings Unicode strings as well, e.g. u'xxx'.
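
As a rough illustration of both suggestions, here is a minimal sketch. It is written so that it also runs on Python 3, where source files default to UTF-8 and every string literal is already Unicode; under the Python 2 used in this thread, the coding line and the u'' prefix are what make non-ASCII text behave. The variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
# The declaration above tells the interpreter how the bytes of this file
# map to characters. Python 2 requires it for non-ASCII source text;
# Python 3 assumes UTF-8 when it is absent.

# u'...' marks a Unicode string; \u00a3 is the pound sign.
price = u'\u00a35.00'

# One character in the Unicode string becomes two bytes in UTF-8.
pound = price[0]
utf8_bytes = pound.encode('utf-8')
print(len(pound), len(utf8_bytes))
```

The point is only that the character count and the byte count differ, which is why the interpreter has to be told which encoding applies.
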
 If I type £ into a text document and copy and paste it to the python
 console, it comes out as  £ (with a space).  If I copy and paste it
 back, the space is gone.
   
Sounds like maybe you are pasting UTF-8 (where £ is the two bytes 0xC2 
0xA3) and the console interprets it as two characters.
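
The "two bytes read as two characters" effect can be reproduced without a terminal. This is a sketch only: it assumes the console decodes input with a single-byte codec, and latin-1 is used here as a stand-in because the codec the Mac terminal actually used is not known from the thread.

```python
pound = u'\u00a3'             # the pound sign, a single code point
raw = pound.encode('utf-8')   # b'\xc2\xa3', two bytes

# A single-byte codec maps each byte to its own character,
# so the one pound sign turns into two characters.
as_latin1 = raw.decode('latin-1')
print(len(raw), len(as_latin1), repr(as_latin1))

# Decoding with the codec that produced the bytes restores the original.
round_trip = raw.decode('utf-8')
print(round_trip == pound)
```
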
 If I type test £ (without quotes) into a text document and copy and
 paste it to the console it comes out as #test and goes to a new
 line, as if I had pressed enter.
   
That one is very strange.

By the way you can explicitly control the conversion on output by using e.g.

print someString.encode('utf-8')
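
A slightly fuller sketch of explicit conversion, written for Python 3 (the line above is Python 2 syntax). The helper name encode_for_output is hypothetical; it only wraps the standard str.encode method so that the effect of the error handlers is easy to compare.

```python
def encode_for_output(text, encoding='utf-8', errors='strict'):
    """Return the bytes that text becomes in the given encoding."""
    return text.encode(encoding, errors)

s = u'\u00a35'   # pound sign followed by the digit 5

# UTF-8 can represent every character, so strict encoding succeeds.
utf8 = encode_for_output(s)

# ASCII cannot represent the pound sign; 'replace' substitutes '?'
# and 'backslashreplace' substitutes an escape sequence.
replaced = encode_for_output(s, 'ascii', 'replace')
escaped = encode_for_output(s, 'ascii', 'backslashreplace')

print(repr(utf8), repr(replaced), repr(escaped))
```

Choosing an error handler such as 'replace' trades accuracy for robustness, which can be acceptable when printing for debugging only.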

Finally, please keep the discussion on list.

Kent
 I'll keep digging and trying things out.

 Thanks

 Ed
   


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Unicode problems

2006-08-29 Thread Ed Singleton
I've been having unicode problems in python on Mac OS 10.4.

I googled for it and found a good page in Dive Into Python that I
thought might help
(http://www.diveintopython.org/xml_processing/unicode.html).

I tried following the instructions and set my default encoding using a
sitecustomize.py, but got the following:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> s = u'La Pe\xf1a'
>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
position 5: ordinal not in range(128)


As I understand it, that should work.  I tried using different
character sets (like latin-1, etc), but none of them work.

The main problem  I am having is in getting python not to give an
error when it encounters a sterling currency sign (£, pound sign here
in UK), which I suspect might be some wider problem on the mac as when
I type that character in the terminal it shows a # (but in Python it
shows a £).

Any help or hints greatly appreciated.

Thanks

Ed


Re: [Tutor] Unicode problems

2006-08-29 Thread Kent Johnson
Ed Singleton wrote:
 I've been having unicode problems in python on Mac OS 10.4.

 I googled for it and found a good page in Dive Into Python that I
 thought might help
 (http://www.diveintopython.org/xml_processing/unicode.html).

 I tried following the instructions and set my default encoding using a
 sitecustomize.py, but got the following:

   
 >>> import sys
 >>> sys.getdefaultencoding()
 'utf-8'
 >>> s = u'La Pe\xf1a'
 >>> print s
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
 position 5: ordinal not in range(128)
   

 As I understand it, that should work.  I tried using different
 character sets (like latin-1, etc), but none of them work.
   
I'm not sure Dive Into Python is correct. Here is what I get on Windows:

In [1]: s = u'La Pe\xf1a'

In [2]: print s
La Peña

In [3]: import sys

In [4]: sys.getdefaultencoding()
Out[4]: 'ascii'

In [5]: sys.stdout.encoding
Out[5]: 'cp437'

I think print converts to the encoding of sys.stdout, not the default 
encoding. What is the value of sys.stdout.encoding on your machine?
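
One way to check that idea without depending on a particular terminal is to print to in-memory streams that each declare a different encoding. This is a sketch for Python 3 (the session above is Python 2), and try_print is a hypothetical helper, not part of any library. It suggests that the stream's encoding is what decides whether printing succeeds.

```python
import io

def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the bytes written, or None if the text cannot be encoded.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

s = u'La Pe\xf1a'
for enc in ('ascii', 'utf-8', 'cp437'):
    # The ascii stream fails; the other two succeed with different bytes.
    print(enc, try_print(s, enc))
```

An 'ascii' stream reproduces the failure from the original report, while 'cp437' (the value shown for the Windows console above) encodes the same string without error.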

Kent
 The main problem  I am having is in getting python not to give an
 error when it encounters a sterling currency sign (£, pound sign here
 in UK), which I suspect might be some wider problem on the mac as when
 I type that character in the terminal it shows a # (but in Python it
 shows a £).

Where is the pound sign coming from? What encoding is it in? What do you 
mean, in Python it shows £? You said Python gives an error...Fixing your 
first problem may not help this one without a bit more digging... (BTW 
in the US a # is sometimes called a 'pound sign', maybe the computer is 
trying to translate for you ;) - though it is for pound weight, not 
pound sterling.)

Kent
