Re: [Tutor] Unicode problems

2006-08-31 Thread Kent Johnson
Ed Singleton wrote:
> On 8/29/06, Kent Johnson <[EMAIL PROTECTED]> wrote:
>>> The main problem  I am having is in getting python not to give an
>>> error when it encounters a sterling currency sign (£, pound sign here
>>> in UK), which I suspect might be some wider problem on the mac as when
>>> I type that character in the terminal it shows a # (but in Python it
>>> shows a £).
>>>   
>> Where is the pound sign coming from? What encoding is it in? What do you
>> mean, in Python it shows £? You said Python gives an error...Fixing your
>> first problem may not help this one without a bit more digging... (BTW
>> in the US a # is sometimes called a 'pound sign', maybe the computer is
>> trying to translate for you ;) - though it is for pound weight, not
>> pound sterling.)
>> 
>
> The pound sign is in the source code in a string, or in a text file I
> was reading in.  Both should be in utf-8 as I save all files to that
> by default.  I think it was (hopefully) just that python was choking
> on printing the character (I was printing everything for debugging
> purposes).
>   
You also need to tell Python that the file is in UTF-8 by putting an 
encoding declaration at the top of the file.

# -*- coding: utf-8 -*-

You probably want to make the strings Unicode strings as well, e.g. u'xxx'.
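
A minimal sketch of what that looks like in a source file, assuming the file really is saved as UTF-8. The variable name `price` is only for illustration, and the declaration has an effect only when it sits on the first or second line of the file:

```python
# -*- coding: utf-8 -*-
# The declaration above must be on line 1 or 2 of the real file.

# A Unicode string: one pound sign followed by four more characters.
price = u'£5.00'

# Length of the Unicode string is counted in characters.
print(len(price))

# Encoding to UTF-8 turns the pound sign into two bytes,
# so the byte string is one longer than the character count.
print(len(price.encode('utf-8')))
```
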
> If I type "£" into a text document and copy and paste it to the python
> console, it comes out as " £" (with a space).  If I copy and paste it
> back, the space is gone.
>   
Sounds like maybe you are pasting UTF-8, where £ takes two bytes, and the 
console interprets each byte as a separate character.
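
A small sketch of how one character can turn into two. Latin-1 is used here only as an assumed example of a one-byte-per-character encoding; the encoding your console actually uses may be different:

```python
# The pound sign as a single Unicode character (U+00A3).
pound = u'\xa3'

# UTF-8 represents it as two bytes: 0xC2 0xA3.
utf8_bytes = pound.encode('utf-8')

# Misreading those bytes with a one-byte-per-character encoding
# (Latin-1 here, as an assumed example) yields two characters,
# the second of which is still the pound sign.
misread = utf8_bytes.decode('latin-1')

print(len(pound), len(utf8_bytes), len(misread))
```
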
> If I type "test £" (without quotes) into a text document and copy and
> paste it to the console it comes out as "#test" and goes to a new
> line, as if I had pressed enter.
>   
That one is very strange.

By the way you can explicitly control the conversion on output by using e.g.

print someString.encode('utf-8')
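
A slightly fuller sketch of the same idea. The helper name `to_bytes` is hypothetical; the `errors` argument of `encode` is standard and lets you choose what happens to characters the target encoding cannot represent:

```python
def to_bytes(text, encoding='utf-8', errors='strict'):
    """Encode a Unicode string explicitly instead of relying on print."""
    return text.encode(encoding, errors)


name = u'La Pe\xf1a'

# UTF-8 can represent every character, so nothing is lost.
as_utf8 = to_bytes(name)

# ASCII cannot represent the n-tilde; 'replace' substitutes '?'.
as_ascii = to_bytes(name, 'ascii', 'replace')

# 'xmlcharrefreplace' substitutes a numeric character reference instead.
as_refs = to_bytes(name, 'ascii', 'xmlcharrefreplace')
```

With the default `errors='strict'`, encoding to ASCII would raise UnicodeEncodeError rather than silently changing the text.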

Finally, please keep the discussion on list.

Kent
> I'll keep digging and trying things out.
>
> Thanks
>
> Ed
>   


___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Unicode problems

2006-08-29 Thread Kent Johnson
Ed Singleton wrote:
> I've been having unicode problems in python on Mac OS 10.4.
>
> I googled for it and found a good page in Dive Into Python that I
> thought might help
> (http://www.diveintopython.org/xml_processing/unicode.html).
>
> I tried following the instructions and set my default encoding using a
> sitecustomize.py, but got the following:
>
>   
> >>> import sys
> >>> sys.getdefaultencoding()
> 'utf-8'
> >>> s = u'La Pe\xf1a'
> >>> print s
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
> position 5: ordinal not in range(128)
>   
>
> As I understand it, that should work.  I tried using different
> character sets (like latin-1, etc), but none of them work.
>   
I'm not sure Dive into Python is correct. Here is what I get on Windows:
In [1]: s = u'La Pe\xf1a'

In [2]: print s
La Peña

In [3]: import sys

In [4]: sys.getdefaultencoding()
Out[4]: 'ascii'

In [5]: sys.stdout.encoding
Out[5]: 'cp437'

I think print converts to the encoding of sys.stdout, not the default 
encoding. What is the value of sys.stdout.encoding on your machine?
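
Here is a rough sketch for checking both values and for reproducing the same error by hand. Encoding to ASCII explicitly fails at the same position as in your traceback, which would be consistent with print using an ASCII stdout encoding. This is an illustration, not a diagnosis of your setup:

```python
import sys

# The interpreter-wide default encoding.
default_encoding = sys.getdefaultencoding()

# The encoding attached to standard output; it can be None or missing
# when output is not going to a terminal.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print(default_encoding, stdout_encoding)

s = u'La Pe\xf1a'
try:
    s.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    failure = exc
```
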

Kent
> The main problem  I am having is in getting python not to give an
> error when it encounters a sterling currency sign (£, pound sign here
> in UK), which I suspect might be some wider problem on the mac as when
> I type that character in the terminal it shows a # (but in Python it
> shows a £).

Where is the pound sign coming from? What encoding is it in? What do you 
mean, in Python it shows £? You said Python gives an error...Fixing your 
first problem may not help this one without a bit more digging... (BTW 
in the US a # is sometimes called a 'pound sign', maybe the computer is 
trying to translate for you ;) - though it is for pound weight, not 
pound sterling.)

Kent



[Tutor] Unicode problems

2006-08-29 Thread Ed Singleton
I've been having unicode problems in python on Mac OS 10.4.

I googled for it and found a good page in Dive Into Python that I
thought might help
(http://www.diveintopython.org/xml_processing/unicode.html).

I tried following the instructions and set my default encoding using a
sitecustomize.py, but got the following:

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> s = u'La Pe\xf1a'
>>> print s
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in
position 5: ordinal not in range(128)
>>>

As I understand it, that should work.  I tried using different
character sets (like latin-1, etc), but none of them work.

The main problem  I am having is in getting python not to give an
error when it encounters a sterling currency sign (£, pound sign here
in UK), which I suspect might be some wider problem on the mac as when
I type that character in the terminal it shows a # (but in Python it
shows a £).

Any help or hints would be greatly appreciated.

Thanks

Ed