Re: [Tutor] symbol encoding and processing problem

2007-10-19 Thread Tim Golden
Timmie wrote:
> I am totally lost:
> * python has ascii as default encoding
> * my linux uses UTF-8 (therefore all files created on linux are UTF-8)
> * windows uses cp1250
> * IPython something else: on the machine I am currently on, stdin is set to
> cp850
> 
> So what encoding do I use to display and process characters that exceed the
> standard English alphabet?

> My initial question was:
> 
> 1) get a coordinate (DEG° MIN' SEC'') as input from user via easygui
> 2) split that string into its parts: degrees, minutes and seconds
> 3) do some processing of the 3 variables
> 4) print the output with easygui.
> 
> I am not really interested which is the best encoding. I want to know:
> * how do I do this so that I don't get an encoding error?
> * how do I code it so that the code runs on linux and windows
> from file and in IPython

I realise that I am running the risk of confusing you further,
but I'm afraid that your attitude of "This isn't my problem;
it's Python's" isn't really going to wash. If you're going to
be using characters which fall outside the realm of 7-bit
ASCII you're going to have to get some understanding of how
the various input, output and language mechanisms deal with
them. And all the more so if you're trying to do this cross-platform.

Maybe there's some kind of sealed environment in some other
language or operating system which takes care of all of this
for you transparently. I wouldn't know. What I do know is that,
if you're using the Python interpreter under Windows and Linux
and whatever else then you're at the mercy of those operating
systems at a certain level.

There are at least two points you have to understand:

1) Python needs to know what encoding was used to save a text file
which it is compiling to bytecode: usually a .py file. It has a default
which you can override in a couple of ways. If whatever encoding you've
specified turns out not to match the text in, say, a literal string with
a degree symbol, then Python will not know what to do and will stop with
an exception. Of course how you encoded the file in question is between
you and your editor.

2) When you are reading or writing text to or from a console or GUI window
or database or PDF or whatever, you also need to know what encoding to use.
If you're writing out, then whatever you're writing to has to be able to
make sense of the encoding you're supplying -- and you may need to say
which one it was. If you're reading in, you are at the mercy of libraries:
some will always return unicode (BeautifulSoup springs to mind), others will
return raw bytes leaving it up to you to decode, others will return an
encoded string. This is pretty much an historical artefact (or, sadly in some
cases, a case of ignorance) and you're going to have to cope with it.
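A minimal sketch of the "decode on the way in, encode on the way out" discipline described in point 2, written in Python 3 syntax (where text and bytes are separate types) rather than the Python 2.4 of this thread. The helper names `read_text` and `write_text` are made up for illustration, and the encodings passed to them are assumptions: use whatever your input source actually produced and whatever your output target actually expects.

```python
def read_text(raw_bytes, source_encoding):
    """Decode incoming bytes to text as soon as they arrive."""
    return raw_bytes.decode(source_encoding)


def write_text(text, target_encoding):
    """Encode text to bytes only at the moment of output."""
    return text.encode(target_encoding)


# Bytes as a cp1252 (Western Windows) program might hand them over.
incoming = b"125\xb0 15' 5.55''"
text = read_text(incoming, "cp1252")   # text containing U+00B0 DEGREE SIGN
outgoing = write_text(text, "utf-8")   # b"125\xc2\xb0 15' 5.55''"
```

Everything between those two calls works on text only, so splitting, comparing and formatting never depend on an encoding.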

On my windows box, easygui handles unicode perfectly well, and the
console running cp437 displays the degree sign. If it didn't, I'd
have to compromise on the display (or use chcp to switch code pages
first). To illustrate, the following program works:


import easygui

sample = u"DEG\u00b0 MIN' SEC\""
from_user = easygui.enterbox(u"Enter " + sample)
#
# Paste in values from your email since I can't
# be bothered to work out how to get the degree
# sign
#

print from_user



and from_user is a perfectly good unicode string. Now, if you
want to write that out to a file, or a database or what-have-you
which can't store unicode natively, then you'll have to encode
it, probably as UTF-8, which can encode anything.
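For the file case, a sketch assuming Python 3 (Python 2.6+ also has `io.open`): opening the file with an explicit `encoding` makes the library do the UTF-8 encoding on write and the decoding on read, so the program itself only handles text. The file name is hypothetical, and the literal stands in for what easygui returned.

```python
import io
import os
import tempfile

from_user = u"125\u00b0 15' 5.55''"  # stands in for the enterbox result

# Hypothetical location for the output file.
path = os.path.join(tempfile.mkdtemp(), "coords.txt")

with io.open(path, "w", encoding="utf-8") as f:
    f.write(from_user)           # encoded to UTF-8 on the way out

with io.open(path, "rb") as f:
    stored = f.read()            # the raw bytes as they sit on disk

with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()        # decoded back to text on the way in
```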

For this email, I've used the unicode-escape, but if -- as
you did -- you wanted to use the string literal, then you'd
need to save the .py file in a certain encoding and to place
a line at the top of the file indicating what that encoding
was. If you're happy using unicode-escapes then that saves
a bit of finicking about.

TJG
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] symbol encoding and processing problem

2007-10-19 Thread Timmie
> > I still don't know why there is such an encoding mess on Python. For me
> > this totally neglects the statement that python code is easily portable
> > or executable on other platforms.
> 
> I don't think this is entirely fair. For example at the start you had a 
> file containing cp1252 data but told Python it was utf-8. You can't 
> really expect Python to just do the right thing in such circumstances.
Here you are right. I was (without knowing it) doing the wrong thing.
Well, my judgement was based on a confusion that I haven't been able to clear up
so far:

At first my file was encoded differently from the python default encoding, and I
even declared yet another encoding in the file header. After I'd adapted all this
correctly my code worked well.
I tried to use the same code interactively in the IPython shell. As I said in
my previous post, the shell uses yet another encoding. Therefore, the same
code that worked well from a file failed. When I found out that I could set the
default encoding to UTF-8, you warned me about reduced portability.
I am totally lost:
* python has ascii as default encoding
* my linux uses UTF-8 (therefore all files created on linux are UTF-8)
* windows uses cp1250
* IPython something else: on the machine I am currently on, stdin is set to
cp850

So what encoding do I use to display and process characters that exceed the
standard English alphabet?
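To make the list above concrete: the degree sign is one character, but each encoding stores it as different bytes, and some cannot store it at all. A short sketch in Python 3 syntax; the helper `degree_bytes` is made up for illustration. The last line shows one plausible way a degree sign ends up displayed as "ø": the cp850 byte for "°" read back as if it were cp1252.

```python
def degree_bytes(codecs):
    """Map each codec name to the bytes it uses for U+00B0, or None."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = u"\u00b0".encode(codec)
        except UnicodeEncodeError:
            result[codec] = None  # this encoding has no degree sign
    return result


table = degree_bytes(["ascii", "utf-8", "cp1250", "cp1252", "cp850", "cp437"])
for codec, encoded in sorted(table.items()):
    print(codec, repr(encoded))

# Same character, wrong code page on the reading side.
mojibake = u"\u00b0".encode("cp850").decode("cp1252")
```

So there is no single "right" encoding for display; the program has to know which one each console, file or widget uses.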

> Does your IDL code include non-ascii characters? Perhaps you would do 
> better if you stuck with ascii for your Python source code, too - there 
> is no need to include non-ascii characters directly, you can use \x 
> string escapes to insert them into your strings.
My initial question was:

1) get a coordinate (DEG° MIN' SEC'') as input from user via easygui
2) split that string into its parts: degrees, minutes and seconds
3) do some processing of the 3 variables
4) print the output with easygui.

I am not really interested which is the best encoding. I want to know:
* how do I do this so that I don't get an encoding error?
* how do I code it so that the code runs on linux and windows
from file and in IPython

Thanks again for your support. I don't want any misunderstandings: you have
been a good tutor to me and patiently help the many users stepping into
Python.

Kind regards,
Timmie





Re: [Tutor] symbol encoding and processing problem

2007-10-18 Thread Kent Johnson
Timmie wrote:
>> Was the problem with the print statements? Maybe changing the console 
>> encoding would help. I have some notes here:
>> http://personalpages.tds.net/~kent37/stories/00018.html
> Thanks, I read it yesterday evening.
> 
> I still don't know why there is such an encoding mess on Python. For me this
> totally neglects the statement that python code is easily portable or
> executable on other platforms.

I don't think this is entirely fair. For example at the start you had a 
file containing cp1252 data but told Python it was utf-8. You can't 
really expect Python to just do the right thing in such circumstances.

Encoding issues require a certain amount of understanding of the 
underlying issues. In my experience most programmers find it confusing 
at first but it eventually sorts itself out.

> For instance, I am also using some IDL code. There I open the IDLE and code
> right away without worrying about the encoding.

Does your IDL code include non-ascii characters? Perhaps you would do 
better if you stuck with ascii for your Python source code, too - there 
is no need to include non-ascii characters directly, you can use \x 
string escapes to insert them into your strings.
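A sketch of this suggestion in Python 3 syntax: both escape forms below keep the source file pure ASCII, so the encoding the editor saved it in no longer matters. In Python 2 the `u` prefix is essential, because a plain `"\xb0"` there is a single byte, not the degree sign.

```python
# Two pure-ASCII spellings of the same text; no coding declaration needed.
by_hex = u"125\xb0 15' 5.55''"           # \x escape: code points up to 0xFF
by_code_point = u"125\u00b0 15' 5.55''"  # \u escape: any BMP code point

# A bytes literal with the same escape is one byte, not a character.
single_byte = b"\xb0"
```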

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-18 Thread Timmie
> Was the problem with the print statements? Maybe changing the console 
> encoding would help. I have some notes here:
> http://personalpages.tds.net/~kent37/stories/00018.html
Thanks, I read it yesterday evening.

I still don't know why there is such an encoding mess on Python. For me this
totally neglects the statement that python code is easily portable or executable
on other platforms.

For instance, I am also using some IDL code. There I open the IDLE and code
right away without worrying about the encoding.

Kind regards,
Timmie



Re: [Tutor] symbol encoding and processing problem

2007-10-18 Thread Timmie
> Just be aware that this affects portability of your scripts; they will 
> require this same change to run on other systems. For this reason you 
> might want to change the code instead.

> If you give a specific example of what is failing I will try to help.
From the previous posts I learned that I should save the file as UTF-8 encoded
and use unicode where possible. At least I hope that I understood this correctly.
With that method my initially posted code worked well.

I still have the problem that the same code works on some machines but not on
others when I am using IPython. IPython apparently uses its own encoding or the
encoding of the underlying platform.
How can I avoid this problem?





Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Kent Johnson
Evert Rol wrote:
>>>  >>> print unicode("125° 15' 5.55''", 'utf-8')
>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in  
>>> position 3: ordinal not in range(128)
>>
>> This is the same as the first encode error.
> 
> This is the thing I don't get; or only partly: I'm sending a utf-8 
> encoded string to print.

No, you are sending a unicode string to print.
   unicode("125° 15' 5.55''", 'utf-8')
means the same as
   "125° 15' 5.55''".decode('utf-8')
which is, "create a unicode string from this utf-8-encoded byte string". 
Once you have decoded to Unicode it is no longer utf-8.
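The equivalence described here, sketched in Python 3 syntax, where `unicode(data, 'utf-8')` is spelled `str(data, 'utf-8')` or `data.decode('utf-8')`. The change in length shows that the result is a sequence of characters, no longer UTF-8 bytes.

```python
utf8_bytes = b"125\xc2\xb0 15' 5.55''"  # the degree sign takes two bytes in UTF-8

via_decode = utf8_bytes.decode("utf-8")
via_constructor = str(utf8_bytes, "utf-8")  # Python 3 counterpart of unicode(s, 'utf-8')

print(len(utf8_bytes), len(via_decode))  # prints: 16 15
```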

> print apparently ignores that, and still tries 
> to print things using ascii encoding. If I'm correct in that assessment, 
> then why would print ignore that?

print just knows that you want to print a unicode string. stdout is 
byte-oriented so the unicode chars have to be converted to a byte 
stream. This is done by encoding with sys.stdout.encoding when Python 
has detected an encoding for the console, and with 
sys.getdefaultencoding() otherwise; with a "C" locale both are 
typically ascii. So
   print unicode("125° 15' 5.55''", 'utf-8')
is the same as
   print u"125° 15' 5.55''"
which, in your setup, amounts to
   print u"125° 15' 5.55''".encode('ascii')

> Ie, use encode('utf-8') where necessary?

Yes.

> But I did see some examples pass by using
> 
>   import sys
>   sys.setdefaultencoding('utf-8')

Yes, that will make the examples pass, it just isn't the recommended 
solution.

> Oh well, in general I tend to play long enough with things like this 
> that 1) I get it (script) working, and 2) I have a decent feeling (90%) 
> that I actually understand what is going on, and why other things 
> failed. Which is roughly where I am now ;-).

The key thing is to realize that there are implicit conversions between 
str and unicode and they will break if the data is not ascii. The best 
fix is to make the conversions explicit by providing the correct encoding.
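Python 3 later removed exactly this implicit str/unicode conversion: mixing text and bytes raises `TypeError` instead of silently trying ascii (Python 2 would raise `UnicodeDecodeError` here). A small Python 3 sketch contrasting the implicit attempt with the explicit fix; the values are illustrative.

```python
label = u"Latitude: "
value_bytes = b"20\xc2\xb0"  # e.g. UTF-8 bytes read from a file

# Implicit mixing: Python 3 refuses rather than guessing an encoding.
try:
    label + value_bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Explicit conversion names the encoding and works for non-ASCII data.
combined = label + value_bytes.decode("utf-8")
```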

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Evert Rol
 raw = unicode("125° 15' 5.55''", 'utf-8')
>>> Again, I think this can be simplified to
>>>raw = u"125° 15' 5.55''"
>> It does, but it's getting confusing when I compare the following:
>>  >>> raw = u"125° 15' 5.55''"
>> 125° 15' 5.55''
>
> Where does that output come from?

sorry, my bad: an over-hasty copy of non-existent output.

>>  >>> print u"125° 15' 5.55''"
>> UnicodeEncodeError: 'ascii' codec can't encode characters in  
>> position  3-4: ordinal not in range(128)
>
> print must encode unicode strings. It tries to encode them using  
> the default encoding which doesn't work because the source is not  
> ascii.
>>  >>> print u"125° 15' 5.55''".encode('utf-8')
>> 125° 15' 5.55''
>
> That is the way to get it to work.
>
>>  >>> print unicode("125° 15' 5.55''")
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in  
>> position  3: ordinal not in range(128)
>
> Here the problem is trying to create the unicode string using the  
> default encoding, again it doesn't work because the source contains  
> non-ascii characters.
>
>>  >>> print unicode("125° 15' 5.55''", 'utf-8')
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0'  
>> in  position 3: ordinal not in range(128)
>
> This is the same as the first encode error.

This is the thing I don't get; or only partly: I'm sending a utf-8  
encoded string to print. print apparently ignores that, and still  
tries to print things using ascii encoding. If I'm correct in that  
assessment, then why would print ignore that?


>> So apart from the errors all being slightly different, is there   
>> perhaps some difference between the str() and repr() functions  
>> (looks  like repr uses escape backslashes)?
>
> Right.
>
>> And checking the default encoding inside the python cmdline, I
>> see that my sys module doesn't actually have a setdefaultencoding()
>> method; was that something that should have been properly
>> configured at compile time? The documentation mentions something
>> about the site module, but I can't find it there either.
>
> The setdefaultencoding() function (it's not a method, it is a  
> module-level function)

yes, sorry, got my terminology wrong there.

> is removed from the sys module as part of startup (I think by the  
> site module). That is why you have to call it from  
> sitecustomize.py. You can also
>   reload(sys)
> to restore it but it's better to write your app so it doesn't  
> require the default encoding to be changed.

Ie, use encode('utf-8') where necessary?
But I did see some examples pass by using

   import sys
   sys.setdefaultencoding('utf-8')

??

Oh well, in general I tend to play long enough with things like this  
that 1) I get it (script) working, and 2) I have a decent feeling  
(90%) that I actually understand what is going on, and why other  
things failed. Which is roughly where I am now ;-).

   Evert




Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Kent Johnson
Evert Rol wrote:
>>> raw = unicode("125° 15' 5.55''", 'utf-8')
>> Again, I think this can be simplified to
>>raw = u"125° 15' 5.55''"
> 
> It does, but it's getting confusing when I compare the following:
> 
>  >>> raw = u"125° 15' 5.55''"
> 125° 15' 5.55''

Where does that output come from?
> 
>  >>> print u"125° 15' 5.55''"
> UnicodeEncodeError: 'ascii' codec can't encode characters in position  
> 3-4: ordinal not in range(128)

print must encode unicode strings. It tries to encode them using the 
default encoding which doesn't work because the source is not ascii.
> 
>  >>> print u"125° 15' 5.55''".encode('utf-8')
> 125° 15' 5.55''

That is the way to get it to work.

>  >>> print unicode("125° 15' 5.55''")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position  
> 3: ordinal not in range(128)

Here the problem is trying to create the unicode string using the 
default encoding, again it doesn't work because the source contains 
non-ascii characters.
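Where the 0xc2 in that traceback comes from, as a small Python 3 sketch: UTF-8 stores the degree sign as the two bytes 0xc2 0xb0, and the ascii codec rejects the first of them, at offset 3.

```python
snippet = u"125\u00b0"
encoded = snippet.encode("utf-8")  # b"125\xc2\xb0": the degree sign is two bytes

first_bad = None
try:
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec rejected
    first_bad = exc.start
    print("ascii cannot decode byte %r at position %d"
          % (encoded[exc.start:exc.start + 1], exc.start))
```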

>  >>> print unicode("125° 15' 5.55''", 'utf-8')
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in  
> position 3: ordinal not in range(128)

This is the same as the first encode error.

> So apart from the errors all being slightly different, is there  
> perhaps some difference between the str() and repr() functions (looks  
> like repr uses escape backslashes)?

Right.

> And checking the default encoding inside the python cmdline, I see  
> that my sys module doesn't actually have a setdefaultencoding()  
> method; was that something that should have been properly configured  
> at compile time? The documentation mentions something about the site  
> module, but I can't find it there either.

The setdefaultencoding() function (it's not a method, it is a 
module-level function) is removed from the sys module as part of startup 
(I think by the site module). That is why you have to call it from 
sitecustomize.py. You can also
   reload(sys)
to restore it but it's better to write your app so it doesn't require 
the default encoding to be changed.

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Evert Rol
>> raw = unicode("125° 15' 5.55''", 'utf-8')
>
> Again, I think this can be simplified to
>raw = u"125° 15' 5.55''"

It does, but it's getting confusing when I compare the following:

 >>> raw = u"125° 15' 5.55''"
125° 15' 5.55''

 >>> print u"125° 15' 5.55''"
UnicodeEncodeError: 'ascii' codec can't encode characters in position  
3-4: ordinal not in range(128)

 >>> print u"125° 15' 5.55''".encode('utf-8')
125° 15' 5.55''

 >>> print unicode("125° 15' 5.55''")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position  
3: ordinal not in range(128)

 >>> print unicode("125° 15' 5.55''", 'utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in  
position 3: ordinal not in range(128)


So apart from the errors all being slightly different, is there  
perhaps some difference between the str() and repr() functions (looks  
like repr uses escape backslashes)?
Or does it simply have to do with my locale, which is set to the  
default "C" (terminal = standard Mac OS X terminal, with UTF-8  
encoding)? Although that wouldn't explain to me why the third  
statement works.
And checking the default encoding inside the python cmdline, I see  
that my sys module doesn't actually have a setdefaultencoding()  
method; was that something that should have been properly configured  
at compile time? The documentation mentions something about the site  
module, but I can't find it there either.

Any enlightenment on this is welcome.

   Evert




Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Kent Johnson
Timmie wrote:
> OK, I found out.
>> Since it didn't work in IPython as well I assume that I need to change the
>> encoding of the IPython shell to UTF-8, too. Still need to find out where.
> Put a file called 'sitecustomize.py' into any directory on your PYTHONPATH.
> 
> write the following two lines in that file:
> 
> import sys
> sys.setdefaultencoding('utf-8')

Just be aware that this affects portability of your scripts; they will 
require this same change to run on other systems. For this reason you 
might want to change the code instead. If you give a specific example of 
what is failing I will try to help.

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Timmie
OK, I found out.
> Since it didn't work in IPython as well I assume that I need to change the
> encoding of the IPython shell to UTF-8, too. Still need to find out where.
Put a file called 'sitecustomize.py' into any directory on your PYTHONPATH.

write the following two lines in that file:

import sys
sys.setdefaultencoding('utf-8')

To test, start ipython and type

import sys
sys.getdefaultencoding()

It should be utf-8 now.

Again, this may sound trivial for some. But I started my Python adventures on
Ubuntu Linux, which is set to UTF-8 as the default encoding. Due to some software
environments I am currently forced to use Windows. After installing Python and
some modules with setup.exe I never knew that I even had to care about this...



Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Kent Johnson
Timmie wrote:

>> Get an editor on Windows that can edit UTF-8 text files and file 
>> transfer software that doesn't change the text encoding. Work with UTF-8 
>> exclusively.
> Thanks. This sounds really trivial but the thing is that one cannot define
> file encoding in PythonWin.

Really! That is surprising. Anyone else know how to set the file 
encoding for the PythonWin editor?

> Since it didn't work in IPython as well I assume that I need to change the
> encoding of the IPython shell to UTF-8, too. Still need to find out where.

Was the problem with the print statements? Maybe changing the console 
encoding would help. I have some notes here:
http://personalpages.tds.net/~kent37/stories/00018.html
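If changing the console's code page is not an option, one compromise is to encode explicitly for the console and let unsupported characters degrade to "?". A Python 3 sketch; the helper name `for_console` is made up, and falling back to ASCII when stdout reports no encoding is an assumption.

```python
import sys


def for_console(text, encoding=None):
    """Encode text for a console, replacing characters it cannot display."""
    # Prefer an explicit encoding, then whatever stdout reports, then ASCII.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


coordinate = u"20\u00b0 12' 33''"
print(repr(for_console(coordinate, "cp850")))  # cp850 has a degree sign
print(repr(for_console(coordinate, "ascii")))  # ascii does not: "?" instead
```

In Python 3 such bytes would be written with `sys.stdout.buffer.write` rather than `print`, which expects text.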

> The following code works on windows when saved to a UTF-8 encoded file:
> 
> # -*- coding: utf-8 -*-
> # the file needs to be set to UTF-8 encoding if working on windows
> from easygui import easygui
> raw = unicode("125° 15' 5.55''", 'utf-8')

Again, I think this can be simplified to
   raw = u"125° 15' 5.55''"

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Tim Golden
Timmie wrote:
>>> from easygui import easygui
>>> raw = unicode("121ø 55' 5.55''", 'utf-8')
>>> => gets a encoding error
>> Then your source file is not really in UTF-8.
> This really helped!
> 
>  
>> Get an editor on Windows that can edit UTF-8 text files and file 
>> transfer software that doesn't change the text encoding. Work with UTF-8 
>> exclusively.
> Thanks. This sounds really trivial but the thing is that one cannot define
> file encoding in PythonWin.
> I will have to either use an advanced editor like Notepad++ and run the script
> via console or use Geany as IDE.

I'm sure there'll be lots of other suggestions, but the SciTE
editor (whose name I'm never sure how to pronounce without
blushing) understands the same encoding directive as Python.
It's quite lightweight, and also allows you to run Python scripts
directly, although there are limitations. Worth looking at, anyhow.

http://www.scintilla.org/SciTE.html

TJG


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Timmie
> > from easygui import easygui
> > raw = unicode("121ø 55' 5.55''", 'utf-8')
> > => gets a encoding error
> 
> Then your source file is not really in UTF-8.
This really helped!

 
> Get an editor on Windows that can edit UTF-8 text files and file 
> transfer software that doesn't change the text encoding. Work with UTF-8 
> exclusively.
Thanks. This sounds really trivial but the thing is that one cannot define file
encoding in PythonWin.
I will have to either use an advanced editor like Notepad++ and run the script
via console or use Geany as IDE.

Since it didn't work in IPython as well I assume that I need to change the
encoding of the IPython shell to UTF-8, too. Still need to find out where.

The following code works on windows when saved to a UTF-8 encoded file:

# -*- coding: utf-8 -*-
# the file needs to be set to UTF-8 encoding if working on windows
from easygui import easygui
raw = unicode("125° 15' 5.55''", 'utf-8')
print raw.encode('utf-8')
lines = raw.split(unicode('°', 'utf-8'))
print lines
entertext = easygui.enterbox(message="Enter something.", title="",
                             argDefaultText=raw)
print entertext
degrees = lines[0]
print "degrees: ", str(degrees)

Thanks for your support, so far.
Timmie




Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Kent Johnson
Timmie wrote:
> I tried your advice yesterday evening.
> 
>> And see if you get a ç.
> I see this character.
> 
> from easygui import easygui
> raw = unicode("121ø 55' 5.55''", 'utf-8')
> => gets a encoding error

Then your source file is not really in UTF-8.
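One way to check this diagnosis, sketched in Python 3: a degree sign saved in a single-byte Windows code page such as cp1252 is the lone byte 0xb0 (the same byte as in the original traceback), and that byte on its own is not valid UTF-8. The helper `is_valid_utf8` is made up for illustration.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


degree_line = u"121\u00b0 55' 5.55''"
print(is_valid_utf8(degree_line.encode("cp1252")))  # single byte 0xb0 -> False
print(is_valid_utf8(degree_line.encode("utf-8")))   # bytes 0xc2 0xb0 -> True
```

Running it over the bytes of a `.py` file (opened in `"rb"` mode) gives a quick hint; passing is not proof, since plain ASCII is valid UTF-8 too.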

BTW you can simply say
   raw = u"121ø 55' 5.55''"

> raw = unicode("121ø 55' 5.55''", 'cp1250')
> => this works while coding on windows.
> How do I make it work really cross-platform: On both Linux and Windows?

Get an editor on Windows that can edit UTF-8 text files and file 
transfer software that doesn't change the text encoding. Work with UTF-8 
exclusively.

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-17 Thread Timmie
I tried your advice yesterday evening.

> And see if you get a ç.
I see this character.

from easygui import easygui
raw = unicode("121ø 55' 5.55''", 'utf-8')
=> gets a encoding error

raw = unicode("121ø 55' 5.55''", 'cp1250')
=> this works while coding on windows.
How do I make it work really cross-platform: On both Linux and Windows?

lines = raw.split(unicode('ø', 'cp1250'))
=> again works on windows

print lines
easygui.msgbox(raw)
=> prints a strange symbol instead of "°"

import Tkinter
Tkinter._test()
=> this test shows the expected result.



Re: [Tutor] symbol encoding and processing problem

2007-10-16 Thread Tim Michelsen
> How do you get this output? The print is after the statement causing the 
> traceback. Are you showing the same code as you ran?
Yes.
I created this file in PythonWin and ran it with IPython.

> It displays correctly for me (on MacOS X). Are you sure your source is 
> actually encoded in utf-8?
Not really.
The thing is that I am exchanging my files from Linux (Ubuntu with UTF-8
default) to Windows.
> What platform are you on?
Yesterday I ran the code on windows.



Re: [Tutor] symbol encoding and processing problem

2007-10-16 Thread Kent Johnson
Tim Michelsen wrote:
> Dear list,
> I have encountered a problem with encoding of user input and variables.

> Here's my test script:

Posting without the line numbers would make it easier to try your code.
> 
>  1 #!/usr/bin/env python
>  2 # -*- coding: utf-8 -*-
>  3 from easygui import easygui
>  4 import sys
>  5 #raw = sys.argv[1]
>  6 raw = easygui.enterbox(message="Enter something.", title="", argDefaultText="20° 12' 33''")
>  7 #unicode = unicode(raw)
>  8 #conv = raw.encoding('latin-1')
>  9 split = raw.split('°')

try a unicode string: split = raw.split(u'°')
> 10 out = raw
> 11 print out
> 12 easygui.msgbox(out)
> 
> Here ist my output:
> 
> 20° 12' 33''

How do you get this output? The print is after the statement causing the 
traceback. Are you showing the same code as you ran?

> Traceback (most recent call last):
>   File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
> exec codeObject in __main__.__dict__
>   File "D:\python\scripts\encoding-test.py", line 22, in ?
> split = raw.split('°')
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: 
> ordinal
> not in range(128)
> Traceback (most recent call last):
>   File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
> exec codeObject in __main__.__dict__
>   File "D:\python\scripts\encoding-test.py", line 22, in ?
> split = raw.split('°')
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: 
> ordinal
> not in range(128)
> 
> Therefore my question:
> * How can I split at the "°" without getting a character encoding error?
> * How do I have to encode the "°"-symbol so that it gets correctly displayed in
> the Easygui msgbox at line 6?

It displays correctly for me (on MacOS X). Are you sure your source is 
actually encoded in utf-8? What platform are you on?

Kent


Re: [Tutor] symbol encoding and processing problem

2007-10-16 Thread Evert Rol
   Hi Tim,

> Here's my test script:
>
>  1 #!/usr/bin/env python
>  2 # -*- coding: utf-8 -*-
>  3 from easygui import easygui
>  4 import sys
>  5 #raw = sys.argv[1]
>  6 raw = easygui.enterbox(message="Enter something.", title="", argDefaultText="20° 12' 33''")
>  7 #unicode = unicode(raw)
>  8 #conv = raw.encoding('latin-1')
>  9 split = raw.split('°')
> 10 out = raw
> 11 print out
> 12 easygui.msgbox(out)
>
> Here is my output:
>
> 20° 12' 33''
> Traceback (most recent call last):
>   File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
> exec codeObject in __main__.__dict__
>   File "D:\python\scripts\encoding-test.py", line 22, in ?
> split = raw.split('°')
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in  
> position 0: ordinal
> not in range(128)
> Traceback (most recent call last):
>   File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> line 310, in RunScript
> exec codeObject in __main__.__dict__
>   File "D:\python\scripts\encoding-test.py", line 22, in ?
> split = raw.split('°')
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in  
> position 0: ordinal
> not in range(128)
>
> Therefore my question:
> * How can I split at the "°" without getting a character encoding
> error?
> * How do I have to encode the "°"-symbol so that it gets correctly
> displayed in the Easygui msgbox at line 6?

I don't know all the details about unicode, but here's what works for  
me:

# -*- coding: utf-8 -*-
import easygui
raw = unicode("121° 55' 5.55''", 'utf-8')
print raw.encode('utf-8')
lines = raw.split(unicode('°', 'utf-8'))
print lines
easygui.enterbox(message="Enter something.", title="",
                 argDefaultText=raw)

So you may need to explicitly define the encoding (and encode the  
degree sign in the split argument as well). Google a bit for Python  
and unicode to get some more info, if you didn't do so already. (I'm  
really hoping that with Python 3 all this messy stuff does go away.)

I had no problem with the easygui box, either my way or hardcoding  
the string to argDefaultText as you did. Perhaps it's an underlying  
problem with Tkinter, which may not support unicode on your system? A  
simple test (not sure how definite that would be), is to start up a  
Python cmdline, and do
 >>> import Tkinter
 >>> Tkinter._test()

And see if you get a ç.


Good luck,

   Evert



[Tutor] symbol encoding and processing problem

2007-10-16 Thread Tim Michelsen
Dear list,
I have encountered a problem with encoding of user input and variables.

I want to read in user defined coordinates as a string like: 121° 55' 5.55''
Furthermore I would like to extract the degrees (integer number before the " ° "
sign), the minutes (integer number before the " ' " sign) and the seconds
(floating point number before the " '' " sign).
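The extraction described above can be sketched with a regular expression, assuming the input is already a text (unicode) string. This is Python 3 syntax; `parse_dms`, `to_decimal_degrees`, the pattern, and the choice to accept either `''` or `"` after the seconds are all assumptions for illustration, and the decimal-degree conversion is just one example of "processing".

```python
import re

DEGREE = u"\u00b0"

# Hypothetical pattern: integer degrees, integer minutes, decimal seconds,
# with either two apostrophes or one double quote after the seconds.
DMS_PATTERN = re.compile(
    r"^\s*(\d+)\s*" + DEGREE + r"\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*(?:''|\")\s*$"
)


def parse_dms(text):
    """Split a degrees/minutes/seconds string into (int, int, float)."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError(u"not a DEG\u00b0 MIN' SEC'' coordinate: %r" % (text,))
    degrees, minutes, seconds = match.groups()
    return int(degrees), int(minutes), float(seconds)


def to_decimal_degrees(degrees, minutes, seconds):
    """One possible processing step: convert to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0


print(parse_dms(u"121\u00b0 55' 5.55''"))  # prints: (121, 55, 5.55)
```

Because both the input and the separator are text, no encoding is involved until the result is displayed or saved.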

When reading and processing the degree part I get some errors:

Here's my test script:

 1 #!/usr/bin/env python
 2 # -*- coding: utf-8 -*-
 3 from easygui import easygui
 4 import sys
 5 #raw = sys.argv[1]
 6 raw = easygui.enterbox(message="Enter something.", title="", argDefaultText="20° 12' 33''")
 7 #unicode = unicode(raw)
 8 #conv = raw.encoding('latin-1')
 9 split = raw.split('°')
10 out = raw
11 print out
12 easygui.msgbox(out)

Here is my output:

20° 12' 33''
Traceback (most recent call last):
  File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
line 310, in RunScript
exec codeObject in __main__.__dict__
  File "D:\python\scripts\encoding-test.py", line 22, in ?
split = raw.split('°')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal
not in range(128)
Traceback (most recent call last):
  File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
line 310, in RunScript
exec codeObject in __main__.__dict__
  File "D:\python\scripts\encoding-test.py", line 22, in ?
split = raw.split('°')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal
not in range(128)

Therefore my question:
* How can I split at the "°" without getting a character encoding error?
* How do I have to encode the "°"-symbol so that it gets correctly displayed in
the Easygui msgbox at line 6?

Thanks in advance for answering,
Timmie

