Re: [Tutor] symbol encoding and processing problem
Timmie wrote: > I am totally lost: > * python has ascii as default encoding > * my linux uses UTF-8 (therefore all files created on linux are UTF-8) > * windows uses cp1250 > * IPtyhon something else: on the machine where I am currently on stdin is set > to > cp850 > > So what encoding to I use to display and process characters that exeeed the > standard english alphabet? > My initial question was: > > 1) get a coordinate (DEG° MIN' SEC'') as input from user via easygui > 2) split that string into its subscripts: degrees, minutes and secons > 3) do some processing of the 3 varaibles > 4) print the output with easygui. > > I am not really interested which is the best encoding. I want to know: > * how I do this that I don't get a encoding error? > * how do I code it that the code runs on linux and windows > from file and in IPython I realise that I am running the risk of confusing you further, but I'm afraid that your attitude of "This isn't my problem; it's Python's" isn't really going to wash. If you're going to be using characters which fall outside the realm of 7-bit ASCII you're going to have to get some understanding of how the various input, output and language mechanisms deal with them. And all the more so if you're trying to do this cross-platform. Maybe there's some kind of sealed environment in some other language or operating system which takes care of all of this for you transparently. I wouldn't know. What I do know is that, if you're using the Python interpreter under Windows and Linux and whatever else then you're at the mercy of those operating systems at a certain level. There are at least two points you have to understand: 1) Python needs to know what encoding was used to save a text file which it is compiling to bytecode: usually a .py file. It has a default which you can override in a couple of ways. 
If whatever encoding you've specified turns out not to match the text in, say, a literal string with a degree symbol, then Python will not know what to do and will stop with an exception. Of course how you encoded the file in question is between you and your editor.

2) When you are reading or writing text to or from a console or GUI window or database or PDF or whatever, you also need to know what encoding to use. If you're writing out, then whatever you're writing to will need to be able to make sense of the encoding you're supplying -- and you may need to say which one it was. If you're reading in, you are at the mercy of libraries: some will always return unicode (BeautifulSoup springs to mind), others will return raw bytes leaving it up to you to decode, others will return an encoded string. This is pretty much an historical artefact (or, sadly in some cases, a case of ignorance) and you're going to have to cope with it.

On my Windows box, easygui handles unicode perfectly well, and the console running cp437 displays the degree sign. If it didn't, I'd have to compromise on the display (or use chcp to switch code pages first). To illustrate, the following program works:

import easygui
sample = u"DEG\u00b0 MIN' SEC\""
from_user = easygui.enterbox (u"Enter" + sample)
#
# Paste in values from your email since I can't
# be bothered to work out how to get the degree
# sign
#
print from_user

and from_user is a perfectly good unicode string. Now, if you want to write that out to a file, or a database or what-have-you which can't store unicode natively, then you'll have to encode it, probably as UTF-8, which can encode anything.

For this email, I've used the unicode-escape, but if -- as you did -- you wanted to use the string literal, then you'd need to save the .py file in a certain encoding and to place a line at the top of the file indicating what that encoding was. If you're happy using unicode-escapes then that saves a bit of finicking about.
TJG ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
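The advice above about unicode escapes and explicit encoding on output can be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the Python 2 code used in the thread; the helper names save_text, load_text and roundtrip are made up for the example, and the temporary file stands in for "a file, or a database or what-have-you".

```python
import os
import tempfile

# The escape \u00b0 keeps the source file pure ASCII, so no coding line is needed.
SAMPLE = "DEG\u00b0 MIN' SEC\""


def save_text(path, text, encoding="utf-8"):
    """Write text to path, stating the encoding explicitly (hypothetical helper)."""
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(text)


def load_text(path, encoding="utf-8"):
    """Read text back from path, decoding with the same explicit encoding."""
    with open(path, "r", encoding=encoding, newline="") as handle:
        return handle.read()


def roundtrip(text):
    """Save text to a temporary UTF-8 file; return (raw bytes on disk, reloaded text)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        save_text(path, text)
        with open(path, "rb") as handle:
            raw = handle.read()
        return raw, load_text(path)
    finally:
        os.remove(path)
```

The point of the sketch is only that the encoding is named at the boundary (the open() calls) and nowhere else; everything in between works on decoded text.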
Re: [Tutor] symbol encoding and processing problem
> > I still don't know why there is such a encoding mess on Python. For me > this > > totally neglects the statement that python code is easily portable or > executable > > on other platforms. > > I don't think this is entirely fair. For example at the start you had a > file containing cp1252 data but told Python it was utf-8. You can't > really expect Python to just do the right thing in such circumstances. Here you are right. I was (without knowing it) doeing the wrong thing. Well, my judgement was based on a confusion that I haven't been able to clear so far: At first my file was encoded differently than the python default encoding and I coded even a different encoding in the file head. After I'd adaped all this correctly my code worked well. I tried to use the same code interactively on the IPython shell. As I said from my previous post, the shell uses again a different encoding. Therefore, the same code that worked well from file did fail. When I found out that I could set the default encoding to UTF-8 you warned me of a lesser portability. I am totally lost: * python has ascii as default encoding * my linux uses UTF-8 (therefore all files created on linux are UTF-8) * windows uses cp1250 * IPtyhon something else: on the machine where I am currently on stdin is set to cp850 So what encoding to I use to display and process characters that exeeed the standard english alphabet? > Does you IDL code include non-ascii characters? Perhaps you would do > better if you stuck with ascii for your Python source code, too - there > is no need to include non-ascii characters directly, you can use \x > string escapes to insert them into your strings. My initial question was: 1) get a coordinate (DEG° MIN' SEC'') as input from user via easygui 2) split that string into its subscripts: degrees, minutes and secons 3) do some processing of the 3 varaibles 4) print the output with easygui. I am not really interested which is the best encoding. 
I want to know:
* how do I do this so that I don't get an encoding error?
* how do I write it so that the code runs on Linux and Windows, from a file and in IPython?

Thanks again for your support. I don't want any misunderstandings. You have been a good tutor to me and are patiently helping many users stepping into Python.

Kind regards,
Timmie
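For readers trying to picture why the same degree sign behaves differently on each of the platforms listed above, here is a small sketch in Python 3 syntax (the thread itself uses Python 2). It only uses the standard codecs named in the message; the function names are made up for the example.

```python
DEGREE = "\u00b0"  # the degree sign, written as an escape

# Encodings mentioned in the thread.
ENCODINGS = ("utf-8", "cp1250", "cp1252", "cp850")


def degree_bytes():
    """Return the byte sequence that represents the degree sign in each encoding."""
    return {name: DEGREE.encode(name) for name in ENCODINGS}


def decode_input(raw, encoding):
    """Turn bytes received from a console, file or GUI into text, given their encoding."""
    return raw.decode(encoding)
```

The single character is one byte (0xb0) in the Windows code pages, a different byte (0xf8) in cp850, and two bytes in UTF-8. Reading the cp850 byte as cp1252 gives "ø", which may be why "121ø" shows up elsewhere in this thread. Once the input has been decoded with the encoding it really arrived in, the rest of the program can work on text and never needs to care which platform it is on.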
Re: [Tutor] symbol encoding and processing problem
Timmie wrote:
>> Was the problem with the print statements? Maybe changing the console
>> encoding would help. I have some notes here:
>> http://personalpages.tds.net/~kent37/stories/00018.html
> Thanks, I read it yesterday evening.
>
> I still don't know why there is such an encoding mess in Python. For me this
> totally contradicts the statement that Python code is easily portable or
> executable on other platforms.

I don't think this is entirely fair. For example, at the start you had a file containing cp1252 data but told Python it was utf-8. You can't really expect Python to just do the right thing in such circumstances.

Encoding issues require a certain amount of understanding of the underlying issues. In my experience most programmers find it confusing at first but it eventually sorts itself out.

> For instance, I am also using some IDL code. There I open the IDLE and code
> right away without worrying about the encoding.

Does your IDL code include non-ASCII characters? Perhaps you would do better if you stuck with ASCII for your Python source code, too - there is no need to include non-ASCII characters directly; you can use \x string escapes to insert them into your strings.

Kent
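The mismatch described above (cp1252 data declared as utf-8) can be reproduced in a few lines. This is a sketch in Python 3 syntax rather than the Python 2 of the thread, and try_decode is a made-up helper name.

```python
# The coordinate string as it would be stored by an editor saving in cp1252.
CP1252_BYTES = "125\u00b0 15' 5.55''".encode("cp1252")


def try_decode(raw, declared):
    """Decode raw bytes with the declared encoding; return None if they do not match."""
    try:
        return raw.decode(declared)
    except UnicodeDecodeError:
        return None
```

Decoding the cp1252 bytes as utf-8 fails because the lone byte 0xb0 is not a valid UTF-8 sequence, while decoding with the encoding that was actually used succeeds. Python has no way to guess which of the two the author meant.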
Re: [Tutor] symbol encoding and processing problem
> Was the problem with the print statements? Maybe changing the console
> encoding would help. I have some notes here:
> http://personalpages.tds.net/~kent37/stories/00018.html

Thanks, I read it yesterday evening.

I still don't know why there is such an encoding mess in Python. For me this totally contradicts the statement that Python code is easily portable or executable on other platforms.

For instance, I am also using some IDL code. There I open the IDLE and code right away without worrying about the encoding.

Kind regards,
Timmie
Re: [Tutor] symbol encoding and processing problem
> Just be aware that this affects portability of your scripts; they will
> require this same change to run on other systems. For this reason you
> might want to change the code instead.
> If you give a specific example of what is failing I will try to help.

From the previous posts I learned that I should save the file as UTF-8 encoded and use unicode where possible. At least I hope that I understood this correctly. With that method my initially posted code worked well.

I still have the problem that the same code works on some machines and not on others when I am using IPython. IPython apparently uses its own encoding or the encoding of the underlying platform. How can I avoid this problem?
Re: [Tutor] symbol encoding and processing problem
Evert Rol wrote: >>> >>> print unicode("125° 15' 5.55''", 'utf-8') >>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in >>> position 3: ordinal not in range(128) >> >> This is the same as the first encode error. > > This is the thing I don't get; or only partly: I'm sending a utf-8 > encoded string to print. No, you are sending a unicode string to print. unicode("125° 15' 5.55''", 'utf-8') means the same as "125° 15' 5.55''".decode('utf-8') which is, "create a unicode string from this utf-8-encoded byte string". Once you have decoded to Unicode it is no longer utf-8. > print apparently ignores that, and still tries > to print things using ascii encoding. If I'm correct in that assessment, > then why would print ignore that? print just knows that you want to print a unicode string. stdout is byte-oriented so the unicode chars have to be converted to a byte stream. This is done by encoding with sys.getdefaultencoding(), i.e. print unicode("125° 15' 5.55''", 'utf-8') is the same as print u"125° 15' 5.55''" which is the same as print u"125° 15' 5.55''".encode(sys.getdefaultencoding()) > Ie, use encode('utf-8') where necessary? Yes. > But I did see some examples pass by using > > import sys > sys.setdefaultencoding('utf-8') Yes, that will make the examples pass, it just isn't the recommended solution. > Oh well, in general I tend to play long enough with things like this > that 1) I get it (script) working, and 2) I have a decent feeling (90%) > that I actually understand what is going on, and why other things > failed. Which is roughly where I am now ;-). The key thing is to realize that there are implicit conversions between str and unicode and they will break if the data is not ascii. The best fix is to make the conversions explicit by providing the correct encoding. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
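The explanation above (text must be encoded before it reaches a byte-oriented stream, and ASCII cannot encode the degree sign) can be modelled roughly like this. It is a sketch in Python 3 syntax, where the in-memory io.BytesIO buffer stands in for stdout; write_to_byte_stream is a made-up name and not anything print actually calls.

```python
import io

TEXT = "125\u00b0 15' 5.55''"  # already decoded text, no longer "utf-8"


def write_to_byte_stream(text, encoding):
    """Write text to an in-memory byte stream through the given encoding."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    stream.write(text)   # raises UnicodeEncodeError if the codec cannot encode text
    stream.flush()
    data = buffer.getvalue()
    stream.detach()      # leave the buffer open; we already copied its contents
    return data
```

With encoding="ascii" the write raises UnicodeEncodeError, which mirrors the failing print statements quoted above; with encoding="utf-8" the same text goes through. Making that conversion explicit, rather than relying on a default, is the fix being recommended.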
Re: [Tutor] symbol encoding and processing problem
raw = unicode("125° 15' 5.55''", 'utf-8') >>> Again, I think this can be simplified to >>>raw = u"125° 15' 5.55''" >> It does, but it's getting confusing when I compare the following: >> >>> raw = u"125° 15' 5.55''" >> 125° 15' 5.55'' > > Where does that output come from? sorry, my bad: over-hastily copy of non-existant output. >> >>> print u"125° 15' 5.55''" >> UnicodeEncodeError: 'ascii' codec can't encode characters in >> position 3-4: ordinal not in range(128) > > print must encode unicode strings. It tries to encode them using > the default encoding which doesnt' work because the source is not > ascii. >> >>> print u"125° 15' 5.55''".encode('utf-8') >> 125° 15' 5.55'' > > That is the way to get it to work. > >> >>> print unicode("125° 15' 5.55''") >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in >> position 3: ordinal not in range(128) > > Here the problem is trying to create the unicode string using the > default encoding, again it doesn't work because the source contains > non-ascii characters. > >> >>> print unicode("125° 15' 5.55''", 'utf-8') >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' >> in position 3: ordinal not in range(128) > > This is the same as the first encode error. This is the thing I don't get; or only partly: I'm sending a utf-8 encoded string to print. print apparently ignores that, and still tries to print things using ascii encoding. If I'm correct in that assessment, then why would print ignore that? >> So apart from the errors all being slightly different, is there >> perhaps some difference between the str() and repr() functions >> (looks like repr uses escape backslashes)? > > Right. > >> And checking the default encoding inside the python cmdline, I >> see that my sys module doesn't actually have a setdefaultencoding >> () method; was that something that should have been properly >> configured at compile time? 
The documentation mentions something >> about the site module, but I can't find it there either. > > The setdefaultencoding() function (it's not a method, it is a > module-level function) yes, sorry, got my terminology wrong there. > is removed from the sys module as part of startup (I think by the > site module). That is why you have to call it from > sitecustomize.py. You can also > reload(sys) > to restore it but it's better to write your app so it doesn't > require the default encoding to be changed. Ie, use encode('utf-8') where necessary? But I did see some examples pass by using import sys sys.setdefaultencoding('utf-8') ?? Oh well, in general I tend to play long enough with things like this that 1) I get it (script) working, and 2) I have a decent feeling (90%) that I actually understand what is going on, and why other things failed. Which is roughly where I am now ;-). Evert ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] symbol encoding and processing problem
Evert Rol wrote: >>> raw = unicode("125° 15' 5.55''", 'utf-8') >> Again, I think this can be simplified to >>raw = u"125° 15' 5.55''" > > It does, but it's getting confusing when I compare the following: > > >>> raw = u"125° 15' 5.55''" > 125° 15' 5.55'' Where does that output come from? > > >>> print u"125° 15' 5.55''" > UnicodeEncodeError: 'ascii' codec can't encode characters in position > 3-4: ordinal not in range(128) print must encode unicode strings. It tries to encode them using the default encoding which doesnt' work because the source is not ascii. > > >>> print u"125° 15' 5.55''".encode('utf-8') > 125° 15' 5.55'' That is the way to get it to work. > >>> print unicode("125° 15' 5.55''") > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position > 3: ordinal not in range(128) Here the problem is trying to create the unicode string using the default encoding, again it doesn't work because the source contains non-ascii characters. > >>> print unicode("125° 15' 5.55''", 'utf-8') > UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in > position 3: ordinal not in range(128) This is the same as the first encode error. > So apart from the errors all being slightly different, is there > perhaps some difference between the str() and repr() functions (looks > like repr uses escape backslashes)? Right. > And checking the default encoding inside the python cmdline, I see > that my sys module doesn't actually have a setdefaultencoding() > method; was that something that should have been properly configured > at compile time? The documentation mentions something about the site > module, but I can't find it there either. The setdefaultencoding() function (it's not a method, it is a module-level function) is removed from the sys module as part of startup (I think by the site module). That is why you have to call it from sitecustomize.py. 
You can also reload(sys) to restore it but it's better to write your app so it doesn't require the default encoding to be changed.

Kent
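One possible way to follow that advice, sketched in Python 3 syntax and under the assumption that the output stream exposes an encoding attribute (as sys.stdout normally does): ask the stream what it expects and encode for it explicitly, instead of changing the interpreter-wide default. The function name encode_for and the fallback choice are made up for the example.

```python
import sys


def encode_for(stream, text, fallback="utf-8"):
    """Encode text for the given stream without touching any global default.

    Uses the stream's own encoding when it reports one, otherwise the fallback.
    Characters the target encoding cannot represent are replaced with '?'
    so that output degrades instead of raising.
    """
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding, errors="replace")


def encode_for_stdout(text):
    """Convenience wrapper for the common case of writing to the console."""
    return encode_for(sys.stdout, text)
```

Replacing unencodable characters is a design choice for display purposes only; for data that is stored or exchanged, failing loudly is usually the safer behaviour.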
Re: [Tutor] symbol encoding and processing problem
>> raw = unicode("125° 15' 5.55''", 'utf-8')
>
> Again, I think this can be simplified to
> raw = u"125° 15' 5.55''"

It does, but it's getting confusing when I compare the following:

>>> raw = u"125° 15' 5.55''"
125° 15' 5.55''
>>> print u"125° 15' 5.55''"
UnicodeEncodeError: 'ascii' codec can't encode characters in position 3-4: ordinal not in range(128)
>>> print u"125° 15' 5.55''".encode('utf-8')
125° 15' 5.55''
>>> print unicode("125° 15' 5.55''")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 3: ordinal not in range(128)
>>> print unicode("125° 15' 5.55''", 'utf-8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 3: ordinal not in range(128)

So apart from the errors all being slightly different, is there perhaps some difference between the str() and repr() functions (looks like repr uses escape backslashes)? Or does it simply have to do with my locale, which is set to the default "C" (terminal = standard Mac OS X terminal, with UTF-8 encoding)? Although that wouldn't explain to me why the third statement works.

And checking the default encoding inside the python cmdline, I see that my sys module doesn't actually have a setdefaultencoding() method; was that something that should have been properly configured at compile time? The documentation mentions something about the site module, but I can't find it there either.

Any enlightenment on this is welcome.

Evert
Re: [Tutor] symbol encoding and processing problem
Timmie wrote:
> OK, I found out.
>> Since it didn't work in IPython as well I assume that I need to change the
>> encoding of the IPython shell to UTF-8, too. Still need to find out where.
> Put a file called 'sitecustomize.py' into any directory on your PYTHONPATH.
>
> Write the following two lines in that file:
>
> import sys
> sys.setdefaultencoding('utf-8')

Just be aware that this affects portability of your scripts; they will require this same change to run on other systems. For this reason you might want to change the code instead.

If you give a specific example of what is failing I will try to help.

Kent
Re: [Tutor] symbol encoding and processing problem
OK, I found out.

> Since it didn't work in IPython as well I assume that I need to change the
> encoding of the IPython shell to UTF-8, too. Still need to find out where.

Put a file called 'sitecustomize.py' into any directory on your PYTHONPATH.

Write the following two lines in that file:

import sys
sys.setdefaultencoding('utf-8')

To test, start ipython and type

import sys
sys.getdefaultencoding()

It should be utf-8 now.

Again, this may sound trivial to some. But I started my Python adventures on Ubuntu Linux, which is set to UTF-8 as default encoding. Due to some software environments I am currently forced to use Windows. After installing Python and some modules with setup.exe I never knew that I even had to care about this...
Re: [Tutor] symbol encoding and processing problem
Timmie wrote:
>> Get an editor on Windows that can edit UTF-8 text files and file
>> transfer software that doesn't change the text encoding. Work with UTF-8
>> exclusively.
> Thanks. This sounds really trivial but the thing is that one cannot define
> the file encoding in PythonWin.

Really! That is surprising. Anyone else know how to set the file encoding for the PythonWin editor?

> Since it didn't work in IPython as well I assume that I need to change the
> encoding of the IPython shell to UTF-8, too. Still need to find out where.

Was the problem with the print statements? Maybe changing the console encoding would help. I have some notes here:
http://personalpages.tds.net/~kent37/stories/00018.html

> The following code works on Windows when saved to a UTF-8 encoded file:
>
> # -*- coding: utf-8 -*-
> # the file needs to be set to UTF-8 encoding if working on windows
> from easygui import easygui
> raw = unicode("125° 15' 5.55''", 'utf-8')

Again, I think this can be simplified to
raw = u"125° 15' 5.55''"

Kent
Re: [Tutor] symbol encoding and processing problem
Timmie wrote:
>>> from easygui import easygui
>>> raw = unicode("121ø 55' 5.55''", 'utf-8')
>>> => gets an encoding error
>> Then your source file is not really in UTF-8.
> This really helped!
>
>> Get an editor on Windows that can edit UTF-8 text files and file
>> transfer software that doesn't change the text encoding. Work with UTF-8
>> exclusively.
> Thanks. This sounds really trivial but the thing is that one cannot define
> the file encoding in PythonWin.
> I will have to either use an advanced editor like Notepad++ and run the script
> via the console or use Geany as IDE.

I'm sure there'll be lots of other suggestions, but the SciTE editor (whose name I'm never sure how to pronounce without blushing) understands the same encoding directive as Python. It's quite lightweight, and also allows you to run Python scripts directly, although there are limitations. Worth looking at, anyhow.

http://www.scintilla.org/SciTE.html

TJG
Re: [Tutor] symbol encoding and processing problem
> > from easygui import easygui
> > raw = unicode("121ø 55' 5.55''", 'utf-8')
> > => gets an encoding error
>
> Then your source file is not really in UTF-8.

This really helped!

> Get an editor on Windows that can edit UTF-8 text files and file
> transfer software that doesn't change the text encoding. Work with UTF-8
> exclusively.

Thanks. This sounds really trivial but the thing is that one cannot define the file encoding in PythonWin. I will have to either use an advanced editor like Notepad++ and run the script via the console or use Geany as IDE.

Since it didn't work in IPython as well I assume that I need to change the encoding of the IPython shell to UTF-8, too. Still need to find out where.

The following code works on Windows when saved to a UTF-8 encoded file:

# -*- coding: utf-8 -*-
# the file needs to be set to UTF-8 encoding if working on windows
from easygui import easygui
raw = unicode("125° 15' 5.55''", 'utf-8')
print raw.encode('utf-8')
lines = raw.split(unicode('°', 'utf-8'))
print lines
entertext = easygui.enterbox(message="Enter something.", title="", argDefaultText=raw)
print entertext
degrees = lines[0]
print "degrees: ", str(degrees)

Thanks for your support, so far.

Timmie
Re: [Tutor] symbol encoding and processing problem
Timmie wrote:
> I tried your advice yesterday evening.
>
>> And see if you get a ç.
> I see this character.
>
> from easygui import easygui
> raw = unicode("121ø 55' 5.55''", 'utf-8')
> => gets an encoding error

Then your source file is not really in UTF-8.

BTW you can simply say
raw = u"121ø 55' 5.55''"

> raw = unicode("121ø 55' 5.55''", 'cp1250')
> => this works while coding on Windows.
> How do I make it work really cross-platform: on both Linux and Windows?

Get an editor on Windows that can edit UTF-8 text files and file transfer software that doesn't change the text encoding. Work with UTF-8 exclusively.

Kent
Re: [Tutor] symbol encoding and processing problem
I tried your advice yesterday evening.

> And see if you get a ç.

I see this character.

from easygui import easygui
raw = unicode("121ø 55' 5.55''", 'utf-8')
=> gets an encoding error

raw = unicode("121ø 55' 5.55''", 'cp1250')
=> this works while coding on Windows.

How do I make it work really cross-platform: on both Linux and Windows?

lines = raw.split(unicode('ø', 'cp1250'))
=> again works on Windows

print lines
easygui.msgbox(raw)
=> prints a strange symbol instead of "°"

import Tkinter
Tkinter._test()
=> this test shows the expected result.
Re: [Tutor] symbol encoding and processing problem
> How do you get this output? The print is after the statement causing the
> traceback. Are you showing the same code as you ran?

Yes. I created this file in PythonWin and ran it with IPython.

> It displays correctly for me (on MacOS X). Are you sure your source is
> actually encoded in utf-8?

Not really. The thing is that I am exchanging my files between Linux (Ubuntu with UTF-8 default) and Windows.

> What platform are you on?

Yesterday I ran the code on Windows.
Re: [Tutor] symbol encoding and processing problem
Tim Michelsen wrote: > Dear list, > I have encountered a problem with encoding of user input and variables. > Heres my test script: Posting without the line numbers would make it easier to try your code. > > 1 #!/usr/bin/env python > 2 # -*- coding: utf-8 -*- > 3 from easygui import easygui > 4 import sys > 5 #raw = sys.argv[1] > 6 raw = easygui.enterbox(message="Enter something.", title="", > argDefaultText="20° 12' 33''") > 7 #unicode = unicode(raw) > 8 #conv = raw.encoding('latin-1') > 9 split = raw.split('°') try a unicode string: split = raw.split(u'°') > 10 out = raw > 11 print out > 12 easygui.msgbox(out) > > Here ist my output: > > 20° 12' 33'' How do you get this output? The print is after the statement causing the traceback. Are you showing the same code as you ran? > Traceback (most recent call last): > File > "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", > line 310, in RunScript > exec codeObject in __main__.__dict__ > File "D:\python\scripts\encoding-test.py", line 22, in ? > split = raw.split('°') > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: > ordinal > not in range(128) > Traceback (most recent call last): > File > "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", > line 310, in RunScript > exec codeObject in __main__.__dict__ > File "D:\python\scripts\encoding-test.py", line 22, in ? > split = raw.split('°') > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: > ordinal > not in range(128) > > Therefore my question: > * How can I split at the "°" without getting a charater encoding error? > * How do I have to encode the "°"-symbol that it gets correctly displayed in > the > Easygui msgbox at line 6? It displays correctly for me (on MacOS X). Are you sure your source is actually encoded in utf-8? What platform are you on? Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
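The suggestion above (split with a unicode separator) comes down to making sure the string and the separator are the same kind of object. A rough sketch of that idea in Python 3 syntax, where mixing bytes and text raises a TypeError instead of triggering an implicit ASCII conversion; to_text and split_degrees are made-up helper names, and the UTF-8 default is an assumption about the input.

```python
DEGREE = "\u00b0"


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def split_degrees(raw, encoding="utf-8"):
    """Split a coordinate at the degree sign, whatever type the input has."""
    return to_text(raw, encoding).split(DEGREE)
```

Normalising once at the point of input means every later operation (split, comparison, concatenation) sees text on both sides, which is the situation the unicode separator creates in the Python 2 code.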
Re: [Tutor] symbol encoding and processing problem
Hi Tim, > Heres my test script: > > 1 #!/usr/bin/env python > 2 # -*- coding: utf-8 -*- > 3 from easygui import easygui > 4 import sys > 5 #raw = sys.argv[1] > 6 raw = easygui.enterbox(message="Enter something.", title="", > argDefaultText="20° 12' 33''") > 7 #unicode = unicode(raw) > 8 #conv = raw.encoding('latin-1') > 9 split = raw.split('°') > 10 out = raw > 11 print out > 12 easygui.msgbox(out) > > Here ist my output: > > 20° 12' 33'' > Traceback (most recent call last): > File "C:\python24\Lib\site-packages\pythonwin\pywin\framework > \scriptutils.py", > line 310, in RunScript > exec codeObject in __main__.__dict__ > File "D:\python\scripts\encoding-test.py", line 22, in ? > split = raw.split('°') > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in > position 0: ordinal > not in range(128) > Traceback (most recent call last): > File "C:\python24\Lib\site-packages\pythonwin\pywin\framework > \scriptutils.py", > line 310, in RunScript > exec codeObject in __main__.__dict__ > File "D:\python\scripts\encoding-test.py", line 22, in ? > split = raw.split('°') > UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in > position 0: ordinal > not in range(128) > > Therefore my question: > * How can I split at the "°" without getting a charater encoding > error? > * How do I have to encode the "°"-symbol that it gets correctly > displayed in the > Easygui msgbox at line 6? I don't know all the details about unicode, but here's what works for me: # -*- coding: utf-8 -*- import easygui raw = unicode("121° 55' 5.55''", 'utf-8') print raw.encode('utf-8') lines = raw.split(unicode('°', 'utf-8')) print lines easygui.enterbox(message="Enter something.", title="", argDefaultText=raw) So you may need to explicitly define the encoding (and encode the degree sign in the split argument as well). Google a bit for Python and unicode to get some more info, if you didn't do so already. (I'm really hoping that with Python 3 all this messy stuff does go away.) 
I had no problem with the easygui box, either my way or hardcoding the string to argDefaultText as you did. Perhaps it's an underlying problem with Tkinter, which may not support unicode on your system? A simple test (not sure how conclusive that would be) is to start up a Python cmdline, and do

>>> import Tkinter
>>> Tkinter._test()

And see if you get a ç.

Good luck,

Evert
[Tutor] symbol encoding and processing problem
Dear list,

I have encountered a problem with encoding of user input and variables.

I want to read in user defined coordinates as a string like:

121° 55' 5.55''

Furthermore I would like to extract the degrees (integer number before the " ° " sign), the minutes (integer number before the " ' " sign) and the seconds (floating point number before the " '' " sign).

When reading and processing the degree part I get some errors.

Here's my test script:

 1 #!/usr/bin/env python
 2 # -*- coding: utf-8 -*-
 3 from easygui import easygui
 4 import sys
 5 #raw = sys.argv[1]
 6 raw = easygui.enterbox(message="Enter something.", title="", argDefaultText="20° 12' 33''")
 7 #unicode = unicode(raw)
 8 #conv = raw.encoding('latin-1')
 9 split = raw.split('°')
10 out = raw
11 print out
12 easygui.msgbox(out)

Here is my output:

20° 12' 33''
Traceback (most recent call last):
  File "C:\python24\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "D:\python\scripts\encoding-test.py", line 22, in ?
    split = raw.split('°')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 0: ordinal not in range(128)

Therefore my questions:
* How can I split at the "°" without getting a character encoding error?
* How do I have to encode the "°" symbol so that it gets correctly displayed in the Easygui msgbox at line 6?

Thanks in advance for answering,
Timmie
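For the extraction step described in this message (degrees before "°", minutes before "'", seconds before "''"), one possible approach is a regular expression applied to already-decoded text. This is a sketch in Python 3 syntax rather than the Python 2 used in the thread; the names parse_dms and to_decimal_degrees are made up, and the pattern assumes non-negative values written exactly in the form shown in the message.

```python
import re

# Degree sign written as an escape so this source stays pure ASCII.
DMS_PATTERN = re.compile(
    r"\s*(\d+)\s*" + "\u00b0" + r"\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*''\s*"
)


def parse_dms(text):
    """Split a string like 121(deg) 55' 5.55'' into (degrees, minutes, seconds)."""
    match = DMS_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not a DEG/MIN/SEC coordinate: %r" % (text,))
    degrees, minutes, seconds = match.groups()
    return int(degrees), int(minutes), float(seconds)


def to_decimal_degrees(degrees, minutes, seconds):
    """Example of further processing: convert to decimal degrees."""
    return degrees + minutes / 60.0 + seconds / 3600.0
```

The parsing itself never mentions an encoding: it expects text, so any decoding of user input has to happen before parse_dms is called.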