Re: ElementTree cannot parse UTF-8 Unicode?
Hi!

> ...Usenet to transmit it properly

newsgroups (NNTP): yes, it does
usenet: perhaps (that depends on the newsgroups)
clp: no

Michel Claveau
-- http://mail.python.org/mailman/listinfo/python-list
Re: ElementTree cannot parse UTF-8 Unicode?
Fredrik Lundh, Thursday 20 January 2005 05:17, wrote:

> what does it give you on your machine? (looks like wxPython cannot handle Unicode strings, but can that really be true?)

It does support Unicode if it was built to do so...

--
Godoy.
Re: ElementTree cannot parse UTF-8 Unicode?
Jorge Luiz Godoy Filho wrote:

>> what does it give you on your machine? (looks like wxPython cannot handle Unicode strings, but can that really be true?)
>
> It does support Unicode if it was built to do so...

Python has supported Unicode in releases 1.6, 2.0, 2.1, 2.2, 2.3 and 2.4, so you might think that Unicode should be enabled by default in a UI toolkit for Python...

/F
Re: ElementTree cannot parse UTF-8 Unicode?
There is something wrong with the physical file... I downloaded a trial version of XML Spy Home Edition and built an equivalent of the Korean test file. With that file I got past the ElementTree error, and now I am stuck with the wxEditCtrl error.

To build the XML file in the first place I had code that looked like this:

    d=wxFileDialog( self, message="Choose a file", defaultDir=os.getcwd(),
                    defaultFile="", wildcard="*.xml",
                    style=wx.SAVE | wxOVERWRITE_PROMPT | wx.CHANGE_DIR)
    if d.ShowModal() == wx.ID_OK:
        # This returns a Python list of files that were selected.
        paths = d.GetPaths()
        layout = '<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n'
        L1Word = self.t1.GetValue()
        L2Word = 'undefined'
        layout += '<Vocab>\n'
        layout += '<Word L1=\'' + L1Word + '\'></Word>\n'
        layout += '</Vocab>'
        open( paths[0], 'w' ).write(layout)
    d.Destroy()

So apparently there is something wrong with physically constructing the file in this manner?

Thank you,
-Erik
Re: ElementTree cannot parse UTF-8 Unicode?
Erik Bethke wrote:

>     layout += '<Vocab>\n'
>     layout += '<Word L1=\'' + L1Word + '\'></Word>\n'

what does

    print repr(L1Word)

print (that is, what does wxPython return?).

it should be a Unicode string, but that would give you an error when you write it out:

    >>> f = open("file.txt", "w")
    >>> f.write(u'\uc5b4\ub155\ud558\uc138\uc694!')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

have you hacked the default encoding in site/sitecustomize?

what happens if you replace the L1Word term with L1Word.encode("utf-8")?

can you post the repr() (either of what's in your file or of the thing, whatever it is, that wxPython returns...)?

/F
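The failure Fredrik describes, and the `.encode("utf-8")` fix he suggests, can be sketched as follows. This is a Python 3 adaptation, not the Python 2 session from the thread: Python 3 has no implicit ASCII default of the same kind, so the sketch opens the file with `encoding="ascii"` to provoke the same `UnicodeEncodeError`. The helper names `try_ascii_write` and `write_utf8_bytes` and the temporary file path are made up for illustration.

```python
import os
import tempfile

# The Korean greeting used throughout the thread.
text = "\uc5b4\ub155\ud558\uc138\uc694!"


def try_ascii_write(path, value):
    """Write value through an ASCII codec; return the error text, or None on success."""
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(value)
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def write_utf8_bytes(path, value):
    """Encode explicitly, then write the resulting bytes unchanged."""
    with open(path, "wb") as f:
        f.write(value.encode("utf-8"))


path = os.path.join(tempfile.mkdtemp(), "file.txt")
error = try_ascii_write(path, text)   # fails: Hangul is outside ASCII
write_utf8_bytes(path, text)          # succeeds: bytes are written as-is

with open(path, "rb") as f:
    raw = f.read()

print(error)
print(raw)
```

The point of the second helper is that the program, not the file object, decides the encoding, which is what replacing `L1Word` with `L1Word.encode("utf-8")` achieves in the original code.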
Re: ElementTree cannot parse UTF-8 Unicode?
That was a great clue. I am an idiot and tapped on the wrong download link... now I can read and parse the XML file fine, as long as I create it in XML Spy. If I create it by this method:

    d=wxFileDialog( self, message="Choose a file", defaultDir=os.getcwd(),
                    defaultFile="", wildcard="*.xml",
                    style=wx.SAVE | wxOVERWRITE_PROMPT | wx.CHANGE_DIR)
    if d.ShowModal() == wx.ID_OK:
        # This returns a Python list of files that were selected.
        paths = d.GetPaths()
        layout = '<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n'
        L1Word = self.t1.GetValue()
        L2Word = 'undefined'
        layout += '<Vocab>\n'
        layout += '<Word L1=\'' + L1Word + '\'></Word>\n'
        layout += '</Vocab>'
        open( paths[0], 'w' ).write(layout)

I get hung up on the write statement. I am off to look for a Unicode-capable file write, I think...

-Erik
Re: ElementTree cannot parse UTF-8 Unicode?
Woo-hoo! Everything is working now! Thank you everyone!

The TWO problems I had:

1) I needed to save my XML file in the first place with this code:

    f = codecs.open(paths[0], 'w', 'utf8')

2) I needed to download the UNICODE version of wxPython, duh.

So why are there non-UNICODE versions of wxPython??? To save memory or something???

Thank you all!

Best!
-Erik
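A minimal sketch of the first fix, assuming Python 3, where the built-in `open()` accepts an `encoding` argument and plays the role `codecs.open(..., 'w', 'utf8')` plays in the message above. The function names `save_vocab` and `load_l1` and the temporary path are hypothetical, and `quoteattr` is added here to escape the attribute value; the original code concatenated the raw string.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def save_vocab(path, l1_word):
    """Build the small Vocab document and write it out as UTF-8."""
    layout = '<?xml version="1.0" encoding="UTF-8"?>\n'
    layout += "<Vocab>\n"
    # quoteattr adds the surrounding quotes and escapes <, & and quotes.
    layout += "<Word L1=" + quoteattr(l1_word) + "></Word>\n"
    layout += "</Vocab>"
    with open(path, "w", encoding="utf-8") as f:
        f.write(layout)


def load_l1(path):
    """Parse the file and return the L1 attribute of the Word element."""
    root = ET.parse(path).getroot()
    return root.find("Word").get("L1")


demo_path = os.path.join(tempfile.mkdtemp(), "vocab.xml")
save_vocab(demo_path, "\uc5b4\ub155\ud558\uc138\uc694!")
print(ascii(load_l1(demo_path)))
```

The encoding named in the XML declaration and the encoding used to write the file have to agree; here both are UTF-8.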
Re: ElementTree cannot parse UTF-8 Unicode?
Erik Bethke wrote:

> So why are there non-UNICODE versions of wxPython??? To save memory or something???

Win95, Win98 and WinME have problems with Unicode. GTK1 does not support Unicode at all.

--
Jarek Zgoda
http://jpa.berlios.de/ | http://www.zgodowie.org/
Re: ElementTree cannot parse UTF-8 Unicode?
Jarek Zgoda wrote:

>> So why are there non-UNICODE versions of wxPython??? To save memory or something???
>
> Win95, Win98, WinME have problems with unicode.

This problem can be solved: on W9x, wxPython would have to pass all Unicode strings to WideCharToMultiByte, using CP_ACP, and then pass the result to the API function.

Regards,
Martin
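What converting to the ANSI code page means for text like the Korean test string can be illustrated with a rough, portable analogy. The sketch below does not call `WideCharToMultiByte`; it uses Python's own codecs as a stand-in, and the function name `to_ansi` and the default of `cp1252` (a Western European code page) are assumptions chosen for the demonstration.

```python
def to_ansi(text, code_page="cp1252"):
    """Encode text for a legacy 8-bit API; unrepresentable characters become '?'."""
    return text.encode(code_page, errors="replace")


korean = "\uc5b4\ub155\ud558\uc138\uc694!"

print(to_ansi("caf\u00e9"))          # fits in cp1252
print(to_ansi(korean))               # every Hangul syllable is lost
print(to_ansi(korean, "cp949"))      # survives on a Korean code page
```

The conversion is only lossless when the active code page happens to cover the characters in use, which is why it addresses the W9x API problem without making those systems fully Unicode-capable.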
wxPython unicode/ansi builds [was Re: ElementTree cannot parse UTF-8 Unicode?]
Martin v. Löwis wrote:

> Jarek Zgoda wrote:
>
>> So why are there non-UNICODE versions of wxPython??? To save memory or something???

Robin Dunn has an explanation here:

    http://wiki.wxpython.org/index.cgi/UnicodeBuild

... which is the first hit from a Google search on "wxpython unicode build".

Also, from the wxPython downloads page:

    There are two versions of wxPython for each of the supported Python versions on Win32. They are nearly identical, except one of them has been compiled with support for the Unicode version of the platform APIs. If you don't know what that means then you probably don't need the Unicode version, get the ANSI version instead. The Unicode version works best on Windows NT/2000/XP. It will also mostly work on Windows 95/98/Me systems, but it is based on a Microsoft hack called MSLU (or unicows.dll) that translates unicode API calls to ansi API calls, but the coverage of the API is not complete so there are some difficult bugs lurking in there.

Steve
Re: ElementTree cannot parse UTF-8 Unicode?
Erik Bethke wrote:

> I am getting an error of not well-formed at the beginning of the Korean text in the second example. Am I doing something wrong with how I am encoding my Korean? Do I need more of a wrapper about it than simple quotes? Is there some sort of XML syntax for indicating a Unicode string, or does the ElementTree library just not support reading of Unicode?

XML is Unicode, and ElementTree supports all common encodings just fine (including UTF-8).

> this one fails:
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <Vocab>
>     <Word L1="어녕하세요!"></Word>
>     </Vocab>

this works just fine on my machine.

what's the exact error message?

what does

    print repr(open("test2.xml").read())

print on your machine?

what happens if you attempt to parse

    <Vocab>
    <Word L1="&#50612;&#45397;&#54616;&#49464;&#50836;!" />
    </Vocab>

?

/F
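The two test documents in this message should parse to the same attribute value, which a short Python 3 sketch can check. The variable names and the helper `l1_of` are made up for illustration; the documents themselves are the ones quoted above, with the UTF-8 one passed in as bytes so that its encoding declaration is honoured.

```python
import xml.etree.ElementTree as ET

# Document with the Korean text written literally, encoded as UTF-8 bytes.
literal_doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '<Word L1="\uc5b4\ub155\ud558\uc138\uc694!"></Word>\n'
    "</Vocab>"
).encode("utf-8")

# Same content using decimal numeric character references (pure ASCII).
escaped_doc = (
    b"<Vocab>\n"
    b'<Word L1="&#50612;&#45397;&#54616;&#49464;&#50836;!" />\n'
    b"</Vocab>"
)


def l1_of(data):
    """Parse an XML document and return the L1 attribute of its Word element."""
    return ET.fromstring(data).find("Word").get("L1")


print(ascii(l1_of(literal_doc)))
print(l1_of(literal_doc) == l1_of(escaped_doc))
```

If the numeric form parses and the literal form does not, the parser is fine and the bytes in the file are the place to look.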
Re: ElementTree cannot parse UTF-8 Unicode?
Hello Fredrik,

1) The exact error is in line 1160 of self._parser.Parse(data, 0):

    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 16

2) You are right in that the print of the file read works just fine.

3) You are also right in that the digitally encoded Unicode also works fine. However, this solution has two new problems:

   1) The XML file is now not human readable.
   2) After ElementTree gets done parsing it, I am feeding the text to a wx.TextCtrl via .SetValue(), but that is now giving me an error message of being unable to convert that style of string.

So it seems to me that ElementTree is just not expecting to run into the Korean characters, for it is at column 16 that these begin. Am I formatting the XML properly?

Thank you,
-Erik
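One way to get a "not well-formed (invalid token)" error at the first Korean character is a file that declares UTF-8 but actually contains bytes in another encoding. The thread never shows the real bytes of Erik's file, so the Python 3 sketch below is only a plausible reproduction: the choice of `cp949` as the wrong encoding and the names `TEMPLATE` and `parse_error` are assumptions.

```python
import xml.etree.ElementTree as ET

TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Vocab>\n"
    '<Word L1="{word}"></Word>\n'
    "</Vocab>"
)


def parse_error(data):
    """Return the parser's error message for data, or None if it parses."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


doc = TEMPLATE.format(word="\uc5b4\ub155\ud558\uc138\uc694!")

good = parse_error(doc.encode("utf-8"))   # bytes match the declaration
bad = parse_error(doc.encode("cp949"))    # bytes contradict the declaration

print(good)
print(bad)
```

The reported position points at the first byte that is not valid UTF-8, which is consistent with the error landing exactly where the Korean text begins.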
Re: ElementTree cannot parse UTF-8 Unicode?
On Wed, 19 Jan 2005 16:35:23 -0800, Erik Bethke wrote:

> So it seems to me, that ElementTree is just not expecting to run into the Korean characters for it is at column 16 that these begin. Am I formatting the XML properly?

You should post the file somewhere on the web. (I wouldn't expect Usenet to transmit it properly.)

(Just jumping in to possibly save you a reply cycle.)
Re: ElementTree cannot parse UTF-8 Unicode?
Erik Bethke wrote:

> 2) You are right in that the print of the file read works just fine.

but what does it look like? I saved a raw copy of your original mail, fixed the quoted-printable encoding, and got a UTF-8 encoded file that works just fine. the thing you've been parsing, and that you've cut and pasted into your mail, must be different, in some way.

> 3) You are also right in that the digitally encoded unicode also works fine. However, this solution has two new problems:

that was just a test to make sure that your version of elementtree could handle Unicode characters on your platform.

> 1) The xml file is now not human readable
> 2) After ElementTree gets done parsing it, I am feeding the text to a wx.TextCtrl via .SetValue() but that is now giving me an error message of being unable to convert that style of string

on my machine, the L1 attribute contains a Unicode string:

    >>> print repr(root.find("Word").get("L1"))
    u'\uc5b4\ub155\ud558\uc138\uc694!'

what does it give you on your machine? (looks like wxPython cannot handle Unicode strings, but can that really be true?)

> So it seems to me, that ElementTree is just not expecting to run into the Korean characters for it is at column 16 that these begin. Am I formatting the XML properly?

nobody knows...

/F