On Tue, 03 Nov 2009 21:01:46 -0300, Kee Nethery <k...@kagi.com> wrote:

I'm having an issue with ElementTree's XML() in Python 2.6.4.

This code works fine:

      from xml.etree import ElementTree as et
      getResponse = u'''<?xml version="1.0" encoding="UTF-8"?> <customer><shipping><state>bobble</state><city>head</city><street>city</street></shipping></customer>'''
      theResponseXml = et.XML(getResponse)

This code fails when it reaches the et.XML() call:

      from xml.etree import ElementTree as et
      getResponse = u'''<?xml version="1.0" encoding="UTF-8"?> <customer><shipping><state>\ue58d83\ue89189\ue79c8C</state><city> \ue69f8f\ue5b882</city><street>\ue9ab98\ue58d97\ue58fb03</street></shipping></customer>'''
      theResponseXml = et.XML(getResponse)

In my real code, I'm pulling the getResponse data from a web page that returns XML, and when I display it in the browser I can see the Japanese characters in the data. I've stripped everything else out of my code to distill it down to just what is failing; hopefully I have not removed anything essential.

Why is this not working, and what do I need to do to use ElementTree with Unicode?

In Python 2, et.XML() expects bytes (a str object) as input, not a unicode object. You're decoding too early: decoding early is usually good practice, but not in this case, because et reads the XML declaration and does the decoding for you. Either pass et.XML() the bytes as received, before decoding them, or re-encode the received XML text as UTF-8 (the encoding declared in the XML header) before parsing it.
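A minimal sketch of both fixes, assuming the document really is UTF-8 as its declaration says. SAMPLE_TEXT, parse_xml and its declared_encoding parameter are hypothetical names made up for illustration, the sample text uses \u escapes for CJK characters rather than the real data, and io.BytesIO only stands in for the web response. It should run unchanged on Python 2.6+ and on Python 3.

```python
import io
from xml.etree import ElementTree as et

# Hypothetical sample response; the \u escapes are CJK characters standing
# in for the real Japanese address data.
SAMPLE_TEXT = (u'<?xml version="1.0" encoding="UTF-8"?>'
               u'<customer><shipping>'
               u'<state>\u5343\u8449\u770c</state>'
               u'<city>\u67cf\u5e02</city>'
               u'<street>\u9ad8\u5357\u53f03</street>'
               u'</shipping></customer>')


def parse_xml(data, declared_encoding='utf-8'):
    """Parse an XML document given either raw bytes or decoded text.

    Assumption: decoded text is re-encoded with the encoding named in the
    document's XML declaration (UTF-8 here), so the bytes the parser sees
    agree with what the declaration announces.
    """
    if isinstance(data, bytes):
        # Raw bytes: the parser reads the declaration and decodes them itself.
        return et.XML(data)
    # Already-decoded text: turn it back into bytes before parsing.
    return et.XML(data.encode(declared_encoding))


# Option 1: hand over the bytes exactly as received (simulated here).
raw_bytes = SAMPLE_TEXT.encode('utf-8')
root_from_bytes = parse_xml(raw_bytes)

# The same idea for a file-like byte stream such as an HTTP response;
# io.BytesIO stands in for the network connection.
root_from_stream = et.parse(io.BytesIO(raw_bytes)).getroot()

# Option 2: re-encode text that was decoded too early.
root_from_text = parse_xml(SAMPLE_TEXT)

print(root_from_text.find('shipping/city').text == u'\u67cf\u5e02')  # True
```

Passing the undecoded bytes is the more robust choice because the parser stays in charge of decoding; re-encoding is only safe when the encoding you pick matches the one in the XML declaration.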

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
