Re: iterparse and unicode

George Sakkis Thu, 21 Aug 2008 04:36:29 -0700

On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:

> George Sakkis wrote:
> > It's interesting that the element text attributes after a successful
> > parse do not necessarily have the same type, i.e. all be str or all
> > unicode. I ported some text extraction code from BeautifulSoup (which
> > handles all text as unicode) and I was surprised to find out that in
> > xml.etree the returned text's type is not fixed, even within the same
> > file. Although it's not a bug, having a mixed collection of byte and
> > unicode strings from the same source makes me somewhat uneasy.
>
> If you don't care about memory and execution performance, there are
> plenty of toolkits that guarantee that you always get Unicode strings.


As long as they are documented, both approaches are fine for different
cases. Currently the only reference I found about unicode in
ElementTree is "All strings can either be Unicode strings, or 8-bit
strings containing US-ASCII only." [1], which is rather ambiguous; at
least I read it as "all strings are Unicode or all strings are 8-bit
strings", not a potential mix of both in the same tree.
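
To make the mixed-type case concrete, here is a minimal sketch (not
taken from the ElementTree docs) that records which types show up for
element text during an iterparse run and coerces them to a single type.
The helper names text_types, to_unicode and iter_unicode_text, and the
sample document, are made up for illustration. On Python 2, ASCII-only
text comes back as str and anything else as unicode, so the set can hold
both; on Python 3 all text is already str and the coercion is a no-op.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample: one ASCII-only element and one with a non-ASCII character.
SAMPLE = (b'<?xml version="1.0" encoding="utf-8"?>'
          b'<root><a>plain</a><b>caf\xc3\xa9</b></root>')

def text_types(source):
    """Collect the type names of every non-empty element.text seen by iterparse."""
    seen = set()
    for _event, elem in ET.iterparse(source):
        if elem.text is not None:
            seen.add(type(elem.text).__name__)
    return seen

def to_unicode(text, encoding="ascii"):
    """Return text as a Unicode string; 8-bit strings are decoded as US-ASCII."""
    if isinstance(text, bytes):
        return text.decode(encoding)
    return text

def iter_unicode_text(source):
    """Yield every non-empty element.text, always as a Unicode string."""
    for _event, elem in ET.iterparse(source):
        if elem.text is not None:
            yield to_unicode(elem.text)
```

Calling text_types(BytesIO(SAMPLE)) shows which types a given
interpreter hands back, and iter_unicode_text(BytesIO(SAMPLE)) gives a
uniform stream regardless. Decoding as US-ASCII is safe here only
because the quoted sentence from [1] says 8-bit strings contain US-ASCII
only.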

Regards,
George

[1] http://effbot.org/zone/element.htm
--
http://mail.python.org/mailman/listinfo/python-list
