On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> > It's interesting that the element text attributes after a successful
> > parse do not necessarily have the same type, i.e. all be str or all
> > unicode. I ported some text extraction code from BeautifulSoup (which
> > handles all text as unicode) and I was surprised to find out that in
> > xml.etree the returned text's type is not fixed, even within the same
> > file. Although it's not a bug, having a mixed collection of byte and
> > unicode strings from the same source makes me somewhat uneasy.
>
> If you don't care about memory and execution performance, there are
> plenty of toolkits that guarantee that you always get Unicode strings.

As long as they are documented, both approaches are fine for different
cases. Currently the only reference I found about unicode in ElementTree
is "All strings can either be Unicode strings, or 8-bit strings
containing US-ASCII only." [1], which is rather ambiguous; at least I
read it as "either all strings are Unicode or all strings are 8-bit
strings", not as a potential mix of both in the same tree.

Regards,
George

[1] http://effbot.org/zone/element.htm
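
P.S. For anyone porting similar text extraction code who wants a single string type regardless of what the parser returns, here is a minimal sketch of a normalizing wrapper. The helper name text_as_unicode is hypothetical (it is not part of ElementTree), and the default "ascii" encoding assumes the 8-bit strings really do contain US-ASCII only, as [1] states. Where the parser already returns Unicode text, the helper simply passes it through.

```python
import xml.etree.ElementTree as ET


def text_as_unicode(elem, encoding="ascii"):
    """Return elem.text as a Unicode string (hypothetical helper, not an ElementTree API)."""
    text = elem.text
    if text is None:
        # Elements without text content yield an empty Unicode string.
        return u""
    if isinstance(text, bytes):
        # 8-bit string: assumed to be US-ASCII only, per the statement in [1].
        return text.decode(encoding)
    # Already a Unicode string; return it unchanged.
    return text


# Byte input without an encoding declaration is parsed as UTF-8.
root = ET.fromstring(b"<root><a>plain</a><b>caf\xc3\xa9</b><c/></root>")

# One ASCII-only element, one non-ASCII element, one empty element.
texts = [text_as_unicode(child) for child in root]
```

After this, every entry in texts has the same type, so downstream code no longer has to care which kind of string a given element happened to produce. The cost is the extra decode and copy for the 8-bit strings, which is the memory and speed trade-off mentioned above.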