On Aug 24, 1:12 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> George Sakkis wrote:
> > On Aug 21, 1:48 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> >
> >> George Sakkis wrote:
> >>> It's interesting that the element text attributes after a successful
> >>> parse do not necessarily have the same type, i.e. all be str or all
> >>> unicode. I ported some text extraction code from BeautifulSoup (which
> >>> handles all text as unicode) and I was surprized to find out that in
> >>> xml.etree the returned text's type is not fixed, even within the same
> >>> file. Although it's not a bug, having a mixed collection of byte and
> >>> unicode strings from the same source makes me somewhat uneasy.
> >> If you don't care about memory and execution performance, there are
> >> plenty of toolkits that guarantee that you always get Unicode strings.
> >
> > As long as they are documented, both approaches are fine for different
> > cases. Currently the only reference I found about unicode in
> > ElementTree is "All strings can either be Unicode strings, or 8-bit
> > strings containing US-ASCII only." [1], which is rather ambiguous
>
> It's not ambiguous in Py2.x, where ASCII byte strings and unicode strings are
> compatible. No need to feel "uneasy". :)

It depends on what you mean by "compatible"; e.g. you can't safely do
[s.decode('utf8') for s in strings] if you have byte strings mixed
with unicode.

George
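
To make the point concrete: in Py2.x, calling .decode('utf8') on a unicode
object first encodes it implicitly with the ASCII codec, so it raises
UnicodeEncodeError as soon as the text contains a non-ASCII character. Below
is a minimal sketch of one way around it, decoding only the values that are
actually byte strings. The helper name to_text and the sample list are made
up for illustration, and the code assumes Python 2.6+ or 3.x (where the name
bytes exists); it is not something ElementTree provides.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it is a byte string."""
    if value is None:
        # Element.text can be None when an element has no text.
        return None
    if isinstance(value, bytes):
        # Byte string (plain str on Python 2): decode it explicitly.
        return value.decode(encoding)
    # Already a text string: pass it through untouched.
    return value


# Hypothetical mix of values, similar to what a parse might hand back.
strings = [b"plain ascii", u"caf\u00e9", None]
texts = [to_text(s) for s in strings]
```

With this, every non-None item in texts has the same text type, so later
code does not need to care which type the parser returned for a given
element.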