Torsten Bronger wrote:
> Hello there!

And back to you!

> Stefan Behnel writes:
>
>> Torsten Bronger wrote:
>>
>>> [...]
>>>
>>> My problem is that if there is only ASCII, these methods return
>>> ordinary strings instead of unicode.  So sometimes I get str,
>>> sometimes I get unicode.  Can one change this globally so that
>>> they only return unicode?
>> That's a convenience measure to reduce memory and processing
>> overhead.
>
> But is this really worth the inconsistency of having partly str and
> partly unicode, given that the common origin is unicode XML data?

Yes. It makes no difference in almost all use cases, as long as you
assume Py2 string handling semantics. In Py3, you will always get
Unicode strings anyway.

>> Could you explain why this is a problem for you?
>
> I feed ElementTree's output to functions in the unicodedata module.
> And they want unicode input.  While it's not a big deal to write
> e.g. unicodedata.category(unicode(my_character)), I find this rather
> wasteful.

I just looked at the code. It seems that you can use your own
XMLTreeBuilder subclass and override the "._fixtext()" method like
this:

    def _fixtext(self, text):
        return text

Then pass an instance of that as "parser" when parsing in ElementTree.
That should do the trick.

Stefan
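
For the Py3 case mentioned above, here is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree. The sample document and the variable names are made up for illustration, and the "._fixtext()" override from the message is not exercised here. It only shows that element text comes back as str (Unicode) whether or not it is pure ASCII, so it can go straight into unicodedata.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample document: one ASCII-only element, one with non-ASCII text.
SAMPLE = "<doc><a>abc</a><b>\u00e4\u00f6</b></doc>"

root = ET.fromstring(SAMPLE)

# Under Python 3, element text is str (Unicode) for both elements.
text_types = {child.tag: type(child.text) for child in root}

# Characters can therefore be passed to unicodedata without any conversion.
categories = {
    child.tag: [unicodedata.category(ch) for ch in child.text]
    for child in root
}

print(text_types)
print(categories)
```

Under Py2 the first element would typically come back as a plain str, which is the inconsistency discussed in the thread.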