> so what i understood of all this, is that once you're using unicode
> objects you're safe !
> At least as long as you don't use statements or operators that will
> implicitly try to convert the unicode object back to bytestring using
> your default encoding (ascii) which will most certainly result in codec
> Errors...

Correct.

> Also, minidom seems to use unicode object what was not really documented
> in the python 2.3 doc i've read about it..

It might be somewhat hidden:

http://docs.python.org/lib/dom-type-mapping.html

"DOMString defined in the recommendation is mapped to a Python string or
Unicode string. Applications should be able to handle Unicode whenever a
string is returned from the DOM."

http://docs.python.org/lib/minidom-and-dom.html

"The type DOMString maps to Python strings. xml.dom.minidom supports
either byte or Unicode strings, but will normally produce Unicode
strings. Values of type DOMString may also be None where allowed to have
the IDL null value by the DOM specification from the W3C."

In principle, you should fill Unicode strings into DOM trees all the
time, but it will work with byte strings as well, as long as they are
ASCII.

> As a matter of fact using the following sequence will most certainly fail :
> f = codecs.open('utf8codecs.xml', 'w', 'utf-8')
> f.write(dom.toxml(encoding="utf-8"))
> f.close()

Correct. A codecs.StreamWriter expects Unicode objects, whereas toxml
returns byte strings (at least if you pass an encoding; because of a
bug, it might return a Unicode string otherwise).

> then again maybe this will work, i just thought of it..
> f = codecs.open('utf8codecs.xml', 'w', 'utf-8')
> f.write(dom.toxml())
> f.close()

Yeah, toxml() without an encoding returns a Unicode string because of a
bug, but for backwards compatibility this cannot be changed. People
should explicitly pass an encoding.

> The next important thing is to make sure to use functions and objects
> that support unicode all the way, like minidom seems to do..

Indeed, there are still many functions in the standard library which
don't work with Unicode strings, but should. Some functions, of course,
are only meaningful for byte strings (like the networking API).

Regards,
Martin
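
P.S. A minimal sketch of the "explicitly pass an encoding" advice above,
under a few assumptions: the tiny document and the temporary file
location are made up for illustration, and the code is written so that it
also runs on a current Python, where the byte string is the bytes type.
Since toxml(encoding=...) already returns encoded bytes, the file is
opened in plain binary mode instead of through codecs.open:

```python
import os
import tempfile
from xml.dom import minidom

# Illustrative document with one non-ASCII character (e with acute accent).
dom = minidom.parseString(u"<doc>caf\u00e9</doc>".encode("utf-8"))

# With an explicit encoding, toxml() returns an already-encoded byte string.
data = dom.toxml(encoding="utf-8")

# Byte strings go to a file opened in binary mode; no codecs wrapper is
# involved, so nothing tries to encode the data a second time.
# The directory is a throwaway temp dir (an assumption of this sketch).
path = os.path.join(tempfile.mkdtemp(), "utf8codecs.xml")
f = open(path, "wb")
try:
    f.write(data)
finally:
    f.close()

# Reading the bytes back and decoding them yields a Unicode string again.
f = open(path, "rb")
try:
    round_trip = f.read().decode("utf-8")
finally:
    f.close()
```

The other direction (a Unicode string written through a codecs writer)
also works, but it relies on the toxml() behaviour described above, so
passing the encoding and writing bytes is the safer habit.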