C. Benson Manica wrote:

> I have the following simple script running on 2.5.2 on a machine where
> the default character encoding is "ascii":
>
> #!/usr/bin/env python
> #coding: utf-8
>
> import xml.dom.minidom
> import codecs
>
> str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
> doc=xml.dom.minidom.parseString( str )
> xml=doc.toxml( encoding="utf-8" )
> file=codecs.open( "foo.xml", "w", "utf-8" )
> file.write( xml )
> file.close()
>
> I've specified utf-8 every place I can find that the documentation
> allows me to, and yet this doesn't even come close to working without
> UnicodeEncodeErrors. What on Earth do I have to do to please the
> character encoding gods?
Verify every step as you proceed?

>>> import xml.dom.minidom
>>> s = u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
>>> doc = xml.dom.minidom.parseString(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.5/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 62: ordinal not in range(128)

It seems that parseString() doesn't like unicode -- let's try a byte string then:

>>> doc = xml.dom.minidom.parseString(s.encode("utf-8"))
>>> xml = doc.toxml(encoding="utf-8")

No complaints -- let's have a look at the result:

>>> xml
'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\xc3\xb3"/></elements>'

That's a byte string, no need for codecs.open() then:

>>> f = open("foo.xml", "w")
>>> f.write(xml)
>>> f.close()

Peter
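
P.S. Put together as a script, the steps above might look roughly like the sketch below. The helper name write_utf8_xml and the temporary output directory are my own assumptions for illustration, not anything from your script; the ó is written as the escape \xf3 so the example does not depend on a source coding line, and the file is opened in binary mode because the data is already encoded.

```python
import os
import tempfile
import xml.dom.minidom


def write_utf8_xml(text, path):
    """Parse a unicode XML string and write it to path as UTF-8 bytes.

    Hypothetical helper, named here only for illustration.
    """
    # Encode first, so that parseString() is handed a byte string.
    doc = xml.dom.minidom.parseString(text.encode("utf-8"))
    # With an explicit encoding, toxml() returns an already-encoded byte string.
    data = doc.toxml(encoding="utf-8")
    # Binary mode: the bytes are written as they are, so codecs.open() is not needed.
    f = open(path, "wb")
    try:
        f.write(data)
    finally:
        f.close()
    return data


source = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\xf3"/></elements>'
# Assumption: write into a scratch directory rather than the current one.
out_path = os.path.join(tempfile.mkdtemp(), "foo.xml")
written = write_utf8_xml(source, out_path)
```

The point of the design is that there is exactly one encoding step on the way in (text.encode) and one on the way out (toxml with an encoding), and nothing after that tries to encode the result a second time.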