C. Benson Manica, 21.04.2010 19:19:
I have the following simple script running on 2.5.2 on a machine where
the default character encoding is "ascii":
#!/usr/bin/env python
#coding: utf-8
import xml.dom.minidom
import codecs
str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc=xml.dom.minidom.parseString( str )
xml=doc.toxml( encoding="utf-8" )
file=codecs.open( "foo.xml", "w", "utf-8" )
file.write( xml )
file.close()
You are trying to re-encode an already encoded output string here.
toxml(encoding="utf-8") returns a byte string. If you pass that to an
encoding file object (as returned by codecs.open()), which expects unicode
input, it cannot re-encode the already encoded bytes. In Python 2.x the
byte string is first implicitly decoded with the default "ascii" codec, which
fails with a confusing UnicodeDecodeError on the first non-ASCII byte.
Python 3 raises a TypeError instead, which is much easier to understand.
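To see the failure in isolation, here is a minimal sketch. It assumes an in-memory io.BytesIO target instead of a real file, and the names "encoded", "writer" and "error" are only illustrative. The writer is the same kind of encoding wrapper that codecs.open() puts in front of the file.

```python
import codecs
import io

# Already-encoded bytes, standing in for what toxml(encoding="utf-8") returns.
encoded = u"\u00f3".encode("utf-8")

# An encoding writer like the one codecs.open() uses, backed by memory
# (assumption: BytesIO replaces the real file for this demonstration).
writer = codecs.getwriter("utf-8")(io.BytesIO())

error = None
try:
    writer.write(encoded)  # the writer expects unicode text, not bytes
except (TypeError, UnicodeDecodeError) as exc:
    # TypeError on Python 3, UnicodeDecodeError on Python 2
    error = exc

print(type(error).__name__)
```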
So the right solution is to let toxml() do the encoding and drop the use of
codecs.open() in favour of
f = open("foo.xml", "wb")
(mind the 'b' in the file mode, which stands for 'bytes' or 'binary')
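Put together, the fixed script could look roughly like this. It is only a sketch: the input is passed to parseString() as UTF-8 bytes so it behaves the same on Python 2 and 3, and the temporary directory and the names "source", "data" and "path" are my own choices, not part of your script.

```python
# -*- coding: utf-8 -*-
import os
import tempfile
import xml.dom.minidom

# The document from the original script, encoded to UTF-8 bytes up front.
source = (u'<?xml version="1.0" encoding="utf-8"?>'
          u'<elements><elem attrib="\u00f3"/></elements>').encode("utf-8")

doc = xml.dom.minidom.parseString(source)

# toxml() does the encoding and returns a byte string.
data = doc.toxml(encoding="utf-8")

# Assumption: write into a temporary directory rather than the current one.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")

# Plain binary file, no codecs.open(): the bytes are written unchanged.
with open(path, "wb") as f:
    f.write(data)
```

If you would rather keep codecs.open(), the alternative is to call toxml() without the encoding argument so that it returns unicode, and let the file object do the encoding.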
Stefan