C. Benson Manica, 21.04.2010 19:19:
I have the following simple script running on 2.5.2 on a machine where
the default character encoding is "ascii":

#!/usr/bin/env python
#coding: utf-8

import xml.dom.minidom
import codecs

str=u"<?xml version=\"1.0\" encoding=\"utf-8\"?><elements><elem attrib=\"ó\"/></elements>"
doc=xml.dom.minidom.parseString( str )
xml=doc.toxml( encoding="utf-8" )
file=codecs.open( "foo.xml", "w", "utf-8" )
file.write( xml )
file.close()

You are trying to re-encode an already encoded string here. toxml(encoding="utf-8") returns a byte string. If you pass that to an encoding file object (as returned by codecs.open()), which expects unicode input, it will fail to re-encode the already encoded string. In Python 2.x this gives a bizarre error, because the byte string is first implicitly decoded with the default "ascii" codec, which chokes on the non-ASCII bytes. In Python 3 you get an understandable one.

So the right solution is to let toxml() do the encoding and drop the use of codecs.open() in favour of

    f = open("foo.xml", "wb")

(mind the 'b' in the file mode, which stands for 'bytes' or 'binary')
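
For illustration, here is a minimal sketch of the whole script with that change applied. It is written for a current Python rather than 2.5, so treat it as an outline, not a drop-in replacement. Passing the parser UTF-8 bytes, using "with", and writing into a temporary directory (the "path" variable) are my own choices for the example and not part of the original script.

```python
import os
import tempfile
import xml.dom.minidom

# Same document as in the question; the non-ASCII attribute value is
# spelled as an escape so the source file itself stays pure ASCII.
source = u'<?xml version="1.0" encoding="utf-8"?><elements><elem attrib="\u00f3"/></elements>'

# Give the parser bytes that match the declared encoding (an assumption
# made for this sketch, to keep implicit conversions out of the picture).
doc = xml.dom.minidom.parseString(source.encode("utf-8"))

# toxml(encoding=...) already returns an encoded byte string.
data = doc.toxml(encoding="utf-8")

# Hypothetical output location, used only to keep the example tidy.
path = os.path.join(tempfile.mkdtemp(), "foo.xml")

# Binary mode: the bytes are written as they are, with no second encoding step.
with open(path, "wb") as f:
    f.write(data)
```

The point is that exactly one layer does the encoding: toxml() produces the bytes and the file object only stores them.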

Stefan

