hi all, i have a newbie problem arising from writing-then-reading a unicode file, and i can't work out what syntax i need to read it in.
the syntax i'm using now (just using quick hack tmp files):

BEGIN
f=codecs.open("tt.xml","r","utf8")
fwrap=codecs.EncodedFile(f,"ascii","utf8")
try:
    ss=u''
    ss=fwrap.read()
    print ss
    ## rrr=xml.dom.minidom.parseString(f.read()) # originally
finally:
    f.close()
END

barfs with this error:

BEGIN
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5092: ordinal not in range(128)
END

any ideas?

--
Context (if interested):

had a look at the blogger api, downloaded the "15 most recent posts" into a miniDOM document, then decided to learn how to traverse the xml object in python. getting annoyed with the time taken to reconnect each time i played with a new syntax, i wrote the xml object to a file. that barfed with a similar sort of encoding error. sure enough, there in the debug coming back from blogger: "charset=utf-8".

my python book said i needed to switch from "open/print" to "codecs.open/write", so i did this:

BEGIN
# get xml doc (from blogger: atom format)
rrr=xml.dom.minidom.Document()
conn.request("GET","/atom/1234",None,headers)
response=conn.getresponse()
rrr=xml.dom.minidom.parseString(response.read())
print rrr

# dump to disk
import codecs
f=codecs.open("ttt.xml","w","utf8")
try:
    ## print >> f, rrr.toxml()
    f.write(rrr.toxml())
finally:
    f.close()
END

this works fine and the resulting file looks like good xml to the naked eye.

oh and i have tried both "utf8" and "utf-8" as the en/decoding tokens: no change. ditto with explicitly initialising "ss" as unicode: same error as before, when it was not explicitly initialised at all.
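
for reference, a minimal sketch of the write-then-read round trip described above. it assumes (not confirmed) that the "ascii" re-encoding step in codecs.EncodedFile is what raises the error, so it drops that wrapper. SAMPLE, write_xml, read_xml_text and parse_xml_file are hypothetical names made up for this illustration, and the temp file stands in for the real dump from blogger.

```python
import codecs
import os
import tempfile
import xml.dom.minidom

# hypothetical stand-in for the blogger feed: any xml holding a non-ascii
# character (u'\u2019' is the one named in the traceback)
SAMPLE = u'<feed><title>it\u2019s a test</title></feed>'


def write_xml(path, text):
    # codecs.open encodes the unicode text to utf-8 bytes on write
    f = codecs.open(path, "w", "utf8")
    try:
        f.write(text)
    finally:
        f.close()


def read_xml_text(path):
    # codecs.open decodes utf-8 bytes back to unicode on read;
    # no EncodedFile wrapper, so nothing is forced through ascii
    f = codecs.open(path, "r", "utf8")
    try:
        return f.read()
    finally:
        f.close()


def parse_xml_file(path):
    # hand minidom the raw bytes and let the xml parser do the decoding
    # (utf-8 is the xml default when no encoding is declared)
    f = open(path, "rb")
    try:
        return xml.dom.minidom.parseString(f.read())
    finally:
        f.close()


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "tt.xml")

write_xml(path, SAMPLE)
text = read_xml_text(path)
doc = parse_xml_file(path)
title = doc.getElementsByTagName("title")[0].firstChild.data

# for display on an ascii-only terminal, one option is to replace
# unencodable characters with xml character references
ascii_safe = text.encode("ascii", "xmlcharrefreplace")
```

the sketch deliberately avoids printing the unicode string directly, since on an ascii-only console that can raise the same kind of UnicodeEncodeError; ascii_safe is one way to get something printable.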