[EMAIL PROTECTED] wrote:
> Hello,
>
> I am having great problems writing Norwegian characters æøå to file
> from a Python application. My (simplified) scenario is as follows:
>
> 1. I have a web form where the user can enter his name.
>
> 2. I use the cgi module to get to the input from the user:
> ....
> name = form["name"].value
>
> 3. The name is stored in a file
>
> fileH = open(namefile , "a")
> fileH.write("name:%s \n" % name)
> fileH.close()
>
> Now, this works very well indeed as long as the users have 'ascii' names;
> however, when someone enters a name with one of the Norwegian characters
> æøå, it breaks at the write() statement.
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position
> ....
>
> Now, I understand that the ascii codec can't be used to decode these
> particular characters; however, my attempts at specifying an alternative
> encoding have all failed.
>
> I have tried variants along the lines of:
>
> fileH = codecs.open(namefile , "a" , "latin-1") / fileH = open(namefile , "a")
> fileH.write(name) / fileH.write(name.encode("latin-1"))
>
> It seems that *whatever* I do, the Python interpreter ignores my plea
> for an alternative encoding and fails with the dreaded
> UnicodeDecodeError.
>
> Any tips on this would be *highly* appreciated.
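The error message in the question names a decode step that Python 2 performed implicitly. When a byte string containing non-ASCII bytes had to be turned into Unicode (for example, on its way into a codec-based writer), Python 2 decoded it with the default 'ascii' codec. Python 3 no longer does this implicitly. The minimal sketch below is written for Python 3 and makes that decode explicit so that it reproduces the same kind of error. The UTF-8 bytes for "æøå" are an illustrative assumption, not the poster's actual data.

```python
# Illustrative input: "æøå" encoded as UTF-8 (an assumption, not the poster's data).
raw = "æøå".encode("utf-8")  # b'\xc3\xa6\xc3\xb8\xc3\xa5'

error = None
try:
    # The step Python 2 performed implicitly with its default 'ascii' codec.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were actually produced in succeeds.
text = raw.decode("utf-8")

print(error)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```

The fix is the same in both Python versions: decode the bytes once, with the encoding you know they are in, before treating them as text.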
The approach with codecs.open() should succeed, provided that you write
only unicode strings with characters in the range unichr(0)...unichr(255)
and plain strs in the range chr(0)...chr(127):

>>> import codecs
>>> out = codecs.open("tmp.txt", "a", "latin1")
>>> out.write(u"æøå")
>>> out.write("abc")
>>> out.write("æøå")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.4/codecs.py", line 501, in write
    return self.writer.write(data)
  File "/usr/local/lib/python2.4/codecs.py", line 178, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The last write() fails because the str "æøå" is not ASCII. Python must
first decode it with the default ascii codec before it can re-encode it
as Latin-1, and that decode step is what fails.

You have to decode non-ASCII strs with the appropriate encoding (which
only you know) before feeding them to write():

>>> out.write(unicode("\xe6\xf8\xe5", "latin1"))

If there are unicode code points beyond unichr(255), you have to change
the encoding in codecs.open(), typically to UTF-8:

# raises UnicodeEncodeError: Latin-1 has no U+1234
codecs.open("tmp.txt", "a", "latin1").write(u"\u1234")

# works: UTF-8 covers all of Unicode
codecs.open("tmp.txt", "a", "utf8").write(u"\u1234")

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list
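For readers on Python 3, str is already Unicode and the built-in open() accepts an encoding argument, so open() takes over the role codecs.open() plays in the reply. The same advice might then look like the sketch below. It assumes the form value arrives as Latin-1 bytes; the byte values and the temporary file names are hypothetical. The sketch decodes once with the known encoding and lets the file object encode on write. It also checks that Latin-1 cannot encode a code point beyond U+00FF, while UTF-8 can.

```python
import os
import tempfile

# Hypothetical raw form value: "æøå" as a browser might send it in Latin-1.
raw_name = b"\xe6\xf8\xe5"

# Decode with the encoding you know the bytes are in (an assumption here).
name = raw_name.decode("latin-1")

tmpdir = tempfile.mkdtemp()
latin1_path = os.path.join(tmpdir, "names-latin1.txt")  # hypothetical file name

# The text file object encodes the str to Latin-1 as it writes.
with open(latin1_path, "a", encoding="latin-1") as out:
    out.write("name:%s \n" % name)

with open(latin1_path, "rb") as f:
    latin1_bytes = f.read()

# A code point beyond U+00FF cannot be represented in Latin-1 ...
try:
    with open(latin1_path, "a", encoding="latin-1") as out:
        out.write("\u1234")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# ... but UTF-8 can encode it.
utf8_path = os.path.join(tmpdir, "names-utf8.txt")  # hypothetical file name
with open(utf8_path, "a", encoding="utf-8") as out:
    out.write("\u1234")

with open(utf8_path, "rb") as f:
    utf8_bytes = f.read()

print(latin1_bytes.startswith(b"name:\xe6\xf8\xe5"))  # True
print(latin1_ok)  # False
print(utf8_bytes)  # b'\xe1\x88\xb4'
```

Only the startswith() check is used on the Latin-1 file. On Windows, text mode writes "\n" as "\r\n", so the exact line ending depends on the platform.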