Hi, I've got a nominally ASCII file that contains some Latin-1 characters, specifically \xe1 and \xfc. I'm trying to import it into a PostgreSQL database that's running in Unicode mode. The Unicode converter chokes on those two characters.
I could just manually replace those two characters with something valid, but if any other invalid characters show up in later versions of the file, I'd like to handle them correctly. I've been playing with the Unicode stuff and I found out that I could convert both those characters correctly using the latin1 encoder like this:

    import unicodedata
    s = '\xe1\xfc'
    print unicode(s,'latin1')

The above works. When I try to convert my file, however, I still get an error:

    import unicodedata
    input = file('ascii.csv', 'r')
    output = file('unicode.csv','w')
    for line in input.xreadlines():
        output.write(unicode(line,'latin1'))
    input.close()
    output.close()

    Traceback (most recent call last):
      File "C:\Users\jgold\CloudmartFiles\UnicodeTest.py", line 10, in __main__
        output.write(unicode(line,'latin1'))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 295: ordinal not in range(128)

I'm stuck using Python 2.4.4, which may be handling the strings differently depending on whether they're in the program or coming from the file. I just haven't been able to figure out how to get the Unicode conversion working from the file data. Can anyone explain what is going on?
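
For reference, a minimal sketch of one likely explanation and workaround. The traceback is a UnicodeEncodeError, not a decode error, which suggests the latin1 decoding itself succeeds and the failure happens in output.write(): in Python 2, writing a unicode object to a plain file implicitly encodes it with the default 'ascii' codec. The sketch below avoids that by encoding explicitly before writing. The function name latin1_file_to_utf8 is made up for illustration, and UTF-8 as the target encoding is an assumption about what the database expects.

```python
def latin1_file_to_utf8(src_path, dst_path):
    """Re-encode a Latin-1 file as UTF-8, line by line.

    Hypothetical helper; UTF-8 output is an assumption about the target.
    """
    # Binary mode keeps the raw bytes intact, so decoding is fully explicit.
    src = open(src_path, 'rb')
    try:
        dst = open(dst_path, 'wb')
        try:
            for line in src:
                text = line.decode('latin1')     # bytes -> unicode text
                dst.write(text.encode('utf-8'))  # unicode text -> UTF-8 bytes
        finally:
            dst.close()
    finally:
        src.close()
```

Calling latin1_file_to_utf8('ascii.csv', 'unicode.csv') should then produce a file in which \xe1 and \xfc are stored as their two-byte UTF-8 sequences. The sketch sticks to explicit decode() and encode() calls on binary files so that it does not depend on any implicit default encoding.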