Hi Ulrich, Ascii.csv isn't really a latin-1 encoded file. It's an ASCII file with a few characters above the 128 range that are causing PostgreSQL Unicode errors. Those characters display fine in the Windows world, but they're not valid byte sequences in UTF-8. What I'm attempting to do is translate those upper-range characters into the correct Unicode representations so that they look the same in the PostgreSQL database as they did in the CSV file.
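For anyone following along, here's a minimal sketch of what that translation looks like for a single byte. The specific byte 0x93 (cp1252's left curly quote) is just an illustration; I don't know which exact characters are in the file:

    # 0x93 is legal in cp1252 but not a valid UTF-8 sequence on its own.
    raw = b'\x93'                # cp1252 left double quotation mark
    text = raw.decode('cp1252')  # -> u'\u201c'
    utf8 = text.encode('utf-8')  # -> b'\xe2\x80\x9c', which PostgreSQL accepts
    print(repr(text), repr(utf8))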
I wrote up the source of my confusion to Steven so I won't duplicate it here. Your comment on defining the encoding of the file directly, instead of using functions to encode and decode the data, led me to the codecs module. Using it, I can define the encoding at file open time and then just read and write the lines. I ended up with this:

    import codecs

    # Decode the input as cp1252 and re-encode it as UTF-8 on the way out.
    infile = codecs.open('ascii.csv', encoding='cp1252')
    outfile = codecs.open('unicode.csv', mode='wb', encoding='utf-8')
    outfile.writelines(infile.readlines())
    infile.close()
    outfile.close()

This is doing exactly the same thing, but it's much clearer to me. readlines() translates the input using the cp1252 codec, and writelines() encodes it to UTF-8 and writes it out. And as you mentioned, it's probably higher performance. I haven't tested that, but since both programs do the job in seconds, performance isn't an issue. Thanks again to everyone who posted. I really do appreciate it.
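One note in case the files ever get big: readlines() pulls the whole file into memory at once. A line-at-a-time variant of the same codecs approach would keep memory flat. This is just a sketch, not tested against your data:

    import codecs

    infile = codecs.open('ascii.csv', encoding='cp1252')
    outfile = codecs.open('unicode.csv', mode='wb', encoding='utf-8')
    for line in infile:       # decodes each line from cp1252 as it's read
        outfile.write(line)   # re-encodes to UTF-8 on the way out
    infile.close()
    outfile.close()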