[EMAIL PROTECTED] wrote:

> I'm guessing that your file is in UTF-16, then -- Windows seems to do
> that a lot. It kind of makes it *not* a CSV file, but oh well. Try
>
> print open("test.csv").decode('utf-16').read().replace("\0", ">>>NUL<<<")
>
> I'm not terribly unicode-savvy, so I'll leave it to others to suggest a
> way to get the CSV reader to handle such encoding without reading in the
> whole file, decoding it, and setting up a StringIO file.

Not pretty, but seems to work:

from __future__ import with_statement

import csv
import codecs

def recoding_reader(stream, from_encoding, args=(), kw={}):
    intermediate_encoding = "utf8"
    efrom = codecs.lookup(from_encoding)
    einter = codecs.lookup(intermediate_encoding)
    rstream = codecs.StreamRecoder(stream,
        einter.encode, efrom.decode,
        efrom.streamreader, einter.streamwriter)

    for row in csv.reader(rstream, *args, **kw):
        yield [unicode(column, intermediate_encoding) for column in row]

def main():
    file_encoding = "utf16"

    # generate sample data:
    data = u"\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    with open("tmp.txt", "wb") as f:
        f.write(data.encode(file_encoding))

    # read it
    with open("tmp.txt", "rb") as f:
        for row in recoding_reader(f, file_encoding):
            print u" | ".join(row)

if __name__ == "__main__":
    main()

Data from the file is recoded to UTF-8, then passed to a csv.reader() whose
output is decoded to unicode.

Peter
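
For comparison, here is a rough sketch of the same job on Python 3, where the
recoding step should not be needed: csv.reader() accepts text, so the file can
be opened with the source encoding directly and is still read incrementally.
The names read_encoded_csv() and demo() are made up for illustration, and the
sample data is the one from the script above. Take it as a sketch under those
assumptions, not a drop-in replacement for the Python 2 code.

```python
import csv
import os
import tempfile


def read_encoded_csv(path, encoding="utf-16", **reader_kwargs):
    """Yield rows (lists of str) from a CSV file stored in `encoding`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # newline="" leaves line-ending handling to the csv module.
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f, **reader_kwargs):
            yield row


def demo():
    """Write the sample data as UTF-16, read it back, return the rows."""
    data = "\xe4hnlich,\xfcblich\r\nalpha,beta\r\ngamma,delta\r\n"
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data.encode("utf-16"))
        # list() exhausts the generator, so the file is closed before removal.
        return list(read_encoded_csv(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    for row in demo():
        print(" | ".join(row))
```

The file is decoded as it is read, so nothing has to be loaded whole or
wrapped in a StringIO first.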