Uwe Mayer wrote:
> I need to read in a text file which seems to be stored in some unknown
> encoding. Opening and reading the file's content returns:
>
> >>> f.read()
> '\x00 \x00 \x00<\x00l\x00o\x00g\x00E\x00n\x00t\x00r\x00y\x00...
>
> Each character has a \x00 prepended to it. I suspect it's some kind of
> unicode - how do I get rid of it?
Intermittent '\x00' bytes are indeed strong evidence of Unicode - here,
UTF-16 in big-endian byte order. Use codecs.open() to decode the data in
such a file transparently:

>>> import codecs
>>> f = codecs.open(filename, "r", "UTF-16-BE")
>>> f.read()
u' <logEntry'

If you don't want unicode, convert back to str:

>>> _.encode("latin1")
' <logEntry'

Note that the last step may fail if the file contains characters not
available in the string encoding you specify.

Peter

--
http://mail.python.org/mailman/listinfo/python-list
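For anyone wanting to reproduce this end to end, here is a small
self-contained sketch (written for Python 3, where str is already
unicode; the file name is made up for the example). It writes a file in
big-endian UTF-16 without a byte-order mark, shows the interleaved NUL
bytes in the raw data, and then decodes it the same way as above:

```python
import codecs
import os
import tempfile

# Hypothetical reconstruction of the situation above: a file stored
# as UTF-16 big-endian, without a byte-order mark.
path = os.path.join(tempfile.mkdtemp(), "logentry.xml")  # made-up name
with open(path, "wb") as f:
    f.write(" <logEntry".encode("utf-16-be"))

# Reading the raw bytes shows a \x00 before every ASCII character:
with open(path, "rb") as f:
    raw = f.read()
# raw == b'\x00 \x00<\x00l\x00o\x00g\x00E\x00n\x00t\x00r\x00y'

# codecs.open() decodes on the fly and hands back text:
with codecs.open(path, "r", "UTF-16-BE") as f:
    text = f.read()
# text == ' <logEntry'
```

In Python 3 the built-in open(path, encoding="utf-16-be") does the same
job. Also note that if the file *does* start with a BOM, the generic
"utf-16" codec will detect the byte order for you automatically.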