On Dec 7, 6:20 am, "Mark Tolonen" <[EMAIL PROTECTED]> wrote:
> "Johannes Bauer" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
> >John Machin wrote:
> >> On Dec 6, 5:36 am, Johannes Bauer <[EMAIL PROTECTED]> wrote:
> >>> So UTF-16 has an explicit EOF marker within the text? I cannot find one
> >>> in original file, only some kind of starting sequence I suppose
> >>> (0xfeff). The last characters of the file are 0x00 0x0d 0x00 0x0a,
> >>> simple \r\n line ending.
>
> >> Sorry, *WRONG*. It ends in 00 0d 00 0a 00. The file is 1559 bytes
> >> long, an ODD number, which shouldn't happen with utf16. The file is
> >> stuffed. Python 3.0 has a bug; it should give a meaningful error
> >> message.
>
> >Yes, you are right. I fixed the file, yet another error pops up
> >(http://www.file-upload.net/download-1299688/2008_12_05_Handy_Backup.t...
>
> >Traceback (most recent call last):
> >  File "./modify.py", line 12, in <module>
> >    a = AddressBook("2008_12_05_Handy_Backup.txt")
> >  File "./modify.py", line 7, in __init__
> >    line = f.readline()
> >  File "/usr/local/lib/python3.0/io.py", line 1807, in readline
> >    while self._read_chunk():
> >  File "/usr/local/lib/python3.0/io.py", line 1556, in _read_chunk
> >    self._set_decoded_chars(self._decoder.decode(input_chunk, eof))
> >  File "/usr/local/lib/python3.0/io.py", line 1293, in decode
> >    output = self.decoder.decode(input, final=final)
> >  File "/usr/local/lib/python3.0/codecs.py", line 300, in decode
> >    (result, consumed) = self._buffer_decode(data, self.errors, final)
> >  File "/usr/local/lib/python3.0/encodings/utf_16.py", line 69, in _buffer_decode
> >    return self.decoder(input, self.errors, final)
> >UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 0: truncated data
>
> >File size is 1630 bytes - so this clearly cannot be.
>
> How about posting your code?

He did. Ugly stuff using readline() :-) Should still work, though.

There are definite problems with readline() and readlines(), including:

First file: the error is silently ignored *and* the last line returned is
garbage [it consists of multiple actual lines, and the trailing codepoints
have been byte-swapped].

Second file: as he has just reported. I've reproduced it with

    f = open('second_file.txt', encoding='utf16')

followed by each of:
(1) f.readlines()
(2) list(f)
(3) for line in f: print(repr(line))

With the last one, the error happens after printing the last actual line
in his file.
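
For anyone wanting to sanity-check a file like this before handing it to readline(), here is a minimal sketch of the checks discussed above: a byte order mark at the start, an even byte count, and a clean one-shot decode. The function name diagnose_utf16 is made up for illustration; it is not part of the poster's modify.py or of the standard library. It reads nothing from disk itself and works on bytes obtained with open(path, 'rb').read(), which sidesteps the text-mode readline() path entirely.

```python
import codecs


def diagnose_utf16(data):
    """Return a list of human-readable problems found in UTF-16 bytes.

    Hypothetical helper for illustration only. An empty list means the
    three checks below all passed.
    """
    problems = []

    # A file opened with encoding='utf16' is expected to start with a BOM
    # (FE FF for big-endian, FF FE for little-endian).
    if not data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        problems.append("no byte order mark (expected FE FF or FF FE)")

    # Every UTF-16 code unit is two bytes, so an odd length means the
    # data is truncated or has a stray byte somewhere.
    if len(data) % 2:
        problems.append("odd length (%d bytes)" % len(data))

    # Decode everything in one call, bypassing the chunked reader.
    try:
        data.decode("utf-16")
    except UnicodeDecodeError as exc:
        problems.append("decode failed at byte %d: %s" % (exc.start, exc.reason))

    return problems


if __name__ == "__main__":
    # Big-endian sample ending in 00 0d 00 0a, plus one stray 00 byte,
    # mimicking the tail of the first file described above.
    sample = codecs.BOM_UTF16_BE + "a\r\n".encode("utf-16-be") + b"\x00"
    for problem in diagnose_utf16(sample):
        print(problem)
```

Run against the original 1559-byte file, the odd-length check alone would flag it. The 1630-byte file has an even length, so a failure there on readline() but not on a one-shot decode would point at the chunked reader rather than at the data.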