On 07/26/2012 01:15 PM, Stefan Behnel wrote:
>> exits with a UnicodeDecodeError.
> ... that tells you the exact code line where the error occurred.

Which property of a UnicodeDecodeError includes that information? On
CPython 2.7 and 3.2, I see only start and end, both of which are byte
offsets into the input rather than line numbers. I used the following
test script:

    e = None
    try:
        b'a\xc3\xa4\nb\xff0'.decode('utf-8')
    except UnicodeDecodeError as ude:
        e = ude
    print(e.start)  # 5 for this input, 3 for the input b'a\nb\xff0'
    print(dir(e))

But even if you could somehow determine a line number, this would only
work if the actual encoding uses 0x0a for newline. Most encodings (101
out of 108 applicable ones in CPython 3.2) do include 0x0a in their
representation of '\n', but multi-byte encodings routinely include 0x0a
bytes in their representation of non-newline characters. Therefore, the
most you can do is calculate an upper bound for the line number.

- Philipp
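
For illustration, here is a minimal sketch of the upper-bound calculation described above. It assumes the exception's object attribute holds the original input bytes; the names line_upper_bound and decode_error are hypothetical helpers invented for this example, not part of any library.

```python
def line_upper_bound(exc):
    """Return an upper bound for the 1-based line number of a decode error.

    Counts the 0x0a bytes that precede exc.start. The result is exact only
    if every 0x0a byte in the input really is a newline in the encoding
    being used; otherwise it overestimates.
    """
    return exc.object[:exc.start].count(b'\n') + 1


def decode_error(data, encoding):
    """Hypothetical helper: return the UnicodeDecodeError raised, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as ude:
        return ude
    return None


# UTF-8: the one 0x0a byte before the error is a real newline.
e = decode_error(b'a\xc3\xa4\nb\xff0', 'utf-8')
print(e.start, line_upper_bound(e))

# UTF-16-LE: U+010A is encoded as b'\n\x01', so its 0x0a byte is not a
# newline. The trailing odd byte triggers the decode error.
e = decode_error(b'\n\x01\x00', 'utf-16-le')
print(e.start, line_upper_bound(e))
```

In the second case the error is still on the first line of the text, yet the count reports 2, which is why the result can only be treated as an upper bound.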