Re: catch UnicodeDecodeError

Philipp Hagemeister Thu, 26 Jul 2012 05:21:15 -0700

On 07/26/2012 01:15 PM, Stefan Behnel wrote:
>> exits with a UnicodeDecodeError.
> ... that tells you the exact code line where the error occurred.


Which property of a UnicodeDecodeError does include that information?

On cPython 2.7 and 3.2, I see only start and end, both of which refer to
the number of bytes read so far.

I used the followin test script:

e = None
try:
    b'a\xc3\xa4\nb\xff0'.decode('utf-8')
except UnicodeDecodeError as ude:
    e = ude
print(e.start) # 5 for this input, 3 for the input b'a\nb\xff0'
print(dir(e))

But even if you would somehow determine a line number, this would only
work if the actual encoding uses 0xa for newline. Most encodings (101
out of 108 applicable ones in cPython 3.2) do include 0x0a in their
representation of '\n', but multi-byte encodings routinely include 0x0a
bytes in their representation of non-newline characters. Therefore, the
most you can do is calculate an upper bound for the line number.

- Philipp

signature.asc
Description: OpenPGP digital signature

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: catch UnicodeDecodeError

Reply via email to