Chris Angelico added the comment:

Actually pinpointing the invalid character may be impractical, because both 
failure modes are all-or-nothing: either decoding raises UnicodeDecodeError 
(the UTF-8 stream was invalid), or PyUnicode_IsIdentifier returns false. Either 
way, the result applies to the whole identifier. So there are a few 
possibilities, corresponding to the patches I'm attaching.
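
To illustrate why neither check yields a character position, here is a minimal 
Python-level sketch. It is only an analogy: the real checks are done in C, and 
the helper name classify_identifier is made up for this example. It assumes 
that bytes.decode() and str.isidentifier() behave like their C-level 
counterparts for this purpose.

```python
def classify_identifier(raw: bytes) -> str:
    """Report which whole-identifier check rejects `raw`, if any.

    Hypothetical helper for illustration; not part of CPython.
    """
    try:
        # Failure mode 1: the byte stream is not valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "decode-error"
    # Failure mode 2: the decoded text is not a valid identifier.
    # The answer is a single bool for the whole string.
    if not text.isidentifier():
        return "not-identifier"
    return "ok"


for sample in (b"spam", b"sp\xffam", "sp\u20acam".encode("utf-8")):
    print(sample, classify_identifier(sample))
```

The second check returns only a yes/no answer for the complete name, so the 
position of the offending character is not available from it.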

1) Change the way this one specific error is handled, in tokenizer.c 
verify_identifier(). If it finds an error, adjust tok->cur to point to the 
beginning of the offending identifier. No new failures in test suite.

2) As above, but also change tok->inp, because of this comment at 
tokenizer.h:31, whose meaning I don't understand: 
/* NB If done != E_OK, cur must be == inp!!! */ 
This results in truncated error messages, but the comment suggests that 
method 1 might be breaking an invariant, which could cause breakage elsewhere. 
If there is any such breakage, though, it's not exercised by 'make test', which 
may itself be a problem. No new test failures.

3) Change the handling of ALL parser errors, in parsetok.c parsetok(), so now 
they all point to tok->start. Octal literals with 8s or 9s in them now get the 
caret pointing to the invalid digit, rather than the end of the literal. 
Unterminated strings point to the opening quote. And some forms of 
IndentationError now segfault Python. Test suite fails (unsurprisingly).

4) In response to the above segfault, hack it back to the old way of doing 
things if there's no tok->start. Maybe the condition should be done 
differently? No new failures in the test suite.
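
One way to compare the caret position under each patch is to compile the 
problem cases and look at where the SyntaxError is reported. The sketch below 
uses only the built-in compile() and the lineno/offset attributes of 
SyntaxError. The helper name error_position and the sample sources are chosen 
for illustration, and the offsets printed will differ between Python versions 
and between patched and unpatched builds, so none are shown here.

```python
def error_position(source: str):
    """Return (lineno, offset) of the SyntaxError raised for `source`,
    or None if it compiles cleanly. Hypothetical helper for illustration."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # offset is the 1-based column at which the caret is drawn.
        return exc.lineno, exc.offset
    return None


samples = [
    "sp\u20acam = 1\n",   # invalid character inside an identifier
    "a = 0o18\n",         # octal literal containing an 8
    "s = 'abc\n",         # unterminated string
]
for src in samples:
    print(repr(src), error_position(src))
```

Running this against a build with each patch applied should show whether the 
offset moves to the start of the token or stays at its end.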

I'd ideally like to use the technique from method 3 (either as patch 4 or with 
some other guard condition). Failing that, can anyone explain the "NB" above, 
and what ought to be done to comply with it?

----------
keywords: +patch
Added file: http://bugs.python.org/file43811/method1-change-cur.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27582>
_______________________________________