[issue27582] Mispositioned SyntaxError caret for unknown code points

2021-12-03 Thread Irit Katriel
Change by Irit Katriel: stage: needs patch -> resolved; status: pending -> closed

[issue27582] Mispositioned SyntaxError caret for unknown code points

2021-11-23 Thread Irit Katriel
Irit Katriel added the comment: This seems to have been fixed by now; I get this on 3.11:
    >>> varname = “d“a”t”apoint
      File "<stdin>", line 1
        varname = “d“a”t”apoint
                  ^
    SyntaxError: invalid character '“' (U+201C)
    >>> varname = “d“a”t”apoint.evidence
      File "<stdin>", line 1
        varname =
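The reported position can also be checked without the interactive prompt. The following is a minimal sketch, not part of the original comment: the helper name `syntax_error_position` and the `"<example>"` filename are made up for illustration, and the reported `offset` depends on the Python version in use.

```python
def syntax_error_position(source):
    """Return (msg, lineno, offset) if compiling source raises SyntaxError,
    or None if it compiles cleanly. The helper name is hypothetical."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # offset is the 1-based column the caret is drawn under
        return exc.msg, exc.lineno, exc.offset
    return None


# Curly quotes are not valid identifier characters, so this fails to compile.
print(syntax_error_position("varname = “d“a”t”apoint"))
```

Comparing the printed offset across interpreter versions is one way to see whether the caret has moved to the offending character.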

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-24 Thread Nick Coghlan
Nick Coghlan added the comment: Drekin, agreed, that looks like the same problem. Since this one has draft patches attached, I'll mark the other one as a duplicate.

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-22 Thread Adam Bartoš
Adam Bartoš added the comment: Maybe this is related: http://bugs.python.org/issue26152. -- nosy: +Drekin

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Stephen J. Turnbull
Stephen J. Turnbull added the comment: I still think the easiest thing to do would be to make all non-ASCII characters instances of "invalid_character_token", self-delimiting in the same way that operators are. That would automatically point to exactly the right place in the token stream,
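To make the self-delimiting idea concrete, here is a toy Python sketch. It is not CPython's tokenizer and not any of the attached patches; the function name `toy_tokens` and the token kinds are invented for illustration, and it assumes that only non-ASCII characters that cannot appear in an identifier become their own token.

```python
def toy_tokens(line):
    """Yield (kind, text, column) tuples for one line of source.

    Toy model only: a non-ASCII character that cannot be part of an
    identifier is emitted as its own single-character token, the same
    way an operator would be.
    """
    i = 0
    while i < len(line):
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch.isidentifier():
            # Consume a run of characters that may continue an identifier.
            start = i
            while i < len(line) and ("_" + line[i]).isidentifier():
                i += 1
            yield ("name", line[start:i], start)
        elif ord(ch) > 127:
            yield ("invalid_character", ch, i)
            i += 1
        else:
            yield ("op", ch, i)
            i += 1


tokens = list(toy_tokens("varname = “d“a”t”apoint"))
first_bad = next(t for t in tokens if t[0] == "invalid_character")
print(first_bad)
```

Because each invalid character carries its own column, an error raised for the first such token points at exactly that character rather than at the end of a longer run.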

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Chris Angelico added the comment: Attached is a combined patch that has the new private function for IsIdentifier, method 4's error handling change, and a bit of glue in the middle to make use of it. The result is a passing test suite (bar test_site which was already failing on my system) and

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Chris Angelico added the comment: Hmm, that'd be curious. The code to do that is actually pretty simple - see attached patch - but actually using that to affect error messages is a bit harder. Is it safe to mess with tok->start? -- Added file:

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Nick Coghlan
Nick Coghlan added the comment: Looking at issue 2382, I agree that's a different problem (I'm seeing the current misbehaviour even though everything is consistently encoded as UTF-8). The main case we're interested in here is the PyUnicode_IsIdentifier one, so if we wanted to do better than
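For the PyUnicode_IsIdentifier case, a Python-level illustration of locating the offending character might look like the sketch below. It relies on `str.isidentifier()`, which exposes the same check; the helper name `first_invalid_index` is hypothetical, and this is not the code from the attached patches.

```python
def first_invalid_index(name):
    """Return the index of the first character that stops name from being
    a valid identifier, or None if no prefix of name fails the check.

    Hypothetical helper: it tests growing prefixes, so it is quadratic,
    which is acceptable for identifier-sized strings.
    """
    for end in range(1, len(name) + 1):
        if not name[:end].isidentifier():
            return end - 1
    return None  # also returned for the empty string


print(first_invalid_index("asdf“d“a”t”apoint"))
```

The returned index is relative to the start of the identifier, so it would still need to be added to the identifier's own column before it could be used as a caret position.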

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Chris Angelico added the comment: BTW, here's how a session looks using method 4's change: >>> varname = asdf“d“a”t”apoint File "<stdin>", line 1 varname = asdf“d“a”t”apoint ^ SyntaxError: invalid character in identifier >>> varname = asdf“d“a”t”apoint.evidence File "<stdin>", line 1

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Changes by Chris Angelico: Added file: http://bugs.python.org/file43813/method3-change-all-errors.patch

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Changes by Chris Angelico: Added file: http://bugs.python.org/file43812/method2-change-cur-and-inp.patch

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Changes by Chris Angelico: Added file: http://bugs.python.org/file43814/method4-change-all-errors-if-possible.patch

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Chris Angelico added the comment: Actually pinpointing the invalid character may be impractical, as there are two boolean situations: either a UnicodeDecodeError (because you had an invalid UTF-8 stream), or PyUnicode_IsIdentifier returns false. Either way, it applies to the whole identifier.
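The two failure modes described here can be mirrored at the Python level. This is only an illustrative sketch: the function name `classify_name` and the returned labels are invented, and the real checks happen in C inside the tokenizer.

```python
def classify_name(raw):
    """Classify candidate identifier bytes the way the comment describes:
    either the bytes fail to decode as UTF-8, or they decode but the
    result is not a valid identifier. Labels are hypothetical."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "undecodable"
    return "identifier" if text.isidentifier() else "not an identifier"


print(classify_name(b"\xff"))                # invalid UTF-8 stream
print(classify_name("d“a”t".encode("utf-8")))  # decodes, but not an identifier
print(classify_name(b"datapoint"))           # fine
```

In both failing cases the answer is a single yes/no for the whole name, which is why neither check says by itself which character is at fault.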

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Chris Angelico added the comment: The question was raised that there might be a problem with (UTF-8) bytes vs characters, but that's definitely not it - pythonrun.c:1362 UTF-8-decodes the line of source and then gets its character length to use as the new offset. So I don't think this is a
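The byte-to-character conversion described above can be sketched in Python as follows. This is an illustration of the idea, not the code in pythonrun.c; the helper name `byte_offset_to_char_offset` is hypothetical.

```python
def byte_offset_to_char_offset(line, byte_offset):
    """Convert an offset counted in UTF-8 bytes into one counted in
    characters by decoding the prefix and measuring its length.

    errors="replace" keeps the sketch from raising if the offset falls
    in the middle of a multi-byte sequence.
    """
    return len(line[:byte_offset].decode("utf-8", errors="replace"))


source = "varname = “d“a”t”apoint".encode("utf-8")
# "varname = " is 10 bytes, the curly quote is 3 bytes, "d" is 1 byte.
print(byte_offset_to_char_offset(source, 14))  # 12
```

Each curly quote occupies three bytes but only one character, so a caret placed by byte count would drift to the right without this step.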

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Berker Peksag
Berker Peksag added the comment: This looks like a duplicate of issue 2382. -- nosy: +berker.peksag

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Chris Angelico
Changes by Chris Angelico: nosy: +Rosuav

[issue27582] Mispositioned SyntaxError caret for unknown code points

2016-07-21 Thread Nick Coghlan
New submission from Nick Coghlan: Reported by Rustom Mody on python-ideas: the SyntaxError caret is confusingly mispositioned when an invalid Unicode code point appears as part of a larger sequence of invalid code points and/or valid identifier characters: >>> varname = “d“a”t”apoint File