[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-18 Thread Lysandros Nikolaou


Lysandros Nikolaou  added the comment:

Fixed in GH-20072 and bpo-40612.

--
resolution:  -> fixed
stage:  -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-11 Thread Guido van Rossum


Guido van Rossum  added the comment:

I would carefully think about the meaning of that line and see if there are 
other permutations, e.g. min(Len(blah), offset - 1) ?

It would,be nice to specify what we need here from both parsers and from both 
code chunks for error printing.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-11 Thread Lysandros Nikolaou


Lysandros Nikolaou  added the comment:

Guido:
> I wonder if this is the fix we're looking for?

I think this is the most sensible thing to do, yes. I guess, we also have to 
change both parsers to match this behaviour, right?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-08 Thread Guido van Rossum


Guido van Rossum  added the comment:

@Terry: Thanks for confirming the bug in the old parser.

Regarding whether column offsets should be 0-based or 1-based: I agree with Tk, 
but my hand was forced for SyntaxError: we found that it was already using 
1-based offsets and we found correct-looking code elsewhere that depended on 
this, so we had to change the few cases where it was using 0-based offsets, 
alas.

I think I also looked at how vim and Emacs interpret column numbers, and found 
that both expect 1-based offsets in compiler error messages of the form 
"file:line:col: message". IIRC I had to do a deep-dive on this subject when we 
added column offsets to mypy error messages.

But it's a pain!

@Lysandros:

I wonder if this is the fix we're looking for?

diff --git a/Lib/traceback.py b/Lib/traceback.py
index bf34bbab8a..0e286f60bc 100644
--- a/Lib/traceback.py
+++ b/Lib/traceback.py
@@ -582,7 +582,7 @@ class TracebackException:
 yield '{}\n'.format(badline.strip())
 if offset is not None:
 caretspace = badline.rstrip('\n')
-offset = min(len(caretspace), offset) - 1
+offset = min(len(caretspace) + 1, offset) - 1
 caretspace = caretspace[:offset].lstrip()
 # non-space whitespace (likes tabs) must be kept for alignment
 caretspace = ((c.isspace() and c or ' ') for c in caretspace)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-08 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

I verified that 3.8 puts the caret under '|', and agree that this is bug that 
should be fixed at least in a new version.  Other such bugs have been fixed in 
the past.

tk Text widgets have 1-based line numbers and 0-based column numbers. The 
latter is appropriate since column indexes mostly serve as slice positions 
rather than as character positions.  IDLE changes the background of the error 
character to red, so it must be subtracting 1 to get the beginning of the slice 
to be colored.  There was once a bug about this, so having the CPython behavior 
be consistent and documented would be good even if not a language feature.  
(For some reason I have not yet tracked down, IDLE colors the \n in 3.8 as well 
as 3.9, but is also correct and consistent for other exceptions where the two 
versions report the same column.)

On a related note: Since the exceptions raised by compile are not restricted to 
SyntaxError (or Warning), it would be nice to have them restricted to a 
definite list that is documented.  Both code.InteractiveInterpreter and IDLE 
have try-compile-except with (different) tuples extended as uncaught exceptions 
are reported.

--
nosy: +terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Lysandros Nikolaou

Lysandros Nikolaou  added the comment:

> Hm, but there the old parser seems at fault: it points to the last space 
> rather than beyond it?

Both parsers seem to have the same behaviour, when parsing files. I just found 
out that even the new parser points to the `|` if there is not trailing 
whitespace:

╰─ cat -e t.py
x = 1 | 2 |$
╰─ ./python.exe t.py
  File "/Users/lysnikolaou/Repositories/cpython/t.py", line 1
x = 1 | 2 |
  ^
SyntaxError: invalid syntax
╰─ ./python.exe -X oldparser t.py
  File "/Users/lysnikolaou/Repositories/cpython/t.py", line 1
x = 1 | 2 |
  ^
SyntaxError: invalid syntax

I'll investigate more tomorrow.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Guido van Rossum


Guido van Rossum  added the comment:

Hm, but there the old parser seems at fault: it points to the last space rather 
than beyond it?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Lysandros Nikolaou

Lysandros Nikolaou  added the comment:

> But if the col_offset points way past the text, what should happen? The 
> clipping there seems reasonable.

Note that if a file is being parsed and not a string the behaviour is this:
╰─ cat -e t.py
x = 1 | 2 |$
╰─ ./python.exe t.py
  File "/Users/lysnikolaou/repos/cpython/t.py", line 1
x = 1 | 2 |
  ^
SyntaxError: invalid syntax
╰─ ./python.exe -X oldparser t.py
  File "/Users/lysnikolaou/repos/cpython/t.py", line 1
x = 1 | 2 |
  ^
SyntaxError: invalid syntax

Would we want to also change this? If not, then I guess that we should as 
closely as possible match this behaviour with strings as well, which means not 
clipping.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Guido van Rossum


Guido van Rossum  added the comment:

Aha! So the traceback module does have a difference.

First, if the offset points inside the text, there's no difference:

>>> raise SyntaxError("message", ("", 1, 4, "text"))
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1
text
   ^
SyntaxError: message
traceback.print_exception(SyntaxError, SyntaxError("message", ("", 1, 4, 
"text")), None)
  File "", line 1
text
   ^
SyntaxError: message
>>> 

But if the offset points past the text, there's a difference:

>>> raise SyntaxError("message", ("", 1, 5, "text"))
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1
text
^
SyntaxError: message
>>> traceback.print_exception(SyntaxError, SyntaxError("message", ("", 1, 
>>> 5, "text")), None)
  File "", line 1
text
   ^
SyntaxError: message
>>> 

And even:

>>> raise SyntaxError("message", ("", 1, 10, "text"))
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1
text
 ^
SyntaxError: message
>>> traceback.print_exception(SyntaxError, SyntaxError("message", ("", 1, 
>>> 10, "text")), None)
  File "", line 1
text
   ^
SyntaxError: message
>>> 

I wonder if we need to support the final case? It seems clear to me that if the 
col_offset points just past the text, we should display a caret just past the 
text. (I wonder if this might be an off-by-one issue caused by confusion about 
0-based vs. 1-based offsets.)

But if the col_offset points way past the text, what should happen? The 
clipping there seems reasonable.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Lysandros Nikolaou

Lysandros Nikolaou  added the comment:

Guido:
> I don't understand why the traceback module is implicated.

The traceback module is implicated, because it currently does not allow the 
caret to point to a position after the error line.

In format_exception_only there is the following snippet of code:

if offset is not None:
caretspace = badline.rstrip('\n')
offset = min(len(caretspace), offset) - 1
caretspace = caretspace[:offset].lstrip()
# non-space whitespace (likes tabs) must be kept for alignment
caretspace = ((c.isspace() and c or ' ') for c in caretspace)
yield '{}^\n'.format(''.join(caretspace))

There are two things here, that put together contradict the way that pegen 
reports errors. `badline` gets rstripped and then the offset is changed to 
point to the end of the string, in case it is currently pointing after its end. 
That basically does not allow SyntaxErrors to be formatted the way they 
currently are, when using the new parser.

For example:
╰─ ./python.exe
Python 3.9.0a6+ (heads/master:4638c64295, May  7 2020, 16:47:53)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 1 | 2 |
  File "", line 1
x = 1 | 2 |
   ^
SyntaxError: invalid syntax

But:
╰─ ./python.exe
Python 3.9.0a6+ (heads/master:4638c64295, May  7 2020, 16:47:53)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import traceback
>>> try:
... exec('x = 1 | 2 |')
... except SyntaxError as se:
... exc = se
...
>>> traceback.print_exception(type(exc), exc, exc.__traceback__)
Traceback (most recent call last):
  File "", line 2, in 
  File "", line 1
x = 1 | 2 |
  ^
SyntaxError: invalid syntax

I might be missing something here, but I think that that's the traceback 
module's "fault".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Guido van Rossum


Guido van Rossum  added the comment:

For example:

raise SyntaxError("message", ("", 1, 3, "text"))
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 1
text
  ^
SyntaxError: message
>>> 

(I can't find docs for the constructor of SyntaxError objects. We should update 
the docs.)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Guido van Rossum


Guido van Rossum  added the comment:

I don't understand why the traceback module is implicated. It just formats the 
information in the SyntaxError object, same as the builtin printing for syntax 
errors. The key difference is always in what line/column/text is put in the 
SyntaxError object by whichever parser is being used.

In the past there was some misunderstanding about whether column numbers are 
0-based (the leftmost column is numbered 0) or 1-based (the leftmost column is 
numbered 1), and at some point we discovered there was an inconsistency -- 
certain parts of the code put 0-based offsets in the SyntaxError object and 
other parts put 1-based offsets.

We then decided that the SyntaxError column offset should be 1-based and 
changed various bits of code to match. It's however possible that we forgot 
some. It's also still not clearly documented (e.g. the stdlib docs for 
SyntaxError don't mention it).

What complicates matters further is that the lowest-level C code in the 
tokenizer definitely uses 0-based offsets, which means that whenever we create 
a SyntaxError we have to add 1 to the offset. (You can see this happening if 
you look at various calls to PyErr_SyntaxLocationObject().)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Batuhan Taskaya


Change by Batuhan Taskaya :


--
nosy: +BTaskaya

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue40546] Inconsistencies between PEG parser and traceback SyntaxErrors

2020-05-07 Thread Lysandros Nikolaou


Change by Lysandros Nikolaou :


--
title: Inconsistencies between pegen and traceback SyntaxErrors -> 
Inconsistencies between PEG parser and traceback SyntaxErrors

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com