[Python-Dev] Re: Critique of PEP 657

2021-07-01 Thread Ammar Askar
Hi Terry,

> IDLE currently uses traceback.extract_tb and traceback.print_list

Awesome, that should work out perfectly then. Our current
proof-of-concept implementation augments the traceback.FrameSummary
class to include `end_lineno`, `colno` and `end_colno` attributes. We
will make sure to add you as a reviewer on that change so you can have
an early shot at integrating it with IDLE.
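
For example, once those attributes land, consumer code could look
roughly like this (a sketch against the proof-of-concept; the attribute
names are the prototype ones listed above and may still change):

import traceback

def frame_positions(tb):
    # Sketch: read the prototype's extra FrameSummary attributes.
    for frame in traceback.extract_tb(tb):
        yield (frame.lineno, frame.end_lineno, frame.colno, frame.end_colno)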

Regards,
Ammar
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/J622Y35DCXIBHEF72VENBJ5DX6F5ZFPJ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Terry Reedy

On 6/30/2021 5:30 PM, Pablo Galindo Salgado wrote:
Also, notice we are extending the traceback module (in Python) to 
support this, so you can probably leverage those changes too and avoid 
messing with code objects yourself :)


IDLE currently uses traceback.extract_tb and traceback.print_list.  In 
between, it a) removes extra entries at both ends that result from 
running in IDLE, and b) adds code lines for shell entries.  It does this 
in the user code execution process and sends the resulting string, tagged 
as stderr, via the socket connection to the IDLE GUI process.


What I believe I would like is to have 'line n' of each frame entry 
replaced with a position 4-tuple, however formatted, and no caret line. 
IDLE would then use the position to tag the appropriate slice of the line.


Currently, if the user right-clicks on either of the two lines of the 
pair, to see the line in its context in its file, IDLE opens the file in 
an editor if not already open and tags the entire line.  If 'line n' were 
replaced with the slice info, it could instead tag that slice, either 
within a line or spanning multiple lines.  Both would be improvements.
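
For illustration, tagging such a slice in a Tk Text widget could look
roughly like this (a sketch, not actual IDLE code; the tag name is made
up):

def tag_position(text, lineno, end_lineno, col, end_col):
    # Tk text indices are "line.column": lines are 1-based, columns
    # 0-based, which matches how a position 4-tuple maps onto a buffer.
    text.tag_add("error_span", f"{lineno}.{col}", f"{end_lineno}.{end_col}")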


Please add me as nosy to any appropriate issues/PRs so I have at least 
an opportunity to test and comment.


--
Terry Jan Reedy

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E5NSLCFQBR6M27YZQHRXRQKO657O5GF4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Terry Reedy

 > Then how will modules that customize traceback presentation, such as
idlelib, be able to get the 4-tuple for a particular traceback entry?

From the exception, you can get the code object and from the code 
object the extra information using the Python API:

Example:

 >>> try:
 ...   1/0
 ... except Exception as e:
 ...   f = e
 ...
 >>> list(f.__traceback__.tb_frame.f_code.co_positions())
[(1, 4, 1, 8), (2, 2, 3, 4), (2, 2, 5, 6), (2, 2, 3, 6), (2, 2, 3, 6), 
(2, 4, 3, 8), (2, 4, 3, 8), (None, 2, 3, 6), (3, 4, 1, 8), (3, 3, 8, 
17), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (4, 4, 7, 
8), (4, 4, 3, 4), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 
8), (4, 4, 3, 8), (4, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (None, 
4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (3, 4, 3, 8)]


Ah, co_positions is an access method, corresponding to the current (also 
undocumented) co_lines.  There will be no direct access to the 
position-table as it will be kept private and subject to change.  The 
obvious first question: why 28 items and what does the index mean?


When I compile the above with 3.10.0b3, there are 29 bytecodes, so I am 
guessing your branch has 28 and that the first number in the tuple is 
the line number.  But how would one know which '2' entry corresponds to 
the divide in '1/0'?  And what do the rest of the tuple numbers mean?  I 
don't see anything like the (2, 2, 2, 4) I expect for '1/0'.  To be 
documented, as you say below.
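
One way to answer that question empirically, as a sketch (this assumes
co_positions() yields one 4-tuple per 2-byte instruction, in instruction
order -- an assumption, since the API is not yet documented):

import dis

def positions_by_instruction(code):
    # Pair each instruction with its position tuple, so one can see
    # which entry belongs to the divide in '1/0'.
    positions = list(code.co_positions())
    for instr in dis.get_instructions(code):
        print(instr.offset, instr.opname, positions[instr.offset // 2])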


This is equivalent to (and more efficient than) having this information 
in the traceback itself (as it would have duplicated the data and 
required more changes).


Understood and agreed.

We would document this a bit better with some examples, and we will 
make sure to add docs for it :)



--
Terry Jan Reedy


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/27NHADGIRL2KYU42LACX445TW53ENKWL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Pablo Galindo Salgado
Also, notice we are extending the traceback module (in Python) to support
this, so you can probably leverage those changes too and avoid messing
with code objects yourself :)

On Wed, 30 Jun 2021 at 22:29, Pablo Galindo Salgado 
wrote:

> Hi Terry,
>
> Thanks for your message!
>
> > Then how will modules that customize traceback presentation, such as
> idlelib, be able to get the 4-tuple for a particular traceback entry?
>
> From the exception, you can get the code object and from the code object
> the extra information using the Python API:
>
> Example:
>
> >>> try:
> ...   1/0
> ... except Exception as e:
> ...   f = e
> ...
> >>> list(f.__traceback__.tb_frame.f_code.co_positions())
> [(1, 4, 1, 8), (2, 2, 3, 4), (2, 2, 5, 6), (2, 2, 3, 6), (2, 2, 3, 6), (2,
> 4, 3, 8), (2, 4, 3, 8), (None, 2, 3, 6), (3, 4, 1, 8), (3, 3, 8, 17), (3,
> 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (4, 4, 7, 8), (4, 4, 3,
> 4), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8),
> (4, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (None, 4,
> 3, 8), (None, 4, 3, 8), (3, 4, 3, 8)]
> This is equivalent to (and more efficient than) having this information
> in the traceback itself (as it would have duplicated the data and
> required more changes).
>
> We would document this a bit better with some examples, and we will
> make sure to add docs for it :)
>
> Pablo
>
>
> On Wed, 30 Jun 2021 at 22:16, Terry Reedy  wrote:
>
>> On 6/30/2021 12:30 PM, Ammar Askar wrote:
>>
>> > I don't think we're making strong claims that the full `(line,
>> > end_line, column, end_column)` should be the canonical representation
>> > for exception locations. The only concrete place we suggest their
>> > usage is in the printing of tracebacks.
>>
>> sys.__excepthook__ will have access to the information, but will censor
>> it for long lines or slices that span more than one line.
>>
>> > The information is not exposed
>> > in the exception or traceback objects intentionally as part of this.
>>
>> Then how will modules that customize traceback presentation, such as
>> idlelib, be able to get the 4-tuple for a particular traceback entry?
>> This seems like a repeat of attribute and name error name hints being
>> not accessible from the traceback module and only accessible from Python
>> via a workaround such as in idlelib.run: 218:
>>
>>     if typ in (AttributeError, NameError):
>>         # 3.10+ hints are not directly accessible from python (#44026).
>>         err = io.StringIO()
>>         with contextlib.redirect_stderr(err):
>>             sys.__excepthook__(typ, exc, tb)
>>         return [err.getvalue().split("\n")[-2] + "\n"]
>>
>> As Pablo explained on #44026, the name hints are not part of the tb
>> object because they are expensive to compute and are not useful when
>> AttributeError and NameError are expected and are caught in code for
>> flow control purposes.  Therefore the hints are only computed when an
>> uncaught exception is displayed to users.
>>
>> However, the position information is already computed and it is just a
>> matter of passing it along *all the way* to the python coder.
>>
>> For slices within normal lines, the new caret line can be turned back
>> into position, as IDLE does now for SyntaxErrors.  But that does not
>> work when the caret represents a truncated slice.
>>
>> The PEP says co_positions will be added to code objects but makes no
>> mention of an API for accessing information within it.  And that still
>> leaves the issue of getting the code object.
>>
>> Summary: the position information would be much more useful if it were
>> added to traceback items and if the traceback functions then exposed it.
>>
>> Note: the current co_lines and co_linetable seem to be undocumented --
>> no index entries, nothing in
>> https://docs.python.org/3.11/reference/datamodel.html#index-56
>> and no docstring for co_lines.  So I have no idea how these work.
>>
>> Note 2: "Opt-out mechanism" says new environment variable, new
>> -Xnodebugranges flag. "Have a configure flag to opt out" says "we have
>> decided to the -O flag".  I presume the latter is obsolete.
>>
>>
>> --
>> Terry Jan Reedy
>>
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/RQYVQXJXZRLCWXVXKICJXB2RQLL2ZEXM/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CLHMM3I423I3K5SEWPBHFR6STAWX6S6E/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Pablo Galindo Salgado
Hi Terry,

Thanks for your message!

> Then how will modules that customize traceback presentation, such as
idlelib, be able to get the 4-tuple for a particular traceback entry?

From the exception, you can get the code object and from the code object
the extra information using the Python API:

Example:

>>> try:
...   1/0
... except Exception as e:
...   f = e
...
>>> list(f.__traceback__.tb_frame.f_code.co_positions())
[(1, 4, 1, 8), (2, 2, 3, 4), (2, 2, 5, 6), (2, 2, 3, 6), (2, 2, 3, 6), (2,
4, 3, 8), (2, 4, 3, 8), (None, 2, 3, 6), (3, 4, 1, 8), (3, 3, 8, 17), (3,
4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (3, 4, 1, 8), (4, 4, 7, 8), (4, 4, 3,
4), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8), (4, 4, 3, 8),
(4, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (None, 4, 3, 8), (None, 4,
3, 8), (None, 4, 3, 8), (3, 4, 3, 8)]
This is equivalent to (and more efficient than) having this information in
the traceback itself (as it would have duplicated the data and required
more changes).

We would document this a bit better with some examples, and we will make
sure to add docs for it :)

Pablo


On Wed, 30 Jun 2021 at 22:16, Terry Reedy  wrote:

> On 6/30/2021 12:30 PM, Ammar Askar wrote:
>
> > I don't think we're making strong claims that the full `(line,
> > end_line, column, end_column)` should be the canonical representation
> > for exception locations. The only concrete place we suggest their
> > usage is in the printing of tracebacks.
>
> sys.__excepthook__ will have access to the information, but will censor
> it for long lines or slices that span more than one line.
>
> > The information is not exposed
> > in the exception or traceback objects intentionally as part of this.
>
> Then how will modules that customize traceback presentation, such as
> idlelib, be able to get the 4-tuple for a particular traceback entry?
> This seems like a repeat of attribute and name error name hints being
> not accessible from the traceback module and only accessible from Python
> via a workaround such as in idlelib.run: 218:
>
>     if typ in (AttributeError, NameError):
>         # 3.10+ hints are not directly accessible from python (#44026).
>         err = io.StringIO()
>         with contextlib.redirect_stderr(err):
>             sys.__excepthook__(typ, exc, tb)
>         return [err.getvalue().split("\n")[-2] + "\n"]
>
> As Pablo explained on #44026, the name hints are not part of the tb
> object because they are expensive to compute and are not useful when
> AttributeError and NameError are expected and are caught in code for
> flow control purposes.  Therefore the hints are only computed when an
> uncaught exception is displayed to users.
>
> However, the position information is already computed and it is just a
> matter of passing it along *all the way* to the python coder.
>
> For slices within normal lines, the new caret line can be turned back
> into position, as IDLE does now for SyntaxErrors.  But that does not
> work when the caret represents a truncated slice.
>
> The PEP says co_positions will be added to code objects but makes no
> mention of an API for accessing information within it.  And that still
> leaves the issue of getting the code object.
>
> Summary: the position information would be much more useful if it were
> added to traceback items and if the traceback functions then exposed it.
>
> Note: the current co_lines and co_linetable seem to be undocumented --
> no index entries, nothing in
> https://docs.python.org/3.11/reference/datamodel.html#index-56
> and no docstring for co_lines.  So I have no idea how these work.
>
> Note 2: "Opt-out mechanism" says new environment variable, new
> -Xnodebugranges flag. "Have a configure flag to opt out" says "we have
> decided to the -O flag".  I presume the latter is obsolete.
>
>
> --
> Terry Jan Reedy
>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/RQYVQXJXZRLCWXVXKICJXB2RQLL2ZEXM/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Y5BYJ4XXY4EHIDUCZD3BO3VWMXQHJP2I/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Terry Reedy

On 6/30/2021 12:30 PM, Ammar Askar wrote:


I don't think we're making strong claims that the full `(line,
end_line, column, end_column)` should be the canonical representation
for exception locations. The only concrete place we suggest their
usage is in the printing of tracebacks.


sys.__excepthook__ will have access to the information, but will censor 
it for long lines or slices that span more than one line.



The information is not exposed
in the exception or traceback objects intentionally as part of this.


Then how will modules that customize traceback presentation, such as 
idlelib, be able to get the 4-tuple for a particular traceback entry? 
This seems like a repeat of attribute and name error name hints being 
not accessible from the traceback module and only accessible from Python 
via a workaround such as in idlelib.run: 218:


    if typ in (AttributeError, NameError):
        # 3.10+ hints are not directly accessible from python (#44026).
        err = io.StringIO()
        with contextlib.redirect_stderr(err):
            sys.__excepthook__(typ, exc, tb)
        return [err.getvalue().split("\n")[-2] + "\n"]

As Pablo explained on #44026, the name hints are not part of the tb 
object because they are expensive to compute and are not useful when 
AttributeError and NameError are expected and are caught in code for 
flow control purposes.  Therefore the hints are only computed when an 
uncaught exception is displayed to users.


However, the position information is already computed and it is just a 
matter of passing it along *all the way* to the python coder.


For slices within normal lines, the new caret line can be turned back 
into position, as IDLE does now for SyntaxErrors.  But that does not 
work when the caret represents a truncated slice.


The PEP says co_positions will be added to code objects but makes no 
mention of an API for accessing information within it.  And that still 
leaves the issue of getting the code object.


Summary: the position information would be much more useful if it were 
added to traceback items and if the traceback functions then exposed it.


Note: the current co_lines and co_linetable seem to be undocumented -- 
no index entries, nothing in

https://docs.python.org/3.11/reference/datamodel.html#index-56
and no docstring for co_lines.  So I have no idea how these work.

Note 2: "Opt-out mechanism" says new environment variable, new 
-Xnodebugranges flag. "Have a configure flag to opt out" says "we have 
decided to the -O flag".  I presume the latter is obsolete.



--
Terry Jan Reedy

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RQYVQXJXZRLCWXVXKICJXB2RQLL2ZEXM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Ammar Askar
Hi Mark,

Thank you for the feedback. Let me address/elaborate some of the
points that Pablo touched on.

> PEP 657 proposes that locations for exceptions be treated as ranges,
> whereas tracing, profiling and debugging currently treat locations as
> points.

I don't think we're making strong claims that the full `(line,
end_line, column, end_column)` should be the canonical representation
for exception locations. The only concrete place we suggest their
usage is in the printing of tracebacks. The information is not exposed
in the exception or traceback objects intentionally as part of this.

The one place we reference non-traceback tooling using this information
is coverage tools, which could achieve expression-level granularity in
coverage. As a quick example, consider:

x = True or f()

might be marked as covered by a line-based coverage tool, but this PEP
potentially exposes extra information that could help show that the
function call itself is not covered.
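
As a sketch of what such a tool might do (assuming the prototype's
co_positions() yields one position tuple per 2-byte instruction; a plain
name is used here instead of the literal True so the call cannot be
optimized away):

import dis

code = compile("x = cond or f()", "<example>", "exec")
positions = list(code.co_positions())
for instr in dis.get_instructions(code):
    if "CALL" in instr.opname:
        # The call's own span, distinct from the whole line.
        print(instr.opname, positions[instr.offset // 2])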

> Consider this example:
> https://github.com/python/cpython/blob/main/Lib/test/test_compile.py#L861

And this example continues to work just as it does right now. There is
no change to the tracing APIs or the existing co_lines method. I think
your concern is about tracing tools switching to PEP 657 (which they
are not obligated to do), but even that case works out:

co_lines() returns (6, 8, 4) for the CALL_METHOD.
co_positions() from PEP 657 returns (4, 6, 5, 6) corresponding to
(line, end_line, column, end_column) for the CALL_METHOD.
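
As a sketch, a line-oriented tool can keep working purely from
co_lines(), which yields (start_offset, end_offset, line) triples,
regardless of the extra co_positions() data:

def offsets_for_line(code, target_line):
    # Bytecode offset ranges attributed to a given source line.
    return [(start, end)
            for start, end, line in code.co_lines()
            if line == target_line]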

> For example, how would one set a breakpoint on line 4 above?

Just as they would right now; they don't need to change how they set
breakpoints.

> PEP 657 claims it is fully backwards compatible, but it cannot be both
> backwards compatible and consistent.

I think there's a misunderstanding between us on what backwards
compatibility means; can you concretely explain how this PEP or its
implementation would break existing tools? I understand your concerns
about having two potentially conflicting APIs for source locations, but
that is not a backwards compatibility problem.

> 1. Clarify, in detail, the impact on line-based tools like profilers,
> coverage.py and debuggers. This should include help on how to use the
> new APIs and where using the old APIs might result in behavioral changes.

As mentioned, we don't have an expectation for line-based tools to
switch to the new API. Its primary consumer is meant to be the
traceback mechanism. Usage of the old APIs will not, and must not, lead
to any behavioral changes.

> 2. Change the C API to a single function:
> int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int
> *startcolumn, int *endline, int *endcolumn)

Thank you, this is a great suggestion.

> 3. Drop the opt-out option.
> If the extra information is optional, then the compression scheme must
> allow for that; making the code more complex and potentially less
> efficient. Does opting out use the start of the range, or the old line,
> as the location?

In the future if a fantastic compression scheme is figured out that
requires both the end lines and column information, I think it would
be acceptable to make the opt-out only suppress the printing of the
carets in the traceback while still maintaining the data in the code
objects. This would still be backwards compatible.

> 4. Drop the limitation on column offsets.
> The data needs to be compressed anyway, so allowing arbitrary column
> offsets is effectively free.

Sure, I think there were plenty of good ideas thrown around about
compression and lazy-loading in the last PEP 657 thread, so this is
more of a soft limitation of the current implementation of the
PEP. This limit can be increased in the future without any changes in
the API or hurting backwards compatibility.

> 6. Store all location information in a single table (this applies more
> to the implementation than the PEP)
> Using four separate objects to hold the location info adds a lot of
> overhead for most functions.

I would just like to cap off and address this point together. The PEP
is meant primarily to aid with debugging and improve tracebacks. The
API is designed to be very limited and allow for a lot of room for
expansion and optimization. It does not make strong prescriptive
claims that tools must switch to the new richer information, as your
mail seems to suggest. The other, smaller aspects, like the internal
storage format and not storing data when opted out, are concerns that
can be addressed without making any breaking changes.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VWTCMNX5J6K7FMX4VPYHBU5DU34AMERI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657

2021-06-30 Thread Pablo Galindo Salgado
Hello Mark,

Thanks for writing this email. I do appreciate your effort and your passion
for trying to improve Python and this work, but I have to say that I am a
bit frustrated with how you are dealing with this and that, unfortunately,
it is absorbing too much emotional energy. On the other hand, I would love
it if you could create some pull requests after the initial implementation
is done and help us improve the base implementation with a better format
and more efficient implementations.

> PEP 657 describes how locations for exceptions are to be handled, but is
> vague on the treatment of locations for tracing, profiling and debugging.


That is because the PEP is called "Include Fine-Grained Error Locations in
Tracebacks"; we are interested in fixing that particular problem, and at
the request of many tool authors we have exposed the information, as it
turns out to be quite useful for them. But we don't want to specify how
the information will be used, or the semantics for tracing, profiling
and debugging. The semantics of that information are already there (the
same ones attached to AST offsets) and we are not interested in changing
or altering them, just (as the PEP mentions) propagating them. We are not
interested in creating more exhaustive contracts for the existing
information (AST offsets).

>  The impact on tools like coverage.py and debuggers should be made
> clearer. For example, how would one set a breakpoint on line 4 above?


There is no impact on existing tools because we are not changing previous
APIs; every API in the PEP is new, so these changes do not affect existing
tools. Existing tools could use the new APIs if they wish, taking into
account that these numbers have the exact semantics they have in the AST,
which we believe is almost always what they expect. I understand that you
may have a different view or disagree with how we view things, and that
is fine.

> The behavior of tracing and debuggers needs to be described for those
> locations.


No, there is no reason we need to provide that description. That's not the
scope of the PEP; we only propagate AST offsets that can be consumed with
the same semantics as the ones currently publicly available on the AST.
And this is not the primary focus of the PEP, which is, again, mainly
"Include Fine-Grained Error Locations in Tracebacks". I will respect it if
you disagree with this vision, but that is our vision and that is how we
have written the PEP.

> PEP 657 claims it is fully backwards compatible, but it cannot be both
> backwards compatible and consistent.
> There are fundamental differences between using ranges and points as
> locations.


This PEP is fully backwards compatible because it is not changing any
existing API; it is only adding new APIs. People who want "points" can use
the same existing APIs that have been used before without any problems,
because they are there exactly as they were. It is absolutely fine if you
have a different view on what is backwards compatible, but regarding our
backwards compatibility policy and PEP 387, this is backwards compatible.

> PEP 657 suggests the impact on startup would be negligible. That is
> not quite true.


We already answered this in the other thread you created with exhaustive
benchmarks. I don't think we have anything useful to add here.

Regarding your suggestions:

> 2. Change the C API to a single function:
>
> int PyCode_Addr2Location(PyCodeObject *co, int addr, int *startline, int
> *startcolumn, int *endline, int *endcolumn)


This is very reasonable, so I think we can change the PEP and the
implementation to do this.

> 3. Drop the opt-out option.
> If the extra information is optional, then the compression scheme must
> allow for that; making the code more complex and potentially less
> efficient. Does opting out use the start of the range, or the old line,
> as the location?


Not possible. The opt-out option is important for a lot of people and, as
far as I understand, very likely a requirement for the PEP to be
accepted.

> 4. Drop the limitation on column offsets.
> The data needs to be compressed anyway, so allowing arbitrary column
> offsets is effectively free.


The limitation can be dropped in the future without problems. This is
covered in the PEP:

> As specified previously, the underlying storage of the offsets should
> be considered an implementation detail, as the public APIs to obtain
> these values will return either C int types or Python int objects, which
> allows implementing better compression/encoding in the future if bigger
> ranges need to be supported.

For the first implementation, we are not interested in dealing with
compression or other heavy optimizations, but if you are interested you
are more than welcome to improve upon this and drop the limitation in
the future.
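
For illustration only, arbitrary offsets could be stored with a
variable-length integer encoding like the following sketch (an assumption
about one possible format, not the PEP's actual encoding):

def encode_varint(n):
    # LEB128-style: 7 bits per byte, high bit set means "more bytes".
    out = bytearray()
    while True:
        n, bits = n >> 7, n & 0x7F
        out.append(bits | (0x80 if n else 0))
        if not n:
            return bytes(out)

def decode_varint(data, i=0):
    result = shift = 0
    while True:
        b = data[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

Small offsets still cost a single byte under such a scheme, which is why
lifting the cap would be close to free.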

> 6. Store all location information in a single table (this applies more
> to the implementation than the PEP)
> Using four separate objects to hold the location info adds a lot of
> 

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-20 Thread Pablo Galindo Salgado
> Hopefully if we add more pseudokeywords in the future it won't break
the ability to parse traceback spans.

It won't, because soft keywords are now handled natively with the PEG
parser (as "match" and "case" are now) instead of being hacked into the
tokenizer :)


On Fri, 21 May 2021 at 01:55, Nathaniel Smith  wrote:

> On Wed, May 19, 2021 at 7:28 PM Pablo Galindo Salgado
>  wrote:
> >>
> >> Excellent point! Do you know how reliable this is in practice, i.e.
> >> what proportion of bytecode source spans are something you can
> >> successfully pass to ast.parse? If it works it's obviously nicer, but
> >> I can't tell how often it works. E.g. anything including
> >> return/break/continue/yield/await will fail, since those require an
> >> enclosing context to be legal. I doubt return/break/continue will
> >> raise exceptions often, but yield/await do all the time.
> >
> >
> > All those limitations are compile-time limitations because they imply
> > scoping. A valid AST is any piece of a converted parse tree, or a piece
> > of the PEG sub grammar:
> >
> > >>> ast.dump(ast.parse("yield"))
> > 'Module(body=[Expr(value=Yield())], type_ignores=[])'
> > >>> ast.dump(ast.parse("return"))
> > 'Module(body=[Return()], type_ignores=[])'
> > >>> ast.dump(ast.parse("continue"))
> > 'Module(body=[Continue()], type_ignores=[])'
> > >>> ast.dump(ast.parse("await x"))
> > "Module(body=[Expr(value=Await(value=Name(id='x', ctx=Load(],
> type_ignores=[])"
>
> Ah, nice! I guess I was confused by memories of the behavior in 3.6
> and earlier, where 'await' was a pseudokeyword:
>
> ❯ docker run -it --rm python:3.6-alpine
> >>> import ast
> >>> ast.parse("await f()")
> SyntaxError: invalid syntax
>
> Hopefully if we add more pseudokeywords in the future it won't break
> the ability to parse traceback spans.
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RQ65HRD47SR73B256EUXPUCSVY65USDV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-20 Thread Nathaniel Smith
On Wed, May 19, 2021 at 7:28 PM Pablo Galindo Salgado
 wrote:
>>
>> Excellent point! Do you know how reliable this is in practice, i.e.
>> what proportion of bytecode source spans are something you can
>> successfully pass to ast.parse? If it works it's obviously nicer, but
>> I can't tell how often it works. E.g. anything including
>> return/break/continue/yield/await will fail, since those require an
>> enclosing context to be legal. I doubt return/break/continue will
>> raise exceptions often, but yield/await do all the time.
>
>
> All those limitations are compile-time limitations because they imply
> scoping. A valid AST is any piece of a converted parse tree, or a piece
> of the PEG sub grammar:
>
> >>> ast.dump(ast.parse("yield"))
> 'Module(body=[Expr(value=Yield())], type_ignores=[])'
> >>> ast.dump(ast.parse("return"))
> 'Module(body=[Return()], type_ignores=[])'
> >>> ast.dump(ast.parse("continue"))
> 'Module(body=[Continue()], type_ignores=[])'
> >>> ast.dump(ast.parse("await x"))
> "Module(body=[Expr(value=Await(value=Name(id='x', ctx=Load(], 
> type_ignores=[])"

Ah, nice! I guess I was confused by memories of the behavior in 3.6
and earlier, where 'await' was a pseudokeyword:

❯ docker run -it --rm python:3.6-alpine
>>> import ast
>>> ast.parse("await f()")
SyntaxError: invalid syntax

Hopefully if we add more pseudokeywords in the future it won't break
the ability to parse traceback spans.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OGGRDHOPOVXS25UMYYLZ55FGDMA25UMU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-19 Thread Pablo Galindo Salgado
>
> Excellent point! Do you know how reliable this is in practice, i.e.
> what proportion of bytecode source spans are something you can
> successfully pass to ast.parse? If it works it's obviously nicer, but
> I can't tell how often it works. E.g. anything including
> return/break/continue/yield/await will fail, since those require an
> enclosing context to be legal. I doubt return/break/continue will
> raise exceptions often, but yield/await do all the time.


All those limitations are compile-time limitations because they imply
scoping. A valid AST is any piece of a converted parse tree, or a piece
of the PEG sub grammar:

>>> ast.dump(ast.parse("yield"))
'Module(body=[Expr(value=Yield())], type_ignores=[])'
>>> ast.dump(ast.parse("return"))
'Module(body=[Return()], type_ignores=[])'
>>> ast.dump(ast.parse("continue"))
'Module(body=[Continue()], type_ignores=[])'
>>> ast.dump(ast.parse("await x"))
"Module(body=[Expr(value=Await(value=Name(id='x', ctx=Load(],
type_ignores=[])"

On Thu, 20 May 2021 at 03:22, Nathaniel Smith  wrote:

> On Tue, May 18, 2021 at 2:49 PM Pablo Galindo Salgado
>  wrote:
> > * It actually doesn't have more advantages. The current solution in the
> PEP can do exactly the same as this solution if you allow reparsing when
> > displaying tracebacks. This is because with the start line, end line,
> start offset and end offset and the original file, you can extract the
> source that
> > is associated with the instruction, parse it (and this
> > is much faster because you just need to parse the tiny fragment) and
> then you get an AST node that you can use for whatever you want.
>
> Excellent point! Do you know how reliable this is in practice, i.e.
> what proportion of bytecode source spans are something you can
> successfully pass to ast.parse? If it works it's obviously nicer, but
> I can't tell how often it works. E.g. anything including
> return/break/continue/yield/await will fail, since those require an
> enclosing context to be legal. I doubt return/break/continue will
> raise exceptions often, but yield/await do all the time.
>
> You could kluge it by wrapping the source span in a dummy 'async def'
> before parsing, since that makes yield/await legal, but OTOH it makes
> 'yield from' and 'from X import *' illegal.
>
> I guess you could have a helper that attempts passing the string to
> ast.parse, and if that fails tries wrapping it in a loop/sync
> def/async def/etc. until one of them succeeds. Maybe that would be a
> useful utility to add to the traceback module?
>
> Or add a PyCF_YOLO flag that tries to make sense of an arbitrary
> out-of-context string.
>
> (Are there any other bits of syntax that require specific contexts
> that I'm not thinking of? If __enter__/__exit__ raise an exception,
> then what's the corresponding span? The entire 'with' block, or just
> the 'with' line itself?)
>
> -n
>
> PS: this is completely orthogonal to PEP 657, but if you're excited
> about making tracebacks more readable, another piece of low-hanging
> fruit would be to print method __qualname__s instead of __name__s in
> the traceback output. The reason we don't do that now is that
> __qualname__ lives on the function object, but in a traceback, we
> can't get the function object. The traceback only has access to the
> code object, and the code object doesn't have __qualname__, just
> __name__. Probably the cleanest way to do this would be to make the
> traceback or code object have a pointer back to the function object.
> See also https://bugs.python.org/issue12857.
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VLTCIVKY3PBZC6LAMQ7EFZBO53KSGWYD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-19 Thread Nathaniel Smith
On Tue, May 18, 2021 at 2:49 PM Pablo Galindo Salgado
 wrote:
> * It actually doesn't have more advantages. The current solution in the PEP 
> can do exactly the same as this solution if you allow reparsing when
> displaying tracebacks. This is because with the start line, end line, start 
> offset and end offset and the original file, you can extract the source that
> is associated with the instruction, parse it (and this
> is much faster because you just need to parse the tiny fragment) and then you 
> get an AST node that you can use for whatever you want.

Excellent point! Do you know how reliable this is in practice, i.e.
what proportion of bytecode source spans are something you can
successfully pass to ast.parse? If it works it's obviously nicer, but
I can't tell how often it works. E.g. anything including
return/break/continue/yield/await will fail, since those require an
enclosing context to be legal. I doubt return/break/continue will
raise exceptions often, but yield/await do all the time.

You could kluge it by wrapping the source span in a dummy 'async def'
before parsing, since that makes yield/await legal, but OTOH it makes
'yield from' and 'from X import *' illegal.

I guess you could have a helper that attempts passing the string to
ast.parse, and if that fails tries wrapping it in a loop/sync
def/async def/etc. until one of them succeeds. Maybe that would be a
useful utility to add to the traceback module?
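
A sketch of such a helper (the wrapper list and names are illustrative,
not a proposed API):

import ast
import textwrap

def parse_span(source):
    # Try the span as-is, then inside wrappers that make
    # context-dependent syntax (return/break/continue/yield/await) legal.
    candidates = [
        source,
        "while True:\n" + textwrap.indent(source, "    "),      # break/continue
        "def _w():\n" + textwrap.indent(source, "    "),        # return, yield
        "async def _w():\n" + textwrap.indent(source, "    "),  # await
    ]
    for candidate in candidates:
        try:
            return ast.parse(candidate)
        except SyntaxError:
            continue
    return None  # give up; the caller falls back to plain text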

Or add a PyCF_YOLO flag that tries to make sense of an arbitrary
out-of-context string.

(Are there any other bits of syntax that require specific contexts
that I'm not thinking of? If __enter__/__exit__ raise an exception,
then what's the corresponding span? The entire 'with' block, or just
the 'with' line itself?)

-n

PS: this is completely orthogonal to PEP 657, but if you're excited
about making tracebacks more readable, another piece of low-hanging
fruit would be to print method __qualname__s instead of __name__s in
the traceback output. The reason we don't do that now is that
__qualname__ lives on the function object, but in a traceback, we
can't get the function object. The traceback only has access to the
code object, and the code object doesn't have __qualname__, just
__name__. Probably the cleanest way to do this would be to make the
traceback or code object have a pointer back to the function object.
See also https://bugs.python.org/issue12857.

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JWZHMW6WQOQMSAGWKMRTEHHRSZRMNW3C/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Pablo Galindo Salgado
Ok, we have implemented a rough prototype and we have decided not to go
with this for the following reasons:

* It is almost the same storage cost as the solution we already have:
storing the node id costs 4 bytes (an integer) per instruction,
while our solution needs 2+ bytes per instruction for the offsets (normally
just 2, as offsets are generally < 256) plus the compressed end line,
which is very small since it is almost always the same as the start line.
* It actually doesn't have more advantages. The current solution in the PEP
can do exactly the same as this solution if you allow reparsing when
displaying tracebacks. This is because with the start line, end line, start
offset and end offset and the original file, you can extract the source that
is associated with the instruction, parse it (and this is much faster
because you just need to parse the tiny fragment), and then you get an AST
node that you can use for whatever you want (see the sketch after this
list).
* The proposed solution forces us to reparse entire files several times just
to extract a single AST node. Even worse: for frames in the same file it
forces us to reparse those again and again, unless we complicate the
implementation by adding AST caching. But even that won't be free, as it
will incur quite a lot of extra memory use, and this is problematic,
especially when displaying exceptions, as memory can be low, which the
current design takes into account.
* Nodes are created and modified after the optimization pass, so the AST
produced by the parser is not enough to reconstruct the actual
information; we need to also run the optimization passes. Unfortunately,
this is (by design) not done in the Python API (to preserve
all possible information about the original code), so this complicates
the API to get this quite a lot, as `ast.parse` is not enough to get the
original tree.
* The implementation is quite a bit more complex than the one the PEP has,
since doing it right implies having to hash the source files, implement
node id propagation in the parser and a node visitor to find the correct
node, plus reparsing in the traceback.
* It makes the AST (both the C one and the Python one) bigger, which
increases the memory costs of parsing and very slightly affects performance
(we measured a 4% decrease in perf to add the new field).
* It makes the API for 3rd party tools very non-straightforward, forcing
reparsing and finding the AST nodes. Even if we provide
some new functionality in the ast module or similar to make this easier,
it is quite a lot of overhead just to get the position information.

Even if it is possible to solve many of these problems, we think the
complexity is not worth it, especially since it doesn't actually give more
functionality.
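
For concreteness, a rough sketch of that extract-and-reparse step (the
function name and the trimming logic are mine, not part of the PEP):

import ast

def node_for_position(filename, lineno, end_lineno, col, end_col):
    # Extract the source span an instruction points at and reparse
    # just that fragment, yielding an AST for it.
    with open(filename) as f:
        lines = f.read().splitlines()
    span = lines[lineno - 1:end_lineno]
    span[-1] = span[-1][:end_col]  # trim the end of the last line first
    span[0] = span[0][col:]        # then the start of the first line
    return ast.parse("\n".join(span))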

On Tue, 18 May 2021 at 13:47, Pablo Galindo Salgado 
wrote:

> > One integer is actually not enough to assign IDs.
>
> Actually, disregard this particular problem. I think that we could
> perfectly well stop assigning IDs if we reach the overflow limit and call
> it a day, since you need a truly horrendous file to reach 4,294,967,295
> AST nodes (I did some tests to check this).
>
> On Tue, 18 May 2021 at 13:25, Pablo Galindo Salgado 
> wrote:
>
>> Yet another problem that I found:
>>
>> One integer is actually not enough to assign IDs. One unsigned integer
>> can cover 4,294,967,295 AST nodes, but it is technically possible
>> to have more than that in a single file. While in PEP 657 we are tracking
>> offsets that are normally very low (< 100 typically) or end lines
>> that are easily compressible (as end lines are normally equal to the
>> start of the line), node IDs can be arbitrarily big. What is worse:
>> the parser creates some AST nodes that it throws away if the parser fails
>> (that's how PEG parsers work), so there will be a lot of IDs that
>> don't get assigned (and the parser doesn't have an easy way to know how
>> many IDs it has thrown away). This forces us to either use something
>> On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado 
>> wrote:
>>
>>> Hu, actually another problem of this approach:
>>>
>>> Nodes are created and modified after the optimization pass, so the AST
>>> produced by the parser is not enough to reconstruct the actual
>>> information, we need to also run the optimization passes, but
>>> unfortunately, this is (by design) not done in the Python API (to preserve
>>> all possible information about the original code), so this complicates
>>> quite a lot the API to get this, as `ast.parse` is not enough to get the
>>> original tree.
>>>
>>> On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado 
>>> wrote:
>>>
 Hi Nathaniel,

 Thanks a lot for your suggestion! I like the idea although I still
 think it is more complex than our current proposal, but on the other hand
 it allows for much richer results, so I'm quite excited to try it out. We are
 going to give it a go to explore it with a prototype and if we are
 

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Larry Hastings

On 5/18/21 5:25 AM, Pablo Galindo Salgado wrote:

Yet another problem that I found:

One integer is actually not enough to assign IDs. One unsigned integer 
can cover 4,294,967,295 AST nodes, but it is technically possible to have 
more than that in a single file.



Surely you could use a 64-bit int for the node ID.


//arry/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AEBB7YY2OYSTPJL5K44USRS3WCS2BTMI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Pablo Galindo Salgado
> One integer is actually not enough to assign IDs.

Actually, disregard this particular problem. I think that we could
perfectly well stop assigning IDs if we reach the overflow limit and call
it a day, since you need a truly horrendous file to reach 4,294,967,295
AST nodes (I did some tests to check this).
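
A quick version of such a test, as a sketch (the module chosen is just an
example):

import ast

source = open("Lib/ast.py").read()  # any large-ish file works
tree = ast.parse(source)
print(sum(1 for _ in ast.walk(tree)))  # tens of thousands, nowhere near 2**32 - 1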

On Tue, 18 May 2021 at 13:25, Pablo Galindo Salgado 
wrote:

> Yet another problem that I found:
>
> One integer is actually not enough to assign IDs. One unsigned integer can
> cover 4,294,967,295 AST nodes, but is technically possible
> to have more than that in a single file. While in PEP 657 we are tracking
> offsets that are normally very low < 100 typically or end lines
> that are easily compressible (as end lines are normally equal to the start
> of the line), node IDs can be arbitrarily big. Which is worse:
> the parser creates some AST nodes that throw away if the parser fails
> (that's how PEG parsers work), so there will be a lot of IDs that
> don't get assigned (and the parser doesn't have an easy way to know how
> many IDs has thrown away). This forces us to either use something
> bigger than an integer (probably a size_t) or to deal with overflow.
>
> On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado 
> wrote:
>
>> Hu, actually another problem of this approach:
>>
>> Nodes are created and modified after the optimization pass, so the AST
>> produced by the parser is not enough to reconstruct the actual
>> information, we need to also run the optimization passes, but
>> unfortunately, this is (by design) not done in the Python API (to preserve
>> all possible information about the original code), so this complicates
>> quite a lot the API to get this, as `ast.parse` is not enough to get the
>> original tree.
>>
>> On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado 
>> wrote:
>>
>>> Hi Nathaniel,
>>>
>>> Thanks a lot for your suggestion! I like the idea although I still think
>>> it is more complex than our current proposal, but on the other hand it
>>> allows for much richer results, so I'm quite excited to try it out. We are going
>>> to give it a go to explore it with a prototype and if we are convinced we
>>> will update the PEP.
>>>
>>> One thing I'm not a fan of is that tools that want to leverage this
>>> information (notice our pep also offers this info for tools as an important
>>> aspect of it) will need to reparse the file AND search the AST node, which
>>> also involves transforming the full tree to Python objects, which is even
>>> slower. This is unfortunately a worse API than just getting the full
>>> location given an instruction offset. Also, while displaying tracebacks
>>> may be a good scenario for the speed tradeoff, it may not be for other
>>> tools. Finally, if there is no file available all this information is
>>> lost, although one could argue that it is then not extremely useful...
>>>
>>> Regards from sunny London,
>>> Pablo Galindo Salgado
>>>
>>> On Tue, 18 May 2021, 01:43 Nathaniel Smith,  wrote:
>>>
 On Mon, May 17, 2021 at 6:18 AM Mark Shannon  wrote:
 > 2. Repeated binary operations on the same line.
 >
 > A single location can also be clearer when all the code is on one
 line.
 >
 > i1 + i2 + s1
 >
 > PEP 657:
 >
 > i1 + i2 + s1
 > ^^^^^^^^^^^^
 >
 > Using a single location:
 >
 > i1 + i2 + s1
 >  ^

 It's true this case is a bit confusing with the whole operation span
 highlighted, but I'm not sure the single location version is much better. I
 feel like a Really Good UI would like, highlight the two operands in
 different colors or something, or at least underline the two separate items
 whose type is incompatible separately:

 TypeError: unsupported operand type(s) for +: 'int' + 'str':
 i1 + i2 + s1
 ^^^   ~~

 More generally, these error messages are the kind of thing where the UI
 can always be tweaked to improve further, and those tweaks can make good
 use of any rich source information that's available.

 So, here's another option to consider:

 - When parsing, assign each AST node a unique, deterministic id (e.g.
 sequentially across the AST tree from top-to-bottom, left-to-right).
 - For each bytecode offset, store the corresponding AST node id in an
 lnotab-like table
 - When displaying a traceback, we already need to go find and read the
 original .py file to print source code at all. Re-parse it, and use the ids
 to find the original AST node, in context with full structure. Let the
 traceback formatter do whatever clever stuff it wants with this info.

 Of course if the .py and .pyc files don't match, this might produce
 gibberish. We already have that problem with showing source lines, but it
 might be even more confusing if we get some random unrelated AST node. This
 could be avoided by storing some kind of hash in the code object, so that

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Pablo Galindo Salgado
Yet another problem that I found:

One integer is actually not enough to assign IDs. One unsigned integer can
cover 4,294,967,295 AST nodes, but it is technically possible
to have more than that in a single file. While in PEP 657 we are tracking
offsets that are normally very low (< 100 typically) or end lines
that are easily compressible (as end lines are normally equal to the start
of the line), node IDs can be arbitrarily big. What is worse:
the parser creates some AST nodes that it throws away if the parser fails
(that's how PEG parsers work), so there will be a lot of IDs that
don't get assigned (and the parser doesn't have an easy way to know how
many IDs it has thrown away). This forces us to either use something
bigger than an integer (probably a size_t) or to deal with overflow.

On Tue, 18 May 2021 at 10:23, Pablo Galindo Salgado 
wrote:

> Hu, actually another problem of this approach:
>
> Nodes are created and modified after the optimization pass, so the AST
> produced by the parser is not enough to reconstruct the actual
> information, we need to also run the optimization passes, but
> unfortunately, this is (by design) not done in the Python API (to preserve
> all possible information about the original code), so this complicates
> quite a lot the API to get this, as `ast.parse` is not enough to get the
> original tree.
>
> On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado 
> wrote:
>
>> Hi Nathaniel,
>>
>> Thanks a lot for your suggestion! I like the idea although I still think
>> it is more complex than our current proposal, but on the other hand it
>> allows for much richer results, so I'm quite excited to try it out. We are going
>> to give it a go to explore it with a prototype and if we are convinced we
>> will update the PEP.
>>
>> One thing I'm not a fan of is that tools that want to leverage this
>> information (notice our pep also offers this info for tools as an important
>> aspect of it) will need to reparse the file AND search the AST node, which
>> also involves transforming the full tree to Python objects, which is even
>> slower. This is unfortunately a worse API than just getting the full
>> location given an instruction offset. Also, while displaying tracebacks
>> may be a good scenario for the speed tradeoff, it may not be for other
>> tools. Finally, if there is no file available all this information is
>> lost, although one could argue that it is then not extremely useful...
>>
>> Regards from sunny London,
>> Pablo Galindo Salgado
>>
>> On Tue, 18 May 2021, 01:43 Nathaniel Smith,  wrote:
>>
>>> On Mon, May 17, 2021 at 6:18 AM Mark Shannon  wrote:
>>> > 2. Repeated binary operations on the same line.
>>> >
>>> > A single location can also be clearer when all the code is on one line.
>>> >
>>> > i1 + i2 + s1
>>> >
>>> > PEP 657:
>>> >
>>> > i1 + i2 + s1
>>> > ^^^^^^^^^^^^
>>> >
>>> > Using a single location:
>>> >
>>> > i1 + i2 + s1
>>> >  ^
>>>
>>> It's true this case is a bit confusing with the whole operation span
>>> highlighted, but I'm not sure the single location version is much better. I
>>> feel like a Really Good UI would like, highlight the two operands in
>>> different colors or something, or at least underline the two separate items
>>> whose type is incompatible separately:
>>>
>>> TypeError: unsupported operand type(s) for +: 'int' + 'str':
>>> i1 + i2 + s1
>>> ^^^   ~~
>>>
>>> More generally, these error messages are the kind of thing where the UI
>>> can always be tweaked to improve further, and those tweaks can make good
>>> use of any rich source information that's available.
>>>
>>> So, here's another option to consider:
>>>
>>> - When parsing, assign each AST node a unique, deterministic id (e.g.
>>> sequentially across the AST tree from top-to-bottom, left-to-right).
>>> - For each bytecode offset, store the corresponding AST node id in an
>>> lnotab-like table
>>> - When displaying a traceback, we already need to go find and read the
>>> original .py file to print source code at all. Re-parse it, and use the ids
>>> to find the original AST node, in context with full structure. Let the
>>> traceback formatter do whatever clever stuff it wants with this info.
>>>
>>> Of course if the .py and .pyc files don't match, this might produce
>>> gibberish. We already have that problem with showing source lines, but it
>>> might be even more confusing if we get some random unrelated AST node. This
>>> could be avoided by storing some kind of hash in the code object, so that
>>> we can validate the .py file we find hasn't changed (sha512 if we're
>>> feeling fancy, crc32 if we want to save space, either way is probably fine).
>>>
>>> This would make traceback printing more expensive, but only if you want
>>> the fancy features, and traceback printing is already expensive (it does
>>> file I/O!). Usually by the time you're rendering a traceback it's more
>>> important to optimize for human time than CPU time. It would take less
>>> memory than PEP 657, and the 

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Pablo Galindo Salgado
Hu, actually another problem of this approach:

Nodes are created and modified after the optimization pass, so the AST
produced by the parser is not enough to reconstruct the actual
information; we need to also run the optimization passes. Unfortunately,
this is (by design) not done in the Python API (to preserve
all possible information about the original code), so this complicates
the API to get this quite a lot, as `ast.parse` is not enough to get the
original tree.

On Tue, 18 May 2021 at 08:53, Pablo Galindo Salgado 
wrote:

> Hi Nathaniel,
>
> Thanks a lot for your suggestion! I like the idea although I still think
> it is more complex than our current proposal, but on the other hand it
> allows for much richer results, so I'm quite excited to try it out. We are going
> to give it a go to explore it with a prototype and if we are convinced we
> will update the PEP.
>
> One thing I'm not a fan of is that tools that want to leverage this
> information (notice our pep also offers this info for tools as an important
> aspect of it) will need to reparse the file AND search the AST node, which
> also involves transforming the full tree to Python objects, which is even
> slower. This is unfortunately a worse API than just getting the full
> location given an instruction offset. Also, while displaying tracebacks
> may be a good scenario for the speed tradeoff, it may not be for other
> tools. Finally, if there is no file available all this information is
> lost, although one could argue that it is then not extremely useful...
>
> Regards from sunny London,
> Pablo Galindo Salgado
>
> On Tue, 18 May 2021, 01:43 Nathaniel Smith,  wrote:
>
>> On Mon, May 17, 2021 at 6:18 AM Mark Shannon  wrote:
>> > 2. Repeated binary operations on the same line.
>> >
>> > A single location can also be clearer when all the code is on one line.
>> >
>> > i1 + i2 + s1
>> >
>> > PEP 657:
>> >
>> > i1 + i2 + s1
>> > ^^^^^^^^^^^^
>> >
>> > Using a single location:
>> >
>> > i1 + i2 + s1
>> >  ^
>>
>> It's true this case is a bit confusing with the whole operation span
>> highlighted, but I'm not sure the single location version is much better. I
>> feel like a Really Good UI would like, highlight the two operands in
>> different colors or something, or at least underline the two separate items
>> whose type is incompatible separately:
>>
>> TypeError: unsupported operand type(s) for +: 'int' + 'str':
>> i1 + i2 + s1
>> ^^^^^^^   ~~
>>
>> More generally, these error messages are the kind of thing where the UI
>> can always be tweaked to improve further, and those tweaks can make good
>> use of any rich source information that's available.
>>
>> So, here's another option to consider:
>>
>> - When parsing, assign each AST node a unique, deterministic id (e.g.
>> sequentially across the AST tree from top-to-bottom, left-to-right).
>> - For each bytecode offset, store the corresponding AST node id in an
>> lnotab-like table
>> - When displaying a traceback, we already need to go find and read the
>> original .py file to print source code at all. Re-parse it, and use the ids
>> to find the original AST node, in context with full structure. Let the
>> traceback formatter do whatever clever stuff it wants with this info.
>>
>> Of course if the .py and .pyc files don't match, this might produce
>> gibberish. We already have that problem with showing source lines, but it
>> might be even more confusing if we get some random unrelated AST node. This
>> could be avoided by storing some kind of hash in the code object, so that
>> we can validate the .py file we find hasn't changed (sha512 if we're
>> feeling fancy, crc32 if we want to save space, either way is probably fine).
>>
>> This would make traceback printing more expensive, but only if you want
>> the fancy features, and traceback printing is already expensive (it does
>> file I/O!). Usually by the time you're rendering a traceback it's more
>> important to optimize for human time than CPU time. It would take less
>> memory than PEP 657, and the same as Mark's proposal (both only track one
>> extra integer per bytecode offset). And it would allow for arbitrarily rich
>> traceback display.
>>
>> (I guess in theory you could make this even cheaper by using it to
>> replace lnotab, instead of extending it. But I think keeping lnotab around
>> is a good idea, as a fallback for cases where you can't find the original
>> source but still want some hint at location information.)
>>
>> -n
>>
>> --
>> Nathaniel J. Smith -- https://vorpus.org
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/BUXFOSAEBXLIHH432PKBCXOGXUAHQIVP/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-18 Thread Pablo Galindo Salgado
Hi Nathaniel,

Thanks a lot for your suggestion! I like the idea, although I still think
it is more complex than our current proposal; on the other hand, it allows
for much richer results, so I'm quite excited to try it out. We are going
to explore it with a prototype, and if we are convinced we will update the
PEP.

One thing I'm not a fan of is that tools that want to leverage this
information (notice our PEP also offers this info for tools as an important
aspect of it) will need to reparse the file AND search for the AST node,
which also involves transforming the full tree to Python objects, which is
even slower. This is unfortunately a worse API than just getting the full
location given an instruction offset. Also, while displaying tracebacks may
be a good scenario for the speed tradeoff, it may not be for other tools.
Finally, if there is no file available, all this information is lost,
although one could argue that it is then not extremely useful...
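
That last case is easy to reproduce: when code is compiled from a string,
the traceback machinery has no file to read, and a reparse-based scheme
would have nothing to reparse (a minimal demo; the "<generated>" filename
is arbitrary):

```
import traceback

try:
    exec(compile("1/0", "<generated>", "exec"))
except ZeroDivisionError:
    # The "<generated>" frame prints with no source line, because there
    # is no file on disk to read (or to re-parse).
    traceback.print_exc()
```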

Regards from sunny London,
Pablo Galindo Salgado

On Tue, 18 May 2021, 01:43 Nathaniel Smith wrote:

> On Mon, May 17, 2021 at 6:18 AM Mark Shannon wrote:
> > 2. Repeated binary operations on the same line.
> >
> > A single location can also be clearer when all the code is on one line.
> >
> > i1 + i2 + s1
> >
> > PEP 657:
> >
> > i1 + i2 + s1
> > ^^^^^^^^^^^^
> >
> > Using a single location:
> >
> > i1 + i2 + s1
> >  ^
>
> It's true this case is a bit confusing with the whole operation span
> highlighted, but I'm not sure the single location version is much better. I
> feel like a Really Good UI would, like, highlight the two operands in
> different colors or something, or at least underline the two separate items
> whose type is incompatible separately:
>
> TypeError: unsupported operand type(s) for +: 'int' + 'str':
> i1 + i2 + s1
> ^^^^^^^   ~~
>
> More generally, these error messages are the kind of thing where the UI
> can always be tweaked to improve further, and those tweaks can make good
> use of any rich source information that's available.
>
> So, here's another option to consider:
>
> - When parsing, assign each AST node a unique, deterministic id (e.g.
> sequentially across the AST tree from top-to-bottom, left-to-right).
> - For each bytecode offset, store the corresponding AST node id in an
> lnotab-like table
> - When displaying a traceback, we already need to go find and read the
> original .py file to print source code at all. Re-parse it, and use the ids
> to find the original AST node, in context with full structure. Let the
> traceback formatter do whatever clever stuff it wants with this info.
>
> Of course if the .py and .pyc files don't match, this might produce
> gibberish. We already have that problem with showing source lines, but it
> might be even more confusing if we get some random unrelated AST node. This
> could be avoided by storing some kind of hash in the code object, so that
> we can validate the .py file we find hasn't changed (sha512 if we're
> feeling fancy, crc32 if we want to save space, either way is probably fine).
>
> This would make traceback printing more expensive, but only if you want
> the fancy features, and traceback printing is already expensive (it does
> file I/O!). Usually by the time you're rendering a traceback it's more
> important to optimize for human time than CPU time. It would take less
> memory than PEP 657, and the same as Mark's proposal (both only track one
> extra integer per bytecode offset). And it would allow for arbitrarily rich
> traceback display.
>
> (I guess in theory you could make this even cheaper by using it to replace
> lnotab, instead of extending it. But I think keeping lnotab around is a
> good idea, as a fallback for cases where you can't find the original source
> but still want some hint at location information.)
>
> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/BUXFOSAEBXLIHH432PKBCXOGXUAHQIVP/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SZPCHKVI32YI6EMRLLHKGTAU6M6VAKJZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Nathaniel Smith
On Mon, May 17, 2021 at 6:18 AM Mark Shannon wrote:
> 2. Repeated binary operations on the same line.
>
> A single location can also be clearer when all the code is on one line.
>
> i1 + i2 + s1
>
> PEP 657:
>
> i1 + i2 + s1
> ^^^^^^^^^^^^
>
> Using a single location:
>
> i1 + i2 + s1
>  ^

It's true this case is a bit confusing with the whole operation span
highlighted, but I'm not sure the single location version is much better. I
feel like a Really Good UI would, like, highlight the two operands in
different colors or something, or at least underline the two separate items
whose type is incompatible separately:

TypeError: unsupported operand type(s) for +: 'int' + 'str':
i1 + i2 + s1
^^^^^^^   ~~

More generally, these error messages are the kind of thing where the UI can
always be tweaked to improve further, and those tweaks can make good use of
any rich source information that's available.

So, here's another option to consider:

- When parsing, assign each AST node a unique, deterministic id (e.g.
sequentially across the AST tree from top-to-bottom, left-to-right).
- For each bytecode offset, store the corresponding AST node id in an
lnotab-like table
- When displaying a traceback, we already need to go find and read the
original .py file to print source code at all. Re-parse it, and use the ids
to find the original AST node, in context with full structure. Let the
traceback formatter do whatever clever stuff it wants with this info (a
rough sketch of the consumer side follows).
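
A rough sketch of what that consumer side could look like, assuming the
compiler and the formatter agree on one deterministic numbering (here,
`ast.walk` order); none of these helper names exist in CPython:

```
import ast

def number_nodes(source):
    # Assign each AST node a deterministic id; ast.walk is a
    # deterministic traversal for a given source text.
    return dict(enumerate(ast.walk(ast.parse(source))))

def node_for_id(source, node_id):
    # Re-parse the original .py file and recover the node, in context.
    return number_nodes(source).get(node_id)

node = node_for_id("i1 + i2 + s1", 2)   # id 2 -> the outer BinOp
print(type(node).__name__, node.col_offset, node.end_col_offset)  # BinOp 0 12
```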

Of course if the .py and .pyc files don't match, this might produce
gibberish. We already have that problem with showing source lines, but it
might be even more confusing if we get some random unrelated AST node. This
could be avoided by storing some kind of hash in the code object, so that
we can validate the .py file we find hasn't changed (sha512 if we're
feeling fancy, crc32 if we want to save space, either way is probably fine).
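
The staleness check itself could be as small as the following sketch;
where and how the digest gets stored in the code object is the open
design question:

```
import hashlib

def source_matches(stored_digest, source_bytes):
    # Compare a digest recorded at compile time against the source found
    # at traceback time; a mismatch means "don't trust the node ids".
    return hashlib.sha512(source_bytes).hexdigest() == stored_digest
```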

This would make traceback printing more expensive, but only if you want the
fancy features, and traceback printing is already expensive (it does file
I/O!). Usually by the time you're rendering a traceback it's more important
to optimize for human time than CPU time. It would take less memory than
PEP 657, and the same as Mark's proposal (both only track one extra integer
per bytecode offset). And it would allow for arbitrarily rich traceback
display.

(I guess in theory you could make this even cheaper by using it to replace
lnotab, instead of extending it. But I think keeping lnotab around is a
good idea, as a fallback for cases where you can't find the original source
but still want some hint at location information.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BUXFOSAEBXLIHH432PKBCXOGXUAHQIVP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Brandt Bucher
Ethan Furman wrote:
> On 5/17/2021 6:13 AM, Mark Shannon wrote:
> > Where i1, i2 are integers and s1 is a string.
> > i1 + i2 + s1
> > ^^^^^^^^^^^^
> Wouldn't the carets just be under the i2 + s1 portion?

I don't think so, since this is executed as `((i1 + i2) + s1)`.

Mark's carets look correct to me, since the second (outer) addition's LHS is 
the result of adding `i1` and `i2`:

```
Python 3.11.0a0 (heads/main:a42d98ed91, May 16 2021, 14:02:36) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> op = ast.parse("i1 + i2 + s1", mode="eval").body
>>> op
<ast.BinOp object at 0x...>
>>> op.col_offset
0
>>> op.end_col_offset
12
```
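
For comparison, the left operand of the outer addition is itself a BinOp
spanning only columns 0-7, which is why Mark's carets cover the whole line
(continuing the same session; the object address is elided):

```
>>> op.left
<ast.BinOp object at 0x...>
>>> (op.left.col_offset, op.left.end_col_offset)
(0, 7)
```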

Brandt
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YCMHFU4I7TEYDQP7OH4AX2YOD4KPLNFX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Ethan Furman

On 5/17/2021 6:13 AM, Mark Shannon wrote:

> Where i1, i2 are integers and s1 is a string.

> i1 + i2 + s1
> ^^^^^^^^^^^^

Wouldn't the carets just be under the i2 + s1 portion?

--
~Ethan~
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YPWW3PIBJDUIETZQOJNHIYR5FNGMOJJ5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Christopher Barker
> > The cost I'm concerned about is the runtime cost of worse code, because
> > the compiler can't perform some optimizations due to the constraints of
> > providing the extended debug information.


Python does have an Optimized mode (-O). Granted, it’s not used very often,
but this would be a good use case for it.
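
For reference, -O already works as this kind of opt-in trade:
`sys.flags.optimize` becomes non-zero and `assert` statements are compiled
away, so column-info generation could plausibly be gated on the same flag
(a small sketch):

```
import sys

print(sys.flags.optimize)  # 0 normally; 1 under `python -O`
assert True  # under -O this statement is removed from the bytecode entirely
```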

-CHB



> Aah thanks for clarifying, I see what you mean now. In cases like this
> where the compiler is making optimizations, I think it is perfectly
> fine to just elide the column information. While it would be nice to
> maintain accurate columns wherever possible, you shouldn't constrain
> improvements and optimizations based on it. The traceback machinery
> will simply not print out the carets in that case and everything
> should just work smoothly.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EB24LA7L5C35QHQTFLB6QZX26E77O6QM/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GBAJSME7P7D6FS4NDCFCJRSJXN6LIYZK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Ammar Askar
> The cost I'm concerned about is the runtime cost of worse code, because
> the compiler can't perform some optimizations due to the constraints of
> providing the extended debug information.

Aah thanks for clarifying, I see what you mean now. In cases like this
where the compiler is making optimizations, I think it is perfectly
fine to just elide the column information. While it would be nice to
maintain accurate columns wherever possible, you shouldn't constrain
improvements and optimizations based on it. The traceback machinery
will simply not print out the carets in that case and everything
should just work smoothly.
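
A minimal sketch of that graceful degradation (illustrative only, not the
actual traceback machinery):

```
def render(line, colno=None, end_colno=None):
    # Print the source line, plus a caret line only when the compiler
    # kept column information; otherwise fall back to today's output.
    out = [line]
    if colno is not None and end_colno is not None:
        out.append(" " * colno + "^" * max(end_colno - colno, 1))
    return "\n".join(out)

print(render("i1 + i2 + s1", 0, 12))  # columns available: carets shown
print(render("i1 + i2 + s1"))         # columns elided: no caret line
```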
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/EB24LA7L5C35QHQTFLB6QZX26E77O6QM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Mark Shannon

Hi,

On 17/05/2021 5:22 pm, Ammar Askar wrote:
>> While nicer locations for errors are great, they won't be popular if they
>> have a negative impact on performance.
>> Locations need to be tracked through the compiler.
>
> In performance sensitive contexts won't most code be pre-compiled into
> pyc files anyway? I feel like the performance cost of accurate column
> tracking in the compiler isn't too big of a concern unless I'm missing
> something.
>

The cost I'm concerned about is the runtime cost of worse code, because 
the compiler can't perform some optimizations due to the constraints of 
providing the extended debug information.


Cheers,
Mark.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IXXOBTZJQEVF6EZP5ACQNKTN7RVDQ7SI/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Ammar Askar
> While nicer locations for errors are great, they won't be popular if they
> have a negative impact on performance.
> Locations need to be tracked through the compiler.

In performance sensitive contexts won't most code be pre-compiled into
pyc files anyway? I feel like the performance cost of accurate column
tracking in the compiler isn't too big of a concern unless I'm missing
something.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JDRA26FG5TXD3D7VMLE2UNKQQ4WIHLEF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Pablo Galindo Salgado
P.S. We will add "using a single caret" to the "rejected ideas section"
with some rationale.

On Mon, 17 May 2021, 14:28 Pablo Galindo Salgado wrote:

> Hi Mark,
>
> Thanks for your useful feedback. Some comments:
>
> > 1.  Errors spanning multiple lines.
>
> That is addressed. Check the latest version of the PEP: we are proposing
> storing the end lines as well:
>
> https://www.python.org/dev/peps/pep-0657/#specification
>
>
> > 2. Repeated binary operations on the same line.
>
> It has been suggested to add this on top of the previous information, but
> we think this is problematic since these locations need to be
> manually calculated in the compiler, while the ranges are derived from the
> AST ranges. Given how error-prone the old manual calculations of AST
> locations were, we really want to avoid manual calculations here.
> Additionally, many users have mentioned that a single caret is difficult
> to read, and that motivates the ranges in SyntaxErrors that we introduced.
> > 3. Tracking locations in the compiler. PEP 657 proposes that no
> > compression be used
>
> No, PEP 657 doesn't specify the compression, which is a different claim
> (check the latest version). We say:
>
> "The internal storage, compression and encoding of the information is
> left as an implementation detail and can be changed at any point as long as
> the public API remains unchanged."
>
> > I think it would be better to provide an API like PEP 626 and not
> > restrict the internal format used.
>
> Indeed, that is what we are doing. Check
> https://www.python.org/dev/peps/pep-0657/#id10
>
> Cheers,
> Pablo Galindo Salgado
>
>
> On Mon, 17 May 2021 at 14:17, Mark Shannon wrote:
>
>> Hi everyone,
>>
>> I fully agree with the rationale for the PEP, better error messages
>> always help.
>> However, I think the proposed implementation could be misleading in some
>> cases and wastes memory.
>>
>>
>> Use one position, not one-and-a-half positions.
>> ---
>>
>> The main problem with PEP 657, IMO, is that it wants to present the
>> location of an error as a pair of positions, but without the line number
>> of the second position.
>> Consequently it ends up with one and a half positions, not two.
>> This does not work well for a number of reasons.
>>
>> 1.  Errors spanning multiple lines.
>>
>> Consider:
>>
>>  (i1 + i2 +
>>   s1
>>   )
>>
>> Where i1, i2 are integers and s1 is a string.
>>
>> With a single location, the error points to the second `+`. PEP 657
>> would highlight the whole line, `i1 + i2 +`, making it unclear where the
>> error occurred.
>>
>>
>> 2. Repeated binary operations on the same line.
>>
>> A single location can also be clearer when all the code is on one line.
>>
>> i1 + i2 + s1
>>
>> PEP 657:
>>
>> i1 + i2 + s1
>> ^^^^^^^^^^^^
>>
>> Using a single location:
>>
>> i1 + i2 + s1
>>  ^
>>
>> 3. Tracking locations in the compiler.
>>
>> While nicer locations for errors are great, they won't be popular if they
>> have a negative impact on performance.
>> Locations need to be tracked through the compiler. The simpler the
>> location, the easier this is to do correctly without a negative
>> performance impact.
>> It is already tricky to do this correctly with just line numbers because
>> we have both source that compiles to no bytecodes (try, pass) and
>> bytecodes that have no source (implicit return None and except cleanups).
>>
>>
>> A single location can still be presented as a whole token, as tokenizing
>> a single line of code is easy and fast. So when presenting an error, the
>> whole token can be highlighted.
>>
>> E.g:
>>
>> NameError
>>  name
>>  ^^^^
>>
>> Compression
>> ---
>>
>> PEP 657 proposes that no compression be used, but also mandates the use
>> of a lossy compression scheme (truncating all offsets over 255).
>> I think it would be better to provide an API like PEP 626 and not
>> restrict the internal format used. In fact, extending or replacing the
>> API of PEP 626 seems the best approach to me.
>>
>> I wouldn't worry about memory consumption.
>> The code object layout and unmarshalling process needs to be
>> re-implemented for reduced memory use and faster startup:
>> https://github.com/markshannon/faster-cpython/blob/master/tiers.md#tier-0
>> A few bytes more or less in the location table(s) is inconsequential.
>>
>> Cheers,
>> Mark.
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/KR7ACFCUNMHT4M7R4XNHGRFV27HZBDFD/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org

[Python-Dev] Re: Critique of PEP 657 -- Include Fine Grained Error Locations in Tracebacks

2021-05-17 Thread Pablo Galindo Salgado
Hi Mark,

Thanks for your useful feedback. Some comments:

> 1.  Errors spanning multiple lines.

That is addressed. Check the latest version of the PEP: we are proposing
storing the end lines as well:

https://www.python.org/dev/peps/pep-0657/#specification


> 2. Repeated binary operations on the same line.

It has been suggested to add this on top of the previous information, but
we think this is problematic since these locations need to be
manually calculated in the compiler, while the ranges are derived from the
AST ranges. Given how error-prone the old manual calculations of AST
locations were, we really want to avoid manual calculations here.
Additionally, many users have mentioned that a single caret is difficult
to read, and that motivates the ranges in SyntaxErrors that we introduced.
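
For context, Mark's single-location alternative would recover a whole-token
highlight by re-tokenizing the offending line, roughly like this sketch
built on the stdlib tokenize module (illustrative; not code from either
proposal):

```
import io
import tokenize

def token_span(line, col):
    # Return the [start, end) columns of the token covering `col`
    # in a single line of source.
    for tok in tokenize.generate_tokens(io.StringIO(line).readline):
        if tok.start[1] <= col < tok.end[1]:
            return tok.start[1], tok.end[1]
    return col, col + 1

line = "i1 + i2 + s1"
start, end = token_span(line, 10)          # a single location inside `s1`
print(line)
print(" " * start + "^" * (end - start))   # highlights the whole token
```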
> 3. Tracking locations in the compiler. PEP 657 proposes that no
> compression be used

No, PEP 657 doesn't specify the compression, which is a different claim
(check the latest version). We say:

"The internal storage, compression and encoding of the information is left
as an implementation detail and can be changed at any point as long as the
public API remains unchanged."
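
One possible byte-per-entry encoding consistent with the 255 cap Mark
mentions, purely as an illustration; the real format is deliberately left
unspecified:

```
def encode_offsets(offsets):
    # One byte per instruction; anything past column 255 saturates.
    return bytes(min(off, 255) for off in offsets)

def decode_offsets(blob):
    # Treat the saturated value as "no precise column available".
    return [off if off < 255 else None for off in blob]

table = encode_offsets([0, 12, 300])
print(decode_offsets(table))   # [0, 12, None]
```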

> I think it would be better to provide an API like PEP 626 and not
> restrict the internal format used.

Indeed, that is what we are doing. Check
https://www.python.org/dev/peps/pep-0657/#id10

Cheers,
Pablo Galindo Salgado


On Mon, 17 May 2021 at 14:17, Mark Shannon wrote:

> Hi everyone,
>
> I fully agree with the rationale for the PEP, better error messages
> always help.
> However, I think the proposed implementation could be misleading in some
> cases and wastes memory.
>
>
> Use one position, not one-and-a-half positions.
> ---
>
> The main problem with PEP 657, IMO, is that it wants to present the
> location of an error as a pair of positions, but without the line number
> of the second position.
> Consequently it ends up with one and a half positions, not two.
> This does not work well for a number of reasons.
>
> 1.  Errors spanning multiple lines.
>
> Consider:
>
>  (i1 + i2 +
>   s1
>   )
>
> Where i1, i2 are integers and s1 is a string.
>
> With a single location, the error points to the second `+`. PEP 657
> would highlight the whole line, `i1 + i2 +`, making it unclear where the
> error occurred.
>
>
> 2. Repeated binary operations on the same line.
>
> A single location can also be clearer when all the code is on one line.
>
> i1 + i2 + s1
>
> PEP 657:
>
> i1 + i2 + s1
> ^^^^^^^^^^^^
>
> Using a single location:
>
> i1 + i2 + s1
>  ^
>
> 3. Tracking locations in the compiler.
>
> While nicer locations for errors are great, they won't be popular if they
> have a negative impact on performance.
> Locations need to be tracked through the compiler. The simpler the
> location, the easier this is to do correctly without a negative
> performance impact.
> It is already tricky to do this correctly with just line numbers because
> we have both source that compiles to no bytecodes (try, pass) and
> bytecodes that have no source (implicit return None and except cleanups).
>
>
> A single location can still be presented as a whole token, as tokenizing
> a single line of code is easy and fast. So when presenting an error, the
> whole token can be highlighted.
>
> E.g:
>
> NameError
>  name
>  ^^^^
>
> Compression
> ---
>
> PEP 657 proposes that no compression be used, but also mandates the use
> of a lossy compression scheme (truncating all offsets over 255).
> I think it would be better to provide an API like PEP 626 and not
> restrict the internal format used. In fact, extending or replacing the
> API of PEP 626 seems the best approach to me.
>
> I wouldn't worry about memory consumption.
> The code object layout and unmarshalling process needs to be
> re-implemented for reduced memory use and faster startup:
> https://github.com/markshannon/faster-cpython/blob/master/tiers.md#tier-0
> A few bytes more or less in the location table(s) is inconsequential.
>
> Cheers,
> Mark.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/KR7ACFCUNMHT4M7R4XNHGRFV27HZBDFD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TKY4YMPAQZDKCK7NV4AQ3IFAN5MF76DU/
Code of Conduct: http://python.org/psf/codeofconduct/