> What are people's thoughts on the feature?

I'm +1: this level of detail in the bytecode is very useful. My main
interest is actually on the AST side, though. :) To end up in the
bytecode, one assumes it must first be in the AST. That information is
incredibly useful for refactoring tools like https://github.com/ssbr/refex
(n.b. author=me) or https://github.com/gristlabs/asttokens (which refex
builds on). Currently, asttokens actually attempts to re-discover that kind
of information after the fact, which is error-prone and difficult.
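
As a minimal sketch of what that position information looks like from the
AST side, assuming Python 3.8 or later (where ast nodes carry end_lineno and
end_col_offset) and single-line ASCII source: the helper name call_spans is
made up for illustration and is not part of refex or asttokens.

```python
import ast


def call_spans(source):
    """Return sorted (start_col, end_col, text) for every call in `source`.

    Hypothetical helper for illustration only. Columns are the UTF-8 byte
    offsets that the ast module reports, so they match character positions
    only for ASCII input.
    """
    spans = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # get_source_segment slices the original text using the node's
            # lineno/col_offset/end_lineno/end_col_offset attributes.
            text = ast.get_source_segment(source, node)
            spans.append((node.col_offset, node.end_col_offset, text))
    return sorted(spans)


print(call_spans("v = 25 + lel(x) + lel(y)"))
```

A tool that has these spans directly does not need to re-tokenize the source
to find out where an expression starts and ends.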

This could also be useful for finer-grained code coverage tracking and/or
debugging. One can imagine highlighting the spans of code that were only
partially executed, e.g. if only x() were ever executed in "x() and y()".
Ned Batchelder once did wild hacks in this space, and maybe this
proposal could lead in the future to something non-hacky?
https://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
I say "in the future" because it doesn't just automatically work: as
I understand it, coverage currently doesn't track spans, only the lines hit
by the line-based debugger. Something else is needed to track which
spans were hit rather than which lines, and it may be similarly hacky if
it's isolated to coveragepy. If, for example, enough were exposed to let a
debugger skip to the bytecode for the next different (sub)span, then this
would be useful for both coverage and actual debugging as you step through
an expression. This is probably way out of scope for your PEP, but even so,
the feature may be laying some useful groundwork here.
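
To make the "x() and y()" case concrete, here is a rough sketch of what a
span-aware report could do once it knows which sub-spans ran. Everything in
it is hypothetical: the `hit` set stands in for data that, as far as I know,
no current tool records, and the function names are invented. It assumes
Python 3.8+ and single-line ASCII expressions.

```python
import ast


def operand_spans(expr):
    """Return (start_col, end_col) for each operand of a top-level and/or."""
    node = ast.parse(expr, mode="eval").body
    if not isinstance(node, ast.BoolOp):
        return []
    return [(v.col_offset, v.end_col_offset) for v in node.values]


def underline_missed(expr, hit):
    """Build a marker line with '^' under operands whose span is not in `hit`.

    `hit` is a set of (start_col, end_col) pairs that were executed; how a
    tracer would collect it is exactly the open question.
    """
    marks = [" "] * len(expr)
    for start, end in operand_spans(expr):
        if (start, end) not in hit:
            marks[start:end] = "^" * (end - start)
    return "".join(marks).rstrip()


expr = "x() and y()"
print(expr)
print(underline_missed(expr, hit={(0, 3)}))  # pretend only x() ever ran
```

The reporting half is easy; the missing piece is a supported way for a
tracer to learn which span each executed instruction belongs to.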

-- Devin

On Fri, May 7, 2021 at 2:52 PM Pablo Galindo Salgado <pablog...@gmail.com>
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated with the instructions that are failing. For example:
>
> Traceback (most recent call last):
>   File "test.py", line 14, in <module>
>     lel3(x)
>     ^^^^^^^
>   File "test.py", line 12, in lel3
>     return lel2(x) / 23
>            ^^^^^^^
>   File "test.py", line 9, in lel2
>     return 25 + lel(x) + lel(x)
>                 ^^^^^^
>   File "test.py", line 6, in lel
>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>                          ^^^^^^^^^^^^^^^^^^^^^
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> or when using -O, to allow users to opt out. But given the fact
> that these numbers can be quite useful to other tools like coverage
> measuring tools, tracers, profilers and the like, adding conditional
> logic to many places would complicate the implementation considerably and
> would potentially reduce the usability of those tools, so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting, but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>
> Thanks in advance,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DB3RTYBF2BXTY6ZHP3Z4DXCRWPJIQUFD/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
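
The size trade-off in the quoted message can be sketched roughly as below.
This is only an illustration of the arithmetic, not the layout the PEP
authors have in mind: the function name, the little-endian packing, and the
use of (0, 0) as a "no highlighting" sentinel are all my assumptions.

```python
import struct

# struct format codes: "B" is an unsigned char, "H" is an unsigned short.
LIMITS = {"B": 255, "H": 65535}


def pack_columns(spans, code):
    """Pack one (start_col, end_col) pair per instruction.

    Pairs that do not fit in the chosen width are stored as (0, 0), an
    assumed sentinel meaning "skip the highlighting for this instruction".
    """
    limit = LIMITS[code]
    out = bytearray()
    for start, end in spans:
        if start > limit or end > limit:
            start, end = 0, 0
        out += struct.pack("<2" + code, start, end)
    return bytes(out)


spans = [(4, 10), (300, 310)]
print(len(pack_columns(spans, "B")))  # 2 bytes per instruction
print(len(pack_columns(spans, "H")))  # 4 bytes per instruction
```

Shorts cost twice as much per instruction as chars, which lines up with the
0.88 MB versus 0.44 MB figures, and only chars lose the span at column 300.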
