[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

Pablo Galindo Salgado Sat, 08 May 2021 14:01:29 -0700

> That could work, but in my personal opinion, I would prefer not to do
> that as it complicates things and I think is overkill.


Let me expand on this:

I recognize the problem that -OO can be quite unusable if some of your
dependencies depend on docstrings, and that it would be good to separate
this from that option, but I am afraid of the following:

- New APIs in the marshal module and other places to pass down whether
to read/write the extra information.
- Complication of the pyc format with more entries in the header.
- Complication of the implementation.

Given that the reasons to deactivate this option exist, but I expect them
to be very rare, I would prefer to maximize simplicity and maintainability.
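
For context on the "more entries in the header" concern, here is a minimal sketch that reads the header CPython writes today: 4 bytes of magic number, a 4-byte flags field, and 8 more bytes (source mtime and size, or a source hash), as laid out in PEP 552. The helper name `read_pyc_header` and the file name `example.py` are made up for illustration, and nothing here implements the flag discussed in this thread.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, rest) from the 16-byte header of a pyc file."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    # The flags field is a little-endian 32-bit integer.
    (flags,) = struct.unpack("<I", header[4:8])
    # With flags == 0 the last 8 bytes are the source mtime and source size;
    # for hash-based pycs they hold a hash of the source instead.
    return magic, flags, header[8:16]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")  # hypothetical module
        with open(src, "w") as f:
            f.write("x = 1\n")
        pyc = py_compile.compile(
            src,
            cfile=os.path.join(tmp, "example.pyc"),
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        return read_pyc_header(pyc)


magic, flags, rest = demo()
print(magic == importlib.util.MAGIC_NUMBER, flags, len(rest))
```

A per-file "no column data" marker would presumably have to live in that flags field or in a new header entry, and marshal would need to be told about it, which is the extra complexity the list above refers to.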

On Sat, 8 May 2021 at 21:50, Pablo Galindo Salgado <[email protected]>
wrote:

> > I don't think the optional existence of column number information needs
> a different kind of pyc file.  Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
> That could work, but in my personal opinion, I would prefer not to do that
> as it complicates things and I think is overkill.
>
> On Sat, 8 May 2021 at 21:45, Gregory P. Smith <[email protected]> wrote:
>
>>
>> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado <[email protected]>
>> wrote:
>>
>>> > We can't piggy back on -OO as the only way to disable this, it needs
>>> to have an option of its own.  -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -OO is the only sensible way to disable the data. There are two things
>>> to disable:
>>>
>>
>> nit: I wouldn't choose the word "sensible" given that -OO is already
>> fundamentally unusable without knowing if any code in your entire
>> transitive dependencies might depend on the presence of docstrings...
>>
>>
>>>
>>> * The data in pyc files
>>> * Printing the exception highlighting
>>>
>>> Printing the exception highlighting can be disabled via a combination of
>>> environment variable / -X option, but collecting the data can only be
>>> disabled by -OO. The reason is that this data will end up in pyc files,
>>> so when the data is not there, a different kind of pyc file needs to be
>>> produced and I really don't want to have another set of pyc file extensions
>>> just to deactivate this. Notice also that a configure-time
>>> variable won't work because it will cause crashes when reading pyc
>>> files produced by the interpreter compiled without the flag.
>>>
>>
>> I don't think the optional existence of column number information needs a
>> different kind of pyc file.  Just a flag in a pyc file's header at most.
>> It isn't a new type of file.
>>
>>
>>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Brett,
>>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>
>>>>>
>>>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>>>> with -OO, which have the "opt-2" prefix.
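
As a small illustration of the "opt-2" naming mentioned above (a sketch only; `spam.py` is a made-up file name), `importlib.util.cache_from_source` shows where the optimization tag ends up in the cached file name:

```python
import importlib.util

# An empty optimization string means "no optimization tag in the file name".
plain = importlib.util.cache_from_source("spam.py", optimization="")
# optimization=2 corresponds to running the interpreter with -OO.
stripped = importlib.util.cache_from_source("spam.py", optimization=2)

print(plain)
print(stripped)
```

The exact names depend on the interpreter's cache tag, so no output is shown here; the second path carries an `.opt-2` tag before the `.pyc` extension.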
>>>>>
>>>>> This only kicks in at the -OO level.
>>>>>
>>>>>
>>>>> I will correct the PEP so it reflects this more exactly.
>>>>>
>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>>> you're stripping out docstrings you're already hurting introspection
>>>>>> capabilities in the name of memory. Or one could go as far as to
>>>>>> introduce -Os to do -OO plus dropping this extra data.
>>>>>
>>>>>
>>>>> This is indeed the plan, sorry for the confusion. The opt-out
>>>>> mechanism is using -OO, precisely as we are already dropping other data.
>>>>>
>>>>
>>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>>> have an option of its own.  -OO is unusable as code that relies on
>>>> "doc"strings as application data such as
>>>> http://www.dabeaz.com/ply/ply.html exists.
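
To make the docstring concern concrete, here is a minimal sketch. The function `p_expression` and its grammar string are invented for illustration in the style PLY uses, where grammar rules live in docstrings. Compiling with `optimize=2` mirrors what -OO does to docstrings:

```python
SOURCE = '''
def p_expression(p):
    "expression : expression PLUS term"
    return p
'''


def docstring_at(optimize):
    """Compile SOURCE at the given optimization level and return the docstring."""
    namespace = {}
    exec(compile(SOURCE, "<example>", "exec", optimize=optimize), namespace)
    return namespace["p_expression"].__doc__


kept = docstring_at(0)     # like running without -O
dropped = docstring_at(2)  # like running with -OO

print(repr(kept))
print(repr(dropped))
```

At level 2 the docstring is gone, so any library that reads `__doc__` as data has nothing left to parse.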
>>>>
>>>> -gps
>>>>
>>>>
>>>>>
>>>>> Thanks for the clarifications!
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 8 May 2021 at 19:41, Brett Cannon <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Although we were originally not sympathetic to it, we may need to
>>>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>>>> the overhead of the new data in pyc files
>>>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>>>> Yury, and others). For this, we could propose that the functionality
>>>>>>> will be deactivated along with the extra
>>>>>>> information when Python is executed in optimized mode (``python
>>>>>>> -O``) and therefore pyo files will not have the overhead associated with
>>>>>>> the extra required data.
>>>>>>>
>>>>>>
>>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>>
>>>>>>
>>>>>>> Notice that Python
>>>>>>> already strips docstrings in this mode so it would be "aligned"
>>>>>>> with the current mechanism of optimized mode.
>>>>>>>
>>>>>>
>>>>>> This only kicks in at the -OO level.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Although this complicates the implementation, it certainly is still
>>>>>>> much easier than dealing with compression (and more useful for those
>>>>>>> that don't want the feature). Notice that we also
>>>>>>> expect pessimistic results from compression as offsets would be
>>>>>>> quite random (although predominantly in the range 10 - 120).
>>>>>>>
>>>>>>
>>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>>> you're stripping out docstrings you're already hurting introspection
>>>>>> capabilities in the name of memory. Or one could go as far as to
>>>>>> introduce -Os to do -OO plus dropping this extra data.
>>>>>>
>>>>>> As for .pyc file size, I personally wouldn't worry about it. If
>>>>>> someone is that space-constrained they either aren't using .pyc files or
>>>>>> are only shipping a single set of .pyc files under -OO and skipping
>>>>>> source code. And .pyc files are an implementation detail of CPython so
>>>>>> there shouldn't be too much of a concern for other interpreters.
>>>>>>
>>>>>> -Brett
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> One last note for clarity: that's the increase in size of the
>>>>>>>> stdlib; the size of the pyc files alone
>>>>>>>> goes from 28.471296MB to 34.750464MB, which is an
>>>>>>>> increase of 22%.
>>>>>>>>
>>>>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> An update on the numbers. We have made a draft implementation
>>>>>>>>> to corroborate the
>>>>>>>>> numbers with some more realistic tests, and it seems that our original
>>>>>>>>> calculations were wrong.
>>>>>>>>> The actual increase in size is considerably bigger than previously
>>>>>>>>> advertised:
>>>>>>>>>
>>>>>>>>> Using a bytes object to encode the final object and marshalling that
>>>>>>>>> to disk (so using uint8_t as the underlying
>>>>>>>>> type):
>>>>>>>>>
>>>>>>>>> BEFORE:
>>>>>>>>>
>>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>>> 70M     Lib
>>>>>>>>> 70M     total
>>>>>>>>>
>>>>>>>>> AFTER:
>>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>>> 76M     Lib
>>>>>>>>> 76M     total
>>>>>>>>>
>>>>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>>>>> storing the start offset and end offset with no compression
>>>>>>>>> whatsoever.
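
For anyone who wants to repeat this kind of measurement on their own tree, a rough sketch follows. The helper names `total_pyc_bytes` and `percent_increase` are invented here, and this is not the script used for the numbers above (those came from `du`).

```python
from pathlib import Path


def total_pyc_bytes(root):
    """Sum the sizes of all .pyc files found below root."""
    return sum(p.stat().st_size for p in Path(root).rglob("*.pyc"))


def percent_increase(before, after):
    """Relative growth from before to after, as a percentage."""
    return (after - before) / before * 100.0


# Using the rounded figures reported by du above (70M before, 76M after):
print(f"{percent_increase(70, 76):.2f}%")
```

Because `du -h` rounds to whole megabytes, this gives only an approximation of the quoted figure. Running `total_pyc_bytes("Lib")` before and after the change isolates the pyc files from the sources.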
>>>>>>>>>
>>>>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>>>>
>>>>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>>>>> more fine-grained error messages, pointing to
>>>>>>>>>> the source associated with the instructions that are failing. For
>>>>>>>>>> example:
>>>>>>>>>>
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>>>>     lel3(x)
>>>>>>>>>>     ^^^^^^^
>>>>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>>>>     return lel2(x) / 23
>>>>>>>>>>            ^^^^^^^
>>>>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>>>>                 ^^^^^^
>>>>>>>>>>   File "test.py", line 6, in lel
>>>>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>> TypeError: 'NoneType' object is not subscriptable
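
The caret lines in a traceback like this one can be derived from a start and end column alone. A minimal sketch, assuming 0-based columns with an exclusive end (the function name `underline` and that convention are choices made for this example, not taken from the proposal):

```python
def underline(source_line, start_col, end_col):
    """Return source_line followed by a caret line marking [start_col, end_col)."""
    marker = " " * start_col + "^" * (end_col - start_col)
    return source_line + "\n" + marker


line = "    return lel2(x) / 23"
start = line.index("lel2(x)")   # column where the failing call begins
end = start + len("lel2(x)")    # one past its last character
print(underline(line, start, end))
```

This is why two small integers per instruction are enough to produce the highlighting.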
>>>>>>>>>>
>>>>>>>>>> The cost of this is having the start column number and end
>>>>>>>>>> column number information for every bytecode instruction
>>>>>>>>>> and this is what we want to discuss (there is also some stack
>>>>>>>>>> cost to re-raise exceptions but that's not a big problem in
>>>>>>>>>> any case). Given that column numbers are not very big compared
>>>>>>>>>> with line numbers, we plan to store these as unsigned chars
>>>>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>>>>
>>>>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28MB
>>>>>>>>>> and the extra size is 0.88 MB).
>>>>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB
>>>>>>>>>> and the extra size is 0.44MB).
>>>>>>>>>>
>>>>>>>>>> One of the disadvantages of using chars is that we can only
>>>>>>>>>> report columns from 1 to 255 so if an error happens in a column
>>>>>>>>>> bigger than that then we would have to exclude it (and not show
>>>>>>>>>> the highlighting) for that frame. Unsigned short will allow
>>>>>>>>>> the values to go from 0 to 65535.
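
The trade-off between the two widths can be sketched as follows. Everything here is an assumption for illustration: the function name `encode_columns`, the use of 0 as a "no column information" sentinel (suggested by the 1 to 255 range above), and the flat start/end layout. It is not the encoding used by the draft implementation.

```python
from array import array

NO_COLUMN = 0  # assumed sentinel meaning "no highlighting for this instruction"


def encode_columns(spans, typecode="B"):
    """Pack (start, end) column pairs using 'B' (uint8) or 'H' (uint16) items."""
    table = array(typecode)
    limit = 2 ** (8 * table.itemsize) - 1  # 255 for 'B', 65535 for 'H'
    for start, end in spans:
        if 1 <= start <= limit and 1 <= end <= limit:
            table.extend((start, end))
        else:
            # Column does not fit in this width: drop the information.
            table.extend((NO_COLUMN, NO_COLUMN))
    return table.tobytes()


spans = [(5, 12), (300, 310)]
print(len(encode_columns(spans, "B")), len(encode_columns(spans, "H")))
```

With the one-byte width the second pair is lost but the table is half the size, which matches the ~1.5% versus ~3% figures above.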
>>>>>>>>>>
>>>>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>>>>> instruction would have very different offsets.
>>>>>>>>>>
>>>>>>>>>> There is also the possibility of not doing this based on some
>>>>>>>>>> build flag or when using -O to allow users to opt out. However,
>>>>>>>>>> these numbers can be quite useful to other tools like coverage
>>>>>>>>>> measuring tools, tracers, profilers and the like, so adding
>>>>>>>>>> conditional logic to many places would complicate the
>>>>>>>>>> implementation considerably and would potentially reduce the
>>>>>>>>>> usability of those tools. We therefore prefer not to have the
>>>>>>>>>> conditional logic. We believe this extra cost is very much worth
>>>>>>>>>> the better error reporting, but we understand and respect other
>>>>>>>>>> points of view.
>>>>>>>>>>
>>>>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>>>>> complicating the implementation a lot**? What are people's thoughts
>>>>>>>>>> on the feature?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Regards from cloudy London,
>>>>>>>>>> Pablo Galindo Salgado
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>> Python-Dev mailing list -- [email protected]
>>>>>>> To unsubscribe send an email to [email protected]
>>>>>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>>>>>> Message archived at
>>>>>>> https://mail.python.org/archives/list/[email protected]/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>>>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>>>>>
>>>>>
>>>>

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/7TMRMFHDHCF6ISTJQJ367GGGHKW4QVHB/
Code of Conduct: http://python.org/psf/codeofconduct/
