[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

Gregory P. Smith Sat, 08 May 2021 13:14:50 -0700

On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <[email protected]>
wrote:


> Hi Brett,
>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>
> Whoops, my bad, I wanted to refer to the pyc files that are generated
> with -OO, which have the "opt-2" tag in their file names.
>
>> This only kicks in at the -OO level.
>
> I will correct the PEP so it reflects this more exactly.
>
>> I personally prefer the idea of dropping the data with -OO since if you're
>> stripping out docstrings you're already hurting introspection capabilities
>> in the name of memory. Or one could go as far as to introduce -Os to do -OO
>> plus dropping this extra data.
>
> This is indeed the plan, sorry for the confusion. The opt-out mechanism is
> using -OO, precisely as we are already dropping other data.
>

We can't piggyback on -OO as the only way to disable this; it needs an
option of its own.  -OO is unusable for code that relies on "doc"strings as
application data, such as PLY (http://www.dabeaz.com/ply/ply.html).
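
To make the docstring concern concrete, here is a minimal sketch using only
the built-in compile() and its optimize argument (optimize=1 corresponds to
-O, optimize=2 to -OO). The function p_expression and its grammar string are
made up for illustration, in the style PLY uses; this is not code from PLY.

```python
# Hypothetical PLY-style rule: the docstring carries grammar data.
SOURCE = '''
def p_expression(p):
    "expression : expression PLUS term"
    p[0] = p[1] + p[3]
'''


def docstring_at(optimize):
    """Compile SOURCE at the given optimization level and return the docstring."""
    namespace = {}
    code = compile(SOURCE, "<example>", "exec", optimize=optimize)
    exec(code, namespace)
    return namespace["p_expression"].__doc__


for level in (0, 1, 2):
    # Levels 0 and 1 keep the grammar string; level 2 yields None.
    print(level, repr(docstring_at(level)))
```

A library that reads its grammar from __doc__ has nothing to read at level 2,
which is why such code cannot be run under -OO.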

-gps


>
> Thanks for the clarifications!
>
>
>
> On Sat, 8 May 2021 at 19:41, Brett Cannon <[email protected]> wrote:
>
>>
>>
>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <[email protected]>
>> wrote:
>>
>>> Although we were originally not sympathetic to it, we may need to
>>> offer an opt-out mechanism for those users that care about the impact of
>>> the overhead of the new data in pyc files
>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>> Yury, and others). For this, we could propose that the functionality will
>>> be deactivated along with the extra
>>> information when Python is executed in optimized mode (``python -O``)
>>> and therefore pyo files will not have the overhead associated with the
>>> extra required data.
>>>
>>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
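
As a small illustration of the PEP 488 naming, the standard library function
importlib.util.cache_from_source() shows where the optimization level ends up
in the cached file name. The file name spam.py and the helper cached_name are
placeholders; the exact interpreter tag in the output depends on the Python
version that runs the sketch.

```python
import importlib.util


def cached_name(optimization):
    """Return the cache path for a placeholder module at the given level."""
    return importlib.util.cache_from_source("spam.py", optimization=optimization)


# "" means no optimization tag; 1 and 2 correspond to -O and -OO.
for level in ("", 1, 2):
    print(repr(level), cached_name(level))
```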
>>
>>
>>> Notice that Python
>>> already strips docstrings in this mode so it would be "aligned" with
>>> the current mechanism of optimized mode.
>>>
>>
>> This only kicks in at the -OO level.
>>
>>
>>>
>>> Although this complicates the implementation, it certainly is still much
>>> easier than dealing with compression (and more useful for those that don't
>>> want the feature). Notice that we also
>>> expect pessimistic results from compression as offsets would be quite
>>> random (although predominantly in the range 10 - 120).
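
A rough way to get a feel for the compression concern is to compress
synthetic data with zlib. The sketch below assumes offsets drawn uniformly
from 10 to 120 and stored one byte each; real column offsets are not
uniformly random, so this only illustrates the pessimistic case and is not a
measurement of actual pyc files.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: offsets are uniform in 10..120, one unsigned byte per offset.
offsets = bytes(rng.randrange(10, 121) for _ in range(100_000))

compressed = zlib.compress(offsets, 9)
ratio = len(compressed) / len(offsets)
print(f"compressed to {ratio:.0%} of the original size")
```

With 111 equally likely values per byte there are about 6.8 bits of entropy
in every 8 stored, so no general-purpose compressor can go much below 85% of
the original size on data like this.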
>>>
>>
>> I personally prefer the idea of dropping the data with -OO since if
>> you're stripping out docstrings you're already hurting introspection
>> capabilities in the name of memory. Or one could go as far as to introduce
>> -Os to do -OO plus dropping this extra data.
>>
>> As for .pyc file size, I personally wouldn't worry about it. If someone
>> is that space-constrained they either aren't using .pyc files or are only
>> shipping a single set of .pyc files under -OO and skipping source code. And
>> .pyc files are an implementation detail of CPython so there shouldn't be
>> too much of a concern for other interpreters.
>>
>> -Brett
>>
>>
>>>
>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <[email protected]>
>>> wrote:
>>>
>>>> One last note for clarity: that's the increase in size of the stdlib;
>>>> the size of the pyc files alone
>>>> goes from 28.471296MB to 34.750464MB, which is an
>>>> increase of 22%.
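
The 22% figure can be checked directly from the two sizes quoted above. The
variable names are only for illustration.

```python
before_mb = 28.471296  # pyc size without the extra data, as quoted
after_mb = 34.750464   # pyc size with start and end offsets, as quoted

extra_mb = after_mb - before_mb
increase_pct = 100 * extra_mb / before_mb
print(f"extra: {extra_mb:.6f} MB, increase: {increase_pct:.1f}%")
```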
>>>>
>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <[email protected]>
>>>> wrote:
>>>>
>>>>> An update on the numbers. We have made a draft implementation to
>>>>> corroborate the
>>>>> numbers with some more realistic tests, and it seems that our original
>>>>> calculations were wrong.
>>>>> The actual increase in size is considerably bigger than previously advertised:
>>>>>
>>>>> Using a bytes object to encode the final object and marshalling that to
>>>>> disk (so using uint8_t as the underlying
>>>>> type):
>>>>>
>>>>> BEFORE:
>>>>>
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 70M     Lib
>>>>> 70M     total
>>>>>
>>>>> AFTER:
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 76M     Lib
>>>>> 76M     total
>>>>>
>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>> storing the start offset and end offset with no compression
>>>>> whatsoever.
>>>>>
>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> We are preparing a PEP and we would like to start some early
>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>
>>>>>> The work we are preparing is to allow the interpreter to produce more
>>>>>> fine-grained error messages, pointing to
>>>>>> the source associated with the instructions that are failing. For
>>>>>> example:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "test.py", line 14, in <module>
>>>>>>     lel3(x)
>>>>>>     ^^^^^^^
>>>>>>   File "test.py", line 12, in lel3
>>>>>>     return lel2(x) / 23
>>>>>>            ^^^^^^^
>>>>>>   File "test.py", line 9, in lel2
>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>                 ^^^^^^
>>>>>>   File "test.py", line 6, in lel
>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>> TypeError: 'NoneType' object is not subscriptable
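
The caret lines above need only a start column and an end column per
expression. As a sketch of that idea (not the proposed implementation, which
would store the columns per bytecode instruction), the ast module already
exposes col_offset and end_col_offset on expression nodes. The helper name
underline is made up for this example, and the offsets are UTF-8 byte
offsets, so the sketch assumes an ASCII source line.

```python
import ast


def underline(line, node_type):
    """Return a caret line under a node of node_type found in line, or None."""
    tree = ast.parse(line)
    for node in ast.walk(tree):
        if isinstance(node, node_type):
            width = node.end_col_offset - node.col_offset
            return " " * node.col_offset + "^" * width
    return None


line = "result = lel2(x) / 23"
print(line)
print(underline(line, ast.Call))  # carets under the single call, lel2(x)
```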
>>>>>>
>>>>>> The cost of this is having the start column number and end
>>>>>> column number information for every bytecode instruction
>>>>>> and this is what we want to discuss (there is also some stack cost to
>>>>>> re-raise exceptions but that's not a big problem in
>>>>>> any case). Given that column numbers are not very big compared with
>>>>>> line numbers, we plan to store these as unsigned chars
>>>>>> or unsigned shorts. We ran some experiments over the standard library
>>>>>> and we found that the overhead of all pyc files is:
>>>>>>
>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>> the extra size is 0.88 MB).
>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>>> the extra size is 0.44 MB).
>>>>>>
>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>> bigger than that then we would have to exclude it (and not show the
>>>>>> highlighting) for that frame. Unsigned short will allow
>>>>>> the values to go from 0 to 65535.
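
The trade-off between the two widths can be sketched with the struct module,
where "B" is an unsigned char and "H" an unsigned short. The function
encode_columns is hypothetical and only shows the range limits; it is not the
encoding the draft implementation uses.

```python
import struct


def encode_columns(start, end, fmt):
    """Pack a (start, end) column pair, or return None if it does not fit.

    fmt is a struct code: "B" for unsigned char, "H" for unsigned short.
    """
    try:
        return struct.pack("<" + fmt * 2, start, end)
    except struct.error:
        # Out of range for this width: the frame would get no highlighting.
        return None


print(encode_columns(26, 47, "B"))   # fits in two bytes
print(encode_columns(26, 300, "B"))  # end column too large for a char
print(encode_columns(26, 300, "H"))  # fits in four bytes
```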
>>>>>>
>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>> instruction would have very different offsets.
>>>>>>
>>>>>> There is also the possibility of not doing this based on some build
>>>>>> flag or when using -O, to allow users to opt out. But given the fact
>>>>>> that these numbers can be quite useful to other tools like coverage
>>>>>> measuring tools, tracers, profilers and the like, adding conditional
>>>>>> logic to many places would complicate the implementation considerably
>>>>>> and would potentially reduce the usability of those tools, so we prefer
>>>>>> not to have the conditional logic. We believe this extra cost is
>>>>>> very much worth the better error reporting but we understand and respect
>>>>>> other points of view.
>>>>>>
>>>>>> Does anyone see a better way to encode this information **without
>>>>>> complicating the implementation a lot**? What are people's thoughts on the
>>>>>> feature?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Regards from cloudy London,
>>>>>> Pablo Galindo Salgado
>>>>>>

_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/VLMGLTTFYEG2XWINKZLPSFWIWOZVQSQW/
Code of Conduct: http://python.org/psf/codeofconduct/
