Hi Brett,

> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.


Whoops, my bad. I meant the pyc files that are generated with -OO, which
carry the "opt-2" tag in their names.

> This only kicks in at the -OO level.


I will correct the PEP so it reflects this more precisely.
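
For reference, the tag that -OO bytecode caches get can be seen with
importlib (just an illustrative snippet, nothing to do with the PEP itself;
"spam.py" is a placeholder):

import importlib.util

# cache_from_source() gives the cache path CPython would use for a given
# optimization level; level 2 corresponds to running with -OO.
print(importlib.util.cache_from_source("spam.py", optimization=2))
# e.g. __pycache__/spam.cpython-310.opt-2.pyc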

> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.


This is indeed the plan, sorry for the confusion. The opt-out mechanism is
-OO, precisely because that mode already drops other data (such as
docstrings).

Thanks for the clarifications!



On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:

>
>
> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Although we were originally not sympathetic with it, we may need to offer
>> an opt-out mechanism for those users that care about the impact of the
>> overhead of the new data in pyc files
>> and in in-memory code objects, as was suggested by some folks (Thomas,
>> Yury, and others). For this, we could propose that the functionality will
>> be deactivated along with the extra
>> information when Python is executed in optimized mode (``python -O``) and
>> therefore pyo files will not have the overhead associated with the extra
>> required data.
>>
>
> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.
>
>
>> Notice that Python
>> already strips docstrings in this mode so it would be "aligned" with the
>> current mechanism of optimized mode.
>>
>
> This only kicks in at the -OO level.
>
>
>>
>> Although this complicates the implementation, it certainly is still much
>> easier than dealing with compression (and more useful for those that don't
>> want the feature). Notice that we also
>> expect pessimistic results from compression as offsets would be quite
>> random (although predominantly in the range 10 - 120).
>>
>
> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.
>
> As for .pyc file size, I personally wouldn't worry about it. If someone is
> that space-constrained they either aren't using .pyc files or are only
> shipping a single set of .pyc files under -OO and skipping source code. And
> .pyc files are an implementation detail of CPython so there shouldn't be
> too much of a concern for other interpreters.
>
> -Brett
>
>
>>
>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> One last note for clarity: that's the increase in size of the whole stdlib
>>> tree; the size of the pyc files alone goes from 28.471296 MB to
>>> 34.750464 MB, which is an increase of 22%.
>>>
>>>
>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
>>> wrote:
>>>
>>>> Some update on the numbers. We have made some draft implementation to
>>>> corroborate the
>>>> numbers with some more realistic tests, and it seems that our original
>>>> calculations were wrong.
>>>> The actual increase in size is quite a bit bigger than previously advertised:
>>>> calculations were wrong.
>>>>
>>>> Using bytes object to encode the final object and marshalling that to
>>>> disk (so using uint8_t) as the underlying
>>>> type:
>>>>
>>>> BEFORE:
>>>>
>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>> ❯ du -h Lib -c --max-depth=0
>>>> 70M     Lib
>>>> 70M     total
>>>>
>>>> AFTER:
>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>> ❯ du -h Lib -c --max-depth=0
>>>> 76M     Lib
>>>> 76M     total
>>>>
>>>> So that's an increase of 8.56% over the original value. This is
>>>> storing the start offset and end offset with no compression
>>>> whatsoever.
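>>>>
>>>> For anyone who wants to look only at the pyc portion rather than the
>>>> whole tree as du does, a rough (untested) sketch, assuming the tree was
>>>> compiled with the compileall command above:
>>>>
>>>> import pathlib
>>>>
>>>> # Sum the size of every cached bytecode file under Lib/ after running
>>>> # "./python -m compileall -r 1000 Lib".
>>>> total = sum(p.stat().st_size for p in pathlib.Path("Lib").rglob("*.pyc"))
>>>> print(f"{total / 2**20:.2f} MiB of pyc files")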
>>>>
>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> We are preparing a PEP and we would like to start some early
>>>>> discussion about one of the main aspects of the PEP.
>>>>>
>>>>> The work we are preparing is to allow the interpreter to produce more
>>>>> fine-grained error messages, pointing to
>>>>> the source associated with the instructions that are failing. For
>>>>> example:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>
>>>>>   File "test.py", line 14, in <module>
>>>>>
>>>>>     lel3(x)
>>>>>
>>>>>     ^^^^^^^
>>>>>
>>>>>   File "test.py", line 12, in lel3
>>>>>
>>>>>     return lel2(x) / 23
>>>>>
>>>>>            ^^^^^^^
>>>>>
>>>>>   File "test.py", line 9, in lel2
>>>>>
>>>>>     return 25 + lel(x) + lel(x)
>>>>>
>>>>>                 ^^^^^^
>>>>>
>>>>>   File "test.py", line 6, in lel
>>>>>
>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>
>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>
>>>>> The cost of this is having the start and end column number information
>>>>> for every bytecode instruction, and this is what we want to discuss
>>>>> (there is also some stack cost to re-raise exceptions, but that is not a
>>>>> big problem in any case). Given that column numbers are not very big
>>>>> compared with line numbers, we plan to store them as unsigned chars or
>>>>> unsigned shorts. We ran some experiments over the standard library and
>>>>> found that the overhead across all pyc files is:
>>>>>
>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and the
>>>>> extra size is 0.88 MB).
>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>>>>> extra size is 0.44 MB).
>>>>>
>>>>> One of the disadvantages of using chars is that we can only report
>>>>> columns from 1 to 255, so if an error happens in a column beyond that we
>>>>> would have to exclude it (and not show the highlighting) for that frame.
>>>>> Unsigned shorts allow values from 0 to 65535.
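>>>>>
>>>>> As a very rough sketch of what a one-byte-per-offset encoding could look
>>>>> like (illustrative only, not the actual design; here 255 is used as a
>>>>> "no column information" marker):
>>>>>
>>>>> import struct
>>>>>
>>>>> NO_COLUMN = 255  # sentinel for columns we cannot represent in one byte
>>>>>
>>>>> def encode_positions(offsets):
>>>>>     """Pack (start_col, end_col) pairs, one unsigned char each."""
>>>>>     out = bytearray()
>>>>>     for start, end in offsets:
>>>>>         if start >= NO_COLUMN or end >= NO_COLUMN:
>>>>>             start = end = NO_COLUMN  # too wide: drop the highlighting
>>>>>         out += struct.pack("BB", start, end)
>>>>>     return bytes(out)
>>>>>
>>>>> # With unsigned shorts ("HH") the same table would take 4 bytes per
>>>>> # instruction instead of 2, roughly doubling the extra pyc data.
>>>>> data = encode_positions([(11, 18), (4, 27), (300, 310)])
>>>>> print(len(data), data.hex())  # 6 0b12041bffff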
>>>>>
>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>> instruction would have very different offsets.
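>>>>>
>>>>> For the curious, a quick (synthetic, untested) way to get a feel for how
>>>>> little such data compresses; real offsets would of course come from the
>>>>> compiler rather than random.randint:
>>>>>
>>>>> import random
>>>>> import zlib
>>>>>
>>>>> # Pairs of offsets mostly in the 10-120 range mentioned above.
>>>>> pairs = [(random.randint(10, 120), random.randint(10, 120))
>>>>>          for _ in range(10_000)]
>>>>> raw = bytes(b for pair in pairs for b in pair)
>>>>> print(len(raw), "->", len(zlib.compress(raw, 9)))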
>>>>>
>>>>> There is also the possibility of omitting this information behind a
>>>>> build flag or when running with -O, to let users opt out. However, these
>>>>> numbers can be quite useful to other tools such as coverage tools,
>>>>> tracers and profilers, and adding conditional logic in many places would
>>>>> complicate the implementation considerably and could reduce the
>>>>> usability of those tools, so we prefer not to have the conditional
>>>>> logic. We believe this extra cost is very much worth the better error
>>>>> reporting, but we understand and respect other points of view.
>>>>>
>>>>> Does anyone see a better way to encode this information **without
>>>>> complicating the implementation a lot**? What are people's thoughts on
>>>>> the feature?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Regards from cloudy London,
>>>>> Pablo Galindo Salgado
>>>>>
>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PDWYJ55Z4XH6OHUQ7IDEG23GWIP6GJOT/
Code of Conduct: http://python.org/psf/codeofconduct/
