Hi Brett,

> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.

Whoops, my bad, I wanted to refer to the pyc files that are generated with
-OO, which carry the "opt-2" tag in their file name.

> This only kicks in at the -OO level.

I will correct the PEP so it reflects this more exactly.

> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.

This is indeed the plan, sorry for the confusion. The opt-out mechanism is
using -OO, precisely as we are already dropping other data.

Thanks for the clarifications!

On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:

>
>
> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <pablog...@gmail.com>
> wrote:
>
>> Although we were originally not sympathetic with it, we may need to offer
>> an opt-out mechanism for those users that care about the impact of the
>> overhead of the new data in pyc files and in in-memory code objects, as
>> was suggested by some folks (Thomas, Yury, and others). For this, we
>> could propose that the functionality will be deactivated along with the
>> extra information when Python is executed in optimized mode
>> (``python -O``) and therefore pyo files will not have the overhead
>> associated with the extra required data.
>>
>
> Just to be clear, .pyo files have not existed for a while:
> https://www.python.org/dev/peps/pep-0488/.
>
>
>> Notice that Python already strips docstrings in this mode so it would be
>> "aligned" with the current mechanism of optimized mode.
>>
>
> This only kicks in at the -OO level.
>
>
>>
>> Although this complicates the implementation, it certainly is still much
>> easier than dealing with compression (and more useful for those that
>> don't want the feature). Notice that we also expect pessimistic results
>> from compression as offsets would be quite random (although
>> predominantly in the range 10 - 120).
>>
>
> I personally prefer the idea of dropping the data with -OO since if you're
> stripping out docstrings you're already hurting introspection capabilities
> in the name of memory. Or one could go as far as to introduce -Os to do -OO
> plus dropping this extra data.
>
> As for .pyc file size, I personally wouldn't worry about it. If someone is
> that space-constrained they either aren't using .pyc files or are only
> shipping a single set of .pyc files under -OO and skipping source code. And
> .pyc files are an implementation detail of CPython so there shouldn't be
> too much of a concern for other interpreters.
>
> -Brett
>
>
>>
>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> One last note for clarity: that's the increase of size in the stdlib,
>>> the increase of size for pyc files goes from 28.471296MB to
>>> 34.750464MB, which is an increase of 22%.
>>>
>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
>>> wrote:
>>>
>>>> Some update on the numbers. We have made some draft implementation to
>>>> corroborate the numbers with some more realistic tests and seems that
>>>> our original calculations were wrong. The actual increase in size is
>>>> quite bigger than previously advertised:
>>>>
>>>> Using bytes object to encode the final object and marshalling that to
>>>> disk (so using uint8_t) as the underlying type:
>>>>
>>>> BEFORE:
>>>>
>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>> ❯ du -h Lib -c --max-depth=0
>>>> 70M Lib
>>>> 70M total
>>>>
>>>> AFTER:
>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>> ❯ du -h Lib -c --max-depth=0
>>>> 76M Lib
>>>> 76M total
>>>>
>>>> So that's an increase of 8.56 % over the original value. This is
>>>> storing the start offset and end offset with no compression
>>>> whatsoever.
>>>>
>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> We are preparing a PEP and we would like to start some early
>>>>> discussion about one of the main aspects of the PEP.
>>>>>
>>>>> The work we are preparing is to allow the interpreter to produce more
>>>>> fine-grained error messages, pointing to the source associated with
>>>>> the instructions that are failing. For example:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>
>>>>> File "test.py", line 14, in <module>
>>>>>
>>>>> lel3(x)
>>>>>
>>>>> ^^^^^^^
>>>>>
>>>>> File "test.py", line 12, in lel3
>>>>>
>>>>> return lel2(x) / 23
>>>>>
>>>>> ^^^^^^^
>>>>>
>>>>> File "test.py", line 9, in lel2
>>>>>
>>>>> return 25 + lel(x) + lel(x)
>>>>>
>>>>> ^^^^^^
>>>>>
>>>>> File "test.py", line 6, in lel
>>>>>
>>>>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>
>>>>> ^^^^^^^^^^^^^^^^^^^^^
>>>>>
>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>
>>>>> The cost of this is having the start column number and end column
>>>>> number information for every bytecode instruction and this is what we
>>>>> want to discuss (there is also some stack cost to re-raise exceptions
>>>>> but that's not a big problem in any case). Given that column numbers
>>>>> are not very big compared with line numbers, we plan to store these as
>>>>> unsigned chars or unsigned shorts. We ran some experiments over the
>>>>> standard library and we found that the overhead of all pyc files is:
>>>>>
>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>> the extra size is 0.88 MB).
>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>> the extra size is 0.44 MB).
>>>>>
>>>>> One of the disadvantages of using chars is that we can only report
>>>>> columns from 1 to 255 so if an error happens in a column bigger than
>>>>> that then we would have to exclude it (and not show the highlighting)
>>>>> for that frame. Unsigned short will allow the values to go from 0 to
>>>>> 65535.
>>>>>
>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>> instruction would have very different offsets.
>>>>>
>>>>> There is also the possibility of not doing this based on some build
>>>>> flag or when using -O to allow users to opt out, but given the fact
>>>>> that these numbers can be quite useful to other tools like coverage
>>>>> measuring tools, tracers, profilers and the like, adding conditional
>>>>> logic to many places would complicate the implementation considerably
>>>>> and will potentially reduce the usability of those tools so we prefer
>>>>> not to have the conditional logic. We believe this extra cost is
>>>>> very much worth the better error reporting but we understand and
>>>>> respect other points of view.
>>>>>
>>>>> Does anyone see a better way to encode this information **without
>>>>> complicating a lot the implementation**? What are people's thoughts on
>>>>> the feature?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Regards from cloudy London,
>>>>> Pablo Galindo Salgado
>>>>>
>>>>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
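
For reference, here is a minimal sketch of how the "opt-2" marker discussed
in this thread shows up in pyc file names since PEP 488. It only uses
importlib.util.cache_from_source() from the standard library; the helper
name pyc_name and the module name "spam.py" are made up for illustration,
and the exact directory and interpreter tag in the result depend on the
Python build that runs it.

```python
import importlib.util


def pyc_name(source_path, level):
    """Return the cache file path CPython would use for an optimization level.

    level "" means no optimization flag, 1 corresponds to -O and 2 to -OO.
    (pyc_name is a hypothetical helper, not part of any proposal.)
    """
    return importlib.util.cache_from_source(source_path, optimization=level)


if __name__ == "__main__":
    # "spam.py" is a made-up module name used only for illustration.
    for flag, level in (("(none)", ""), ("-O", 1), ("-OO", 2)):
        print(f"{flag:>6}: {pyc_name('spam.py', level)}")
```

Only the last path carries the "opt-2" marker, which is the set of files
that would be written without the extra column data under the -OO opt-out
described above.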
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/PDWYJ55Z4XH6OHUQ7IDEG23GWIP6GJOT/
Code of Conduct: http://python.org/psf/codeofconduct/