On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <pablog...@gmail.com> wrote:
> Hi Brett,
>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
>
> Whoops, my bad, I wanted to refer to the pyc files that are generated
> with -OO, which have the "opt-2" tag in their names.
>
>> This only kicks in at the -OO level.
>
> I will correct the PEP so it reflects this more exactly.
>
>> I personally prefer the idea of dropping the data with -OO since if you're
>> stripping out docstrings you're already hurting introspection capabilities
>> in the name of memory. Or one could go as far as to introduce -Os to do -OO
>> plus dropping this extra data.
>
> This is indeed the plan, sorry for the confusion. The opt-out mechanism is
> using -OO, precisely as we are already dropping other data.

We can't piggyback on -OO as the only way to disable this; it needs to
have an option of its own. -OO is unusable because code that relies on
"doc"strings as application data, such as
http://www.dabeaz.com/ply/ply.html, exists.

-gps

> Thanks for the clarifications!
>
> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>
>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> Although we were originally not sympathetic with it, we may need to
>>> offer an opt-out mechanism for those users that care about the impact
>>> of the overhead of the new data in pyc files and in in-memory code
>>> objects, as was suggested by some folks (Thomas, Yury, and others).
>>> For this, we could propose that the functionality will be deactivated
>>> along with the extra information when Python is executed in optimized
>>> mode (``python -O``) and therefore pyo files will not have the
>>> overhead associated with the extra required data.
>>
>> Just to be clear, .pyo files have not existed for a while:
>> https://www.python.org/dev/peps/pep-0488/.
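
For readers unfamiliar with the PEP 488 naming mentioned above, here is a
minimal sketch of how the optimization level shows up in the cache file
name. It relies only on the standard library's
importlib.util.cache_from_source; the file name "spam.py" and the helper
pyc_names are made up for illustration, and the interpreter tag in the
result depends on the Python that runs it.

```python
import importlib.util
import os


def pyc_names(source_path="spam.py"):
    """Map each optimization level to the cache file name it produces.

    "spam.py" is a hypothetical source file; it does not need to exist,
    because cache_from_source only computes a path.
    """
    return {
        # "" means no optimization tag (plain `python`),
        # "1" corresponds to -O and "2" to -OO.
        level: os.path.basename(
            importlib.util.cache_from_source(source_path, optimization=level)
        )
        for level in ("", "1", "2")
    }


if __name__ == "__main__":
    for level, name in pyc_names().items():
        print(repr(level), "->", name)
```

The -OO variant ends in ".opt-2.pyc", so files compiled at different
optimization levels can sit side by side in __pycache__.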
>>
>>> Notice that Python already strips docstrings in this mode so it would
>>> be "aligned" with the current mechanism of optimized mode.
>>
>> This only kicks in at the -OO level.
>>
>>> Although this complicates the implementation, it certainly is still much
>>> easier than dealing with compression (and more useful for those that
>>> don't want the feature). Notice that we also expect pessimistic results
>>> from compression as offsets would be quite random (although
>>> predominantly in the range 10 - 120).
>>
>> I personally prefer the idea of dropping the data with -OO since if
>> you're stripping out docstrings you're already hurting introspection
>> capabilities in the name of memory. Or one could go as far as to introduce
>> -Os to do -OO plus dropping this extra data.
>>
>> As for .pyc file size, I personally wouldn't worry about it. If someone
>> is that space-constrained they either aren't using .pyc files or are only
>> shipping a single set of .pyc files under -OO and skipping source code. And
>> .pyc files are an implementation detail of CPython so there shouldn't be
>> too much of a concern for other interpreters.
>>
>> -Brett
>>
>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <pablog...@gmail.com>
>>> wrote:
>>>
>>>> One last note for clarity: that's the increase of size in the stdlib,
>>>> the increase of size for pyc files goes from 28.471296MB to
>>>> 34.750464MB, which is an increase of 22%.
>>>>
>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <pablog...@gmail.com>
>>>> wrote:
>>>>
>>>>> Some update on the numbers. We have made some draft implementation to
>>>>> corroborate the numbers with some more realistic tests and it seems
>>>>> that our original calculations were wrong.
>>>>> The actual increase in size is considerably bigger than previously
>>>>> advertised:
>>>>>
>>>>> Using a bytes object to encode the final object and marshalling that
>>>>> to disk (so using uint8_t as the underlying type):
>>>>>
>>>>> BEFORE:
>>>>>
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 70M     Lib
>>>>> 70M     total
>>>>>
>>>>> AFTER:
>>>>>
>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>> ❯ du -h Lib -c --max-depth=0
>>>>> 76M     Lib
>>>>> 76M     total
>>>>>
>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>> storing the start offset and end offset with no compression
>>>>> whatsoever.
>>>>>
>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <pablog...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> We are preparing a PEP and we would like to start some early
>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>
>>>>>> The work we are preparing is to allow the interpreter to produce more
>>>>>> fine-grained error messages, pointing to the source associated with
>>>>>> the instructions that are failing.
>>>>>> For example:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>   File "test.py", line 14, in <module>
>>>>>>     lel3(x)
>>>>>>     ^^^^^^^
>>>>>>   File "test.py", line 12, in lel3
>>>>>>     return lel2(x) / 23
>>>>>>            ^^^^^^^
>>>>>>   File "test.py", line 9, in lel2
>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>                 ^^^^^^
>>>>>>   File "test.py", line 6, in lel
>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>
>>>>>> The cost of this is having the start column number and end
>>>>>> column number information for every bytecode instruction
>>>>>> and this is what we want to discuss (there is also some stack cost to
>>>>>> re-raise exceptions but that's not a big problem in
>>>>>> any case). Given that column numbers are not very big compared with
>>>>>> line numbers, we plan to store these as unsigned chars
>>>>>> or unsigned shorts. We ran some experiments over the standard library
>>>>>> and we found that the overhead over all pyc files is:
>>>>>>
>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB and
>>>>>>   the extra size is 0.88 MB).
>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>>>>>>   the extra size is 0.44 MB).
>>>>>>
>>>>>> One of the disadvantages of using chars is that we can only report
>>>>>> columns from 1 to 255 so if an error happens in a column
>>>>>> bigger than that then we would have to exclude it (and not show the
>>>>>> highlighting) for that frame. Unsigned short will allow
>>>>>> the values to go from 0 to 65535.
>>>>>>
>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>> instruction would have very different offsets.
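
To make the char-versus-short trade-off described above concrete, here is
a minimal sketch of packing (start column, end column) pairs at either
width. It is not the draft implementation: the helper names, the
little-endian layout, and the use of 0 as a "no column information"
sentinel are assumptions made for illustration only.

```python
import struct

NO_COLUMN = 0  # hypothetical sentinel meaning "no highlighting for this entry"


def encode_columns(pairs, fmt="B"):
    """Pack (start_col, end_col) pairs, one unsigned integer per column.

    fmt is a struct code: "B" is an unsigned char (1 byte per column),
    "H" is an unsigned short (2 bytes per column). Columns are treated as
    1-based; a pair that does not fit is stored as the sentinel instead.
    """
    limit = 255 if fmt == "B" else 65535
    out = bytearray()
    for start, end in pairs:
        if not (1 <= start <= end <= limit):
            start = end = NO_COLUMN
        out += struct.pack("<" + fmt * 2, start, end)
    return bytes(out)


def decode_columns(data, fmt="B"):
    """Inverse of encode_columns: return a list of (start, end) tuples."""
    layout = "<" + fmt * 2
    size = struct.calcsize(layout)
    return [struct.unpack_from(layout, data, i) for i in range(0, len(data), size)]


if __name__ == "__main__":
    pairs = [(5, 12), (12, 300)]
    print(decode_columns(encode_columns(pairs, "B"), "B"))
    print(decode_columns(encode_columns(pairs, "H"), "H"))
```

With one byte per column, a pair reaching past column 255 falls back to
the sentinel and loses its highlighting; two bytes per column keep it but
double the size of the table, which matches the 0.44 MB versus 0.88 MB
figures quoted above.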
>>>>>>
>>>>>> There is also the possibility of not doing this based on some build
>>>>>> flag or when using -O to allow users to opt out, but given the fact
>>>>>> that these numbers can be quite useful to other tools like coverage
>>>>>> measuring tools, tracers, profilers and the like, adding conditional
>>>>>> logic to many places would complicate the implementation considerably
>>>>>> and will potentially reduce the usability of those tools, so we prefer
>>>>>> not to have the conditional logic. We believe this extra cost is
>>>>>> very much worth the better error reporting but we understand and respect
>>>>>> other points of view.
>>>>>>
>>>>>> Does anyone see a better way to encode this information **without
>>>>>> complicating the implementation a lot**? What are people's thoughts on
>>>>>> the feature?
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Regards from cloudy London,
>>>>>> Pablo Galindo Salgado
>>>>>>
>>> _______________________________________________
>>> Python-Dev mailing list -- python-dev@python.org
>>> To unsubscribe send an email to python-dev-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-dev@python.org/message/JUXUC7TYPAMB4EKW6HJL77ORDYQRJEFG/
>>> Code of Conduct: http://python.org/psf/codeofconduct/