> That could work, but in my personal opinion, I would prefer not to do that as it complicates things and I think is overkill.
Let me expand on this:

I recognize the problem that -OO can be quite unusable if some of your
dependencies depend on docstrings, and that it would be good to separate
this from that option, but I am afraid of the following:

- New APIs in the marshal module and other places to pass down whether the
  extra information should be read/written or not.
- Complication of the pyc format with more entries in the header.
- Complication of the implementation.

Given that the reasons to deactivate this option exist, but I expect them
to be very rare, I would prefer to maximize simplicity and maintainability.

On Sat, 8 May 2021 at 21:50, Pablo Galindo Salgado <pablog...@gmail.com> wrote:

> > I don't think the optional existence of column number information needs
> a different kind of pyc file. Just a flag in a pyc file's header at most.
> It isn't a new type of file.
>
> That could work, but in my personal opinion, I would prefer not to do that
> as it complicates things and I think is overkill.
>
> On Sat, 8 May 2021 at 21:45, Gregory P. Smith <g...@krypto.org> wrote:
>
>>
>> On Sat, May 8, 2021 at 1:32 PM Pablo Galindo Salgado <pablog...@gmail.com>
>> wrote:
>>
>>> > We can't piggy back on -OO as the only way to disable this, it needs
>>> to have an option of its own. -OO is unusable as code that relies on
>>> "doc"strings as application data such as
>>> http://www.dabeaz.com/ply/ply.html exists.
>>>
>>> -OO is the only sensible way to disable the data. There are two things
>>> to disable:
>>>
>>
>> nit: I wouldn't choose the word "sensible" given that -OO is already
>> fundamentally unusable without knowing if any code in your entire
>> transitive dependencies might depend on the presence of docstrings...
>>
>>
>>>
>>> * The data in pyc files
>>> * Printing the exception highlighting
>>>
>>> Printing the exception highlighting can be disabled via combo of
>>> environment variable / -X option but collecting the data can only be
>>> disabled by -OO.
>>> The reason is that this will end in pyc files
>>> so when the data is not there, a different kind of pyc file needs to be
>>> produced and I really don't want to have another set of pyc file extensions
>>> just to deactivate this. Notice that also a configure
>>> time variable won't work because it will cause crashes when reading pyc
>>> files produced by the interpreter compiled without the flag.
>>>
>>
>> I don't think the optional existence of column number information needs a
>> different kind of pyc file. Just a flag in a pyc file's header at most.
>> It isn't a new type of file.
>>
>>
>>> On Sat, 8 May 2021 at 21:13, Gregory P. Smith <g...@krypto.org> wrote:
>>>
>>>>
>>>>
>>>> On Sat, May 8, 2021 at 11:58 AM Pablo Galindo Salgado <
>>>> pablog...@gmail.com> wrote:
>>>>
>>>>> Hi Brett,
>>>>>
>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>
>>>>>
>>>>> Whoops, my bad, I wanted to refer to the pyc files that are generated
>>>>> with -OO, which have the "opt-2" prefix.
>>>>>
>>>>> This only kicks in at the -OO level.
>>>>>
>>>>>
>>>>> I will correct the PEP so it reflects this more exactly.
>>>>>
>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>>> you're stripping out docstrings you're already hurting introspection
>>>>>> capabilities in the name of memory. Or one could go as far as to
>>>>>> introduce
>>>>>> -Os to do -OO plus dropping this extra data.
>>>>>
>>>>>
>>>>> This is indeed the plan, sorry for the confusion. The opt-out
>>>>> mechanism is using -OO, precisely as we are already dropping other data.
>>>>>
>>>>
>>>> We can't piggy back on -OO as the only way to disable this, it needs to
>>>> have an option of its own. -OO is unusable as code that relies on
>>>> "doc"strings as application data such as
>>>> http://www.dabeaz.com/ply/ply.html exists.
>>>>
>>>> -gps
>>>>
>>>>
>>>>>
>>>>> Thanks for the clarifications!
>>>>>
>>>>>
>>>>>
>>>>> On Sat, 8 May 2021 at 19:41, Brett Cannon <br...@python.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 7, 2021 at 7:31 PM Pablo Galindo Salgado <
>>>>>> pablog...@gmail.com> wrote:
>>>>>>
>>>>>>> Although we were originally not sympathetic with it, we may need to
>>>>>>> offer an opt-out mechanism for those users that care about the impact of
>>>>>>> the overhead of the new data in pyc files
>>>>>>> and in in-memory code objects, as was suggested by some folks (Thomas,
>>>>>>> Yury, and others). For this, we could propose that the functionality will
>>>>>>> be deactivated along with the extra
>>>>>>> information when Python is executed in optimized mode (``python -O``)
>>>>>>> and therefore pyo files will not have the overhead associated with
>>>>>>> the extra required data.
>>>>>>>
>>>>>>
>>>>>> Just to be clear, .pyo files have not existed for a while:
>>>>>> https://www.python.org/dev/peps/pep-0488/.
>>>>>>
>>>>>>
>>>>>>> Notice that Python
>>>>>>> already strips docstrings in this mode so it would be "aligned"
>>>>>>> with the current mechanism of optimized mode.
>>>>>>>
>>>>>>
>>>>>> This only kicks in at the -OO level.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Although this complicates the implementation, it certainly is still
>>>>>>> much easier than dealing with compression (and more useful for those that
>>>>>>> don't want the feature). Notice that we also
>>>>>>> expect pessimistic results from compression as offsets would be
>>>>>>> quite random (although predominantly in the range 10 - 120).
>>>>>>>
>>>>>>
>>>>>> I personally prefer the idea of dropping the data with -OO since if
>>>>>> you're stripping out docstrings you're already hurting introspection
>>>>>> capabilities in the name of memory. Or one could go as far as to introduce
>>>>>> -Os to do -OO plus dropping this extra data.
>>>>>>
>>>>>> As for .pyc file size, I personally wouldn't worry about it.
>>>>>> If someone is that space-constrained they either aren't using .pyc files or
>>>>>> are only shipping a single set of .pyc files under -OO and skipping source
>>>>>> code. And .pyc files are an implementation detail of CPython so there
>>>>>> shouldn't be too much of a concern for other interpreters.
>>>>>>
>>>>>> -Brett
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado <
>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>
>>>>>>>> One last note for clarity: that's the increase of size in the
>>>>>>>> stdlib, the increase of size
>>>>>>>> for pyc files goes from 28.471296MB to 34.750464MB, which is an
>>>>>>>> increase of 22%.
>>>>>>>>
>>>>>>>> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado <
>>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Some update on the numbers. We have made some draft implementation
>>>>>>>>> to corroborate the
>>>>>>>>> numbers with some more realistic tests and it seems that our original
>>>>>>>>> calculations were wrong.
>>>>>>>>> The actual increase in size is considerably bigger than previously
>>>>>>>>> advertised:
>>>>>>>>>
>>>>>>>>> Using bytes object to encode the final object and marshalling that
>>>>>>>>> to disk (so using uint8_t) as the underlying
>>>>>>>>> type:
>>>>>>>>>
>>>>>>>>> BEFORE:
>>>>>>>>>
>>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>>> 70M Lib
>>>>>>>>> 70M total
>>>>>>>>>
>>>>>>>>> AFTER:
>>>>>>>>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>>>>>>>>> ❯ du -h Lib -c --max-depth=0
>>>>>>>>> 76M Lib
>>>>>>>>> 76M total
>>>>>>>>>
>>>>>>>>> So that's an increase of 8.56 % over the original value. This is
>>>>>>>>> storing the start offset and end offset with no compression
>>>>>>>>> whatsoever.
>>>>>>>>>
>>>>>>>>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado <
>>>>>>>>> pablog...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> We are preparing a PEP and we would like to start some early
>>>>>>>>>> discussion about one of the main aspects of the PEP.
>>>>>>>>>>
>>>>>>>>>> The work we are preparing is to allow the interpreter to produce
>>>>>>>>>> more fine-grained error messages, pointing to
>>>>>>>>>> the source associated to the instructions that are failing. For
>>>>>>>>>> example:
>>>>>>>>>>
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File "test.py", line 14, in <module>
>>>>>>>>>>     lel3(x)
>>>>>>>>>>     ^^^^^^^
>>>>>>>>>>   File "test.py", line 12, in lel3
>>>>>>>>>>     return lel2(x) / 23
>>>>>>>>>>            ^^^^^^^
>>>>>>>>>>   File "test.py", line 9, in lel2
>>>>>>>>>>     return 25 + lel(x) + lel(x)
>>>>>>>>>>                 ^^^^^^
>>>>>>>>>>   File "test.py", line 6, in lel
>>>>>>>>>>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>>>>>>>>                          ^^^^^^^^^^^^^^^^^^^^^
>>>>>>>>>> TypeError: 'NoneType' object is not subscriptable
>>>>>>>>>>
>>>>>>>>>> The cost of this is having the start column number and end
>>>>>>>>>> column number information for every bytecode instruction
>>>>>>>>>> and this is what we want to discuss (there is also some stack
>>>>>>>>>> cost to re-raise exceptions but that's not a big problem in
>>>>>>>>>> any case). Given that column numbers are not very big compared
>>>>>>>>>> with line numbers, we plan to store these as unsigned chars
>>>>>>>>>> or unsigned shorts. We ran some experiments over the standard
>>>>>>>>>> library and we found that the overhead of all pyc files is:
>>>>>>>>>>
>>>>>>>>>> * If we use shorts, the total overhead is ~3% (total size 28 MB
>>>>>>>>>> and the extra size is 0.88 MB).
>>>>>>>>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB
>>>>>>>>>> and the extra size is 0.44 MB).
>>>>>>>>>>
>>>>>>>>>> One of the disadvantages of using chars is that we can only
>>>>>>>>>> report columns from 1 to 255 so if an error happens in a column
>>>>>>>>>> bigger than that then we would have to exclude it (and not show
>>>>>>>>>> the highlighting) for that frame. Unsigned short will allow
>>>>>>>>>> the values to go from 0 to 65535.
>>>>>>>>>>
>>>>>>>>>> Unfortunately these numbers are not easily compressible, as every
>>>>>>>>>> instruction would have very different offsets.
>>>>>>>>>>
>>>>>>>>>> There is also the possibility of not doing this based on some
>>>>>>>>>> build flag or when using -O to allow users to opt out, but given the
>>>>>>>>>> fact that these numbers can be quite useful to other tools like
>>>>>>>>>> coverage measuring tools, tracers, profilers and the like, adding
>>>>>>>>>> conditional logic to many places would complicate the implementation
>>>>>>>>>> considerably and will potentially reduce the usability of those
>>>>>>>>>> tools, so we prefer not to have the conditional logic. We believe
>>>>>>>>>> this extra cost is very much worth the better error reporting but we
>>>>>>>>>> understand and respect other points of view.
>>>>>>>>>>
>>>>>>>>>> Does anyone see a better way to encode this information **without
>>>>>>>>>> complicating a lot the implementation**? What are people's thoughts
>>>>>>>>>> on the feature?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Regards from cloudy London,
>>>>>>>>>> Pablo Galindo Salgado
>>>>>>>>>>
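For what it's worth, here is a minimal sketch of the unsigned char versus unsigned short trade-off from the original message quoted above. It is not the draft implementation. The function name `encode_columns`, the `NO_COLUMN` sentinel and the sample spans are all made up for illustration; the thread only says that out-of-range columns would be excluded, not how that would be represented.

```python
import struct

# Hypothetical sentinel meaning "no column information for this instruction".
NO_COLUMN = 0

# Largest value each struct format code can hold:
# "B" is an unsigned char (1 byte), "H" an unsigned short (2 bytes).
LIMITS = {"B": 255, "H": 65535}


def encode_columns(spans, code="B"):
    """Pack (start, end) column pairs, one pair per instruction.

    Pairs that do not fit in the chosen integer width are replaced by
    (NO_COLUMN, NO_COLUMN), so highlighting would be skipped for them.
    """
    limit = LIMITS[code]
    flat = []
    for start, end in spans:
        if 1 <= start <= end <= limit:
            flat += [start, end]
        else:
            flat += [NO_COLUMN, NO_COLUMN]
    # "<" selects standard sizes with no padding between items.
    return struct.pack(f"<{len(flat)}{code}", *flat)


# Three made-up instructions; the last one sits past column 255.
spans = [(12, 19), (16, 23), (300, 320)]
as_chars = encode_columns(spans, "B")
as_shorts = encode_columns(spans, "H")
print(len(as_chars), len(as_shorts))
```

The char encoding costs two bytes per instruction but loses the third span, while the short encoding keeps it at four bytes per instruction, which matches the roughly 2x difference between the two overhead estimates in the original message.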
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7TMRMFHDHCF6ISTJQJ367GGGHKW4QVHB/ Code of Conduct: http://python.org/psf/codeofconduct/