[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 8:14 PM Neil Schemenauer wrote: > > On 2021-05-07, Pablo Galindo Salgado wrote: > > Technically the main concern may be the size of the unmarshalled > > pyc files in memory, more than the storage size of disk. > > It would be cool if we could mmap the pyc files and have the VM run > code without an unmarshal step. One idea is something similar to > the Facebook "not another freeze" PR but with a twist. Their > approach was to dump out code objects so they could be loaded as if > they were statically defined structures. > > Instead, could we dump out the pyc data in a format similar to Cap'n > Proto? That way no unmarshal is needed. The VM would have to be > extensively changed to run code in that format. That's the hard > part. > > The benefit would be faster startup times. The unmarshal step is > costly. It would mostly solve the concern about these larger > linenum/colnum tables. We would only load that data into memory if > the table is accessed. A simpler version would be to pack just the docstrings/lnotab/column numbers into a separate part of the .pyc, and store a reference to the file + offset to load them lazily on demand. No need for mmap. Could also store them in memory, but with some cheap compression applied, and decompress on access. None of these get accessed often. -n -- Nathaniel J. Smith -- https://vorpus.org ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Q2DBRE5YKLTSPVCMUCXPEDXKFCA4UUGQ/ Code of Conduct: http://python.org/psf/codeofconduct/
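A minimal sketch of the lazy-loading idea above at the Python level. All names and the on-disk layout here are hypothetical; a real implementation would live in C inside the importer:

    import zlib

    class LazyTable:
        # Keep only (path, offset, size) at import time and read the
        # rarely-used table from the .pyc on first access.
        def __init__(self, pyc_path, offset, size):
            self._where = (pyc_path, offset, size)
            self._data = None  # nothing loaded until someone asks

        @property
        def data(self):
            if self._data is None:
                path, offset, size = self._where
                with open(path, "rb") as f:
                    f.seek(offset)
                    self._data = f.read(size)
            return self._data

    class CompressedTable:
        # The in-memory variant: hold the table compressed and pay the
        # decompression cost only when a traceback actually needs it.
        def __init__(self, raw):
            self._blob = zlib.compress(raw)

        @property
        def data(self):
            return zlib.decompress(self._blob)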
[Python-Dev] Re: Speeding up CPython
On Fri, May 7, 2021 at 6:51 PM Steven D'Aprano wrote: > On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote: > > Hi everyone, > > > > CPython is slow. We all know that, yet little is done to fix it. > > > > I'd like to change that. > > I have a plan to speed up CPython by a factor of five over the next few > > years. But it needs funding. > > I've noticed a lot of optimization-related b.p.o. issues created by > Mark, which is great. What happened with Mark's proposal here? Did the > funding issue get sorted? > I believe Guido has Mark contracting on Python performance through Microsoft? -Greg ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5RTMMJLGZE5FHW3SAYWSKUYOLEUZ2RFX/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On 2021-05-07, Pablo Galindo Salgado wrote: > Technically the main concern may be the size of the unmarshalled > pyc files in memory, more than the storage size of disk. It would be cool if we could mmap the pyc files and have the VM run code without an unmarshal step. One idea is something similar to the Facebook "not another freeze" PR but with a twist. Their approach was to dump out code objects so they could be loaded as if they were statically defined structures. Instead, could we dump out the pyc data in a format similar to Cap'n Proto? That way no unmarshal is needed. The VM would have to be extensively changed to run code in that format. That's the hard part. The benefit would be faster startup times. The unmarshal step is costly. It would mostly solve the concern about these larger linenum/colnum tables. We would only load that data into memory if the table is accessed. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UKDLCOTUFNWGSMWWGLH3DJC4AVYZANDM/ Code of Conduct: http://python.org/psf/codeofconduct/
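For illustration, this is roughly what the mmap half of that idea looks like from Python (the file name is hypothetical, and a real zero-copy code format would still require the extensive VM changes described above):

    import mmap

    with open("example.cpython-310.pyc", "rb") as f:
        # Map the whole file; the OS pages data in only when it is touched,
        # so large but rarely-read tables cost nothing until accessed.
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    header = buf[:16]                # pyc header: magic, flags, mtime, size
    payload = memoryview(buf)[16:]   # zero-copy view over the rest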
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
Although we were originally not sympathetic to it, we may need to offer an opt-out mechanism for those users who care about the impact of the overhead of the new data in pyc files and in in-memory code objects, as was suggested by some folks (Thomas, Yury, and others). For this, we could propose that the functionality, along with the extra information, be deactivated when Python is executed in optimized mode (``python -O``), so that optimized pyc files will not have the overhead associated with the extra required data. Notice that Python already strips docstrings under ``-OO``, so this would be "aligned" with the current mechanism of optimized mode. Although this complicates the implementation, it certainly is still much easier than dealing with compression (and more useful for those that don't want the feature). Notice that we also expect poor results from compression, as the offsets would be quite random (although predominantly in the range 10 - 120). On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado wrote: > One last note for clarity: that's the increase of size in the stdlib, the > increase of size > for pyc files goes from 28.471296MB to 34.750464MB, which is an increase > of 22%. > > On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado > wrote: > >> Some update on the numbers. We have made some draft implementation to >> corroborate the >> numbers with some more realistic tests and seems that our original >> calculations were wrong. >> The actual increase in size is quite bigger than previously advertised: >> >> Using bytes object to encode the final object and marshalling that to >> disk (so using uint8_t) as the underlying >> type: >> >> BEFORE: >> >> ❯ ./python -m compileall -r 1000 Lib > /dev/null >> ❯ du -h Lib -c --max-depth=0 >> 70M Lib >> 70M total >> >> AFTER: >> ❯ ./python -m compileall -r 1000 Lib > /dev/null >> ❯ du -h Lib -c --max-depth=0 >> 76M Lib >> 76M total >> >> So that's an increase of 8.56 % over the original value. This is storing >> the start offset and end offset with no compression >> whatsoever. >> >> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado >> wrote: >> >>> Hi there, >>> >>> We are preparing a PEP and we would like to start some early discussion >>> about one of the main aspects of the PEP. >>> >>> The work we are preparing is to allow the interpreter to produce more >>> fine-grained error messages, pointing to >>> the source associated to the instructions that are failing. For example: >>> >>> Traceback (most recent call last): >>> >>> File "test.py", line 14, in >>> >>> lel3(x) >>> >>> ^^^ >>> >>> File "test.py", line 12, in lel3 >>> >>> return lel2(x) / 23 >>> >>>^^^ >>> >>> File "test.py", line 9, in lel2 >>> >>> return 25 + lel(x) + lel(x) >>> >>> ^^ >>> >>> File "test.py", line 6, in lel >>> >>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) >>> >>> ^ >>> >>> TypeError: 'NoneType' object is not subscriptable >>> >>> The cost of this is having the start column number and end column number >>> information for every bytecode instruction >>> and this is what we want to discuss (there is also some stack cost to >>> re-raise exceptions but that's not a big problem in >>> any case). Given that column numbers are not very big compared with line >>> numbers, we plan to store these as unsigned chars >>> or unsigned shorts. We ran some experiments over the standard library >>> and we found that the overhead of all pyc files is: >>> >>> * If we use shorts, the total overhead is ~3% (total size 28MB and the >>> extra size is 0.88 MB).
>>> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the >>> extra size is 0.44MB). >>> >>> One of the disadvantages of using chars is that we can only report >>> columns from 1 to 255 so if an error happens in a column >>> bigger than that then we would have to exclude it (and not show the >>> highlighting) for that frame. Unsigned short will allow >>> the values to go from 0 to 65535. >>> >>> Unfortunately these numbers are not easily compressible, as every >>> instruction would have very different offsets. >>> >>> There is also the possibility of not doing this based on some build flag >>> on when using -O to allow users to opt out, but given the fact >>> that these numbers can be quite useful to other tools like coverage >>> measuring tools, tracers, profilers and the such adding conditional >>> logic to many places would complicate the implementation considerably >>> and will potentially reduce the usability of those tools so we prefer >>> not to have the conditional logic. We believe this is extra cost is very >>> much worth the better error reporting but we understand and respect >>> other points of view. >>> >>> Does anyone see a better way to encode this information **without >>> complicating a lot the implementation**? What are people thoughts on the >>> feature? >>> >>> Thanks in advance, >>> >>> Regards from cloudy London, >>> Pablo Galindo Salgado
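The docstring precedent mentioned above can be demonstrated with compile()'s optimize parameter, the same knob that -O/-OO set process-wide; a small illustrative demo (optimize=2 corresponds to -OO, which drops docstrings entirely, just as the proposal would drop the column table):

    src = 'def f():\n    """docs"""\n    return 1\n'

    for level in (0, 2):
        ns = {}
        exec(compile(src, "<demo>", "exec", optimize=level), ns)
        print(level, ns["f"].__doc__)
    # 0 docs
    # 2 None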
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 6:39 PM Steven D'Aprano wrote: > On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote: > > > To know what compression methods might be effective, I’m wondering if it > > could be useful to see separate histograms of, say, the start column > number > > and width over the code base. Or for people that really want to dig in, > > maybe access to the set of all pairs could help. (E.g. maybe a histogram > of > > pairs could also reveal something.) > > I think this is over-analysing. Do we need to micro-optimize the > compression algorithm? Let's make the choice simple: live with the size > increase, or swap to LZ4 compression as Antoine suggested. Analysis > paralysis is a real risk here. > > If there are implementations which cannot support either (MicroPython?) > they should be free to continue doing things the old way. In other > words, "fine grained error messages" should be a quality of > implementation feature rather than a language guarantee. > > I understand that the plan is to make this feature optional in any case, > to allow third-party tools to catch up. > > If people really want to do that histogram analysis so that they can > optimize the choice of compression algorithm, of course they are free to > do so. But the PEP authors should not feel that they are obliged to do > so, and we should avoid the temptation to bikeshed over compressors. > I'm not sure why you're sounding so negative. Pablo asked for ideas in his first message to the list: On Fri, May 7, 2021 at 2:53 PM Pablo Galindo Salgado wrote: > Does anyone see a better way to encode this information **without > complicating a lot the implementation**? > Maybe a large gain can be made with a simple tweak to how the pair is encoded, but there's no way to know without seeing the distribution. Also, my reply wasn't about the pyc files on disk but about their representation in memory, which Pablo later said may be the main concern. So it's not compression algorithms like LZ4 so much as a method of encoding. --Chris > > (For what it's worth, I like this proposed feature, I don't care about a > 20-25% increase in pyc file size, but if this leads to adding LZ4 > compression to the stdlib, I like it even more :-) > > > -- > Steve > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/6H2XSRMARU4SX4WRMIO2M4MI4EQASPBC/ > Code of Conduct: http://python.org/psf/codeofconduct/ > > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UYARCZJJFIEKRWMEEBW2FAGBPAPDFJGG/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
> I'm wondering if it's possible to compromise with one position that's not as complete but still gives a good hint: Even if it is possible, it would be quite a bit less useful (a lot of users wanted full ranges highlighted for syntax errors, and that change was very well received when we announced it in 3.10) and, most importantly, it would render the feature much less useful for other tools such as profilers, coverage tools, and the like. It will also make the feature less useful for people who want to display even more information, such as error reporting tools, IDEs, etc. On Sat, 8 May 2021 at 02:41, MRAB wrote: > On 2021-05-08 01:43, Pablo Galindo Salgado wrote: > > Some update on the numbers. We have made some draft implementation to > > corroborate the > > numbers with some more realistic tests and seems that our original > > calculations were wrong. > > The actual increase in size is quite bigger than previously advertised: > > > > Using bytes object to encode the final object and marshalling that to > > disk (so using uint8_t) as the underlying > > type: > > > > BEFORE: > > > > ❯ ./python -m compileall -r 1000 Lib > /dev/null > > ❯ du -h Lib -c --max-depth=0 > > 70M Lib > > 70M total > > > > AFTER: > > ❯ ./python -m compileall -r 1000 Lib > /dev/null > > ❯ du -h Lib -c --max-depth=0 > > 76M Lib > > 76M total > > > > So that's an increase of 8.56 % over the original value. This is storing > > the start offset and end offset with no compression > > whatsoever. > > > [snip] > > I'm wondering if it's possible to compromise with one position that's > not as complete but still gives a good hint: > > For example: > >    File "test.py", line 6, in lel > return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) >^ > > TypeError: 'NoneType' object is not subscriptable > > That at least tells you which subscript raised the exception. > > > Another example: > >    Traceback (most recent call last): > File "test.py", line 4, in >print(1 / x + 1 / y) >^ >ZeroDivisionError: division by zero > > as distinct from: > >    Traceback (most recent call last): > File "test.py", line 4, in >print(1 / x + 1 / y) >^ >ZeroDivisionError: division by zero > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/4RGQALI6T6HBNRDUUEYX4FA2YKTZDBNA/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7OKDSXZZ7TQFQ3X4RZGNGLX5UDF2B5QW/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Speeding up CPython
On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote: > Hi everyone, > > CPython is slow. We all know that, yet little is done to fix it. > > I'd like to change that. > I have a plan to speed up CPython by a factor of five over the next few > years. But it needs funding. I've noticed a lot of optimization-related b.p.o. issues created by Mark, which is great. What happened with Mark's proposal here? Did the funding issue get sorted? -- Steve ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6ZVJ5ZJC2BXUUSI3JTGOU4MQXQHORI4Q/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On 2021-05-08 01:43, Pablo Galindo Salgado wrote: Some update on the numbers. We have made some draft implementation to corroborate the numbers with some more realistic tests and seems that our original calculations were wrong. The actual increase in size is quite bigger than previously advertised: Using bytes object to encode the final object and marshalling that to disk (so using uint8_t) as the underlying type: BEFORE: ❯ ./python -m compileall -r 1000 Lib > /dev/null ❯ du -h Lib -c --max-depth=0 70M Lib 70M total AFTER: ❯ ./python -m compileall -r 1000 Lib > /dev/null ❯ du -h Lib -c --max-depth=0 76M Lib 76M total So that's an increase of 8.56 % over the original value. This is storing the start offset and end offset with no compression whatsoever. [snip] I'm wondering if it's possible to compromise with one position that's not as complete but still gives a good hint: For example: File "test.py", line 6, in lel return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) ^ TypeError: 'NoneType' object is not subscriptable That at least tells you which subscript raised the exception. Another example: Traceback (most recent call last): File "test.py", line 4, in print(1 / x + 1 / y) ^ ZeroDivisionError: division by zero as distinct from: Traceback (most recent call last): File "test.py", line 4, in print(1 / x + 1 / y) ^ ZeroDivisionError: division by zero ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4RGQALI6T6HBNRDUUEYX4FA2YKTZDBNA/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote: > To know what compression methods might be effective, I’m wondering if it > could be useful to see separate histograms of, say, the start column number > and width over the code base. Or for people that really want to dig in, > maybe access to the set of all pairs could help. (E.g. maybe a histogram of > pairs could also reveal something.) I think this is over-analysing. Do we need to micro-optimize the compression algorithm? Let's make the choice simple: live with the size increase, or swap to LZ4 compression as Antoine suggested. Analysis paralysis is a real risk here. If there are implementations which cannot support either (MicroPython?) they should be free to continue doing things the old way. In other words, "fine grained error messages" should be a quality of implementation feature rather than a language guarantee. I understand that the plan is to make this feature optional in any case, to allow third-party tools to catch up. If people really want to do that histogram analysis so that they can optimize the choice of compression algorithm, of course they are free to do so. But the PEP authors should not feel that they are obliged to do so, and we should avoid the temptation to bikeshed over compressors. (For what it's worth, I like this proposed feature, I don't care about a 20-25% increase in pyc file size, but if this leads to adding LZ4 compression to the stdlib, I like it even more :-) -- Steve ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6H2XSRMARU4SX4WRMIO2M4MI4EQASPBC/ Code of Conduct: http://python.org/psf/codeofconduct/
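As a rough check of the "not easily compressible" claim, one can compress a synthetic offset table with zlib from the stdlib (standing in here for LZ4, which is not in the stdlib). The data below is made up, with start columns drawn from the 10-120 range Pablo mentions, so treat the result as indicative only:

    import random, zlib

    random.seed(0)
    table = bytearray()
    for _ in range(10_000):  # one (start, end) byte pair per instruction
        start = random.randint(10, 120)
        table.append(start)
        table.append(min(start + random.randint(1, 40), 255))  # end column

    blob = bytes(table)
    print(len(blob), "->", len(zlib.compress(blob, 9)))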
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 5:44 PM Pablo Galindo Salgado wrote: > Some update on the numbers. We have made some draft implementation to > corroborate the > numbers with some more realistic tests and seems that our original > calculations were wrong. > The actual increase in size is quite bigger than previously advertised: > > Using bytes object to encode the final object and marshalling that to disk > (so using uint8_t) as the underlying > type: > > BEFORE: > > ❯ ./python -m compileall -r 1000 Lib > /dev/null > ❯ du -h Lib -c --max-depth=0 > 70M Lib > 70M total > > AFTER: > ❯ ./python -m compileall -r 1000 Lib > /dev/null > ❯ du -h Lib -c --max-depth=0 > 76M Lib > 76M total > > So that's an increase of 8.56 % over the original value. This is storing > the start offset and end offset with no compression > whatsoever. > To know what compression methods might be effective, I’m wondering if it could be useful to see separate histograms of, say, the start column number and width over the code base. Or for people that really want to dig in, maybe access to the set of all pairs could help. (E.g. maybe a histogram of pairs could also reveal something.) —Chris > On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado > wrote: > >> Hi there, >> >> We are preparing a PEP and we would like to start some early discussion >> about one of the main aspects of the PEP. >> >> The work we are preparing is to allow the interpreter to produce more >> fine-grained error messages, pointing to >> the source associated to the instructions that are failing. For example: >> >> Traceback (most recent call last): >> >> File "test.py", line 14, in >> >> lel3(x) >> >> ^^^ >> >> File "test.py", line 12, in lel3 >> >> return lel2(x) / 23 >> >>^^^ >> >> File "test.py", line 9, in lel2 >> >> return 25 + lel(x) + lel(x) >> >> ^^ >> >> File "test.py", line 6, in lel >> >> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) >> >> ^ >> >> TypeError: 'NoneType' object is not subscriptable >> >> The cost of this is having the start column number and end column number >> information for every bytecode instruction >> and this is what we want to discuss (there is also some stack cost to >> re-raise exceptions but that's not a big problem in >> any case). Given that column numbers are not very big compared with line >> numbers, we plan to store these as unsigned chars >> or unsigned shorts. We ran some experiments over the standard library and >> we found that the overhead of all pyc files is: >> >> * If we use shorts, the total overhead is ~3% (total size 28MB and the >> extra size is 0.88 MB). >> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the >> extra size is 0.44MB). >> >> One of the disadvantages of using chars is that we can only report >> columns from 1 to 255 so if an error happens in a column >> bigger than that then we would have to exclude it (and not show the >> highlighting) for that frame. Unsigned short will allow >> the values to go from 0 to 65535. >> >> Unfortunately these numbers are not easily compressible, as every >> instruction would have very different offsets.
>> >> There is also the possibility of not doing this based on some build flag >> on when using -O to allow users to opt out, but given the fact >> that these numbers can be quite useful to other tools like coverage >> measuring tools, tracers, profilers and the such adding conditional >> logic to many places would complicate the implementation considerably and >> will potentially reduce the usability of those tools so we prefer >> not to have the conditional logic. We believe this is extra cost is very >> much worth the better error reporting but we understand and respect >> other points of view. >> >> Does anyone see a better way to encode this information **without >> complicating a lot the implementation**? What are people thoughts on the >> feature? >> >> Thanks in advance, >> >> Regards from cloudy London, >> Pablo Galindo Salgado >> >> ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/QDEKMTZRMPEKPFFBPCGUYWLLR43A6M6U/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZTNJHADASSERV65FSVVYWNL6JF65CYQK/ Code of Conduct: http://python.org/psf/codeofconduct/
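One way to gather the histograms Chris describes, sketched with the ast module; it approximates the distribution from source positions rather than bytecode, since per-instruction columns did not exist yet. end_col_offset needs Python 3.8+, and the input file name is hypothetical:

    import ast
    import collections
    import tokenize

    starts = collections.Counter()
    widths = collections.Counter()

    def tally(path):
        # Count start columns and span widths for every expression node.
        with tokenize.open(path) as f:
            tree = ast.parse(f.read(), filename=path)
        for node in ast.walk(tree):
            if isinstance(node, ast.expr) and node.end_col_offset is not None:
                starts[node.col_offset] += 1
                widths[node.end_col_offset - node.col_offset] += 1

    tally("some_module.py")
    print(starts.most_common(10))
    print(widths.most_common(10))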
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
One last note for clarity: that's the increase of size in the stdlib; the size of the pyc files goes from 28.471296MB to 34.750464MB, which is an increase of 22%. On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado wrote: > Some update on the numbers. We have made some draft implementation to > corroborate the > numbers with some more realistic tests and seems that our original > calculations were wrong. > The actual increase in size is quite bigger than previously advertised: > > Using bytes object to encode the final object and marshalling that to disk > (so using uint8_t) as the underlying > type: > > BEFORE: > > ❯ ./python -m compileall -r 1000 Lib > /dev/null > ❯ du -h Lib -c --max-depth=0 > 70M Lib > 70M total > > AFTER: > ❯ ./python -m compileall -r 1000 Lib > /dev/null > ❯ du -h Lib -c --max-depth=0 > 76M Lib > 76M total > > So that's an increase of 8.56 % over the original value. This is storing > the start offset and end offset with no compression > whatsoever. > > On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado > wrote: > >> Hi there, >> >> We are preparing a PEP and we would like to start some early discussion >> about one of the main aspects of the PEP. >> >> The work we are preparing is to allow the interpreter to produce more >> fine-grained error messages, pointing to >> the source associated to the instructions that are failing. For example: >> >> Traceback (most recent call last): >> >> File "test.py", line 14, in >> >> lel3(x) >> >> ^^^ >> >> File "test.py", line 12, in lel3 >> >> return lel2(x) / 23 >> >>^^^ >> >> File "test.py", line 9, in lel2 >> >> return 25 + lel(x) + lel(x) >> >> ^^ >> >> File "test.py", line 6, in lel >> >> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) >> >> ^ >> >> TypeError: 'NoneType' object is not subscriptable >> >> The cost of this is having the start column number and end column number >> information for every bytecode instruction >> and this is what we want to discuss (there is also some stack cost to >> re-raise exceptions but that's not a big problem in >> any case). Given that column numbers are not very big compared with line >> numbers, we plan to store these as unsigned chars >> or unsigned shorts. We ran some experiments over the standard library and >> we found that the overhead of all pyc files is: >> >> * If we use shorts, the total overhead is ~3% (total size 28MB and the >> extra size is 0.88 MB). >> * If we use chars. the total overhead is ~1.5% (total size 28 MB and the >> extra size is 0.44MB). >> >> One of the disadvantages of using chars is that we can only report >> columns from 1 to 255 so if an error happens in a column >> bigger than that then we would have to exclude it (and not show the >> highlighting) for that frame. Unsigned short will allow >> the values to go from 0 to 65535. >> >> Unfortunately these numbers are not easily compressible, as every >> instruction would have very different offsets. >> >> There is also the possibility of not doing this based on some build flag >> on when using -O to allow users to opt out, but given the fact >> that these numbers can be quite useful to other tools like coverage >> measuring tools, tracers, profilers and the such adding conditional >> logic to many places would complicate the implementation considerably and >> will potentially reduce the usability of those tools so we prefer >> not to have the conditional logic. We believe this is extra cost is very >> much worth the better error reporting but we understand and respect >> other points of view.
>> >> Does anyone see a better way to encode this information **without >> complicating a lot the implementation**? What are people thoughts on the >> feature? >> >> Thanks in advance, >> >> Regards from cloudy London, >> Pablo Galindo Salgado >> >> ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RICGTXCABZPK7RLDB7SISR4E64S6FEKR/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
A quick update on the numbers. We have made a draft implementation to corroborate the numbers with some more realistic tests, and it seems that our original calculations were wrong. The actual increase in size is quite a bit bigger than previously advertised: Using a bytes object to encode the final table and marshalling it to disk (so using uint8_t as the underlying type): BEFORE: ❯ ./python -m compileall -r 1000 Lib > /dev/null ❯ du -h Lib -c --max-depth=0 70M Lib 70M total AFTER: ❯ ./python -m compileall -r 1000 Lib > /dev/null ❯ du -h Lib -c --max-depth=0 76M Lib 76M total So that's an increase of 8.56% over the original value. This is storing the start offset and end offset with no compression whatsoever. On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado wrote: > Hi there, > > We are preparing a PEP and we would like to start some early discussion > about one of the main aspects of the PEP. > > The work we are preparing is to allow the interpreter to produce more > fine-grained error messages, pointing to > the source associated to the instructions that are failing. For example: > > Traceback (most recent call last): > > File "test.py", line 14, in > > lel3(x) > > ^^^ > > File "test.py", line 12, in lel3 > > return lel2(x) / 23 > >^^^ > > File "test.py", line 9, in lel2 > > return 25 + lel(x) + lel(x) > > ^^ > > File "test.py", line 6, in lel > > return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) > > ^ > > TypeError: 'NoneType' object is not subscriptable > > The cost of this is having the start column number and end column number > information for every bytecode instruction > and this is what we want to discuss (there is also some stack cost to > re-raise exceptions but that's not a big problem in > any case). Given that column numbers are not very big compared with line > numbers, we plan to store these as unsigned chars > or unsigned shorts. We ran some experiments over the standard library and > we found that the overhead of all pyc files is: > > * If we use shorts, the total overhead is ~3% (total size 28MB and the > extra size is 0.88 MB). > * If we use chars. the total overhead is ~1.5% (total size 28 MB and the > extra size is 0.44MB). > > One of the disadvantages of using chars is that we can only report columns > from 1 to 255 so if an error happens in a column > bigger than that then we would have to exclude it (and not show the > highlighting) for that frame. Unsigned short will allow > the values to go from 0 to 65535. > > Unfortunately these numbers are not easily compressible, as every > instruction would have very different offsets. > > There is also the possibility of not doing this based on some build flag > on when using -O to allow users to opt out, but given the fact > that these numbers can be quite useful to other tools like coverage > measuring tools, tracers, profilers and the such adding conditional > logic to many places would complicate the implementation considerably and > will potentially reduce the usability of those tools so we prefer > not to have the conditional logic. We believe this is extra cost is very > much worth the better error reporting but we understand and respect > other points of view. > > Does anyone see a better way to encode this information **without > complicating a lot the implementation**? What are people thoughts on the > feature?
> > Thanks in advance, > > Regards from cloudy London, > Pablo Galindo Salgado > > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QDEKMTZRMPEKPFFBPCGUYWLLR43A6M6U/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Can't sync cpython main to my fork
On Fri, 7 May 2021, 8:13 am Ethan Furman, wrote: > On 5/6/21 7:14 AM, Jelle Zijlstra wrote: > > > Maybe others have different workflows, but I don't see much of a need > > for keeping your fork's main branch up to date. > > I will occasionally do a `git push origin main` just to shut up the > messages about being behind/ahead; other than that, > I have no idea why I would need origin to be up to date. > I sync mine occasionally so I can make draft PRs in my repo before submitting them to the main repo. That's the only use case I have found for it though. Cheers, Nick. > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/R5GX6JWKSKSSXA3JSU7QQOTGVE6IQEAC/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Sat, 8 May 2021, 8:53 am Pablo Galindo Salgado, wrote: > > One thought: could the stored column position not include the > indentation? Would that help? > > The compiler doesn't have access easy access to the source unfortunately > so we don't know how much is the indentation. This can make life > a bit harder for other tools, although it can make it easier for reporting > the exception as the current traceback display removes indentation. > If the lnotab format (or a new data structure on the code object) could store a line indent offset for each line, each instruction within a line would only need to record the offset from the end of the indentation. If we assume "deeply indented code" is the most likely source of excessively long lines rather than "long expressions and other one line statements produced by code generators" it may be worth it, but I'm not sure that's actually true. If we instead assume long lines are likely to come from code generators, then we can impose the 255 column limit, and breaking lines at 255 code points to improve tracebacks would become a quality of implementation issue for code generators. The latter assumption seems more likely to be true to me, and if the deep indentation case does come up, the line offset idea could be pursued later. Cheers, Nick. > > On Fri, 7 May 2021 at 23:37, MRAB wrote: > >> On 2021-05-07 22:45, Pablo Galindo Salgado wrote: >> > Hi there, >> > >> > We are preparing a PEP and we would like to start some early discussion >> > about one of the main aspects of the PEP. >> > >> > The work we are preparing is to allow the interpreter to produce more >> > fine-grained error messages, pointing to >> > the source associated to the instructions that are failing. For example: >> > >> > Traceback (most recent call last): >> > >> >File "test.py", line 14, in >> > >> > lel3(x) >> > >> > ^^^ >> > >> >File "test.py", line 12, in lel3 >> > >> > return lel2(x) / 23 >> > >> > ^^^ >> > >> >File "test.py", line 9, in lel2 >> > >> > return 25 + lel(x) + lel(x) >> > >> > ^^ >> > >> >File "test.py", line 6, in lel >> > >> > return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) >> > >> > ^ >> > >> > TypeError: 'NoneType' object is not subscriptable >> > >> > >> > The cost of this is having the start column number and end >> column number >> > information for every bytecode instruction >> > and this is what we want to discuss (there is also some stack cost to >> > re-raise exceptions but that's not a big problem in >> > any case). Given that column numbers are not very big compared with >> line >> > numbers, we plan to store these as unsigned chars >> > or unsigned shorts. We ran some experiments over the standard library >> > and we found that the overhead of all pyc files is: >> > >> > * If we use shorts, the total overhead is ~3% (total size 28MB and the >> > extra size is 0.88 MB). >> > * If we use chars. the total overhead is ~1.5% (total size 28 MB and >> the >> > extra size is 0.44MB). >> > >> > One of the disadvantages of using chars is that we can only report >> > columns from 1 to 255 so if an error happens in a column >> > bigger than that then we would have to exclude it (and not show the >> > highlighting) for that frame. Unsigned short will allow >> > the values to go from 0 to 65535. >> > >> [snip]How common are lines are longer than 255 characters, anyway? >> >> One thought: could the stored column position not include the >> indentation? Would that help?
>> ___ >> Python-Dev mailing list -- python-dev@python.org >> To unsubscribe send an email to python-dev-le...@python.org >> https://mail.python.org/mailman3/lists/python-dev.python.org/ >> Message archived at >> https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/ >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/OKQYNAI2B2BRCFMYJPLYPG2HHHUB5QR6/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BA4UQ36VQGZ52BN56XPKFRPVO2TWD6BN/ Code of Conduct: http://python.org/psf/codeofconduct/
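A small sketch of the per-line indent offset Nick describes (a hypothetical encoding, not anything CPython does; the indent table would be stored once per code object):

    def line_indents(source):
        # One indentation width per line.
        return [len(line) - len(line.lstrip()) for line in source.splitlines()]

    def encode_column(col, lineno, indents):
        rel = col - indents[lineno - 1]
        assert 0 <= rel < 256, "line body longer than 255 columns"
        return rel

    def decode_column(rel, lineno, indents):
        return rel + indents[lineno - 1]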
[Python-Dev] Re: On the migration from master to main
On Tue, 4 May 2021, 10:50 am Łukasz Langa, wrote: > > On 4 May 2021, at 02:04, Łukasz Langa wrote: > > Having renamed the branch in my fork first, the exact sequence I used on > my own clone was: > > ❯ git checkout master > ❯ git branch -m master main > ❯ git fetch origin > ❯ git branch -u origin/main main > ❯ git remote set-head origin -a > ❯ git fetch upstream > ❯ git remote set-head upstream -a > ❯ git pull upstream main > > This worked, I can successfully pull from upstream's main and push to my > upstream's main. The `set-head` ones were suggested by the GitHub UI and > ended up with heads explicitly being listed in `git log` which I guess is > harmless. > > > Seeing Ned's and Mariatta's comments on the matter I guess I should > clarify that my commands above are for a *symmetrical* setup, i.e. where > I expect the default `git pull` to *pull from origin*, and the default > `git push` to *push to origin*. > > I might not be your typical user here, since I actually push to upstream > too every now and again as part of release management, in particular > pushing out signed release tags (which are full git objects with the same > metadata as commits). > The dev guide (and Github) have long recommended this arrangement, so I believe it's those of us that define a separate "pr" remote for our fork and have "origin" referring to the main CPython repo that are the odd ones out. Cheers, Nick. > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5KXVPXAJYFRJCT3RGQ4SC22NEM6IUKBE/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Release of a responsive python-docs-theme 2021.5
It’s with great pleasure that I announce that python-docs-theme has been released to PyPI. Thanks to the hard work and patience of Olga Bulat, @obulat, Python’s doc theme is now responsive. Many thanks to everyone who has contributed to this release by filing issues, writing PRs, reviewing PRs, and testing the theme. It was a great team effort. Here are the highlights from the CHANGELOG.rst: - Make the theme responsive (#46) Contributed by Olga Bulat. - Use Python 3.8 for the Github Actions (#71) Contributed by Stéphane Wirtel. - Use default pygments theme (#68) Contributed by Aaron Carlisle. - Test Github action to validate the theme against docsbuild scripts. (#69) Contributed by Julien Palard. - Add the copy button to pycon3 highlighted code blocks. (#64) Contributed by Julien Palard. ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/75SE5KPBFEWXOGOAFEKM6FBHJQ3AORXK/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
> One thought: could the stored column position not include the indentation? Would that help? The compiler unfortunately doesn't have easy access to the source, so we don't know how much of the line is indentation. This can make life a bit harder for other tools, although it can make it easier for reporting the exception, as the current traceback display removes indentation. On Fri, 7 May 2021 at 23:37, MRAB wrote: > On 2021-05-07 22:45, Pablo Galindo Salgado wrote: > > Hi there, > > > > We are preparing a PEP and we would like to start some early discussion > > about one of the main aspects of the PEP. > > > > The work we are preparing is to allow the interpreter to produce more > > fine-grained error messages, pointing to > > the source associated to the instructions that are failing. For example: > > > > Traceback (most recent call last): > > > >File "test.py", line 14, in > > > > lel3(x) > > > > ^^^ > > > >File "test.py", line 12, in lel3 > > > > return lel2(x) / 23 > > > > ^^^ > > > >File "test.py", line 9, in lel2 > > > > return 25 + lel(x) + lel(x) > > > > ^^ > > > >File "test.py", line 6, in lel > > > > return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) > > > > ^ > > > > TypeError: 'NoneType' object is not subscriptable > > > > > > The cost of this is having the start column number and end column number > > information for every bytecode instruction > > and this is what we want to discuss (there is also some stack cost to > > re-raise exceptions but that's not a big problem in > > any case). Given that column numbers are not very big compared with line > > numbers, we plan to store these as unsigned chars > > or unsigned shorts. We ran some experiments over the standard library > > and we found that the overhead of all pyc files is: > > > > * If we use shorts, the total overhead is ~3% (total size 28MB and the > > extra size is 0.88 MB). > > * If we use chars. the total overhead is ~1.5% (total size 28 MB and the > > extra size is 0.44MB). > > > > One of the disadvantages of using chars is that we can only report > > columns from 1 to 255 so if an error happens in a column > > bigger than that then we would have to exclude it (and not show the > > highlighting) for that frame. Unsigned short will allow > > the values to go from 0 to 65535. > > > [snip]How common are lines are longer than 255 characters, anyway? > > One thought: could the stored column position not include the > indentation? Would that help? > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OKQYNAI2B2BRCFMYJPLYPG2HHHUB5QR6/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
Thanks, Irit, for your comment! > Is it really every instruction? Or only those that can raise exceptions? Technically only the ones that can raise exceptions, but the majority can, and restricting this to the set that can raise exceptions has the danger that the mapping needs to be maintained for new instructions, and that if some instruction starts raising exceptions when it didn't before, subtle bugs can be introduced. On the other hand, I think the stronger argument for doing this on every instruction is that there are a lot of tools that can find this information quite useful, such as coverage tools, profilers, state inspection tools and more. For example, a coverage tool will be able to tell you what part of x = f(x) if g(x) else y(x) was actually executed, while currently it will highlight the full line. Although in this case these instructions can raise exceptions and would be covered, the distinction is different and the two criteria could lead to different subsets. In short, that may be an optimization, but I would prefer to avoid that complexity, taking into account the other problems that can arise and the extra complication. On Fri, 7 May 2021 at 23:21, Irit Katriel wrote: > > > On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado > wrote: > >> >> The cost of this is having the start column number and end column number >> information for every bytecode instruction >> > > > Is it really every instruction? Or only those that can raise exceptions? > > > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/W72ZOEIRWXSIY5OCGTBRSGHHKDXGZL6Z/ Code of Conduct: http://python.org/psf/codeofconduct/
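For a sense of what per-instruction spans buy tools: this proposal eventually shipped as PEP 657 in Python 3.11, where every instruction reports its source span, so on 3.11+ Pablo's coverage example can be inspected directly (f, g and y are only compiled here, never called):

    import dis

    code = compile("x = f(x) if g(x) else y(x)", "<demo>", "exec")
    for instr in dis.get_instructions(code):
        # positions is (lineno, end_lineno, col_offset, end_col_offset)
        print(instr.opname, instr.positions)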
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
> haha, true... Does our parser even have a maximum line length? (I'm not suggesting being unlimited or matching that if huge, 64k is already ridiculous) We use Py_ssize_t in some places, but at the end of the day lines and columns have a limit of INT_MAX, IIRC. On Fri, 7 May 2021 at 23:35, Gregory P. Smith wrote: > > On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado > wrote: > >> Thanks a lot Gregory for the comments! >> >> An additional cost to this is things that parse text tracebacks not >>> knowing how to handle it and things that log tracebacks >>> generating additional output. >> >> We should provide a way for people to disable the feature on a process as >>> part of this while they address tooling and logging issues. (via the usual >>> set of command line flag + python env var + runtime API) >> >> >> Absolutely! We were thinking about that and that's easy enough as that is >> a single conditional on the display function + the extra init configuration. >> >> Neither of those is large. While I'd lean towards uint8_t instead of >>> uint16_t because not even humans can understand a 255 character line so why >>> bother being pretty about such a thing... Just document the caveat and move >>> on with the lower value. A future pyc format could change it if a >>> compelling argument were ever found. >> >> >> I very much agree with you here but is worth noting that I have heard the >> counter-argument that the longer the line is, the more important may be to >> distinguish what part of the line is wrong. >> > > haha, true... Does our parser even have a maximum line length? (I'm not > suggesting being unlimited or matching that if huge, 64k is already > ridiculous) > > >> >> A compromise if you want to handle longer lines: A single uint16_t. >>> Represent the start column in the 9 bits and width in the other 7 bits. (or >>> any variations thereof) it's all a matter of what tradeoff you want to >>> make for space reasons. encoding as start + width instead of start + end >>> is likely better anyways if you care about compression as the width byte >>> will usually be small and thus be friendlier to compression. I'd >>> personally ignore compression entirely. >> >> >> I would personally prefer not to implement very tricky compression >> algorithms because tools may need to parse this and I don't want to >> complicate the logic a lot. Handling lnotab is already a bit painful and >> when bugs ocur it makes debugging very tricky. Having the possibility to >> index something based on the index of the instruction is quite a good API >> in my opinion. >> >> Overall doing this is going to be a big win for developer productivity! >> >> >> Thanks! We think that this has a lot of potential indeed! :) >> >> Pablo >> >> >> ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TNTBLEMSQ7JKZ2I75WJZSQHYIB6NSXCS/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
This is actually a very good point. The only disadvantage is that it complicates the parsing a bit and we lose the possibility of indexing the table by instruction offset. On Fri, 7 May 2021 at 23:01, Larry Hastings wrote: > On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote: > > Given that column numbers are not very big compared with line numbers, we > plan to store these as unsigned chars > or unsigned shorts. We ran some experiments over the standard library and > we found that the overhead of all pyc files is: > > * If we use shorts, the total overhead is ~3% (total size 28MB and the > extra size is 0.88 MB). > * If we use chars. the total overhead is ~1.5% (total size 28 MB and the > extra size is 0.44MB). > > One of the disadvantages of using chars is that we can only report columns > from 1 to 255 so if an error happens in a column > bigger than that then we would have to exclude it (and not show the > highlighting) for that frame. Unsigned short will allow > the values to go from 0 to 65535. > > Are lnotab entries required to be a fixed size? If not: > > if column < 255: > lnotab.write_one_byte(column) > else: > lnotab.write_one_byte(255) > lnotab.write_two_bytes(column) > > > I might even write four bytes instead of two in the latter case, > > > */arry* > ___ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/ > Code of Conduct: http://python.org/psf/codeofconduct/ > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SJP2RIMEFEVWKBWOA2V2X4BMFGHHEZ5J/ Code of Conduct: http://python.org/psf/codeofconduct/
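Larry's pseudocode above, made concrete as a sketch over a bytearray (a hypothetical helper, writing two little-endian bytes after the escape marker):

    def write_column(table, column):
        # One byte for small columns; an escape byte (255) followed by
        # the full column in two bytes otherwise.
        if column < 255:
            table.append(column)
        else:
            table.append(255)
            table += column.to_bytes(2, "little")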
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado wrote: > Thanks a lot Gregory for the comments! > > An additional cost to this is things that parse text tracebacks not >> knowing how to handle it and things that log tracebacks >> generating additional output. > > We should provide a way for people to disable the feature on a process as >> part of this while they address tooling and logging issues. (via the usual >> set of command line flag + python env var + runtime API) > > > Absolutely! We were thinking about that and that's easy enough as that is > a single conditional on the display function + the extra init configuration. > > Neither of those is large. While I'd lean towards uint8_t instead of >> uint16_t because not even humans can understand a 255 character line so why >> bother being pretty about such a thing... Just document the caveat and move >> on with the lower value. A future pyc format could change it if a >> compelling argument were ever found. > > > I very much agree with you here but is worth noting that I have heard the > counter-argument that the longer the line is, the more important may be to > distinguish what part of the line is wrong. > haha, true... Does our parser even have a maximum line length? (I'm not suggesting being unlimited or matching that if huge, 64k is already ridiculous) > > A compromise if you want to handle longer lines: A single uint16_t. >> Represent the start column in the 9 bits and width in the other 7 bits. (or >> any variations thereof) it's all a matter of what tradeoff you want to >> make for space reasons. encoding as start + width instead of start + end >> is likely better anyways if you care about compression as the width byte >> will usually be small and thus be friendlier to compression. I'd >> personally ignore compression entirely. > > > I would personally prefer not to implement very tricky compression > algorithms because tools may need to parse this and I don't want to > complicate the logic a lot. Handling lnotab is already a bit painful and > when bugs ocur it makes debugging very tricky. Having the possibility to > index something based on the index of the instruction is quite a good API > in my opinion. > > Overall doing this is going to be a big win for developer productivity! > > > Thanks! We think that this has a lot of potential indeed! :) > > Pablo > > > ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E7OM3GA4GNMRXAXOFAIZCCNTBWFUJAEP/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On 2021-05-07 22:56, Larry Hastings wrote: On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote: Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is: * If we use shorts, the total overhead is ~3% (total size 28MB and the extra size is 0.88 MB). * If we use chars. the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44MB). One of the disadvantages of using chars is that we can only report columns from 1 to 255 so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned short will allow the values to go from 0 to 65535. Are lnotab entries required to be a fixed size? If not: if column < 255: lnotab.write_one_byte(column) else: lnotab.write_one_byte(255) lnotab.write_two_bytes(column) I might even write four bytes instead of two in the latter case, A slight improvement would be: if column < 255: lnotab.write_one_byte(column) else: lnotab.write_one_byte(255) lnotab.write_two_bytes(column - 255) ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UJYYDMXCM5TM7GOSIPMK7GOWNC25GL7W/ Code of Conduct: http://python.org/psf/codeofconduct/
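A matching decoder for MRAB's variant, assuming the encoder writes column - 255 after the escape byte (a hypothetical sketch; note that, as Pablo points out, variable-width entries lose O(1) indexing by instruction offset):

    def read_columns(table):
        i = 0
        while i < len(table):
            b = table[i]
            if b < 255:                 # common case: one byte per column
                yield b
                i += 1
            else:                       # escape: 255 + two-byte remainder
                yield 255 + int.from_bytes(table[i + 1:i + 3], "little")
                i += 3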
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On 2021-05-07 22:45, Pablo Galindo Salgado wrote: Hi there, We are preparing a PEP and we would like to start some early discussion about one of the main aspects of the PEP. The work we are preparing is to allow the interpreter to produce more fine-grained error messages, pointing to the source associated to the instructions that are failing. For example: Traceback (most recent call last): File "test.py", line 14, in lel3(x) ^^^ File "test.py", line 12, in lel3 return lel2(x) / 23 ^^^ File "test.py", line 9, in lel2 return 25 + lel(x) + lel(x) ^^ File "test.py", line 6, in lel return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e) ^ TypeError: 'NoneType' object is not subscriptable The cost of this is having the start column number and end column number information for every bytecode instruction and this is what we want to discuss (there is also some stack cost to re-raise exceptions but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is: * If we use shorts, the total overhead is ~3% (total size 28MB and the extra size is 0.88 MB). * If we use chars. the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44MB). One of the disadvantages of using chars is that we can only report columns from 1 to 255 so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned short will allow the values to go from 0 to 65535. [snip] How common are lines longer than 255 characters, anyway? One thought: could the stored column position not include the indentation? Would that help? ___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
Thanks a lot Gregory for the comments!

> An additional cost to this is things that parse text tracebacks not knowing how to handle it and things that log tracebacks generating additional output. We should provide a way for people to disable the feature on a process as part of this while they address tooling and logging issues. (via the usual set of command line flag + python env var + runtime API)

Absolutely! We were thinking about that, and that's easy enough, as it is a single conditional on the display function + the extra init configuration.

> Neither of those is large. While I'd lean towards uint8_t instead of uint16_t because not even humans can understand a 255 character line so why bother being pretty about such a thing... Just document the caveat and move on with the lower value. A future pyc format could change it if a compelling argument were ever found.

I very much agree with you here, but it is worth noting that I have heard the counter-argument that the longer the line is, the more important it may be to distinguish what part of the line is wrong.

> A compromise if you want to handle longer lines: A single uint16_t. Represent the start column in the 9 bits and width in the other 7 bits. (or any variations thereof) it's all a matter of what tradeoff you want to make for space reasons. encoding as start + width instead of start + end is likely better anyways if you care about compression as the width byte will usually be small and thus be friendlier to compression. I'd personally ignore compression entirely.

I would personally prefer not to implement very tricky compression algorithms, because tools may need to parse this and I don't want to complicate the logic a lot. Handling lnotab is already a bit painful, and when bugs occur it makes debugging very tricky. Having the possibility to index something based on the index of the instruction is quite a good API in my opinion.

> Overall doing this is going to be a big win for developer productivity!

Thanks! We think that this has a lot of potential indeed! :)

Pablo

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OHSQ6VLMVSZHCLEUQZ52NXCWEGLG2DQN/ Code of Conduct: http://python.org/psf/codeofconduct/
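As a rough illustration of how small that conditional could be, here is a hedged sketch of a display-time gate; the environment variable name is invented for the example and is not a real CPython setting:

    import os

    def fine_grained_locations_enabled() -> bool:
        # Hypothetical opt-out knob; CPython would pick its own
        # spelling for the flag / env var / runtime API.
        return os.environ.get("PYTHON_HYPOTHETICAL_NO_CARETS") is None

    def render_frame_line(source_line: str, start: int, end: int) -> str:
        out = "    " + source_line
        if fine_grained_locations_enabled():
            out += "\n    " + " " * start + "^" * max(end - start, 1)
        return out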
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 2:50 PM Pablo Galindo Salgado wrote:
> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more fine-grained error messages, pointing to the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>   File "test.py", line 14, in <module>
>     lel3(x)
>     ^^^
>   File "test.py", line 12, in lel3
>     return lel2(x) / 23
>            ^^^
>   File "test.py", line 9, in lel2
>     return 25 + lel(x) + lel(x)
>                 ^^
>   File "test.py", line 6, in lel
>     return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>                          ^
> TypeError: 'NoneType' object is not subscriptable

An additional cost to this is things that parse text tracebacks not knowing how to handle it, and things that log tracebacks generating additional output. We should provide a way for people to disable the feature on a process as part of this while they address tooling and logging issues (via the usual set of command line flag + python env var + runtime API).

> The cost of this is having the start column number and end column number information for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned shorts will allow the values to go from 0 to 65535.

Neither of those is large. I'd lean towards uint8_t instead of uint16_t, though, because not even humans can understand a 255-character line, so why bother being pretty about such a thing... Just document the caveat and move on with the lower value. A future pyc format could change it if a compelling argument were ever found.

> Unfortunately these numbers are not easily compressible, as every instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag or when using -O to allow users to opt out, but given the fact that these numbers can be quite useful to other tools like coverage measuring tools, tracers, profilers and the such, adding conditional logic to many places would complicate the implementation considerably and would potentially reduce the usability of those tools, so we prefer not to have the conditional logic. We believe this extra cost is very much worth the better error reporting, but we understand and respect other points of view.
>
> Does anyone see a better way to encode this information **without complicating the implementation a lot**? What are people's thoughts on the feature?

A compromise if you want to handle longer lines: a single uint16_t. Represent the start column in 9 bits and the width in the other 7 bits (or any variation thereof); it's all a matter of what tradeoff you want to make for space reasons. Encoding as start + width instead of start + end is likely better anyway if you care about compression, as the width byte will usually be small and thus be friendlier to compression. I'd personally ignore compression entirely.

Overall doing this is going to be a big win for developer productivity!

-Greg

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ULNDFY5CWVDELNPE6S4HY5SDAODOT7DC/ Code of Conduct: http://python.org/psf/codeofconduct/
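A quick sketch of the packing Greg describes: 9 bits of start column (0-511) and 7 bits of width (0-127) in one 16-bit value. The exact bit layout is, as he says, just one of the possible variations:

    def pack(start: int, width: int) -> int:
        # 9 bits of start column, 7 bits of width -> one uint16_t.
        assert 0 <= start < 512 and 0 <= width < 128
        return (start << 7) | width

    def unpack(value: int) -> tuple[int, int]:
        return value >> 7, value & 0x7F

    assert unpack(pack(300, 100)) == (300, 100)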
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado wrote:
> The cost of this is having the start column number and end column number information for every bytecode instruction

Is it really every instruction? Or only those that can raise exceptions?

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UL5XAR2MGYJPIKB67R56OJXUVKP2KM3H/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
Technically the main concern may be the size of the unmarshalled pyc files in memory, more than the storage size on disk.

On Fri, 7 May 2021, 23:04 Antoine Pitrou wrote:
> On Fri, 7 May 2021 22:45:38 +0100 Pablo Galindo Salgado wrote:
> > The cost of this is having the start column number and end column number information for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).
>
> More generally, if some people in 2021 are still concerned with the size of pyc files (why not), how about introducing a new version of the pyc format with built-in LZ4 compression?
>
> LZ4 decompression is extremely fast on modern CPUs (several GB/s) and vendoring the C library should be simple. https://github.com/lz4/lz4
>
> Regards
>
> Antoine.

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SNDIHPBDW4Y3KGSOEL7MBJER3IEBIFTN/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, May 7, 2021 at 3:01 PM Larry Hastings wrote:
> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> > Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).
> >
> > One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned shorts will allow the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size? If not:
>
>     if column < 255:
>         lnotab.write_one_byte(column)
>     else:
>         lnotab.write_one_byte(255)
>         lnotab.write_two_bytes(column)

If non-fixed size is acceptable, use utf-8 to encode the column number as a single-codepoint number into bytes and you don't even need to write your own encode/decode logic for a varint.

-gps

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QNWOZWTNFAVPD77KNG4LRYWCEDY3F6HX/ Code of Conduct: http://python.org/psf/codeofconduct/
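To illustrate the trick: any column number below 0x110000 (excluding the surrogate range, which UTF-8 refuses to encode) round-trips through a one-character string as a self-delimiting 1-4 byte sequence, and the first byte alone tells the decoder how many bytes follow. A small sketch, not from any actual patch:

    def encode_column(column: int) -> bytes:
        # 1 byte for 0..127, up to 4 bytes for large values.
        return chr(column).encode("utf-8")

    def decode_column(data: bytes, offset: int) -> tuple[int, int]:
        """Return (column, offset of the next entry)."""
        first = data[offset]
        length = 1 if first < 0x80 else 2 if first < 0xE0 else 3 if first < 0xF0 else 4
        char = data[offset:offset + length].decode("utf-8")
        return ord(char), offset + length

    assert decode_column(encode_column(300), 0) == (300, 2)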
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On Fri, 7 May 2021 22:45:38 +0100 Pablo Galindo Salgado wrote:
> The cost of this is having the start column number and end column number information for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).

More generally, if some people in 2021 are still concerned with the size of pyc files (why not), how about introducing a new version of the pyc format with built-in LZ4 compression?

LZ4 decompression is extremely fast on modern CPUs (several GB/s) and vendoring the C library should be simple. https://github.com/lz4/lz4

Regards

Antoine.

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PQZ6OTWG6K6W65YXRLKEH7UOD5FM24TN/ Code of Conduct: http://python.org/psf/codeofconduct/
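The shape of the idea, with the stdlib's zlib standing in for LZ4 only because it needs no third-party dependency; a real pyc change would vendor the LZ4 C library as suggested above:

    import marshal
    import zlib

    def dump_code(code) -> bytes:
        # Serialize the code object as today, then compress.
        # Level 1 trades ratio for speed, in the spirit of LZ4.
        return zlib.compress(marshal.dumps(code), 1)

    def load_code(blob: bytes):
        return marshal.loads(zlib.decompress(blob))

    # Round-trip a small module:
    code = compile("x = 1\nprint(x)", "<demo>", "exec")
    assert load_code(dump_code(code)).co_consts == code.co_consts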
[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks
On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
> Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).
>
> One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned shorts will allow the values to go from 0 to 65535.

Are lnotab entries required to be a fixed size? If not:

    if column < 255:
        lnotab.write_one_byte(column)
    else:
        lnotab.write_one_byte(255)
        lnotab.write_two_bytes(column)

I might even write four bytes instead of two in the latter case.

//arry/

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-Dev] Future PEP: Include Fine Grained Error Locations in Tracebacks
Hi there,

We are preparing a PEP and we would like to start some early discussion about one of the main aspects of the PEP.

The work we are preparing is to allow the interpreter to produce more fine-grained error messages, pointing to the source associated to the instructions that are failing. For example:

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    lel3(x)
    ^^^
  File "test.py", line 12, in lel3
    return lel2(x) / 23
           ^^^
  File "test.py", line 9, in lel2
    return 25 + lel(x) + lel(x)
                ^^
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ^
TypeError: 'NoneType' object is not subscriptable

The cost of this is having the start column number and end column number information for every bytecode instruction, and this is what we want to discuss (there is also some stack cost to re-raise exceptions but that's not a big problem in any case). Given that column numbers are not very big compared with line numbers, we plan to store these as unsigned chars or unsigned shorts. We ran some experiments over the standard library and we found that the overhead of all pyc files is:

* If we use shorts, the total overhead is ~3% (total size 28 MB and the extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and the extra size is 0.44 MB).

One of the disadvantages of using chars is that we can only report columns from 1 to 255, so if an error happens in a column bigger than that then we would have to exclude it (and not show the highlighting) for that frame. Unsigned shorts will allow the values to go from 0 to 65535.

Unfortunately these numbers are not easily compressible, as every instruction would have very different offsets.

There is also the possibility of not doing this based on some build flag or when using -O to allow users to opt out, but given the fact that these numbers can be quite useful to other tools like coverage measuring tools, tracers, profilers and the such, adding conditional logic to many places would complicate the implementation considerably and would potentially reduce the usability of those tools, so we prefer not to have the conditional logic. We believe this extra cost is very much worth the better error reporting, but we understand and respect other points of view.

Does anyone see a better way to encode this information **without complicating the implementation a lot**? What are people's thoughts on the feature?

Thanks in advance,

Regards from cloudy London,
Pablo Galindo Salgado

___ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DB3RTYBF2BXTY6ZHP3Z4DXCRWPJIQUFD/ Code of Conduct: http://python.org/psf/codeofconduct/
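For a sense of what the fixed-width variant looks like to tools, here is a sketch of a flat table with one (start, end) pair of unsigned shorts per bytecode instruction, indexed by instruction number; the names and layout are illustrative, not the PEP's actual format:

    import struct

    def build_column_table(columns):
        # columns: one (start_col, end_col) pair per instruction.
        return b"".join(struct.pack("<HH", start, end) for start, end in columns)

    def lookup_columns(table: bytes, instruction_index: int):
        # 4 bytes per instruction: two little-endian unsigned shorts.
        return struct.unpack_from("<HH", table, instruction_index * 4)

    table = build_column_table([(0, 7), (11, 54), (21, 48)])
    assert lookup_columns(table, 1) == (11, 54)

Constant-time lookup by instruction index is what makes a fixed-width table friendlier to tools than a compressed or variable-width stream.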
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2021-04-30 - 2021-05-07)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issue counts and deltas:
  open    7428 (-25)
  closed 48377 (+104)
  total  55805 (+79)

Open issues with patches: 2946

Issues opened (48)
==================

#40943: PEP 353: Drop support for PyArg_ParseTuple() "#" formats when
        https://bugs.python.org/issue40943  reopened by methane

#43001: python3.8.9, python3.9.2 test_embed test_tabnanny failed
        https://bugs.python.org/issue43001  reopened by asholomitskiy84

#43176: Dataclasses derived from empty frozen bases skip immutability
        https://bugs.python.org/issue43176  reopened by eric.smith

#43882: [security] urllib.parse should sanitize urls containing ASCII
        https://bugs.python.org/issue43882  reopened by gregory.p.smith

#43992: Unable to get external dependencies for CPython on Ubuntu Linu
        https://bugs.python.org/issue43992  opened by shreyanavigyan

#43994: change representation of match as / capture as `Name(..., ctx=
        https://bugs.python.org/issue43994  opened by Anthony Sottile

#44002: Use functools.lru_cache in urllib.parse instead of 1996 custom
        https://bugs.python.org/issue44002  opened by gregory.p.smith

#44005: multiple socket bind failure on Mac OS X with SO_REUSEADDR
        https://bugs.python.org/issue44005  opened by giangipy

#44010: IDLE: highlight soft keywords
        https://bugs.python.org/issue44010  opened by epaine

#44011: Borrow asyncio ssl implementation from uvloop
        https://bugs.python.org/issue44011  opened by asvetlov

#44012: IPv6Address.exploded does not support interface name (scope id
        https://bugs.python.org/issue44012  opened by ohwgiles

#44013: tempfile.TemporaryFile: name of file descriptor cannot be reus
        https://bugs.python.org/issue44013  opened by zhongxiang117

#44016: Enum related deprecation warnings in test_httpservers and test
        https://bugs.python.org/issue44016  opened by xtreak

#44019: operator.call/operator.__call__
        https://bugs.python.org/issue44019  opened by Antony.Lee

#44021: enum docs in 3.10: missing "New in version 3.10"
        https://bugs.python.org/issue44021  opened by Akuli

#44023: "tarfile" library will lead to "write any content to any file
        https://bugs.python.org/issue44023  opened by leveryd

#44024: Improve the TypeError message for non-string second arguments
        https://bugs.python.org/issue44024  opened by maggyero

#44025: Match doc: Clarify '_' as a soft keyword
        https://bugs.python.org/issue44025  opened by terry.reedy

#44026: IDLE: print "Did you mean?" for AttributeError and NameError
        https://bugs.python.org/issue44026  opened by Dennis Sweeney

#44028: Request for locals().update() to work, it is
        https://bugs.python.org/issue44028  opened by xuancong84

#44030: Markup with_traceback code example
        https://bugs.python.org/issue44030  opened by terry.reedy

#44031: python3.8.9, python3.9.2 test_embed test_tabnanny failed
        https://bugs.python.org/issue44031  opened by asholomitskiy84

#44032: Function locals and evaluation stack should be stored in a con
        https://bugs.python.org/issue44032  opened by Mark.Shannon

#44035: Regenerating the configure script fails even if dependencies a
        https://bugs.python.org/issue44035  opened by pablogsal

#44036: asyncio SSL server can be DOSed, event loop gets blocked: busy
        https://bugs.python.org/issue44036  opened by ghost43

#44037: Broad performance regression from 3.10a7 to 3.10b1 with python
        https://bugs.python.org/issue44037  opened by rhettinger

#44038: In documentation Section 8.6, the definition of parameter_list
        https://bugs.python.org/issue44038  opened by webbnh

#44041: [sqlite3] optimisation: only call sqlite3_column_count when ne
        https://bugs.python.org/issue44041  opened by erlendaasland

#44042: [sqlite3] _pysqlite_connection_begin() optimisations
        https://bugs.python.org/issue44042  opened by erlendaasland

#44043: 3.10 b1 armhf Bus Error in hashlib test: test_gil
        https://bugs.python.org/issue44043  opened by Anthony Sottile

#44044: ConfigParser: cannot link to ConfigParser.optionxform(option)
        https://bugs.python.org/issue44044  opened by jugmac00

#44045: canonicalize "upper-case" -> "uppercase"; "lower-case" -> "low
        https://bugs.python.org/issue44045  opened by jugmac00

#44048: test_hashlib failure for "AMD64 RHEL8 FIPS Only Blake2 Builtin
        https://bugs.python.org/issue44048  opened by cstratak

#44050: Exceptions in a subinterpreter are changed by another subinter
        https://bugs.python.org/issue44050  opened by trygveaa

#44052: patch object as argument should be explicit
        https://bugs.python.org/issue44052  opened by CendioOssman

#44053: Can't connect to a server also not showing any type of output
        https://bugs.python.org/issue44053  opened by muqadasrasheed652

#44055: NamedTemporaryFile opened twice on Windows
        https://bugs.python.org/issue44055  opened by frenzy

#44057: Inconsitencies in `__init_subclass__` in a generic class
        https://bugs.python.org