[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Nathaniel Smith
On Fri, May 7, 2021 at 8:14 PM Neil Schemenauer  wrote:
>
> On 2021-05-07, Pablo Galindo Salgado wrote:
> > Technically the main concern may be the size of the unmarshalled
> > pyc files in memory, more than the storage size of disk.
>
> It would be cool if we could mmap the pyc files and have the VM run
> code without an unmarshal step.  One idea is something similar to
> the Facebook "not another freeze" PR but with a twist.  Their
> approach was to dump out code objects so they could be loaded as if
> they were statically defined structures.
>
> Instead, could we dump out the pyc data in a format similar to Cap'n
> Proto?  That way no unmarshal is needed.  The VM would have to be
> extensively changed to run code in that format.  That's the hard
> part.
>
> The benefit would be faster startup times.  The unmarshal step is
> costly.  It would mostly solve the concern about these larger
> linenum/colnum tables.  We would only load that data into memory if
> the table is accessed.

A simpler version would be to pack just the docstrings/lnotab/column
numbers into a separate part of the .pyc, and store a reference to the
file + offset to load them lazily on demand. No need for mmap.

Could also store them in memory, but with some cheap compression
applied, and decompress on access. None of these get accessed often.
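The compress-in-memory, decompress-on-access idea can be sketched like this
(a hypothetical illustration using zlib; the class name and layout are
invented for this example and are not anything in CPython):

```python
import zlib

class LazyColumnTable:
    """Hypothetical sketch: keep the per-instruction column table
    compressed in memory and decompress it only when a traceback
    actually needs it."""

    def __init__(self, raw_table: bytes):
        # level=1: cheap compression, since this path runs at import time
        self._compressed = zlib.compress(raw_table, level=1)
        self._raw = None  # decompressed lazily

    @property
    def table(self) -> bytes:
        if self._raw is None:  # decompress on first access only
            self._raw = zlib.decompress(self._compressed)
        return self._raw

table = LazyColumnTable(bytes(range(256)) * 16)
assert table.table == bytes(range(256)) * 16
```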

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Q2DBRE5YKLTSPVCMUCXPEDXKFCA4UUGQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Speeding up CPython

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 6:51 PM Steven D'Aprano  wrote:

> On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote:
> > Hi everyone,
> >
> > CPython is slow. We all know that, yet little is done to fix it.
> >
> > I'd like to change that.
> > I have a plan to speed up CPython by a factor of five over the next few
> > years. But it needs funding.
>
> I've noticed a lot of optimization-related b.p.o. issues created by
> Mark, which is great. What happened with Mark's proposal here? Did the
> funding issue get sorted?
>

I believe Guido has Mark contracting on Python performance through
Microsoft?

-Greg
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5RTMMJLGZE5FHW3SAYWSKUYOLEUZ2RFX/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Neil Schemenauer
On 2021-05-07, Pablo Galindo Salgado wrote:
> Technically the main concern may be the size of the unmarshalled
> pyc files in memory, more than the storage size of disk.

It would be cool if we could mmap the pyc files and have the VM run
code without an unmarshal step.  One idea is something similar to
the Facebook "not another freeze" PR but with a twist.  Their
approach was to dump out code objects so they could be loaded as if
they were statically defined structures.

Instead, could we dump out the pyc data in a format similar to Cap'n
Proto?  That way no unmarshal is needed.  The VM would have to be
extensively changed to run code in that format.  That's the hard
part.

The benefit would be faster startup times.  The unmarshal step is
costly.  It would mostly solve the concern about these larger
linenum/colnum tables.  We would only load that data into memory if
the table is accessed.
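The core of the mmap idea above can be sketched in a few lines: map a file
and read records directly out of the mapping, with no up-front
deserialization pass, so untouched pages (e.g. a line/column table) are
only faulted in when accessed. This is an illustration of the mechanism
only; an unmarshal-free pyc format would need far more machinery in the VM:

```python
import mmap
import os
import tempfile

# Write a toy "pyc-like" file: a 16-byte header plus a lazily needed table.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 16 + b"lazy-loaded line table")
    path = f.name

with open(path, "rb") as f, \
        mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    header = m[:16]        # pages are faulted in on demand by the OS
    table = bytes(m[16:])  # only read when actually needed

os.unlink(path)
assert table == b"lazy-loaded line table"
```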
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UKDLCOTUFNWGSMWWGLH3DJC4AVYZANDM/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Although we were originally not sympathetic to it, we may need to offer
an opt-out mechanism for those users who care about the impact of the
overhead of the new data in pyc files and in in-memory code objects, as
was suggested by some folks (Thomas, Yury, and others). For this, we
could propose that the functionality, along with the extra information,
be deactivated when Python is executed in optimized mode (``python -O``),
so that the pyc files generated in this mode will not have the overhead
associated with the extra required data. Notice that Python already
strips docstrings in this mode, so it would be "aligned" with the
current mechanism of optimized mode.

Although this complicates the implementation, it is certainly still much
easier than dealing with compression (and more useful for those who don't
want the feature). Notice that we also expect pessimistic results from
compression, as the offsets would be quite random (although predominantly
in the range 10-120).
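For reference, the existing stripping behaviour can be observed through the
``optimize`` parameter of the built-in ``compile()`` (docstrings are dropped
at optimization level 2, i.e. ``python -OO``); gating the column table could
follow the same pattern. A minimal, purely illustrative demonstration:

```python
# Existing behaviour of optimized mode, shown via compile()'s optimize
# parameter: level 2 (python -OO) already drops docstrings, so gating the
# new column table on an optimization level would follow the same pattern.
src = "def f():\n    'a docstring'\n"

ns0, ns2 = {}, {}
exec(compile(src, "<demo>", "exec", optimize=0), ns0)
exec(compile(src, "<demo>", "exec", optimize=2), ns2)

assert ns0["f"].__doc__ == "a docstring"  # kept at level 0
assert ns2["f"].__doc__ is None           # stripped at level 2 (-OO)
```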

On Sat, 8 May 2021 at 01:56, Pablo Galindo Salgado 
wrote:

> One last note for clarity: that's the increase of size in the stdlib, the
> increase of size
> for pyc files goes from 28.471296MB to 34.750464MB, which is an increase
> of 22%.
>
> On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
> wrote:
>
>> Some update on the numbers. We have made some draft implementation to
>> corroborate the
>> numbers with some more realistic tests and seems that our original
>> calculations were wrong.
>> The actual increase in size is quite bigger than previously advertised:
>>
>> Using bytes object to encode the final object and marshalling that to
>> disk (so using uint8_t) as the underlying
>> type:
>>
>> BEFORE:
>>
>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>> ❯ du -h Lib -c --max-depth=0
>> 70M Lib
>> 70M total
>>
>> AFTER:
>> ❯ ./python -m compileall -r 1000 Lib > /dev/null
>> ❯ du -h Lib -c --max-depth=0
>> 76M Lib
>> 76M total
>>
>> So that's an increase of 8.56 % over the original value. This is storing
>> the start offset and end offset with no compression
>> whatsoever.
>>
>> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
>> wrote:
>>
>>> Hi there,
>>>
>>> We are preparing a PEP and we would like to start some early discussion
>>> about one of the main aspects of the PEP.
>>>
>>> The work we are preparing is to allow the interpreter to produce more
>>> fine-grained error messages, pointing to
>>> the source associated to the instructions that are failing. For example:
>>>
>>> Traceback (most recent call last):
>>>
>>>   File "test.py", line 14, in 
>>>
>>> lel3(x)
>>>
>>> ^^^
>>>
>>>   File "test.py", line 12, in lel3
>>>
>>> return lel2(x) / 23
>>>
>>>^^^
>>>
>>>   File "test.py", line 9, in lel2
>>>
>>> return 25 + lel(x) + lel(x)
>>>
>>> ^^
>>>
>>>   File "test.py", line 6, in lel
>>>
>>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>>
>>>  ^
>>>
>>> TypeError: 'NoneType' object is not subscriptable
>>>
>>> The cost of this is having the start column number and end column number
>>> information for every bytecode instruction
>>> and this is what we want to discuss (there is also some stack cost to
>>> re-raise exceptions but that's not a big problem in
>>> any case). Given that column numbers are not very big compared with line
>>> numbers, we plan to store these as unsigned chars
>>> or unsigned shorts. We ran some experiments over the standard library
>>> and we found that the overhead of all pyc files is:
>>>
>>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>>> extra size is 0.88 MB).
>>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>>> extra size is 0.44MB).
>>>
>>> One of the disadvantages of using chars is that we can only report
>>> columns from 1 to 255 so if an error happens in a column
>>> bigger than that then we would have to exclude it (and not show the
>>> highlighting) for that frame. Unsigned short will allow
>>> the values to go from 0 to 65535.
>>>
>>> Unfortunately these numbers are not easily compressible, as every
>>> instruction would have very different offsets.
>>>
>>> There is also the possibility of gating this behind a build flag or
>>> behind -O to allow users to opt out, but given that these numbers can be
>>> quite useful to other tools such as coverage tools, tracers, and
>>> profilers, adding conditional logic in many places would complicate the
>>> implementation considerably and would potentially reduce the usability
>>> of those tools, so we prefer not to have the conditional logic. We
>>> believe this extra cost is very much worth the better error reporting,
>>> but we understand and respect other points of view.
>>>
>>> Does anyone see a better way to encode this information **without
>>> complicating the implementation a lot**? What are people's thoughts on
>>> the feature?
>>>
>>> Thanks in advance,
>>>
>>> Regards from cloudy London,
>>> Pablo Galindo Salgado
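The unsigned-char encoding discussed in the message above can be sketched as
follows. The function, the sentinel name, and the overflow policy are all
hypothetical, invented for illustration; they show only the size trade-off
between chars and shorts and the "no highlighting above column 255" caveat:

```python
from array import array

NO_COLUMN = 255  # hypothetical sentinel: span does not fit in a byte

def pack_columns(spans):
    """Pack (start, end) column pairs into one unsigned-char array.
    Pairs that don't fit in a byte are stored as NO_COLUMN, meaning
    'no highlighting for this instruction'. Illustrative only."""
    packed = array("B")
    for start, end in spans:
        if start >= NO_COLUMN or end >= NO_COLUMN:
            packed.extend((NO_COLUMN, NO_COLUMN))
        else:
            packed.extend((start, end))
    return packed

cols = pack_columns([(4, 11), (0, 300)])
assert list(cols) == [4, 11, 255, 255]
assert cols.itemsize == 1  # one byte per offset, vs two for unsigned shorts
```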

[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Chris Jerdonek
On Fri, May 7, 2021 at 6:39 PM Steven D'Aprano  wrote:

> On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote:
>
> > To know what compression methods might be effective, I’m wondering if it
> > could be useful to see separate histograms of, say, the start column
> number
> > and width over the code base. Or for people that really want to dig in,
> > maybe access to the set of all pairs could help. (E.g. maybe a histogram
> of
> > pairs could also reveal something.)
>
> I think this is over-analysing. Do we need to micro-optimize the
> compression algorithm? Let's make the choice simple: live with the size
> increase, or swap to LZ4 compression as Antoine suggested. Analysis
> paralysis is a real risk here.
>
> If there are implementations which cannot support either (MicroPython?)
> they should be free to continue doing things the old way. In other
> words, "fine grained error messages" should be a quality of
> implementation feature rather than a language guarantee.
>
> I understand that the plan is to make this feature optional in any case,
> to allow third-party tools to catch up.
>
> If people really want to do that histogram analysis so that they can
> optimize the choice of compression algorithm, of course they are free to
> do so. But the PEP authors should not feel that they are obliged to do
> so, and we should avoid the temptation to bikeshed over compressors.
>

I'm not sure why you're sounding so negative. Pablo asked for ideas in his
first message to the list:

On Fri, May 7, 2021 at 2:53 PM Pablo Galindo Salgado 
wrote:

> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**?
>

Maybe a large gain can be made with a simple tweak to how the pair is
encoded, but there's no way to know without seeing the distribution. Also,
my reply wasn't about the pyc files on disk but about their representation
in memory, which Pablo later said may be the main concern. So it's not
compression algorithms like LZ4 so much as a method of encoding.

--Chris


>
> (For what it's worth, I like this proposed feature, I don't care about a
> 20-25% increase in pyc file size, but if this leads to adding LZ4
> compression to the stdlib, I like it even more :-)
>
>
> --
> Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UYARCZJJFIEKRWMEEBW2FAGBPAPDFJGG/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> I'm wondering if it's possible to compromise with one position that's
> not as complete but still gives a good hint:

Even if it is possible, it would be quite a bit less useful (a lot of
users wanted full ranges highlighted for syntax errors, and that change
was very well received when we announced it in 3.10) and, most
importantly, it would render the feature much less useful for other
tools such as profilers, coverage tools, and the like.

It would also make the feature less useful for people who want to
display even more information, such as error reporting tools, IDEs, etc.


On Sat, 8 May 2021 at 02:41, MRAB  wrote:

> On 2021-05-08 01:43, Pablo Galindo Salgado wrote:
> > Some update on the numbers. We have made some draft implementation to
> > corroborate the
> > numbers with some more realistic tests and seems that our original
> > calculations were wrong.
> > The actual increase in size is quite bigger than previously advertised:
> >
> > Using bytes object to encode the final object and marshalling that to
> > disk (so using uint8_t) as the underlying
> > type:
> >
> > BEFORE:
> >
> > ❯ ./python -m compileall -r 1000 Lib > /dev/null
> > ❯ du -h Lib -c --max-depth=0
> > 70M Lib
> > 70M total
> >
> > AFTER:
> > ❯ ./python -m compileall -r 1000 Lib > /dev/null
> > ❯ du -h Lib -c --max-depth=0
> > 76M Lib
> > 76M total
> >
> > So that's an increase of 8.56 % over the original value. This is storing
> > the start offset and end offset with no compression
> > whatsoever.
> >
> [snip]
>
> I'm wondering if it's possible to compromise with one position that's
> not as complete but still gives a good hint:
>
> For example:
>
>File "test.py", line 6, in lel
>  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>^
>
> TypeError: 'NoneType' object is not subscriptable
>
> That at least tells you which subscript raised the exception.
>
>
> Another example:
>
>Traceback (most recent call last):
>  File "test.py", line 4, in 
>print(1 / x + 1 / y)
>^
>ZeroDivisionError: division by zero
>
> as distinct from:
>
>Traceback (most recent call last):
>  File "test.py", line 4, in 
>print(1 / x + 1 / y)
>^
>ZeroDivisionError: division by zero
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7OKDSXZZ7TQFQ3X4RZGNGLX5UDF2B5QW/


[Python-Dev] Re: Speeding up CPython

2021-05-07 Thread Steven D'Aprano
On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote:
> Hi everyone,
> 
> CPython is slow. We all know that, yet little is done to fix it.
> 
> I'd like to change that.
> I have a plan to speed up CPython by a factor of five over the next few 
> years. But it needs funding.

I've noticed a lot of optimization-related b.p.o. issues created by 
Mark, which is great. What happened with Mark's proposal here? Did the 
funding issue get sorted?


-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6ZVJ5ZJC2BXUUSI3JTGOU4MQXQHORI4Q/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-08 01:43, Pablo Galindo Salgado wrote:
> Some update on the numbers. We have made some draft implementation to
> corroborate the
> numbers with some more realistic tests and seems that our original
> calculations were wrong.
>
> The actual increase in size is quite bigger than previously advertised:
>
> Using bytes object to encode the final object and marshalling that to
> disk (so using uint8_t) as the underlying
> type:
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M     Lib
> 70M     total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M     Lib
> 76M     total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression
> whatsoever.

[snip]

I'm wondering if it's possible to compromise with one position that's 
not as complete but still gives a good hint:


For example:

  File "test.py", line 6, in lel
return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
  ^

TypeError: 'NoneType' object is not subscriptable

That at least tells you which subscript raised the exception.


Another example:

  Traceback (most recent call last):
File "test.py", line 4, in 
  print(1 / x + 1 / y)
  ^
  ZeroDivisionError: division by zero

as distinct from:

  Traceback (most recent call last):
File "test.py", line 4, in 
  print(1 / x + 1 / y)
  ^
  ZeroDivisionError: division by zero
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4RGQALI6T6HBNRDUUEYX4FA2YKTZDBNA/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Steven D'Aprano
On Fri, May 07, 2021 at 06:02:51PM -0700, Chris Jerdonek wrote:

> To know what compression methods might be effective, I’m wondering if it
> could be useful to see separate histograms of, say, the start column number
> and width over the code base. Or for people that really want to dig in,
> maybe access to the set of all pairs could help. (E.g. maybe a histogram of
> pairs could also reveal something.)

I think this is over-analysing. Do we need to micro-optimize the 
compression algorithm? Let's make the choice simple: live with the size 
increase, or swap to LZ4 compression as Antoine suggested. Analysis 
paralysis is a real risk here.

If there are implementations which cannot support either (MicroPython?) 
they should be free to continue doing things the old way. In other 
words, "fine grained error messages" should be a quality of 
implementation feature rather than a language guarantee.

I understand that the plan is to make this feature optional in any case, 
to allow third-party tools to catch up.

If people really want to do that histogram analysis so that they can 
optimize the choice of compression algorithm, of course they are free to 
do so. But the PEP authors should not feel that they are obliged to do 
so, and we should avoid the temptation to bikeshed over compressors.

(For what it's worth, I like this proposed feature, I don't care about a 
20-25% increase in pyc file size, but if this leads to adding LZ4 
compression to the stdlib, I like it even more :-)


-- 
Steve
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6H2XSRMARU4SX4WRMIO2M4MI4EQASPBC/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Chris Jerdonek
On Fri, May 7, 2021 at 5:44 PM Pablo Galindo Salgado 
wrote:

> Some update on the numbers. We have made some draft implementation to
> corroborate the
> numbers with some more realistic tests and seems that our original
> calculations were wrong.
> The actual increase in size is quite bigger than previously advertised:
>
> Using bytes object to encode the final object and marshalling that to disk
> (so using uint8_t) as the underlying
> type:
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression
> whatsoever.
>

To know what compression methods might be effective, I’m wondering if it
could be useful to see separate histograms of, say, the start column number
and width over the code base. Or for people that really want to dig in,
maybe access to the set of all pairs could help. (E.g. maybe a histogram of
pairs could also reveal something.)
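A rough way to gather such histograms today, using AST node positions as a
stand-in for the eventual per-instruction offsets (the function is an
illustrative sketch; a real study would walk the compiler's own position
tables rather than the AST):

```python
import ast
from collections import Counter

def column_histograms(source: str):
    """Histogram the start column and span width of every positioned
    AST node in a piece of source. Illustrative only."""
    starts, widths = Counter(), Counter()
    for node in ast.walk(ast.parse(source)):
        if hasattr(node, "col_offset") and node.end_col_offset is not None:
            starts[node.col_offset] += 1
            widths[node.end_col_offset - node.col_offset] += 1
    return starts, widths

starts, widths = column_histograms("x = 1\ny = x + 2\n")
assert starts.most_common(1)[0][0] == 0  # most nodes start at column 0 here
```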

—Chris



> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated to the instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in 
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>
>> return lel2(x) / 23
>>
>>^^^
>>
>>   File "test.py", line 9, in lel2
>>
>> return 25 + lel(x) + lel(x)
>>
>> ^^
>>
>>   File "test.py", line 6, in lel
>>
>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>
>>  ^
>>
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>> and this is what we want to discuss (there is also some stack cost to
>> re-raise exceptions but that's not a big problem in
>> any case). Given that column numbers are not very big compared with line
>> numbers, we plan to store these as unsigned chars
>> or unsigned shorts. We ran some experiments over the standard library and
>> we found that the overhead of all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255 so if an error happens in a column
>> bigger than that then we would have to exclude it (and not show the
>> highlighting) for that frame. Unsigned short will allow
>> the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of gating this behind a build flag or
>> behind -O to allow users to opt out, but given that these numbers can be
>> quite useful to other tools such as coverage tools, tracers, and
>> profilers, adding conditional logic in many places would complicate the
>> implementation considerably and would potentially reduce the usability
>> of those tools, so we prefer not to have the conditional logic. We
>> believe this extra cost is very much worth the better error reporting,
>> but we understand and respect other points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZTNJHADASSERV65FSVVYWNL6JF65CYQK/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
One last note for clarity: that is the increase in size of the stdlib as
a whole; the increase for the pyc files alone goes from 28.471296 MB to
34.750464 MB, which is an increase of 22%.
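For the record, both percentages can be cross-checked directly (the 8.56%
figure in the earlier message presumably comes from exact byte counts;
du's rounded 70M/76M gives approximately 8.57%):

```python
# Cross-checking the figures quoted in this thread.
before_mb, after_mb = 28.471296, 34.750464
pyc_increase = (after_mb - before_mb) / before_mb * 100
assert round(pyc_increase) == 22  # the ~22% quoted above

lib_increase = (76 - 70) / 70 * 100
assert round(lib_increase, 2) == 8.57  # close to the 8.56% in the thread
```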

On Sat, 8 May 2021 at 01:43, Pablo Galindo Salgado 
wrote:

> Some update on the numbers. We have made some draft implementation to
> corroborate the
> numbers with some more realistic tests and seems that our original
> calculations were wrong.
> The actual increase in size is quite bigger than previously advertised:
>
> Using bytes object to encode the final object and marshalling that to disk
> (so using uint8_t) as the underlying
> type:
>
> BEFORE:
>
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 70M Lib
> 70M total
>
> AFTER:
> ❯ ./python -m compileall -r 1000 Lib > /dev/null
> ❯ du -h Lib -c --max-depth=0
> 76M Lib
> 76M total
>
> So that's an increase of 8.56 % over the original value. This is storing
> the start offset and end offset with no compression
> whatsoever.
>
> On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
> wrote:
>
>> Hi there,
>>
>> We are preparing a PEP and we would like to start some early discussion
>> about one of the main aspects of the PEP.
>>
>> The work we are preparing is to allow the interpreter to produce more
>> fine-grained error messages, pointing to
>> the source associated to the instructions that are failing. For example:
>>
>> Traceback (most recent call last):
>>
>>   File "test.py", line 14, in 
>>
>> lel3(x)
>>
>> ^^^
>>
>>   File "test.py", line 12, in lel3
>>
>> return lel2(x) / 23
>>
>>^^^
>>
>>   File "test.py", line 9, in lel2
>>
>> return 25 + lel(x) + lel(x)
>>
>> ^^
>>
>>   File "test.py", line 6, in lel
>>
>> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>>
>>  ^
>>
>> TypeError: 'NoneType' object is not subscriptable
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>> and this is what we want to discuss (there is also some stack cost to
>> re-raise exceptions but that's not a big problem in
>> any case). Given that column numbers are not very big compared with line
>> numbers, we plan to store these as unsigned chars
>> or unsigned shorts. We ran some experiments over the standard library and
>> we found that the overhead of all pyc files is:
>>
>> * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> extra size is 0.88 MB).
>> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
>> extra size is 0.44MB).
>>
>> One of the disadvantages of using chars is that we can only report
>> columns from 1 to 255 so if an error happens in a column
>> bigger than that then we would have to exclude it (and not show the
>> highlighting) for that frame. Unsigned short will allow
>> the values to go from 0 to 65535.
>>
>> Unfortunately these numbers are not easily compressible, as every
>> instruction would have very different offsets.
>>
>> There is also the possibility of gating this behind a build flag or
>> behind -O to allow users to opt out, but given that these numbers can be
>> quite useful to other tools such as coverage tools, tracers, and
>> profilers, adding conditional logic in many places would complicate the
>> implementation considerably and would potentially reduce the usability
>> of those tools, so we prefer not to have the conditional logic. We
>> believe this extra cost is very much worth the better error reporting,
>> but we understand and respect other points of view.
>>
>> Does anyone see a better way to encode this information **without
>> complicating the implementation a lot**? What are people's thoughts on
>> the feature?
>>
>> Thanks in advance,
>>
>> Regards from cloudy London,
>> Pablo Galindo Salgado
>>
>>
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/RICGTXCABZPK7RLDB7SISR4E64S6FEKR/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
An update on the numbers: we have made a draft implementation to
corroborate the numbers with some more realistic tests, and it seems
that our original calculations were wrong. The actual increase in size
is quite a bit bigger than previously advertised.

Using a bytes object to encode the final table and marshalling that to
disk (so using uint8_t as the underlying type):

BEFORE:

❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
70M Lib
70M total

AFTER:
❯ ./python -m compileall -r 1000 Lib > /dev/null
❯ du -h Lib -c --max-depth=0
76M Lib
76M total

So that's an increase of 8.56% over the original value. This is storing
the start offset and end offset with no compression whatsoever.

On Fri, 7 May 2021 at 22:45, Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in 
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> on when using -O to allow users to opt out, but given the fact
> that these numbers can be quite useful to other tools like coverage
> measuring tools, tracers, profilers and such, adding conditional
> logic to many places would complicate the implementation considerably and
> will potentially reduce the usability of those tools so we prefer
> not to have the conditional logic. We believe this extra cost is very
> much worth the better error reporting but we understand and respect
> other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>
> Thanks in advance,
>
> Regards from cloudy London,
> Pablo Galindo Salgado
>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QDEKMTZRMPEKPFFBPCGUYWLLR43A6M6U/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Can't sync cpython main to my fork

2021-05-07 Thread Nick Coghlan
On Fri, 7 May 2021, 8:13 am Ethan Furman,  wrote:

> On 5/6/21 7:14 AM, Jelle Zijlstra wrote:
>
>  > Maybe others have different workflows, but I don't see much of a need
>  > for keeping your fork's main branch up to date.
>
> I will occasionally do a `git push origin main` just to shut up the
> messages about being behind/ahead; other than that,
> I have no idea why I would need origin to be up to date.
>

I sync mine occasionally so I can make draft PRs in my repo before
submitting them to the main repo.

That's the only use case I have found for it though.

Cheers,
Nick.


>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/R5GX6JWKSKSSXA3JSU7QQOTGVE6IQEAC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Nick Coghlan
On Sat, 8 May 2021, 8:53 am Pablo Galindo Salgado, 
wrote:

> > One thought: could the stored column position not include the
> indentation? Would that help?
>
> The compiler unfortunately doesn't have easy access to the source, so
> we don't know how much of the line is indentation. This can make life
> a bit harder for other tools, although it can make it easier for reporting
> the exception, as the current traceback display removes indentation.
>


If the lnotab format (or a new data structure on the code object) could
store a line indent offset for each line, each instruction within a line
would only need to record the offset from the end of the indentation.

If we assume "deeply indented code" is the most likely source of
excessively long lines rather than "long expressions and other one line
statements produced by code generators" it may be worth it, but I'm not
sure that's actually true.

If we instead assume long lines are likely to come from code generators,
then we can impose the 255 column limit, and breaking lines at 255 code
points to improve tracebacks would become a quality of implementation issue
for code generators.

The latter assumption seems more likely to be true to me, and if the deep
indentation case does come up, the line offset idea could be pursued later.
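The relative-column encoding could be sketched roughly as follows (the function names and data layout are illustrative, not a real CPython structure): store one indent width per line, and record per-instruction columns relative to that indent, so deeply indented code still fits in a single byte.

```python
# Hypothetical sketch: per-line indent offsets plus indent-relative columns.
# Names (encode/decode) and the (lineno, start, end) tuples are illustrative.

def encode(lines, entries):
    """entries: list of (lineno, abs_start, abs_end); lineno is 1-based."""
    indents = [len(line) - len(line.lstrip()) for line in lines]
    packed = [(lineno, s - indents[lineno - 1], e - indents[lineno - 1])
              for lineno, s, e in entries]
    return indents, packed

def decode(indents, packed):
    # Reconstruct absolute columns by adding the line's indent back in.
    return [(lineno, s + indents[lineno - 1], e + indents[lineno - 1])
            for lineno, s, e in packed]

lines = ["        return lel2(x) / 23"]
entries = [(1, 15, 22)]                  # absolute columns of "lel2(x)"
indents, packed = encode(lines, entries)
assert packed == [(1, 7, 14)]            # relative columns stay small
assert decode(indents, packed) == entries
```

This only pays off if long lines really do come from deep indentation rather than from code generators, which is exactly the assumption being questioned above.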

Cheers,
Nick.


>
> On Fri, 7 May 2021 at 23:37, MRAB  wrote:
>
>> On 2021-05-07 22:45, Pablo Galindo Salgado wrote:
>> > Hi there,
>> >
>> > We are preparing a PEP and we would like to start some early discussion
>> > about one of the main aspects of the PEP.
>> >
>> > The work we are preparing is to allow the interpreter to produce more
>> > fine-grained error messages, pointing to
>> > the source associated to the instructions that are failing. For example:
>> >
>> > Traceback (most recent call last):
>> >
>> >File "test.py", line 14, in 
>> >
>> >  lel3(x)
>> >
>> >  ^^^
>> >
>> >File "test.py", line 12, in lel3
>> >
>> >  return lel2(x) / 23
>> >
>> > ^^^
>> >
>> >File "test.py", line 9, in lel2
>> >
>> >  return 25 + lel(x) + lel(x)
>> >
>> >  ^^
>> >
>> >File "test.py", line 6, in lel
>> >
>> >  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>> >
>> >   ^
>> >
>> > TypeError: 'NoneType' object is not subscriptable
>> >
>> >
>> > The cost of this is having the start column number and end
>> column number
>> > information for every bytecode instruction
>> > and this is what we want to discuss (there is also some stack cost to
>> > re-raise exceptions but that's not a big problem in
>> > any case). Given that column numbers are not very big compared with
>> line
>> > numbers, we plan to store these as unsigned chars
>> > or unsigned shorts. We ran some experiments over the standard library
>> > and we found that the overhead of all pyc files is:
>> >
>> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
>> > extra size is 0.88 MB).
>> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and
>> the
>> > extra size is 0.44MB).
>> >
>> > One of the disadvantages of using chars is that we can only report
>> > columns from 1 to 255 so if an error happens in a column
>> > bigger than that then we would have to exclude it (and not show the
>> > highlighting) for that frame. Unsigned short will allow
>> > the values to go from 0 to 65535.
>> >
>> [snip] How common are lines longer than 255 characters, anyway?
>>
>> One thought: could the stored column position not include the
>> indentation? Would that help?
>> ___
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-le...@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/OKQYNAI2B2BRCFMYJPLYPG2HHHUB5QR6/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BA4UQ36VQGZ52BN56XPKFRPVO2TWD6BN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: On the migration from master to main

2021-05-07 Thread Nick Coghlan
On Tue, 4 May 2021, 10:50 am Łukasz Langa,  wrote:

>
> On 4 May 2021, at 02:04, Łukasz Langa  wrote:
>
> Having renamed the branch in my fork first, the exact sequence I used on
> my own clone was:
>
> ❯ git checkout master
> ❯ git branch -m master main
> ❯ git fetch origin
> ❯ git branch -u origin/main main
> ❯ git remote set-head origin -a
> ❯ git fetch upstream
> ❯ git remote set-head upstream -a
> ❯ git pull upstream main
>
> This worked, I can successfully pull from upstream's main and push to my
> upstream's main. The `set-head` ones were suggested by the GitHub UI and
> ended up with heads explicitly being listed in `git log` which I guess is
> harmless.
>
>
> Seeing Ned's and Mariatta's comments on the matter I guess I should
> clarify that my commands above are for a *symmetrical* setup, i.e. where
> I expect the default `git pull` to *pull from origin*, and the default
> `git push` to *push to origin*.
>
> I might not be your typical user here, since I actually push to upstream
> too every now and again as part of release management, in particular
> pushing out signed release tags (which are full git objects with the same
> metadata as commits).
>


The dev guide (and Github) have long recommended this arrangement, so I
believe it's those of us that define a separate "pr" remote for our fork
and have "origin" referring to the main CPython repo that are the odd ones
out.

Cheers,
Nick.



>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5KXVPXAJYFRJCT3RGQ4SC22NEM6IUKBE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Release of a responsive python-docs-theme 2021.5

2021-05-07 Thread Carol Willing
It’s with great pleasure that I announce that python-docs-theme has been 
released to PyPI.

Thanks to the hard work and patience of Olga Bulat, @obulat, Python’s doc theme 
is now responsive. Many thanks to everyone who has contributed to this release 
by filing issues, writing PRs, reviewing PRs, and testing the theme. It was a 
great team effort.

Here are the highlights from the CHANGELOG.rst:

- Make the theme responsive (#46) Contributed by Olga Bulat.
- Use Python 3.8 for the Github Actions (#71) Contributed by Stéphane Wirtel.
- Use default pygments theme (#68) Contributed by Aaron Carlisle.
- Test Github action to validate the theme against docsbuild scripts. (#69) 
Contributed by Julien Palard.
- Add the copy button to pycon3 highlighted code blocks. (#64) Contributed by 
Julien Palard.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/75SE5KPBFEWXOGOAFEKM6FBHJQ3AORXK/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> One thought: could the stored column position not include the
indentation? Would that help?

The compiler unfortunately doesn't have easy access to the source, so
we don't know how much of the line is indentation. This can make life
a bit harder for other tools, although it can make it easier for reporting
the exception, as the current traceback display removes indentation.


On Fri, 7 May 2021 at 23:37, MRAB  wrote:

> On 2021-05-07 22:45, Pablo Galindo Salgado wrote:
> > Hi there,
> >
> > We are preparing a PEP and we would like to start some early discussion
> > about one of the main aspects of the PEP.
> >
> > The work we are preparing is to allow the interpreter to produce more
> > fine-grained error messages, pointing to
> > the source associated to the instructions that are failing. For example:
> >
> > Traceback (most recent call last):
> >
> >File "test.py", line 14, in 
> >
> >  lel3(x)
> >
> >  ^^^
> >
> >File "test.py", line 12, in lel3
> >
> >  return lel2(x) / 23
> >
> > ^^^
> >
> >File "test.py", line 9, in lel2
> >
> >  return 25 + lel(x) + lel(x)
> >
> >  ^^
> >
> >File "test.py", line 6, in lel
> >
> >  return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
> >
> >   ^
> >
> > TypeError: 'NoneType' object is not subscriptable
> >
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> > and this is what we want to discuss (there is also some stack cost to
> > re-raise exceptions but that's not a big problem in
> > any case). Given that column numbers are not very big compared with line
> > numbers, we plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library
> > and we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44MB).
> >
> > One of the disadvantages of using chars is that we can only report
> > columns from 1 to 255 so if an error happens in a column
> > bigger than that then we would have to exclude it (and not show the
> > highlighting) for that frame. Unsigned short will allow
> > the values to go from 0 to 65535.
> >
> [snip] How common are lines longer than 255 characters, anyway?
>
> One thought: could the stored column position not include the
> indentation? Would that help?
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OKQYNAI2B2BRCFMYJPLYPG2HHHUB5QR6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Thanks, Irit, for your comment!

> Is it really every instruction? Or only those that can raise exceptions?

Technically only the ones that can raise exceptions, but the majority can,
and restricting this to the set that can raise exceptions has the danger
that the mapping needs to be maintained for new instructions, and that if
some instruction starts raising exceptions when it didn't before, it can
introduce subtle bugs.

On the other hand I think the stronger argument to do this on every
instruction is that there is a lot of tools that can find this information
quite useful
such as coverage tools, profilers, state inspection tools and more. For
example, a coverage tool will be able to tell you what part of

x = f(x) if g(x) else y(x)

actually was executed, while currently, it will highlight the full line.
Although in this case these instructions can raise exceptions and would be
covered,
the distinction is different and both criteria could lead to a different
subset.

In short, that may be an optimization, but I think I would prefer to avoid
that complexity, taking into account the other problems that can arise and
the extra complication it adds.

On Fri, 7 May 2021 at 23:21, Irit Katriel 
wrote:

>
>
> On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
> wrote:
>
>>
>> The cost of this is having the start column number and end column number
>> information for every bytecode instruction
>>
>
>
> Is it really every instruction? Or only those that can raise exceptions?
>
>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/W72ZOEIRWXSIY5OCGTBRSGHHKDXGZL6Z/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
> haha, true... Does our parser even have a maximum line length? (I'm not
suggesting being unlimited or matching that if huge, 64k is already
ridiculous)

We use Py_ssize_t in some places, but at the end of the day lines and
columns have a limit of INT_MAX, IIRC.


On Fri, 7 May 2021 at 23:35, Gregory P. Smith  wrote:

>
> On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado 
> wrote:
>
>> Thanks a lot Gregory for the comments!
>>
>> An additional cost to this is things that parse text tracebacks not
>>> knowing how to handle it and things that log tracebacks
>>> generating additional output.
>>
>> We should provide a way for people to disable the feature on a process as
>>> part of this while they address tooling and logging issues.  (via the usual
>>> set of command line flag + python env var + runtime API)
>>
>>
>> Absolutely! We were thinking about that and that's easy enough as that is
>> a single conditional on the display function + the extra init configuration.
>>
>> Neither of those is large. While I'd lean towards uint8_t instead of
>>> uint16_t because not even humans can understand a 255 character line so why
>>> bother being pretty about such a thing... Just document the caveat and move
>>> on with the lower value. A future pyc format could change it if a
>>> compelling argument were ever found.
>>
>>
>> I very much agree with you here, but it is worth noting that I have heard
>> the counter-argument that the longer the line is, the more important it
>> may be to distinguish which part of the line is wrong.
>>
>
> haha, true... Does our parser even have a maximum line length? (I'm not
> suggesting being unlimited or matching that if huge, 64k is already
> ridiculous)
>
>
>>
>> A compromise if you want to handle longer lines: A single uint16_t.
>>> Represent the start column in the 9 bits and width in the other 7 bits. (or
>>> any variations thereof)  it's all a matter of what tradeoff you want to
>>> make for space reasons.  encoding as start + width instead of start + end
>>> is likely better anyways if you care about compression as the width byte
>>> will usually be small and thus be friendlier to compression.  I'd
>>> personally ignore compression entirely.
>>
>>
>> I would personally prefer not to implement very tricky compression
>> algorithms because tools may need to parse this and I don't want to
>> complicate the logic a lot. Handling lnotab is already a bit painful and
>> when bugs occur it makes debugging very tricky. Having the possibility to
>> index something based on the index of the instruction is quite a good API
>> in my opinion.
>>
>> Overall doing this is going to be a big win for developer productivity!
>>
>>
>> Thanks! We think that this has a lot of potential indeed! :)
>>
>> Pablo
>>
>>
>>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TNTBLEMSQ7JKZ2I75WJZSQHYIB6NSXCS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
This is actually a very good point. The only disadvantage is that it
complicates the parsing a bit and we lose the possibility of indexing
the table by instruction offset.
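To make that trade-off concrete, here is a rough sketch of why fixed-width entries are attractive (the two-byte layout and names are illustrative, not the actual lnotab format): with a fixed entry size, the record for instruction i is just a slice at a computed offset, with no scanning.

```python
# Illustrative fixed-width column table: two bytes (start, end) per
# instruction, so lookup by instruction index is a direct slice.
import struct

columns = [(0, 7), (8, 21), (15, 22)]   # hypothetical (start, end) pairs
table = b"".join(struct.pack("BB", s, e) for s, e in columns)

def lookup(table, instr_index):
    # O(1): entry i lives at byte offset 2*i.
    return struct.unpack_from("BB", table, 2 * instr_index)

assert lookup(table, 2) == (15, 22)
```

A variable-length encoding (like the escape-byte scheme) forces a linear walk, or an auxiliary index, to find the entry for a given instruction.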

On Fri, 7 May 2021 at 23:01, Larry Hastings  wrote:

> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size?  If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>
>
> I might even write four bytes instead of two in the latter case.
>
>
> */arry*
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SJP2RIMEFEVWKBWOA2V2X4BMFGHHEZ5J/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 3:24 PM Pablo Galindo Salgado 
wrote:

> Thanks a lot Gregory for the comments!
>
> An additional cost to this is things that parse text tracebacks not
>> knowing how to handle it and things that log tracebacks
>> generating additional output.
>
> We should provide a way for people to disable the feature on a process as
>> part of this while they address tooling and logging issues.  (via the usual
>> set of command line flag + python env var + runtime API)
>
>
> Absolutely! We were thinking about that and that's easy enough as that is
> a single conditional on the display function + the extra init configuration.
>
> Neither of those is large. While I'd lean towards uint8_t instead of
>> uint16_t because not even humans can understand a 255 character line so why
>> bother being pretty about such a thing... Just document the caveat and move
>> on with the lower value. A future pyc format could change it if a
>> compelling argument were ever found.
>
>
> I very much agree with you here, but it is worth noting that I have heard
> the counter-argument that the longer the line is, the more important it
> may be to distinguish which part of the line is wrong.
>

haha, true... Does our parser even have a maximum line length? (I'm not
suggesting being unlimited or matching that if huge, 64k is already
ridiculous)


>
> A compromise if you want to handle longer lines: A single uint16_t.
>> Represent the start column in the 9 bits and width in the other 7 bits. (or
>> any variations thereof)  it's all a matter of what tradeoff you want to
>> make for space reasons.  encoding as start + width instead of start + end
>> is likely better anyways if you care about compression as the width byte
>> will usually be small and thus be friendlier to compression.  I'd
>> personally ignore compression entirely.
>
>
> I would personally prefer not to implement very tricky compression
> algorithms because tools may need to parse this and I don't want to
> complicate the logic a lot. Handling lnotab is already a bit painful and
> when bugs occur it makes debugging very tricky. Having the possibility to
> index something based on the index of the instruction is quite a good API
> in my opinion.
>
> Overall doing this is going to be a big win for developer productivity!
>
>
> Thanks! We think that this has a lot of potential indeed! :)
>
> Pablo
>
>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/E7OM3GA4GNMRXAXOFAIZCCNTBWFUJAEP/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-07 22:56, Larry Hastings wrote:

On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
Given that column numbers are not very big compared with line numbers, 
we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and 
the extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow

the values to go from 0 to 65535.


Are lnotab entries required to be a fixed size?  If not:

if column < 255:
    lnotab.write_one_byte(column)
else:
    lnotab.write_one_byte(255)
    lnotab.write_two_bytes(column)


I might even write four bytes instead of two in the latter case.


A slight improvement would be:

if column < 255:
    lnotab.write_one_byte(column)
else:
    lnotab.write_one_byte(255)
    lnotab.write_two_bytes(column - 255)
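A runnable sketch of this escape-byte scheme (the buffer layout and function names are illustrative, not the actual table format): columns below 255 take one byte, and 255 acts as an escape marker followed by the excess in two bytes, covering columns up to 255 + 65535.

```python
# Hypothetical encoder/decoder for the escape-byte column scheme.

def write_column(buf, column):
    if column < 255:
        buf.append(column)
    else:
        buf.append(255)                       # escape marker
        buf += (column - 255).to_bytes(2, "little")  # store the excess

def read_column(buf, pos):
    b = buf[pos]
    if b < 255:
        return b, pos + 1
    excess = int.from_bytes(buf[pos + 1:pos + 3], "little")
    return 255 + excess, pos + 3

buf = bytearray()
for col in (3, 254, 255, 60000):
    write_column(buf, col)

pos, out = 0, []
while pos < len(buf):
    col, pos = read_column(buf, pos)
    out.append(col)
assert out == [3, 254, 255, 60000]
```

As noted in the reply, the price of this is that entries are no longer fixed-size, so the table can't be indexed directly by instruction.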
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UJYYDMXCM5TM7GOSIPMK7GOWNC25GL7W/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread MRAB

On 2021-05-07 22:45, Pablo Galindo Salgado wrote:

Hi there,

We are preparing a PEP and we would like to start some early discussion 
about one of the main aspects of the PEP.


The work we are preparing is to allow the interpreter to produce more 
fine-grained error messages, pointing to

the source associated to the instructions that are failing. For example:

Traceback (most recent call last):

   File "test.py", line 14, in 

 lel3(x)

 ^^^

   File "test.py", line 12, in lel3

 return lel2(x) / 23

    ^^^

   File "test.py", line 9, in lel2

 return 25 + lel(x) + lel(x)

 ^^

   File "test.py", line 6, in lel

 return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)

  ^

TypeError: 'NoneType' object is not subscriptable


The cost of this is having the start column number and end column number 
information for every bytecode instruction
and this is what we want to discuss (there is also some stack cost to 
re-raise exceptions but that's not a big problem in
any case). Given that column numbers are not very big compared with line 
numbers, we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and the 
extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow

the values to go from 0 to 65535.


[snip] How common are lines longer than 255 characters, anyway?

One thought: could the stored column position not include the 
indentation? Would that help?

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MHF3PMCJOR6VK765OSA7NSO66NY3QU3V/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Thanks a lot Gregory for the comments!

An additional cost to this is things that parse text tracebacks not knowing
> how to handle it and things that log tracebacks generating additional
> output.

We should provide a way for people to disable the feature on a process as
> part of this while they address tooling and logging issues.  (via the usual
> set of command line flag + python env var + runtime API)


Absolutely! We were thinking about that and that's easy enough as that is a
single conditional on the display function + the extra init configuration.

Neither of those is large. While I'd lean towards uint8_t instead of
> uint16_t because not even humans can understand a 255 character line so why
> bother being pretty about such a thing... Just document the caveat and move
> on with the lower value. A future pyc format could change it if a
> compelling argument were ever found.


I very much agree with you here, but it is worth noting that I have heard the
counter-argument that the longer the line is, the more important it may be to
distinguish which part of the line is wrong.

A compromise if you want to handle longer lines: A single uint16_t.
> Represent the start column in the 9 bits and width in the other 7 bits. (or
> any variations thereof)  it's all a matter of what tradeoff you want to
> make for space reasons.  encoding as start + width instead of start + end
> is likely better anyways if you care about compression as the width byte
> will usually be small and thus be friendlier to compression.  I'd
> personally ignore compression entirely.


I would personally prefer not to implement very tricky compression
algorithms because tools may need to parse this and I don't want to
complicate the logic a lot. Handling lnotab is already a bit painful and
when bugs occur it makes debugging very tricky. Having the possibility to
index something based on the index of the instruction is quite a good API
in my opinion.

Overall doing this is going to be a big win for developer productivity!


Thanks! We think that this has a lot of potential indeed! :)

Pablo
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OHSQ6VLMVSZHCLEUQZ52NXCWEGLG2DQN/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 2:50 PM Pablo Galindo Salgado 
wrote:

> Hi there,
>
> We are preparing a PEP and we would like to start some early discussion
> about one of the main aspects of the PEP.
>
> The work we are preparing is to allow the interpreter to produce more
> fine-grained error messages, pointing to
> the source associated to the instructions that are failing. For example:
>
> Traceback (most recent call last):
>
>   File "test.py", line 14, in 
>
> lel3(x)
>
> ^^^
>
>   File "test.py", line 12, in lel3
>
> return lel2(x) / 23
>
>^^^
>
>   File "test.py", line 9, in lel2
>
> return 25 + lel(x) + lel(x)
>
> ^^
>
>   File "test.py", line 6, in lel
>
> return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
>
>  ^
>
> TypeError: 'NoneType' object is not subscriptable
>
>
An additional cost to this is things that parse text tracebacks not knowing
how to handle it and things that log tracebacks generating additional
output.  We should provide a way for people to disable the feature on a
process as part of this while they address tooling and logging issues.
(via the usual set of command line flag + python env var + runtime API)

The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>

Neither of those is large. I'd lean towards uint8_t instead of uint16_t:
not even humans can understand a 255-character line, so why bother being
pretty about such a thing? Just document the caveat and move on with the
lower value. A future pyc format could change it if a compelling argument
were ever found.


> Unfortunately these numbers are not easily compressible, as every
> instruction would have very different offsets.
>
> There is also the possibility of not doing this based on some build flag
> or when using -O, to allow users to opt out. However, given that these
> numbers can be quite useful to other tools such as coverage tools,
> tracers, and profilers, adding conditional logic in many places would
> complicate the implementation considerably and could reduce the usability
> of those tools, so we prefer not to have the conditional logic. We
> believe this extra cost is very much worth the better error reporting,
> but we understand and respect other points of view.
>
> Does anyone see a better way to encode this information **without
> complicating the implementation a lot**? What are people's thoughts on the
> feature?
>

A compromise if you want to handle longer lines: a single uint16_t.
Represent the start column in 9 bits and the width in the other 7 bits (or
any variation thereof); it's all a matter of what tradeoff you want to
make for space reasons. Encoding as start + width instead of start + end
is likely better anyway if you care about compression, as the width value
will usually be small and thus friendlier to compression. I'd personally
ignore compression entirely.
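
As a rough sketch of that packing (the names here are hypothetical, not
CPython's actual layout; it only illustrates the 9 + 7 bit split):

```python
def pack_location(start: int, width: int) -> int:
    # 9 bits for the start column (0-511), 7 bits for the width (0-127),
    # together filling one uint16_t per instruction.
    if not (0 <= start < 512 and 0 <= width < 128):
        raise ValueError("location does not fit in 9 + 7 bits")
    return (start << 7) | width

def unpack_location(packed: int) -> tuple[int, int]:
    # Reverse the shift/mask to recover (start, width).
    return packed >> 7, packed & 0x7F

packed = pack_location(300, 42)   # fits in one 16-bit value
assert packed < 2**16
assert unpack_location(packed) == (300, 42)
```

Anything past column 511 or wider than 127 characters would have to fall
back to "no highlight", the same caveat as the single-byte option.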

Overall doing this is going to be a big win for developer productivity!

-Greg
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ULNDFY5CWVDELNPE6S4HY5SDAODOT7DC/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Irit Katriel via Python-Dev
On Fri, May 7, 2021 at 10:52 PM Pablo Galindo Salgado 
wrote:

>
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
>


Is it really every instruction? Or only those that can raise exceptions?
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UL5XAR2MGYJPIKB67R56OJXUVKP2KM3H/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Technically the main concern may be the size of the unmarshalled pyc files
in memory, more than the storage size of disk.

On Fri, 7 May 2021, 23:04 Antoine Pitrou,  wrote:

> On Fri, 7 May 2021 22:45:38 +0100
> Pablo Galindo Salgado  wrote:
> >
> > The cost of this is having the start column number and end column number
> > information for every bytecode instruction
> > and this is what we want to discuss (there is also some stack cost to
> > re-raise exceptions but that's not a big problem in
> > any case). Given that column numbers are not very big compared with line
> > numbers, we plan to store these as unsigned chars
> > or unsigned shorts. We ran some experiments over the standard library and
> > we found that the overhead of all pyc files is:
> >
> > * If we use shorts, the total overhead is ~3% (total size 28MB and the
> > extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> > extra size is 0.44MB).
>
> More generally, if some people in 2021 are still concerned with the size
> of pyc files (why not), how about introducing a new version of the pyc
> format with built-in LZ4 compression?
>
> LZ4 decompression is extremely fast on modern CPUs (several GB/s) and
> vendoring the C library should be simple.
> https://github.com/lz4/lz4
>
> Regards
>
> Antoine.
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/SNDIHPBDW4Y3KGSOEL7MBJER3IEBIFTN/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Gregory P. Smith
On Fri, May 7, 2021 at 3:01 PM Larry Hastings  wrote:

> On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
>
> Given that column numbers are not very big compared with line numbers, we
> plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
>
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).
>
> One of the disadvantages of using chars is that we can only report columns
> from 1 to 255 so if an error happens in a column
> bigger than that then we would have to exclude it (and not show the
> highlighting) for that frame. Unsigned short will allow
> the values to go from 0 to 65535.
>
> Are lnotab entries required to be a fixed size?  If not:
>
> if column < 255:
>     lnotab.write_one_byte(column)
> else:
>     lnotab.write_one_byte(255)
>     lnotab.write_two_bytes(column)
>
If a non-fixed size is acceptable, use utf-8 to encode the column number as
a single code point into bytes and you don't even need to write your own
encode/decode logic for a varint.
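
That trick can be sketched like this (a toy illustration; a real encoding
would also need to handle the surrogate range 0xD800-0xDFFF, which
`chr().encode()` rejects, and this ignores it for brevity):

```python
def encode_column(col: int) -> bytes:
    # Treat the column number as a Unicode code point; UTF-8 then acts
    # as a ready-made 1-4 byte varint encoding.
    return chr(col).encode("utf-8")

def decode_column(data: bytes) -> int:
    return ord(data.decode("utf-8"))

assert len(encode_column(42)) == 1       # columns < 128: one byte
assert len(encode_column(300)) == 2      # up to 2047: two bytes
assert decode_column(encode_column(70000)) == 70000
```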

-gps
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QNWOZWTNFAVPD77KNG4LRYWCEDY3F6HX/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Antoine Pitrou
On Fri, 7 May 2021 22:45:38 +0100
Pablo Galindo Salgado  wrote:
> 
> The cost of this is having the start column number and end column number
> information for every bytecode instruction
> and this is what we want to discuss (there is also some stack cost to
> re-raise exceptions but that's not a big problem in
> any case). Given that column numbers are not very big compared with line
> numbers, we plan to store these as unsigned chars
> or unsigned shorts. We ran some experiments over the standard library and
> we found that the overhead of all pyc files is:
> 
> * If we use shorts, the total overhead is ~3% (total size 28MB and the
> extra size is 0.88 MB).
> > * If we use chars, the total overhead is ~1.5% (total size 28 MB and the
> extra size is 0.44MB).

More generally, if some people in 2021 are still concerned with the size
of pyc files (why not), how about introducing a new version of the pyc
format with built-in LZ4 compression?

LZ4 decompression is extremely fast on modern CPUs (several GB/s) and
vendoring the C library should be simple.
https://github.com/lz4/lz4

Regards

Antoine.
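
The shape of such a loader can be illustrated with the standard library's
zlib as a stand-in (LZ4 itself is a third-party C library, so this sketch
only demonstrates the compressed-payload idea, not the proposed codec, and
the 16-byte header size matches the current pyc layout):

```python
import zlib

HEADER_SIZE = 16   # magic, flags, source hash/mtime, code size

def pack_pyc(header: bytes, body: bytes) -> bytes:
    # Keep the header uncompressed so the import system can still
    # validate it cheaply; compress only the marshalled body.
    assert len(header) == HEADER_SIZE
    return header + zlib.compress(body)

def unpack_pyc(data: bytes) -> tuple[bytes, bytes]:
    return data[:HEADER_SIZE], zlib.decompress(data[HEADER_SIZE:])

body = b"marshalled code object bytes" * 1000
pyc = pack_pyc(b"\x00" * HEADER_SIZE, body)
assert len(pyc) < len(body)              # repetitive data shrinks a lot
assert unpack_pyc(pyc)[1] == body
```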


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PQZ6OTWG6K6W65YXRLKEH7UOD5FM24TN/


[Python-Dev] Re: Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Larry Hastings

On 5/7/21 2:45 PM, Pablo Galindo Salgado wrote:
Given that column numbers are not very big compared with line numbers, 
we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library 
and we found that the overhead of all pyc files is:


* If we use shorts, the total overhead is ~3% (total size 28MB and the 
extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and 
the extra size is 0.44MB).


One of the disadvantages of using chars is that we can only report 
columns from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the 
highlighting) for that frame. Unsigned short will allow
the values to go from 0 to 65535.


Are lnotab entries required to be a fixed size?  If not:

    if column < 255:
        lnotab.write_one_byte(column)
    else:
        lnotab.write_one_byte(255)
        lnotab.write_two_bytes(column)


I might even write four bytes instead of two in the latter case.
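
A runnable version of that escape-byte scheme, with the four-byte variant
(the `write_*`/`read_*` names are made up here, not an existing API):

```python
import struct

def write_column(buf: bytearray, column: int) -> None:
    if column < 255:
        buf.append(column)                # common case: one byte
    else:
        buf.append(255)                   # escape marker...
        buf += struct.pack("<I", column)  # ...then a full uint32

def read_column(buf: bytes, offset: int) -> tuple[int, int]:
    """Return (column, offset just past the entry)."""
    if buf[offset] < 255:
        return buf[offset], offset + 1
    (column,) = struct.unpack_from("<I", buf, offset + 1)
    return column, offset + 5

buf = bytearray()
for col in (7, 254, 255, 70000):
    write_column(buf, col)
assert len(buf) == 1 + 1 + 5 + 5   # only wide columns pay the 5 bytes
```

The table becomes variable-width, so random access by instruction index is
lost, but the existing lnotab is already walked sequentially anyway.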


//arry/

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B3SFCZPXIKGO3LM6UJVSJXFIRAZH2R26/


[Python-Dev] Future PEP: Include Fine Grained Error Locations in Tracebacks

2021-05-07 Thread Pablo Galindo Salgado
Hi there,

We are preparing a PEP and we would like to start some early discussion
about one of the main aspects of the PEP.

The work we are preparing is to allow the interpreter to produce more
fine-grained error messages, pointing to
the source associated with the instructions that are failing. For example:

Traceback (most recent call last):

  File "test.py", line 14, in 

lel3(x)

^^^

  File "test.py", line 12, in lel3

return lel2(x) / 23

   ^^^

  File "test.py", line 9, in lel2

return 25 + lel(x) + lel(x)

^^

  File "test.py", line 6, in lel

return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)

 ^

TypeError: 'NoneType' object is not subscriptable

The cost of this is having the start column number and end column number
information for every bytecode instruction
and this is what we want to discuss (there is also some stack cost to
re-raise exceptions but that's not a big problem in
any case). Given that column numbers are not very big compared with line
numbers, we plan to store these as unsigned chars
or unsigned shorts. We ran some experiments over the standard library and
we found that the overhead of all pyc files is:

* If we use shorts, the total overhead is ~3% (total size 28MB and the
extra size is 0.88 MB).
* If we use chars, the total overhead is ~1.5% (total size 28 MB and the
extra size is 0.44MB).

One of the disadvantages of using chars is that we can only report columns
from 1 to 255 so if an error happens in a column
bigger than that then we would have to exclude it (and not show the
highlighting) for that frame. Unsigned short will allow
the values to go from 0 to 65535.
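
The size difference between the two options is just the element width, as a
quick check with the struct module shows (illustrative only; the real table
layout is an implementation detail):

```python
import struct

columns = [5, 12, 40, 254]                 # per-instruction start columns
as_chars = struct.pack(f"{len(columns)}B", *columns)
as_shorts = struct.pack(f"<{len(columns)}H", *columns)

assert len(as_chars) == len(columns)       # 1 byte per instruction
assert len(as_shorts) == 2 * len(columns)  # 2 bytes per instruction

# The char format simply cannot represent columns past 255:
try:
    struct.pack("B", 300)
except struct.error:
    print("column 300 does not fit in an unsigned char")
```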

Unfortunately these numbers are not easily compressible, as every
instruction would have very different offsets.

There is also the possibility of not doing this based on some build flag or
when using -O, to allow users to opt out. However, given that these numbers
can be quite useful to other tools such as coverage tools, tracers, and
profilers, adding conditional logic in many places would complicate the
implementation considerably and could reduce the usability of those tools,
so we prefer not to have the conditional logic. We believe this extra cost
is very much worth the better error reporting, but we understand and
respect other points of view.

Does anyone see a better way to encode this information **without
complicating the implementation a lot**? What are people's thoughts on the
feature?

Thanks in advance,

Regards from cloudy London,
Pablo Galindo Salgado
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DB3RTYBF2BXTY6ZHP3Z4DXCRWPJIQUFD/


[Python-Dev] Summary of Python tracker Issues

2021-05-07 Thread Python tracker


ACTIVITY SUMMARY (2021-04-30 - 2021-05-07)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open7428 (-25)
  closed 48377 (+104)
  total  55805 (+79)

Open issues with patches: 2946 


Issues opened (48)
==

#40943: PEP 353: Drop support for PyArg_ParseTuple() "#" formats when 
https://bugs.python.org/issue40943  reopened by methane

#43001: python3.8.9, python3.9.2 test_embed test_tabnanny failed
https://bugs.python.org/issue43001  reopened by asholomitskiy84

#43176: Dataclasses derived from empty frozen bases skip immutability 
https://bugs.python.org/issue43176  reopened by eric.smith

#43882: [security] urllib.parse should sanitize urls containing ASCII 
https://bugs.python.org/issue43882  reopened by gregory.p.smith

#43992: Unable to get external dependencies for CPython on Ubuntu Linu
https://bugs.python.org/issue43992  opened by shreyanavigyan

#43994: change representation of match as / capture as `Name(..., ctx=
https://bugs.python.org/issue43994  opened by Anthony Sottile

#44002: Use functools.lru_cache in urllib.parse instead of 1996 custom
https://bugs.python.org/issue44002  opened by gregory.p.smith

#44005: multiple socket bind failure on Mac OS X with SO_REUSEADDR
https://bugs.python.org/issue44005  opened by giangipy

#44010: IDLE: highlight soft keywords
https://bugs.python.org/issue44010  opened by epaine

#44011: Borrow asyncio ssl implementation from uvloop
https://bugs.python.org/issue44011  opened by asvetlov

#44012: IPv6Address.exploded does not support interface name (scope id
https://bugs.python.org/issue44012  opened by ohwgiles

#44013: tempfile.TemporaryFile: name of file descriptor cannot be reus
https://bugs.python.org/issue44013  opened by zhongxiang117

#44016: Enum related deprecation warnings in test_httpservers and test
https://bugs.python.org/issue44016  opened by xtreak

#44019: operator.call/operator.__call__
https://bugs.python.org/issue44019  opened by Antony.Lee

#44021: enum docs in 3.10:  missing "New in version 3.10"
https://bugs.python.org/issue44021  opened by Akuli

#44023: "tarfile" library will lead to "write any content to any file 
https://bugs.python.org/issue44023  opened by leveryd

#44024: Improve the TypeError message for non-string second arguments 
https://bugs.python.org/issue44024  opened by maggyero

#44025: Match doc: Clarify '_' as a soft keyword
https://bugs.python.org/issue44025  opened by terry.reedy

#44026: IDLE: print "Did you mean?" for AttributeError and NameError
https://bugs.python.org/issue44026  opened by Dennis Sweeney

#44028: Request for locals().update() to work, it is
https://bugs.python.org/issue44028  opened by xuancong84

#44030: Markup with_traceback code example
https://bugs.python.org/issue44030  opened by terry.reedy

#44031: python3.8.9, python3.9.2 test_embed test_tabnanny failed
https://bugs.python.org/issue44031  opened by asholomitskiy84

#44032: Function locals and evaluation stack should be stored in a con
https://bugs.python.org/issue44032  opened by Mark.Shannon

#44035: Regenerating the configure script fails even if dependencies a
https://bugs.python.org/issue44035  opened by pablogsal

#44036: asyncio SSL server can be DOSed, event loop gets blocked: busy
https://bugs.python.org/issue44036  opened by ghost43

#44037: Broad performance regression from 3.10a7 to 3.10b1 with python
https://bugs.python.org/issue44037  opened by rhettinger

#44038: In documentation Section 8.6, the definition of parameter_list
https://bugs.python.org/issue44038  opened by webbnh

#44041: [sqlite3] optimisation: only call sqlite3_column_count when ne
https://bugs.python.org/issue44041  opened by erlendaasland

#44042: [sqlite3]  _pysqlite_connection_begin() optimisations
https://bugs.python.org/issue44042  opened by erlendaasland

#44043: 3.10 b1 armhf Bus Error in hashlib test: test_gil
https://bugs.python.org/issue44043  opened by Anthony Sottile

#44044: ConfigParser: cannot link to ConfigParser.optionxform(option)
https://bugs.python.org/issue44044  opened by jugmac00

#44045: canonicalize "upper-case" -> "uppercase"; "lower-case" -> "low
https://bugs.python.org/issue44045  opened by jugmac00

#44048: test_hashlib failure for "AMD64 RHEL8 FIPS Only Blake2 Builtin
https://bugs.python.org/issue44048  opened by cstratak

#44050: Exceptions in a subinterpreter are changed by another subinter
https://bugs.python.org/issue44050  opened by trygveaa

#44052: patch object as argument should be explicit
https://bugs.python.org/issue44052  opened by CendioOssman

#44053: Can't connect to a server also not showing any type of output
https://bugs.python.org/issue44053  opened by muqadasrasheed652

#44055: NamedTemporaryFile opened twice on Windows
https://bugs.python.org/issue44055  opened by frenzy

#44057: Inconsitencies in `__init_subclass__` in a generic class
https://bugs.python.org