[issue2771] Test issue

2022-04-08 Thread Ezio Melotti


Ezio Melotti  added the comment:

So long, and thanks for all the bugs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread miss-islington


miss-islington  added the comment:


New changeset 89697f7374ea947ebe8e36131e2d3e21fff6fa1d by Miss Islington (bot) 
in branch '3.10':
bpo-47260: Fix os.closerange() potentially being a no-op in a seccomp sandbox 
(GH-32418)
https://github.com/python/cpython/commit/89697f7374ea947ebe8e36131e2d3e21fff6fa1d


--




[issue47259] Clarify SortingHOWTO regarding locale aware string sorting

2022-04-08 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
title: string sorting often incorrect -> Clarify SortingHOWTO regarding locale 
aware string sorting
versions: +Python 3.10, Python 3.11




[issue47259] string sorting often incorrect

2022-04-08 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I don't think splashing this everywhere else in the docs would be helpful.  
Tools like list.sort, sorted, min, max, nlargest, nsmallest use whatever sort 
order is provided by the underlying object whether it be a string, tuple, 
float, or int.
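
As a minimal sketch of that point (the `ByLength` class below is a hypothetical illustration, not something from the docs), each of these tools simply defers to the comparison methods the objects define:

```python
import heapq
from functools import total_ordering


@total_ordering
class ByLength:
    """Hypothetical wrapper that orders strings by length, not by content."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return len(self.text) == len(other.text)

    def __lt__(self, other):
        return len(self.text) < len(other.text)


items = [ByLength("ccc"), ByLength("a"), ByLength("bb")]

# None of these calls knows anything about strings; they only use <, ==, etc.
in_order = [item.text for item in sorted(items)]
smallest = min(items).text
largest = max(items).text
top_two = [item.text for item in heapq.nlargest(2, items)]
```

Changing `__lt__` changes the result of every call at once, which is why the ordering rules are documented with the objects rather than with each tool.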

The section on expressions is the intended place to cover how comparisons are 
defined for core objects:  
https://docs.python.org/3/reference/expressions.html#value-comparisons

As suggested, I will edit the sorting howto to be clearer that locale aware 
sort ordering refers to alphabetical orderings which can vary (for example, the 
Spanish ll sorts differently in different locales).

--
assignee:  -> rhettinger
components: +Documentation -Interpreter Core




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Gregory P. Smith


Gregory P. Smith  added the comment:

Good catch.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 4.0 -> 5.0
pull_requests: +30445
pull_request: https://github.com/python/cpython/pull/32420




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Gregory P. Smith


Gregory P. Smith  added the comment:


New changeset 1c8b3b5d66a629258f1db16939b996264a8b9c37 by Alexey Izbyshev in 
branch 'main':
bpo-47260: Fix os.closerange() potentially being a no-op in a seccomp sandbox 
(GH-32418)
https://github.com/python/cpython/commit/1c8b3b5d66a629258f1db16939b996264a8b9c37


--




[issue5901] missing meta-info in documentation pdf

2022-04-08 Thread Ned Deily


Ned Deily  added the comment:

The problem seems to have been fixed again somewhere in the past.

--
nosy: +ned.deily
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 3.5, Python 3.6




[issue47087] Implement PEP 655 (Required/NotRequired)

2022-04-08 Thread Jelle Zijlstra


Change by Jelle Zijlstra :


--
keywords: +patch
pull_requests: +30444
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/32419




[issue35962] [doc] Slight error in words in [ 2.4.1. String and Bytes literals ]

2022-04-08 Thread Irit Katriel


Change by Irit Katriel :


--
keywords: +easy
priority: normal -> low
title: Slight error in words in [ 2.4.1. String and Bytes literals ] -> [doc] 
Slight error in words in [ 2.4.1. String and Bytes literals ]
versions: +Python 3.10, Python 3.11, Python 3.9 -Python 2.7, Python 3.7, Python 
3.8




[issue44213] LIST_TO_TUPLE placed below the sentence "all of the following use their opcodes" in dis library documentaiton.

2022-04-08 Thread Irit Katriel


Irit Katriel  added the comment:

The "all of the following.. " sentence has been removed in 3.11.

--
nosy: +iritkatriel
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed




[issue47255] Many broken :meth: roles in the docs

2022-04-08 Thread Ken Jin


Ken Jin  added the comment:

It's 3.10 only. Presumably our sphinx version changed then and something broke. 
In 3.9 and earlier the links are all fine.

See https://bugs.python.org/issue42182 for a similar issue I raised ages ago.

--
nosy: +kj




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Alexey Izbyshev


Change by Alexey Izbyshev :


--
keywords: +patch
stage:  -> patch review




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> It's been years now and that hasn't happened, even with more recent flag 
> additions. I think it's safe to say it won't, and such a fallback upon error 
> won't put us back into a bogus pre-close_range situation where we're 
> needlessly close()ing a bunch of closed fds.

Yes, I also find it unlikely that close_range() will start reporting an error 
if it fails to close some fd, especially if not given some new flag. Such error 
reporting doesn't seem very useful to userspace because there is no way to 
associate the error with the fd.

But even if such change happens, it will be just a performance regression, not 
a potential correctness/security issue.

--
keywords:  -patch
stage: patch review -> 




[issue47261] RFC: Clarify usage of macros for PySequence_Fast within the Limited C API

2022-04-08 Thread Rohit Goswami


Change by Rohit Goswami :


--
title: RFC: Clarify Limited API macros for PySequence_Fast -> RFC: Clarify 
usage of macros for PySequence_Fast within the Limited C API




[issue47261] RFC: Clarify Limited API macros for PySequence_Fast

2022-04-08 Thread Rohit Goswami


Rohit Goswami  added the comment:

Perhaps to be clear, there are two possibilities:
1. `PySequence_Fast` should be removed from the Limited API
2. All macros used with `PySequence_Fast` are valid for use in the context of 
the Limited API

In either case the documentation needs to be clarified.

The only situation where no changes would result is if:
- `PySequence_Fast` is part of the Limited API, but must be treated the same as 
a regular `PySequence` object
  + Since only `PySequence_Size` and other variants can be used in the context 
of the Limited API

This is actually also still confusing and should be mentioned clearly.

--




[issue47261] RFC: Clarify Limited API macros for PySequence_Fast

2022-04-08 Thread Rohit Goswami


Change by Rohit Goswami :


--
assignee:  -> docs@python
components: +C API, Documentation
nosy: +docs@python




[issue47261] RFC: Clarify Limited API macros for PySequence_Fast

2022-04-08 Thread Rohit Goswami


New submission from Rohit Goswami :

The `current documentation`_ of the Python-C API mentions that 
``PySequence_Fast`` is part of the limited API. However, this may be a typo as 
all the functions interacting with a ``PySequence_Fast`` object are macros, 
e.g. `PySequence_Fast_GET_SIZE`, `PySequence_Fast_GET_ITEM` etc.

If this is indeed a documentation bug I'm happy to open a PR to fix this once 
consensus is reached.






.. _`current documentation` : 
https://docs.python.org/3/c-api/stable.html#contents-of-limited-api

--
messages: 416989
nosy: rgoswami
priority: normal
severity: normal
status: open
title: RFC: Clarify Limited API macros for PySequence_Fast




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Alexey Izbyshev


Change by Alexey Izbyshev :


--
keywords: +patch
pull_requests: +30443
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32418




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Kyle Evans


Kyle Evans  added the comment:

Sure, sounds good to me. The original theory (IIRC, I've slept many times since 
then :-)) was that we already know first/last are valid and there are no other 
defined errors, so 'other errors' must be because close_range has started 
percolating up something from closing individual files.

It's been years now and that hasn't happened, even with more recent flag 
additions. I think it's safe to say it won't, and such a fallback upon error 
won't put us back into a bogus pre-close_range situation where we're needlessly 
close()ing a bunch of closed fds.

--




[issue40421] [C API] Add public getter functions for the internal PyFrameObject structure

2022-04-08 Thread STINNER Victor


Change by STINNER Victor :


--
pull_requests: +30442
pull_request: https://github.com/python/cpython/pull/32417




[issue39187] urllib.robotparser does not respect the longest match for the rule

2022-04-08 Thread matele secretaire


matele secretaire  added the comment:

Thank you

--




[issue47260] os.closerange() can be no-op in a seccomp sandbox

2022-04-08 Thread Alexey Izbyshev


New submission from Alexey Izbyshev :

After #40422 _Py_closerange() assumes that close_range() closes all file 
descriptors even if it returns an error (other than ENOSYS):

    if (close_range(first, last, 0) == 0 || errno != ENOSYS) {
        /* Any errors encountered while closing file descriptors are ignored;
         * ENOSYS means no kernel support, though,
         * so we'll fallback to the other methods. */
    }
    else
        /* fallbacks */


This assumption can be wrong on Linux if a seccomp sandbox denies the 
underlying syscall, pretending that it returns EPERM or EACCES. In this case 
_Py_closerange() won't close any descriptors at all, which in the worst case 
can be a security issue.

I propose to fix this by falling back to other methods in case of *any* 
close_range() error. Note that fallbacks will not be triggered on any problems 
with closing individual file descriptors because close_range() is documented to 
ignore such errors on both Linux[1] and FreeBSD[2].

[1] https://man7.org/linux/man-pages/man2/close_range.2.html
[2] https://www.freebsd.org/cgi/man.cgi?query=close_range=2
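
The proposed control flow can be modelled in Python as a rough sketch. This is not the C implementation in CPython; `closerange_with_fallback`, its `fast_close_range` parameter and the simulated `sandboxed_close_range` are hypothetical names used only to show "fall back on any error" as opposed to "fall back only on ENOSYS":

```python
import errno
import os


def closerange_with_fallback(first, last, fast_close_range):
    """Close fds in [first, last]; use the slow path if the fast path fails at all."""
    try:
        fast_close_range(first, last)
        return "fast"
    except OSError:
        # ENOSYS (no kernel support), EPERM/EACCES (sandbox), or anything else:
        # we cannot assume any descriptor was closed, so fall through.
        pass
    for fd in range(first, last + 1):
        try:
            os.close(fd)
        except OSError:
            pass  # errors on individual descriptors are ignored, as before
    return "fallback"


def sandboxed_close_range(first, last):
    """Stand-in for a close_range() call that a seccomp filter rejects."""
    raise OSError(errno.EPERM, "close_range denied (simulated)")
```

With the older "only ENOSYS triggers the fallback" rule, the simulated EPERM above would have been treated as success and nothing would have been closed.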

--
assignee: izbyshev
components: Library (Lib)
keywords: 3.10regression
messages: 416986
nosy: gregory.p.smith, izbyshev, kevans, kevans91
priority: normal
severity: normal
status: open
title: os.closerange() can be no-op in a seccomp sandbox
type: behavior
versions: Python 3.10, Python 3.11




[issue47168] Improvements for stable ABI definition files

2022-04-08 Thread Petr Viktorin


Petr Viktorin  added the comment:

Thinking more about Doc/data/stable_abi.dat, I don't think the rename is worth 
it. The file is not meant to be used/edited by humans.
If someone needs the data for something other than running the Sphinx 
extension, let me know. We should provide a proper data source for their use 
case.

--




[issue47245] potential undefined behavior with subprocess using vfork() on Linux?

2022-04-08 Thread Марк Коренберг

Марк Коренберг  added the comment:

I have studied assembler output of _posixsubprocess.o compilation. Yes, 
everything seems safe. So, I'm closing the bug.

--
resolution:  -> works for me
stage: test needed -> resolved
status: open -> closed




[issue47168] Improvements for stable ABI definition files

2022-04-08 Thread Petr Viktorin


Change by Petr Viktorin :


--
pull_requests: +30441
pull_request: https://github.com/python/cpython/pull/32415




[issue47138] Pin Jinja2 to fix docs build

2022-04-08 Thread Łukasz Langa

Łukasz Langa  added the comment:


New changeset d35af52caae844cb4ea0aff06fa3fc5328708af1 by m-aciek in branch 
'3.8':
[3.8] bpo-47138: Fix documentation build by pinning Jinja version to 3.0.3 
(GH-32109)
https://github.com/python/cpython/commit/d35af52caae844cb4ea0aff06fa3fc5328708af1


--
nosy: +lukasz.langa




[issue47234] PEP-484 "numeric tower" approach makes it hard/impossible to specify contracts in documentation

2022-04-08 Thread Thomas Fischbacher


Thomas Fischbacher  added the comment:

Re AlexWaygood:

If these PEP-484 related things were so obvious that they would admit a compact 
description of the problem in 2-3 lines, these issues would likely have been 
identified much earlier. We would not be seeing them now, given that Python by 
and large is a somewhat mature language.

--




[issue47169] Stable ABI: Some optional (#ifdef'd) functions aren't handled correctly

2022-04-08 Thread Petr Viktorin


Change by Petr Viktorin :


--
pull_requests: +30440
pull_request: https://github.com/python/cpython/pull/32414




[issue47234] PEP-484 "numeric tower" approach makes it hard/impossible to specify contracts in documentation

2022-04-08 Thread Alex Waygood


Alex Waygood  added the comment:

Please try to make your messages more concise.

--




[issue47259] string sorting often incorrect

2022-04-08 Thread Raymond Hettinger


Change by Raymond Hettinger :


--
nosy: +rhettinger




[issue23312] google thinks the docs are mobile unfriendly

2022-04-08 Thread Petr Viktorin


Change by Petr Viktorin :


--
resolution:  -> fixed
stage: needs patch -> resolved
status: open -> closed




[issue23312] google thinks the docs are mobile unfriendly

2022-04-08 Thread Petr Viktorin

Petr Viktorin  added the comment:

This has been solved by the new theme. The Google report linked above shows 
“Passed” and “96” (out of 100).
It does show a few opportunities to improve, but many seem to indirectly 
complain that the page is big.

I'm closing the issue.

--
nosy: +petr.viktorin




[issue47229] IDLE UI crashes on Chromebook Linux/Bullseye

2022-04-08 Thread Doug Bates


Doug Bates  added the comment:

'cc' Terry to say thank you.

Just fyi I regressed my Chromebook to Debian/Buster from Bullseye, as IDLE
and Thonny had previously worked seamlessly but now it doesn't work on
Bullseye either  -> so Google must have broken something along the way
upgrading ChromeOS [98..101].

I have reported this via Google's bug reporter.  Fingers crossed and thanks
for your help :-)

On Thu, Apr 7, 2022 at 6:28 AM Terry J. Reedy 
wrote:

>
> Terry J. Reedy  added the comment:
>
> "Thonny GUI uses the same as IDLE": I presume this means that Thonny also
> uses tkinter and both fail, which means that tkinter is not working right.
>  Your test run indicates that python is not running correctly either.  It
> only tried to run 10 of what should be over 400 tests and 9 of those failed
> because of failure to import the test file.  You should report this to
> whoever supplies python on Chromebook, which I presume is whoever supplies
> Bullseye.
>
> --
> resolution:  -> third party
> stage:  -> resolved
> status: open -> closed
> title: IDLE / Thonny UI crashes on Chromebook Linux/Bullseye -> IDLE  UI
> crashes on Chromebook Linux/Bullseye

--
title: Python tests fail on Chromebook Linux/Bullseye -> IDLE UI crashes on 
Chromebook Linux/Bullseye




[issue47046] Add `f_state` attribute to FrameObjects.

2022-04-08 Thread Mark Shannon


Mark Shannon  added the comment:

Don't you need to know if a "call" event is a call or the resumption of a 
generator?
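
For context, a small sketch of why the distinction matters: with `sys.settrace`, a generator produces a "call" event every time it is resumed, not just when it first starts. The `tracer`, `gen` and `run` names are illustrative only:

```python
import sys

call_events = []


def tracer(frame, event, arg):
    # Record only "call" events that belong to the generator below.
    if event == "call" and frame.f_code.co_name == "gen":
        call_events.append(event)
    return None  # no per-line tracing needed for this sketch


def gen():
    yield 1
    yield 2


def run():
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return list(gen())
    finally:
        sys.settrace(previous)
```

Exhausting this generator takes three `next()` steps, and each one shows up as a separate "call" event, so the event name alone does not say whether a frame is starting or resuming.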

--




[issue45116] Performance regression 3.10b1: inlining issue in the big _PyEval_EvalFrameDefault() function with Visual Studio (MSC)

2022-04-08 Thread Steve Dower


Steve Dower  added the comment:

> __assume(0) should be replaced with other function, inside the eval 
> switch-case or in the inlined paths of callees. This is critical with PGO.

Out of interest, have you done other experiments confirming this? The reference 
linked is talking about compiler throughput (i.e. how long it takes to 
compile), and while it hints that using __assume(0) may interfere with other 
optimisations, that isn't supported with any detail or analysis in the post.

--




[issue47103] Copy pgort140.dll when building for PGO

2022-04-08 Thread Steve Dower


Change by Steve Dower :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue47245] potential undefined behavior with subprocess using vfork() on Linux?

2022-04-08 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> 3. We have to fix error-path in order not to change heap state (contents and 
> allocations), possibly do not touch locks. During vfork() child execution - 
> the only parent THREAD (not the process) is blocked. For example, it's not 
> allowed to touch GIL. Child process may die unexpectedly and leave GIL 
> locked. Is it possible to rewrite children path for vfork() case without any 
> Py* calls ? As an idea, we can prepare all low-level things (all the pointers 
> to strings and plain values) before vfork(), so child code will use only that 
> data.

What specifically do you propose to fix? There is no problem with GIL if the 
child dies because the GIL is locked and unlocked only by the parent and the 
child never touches it. Similarly, only Py_* calls known to be safe are used. 
As for "pointers to strings", it's not clear to me what you mean, but if you 
mean allocations, they are already done before (v)fork(), since the child code 
is required to be async-signal-safe even if plain fork() is used.

--




[issue35823] Use vfork() in subprocess on Linux

2022-04-08 Thread Марк Коренберг

Марк Коренберг  added the comment:

Yes, you are almost right. Error-path is not so clear (it is discussed in 
another issue), but in general, yes, my previous comment is wrong. So, finally, 
there are no bugs around the stack at all.

--




[issue47245] potential undefined behavior with subprocess using vfork() on Linux?

2022-04-08 Thread Марк Коренберг

Марк Коренберг  added the comment:

So, finally: 

1. Regarding vfork() and stack - everything is nice. No bugs because libc has 
nasty hacks for stack restoration.

2. Having the ability to turn off vfork using environment variables is NICE. At 
least, one can easily compare the performance.

3. We have to fix error-path in order not to change heap state (contents and 
allocations), possibly do not touch locks. During vfork() child execution - the 
only parent THREAD (not the process) is blocked. For example, it's not allowed 
to touch GIL. Child process may die unexpectedly and leave GIL locked. Is it 
possible to rewrite children path for vfork() case without any Py* calls ? As 
an idea, we can prepare all low-level things (all the pointers to strings and 
plain values) before vfork(), so child code will use only that data.

--




[issue40421] [C API] Add public getter functions for the internal PyFrameObject structure

2022-04-08 Thread Mark Shannon


Change by Mark Shannon :


--
pull_requests: +30439
pull_request: https://github.com/python/cpython/pull/32413




[issue47259] string sorting often incorrect

2022-04-08 Thread Steven D'Aprano


Change by Steven D'Aprano :


--
nosy: +steven.daprano




[issue47257] add methods to get first and last elements of a range

2022-04-08 Thread paul rubin


paul rubin  added the comment:

Oh nice, I didn't realize you could do that.  len(range) and bool(range) (to 
test for empty) also work.  Ok I guess this enhancement is not needed.  I will 
close ticket, hope that is procedurally correct, otherwise feel free to fix.  
Thanks.
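
Putting those pieces together, a small helper along these lines covers the empty case as well (`first_and_last` is a hypothetical name, not an existing API):

```python
def first_and_last(r):
    """Return (first, last) for a range, or None if the range is empty."""
    if not r:  # bool(range) is False when the range has no elements
        return None
    return r[0], r[-1]


ascending = first_and_last(range(1, 5, 2))
descending = first_and_last(range(10, 0, -3))
empty = first_and_last(range(5, 1))
```

Indexing an empty range directly would raise IndexError, so the truthiness check comes first.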

--
resolution:  -> rejected
stage:  -> resolved
status: open -> closed




[issue47259] string sorting often incorrect

2022-04-08 Thread Pierre Ossman

New submission from Pierre Ossman :

There is a big gotcha in Python that is easily overlooked and should at the 
very least be more prominently pointed out in the documentation.

Sorting strings will produce results that are very confusing for humans.

It happens to work for ASCII, but will generally produce bad results for other 
things as code points do not always follow the alphabetical order.

The expressions chapter¹ mentions this fact, but you have to dig quite a bit to 
reach that. It also mentions that normalization is an issue, but it never 
mentions the issue about code point order versus alphabetical order.

The sorting tutorial mentions under "Odds and ends"² that you need to use a 
special key or comparison function to get locale aware sorting. It doesn't 
mention that this also includes respecting alphabetical order, which might be 
overlooked unless you are very familiar with how the sorting works. The 
tutorial is also something you have to dig a bit to reach.

Ideally string comparison would always be locale aware in a high level language 
such as Python. However, a smaller step would be a note on sorted()³ that extra 
care needs to be taken for strings as the default behaviour will produce 
unexpected results once your strings include anything outside the English 
alphabet.

¹ https://docs.python.org/3/reference/expressions.html
² https://docs.python.org/3/howto/sorting.html#odd-and-ends
³ https://docs.python.org/3/library/functions.html#sorted
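
A short sketch of the behaviour being described. The word list and helper names are made up for illustration, and `accent_insensitive_key` is only a crude approximation that is not locale aware; `locale.strxfrm` is the standard-library route to locale-aware keys:

```python
import locale
import unicodedata

# "\u00c4" is the precomposed letter A with diaeresis, as in the German "Äpfel".
words = ["zebra", "\u00c4pfel", "apple"]

# Default ordering compares code points, so U+00C4 sorts after "z" (U+007A).
by_code_point = sorted(words)


def accent_insensitive_key(text):
    """Approximation only: decompose, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


approximate = sorted(words, key=accent_insensitive_key)


def sorted_for_current_locale(items):
    """Sort using the collation rules of the currently active LC_COLLATE."""
    return sorted(items, key=locale.strxfrm)
```

`sorted_for_current_locale` only gives language-specific results after `locale.setlocale()` has selected a suitable locale; which locales are installed depends on the system.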

--
components: Interpreter Core
messages: 416972
nosy: CendioOssman
priority: normal
severity: normal
status: open
title: string sorting often incorrect




[issue40280] Consider supporting emscripten/webassembly as a build target

2022-04-08 Thread Christian Heimes


Change by Christian Heimes :


--
pull_requests: +30438
pull_request: https://github.com/python/cpython/pull/32412




[issue47257] add methods to get first and last elements of a range

2022-04-08 Thread Mark Dickinson


Mark Dickinson  added the comment:

> but it's messy and potentially tricky to get the actual first and last values 
> of the range

Doesn't simple indexing already provide what you need here?

>>> range(1, 5, 2)[0]  # first element of range
1
>>> range(1, 5, 2)[-1]  # last element of range
3

--
nosy: +mark.dickinson




[issue35823] Use vfork() in subprocess on Linux

2022-04-08 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

The preceding comment is wrong, see discussion in #47245 and 
https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 for explanation of why 
that bug report is irrelevant for CPython.

--




[issue47121] math.isfinite() can raise exception when called on a number

2022-04-08 Thread Thomas Fischbacher


Thomas Fischbacher  added the comment:

Tim, the problem may well be simply due to the documentation of math.isfinite() 
being off here.

This is what we currently have:

https://docs.python.org/3/library/math.html#math.isfinite

===
math.isfinite(x)
Return True if x is neither an infinity nor a NaN, and False otherwise. (Note 
that 0.0 is considered finite.)

New in version 3.2.
===

If this were re-worded as follows (and corresponding changes were made to other 
such functions), everyone would know what the expectations and behavior are:

===
math.isfinite(x)

If `x` is a `float` instance, this evaluates to `True` if `x` is
neither a float infinity nor a NaN, and `False` otherwise.
If `x` is not a `float` instance, this evaluates to
`math.isfinite(float(x))`.

New in version 3.2.
===

This would be an accurate defining description of the actual behavior. Note 
that, "thanks to PEP-484", this abbreviation would currently be ambiguous 
though:

===
math.isfinite(x)

If `x` is a float, this evaluates to `True` if `x` is
neither a float infinity nor a NaN, and `False` otherwise.
If `x` is not a float, this evaluates to
`math.isfinite(float(x))`.

New in version 3.2.
===

("ambiguous" since "float" means different things as a static type and as a 
numbers class - and it is not clear what would be referred to here).

Changing/generalizing the behavior might potentially be an interesting other 
proposal, but I would argue that then one would want to change the behavior of 
quite a few other functions here as well, and all this should then perhaps go 
into some other `xmath` (or so) module, a bit like it is with `cmath`.

However, since the Python philosophy is to not rely on bureaucracy to enforce 
contracts (as C++, Java, etc. do it), but instead to rely on people's ability 
to define their own contracts, making the math.isfinite() contract more 
accurate w.r.t. actual behavior in the CPython implementation via extra 
clarification looks like a good thing to do, no?
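
The re-worded description can be checked against a few inputs. In this sketch `isfinite_as_described` is a hypothetical helper that spells out the proposed wording, and the sample values are arbitrary:

```python
import math
from decimal import Decimal
from fractions import Fraction


def isfinite_as_described(x):
    """Spell out the proposed wording: floats directly, everything else via float()."""
    if isinstance(x, float):
        return math.isfinite(x)
    return math.isfinite(float(x))


samples = [0.0, float("inf"), float("nan"), 7, Fraction(1, 3), Decimal("Infinity")]
agrees = all(math.isfinite(x) == isfinite_as_described(x) for x in samples)

# An int that is perfectly finite but too large for a float makes the call raise.
try:
    math.isfinite(10**400)
    raised_overflow = False
except OverflowError:
    raised_overflow = True
```

The last case is the one the issue title refers to: the value is a number, yet the conversion through `float()` fails before finiteness is ever tested.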

--




[issue46263] FreeBSD buildbots cannot compile Python

2022-04-08 Thread Kubilay Kocak


Change by Kubilay Kocak :


--
nosy: +koobs




[issue47258] Python 3.10 hang at exit in drop_gil() (due to resource warning at exit?)

2022-04-08 Thread Richard Purdie


New submission from Richard Purdie :

We had a python hang at shutdown. The gdb python backtrace and C backtraces are 
below. It is hung in the COND_WAIT(gil->switch_cond, gil->switch_mutex) call in 
drop_gil().

handle_system_exit() -> Py_Exit -> Py_FinalizeEx -> PyGC_Collect -> 
handle_weakrefs -> drop_gil 

I think from the stack trace it may have been printing the warning:

sys:1: ResourceWarning: unclosed file <_io.TextIOWrapper 
name='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/build-st-1560250/bitbake-cookerdaemon.log'
 mode='a+' encoding='UTF-8'>

however I'm not sure if it was that or trying to show a different exception. 
Even if we have a resource leak, it shouldn't really hang!
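
For readers unfamiliar with the warning quoted above, here is a minimal sketch
of how an unclosed file produces a ResourceWarning when it is finalized. It
does not reproduce the hang; the `leak_file` helper and the `example.log` file
name are hypothetical and only serve the illustration.

```python
import gc
import os
import tempfile
import warnings


def leak_file(path):
    # Open a file and drop the only reference without closing it.
    f = open(path, "a+", encoding="UTF-8")
    del f


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "example.log")  # hypothetical file name
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ResourceWarning is ignored by default
        leak_file(log_path)
        gc.collect()  # only matters on implementations without refcounting
    messages = [
        str(w.message) for w in caught
        if issubclass(w.category, ResourceWarning)
    ]

print(messages)
```

In the report the same kind of warning would be emitted during interpreter
shutdown rather than in the middle of a running program.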

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.10/weakref.py", line 106, in remove
def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):
  Garbage-collecting

#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, 
op=393, expected=0, futex_word=0x7f0f7bd54b20 <_PyRuntime+512>) at 
futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7f0f7bd54b20 
<_PyRuntime+512>, expected=expected@entry=0, clockid=clockid@entry=0, 
abstime=abstime@entry=0x0, 
private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87
#2  0x7f0f7b88979f in __GI___futex_abstimed_wait_cancelable64 
(futex_word=futex_word@entry=0x7f0f7bd54b20 <_PyRuntime+512>, 
expected=expected@entry=0, clockid=clockid@entry=0, 
abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139
#3  0x7f0f7b88beb0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, 
mutex=0x7f0f7bd54b28 <_PyRuntime+520>, cond=0x7f0f7bd54af8 <_PyRuntime+472>) at 
pthread_cond_wait.c:504
#4  ___pthread_cond_wait (cond=cond@entry=0x7f0f7bd54af8 <_PyRuntime+472>, 
mutex=mutex@entry=0x7f0f7bd54b28 <_PyRuntime+520>) at pthread_cond_wait.c:619
#5  0x7f0f7bb388d8 in drop_gil (ceval=0x7f0f7bd54a78 <_PyRuntime+344>, 
ceval2=, tstate=0x558744ef7c10)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval_gil.h:182
#6  0x7f0f7bb223e8 in eval_frame_handle_pending (tstate=) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1185
#7  _PyEval_EvalFrameDefault (tstate=, f=, 
throwflag=) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:1775
#8  0x7f0f7bb19600 in _PyEval_EvalFrame (throwflag=0, 
f=Frame 0x7f0f7a0c8a60, for file /usr/lib64/python3.10/weakref.py, line 
106, in remove (wr=, selfref=, _atomic_removal=), tstate=0x558744ef7c10)
at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/internal/pycore_ceval.h:46
#9  _PyEval_Vector (tstate=, con=, 
locals=, args=, argcount=1, kwnames=)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/ceval.c:5065
#10 0x7f0f7bb989a8 in _PyObject_VectorcallTstate (kwnames=0x0, 
nargsf=9223372036854775809, args=0x7fff8b815bc8, callable=, 
tstate=0x558744ef7c10) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:114
#11 PyObject_CallOneArg (func=, 
arg=) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Include/cpython/abstract.h:184
#12 0x7f0f7bb0fce1 in handle_weakrefs (old=0x558744edbd30, 
unreachable=0x7fff8b815c70) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:887
#13 gc_collect_main (tstate=0x558744ef7c10, generation=2, 
n_collected=0x7fff8b815d50, n_uncollectable=0x7fff8b815d48, nofail=0)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1281
#14 0x7f0f7bb9194e in gc_collect_with_callback 
(tstate=tstate@entry=0x558744ef7c10, generation=generation@entry=2)
at /usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:1413
#15 0x7f0f7bbc827e in PyGC_Collect () at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Modules/gcmodule.c:2099
#16 0x7f0f7bbc7bc2 in Py_FinalizeEx () at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:1781
#17 0x7f0f7bbc7d7c in Py_Exit (sts=0) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pylifecycle.c:2858
#18 0x7f0f7bbc4fbb in handle_system_exit () at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:775
#19 0x7f0f7bbc4f3d in _PyErr_PrintEx (set_sys_last_vars=1, 
tstate=0x558744ef7c10) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:785
#20 PyErr_PrintEx (set_sys_last_vars=1) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:880
#21 0x7f0f7bbbcece in PyErr_Print () at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:886
#22 _PyRun_SimpleFileObject (fp=, filename=, 
closeit=1, flags=0x7fff8b815f18) at 
/usr/src/debug/python3.10-3.10.4-1.fc35.x86_64/Python/pythonrun.c:462
#23 0x7f0f7bbbcc57 in _PyRun_AnyFileObject (fp=0x558744ed9370, 
filename='/home/pokybuild/yocto-worker/oe-selftest-fedora/build/bitbake/bin/bitbake',
 closeit=1, 
flags=0x7fff8b815f18) at 

[issue47234] PEP-484 "numeric tower" approach makes it hard/impossible to specify contracts in documentation

2022-04-08 Thread Thomas Fischbacher


Thomas Fischbacher  added the comment:

This is not a partial duplicate of https://bugs.python.org/issue47121 about 
math.isfinite().
The problem there is about a specific function on which the documentation may 
be off -
I'll comment separately on that.


The problem here is: There is a semantic discrepancy between what the
term 'float' means "at run time", such as in a check like:

issubclass(type(x), float)

(I am deliberately writing it that way, given that isinstance() can, in general 
[but actually not for float], lie.)

and what the term 'float' means in a statically-checkable type annotation like:

def f(x: float) -> ... : ...

...and this causes headaches.
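
A minimal sketch of the discrepancy, using only the standard library (the
function name `double` is made up for this illustration):

```python
import numbers


def double(x: float) -> float:
    # Static checkers following PEP 484 accept an int argument here.
    return x * 2


# Run-time view: int is not a subclass of float.
print(issubclass(type(3), float))
print(issubclass(type(3.0), float))

# Both are instances of the abstract numbers.Real class.
print(isinstance(3, numbers.Real))
print(isinstance(3.0, numbers.Real))

# The call is accepted statically, yet the result is not a float at run time.
result = double(3)
print(type(result).__name__)
```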


The specific example ('middle_mean') illustrates the sort of weird
situations that arise due to this. (I discovered this recently when
updating some of our company's Python onboarding material, where the
aspiration naturally is to be extremely accurate with all claims.)

So, basically, there is a choice to make between these options:

Option A: Give up on the idea that "we want to be able to reason with
stringency about the behavior of code" / "we accept that there will be
gaps between what code does and what we can reason about".  (Not
really an option, especially with an eye on "writing secure code
requires being able to reason out everything with stringency".)

Option B: Accept the discrepancy and tell people that they have to be
mindful about float-the-dynamic-type being a different concept from
float-the-static-type.

Option C: Realize that having "float" mean different things for
dynamic and static typing was not a great idea to begin with, and get
everybody who wants to state things such as "this function parameter
can be any instance of a real number type" to use the type
`numbers.Real` instead (which may well need better support by
tooling), and to express "can be int or float" as `Union[int,
float]`.

Also, there is Option D: PEP-484 has quite a lot of other problems
where the design does not meet rather natural requirements, such as:
"I cannot introduce a newtype for 'a mapping where I know the key to
be a particular enum-type, but the value is type-parametric'
(so the new type would also be 1-parameter type-parametric)", and
this float-mess is merely one symptom of "maybe PEP-484 was approved
too hastily and should have been also scrutinized by people
from a community with more static typing experience".


Basically, Option B would spell out as: 'We expect users who use
static type annotations to write code like this, and expect them to be
aware of the fact that the four places where the term "float" occurs
refer to two different concepts':

def foo(x: float) -> float:
  """Returns the foo of the number `x`.

  Args:
    x: float, the number to foo.

  Returns:
    float, the value of the foo-function at `x`.
  """
  ...

...which actually is shorthand for...:

def foo(x: float  # Note: means float-or-int
  ) -> float  # Note: means float-or-int
  :
  """Returns the foo of the number `x`.

  Args:
    x: the number to foo, an instance of the `float` type.

  Returns:
    The value of the foo-function at `x`,
    as an instance of the `float` type.
  """
  ...

Option C (and perhaps D) appear - to me - to be the only viable
choices here. The pain with Option C is that it invalidates/changes
the meaning of already-written code that claims to follow PEP-484,
and the main point of Option D is all about: "If we have to cause
a new wound and open up the patient again, let's try to minimize
the number of times we have to do this."

Option C would amount to changing the meaning of...:

def foo(x: float) -> float:
  """Returns the foo of the number `x`.

  Args:
    x: float, the number to foo.

  Returns:
    float, the value of the foo-function at `x`.
  """
  ...

to "static type annotation float really means instance-of-float here"
(I do note that issubclass(numpy.float64, float), so passing a
numpy-float64 is expected to work here, which is good), and ask people
who would want to have functions that can process more generic real
numbers to announce this properly. So, we would end up with basically
a list of different things that a function-sketch like the one above
could turn into - depending on the author's intentions for
the function, some major cases being perhaps:

(a) ("this is supposed to strictly operate on float")
def foo(x: float) -> float:
  """Returns the foo of the number `x`.

  Args:
    x: the number to foo.

  Returns:
    the value of the foo-function at `x`.
  """

(b) ("this will eat any kind of real number")

def foo(x: numbers.Real) -> numbers.Real:
  """Returns the foo of the number `x`.

  Args:
    x: the number to foo.

  Returns:
    the value of the foo-function at `x`.
  """

(c) ("this will eat any kind of real number, but the result will always be 
float")

def foo(x: numbers.Real) -> float:
  """Returns the foo of the number `x`.

  Args:
    x: the number to foo.

  Returns:
the value of 

[issue43944] Processes in Python 3.9 exiting with code 1 when It's created inside a ThreadPoolExecutor

2022-04-08 Thread Thomas Grainger


Thomas Grainger  added the comment:

The problem is that multiprocessing/process.py calls threading._shutdown(), 
which tries to join its own thread, because 
concurrent.futures.thread._threads_queues contains the main thread in the 
subprocess.


  File "/home/graingert/miniconda3/envs/dask-distributed/lib/python3.10/multiprocessing/process.py", line 333, in _bootstrap
    threading._shutdown()
  File "/home/graingert/miniconda3/envs/dask-distributed/lib/python3.10/threading.py", line 1530, in _shutdown
    atexit_call()
  File "/home/graingert/miniconda3/envs/dask-distributed/lib/python3.10/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/home/graingert/miniconda3/envs/dask-distributed/lib/python3.10/threading.py", line 1086, in join
    raise RuntimeError("cannot join current thread")
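
The final frame can be illustrated in isolation. This is only a sketch of the
last step (a thread asking to join itself), not a reproduction of the
ThreadPoolExecutor and multiprocessing interaction; `join_error` is a name
introduced here for the example.

```python
import threading

join_error = None
try:
    # join() refuses to wait on the thread that is calling it.
    threading.current_thread().join()
except RuntimeError as exc:
    join_error = str(exc)

print(join_error)
```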

--
nosy: +graingert




[issue39442] from __future__ import annotations makes dataclasses.Field.type a string, not type

2022-04-08 Thread Marco Barisione


Marco Barisione  added the comment:

Actually, sorry I realise I can pass `include_extras` to `get_type_hints`.
Still, it would be nicer not to have to do that.
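
A minimal sketch of that workaround follows. To keep the snippet
self-contained, it assumes a string annotation as a stand-in for
`from __future__ import annotations`, which stores the same text in
`Field.type`; the variable names are made up for the example.

```python
import dataclasses
from typing import Annotated, get_type_hints


@dataclasses.dataclass
class C:
    # Stand-in for what the __future__ import would store for this field.
    v: "Annotated[int, 'foo']"


field_type = dataclasses.fields(C)[0].type
plain = get_type_hints(C)["v"]
full = get_type_hints(C, include_extras=True)["v"]

print(repr(field_type))   # the raw annotation string
print(plain)              # Annotated wrapper stripped
print(full.__metadata__)  # metadata preserved by include_extras=True
```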

--




[issue47245] potential undefined behavior with subprocess using vfork() on Linux?

2022-04-08 Thread Alexey Izbyshev


Alexey Izbyshev  added the comment:

> As for glibc specifics, I'm mostly thinking of the calls we do in the child.

> According to the "Standard Description (POSIX.1)" calls to anything other 
> than `_exit()` or `exec*()` are not allowed.  But the longer "Linux 
> Description" in that vfork(2) man page does not say that.  Which implies 
> merely by omission that calls to other things are okay so long as you 
> understand everything they do to the process heap/stack/state.  (I wish it 
> were *explicit* about that)

If we're talking about the kernel side of things, sure, we rely on Linux being 
"sane" here, though I suppose on *BSDs the situation is similar.

> Some of the calls we do from our child_exec() code... many are likely "just" 
> syscall shims and thus fine - but that is technically up to libc.

Yes, but I wouldn't say that "being just syscall shims" is specific to glibc. 
It's just a "natural" property that just about any libc is likely to possess. 
(Yeah, I know, those are vague words, but in my experience "glibc-specific" is 
usually applied to some functionality/bug present in glibc and absent in other 
libcs, and I don't think we rely on something like that).

Of course, there are also LD_PRELOAD things that could be called instead of 
libc, but the good news here is that we don't create new constraints for them 
(CPython is not the only software that uses vfork()), and they're on their own 
otherwise.

> A few others are Py functions that go elsewhere in CPython and while they may 
> be fine for practical reasons today with dangerous bits on conditional 
> branches that technically should not be possible to hit given the state by 
> the time we're at this point in _posixsubprocess, pose a future risk - anyone 
> touching implementations of those is likely unaware of vfork'ed child 
> limitations that must be met.

We already have an async-signal-safety requirement for all such code because 
of fork(). Requirements of vfork() are a bit more strict, but at least the set 
of functions we have to watch for dangerous changes is the same. And I suspect 
that most practical violations of vfork()-safety also violate 
async-signal-safety.

> For example if one of the potential code paths that trigger an indirect 
> Py_FatalError() is hit... that fatal exit code is definitely not 
> post-vfork-child safe.  The pre-exec child dying via that could screw up the 
> vfork parent process's state.

Yeah, and it can break the fork parent too, at least because it uses exit() 
(not _exit()), so stdio buffers will be flushed twice, in the child and in the 
parent.
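
The double-flush effect can be sketched in Python with os.fork() (POSIX only). 
This is an analogy, not the C code path: a buffered Python file object stands 
in for a stdio buffer, an explicit flush() in the child stands in for what 
exit() would do, and `fork_and_collect` is a hypothetical helper name.

```python
import os


def fork_and_collect(child_flushes: bool) -> str:
    """Return everything written to a pipe by a parent and its forked child."""
    read_fd, write_fd = os.pipe()
    out = os.fdopen(write_fd, "w")   # block-buffered text stream
    out.write("hello ")              # stays in the user-space buffer for now
    pid = os.fork()
    if pid == 0:
        # Child: it inherited a copy of the unflushed buffer.
        if child_flushes:
            out.flush()              # stands in for exit() flushing buffers
        os._exit(0)                  # terminate without running any cleanup
    os.waitpid(pid, 0)
    out.close()                      # parent flushes its own copy
    with os.fdopen(read_fd) as reader:
        return reader.read()


print(repr(fork_and_collect(child_flushes=True)))
print(repr(fork_and_collect(child_flushes=False)))
```

With the flush in the child the buffered text reaches the pipe twice; with a 
bare os._exit() it is written once, by the parent only.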

--




[issue39442] from __future__ import annotations makes dataclasses.Field.type a string, not type

2022-04-08 Thread Marco Barisione


Marco Barisione  added the comment:

This is particularly annoying if you are using `Annotated` with a dataclass.

For instance:
```
from __future__ import annotations

import dataclasses
from typing import Annotated, get_type_hints


@dataclasses.dataclass
class C:
    v: Annotated[int, "foo"]


v_type = dataclasses.fields(C)[0].type
print(repr(v_type))  # "Annotated[int, 'foo']"
print(repr(get_type_hints(C)["v"]))  # <class 'int'>
print(repr(eval(v_type)))  # typing.Annotated[int, 'foo']
```

In the code above it looks like the only way to get the `Annotated` so you can 
get its args is using `eval`. The problem is that, in non-trivial examples, 
`eval` would not be simple to use as you need to consider globals and locals, 
see https://peps.python.org/pep-0563/#resolving-type-hints-at-runtime.

--
nosy: +barisione




[issue47257] add methods to get first and last elements of a range

2022-04-08 Thread paul rubin


New submission from paul rubin :

Inspired by a question on comp.lang.python about how to deal with an int set 
composed of integers and ranges.  Range objects like range(1,5,2) contain 
start, stop, and step values, but it's messy and potentially tricky to get the 
actual first and last values of the range.  Examples:

range(1,5,2) - first = 1, last = 3

range(5, 1, 2) - range is empty, first = last = None

range(5, 1, -1) - first is 5, last is 2

Note that in the case where the range is not empty, you can get the "last" by 
a messy calculation, but it's easier to pick the first element from the 
reverse iterator.  But then you might forget to catch the StopIteration 
exception in the case that the range is empty.  The same goes for the first 
element, roughly.  And constructing the iterators just to pick one element 
seems like unnecessary overhead.

So it is better to have actual methods for these, with type Optional[int].  
Then mypy should remind you to check for the empty case if you forget.
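
A minimal sketch of the proposed semantics, written as free functions rather 
than methods. The names `range_first` and `range_last` are hypothetical; the 
sketch relies on range objects supporting truth testing and indexing.

```python
from typing import Optional


def range_first(r: range) -> Optional[int]:
    """Return the first element of r, or None if r is empty."""
    return r[0] if r else None


def range_last(r: range) -> Optional[int]:
    """Return the last element of r, or None if r is empty."""
    return r[-1] if r else None


for r in (range(1, 5, 2), range(5, 1, 2), range(5, 1, -1)):
    print(r, range_first(r), range_last(r))
```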

--
messages: 416962
nosy: phr
priority: normal
severity: normal
status: open
title: add methods to get first and last elements of a range
type: enhancement




[issue47248] Possible slowdown of regex searching in 3.11

2022-04-08 Thread Ma Lin


Ma Lin  added the comment:

> Possibly related to the new atomic grouping support from GH-31982?

That seems unlikely.
I will run some benchmarks for this issue; more information (version/platform) 
is welcome.

--




[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin


Change by Ma Lin :


--
keywords: +patch
pull_requests: +30437
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32411




[issue47256] re: limit the maximum capturing group to 1,073,741,823, reduce sizeof(match_context).

2022-04-08 Thread Ma Lin

New submission from Ma Lin :

These changes reduce sizeof(match_context):
- 32-bit build: 36 bytes, no change.
- 64-bit build: 72 bytes -> 56 bytes.

sre uses a stack and the `match_context` struct to simulate recursive calls; a 
smaller struct brings:
- deeper recursive calls
- less memory consumption
- fewer memory reallocations

Here is a test: if the stack size is limited to 1 GiB, the maximum usable 
value of n is:

re.match(r'(ab)*', n * 'ab')   # need to save MARKs
72 bytes: n = 11,184,808
64 bytes: n = 12,201,609
56 bytes: n = 13,421,770

re.match(r'(?:ab)*', n * 'ab') # no need to save MARKs
72 bytes: n = 13,421,770
64 bytes: n = 14,913,078
56 bytes: n = 16,777,213
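
For reference, a small-scale run of the two test patterns is sketched below. 
It uses n = 1000, an arbitrary value chosen here, and says nothing about 
memory use; it only shows what each pattern matches and that just the first 
one records a group.

```python
import re

n = 1000  # arbitrary small value, far below the limits listed above
text = n * "ab"

capturing = re.match(r"(ab)*", text)        # group 1 is updated per repetition
non_capturing = re.match(r"(?:ab)*", text)  # no group to record

print(capturing.end(), capturing.group(1), capturing.span(1))
print(non_capturing.end(), non_capturing.groups())
```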

1,073,741,823 capturing groups should be enough for almost all users.
If the limit were 16,383 (2-byte integer), the context size could be reduced 
further. But some program-generated patterns may have more than this number of 
capturing groups.

1️⃣Performance:

Before
regex_dna: Mean +- std dev: 149 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.22 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark[1]: 13.9 sec +- 0.0 sec

Commit 1. limit the maximum capture group to 1,073,741,823
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.3 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.0 sec

Commit 2. further reduce sizeof(SRE(match_context))
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.16 ms +- 0.02 ms
regex_v8: Mean +- std dev: 22.2 ms +- 0.1 ms
my benchmark: 13.8 sec +- 0.1 sec

If the types of toplevel/jump are further changed from int to char, 
sizeof(match_context) is reduced from 36 to 32 in a 32-bit build (still 56 in 
a 64-bit build). But it's slower on a 64-bit build, so I didn't adopt it:
regex_dna: Mean +- std dev: 150 ms +- 1 ms
regex_effbot: Mean +- std dev: 2.18 ms +- 0.01 ms
regex_v8: Mean +- std dev: 22.4 ms +- 0.1 ms
my benchmark: 14.1 sec +- 0.0 sec

2️⃣ The type of match_context.count is Py_ssize_t
- If it is changed to a 4-byte integer, some engine code needs to be modified.
- If it is kept as Py_ssize_t, SRE_MAXREPEAT may be >= 4 GiB in future 
  versions. Currently SRE_MAXREPEAT can't be >= 4 GiB.
So the type of match_context.count is unchanged.

[1] My re benchmark, which uses 16 patterns to process 100 MiB of text data:
https://github.com/animalize/re_benchmarks

--
components: Library (Lib)
messages: 416960
nosy: ezio.melotti, malin, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
status: open
title: re: limit the maximum capturing group to 1,073,741,823, reduce 
sizeof(match_context).
type: resource usage
versions: Python 3.11




[issue47248] Possible slowdown of regex searching in 3.11

2022-04-08 Thread Dennis Sweeney


Dennis Sweeney  added the comment:

Possibly related to the new atomic grouping support from GH-31982?

--
nosy: +Dennis Sweeney, serhiy.storchaka
