[Python-checkins] GH-116554: Relax list.sort()'s notion of "descending" runs (#116578)
https://github.com/python/cpython/commit/bf121d6a694bea4fe9864a19879fe0c70c4e0656
commit: bf121d6a694bea4fe9864a19879fe0c70c4e0656
branch: main
author: Tim Peters
committer: tim-one
date: 2024-03-12T19:59:42-05:00
summary:
GH-116554: Relax list.sort()'s notion of "descending" runs (#116578)
* GH-116554: Relax list.sort()'s notion of "descending" run
Rewrote `count_run()` so that sub-runs of equal elements no longer end a
descending run. Both ascending and descending runs can have arbitrarily many
sub-runs of arbitrarily many equal elements now. This is tricky, because we
only use ``<`` comparisons, so checking for equality doesn't come "for free".
Surprisingly, it turned out there's a very cheap (one comparison) way to
determine whether an ascending run consisted of all-equal elements. That sealed
the deal.
In addition, after a descending run is reversed in-place, we now go on to see
whether it can be extended by an ascending run that just happens to be
adjacent. This succeeds in finding at least one additional element to append
about half the time, and so appears to more than repay its cost (the savings
come from getting to skip a binary search, when a short run is artificially
forced to length MINRUN later, for each new element `count_run()` can add to
the initial run).
While these have been in the back of my mind for years, a question on
StackOverflow pushed it to action:
https://stackoverflow.com/questions/78108792/
They were wondering why it took about 4x longer to sort a list like:
[999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]
than "similar" lists. Of course that runs very much faster after this patch.
Co-authored-by: Alex Waygood
Co-authored-by: Pieter Eendebak
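This relaxed run detection can be modeled in pure Python. The sketch below is
hypothetical (the real `count_run()` is C, and also does extra bookkeeping so
that reversing a descending run containing equal elements stays stable); it
shows how, using only `<` comparisons, equal neighbors no longer terminate a
descending run:

```python
def count_run(a, lo):
    """Return (length, descending) for the run starting at a[lo].

    Only `<` comparisons are used, as in CPython's listsort. A
    "descending" run is now merely non-ascending: equal neighbors
    (detected as `not (x < y)`) extend it instead of ending it.
    """
    n = len(a)
    hi = lo + 1
    if hi == n:
        return 1, False
    if a[hi] < a[lo]:
        # strictly smaller: a (possibly equal-studded) descending run
        descending = True
        while hi + 1 < n and not (a[hi] < a[hi + 1]):  # a[hi+1] <= a[hi]
            hi += 1
    else:
        descending = False
        while hi + 1 < n and not (a[hi + 1] < a[hi]):  # a[hi+1] >= a[hi]
            hi += 1
    return hi - lo + 1, descending
```

Under the old, strict rule a list like [5, 4, 4, 3, 3, 2] would split after the
first 4; here it is recognized as one descending run of length 6.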
files:
A Misc/NEWS.d/next/Core and
Builtins/2024-03-11-00-45-39.gh-issue-116554.gYumG5.rst
M Lib/test/test_sort.py
M Objects/listobject.c
M Objects/listsort.txt
diff --git a/Lib/test/test_sort.py b/Lib/test/test_sort.py
index 3b6ad4d17b0416..2a7cfb7affaa21 100644
--- a/Lib/test/test_sort.py
+++ b/Lib/test/test_sort.py
@@ -128,6 +128,27 @@ def bad_key(x):
x = [e for e, i in augmented] # a stable sort of s
check("stability", x, s)
+def test_small_stability(self):
+from itertools import product
+from operator import itemgetter
+
+# Exhaustively test stability across all lists of small lengths
+# and only a few distinct elements.
+# This can provoke edge cases that randomization is unlikely to find.
+# But it can grow very expensive quickly, so don't overdo it.
+NELTS = 3
+MAXSIZE = 9
+
+pick0 = itemgetter(0)
+for length in range(MAXSIZE + 1):
+# There are NELTS ** length distinct lists.
+for t in product(range(NELTS), repeat=length):
+xs = list(zip(t, range(length)))
+# Stability forced by index in each element.
+forced = sorted(xs)
+# Use key= to hide the index from compares.
+native = sorted(xs, key=pick0)
+self.assertEqual(forced, native)
#==
class TestBugs(unittest.TestCase):
diff --git a/Misc/NEWS.d/next/Core and
Builtins/2024-03-11-00-45-39.gh-issue-116554.gYumG5.rst b/Misc/NEWS.d/next/Core
and Builtins/2024-03-11-00-45-39.gh-issue-116554.gYumG5.rst
new file mode 100644
index 00..82f92789de0a39
--- /dev/null
+++ b/Misc/NEWS.d/next/Core and
Builtins/2024-03-11-00-45-39.gh-issue-116554.gYumG5.rst
@@ -0,0 +1 @@
+``list.sort()`` now exploits more cases of partial ordering, particularly
those with long descending runs with sub-runs of equal values. Those are
recognized as single runs now (previously, each block of repeated values caused
a new run to be created).
diff --git a/Objects/listobject.c b/Objects/listobject.c
index 759902c06b4ef3..7179d5929e2148 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -1618,10 +1618,11 @@ struct s_MergeState {
/* binarysort is the best method for sorting small arrays: it does
few compares, but can do data movement quadratic in the number of
elements.
- [lo, hi) is a contiguous slice of a list, and is sorted via
+ [lo.keys, hi) is a contiguous slice of a list of keys, and is sorted via
binary insertion. This sort is stable.
- On entry, must have lo <= start <= hi, and that [lo, start) is already
- sorted (pass start == lo if you don't know!).
+ On entry, must have lo.keys <= start <= hi, and that
+ [lo.keys, start) is already sorted (pass start == lo.keys if you don't
+ know!).
If islt() complains return -1, else 0.
Even in case of error, the output slice will be some permutation of
the input (nothing is lost or duplicated).
@@ -1634,7 +1635,7 @@ binarysort(MergeState *ms, sortslice lo, PyObject **hi,
Py
[Python-checkins] GH-116939: Rewrite binarysort() (#116940)
https://github.com/python/cpython/commit/8383915031942f441f435a5ae800790116047b80
commit: 8383915031942f441f435a5ae800790116047b80
branch: main
author: Tim Peters
committer: tim-one
date: 2024-03-21T22:27:25-05:00
summary:
GH-116939: Rewrite binarysort() (#116940)
Rewrote binarysort() for clarity.
Also changed the signature to be more coherent (it was mixing sortslice with
raw pointers).
No change in method or functionality. However, I left some experiments in,
disabled for now via `#if` tricks. Since this code was first written, some
kinds of comparisons have gotten enormously faster (like for lists of floats),
which changes the tradeoffs.
For example, plain insertion sort's simpler innermost loop and highly
predictable branches leave it very competitive (even beating, by a bit) binary
insertion when comparisons are very cheap, despite that it can do many more
compares. And it wins big on runs that are already sorted (moving the next one
in takes only 1 compare then).
So I left code for a plain insertion sort, to make future experimenting easier.
Also made the maximum value of minrun a `#define` (`MAX_MINRUN`) to make
experimenting with that easier too.
And another bit of `#if`-disabled code rewrites binary insertion's innermost
loop to remove its unpredictable branch. Surprisingly, this doesn't really seem
to help overall. I'm unclear on why not. It certainly adds more instructions,
but they're very simple, and it's hard to believe they cost as much as a
branch miss.
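The binary-insertion idea itself can be sketched in Python. This is a
hypothetical stand-in for the C `binarysort()`, using the stdlib `bisect`
module instead of a hand-rolled search; the `ok` parameter mirrors the C
signature's "a[:ok] is already sorted":

```python
import bisect

def binary_insertion_sort(a, ok=1):
    """Sort list `a` in place; a[:ok] is assumed already sorted.

    A binary search finds each new element's slot (few compares), but
    insertion still moves O(n) elements per step in the worst case.
    bisect_right keeps the sort stable: an element equal to earlier
    ones is placed after them.
    """
    for i in range(ok, len(a)):
        pivot = a[i]
        slot = bisect.bisect_right(a, pivot, 0, i)
        # shift a[slot:i] right by one and drop the pivot into place
        a[slot + 1:i + 1] = a[slot:i]
        a[slot] = pivot
    return a
```

The few-compares/many-moves tradeoff is exactly what the commit message weighs
against plain insertion sort's cheaper, more predictable inner loop.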
files:
M Objects/listobject.c
M Objects/listsort.txt
diff --git a/Objects/listobject.c b/Objects/listobject.c
index fc20a9bff3af47..470ad8eb8135db 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -1628,6 +1628,15 @@ sortslice_advance(sortslice *slice, Py_ssize_t n)
/* Avoid malloc for small temp arrays. */
#define MERGESTATE_TEMP_SIZE 256
+/* The largest value of minrun. This must be a power of 2, and >= 1, so that
+ * the compute_minrun() algorithm guarantees to return a result no larger than
+ * this.
+ */
+#define MAX_MINRUN 64
+#if ((MAX_MINRUN) < 1) || ((MAX_MINRUN) & ((MAX_MINRUN) - 1))
+#error "MAX_MINRUN must be a power of 2, and >= 1"
+#endif
+
/* One MergeState exists on the stack per invocation of mergesort. It's just
* a convenient way to pass state around among the helper functions.
*/
@@ -1685,68 +1694,133 @@ struct s_MergeState {
int (*tuple_elem_compare)(PyObject *, PyObject *, MergeState *);
};
-/* binarysort is the best method for sorting small arrays: it does
- few compares, but can do data movement quadratic in the number of
- elements.
- [lo.keys, hi) is a contiguous slice of a list of keys, and is sorted via
- binary insertion. This sort is stable.
- On entry, must have lo.keys <= start <= hi, and that
- [lo.keys, start) is already sorted (pass start == lo.keys if you don't
- know!).
- If islt() complains return -1, else 0.
+/* binarysort is the best method for sorting small arrays: it does few
+ compares, but can do data movement quadratic in the number of elements.
+ ss->keys is viewed as an array of n keys, a[:n]. a[:ok] is already sorted.
+ Pass ok = 0 (or 1) if you don't know.
+ It's sorted in-place, by a stable binary insertion sort. If ss->values
+ isn't NULL, it's permuted in lockstep with ss->keys.
+ On entry, must have n >= 1, and 0 <= ok <= n <= MAX_MINRUN.
+ Return -1 if comparison raises an exception, else 0.
Even in case of error, the output slice will be some permutation of
the input (nothing is lost or duplicated).
*/
static int
-binarysort(MergeState *ms, sortslice lo, PyObject **hi, PyObject **start)
+binarysort(MergeState *ms, const sortslice *ss, Py_ssize_t n, Py_ssize_t ok)
{
-Py_ssize_t k;
-PyObject **l, **p, **r;
+Py_ssize_t k; /* for IFLT macro expansion */
+PyObject ** const a = ss->keys;
+PyObject ** const v = ss->values;
+const bool has_values = v != NULL;
PyObject *pivot;
-
-assert(lo.keys <= start && start <= hi);
-/* assert [lo.keys, start) is sorted */
-if (lo.keys == start)
-++start;
-for (; start < hi; ++start) {
-/* set l to where *start belongs */
-l = lo.keys;
-r = start;
-pivot = *r;
-/* Invariants:
- * pivot >= all in [lo.keys, l).
- * pivot < all in [r, start).
- * These are vacuously true at the start.
+Py_ssize_t M;
+
+assert(0 <= ok && ok <= n && 1 <= n && n <= MAX_MINRUN);
+/* assert a[:ok] is sorted */
+if (! ok)
+++ok;
+/* Regular insertion sort has average- and worst-case O(n**2) cost
+ for both # of comparisons and number of bytes moved. But its branches
+ are highly predictable, and it loves sorted input (n-1 compares and no
+ data
[Python-checkins] gh-118164: str(10**10000) hangs if the C _decimal module is missing (#118503)
https://github.com/python/cpython/commit/999f0c512281995fb61a0d9eda075fd846e8c505
commit: 999f0c512281995fb61a0d9eda075fd846e8c505
branch: main
author: Tim Peters
committer: tim-one
date: 2024-05-04T18:22:33-05:00
summary:
gh-118164: str(10**10000) hangs if the C _decimal module is missing (#118503)
* Initial stab.
* Test the tentative fix. Hangs "forever" without this change.
* Move the new test to a better spot.
* New comment to explain why _convert_to_str allows any power of 10.
* Fixed a comment, and fleshed out an existing test that appeared unfinished.
* Added temporary asserts. Or maybe permanent ;-)
* Update Lib/_pydecimal.py
Co-authored-by: Serhiy Storchaka
* Remove the new _convert_to_str().
Serhiy and I independently concluded that exact powers of 10
aren't possible in these contexts, so just checking the
string length is sufficient.
* At least for now, add the asserts to the other block too.
* 📜🤖 Added by blurb_it.
-
Co-authored-by: Serhiy Storchaka
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
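The core trick of the fix can be stated in isolation: for a positive integer
`xc`, deciding whether `xc >= 10**p` never requires computing the huge power,
because the digit count of `xc` settles it. A hedged sketch (the helper name
is mine, not from the patch):

```python
def exceeds_pow10(xc, p):
    """True iff xc >= 10**p, for positive int xc, without computing 10**p.

    A positive integer xc has floor(log10(xc)) + 1 == len(str(xc)) decimal
    digits, so xc >= 10**p exactly when its decimal string is longer than p.
    """
    assert xc > 0
    return len(str(xc)) > p
```

The original code compared against `10**p` directly, which hangs when the
pure-Python decimal is asked for an astronomically large power; the string
length is already needed for the result anyway.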
files:
A Misc/NEWS.d/next/Library/2024-05-04-20-22-59.gh-issue-118164.9D02MQ.rst
M Lib/_pydecimal.py
M Lib/test/test_decimal.py
diff --git a/Lib/_pydecimal.py b/Lib/_pydecimal.py
index de4561a5ee050b..613123ec7b4329 100644
--- a/Lib/_pydecimal.py
+++ b/Lib/_pydecimal.py
@@ -2131,10 +2131,16 @@ def _power_exact(self, other, p):
else:
return None
-if xc >= 10**p:
+# An exact power of 10 is representable, but can convert to a
+# string of any length. But an exact power of 10 shouldn't be
+# possible at this point.
+assert xc > 1, self
+assert xc % 10 != 0, self
+strxc = str(xc)
+if len(strxc) > p:
return None
xe = -e-xe
-return _dec_from_triple(0, str(xc), xe)
+return _dec_from_triple(0, strxc, xe)
# now y is positive; find m and n such that y = m/n
if ye >= 0:
@@ -2184,13 +2190,18 @@ def _power_exact(self, other, p):
return None
xc = xc**m
xe *= m
-if xc > 10**p:
+# An exact power of 10 is representable, but can convert to a string
+# of any length. But an exact power of 10 shouldn't be possible at
+# this point.
+assert xc > 1, self
+assert xc % 10 != 0, self
+str_xc = str(xc)
+if len(str_xc) > p:
return None
# by this point the result *is* exactly representable
# adjust the exponent to get as close as possible to the ideal
# exponent, if necessary
-str_xc = str(xc)
if other._isinteger() and other._sign == 0:
ideal_exponent = self._exp*int(other)
zeros = min(xe-ideal_exponent, p-len(str_xc))
diff --git a/Lib/test/test_decimal.py b/Lib/test/test_decimal.py
index 7010c34792e093..e927e24b582a5d 100644
--- a/Lib/test/test_decimal.py
+++ b/Lib/test/test_decimal.py
@@ -4716,9 +4716,33 @@ def test_py_exact_power(self):
c.prec = 1
x = Decimal("152587890625") ** Decimal('-0.5')
+self.assertEqual(x, Decimal('3e-6'))
+c.prec = 2
+x = Decimal("152587890625") ** Decimal('-0.5')
+self.assertEqual(x, Decimal('2.6e-6'))
+c.prec = 3
+x = Decimal("152587890625") ** Decimal('-0.5')
+self.assertEqual(x, Decimal('2.56e-6'))
+c.prec = 28
+x = Decimal("152587890625") ** Decimal('-0.5')
+self.assertEqual(x, Decimal('2.56e-6'))
+
c.prec = 201
x = Decimal(2**578) ** Decimal("-0.5")
+# See https://github.com/python/cpython/issues/118027
+# Testing for an exact power could appear to hang, in the Python
+# version, as it attempted to compute 10**(MAX_EMAX + 1).
+# Fixed via https://github.com/python/cpython/pull/118503.
+c.prec = P.MAX_PREC
+c.Emax = P.MAX_EMAX
+c.Emin = P.MIN_EMIN
+c.traps[P.Inexact] = 1
+D2 = Decimal(2)
+# If the bug is still present, the next statement won't complete.
+res = D2 ** 117
+self.assertEqual(res, 1 << 117)
+
def test_py_immutability_operations(self):
# Do operations and check that it didn't change internal objects.
Decimal = P.Decimal
@@ -5705,7 +5729,6 @@ def test_format_fallback_rounding(self):
with C.localcontext(rounding=C.ROUND_DOWN):
self.assertEqual(format(y, '#.1f'), '6.0')
-
@requires_docstrings
@requires_cdecimal
class SignatureTest(unittest.TestCase):
diff --git
a/Misc/NEWS.d
[Python-checkins] gh-118610: Centralize power caching in `_pylong.py` (#118611)
https://github.com/python/cpython/commit/2f0a338be66e276a700f86af51d5ef54a138cbfb
commit: 2f0a338be66e276a700f86af51d5ef54a138cbfb
branch: main
author: Tim Peters
committer: tim-one
date: 2024-05-07T19:09:09-05:00
summary:
gh-118610: Centralize power caching in `_pylong.py` (#118611)
A new `compute_powers()` function computes all and only the powers of the base
the various base-conversion functions need, as efficiently as reasonably
possible (it turns out that invoking `**` is needed at most once). This typically
gives a few % speedup, but the primary point is to simplify the base-conversion
functions, which no longer need their own, ad hoc, and less efficient
power-caching schemes.
Co-authored-by: Serhiy Storchaka
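A condensed, runnable sketch of the committed `compute_powers()` follows.
It is slightly simplified: a fallback `base ** this` branch replaces the
original's assert that a half-exponent is always already cached:

```python
def compute_powers(w, base, more_than):
    """Return {e: base**e} for every exponent the halving recursion on w
    can need; widths <= more_than are handled directly and need no power."""
    # Collect the exponents lo = w >> 1 that inner() will ask for.
    seen, need, ws = set(), set(), {w}
    while ws:
        w = ws.pop()
        if w in seen or w <= more_than:
            continue
        seen.add(w)
        lo = w >> 1
        need.add(lo)
        ws.add(lo)
        if w & 1:
            ws.add(lo + 1)
    # Build each power from a smaller one: one multiply by base when the
    # predecessor exists, else a squaring (same-object multiply is fast
    # in CPython), plus one multiply by base for odd exponents.
    d = {}
    for this in sorted(need):
        if this - 1 in d:
            d[this] = d[this - 1] * base         # cheap
        elif this >> 1 in d:
            sq = d[this >> 1] * d[this >> 1]     # square
            if this & 1:
                sq *= base
            d[this] = sq
        else:
            d[this] = base ** this               # at most once, smallest
    return d
```

As the commit says, `**` ends up being invoked at most once (for the smallest
needed exponent); everything else is built by squaring and single multiplies.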
files:
M Lib/_pylong.py
M Lib/test/test_int.py
diff --git a/Lib/_pylong.py b/Lib/_pylong.py
index 30bee6fc9ef54c..4970eb3fa67b2b 100644
--- a/Lib/_pylong.py
+++ b/Lib/_pylong.py
@@ -19,6 +19,86 @@
except ImportError:
_decimal = None
+# A number of functions have this form, where `w` is a desired number of
+# digits in base `base`:
+#
+#def inner(...w...):
+#if w <= LIMIT:
+#return something
+#lo = w >> 1
+#hi = w - lo
+#something involving base**lo, inner(...lo...), j, and inner(...hi...)
+#figure out largest w needed
+#result = inner(w)
+#
+# They all had some on-the-fly scheme to cache `base**lo` results for reuse.
+# Power is costly.
+#
+# This routine aims to compute all and only the needed powers in advance, as
+# efficiently as reasonably possible. This isn't trivial, and all the
+# on-the-fly methods did needless work in many cases. The driving code above
+# changes to:
+#
+#figure out largest w needed
+#mycache = compute_powers(w, base, LIMIT)
+#result = inner(w)
+#
+# and `mycache[lo]` replaces `base**lo` in the inner function.
+#
+# While this does give minor speedups (a few percent at best), the primary
+# intent is to simplify the functions using this, by eliminating the need for
+# them to craft their own ad-hoc caching schemes.
+def compute_powers(w, base, more_than, show=False):
+seen = set()
+need = set()
+ws = {w}
+while ws:
+w = ws.pop() # any element is fine to use next
+if w in seen or w <= more_than:
+continue
+seen.add(w)
+lo = w >> 1
+# only _need_ lo here; some other path may, or may not, need hi
+need.add(lo)
+ws.add(lo)
+if w & 1:
+ws.add(lo + 1)
+
+d = {}
+if not need:
+return d
+it = iter(sorted(need))
+first = next(it)
+if show:
+print("pow at", first)
+d[first] = base ** first
+for this in it:
+if this - 1 in d:
+if show:
+print("* base at", this)
+d[this] = d[this - 1] * base # cheap
+else:
+lo = this >> 1
+hi = this - lo
+assert lo in d
+if show:
+print("square at", this)
+# Multiplying a bigint by itself (same object!) is about twice
+# as fast in CPython.
+sq = d[lo] * d[lo]
+if hi != lo:
+assert hi == lo + 1
+if show:
+print("and * base")
+sq *= base
+d[this] = sq
+return d
+
+_unbounded_dec_context = decimal.getcontext().copy()
+_unbounded_dec_context.prec = decimal.MAX_PREC
+_unbounded_dec_context.Emax = decimal.MAX_EMAX
+_unbounded_dec_context.Emin = decimal.MIN_EMIN
+_unbounded_dec_context.traps[decimal.Inexact] = 1 # sanity check
def int_to_decimal(n):
"""Asymptotically fast conversion of an 'int' to Decimal."""
@@ -33,57 +113,32 @@ def int_to_decimal(n):
# "clever" recursive way. If we want a string representation, we
# apply str to _that_.
-D = decimal.Decimal
-D2 = D(2)
-
-BITLIM = 128
-
-mem = {}
-
-def w2pow(w):
-"""Return D(2)**w and store the result. Also possibly save some
-intermediate results. In context, these are likely to be reused
-across various levels of the conversion to Decimal."""
-if (result := mem.get(w)) is None:
-if w <= BITLIM:
-result = D2**w
-elif w - 1 in mem:
-result = (t := mem[w - 1]) + t
-else:
-w2 = w >> 1
-# If w happens to be odd, w-w2 is one larger than w2
-# now. Recurse on the smaller first (w2), so that it's
-# in the cache and the larger (w-w2) can be handled by
-# the cheaper `w-1 in mem` branch instead.
-result = w2pow(w2) * w2pow(w - w2)
-mem[w] = result
-return result
+from decimal import Decimal as D
+
[Python-checkins] gh-118750: Asymptotically faster `int(string)` (#118751)
https://github.com/python/cpython/commit/ecd8664f11298a1a4f7428363c55ad2904c9f279
commit: ecd8664f11298a1a4f7428363c55ad2904c9f279
branch: main
author: Tim Peters
committer: tim-one
date: 2024-05-18T19:19:57-05:00
summary:
gh-118750: Asymptotically faster `int(string)` (#118751)
Asymptotically faster (O(n log n)) str->int for very large strings, leveraging
the faster multiplication scheme in the C-coded `_decimal` when available. This
is used instead of the current Karatsuba-limited method starting at 2 million
digits. Lots of opportunity remains for fine-tuning. Good targets include
changing BYTELIM, and possibly changing the internal output base (from 256 to a
higher number of bytes).
Doing this was substantial work, and many of the new lines are actually
comments giving correctness proofs. The obvious approaches sticking to integers
were too slow to be useful, so this is doing variable-precision decimal
floating-point arithmetic. Much faster, but worst-possible rounding errors have
to be wholly accounted for, using as little precision as possible.
Special thanks to Serhiy Storchaka for asking many good questions in his code
reviews!
Co-authored-by: Jelle Zijlstra
Co-authored-by: sstandre <[email protected]>
Co-authored-by: Pieter Eendebak
Co-authored-by: Nice Zombies
files:
A Misc/NEWS.d/next/Core and Builtins/2024-05-09-02-37-25.gh-issue-118750.7aLfT-.rst
M Lib/_pylong.py
M Lib/test/test_int.py
diff --git a/Lib/_pylong.py b/Lib/_pylong.py
index 4970eb3fa67b2b..f7aabde1434725 100644
--- a/Lib/_pylong.py
+++ b/Lib/_pylong.py
@@ -45,10 +45,16 @@
#
# and `mycache[lo]` replaces `base**lo` in the inner function.
#
-# While this does give minor speedups (a few percent at best), the primary
-# intent is to simplify the functions using this, by eliminating the need for
-# them to craft their own ad-hoc caching schemes.
-def compute_powers(w, base, more_than, show=False): +# If an algorithm wants the powers of ceiling(w/2) instead of the floor, +# pass keyword argument `need_hi=True`. +# +# While this does give minor speedups (a few percent at best), the +# primary intent is to simplify the functions using this, by eliminating +# the need for them to craft their own ad-hoc caching schemes. +# +# See code near end of file for a block of code that can be enabled to +# run millions of tests. +def compute_powers(w, base, more_than, *, need_hi=False, show=False): seen = set() need = set() ws = {w} @@ -58,40 +64,70 @@ def compute_powers(w, base, more_than, show=False): continue seen.add(w) lo = w >> 1 -# only _need_ lo here; some other path may, or may not, need hi -need.add(lo) -ws.add(lo) -if w & 1: -ws.add(lo + 1) +hi = w - lo +# only _need_ one here; the other may, or may not, be needed +which = hi if need_hi else lo +need.add(which) +ws.add(which) +if lo != hi: +ws.add(w - which) + +# `need` is the set of exponents needed. To compute them all +# efficiently, possibly add other exponents to `extra`. The goal is +# to ensure that each exponent can be gotten from a smaller one via +# multiplying by the base, squaring it, or squaring and then +# multiplying by the base. +# +# If need_hi is False, this is already the case (w can always be +# gotten from w >> 1 via one of the squaring strategies). But we do +# the work anyway, just in case ;-) +# +# Note that speed is irrelevant. These loops are working on little +# ints (exponents) and go around O(log w) times. The total cost is +# insignificant compared to just one of the bigint multiplies. 
+cands = need.copy() +extra = set() +while cands: +w = max(cands) +cands.remove(w) +lo = w >> 1 +if lo > more_than and w-1 not in cands and lo not in cands: +extra.add(lo) +cands.add(lo) +assert need_hi or not extra d = {} -if not need: -return d -it = iter(sorted(need)) -first = next(it) -if show: -print("pow at", first) -d[first] = base ** first -for this in it: -if this - 1 in d: +for n in sorted(need | extra): +lo = n >> 1 +hi = n - lo +if n-1 in d: if show: -print("* base at", this) -d[this] = d[this - 1] * base # cheap -else: -lo = this >> 1 -hi = this - lo -assert lo in d +print("* base", end="") +result = d[n-1] * base # cheap! +elif lo in d: +# Multiplying a bigint by itself is about twice as fast +# in CPython provided it's the same object. if show: -print("square at", this) -
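Stripped of the decimal floating-point machinery the commit describes, the
underlying divide-and-conquer shape of fast str->int can be sketched as
follows. This is an illustrative model only (`bytelim` is a stand-in for the
real BYTELIM tuning knob, and the CPython version does the arithmetic in
`_decimal`, not plain ints):

```python
def str_to_int(s, bytelim=64):
    """Recursive str -> int: split the digit string near its middle, so
    the two halves' values are combined with one big multiply. With a
    subquadratic bigint multiply, this beats digit-at-a-time parsing,
    which is quadratic in the number of digits."""
    if len(s) <= bytelim:
        return int(s)            # small enough: let the builtin do it
    mid = len(s) >> 1
    hi, lo = s[:mid], s[mid:]
    return str_to_int(hi, bytelim) * 10 ** len(lo) + str_to_int(lo, bytelim)
```

The real work in the patch is making the `10 ** len(lo)` power and the big
multiply cheap (via cached powers and `_decimal`), and proving the rounding
errors of the decimal-float variant are fully accounted for.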
[Python-checkins] Try to repair oddball test bots timing out in test_int (#119166)
https://github.com/python/cpython/commit/ba8af848648d3eb51eb17395e12117007bae8606
commit: ba8af848648d3eb51eb17395e12117007bae8606
branch: main
author: Tim Peters
committer: tim-one
date: 2024-05-18T20:54:23-05:00
summary:
Try to repair oddball test bots timing out in test_int (#119166)
Various test bots (outside the ones GH normally runs) are timing out during
test_int after ecd8664 (asymptotically faster str->int). Best guess is that
they don't build the C _decimal module. So require that module in the most
likely tests to time out then. Flying mostly blind, though!
files:
M Lib/test/test_int.py
diff --git a/Lib/test/test_int.py b/Lib/test/test_int.py
index 67c080117edcc3..caeccbe1fed026 100644
--- a/Lib/test/test_int.py
+++ b/Lib/test/test_int.py
@@ -12,6 +12,11 @@
except ImportError:
_pylong = None
+try:
+import _decimal
+except ImportError:
+_decimal = None
+
L = [
('0', 0),
('1', 1),
@@ -920,6 +925,7 @@ def test_pylong_roundtrip(self):
bits <<= 1
@support.requires_resource('cpu')
[email protected](_decimal, "C _decimal module required")
def test_pylong_roundtrip_huge(self):
# k blocks of 1234567890
k = 1_000_000 # so 10 million digits in all
@@ -931,6 +937,7 @@ def test_pylong_roundtrip_huge(self):
@support.requires_resource('cpu')
@unittest.skipUnless(_pylong, "_pylong module required")
[email protected](_decimal, "C _decimal module required")
def test_whitebox_dec_str_to_int_inner_failsafe(self):
# While I believe the number of GUARD digits in this function is
# always enough so that no more than one correction step is ever
@@ -950,6 +957,7 @@ def test_whitebox_dec_str_to_int_inner_failsafe(self):
_pylong._spread.update(orig_spread)
@unittest.skipUnless(_pylong, "pylong module required")
[email protected](_decimal, "C _decimal module required")
def test_whitebox_dec_str_to_int_inner_monster(self):
# I don't think anyone has enough RAM to build a string long enough
# for this function to complain. So lie about the string length.
___
Python-checkins mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-checkins.python.org/
Member address: [email protected]
[Python-checkins] gh-119105: difflib: improve recursion for degenerate cases (#119131)
https://github.com/python/cpython/commit/0abf997e75bd3a8b76d920d33cc64d5e6c2d380f commit: 0abf997e75bd3a8b76d920d33cc64d5e6c2d380f branch: main author: pulkin committer: tim-one date: 2024-05-19T16:46:37-05:00 summary: gh-119105: difflib: improve recursion for degenerate cases (#119131) Code from https://github.com/pulkin, in PR https://github.com/python/cpython/pull/119131 Greatly speeds `Differ` when there are many identically scoring pairs, by splitting the recursion near the inputs' midpoints instead of degenerating (as now) into just peeling off the first two lines. Co-authored-by: Tim Peters files: A Misc/NEWS.d/next/Library/2024-05-19-12-25-36.gh-issue-119105.VcR4ig.rst M Lib/difflib.py diff --git a/Lib/difflib.py b/Lib/difflib.py index ba0b256969ebff..54ca33d5615f8d 100644 --- a/Lib/difflib.py +++ b/Lib/difflib.py @@ -911,16 +911,26 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi): # don't synch up unless the lines have a similarity score of at # least cutoff; best_ratio tracks the best score seen so far -best_ratio, cutoff = 0.74, 0.75 +# best_ratio is a tuple storing the best .ratio() seen so far, and +# a measure of how far the indices are from their index range +# midpoints. The latter is used to resolve ratio ties. Favoring +# indices near the midpoints tends to cut the ranges in half. Else, +# if there are many pairs with the best ratio, recursion can grow +# very deep, and runtime becomes cubic. 
See: +# https://github.com/python/cpython/issues/119105 +best_ratio, cutoff = (0.74, 0), 0.75 cruncher = SequenceMatcher(self.charjunk) eqi, eqj = None, None # 1st indices of equal lines (if any) # search for the pair that matches best without being identical # (identical lines must be junk lines, & we don't want to synch up # on junk -- unless we have to) +amid = (alo + ahi - 1) / 2 +bmid = (blo + bhi - 1) / 2 for j in range(blo, bhi): bj = b[j] cruncher.set_seq2(bj) +weight_j = - abs(j - bmid) for i in range(alo, ahi): ai = a[i] if ai == bj: @@ -928,16 +938,20 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi): eqi, eqj = i, j continue cruncher.set_seq1(ai) +# weight is used to balance the recursion by prioritizing +# i and j in the middle of their ranges +weight = weight_j - abs(i - amid) # computing similarity is expensive, so use the quick # upper bounds first -- have seen this speed up messy # compares by a factor of 3. # note that ratio() is only expensive to compute the first # time it's called on a sequence pair; the expensive part # of the computation is cached by cruncher -if cruncher.real_quick_ratio() > best_ratio and \ - cruncher.quick_ratio() > best_ratio and \ - cruncher.ratio() > best_ratio: -best_ratio, best_i, best_j = cruncher.ratio(), i, j +if (cruncher.real_quick_ratio(), weight) > best_ratio and \ + (cruncher.quick_ratio(), weight) > best_ratio and \ + (cruncher.ratio(), weight) > best_ratio: +best_ratio, best_i, best_j = (cruncher.ratio(), weight), i, j +best_ratio, _ = best_ratio if best_ratio < cutoff: # no non-identical "pretty close" pair if eqi is None: diff --git a/Misc/NEWS.d/next/Library/2024-05-19-12-25-36.gh-issue-119105.VcR4ig.rst b/Misc/NEWS.d/next/Library/2024-05-19-12-25-36.gh-issue-119105.VcR4ig.rst new file mode 100644 index 00..30b5f97b8059f9 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2024-05-19-12-25-36.gh-issue-119105.VcR4ig.rst @@ -0,0 +1 @@ +``difflib.Differ`` is much faster for some cases of diffs where many pairs 
of lines are equally similar. ___ Python-checkins mailing list -- [email protected] To unsubscribe send an email to [email protected] https://mail.python.org/mailman3/lists/python-checkins.python.org/ Member address: [email protected]
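The short-circuit this code relies on is the documented ordering of
`SequenceMatcher`'s three ratio methods: `real_quick_ratio()` and
`quick_ratio()` are successively tighter upper bounds on the expensive
`ratio()`. A quick demonstration:

```python
from difflib import SequenceMatcher

# The cheap upper bounds let _fancy_replace() reject most candidate
# line pairs without ever computing the expensive ratio().
sm = SequenceMatcher(None,
                     "private Thread currentThread;",
                     "private volatile Thread currentThread;")
rqr, qr, r = sm.real_quick_ratio(), sm.quick_ratio(), sm.ratio()
assert rqr >= qr >= r   # documented bound chain
```

Because the bounds are compared tuple-wise with the tie-breaking weight in
this commit, the same chain keeps the short-circuit valid there too.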
[Python-checkins] gh-119105: difflib.py Differ.compare is too slow [for degenerate cases] (#119376)
https://github.com/python/cpython/commit/07df93de73b6bdd4f80cd9e29834d761a0882622 commit: 07df93de73b6bdd4f80cd9e29834d761a0882622 branch: main author: Tim Peters committer: tim-one date: 2024-05-22T18:25:08-05:00 summary: gh-119105: difflib.py Differ.compare is too slow [for degenerate cases] (#119376) Track all pairs achieving the best ratio in Differ(). This repairs the "very deep recursion and cubic time" bad cases in a way that preserves previous output. files: M Lib/difflib.py M Lib/test/test_difflib.py diff --git a/Lib/difflib.py b/Lib/difflib.py index 54ca33d5615f8d..79b446c2afbdc6 100644 --- a/Lib/difflib.py +++ b/Lib/difflib.py @@ -908,29 +908,34 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi): + abcdefGhijkl ?^ ^ ^ """ - -# don't synch up unless the lines have a similarity score of at -# least cutoff; best_ratio tracks the best score seen so far -# best_ratio is a tuple storing the best .ratio() seen so far, and -# a measure of how far the indices are from their index range -# midpoints. The latter is used to resolve ratio ties. Favoring -# indices near the midpoints tends to cut the ranges in half. Else, -# if there are many pairs with the best ratio, recursion can grow -# very deep, and runtime becomes cubic. See: +from operator import ge, gt +# Don't synch up unless the lines have a similarity score of at +# least cutoff; best_ratio tracks the best score seen so far. +# Keep track of all index pairs achieving the best ratio and +# deal with them here. Previously only the smallest pair was +# handled here, and if there are many pairs with the best ratio, +# recursion could grow very deep, and runtime cubic. See: # https://github.com/python/cpython/issues/119105 -best_ratio, cutoff = (0.74, 0), 0.75 +best_ratio, cutoff = 0.74, 0.75 cruncher = SequenceMatcher(self.charjunk) eqi, eqj = None, None # 1st indices of equal lines (if any) +# List of index pairs achieving best_ratio. Strictly increasing +# in both index positions. 
+max_pairs = [] +maxi = -1 # `i` index of last pair in max_pairs # search for the pair that matches best without being identical -# (identical lines must be junk lines, & we don't want to synch up -# on junk -- unless we have to) -amid = (alo + ahi - 1) / 2 -bmid = (blo + bhi - 1) / 2 +# (identical lines must be junk lines, & we don't want to synch +# up on junk -- unless we have to) +crqr = cruncher.real_quick_ratio +cqr = cruncher.quick_ratio +cr = cruncher.ratio for j in range(blo, bhi): bj = b[j] cruncher.set_seq2(bj) -weight_j = - abs(j - bmid) +# Find new best, if possible. Else search for the smallest i +# (if any) > maxi that equals the best ratio +search_equal = True for i in range(alo, ahi): ai = a[i] if ai == bj: @@ -938,65 +943,70 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi): eqi, eqj = i, j continue cruncher.set_seq1(ai) -# weight is used to balance the recursion by prioritizing -# i and j in the middle of their ranges -weight = weight_j - abs(i - amid) # computing similarity is expensive, so use the quick # upper bounds first -- have seen this speed up messy # compares by a factor of 3. -# note that ratio() is only expensive to compute the first -# time it's called on a sequence pair; the expensive part -# of the computation is cached by cruncher -if (cruncher.real_quick_ratio(), weight) > best_ratio and \ - (cruncher.quick_ratio(), weight) > best_ratio and \ - (cruncher.ratio(), weight) > best_ratio: -best_ratio, best_i, best_j = (cruncher.ratio(), weight), i, j -best_ratio, _ = best_ratio +cmp = ge if search_equal and i > maxi else gt +if (cmp(crqr(), best_ratio) + and cmp(cqr(), best_ratio) + and cmp((ratio := cr()), best_ratio)): +if ratio > best_ratio: +best_ratio = ratio +max_pairs.clear() +else: +assert best_ratio == ratio and search_equal +assert i > maxi +max_pairs.append((i, j)) +maxi = i +search_equal = False if best_ratio < cutoff: +
[Python-checkins] gh-119105: Differ.compare is too slow [for degenerate cases] (#119492)
https://github.com/python/cpython/commit/de19694cfbcaa1c85c3a4b7184a24ff21b1c0919
commit: de19694cfbcaa1c85c3a4b7184a24ff21b1c0919
branch: main
author: Tim Peters
committer: tim-one
date: 2024-05-24T22:08:21-05:00
summary:
gh-119105: Differ.compare is too slow [for degenerate cases] (#119492)
``_fancy_replace()`` is no longer recursive, and a single call does a
worst-case linear number of ratio() computations instead of quadratic.
This renders toothless a universe of pathological cases. Some inputs may
produce different output, but that's rare, and I didn't find a case where
the final diff appeared to be of materially worse quality. To the
contrary, by refusing to even consider synching on lines "far apart",
there was more easy-to-digest locality in the output.
files:
A Misc/NEWS.d/next/Library/2024-05-24-04-05-37.gh-issue-119105.aDSRFn.rst
M Lib/difflib.py
diff --git a/Lib/difflib.py b/Lib/difflib.py
index 79b446c2afbdc6..0443963b4fd697 100644
--- a/Lib/difflib.py
+++ b/Lib/difflib.py
@@ -908,79 +908,52 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
           + abcdefGhijkl
           ?    ^  ^  ^
         """
-        from operator import ge, gt
-        # Don't synch up unless the lines have a similarity score of at
-        # least cutoff; best_ratio tracks the best score seen so far.
-        # Keep track of all index pairs achieving the best ratio and
-        # deal with them here. Previously only the smallest pair was
-        # handled here, and if there are many pairs with the best ratio,
-        # recursion could grow very deep, and runtime cubic. See:
+        # Don't synch up unless the lines have a similarity score above
+        # cutoff. Previously only the smallest pair was handled here,
+        # and if there are many pairs with the best ratio, recursion
+        # could grow very deep, and runtime cubic. See:
         # https://github.com/python/cpython/issues/119105
-        best_ratio, cutoff = 0.74, 0.75
+        #
+        # Later, more pathological cases prompted removing recursion
+        # entirely.
+        cutoff = 0.74999
         cruncher = SequenceMatcher(self.charjunk)
-        eqi, eqj = None, None   # 1st indices of equal lines (if any)
-        # List of index pairs achieving best_ratio. Strictly increasing
-        # in both index positions.
-        max_pairs = []
-        maxi = -1   # `i` index of last pair in max_pairs
-
-        # search for the pair that matches best without being identical
-        # (identical lines must be junk lines, & we don't want to synch
-        # up on junk -- unless we have to)
         crqr = cruncher.real_quick_ratio
         cqr = cruncher.quick_ratio
         cr = cruncher.ratio
+
+        WINDOW = 10
+        best_i = best_j = None
+        dump_i, dump_j = alo, blo   # smallest indices not yet resolved
         for j in range(blo, bhi):
-            bj = b[j]
-            cruncher.set_seq2(bj)
-            # Find new best, if possible. Else search for the smallest i
-            # (if any) > maxi that equals the best ratio
-            search_equal = True
-            for i in range(alo, ahi):
-                ai = a[i]
-                if ai == bj:
-                    if eqi is None:
-                        eqi, eqj = i, j
-                    continue
-                cruncher.set_seq1(ai)
-                # computing similarity is expensive, so use the quick
-                # upper bounds first -- have seen this speed up messy
-                # compares by a factor of 3.
-                cmp = ge if search_equal and i > maxi else gt
-                if (cmp(crqr(), best_ratio)
-                        and cmp(cqr(), best_ratio)
-                        and cmp((ratio := cr()), best_ratio)):
-                    if ratio > best_ratio:
-                        best_ratio = ratio
-                        max_pairs.clear()
-                    else:
-                        assert best_ratio == ratio and search_equal
-                        assert i > maxi
-                    max_pairs.append((i, j))
-                    maxi = i
-                    search_equal = False
-        if best_ratio < cutoff:
-            assert not max_pairs
-            # no non-identical "pretty close" pair
-            if eqi is None:
-                # no identical pair either -- treat it as a straight replace
-                yield from self._plain_replace(a, alo, ahi, b, blo, bhi)
-                return
-            # no close pair, but an identical pair -- synch up on that
-            max_pairs = [(eqi, eqj)]
-        else:
-            # there's a close pair, so forget the identical pair (if any)
-            assert max_pairs
-            eqi = None
-
-        last_i, last_j = alo, blo
-        for this_i, this_j in max_pairs:
-            # pump out diffs from before the synch point
-            yield from self._fancy_he
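The cheap-upper-bounds cascade this diff relies on (``real_quick_ratio() >= quick_ratio() >= ratio()``) can be sketched in isolation. ``similar()`` below is a hypothetical helper, not part of difflib; it just demonstrates the filtering order ``_fancy_replace()`` uses:

```python
from difflib import SequenceMatcher

def similar(a, b, cutoff=0.75):
    # The two upper bounds are cheap to compute; only fall through to
    # the expensive ratio() when both bounds clear the cutoff.
    cruncher = SequenceMatcher(None, a, b)
    return (cruncher.real_quick_ratio() >= cutoff
            and cruncher.quick_ratio() >= cutoff
            and cruncher.ratio() >= cutoff)
```

Since the bounds never underestimate ``ratio()``, the short-circuit can only skip work, never change the answer.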
[Python-checkins] gh-130914: Make graphlib.TopologicalSorter.prepare() idempotent (#131317)
https://github.com/python/cpython/commit/c1b42db9e47b76fca3c2993cb172cc4991b04839
commit: c1b42db9e47b76fca3c2993cb172cc4991b04839
branch: main
author: Daniel Pope
committer: tim-one
date: 2025-03-18T16:28:00-05:00
summary:
gh-130914: Make graphlib.TopologicalSorter.prepare() idempotent (#131317)
Closes #130914: Make graphlib.TopologicalSorter.prepare() idempotent
Relax the rules so that `.prepare()` can be called multiple times, provided
that no work has been passed out by `.get_ready()` yet.
files:
A Misc/NEWS.d/next/Library/2025-03-16-08-00-29.gh-issue-130914.6z883_.rst
M Doc/library/graphlib.rst
M Doc/whatsnew/3.14.rst
M Lib/graphlib.py
M Lib/test/test_graphlib.py
M Misc/ACKS
diff --git a/Doc/library/graphlib.rst b/Doc/library/graphlib.rst
index a0b16576fad219..49424700303fdd 100644
--- a/Doc/library/graphlib.rst
+++ b/Doc/library/graphlib.rst
@@ -106,6 +106,14 @@
    function, the graph cannot be modified, and therefore no more nodes can be
    added using :meth:`~TopologicalSorter.add`.
+ A :exc:`ValueError` will be raised if the sort has been started by
+ :meth:`~.static_order` or :meth:`~.get_ready`.
+
+ .. versionchanged:: next
+
+ ``prepare()`` can now be called more than once as long as the sort has
+ not started. Previously this raised :exc:`ValueError`.
+
.. method:: is_active()
Returns ``True`` if more progress can be made and ``False`` otherwise.
diff --git a/Doc/whatsnew/3.14.rst b/Doc/whatsnew/3.14.rst
index 9a25f27ca57257..79b219dd72651c 100644
--- a/Doc/whatsnew/3.14.rst
+++ b/Doc/whatsnew/3.14.rst
@@ -607,6 +607,15 @@ getopt
* Add support for returning intermixed options and non-option arguments in
order.
(Contributed by Serhiy Storchaka in :gh:`126390`.)
+
+graphlib
+--------
+
+* Allow :meth:`graphlib.TopologicalSorter.prepare` to be called more than once
+ as long as sorting has not started.
+ (Contributed by Daniel Pope in :gh:`130914`)
+
+
http
diff --git a/Lib/graphlib.py b/Lib/graphlib.py
index 82f33fb5cf312c..7961c9c5cac2d6 100644
--- a/Lib/graphlib.py
+++ b/Lib/graphlib.py
@@ -90,13 +90,17 @@ def prepare(self):
        still be used to obtain as many nodes as possible until cycles block more
        progress. After a call to this function, the graph cannot be modified and
        therefore no more nodes can be added using "add".
+
+        Raise ValueError if nodes have already been passed out of the sorter.
+
         """
-        if self._ready_nodes is not None:
-            raise ValueError("cannot prepare() more than once")
+        if self._npassedout > 0:
+            raise ValueError("cannot prepare() after starting sort")

-        self._ready_nodes = [
-            i.node for i in self._node2info.values() if i.npredecessors == 0
-        ]
+        if self._ready_nodes is None:
+            self._ready_nodes = [
+                i.node for i in self._node2info.values() if i.npredecessors == 0
+            ]
# ready_nodes is set before we look for cycles on purpose:
# if the user wants to catch the CycleError, that's fine,
# they can continue using the instance to grab as many
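The rule this patch implements, that ``prepare()`` is idempotent until ``get_ready()`` hands out work, fits the usual sorter loop. A minimal sketch (the double-``prepare()`` tolerance itself is 3.14+ behavior, so it is only noted in a comment here):

```python
from graphlib import TopologicalSorter

# 'a' depends on 'b' and 'c'.
ts = TopologicalSorter({'a': {'b', 'c'}})
ts.prepare()       # on 3.14+, calling prepare() again right here is a no-op;
                   # once get_ready() has passed out nodes, it raises ValueError
first = set(ts.get_ready())    # the independent nodes: {'b', 'c'}
ts.done(*first)
second = list(ts.get_ready())  # 'a' becomes ready once its predecessors are done
```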
diff --git a/Lib/test/test_graphlib.py b/Lib/test/test_graphlib.py
index 5f38af4024c5b0..66722e0b0498a6 100644
--- a/Lib/test/test_graphlib.py
+++ b/Lib/test/test_graphlib.py
@@ -140,9 +140,21 @@ def test_calls_before_prepare(self):
     def test_prepare_multiple_times(self):
         ts = graphlib.TopologicalSorter()
         ts.prepare()
-        with self.assertRaisesRegex(ValueError, r"cannot prepare\(\) more than once"):
+        ts.prepare()
+
+    def test_prepare_after_pass_out(self):
+        ts = graphlib.TopologicalSorter({'a': 'bc'})
+        ts.prepare()
+        self.assertEqual(set(ts.get_ready()), {'b', 'c'})
+        with self.assertRaisesRegex(ValueError, r"cannot prepare\(\) after starting sort"):
             ts.prepare()

+    def test_prepare_cycleerror_each_time(self):
+        ts = graphlib.TopologicalSorter({'a': 'b', 'b': 'a'})
+        for attempt in range(1, 4):
+            with self.assertRaises(graphlib.CycleError, msg=f"{attempt=}"):
+                ts.prepare()
+
     def test_invalid_nodes_in_done(self):
         ts = graphlib.TopologicalSorter()
         ts.add(1, 2, 3, 4)
diff --git a/Misc/ACKS b/Misc/ACKS
index c623d69e086b63..b0b4ea0db44b64 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1483,6 +1483,7 @@ Michael Pomraning
Martin Pool
Iustin Pop
Claudiu Popa
+Daniel Pope
Nick Pope
John Popplewell
Matheus Vieira Portela
diff --git a/Misc/NEWS.d/next/Library/2025-03-16-08-00-29.gh-issue-130914.6z883_.rst b/Misc/NEWS.d/next/Library/2025-03-16-08-00-29.gh-issue-130914.6z883_.rst
new file mode 100644
index 00..956cf83758f9a2
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2025-
[Python-checkins] gh-135551: Change how sorting picks minimum run length (#135553)
https://github.com/python/cpython/commit/2fc68e180ffdb31886938203e89a75b220a58cec
commit: 2fc68e180ffdb31886938203e89a75b220a58cec
branch: main
author: Tim Peters
committer: tim-one
date: 2025-06-26T23:48:05-05:00
summary:
gh-135551: Change how sorting picks minimum run length (#135553)
New scheme from Stefan Pochmann for picking minimum run lengths.
By allowing them to change a little from one run to the next, it's possible to
arrange that, for randomly ordered data, all merges at all levels strongly tend
to be as evenly balanced as possible. Meaning the number of initial runs is a
power of 2, and all merges involve runs whose lengths differ by no more than 1.
files:
A Misc/NEWS.d/next/Core_and_Builtins/2025-06-16-03-56-15.gh-issue-135551.hRTQO-.rst
M Misc/ACKS
M Objects/listobject.c
M Objects/listsort.txt
diff --git a/Misc/ACKS b/Misc/ACKS
index 74cf29cdbc552f..6ab50763feadd9 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1481,6 +1481,7 @@ Jean-François Piéronne
Oleg Plakhotnyuk
Anatoliy Platonov
Marcel Plch
+Stefan Pochmann
Kirill Podoprigora
Remi Pointel
Jon Poler
diff --git a/Misc/NEWS.d/next/Core_and_Builtins/2025-06-16-03-56-15.gh-issue-135551.hRTQO-.rst b/Misc/NEWS.d/next/Core_and_Builtins/2025-06-16-03-56-15.gh-issue-135551.hRTQO-.rst
new file mode 100644
index 00..22dda2a3e972a8
--- /dev/null
+++ b/Misc/NEWS.d/next/Core_and_Builtins/2025-06-16-03-56-15.gh-issue-135551.hRTQO-.rst
@@ -0,0 +1 @@
+Sorting randomly ordered lists will often run a bit faster, thanks to a new
+scheme for picking minimum run lengths from Stefan Pochmann, which arranges for
+the merge tree to be as evenly balanced as is possible.
diff --git a/Objects/listobject.c b/Objects/listobject.c
index 23d3472b6d4153..1b36f4c25abf4d 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -1685,10 +1685,7 @@ sortslice_advance(sortslice *slice, Py_ssize_t n)
/* Avoid malloc for small temp arrays. */
#define MERGESTATE_TEMP_SIZE 256
-/* The largest value of minrun. This must be a power of 2, and >= 1, so that
- * the compute_minrun() algorithm guarantees to return a result no larger than
- * this,
- */
+/* The largest value of minrun. This must be a power of 2, and >= 1 */
#define MAX_MINRUN 64
#if ((MAX_MINRUN) < 1) || ((MAX_MINRUN) & ((MAX_MINRUN) - 1))
#error "MAX_MINRUN must be a power of 2, and >= 1"
@@ -1749,6 +1746,11 @@ struct s_MergeState {
  * of tuples. It may be set to safe_object_compare, but the idea is that
  * hopefully we can assume more, and use one of the special-case compares. */
     int (*tuple_elem_compare)(PyObject *, PyObject *, MergeState *);
+
+    /* Variables used for minrun computation. The "ideal" minrun length is
+     * the infinite precision listlen / 2**e. See listsort.txt.
+     */
+    Py_ssize_t mr_current, mr_e, mr_mask;
 };
/* binarysort is the best method for sorting small arrays: it does few
@@ -2210,6 +2212,14 @@ merge_init(MergeState *ms, Py_ssize_t list_size, int has_keyfunc,
     ms->min_gallop = MIN_GALLOP;
     ms->listlen = list_size;
     ms->basekeys = lo->keys;
+
+    /* State for generating minrun values. See listsort.txt. */
+    ms->mr_e = 0;
+    while (list_size >> ms->mr_e >= MAX_MINRUN) {
+        ++ms->mr_e;
+    }
+    ms->mr_mask = (1 << ms->mr_e) - 1;
+    ms->mr_current = 0;
 }
/* Free all the temp memory owned by the MergeState. This must be called
@@ -2687,27 +2697,15 @@ merge_force_collapse(MergeState *ms)
return 0;
}
-/* Compute a good value for the minimum run length; natural runs shorter
- * than this are boosted artificially via binary insertion.
- *
- * If n < MAX_MINRUN return n (it's too small to bother with fancy stuff).
- * Else if n is an exact power of 2, return MAX_MINRUN / 2.
- * Else return an int k, MAX_MINRUN / 2 <= k <= MAX_MINRUN, such that n/k is
- * close to, but strictly less than, an exact power of 2.
- *
- * See listsort.txt for more info.
- */
-static Py_ssize_t
-merge_compute_minrun(Py_ssize_t n)
+/* Return the next minrun value to use. See listsort.txt. */
+Py_LOCAL_INLINE(Py_ssize_t)
+minrun_next(MergeState *ms)
{
-    Py_ssize_t r = 0;           /* becomes 1 if any 1 bits are shifted off */
-
-    assert(n >= 0);
-    while (n >= MAX_MINRUN) {
-        r |= n & 1;
-        n >>= 1;
-    }
-    return n + r;
+    ms->mr_current += ms->listlen;
+    assert(ms->mr_current >= 0);        /* no overflow */
+    Py_ssize_t result = ms->mr_current >> ms->mr_e;
+    ms->mr_current &= ms->mr_mask;
+    return result;
 }
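The C state above can be mirrored in a short Python sketch (a hypothetical helper, assuming MAX_MINRUN = 64) showing how the accumulator doles out run lengths that differ by at most 1 and sum exactly to the list length:

```python
def minrun_lengths(n, max_minrun=64):
    # The "ideal" minrun is n / 2**e in infinite precision; `current`
    # carries the fractional part from run to run, so no run length
    # drifts far from the ideal (mirrors mr_current/mr_e/mr_mask).
    e = 0
    while n >> e >= max_minrun:
        e += 1
    mask = (1 << e) - 1
    current = total = 0
    runs = []
    while total < n:
        current += n
        r = current >> e        # next minrun length
        current &= mask         # keep the remainder for the next run
        runs.append(r)
        total += r
    return runs
```

For n = 1000 this yields 16 runs (a power of 2) of length 62 or 63, so every merge pairs runs whose lengths differ by at most 1.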
/* Here we define custom comparison functions to optimize for the cases one
commonly
@@ -3075,7 +3073,6 @@ list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse)
/* March over the array once, left to right, finding natural runs,
* and extending short natural runs to minrun elements.
*
[Python-checkins] gh-132876: workaround broken ldexp() on Windows 10 (#133135)
https://github.com/python/cpython/commit/cf8941c60356acdd00055e5583a2d64761c34af4
commit: cf8941c60356acdd00055e5583a2d64761c34af4
branch: main
author: Sergey B Kirpichev
committer: tim-one
date: 2025-05-25T21:44:33-05:00
summary:
gh-132876: workaround broken ldexp() on Windows 10 (#133135)
* gh-132876: workaround broken ldexp() on Windows 10
ldexp() fails to round subnormal results before Windows 11,
so hide their bug.
Co-authored-by: Tim Peters
files:
A Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
M Lib/test/test_math.py
M Modules/mathmodule.c
diff --git a/Lib/test/test_math.py b/Lib/test/test_math.py
index 913a60bf9e04e3..d14336f8bac498 100644
--- a/Lib/test/test_math.py
+++ b/Lib/test/test_math.py
@@ -1214,6 +1214,12 @@ def testLdexp(self):
         self.assertEqual(math.ldexp(NINF, n), NINF)
         self.assertTrue(math.isnan(math.ldexp(NAN, n)))
+    @requires_IEEE_754
+    def testLdexp_denormal(self):
+        # Denormal output incorrectly rounded (truncated)
+        # on some Windows.
+        self.assertEqual(math.ldexp(6993274598585239, -1126), 1e-323)
+
def testLog(self):
self.assertRaises(TypeError, math.log)
self.assertRaises(TypeError, math.log, 1, 2, 3)
diff --git a/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
new file mode 100644
index 00..cb3ca3321e3d26
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
@@ -0,0 +1,4 @@
+``ldexp()`` on Windows doesn't round subnormal results before Windows 11,
+but should. Python's :func:`math.ldexp` wrapper now does round them, so
+results may change slightly, in rare cases of very small results, on
+Windows versions before 11.
diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
index 40abd69f0a6600..71d9c1387f5780 100644
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@@ -2161,6 +2161,27 @@ math_ldexp_impl(PyObject *module, double x, PyObject *i)
} else {
errno = 0;
r = ldexp(x, (int)exp);
+#ifdef _MSC_VER
+        if (DBL_MIN > r && r > -DBL_MIN) {
+            /* Denormal (or zero) results can be incorrectly rounded here
+               (rather, truncated). Fixed in newer versions of the C runtime,
+               included with Windows 11. */
+            int original_exp;
+            frexp(x, &original_exp);
+            if (original_exp > DBL_MIN_EXP) {
+                /* Shift down to the smallest normal binade. No bits lost. */
+                int shift = DBL_MIN_EXP - original_exp;
+                x = ldexp(x, shift);
+                exp -= shift;
+            }
+            /* Multiplying by 2**exp finishes the job, and the HW will round as
+               appropriate. Note: if exp < -DBL_MANT_DIG, all of x is shifted
+               to be < 0.5 ULP of smallest denorm, so should be thrown away. If
+               exp is so very negative that ldexp underflows to 0, that's fine;
+               no need to check in advance. */
+            r = x*ldexp(1.0, (int)exp);
+        }
+#endif
         if (isinf(r))
             errno = ERANGE;
     }
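The same two-step idea, in a hypothetical Python sketch (not CPython's C code): shift x into the smallest normal binade losslessly, then let one final multiply perform the correctly rounded underflow in hardware. Names here are illustrative only:

```python
import math
import sys

def ldexp_subnormal_safe(x, exp):
    # Sketch of the workaround's underflow path; x assumed finite, nonzero,
    # and exp small enough that 2.0**exp doesn't overflow.
    _, original_exp = math.frexp(x)
    min_exp = sys.float_info.min_exp       # plays the role of DBL_MIN_EXP
    if original_exp > min_exp:
        shift = min_exp - original_exp     # negative: shift down
        x = math.ldexp(x, shift)           # exact in the smallest normal binade
        exp -= shift
    # the hardware rounds this multiply correctly, even when the result
    # is subnormal -- exactly what the buggy CRT ldexp() failed to do
    return x * math.ldexp(1.0, exp)
```

Fed the value from the new test, this reproduces the correctly rounded denormal: 6993274598585239 * 2**-1126 is about 1.55 units of the smallest denormal, which rounds up to 1e-323.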
___
Python-checkins mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/python-checkins.python.org
Member address: [email protected]
[Python-checkins] [3.14] gh-132876: workaround broken ldexp() on Windows 10 (GH-133135) (#134684)
https://github.com/python/cpython/commit/16187b58bff9ccf6f800ba908ad5dae441fcb6a1
commit: 16187b58bff9ccf6f800ba908ad5dae441fcb6a1
branch: 3.14
author: Miss Islington (bot) <[email protected]>
committer: tim-one
date: 2025-05-25T22:23:45-05:00
summary:
[3.14] gh-132876: workaround broken ldexp() on Windows 10 (GH-133135) (#134684)
gh-132876: workaround broken ldexp() on Windows 10 (GH-133135)
* gh-132876: workaround broken ldexp() on Windows 10
ldexp() fails to round subnormal results before Windows 11, so hide their bug.
(cherry picked from commit cf8941c60356acdd00055e5583a2d64761c34af4)
Co-authored-by: Sergey B Kirpichev
Co-authored-by: Tim Peters
files:
A Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
M Lib/test/test_math.py
M Modules/mathmodule.c
diff --git a/Lib/test/test_math.py b/Lib/test/test_math.py
index 913a60bf9e04e3..d14336f8bac498 100644
--- a/Lib/test/test_math.py
+++ b/Lib/test/test_math.py
@@ -1214,6 +1214,12 @@ def testLdexp(self):
         self.assertEqual(math.ldexp(NINF, n), NINF)
         self.assertTrue(math.isnan(math.ldexp(NAN, n)))
+    @requires_IEEE_754
+    def testLdexp_denormal(self):
+        # Denormal output incorrectly rounded (truncated)
+        # on some Windows.
+        self.assertEqual(math.ldexp(6993274598585239, -1126), 1e-323)
+
     def testLog(self):
         self.assertRaises(TypeError, math.log)
         self.assertRaises(TypeError, math.log, 1, 2, 3)
diff --git a/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
new file mode 100644
index 00..cb3ca3321e3d26
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
@@ -0,0 +1,4 @@
+``ldexp()`` on Windows doesn't round subnormal results before Windows 11,
+but should. Python's :func:`math.ldexp` wrapper now does round them, so
+results may change slightly, in rare cases of very small results, on
+Windows versions before 11.
diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
index 40abd69f0a6600..71d9c1387f5780 100644
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@@ -2161,6 +2161,27 @@ math_ldexp_impl(PyObject *module, double x, PyObject *i)
     } else {
         errno = 0;
         r = ldexp(x, (int)exp);
+#ifdef _MSC_VER
+        if (DBL_MIN > r && r > -DBL_MIN) {
+            /* Denormal (or zero) results can be incorrectly rounded here
+               (rather, truncated). Fixed in newer versions of the C runtime,
+               included with Windows 11. */
+            int original_exp;
+            frexp(x, &original_exp);
+            if (original_exp > DBL_MIN_EXP) {
+                /* Shift down to the smallest normal binade. No bits lost. */
+                int shift = DBL_MIN_EXP - original_exp;
+                x = ldexp(x, shift);
+                exp -= shift;
+            }
+            /* Multiplying by 2**exp finishes the job, and the HW will round as
+               appropriate. Note: if exp < -DBL_MANT_DIG, all of x is shifted
+               to be < 0.5 ULP of smallest denorm, so should be thrown away. If
+               exp is so very negative that ldexp underflows to 0, that's fine;
+               no need to check in advance. */
+            r = x*ldexp(1.0, (int)exp);
+        }
+#endif
         if (isinf(r))
             errno = ERANGE;
     }
[Python-checkins] [3.13] gh-132876: workaround broken ldexp() on Windows 10 (GH-133135) (#134685)
https://github.com/python/cpython/commit/e483dcf19eee7143ba76d1829afc1052053a904e
commit: e483dcf19eee7143ba76d1829afc1052053a904e
branch: 3.13
author: Sergey B Kirpichev
committer: tim-one
date: 2025-05-25T22:39:34-05:00
summary:
[3.13] gh-132876: workaround broken ldexp() on Windows 10 (GH-133135) (#134685)
* gh-132876: workaround broken ldexp() on Windows 10
ldexp() fails to round subnormal results before Windows 11,
so hide their bug.
(cherry picked from commit cf8941c60356acdd00055e5583a2d64761c34af4)
Co-authored-by: Tim Peters
files:
A Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
M Lib/test/test_math.py
M Modules/mathmodule.c
diff --git a/Lib/test/test_math.py b/Lib/test/test_math.py
index d309e8c1c41f18..5ee3055c871419 100644
--- a/Lib/test/test_math.py
+++ b/Lib/test/test_math.py
@@ -1202,6 +1202,12 @@ def testLdexp(self):
         self.assertEqual(math.ldexp(NINF, n), NINF)
         self.assertTrue(math.isnan(math.ldexp(NAN, n)))
+    @requires_IEEE_754
+    def testLdexp_denormal(self):
+        # Denormal output incorrectly rounded (truncated)
+        # on some Windows.
+        self.assertEqual(math.ldexp(6993274598585239, -1126), 1e-323)
+
def testLog(self):
self.assertRaises(TypeError, math.log)
self.assertRaises(TypeError, math.log, 1, 2, 3)
diff --git a/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
new file mode 100644
index 00..cb3ca3321e3d26
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2025-04-29-11-48-46.gh-issue-132876.lyTQGZ.rst
@@ -0,0 +1,4 @@
+``ldexp()`` on Windows doesn't round subnormal results before Windows 11,
+but should. Python's :func:`math.ldexp` wrapper now does round them, so
+results may change slightly, in rare cases of very small results, on
+Windows versions before 11.
diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
index 2d56dc35731c0b..aee1b17be9cb2a 100644
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@@ -2166,6 +2166,27 @@ math_ldexp_impl(PyObject *module, double x, PyObject *i)
} else {
errno = 0;
r = ldexp(x, (int)exp);
+#ifdef _MSC_VER
+        if (DBL_MIN > r && r > -DBL_MIN) {
+            /* Denormal (or zero) results can be incorrectly rounded here
+               (rather, truncated). Fixed in newer versions of the C runtime,
+               included with Windows 11. */
+            int original_exp;
+            frexp(x, &original_exp);
+            if (original_exp > DBL_MIN_EXP) {
+                /* Shift down to the smallest normal binade. No bits lost. */
+                int shift = DBL_MIN_EXP - original_exp;
+                x = ldexp(x, shift);
+                exp -= shift;
+            }
+            /* Multiplying by 2**exp finishes the job, and the HW will round as
+               appropriate. Note: if exp < -DBL_MANT_DIG, all of x is shifted
+               to be < 0.5 ULP of smallest denorm, so should be thrown away. If
+               exp is so very negative that ldexp underflows to 0, that's fine;
+               no need to check in advance. */
+            r = x*ldexp(1.0, (int)exp);
+        }
+#endif
         if (Py_IS_INFINITY(r))
             errno = ERANGE;
     }
[Python-checkins] gh-140443: Use `fma` in `loghelper` to improve accuracy of log for very large integers (#140469)
https://github.com/python/cpython/commit/f0291c3f2df8139870359c7d1d9a4858f19ee7bf
commit: f0291c3f2df8139870359c7d1d9a4858f19ee7bf
branch: main
author: Abhishek Tiwari
committer: tim-one
date: 2025-10-23T12:05:12-05:00
summary:
gh-140443: Use `fma` in `loghelper` to improve accuracy of log for very large integers (#140469)
* gh-140443: use fma in loghelper to improve accuracy of log for very large integers
Use fused multiply-add in log_helper() for huge ints. Saving a rounding
here is remarkably effective. Across some millions of randomized test
cases with ints up to a billion bits, on Windows and using log10, the ULP
error distribution was dramatically flattened, and its range was nearly
cut in half. In fact, the largest error Tim saw was under 0.6 ULP.
Co-authored-by: abhi210 <[email protected]>
Co-authored-by: Stan Ulbrych <[email protected]>
files:
A Misc/NEWS.d/next/Core_and_Builtins/2025-10-22-23-26-37.gh-issue-140443.wT5i1A.rst
M Misc/ACKS
M Modules/mathmodule.c
diff --git a/Misc/ACKS b/Misc/ACKS
index 6876380e0ba8d2..f5f15f2eb7ea24 100644
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -1921,6 +1921,7 @@ Tim Tisdall
 Jason Tishler
 Christian Tismer
 Jim Tittsler
+Abhishek Tiwari
 Frank J. Tobin
 James Tocknell
 Bennett Todd
diff --git a/Misc/NEWS.d/next/Core_and_Builtins/2025-10-22-23-26-37.gh-issue-140443.wT5i1A.rst b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-22-23-26-37.gh-issue-140443.wT5i1A.rst
new file mode 100644
index 00..a1fff8fef7ebe2
--- /dev/null
+++ b/Misc/NEWS.d/next/Core_and_Builtins/2025-10-22-23-26-37.gh-issue-140443.wT5i1A.rst
@@ -0,0 +1,5 @@
+The logarithm functions (such as :func:`math.log10` and :func:`math.log`) may now produce
+slightly different results for extremely large integers that cannot be
+converted to floats without overflow. These results are generally more
+accurate, with reduced worst-case error and a tighter overall error
+distribution.
diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
index c631beb9ce5477..be88841716b004 100644
--- a/Modules/mathmodule.c
+++ b/Modules/mathmodule.c
@@ -2309,7 +2309,7 @@ loghelper(PyObject* arg, double (*func)(double))
         assert(e >= 0);
         assert(!PyErr_Occurred());
         /* Value is ~= x * 2**e, so the log ~= log(x) + log(2) * e. */
-        result = func(x) + func(2.0) * e;
+        result = fma(func(2.0), (double)e, func(x));
     }
     else
         /* Successfully converted x to a double. */
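A hypothetical Python counterpart of loghelper's big-int branch. ``math.fma`` exists only on Python 3.13+, so this sketch falls back to an unfused multiply-add elsewhere (losing the accuracy gain, but keeping the code runnable); ``log_big`` and its splitting strategy are illustrative, not CPython's exact C logic:

```python
import math

# Fused on 3.13+; plain multiply-add (two roundings) as a fallback.
fma = getattr(math, "fma", lambda a, b, c: a * b + c)

def log_big(n, func=math.log10):
    # Huge ints can't be converted to float without overflow, so split
    # n ~= x * 2**e with x holding the top 53 bits; then
    # log(n) ~= log(x) + log(2) * e, computed with one rounding saved
    # versus func(x) + func(2.0) * e.
    e = max(n.bit_length() - 53, 0)
    x = float(n >> e)
    return fma(func(2.0), float(e), func(x))
```

With ``func=math.log2`` the scheme is exact for powers of two, since ``log2(2.0)`` is exactly 1.0 and ``log2`` of a power-of-two float is an integer.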
