[issue37295] Possible optimizations for math.comb()

2021-12-29 Thread Tim Peters


Tim Peters  added the comment:

About:

TableSize = 101
limits = bytearray(TableSize)
for n in range(0, TableSize):
    for k in range(0, n+1):
        if comb(n, k) != comb_small(n, k):

(and regardless of whether the last line is replaced with the later correction):

Did you try running that? Assuming "comb_small()" refers to the earlier Python 
function of that name you posted, it dies in the obvious way:

Traceback (most recent call last):
  File "C:\MyPy\temp3.py", line 29414, in 
if comb(n, k) != comb_small(n, k) % Modulus:
  File "C:\MyPy\temp3.py", line 29404, in comb_small
return (F[n] * Finv[k] * Finv[n-k] % Modulus) << (S[n] - S[k] - S[n-k])
IndexError: list index out of range

This occurs, as expected, when n first reaches 68 (because Cmax = 67 in the 
code posted for comb_small()).

So it's unclear what you intended to say. Certainly, the current mathmodule.c 
perm_comb_small() (where "small" appears to mean merely that the args fit in C 
"unsigned long long") makes no attempt to exploit the newer n <= 67 code Mark 
checked in.

--




[issue46187] Optionally support rounding for math.isqrt()

2021-12-29 Thread Tim Peters


Tim Peters  added the comment:

All cool. Since I doubt these new rounding modes will get much use anyway, it's 
not worth arguing about. If I were to do this for myself, the rounding argument 
would be one of the three values {-1, 0, +1}, which are already more than good 
enough as mnemonics for "go down, get close, go up". Passing strings of any 
kind is already pedantic overkill ;-)

--




[issue46187] Optionally support rounding for math.isqrt()

2021-12-28 Thread Tim Peters


Tim Peters  added the comment:

FYI, I had a simpler derivation in mind. Say

sqrt(n) = r + f

where r = isqrt(n) and 0 <= f < 1. Then

sqrt(4n) = 2 * sqrt(n) = 2*(r + f) = 2r + 2f, with 0 <= 2f < 2.

If f < 0.5, 2f < 1, so isqrt(4n) = 2r, and we shouldn't round r up either.

If f > 0.5, 2f > 1, so sqrt(4n) = 2r + 1 + (2f - 1), with 0 <= 2f - 1 < 1, so 
isqrt(4n) = 2*r + 1. In this case (f > 0.5) we need to round r up.

f = 0.5 can't happen.
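
(A quick pure-integer sanity check of the above, using Mark's risqrt() recipe; the
inequality test is just the "r is the closest integer to sqrt(n)" condition spelled
without floats:)

from math import isqrt

def risqrt(n):
    return (isqrt(n << 2) + 1) >> 1

# r is the integer closest to sqrt(n) iff (2r-1)**2 < 4n < (2r+1)**2;
# equality can't happen because 4n is even while squares of odd ints are odd.
for n in range(1, 1_000_000):
    r = risqrt(n)
    assert (2*r - 1)**2 < 4*n < (2*r + 1)**2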

Regardless, I don't believe I would have thought of this myself! It was an 
unexpected delight :-)

--




[issue46187] Optionally support rounding for math.isqrt()

2021-12-28 Thread Tim Peters


Tim Peters  added the comment:

>> can we use the decimal module's names for the supported
>> rounding modes?

> I'm not sure those make sense because we never get to
> exactly half.  There is only floor, ceil, and round,
> not half_up, half_even, etc.

So use decimal's ROUND_CEILING, ROUND_FLOOR, and ROUND_HALF_EVEN. It's 
irrelevant that the halfway case can't occur: it's still following the 
ROUND_HALF_EVEN rules, it's just that one of those rules never happens to apply 
in this context. So what? Your _intent_ is to supply "best possible rounding", 
and that's what ROUND_HALF_EVEN means. It's not doing any favors to make up a 
new name for a rounding mode that only applies to values that can never tie. 
Tail wagging the dog ;-) If someone knows what half/even does, they already know what 
_this_ rounding mode does. Why complicate it with a useless distinction?

--




[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Tim Peters


Tim Peters  added the comment:

A timing confounder: I see that CPython's _Py_popcount32() only tries to use 
the relevant blazing fast hardware instruction if defined(__clang__) || 
defined(__GNUC__). On Windows, it's a long-winded bit-fiddling dance.

So which of xor-popcount and add-up-trailing-zero-counts is faster may well 
depend on platform.

--




[issue37295] Possible optimizations for math.comb()

2021-12-28 Thread Tim Peters


Tim Peters  added the comment:

Raymond, using the count of trailing zeroes instead is exactly what I suggested 
before, here:

https://bugs.python.org/issue37295#msg409000

So Mark & I already batted that around. For whatever reasons he had, though, he 
stuck with the xor-popcount approach. Presumably he found that was faster than 
looking up trailing zero counts.

--




[issue46187] Optionally support rounding for math.isqrt()

2021-12-28 Thread Tim Peters


Tim Peters  added the comment:

[Mark]
> def risqrt(n):
>    return (isqrt(n<<2) + 1) >> 1

Sweet! New one on me - did you invent this? It's slick :-)

I'd be happy to see recipes added to the docs for rounded and ceiling flavors 
of isqrt, but am dubious about the value of building them in.

If they are added via an optional rounding argument, can we use the decimal 
module's names for the supported rounding modes?

--




[issue37295] Possible optimizations for math.comb()

2021-12-24 Thread Tim Peters


Tim Peters  added the comment:

If people are keen to speed comb() for larger arguments, the approach in 
Stefan's "comb_with_primes.py" is very much worth looking at. I've played with 
that idea before, and it runs circles around any other approach I've seen.

The only divisions it does are of integers no larger than n by primes no larger 
than k (the "effective" k: min(k, n-k)). The current CPython implementation 
guarantees the latter fit in a C "long long", so the divisor is never notably 
stressful.

The downside is that it requires O(log(n) * k) temp storage, to hold 
list(range(n, n-k, -1)), and O(k) temp storage to hold a sieve to find the 
primes <= k.

A subtlety: Stefan's `myprod()` is also important to its speed gains when k 
gets large. For example, given xs = list(range(1, 100)), math.prod(xs) 
takes over 40 times longer than myprod(xs). myprod() is obscurely coded 
(presumably for peak speed), but effectively arranges the multiplications in a 
complete binary tree (e.g., myprod([1, 2, 3, 4, 5, 6, 7, 8]) is evaluated as 
((1*2)*(3*4))*((5*6)*(7*8))). This gives Karatsuba multiplication good chances 
to get exploited at higher levels in the tree.
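
If it helps to see the shape of the idea, here's a minimal sketch of a balanced-tree
product - not Stefan's actual myprod(), just the same association pattern:

from math import prod

def tree_prod(xs):
    # Pair up neighbors repeatedly, so the big multiplications near the
    # root of the tree involve operands of roughly equal size - which is
    # where Karatsuba can kick in.
    xs = list(xs) or [1]
    while len(xs) > 1:
        if len(xs) & 1:
            xs.append(1)
        xs = [xs[i] * xs[i+1] for i in range(0, len(xs), 2)]
    return xs[0]

# same association as ((1*2)*(3*4))*((5*6)*(7*8))
assert tree_prod(range(1, 9)) == prod(range(1, 9))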

The "divide-and-conquer" recurrence already checked in also bought speed gains 
by getting Karatsuba in play, but is much less effective overall because it can 
still need divisions of giant ints by giant ints. Then again, it's frugal with 
memory.

--




[issue37295] Possible optimizations for math.comb()

2021-12-23 Thread Tim Peters


Tim Peters  added the comment:

Please don't use "long long". It usually introduces platform dependence where 
none is intended, or helpful. PEP 7 requires that the compiler in use supply 
the fixed-width integer types defined by C99's stdint.h[1]. These are:

int8_t
int16_t
int32_t
int64_t
uint8_t
uint16_t
uint32_t
uint64_t

This has been required since Python 3.6. There is no reason not to use them.

[1] https://www.python.org/dev/peps/pep-0007/#c-dialect

--




[issue37295] Possible optimizations for math.comb()

2021-12-22 Thread Tim Peters


Tim Peters  added the comment:

No problem, Mark! I just prefer the shallowest approaches that are "good 
enough". If it's materially faster to use xors and a popcount instead, fine by 
me, just provided a comment points to a clue about why that works.

BTW, the later xor version was clearer to me at first glance than what it 
replaced, the older

k.bit_count() + (n-k).bit_count() - n.bit_count()

The connection to "carries" is quite obscured there. Instead it's a 
straightforward coding of one statement of Kummer's theorem for multinomial 
coefficients:  the highest power of a prime p dividing the multinomial 
coefficient M(n; k1, k2, k3, ...), where sum(k_i)=n, is the sum of the digits 
of k1, k2, ... when expressed in base p, less the sum of the digits of n, then divided 
by p-1. So, for 
p=2 and M(n; k, n-k), that's exactly the same (and leaving out the no-op of 
dividing by p-1=1 in the p=2 case).

Which in turn is, I think, easiest derived not from thinking about carries, but 
from mechanically plugging in 3 instances of that the highest power of p 
dividing i! is i minus the sum of the digits of i in base p, then divided by 
p-1. That in turn is easy to show by considering what Legendre's formula does 
in each digit position (and "carries" don't come up in that line of proof).
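
For anyone who wants to check the bookkeeping, all three formulations agree. A quick
brute-force check (needs 3.10+ for int.bit_count(); the exact spelling of the xor form
here is my assumption, not a quote of the C code):

from math import comb

def v2_factorial(i):
    # Legendre for p=2: the exact power of 2 dividing i! is i - popcount(i)
    return i - i.bit_count()

for n in range(200):
    for k in range(n + 1):
        c = comb(n, k)
        v2 = (c & -c).bit_length() - 1  # trailing zero count of comb(n, k)
        assert v2 == v2_factorial(n) - v2_factorial(k) - v2_factorial(n - k)
        assert v2 == k.bit_count() + (n - k).bit_count() - n.bit_count()
        assert v2 == (k ^ (n - k) ^ n).bit_count()  # xor-popcount form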

--




[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Tim Peters


Tim Peters  added the comment:

I see no use of 128-bit ints in the CPython core. Advice: forget it. 

int64_t and uint64_t are required by C99, and are used many places in the core. 
Advice: use freely.

Note that if tables of "odd part mod 2**64" and "number of trailing zeroes" are 
used up through 67, then factorials up through 25! are trivially computed via

Fodd[i] << Fntz[i]


(at 26, the odd part no longer fits in 64 bits)
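
For example, a quick check of where that cutoff comes from:

import math
for i in (25, 26):
    f = math.factorial(i)
    odd = f >> ((f & -f).bit_length() - 1)  # the odd part of i!
    print(i, odd.bit_length())  # 25 -> 62 bits (fits in 64), 26 -> 66 bits (doesn't)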

--




[issue37295] Possible optimizations for math.comb()

2021-12-21 Thread Tim Peters


Tim Peters  added the comment:

Clever, Mark! Very nice.

The justification for the shift count isn't self-evident, and appears to me to 
be an instance of the generalization of Kummer's theorem to multinomial 
coefficients. I think it would be clearer at first sight to rely instead on 
that 2**i/(2**j * 2**k) = 2**(i-j-k), which is shallow.

So here's a minor rewrite doing that instead; it would add 68 bytes to the 
precomputed static data.

import math
# Largest n such that comb(n, k) fits in 64 bits for all k.
Nmax = 67

# Express n! % 2**64 as Fodd[n] << Fntz[n] where Fodd[n] is odd.
Fodd = [] # unsigned 8-byte ints
Fntz = [] # unsigned 1-byte ints
for i in range(Nmax + 1):
    f = math.factorial(i)
    lsb = f & -f # isolate least-significant 1 bit
    Fntz.append(lsb.bit_length() - 1)
    Fodd.append((f >> Fntz[-1]) % 2**64)

Finv = [pow(fodd, -1, 2**64) for fodd in Fodd]

# All of the above is meant to be precomputed; just static tables in C.

# Fast comb for small values.
def comb_small(n, k):
    if not 0 <= k <= n <= Nmax:
        raise ValueError("k or n out of range")
    return ((Fodd[n] * Finv[k] * Finv[n-k] % 2**64)
            << (Fntz[n] - Fntz[k] - Fntz[n-k]))

# Exhaustive test
for n in range(Nmax+1):
    for k in range(0, n+1):
        assert comb_small(n, k) == math.comb(n, k)

Since 99.86% of comb() calls in real life are applied to a deck of cards ;-) , 
it's valuable that Nmax be >= 52.

--




[issue37295] Possible optimizations for math.comb()

2021-12-20 Thread Tim Peters


Tim Peters  added the comment:

Just curious about why the focus on the newer exp2 and log2? Logs are logs, and 
the "natural" `exp()` and `log()` should work just as well. On Windows, exp2() 
is particularly poor for now (I've seen dozens of cases of errors over 2 ulp, 
with exp2(x) very much worse than the Windows pow(2, x)).

--




[issue45917] Add math.exp2() function: 2^x

2021-11-29 Thread Tim Peters


Tim Peters  added the comment:

Across millions of tries, same thing: Windows exp2 is off by at least 1 ulp 
over a third of the time, and by over 2 ulp about 3 times per million. Still 
haven't seen pow(2, x) off by as much as 0.52 ulp.

From its behavior, it appears Windows implements exp2(x) like so:

i = floor(x)
x -= i # now 0 <= x < 1
return ldexp(exp2(x), i)

So it's apparently using some sub-state-of-the-art approximation to 2**x over 
the domain [0, 1].

But a consequence is that it gets it exactly right whenever x is an integer, so 
it's unlikely anyone will notice it's sloppy ;-)

I expect we should just live with it.

--




[issue45917] Add math.exp2() function: 2^x

2021-11-29 Thread Tim Peters


Tim Peters  added the comment:

Bad news: on Windows, exp2(x) is way worse then pow(2, x). Here I changed the 
loop of Mark's little driver like so:

differ = really_bad = 0
worst = 0.0
for n in range(100_000):
    x = random.uniform(-1000.0, 999.0) + random.random()
    if exp2(x) != pow(2.0, x):
        differ += 1
        exp2err = exp2_error_ulps(x)
        pow2err = pow2_error_ulps(x)
        assert abs(pow2err) < 0.52
        if abs(exp2err) >= 1.0:
            if abs(exp2err) > abs(worst):
                worst = exp2err
            really_bad += 1
            if really_bad < 25:
                print(f"{x.hex():21} {x:22.17f} {exp2err:.5f}, {pow2err:.5f}")
print(f"{differ=:,}")
print(f"{really_bad=:,}")
print(f"worst exp2 ulp error {worst:.5f}")

Then output from one run:

0x1.0946680d45f28p+9   530.55005041041749791 -1.04399, -0.04399
0x1.de4f9662d84f8p+9   956.62177691995657369 -1.00976, -0.00976
-0x1.60f9152be0a09p+4  -22.06081120624188330 1.02472, 0.02472
-0x1.687b056d7a81ap+8 -360.48055156937482479 1.48743, 0.48743
0x1.8e97e9d405622p+9   797.18682337057930454 1.05224, 0.05224
-0x1.2d1e3a03eda7fp+9 -602.23614548782632028 -1.21876, -0.21876
0x1.3af55e79cd45dp+8   314.95847283612766887 -1.10044, -0.10044
0x1.0e7fba610cde6p+9   540.99787533882476964 -1.39782, -0.39782
0x1.9c7d0258e460dp+9   824.97663413192060489 1.19690, 0.19690
0x1.3de5064eb1598p+9   635.78925498637818237 1.75376, -0.24624
-0x1.d5189d23da3d0p+9 -938.19229553371587826 1.07734, 0.07734
0x1.967d0857aa500p+9   812.97681709114112891 1.23630, 0.23630
-0x1.30ee89e018914p+6  -76.23294782781550794 -1.10275, -0.10275
-0x1.e35eb8936dddbp+9 -966.74000780930089149 -1.02686, -0.02686
-0x1.28d40d7693088p+6  -74.20708260795993283 1.00352, 0.00352
-0x1.e965d067d1084p+7 -244.69885563303625986 1.21136, 0.21136
-0x1.b1fbeec1c1ba3p+7 -216.99205594529948371 -1.05536, -0.05536
-0x1.543d715a5824cp+9 -680.48002175620922571 1.24955, 0.24955
0x1.526829d46c034p+9   676.81377654336984051 -1.17826, -0.17826
-0x1.bdaf1d7850c74p+6 -111.42101085656196346 1.08670, 0.08670
-0x1.48218d1605dd0p+9 -656.26211810385029821 1.06103, 0.06103
-0x1.16298bcdb9103p+9 -556.32457896744051595 -1.23732, -0.23732
0x1.39ff24b1a7573p+8   313.99665365539038930 -1.20931, -0.20931
0x1.9cdf1d0101646p+8   412.87153631481157845 -1.23481, -0.23481
differ=38,452
really_bad=7,306
worst exp2 ulp error -1.91748

So they differed in more than a third of the cases; in about a fifth of the 
differing cases, the exp2 error was at least 1 ulp, and nearly 2 ulp at worst; 
while in all the differing cases the pow(2, x) error was under 0.52 ulp.

--




[issue45902] Bytes and bytesarrays can be sorted with a much faster count sort.

2021-11-28 Thread Tim Peters


Tim Peters  added the comment:

I agree with closing this - I don't know of real use cases either.

Serhiy, essentially all LSD radix sorts are stable, and rely on that for their 
own correctness. MSD radix sorts vary.

--




[issue45918] Possibly use ROUND_05UP in decimal's localcontext() example

2021-11-28 Thread Tim Peters


Tim Peters  added the comment:

I'll add that the rounding mode is intended to ease emulating fixed-point 
arithmetic. The decimal spec claimed that was a goal, but there really isn't 
any direct support for saying, e.g., "I want two digits after the decimal 
point". Only for specifying total precision, independent of the radix point's 
position.

So, e.g., if you want to work with tax rates, etc, but keeping results to penny 
precision,

1. Set the rounding mode to ROUND_05UP with "plenty of" precision digits.

2. To add, say, a statutory 3.578% tax rate to an item with cost C:

   C *= decimal.Decimal("1.03578")
   C = C.quantize(decimal.Decimal(".01"), decimal.ROUND_HALF_UP)

or whatever final rounding mode local statutes require.
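
Concretely, a sketch of the recipe with a made-up $19.99 item:

import decimal
from decimal import Decimal, localcontext

cost = Decimal("19.99")
with localcontext() as ctx:
    ctx.prec = 28                       # "plenty of" precision digits
    ctx.rounding = decimal.ROUND_05UP   # for the intermediate arithmetic
    cost *= Decimal("1.03578")
# final rounding to penny precision, per local statute
cost = cost.quantize(Decimal(".01"), decimal.ROUND_HALF_UP)
print(cost)   # 20.71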

I"m not sure anyone other than Mike Cowlishaw realizes that, though ;-)

--




[issue45918] Possibly use ROUND_05UP in decimal's localcontext() example

2021-11-28 Thread Tim Peters


Tim Peters  added the comment:

Not a good idea in general - this rounding mode is _mostly_ "to zero", and 
isn't intended for chains of operations. I don't believe I've ever seen it used 
in a real program, so the current "no example at all" is a fair representation 
of real usage ;-)

--




[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters


Tim Peters  added the comment:

But I would like to leave it alone. Extended precision simply is not an issue 
on any current platform I'm aware of ("not even Windows"), and I would, e.g., 
hate trying to explain to users why

1 / 2731 != 1.0 / 2731.0

(assuming we're not also proposing to take float division away from the HW). 
It's A Feature that

I / J == float(I) / float(J)

whenever I and J are both representable as floats.

If extended precision is an issue on some platform, fine, let them speak up. On 
x87 we could document that CPython assumes the FPU's "precision control" is set 
to 53 bits.

--




[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters


Tim Peters  added the comment:

> Objects/longobject.c::long_true_divide() uses ldexp() internally.
> Will it suffer the same issues with subnormals on Windows?

Doesn't look like it will. In context, looks like it's ensuring that ldexp can 
only lose trailing 0 bits, so that _whatever_ ldexp does in the way of rounding 
is irrelevant. But it's not doing this because of Windows - it's to prevent 
"double-rounding" errors regardless of platform.

> Is CPython int/int true division guaranteed to be correctly rounded?

If there's some promise of that in the docs, I don't know where it is. But the 
code clearly intends to strive for correct rounding. Ironically, PEP 238 
guarantees that if it is correctly rounded, that's purely by accident ;-) :

"""
True division for ints and longs will convert the arguments to float and then 
apply a float division. That is, even 2/1 will return a float 
"""

But i/j is emphatically not implemented via float(i)/float(j).
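
One quick way to see that:

>>> 10**400 / 10**399   # works, and the quotient here is exact
10.0
>>> float(10**400) / float(10**399)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too large to convert to float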

--




[issue45876] Improve accuracy of stdev functions in statistics

2021-11-26 Thread Tim Peters


Tim Peters  added the comment:

Mark, ya, MS's Visual Studio's ldexp() has, far as I know, always worked this 
way. The code I showed was run under the 2019 edition, which we use to build 
the Windows CPython.

Raymond,

x = float(i)

is screamingly obvious at first glance.

x = i/1

looks like a coding error at first. The "reason" for different spellings in 
different branches looked obvious in the original: one branch needs to divide, 
and the other doesn't. So the original code was materially clearer to me.

Both, not sure it helps, but this use of round-to-odd appears akin to the 
decimal module's ROUND_05UP, which rounds an operation result in such a way 
that, if it's rounded again - under any rounding mode - to a narrower 
precision, you get the same narrower result as if you had used that rounding 
mode on the original operation to that narrower precision to begin with.

Decimal only needs to adjust the value of the last retained digit to, 
effectively, "encode" all possibilities, but binary needs two trailing bits. 
"Round" and "sticky" are great names for them :-)

--




[issue45876] Improve accuracy of stdev functions in statistics

2021-11-25 Thread Tim Peters


Tim Peters  added the comment:

Note that, on Windows, ldexp() in the presence of denorms can truncate. 
Division rounds, so

assert x / 2**i == ldexp(x, -i)

can fail.

>>> import math
>>> d = math.nextafter(0.0, 1.0)
>>> d
5e-324
>>> d3 = 7 * d # ....0111
>>> d3
3.5e-323
>>> d3 / 4.0   # rounds
1e-323
>>> math.ldexp(d3, -2)  # truncates
5e-324

or, perhaps more dramatically,

>>> d3 / 8.0, math.ldexp(d3, -3)
(5e-324, 0.0)

--




[issue45735] Promise the long-time truth that `args=list` works

2021-11-10 Thread Tim Peters


Tim Peters  added the comment:

Changed stage back to "needs patch", since Raymond appears to have closed his 
PR. Raymond, what's up with that?

--
stage: patch review -> needs patch




[issue45735] Promise the long-time truth that `args=list` works

2021-11-06 Thread Tim Peters


Tim Peters  added the comment:

Serhiy, we haven't documented such stuff, and, indeed, I've been burned by it 
but much more often in the case of multiprocessing.Process. But note that I'm 
SWAPPING the order of your last two lines. In the original, you mutated the 
argument _before_ starting any parallel work, so "of course" the new worker 
will see the mutation:

def access(xs):
    print(xs)

args = ([1],)
t = multiprocessing.Process(target=access, args=args)
t.start() # start parallel work before mutating
args[0][0] = 2



Does that print [1] or [2]? Passing a tuple in no way prevents mutations to 
mutable objects the tuple contains.

When the docs are silent, "implementation defined" rules. Whether you use 
threading or multiprocessing in the altered example above, the result printed 
simply isn't defined - it's a race between the main thread doing the mutation 
and the "parallel part" accessing the mutated object.

This is subtler in the multiprocessing context, though, because the relevant 
"parallel part" is really the hidden thread that pickles the argument list to 
send to the worker. That effectively makes a deep copy. But it's still a race, 
just not one visible from staring at the Python code. In the threading case, no 
copies are made.

--




[issue45735] Promise the long-time truth that `args=list` works

2021-11-05 Thread Tim Peters


Change by Tim Peters :


--
title: Promise that the long-time truth that `args=list` works -> Promise the 
long-time truth that `args=list` works




[issue45735] Promise that the long-time truth that `args=list` works

2021-11-05 Thread Tim Peters


New submission from Tim Peters :

A number of contexts allow specifying a tuple of arguments to be passed later 
to a function. The Thread constructor is a fine example, and happened to come 
up (again! for me) here today:

https://stackoverflow.com/questions/69858950/why-do-we-have-to-add-comma-in-args-in-python-multithreading/69859068

This often confuses especially newbies, because the function they intend to 
parallelize often takes only a single argument, and Python's syntax for a 
1-element tuple actually _requires_ parentheses in the context of an argument 
list, with a naked trailing comma:

t = threading.Thread(target=access, args=(thread_number,))

It "looks weird" to people.

I'm not suggesting to change that, but instead to officially bless the 
workaround I've seen very often in real code: use a list instead.

t = threading.Thread(target=access, args=[thread_number])

Nobody scratches their head over what that means.

CPython's implementations typically couldn't care less what kind of sequence is 
used, and none that I'm aware of verify that it's specifically a tuple. The 
implementations just go on to do some simple variation of

self.target(*self.args)

Tuple or list makes no real difference. I'm not really keen to immortalize the 
"any sequence type whatsoever that just happens to work" implementation 
behavior, but am keen to promise that a list specifically will work. A lot of 
code already relies on it.

--
assignee: docs@python
components: Documentation
keywords: easy
messages: 405846
nosy: docs@python, tim.peters
priority: low
severity: normal
stage: needs patch
status: open
title: Promise that the long-time truth that `args=list` works
type: behavior
versions: Python 3.11




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-24 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-24 Thread Tim Peters


Tim Peters  added the comment:


New changeset 51ed2c56a1852cd6b09c85ba81312dc9782772ce by Tim Peters in branch 
'main':
bpo-45530: speed listobject.c's unsafe_tuple_compare() (GH-29076)
https://github.com/python/cpython/commit/51ed2c56a1852cd6b09c85ba81312dc9782772ce


--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-24 Thread Tim Peters

Tim Peters  added the comment:

To be concrete, here's an implementation of a full-blown, stable lexicographic 
sort based on "bucket refinement". `xs` is a list of sequences to be sorted in 
lexicographic order. The types of the sequences don't matter (lists, tuples, 
strings, ...). Indeed, since list elements are never compared against each 
other directly, they don't even have to be the same sequence type.

This is already faster in pure Python than list.sort() for cases like:

xs = [random.choices(range(10), k=random.randrange(5, 30))
  for i in range(100)]

However, for cases like strings of the form

'x' * 10_000 + str(i)

it's very much slower than list.sort(), despite that it "should be" very much 
faster. That appears mostly due to the:

t.sort(key=lambda x: x[k])
xs[lo : hi] = t

lines. list.sort() doesn't accept `lo` and `hi` arguments, so sorting _just_ a 
slice requires copying that slice into a temp list, sorting the temp, then 
copying the sorted temp back in. So dealing with a single `x` position in the 
string prefixes effectively requires physically copying the entire list twice 
over - mad overhead to copy the list 20 thousand times.

While the need doesn't come up all that often, I'd support adding optional `lo` 
and `hi` arguments to `list.sort()`. This isn't the first time the lack has 
turned a slick approach into a timing disaster for me ;-)

About list.sort()'s merge strategy, I'm glad to be rid of the original. It's 
not just correctness, it's the effort of _reasoning_ about its consequences. It 
was about a full decade before the first proof was published of that it really 
is a worst-case O(N log N) sort. listsort.txt didn't contain a proof because 
the only proof attempts I had were so complicated not even _I_ found them 
compelling ;-)

Vincent Jugé in particular started at the other end, looking for a merge 
strategy that made proof straightforward instead of Byzantine. It's 
straightforward under the "powersort" strategy too, although it relies on "well 
known" results about approximations to optimal binary search trees.

def lexisort(xs):
    buckets = [(0, len(xs), 0)]
    while buckets:
        lo, hi, k = buckets.pop()
        t = []
        for i in range(lo, hi):
            x = xs[i]
            if k >= len(x):
                xs[lo] = x
                lo += 1
            else:
                t.append(x)
        t.sort(key=lambda x: x[k])
        xs[lo : hi] = t
        while lo < hi:
            pivotk = xs[lo][k]
            i = lo + 1
            while i < hi and xs[i][k] == pivotk:
                i += 1
            if i - lo > 1:
                buckets.append((lo, i, k + 1))
            lo = i
        assert lo == hi
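
A quick sanity check against sorted() (lexisort sorts in place):

import random
xs = [random.choices(range(10), k=random.randrange(5, 30)) for i in range(1000)]
expected = sorted(xs)
lexisort(xs)
assert xs == expected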

--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-22 Thread Tim Peters


Tim Peters  added the comment:

Stefan, thanks - I think I understand what you're driving at now.

You're (re)discovering that sorting by lexicographic ordering rules is 
_inherently_ suboptimal if list elements can only be compared "starting at 
index 0" each time. Not just tuples, but also, e.g., lists, byte arrays, and 
even strings. Any sequence type whose ordering derives from lexicographic 
generalization of its elements' ordering.

This is quite independent of whether or not CPython tries to exploit type 
homogeneity.

The usual(*) solution is indeed to exploit a kind of bucket sort instead, first 
sorting at elements' index 0 alone, saying all those elements with the same 
value at index 0 constitute "a bucket", and then applying the same idea to each 
bucket but sorting on index 1 instead. Where those in turn are partitioned into 
buckets equal at index 1, and those buckets are similarly each sorted on index 
2. And so on, and so on.

No doubt that can be a big win, and even in the absence of the type homogeneity 
tricks. It's far beyond the scope of _this_ PR, though, and is really an 
entirely different approach.

I've thought about it at times, but it's "a real project" to do it right. In 
effect, it would implement an optimized generalization of the SO OP's 
"automagically convert multikey to multisort" wish.

Fine by me if someone pursues that, but I don't have the energy for it, nor 
much interest either in doing a half-hearted job of it.

I always expected that if it came up at all, it would be in the context of 
sorting lists of strings.  For example, consider:

xs = ['x' * 10_000 + str(i) for i in range(1_000_000)]
random.shuffle(xs)

Even with type homogeneity tricks, Python's sorting of the result is very far 
from optimal. Every single comparison starts by burning time skipping over the 
(at least) ten thousands equal characters at the start.

The "bucket sort" (or it could be viewed as a form of MSD radix sort) quickly 
discovers that all the characters at index 0 are equal, and also all those at 
index 1, ..., and all those at index 9999. The "real work" of sorting is then 
reduced to working on the (at most) 6 characters remaining.

(*) But I'm lying ;-) The actual usual solution is to use "multikey quicksort", 
because it's easy to code and works "in place". But it's not a stable sort. 
String sorting doesn't care about that, but sorting other kinds of sequences 
can care a lot.

--




[issue45569] Drop support for 15-bit PyLong digits?

2021-10-22 Thread Tim Peters


Tim Peters  added the comment:

+1 "in theory". But I too don't know whether any platforms would be adversely 
affected, or how to find out :-(

--
nosy: +tim.peters




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-21 Thread Tim Peters


Tim Peters  added the comment:

> I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be
> faster for example because it checks identity.

For example, in tupsort.py replace

xs = [random() for _ in range(length)]

with

xs = ['z' * 100 for _ in range(length)]

Then sorting that directly is _slower_ than wrapping each string in a 1-tuple 
first. That's so today, and also so in the latest version of the PR (but not in 
the original PR code).

> Why not do an identity check before the ms->tuple_elem_compare calls? Too
> expensive and rarely successful?

It's a cheap test, but, ya, "rarely successful" outside of special cases. 
PyObject_RichCompareBool() already does that special-casing now, and the PR now 
adjusts to use PyObject_RichCompareBool(Py_EQ) instead when the pair of 
ms->tuple_elem_compare calls isn't paying off. It's enough that one part of the 
chain rarely pays off, but gets nearly all the potential benefit when it does 
pay off. Duplicating the special-casing would double the overhead with scant 
additional payoff when it pays.

> I sorted a million tuples where both first and second element
> are randomly chosen from a population of 10,000.

So you expect about 100 repetitions of each value in each tuple position.

> So their amounts of duplication were the same. But these are the
> statistics from sorting:
> - First position:   18,603,981 equality comparisons, 29.87% equal
> - Second position:   5,556,203 equality comparisons,  0.09% equal

Well, for any given fixed value in the first tuple position, you expect about 
100 tuples total to have that value in the first position, and the second 
position in those contains a random sample (with repetition) of 100 elements 
out of 10,000. It's not surprising the 2nd positions are rarely equal _given_ 
that the universe has been reduced to the 100 tuples with the same first 
element.

In any case, I don't see the point to this exercise ;-)

BTW, don't overlook that tuple_elem_compare can _only_ be used on the very 
first elements of the tuples. The pre-scan developed no idea whatsoever of the 
types of tuple elements other than the first.


> One more idea: What do you think of sorting lists of tuples
> (or tuple keys) in two stages?

Primarily that I'm not looking for a term project here - I'm looking to get an 
easy win from simple changes to one function.

What's there in the PR now is designed to play well with how sorting works. 
When there are many duplicates, a merge will find about 7 comparison attempts 
in a row where the first elements are equal, and the code adjusts to use 
PyObject_RichCompareBool(Py_EQ) for at least all but the first compare. 
Galloping mode will continue with that for a brief time at the start, but 
probably soon hit a chain of cases all of which can be resolved solely with 
tuple_elem_compare calls, and the code adjusts to that too, never using 
PyObject_RichCompareBool(Py_EQ) again after the first time it sees that return 
"no, not equal".

OTOH, when there are no duplicates, PyObject_RichCompareBool(Py_EQ) isn't 
helpful, and in fact will never be called.

> 1) Sort the list only by first position (comparing with only one
>tuple_elem_compare).

Not sure what that means, exactly. (the "with only one tuple_elem_compare" 
part; I know what it means to sort by the first position)

> 2) Identify equal-first-position-runs (with tuple_elem_compare) and
> sort each run independently (comparing only positions 1+).

What are you aiming at? There was no motivation given here.

> Some experiments I did with this looked good, not sure if too
> off-topic to post here...

Well, this issue report is about unsafe_tuple_compare(), so, ya, if the changes 
you envision aren't limited to that, I'd say you want to open a different issue 
report and a different PR :-)

--




[issue45542] Using multiple comparison operators can cause performance issues

2021-10-20 Thread Tim Peters


Tim Peters  added the comment:

I think Dennis's example is fatal: from section 6.10 ("Comparisons"):

"""
Comparisons can be chained arbitrarily, e.g., `x < y <= z` is equivalent to `x 
< y and y <= z`, except that y is evaluated only once (but in both cases z is 
not evaluated at all when x < y is found to be false).
"""

So doing LOAD_FAST twice on x (in `1 < x < 3`) is prohibited by the language 
definition. Doesn't matter to this whether it's plain `x` or `f(x)` where `f()` 
is some arbitrary function: the object the middle comparand signifies is fixed 
at whatever it was when the first comparison is evaluated.
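
For example, a quick illustration of the "evaluated only once" rule:

>>> calls = 0
>>> def f(x):
...     global calls
...     calls += 1
...     return x
...
>>> 1 < f(2) < 3   # the middle operand is evaluated exactly once
True
>>> calls
1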

--
nosy: +tim.peters




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-20 Thread Tim Peters


Tim Peters  added the comment:

It's rare that an optimization is a _pure_ win. Some cases win, others lose. 
There's almost never "a proof" of net gain ending with "QED".

Of course it's dead easy to construct examples where "many duplicates in the 
first tuple position" rules, and even easy to construct realistic examples.

But, as you already said in the SO report, in those cases it's a better idea to 
do multiple single-key sorts instead of a single multiple-keys-in-a-tuple sort. 
The base sorting algorithm can exploit duplicates in a single-key sort - 
indeed, the more duplicates there are, the more benefit it can get.
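
That's the usual stable-sort idiom - sort on the last key first (made-up data, just to
show the shape):

import random
data = [(random.randrange(10), random.randrange(10)) for _ in range(1000)]
multi = sorted(data, key=lambda r: r[1])  # secondary key first
multi.sort(key=lambda r: r[0])            # then primary; stability keeps ties in order
assert multi == sorted(data)              # same result as one tuple-key sort here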

The sorting HOWTO could do a service by discussing this for CPython's sorting 
algorithm - it's not obvious, and can have major effects.

In the meantime, I'm only aiming to speed tuple keys for cases where they're 
well-suited to begin with: the first tuple elements _are_ mostly distinct. 
Giving up a significant win for the relative handful of people who know how to 
exploit the algorithm is not worth it, to me, to avoid making suboptimal uses 
even less optimal.

I'm more a pragmatist than a Utopian :-;

Extending the idea to positions beyond the first is dubious on those grounds: 
if the first tuple positions in fact often match, then the optimization has 
already backfired. Time to admit defeat then, not double down on failed 
arrogance ;-)

One idea I'm playing with: keep track of how often, during a tuple-key sort, a 
compare _is_ resolved by the first elements. Then adjust 
`unsafe_tuple_compare()` to switch to "the other" approach when "the current" 
approach isn't working out.

--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-20 Thread Tim Peters


Tim Peters  added the comment:

> Elliot shortly after retreated from the approach, saying he
> rewrote unsafe_tuple_compare to move the less-than after
> the equality testing, to make sure it's 100% consistent".

I remember at the time having no idea what he meant by that comment - and I 
still have no idea ;-)

> The proposed method would change comparisons of *the same* tuples.
> You'd for example no longer have
> "sorted(tuples) == [min(tuples), max(tuples)]"
> for a list of two tuples.

Since very little is defined about the result of sorted() in the absence of a 
total ordering (just that it's _some_ permutation of the original), that's not 
surprising. It's already the case anyway (using the released 3.10.0):

>>> ts = [(float("nan"), 1), (float("nan"), 1.0)]
>>> sorted(ts) == [min(ts), max(ts)]
False

> I recently saw a case where despite the "only <" promise, sorting
> caused an "x > y" comparison (because "y < x" wasn't implemented).

Sorting itself never asks for __gt__, only __lt__. What the _rest_ of the 
language does to implement __lt__ is out of sorting's control. For example, 
tuple.__lt__ and list.__lt__ go on to invoke __eq__ on contained elements. A 
more general subsystem in the language falls back on x.__gt__(y) if y.__lt__(x) 
isn't implemented. Sorting has no control over any of that.

> (tupsort.py does "range(2", should be "range(1" I think)

Good catch - thanks! Repaired now :-)

--
Added file: https://bugs.python.org/file50373/tupsort.py




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Tim Peters  added the comment:

Stefan, I looked at that old PR and can't find anywhere I suggested that he 
change the unsafe_tuple_compare() logic. I just _asked_ him "I'm curious about 
what the patched Python prints for this program:".

And, in fact, that program showed that CPython was _already_ inconsistent with 
how NaNs were treated during tuple comparison (object identity overriding 
float.__eq__).

In any case, no, I have no problem at all with inferring "x == y" from "not (x 
< y) and not (y < x)".

Curious factoid: in [x, y].sort(), CPython never asks "x < y?". Because that's 
irrelevant ;-) The list is already sorted if and only if "not (y < x)". Which 
is how "x <= y" is spelled, because the implementation promises to do only "<" 
comparisons.
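
Easy to see with a chatty class:

class Spy:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        print(f"{self.v} < {other.v} ?")
        return self.v < other.v

xs = [Spy(1), Spy(2)]
xs.sort()   # prints "2 < 1 ?" - the only question asked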

--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Tim Peters  added the comment:

Stefan, I have scant memory of ever caring, but, if I did, I got over it ;-) 

>>> math.nan == math.nan
False
>>> {math.nan : 5}[math.nan]
5

That is, PyObject_RichCompareBool() takes object identity as overriding __eq__; 
that's why the dict lookup works. But this one doesn't:

>>> {math.nan : 5}[float("nan")]
... Traceback (most recent call last):
KeyError: nan

Although that may change too.

I used to care a little, but not at all anymore. There's no sense trying to 
_make_ sense of what sorting could possibly mean in the absence of a total 
ordering.

> If you sort objects that always return True for both `<` and `==`,

A case of "garbage in, garbage out" to me.

> this would also have the opposite problem, considering tuple u smaller
> than v when it shouldn't.

What justifies "shouldn't"? If u[0] < v[0], then by the definition of 
lexicographic ordering, u < v. But if u[0] == v[0], which apparently is _also_ 
the case, then the same definition says the ordering of u and v is inherited 
from the ordering of u[1:] and v[1:]. There's no principled way of declaring 
one of those possibly contradicting definitions "the right one".

> That said, maybe the function could be optimized for
> known "well-behaving" types?

A type is well-behaving to me if and only if it implements a total ordering. If 
a type doesn't, what you get is an implementation accident, and code relying on 
any specific accident is inherently buggy.

--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Change by Tim Peters :


--
assignee:  -> tim.peters




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +27345
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/29076




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Tim Peters  added the comment:

The attached tupsort.py gives a simple. focused example. Typical output on my 
box:

float       3.10
(float,)   11.75
[float]    25.68

It's sorting a large list of floats. In the first line the list contains plain 
floats. In the second line, each float was wrapped in a 1-tuple. In the last 
line, wrapped in a singleton list.

Essentially any overhead of any kind is more expensive than merely comparing 
two floats in HW, so overhead is approximately everything here.  The tuple and 
list comparison functions are very similar, and the large advantage of 
"(float,)" over "[float]" is mostly due to that unsafe_tuple_compare() uses one 
less PyObject_RichCompareBool() call to resolve each compare (assuming that all 
floats in the list are distinct, which I didn't check, but is almost certainly 
the case).

Getting rid of its other PyObject_RichCompareBool() should yield another nice 
speed boost.

The pattern is worth addressing because tuples are routinely used as key= 
arguments to achieve multi-key sorting.

--
Added file: https://bugs.python.org/file50372/tupsort.py




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


Tim Peters  added the comment:

FYI, this is fallout from a StackOverflow mystery:

https://stackoverflow.com/questions/69468552/efficiency-of-sorting-by-multiple-keys-in-python/69610671#

--




[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Tim Peters


New submission from Tim Peters :

The code could typically be faster if it did what its comments imply it does: 
skip the expense of PyObject_RichCompareBool() entirely for the first pair of 
tuple elements. It actually always calls PyObject_RichCompareBool() on the 
first pair, and only if that says "not equal" does it use the faster 
ms->tuple_elem_compare to resolve "ok, so it's less than?".

Instead it could do the first pair before, and entirely apart from, the loop, 
along the lines of:

- Use ms->tuple_elem_compare to see whether u[0] < v[0]. If so, or if an error, 
we're done.

- Use ms->tuple_elem_compare to see whether v[0] < u[0]. If so, or if an error, 
we're done.

Else we can assume u[0] == v[0], and go on to compare u[1:] to v[1:] via a 
trivial variation of the current code.

In cases where the first pair does resolve it (common!), replacing a 
PyObject_RichCompareBool() with a ms->tuple_elem_compare can be significantly 
faster overall.
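
In Python-ish pseudocode, a sketch of the intended control flow (the helper names
elem_lt / rich_eq / rich_lt are stand-ins I made up for ms->tuple_elem_compare and
PyObject_RichCompareBool with Py_EQ / Py_LT; error handling is ignored):

def elem_lt(a, b): return a < b    # ms->tuple_elem_compare on first elements only
def rich_eq(a, b): return a == b   # PyObject_RichCompareBool(..., Py_EQ)
def rich_lt(a, b): return a < b    # PyObject_RichCompareBool(..., Py_LT)

def tuple_lt(u, v):
    if elem_lt(u[0], v[0]):        # u[0] < v[0]?
        return True
    if elem_lt(v[0], u[0]):        # v[0] < u[0]?
        return False
    # else assume u[0] == v[0]; resolve on the remaining elements as now
    for a, b in zip(u[1:], v[1:]):
        if not rich_eq(a, b):
            return rich_lt(a, b)
    return len(u) < len(v)

assert tuple_lt((1, 2), (1, 3)) and not tuple_lt((2, 0), (1, 9))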

--
components: Interpreter Core
messages: 404360
nosy: tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Improve listobject.c's unsafe_tuple_compare()
type: performance
versions: Python 3.11




[issue45348] math.log(243, 3) value issue

2021-10-02 Thread Tim Peters


Tim Peters  added the comment:

CPython's log() builds on the platform C's libm facilities, and C simply 
doesn't define primitives capable of producing a worst-case < 1 ulp error 
2-argument log in reasonable time. Instead we have to build it out of two 
separate log operations, and a division. Each of those 3 operations can suffer 
its own rounding errors, which may, overall, cancel out, or build on each 
other. There's no error bound we can guarantee, although "< 2 ulp worst-case 
error" seems likely under modern libms.

Which is actually quite good. Doing better than that is out of scope for 
CPython's implementation. The easy way to get < 1 ulp is to use decimal to 
compute the intermediate results with excess precision. But that's also "too 
slow" for general use.

What Dennis did in his little test driver was fine for that, but we don't 
actually need to increase decimal's default precision at all to get < 1 ulp 
error in a converted-back-to-float-via-round-nearest result here.

Just an example (doesn't "prove" anything - but points at how to go about 
proving things):

>>> decimal.Decimal(3**8).ln() / decimal.Decimal(3).ln()
Decimal('7.999')
>>> float(_)
8.0

--
nosy: +tim.peters
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed




[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-15 Thread Tim Peters


Tim Peters  added the comment:

Please stop re-opening this. The issue tracker is not a "help desk", and your 
confusions aren't necessarily Python bugs ;-) If you post something that looks 
like an actual bug, I'll re-open the report.

SequenceMatcher works on sequences.

HtmlDiff works on sequences OF sequences (typically lists of lines). Very 
different things. For example,

h = difflib.HtmlDiff()
h.make_file(['aaab'], ['aaa'])

finds nothing at all in common. It finds that the two lines don't match, and 
then finds that the lines aren't "similar enough" to bother looking any deeper. 
But, obviously, they do share 'aaa' as a common prefix, and calling 
SequenceMatcher directly on the two strings will find their common prefix.

There's no reason to imagine they'll produce the same results - they're not 
doing the same things. SequenceMatcher is used, in several places, as a 
building block to do the more complicated things HtmlDiff does. But HtmlDiff 
works on _lines_ first; SequenceMatcher has no concept of "line".

As to 1-(delta/totalSize), I have no idea where that came from. What 
SequenceMatcher.ratio() returns is documented:

"""
Where T is the total number of elements in both sequences, and M is the number 
of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are 
identical, and 0.0 if they have nothing in common.
"""

--
status: open -> closed




[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-15 Thread Tim Peters


Tim Peters  added the comment:

I have no idea why you think the result should be 0.2. 0.5630188679245283 looks 
correct to me with autojunk disabled:

sm = SequenceMatcher(None, a, b, autojunk=False)
total = 0
for m in sm.get_matching_blocks():
    print(m, repr(a[m.a : m.a + m.size]))
    total += m.size

Running that displays every matching block:

Match(a=0, b=0, size=73) '\n#include \n#include \nusing 
namespace std;\nint main() {\n'
Match(a=74, b=73, size=10) '   string '
Match(a=87, b=84, size=1) 'r'
Match(a=138, b=85, size=2) 'i;'
Match(a=141, b=87, size=11) '\n   cin >> '
Match(a=155, b=99, size=1) 'r'
Match(a=160, b=101, size=1) ';'
Match(a=200, b=102, size=10) '\n   for (i'
Match(a=210, b=116, size=9) ' = 0; i <'
Match(a=220, b=125, size=1) ' '
Match(a=221, b=130, size=1) 's'
Match(a=228, b=133, size=1) 'e'
Match(a=230, b=136, size=2) '; '
Match(a=232, b=139, size=2) '++'
Match(a=235, b=141, size=1) ')'
Match(a=237, b=142, size=1) '{'
Match(a=274, b=143, size=11) '\n  if ('
Match(a=285, b=156, size=1) 'i'
Match(a=288, b=161, size=1) 'i'
Match(a=294, b=163, size=8) " == 'i')"
Match(a=305, b=171, size=10) '\n '
Match(a=315, b=183, size=1) 'i'
Match(a=318, b=188, size=1) 'i'
Match(a=324, b=190, size=14) " = '1';\n  "
Match(a=380, b=204, size=4) 'if ('
Match(a=384, b=210, size=1) 'i'
Match(a=387, b=215, size=1) 'i'
Match(a=393, b=217, size=8) " == 'a')"
Match(a=404, b=225, size=10) '\n '
Match(a=414, b=237, size=1) 'i'
Match(a=417, b=242, size=1) 'i'
Match(a=423, b=244, size=14) " = '@';\n  "
Match(a=479, b=258, size=4) 'if ('
Match(a=483, b=264, size=1) 'i'
Match(a=486, b=269, size=1) 'i'
Match(a=492, b=271, size=8) " == 'm')"
Match(a=503, b=279, size=10) '\n '
Match(a=513, b=291, size=1) 'i'
Match(a=516, b=296, size=1) 'i'
Match(a=522, b=298, size=14) " = 'M';\n  "
Match(a=578, b=312, size=4) 'if ('
Match(a=582, b=318, size=1) 'i'
Match(a=585, b=323, size=1) 'i'
Match(a=591, b=325, size=8) " == 'B')"
Match(a=602, b=333, size=10) '\n '
Match(a=612, b=345, size=1) 'i'
Match(a=615, b=350, size=1) 'i'
Match(a=621, b=352, size=14) " = '8';\n  "
Match(a=677, b=366, size=4) 'if ('
Match(a=681, b=372, size=1) 'i'
Match(a=684, b=377, size=1) 'i'
Match(a=690, b=379, size=8) " == 's')"
Match(a=701, b=387, size=10) '\n '
Match(a=711, b=399, size=1) 'i'
Match(a=714, b=404, size=1) 'i'
Match(a=720, b=406, size=14) " = '$';\n  "
Match(a=763, b=420, size=1) '}'
Match(a=822, b=421, size=12) '\n   cout << '
Match(a=837, b=436, size=26) ' << endl;\n\n   return 0;\n}\n'
Match(a=863, b=462, size=0) ''

and then

>>> total
373
>>> 2 * total / (len(a) + len(b))
0.5630188679245283
>>> sm.ratio()
0.5630188679245283

give identical results.

--
resolution:  -> not a bug
status: open -> closed




[issue45180] possible wrong result for difflib.SequenceMatcher.ratio()

2021-09-12 Thread Tim Peters


Tim Peters  added the comment:

Unfortunately, you're getting hurt by the "autojunk" feature (see the docs). If 
you turn it off, you'll get a result much more to your liking:

>>> print(SequenceMatcher(None, a, b).ratio())
0.3431803896920176

>>> print(SequenceMatcher(None, a, b, autojunk=False).ratio())
0.9553739786297926

--
nosy: +tim.peters
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue45180>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-09-06 Thread Tim Peters


Change by Tim Peters :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-09-06 Thread Tim Peters


Tim Peters  added the comment:


New changeset 5cb4c672d855033592f0e05162f887def236c00a by Tim Peters in branch 
'main':
bpo-34561: Switch to Munro & Wild "powersort" merge strategy. (#28108)
https://github.com/python/cpython/commit/5cb4c672d855033592f0e05162f887def236c00a


--

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-09-01 Thread Tim Peters


Tim Peters  added the comment:

I created a PR that implements the powersort merge strategy:

https://github.com/python/cpython/pull/28108

Across all the time this issue report has been open, that strategy continues to 
be the top contender. Enough already ;-) It's indeed a more difficult change to 
make to the code, but that's in relative terms. In absolute terms, it's not at 
all a hard change.

Laurent, if you find that some variant of ShiversSort actually runs faster than 
that, let us know here! I'm a big fan of Vincent's innovations too, but 
powersort seems to do somewhat better "on average" than even his 
length-adaptive ShiversSort (and implementing that too would require changing 
code outside of merge_collapse()).

--

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-08-31 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +26550
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/28108

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-08-30 Thread Tim Peters


Tim Peters  added the comment:

New runstack.py mostly adds comments about a surprise: the idea that 
length-adaptive ShiversSort ekes out better results than powersort appears 
nearly unique to the specific "0.80" cutoff used in the random-case generation 
code to pick between two uniform distributions. Change that cutoff, and 
powersort almost always does better.

So powersort remains the best known overall, although shivers4 remains 
competitive with it.

--
Added file: https://bugs.python.org/file50246/runstack.py

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-08-29 Thread Tim Peters


Tim Peters  added the comment:

And another runstack.py adds `shivers4()`, which reworks `shivers3()` 
(length-adaptive ShiversSort) so that division, log2(), and floor() aren't used 
anymore. It does need a loop, though, which needs to go around a number of 
times `k` such that k is the smallest integer for which

2**k * max(run_length_1, run_length_2, run_length_3) >=
1 + len(original_list)
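
In Python terms, that k can be found with nothing fancier than shifts and 
comparisons - a tiny sketch (illustration only, not the runstack.py code):

def smallest_k(max_run_length, n):
    # Smallest k with 2**k * max_run_length >= 1 + n, using only shifts.
    k = 0
    while (max_run_length << k) < 1 + n:
        k += 1
    return k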

--
Added file: https://bugs.python.org/file50243/runstack.py

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45045] Optimize mapping patterns of structural pattern matching

2021-08-29 Thread Tim Peters


Change by Tim Peters :


Removed file: https://bugs.python.org/file50242/runstack.py

___
Python tracker 
<https://bugs.python.org/issue45045>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45045] Optimize mapping patterns of structural pattern matching

2021-08-29 Thread Tim Peters


Change by Tim Peters :


--
Removed message: https://bugs.python.org/msg400568

___
Python tracker 
<https://bugs.python.org/issue45045>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45045] Optimize mapping patterns of structural pattern matching

2021-08-29 Thread Tim Peters


Tim Peters  added the comment:

And another runstack.py adds `shivers4()`, which reworks `shivers3()` 
(length-adaptive ShiversSort) so that division, log2(), and floor() aren't used 
anymore. It does need a loop, though, which needs to go around a number of 
times `k` such that k is the smallest integer for which

2**k * max(run_length_1, run_length_2, run_length_3) >=
1 + len(original_list)

--
nosy: +tim.peters
Added file: https://bugs.python.org/file50242/runstack.py

___
Python tracker 
<https://bugs.python.org/issue45045>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-08-29 Thread Tim Peters


Tim Peters  added the comment:

Added new runstack.py.

New `shivers2()` adds the implementation of adaptive ShiversSort from Vincent's 
later paper. While the code is simpler, it appears to behave identically.

New `shivers3()` adds, from the same paper, the new "length-adaptive 
ShiversSort". Wow! This is the only thing we've seen yet that's in the same 
universe as powersort. In fact, it usually does a tiny bit better (on the 
randomized cases). But it's not obvious how to rework its "if" test to be truly 
efficient (as-is, it needs two divisions, two calls to `log2()`, and two calls 
to `floor()`).

Worth some thought, though. From the results here, length-adaptive ShiversSort 
is a bigger improvement over (plain-old) adaptive ShiversSort than the latter 
is over timsort.

--
Added file: https://bugs.python.org/file50241/runstack.py

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue34561] Replace list sorting merge_collapse()?

2021-08-28 Thread Tim Peters

Tim Peters  added the comment:

The merge order was mentioned on python-dev today, and a quick web search 
turned up a revision of Vincent Jugé's "Adaptive Shivers Sort: An Alternative 
Sorting Algorithm" paper I hadn't seen before:

https://arxiv.org/pdf/1809.08411.pdf

Its "length-adaptive ShiversSort" variation sure _sounds_ like it was intended 
to address the issues we discussed here (some "bad" cases when very few runs 
are in play).

The analyses in that paper show that length-adaptive ShiversSort, and 
powersort, have the same asymptotic best and worst-case behaviors. Average 
case? Who knows? Hard to believe it could really be an issue, because even the 
worst cases are pretty darn good.

So length-adaptive ShiversSort is worth looking into. But, if someone has the 
energy to pursue it, I'd be happy, at this point, just to give plain old 
"adaptive ShiversSort" a try. The version of the paper linked to above even 
lists the precise changes needed to CPython's code to implement it (although a 
code review would ask for changes, most obviously that its "int x" is too 
narrow an integer type).

--

___
Python tracker 
<https://bugs.python.org/issue34561>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44835] What does "Python for Windows will still be Python for DOS" mean?

2021-08-04 Thread Tim Peters


Tim Peters  added the comment:

The CPython Windows installer has a "thank you" box at the end:

"""
Special Windows thanks to Mark Hammond, without whose years of freely shared 
Windows expertise, Python for Windows would still be Python for DOS.
"""

There was no support for Windows in Python's earliest releases, and Mark 
Hammond did heroic work adding Windows support.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44835>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44770] float('nan') is True

2021-07-28 Thread Tim Peters


Tim Peters  added the comment:

Sorry, I'm just going to close this. For values of all numeric types now, 
`bool(x)` returns the same as `x != type(x)(0)`. Besides being 
backward-incompatible, making an exception for NaN would be jarringly 
inconsistent.

Note that you don't need numpy to conveniently check for a NaN anyway; Python's 
math.isnan() does the same.
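
For example:

>>> import math
>>> x = float("nan")
>>> bool(x)         # True, because x != 0.0 holds: a NaN compares unequal to everything
True
>>> math.isnan(x)   # the right way to test for a NaN
True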

If you want to persist, please bring it up on the python-ideas mailing list. If 
it gains traction there, feel free to re-open this report.

--
nosy: +tim.peters
resolution:  -> rejected
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue44770>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44692] Const folding in parser with negative numbers doesn't match float/int behaviour

2021-07-20 Thread Tim Peters


Tim Peters  added the comment:

The binary power operator (`**`) has higher precedence than the unary negation 
operator (`-`).

That is, -x**y groups as -(x**y).

Not a bug - that's how it was designed and how it's documented.
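
For example:

>>> -2**2     # parsed as -(2**2)
-4
>>> (-2)**2   # parentheses force the negation first
4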

Note that this isn't novel, either. For example, to give just one example of 
many, that's how Maxima does it too:

(%i1) -2**3;
(%o1) -8

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44692>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44663] Possible bug in datetime utc

2021-07-18 Thread Tim Peters

Tim Peters  added the comment:

If you want to pursue changing what utcnow() does, python-ideas or python-dev 
would probably need to be involved. Backward-incompatible changes are very hard 
sells.

As Paul Ganssle noted here,

https://blog.ganssle.io/articles/2019/11/utcnow.html

in Python 2, naïve datetimes generally did _not_ get treated as "being in local 
time", ever. They refused to pretend they had any opinion about time zone, and 
operations like .timestamp() (just "like", because .timestamp() didn't exist in 
Python 2) raised an exception when applied to a naïve datetime.

Which, IMO, Python 3 should have stuck with. "Naïve time" was never intended to 
be a synonym for "local time" - or for UTC time - or for any other 
timezone-aware notion of time.

But as Paul also said, if Python 2 had concrete tzinfo classes to begin with, 
it's a pretty safe bet `utcnow()` would never have existed.

--

___
Python tracker 
<https://bugs.python.org/issue44663>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44663] Possible bug in datetime utc

2021-07-17 Thread Tim Peters


Tim Peters  added the comment:

> It looks like the difference one would expect from (fast) human input)

Nope, the timestamps in the original report are about 3 hours apart (10808+ 
seconds).

Reports like these are often much clearer if they state the timezone of the 
system they're running on.

Plausible here: as the docs say, `utcnow()` returns a _naive_ datetime - no 
timezone info is attached.

But `.timestamp()`:

"""
Naive datetime instances are assumed to represent local time 
"""

So I _expect_ the results to differ unless the box this is running on uses UTC 
as its local time.

On my box, in native timezone CDT, the two ways are 5 hours apart, which is 
indeed CDT's offset from UTC.
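
A minimal way to see that on any box (the printed difference is roughly the 
magnitude of the local zone's UTC offset, in seconds, and is ~0 only if local 
time is UTC):

from datetime import datetime, timezone

aware_ts = datetime.now(timezone.utc).timestamp()  # aware: no guessing needed
naive_ts = datetime.utcnow().timestamp()           # naive: assumed to be local time
print(naive_ts - aware_ts)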

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44663>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44611] CPython uses deprecated randomness API

2021-07-12 Thread Tim Peters


Tim Peters  added the comment:

Dan, the Microsoft URL in your message gives a 404 for me. Did you perhaps mean 
to end it with "cng-portal" (instead of "cng-por")?

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44611>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44571] itertools: takedowhile()

2021-07-12 Thread Tim Peters


Tim Peters  added the comment:

That said, if you really do want those semantics, it's easy to build on top of 
Raymond's API:

def takewhile_plus_one_more_if_any(pred, iterable):
    from itertools import islice, chain
    before, after = before_and_after(pred, iterable)
    return chain(before, islice(after, 1))

--

___
Python tracker 
<https://bugs.python.org/issue44571>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44571] itertools: takedowhile()

2021-07-12 Thread Tim Peters


Tim Peters  added the comment:

If you don't use the 'after` iterator, then of course you'll never see the 
values (if any) it would have yielded.

How could it possibly be otherwise? By design and construction, the `before` 
iterator ends before yielding the first (if any) transitional element.

As Raymond said at the start, the `takedowhile()` proposal appears much harder 
to use correctly, since there's no reasonably sane way to know that the last 
value it yields _is_ the transitional element (or, perhaps, that there was no 
transitional element, and the underlying iterable was just exhausted without 
finding one).

If the proposal were instead for `takewhile_plus_one_more_if_any()`, then at 
least the ugly name would warn about the surprising intended behavior ;-)

--

___
Python tracker 
<https://bugs.python.org/issue44571>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44571] itertools: takedowhile()

2021-07-10 Thread Tim Peters


Tim Peters  added the comment:

I agree Raymond's `before_and_after()` looks like an elegant, efficient, and 
usable approach to this.

One minor nit: there's no need for the `iter()` call in:

yield from iter(transition)

Indeed, it confused me at first, because `yield from x` does its own `iter(x)` 
call under the covers, and since everyone knows that ;-) , I wondered what deep 
trickery calling it again was intended to pull off.

But I persuaded myself there was no such subtle intent - it's just redundant.
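
For readers without the thread in front of them, the recipe under discussion 
has roughly this shape (a from-memory sketch, not a verbatim quote of 
Raymond's post):

def before_and_after(predicate, it):
    # Variant of takewhile() that also gives access to the remainder.
    it = iter(it)
    transition = []

    def true_iterator():
        for elem in it:
            if predicate(elem):
                yield elem
            else:
                transition.append(elem)
                return

    def remainder_iterator():
        yield from transition   # the redundant iter() call was here
        yield from it

    return true_iterator(), remainder_iterator()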

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44571>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-12 Thread Tim Peters


Tim Peters  added the comment:

Closing this now because the pull request did, I believe, all that can be done 
at the function level. Exponents of 1 and 2 are well within a factor of 2 of 
repeated multiplication now, and it's essentially a tie at exponent 3 now. 
Above that, pow() wins now. On my box.

Doing better would require a smarter compiler, which, e.g., knew that `pow(x, 
2)` is the same as `x*x`. But, as is, `pow` is just another identifier to 
CPython's compiler, and may refer to any code at all. `i**2` isn't really much 
better, because CPython just redirects to type(i)'s __pow__ function at 
runtime. Which, again to the compiler, may refer to any code at all.

`pow()` is quite an involved function, needing to cater to all sorts of things, 
including possible reduction by the optional modulus, and possibly negative 
exponents.

`pow(i, 2)` (same as `i**2` under the covers) does exactly one Python-int 
multiplication now, same as `i*i`. That's fast. In either case overheads 
account for the bulk of the elapsed time. The overhead of going around the eval 
loop an "extra" time (in `i*i`) and doing another name lookup is simply smaller 
than all the overheads `pow()` incurs to _deduce_, at runtime, that it's only 
being asked to do one multiply.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue44376>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-12 Thread Tim Peters


Tim Peters  added the comment:


New changeset 9d8dd8f08aae4ad6e73a9322a4e9dee965afebbc by Tim Peters in branch 
'main':
bpo-44376 - reduce pow() overhead for small exponents (GH-26662)
https://github.com/python/cpython/commit/9d8dd8f08aae4ad6e73a9322a4e9dee965afebbc


--

___
Python tracker 
<https://bugs.python.org/issue44376>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Tim Peters  added the comment:

This is a stab at reducing overhead for small exponents, along the lines I 
sketched:

https://github.com/python/cpython/pull/26662

Unfortunately, I've been unable to convince BPO and GitHub to recognize that 
the PR is related to this report. Did something basic change?

Incidentally, at first this change triggered rare shutdown deaths due to 
negative refcounts, in the collection of small integer objects. That was a 
head-scratcher! Turns out that was, I believe, due to a "temp = NULL" line missing 
from earlier code introduced to implement powers of modular inverses.

--

___
Python tracker 
<https://bugs.python.org/issue44376>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Change by Tim Peters :


--
keywords: +patch
pull_requests: +25248
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/26662

___
Python tracker 
<https://bugs.python.org/issue44376>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44376] Improve performance of integer exponentiation

2021-06-10 Thread Tim Peters


Tim Peters  added the comment:

Under the released 3.9.5 for 64-bit Win10, raising to the power 2 is clearly 
much slower than multiplying directly:

C:\Windows\System32>py -3 -m timeit -s "x=151" "x*x"
1000 loops, best of 5: 30 nsec per loop

C:\Windows\System32>py -3 -m timeit -s "x=151" "x**2"
100 loops, best of 5: 194 nsec per loop

Since the multiplication itself is cheap, overheads must account for it. 
Offhand, looks to me like the `x**2` spelling is actually doing 31 
multiplications under the covers, although most of them are as cheap as 
Python-int multiplies get.

for (i = Py_SIZE(b) - 1; i >= 0; --i) {
    digit bi = b->ob_digit[i];

    for (j = (digit)1 << (PyLong_SHIFT-1); j != 0; j >>= 1) {
        MULT(z, z, z);
        if (bi & j)
            MULT(z, a, z);
    }
}

Python ints on a 64-bit box are stored internally in base 2**30 (PyLong_SHIFT 
is 30). z starts life at 1. The first 28 trips through the loop are chewing up 
the 28 leading zero bits in exponent 2, so MULT(z, z, z) multiplies 1 by 1 to 
get 1, 28 times. Then again on the 29th iteration, but now "bi & j" is finally 
true (we finally found the leading one bit in exponent 2), so z is replaced by 
1 times the base = the base. On the final, 30th, iteration, MULT(z, z, z) 
replaces z with its square, and we're done.

It would probably be worthwhile to add code special-casing the leading Python 
"digit" of the exponent, fast-forwarding without any multiplies to the leading 
one bit, and setting z directly to the base then.
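
In Python terms, the idea looks something like this (a sketch only - the real 
change would live in longobject.c):

def pow_positive(a, e):
    # Binary (left-to-right) exponentiation, skipping straight to the
    # exponent's leading one bit instead of repeatedly squaring 1.
    assert e > 0
    z = a                                    # consumes the leading one bit
    for k in range(e.bit_length() - 2, -1, -1):
        z *= z
        if (e >> k) & 1:
            z *= a
    return z                                 # e == 2 now costs exactly one multiply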

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44376>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44339] Discrepancy between math.pow(0.0, -inf) and 0.0**-inf

2021-06-07 Thread Tim Peters


Tim Peters  added the comment:

+1. Although, to be fair, I'd personally be happy if (+-0)**inf returned, say, 
1.375 instead ;-)

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44339>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44197] [request feature] Itertools extended combinations to limited number of repetition

2021-05-20 Thread Tim Peters


Change by Tim Peters :


--
nosy: +rhettinger

___
Python tracker 
<https://bugs.python.org/issue44197>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44197] [request feature] Itertools extended combinations to limited number of repetition

2021-05-20 Thread Tim Peters


Tim Peters  added the comment:

Dennis, combinations("aaabbbcccddd") isn't a valid call - the function requires 
a "how many?" argument too. If, e.g., we were asking for groups of 4, then 
combinations("aaabbbcccddd", 4) generates the 4-tuple ('a', 'b', 'c', 'd') 81 
(3**4) times, while the OP presumably only wants to get it once.

OTOH, combinations_with_replacement("abcd", 4) can generate tuples with more 
than 3 repetitions of a given element.
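
Both claims are easy to check:

from collections import Counter
from itertools import combinations, combinations_with_replacement

counts = Counter(combinations("aaabbbcccddd", 4))
print(counts[("a", "b", "c", "d")])   # 81 == 3**4 copies of the same tuple

cwr = set(combinations_with_replacement("abcd", 4))
print(("a", "a", "a", "a") in cwr)    # True - 4 repeats, despite wanting at most 3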

The linked StackOverflow entry gives an efficient "real solution", but I agree 
with its author's final comment: "It is much too specialized for itertools". 
Indeed, it seems too obscure and special-purpose to me to even qualify as a 
reasonable candidate for an itertools doc "recipe".

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44197>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44154] Optimize Fraction pickling

2021-05-16 Thread Tim Peters


Tim Peters  added the comment:

Oh yes - please do. It's not just pickle size - going through str() makes 
(un)pickling quadratic time in both directions if components are large. Pickle 
the component ints instead, and the more recent pickle protocol(s) can do both 
directions in linear time instead.
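
A minimal sketch of the idea (hypothetical subclass, for illustration only - 
the real change would also avoid re-reducing the fraction on load):

import fractions

class Fraction(fractions.Fraction):
    def __reduce__(self):
        # Pickle the two component ints directly instead of str(self).
        return (type(self), (self.numerator, self.denominator))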

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue44154>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44054] 2**53+1 != float(2**53+1)

2021-05-06 Thread Tim Peters


Tim Peters  added the comment:

[Stefan]
> I found it surprising that a comparison uses a different
> method of conversion than the (obvious) user-side
> conversion, with a different outcome. This seems to be
> implementation details leaking into the user side.

It's "spirit of 754", though, so any "principled" implementation would do the 
same. That is, part of the spirit of 754 is to deliver the infinitely price 
result _when_ the infinitely precise result is representable.

So, in particular,

> The net effect is that some integers will never equal
> a floating point value, even though the floating point
> value does its very best to represent that integer.

in fact for "almost no" Python ints `i` do i == float(i), because "almost all" 
unbounded ints `i` lose information when converted to finite-precision float 
(so with infinite precision they're not equal).
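
The example in the issue title shows this directly:

>>> n = 2**53 + 1
>>> float(n)         # rounds: 2**53 + 1 has no exact float representation
9007199254740992.0
>>> n == float(n)    # the comparison itself is exact, so this is False
False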

--

___
Python tracker 
<https://bugs.python.org/issue44054>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44034] Incorrect type casting of float into int

2021-05-04 Thread Tim Peters


Tim Peters  added the comment:

Please study the docs first:

https://docs.python.org/3/tutorial/floatingpoint.html

That will give you the background to understand why `int()` has nothing to do 
with this.

>>> 1.
2.0

That is, `int()` was passed 2.0 to begin with, because the binary float closest 
to the decimal value 1. is in fact 2.0.

If you can't live with that, use the `decimal` module instead:

>>> import decimal
>>> int(decimal.Decimal("1."))
1

--
nosy: +tim.peters
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue44034>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37387] test_compileall fails randomly on Windows when tests are run in parallel

2021-04-30 Thread Tim Peters


Tim Peters  added the comment:

Yes, test_compileall can still fail for this reason on Windows. From a run just 
now with -j0 (same as -j10 on this box, which has 8 logical cores: a -j value 
<= 0 is treated the same as "2 + number of logical cores"):

"""
Compiling 'C:\\Code\\Python\\lib\\types.py'...

*** PermissionError: [WinError 5] Access is denied: 
'C:\\Code\\Python\\lib\\__pycache__\\types.cpython-310.pyc.2205988433776' -> 
'C:\\Code\\Python\\lib\\__pycache__\\types.cpython-310.pyc'
"""

I did nothing in particular to try to provoke it. "It's random." So is the 
specific .py file it fails on.

--

___
Python tracker 
<https://bugs.python.org/issue37387>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37387] test_compileall fails randomly on Windows when tests are run in parallel

2021-04-29 Thread Tim Peters


Tim Peters  added the comment:

A "good" solution would be one that runs the test in such a way that it doesn't 
fail only on Windows ;-)

There are presumably many ways that could be accomplished, including ugly ones. 
For example, if test_compileall is in the collection of tests to be run, and 
it's a parallel run, run it first by itself, before starting anything else in 
parallel.

No, I'm not suggesting we do that. Just trying to get across that there are 
_always_ ways to worm around the bug du jour.

I don't know enough about when and why CPython decides to replace .pyc files 
now to make another suggestion worth anyone else's time to evaluate. Victor 
already has something else in mind, though.

--

___
Python tracker 
<https://bugs.python.org/issue37387>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37387] test_compileall fails randomly on Windows when tests are run in parallel

2021-04-29 Thread Tim Peters


Tim Peters  added the comment:

@Sheyvan, whether it's possible to delete (rename, etc) an open file is a 
property not of Python, but of the operating system. Windows doesn't allow it; 
Linux (for example) does.

It's generally considered to be "a bug" in CPython's implementation whenever it 
contains code that _assumes_ such things are safe (which is, alas, very easy 
for a Linux programmer to do by accident).

So that test_compileall fails in this way is inarguably "a bug". Guido's larger 
point is harder to address, though. In theory, we control everything our test 
suite does, but little of what arbitrary users do.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue37387>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43955] Test Failures on Windows 10

2021-04-27 Thread Tim Peters


Tim Peters  added the comment:

Shreyan Avigyan:
> And the "(Pdb) continue (...) actually is manually entered by me.

Victor Stinner:
Do you mean that you modified the Python source code?

Me:
Doubt it. For me, with more words: the "(Pdb) " prompt appears all by itself, 
by magic, and the test run is stuck then. I bet Shreyan meant to say "so I 
manually entered 'continue [RETURN]' at the pdb prompt to get it unstuck again".

> Does the issue go away if you revert your change or
> if you test a newly installed Python?

For me, I was using Win10 x64 CPython built from yesterday's github master/main.

And thanks for the distutils clue! I bet that one is a distinct issue.

To make it more confusing, all 4 tests passed when I ran with `-j` (to force 
parallel test execution) - but then test_compileall failed instead :-)

--

___
Python tracker 
<https://bugs.python.org/issue43955>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43955] Test Failures on Windows 10

2021-04-27 Thread Tim Peters


Tim Peters  added the comment:

I expect parallelism is a red herring: early in the test output attached to 
this report:

0:00:04 Run tests sequentially

and there's no other evidence in the output that multiple tests are running 
simultaneously.

Also on Win10, the 4 failing tests here pass for me if I run them one at a 
time, so it's not obvious.

I _suspect_ that what's going wrong with test_pdb is the root cause: every now 
& again, for some weeks now, when I try to run tests on Windows I come back to 
the cmd.exe window and see that it's just sitting there, waiting at a pdb 
prompt.

In the output attached to this report, note that after test_threading starts, 

(Pdb) continue
(Pdb) continue
(Pdb) continue
(Pdb) continue

appears out of the blue. But test_pdb is long finished by that time.

But I'm clueless about current pdb internals.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43955>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Tim Peters


Tim Peters  added the comment:

I agree hashing a NaN acting like the generic object hash (return rotated 
address) is a fine workaround, although I'm not convinced it addresses a 
real-world problem ;-) But why not? It might.
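
(For context, the reported pathology in miniature: distinct NaNs never compare 
equal, so if they all share one hash value, every insertion below collides with 
every NaN already in the set, and building it takes quadratic time.)

nans = [float("nan") for _ in range(10_000)]
s = set(nans)      # NaN != NaN, so every element is kept as a distinct member
print(len(s))      # 10000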

But that's for CPython. I'm loathe to guarantee anything about this in the 
language itself. If an implementation wants `__contains__()` tests to treat all 
NaNs as equal instead, or consider them equal only if "is" holds, or never 
considers them equal, or resolves equality as bitwise representation equality 
... all are fine by me. There's no truly compelling case to be made for any of 
them, although "never considers them equal" is least "surprising" given the 
insane requirement that NaNs never compare equal under IEEE rules, and that 
Python-the-language doesn't formally support different notions of equality in 
different contexts.

--

___
Python tracker 
<https://bugs.python.org/issue43475>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-05 Thread Tim Peters


Tim Peters  added the comment:

Terry, your suggested replacement statement looks like an improvement to me. 
Perhaps the longer explanation could be placed in a footnote.

Note that I'm old ;-) I grew up on plain old ASCII, decades & decades ago, and 
tabs are in fact the only "characters" I've had a problem with in doctests. But 
then, e.g., I never in my life used goofy things like ASCII "form feed" 
characters, or NUL bytes, or ... in text either.

I don't use Unicode either, except to the extent that Python forces me to when 
I'm sticking printable ASCII characters inside string quotes ;-)

--

___
Python tracker 
<https://bugs.python.org/issue43689>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43593] pymalloc is not aware of Memory Tagging Extension (MTE) and crashes

2021-04-04 Thread Tim Peters


Tim Peters  added the comment:

I think it's time to change what address_in_range() tries to answer. It 
currently gives a precise answer to "is this byte address in a region obmalloc 
owns?". But that's stronger than what it needs to do its job: the real question 
is "is this an address that obmalloc could return from its malloc() or 
realloc() functions?". Because it's only used by free() and realloc(), and they 
only care where the address they're _passed_ came from.

The difference is when an arena is not pool-aligned. Oddball chunks outside of 
full pools, at both ends of the arena, are never returned by obmalloc then.

Instead the tree could be changed to record the starting addresses of the 
_full_ pools obmalloc controls.  Those contain the only addresses obmalloc will 
ever pass out (from malloc() or realloc()).

Like so, where p is the arena base address. This hasn't been tested, and may 
contain typos or logical errors. I'm far more interested in the latter for now 
;-)

ideal1 = p & ~ARENA_SIZE_MASK # round down to ideal
ideal2 = ideal1 + ARENA_SIZE
offset = p - ideal1
b1 = bottom_node(ideal1)
b2 = bottom_node(ideal2)
if not offset:
    # already ideal
    b1.hi = -1
    assert b2.lo == 0
elif offset & POOL_SIZE_MASK == 0:
    # arena is pool-aligned
    b1.hi = b2.lo = offset
else:
    # Not pool-aligned.
    # obmalloc won't pass out an address higher than in
    # the last full pool.
    # Round offset down to next lower pool address.
    offset &= ~POOL_SIZE_MASK
    b2.lo = offset
    # And won't pass out an address lower than in the
    # first full pool.
    offset += POOL_SIZE # same as rounding original offset up
    # That's almost right for b1.hi, but one exception: if
    # offset is ARENA_SIZE now, the first full pool is at the
    # start of ideal2, and no feasible address is in ideal1.
    assert offset <= ARENA_SIZE
    b1.hi = offset & ARENA_SIZE_MASK

Note that, except for the oddball -1, everything stored in a bottom node is a 
pool address then, and so there's no need to store their all-zero lowest 
POOL_BITS bits. .lo and .hi can be shrunk to a signed 8-bit int with the 
current settings (20 arena bits and 14 pool bits).

And caching pool addresses wouldn't have any obscure failing end-cases either: 
address_in_range() could just as well be passed a pool address to begin with. 
It would only care about pool addresses, not byte addresses.

--

___
Python tracker 
<https://bugs.python.org/issue43593>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43593] pymalloc is not aware of Memory Tagging Extension (MTE) and crashes

2021-04-03 Thread Tim Peters


Tim Peters  added the comment:

BTW, your cache WIP

https://github.com/python/cpython/pull/25130/files

partly moves to tracking pool (instead of byte) addresses, but any such attempt 
faces a subtlety:  it's not necessarily the case that a pool is entirely 
"owned" by obmalloc or by the system.  It can be split.

To be concrete, picture a decimal machine with arenas at 10,000 byte boundaries 
and pools at 100-byte boundaries, with the system malloc returning 20-byte 
aligned addresses.

If obmalloc gets an arena starting at address 20,020, the pool range(20000, 
20100) has its first 20 bytes owned by the system, but its last 80 bytes owned 
by obmalloc. Pass 20050 to the PR's address_in_range, and it will say it's 
owned by the system (because its _pool_ address, 20000, is). But it's actually 
owned by obmalloc.

I'm not sure it matters, but it's easy to get a headache trying to out-think it 
;-) In that case, obmalloc simply ignores the partial pool at the start, and 
the first address it can pass out is 20100. So it would never be valid for 
free() or realloc() to get 20050 as an input.

On the other end, the arena would end at byte 20020 + 10000 - 1 = 30019. This 
seems worse! If 30040 were passed in, that's a system address, but its _pool_ 
address is 30000, which obmalloc does control.

That reminds me now why the current scheme tracks byte addresses instead ;-) It 
appears it would require more tricks to deal correctly in all cases when 
system-supplied arenas aren't necessarily aligned to pool addresses (which was 
never a consideration in the old scheme, since a pool was at largest a system 
page, and all mmap()-like functions (used to get an arena) in real life return 
an address at worst page-aligned).

Or we'd need to find ways to force mmap() (across different systems' spellings) 
to return a pool-aligned address for arenas to begin with.

--

___
Python tracker 
<https://bugs.python.org/issue43593>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43593] pymalloc is not aware of Memory Tagging Extension (MTE) and crashes

2021-04-03 Thread Tim Peters


Tim Peters  added the comment:

Can't really know without a system to try it on, but my best guess is that 
these asserts are the only thing that will fail with tagging enabled. The 
obvious "fix" is indeed just to skip them on a platform with tagging enabled. 
They're meant as a sanity check that only 48 bits are really used for 
addresses. Which remains true even when tagging is enabled - the tag bits are 
part of the _pointer_ on AMD, but not part of the address.

Taking tagging seriously instead would be a significant project, relying on 
platform-specific instructions. For a start, obmalloc would have to generate a 
random 4-bit integer for each object it returns, plug that into 4 specific 
"high order" bits of the pointer it returns, and tell the OS to associate those 
4 bits with each 16-byte chunk of the object's space.  mmap()-like calls would 
also need to be changed, to tell the OS to enable tag checking on the memory 
slice returned.

While caching may or may not speed things up, I'm not seeing how it _could_ 
help move to 64-bit addresses.  As is, the tree needs 8 bytes of bottom-node 
space for each arena in use, and that's independent of how many address bits 
there are (it only depends on the arena granularity).  I think that could be 
cut by a factor of 4 by keeping track of arena pool (instead of byte) 
boundaries in the bottom nodes, meaning that, with the _current_ settings, we'd 
only need to keep track of the 64=2^6 possible pool addresses in an arena, 
instead of the 2^20 possible byte addresses.  6 bits fits fine in a signed 
8-bit int (but we need a 32-bit int now to hold the 2^20 possible byte 
addresses in an arena).

So the clearest way to ease the space burden of keeping track of truly 
expansive address spaces is to boost arena size. And, if the tree bottom 
changed to keep track of pool (instead of byte) addresses, possibly boost pool 
size too.

--

___
Python tracker 
<https://bugs.python.org/issue43593>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43689] difflib: mention other "problematic" characters in documentation

2021-04-02 Thread Tim Peters


Tim Peters  added the comment:

Lines beginning with "?" are entirely synthetic: they were not present in 
either input.  So that's what that part means.

I'm not clear on what else could be materially clearer without greatly bloating 
the text. For example,

>>> d = difflib.Differ()
>>> for L in d.compare(["abcefghijkl\n"], ["a cxefghijkl\n"]):
print(L, end="")
- abcefghijkl
?  ^
+ a cxefghijkl
?  ^ +

The "?" lines guide the eye to the places that differ: "b" was replaced by a 
blank, and "x" was inserted.  The marks on the "?" lines are intended to point 
out exactly where changes (substitutions, insertions, deletions) occurred.

If the second input had a tab instead of a blank, the "+" wouldn't _appear_ to 
be under the "x" at all.  It would instead "look like" a long string of blanks 
was between "a" and "c" in the first input, and the "+" would appear to be 
under one of them somewhere near the middle of the empty space.

Tough luck. Use tab characters (or any other kind of "goofy" whitespace) in 
input to visual tools, and you deserve whatever you get :-)

--

___
Python tracker 
<https://bugs.python.org/issue43689>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43684] Add combined opcodes

2021-04-02 Thread Tim Peters

Tim Peters  added the comment:

"""
My philosophy here (which I learned from Tim Peters in the early 2000s) is that 
even though each individual improvement has no measurable effect on a general 
benchmark (as shown in the same comment), the combined effect of a number of 
tiny improvements can be significant.
"""

And sometimes more so than the naïve sum of their individual contributions! 
Although this was often much clearer on older architectures and compilers (both 
more predictable).

Indeed, in the old days I routinely checked "optimizations" in that _slowed_ 
microbenchmarks, provided they reduced the work on the critical path, and 
didn't make the code markedly harder to follow (but reducing the critical path 
usually makes the code clearer!).

Because, sooner or later, compilers will get smart enough to see what I saw, 
and generate better code. And then these can compound. Like "oh! these three 
temp variables don't actually need to be materialized at all anymore, and 
suddenly I have few enough that do need to be materialized that _all_ of them 
can live in fast registers...". So, at some point, the next seemingly 
insignificant optimization checked in could _appear_ to have an "inexplicable" 
benefit, by breaking the bottleneck on _some_ inscrutable HW resource overtaxed 
on the critical path by the original code.

So optimize for what a smart compiler will eventually do, not what they happen 
to do today ;-) Profile-guided optimization was a major step in that direction.

Saddest to me: layers of indirection introduced to support marginal features, 
because "well, they don't slow down pybench by much at all". That's the 
opposite: over time, they can compound to do worse damage than the sum of their 
individual burdens.

In the end, Everything Counts™.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43684>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43593] pymalloc is not aware of Memory Tagging Extension (MTE) and crashes

2021-03-31 Thread Tim Peters


Tim Peters  added the comment:

I'm skeptical ;-) If MTE is actually being used, system software assigns 
"random" values to 4 of the higher-order bits. When obmalloc punts to the 
system malloc, presumably those bits will be randomized in the addresses 
returned by malloc. Then it's just not possible that obmalloc's

assert(HIGH_BITS(p) == HIGH_BITS(_map_root));

can always succeed - we're insisting there that _all_ the high-order bits are 
exactly the same as in the `_map_root` file static.  If `p` was actually 
obtained from the system `malloc()`, it should fail about 15 times out of 16 
(and regardless of which of the 16 bit patterns the platform C assigns to 
_map_root).

But, of course, that failure would only be seen in a debug build.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43593>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43618] random.shuffle loses most of the elements

2021-03-24 Thread Tim Peters


Tim Peters  added the comment:

Are you sure it's "a list"? At least print out `type(questions_element)`. 
`random.shuffle()` doesn't contain any code _capable_ of changing a list's 
length. It only does indexed accessing of the list:

...
for i in reversed(range(1, len(x))):
    # pick an element in x[:i+1] with which to exchange x[i]
    j = randbelow(i + 1)
    x[i], x[j] = x[j], x[i]

That's about all there is to it. Note that, for this purpose, it doesn't matter 
what `randbelow()` does, because that function never even sees `x`.

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43618>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-22 Thread Tim Peters


Tim Peters  added the comment:

This report is closed. Please open a different report.

We've already demonstrated that, as predicted, nothing can be said here without 
it being taken as invitation to open-ended discussion. So it goes, but it 
doesn't belong on _this_ report anymore.

--

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-21 Thread Tim Peters


Tim Peters  added the comment:

If experience is any guide, nothing about anything here will go smoothly ;-)

For example, setting up a module global `_gcd` name for `math.gcd` is a very 
standard, widespread kind of micro-optimization. But - if that's thought to be 
valuable (who knows? maybe someone will complain) - why not go on instead to 
add not-intended-to-be-passed trailing default `_gcd=math.gcd` arguments to the 
methods? Then it's even faster (& uglier, of course).

Or wrt changing properties to private attributes, that speeds some things but 
slows others - and, unless I missed it, nobody who wrote that code to begin 
with said a word about why it was done that way.

I'm not going to "pre-bless" shortcuts in an area where everything so far has 
been more contentious than it "should have been" (to my eyes). Opening a BPO 
report is a trivial effort. Skipping NEWS, or not, depends on how it goes.

--

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-21 Thread Tim Peters


Tim Peters  added the comment:

Thanks, all! This has been merged now. If someone wants to continue pursuing 
things left hanging, I'd suggest opening a different BPO report.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-21 Thread Tim Peters


Tim Peters  added the comment:


New changeset 690aca781152a498f5117682524d2cd9aa4d7657 by Sergey B Kirpichev in 
branch 'master':
bpo-43420: Simple optimizations for Fraction's arithmetics (GH-24779)
https://github.com/python/cpython/commit/690aca781152a498f5117682524d2cd9aa4d7657


--

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-12 Thread Tim Peters


Tim Peters  added the comment:

Terry, we could do that, but the opposition here isn't strong, and is pretty 
baffling anyway ;-) : the suggested changes are utterly ordinary for 
implementations of rationals, require very little code, are not delicate, and 
are actually straightforward to see are correct (although unfamiliarity can be 
an initial barrier - e.g., if you don't already know that after

 g = gcd(a, b)
 a1 = a // g
 b1 = b // g

it's necessarily true that a1 and b1 are coprime, a key reason why the 
transformation is correct will be lost on you - but it's also very easy to 
prove that claim once you're told that it is a key here). The OP is right that 
"we" (at least Mark, and Raymond, and I) have routinely checked in far subtler 
optimizations in various areas.
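
For concreteness, this is the flavor of "bog standard" optimization in 
question, shown for multiplication over plain ints (a sketch assuming both 
inputs are already in lowest terms):

from math import gcd

def mul(na, da, nb, db):
    # (na/da) * (nb/db): cancel across the diagonal first, so the final
    # multiplications work on smaller integers and no further reduction
    # is needed when the inputs were already in lowest terms.
    g1 = gcd(na, db)
    g2 = gcd(nb, da)
    return (na // g1) * (nb // g2), (da // g2) * (db // g1)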

--

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-10 Thread Tim Peters


Tim Peters  added the comment:

Issue 21922 lists several concerns, and best I know they all still apply. As a 
practical matter, I expect the vast bulk of core Python developers would reject 
a change that sped large int basic arithmetic by a factor of a billion if it 
slowed down basic arithmetic on native machine-size ints by even half a percent.

--

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43420] Optimize rational arithmetics

2021-03-08 Thread Tim Peters


Tim Peters  added the comment:

I agree with everyone ;-) That is, my _experience_ matches Mark's:  as a 
more-or-less "numeric expert", I use Fraction in cases where it's already fast 
enough. Python isn't a CAS, and, e.g., in pure Python I'm not doing things like 
computing or composing power series with rational coefficients (but routinely 
do such stuff in a CAS). It's usually just combinatorial probabilities in 
relatively simple settings, and small-scale computations where float precision 
would be fine except I don't want to bother doing error analysis first to 
ensure things like catastrophic cancellation can't occur.

On the other hand, the proposed changes are bog standard optimizations for 
implementations of rationals, and can pay off very handsomely at relatively 
modest argument sizes.

So I'm +0. I don't really mind the slowdown for small arguments because, as 
above, I just don't use Fraction for extensive computation. But the other side 
of that is that I won't profit much from optimizations either, and while the 
optimizations for * and / are close to self-evident, those for + and - are much 
subtler. Code obscurity imposes ongoing costs of its own.

WRT which, I added Python's Karatsuba implementation and regret doing so. I 
don't want to take it out now (it's already in ;-) ), but it added quite a pile 
of delicate code to what _was_ a much easier module to grasp. People who need 
fast multiplication are still far better off using gmpy2 anyway (or fiddling 
Decimal precision - Stefan Krah implemented "advanced" multiplication schemes 
for that module).

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43420>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43383] imprecise handling of weakref callbacks

2021-03-03 Thread Tim Peters


Tim Peters  added the comment:

This won't go anywhere without code (preferably minimal) we can run to 
reproduce the complaint. If there were a "general principle" at work here, 
someone else would surely have reported it over the last few decades ;-)

To the contrary, the common confusion is in the _other_ direction: a weakref 
callback _not_ getting invoked when a programmer thinks it "should be". The 
cause for that is always the same: the weakref object died before the weakly 
referenced object died. That was the primary motivation for introducing 
`weakref.finalize()`.

CPython does not, in general, "batch" (or in any other way delay) object 
destruction. An object is destroyed - immediately - when its refcount falls to 
0. In a technical sense, that's also true in the case of cycles (in which case 
the gc module artificially drives refcounts to 0, based on liveness and 
reachability analysis). With very few exceptions, neither does it hang on to 
"hidden" references. The primary exception to that is in interactive shells, 
where the module-global identifier "_" is typically bound, by magic, to the 
object most recently displayed. In the case of exceptions, it's also possible 
for programs to accidentally hang on to the exception object, from which the 
entire chain of stack frames back to the source of the exception can be reached.

So, based on what you said, this is the best attempt I can make:

import sys, weakref

class C:
    def __del__(self):
        print("__del__ called")

def wrcb(x):
    print("weakref callback called")

c = C()
d = {"x" : weakref.ref(c, wrcb)}
print(sys.getrefcount(d["x"]))

#del d["x"]
del c

which displays:

2
__del__ called
weakref callback called

Note the 2! If the moral equivalent in your code displays a number larger than 
2, then there are more references to the weakref object than just as a dict 
value (that's one; the other reference comes from temporarily incrementing the 
refcount of `d["x"]` to pass it as an argument to `getrefcount()`).

If I uncomment the penultimate line, to destroy the weakref before `c` is 
destroyed, the output changes to:

2
__del__ called

So it's all as expected. Can you change that code to demonstrate your case?

--
nosy: +tim.peters

___
Python tracker 
<https://bugs.python.org/issue43383>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41972] bytes.find consistently hangs in a particular scenario

2021-02-28 Thread Tim Peters


Tim Peters  added the comment:


New changeset 73a85c4e1da42db28e3de57c868d24a089b8d277 by Dennis Sweeney in 
branch 'master':
bpo-41972: Use the two-way algorithm for string searching (GH-22904)
https://github.com/python/cpython/commit/73a85c4e1da42db28e3de57c868d24a089b8d277


--

___
Python tracker 
<https://bugs.python.org/issue41972>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com


