[issue43053] Speed up math.isqrt, again

Mark Dickinson Thu, 28 Jan 2021 09:30:21 -0800

Mark Dickinson <[email protected]> added the comment:

Some comments, now that I've had a chance to look properly at the suggestion.


For reference, here's the "near square root" function that forms the basis of 
Python's isqrt algorithm. For clarity, I've written it recursively, but it's 
equivalent to the iterative version described in mathmodule.c. (Definition: for 
a positive integer n, a "near square root" of n is an integer a such that |a - 
√n| < 1; or in other words the near square roots of n are the floor and the 
ceiling of √n.)

    def nsqrt(n):
        """Compute a near square root for a positive integer n."""
        if n < 4:
            return 1
        else:
            e = (n.bit_length() - 3) // 4
            a = nsqrt(n >> 2*e + 2)
            return (a << e) + (n >> e + 2) // a

Juraj's suggestion, applied to each step of the recursion rather than just the 
outer step, amounts to computing the expression in the last line in a different 
way. (What follows isn't *identical* to what Juraj is suggesting in all the 
details, but it's essentially equivalent and has the same key performance 
implications.) Here's the proposed new version, identical to the previous one 
except for the last line:

    def nsqrt(n):
        """Compute a near square root for a positive integer n."""
        if n < 4:
            return 1
        else:
            e = (n.bit_length() - 3) // 4
            a = nsqrt(n >> 2*e + 2)
            return (a << e + 1) + ((n >> e + 2) - (a * a << e)) // a

With regards to proof, it's straightforward to see that this is equivalent: we 
have

        (a << e) + (n >> e + 2) // a
     == (a << e) + (a << e) + (n >> e + 2) // a - (a << e)
     == (a << e + 1) + (n >> e + 2) // a - (a << e)
     == (a << e + 1) + ((n >> e + 2) - (a * a << e)) // a

It's interesting to note that with this version we *do* rely on Python's floor 
division semantics for negative numbers, since the quantity ((n >> e + 2) - (a 
* a << e)) could be negative; that last equality is not valid with C-style 
sign-of-result-is-sign-of-dividend division. The first version works entirely 
with nonnegative integers. (And the version proposed by Juraj also works 
entirely with nonnegative integers, as a result of the extra correction step.)

And now the key point is that the dividend (n >> e + 2) - (a * a << e) in the 
second version is significantly smaller (roughly two-thirds the bit count) than 
the original divisor n >> e + 2. That should make the division roughly twice as 
fast (in the limit as n gets large), and since the division is the main 
bottleneck in the algorithm for large n, it should speed up the algorithm 
overall, again in the limit for large n, despite the fact that we have a 
greater number of arithmetic operations per iteration. And with Python's 
current subquadratic multiplication but quadratic division, the advantage 
becomes more significant as n gets large.

The second version is a little more complicated than the first, but the 
complication probably amounts to no more than 10-20 extra lines of C code. 
Still, there's a maintenance cost in adding that complication to be considered.

But here's the thing: I don't actually care about the performance for *huge* n 
- people who care about that sort of thing would be better off using gmpy2. I'm 
much more interested in the performance implications for *smaller* n: that is, 
integers of length 64-256 bits, say. (For n smaller than 2**64 the difference 
is irrelevant, since we have a pure C fast path there.) If the second version 
performs better across the board it may be worth the extra complexity. If it 
doesn't, then what's the cutoff? That is, where does the second version start 
outperforming the first? I'm not really so interested in having a hybrid 
algorithm that switches from one solution to the other at some threshold - 
that's a step too far complexity-wise.

So I propose that the next step be to code up the second variant in 
mathmodule.c and do some performance testing.

Juraj: are you interested in working on a PR?

----------
nosy: +tim.peters

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue43053>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue43053] Speed up math.isqrt, again

Reply via email to