Tim Peters <t...@python.org> added the comment:

[Tim]
>> The pure-Python Miller-Rabin code I posted in the
>> aforementioned thread is typically at least 100
>> times faster than that.

[Steven]
> This is exactly the sort of argument about quality of
> implementation which isn't, or shouldn't be, part of
> the argument about the API, IMO.

I wasn't at all talking about API at that point.  I was backing the argument 
_you_ were making, that trial division is a hopelessly inefficient 
implementation, compared to what's possible with probabilistic methods, 
regardless of API.  You were in fact underselling that argument, because it's 
straightforward to get an order or two of magnitude faster than you 
demonstrated.


> the Q I quoted above was carefully designed (not by me, I hasten
> to add!)

I know the paper it came from.

> to be a pseudoprime to the first 307 prime bases, so it's
> something of a worst-case scenario for my version.

Which is why I have no problem picturing how this "should be" approached:  the 
original Miller-Rabin (which uses random bases, not succumbing to the 
"premature optimization" catastrophe magnet) has no problem with Q (or with 
anything else!), and hasn't really been improved on for general-purpose use 
since it was published.  It's a darned-near perfect mix of simplicity, brevity, 
clarity, robustness, generality, and speed.  "Good enough" by a long shot on 
all counts.
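
To make that concrete, here's a minimal sketch of the random-base shape
(illustrative only - it is not the code from the earlier thread):

    from random import randrange

    def probably_prime(n, maxsteps=20):
        # Minimal sketch of random-base Miller-Rabin.
        if n < 2:
            return False
        if n in (2, 3):
            return True
        if n % 2 == 0:
            return False
        # Write n - 1 as d * 2**s with d odd.
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(maxsteps):
            a = randrange(2, n - 1)  # a fresh random base each round
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False  # this base proves n composite
        return True  # survived every round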

>>    def probably_prime(n, maxsteps=20)
>>
>> supplies a _default_ instead.  I don't want an API that's useful
>> _only_ to people who don't know what they're doing ;-)

> It's not just for people who don't know what they're doing. It
> is for people who don't want to sweat the small details,

Then they can accept the default.  In what conceivable sense is that a burden?

> they just want an answer True or False and are prepared
> to trust that the function author knows what they are
> doing.

But the function author can't possibly know what the _user_ needs this for.  In 
some apps degree of certainty is far more important than speed, while in other 
apps it's the opposite.  Be realistic here?  Your argument here makes sense for 
always-right functions, but you're not willing to accept one of those for this 
specific purpose (& neither am I).

> If someone cares about the small details like how
> many bases to try,

It's not a "small detail" where it matters:  it is THE way to trade off 
computation time against confidence in the results.  It's a _fundamental_ 
aspect of how Miller-Rabin works.
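
To make that knob concrete (this is the standard textbook bound, not
anything specific to the code I posted earlier): a composite n passes a
single random base with probability at most 1/4, so the chance of a
wrong "probably prime" answer after maxsteps independent rounds is at
most 4**-maxsteps:

    # standard worst-case error bound for random-base Miller-Rabin
    error_bound = 0.25 ** 20   # default maxsteps=20 -> about 9.1e-13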

> they might also care about details like:
>
> - which specific bases are used;
> - whether to use Miller-Rabin at all;
> - how many trial divisions to do first;
> - which specific primality test to use;

Then they should go use a special-purpose library ;-)  Letting them fiddle the 
single most important parameter isn't down in the weeds, it's a top-level 
control knob.

My answers to all the above:

- bases are picked via randrange(2, n-1)
- yes, because no better general-purpose algorithm is known
- none - although I'll allow that there may be a
  speed advantage in some apps if a gcd is done with a
  relatively small primorial first (more efficient than
  one-at-a-time trial divisions by small primes); a
  sketch of that follows this list
- Miller-Rabin
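
For illustration only (the primorial and the cutoff here are arbitrary
choices, not part of any proposal): one gcd against a small primorial
catches exactly the factors that trial division by each of its primes
would:

    # Sketch of the primorial gcd pre-check; the cutoff (37) is an
    # arbitrary illustrative choice.
    from math import gcd

    # product of the primes 2, 3, 5, ..., 37
    PRIMORIAL_37 = 7420738134810

    def has_small_prime_factor(n):
        """True if n > 37 shares a factor with some prime <= 37."""
        return gcd(n, PRIMORIAL_37) > 1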

> What if the implementation shifts away from Miller-Rabin
> to (say) Baillie-PSW?

It can't ;-)  I would absolutely document that Miller-Rabin _is_ the algorithm 
being used, with the random-bases implementation choice.  Then the curious can 
search the web for a mountain of good information about it.

> Then your maxsteps parameter becomes meaningless. Do we
> deprecate it or leave it there in case the implementation
> changes again to something that has a concept of number
> of steps?

All moot, given that I have no interest in hiding the specific algorithm in 
use.  YAGNI.

> I think that if someone cares sufficiently about the details,
> then they probably won't be satisfied with a single isprime
> function, but may want is_fermat_prime, is_miller_rabin_prime,
> is_lucas_prime etc.

Again, go to a special-purpose library if that's what they want.  And, again, I 
don't agree with your characterization of the Miller-Rabin maxsteps parameter 
as a "detail".  It's a fundamental aspect of what the algorithm does.  Which 
casual users can certainly ignore, but at their own risk.

> ...
> Correct. The first twelve primes make up such a minimal set.

And if you don't care about picking the fixed bases from a prefix of the 
primes, you only need 7 bases for a 100% reliable test through 2**64:  2, 325, 
9375, 28178, 450775, 9780504, and 1795265022.  Or so this table claims:

https://miller-rabin.appspot.com/
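
For the record, a deterministic variant along those lines would look
something like this (an illustrative sketch of the fixed-base approach,
which I argue against just below):

    def is_prime_u64(n):
        # Illustrative deterministic Miller-Rabin for n < 2**64, using
        # the seven bases from the table above.
        if n < 2:
            return False
        # Screen out multiples of the first twelve primes.
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
            if n % p == 0:
                return n == p
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for a in (2, 325, 9375, 28178, 450775, 9780504, 1795265022):
            a %= n
            if a == 0:
                continue  # base is a multiple of n; no information
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False
        return True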

But I don't care here.  Using a fixed set of bases is begging for embarrassment 
(for every fixed finite set of bases, there are an infinite number of 
composites they'll _claim_ are prime).  There are no systemic failure modes 
when using random bases.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39479>
_______________________________________