Hi Robert, 2009/6/21 gerrob <robert.gerb...@gmail.com> > > Bill, I don't see your post to me at google group, I think the address > was bad > to mpir list. So I post your last message also with my 3 answers. > And uploaded your new code perfpow.c at google group.
We now have a group mpir-dev for discussing implementation, and mpir-devel is left for discussing algorithms. It has a much lower volume of posts. Feel free to join mpir-dev if you want, or for now we can keep the discussion on mpir-devel. > > > I've attached a first attempt at an mpn version of Robert Gerbicz' > fast perfect power test code, minus some functions I need to make it > work. It doesn't compile yet, and I have probably introduced oodles of > bugs to the code. However I'm going to stop working on it for now, > until we can get some of the functions described below, implemented at > the mpn level in MPIR. > > My impression of the code is that this is a *really* clever algorithm > and should beat pretty much anything out there when coded efficiently. > Therefore it is really important that we get this into MPIR. > > I did correct what I think was a minor bug. On what is now line 328 I > replaced smallprimes[j]*smallprimes[j] < i + bound_for_p with > smallprimes[j]*smallprimes[j] < i + sq, which I think is correct. > Robert can you verify this! > > > RE: > Yes, you are correct, (we are sieving in the [i,i+sq-1] interval). > > > I also replaced the primes q with just unsigned longs. I can imagine > one day someone having hardware to test numbers n of N = 10*2^40 > *bits* for whether they are perfect powers. As the space requirement > for this function is approximately N log N, this seems to be a > sensible bound for the forseeable future. To perform the test in this > case we need primes up to 2^40. At one point we look for primes of the > form q = p*2k+1, starting with q = 2*p + 1. But we only generate > primes q = p*2k + 1 for as long as p^k <= bound_for_p < B (i.e. fits > in a limb). In general q will be much smaller than p^k, and thus if > the latter fits in a limb, so does q. Therefore I changed quite a bit > of the code to use single limbs instead of mpz_t's. I think this will > be OK, but if Robert could also verify this, I would be more > comfortable. > > > RE: > OK, that raises my limits. > There is some misunderstanding in your post: > If p is prime then I'm using exactly c primes of the form q=2*k*p+1, > where c is the largest integer for that p^c<=bound_for_p, > so for example if p>sqrt(bound_for_p), > then I'm using only one such prime, c=1 and in fact q can be bigger > than bound_for_p, the good news is that I think: for the first q prime > in this form: > q<2*p*log(p)^2 if p>2. It is similar to a conjecture in prime gaps. I'm slightly confused. Isn't the first q of this form precisely q = 2p + 1? If so, it cannot be as large as your limit. > > > > Anyhow, we now need the following functions: > > * b = powm(a, e, m) which sets b = a^e mod m, where a, b, e and m are > all mp_limb_t's > (we should pull a function out of FLINT long_extras module for this). > > * mpn_root(r, ap, an, p) returns 1 is {ap, an} is a p-th power (else > 0), setting r to the root if it is > (actually we don't need the root itself) > Currently we have mpz_root which basically uses mpn_rootrem which > computes the root and > remainder and checks if the remainder is 0. This is probably > sufficient for this code which does > O(1) root computations. (In general one can do much better than > mpz_root to test if something > is a root. One only needs to work with an/p bits and compute a root. > Then check the p-th power > of the bottom limb of this root gives the correct bottom limb of > {ap, an}. If not it is not a root. Of > course then we need to perform a complete root with remainder check). > > * mpn_remove_1(bp, ap, an, p) remove the highest possible power of p > (which is 1 limb) from > {ap, an} and return the exponent. > Currently we have mpz_remove, but nothing optimised for the case > when p is 1 limb. > > * multimod_reduce_ui(mp_limb_t * res, ap, an, mp_limb_t * primes, > count) > Given an array of count primes, each of which fits in a limb, reduce > {ap, an} by each of the primes > in the array and return the residues in the array of limbs res. > Robert did this with a tree, but it can be done slightly more > efficiently. It turns out that you can do > the bottom layer or two of the tree using umul_ppmm when multiplying > up, and then udiv_qrnnd for > the last layer when doing the reductions. This gives a substantial > speedup in practice. > It is important that this function allows aliasing between primes > and res to save space. > > Some optimisations for later might include: > > * Factor out the prime sieve. MPIR could have a globally defined array > of primes, computed up to some > limit, which could be extended as needed. It seems silly to compute > such a list of primes every time > mpn_perfect_power_p is called. We need to be careful about doing > this however, as global objects are > not threadsafe. > > * Have an optimised version of mpn_perfect_power_p which takes a > precomputed "tree", then make > mpn_perfect_power_p just a wrapper for this function which keeps > track of some global precomputed tree > so that if someone calls mpn_perfect_power_p many times, the tree > does not need to be computed > multiple times. Again this needs to be done carefully for thread > safety reasons. This was handled in > FLINT with thread local storage, but that is not available on Sun or > Cygwin and it also means the > precomputation needs to be done per thread, which is unsatisfactory. > > * It's not clear to me that we need to take the "tree" out as far as > we do. We should probably check at > each step that we haven't already exceeded the size of the original > n when multiplying up. The reason > is that reduction is trivial once our modulus is bigger than n > itself. But this is an optimisation that can be > used in the multimodular reduction code. > > > RE: > After the multiplication I think in the root of the tree: tree[1] > is smaller than n for all inputs. If you just mulitple all primes > up to log(n)/10 then that's about n^(1/10), we are using more than > one primes for small p, but in general only one, and for avarage case > q=2kp+1 is about log(p)*p. I've tried some values, and for all of > them > prod[1]<n was true. Great, so we don't need to bother about this optimisation. Thanks for checking this. > > > > * An additional copy of the input is made (n --> u) if no shifting is > required, to avoid destroying the input > to this function. This copy might be avoided somehow. It's probably > not a big deal, as it only happens > one time in 64 (or 32 on a 32 bit machine), and it is certainly not > where the complexity in the algorithm > lies, so I left it for now. > Bill. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---