Hi Robert,
2009/6/21 gerrob <robert.gerb...@gmail.com>

>
> Bill, I don't see your post to me at google group, I think the address
> was bad
> to mpir list. So I post your last message also with my 3 answers.
> And uploaded your new code perfpow.c at google group.


We now have a group mpir-dev for discussing implementation, and mpir-devel
is left for discussing algorithms. It has a much lower volume of posts. Feel
free to join mpir-dev if you want, or for now we can keep the discussion on
mpir-devel.


>
>
> I've attached a first attempt at an mpn version of Robert Gerbicz'
> fast perfect power test code, minus some functions I need to make it
> work. It doesn't compile yet, and I have probably introduced oodles of
> bugs to the code. However I'm going to stop working on it for now,
> until we can get some of the functions described below, implemented at
> the mpn level in MPIR.
>
> My impression of the code is that this is a *really* clever algorithm
> and should beat pretty much anything out there when coded efficiently.
> Therefore it is really important that we get this into MPIR.
>
> I did correct what I think was a minor bug. On what is now line 328 I
> replaced smallprimes[j]*smallprimes[j] < i + bound_for_p with
> smallprimes[j]*smallprimes[j] < i + sq, which I think is correct.
> Robert can you verify this!
>
>
> RE:
> Yes, you are correct, (we are sieving in the [i,i+sq-1] interval).
>
>
> I also replaced the primes q with just unsigned longs. I can imagine
> one day someone having hardware to test numbers n of N = 10*2^40
> *bits* for whether they are perfect powers. As the space requirement
> for this function is approximately N log N, this seems to be a
> sensible bound for the forseeable future. To perform the test in this
> case we need primes up to 2^40. At one point we look for primes of the
> form q = p*2k+1, starting with q = 2*p + 1. But we only generate
> primes q = p*2k + 1 for as long as p^k <= bound_for_p < B (i.e. fits
> in a limb). In general q will be much smaller than p^k, and thus if
> the latter fits in a limb, so does q. Therefore I changed quite a bit
> of the code to use single limbs instead of mpz_t's. I think this will
> be OK, but if Robert could also verify this, I would be more
> comfortable.
>
>
> RE:
> OK, that raises my limits.
> There is some misunderstanding in your post:
> If p is prime then I'm using exactly c primes of the form q=2*k*p+1,
> where c is the largest integer for that p^c<=bound_for_p,
> so for example if p>sqrt(bound_for_p),
> then I'm using only one such prime, c=1 and in fact q can be bigger
> than bound_for_p, the good news is that I think: for the first q prime
> in this form:
> q<2*p*log(p)^2 if p>2. It is similar to a conjecture in prime gaps.


I'm slightly confused. Isn't the first q of this form precisely q = 2p + 1?
If so, it cannot be as large as your limit.


>
>
>
> Anyhow, we now need the following functions:
>
> * b = powm(a, e, m) which sets b = a^e mod m, where a, b, e and m are
> all mp_limb_t's
>  (we should pull a function out of FLINT long_extras module for this).
>
> * mpn_root(r, ap, an, p) returns 1 is {ap, an} is a p-th power (else
> 0), setting r to the root if it is
>  (actually we don't need the root itself)
>  Currently we have mpz_root which basically uses mpn_rootrem which
> computes the root and
>  remainder and checks if the remainder is 0. This is probably
> sufficient for this code which does
>  O(1) root computations. (In general one can do much better than
> mpz_root to test if something
>  is a root. One only needs to work with an/p bits and compute a root.
> Then check the p-th power
>  of the bottom limb of this root gives the correct bottom limb of
> {ap, an}. If not it is not a root. Of
>  course then we need to perform a complete root with remainder check).
>
> * mpn_remove_1(bp, ap, an, p) remove the highest possible power of p
> (which is 1 limb) from
>  {ap, an} and return the exponent.
>  Currently we have mpz_remove, but nothing optimised for the case
> when p is 1 limb.
>
> * multimod_reduce_ui(mp_limb_t * res, ap, an, mp_limb_t * primes,
> count)
>  Given an array of count primes, each of which fits in a limb, reduce
> {ap, an} by each of the primes
>  in the array and return the residues in the array of limbs res.
>  Robert did this with a tree, but it can be done slightly more
> efficiently. It turns out that you can do
>  the bottom layer or two of the tree using umul_ppmm when multiplying
> up, and then udiv_qrnnd for
>  the last layer when doing the reductions. This gives a substantial
> speedup in practice.
>  It is important that this function allows aliasing between primes
> and res to save space.
>
> Some optimisations for later might include:
>
> * Factor out the prime sieve. MPIR could have a globally defined array
> of primes, computed up to some
>  limit, which could be extended as needed. It seems silly to compute
> such a list of primes every time
>  mpn_perfect_power_p is called. We need to be careful about doing
> this however, as global objects are
>  not threadsafe.
>
> * Have an optimised version of mpn_perfect_power_p which takes a
> precomputed "tree", then make
>  mpn_perfect_power_p just a wrapper for this function which keeps
> track of some global precomputed tree
>  so that if someone calls mpn_perfect_power_p many times, the tree
> does not need to be computed
>  multiple times. Again this needs to be done carefully for thread
> safety reasons. This was handled in
>  FLINT with thread local storage, but that is not available on Sun or
> Cygwin and it also means the
>  precomputation needs to be done per thread, which is unsatisfactory.
>
> * It's not clear to me that we need to take the "tree" out as far as
> we do. We should probably check at
>  each step that we haven't already exceeded the size of the original
> n when multiplying up. The reason
>  is that reduction is trivial once our modulus is bigger than n
> itself. But this is an optimisation that can be
>  used in the multimodular reduction code.
>
>
> RE:
> After the multiplication I think in the root of the tree: tree[1]
> is smaller than n for all inputs. If you just mulitple all primes
> up to log(n)/10 then that's about n^(1/10), we are using more than
> one primes for small p, but in general only one, and for avarage case
> q=2kp+1 is about log(p)*p. I've tried some values, and for all of
> them
> prod[1]<n was true.


Great, so we don't need to bother about this optimisation. Thanks for
checking this.


>
>
>
> * An additional copy of the input is made (n --> u) if no shifting is
> required, to avoid destroying the input
>  to this function. This copy might be avoided somehow. It's probably
> not a big deal, as it only happens
>  one time in 64 (or 32 on a 32 bit machine), and it is certainly not
> where the complexity in the algorithm
>  lies, so I left it for now.
>

Bill.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to