ni...@lysator.liu.se (Niels Möller) writes: a^{1/k}, it should work fine to use a table indexed by low bits of a and *low bits only* of k. I think the underlying reason is that \phi(2^m) = 2^{m-1}, hence a^n (mod 2^m) = a^{n mod 2^{m-1}} (mod 2^m) Cute.
My implementation constructs a 4-bit starting value as r0 = 1 + (((n << 2) & ((a0 << 1) ^ (a0 << 2))) & 8); (here, a0 is the low input limb, r0 is the low output limb, and the iteration computes a^{1/n-1} mod a power of two. That's a clever formula, but it might not be faster than a tiny table if less than 7 operations can be used. We should be able to get a 8-bit starting value using a table lookup on at most 13 bits (18 KByte). But maybe it's not worth the effort; a single iteration getting from 4 bits to 8 shouldn't be terribly expensive. 18 kByte is too much. BTW, for large n one ought to use n mod the right power of 2 for the powering in the first few iterations, to avoid doing lots of useless work in powering. Perhaps as a comment add that to the file? > (2) iterate single limb code before entering the mpn loop. One should definitely have an initial single-limb loop. Similar to how it's doen with binvert and binvert_limb. I have become convinced that we need a mpn_broot, for example in a mpn_rootexact. I am not convinced that perfpow's early inversion and use of binv_root is worse than using mpn_broot and a mullo for each k. I suggest that we make four new files in mpn/generic: broot.c: mpn_broot and your mpn_xxx that computes a^{1/n-1}, perhaps call the latter mpn_brootinvm1 brootinv.c: mpn_brootinv bsqrt.c: mpn_bsqrt (which probably calls mpn_bsqrtinv, mpn_mullo) bsqrtinv.c: mpn_bsqrtinv -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel