t...@gmplib.org (Torbjörn Granlund) writes: > static mp_limb_t > gcd_bin_tab16 (mp_limb_t a, mp_limb_t b, size_t *np) > { > static signed char tab[16] = { > #if ! defined (TAB_VARIANT) || TAB_VARIANT == 4 || TAB_VARIANT > 4 > -1, -3, -5, -7, > -3, -1, -7, -5, > -5, -7, -1, -3, > -7, -5, -3, -1, > #endif > }; > return gcd_bin_tabkm (a, b, np, tab, 2); > }
And one has some freedom to choose representativs of these numbers mod 2^{k+1}. E.g., one might replace -7 by +1 and/or -5 by +3. > So is this practical? We haven't made a good implementation yet. Such > an implementation would need to eliminate all data-dependent branches. It could be arrange as first doing t = u - v; v <-- min (u,v) u <-- |t| with the same masking tricks as in the current gcd_11. Then, to eliminate some additional bits of u, do a table lookup (based on u, v *after* the above operations) for an even value m, and set u <-- u + m v Multiplication with additiona and masks is possible, but I'm afraid the needed shifts to get v added in at the right position may be slow. if u == 0, terminate, otherwise set c = count_trailing_zeros(u). If we use any negative candidates for m, do conditional negation to get u <-- |u| and finally shift u by c bits. Compared to gcd_11, we get a few more bits of progress. The additonal operations are the table lookup (could be shifts of a magic constant), the multiplication, and the absolute value. ----- On a somewhat different track, I've reread the paper on k-ary gcd, and lets focus on 16-ary, eliminating a least 4 bits at a time. Given u, v odd, u > v, one can compute the quotient q = u/v mod 16; this is the same multiplier that you tabulate. Let's use representatitives ±1, ±3, ± 5, ±7. One can then set u <-- (u - q v) / 16 (which may be odd or even). But that won't always give 4 bits of progress; if q v > u we get some growth at the high end, and this happens when q is largish and u and v are close in size. When q is small, ±1 or ±3, progress should aways be good is good. The k-ary trick comes in when q is largish. When q = 5, instead of u <-- (u - 5v) / 16 we could try u <-- (3u + v) / 16 Unfortunately, that may introduce a spurious factor of 3, since gcd (3u + v, v) = gcd (3u, v), and probably not worth it, so lets stick to u - 5v in this case, and consider larger q. If q = 7, then instead of u <-- u - 7v, either of the linear combinations u <-- (3u - 5v) / 16 u <-- (5u - 3v) / 16 gives good progress for u, but with risk for spurious factors. However, if we use *both* combinations, u <-- (5u - 3v) / 16 v <-- (3u - 5v) / 16 then we can note that the determinant is -9 + 25 = -16, is a power of two! So we get avoid the spurious factor problem, at the cost of some growth of min (u,v) when q = ±7; the -7 case is a bit worse, with u <-- (5u + 3v) / 16 v <-- (3u + 5v) / 16 Here v is still the smaller number, and worst case growth is from close to zero to close to 3u/16. Maybe we get good enough progress on average so that it's worth it? We could also adapt, and use the u <-- u - 7 v when v is small compared to u, and the k-ary linear combinations when u and v are close in size. That should give pretty good progress in every iteration, but perhaps it gets to complicated to have a chance of running fast. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel