ni...@lysator.liu.se (Niels Möller) writes: > t...@gmplib.org (Torbjörn Granlund) writes: > >> So I think plain / is the way to go for certain systems! > > Then we should use that for the double limb loop too! It gets a bit > tricky, since we need special handling of large quotients,
Here's a div2 function that does that. Together with the deleted div1, it gives a 25% (!) speed up of gcd in the range 3-10 limbs, benchmarked on my broadwell machine. And should do well on all machines with decent small-quotient division. /* Two-limb division optimized for small quotients. */ static inline mp_limb_t div2 (mp_ptr rp, mp_limb_t n1, mp_limb_t n0, mp_limb_t d1, mp_limb_t d0) { mp_limb_t t1, t0; mp_limb_t q = n1 / d1; /* Approximative quotient */ if (UNLIKELY (q > d1) ) { /* Normalize */ mp_limb_t n2; int c; count_leading_zeros (c, d1); n2 = n1 >> (GMP_LIMB_BITS - c); n1 = (n1 << c) | (n0 >> (GMP_LIMB_BITS - c)); n0 <<= c; d1 = (d1 << c) | (d0 >> (GMP_LIMB_BITS - c)); d0 <<= c; udiv_qrnnd (q, n1, n2, n1, d1); umul_ppmm (t1, t0, q, d0); if (t1 > n1 || (t1 == n1 && t0 > n1)) { ASSERT_ALWAYS (q > 0); q--; n1 += d1; sub_ddmmss (t1, t0, t1, t0, 0, d0); } sub_ddmmss (n1, n0, n1, n0, t1, t0); /* Undo normalization */ n0 = (n0 >> c) | (n1 << (GMP_LIMB_BITS - c)); n1 >>= c; } else { n1 -= q * d1; umul_ppmm (t1, t0, q, d0); if (UNLIKELY (t1 >= n1) && (t1 > n1 || t0 > n0)) { /* q must be exactly one off */ ASSERT_ALWAYS (q > 0); q--; n1 += d1; sub_ddmmss (t1, t0, t1, t0, 0, d0); } sub_ddmmss (n1, n0, n1, n0, t1, t0); } ASSERT_ALWAYS (n1 < d1 || (n1 == d1 && n0 < d0)); rp[0] = n0; rp[1] = n1; return q; } Ideally, it should be rearranged a bit so that the large q path is a function call, not inlined. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel