Ciao, Il Mer, 2 Ottobre 2019 3:29 pm, Torbjörn Granlund ha scritto: > "Marco Bodrato" <bodr...@mail.dm.unipi.it> writes:
> I wrote the doubling-modulo step with 3 possible operations: > if it can not overflow, simply lshift; > otherwise choose between (x*2-m) and ((x-m)*2) > > Do you choose that at each step? That might be a cause for delay due to > branch miss. Yes, at each step. I implemented it with two variants of that portion: #ifdef BRANCHLESS int cond1 = (r0 >= m0); int cond2 = (r0 > (m0 >> 1)); r0 -= m0 & - (mp_limb_t) cond1; r0 <<= 1; r0 -= m0 & - (mp_limb_t) (cond1 ^ cond2); #else if (r0 <= (m0 >> 1)) r0 <<= 1; else if (r0 >= m0) r0 = (r0 - m0) << 1; else r0 = (r0 << 1) - m0; #endif I did not measure them very carefully, but the BRANCHLESS variant seems slower. > Let me make sure I understand what you're comparing. > > Are you comparing mpz_powm with base = 2 with your mew special code for > base = 2 for size = 1? The current mpz_powm code has (basically) the same speed for any base. 2, 3, 0xffffffffff or whatever. I added a special function, that is called by mpz_powm when the base is 2. When size=1 this function uses an ad-hoc loop directly operating on limbs, for the other sizes the special function uses functions from the mpn_ layer. > overhead 0.000000002 secs, precision 10000 units of 2.86e-10 secs, CPU > freq 3500.08 MHz > mpz_powm mpz_powm.2 mpz_powm.3 > 1 0.000004322 #0.8082 0.9884 > 2 0.000007201 1.0723 #0.9836 > 3 0.000013274 #0.9465 0.9928 > 56 0.001738323 #0.7941 1.0011 The current code, with a small base (eg. 3) is basically as fast as with a random base (maybe 1% difference, maybe not). For size=1, my specialised function with base=2 is 20% faster. For small sizes (eg.3), my specialised function (base=2) is slightly faster (5%). For larger sizes (eg.56), my specialised function (base=2) is sensibly faster (20%). For size=2, my specialised function (base=2) is slower, at least on some architectures. > I'd expect to see significant speedup for slower functions like gcd, > powm and jacobi from specialisation. powm may have different specialisations. With respect to size (if we write a specialised powm function for a generic base and a modulus of size 2 limbs) then my function may loose even harder for that size... > Checking for particular operand values in general adds more overhead > than is possible to get back. For for slower functions it is worth it. With the new composite test, a specialisation for base=2 makes sense... Ĝis, m -- http://bodrato.it/papers/ _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel