t...@gmplib.org (Torbjörn Granlund) writes: > I believe METHOD 1 could almost always be pointless if METHOD 3's > straight line code is made really efficient.
I'd guess so too. But benchmarks are better than faith alone ;-) > That should in particular be true if the q = 1 code is removed from > hgcd2, for machines with good multiplication throughput. Can we remove that for everything but HGCD2_METHOD == 2? It would be nice to not have an explosion of cases. > The normalisation code only works if operands are not already > normalised. Cannot they be that? We call the normalizing div2 only when q > bh, which implies bh^2 < ah. Then bh can't be normalized. When debugging (on 64-bit), I only saw this code entered with size of bh exactly 32 bits, ah exactly 64 bits. But I think it might be called with bh even smaller. > Should be revive the commented-out div2? It looks good to me, the > comment about that it is slower than the branchy div2 is probably from > the era where branchy div1/div2 was great. I haven't had the time to benchmark it, but I wouldn't be surprised if it runs better than the other div2. > (I suspect that to be when Vax and m68020 was hot.) IIRC, the alternative div2 was written 2008, and benchmarked on AMD Opteron (or at most a few years older). One or both of div1 and the "regular" div2 were copied from much older code. > I think there might be a slight bug or inefficiency here (from hgcd2.c): > > ASSERT (ah >= bh); > > ah -= bh; > if (ah < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1))) > break; > > if (ah <= bh) > { > /* Use q = 1 */ > u01 += u00; > u11 += u10; > } > else > { > mp_double_limb_t rq = div1 (ah, bh); > > If, in the the last 'if' above ah = bh, we have as a matter of fact q = > 2 (with zero remainder). The thing is, we don't want q = 2 and one number reduced to a zero. We want to terminate with a matrix corresponding to both a and b roughly half a limb (full limb, if it weren't for the truncation when going from double precision to single precision), and |a - b| smaller. If we made the change from "<=" to "<", the ah == bh case would continue to mp_double_limb_t rq = div1 (ah, bh); q <-- 1 mp_limb_t q = rq.d1; ah <-- 0 ah = rq.d0; cond true if (ah < (CNST_LIMB(1) << (GMP_LIMB_BITS / 2 + 1))) { /* A is too small, but q is correct. */ u01 += q * u00; u11 += q * u10; break; } so we'll do the same matrix update, u01 += u00; u11 += u10; and exit the loop. If we keep "<=" as is, we instead update u01, u11 and continue to the next iteration: bh <-- 0 bh -= ah; cond true if (bh < (CNST_LIMB (1) << (GMP_LIMB_BITS / 2 + 1))) break; and exit the loop. So we'll exit the loop correctly either way, but using "<=" avoids the div1 call and a few multiplications just before exit. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel