ni...@lysator.liu.se (Niels Möller) writes: > The current (METHOD == 2) code some somethign similar, it conditionally > adds in B2 at the right place in the next iteration. It uses the > name u2 for the carry that lives unti the next iteration, > > umul_ppmm (p1, p0, u1, B2); > ADDC_LIMB (cy, u0, u0, u2 & B2); > u0 -= (-cy) & d; > add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0);
[...] > The drawback, in the context of div_qr_1n_pi1, is that the u2 input to > the iteration makes the quotient production much more costly. And the cost is basically an add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1); so at least a sequence and, add, adc. I think I've found a way to get rid of that, at the cost of more side-channel leakage, by adding it into the multiplication by previous u1 (which, unfortunately, may give another carry out). The iteration would become something like /* u recurrence, via u2, u1, u0 variables. */ umul_ppmm (p1, p0, u1, B2); ADDC_LIMB (cy, u0, u0, u2 & B2); u0 -= (-cy) & d; q_u1 = u1; /* Save u1, for below quotient processing */ add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0); /* q recurrence, via q0 variable. Input is the sum of "old" u1 and "new" u2. */ q_u1 -= u2; if (UNLIKELY(q_u1 == 0) && u2) { /* The increment of q_u1 overflowed, handle specially. */ q0 += cy; add_ssaaaa (q1, q2, CNST_LIMB(1), dinv, q0 < cy, q0); q0 = 0; } else { umul_ppmm (p1, p0, q_u1, dinv); ADDC_LIMB (q_2, q1, q_u1, q0); add_ssaaaa (q_2, q_1, q_2, q_1, CNST_LIMB(0), p1 + cy); q0 = p0; } ... store q2 and q1 ... The second multiplication now depends on the first, so may put some more pressure on out-of-order execution. > It would be nice if we could stay with this recurrency for u, but keep > the other way of carry handling for q. I've given it a quick try, so far without success. I might play a bit more with microoptimizations of the current algorithm before doing something more sophisticated. You're idea of conditonally adding the invariant d * B2 at the right place is also interesting, I don't think I understood it correctly at first reading. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel