Torbjörn Granlund <t...@gmplib.org> writes: > ni...@lysator.liu.se (Niels Möller) writes: > > The critical path, via the u1 variable, is > > umul_ppmm (p1, p0, u1, B2); > add_mssaaaa (cy, u1, u0, u0, up[j], p1, p0); > u1 -= cy & d; > > Nice! > > The (cy & d) term is multiplied in the next iteration by B2, i.e., we > have either the invariant d * B2 or 0 as the contribution to the p1,p0 > product. If we can somehow add that to u0,up[j] during the umul_ppmm > then we could save several more cycles.
The current (METHOD == 2) code some somethign similar, it conditionally adds in B2 at the right place in the next iteration. It uses the name u2 for the carry that lives unti the next iteration, umul_ppmm (p1, p0, u1, B2); ADDC_LIMB (cy, u0, u0, u2 & B2); u0 -= (-cy) & d; add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0); Then the u0 update and carry propagation becomes independent of the multiplication. The critical u1 -> u1 path is shortened to just INSN CYCLE mul 0,5 add 3 adc 4 This is exactly it's done in the MOD_1_1P_METHOD == 2 code. The drawback, in the context of div_qr_1n_pi1, is that the u2 input to the iteration makes the quotient production much more costly. It would be nice if we could stay with this recurrency for u, but keep the other way of carry handling for q. Something like umul_ppmm (p1, p0, u1, B2); ADDC_LIMB (cy, u0, u0, u2 & B2); u0 -= (-cy) & d; t = u1 - (u2 & d); /* Outside of the critical u recurrency */ add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0); umul_ppmm (p1, p0, t, dinv); /* quotient logic... */ But it's going to be a bit subtle to do that and keep the q and u parts consistent. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel