Torbjorn Granlund <t...@gmplib.org> writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>   But for addition, mpn_addmul_1 beats mpn_cnd_add_n for many small sizes,
>
>       6        #5.4937        5.9282
>
> Not an alarming difference.

Maybe not, but I got a measurable slowdown of some ECC operations when
switching to mpn_cnd_add_n, and my best guess is that this is the reason
for that.

>   1. I guess one can expect submul_1 to always be a bit slower than
>      addmul_1, since submul_1 needs additional arithmetic besides the
>      umaal? One could perhaps do some negations on the fly,
>      a - b*C = -((-a) + b*C), maybe that would be advantageous?
>
> I encourage you to work on that; 3.25 c/l vs 5.25 c/l seems like a very
> large difference between addmul_1 and submul_1.

After some further thinking, it should work fine with one's complement
rather than two's complement for the negations,

  a - b*C = ~(b*C + ~a)

(if we do the complements on n+1 limbs). So it should be doable with the
addmul_1 loop and two additional, non-recurrency, NOT instructions per
limb, and then maybe some extra logic for the return value. One could
aim for 4.25 c/l, I guess.

> I've never considered addmul_1/submul_1 as alternatives to
> cnd_add_n/cnd_sub_n.

But they are, except that addmul_1/submul_1 always work in-place. They
should be side-channel silent on the same machines where, e.g., mul_1 is
side-channel silent, right?

> A similar situation is that addmul_1/submul_1 is sometimes faster than
> addlsh_1/sublsh_1.

And in that case, it would be nice to have some configure magic to
disable the lsh_1 functions and use addmul_1/submul_1 instead.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
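
The one's-complement identity a - b*C = ~(b*C + ~a) can be checked with a small portable C sketch. This is an illustration only, not GMP code: the 32-bit `limb_t` and the names `ref_addmul_1`, `ref_submul_1` and `submul_1_via_addmul_1` are hypothetical, and the complements are done in separate passes for clarity, whereas an assembly loop would presumably fold them into the addmul_1 loop. In this sketch the carry returned by the addmul step coincides with the borrow that a direct submul_1 would return, because the complemented (n+1)-th limb works out to minus the carry modulo the limb base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb type for illustration; GMP's mp_limb_t differs. */
typedef uint32_t limb_t;

/* rp[0..n-1] += up[0..n-1] * c; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t c)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        /* (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so this cannot overflow. */
        uint64_t t = (uint64_t)up[i] * c + rp[i] + cy;
        rp[i] = (limb_t)t;
        cy = (limb_t)(t >> 32);
    }
    return cy;
}

/* Direct reference: rp[0..n-1] -= up[0..n-1] * c; returns the borrow-out limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t c)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t p = (uint64_t)up[i] * c + bw;
        limb_t lo = (limb_t)p;
        limb_t hi = (limb_t)(p >> 32);
        limb_t r = rp[i];
        rp[i] = r - lo;
        bw = hi + (r < lo);   /* borrow from this limb plus high product limb */
    }
    return bw;
}

/* Same result computed as ~(b*C + ~a): complement, addmul, complement. */
static limb_t submul_1_via_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t c)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];
    limb_t cy = ref_addmul_1(rp, up, n, c);
    for (size_t i = 0; i < n; i++)
        rp[i] = ~rp[i];
    /* The implicit top limb is ~(~0 + cy) = -cy (mod 2^32), so cy is the borrow. */
    return cy;
}
```

Comparing `ref_submul_1` and `submul_1_via_addmul_1` on copies of the same operands should give identical limbs and identical return values.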