Torbjorn Granlund <t...@gmplib.org> writes: > I recently played with karatsuba-based addmul_2 for s390. For MUL- > challenged machines, that might be an option. Sandybridge is OK there, > but Bull-dozer is not. A disadvantage is that such code would be > unsuitable for code which wants to be "side-channel silent".
I'm pretty sure one can do Karatsuba branchfree. Not sure one can do that and do it fast (but if the alternative is branches, they're costly too). Evaluation: mov u0, a sub u1, a lea (u0, u1), t cmovc t, a sbb mask, mask Now a = |u0 - u1|, with sign in mask. Can easily be done also without cmov, but with a longer chain of dependent instructions: mov u0, a sub u1, a sbb mask, mask xor mask, a sub mask, a Get b = |v0 - v1| in the same way, and arrange so that the final mask is all ones if the term |u0 - u1| * |v0 - v1| should be subtracted, i.e., if (u0 - u1)(v0 - v1) >= 0. Interpolation: Add in the signed term (u0 - u1) * (v0 - v1) to the three limbs <r3,r2,r1>, using two's complement: mov a, %rax mul b xor mask, %rax xor mask, %rdx bt $0, mask C Set carry from mask. Any better way? C <mask, %rdx, %rax> + c = - (u0 - u1) (v0 - v1), in two's complement. adc %rax, r1 adc %rdx, r2 adc mask, r3 Alternatively, one could use (u0 + u1) * (v0 + v1), with a couple of conditional adds instead. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel