Torbjorn Granlund <t...@gmplib.org> writes:

> I recently played with karatsuba-based addmul_2 for s390.  For MUL-
> challenged machines, that might be an option.  Sandybridge is OK there,
> but Bull-dozer is not.  A disadvantage is that such code would be
> unsuitable for code which wants to be "side-channel silent".

I'm pretty sure one can do Karatsuba branchfree. Not sure one can do
that and do it fast (but if the alternative is branches, they're costly
too).

Evaluation:

  mov   u0, a
  sub   u1, a
  lea   (u0, u1), t
  cmovc t, a
  sbb   mask, mask

Now a = |u0 - u1|, with sign in mask. Can easily be done also without
cmov, but with a longer chain of dependent instructions:

  mov   u0, a
  sub   u1, a
  sbb   mask, mask
  xor   mask, a
  sub   mask, a
  
Get b = |v0 - v1| in the same way, and arrange so that the final mask is
all ones if the term |u0 - u1| * |v0 - v1| should be subtracted, i.e., if
(u0 - u1)(v0 - v1) >= 0.

Interpolation:

Add in the signed term (u0 - u1) * (v0 - v1) to the three limbs
<r3,r2,r1>, using two's complement:

  mov   a, %rax
  mul   b
  xor   mask, %rax
  xor   mask, %rdx
  bt    $0, mask        C Set carry from mask. Any better way?
  C <mask, %rdx, %rax> + c = - (u0 - u1) (v0 - v1), in two's complement.
  adc   %rax, r1
  adc   %rdx, r2
  adc   mask, r3

Alternatively, one could use (u0 + u1) * (v0 + v1), with a couple of
conditional adds instead.
  
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to