Torbjorn Granlund <t...@gmplib.org> writes:

> * The code is no win for AMD k10/k8 (although close to 10 c/l might well be
>   possible)

I tried replacing one masking op by cmov, as you suggested. We then get
down to 11.25 c/l on K10. I put this modified version in the k10
subdirectory, since it was a significant slowdown on some other
processors.
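
For readers unfamiliar with the trade-off, here is a minimal C sketch of the two styles of branch-free selection. It is only an illustration, not the actual GMP assembly: the function names select_mask and select_cmov are hypothetical, and whether a compiler emits cmov for the second form depends on the compiler and target.

```c
#include <stdint.h>

/* Masking style: build an all-ones or all-zeros mask from the
   condition, then combine the two candidates with and/or. */
static uint64_t select_mask(uint64_t cond, uint64_t a, uint64_t b)
{
    uint64_t mask = -(uint64_t)(cond != 0);  /* ~0 if cond is nonzero, else 0 */
    return (a & mask) | (b & ~mask);
}

/* Conditional-move style: a plain conditional expression. On x86-64,
   compilers often (but not always) turn this into a cmov. */
static uint64_t select_cmov(uint64_t cond, uint64_t a, uint64_t b)
{
    return cond ? a : b;
}
```

Both functions return a when cond is nonzero and b otherwise. Which one is faster depends on the microarchitecture, which is consistent with the cmov variant helping on K10 but hurting elsewhere.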

The next thing to try is delaying the Q1 store, but that's a bit more
work. After that, I guess I should try the loop mixer.

I benchmarked the code on the k8, k10, core2, sandybridge, nehalem and
nano machines. I couldn't log in to haswell and piledriver.

/Niels


-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.