Torbjorn Granlund <t...@gmplib.org> writes:

> * The code is no win for AMD k10/k8 (although close to 10 c/l might well be
>   possible)

I tried replacing one masking op by cmov, as you suggested. We then get
down to 11.25 c/l on K10. I put this modified version in the k10
subdirectory, since it was a significant slowdown on some other
processors.

Next thing to try is to delay the Q1 store, but that's a bit more work.
After that, I guess I should try the loop mixer.

I benchmarked the code on the k8, k10, core2, sandybridge, nehalem and
nano machines. I couldn't log in to haswell and piledriver.

/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.