ni...@lysator.liu.se (Niels Möller) writes:

> So the gcd_22 code with all the branches needs about 11 cycles per input
> bit. gcd_11 is coreihwl/gcd_11.asm in this build.

I had to do a quick try with the masking version before leaving for work:

$ ./tune/speed -p 100000 -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22
overhead 4.01 cycles, precision 100000 units of 5.08e-10 secs, CPU freq 1966.75 MHz
        mpn_gcd_11    mpn_gcd_22
1          #7.0459       11.0729
4          #2.8721        6.6946
7          #2.8415        7.3658
10         #3.3039        7.4346
13         #3.6084        7.8537
16         #4.0952        7.8193
19         #4.0620        8.0769
22         #3.9624        7.9849
25         #3.9169        8.0956
28         #4.0359        8.2084
31         #3.9816        8.0920
34         #3.9748        8.0933
37         #4.0041        8.0792
40         #3.9327        8.0656
43         #3.9239        8.1070
46         #3.9064        8.0289
49         #3.9143        8.0191
52         #4.0642        9.5547
55         #4.1251        9.8507
58         #4.2523        7.9096
61         #3.8440        7.8590
64         #3.8325        7.8398

So down to around 8 cycles per input bit.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
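
For readers unfamiliar with the term, the "masking version" mentioned above replaces the data-dependent "which operand is larger" branch with arithmetic on an all-ones/all-zeros mask. The following is only a minimal single-word sketch of that general idea, not the gcd_22 code that was measured and not GMP's implementation. The function name gcd_odd_masked is hypothetical, both inputs are assumed to be odd, and __builtin_ctzll assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GMP): gcd of two odd 64-bit words,
   using a mask instead of an if/else on which operand is larger. */
static uint64_t gcd_odd_masked(uint64_t u, uint64_t v)
{
    assert((u & v & 1) == 1);               /* assumption: both inputs odd */

    while (u != v) {
        uint64_t t = u - v;                 /* wraps around when v > u */
        uint64_t vgtu = -(uint64_t)(v > u); /* all ones if v > u, else zero */

        v += vgtu & t;                      /* v <- min(u, v) */
        u = (t ^ vgtu) - vgtu;              /* u <- |u - v|, nonzero and even */

        u >>= __builtin_ctzll(u);           /* drop factors of two, u is odd again */
    }
    return u;
}
```

For example, gcd_odd_masked(21, 35) returns 7. The loop-exit test is still a branch; only the comparison-dependent swap and subtract are done without branching. Whether the compiler actually emits branch-free code for the (v > u) comparison depends on the target and optimisation level.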