t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Attached is a patch to move the current (slow) gcd_2 to a separate file, > rename > it mpn_gcd_22, change argument and return type, and add a basic test. > > Nice!
Pushed now. > Speed support would be very welcome! Done! To get to measure only the gcd_22 loop, I reused the speed macros for gcd_11, but call gcd_22 (al, al, bl, bl), in effect premultiplying both inputs with a common factor (B+1). On my machine, $ ./tune/speed -s 1-64 -t 3 -C mpn_gcd_11 mpn_gcd_22 overhead 4.05 cycles, precision 10000 units of 5.87e-10 secs, CPU freq 1703.77 MHz mpn_gcd_11 mpn_gcd_22 1 #9.9349 11.4688 4 #3.7217 6.5098 7 #3.6897 6.7366 10 #3.8477 7.9195 13 #4.0583 9.3407 16 #4.1460 9.9268 19 #4.4428 10.8306 22 #4.3512 10.9613 25 #4.3475 10.9706 28 #4.3407 11.3027 31 #4.3135 10.9393 34 #4.2116 11.6335 37 #4.3125 11.5507 40 #4.1918 11.9219 43 #4.2716 10.2703 46 #3.9071 10.2429 49 #3.9190 10.2698 52 #4.0637 10.3419 55 #4.0631 11.6783 58 #4.1228 11.8280 61 #4.0512 10.3701 64 #3.8345 10.3486 So the gcd_22 code with all the branches needs about 11 cycles per input bit. gcd_11 is coreihwl/gcd_11.asm in this build. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel