ni...@lysator.liu.se (Niels Möller) writes:

> I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon.
Pushed first working version now, see

  http://gmplib.org:8000/gmp/file/tip/mpn/x86_64/div_qr_1n_pi1.asm

On my core2 laptop:

  $ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x9999999999999999 mpn_div_qr_1.0x9999999999999999
  overhead 6.13 cycles, precision 10000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
        mpn_divrem_1.0x9999999999999999 mpn_div_qr_1.0x9999999999999999
  2              60.6420                        #39.9427
  3             #40.9839                         55.0469
  4             #43.7667                         44.4534
  5              44.6333                        #38.9055
  6              39.6259                        #34.4167
  7              34.0063                        #32.4018
  8              30.1364                        #28.5745
  9              29.6472                        #27.4599
  10             29.1270                        #26.7300
  100            24.7920                        #20.6700
  500            24.4400                        #19.7600

So here it's a clear win, except an ugly regression for n = 3.

On shell, the same command gives:

  2             #37.4379                         51.1157
  3             #30.0256                         61.0904
  4             #25.8058                         27.0781
  5             #23.2717                         24.2831
  6             #21.7520                         22.4346
  7             #20.5219                         21.1111
  8             #19.4783                         20.1101
  9             #18.7726                         19.3369
  10            #18.3271                         18.7228
  100           #13.8063                         13.8175
  500           #13.2670                         13.2750

So here the new code is epsilon slower for the larger sizes. Maybe the
loopmixer can help.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
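To put rough numbers on "clear win" and "epsilon slower", the sketch below computes the relative change between the two columns for a few rows copied from the speed output above. The names `core2`, `shell`, `relative_change` and `report` are made up for illustration and are not part of GMP or its tune/speed program; the figures are assumed to be the per-size timings exactly as speed printed them.

```python
# A few rows copied from the speed output in this message:
# size n -> (mpn_divrem_1 figure, mpn_div_qr_1 figure).
# The dictionary and function names are illustrative only.
core2 = {
    2: (60.6420, 39.9427),
    3: (40.9839, 55.0469),
    100: (24.7920, 20.6700),
    500: (24.4400, 19.7600),
}
shell = {
    2: (37.4379, 51.1157),
    3: (30.0256, 61.0904),
    100: (13.8063, 13.8175),
    500: (13.2670, 13.2750),
}


def relative_change(old, new):
    """Percent change going from old to new; negative means new is faster."""
    return 100.0 * (new - old) / old


def report(name, table):
    """Print one line per size with the signed percent change."""
    for n, (old, new) in sorted(table.items()):
        print(f"{name:5s} n={n:3d}: {relative_change(old, new):+7.2f}%")


report("core2", core2)
report("shell", shell)
```

A negative percentage means mpn_div_qr_1 needed fewer cycles than mpn_divrem_1 for that size; the small-n rows are where the two machines disagree most.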