Paul Zimmermann <paul.zimmerm...@inria.fr> writes:

> Dear Niels,
>
> ./speed -p 1000000 -c -s 10-200 -f1.1 mpn_mul.0.6 would be more readable,
> although the change in speed.h would be larger.

That would be nicer. Ideally, one could use some way to parametrize
both sizes.

Anyway, I tried some benchmarking, on shell, using the simple hack.
Speed of mpn_mul_n appears unchanged (expected, since then all
subproducts are perfectly balanced), so I figured it might make sense
to measure times in relation to mpn_mul_n.

Before the toom22 and toom32 changes:

$ ./speed-before -p 100000 -c -s 2-50 -f 1.1 -r mpn_mul_n mpn_mul.-50 mpn_mul.-60 mpn_mul.-70 mpn_mul.-80 mpn_mul.-90
overhead 5.94 cycles, precision 100000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
    mpn_mul_n  mpn_mul.-50  mpn_mul.-60  mpn_mul.-70  mpn_mul.-80  mpn_mul.-90
2       25.35  #0.9639       0.9650       0.9651       0.9651       0.9651
3       53.46  #0.4773       0.4778       0.6400       0.6399       0.6399
4       68.32   0.5514      #0.5513       0.5537       0.9129       0.9129
5       95.08  #0.4501       0.7394       0.7394       0.8541       0.8540
6      120.21   0.6426      #0.6426       0.7513       0.7517       0.9146
7      158.69  #0.5305       0.6281       0.6281       0.7679       0.8712
8      196.26   0.5497      #0.5479       0.6679       0.7656       0.9037
9      247.84  #0.4791       0.5918       0.6784       0.8045       0.8902
10     301.55  #0.5264       0.6173       0.7226       0.8082       0.9199
11     358.73  #0.4765       0.5533       0.6517       0.7289       0.8301
12     410.29  #0.5157       0.6070       0.6783       0.7696       0.8426
13     489.66  #0.4637       0.5524       0.7048       0.7688       0.8569
14     566.06  #0.5116       0.5748       0.6551       0.7953       0.8601
15     645.85  #0.4751       0.6098       0.6694       0.7994       0.8739
16     712.23  #0.5065       0.5746       0.6984       0.7558       0.8786
17     814.82  #0.4712       0.5884       0.6537       0.7705       0.8875
18     912.66  #0.5096       0.5582       0.6688       0.7803       0.8907
19    1016.32  #0.4793       0.5834       0.6890       0.7925       0.8979
20    1052.75  #0.5241       0.6330       0.7306       0.8359       0.9406
22    1240.61  #0.5480       0.6460       0.7451       0.8428       0.9548
24    1408.13  #0.5577       0.6504       0.7434       0.8871       0.9590
26    1663.78  #0.5625       0.6484       0.7736       0.9174       0.9595
28    1897.37  #0.5592       0.6375       0.7624       0.9014       0.9436
30    2179.16  #0.5671       0.6777       0.8173       0.8539       0.9376
33    2569.17  #0.5558       0.6637       0.7667       0.8724       0.9450
36    2972.70  #0.5841       0.6830       0.7767       0.8368       0.9353
39    3389.81  #0.6078       0.6968       0.7666       0.8819       0.9416
42    3834.29   0.7139      #0.6918       0.7620       0.8818       0.9348
46    4333.80  #0.7084       0.7221       0.7607       0.8906       0.9739
50    5053.64  #0.6932       0.7102       0.7972       0.8792       0.9830

It makes sense that speed ratios roughly match the size ratio in the
middle of the table, where basecase is a large part of the work, and
increase with larger sizes.

After the changes, I get these values, with some annotations for
notable differences.

$ ./speed -p 100000 -c -s 2-50 -f 1.1 -r mpn_mul_n mpn_mul.-50 mpn_mul.-60 mpn_mul.-70 mpn_mul.-80 mpn_mul.-90
overhead 5.94 cycles, precision 100000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
    mpn_mul_n  mpn_mul.-50  mpn_mul.-60  mpn_mul.-70  mpn_mul.-80  mpn_mul.-90
2       25.25   0.9655       0.9650       0.9657      #0.9649       0.9650
3       53.46   0.4774      #0.4773       0.6409       0.6400       0.6406
4       68.32   0.5525       0.5580      #0.5524       0.9130       0.9129
5       95.08  #0.4500       0.7395       0.7396       0.8541       0.8540
6      119.99  #0.6437       0.6437       0.7524       0.7517       0.9163
7      158.37  #0.5316       0.6294       0.6297       0.7695       0.8700
8      193.54  #0.5591       0.5593       0.6772       0.7832       0.9133      (slightly worse, noise?)
9      248.64  #0.4791       0.5914       0.6752       0.8064       0.8861
10     300.28  #0.5272       0.6210       0.7265       0.8118       0.9237
11     358.70  #0.4764       0.5527       0.6545       0.7301       0.8321
12     410.21  #0.5161       0.6066       0.6785       0.7684       0.8442
13     488.42  #0.4631       0.5571       0.7069       0.7694       0.8590
14     564.68  #0.5137       0.5756       0.6560       0.7985       0.8645
15     688.37  #0.4472       0.5722       0.6247       0.7481       0.8184      (speedup!)
16     708.63  #0.5086       0.5779       0.7025       0.7608       0.8847      (slightly worse)
17     813.77  #0.4720       0.5888       0.6568       0.7724       0.8898
18     910.40  #0.5103       0.5615       0.6707       0.7826       0.8909
19    1015.09  #0.4790       0.5843       0.6898       0.7950       0.8985
20    1052.33  #0.5252       0.6300       0.7323       0.8351       0.9415
22    1238.14  #0.5495       0.6478       0.7463       0.8450       0.9571
24    1400.00  #0.5614       0.6545       0.7487       0.8922       0.9403
26    1665.52  #0.5619       0.6477       0.7741       0.8799 (!)   0.9345 (!)  speedup
28    1897.93  #0.5600       0.6374       0.7621       0.8764 (!)   0.9372      speedup
30    2170.58  #0.5688       0.6802       0.8074       0.8744 (!)   0.9352      slowdown
33    2568.48  #0.5558       0.6644       0.7597       0.8814       0.9342      slight speedup
36    2970.97  #0.5847       0.6706       0.7705       0.8285       0.9327
39    3392.28  #0.6087       0.6868       0.7619       0.8864       0.9334
42    3830.21   0.8785 (!)  #0.6857       0.7569       0.8734       0.9303      Slowdown for 0.5!
46    4345.72   0.7127      #0.7092       0.7514       0.8743       0.9665
50    5062.55  #0.6920       0.7027       0.7868       0.8891       0.9602

Nice speedup for some sizes, in particular 15. Some notable
regressions, in particular sizes 30x24 and 42x21, if I read the table
correctly.

To understand what's going on, one might need to separate the rather
small changes to algorithm selection from the reduced toom32 overhead.
But overall, rather small changes.

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
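
A note on reading the ratio columns: since the times are given in relation to mpn_mul_n on the same row, a shift in the mpn_mul_n measurement itself moves every ratio on that row. Below is a minimal Python sketch, not part of speed, that converts two rows of the mpn_mul.-50 column back to cycles before comparing the runs. The function names are made up for illustration, and it assumes each ratio is simply the routine's time divided by the mpn_mul_n time.

```python
# Rows copied from the two tables in this message:
# size -> (mpn_mul_n time in cycles, ratio reported for mpn_mul.-50)
before = {15: (645.85, 0.4751), 42: (3834.29, 0.7139)}
after = {15: (688.37, 0.4472), 42: (3830.21, 0.8785)}


def cycles(base, ratio):
    """Absolute time in cycles, assuming ratio = time / mpn_mul_n time."""
    return base * ratio


def relative_change(size):
    """Fractional change in absolute cycles from the before run to the after run."""
    old = cycles(*before[size])
    new = cycles(*after[size])
    return (new - old) / old


for size in sorted(before):
    print(f"size {size}: ratio {before[size][1]:.4f} -> {after[size][1]:.4f}, "
          f"cycles change {relative_change(size):+.1%}")
```

For size 15 the two runs report different mpn_mul_n times (645.85 vs 688.37 cycles), so the lower ratios on that row may partly reflect the baseline measurement rather than mpn_mul itself. For size 42 the baseline is nearly identical in both runs.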