Niels Möller <ni...@lysator.liu.se> writes: > Niels Möller <ni...@lysator.liu.se> writes: > >> And now I found that speed's handling of toom22, toom32 and friends >> always use fixed ratios, so I have to extend that to be able to get any >> interesting benchmarks when varying the ratio. > > Below patch seems to work. Most subtle part was to specify reasonable > ratio limits for the various toom functions. Tested by running
Pushed now. I've done some benchmarks on shell, on tip-of-tree GMP (no local changes). See numbers at the end of this message, comparing mpn_mul_basecase, mpn_toom22_mul and mpn_toom32_mul. To recall, the *_mul.-N notation to speed means that product size is N limbs. The speed s parameter is the size of the larger factor, so it must be >= N/2, and as s increases, there's less work as the multiply becomes more and mroe unbalanced. Some observations: 1. On this machine, MUL_TOOM22_THRESHOLD is 20. Toom-32 could use its own, higher, threshold. E.g, it doesn't beat basecase for product size 50, e.g., 28x22. And it's barely able to beat basecase for a few sizes at product size 70. 2. The current cutoff ratio of 0.8 between Toom-22 and Toom-32 looks pretty good. 3. For larger sizes, Toom-32 beats basecase also for ratios well below 0.5. Perhaps not so surprising, since the main work will be recursive balanced calls to Toom-22 (we're still below MUL_TOOM33_THRESHOLD, which is 65 on this machine), so perhaps top unbalanced product doesn't matter much. So beating basecase is perhaps a too low bar; for ratios around 0.5, Toom-32 should compete against either repeated Toom-22 or Toom-42, not included in these benchmarks. So next, I'd like to repeat benchmarks including my Toom-32, changes to see if I get more interesting results after fixing speed to do what I need it to do. I've also been thinking a bit more about the graph of which Toom functions should call each other. I currently believe that Toom-33 should recurse to Toom-32 for some size ratios (it would also need to recurse to itself, Toom-22, and we need to have either Toom-42 or Toom-43 in there as well). However, I wonder if we have any machines with thresholds such that Toom-33 can recurse to Toom-32 and with the recursive calls from Toom-32 still being in Toom-33 range? I would hope that it's fine to hardcode Toom-32 to recurse only to itself, Toom-22 and basecase, even for this case. Regards, /Niels $ ./speed -c -s 25-37 mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50 overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-50 mpn_toom22_mul.-50 mpn_toom32_mul.-50 25 2018.83 #1884.67 n/a 26 2080.80 #1894.83 2543.40 27 2124.00 #1970.67 2226.60 28 #2036.40 n/a 2143.80 28x22 0.786 29 #2034.60 n/a 2137.80 30 #2002.60 n/a 2352.83 31 #1905.50 n/a 2161.20 32 #1851.17 n/a 2168.80 33 #1831.67 n/a 2164.80 34 #1774.33 n/a 2652.40 35 #1733.50 n/a 2641.40 36 #1605.86 n/a 2587.00 37 #1559.14 n/a n/a $ ./speed -c -s 35-52 mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 overhead 7.15 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-70 mpn_toom22_mul.-70 mpn_toom32_mul.-70 35 3960.00 #3499.00 n/a 36 3903.33 #3512.67 3928.67 37 3937.33 #3623.33 3947.33 38 3923.00 #3611.67 4232.00 38x32 0.842 39 3907.33 n/a #3713.67 39x31 0.818 40 #3824.67 n/a 6855.50 41 #3854.67 n/a 4274.00 42 #3802.33 n/a 4258.33 43 3768.33 n/a #3744.00 43x27 0.628 44 #3686.67 n/a 3755.67 45 #3648.00 n/a 3767.33 46 #3599.00 n/a 4350.67 47 #3548.67 n/a 4509.33 48 #3450.33 n/a 4277.00 49 #3415.33 n/a 4655.00 50 #3149.25 n/a 4011.33 51 #3064.00 n/a 3943.33 52 #2974.25 n/a n/a $ ./speed -c -s 50-74 mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-100 mpn_toom22_mul.-100 mpn_toom32_mul.-100 50 7625.50 #6052.00 n/a 51 7630.00 #6103.00 7214.50 52 7567.00 #6174.50 6376.00 53 7582.00 #6556.50 6640.00 54 7572.00 #6403.50 6584.50 55 7576.00 #6424.00 7046.50 55x45 0.818 56 7475.50 n/a #6357.00 56x44 0.786 57 7446.00 n/a #6465.00 58 7420.00 n/a #6408.00 59 7379.50 n/a #6438.50 60 7281.50 n/a #6260.50 61 7262.50 n/a #6790.00 62 7213.00 n/a #6715.50 63 7144.50 n/a #6399.50 64 7036.50 n/a #6478.00 65 7007.50 n/a #6440.00 66 6893.50 n/a #6489.50 67 6828.00 n/a #6696.50 67x33 0.493 68 #6685.00 n/a 7061.00 69 #6622.00 n/a 6755.00 70 6529.00 n/a #6514.50 70x30 0.429 71 #6419.50 n/a 6772.50 72 #6298.50 n/a 6399.00 73 #6179.00 n/a 6944.50 74 #6080.00 n/a n/a $ ./speed -c -s 75-112 mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 overhead 7.17 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz mpn_mul_basecase.-150 mpn_toom22_mul.-150 mpn_toom32_mul.-150 75 16739.00 #12183.00 n/a 76 16683.00 #12148.00 15689.00 77 16669.00 #12241.00 15704.00 78 16689.00 #12195.00 12930.00 79 16669.00 #13586.00 14219.00 80 16616.00 #12641.00 13504.00 81 16628.00 #13429.00 13586.00 82 16611.00 13262.00 #12425.00 82x68 0.829 83 16561.00 13464.00 #13250.00 84 16785.00 n/a #13192.00 85 16403.00 n/a #13790.00 86 16421.00 n/a #12685.00 87 16319.00 n/a #14747.00 88 16219.00 n/a #12119.00 89 16132.00 n/a #14117.00 90 16118.00 n/a #12393.00 91 16039.00 n/a #14166.00 92 15919.00 n/a #14102.00 93 15788.00 n/a #12594.00 94 15770.00 n/a #12148.00 95 15645.00 n/a #13309.00 96 15476.00 n/a #12448.00 97 15377.00 n/a #13387.00 98 15330.00 n/a #13842.00 99 15173.00 n/a #13312.00 100 15018.00 n/a #13898.00 101 14875.00 n/a #13338.00 102 14767.00 n/a #13318.00 103 14601.00 n/a #14577.00 104 #14467.00 n/a 15572.00 105 14297.00 n/a #13664.00 106 14201.00 n/a #13154.00 107 13965.00 n/a #13766.00 108 13810.00 n/a #13020.00 108x42 0.389 109 #13627.00 n/a 15919.00 110 #13475.00 n/a 13983.00 111 #13259.00 n/a 14079.00 112 #13046.00 n/a n/a -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel