Thanks. We'll put that patch in at the next release. Does 3x seem to be optimal?
Bill.

On 9 March 2013 16:03, Fredrik Johansson <fredrik.johans...@gmail.com> wrote:
> On Sat, Mar 9, 2013 at 4:35 PM, Bill Hart <goodwillh...@googlemail.com> wrote:
>> OK, well I'll add this to the long MPIR todo list.
>>
>> When we next update MPIR we'll attempt to find a robust patch for this.
>
> I just changed the FFT threshold test in mpn/generic/mul.c to
>
>   if (ABOVE_THRESHOLD (un + vn, 2*MUL_FFT_FULL_THRESHOLD)
>       && ABOVE_THRESHOLD (3*vn, MUL_FFT_FULL_THRESHOLD))
>     {
>       mpn_mul_fft_main (prodp, up, un, vp, vn);
>       return prodp[un + vn - 1];
>     }
>
> It does not seem to slow anything down, and makes an enormous
> difference in the very unbalanced cases (the difference grows even
> larger with larger un). I suggest just making this change and putting
> it in the next version, as I can't see it causing any problems, even
> if it's likely that even more can be done about unbalanced tuning.
>
> Here is a quick comparison (rough benchmark code, with some
> fluctuation) of ubits x vbits : old new. The improvement starting from
> 520000 x is very clear.
>
> 1000 x 1000 : 340 ns 340 ns
> 2000 x 1000 : 680 ns 700 ns
> 2000 x 2000 : 1000 ns 1100 ns
> 4000 x 1000 : 1300 ns 1500 ns
> 4000 x 2000 : 2200 ns 2300 ns
> 4000 x 4000 : 3400 ns 3400 ns
> 8000 x 1000 : 2800 ns 2700 ns
> 8000 x 2000 : 4500 ns 4700 ns
> 8000 x 4000 : 6600 ns 6800 ns
> 8000 x 8000 : 10000 ns 11 us
> 16000 x 1000 : 5300 ns 5500 ns
> 16000 x 2000 : 8900 ns 10000 ns
> 16000 x 4000 : 14 us 14 us
> 16000 x 8000 : 19 us 20 us
> 16000 x 16000 : 31 us 30 us
> 32000 x 1000 : 11 us 10000 ns
> 32000 x 2000 : 17 us 18 us
> 32000 x 4000 : 28 us 28 us
> 32000 x 8000 : 41 us 41 us
> 32000 x 16000 : 50 us 51 us
> 32000 x 32000 : 75 us 77 us
> 64000 x 1000 : 21 us 22 us
> 64000 x 2000 : 35 us 36 us
> 64000 x 4000 : 55 us 56 us
> 64000 x 8000 : 81 us 83 us
> 64000 x 16000 : 130 us 140 us
> 64000 x 32000 : 140 us 130 us
> 64000 x 64000 : 200 us 210 us
> 128000 x 1000 : 42 us 44 us
> 128000 x 2000 : 70 us 72 us
> 128000 x 4000 : 110 us 110 us
> 128000 x 8000 : 160 us 160 us
> 128000 x 16000 : 240 us 240 us
> 128000 x 32000 : 300 us 310 us
> 128000 x 64000 : 360 us 370 us
> 128000 x 128000 : 540 us 540 us
> 256000 x 1000 : 86 us 88 us
> 256000 x 2000 : 140 us 140 us
> 256000 x 4000 : 220 us 240 us
> 256000 x 8000 : 320 us 340 us
> 256000 x 16000 : 480 us 500 us
> 256000 x 32000 : 620 us 630 us
> 256000 x 64000 : 800 us 820 us
> 256000 x 128000 : 900 us 1000 us
> 256000 x 256000 : 1200 us 1200 us
> 512000 x 1000 : 1200 us 170 us
> 512000 x 2000 : 1200 us 280 us
> 512000 x 4000 : 1400 us 440 us
> 512000 x 8000 : 1400 us 660 us
> 512000 x 16000 : 1500 us 960 us
> 512000 x 32000 : 1500 us 1300 us
> 512000 x 64000 : 1600 us 1600 us
> 512000 x 128000 : 1900 us 1800 us
> 512000 x 256000 : 2000 us 2000 us
> 512000 x 512000 : 2600 us 2800 us
> 1024000 x 1000 : 2700 us 340 us
> 1024000 x 2000 : 2700 us 560 us
> 1024000 x 4000 : 2600 us 900 us
> 1024000 x 8000 : 2600 us 1400 us
> 1024000 x 16000 : 3200 us 2000 us
> 1024000 x 32000 : 3200 us 2500 us
> 1024000 x 64000 : 3300 us 3300 us
> 1024000 x 128000 : 3300 us 3300 us
> 1024000 x 256000 : 3800 us 3800 us
> 1024000 x 512000 : 4300 us 4600 us
> 1024000 x 1024000 : 5600 us 6300 us
> 2048000 x 1000 : 5700 us 690 us
> 2048000 x 2000 : 5600 us 1200 us
> 2048000 x 4000 : 5700 us 1800 us
> 2048000 x 8000 : 5500 us 2800 us
> 2048000 x 16000 : 5500 us 3800 us
> 2048000 x 32000 : 6600 us 4900 us
> 2048000 x 64000 : 6800 us 6400 us
> 2048000 x 128000 : 7000 us 7600 us
> 2048000 x 256000 : 7100 us 7600 us
> 2048000 x 512000 : 7800 us 8000 us
> 2048000 x 1024000 : 10000 us 10000 us
> 2048000 x 2048000 : 12 ms 12 ms
> 4096000 x 1000 : 13 ms 1300 us
> 4096000 x 2000 : 11 ms 2200 us
> 4096000 x 4000 : 12 ms 3500 us
> 4096000 x 8000 : 12 ms 5200 us
> 4096000 x 16000 : 12 ms 7800 us
> 4096000 x 32000 : 13 ms 10000 us
> 4096000 x 64000 : 12 ms 13 ms
> 4096000 x 128000 : 14 ms 14 ms
> 4096000 x 256000 : 14 ms 14 ms
> 4096000 x 512000 : 14 ms 15 ms
> 4096000 x 1024000 : 16 ms 17 ms
> 4096000 x 2048000 : 20 ms 19 ms
> 4096000 x 4096000 : 25 ms 26 ms
> 8192000 x 1000 : 25 ms 2700 us
> 8192000 x 2000 : 26 ms 4500 us
> 8192000 x 4000 : 27 ms 7500 us
> 8192000 x 8000 : 26 ms 12 ms
> 8192000 x 16000 : 26 ms 15 ms
> 8192000 x 32000 : 27 ms 21 ms
> 8192000 x 64000 : 26 ms 25 ms
> 8192000 x 128000 : 27 ms 27 ms
> 8192000 x 256000 : 33 ms 34 ms
> 8192000 x 512000 : 34 ms 35 ms
> 8192000 x 1024000 : 34 ms 36 ms
> 8192000 x 2048000 : 38 ms 40 ms
> 8192000 x 4096000 : 44 ms 47 ms
> 8192000 x 8192000 : 55 ms 59 ms
> 16384000 x 1000 : 57 ms 5600 us
> 16384000 x 2000 : 56 ms 10000 us
> 16384000 x 4000 : 57 ms 15 ms
> 16384000 x 8000 : 56 ms 24 ms
> 16384000 x 16000 : 57 ms 31 ms
> 16384000 x 32000 : 58 ms 40 ms
> 16384000 x 64000 : 57 ms 56 ms
> 16384000 x 128000 : 60 ms 58 ms
> 16384000 x 256000 : 58 ms 66 ms
> 16384000 x 512000 : 69 ms 77 ms
> 16384000 x 1024000 : 72 ms 76 ms
> 16384000 x 2048000 : 74 ms 81 ms
> 16384000 x 4096000 : 83 ms 89 ms
> 16384000 x 8192000 : 98 ms 110 ms
> 16384000 x 16384000 : 120 ms 130 ms
> 32768000 x 1000 : 130 ms 10000 us
> 32768000 x 2000 : 130 ms 18 ms
> 32768000 x 4000 : 120 ms 28 ms
> 32768000 x 8000 : 120 ms 43 ms
> 32768000 x 16000 : 120 ms 63 ms
> 32768000 x 32000 : 110 ms 83 ms
> 32768000 x 64000 : 120 ms 100 ms
> 32768000 x 128000 : 120 ms 120 ms
> 32768000 x 256000 : 130 ms 120 ms
> 32768000 x 512000 : 120 ms 120 ms
> 32768000 x 1024000 : 170 ms 170 ms
> 32768000 x 2048000 : 180 ms 190 ms
> 32768000 x 4096000 : 180 ms 190 ms
> 32768000 x 8192000 : 200 ms 200 ms
> 32768000 x 16384000 : 240 ms 260 ms
> 32768000 x 32768000 : 270 ms 290 ms
> 65536000 x 1000 : 270 ms 23 ms
> 65536000 x 2000 : 290 ms 37 ms
> 65536000 x 4000 : 270 ms 60 ms
> 65536000 x 8000 : 270 ms 90 ms
> 65536000 x 16000 : 280 ms 120 ms
> 65536000 x 32000 : 270 ms 160 ms
> 65536000 x 64000 : 280 ms 210 ms
> 65536000 x 128000 : 280 ms 290 ms
> 65536000 x 256000 : 290 ms 300 ms
> 65536000 x 512000 : 280 ms 290 ms
> 65536000 x 1024000 : 290 ms 280 ms
> 65536000 x 2048000 : 360 ms 370 ms
> 65536000 x 4096000 : 370 ms 390 ms
> 65536000 x 8192000 : 370 ms 400 ms
> 65536000 x 16384000 : 410 ms 440 ms
> 65536000 x 32768000 : 500 ms 530 ms
> 65536000 x 65536000 : 580 ms 590 ms
>
> Fredrik

--
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mpir-devel+unsubscr...@googlegroups.com.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/mpir-devel?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
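
For anyone wanting to experiment with the 3x factor asked about above, the quoted threshold test can be pulled out into a standalone predicate. This is only a rough sketch, not MPIR code: the names prefixed `demo_`/`DEMO_` are hypothetical, the threshold value 3000 is an arbitrary placeholder (the real MUL_FFT_FULL_THRESHOLD is a per-platform tuned value in limbs), and ABOVE_THRESHOLD is assumed here to reduce to a plain `>=` comparison.

```c
#include <stddef.h>

/* Hypothetical placeholder; the real MUL_FFT_FULL_THRESHOLD is tuned
   per platform and measured in limbs. */
#define DEMO_MUL_FFT_FULL_THRESHOLD 3000L

/* Assumption: simplified stand-in for ABOVE_THRESHOLD, treated here as
   a plain ">=" comparison. */
#define DEMO_ABOVE_THRESHOLD(size, thresh) ((size) >= (thresh))

/* Returns 1 when the quoted test would send an un x vn limb product to
   the FFT path, 0 otherwise.  "factor" is the multiplier applied to vn
   (3 in the quoted patch), exposed so other values can be tried. */
int demo_use_fft(long un, long vn, long factor)
{
    return DEMO_ABOVE_THRESHOLD(un + vn, 2 * DEMO_MUL_FFT_FULL_THRESHOLD)
        && DEMO_ABOVE_THRESHOLD(factor * vn, DEMO_MUL_FFT_FULL_THRESHOLD);
}
```

With the placeholder threshold, `demo_use_fft(100000, 1000, 3)` selects the FFT path while `demo_use_fft(100000, 999, 3)` does not, so sweeping `factor` against a benchmark like the one quoted is one way to probe whether 3x is the best cutoff.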