Thanks. We'll put that patch in at the next release.

Does 3x seem to be optimal?

Bill.

On 9 March 2013 16:03, Fredrik Johansson <fredrik.johans...@gmail.com> wrote:
> On Sat, Mar 9, 2013 at 4:35 PM, Bill Hart <goodwillh...@googlemail.com> wrote:
>> OK, well I'll add this to the long MPIR todo list.
>>
>> When we next update MPIR we'll attempt to find a robust patch for this.
>
> I just changed the FFT threshold test in mpn/generic/mul.c to
>
>   if (ABOVE_THRESHOLD (un + vn, 2*MUL_FFT_FULL_THRESHOLD)
>       && ABOVE_THRESHOLD (3*vn, MUL_FFT_FULL_THRESHOLD))
>     {
>       mpn_mul_fft_main (prodp, up, un, vp, vn);
>       return prodp[un + vn - 1];
>     }
>
> It does not seem to slow anything down, and it makes an enormous
> difference in the very unbalanced cases (the difference grows even
> larger with larger un). I suggest just making this change and putting
> it in the next version. I can't see it causing any problems, even if
> more can probably be done about unbalanced tuning.
>
> Here is a quick comparison (rough benchmark code, so expect some
> fluctuation). Each row reads ubits x vbits : old time   new time,
> with operand sizes in bits. The improvement from 512000 x onwards
> is very clear.
>
> 1000 x 1000 : 340 ns   340 ns
> 2000 x 1000 : 680 ns   700 ns
> 2000 x 2000 : 1000 ns   1100 ns
> 4000 x 1000 : 1300 ns   1500 ns
> 4000 x 2000 : 2200 ns   2300 ns
> 4000 x 4000 : 3400 ns   3400 ns
> 8000 x 1000 : 2800 ns   2700 ns
> 8000 x 2000 : 4500 ns   4700 ns
> 8000 x 4000 : 6600 ns   6800 ns
> 8000 x 8000 : 10000 ns   11 us
> 16000 x 1000 : 5300 ns   5500 ns
> 16000 x 2000 : 8900 ns   10000 ns
> 16000 x 4000 : 14 us   14 us
> 16000 x 8000 : 19 us   20 us
> 16000 x 16000 : 31 us   30 us
> 32000 x 1000 : 11 us   10000 ns
> 32000 x 2000 : 17 us   18 us
> 32000 x 4000 : 28 us   28 us
> 32000 x 8000 : 41 us   41 us
> 32000 x 16000 : 50 us   51 us
> 32000 x 32000 : 75 us   77 us
> 64000 x 1000 : 21 us   22 us
> 64000 x 2000 : 35 us   36 us
> 64000 x 4000 : 55 us   56 us
> 64000 x 8000 : 81 us   83 us
> 64000 x 16000 : 130 us   140 us
> 64000 x 32000 : 140 us   130 us
> 64000 x 64000 : 200 us   210 us
> 128000 x 1000 : 42 us   44 us
> 128000 x 2000 : 70 us   72 us
> 128000 x 4000 : 110 us   110 us
> 128000 x 8000 : 160 us   160 us
> 128000 x 16000 : 240 us   240 us
> 128000 x 32000 : 300 us   310 us
> 128000 x 64000 : 360 us   370 us
> 128000 x 128000 : 540 us   540 us
> 256000 x 1000 : 86 us   88 us
> 256000 x 2000 : 140 us   140 us
> 256000 x 4000 : 220 us   240 us
> 256000 x 8000 : 320 us   340 us
> 256000 x 16000 : 480 us   500 us
> 256000 x 32000 : 620 us   630 us
> 256000 x 64000 : 800 us   820 us
> 256000 x 128000 : 900 us   1000 us
> 256000 x 256000 : 1200 us   1200 us
> 512000 x 1000 : 1200 us   170 us
> 512000 x 2000 : 1200 us   280 us
> 512000 x 4000 : 1400 us   440 us
> 512000 x 8000 : 1400 us   660 us
> 512000 x 16000 : 1500 us   960 us
> 512000 x 32000 : 1500 us   1300 us
> 512000 x 64000 : 1600 us   1600 us
> 512000 x 128000 : 1900 us   1800 us
> 512000 x 256000 : 2000 us   2000 us
> 512000 x 512000 : 2600 us   2800 us
> 1024000 x 1000 : 2700 us   340 us
> 1024000 x 2000 : 2700 us   560 us
> 1024000 x 4000 : 2600 us   900 us
> 1024000 x 8000 : 2600 us   1400 us
> 1024000 x 16000 : 3200 us   2000 us
> 1024000 x 32000 : 3200 us   2500 us
> 1024000 x 64000 : 3300 us   3300 us
> 1024000 x 128000 : 3300 us   3300 us
> 1024000 x 256000 : 3800 us   3800 us
> 1024000 x 512000 : 4300 us   4600 us
> 1024000 x 1024000 : 5600 us   6300 us
> 2048000 x 1000 : 5700 us   690 us
> 2048000 x 2000 : 5600 us   1200 us
> 2048000 x 4000 : 5700 us   1800 us
> 2048000 x 8000 : 5500 us   2800 us
> 2048000 x 16000 : 5500 us   3800 us
> 2048000 x 32000 : 6600 us   4900 us
> 2048000 x 64000 : 6800 us   6400 us
> 2048000 x 128000 : 7000 us   7600 us
> 2048000 x 256000 : 7100 us   7600 us
> 2048000 x 512000 : 7800 us   8000 us
> 2048000 x 1024000 : 10000 us   10000 us
> 2048000 x 2048000 : 12 ms   12 ms
> 4096000 x 1000 : 13 ms   1300 us
> 4096000 x 2000 : 11 ms   2200 us
> 4096000 x 4000 : 12 ms   3500 us
> 4096000 x 8000 : 12 ms   5200 us
> 4096000 x 16000 : 12 ms   7800 us
> 4096000 x 32000 : 13 ms   10000 us
> 4096000 x 64000 : 12 ms   13 ms
> 4096000 x 128000 : 14 ms   14 ms
> 4096000 x 256000 : 14 ms   14 ms
> 4096000 x 512000 : 14 ms   15 ms
> 4096000 x 1024000 : 16 ms   17 ms
> 4096000 x 2048000 : 20 ms   19 ms
> 4096000 x 4096000 : 25 ms   26 ms
> 8192000 x 1000 : 25 ms   2700 us
> 8192000 x 2000 : 26 ms   4500 us
> 8192000 x 4000 : 27 ms   7500 us
> 8192000 x 8000 : 26 ms   12 ms
> 8192000 x 16000 : 26 ms   15 ms
> 8192000 x 32000 : 27 ms   21 ms
> 8192000 x 64000 : 26 ms   25 ms
> 8192000 x 128000 : 27 ms   27 ms
> 8192000 x 256000 : 33 ms   34 ms
> 8192000 x 512000 : 34 ms   35 ms
> 8192000 x 1024000 : 34 ms   36 ms
> 8192000 x 2048000 : 38 ms   40 ms
> 8192000 x 4096000 : 44 ms   47 ms
> 8192000 x 8192000 : 55 ms   59 ms
> 16384000 x 1000 : 57 ms   5600 us
> 16384000 x 2000 : 56 ms   10000 us
> 16384000 x 4000 : 57 ms   15 ms
> 16384000 x 8000 : 56 ms   24 ms
> 16384000 x 16000 : 57 ms   31 ms
> 16384000 x 32000 : 58 ms   40 ms
> 16384000 x 64000 : 57 ms   56 ms
> 16384000 x 128000 : 60 ms   58 ms
> 16384000 x 256000 : 58 ms   66 ms
> 16384000 x 512000 : 69 ms   77 ms
> 16384000 x 1024000 : 72 ms   76 ms
> 16384000 x 2048000 : 74 ms   81 ms
> 16384000 x 4096000 : 83 ms   89 ms
> 16384000 x 8192000 : 98 ms   110 ms
> 16384000 x 16384000 : 120 ms   130 ms
> 32768000 x 1000 : 130 ms   10000 us
> 32768000 x 2000 : 130 ms   18 ms
> 32768000 x 4000 : 120 ms   28 ms
> 32768000 x 8000 : 120 ms   43 ms
> 32768000 x 16000 : 120 ms   63 ms
> 32768000 x 32000 : 110 ms   83 ms
> 32768000 x 64000 : 120 ms   100 ms
> 32768000 x 128000 : 120 ms   120 ms
> 32768000 x 256000 : 130 ms   120 ms
> 32768000 x 512000 : 120 ms   120 ms
> 32768000 x 1024000 : 170 ms   170 ms
> 32768000 x 2048000 : 180 ms   190 ms
> 32768000 x 4096000 : 180 ms   190 ms
> 32768000 x 8192000 : 200 ms   200 ms
> 32768000 x 16384000 : 240 ms   260 ms
> 32768000 x 32768000 : 270 ms   290 ms
> 65536000 x 1000 : 270 ms   23 ms
> 65536000 x 2000 : 290 ms   37 ms
> 65536000 x 4000 : 270 ms   60 ms
> 65536000 x 8000 : 270 ms   90 ms
> 65536000 x 16000 : 280 ms   120 ms
> 65536000 x 32000 : 270 ms   160 ms
> 65536000 x 64000 : 280 ms   210 ms
> 65536000 x 128000 : 280 ms   290 ms
> 65536000 x 256000 : 290 ms   300 ms
> 65536000 x 512000 : 280 ms   290 ms
> 65536000 x 1024000 : 290 ms   280 ms
> 65536000 x 2048000 : 360 ms   370 ms
> 65536000 x 4096000 : 370 ms   390 ms
> 65536000 x 8192000 : 370 ms   400 ms
> 65536000 x 16384000 : 410 ms   440 ms
> 65536000 x 32768000 : 500 ms   530 ms
> 65536000 x 65536000 : 580 ms   590 ms
>
> Fredrik
>
> --
> You received this message because you are subscribed to the Google Groups 
> "mpir-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to mpir-devel+unsubscr...@googlegroups.com.
> To post to this group, send email to mpir-devel@googlegroups.com.
> Visit this group at http://groups.google.com/group/mpir-devel?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
