Someone has replaced longlong.h in MPIR with a really, really slow
version! Why on earth did we do that!?

When I replace it with the former fast version, mul_basecase is about
the same speed as BSDNT and unfortunately BSDNT divide and conquer
division does not catch MPIR.

However, there is still hope for BSDNT as switching longlong.h in MPIR
makes MPIR's division basecase nearly a factor of 3 faster than
BSDNT's.

This is just not going to be resolved without an efficient assembly
optimised implementation. There are too many variables. Probably
months of work....

Bill.

On 29 March 2013 20:47, Bill Hart <goodwillh...@googlemail.com> wrote:
> Actually, I think this comparison is not as fair as I thought. The
> mul_basecase code in MPIR really sucks. It's horribly slow compared to
> the code in BSDNT. I thought compilers were able to optimise this sort
> of stuff these days, but apparently not.
>
> mul_basecase is not actually called that much because we use mullow
> (which does use mul_basecase) and mulmid (which uses mullow_basecase),
> though these too might be way slower in mpir, which might entirely
> account for the difference.
>
> I'm gonna have to wait for an efficient implementation of everything
> using assembly language I think. Perhaps MPIR won't be that much
> slower.
>
> Bill.
>
> On 29 March 2013 20:13, Bill Hart <goodwillh...@googlemail.com> wrote:
>> Here are the times for nxn multiplication in bsdnt:
>>
>> size = 10: classical = 0s, kara = 0s, toom33 = 0s
>> size = 11: classical = 2e-06s, kara = 0s, toom33 = 2e-06s
>> size = 13: classical = 0s, kara = 0s, toom33 = 2e-06s
>> size = 15: classical = 0s, kara = 2e-06s, toom33 = 2e-06s
>> size = 17: classical = 0s, kara = 2e-06s, toom33 = 2e-06s
>> size = 19: classical = 0s, kara = 2e-06s, toom33 = 2e-06s
>> size = 21: classical = 2e-06s, kara = 0s, toom33 = 4e-06s
>> size = 24: classical = 0s, kara = 2e-06s, toom33 = 4e-06s
>> size = 27: classical = 2e-06s, kara = 2e-06s, toom33 = 4e-06s
>> size = 30: classical = 2e-06s, kara = 2e-06s, toom33 = 4e-06s
>> size = 33: classical = 2e-06s, kara = 4e-06s, toom33 = 4e-06s
>> size = 37: classical = 4e-06s, kara = 4e-06s, toom33 = 6e-06s
>> size = 41: classical = 4e-06s, kara = 4e-06s, toom33 = 6e-06s
>> size = 46: classical = 6e-06s, kara = 6e-06s, toom33 = 6e-06s
>> size = 51: classical = 8e-06s, kara = 6e-06s, toom33 = 8e-06s
>> size = 57: classical = 8e-06s, kara = 8e-06s, toom33 = 1e-05s
>> size = 63: classical = 1e-05s, kara = 8e-06s, toom33 = 1.2e-05s
>> size = 70: classical = 1.2e-05s, kara = 1.2e-05s, toom33 = 1.2e-05s
>> size = 77: classical = 1.6e-05s, kara = 1.2e-05s, toom33 = 1.4e-05s
>> size = 85: classical = 1.8e-05s, kara = 1.6e-05s, toom33 = 1.6e-05s
>> size = 94: classical = 2.4e-05s, kara = 1.6e-05s, toom33 = 2e-05s
>> size = 104: classical = 2.8e-05s, kara = 2e-05s, toom33 = 2.2e-05s
>> size = 115: classical = 3.4e-05s, kara = 2.4e-05s, toom33 = 2.6e-05s
>> size = 127: classical = 4.2e-05s, kara = 2.8e-05s, toom33 = 3e-05s
>> size = 140: classical = 5e-05s, kara = 3.2e-05s, toom33 = 3.4e-05s
>> size = 154: classical = 6.2e-05s, kara = 3.6e-05s, toom33 = 4.2e-05s
>> size = 170: classical = 7.4e-05s, kara = 4.2e-05s, toom33 = 4.8e-05s
>> size = 188: classical = 9e-05s, kara = 5.2e-05s, toom33 = 5.6e-05s
>> size = 207: classical = 0.000108s, kara = 6.2e-05s, toom33 = 6.2e-05s
>> size = 228: classical = 0.000132s, kara = 7.2e-05s, toom33 = 7.2e-05s
>> size = 251: classical = 0.00016s, kara = 8.4e-05s, toom33 = 8.4e-05s
>> size = 277: classical = 0.000196s, kara = 9.4e-05s, toom33 = 9.8e-05s
>> size = 305: classical = 0.000236s, kara = 0.00011s, toom33 = 0.000114s
>> size = 336: classical = 0.000286s, kara = 0.00013s, toom33 = 0.000132s
>> size = 370: classical = 0.000346s, kara = 0.000152s, toom33 = 0.000156s
>> size = 408: classical = 0.00042s, kara = 0.00018s, toom33 = 0.000174s
>> size = 449: classical = 0.000508s, kara = 0.000212s, toom33 = 0.000202s
>> size = 494: classical = 0.000616s, kara = 0.000248s, toom33 = 0.000236s
>>
>> Note bsdnt only has up to toom3 code. It's quite odd that the toom3
>> code doesn't really substantially outperform the karatsuba code until
>> something like 1000 limbs. I would have assumed the crossover would be
>> earlier, but I can't see anything wrong with the code, so this must be
>> correct.
>>
>> Bill.
>>
>> On 29 March 2013 20:02, Fredrik Johansson <fredrik.johans...@gmail.com> 
>> wrote:
>>> On Fri, Mar 29, 2013 at 7:38 PM, Bill Hart <goodwillh...@googlemail.com>
>>> wrote:
>>>>
>>>> So the conclusion is very clear. Even though my division basecase is
>>>> slightly slower in BSDNT, the BSDNT divconquer code ends up being
>>>> nearly twice as fast as MPIR!
>>>
>>>
>>> Could you show the ratios for 2n by n division versus n by n multiplication?
>>>
>>> Fredrik
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "mpir-devel" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to mpir-devel+unsubscr...@googlegroups.com.
>>> To post to this group, send email to mpir-devel@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/mpir-devel?hl=en.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mpir-devel+unsubscr...@googlegroups.com.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/mpir-devel?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to