The thing that bothers me most about karatsuba is that the crossover
from basecase on K8 is about 28 limbs.

But 3*14^2 = 588 whereas 28^2 = 784. So the difference between
karatsuba and classical at this point is 196 multiplies. For karatsuba
there is also 2 x 14 limb adds and 3 x 28 limb adds. The adds should
get done in about half the time of the multiplies. So for me,
something just doesn't add up. Karatsuba should be way ahead.

The idea of adding one of the multiplies straight in seems like a good idea.

So we'd do the outside multiplies into the result. Add them into a
temporary, then do the middle multiplication, taking it from the
temporary as we go, then subtract the temporary from the result.

It's not really clear that this is better though. We've replaced a mul
and addadd with an add and submul. Either way an add/sub is paired up
with something else for greater efficiency. Perhaps the difference is
that the extra addition in the addadd actually increases the number of
cycles, but pairing it with the multiplication actually doesn't change
the time. Yeah maybe this would work.

Do you recall what percentage the benchmarks changed when going from
ordinary karatsuba to one with addadd?

Bill.

2009/2/26  <ja...@njkfrudils.plus.com>:
>
> On Wednesday 25 February 2009 20:00:19 Bill Hart wrote:
>> I had a play with the karatsuba function today. I figured that one
>> might be able to cut out some overhead by having it complete the last
>> two iterations of the recursion in one go. I wrote some code which
>> works for the last two iterations when n/2, at that point, is
>> divisible by 4. But after fiddling for quite some time to try and find
>> a way to make the existing code faster with this as a special case, I
>> have not seen any speedups. Anyhow, here is the code, in case someone
>> else had some ideas. It obviously only works on systems with
>> mpn_addadd_n and mpn_addsub_n.
>
> Consider a new assembler function
>
> carry=mpn_muladd_basecase(rp,sp1,n1,sp2,n2)
>
> functionally
> mpn_mul_basecase(tp,sp1,n1,sp2,n2);
> carry=mpn_add_n(rp,rp,tp,n1+n2)
>
> implemented exactly like mpn_mul_basecase but with the first row an addmul_1
> rather than a mul_1 , and a extra carry to be taken from row to row.
>
> This is easy to implement at maximum efficiency , and it runs at the same
> speed as a mul_basecase but with a double sized add thrown in for free.Clearly
> we can do the same with sqr_basecase.Using it in the current karatsuba code
> would conflict with the use of addadd/addsub but I suppose/hope we can come
> up with a varient where we can use both.You would only use it at the lowest
> level of  recursion.An extension of this would be the obvious mpn_muladd_kara
> which we would use for the "middle mul"  as you only need it once.It may be
> helpful to play around with the different karatsuba formulars to find a
> better match.
>
> Jason
> - Show quoted text -
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to