On Thursday 26 February 2009 02:33:25 Bill Hart wrote:
> The thing that bothers me most about karatsuba is that the crossover
> from basecase on K8 is about 28 limbs.
>
> But 3*14^2 = 588 whereas 28^2 = 784. So the difference between
> karatsuba and classical at this point is 196 multiplies. For karatsuba
> there is also 2 x 14 limb adds and 3 x 28 limb adds. The adds should
> get done in about half the time of the multiplies. So for me,
> something just doesn't add up. Karatsuba should be way ahead.
>

cycle counts are

mul by 28 is 2087
mul by 14 is 563
add by 14 is 29
add by 28 is 50
addadd by 28 is 83

3*563+2*29+3*50=1897
or with addadd
3*563+2*29+83+50=1880
now add in the overhead ???

Its the overheads that kill it.

Note the current addadd can definitly be improved  by  ~20%

> The idea of adding one of the multiplies straight in seems like a good
> idea.
>
> So we'd do the outside multiplies into the result. Add them into a
> temporary, then do the middle multiplication, taking it from the
> temporary as we go, then subtract the temporary from the result.
>
> It's not really clear that this is better though. We've replaced a mul
> and addadd with an add and submul. Either way an add/sub is paired up
> with something else for greater efficiency. Perhaps the difference is
> that the extra addition in the addadd actually increases the number of
> cycles, but pairing it with the multiplication actually doesn't change
> the time. Yeah maybe this would work.
>
> Do you recall what percentage the benchmarks changed when going from
> ordinary karatsuba to one with addadd?
>

I think it was 1 or 2%

> Bill.
>
> 2009/2/26  <ja...@njkfrudils.plus.com>:
> > On Wednesday 25 February 2009 20:00:19 Bill Hart wrote:
> >> I had a play with the karatsuba function today. I figured that one
> >> might be able to cut out some overhead by having it complete the last
> >> two iterations of the recursion in one go. I wrote some code which
> >> works for the last two iterations when n/2, at that point, is
> >> divisible by 4. But after fiddling for quite some time to try and find
> >> a way to make the existing code faster with this as a special case, I
> >> have not seen any speedups. Anyhow, here is the code, in case someone
> >> else had some ideas. It obviously only works on systems with
> >> mpn_addadd_n and mpn_addsub_n.
> >
> > Consider a new assembler function
> >
> > carry=mpn_muladd_basecase(rp,sp1,n1,sp2,n2)
> >
> > functionally
> > mpn_mul_basecase(tp,sp1,n1,sp2,n2);
> > carry=mpn_add_n(rp,rp,tp,n1+n2)
> >
> > implemented exactly like mpn_mul_basecase but with the first row an
> > addmul_1 rather than a mul_1 , and a extra carry to be taken from row to
> > row.
> >
> > This is easy to implement at maximum efficiency , and it runs at the same
> > speed as a mul_basecase but with a double sized add thrown in for
> > free.Clearly we can do the same with sqr_basecase.Using it in the current
> > karatsuba code would conflict with the use of addadd/addsub but I
> > suppose/hope we can come up with a varient where we can use both.You
> > would only use it at the lowest level of  recursion.An extension of this
> > would be the obvious mpn_muladd_kara which we would use for the "middle
> > mul"  as you only need it once.It may be helpful to play around with the
> > different karatsuba formulars to find a better match.
> >
> > Jason
> > - Show quoted text -
>
> 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to