On Thursday 26 February 2009 02:33:25 Bill Hart wrote: > The thing that bothers me most about karatsuba is that the crossover > from basecase on K8 is about 28 limbs. > > But 3*14^2 = 588 whereas 28^2 = 784. So the difference between > karatsuba and classical at this point is 196 multiplies. For karatsuba > there is also 2 x 14 limb adds and 3 x 28 limb adds. The adds should > get done in about half the time of the multiplies. So for me, > something just doesn't add up. Karatsuba should be way ahead. >
cycle counts are mul by 28 is 2087 mul by 14 is 563 add by 14 is 29 add by 28 is 50 addadd by 28 is 83 3*563+2*29+3*50=1897 or with addadd 3*563+2*29+83+50=1880 now add in the overhead ??? Its the overheads that kill it. Note the current addadd can definitly be improved by ~20% > The idea of adding one of the multiplies straight in seems like a good > idea. > > So we'd do the outside multiplies into the result. Add them into a > temporary, then do the middle multiplication, taking it from the > temporary as we go, then subtract the temporary from the result. > > It's not really clear that this is better though. We've replaced a mul > and addadd with an add and submul. Either way an add/sub is paired up > with something else for greater efficiency. Perhaps the difference is > that the extra addition in the addadd actually increases the number of > cycles, but pairing it with the multiplication actually doesn't change > the time. Yeah maybe this would work. > > Do you recall what percentage the benchmarks changed when going from > ordinary karatsuba to one with addadd? > I think it was 1 or 2% > Bill. > > 2009/2/26 <ja...@njkfrudils.plus.com>: > > On Wednesday 25 February 2009 20:00:19 Bill Hart wrote: > >> I had a play with the karatsuba function today. I figured that one > >> might be able to cut out some overhead by having it complete the last > >> two iterations of the recursion in one go. I wrote some code which > >> works for the last two iterations when n/2, at that point, is > >> divisible by 4. But after fiddling for quite some time to try and find > >> a way to make the existing code faster with this as a special case, I > >> have not seen any speedups. Anyhow, here is the code, in case someone > >> else had some ideas. It obviously only works on systems with > >> mpn_addadd_n and mpn_addsub_n. > > > > Consider a new assembler function > > > > carry=mpn_muladd_basecase(rp,sp1,n1,sp2,n2) > > > > functionally > > mpn_mul_basecase(tp,sp1,n1,sp2,n2); > > carry=mpn_add_n(rp,rp,tp,n1+n2) > > > > implemented exactly like mpn_mul_basecase but with the first row an > > addmul_1 rather than a mul_1 , and a extra carry to be taken from row to > > row. > > > > This is easy to implement at maximum efficiency , and it runs at the same > > speed as a mul_basecase but with a double sized add thrown in for > > free.Clearly we can do the same with sqr_basecase.Using it in the current > > karatsuba code would conflict with the use of addadd/addsub but I > > suppose/hope we can come up with a varient where we can use both.You > > would only use it at the lowest level of recursion.An extension of this > > would be the obvious mpn_muladd_kara which we would use for the "middle > > mul" as you only need it once.It may be helpful to play around with the > > different karatsuba formulars to find a better match. > > > > Jason > > - Show quoted text - > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---