Ah. That makes sense, and it seems like a nice way to exploit the instruction level parallelism. Sounds like a good way to go.
--jason On Thu, Jan 1, 2009 at 4:21 PM, <ja...@njkfrudils.plus.com> wrote: > > On Thursday 01 January 2009 21:11:38 Jason Martin wrote: >> Pardon my daftness, but what exactly do you mean by a bidirectional >> divexact? >> > > got the usual division > > //(qp,n)=(xp,n)/d and the division is exact > mp_limb_t jaydiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t d) > {int j;mp_limb_t h,l,q,dummy; > > ASSERT(n>0);ASSERT(d!=0);ASSERT_MPN(xp,n);ASSERT(MPN_SAME_OR_SEPARATE_P(qp,xp,n)); > h=0;// first and last iterations are a bit redundant > for(j=n-1;j>=0;j--) > {l=xp[j]; > udiv_qrnnd(q,dummy,h,l,d); > qp[j]=q; > h=l-q*d;} > return h;}// this is the remainder > > of course the udiv_qrnnd should be replaced with mul by inverse > got the usual divexact > > mp_limb_t jayexactdiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t > d,mp_limb_t m) > {int j;mp_limb_t c,h,l,q,dummy; > > // m is the inversve , d is odd > c=0;h=0; > for(j=0;j<n;j++) > {l=xp[j]-h-c;if(h+c>xp[j]){c=1;}else{c=0;} > q=l*m; > qp[j]=q; > umul_ppmm(h,dummy,q,d); > } > return h+c;}// this is the remainder*B^k? , besides zero are their any other > easy cases > > but both are very wastefull of bandwidth so split quotient into top and bottom > to get > > mp_limb_t jaybiexactdiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t > d,mp_limb_t m) > {int j1,j2,k;mp_limb_t c1,h1,l1,q1,dummy1,h2,l2,q2,dummy2; > // m is the inversve , d is odd , n is even > c1=0;h1=0;h2=0;k=n/2; > // for odd n , top limb is most likely zero so , do the top odd limb with > normal division > for(j2=n-1,j1=0;j1<k;j1++,j2--)// alternative condition is j1<k or j2>=k > {l1=xp[j1]-h1-c1;if(h1+c1>xp[j1]){c1=1;}else{c1=0;} > q1=l1*m; > qp[j1]=q1; > umul_ppmm(h1,dummy1,q1,d); > l2=xp[j2]; > udiv_qrnnd(q2,dummy2,h2,l2,d); > qp[j2]=q2; > h2=l2-q2*d; > } > return h1+c1-h2;}// return zero if remainder is zero > > > > >> --jason >> >> On Thu, Jan 1, 2009 at 4:05 PM, Bill Hart <goodwillh...@googlemail.com> > wrote: >> > Number two would be very interesting. I've thought about implementing >> > such a thing elsewhere for polynomials, but it also makes sense for >> > MPIR. >> > >> > Bill. >> > >> > On 01/01/2009, ja...@njkfrudils.plus.com <ja...@njkfrudils.plus.com> > wrote: >> >> Hi there, >> >> I'm just starting to look at the mpn division functions , for the >> >> straight forward linear functions like mpn_divrem_1 mpn_divexact_1 etc , >> >> there is a lot of wasted "bandwidth" on the K8(other cpu's would be >> >> similar). For example the function mpn_divexact_1 has a 10c/l thruput >> >> which is restricted by the limb/carry dependances. If we didn't have to >> >> worry about that we could get it to about 5 or 6 , or we could do some >> >> other operation . For example a mpn_divexactadd_1 where we get the add >> >> for free , but no one wants a divexactadd. >> >> >> >> Any suggestions? >> >> >> >> these could be usefull >> >> 1)two independant divisions >> >> 2)a bidirectional divexact >> >> >> >> Jason >> >> > > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---