Ah.  That makes sense, and it seems like a nice way to exploit the
instruction level parallelism.  Sounds like a good way to go.

--jason


On Thu, Jan 1, 2009 at 4:21 PM,  <ja...@njkfrudils.plus.com> wrote:
>
> On Thursday 01 January 2009 21:11:38 Jason Martin wrote:
>> Pardon my daftness, but what exactly do you mean by a bidirectional
>> divexact?
>>
>
> got the usual division
>
> //(qp,n)=(xp,n)/d  and the division is exact
> mp_limb_t       jaydiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t d)
> {int j;mp_limb_t h,l,q,dummy;
>
> ASSERT(n>0);ASSERT(d!=0);ASSERT_MPN(xp,n);ASSERT(MPN_SAME_OR_SEPARATE_P(qp,xp,n));
> h=0;// first and last iterations are a bit redundant
> for(j=n-1;j>=0;j--)
>   {l=xp[j];
>    udiv_qrnnd(q,dummy,h,l,d);
>    qp[j]=q;
>    h=l-q*d;}
> return h;}// this is the remainder
>
> of course the udiv_qrnnd should be replaced with mul by inverse
> got the usual divexact
>
> mp_limb_t       jayexactdiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t
> d,mp_limb_t m)
> {int j;mp_limb_t c,h,l,q,dummy;
>
> // m is the inversve , d is odd
> c=0;h=0;
> for(j=0;j<n;j++)
>   {l=xp[j]-h-c;if(h+c>xp[j]){c=1;}else{c=0;}
>    q=l*m;
>    qp[j]=q;
>    umul_ppmm(h,dummy,q,d);
>   }
> return h+c;}// this is the remainder*B^k?  , besides zero are their any other
> easy cases
>
> but both are very wastefull of bandwidth so split quotient into top and bottom
> to get
>
> mp_limb_t       jaybiexactdiv(mp_ptr qp,mp_srcptr xp,mp_size_t n,mp_limb_t
> d,mp_limb_t m)
> {int j1,j2,k;mp_limb_t c1,h1,l1,q1,dummy1,h2,l2,q2,dummy2;
> // m is the inversve , d is odd , n is even
> c1=0;h1=0;h2=0;k=n/2;
> // for odd n , top limb is most likely zero so , do the top odd limb with
> normal division
> for(j2=n-1,j1=0;j1<k;j1++,j2--)// alternative condition is j1<k or j2>=k
>   {l1=xp[j1]-h1-c1;if(h1+c1>xp[j1]){c1=1;}else{c1=0;}
>    q1=l1*m;
>    qp[j1]=q1;
>    umul_ppmm(h1,dummy1,q1,d);
>    l2=xp[j2];
>    udiv_qrnnd(q2,dummy2,h2,l2,d);
>    qp[j2]=q2;
>    h2=l2-q2*d;
>   }
> return h1+c1-h2;}// return zero if remainder is zero
>
>
>
>
>> --jason
>>
>> On Thu, Jan 1, 2009 at 4:05 PM, Bill Hart <goodwillh...@googlemail.com>
> wrote:
>> > Number two would be very interesting. I've thought about implementing
>> > such a thing elsewhere for polynomials, but it also makes sense for
>> > MPIR.
>> >
>> > Bill.
>> >
>> > On 01/01/2009, ja...@njkfrudils.plus.com <ja...@njkfrudils.plus.com>
> wrote:
>> >> Hi there,
>> >> I'm just starting to look at the mpn division functions , for the
>> >> straight forward linear functions like mpn_divrem_1 mpn_divexact_1 etc ,
>> >> there is a lot of wasted "bandwidth" on the K8(other cpu's would be
>> >> similar). For example the function  mpn_divexact_1 has a 10c/l thruput
>> >> which is restricted by the limb/carry dependances. If we didn't have to
>> >> worry about that we could get it to about 5 or 6 , or we could do some
>> >> other operation . For example a mpn_divexactadd_1 where we get the add
>> >> for free , but no one wants a divexactadd.
>> >>
>> >> Any suggestions?
>> >>
>> >> these could be usefull
>> >> 1)two independant divisions
>> >> 2)a bidirectional divexact
>> >>
>> >> Jason
>>
>>
>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to