----- Original Message ----- 
From: "Fredrik Johansson" <fredrik.johans...@gmail.com>
To: <mpir-devel@googlegroups.com>
Sent: Thursday, October 15, 2009 7:01 AM
Subject: [mpir-devel] Function requests


>
> Hi all,
>
> I'm currently looking for fast ways to sum exponential-type Taylor
> series in fixed point arithmetic using mpz (see
> http://code.google.com/p/fastfunlib/)). I'm mainly interested in low
> precision (say a few hundred or thousand bits), so I don't think
> binary splitting is worthwhile (but I could be wrong here).
>

For such small sizes , the overhead of the mpz layer will be a fair fraction 
of the runtime. Write it in mpz's to get the algorithm and layout sorted , 
then convert to the mpn layer one part at a time. This is how I generally go 
about things , I dont want to be debugging an algorithm at the mpn level.

> The most important missing ingredient is a function that does
>
> mpz_mul(z, x, y);
> mpz_tdiv_q_2exp(z, z, p);
>
> efficiently in a single step. For other purposes, it should ideally
> only use the necessary high limbs of x and y so that x can be a fixed
> point value at much higher precision, without the need to place a
> truncated copy in a temporary variable. If it helps, it would be
> possible to choose p so as to be a multiple of the limb size.
>

We have an undocumented (for now, ie you may need to include gmp-impl.h , 
longlong.h etc)

// (rp,2n)=(xp,n)*(yp,n)
void mpn_mulhigh_n (mp_ptr rp, mp_srcptr xp, mp_srcptr yp, mp_size_t n)

where the upper n limbs are guarenteed to be correct , ( the lower n limbs 
may be junk or correct)

As we havent actually used it anywhere yet (next mpir version will use it 
for roots) , the basecase sizes are still writen in C and dont compare that 
favorably with the asm writen mul_basecase. When comapiring like-for-like it 
gives ~30% speedup.

For the more general (rp,n+m)=(xp,n)*(yp,m) where only the top k  limbs are 
correct , there are many algorithmic choices , Thom Mulders paper has some 
details on this, but for small sizes experiment is best. Note: like mullow , 
mulhigh can be simplified when k<n,m , which is I think what you were 
asking.

Basecase is easy though , and basecase is usually fastest <20 limbs = 
1280bits

> Secondly, there are a lot of divisions by small integers (1, 2, 3,
> ...). I'm not sure how much they contribute in total, but if there is
> a way to speed up the first few with precomputation, it could help.
>
> A single-step x = x + y/k, k a small integer, would also be useful if
> it could be done faster.
>

I expect the addition could be easily absorbed into the division , by which 
I mean that a mpn_add_div would run at the same speed as a mpn_div , 
Precalculating the inverses would be a definite benefit , although we dont 
have any interfaces for that at the moment. Note: some special divisors can 
run at much greater speeds. eg 2^k , 2^k-1 ,2^k+1 , divisors of 2^k-1

> Finally, for fixed-point square root, it can probably be done faster
> than padding with zeros and performing an integer square root.
>
> Any other ideas?
>
> Fredrik
>
> > 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to