----- Original Message ----- From: "Fredrik Johansson" <fredrik.johans...@gmail.com> To: <mpir-devel@googlegroups.com> Sent: Thursday, October 15, 2009 7:01 AM Subject: [mpir-devel] Function requests
> > Hi all, > > I'm currently looking for fast ways to sum exponential-type Taylor > series in fixed point arithmetic using mpz (see > http://code.google.com/p/fastfunlib/)). I'm mainly interested in low > precision (say a few hundred or thousand bits), so I don't think > binary splitting is worthwhile (but I could be wrong here). > For such small sizes , the overhead of the mpz layer will be a fair fraction of the runtime. Write it in mpz's to get the algorithm and layout sorted , then convert to the mpn layer one part at a time. This is how I generally go about things , I dont want to be debugging an algorithm at the mpn level. > The most important missing ingredient is a function that does > > mpz_mul(z, x, y); > mpz_tdiv_q_2exp(z, z, p); > > efficiently in a single step. For other purposes, it should ideally > only use the necessary high limbs of x and y so that x can be a fixed > point value at much higher precision, without the need to place a > truncated copy in a temporary variable. If it helps, it would be > possible to choose p so as to be a multiple of the limb size. > We have an undocumented (for now, ie you may need to include gmp-impl.h , longlong.h etc) // (rp,2n)=(xp,n)*(yp,n) void mpn_mulhigh_n (mp_ptr rp, mp_srcptr xp, mp_srcptr yp, mp_size_t n) where the upper n limbs are guarenteed to be correct , ( the lower n limbs may be junk or correct) As we havent actually used it anywhere yet (next mpir version will use it for roots) , the basecase sizes are still writen in C and dont compare that favorably with the asm writen mul_basecase. When comapiring like-for-like it gives ~30% speedup. For the more general (rp,n+m)=(xp,n)*(yp,m) where only the top k limbs are correct , there are many algorithmic choices , Thom Mulders paper has some details on this, but for small sizes experiment is best. Note: like mullow , mulhigh can be simplified when k<n,m , which is I think what you were asking. Basecase is easy though , and basecase is usually fastest <20 limbs = 1280bits > Secondly, there are a lot of divisions by small integers (1, 2, 3, > ...). I'm not sure how much they contribute in total, but if there is > a way to speed up the first few with precomputation, it could help. > > A single-step x = x + y/k, k a small integer, would also be useful if > it could be done faster. > I expect the addition could be easily absorbed into the division , by which I mean that a mpn_add_div would run at the same speed as a mpn_div , Precalculating the inverses would be a definite benefit , although we dont have any interfaces for that at the moment. Note: some special divisors can run at much greater speeds. eg 2^k , 2^k-1 ,2^k+1 , divisors of 2^k-1 > Finally, for fixed-point square root, it can probably be done faster > than padding with zeros and performing an integer square root. > > Any other ideas? > > Fredrik > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---