On Saturday 06 June 2009 00:14:58 Bill Hart wrote: > MPIR 1.2 looks to be passing all its tests, so should be out > officially over the weekend, so time to think about the next > release.... > > There will be an MPIR 1.2.1 which will merge Robert Gerbicz' > contributed code, "turn on" some improvements we have for the FFT, and > clean up some longstanding issues we have lurking in our trac server > (see those issues flagged for mpir-1.2.1: > http://trac.mpir.org/mpir_trac/report/1 ). > > Here are some of the things we've been musing about for the next > release(s) (MPIR 1.3 and following), roughly in order of importance in > each section: > > Multiplication: > > * Further improvements to assembly language on x86_64 (yes, we think > even for K8 :-)) > * Move more of karatsuba into assembly language (there should be two > new assembly functions for this) > * Factor Karatsuba code out into separate files (kara_mul_n.c and > kara_mul.c) > * Unbalanced version of karatsuba
Assuming I get the time etc , then I volunteer for the above assembler and karatsuba mul. > * Implement an improvement to Toom 3 suggested by Thorsten Kleinjung > * Toom 4 variants for unbalanced operands based on the code of Marco > Bodrato that we have > * Toom 7 variants for "slightly unbalanced" operands > * Factor out some of the lower level FFT helper functions into > separate files > * Implement some interesting strategies for speeding up pointwise > multiplications in the FFT code Looking at the existing toom code there are a lot of mpn_addmul_1(....,2^k) we should change this to mpn_addlsh_n(....,k) and write asm functions for this , for K8 this would be a macro to call addmul_1 , and for core2,K10 a new function Various other combinations can be done faster than we do at the moment , we just need to choose the most common or the easiest. > * Work on left and right truncated products and middle products > > Division: > > * Asymptotically fast division code based on Newton iteration - > Fredrick Johansson has python code which demonstrates an asymptotic > improvement Again , assuming I have the time , then I would like to do the division below > * Implement Peter Montgomery's mod_1 algorithm > * Further assembly improvements to divrem_1 and divrem_2 > > GCD: > > * Strassen multiplication for the 2x2 matrix multiplication used by > the asymptotically fast GCD code of Niels Moller > * A best of breed implementation at the mpn level making use of the > different strengths of the currently implemented GCD variants in Niels > Moller's code > * A super fast implementation of GCD for 1 and 2 limbs - I have some > code for the < 1 limb case already, but this is not merged > Is this for GCD(x,y) where x and y have <=2 limbs ? branch free?? > XGCD: > > * An XGCD implementation of at least one of the GCD algorithms which > is fast for smaller operands (currently we only have good > asymptotically fast code, after I finally got around to writing it) > > Benchmarketing: > > * The new GMP benchmark is better than the old one, however Brian > Gladman has written some code for a new benchmark program entirely in > C which will become a base for a new mpir benchmark. As we aren't > legally allowed to report gmpbench results for MPIR, we may as well > write our own benchmark to prevent people inadvertently doing this. > Jeff Gilchrist has a real world benchmark , ie something thats run for purposes other than generating benchmarks. I would like to see a few of these in our benchmark suite. > mpn layer: > > * If we are serious about maintaining compatibility with GMP, it looks > as if we have to think about catching up with some of the new mpn > functions they have implemented - it seems they plan to expose many to > the user in a later version - some of them I think we already have, > but perhaps with different names, but others I don't have a clue what > they do. We'll probably do this on a case by case basis for now, > essentially looking to implement the more canonical looking ones > before we implement ones which may be internal only - e.g. I think > mpn_neg is a safe bet for a user exposed function. > > Then we can think towards MPIR 2.0. Immediately that MPIR 1.2 is out I > will create mpir-mt, mpir-cuda and mpir-cell branches in svn. Glenn > reports the cuda1 machine is back up with working CUDA 2.0 drivers, > though there may still be some more packages we need to install from > NVIDIA for maximum joy. > I know nothing about cuda/cell , but it would great to work on them. > There are other things in the works, but these depend on us getting > written contracts and will be announced once we have signed copies in > our hands. :-) > > Of course all of the above is open for discussion (high level ideas > and algorithms can be discussed here, implementation details and > debugging will be thrashed out on mpir-dev). Further code > contributions and ideas are most welcome!! > > Bill. > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---