2009/6/6 Jason Moxham <ja...@njkfrudils.plus.com>: > > On Saturday 06 June 2009 00:14:58 Bill Hart wrote: >> MPIR 1.2 looks to be passing all its tests, so should be out >> officially over the weekend, so time to think about the next >> release.... >> >> There will be an MPIR 1.2.1 which will merge Robert Gerbicz' >> contributed code, "turn on" some improvements we have for the FFT, and >> clean up some longstanding issues we have lurking in our trac server >> (see those issues flagged for mpir-1.2.1: >> http://trac.mpir.org/mpir_trac/report/1 ). >> >> Here are some of the things we've been musing about for the next >> release(s) (MPIR 1.3 and following), roughly in order of importance in >> each section: >> >> Multiplication: >> >> * Further improvements to assembly language on x86_64 (yes, we think >> even for K8 :-)) >> * Move more of karatsuba into assembly language (there should be two >> new assembly functions for this) >> * Factor Karatsuba code out into separate files (kara_mul_n.c and >> kara_mul.c) >> * Unbalanced version of karatsuba > > Assuming I get the time etc , then I volunteer for the above assembler and > karatsuba mul. > >> * Implement an improvement to Toom 3 suggested by Thorsten Kleinjung >> * Toom 4 variants for unbalanced operands based on the code of Marco >> Bodrato that we have >> * Toom 7 variants for "slightly unbalanced" operands >> * Factor out some of the lower level FFT helper functions into >> separate files >> * Implement some interesting strategies for speeding up pointwise >> multiplications in the FFT code > > Looking at the existing toom code there are a lot of > mpn_addmul_1(....,2^k) > we should change this to > mpn_addlsh_n(....,k) > and write asm functions for this , for K8 this would be a macro to call > addmul_1 , and for core2,K10 a new function > Various other combinations can be done faster than we do at the moment , we > just need to choose the most common or the easiest. > >> * Work on left and right truncated products and middle products >> >> Division: >> >> * Asymptotically fast division code based on Newton iteration - >> Fredrick Johansson has python code which demonstrates an asymptotic >> improvement > > Again , assuming I have the time , then I would like to do the division below > >> * Implement Peter Montgomery's mod_1 algorithm >> * Further assembly improvements to divrem_1 and divrem_2 >> >> GCD: >> >> * Strassen multiplication for the 2x2 matrix multiplication used by >> the asymptotically fast GCD code of Niels Moller >> * A best of breed implementation at the mpn level making use of the >> different strengths of the currently implemented GCD variants in Niels >> Moller's code >> * A super fast implementation of GCD for 1 and 2 limbs - I have some >> code for the < 1 limb case already, but this is not merged >> > > Is this for GCD(x,y) where x and y have <=2 limbs ? branch free??
The current code is not branch free, but is fast. However we probably need multiple versions for different processors. > >> XGCD: >> >> * An XGCD implementation of at least one of the GCD algorithms which >> is fast for smaller operands (currently we only have good >> asymptotically fast code, after I finally got around to writing it) >> >> Benchmarketing: >> >> * The new GMP benchmark is better than the old one, however Brian >> Gladman has written some code for a new benchmark program entirely in >> C which will become a base for a new mpir benchmark. As we aren't >> legally allowed to report gmpbench results for MPIR, we may as well >> write our own benchmark to prevent people inadvertently doing this. >> > > Jeff Gilchrist has a real world benchmark , ie something thats run for > purposes other than generating benchmarks. I would like to see a few of these > in our benchmark suite. > >> mpn layer: >> >> * If we are serious about maintaining compatibility with GMP, it looks >> as if we have to think about catching up with some of the new mpn >> functions they have implemented - it seems they plan to expose many to >> the user in a later version - some of them I think we already have, >> but perhaps with different names, but others I don't have a clue what >> they do. We'll probably do this on a case by case basis for now, >> essentially looking to implement the more canonical looking ones >> before we implement ones which may be internal only - e.g. I think >> mpn_neg is a safe bet for a user exposed function. >> >> Then we can think towards MPIR 2.0. Immediately that MPIR 1.2 is out I >> will create mpir-mt, mpir-cuda and mpir-cell branches in svn. Glenn >> reports the cuda1 machine is back up with working CUDA 2.0 drivers, >> though there may still be some more packages we need to install from >> NVIDIA for maximum joy. >> > > I know nothing about cuda/cell , but it would great to work on them. > >> There are other things in the works, but these depend on us getting >> written contracts and will be announced once we have signed copies in >> our hands. :-) >> >> Of course all of the above is open for discussion (high level ideas >> and algorithms can be discussed here, implementation details and >> debugging will be thrashed out on mpir-dev). Further code >> contributions and ideas are most welcome!! >> >> Bill. >> >> >> > > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en -~----------~----~----~----~------~----~------~--~---