On Saturday 06 June 2009 00:14:58 Bill Hart wrote:
> MPIR 1.2 looks to be passing all its tests, so should be out
> officially over the weekend, so time to think about the next
> release....
>
> There will be an MPIR 1.2.1 which will merge Robert Gerbicz'
> contributed code, "turn on" some improvements we have for the FFT, and
> clean up some longstanding issues we have lurking in our trac server
> (see those issues flagged for mpir-1.2.1:
> http://trac.mpir.org/mpir_trac/report/1 ).
>
> Here are some of the things we've been musing about for the next
> release(s) (MPIR 1.3 and following), roughly in order of importance in
> each section:
>
> Multiplication:
>
> * Further improvements to assembly language on x86_64 (yes, we think
> even for K8 :-))
> * Move more of karatsuba into assembly language (there should be two
> new assembly functions for this)
> * Factor Karatsuba code out into separate files (kara_mul_n.c and
> kara_mul.c)
> * Unbalanced version of karatsuba

Assuming I get the time etc , then I volunteer for the above assembler and 
karatsuba mul.

> * Implement an improvement to Toom 3 suggested by Thorsten Kleinjung
> * Toom 4 variants for unbalanced operands based on the code of Marco
> Bodrato that we have
> * Toom 7 variants for "slightly unbalanced" operands
> * Factor out some of the lower level FFT helper functions into
> separate files
> * Implement some interesting strategies for speeding up pointwise
> multiplications in the FFT code

Looking at the existing toom code there are a lot of
mpn_addmul_1(....,2^k) 
we should change this to 
mpn_addlsh_n(....,k)
and write asm functions for this , for K8 this would be a macro to call 
addmul_1 , and for core2,K10 a new function
Various other combinations can be done faster than we do at the moment , we 
just need to choose the most common or the easiest.

> * Work on left and right truncated products and middle products
>
> Division:
>
> * Asymptotically fast division code based on Newton iteration -
> Fredrick Johansson has python code which demonstrates an asymptotic
> improvement

Again , assuming I have the time , then I would like to do the division below

> * Implement Peter Montgomery's mod_1 algorithm
> * Further assembly improvements to divrem_1 and divrem_2
>
> GCD:
>
> * Strassen multiplication for the 2x2 matrix multiplication used by
> the asymptotically fast GCD code of Niels Moller
> * A best of breed implementation at the mpn level making use of the
> different strengths of the currently implemented GCD variants in Niels
> Moller's code
> * A super fast implementation of GCD for 1 and 2 limbs - I have some
> code for the < 1 limb case already, but this is not merged
>

Is this for GCD(x,y) where x and y have <=2 limbs ? branch free?? 

> XGCD:
>
> * An XGCD implementation of at least one of the GCD algorithms which
> is fast for smaller operands (currently we only have good
> asymptotically fast code, after I finally got around to writing it)
>
> Benchmarketing:
>
> * The new GMP benchmark is better than the old one, however Brian
> Gladman has written some code for a new benchmark program entirely in
> C which will become a base for a new mpir benchmark. As we aren't
> legally allowed to report gmpbench results for MPIR, we may as well
> write our own benchmark to prevent people inadvertently doing this.
>

Jeff Gilchrist has a real world benchmark , ie something thats run for 
purposes other than generating benchmarks. I would like to see a few of these 
in our benchmark suite.

> mpn layer:
>
> * If we are serious about maintaining compatibility with GMP, it looks
> as if we have to think about catching up with some of the new mpn
> functions they have implemented - it seems they plan to expose many to
> the user in a later version - some of them I think we already have,
> but perhaps with different names, but others I don't have a clue what
> they do. We'll probably do this on a case by case basis for now,
> essentially looking to implement the more canonical looking ones
> before we implement ones which may be internal only - e.g. I think
> mpn_neg is a safe bet for a user exposed function.
>
> Then we can think towards MPIR 2.0. Immediately that MPIR 1.2 is out I
> will create mpir-mt, mpir-cuda and mpir-cell branches in svn. Glenn
> reports the cuda1 machine is back up with working CUDA 2.0 drivers,
> though there may still be some more packages we need to install from
> NVIDIA for maximum joy.
>

I know nothing about cuda/cell , but it would great to work on them.

> There are other things in the works, but these depend on us getting
> written contracts and will be announced once we have signed copies in
> our hands. :-)
>
> Of course all of the above is open for discussion (high level ideas
> and algorithms can be discussed here, implementation details and
> debugging will be thrashed out on mpir-dev). Further code
> contributions and ideas are most welcome!!
>
> Bill.
>
>
> 


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to