On Apr 5, 10:00 pm, mabshoff <michael.absh...@mathematik.uni-dortmund.de> wrote:
> There is an somewhat interesting article about "A First Look at the
> Larrabee New Instructions (LRBni)" at
>
> http://www.ddj.com/hpc-high-performance-computing/216402188
>
> You might want to read the print version since that one is all on one
> page. I am not sure how much of this will be useful to MPIR (a lot of
> the instructions have obvious applications for BLAS for example), but
> given that this is coming from Intel it will likely be quite
> widespread.

It looks like LRBni can be used to hugely accelerate some MPIR operations, but it will be a lot of work to get the best results.

LRBni gives you 32 512-bit registers, and lets you do vectorized operations on 16 32-bit integers, at basically one operation per cycle. This includes a 32x32->64 multiply (two separate instructions: one for 32x32 -> the low 32 bits and one for 32x32 -> the high 32 bits, so this takes 2 cycles), add with carry, etc. It looks like a fairly complete instruction set for doing arithmetic vectorized 16x. (There doesn't seem to be an integer divide instruction, though.) Also, the LRBni instructions are on Larrabee graphics processors, which are expected to be hugely multicore.

I'm not sure how you would make the best use of this power, though. The nicest possibility might be if you can arrange to do the same operations on 16 numbers simultaneously (and store the numbers interleaved limb-by-limb, so that the least-significant limbs of all 16 numbers are adjacent, etc.). Otherwise, for multiplication, I guess you start with Karatsuba/Toom/... until you have divided the multiplication into 16 sub-multiplications, and then you shuffle the limbs into the correct format and do all 16 sub-multiplications simultaneously.

As far as I can tell, the LRBni instructions are only going to go in Intel's graphics processors, not in their CPUs. CPUs will get AVX instructions instead, which are also nice (and tricky to make use of), but not nearly as nice as LRBni. (AVX has 256-bit registers, not 512-bit; and I think to make full use of AVX it may be necessary to switch to floating-point limbs. It's been a while since I looked, but if I recall correctly, the AVX integer instructions are quite impoverished compared to the floating-point instructions.)

Exciting times ahead :)

Carl
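
A minimal sketch of the interleaved limb-by-limb layout described above, in plain C rather than LRBni code. The names interleave, deinterleave and add_n_x16 are hypothetical (they are not MPIR functions), and the inner loop over 16 lanes only stands in for what would be a single 16-wide vector add-with-carry; no real LRBni instructions or intrinsics are used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* a 512-bit register holds 16 32-bit limbs */

/* Assumed layout: v[i * LANES + k] is limb i (least significant first)
   of number k, so limb i of all 16 numbers is contiguous. */

/* Copy an ordinary n-limb number x into lane `lane` of v. */
void interleave(uint32_t *v, const uint32_t *x, size_t n, unsigned lane)
{
    assert(lane < LANES);
    for (size_t i = 0; i < n; i++)
        v[i * LANES + lane] = x[i];
}

/* Copy lane `lane` of v back out into an ordinary n-limb number x. */
void deinterleave(uint32_t *x, const uint32_t *v, size_t n, unsigned lane)
{
    assert(lane < LANES);
    for (size_t i = 0; i < n; i++)
        x[i] = v[i * LANES + lane];
}

/* r = a + b for 16 independent n-limb numbers at once.
   carry_out[k] receives the final carry (0 or 1) of number k.
   Each pass of the inner loop models one 16-wide vector operation. */
void add_n_x16(uint32_t *r, const uint32_t *a, const uint32_t *b,
               size_t n, uint32_t carry_out[LANES])
{
    uint32_t carry[LANES] = {0};

    for (size_t i = 0; i < n; i++) {
        for (unsigned k = 0; k < LANES; k++) {
            uint64_t s = (uint64_t)a[i * LANES + k]
                       + b[i * LANES + k] + carry[k];
            r[i * LANES + k] = (uint32_t)s;       /* low 32 bits */
            carry[k] = (uint32_t)(s >> 32);       /* carry into limb i+1 */
        }
    }
    for (unsigned k = 0; k < LANES; k++)
        carry_out[k] = carry[k];
}
```

The point of the layout is that the carry chain runs down the limb index while the 16 lanes stay independent, so no data has to move between lanes during the addition. The shuffling cost is confined to interleave and deinterleave.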