[mpir-devel] Re: DDJ: A First Look at the Larrabee New Instructions (LRBni)

Carl Witty Wed, 08 Apr 2009 21:52:19 -0700

On Apr 5, 10:00 pm, mabshoff <michael.absh...@mathematik.uni-
dortmund.de> wrote:
> There is a somewhat interesting article about "A First Look at the
> Larrabee New Instructions (LRBni)" at
>
>    http://www.ddj.com/hpc-high-performance-computing/216402188
>
> You might want to read the print version since that one is all on one
> page. I am not sure how much of this will be useful to MPIR (a lot of
> the instructions have obvious applications for BLAS for example), but
> given that this is coming from Intel it will likely be quite
> widespread.


It looks like LRBni can be used to hugely accelerate some MPIR
operations, but it will be a lot of work to get the best results.
LRBni gives you 32 512-bit registers, and lets you do vectorized
operations with 16 32-bit integers, at basically one operation per
cycle.  This includes a 32x32->64 multiply (two separate instructions
-- one to do 32x32 -> the low 32 bits, and one for 32x32 -> the high
32 bits, so this takes 2 cycles), add with carry, etc.; looks like a
fairly complete instruction set for doing arithmetic vectorized 16x.
(There doesn't seem to be an integer divide instruction, though.)
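To make the split multiply concrete, here is a rough sketch in plain, portable C of what the low/high multiply pair and the add with carry compute across 16 lanes.  These are not LRBni intrinsics: the vec16 type and the vec_mul_lo, vec_mul_hi and vec_adc names are made up for illustration, and modelling the per-lane carry as a 16-bit mask is my assumption, not something taken from the article.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* 16 x 32-bit lanes = one 512-bit register */

/* Hypothetical stand-in for one 512-bit vector register. */
typedef struct { uint32_t lane[LANES]; } vec16;

/* Low 32 bits of each 32x32 product (models the "multiply low" step). */
static vec16 vec_mul_lo(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = (uint32_t)((uint64_t)a.lane[i] * b.lane[i]);
    return r;
}

/* High 32 bits of each 32x32 product (models the "multiply high" step). */
static vec16 vec_mul_hi(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = (uint32_t)(((uint64_t)a.lane[i] * b.lane[i]) >> 32);
    return r;
}

/* Per-lane add with carry.  Bit i of *carry is the carry into lane i on
 * entry and the carry out of lane i on return (assumed representation). */
static vec16 vec_adc(vec16 a, vec16 b, uint16_t *carry)
{
    vec16 r;
    uint16_t out = 0;
    assert(carry);
    for (int i = 0; i < LANES; i++) {
        uint64_t s = (uint64_t)a.lane[i] + b.lane[i] + ((*carry >> i) & 1u);
        r.lane[i] = (uint32_t)s;
        out = (uint16_t)(out | ((s >> 32) << i));
    }
    *carry = out;
    return r;
}
```

Calling vec_mul_lo and vec_mul_hi on the same operands gives the two halves of 16 full 64-bit products, which is the two-instruction cost mentioned above.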

Also, the LRBni instructions are on Larrabee graphics processors that
are expected to be hugely multicore.

I'm not sure how you would make the best use of this power, though.
The nicest possibility might be if you can set up to do the same
operations on 16 numbers simultaneously (and store the numbers
interleaved limb-by-limb, so that the least-significant limbs of all 16
numbers are adjacent, etc.).  Otherwise, for multiplication, I guess
you start with Karatsuba/Toom/... until you have divided the
multiplication into 16 sub-multiplications, and then you shuffle the
limbs into the correct format and do all 16 sub-multiplications
simultaneously.
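As a rough illustration of the interleaved layout, here is a plain C sketch (scalar loops standing in for the vector instructions) that shuffles 16 n-limb numbers into limb-by-limb order and then adds 16 pairs at once.  The function names and the choice of 32-bit limbs are my assumptions for the example; this is not MPIR code.  The inner loop over k is the part that a single 16-wide add with carry would replace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* how many numbers are processed side by side */

/* Position of limb j (0 = least significant) of number k when the
 * numbers are stored interleaved limb-by-limb. */
static size_t interleaved_index(size_t j, size_t k)
{
    return j * LANES + k;
}

/* src holds 16 n-limb numbers one after another (number k starts at
 * src[k * n]); dst receives the same limbs in interleaved order. */
static void interleave(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t k = 0; k < LANES; k++)
        for (size_t j = 0; j < n; j++)
            dst[interleaved_index(j, k)] = src[k * n + j];
}

/* r = a + b for 16 independent n-limb numbers, all interleaved.
 * Returns the carries out of the top limbs, bit k for number k. */
static uint16_t add_n_interleaved(uint32_t *r, const uint32_t *a,
                                  const uint32_t *b, size_t n)
{
    uint32_t carry[LANES] = {0};
    uint16_t out = 0;
    assert(r && a && b);
    for (size_t j = 0; j < n; j++) {
        /* All 16 limbs touched here are adjacent in memory. */
        for (size_t k = 0; k < LANES; k++) {
            size_t idx = interleaved_index(j, k);
            uint64_t s = (uint64_t)a[idx] + b[idx] + carry[k];
            r[idx] = (uint32_t)s;
            carry[k] = (uint32_t)(s >> 32);
        }
    }
    for (size_t k = 0; k < LANES; k++)
        out = (uint16_t)(out | (carry[k] << k));
    return out;
}
```

The same layout would serve the 16 sub-multiplications coming out of Karatsuba/Toom: once the operands are interleaved, every step of a schoolbook multiply runs on all 16 sub-products at the same time.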

As far as I can tell, the LRBni instructions are only going into
Intel's graphics processors, not their CPUs.  CPUs will get AVX
instructions instead, which are also nice (and tricky to make use of),
but not nearly as nice as LRBni.  (AVX has 256-bit registers, not
512-bit; and I think to make full use of AVX it may be necessary to
switch to floating-point limbs.  It's been a while since I looked,
but if I recall correctly, the AVX integer instructions are quite
impoverished compared to the floating-point instructions.)

Exciting times ahead :)

Carl