[mpir-devel] Re: NVIDIA Tesla

Bill Hart Mon, 24 Nov 2008 09:56:35 -0800

That's totally cool!!

On 24/11/2008, Carl Witty <[EMAIL PROTECTED]> wrote:
>
> On Nov 23, 7:43 pm, "Jason Martin" <[EMAIL PROTECTED]>
> wrote:
>> Well, after a bit more reading, I now think that bitwise shifting is
>> available.  I was confused because all the floating point operations
>> are explicitly listed whereas for the integer operations only the
>> "atomic" operations are listed.  But, "atomic" just appears to be a
>> memory-locking operation.  On page 50 of the CUDA programming guide
>> they use a bitwise right shift to demonstrate fast division by 2, so
>> clearly it's supported (and takes 4 clock cycles).
>
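The shift-for-division point above can be sketched in a couple of lines. This is plain C rather than CUDA or PTX, so it can be checked on any host; `half_u32` is a made-up name for illustration, not something from the CUDA guide.

```c
#include <stdint.h>

/* For an unsigned value, shifting right by one bit discards the low bit,
   which gives exactly the same result as integer division by 2. */
static uint32_t half_u32(uint32_t x) {
    return x >> 1;
}
```

This identity only holds unconditionally for unsigned values. For negative signed values, right shift is implementation-defined in C and typically rounds toward negative infinity, whereas `/` truncates toward zero.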
> CUDA hardware can be programmed in PTX assembly language, as well as
> in CUDA's C-like language.
>
> PTX isn't quite a real assembly language; the assembler handles
> register allocation, for instance, and several "instructions" are
> actually synthesized from simpler instructions.  But it's much closer
> to the hardware than CUDA, so it gives a much better idea of what the
> hardware can really do.  You can get the PTX reference manual in the
> CUDA SDK.  (I've also put a copy at
> http://sage.math.washington.edu/home/cwitty/ptx_isa_1.2.pdf
> .)
>
> If you read through the PTX reference manual, you see that the
> processor does have a full complement of integer instructions.
> However, the 32-bit multiply is apparently synthesized from 24-bit
> multiplies, which are also directly available: there are two 24x24->32
> multiply instructions.  One returns the low 32 bits of the 48-bit
> result, the other returns the high 32 bits (so there's a 16-bit
> overlap between these results).
>
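To make the low/high split described above concrete, here is a minimal sketch in plain C (not PTX) that models the two 24x24->32 multiplies with a 64-bit intermediate. The helper names `mul24_lo`, `mul24_hi` and `product48` are hypothetical, chosen for illustration; the code only assumes the behaviour stated above, namely that one result is bits 0..31 of the 48-bit product and the other is bits 16..47.

```c
#include <assert.h>
#include <stdint.h>

#define MASK24 0xFFFFFFu

/* Low 32 bits (bits 0..31) of the 48-bit product of two 24-bit values. */
static uint32_t mul24_lo(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & MASK24) * (uint64_t)(b & MASK24);
    return (uint32_t)p;
}

/* High 32 bits (bits 16..47) of the same 48-bit product. */
static uint32_t mul24_hi(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & MASK24) * (uint64_t)(b & MASK24);
    return (uint32_t)(p >> 16);
}

/* Rebuild the full 48-bit product from the two 32-bit results. */
static uint64_t product48(uint32_t a, uint32_t b) {
    uint32_t lo = mul24_lo(a, b);
    uint32_t hi = mul24_hi(a, b);
    /* Bits 16..31 of the product appear in both results: the 16-bit overlap. */
    assert((lo >> 16) == (hi & 0xFFFFu));
    return ((uint64_t)hi << 16) | (lo & 0xFFFFu);
}
```

For the largest inputs, 0xFFFFFF times 0xFFFFFF, the product is 0xFFFFFE000001, so the low result is 0xFE000001 and the high result is 0xFFFFFE00; both contain the same middle 16 bits, 0xFE00.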
> My guess, then, is that CUDA would work best with 24-bit limbs (packed
> in 32-bit words).  Also, for MPIR, it would probably be better to
> program directly in PTX.
>
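As a rough illustration of why 24-bit limbs in 32-bit words fit this hardware, here is a sketch of a multiply-by-one-limb loop in plain C. It is not MPIR code and not PTX: `mul_1_limb24` and the `mul24_lo` / `mul24_hi` stand-ins are hypothetical names, and the 64-bit intermediate inside the stand-ins is only there to model the two 24x24->32 multiplies on a host CPU. The loop itself uses nothing wider than 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 24
#define LIMB_MASK 0xFFFFFFu

/* Host-side stand-ins for the two 24x24->32 multiplies. */
static uint32_t mul24_lo(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)(a & LIMB_MASK) * (uint64_t)(b & LIMB_MASK));
}
static uint32_t mul24_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)(a & LIMB_MASK) * (uint64_t)(b & LIMB_MASK)) >> 16);
}

/* rp[0..n-1] = ap[0..n-1] * b, with every limb holding 24 bits in a
   32-bit word. Returns the carry-out limb (also below 2^24). */
static uint32_t mul_1_limb24(uint32_t *rp, const uint32_t *ap, size_t n,
                             uint32_t b) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo24 = mul24_lo(ap[i], b) & LIMB_MASK; /* product bits 0..23  */
        uint32_t hi24 = mul24_hi(ap[i], b) >> 8;        /* product bits 24..47 */
        uint32_t sum = lo24 + carry;                    /* below 2^25, no overflow */
        rp[i] = sum & LIMB_MASK;
        carry = hi24 + (sum >> LIMB_BITS);              /* stays below 2^24 */
    }
    return carry;
}
```

The 8 spare bits in each word are what let the partial sum absorb a carry without a separate carry flag. The cost is that a number of a given size needs a third more limbs than with full 32-bit limbs.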
> (And there's also a note that says that on CUDA hardware that supports
> 32-bit multiplies, the 32-bit multiply instruction will be fast and
> the 24-bit multiply instructions may be slow.  I don't know if such
> hardware exists yet, but eventually you might need two CUDA branches,
> one with 24-bit limbs and one with 32-bit limbs.)
>
> Carl