That's totally cool!!

On 24/11/2008, Carl Witty <[EMAIL PROTECTED]> wrote:
>
> On Nov 23, 7:43 pm, "Jason Martin" <[EMAIL PROTECTED]>
> wrote:
>> Well, after a bit more reading, I now think that bitwise shifting is
>> available. I was confused because all the floating point operations
>> are explicitly listed whereas for the integer operations only the
>> "atomic" operations are listed. But "atomic" just appears to be a
>> memory-locking operation. On page 50 of the CUDA programming guide
>> they use a bitwise right shift to demonstrate fast division by 2, so
>> clearly it's supported (and takes 4 clock cycles).
>
> CUDA hardware can be programmed in PTX assembly language, as well as
> in CUDA's C-like language.
>
> PTX isn't quite a real assembly language; the assembler handles
> register allocation, for instance, and several "instructions" are
> actually synthesized from simpler instructions. But it's much closer
> to the hardware than CUDA, so it gives a much better idea of what the
> hardware can really do. You can get the PTX reference manual in the
> CUDA SDKs. (I've also put a copy at
> http://sage.math.washington.edu/home/cwitty/ptx_isa_1.2.pdf
> .)
>
> If you read through the PTX reference manual, you see that the
> processor does have a full complement of integer instructions.
> However, the 32-bit multiply is apparently synthesized from 24-bit
> multiplies, which are also directly available: there are two 24x24->32
> multiply instructions. One returns the low 32 bits of the 48-bit
> result, the other returns the high 32 bits (so there's a 16-bit
> overlap between these results).
>
> My guess, then, is that CUDA would work best with 24-bit limbs (packed
> in 32-bit words). Also, for MPIR, it would probably be better to
> program directly in PTX.
>
> (And there's also a note that says that on CUDA hardware that supports
> 32-bit multiplies, the 32-bit multiply instruction will be fast and
> the 24-bit multiply instructions may be slow. I don't know if such
> hardware exists yet, but eventually you might need two CUDA branches,
> one with 24-bit limbs and one with 32-bit limbs.)
>
> Carl
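A minimal CPU-side sketch in C of the 24-bit-limb idea, under stated assumptions. The helpers `mul24_lo` and `mul24_hi` are hypothetical names that emulate the two 24x24->32 multiplies as described above: the low 32 bits and the high 32 bits of the 48-bit product, overlapping in bits 16..31. Masking the operands to their low 24 bits is an assumption of this sketch. `mul_24bit_limbs` is an illustrative schoolbook multiply, not MPIR code: each 48-bit limb product is split into a low 24-bit half (from the low multiply) and a high 24-bit half (the high multiply shifted right by 8), and the 8 spare bits in each 32-bit word let the running sum absorb the carry without overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 24
#define LIMB_MASK 0xFFFFFFu /* low 24 bits of a 32-bit word */

/* Emulates the "low" 24x24->32 multiply: bits 0..31 of the 48-bit product.
   Hypothetical helper; operands are masked to their low 24 bits (assumption). */
static uint32_t mul24_lo(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & LIMB_MASK) * (b & LIMB_MASK);
    return (uint32_t)p;
}

/* Emulates the "high" 24x24->32 multiply: bits 16..47 of the 48-bit product,
   so it overlaps mul24_lo in bits 16..31. Hypothetical helper. */
static uint32_t mul24_hi(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & LIMB_MASK) * (b & LIMB_MASK);
    return (uint32_t)(p >> 16);
}

/* Illustrative schoolbook multiply of little-endian 24-bit limbs stored in
   32-bit words: r[0..n+m-1] = a[0..n-1] * b[0..m-1]. r must not alias a or b. */
static void mul_24bit_limbs(uint32_t *r, const uint32_t *a, size_t n,
                            const uint32_t *b, size_t m) {
    for (size_t k = 0; k < n + m; k++)
        r[k] = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t carry = 0;
        assert(a[i] <= LIMB_MASK);
        for (size_t j = 0; j < m; j++) {
            assert(b[j] <= LIMB_MASK);
            uint32_t lo = mul24_lo(a[i], b[j]) & LIMB_MASK; /* product bits 0..23 */
            uint32_t hi = mul24_hi(a[i], b[j]) >> 8;        /* product bits 24..47 */
            /* Each term is < 2^24, so t < 3 * 2^24: the 8 spare bits of the
               32-bit word hold the sum without overflow. */
            uint32_t t = r[i + j] + lo + carry;
            r[i + j] = t & LIMB_MASK;
            carry = hi + (t >> LIMB_BITS); /* stays below 2^24 */
        }
        r[i + m] = carry;
    }
}
```

On the GPU itself the two helpers would roughly correspond to the PTX `mul24.lo` and `mul24.hi` instructions; a 32-bit-limb branch would instead need the full 64-bit product of two 32-bit limbs.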
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---