On Apr 30, 5:34 pm, Bill Hart <goodwillh...@googlemail.com> wrote:

> >> * Support for parallel processing (e.g. multithreaded code) and GPU's
>
> > This is the part I am curious about: what are your plans there? In
> > particular, at what level do you place the parallelism? The code is
> > already thread-safe. Do you want to port it so it runs on the GPU, and
> > each GPU core can do a separate operation (the parallelism is still at
> > the level of the application)?
>
> No, that's not the plan at this stage. I doubt it would be very useful.

Many people use GMP as a kind of portable (and sometimes slightly
longer) long long, so I don't think GPU support is completely
unrealistic, although there would be many restrictions (around memory
allocation in particular). For instance, there are RSA implementations
on the GPU, but they use only limited parallelism inside each RSA
operation and rely on executing a batch of many RSA operations in
parallel for performance. I can understand concentrating on very large
numbers for MPIR, though. I see there are FFT implementations
available on the GPU (with floats); that already looks more relevant
to your goals.

> > For multi-threading, there are some easy possibilities
> > in mpq (not so much in mpz, except for things like the Miller-Rabin
> > test) and inside mpn, for instance in the fft and toom codes (a couple
> > openmp pragmas and a second thread are enough to reduce the runtime of
> > a large multiplication by 25%), assuming applications are unable to
> > get enough parallelism on their own.
>
> That's pretty much what we have in mind.

Is there a branch somewhere where people are working on it? Have you
decided on a framework (not sure that's the right term), OpenMP for
instance?

> I suspect that (as you indicate), nails would have to be modified to
> make it useful. I personally haven't thought about its implications
> for parallel processing.
> If you have some ideas, it would be interesting to hear them.

Oh, nothing special. Suppose you are in a model of computation where
you can do as many operations in parallel as you want, as long as
there is no dependency or interference between them (not really a good
approximation of a GPU, or of anything else, though). Without nails,
an operation like sub(2^n, 1) or add(2^n - 1, 1) must still take
linear time to propagate the carry. However, if you allow some
non-determinism in the representation (using the nail bits), you can
make add_n a constant-time operation. For instance:

1: add limb by limb, letting carries accumulate in the nail bits;
2: canonicalize the even limbs, adding each carry to the next limb;
3: do the same for the odd limbs.

In the end some even limbs are not canonical, but that's ok. The same
goes for sub_n (which shows that the required nail representation is
not as simple as add_n alone suggests), mul_1, etc. It might be better
to group limbs into blocks of 10 limbs and keep nails for each block
instead of each limb, though. Now, that may be completely irrelevant
for any realistic implementation... It places the parallelism at the
lowest level, which is why I was mentioning vector architectures, but
I don't think any vector architecture provides the right instructions
to make it useful. And I don't think you are trying to create bignum
hardware and minimize the instruction depth either :-)

--
Marc Glisse
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
