On Apr 30, 5:34 pm, Bill Hart <goodwillh...@googlemail.com> wrote:
> >> * Support for parallel processing (e.g. multithreaded code) and GPUs
> >
> > This is the part I am curious about: what are your plans there? In
> > particular, at what level do you place the parallelism? The code is
> > already thread-safe. Do you want to port it so it runs on the GPU, and
> > each GPU core can do a separate operation (the parallelism is still at
> > the level of the application)?
>
> No, that's not the plan at this stage. I doubt it would be very useful.
Many people use GMP as a kind of portable (and sometimes slightly longer)
long long, so I don't think it is completely unrealistic, although there
would be many restrictions (around allocation, in particular). For
instance, there are RSA implementations on the GPU, but they use only
limited parallelism inside each RSA operation and rely on executing a
batch of many RSA operations in parallel for performance. I can
understand concentrating on very large numbers for MPIR, though. There
are FFT implementations available on the GPU (using floats), which
already looks more relevant to your goals.

> > For multi-threading, there are some easy possibilities
> > in mpq (not so much in mpz, except for things like the Miller-Rabin
> > test) and inside mpn, for instance in the fft and toom codes (a couple
> > openmp pragmas and a second thread are enough to reduce the runtime of
> > a large multiplication by 25%), assuming applications are unable to
> > get enough parallelism on their own.
>
> That's pretty much what we have in mind.

Is there a branch somewhere with people working on it? Have you decided
on a framework (not sure that's the right term), like OpenMP for
instance?

> I suspect that (as you indicate), nails would have to be modified to
> make it useful. I personally haven't thought about its implications
> for parallel processing.
> If you have some ideas, it would be interesting to hear them.

Oh, nothing special. Suppose you are in a model of computation where you
can do as many operations in parallel as you want, as long as there is
no dependency or interference between them (not really a good
approximation of a GPU or anything else, though). Without nails, an
operation like sub(2^n, 1) or add(2^n - 1, 1) must still take linear
time, to propagate the carry.
However, if you allow some non-determinism (including nail bits), you
can make add_n a constant-time operation. For instance: 1) add limb by
limb; 2) canonicalize the even limbs, adding each carry to the next
limb; 3) do the same for the odd limbs. In the end some even limbs are
not canonical, but that's ok. Similarly for sub_n (which shows that the
nail representation required is not as simple as it looked from add_n),
mul_1, etc. Although it might be better to group limbs into blocks of
10 limbs and keep nails per block instead of per limb.

Now that may be completely irrelevant for any realistic
implementation... It places the parallelism at the lowest level, which
is why I was mentioning vector architectures, but I don't think any
vector architecture provides the right instructions to make it useful.
And I don't think you are trying to create some bignum hardware and
minimize the instruction depth either :-)

--
Marc Glisse

You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en