Emmanuel Thomé <emmanuel.th...@gmail.com> writes:

> Hi,
>
> On Mon, Mar 11, 2013 at 9:53 PM, Torbjorn Granlund <t...@gmplib.org> wrote:
> > I have 'mpz_vec_t', 'mpf_vec_t' in mind, which have some number of mpz_t
> > elements, each probably (padded to) the same size counted in limbs.
> > Then mpz_vec_add(a,b,c), etc. would operate on such vectors a, b, c, each
> > having the same number of elements...
> >
> > I don't think this will give good performance. Only if one builds
> > sequences of expression trees, hangs vectors on the leaves, then
> > executes these, could one expect to come close to the GPU's peak
> > performance.
>
> Then how do you arrive at the estimate that ``2x speedup is about the limit''?
> It's highly application-dependent.

Well, it is memory-bandwidth dependent if you load and store the operands
on the GPU for each mpz_vec_foo operation. The application shouldn't matter
as long as you have long enough vectors.
I looked in great detail at CUDA and Nvidia hardware. It would take too long
to lay out my reasoning here. I haven't looked at AMD/ATI hardware; perhaps it
is more suitable for what one would want to do.

--
Torbjörn

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel