Hi Thanassis,

On 2022-06-13 23:17, Thanassis Tsiodras wrote:
> I had a quick look at the x86_64 assembly implementations of the basic
> primitive used in multiplications (mpn_mul_1), and saw this:
>
> ...I could not find any use of AVX-integer-related multiplication
> instructions.
> I am talking about things like " _mm512_mul_epu32", which at first glance
> seemed promising (8x32bit multiplications in one instruction generating
> 8x64-bit results in one go).

Four 32x32->64 multiplications do the same multiplication work as one 64x64->128, so eight of them are worth two 64x64->128 products. But are "8x32bit multiplications in one instruction" faster than two 64x64 muls? As you confirm, recombining the partial products also requires many additions with carry propagation.

So the question is, does using AVX reduce the resources needed for a multiplication?

> I can't see a way to do that optimally. Is that the reason GMP asm code
> seems to prefer the simple 64x64 => 128 instructions?  (mul %rcx)

If you find an AVX implementation that is more efficient than our current one, you are welcome to contribute it to the project :-)

Bye,
m
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel