Hi, GMP wizards. I had a quick look at the x86_64 assembly implementations of the basic primitive used in multiplications (mpn_mul_1), and saw this:

$ grep mul $(find . -type f | grep asm$ | grep x86_64 | grep /mul_1) | \
    grep -v -P '\tmul\tv' | \
    grep -v mul.uses | \
    grep -v mpn_mul_ | \
    grep -v mulx
./x86_64/silvermont/mul_1.asm:include_mpn(`x86_64/bd1/mul_1.asm')
./x86_64/pentium4/mul_1.asm:include_mpn(`x86_64/bd1/mul_1.asm')
./x86_64/goldmont/mul_1.asm:include_mpn(`x86_64/coreisbr/mul_1.asm')

Basically, after grep-ing out:

- instances of "mul v0", "mul v1", etc...
- comments mentioning that "mul" clobbers rdx...
- labels of mpn_mul_1...
- ...and uses of "mulx"...

...I could not find any use of AVX-integer-related multiplication instructions.

I am talking about things like "_mm512_mul_epu32", which at first glance seemed promising (eight 32x32-bit multiplications in one instruction, generating eight 64-bit results in one go).

Then again, the 64-bit outputs of the eight 32x32 multiplications would have to be added "horizontally" (add/adc), each shifted by 32 bits relative to its neighbour... I can't see a way to do that optimally.

Is that the reason the GMP asm code seems to prefer the simple 64x64 => 128 instructions? (mul %rcx)

Asking as a curious x86-64 guy,
Thanassis.

P.S. Most of the asm files indicate in a comment that: "The loop... code is the result of running a code generation and optimization tool suite written by David Harvey and Torbjorn Granlund". Did this tool check AVX (and in general, SIMD) instructions as well?

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
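
To make the "horizontal" recombination cost in the question concrete, here is a minimal sketch in plain C. It is not taken from GMP: the names ref_mul_1 and lanes_mul_1 are hypothetical, and the second function only emulates in scalar code the kind of 32x32 => 64 partial products an instruction like _mm512_mul_epu32 would hand back, without using any SIMD intrinsics. It assumes a compiler that provides unsigned __int128 (GCC or Clang).

```c
#include <stdint.h>
#include <stddef.h>

/* Reference loop: one 64x64 => 128 multiply per limb, as the scalar "mul"
   instruction provides. Returns the carry-out limb.
   (Hypothetical helper for illustration, not GMP's mpn_mul_1.) */
uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 p = (unsigned __int128)up[i] * v + carry;
        rp[i] = (uint64_t)p;
        carry = (uint64_t)(p >> 64);
    }
    return carry;
}

/* Same result built only from 32x32 => 64 multiplies. Each 64x64 limb
   product needs four partial products plus the shifted additions below. */
uint64_t lanes_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t v0 = (uint32_t)v, v1 = v >> 32;
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t u0 = (uint32_t)up[i], u1 = up[i] >> 32;

        uint64_t p00 = u0 * v0;   /* weight 2^0  */
        uint64_t p01 = u0 * v1;   /* weight 2^32 */
        uint64_t p10 = u1 * v0;   /* weight 2^32 */
        uint64_t p11 = u1 * v1;   /* weight 2^64 */

        /* Column at weight 2^32: three 32-bit values, so mid < 3 * 2^32. */
        uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

        /* Low limb: low half of p00, plus the low half of mid shifted up. */
        uint64_t lo = (mid << 32) | (uint32_t)p00;

        /* High limb: everything at weight 2^64 and above. */
        uint64_t hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);

        /* Fold in the carry limb from the previous iteration. */
        lo += carry;
        hi += (lo < carry);

        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

Both functions should produce identical limbs and carry-out for any input. The second one spends four multiplies and a chain of dependent shifts and additions per limb where the first spends a single multiply and an add-with-carry, which is the overhead the question is asking about.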