Re: How to calculate cycles/limb in assembly routines

Albin Ahlbäck Thu, 04 Apr 2024 16:51:45 -0700

Thanks for the fast and helpful reply!

I see, I definitely need to read up on the CPU pipelines. I also tested one of your automated scripts for measuring cycles per limb for a variety of functions, and it checks out.

Anyway, regarding the performance of multiplication: I did manage to write some half-hardcoded routines that outperform mpn_mul_basecase quite a bit on Apple M1 (only tested on the Mac Mini on cfarm). They are basically of the form


        mpn_mul_N(mp_ptr, mp_srcptr, mp_size_t, mp_srcptr)

for N in 1, 2, ..., 15. I recall that this translated very well into the Toom-Cook territories (when using this, the cutoff between Toom22 using these underlying algorithms and GMP's Toom33 is at ~480 limbs, pretty impressive!(?)). For instance, with N = 8 it is 80% faster asymptotically than mpn_mul_basecase on M1. They do, however, span a lot of code as each case has to be handcoded, so I suppose they would not fit into GMP.
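
To illustrate the shape of such a routine, here is a minimal portable C sketch for the N = 2 case. It is not the actual M1 assembly, just an assumption about what "half-hardcoded" could look like: the two limbs of the second operand are loaded once and the loop only runs over the first operand. The names sketch_mul_2, limb_t and dlimb_t are hypothetical stand-ins (limb_t plays the role of mp_limb_t), and unsigned __int128 is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* stand-in for GMP's mp_limb_t (assumed 64-bit) */
typedef unsigned __int128 dlimb_t;  /* double-limb type; GCC/Clang extension */

/* Hypothetical N = 2 case: rp[0 .. an+1] = ap[0 .. an-1] * bp[0 .. 1].
   rp must have room for an + 2 limbs and must not overlap the inputs. */
void sketch_mul_2(limb_t *rp, const limb_t *ap, size_t an, const limb_t *bp)
{
    /* The fixed-size operand is hardcoded into two locals. */
    const limb_t b0 = bp[0], b1 = bp[1];

    /* Two-limb carry: c0 belongs to position i, c1 to position i + 1. */
    limb_t c0 = 0, c1 = 0;

    for (size_t i = 0; i < an; i++) {
        /* Everything landing at position i; cannot overflow 128 bits. */
        dlimb_t p0 = (dlimb_t)ap[i] * b0 + c0;
        /* Everything landing at position i + 1, including the high half of p0. */
        dlimb_t p1 = (dlimb_t)ap[i] * b1 + c1 + (limb_t)(p0 >> 64);

        rp[i] = (limb_t)p0;
        c0 = (limb_t)p1;
        c1 = (limb_t)(p1 >> 64);
    }

    /* Flush the remaining two carry limbs. */
    rp[an] = c0;
    rp[an + 1] = c1;
}
```

A real version for larger N would keep N carry limbs in registers and unroll the inner products by hand, which is presumably where the code size comes from.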


Anyway, thanks for your reply!

Best,
Albin
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
