Hello,

I am looking at Torbjörn's `aorsmul_1.asm' for Apple M1, and I am having trouble understanding how the cycles per limb number was calculated.

As I understand it, the cycles per limb number describes the loop(s) of a routine. Looking at the main loop, it seems each iteration should take 10 cycles (of which 2 are lost to latency from loading x4, I believe), and each iteration processes four limbs from `up'. However, the given number is 1.25, which is half of my calculated 10 / 4 = 2.5.
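
To spell out the arithmetic, here is a minimal sketch. The helper name `cycles_per_limb` is only illustrative, and the 10-cycle figure is my own estimate from reading the loop, not a measurement.

```python
def cycles_per_limb(cycles_per_iteration, limbs_per_iteration):
    """Cycles per limb for a loop that handles several limbs per pass."""
    return cycles_per_iteration / limbs_per_iteration


# Assumed inputs: 10 cycles per iteration (estimate from reading the loop)
# and 4 limbs from `up' handled per iteration.
estimated = cycles_per_limb(10, 4)

# Figure stated in the file.
documented = 1.25

print(f"estimated:  {estimated} c/l")
print(f"documented: {documented} c/l")
print(f"ratio:      {estimated / documented}")
```

The estimate comes out at exactly twice the documented figure, which is the gap I am asking about.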

Do you count the limbs from both `rp' and `up' to obtain this number, or are my calculations wrong, either through a miscalculation or because I am overlooking some clever trick that the CPU employs?

Best,
Albin
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel