I tested newer Intel systems too (Haswell, Skylake) and they all need around 25 cycles for a division n/d = 1.
Intel Goldmont Plus (a current low-end CPU) is better; it needs about 12 cycles. AMD CPUs from the last 10 years all perform OK.

It is funny that x86 vendors give division so little thought. ARM clearly got it right. I mean, doing SRT for just the non-zero part of the quotient cannot be very hard! (ARM processors before A77 have very poor multiplication, though.)

Cycles for a division with n/d = 1:

AMD    bd1    22
AMD    bd2    15
AMD    bd4    15
AMD    zn1    14
AMD    zn2    14
AMD    bt2    13
Intel  hwl    25
Intel  sky    25
Intel  slm    30
Intel  glm    13
Intel  glm+   12

-- 
Torbjörn
Please encrypt, key id 0xC8601622