Hello Tom,

Which architecture has single cycle division? I think it's way above
that, based on profiles I've seen. And Agner seems to back me up:
https://www.agner.org/optimize/instruction_tables.pdf
That lists a 32/64 idiv with a latency of ~26/~42-95 cycles, even on a
modern uarch like Skylake-X.

Huh.  I figured Intel would have thrown sufficient transistors at that
problem by now.

It is not just a matter of transistor count: division is intrinsically iterative. The various division algorithms each use their own kind of iteration, involving some level of guessing and other arithmetic, and each step depends on the result of the previous one, so the latency can only be bad. Implementing that in 1 cycle at 3 GHz looks pretty remote.
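
To make the dependency chain concrete, here is a minimal sketch of textbook restoring (shift-and-subtract) division in C. It is only an illustration: I am not claiming any particular CPU works this way (hardware dividers typically use higher-radix schemes that produce several quotient bits per step), and the name restoring_divide is made up for this example. The point is that each iteration needs the remainder left by the previous one, so the steps cannot simply be run in parallel.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: unsigned 32-bit restoring
 * division, producing one quotient bit per iteration.
 * The caller must ensure divisor != 0.
 */
uint32_t
restoring_divide(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint64_t    rem = 0;    /* 64 bits so the left shift cannot overflow */
    uint32_t    quot = 0;

    for (int bit = 31; bit >= 0; bit--)
    {
        /* Bring down the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);

        /*
         * "Guess" that this quotient bit is 1 by comparing against the
         * divisor; the comparison needs the remainder from the previous
         * iteration, which is what serializes the loop.
         */
        if (rem >= divisor)
        {
            rem -= divisor;
            quot |= (uint32_t) 1 << bit;
        }
    }

    *remainder = (uint32_t) rem;
    return quot;
}
```

With 32 dependent compare-and-subtract steps for a 32-bit operand, even a divider that retires a few bits per cycle ends up far from single-cycle latency.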

--
Fabien.

