t...@gmplib.org (Torbjörn Granlund) writes:

ni...@lysator.liu.se (Niels Möller) writes:

> In that case, not so surprising that the div1 function loses. Do other architectures also have decent performance for small-quotient division?

> I don't have the full picture, I'm afraid.

> I know several ARM cores have great division performance for small quotients. For x86 I know of cores with horrible performance and ones (like Haswell and later) with half decent performance. I assume newer AMD cores got this right.

I ran tests on shell (Intel Ivy Bridge, from around 2012) and ashell (AMD Ryzen 2700X from 2018) with this simple program:

  /* CLOCK is not defined here; it has to be supplied at build time
     (e.g. with -DCLOCK=...). */
  unsigned long qs[1000];

  int main ()
  {
    unsigned long r, i;
    for (r = 0; r < CLOCK/1000; r++)
      {
        for (i = 0; i < 1000; i++)
          {
            /* Divisors 1000..1999, so every quotient is 1 or 2. */
            qs[i] = 2000 / (i + 1000);
          }
      }
    return 0;
  }

The Intel system reports ~23 cycles per division, the AMD system reports ~13. ARM systems impress more: a73 gives 5 cycles/division, a72 gives 6. Even a low-end a53 gives 5. (The many ARM systems are always on; they're hiding behind ashell.)

So I think plain / is the way to go for certain systems!

-- 
Torbjörn
Please encrypt, key id 0xC8601622