Two configs hesitantly chose HGCD2_DIV1_METHOD = 2. For k10/64, method 2 outperforms method 3 by 0.34%. For ARM Cortex-A8, method 2's advantage is 1.94%. Wow.
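A rough sketch of how to read such percentages, assuming (this is not confirmed here) that the measured time is a shared hgcd2 overhead plus the time spent in div1. Under that assumption, an advantage r seen in the total timing corresponds to an advantage r/f within div1 alone, where f is the fraction of the slower method's total time spent in div1. The function name and the fraction f are hypothetical; nobody has measured f in this thread.

```c
#include <assert.h>

/* Model (an assumption, not a measured fact):
     T_slow = H + D_slow,  T_fast = H + D_fast
   where H is time spent outside div1 and D_* is time spent inside div1.

   total_advantage = (T_slow - T_fast) / T_slow, as a fraction
                     (e.g. 0.0034 for 0.34%).
   div1_fraction   = D_slow / T_slow, a hypothetical quantity in (0, 1].

   Returns (D_slow - D_fast) / D_slow, the advantage within div1 alone.  */
static double
div1_only_advantage (double total_advantage, double div1_fraction)
{
  assert (div1_fraction > 0.0 && div1_fraction <= 1.0);
  return total_advantage / div1_fraction;
}
```

For example, if div1 accounted for half of the measured time, the 0.34% figure would correspond to 0.68% within div1 itself, and the 1.94% figure to 3.88%. If div1 accounted for all of the measured time, the figures would be unchanged.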
Looking at when method 1 or 3 is faster than method 2 is more interesting. Method 1, and to some extent also method 3, would benefit from asm code, so unless they are beaten by some margin, they might be the soundest algorithms for more configs. When method 2 is beaten, it is always by method 3, and then always by a low single-digit percentage.

Questions for Niels: Would your present tuneup/speed setup allow measuring asm code?

The current div1 measurements include hgcd2's own time, right? I.e., if we found a div1 which runs in zero cycles, the timings would not be zero.

-- 
Torbjörn
Please encrypt, key id 0xC8601622