t...@gmplib.org (Torbjörn Granlund) writes: I went ahead and pushed a hgcd2.c with _METHOD = 4 and _METHOD 5. The former tables quotients and neeeds one multiply per div1 call and the latter tables inverses and needs 2 multiplies per div1 call.
I pushed improved versions. Instead of detecting potential problematic operands ahead of time, both methods now try its division-free algorithm, carefully avoiding overflow. If the remainder is not in range, they fall back to division. The quotient table method isn't pretty. It can surely be improved. It uses a very large table and gets poor throughput. It also has only about 80% branch hit rate, which is not good enough. Now 90%. But the table is still the same huge size. The inverse table method is more mature, I believe. It has much smaller tables and with the default table size (256 2-byte entries) it gets a branch hit rate of 97%. Doubling the table results in 98.5% rate. Now 99.74% for the smaller table. The larger table is no longer useful and is gone. The default is now a 64-byte table with 99.5% division avoidance. It is easier to see that the new variants are correct; if the few arithmetic operations do not overflow (which I claim to be true) and if the remainder is in range [0,d-1] then the result will be correct. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel