> Sounds like the opposite of binary euclid. Sounds more interesting than binary euclid, to me. Because looking at the lowest bits and just detecting the bit size seems easier than extracting the highest bits each loop.
I suppose I don't fully understand how to make a table-based binary euclid with good progress. A binary Euclid approaches 3 bits/iteration, which is what we get without the latest scaling trick for the binary table gcd. I'd love to see a fast binary euclid. I have now added statistics collection to my latest code. It turns out that for equal size operands, the addmul_1 case is executed some 95% of the time. The scaling trick only applies to the submul_1 cases. The result is 2 bits/iteration (with 256-byte tables). :-( In order to trigger the submul arm, I tested with operands of quite different sizes. Now I get close to 6 bits per iteration, which of course is awesome. We need to think more abour this. With my present algorithm I seem to get almost 3 times better progress from gcd(2^64*U+1,V) than from gcd(U,V) when U and V are close in size. The former triggers the scaling submul_1 code while the latter gets stuck in the addmul_1 code. For similar-size operands, the scaling submul_1 code gets used just when ether operand by chance happen the geta large size reduction. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel