t...@gmplib.org (Torbjörn Granlund) writes:

> Using the ARM "subs rd,rm,imm12" instruction, we compute
>
>   {cout, rd} = rm + ~imm + 1
>
> while with the "adds rd,rm,imm12" instruction, we compute
>
>   {cout, rd} = rm + imm
>
> which is quite different. The former will for example always set
> cout when rm = imm = 0 as in Vincent's example. The latter will never
> set carry when imm = 0 or rm = 0;
Right, it's a bit subtle. The case we're trying to handle specially is
{ah, al} - {bh, bl} with bl = B - x, x small.

I would expect that the existing code could be fixed if we exclude bl = 0
(since we'd then get x = B, which qualifies as "x small" only modulo B,
but not as a plain mathematical integer):

  if (__builtin_constant_p (bl) && bl != 0 && -(UDItype)(bl) < 0x1000)

Then, if bl = B - x, we get (modulo B^2):

  {ah, al} - {bh, bl} = (ah - bh) B + al + x - B
                      = (ah + ~bh + 1) B + al + x - B
                      = (ah + ~bh) B + al + x

which should be computed correctly by the sequence adds, sbc, using the
carry out from al + x (sbc computes ah + ~bh + carry). Do you agree?

The excluded case, sub_ddmmss(ah, al, bh, /*compile time constant*/0),
could clearly be optimized in a different way, but I'd guess it's rare
enough in real code to not be worth the effort?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs