A carry bit helps for some codes, GMP being a prime example. Keeping carry/borrow conditions in plain registers can be made to work well too. But then you need good ways of computing carry/borrow, and good ways of inputting the carry/borrow result to dependent add/subtract instructions.
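
As a minimal sketch of what keeping carry/borrow in plain registers looks like, here is multi-word add and subtract in portable C. The function names add_n_sketch and sub_n_sketch are hypothetical and this is not GMP's mpn code; 64-bit limbs are an assumption. Each unsigned comparison stands in for what would be a compare instruction such as sltu.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's mpn_add_n): rp[] = ap[] + bp[],
   returns the carry-out (0 or 1).  Assumes 64-bit limbs. */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *ap,
                             const uint64_t *bp, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = ap[i] + bp[i];  /* add the two limbs */
        uint64_t c1 = s < ap[i];      /* carry-out: needs the sum first */
        uint64_t r  = s + cy;         /* consume the carry-in with a second add */
        uint64_t c2 = r < s;          /* carry-out of that second add */
        rp[i] = r;
        cy = c1 | c2;                 /* at most one of c1, c2 can be set */
    }
    return cy;
}

/* Hypothetical helper (not GMP's mpn_sub_n): rp[] = ap[] - bp[],
   returns the borrow-out (0 or 1).  Assumes 64-bit limbs. */
static uint64_t sub_n_sketch(uint64_t *rp, const uint64_t *ap,
                             const uint64_t *bp, size_t n)
{
    uint64_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t b1 = ap[i] < bp[i];  /* borrow-out: independent of the subtract */
        uint64_t d  = ap[i] - bp[i];  /* subtract the two limbs */
        uint64_t b2 = d < bw;         /* borrow from subtracting the borrow-in */
        rp[i] = d - bw;
        bw = b1 | b2;                 /* at most one of b1, b2 can be set */
    }
    return bw;
}
```

In the subtract loop both comparisons read only the inputs of their subtractions, so the chain from one borrow to the next is a compare followed by an or. In the add loop the second comparison has to wait for the sum that consumed the carry, so the chain is add, compare, or.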

RISC-V has OK ways of computing borrow but not carry. RISC-V lacks good ways of inputting carry/borrow to dependent add/subtract instructions.

For the subtraction c = a - b we could compare a and b independently of the subtraction using sltu. The sltu instruction is of course a subtraction which puts the borrow-out in its result register.

But for the addition c = a + b we don't have anything which computes the carry-out; a "compute carry-out from add" instruction would be needed. Instead we need to first perform the add, then use sltu on the result, thus creating a very unfortunate dependency. Instruction dependencies are a major performance killer for super-scalar processors.

We also don't have any way of efficiently consuming a computed carry/borrow result. 3-input add/subtract would have solved that (together with 3-input sltu and a 3-input "compute carry from add").

This all means that on RISC-V, multi-word subtraction could be made to run at 2 cycles/word while multi-word addition is limited to 3 cycles/word, in both cases assuming a very wide super-scalar core. Remember that other contemporary CPUs do these in 1+epsilon cycles/word, and that without needing to do wide super-scalar dispatch.

I use multi-word add/subtract here as an example of the inefficiencies of RISC-V. But the weak instruction set of RISC-V shows in any integer-heavy application, as others have pointed out before me.

-- 
Torbjörn
Please encrypt, key id 0xC8601622