ni...@lysator.liu.se (Niels Möller) writes:

> The idea was that u0, u1 is the loop-invariant operand, and the above is
> for one iteration processing only a single limb from v.

Ehum. Perhaps we should change to that convention, but until we've done
that, sticking to the current one will improve my understanding...

A sum of 32-bit values can be accumulated into a 64-bit register. But if
we want to accumulate 64-bit values, i.e., limb products, it gets tricky.
It cannot be done, except with lots of contortions. One can add 32-bit
things to a 64-bit product without problems; at least one may add two such
things, since, with B = 2^32, (2^32-1)^2 + (2^32-1) + (2^32-1) = B^2 - 1
just fits a two-word accumulator.

> having a non-zero operand in the high part wouldn't work unless we use
> nails, since else it would overflow.

Maybe it's a poor way to think about addmul_2 to collect the two products
involving a single v limb. I'm not really familiar with how current
assembly loops are organized (if I ever looked into it, I'm afraid I've
forgotten...).

There are lots of variations...

> Neat with just umaal and ld/st...

Definitely neat. I had a quick look, but I'll need a bit more time to
digest it.

Note that there are two *parallel* recurrence paths, one over cya and one
over cyb. Pairwise adjacent umaal have a dependency, but that's of the
benign, non-recurrent type.

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
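
For illustration only, here is a rough C sketch of the two carry chains mentioned above. It is not the assembly loop under discussion. The helper umaal() merely emulates the ARM instruction with 64-bit arithmetic, and addmul_2_sketch(), its operand layout, and the requirement that rp has room for n+2 limbs are assumptions made for this example, not GMP's actual mpn_addmul_2 interface. Only the names cya and cyb are taken from the discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates ARM umaal: (hi:lo) = a * b + lo + hi.
   With 32-bit operands the sum is at most B^2 - 1 (B = 2^32),
   so it cannot overflow the 64-bit temporary. */
static inline void umaal(uint32_t *lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Hypothetical addmul_2-style loop (assumed interface, not GMP's):
   {rp, n+2} = {rp, n} + {up, n} * {vp, 2}.
   rp must have room for n+2 limbs; rp[n] and rp[n+1] are overwritten. */
static void addmul_2_sketch(uint32_t *rp, const uint32_t *up, size_t n,
                            const uint32_t *vp)
{
    uint32_t v0 = vp[0], v1 = vp[1];
    uint32_t cya = 0;     /* carry chain for the u[i] * v0 products */
    uint32_t cyb = 0;     /* carry chain for the u[i-1] * v1 products */
    uint32_t uprev = 0;   /* u[i-1]; zero before the first limb */

    for (size_t i = 0; i < n; i++) {
        uint32_t r = rp[i];
        umaal(&r, &cya, up[i], v0);   /* recurrence over cya only */
        umaal(&r, &cyb, uprev, v1);   /* recurrence over cyb only; uses r
                                         from the line above, which is not
                                         a loop-carried dependency */
        rp[i] = r;
        uprev = up[i];
    }

    /* Final position: the last v1 product plus both outstanding carries. */
    uint32_t r = cya;
    umaal(&r, &cyb, uprev, v1);
    rp[n] = r;
    rp[n + 1] = cyb;
}
```

With n = 2 and every input limb equal to 2^32-1, this computes (B^2-1) + (B^2-1)^2 = B^4 - B^2, i.e. the limbs 0, 0, 2^32-1, 2^32-1 from least to most significant, and every umaal step stays within the B^2 - 1 bound.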