Hans Petter Selasky <h...@selasky.org> writes:

  If GMP could utilize multiple cores when doing bignum multiplication
  and addition, I think the picture would look different.
  For example, for addition, you could split the number into two parts,
  and then speculate whether there is an addition for the higher part
  or not.

And if the guess is wrong, then what?

It is well known that, in a model which ignores caches and memory
bandwidth, one can get 2n/k + log(k) word operation steps for n-word
addition on k execution agents. Agent i computes the sum of block i
with both carry-in = 1 and carry-in = 0 and saves both results. The
log(k) term is for serially choosing the proper result for each block,
depending on whether a carry-in actually happened for that block.

On a cached system, I would expect this algorithm to just slow things
down.

  I thought that RISC-V would produce cheaper and more cores, and that
  single core performance was not that critical.

Slow cores are useful in some applications, sure.

  Talking about x86, don't forget that there is microcode below each
  instruction.

This is a false statement. Even if it were true, how is that relevant
for this discussion? The relevant instructions run in one cycle.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
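
For readers who want to see the block-wise addition scheme from the
reply spelled out, here is a minimal sketch in Python. It is a
sequential simulation, not GMP code: the names WORD_BITS, add_block and
carry_select_add are hypothetical, 64-bit words are an assumption, and
numbers are little-endian lists of words. The selection phase is a
plain loop over the k blocks, so it illustrates the idea rather than
the log(k) bound, and it says nothing about cache or memory bandwidth
effects.

```python
# Sketch only: sequential simulation of block-wise (carry-select) addition.
# All names here are hypothetical; none of this is GMP API.

WORD_BITS = 64  # assumed word size
WORD_MASK = (1 << WORD_BITS) - 1


def add_block(a, b, carry_in):
    """Add two equal-length little-endian word lists.

    Returns (sum_words, carry_out).
    """
    out = []
    carry = carry_in
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & WORD_MASK)
        carry = s >> WORD_BITS
    return out, carry


def carry_select_add(a, b, k):
    """Add n-word numbers a and b using at most k blocks.

    Returns (sum_words, carry_out).
    """
    n = len(a)
    assert len(b) == n and k >= 1
    if n == 0:
        return [], 0
    size = -(-n // k)  # ceil(n / k) words per block
    bounds = [(lo, min(lo + size, n)) for lo in range(0, n, size)]

    # Phase 1: the blocks are independent, so k agents could do this in
    # parallel. Each agent adds its block twice, once per possible
    # carry-in, which is about 2n/k word operations per agent.
    candidates = []
    for lo, hi in bounds:
        with_0 = add_block(a[lo:hi], b[lo:hi], 0)
        with_1 = add_block(a[lo:hi], b[lo:hi], 1)
        candidates.append((with_0, with_1))

    # Phase 2: pick the right candidate for each block. This loop takes
    # one step per block; reaching log(k) steps would need a parallel
    # prefix over the per-block carries, which this sketch does not do.
    result = []
    carry = 0
    for with_0, with_1 in candidates:
        words, carry = with_1 if carry else with_0
        result.extend(words)
    return result, carry
```

For example, carry_select_add([WORD_MASK, WORD_MASK, 0, 5], [1, 0, 0, 7], 2)
splits the operands into two 2-word blocks. The low block produces a
carry-out, so the saved carry-in = 1 result is selected for the high
block. Note that half of the phase 1 work is always discarded, which is
the price of the speculation.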