ni...@lysator.liu.se (Niels Möller) writes:

> 2. cnd_add_n should be at least as fast as addmul_1, shouldn't it? It
> appears to be 0.25 c/l faster for larger operands, so maybe it's "only"
> a question of optimizing loop setup and feedin?

For large operands, it's strictly between add_n and addmul_1, which I
guess is as expected.

For small sizes, I had a look at the loop setup for add_n, which checks
bits 0 and 1 of n separately. If that's faster, maybe one could borrow
that logic.

I also wonder if there are any other tricks to speed up cnd_add_n. As
far as I understand, shift operations on ARM don't truncate shift counts
to 5 bits (0-31), so one could perhaps replace

	bic	b, b, cnd		C zero for true, all ones for false
	adcs	r, a, b

with

	adcs	r, a, b, lsl cnd	C zero for true, 32 for false

(if we believe that timing and internal dependencies are independent of
the shift count). I played a little with that, but I get no speed
improvement so far.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
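
For reference, the two instruction sequences discussed for cnd_add_n can be modelled in portable C. This is only a sketch of the arithmetic, not GMP's implementation: the function names cnd_add_n_mask and cnd_add_n_shift are hypothetical, limbs are assumed to be 32 bits, and the condition encodings (0 or all ones for the mask, 0 or 32 for the shift count) follow the comments in the assembly fragments. A left shift of a 32-bit value by 32 is undefined in C, so the shift variant widens to 64 bits to mimic a register-specified LSL by 32 producing zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Models "bic b, b, cnd; adcs r, a, b".
   mask == 0 means "add b", mask == all ones means "add zero". */
static uint32_t
cnd_add_n_mask (uint32_t *rp, const uint32_t *ap, const uint32_t *bp,
                size_t n, uint32_t mask)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t b = bp[i] & ~mask;             /* bic: clear bits set in mask */
      uint64_t s = (uint64_t) ap[i] + b + carry;
      rp[i] = (uint32_t) s;
      carry = (uint32_t) (s >> 32);
    }
  return carry;
}

/* Models "adcs r, a, b, lsl cnd".
   shift == 0 means "add b", shift == 32 means "add zero". */
static uint32_t
cnd_add_n_shift (uint32_t *rp, const uint32_t *ap, const uint32_t *bp,
                 size_t n, unsigned shift)
{
  uint32_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Widen first: the low 32 bits are b for shift 0 and 0 for shift 32. */
      uint32_t b = (uint32_t) ((uint64_t) bp[i] << shift);
      uint64_t s = (uint64_t) ap[i] + b + carry;
      rp[i] = (uint32_t) s;
      carry = (uint32_t) (s >> 32);
    }
  return carry;
}
```

Both functions should return the same limbs and carry for matching conditions; the model says nothing about timing, which is the open question for the assembly version.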