I spent most of Friday reading the ARM instruction reference (primarily motivated by a different project). It seems the current GMP loops are based on umaal, which appears to be tailor-made for addmul_1.
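To make the umaal remark concrete, here is a rough portable C model, assuming 32-bit limbs. umaal computes a 32x32 product plus two 32-bit addends and delivers the 64-bit result in two registers; that sum can never overflow, since (2^32-1)^2 + 2*(2^32-1) = 2^64-1. The names umaal_model and addmul_1_sketch are made up for illustration; the real GMP loops are hand-written assembly and will look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of UMAAL: returns a*b + lo + hi as a 64-bit value.
   With 32-bit operands the result always fits in 64 bits. */
static uint64_t umaal_model(uint32_t lo, uint32_t hi, uint32_t a, uint32_t b)
{
    return (uint64_t)a * b + lo + hi;
}

/* Sketch of addmul_1 on 32-bit limbs: rp[0..n-1] += up[0..n-1] * v.
   Returns the carry-out limb. One umaal-style step per limb. */
uint32_t addmul_1_sketch(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + existing result limb + incoming carry, all in one step */
        uint64_t t = umaal_model(rp[i], carry, up[i], v);
        rp[i] = (uint32_t)t;           /* low half is the new result limb */
        carry = (uint32_t)(t >> 32);   /* high half carries to the next limb */
    }
    return carry;
}
```

The point of the sketch is that each limb needs exactly one multiply-accumulate and no separate carry handling, which is presumably why the instruction fits addmul_1 so well.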
But in the instruction list, I also noticed VMULL, which can do two 32x32->64 products in parallel (too bad it doesn't support 64-bit inputs, as far as I can see). Has anyone played with that?

And in general, where can I find info on the timing of ARM instructions (for, say, the most common A9 and A15 implementations)?

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel