I spent most of Friday reading the ARM instruction reference (primarily
motivated by a different project). It seems current GMP loops are based
on umaal, which appears to be tailor-made for addmul_1.
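
To make the fit concrete, here is a minimal portable C sketch of what umaal
computes (RdHi:RdLo = Rn*Rm + RdLo + RdHi) and how an addmul_1-style loop
maps onto it. The names umaal_model and addmul_1_model are made up for
illustration, and 32-bit limbs are assumed; this is not the actual GMP
assembly loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of umaal: {*hi, return value} = a*b + lo + *hi.
   The sum cannot overflow 64 bits, since
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static uint32_t umaal_model(uint32_t lo, uint32_t *hi, uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + lo + *hi;
    *hi = (uint32_t)(t >> 32);
    return (uint32_t)t;
}

/* Hypothetical addmul_1 on 32-bit limbs:
   rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb. */
static uint32_t addmul_1_model(uint32_t *rp, const uint32_t *up,
                               size_t n, uint32_t v)
{
    uint32_t cy = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        /* One umaal per limb: product, old rp[i], and carry-in at once. */
        rp[i] = umaal_model(rp[i], &cy, up[i], v);
    return cy;
}
```

Each iteration folds the product, the existing limb, and the incoming carry
into a single instruction, with no separate carry-flag handling.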

But in the instruction list, I also noticed VMULL, which can do two
32x32->64 products in parallel (too bad it doesn't support 64-bit inputs,
as far as I can see). Has anyone played with that? And in general, where can
I find info on the timing of ARM instructions (for, say, the most common
A9 and A15 implementations)?
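
For reference, the operation I mean is roughly the following, written as a
portable C model of the unsigned 32-bit form rather than with real NEON
intrinsics. The function name vmull_u32_model is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned 32-bit VMULL: two independent
   32x32->64 products, one per lane. Unlike umaal, there is no
   accumulation and no carry propagation between lanes. */
static void vmull_u32_model(uint64_t d[2],
                            const uint32_t n[2], const uint32_t m[2])
{
    d[0] = (uint64_t)n[0] * m[0];
    d[1] = (uint64_t)n[1] * m[1];
}
```

Any addmul_1-style use would have to add the old limbs and propagate
carries in separate instructions.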

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel