ni...@lysator.liu.se (Niels Möller) writes:

> I spent most of Friday reading the ARM instruction reference (primarily motivated by a different project). It seems current GMP loops are based on umaal, which appears to be tailor-made for addmul_1.

It is OK for addmul_1, but our usage suffers because the umaal instructions are on a tight critical path. For addmul_2 this is not a problem. I suspect addmul_1 should not really use umaal, at least not for A15.
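For readers who have not met umaal: it forms a 32x32->64 product and adds two 32-bit values to it, a sum that always fits in 64 bits. The following is a minimal C model of that behaviour and of an addmul_1-style loop built on it, assuming 32-bit limbs. The names umaal_model and addmul_1_model are made up for illustration; this is a sketch, not GMP's actual code. The carry written by one step is an input to the next step, which is the loop-carried dependency behind the critical-path remark above.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of ARM's UMAAL: {hi,lo} = a * b + lo + hi.
   The sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static inline void umaal_model(uint32_t *lo, uint32_t *hi,
                               uint32_t a, uint32_t b)
{
    uint64_t t = (uint64_t)a * b + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Hypothetical reference for an addmul_1-style operation:
   {rp, n} += {up, n} * v, returning the carry-out limb. */
uint32_t addmul_1_model(uint32_t *rp, const uint32_t *up,
                        size_t n, uint32_t v)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rp[i];
        /* carry is both an input and an output here, so each
           step has to wait for the previous one to finish. */
        umaal_model(&lo, &carry, up[i], v);
        rp[i] = lo;
    }
    return carry;
}
```

In an addmul_2-style loop there would be two such carry chains, one per multiplier limb, which gives the hardware independent work to overlap.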
> But in the instruction list, I also noticed VMULL, which can do two 32x32->64 products in parallel (too bad it doesn't support 64-bit inputs, as far as I can see). Has anyone played with that?
>
> And in general, where can I find info on the timing of ARM instructions (for, say, the most common A9 and A15 implementations)? I found the A9 manual here: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0388i/DDI0388I_cortex_a9_r4p1_trm.pdf
>
> The corresponding A15 manual seems less forthcoming wrt cycle numbers.

Log in to parma, explore!

I haven't played with Neon much. There are lots of instructions there which might be useful for us. At least lshift, lshiftc, rshift, popcount, hamdist, copyi, copyd, and com could be improved.

While x86's SIMD seems to have as little organisation as a garbage dump, Neon is carefully designed. It is a nice change. Neon is surprisingly powerful. They generalised instructions in a nice way.

Using Neon in a robust way might be a bit tricky, though. I have no idea how to determine if a CPU has Neon or not, and ARM has made most useful meta instructions supervisor-only.

--
Torbjörn

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
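To illustrate the VMULL remark in the quoted question: the instruction multiplies two pairs of 32-bit lanes and yields two 64-bit products. The sketch below, in C, uses the vmull_u32 intrinsic from arm_neon.h when the compiler reports Neon support, and otherwise falls back to two scalar multiplies so it builds anywhere. The function name mul_2x32 is hypothetical, and nothing here says anything about timing on A9 or A15.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Two independent 32x32->64 products: out[i] = a[i] * b[i], i = 0, 1. */
void mul_2x32(uint64_t out[2], const uint32_t a[2], const uint32_t b[2])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x2_t va = vld1_u32(a);        /* load two 32-bit lanes */
    uint32x2_t vb = vld1_u32(b);
    vst1q_u64(out, vmull_u32(va, vb));  /* widening multiply, VMULL.U32 */
#else
    /* Scalar fallback (assumption: used when Neon is unavailable). */
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[1] * b[1];
#endif
}
```

The preprocessor test only decides what the compiler emits. It does not answer the run-time question raised above of how to tell whether a given CPU actually has Neon.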