Richard Henderson <r...@twiddle.net> writes:

> Three patches herein. If there's a better way to submit patches, please advise; I've never used hg before.
>
> The first patch gives gcc control over ctz/clz. Particularly for armv6t2 and later, which have rbit for use for ctz.
>
> The second patch improves multiplication a bit. I'm still playing with addmul_2, but this is a start for addmul_1/mul_1. I couldn't do better than the existing submul_1. Unfortunately the Xscale machines in the gcc build farm are turned off, so I can't test to see if I've regressed on that platform.
>
> The third patch tidies up add_n/sub_n, and provides for the carry-in entry points.

The GMP project now finally has an ARM system in the test environment, so now we will implement ARM improvements. I have taken a brief look at your work, and it provides nice improvements.
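
As an aside on the ctz/clz point in the first patch: the reason rbit helps is that counting trailing zeros equals counting leading zeros of the bit-reversed word. The following is only a minimal sketch of that identity, not code from the patch. The names bit_reverse32 and ctz_via_rbit are hypothetical, the bit reversal is done in portable C as a stand-in for the rbit instruction, and __builtin_clz is the GCC/Clang builtin, assumed here to operate on a 32-bit unsigned int.

```c
#include <stdint.h>

/* Portable stand-in for the ARM rbit instruction (hypothetical helper):
   reverse the order of the 32 bits in x by swapping progressively
   larger groups of bits. */
static uint32_t bit_reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);   /* swap adjacent bits   */
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);   /* swap bit pairs       */
    x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);   /* swap nibbles         */
    x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);   /* swap bytes           */
    return (x >> 16) | (x << 16);                              /* swap 16-bit halves   */
}

/* Count trailing zeros as clz(rbit(x)).
   Assumes x != 0 (the result of __builtin_clz(0) is undefined)
   and that unsigned int is 32 bits wide. */
static unsigned ctz_via_rbit(uint32_t x)
{
    return (unsigned) __builtin_clz(bit_reverse32(x));
}
```

On a target with a real rbit instruction, the software reversal would be replaced by that single instruction, which is presumably what makes letting the compiler choose the sequence attractive.
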

I suppose we should make a few subdirs such as arm/a8 and arm/a9, to make sure we don't optimise for one CPU and pessimise for another.

> It's a bit touchy speed testing these. There's no cycle counter available in userspace, and Hz is depressingly low. So I've had to bump the minimum iterations way way up in order to get semi-reliable results. Which causes the speed testing to take quite a long time.

What did you do to make it work? I always get "Fatal error: too many (11) failed measurements (0.0)" on any arm system.

-- 
Torbjörn