Richard Henderson <r...@twiddle.net> writes:

  I used the following, almost certainly not appropriate for general 
application.
  
[snip]

Thanks.  It would be very useful to make GMP timing work with the kernel
Linux running on ARM.  Do you know if there are similar problems with,
say, NetBSD?

I have checked in major ARM improvements the last few days, inspired by
your patches.  I have an A9 but no A8, so I have optimised just for the
former.  I have left the top-level mpn/arm code largely unmodified in
order to keep supporting older v4 and v5 arch CPUs; new multiply code
using umaal resides in the directory mpn/arm/v6.  (Some new division
code still in the forge will appear in mpn/arm/v5, due to its use of
clz.)

The new code is carefully software pipelined, and mul_1 and addmul_1 run
faster than both the old code and your patched code, at least on A9.
Could you please try it on A8 and see if it is at least as fast as your
code there?  If it is slower, we need to try to make innocent
modifications that don't hurt A9, or if that turns out to be hard,
provide several functions and choose asm code not only based on
architecture, but also on exact core.

I would appreciate it if you timed all new and modified functions on A8.

If you have ARMv4 (e.g., StrongARM) and/or ARMv5 (e.g., XScale) I would
appreciate it if you could check whether they still work after the latest
changes.

Do you know if there is a portable mechanism for recognising an ARM
core, akin to x86's cpuid?
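
To make the question concrete: this is not a portable answer, but on
Linux the kernel prints a "CPU part" field in /proc/cpuinfo, and that
is the kind of identification meant here.  Below is a minimal sketch,
assuming that field is present.  The function names are hypothetical
and not part of GMP; the part numbers 0xc08 (Cortex-A8) and 0xc09
(Cortex-A9) are ARM's published values.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the "CPU part" field in /proc/cpuinfo-style
   text and return its value, or -1 if the field is missing.  */
long
arm_cpu_part_from_text (const char *text)
{
  const char *p = strstr (text, "CPU part");
  if (p == NULL)
    return -1;
  p = strchr (p, ':');
  if (p == NULL)
    return -1;
  /* Base 0 lets strtol accept the "0x" prefix the kernel prints.  */
  return strtol (p + 1, NULL, 0);
}

/* Linux-only: read the start of /proc/cpuinfo and parse it.
   Returns -1 if the file cannot be read or has no "CPU part" field.  */
long
arm_cpu_part (void)
{
  char buf[4096];
  size_t n;
  FILE *f = fopen ("/proc/cpuinfo", "r");
  if (f == NULL)
    return -1;
  n = fread (buf, 1, sizeof buf - 1, f);
  fclose (f);
  buf[n] = '\0';
  return arm_cpu_part_from_text (buf);
}

/* Map a part number to a core name; only the two cores discussed
   in this thread are listed.  */
const char *
arm_core_name (long part)
{
  switch (part)
    {
    case 0xc08: return "cortex-a8";
    case 0xc09: return "cortex-a9";
    default:    return "unknown";
    }
}
```

Calling arm_core_name (arm_cpu_part ()) would then give a string that
could be used to pick between per-core asm variants.  Anything not
running Linux would need a different mechanism.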

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
