Richard Henderson writes:
> Building on the copyi that tege committed the other day, use neon for
> the logical operations too.
I committed the 128-bit version to arm/neon, so that it will be used for
all Neon-capable processors. I put it there since it is a speedup for
A9 as well as A15, compa
On 2013-03-08 03:46, Torbjorn Granlund wrote:
> I assume you mean that the destination ptr is naturally aligned, while
> the source ptrs are 32-bit aligned?
Yes.
My guess for the "jaggyness" is that, with two source pointers, you rarely
strike a case where both are 256-bit aligned, in particular not when
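The alignment argument can be made concrete with a small count. This is a sketch under a simplifying assumption not stated in the thread: each source pointer's offset within a 32-byte (256-bit) block is equally likely to be any of the eight 4-byte positions. The helpers `is_aligned` and `count_aligned_pairs` are hypothetical and not part of GMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of align bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical experiment: place two source pointers at every pair of
   4-byte-aligned offsets inside one 32-byte (256-bit) block and count the
   pairs where both pointers are aligned to `align` bytes (align <= 32). */
void count_aligned_pairs(size_t align, unsigned *both, unsigned *total)
{
    /* Base buffer aligned to 32 bytes (C11), so only the offsets matter. */
    static _Alignas(32) unsigned char buf[64];

    *both = 0;
    *total = 0;
    for (size_t ou = 0; ou < 32; ou += 4) {
        for (size_t ov = 0; ov < 32; ov += 4) {
            ++*total;
            if (is_aligned(buf + ou, align) && is_aligned(buf + ov, align))
                ++*both;
        }
    }
}
```

With align = 32 this reports 1 of 64 pairs, and with align = 16 it reports 4 of 64, so under the stated assumption a 256-bit-aligned fast path would find both sources aligned only a quarter as often as a 128-bit one.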
Building on the copyi that tege committed the other day, use neon for the
logical operations too.
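For orientation, a minimal C sketch of what is being vectorized: `mpn_and_n` writes the limb-wise AND of two operands, and `mpn_nand_n` writes the complement of that AND. This is not the code from the patch; `limb_t`, `sketch_and_n` and `sketch_nand_n` are hypothetical names, and 32-bit limbs are assumed. When `__ARM_NEON` is defined it uses the `arm_neon.h` intrinsics `vld1q_u32`, `vandq_u32`, `vmvnq_u32` and `vst1q_u32` to process 128 bits (four limbs) per iteration; on other targets only the scalar loop runs.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumption: 32-bit limbs, as in a 32-bit ARM build of GMP. */
typedef uint32_t limb_t;

/* Shared loop: rp[i] = up[i] & vp[i], complemented when invert is nonzero. */
static void and_loop(limb_t *rp, const limb_t *up, const limb_t *vp,
                     size_t n, int invert)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* Main loop: 128 bits (four limbs) per iteration, one Neon Q register. */
    for (; i + 4 <= n; i += 4) {
        uint32x4_t r = vandq_u32(vld1q_u32(up + i), vld1q_u32(vp + i));
        if (invert)
            r = vmvnq_u32(r);
        vst1q_u32(rp + i, r);
    }
#endif
    /* Scalar tail; without Neon this loop handles every limb. */
    for (; i < n; i++) {
        limb_t r = up[i] & vp[i];
        rp[i] = invert ? (limb_t)~r : r;
    }
}

/* Hypothetical stand-in with the semantics of mpn_and_n. */
void sketch_and_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    and_loop(rp, up, vp, n, 0);
}

/* Hypothetical stand-in with the semantics of mpn_nand_n. */
void sketch_nand_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    and_loop(rp, up, vp, n, 1);
}
```

On a Neon target, a call with n = 5 runs one vector iteration followed by one scalar tail iteration; the vector loop here uses unaligned loads and does not try to exploit pointer alignment.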
I did both a 128-bit aligned version,
$ ./speed-128 -p 10 -C -s 10,50,100,500,1000,5000,1 mpn_and_n mpn_nand_n
clock_gettime is 1.000ns accurate
overhead 6.00 cycles, precision 10