On 2013-02-27 14:33, Torbjorn Granlund wrote:
        vld1.32         { q1, q2 }, [r0@128]!

   As specified in section A.3.2.1, if you specify the alignment it will
   also be checked, so you'll get SIGBUS if it's not right.

I wanted to experiment, but I cannot find any syntax which is accepted
by gas.  @128 does not work (in gas 2.22).

I had to use the disassembler to figure it out.  Gas uses a colon.

        vld1.64 {d0-d3}, [r0:128]

While that is not obvious, I should have figured it had to be something other than "@", since "@" begins a comment in ARM assembly.
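
For anyone who wants to check a pointer before using the :128 hint, here is a minimal sketch in portable C of the condition the hardware is said to enforce. The helper name is_aligned_bits is made up for illustration, and the sketch assumes the alignment is a power of two given in bits, as in the [r0:128] syntax.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GMP or any library.
   Returns 1 if p is aligned to align_bits bits, else 0.
   Assumes align_bits is a power of two and a multiple of 8,
   e.g. 64, 128 or 256 as accepted in [rN:align]. */
static int is_aligned_bits(const void *p, unsigned align_bits)
{
    uintptr_t mask = (uintptr_t)(align_bits / 8) - 1;  /* 128 bits -> 0xf */
    return ((uintptr_t)p & mask) == 0;
}
```

An address that fails this test for 128 is one where a load written with [r0:128] would be expected to fault rather than silently succeed.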

And, I lied about not being able to read 4 128-bit registers in one insn.
You can't do it with VLD[1-4], but you can with VLDM.

Something else to look at is whether VLDR and VLDM perform better on A9.

The one thing that you do have to worry about there is that VLD[1-4] load consecutive "elements" as defined by the data type, whereas VLD[RM] load full 64-bit registers. This distinction matters in big-endian mode.

Of course, the big-endian caveat doesn't apply to popcount.
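
To make the element-versus-register distinction concrete, here is a rough model in portable C, not NEON code. It assumes VLD1.32 places the first 32-bit element in the low half of the D register and that VLDR behaves like one 64-bit load in memory byte order; all function names are invented for the sketch.

```c
#include <stdint.h>

/* Read one 32-bit element from memory in the given byte order. */
static uint32_t load_elem32(const unsigned char *p, int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        v |= (uint32_t)p[i] << shift;
    }
    return v;
}

/* Model of an element-wise load such as vld1.32 {d0}, [r0]:
   element 0 lands in lane 0 (bits 0-31), element 1 in lane 1. */
static uint64_t model_vld1_32(const unsigned char *p, int big_endian)
{
    uint64_t lane0 = load_elem32(p, big_endian);
    uint64_t lane1 = load_elem32(p + 4, big_endian);
    return lane0 | (lane1 << 32);
}

/* Model of a whole-register load such as vldr d0, [r0]:
   a single 64-bit value read in memory byte order. */
static uint64_t model_vldr(const unsigned char *p, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        v |= (uint64_t)p[i] << shift;
    }
    return v;
}

/* Plain bit count, clearing the lowest set bit each iteration. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```

In this model the two loads agree in little-endian mode and give the 32-bit lanes in swapped order in big-endian mode. The bit count is the same either way, which is why popcount does not care.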

   It trades 1 vpaddl for two add insns, but the total latency is
   probably a cycle or two better since we're now operating in core.

Need to test that, I think.  I fear the corereg<->vreg bandwidth might
be poor.

If it's awful, one could perform the final fold with "vadd.i64 d16, d17" and perform only one move to r0, swallowing that latency in the function return.
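
A small C model of the two ways to do the final fold, just to show they give the same answer. This is a sketch under assumptions: d16 and d17 are taken to hold the two partial counts as 64-bit values, the result is assumed to fit in 32 bits, and the function names are made up.

```c
#include <stdint.h>

/* Variant A: move both partial sums to core registers, then add
   there. Models two vreg->core moves followed by a core add. */
static uint32_t fold_in_core(uint64_t d16, uint64_t d17)
{
    uint32_t lo = (uint32_t)d16;  /* low word of d16 moved to core */
    uint32_t hi = (uint32_t)d17;  /* low word of d17 moved to core */
    return lo + hi;
}

/* Variant B: add in the vector unit first (vadd.i64 d16, d17),
   then a single move of the low word to r0. */
static uint32_t fold_in_vector(uint64_t d16, uint64_t d17)
{
    uint64_t sum = d16 + d17;     /* models vadd.i64 d16, d17 */
    return (uint32_t)sum;         /* the one move to r0 */
}
```

Both variants agree modulo 2^32, so the choice is purely about how expensive the moves across the register files turn out to be.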


r~