This is attacking case 3 of PR 94174.

The existing ccmp optimization happens at the gimple level,
which means that rtl expansion of TImode stuff cannot take
advantage.  But we can do even better than the existing
ccmp optimization.

This expansion is of similar size to our current branchful
expansion, but is all straight-line code.  I will assume in
general that the branch predictor will work better with
fewer branches.

E.g.

-  10:  b7f800a3        tbnz    x3, #63, 24 <__subvti3+0x24>
-  14:  eb02003f        cmp     x1, x2
-  18:  5400010c        b.gt    38 <__subvti3+0x38>
-  1c:  54000140        b.eq    44 <__subvti3+0x44>  // b.none
-  20:  d65f03c0        ret
-  24:  eb01005f        cmp     x2, x1
-  28:  5400008c        b.gt    38 <__subvti3+0x38>
-  2c:  54ffffa1        b.ne    20 <__subvti3+0x20>  // b.any
-  30:  eb00009f        cmp     x4, x0
-  34:  54ffff69        b.ls    20 <__subvti3+0x20>  // b.plast
-  38:  a9bf7bfd        stp     x29, x30, [sp, #-16]!
-  3c:  910003fd        mov     x29, sp
-  40:  94000000        bl      0 <abort>
-  44:  eb04001f        cmp     x0, x4
-  48:  54ffff88        b.hi    38 <__subvti3+0x38>  // b.pmore
-  4c:  d65f03c0        ret

+  10:  b7f800e3        tbnz    x3, #63, 2c <__subvti3+0x2c>
+  14:  eb01005f        cmp     x2, x1
+  18:  1a9fb7e2        cset    w2, ge  // ge = tcont
+  1c:  fa400080        ccmp    x4, x0, #0x0, eq  // eq = none
+  20:  7a40a844        ccmp    w2, #0x0, #0x4, ge  // ge = tcont
+  24:  540000e0        b.eq    40 <__subvti3+0x40>  // b.none
+  28:  d65f03c0        ret
+  2c:  eb01005f        cmp     x2, x1
+  30:  1a9fc7e2        cset    w2, le
+  34:  fa400081        ccmp    x4, x0, #0x1, eq  // eq = none
+  38:  7a40d844        ccmp    w2, #0x0, #0x4, le
+  3c:  54ffff60        b.eq    28 <__subvti3+0x28>  // b.none
+  40:  a9bf7bfd        stp     x29, x30, [sp, #-16]!
+  44:  910003fd        mov     x29, sp
+  48:  94000000        bl      0 <abort>

So one less insn, but 2 branches instead of 6.

As for the specific case of the PR,

void test_int128(__int128 a, uint64_t l)
{
        if ((__int128_t)a - l <= 1)
                doit();
}

    0:  eb020000        subs    x0, x0, x2
    4:  da1f0021        sbc     x1, x1, xzr
    8:  f100003f        cmp     x1, #0x0
-   c:  5400004d        b.le    14 <test_int128+0x14>
-  10:  d65f03c0        ret
-  14:  54000061        b.ne    20 <test_int128+0x20>  // b.any
-  18:  f100041f        cmp     x0, #0x1
-  1c:  54ffffa8        b.hi    10 <test_int128+0x10>  // b.pmore
+   c:  1a9fc7e1        cset    w1, le
+  10:  fa410801        ccmp    x0, #0x1, #0x1, eq  // eq = none
+  14:  7a40d824        ccmp    w1, #0x0, #0x4, le
+  18:  54000041        b.ne    20 <test_int128+0x20>  // b.any
+  1c:  d65f03c0        ret
   20:  14000000        b       0 <doit>


r~


Richard Henderson (6):
  aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC
  aarch64: Adjust result of aarch64_gen_compare_reg
  aarch64: Accept 0 as first argument to compares
  aarch64: Simplify @ccmp<cc_mode><mode> operands
  aarch64: Improve nzcv argument to ccmp
  aarch64: Implement TImode comparisons

 gcc/config/aarch64/aarch64.c              | 304 ++++++++++++++++------
 gcc/config/aarch64/aarch64-simd.md        |  18 +-
 gcc/config/aarch64/aarch64-speculation.cc |   5 +-
 gcc/config/aarch64/aarch64.md             | 280 ++++++++++++++------
 4 files changed, 429 insertions(+), 178 deletions(-)

-- 
2.20.1

Reply via email to