This is attacking case 3 of PR 94174.
The existing ccmp optimization happens at the gimple level,
which means that rtl expansion of TImode stuff cannot take
advantage. But we can do even better than the existing
ccmp optimization.
This expansion is similar in size to our current branchful
expansion, but is all straight-line code. I will assume in
general that the branch predictor will work better with
fewer branches.
E.g.
- 10: b7f800a3 tbnz x3, #63, 24 <__subvti3+0x24>
- 14: eb02003f cmp x1, x2
- 18: 5400010c b.gt 38 <__subvti3+0x38>
- 1c: 54000140 b.eq 44 <__subvti3+0x44> // b.none
- 20: d65f03c0 ret
- 24: eb01005f cmp x2, x1
- 28: 548c b.gt 38 <__subvti3+0x38>
- 2c: 54a1 b.ne 20 <__subvti3+0x20> // b.any
- 30: eb9f cmp x4, x0
- 34: 5469 b.ls 20 <__subvti3+0x20> // b.plast
- 38: a9bf7bfd stp x29, x30, [sp, #-16]!
- 3c: 910003fd mov x29, sp
- 40: 9400 bl 0
- 44: eb04001f cmp x0, x4
- 48: 5488 b.hi 38 <__subvti3+0x38> // b.pmore
- 4c: d65f03c0 ret
+ 10: b7f800e3 tbnz x3, #63, 2c <__subvti3+0x2c>
+ 14: eb01005f cmp x2, x1
+ 18: 1a9fb7e2 cset w2, ge // ge = tcont
+ 1c: fa400080 ccmp x4, x0, #0x0, eq // eq = none
+ 20: 7a40a844 ccmp w2, #0x0, #0x4, ge // ge = tcont
+ 24: 54e0 b.eq 40 <__subvti3+0x40> // b.none
+ 28: d65f03c0 ret
+ 2c: eb01005f cmp x2, x1
+ 30: 1a9fc7e2 cset w2, le
+ 34: fa400081 ccmp x4, x0, #0x1, eq // eq = none
+ 38: 7a40d844 ccmp w2, #0x0, #0x4, le
+ 3c: 5460 b.eq 28 <__subvti3+0x28> // b.none
+ 40: a9bf7bfd stp x29, x30, [sp, #-16]!
+ 44: 910003fd mov x29, sp
+ 48: 9400 bl 0
So one less insn, but 2 branches instead of 6.
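For readers who want the idea in C rather than in assembly, here is a
minimal sketch of a signed 128-bit "less than or equal" split into
64-bit halves, once with branches and once as straight-line code. The
function names are hypothetical and are not part of the patch set, and
nothing here promises that the compiler emits ccmp for the second form.
It only shows that the predicate can be computed without control flow.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical and exist only for this sketch. */

/* Branchful form: signed compare on the high halves, then an
   unsigned compare on the low halves when the high halves tie. */
static int le128_branchy(int64_t ahi, uint64_t alo, int64_t bhi, uint64_t blo)
{
    if (ahi < bhi)
        return 1;
    if (ahi > bhi)
        return 0;
    return alo <= blo;
}

/* Straight-line form: evaluate every sub-comparison unconditionally
   and combine the results with bitwise operators. */
static int le128_straight(int64_t ahi, uint64_t alo, int64_t bhi, uint64_t blo)
{
    int hi_lt = ahi < bhi;
    int hi_eq = ahi == bhi;
    int lo_le = alo <= blo;
    return hi_lt | (hi_eq & lo_le);
}

/* Check both forms against the compiler's native __int128 compare
   (a GCC/Clang extension) on a handful of boundary values. */
static void check_le128(void)
{
    static const int64_t his[] = { INT64_MIN, -1, 0, 1, INT64_MAX };
    static const uint64_t los[] = { 0, 1, UINT64_MAX / 2 + 1, UINT64_MAX };
    const __int128 two64 = (__int128)1 << 64;

    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 5; k++)
                for (int m = 0; m < 4; m++) {
                    __int128 a = his[i] * two64 + los[j];
                    __int128 b = his[k] * two64 + los[m];
                    int want = a <= b;
                    assert(le128_branchy(his[i], los[j], his[k], los[m]) == want);
                    assert(le128_straight(his[i], los[j], his[k], los[m]) == want);
                }
}
```

Calling check_le128() exercises both forms on 400 pairs of values.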
As for the specific case of the PR,
#include <stdint.h>

extern void doit(void);

void test_int128(__int128 a, uint64_t l)
{
  if ((__int128_t)a - l <= 1)
    doit();
}
   0: eb02 subs x0, x0, x2
   4: da1f0021 sbc x1, x1, xzr
   8: f13f cmp x1, #0x0
-  c: 544d b.le 14
- 10: d65f03c0 ret
- 14: 5461 b.ne 20 // b.any
- 18: f100041f cmp x0, #0x1
- 1c: 54a8 b.hi 10 // b.pmore
+  c: 1a9fc7e1 cset w1, le
+ 10: fa410801 ccmp x0, #0x1, #0x1, eq // eq = none
+ 14: 7a40d824 ccmp w1, #0x0, #0x4, le
+ 18: 5441 b.ne 20 // b.any
+ 1c: d65f03c0 ret
  20: 1400 b 0
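As a rough C model of what the sequence above computes, the sketch
below subtracts the zero-extended 64-bit value from the 128-bit value
in halves (the subs/sbc pair) and then tests the difference against 1
with no control flow. The names diff_le_1 and check_diff_le_1 are
hypothetical, and the cast back to int64_t assumes the modular
conversion that GCC documents. This is an illustration of the
predicate, not a claim about what the compiler generates from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero when a - l <= 1, where a is the
   signed 128-bit value (ahi:alo) and l is zero-extended to 128 bits. */
static int diff_le_1(int64_t ahi, uint64_t alo, uint64_t l)
{
    uint64_t lo = alo - l;          /* low half; wraps when a borrow occurs */
    uint64_t borrow = alo < l;      /* 1 if the low subtraction borrowed */
    /* High half minus the borrow; the high half of l is zero.
       Assumes modular unsigned-to-signed conversion, as in GCC. */
    int64_t hi = (int64_t)((uint64_t)ahi - borrow);

    int hi_le = hi <= 0;
    int hi_eq = hi == 0;
    int lo_le = lo <= 1;
    /* True when hi < 0, or when hi == 0 and lo <= 1. */
    return hi_le & (!hi_eq | lo_le);
}

/* Compare against native __int128 arithmetic on values chosen so that
   a - l cannot overflow. */
static void check_diff_le_1(void)
{
    static const int64_t his[] = { -2, -1, 0, 1, 2 };
    static const uint64_t los[] = { 0, 1, 2, UINT64_MAX };
    const __int128 two64 = (__int128)1 << 64;

    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++) {
                __int128 a = his[i] * two64 + los[j];
                int want = a - los[k] <= 1;
                assert(diff_le_1(his[i], los[j], los[k]) == want);
            }
}
```

Calling check_diff_le_1() runs the comparison over 80 combinations,
including the cases where the low half borrows from the high half.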
r~
Richard Henderson (6):
aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC
aarch64: Adjust result of aarch64_gen_compare_reg
aarch64: Accept 0 as first argument to compares
aarch64: Simplify @ccmp operands
aarch64: Improve nzcv argument to ccmp
aarch64: Implement TImode comparisons
gcc/config/aarch64/aarch64.c | 304 --
gcc/config/aarch64/aarch64-simd.md| 18 +-
gcc/config/aarch64/aarch64-speculation.cc | 5 +-
gcc/config/aarch64/aarch64.md | 280 ++--
4 files changed, 429 insertions(+), 178 deletions(-)
--
2.20.1