This is attacking case 3 of PR 94174.
The existing ccmp optimization happens at the gimple level,
which means that rtl expansion of TImode comparisons cannot
take advantage of it.  But we can do even better than the
existing ccmp optimization.
This expansion is of similar size to our current branchful
expansion, but is all straight-line code.  I will assume that,
in general, the branch predictor will work better with fewer
branches.
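(For reference, a rough C model of what ccmp does, which the
sequences below lean on: if the condition holds on the current
flags, NZCV is set from a fresh compare; otherwise it is loaded
directly from the #nzcv immediate.  The sketch, its name and its
flag packing are mine, not from the patch.)

```c
#include <stdint.h>

/* Rough model of AArch64 CCMP (register form).  Flags are packed
   N=8, Z=4, C=2, V=1, matching the encoding order of the #nzcv
   immediate.  'cond_holds' stands in for the condition test on the
   incoming flags. */
static unsigned ccmp64(int cond_holds, uint64_t a, uint64_t b,
                       unsigned nzcv)
{
    if (!cond_holds)
        return nzcv;                /* condition false: use immediate */
    uint64_t r = a - b;
    unsigned n = (int64_t)r < 0 ? 8 : 0;
    unsigned z = r == 0 ? 4 : 0;
    unsigned c = a >= b ? 2 : 0;    /* carry set = no borrow */
    /* V: the truth of a <s b disagrees with the sign of the result. */
    unsigned v = (((int64_t)a < (int64_t)b) != ((int64_t)r < 0)) ? 1 : 0;
    return n | z | c | v;
}
```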
E.g.
- 10: b7f800a3 tbnz x3, #63, 24 <__subvti3+0x24>
- 14: eb02003f cmp x1, x2
- 18: 5400010c b.gt 38 <__subvti3+0x38>
- 1c: 54000140 b.eq 44 <__subvti3+0x44> // b.none
- 20: d65f03c0 ret
- 24: eb01005f cmp x2, x1
- 28: 5400008c b.gt 38 <__subvti3+0x38>
- 2c: 54ffffa1 b.ne 20 <__subvti3+0x20> // b.any
- 30: eb00009f cmp x4, x0
- 34: 54ffff69 b.ls 20 <__subvti3+0x20> // b.plast
- 38: a9bf7bfd stp x29, x30, [sp, #-16]!
- 3c: 910003fd mov x29, sp
- 40: 94000000 bl 0 <abort>
- 44: eb04001f cmp x0, x4
- 48: 54ffff88 b.hi 38 <__subvti3+0x38> // b.pmore
- 4c: d65f03c0 ret
+ 10: b7f800e3 tbnz x3, #63, 2c <__subvti3+0x2c>
+ 14: eb01005f cmp x2, x1
+ 18: 1a9fb7e2 cset w2, ge // ge = tcont
+ 1c: fa400080 ccmp x4, x0, #0x0, eq // eq = none
+ 20: 7a40a844 ccmp w2, #0x0, #0x4, ge // ge = tcont
+ 24: 540000e0 b.eq 40 <__subvti3+0x40> // b.none
+ 28: d65f03c0 ret
+ 2c: eb01005f cmp x2, x1
+ 30: 1a9fc7e2 cset w2, le
+ 34: fa400081 ccmp x4, x0, #0x1, eq // eq = none
+ 38: 7a40d844 ccmp w2, #0x0, #0x4, le
+ 3c: 54ffff60 b.eq 28 <__subvti3+0x28> // b.none
+ 40: a9bf7bfd stp x29, x30, [sp, #-16]!
+ 44: 910003fd mov x29, sp
+ 48: 94000000 bl 0 <abort>
So one fewer insn, but 2 branches instead of 6.
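At the C level, the straight-line sequence is evaluating the usual
double-word signed comparison: decide on the (signed) high halves,
and fall back to an unsigned compare of the low halves when the high
halves are equal.  A hedged sketch (the helper name gt128 is mine,
not from the patch):

```c
#include <stdint.h>

/* Double-word signed a > b from 64-bit halves, branch-free, in the
   shape the cmp/cset/ccmp sequence computes it. */
static int gt128(int64_t ah, uint64_t al, int64_t bh, uint64_t bl)
{
    return (ah > bh) | ((ah == bh) & (al > bl));
}
```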
As for the specific case of the PR,
void test_int128(__int128 a, uint64_t l)
{
  if ((__int128_t)a - l <= 1)
    doit();
}
0: eb020000 subs x0, x0, x2
4: da1f0021 sbc x1, x1, xzr
8: f100003f cmp x1, #0x0
- c: 5400004d b.le 14 <test_int128+0x14>
- 10: d65f03c0 ret
- 14: 54000061 b.ne 20 <test_int128+0x20> // b.any
- 18: f100041f cmp x0, #0x1
- 1c: 54ffffa8 b.hi 10 <test_int128+0x10> // b.pmore
+ c: 1a9fc7e1 cset w1, le
+ 10: fa410801 ccmp x0, #0x1, #0x1, eq // eq = none
+ 14: 7a40d824 ccmp w1, #0x0, #0x4, le
+ 18: 54000041 b.ne 20 <test_int128+0x20> // b.any
+ 1c: d65f03c0 ret
20: 14000000 b 0 <doit>
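A hedged C model of what the new sequence computes (the helper name
is mine, for illustration): subs/sbc form the 128-bit difference with
borrow propagation, and the cset/ccmp/ccmp sequence then tests the
signed double-word result <= 1 without a branch.

```c
#include <stdint.h>

/* Model of the generated code for (__int128)a - l <= 1, with the
   128-bit value passed as 64-bit halves (lo, hi). */
static int test_le1_model(uint64_t lo, int64_t hi, uint64_t l)
{
    uint64_t rlo = lo - l;           /* subs x0, x0, x2 */
    int64_t  rhi = hi - (lo < l);    /* sbc  x1, x1, xzr */
    /* signed (rhi:rlo) <= 1  <=>  rhi < 0 || (rhi == 0 && rlo <= 1) */
    return (rhi < 0) | ((rhi == 0) & (rlo <= 1));
}
```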
r~
Richard Henderson (6):
aarch64: Add ucmp_*_carryinC patterns for all usub_*_carryinC
aarch64: Adjust result of aarch64_gen_compare_reg
aarch64: Accept 0 as first argument to compares
aarch64: Simplify @ccmp<cc_mode><mode> operands
aarch64: Improve nzcv argument to ccmp
aarch64: Implement TImode comparisons
gcc/config/aarch64/aarch64.c | 304 ++++++++++++++++------
gcc/config/aarch64/aarch64-simd.md | 18 +-
gcc/config/aarch64/aarch64-speculation.cc | 5 +-
gcc/config/aarch64/aarch64.md | 280 ++++++++++++++------
4 files changed, 429 insertions(+), 178 deletions(-)
--
2.20.1