https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118622
Bug ID: 118622
Summary: vshrn_n_u16 with a vmvnq_u16 should produce the same
code as a vsubhn_u16 with -1
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take:
```
#include <arm_neon.h>

uint8x8_t neg_narrow(uint16x8_t a) {
  uint16x8_t b = vmvnq_u16(a);
  return vshrn_n_u16(b, 8);
}
uint8x8_t neg_narrow_vsubhn(uint16x8_t a) {
  uint16x8_t ones = vdupq_n_u16(0xffff);
  return vsubhn_u16(ones, a);
}
```
GCC should produce the same code for both of these functions.
For neg_narrow in combine we get:
Trying 6 -> 7:
6: r104:V8HI=~r106:V8HI
REG_DEAD r106:V8HI
7: r102:V8QI=trunc(r104:V8HI 0>>const_vector)
REG_DEAD r104:V8HI
Failed to match this instruction:
(set (reg:V8QI 102 [ <retval> ])
(truncate:V8QI (lshiftrt:V8HI (not:V8HI (reg:V8HI 106 [ a ]))
(const_vector:V8HI [
(const_int 8 [0x8]) repeated x8
]))))
That is OK. Then for neg_narrow_vsubhn, combine gives:
Trying 6 -> 7:
6: r103:V8HI=const_vector
7: r101:V8QI=trunc(r103:V8HI-r105:V8HI>>const_vector)
REG_DEAD r105:V8HI
REG_DEAD r103:V8HI
Failed to match this instruction:
(set (reg:V8QI 101 [ <retval> ])
(truncate:V8QI (ashiftrt:V8HI (not:V8HI (reg:V8HI 105 [ a ]))
(const_vector:V8HI [
(const_int 8 [0x8]) repeated x8
]))))
Notice that the only difference is lshiftrt vs. ashiftrt. Because of the
truncate, we only keep the high half of each element (bits 8-15 of each 16-bit
lane), so logical vs. arithmetic shift does not matter here.
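As a sanity check on that claim, here is a minimal scalar sketch in portable C that models a single 16-bit lane. The function names are hypothetical and only illustrate the arithmetic; this is not GCC code and it does not use the NEON intrinsics. The arithmetic shift is modelled explicitly, since right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical per-lane model: logical shift right by 8, then truncate. */
uint8_t narrow_lshr(uint16_t x) {
    return (uint8_t)(x >> 8);
}

/* Hypothetical per-lane model: arithmetic shift right by 8, then truncate.
   The sign fill is written out by hand to stay portable. */
uint8_t narrow_ashr(uint16_t x) {
    uint16_t fill = (x & 0x8000u) ? 0xff00u : 0u;
    return (uint8_t)((uint16_t)(x >> 8) | fill);
}

/* Model of neg_narrow: bitwise NOT, then take the high byte. */
uint8_t not_then_narrow(uint16_t a) {
    return narrow_lshr((uint16_t)~a);
}

/* Model of neg_narrow_vsubhn: 0xffff - a, then take the high byte. */
uint8_t subhn_from_ones(uint16_t a) {
    return narrow_lshr((uint16_t)(0xffffu - a));
}

/* Count the 16-bit inputs for which any of the models disagree. */
unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (uint32_t v = 0; v <= 0xffffu; v++) {
        uint16_t x = (uint16_t)v;
        if (narrow_lshr(x) != narrow_ashr(x))
            bad++;
        if (not_then_narrow(x) != subhn_from_ones(x))
            bad++;
    }
    return bad;
}
```

Under this model, the sign-fill bits introduced by the arithmetic shift land in bits 8-15 of the shifted value, which the truncation to 8 bits discards, so both shifts yield the same high byte.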
So we should match both forms and then split the result back into what the
original IR for neg_narrow_vsubhn was.
I should note that LLVM canonicalizes this to neg_narrow, but neg_narrow_vsubhn
can be faster in some (all?) cases.