[Bug target/118622] New: vshrn_n_u16 with a vmvnq_u16 should produce the same code as a vsubhn_u16 with -1

pinskia at gcc dot gnu.org via Gcc-bugs Wed, 22 Jan 2025 22:03:00 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118622


            Bug ID: 118622
           Summary: vshrn_n_u16 with a vmvnq_u16 should produce the same
                    code as a vsubhn_u16 with -1
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take:
```
#include <arm_neon.h>

uint8x8_t neg_narrow(uint16x8_t a) {
  uint16x8_t b = vmvnq_u16(a);
  return vshrn_n_u16(b, 8);
}

uint8x8_t neg_narrow_vsubhn(uint16x8_t a) {
  uint16x8_t ones = vdupq_n_u16(0xffff);
  return vsubhn_u16(ones, a);
}
```

GCC should produce the same code for both of these functions.

For neg_narrow in combine we get:
Trying 6 -> 7:
    6: r104:V8HI=~r106:V8HI
      REG_DEAD r106:V8HI
    7: r102:V8QI=trunc(r104:V8HI 0>>const_vector)
      REG_DEAD r104:V8HI
Failed to match this instruction:
(set (reg:V8QI 102 [ <retval> ])
    (truncate:V8QI (lshiftrt:V8HI (not:V8HI (reg:V8HI 106 [ a ]))
            (const_vector:V8HI [
                    (const_int 8 [0x8]) repeated x8
                ]))))


That is fine. Then for neg_narrow_vsubhn, combine gives:

Trying 6 -> 7:
    6: r103:V8HI=const_vector
    7: r101:V8QI=trunc(r103:V8HI-r105:V8HI>>const_vector)
      REG_DEAD r105:V8HI
      REG_DEAD r103:V8HI
Failed to match this instruction:
(set (reg:V8QI 101 [ <retval> ])
    (truncate:V8QI (ashiftrt:V8HI (not:V8HI (reg:V8HI 105 [ a ]))
            (const_vector:V8HI [
                    (const_int 8 [0x8]) repeated x8
                ]))))

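Notice that the only difference is lshiftrt vs ashiftrt. But with the truncate
we keep only the high half of each 16-bit element (bits 8 to 15 after the shift
by 8), so logical vs arithmetic shift does not matter here: the two differ only
in the bits shifted in at the top, which the truncate discards.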
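
To make that argument concrete, here is a minimal scalar sketch that models one
16-bit lane in portable C rather than with NEON intrinsics. The helper names
(narrow_lshr, narrow_ashr, narrow_subhn, check_all_u16) are made up for
illustration and are not GCC or ACLE names. It checks every possible lane value
to show that the logical-shift form, the arithmetic-shift form, and the
"0xffff - a, take the high half" form all give the same byte.
```
#include <assert.h>
#include <stdint.h>

/* Model of one lane of vshrn_n_u16(vmvnq_u16(a), 8):
   NOT, logical shift right by 8, truncate to 8 bits. */
uint8_t narrow_lshr(uint16_t a) {
  uint16_t b = (uint16_t)~a;
  return (uint8_t)(b >> 8);
}

/* Same as above but with an arithmetic shift, modelled by hand so the
   result does not depend on how the compiler shifts negative values. */
uint8_t narrow_ashr(uint16_t a) {
  uint16_t b = (uint16_t)~a;
  uint16_t shifted = (uint16_t)(b >> 8);
  if (b & 0x8000u)
    shifted |= 0xFF00u;        /* replicate the sign bit into the top 8 bits */
  return (uint8_t)shifted;     /* the truncate drops those top 8 bits again */
}

/* Model of one lane of vsubhn_u16(ones, a): subtract, keep the high half. */
uint8_t narrow_subhn(uint16_t a) {
  uint16_t diff = (uint16_t)(0xFFFFu - a);   /* equals ~a for 16-bit a */
  return (uint8_t)(diff >> 8);
}

/* Exhaustively compare the three forms over all 65536 lane values. */
void check_all_u16(void) {
  for (uint32_t a = 0; a <= 0xFFFFu; a++) {
    uint8_t expected = narrow_lshr((uint16_t)a);
    assert(narrow_ashr((uint16_t)a) == expected);
    assert(narrow_subhn((uint16_t)a) == expected);
  }
}
```
Calling check_all_u16() should complete without tripping an assert. This is only
a per-lane model of the semantics; it says nothing about which instruction
sequence is faster.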

So we should match both forms and then split the result back into what the
original IR for neg_narrow_vsubhn was.

I should note that LLVM canonicalizes this to the neg_narrow form, but
neg_narrow_vsubhn can be faster in some (all?) cases.
