https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111032

            Bug ID: 111032
           Summary: using small types inside loops sometimes confuses the
                    vectorizer
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-linux-gnu x86_64-linux-gnu

Take:
```
void __attribute__ ((noipa))
f0 (int *__restrict r,
   int *__restrict a,
   int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
  {
    unsigned short p = pred[i]?3:0;
    r[i] = p ;
  }
}

void __attribute__ ((noipa))
f1 (int *__restrict r,
   int *__restrict a,
   int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
  {
    int p = pred[i]?1<<3:0;
    r[i] = p ;
  }
}
```

These two functions should produce the same code shape, a select between a
constant (3 in f0, 8 in f1) and 0, but in f0 we instead get a truncation
followed by an extension.
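
As a rough sanity check that the round trip through `unsigned short` in f0 is redundant, the sketch below compares the f0-style loop against a hand-widened version that selects directly in `int`. The names `select_via_short`, `select_via_int` and `count_mismatches` are made up for this illustration and are not part of the testcase; the unused `a` parameter is dropped.

```c
#include <assert.h>

#define N 1024

/* Same shape as f0 in the report: the selected value passes through
   an unsigned short before being stored as int.  */
static void
select_via_short (int *restrict r, const int *restrict pred)
{
  for (int i = 0; i < N; ++i)
    {
      unsigned short p = pred[i] ? 3 : 0;
      r[i] = p;
    }
}

/* Hand-widened form (hypothetical): the same select done directly in int.  */
static void
select_via_int (int *restrict r, const int *restrict pred)
{
  for (int i = 0; i < N; ++i)
    r[i] = pred[i] ? 3 : 0;
}

/* Returns the number of elements on which the two forms disagree.  */
int
count_mismatches (void)
{
  static int pred[N], r0[N], r1[N];

  /* 3 fits in unsigned short, so narrowing and widening back
     cannot change the value.  */
  assert ((int) (unsigned short) 3 == 3);

  /* Mix of zero, negative and positive predicate values.  */
  for (int i = 0; i < N; ++i)
    pred[i] = (i % 3 == 0) ? 0 : i - 512;

  select_via_short (r0, pred);
  select_via_int (r1, pred);

  int bad = 0;
  for (int i = 0; i < N; ++i)
    bad += (r0[i] != r1[i]);
  return bad;
}
```

Since both forms compute the same values, any difference between them shows up only in the generated code, for example when comparing the `-O3 -S` output for the two loops.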

This happens on x86_64 at -O3 and aarch64 at -O3.

The aarch64 case with `-O3 -march=armv8.5-a+sve2` will, however, be fixed by
the patch for PR 111006 (which I will be submitting later today), because SVE
uses conversions rather than
VEC_PACK_TRUNC_EXPR/vec_unpack_hi_expr/vec_unpack_lo_expr here.
