https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111032
Bug ID: 111032
Summary: using small types inside loops sometimes confuses the vectorizer
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-linux-gnu x86_64-linux-gnu
Take:
```
void __attribute__ ((noipa))
f0 (int *__restrict r,
    int *__restrict a,
    int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      unsigned short p = pred[i] ? 3 : 0;
      r[i] = p;
    }
}

void __attribute__ ((noipa))
f1 (int *__restrict r,
    int *__restrict a,
    int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      int p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}
```
These two functions should produce equivalent code, a vector select between a
constant and 0 (3 in f0, 8 in f1), but in f0 we instead get a truncation
followed by an extension.
This happens on both x86_64 and aarch64 at -O3.
However, aarch64 with `-O3 -march=armv8.5-a+sve2` will be fixed by the patch for
PR 111006 (which I will be submitting later today), because SVE uses conversions
rather than VEC_PACK_TRUNC_EXPR/VEC_UNPACK_HI_EXPR/VEC_UNPACK_LO_EXPR here.
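
For reference, here is a minimal sketch (not part of the original testcase) of why the truncation/extension pair in f0 is redundant. Both constants, 3 and 0, fit in `unsigned short`, so the round trip through the narrow type cannot change the stored value. The names `select_narrow`, `select_wide` and `results_match` are hypothetical helpers introduced only for this illustration, and the unused `a` parameter is dropped.

```c
#include <string.h>

enum { N = 1024 };

/* Hypothetical variant of f0: the selected constant passes through
   unsigned short before being widened back to int.  */
void
select_narrow (int *__restrict r, const int *__restrict pred)
{
  for (int i = 0; i < N; ++i)
    {
      unsigned short p = pred[i] ? 3 : 0;
      r[i] = p;
    }
}

/* Same loop with the temporary kept in int, so no narrowing is involved.  */
void
select_wide (int *__restrict r, const int *__restrict pred)
{
  for (int i = 0; i < N; ++i)
    {
      int p = pred[i] ? 3 : 0;
      r[i] = p;
    }
}

/* Returns 1 if both loops store identical results for a predicate array
   containing zero, negative and positive entries.  */
int
results_match (void)
{
  static int pred[N], narrow[N], wide[N];

  for (int i = 0; i < N; ++i)
    pred[i] = (i % 3 == 0) ? 0 : i - 512;

  select_narrow (narrow, pred);
  select_wide (wide, pred);
  return memcmp (narrow, wide, sizeof narrow) == 0;
}
```

Calling `results_match ()` from a small `main` should return 1. This only shows that the two loops compute the same values; comparing the vectorized code for the two forms (for example with `-O3 -fdump-tree-vect`) is still needed to see the missed optimization itself.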