https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111032
Bug ID: 111032
Summary: using small types inside loops sometimes confuses the vectorizer
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64-linux-gnu x86_64-linux-gnu

Take:
```
void __attribute__ ((noipa))
f0 (int *__restrict r,
    int *__restrict a,
    int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      unsigned short p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}

void __attribute__ ((noipa))
f1 (int *__restrict r,
    int *__restrict a,
    int *__restrict pred)
{
  for (int i = 0; i < 1024; ++i)
    {
      int p = pred[i] ? 1 << 3 : 0;
      r[i] = p;
    }
}
```

These two functions should produce the same code, selecting between 8 and 0, but in f0 we instead get a truncation followed by an extension. This happens at -O3 on both x86_64 and aarch64. However, aarch64 with `-O3 -march=armv8.5-a+sve2` will be fixed by the patch for PR 111006 (which I will be submitting later today), because SVE uses conversions here rather than VEC_PACK_TRUNC_EXPR/vec_unpack_hi_expr/vec_unpack_lo_expr.
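To illustrate why the narrow intermediate is redundant, here is a minimal scalar sketch. It assumes both loops are meant to store either 8 or 0, as the description states. `f_narrow` and `f_mask` are hypothetical names introduced only for this illustration and are not part of the report or of GCC. `f_narrow` models the body of f0 (select in `unsigned short`, then widen to `int`); `f_mask` writes the same select as a compare producing an all-ones or all-zeros mask followed by an AND, which is roughly the shape a vectorized select between a constant and 0 takes. Since 0 and 8 both fit in `unsigned short`, the truncate-then-extend round trip cannot change the stored value, so both functions give identical results.

```c
#include <stddef.h>

/* Hypothetical model of f0's loop body: the selected value is
   truncated to unsigned short, then zero-extended back to int on
   the store. For the values 0 and 8 this round trip is a no-op. */
void
f_narrow (int *__restrict r, const int *__restrict pred, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      unsigned short p = pred[i] ? 1 << 3 : 0; /* narrow to 16 bits */
      r[i] = p;                                /* widen back to int */
    }
}

/* Hypothetical branchless form of the same select, done entirely in
   int: (pred[i] != 0) is 0 or 1, negating it gives 0 or -1 (all bits
   set), and ANDing with 8 yields 0 or 8. No narrow type is involved. */
void
f_mask (int *__restrict r, const int *__restrict pred, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    r[i] = -(pred[i] != 0) & 8;
}
```

Calling both on the same `pred` array and comparing the outputs element by element should show no differences, which is the equivalence the report expects the vectorizer to exploit for f0.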