https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540
Bug ID: 88540
Summary: Issues with vectorization of min/max operations
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: bugzi...@poradnik-webmastera.com
Target Milestone: ---

1st issue:

[code]
#define SIZE 2

void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
    {
        d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }
}
[/code]

When this is compiled for SSE2, gcc produces non-vectorized code:

[asm]
test(double*, double*, double*):
        vmovsd  xmm0, QWORD PTR [rdi]
        vminsd  xmm0, xmm0, QWORD PTR [rsi]
        vmovsd  QWORD PTR [rdx], xmm0
        vmovsd  xmm0, QWORD PTR [rdi+8]
        vminsd  xmm0, xmm0, QWORD PTR [rsi+8]
        vmovsd  QWORD PTR [rdx+8], xmm0
        ret
[/asm]

When SIZE is changed to 3 or greater, the code gets vectorized properly. I thought this might be a workaround for older CPUs on which the vectorized form was slower, but it also happens when compiling with "-O3 -march=skylake". I also checked with SIZE 6 and got 1 AVX op and 2 scalar SSE ones. It looks like this is an off-by-one bug.

The same happens for code with the other relational operators (>, <=, >=).

2nd issue: when compiling for AVX512, gcc does not use the new instructions that operate on ZMM registers; it still generates code that uses YMM registers.
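For the first issue, the variants with the other relational operators can be sketched as one small test file. This is only a minimal reproducer sketch: the function names (min_lt, min_le, max_gt, max_ge) and the helper count_mismatches are made up for illustration and are not part of the original report. The helper only confirms that the four loops compute what is intended; it says nothing about which instructions the compiler emits.

```c
/* Reproducer sketch for the 1st issue: the same two-element loop written
   with each relational operator. All names here are illustrative. */
#define SIZE 2

/* d1 < d2 ? d1 : d2 (the form used in the report) */
void min_lt(const double* __restrict d1, const double* __restrict d2,
            double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
        d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
}

/* d1 <= d2 ? d1 : d2 */
void min_le(const double* __restrict d1, const double* __restrict d2,
            double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
        d3[n] = d1[n] <= d2[n] ? d1[n] : d2[n];
}

/* d1 > d2 ? d1 : d2 */
void max_gt(const double* __restrict d1, const double* __restrict d2,
            double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
        d3[n] = d1[n] > d2[n] ? d1[n] : d2[n];
}

/* d1 >= d2 ? d1 : d2 */
void max_ge(const double* __restrict d1, const double* __restrict d2,
            double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
        d3[n] = d1[n] >= d2[n] ? d1[n] : d2[n];
}

/* Runs all four variants on fixed inputs and returns the number of
   output elements that differ from the expected min/max values. */
int count_mismatches(void)
{
    const double a[SIZE] = {1.0, 4.0};
    const double b[SIZE] = {2.0, 3.0};
    const double want_min[SIZE] = {1.0, 3.0};
    const double want_max[SIZE] = {2.0, 4.0};
    double out[SIZE];
    int bad = 0;

    min_lt(a, b, out);
    for (int n = 0; n < SIZE; ++n) bad += (out[n] != want_min[n]);
    min_le(a, b, out);
    for (int n = 0; n < SIZE; ++n) bad += (out[n] != want_min[n]);
    max_gt(a, b, out);
    for (int n = 0; n < SIZE; ++n) bad += (out[n] != want_max[n]);
    max_ge(a, b, out);
    for (int n = 0; n < SIZE; ++n) bad += (out[n] != want_max[n]);

    return bad;
}
```

Compiling such a file with "-O3 -march=skylake -S" and reading the output for each function should show whether packed or scalar min/max instructions were chosen for a given SIZE.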