https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540

            Bug ID: 88540
           Summary: Issues with vectorization of min/max operations
           Product: gcc
           Version: 8.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bugzi...@poradnik-webmastera.com
  Target Milestone: ---

1st issue:

[code]
#define SIZE 2

void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
    {
        d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }
}
[/code]

When this is compiled for SSE2, gcc produces non-vectorized code:

[asm]
test(double*, double*, double*):
        vmovsd  xmm0, QWORD PTR [rdi]
        vminsd  xmm0, xmm0, QWORD PTR [rsi]
        vmovsd  QWORD PTR [rdx], xmm0
        vmovsd  xmm0, QWORD PTR [rdi+8]
        vminsd  xmm0, xmm0, QWORD PTR [rsi+8]
        vmovsd  QWORD PTR [rdx+8], xmm0
        ret
[/asm]

When SIZE is changed to 3 or greater, the code gets vectorized properly. I
thought that this might be a workaround for old CPUs where the vector form was
slower, but this also happens when compiling with "-O3 -march=skylake". I also
checked with SIZE 6, and got 1 AVX op and 2 scalar SSE ones. It looks like this
is an off-by-one bug.

The same happens for code with other relational operators (>, <=, >=).
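
For illustration, the ">" variant could look like the sketch below. The
function name test_max is made up here; the body is the loop from the 1st issue
with only the operator changed.

```c
#define SIZE 2

/* Hypothetical variant of test(): picks the larger element using '>'. */
void test_max(double* __restrict d1, double* __restrict d2, double* __restrict d3)
{
    for (int n = 0; n < SIZE; ++n)
    {
        d3[n] = d1[n] > d2[n] ? d1[n] : d2[n];
    }
}
```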

2nd issue: when compiling for AVX512, gcc does not use the new instructions
that operate on ZMM registers; it still generates code for YMM ones.
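
A possible test case for the 2nd issue is sketched below, assuming an array of
8 doubles so that one ZMM register would cover the whole loop. WIDE_SIZE and
test_wide are names made up for this sketch. GCC 8 has a
-mprefer-vector-width= option; that the preferred vector width is what keeps
the output on YMM registers is an assumption and has not been checked against
this report.

```c
#define WIDE_SIZE 8  /* 8 doubles = 512 bits, i.e. one ZMM register */

/* Hypothetical wider version of test(): element-wise minimum of 8 doubles.
 * Possible commands to compare (flags are an assumption, untested here):
 *   gcc -O3 -march=skylake-avx512 -S wide.c
 *   gcc -O3 -march=skylake-avx512 -mprefer-vector-width=512 -S wide.c
 */
void test_wide(double* __restrict d1, double* __restrict d2, double* __restrict d3)
{
    for (int n = 0; n < WIDE_SIZE; ++n)
    {
        d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
    }
}
```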
