https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82108

            Bug ID: 82108
           Summary: [7.2 Regression] Wrong vectorized code generated for
                    x86_64
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ell_se at yahoo dot com
  Target Milestone: ---

The following snippet generates wrong vectorized code with GCC 7.2 on x86_64
with -O3:

  void downscale_2 (const float* src, int src_n, float* dst) {
    int i;

    for (i = 0; i < src_n; i += 2) {
      const float* a = src;
      const float* b = src + 4;

      dst[0] = (a[0] + b[0]) / 2;
      dst[1] = (a[1] + b[1]) / 2;
      dst[2] = (a[2] + b[2]) / 2;
      dst[3] = (a[3] + b[3]) / 2;

      src += 2 * 4;
      dst +=     4;
    }
  }

The assembly for the vectorized version of the loop is:

  .L5:
          addl    $1, %ecx
          movups  (%rdi,%rax), %xmm0
          movups  16(%rdi,%rax,2), %xmm2
          addps   %xmm2, %xmm0
          mulps   %xmm1, %xmm0
          movups  %xmm0, (%rdx,%rax)
          addq    $16, %rax
          cmpl    %r8d, %ecx
          jb      .L5

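Notice the missing scale factor (,2) on the first movups. %rax advances by 16
bytes per iteration, which matches dst, but src advances by 32 bytes per
iteration, so the first load should use (%rdi,%rax,2), just as the second load
uses 16(%rdi,%rax,2).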
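The address arithmetic can be sketched in C to make the mismatch concrete. This
is only a model of the byte offsets read off the listing above, assuming a
4-byte float; the function names are hypothetical and k is the iteration
number of the vectorized loop.

```c
#include <stddef.h>

/* Byte offset of a[0] that the source requires in iteration k:
   src advances by 2 * 4 floats per iteration. */
size_t expected_a_offset (size_t k) {
  return k * 2 * 4 * sizeof (float);
}

/* Byte offset of b[0] that the source requires: b = src + 4. */
size_t expected_b_offset (size_t k) {
  return (k * 2 * 4 + 4) * sizeof (float);
}

/* Offset used by "movups (%rdi,%rax), %xmm0": %rax grows by 16 per
   iteration and no scale factor is applied. */
size_t emitted_a_offset (size_t k) {
  size_t rax = 16 * k;
  return rax;
}

/* Offset used by "movups 16(%rdi,%rax,2), %xmm2": index scaled by 2. */
size_t emitted_b_offset (size_t k) {
  size_t rax = 16 * k;
  return 16 + rax * 2;
}
```

Under these assumptions the b offsets agree for every k, while the a offsets
agree only for k = 0. For k = 1 the source needs byte 32 (float index 8) but
the emitted load reads byte 16 (float index 4).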

It can be tested with:

  #include <stdio.h>

  int main () {
    const float in[4 * 4] = {
      1, 2, 3, 4,
      5, 6, 7, 8,

      1, 2, 3, 4,
      5, 6, 7, 8
    };
    float out[2 * 4];

    downscale_2 (in, 4, out);

    /* correct: 3, 4, 5, 6 */
    printf ("%g, %g, %g, %g\n", out[0], out[1], out[2], out[3]);

    /* incorrect: 5, 6, 7, 8; should also be 3, 4, 5, 6 */
    printf ("%g, %g, %g, %g\n", out[4], out[5], out[6], out[7]);
  }

This doesn't seem to happen with GCC 7.1 or 6.3.
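For checking other compiler versions without reading the printed values by
eye, a self-checking variant could look like the sketch below. The helpers
reference_at and count_mismatches are hypothetical and not part of the
original test. The volatile reads are only meant to discourage vectorization
of the reference, which is an assumption about the optimizer and not a
guarantee.

```c
/* The function from the report, with its closing brace in place. */
void downscale_2 (const float* src, int src_n, float* dst) {
  int i;

  for (i = 0; i < src_n; i += 2) {
    const float* a = src;
    const float* b = src + 4;

    dst[0] = (a[0] + b[0]) / 2;
    dst[1] = (a[1] + b[1]) / 2;
    dst[2] = (a[2] + b[2]) / 2;
    dst[3] = (a[3] + b[3]) / 2;

    src += 2 * 4;
    dst +=     4;
  }
}

/* Hypothetical helper: expected value of output element j.  Output group
   j / 4 reads 8 consecutive input floats; lane j % 4 averages the float
   at that lane with the one 4 positions later. */
static float reference_at (const float* src, int j) {
  const volatile float* s = src + (j / 4) * 8 + (j % 4);
  return (s[0] + s[4]) / 2;
}

/* Hypothetical helper: runs downscale_2 and returns how many outputs
   differ from the reference (0 means the results match). */
int count_mismatches (const float* src, int src_n, float* dst) {
  int n_out = ((src_n + 1) / 2) * 4;  /* one group of 4 per iteration */
  int j, bad = 0;

  downscale_2 (src, src_n, dst);
  for (j = 0; j < n_out; j++)
    if (dst[j] != reference_at (src, j))
      bad++;
  return bad;
}
```

Calling count_mismatches (in, 4, out) with the arrays from the test above
should return 0 when the loop is compiled correctly. A non-zero return
indicates that some output differs from the reference.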

For a chuckle, this affects a recent build of GIMP, and has resulted in this
beauty:
https://bug787222.bugzilla-attachments.gnome.org/attachment.cgi?id=359042 :)

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --enable-libmpx --with-system-zlib --with-isl
--enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu
--disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu
--enable-gnu-indirect-function --disable-multilib --disable-werror
--enable-checking=release --enable-default-pie --enable-default-ssp
Thread model: posix
gcc version 7.2.0 (GCC)
