https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68775

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, it looks like a ppc64le cross compiler happily (eh) accepts sources
preprocessed on x86_64-linux, and even the required modules built there.

So I have the dump files myself and the -fopt-info-vec difference (for BB
vectorization only) is empty.  It looks like the only code difference is
for the vectorization of the BB in the loop at

shell2.fppized.f90:971

that is

                  do k = 1, k_max
                    k1 = k_x(k);    k2 = k_y(k);    k3 = k_z(k)
                    dot1 = k1*P1+k2*P2+k3*P3
                    dot2 = g4 * (k1*k1+k2*k2+k3*k3)
                    res_ij(k) = res_ij(k) + therm(k) * (fac1 * &
                      exp(cmplx(dot2,dot1,kind=kind((1.0d0,1.0d0)))))
                  end do

which now has one fewer vector operand.

If you can confirm this by "bisecting" the file with -fdbg-cnt=vect_slp:N
that would be nice.
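
For reference, the bisection could be scripted roughly as below.  This is
only a sketch: it assumes that the failure is monotone in N (good up to some
count, bad from there on), and both the is_bad callback and the compiler
command line in gcc_command are hypothetical placeholders to be replaced by
whatever builds and runs the benchmark.  Only the -fdbg-cnt=vect_slp:N flag
comes from this comment.

```python
from typing import Callable


def first_bad_count(is_bad: Callable[[int], bool], upper: int) -> int:
    """Return the smallest N in [1, upper] for which is_bad(N) is true.

    Assumes is_bad(0) is false, is_bad(upper) is true, and that the
    result never flips back to good as N grows.
    """
    lo, hi = 0, upper  # invariant: is_bad(lo) is false, is_bad(hi) is true
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid
    return hi


def gcc_command(n: int) -> list:
    """Hypothetical compile command limiting BB vectorization to n events.

    The compiler name and -O3 are assumptions, not taken from the report.
    """
    return ["gfortran", "-O3", f"-fdbg-cnt=vect_slp:{n}",
            "-c", "shell2.fppized.f90"]
```

In practice is_bad(n) would compile with gcc_command(n), link, run the
benchmark and report whether the output miscompares; the value returned by
first_bad_count then names the first SLP vectorization that matters.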

The vectorized code looks ok to me so I suspect a target issue here.

Note that we do both a vector load from the realpart of a complex and
a scalar load of the imaginary part and then use that to construct
another vector:

  _1371 = REALPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0]
MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vectp.6451_10558 = &REALPART_EXPR <[shell2.fppized.f90:975:0]
[shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  vect__1371.6452_10556 = MEM[(real(kind=8) *)vectp.6451_10558];
  _395 = IMAGPART_EXPR <[shell2.fppized.f90:975:0] [shell2.fppized.f90:975:0]
MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;

  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 +
vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] _389 = _395 + _3549;
  vectp.6455_5409 = &REALPART_EXPR <[shell2.fppized.f90:975:0]
[shell2.fppized.f90:975:0] MEM[(complex(kind=8)[0:] *)res.0_420][_960]>;
  [shell2.fppized.f90:975:0] MEM[(real(kind=8) *)vectp.6455_5409] =
vect__455.6453_5427;

In the .optimized dump the above looks like

  vect__1371.6452_10556 = MEM[base: _9159, offset: 0B];
  _395 = MEM[base: _9159, offset: 8B];
  _9158 = (void *) ivtmp.7110_9170;
  [shell2.fppized.f90:975:0] _964 = MEM[base: _9158, offset: 0B];
  [shell2.fppized.f90:975:0] _486 = __builtin_exp (dot2_958);
  [shell2.fppized.f90:975:0] _508 = REALPART_EXPR <sincostmp_3746>;
  _1815 = _486 * fac1$real_1370;
  [shell2.fppized.f90:975:0] _518 = IMAGPART_EXPR <sincostmp_3746>;
  [shell2.fppized.f90:975:0] _178 = _508 * _1815;
  [shell2.fppized.f90:975:0] _201 = _518 * _1815;
  [shell2.fppized.f90:975:0] _967 = COMPLEX_EXPR <_178, _201>;
  [shell2.fppized.f90:975:0] _968 = ((_967));
  _3980 = REALPART_EXPR <_968>;
  [shell2.fppized.f90:975:0] _177 = _964 * _3980;
  [shell2.fppized.f90:975:0] vect_cst__10554 = {_177, _395};
  [shell2.fppized.f90:975:0] vect__455.6453_5427 = vect_cst__10554 +
vect__1371.6452_10556;
  [shell2.fppized.f90:975:0] MEM[base: _9159, offset: 0B] =
vect__455.6453_5427;

which might be enough to confuse later RTL optimizations.  I can only guess
that something is CSEing the scalar load with the vector load and getting
the lane ordering (endianness) wrong.
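
To make that guess concrete, here is a toy model of the sequence above.  It
is purely illustrative and not a diagnosis: memory is modelled as a list of
doubles holding {real, imag}, the function names are made up, and the
"wrong lane" variant is a hypothetical bug in which the scalar load at
offset 8B is replaced by an extract of the wrong vector element.

```python
def vector_load(mem, index):
    """Two-lane load: lane 0 is mem[index], lane 1 is mem[index + 1]."""
    return (mem[index], mem[index + 1])


def update_expected(mem, t177):
    """Follow the GIMPLE as written in the .optimized dump."""
    vec = vector_load(mem, 0)   # vect__1371 = MEM[base, offset: 0B]
    imag = mem[1]               # _395 = MEM[base, offset: 8B]
    cst = (t177, imag)          # vect_cst = {_177, _395}
    return (cst[0] + vec[0], cst[1] + vec[1])


def update_wrong_lane(mem, t177):
    """Hypothetical bug: _395 is taken from lane 0 of the vector load."""
    vec = vector_load(mem, 0)
    imag = vec[0]               # should have been lane 1
    cst = (t177, imag)
    return (cst[0] + vec[0], cst[1] + vec[1])
```

With mem = [1.0, 10.0] and t177 = 0.5 the first function yields (1.5, 20.0)
and the second (1.5, 11.0), so only the second lane of the stored vector
would differ, which is the kind of symptom a lane-ordering mistake would
produce.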

Maybe you can extract a small testcase from the above info that reproduces
the difference.
