https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194

--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 16 May 2014, vincenzo.innocente at cern dot ch wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61194
> 
> --- Comment #7 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
> great!
> 
> the original version (that vectorized in 4.8.1)
> void barX() {
>   for (int i=0; i<1024; ++i) {
>     k[i] = (x[i]>0) & (w[i]<y[i]);
>     z[i] = (k[i]) ? z[i] : y[i];
>  }
> }
> 
> does not vectorize yet.

That's because we hit

check_bool_pattern (var=<ssa_name 0x7ffff6c36e10>, loop_vinfo=0x1f3e900, 
    bb_vinfo=0x0)
    at /space/rguenther/src/svn/trunk/gcc/tree-vect-patterns.c:2596
2596                               &dt))
...
2605      if (!has_single_use (def))
2606        return false;

because

  _5 = x[i_18];
  _6 = _5 > 0.0;
  _7 = w[i_18];
  _8 = y[i_18];
  _9 = _7 < _8;
  _10 = _9 & _6;
  _11 = (int) _10;
  k[i_18] = _11;
  iftmp.0_13 = z[i_18];
  iftmp.0_2 = _10 ? iftmp.0_13 : _8;

thus we have CSEd the load from k and propagated from the
conversion, which leaves _10 with two uses (the conversion
feeding the store to k and the COND_EXPR).  VRP does this:

   _11 = (int) _10;
-  k[i_1] = _11;
-  if (_11 != 0)
+  k[i_18] = _11;
+  if (_10 != 0)

and -fno-tree-vrp "fixes" the regression.  If k were of type
_Bool then it likely wouldn't vectorize with 4.8 either.

The vectorizer cannot handle multiple uses of a pattern part
(in this case it's the start of the pattern, which would be
doable, but it's far from trivial ...).  That said,

static float x[1024];
static float y[1024];
static float z[1024];
static float w[1024];

static _Bool k[1024];

void __attribute__((noinline,noclone)) barX()
{
  int i;
  for (i=0; i<1024; ++i) {
      k[i] = (x[i]>0) & (w[i]<y[i]);
      z[i] = (k[i]) ? z[i] : y[i];
  }
}

is not vectorized even in 4.8 for the cited reason.
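
To make the two uses explicit, here is a hand-written C rendering of
the GIMPLE shown earlier. It is only a sketch: the names
barX_two_uses, barX_orig, fill and same_as_barX and the local m are
made up for illustration and are not part of the testcase. m plays
the role of _10 and feeds both the conversion stored to k and the
select for z, which is the shape that makes has_single_use fail.

```c
#include <assert.h>   /* only needed for the usage line below */
#include <string.h>

#define N 1024

static float x[N], y[N], z[N], w[N];
static int k[N];

/* Original source form from the report (k has type int here). */
static void barX_orig(void)
{
  for (int i = 0; i < N; ++i) {
    k[i] = (x[i] > 0) & (w[i] < y[i]);
    z[i] = (k[i]) ? z[i] : y[i];
  }
}

/* Hand-written equivalent of the GIMPLE after VRP (illustrative). */
static void barX_two_uses(void)
{
  for (int i = 0; i < N; ++i) {
    _Bool m = (w[i] < y[i]) & (x[i] > 0.0f);  /* _10 = _9 & _6 */
    k[i] = (int) m;                           /* use 1: _11 = (int) _10 */
    z[i] = m ? z[i] : y[i];                   /* use 2: the select */
  }
}

/* Fill the arrays with arbitrary but deterministic values. */
static void fill(void)
{
  for (int i = 0; i < N; ++i) {
    x[i] = (float)(i % 7) - 3.0f;
    y[i] = (float)(i % 5) * 0.25f;
    w[i] = (float)(i % 3) * 0.5f;
    z[i] = (float)i;
    k[i] = -1;
  }
}

/* Returns 1 if both forms leave identical contents in k and z. */
static int same_as_barX(void)
{
  static float z_ref[N];
  static int k_ref[N];

  fill();
  barX_orig();
  memcpy(z_ref, z, sizeof z);
  memcpy(k_ref, k, sizeof k);

  fill();
  barX_two_uses();
  return memcmp(z_ref, z, sizeof z) == 0
         && memcmp(k_ref, k, sizeof k) == 0;
}
```

Calling assert(same_as_barX()) from a test driver checks that the two
forms compute the same thing, so the difference is purely in how many
statements consume the mask.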

> On the other hand I am very happy to see
> void bar() {
>   for (int i=0; i<1024; ++i) {
>     auto c = ( (x[i]>0) & (w[i]<y[i])) | (y[i]>0.5f);
>     z[i] = c ? y[i] : z[i];
>  }
> }
> vectorized
> if (c) z[i] = y[i];
> does not even with -ftree-loop-if-convert-stores
> not a real issue at least for what I am concerned

I think -ftree-loop-if-convert-stores doesn't introduce store
data races, so it won't if-convert that store unless you
also specify --param allow-store-data-races=1.
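
For reference, the two source forms differ only in whether z[i] is
written on every iteration. A minimal sketch follows. The array names
are from the testcase, while bar_select, bar_cond_store and
forms_agree are hypothetical names, and z is passed as a parameter
only to make the comparison easy. Turning the second form into the
first adds a store of the old value when c is false. As far as I
understand it, that is harmless in a single thread but is the kind of
store data race the param is about.

```c
#include <assert.h>   /* only needed for the usage line below */
#include <string.h>

#define N 1024

static float x[N], y[N], w[N];

/* Select form: zz[i] is stored on every iteration. */
static void bar_select(float *zz)
{
  for (int i = 0; i < N; ++i) {
    _Bool c = ((x[i] > 0) & (w[i] < y[i])) | (y[i] > 0.5f);
    zz[i] = c ? y[i] : zz[i];
  }
}

/* Conditional-store form: zz[i] is stored only when c is true. */
static void bar_cond_store(float *zz)
{
  for (int i = 0; i < N; ++i) {
    _Bool c = ((x[i] > 0) & (w[i] < y[i])) | (y[i] > 0.5f);
    if (c)
      zz[i] = y[i];
  }
}

/* Returns 1 if both forms produce the same z in a single thread. */
static int forms_agree(void)
{
  static float za[N], zb[N];

  for (int i = 0; i < N; ++i) {
    x[i] = (float)(i % 7) - 3.0f;
    y[i] = (float)(i % 5) * 0.25f;
    w[i] = (float)(i % 3) * 0.5f;
    za[i] = zb[i] = (float)i;
  }
  bar_select(za);
  bar_cond_store(zb);
  return memcmp(za, zb, sizeof za) == 0;
}
```

assert(forms_agree()) holds for single-threaded execution; the forms
only become distinguishable if another thread can touch z between the
load and the store.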

I also don't see the testcases vectorized when using
&& instead of &.
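
The && variant computes the same values, since the comparisons have
no side effects; only the evaluation is short-circuited, which
typically shows up as extra control flow in the loop body. A small
sketch to pin down what "using && instead of &" means here. The names
barX_and, barX_andand and variants_agree are invented for
illustration, and the arrays are passed as parameters only so the
results can be compared.

```c
#include <assert.h>   /* only needed for the usage line below */
#include <string.h>

#define N 1024

static float x[N], y[N], w[N];

/* Bitwise form from the testcase: both comparisons always evaluated. */
static void barX_and(int *kk, float *zz)
{
  for (int i = 0; i < N; ++i) {
    kk[i] = (x[i] > 0) & (w[i] < y[i]);
    zz[i] = (kk[i]) ? zz[i] : y[i];
  }
}

/* Short-circuit form: w[i] < y[i] is evaluated only if x[i] > 0. */
static void barX_andand(int *kk, float *zz)
{
  for (int i = 0; i < N; ++i) {
    kk[i] = (x[i] > 0) && (w[i] < y[i]);
    zz[i] = (kk[i]) ? zz[i] : y[i];
  }
}

/* Returns 1 if both variants leave identical contents in k and z. */
static int variants_agree(void)
{
  static int ka[N], kb[N];
  static float za[N], zb[N];

  for (int i = 0; i < N; ++i) {
    x[i] = (float)(i % 7) - 3.0f;
    y[i] = (float)(i % 5) * 0.25f;
    w[i] = (float)(i % 3) * 0.5f;
    za[i] = zb[i] = (float)i;
    ka[i] = kb[i] = -1;
  }
  barX_and(ka, za);
  barX_andand(kb, zb);
  return memcmp(ka, kb, sizeof ka) == 0
         && memcmp(za, zb, sizeof za) == 0;
}
```

assert(variants_agree()) holds, so any difference in vectorization
between the two comes from how the loop is represented, not from what
it computes.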

If not already there, these warrant (different) bug reports.
