[Bug tree-optimization/54013] Loop with control flow not vectorized

2016-01-04 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

--- Comment #2 from Andrew Pinski  ---
Note the loop needs an alignment portion, otherwise there could be undefined
behavior if index 45 is past the bounds of the tab array and one of the
elements is greater than x.

[Bug tree-optimization/54013] Loop with control flow not vectorized

2015-06-15 Thread alalaw01 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

alalaw01 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-06-15
                 CC|                            |alalaw01 at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from alalaw01 at gcc dot gnu.org ---
Indeed it does (confirmed).

So there are a few tricks here, but they are not Intel-specific, and do not
even appear to require new tree codes. The loop body can be vectorized by
computing the (x < tab[i]) predicate across the vector, and then using a
reduction opcode (a bitwise-or reduction would be most natural, but others
work) to convert it to a scalar that controls the jump out of the loop, i.e.
the loop exits if *any* of the lanes in the vector would exit:

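int foo (float x, float *tab)
{
  int i;
  for (i = 2; i < 45; i += 4)
    {
      v4sf v_tab = ...load from tab...
      /* Each lane is all-ones if that lane would exit, zero otherwise.  */
      unsigned v4si v_exit_cond = vec_cond_expr ({x,x,x,x} < v_tab, -1, 0);
      /* Nonzero maximum means at least one lane would exit.  */
      if (reduc_max_expr (v_exit_cond)) break;
    }
  ...
}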
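
The any-lane exit test in the sketch above can be modelled lane by lane in
plain C. This is only an illustrative model, not vectorizer output: the name
first_exiting_block and the LANES constant are hypothetical, the inner loop
stands in for the vector compare, and the bitwise-or accumulation stands in
for the reduction. It returns the start of the exiting block, not yet the
exact value of i.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width, matching v4sf in the sketch */

/* Returns the start index of the first 4-wide block in which any lane
   satisfies x < tab[i + lane], or 46 if no block exits.
   Note: the last block covers tab[42..45], so this model reads one
   element more than a scalar loop over i = 2..44 would.  tab must
   therefore have at least 46 elements here.  */
int first_exiting_block (float x, const float *tab)
{
  assert (tab != NULL);
  int i;
  for (i = 2; i < 45; i += LANES)
    {
      unsigned any = 0;
      for (int lane = 0; lane < LANES; lane++)
        {
          /* Per-lane mask: all ones if this lane wants to exit.  */
          unsigned mask = (x < tab[i + lane]) ? ~0u : 0u;
          any |= mask;  /* bitwise-or reduction across the lanes */
        }
      if (any)
        break;
    }
  return i;
}
```

With a 46-element array of zeros and x = 1.0f, no lane exits and the function
returns 46; setting tab[11] to a value above x makes it return 10, the start
of the block containing index 11.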

The epilogue must then work out the value of i at exit (possibly a separate
epilogue for the "break" vs the other exit). I see two schemes:

(1) use vec_pack_trunc_expr, or similar, to narrow v_exit_cond down to a
scalar, where we can find the first set bit, and use this as an index to add to
the value still in i.

(2) compute a vector of the value i would have had if each element had been the
one that exited:

v4si v_i_on_exit = vec_cond_expr (v_exit_cond,
    {i, i+1, i+2, i+3}, /* Maybe available as induction variable?  */
    {MAX_INT, MAX_INT, MAX_INT, MAX_INT})

and then take a reduc_min_expr to look for the *first* value of i that exits.
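
Both schemes can be modelled in scalar C to check that they agree. This is a
minimal sketch under assumptions: the function names are hypothetical, the
mask is passed as an array of four lanes (each zero or all-ones), and the
first-set-bit search is written as a portable loop where a real target would
likely use a single instruction.

```c
#include <assert.h>
#include <limits.h>

#define LANES 4  /* assumed vector width */

/* Scheme (1): narrow the mask to one bit per lane, then add the
   position of the first set bit to the block start i.  */
int exit_index_first_bit (int i, const unsigned v_exit_cond[LANES])
{
  unsigned bits = 0;
  for (int lane = 0; lane < LANES; lane++)
    bits |= (v_exit_cond[lane] & 1u) << lane;
  assert (bits != 0);  /* only meaningful on the "break" exit */

  int first = 0;
  while (!(bits & 1u))
    {
      bits >>= 1;
      first++;
    }
  return i + first;
}

/* Scheme (2): select i + lane for exiting lanes and INT_MAX for the
   others, then take the minimum across the lanes.  */
int exit_index_min_reduc (int i, const unsigned v_exit_cond[LANES])
{
  assert (i <= INT_MAX - (LANES - 1));  /* i + lane must not overflow */
  int result = INT_MAX;
  for (int lane = 0; lane < LANES; lane++)
    {
      int i_on_exit = v_exit_cond[lane] ? i + lane : INT_MAX;
      if (i_on_exit < result)
        result = i_on_exit;
    }
  return result;
}
```

For a block starting at i = 10 with the mask {0, 0, -1, -1}, both functions
return 12. If no lane is set, the second scheme yields INT_MAX, while the
first is not defined and asserts.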

(There is one more issue: we need to speculate the read of tab[i+1...i+3], as
the vector load will probably read all the lanes before we know whether
earlier iterations should have exited. So we would need some kind of check
against that, unless e.g. tab[] were a global with known bounds. Similar, and
possibly more complicated, conditions apply to everything else in the loop,
too!)
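
One possible form of such a check, assuming the number of valid elements is
known, is to run 4-wide blocks only while every lane is both inside the loop
range and inside the array, and to finish in scalar code. This is a sketch,
not what GCC emits: foo_guarded and its parameter n are hypothetical, and the
scalar tail is assumed to mirror a loop that breaks at the first i with
x < tab[i].

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed vector width */

/* n is the number of valid elements in tab (hypothetical parameter).  */
int foo_guarded (float x, const float *tab, size_t n)
{
  assert (tab != NULL);
  int i = 2;

  /* Blocked part: entered only when lanes i..i+3 are all below 45 and
     all inside the array, so no lane is read speculatively.  */
  for (; i + LANES <= 45 && (size_t) (i + LANES) <= n; i += LANES)
    {
      unsigned any = 0;
      for (int lane = 0; lane < LANES; lane++)
        any |= (x < tab[i + lane]) ? ~0u : 0u;
      if (any)
        break;
    }

  /* Scalar tail: locates the exact exit inside the block that broke,
     and handles the iterations the blocked part could not cover.  It
     reads only what the scalar loop would read.  */
  for (; i < 45; i++)
    if (x < tab[i])
      break;
  return i;
}
```

The cost of this choice is that the last few iterations always run in scalar
code; the alignment-based and first-faulting approaches mentioned elsewhere
in this thread aim to avoid that.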


[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

--- Comment #3 from Andrew Pinski  ---
I think for SVE(2?) this could be vectorized using first-faulting loads.

[Bug tree-optimization/54013] Loop with control flow not vectorized

2024-06-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013

Tamar Christina  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |115130

--- Comment #4 from Tamar Christina  ---
Since there is only one source here, alignment peeling should be enough to
vectorize it.

Our pending patches should support it. I will add it to the list of cases to
verify.
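
One reading of why peeling can suffice (an interpretation, not stated in the
comment) is that once the load address is aligned to the vector size, a vector
load cannot straddle a page boundary, so it cannot fault if its first lane is
readable. The arithmetic can be sketched as below. The helper names, the
16-byte vector size and the 4096-byte page size are all assumptions, and
addresses are passed as integers so the example does not depend on where an
array happens to be placed.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BYTES 16u     /* assumed vector size: four floats */
#define PAGE_BYTES 4096u  /* assumed page size */

/* Number of scalar iterations to peel before the element at addr is
   VEC_BYTES-aligned.  addr must already be aligned for float.  */
unsigned peel_count (uintptr_t addr)
{
  assert (addr % sizeof (float) == 0);
  unsigned misalign = (unsigned) (addr % VEC_BYTES);
  if (misalign == 0)
    return 0;
  return (VEC_BYTES - misalign) / (unsigned) sizeof (float);
}

/* Nonzero if a VEC_BYTES-wide access starting at addr touches only
   one page.  */
int within_one_page (uintptr_t addr)
{
  return addr / PAGE_BYTES == (addr + VEC_BYTES - 1) / PAGE_BYTES;
}
```

For the loop in this bug the peel count would be computed from
(uintptr_t) &tab[2]. An aligned address such as 0x1ff0 stays within one page,
whereas the unaligned address 0x1ff8 would cross into the next one.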


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
[Bug 115130] [meta-bug] early break vectorization