https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Just to note this is _basic block vectorization_ triggering.  Of course we do
vectorize basic blocks even when we do not vectorize any loop.

Is this about the "stupid" attempt to use as little AVX512 as possible because
the CPU has to down-clock?  We could add a hint to the preferred-vector-size
target hook indicating whether we are asking for BB or loop vectorization, and
the target could then decide not to use AVX512 for BB vectorization.

Of course we also happily vectorize very short running loops with AVX512.

I think the compiler will have a _very_ hard job working around this CPU design
limitation...

So - I do have a way to "fix" it for this case.  What SLP detection currently
does with piecewise vector construction is push it down as far as possible,
even through binary operations that could otherwise be vectorized (the idea
was to minimize the number of such build-ups).  We could limit that to
unary operations.  Then we get a much larger SLP tree and

t.c:32:12: note: Cost model analysis:
  Vector inside of basic block cost: 152
  Vector prologue cost: 512
  Vector epilogue cost: 0
  Scalar cost of basic block: 544
t.c:32:12: note: not vectorized: vectorization is not profitable.

but of course the real issue is that the target claims that replacing
N stores with a vector store is profitable.

Hmm.  Honza changed it to

      case vec_construct:
        return ix86_vec_cost (mode, ix86_cost->sse_op, false);

but it was

      case vec_construct:
        return ix86_cost->vec_stmt_cost * (TYPE_VECTOR_SUBPARTS (vectype) - 1);


before.  That seems like a bogus change.  I guess it should really be

      case vec_construct:
        return (ix86_vec_cost (mode, ix86_cost->sse_op, false)
                * (TYPE_VECTOR_SUBPARTS (vectype) - 1));

Honza - what was the motivation for this change?

Sergey, can you test this?  For me it gets the block vectorized using
SSE, which means it can use vpbroadcastd.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 255499)
+++ gcc/config/i386/i386.c      (working copy)
@@ -44879,7 +44879,8 @@ ix86_builtin_vectorization_cost (enum ve
                              ix86_cost->sse_op, true);

       case vec_construct:
-       return ix86_vec_cost (mode, ix86_cost->sse_op, false);
+       return (ix86_vec_cost (mode, ix86_cost->sse_op, false)
+               * (TYPE_VECTOR_SUBPARTS (vectype) - 1));

       default:
         gcc_unreachable ();
