https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87561

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, it is exactly the current pessimization of vector construction that makes
the AVX256 variant not profitable:

0x40e04e0 *co_99(D)[_53] 1 times vec_construct costs 112 in body

that's because we multiply the "real" cost (three inserts, 28) by
TYPE_VECTOR_SUBPARTS (four) in x86 add_stmt_cost.  For the SSE2 case
that results "only" in a factor of two.  Changing that "arbitrary"
scaling to * (TYPE_VECTOR_SUBPARTS + 1) doesn't help on its own.  We can
add the same handling to catch strided stores, but that doesn't help on
its own either.  Doing both together does avoid the vectorization, though.
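
To make the arithmetic above concrete, here is a minimal sketch of the
scaling only.  The helper name scaled_construct_cost and its parameters are
made up for illustration; this is not the GCC code, and it ignores all the
conditions that guard the scaling in ix86_add_stmt_cost.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: models only the multiplication
   applied to the "real" vec_construct cost.
   base_cost - unscaled cost (28 for the three inserts quoted above)
   nunits    - TYPE_VECTOR_SUBPARTS of the vector type (4 for AVX256 here)
   plus_one  - true to use the (TYPE_VECTOR_SUBPARTS + 1) factor from the
               patch, false for the current factor.  */
static int
scaled_construct_cost (int base_cost, int nunits, bool plus_one)
{
  int factor = plus_one ? nunits + 1 : nunits;
  return base_cost * factor;
}
```

With base_cost 28 and nunits 4 this gives the 112 from the dump line, and
140 with the + 1 variant.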

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c      (revision 269683)
+++ gcc/config/i386/i386.c      (working copy)
@@ -50534,14 +50534,15 @@ ix86_add_stmt_cost (void *data, int coun
      latency and execution resources for the many scalar loads
      (AGU and load ports).  Try to account for this by scaling the
      construction cost by the number of elements involved.  */
-  if (kind == vec_construct
+  if ((kind == vec_construct || kind == vec_to_scalar)
       && stmt_info
-      && STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
+      && (STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
+         || STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
       && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
       && TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info))) != INTEGER_CST)
     {
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
-      stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);
+      stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
     }
   if (stmt_cost == -1)
     stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
