Hi

While looking at the vectorization of the following example, we noticed 
that although vectorizable_shift distinguishes a vector shifted by a 
vector from a vector shifted by a scalar, the cost model always added 
the cost of building a vector constant for the shift amount, even 
though it is not needed in the vector-shifted-by-scalar case.

This patch fixes that by using scalar_shift_arg to decide whether a 
vector needs to be built for the second operand. This reduces the 
prologue cost, as shown in the new test.
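
To make the distinction concrete, here is a minimal sketch (hypothetical
code, not part of the patch or the benchmark): the first loop shifts by a
loop-invariant scalar and so needs no vector of shift amounts built in the
prologue, while the second shifts by another vector operand and does.

/* Hypothetical illustration only: both loops vectorize, but only the
   second needs a vector built for the shift amounts.  */
void
shift_by_scalar (unsigned int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] >>= 10;	/* Shift amount is a scalar constant.  */
}

void
shift_by_vector (unsigned int *a, unsigned int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] >>= b[i];	/* Shift amount varies per element.  */
}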

Build and regression tests pass on aarch64-none-elf and 
x86_64-pc-linux-gnu. This gives a 3.42% improvement on 525.x264_r in 
SPEC2017 on AArch64.

gcc/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi....@arm.com>
            Richard Sandiford  <richard.sandif...@arm.com>

        * tree-vect-stmts.c (vectorizable_shift): Condition ndts for
        vect_model_simple_cost call on scalar_shift_arg.

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  <sudi....@arm.com>

        * gcc.dg/vect/vect-shift-5.c: New test.

Is this ok for trunk?

Thanks
Sudi

diff --git a/gcc/testsuite/gcc.dg/vect/vect-shift-5.c b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
new file mode 100644
index 0000000..c1fd4f2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-shift-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_int } */
+
+typedef unsigned int uint32_t;
+typedef short unsigned int uint16_t;
+
+int foo (uint32_t arr[4][4])
+{
+  int sum = 0;
+  for(int i = 0; i < 4; i++)
+    {
+      sum += ((arr[0][i] >> 10) * 20) + ((arr[1][i] >> 11) & 53)
+	     + ((arr[2][i] >> 12) * 7)  + ((arr[3][i] >> 13) ^ 43);
+    }
+  return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
+}
+
+/* { dg-final { scan-tree-dump {vectorizable_shift ===[\n\r][^\n]*prologue_cost = 0} "vect" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2cb6b15..396ff15 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -5764,7 +5764,8 @@ vectorizable_shift (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     {
       STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
       DUMP_VECT_SCOPE ("vectorizable_shift");
-      vect_model_simple_cost (stmt_info, ncopies, dt, ndts, slp_node, cost_vec);
+      vect_model_simple_cost (stmt_info, ncopies, dt,
+			      scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
       return true;
     }
 
