On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Hi All,
>
> Here is a patch for 481.wrf preformance regression for avx2 which is
> sligthly modified mask store optimization. This transformation allows
> perform unpredication for semi-hammock containing masked stores, other
> words if we have a loop like
> for (i=0; i<n; i++)
>   if (c[i]) {
>     p1[i] += 1;
>     p2[i] = p3[i] +2;
>   }
>
> then it will be transformed to
>    if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>      vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
>      vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>      MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
>      vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
>      vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>      MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
>    }
> i.e. it will put all computations related to masked stores to semi-hammock.
>
> Bootstrapping and regression testing did not show any new failures.

Can you please split out the middle-end support for vector equality compares?

@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
       if (TREE_CODE (op0_type) == VECTOR_TYPE
          || TREE_CODE (op1_type) == VECTOR_TYPE)
         {
-          error ("vector comparison returning a boolean");
-          debug_generic_expr (op0_type);
-          debug_generic_expr (op1_type);
-          return true;
+         /* Allow vector comparison returning boolean if operand types
+            are equal and CODE is EQ/NE.  */
+         if ((code != EQ_EXPR && code != NE_EXPR)
+             || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
+                  || VECTOR_INTEGER_TYPE_P (op0_type)))
+           {
+             error ("type mismatch for vector comparison returning a boolean");
+             debug_generic_expr (op0_type);
+             debug_generic_expr (op1_type);
+             return true;
+           }
         }
     }

please merge the conditions with a &&

@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
tree type, tree op0, tree op1)

   if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
     {
+      if (INTEGRAL_TYPE_P (type)
+         && (TREE_CODE (type) == BOOLEAN_TYPE
+             || TYPE_PRECISION (type) == 1))
+       {
+         /* Have vector comparison with scalar boolean result.  */
+         bool result = true;
+         gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+         gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+         for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+           {
+             tree elem0 = VECTOR_CST_ELT (op0, i);
+             tree elem1 = VECTOR_CST_ELT (op1, i);
+             tree tmp = fold_relational_const (code, type, elem0, elem1);
+             result &= integer_onep (tmp);
+         if (code == NE_EXPR)
+           result = !result;
+         return constant_boolean_node (result, type);

... just assumes it is either EQ_EXPR or NE_EXPR.   I believe you want
to change the
guarding condition to just

   if (! VECTOR_TYPE_P (type))

and assert the boolean/precision.  Please also merge the asserts into
one with &&

diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,

   gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);

+  /* Do not perform combining it types are not compatible.  */
+  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+      && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+    return NULL_TREE;
+

again, how does this happen?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
gimple_stmt_iterator si,
                                                &comp_code, &val))
     return;

+  /* Use of vector comparison in gcond is very restricted and used to check
+     that the mask in masked store is zero, so assert for such comparison
+     is not implemented yet.  */
+  if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+    return;
+

VECTOR_TYPE_P

I believe the comment should simply say that VRP doesn't track ranges for
vector types.

In the previous review I suggested you should make sure that RTL expansion
ends up using a well-defined optab for these compares.  To make sure
this happens across targets I suggest you make these comparisons available
via the GCC vector extension.  Thus allow

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si a, v4si b)
{
  if (a == b)
    return 4;
}

and != and also using floating point vectors.

Otherwise it's hard to see the impact of this change.  Obvious choices
are the eq/ne optabs for FP compares and [u]cmp optabs for integer
compares.

A half-way implementation like your VRP comment suggests (only
==/!= zero against integer vectors is implemented?!) this doesn't sound
good without also limiting the feature this way in the verifier.

Btw, the regression with WRF is >50% on AMD Bulldozer (which only
has AVX, not AVX2).

Thanks,
Richard.

> ChangeLog:
> 2015-11-30  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
> for vector comparion with eq/ne only.
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vect-loop.c (is_valid_sink): New function.
> (optimize_mask_stores): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
> type.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.

Reply via email to