On Tue, May 17, 2016 at 12:08 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Fri, May 13, 2016 at 5:53 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <bin.ch...@arm.com> wrote: >>>>Hi, >>>>As PR69848 reported, GCC vectorizer now generates comparison outside of >>>>VEC_COND_EXPR for COND_REDUCTION case, as below: >>>> >>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 }; >>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>; >>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>; >>>> >>>>This results in inefficient expanding. With IR like: >>>> >>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, >>>>0, 0 }, vect_c_2.7_13>; >>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>; >>>> >>>>We can do: >>>>1) Expanding time optimization, for example, reverting comparison >>>>operator by switching VEC_COND_EXPR operands. This is useful when >>>>backend only supports some comparison operators. >>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR >>>>instruction which introduced by expand_vec_cond_expr. >>>> >>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even >>>>if it's used multiple times. For now, GCC does single_use_only >>>>propagation. Ideally, we may duplicate the comparison before each use >>>>statement just before expanding, so that TER can successfully backtrack >>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to >>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so >>>>early that following CSE could undo the transform. Another possible >>>>fix is to generate comparison inside VEC_COND_EXPR directly in function >>>>vectorizable_reduction. >>> >>> I prefer this for now. >> Hi Richard, you mean this patch, or the possible fix before your comment? > > The possible fix before my comment - make the vectorizer generate > VEC_COND_EXPRs > with embedded comparison. Hi, Here is updated patch doing that. It's definitely clearer than the original version. Bootstrap and test on x86_64. Also checked the expanding time optimization still happens. Is it OK?
Thanks, bin > > Thanks, > Richard. >
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index d673c67..67053af 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -6159,21 +6159,14 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi, Finally, we update the phi (NEW_PHI_TREE) to take the value of the new cond_expr (INDEX_COND_EXPR). */ - /* Turn the condition from vec_stmt into an ssa name. */ - gimple_stmt_iterator vec_stmt_gsi = gsi_for_stmt (*vec_stmt); - tree ccompare = gimple_assign_rhs1 (*vec_stmt); - tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare)); - gimple *ccompare_stmt = gimple_build_assign (ccompare_name, - ccompare); - gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT); - gimple_assign_set_rhs1 (*vec_stmt, ccompare_name); - update_stmt (*vec_stmt); + /* Duplicate the condition from vec_stmt. */ + tree ccompare = unshare_expr (gimple_assign_rhs1 (*vec_stmt)); /* Create a conditional, where the condition is taken from vec_stmt - (CCOMPARE_NAME), then is the induction index (INDEX_BEFORE_INCR) - and else is the phi (NEW_PHI_TREE). */ + (CCOMPARE), then is the induction index (INDEX_BEFORE_INCR) and + else is the phi (NEW_PHI_TREE). */ tree index_cond_expr = build3 (VEC_COND_EXPR, cr_index_vector_type, - ccompare_name, indx_before_incr, + ccompare, indx_before_incr, new_phi_tree); cond_name = make_ssa_name (cr_index_vector_type); gimple *index_condition = gimple_build_assign (cond_name,