Hi James Sorry for the late reply.
On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh <james.greenha...@arm.com> wrote: >> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote: >> > Thank you for your detailed explanation. >> > >> > Once GCC detects a reduction operation, it will automatically >> > accumulate all elements in the vector after the loop. In the loop the >> > reduction variable is always a vector whose elements are reductions of >> > corresponding values from other vectors. Therefore in your case the >> > only instruction you need to generate is: >> > >> > VABAL ops[3], ops[1], ops[2] >> > >> > It is OK if you accumulate the elements into one in the vector inside >> > of the loop (if one instruction can do this), but you have to make >> > sure other elements in the vector should remain zero so that the final >> > result is correct. >> > >> > If you are confused about the documentation, check the one for >> > udot_prod (just above usad in md.texi), as it has very similar >> > behavior as usad. Actually I copied the text from there and did some >> > changes. As those two instruction patterns are both for vectorization, >> > their behavior should not be difficult to explain. >> > >> > If you have more questions or think that the documentation is still >> > improper please let me know. > > Hi Cong, > > Thanks for your reply. > > I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and > DOT_PROD_EXPR and I see that the same ambiguity exists for > DOT_PROD_EXPR. Can you please add a note in your tree.def > that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either: > > tmp = WIDEN_MINUS_EXPR (arg1, arg2) > tmp2 = ABS_EXPR (tmp) > arg3 = PLUS_EXPR (tmp2, arg3) > > or: > > tmp = WIDEN_MINUS_EXPR (arg1, arg2) > tmp2 = ABS_EXPR (tmp) > arg3 = WIDEN_SUM_EXPR (tmp2, arg3) > > Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a > a value of the same (widened) type as arg3. > I have added it, although we currently don't have WIDEN_MINUS_EXPR (I mentioned it in tree.def). > Also, while looking for the history of DOT_PROD_EXPR I spotted this > patch: > > [autovect] [patch] detect mult-hi and sad patterns > http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html > > I wonder what the reason was for that patch to be dropped? > It has been 8 years.. I have no idea why this patch is not accepted finally. There is even no reply in that thread. But I believe the SAD pattern is very important to be recognized. ARM also provides instructions for it. Thank you for your comment again! thanks, Cong > Thanks, > James >
diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6bdaa31..37ff6c4 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,4 +1,24 @@ -2013-11-01 Trevor Saunders <tsaund...@mozilla.com> +2013-10-29 Cong Hou <co...@google.com> + + * tree-vect-patterns.c (vect_recog_sad_pattern): New function for SAD + pattern recognition. + (type_conversion_p): PROMOTION is true if it's a type promotion + conversion, and false otherwise. Return true if the given expression + is a type conversion one. + * tree-vectorizer.h: Adjust the number of patterns. + * tree.def: Add SAD_EXPR. + * optabs.def: Add sad_optab. + * cfgexpand.c (expand_debug_expr): Add SAD_EXPR case. + * expr.c (expand_expr_real_2): Likewise. + * gimple-pretty-print.c (dump_ternary_rhs): Likewise. + * gimple.c (get_gimple_rhs_num_ops): Likewise. + * optabs.c (optab_for_tree_code): Likewise. + * tree-cfg.c (estimate_operator_cost): Likewise. + * tree-ssa-operands.c (get_expr_operands): Likewise. + * tree-vect-loop.c (get_initial_def_for_reduction): Likewise. + * config/i386/sse.md: Add SSE2 and AVX2 expand for SAD. + * doc/generic.texi: Add document for SAD_EXPR. + * doc/md.texi: Add document for ssad and usad. * function.c (reorder_blocks): Convert block_stack to a stack_vec. * gimplify.c (gimplify_compound_lval): Likewise. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index fb05ce7..1f824fb 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2740,6 +2740,7 @@ expand_debug_expr (tree exp) { case COND_EXPR: case DOT_PROD_EXPR: + case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case FMA_EXPR: diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9094a1c..af73817 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -7278,6 +7278,36 @@ DONE; }) +(define_expand "usadv16qi" + [(match_operand:V4SI 0 "register_operand") + (match_operand:V16QI 1 "register_operand") + (match_operand:V16QI 2 "nonimmediate_operand") + (match_operand:V4SI 3 "nonimmediate_operand")] + "TARGET_SSE2" +{ + rtx t1 = gen_reg_rtx (V2DImode); + rtx t2 = gen_reg_rtx (V4SImode); + emit_insn (gen_sse2_psadbw (t1, operands[1], operands[2])); + convert_move (t2, t1, 0); + emit_insn (gen_addv4si3 (operands[0], t2, operands[3])); + DONE; +}) + +(define_expand "usadv32qi" + [(match_operand:V8SI 0 "register_operand") + (match_operand:V32QI 1 "register_operand") + (match_operand:V32QI 2 "nonimmediate_operand") + (match_operand:V8SI 3 "nonimmediate_operand")] + "TARGET_AVX2" +{ + rtx t1 = gen_reg_rtx (V4DImode); + rtx t2 = gen_reg_rtx (V8SImode); + emit_insn (gen_avx2_psadbw (t1, operands[1], operands[2])); + convert_move (t2, t1, 0); + emit_insn (gen_addv8si3 (operands[0], t2, operands[3])); + DONE; +}) + (define_insn "ashr<mode>3" [(set (match_operand:VI24_AVX2 0 "register_operand" "=x,x") (ashiftrt:VI24_AVX2 diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 73dd123..fa9a19a 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1713,6 +1713,7 @@ a value from @code{enum annot_expr_kind}. @tindex VEC_PACK_TRUNC_EXPR @tindex VEC_PACK_SAT_EXPR @tindex VEC_PACK_FIX_TRUNC_EXPR +@tindex SAD_EXPR @table @code @item VEC_LSHIFT_EXPR @@ -1793,6 +1794,15 @@ value, it is taken from the second operand. It should never evaluate to any other value currently, but optimizations should not rely on that property. In contrast with a @code{COND_EXPR}, all operands are always evaluated. + +@item SAD_EXPR +This node represents the Sum of Absolute Differences operation. The three +operands must be vectors of integral types. The first and second operand +must have the same type. The size of the vector element of the third +operand must be at lease twice of the size of the vector element of the +first and second one. The SAD is calculated between the first and second +operands, added to the third operand, and returned. + @end table diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index ac10a0a..8142a7f 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4786,6 +4786,16 @@ wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or wider than the mode of the product. The result is placed in operand 0, which is of the same mode as operand 3. +@cindex @code{ssad@var{m}} instruction pattern +@item @samp{ssad@var{m}} +@cindex @code{usad@var{m}} instruction pattern +@item @samp{usad@var{m}} +Compute the sum of absolute differences of two signed/unsigned elements. +Operand 1 and operand 2 are of the same mode. Their absolute difference, which +is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode +equal or wider than the mode of the absolute difference. The result is placed +in operand 0, which is of the same mode as operand 3. + @cindex @code{ssum_widen@var{m3}} instruction pattern @item @samp{ssum_widen@var{m3}} @cindex @code{usum_widen@var{m3}} instruction pattern diff --git a/gcc/expr.c b/gcc/expr.c index 551a660..ce45a79 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -8973,6 +8973,20 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, return target; } + case SAD_EXPR: + { + tree oprnd0 = treeop0; + tree oprnd1 = treeop1; + tree oprnd2 = treeop2; + rtx op2; + + expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL); + op2 = expand_normal (oprnd2); + target = expand_widen_pattern_expr (ops, op0, op1, op2, + target, unsignedp); + return target; + } + case REALIGN_LOAD_EXPR: { tree oprnd0 = treeop0; diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index 6842213..56a90f1 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -429,6 +429,16 @@ dump_ternary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags) dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); pp_greater (buffer); break; + + case SAD_EXPR: + pp_string (buffer, "SAD_EXPR <"); + dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); + pp_string (buffer, ", "); + dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); + pp_greater (buffer); + break; case VEC_PERM_EXPR: pp_string (buffer, "VEC_PERM_EXPR <"); diff --git a/gcc/gimple.c b/gcc/gimple.c index 20f6010..599d1f6 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -2582,6 +2582,7 @@ get_gimple_rhs_num_ops (enum tree_code code) || (SYM) == WIDEN_MULT_PLUS_EXPR \ || (SYM) == WIDEN_MULT_MINUS_EXPR \ || (SYM) == DOT_PROD_EXPR \ + || (SYM) == SAD_EXPR \ || (SYM) == REALIGN_LOAD_EXPR \ || (SYM) == VEC_COND_EXPR \ || (SYM) == VEC_PERM_EXPR \ diff --git a/gcc/optabs.c b/gcc/optabs.c index 3755670..564aefe 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -462,6 +462,9 @@ optab_for_tree_code (enum tree_code code, const_tree type, case DOT_PROD_EXPR: return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab; + case SAD_EXPR: + return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab; + case WIDEN_MULT_PLUS_EXPR: return (TYPE_UNSIGNED (type) ? (TYPE_SATURATING (type) diff --git a/gcc/optabs.def b/gcc/optabs.def index 6b924ac..377763e 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -248,6 +248,8 @@ OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") OPTAB_D (udot_prod_optab, "udot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") +OPTAB_D (usad_optab, "usad$I$a") +OPTAB_D (ssad_optab, "ssad$I$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index e9bd852..9528325 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-29 Cong Hou <co...@google.com> + + * gcc.dg/vect/vect-reduc-sad.c: New. + * lib/target-supports.exp (check_effective_target_vect_usad_char): New. 2013-11-01 Marc Glisse <marc.gli...@inria.fr> PR c++/58834 diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c new file mode 100644 index 0000000..15a625f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad.c @@ -0,0 +1,54 @@ +/* { dg-require-effective-target vect_usad_char } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 64 +#define SAD N*N/2 + +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); + +/* Sum of absolute differences between arrays of unsigned char types. + Detected as a sad pattern. + Vectorized on targets that support sad for unsigned chars. */ + +__attribute__ ((noinline)) int +foo (int len) +{ + int i; + int result = 0; + + for (i = 0; i < len; i++) + result += abs (X[i] - Y[i]); + + return result; +} + + +int +main (void) +{ + int i; + int sad; + + check_vect (); + + for (i = 0; i < N; i++) + { + X[i] = i; + Y[i] = N - i; + __asm__ volatile (""); + } + + sad = foo (N); + if (sad != SAD) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vect_recog_sad_pattern: detected" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 5ca0b76..3ab71ea 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3685,6 +3685,26 @@ proc check_effective_target_vect_udot_hi { } { return $et_vect_udot_hi_saved } +# Return 1 if the target plus current options supports a vector +# sad operation of unsigned chars, 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_usad_char { } { + global et_vect_usad_char + + if [info exists et_vect_usad_char_saved] { + verbose "check_effective_target_vect_usad_char: using cached result" 2 + } else { + set et_vect_usad_char_saved 0 + if { ([istarget i?86-*-*] + || [istarget x86_64-*-*]) } { + set et_vect_usad_char_saved 1 + } + } + verbose "check_effective_target_vect_usad_char: returning $et_vect_usad_char_saved" 2 + return $et_vect_usad_char_saved +} # Return 1 if the target plus current options supports a vector # demotion (packing) of shorts (to chars) and ints (to shorts) diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index d646693..e63fce0 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -3830,6 +3830,36 @@ verify_gimple_assign_ternary (gimple stmt) return false; + case SAD_EXPR: + if (!useless_type_conversion_p (rhs1_type, rhs2_type) + || !useless_type_conversion_p (lhs_type, rhs3_type) + || 2 * GET_MODE_BITSIZE (GET_MODE_INNER + (TYPE_MODE (TREE_TYPE (rhs1_type)))) + > GET_MODE_BITSIZE (GET_MODE_INNER + (TYPE_MODE (TREE_TYPE (lhs_type))))) + { + error ("type mismatch in sad expression"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + debug_generic_expr (rhs2_type); + debug_generic_expr (rhs3_type); + return true; + } + + if (TREE_CODE (rhs1_type) != VECTOR_TYPE + || TREE_CODE (rhs2_type) != VECTOR_TYPE + || TREE_CODE (rhs3_type) != VECTOR_TYPE) + { + error ("vector types expected in sad expression"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + debug_generic_expr (rhs2_type); + debug_generic_expr (rhs3_type); + return true; + } + + return false; + case DOT_PROD_EXPR: case REALIGN_LOAD_EXPR: /* FIXME. */ diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 77013b3..a9a23ce 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -3605,6 +3605,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights, case WIDEN_SUM_EXPR: case WIDEN_MULT_EXPR: case DOT_PROD_EXPR: + case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case WIDEN_LSHIFT_EXPR: diff --git a/gcc/tree-ssa-operands.c b/gcc/tree-ssa-operands.c index 4e05d2d..b44aca8 100644 --- a/gcc/tree-ssa-operands.c +++ b/gcc/tree-ssa-operands.c @@ -859,6 +859,7 @@ get_expr_operands (gimple stmt, tree *expr_p, int flags) } case DOT_PROD_EXPR: + case SAD_EXPR: case REALIGN_LOAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index d5f86ad..7700f0c 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -3620,6 +3620,7 @@ get_initial_def_for_reduction (gimple stmt, tree init_val, { case WIDEN_SUM_EXPR: case DOT_PROD_EXPR: + case SAD_EXPR: case PLUS_EXPR: case MINUS_EXPR: case BIT_IOR_EXPR: diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index 0998804..4f873b2 100644 --- a/gcc/tree-vect-patterns.c +++ b/gcc/tree-vect-patterns.c @@ -49,6 +49,8 @@ static gimple vect_recog_widen_mult_pattern (vec<gimple> *, tree *, tree *); static gimple vect_recog_dot_prod_pattern (vec<gimple> *, tree *, tree *); +static gimple vect_recog_sad_pattern (vec<gimple> *, tree *, + tree *); static gimple vect_recog_pow_pattern (vec<gimple> *, tree *, tree *); static gimple vect_recog_over_widening_pattern (vec<gimple> *, tree *, tree *); @@ -66,6 +68,7 @@ static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = { vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern, vect_recog_dot_prod_pattern, + vect_recog_sad_pattern, vect_recog_pow_pattern, vect_recog_widen_shift_pattern, vect_recog_over_widening_pattern, @@ -144,9 +147,8 @@ vect_single_imm_use (gimple def_stmt) } /* Check whether NAME, an ssa-name used in USE_STMT, - is a result of a type promotion or demotion, such that: + is a result of a type promotion, such that: DEF_STMT: NAME = NOP (name0) - where the type of name0 (ORIG_TYPE) is smaller/bigger than the type of NAME. If CHECK_SIGN is TRUE, check that either both types are signed or both are unsigned. */ @@ -193,10 +195,8 @@ type_conversion_p (tree name, gimple use_stmt, bool check_sign, if (TYPE_PRECISION (type) >= (TYPE_PRECISION (*orig_type) * 2)) *promotion = true; - else if (TYPE_PRECISION (*orig_type) >= (TYPE_PRECISION (type) * 2)) - *promotion = false; else - return false; + *promotion = false; if (!vect_is_simple_use (oprnd0, *def_stmt, loop_vinfo, bb_vinfo, &dummy_gimple, &dummy, &dt)) @@ -437,6 +437,240 @@ vect_recog_dot_prod_pattern (vec<gimple> *stmts, tree *type_in, } +/* Function vect_recog_sad_pattern + + Try to find the following Sum of Absolute Difference (SAD) pattern: + + type x_t, y_t; + signed TYPE1 diff, abs_diff; + TYPE2 sum = init; + loop: + sum_0 = phi <init, sum_1> + S1 x_t = ... + S2 y_t = ... + S3 x_T = (TYPE1) x_t; + S4 y_T = (TYPE1) y_t; + S5 diff = x_T - y_T; + S6 abs_diff = ABS_EXPR <diff>; + [S7 abs_diff = (TYPE2) abs_diff; #optional] + S8 sum_1 = abs_diff + sum_0; + + where 'TYPE1' is at least double the size of type 'type', and 'TYPE2' is the + same size of 'TYPE1' or bigger. This is a special case of a reduction + computation. + + Input: + + * STMTS: Contains a stmt from which the pattern search begins. In the + example, when this function is called with S8, the pattern + {S3,S4,S5,S6,S7,S8} will be detected. + + Output: + + * TYPE_IN: The type of the input arguments to the pattern. + + * TYPE_OUT: The type of the output of this pattern. + + * Return value: A new stmt that will be used to replace the sequence of + stmts that constitute the pattern. In this case it will be: + SAD_EXPR <x_t, y_t, sum_0> + */ + +static gimple +vect_recog_sad_pattern (vec<gimple> *stmts, tree *type_in, + tree *type_out) +{ + gimple last_stmt = (*stmts)[0]; + tree sad_oprnd0, sad_oprnd1; + stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); + tree half_type; + loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo); + struct loop *loop; + bool promotion; + + if (!loop_info) + return NULL; + + loop = LOOP_VINFO_LOOP (loop_info); + + if (!is_gimple_assign (last_stmt)) + return NULL; + + tree sum_type = gimple_expr_type (last_stmt); + + /* Look for the following pattern + DX = (TYPE1) X; + DY = (TYPE1) Y; + DDIFF = DX - DY; + DAD = ABS_EXPR <DDIFF>; + DDPROD = (TYPE2) DPROD; + sum_1 = DAD + sum_0; + In which + - DX is at least double the size of X + - DY is at least double the size of Y + - DX, DY, DDIFF, DAD all have the same type + - sum is the same size of DAD or bigger + - sum has been recognized as a reduction variable. + + This is equivalent to: + DDIFF = X w- Y; #widen sub + DAD = ABS_EXPR <DDIFF>; + sum_1 = DAD w+ sum_0; #widen summation + or + DDIFF = X w- Y; #widen sub + DAD = ABS_EXPR <DDIFF>; + sum_1 = DAD + sum_0; #summation + */ + + /* Starting from LAST_STMT, follow the defs of its uses in search + of the above pattern. */ + + if (gimple_assign_rhs_code (last_stmt) != PLUS_EXPR) + return NULL; + + tree plus_oprnd0, plus_oprnd1; + + if (STMT_VINFO_IN_PATTERN_P (stmt_vinfo)) + { + /* Has been detected as widening-summation? */ + + gimple stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo); + sum_type = gimple_expr_type (stmt); + if (gimple_assign_rhs_code (stmt) != WIDEN_SUM_EXPR) + return NULL; + plus_oprnd0 = gimple_assign_rhs1 (stmt); + plus_oprnd1 = gimple_assign_rhs2 (stmt); + half_type = TREE_TYPE (plus_oprnd0); + } + else + { + gimple def_stmt; + + if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_reduction_def) + return NULL; + plus_oprnd0 = gimple_assign_rhs1 (last_stmt); + plus_oprnd1 = gimple_assign_rhs2 (last_stmt); + if (!types_compatible_p (TREE_TYPE (plus_oprnd0), sum_type) + || !types_compatible_p (TREE_TYPE (plus_oprnd1), sum_type)) + return NULL; + + /* The type conversion could be promotion, demotion, + or just signed -> unsigned. */ + if (type_conversion_p (plus_oprnd0, last_stmt, false, + &half_type, &def_stmt, &promotion)) + plus_oprnd0 = gimple_assign_rhs1 (def_stmt); + else + half_type = sum_type; + } + + /* So far so good. Since last_stmt was detected as a (summation) reduction, + we know that plus_oprnd1 is the reduction variable (defined by a loop-header + phi), and plus_oprnd0 is an ssa-name defined by a stmt in the loop body. + Then check that plus_oprnd0 is defined by an abs_expr */ + + if (TREE_CODE (plus_oprnd0) != SSA_NAME) + return NULL; + + tree abs_type = half_type; + gimple abs_stmt = SSA_NAME_DEF_STMT (plus_oprnd0); + + /* It could not be the sad pattern if the abs_stmt is outside the loop. */ + if (!gimple_bb (abs_stmt) || !flow_bb_inside_loop_p (loop, gimple_bb (abs_stmt))) + return NULL; + + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi + inside the loop (in case we are analyzing an outer-loop). */ + if (!is_gimple_assign (abs_stmt)) + return NULL; + + stmt_vec_info abs_stmt_vinfo = vinfo_for_stmt (abs_stmt); + gcc_assert (abs_stmt_vinfo); + if (STMT_VINFO_DEF_TYPE (abs_stmt_vinfo) != vect_internal_def) + return NULL; + if (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR) + return NULL; + + tree abs_oprnd = gimple_assign_rhs1 (abs_stmt); + if (!types_compatible_p (TREE_TYPE (abs_oprnd), abs_type)) + return NULL; + if (TYPE_UNSIGNED (abs_type)) + return NULL; + + /* We then detect if the operand of abs_expr is defined by a minus_expr. */ + + if (TREE_CODE (abs_oprnd) != SSA_NAME) + return NULL; + + gimple diff_stmt = SSA_NAME_DEF_STMT (abs_oprnd); + + /* It could not be the sad pattern if the diff_stmt is outside the loop. */ + if (!gimple_bb (diff_stmt) + || !flow_bb_inside_loop_p (loop, gimple_bb (diff_stmt))) + return NULL; + + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi + inside the loop (in case we are analyzing an outer-loop). */ + if (!is_gimple_assign (diff_stmt)) + return NULL; + + stmt_vec_info diff_stmt_vinfo = vinfo_for_stmt (diff_stmt); + gcc_assert (diff_stmt_vinfo); + if (STMT_VINFO_DEF_TYPE (diff_stmt_vinfo) != vect_internal_def) + return NULL; + if (gimple_assign_rhs_code (diff_stmt) != MINUS_EXPR) + return NULL; + + tree half_type0, half_type1; + gimple def_stmt; + + tree minus_oprnd0 = gimple_assign_rhs1 (diff_stmt); + tree minus_oprnd1 = gimple_assign_rhs2 (diff_stmt); + + if (!types_compatible_p (TREE_TYPE (minus_oprnd0), abs_type) + || !types_compatible_p (TREE_TYPE (minus_oprnd1), abs_type)) + return NULL; + if (!type_conversion_p (minus_oprnd0, diff_stmt, false, + &half_type0, &def_stmt, &promotion) + || !promotion) + return NULL; + sad_oprnd0 = gimple_assign_rhs1 (def_stmt); + + if (!type_conversion_p (minus_oprnd1, diff_stmt, false, + &half_type1, &def_stmt, &promotion) + || !promotion) + return NULL; + sad_oprnd1 = gimple_assign_rhs1 (def_stmt); + + if (!types_compatible_p (half_type0, half_type1)) + return NULL; + if (TYPE_PRECISION (abs_type) < TYPE_PRECISION (half_type0) * 2 + || TYPE_PRECISION (sum_type) < TYPE_PRECISION (half_type0) * 2) + return NULL; + + *type_in = TREE_TYPE (sad_oprnd0); + *type_out = sum_type; + + /* Pattern detected. Create a stmt to be used to replace the pattern: */ + tree var = vect_recog_temp_ssa_var (sum_type, NULL); + gimple pattern_stmt = gimple_build_assign_with_ops + (SAD_EXPR, var, sad_oprnd0, sad_oprnd1, plus_oprnd1); + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "vect_recog_sad_pattern: detected: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, pattern_stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + + /* We don't allow changing the order of the computation in the inner-loop + when doing outer-loop vectorization. */ + gcc_assert (!nested_in_vect_loop_p (loop, last_stmt)); + + return pattern_stmt; +} + + /* Handle widening operation by a constant. At the moment we support MULT_EXPR and LSHIFT_EXPR. diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index a2f482d..d3a8137 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -1046,7 +1046,7 @@ extern void vect_slp_transform_bb (basic_block); Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (vec<gimple> *, tree *, tree *); -#define NUM_PATTERNS 11 +#define NUM_PATTERNS 12 void vect_pattern_recog (loop_vec_info, bb_vec_info); /* In tree-vectorizer.c. */ diff --git a/gcc/tree.def b/gcc/tree.def index 399b5af..e9a147a 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1160,6 +1160,22 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) with the second argument. */ DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2) +/* Widening sad (sum of absolute differences). + The first two arguments are of type t1 which should be integer. + The third argument and the result are of type t2, such that t2 is at least + twice the size of t1. Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is + equivalent to (note we don't have WIDEN_MINUS_EXPR now, but we assume its + behavior is similar to WIDEN_SUM_EXPR): + tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp2 = ABS_EXPR (tmp) + arg3 = PLUS_EXPR (tmp2, arg3) + or: + tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp2 = ABS_EXPR (tmp) + arg3 = WIDEN_SUM_EXPR (tmp2, arg3) + */ +DEFTREECODE (SAD_EXPR, "sad_expr", tcc_expression, 3) + /* Widening multiplication. The two arguments are of type t1. The result is of type t2, such that t2 is at least twice