On Thu, May 18, 2023 at 7:59 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Thanks for the update. Some of these comments would have applied > to the first version, so sorry for not catching them first time. > > <oluwatamilore.adeb...@arm.com> writes: > > From: oluade01 <oluwatamilore.adeb...@arm.com> > > > > This adds a recognition pattern for the non-widening > > absolute difference (ABD). > > > > gcc/ChangeLog: > > > > * doc/md.texi (sabd, uabd): Document them. > > * internal-fn.def (ABD): Use new optab. > > * optabs.def (sabd_optab, uabd_optab): New optabs, > > * tree-vect-patterns.cc (vect_recog_absolute_difference): > > Recognize the following idiom abs (a - b). > > (vect_recog_sad_pattern): Refactor to use > > vect_recog_absolute_difference. > > (vect_recog_abd_pattern): Use patterns found by > > vect_recog_absolute_difference to build a new ABD > > internal call. > > --- > > gcc/doc/md.texi | 10 ++ > > gcc/internal-fn.def | 3 + > > gcc/optabs.def | 2 + > > gcc/tree-vect-patterns.cc | 255 +++++++++++++++++++++++++++++++++----- > > 4 files changed, 239 insertions(+), 31 deletions(-) > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > index > > 07bf8bdebffb2e523f25a41f2b57e43c0276b745..3e65584d7efcd301f2c96a40edd82d30b84462b8 > > 100644 > > --- a/gcc/doc/md.texi > > +++ b/gcc/doc/md.texi > > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the > > Vector shift and rotate instructions that take vectors as operand 2 > > instead of a scalar type. > > > > +@cindex @code{uabd@var{m}} instruction pattern > > +@cindex @code{sabd@var{m}} instruction pattern > > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}} > > +Signed and unsigned absolute difference instructions. These > > +instructions find the difference between operands 1 and 2 > > +then return the absolute value. A C code equivalent would be: > > +@smallexample > > +op0 = op0 > op1 ? op0 - op1 : op1 - op0; > > Should be: > > op0 = op1 > op2 ? op1 - op2 : op2 - op1; > > since op0 is the output. > > > +@end smallexample > > + > > @cindex @code{avg@var{m}3_floor} instruction pattern > > @cindex @code{uavg@var{m}3_floor} instruction pattern > > @item @samp{avg@var{m}3_floor} > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > > index > > 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468 > > 100644 > > --- a/gcc/internal-fn.def > > +++ b/gcc/internal-fn.def > > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) > > DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) > > DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) > > > > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first, > > + sabd, uabd, binary) > > + > > DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first, > > savg_floor, uavg_floor, binary) > > DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first, > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > index > > 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268 > > 100644 > > --- a/gcc/optabs.def > > +++ b/gcc/optabs.def > > @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, > > "mask_fold_left_plus_$a") > > OPTAB_D (extract_last_optab, "extract_last_$a") > > OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a") > > > > +OPTAB_D (uabd_optab, "uabd$a3") > > +OPTAB_D (sabd_optab, "sabd$a3") > > OPTAB_D (savg_floor_optab, "avg$a3_floor") > > OPTAB_D (uavg_floor_optab, "uavg$a3_floor") > > OPTAB_D (savg_ceil_optab, "avg$a3_ceil") > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > > index > > a49b09539776c0056e77f99b10365d0a8747fbc5..50f1822f220c023027f4b0f777965f3757842fa2 > > 100644 > > --- a/gcc/tree-vect-patterns.cc > > +++ b/gcc/tree-vect-patterns.cc > > @@ -770,6 +770,93 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info > > stmt2_info, tree new_rhs, > > } > > } > > > > +/* Look for the following pattern > > + X = x[i] > > + Y = y[i] > > + DIFF = X - Y > > + DAD = ABS_EXPR<DIFF> > > + > > + ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR. > > + If REJECT_UNSIGNED is true it aborts if the type of ABS_STMT is > > unsigned. > > + HALF_TYPE and UNPROM will be set should the statement be found to > > + be a widened operation. > > + DIFF_OPRNDS will be set to the two inputs of the MINUS_EXPR preceding > > + ABS_STMT, otherwise it will be set the operations found by > > + vect_widened_op_tree. > > + */ > > +static bool > > +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt, > > + tree *half_type, bool reject_unsigned, > > + vect_unpromoted_value unprom[2], > > + tree diff_oprnds[2]) > > +{ > > + if (!abs_stmt) > > + return false; > > + > > + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a > > phi > > + inside the loop (in case we are analyzing an outer-loop). */ > > + enum tree_code code = gimple_assign_rhs_code (abs_stmt); > > + if (code != ABS_EXPR && code != ABSU_EXPR) > > + return false; > > + > > + tree abs_oprnd = gimple_assign_rhs1 (abs_stmt); > > + tree abs_type = TREE_TYPE (abs_oprnd); > > + if (!abs_oprnd) > > + return false; > > + if (reject_unsigned && TYPE_UNSIGNED (abs_type)) > > + return false; > > + if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type)) > > + return false; > > Could you explain the reject_unsigned behaviour? I'd have expected > TYPE_OVERFLOW_WRAPS (abs_type) to reject the unsigned case anyway. > > > + > > + /* Peel off conversions from the ABS input. This can involve sign > > + changes (e.g. from an unsigned subtraction to a signed ABS input) > > + or signed promotion, but it can't include unsigned promotion. > > + (Note that ABS of an unsigned promotion should have been folded > > + away before now anyway.) */ > > I'm not sure we can do the "unsigned subtraction to a signed ABS input" > case. The code: > > int > f (unsigned int x, int y) > { > unsigned int diff = x - y; > return __builtin_abs (diff); > } > > is well-defined C (no undefined behaviour). But it doesn't do the > same thing as an absolute difference. > > So... > > > + vect_unpromoted_value unprom_diff; > > + abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd, > > + &unprom_diff); > > + if (!abs_oprnd) > > + return false; > > + if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (abs_type) > > + && TYPE_UNSIGNED (unprom_diff.type) > > + && TYPE_UNSIGNED (abs_type)) > > + return false; > > ...I think there are four valid cases: > > (1) !TYPE_UNSIGNED (abs_type) > && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (abs_oprnd)) > && TYPE_PRECISION (unprom_diff.type) == TYPE_PRECISION (abs_type) > && regular unwidened MINUS_EXPR > > Here the subtraction and ABS input are signed. We can assume that > there is no signed overflow in the subtraction. > > (2) !TYPE_UNSIGNED (abs_type) > && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (abs_oprnd)) > && TYPE_PRECISION (unprom_diff.type) < TYPE_PRECISION (abs_type) > && !TYPE_UNSIGNED (unprom_diff.type) > && regular unwidened MINUS_EXPR > > This is like (1) except that the MINUS_EXPR result is sign-extended > before the ABS. Again we rely on having no signed overflow, hence the > TYPE_OVERFLOW_UNDEFINED check on the subtraction type. We can then > push the sign-extension of the subtraction result “down” to the inputs > of the subtraction. > > (3) !TYPE_UNSIGNED (abs_type) > && TYPE_PRECISION (unprom_diff.type) == TYPE_PRECISION (abs_type) > && MINUS_EXPR of widened inputs > > In other words; > > - the ABS input is signed > - the MINUS_EXPR has the same precision as the ABS > - the MINUS_EXPR inputs are narrower than the result > > In this case the MINUS_EXPR is known to give correct results for > all inputs, due to the extra range. There's no need to rely on > TYPE_OVERFLOW_UNDEFINED. > > (4) !TYPE_UNSIGNED (abs_type) > && TYPE_PRECISION (unprom_diff.type) < TYPE_PRECISION (abs_type) > && !TYPE_UNSIGNED (unprom_diff.type) > && MINUS_EXPR of widened inputs > > Like (3), but the subtraction result is further sign-extended > (somewhat pointlessly). > > Which is a very convoluated way of classifying it, sorry.
There's also the possibility that you see ABSU_EXPR instead of ABS_EXPR. > I think this means that: > > (a) We should reject all cases in TYPE_UNSIGNED (abs_type) is true. > Absolute values with unsigned inputs are a niche game, if they're > valid at all. > > (b) We should require TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (abs_oprnd)) > whenever we see a MINUS_EXPR with unextended inputs (i.e. whenever > vect_widened_op_tree fails). > > I guess some of this is really a comment about the existing SAD_EXPR code. > > > + > > + /* We then detect if the operand of abs_expr is defined by a minus_expr. > > */ > > + stmt_vec_info diff_stmt_vinfo = vect_get_internal_def (vinfo, abs_oprnd); > > + if (!diff_stmt_vinfo) > > + return false; > > + > > + /* FORNOW. Can continue analyzing the def-use chain when this stmt in a > > phi > > + inside the loop (in case we are analyzing an outer-loop). */ > > + if (vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, > > + WIDEN_MINUS_EXPR, > > + false, 2, unprom, half_type)) > > + { > > + if (diff_oprnds) > > + { > > + diff_oprnds[0] = unprom[0].op; > > + diff_oprnds[1] = unprom[1].op; > > + } > > + return true; > > + } > > + > > + // Failed to find a widen operation so we check for a regular MINUS_EXPR > > + gassign *diff = dyn_cast <gassign *> (STMT_VINFO_STMT (diff_stmt_vinfo)); > > + if (diff_oprnds && diff && gimple_assign_rhs_code (diff) == MINUS_EXPR) > > + { > > + diff_oprnds[0] = gimple_assign_rhs1 (diff); > > + diff_oprnds[1] = gimple_assign_rhs2 (diff); > > + } > > + else > > + return false; > > + > > + *half_type = NULL_TREE; > > + > > + return true; > > Very minor, but I think it'd be clearer to put the *half_type assignment > and return true in the "if" body, and unconditionally return false > at the end. That makes it more consistent with the widening case. > > > +} > > + > > /* Convert UNPROM to TYPE and return the result, adding new statements > > to STMT_INFO's pattern definition statements if no better way is > > available. VECTYPE is the vector form of TYPE. > > @@ -1308,40 +1395,13 @@ vect_recog_sad_pattern (vec_info *vinfo, > > /* FORNOW. Can continue analyzing the def-use chain when this stmt in a > > phi > > inside the loop (in case we are analyzing an outer-loop). */ > > gassign *abs_stmt = dyn_cast <gassign *> (abs_stmt_vinfo->stmt); > > - if (!abs_stmt > > - || (gimple_assign_rhs_code (abs_stmt) != ABS_EXPR > > - && gimple_assign_rhs_code (abs_stmt) != ABSU_EXPR)) > > - return NULL; > > > > - tree abs_oprnd = gimple_assign_rhs1 (abs_stmt); > > - tree abs_type = TREE_TYPE (abs_oprnd); > > - if (TYPE_UNSIGNED (abs_type)) > > - return NULL; > > - > > - /* Peel off conversions from the ABS input. This can involve sign > > - changes (e.g. from an unsigned subtraction to a signed ABS input) > > - or signed promotion, but it can't include unsigned promotion. > > - (Note that ABS of an unsigned promotion should have been folded > > - away before now anyway.) */ > > - vect_unpromoted_value unprom_diff; > > - abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd, > > - &unprom_diff); > > - if (!abs_oprnd) > > - return NULL; > > - if (TYPE_PRECISION (unprom_diff.type) != TYPE_PRECISION (abs_type) > > - && TYPE_UNSIGNED (unprom_diff.type)) > > - return NULL; > > - > > - /* We then detect if the operand of abs_expr is defined by a minus_expr. > > */ > > - stmt_vec_info diff_stmt_vinfo = vect_get_internal_def (vinfo, abs_oprnd); > > - if (!diff_stmt_vinfo) > > + vect_unpromoted_value unprom[2]; > > + if (!vect_recog_absolute_difference (vinfo, abs_stmt, &half_type, > > + true, unprom, NULL)) > > return NULL; > > > > - /* FORNOW. Can continue analyzing the def-use chain when this stmt in a > > phi > > - inside the loop (in case we are analyzing an outer-loop). */ > > - vect_unpromoted_value unprom[2]; > > - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, > > WIDEN_MINUS_EXPR, > > - false, 2, unprom, &half_type)) > > + if (!half_type) > > return NULL; > > > > vect_pattern_detected ("vect_recog_sad_pattern", last_stmt); > > @@ -1363,6 +1423,138 @@ vect_recog_sad_pattern (vec_info *vinfo, > > return pattern_stmt; > > } > > > > +/* Function vect_recog_abd_pattern > > + > > + Try to find the following ABsolute Difference (ABD) pattern: > > + > > + VTYPE x, y, out; > > + type diff; > > + loop i in range: > > + S1 diff = x[i] - y[i] > > + S2 out[i] = ABS_EXPR <diff>; > > + > > + where 'type' is a integer and 'VTYPE' is a vector of integers > > + the same size as 'type' > > + > > + Input: > > + > > + * STMT_VINFO: The stmt from which the pattern search begins > > + > > + Output: > > + > > + * TYPE_out: The type of the output of this pattern > > + > > + * Return value: A new stmt that will be used to replace the sequence of > > + stmts that constitute the pattern; either SABD or UABD: > > + SABD_EXPR<x, y, out> > > + UABD_EXPR<x, y, out> > > + > > + UABD expressions are used when the input types are > > + narrower than the output types or the output type is narrower > > + than 32 bits > > + */ > > + > > +static gimple * > > +vect_recog_abd_pattern (vec_info *vinfo, > > + stmt_vec_info stmt_vinfo, tree *type_out) > > +{ > > + /* Look for the following patterns > > + X = x[i] > > + Y = y[i] > > + DIFF = X - Y > > + DAD = ABS_EXPR<DIFF> > > + out[i] = DAD > > + > > + In which > > + - X, Y, DIFF, DAD all have the same type > > + - x, y, out are all vectors of the same type > > + */ > > + gassign *last_stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_vinfo)); > > + if (!last_stmt) > > + return NULL; > > + > > + tree out_type = TREE_TYPE (gimple_assign_lhs (last_stmt)); > > + > > + gassign *abs_stmt = last_stmt; > > + if (gimple_assign_cast_p (last_stmt)) > > + { > > + tree last_rhs = gimple_assign_rhs1 (last_stmt); > > + if (!SSA_VAR_P (last_rhs)) > > + return NULL; > > + > > + abs_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (last_rhs)); > > + if (!abs_stmt) > > + return NULL; > > + } > > Could you explain why this is needed? I'd expect the function to be > called on the cast rhs too, so I wouldn't expect that we'd need to handle > casts explicitly. > > > + vect_unpromoted_value unprom[2]; > > + tree diff_oprnds[2]; > > + tree half_type; > > + if (!vect_recog_absolute_difference (vinfo, abs_stmt, &half_type, > > + false, unprom, diff_oprnds)) > > + return NULL; > > + > > +#define SAME_TYPE(A, B) (TYPE_PRECISION (A) == TYPE_PRECISION (B)) > > + > > + tree abd_oprnds[2]; > > + if (half_type) > > + { > > + if (!SAME_TYPE (unprom[0].type, unprom[1].type)) > > + return NULL; > > I wouldn't have expected this to be unecessary. half_type is supposed > to be a common type that can hold all values of unprom[0].type and > unprom[1].type. We should just be able to do: > > > + tree diff_type = TREE_TYPE (diff_oprnds[0]); > > + if (TYPE_PRECISION (out_type) != TYPE_PRECISION (diff_type)) > > + { > > + tree vectype = get_vectype_for_scalar_type (vinfo, half_type); > > + vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds, > > + half_type, unprom, vectype); > > ...this vect_convert_inputs unconditionally. We need to check that > the get_vectype_for_scalar_type call succeeds though. > > So does it work as: > > if (half_type) > { > tree vectype = get_vectype_for_scalar_type (vinfo, half_type); > if (!vectype) > return false; > vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds, > half_type, unprom, vectype); > } > > ? > > > + } > > + else > > + { > > + abd_oprnds[0] = diff_oprnds[0]; > > + abd_oprnds[1] = diff_oprnds[1]; > > + } > > + } > > + else > > + { > > + if (unprom[0].op && unprom[1].op > > + && (!SAME_TYPE (unprom[0].type, unprom[1].type) > > + || !SAME_TYPE (unprom[0].type, out_type))) > > + return NULL; > > AIUI, the !half_type case shouldn't look at unprom, since it's handling > simple MINUS_EXPRs. I think we can just delete this "if" statement. > > > + > > + unprom[0].op = diff_oprnds[0]; > > + unprom[1].op = diff_oprnds[1]; > > + tree signed_out = signed_type_for (out_type); > > + tree signed_out_vectype = get_vectype_for_scalar_type (vinfo, > > signed_out); > > We need to check for success here too. > > > + vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds, > > + signed_out, unprom, signed_out_vectype); > > + > > + if (!SAME_TYPE (TREE_TYPE (diff_oprnds[0]), TREE_TYPE > > (abd_oprnds[0]))) > > + return NULL; > > I don't think this is needed. > > > + } > > + > > + if (!SAME_TYPE (TREE_TYPE (abd_oprnds[0]), TREE_TYPE (abd_oprnds[1])) > > + || !SAME_TYPE (TREE_TYPE (abd_oprnds[0]), out_type)) > > + return NULL; > > I also don't think this is needed. AIUI, the previous code has done > all the necessary correctness checks. > > > + > > + vect_pattern_detected ("vect_recog_abd_pattern", last_stmt); > > + > > + tree vectype = get_vectype_for_scalar_type (vinfo, out_type); > > I think instead we want the vector types computed above. That is: > > - The ABD should be done on the vector version of half_type > if the subtraction was on promoted inputs. The result of > the ABD should then be zero-extended (using vect_convert_output) > to out_type. > > In particular, it's the sign of HALF_TYPE that decides whether > it's signed or unsigned ABD. > > - The ABD should be done on the vector version of signed_outtype > if the subtraction was on unpromoted inputs. We then might need > to sign-cast it to outtype, if outtype is unsigned. We can > use vect_convert_output for that too. > > In other words, this case must use signed ABD. > > Hope I've got that right... > > Thanks, > Richard > > > + if (!vectype > > + || !direct_internal_fn_supported_p (IFN_ABD, vectype, > > + OPTIMIZE_FOR_SPEED)) > > + return NULL; > > + > > + *type_out = STMT_VINFO_VECTYPE (stmt_vinfo); > > + > > + tree var = vect_recog_temp_ssa_var (out_type, NULL); > > + gcall *abd_stmt = gimple_build_call_internal (IFN_ABD, 2, > > + abd_oprnds[0], abd_oprnds[1]); > > + gimple_call_set_lhs (abd_stmt, var); > > + gimple_set_location (abd_stmt, gimple_location (last_stmt)); > > + return abd_stmt; > > +} > > + > > /* Recognize an operation that performs ORIG_CODE on widened inputs, > > so that it can be treated as though it had the form: > > > > @@ -6439,6 +6631,7 @@ struct vect_recog_func > > static vect_recog_func vect_vect_recog_func_ptrs[] = { > > { vect_recog_bitfield_ref_pattern, "bitfield_ref" }, > > { vect_recog_bit_insert_pattern, "bit_insert" }, > > + { vect_recog_abd_pattern, "abd" }, > > { vect_recog_over_widening_pattern, "over_widening" }, > > /* Must come after over_widening, which narrows the shift as much as > > possible beforehand. */