On Wed, 10 Sep 2025, Avinash Jayakar wrote:
> Hi,
>
> The following patch implements the vectorization logic for FLOOR_MOD_EXPR and
> FLOOR_DIV_EXPR.  According to the logic mentioned in the PR, for signed
> operands
>
>     r = x %[fl] y;
>
> is
>
>     r = x % y; if (r && (x ^ y) < 0) r += y;
>
> and
>
>     d = x /[fl] y;
>
> is
>
>     r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
>
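
As a reference, here are the two identities in plain scalar C with a couple
of spot checks.  This is only an illustration, not part of the patch, and the
helper names floor_div/floor_mod are made up:

  #include <assert.h>

  /* x /[fl] y: truncating division, then round towards -inf when the
     operands have opposite signs and the division is inexact.  */
  static int
  floor_div (int x, int y)
  {
    int r = x % y, d = x / y;
    if (r != 0 && (x ^ y) < 0)
      --d;
    return d;
  }

  /* x %[fl] y: the remainder matching floor_div, i.e. it takes the sign
     of y (or is zero).  */
  static int
  floor_mod (int x, int y)
  {
    int r = x % y;
    if (r != 0 && (x ^ y) < 0)
      r += y;
    return r;
  }

  int
  main (void)
  {
    assert (floor_div (-7, 2) == -4 && floor_mod (-7, 2) == 1);
    assert (floor_div (7, -2) == -4 && floor_mod (7, -2) == -1);
    assert (floor_div (7, 2) == 3 && floor_mod (7, 2) == 1);
    return 0;
  }
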
> The first part enables FLOOR_{MOD,DIV}_EXPR in the switch case.  Since the
> second operand is always a constant (check done in line 4875), we check
> whether the operands are unsigned by seeing if the first operand has
> unsigned type and the second operand is greater than zero.  If so, then
> FLOOR_{DIV,MOD} is the same as TRUNC_{DIV,MOD}.
>
> For the signed operands, the logic written above is implemented right after
> the TRUNC_MOD_EXPR handling ends, since the remainder is needed in both
> cases.  The pseudo code for the vector implementation is as follows (op0 and
> op1 are the operands, r is the remainder op0 % op1, and d is the truncating
> quotient op0 / op1):
>
> v1 = op0 ^ op1
> v2 = (v1 < 0)
> v3 = (r!=0)
> v4 = (v3 && v2)
>
> // For FLOOR_MOD_EXPR
> result = r + (v4 ? op1 : 0)
> // For FLOOR_DIV_EXPR
> result = d - (v4 ? 1 : 0)
>
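
For comparison, the same compensation written branchlessly with GCC's generic
vector extensions; this only illustrates the select-free form the pattern aims
for and is not part of the patch (the typedef and function names are made up):

  typedef int v4si __attribute__ ((vector_size (16)));

  /* r = x %[fl] y, element-wise; mask lanes are -1 where the fixup applies.  */
  static v4si
  floor_mod_v4si (v4si x, v4si y)
  {
    const v4si zero = { 0, 0, 0, 0 };
    v4si r = x % y;                          /* TRUNC_MOD */
    v4si mask = ((x ^ y) < zero) & (r != zero);
    return r + (y & mask);                   /* r += y in the selected lanes */
  }

  /* d = x /[fl] y, element-wise.  */
  static v4si
  floor_div_v4si (v4si x, v4si y)
  {
    const v4si zero = { 0, 0, 0, 0 };
    v4si d = x / y;                          /* TRUNC_DIV */
    v4si r = x % y;
    v4si mask = ((x ^ y) < zero) & (r != zero);
    return d + mask;                         /* mask lanes are -1, so this is --d */
  }
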
> Please let me know if this logic is fine.  I also need some more input on
> the following.
> 1. In the if (integer_pow2p (oprnd1)) path, are there any recommendations,
> or would it be fine to produce this same code?  For correctness I have
> skipped handling that path for now.
What's wrong with always using the existing cases as if the operation
were TRUNC_{DIV,MOD}_EXPR and emitting the required compensation for
the FLOOR_ operation afterwards?  You can always split out parts
of the function if that makes the code easier to follow.
> 2. I can write a test case for FLOOR_MOD_EXPR using the modulo intrinsic
> procedure.  Since the PR talks about FLOOR_DIV_EXPR and
> {CEIL,ROUND}_{MOD,DIV}_EXPR, could someone please suggest source code that
> produces these operators in the GIMPLE?  I am not sure how to feed GCC just
> the updated GIMPLE (I assume this is not possible currently).
You can use the GIMPLE frontend to feed in __FLOOR_DIV (also
__ROUND_DIV and __CEIL_DIV and the modulo variants).  There are
various testcases using -fgimple in gcc.dg/vect/ which you can
use as examples.
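
A very rough sketch of what such a testcase might look like (untested; the dg
directives and the exact GIMPLE FE spelling should be cross-checked against
the existing -fgimple tests in gcc.dg/vect/, and a real vectorization test
would of course need a loop over arrays):

  /* { dg-do compile } */
  /* { dg-options "-fgimple -O2" } */

  int __GIMPLE ()
  f (int a, int b)
  {
    int t;
    t = a __FLOOR_DIV b;   /* feeds a FLOOR_DIV_EXPR straight into GIMPLE */
    return t;
  }
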
>
> PR vect/119702
This is the wrong PR number.
> gcc:
> 	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Add
> 	vectorization logic for FLOOR_MOD_EXPR and FLOOR_DIV_EXPR.
> ---
> gcc/tree-vect-patterns.cc | 82 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index b39da1062c0..c5e3c758ef0 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4862,6 +4862,8 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> case TRUNC_DIV_EXPR:
> case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
> + case FLOOR_MOD_EXPR:
> + case FLOOR_DIV_EXPR:
> break;
> default:
> return NULL;
> @@ -4881,6 +4883,23 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> if (vectype == NULL_TREE)
> return NULL;
>
> + bool unsignedp = TYPE_UNSIGNED (itype) && (tree_int_cst_sgn (oprnd1) > 0);
> +
> + if (unsignedp)
> + {
> + switch (rhs_code)
> + {
> + case FLOOR_DIV_EXPR:
> + rhs_code = TRUNC_DIV_EXPR;
> + break;
> + case FLOOR_MOD_EXPR:
> + rhs_code = TRUNC_MOD_EXPR;
> + break;
I would have expected this to be already done on gimple, in
a more generic way, by

/* For unsigned integral types, FLOOR_DIV_EXPR is the same as
   TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  Similarly
   for MOD instead of DIV.  */
(for floor_divmod (floor_div floor_mod)
     trunc_divmod (trunc_div trunc_mod)
 (simplify
  (floor_divmod @0 @1)
  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
       && TYPE_UNSIGNED (type))
   (trunc_divmod @0 @1))))
> + default:
> + break;
> + }
> + }
> +
> if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
> {
> /* If the target can handle vectorized division or modulo natively,
> @@ -4893,7 +4912,9 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> }
>
> prec = TYPE_PRECISION (itype);
> - if (integer_pow2p (oprnd1))
> + if (integer_pow2p (oprnd1)
> + && rhs_code != FLOOR_DIV_EXPR
> + && rhs_code != FLOOR_MOD_EXPR)
> {
> if (TYPE_UNSIGNED (itype) || tree_int_cst_sgn (oprnd1) != 1)
> return NULL;
> @@ -5315,13 +5336,15 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> }
> }
>
> - if (rhs_code == TRUNC_MOD_EXPR)
> + if (rhs_code == TRUNC_MOD_EXPR
> + || rhs_code == FLOOR_MOD_EXPR
> + || rhs_code == FLOOR_DIV_EXPR)
> {
> tree r, t1;
>
> /* We divided. Now finish by:
> - t1 = q * oprnd1;
> - r = oprnd0 - t1; */
> + t1 = q * oprnd1;
> + r = oprnd0 - t1; */
> append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt);
>
> t1 = vect_recog_temp_ssa_var (itype, NULL);
> @@ -5330,6 +5353,57 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>
> r = vect_recog_temp_ssa_var (itype, NULL);
> pattern_stmt = gimple_build_assign (r, MINUS_EXPR, oprnd0, t1);
> +
> + if (rhs_code == FLOOR_MOD_EXPR
> + || rhs_code == FLOOR_DIV_EXPR)
> + {
> + // r = x %[fl] y;
> + // is
> + // r = x % y; if (r && (x ^ y) < 0) r += y;
> + // Extract the sign bit
> + // x^y
> + append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt);
> + tree cond_reg = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(cond_reg, BIT_XOR_EXPR, oprnd0, oprnd1);
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // x^y < 0
> + tree cond_reg2 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg2, LT_EXPR, cond_reg, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> +
> + // r != 0
> + tree cond_reg3 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg3, NE_EXPR, r, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> +
> + // x^y < 0 && r != 0
> + tree cond_reg4 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg4, BIT_AND_EXPR, cond_reg3, cond_reg2);
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> + if (rhs_code == FLOOR_MOD_EXPR)
> + {
> + // (x^y < 0 && r) ? y : 0
> + tree extr_cond = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(extr_cond, COND_EXPR, cond_reg4, oprnd1, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // r += (x ^ y < 0 && r) ? y : 0
> + tree floor_mod_r = vect_recog_temp_ssa_var(itype, NULL);
> + pattern_stmt = gimple_build_assign(floor_mod_r, PLUS_EXPR, r, extr_cond);
> + } else if (rhs_code == FLOOR_DIV_EXPR)
> + {
> + // (x^y < 0 && r) ? 1 : 0
> + tree extr_cond = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(extr_cond, COND_EXPR, cond_reg4,
> + build_int_cst(itype, 1), build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // q -= (x ^ y < 0 && r) ? 1 : 0
> + tree floor_mod_r = vect_recog_temp_ssa_var(itype, NULL);
> + pattern_stmt = gimple_build_assign(floor_mod_r, MINUS_EXPR, q, extr_cond);
> + }
You are emitting code that might not be vectorizable and which needs
post-processing with bool vector patterns.  So you should
1) use the appropriate precision scalar bools, anticipating the vector
   mask type used
2) check at least whether the compares are supported; I think we can
   rely on bit operations support.
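
To make point 2) concrete, a rough sketch of the kind of check meant here.
This is only an illustration, not a tested fragment: truth_type_for and
expand_vec_cmp_expr_p are existing helpers, but where exactly such a check
belongs in the pattern would need to be verified.

  /* Illustrative sketch only (not from the patch): give up on the pattern
     if the target cannot vectorize the compares we are about to emit;
     the mask type anticipates the vector bool type.  */
  tree mask_vectype = truth_type_for (vectype);
  if (!expand_vec_cmp_expr_p (vectype, mask_vectype, LT_EXPR)
      || !expand_vec_cmp_expr_p (vectype, mask_vectype, NE_EXPR))
    return NULL;
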
Richard.
> + }
> }
>
> /* Pattern detected. */
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)