On Wed, 10 Sep 2025, Avinash Jayakar wrote:
> Hi,
>
> The following patch implements the vectorization logic for FLOOR_MOD_EXPR and
> FLOOR_DIV_EXPR.  According to the logic mentioned in the PR, for signed
> operands
>
>     r = x %[fl] y;
>
> is
>
>     r = x % y; if (r && (x ^ y) < 0) r += y;
>
> and
>
>     d = x /[fl] y;
>
> is
>
>     r = x % y; d = x / y; if (r && (x ^ y) < 0) --d;
>
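
As a reference, here are the two identities in plain scalar C with a couple
of spot checks.  This is only an illustration, not part of the patch, and the
helper names floor_div/floor_mod are made up:

  #include <assert.h>

  /* x /[fl] y: truncating division, then round towards -inf when the
     operands have opposite signs and the division is inexact.  */
  static int
  floor_div (int x, int y)
  {
    int r = x % y, d = x / y;
    if (r != 0 && (x ^ y) < 0)
      --d;
    return d;
  }

  /* x %[fl] y: the remainder matching floor_div, i.e. it takes the sign
     of y (or is zero).  */
  static int
  floor_mod (int x, int y)
  {
    int r = x % y;
    if (r != 0 && (x ^ y) < 0)
      r += y;
    return r;
  }

  int
  main (void)
  {
    assert (floor_div (-7, 2) == -4 && floor_mod (-7, 2) == 1);
    assert (floor_div (7, -2) == -4 && floor_mod (7, -2) == -1);
    assert (floor_div (7, 2) == 3 && floor_mod (7, 2) == 1);
    return 0;
  }
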
> The first part enables FLOOR_{MOD,DIV}_EXPR in the switch case.  Since the
> second operand is always a constant (check done in line 4875), we check
> whether the operands are unsigned by seeing if the first operand has
> unsigned type and the second operand is greater than zero.  If so, then
> FLOOR_{DIV,MOD} is the same as TRUNC_{DIV,MOD}.
>
> For the signed operands, the logic written above is implemented right after
> the TRUNC_MOD_EXPR handling ends, since the remainder is needed in both
> cases.  The pseudo code for the vector implementation is as follows (op0 and
> op1 are the operands, r is the remainder op0 % op1, and d is the truncating
> quotient op0 / op1):
>
> v1 = op0 ^ op1
> v2 = (v1 < 0)
> v3 = (r!=0)
> v4 = (v3 && v2)
>
> // For FLOOR_MOD_EXPR
> result = r + (v4 ? op1 : 0)
> // For FLOOR_DIV_EXPR
> result = d - (v4 ? 1 : 0)
>
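
For comparison, the same compensation written branchlessly with GCC's generic
vector extensions; this only illustrates the select-free form the pattern aims
for and is not part of the patch (the typedef and function names are made up):

  typedef int v4si __attribute__ ((vector_size (16)));

  /* r = x %[fl] y, element-wise; mask lanes are -1 where the fixup applies.  */
  static v4si
  floor_mod_v4si (v4si x, v4si y)
  {
    const v4si zero = { 0, 0, 0, 0 };
    v4si r = x % y;                          /* TRUNC_MOD */
    v4si mask = ((x ^ y) < zero) & (r != zero);
    return r + (y & mask);                   /* r += y in the selected lanes */
  }

  /* d = x /[fl] y, element-wise.  */
  static v4si
  floor_div_v4si (v4si x, v4si y)
  {
    const v4si zero = { 0, 0, 0, 0 };
    v4si d = x / y;                          /* TRUNC_DIV */
    v4si r = x % y;
    v4si mask = ((x ^ y) < zero) & (r != zero);
    return d + mask;                         /* mask lanes are -1, so this is --d */
  }
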
> Please let me know if this logic is fine.  I also need some more input on
> the following.
> 1. In the if (integer_pow2p (oprnd1)) path, are there any recommendations,
> or would it be fine to produce this same code?  For correctness I have
> skipped handling that path for now.
What's wrong with always using the existing cases as if the operation
were TRUNC_{DIV,MOD}_EXPR and emitting the required compensation for
the FLOOR_ operation afterwards?  You can always split out parts
of the function if that makes the code easier to follow.
> 2. I can write a test case for FLOOR_MOD_EXPR using the modulo intrinsic
> procedure.  Since the PR talks about FLOOR_DIV_EXPR and
> {CEIL,ROUND}_{MOD,DIV}_EXPR, could someone please suggest source code that
> produces these operators in the GIMPLE?  I am not sure how to feed GCC just
> the updated GIMPLE (I assume this is not possible currently).
You can use the GIMPLE frontend to feed in __FLOOR_DIV (also
__ROUND_DIV and __CEIL_DIV and the modulo variants).  There are
various testcases using -fgimple in gcc.dg/vect/ which you can
use as examples.
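
A very rough sketch of what such a testcase might look like (untested; the dg
directives and the exact GIMPLE FE spelling should be cross-checked against
the existing -fgimple tests in gcc.dg/vect/, and a real vectorization test
would of course need a loop over arrays):

  /* { dg-do compile } */
  /* { dg-options "-fgimple -O2" } */

  int __GIMPLE ()
  f (int a, int b)
  {
    int t;
    t = a __FLOOR_DIV b;   /* feeds a FLOOR_DIV_EXPR straight into GIMPLE */
    return t;
  }
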
>
> PR vect/119702
This is the wrong PR number.
> gcc:
> 	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Add
> 	vectorization logic for FLOOR_MOD_EXPR and FLOOR_DIV_EXPR.
> ---
> gcc/tree-vect-patterns.cc | 82 +++++++++++++++++++++++++++++++++++++--
> 1 file changed, 78 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index b39da1062c0..c5e3c758ef0 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4862,6 +4862,8 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> case TRUNC_DIV_EXPR:
> case EXACT_DIV_EXPR:
> case TRUNC_MOD_EXPR:
> + case FLOOR_MOD_EXPR:
> + case FLOOR_DIV_EXPR:
> break;
> default:
> return NULL;
> @@ -4881,6 +4883,23 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> if (vectype == NULL_TREE)
> return NULL;
>
> + bool unsignedp = TYPE_UNSIGNED (itype) && (tree_int_cst_sgn (oprnd1) > 0);
> +
> + if (unsignedp)
> + {
> + switch (rhs_code)
> + {
> + case FLOOR_DIV_EXPR:
> + rhs_code = TRUNC_DIV_EXPR;
> + break;
> + case FLOOR_MOD_EXPR:
> + rhs_code = TRUNC_MOD_EXPR;
> + break;
I would have expected this to be already done on gimple, in
a more generic way, by

/* For unsigned integral types, FLOOR_DIV_EXPR is the same as
   TRUNC_DIV_EXPR.  Rewrite into the latter in this case.  Similarly
   for MOD instead of DIV.  */
(for floor_divmod (floor_div floor_mod)
     trunc_divmod (trunc_div trunc_mod)
 (simplify
  (floor_divmod @0 @1)
  (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
       && TYPE_UNSIGNED (type))
   (trunc_divmod @0 @1))))
> + default:
> + break;
> + }
> + }
> +
> if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
> {
> /* If the target can handle vectorized division or modulo natively,
> @@ -4893,7 +4912,9 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> }
>
> prec = TYPE_PRECISION (itype);
> - if (integer_pow2p (oprnd1))
> + if (integer_pow2p (oprnd1)
> + && rhs_code != FLOOR_DIV_EXPR
> + && rhs_code != FLOOR_MOD_EXPR)
> {
> if (TYPE_UNSIGNED (itype) || tree_int_cst_sgn (oprnd1) != 1)
> return NULL;
> @@ -5315,13 +5336,15 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> }
> }
>
> - if (rhs_code == TRUNC_MOD_EXPR)
> + if (rhs_code == TRUNC_MOD_EXPR
> + || rhs_code == FLOOR_MOD_EXPR
> + || rhs_code == FLOOR_DIV_EXPR)
> {
> tree r, t1;
>
> /* We divided. Now finish by:
> - t1 = q * oprnd1;
> - r = oprnd0 - t1; */
> + t1 = q * oprnd1;
> + r = oprnd0 - t1; */
> append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt);
>
> t1 = vect_recog_temp_ssa_var (itype, NULL);
> @@ -5330,6 +5353,57 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>
> r = vect_recog_temp_ssa_var (itype, NULL);
> pattern_stmt = gimple_build_assign (r, MINUS_EXPR, oprnd0, t1);
> +
> + if (rhs_code == FLOOR_MOD_EXPR
> + || rhs_code == FLOOR_DIV_EXPR)
> + {
> + // r = x %[fl] y;
> + // is
> + // r = x % y; if (r && (x ^ y) < 0) r += y;
> + // Extract the sign bit
> + // x^y
> + append_pattern_def_seq (vinfo, stmt_vinfo, pattern_stmt);
> + tree cond_reg = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(cond_reg, BIT_XOR_EXPR, oprnd0, oprnd1);
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // x^y < 0
> + tree cond_reg2 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg2, LT_EXPR, cond_reg, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> +
> + // r != 0
> + tree cond_reg3 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg3, NE_EXPR, r, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> +
> + // x^y < 0 && r != 0
> + tree cond_reg4 = vect_recog_temp_ssa_var(boolean_type_node, NULL);
> + def_stmt = gimple_build_assign(cond_reg4, BIT_AND_EXPR, cond_reg3, cond_reg2);
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt, truth_type_for(vectype), itype);
> + if (rhs_code == FLOOR_MOD_EXPR)
> + {
> + // (x^y < 0 && r) ? y : 0
> + tree extr_cond = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(extr_cond, COND_EXPR, cond_reg4, oprnd1, build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // r += (x ^ y < 0 && r) ? y : 0
> + tree floor_mod_r = vect_recog_temp_ssa_var(itype, NULL);
> + pattern_stmt = gimple_build_assign(floor_mod_r, PLUS_EXPR, r, extr_cond);
> + } else if (rhs_code == FLOOR_DIV_EXPR)
> + {
> + // (x^y < 0 && r) ? 1 : 0
> + tree extr_cond = vect_recog_temp_ssa_var(itype, NULL);
> + def_stmt = gimple_build_assign(extr_cond, COND_EXPR, cond_reg4,
> + build_int_cst(itype, 1), build_int_cst(itype, 0));
> + append_pattern_def_seq (vinfo, stmt_vinfo, def_stmt);
> +
> + // q -= (x ^ y < 0 && r) ? 1 : 0
> + tree floor_mod_r = vect_recog_temp_ssa_var(itype, NULL);
> + pattern_stmt = gimple_build_assign(floor_mod_r, MINUS_EXPR, q, extr_cond);
> + }
You are emitting code that might not be vectorizable and which needs
post-processing with bool vector patterns.  So you should
1) use the appropriate precision scalar bools, anticipating the vector
   mask type used
2) check at least whether the compares are supported; I think we can
   rely on bit operations support.
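
To make point 2) concrete, a rough sketch of the kind of check meant here.
This is only an illustration, not a tested fragment: truth_type_for and
expand_vec_cmp_expr_p are existing helpers, but where exactly such a check
belongs in the pattern would need to be verified.

  /* Illustrative sketch only (not from the patch): give up on the pattern
     if the target cannot vectorize the compares we are about to emit;
     the mask type anticipates the vector bool type.  */
  tree mask_vectype = truth_type_for (vectype);
  if (!expand_vec_cmp_expr_p (vectype, mask_vectype, LT_EXPR)
      || !expand_vec_cmp_expr_p (vectype, mask_vectype, NE_EXPR))
    return NULL;
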
Richard.
> + }
> }
>
> /* Pattern detected. */
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)