Re: [PATCH] match.pd: simplify "shift + reg EQ|NE reg"

Richard Biener Thu, 08 Jan 2026 23:25:07 -0800

On Thu, Jan 8, 2026 at 10:26 PM Daniel Henrique Barboza
<[email protected]> wrote:
>
>
>
> On 1/7/2026 7:06 PM, Andrew Pinski wrote:
> > On Wed, Jan 7, 2026 at 1:47 PM Daniel Henrique Barboza
> > <[email protected]> wrote:
> >>
> >>
> >>
> >> On 1/7/2026 11:44 AM, Jeffrey Law wrote:
> >>>
> >>>
> >>> On 1/6/2026 3:13 PM, Andrew Pinski wrote:
> >>>> On Tue, Jan 6, 2026 at 1:46 PM Daniel Barboza
> >>>> <[email protected]> wrote:
> >>>>> Add a transformation for a nested lshift/rshift inside a plus that
> >>>>> compares for
> >>>>> equality with the same operand of the plus. In other words:
> >>>>>
> >>>>> ((a OP b) + c EQ|NE c) ? x : y
> >>>>>
> >>>>> becomes, for OP = (lshift, rshift) and in a scenario without overflows:
> >>>>>
> >>>>> a !=/== 0 ? x : y
> >>>> I think we want the transformation even if it is used outside of a
> >>>> cond_expr.
> >>>> Also the above is valid even for types that wrap; just not valid for
> >>>> types that trap on overflow.
> >>>>
> >>>> As we already have a pattern for `a + C == C`:
> >>>> ```
> >>>> /* For equality, this is also true with wrapping overflow.  */
> >>>> (for op (eq ne)
> >>>>    (simplify
> >>>>     (op:c (nop_convert?@3 (plus:c@2 @0 (convert1? @1))) (convert2? @1))
> >>>>     (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >>>>          && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> >>>>              || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >>>>          && (CONSTANT_CLASS_P (@0) || (single_use (@2) && single_use
> >>>> (@3)))
> >>>>          && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
> >>>>          && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
> >>>>      (op @0 { build_zero_cst (TREE_TYPE (@0)); })))
> >>>> ```
> >>>>
> >>>> The problem is the above does not work as there is an single_use check
> >>>> on the plus. The single use check was there since the original match
> >>>> pattern was added.
> >>>> I am not sure if we should add a special case where in the above
> >>>> pattern @0 is a shift.
> >>>> Though changing that will have to wait for GCC 17 I think.
> >>
> >>
> >> I tried  variation of that pattern, taking into account the lshift and
> >> removing the single_use and constant_class_p conditionals:
> >>
> >> /* Do the same but with lshift|rshift as @0.  */
> >> (for op (eq ne)
> >>    (simplify
> >>     (op:c (nop_convert?@3 (plus:c@2 (lshift @0 @4) (convert1? @1)))
> >> (convert2? @1))
> >>     (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >>          && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> >>             || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >>          && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
> >>          && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
> >>      (op @0 { build_zero_cst (TREE_TYPE (@0)); }))))
> >>
> >>
> >> And I got the following 'optimized' dump:
> >>
> >>
> >> long int frob (long int x, long int y, long int z)
> >> {
> >>     long int ret;
> >>     long int _1;
> >>
> >> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally,
> >> freq 1.0000), maybe hot
> >> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> >> ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally,
> >> freq 1.0000) (FALLTHRU,EXECUTABLE)
> >>     if (y_3(D) == 0)
> >>       goto <bb 4>; [50.00%]
> >>     else
> >>       goto <bb 3>; [50.00%]
> >> ;;    succ:       4 [50.0% (guessed)]  count:536870912 (estimated
> >> locally, freq 0.5000) (TRUE_VALUE,EXECUTABLE)
> >> ;;                3 [50.0% (guessed)]  count:536870912 (estimated
> >> locally, freq 0.5000) (FALSE_VALUE,EXECUTABLE)
> >>
> >> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally,
> >> freq 0.5000), maybe hot
> >> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> >> ;;    pred:       2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally, freq 0.5000) (FALSE_VALUE,EXECUTABLE)
> >>     _1 = y_3(D) << 2;
> >>     ret_5 = _1 + z_4(D);
> >> ;;    succ:       4 [always]  count:536870912 (estimated locally, freq
> >> 0.5000) (FALLTHRU,EXECUTABLE)
> >>
> >> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally,
> >> freq 1.0000), maybe hot
> >> ;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> >> ;;    pred:       3 [always]  count:536870912 (estimated locally, freq
> >> 0.5000) (FALLTHRU,EXECUTABLE)
> >> ;;                2 [50.0% (guessed)]  count:536870912 (estimated
> >> locally, freq 0.5000) (TRUE_VALUE,EXECUTABLE)
> >>     # ret_2 = PHI <ret_5(3), y_3(D)(2)>
> >>     # VUSE <.MEM_6(D)>
> >>     return ret_2;
> >> ;;    succ:       EXIT [always]  count:1073741824 (estimated locally,
> >> freq 1.0000) (EXECUTABLE) test.c:8:10
> >>
> >> }
> >
> > Yes afterwards, the sink happens. And pushed ` y_3(D) << 2`/`_1 +
> > z_4(D)` into the branch.
> > And then uncprop changes `0(2)` into `y_3(D)(2)` in the PHI (sometimes
> > that is better; sometimes that is worse; for riscv with zicond it is
> > usually worse because ifcvt does not always convert back into 0).
> > I wonder if keeping `(lshift @0 @4)` as being compared to 0 if that
> > would be better, it would at least keep one expression out of the
> > branch.
>
>
> I did some tests with the RISC-V target to see the final .s for each
> (-march=rv64gcv_zba_zbb_zbc_zbs_zfa_zicond). For the pattern I sent:
>
> +(for cmp (eq ne)
> + (for op (lshift rshift)
> +  (simplify
> +   (cond (cmp:c (plus:c (op @1 @2) @0) @0) @3 @4)
> +    (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +        && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@1)))
> +     (with
> +      {
> +       fold_overflow_warning ("assuming signed overflow does not occur "
> +                             "when changing (X << imm) + C1 cmp C1 to "
> +                             "X << imm cmp 0 and X << imm cmp 0 "
> +                             "to X cmp 0",
> +                             WARN_STRICT_OVERFLOW_COMPARISON);  }


Please don't any new fold_overflow_warning things, those are borderly
useless (IMO all of them should be wiped).

No comments on the rest, I didn't review the discussion.

Richard.

> +     (cond (cmp @1 { build_zero_cst (type); }) @3 @4))))))
>
> frob:
> .LFB0:
>          .cfi_startproc
>          sh2add  a2,a1,a2
>          czero.eqz       a0,a2,a1
>          ret
>          .cfi_endproc
> .LFE0:
>
>
> Using a variation of the existing pattern Andrew mentioned, returning a
> lshift czero compare:
>
>
> +/* Do the same but with lshift|rshift as @0.  */
> +(for cmp (eq ne)
> + (for op (lshift rshift)
> +  (simplify
> +   (cmp:c (nop_convert?@3 (plus:c@2 (op @0 @4) (convert1? @1)))
> (convert2? @1))
> +   (with { tree type0 = TREE_TYPE (@0);  }
> +   (if (ANY_INTEGRAL_TYPE_P (type0)
> +       && (TYPE_OVERFLOW_UNDEFINED (type0)
> +           || TYPE_OVERFLOW_WRAPS (type0))
> +       && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
> +       && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
> +    (cmp (convert:type0 (op @0 @4)) { build_zero_cst (type0); }))))))
>
>
> frob:
> .LFB0:
>          .cfi_startproc
>          slli    a1,a1,2
>          czero.eqz       a0,a2,a1
>          add     a0,a1,a0
>          ret
>          .cfi_endproc
> .LFE0:
>
>
> Same thing, but returning a @0 czero compare instead:
>
> +(for cmp (eq ne)
> + (for op (lshift rshift)
> +  (simplify
> +   (cmp:c (nop_convert?@3 (plus:c@2 (op @0 @4) (convert1? @1)))
> (convert2? @1))
> +   (with { tree type0 = TREE_TYPE (@0);  }
> +   (if (ANY_INTEGRAL_TYPE_P (type0)
> +       && (TYPE_OVERFLOW_UNDEFINED (type0)
> +           || TYPE_OVERFLOW_WRAPS (type0))
> +       && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
> +       && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
> +    (cmp @0 { build_zero_cst (type0); }))))))
>
>
> frob:
> .LFB0:
>          .cfi_startproc
>          sh2add  a2,a1,a2
>          czero.eqz       a0,a2,a1
>          ret
>          .cfi_endproc
> .LFE0:
>
>
> So it seems to me that yes, ifcvt can convert from the existing pattern
> to the new pattern I sent if we return a @0 czero compare.
>
>
> I prefer adding a variation of an existing pattern instead of a brand
> new one so I would go with this approach. But it only works without the
> single_use() checks. Given that this pattern is rather specific, as long
> as we're not regressing existing tests, maybe it's ok?
>
>
> Thanks,
>
> Daniel
>
>
>
>
> >
> >
> >>
> >>
> >> With the pattern I sent this is the 'optimized' dump I get:
> >>
> >>
> >> long int frob (long int x, long int y, long int z)
> >> {
> >>     long int ret;
> >>     long int _1;
> >>     _Bool _7;
> >>     long int _8;
> >>
> >> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally,
> >> freq 1.0000), maybe hot
> >> ;;    prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
> >> ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally,
> >> freq 1.0000) (FALLTHRU,EXECUTABLE)
> >>     _1 = y_3(D) << 2;
> >>     ret_5 = _1 + z_4(D);
> >>     _7 = y_3(D) == 0;
> >>     _8 = _7 ? 0 : ret_5;
> >>     # VUSE <.MEM_6(D)>
> >>     return _8;
> >> ;;    succ:       EXIT [always]  count:1073741824 (estimated locally,
> >> freq 1.0000) (EXECUTABLE) test.c:8:10
> >>
> >>
> >> I'm not sure why we're having this difference given that the patterns
> >> are quite similar.
> >
> > Because when you have a cond_expr in the output of a match pattern, we
> > don't recreate an if/else branches.
> > In the pattern without the cond_expr, the sink pass sinks the other
> > expressions. While here we have an explicit COND_EXPR rather than
> > control flow.
> > The big question is, can ifcvt on the rtl level recover the first way
> > into the second way; I think currently it does not but I have not
> > fully looked; so we might to need help it on the gimple level (for GCC
> > 17 though).
> >
> >>
> >> By the way my initial idea for upstreaming was to split the pattern in
> >> two. One like this:
> >>
> >> X << N EQ|NE 0 -> X EQ|NE 0
> >>
> >> And another one like the original I sent but returning the full lshift
> >> comparison instead of doing the reduction, i.e.:
> >>
> >> (a OP b) + c EQ|NE c -> (a OP b) EQ|NE 0
> >>
> >>
> >> But I got a lot of trouble with errors like:
> >>
> >> Analyzing compilation unit
> >> Performing interprocedural optimizations
> >>    <*free_lang_data> {heap 1028k} <visibility> {heap 1028k}
> >> <build_ssa_passes> {heap 1028k} <targetclone> {heap 1440k} <opt_lo
> >> cal_passes> {heap 1440k}during GIMPLE pass: ccp
> >> dump file: test.c.038t.ccp1
> >> test.c: In function ‘frob’:
> >> test.c:9:1: internal compiler error: in decompose, at wide-int.h:1049
> >>       9 | }
> >>         | ^
> >> 0x2ecc34f internal_error(char const*, ...)
> >>           ../../gcc/diagnostic-global-context.cc:787
> >> 0xf0790f fancy_abort(char const*, int, char const*)
> >>           ../../gcc/diagnostics/context.cc:1805
> >> 0xa5b276 wi::int_traits<generic_wide_int<wide_int_ref_storage<false,
> >> false> > >::decompose(long*, unsigned int, generic_wide
> >> _int<wide_int_ref_storage<false, false> > const&)
> >>           ../../gcc/wide-int.h:1049
> >> 0xa5d419 wi::int_traits<generic_wide_int<wide_int_storage>
> >>   >::decompose(long*, unsigned int, generic_wide_int<wide_int_stora
> >> ge> const&)
> >>           ../../gcc/tree.h:3906
> >> 0xa5d419 wide_int_ref_storage<true,
> >> false>::wide_int_ref_storage<generic_wide_int<wide_int_storage>
> >>   >(generic_wide_int<wide_
> >> int_storage> const&, unsigned int)
> >>           ../../gcc/wide-int.h:1099
> >> 0xa5d419 generic_wide_int<wide_int_ref_storage<true, false>
> >>   >::generic_wide_int<generic_wide_int<wide_int_storage> >(generic
> >> _wide_int<wide_int_storage> const&, unsigned int)
> >>           ../../gcc/wide-int.h:855
> >>
> >> My guess is that I was messing up the relation between the 'type'
> >> precision and the precision used in build_zero_cst(). I gave up and
> >> sent the full pattern, but I wonder if splitting it would be better.
> >
> > This usually does mean a type mismatch has happened or you are
> > comparing two `wi::to_wide` which come from different precision
> > INTEGER_CSTs.
> >
> > Thanks,
> > Andrew
> >
> >>
> >>
> >> Thanks,
> >>
> >> Daniel
> >>
> >>
> >>
> >>
> >>>>
> >>>> Jeff/Richard,
> >>>>     What are your thoughts on this? I know Richard had thoughts on some
> >>>> of the single_use in the match patterns before but I have not tracked
> >>>> them that well.
> >>>
> >>> I had no idea we had that match.pd pattern; clearly what Daniel is doing
> >>> is closely related.
> >>>
> >>> The single_use in those patterns comes up here:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484606.html
> >>>
> >>> Looks like single use is in there because of interactions with another
> >>> pattern.
> >>>
> >>> jeff
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
>

Re: [PATCH] match.pd: simplify "shift + reg EQ|NE reg"

Reply via email to