Re: [PATCH] match.pd: simplify "shift + reg EQ|NE reg"

Andrew Pinski Wed, 07 Jan 2026 14:06:53 -0800

On Wed, Jan 7, 2026 at 1:47 PM Daniel Henrique Barboza
<[email protected]> wrote:
>
>
>
> On 1/7/2026 11:44 AM, Jeffrey Law wrote:
> >
> >
> > On 1/6/2026 3:13 PM, Andrew Pinski wrote:
> >> On Tue, Jan 6, 2026 at 1:46 PM Daniel Barboza
> >> <[email protected]> wrote:
> >>> Add a transformation for a nested lshift/rshift inside a plus that
> >>> compares for
> >>> equality with the same operand of the plus. In other words:
> >>>
> >>> ((a OP b) + c EQ|NE c) ? x : y
> >>>
> >>> becomes, for OP = (lshift, rshift) and in a scenario without overflows:
> >>>
> >>> a !=/== 0 ? x : y
> >> I think we want the transformation even if it is used outside of a
> >> cond_expr.
> >> Also the above is valid even for types that wrap; just not valid for
> >> types that trap on overflow.
> >>
> >> As we already have a pattern for `a + C == C`:
> >> ```
> >> /* For equality, this is also true with wrapping overflow.  */
> >> (for op (eq ne)
> >>   (simplify
> >>    (op:c (nop_convert?@3 (plus:c@2 @0 (convert1? @1))) (convert2? @1))
> >>    (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> >>         && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> >>             || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> >>         && (CONSTANT_CLASS_P (@0) || (single_use (@2) && single_use
> >> (@3)))
> >>         && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
> >>         && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
> >>     (op @0 { build_zero_cst (TREE_TYPE (@0)); })))
> >> ```
> >>
> >> The problem is the above does not work as there is an single_use check
> >> on the plus. The single use check was there since the original match
> >> pattern was added.
> >> I am not sure if we should add a special case where in the above
> >> pattern @0 is a shift.
> >> Though changing that will have to wait for GCC 17 I think.
>
>
> I tried  variation of that pattern, taking into account the lshift and
> removing the single_use and constant_class_p conditionals:
>
> /* Do the same but with lshift|rshift as @0.  */
> (for op (eq ne)
>   (simplify
>    (op:c (nop_convert?@3 (plus:c@2 (lshift @0 @4) (convert1? @1)))
> (convert2? @1))
>    (if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
>         && (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
>            || TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
>         && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@2))
>         && tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@1)))
>     (op @0 { build_zero_cst (TREE_TYPE (@0)); }))))
>
>
> And I got the following 'optimized' dump:
>
>
> long int frob (long int x, long int y, long int z)
> {
>    long int ret;
>    long int _1;
>
> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally,
> freq 1.0000), maybe hot
> ;;    prev block 0, next block 3, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally,
> freq 1.0000) (FALLTHRU,EXECUTABLE)
>    if (y_3(D) == 0)
>      goto <bb 4>; [50.00%]
>    else
>      goto <bb 3>; [50.00%]
> ;;    succ:       4 [50.0% (guessed)]  count:536870912 (estimated
> locally, freq 0.5000) (TRUE_VALUE,EXECUTABLE)
> ;;                3 [50.0% (guessed)]  count:536870912 (estimated
> locally, freq 0.5000) (FALSE_VALUE,EXECUTABLE)
>
> ;;   basic block 3, loop depth 0, count 536870912 (estimated locally,
> freq 0.5000), maybe hot
> ;;    prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       2 [50.0% (guessed)]  count:536870912 (estimated
> locally, freq 0.5000) (FALSE_VALUE,EXECUTABLE)
>    _1 = y_3(D) << 2;
>    ret_5 = _1 + z_4(D);
> ;;    succ:       4 [always]  count:536870912 (estimated locally, freq
> 0.5000) (FALLTHRU,EXECUTABLE)
>
> ;;   basic block 4, loop depth 0, count 1073741824 (estimated locally,
> freq 1.0000), maybe hot
> ;;    prev block 3, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       3 [always]  count:536870912 (estimated locally, freq
> 0.5000) (FALLTHRU,EXECUTABLE)
> ;;                2 [50.0% (guessed)]  count:536870912 (estimated
> locally, freq 0.5000) (TRUE_VALUE,EXECUTABLE)
>    # ret_2 = PHI <ret_5(3), y_3(D)(2)>
>    # VUSE <.MEM_6(D)>
>    return ret_2;
> ;;    succ:       EXIT [always]  count:1073741824 (estimated locally,
> freq 1.0000) (EXECUTABLE) test.c:8:10
>
> }


Yes afterwards, the sink happens. And pushed ` y_3(D) << 2`/`_1 +
z_4(D)` into the branch.
And then uncprop changes `0(2)` into `y_3(D)(2)` in the PHI (sometimes
that is better; sometimes that is worse; for riscv with zicond it is
usually worse because ifcvt does not always convert back into 0).
I wonder if keeping `(lshift @0 @4)` as being compared to 0 if that
would be better, it would at least keep one expression out of the
branch.


>
>
> With the pattern I sent this is the 'optimized' dump I get:
>
>
> long int frob (long int x, long int y, long int z)
> {
>    long int ret;
>    long int _1;
>    _Bool _7;
>    long int _8;
>
> ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally,
> freq 1.0000), maybe hot
> ;;    prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally,
> freq 1.0000) (FALLTHRU,EXECUTABLE)
>    _1 = y_3(D) << 2;
>    ret_5 = _1 + z_4(D);
>    _7 = y_3(D) == 0;
>    _8 = _7 ? 0 : ret_5;
>    # VUSE <.MEM_6(D)>
>    return _8;
> ;;    succ:       EXIT [always]  count:1073741824 (estimated locally,
> freq 1.0000) (EXECUTABLE) test.c:8:10
>
>
> I'm not sure why we're having this difference given that the patterns
> are quite similar.

Because when you have a cond_expr in the output of a match pattern, we
don't recreate an if/else branches.
In the pattern without the cond_expr, the sink pass sinks the other
expressions. While here we have an explicit COND_EXPR rather than
control flow.
The big question is, can ifcvt on the rtl level recover the first way
into the second way; I think currently it does not but I have not
fully looked; so we might to need help it on the gimple level (for GCC
17 though).

>
> By the way my initial idea for upstreaming was to split the pattern in
> two. One like this:
>
> X << N EQ|NE 0 -> X EQ|NE 0
>
> And another one like the original I sent but returning the full lshift
> comparison instead of doing the reduction, i.e.:
>
> (a OP b) + c EQ|NE c -> (a OP b) EQ|NE 0
>
>
> But I got a lot of trouble with errors like:
>
> Analyzing compilation unit
> Performing interprocedural optimizations
>   <*free_lang_data> {heap 1028k} <visibility> {heap 1028k}
> <build_ssa_passes> {heap 1028k} <targetclone> {heap 1440k} <opt_lo
> cal_passes> {heap 1440k}during GIMPLE pass: ccp
> dump file: test.c.038t.ccp1
> test.c: In function ‘frob’:
> test.c:9:1: internal compiler error: in decompose, at wide-int.h:1049
>      9 | }
>        | ^
> 0x2ecc34f internal_error(char const*, ...)
>          ../../gcc/diagnostic-global-context.cc:787
> 0xf0790f fancy_abort(char const*, int, char const*)
>          ../../gcc/diagnostics/context.cc:1805
> 0xa5b276 wi::int_traits<generic_wide_int<wide_int_ref_storage<false,
> false> > >::decompose(long*, unsigned int, generic_wide
> _int<wide_int_ref_storage<false, false> > const&)
>          ../../gcc/wide-int.h:1049
> 0xa5d419 wi::int_traits<generic_wide_int<wide_int_storage>
>  >::decompose(long*, unsigned int, generic_wide_int<wide_int_stora
> ge> const&)
>          ../../gcc/tree.h:3906
> 0xa5d419 wide_int_ref_storage<true,
> false>::wide_int_ref_storage<generic_wide_int<wide_int_storage>
>  >(generic_wide_int<wide_
> int_storage> const&, unsigned int)
>          ../../gcc/wide-int.h:1099
> 0xa5d419 generic_wide_int<wide_int_ref_storage<true, false>
>  >::generic_wide_int<generic_wide_int<wide_int_storage> >(generic
> _wide_int<wide_int_storage> const&, unsigned int)
>          ../../gcc/wide-int.h:855
>
> My guess is that I was messing up the relation between the 'type'
> precision and the precision used in build_zero_cst(). I gave up and
> sent the full pattern, but I wonder if splitting it would be better.

This usually does mean a type mismatch has happened or you are
comparing two `wi::to_wide` which come from different precision
INTEGER_CSTs.

Thanks,
Andrew

>
>
> Thanks,
>
> Daniel
>
>
>
>
> >>
> >> Jeff/Richard,
> >>    What are your thoughts on this? I know Richard had thoughts on some
> >> of the single_use in the match patterns before but I have not tracked
> >> them that well.
> >
> > I had no idea we had that match.pd pattern; clearly what Daniel is doing
> > is closely related.
> >
> > The single_use in those patterns comes up here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2017-October/484606.html
> >
> > Looks like single use is in there because of interactions with another
> > pattern.
> >
> > jeff
> >
> >
> >
> >
> >
>

Re: [PATCH] match.pd: simplify "shift + reg EQ|NE reg"

Reply via email to