Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-11 Thread Richard Biener
On Fri, Mar 8, 2024 at 6:50 PM Ken Matsui  wrote:
>
> On Thu, Mar 7, 2024 at 10:49 PM Richard Biener
>  wrote:
> >
> > On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  
> > > > wrote:
> > > > >
> > > > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  
> > > > > > wrote:
> > > > > > >
> > > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > > > equivalence does not hold for types other than signed integers.  
> > > > > > > When
> > > > > > > it comes to conditions, the former was translated to a 
> > > > > > > combination of
> > > > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > > > latter.
> > > > > > >
> > > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed 
> > > > > > > integers
> > > > > > > as defined behavior, specifically, wrapping around according to 
> > > > > > > two's
> > > > > > > complement arithmetic.  This has implications for optimizations 
> > > > > > > that
> > > > > > > rely on the standard behavior of signed integers, where overflow 
> > > > > > > is
> > > > > > > undefined.  Consider the example given:
> > > > > > >
> > > > > > > long long llmax = __LONG_LONG_MAX__;
> > > > > > > long long llmin = -llmax - 1;
> > > > > > >
> > > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, 
> > > > > > > which
> > > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum 
> > > > > > > value
> > > > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > > > produces a negative value due to two's complement wraparound.
> > > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > > >
> > > > > > > However, the direct comparison `llmax < llmin` is false since 
> > > > > > > `llmax`
> > > > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > > > `x CMP y` (and vice versa) cannot be safely applied when 
> > > > > > > `-fwrapv` is
> > > > > > > enabled.  This is why this optimization pass is disabled under
> > > > > > > `-fwrapv`.
> > > > > > >
> > > > > > > This optimization pass must run before the Jump Threading pass 
> > > > > > > and the
> > > > > > > VRP pass, as it may modify conditions. For example, in the VRP 
> > > > > > > pass:
> > > > > > >
> > > > > > > (1)
> > > > > > >   int diff = x - y;
> > > > > > >   if (diff > 0)
> > > > > > > foo();
> > > > > > >   if (diff < 0)
> > > > > > > bar();
> > > > > > >
> > > > > > > The second condition would be converted to diff != 0 in the VRP 
> > > > > > > pass
> > > > > > > because we know the postcondition of the first condition is diff 
> > > > > > > <= 0,
> > > > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > > > after this VRP, we get:
> > > > > > >
> > > > > > > (2)
> > > > > > >   int diff = x - y;
> > > > > > >   if (x > y)
> > > > > > > foo();
> > > > > > >   if (diff != 0)
> > > > > > > bar();
> > > > > > >
> > > > > > > This generates sub and test for the second condition and cmp for 
> > > > > > > the
> > > > > > > first condition. However, if we apply this pass beforehand, we 
> > > > > > > simply
> > > > > > > get:
> > > > > > >
> > > > > > > (3)
> > > > > > >   int diff = x - y;
> > > > > > >   if (x > y)
> > > > > > > foo();
> > > > > > >   if (x < y)
> > > > > > > bar();
> > > > > > >
> > > > > > > In this code, diff will be eliminated as a dead code, and sub and 
> > > > > > > test
> > > > > > > will not be generated, which is more efficient.
> > > > > > >
> > > > > > > For the Jump Threading pass, without this optimization pass, (1) 
> > > > > > > and
> > > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > > >
> > > > > > > PR middle-end/113680
> > > > > >
> > > > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > > > the explicit code present in the forwprop pass or a new match.pd
> > > > > > pattern.  There's possible interaction with x - y value being used
> > > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > > a comparison against zero being possibly implemented by
> > > > > > a flag setting subtraction instruction.
> > > > > >
> > > > >
> 

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-08 Thread Ken Matsui
On Thu, Mar 7, 2024 at 10:49 PM Richard Biener
 wrote:
>
> On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui  wrote:
> >
> > On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
> >  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  
> > > wrote:
> > > >
> > > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > > > >
> > > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > > equivalence does not hold for types other than signed integers.  
> > > > > > When
> > > > > > it comes to conditions, the former was translated to a combination 
> > > > > > of
> > > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > > latter.
> > > > > >
> > > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed 
> > > > > > integers
> > > > > > as defined behavior, specifically, wrapping around according to 
> > > > > > two's
> > > > > > complement arithmetic.  This has implications for optimizations that
> > > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > > undefined.  Consider the example given:
> > > > > >
> > > > > > long long llmax = __LONG_LONG_MAX__;
> > > > > > long long llmin = -llmax - 1;
> > > > > >
> > > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, 
> > > > > > which
> > > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum 
> > > > > > value
> > > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > > produces a negative value due to two's complement wraparound.
> > > > > > Therefore, `llmax - llmin < 0` is true.
> > > > > >
> > > > > > However, the direct comparison `llmax < llmin` is false since 
> > > > > > `llmax`
> > > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` 
> > > > > > is
> > > > > > enabled.  This is why this optimization pass is disabled under
> > > > > > `-fwrapv`.
> > > > > >
> > > > > > This optimization pass must run before the Jump Threading pass and 
> > > > > > the
> > > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > > >
> > > > > > (1)
> > > > > >   int diff = x - y;
> > > > > >   if (diff > 0)
> > > > > > foo();
> > > > > >   if (diff < 0)
> > > > > > bar();
> > > > > >
> > > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > > because we know the postcondition of the first condition is diff <= 
> > > > > > 0,
> > > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > > after this VRP, we get:
> > > > > >
> > > > > > (2)
> > > > > >   int diff = x - y;
> > > > > >   if (x > y)
> > > > > > foo();
> > > > > >   if (diff != 0)
> > > > > > bar();
> > > > > >
> > > > > > This generates sub and test for the second condition and cmp for the
> > > > > > first condition. However, if we apply this pass beforehand, we 
> > > > > > simply
> > > > > > get:
> > > > > >
> > > > > > (3)
> > > > > >   int diff = x - y;
> > > > > >   if (x > y)
> > > > > > foo();
> > > > > >   if (x < y)
> > > > > > bar();
> > > > > >
> > > > > > In this code, diff will be eliminated as a dead code, and sub and 
> > > > > > test
> > > > > > will not be generated, which is more efficient.
> > > > > >
> > > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > > (3) above are recognized as different, which prevents TCO.
> > > > > >
> > > > > > PR middle-end/113680
> > > > >
> > > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > > the explicit code present in the forwprop pass or a new match.pd
> > > > > pattern.  There's possible interaction with x - y value being used
> > > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > > a comparison against zero being possibly implemented by
> > > > > a flag setting subtraction instruction.
> > > > >
> > > >
> > > > Thank you so much for your review!  Although the forwprop pass runs
> > > > multiple times, we might not need to apply this optimization multiple
> > > > times.  Would it be acceptable to add such optimization?  More
> > > > generally, I would like to know how to determine where to put
> > > > optimization in the future.
> > >
> > > This kind of pattern matching ex

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-07 Thread Richard Biener
On Thu, Mar 7, 2024 at 8:29 PM Ken Matsui  wrote:
>
> On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
>  wrote:
> >
> > On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > > >
> > > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > > equivalence does not hold for types other than signed integers.  When
> > > > > it comes to conditions, the former was translated to a combination of
> > > > > sub and test, whereas the latter was translated to a single cmp.
> > > > > Thus, this optimization pass tries to optimize the former to the
> > > > > latter.
> > > > >
> > > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > > as defined behavior, specifically, wrapping around according to two's
> > > > > complement arithmetic.  This has implications for optimizations that
> > > > > rely on the standard behavior of signed integers, where overflow is
> > > > > undefined.  Consider the example given:
> > > > >
> > > > > long long llmax = __LONG_LONG_MAX__;
> > > > > long long llmin = -llmax - 1;
> > > > >
> > > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, 
> > > > > which
> > > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum 
> > > > > value
> > > > > for a `long long`, this calculation overflows in a defined manner
> > > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > > produces a negative value due to two's complement wraparound.
> > > > > Therefore, `llmax - llmin < 0` is true.
> > > > >
> > > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > > enabled.  This is why this optimization pass is disabled under
> > > > > `-fwrapv`.
> > > > >
> > > > > This optimization pass must run before the Jump Threading pass and the
> > > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > > >
> > > > > (1)
> > > > >   int diff = x - y;
> > > > >   if (diff > 0)
> > > > > foo();
> > > > >   if (diff < 0)
> > > > > bar();
> > > > >
> > > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > > because we know the postcondition of the first condition is diff <= 0,
> > > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > > after this VRP, we get:
> > > > >
> > > > > (2)
> > > > >   int diff = x - y;
> > > > >   if (x > y)
> > > > > foo();
> > > > >   if (diff != 0)
> > > > > bar();
> > > > >
> > > > > This generates sub and test for the second condition and cmp for the
> > > > > first condition. However, if we apply this pass beforehand, we simply
> > > > > get:
> > > > >
> > > > > (3)
> > > > >   int diff = x - y;
> > > > >   if (x > y)
> > > > > foo();
> > > > >   if (x < y)
> > > > > bar();
> > > > >
> > > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > > will not be generated, which is more efficient.
> > > > >
> > > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > > (3) above are recognized as different, which prevents TCO.
> > > > >
> > > > > PR middle-end/113680
> > > >
> > > > This shouldn't be done as a new optimization pass.  It fits either
> > > > the explicit code present in the forwprop pass or a new match.pd
> > > > pattern.  There's possible interaction with x - y value being used
> > > > elsewhere and thus exposing a CSE opportunity as well as
> > > > a comparison against zero being possibly implemented by
> > > > a flag setting subtraction instruction.
> > > >
> > >
> > > Thank you so much for your review!  Although the forwprop pass runs
> > > multiple times, we might not need to apply this optimization multiple
> > > times.  Would it be acceptable to add such optimization?  More
> > > generally, I would like to know how to determine where to put
> > > optimization in the future.
> >
> > This kind of pattern matching expression simplification is best
> > addressed by patterns in match.pd though historically the forwprop
> > pass still catches cases not in match.pd in its
> > forward_propagate_into_comparison_1 (and callers).
> >
>
> I see.  When would patterns in match.pd be applied?  Around forwprop
> or somewhere else?  (Also, could you please tell me a document I can
> learn about these if it exists?)  I as

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-07 Thread Ken Matsui
On Tue, Mar 5, 2024 at 7:58 AM Richard Biener
 wrote:
>
> On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
> >
> > On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
> >  wrote:
> > >
> > > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > > >
> > > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > > equivalence does not hold for types other than signed integers.  When
> > > > it comes to conditions, the former was translated to a combination of
> > > > sub and test, whereas the latter was translated to a single cmp.
> > > > Thus, this optimization pass tries to optimize the former to the
> > > > latter.
> > > >
> > > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > > as defined behavior, specifically, wrapping around according to two's
> > > > complement arithmetic.  This has implications for optimizations that
> > > > rely on the standard behavior of signed integers, where overflow is
> > > > undefined.  Consider the example given:
> > > >
> > > > long long llmax = __LONG_LONG_MAX__;
> > > > long long llmin = -llmax - 1;
> > > >
> > > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > > for a `long long`, this calculation overflows in a defined manner
> > > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > > produces a negative value due to two's complement wraparound.
> > > > Therefore, `llmax - llmin < 0` is true.
> > > >
> > > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > > enabled.  This is why this optimization pass is disabled under
> > > > `-fwrapv`.
> > > >
> > > > This optimization pass must run before the Jump Threading pass and the
> > > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > > >
> > > > (1)
> > > >   int diff = x - y;
> > > >   if (diff > 0)
> > > > foo();
> > > >   if (diff < 0)
> > > > bar();
> > > >
> > > > The second condition would be converted to diff != 0 in the VRP pass
> > > > because we know the postcondition of the first condition is diff <= 0,
> > > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > > after this VRP, we get:
> > > >
> > > > (2)
> > > >   int diff = x - y;
> > > >   if (x > y)
> > > > foo();
> > > >   if (diff != 0)
> > > > bar();
> > > >
> > > > This generates sub and test for the second condition and cmp for the
> > > > first condition. However, if we apply this pass beforehand, we simply
> > > > get:
> > > >
> > > > (3)
> > > >   int diff = x - y;
> > > >   if (x > y)
> > > > foo();
> > > >   if (x < y)
> > > > bar();
> > > >
> > > > In this code, diff will be eliminated as a dead code, and sub and test
> > > > will not be generated, which is more efficient.
> > > >
> > > > For the Jump Threading pass, without this optimization pass, (1) and
> > > > (3) above are recognized as different, which prevents TCO.
> > > >
> > > > PR middle-end/113680
> > >
> > > This shouldn't be done as a new optimization pass.  It fits either
> > > the explicit code present in the forwprop pass or a new match.pd
> > > pattern.  There's possible interaction with x - y value being used
> > > elsewhere and thus exposing a CSE opportunity as well as
> > > a comparison against zero being possibly implemented by
> > > a flag setting subtraction instruction.
> > >
> >
> > Thank you so much for your review!  Although the forwprop pass runs
> > multiple times, we might not need to apply this optimization multiple
> > times.  Would it be acceptable to add such optimization?  More
> > generally, I would like to know how to determine where to put
> > optimization in the future.
>
> This kind of pattern matching expression simplification is best
> addressed by patterns in match.pd though historically the forwprop
> pass still catches cases not in match.pd in its
> forward_propagate_into_comparison_1 (and callers).
>

I see.  When would patterns in match.pd be applied?  Around forwprop
or somewhere else?  (Also, could you please tell me a document I can
learn about these if it exists?)  I ask because this optimization
should be applied before the Jump Threading and VRP passes.

> > FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
> >
> > > Our VN pass has some tricks to anticipate CSE opportunities
> > > like this, but it's not done "properly" in the wa

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Richard Biener
On Tue, Mar 5, 2024 at 1:51 PM Ken Matsui  wrote:
>
> On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
>  wrote:
> >
> > On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> > >
> > > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > > equivalence does not hold for types other than signed integers.  When
> > > it comes to conditions, the former was translated to a combination of
> > > sub and test, whereas the latter was translated to a single cmp.
> > > Thus, this optimization pass tries to optimize the former to the
> > > latter.
> > >
> > > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > > as defined behavior, specifically, wrapping around according to two's
> > > complement arithmetic.  This has implications for optimizations that
> > > rely on the standard behavior of signed integers, where overflow is
> > > undefined.  Consider the example given:
> > >
> > > long long llmax = __LONG_LONG_MAX__;
> > > long long llmin = -llmax - 1;
> > >
> > > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > > for a `long long`, this calculation overflows in a defined manner
> > > (wrapping around), which under `-fwrapv` is a legal operation that
> > > produces a negative value due to two's complement wraparound.
> > > Therefore, `llmax - llmin < 0` is true.
> > >
> > > However, the direct comparison `llmax < llmin` is false since `llmax`
> > > is the maximum possible value and `llmin` is the minimum.  Hence,
> > > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > > enabled.  This is why this optimization pass is disabled under
> > > `-fwrapv`.
> > >
> > > This optimization pass must run before the Jump Threading pass and the
> > > VRP pass, as it may modify conditions. For example, in the VRP pass:
> > >
> > > (1)
> > >   int diff = x - y;
> > >   if (diff > 0)
> > > foo();
> > >   if (diff < 0)
> > > bar();
> > >
> > > The second condition would be converted to diff != 0 in the VRP pass
> > > because we know the postcondition of the first condition is diff <= 0,
> > > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > > after this VRP, we get:
> > >
> > > (2)
> > >   int diff = x - y;
> > >   if (x > y)
> > > foo();
> > >   if (diff != 0)
> > > bar();
> > >
> > > This generates sub and test for the second condition and cmp for the
> > > first condition. However, if we apply this pass beforehand, we simply
> > > get:
> > >
> > > (3)
> > >   int diff = x - y;
> > >   if (x > y)
> > > foo();
> > >   if (x < y)
> > > bar();
> > >
> > > In this code, diff will be eliminated as a dead code, and sub and test
> > > will not be generated, which is more efficient.
> > >
> > > For the Jump Threading pass, without this optimization pass, (1) and
> > > (3) above are recognized as different, which prevents TCO.
> > >
> > > PR middle-end/113680
> >
> > This shouldn't be done as a new optimization pass.  It fits either
> > the explicit code present in the forwprop pass or a new match.pd
> > pattern.  There's possible interaction with x - y value being used
> > elsewhere and thus exposing a CSE opportunity as well as
> > a comparison against zero being possibly implemented by
> > a flag setting subtraction instruction.
> >
>
> Thank you so much for your review!  Although the forwprop pass runs
> multiple times, we might not need to apply this optimization multiple
> times.  Would it be acceptable to add such optimization?  More
> generally, I would like to know how to determine where to put
> optimization in the future.

This kind of pattern matching expression simplification is best
addressed by patterns in match.pd though historically the forwprop
pass still catches cases not in match.pd in its
forward_propagate_into_comparison_1 (and callers).

> FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse
>
> > Our VN pass has some tricks to anticipate CSE opportunities
> > like this, but it's not done "properly" in the way of anticipating
> > both forms during PRE.
> >
> > I'll note we have
> >
> >  /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
> >  (for cmp (ne ltgt)
> >
> > and similar which might be confused by canonicalizing to A != B.
>
> I will investigate and update my patch (after my final exam ends...)!
>
> > I'm also surprised we don't already have the pattern you add.
>
> Hmm, so am I.

It looks like we do it for equality compares which can also handle
types where overflow is undefined.  -fdump-tree-all-folding sho

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Ken Matsui
On Tue, Mar 5, 2024 at 12:38 AM Richard Biener
 wrote:
>
> On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
> >
> > (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> > integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> > equivalent to y CMP x.  As reported in PR middle-end/113680, this
> > equivalence does not hold for types other than signed integers.  When
> > it comes to conditions, the former was translated to a combination of
> > sub and test, whereas the latter was translated to a single cmp.
> > Thus, this optimization pass tries to optimize the former to the
> > latter.
> >
> > When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> > as defined behavior, specifically, wrapping around according to two's
> > complement arithmetic.  This has implications for optimizations that
> > rely on the standard behavior of signed integers, where overflow is
> > undefined.  Consider the example given:
> >
> > long long llmax = __LONG_LONG_MAX__;
> > long long llmin = -llmax - 1;
> >
> > Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> > simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> > for a `long long`, this calculation overflows in a defined manner
> > (wrapping around), which under `-fwrapv` is a legal operation that
> > produces a negative value due to two's complement wraparound.
> > Therefore, `llmax - llmin < 0` is true.
> >
> > However, the direct comparison `llmax < llmin` is false since `llmax`
> > is the maximum possible value and `llmin` is the minimum.  Hence,
> > optimizations that rely on the equivalence of `(x - y) CMP 0` to
> > `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> > enabled.  This is why this optimization pass is disabled under
> > `-fwrapv`.
> >
> > This optimization pass must run before the Jump Threading pass and the
> > VRP pass, as it may modify conditions. For example, in the VRP pass:
> >
> > (1)
> >   int diff = x - y;
> >   if (diff > 0)
> > foo();
> >   if (diff < 0)
> > bar();
> >
> > The second condition would be converted to diff != 0 in the VRP pass
> > because we know the postcondition of the first condition is diff <= 0,
> > and then diff != 0 is cheaper than diff < 0. If we apply this pass
> > after this VRP, we get:
> >
> > (2)
> >   int diff = x - y;
> >   if (x > y)
> > foo();
> >   if (diff != 0)
> > bar();
> >
> > This generates sub and test for the second condition and cmp for the
> > first condition. However, if we apply this pass beforehand, we simply
> > get:
> >
> > (3)
> >   int diff = x - y;
> >   if (x > y)
> > foo();
> >   if (x < y)
> > bar();
> >
> > In this code, diff will be eliminated as a dead code, and sub and test
> > will not be generated, which is more efficient.
> >
> > For the Jump Threading pass, without this optimization pass, (1) and
> > (3) above are recognized as different, which prevents TCO.
> >
> > PR middle-end/113680
>
> This shouldn't be done as a new optimization pass.  It fits either
> the explicit code present in the forwprop pass or a new match.pd
> pattern.  There's possible interaction with x - y value being used
> elsewhere and thus exposing a CSE opportunity as well as
> a comparison against zero being possibly implemented by
> a flag setting subtraction instruction.
>

Thank you so much for your review!  Although the forwprop pass runs
multiple times, we might not need to apply this optimization multiple
times.  Would it be acceptable to add such optimization?  More
generally, I would like to know how to determine where to put
optimization in the future.

FYI, I read this page: https://gcc.gnu.org/wiki/OptimizationCourse

> Our VN pass has some tricks to anticipate CSE opportunities
> like this, but it's not done "properly" in the way of anticipating
> both forms during PRE.
>
> I'll note we have
>
>  /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
>  (for cmp (ne ltgt)
>
> and similar which might be confused by canonicalizing to A != B.

I will investigate and update my patch (after my final exam ends...)!

> I'm also surprised we don't already have the pattern you add.

Hmm, so am I.  I saw that this optimization sometimes messes up VRP
while sometimes helping it (as I described in my comment).  I also
need to research this.

>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> > * common.opt: Define ftree-cmp
> > * doc/invoke.texi: Document ftree-cmp.
> > * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> > * passes.def (pass_cmp): New optimization pass.
> > * timevar.def (TV_TREE_CMP): New variable for timing.
> > * tree-pass.h (make_pass_cmp): New declaration.
> > * tree-ssa-cmp.cc: New file.

Re: [PATCH] middle-end/113680 - Optimize (x - y) CMP 0 as x CMP y

2024-03-05 Thread Richard Biener
On Mon, Mar 4, 2024 at 9:40 PM Ken Matsui  wrote:
>
> (x - y) CMP 0 is equivalent to x CMP y where x and y are signed
> integers and CMP is <, <=, >, or >=.  Similarly, 0 CMP (x - y) is
> equivalent to y CMP x.  As reported in PR middle-end/113680, this
> equivalence does not hold for types other than signed integers.  When
> it comes to conditions, the former was translated to a combination of
> sub and test, whereas the latter was translated to a single cmp.
> Thus, this optimization pass tries to optimize the former to the
> latter.
>
> When `-fwrapv` is enabled, GCC treats the overflow of signed integers
> as defined behavior, specifically, wrapping around according to two's
> complement arithmetic.  This has implications for optimizations that
> rely on the standard behavior of signed integers, where overflow is
> undefined.  Consider the example given:
>
> long long llmax = __LONG_LONG_MAX__;
> long long llmin = -llmax - 1;
>
> Here, `llmax - llmin` effectively becomes `llmax - (-llmax - 1)`, which
> simplifies to `2 * llmax + 1`.  Given that `llmax` is the maximum value
> for a `long long`, this calculation overflows in a defined manner
> (wrapping around), which under `-fwrapv` is a legal operation that
> produces a negative value due to two's complement wraparound.
> Therefore, `llmax - llmin < 0` is true.
>
> However, the direct comparison `llmax < llmin` is false since `llmax`
> is the maximum possible value and `llmin` is the minimum.  Hence,
> optimizations that rely on the equivalence of `(x - y) CMP 0` to
> `x CMP y` (and vice versa) cannot be safely applied when `-fwrapv` is
> enabled.  This is why this optimization pass is disabled under
> `-fwrapv`.
>
> This optimization pass must run before the Jump Threading pass and the
> VRP pass, as it may modify conditions. For example, in the VRP pass:
>
> (1)
>   int diff = x - y;
>   if (diff > 0)
> foo();
>   if (diff < 0)
> bar();
>
> The second condition would be converted to diff != 0 in the VRP pass
> because we know the postcondition of the first condition is diff <= 0,
> and then diff != 0 is cheaper than diff < 0. If we apply this pass
> after this VRP, we get:
>
> (2)
>   int diff = x - y;
>   if (x > y)
> foo();
>   if (diff != 0)
> bar();
>
> This generates sub and test for the second condition and cmp for the
> first condition. However, if we apply this pass beforehand, we simply
> get:
>
> (3)
>   int diff = x - y;
>   if (x > y)
> foo();
>   if (x < y)
> bar();
>
> In this code, diff will be eliminated as a dead code, and sub and test
> will not be generated, which is more efficient.
>
> For the Jump Threading pass, without this optimization pass, (1) and
> (3) above are recognized as different, which prevents TCO.
>
> PR middle-end/113680

This shouldn't be done as a new optimization pass.  It fits either
the explicit code present in the forwprop pass or a new match.pd
pattern.  There's possible interaction with x - y value being used
elsewhere and thus exposing a CSE opportunity as well as
a comparison against zero being possibly implemented by
a flag setting subtraction instruction.

Our VN pass has some tricks to anticipate CSE opportunities
like this, but it's not done "properly" in the way of anticipating
both forms during PRE.

I'll note we have

 /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
 (for cmp (ne ltgt)

and similar which might be confused by canonicalizing to A != B.
I'm also surprised we don't already have the pattern you add.

Richard.

> gcc/ChangeLog:
>
> * Makefile.in: Add tree-ssa-cmp.o to OBJS.
> * common.opt: Define ftree-cmp
> * doc/invoke.texi: Document ftree-cmp.
> * opts.cc (default_options_table): Handle OPT_ftree_cmp.
> * passes.def (pass_cmp): New optimization pass.
> * timevar.def (TV_TREE_CMP): New variable for timing.
> * tree-pass.h (make_pass_cmp): New declaration.
> * tree-ssa-cmp.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr113680.c: New test.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/Makefile.in |   1 +
>  gcc/common.opt  |   4 +
>  gcc/doc/invoke.texi |  11 +-
>  gcc/opts.cc |   1 +
>  gcc/passes.def  |   3 +
>  gcc/testsuite/gcc.dg/pr113680.c |  47 ++
>  gcc/timevar.def |   1 +
>  gcc/tree-pass.h |   1 +
>  gcc/tree-ssa-cmp.cc | 262 
>  9 files changed, 330 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr113680.c
>  create mode 100644 gcc/tree-ssa-cmp.cc
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a74761b7ab3..935b80b6947 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1731,6 +1731,7 @@ OBJS = \