Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-15 Thread Marc Glisse

On Fri, 14 May 2021, Jakub Jelinek via Gcc-patches wrote:


On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:

We can probably do it in 2 steps, first something like

(for cmp (eq ne)
 (simplify
  (cmp (bit_and:c @0 @1) @0)
  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))

to get rid of the double use, and then simplify X&C==0 to X<=~C if C is a
mask 111...000 (I thought we already had a function to detect such masks, or
the 000...111, but I can't find them anymore).

I agree that the comparison seems preferable, although if X is signed, the
way GIMPLE represents types will add an inconvenient cast. And I think VRP
already manages to use the bit test to derive a range.


I've tried the second step, but it unfortunately regresses
+FAIL: gcc.dg/ipa/propbits-2.c scan-tree-dump-not optimized "fail_test"
+FAIL: gcc.dg/tree-ssa/loop-42.c scan-tree-dump-not ivcanon "under assumptions "
so maybe it is better to keep these cases as the users wrote them.


Thank you for trying it; the patch looks good (it would probably also work 
on GENERIC), but indeed it is just a canonicalization, not necessarily 
worth breaking other stuff. This seems to point to another missed 
optimization. Looking at propbits-2.c, if I rewrite the condition in f1 as


  if ((unsigned)x <= 0xf)

I see later in VRP

  # RANGE [0, 4294967295] NONZERO 15
  x.2_1 = (unsigned intD.9) x_4(D);
  if (x.2_1 <= 15)

where we fail to derive a range from the nonzero bits. Maybe VRP expects 
that IPA-CP has already done it and doesn't bother recomputing it.


I understand this may not be a priority though, that's fine.

I didn't look at loop-42.c.

--
Marc Glisse


Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-14 Thread Jakub Jelinek via Gcc-patches
On Thu, May 06, 2021 at 09:42:41PM +0200, Marc Glisse wrote:
> We can probably do it in 2 steps, first something like
> 
> (for cmp (eq ne)
>  (simplify
>   (cmp (bit_and:c @0 @1) @0)
>   (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))
> 
> to get rid of the double use, and then simplify X&C==0 to X<=~C if C is a
> mask 111...000 (I thought we already had a function to detect such masks, or
> the 000...111, but I can't find them anymore).
> 
> I agree that the comparison seems preferable, although if X is signed, the
> way GIMPLE represents types will add an inconvenient cast. And I think VRP
> already manages to use the bit test to derive a range.

I've tried the second step, but it unfortunately regresses
+FAIL: gcc.dg/ipa/propbits-2.c scan-tree-dump-not optimized "fail_test"
+FAIL: gcc.dg/tree-ssa/loop-42.c scan-tree-dump-not ivcanon "under assumptions "
so maybe it is better to keep these cases as the users wrote them.

So posting this patch just for archival purposes.

2021-05-13  Jakub Jelinek  

PR tree-optimization/94589
* match.pd ((X & (-1 << N)) == 0 -> X <= (1 << N) - 1U): New
simplification.

* gcc.dg/tree-ssa/pr94589-2.c: New test.

--- gcc/match.pd.jj 2021-05-12 09:45:55.832976540 +0200
+++ gcc/match.pd	2021-05-13 19:51:17.458507854 +0200
@@ -4792,6 +4792,22 @@ (define_operator_list COND_TERNARY
   (cmp (bit_and:cs @0 @2) (bit_and:cs @1 @2))
   (cmp (bit_and (bit_xor @0 @1) @2) { build_zero_cst (TREE_TYPE (@2)); })))
 
+#if GIMPLE
+/* (X & (-1 << N)) == 0 becomes X <= (1 << N) - 1.  */
+(for cmp (eq ne)
+     ncmp (le gt)
+ (simplify
+  (cmp:c (bit_and:cs @0 INTEGER_CST@1) integer_zerop)
+  (with { tree utype = NULL_TREE;
+	  int tz = wi::ctz (wi::to_wide (@1));
+	  int prec = TYPE_PRECISION (TREE_TYPE (@1));
+	  if (tz && wi::eq_p (wi::shifted_mask (tz, prec - tz, false, prec),
+			      wi::to_wide (@1)))
+	    utype = unsigned_type_for (TREE_TYPE (@0)); }
+   (if (utype)
+    (ncmp (convert:utype @0) (convert:utype (bit_not @1)))))))
+#endif
+
 /* (X < 0) != (Y < 0) into (X ^ Y) < 0.
(X >= 0) != (Y >= 0) into (X ^ Y) < 0.
(X < 0) == (Y < 0) into (X ^ Y) >= 0.
--- gcc/testsuite/gcc.dg/tree-ssa/pr94589-2.c.jj	2021-05-13 19:55:21.201854003 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr94589-2.c	2021-05-13 19:54:59.804086977 +0200
@@ -0,0 +1,21 @@
+/* PR tree-optimization/94589 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+foo (int x)
+{
+  return (x & (-1 << 7)) == 0;
+/* { dg-final { scan-tree-dump " = \\\(unsigned int\\\) x_" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " & -128" "optimized" } } */
+/* { dg-final { scan-tree-dump " <= 127" "optimized" } } */
+}
+
+int
+bar (int y)
+{
+  return (y & (-1 << 12)) != 0;
+/* { dg-final { scan-tree-dump " = \\\(unsigned int\\\) y_" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " & -4096" "optimized" } } */
+/* { dg-final { scan-tree-dump " > 4095" "optimized" } } */
+}


Jakub



Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-06 Thread Marc Glisse

On Thu, 6 May 2021, Jakub Jelinek via Gcc-patches wrote:


Though, (x&1) == x is equivalent to both (x&~1)==0 and to x < 2U
and from the latter two it isn't obvious which one is better/more canonical.
On aarch64 I don't see differences in number of insns nor in their size:
 10:	13001c00	sxtb	w0, w0
 14:	721f781f	tst	w0, #0xfffe
 18:	1a9f17e0	cset	w0, eq  // eq = none
 1c:	d65f03c0	ret
vs.
 20:	12001c00	and	w0, w0, #0xff
 24:	7100041f	cmp	w0, #0x1
 28:	1a9f87e0	cset	w0, ls  // ls = plast
 2c:	d65f03c0	ret
On x86_64 same number of insns, but the comparison is shorter (note, the
spaceship result is a struct with signed char based enum):
 10:	31 c0               	xor    %eax,%eax
 12:	81 e7 fe 00 00 00   	and    $0xfe,%edi
 18:	0f 94 c0            	sete   %al
 1b:	c3                  	retq
 1c:	0f 1f 40 00         	nopl   0x0(%rax)
vs.
 20:	31 c0               	xor    %eax,%eax
 22:	40 80 ff 01         	cmp    $0x1,%dil
 26:	0f 96 c0            	setbe  %al
 29:	c3                  	retq
Generally, I'd think that the comparison should be better because it
will be most common in user code that way and VRP will be able to do the
best thing for it.


We can probably do it in 2 steps, first something like

(for cmp (eq ne)
 (simplify
  (cmp (bit_and:c @0 @1) @0)
  (cmp (bit_and @0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))

to get rid of the double use, and then simplify X&C==0 to X<=~C if C is a 
mask 111...000 (I thought we already had a function to detect such masks, 
or the 000...111, but I can't find them anymore).


I agree that the comparison seems preferable, although if X is signed, the 
way GIMPLE represents types will add an inconvenient cast. And I think VRP 
already manages to use the bit test to derive a range.


--
Marc Glisse


Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 06:52:27PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Wed, May 05, 2021 at 01:45:29PM +0200, Marc Glisse wrote:
> > On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > 2) the pr94589-2.C testcase should be matching just 12 times each, but 
> > > runs
> > > into operator>=(strong_ordering, unspecified) being defined as
> > > (_M_value&1)==_M_value
> > > rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> > > unreachable and so (_M_value&1)==_M_value is then equivalent to 
> > > _M_value>=0,
> > > but is not a single use but two uses.  I'll need to pattern match that 
> > > case
> > > specially.
> > 
> > Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> > that could be worth doing already in GIMPLE.
> 
> Apparently it is
>   /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
>  constant folding if x/y is a constant.  */
>   if ((code == EQ || code == NE)
>   && (op0code == AND || op0code == IOR)
>   && !side_effects_p (op1)
>   && op1 != CONST0_RTX (cmp_mode))
> {
>   /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
>  (eq/ne (and (not y) x) 0).  */
> ...
>   /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
>  (eq/ne (and (not x) y) 0).  */
> Yes, doing that on GIMPLE for the case where the not argument is constant
> would simplify the phiopt follow-up (it would be single imm use then).

Though, (x&1) == x is equivalent to both (x&~1)==0 and to x < 2U
and from the latter two it isn't obvious which one is better/more canonical.
On aarch64 I don't see differences in number of insns nor in their size:
  10:	13001c00	sxtb	w0, w0
  14:	721f781f	tst	w0, #0xfffe
  18:	1a9f17e0	cset	w0, eq  // eq = none
  1c:	d65f03c0	ret
vs.
  20:	12001c00	and	w0, w0, #0xff
  24:	7100041f	cmp	w0, #0x1
  28:	1a9f87e0	cset	w0, ls  // ls = plast
  2c:	d65f03c0	ret
On x86_64 same number of insns, but the comparison is shorter (note, the
spaceship result is a struct with signed char based enum):
  10:	31 c0               	xor    %eax,%eax
  12:	81 e7 fe 00 00 00   	and    $0xfe,%edi
  18:	0f 94 c0            	sete   %al
  1b:	c3                  	retq
  1c:	0f 1f 40 00         	nopl   0x0(%rax)
vs.
  20:	31 c0               	xor    %eax,%eax
  22:	40 80 ff 01         	cmp    $0x1,%dil
  26:	0f 96 c0            	setbe  %al
  29:	c3                  	retq
Generally, I'd think that the comparison should be better because it
will be most common in user code that way and VRP will be able to do the
best thing for it.

Before we decide how to optimize it, can't we change libstdc++
to use the more efficient implementation?

I.e. either the following, or { return (__v._M_value & ~1) == 0; } ?

2021-05-06  Jakub Jelinek  

* compare (operator>=(partial_ordering, __cmp_cat::__unspec),
operator<=(__cmp_cat::__unspec, partial_ordering)): Simplify to
(unsigned char) __v._M_value < 2.

--- gcc/compare.jj  2021-03-10 17:36:41.288491012 +0100
+++ gcc/compare 2021-05-06 11:44:37.519553192 +0200
@@ -107,7 +107,7 @@ namespace std
 
 friend constexpr bool
 operator>=(partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+{ return static_cast<unsigned char>(__v._M_value) < 2; }
 
 friend constexpr bool
 operator< (__cmp_cat::__unspec, partial_ordering __v) noexcept
@@ -119,7 +119,7 @@ namespace std
 
 friend constexpr bool
 operator<=(__cmp_cat::__unspec, partial_ordering __v) noexcept
-{ return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+{ return static_cast<unsigned char>(__v._M_value) < 2; }
 
 friend constexpr bool
 operator>=(__cmp_cat::__unspec, partial_ordering __v) noexcept


Jakub



Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 01:45:29PM +0200, Marc Glisse wrote:
> On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:
> 
> > 2) the pr94589-2.C testcase should be matching just 12 times each, but runs
> > into operator>=(strong_ordering, unspecified) being defined as
> > (_M_value&1)==_M_value
> > rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> > unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
> > but is not a single use but two uses.  I'll need to pattern match that case
> > specially.
> 
> Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> that could be worth doing already in GIMPLE.

Apparently it is
  /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
 constant folding if x/y is a constant.  */
  if ((code == EQ || code == NE)
  && (op0code == AND || op0code == IOR)
  && !side_effects_p (op1)
  && op1 != CONST0_RTX (cmp_mode))
{
  /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
 (eq/ne (and (not y) x) 0).  */
...
  /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
 (eq/ne (and (not x) y) 0).  */
Yes, doing that on GIMPLE for the case where the not argument is constant
would simplify the phiopt follow-up (it would be single imm use then).

Jakub



Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Martin Sebor via Gcc-patches

On 5/4/21 1:44 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

genericize_spaceship genericizes i <=> j to approximately
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
for strong ordering and
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; 
else c = 2; c; })
for partial ordering.
The C++ standard then supports == or != comparisons of that against
strong/partial ordering enums, or </<=/>/>= comparisons of the <=> result
against literal 0.

In some cases we already optimize that but in many cases we keep performing
all the 2 or 3 comparisons, compute the spaceship value and then compare
that.

The following patch recognizes those patterns if the <=> operands are
integral types or floating point (the latter only for -ffast-math) and
optimizes it to the single comparison that is needed (plus adds debug stmts
if needed for the spaceship result).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Just a few words on readabilty.

I didn't follow all the logic carefully but I found the code very
dense and hard to read, and a little too frugal with comments.
The new spaceship_replacement function is over 340 lines long!  It
could stand to be broken up into a few smaller logical pieces to
be easier to read and understand.  Commenting the pieces would
also help (a lot).  (It may seem crystal clear to you the way it
is but not all of us read code as it was prose.  It's good to
keep that in mind.)

Overall, long functions are easier to read if they make more liberal
use of vertical whitespace than when they look like one uninterrupted
stream of text.  I see that sometimes you separate chunks of code with
blank lines but other times you don't.  I'm not sure I see any pattern
to it but it would help to separate logical blocks with blank lines,
especially after return statements.  I marked up a few spots to give
you a general idea what I mean.

Martin



There are two things I'd like to address in a follow-up:
1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
is what I've copied from elsewhere in phiopt, but thinking about it,
all we care about is probably only HONOR_NANS; the matched pattern starts with
an == or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
equal, which should be the case for signed zero differences.
2) the pr94589-2.C testcase should be matching just 12 times each, but runs
into operator>=(strong_ordering, unspecified) being defined as
(_M_value&1)==_M_value
rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
but is not a single use but two uses.  I'll need to pattern match that case
specially.

2021-05-04  Jakub Jelinek  

PR tree-optimization/94589
* tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
spaceship_replacement.
(cond_only_block_p, spaceship_replacement): New functions.

* gcc.dg/pr94589-1.c: New test.
* gcc.dg/pr94589-2.c: New test.
* gcc.dg/pr94589-3.c: New test.
* gcc.dg/pr94589-4.c: New test.
* g++.dg/opt/pr94589-1.C: New test.
* g++.dg/opt/pr94589-2.C: New test.
* g++.dg/opt/pr94589-3.C: New test.
* g++.dg/opt/pr94589-4.C: New test.

--- gcc/tree-ssa-phiopt.c.jj2021-05-02 10:17:49.095397758 +0200
+++ gcc/tree-ssa-phiopt.c   2021-05-03 17:49:54.233300624 +0200
@@ -64,6 +64,8 @@ static bool abs_replacement (basic_block
 edge, edge, gimple *, tree, tree);
  static bool xor_replacement (basic_block, basic_block,
 edge, edge, gimple *, tree, tree);
+static bool spaceship_replacement (basic_block, basic_block,
+  edge, edge, gimple *, tree, tree);
  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
basic_block,
  edge, edge, gimple *,
  tree, tree);
@@ -357,6 +359,8 @@ tree_ssa_phiopt_worker (bool do_store_el
cfgchanged = true;
  else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
+ else if (spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
+   cfgchanged = true;
}
  }
  
@@ -1806,6 +1810,420 @@ minmax_replacement (basic_block cond_bb,
  
return true;

  }
+
+/* Return true if the only executable statement in BB is a GIMPLE_COND.  */
+
+static bool
+cond_only_block_p (basic_block bb)
+{
+  /* BB must have no executable statements.  */
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  if (phi_nodes (bb))
+return false;
+  while (!gsi_end_p (gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  if (is_gimple_debug (stmt))
+   ;
+  else if (gimple_code (stmt) == GIMPLE_NOP
+  || gimple_code (stmt) == GIMPLE_PREDICT
+  || gimple_code (stmt) == GIMPLE_COND)

Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Marc Glisse

On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:


2) the pr94589-2.C testcase should be matching just 12 times each, but runs
into operator>=(strong_ordering, unspecified) being defined as
(_M_value&1)==_M_value
rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
but is not a single use but two uses.  I'll need to pattern match that case
specially.


Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0, 
that could be worth doing already in GIMPLE.


--
Marc Glisse


Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-05 Thread Richard Biener
On Tue, 4 May 2021, Jakub Jelinek wrote:

> Hi!
> 
> genericize_spaceship genericizes i <=> j to approximately
> ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
> for strong ordering and
> ({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; 
> else c = 2; c; })
> for partial ordering.
> The C++ standard then supports == or != comparisons of that against
> strong/partial ordering enums, or </<=/>/>= comparisons of the <=> result
> against literal 0.
> 
> In some cases we already optimize that but in many cases we keep performing
> all the 2 or 3 comparisons, compute the spaceship value and then compare
> that.
> 
> The following patch recognizes those patterns if the <=> operands are
> integral types or floating point (the latter only for -ffast-math) and
> optimizes it to the single comparison that is needed (plus adds debug stmts
> if needed for the spaceship result).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> There are two things I'd like to address in a follow-up:
> 1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
> is what I've copied from elsewhere in phiopt, but thinking about it,
> all we care about is probably only HONOR_NANS; the matched pattern starts with
> == or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
> equal, which should be the case for signed zero differences.
> 2) the pr94589-2.C testcase should be matching just 12 times each, but runs
> into operator>=(strong_ordering, unspecified) being defined as
> (_M_value&1)==_M_value
> rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> unreachable and so (_M_value&1)==_M_value is then equivalent to _M_value>=0,
> but is not a single use but two uses.  I'll need to pattern match that case
> specially.
> 
> 2021-05-04  Jakub Jelinek  
> 
>   PR tree-optimization/94589
>   * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Call
>   spaceship_replacement.
>   (cond_only_block_p, spaceship_replacement): New functions.
> 
>   * gcc.dg/pr94589-1.c: New test.
>   * gcc.dg/pr94589-2.c: New test.
>   * gcc.dg/pr94589-3.c: New test.
>   * gcc.dg/pr94589-4.c: New test.
>   * g++.dg/opt/pr94589-1.C: New test.
>   * g++.dg/opt/pr94589-2.C: New test.
>   * g++.dg/opt/pr94589-3.C: New test.
>   * g++.dg/opt/pr94589-4.C: New test.
> 
> --- gcc/tree-ssa-phiopt.c.jj  2021-05-02 10:17:49.095397758 +0200
> +++ gcc/tree-ssa-phiopt.c 2021-05-03 17:49:54.233300624 +0200
> @@ -64,6 +64,8 @@ static bool abs_replacement (basic_block
>edge, edge, gimple *, tree, tree);
>  static bool xor_replacement (basic_block, basic_block,
>edge, edge, gimple *, tree, tree);
> +static bool spaceship_replacement (basic_block, basic_block,
> +edge, edge, gimple *, tree, tree);
>  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
> basic_block,
> edge, edge, gimple *,
> tree, tree);
> @@ -357,6 +359,8 @@ tree_ssa_phiopt_worker (bool do_store_el
>   cfgchanged = true;
> else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
>   cfgchanged = true;
> +   else if (spaceship_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> + cfgchanged = true;
>   }
>  }
>  
> @@ -1806,6 +1810,420 @@ minmax_replacement (basic_block cond_bb,
>  
>return true;
>  }
> +
> +/* Return true if the only executable statement in BB is a GIMPLE_COND.  */
> +
> +static bool
> +cond_only_block_p (basic_block bb)
> +{
> +  /* BB must have no executable statements.  */
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  if (phi_nodes (bb))
> +return false;
> +  while (!gsi_end_p (gsi))
> +{
> +  gimple *stmt = gsi_stmt (gsi);
> +  if (is_gimple_debug (stmt))
> + ;
> +  else if (gimple_code (stmt) == GIMPLE_NOP
> +|| gimple_code (stmt) == GIMPLE_PREDICT
> +|| gimple_code (stmt) == GIMPLE_COND)
> + ;
> +  else
> + return false;
> +  gsi_next (&gsi);
> +}
> +  return true;
> +}
> +
> +/* Attempt to optimize (x <=> y) cmp 0 and similar comparisons.
> +   For strong ordering <=> try to match something like:
> + :
> +if (x_4(D) != y_5(D))
> +  goto ; [INV]
> +else
> +  goto ; [INV]
> +
> + :
> +if (x_4(D) < y_5(D))
> +  goto ; [INV]
> +else
> +  goto ; [INV]
> +
> + :
> +
> + :
> +# iftmp.0_2 = PHI <1(4), 0(2), -1(3)>
> +_1 = iftmp.0_2 == 0;
> +
> +   and for partial ordering <=> something like:
> +
> + :
> +if (a_3(D) == b_5(D))
> +  goto ; [50.00%]
> +else
> +  goto ; [50.00%]
> +
> + [local count: 536870913]:
> +if (a_3(D) < b_5(D))
> +  goto ; [50.00%]
> +else
> +  goto ; [50.00%]
> +
> + [local count:

Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-04 Thread Jakub Jelinek via Gcc-patches
On Tue, May 04, 2021 at 10:42:12AM +0100, Jonathan Wakely wrote:
> > There are two things I'd like to address in a follow-up:
> > 1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE 
> > (lhs1)))
> > is what I've copied from elsewhere in phiopt, but thinking about it,
> > all we care about is probably only HONOR_NANS; the matched pattern starts with
> > == or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
> > equal, which should be the case for signed zero differences.
> > 2) the pr94589-2.C testcase should be matching just 12 times each, but runs
> > into operator>=(strong_ordering, unspecified) being defined as
> 
> Should this say s/strong/partial/  ?

Yeah, sorry.

Jakub



Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-04 Thread Jonathan Wakely via Gcc-patches

On 04/05/21 09:44 +0200, Jakub Jelinek wrote:

Hi!

genericize_spaceship genericizes i <=> j to approximately
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else c = 1; c; })
for strong ordering and
({ int c; if (i == j) c = 0; else if (i < j) c = -1; else if (i > j) c = 1; 
else c = 2; c; })
for partial ordering.
The C++ standard then supports == or != comparisons of that against
strong/partial ordering enums, or </<=/>/>= comparisons of the <=> result
against literal 0.

In some cases we already optimize that but in many cases we keep performing
all the 2 or 3 comparisons, compute the spaceship value and then compare
that.

The following patch recognizes those patterns if the <=> operands are
integral types or floating point (the latter only for -ffast-math) and
optimizes it to the single comparison that is needed (plus adds debug stmts
if needed for the spaceship result).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

There are two things I'd like to address in a follow-up:
1) if (HONOR_NANS (TREE_TYPE (lhs1)) || HONOR_SIGNED_ZEROS (TREE_TYPE (lhs1)))
is what I've copied from elsewhere in phiopt, but thinking about it,
all we care about is probably only HONOR_NANS; the matched pattern starts with
== or != comparison and branches to the PHI bb with -1/0/1/2 result if it is
equal, which should be the case for signed zero differences.
2) the pr94589-2.C testcase should be matching just 12 times each, but runs
into operator>=(strong_ordering, unspecified) being defined as


Should this say s/strong/partial/  ?