Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

2022-01-17 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 17, 2022 at 01:04:40PM +0100, Richard Biener wrote:
> > I guess it depends, for code that can only be called during the expand pass
> > dropping it should be just fine, for code that can be called also (or only)
> > later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
> > nothing will fix it up afterwards.
> 
> I'm noting that
> 
> +  /* BB must have no executable statements.  */
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  if (phi_nodes (bb))
> +return false;
> 
> disallows blocks with just a virtual PHI which wouldn't be
> "executable".  Not sure if anything will break when we fix that.

Note I'm only moving the existing function from phiopt to tree-cfg.c
so that I can use it from tree-ssa-math-opts.c.  But all the
cond_only_block_p uses in phiopt and now in tree-ssa-maht-opts.c too
only call it on single_pred_p (bb) basic blocks, so I don't see
what the virtual PHI on those would be good for.

> For code generation we rely on RTL opts to merge compare/scc
> and the subsequent branches on -1/0/1/[-2], correct?  I wonder
> whether that works on other targets as well or whether a
> asm-goto with "optab" UNSPEC text would be more forward looking?

Yes, we rely on some RTL opts, like we rely on it for e.g. the overflow
builtins or various other cases and they seem to be doing their job
well on my testing.  Initially I thought the optab would have 6 arguments,
2 comparison args and 4 labels and I'd emit a switch in the
tree-ssa-math-opts.c (I even wrote such code).  But it didn't work really
well, the switch in some cases wasn't really optimized, and optimization
passes after the widening_mul liked e.g. to propagate the .SPACESHIP
lhs into some but not all the PHI args if there were any etc.
Emitting a function that returns -1/0/1/2 worked better, especially if
the target attempts to emit it as a series of conditional jumps
with small bbs that just set those values.  RTL opts later on will
merge the jumps with further jumps that test the .SPACESHIP result,
or will turn some of the conditional jumps into scc etc.

> The restriction to scalar floats is probably because with
> scalar integers we're doing fine and with vectors we'd need some
> very much different tricks, right?

Sure, for vectors we couldn't use branches etc.  
I'm not really sure how would one write a vector version of the
spaceship actually.  The primary use case is C++ with <=>, but <=>
returns std::*_ordering which is an aggregate and one that isn't
very easy to turn into an integer even, switch doesn't work,
only if (... == std::partial_ordering::equality) ... else if (...
(unless I'm missing something).
But even in C, maybe:
typedef float V __attribute__((vector_size (16)));
typedef int W __attribute__((vector_size (16)));

W
foo (V x, V y)
{
  return (x != y) & (((x < y) & (W) { -1, -1, -1, -1 }) | ((x > y) & (W) { 1, 
1, 1, 1 }) | ((W) { 2, 2, 2, 2 } & ~(x < y) & ~(x > y)));
}

but it isn't clear how I'd optimize it at the assembly, where
we currently emit:
vbroadcastss.LC2(%rip), %xmm2
vcmpltps%xmm0, %xmm1, %xmm3
vbroadcastss.LC3(%rip), %xmm4
vpand   %xmm3, %xmm2, %xmm2
vcmpltps%xmm1, %xmm0, %xmm3
vpor%xmm3, %xmm2, %xmm2
vcmpneq_oqps%xmm1, %xmm0, %xmm3
vcmpneqps   %xmm1, %xmm0, %xmm0
vpandn  %xmm4, %xmm3, %xmm3
vpor%xmm3, %xmm2, %xmm2
vpand   %xmm0, %xmm2, %xmm0
ret
.LC2:
.long   1
.align 4
.LC3:
.long   2

> The middle-end changes look OK, I don't see anything that
> couldn't be changed if other targets run into problems with
> getting similar optimized code.

Thanks.

Jakub



Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

2022-01-17 Thread Richard Biener via Gcc-patches
On Sat, 15 Jan 2022, Jakub Jelinek wrote:

> On Sat, Jan 15, 2022 at 11:42:55AM +0100, Uros Bizjak wrote:
> > Yes, that would be nice. XFmode is used for long double, and not obsolete.
> 
> Ok, that seems to work.  Compared to the incremental patch I've posted, I
> also had to add handling of the case where we have just
> x == y ? 0 : x < y ? -1 : 1 (both for -ffast-math and non-ffast-math).
> Apparently even that is worth optimizing.
> Tested so far on the new testcases, will run full bootstrap/regtest tonight.
> 
> > > Why?  That seems to be a waste of time to me, unless something uses them
> > > already during expansion.  Because pass_expand::execute
> > > runs:
> > >   /* We need JUMP_LABEL be set in order to redirect jumps, and hence
> > >  split edges which edge insertions might do.  */
> > >   rebuild_jump_labels (get_insns ());
> > > which resets all LABEL_NUSES to 0 (well, to:
> > >   if (LABEL_P (insn))
> > > LABEL_NUSES (insn) = (LABEL_PRESERVE_P (insn) != 0);
> > > and then recomputes them and adds JUMP_LABEL if needed:
> > >   JUMP_LABEL (insn) = label;
> > 
> > I was not aware of that detail. Thanks for sharing (and I wonder if
> > all other cases should be removed from the source).
> 
> I guess it depends, for code that can only be called during the expand pass
> dropping it should be just fine, for code that can be called also (or only)
> later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
> nothing will fix it up afterwards.

I'm noting that

+  /* BB must have no executable statements.  */
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  if (phi_nodes (bb))
+return false;

disallows blocks with just a virtual PHI which wouldn't be
"executable".  Not sure if anything will break when we fix that.

For code generation we rely on RTL opts to merge compare/scc
and the subsequent branches on -1/0/1/[-2], correct?  I wonder
whether that works on other targets as well or whether a
asm-goto with "optab" UNSPEC text would be more forward looking?
The restriction to scalar floats is probably because with
scalar integers we're doing fine and with vectors we'd need some
very much different tricks, right?

The middle-end changes look OK, I don't see anything that
couldn't be changed if other targets run into problems with
getting similar optimized code.

Thanks,
Richard.

> 2022-01-15  Jakub Jelinek  
> 
>   PR target/103973
>   * tree-cfg.h (cond_only_block_p): Declare.
>   * tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
>   * tree-cfg.c (cond_only_block_p): ... here.  No longer static.
>   * optabs.def (spaceship_optab): New optab.
>   * internal-fn.def (SPACESHIP): New internal function.
>   * internal-fn.h (expand_SPACESHIP): Declare.
>   * internal-fn.c (expand_PHI): Formatting fix.
>   (expand_SPACESHIP): New function.
>   * tree-ssa-math-opts.c (optimize_spaceship): New function.
>   (math_opts_dom_walker::after_dom_children): Use it.
>   * config/i386/i386.md (spaceship3): New define_expand.
>   * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
>   * config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
>   * doc/md.texi (spaceship@var{m}3): Document.
> 
>   * gcc.target/i386/pr103973-1.c: New test.
>   * gcc.target/i386/pr103973-2.c: New test.
>   * gcc.target/i386/pr103973-3.c: New test.
>   * gcc.target/i386/pr103973-4.c: New test.
>   * gcc.target/i386/pr103973-5.c: New test.
>   * gcc.target/i386/pr103973-6.c: New test.
>   * gcc.target/i386/pr103973-7.c: New test.
>   * gcc.target/i386/pr103973-8.c: New test.
>   * gcc.target/i386/pr103973-9.c: New test.
>   * gcc.target/i386/pr103973-10.c: New test.
>   * gcc.target/i386/pr103973-11.c: New test.
>   * gcc.target/i386/pr103973-12.c: New test.
>   * gcc.target/i386/pr103973-13.c: New test.
>   * gcc.target/i386/pr103973-14.c: New test.
>   * gcc.target/i386/pr103973-15.c: New test.
>   * gcc.target/i386/pr103973-16.c: New test.
>   * gcc.target/i386/pr103973-17.c: New test.
>   * gcc.target/i386/pr103973-18.c: New test.
>   * gcc.target/i386/pr103973-19.c: New test.
>   * gcc.target/i386/pr103973-20.c: New test.
>   * g++.target/i386/pr103973-1.C: New test.
>   * g++.target/i386/pr103973-2.C: New test.
>   * g++.target/i386/pr103973-3.C: New test.
>   * g++.target/i386/pr103973-4.C: New test.
>   * g++.target/i386/pr103973-5.C: New test.
>   * g++.target/i386/pr103973-6.C: New test.
>   * g++.target/i386/pr103973-7.C: New test.
>   * g++.target/i386/pr103973-8.C: New test.
>   * g++.target/i386/pr103973-9.C: New test.
>   * g++.target/i386/pr103973-10.C: New test.
>   * g++.target/i386/pr103973-11.C: New test.
>   * g++.target/i386/pr103973-12.C: New test.
>   * g++.target/i386/pr103973-13.C: New test.
>   * g++.target/i386/pr103973-14.C: New te

Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

2022-01-15 Thread Uros Bizjak via Gcc-patches
On Sat, Jan 15, 2022 at 12:23 PM Jakub Jelinek  wrote:
>
> On Sat, Jan 15, 2022 at 11:42:55AM +0100, Uros Bizjak wrote:
> > Yes, that would be nice. XFmode is used for long double, and not obsolete.
>
> Ok, that seems to work.  Compared to the incremental patch I've posted, I
> also had to add handling of the case where we have just
> x == y ? 0 : x < y ? -1 : 1 (both for -ffast-math and non-ffast-math).
> Apparently even that is worth optimizing.
> Tested so far on the new testcases, will run full bootstrap/regtest tonight.
>
> > > Why?  That seems to be a waste of time to me, unless something uses them
> > > already during expansion.  Because pass_expand::execute
> > > runs:
> > >   /* We need JUMP_LABEL be set in order to redirect jumps, and hence
> > >  split edges which edge insertions might do.  */
> > >   rebuild_jump_labels (get_insns ());
> > > which resets all LABEL_NUSES to 0 (well, to:
> > >   if (LABEL_P (insn))
> > > LABEL_NUSES (insn) = (LABEL_PRESERVE_P (insn) != 0);
> > > and then recomputes them and adds JUMP_LABEL if needed:
> > >   JUMP_LABEL (insn) = label;
> >
> > I was not aware of that detail. Thanks for sharing (and I wonder if
> > all other cases should be removed from the source).
>
> I guess it depends, for code that can only be called during the expand pass
> dropping it should be just fine, for code that can be called also (or only)
> later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
> nothing will fix it up afterwards.
>
> 2022-01-15  Jakub Jelinek  
>
> PR target/103973
> * tree-cfg.h (cond_only_block_p): Declare.
> * tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
> * tree-cfg.c (cond_only_block_p): ... here.  No longer static.
> * optabs.def (spaceship_optab): New optab.
> * internal-fn.def (SPACESHIP): New internal function.
> * internal-fn.h (expand_SPACESHIP): Declare.
> * internal-fn.c (expand_PHI): Formatting fix.
> (expand_SPACESHIP): New function.
> * tree-ssa-math-opts.c (optimize_spaceship): New function.
> (math_opts_dom_walker::after_dom_children): Use it.
> * config/i386/i386.md (spaceship3): New define_expand.
> * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
> * config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
> * doc/md.texi (spaceship@var{m}3): Document.
>
> * gcc.target/i386/pr103973-1.c: New test.
> * gcc.target/i386/pr103973-2.c: New test.
> * gcc.target/i386/pr103973-3.c: New test.
> * gcc.target/i386/pr103973-4.c: New test.
> * gcc.target/i386/pr103973-5.c: New test.
> * gcc.target/i386/pr103973-6.c: New test.
> * gcc.target/i386/pr103973-7.c: New test.
> * gcc.target/i386/pr103973-8.c: New test.
> * gcc.target/i386/pr103973-9.c: New test.
> * gcc.target/i386/pr103973-10.c: New test.
> * gcc.target/i386/pr103973-11.c: New test.
> * gcc.target/i386/pr103973-12.c: New test.
> * gcc.target/i386/pr103973-13.c: New test.
> * gcc.target/i386/pr103973-14.c: New test.
> * gcc.target/i386/pr103973-15.c: New test.
> * gcc.target/i386/pr103973-16.c: New test.
> * gcc.target/i386/pr103973-17.c: New test.
> * gcc.target/i386/pr103973-18.c: New test.
> * gcc.target/i386/pr103973-19.c: New test.
> * gcc.target/i386/pr103973-20.c: New test.
> * g++.target/i386/pr103973-1.C: New test.
> * g++.target/i386/pr103973-2.C: New test.
> * g++.target/i386/pr103973-3.C: New test.
> * g++.target/i386/pr103973-4.C: New test.
> * g++.target/i386/pr103973-5.C: New test.
> * g++.target/i386/pr103973-6.C: New test.
> * g++.target/i386/pr103973-7.C: New test.
> * g++.target/i386/pr103973-8.C: New test.
> * g++.target/i386/pr103973-9.C: New test.
> * g++.target/i386/pr103973-10.C: New test.
> * g++.target/i386/pr103973-11.C: New test.
> * g++.target/i386/pr103973-12.C: New test.
> * g++.target/i386/pr103973-13.C: New test.
> * g++.target/i386/pr103973-14.C: New test.
> * g++.target/i386/pr103973-15.C: New test.
> * g++.target/i386/pr103973-16.C: New test.
> * g++.target/i386/pr103973-17.C: New test.
> * g++.target/i386/pr103973-18.C: New test.
> * g++.target/i386/pr103973-19.C: New test.
> * g++.target/i386/pr103973-20.C: New test.

OK (with a comment fix below) for the x86 part.

Thanks,
Uros.

> --- gcc/tree-cfg.h.jj   2022-01-14 23:57:44.491718086 +0100
> +++ gcc/tree-cfg.h  2022-01-15 09:51:25.359468982 +0100
> @@ -111,6 +111,7 @@ extern basic_block gimple_switch_label_b
>  extern basic_block gimple_switch_default_bb (function *, gswitch *);
>  extern edge gimple_switch_edge (function *, gswitch *, unsigned);
>  extern edge gimple_switch_default_ed

[PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

2022-01-15 Thread Jakub Jelinek via Gcc-patches
On Sat, Jan 15, 2022 at 11:42:55AM +0100, Uros Bizjak wrote:
> Yes, that would be nice. XFmode is used for long double, and not obsolete.

Ok, that seems to work.  Compared to the incremental patch I've posted, I
also had to add handling of the case where we have just
x == y ? 0 : x < y ? -1 : 1 (both for -ffast-math and non-ffast-math).
Apparently even that is worth optimizing.
Tested so far on the new testcases, will run full bootstrap/regtest tonight.

> > Why?  That seems to be a waste of time to me, unless something uses them
> > already during expansion.  Because pass_expand::execute
> > runs:
> >   /* We need JUMP_LABEL be set in order to redirect jumps, and hence
> >  split edges which edge insertions might do.  */
> >   rebuild_jump_labels (get_insns ());
> > which resets all LABEL_NUSES to 0 (well, to:
> >   if (LABEL_P (insn))
> > LABEL_NUSES (insn) = (LABEL_PRESERVE_P (insn) != 0);
> > and then recomputes them and adds JUMP_LABEL if needed:
> >   JUMP_LABEL (insn) = label;
> 
> I was not aware of that detail. Thanks for sharing (and I wonder if
> all other cases should be removed from the source).

I guess it depends, for code that can only be called during the expand pass
dropping it should be just fine, for code that can be called also (or only)
later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
nothing will fix it up afterwards.

2022-01-15  Jakub Jelinek  

PR target/103973
* tree-cfg.h (cond_only_block_p): Declare.
* tree-ssa-phiopt.c (cond_only_block_p): Move function to ...
* tree-cfg.c (cond_only_block_p): ... here.  No longer static.
* optabs.def (spaceship_optab): New optab.
* internal-fn.def (SPACESHIP): New internal function.
* internal-fn.h (expand_SPACESHIP): Declare.
* internal-fn.c (expand_PHI): Formatting fix.
(expand_SPACESHIP): New function.
* tree-ssa-math-opts.c (optimize_spaceship): New function.
(math_opts_dom_walker::after_dom_children): Use it.
* config/i386/i386.md (spaceship3): New define_expand.
* config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare.
* config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function.
* doc/md.texi (spaceship@var{m}3): Document.

* gcc.target/i386/pr103973-1.c: New test.
* gcc.target/i386/pr103973-2.c: New test.
* gcc.target/i386/pr103973-3.c: New test.
* gcc.target/i386/pr103973-4.c: New test.
* gcc.target/i386/pr103973-5.c: New test.
* gcc.target/i386/pr103973-6.c: New test.
* gcc.target/i386/pr103973-7.c: New test.
* gcc.target/i386/pr103973-8.c: New test.
* gcc.target/i386/pr103973-9.c: New test.
* gcc.target/i386/pr103973-10.c: New test.
* gcc.target/i386/pr103973-11.c: New test.
* gcc.target/i386/pr103973-12.c: New test.
* gcc.target/i386/pr103973-13.c: New test.
* gcc.target/i386/pr103973-14.c: New test.
* gcc.target/i386/pr103973-15.c: New test.
* gcc.target/i386/pr103973-16.c: New test.
* gcc.target/i386/pr103973-17.c: New test.
* gcc.target/i386/pr103973-18.c: New test.
* gcc.target/i386/pr103973-19.c: New test.
* gcc.target/i386/pr103973-20.c: New test.
* g++.target/i386/pr103973-1.C: New test.
* g++.target/i386/pr103973-2.C: New test.
* g++.target/i386/pr103973-3.C: New test.
* g++.target/i386/pr103973-4.C: New test.
* g++.target/i386/pr103973-5.C: New test.
* g++.target/i386/pr103973-6.C: New test.
* g++.target/i386/pr103973-7.C: New test.
* g++.target/i386/pr103973-8.C: New test.
* g++.target/i386/pr103973-9.C: New test.
* g++.target/i386/pr103973-10.C: New test.
* g++.target/i386/pr103973-11.C: New test.
* g++.target/i386/pr103973-12.C: New test.
* g++.target/i386/pr103973-13.C: New test.
* g++.target/i386/pr103973-14.C: New test.
* g++.target/i386/pr103973-15.C: New test.
* g++.target/i386/pr103973-16.C: New test.
* g++.target/i386/pr103973-17.C: New test.
* g++.target/i386/pr103973-18.C: New test.
* g++.target/i386/pr103973-19.C: New test.
* g++.target/i386/pr103973-20.C: New test.

--- gcc/tree-cfg.h.jj   2022-01-14 23:57:44.491718086 +0100
+++ gcc/tree-cfg.h  2022-01-15 09:51:25.359468982 +0100
@@ -111,6 +111,7 @@ extern basic_block gimple_switch_label_b
 extern basic_block gimple_switch_default_bb (function *, gswitch *);
 extern edge gimple_switch_edge (function *, gswitch *, unsigned);
 extern edge gimple_switch_default_edge (function *, gswitch *);
+extern bool cond_only_block_p (basic_block);
 
 /* Return true if the LHS of a call should be removed.  */
 
--- gcc/tree-ssa-phiopt.c.jj2022-01-14 23:57:44.536717549 +0100
+++ gcc/tree-ssa-phiopt.c   2022-01-15 09:51:25.361468954 +0100
@@ -1958,31 +1958,6 @@ min