[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #15 from Kewen Lin  ---
Commits r15-1890 and r15-1891 fix the regression, and r15-1991 removes the
vcond{,u} expanders in rs6000, so I'm marking this as resolved. As for the
potential enhancement on targets without iorc & andc support, I'll file a PR
once the middle-end takes over expanding comparisons the target doesn't support.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #14 from GCC Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:f7e4000397842671fe7e5c0473f1fa62707e1db9

commit r15-1991-gf7e4000397842671fe7e5c0473f1fa62707e1db9
Author: Kewen Lin 
Date:   Fri Jul 12 01:32:57 2024 -0500

rs6000: Remove vcond{,u} expanders

As PR114189 shows, the middle-end will obsolete the vcond,
vcondu and vcondeq optabs soon.  This patch removes all
vcond{,u} expanders in the rs6000 port and makes the function
rs6000_emit_vector_cond_expr, which was called by those
expanders, static.

PR target/115659

gcc/ChangeLog:

* config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr):
Remove.
* config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
qualifier as it is only called by rs6000_emit_swsqrt now.
* config/rs6000/vector.md (vcond): Remove.
(vcond): Remove.
(vcondv4sfv4si): Likewise.
(vcondv4siv4sf): Likewise.
(vcondv2dfv2di): Likewise.
(vcondv2div2df): Likewise.
(vcondu): Likewise.
(vconduv4sfv4si): Likewise.
(vconduv2dfv2di): Likewise.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:f379596e0ba99df249d6e8b3f2e66edfcea916fe

commit r15-1890-gf379596e0ba99df249d6e8b3f2e66edfcea916fe
Author: Kewen Lin 
Date:   Mon Jul 8 00:14:59 2024 -0500

isel: Fold more in gimple_expand_vec_cond_expr with andc and iorc [PR115659]

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? 0/z : z/-1:
  - for r = c ? 0 : z, it can be folded into r = ~c & z.
  - for r = c ? z : -1, it can be folded into r = ~c | z.

But BIT_AND/BIT_IOR applied to a BIT_NOT operand is a
compound operation, so it is arguable whether it beats
vector selection.  This patch therefore introduces new
optabs andc and iorc and their corresponding internal
functions BIT_{ANDC,IORC}.  If a target defines these optabs
for vector modes, it means the target supports these hardware
insns, and they should be no worse than vector selection.
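
The two folds above can be checked on a single lane. This is a minimal sketch in plain C, assuming the comparison result c is all ones (-1) or all zeros per lane. The helpers lane_select, lane_andc and lane_iorc are hypothetical names that only model the semantics (with the complement applied to the second operand, which is an assumption of this sketch); they are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise model of a vector select: c is all ones (-1) or all zeros.  */
static int32_t lane_select (int32_t c, int32_t if_true, int32_t if_false)
{
  return c ? if_true : if_false;
}

/* Hypothetical models of the compound operations: a & ~b and a | ~b.  */
static int32_t lane_andc (int32_t a, int32_t b) { return a & ~b; }
static int32_t lane_iorc (int32_t a, int32_t b) { return a | ~b; }

/* Returns 1 if both folds agree with the select for every tested input.  */
int check_andc_iorc_folds (void)
{
  const int32_t masks[] = { 0, -1 };
  const int32_t values[] = { 0, 1, -1, 42, INT32_MIN, INT32_MAX };

  for (unsigned i = 0; i < sizeof masks / sizeof masks[0]; i++)
    for (unsigned j = 0; j < sizeof values / sizeof values[0]; j++)
      {
        int32_t c = masks[i], z = values[j];
        /* r = c ? 0 : z   ==>  r = ~c & z  */
        assert (lane_select (c, 0, z) == lane_andc (z, c));
        /* r = c ? z : -1  ==>  r = ~c | z  */
        assert (lane_select (c, z, -1) == lane_iorc (z, c));
      }
  return 1;
}
```

Whether the compound form is actually cheaper than the select is a target question, which is why the commit keys the fold on the optab being defined.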

PR tree-optimization/115659

gcc/ChangeLog:

* doc/md.texi: Document andcm3 and iorcm3.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? 0 : z and x CMP y ? z : -1.
* internal-fn.def (BIT_ANDC): New internal function.
(BIT_IORC): Likewise.
* optabs.def (andc, iorc): New optab.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #13 from GCC Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:6425dae07aa4be58abade03455c2d9744f73d4e1

commit r15-1891-g6425dae07aa4be58abade03455c2d9744f73d4e1
Author: Kewen Lin 
Date:   Mon Jul 8 00:15:00 2024 -0500

rs6000: Replace orc with iorc [PR115659]

Now that the iorc optab has been introduced, this patch updates
the expander names and all related uses, such as bif expanders
and gen functions, accordingly.

PR tree-optimization/115659

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def: Update some bif expanders by
replacing orc3 with iorc3.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update gen
function by replacing orc3 with iorc3.
* config/rs6000/rs6000.md (orc3): Rename to ...
(iorc3): ... this.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #11 from GCC Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:56670281c6db19d75c7b63e38971ab84681b245c

commit r15-1763-g56670281c6db19d75c7b63e38971ab84681b245c
Author: Kewen Lin 
Date:   Tue Jul 2 02:13:35 2024 -0500

isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? -1/z : z/0.

For r = c ? -1 : z, it can be folded into:
  - r = c | z (with ior_optab supported)
  - or r = c ? c : z

while for r = c ? z : 0, it can be folded into:
  - r = c & z (with and_optab supported)
  - or r = c ? z : c

This patch is to teach ISEL to take care of them and also
remove the redundant gsi_replace as the caller of function
gimple_expand_vec_cond_expr will handle it.
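
Both the logical form and the fallback form of each fold can be checked on a single lane. This is a minimal sketch in plain C, assuming c is all ones (-1) or all zeros per lane; lane_select is a hypothetical helper that models the vector select and is not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise model of a vector select with an all-ones/all-zeros mask.  */
static int32_t lane_select (int32_t c, int32_t if_true, int32_t if_false)
{
  return c ? if_true : if_false;
}

/* Returns 1 if every fold agrees with the original select.  */
int check_ior_and_folds (void)
{
  const int32_t masks[] = { 0, -1 };
  const int32_t values[] = { 0, 1, -1, 7, INT32_MIN, INT32_MAX };

  for (unsigned i = 0; i < sizeof masks / sizeof masks[0]; i++)
    for (unsigned j = 0; j < sizeof values / sizeof values[0]; j++)
      {
        int32_t c = masks[i], z = values[j];
        /* r = c ? -1 : z: a plain IOR, or reuse c as the "true" operand.  */
        assert (lane_select (c, -1, z) == (c | z));
        assert (lane_select (c, -1, z) == lane_select (c, c, z));
        /* r = c ? z : 0: a plain AND, or reuse c as the "false" operand.  */
        assert (lane_select (c, z, 0) == (c & z));
        assert (lane_select (c, z, 0) == lane_select (c, z, c));
      }
  return 1;
}
```

The fallback forms still use a select, but they reuse the comparison result instead of a separate constant operand.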

PR tree-optimization/115659

gcc/ChangeLog:

* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? -1 : z and x CMP y ? z : 0.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #10 from Kewen Lin  ---
(In reply to Richard Biener from comment #9)
> I think the inversion code wants to check invert_tree_comparison and see if
> the inverted compare is supported and only if not fall back to inverting the
> comparison result (there is of course the multi-use case to consider).

OK. For now all or most targets claim to support all comparisons (doing the
swapping, inversion etc. in the expanders themselves), so it seems we have to
handle this ourselves until we have some generic handling for them.

> I also think that incrementally improving the /* Try to fold x CMP y ? -1 :
> 0 to x CMP y.  */ is fine we don't have to handle everything in one patch.
> 
> Thanks for working on this.  The x86 folks seem to be able to handle most
> things within the backend which is also fine, handling common problems in
> the middle-end is of course better.

Thanks for the suggestions; I have posted two patches for review and comments.
Yes, I realized that some define_insn_and_split patterns in the backend can
also catch some of these patterns and generate the expected code.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #9 from Richard Biener  ---
I think the inversion code wants to check invert_tree_comparison and see if
the inverted compare is supported and only if not fall back to inverting the
comparison result (there is of course the multi-use case to consider).
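
Why the inverted comparison has to be chosen with care can be seen on scalars. This is a minimal sketch in plain C, assuming IEEE NaN semantics for double (so no -ffast-math); mask_of and check_inversion are hypothetical names. For integers, inverting the comparison result and using the inverted comparison give the same mask. For floating point, the logical inverse of x < y is "unordered or x >= y", not x >= y.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Mask form of a comparison: all ones when true, all zeros when false.  */
static int32_t mask_of (int cond) { return cond ? -1 : 0; }

/* Returns 1 if all the checks below hold.  */
int check_inversion (void)
{
  /* Integers: ~(x == y) is the same mask as (x != y).  */
  for (int x = -2; x <= 2; x++)
    for (int y = -2; y <= 2; y++)
      assert (~mask_of (x == y) == mask_of (x != y));

  /* Floating point with a NaN operand: inverting the result of x < y
     gives "true", but the naively inverted comparison x >= y is false.  */
  double x = NAN, y = 1.0;
  assert (~mask_of (x < y) == -1);
  assert (mask_of (x >= y) == 0);

  /* The unordered-or-greater-equal form matches the inverted result.  */
  assert (mask_of (isunordered (x, y) || x >= y) == ~mask_of (x < y));
  return 1;
}
```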

I also think that incrementally improving the /* Try to fold x CMP y ? -1 : 0
to x CMP y.  */ is fine; we don't have to handle everything in one patch.

Thanks for working on this.  The x86 folks seem to be able to handle most
things within the backend which is also fine, handling common problems in
the middle-end is of course better.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #8 from Kewen Lin  ---
Inspired by Andrew's comments, it looks like we can have:

   c = x CMP y
   r = c ?  0 :  z   =>  r =  ~c & z  (1)
   r = c ?  z :  0   =>  r =   c & z  (2)
   r = c ? -1 :  z   =>  r =   c | z  (3)
   r = c ?  z : -1   =>  r =  ~c | z  (4)

so if the target supports vector "or" and "and", (2) and (3) are clearly
improvements (a basic logical operation should not be slower than a vector
select), while (1) and (4) may need further cost comparison (or, if the target
supports the compound operation, a query for optab support).

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #7 from Kewen Lin  ---
> > > (simplify
> > >  (vec_cond @0 @1 integer_all_ones_p)
> > >  (bit_ior (view_convert @0) @1))
> > > ```
> > 
> > Missing negate for the vector one?
> 
> No because vector true is already -1 :).

I could be wrong, but this vector transformation seems wrong: when @0 is -1,
the original wants @1 but this simplification returns -1, while when @0 is 0,
the original wants -1 but this simplification returns @1. Do the results get
switched?
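
The concern can be checked on one lane. This is a minimal sketch in plain C, assuming the mask @0 is all ones (-1) or all zeros; the function names are hypothetical and only model the quoted pattern, its proposed replacement, and a variant that inverts the mask first.

```c
#include <assert.h>
#include <stdint.h>

/* Expression matched by the quoted pattern: @0 ? @1 : -1.  */
int32_t vec_cond_original (int32_t c, int32_t x) { return c ? x : -1; }

/* Result of the quoted simplification: @0 | @1.  */
int32_t vec_cond_proposed (int32_t c, int32_t x) { return c | x; }

/* Variant with the mask inverted first: ~@0 | @1.  */
int32_t vec_cond_inverted (int32_t c, int32_t x) { return ~c | x; }

/* Returns 1 after confirming which forms agree with the original.  */
int check_operand_order (void)
{
  const int32_t x = 5;

  /* Mask true: the original yields @1, the proposed form yields -1.  */
  assert (vec_cond_original (-1, x) == x);
  assert (vec_cond_proposed (-1, x) == -1);

  /* Mask false: the original yields -1, the proposed form yields @1.  */
  assert (vec_cond_original (0, x) == -1);
  assert (vec_cond_proposed (0, x) == x);

  /* Inverting the mask first restores agreement in both cases.  */
  assert (vec_cond_inverted (-1, x) == vec_cond_original (-1, x));
  assert (vec_cond_inverted (0, x) == vec_cond_original (0, x));
  return 1;
}
```

The plain OR matches the other operand order instead, c ? -1 : @1, which is the shape of the scalar test case quoted in this thread.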

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #6 from Andrew Pinski  ---
(In reply to Kewen Lin from comment #5)
> (In reply to Andrew Pinski from comment #2)
> > Note I think this could help scalar code too:
> > ```
> > int a[1], b[1], c[1];
> > 
> > void
> > test (void)
> > {
> >   a[0] = (b[0] == c[0]) ? -1 : a[0];
> > }
> > 
> > void
> > test1 (void)
> > {
> >   a[0] = (-(b[0] == c[0])) | a[0];
> > }
> > 
> > ```
> > 
> 
> Good catch!
> 
> > So this could be something like:
> > ```
> > (simplify
> >  (cond @0 @1 integer_all_ones_p)
> >  (bit_ior (negate (convert @0)) @1))
> > (simplify
> >  (vec_cond @0 @1 integer_all_ones_p)
> >  (bit_ior (view_convert @0) @1))
> > ```
> 
> Missing negate for the vector one?

No because vector true is already -1 :).
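
The difference between the scalar and vector true values can be seen with the GNU C vector extensions. This is a minimal sketch that assumes a compiler supporting vector_size types (GCC or Clang); scalar_fold, vector_fold and check_true_values are hypothetical names.

```c
#include <assert.h>

/* GNU C vector extension: four 32-bit int lanes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar: (b == c) is 0 or 1, so negate it to get 0 or -1 before the OR.  */
int scalar_fold (int a, int b, int c)
{
  return (-(b == c)) | a;
}

/* Vector: (b == c) already yields 0 or -1 per lane, so no negate.  */
v4si vector_fold (v4si a, v4si b, v4si c)
{
  return (b == c) | a;
}

/* Returns 1 if both forms implement (b == c) ? -1 : a on every lane.  */
int check_true_values (void)
{
  v4si a = { 1, 2, 3, 4 }, b = { 5, 6, 7, 8 }, c = { 5, 0, 7, 0 };
  v4si r = vector_fold (a, b, c);

  for (int i = 0; i < 4; i++)
    {
      int expected = (b[i] == c[i]) ? -1 : a[i];
      assert (r[i] == expected);
      assert (scalar_fold (a[i], b[i], c[i]) == expected);
    }
  return 1;
}
```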

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #5 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #2)
> Note I think this could help scalar code too:
> ```
> int a[1], b[1], c[1];
> 
> void
> test (void)
> {
>   a[0] = (b[0] == c[0]) ? -1 : a[0];
> }
> 
> void
> test1 (void)
> {
>   a[0] = (-(b[0] == c[0])) | a[0];
> }
> 
> ```
> 

Good catch!

> So this could be something like:
> ```
> (simplify
>  (cond @0 @1 integer_all_ones_p)
>  (bit_ior (negate (convert @0)) @1))
> (simplify
>  (vec_cond @0 @1 integer_all_ones_p)
>  (bit_ior (view_convert @0) @1))
> ```

Missing negate for the vector one?

> The second one might need a target_supports_op_p for the bit_ior.

Thanks for the hints! This looks simpler than keeping the vec_cond. Do we
need to consider the target cost of cond (conditional select) vs.
negate + or?

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #4 from Kewen Lin  ---
(In reply to Richard Biener from comment #3)
>c = x CMP y 
>r = c ? -1 : z  =>  r = c ? c : z
>r = c ?  z : 0  =>  r = c ? z : c
> 
> this is probably best left for ISEL.  I agree the transforms eliminating
> the COND are useful in general and suitable also for match.pd.  Watch
> out for vectorizer patterns though which creates scalar COND_EXPRs for
> bool mask <-> bool value transforms.

Thanks for the suggestion! If going with ISEL, the patch seems to be like:

-
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 54c1801038b..abb18932228 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,16 +240,34 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
  tcode);

-  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
  if (can_compute_op0
- && integer_minus_onep (op1)
- && integer_zerop (op2)
  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
{
- tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
- gassign *new_stmt = gimple_build_assign (lhs, conv_op);
- gsi_replace (gsi, new_stmt, true);
- return new_stmt;
+ bool op1_minus_onep = integer_minus_onep (op1);
+ bool op2_zerop = integer_zerop (op2);
+ /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
+ if (op1_minus_onep && op2_zerop)
+   {
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+   }
+ /* Try to fold x CMP y ? -1 : z to x CMP y ? x CMP y : z,
+or x CMP y ? z : 0 to x CMP y ? z : x CMP y.  */
+ if (op1_minus_onep || op2_zerop)
+   {
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ tree new_op = make_ssa_name (TREE_TYPE (lhs));
+ gassign *new_stmt = gimple_build_assign (new_op, conv_op);
+ if (op1_minus_onep)
+   op1 = new_op;
+ else
+   op2 = new_op;
+ gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+   }
}

  /* When the compare has EH we do not want to forward it when

-

But this doesn't help with the exposed failure, as it belongs to the latter
case. Going further with some hacks for inversion:

-
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index abb18932228..afc2c9f1386 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,6 +240,15 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
 tcode);

+ auto need_inverted_p = [](tree_code c, machine_mode m) {
+   if (GET_MODE_CLASS (m) == MODE_VECTOR_INT)
+ return (c == NE_EXPR || c == GE_EXPR || c == LE_EXPR);
+   gcc_assert (GET_MODE_CLASS (m) == MODE_VECTOR_FLOAT);
+   return (c == NE_EXPR || c == UNLE_EXPR || c == UNLT_EXPR
+   || c == UNGE_EXPR || c == UNGT_EXPR || c == UNORDERED_EXPR
+   || c == UNEQ_EXPR);
+ };
+
  if (can_compute_op0
  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
{
@@ -254,6 +263,23 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
  gsi_replace (gsi, new_stmt, true);
  return new_stmt;
}
+ bool inverted_p = need_inverted_p (tcode, TYPE_MODE (op0a_type));
+ bool op1_zerop = integer_zerop (op1);
+ bool op2_minus_onep = integer_minus_onep (op2);
+ /* Try to fold x CMP y ? 0 : -1 to ~(x CMP y), it can reuse
+the comparison before the inversion.  */
+ if (inverted_p && op1_zerop && op2_minus_onep)
+   {
+ tree inv_op0 = make_ssa_name (TREE_TYPE (op0));
+ gassign *inv_stmt
+   = gimple_build_assign (inv_op0, BIT_NOT_EXPR, op0);
+ gsi_insert_seq_before (gsi, inv_stmt, GSI_SAME_STMT);
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), inv_op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+   }
  /* Try to fold x CMP y ? 

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #3 from Richard Biener  ---
   c = x CMP y 
   r = c ? -1 : z  =>  r = c ? c : z
   r = c ?  z : 0  =>  r = c ? z : c

this is probably best left for ISEL.  I agree the transforms eliminating
the COND are useful in general and suitable also for match.pd.  Watch
out for vectorizer patterns though which creates scalar COND_EXPRs for
bool mask <-> bool value transforms.

Note the transforms need guarding with the mode check since for targets
with compares producing mask modes it doesn't work (x86, for smaller
vectors could resort to AVX compares, but the way the mask mode target
hook operates this doesn't look easy).

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

Andrew Pinski  changed:

   What|Removed |Added

  Component|target  |tree-optimization
 CC||pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
Note I think this could help scalar code too:
```
int a[1], b[1], c[1];

void
test (void)
{
  a[0] = (b[0] == c[0]) ? -1 : a[0];
}

void
test1 (void)
{
  a[0] = (-(b[0] == c[0])) | a[0];
}

```

So this could be something like:
```
(simplify
 (cond @0 @1 integer_all_ones_p)
 (bit_ior (negate (convert @0)) @1))
(simplify
 (vec_cond @0 @1 integer_all_ones_p)
 (bit_ior (view_convert @0) @1))
```
The second one might need a target_supports_op_p for the bit_ior.