[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2024-05-15 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #22 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #21)
> I am not sure if powerpc vsx
> has &~ though.

VMX has vandc (since 1999), and VSX has xxlandc (since 2010).

In general, PowerPC has a full complement of logical ops, everywhere.  In some
cases it has the full truth table of the operation as part of the binary opcode
;-)
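
For reference, this maps directly onto GCC's AltiVec intrinsics; a minimal
sketch (the function name is invented for illustration, build with -maltivec
or -mvsx):

#include <altivec.h>

/* a & ~b in one instruction: vandc on VMX, xxlandc on VSX.  */
vector unsigned int
andc_demo (vector unsigned int a, vector unsigned int b)
{
  return vec_andc (a, b);
}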

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2024-05-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #21 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #20)
> The aarch64 backend matches this:
> (insn 15 10 16 2 (set (reg/i:V4SI 32 v0)
> (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 101)
> (reg:V4SI 102))
> (reg:V4SI 103))
> (reg:V4SI 101))) "/app/example.cpp":7:1 3103
> {aarch64_simd_bslv4si_internal}
> 
> for the `bit v0.16b, v1.16b, v2.16b` instruction, which was done in r5-6601
> (PR 64448).

One thing for the middle-end here is that if we have `(xor (and (xor A B) C) B)`
we could try expanding it into `(a&c)|(b&~c)` if there is an optab for the &~
(which I am aiming to add for other reasons). I am not sure if powerpc vsx has
&~ though. I will be doing my development on both x86_64 and aarch64, and it
will be up to the other targets to add the optab pattern if needed.
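
The two forms above are bitwise-equivalent; a quick standalone sanity check
(not part of any patch in this thread):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  /* ((a^b)&c)^b picks a where c is 1 and b where c is 0,
     which is exactly (a&c)|(b&~c).  */
  const uint32_t v[] = { 0x00000000u, 0xffffffffu, 0x0ff00fffu, 0x12345678u };
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 4; j++)
      for (int k = 0; k < 4; k++)
        {
          uint32_t a = v[i], b = v[j], c = v[k];
          assert ((((a ^ b) & c) ^ b) == ((a & c) | (b & ~c)));
        }
  return 0;
}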

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2023-09-03 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64448

--- Comment #20 from Andrew Pinski  ---
The aarch64 backend matches this:
(insn 15 10 16 2 (set (reg/i:V4SI 32 v0)
(xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 101)
(reg:V4SI 102))
(reg:V4SI 103))
(reg:V4SI 101))) "/app/example.cpp":7:1 3103
{aarch64_simd_bslv4si_internal}

for the `bit v0.16b, v1.16b, v2.16b` instruction, which was done in r5-6601
(PR 64448).
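
The source that produces this pattern is presumably along these lines (the
actual /app/example.cpp is not quoted in the thread, so this is a guess):

typedef unsigned int v4si __attribute__ ((vector_size (16)));

/* ((a ^ b) & c) ^ a is the xor/and/xor form of bitwise select; on
   aarch64 it matches the aarch64_simd_bsl pattern shown above.  */
v4si
sel (v4si a, v4si b, v4si c)
{
  return ((a ^ b) & c) ^ a;
}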

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-12-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #19 from Segher Boessenkool  ---
(In reply to luoxhu from comment #17)
> And what do you mean by "This is not canonical form on RTL, and it's not a
> useful form either" in c#7, please? I'm not understanding the point...

On Gimple it is canonical to convert (a&c)|(b&~c) to (((a^b)&c)^b), because
all Gimple cares about is number of operations (and it counts unary operations
as well, so this is three instead of four ops).

For RTL we do not have such a simple-minded rule.

> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -3405,7 +3405,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>  machines, and also has shorter instruction path length.  */
>if (GET_CODE (op0) == AND
>   && GET_CODE (XEXP (op0, 0)) == XOR
> - && CONST_INT_P (XEXP (op0, 1))
>   && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
> {
>   rtx a = trueop1;
> @@ -3419,7 +3418,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>    /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
>else if (GET_CODE (op0) == AND
>   && GET_CODE (XEXP (op0, 0)) == XOR
> - && CONST_INT_P (XEXP (op0, 1))
>   && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
> {
>   rtx a = XEXP (XEXP (op0, 0), 0);

It needs *some* test on it.  It certainly cannot have side effects, for
example.  CONST_INT_P || REG_P should catch all useful cases?
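
A minimal sketch of that suggestion (not a committed patch): keep *some*
test on XEXP (op0, 1), but accept registers as well as constants, which
also keeps side-effecting operands out:

-  && CONST_INT_P (XEXP (op0, 1))
+  && (CONST_INT_P (XEXP (op0, 1)) || REG_P (XEXP (op0, 1)))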

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-12-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #18 from Segher Boessenkool  ---
(In reply to luoxhu from comment #16)
> > +2016-11-09  Segher Boessenkool  
> > +
> > +   * simplify-rtx.c (simplify_binary_operation_1): Simplify
> > +   (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> > +   (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
> > +   is a const_int.
> 
> 
> Is it a MUST that C be const here?

It could be extended to C being a reg as well, I think.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-30 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #17 from luoxhu at gcc dot gnu.org ---
If the constant limitation is removed, it could be combined successfully with
my new patch for PR94613.

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569255.html

And what do you mean by "This is not canonical form on RTL, and it's not a
useful form either" in c#7, please? I'm not understanding the point...


Trying 11 -> 16:
   11: r124:V4SI=r127:V4SI&r129:V4SI|~r129:V4SI&r128:V4SI
  REG_DEAD r128:V4SI
  REG_DEAD r129:V4SI
  REG_DEAD r127:V4SI
   16: %v2:V4SI=r124:V4SI
  REG_DEAD r124:V4SI
Successfully matched this instruction:
(set (reg/i:V4SI 66 %v2)
(ior:V4SI (and:V4SI (reg:V4SI 127)
(reg:V4SI 129))
(and:V4SI (not:V4SI (reg:V4SI 129))
(reg:V4SI 128))))
allowing combination of insns 11 and 16
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    16: %v2:V4SI=r127:V4SI&r129:V4SI|~r129:V4SI&r128:V4SI
  REG_DEAD r127:V4SI
  REG_DEAD r129:V4SI
  REG_DEAD r128:V4SI
deferring rescan insn with uid = 16.


diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 571e2337e27..701f37eb03e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3405,7 +3405,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
 machines, and also has shorter instruction path length.  */
   if (GET_CODE (op0) == AND
  && GET_CODE (XEXP (op0, 0)) == XOR
- && CONST_INT_P (XEXP (op0, 1))
  && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
{
  rtx a = trueop1;
@@ -3419,7 +3418,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
   /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
   else if (GET_CODE (op0) == AND
  && GET_CODE (XEXP (op0, 0)) == XOR
- && CONST_INT_P (XEXP (op0, 1))
  && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
{
  rtx a = XEXP (XEXP (op0, 0), 0);

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-29 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #16 from luoxhu at gcc dot gnu.org ---

> +2016-11-09  Segher Boessenkool  
> +
> +   * simplify-rtx.c (simplify_binary_operation_1): Simplify
> +   (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> +   (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
> +   is a const_int.


Is it a MUST that C be const here? For this case in PR90323, C is actually
not a const:

l = l & ~mask;
l |= mask & r;

Trying 8, 9 -> 10:
8: r127:V4SI=r124:V4SI^r131:V4SI
  REG_DEAD r131:V4SI
9: r122:V4SI=r127:V4SI&r130:V4SI
  REG_DEAD r130:V4SI
  REG_DEAD r127:V4SI
   10: r128:V4SI=r124:V4SI^r122:V4SI
  REG_DEAD r124:V4SI
  REG_DEAD r122:V4SI

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-12 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #15 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #14)
> (In reply to luoxhu from comment #12)
> > That code was called by the combine pass but failed to match.
> 
> > 
> > pr newpat
> > (set (reg:DI 125 [ l ])
> > (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
> > (reg:DI 127))
> > (const_int 267390975 [0xff00fff]))
> > (reg/v:DI 120 [ l ])))
> 
> Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.

OK, it also fails to combine for 0x01000000.


.cfi_startproc
xor 4,3,4
rlwinm 4,4,0,7,7
xor 3,4,3
blr

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #14 from Segher Boessenkool  ---
(In reply to luoxhu from comment #12)
> That code was called by the combine pass but failed to match.

> 
> pr newpat
> (set (reg:DI 125 [ l ])
> (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
> (reg:DI 127))
> (const_int 267390975 [0xff00fff]))
> (reg/v:DI 120 [ l ])))

Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #13 from Segher Boessenkool  ---
(In reply to luoxhu from comment #11)
> I noticed that you added the below optimization with commit
> a62436c0a505155fc8becac07a8c0abe2c265bfe, but it doesn't even handle this
> case: the cse1 pass calls simplify_binary_operation_1 with both op0 and op1
> being REGs instead of AND operators. Do you have a test case to cover that
> piece of code?

This worked at the time.  It broke some time ago in simple testcases,
triggered by the "don't combine hard registers" thing I did.  This is
PR98468.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-09 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #12 from luoxhu at gcc dot gnu.org ---

That code was called by the combine pass but failed to match.

pr newpat
(set (reg:DI 125 [ l ])
(xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
(reg:DI 127))
(const_int 267390975 [0xff00fff]))
(reg/v:DI 120 [ l ])))


Trying 8, 10 -> 11:
8: r123:DI=r120:DI^r127:DI
  REG_DEAD r127:DI
   10: r118:DI=r123:DI&0xff00fff
  REG_DEAD r123:DI
   11: r125:DI=r118:DI^r120:DI
  REG_DEAD r120:DI
  REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 125 [ l ])
(ior:DI (and:DI (reg/v:DI 120 [ l ])
(const_int -267390976 [0xf00ff000]))
(and:DI (reg:DI 127)
(const_int 267390975 [0xff00fff]))))
Successfully matched this instruction:
(set (reg:DI 118 [ _2 ])
(and:DI (reg:DI 127)
(const_int 267390975 [0xff00fff])))
Failed to match this instruction:
(set (reg:DI 125 [ l ])
(ior:DI (and:DI (reg/v:DI 120 [ l ])
(const_int -267390976 [0xf00ff000]))
(reg:DI 118 [ _2 ])))

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-08 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #11 from luoxhu at gcc dot gnu.org ---
I noticed that you added the below optimization with commit
a62436c0a505155fc8becac07a8c0abe2c265bfe, but it doesn't even handle this
case: the cse1 pass calls simplify_binary_operation_1 with both op0 and op1
being REGs instead of AND operators. Do you have a test case to cover that
piece of code?

__attribute__ ((noinline))
 long without_sel3( long l,  long r) {
long tmp = {0x0ff00fff};
l =  ( (l ^ r) & tmp) ^ l;
return l;
}


without_sel3:
xor 4,3,4
rlwinm 4,4,0,20,11
rldicl 4,4,0,36
xor 3,4,3
blr
.long 0
.byte 0,0,0,0,0,0,0,0


+2016-11-09  Segher Boessenkool  
+
+   * simplify-rtx.c (simplify_binary_operation_1): Simplify
+   (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
+   (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
+   is a const_int.

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1a349..11a2e0267c7 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
}
}

+  /* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+do (ior (and A ~C) (and B C)) which is a machine instruction on some
+machines, and also has shorter instruction path length.  */
+  if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+   {
+ rtx a = trueop1;
+ rtx b = XEXP (XEXP (op0, 0), 1);
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+ rtx bc = simplify_gen_binary (AND, mode, b, c);
+ return simplify_gen_binary (IOR, mode, a_nc, bc);
+   }
+  /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
+  else if (GET_CODE (op0) == AND
+ && GET_CODE (XEXP (op0, 0)) == XOR
+ && CONST_INT_P (XEXP (op0, 1))
+ && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+   {
+ rtx a = XEXP (XEXP (op0, 0), 0);
+ rtx b = trueop1;
+ rtx c = XEXP (op0, 1);
+ rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+ rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+ rtx ac = simplify_gen_binary (AND, mode, a, c);
+ return simplify_gen_binary (IOR, mode, ac, b_nc);
+   }

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-08 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #10 from Segher Boessenkool  ---
You cannot fix a simplify-rtx problem in much earlier passes!  It may be
useful of course (I have no idea, I don't know gimple well enough), but
it is no solution to the problem at all.  The xor/and/xor thing should be
simplified to something proper.

((A^B)&C)^A = (A&~C)^(B&C) = (A&~C)|(B&C)

This should already be done by the expand pass.  At gimple level the logical
complement is counted as an operation, making the contorted xor/and/xor form
the best form to use, but in a system that considers more than just operation
counts (like in RTL) this is not the best form at all.  But, anyway, RTL
simplification should be able to do this.

Similar problems happen all over the place, fwiw -- see the various rl* tests
for rs6000, for example.
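
To illustrate the operation-count argument with scalars (a standalone
example, not from the thread): both functions below compute the same
bitwise select, but GIMPLE prefers the second because it is three
operations (xor, and, xor) instead of four (not, and, and, ior):

unsigned
sel4 (unsigned a, unsigned b, unsigned c)
{
  return (a & ~c) | (b & c);    /* not, and, and, ior: 4 ops */
}

unsigned
sel3 (unsigned a, unsigned b, unsigned c)
{
  return ((a ^ b) & c) ^ a;     /* xor, and, xor: 3 ops */
}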

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-07 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #9 from luoxhu at gcc dot gnu.org ---
Then we could optimize it in match.pd:

diff --git a/gcc/match.pd b/gcc/match.pd
index 036f92fa959..8944312c153 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3711,6 +3711,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (integer_all_onesp (@1) && integer_zerop (@2))
 @0

+#if GIMPLE
+(simplify
+ (bit_xor @0 (bit_and @2 (bit_xor @0 @1)))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1)
+  && types_match (@0, @2) && VECTOR_TYPE_P (TREE_TYPE (@0))
+  && VECTOR_TYPE_P (TREE_TYPE (@1)) && VECTOR_TYPE_P (TREE_TYPE (@2)))
+ (with { tree itype = truth_type_for (type); }
+ (vec_cond (convert:itype @2) @1 @0))))
+#endif

In pr90323.c.033t.forwprop1, it will be optimized to:

  <bb 2> :
  _1 = ~mask_3(D);
  l_5 = _1 & l_4(D);
  _2 = mask_3(D) & r_6(D);
  _8 = l_4(D) ^ r_6(D);
  _10 = mask_3(D) & _8;
  _11 = (vector(4) <signed-boolean:32>) mask_3(D);
  l_7 = VEC_COND_EXPR <_11, r_6(D), l_4(D)>;
  return l_7;

Then in pr90323.c.243t.isel:

  <bb 2> [local count: 1073741824]:
  _6 = (vector(4) <signed-boolean:32>) mask_1(D);
  l_4 = .VCOND_MASK (_6, r_3(D), l_2(D));
  return l_4;

final ASM:

without_sel:
.LFB11:
.cfi_startproc
xxsel 34,34,35,36
blr
.long 0
.byte 0,0,0,0,0,0,0,0
.cfi_endproc
.LFE11:
.size   without_sel,.-without_sel
.align 2
.p2align 4,,15
.globl with_sel
.type   with_sel, @function
with_sel:
.LFB12:
.cfi_startproc
xxsel 34,34,35,36
blr


@segher, is this a reasonable fix?

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-07 Thread luoxhu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

luoxhu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||luoxhu at gcc dot gnu.org

--- Comment #8 from luoxhu at gcc dot gnu.org ---
Two minor updates for the case mentioned in #c2:

For VEC_SEL (ARG1, ARG2, ARG3): returns a vector containing the value of
either ARG1 or ARG2 depending on the value of ARG3.
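
Since the selection is per bit, a scalar model of vec_sel's semantics is
simply (a sketch, assuming the usual bitwise-select definition):

/* Model of vec_sel (l, r, mask) on one element: take l where mask
   is 0 and r where mask is 1.  */
unsigned
sel_bits (unsigned l, unsigned r, unsigned mask)
{
  return (l & ~mask) | (r & mask);
}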


#include <altivec.h>
#include <stdio.h>
volatile vector unsigned orig = {0xebebebeb, 0x34343434, 0x76767676,
0x12121212};
volatile vector unsigned mask = {0xffffffff, 0, 0xffffffff, 0};
volatile vector unsigned fill = {0xfefefefe, 0xffffffff, 0xffffffff,
0xffffffff};
volatile vector unsigned expected = {0xfefefefe, 0x34343434, 0xffffffff,
0x12121212};
__attribute__ ((noinline))
vector unsigned without_sel(vector unsigned l, vector unsigned r, vector
unsigned mask) {
-l = l & ~r;
+l = l & ~mask;
l |= mask & r;
return l;
}

__attribute__ ((noinline))
vector unsigned with_sel(vector unsigned l, vector unsigned r, vector unsigned
mask) {
-return vec_sel(l, mask, r);
+return vec_sel(l, r, mask);
}

int main() {
vector unsigned res1 = without_sel(orig, fill, mask);
vector unsigned res2 = with_sel(orig, fill, mask);
if (!vec_all_eq(res1, expected)) printf ("error1\n");
if (!vec_all_eq(res2, expected)) printf ("error2\n");
return 0;
}


And the ASM would be:

without_sel:
xxlxor 35,34,35
xxland 35,35,36
xxlxor 34,34,35
blr
.long 0
.byte 0,0,0,0,0,0,0,0
with_sel:
xxsel 34,34,35,36
blr
.long 0
.byte 0,0,0,0,0,0,0,0

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #7 from Segher Boessenkool  ---
From the combine dump of without_sel:

Trying 8, 9 -> 10:
8: r127:V4SI=r124:V4SI^r131:V4SI
  REG_DEAD r131:V4SI
9: r122:V4SI=r127:V4SI&r130:V4SI
  REG_DEAD r130:V4SI
  REG_DEAD r127:V4SI
   10: r128:V4SI=r124:V4SI^r122:V4SI
  REG_DEAD r124:V4SI
  REG_DEAD r122:V4SI
Failed to match this instruction:
(set (reg:V4SI 128 [ l ])
(xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 124 [ l ])
(reg:V4SI 131))
(reg:V4SI 130))
(reg/v:V4SI 124 [ l ])))



That's not canonical form on RTL, and it's not a useful form either.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2019-05-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Segher Boessenkool  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #6 from Segher Boessenkool  ---
But we should translate the xor/and/xor back to something saner.

Thanks for the testcase!