[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #22 from Segher Boessenkool ---
(In reply to Andrew Pinski from comment #21)
> I am not sure if powerpc vsx has &~ though.

VMX has vandc (since 1999), and VSX has xxlandc (since 2010). In general,
PowerPC has a full complement of logical ops, everywhere. In some cases it
has the full truth table of the operation as part of the binary opcode ;-)
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #21 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #20)
> The aarch64 backend matches this:
> (insn 15 10 16 2 (set (reg/i:V4SI 32 v0)
>         (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 101)
>                     (reg:V4SI 102))
>                 (reg:V4SI 103))
>             (reg:V4SI 101))) "/app/example.cpp":7:1 3103
> {aarch64_simd_bslv4si_internal}
>
> for the `bit v0.16b, v1.16b, v2.16b` instruction. which was done r5-6601 (PR
> 64448) .

One thing for the middle-end here: if we have `(xor (and (xor A B) C) B)`, we
could try to expand it into `(a&c)|(b&~c)` if there is an optab for the &~
(which I am aiming to add for other reasons). I am not sure if powerpc vsx has
&~ though. I will be doing my development on both x86_64 and aarch64, and it
will be up to the other targets to add the optab pattern if needed.
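As a worked check of the expansion mentioned above, the following derivation shows why `(xor (and (xor A B) C) B)` equals `(a&c)|(b&~c)`. It uses only standard Boolean algebra; the notation (⊕ for xor, ∧ for and, ∨ for ior) is chosen here for illustration and does not come from the thread.

```latex
\begin{align*}
((A \oplus B) \wedge C) \oplus B
  &= (A \wedge C) \oplus (B \wedge C) \oplus B
     && \text{($\wedge$ distributes over $\oplus$)} \\
  &= (A \wedge C) \oplus (B \wedge \lnot C)
     && \text{($(B \wedge C) \oplus B = B \wedge \lnot C$)} \\
  &= (A \wedge C) \vee (B \wedge \lnot C)
     && \text{(the two terms can never both be 1)}
\end{align*}
```

The last step is what makes the ior form legal: one term requires C and the other requires ~C, so xor and ior agree on every bit.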
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64448

--- Comment #20 from Andrew Pinski ---
The aarch64 backend matches this:
(insn 15 10 16 2 (set (reg/i:V4SI 32 v0)
        (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 101)
                    (reg:V4SI 102))
                (reg:V4SI 103))
            (reg:V4SI 101))) "/app/example.cpp":7:1 3103
{aarch64_simd_bslv4si_internal}

for the `bit v0.16b, v1.16b, v2.16b` instruction, which was done in r5-6601
(PR 64448).
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #19 from Segher Boessenkool ---
(In reply to luoxhu from comment #17)
> And what do you mean"This is not canonical form on RTL, and it's not a
> useful form either" in c#7, please? Not understanding the point...

On Gimple it is canonical to convert (a&c)|(b&~c) to ((a^b)&c)^b, because
all Gimple cares about is number of operations (and it counts unary operations
as well, so this is three instead of four ops). For RTL we do not have such a
simple-minded rule.

> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -3405,7 +3405,6 @@ simplify_context::simplify_binary_operation_1
> (rtx_code code,
>          machines, and also has shorter instruction path length.  */
>        if (GET_CODE (op0) == AND
>           && GET_CODE (XEXP (op0, 0)) == XOR
> -         && CONST_INT_P (XEXP (op0, 1))
>           && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
>         {
>           rtx a = trueop1;
> @@ -3419,7 +3418,6 @@ simplify_context::simplify_binary_operation_1
> (rtx_code code,
>        /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))
> */
>        else if (GET_CODE (op0) == AND
>           && GET_CODE (XEXP (op0, 0)) == XOR
> -         && CONST_INT_P (XEXP (op0, 1))
>           && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
>         {
>           rtx a = XEXP (XEXP (op0, 0), 0);

It needs *some* test on it. It certainly cannot have side effects for example.
CONST_INT_P || REG_P should catch all useful cases?
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #18 from Segher Boessenkool ---
(In reply to luoxhu from comment #16)
> > +2016-11-09  Segher Boessenkool
> > +
> > +       * simplify-rtx.c (simplify_binary_operation_1): Simplify
> > +       (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> > +       (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
> > +       is a const_int.
>
> Is it a MUST that C be const here?

It could be extended to C a reg as well, I think.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #17 from luoxhu at gcc dot gnu.org ---
If the constant limitation is removed, it could be combined successfully with
my new patch for PR94613.

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569255.html

And what do you mean "This is not canonical form on RTL, and it's not a useful
form either" in c#7, please? Not understanding the point...

Trying 11 -> 16:
   11: r124:V4SI=r127:V4SI&r129:V4SI|~r129:V4SI&r128:V4SI
      REG_DEAD r128:V4SI
      REG_DEAD r129:V4SI
      REG_DEAD r127:V4SI
   16: %v2:V4SI=r124:V4SI
      REG_DEAD r124:V4SI
Successfully matched this instruction:
(set (reg/i:V4SI 66 %v2)
    (ior:V4SI (and:V4SI (reg:V4SI 127)
            (reg:V4SI 129))
        (and:V4SI (not:V4SI (reg:V4SI 129))
            (reg:V4SI 128))))
allowing combination of insns 11 and 16
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 11.
modifying insn i3    16: %v2:V4SI=r127:V4SI&r129:V4SI|~r129:V4SI&r128:V4SI
      REG_DEAD r127:V4SI
      REG_DEAD r129:V4SI
      REG_DEAD r128:V4SI
deferring rescan insn with uid = 16.

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 571e2337e27..701f37eb03e 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3405,7 +3405,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
         machines, and also has shorter instruction path length.  */
       if (GET_CODE (op0) == AND
          && GET_CODE (XEXP (op0, 0)) == XOR
-         && CONST_INT_P (XEXP (op0, 1))
          && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
        {
          rtx a = trueop1;
@@ -3419,7 +3418,6 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
       /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C)) */
       else if (GET_CODE (op0) == AND
          && GET_CODE (XEXP (op0, 0)) == XOR
-         && CONST_INT_P (XEXP (op0, 1))
          && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
        {
          rtx a = XEXP (XEXP (op0, 0), 0);
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #16 from luoxhu at gcc dot gnu.org ---
> +2016-11-09  Segher Boessenkool
> +
> +       * simplify-rtx.c (simplify_binary_operation_1): Simplify
> +       (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
> +       (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
> +       is a const_int.

Is it a MUST that C be const here? For this case in PR90323, C is not actually
a constant:

    l = l & ~mask;
    l |= mask & r;

Trying 8, 9 -> 10:
    8: r127:V4SI=r124:V4SI^r131:V4SI
      REG_DEAD r131:V4SI
    9: r122:V4SI=r127:V4SI&r130:V4SI
      REG_DEAD r130:V4SI
      REG_DEAD r127:V4SI
   10: r128:V4SI=r124:V4SI^r122:V4SI
      REG_DEAD r124:V4SI
      REG_DEAD r122:V4SI
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #15 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #14)
> (In reply to luoxhu from comment #12)
> > That code was called by combine pass but fail to match.
> >
>
> > pr newpat
> > (set (reg:DI 125 [ l ])
> >     (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
> >                 (reg:DI 127))
> >             (const_int 267390975 [0xff00fff]))
> >         (reg/v:DI 120 [ l ])))
>
> Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.

OK, it also fails to combine for 0x0100.

        .cfi_startproc
        xor 4,3,4
        rlwinm 4,4,0,7,7
        xor 3,4,3
        blr
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #14 from Segher Boessenkool ---
(In reply to luoxhu from comment #12)
> That code was called by combine pass but fail to match.
>
> pr newpat
> (set (reg:DI 125 [ l ])
>     (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
>                 (reg:DI 127))
>             (const_int 267390975 [0xff00fff]))
>         (reg/v:DI 120 [ l ])))

Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.
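To illustrate the point about the mask, here is a minimal sketch of a validity check. It assumes that a 32-bit rotate-and-mask instruction such as rlwinm or rlwimi can only encode a single contiguous run of one bits, possibly wrapping around the word boundary. The function names are hypothetical and are not taken from the rs6000 backend, and the sketch ignores any extra constraints that apply in DImode.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Count the positions where a bit differs from its circular neighbour.
   A single (possibly wrapping) run of ones has exactly two such
   positions; all-ones has none. */
int circular_transitions(uint32_t mask)
{
    uint32_t diff = mask ^ rotl32(mask, 1);
    int count = 0;
    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Hypothetical helper: return 1 if mask is one contiguous run of ones,
   allowing the run to wrap from the top bit to the bottom bit. */
int is_single_run_mask32(uint32_t mask)
{
    if (mask == 0)
        return 0;           /* an empty mask is not a run of ones */
    return circular_transitions(mask) <= 2;
}
```

Under that assumption, `is_single_run_mask32(0x0ff00fff)` returns 0 because the constant contains two separate runs of ones, while a single-bit mask such as `0x01000000` or a wrapping mask such as `0xfff00fff` returns 1.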
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #13 from Segher Boessenkool ---
(In reply to luoxhu from comment #11)
> I noticed that you added the below optimization with commit
> a62436c0a505155fc8becac07a8c0abe2c265bfe. But it doesn't even handle this
> case, cse1 pass will call simplify_binary_operation_1, both op0 and op1 are
> REGs instead of AND operators, do you have a test case to cover that piece
> of code?

This worked at the time. It broke some time ago in simple testcases,
triggered by the "don't combine hard registers" thing I did. This is PR98468.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #12 from luoxhu at gcc dot gnu.org ---
That code was called by the combine pass but failed to match.

pr newpat
(set (reg:DI 125 [ l ])
    (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
                (reg:DI 127))
            (const_int 267390975 [0xff00fff]))
        (reg/v:DI 120 [ l ])))

Trying 8, 10 -> 11:
    8: r123:DI=r120:DI^r127:DI
      REG_DEAD r127:DI
   10: r118:DI=r123:DI&0xff00fff
      REG_DEAD r123:DI
   11: r125:DI=r118:DI^r120:DI
      REG_DEAD r120:DI
      REG_DEAD r118:DI
Failed to match this instruction:
(set (reg:DI 125 [ l ])
    (ior:DI (and:DI (reg/v:DI 120 [ l ])
            (const_int -267390976 [0xf00ff000]))
        (and:DI (reg:DI 127)
            (const_int 267390975 [0xff00fff]))))
Successfully matched this instruction:
(set (reg:DI 118 [ _2 ])
    (and:DI (reg:DI 127)
        (const_int 267390975 [0xff00fff])))
Failed to match this instruction:
(set (reg:DI 125 [ l ])
    (ior:DI (and:DI (reg/v:DI 120 [ l ])
            (const_int -267390976 [0xf00ff000]))
        (reg:DI 118 [ _2 ])))
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #11 from luoxhu at gcc dot gnu.org ---
I noticed that you added the optimization below with commit
a62436c0a505155fc8becac07a8c0abe2c265bfe. But it doesn't even handle this
case: the cse1 pass calls simplify_binary_operation_1, and both op0 and op1
are REGs instead of AND operators. Do you have a test case that covers that
piece of code?

__attribute__ ((noinline))
long without_sel3( long l, long r)
{
  long tmp = {0x0ff00fff};
  l = ( (l ^ r) & tmp) ^ l;
  return l;
}

without_sel3:
        xor 4,3,4
        rlwinm 4,4,0,20,11
        rldicl 4,4,0,36
        xor 3,4,3
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0

+2016-11-09  Segher Boessenkool
+
+       * simplify-rtx.c (simplify_binary_operation_1): Simplify
+       (xor (and (xor A B) C) B) to (ior (and A C) (and B ~C)) and
+       (xor (and (xor A B) C) A) to (ior (and A ~C) (and B C)) if C
+       is a const_int.

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 5c3dea1a349..11a2e0267c7 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2886,6 +2886,37 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
            }
        }

+      /* If we have (xor (and (xor A B) C) A) with C a constant we can instead
+         do (ior (and A ~C) (and B C)) which is a machine instruction on some
+         machines, and also has shorter instruction path length.  */
+      if (GET_CODE (op0) == AND
+          && GET_CODE (XEXP (op0, 0)) == XOR
+          && CONST_INT_P (XEXP (op0, 1))
+          && rtx_equal_p (XEXP (XEXP (op0, 0), 0), trueop1))
+        {
+          rtx a = trueop1;
+          rtx b = XEXP (XEXP (op0, 0), 1);
+          rtx c = XEXP (op0, 1);
+          rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+          rtx a_nc = simplify_gen_binary (AND, mode, a, nc);
+          rtx bc = simplify_gen_binary (AND, mode, b, c);
+          return simplify_gen_binary (IOR, mode, a_nc, bc);
+        }
+      /* Similarly, (xor (and (xor A B) C) B) as (ior (and A C) (and B ~C))  */
+      else if (GET_CODE (op0) == AND
+          && GET_CODE (XEXP (op0, 0)) == XOR
+          && CONST_INT_P (XEXP (op0, 1))
+          && rtx_equal_p (XEXP (XEXP (op0, 0), 1), trueop1))
+        {
+          rtx a = XEXP (XEXP (op0, 0), 0);
+          rtx b = trueop1;
+          rtx c = XEXP (op0, 1);
+          rtx nc = simplify_gen_unary (NOT, mode, c, mode);
+          rtx b_nc = simplify_gen_binary (AND, mode, b, nc);
+          rtx ac = simplify_gen_binary (AND, mode, a, c);
+          return simplify_gen_binary (IOR, mode, ac, b_nc);
+        }
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #10 from Segher Boessenkool ---
You cannot fix a simplify-rtx problem in much earlier passes! It may be
useful of course (I have no idea, I don't know gimple well enough), but it is
no solution to the problem at all.

The xor/and/xor thing should be simplified to something proper.

  ((A^B)&C)^A = (A&~C)^(B&C) = (A&~C)|(B&C)

This should already be done by the expand pass. At gimple level the logical
complement is counted as an operation, making the contorted xor/and/xor form
the best form to use, but in a system that considers more than just operation
counts (like in RTL) this is not the best form at all.

But, anyway, RTL simplification should be able to do this. Similar problems
happen all over the place, fwiw -- see the various rl* tests for rs6000, for
example.
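The identity in this comment, together with its mirror form from the 2016 ChangeLog entry, can be checked exhaustively. The following is a minimal sketch in plain C, not GCC code, and all function names are hypothetical. Because every operation involved is bitwise, checking the 8 single-bit combinations of A, B and C is enough to cover every bit position.

```c
#include <stdint.h>

/* The xor/and/xor form: ((A ^ B) & C) ^ A. */
uint32_t xor_and_xor_a(uint32_t a, uint32_t b, uint32_t c)
{
    return ((a ^ b) & c) ^ a;
}

/* The select form: (A & ~C) | (B & C). */
uint32_t sel_a(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Mirror form: ((A ^ B) & C) ^ B. */
uint32_t xor_and_xor_b(uint32_t a, uint32_t b, uint32_t c)
{
    return ((a ^ b) & c) ^ b;
}

/* Mirror select form: (A & C) | (B & ~C). */
uint32_t sel_b(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & c) | (b & ~c);
}

/* Return 1 if both identities hold for every combination of one-bit
   inputs, 0 otherwise. */
int check_select_identities(void)
{
    for (uint32_t bits = 0; bits < 8; bits++) {
        uint32_t a = bits & 1;
        uint32_t b = (bits >> 1) & 1;
        uint32_t c = (bits >> 2) & 1;
        if (xor_and_xor_a(a, b, c) != sel_a(a, b, c))
            return 0;
        if (xor_and_xor_b(a, b, c) != sel_b(a, b, c))
            return 0;
    }
    return 1;
}
```

The check only establishes that the two forms compute the same value. Which form is preferable in RTL is the cost question discussed in the comment, and the sketch says nothing about that.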
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #9 from luoxhu at gcc dot gnu.org ---
Then we could optimize it in match.pd:

diff --git a/gcc/match.pd b/gcc/match.pd
index 036f92fa959..8944312c153 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3711,6 +3711,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (integer_all_onesp (@1) && integer_zerop (@2))
     @0

+#if GIMPLE
+(simplify
+ (bit_xor @0 (bit_and @2 (bit_xor @0 @1)))
+ (if (optimize_vectors_before_lowering_p () && types_match (@0, @1)
+      && types_match (@0, @2) && VECTOR_TYPE_P (TREE_TYPE (@0))
+      && VECTOR_TYPE_P (TREE_TYPE (@1)) && VECTOR_TYPE_P (TREE_TYPE (@2)))
+  (with { tree itype = truth_type_for (type); }
+   (vec_cond (convert:itype @2) @1 @0))))
+#endif

In pr90323.c.033t.forwprop1, it will be optimized to:

  :
  _1 = ~mask_3(D);
  l_5 = _1 & l_4(D);
  _2 = mask_3(D) & r_6(D);
  _8 = l_4(D) ^ r_6(D);
  _10 = mask_3(D) & _8;
  _11 = (vector(4) ) mask_3(D);
  l_7 = VEC_COND_EXPR <_11, r_6(D), l_4(D)>;
  return l_7;

Then in pr90323.c.243t.isel:

  [local count: 1073741824]:
  _6 = (vector(4) ) mask_1(D);
  l_4 = .VCOND_MASK (_6, r_3(D), l_2(D));
  return l_4;

Final ASM:

without_sel:
.LFB11:
        .cfi_startproc
        xxsel 34,34,35,36
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
        .cfi_endproc
.LFE11:
        .size without_sel,.-without_sel
        .align 2
        .p2align 4,,15
        .globl with_sel
        .type with_sel, @function
with_sel:
.LFB12:
        .cfi_startproc
        xxsel 34,34,35,36
        blr

@segher, is this a reasonable fix?
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

luoxhu at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luoxhu at gcc dot gnu.org

--- Comment #8 from luoxhu at gcc dot gnu.org ---
Two minor updates for the case mentioned in #c2:

For VEC_SEL (ARG1, ARG2, ARG3): Returns a vector containing the value of
either ARG1 or ARG2 depending on the value of ARG3.

#include <altivec.h>
#include <stdio.h>

volatile vector unsigned orig = {0xebebebeb, 0x34343434, 0x76767676, 0x12121212};
volatile vector unsigned mask = {0x, 0, 0x, 0};
volatile vector unsigned fill = {0xfefefefe, 0x, 0x, 0x};
volatile vector unsigned expected = {0xfefefefe, 0x34343434, 0x, 0x12121212};

__attribute__ ((noinline))
vector unsigned without_sel(vector unsigned l, vector unsigned r, vector unsigned mask)
{
-    l = l & ~r;
+    l = l & ~mask;
     l |= mask & r;
     return l;
}

__attribute__ ((noinline))
vector unsigned with_sel(vector unsigned l, vector unsigned r, vector unsigned mask)
{
-    return vec_sel(l, mask, r);
+    return vec_sel(l, r, mask);
}

int main()
{
    vector unsigned res1 = without_sel(orig, fill, mask);
    vector unsigned res2 = with_sel(orig, fill, mask);

    if (!vec_all_eq(res1, expected))
        printf ("error1\n");
    if (!vec_all_eq(res2, expected))
        printf ("error2\n");
    return 0;
}

And the ASM would be:

without_sel:
        xxlxor 35,34,35
        xxland 35,35,36
        xxlxor 34,34,35
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0

with_sel:
        xxsel 34,34,35,36
        blr
        .long 0
        .byte 0,0,0,0,0,0,0,0
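For readers without a PowerPC toolchain, the corrected test case can be modelled in portable C. This is only a sketch: the `v4u32` type and the `model_*` functions are hypothetical stand-ins for `vector unsigned` and the AltiVec intrinsic, and it assumes the usual vec_sel semantics, where each result bit comes from the second argument if the mask bit is 1 and from the first argument otherwise.

```c
#include <stdint.h>

#define LANES 4

/* Hypothetical scalar stand-in for a vector of four 32-bit lanes. */
typedef struct { uint32_t lane[LANES]; } v4u32;

/* Model of vec_sel(l, r, mask): take bits of r where mask is 1,
   bits of l where mask is 0. */
v4u32 model_vec_sel(v4u32 l, v4u32 r, v4u32 mask)
{
    v4u32 out;
    for (int i = 0; i < LANES; i++)
        out.lane[i] = (l.lane[i] & ~mask.lane[i]) | (r.lane[i] & mask.lane[i]);
    return out;
}

/* Model of the corrected without_sel(): clear the masked bits of l,
   then merge in the masked bits of r. */
v4u32 model_without_sel(v4u32 l, v4u32 r, v4u32 mask)
{
    for (int i = 0; i < LANES; i++) {
        l.lane[i] = l.lane[i] & ~mask.lane[i];
        l.lane[i] |= mask.lane[i] & r.lane[i];
    }
    return l;
}

/* Lane-wise equality, playing the role of vec_all_eq(). */
int v4u32_equal(v4u32 a, v4u32 b)
{
    for (int i = 0; i < LANES; i++)
        if (a.lane[i] != b.lane[i])
            return 0;
    return 1;
}
```

With any inputs, `v4u32_equal(model_vec_sel(l, r, mask), model_without_sel(l, r, mask))` should return 1, which is the equivalence the bug asks GCC to recognise so that the three-instruction sequence becomes a single xxsel.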
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #7 from Segher Boessenkool ---
From the combine dump of without_sel:

Trying 8, 9 -> 10:
    8: r127:V4SI=r124:V4SI^r131:V4SI
      REG_DEAD r131:V4SI
    9: r122:V4SI=r127:V4SI&r130:V4SI
      REG_DEAD r130:V4SI
      REG_DEAD r127:V4SI
   10: r128:V4SI=r124:V4SI^r122:V4SI
      REG_DEAD r124:V4SI
      REG_DEAD r122:V4SI
Failed to match this instruction:
(set (reg:V4SI 128 [ l ])
    (xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 124 [ l ])
                (reg:V4SI 131))
            (reg:V4SI 130))
        (reg/v:V4SI 124 [ l ])))

That's not canonical form on RTL, and it's not a useful form either.
[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #6 from Segher Boessenkool ---
But we should translate the xor/and/xor back to something saner.

Thanks for the testcase!