[PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]
Hi,
  This patch fails TImode for all 128-bit logical operation expanders. So
TImode splits to two DI registers during expand. Potential optimizations can
be taken after expand pass. Originally, the TImode logical operations are
split after reload pass. It's too late. The test case illustrates it.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-07-04 Haochen Gui

gcc/
        PR target/100694
        * config/rs6000/rs6000.md (and3): Fail TImode.
        (ior3): Likewise.
        (xor3): Likewise.
        (nor3): Likewise.
        (andc3): Likewise.
        (eqv3): Likewise.
        (nand3): Likewise.
        (orc3): Likewise.
        (one_cmpl2): Define as an expand and fail TImode.
        (*one_cmpl2): Define as an anonymous insn pattern.

gcc/testsuite/
        PR target/100694
        * gcc.target/powerpc/pr100694.c: New.
        * gcc.target/powerpc/pr92398.c: New.
        * gcc.target/powerpc/pr92398.h: Remove.
        * gcc.target/powerpc/pr92398.p9-.c: Remove.
        * gcc.target/powerpc/pr92398.p9+.c: Remove.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c55ee7e171a..6e57aac3ebf 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7078,27 +7078,38 @@ (define_expand "subti3"
 })

 ;; 128-bit logical operations expanders
+;; Fail TImode in all 128-bit logical operations expanders and split it into
+;; two DI registers.

 (define_expand "and3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
         (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
                       (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 (define_expand "ior3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
         (ior:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
                       (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 (define_expand "xor3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
         (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
                       (match_operand:BOOL_128 2 "vlogical_operand")))]
   ""
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 (define_expand "nor3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
@@ -7106,7 +7117,10 @@ (define_expand "nor3"
           (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
           (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))))]
   ""
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 (define_expand "andc3"
   [(set (match_operand:BOOL_128 0 "vlogical_operand")
@@ -7114,7 +7128,10 @@ (define_expand "andc3"
           (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
           (match_operand:BOOL_128 1 "vlogical_operand")))]
   ""
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 ;; Power8 vector logical instructions.
 (define_expand "eqv3"
@@ -7123,7 +7140,10 @@ (define_expand "eqv3"
           (xor:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
                         (match_operand:BOOL_128 2 "vlogical_operand"))))]
   "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 ;; Rewrite nand into canonical form
 (define_expand "nand3"
@@ -7132,7 +7152,10 @@ (define_expand "nand3"
           (not:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand"))
           (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))))]
   "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 ;; The canonical form is to have the negated element first, so we need to
 ;; reverse arguments.
@@ -7142,7 +7165,10 @@ (define_expand "orc3"
           (not:BOOL_128 (match_operand:BOOL_128 2 "vlogical_operand"))
           (match_operand:BOOL_128 1 "vlogical_operand")))]
   "mode == TImode || mode == PTImode || TARGET_P8_VECTOR"
-  "")
+{
+  if (mode == TImode)
+    FAIL;
+})

 ;; 128-bit logical operations insns and split operations
 (define_insn_and_split "*and3_internal"
@@ -7394,7 +7420,17 @@ (define_insn_and_split "*eqv3_internal2"
          (const_string "16")))])

 ;; 128-bit one's complement
-(define_insn_and_split "one_cmpl2"
+(define_expand "one_cmpl2"
+  [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
+        (not:BOOL_128
+          (match_operand:BOOL_128 1 "vlogical_operand" "")))]
+  ""
+{
+  if (mode == TImode)
+    FAIL;
+})
+
+(define_insn_and_split "*one_cmpl2"
   [(set (match_operand:BOOL_128 0 "vlogical_operand" "=")
         (not:BOOL_128
           (match_operand:BOOL_128 1 "vlogical_operand" "")))]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100694.c b/gcc/testsuite/gcc.target/powerpc/pr100694.c
new file mode 100644
index 000..99dd3ca89ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* {
Re: [PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]
Hi!

On Mon, Jul 04, 2022 at 02:27:42PM +0800, HAO CHEN GUI wrote:
>   This patch fails TImode for all 128-bit logical operation expanders. So
> TImode splits to two DI registers during expand. Potential optimizations can
> be taken after expand pass. Originally, the TImode logical operations are
> split after reload pass. It's too late. The test case illustrates it.

Out of interest, did you see any performance win?  There is a lot of
opportunity for this to cause performance *loss*, on newer systems :-(

> ChangeLog
> 2022-07-04 Haochen Gui

Two spaces both before and after your name, in changelogs.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>  })
>
>  ;; 128-bit logical operations expanders
> +;; Fail TImode in all 128-bit logical operations expanders and split it into
> +;; two DI registers.
>
>  (define_expand "and3"
>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>          (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>                        (match_operand:BOOL_128 2 "vlogical_operand")))]
>    ""
> -  "")
> +{
> +  if (mode == TImode)
> +    FAIL;
> +})

It is better to not FAIL it, but simply not have a pattern for the
TImode version at all.

Does nothing depend on the :TI version to exist?

What about the :PTI version?  Getting rid of that as well will allow
some nice optimisations.

Of course we *do* have instructions to do such TImode ops, on newer
CPUs, but in vector registers only.  It isn't obvious what is faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
> +/* { dg-final { scan-assembler-not {\mli\M} } } */
> +/* { dg-final { scan-assembler-not {\mor\M} } } */
> +
> +/* It just needs two std.  */
> +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long lo)
> +{
> +  unsigned __int128 i = hi;
> +  i <<= 64;
> +  i |= lo;
> +  *res = i;
> +}

You can also just count the number of generated insns:

/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 3 } } */

(three, not two, because of the blr insn at the end).

If possible, we should simply not do :TI ops on older systems at all,
and only on the newer systems that have instructions for it (and that
does not fix PR100694 btw, the problems there have to be solved, not
side-stepped :-( )


Segher
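
To make the expectation behind the quoted test concrete, here is a minimal C sketch. It is not part of the patch. It assumes a compiler that provides `unsigned __int128` and the GCC/Clang `__BYTE_ORDER__` macro. The names `compose_ti` and `compose_split` are hypothetical: `compose_ti` mirrors `foo` from pr100694.c, while `compose_split` writes the two doubleword halves directly, which is roughly the shape the two `std` instructions would have.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same computation as foo in pr100694.c: build a 128-bit value from
   a high and a low doubleword, then store it.  */
void compose_ti(unsigned __int128 *res, uint64_t hi, uint64_t lo)
{
    unsigned __int128 i = hi;
    i <<= 64;
    i |= lo;
    *res = i;
}

/* Hypothetical hand-split version: no 128-bit shift or OR at all,
   only two 64-bit stores into the right halves of the destination.  */
void compose_split(unsigned __int128 *res, uint64_t hi, uint64_t lo)
{
    uint64_t halves[2];
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    halves[0] = lo;   /* low doubleword at the lower address */
    halves[1] = hi;
#else
    halves[0] = hi;   /* big endian: high doubleword first */
    halves[1] = lo;
#endif
    memcpy(res, halves, sizeof halves);
}
```

Both functions leave the same bytes in `*res`, so the shift and the OR in `foo` carry no work that two plain stores could not do. Whether the compiler actually reaches that form depends on when TImode is split, which is what the thread is discussing.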
Re: [PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]
Hi Segher,
  Thanks for your comments.

On 13/7/2022 1:26 AM, Segher Boessenkool wrote:
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>>  })
>>
>>  ;; 128-bit logical operations expanders
>> +;; Fail TImode in all 128-bit logical operations expanders and split it into
>> +;; two DI registers.
>>
>>  (define_expand "and3"
>>    [(set (match_operand:BOOL_128 0 "vlogical_operand")
>>          (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
>>                        (match_operand:BOOL_128 2 "vlogical_operand")))]
>>    ""
>> -  "")
>> +{
>> +  if (mode == TImode)
>> +    FAIL;
>> +})
> It is better to not FAIL it, but simply not have a pattern for the
> TImode version at all.
>
> Does nothing depend on the :TI version to exist?
>
> What about the :PTI version?  Getting rid of that as well will allow
> some nice optimisations.
>
> Of course we *do* have instructions to do such TImode ops, on newer
> CPUs, but in vector registers only.  It isn't obvious what is faster.
>
During expand, TI mode is split into two registers when it can't match any
expander. So I made each expander fail for TI mode and expect it to be split
at expand. TI mode is still accepted by some insn_and_split patterns (e.g.
"*and3_internal"), so if later RTL passes generate TI mode logical
operations, they can still be matched.

Originally, TI mode was split after the reload pass by rs6000_split_logical.
That is too late for some RTL optimizations to take effect.

PTI can't be split into two registers during expand. PTI requires an
even/odd register pair, so splitting it after reload makes sure it gets the
correct registers, I think.

From my understanding, it's sub-optimal to use vector logical operation
instructions for TI mode if the destination is an integer operand. It needs
three instructions (move to vector register, vector logical operation, and
move from vector register). When TI mode is split, it only needs two logical
instructions on two separate registers.

Thanks again
Gui Haochen
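
The "two logical instructions on two separate registers" point can be sketched in plain C. This is an illustration only, not compiler code, and it assumes `unsigned __int128` is available. The type `ti_pair` and every function name below are hypothetical. Each 128-bit logical operation is computed as two independent 64-bit operations, one per doubleword, using the same operand forms as the expanders in the patch (for example, andc is `op1 & ~op2` and eqv is `~(op1 ^ op2)`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a TImode value held in two DImode registers.  */
typedef struct { uint64_t hi, lo; } ti_pair;

ti_pair ti_split(unsigned __int128 x)
{
    ti_pair p = { (uint64_t)(x >> 64), (uint64_t)x };
    return p;
}

unsigned __int128 ti_join(ti_pair p)
{
    return ((unsigned __int128)p.hi << 64) | p.lo;
}

/* Each operation touches the high and low halves independently, so it
   maps to two 64-bit logical instructions with no cross-half dependency.  */
ti_pair pair_and(ti_pair a, ti_pair b)
{
    ti_pair r = { a.hi & b.hi, a.lo & b.lo };
    return r;
}

ti_pair pair_nor(ti_pair a, ti_pair b)   /* (and (not a) (not b)) */
{
    ti_pair r = { ~a.hi & ~b.hi, ~a.lo & ~b.lo };
    return r;
}

ti_pair pair_andc(ti_pair a, ti_pair b)  /* (and (not b) a) */
{
    ti_pair r = { ~b.hi & a.hi, ~b.lo & a.lo };
    return r;
}

ti_pair pair_eqv(ti_pair a, ti_pair b)   /* (not (xor a b)) */
{
    ti_pair r = { ~(a.hi ^ b.hi), ~(a.lo ^ b.lo) };
    return r;
}
```

Because the halves never interact, once the value is split a later pass can simplify or delete one half without touching the other, for instance when one half is known to be zero. That is the kind of opportunity that is lost if the split only happens after reload.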