Re: Ping: [PATCH] Generate 128-bit divide/modulus
Note, as I mentioned previously, this patch needs to go into GCC 11.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Ping: [PATCH] Generate 128-bit divide/modulus
Ping this patch.

| Date: Fri, 4 Jun 2021 11:10:37 -0400
| Subject: Generate 128-bit divide/modulus
| Message-ID: <20210604151037.ga27...@ibm-toto.the-meissners.org>

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Ping [PATCH]: Generate 128-bit divide/modulus
Ping for this patch. In particular, we would like to get this into GCC 11.2, because it is part of the Power10 enablement.

| Date: Fri, 4 Jun 2021 11:10:37 -0400
| Subject: Generate 128-bit divide/modulus
| Message-ID: <20210604151037.ga27...@ibm-toto.the-meissners.org>

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Re: Generate 128-bit divide/modulus
On Fri, 2021-06-04 at 11:10 -0400, Michael Meissner wrote:

Hi,

> Generate 128-bit divide/modulus.
>
> This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
> instructions to do 128-bit arithmetic.

Should vdivsq, vdivuq, vmodsq, and vmoduq be lowercase?

>
> I have tested this on 3 compilers:
>    * Power9 little endian, --with-cpu=power9
>    * Power8 big endian, --with-cpu=power8, both 32/64-bit tested
>    * Power10 little endian, --with-cpu=power10
>
> There were no issues found in the runs.  Can I check this into the master
> branch and later into the GCC 11 branch after a soak-in period?
>
> gcc/
> 2021-06-03  Michael Meissner
>
>     PR target/100809

Add some reference to [PR/100809] in the subject?

From the GCC bugzilla:

> Comment 3 Michael Meissner 2021-06-01 22:55:20 UTC
>
> Carl Love submitted a patch for this on April 26th.
>
> Comment 4 Michael Meissner 2021-06-01 22:58:31 UTC
>
> Note, in looking at Carl's patch, it is only for adding the built-ins.
> I don't believe it adds direct support for {,u}divti3 and
> {,u}moddti3 to implement these for normal __int128 variables.
>

A few words in the description to clarify the situation may be good.
Since that patch did not directly address the PR, I imagine it was a
happy accident that it partially implemented/resolved the situation
here.

>     * config/rs6000/rs6000.md (udivti3): New insn.
>     (divti3): New insn.
>     (umodti3): New insn.
>     (modti3): New insn.

ok

>
> gcc/testsuite/
> 2021-06-03  Michael Meissner
>
>     PR target/100809
>     * gcc.target/powerpc/p10-vdiv-vmod.c: New test.

ok

> ---
>  gcc/config/rs6000/rs6000.md                   | 34 +++
>  .../gcc.target/powerpc/p10-vdivq-vmodq.c      | 27 +++
>  2 files changed, 61 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 2517901f239..e70dbe409df 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -3234,6 +3234,14 @@ (define_insn "udiv3"
>    [(set_attr "type" "div")
>     (set_attr "size" "")])
>
> +(define_insn "udivti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +        (udiv:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +                 (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vdivuq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
>
>  ;; For powers of two we can do sra[wd]i/addze for divide and then adjust for
>  ;; modulus.  If it isn't a power of two, force operands into register and do
> @@ -3324,6 +3332,15 @@ (define_insn_and_split "*div3_sra_dot2"
>     (set_attr "length" "8,12")
>     (set_attr "cell_micro" "not")])
>
> +(define_insn "divti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +        (div:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +                (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vdivsq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
>  (define_expand "mod3"
>    [(set (match_operand:GPR 0 "gpc_reg_operand")
>          (mod:GPR (match_operand:GPR 1 "gpc_reg_operand")
> @@ -3424,6 +3441,23 @@ (define_peephole2
>          (minus:GPR (match_dup 1)
>                     (match_dup 3)))])
>
> +(define_insn "umodti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +        (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +                 (match_operand:TI 2 "altivec_register_operand" "v")))]
> +  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "vmoduq %0,%1,%2"
> +  [(set_attr "type" "vecdiv")
> +   (set_attr "size" "128")])
> +
> +(define_insn "modti3"
> +  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
> +        (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
> +                (
Generate 128-bit divide/modulus
Generate 128-bit divide/modulus.

This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
instructions to do 128-bit arithmetic.

I have tested this on 3 compilers:
   * Power9 little endian, --with-cpu=power9
   * Power8 big endian, --with-cpu=power8, both 32/64-bit tested
   * Power10 little endian, --with-cpu=power10

There were no issues found in the runs.  Can I check this into the master
branch and later into the GCC 11 branch after a soak-in period?

gcc/
2021-06-03  Michael Meissner

    PR target/100809
    * config/rs6000/rs6000.md (udivti3): New insn.
    (divti3): New insn.
    (umodti3): New insn.
    (modti3): New insn.

gcc/testsuite/
2021-06-03  Michael Meissner

    PR target/100809
    * gcc.target/powerpc/p10-vdiv-vmod.c: New test.
---
 gcc/config/rs6000/rs6000.md                   | 34 +++
 .../gcc.target/powerpc/p10-vdivq-vmodq.c      | 27 +++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2517901f239..e70dbe409df 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3234,6 +3234,14 @@ (define_insn "udiv3"
   [(set_attr "type" "div")
    (set_attr "size" "")])

+(define_insn "udivti3"
+  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
+        (udiv:TI (match_operand:TI 1 "altivec_register_operand" "v")
+                 (match_operand:TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10 && TARGET_POWERPC64"
+  "vdivuq %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])

 ;; For powers of two we can do sra[wd]i/addze for divide and then adjust for
 ;; modulus.  If it isn't a power of two, force operands into register and do
@@ -3324,6 +3332,15 @@ (define_insn_and_split "*div3_sra_dot2"
    (set_attr "length" "8,12")
    (set_attr "cell_micro" "not")])

+(define_insn "divti3"
+  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
+        (div:TI (match_operand:TI 1 "altivec_register_operand" "v")
+                (match_operand:TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10 && TARGET_POWERPC64"
+  "vdivsq %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
 (define_expand "mod3"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
         (mod:GPR (match_operand:GPR 1 "gpc_reg_operand")
@@ -3424,6 +3441,23 @@ (define_peephole2
         (minus:GPR (match_dup 1)
                    (match_dup 3)))])

+(define_insn "umodti3"
+  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
+        (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
+                 (match_operand:TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10 && TARGET_POWERPC64"
+  "vmoduq %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])
+
+(define_insn "modti3"
+  [(set (match_operand:TI 0 "altivec_register_operand" "=v")
+        (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
+                (match_operand:TI 2 "altivec_register_operand" "v")))]
+  "TARGET_POWER10 && TARGET_POWERPC64"
+  "vmodsq %0,%1,%2"
+  [(set_attr "type" "vecdiv")
+   (set_attr "size" "128")])

 ;; Logical instructions
 ;; The logical instructions are mostly combined by using match_operator,
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
new file mode 100644
index 000..cd29b0a4b6b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
@@ -0,0 +1,27 @@
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+unsigned __int128 u_div(unsigned __int128 a, unsigned __int128 b)
+{
+  return a/b;
+}
+
+unsigned __int128 u_mod(unsigned __int128 a, unsigned __int128 b)
+{
+  return a%b;
+}
+__int128 s_div(__int128 a, __int128 b)
+{
+  return a/b;
+}
+
+__int128 s_mod(__int128 a, __int128 b)
+{
+  return a%b;
+}
+
+/* { dg-final { scan-assembler {\mvdivsq\M} } } */
+/* { dg-final { scan-assembler {\mvdivuq\M} } } */
+/* { dg-final { scan-assembler {\mvmodsq\M} } } */
+/* { dg-final { scan-assembler {\mvmoduq\M} } } */
--
2.31.1

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
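
For readers following the thread: the four functions in the new test are ordinary C operators applied to __int128, which is what the udivti3/divti3/umodti3/modti3 patterns expand. The sketch below repeats them and adds a small check of the C rules they must preserve (the quotient truncates toward zero, and the remainder takes the sign of the dividend). It is illustrative only and not part of the patch. The helper name check_div_mod_identity is made up here. Whether the compiler emits vdivsq/vdivuq/vmodsq/vmoduq instead of a library call is assumed to depend on having this patch and targeting power10; the sketch does not verify that.

```c
#include <assert.h>

/* Same shapes as the functions in the test case; __int128 is a
   GCC extension available on 64-bit targets.  */
unsigned __int128 u_div(unsigned __int128 a, unsigned __int128 b)
{
  return a / b;
}

unsigned __int128 u_mod(unsigned __int128 a, unsigned __int128 b)
{
  return a % b;
}

__int128 s_div(__int128 a, __int128 b)
{
  return a / b;
}

__int128 s_mod(__int128 a, __int128 b)
{
  return a % b;
}

/* Hypothetical helper (not in the patch): checks the C identity
   (a/b)*b + a%b == a for a nonzero divisor.  */
void check_div_mod_identity(__int128 a, __int128 b)
{
  assert(b != 0);
  assert(s_div(a, b) * b + s_mod(a, b) == a);
}
```

For example, s_div(-7, 2) yields -3 and s_mod(-7, 2) yields -1, so the identity holds as (-3)*2 + (-1) == -7.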