Re: [PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V
On 2021-07-12, Kito Cheng wrote: It was undocument before, but it might used in linux kernel for resolve code model issue, so LLVM community suggest we should document that, so that make it become supported/documented/non-internal machine constraints. gcc/ChangeLog: PR target/101275 * config/riscv/constraints.md ("S"): Update description and remove @internal. * doc/md.texi (Machine Constraints): Document the 'S' constraints for RISC-V. --- gcc/config/riscv/constraints.md | 3 +-- gcc/doc/md.texi | 3 +++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index 8c15c6c0486..c87d5b796a5 100644 --- a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -67,8 +67,7 @@ (define_memory_constraint "A" (match_test "GET_CODE(XEXP(op,0)) == REG"))) (define_constraint "S" - "@internal - A constant call address." + "A constraint that matches an absolute symbolic address." (match_operand 0 "absolute_symbolic_operand")) (define_constraint "U" diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 00caf3844cc..2d120da96cf 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions. @item A An address that is held in a general-purpose register. +@item S +A constraint that matches an absolute symbolic address. + @end table @item RX---@file{config/rx/constraints.md} -- 2.31.1 LGTM
[PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V
It was undocument before, but it might used in linux kernel for resolve code model issue, so LLVM community suggest we should document that, so that make it become supported/documented/non-internal machine constraints. gcc/ChangeLog: PR target/101275 * config/riscv/constraints.md ("S"): Update description and remove @internal. * doc/md.texi (Machine Constraints): Document the 'S' constraints for RISC-V. --- gcc/config/riscv/constraints.md | 3 +-- gcc/doc/md.texi | 3 +++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md index 8c15c6c0486..c87d5b796a5 100644 --- a/gcc/config/riscv/constraints.md +++ b/gcc/config/riscv/constraints.md @@ -67,8 +67,7 @@ (define_memory_constraint "A" (match_test "GET_CODE(XEXP(op,0)) == REG"))) (define_constraint "S" - "@internal - A constant call address." + "A constraint that matches an absolute symbolic address." (match_operand 0 "absolute_symbolic_operand")) (define_constraint "U" diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 00caf3844cc..2d120da96cf 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions. @item A An address that is held in a general-purpose register. +@item S +A constraint that matches an absolute symbolic address. + @end table @item RX---@file{config/rx/constraints.md} -- 2.31.1
Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.
PING^1 On 6/28/21 2:19 PM, Martin Liška wrote: On 6/24/21 12:46 AM, Segher Boessenkool wrote: Hi! On Wed, Jun 23, 2021 at 03:22:34PM +0200, Martin Liška wrote: As mentioned in the "Fallout: save/restore target options in handle_optimize_attribute" thread, we need to support target option restore of rs6000_long_double_type_size == FLOAT_PRECISION_TFmode. I have no idea? Could you explain please? Sure. Few weeks ago, we started using cl_target_option_{save,restore} calls even for optimize attributes (and pragma). Motivation was that optimize options can influence target options (and vice versa). Doing that, FLOAT_PRECISION_TFmode must be accepted as a valid option value for rs6000_long_double_type_size. --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p) else rs6000_long_double_type_size = default_long_double_size; } + else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode) + ; /* The option can be restored a TREE_TARGET_OPTION. */ What does that mean? It is not grammatical, and not obvious what it should mean. Updated. --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c @@ -0,0 +1,14 @@ +/* { dg-do compile { target { powerpc*-*-linux* } } } */ Why on Linux only? That doesn't sound right. Do you need some other selector(s)? Sorry, I copied the test-case. +/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */ + +extern unsigned long int x; +extern float f (float); +extern __typeof (f) f_power8; +extern __typeof (f) f_power9; +extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc"))); +static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) * -fno-stack-protector is default. Yes, but one needs an optimize attribute in order to trigger cl_target_option_save/restore mechanism. Martin +f_ifunc (void) +{ + __typeof (f) *res = x ? f_power9 : f_power8; + return res; +} The testcase should say what it is testing for, it is not obvious? Segher
Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]
On 2021/7/10 02:40, will schmidt wrote: > On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote: >> Gentle ping ^2, thanks. >> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html >> >> >> On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote: >>> Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%, >>> 526.blender_r +1.72%, no obvious changes to others. > > Ok. > >>> >>> >>> On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote: Gentle ping, thanks. On 2021/4/16 15:10, Xiong Hu Luo wrote: > fmod/fmodf and remainder/remainderf could be expanded instead of library > call when fast-math build, which is much faster. > > fmodf: >fdivs f0,f1,f2 >frizf0,f0 >fnmsubs f1,f2,f0,f1 > > remainderf: >fdivs f0,f1,f2 >frinf0,f0 >fnmsubs f1,f2,f0,f1 > > gcc/ChangeLog: > > 2021-04-16 Xionghu Luo > > PR target/97142 > > That PR is " Bug 97142 >- __builtin_fmod not optimized on POWER " > > OK. > > > * config/rs6000/rs6000.md (fmod3): New define_expand. > (remainder3): Likewise. > > > > gcc/testsuite/ChangeLog: > > 2021-04-16 Xionghu Luo > > PR target/97142 > * gcc.target/powerpc/pr97142.c: New test. > > Ok. > > --- >gcc/config/rs6000/rs6000.md| 36 ++ >gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++ >2 files changed, 66 insertions(+) >create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index a1315523fec..7e0e94e6ba4 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -4902,6 +4902,42 @@ (define_insn "fre" > [(set_attr "type" "fp") > (set_attr "isa" "*,")]) > +(define_expand "fmod3" > + [(use (match_operand:SFDF 0 "gpc_reg_operand")) > +(use (match_operand:SFDF 1 "gpc_reg_operand")) > +(use (match_operand:SFDF 2 "gpc_reg_operand"))] > + "TARGET_HARD_FLOAT > + && TARGET_FPRND > + && flag_unsafe_math_optimizations" > +{ > + rtx div = gen_reg_rtx (mode); > + emit_insn (gen_div3 (div, operands[1], operands[2])); > + > + rtx friz = gen_reg_rtx (mode); > + emit_insn (gen_btrunc2 (friz, div)); > + > + emit_insn (gen_nfms4 (operands[0], operands[2], friz, > operands[1])); > + DONE; > + }) > + > +(define_expand "remainder3" > + [(use (match_operand:SFDF 0 "gpc_reg_operand")) > +(use (match_operand:SFDF 1 "gpc_reg_operand")) > +(use (match_operand:SFDF 2 "gpc_reg_operand"))] > + "TARGET_HARD_FLOAT > + && TARGET_FPRND > + && flag_unsafe_math_optimizations" > +{ > + rtx div = gen_reg_rtx (mode); > + emit_insn (gen_div3 (div, operands[1], operands[2])); > + > + rtx frin = gen_reg_rtx (mode); > + emit_insn (gen_round2 (frin, div)); > + > + emit_insn (gen_nfms4 (operands[0], operands[2], frin, > operands[1])); > + DONE; > + }) > > I notice the pattern of arguments to the final emit > is op[0],op[2],fri*,op[1] > while the description comment suggests the generated instruction > will be fnmsubs f1,f2,f0,f1 ; > > I don't see any rearranging in the nfms4 expansions, but > presumably this is correct and just a cosmetic nit that catches my eye. >From the ISA, fnmsub FRT,FRA,FRC,FRB The operation FRT ← - ( [(FRA) (FRC)] - (FRB) ) is performed. fmodf: fdivs f0,f1,f2 frizf0,f0 fnmsubs f1,f2,f0,f1 Then the ASM means: f1 = - (f2 * f0 - f1) = - ([f2 * f1/f2] - f1) So f1 is set with the mod result. > > Ok. > > > + >(define_insn "*rsqrt2" > [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa") >(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")] > diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c > b/gcc/testsuite/gcc.target/powerpc/pr97142.c > new file mode 100644 > index 000..48f25ca5b5b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > +/* { dg-options "-Ofast" } */ > + > +#include > + > +float test1 (float x, float y) > +{ > + return fmodf (x, y); > +} > + > +double test2 (double x, double y) > +{ > + return fmod (x, y); > +} > + > +float test3 (float x, float y) > +{ > + return remainderf (x, y); > +} > + > +double test4 (double x, double y) > +{ > + return remainder (x, y); > +} > + > +/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */ > +/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */ > +/* { dg-final { scan-assembler-not
Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10
Hi Mike, On 7/7/21 3:04 PM, Michael Meissner wrote: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10. This patch updates the vector long long multiply and divide tests to supply the correct code information if power10 code generation is used. 2021-07-07 Michael Meissner gcc/testsuite/ PR testsuite/100167 * gcc.target/powerpc/fold-vec-div-longlong.c: Missing information after colon. * gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code generation on power10. --- gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c | 7 +-- gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c index 312e984d3cc..f6a9b290ae5 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c @@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector unsigned long long y) { return vec_div (x, y); } -/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */ -/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */ + +/* { dg-final { scan-assembler-times {\mdivd\M} 2 { target { ! has_arch_pwr10 } } } } */ +/* { dg-final { scan-assembler-times {\mdivdu\M} 2 { target { ! has_arch_pwr10 } } } } */ +/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target { has_arch_pwr10 } } } } */ +/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target { has_arch_pwr10 } } } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c index 38dba9f5023..bd210e34801 100644 --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c @@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector unsigned long long y) return vec_mul (x, y); } -/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mmulld\M} 4 { target { lp64 && { ! has_arch_pwr10 } } } } } */ +/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { has_arch_pwr10 } } } } */ Shouldn't this last be { lp64 && has_arch_pwr10 } ? Otherwise LGTM. I can't approve, but recommend approval with those changes. Thanks, Bill
Re: Repost: [PATCH] PR 100170: Fix eq/ne tests on power10.
Hi Mike, ENOPATCH Thanks, Bill :-) On 7/7/21 3:06 PM, Michael Meissner wrote: [PATCH] PR 100170: Fix eq/ne tests on power10. This patch updates eq/ne tests in the testsuite to adjust the test if power10 code generation is used. I have verified that these tests run on a power10 system using the --with-cpu=power10 configuration option, and they continue to run on power9 little endian and power8 big endian systems. Can I check this patch into th master branch? 2021-07-07 Michael Meissner gcc/testsuite/ PR testsuite/100170 * gcc.target/powerpc/ppc-eq0-1.c: Add support for the setbc instruction. * gcc.target/powerpc/ppc-ne0-1.c: Update instruction counts on power10.
Re: Repost: [PATCH] PR 100168: Fix call test on power10.
Hi Mike, LGTM. I can't approve, but recommend approval. Thanks, Bill On 7/7/21 3:08 PM, Michael Meissner wrote: [PATCH] PR 100168: Fix call test on power10. Fix a test that was checking for 64-bit TOC calls, to also allow for PC-relative calls. I have verified that this test passes when run on a power10 system configured with --with-cpu=power10 and it continues to pass on power9 little endian and power8 big endian systems. Can I check this into the master branch? 2021-07-07 Michael Meissner gcc/testsuite PR testsuite/100168 * gcc.dg/pr56727-2.c: Add support for PC-relative calls. --- gcc/testsuite/gcc.dg/pr56727-2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c index c54369ed25e..77fdf4bc350 100644 --- a/gcc/testsuite/gcc.dg/pr56727-2.c +++ b/gcc/testsuite/gcc.dg/pr56727-2.c @@ -18,4 +18,4 @@ void h () /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */ /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && ilp32 } } } } */ -/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && lp64 } } } } */ +/* { dg-final { scan-assembler "(bl f\n\\s*nop)|(bl f@notoc)" { target { powerpc*-*-linux* && lp64 } } } } */
Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "floor" intrinsics
Hi Paul, On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote: Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss. These are modelled after (and depend upon parts of) the tests for _mm_ceil intrinsics, recently posted. Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386. 2021-07-06 Paul A. Clarke gcc/testsuite/ChangeLog: [Applies to all your patches] Don't need to include "ChangeLog:" here. "gcc/testsuite/" works fine. * gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from gcc/testsuite/gcc.target/i386. As before, these all need to be relative to gcc/testsuite, so start with gcc.target/... --- .../gcc.target/powerpc/sse4_1-floorpd.c | 51 .../gcc.target/powerpc/sse4_1-floorps.c | 33 + .../gcc.target/powerpc/sse4_1-floorsd.c | 119 ++ .../gcc.target/powerpc/sse4_1-floorss.c | 95 ++ .../gcc.target/powerpc/sse4_1-roundpd-2.c | 36 ++ 5 files changed, 334 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c new file mode 100644 index ..ad21644f50c4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128d +#define FP_T double + +#define ROUND_INTRIN(x, mode) _mm_floor_pd (x) + +#include "sse4_1-round-data.h" + +static struct data data[] = { + { .value = { .f = { 0.00, 0.25 } }, .answer = { 0.0, 0.0 } }, + { .value = { .f = { 0.50, 0.75 } }, .answer = { 0.0, 0.0 } }, + + { { .f = { 0x1.cp+50, 0x1.dp+50 } }, + { 0x1.cp+50, 0x1.cp+50 } }, + { { .f = { 0x1.ep+50, 0x1.0p+51 } }, + { 0x1.cp+50, 0x1.0p+51 } }, + { { .f = { 0x1.0p+51, 0x1.1p+51 } }, + { 0x1.0p+51, 0x1.0p+51 } }, + { { .f = { 0x1.2p+51, 0x1.3p+51 } }, + { 0x1.2p+51, 0x1.2p+51 } }, + + { { .f = { 0x1.ep+51, 0x1.fp+51 } }, + { 0x1.ep+51, 0x1.ep+51 } }, + { { .f = { 0x1.0p+52, 0x1.1p+52 } }, + { 0x1.0p+52, 0x1.1p+52 } }, + + { { .f = { -0x1.1p+52, -0x1.0p+52 } }, + { -0x1.1p+52, -0x1.0p+52 } }, + { { .f = { -0x1.fp+51, -0x1.ep+52 } }, + { -0x1.0p+52, -0x1.ep+52 } }, + + { { .f = { -0x1.3p+51, -0x1.2p+51 } }, + { -0x1.4p+51, -0x1.2p+51 } }, + { { .f = { -0x1.1p+51, -0x1.0p+51 } }, + { -0x1.2p+51, -0x1.0p+51 } }, + { { .f = { -0x1.fp+50, -0x1.ep+50 } }, + { -0x1.0p+51, -0x1.0p+51 } }, + { { .f = { -0x1.dp+50, -0x1.cp+50 } }, + { -0x1.0p+51, -0x1.cp+50 } }, + + { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } }, + { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } } +}; + +#include "sse4_1-round.h" diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c new file mode 100644 index ..17ff35a7360f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128 +#define FP_T float + +#define ROUND_INTRIN(x, mode) _mm_floor_ps (x) + +#include "sse4_1-round-data.h" + +static struct data data[] = { + { { .f = { 0.00, 0.25, 0.50, 0.75 } }, { 0.0, 0.0, 0.0, 0.0 } }, + + { { .f = { 0x1.f8p+21, 0x1.fap+21, 0x1.fcp+21, 0x1.fep+21 } }, + { 0x1.f8p+21, 0x1.f8p+21, 0x1.f8p+21, 0x1.f8p+21 } }, + + { { .f = { 0x1.fap+22,
Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "floor" intrinsics
Hi Paul, On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote: 2021-07-06 Paul A. Clarke gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): New. --- gcc/config/rs6000/smmintrin.h | 28 1 file changed, 28 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 0c0b0dd7c1e3..f484a7fd029f 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -240,4 +240,32 @@ _mm_ceil_ss (__m128 __A, __m128 __B) return r; } +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Usual fuss about line length. LGTM with that fixed here and below. I can't approve, but recommend approval with those changes. Thanks, Bill +_mm_floor_pd (__m128d __A) +{ + return (__m128d) vec_floor ((__v2df) __A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_floor_ps (__m128 __A) +{ + return (__m128) vec_floor ((__v4sf) __A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_floor_sd (__m128d __A, __m128d __B) +{ + __v2df r = vec_floor ((__v2df) __B); + r[1] = ((__v2df) __A)[1]; + return (__m128d) r; +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_floor_ss (__m128 __A, __m128 __B) +{ + __v4sf r = (__v4sf) __A; + r[0] = __builtin_floor (((__v4sf) __B)[0]); + return r; +} + #endif
Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "ceil" intrinsics
Hi Paul, On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote: Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss. Copy a test for _mm_ceil_pd and _mm_ceil_ps from gcc/testsuite/gcc.target/i386. Define __VSX_SSE2__ to pick up some union definitons in typo ("definitions"). m128-check.h. 2021-07-01 Paul A. Clarke gcc/testsuite/ChangeLog: "gcc/testsuite/" will make the tools happy. * gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c: New. All of these should be relative to gcc/testsuite/, so * gcc.target/powerpc/sse4_1-ceilpd.c: New. and similar. * gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-round.h: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h: New. * gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc/testsuite/gcc.target/i386. * gcc/testsuite/gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define. --- .../gcc.target/powerpc/sse4_1-ceilpd.c| 51 .../gcc.target/powerpc/sse4_1-ceilps.c| 33 + .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++ .../gcc.target/powerpc/sse4_1-ceilss.c| 95 ++ .../gcc.target/powerpc/sse4_1-check.h | 4 + .../gcc.target/powerpc/sse4_1-round-data.h| 20 +++ .../gcc.target/powerpc/sse4_1-round.h | 27 .../gcc.target/powerpc/sse4_1-round2.h| 27 .../gcc.target/powerpc/sse4_1-roundpd-3.c | 36 ++ 9 files changed, 412 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c new file mode 100644 index ..f532fdb9c285 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128d +#define FP_T double + +#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x) + +#include "sse4_1-round-data.h" + +static struct data data[] = { + { .value = { .f = { 0.00, 0.25 } }, .answer = { 0.0, 1.0 } }, + { .value = { .f = { 0.50, 0.75 } }, .answer = { 1.0, 1.0 } }, + + { { .f = { 0x1.cp+50, 0x1.dp+50 } }, + { 0x1.cp+50, 0x1.0p+51 } }, + { { .f = { 0x1.ep+50, 0x1.fp+50 } }, + { 0x1.0p+51, 0x1.0p+51 } }, + { { .f = { 0x1.0p+51, 0x1.1p+51 } }, + { 0x1.0p+51, 0x1.2p+51 } }, + { { .f = { 0x1.2p+51, 0x1.3p+51 } }, + { 0x1.2p+51, 0x1.4p+51 } }, + + { { .f = { 0x1.ep+51, 0x1.fp+51 } }, + { 0x1.ep+51, 0x1.0p+52 } }, + { { .f = { 0x1.0p+52, 0x1.1p+52 } }, + { 0x1.0p+52, 0x1.1p+52 } }, + + { { .f = { -0x1.1p+52, -0x1.0p+52 } }, + { -0x1.1p+52, -0x1.0p+52 } }, + { { .f = { -0x1.fp+51, -0x1.ep+51 } }, + { -0x1.ep+51, -0x1.ep+51 } }, + + { { .f = { -0x1.3p+51, -0x1.2p+51 } }, + { -0x1.2p+51, -0x1.2p+51 } }, + { { .f = { -0x1.1p+51, -0x1.0p+51 } }, + { -0x1.0p+51, -0x1.0p+51 } }, + { { .f = { -0x1.fp+50, -0x1.ep+50 } }, + { -0x1.cp+50, -0x1.cp+50 } }, + { { .f = { -0x1.dp+50, -0x1.cp+50 } }, + { -0x1.cp+50, -0x1.cp+50 } }, + + { { .f = { -1.00, -0.75 } }, { -1.0, 0.0 } }, + { { .f = { -0.50, -0.25 } }, { 0.0, 0.0 } } +}; + +#include "sse4_1-round.h" diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c new file mode 100644 index ..417ac76d8aa9 ---
Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "ceil" intrinsics
Hi Paul, On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote: 2021-07-01 Paul A. Clarke gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss): New. --- gcc/config/rs6000/smmintrin.h | 28 1 file changed, 28 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index fa17a8b2f478..0c0b0dd7c1e3 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -212,4 +212,32 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) return any_ones * any_zeros; } +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Usual fuss over line length, here and below. Otherwise LGTM, can't approve but recommend approval with those changes. Thanks, Bill +_mm_ceil_pd (__m128d __A) +{ + return (__m128d) vec_ceil ((__v2df) __A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ceil_ps (__m128 __A) +{ + return (__m128) vec_ceil ((__v4sf) __A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ceil_sd (__m128d __A, __m128d __B) +{ + __v2df r = vec_ceil ((__v2df) __B); + r[1] = ((__v2df) __A)[1]; + return (__m128d) r; +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ceil_ss (__m128 __A, __m128 __B) +{ + __v4sf r = (__v4sf) __A; + r[0] = __builtin_ceil (((__v4sf) __B)[0]); + return r; +} + #endif
Re: [committed] input.c: move file caching globals to a new file_cache class
Hi David- I thought this might be a good opportunity to ask about the patch that supports -finput-charset in diagnostic.c please? https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564527.html The patch will require some work to adapt to the new changes below. I am happy to do that, but thought I should check first whether you have any interest in this approach? Thanks! -Lewis On Thu, Jul 1, 2021 at 5:52 PM David Malcolm via Gcc-patches wrote: > > This moves some global state from input.c to a new file_cache class, > of which an instance is owned by global_dc. Various state is also > made private. > > No functional change intended. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > Pushed to trunk as b544c348e13ad33d55f0d954370ab1fb0f2bf683. > > gcc/ChangeLog: > * diagnostic.h (diagnostic_context::m_file_cache): New field. > * input.c (class fcache): Rename to... > (class file_cache_slot): ...this, making most members private and > prefixing fields with "m_". > (file_cache_slot::get_file_path): New accessor. > (file_cache_slot::get_use_count): New accessor. > (file_cache_slot::missing_trailing_newline_p): New accessor. > (file_cache_slot::inc_use_count): New. > (fcache_buffer_size): Move to... > (file_cache_slot::buffer_size): ...here. > (fcache_line_record_size): Move to... > (file_cache_slot::line_record_size): ...here. > (fcache_tab): Delete, in favor of global_dc->m_file_cache. > (fcache_tab_size): Move to file_cache::num_file_slots. > (diagnostic_file_cache_init): Update for move of fcache_tab > to global_dc->m_file_cache. > (diagnostic_file_cache_fini): Likewise. > (lookup_file_in_cache_tab): Convert to... > (file_cache::lookup_file): ...this. > (diagnostics_file_cache_forcibly_evict_file): Update for move of > fcache_tab to global_dc->m_file_cache, moving most of > implementation to... > (file_cache::forcibly_evict_file): ...this new function and... > (file_cache_slot::evict): ...this new function. > (evicted_cache_tab_entry): Convert to... > (file_cache::evicted_cache_tab_entry): ...this. > (add_file_to_cache_tab): Convert to... > (file_cache::add_file): ...this, moving bulk of implementation > to... > (file_cache_slot::create): ..this new function. > (file_cache::file_cache): New. > (file_cache::~file_cache): New. > (lookup_or_add_file_to_cache_tab): Convert to... > (file_cache::lookup_or_add_file): ..this new function. > (fcache::fcache): Rename to... > (file_cache_slot::file_cache_slot): ...this, adding "m_" prefixes > to fields. > (fcache::~fcache): Rename to... > (file_cache_slot::~file_cache_slot): ...this, adding "m_" prefixes > to fields. > (needs_read): Convert to... > (file_cache_slot::needs_read_p): ...this. > (needs_grow): Convert to... > (file_cache_slot::needs_grow_p): ...this. > (maybe_grow): Convert to... > (file_cache_slot::maybe_grow): ...this. > (read_data): Convert to... > (file_cache_slot::read_data): ...this. > (maybe_read_data): Convert to... > (file_cache_slot::maybe_read_data): ...this. > (get_next_line): Convert to... > (file_cache_slot::get_next_line): ...this. > (goto_next_line): Convert to... > (file_cache_slot::goto_next_line): ...this. > (read_line_num): Convert to... > (file_cache_slot::read_line_num): ...this. > (location_get_source_line): Update for moving of globals to > global_dc->m_file_cache. > (location_missing_trailing_newline): Likewise. > * input.h (class file_cache_slot): New forward decl. > (class file_cache): New. > > Signed-off-by: David Malcolm > --- > gcc/diagnostic.h | 3 + > gcc/input.c | 459 +++ > gcc/input.h | 33 > 3 files changed, 301 insertions(+), 194 deletions(-) > > diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h > index 1b9d6b1f64d..086bc4f903f 100644 > --- a/gcc/diagnostic.h > +++ b/gcc/diagnostic.h > @@ -136,6 +136,9 @@ struct diagnostic_context >/* Where most of the diagnostic formatting work is done. */ >pretty_printer *printer; > > + /* Cache of source code. */ > + file_cache *m_file_cache; > + >/* The number of times we have issued diagnostics. */ >int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND]; > > diff --git a/gcc/input.c b/gcc/input.c > index 9e39e7df83c..de20d983d2c 100644 > --- a/gcc/input.c > +++ b/gcc/input.c > @@ -32,9 +32,29 @@ along with GCC; see the file COPYING3. If not see > > /* This is a cache used by get_next_line to store the content of a > file to be searched for file lines. */ > -class fcache > +class file_cache_slot
Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
On 7/11/21 11:17 AM, Bill Schmidt wrote: Hi Paul, On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote: _mm_blend_epi16 and _mm_blendv_epi8 were added earlier. Add these four to complete the set. 2021-06-29 Paul A. Clarke gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd, _mm_blend_ps, _mm_blendv_ps): New. --- gcc/config/rs6000/smmintrin.h | 46 +++ 1 file changed, 46 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 1b8cad135ed0..fa17a8b2f478 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); } +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Usual line length complaint. :) Here and below... +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8) +{ + const signed char __tmp = (__imm8 & 0b10) * 0b0000 | + (__imm8 & 0b01) * 0b; + __v16qi __charmask = vec_splats ((signed char) __tmp); + __charmask = vec_gb (__charmask); + __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask); + #ifdef __BIG_ENDIAN__ + __shortmask = vec_reve (__shortmask); + #endif + return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask); This seems way too complex, and needs commentary to explain what you're doing. Doesn't this instruction just translate into some form of xxpermdi? Different ones for BE and LE, but still just xxpermdi, I think. +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) +{ + const __v2di __zero = {0}; + const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero); + return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask); +} Okay. + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8) +{ + const signed char __mask = (__imm8 & 0b1000) * 0b00011000 | + (__imm8 & 0b0100) * 0b1100 | + (__imm8 & 0b0010) * 0b0110 | + (__imm8 & 0b0001) * 0b0011; + __v16qi __charmask = vec_splats ( __mask); + __charmask = vec_gb (__charmask); + __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask); This is a good trick, but you need comments to explain what you're doing, including how you build __mask. I recommend you include alternate code for P10, where you can just use vec_genwm to expand from __mask to a mask of word elements. I don't understand how you're getting away with a v8hu mask for word elements. This seems wrong to me. Adequate testing? As an alternate approach, I suppose you could use vec_perm / vec_permr with one of sixteen possible masks, which would seem faster than the splat/gather/unpack/select approach. Something to consider. Bill + #ifdef __BIG_ENDIAN__ + __shortmask = vec_reve (__shortmask); + #endif + return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask) +{ + const __v4si __zero = {0}; + const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero); + return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask); +} + Okay. Please have a look at the above issues and resubmit. Thanks! Bill extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_testz_si128 (__m128i __A, __m128i __B) {
Re: [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics
Hi Paul, Please resubmit this when you resubmit 3/4, in case any adjustments are needed. Thanks! Bill On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote: Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps, _mm_blendv_ps from gcc/testsuite/gcc.target/i386. 2021-06-29 Paul A. Clarke gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc/testsuite/gcc.target/i386. * gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c: Likewise. * gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c: Likewise. * gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c: Likewise. --- .../gcc.target/powerpc/sse4_1-blendpd.c | 89 ++ .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 + .../gcc.target/powerpc/sse4_1-blendps.c | 90 +++ .../gcc.target/powerpc/sse4_1-blendvpd.c | 65 ++ 4 files changed, 325 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c new file mode 100644 index ..ca1780471fa2 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c @@ -0,0 +1,89 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include +#include + +#define NUM 20 + +#ifndef MASK +#define MASK 0x03 +#endif + +static void +init_blendpd (double *src1, double *src2) +{ + int i, sign = 1; + + for (i = 0; i < NUM * 2; i++) +{ + src1[i] = i * i * sign; + src2[i] = (i + 20) * sign; + sign = -sign; +} +} + +static int +check_blendpd (__m128d *dst, double *src1, double *src2) +{ + double tmp[2]; + int j; + + memcpy ([0], src1, sizeof (tmp)); + + for(j = 0; j < 2; j++) +if ((MASK & (1 << j))) + tmp[j] = src2[j]; + + return memcmp (dst, [0], sizeof (tmp)); +} + +static void +TEST (void) +{ + __m128d x, y; + union +{ + __m128d x[NUM]; + double d[NUM * 2]; +} dst, src1, src2; + union +{ + __m128d x; + double d[2]; +} src3; + int i; + + init_blendpd (src1.d, src2.d); + + /* Check blendpd imm8, m128, xmm */ + for (i = 0; i < NUM; i++) +{ + dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK); + if (check_blendpd ([i], [i * 2], [i * 2])) + abort (); +} + + /* Check blendpd imm8, xmm, xmm */ + src3.x = _mm_setzero_pd (); + + x = _mm_blend_pd (dst.x[2], src3.x, MASK); + y = _mm_blend_pd (src3.x, dst.x[2], MASK); + + if (check_blendpd (, [4], [0])) +abort (); + + if (check_blendpd (, [0], [4])) +abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c new file mode 100644 index ..768b6e64bbae --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c @@ -0,0 +1,81 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#include "sse4_1-check.h" + +#include +#include +#include + +#define NUM 20 + +#undef MASK +#define MASK 0xe + +static void +init_blendps (float *src1, float *src2) +{ + int i, sign = 1; + + for (i = 0; i < NUM * 4; i++) +{ + src1[i] = i * i * sign; + src2[i] = (i + 20) * sign; + sign = -sign; +} +} + +static int +check_blendps (__m128 *dst, float *src1, float *src2) +{ + float tmp[4]; + int j; + + memcpy ([0], src1, sizeof (tmp)); + for (j = 0; j < 4; j++) +if ((MASK & (1 << j))) + tmp[j] = src2[j]; + + return memcmp (dst, [0], sizeof (tmp)); +} + +static void +sse4_1_test (void) +{ + __m128 x, y; + union +{ + __m128 x[NUM]; + float f[NUM * 4]; +} dst, src1, src2; + union +{ + __m128 x; + float f[4]; +} src3; + int i; + + init_blendps (src1.f, src2.f); + + for (i = 0; i < 4; i++) +src3.f[i] = (int) rand (); + + /* Check blendps imm8, m128, xmm */ + for (i = 0; i < NUM; i++) +{ + dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK); + if (check_blendps ([i], [i * 4], [i * 4])) + abort (); +} + + /* Check blendps imm8, xmm, xmm */ + x = _mm_blend_ps (dst.x[2], src3.x, MASK); + y = _mm_blend_ps (src3.x, dst.x[2], MASK); + + if (check_blendps (, [8], [0])) +abort (); + + if (check_blendps (, [0], [8])) +abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c new file mode 100644 index
Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
Hi Paul, On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote: _mm_blend_epi16 and _mm_blendv_epi8 were added earlier. Add these four to complete the set. 2021-06-29 Paul A. Clarke gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd, _mm_blend_ps, _mm_blendv_ps): New. --- gcc/config/rs6000/smmintrin.h | 46 +++ 1 file changed, 46 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 1b8cad135ed0..fa17a8b2f478 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); } +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Usual line length complaint. :) Here and below... +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8) +{ + const signed char __tmp = (__imm8 & 0b10) * 0b0000 | + (__imm8 & 0b01) * 0b; + __v16qi __charmask = vec_splats ((signed char) __tmp); + __charmask = vec_gb (__charmask); + __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask); + #ifdef __BIG_ENDIAN__ + __shortmask = vec_reve (__shortmask); + #endif + return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask); This seems way too complex, and needs commentary to explain what you're doing. Doesn't this instruction just translate into some form of xxpermdi? Different ones for BE and LE, but still just xxpermdi, I think. +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) +{ + const __v2di __zero = {0}; + const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero); + return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask); +} Okay. + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8) +{ + const signed char __mask = (__imm8 & 0b1000) * 0b00011000 | +(__imm8 & 0b0100) * 0b1100 | +(__imm8 & 0b0010) * 0b0110 | +(__imm8 & 0b0001) * 0b0011; + __v16qi __charmask = vec_splats ( __mask); + __charmask = vec_gb (__charmask); + __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask); This is a good trick, but you need comments to explain what you're doing, including how you build __mask. I recommend you include alternate code for P10, where you can just use vec_genwm to expand from __mask to a mask of word elements. I don't understand how you're getting away with a v8hu mask for word elements. This seems wrong to me. Adequate testing? + #ifdef __BIG_ENDIAN__ + __shortmask = vec_reve (__shortmask); + #endif + return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask) +{ + const __v4si __zero = {0}; + const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero); + return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask); +} + Okay. Please have a look at the above issues and resubmit. Thanks! Bill extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_testz_si128 (__m128i __A, __m128i __B) {
Re: [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics
Hi Paul, LGTM. I can't approve, but recommend approval as is. Thanks, Bill On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote: Copy the test for _mm_testz_si128, _mm_testc_si128, _mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros, _mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386. 2021-06-29 Paul A. Clarke gcc/testsuite/ChangeLog: * gcc.target/powerpc/sse4_1-ptest.c: Copy from gcc/testsuite/gcc.target/i386. --- .../gcc.target/powerpc/sse4_1-ptest-1.c | 117 ++ 1 file changed, 117 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c new file mode 100644 index ..69d13d57770d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c @@ -0,0 +1,117 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +static int +make_ptestz (__m128i m, __m128i v) +{ + union +{ + __m128i x; + unsigned char c[16]; +} val, mask; + int i, z; + + mask.x = m; + val.x = v; + + z = 1; + for (i = 0; i < 16; i++) +if ((mask.c[i] & val.c[i])) + { + z = 0; + break; + } + return z; +} + +static int +make_ptestc (__m128i m, __m128i v) +{ + union +{ + __m128i x; + unsigned char c[16]; +} val, mask; + int i, c; + + mask.x = m; + val.x = v; + + c = 1; + for (i = 0; i < 16; i++) +if ((val.c[i] & ~mask.c[i])) + { + c = 0; + break; + } + return c; +} + +static void +TEST (void) +{ + union +{ + __m128i x; + unsigned int i[4]; +} val[4]; + int i, j, l; + int res[32]; + + val[0].i[0] = 0x; + val[0].i[1] = 0x; + val[0].i[2] = 0x; + val[0].i[3] = 0x; + + val[1].i[0] = 0x; + val[1].i[1] = 0x; + val[1].i[2] = 0x; + val[1].i[3] = 0x; + + val[2].i[0] = 0; + val[2].i[1] = 0; + val[2].i[2] = 0; + val[2].i[3] = 0; + + val[3].i[0] = 0x; + val[3].i[1] = 0x; + val[3].i[2] = 0x; + val[3].i[3] = 0x; + + l = 0; + for(i = 0; i < 4; i++) +for(j = 0; j < 4; j++) + { + res[l++] = _mm_testz_si128 (val[j].x, val[i].x); + res[l++] = _mm_testc_si128 (val[j].x, val[i].x); + } + + l = 0; + for(i = 0; i < 4; i++) +for(j = 0; j < 4; j++) + { + if (res[l++] != make_ptestz (val[j].x, val[i].x)) + abort (); + if (res[l++] != make_ptestc (val[j].x, val[i].x)) + abort (); + } + + if (res[2] != _mm_testz_si128 (val[1].x, val[0].x)) +abort (); + + if (res[3] != _mm_testc_si128 (val[1].x, val[0].x)) +abort (); +}
Re: [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics
Hi Paul, On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote: 2021-06-29 Paul A. Clarke gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128, _mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros, _mm_test_mix_ones_zeros): New. --- gcc/config/rs6000/smmintrin.h | 50 +++ 1 file changed, 50 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index bdf6eb365d88..1b8cad135ed0 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); } +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Line too long, please fix here and below. (Existing cases can be left.) +_mm_testz_si128 (__m128i __A, __m128i __B) +{ + /* Note: This implementation does NOT set "zero" or "carry" flags. */ This is reasonable; thanks for documenting. LGTM; I can't approve, but recommend approval with line lengths fixed. Thanks! Bill + const __v16qu __zero = {0}; + return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testc_si128 (__m128i __A, __m128i __B) +{ + /* Note: This implementation does NOT set "zero" or "carry" flags. */ + const __v16qu __zero = {0}; + const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A); + return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testnzc_si128 (__m128i __A, __m128i __B) +{ + /* Note: This implementation does NOT set "zero" or "carry" flags. */ + return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0; +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_test_all_zeros (__m128i __A, __m128i __mask) +{ + const __v16qu __zero = {0}; + return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_test_all_ones (__m128i __A) +{ + const __v16qu __ones = vec_splats ((unsigned char) 0xff); + return vec_all_eq ((__v16qu) __A, __ones); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) +{ + const __v16qu __zero = {0}; + const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask); + const int any_ones = vec_any_ne (__Amasked, __zero); + const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A); + const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask); + const int any_zeros = vec_any_ne (__notAmasked, __zero); + return any_ones * any_zeros; +} + #endif
[PATCH] x86: Don't enable UINTR in 32-bit mode
UINTR is available only in 64-bit mode. Since the codegen target is unknown when the the gcc driver is processing -march=native, to properly handle UINTR for -march=native: 1. Add an undocumented option, -muintr-native. 2. Update the gcc driver to pass -muintr-native with -march=native if UINTR is available. 3. Change ix86_option_override_internal to a. Turn on UINTR in 64-bit mode for -muintr-native. b. Enable UINTR only in 64-bit mode for -march=CPU when PTA_CPU includes PTA_UINTR. gcc/ PR target/101395 * config/i386/driver-i386.c (host_detect_local_cpu): Add -muintr-native for FEATURE_UINTR. * config/i386/i386-options.c (ix86_option_override_internal): Move -muintr check after -march processing and turn on UINTR in 64-bit mode for -muintr-native if it has not been disabled explicitly. (DEF_PTA): Skip PTA_UINTR if not in 64-bit mode. * config/i386/i386.opt (muintr-native): New. Undocumented to enable -muintr only in 64-bit mode with -march=native. gcc/testsuite/ PR target/101395 * gcc.target/i386/pr101395-1.c: New test. * gcc.target/i386/pr101395-2.c: Likewise. * gcc.target/i386/pr101395-3.c: Likewise. --- gcc/config/i386/driver-i386.c | 10 -- gcc/config/i386/i386-options.c | 14 +++--- gcc/config/i386/i386.opt | 4 gcc/testsuite/gcc.target/i386/pr101395-1.c | 12 gcc/testsuite/gcc.target/i386/pr101395-2.c | 22 ++ gcc/testsuite/gcc.target/i386/pr101395-3.c | 6 ++ 6 files changed, 63 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-3.c diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c index dd9236616b4..7ade90a088d 100644 --- a/gcc/config/i386/driver-i386.c +++ b/gcc/config/i386/driver-i386.c @@ -804,8 +804,14 @@ const char *host_detect_local_cpu (int argc, const char **argv) if (isa_names_table[i].option) { if (has_feature (isa_names_table[i].feature)) - options = concat (options, " ", - isa_names_table[i].option, NULL); + { + const char *option; + if (isa_names_table[i].feature == FEATURE_UINTR) + option = "-muintr-native"; + else + option = isa_names_table[i].option; + options = concat (options, " ", option, NULL); + } else options = concat (options, neg_option, isa_names_table[i].option + 2, NULL); diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 7a35c468da3..6b12f030f0c 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -1920,9 +1920,6 @@ ix86_option_override_internal (bool main_args_p, opts->x_ix86_stringop_alg = no_stringop; } - if (TARGET_UINTR && !TARGET_64BIT) -error ("%<-muintr%> not supported for 32-bit code"); - if (!opts->x_ix86_arch_string) opts->x_ix86_arch_string = TARGET_64BIT_P (opts->x_ix86_isa_flags) @@ -2109,6 +2106,7 @@ ix86_option_override_internal (bool main_args_p, #define DEF_PTA(NAME) \ if (((processor_alias_table[i].flags & PTA_ ## NAME) != 0) \ && PTA_ ## NAME != PTA_64BIT \ + && (TARGET_64BIT || PTA_ ## NAME != PTA_UINTR) \ && !TARGET_EXPLICIT_ ## NAME ## _P (opts)) \ SET_TARGET_ ## NAME (opts); #include "i386-isa.def" @@ -2184,6 +2182,16 @@ ix86_option_override_internal (bool main_args_p, XDELETEVEC (s); } + if (TARGET_64BIT) +{ + /* Turn on UINTR in 64-bit mode for -muintr-native if it has not +been set explicitly. */ + if (ix86_uintr_native && !TARGET_EXPLICIT_UINTR_P (opts)) + SET_TARGET_UINTR (opts); +} + else if (TARGET_UINTR) +error ("%<-muintr%> not supported for 32-bit code"); + ix86_arch_mask = HOST_WIDE_INT_1U << ix86_arch; for (i = 0; i < X86_ARCH_LAST; ++i) ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & ix86_arch_mask); diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 7b8547bb1c3..5923bbfb40e 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -802,6 +802,10 @@ muintr Target Mask(ISA2_UINTR) Var(ix86_isa_flags2) Save Support UINTR built-in functions and code generation. +; Used to enable -muintr only in 64-bit mode with -march=native. +muintr-native +Target Undocumented RejectNegative Var(ix86_uintr_native) + msgx Target Mask(ISA2_SGX) Var(ix86_isa_flags2) Save Support SGX built-in functions and code generation. diff --git a/gcc/testsuite/gcc.target/i386/pr101395-1.c b/gcc/testsuite/gcc.target/i386/pr101395-1.c new file mode
[PATCH V2] coroutines: Adjust outlined function names [PR95520].
Hi Jason, > On 9 Jul 2021, at 22:40, Jason Merrill wrote: > > On 7/9/21 2:18 PM, Iain Sandoe wrote: > How about handling this in write_encoding, along the lines of the > devel/c++-contracts branch? OK, so I took a look at this and implemented as below. Some small differences from your contracts impl described here. recalling the original function becomes the ramp - it is called directly by the user-code. the resumer (actor) contains the outlined code wrapped in synthesized logic as dictated by the std the destroy function effectively calls the actor with a flag that says “take the DTOR path” (since the DTOR path has to be available in the case of resume too). this means that is is possible for the actor to be partially (or completely for a generator-style coro) inlined into either the ramp or the destroyer. 1. using DECL_ABSTRACT_ORIGIN didn’t work with optimisation and debug since the inlining of the outlining confuses the issue (the actor/destory helpers are not real clones). - there hasn’t been any specific reason to know “which” coroutine function was being lowered in the middle or back ends to date - so I had to add some book-keeping to allow that to be queried from write_encoding. 2. I had to cater for lambda coroutines; that meant recognising that we have a lambda coro helper and picking up the base mangling for the ramp (original lambda) 3. I made a minor adjustment to the string handling so that it can account for targets that don’t support ‘.’ or ‘$’ in symbols. > Speaking of which, I wonder if you also want to do something similar to what > I did there to put the ramp/actor/destroyer functions into into the same > comdat group. I looked through your code and agree that it should be possible to be more restrictive about the interfaces presented by the actor and destroy functions in coros. The ramp obviously has to keep the visiblity with which the user wrote it. As for comdat groups, I’d need to look harder. please could these things be TODOs - the fix for 95520 doesn’t make them any worse (or better), and there are other bugs that are higher priority. tested on x86_64-Linux,Darwin and powerpc64-linux, also with cppcoro (but i would plan to test it on folly too before pushing) OK for master / backports? = The mechanism used to date for uniquing the coroutine helper functions (actor, destroy) was over-complicating things and leading to the noted PR and also difficulties in setting breakpoints on these functions (so this will help PR99215 as well). This implementation delegates the adjustment to the mangling to write_encoding() which necessitates some book-keeping so that it is possible to determine which of the coroutine helper names is to be mangled. Signed-off-by: Iain Sandoe PR c++/95520 - [coroutines] __builtin_FUNCTION() returns mangled .actor instead of original function name PR c++/95520 gcc/cp/ChangeLog: * coroutines.cc (struct coroutine_info): Add fields for actor and destroy function decls. (to_ramp): New. (coro_get_ramp_function): New. (coro_get_actor_function): New. (coro_get_destroy_function): New. (act_des_fn): Set up mapping between ramp, actor and destroy functions. (morph_fn_to_coro): Adjust interface to the builder for helper function decls. * cp-tree.h (DECL_ACTOR_FN, DECL_DESTROY_FN, DECL_RAMP_FN DECL_IS_CORO_ACTOR_P, DECL_IS_CORO_DESTROY_P JOIN_STR): New. * mangle.c (write_encoding): Handle coroutine helpers. (write_unqualified_name): Handle lambda coroutine helpers. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr95520.C: New test. --- gcc/cp/coroutines.cc | 87 +++ gcc/cp/cp-tree.h | 30 gcc/cp/mangle.c | 18 - gcc/testsuite/g++.dg/coroutines/pr95520.C | 29 4 files changed, 150 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95520.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 54ffdc8d062..a75f55427cb 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -82,11 +82,13 @@ static bool coro_promise_type_found_p (tree, location_t); struct GTY((for_user)) coroutine_info { tree function_decl; /* The original function decl. */ - tree promise_type; /* The cached promise type for this function. */ - tree handle_type; /* The cached coroutine handle for this function. */ - tree self_h_proxy; /* A handle instance that is used as the proxy for the - one that will eventually be allocated in the coroutine - frame. */ + tree actor_decl;/* The synthesized actor function. */ + tree destroy_decl; /* The synthesized destroy function. */ + tree promise_type; /* The cached promise type for this function. */ + tree handle_type; /* The cached coroutine handle
[PATCH] PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C
My sincere apologies for the breakage. My recent patch to fold bswapN(x)>>C where the constant C was large enough that the result only contains bits from the low byte, and can therefore avoid the byte swap contains a minor logic error. The pattern contains a convert? allowing an extension to occur between the bswap and the shift. The logic is correct if there's no extension, or the extension has the same sign as the shift, but I'd mistakenly convinced myself that these couldn't have different signedness. (T)bswap16(x)>>12 is (T)((unsigned char)x>>4) or (T)((signed char)x>>4). The bug is that for zero-extensions to signed type T, we need to use the unsigned char variant [the signedness of the byte shift is not (always) the same as the signedness of T and the original shift]. Then because I'm now paranoid, I've also added a clause to handle the hypothetical (but in practice impossible) sign-extension to an unsigned type T, which can implemented as (T)(x<<8)>>12. This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and "make -k check" with no new failures, and a new testcase to confirm it fixes the regression. Ok for mainline? 2021-07-11 Roger Sayle gcc/ChangeLog PR tree-optimization/101403 * gcc/match.pd ((T)bswap(X)>>C): Correctly handle cases where signedness of the shift is not the same as the signedness of the type extension. gcc/testsuite/ChangeLog PR tree-optimization/101403 * gcc.dg/pr101403.c: New test case. Sorry again, Roger -- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/match.pd b/gcc/match.pd index 30680d4..beb8d27 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3659,19 +3659,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { unsigned HOST_WIDE_INT prec = TYPE_PRECISION (TREE_TYPE (@2)); unsigned HOST_WIDE_INT bits = tree_to_uhwi (@1); + /* If the bswap was extended before the original shift, this +byte (shift) has the sign of the extension, not the sign of +the original shift. */ + tree st = TYPE_PRECISION (type) > prec ? TREE_TYPE (@2) : type; } - (if (bits + 8 == prec) - (if (TYPE_UNSIGNED (type)) - (convert (convert:unsigned_char_type_node @0)) - (convert (convert:signed_char_type_node @0))) - (if (bits < prec && bits + 8 > prec) - (with - { -tree nst = build_int_cst (integer_type_node, bits & 7); -tree bt = TYPE_UNSIGNED (type) ? unsigned_char_type_node - : signed_char_type_node; - } - (convert (rshift:bt (convert:bt @0) {nst;} + /* Special case: logical right shift of sign-extended bswap. + (unsigned)(short)bswap16(x)>>12 is (unsigned)((short)x<<8)>>12. */ + (if (TYPE_PRECISION (type) > prec + && !TYPE_UNSIGNED (TREE_TYPE (@2)) + && TYPE_UNSIGNED (type) + && bits < prec && bits + 8 >= prec) + (with { tree nst = build_int_cst (integer_type_node, prec - 8); } + (rshift (convert (lshift:st (convert:st @0) {nst;})) @1)) + (if (bits + 8 == prec) + (if (TYPE_UNSIGNED (st)) + (convert (convert:unsigned_char_type_node @0)) + (convert (convert:signed_char_type_node @0))) + (if (bits < prec && bits + 8 > prec) + (with +{ + tree nst = build_int_cst (integer_type_node, bits & 7); + tree bt = TYPE_UNSIGNED (st) ? unsigned_char_type_node + : signed_char_type_node; +} +(convert (rshift:bt (convert:bt @0) {nst;}) /* bswap(x) & C1 can sometimes be simplified to (x >> C2) & C1. */ (simplify (bit_and (convert? (bswap@2 @0)) INTEGER_CST@1) /* { dg-do run } */ /* { dg-options "-O2" } */ unsigned int foo (unsigned int a) { unsigned int u; unsigned short b = __builtin_bswap16 (a); return b >> (u, 12); } int main (void) { unsigned int x = foo (0x80); if (x != 0x0008) __builtin_abort (); return 0; }