Re: [PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-11 Thread Fangrui Song

On 2021-07-12, Kito Cheng wrote:

It was undocumented before, but it may be used in the Linux kernel to resolve
a code model issue, so the LLVM community suggested that we document it,
making it a supported, documented, non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraint
for RISC-V.
---
gcc/config/riscv/constraints.md | 3 +--
gcc/doc/md.texi | 3 +++
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
   (match_test "GET_CODE(XEXP(op,0)) == REG")))

(define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
  (match_operand 0 "absolute_symbolic_operand"))

(define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
@item A
An address that is held in a general-purpose register.

+@item S
+A constraint that matches an absolute symbolic address.
+
@end table

@item RX---@file{config/rx/constraints.md}
--
2.31.1


LGTM


[PATCH v2] docs: Add 'S' to Machine Constraints for RISC-V

2021-07-11 Thread Kito Cheng
It was undocumented before, but it may be used in the Linux kernel to resolve
a code model issue, so the LLVM community suggested that we document it,
making it a supported, documented, non-internal machine constraint.

gcc/ChangeLog:

PR target/101275
* config/riscv/constraints.md ("S"): Update description and remove
@internal.
* doc/md.texi (Machine Constraints): Document the 'S' constraint
for RISC-V.
---
 gcc/config/riscv/constraints.md | 3 +--
 gcc/doc/md.texi | 3 +++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index 8c15c6c0486..c87d5b796a5 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -67,8 +67,7 @@ (define_memory_constraint "A"
(match_test "GET_CODE(XEXP(op,0)) == REG")))
 
 (define_constraint "S"
-  "@internal
-   A constant call address."
+  "A constraint that matches an absolute symbolic address."
   (match_operand 0 "absolute_symbolic_operand"))
 
 (define_constraint "U"
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 00caf3844cc..2d120da96cf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3536,6 +3536,9 @@ A 5-bit unsigned immediate for CSR access instructions.
 @item A
 An address that is held in a general-purpose register.
 
+@item S
+A constraint that matches an absolute symbolic address.
+
 @end table
 
 @item RX---@file{config/rx/constraints.md}
-- 
2.31.1
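
For illustration, the newly documented constraint might be used from inline
assembly roughly as follows. This is a minimal sketch and is not taken from
the patch or from the kernel: the variable `counter`, the helper
`address_of_counter`, and the use of the `lla` pseudo-instruction are
assumptions, and the non-RISC-V branch exists only so that the snippet builds
on other targets. Whether the operand is accepted may also depend on the code
model and on options such as -fPIC.

```c
/* Hypothetical global whose address is passed as a symbolic operand.  */
int counter;

/* Return the address of 'counter'.  On RISC-V the address is handed to
   the asm through the 'S' constraint (an absolute symbolic address) and
   materialized with 'lla'; on other targets the sketch falls back to
   plain C so that it still compiles.  */
static int *
address_of_counter (void)
{
#if defined (__riscv)
  int *p;
  __asm__ ("lla %0, %1" : "=r" (p) : "S" (&counter));
  return p;
#else
  return &counter;
#endif
}
```

A caller would use `address_of_counter ()` wherever it would otherwise write
`&counter`.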



Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-07-11 Thread Martin Liška

PING^1

On 6/28/21 2:19 PM, Martin Liška wrote:

On 6/24/21 12:46 AM, Segher Boessenkool wrote:

Hi!

On Wed, Jun 23, 2021 at 03:22:34PM +0200, Martin Liška wrote:

As mentioned in the "Fallout: save/restore target options in
handle_optimize_attribute"
thread, we need to support target option restore of
rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.


I have no idea?  Could you explain please?


Sure. A few weeks ago, we started using cl_target_option_{save,restore} calls
even for optimize attributes (and pragmas). The motivation was that optimize
options can influence target options (and vice versa).

Doing that, FLOAT_PRECISION_TFmode must be accepted as a valid option value
for rs6000_long_double_type_size.




--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
    else
  rs6000_long_double_type_size = default_long_double_size;
  }
+  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
+    ; /* The option can be restored a TREE_TARGET_OPTION.  */


What does that mean?  It is not grammatical, and not obvious what it
should mean.


Updated.




--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc*-*-linux* } } } */


Why on Linux only?  That doesn't sound right.  Do you need some other
selector(s)?


Sorry, I copied the test-case.




+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
+
+extern unsigned long int x;
+extern float f (float);
+extern __typeof (f) f_power8;
+extern __typeof (f) f_power9;
+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *


-fno-stack-protector is default.


Yes, but one needs an optimize attribute in order to trigger the
cl_target_option_save/restore mechanism.

Martin




+f_ifunc (void)
+{
+  __typeof (f) *res = x ? f_power9 : f_power8;
+  return res;
+}


The testcase should say what it is testing for, it is not obvious?


Segher







Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-07-11 Thread Xionghu Luo via Gcc-patches



On 2021/7/10 02:40, will schmidt wrote:
> On Wed, 2021-06-30 at 09:44 +0800, Xionghu Luo via Gcc-patches wrote:
>> Gentle ping ^2, thanks.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html
>>
>>
>> On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:
>>> Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%,
>>> 526.blender_r +1.72%, no obvious changes to others.
> 
> Ok.
> 
>>>
>>>
>>> On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:
 Gentle ping, thanks.


 On 2021/4/16 15:10, Xiong Hu Luo wrote:
> fmod/fmodf and remainder/remainderf could be expanded instead of library
> call when fast-math build, which is much faster.
>
> fmodf:
>     fdivs   f0,f1,f2
>     friz    f0,f0
>     fnmsubs f1,f2,f0,f1
>
> remainderf:
>     fdivs   f0,f1,f2
>     frin    f0,f0
>     fnmsubs f1,f2,f0,f1
>
> gcc/ChangeLog:
>
> 2021-04-16  Xionghu Luo  
>
>  PR target/97142
> 
> That PR is "Bug 97142 - __builtin_fmod not optimized on POWER".
> 
> OK.
> 
> 
>  * config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
>  (remainder<mode>3): Likewise.
> 
> 
>
> gcc/testsuite/ChangeLog:
>
> 2021-04-16  Xionghu Luo  
>
>  PR target/97142
>  * gcc.target/powerpc/pr97142.c: New test.
> 
> Ok.
> 
> ---
>gcc/config/rs6000/rs6000.md| 36 ++
>gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++
>2 files changed, 66 insertions(+)
>create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c
>
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a1315523fec..7e0e94e6ba4 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4902,6 +4902,42 @@ (define_insn "fre"
>  [(set_attr "type" "fp")
>   (set_attr "isa" "*,")])
> +(define_expand "fmod<mode>3"
> +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +  && TARGET_FPRND
> +  && flag_unsafe_math_optimizations"
> +{
> +  rtx div = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
> +
> +  rtx friz = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_btrunc<mode>2 (friz, div));
> +
> +  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], friz, operands[1]));
> +  DONE;
> + })
> +
> +(define_expand "remainder<mode>3"
> +  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +  && TARGET_FPRND
> +  && flag_unsafe_math_optimizations"
> +{
> +  rtx div = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
> +
> +  rtx frin = gen_reg_rtx (<MODE>mode);
> +  emit_insn (gen_round<mode>2 (frin, div));
> +
> +  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], frin, operands[1]));
> +  DONE;
> + })
> 
> I notice the pattern of arguments to the final emit
> is op[0],op[2],fri*,op[1]
> while the description comment suggests the generated instruction
> will be fnmsubs  f1,f2,f0,f1  ;
> 
> I don't see any rearranging in the nfms4 expansions, but
> presumably this is correct and just a cosmetic nit that catches my eye.


From the ISA:

fnmsub FRT,FRA,FRC,FRB

The operation
FRT ← - ( [(FRA) × (FRC)] - (FRB) )
is performed.

 fmodf:
   fdivs   f0,f1,f2
   friz    f0,f0
   fnmsubs f1,f2,f0,f1

So the assembly computes:

f1 = - (f2 * f0 - f1) = - (f2 * trunc (f1 / f2) - f1)

So f1 is set with the mod result.
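
The same arithmetic can be modelled in plain C. This is only a sketch of the
three-instruction sequences above, not code from the patch: the helper names
are hypothetical, and the integer conversions stand in for friz and frin under
the assumption that the quotient fits in a long long (this also keeps the
sketch free of libm). It makes the operand order of fnmsubs explicit: the
result is -(FRA * FRC - FRB) with FRA = y, FRC = q and FRB = x.

```c
/* Hypothetical model of the fmodf expansion: fdivs, friz, fnmsubs.  */
static float
fmodf_model (float x, float y)
{
  /* fdivs + friz: quotient rounded toward zero.  The C conversion to an
     integer type truncates, which matches for in-range values.  */
  float q = (float) (long long) (x / y);
  /* fnmsubs f1,f2,f0,f1: -(y * q - x).  */
  return -(y * q - x);
}

/* Hypothetical model of the remainderf expansion: fdivs, frin, fnmsubs.  */
static float
remainderf_model (float x, float y)
{
  float d = x / y;
  /* fdivs + frin: quotient rounded to nearest, ties away from zero.  */
  float q = (float) (long long) (d >= 0.0f ? d + 0.5f : d - 0.5f);
  return -(y * q - x);
}
```

For example, `fmodf_model (7.5f, 2.0f)` gives 1.5 and
`remainderf_model (7.5f, 2.0f)` gives -0.5. Halfway quotients are rounded away
from zero here, whereas the library remainder rounds them to even, so the two
can differ in those cases.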

> 
> Ok.
> 
> 
> +
>(define_insn "*rsqrt2"
>  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
>(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c
> b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> new file mode 100644
> index 000..48f25ca5b5b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +
> +#include <math.h>
> +
> +float test1 (float x, float y)
> +{
> +  return fmodf (x, y);
> +}
> +
> +double test2 (double x, double y)
> +{
> +  return fmod (x, y);
> +}
> +
> +float test3 (float x, float y)
> +{
> +  return remainderf (x, y);
> +}
> +
> +double test4 (double x, double y)
> +{
> +  return remainder (x, y);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
> +/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
> +/* { dg-final { scan-assembler-not 

Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 3:04 PM, Michael Meissner wrote:

[PATCH] PR 100167: Fix vector long long multiply/divide tests on power10.

This patch updates the vector long long multiply and divide tests to
supply the correct code information if power10 code generation is used.

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100167
* gcc.target/powerpc/fold-vec-div-longlong.c:

Missing information after colon.

* gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code
generation on power10.
---
  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c  | 7 +--
  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++-
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
index 312e984d3cc..f6a9b290ae5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
@@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector unsigned long long y)
  {
return vec_div (x, y);
  }
-/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */
+
+/* { dg-final { scan-assembler-times {\mdivd\M}   2 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M}  2 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target {   has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target {   has_arch_pwr10 } } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
index 38dba9f5023..bd210e34801 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
@@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector unsigned long long y)
return vec_mul (x, y);
  }
  
-/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M}  4 { target { lp64 && { ! has_arch_pwr10 } } } } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { has_arch_pwr10 } } } } */
  


Shouldn't this last be { lp64 && has_arch_pwr10 } ?

Otherwise LGTM.  I can't approve, but recommend approval with those changes.

Thanks,
Bill



Re: Repost: [PATCH] PR 100170: Fix eq/ne tests on power10.

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

ENOPATCH

Thanks,
Bill :-)

On 7/7/21 3:06 PM, Michael Meissner wrote:

[PATCH] PR 100170: Fix eq/ne tests on power10.

This patch updates eq/ne tests in the testsuite to adjust the test if
power10 code generation is used.

I have verified that these tests run on a power10 system using the
--with-cpu=power10 configuration option, and they continue to run on power9
little endian and power8 big endian systems.

Can I check this patch into the master branch?

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100170
* gcc.target/powerpc/ppc-eq0-1.c: Add support for the setbc
instruction.
* gcc.target/powerpc/ppc-ne0-1.c: Update instruction counts on
power10.



Re: Repost: [PATCH] PR 100168: Fix call test on power10.

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Mike,

LGTM.  I can't approve, but recommend approval.

Thanks,
Bill

On 7/7/21 3:08 PM, Michael Meissner wrote:

[PATCH] PR 100168: Fix call test on power10.

Fix a test that was checking for 64-bit TOC calls, to also allow for
PC-relative calls.

I have verified that this test passes when run on a power10 system configured
with --with-cpu=power10 and it continues to pass on power9 little endian and
power8 big endian systems.

Can I check this into the master branch?

2021-07-07  Michael Meissner  

gcc/testsuite
PR testsuite/100168
* gcc.dg/pr56727-2.c: Add support for PC-relative calls.
---
  gcc/testsuite/gcc.dg/pr56727-2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr56727-2.c b/gcc/testsuite/gcc.dg/pr56727-2.c
index c54369ed25e..77fdf4bc350 100644
--- a/gcc/testsuite/gcc.dg/pr56727-2.c
+++ b/gcc/testsuite/gcc.dg/pr56727-2.c
@@ -18,4 +18,4 @@ void h ()
  
  /* { dg-final { scan-assembler "@(PLT|plt)" { target i?86-*-* x86_64-*-* } } } */
  /* { dg-final { scan-assembler "@(PLT|plt)" { target { powerpc*-*-linux* && ilp32 } } } } */
-/* { dg-final { scan-assembler "bl f\n\\s*nop" { target { powerpc*-*-linux* && lp64 } } } } */
+/* { dg-final { scan-assembler "(bl f\n\\s*nop)|(bl f@notoc)" { target { powerpc*-*-linux* && lp64 } } } } */


Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "floor" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote:

Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
These are modelled after (and depend upon parts of) the tests for
_mm_ceil intrinsics, recently posted.

Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.

2021-07-06  Paul A. Clarke  

gcc/testsuite/ChangeLog:

[Applies to all your patches] You don't need to include "ChangeLog:" here;
"gcc/testsuite/" works fine.

* gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c: Copy
from gcc/testsuite/gcc.target/i386.

As before, these all need to be relative to gcc/testsuite, so start with
gcc.target/...

---
  .../gcc.target/powerpc/sse4_1-floorpd.c   |  51 
  .../gcc.target/powerpc/sse4_1-floorps.c   |  33 +
  .../gcc.target/powerpc/sse4_1-floorsd.c   | 119 ++
  .../gcc.target/powerpc/sse4_1-floorss.c   |  95 ++
  .../gcc.target/powerpc/sse4_1-roundpd-2.c |  36 ++
  5 files changed, 334 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index ..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.cp+50 } },
+  { { .f = {  0x1.ep+50,  0x1.0p+51 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.2p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.ep+51 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+52 } },
+   { -0x1.0p+52, -0x1.ep+52 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.4p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.2p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.0p+51, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index ..17ff35a7360f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.f8p+21,  0x1.fap+21,  0x1.fcp+21,  
0x1.fep+21 } },
+   {  0x1.f8p+21,  0x1.f8p+21,  0x1.f8p+21,  
0x1.f8p+21 } },
+
+  { { .f = {  0x1.fap+22,  

Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "floor" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/6/21 5:50 PM, Paul A. Clarke via Gcc-patches wrote:

2021-07-06  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
_mm_floor_sd, _mm_floor_ss): New.
---
  gcc/config/rs6000/smmintrin.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 0c0b0dd7c1e3..f484a7fd029f 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -240,4 +240,32 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
return r;
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))


Usual fuss about line length.  LGTM with that fixed here and below.

I can't approve, but recommend approval with those changes.

Thanks,
Bill


+_mm_floor_pd (__m128d __A)
+{
+  return (__m128d) vec_floor ((__v2df) __A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ps (__m128 __A)
+{
+  return (__m128) vec_floor ((__v4sf) __A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_floor ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_floor (((__v4sf) __B)[0]);
+  return r;
+}
+
  #endif
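
The lane handling of the scalar variants can be modelled in portable C. This
is a sketch for readers, not part of the patch: `vec2d_model`, `floor_model`
and `floor_sd_model` are hypothetical names, and `floor_model` assumes its
argument fits in a long long (an assumption made only to keep the sketch free
of libm). It mirrors what `_mm_floor_sd` does above: lane 0 is the floor of
lane 0 of `__B`, and lane 1 is copied from `__A`.

```c
/* Hypothetical two-lane stand-in for __m128d.  */
typedef struct { double lane[2]; } vec2d_model;

/* Floor for values that fit in a long long.  The conversion truncates
   toward zero, so negative non-integers need one more step down.  */
static double
floor_model (double x)
{
  double t = (double) (long long) x;
  return (t > x) ? t - 1.0 : t;
}

/* Model of _mm_floor_sd (A, B): lane 0 is floor (B[0]); lane 1 comes
   from A unchanged.  */
static vec2d_model
floor_sd_model (vec2d_model a, vec2d_model b)
{
  vec2d_model r;
  r.lane[0] = floor_model (b.lane[0]);
  r.lane[1] = a.lane[1];
  return r;
}
```

With a = {10.5, 20.5} and b = {-0.25, 99.75}, the model returns {-1.0, 20.5}.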


Re: [PATCH 2/2] rs6000: Add tests for SSE4.1 "ceil" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote:

Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitons in

typo ("definitions").

m128-check.h.

2021-07-01  Paul A. Clarke  

gcc/testsuite/ChangeLog:

"gcc/testsuite/" will make the tools happy.

* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c: New.


All of these should be relative to gcc/testsuite/, so

    * gcc.target/powerpc/sse4_1-ceilpd.c: New.

and similar.


* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h: New.
* gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c: Copy
 from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
(__VSX_SSE2__): Define.
---
  .../gcc.target/powerpc/sse4_1-ceilpd.c|  51 
  .../gcc.target/powerpc/sse4_1-ceilps.c|  33 +
  .../gcc.target/powerpc/sse4_1-ceilsd.c| 119 ++
  .../gcc.target/powerpc/sse4_1-ceilss.c|  95 ++
  .../gcc.target/powerpc/sse4_1-check.h |   4 +
  .../gcc.target/powerpc/sse4_1-round-data.h|  20 +++
  .../gcc.target/powerpc/sse4_1-round.h |  27 
  .../gcc.target/powerpc/sse4_1-round2.h|  27 
  .../gcc.target/powerpc/sse4_1-roundpd-3.c |  36 ++
  9 files changed, 412 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index ..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.cp+50,  0x1.dp+50 } },
+   {  0x1.cp+50,  0x1.0p+51 } },
+  { { .f = {  0x1.ep+50,  0x1.fp+50 } },
+   {  0x1.0p+51,  0x1.0p+51 } },
+  { { .f = {  0x1.0p+51,  0x1.1p+51 } },
+   {  0x1.0p+51,  0x1.2p+51 } },
+  { { .f = {  0x1.2p+51,  0x1.3p+51 } },
+   {  0x1.2p+51,  0x1.4p+51 } },
+
+  { { .f = {  0x1.ep+51,  0x1.fp+51 } },
+   {  0x1.ep+51,  0x1.0p+52 } },
+  { { .f = {  0x1.0p+52,  0x1.1p+52 } },
+   {  0x1.0p+52,  0x1.1p+52 } },
+
+  { { .f = { -0x1.1p+52, -0x1.0p+52 } },
+   { -0x1.1p+52, -0x1.0p+52 } },
+  { { .f = { -0x1.fp+51, -0x1.ep+51 } },
+   { -0x1.ep+51, -0x1.ep+51 } },
+
+  { { .f = { -0x1.3p+51, -0x1.2p+51 } },
+   { -0x1.2p+51, -0x1.2p+51 } },
+  { { .f = { -0x1.1p+51, -0x1.0p+51 } },
+   { -0x1.0p+51, -0x1.0p+51 } },
+  { { .f = { -0x1.fp+50, -0x1.ep+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+  { { .f = { -0x1.dp+50, -0x1.cp+50 } },
+   { -0x1.cp+50, -0x1.cp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index ..417ac76d8aa9
--- 

Re: [PATCH 1/2] rs6000: Add support for SSE4.1 "ceil" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 7/1/21 9:11 PM, Paul A. Clarke via Gcc-patches wrote:

2021-07-01  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
_mm_ceil_sd, _mm_ceil_ss): New.
---
  gcc/config/rs6000/smmintrin.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fa17a8b2f478..0c0b0dd7c1e3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -212,4 +212,32 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
return any_ones * any_zeros;
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))


Usual fuss over line length, here and below.  Otherwise LGTM, can't 
approve but recommend approval with those changes.


Thanks,
Bill


+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
  #endif


Re: [committed] input.c: move file caching globals to a new file_cache class

2021-07-11 Thread Lewis Hyatt via Gcc-patches
Hi David-

I thought this might be a good opportunity to ask about the patch that
supports -finput-charset in diagnostic.c please?
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564527.html

The patch will require some work to adapt to the new changes below. I
am happy to do that, but thought I should check first whether you have
any interest in this approach? Thanks!

-Lewis

On Thu, Jul 1, 2021 at 5:52 PM David Malcolm via Gcc-patches
 wrote:
>
> This moves some global state from input.c to a new file_cache class,
> of which an instance is owned by global_dc.  Various state is also
> made private.
>
> No functional change intended.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> Pushed to trunk as b544c348e13ad33d55f0d954370ab1fb0f2bf683.
>
> gcc/ChangeLog:
> * diagnostic.h (diagnostic_context::m_file_cache): New field.
> * input.c (class fcache): Rename to...
> (class file_cache_slot): ...this, making most members private and
> prefixing fields with "m_".
> (file_cache_slot::get_file_path): New accessor.
> (file_cache_slot::get_use_count): New accessor.
> (file_cache_slot::missing_trailing_newline_p): New accessor.
> (file_cache_slot::inc_use_count): New.
> (fcache_buffer_size): Move to...
> (file_cache_slot::buffer_size): ...here.
> (fcache_line_record_size): Move to...
> (file_cache_slot::line_record_size): ...here.
> (fcache_tab): Delete, in favor of global_dc->m_file_cache.
> (fcache_tab_size): Move to file_cache::num_file_slots.
> (diagnostic_file_cache_init): Update for move of fcache_tab
> to global_dc->m_file_cache.
> (diagnostic_file_cache_fini): Likewise.
> (lookup_file_in_cache_tab): Convert to...
> (file_cache::lookup_file): ...this.
> (diagnostics_file_cache_forcibly_evict_file): Update for move of
> fcache_tab to global_dc->m_file_cache, moving most of
> implementation to...
> (file_cache::forcibly_evict_file): ...this new function and...
> (file_cache_slot::evict): ...this new function.
> (evicted_cache_tab_entry): Convert to...
> (file_cache::evicted_cache_tab_entry): ...this.
> (add_file_to_cache_tab): Convert to...
> (file_cache::add_file): ...this, moving bulk of implementation
> to...
> (file_cache_slot::create): ...this new function.
> (file_cache::file_cache): New.
> (file_cache::~file_cache): New.
> (lookup_or_add_file_to_cache_tab): Convert to...
> (file_cache::lookup_or_add_file): ...this new function.
> (fcache::fcache): Rename to...
> (file_cache_slot::file_cache_slot): ...this, adding "m_" prefixes
> to fields.
> (fcache::~fcache): Rename to...
> (file_cache_slot::~file_cache_slot): ...this, adding "m_" prefixes
> to fields.
> (needs_read): Convert to...
> (file_cache_slot::needs_read_p): ...this.
> (needs_grow): Convert to...
> (file_cache_slot::needs_grow_p): ...this.
> (maybe_grow): Convert to...
> (file_cache_slot::maybe_grow): ...this.
> (read_data): Convert to...
> (file_cache_slot::read_data): ...this.
> (maybe_read_data): Convert to...
> (file_cache_slot::maybe_read_data): ...this.
> (get_next_line): Convert to...
> (file_cache_slot::get_next_line): ...this.
> (goto_next_line): Convert to...
> (file_cache_slot::goto_next_line): ...this.
> (read_line_num): Convert to...
> (file_cache_slot::read_line_num): ...this.
> (location_get_source_line): Update for moving of globals to
> global_dc->m_file_cache.
> (location_missing_trailing_newline): Likewise.
> * input.h (class file_cache_slot): New forward decl.
> (class file_cache): New.
>
> Signed-off-by: David Malcolm 
> ---
>  gcc/diagnostic.h |   3 +
>  gcc/input.c  | 459 +++
>  gcc/input.h  |  33 
>  3 files changed, 301 insertions(+), 194 deletions(-)
>
> diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
> index 1b9d6b1f64d..086bc4f903f 100644
> --- a/gcc/diagnostic.h
> +++ b/gcc/diagnostic.h
> @@ -136,6 +136,9 @@ struct diagnostic_context
>/* Where most of the diagnostic formatting work is done.  */
>pretty_printer *printer;
>
> +  /* Cache of source code.  */
> +  file_cache *m_file_cache;
> +
>/* The number of times we have issued diagnostics.  */
>int diagnostic_count[DK_LAST_DIAGNOSTIC_KIND];
>
> diff --git a/gcc/input.c b/gcc/input.c
> index 9e39e7df83c..de20d983d2c 100644
> --- a/gcc/input.c
> +++ b/gcc/input.c
> @@ -32,9 +32,29 @@ along with GCC; see the file COPYING3.  If not see
>
>  /* This is a cache used by get_next_line to store the content of a
> file to be searched for file lines.  */
> -class fcache
> +class file_cache_slot

Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

On 7/11/21 11:17 AM, Bill Schmidt wrote:

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
  gcc/config/rs6000/smmintrin.h | 46 +++
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)

    return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Usual line length complaint. :)  Here and below...

+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b0000 |
+    (__imm8 & 0b01) * 0b;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
__shortmask);


This seems way too complex, and needs commentary to explain what 
you're doing.  Doesn't this instruction just translate into some form 
of xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I 
think.



+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) 
__mask, __zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
__boolmask);

+}


Okay.


+
+extern __inline __m128 __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+ (__imm8 & 0b0100) * 0b1100 |
+ (__imm8 & 0b0010) * 0b0110 |
+ (__imm8 & 0b0001) * 0b0011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);


This is a good trick, but you need comments to explain what you're 
doing, including how you build __mask.  I recommend you include 
alternate code for P10, where you can just use vec_genwm to expand 
from __mask to a mask of word elements.


I don't understand how you're getting away with a v8hu mask for word 
elements.  This seems wrong to me.  Adequate testing?


As an alternate approach, I suppose you could use vec_perm / vec_permr 
with one of sixteen possible masks, which would seem faster than the 
splat/gather/unpack/select approach.  Something to consider.
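
To make the permute idea concrete, here is a minimal scalar sketch in plain C, not rs6000 code. It treats each lane as a 32-bit pattern and ignores endianness. The names blend_words, make_control, permute_words and permute_matches_reference are hypothetical helpers invented for illustration. The control layout (indices 0..15 name bytes of the first source, 16..31 name bytes of the second) is an assumption modelled on how a byte permute selects from two concatenated vectors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference semantics: lane j comes from b when bit j of imm8 is set,
   otherwise from a.  */
static void
blend_words (uint32_t dst[4], const uint32_t a[4], const uint32_t b[4],
             int imm8)
{
  for (int j = 0; j < 4; j++)
    dst[j] = (imm8 & (1 << j)) ? b[j] : a[j];
}

/* Build the 16-byte permute control for one of the sixteen imm8 values.
   Byte i belongs to lane i / 4; it is taken from b (index 16 + i) when
   that lane's bit is set, otherwise from a (index i).  */
static void
make_control (uint8_t ctl[16], int imm8)
{
  assert (imm8 >= 0 && imm8 < 16);
  for (int i = 0; i < 16; i++)
    ctl[i] = (uint8_t) (((imm8 >> (i / 4)) & 1) ? 16 + i : i);
}

/* Apply a permute control to the 32 bytes formed by a followed by b.  */
static void
permute_words (uint32_t dst[4], const uint32_t a[4], const uint32_t b[4],
               const uint8_t ctl[16])
{
  uint8_t src[32], out[16];
  memcpy (src, a, 16);
  memcpy (src + 16, b, 16);
  for (int i = 0; i < 16; i++)
    out[i] = src[ctl[i] & 31];
  memcpy (dst, out, 16);
}

/* Return 1 when the permute form matches the reference for all sixteen
   immediate values, 0 otherwise.  */
static int
permute_matches_reference (const uint32_t a[4], const uint32_t b[4])
{
  for (int imm8 = 0; imm8 < 16; imm8++)
    {
      uint8_t ctl[16];
      uint32_t want[4], got[4];
      make_control (ctl, imm8);
      blend_words (want, a, b, imm8);
      permute_words (got, a, b, ctl);
      if (memcmp (want, got, sizeof want) != 0)
        return 0;
    }
  return 1;
}
```

In a real header the sixteen controls would presumably be precomputed constants selected by the immediate, with separate big-endian and little-endian forms; the sketch only checks that a per-byte control can express every blend.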


Bill




+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, 
__zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) 
__boolmask);

+}
+


Okay.

Please have a look at the above issues and resubmit.  Thanks!
Bill

  extern __inline int __attribute__((__gnu_inline__, 
__always_inline__, __artificial__))

  _mm_testz_si128 (__m128i __A, __m128i __B)
  {


Re: [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

Please resubmit this when you resubmit 3/4, in case any adjustments are 
needed.


Thanks!
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c: Copy
from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
  .../gcc.target/powerpc/sse4_1-blendpd.c   | 89 ++
  .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 +
  .../gcc.target/powerpc/sse4_1-blendps.c   | 90 +++
  .../gcc.target/powerpc/sse4_1-blendvpd.c  | 65 ++
  4 files changed, 325 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index ..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+#include 
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy ([0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, [0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+{
+  __m128d x[NUM];
+  double d[NUM * 2];
+} dst, src1, src2;
+  union
+{
+  __m128d x;
+  double d[2];
+} src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+  if (check_blendpd ([i], [i * 2], [i * 2]))
+   abort ();
+}
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (, [4], [0]))
+abort ();
+
+  if (check_blendpd (, [0], [4]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index ..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include 
+#include 
+#include 
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy ([0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, [0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2;
+  union
+{
+  __m128 x;
+  float f[4];
+} src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
+  if (check_blendps ([i], [i * 4], [i * 4]))
+   abort ();
+}
+
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (, [8], [0]))
+abort ();
+
+  if (check_blendps (, [0], [8]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
new file mode 100644
index 

Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
  gcc/config/rs6000/smmintrin.h | 46 +++
  1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

Usual line length complaint. :)  Here and below...

+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b0000 |
+   (__imm8 & 0b01) * 0b;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask);


This seems way too complex, and needs commentary to explain what you're 
doing.  Doesn't this instruction just translate into some form of 
xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I think.
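
For reference, a scalar sketch of the two-lane blend semantics in plain C, not rs6000 code. The helper name blend_doublewords is hypothetical, and lanes are modelled as 64-bit patterns rather than doubles. With only two lanes there are just four possible outcomes, each choosing one doubleword per lane from one of the two sources, which is the observation behind the doubleword-permute suggestion.

```c
#include <stdint.h>

/* Scalar model of a two-lane blend: bit j of imm8 selects lane j from b,
   a clear bit keeps lane j from a.  Bits above bit 1 are ignored.  */
static void
blend_doublewords (uint64_t dst[2], const uint64_t a[2],
                   const uint64_t b[2], int imm8)
{
  dst[0] = (imm8 & 1) ? b[0] : a[0];
  dst[1] = (imm8 & 2) ? b[1] : a[1];
}
```

The four immediates 0, 1, 2 and 3 give {a0, a1}, {b0, a1}, {a0, b1} and {b0, b1} respectively.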



+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}


Okay.


+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+(__imm8 & 0b0100) * 0b1100 |
+(__imm8 & 0b0010) * 0b0110 |
+(__imm8 & 0b0001) * 0b0011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);


This is a good trick, but you need comments to explain what you're 
doing, including how you build __mask.  I recommend you include 
alternate code for P10, where you can just use vec_genwm to expand from 
__mask to a mask of word elements.


I don't understand how you're getting away with a v8hu mask for word 
elements.  This seems wrong to me.  Adequate testing?



+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+


Okay.

Please have a look at the above issues and resubmit.  Thanks!
Bill


  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
  _mm_testz_si128 (__m128i __A, __m128i __B)
  {


Re: [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

LGTM.  I can't approve, but recommend approval as is.

Thanks,
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
 * gcc.target/powerpc/sse4_1-ptest.c: Copy from
gcc/testsuite/gcc.target/i386.
---
  .../gcc.target/powerpc/sse4_1-ptest-1.c   | 117 ++
  1 file changed, 117 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
new file mode 100644
index ..69d13d57770d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+
+static int
+make_ptestz (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, z;
+
+  mask.x = m;
+  val.x = v;
+
+  z = 1;
+  for (i = 0; i < 16; i++)
+if ((mask.c[i] & val.c[i]))
+  {
+   z = 0;
+   break;
+  }
+  return z;
+}
+
+static int
+make_ptestc (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, c;
+
+  mask.x = m;
+  val.x = v;
+
+  c = 1;
+  for (i = 0; i < 16; i++)
+if ((val.c[i] & ~mask.c[i]))
+  {
+   c = 0;
+   break;
+  }
+  return c;
+}
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x;
+  unsigned int i[4];
+} val[4];
+  int i, j, l;
+  int res[32];
+
+  val[0].i[0] = 0x;
+  val[0].i[1] = 0x;
+  val[0].i[2] = 0x;
+  val[0].i[3] = 0x;
+
+  val[1].i[0] = 0x;
+  val[1].i[1] = 0x;
+  val[1].i[2] = 0x;
+  val[1].i[3] = 0x;
+
+  val[2].i[0] = 0;
+  val[2].i[1] = 0;
+  val[2].i[2] = 0;
+  val[2].i[3] = 0;
+
+  val[3].i[0] = 0x;
+  val[3].i[1] = 0x;
+  val[3].i[2] = 0x;
+  val[3].i[3] = 0x;
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   res[l++] = _mm_testz_si128 (val[j].x, val[i].x);
+   res[l++] = _mm_testc_si128 (val[j].x, val[i].x);
+  }
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   if (res[l++] != make_ptestz (val[j].x, val[i].x))
+ abort ();
+   if (res[l++] != make_ptestc (val[j].x, val[i].x))
+ abort ();
+  }
+
+  if (res[2] != _mm_testz_si128 (val[1].x, val[0].x))
+abort ();
+
+  if (res[3] != _mm_testc_si128 (val[1].x, val[0].x))
+abort ();
+}


Re: [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics

2021-07-11 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
 * config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros): New.
---
  gcc/config/rs6000/smmintrin.h | 50 +++
  1 file changed, 50 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..1b8cad135ed0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
  }

+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))

Line too long, please fix here and below.  (Existing cases can be left.)

+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */


This is reasonable; thanks for documenting.

LGTM; I can't approve, but recommend approval with line lengths fixed.  
Thanks!

Bill


+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
  #endif


[PATCH] x86: Don't enable UINTR in 32-bit mode

2021-07-11 Thread H.J. Lu via Gcc-patches
UINTR is available only in 64-bit mode.  Since the codegen target is
unknown when the gcc driver is processing -march=native, to properly
handle UINTR for -march=native:

1. Add an undocumented option, -muintr-native.
2. Update the gcc driver to pass -muintr-native with -march=native if
UINTR is available.
3. Change ix86_option_override_internal to
   a. Turn on UINTR in 64-bit mode for -muintr-native.
   b. Enable UINTR only in 64-bit mode for -march=CPU when PTA_CPU
   includes PTA_UINTR.
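
The decision described in the steps above can be modelled roughly as follows. This is a standalone sketch in plain C, not the code in ix86_option_override_internal; struct uintr_request, enum uintr_outcome and resolve_uintr are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>

enum uintr_outcome { UINTR_DISABLED, UINTR_ENABLED, UINTR_REJECTED };

struct uintr_request
{
  bool target_64bit;    /* Generating 64-bit code.  */
  bool native_hint;     /* The driver passed -muintr-native.  */
  bool arch_has_uintr;  /* -march=CPU whose PTA flags include UINTR.  */
  bool explicit_option; /* The user wrote -muintr or -mno-uintr.  */
  bool explicit_value;  /* True for -muintr, false for -mno-uintr.  */
};

static enum uintr_outcome
resolve_uintr (struct uintr_request r)
{
  /* Start from whatever the user asked for explicitly.  */
  bool enabled = r.explicit_option && r.explicit_value;

  /* Step 3b: -march=CPU enables UINTR only in 64-bit mode, and never
     overrides an explicit user choice.  */
  if (r.arch_has_uintr && r.target_64bit && !r.explicit_option)
    enabled = true;

  if (r.target_64bit)
    {
      /* Step 3a: -muintr-native turns UINTR on in 64-bit mode unless
         the user set it explicitly.  */
      if (r.native_hint && !r.explicit_option)
        enabled = true;
    }
  else if (enabled)
    /* UINTR requested for 32-bit code: diagnose it.  */
    return UINTR_REJECTED;

  return enabled ? UINTR_ENABLED : UINTR_DISABLED;
}
```

Under this model, -march=native on a UINTR-capable host no longer produces an error when compiling 32-bit code, while an explicit -muintr in 32-bit mode is still rejected.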

gcc/

PR target/101395
* config/i386/driver-i386.c (host_detect_local_cpu): Add
-muintr-native for FEATURE_UINTR.
* config/i386/i386-options.c (ix86_option_override_internal):
Move -muintr check after -march processing and turn on UINTR in
64-bit mode for -muintr-native if it has not been disabled
explicitly.
(DEF_PTA): Skip PTA_UINTR if not in 64-bit mode.
* config/i386/i386.opt (muintr-native): New.  Undocumented to
enable -muintr only in 64-bit mode with -march=native.

gcc/testsuite/

PR target/101395
* gcc.target/i386/pr101395-1.c: New test.
* gcc.target/i386/pr101395-2.c: Likewise.
* gcc.target/i386/pr101395-3.c: Likewise.
---
 gcc/config/i386/driver-i386.c  | 10 --
 gcc/config/i386/i386-options.c | 14 +++---
 gcc/config/i386/i386.opt   |  4 
 gcc/testsuite/gcc.target/i386/pr101395-1.c | 12 
 gcc/testsuite/gcc.target/i386/pr101395-2.c | 22 ++
 gcc/testsuite/gcc.target/i386/pr101395-3.c |  6 ++
 6 files changed, 63 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr101395-3.c

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index dd9236616b4..7ade90a088d 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -804,8 +804,14 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
if (isa_names_table[i].option)
  {
if (has_feature (isa_names_table[i].feature))
- options = concat (options, " ",
-   isa_names_table[i].option, NULL);
+ {
+   const char *option;
+   if (isa_names_table[i].feature == FEATURE_UINTR)
+ option = "-muintr-native";
+   else
+ option = isa_names_table[i].option;
+   options = concat (options, " ", option, NULL);
+ }
else
  options = concat (options, neg_option,
isa_names_table[i].option + 2, NULL);
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index 7a35c468da3..6b12f030f0c 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1920,9 +1920,6 @@ ix86_option_override_internal (bool main_args_p,
   opts->x_ix86_stringop_alg = no_stringop;
 }
 
-  if (TARGET_UINTR && !TARGET_64BIT)
-error ("%<-muintr%> not supported for 32-bit code");
-
   if (!opts->x_ix86_arch_string)
 opts->x_ix86_arch_string
   = TARGET_64BIT_P (opts->x_ix86_isa_flags)
@@ -2109,6 +2106,7 @@ ix86_option_override_internal (bool main_args_p,
 #define DEF_PTA(NAME) \
if (((processor_alias_table[i].flags & PTA_ ## NAME) != 0) \
&& PTA_ ## NAME != PTA_64BIT \
+   && (TARGET_64BIT || PTA_ ## NAME != PTA_UINTR) \
&& !TARGET_EXPLICIT_ ## NAME ## _P (opts)) \
  SET_TARGET_ ## NAME (opts);
 #include "i386-isa.def"
@@ -2184,6 +2182,16 @@ ix86_option_override_internal (bool main_args_p,
   XDELETEVEC (s);
 }
 
+  if (TARGET_64BIT)
+{
+  /* Turn on UINTR in 64-bit mode for -muintr-native if it has not
+been set explicitly.  */
+  if (ix86_uintr_native && !TARGET_EXPLICIT_UINTR_P (opts))
+   SET_TARGET_UINTR (opts);
+}
+  else if (TARGET_UINTR)
+error ("%<-muintr%> not supported for 32-bit code");
+
   ix86_arch_mask = HOST_WIDE_INT_1U << ix86_arch;
   for (i = 0; i < X86_ARCH_LAST; ++i)
 ix86_arch_features[i] = !!(initial_ix86_arch_features[i] & ix86_arch_mask);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 7b8547bb1c3..5923bbfb40e 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -802,6 +802,10 @@ muintr
 Target Mask(ISA2_UINTR) Var(ix86_isa_flags2) Save
 Support UINTR built-in functions and code generation.
 
+; Used to enable -muintr only in 64-bit mode with -march=native.
+muintr-native
+Target Undocumented RejectNegative Var(ix86_uintr_native)
+
 msgx
 Target Mask(ISA2_SGX) Var(ix86_isa_flags2) Save
 Support SGX built-in functions and code generation.
diff --git a/gcc/testsuite/gcc.target/i386/pr101395-1.c 
b/gcc/testsuite/gcc.target/i386/pr101395-1.c
new file mode 

[PATCH V2] coroutines: Adjust outlined function names [PR95520].

2021-07-11 Thread Iain Sandoe
Hi Jason,

> On 9 Jul 2021, at 22:40, Jason Merrill  wrote:
> 
> On 7/9/21 2:18 PM, Iain Sandoe wrote:

> How about handling this in write_encoding, along the lines of the 
> devel/c++-contracts branch?

OK, so I took a look at this and implemented as below. 

 Some small differences from your contracts impl described here. 

recalling

the original function becomes the ramp - it is called directly by the user-code.
the resumer (actor) contains the outlined code wrapped in synthesized logic as 
dictated by the std
the destroy function effectively calls the actor with a flag that says “take 
the DTOR path” (since the DTOR path has to be available in the case of resume 
too).

This means that it is possible for the actor to be partially (or completely, for 
a generator-style coro) inlined into either the ramp or the destroyer.

1. Using DECL_ABSTRACT_ORIGIN didn’t work with optimisation and debug since the 
inlining of the outlining confuses the issue (the actor/destroy helpers are not 
real clones).

 - there hasn’t been any specific reason to know “which” coroutine function was 
being lowered in the middle or back ends to date - so I had to add some 
book-keeping to allow that to be queried from write_encoding.

2. I had to cater for lambda coroutines; that meant recognising that we have a 
lambda coro helper and picking up the base mangling for the ramp (original 
lambda)

3. I made a minor adjustment to the string handling so that it can account for 
targets that don’t support ‘.’ or ‘$’ in symbols.

> Speaking of which, I wonder if you also want to do something similar to what 
> I did there to put the ramp/actor/destroyer functions into the same 
> comdat group.

I looked through your code and agree that it should be possible to be more 
restrictive about the interfaces presented by the actor and destroy functions 
in coros.  The ramp obviously has to keep the visibility with which the user 
wrote it.

As for comdat groups, I’d need to look harder.

Please could these things be TODOs? The fix for 95520 doesn’t make them any 
worse (or better), and there are other bugs that are higher priority.

Tested on x86_64-Linux,Darwin and powerpc64-linux, also with cppcoro (but I 
would plan to test it on folly too before pushing).

OK for master / backports?

=

The mechanism used to date for uniquing the coroutine helper
functions (actor, destroy) was over-complicating things and
leading to the noted PR and also difficulties in setting
breakpoints on these functions (so this will help PR99215 as
well).

This implementation delegates the adjustment to the mangling
to write_encoding() which necessitates some book-keeping so
that it is possible to determine which of the coroutine
helper names is to be mangled.

Signed-off-by: Iain Sandoe 

PR c++/95520 - [coroutines] __builtin_FUNCTION() returns mangled .actor instead 
of original function name

PR c++/95520

gcc/cp/ChangeLog:

* coroutines.cc (struct coroutine_info): Add fields for
actor and destroy function decls.
(to_ramp): New.
(coro_get_ramp_function): New.
(coro_get_actor_function): New.
(coro_get_destroy_function): New.
(act_des_fn): Set up mapping between ramp, actor and
destroy functions.
(morph_fn_to_coro): Adjust interface to the builder for
helper function decls.
* cp-tree.h (DECL_ACTOR_FN, DECL_DESTROY_FN, DECL_RAMP_FN
DECL_IS_CORO_ACTOR_P, DECL_IS_CORO_DESTROY_P JOIN_STR): New.
* mangle.c (write_encoding): Handle coroutine helpers.
(write_unqualified_name): Handle lambda coroutine helpers.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr95520.C: New test.
---
 gcc/cp/coroutines.cc  | 87 +++
 gcc/cp/cp-tree.h  | 30 
 gcc/cp/mangle.c   | 18 -
 gcc/testsuite/g++.dg/coroutines/pr95520.C | 29 
 4 files changed, 150 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr95520.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 54ffdc8d062..a75f55427cb 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -82,11 +82,13 @@ static bool coro_promise_type_found_p (tree, location_t);
 struct GTY((for_user)) coroutine_info
 {
   tree function_decl; /* The original function decl.  */
-  tree promise_type; /* The cached promise type for this function.  */
-  tree handle_type;  /* The cached coroutine handle for this function.  */
-  tree self_h_proxy; /* A handle instance that is used as the proxy for the
-   one that will eventually be allocated in the coroutine
-   frame.  */
+  tree actor_decl;/* The synthesized actor function.  */
+  tree destroy_decl;  /* The synthesized destroy function.  */
+  tree promise_type;  /* The cached promise type for this function.  */
+  tree handle_type;   /* The cached coroutine handle 

[PATCH] PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C

2021-07-11 Thread Roger Sayle

My sincere apologies for the breakage.  My recent patch folds
bswapN(x)>>C when the constant C is large enough that the result
contains only bits from the low byte, so that the byte swap can be
avoided; that patch contains a minor logic error.  The pattern contains
a convert?, allowing an extension to occur between the bswap and
the shift.  The logic is correct if there's no extension, or if the
extension has the same sign as the shift, but I'd mistakenly
convinced myself that these couldn't have different signedness.

(T)bswap16(x)>>12 is (T)((unsigned char)x>>4) or (T)((signed char)x>>4).
The bug is that for zero-extensions to signed type T, we need to use
the unsigned char variant [the signedness of the byte shift is not
(always) the same as the signedness of T and the original shift].

Then because I'm now paranoid, I've also added a clause to handle
the hypothetical (but in practice impossible) sign-extension to an
unsigned type T, which can be implemented as (T)(x<<8)>>12.
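
The zero-extension case can be checked with a small standalone C program, sketched here outside of match.pd. The helper names swap16, shift_of_swapped, fold_unsigned_byte, fold_signed_byte and count_mismatches are hypothetical; swap16 is a portable stand-in for __builtin_bswap16. The signed-byte variant relies on implementation-defined behaviour (narrowing to signed char and right-shifting a negative value), so its exact result is only what GCC is expected to produce.

```c
#include <stdint.h>

/* Portable 16-bit byte swap, standing in for __builtin_bswap16.  */
static uint16_t
swap16 (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* Original expression with T = int: zero-extend the swapped value to a
   signed int, then shift right by 12.  The widened value is never
   negative, so the shift is well defined.  */
static int
shift_of_swapped (uint16_t x)
{
  int widened = swap16 (x);
  return widened >> 12;
}

/* Fold using the unsigned low byte: the extension was a zero-extension
   even though the result type is signed.  */
static int
fold_unsigned_byte (uint16_t x)
{
  return (int) ((unsigned char) x >> 4);
}

/* Fold using the signed low byte, chosen from the signedness of T alone.
   Implementation-defined for low bytes of 0x80 and above.  */
static int
fold_signed_byte (uint16_t x)
{
  return (int) ((signed char) x >> 4);
}

/* Count the 16-bit inputs for which the unsigned-byte fold disagrees
   with the original expression.  */
static unsigned int
count_mismatches (void)
{
  unsigned int bad = 0;
  for (unsigned int v = 0; v <= 0xffffu; v++)
    if (shift_of_swapped ((uint16_t) v) != fold_unsigned_byte ((uint16_t) v))
      bad++;
  return bad;
}
```

For the input 0x0080 the original expression and the unsigned-byte fold both give 8, whereas the signed-byte fold sign-extends the low byte first and so gives a different value.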

This patch has been tested on x86_64-pc-linux-gnu with a "make
bootstrap" and "make -k check" with no new failures, and a new
testcase to confirm it fixes the regression.

Ok for mainline?

2021-07-11  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/101403
* gcc/match.pd ((T)bswap(X)>>C): Correctly handle cases where
signedness of the shift is not the same as the signedness of
the type extension.

gcc/testsuite/ChangeLog
PR tree-optimization/101403
* gcc.dg/pr101403.c: New test case.


Sorry again,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

diff --git a/gcc/match.pd b/gcc/match.pd
index 30680d4..beb8d27 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3659,19 +3659,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  {
   unsigned HOST_WIDE_INT prec = TYPE_PRECISION (TREE_TYPE (@2));
   unsigned HOST_WIDE_INT bits = tree_to_uhwi (@1);
+  /* If the bswap was extended before the original shift, this
+byte (shift) has the sign of the extension, not the sign of
+the original shift.  */
+  tree st = TYPE_PRECISION (type) > prec ? TREE_TYPE (@2) : type;
  }
- (if (bits + 8 == prec)
-  (if (TYPE_UNSIGNED (type))
-   (convert (convert:unsigned_char_type_node @0))
-   (convert (convert:signed_char_type_node @0)))
-  (if (bits < prec && bits + 8 > prec)
-   (with 
-   {
-tree nst = build_int_cst (integer_type_node, bits & 7);
-tree bt = TYPE_UNSIGNED (type) ? unsigned_char_type_node
-   : signed_char_type_node;
-   }
-   (convert (rshift:bt (convert:bt @0) {nst;}
+ /* Special case: logical right shift of sign-extended bswap.
+   (unsigned)(short)bswap16(x)>>12 is (unsigned)((short)x<<8)>>12. */
+ (if (TYPE_PRECISION (type) > prec
+ && !TYPE_UNSIGNED (TREE_TYPE (@2))
+ && TYPE_UNSIGNED (type)
+ && bits < prec && bits + 8 >= prec)
+  (with { tree nst = build_int_cst (integer_type_node, prec - 8); }
+   (rshift (convert (lshift:st (convert:st @0) {nst;})) @1))
+  (if (bits + 8 == prec)
+   (if (TYPE_UNSIGNED (st))
+   (convert (convert:unsigned_char_type_node @0))
+   (convert (convert:signed_char_type_node @0)))
+   (if (bits < prec && bits + 8 > prec)
+   (with 
+{
+ tree nst = build_int_cst (integer_type_node, bits & 7);
+ tree bt = TYPE_UNSIGNED (st) ? unsigned_char_type_node
+  : signed_char_type_node;
+}
+(convert (rshift:bt (convert:bt @0) {nst;})
  /* bswap(x) & C1 can sometimes be simplified to (x >> C2) & C1.  */
  (simplify
   (bit_and (convert? (bswap@2 @0)) INTEGER_CST@1)
/* { dg-do run } */
/* { dg-options "-O2" } */
unsigned int foo (unsigned int a)
{
  unsigned int u;
  unsigned short b = __builtin_bswap16 (a);
  return b >> (u, 12);
}

int main (void)
{
  unsigned int x = foo (0x80);
  if (x != 0x0008)
__builtin_abort ();
  return 0;
}