Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove vector set and vector init built-ins.
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_init_v1ti
> 
> perform the same operation as initializing the vector in C code.  For
> example:
> 
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
> 
> These two constructs were tested and verified to generate identical
> assembly instructions both with no optimization and with -O3.
> 
> The vector set built-ins:
> 
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf
> 
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
> 
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> The built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.

I think we can replace these calls with the equivalent gimple code
(expanding them early), and then we can get rid of these instances too.

BR,
Kewen

> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>   __builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>   __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>   __builtin_vec_init_v2df, __builtin_vec_init_v1ti,
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>   __builtin_vec_set_v1ti): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>  1 file changed, 2 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 19d05b8043a..d04ad4ce7e5 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1115,37 +1115,6 @@
>    const signed short __builtin_vec_ext_v8hi (vss, signed int);
>      VEC_EXT_V8HI nothing {extract}
>  
> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
> -            signed char, signed char, signed char, signed char, signed char, \
> -            signed char, signed char, signed char, signed char, signed char, \
> -            signed char, signed char, signed char);
> -    VEC_INIT_V16QI nothing {init}
> -
> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
> -    VEC_INIT_V4SF nothing {init}
> -
> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
> -            signed int);
> -    VEC_INIT_V4SI nothing {init}
> -
> -  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
> -            signed short, signed short, signed short, signed short, \
> -            signed short);
> -    VEC_INIT_V8HI nothing {init}
> -
> -  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
> -    VEC_SET_V16QI nothing {set}
> -
> -  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
> -    VEC_SET_V4SF nothing {set}
> -
> -  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
> -    VEC_SET_V4SI nothing {set}
> -
> -  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
> -    VEC_SET_V8HI nothing {set}
> -
> -
>  ; Cell builtins.
>  [cell]
>    pure vsc __builtin_altivec_lvlx (signed long, const void *);
> @@ -1292,15 +1261,8 @@
>    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>      VEC_EXT_V2DI nothing {extract}
>  
> -  const vsq __builtin_vec_init_v1ti (signed __int128);
> -    VEC_INIT_V1TI nothing {init}
> -
> -  const vd __builtin_vec_init_v2df (double, double);
> -    VEC_INIT_V2DF nothing {init}
> -
> -  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
> -    VEC_INIT_V2DI nothing {init}
> -
> +;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
> +;; resolve_vec_insert(), rs6000-c.cc
>    const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
>      VEC_SET_V1TI nothing {set}
>  
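
For readers following along, the equivalence claimed in the commit message can be sketched in plain C. This is only an illustration under assumptions: it uses GCC's generic vector extensions so that it builds on any target, and the typedef v4si plus the helpers init_v4si, set_v4si and check_v4si are hypothetical names, not part of the patch or of the rs6000 headers.

```c
/* Sketch only: uses GCC's generic vector extensions so it builds on any
   target; v4si here is a local typedef, not the rs6000 built-in type.  */
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Plain C initializer, the suggested replacement for
   __builtin_vec_init_v4si (a, b, c, d).  */
static v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

/* Plain C element store, the suggested replacement for
   __builtin_vec_set_v4si (v, val, index).  */
static v4si
set_v4si (v4si v, int val, int index)
{
  v[index] = val;
  return v;
}

/* Hypothetical self-check, not part of the patch.  */
static void
check_v4si (void)
{
  v4si v = init_v4si (1, 2, 3, 4);
  v = set_v4si (v, 42, 2);
  assert (v[0] == 1 && v[1] == 2 && v[2] == 42 && v[3] == 4);
}
```

The helpers only wrap the two C constructs quoted in the commit message; whether a given target emits identical instructions for them is the claim of the patch author, not something this sketch demonstrates.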



Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
> 
> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
> the test cases are removed.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>   Remove built-in definition.
> 

Ah, you split __builtin_vsx_xvcmpeqsp out from the change for
__builtin_vsx_xvcmpeqsp_p. That's fine; please ignore the comments about
handling __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.


> gcc/testsuite/ChangeLog:
>   * vsx-builtin-3.c (do_cmp): Remove test case for
>   __builtin_vsx_xvcmpeqsp.
> ---
>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>  2 files changed, 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 2f6149edd5f..19d05b8043a 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,9 +1613,6 @@
>    const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>      XVCMPEQDP_P vector_eq_v2df_p {pred}
>  
> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
> -    XVCMPEQSP vector_eqv4sf {}
> -
>    const vd __builtin_vsx_xvcmpgedp (vd, vd);
>      XVCMPGEDP vector_gev2df {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 35ea31b2616..245893dc0e3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -27,7 +27,6 @@
>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>  /* { dg-final { scan-assembler "xxsldwi" } } */
> @@ -112,7 +111,6 @@ int do_cmp (void)
>    d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>    d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>  
> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>    f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>    f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>    return i;

As with the others in this patch series, I'd prefer to replace it with
vec_cmpeq here. OK for trunk with this tweak (and keep the scan-assembler
check there as well), thanks!

BR,
Kewen



Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, extend vec_xxpermdi built-in for __int128 args
> 
> Add a new overloaded instance for vec_xxpermdi
> 
>__int128 vec_xxpermdi (__int128, __int128, const int);
> 
> Update the documentation to include a reference to the new built-in
> instance.
> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new
>   overloaded built-in instance.
> ---
>  gcc/config/rs6000/rs6000-overload.def | 2 ++
>  gcc/doc/extend.texi   | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index 5912c9452f4..49962e2f2a2 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4932,6 +4932,8 @@
>      XXPERMDI_4SF  XXPERMDI_VF
>    vd __builtin_vsx_xxpermdi (vd, vd, const int);
>      XXPERMDI_2DF  XXPERMDI_VD
> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
> +    XXPERMDI_1TI  XXPERMDI_1TI

This actually introduces only the signed __int128 instance. Considering the
other existing ones, I think we want both signed and unsigned.

>  
>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>    vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 86b8e536dbe..47cf2f3bc8b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool char *);
>  void vec_vsx_st (vector bool char, int, unsigned char *);
>  void vec_vsx_st (vector bool char, int, signed char *);
>  
> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>  vector double vec_xxpermdi (vector double, vector double, const int);
>  vector float vec_xxpermdi (vector float, vector float, const int);

Nit: since the existing entries are sorted by descending element size, I guess
it's better to move the new line above to here (with explicit signed and
unsigned variants).

And we need a test case for it as well?

BR,
Kewen

>  vector long long vec_xxpermdi (vector long long, vector long long, const int);




Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Jeff Law




On 5/13/24 6:54 PM, Patrick O'Neill wrote:

On 5/13/24 13:28, Jeff Law wrote:

On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.
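
As an illustration of the idea only, the split can be sketched as below. This is not the code from the patch: the function name split_sum_of_two_s12, the helper fits_s12 and the S12_* macros are hypothetical, and the choice of 2032 (the largest multiple of 16 that fits in S12) as the positive first part is an assumption made here so that the stack pointer stays 16-byte aligned between the two adjustments.

```c
/* Hypothetical sketch, not the code from the patch: split VAL into
   BASE + OFF where both parts fit a signed 12-bit immediate
   (S12: -2048..2047).  BASE is kept a multiple of 16.  */
#include <stdbool.h>

#define S12_MIN (-2048)
#define S12_MAX 2047
#define S12_ALIGNED_MAX 2032	/* Largest multiple of 16 that fits in S12.  */

static bool
fits_s12 (long val)
{
  return val >= S12_MIN && val <= S12_MAX;
}

/* Return true and fill *BASE and *OFF if VAL needs exactly two addi's.  */
static bool
split_sum_of_two_s12 (long val, long *base, long *off)
{
  if (fits_s12 (val))
    return false;		/* A single addi already suffices.  */

  long b = val < 0 ? S12_MIN : S12_ALIGNED_MAX;
  long o = val - b;
  if (!fits_s12 (o))
    return false;		/* Too large: materialize in a register.  */

  *base = b;
  *off = o;
  return true;
}
```

With these assumptions, an adjustment of -2064 splits into -2048 and -16, and +2064 splits into 2032 and 32, which are the immediates that appear in the "This patch" column of the comparison in this message.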

The previous patches to generally avoid LUI-based constant materialization
didn't fix this PR; it needs this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

 gcc-13.1 release   | gcc 230823         | This patch       | clang/llvm
                    | g6619b3d4c15c      |                  |
--------------------------------------------------------------------------------
 li   t0,-4096      | li   t0,-4096      | addi sp,sp,-2048 | addi sp,sp,-2048
 addi t0,t0,2016    | addi t0,t0,2032    | add  sp,sp,-16   | addi sp,sp,-32
 li   a4,4096       | add  sp,sp,t0      | add  a5,sp,a0    | add  a1,sp,16
 add  sp,sp,t0      | addi a5,sp,-2032   | sb   zero,0(a5)  | add  a0,a0,a1
 li   a5,-4096      | add  a0,a5,a0      | addi sp,sp,2032  | sb   zero,0(a0)
 addi a4,a4,-2032   | li   t0,4096       | addi sp,sp,32    | addi sp,sp,2032
 add  a4,a4,a5      | sb   zero,2032(a0) | ret              | addi sp,sp,48
 addi a5,sp,16      | addi t0,t0,-2032   |                  | ret
 add  a5,a4,a5      | add  sp,sp,t0      |
 add  a0,a5,a0      | ret                |
 li   t0,4096       |
 sd   a5,8(sp)      |
 sb   zero,2032(a0) |
 addi t0,t0,-2016   |
 add  sp,sp,t0      |
 ret                |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
         }
       else
         {
-          if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+          HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+          if (SMALL_OPERAND (adj_off_value))
+            {
+              adjust = GEN_INT (adj_off_value);
+            }
+          else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+            {
+              HOST_WIDE_INT base, off;
+              riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+              insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+                                    GEN_INT (base));
+              RTX_FRAME_RELATED_P (insn) = 1;
+              adjust = GEN_INT (off);
+            }
So this was the hunk that we identified internally as causing problems 
with libgomp's testsuite.  We never fully chased it down as this hunk 
didn't seem terribly important performance wise -- we just set it 
aside.  The thing is it looked basically correct to me.  So the 
failure was certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the 
libgomp testsuite, particularly in the rv64 linux configuration. If it 
does, and it passes, then we're good.  I'm still finding my way around 
the configuration, so I don't know if the CI system Edwin & Patrick 
have built tests libgomp or not.


I poked around the .sum files in pre/postcommit and we do run tests like:

PASS: c-c++-common/gomp/affinity-2.c  (test for errors, line 45)

I was able to find the summary info:


Tests that now fail, but worked before (15 tests):
libgomp: libgomp.fortran/simd7.f90   -O0  execution test
libgomp: libgomp.fortran/task2.f90   -O0  execution test
libgomp: libgomp.fortran/vla2.f90   -O0  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -O1  execution test
libgomp: libgomp.fortran/vla4.f90   -O2  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
libgomp: libgomp.fortran/vla4.f90   -Os  execution test

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Jiufu Guo
Hi,

"Kewen.Lin"  writes:

> Hi,
>
> on 2024/5/14 11:00, Jiufu Guo wrote:
>> Hi,
>> 
>> Thanks a lot for your helpful review!
>> 
>> "Kewen.Lin"  writes:
>> 
>>> Hi,
>>>
>>> on 2024/5/13 10:57, Jiufu Guo wrote:
>>>> Hi,
>>>>
>>>> For PR96866, gcc prints asm code for the modifier "%a", which requires
>>>> an address operand, while the operand has the constraint "X", which
>>>> allows non-address forms.  With this patch, an error message is reported
>>>> to indicate the invalid asm operands.
>>>>
>>>> Bootstrap pass on ppc64{,le}.
>>>> Is this ok for trunk?
>>>>
>>>> BR,
>>>> Jeff(Jiufu Guo)
>>>>
>>>>  PR target/96866
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>  * config/rs6000/rs6000.cc (print_operand_address):
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>  * gcc.target/powerpc/pr96866-1.c: New test.
>>>>  * gcc.target/powerpc/pr96866-2.c: New test.
>>>>
>>>> ---
>>>>  gcc/config/rs6000/rs6000.cc                  |  6 ++
>>>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>>>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>>>>  3 files changed, 31 insertions(+)
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>>> index 117999613d8..50943d76f79 100644
>>>> --- a/gcc/config/rs6000/rs6000.cc
>>>> +++ b/gcc/config/rs6000/rs6000.cc
>>>> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>>>>    else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>>>>             || GET_CODE (x) == LABEL_REF)
>>>>      {
>>>> +      if (this_is_asm_operands && !address_operand (x, VOIDmode))
>>>
>>> Do we really need this_is_asm_operands here?
>> I understand your point:
>> 'print_operand_address' serves more than just user asm code, so it may
>> be incorrect whenever 'x' is not an 'address_operand', regardless of
>> this_is_asm_operands.
>> 
>> Here, 'this_is_asm_operands' is needed because the failure would then be
>> treated as a user fault in the asm code (otherwise, as an internal_error
>> in the compiler).
>
> The called function "output_operand_lossage" already takes different
> actions for this_is_asm_operands and !this_is_asm_operands cases, so
> for this_is_asm_operands, it goes with error_for_asm and no ICE, no?
>
> And without this_is_asm_operands, if we adopt constraint X internally
> and hit this (it means it's already unexpected), isn't it better to see
> the ICE instead of going further?
Yep, exactly! "output_operand_lossage" can handle both the user 'asm'
error and the internal_error case.  So it would be OK to call it directly
in place of "gcc_assert(TARGET_TOC)" for this "if" condition, like:
```
  else if (TARGET_TOC)
output_operand_lossage ("invalid expression as operand");
```
I will refine the patch.

Thanks again for your great comments.

BR,
Jeff(Jiufu) Guo

>
> BR,
> Kewen
>
>> 
>> I noticed one thing:
>> what we need is to emit an error when printing an address that cannot
>> be accessed directly.
>> So it would be better to emit the message through 'output_operand_lossage'
>> just before gcc_assert(TARGET_TOC).
>> 
>> Thanks a lot for your insightful comment!
>> 
>>>
>>>> +        {
>>>> +          output_operand_lossage ("invalid expression as operand");
>>>> +          return;
>>>> +        }
>>>> +
>>>>        output_addr_const (file, x);
>>>>        if (small_data_operand (x, GET_MODE (x)))
>>>>          fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>>> new file mode 100644
>>>> index 000..6554a472a11
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>>> @@ -0,0 +1,15 @@
>>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>>> +/* { dg-options "-fPIC -O2" } */
>>>
>>> Nit: If these two options are required, it would be good to have a comment
>>> explaining it a bit when it's not obvious.
>> 
>> Good suggestion, thanks!
>>>
>>>> +
>>>> +int x[2];
>>>> +
>>>> +int __attribute__ ((noipa))
>>>> +f1 (void)
>>>> +{
>>>> +  int n;
>>>> +  int *p = x;
>>>> +  *p++;
>>>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>>>> +  return n;
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>>> new file mode 100644
>>>> index 000..a5ec96f29dd
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>>> @@ -0,0 +1,10 @@
>>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>>> +/* { dg-options "-fPIC -O2" } */
>>>
>>> Ditto.
>> Thanks!
>> 
>> BR,
>> Jeff(Jiufu) Guo
>>>
>>> BR,
>>> Kewen
>>>
>>>> +
>>>> +void
>>>> +f (void)
>>>> +{
>>>> +  extern int x;
>>>> +  __asm__ volatile("#%a0" ::"X"(&x));
>>>> +}


Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/14 11:00, Jiufu Guo wrote:
> Hi,
> 
> Thanks a lot for your helpful review!
> 
> "Kewen.Lin"  writes:
> 
>> Hi,
>>
>> on 2024/5/13 10:57, Jiufu Guo wrote:
>>> Hi,
>>>
>>> For PR96866, gcc prints asm code for the modifier "%a", which requires
>>> an address operand, while the operand has the constraint "X", which
>>> allows non-address forms.  With this patch, an error message is reported
>>> to indicate the invalid asm operands.
>>>
>>> Bootstrap pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff(Jiufu Guo)
>>>
>>> PR target/96866
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (print_operand_address):
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr96866-1.c: New test.
>>> * gcc.target/powerpc/pr96866-2.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc  |  6 ++
>>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>>>  3 files changed, 31 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 117999613d8..50943d76f79 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>>>    else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>>>             || GET_CODE (x) == LABEL_REF)
>>>      {
>>> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))
>>
>> Do we really need this_is_asm_operands here?
> I understand your point:
> 'print_operand_address' serves more than just user asm code, so it may
> be incorrect whenever 'x' is not an 'address_operand', regardless of
> this_is_asm_operands.
> 
> Here, 'this_is_asm_operands' is needed because the failure would then be
> treated as a user fault in the asm code (otherwise, as an internal_error
> in the compiler).

The called function "output_operand_lossage" already takes different
actions for this_is_asm_operands and !this_is_asm_operands cases, so
for this_is_asm_operands, it goes with error_for_asm and no ICE, no?

And without this_is_asm_operands, if we adopt constraint X internally
and hit this (it means it's already unexpected), isn't it better to see
the ICE instead of going further?
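
To make the two points above concrete, here is a toy model of the behaviour being discussed. It is not GCC source: the enum diag_kind and the functions lossage_kind, guarded_reports and unguarded_reports are hypothetical names, and the boolean parameter in_user_asm stands in for the this_is_asm_operands check.

```c
/* Toy model, not GCC source: mirrors the two paths described above.  */
#include <stdbool.h>

enum diag_kind
{
  DIAG_USER_ERROR,	/* error_for_asm: blame the user's asm, no ICE.  */
  DIAG_ICE		/* internal_error: compiler bug, stop here.  */
};

/* Which diagnostic output_operand_lossage ends up issuing.  */
static enum diag_kind
lossage_kind (bool in_user_asm)
{
  return in_user_asm ? DIAG_USER_ERROR : DIAG_ICE;
}

/* Guarded check as in the posted patch: a bad operand outside user asm
   is not caught at this point and printing continues.  */
static bool
guarded_reports (bool in_user_asm, bool is_address)
{
  return in_user_asm && !is_address;
}

/* Unguarded check as suggested in review: any bad operand is reported,
   and lossage_kind decides between error and ICE.  */
static bool
unguarded_reports (bool in_user_asm, bool is_address)
{
  (void) in_user_asm;
  return !is_address;
}
```

The only case where the two checks differ is a non-address operand outside user asm: the guarded form lets it through, while the unguarded form reports it, which in this model means an ICE.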

BR,
Kewen

> 
> I noticed one thing:
> what we need is to emit an error when printing an address that cannot
> be accessed directly.
> So it would be better to emit the message through 'output_operand_lossage'
> just before gcc_assert(TARGET_TOC).
> 
> Thanks a lot for your insightful comment!
> 
>>
>>> +   {
>>> + output_operand_lossage ("invalid expression as operand");
>>> + return;
>>> +   }
>>> +
>>>output_addr_const (file, x);
>>>if (small_data_operand (x, GET_MODE (x)))
>>> fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>> new file mode 100644
>>> index 000..6554a472a11
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>> @@ -0,0 +1,15 @@
>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>> +/* { dg-options "-fPIC -O2" } */
>>
>> Nit: If these two options are required, it would be good to have a comment
>> explaining it a bit when it's not obvious.
> 
> Good suggestion, thanks!
>>
>>> +
>>> +int x[2];
>>> +
>>> +int __attribute__ ((noipa))
>>> +f1 (void)
>>> +{
>>> +  int n;
>>> +  int *p = x;
>>> +  *p++;
>>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>>> +  return n;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> new file mode 100644
>>> index 000..a5ec96f29dd
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> @@ -0,0 +1,10 @@
>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>> +/* { dg-options "-fPIC -O2" } */
>>
>> Ditto.
> Thanks!
> 
> BR,
> Jeff(Jiufu) Guo
>>
>> BR,
>> Kewen
>>
>>> +
>>> +void
>>> +f (void)
>>> +{
>>> +  extern int x;
>>> +  __asm__ volatile("#%a0" ::"X"(&x));
>>> +}





Re: [PATCH 9/13] rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins
> 
> The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
> redundant.  The overloaded vec_neg built-in provides the same
> functionality.  The two built-ins are not documented nor are there any
> test cases for them.
> 
> Remove the definitions so users will use the overloaded vec_neg built-in
> which is documented in the PVIPR.

OK, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
>   __builtin_vsx_xvnegsp): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index f33564d3d9c..d65c858ac0c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1763,12 +1763,6 @@
>    const vf __builtin_vsx_xvnabssp (vf);
>      XVNABSSP vsx_nabsv4sf2 {}
>  
> -  const vd __builtin_vsx_xvnegdp (vd);
> -    XVNEGDP negv2df2 {}
> -
> -  const vf __builtin_vsx_xvnegsp (vf);
> -    XVNEGSP negv4sf2 {}
> -
>    const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
>      XVNMADDDP nfmav2df4 {}
>  



Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Jiufu Guo
Hi,

Thanks a lot for your helpful review!

"Kewen.Lin"  writes:

> Hi,
>
> on 2024/5/13 10:57, Jiufu Guo wrote:
>> Hi,
>> 
>> For PR96866, gcc prints asm code for the modifier "%a", which requires
>> an address operand, while the operand has the constraint "X", which
>> allows non-address forms.  With this patch, an error message is reported
>> to indicate the invalid asm operands.
>> 
>> Bootstrap pass on ppc64{,le}.
>> Is this ok for trunk?
>> 
>> BR,
>> Jeff(Jiufu Guo)
>> 
>>  PR target/96866
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000.cc (print_operand_address):
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/pr96866-1.c: New test.
>>  * gcc.target/powerpc/pr96866-2.c: New test.
>> 
>> ---
>>  gcc/config/rs6000/rs6000.cc  |  6 ++
>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>>  3 files changed, 31 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> 
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 117999613d8..50943d76f79 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>>    else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>>             || GET_CODE (x) == LABEL_REF)
>>      {
>> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))
>
> Do we really need this_is_asm_operands here?
I understand your point:
'print_operand_address' serves more than just user asm code, so it may
be incorrect whenever 'x' is not an 'address_operand', regardless of
this_is_asm_operands.

Here, 'this_is_asm_operands' is needed because the failure would then be
treated as a user fault in the asm code (otherwise, as an internal_error
in the compiler).

I noticed one thing:
what we need is to emit an error when printing an address that cannot
be accessed directly.
So it would be better to emit the message through 'output_operand_lossage'
just before gcc_assert(TARGET_TOC).

Thanks a lot for your insightful comment!

>
>> +{
>> +  output_operand_lossage ("invalid expression as operand");
>> +  return;
>> +}
>> +
>>output_addr_const (file, x);
>>if (small_data_operand (x, GET_MODE (x)))
>>  fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> new file mode 100644
>> index 000..6554a472a11
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> @@ -0,0 +1,15 @@
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>> +/* { dg-options "-fPIC -O2" } */
>
> Nit: If these two options are required, it would be good to have a comment
> explaining it a bit when it's not obvious.

Good suggestion, thanks!
>
>> +
>> +int x[2];
>> +
>> +int __attribute__ ((noipa))
>> +f1 (void)
>> +{
>> +  int n;
>> +  int *p = x;
>> +  *p++;
>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>> +  return n;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> new file mode 100644
>> index 000..a5ec96f29dd
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> @@ -0,0 +1,10 @@
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>> +/* { dg-options "-fPIC -O2" } */
>
> Ditto.
Thanks!

BR,
Jeff(Jiufu) Guo
>
> BR,
> Kewen
>
>> +
>> +void
>> +f (void)
>> +{
>> +  extern int x;
>> +  __asm__ volatile("#%a0" ::"X"(&x));
>> +}


Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_vperm_* built-ins
> 
> The undocumented built-ins:
>   __builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti,
>   __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df,
>   __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns,
>   __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si,
>   __builtin_vsx_vperm_4si_uns
> 
> are duplicates of the __builtin_altivec_* builtins that are used by
> the overloaded vec_perm built-in that is documented in the PVIPR.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
>   built-in definitions and comments.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
>   test cases.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 33 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c| 20 ---
>  2 files changed, 53 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 3c409d729ea..f33564d3d9c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1529,39 +1529,6 @@
>    const vf __builtin_vsx_uns_floato_v2di (vsll);
>      UNS_FLOATO_V2DI unsfloatov2di {}
>  
> -; These are duplicates of __builtin_altivec_* counterparts, and are being
> -; kept for backwards compatibility.  The reason for their existence is
> -; unclear.  TODO: Consider deprecation/removal at some point.
> -  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
> -    VPERM_16QI_X altivec_vperm_v16qi {}
> -
> -  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
> -    VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
> -
> -  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
> -    VPERM_1TI_X altivec_vperm_v1ti {}
> -
> -  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
> -    VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
> -
> -  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
> -    VPERM_2DF_X altivec_vperm_v2df {}
> -
> -  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
> -    VPERM_2DI_X altivec_vperm_v2di {}
> -
> -  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
> -    VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
> -
> -  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
> -    VPERM_4SF_X altivec_vperm_v4sf {}
> -
> -  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
> -    VPERM_4SI_X altivec_vperm_v4si {}
> -
> -  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
> -    VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
> -
>    const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
>      VPERM_8HI_X altivec_vperm_v8hi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 01f35dad713..35ea31b2616 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -2,7 +2,6 @@
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> -/* { dg-final { scan-assembler "vperm" } } */
>  /* { dg-final { scan-assembler "xvrdpi" } } */
>  /* { dg-final { scan-assembler "xvrdpic" } } */
>  /* { dg-final { scan-assembler "xvrdpim" } } */
> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
>  extern __vector __bool long bl[][4];
>  #endif
>  
> -int do_perm(void)
> -{
> -  int i = 0;
> -
> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  return i;
> -}
> -

I'd prefer to just replace these __builtin_vsx_vperm calls with vec_perm.
OK with this tweaked (also keep the vperm scan removed above), thanks!

BR,
Kewen
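
For readers unfamiliar with vec_perm, here is a minimal scalar sketch of what
it computes: the low 5 bits of each selector byte index into the 32-byte
concatenation of the two source vectors.  The type bytes16 and the function
names are hypothetical stand-ins for illustration, not GCC or altivec.h names,
and the model ignores how the hardware instruction is adjusted per endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a 16-byte vector; not a GCC type.  */
typedef struct { uint8_t b[16]; } bytes16;

/* Model of vec_perm (a, b, sel): each result byte is picked from the
   32-byte concatenation of a and b by the low 5 bits of sel's byte.  */
bytes16
model_vec_perm (bytes16 a, bytes16 b, bytes16 sel)
{
  bytes16 r;
  for (int i = 0; i < 16; i++)
    {
      unsigned idx = sel.b[i] & 0x1f;
      r.b[i] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
  return r;
}

/* Small self-check of the model.  */
void
model_vec_perm_demo (void)
{
  bytes16 a, b, sel, r;
  for (int i = 0; i < 16; i++)
    {
      a.b[i] = (uint8_t) i;           /* 0..15 */
      b.b[i] = (uint8_t) (100 + i);   /* 100..115 */
      sel.b[i] = (uint8_t) (31 - i);  /* reverse the concatenation */
    }
  r = model_vec_perm (a, b, sel);
  assert (r.b[0] == 115);   /* selector 31 picks b[15] */
  assert (r.b[15] == 100);  /* selector 16 picks b[0] */

  sel.b[0] = 0x20 | 3;      /* bits above the low 5 are ignored */
  r = model_vec_perm (a, b, sel);
  assert (r.b[0] == 3);     /* index 3 picks a[3] */
}
```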

>  int do_xxperm (void)
>  {
>int i 

Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove the vec_xxsel built-ins, they are duplicates
> 
> The following undocumented built-ins are covered by the existing overloaded
> vec_sel built-in definitions.
> 
>   const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)
> 
>   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)
> 
>   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)
> 
>   const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)
> 
>   const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)
> 
>   const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)
> 
>   const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)
> 
>   const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)
> 
>   const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)
> 
>   const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)
> 
> This patch removed the duplicate built-in definitions so users will only
> use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
> 16qi, 4sf, 2df] tests are also removed.
> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrglw_4si,

Typo: __builtin_vsx_xxmrglw_4si, which doesn't belong to this patch.

>   __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns,
>   __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di,
>   __builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf,
>   __builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns,
>   __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Remove
>   built-in definitions.
> 
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
> __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
> __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
> cases for removed built-ins.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 30 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c| 26 
>  2 files changed, 56 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 46d2ae7b7cb..3c409d729ea 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1925,36 +1925,6 @@
>const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
>  XXPERMDI_8HI vsx_xxpermdi_v8hi {}
>  
> -  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> -XXSEL_16QI vector_select_v16qi {}
> -
> -  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> -XXSEL_16QI_UNS vector_select_v16qi_uns {}
> -
> -  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> -XXSEL_2DF vector_select_v2df {}
> -
> -  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> -XXSEL_2DI vector_select_v2di {}
> -
> -  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> -XXSEL_2DI_UNS vector_select_v2di_uns {}
> -
> -  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> -XXSEL_4SF vector_select_v4sf {}
> -
> -  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> -XXSEL_4SI vector_select_v4si {}
> -
> -  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> -XXSEL_4SI_UNS vector_select_v4si_uns {}
> -
> -  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> -XXSEL_8HI vector_select_v8hi {}
> -
> -  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> -XXSEL_8HI_UNS vector_select_v8hi_uns {}
> -
>const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
>  XXSLDWI_16QI vsx_xxsldwi_v16qi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index ff875c55304..01f35dad713 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -2,7 +2,6 @@
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> -/* { dg-final { scan-assembler "xxsel" } } */
>  /* { dg-final { scan-assembler "vperm" } } */
>  /* { dg-final { scan-assembler "xvrdpi" } } */
>  /* { dg-final { scan-assembler "xvrdpic" } } */
> @@ -57,31 +56,6 @@ extern __vector unsigned long long ull[][4];
>  extern __vector __bool long 

Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, add overloaded vec_sel with int128 arguments
> 
> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
> and return a signed/unsigned int128 result.
> 
> Extending the vec_sel built-in makes the existing built-ins
> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
> patch removes these built-ins.
> 
> The patch adds documentation and test cases for the new overloaded vec_sel
> built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>   __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>   * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>   definitions.
>   * doc/extend.texi: Add documentation for new vec_sel arguments.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |  6 --
>  gcc/config/rs6000/rs6000-overload.def |  4 +
>  gcc/doc/extend.texi   | 14 
>  .../powerpc/vec-sel-runnable-i128.c   | 84 +++
>  4 files changed, 102 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index d09e21a9151..46d2ae7b7cb 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1931,12 +1931,6 @@
>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>  
> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
> -XXSEL_1TI vector_select_v1ti {}
> -
> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
> -
>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>  XXSEL_2DF vector_select_v2df {}
>  
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 68501c05289..5912c9452f4 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3274,6 +3274,10 @@
>  VSEL_2DF  VSEL_2DF_B
>vd __builtin_vec_sel (vd, vd, vull);
>  VSEL_2DF  VSEL_2DF_U
> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
> +VSEL_1TI  VSEL_1TI_S
> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
> +VSEL_1TI_UNS  VSEL_1TI_U
>  ; The following variants are deprecated.
>vsll __builtin_vec_sel (vsll, vsll, vsll);
>  VSEL_2DI_B  VSEL_2DI_S
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 64a43b55e2d..86b8e536dbe 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the 
> endianness issues involved
>  with the first argument and the result.
>  @findex vec_replace_unaligned
>  
> +Vector select
> +
> +@smallexample
> +vector signed __int128 vec_sel (vector signed __int128,
> +   vector signed __int128, vector signed __int128);
> +vector unsigned __int128 vec_sel (vector unsigned __int128,
> +   vector unsigned __int128, vector unsigned __int128);
> +@end smallexample
> +
> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128
> +arguments and returns a vector selecting bits from the two source vectors 
> based
> +on the values of the third input vector.  This built-in is an extension of 
> the
> +@code{vec_sel} built-in documented in the PVIPR.
> +

Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't really
require this support.  The instances used, VSEL_1TI and VSEL_1TI_UNS, are placed
in the altivec stanza, so it looks like we should put it under the section
"PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an extension
of @code{vec_sel} documented in the PVIPR, I'd prefer to just mention it's "an
extension of the @code{vec_sel} built-in documented in the PVIPR" and omit
the description, to avoid possibly slightly different wording.
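
As a reminder of the semantics being extended, vec_sel is a bitwise select:
where a bit of the third operand is 0 the result bit comes from the first
operand, and where it is 1 it comes from the second.  A minimal scalar sketch
for one 128-bit element follows; u128_model and the function names are
hypothetical and only model the operation with two 64-bit halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector element.  */
typedef struct { uint64_t hi, lo; } u128_model;

/* Bitwise select: mask bit 0 takes the bit from a, mask bit 1 from b.  */
u128_model
model_vec_sel (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;
  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}

/* Small self-check of the model.  */
void
model_vec_sel_demo (void)
{
  u128_model a = { 0x1111111111111111ULL, 0x2222222222222222ULL };
  u128_model b = { 0xaaaaaaaaaaaaaaaaULL, 0xbbbbbbbbbbbbbbbbULL };
  u128_model m = { 0xffffffff00000000ULL, 0x0ULL };
  u128_model r = model_vec_sel (a, b, m);

  assert (r.hi == 0xaaaaaaaa11111111ULL);  /* upper half of hi from b */
  assert (r.lo == 0x2222222222222222ULL);  /* zero mask keeps a */
}
```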

>  Vector Shift Left Double Bit Immediate
>  @smallexample
>  @exdent vector signed char vec_sldb (vector signed char, vector signed char,
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> new file mode 100644
> index 000..58eb383e8c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> @@ -0,0 +1,84 @@
> +/* { dg-do run  { target power10_hw }} */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target power10_hw } */

As mentioned above, this doesn't require power10; you can specify vmx_hw
(btw, also remove { target power10_hw } from the dg-do run line).

> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

s/-mdejagnu-cpu=power10/-maltivec/
s/-save-temps//

> +
> +
> +#include 
> +
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include 
> +void 

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Jiufu Guo
Hi,

Thanks for your helpful comments!

Segher Boessenkool  writes:

> Hi!
>
> On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote:
>> For PR96866, when gcc print asm code for modifier "%a" which requires
>> an address operand,
>
> It requires a *memory* operand, and it outputs its address.  This is a
> generic modifier btw (not rs6000).
Oh, yes, it outputs the operand's address.  I will update the wording to
something like: "which requires an addressable operand".

>
>> while the operand is with the constraint "X" which
>> allow non-address form.  An error message would be reported to indicate
>> the invalid asm operands.
>
> "non-address form"?  Every mem has an address.
>
> But 'X' is not memory.  What is it at all?  Why do we use that when you
> *have to* have mem here?
"X" allows anything.  This is the reason why the code is *invalid*.
Other constraints ("r"/"m") would be better than "X" for "%a".

>
> The code you add that tests for address_operand looks wrong.  I would
> expect it to test the operand is memory, instead :-)
I understand your concern.  There is a tricky point, though:
before invoking print_operand_address/output_address, the original
operand (which would be a 'mem') is stripped to its address.
So 'address_operand' is tested for print_operand_address in targets.

I also wonder whether "address_operand" is really needed, because
under the condition:
```
  else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
   || GET_CODE (x) == LABEL_REF)
{
```
'x' is already known: it can only be a SYMBOL_REF, LABEL_REF or CONST.
I will update the patch for this.

Thanks for your comments.

BR,
Jeff(Jiufu) Guo

>
>
> Segher


RE: [PATCH] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-13 Thread Hu, Lin1
Do you have any advice?

BRs,
Lin

-Original Message-
From: Hu, Lin1  
Sent: Wednesday, May 8, 2024 9:38 AM
To: gcc-patches@gcc.gnu.org
Cc: Liu, Hongtao ; ubiz...@gmail.com
Subject: [PATCH] vect: generate suitable convert insn for int -> int, float -> 
float and int <-> float.

Hi, all

This patch aims to optimize __builtin_convertvector so that it can generate
more efficient insns in some situations, such as v2si -> v2di.
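
For context, a minimal sketch of the kind of source this affects.  It uses the
generic vector extension and __builtin_convertvector, assuming a compiler that
provides both (recent GCC or Clang); the typedef and function names are
hypothetical and are not the ones used in the new tests.

```c
#include <assert.h>

/* Hypothetical typedefs built on the generic vector extension.  */
typedef long long v2di_t __attribute__ ((vector_size (16)));
typedef int v2si_t __attribute__ ((vector_size (8)));
typedef float v2sf_t __attribute__ ((vector_size (8)));

/* int -> int, narrowing: v2di -> v2si.  */
v2si_t
narrow_v2di (v2di_t x)
{
  return __builtin_convertvector (x, v2si_t);
}

/* int -> int, widening: v2si -> v2di.  */
v2di_t
widen_v2si (v2si_t x)
{
  return __builtin_convertvector (x, v2di_t);
}

/* int -> float: v2si -> v2sf.  */
v2sf_t
v2si_to_v2sf (v2si_t x)
{
  return __builtin_convertvector (x, v2sf_t);
}

/* Element-wise results match the corresponding scalar conversions.  */
void
convertvector_demo (void)
{
  v2di_t wide = { 1, -2 };
  v2si_t narrow = narrow_v2di (wide);
  assert (narrow[0] == 1 && narrow[1] == -2);

  v2si_t ints = { 7, -8 };
  v2di_t widened = widen_v2si (ints);
  assert (widened[0] == 7 && widened[1] == -8);

  v2sf_t floats = v2si_to_v2sf (ints);
  assert (floats[0] == 7.0f && floats[1] == -8.0f);
}
```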

The patch has been bootstrapped and regtested on x86_64-pc-linux-gnu, OK for 
trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/107432
* tree-vect-generic.cc (expand_vector_conversion): Support
convert for int -> int, float -> float and int <-> float.
(expand_vector_conversion_no_vec_pack): Check if can convert
int <-> int, float <-> float and int <-> float, directly.
Support indirect convert, when direct optab is not supported.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-1.c: New test.
* gcc.target/i386/pr107432-2.c: Ditto.
* gcc.target/i386/pr107432-3.c: Ditto.
* gcc.target/i386/pr107432-4.c: Ditto.
* gcc.target/i386/pr107432-5.c: Ditto.
* gcc.target/i386/pr107432-6.c: Ditto.
* gcc.target/i386/pr107432-7.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 +
 gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 +
 gcc/testsuite/gcc.target/i386/pr107432-3.c |  55 +
 gcc/testsuite/gcc.target/i386/pr107432-4.c |  56 +
 gcc/testsuite/gcc.target/i386/pr107432-5.c |  72 +++
 gcc/testsuite/gcc.target/i386/pr107432-6.c | 139
 gcc/testsuite/gcc.target/i386/pr107432-7.c | 156 ++
 gcc/tree-vect-generic.cc                   | 107 +-
 8 files changed, 918 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c

diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
b/gcc/testsuite/gcc.target/i386/pr107432-1.c
new file mode 100644
index 000..a4f37447eb4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
@@ -0,0 +1,234 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */
+/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
+
+#include 
+
+typedef short __v2hi __attribute__ ((__vector_size__ (4)));
+typedef char __v2qi __attribute__ ((__vector_size__ (2)));
+typedef char __v4qi __attribute__ ((__vector_size__ (4)));
+typedef char __v8qi __attribute__ ((__vector_size__ (8)));
+
+typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4)));
+typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8)));
+typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2)));
+typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4)));
+typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8)));
+typedef unsigned int __v2su __attribute__ ((__vector_size__ (8)));
+
+__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2si);
+}
+
+__m128i mm256_cvtepi64_epi32_builtin_convertvector(__m256i a)
+{
+  return (__m128i)__builtin_convertvector((__v4di)a, __v4si);
+}
+
+__m256i mm512_cvtepi64_epi32_builtin_convertvector(__m512i a)
+{
+  return (__m256i)__builtin_convertvector((__v8di)a, __v8si);
+}
+
+__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2hi);
+}
+
+__v4hi mm256_cvtepi64_epi16_builtin_convertvector(__m256i a)
+{
+  return __builtin_convertvector((__v4di)a, __v4hi);
+}
+
+__m128i mm512_cvtepi64_epi16_builtin_convertvector(__m512i a)
+{
+  return (__m128i)__builtin_convertvector((__v8di)a, __v8hi);
+}
+
+__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a)
+{
+  return __builtin_convertvector((__v2di)a, __v2qi);
+}
+
+__v4qi mm256_cvtepi64_epi8_builtin_convertvector(__m256i a)
+{
+  return __builtin_convertvector((__v4di)a, __v4qi);
+}
+
+__v8qi mm512_cvtepi64_epi8_builtin_convertvector(__m512i a)
+{
+  return __builtin_convertvector((__v8di)a, __v8qi);
+}

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-13 Thread HAO CHEN GUI
Hi Aldy,
  Thanks for your review comments.

On 2024/5/13 19:18, Aldy Hernandez wrote:
> On Thu, May 9, 2024 at 10:05 AM Mikael Morin  wrote:
>>
>> Hello,
>>
>> On 07/05/2024 at 04:37, HAO CHEN GUI wrote:
>>> Hi,
>>>The former patch adds isfinite optab for __builtin_isfinite.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
>>>
>>>Thus the builtin might not be folded at front end. The range op for
>>> isfinite is needed for value range analysis. This patch adds them.
>>>
>>>Compared to last version, this version fixes a typo.
>>>
>>>Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> Value Range: Add range op for builtin isfinite
>>>
>>> The former patch adds optab for builtin isfinite. Thus builtin isfinite 
>>> might
>>> not be folded at front end.  So the range op for isfinite is needed for 
>>> value
>>> range analysis.  This patch adds range op for builtin isfinite.
>>>
>>> gcc/
>>>   * gimple-range-op.cc (class cfn_isfinite): New.
>>>   (op_cfn_finite): New variables.
>>>   (gimple_range_op_handler::maybe_builtin_call): Handle
>>>   CFN_BUILT_IN_ISFINITE.
>>>
>>> gcc/testsuite/
>>>   * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.
>>>
>>> patch.diff
>>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
>>> index 9de130b4022..99c511728d3 100644
>>> --- a/gcc/gimple-range-op.cc
>>> +++ b/gcc/gimple-range-op.cc
>>> @@ -1192,6 +1192,56 @@ public:
>>> }
>>>   } op_cfn_isinf;
>>>
>>> +//Implement range operator for CFN_BUILT_IN_ISFINITE
>>> +class cfn_isfinite : public range_operator
>>> +{
>>> +public:
>>> +  using range_operator::fold_range;
>>> +  using range_operator::op1_range;
>>> +  virtual bool fold_range (irange &r, tree type, const frange &op1,
>>> +const irange &, relation_trio) const override
>>> +  {
>>> +if (op1.undefined_p ())
>>> +  return false;
>>> +
>>> +if (op1.known_isfinite ())
>>> +  {
>>> + r.set_nonzero (type);
>>> + return true;
>>> +  }
>>> +
>>> +if (op1.known_isnan ()
>>> + || op1.known_isinf ())
>>> +  {
>>> + r.set_zero (type);
>>> + return true;
>>> +  }
>>> +
>>> +return false;
>> I think the canonical API behaviour sets R to varying and returns true
>> instead of just returning false if nothing is known about the range.
> 
> Correct.  If we know it's varying, we just set varying and return
> true.  Returning false is usually reserved for "I have no idea".
> However, every caller of fold_range() should know to ignore a return
> of false, so you should be safe.

So it's better to set varying here and return true?
> 
>>
>> I'm not sure whether it makes any difference; Aldy can probably tell.
>> But if the type is bool, varying is [0,1] which is better than unknown
>> range.
> 
> Also, I see you're setting zero/nonzero.  Is the return type known to
> be boolean, because if so, we usually prefer to one of:
The return type is int. For __builtin_isfinite, the result is nonzero when
the float is a finite number, 0 otherwise.
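
To make the three outcomes under discussion concrete, here is a toy model of
the fold.  It is only a sketch: toy_frange, toy_result and toy_fold_isfinite
are hypothetical names, the range representation is far simpler than GCC's
frange, and the "nothing known" case is modelled as varying, as suggested in
the review, rather than as a failed fold.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical toy range: [lo, hi] over non-NaN values, plus NaN flags.
   This is not GCC's frange.  */
typedef struct
{
  double lo, hi;
  bool maybe_nan;  /* NaN is possible in addition to [lo, hi].  */
  bool only_nan;   /* The value is known to be NaN; lo/hi are ignored.  */
} toy_frange;

typedef enum { TOY_ZERO, TOY_NONZERO, TOY_VARYING } toy_result;

/* Model of folding isfinite (op1) given what is known about op1.  */
toy_result
toy_fold_isfinite (toy_frange op1)
{
  if (op1.only_nan)
    return TOY_ZERO;     /* known NaN: never finite */
  if (!op1.maybe_nan && isfinite (op1.lo) && isfinite (op1.hi))
    return TOY_NONZERO;  /* known finite */
  if (!op1.maybe_nan && isinf (op1.lo) && op1.lo == op1.hi)
    return TOY_ZERO;     /* known to be one specific infinity */
  return TOY_VARYING;    /* nothing known: result may be 0 or nonzero */
}

/* Small self-check of the model.  */
void
toy_fold_isfinite_demo (void)
{
  toy_frange finite = { 1.0, 2.0, false, false };
  toy_frange pos_inf = { INFINITY, INFINITY, false, false };
  toy_frange nan_only = { 0.0, 0.0, false, true };
  toy_frange half_open = { 1.0, INFINITY, false, false };
  toy_frange finite_or_nan = { 1.0, 2.0, true, false };

  assert (toy_fold_isfinite (finite) == TOY_NONZERO);
  assert (toy_fold_isfinite (pos_inf) == TOY_ZERO);
  assert (toy_fold_isfinite (nan_only) == TOY_ZERO);
  assert (toy_fold_isfinite (half_open) == TOY_VARYING);
  assert (toy_fold_isfinite (finite_or_nan) == TOY_VARYING);
}
```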

> 
> r = range_true ()
> r = range_false ()
> r = range_true_and_false ();
> 
> It doesn't matter either way, but it's probably best to use these as
> they force boolean_type_node automatically.
> 
> I don't have a problem with this patch, but I would prefer the
> floating point savvy people to review this, as there are no members of
> the ranger team that are floating point experts :).
> 
> Also, I see you mention in your original post that this patch was
> needed as a follow-up to this one:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
> 
> I don't see the above patch in the source tree currently:
Sorry, I may not have expressed it clearly.  I sent a series of patches for
review, and some patches depend on others.  The patch I mentioned is also
under review.

Here is the list of the series of patches. Some of them are generic, and
others are rs6000 specific.

[PATCH] Value Range: Add range op for builtin isinf
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html

[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode
[PR97786]
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf
on double [PR97786]
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648433.html

[PATCH] Optab: add isfinite_optab for __builtin_isfinite
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

[PATCHv2] Value range: Add range op for __builtin_isfinite
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html

[PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode
[PR97786]
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html

[PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to
builtin_isfinite on double [PR97786]

Re: [PATCH 5/13] rs6000, remove duplicated built-ins of vecmergl and vec_mergeh

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, remove duplicated built-ins of vecmergl and vec_mergeh
> 
> The following undocumented built-ins are same as existing documented
> overloaded builtins.
> 
>   const vf __builtin_vsx_xxmrghw (vf, vf);
> same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)
> 
>   const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
> same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)
> 
>   const vf __builtin_vsx_xxmrglw (vf, vf);
> same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)
> 
>   const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
> same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)
> 
> This patch removes the duplicate built-in definitions so only the
> documented built-ins will be available for use.  The case statements in
> rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
> patch removes the now unused define_expands for vsx_xxmrghw_ and
> vsx_xxmrglw_.

Ok for trunk, thanks!

BR,
Kewen
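
For reference, a minimal scalar sketch of what the retained vec_mergeh and
vec_mergel compute on four-word vectors.  The mergeh order matches the
(0 4 1 5) selection in the removed vsx_xxmrghw expander; words4 and the
function names are hypothetical stand-ins, not GCC types or built-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a vector of four 32-bit words.  */
typedef struct { int32_t w[4]; } words4;

/* Interleave the first two elements of a and b: a0 b0 a1 b1.  */
words4
model_vec_mergeh (words4 a, words4 b)
{
  words4 r = { { a.w[0], b.w[0], a.w[1], b.w[1] } };
  return r;
}

/* Interleave the last two elements of a and b: a2 b2 a3 b3.  */
words4
model_vec_mergel (words4 a, words4 b)
{
  words4 r = { { a.w[2], b.w[2], a.w[3], b.w[3] } };
  return r;
}

/* Small self-check of the model.  */
void
model_merge_demo (void)
{
  words4 a = { { 0, 1, 2, 3 } };
  words4 b = { { 10, 11, 12, 13 } };

  words4 h = model_vec_mergeh (a, b);
  assert (h.w[0] == 0 && h.w[1] == 10 && h.w[2] == 1 && h.w[3] == 11);

  words4 l = model_vec_mergel (a, b);
  assert (l.w[0] == 2 && l.w[1] == 12 && l.w[2] == 3 && l.w[3] == 13);
}
```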

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
>   __builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
>   __builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
>   built-in definition.
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
>   remove case entries RS6000_BIF_XXMRGLW_4SI,
>   RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
>   RS6000_BIF_XXMRGHW_4SF.
>   * config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_):
>   Remove unused define_expands.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 
>  gcc/config/rs6000/vsx.md  | 41 ---
>  3 files changed, 57 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index ac9f16fe51a..f83d65b06ef 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  /* vec_mergel (integrals).  */
>  case RS6000_BIF_VMRGLH:
>  case RS6000_BIF_VMRGLW:
> -case RS6000_BIF_XXMRGLW_4SI:
>  case RS6000_BIF_VMRGLB:
>  case RS6000_BIF_VEC_MERGEL_V2DI:
> -case RS6000_BIF_XXMRGLW_4SF:
>  case RS6000_BIF_VEC_MERGEL_V2DF:
>fold_mergehl_helper (gsi, stmt, 1);
>return true;
>  /* vec_mergeh (integrals).  */
>  case RS6000_BIF_VMRGHH:
>  case RS6000_BIF_VMRGHW:
> -case RS6000_BIF_XXMRGHW_4SI:
>  case RS6000_BIF_VMRGHB:
>  case RS6000_BIF_VEC_MERGEH_V2DI:
> -case RS6000_BIF_XXMRGHW_4SF:
>  case RS6000_BIF_VEC_MERGEH_V2DF:
>fold_mergehl_helper (gsi, stmt, 0);
>return true;
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 5b7237a2327..d09e21a9151 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1904,18 +1904,6 @@
>const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
>  XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
>  
> -  const vf __builtin_vsx_xxmrghw (vf, vf);
> -XXMRGHW_4SF vsx_xxmrghw_v4sf {}
> -
> -  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
> -XXMRGHW_4SI vsx_xxmrghw_v4si {}
> -
> -  const vf __builtin_vsx_xxmrglw (vf, vf);
> -XXMRGLW_4SF vsx_xxmrglw_v4sf {}
> -
> -  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
> -XXMRGLW_4SI vsx_xxmrglw_v4si {}
> -
>const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
>  XXPERMDI_16QI vsx_xxpermdi_v16qi {}
>  
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 3d39ae7995f..26560ecc38a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4810,47 +4810,6 @@
>  }
>[(set_attr "type" "vecperm")])
>  
> -;; V4SF/V4SI interleave
> -(define_expand "vsx_xxmrghw_"
> -  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> -(vec_select:VSX_W
> -   (vec_concat:
> - (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> - (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
> -   (parallel [(const_int 0) (const_int 4)
> -  (const_int 1) (const_int 5)])))]
> -  "VECTOR_MEM_VSX_P (mode)"
> -{
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
> -  : gen_altivec_vmrglw_direct_;
> -  if (!BYTES_BIG_ENDIAN)
> -std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> -  DONE;
> -}
> -  [(set_attr "type" "vecperm")])
> -
> -(define_expand "vsx_xxmrglw_"
> -  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> - (vec_select:VSX_W
> -   (vec_concat:
> - (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> - (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
> -   (parallel [(const_int 2) 

RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Li, Pan2


> That's just a matter of matching the overflow as an additional case, no?
> i.e. you can add an overload for unsigned_integer_sat_add matching the
> IFN_ADD_OVERFLOW and using the realpart and imagpart helpers.

> I think that would be better as it avoids visiting all the statements twice
> but also extends the matching to some __builtin_add_overflow uses and should
> be fairly simple.

Thanks Tamar, got the point; I will try an overload of
unsigned_integer_sat_add for that.

> Yeah, I think that's better than iterating over the statements twice.  It
> also fits better in the existing code.

Ack, will follow the existing code.

Pan
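
The source-level shapes discussed in this thread can be sketched side by side
as below, using uint8_t so the equivalence can be checked exhaustively.  This
is only an illustration: the function names are hypothetical, and the third
form is the kind of __builtin_add_overflow use that an overload matching
.ADD_OVERFLOW might additionally cover.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form from the patch description.  */
uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | -(uint8_t) (sum < x));
}

/* Branch form (the SAT_ADD_U_1 shape quoted in the thread).  */
uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) >= x ? (uint8_t) (x + y) : (uint8_t) -1;
}

/* Overflow-flag form written with __builtin_add_overflow.  */
uint8_t
sat_add_u8_overflow (uint8_t x, uint8_t y)
{
  uint8_t sum;
  return __builtin_add_overflow (x, y, &sum) ? UINT8_MAX : sum;
}

/* Check all three forms against a widened reference for every input.  */
void
sat_add_demo (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        unsigned wide = x + y;
        uint8_t want = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
        assert (sat_add_u8_branchless ((uint8_t) x, (uint8_t) y) == want);
        assert (sat_add_u8_branch ((uint8_t) x, (uint8_t) y) == want);
        assert (sat_add_u8_overflow ((uint8_t) x, (uint8_t) y) == want);
      }
}
```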


-Original Message-
From: Tamar Christina  
Sent: Monday, May 13, 2024 11:03 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

> 
> Thanks Tamer for comments.
> 
> > I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when
> optimizing for size.
> 
> Sure thing, let me update it in v5.
> 
> > Hmm why do you iterate independently over the statements? The block below
> already visits
> > Every statement doesn't it?
> 
> Because it will hit .ADD_OVERFLOW first, then it will never hit SAT_ADD as the
> shape changed, or shall we put it to the previous pass ?
> 

That's just a matter of matching the overflow as an additional case, no?
i.e. you can add an overload for unsigned_integer_sat_add matching the
IFN_ADD_OVERFLOW and using the realpart and imagpart helpers.

I think that would be better as it avoids visiting all the statements twice
but also extends the matching to some __builtin_add_overflow uses and should
be fairly simple.

> > The root of your match is a BIT_IOR_EXPR expression, so I think you just 
> > need to
> change the entry below to:
> >
> > case BIT_IOR_EXPR:
> >   match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> >   /* fall-through */
> > case BIT_XOR_EXPR:
> >   match_uaddc_usubc (&gsi, stmt, code);
> >   break;
> 
> There are other shapes (not covered in this patch) of SAT_ADD like below 
> branch
> version, the IOR should be one of the ROOT. Thus doesn't
> add case here.  Then, shall we take case for each shape here ? Both works for 
> me.
> 

Yeah, I think that's better than iterating over the statements twice.  It
also fits better in the existing code.

Tamar.

> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint32_t)
> 
> Pan
> 
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Monday, May 13, 2024 5:10 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Monday, May 6, 2024 3:48 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating add, i.e.
> > the result of the add is set to the max on overflow.
> > It matches a pattern similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > We perform the transform during widen_mult because the sub-expr of
> > SAT_ADD will 
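
For readers who want to check the quoted pattern by hand, here is a minimal C sketch of the branchless form (x + y) | (-(TYPE)((TYPE)(x + y) < x)). The helper names sat_add_u8 and check_sat_add are made up for illustration and are not part of the patch; the uint8_t cases are the ones listed in the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless unsigned saturating add for uint8_t, following the
   quoted pattern.  Hypothetical helper, not taken from the patch.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);      /* wraps modulo 256 */
  uint8_t mask = (uint8_t) -(sum < x);  /* 0xff on overflow, else 0 */
  return (uint8_t) (sum | mask);
}

/* Same shape as the uint64_t function in the patch description.  */
static uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* The uint8_t examples from the description, plus one non-overflowing
   case and one 64-bit case.  */
static void
check_sat_add (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
  assert (sat_add_u8 (3, 4) == 7);
  assert (sat_add_u64 (UINT64_MAX, 1) == UINT64_MAX);
}
```

The comparison (sum < x) is true exactly when the unsigned add wrapped, so negating it gives an all-ones mask that forces the result to the type's maximum.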

[committed] RISC-V: Fix format issue for trailing operator [NFC]

2024-05-13 Thread pan2 . li
From: Pan Li 

This patch fixes the trailing-operator format issues below.

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & RVV_REQUIRE_ELEN_64) &&

Passed ./contrib/check_GNU_style.sh for this patch, and double-checked
that the original patch has no other format issues.

Committed as format change.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc
(validate_instance_type_required_extensions): Move the trailing
operator to the start of the next line.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-vector-builtins.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 3fdb4400d70..c08d87a2680 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4638,8 +4638,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
 {
   uint64_t exts = type.required_extensions;
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
-!TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_16)
+&& !TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
 {
   error_at (EXPR_LOCATION (exp),
"built-in function %qE requires the "
@@ -4648,8 +4648,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
   return false;
 }
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
-!TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_32)
+&& !TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
 {
   error_at (EXPR_LOCATION (exp),
"built-in function %qE requires the "
@@ -4658,8 +4658,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
   return false;
 }
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
-!TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_64)
+&& !TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
 {
   error_at (EXPR_LOCATION (exp),
"built-in function %qE requires the zve64d or v ISA extension",
@@ -4667,8 +4667,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
   return false;
 }
 
-  if ((exts & RVV_REQUIRE_ELEN_64) &&
-!TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_64)
+&& !TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
 {
   error_at (EXPR_LOCATION (exp),
"built-in function %qE requires the "
-- 
2.34.1



RE: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

2024-05-13 Thread Li, Pan2
Ack, thanks Jeff and will fix it ASAP.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Tuesday, May 14, 2024 2:10 AM
To: Li, Pan2 ; Kito Cheng ; 
juzhe.zh...@rivai.ai
Cc: gcc-patches 
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 
scalar



On 5/13/24 9:00 AM, Li, Pan2 wrote:
> Committed, thanks Juzhe and Kito. Let's wait for a while before backporting
> to 14.
Could you fix the formatting nits caught by the CI linter?

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & RVV_REQUIRE_ELEN_64) &&


The "&&" needs to come down to the next line, indented like

if ((exts & RVV_REQUIRE_ELEN_FP_16)
    && !TARGET_VECTOR_.)

I.e., the "&&" indents just inside the first open paren.  It looks like
all the conditions in validate_instance_type_required_extensions need to
be fixed in a similar manner.

Given this is NFC, just post it for the archiver.  No need to wait on 
review.

Jeff
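
To make the formatting rule concrete, here is a small self-contained sketch in GNU style. The flag names (REQUIRE_FP_16, REQUIRE_FP_32) and the function extension_missing_p are hypothetical stand-ins, not the real RVV macros; only the line-breaking convention is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real requirement flags.  */
#define REQUIRE_FP_16 (1u << 0)
#define REQUIRE_FP_32 (1u << 1)

/* Return true if EXTS asks for something that ENABLED does not provide.  */
static bool
extension_missing_p (uint64_t exts, uint64_t enabled)
{
  /* GNU style: break before the binary operator, so "&&" starts the
     continuation line, aligned just inside the opening parenthesis.  */
  if ((exts & REQUIRE_FP_16)
      && !(enabled & REQUIRE_FP_16))
    return true;

  if ((exts & REQUIRE_FP_32)
      && !(enabled & REQUIRE_FP_32))
    return true;

  return false;
}

/* A few example calls.  */
static void
check_extension_missing (void)
{
  assert (extension_missing_p (REQUIRE_FP_16, 0));
  assert (!extension_missing_p (REQUIRE_FP_16, REQUIRE_FP_16));
  assert (!extension_missing_p (0, 0));
}
```

Putting the operator at the start of the continuation line is what contrib/check_GNU_style.sh checks for when it reports a "trailing operator".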




[PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-13 Thread Pengxuan Zheng
This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
optimization opportunities for gimple optimizers.

While we are here, we also remove the vget_low_* definitions from arm_neon.h and
use the new intrinsics framework.

PR target/102171

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS):
New macro to create definitions for all vget_low intrinsics.
(VGET_LOW_BUILTIN): Likewise.
(enum aarch64_builtins): Add vget_low function codes.
(aarch64_general_fold_builtin): Fold vget_low calls.
* config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
* config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
(aarch64_vget_lo_halfv8bf): Likewise.
* config/aarch64/arm_neon.h (__attribute__): Delete.
(vget_low_f16): Likewise.
(vget_low_f32): Likewise.
(vget_low_f64): Likewise.
(vget_low_p8): Likewise.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_s8): Likewise.
(vget_low_s16): Likewise.
(vget_low_s32): Likewise.
(vget_low_s64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.
(vget_low_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi
with vget_low_s16.
* gcc.target/aarch64/vget_low_2.c: New test.
* gcc.target/aarch64/vget_low_2_be.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-builtins.cc|  60 ++
 gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
 gcc/config/aarch64/aarch64-simd.md|  23 +---
 gcc/config/aarch64/arm_neon.h | 105 --
 gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
 gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
 .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
 7 files changed, 124 insertions(+), 132 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 75d21de1401..4afe7c86ae3 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
   VREINTERPRET_BUILTINS \
   VREINTERPRETQ_BUILTINS
 
+#define AARCH64_SIMD_VGET_LOW_BUILTINS \
+  VGET_LOW_BUILTIN(f16) \
+  VGET_LOW_BUILTIN(f32) \
+  VGET_LOW_BUILTIN(f64) \
+  VGET_LOW_BUILTIN(p8) \
+  VGET_LOW_BUILTIN(p16) \
+  VGET_LOW_BUILTIN(p64) \
+  VGET_LOW_BUILTIN(s8) \
+  VGET_LOW_BUILTIN(s16) \
+  VGET_LOW_BUILTIN(s32) \
+  VGET_LOW_BUILTIN(s64) \
+  VGET_LOW_BUILTIN(u8) \
+  VGET_LOW_BUILTIN(u16) \
+  VGET_LOW_BUILTIN(u32) \
+  VGET_LOW_BUILTIN(u64) \
+  VGET_LOW_BUILTIN(bf16)
+
 typedef struct
 {
   const char *name;
@@ -697,6 +714,9 @@ typedef struct
 #define VREINTERPRET_BUILTIN(A, B, L) \
   AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
 
+#define VGET_LOW_BUILTIN(A) \
+  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   AARCH64_SIMD_BUILTIN_##T##_##N##A,
@@ -732,6 +752,7 @@ enum aarch64_builtins
   AARCH64_CRC32_BUILTIN_MAX,
   /* SIMD intrinsic builtins.  */
   AARCH64_SIMD_VREINTERPRET_BUILTINS
+  AARCH64_SIMD_VGET_LOW_BUILTINS
   /* ARMv8.3-A Pointer Authentication Builtins.  */
   AARCH64_PAUTH_BUILTIN_AUTIA1716,
   AARCH64_PAUTH_BUILTIN_PACIA1716,
@@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = {
  && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
   },
 
+#undef VGET_LOW_BUILTIN
+#define VGET_LOW_BUILTIN(A) \
+  {"vget_low_" #A, \
+   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
+   2, \
+   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
+   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
+   FLAG_AUTO_FP, \
+   false \
+  },
+
+#define AARCH64_SIMD_VGET_LOW_BUILTINS \
+  VGET_LOW_BUILTIN(f16) \
+  VGET_LOW_BUILTIN(f32) \
+  VGET_LOW_BUILTIN(f64) \
+  VGET_LOW_BUILTIN(p8) \
+  VGET_LOW_BUILTIN(p16) \
+  VGET_LOW_BUILTIN(p64) \
+  VGET_LOW_BUILTIN(s8) \
+  VGET_LOW_BUILTIN(s16) \
+  VGET_LOW_BUILTIN(s32) \
+  VGET_LOW_BUILTIN(s64) \
+  VGET_LOW_BUILTIN(u8) \
+  VGET_LOW_BUILTIN(u16) \
+  VGET_LOW_BUILTIN(u32) \
+  VGET_LOW_BUILTIN(u64) \
+  VGET_LOW_BUILTIN(bf16)
+
 static const aarch64_simd_intrinsic_datum aarch64_simd_intrinsic_data[] = {
   AARCH64_SIMD_VREINTERPRET_BUILTINS
+  AARCH64_SIMD_VGET_LOW_BUILTINS
 };
 
 
@@ -3216,6 +3266,9 @@ aarch64_fold_builtin_lane_check (tree arg0, tree arg1, tree arg2)
 #define VREINTERPRET_BUILTIN(A, B, L) \
   case AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B:
 
+#undef VGET_LOW_BUILTIN
+#define VGET_LOW_BUILTIN(A) \
+  case 
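
The hunks above rely on the "X-macro" idiom: one list macro (AARCH64_SIMD_VGET_LOW_BUILTINS) is expanded several times while the per-entry macro (VGET_LOW_BUILTIN) is redefined in between, once for enumerators, once for table rows, once for case labels. A minimal stand-alone sketch of that idiom follows; all DEMO_ and demo_ names are invented for illustration and only three suffixes are listed.

```c
#include <assert.h>
#include <string.h>

/* The list of suffixes is written once.  */
#define DEMO_VGET_LOW_LIST \
  DEMO_VGET_LOW_ENTRY (f16) \
  DEMO_VGET_LOW_ENTRY (s8) \
  DEMO_VGET_LOW_ENTRY (u64)

/* First expansion: one enumerator per suffix.  */
#define DEMO_VGET_LOW_ENTRY(A) DEMO_VGET_LOW_##A,
enum demo_builtin
{
  DEMO_VGET_LOW_LIST
  DEMO_BUILTIN_MAX
};

/* Second expansion: one table row per suffix, pairing the intrinsic
   name with its function code.  */
#undef DEMO_VGET_LOW_ENTRY
#define DEMO_VGET_LOW_ENTRY(A) { "vget_low_" #A, DEMO_VGET_LOW_##A },
struct demo_datum
{
  const char *name;
  enum demo_builtin code;
};

static const struct demo_datum demo_data[] = {
  DEMO_VGET_LOW_LIST
};

/* The enum and the table stay in sync because both come from one list.  */
static void
check_demo_table (void)
{
  assert (DEMO_BUILTIN_MAX == 3);
  assert (sizeof demo_data / sizeof demo_data[0] == 3);
  assert (strcmp (demo_data[1].name, "vget_low_s8") == 0);
  assert (demo_data[2].code == DEMO_VGET_LOW_u64);
}
```

Adding a new suffix to the list then updates every expansion at once, which is presumably why the patch routes vget_low through this framework rather than hand-written definitions in arm_neon.h.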

Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Patrick O'Neill



On 5/13/24 13:28, Jeff Law wrote:



On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

    gcc-13.1 release  |    gcc 230823       |   This patch       |  clang/llvm
                      |    g6619b3d4c15c    |                    |
-------------------------------------------------------------------------------------
li      t0,-4096      | li    t0,-4096      | addi  sp,sp,-2048  | addi sp,sp,-2048
addi    t0,t0,2016    | addi  t0,t0,2032    | add   sp,sp,-16    | addi sp,sp,-32
li      a4,4096       | add   sp,sp,t0      | add   a5,sp,a0     | add  a1,sp,16
add     sp,sp,t0      | addi  a5,sp,-2032   | sb    zero,0(a5)   | add  a0,a0,a1
li      a5,-4096      | add   a0,a5,a0      | addi  sp,sp,2032   | sb   zero,0(a0)
addi    a4,a4,-2032   | li    t0, 4096      | addi  sp,sp,32     | addi sp,sp,2032
add     a4,a4,a5      | sb    zero,2032(a0) | ret                | addi sp,sp,48
addi    a5,sp,16      | addi  t0,t0,-2032   |                    | ret
add     a5,a4,a5      | add   sp,sp,t0      |
add     a0,a5,a0      | ret                 |
li      t0,4096       |
sd      a5,8(sp)      |
sb      zero,2032(a0) |
addi    t0,t0,-2016   |
add     sp,sp,t0      |
ret                   |
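
As a rough illustration of the "sum of two S12" idea described above, the sketch below splits a stack adjustment into two immediates that each fit an addi. It is not the implementation from the patch: the function names are hypothetical, and the choice of 2032 as the positive first step is an assumption inferred from the "This patch" column (which adjusts sp by -2048 then -16 on entry, and by 2032 then 32 on exit).

```c
#include <assert.h>
#include <stdbool.h>

/* RISC-V I-type immediates are signed 12-bit values ("S12").  */
#define S12_MIN (-2048)
#define S12_MAX 2047
/* Largest S12 value that is a multiple of 16.  Assumption inferred
   from the 2032 seen in the listing above.  */
#define S12_MAX_ALIGNED 2032

/* True if V fits a single addi immediate.  */
static bool
fits_s12 (long v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* True if V needs two addi instructions and two are enough.  */
static bool
sum_of_two_s12_aligned_p (long v)
{
  return (!fits_s12 (v)
          && v >= 2 * S12_MIN
          && v <= 2 * S12_MAX_ALIGNED);
}

/* Split V into *BASE + *OFF, each of which fits S12.  Hypothetical
   helper: takes the largest aligned step first.  */
static void
split_sum_of_two_s12 (long v, long *base, long *off)
{
  assert (sum_of_two_s12_aligned_p (v));
  *base = v < 0 ? S12_MIN : S12_MAX_ALIGNED;
  *off = v - *base;
}
```

With these assumptions a 2064-byte frame splits as -2048 and -16 when allocating and as 2032 and 32 when deallocating, matching the third column of the table, so no scratch register is needed for the constant.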

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
  }
    else
  {
-  if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+  HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+  if (SMALL_OPERAND (adj_off_value))
+    {
+  adjust = GEN_INT (adj_off_value);
+    }
+  else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+    {
+  HOST_WIDE_INT base, off;
+  riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+  insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+    GEN_INT (base));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  adjust = GEN_INT (off);
+    }
So this was the hunk that we identified internally as causing problems 
with libgomp's testsuite.  We never fully chased it down as this hunk 
didn't seem terribly important performance wise -- we just set it 
aside.  The thing is it looked basically correct to me.  So the 
failure was certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the 
libgomp testsuite, particularly in the rv64 linux configuration. If it 
does, and it passes, then we're good.  I'm still finding my way around 
the configuration, so I don't know if the CI system Edwin & Patrick 
have built tests libgomp or not.


I poked around the .sum files in pre/postcommit and we do run tests like:

PASS: c-c++-common/gomp/affinity-2.c  (test for errors, line 45)

I'm not familiar with libgomp so I don't know if those are the same libgomp
tests you're referring to.


Patrick



If it isn't run, then we'll need to do a run to test that.  I'm set up 
here to do that if needed.   I can just drop this version into our 
internal tree, trigger an internal CI run and see if it complains :-)


If it does complain, then we know where to start investigations.




Jeff



Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Kees Cook
On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote:
> On Mon, May 13, 2024, 11:41 PM Kees Cook  wrote:
> > But it makes no sense to warn about:
> >
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >if (index >= 4)
> >  warn ();
> >*ptr = 0;
> >*val = sg->vals[index];
> >if (index >= 4)
> >  warn ();
> >*ptr = *val;
> > }
> >
> > Because at "*val = sg->vals[index];" the actual value range tracking for
> > index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> > "if" statements is the range tracking [4,INT_MAX].)
> >
> > However, only in the case where jump threading has split the execution
> > flow and produced a copy of "*val = sg->vals[index];" where the value
> > range tracking for "index" is now [4,INT_MAX] is the warning valid, and
> > then only for that instance. Reporting it for effectively both (there is
> > only 1 source line for the array indexing) is misleading because there
> > is nothing the user can do about it -- the compiler created the copy and
> > then noticed it had a range it could apply to that array index.
> >
> 
> "there is nothing the user can do about it" is very much false. They could
> change warn call into a noreturn function call instead.  (In the case of
> the Linux kernel panic). There are things the user can do to fix the
> warning and even get better code generation out of the compilers.

This isn't about warn() not being noreturn. The warn() could be any
function call; the jump threading still happens.

GCC is warning about a compiler-constructed situation that cannot be
reliably fixed on the source side (GCC emitting the warning is highly
unstable in these cases), since the condition is not *always* true for
the given line of code. If it is not useful to warn for "array[index]"
being out of range when "index" is always [INT_MIN,INT_MAX], then it
is not useful to warn when "index" MAY be [INT_MIN,INT_MAX] for a given
line of code.

-Kees

-- 
Kees Cook
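
For anyone who wants to reproduce the discussion locally, here is a compilable sketch of the two shapes being argued about. It is an illustration only: the quoted snippet leaves "val" undeclared, so it is made an explicit parameter here, struct nums is assumed to hold a 4-element array, warn () is a stub that counts calls, and sparx5_set_threaded is a hand-written approximation of what jump threading produces, not actual compiler output.

```c
#include <assert.h>

/* Assumed layout: the thread only shows sg->vals[index] and "index >= 4".  */
struct nums { int vals[4]; };

static int warn_calls;

/* Stub that returns normally, as in the case under discussion.  */
static void
warn (void)
{
  warn_calls++;
}

/* Source form: one array access, range of INDEX unknown at that line.  */
static void
sparx5_set (int *ptr, int *val, struct nums *sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;
}

/* Hand-written equivalent of the threaded form: the access is
   duplicated, and in the first copy INDEX is known to be >= 4.  */
static void
sparx5_set_threaded (int *ptr, int *val, struct nums *sg, int index)
{
  if (index >= 4)
    {
      warn ();
      *ptr = 0;
      *val = sg->vals[index];   /* this copy is what may be diagnosed */
      warn ();
    }
  else
    {
      *ptr = 0;
      *val = sg->vals[index];
    }
  *ptr = *val;
}
```

For in-range indices both functions behave identically and never call warn (); the disagreement in the thread is about whether the diagnostic attached to the first copy of the access should be reported against the single source line.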


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Andrew Pinski
On Mon, May 13, 2024, 11:41 PM Kees Cook  wrote:

> On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> >
> >
> > On 5/13/24 1:48 PM, Qing Zhao wrote:
> > > -Warray-bounds is an important option that lets the Linux kernel keep
> > > array out-of-bounds errors out of the source tree.
> > >
> > > However, due to the false positive warnings reported in PR109071
> > > (-Warray-bounds false positive warnings due to code duplication from
> > > jump threading), -Warray-bounds=1 cannot be enabled by default.
> > >
> > > Although it's impossible to eliminate all the false positive warnings
> > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > > documentation says "always out of bounds"), we should minimize the
> > > false positive warnings in -Warray-bounds=1.
> > >
> > > The root cause of the false positive warnings reported in PR109071 is:
> > >
> > > When the jump threading optimization tries to reduce the number of
> > > branches inside the routine, it sometimes needs to duplicate code and
> > > split it into two conditional paths.  For example:
> > >
> > > The original code:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = 0;
> > >*val = sg->vals[index];
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = *val;
> > >
> > >return;
> > > }
> > >
> > > After jump threading, the above becomes:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >    if (index >= 4)
> > >      {
> > >        warn ();
> > >        *ptr = 0;                // Code duplication, since "warn" does return;
> > >        *val = sg->vals[index];  // same for this line.
> > >                                 // In this path, since it's under the condition
> > >                                 // "index >= 4", the compiler knows the value
> > >                                 // of "index" is at least 4, hence the
> > >                                 // out-of-bound warning.
> > >        warn ();
> > >      }
> > >    else
> > >      {
> > >        *ptr = 0;
> > >        *val = sg->vals[index];
> > >      }
> > >    *ptr = *val;
> > >    return;
> > > }
> > >
> > > We can see that, after jump threading, the number of branches inside
> > > the routine "sparx5_set" is reduced from 2 to 1.  However, due to the
> > > code duplication (which is needed for the correctness of the code), we
> > > get a false positive out-of-bound warning.
> > >
> > > In order to eliminate such false positive out-of-bound warning,
> > >
> > > A. Add one more flag for GIMPLE: is_splitted.
> > > B. During the thread jump optimization, when the basic blocks are
> > > duplicated, mark all the STMTs inside the original and duplicated
> > > basic blocks as "is_splitted";
> > > C. Inside the array bound checker, add the following new heuristic:
> > >
> > > If
> > >   1. the stmt is duplicated and split into two conditional paths;
> > >   2. the warning level < 2;
> > >   3. the current block does not dominate the exit block
> > > then do not report the warning.
> > >
> > > The false positive warnings are moved from -Warray-bounds=1 to
> > >   -Warray-bounds=2 now.
> > >
> > > Bootstrapped and regression tested on both x86 and aarch64. adjusted
> > >   -Warray-bounds-61.c due to the false positive warnings.
> > >
> > > Let me know if you have any comments and suggestions.
> > This sounds horribly wrong.   In the code above, the warning is correct.
>
> It's not sensible from a user's perspective.
>
> If this doesn't warn:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>*ptr = 0;
>*val = sg->vals[index];
>*ptr = *val;
> }
>
> ... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
> and warnings based on the value range are silenced if they haven't been
> clamped at all. (Otherwise warnings would be produced everywhere: only
> when a limited set of values is known is it useful to produce a warning.)
>
>
> But it makes no sense to warn about:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>if (index >= 4)
>  warn ();
>*ptr = 0;
>*val = sg->vals[index];
>if (index >= 4)
>  warn ();
>*ptr = *val;
> }
>
> Because at "*val = sg->vals[index];" the actual value range tracking for
> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> "if" statements is the range tracking [4,INT_MAX].)
>
> However, only in the case where jump threading has split the execution
> flow and produced a copy of "*val = sg->vals[index];" where the value
> range tracking for "index" is now [4,INT_MAX] is the warning valid, and
> then only for that instance. Reporting it for effectively both (there is
> only 1 source line for the array indexing) is misleading because there
> is nothing the user can do about it -- the compiler created the copy and
> then noticed it had a range it could apply to that array index.
>

"there is 

Re: [PATCH v2 1/3] RISC-V: movmem for RISCV with V extension

2024-05-13 Thread Jeff Law




On 12/19/23 10:28 PM, Jeff Law wrote:



On 12/19/23 02:53, Sergei Lewis wrote:

gcc/ChangeLog

 * config/riscv/riscv.md (movmem): Use riscv_vector::expand_block_move,
 if and only if we know the entire operation can be performed using one
 vector load followed by one vector store

gcc/testsuite/ChangeLog

 PR target/112109
 * gcc.target/riscv/rvv/base/movmem-1.c: New test
So this needs to be regression tested.  Given that it only affects RVV, 
I would suggest testing rv64gcv or rv32gcv.





+(define_expand "movmem"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+   (match_operand:BLK 1 "general_operand"))
+    (use (match_operand:P 2 "const_int_operand"))
+    (use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8)
+    && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+    && riscv_vector::expand_block_move (operands[0], operands[1],
+ operands[2]))
+    DONE;
+  else
+    FAIL;
+})

Just a formatting nit.  A space on each side of the '/' operator above.
So I've fixed the formatting nit and tested on rv64gc and rv32gcv.  I 
hadn't planned to push it, but muscle memory kicked in and 1/3 has been 
pushed.


I'll be looking at 2/3 and 3/3 tomorrow (or possibly a bit tonight to 
take advantage of overnight CI runs).


jeff
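
The size guard in the quoted expander can be read as a simple range check on the constant length. The sketch below restates it as plain C; movmem_length_ok_p is a hypothetical name, "length" stands for INTVAL (operands[2]) and "min_vlen" for TARGET_MIN_VLEN (a bit count). It only mirrors the two comparisons and does not model riscv_vector::expand_block_move.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the expander's guard: accept the move
   only when LENGTH (bytes) lies between MIN_VLEN / 8 and MIN_VLEN.  */
static bool
movmem_length_ok_p (long length, long min_vlen)
{
  return (length >= min_vlen / 8
          && length <= min_vlen);
}

/* Boundary cases for a 128-bit minimum vector length.  */
static void
check_movmem_length (void)
{
  assert (!movmem_length_ok_p (15, 128));
  assert (movmem_length_ok_p (16, 128));
  assert (movmem_length_ok_p (128, 128));
  assert (!movmem_length_ok_p (129, 128));
}
```

The lower bound is the byte size of one minimum-length vector register; the upper bound of MIN_VLEN bytes is eight times that, which would be consistent with a single load and store at the largest register grouping, though the quoted mail does not spell that out.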



Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])

2024-05-13 Thread Vineet Gupta



On 5/13/24 15:47, Jeff Law wrote:
>> On 5/13/24 11:49, Vineet Gupta wrote:
>>>   500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
>>>   500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
>>>   500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
>>>   502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
>>>   502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
>>>   502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
>>>   502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
>>>   502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
>>>   503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
>>>   503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
>>>   503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
>>>   503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
>>>   505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
>>>   507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
>> The small gcc regression seems like a tooling issue of some sort.
>> Looking at the topblocks, the insn sequences are exactly the same, only
>> the counts differ and it's not obvious why.
>> Here's for gcc_r-1.
>>
>>
>>  > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:
>>
>>  000170ca :
>>     170ca:    7179        add    sp,sp,-48
>>     170cc:    ec26        sd    s1,24(sp)
>>     170ce:    e84a        sd    s2,16(sp)
>>     170d0:    e44e        sd    s3,8(sp)
>>     170d2:    f406        sd    ra,40(sp)
>>     170d4:    f022        sd    s0,32(sp)
>>     170d6:    84aa        mv    s1,a0
>>     170d8:    03200913      li    s2,50
>>     170dc:    03d00993      li    s3,61
>>     170e0:    8526        mv    a0,s1
>>     170e2:    001cd097      auipc    ra,0x1cd
>>     170e6:    bac080e7      jalr    -1108(ra) # 1e3c8e
>>  
>>
>>  > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
>>  >  Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
>>  ...
>>
>>  < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
>>  < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
>>  < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:
>>
>>
>> FWIW, Greg internally has been looking at some of this and found some
>> issues in the bbv tooling, but I wish all of this was shared/upstream
>> (QEMU bbv plugin) for people to compare notes and not discover/fix the
>> same issues over and over again.
> Yea, we all meant to coordinate on those plugins.  The one we've got had 
> some problems with hash collisions and when there's a hash collision it 
> just produces total junk data.  I chased a few of these down and fixed 
> them about a year ago.
>
> The other thing is qemu will split up blocks based on its internal 
> notion of a translation page.   So if you're looking at block level data 
> you'll stumble over that as well.  This aspect is the most troublesome 
> problem I'm aware of right now.

And these two are exactly what Greg fixed, among others :-)

-Vineet
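
For reference, the percentage annotations in the quoted table can be recomputed from the two instruction-count columns. The helper below is a hypothetical sketch (the name icount_reduction_pct is invented); it assumes the first column is the count before the patch and the second the count after, with positive results meaning fewer instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage reduction in dynamic instruction count; positive means
   fewer instructions after the patch.  The counts in the table are
   well below 2^53, so the conversion to double is exact.  */
static double
icount_reduction_pct (uint64_t before, uint64_t after)
{
  return ((double) before - (double) after) * 100.0 / (double) before;
}

/* Two rows from the table: 507.cactuBSSN_r and 502.gcc_r-1.  */
static void
check_icount_reduction (void)
{
  double cactu = icount_reduction_pct (2852767394456ULL, 2564736063742ULL);
  double gcc_r1 = icount_reduction_pct (225747660839ULL, 225809444357ULL);

  assert (cactu > 10.09 && cactu < 10.10);     /* table says +10.10% */
  assert (gcc_r1 < 0.0 && gcc_r1 > -0.03);     /* table says -0.02% */
}
```

Differences of a few hundredths of a percent, as in the gcc rows, are small enough to be plausibly explained by the tooling issues discussed in the thread.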


Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])

2024-05-13 Thread Jeff Law




On 5/13/24 3:13 PM, Vineet Gupta wrote:

On 5/13/24 11:49, Vineet Gupta wrote:

  500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
  500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
  500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
  502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
  502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
  502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
  502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
  502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
  503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
  503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
  503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
  503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
  505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
  507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%


The small gcc regression seems like a tooling issue of some sort.
Looking at the topblocks, the insn sequences are exactly the same, only
the counts differ and it's not obvious why.
Here's for gcc_r-1.


 > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:

 000170ca :
    170ca:    7179        add    sp,sp,-48
    170cc:    ec26        sd    s1,24(sp)
    170ce:    e84a        sd    s2,16(sp)
    170d0:    e44e        sd    s3,8(sp)
    170d2:    f406        sd    ra,40(sp)
    170d4:    f022        sd    s0,32(sp)
    170d6:    84aa        mv    s1,a0
    170d8:    03200913      li    s2,50
    170dc:    03d00993      li    s3,61
    170e0:    8526        mv    a0,s1
    170e2:    001cd097      auipc    ra,0x1cd
    170e6:    bac080e7      jalr    -1108(ra) # 1e3c8e
 

 > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
 >  Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
 ...

 < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
 < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
 < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:


FWIW, Greg internally has been looking at some of this and found some
issues in the bbv tooling, but I wish all of this was shared/upstream
(QEMU bbv plugin) for people to compare notes and not discover/fix the
same issues over and over again.
Yea, we all meant to coordinate on those plugins.  The one we've got had 
some problems with hash collisions and when there's a hash collision it 
just produces total junk data.  I chased a few of these down and fixed 
them about a year ago.


The other thing is qemu will split up blocks based on its internal 
notion of a translation page.   So if you're looking at block level data 
you'll stumble over that as well.  This aspect is the most troublesome 
problem I'm aware of right now.






Jeff


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Kees Cook
On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> 
> 
> On 5/13/24 1:48 PM, Qing Zhao wrote:
> > -Warray-bounds is an important option that lets the Linux kernel keep
> > array out-of-bounds errors out of the source tree.
> >
> > However, due to the false positive warnings reported in PR109071
> > (-Warray-bounds false positive warnings due to code duplication from
> > jump threading), -Warray-bounds=1 cannot be enabled by default.
> >
> > Although it's impossible to eliminate all the false positive warnings
> > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> > documentation says "always out of bounds"), we should minimize the
> > false positive warnings in -Warray-bounds=1.
> >
> > The root cause of the false positive warnings reported in PR109071 is:
> >
> > When the jump threading optimization tries to reduce the number of
> > branches inside the routine, it sometimes needs to duplicate code and
> > split it into two conditional paths.  For example:
> > 
> > The original code:
> > 
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >if (index >= 4)
> >  warn ();
> >*ptr = 0;
> >*val = sg->vals[index];
> >if (index >= 4)
> >  warn ();
> >*ptr = *val;
> > 
> >return;
> > }
> > 
> > After jump threading, the above becomes:
> >
> > void sparx5_set (int * ptr, struct nums * sg, int index)
> > {
> >    if (index >= 4)
> >      {
> >        warn ();
> >        *ptr = 0;                // Code duplication, since "warn" does return;
> >        *val = sg->vals[index];  // same for this line.
> >                                 // In this path, since it's under the condition
> >                                 // "index >= 4", the compiler knows the value
> >                                 // of "index" is at least 4, hence the
> >                                 // out-of-bound warning.
> >        warn ();
> >      }
> >    else
> >      {
> >        *ptr = 0;
> >        *val = sg->vals[index];
> >      }
> >    *ptr = *val;
> >    return;
> > }
> > 
> > We can see that, after jump threading, the number of branches inside
> > the routine "sparx5_set" is reduced from 2 to 1.  However, due to the
> > code duplication (which is needed for the correctness of the code), we
> > get a false positive out-of-bound warning.
> > 
> > In order to eliminate such false positive out-of-bound warning,
> > 
> > A. Add one more flag for GIMPLE: is_splitted.
> > B. During the thread jump optimization, when the basic blocks are
> > duplicated, mark all the STMTs inside the original and duplicated
> > basic blocks as "is_splitted";
> > C. Inside the array bound checker, add the following new heuristic:
> > 
> > If
> >   1. the stmt is duplicated and split into two conditional paths;
> >   2. the warning level < 2;
> >   3. the current block does not dominate the exit block
> > then do not report the warning.
> > 
> > The false positive warnings are moved from -Warray-bounds=1 to
> >   -Warray-bounds=2 now.
> > 
> > Bootstrapped and regression tested on both x86 and aarch64. adjusted
> >   -Warray-bounds-61.c due to the false positive warnings.
> > 
> > Let me know if you have any comments and suggestions.
> This sounds horribly wrong.   In the code above, the warning is correct.

It's not sensible from a user's perspective.

If this doesn't warn:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   *ptr = 0;
   *val = sg->vals[index];
   *ptr = *val;
}

... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
and warnings based on the value range are silenced if they haven't been
clamped at all. (Otherwise warnings would be produced everywhere: only
when a limited set of values is known is it useful to produce a warning.)


But it makes no sense to warn about:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 warn ();
   *ptr = 0;
   *val = sg->vals[index];
   if (index >= 4)
 warn ();
   *ptr = *val;
}

Because at "*val = sg->vals[index];" the actual value range tracking for
index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
"if" statements is the range tracking [4,INT_MAX].)

However, in the case where jump threading has split the execution flow
and produced a copy of "*val = sg->vals[index];" where the value range
tracking for "index" is now [4,INT_MAX], the warning is valid. But it
is only valid for that instance. Reporting it for effectively both (there is
only 1 source line for the array indexing) is misleading because there
is nothing the user can do about it -- the compiler created the copy and
then noticed it had a range it could apply to that array index.
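
To make the scenario concrete, here is a compilable sketch of the function
discussed above. The "struct nums" layout, the "warn" stub and the extra
"val" parameter are assumptions added only so the fragment builds; the
thread does not show them.

```c
/* Hypothetical definitions; the thread only shows the function body. */
struct nums { int vals[4]; };

static int warn_calls;                 /* counts calls to the stub below */
static void warn(void) { warn_calls++; }

/* Same shape as the example above; "val" is passed in so this compiles. */
void sparx5_set(int *ptr, int *val, struct nums *sg, int index)
{
    if (index >= 4)
        warn();
    *ptr = 0;
    *val = sg->vals[index];   /* one source line; jump threading may copy it */
    if (index >= 4)
        warn();
    *ptr = *val;
}
```

Building something like this with -O2 -Warray-bounds is the setup under
discussion; whether a diagnostic actually appears depends on the GCC version
and on whether jump threading duplicates the block.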

This situation makes -Warray-bounds unusable for the Linux kernel (we
cannot have false positives says BDFL), but we'd *really* like to have
it enabled since it usually finds real bugs. But these false positives
can't be fixed on our end. :( So, moving them to -Warray-bounds=2 makes
sense as that's the level 

[PATCH] RISC-V: add option -m(no-)autovec-segment

2024-05-13 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai


[pushed] wwwdocs: cxx-dr-status: Replace by

2024-05-13 Thread Gerald Pfeifer
The validator warns about  as deprecated; use  instead.

Pushed.

Gerald

---
 htdocs/projects/cxx-dr-status.html | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index c70cdf21..e29d2407 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -19929,7 +19929,7 @@
 
   https://wg21.link/cwg2842;>2842
   open
-  Preferring an initializer_list over a single value
+  Preferring an initializer_list over a single value
   -
   
 
@@ -20062,7 +20062,7 @@
 
   https://wg21.link/cwg2861;>2861
   review
-  dynamic_cast on bad pointer value
+  dynamic_cast on bad pointer value
   ?
   
 
@@ -20097,7 +20097,7 @@
 
   https://wg21.link/cwg2866;>2866
   open
-  Observing the effects of [[no_unique_address] wwwdocs:]
+  Observing the effects of [[no_unique_address] 
wwwdocs:]
   -
   
 
@@ -20118,7 +20118,7 @@
 
   https://wg21.link/cwg2869;>2869
   open
-  this in local classes
+  this in local classes
   -
   
 
@@ -20167,7 +20167,7 @@
 
   https://wg21.link/cwg2876;>2876
   open
-  Disambiguation of T x = delete("text")
+  Disambiguation of T x = delete("text")
   -
   
 
@@ -20188,7 +20188,7 @@
 
   https://wg21.link/cwg2879;>2879
   open
-  Undesired outcomes with const_cast
+  Undesired outcomes with const_cast
   -
   
 
-- 
2.45.0


Re: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-13 Thread 钟居哲
>> Seems a bit odd on first sight.  If all we want to do is to
>> select between two masks why do we need a large Pmode mode?

Since we are lowering final mask = vcond_mask_len (mask, 1s, 0s, len, bias),
into:

vid.v v1
vcmp v2
vmsltu.vx  v2, v1, len, TUMU
Then len is Pmode, so we only allow lowering vcond_mask_len when a vector mode 
for Pmode exists.

>> So that's basically a mask-move with length?  Can't this be done
>> differently?  If not, please describe, maybe this is already
>> the shortest way.

We are implementing: final mask = mask[i] && i < len ? 1 : 0
This is a mask move with length, but with TUMU. I believe the current approach is the optimal 
way.
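
As a scalar reference for the formula above, a minimal sketch could look as
follows. The function name and the one-byte-per-mask-bit layout are
illustrative assumptions, not the representation used by the backend.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model: out[i] = (mask[i] && i < len) ? 1 : 0 for the first n
   elements.  One byte per mask bit is an illustrative choice only.  */
void vcond_mask_len_model(uint8_t *out, const uint8_t *mask, size_t n, size_t len)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (mask[i] && i < len) ? 1 : 0;
}
```

With mask = {1,0,1,1,0,1} and len = 3, only elements 0 and 2 remain set.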



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-14 05:14
To: pan2.li; gcc-patches
CC: rdapp.gcc; juzhe.zhong; kito.cheng; richard.guenther; Tamar.Christina; 
richard.sandiford
Subject: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
Hi Pan,
 
thanks for working on this.
 
In general the patch looks reasonable to me but I'd rather
have some more comments about the high-level idea.
E.g. cbranch is implemented like aarch64 by xor'ing the
bitmasks and comparing the result against zero (so we branch
based on mask equality).
 
> +;; vcond_mask_len
 
High-level description here instead please.
 
> +(define_insn_and_split "vcond_mask_len_"
> +  [(set (match_operand:VB 0 "register_operand")
 
> +(unspec: VB [
> + (match_operand:VB 1 "register_operand")
> + (match_operand:VB 2 "const_1_operand")
 
I guess it works like that because operand[2] is just implicitly
used anyway but shouldn't that rather be an all_ones_operand?
 
> +   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
> (mode)).exists ()"
 
Seems a bit odd on first sight.  If all we want to do is to
select between two masks why do we need a large Pmode mode?
 
> +rtx ops[] = {operands[0], operands[1], operands[1], cmp, reg, 
> operands[4]};
 
So that's basically a mask-move with length?  Can't this be done
differently?  If not, please describe, maybe this is already
the shortest way.
 
Regards
Robin
 
 


Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-13 Thread 钟居哲
Hi, Robin.

I saw vwadd/vwsub.wx have the same issue. Could you change them and add a test too?

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-14 04:15
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
Hi,
 
this patch splits the vfw...wf pattern so we do not emit
e.g. vfwadd.wf v0,v8,fa5,v0.t anymore.
 
Regtested on rv64gcv_zvfh.
 
Regards
Robin
 
gcc/ChangeLog:
 
PR target/115068
 
* config/riscv/vector.md:  Split vfw.wf pattern.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
* gcc.target/riscv/rvv/base/pr115068.c: New test.
---
gcc/config/riscv/vector.md| 20 ++---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 28 ++
.../gcc.target/riscv/rvv/base/pr115068.c  | 29 +++
3 files changed, 67 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 2a54f78df8e..e408baa809c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7178,24 +7178,24 @@ (define_insn "@pred_single_widen_sub"
(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTF 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTF 0 "register_operand""=vd, vd, 
vr, vr")
(if_then_else:VWEXTF
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
-  (match_operand 9 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"  " vm, vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand" " rK, rK, rK, rK")
+  (match_operand 6 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 7 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 8 "const_int_operand" "  i,  i,  i,  i")
+  (match_operand 9 "const_int_operand" "  i,  i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTF
- (match_operand:VWEXTF 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, vr")
(float_extend:VWEXTF
  (vec_duplicate:
- (match_operand: 4 "register_operand"   "f,f"
-   (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "register_operand"  "  f,  f,  f,  f"
+   (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, vu,  
0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
   [(set_attr "type" "vf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
new file mode 100644
index 000..95ec8e06021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+__riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+__riscv_vundefined_f64m8 (), 1.0, __riscv_vsetvlmax_e64m8 ());
+  asm volatile ("" ::"vr"(vfwadd_wf_f64m8_m_vd) : "memory");
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
new file mode 100644
index 000..6d680037aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+

Re: [PATCH] arm: Force flag_pic for FDPIC

2024-05-13 Thread Fangrui Song
On Mon, Mar 4, 2024 at 12:13 AM Fangrui Song  wrote:
>
> From: Fangrui Song 
>
> -fno-pic -mfdpic generated code is like regular -fno-pic, not suitable
> for FDPIC (absolute addressing for symbol references and no function
> descriptor).  The sh port simply upgrades -fno-pic to -fpie by setting
> flag_pic.  Let's follow suit.
>
> Link: 
> https://inbox.sourceware.org/gcc-patches/20150913165303.gc17...@brightrain.aerifal.cx/
>
> gcc/ChangeLog:
>
> * config/arm/arm.cc (arm_option_override): Set flag_pic if
>   TARGET_FDPIC.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/fdpic-pie.c: New test.
> ---
>  gcc/config/arm/arm.cc|  6 +
>  gcc/testsuite/gcc.target/arm/fdpic-pie.c | 30 
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/fdpic-pie.c
>
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 1cd69268ee9..f2fd3cce48c 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -3682,6 +3682,12 @@ arm_option_override (void)
>arm_pic_register = FDPIC_REGNUM;
>if (TARGET_THUMB1)
> sorry ("FDPIC mode is not supported in Thumb-1 mode");
> +
> +  /* FDPIC code is a special form of PIC, and the vast majority of code
> +generation constraints that apply to PIC also apply to FDPIC, so we
> + set flag_pic to avoid the need to check TARGET_FDPIC everywhere
> + flag_pic is checked. */
> +  flag_pic = 2;
>  }
>
>if (arm_pic_register_string != NULL)
> diff --git a/gcc/testsuite/gcc.target/arm/fdpic-pie.c 
> b/gcc/testsuite/gcc.target/arm/fdpic-pie.c
> new file mode 100644
> index 000..909db8bce74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/fdpic-pie.c
> @@ -0,0 +1,30 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fno-pic -mfdpic" }
> +// { dg-skip-if "-mpure-code and -fPIC incompatible" { *-*-* } { 
> "-mpure-code" } }
> +
> +__attribute__((visibility("hidden"))) void hidden_fun(void);
> +void fun(void);
> +__attribute__((visibility("hidden"))) extern int hidden_var;
> +extern int var;
> +__attribute__((visibility("hidden"))) const int ro_hidden_var = 42;
> +
> +// { dg-final { scan-assembler "hidden_fun\\(GOTOFFFUNCDESC\\)" } }
> +void *addr_hidden_fun(void) { return hidden_fun; }
> +
> +// { dg-final { scan-assembler "fun\\(GOTFUNCDESC\\)" } }
> +void *addr_fun(void) { return fun; }
> +
> +// { dg-final { scan-assembler "hidden_var\\(GOT\\)" } }
> +void *addr_hidden_var(void) { return &hidden_var; }
> +
> +// { dg-final { scan-assembler "var\\(GOT\\)" } }
> +void *addr_var(void) { return &var; }
> +
> +// { dg-final { scan-assembler ".LANCHOR0\\(GOT\\)" } }
> +const int *addr_ro_hidden_var(void) { return &ro_hidden_var; }
> +
> +// { dg-final { scan-assembler "hidden_var\\(GOT\\)" } }
> +int read_hidden_var(void) { return hidden_var; }
> +
> +// { dg-final { scan-assembler "var\\(GOT\\)" } }
> +int read_var(void) { return var; }
> --
> 2.44.0.rc1.240.g4c46232300-goog

Ping:)


-- 
宋方睿


Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-13 Thread Robin Dapp
Hi Pan,

thanks for working on this.

In general the patch looks reasonable to me but I'd rather
have some more comments about the high-level idea.
E.g. cbranch is implemented like aarch64 by xor'ing the
bitmasks and comparing the result against zero (so we branch
based on mask equality).
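
The mask-equality idea mentioned here can be sketched in scalar C. Modeling
a mask as a 64-bit integer and the name "masks_differ" are assumptions made
for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch condition modeled on a 64-bit mask: XOR the two masks and
   compare the result against zero; nonzero means they differ.  */
bool masks_differ(uint64_t a, uint64_t b)
{
    return (a ^ b) != 0;
}
```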

> +;; vcond_mask_len

High-level description here instead please.

> +(define_insn_and_split "vcond_mask_len_"
> +  [(set (match_operand:VB 0 "register_operand")

> +(unspec: VB [
> + (match_operand:VB 1 "register_operand")
> + (match_operand:VB 2 "const_1_operand")

I guess it works like that because operand[2] is just implicitly
used anyway but shouldn't that rather be an all_ones_operand?

> +   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
> (mode)).exists ()"

Seems a bit odd on first sight.  If all we want to do is to
select between two masks why do we need a large Pmode mode?

> +rtx ops[] = {operands[0], operands[1], operands[1], cmp, reg, 
> operands[4]};

So that's basically a mask-move with length?  Can't this be done
differently?  If not, please describe, maybe this is already
the shortest way.

Regards
 Robin



Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])

2024-05-13 Thread Vineet Gupta
On 5/13/24 11:49, Vineet Gupta wrote:
>  500.perlbench_r-0 |  1,214,534,029,025 | 1,212,887,959,387 |
>  500.perlbench_r-1 |740,383,419,739 |   739,280,308,163 |
>  500.perlbench_r-2 |692,074,638,817 |   691,118,734,547 |
>  502.gcc_r-0   |190,820,141,435 |   190,857,065,988 |
>  502.gcc_r-1   |225,747,660,839 |   225,809,444,357 | <- -0.02%
>  502.gcc_r-2   |220,370,089,641 |   220,406,367,876 | <- -0.03%
>  502.gcc_r-3   |179,111,460,458 |   179,135,609,723 | <- -0.02%
>  502.gcc_r-4   |219,301,546,340 |   219,320,416,956 | <- -0.01%
>  503.bwaves_r-0|278,733,324,691 |   278,733,323,575 | <- -0.01%
>  503.bwaves_r-1|442,397,521,282 |   442,397,519,616 |
>  503.bwaves_r-2|344,112,218,206 |   344,112,216,760 |
>  503.bwaves_r-3|417,561,469,153 |   417,561,467,597 |
>  505.mcf_r |669,319,257,525 |   669,318,763,084 |
>  507.cactuBSSN_r   |  2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
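
For reference, the percentage in the last row can be reproduced with a small
helper. This assumes the left column is the baseline count and the right
column is the count with the patch; the helper name is hypothetical.

```c
/* Percentage improvement of "after" relative to "before"
   (positive means fewer dynamic instructions).  */
double pct_improvement(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

Applied to the 507.cactuBSSN_r row, this gives roughly 10.1%.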

The small gcc regression seems like a tooling issue of some sort.
Looking at the topblocks, the insn sequences are exactly the same, only
the counts differ and it's not obvious why.
Here are the top blocks for gcc_r-1.


> Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:

000170ca :
   170ca:    7179        add    sp,sp,-48
   170cc:    ec26        sd    s1,24(sp)
   170ce:    e84a        sd    s2,16(sp)
   170d0:    e44e        sd    s3,8(sp)
   170d2:    f406        sd    ra,40(sp)
   170d4:    f022        sd    s0,32(sp)
   170d6:    84aa        mv    s1,a0
   170d8:    03200913      li    s2,50
   170dc:    03d00993      li    s3,61
   170e0:    8526        mv    a0,s1
   170e2:    001cd097      auipc    ra,0x1cd
   170e6:    bac080e7      jalr    -1108(ra) # 1e3c8e


> Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
>  Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
...

< Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
< Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
< Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:


FWIW, Greg internally has been looking at some of this and found some
issues in the bbv tooling, but I wish all of this was  shared/upstream
(QEMU bbv plugin) for people to compare notes and not discover/fix the
same issues over and again.

Thx,
-Vineet


Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Jeff Law




On 5/13/24 1:48 PM, Qing Zhao wrote:

-Warray-bounds is an important option to enable the Linux kernel to keep
array out-of-bound errors out of the source tree.

However, due to the false positive warnings reported in PR109071
(-Warray-bounds false positive warnings due to code duplication from
jump threading), -Warray-bounds=1 cannot be turned on by default.

Although it's impossible to eliminate all the false positive warnings
from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
documentation says "always out of bounds"), we should minimize the
false positive warnings in -Warray-bounds=1.

The root cause of the false positive warnings reported in PR109071 is:

When the thread jump optimization tries to reduce the # of branches
inside the routine, sometimes it needs to duplicate the code and
split it into two conditional paths. For example:

The original code:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 warn ();
   *ptr = 0;
   *val = sg->vals[index];
   if (index >= 4)
 warn ();
   *ptr = *val;

   return;
}

With the thread jump, the above becomes:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
   if (index >= 4)
 {
   warn ();
   *ptr = 0;                 // Code is duplicated since "warn" does return;
   *val = sg->vals[index];   // same for this line.
                             // In this path, since it's under the condition
                             // "index >= 4", the compiler knows the value
                             // of "index" is at least 4, therefore the
                             // out-of-bound warning.
   warn ();
 }
   else
 {
   *ptr = 0;
   *val = sg->vals[index];
 }
   *ptr = *val;
   return;
}

We can see, after the thread jump optimization, the # of branches inside
the routine "sparx5_set" is reduced from 2 to 1, however,  due to the
code duplication (which is needed for the correctness of the code), we
got a false positive out-of-bound warning.

In order to eliminate such false positive out-of-bound warning,

A. Add one more flag for GIMPLE: is_splitted.
B. During the thread jump optimization, when the basic blocks are
duplicated, mark all the STMTs inside the original and duplicated
basic blocks as "is_splitted";
C. Inside the array bound checker, add the following new heuristic:

If
  1. the stmt is duplicated and split into two conditional paths;
  2. the warning level < 2;
  3. the current block does not dominate the exit block
then do not report the warning.
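
The three conditions above amount to a simple predicate. The sketch below is
only a restatement in C; the function name and parameters are hypothetical
and do not correspond to code in the proposed patch.

```c
#include <stdbool.h>

/* Returns true when the out-of-bounds diagnostic would be suppressed:
   the statement was duplicated by jump threading, the warning level is
   below 2, and the current block does not dominate the exit block.  */
bool suppress_array_bounds_warning(bool is_splitted, int warn_level,
                                   bool dominates_exit)
{
    return is_splitted && warn_level < 2 && !dominates_exit;
}
```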

The false positive warnings are moved from -Warray-bounds=1 to
  -Warray-bounds=2 now.

Bootstrapped and regression tested on both x86 and aarch64. adjusted
  -Warray-bounds-61.c due to the false positive warnings.

Let me know if you have any comments and suggestions.

This sounds horribly wrong.   In the code above, the warning is correct.

Jeff


Re: [PATCH] RISC-V: add option -m(no-)autovec-segment

2024-05-13 Thread Vineet Gupta


On 2/27/24 07:25, Jeff Law wrote:
> On 2/25/24 21:53, Greg McGary wrote:
>> Add option -m(no-)autovec-segment to enable/disable autovectorizer
>> from emitting vector segment load/store instructions. This is useful for
>> performance experiments.
>>
>> gcc/ChangeLog:
>>  * config/riscv/autovec.md (vec_mask_len_load_lanes, 
>> vec_mask_len_store_lanes):
>>Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
>>  * gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
>> macro.
>>  * gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
>>  * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
>> divide-by-zero.
>>  * testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
>>  testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
> I don't mind having options to do this kind of selection (we've done 
> similar things internally for other RVV features).  But I don't think 
> now is the time to be introducing this stuff.  We're in stage4 of the 
> development cycle after all.

Ping ! now that we are back in stage1

Thx,
-Vineet


Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Jeff Law




On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to be 2 insn vs. 3.
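
A rough illustration of the splitting idea, assuming S12 means the signed
12-bit range -2048..2047: the helper below is hypothetical and is not the
riscv_split_sum_of_two_s12 function from the patch. In particular it does
not model any stack-alignment adjustment (the instruction table below uses
2032 rather than 2047, presumably for that reason).

```c
#include <stdbool.h>

#define S12_MIN (-2048L)
#define S12_MAX 2047L

static bool is_s12(long v)
{
    return v >= S12_MIN && v <= S12_MAX;
}

/* Hypothetical helper: split v into base + off with both parts in S12.
   Returns false when v is outside [2 * S12_MIN, 2 * S12_MAX].  */
bool split_two_s12(long v, long *base, long *off)
{
    /* Clamp v into the S12 range to get the first addend.  */
    *base = v < S12_MIN ? S12_MIN : (v > S12_MAX ? S12_MAX : v);
    /* Whatever is left must also fit in S12 for the split to work.  */
    *off = v - *base;
    return is_s12(*off);
}
```

For example, -2064 splits into -2048 and -16, so two addi instructions can
replace the li/addi/add sequence.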

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

gcc-13.1 release    | gcc 230823         | This patch        | clang/llvm
                    | g6619b3d4c15c      |                   |
------------------------------------------------------------------------------
li   t0,-4096       | li   t0,-4096      | addi sp,sp,-2048  | addi sp,sp,-2048
addi t0,t0,2016     | addi t0,t0,2032    | add  sp,sp,-16    | addi sp,sp,-32
li   a4,4096        | add  sp,sp,t0      | add  a5,sp,a0     | add  a1,sp,16
add  sp,sp,t0       | addi a5,sp,-2032   | sb   zero,0(a5)   | add  a0,a0,a1
li   a5,-4096       | add  a0,a5,a0      | addi sp,sp,2032   | sb   zero,0(a0)
addi a4,a4,-2032    | li   t0, 4096      | addi sp,sp,32     | addi sp,sp,2032
add  a4,a4,a5       | sb   zero,2032(a0) | ret               | addi sp,sp,48
addi a5,sp,16       | addi t0,t0,-2032   |                   | ret
add  a5,a4,a5       | add  sp,sp,t0      |
add  a0,a5,a0       | ret                |
li   t0,4096        |
sd   a5,8(sp)       |
sb   zero,2032(a0)  |
addi t0,t0,-2016    |
add  sp,sp,t0       |
ret                 |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.





@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
}
else
{
- if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+ HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+ if (SMALL_OPERAND (adj_off_value))
+   {
+ adjust = GEN_INT (adj_off_value);
+   }
+ else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+   {
+ HOST_WIDE_INT base, off;
+ riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+ insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+   GEN_INT (base));
+ RTX_FRAME_RELATED_P (insn) = 1;
+ adjust = GEN_INT (off);
+   }
So this was the hunk that we identified internally as causing problems 
with libgomp's testsuite.  We never fully chased it down as this hunk 
didn't seem terribly important performance wise -- we just set it aside. 
 The thing is it looked basically correct to me.  So the failure was 
certainly unexpected, but it was consistent.


So I think the question is whether or not the CI system runs the libgomp 
testsuite, particularly in the rv64 linux configuration.  If it does, 
and it passes, then we're good.  I'm still finding my way around the 
configuration, so I don't know if the CI system Edwin & Patrick have 
built tests libgomp or not.


If it isn't run, then we'll need to do a run to test that.  I'm set up 
here to do that if needed.   I can just drop this version into our 
internal tree, trigger an internal CI run and see if it complains :-)


If it does complain, then we know where to start investigations.




Jeff



[PATCH] Fortran: fix bounds check for assignment, class component [PR86100]

2024-05-13 Thread Harald Anlauf
Dear all,

the attached patch does two things:

- it fixes a bogus array bounds check when deep-copying a class component
  of a derived type and the class component has rank > 1, the reason being
  that the previous code compared the full size of one side with the size
  of the first dimension of the other

- the bounds-check error message that was generated e.g. by an allocate
  statement with conflicting sizes in the allocation and the source-expr
  will now use an improved abbreviated name pointing to the component
  involved, which was introduced in 14-development.
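
The first point can be illustrated outside the compiler. The sketch below is
plain C with hypothetical names, not gfortran internals; it compares the
extents of two arrays dimension by dimension, which is the shape of the
corrected check, instead of comparing a total size against a single extent.

```c
/* Returns 0 when every extent matches, otherwise the 1-based index of
   the first mismatching dimension.  */
int first_mismatched_dim(const long *from_extent, const long *to_extent,
                         int rank)
{
    for (int dim = 1; dim <= rank; dim++)
        if (from_extent[dim - 1] != to_extent[dim - 1])
            return dim;
    return 0;
}
```

For two 2x3 arrays this reports no mismatch, whereas comparing the total
size (6) of one side with the first extent (2) of the other would flag a
bogus error.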

What I could not resolve: a deep copy may still create no useful array
name in the error message (which I am now unable to trigger).  If someone
sees how to extract it reliably from the tree, please let me know.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

I would like to backport this to 14-branch after a decent delay.

Thanks,
Harald

From e187285dfd83da2f69cfd50854c701744dc8acc5 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 13 May 2024 22:06:33 +0200
Subject: [PATCH] Fortran: fix bounds check for assignment, class component
 [PR86100]

gcc/fortran/ChangeLog:

	PR fortran/86100
	* trans-array.cc (gfc_conv_ss_startstride): Use abridged_ref_name
	to generate a more user-friendly name for bounds-check messages.
	* trans-expr.cc (gfc_copy_class_to_class): Fix bounds check for
	rank>1 by looping over the dimensions.

gcc/testsuite/ChangeLog:

	PR fortran/86100
	* gfortran.dg/bounds_check_25.f90: New test.
---
 gcc/fortran/trans-array.cc|  7 +++-
 gcc/fortran/trans-expr.cc | 40 ++-
 gcc/testsuite/gfortran.dg/bounds_check_25.f90 | 32 +++
 3 files changed, 60 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/bounds_check_25.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index c5b56f4e273..eec62c296ff 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4911,6 +4911,7 @@ done:
 	  gfc_expr *expr;
 	  locus *expr_loc;
 	  const char *expr_name;
+	  char *ref_name = NULL;

 	  ss_info = ss->info;
 	  if (ss_info->type != GFC_SS_SECTION)
@@ -4922,7 +4923,10 @@ done:

 	  expr = ss_info->expr;
 	  expr_loc = >where;
-	  expr_name = expr->symtree->name;
+	  if (expr->ref)
+	expr_name = ref_name = abridged_ref_name (expr, NULL);
+	  else
+	expr_name = expr->symtree->name;

 	  gfc_start_block ();

@@ -5134,6 +5138,7 @@ done:

 	  gfc_add_expr_to_block (, tmp);

+	  free (ref_name);
 	}

   tmp = gfc_finish_block ();
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index e315e2d3370..dfc5b8e9b4a 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -1520,7 +1520,6 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   stmtblock_t body;
   stmtblock_t ifbody;
   gfc_loopinfo loop;
-  tree orig_nelems = nelems; /* Needed for bounds check.  */

   gfc_init_block ();
   tmp = fold_build2_loc (input_location, MINUS_EXPR,
@@ -1552,27 +1551,32 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   /* Add bounds check.  */
   if ((gfc_option.rtcheck & GFC_RTCHECK_BOUNDS) > 0 && is_from_desc)
 	{
-	  char *msg;
 	  const char *name = "<>";
-	  tree from_len;
+	  int dim, rank;

 	  if (DECL_P (to))
-	name = (const char *)(DECL_NAME (to)->identifier.id.str);
-
-	  from_len = gfc_conv_descriptor_size (from_data, 1);
-	  from_len = fold_convert (TREE_TYPE (orig_nelems), from_len);
-	  tmp = fold_build2_loc (input_location, NE_EXPR,
-  logical_type_node, from_len, orig_nelems);
-	  msg = xasprintf ("Array bound mismatch for dimension %d "
-			   "of array '%s' (%%ld/%%ld)",
-			   1, name);
-
-	  gfc_trans_runtime_check (true, false, tmp, ,
-   _current_locus, msg,
-			 fold_convert (long_integer_type_node, orig_nelems),
-			   fold_convert (long_integer_type_node, from_len));
+	name = IDENTIFIER_POINTER (DECL_NAME (to));

-	  free (msg);
+	  rank = GFC_TYPE_ARRAY_RANK (TREE_TYPE (from_data));
+	  for (dim = 1; dim <= rank; dim++)
+	{
+	  tree from_len, to_len, cond;
+	  char *msg;
+
+	  from_len = gfc_conv_descriptor_size (from_data, dim);
+	  from_len = fold_convert (long_integer_type_node, from_len);
+	  to_len = gfc_conv_descriptor_size (to_data, dim);
+	  to_len = fold_convert (long_integer_type_node, to_len);
+	  msg = xasprintf ("Array bound mismatch for dimension %d "
+			   "of array '%s' (%%ld/%%ld)",
+			   dim, name);
+	  cond = fold_build2_loc (input_location, NE_EXPR,
+  logical_type_node, from_len, to_len);
+	  gfc_trans_runtime_check (true, false, cond, ,
+   _current_locus, msg,
+   to_len, from_len);
+	  free (msg);
+	}
 	}

   tmp = build_call_vec (fcn_type, fcn, args);
diff --git a/gcc/testsuite/gfortran.dg/bounds_check_25.f90 

Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-05-13 Thread Jeff Law




On 5/13/24 12:49 PM, Vineet Gupta wrote:

Apologies for the delay in getting this out. Needed to fix one ICE
with glibc build and fresh round of testing: both testsuite and SPEC
runs (which are similar to v1 in terms of Cactu gains, but some more minor
regressions elsewhere, in gcc). Again those seem so small that IMHO this
should still go in.

I'll investigate those next as well as an existing weirdness in glibc tempnam
which I spotted during the debugging.

Changes since v1 [1]
  - Tighten the main condition to avoid stack regs as destination
(to avoid making them potentially unaligned with -2047 addend:
 this might be OK execution/ABI wise, but undesirable/ugly still
 especially when coming from compiler codegen).
  - Ensure that first alternative is always split
  - Remove "&& 1" from split condition. That was tripping up glibc build
with illegal operands `add s0, s0, 2048`.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

  
+;; Special case of adding a reg and constant if latter is sum of two S12

+;; values (in range -2048 to 2047). Avoid materialized the const and fuse
+;; into the add (with an additional add for 2nd value). Makes a 3 insn
+;; sequence into 2 insn.
+
+(define_insn_and_split "*add3_const_sum_of_two_s12"
+  [(set (match_operand:P 0 "register_operand" "=r,r")
+   (plus:P (match_operand:P 1 "register_operand" " r,r")
+   (match_operand:P 2 "const_two_s12" " MiG,r")))]
+  "!riscv_reg_frame_related (operands[0])"
So that !riscv_reg_frame_related is my only concern with this patch. 
It's a destination, so it *may* be OK.


If it were a source operand, then we'd have to worry about cases where 
it was a pseudo with the same value as sp/fp/argp and subsequent copy 
propagation replacing the pseudo with sp/fp/argp causing the insn to no 
longer match.


Similarly if it were a source operand we'd have to worry about cases 
where the pseudo had a registered (or discoverable) equivalence to 
sp/fp/argp plus an offset.  IRA/LRA can replace the use with its 
equivalence in some of those cases which would have potentially caused 
headaches.


But as a destination we really just have to worry about generation in 
the prologue/epilogue and for alloca calls.  Those should be the only 
places that set one of those special registers.  They're constrained 
enough that I think we'll be OK.


I'm very slightly worried about hard register cprop, but I think it 
should be safe these days WRT those special registers in the unlikely 
event it found an opportunity to propagate them.


So a tentative OK.  If we find this tidbit is problematic in the 
future, then what I would suggest is we allow those special registers 
and dial-back the aggressiveness on the range of allowed constants. 
That would allow the first instruction in the sequence to never create a 
mis-aligned sp.  But again, that's only if we need to revisit.


Please wait for CI to report back sane results :-)

Jeff


[PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-13 Thread Robin Dapp
Hi,

this patch splits the vfw...wf pattern so we do not emit
e.g. vfwadd.wf v0,v8,fa5,v0.t anymore.
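
The rule being enforced can be stated as a tiny predicate: a masked
instruction reads its mask from v0, so its destination must not be v0.
The sketch below is an illustration with hypothetical names; it ignores
register grouping (LMUL > 1) and is not how the machine description
expresses the constraint.

```c
#include <stdbool.h>

/* vd is the destination vector register number (0..31); masked is true
   when the instruction uses v0.t as its mask.  */
bool dest_allowed(int vd, bool masked)
{
    return !(masked && vd == 0);
}
```

Under this model "vfwadd.wf v0,v8,fa5,v0.t" is rejected, while the same
instruction without a mask, or with another destination, is accepted.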

Regtested on rv64gcv_zvfh.

Regards
 Robin

gcc/ChangeLog:

PR target/115068

* config/riscv/vector.md:  Split vfw.wf pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
* gcc.target/riscv/rvv/base/pr115068.c: New test.
---
 gcc/config/riscv/vector.md| 20 ++---
 .../gcc.target/riscv/rvv/base/pr115068-run.c  | 28 ++
 .../gcc.target/riscv/rvv/base/pr115068.c  | 29 +++
 3 files changed, 67 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 2a54f78df8e..e408baa809c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7178,24 +7178,24 @@ (define_insn "@pred_single_widen_sub"
(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])
 
 (define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTF 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTF 0 "register_operand""=vd, vd, 
vr, vr")
(if_then_else:VWEXTF
  (unspec:
-   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand"  "i,
i")
-(match_operand 9 "const_int_operand"  "i,
i")
+   [(match_operand: 1 "vector_mask_operand"  " vm, 
vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand" " rK, rK, rK, 
rK")
+(match_operand 6 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 7 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 8 "const_int_operand" "  i,  i,  i, 
 i")
+(match_operand 9 "const_int_operand" "  i,  i,  i, 
 i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTF
-   (match_operand:VWEXTF 3 "register_operand" "   vr,   
vr")
+   (match_operand:VWEXTF 3 "register_operand"" vr, vr, vr, 
vr")
(float_extend:VWEXTF
  (vec_duplicate:
-   (match_operand: 4 "register_operand"   "f,
f"
- (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,
0")))]
+   (match_operand: 4 "register_operand"  "  f,  f,  f, 
 f"
+ (match_operand:VWEXTF 2 "vector_merge_operand"  " vu,  0, vu, 
 0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
   [(set_attr "type" "vf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
new file mode 100644
index 000..95ec8e06021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+__riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+__riscv_vundefined_f64m8 (), 1.0, __riscv_vsetvlmax_e64m8 ());
+  asm volatile ("" ::"vr"(vfwadd_wf_f64m8_m_vd) : "memory");
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
new file mode 100644
index 000..6d680037aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include 
+#include 
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+__riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+

[wwwdocs] cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 114

2024-05-13 Thread Marek Polacek
Pushed.

commit 06c46c88cc02e0dff5f65b41754178fb25fb939e
Author: Marek Polacek 
Date:   Mon May 13 16:09:05 2024 -0400

cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 114

diff --git a/htdocs/projects/cxx-dr-status.html 
b/htdocs/projects/cxx-dr-status.html
index a5f45359..2a61cfbd 100644
--- a/htdocs/projects/cxx-dr-status.html
+++ b/htdocs/projects/cxx-dr-status.html
@@ -15,7 +15,7 @@
 
   This table tracks the implementation status of C++ defect reports in GCC.
   It is based on C++ Standard Core Language Issue Table of Contents, Revision
-  113 (https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
+  114 (https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here).
 
   
 
@@ -1652,7 +1652,7 @@
 
 
   https://wg21.link/cwg233;>233
-  drafting
+  review
   References vs pointers in UDC overload resolution
   No
   https://gcc.gnu.org/PR114697;>PR114697
@@ -3196,7 +3196,7 @@
 
 
   https://wg21.link/cwg453;>453
-  tentatively ready
+  DR
   References may only bind to "valid" objects
   ?
   
@@ -7031,11 +7031,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1001;>1001
-  drafting
+  review
   Parameter type adjustment in dependent parameter types
-  -
+  ?
   https://gcc.gnu.org/PR51851;>PR51851
 
 
@@ -7292,7 +7292,7 @@
 
 
   https://wg21.link/cwg1038;>1038
-  DR
+  DRWP
   Overload resolution of x.static_func
   ?
   
@@ -8624,6 +8624,7 @@
   https://wg21.link/cwg1228;>1228
   NAD
   Copy-list-initialization and explicit constructors
+
   No
   https://gcc.gnu.org/PR113300;>PR113300
 
@@ -11916,7 +11917,7 @@
 
 
   https://wg21.link/cwg1698;>1698
-  DR
+  DRWP
   Files ending in \
   ?
   
@@ -12075,11 +12076,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg1721;>1721
-  drafting
+  review
   Diagnosing ODR violations for static data members
-  -
+  ?
   
 
 
@@ -13454,11 +13455,11 @@
   N/A
   
 
-
+
   https://wg21.link/cwg1918;>1918
-  open
+  CD5
   friend templates with dependent scopes
-  -
+  ?
   
 
 
@@ -13644,11 +13645,11 @@
   -
   
 
-
+
   https://wg21.link/cwg1945;>1945
-  open
+  CD5
   Friend declarations naming members of class templates in 
non-templates
-  -
+  ?
   
 
 
@@ -13709,7 +13710,7 @@
 
 
   https://wg21.link/cwg1954;>1954
-  tentatively ready
+  DR
   typeid null dereference check in subexpressions
   ?
   
@@ -14373,11 +14374,11 @@
   -
   
 
-
+
   https://wg21.link/cwg2049;>2049
-  drafting
+  DRWP
   List initializer in non-type template default argument
-  -
+  ?
   
 
 
@@ -14410,7 +14411,7 @@
 
 
   https://wg21.link/cwg2054;>2054
-  DR
+  DRWP
   Missing description of class SFINAE
   ?
   
@@ -14746,7 +14747,7 @@
 
 
   https://wg21.link/cwg2102;>2102
-  DR
+  DRWP
   Constructor checking in new-expression
   ?
   
@@ -15797,7 +15798,7 @@
 
 
   https://wg21.link/cwg2252;>2252
-  DR
+  DRWP
   Enumeration list-initialization from the same type
   ?
   
@@ -17069,11 +17070,11 @@
   ?
   
 
-
+
   https://wg21.link/cwg2434;>2434
-  open
+  review
   Mandatory copy elision vs non-class objects
-  -
+  ?
   
 
 
@@ -17183,7 +17184,7 @@
 
 
   https://wg21.link/cwg2450;>2450
-  review
+  DRWP
   braced-init-list as a template-argument
   11
   
@@ -17244,12 +17245,12 @@
   ?
   
 
-
+
   https://wg21.link/cwg2459;>2459
-  drafting
+  DRWP
   Template parameter initialization
-  -
-  
+  ?
+  https://gcc.gnu.org/PR113800;>PR113800
 
 
   https://wg21.link/cwg2460;>2460
@@ -17365,7 +17366,7 @@
 
 
   https://wg21.link/cwg2476;>2476
-  tentatively ready
+  DR
   placeholder-type-specifiers and function declarators
   ?
   
@@ -17561,7 +17562,7 @@
 
 
   https://wg21.link/cwg2504;>2504
-  DR
+  DRWP
   Inheriting constructors from virtual base classes
   ?
   
@@ -17750,7 +17751,7 @@
 
 
   https://wg21.link/cwg2531;>2531
-  DR
+  DRWP
   Static data members redeclared as constexpr
   ?
   
@@ -17764,7 +17765,7 @@
 
 
   https://wg21.link/cwg2533;>2533
-  review
+  DR
   Storage duration of implicitly created objects
   ?
   
@@ -17855,14 +17856,14 @@
 
 
   https://wg21.link/cwg2546;>2546
-  tentatively ready
+  DR
   Defaulted secondary comparison operators defined as deleted
   ?
 

[RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Qing Zhao
-Warray-bounds is an important option that helps the Linux kernel keep
array out-of-bounds errors out of the source tree.

However, due to the false positive warnings reported in PR109071
(-Warray-bounds false positive warnings due to code duplication from
jump threading), -Warray-bounds=1 cannot be enabled by default.

Although it's impossible to eliminate all the false positive warnings
from -Warray-bounds=1 (see PR104355 Misleading -Warray-bounds
documentation says "always out of bounds"), we should minimize the
false positive warnings in -Warray-bounds=1.

The root cause of the false positive warnings reported in PR109071 is:

When the jump threading optimization tries to reduce the number of branches
inside a routine, it sometimes needs to duplicate code and split it
into two conditional paths. For example:

The original code:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;

  return;
}

With the thread jump, the above becomes:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    {
      warn ();
      *ptr = 0;                 // Code duplicated since "warn" does return;
      *val = sg->vals[index];   // same for this line.
                                // In this path, since it's under the condition
                                // "index >= 4", the compiler knows the value
                                // of "index" is at least 4, hence the
                                // out-of-bounds warning.
      warn ();
    }
  else
    {
      *ptr = 0;
      *val = sg->vals[index];
    }
  *ptr = *val;
  return;
}

As we can see, after the jump threading optimization, the number of branches
inside the routine "sparx5_set" is reduced from 2 to 1. However, due to the
code duplication (which is needed for the correctness of the code), we
get a false positive out-of-bounds warning.
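
To experiment with the example above, a self-contained variant can be
compiled directly. This is only a sketch: the layout of struct nums, the
body of warn, the warn_count counter and the local pointer val are all
assumptions, since the snippet above leaves them out.

```c
/* Hypothetical, self-contained variant of the routine discussed above.
   struct nums, warn (), warn_count and the local "val" are assumptions
   made for illustration; they are not taken from the PR.  */
struct nums
{
  int vals[4];
};

static int warn_count;

/* Stand-in for the real warn (): it only counts calls and returns.
   In a real reproducer this would be an external, non-inlined function.  */
static void
warn (void)
{
  warn_count++;
}

void
sparx5_set (int *ptr, struct nums *sg, int index)
{
  int tmp;
  int *val = &tmp;   /* assumed destination for the loaded element */

  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];   /* out of bounds whenever index >= 4 */
  if (index >= 4)
    warn ();
  *ptr = *val;
}
```

Building a file like this with -O2 -Warray-bounds is one way to check
whether the diagnostic appears; whether it does will depend on the GCC
version and on warn not being inlined.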

In order to eliminate such false positive out-of-bounds warnings:

A. Add one more flag for GIMPLE: is_splitted.
B. During the jump threading optimization, when basic blocks are
   duplicated, mark all the STMTs inside the original and duplicated
   basic blocks as "is_splitted";
C. Inside the array bounds checker, add the following new heuristic:

If
   1. the stmt is duplicated and split into two conditional paths;
   2. the warning level < 2;
   3. the current block does not dominate the exit block
then do not report the warning.
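
The three conditions of the heuristic can be read as a single predicate.
The following is a minimal sketch under stated assumptions, not the code
from the patch: struct stmt_facts, its fields and suppress_oob_warning are
hypothetical names that stand in for whatever the checker really queries.

```c
#include <stdbool.h>

/* Hypothetical summary of what the checker knows about one statement.  */
struct stmt_facts
{
  bool is_splitted;        /* block was duplicated by jump threading */
  bool bb_dominates_exit;  /* statement's block dominates the exit block */
};

/* Return true when the out-of-bounds warning should be suppressed:
   the statement was duplicated, the warning level is below 2, and the
   statement's block does not dominate the exit block.  */
static bool
suppress_oob_warning (const struct stmt_facts *s, int warn_level)
{
  return s->is_splitted && warn_level < 2 && !s->bb_dominates_exit;
}
```

With this shape, raising the level to -Warray-bounds=2 makes the predicate
false for every statement, which matches the intent of moving the false
positives from level 1 to level 2.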

The false positive warnings are moved from -Warray-bounds=1 to
 -Warray-bounds=2 now.

Bootstrapped and regression tested on both x86 and aarch64. Adjusted
 -Warray-bounds-61.c due to the false positive warnings.

Let me know if you have any comments and suggestions.

Thanks.

Qing


PR tree-optimization/109071

gcc/ChangeLog:

* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
arguments for the new heuristic to not issue warnings.
(array_bounds_checker::check_array_ref): Call the new prototype of the
routine check_out_of_bounds_and_warn.
(array_bounds_checker::check_mem_ref): Add one new argument for the
new heuristic to not issue warnings.
(array_bounds_checker::check_addr_expr): Call the new prototype of the
routine check_mem_ref, add new heuristic for not issue warnings.
(array_bounds_checker::check_array_bounds): Call the new prototype of
the routine check_mem_ref.
* gimple-array-bounds.h: New prototype of check_mem_ref.
* gimple.h (struct GTY): Add one new flag is_splitted for gimple.
(gimple_is_splitted_p): New function.
(gimple_set_is_splitted): New function.
* tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
(back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
both original and copied blocks as IS_SPLITTED.

gcc/testsuite/ChangeLog:

* gcc.dg/Warray-bounds-61.c: Adjust testing case.
* gcc.dg/pr109071-1.c: New test.
* gcc.dg/pr109071.c: New test.
---
 gcc/gimple-array-bounds.cc  | 46 +
 gcc/gimple-array-bounds.h   |  2 +-
 gcc/gimple.h| 21 +--
 gcc/testsuite/gcc.dg/Warray-bounds-61.c |  6 ++--
 gcc/testsuite/gcc.dg/pr109071-1.c   | 22 
 gcc/testsuite/gcc.dg/pr109071.c | 22 
 gcc/tree-ssa-threadupdate.cc| 15 
 7 files changed, 122 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr109071.c

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 008071cd5464..4a2975623bc1 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref,
  tree up_bound, tree up_bound_p1,
  

[to-be-committed][RISC-V] Improve AND with some constants

2024-05-13 Thread Jeff Law


If we have an AND with a constant operand and the constant operand 
requires synthesis, then we may be able to generate more efficient code 
than we do now.


Essentially the need for constant synthesis gives us a budget for 
alternative ways to clear bits, which zext.w can do for bits 32..63 
trivially.  So if we clear 32..63 via zext.w, the constant for the 
remaining bits to clear may be simple enough to use with andi or bclri. 
That will save us an instruction.
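
The equivalence being relied on can be checked in plain C. This is a
sketch only: sext32, and_direct and and_via_zext are illustrative names,
and the code models the arithmetic rather than the RTL in the patch.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of V to 64 bits.  */
static uint64_t
sext32 (uint64_t v)
{
  v &= 0xffffffffu;
  return (v & 0x80000000u) ? (v | 0xffffffff00000000ull) : v;
}

/* Reference: a plain 64-bit AND with the original mask.  */
static uint64_t
and_direct (uint64_t x, uint64_t mask)
{
  return x & mask;
}

/* Two-step form: clear bits 32..63 first (the job of zext.w), then AND
   with the sign-extended mask.  Only valid when bits 32..63 of MASK are
   zero, which is the precondition the splitter checks.  */
static uint64_t
and_via_zext (uint64_t x, uint64_t mask)
{
  uint64_t low = x & 0xffffffffu;   /* models zext.w */
  return low & sext32 (mask);       /* small or single-zero-bit constant */
}
```

For a mask such as 0xfffffffe the sign-extended form is -2, which fits a
12-bit signed immediate, so the second step no longer needs a synthesized
constant.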


This has been tested in Ventana's CI system as well as my own.  I'll wait for 
the upstream CI tester to report success before committing.


Jeff
gcc/
* config/riscv/bitmanip.md: Add new splitter for AND with
a constant that masks off bits 32..63 and needs synthesis.

gcc/testsuite/

* gcc.target/riscv/zba_zbs_and-1.c: New test.

+++ b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 724511b6df3..8769a6b818b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -843,6 +843,40 @@ (define_insn_and_split "*andi_extrabit"
 }
 [(set_attr "type" "bitmanip")])
 
+;; If we have the ZBA extension, then we can clear the upper half of a 64
+;; bit object with a zext.w.  So if we have AND where the constant would
+;; require synthesis of two or more instructions, but 32->64 sign extension
+;; of the constant is a simm12, then we can use zext.w+andi.  If the adjusted
+;; constant is a single bit constant, then we can use zext.w+bclri
+;;
+;; With the mvconst_internal pattern claiming a single insn to synthesize
+;; constants, this must be a define_insn_and_split.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (match_operand:DI 1 "register_operand" "r")
+   (match_operand 2 "const_int_operand" "n")))]
+  "TARGET_64BIT
+   && TARGET_ZBA
+   && !paradoxical_subreg_p (operands[1])
+   /* Only profitable if synthesis takes more than one insn.  */
+   && riscv_const_insns (operands[2]) != 1
+   /* We need the upper half to be zero.  */
+   && (INTVAL (operands[2]) & HOST_WIDE_INT_C (0x)) == 0
+   /* And the the adjusted constant must either be something we can
+  implement with andi or bclri.  */
+   && ((SMALL_OPERAND (sext_hwi (INTVAL (operands[2]), 32))
+|| (TARGET_ZBS && popcount_hwi (INTVAL (operands[2])) == 31))
+   && INTVAL (operands[2]) != 0x7fff)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 3)))
+   (set (match_dup 0) (and:DI (match_dup 0) (match_dup 2)))]
+  "{
+ operands[3] = gen_lowpart (SImode, operands[1]);
+ operands[2] = GEN_INT (sext_hwi (INTVAL (operands[2]), 32));
+   }"
+  [(set_attr "type" "bitmanip")])
+
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
   [(set (pc)
diff --git a/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c 
b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
new file mode 100644
index 000..23fd769449e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+
+unsigned long long w32mem_1(unsigned long long w32)
+{
+return w32 & ~(1U << 0);
+}
+
+unsigned long long w32mem_2(unsigned long long w32)
+{
+return w32 & ~(1U << 30);
+}
+
+unsigned long long w32mem_3(unsigned long long w32)
+{
+return w32 & ~(1U << 31);
+}
+
+/* If we do synthesis, then we'd see an addi.  */
+/* { dg-final { scan-assembler-not "addi\t" } } */


[PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]

2024-05-13 Thread Vineet Gupta
Apologies for the delay in getting this out. I needed to fix one ICE
with the glibc build and do a fresh round of testing: both testsuite and SPEC
runs (which are similar to v1 in terms of Cactu gains, but with some more minor
regressions elsewhere, e.g. gcc). Again those seem so small that IMHO this
should still go in.

I'll investigate those next, as well as an existing weirdness in glibc tempnam
which I spotted during the debugging.

Changes since v1 [1]
 - Tighten the main condition to avoid stack regs as destination
   (to avoid making them potentially unaligned with -2047 addend:
   this might be OK execution/ABI wise, but still undesirable/ugly,
   especially when coming from compiler codegen).
 - Ensure that first alternative is always split
 - Remove "&& 1" from split condition. That was tripping up glibc build
   with illegal operands `add s0, s0, 2048`.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

---

... if the constant can be represented as a sum of two S12 values.
The two S12 values could instead be fused with the subsequent ADD insn.
This helps
 - avoid an additional LUI insn
 - side benefits of not clobbering a reg

e.g.
                          w/o patch           w/ patch
long                    |                   |
plus(unsigned long i)   | li   a5,4096      |
{                       | addi a5,a5,-2032  | addi a0, a0, 2047
   return i + 2064;     | add  a0,a0,a5     | addi a0, a0, 17
}                       | ret               | ret
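
The split used in the example (2064 = 2047 + 17) can be sketched as a small
helper. This is an illustration under stated assumptions, not the function
from the patch: split_two_s12 and the S12_MIN/S12_MAX macros are
hypothetical names, and the plain numeric S12 range is used here.

```c
#include <stdbool.h>

#define S12_MIN (-2048L)
#define S12_MAX 2047L

/* Split VAL into two addends that each fit a signed 12-bit immediate.
   Returns false when VAL already fits in one S12 (no split needed) or
   lies outside the range reachable by two S12 addends.  */
static bool
split_two_s12 (long val, long *first, long *second)
{
  if (val > S12_MAX && val <= 2 * S12_MAX)
    {
      *first = S12_MAX;
      *second = val - S12_MAX;
      return true;
    }
  if (val < S12_MIN && val >= 2 * S12_MIN)
    {
      *first = S12_MIN;
      *second = val - S12_MIN;
      return true;
    }
  return false;
}
```

Saturating the first addend keeps the second one as small as possible; any
other pair with the same sum would be equally valid for a non-stack
destination.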

NOTE: In theory not having the const in a standalone reg might seem less
      CSE friendly, but for the workloads under consideration these
      materializations come from very late LRA reloads, and follow-on GCSE
      is not doing much currently.

The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses, which are finalized late in
the optimization pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads, resulting in mind-boggling repetition of the
exact same insn sequence, including LUI based constant materialization.

This shaves off 290 billion dynamic instructions (QEMU icounts) in the
SPEC 2017 Cactu benchmark, which is over 10% of the workload. In the rest of the
suite, an additional 10 billion are shaved off, with both gains and losses
in individual workloads, as is usual with compiler changes.

 500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
 500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
 500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
 502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
 502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
 502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
 502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
 502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
 503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
 503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
 503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
 503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
 505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
 507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
 508.namd_r        | 1,855,884,342,110 | 1,855,881,110,934 |
 510.parest_r      | 1,654,525,521,053 | 1,654,402,859,174 |
 511.povray_r      | 2,990,146,655,619 | 2,990,060,324,589 |
 519.lbm_r         | 1,158,337,294,525 | 1,158,337,294,529 |
 520.omnetpp_r     | 1,021,765,791,283 | 1,026,165,661,394 |
 521.wrf_r         | 1,715,955,652,503 | 1,714,352,737,385 |
 523.xalancbmk_r   |   849,846,008,075 |   849,836,851,752 |
 525.x264_r-0      |   277,801,762,763 |   277,488,776,427 |
 525.x264_r-1      |   927,281,789,540 |   926,751,516,742 |
 525.x264_r-2      |   915,352,631,375 |   914,667,785,953 |
 526.blender_r     | 1,652,839,180,887 | 1,653,260,825,512 |
 527.cam4_r        | 1,487,053,494,925 | 1,484,526,670,770 |
 531.deepsjeng_r   | 1,641,969,526,837 | 1,642,126,598,866 |
 538.imagick_r     | 2,098,016,546,691 | 2,097,997,929,125 |
 541.leela_r       | 1,983,557,323,877 | 1,983,531,314,526 |
 544.nab_r         | 1,516,061,611,233 | 1,516,061,407,715 |
 548.exchange2_r   | 2,072,594,330,215 | 2,072,591,648,318 |
 549.fotonik3d_r   | 1,001,499,307,366 | 1,001,478,944,189 |
 554.roms_r        | 1,028,799,739,111 | 1,028,780,904,061 |
 557.xz_r-0        |   363,827,039,684 |   363,057,014,260 |
 557.xz_r-1        |   906,649,112,601 |   905,928,888,732 |
 557.xz_r-2        |   509,023,898,187 |   508,140,356,932 |
 997.specrand_fr   |       402,535,577 |       403,052,561 |
 999.specrand_ir   |       402,535,577 |       403,052,561 |

This should still be considered damage control, as the real/deeper fix
would be to reduce the number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (that's a different PR/114729).

Implementation Details (for posterity)

[PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-13 Thread Vineet Gupta
If the constant used for stack offset can be expressed as a sum of two S12
values, the constant need not be materialized (in a reg) and instead the
two S12 values can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down
to 2 insns vs. 3.

The previous patches to generally avoid LUI based const materialization didn't
fix this PR, which needs this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

   gcc-13.1 release  |  gcc 230823         |  This patch        |  clang/llvm
                     |  g6619b3d4c15c      |                    |
---------------------+---------------------+--------------------+-------------------
li   t0,-4096        | li   t0,-4096       | addi sp,sp,-2048   | addi sp,sp,-2048
addi t0,t0,2016      | addi t0,t0,2032     | add  sp,sp,-16     | addi sp,sp,-32
li   a4,4096         | add  sp,sp,t0       | add  a5,sp,a0      | add  a1,sp,16
add  sp,sp,t0        | addi a5,sp,-2032    | sb   zero,0(a5)    | add  a0,a0,a1
li   a5,-4096        | add  a0,a5,a0       | addi sp,sp,2032    | sb   zero,0(a0)
addi a4,a4,-2032     | li   t0, 4096       | addi sp,sp,32      | addi sp,sp,2032
add  a4,a4,a5        | sb   zero,2032(a0)  | ret                | addi sp,sp,48
addi a5,sp,16        | addi t0,t0,-2032    |                    | ret
add  a5,a4,a5        | add  sp,sp,t0       |
add  a0,a5,a0        | ret                 |
li   t0,4096         |
sd   a5,8(sp)        |
sb   zero,2032(a0)   |
addi t0,t0,-2016     |
add  sp,sp,t0        |
ret                  |
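
In the "This patch" column the 2064-byte frame is adjusted as -2048 then
-16 on entry and 2032 then 32 on exit, so sp stays 16-byte aligned after
every instruction. A minimal sketch of such an alignment-preserving split
follows; split_stack_adjust and the macros are hypothetical names, and the
16-byte alignment requirement is an assumption taken from the usual RISC-V
stack convention rather than from the patch text.

```c
#include <stdbool.h>

/* Largest multiple of 16 that still fits a signed 12-bit immediate.  */
#define S12_MAX_ALIGNED 2032L
#define S12_MIN (-2048L)

/* Split a 16-byte-aligned stack adjustment ADJ into two S12 addends so
   that the intermediate sp value is also 16-byte aligned.  Returns false
   when ADJ is unaligned, fits in a single S12, or is out of range.  */
static bool
split_stack_adjust (long adj, long *first, long *second)
{
  if (adj % 16 != 0)
    return false;
  if (adj > S12_MAX_ALIGNED && adj <= 2 * S12_MAX_ALIGNED)
    {
      *first = S12_MAX_ALIGNED;
      *second = adj - S12_MAX_ALIGNED;
      return true;
    }
  if (adj < S12_MIN && adj >= 2 * S12_MIN)
    {
      *first = S12_MIN;
      *second = adj - S12_MIN;
      return true;
    }
  return false;
}
```

Using 2032 rather than 2047 as the positive bound is what keeps the first
addi from leaving sp misaligned.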

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv.cc | 74 +--
 gcc/config/riscv/riscv.h  |  7 ++
 gcc/testsuite/gcc.target/riscv/pr105733.c | 15 
 .../riscv/rvv/autovec/vls/spill-1.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c   |  4 +-
 11 files changed, 105 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 706dc204e643..6da6ae4d041f 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -166,6 +166,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx 
*, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 extern bool riscv_reg_frame_related (rtx);
+extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *,
+   HOST_WIDE_INT *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4067505270e1..4b742489b272 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4063,6 +4063,32 @@ riscv_split_doubleword_move (rtx dest, rtx src)
riscv_emit_move (riscv_subword (dest, true), riscv_subword (src, true));
  }
 }
+
+/* Constant VAL is known to be sum of two S12 constants.  Break it into
+   comprising BASE and OFF.
+   Numerically S12 is -2048 to 2047, however it uses the more conservative
+   range -2048 to 2032 as offsets pertain to stack related registers.  */
+
+void
+riscv_split_sum_of_two_s12 (HOST_WIDE_INT val, HOST_WIDE_INT *base,
+   HOST_WIDE_INT *off)
+{
+  if (SUM_OF_TWO_S12_N (val))
+{
+  *base = -2048;
+  *off = val - (-2048);
+}
+  else if (SUM_OF_TWO_S12_P_ALGN (val))
+{
+  *base = 

[PATCH v2 0/2] RISC-V improve stack/array access by constant mat tweak

2024-05-13 Thread Vineet Gupta
Hi,

This set of patches helps improve stack/array accesses by improving
constant materialization. Details are in the respective patches.

The first patch is the main change which improves SPEC cactu by 10%.

As discussed/agreed for v1 [1], I've dropped the splitter variant for
stack accesses.

I also have a few follow-ups which I'll come back to separately.

Thx,
-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647874.html

Vineet Gupta (2):
  RISC-V: avoid LUI based const materialization ... [part of PR/106265]
  RISC-V: avoid LUI based const mat in prologue/epilogue expansion
[PR/105733]

 gcc/config/riscv/constraints.md   |  6 ++
 gcc/config/riscv/predicates.md|  6 ++
 gcc/config/riscv/riscv-protos.h   |  3 +
 gcc/config/riscv/riscv.cc | 85 +--
 gcc/config/riscv/riscv.h  | 22 +
 gcc/config/riscv/riscv.md | 40 +
 gcc/testsuite/gcc.target/riscv/pr105733.c | 15 
 .../riscv/rvv/autovec/vls/spill-1.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c   |  4 +-
 .../gcc.target/riscv/sum-of-two-s12-const-1.c | 45 ++
 .../gcc.target/riscv/sum-of-two-s12-const-2.c | 15 
 .../gcc.target/riscv/sum-of-two-s12-const-3.c | 22 +
 17 files changed, 266 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-3.c

-- 
2.34.1



Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
> 
> 在 2024/5/9 13:44, Kewen.Lin 写道:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
> 
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
> 
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

> 
>>
>>>
>>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> rs6000: Enable overlapped by-pieces operations
>>>
>>> This patch enables overlapped by-piece operations by defining
>>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>>
>>> gcc/
>>> * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>>
>>> gcc/testsuite/
>>> * gcc.target/powerpc/block-cmp-9.c: New.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6b9a40fcc66..2b5f5cf1d86 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
>>> rs6000_attribute_table[] =
>>>  #undef TARGET_CONST_ANCHOR
>>>  #define TARGET_CONST_ANCHOR 0x8000
>>>
>>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>>> +
>>>  
>>>
>>>  /* Processor table.  */
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
>>> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> new file mode 100644
>>> index 000..b5f51affbb7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
> 
> I just wanted to exclude P7 LE, as targetm.slow_unaligned_access returns false
> for it and the cmpmemsi expander won't be invoked.

> I thought it over and it isn't needed. For the sub-targets where the library
> is called, l[hb]z won't be generated either.

Thanks for checking, OK with dropping this forced power8.

BR,
Kewen

> 
>>
>> BR,
>> Kewen
>>
>>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>>> +
>>> +/* Test if by-piece overlap compare is enabled and following case is
>>> +   implemented by two overlap word loads and compares.  */
>>> +
>>> +int foo (const char* s1, const char* s2)
>>> +{
>>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>>> +}
>>
> 
> Thanks
> Gui Haochen



Re: [COMMITTED 2/5] Fix ranger when called from SCEV.

2024-05-13 Thread Jan-Benedict Glaw
On Mon, 2024-05-13 20:19:42 +0200, Jan-Benedict Glaw  wrote:
> On Tue, 2024-04-30 17:24:15 -0400, Andrew MacLeod  wrote:
> > Bootstrapped on x86_64-pc-linux-gnu with no regressions.  pushed.
> 
> Starting with this patch (upstream as
> e8ae56a7dc46e39a48017bb5159e4dc672ec7fad, can still be reproduced with
> 0c585c8d0dd85601a8d116ada99126a48c8ce9fd as of May 13th), my CI builds fail
> for csky-elf in all-target-libgcc by falling into an infinite loop:
> 
> ../gcc/configure '--with-pkgversion=basepoints/gcc-15-432-g0c585c8d0dd, built at 1715608899' \
>   --prefix=/tmp/gcc-csky-elf --enable-werror-always --enable-languages=all \
>   --disable-gcov --disable-shared --disable-threads --target=csky-elf \
>   --without-headers
> make V=1 all-gcc
> make V=1 install-strip-gcc
> make V=1 all-target-libgcc

Just to add:

/var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/cc1 -quiet \
    -I . -I . -I ../../.././gcc -I ../../../../gcc/libgcc \
    -I ../../../../gcc/libgcc/. -I ../../../../gcc/libgcc/../gcc \
    -I ../../../../gcc/libgcc/../include -imultilib ck801 \
    -iprefix /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/gcc/../lib/gcc/csky-elf/15.0.0/ \
    -isystem /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include \
    -isystem /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include-fixed \
    -MD unwind-dw2-fde.d -MF unwind-dw2-fde.dep -MP -MT unwind-dw2-fde.o \
    -D IN_GCC -D CROSS_DIRECTORY_STRUCTURE -D IN_LIBGCC2 -D inhibit_libc \
    -D HAVE_CC_TLS -D USE_EMUTLS -D HIDE_EXPORTS \
    -isystem /tmp/gcc-csky-elf/csky-elf/include \
    -isystem /tmp/gcc-csky-elf/csky-elf/sys-include \
    -isystem ./include ../../../../gcc/libgcc/unwind-dw2-fde.c -quiet \
    -dumpbase unwind-dw2-fde.c -dumpbase-ext .c -mcpu=ck801 -g -g -g -O2 -O2 -O2 \
    -Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes \
    -Wmissing-prototypes -Wold-style-definition -fbuilding-libgcc -fno-stack-protector \
    -fexceptions -fvisibility=hidden -o /tmp/cc3SHedS.s

> (gdb) bt
> #0  0x0098f1df in bitmap_list_find_element (head=0x38f2e18, 
> indx=5001) at ../../gcc/gcc/bitmap.cc:375
> #1  bitmap_set_bit (head=0x38f2e18, bit=640244) at ../../gcc/gcc/bitmap.cc:962
> #2  0x00d39cd1 in process_bb_lives (bb=<optimized out>, 
> curr_point=@0x7ffe062c1b2c: 3039473, dead_insn_p=<optimized out>) at 
> ../../gcc/gcc/lra-lives.cc:889
> #3  lra_create_live_ranges_1 (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1416
> #4  0x00d3b810 in lra_create_live_ranges (all_p=all_p@entry=true, 
> dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1486
> #5  0x00d1a8bd in lra (f=<optimized out>, verbose=<optimized out>) at 
> ../../gcc/gcc/lra.cc:2482
> #6  0x00cd0e18 in do_reload () at ../../gcc/gcc/ira.cc:5973
> #7  (anonymous namespace)::pass_reload::execute (this=<optimized out>) at 
> ../../gcc/gcc/ira.cc:6161
> #8  0x00de6368 in execute_one_pass (pass=pass@entry=0x367c490) at 
> ../../gcc/gcc/passes.cc:2647
> #9  0x00de6c00 in execute_pass_list_1 (pass=0x367c490) at 
> ../../gcc/gcc/passes.cc:2756
> #10 0x00de6c12 in execute_pass_list_1 (pass=0x367b2f0) at 
> ../../gcc/gcc/passes.cc:2757
> #11 0x00de6c39 in execute_pass_list (fn=0x7f24a1c06240, 
> pass=<optimized out>) at ../../gcc/gcc/passes.cc:2767
> #12 0x00a188c6 in cgraph_node::expand (this=0x7f24a1bfaaa0) at 
> ../../gcc/gcc/context.h:48
> #13 cgraph_node::expand (this=0x7f24a1bfaaa0) at 
> ../../gcc/gcc/cgraphunit.cc:1798
> #14 0x00a1a69b in expand_all_functions () at 
> ../../gcc/gcc/cgraphunit.cc:2028
> #15 symbol_table::compile (this=0x7f24a205b000) at 
> ../../gcc/gcc/cgraphunit.cc:2404
> #16 0x00a1ccb8 in symbol_table::compile (this=0x7f24a205b000) at 
> ../../gcc/gcc/cgraphunit.cc:2315
> #17 symbol_table::finalize_compilation_unit (this=0x7f24a205b000) at 
> ../../gcc/gcc/cgraphunit.cc:2589
> #18 0x00f0932d in compile_file () at ../../gcc/gcc/toplev.cc:476
> #19 0x00839648 in do_compile () at ../../gcc/gcc/toplev.cc:2158
> #20 toplev::main (this=this@entry=0x7ffe062c1f2e, argc=<optimized out>, 
> argc@entry=78, argv=<optimized out>, argv@entry=0x7ffe062c2058) at 
> ../../gcc/gcc/toplev.cc:2314
> #21 0x0083ad9e in main (argc=78, argv=0x7ffe062c2058) at 
> ../../gcc/gcc/main.cc:39
> 
> (Loop is based in process_bb_lives(), looping in the
> FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next) block starting at
> about line 696.)

MfG, JBG

-- 




Re: [COMMITTED 2/5] Fix ranger when called from SCEV.

2024-05-13 Thread Jan-Benedict Glaw
On Tue, 2024-04-30 17:24:15 -0400, Andrew MacLeod  wrote:
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  pushed.

Starting with this patch (upstream as
e8ae56a7dc46e39a48017bb5159e4dc672ec7fad, can still be reproduced with
0c585c8d0dd85601a8d116ada99126a48c8ce9fd as of May 13th), my CI builds fail for
csky-elf in all-target-libgcc by falling into an infinite loop:

../gcc/configure '--with-pkgversion=basepoints/gcc-15-432-g0c585c8d0dd, built 
at 1715608899'\
--prefix=/tmp/gcc-csky-elf --enable-werror-always 
--enable-languages=all\
--disable-gcov --disable-shared --disable-threads --target=csky-elf 
--without-headers
make V=1 all-gcc
make V=1 install-strip-gcc
make V=1 all-target-libgcc

(gdb) bt
#0  0x0098f1df in bitmap_list_find_element (head=0x38f2e18, indx=5001) 
at ../../gcc/gcc/bitmap.cc:375
#1  bitmap_set_bit (head=0x38f2e18, bit=640244) at ../../gcc/gcc/bitmap.cc:962
#2  0x00d39cd1 in process_bb_lives (bb=<optimized out>, 
curr_point=@0x7ffe062c1b2c: 3039473, dead_insn_p=<optimized out>) at 
../../gcc/gcc/lra-lives.cc:889
#3  lra_create_live_ranges_1 (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1416
#4  0x00d3b810 in lra_create_live_ranges (all_p=all_p@entry=true, 
dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1486
#5  0x00d1a8bd in lra (f=<optimized out>, verbose=<optimized out>) at 
../../gcc/gcc/lra.cc:2482
#6  0x00cd0e18 in do_reload () at ../../gcc/gcc/ira.cc:5973
#7  (anonymous namespace)::pass_reload::execute (this=<optimized out>) at 
../../gcc/gcc/ira.cc:6161
#8  0x00de6368 in execute_one_pass (pass=pass@entry=0x367c490) at 
../../gcc/gcc/passes.cc:2647
#9  0x00de6c00 in execute_pass_list_1 (pass=0x367c490) at 
../../gcc/gcc/passes.cc:2756
#10 0x00de6c12 in execute_pass_list_1 (pass=0x367b2f0) at 
../../gcc/gcc/passes.cc:2757
#11 0x00de6c39 in execute_pass_list (fn=0x7f24a1c06240, pass=<optimized out>) at ../../gcc/gcc/passes.cc:2767
#12 0x00a188c6 in cgraph_node::expand (this=0x7f24a1bfaaa0) at 
../../gcc/gcc/context.h:48
#13 cgraph_node::expand (this=0x7f24a1bfaaa0) at 
../../gcc/gcc/cgraphunit.cc:1798
#14 0x00a1a69b in expand_all_functions () at 
../../gcc/gcc/cgraphunit.cc:2028
#15 symbol_table::compile (this=0x7f24a205b000) at 
../../gcc/gcc/cgraphunit.cc:2404
#16 0x00a1ccb8 in symbol_table::compile (this=0x7f24a205b000) at 
../../gcc/gcc/cgraphunit.cc:2315
#17 symbol_table::finalize_compilation_unit (this=0x7f24a205b000) at 
../../gcc/gcc/cgraphunit.cc:2589
#18 0x00f0932d in compile_file () at ../../gcc/gcc/toplev.cc:476
#19 0x00839648 in do_compile () at ../../gcc/gcc/toplev.cc:2158
#20 toplev::main (this=this@entry=0x7ffe062c1f2e, argc=<optimized out>, 
argc@entry=78, argv=<optimized out>, argv@entry=0x7ffe062c2058) at 
../../gcc/gcc/toplev.cc:2314
#21 0x0083ad9e in main (argc=78, argv=0x7ffe062c2058) at 
../../gcc/gcc/main.cc:39

(Loop is based in process_bb_lives(), looping in the
FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next) block starting at
about line 696.)

MfG, JBG

-- 




Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

2024-05-13 Thread Jeff Law




On 5/13/24 9:00 AM, Li, Pan2 wrote:

> Committed, thanks Juzhe and Kito. Let's wait for a while before backport to 14.

Could you fix the formatting nits caught by the CI linter?

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & 
RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & 
RVV_REQUIRE_ELEN_64) &&



The "&&" needs to come down to the next line, indented like

if ((exts & RVV_REQUIRE_ELEN_FP_16)
    && !TARGET_VECTOR_.)

I.e., the "&&" indents just inside the first open paren.  It looks like 
all the conditions in validate_instance_type_required_extensions need to 
be fixed in a similar manner.
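
As a self-contained illustration of the layout described above, here is a sketch with placeholder names. The flag macros and the function are invented for the example and are not the identifiers in riscv-vector-builtins.cc. The point is only the line break: the operator starts the continuation line and sits one column inside the opening paren of the condition.

```c
#include <stdbool.h>

/* Placeholder flag values, invented for this sketch.  */
#define REQUIRE_FP_16 (1u << 0)
#define REQUIRE_FP_32 (1u << 1)

/* Returns true when every extension required by EXTS is available.  */
bool
required_extensions_ok (unsigned exts, bool have_fp16, bool have_fp32)
{
  /* "&&" leads the continuation line instead of trailing the first.  */
  if ((exts & REQUIRE_FP_16)
      && !have_fp16)
    return false;

  if ((exts & REQUIRE_FP_32)
      && !have_fp32)
    return false;

  return true;
}
```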


Given this is NFC, just post it for the archiver.  No need to wait on 
review.


Jeff




[COMMITTED][GCC12] Backport of 111009 patch.

2024-05-13 Thread Andrew MacLeod

Same patch for gcc12.

bootstraps and passes all tests on x86_64-pc-linux-gnu

On 5/9/24 10:32, Andrew MacLeod wrote:
As requested, backported the patch for 111009 to resolve incorrect 
ranges from addr_expr and committed to GCC 13 branch.


bootstraps and passes all tests on x86_64-pc-linux-gnu

Andrew

commit b5d079c37e9eee15c0bfe34ffcae31e551192777
Author: Andrew MacLeod 
Date:   Fri May 10 13:56:01 2024 -0400

Fix range-ops operator_addr.

Lack of symbolic information prevents op1_range from being able to draw
the same conclusions as fold_range can.

PR tree-optimization/111009
gcc/
* range-op.cc (operator_addr_expr::op1_range): Be more restrictive.
* value-range.h (contains_zero_p): New.

gcc/testsuite/
* gcc.dg/pr111009.c: New.

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index bf95f5fbaa1..2e0d67b70b6 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -3825,7 +3825,17 @@ operator_addr_expr::op1_range (irange &r, tree type,
 			   const irange &op2,
 			   relation_kind rel ATTRIBUTE_UNUSED) const
 {
-  return operator_addr_expr::fold_range (r, type, lhs, op2);
+   if (empty_range_varying (r, type, lhs, op2))
+return true;
+
+  // Return a non-null pointer of the LHS type (passed in op2), but only
+  // if we cant overflow, eitherwise a no-zero offset could wrap to zero.
+  // See PR 111009.
+  if (!contains_zero_p (lhs) && TYPE_OVERFLOW_UNDEFINED (type))
+r = range_nonzero (type);
+  else
+r.set_varying (type);
+  return true;
 }
 
 
diff --git a/gcc/testsuite/gcc.dg/pr111009.c b/gcc/testsuite/gcc.dg/pr111009.c
new file mode 100644
index 000..3accd9ac063
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111009.c
@@ -0,0 +1,38 @@
+/* PR tree-optimization/111009 */
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-strict-overflow" } */
+
+struct dso {
+ struct dso * next;
+ int maj;
+};
+
+__attribute__((noipa)) static void __dso_id__cmp_(void) {}
+
+__attribute__((noipa))
+static int bug(struct dso * d, struct dso *dso)
+{
+ struct dso **p = &d;
+ struct dso *curr = 0;
+
+ while (*p) {
+  curr = *p;
+  // prevent null deref below
+  if (!dso) return 1;
+  if (dso == curr) return 1;
+
+  int *a = &dso->maj;
+  // null deref
+  if (!(a && *a)) __dso_id__cmp_();
+
+  p = &curr->next;
+ }
+ return 0;
+}
+
+__attribute__((noipa))
+int main(void) {
+struct dso d = { 0, 0, };
+bug(&d, 0);
+}
+
diff --git a/gcc/value-range.h b/gcc/value-range.h
index d4cba22d540..22f5fc68d7c 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -605,6 +605,16 @@ irange::normalize_kind ()
 }
 }
 
+inline bool
+contains_zero_p (const irange &r)
+{
+  if (r.undefined_p ())
+return false;
+
+  tree zero = build_zero_cst (r.type ());
+  return r.contains_p (zero);
+}
+
 // Return the maximum value for TYPE.
 
 inline tree
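
To make the reasoning in the commit message concrete, here is a small sketch that models the address computation "&base->maj" with wrapping unsigned arithmetic. This is an illustration under stated assumptions, not code from the patch: the helper maj_address is invented, and uintptr_t arithmetic stands in for pointer arithmetic when overflow is not treated as undefined (as with -fno-strict-overflow in the test). It shows that a null base gives a non-zero member address, and that a non-zero base can wrap to a zero address, so the range of the result says little about the base unless overflow is undefined.

```c
#include <stddef.h>
#include <stdint.h>

struct dso
{
  struct dso *next;
  int maj;
};

/* Model "&base->maj" as wrapping unsigned arithmetic, i.e. how the
   address behaves when overflow is not assumed to be undefined.  */
uintptr_t
maj_address (uintptr_t base)
{
  return base + offsetof (struct dso, maj);
}
```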


[COMMITTED] c++: Avoid using __array_rank as a variable name [PR115061]

2024-05-13 Thread Ken Matsui
Pushed as obvious.

-- >8 --

This patch fixes a compilation error when building GCC using Clang.
Since __array_rank is used as a built-in trait name, use rank instead.

PR c++/115061

gcc/cp/ChangeLog:

* semantics.cc (finish_trait_expr): Use rank instead of
__array_rank.

Signed-off-by: Ken Matsui 
---
 gcc/cp/semantics.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 43b175f92fd..df62e2d80db 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind 
kind, tree type1, tree type2)
   tree val;
   if (kind == CPTK_RANK)
 {
-  size_t __array_rank = 0;
+  size_t rank = 0;
   for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
-   ++__array_rank;
-  val = build_int_cst (size_type_node, __array_rank);
+   ++rank;
+  val = build_int_cst (size_type_node, rank);
 }
   else
 val = (trait_expr_value (kind, type1, type2)
-- 
2.44.0
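
For context, the loop touched by this patch counts array dimensions by peeling one array type per iteration. The sketch below models that in plain C with a toy type representation; type_kind, type_node, and array_rank are invented for illustration and are not GCC's tree API. The counter is called rank, matching the rename.

```c
#include <stddef.h>

/* Toy stand-in for GCC's type trees; these names are invented.  */
enum type_kind { SCALAR_KIND, ARRAY_KIND };

struct type_node
{
  enum type_kind kind;
  const struct type_node *element; /* Element type for arrays, else NULL.  */
};

/* Count array dimensions by peeling array types, in the same shape as
   the TREE_CODE / TREE_TYPE loop in finish_trait_expr.  */
size_t
array_rank (const struct type_node *type)
{
  size_t rank = 0;
  for (; type->kind == ARRAY_KIND; type = type->element)
    ++rank;
  return rank;
}
```

With this model, a type shaped like int[2][3] is two ARRAY_KIND nodes over one SCALAR_KIND node and has rank 2.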



Re: [PATCH] c++: Avoid using __array_rank as a variable name [PR115061]

2024-05-13 Thread Ken Matsui
On Mon, May 13, 2024 at 8:19 AM Marek Polacek  wrote:
>
> On Sun, May 12, 2024 at 11:48:07PM -0700, Ken Matsui wrote:
> > This patch fixes a compilation error when building GCC using Clang.
> > Since __array_rank is used as a built-in trait name, use rank instead.
>
> I think you can go ahead and push this patch as obvious, thanks.

Oh, I see.  Thank you for letting me know!

>
> >   PR c++/115061
> >
> > gcc/cp/ChangeLog:
> >
> >   * semantics.cc (finish_trait_expr): Use rank instead of
> >   __array_rank.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/cp/semantics.cc | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > index 43b175f92fd..df62e2d80db 100644
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> > kind, tree type1, tree type2)
> >tree val;
> >if (kind == CPTK_RANK)
> >  {
> > -  size_t __array_rank = 0;
> > +  size_t rank = 0;
> >for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
> > - ++__array_rank;
> > -  val = build_int_cst (size_type_node, __array_rank);
> > + ++rank;
> > +  val = build_int_cst (size_type_node, rank);
> >  }
> >else
> >  val = (trait_expr_value (kind, type1, type2)
> > --
> > 2.44.0
> >
>
> Marek
>


[r15-429 Regression] FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test for excess errors) on Linux/x86_64

2024-05-13 Thread haochen.jiang
On Linux/x86_64,

fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9 is the first bad commit
commit fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9
Author: Matthias Kretz 
Date:   Mon May 6 12:13:55 2024 +0200

libstdc++: Use __builtin_shufflevector for simd split and concat

caused

FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test 
for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-429/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=experimental/simd/pr109261_constexpr_simd.cc 
--target_board='unix{-m32}'"

(Please do not reply to this email.  For questions about this report, contact
me at haochen dot jiang at intel.com.)
(If you hit Cascade Lake related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [PATCH v1 3/3] RISC-V: Enable vectorizable early exit test

2024-05-13 Thread Robin Dapp
Hi Pan,

>  
> @@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
>   || [check_effective_target_arm_v8_neon_hw]
>   || [check_sse4_hw_available]
>   || [istarget amdgcn-*-*]
> + || [check_effective_target_riscv_v]
>   }}]
>  }

I believe this should be riscv_v_ok.  riscv_v only checks if we can
compile.  OK with that changed after 2/3 is in.

Regards
 Robin


[PATCH] Match: optimize `a == CST & unary(a)` [PR111487]

2024-05-13 Thread Andrew Pinski
This expands the `a == CST & a` optimization to handle more than
just casts: it adds the same optimization for unary operators.
The patch for binary operators will come later.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111487
gcc/ChangeLog:

* match.pd (tcc_int_unary): New operator list.
(`a == CST & unary(a)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/and-unary-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd| 12 
 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c | 61 +
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..3ee28a3d8fc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -57,6 +57,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "cfn-operators.pd"
 
+/* integer unary operators that return the same type. */
+(define_operator_list tcc_int_unary
+ abs absu negate bit_not BSWAP POPCOUNT CTZ CLZ PARITY)
+
 /* Define operand lists for math rounding functions {,i,l,ll}FN,
where the versions prefixed with "i" return an int, those prefixed with
"l" return a long and those prefixed with "ll" return a long long.
@@ -5451,6 +5455,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   @2
   { build_zero_cst (type); }))
 
+/* `(a == CST) & unary(a)` can be simplified to `(a == CST) & unary(CST)`. */
+(simplify
+ (bit_and:c (convert@2 (eq @0 INTEGER_CST@1))
+(convert? (tcc_int_unary @3)))
+ (if (bitwise_equal_p (@0, @3))
+  (with { tree  inner_type = TREE_TYPE (@3); }
+   (bit_and @2 (convert (tcc_int_unary (convert:inner_type @1)))
+
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
x_5 == cstN ? cst4 : cst3
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
new file mode 100644
index 000..c157bc11b00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -fdump-tree-optimized-raw" } */
+/* unary part of PR tree-optimization/111487 */
+
+int abs1(int a)
+{
+  int b = __builtin_abs(a);
+  return (a == 1) & b;
+}
+int absu1(int a)
+{
+  int b;
+  b = a > 0 ? -a:a;
+  b = -b;
+return (a == 1) & b;
+}
+
+int bswap1(int a)
+{
+  int b = __builtin_bswap32(a);
+  return (a == 1) & b;
+}
+
+int ctz1(int a)
+{
+  int b = __builtin_ctz(a);
+  return (a == 1) & b;
+}
+int pop1(int a)
+{
+  int b = __builtin_popcount(a);
+  return (a == 1) & b;
+}
+int neg1(int a)
+{
+  int b = -(a);
+  return (a == 1) & b;
+}
+int not1(int a)
+{
+  int b = ~(a);
+  return (a == 1) & b;
+}
+int partity1(int a)
+{
+  int b = __builtin_parity(a);
+  return (a == 1) & b;
+}
+
+
+/* We should optimize out the unary operator for each.
+   For ctz we can optimize directly to `return 0`.
+   For bswap1 and not1, we can do the same but not until after forwprop1.  */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 7 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "abs_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "absu_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "negate_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "gimple_call <"  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_and_expr,  "  "forwprop1" } } */
-- 
2.34.1



Re: [Patch, aarch64] v3: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-13 Thread Alex Coplan
Hi Ajit,

Why did you send three mails for this revision of the patch?  If you're
going to send a new revision of the patch you should increment the
version number and outline the changes / reasons for the new revision.

Mostly the comments below are just style nits and things you missed from
the last review(s) (please try not to miss so many in the future).

On 09/05/2024 17:06, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface betwwen target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-09  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 542 +++
>  1 file changed, 363 insertions(+), 179 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..217790e111a 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,224 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +enum class writeback{

You missed a nit here.  Space before '{'.

> +  ALL,
> +  EXISTING
> +};

You also missed adding comments for the enum, please see the review for v2:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651074.html

> +
> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming ldp/stp insns from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MEM_MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;

The comment needs updating since we changed the name of the last param,
i.e. s/MEM_MODE/MODE/.

> +
> +  // Return alias check limit.
> +  // This is needed to avoid unbounded quadratic behaviour when
> +  // performing alias analysis.
> +  virtual int pair_mem_alias_check_limit () = 0;
> +
> +  // Returns true if we should try to handle writeback opportunities
> +  // (not whether there are any).
> +  virtual bool handle_writeback_opportunities (enum writeback which) = 0 ;

Heh, the bit in parens from the v2 review probably doesn't need to go
into the comment here.

Also you should describe WHICH in the comment.

> +
> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
> +  // and LOAD_P (true if the access is a load), check if we should proceed
> +  // to form the pair given the target's code generation policy on
> +  // paired accesses.
> +  virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p,
> + machine_mode mode) = 0;

The name of the first param needs updating in the prototype, i.e.
s/first_mem/base_mem/.  I think you missed the bit about 

[pushed][PR115013][LRA]: Modify register starvation recognition

2024-05-13 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013

Successfully tested and bootstrapped on x86-64.
commit 44430ef3d8ba75692efff5f6969d5610134566d3
Author: Vladimir N. Makarov 
Date:   Mon May 13 10:12:11 2024 -0400

[PR115013][LRA]: Modify register starvation recognition

  My recent patch to recognize reg starvation resulted in a few GCC test
failures.  The following patch fixes this by using a more accurate
starvation calculation and ignoring small reg classes.

gcc/ChangeLog:

PR rtl-optimization/115013
* lra-constraints.cc (process_alt_operands): Update all_used_nregs
only for winreg.  Ignore reg starvation for small reg classes.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index e945a4da451..92b343fa99a 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2674,8 +2674,9 @@ process_alt_operands (int only_alternative)
 	  if (early_clobber_p
 		  || curr_static_id->operand[nop].type != OP_OUT)
 		{
-		  all_used_nregs
-		+= ira_reg_class_min_nregs[this_alternative][mode];
+		  if (winreg)
+		all_used_nregs
+		  += ira_reg_class_min_nregs[this_alternative][mode];
 		  all_this_alternative
 		= (reg_class_subunion
 		   [all_this_alternative][this_alternative]);
@@ -3250,6 +3251,7 @@ process_alt_operands (int only_alternative)
 	  overall += LRA_MAX_REJECT;
 	}
   if (all_this_alternative != NO_REGS
+	  && !SMALL_REGISTER_CLASS_P (all_this_alternative)
 	  && all_used_nregs != 0 && all_reload_nregs != 0
 	  && (all_used_nregs + all_reload_nregs + 1
 	  >= ira_class_hard_regs_num[all_this_alternative]))
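
The condition in the second hunk can be read as a standalone predicate. The sketch below restates it with plain parameters; the function and parameter names are invented for illustration, and the booleans stand in for the NO_REGS and SMALL_REGISTER_CLASS_P checks. What LRA does once the condition holds is outside this hunk and is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical restatement of the starvation test in
   process_alt_operands: the used registers plus the reload registers
   (plus one) would reach the size of the combined class, and that
   class is not a small one.  */
bool
reg_starvation_p (bool has_class, bool small_class, int used_nregs,
		  int reload_nregs, int class_hard_regs)
{
  return (has_class
	  && !small_class
	  && used_nregs != 0 && reload_nregs != 0
	  && used_nregs + reload_nregs + 1 >= class_hard_regs);
}
```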


Re: [PATCH] c++: Avoid using __array_rank as a variable name [PR115061]

2024-05-13 Thread Marek Polacek
On Sun, May 12, 2024 at 11:48:07PM -0700, Ken Matsui wrote:
> This patch fixes a compilation error when building GCC using Clang.
> Since __array_rank is used as a built-in trait name, use rank instead.

I think you can go ahead and push this patch as obvious, thanks.
 
>   PR c++/115061
> 
> gcc/cp/ChangeLog:
> 
>   * semantics.cc (finish_trait_expr): Use rank instead of
>   __array_rank.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/semantics.cc | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 43b175f92fd..df62e2d80db 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> kind, tree type1, tree type2)
>tree val;
>if (kind == CPTK_RANK)
>  {
> -  size_t __array_rank = 0;
> +  size_t rank = 0;
>for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
> - ++__array_rank;
> -  val = build_int_cst (size_type_node, __array_rank);
> + ++rank;
> +  val = build_int_cst (size_type_node, rank);
>  }
>else
>  val = (trait_expr_value (kind, type1, type2)
> -- 
> 2.44.0
> 

Marek



RE: [PATCH v1 1/3] Vect: Support loop len in vectorizable early exit

2024-05-13 Thread Tamar Christina
> -----Original Message-----
> From: pan2...@intel.com 
> Sent: Monday, May 13, 2024 3:54 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Tamar Christina ; Richard Sandiford
> ; Pan Li 
> Subject: [PATCH v1 1/3] Vect: Support loop len in vectorizable early exit
> 
> From: Pan Li 
> 
> This patch adds early break auto-vectorization support for target which
> use length on partial vectorization.  Consider this following example:
> 
> unsigned vect_a[802];
> unsigned vect_b[802];
> 
> void test (unsigned x, int n)
> {
>   for (int i = 0; i < n; i++)
>   {
>     vect_b[i] = x + i;
> 
>     if (vect_a[i] > x)
>       break;
> 
>     vect_a[i] = x;
>   }
> }
> 
> We use VCOND_MASK_LEN to simulate the generate (mask && i < len + bias).
> And then the IR of RVV looks like below:
> 
>   ...
>   _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
>   _55 = (int) _87;
>   ...
>   mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
>   vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
> {0, ... }, _87, 0);
>   if (vec_len_mask_72 != { 0, ... })
>     goto ; [5.50%]
>   else
>     goto ; [94.50%]
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 2. The aarch64 fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
>   handling for one or multiple stmt.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-stmts.cc | 47 -
> -
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 21e8fe98e44..bfd9d66568f 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -12896,7 +12896,9 @@ vectorizable_early_exit (vec_info *vinfo,
> stmt_vec_info stmt_info,
>  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> 
>vec_loop_masks *masks = _VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = _VINFO_LENS (loop_vinfo);
>bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> +  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> 
>/* Now build the new conditional.  Pattern gimple_conds get dropped during
>   codegen so we must replace the original insn.  */
> @@ -12960,12 +12962,11 @@ vectorizable_early_exit (vec_info *vinfo,
> stmt_vec_info stmt_info,
>   {
> if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> OPTIMIZE_FOR_SPEED))
> - return false;
> + vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
> else
>   vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
>   }
> 
> -
>return true;
>  }
> 
> @@ -13018,6 +13019,25 @@ vectorizable_early_exit (vec_info *vinfo,
> stmt_vec_info stmt_info,
> stmts[i], _gsi);
>   workset.quick_push (stmt_mask);
> }
> +  else if (len_loop_p)
> + for (unsigned i = 0; i < stmts.length (); i++)
> +   {
> + tree all_ones_mask = build_all_ones_cst (vectype);
> + tree all_zero_mask = build_zero_cst (vectype);
> + tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies,
> +   vectype, i, 1);
> + signed char cst = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> (loop_vinfo);
> + tree bias = build_int_cst (intQI_type_node, cst);
> + tree len_mask
> +   = make_temp_ssa_name (TREE_TYPE (stmts[i]), NULL,
> "vec_len_mask");
> + gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5,
> +   stmts[i], all_ones_mask,
> +   all_zero_mask, len, bias);
> + gimple_call_set_lhs (call, len_mask);
> + gsi_insert_before (_gsi, call, GSI_SAME_STMT);
> +
> + workset.quick_push (len_mask);
> +   }
>else
>   workset.splice (stmts);
> 
> @@ -13042,6 +13062,29 @@ vectorizable_early_exit (vec_info *vinfo,
> stmt_vec_info stmt_info,
> new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
>  new_temp, _gsi);
>   }
> +  else if (len_loop_p)
> + {
> +   /* len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
> +
> +  which is equivalent to:
> +
> +  len_mask = compare_mask mask && i < len ? 1 : 0
> +   */
> +   tree all_ones_mask = build_all_ones_cst (vectype);
> +   tree all_zero_mask = build_zero_cst (vectype);
> +   tree len
> + = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, vectype, 0, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> (loop_vinfo);
> +   tree bias = build_int_cst (intQI_type_node, biasval);
> +   tree 
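
For reference, a scalar model of the VCOND_MASK_LEN semantics that the quoted description relies on (select the first value when the mask is set and i < len + bias, otherwise the second). This is a sketch only: the helper name is invented and the lane type is fixed to int for simplicity.

```c
#include <stddef.h>

/* Scalar model of one VCOND_MASK_LEN call over N lanes.  Lane I gets
   IF_TRUE[I] when MASK[I] is set and I is below LEN + BIAS, otherwise
   IF_FALSE[I].  */
void
vcond_mask_len_model (int *out, const int *mask, const int *if_true,
		      const int *if_false, size_t n, long len, long bias)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (mask[i] && (long) i < len + bias) ? if_true[i] : if_false[i];
}
```

With an all-ones if_true and an all-zeros if_false, as in the patch, the result is the compare mask with every lane at or beyond the active length cleared, so inactive lanes cannot trigger the early exit.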

RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Tamar Christina
> 
> Thanks Tamer for comments.
> 
> > I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when
> optimizing for size.
> 
> Sure thing, let me update it in v5.
> 
> > Hmm why do you iterate independently over the statements? The block below
> already visits
> > Every statement doesn't it?
> 
> Because it will hit .ADD_OVERFLOW first, then it will never hit SAT_ADD as the
> shape changed, or shall we put it to the previous pass ?
> 

That's just a matter of matching the overflow as an additional case, no?
I.e. you can add an overload for unsigned_integer_sat_add matching
IFN_ADD_OVERFLOW and using the realpart and imagpart helpers.

I think that would be better, as it avoids visiting all the statements twice,
also extends the matching to some __builtin_add_overflow uses, and should be
fairly simple.

> > The root of your match is a BIT_IOR_EXPR expression, so I think you just 
> > need to
> change the entry below to:
> >
> > case BIT_IOR_EXPR:
> >   match_saturation_arith (, stmt, m_cfg_changed_p);
> >   /* fall-through */
> > case BIT_XOR_EXPR:
> >   match_uaddc_usubc (, stmt, code);
> >   break;
> 
> There are other shapes (not covered in this patch) of SAT_ADD like below 
> branch
> version, the IOR should be one of the ROOT. Thus doesn't
> add case here.  Then, shall we take case for each shape here ? Both works for 
> me.
> 

Yeah, I think that's better than iterating over the statements twice.  It also
fits better in the existing code.

Tamar.

> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint32_t)
> 
> Pan
> 
> 
> -----Original Message-----
> From: Tamar Christina 
> Sent: Monday, May 13, 2024 5:10 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Monday, May 6, 2024 3:48 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating add, i.e.
> > the result of the add is set to the maximum value on overflow.
> > It matches a pattern similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > We perform the transform during widen_mult because the sub-expr of
> > SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> > pattern first and then .ADD_OVERFLOW, or we may never catch the pattern
> > .SAT_ADD.  Meanwhile, the isel pass is after widen_mult and then we
> > cannot perform the .SAT_ADD pattern match as the sub-expr will be
> > optimized to .ADD_OVERFLOW first.
> >
> > The tests below passed for this patch:
> > 1. The riscv full regression tests.
> > 2. The aarch64 full regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 full regression tests.
> >
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > to the return true switch case(s).
> > * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > * match.pd: Add unsigned SAT_ADD match.
> > * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > * tree-ssa-math-opts.cc 

RE: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

2024-05-13 Thread Li, Pan2
Committed, thanks Juzhe and Kito. Let's wait for a while before backporting to 14.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Monday, May 13, 2024 10:11 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches 
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 
scalar

LGTM as well :)

On Sat, May 11, 2024 at 3:58 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM from my side. Wait for Kito to chime in.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2024-05-11 15:54
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; Pan Li
> Subject: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 
> scalar
> From: Pan Li 
>
> For the vfw vx format RVV intrinsic, the scalar type _Float16 also
> requires the zvfh extension.  Unfortunately,  we only check the
> vector tree type and miss the scalar _Float16 type checking.  For
> example:
>
> vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t vl)
> {
>   return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl);
> }
>
> It should report an error message saying that the zvfh extension is
> required, instead of an ICE for an unrecognizable insn.
>
> This patch adds that kind of validation for _Float16 in the RVV
> intrinsic API.  It will report an error like the one below when
> zvfh is not enabled.
>
> error: built-in function '__riscv_vfwsub_wf_f32mf2(vs2,  rs1,  vl)'
>   requires the zvfhmin or zvfh ISA extension
>
> PR target/114988
>
> Passed the rv64gcv fully regression tests, included c/c++/fortran.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.cc
> (validate_instance_type_required_extensions): New func impl to
> validate the intrinisc func type ops.
> (expand_builtin): Validate instance type before expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114988-1.c: New test.
> * gcc.target/riscv/rvv/base/pr114988-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
> gcc/config/riscv/riscv-vector-builtins.cc | 51 +++
> .../gcc.target/riscv/rvv/base/pr114988-1.c|  9 
> .../gcc.target/riscv/rvv/base/pr114988-2.c|  9 
> 3 files changed, 69 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 192a6c230d1..3fdb4400d70 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -4632,6 +4632,54 @@ gimple_fold_builtin (unsigned int code, 
> gimple_stmt_iterator *gsi, gcall *stmt)
>return gimple_folder (rfn.instance, rfn.decl, gsi, stmt).fold ();
> }
> +static bool
> +validate_instance_type_required_extensions (const rvv_type_info type,
> + tree exp)
> +{
> +  uint64_t exts = type.required_extensions;
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
> +!TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zvfhmin or zvfh ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
> +!TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zve32f, zve64f, zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
> +!TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_64) &&
> +!TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zve64x, zve64f, zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  return true;
> +}
> +
> /* Expand a call to the RVV function with subcode CODE.  EXP is the call
> expression and TARGET is the preferred location for the result.
> Return the value of the lhs.  */
> @@ -4649,6 +4697,9 @@ expand_builtin (unsigned int code, tree exp, rtx target)
>return target;
>  }
> +  if (!validate_instance_type_required_extensions (rfn.instance.type, exp))
> +return target;
> +
>return function_expander (rfn.instance, rfn.decl, exp, target).expand ();
> }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> new file mode 100644
> index 000..b8474804c88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t 

[PATCH v1 3/3] RISC-V: Enable vectorizable early exit test

2024-05-13 Thread pan2 . li
From: Pan Li 

This patch depends on the 2 patches below.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651459.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651460.html

Now that vectorizable early exit is supported on RISC-V, we would like to
enable the gcc vect tests for vectorizable early exit.

The vect-early-break_124-pr114403.c test fails to vectorize for now,
because the __builtin_memcpy of 8 bytes fails to be folded into an
int64 assignment during ccp1.  We will improve that first and mark
this as xfail for RISC-V.
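
To make the folding mentioned above concrete, here is a minimal sketch of the
two equivalent shapes.  It is not taken from the test file; the function names
are made up for illustration, and the real test may use a different form of
the copy.

```c
#include <assert.h>
#include <stdint.h>

/* The 8-byte copy as written with the builtin.  */
void
copy8_memcpy (uint64_t *dst, const uint64_t *src)
{
  __builtin_memcpy (dst, src, sizeof (uint64_t));
}

/* The shape the copy is expected to fold into: a single 64-bit
   assignment.  */
void
copy8_assign (uint64_t *dst, const uint64_t *src)
{
  *dst = *src;
}

/* Hypothetical helper: both spellings must produce the same value.  */
void
check_copy8 (void)
{
  uint64_t src = 0x0123456789abcdefULL, a = 0, b = 0;
  copy8_memcpy (&a, &src);
  copy8_assign (&b, &src);
  assert (a == b && a == src);
}
```

When the first form is not folded into the second, the loop body still
contains a call-like statement, which is presumably what blocks vectorization
here.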

The tests below passed for this patch:
1. The riscv full regression tests.
2. The aarch64 full regression tests.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector as the dump would
otherwise contain LOOP VECTORIZED twice on RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
 gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
 gcc/testsuite/lib/target-supports.exp | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98..2f80bf89e5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
 
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb..101ae1e0eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-require-effective-target vect_long_long } */
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 6f5d477b128..adaa5912588 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+   || [check_effective_target_riscv_v]
}}]
 }
 
@@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+   || [check_effective_target_riscv_v]
}}]
 }
 
-- 
2.34.1



[PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-13 Thread pan2 . li
From: Pan Li 

This patch depends on the middle-end implementation below.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651459.html

Now that the middle end supports loop lens for vectorizable early exit, we
would like to implement the feature for the RISC-V target.  Given the example
below:

unsigned vect_a[1923];
unsigned vect_b[1923];

unsigned test (unsigned limit, int n)
{
  unsigned ret = 0;

  for (int i = 0; i < n; i++)
{
  vect_b[i] = limit + i;

  if (vect_a[i] > limit)
{
  ret = vect_b[i];
  return ret;
}

  vect_a[i] = limit;
}

  return ret;
}

Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret

After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...
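
As a rough scalar model of what the vmsltu.vv / vcpop.m / beq sequence above
does per strip (a sketch under stated assumptions, not the generated code: the
arrays are passed as parameters instead of being globals, n is a size_t, and
vlmax is a hypothetical stand-in for the hardware vector length):

```c
#include <assert.h>
#include <stddef.h>

/* Reference: the scalar loop from the example.  */
unsigned
early_exit_scalar (unsigned *a, unsigned *b, unsigned limit, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      b[i] = limit + (unsigned) i;
      if (a[i] > limit)
        return b[i];
      a[i] = limit;
    }
  return 0;
}

/* Stand-in for vmsltu.vv + vcpop.m: count the elements in a[0..vl)
   that are greater than LIMIT.  */
size_t
count_exceeding (const unsigned *a, unsigned limit, size_t vl)
{
  size_t count = 0;
  for (size_t i = 0; i < vl; i++)
    count += limit < a[i];
  return count;
}

/* Strip-mined model: process up to VLMAX elements per iteration and
   only fall back to element-wise work when some element would exit.  */
unsigned
early_exit_chunked (unsigned *a, unsigned *b, unsigned limit, size_t n,
                    size_t vlmax)
{
  if (vlmax == 0)
    vlmax = 1;
  for (size_t i = 0; i < n;)
    {
      size_t vl = n - i < vlmax ? n - i : vlmax; /* like vsetvli */
      if (count_exceeding (a + i, limit, vl) != 0)
        for (size_t j = i; j < i + vl; j++)
          {
            b[j] = limit + (unsigned) j;
            if (a[j] > limit)
              return b[j];
            a[j] = limit;
          }
      /* No element exits in this strip: do all the stores.  */
      for (size_t j = i; j < i + vl; j++)
        {
          b[j] = limit + (unsigned) j;
          a[j] = limit;
        }
      i += vl;
    }
  return 0;
}

/* Hypothetical helper: the chunked model must match the reference.  */
void
check_early_exit (void)
{
  unsigned a1[3] = { 1, 9, 2 }, b1[3] = { 0 };
  unsigned a2[3] = { 1, 9, 2 }, b2[3] = { 0 };
  assert (early_exit_scalar (a1, b1, 5, 3)
          == early_exit_chunked (a2, b2, 5, 3, 2));
}
```

The point of the population count is only the zero / non-zero test: a strip
with no set mask bit can be completed with plain vector stores.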

The tests below passed for this patch:
1. The riscv full regression tests.

gcc/ChangeLog:

* config/riscv/autovec-opt.md 
(*vcond_mask_len_popcount_):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_): New pattern
of vcond_mask_len for vector bool mode.
(cbranch4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount): Add
VLS mode to popcount pattern.
(@pred_popcount): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec-opt.md   | 33 ++
 gcc/config/riscv/autovec.md   | 60 +++
 gcc/config/riscv/vector-iterators.md  |  1 +
 gcc/config/riscv/vector.md| 18 +++---
 .../riscv/rvv/autovec/early-break-1.c | 34 +++
 .../riscv/rvv/autovec/early-break-2.c | 37 
 6 files changed, 174 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..04f85d8e455 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
(mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+   code_for_pred_popcount (mode, Pmode),
+   riscv_vector::CPOP_OP,
+   operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..dfa58b8af69 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,63 @@ (define_expand "rawmemchr"
 DONE;
   }
 )
+
+;; =
+;; == Early break auto-vectorization patterns
+;; =
+
+;; vcond_mask_len
+(define_insn_and_split "vcond_mask_len_"
+  [(set (match_operand:VB 0 "register_operand")
+(unspec: VB [
+ (match_operand:VB 1 "register_operand")
+ (match_operand:VB 2 "const_1_operand")
+ (match_operand:VB 3 "const_0_operand")
+ (match_operand 4 "autovec_length_operand")
+ (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
(mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+machine_mode mode = riscv_vector::get_vector_mode (Pmode,
+   

[PATCH v1 1/3] Vect: Support loop len in vectorizable early exit

2024-05-13 Thread pan2 . li
From: Pan Li 

This patch adds early break auto-vectorization support for targets which
use length on partial vectorization.  Consider the following example:

unsigned vect_a[802];
unsigned vect_b[802];

void test (unsigned x, int n)
{
  for (int i = 0; i < n; i++)
  {
    vect_b[i] = x + i;

    if (vect_a[i] > x)
      break;

    vect_a[i] = x;
  }
}

We use VCOND_MASK_LEN to simulate the generation of (mask && i < len + bias).
The IR for RVV then looks like below:

  ...
  _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
  _55 = (int) _87;
  ...
  mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
  vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
{0, ... }, _87, 0);
  if (vec_len_mask_72 != { 0, ... })
    goto ; [5.50%]
  else
    goto ; [94.50%]
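
A scalar model of the length-masked compare may help when reading the IR above.
This is only a sketch based on my reading of the vcond_mask_len semantics with
an all-ones "true" vector and an all-zeros "false" vector; the helper names
are made up and are not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of .VCOND_MASK_LEN (mask, { -1, ... }, { 0, ... }, len, bias):
   lane i stays set only if mask[i] is set and i < len + bias.  */
void
vcond_mask_len_model (bool *out, const bool *mask, size_t nunits,
                      size_t len, int bias)
{
  for (size_t i = 0; i < nunits; i++)
    out[i] = mask[i] && (long) i < (long) len + bias;
}

/* Model of the exit test "vec_len_mask != { 0, ... }".  */
bool
any_lane_set (const bool *v, size_t nunits)
{
  for (size_t i = 0; i < nunits; i++)
    if (v[i])
      return true;
  return false;
}

/* Hypothetical helper: lanes at or beyond len must not trigger the exit.  */
void
check_vcond_mask_len (void)
{
  bool mask[4] = { false, false, true, true };
  bool out[4];
  vcond_mask_len_model (out, mask, 4, 2, 0);
  assert (!any_lane_set (out, 4));
}
```

Without the length masking, stale lanes past the active length could make the
compare look like an early exit in the final, partial iteration.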

The tests below passed for this patch:
1. The riscv full regression tests.
2. The aarch64 full regression tests.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
handling for one or multiple stmt.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-stmts.cc | 47 --
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21e8fe98e44..bfd9d66568f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12896,7 +12896,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info 
stmt_info,
 ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
+  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
 
   /* Now build the new conditional.  Pattern gimple_conds get dropped during
  codegen so we must replace the original insn.  */
@@ -12960,12 +12962,11 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
{
  if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
  OPTIMIZE_FOR_SPEED))
-   return false;
+   vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1);
  else
vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
}
 
-
   return true;
 }
 
@@ -13018,6 +13019,25 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
  stmts[i], _gsi);
workset.quick_push (stmt_mask);
  }
+  else if (len_loop_p)
+   for (unsigned i = 0; i < stmts.length (); i++)
+ {
+   tree all_ones_mask = build_all_ones_cst (vectype);
+   tree all_zero_mask = build_zero_cst (vectype);
+   tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies,
+ vectype, i, 1);
+   signed char cst = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+   tree bias = build_int_cst (intQI_type_node, cst);
+   tree len_mask
+ = make_temp_ssa_name (TREE_TYPE (stmts[i]), NULL, "vec_len_mask");
+   gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5,
+ stmts[i], all_ones_mask,
+ all_zero_mask, len, bias);
+   gimple_call_set_lhs (call, len_mask);
+   gsi_insert_before (_gsi, call, GSI_SAME_STMT);
+
+   workset.quick_push (len_mask);
+ }
   else
workset.splice (stmts);
 
@@ -13042,6 +13062,29 @@ vectorizable_early_exit (vec_info *vinfo, 
stmt_vec_info stmt_info,
  new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
   new_temp, _gsi);
}
+  else if (len_loop_p)
+   {
+ /* len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
+
+which is equivalent to:
+
+len_mask = compare_mask mask && i < len ? 1 : 0
+ */
+ tree all_ones_mask = build_all_ones_cst (vectype);
+ tree all_zero_mask = build_zero_cst (vectype);
+ tree len
+   = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, vectype, 0, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ tree bias = build_int_cst (intQI_type_node, biasval);
+ tree len_mask
+   = make_temp_ssa_name (TREE_TYPE (new_temp), NULL, "vec_len_mask");
+ gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5,
+   new_temp, all_ones_mask,
+   all_zero_mask, len, bias);
+ gimple_call_set_lhs (call, len_mask);
+ gsi_insert_before (_gsi, call, GSI_SAME_STMT);
+ new_temp = len_mask;
+   }
 }
 
   gcc_assert 

Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts

2024-05-13 Thread Peter0x44

13 May 2024 1:30:28 pm NightStrike :

On Thu, May 9, 2024 at 1:03 PM Peter Damianov wrote:

Windows terminal and mintty both have support for link escape sequences, and so
auto_enable_urls shouldn't be hardcoded to false. For older versions of the
windows console, mingw_ansi_fputs's console API translation logic does mangle
these sequences, but there's nothing useful it could do even if this weren't
the case, so check if the ansi escape sequences are supported at all.

conhost.exe doesn't support link escape sequences, but printing them does not
cause any problems.


Are there any issues when running under the Wine console, such as when
running the testsuite?


I installed wine and gave compiling a file emitting a warning a try. 
Unfortunately, yes, gcc emits mangled warnings here. Even simply running 
this patch under wine causes problems, it's not just wine's conhost.exe.


I'm not sure whether it's my fault or wine's. I've attached two 
screenshots demonstrating exactly what happens. (I think???) wine should 
only be advertising that it supports those settings regarding escape 
sequences if it actually does. Also, on this machine, wine is near 
unusably slow, I'm talking multiple seconds to react to a keypress 
through the wine conhost. I will not be attempting to run the testsuite, 
I severely doubt it will work.
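
For reference, a minimal sketch of the kind of link escape sequence under
discussion: the OSC 8 hyperlink form, here with the ESC \ string terminator.
The helper name is made up for this example and this is not GCC's
implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format TEXT as an OSC 8 terminal hyperlink pointing at URL.
   Layout: ESC ] 8 ; ; URL ST  TEXT  ESC ] 8 ; ; ST, with ST = ESC \.
   Returns what snprintf returns.  */
int
format_osc8_link (char *buf, size_t size, const char *url, const char *text)
{
  return snprintf (buf, size, "\033]8;;%s\033\\%s\033]8;;\033\\", url, text);
}

/* Hypothetical helper: check the exact bytes produced.  */
void
check_osc8_link (void)
{
  char buf[128];
  int n = format_osc8_link (buf, sizeof buf, "https://gcc.gnu.org", "gcc");
  assert (n > 0 && (size_t) n < sizeof buf);
  assert (strcmp (buf, "\033]8;;https://gcc.gnu.org\033\\gcc\033]8;;\033\\")
          == 0);
}
```

A terminal that understands the sequence shows only the text as a clickable
link; one that does not should ignore it, and a console layer that rewrites
escape sequences may print parts of it literally, which would match the
mangled output described above.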

Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-13 Thread Richard Biener
On Mon, May 13, 2024 at 4:14 PM Robin Dapp  wrote:
>
> > What happens if we simply remove all of the force_reg here?
>
> On x86 I bootstrapped and tested the attached without fallout
> (gcc188, so it's no avx512-native machine and therefore limited
> coverage).  riscv regtest is unchanged.
> For aarch64 I would rely on the pre-commit CI to pick it
> up (does that work on sub-threads?).

OK if that pre-commit CI works out.

Richard.

> Regards
>  Robin
>
>
> gcc/ChangeLog:
>
> PR middle-end/113474
>
> * internal-fn.cc (expand_vec_cond_mask_optab_fn):  Remove
> force_regs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr113474.c: New test.
> ---
>  gcc/internal-fn.cc  |  3 ---
>  .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 +
>  2 files changed, 13 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 2c764441cde..4d226c478b4 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3163,9 +3163,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall 
> *stmt, convert_optab optab)
>rtx_op1 = expand_normal (op1);
>rtx_op2 = expand_normal (op2);
>
> -  mask = force_reg (mask_mode, mask);
> -  rtx_op1 = force_reg (mode, rtx_op1);
> -
>rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>create_output_operand (&ops[0], target, mode);
>create_input_operand (&ops[1], rtx_op1, mode);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> new file mode 100644
> index 000..0364bf9f5e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target riscv_v } }  */
> +/* { dg-additional-options "-std=c99" }  */
> +
> +void
> +foo (int n, int **a)
> +{
> +  int b;
> +  for (b = 0; b < n; b++)
> +for (long e = 8; e > 0; e--)
> +  a[b][e] = a[b][e] == 15;
> +}
> +
> +/* { dg-final { scan-assembler "vmerge.vim" } }  */
> --
> 2.45.0
>


Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-13 Thread Robin Dapp
> What happens if we simply remove all of the force_reg here?

On x86 I bootstrapped and tested the attached without fallout
(gcc188, so it's no avx512-native machine and therefore limited
coverage).  riscv regtest is unchanged.
For aarch64 I would rely on the pre-commit CI to pick it
up (does that work on sub-threads?).

Regards
 Robin


gcc/ChangeLog:

PR middle-end/113474

* internal-fn.cc (expand_vec_cond_mask_optab_fn):  Remove
force_regs.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113474.c: New test.
---
 gcc/internal-fn.cc  |  3 ---
 .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 +
 2 files changed, 13 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 2c764441cde..4d226c478b4 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3163,9 +3163,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (mode, rtx_op1);
-
   rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_output_operand (&ops[0], target, mode);
   create_input_operand (&ops[1], rtx_op1, mode);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
new file mode 100644
index 000..0364bf9f5e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target riscv_v } }  */
+/* { dg-additional-options "-std=c99" }  */
+
+void
+foo (int n, int **a)
+{
+  int b;
+  for (b = 0; b < n; b++)
+for (long e = 8; e > 0; e--)
+  a[b][e] = a[b][e] == 15;
+}
+
+/* { dg-final { scan-assembler "vmerge.vim" } }  */
-- 
2.45.0



Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

2024-05-13 Thread Kito Cheng
LGTM as well :)

On Sat, May 11, 2024 at 3:58 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM from my side. Wait for Kito to chime in.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: pan2.li
> Date: 2024-05-11 15:54
> To: gcc-patches
> CC: juzhe.zhong; kito.cheng; Pan Li
> Subject: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 
> scalar
> From: Pan Li 
>
> For the vfw vx format RVV intrinsic, the scalar type _Float16 also
> requires the zvfh extension.  Unfortunately,  we only check the
> vector tree type and miss the scalar _Float16 type checking.  For
> example:
>
> vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t vl)
> {
>   return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl);
> }
>
> It should report an error message saying that the zvfh extension is
> required, instead of an ICE for an unrecognizable insn.
>
> This patch adds that kind of validation for _Float16 in the RVV
> intrinsic API.  It will report an error like the one below when
> zvfh is not enabled.
>
> error: built-in function '__riscv_vfwsub_wf_f32mf2(vs2,  rs1,  vl)'
>   requires the zvfhmin or zvfh ISA extension
>
> PR target/114988
>
> Passed the rv64gcv fully regression tests, included c/c++/fortran.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.cc
> (validate_instance_type_required_extensions): New func impl to
> validate the intrinisc func type ops.
> (expand_builtin): Validate instance type before expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr114988-1.c: New test.
> * gcc.target/riscv/rvv/base/pr114988-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
> gcc/config/riscv/riscv-vector-builtins.cc | 51 +++
> .../gcc.target/riscv/rvv/base/pr114988-1.c|  9 
> .../gcc.target/riscv/rvv/base/pr114988-2.c|  9 
> 3 files changed, 69 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 192a6c230d1..3fdb4400d70 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -4632,6 +4632,54 @@ gimple_fold_builtin (unsigned int code, 
> gimple_stmt_iterator *gsi, gcall *stmt)
>return gimple_folder (rfn.instance, rfn.decl, gsi, stmt).fold ();
> }
> +static bool
> +validate_instance_type_required_extensions (const rvv_type_info type,
> + tree exp)
> +{
> +  uint64_t exts = type.required_extensions;
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
> +!TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zvfhmin or zvfh ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
> +!TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zve32f, zve64f, zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
> +!TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  if ((exts & RVV_REQUIRE_ELEN_64) &&
> +!TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
> +{
> +  error_at (EXPR_LOCATION (exp),
> + "built-in function %qE requires the "
> + "zve64x, zve64f, zve64d or v ISA extension",
> + exp);
> +  return false;
> +}
> +
> +  return true;
> +}
> +
> /* Expand a call to the RVV function with subcode CODE.  EXP is the call
> expression and TARGET is the preferred location for the result.
> Return the value of the lhs.  */
> @@ -4649,6 +4697,9 @@ expand_builtin (unsigned int code, tree exp, rtx target)
>return target;
>  }
> +  if (!validate_instance_type_required_extensions (rfn.instance.type, exp))
> +return target;
> +
>return function_expander (rfn.instance, rfn.decl, exp, target).expand ();
> }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> new file mode 100644
> index 000..b8474804c88
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t 
> vl)
> +{
> +  return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl); /* { dg-error {built-in 
> function '__riscv_vfwsub_wf_f32mf2\(vs2,  rs1,  vl\)' requires the zvfhmin or 
> zvfh ISA extension} } */
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c 
> 

Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-13 Thread David Malcolm
On Mon, 2024-05-13 at 09:42 -0400, David Malcolm wrote:
> On Mon, 2024-05-13 at 11:14 +0200, Mark Wielaard wrote:
> > Hi Evgeny,
> > 
> > Adding David to the CC, who might know the details.
> > 
> > On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote:
> > > Sunday, May 12, 2024
> > > 
> > > Thank you for reviewing our changes related to the refactoring of
> > > extracting the MinGW implementation from ix64.
> > > 
> > > It was expected to move the MinGW-related files without changes
> > > in
> > > this commit ("Reuse MinGW from i386 for AArch64") and apply the
> > > renaming in a follow-up commit, which has been done in 'Rename
> > > "x86
> > > Windows Options" to "Cygwin and MinGW Options"'.
> > > 
> > > The script to update opt.urls files has been used.
> > > 
> > > > diff --git a/gcc/config/mingw/cygming.opt.urls
> > > > b/gcc/config/mingw/cygming.opt.urls
> > > > index c624e22e4427..af11c4997609 100644
> > > > --- a/gcc/config/mingw/cygming.opt.urls
> > > > +++ b/gcc/config/mingw/cygming.opt.urls
> > > > @@ -1,4 +1,4 @@
> > > 
> > > > -; Autogenerated by regenerate-opt-urls.py from
> > > > gcc/config/i386/cygming.opt
> > > > and generated HTML
> > > > +; Autogenerated by regenerate-opt-urls.py from
> > > > +gcc/config/mingw/cygming.opt and generated HTML
> > > 
> > > I am not sure why this comment has not been updated. Is it
> > > critical
> > > or it could be updated next time when it is needed?
> > 
> > Odd that the script didn't update this comment, it really should
> > have.
> > It might be that running the script through make regenerate-opt-
> > urls
> > inside the gcc build subdir invokes regenerate-opt-urls.py slightly
> > differently so that this line is updated.
> 
> It might be a "make" dependencies issue:
> "make regenerate-opt-urls" has dependencies on OPT_URLS_HTML_DEPS
> which
> is currently defined as:
> OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \
> $(build_htmldir)/gdc/Option-Index.html \
> $(build_htmldir)/gfortran/Option-Index.html
> which might not be enough for the doc changes when moving things
> around
> that affect other generated html files.
> 
> So when the CI runs "make regenerate-opt-urls" in a pristine build it
> will forcibly rerun texinfo to regenerate the docs first, whereas if
> you manually run the script in a build directory, you might not be
> seeing the latest version of the HTML (especially in the presence of
> file moves).
> 
> So I think the Makefile as currently written handles most cases, but
> can get it slightly wrong for the case you ran into here (sorry);
> fully
> refreshing the built docs ought to fix such cases.

Specifically, if you have some generated .html files in the
$(build_htmldir) from a file that has gone away (due to a move), then I
suspect these .html files stick around until you fully delete the
$(build_htmldir), and in the meantime they get found by
regenerate-opt-urls.py and lead to duplicate entries, leading to
differences against a pristine build dir.

Dave

> 
> That's my theory of what happened here, anyway.
> 
> Dave
> 
> > 
> > > >  mconsole
> > > >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
> > > > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-
> > > > Options.html#index-
> > > > mdll)
> > > >  mnop-fun-dllimport
> > > >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-
> > > > dllimport)
> > > > 
> > > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> > > > -;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-
> > > > mthreads-1'
> > > > -;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> > > > +mthreads
> > > > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)
> > > 
> > > mthreads has the same issue before applying changes. Has
> > > something
> > > been changed recently?
> > > This is the change in patch series in 'Rename "x86 Windows
> > > Options"
> > > to "Cygwin and MinGW Options"' commit.
> > > 
> > > ; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> > > +;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-
> > > mthreads-
> > > 1'
> > >  ;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> > > -;   duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1'
> > 
> > Again, it might be caused by invoking the script by hand vs with
> > make
> > regenerate-opt-urls.py. I believe with the make option it will
> > renumber the suffixes making sure the urls are unique.
> > 
> > BTW. There is a CI buildbot that tries to regenerate all generated
> > files, which is how I spotted this:
> > https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
> > > (It should also send email to the author of the patch on failure.)
> > 
> > Cheers,
> > 
> > Mark
> > 
> 



Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-13 Thread David Malcolm
On Mon, 2024-05-13 at 11:14 +0200, Mark Wielaard wrote:
> Hi Evgeny,
> 
> Adding David to the CC, who might know the details.
> 
> On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote:
> > Sunday, May 12, 2024
> > 
> > Thank you for reviewing our changes related to the refactoring of
> > extracting the MinGW implementation from ix64.
> > 
> > It was expected to move the MinGW-related files without changes in
> > this commit ("Reuse MinGW from i386 for AArch64") and apply the
> > renaming in a follow-up commit, which has been done in 'Rename "x86
> > Windows Options" to "Cygwin and MinGW Options"'.
> > 
> > The script to update opt.urls files has been used.
> > 
> > > diff --git a/gcc/config/mingw/cygming.opt.urls
> > > b/gcc/config/mingw/cygming.opt.urls
> > > index c624e22e4427..af11c4997609 100644
> > > --- a/gcc/config/mingw/cygming.opt.urls
> > > +++ b/gcc/config/mingw/cygming.opt.urls
> > > @@ -1,4 +1,4 @@
> > 
> > > -; Autogenerated by regenerate-opt-urls.py from
> > > gcc/config/i386/cygming.opt
> > > and generated HTML
> > > +; Autogenerated by regenerate-opt-urls.py from
> > > +gcc/config/mingw/cygming.opt and generated HTML
> > 
> > I am not sure why this comment has not been updated. Is it critical,
> > or could it be updated next time when it is needed?
> 
> Odd that the script didn't update this comment, it really should
> have.
> It might be that running the script through make regenerate-opt-urls
> inside the gcc build subdir invokes regenerate-opt-urls.py slightly
> differently so that this line is updated.

It might be a "make" dependencies issue:
"make regenerate-opt-urls" has dependencies on OPT_URLS_HTML_DEPS which
is currently defined as:
OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \
$(build_htmldir)/gdc/Option-Index.html \
$(build_htmldir)/gfortran/Option-Index.html
which might not be enough for the doc changes when moving things around
that affect other generated html files.

So when the CI runs "make regenerate-opt-urls" in a pristine build it
will forcibly rerun texinfo to regenerate the docs first, whereas if
you manually run the script in a build directory, you might not be
seeing the latest version of the HTML (especially in the presence of
file moves).

So I think the Makefile as currently written handles most cases, but
can get it slightly wrong for the case you ran into here (sorry); fully
refreshing the built docs ought to fix such cases.

That's my theory of what happened here, anyway.

Dave

> 
> > >  mconsole
> > >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
> > > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-
> > > Options.html#index-
> > > mdll)
> > >  mnop-fun-dllimport
> > >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-
> > > dllimport)
> > > 
> > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> > > -;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-
> > > mthreads-1'
> > > -;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> > > +mthreads
> > > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)
> > 
> > mthreads has the same issue before applying changes. Has something
> > been changed recently?
> > This is the change in patch series in 'Rename "x86 Windows Options"
> > to "Cygwin and MinGW Options"' commit.
> > 
> > ; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> > +;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-
> > 1'
> >  ;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> > -;   duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1'
> 
> Again, it might be caused by invoking the script by hand vs with make
> regenerate-opt-urls.py. I believe with the make option it will
> renumber the suffixes making sure the urls are unique.
> 
> BTW. There is a CI buildbot that tries to regenerate all generated
> files, which is how I spotted this:
> https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
> (It should also send email to the author of the patch on failure.)
> 
> Cheers,
> 
> Mark
> 



RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Li, Pan2
Thanks Tamar for the comments.

> I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when 
> optimizing for size.

Sure thing, let me update it in v5.

> Hmm, why do you iterate independently over the statements? The block below
> already visits every statement, doesn't it?

Because otherwise the statement is matched as .ADD_OVERFLOW first, and then
SAT_ADD can never match because the shape has changed. Or shall we move it to
an earlier pass?

> The root of your match is a BIT_IOR_EXPR expression, so I think you just need 
> to change the entry below to:
>
>   case BIT_IOR_EXPR:
> match_saturation_arith (, stmt, m_cfg_changed_p);
> /* fall-through */
>   case BIT_XOR_EXPR:
> match_uaddc_usubc (, stmt, code);
> break;

There are other shapes of SAT_ADD (not covered in this patch), like the branch
version below, so the IOR is only one of the possible roots. That is why I
didn't add a case here. Or shall we add a case for each shape here? Either
works for me.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint32_t)
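
For reference, a minimal sketch (not part of the patch) that checks the
branchless shape from the commit message against the branch shape above,
exhaustively for uint8_t.  The function names are made up for illustration;
only the two expressions come from this thread.

```c
#include <stdint.h>

/* Branchless shape from the patch description:
   (x + y) | (-(T)((T)(x + y) < x)).  */
static uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  /* The comparison is 0 or 1; negating and truncating gives 0x00 or 0xff.  */
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (sum < x));
}

/* Branch shape, as in the SAT_ADD_U_1 macro.  */
static uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* Return 1 if both shapes equal the clamped wide sum for every input
   pair, 0 otherwise.  */
static int
sat_add_u8_forms_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        unsigned wide = x + y;
        uint8_t expected = wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
        if (sat_add_u8_branchless ((uint8_t) x, (uint8_t) y) != expected
            || sat_add_u8_branch ((uint8_t) x, (uint8_t) y) != expected)
          return 0;
      }
  return 1;
}
```

Both shapes compute the same value, which is why either could serve as a
root for the match; they only differ in the GIMPLE they produce.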

Pan


-Original Message-
From: Tamar Christina  
Sent: Monday, May 13, 2024 5:10 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

Hi Pan,

> -Original Message-
> From: pan2...@intel.com 
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com;
> hongtao@intel.com; Pan Li 
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar
> int
> 
> From: Pan Li 
> 
> This patch would like to add the middle-end representation for the
> saturation add, i.e. set the result of the add to the max on overflow.
> It will take a pattern similar to the one below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
> 
> We perform the transform during widen_mult because the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW, or we may never catch the
> .SAT_ADD pattern.  Meanwhile, the isel pass is after widen_mult and then we
> cannot perform the .SAT_ADD pattern match as the sub-expr will be
> optimized to .ADD_OVERFLOW first.
> 
> The below tests are passed for this patch:
> 1. The riscv full regression tests.
> 2. The aarch64 full regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
> 
>   PR target/51492
>   PR target/112600
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
>   to the return true switch case(s).
>   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
>   * match.pd: Add unsigned SAT_ADD match.
>   * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
>   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
>   func decl generated in match.pd match.
>   (match_saturation_arith): New func impl to match the saturation arith.
>   (math_opts_dom_walker::after_dom_children): Try match saturation
>   arith.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc|  1 +
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 28 
>  gcc/optabs.def|  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46
> +++
>  5 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ 

Re: [PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-13 Thread Joseph Myers
On Mon, 13 May 2024, Kewen.Lin wrote:

> > In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
> > though this removes the "convenient" defaulting, requiring each target to
> > enumerate all standard C ABI type modes.  But that might be also a good 
> > thing.
> > 
> 
> I guess the main value of extending this from floating point types to all
> types is to unify them?  (Assuming that, except for floating types, the others
> would not have multiple possible representations like what we face on 128-bit fp.)

For integer types, giving the number of bits makes sense as an interface - 
there isn't an issue with different modes.

So I think it's appropriate for floating and integer types to have 
separate hooks - with the one for floating types returning a mode, and the 
one for integer types returning a number of bits.  (And also keep the 
existing separate hook for _FloatN / _FloatNx modes.)

That may also make for more convenient defaults (whether a target has long 
double wider than double is largely independent of what sizes it uses for 
integer types).

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] PR60276 fix for single-lane SLP

2024-05-13 Thread Richard Biener
When enabling single-lane SLP and not splitting groups, the fix for
PR60276 is no longer effective since, for an unknown reason, it exempted
pure SLP.  The following removes this exemption, making
gcc.dg/vect/pr60276.c PASS even with --param vect-single-lane-slp=1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/60276
* tree-vect-stmts.cc (vectorizable_load): Do not exempt
pure_slp grouped loads from the STMT_VINFO_MIN_NEG_DIST
restriction.
---
 gcc/tree-vect-stmts.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 21e8fe98e44..b8a71605f1b 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9995,8 +9995,7 @@ vectorizable_load (vec_info *vinfo,
 
   /* Invalidate assumptions made by dependence analysis when vectorization
 on the unrolled body effectively re-orders stmts.  */
-  if (!PURE_SLP_STMT (stmt_info)
- && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
+  if (STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
  && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
   STMT_VINFO_MIN_NEG_DIST (stmt_info)))
{
-- 
2.35.3


Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts

2024-05-13 Thread Peter0x44

13 May 2024 1:30:28 pm NightStrike :

> On Thu, May 9, 2024 at 1:03 PM Peter Damianov  wrote:
>>
>> Windows terminal and mintty both have support for link escape sequences, and so
>> auto_enable_urls shouldn't be hardcoded to false. For older versions of the
>> windows console, mingw_ansi_fputs's console API translation logic does mangle
>> these sequences, but there's nothing useful it could do even if this weren't
>> the case, so check if the ansi escape sequences are supported at all.
>>
>> conhost.exe doesn't support link escape sequences, but printing them does not
>> cause any problems.
>
> Are there any issues when running under the Wine console, such as when
> running the testsuite?


I did not try this. There shouldn't be problems if wine implements 
ENABLE_VIRTUAL_TERMINAL_PROCESSING correctly, but I agree it would be 
good to check. Are there instructions anywhere for running the testsuite 
with wine? Anything specific I need to do?
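
For anyone wanting to try a terminal by hand, here is a minimal sketch of
what a link escape sequence looks like, assuming the common OSC 8 form with
ESC \ as the string terminator.  The helper name format_osc8_link is made up
for this example and is not a GCC function; GCC's own output may use a
different terminator.

```c
#include <stdio.h>
#include <string.h>

/* Format TEXT as an OSC 8 hyperlink pointing at URL into BUF.
   Layout: ESC ] 8 ; ; URL ST  TEXT  ESC ] 8 ; ; ST, with ST = ESC \.
   Returns what snprintf returns (length needed, excluding the NUL).  */
static int
format_osc8_link (char *buf, size_t size, const char *url, const char *text)
{
  return snprintf (buf, size, "\033]8;;%s\033\\%s\033]8;;\033\\", url, text);
}
```

Writing the resulting buffer to the console with fputs should show just the
link text on a terminal that understands the sequence, and should show the
text without stray characters on one that silently ignores it.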


Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts

2024-05-13 Thread NightStrike
On Thu, May 9, 2024 at 1:03 PM Peter Damianov  wrote:
>
> Windows terminal and mintty both have support for link escape sequences, and 
> so
> auto_enable_urls shouldn't be hardcoded to false. For older versions of the
> windows console, mingw_ansi_fputs's console API translation logic does mangle
> these sequences, but there's nothing useful it could do even if this weren't
> the case, so check if the ansi escape sequences are supported at all.
>
> conhost.exe doesn't support link escape sequences, but printing them does not
> cause any problems.

Are there any issues when running under the Wine console, such as when
running the testsuite?


Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]

2024-05-13 Thread Rainer Orth
Hi Nathaniel,

>> > There are a couple of other tests that appear to potentially have a
>> > similar issue:
>> >
>> > global-2_a.C
>> > 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*'
>> > added} module } }
>> >
>> > global-3_a.C
>> > 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*'
>> > added} module } }
>> 
>> neither module file contains "Reachable GMF" at all, with ::printf or
>> otherwise.
>> 
>
> Yes, I think the test is aiming to check that such a declaration is not
> added at all, and so that's correct. But if for some reason on some
> system it did add "::std::printf" that would be a bug that would not be
> caught by this test.

Understood.  However, the question about global-3_a.C, which contains
no printf at all, remains.

>> > Which I suppose maybe also should be updated in the same way; I guess
>> > they don't fail on Solaris because they aren't actually correctly
>> > testing what they think they are.
>> 
>> Perhaps, but it would be useful to first understand what those tests are
>> supposed to look like.  WRT global-3_a.C, printf doesn't occur at all,
>> so this may just be a case of copy-and-paste.
>> 
>> Maybe Nathan, who authored the tests, can shed some light.
>> 
>> > Otherwise LGTM.
>> 
>> Thanks.  I'll go ahead and commit the patch as is, asjusting the other
>> two once it's become clear what they should look like.
>> 
>
> Ah, I should have been clearer: I'm not sure I can approve, but I've
> CC'd Jason in.

Sorry, I already committed the patch.  I can revert, of course, if
that's inappropriate.  OTOH, it could be considered obvious ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/13 10:57, Jiufu Guo wrote:
> Hi,
> 
> For PR96866, GCC prints asm code for the modifier "%a", which requires
> an address operand, while the operand has the constraint "X", which
> allows non-address forms.  With this patch an error message is reported
> to indicate the invalid asm operands.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu Guo)
> 
>   PR target/96866
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (print_operand_address):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr96866-1.c: New test.
>   * gcc.target/powerpc/pr96866-2.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  |  6 ++
>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>  3 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..50943d76f79 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>  || GET_CODE (x) == LABEL_REF)
>  {
> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))

Do we really need this_is_asm_operands here?

> + {
> +   output_operand_lossage ("invalid expression as operand");
> +   return;
> + }
> +
>output_addr_const (file, x);
>if (small_data_operand (x, GET_MODE (x)))
>   fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> new file mode 100644
> index 000..6554a472a11
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> @@ -0,0 +1,15 @@
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */
> +/* { dg-options "-fPIC -O2" } */

Nit: If these two options are required, it would be good to have a comment 
explaining it a bit
when it's not obvious.

> +
> +int x[2];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int n;
> +  int *p = x;
> +  *p++;
> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
> +  return n;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> new file mode 100644
> index 000..a5ec96f29dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> @@ -0,0 +1,10 @@
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */
> +/* { dg-options "-fPIC -O2" } */

Ditto.

BR,
Kewen

> +
> +void
> +f (void)
> +{
> +  extern int x;
> +  __asm__ volatile("#%a0" ::"X"());
> +}



Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]

2024-05-13 Thread Nathaniel Shead
On Mon, May 13, 2024 at 01:59:51PM +0200, Rainer Orth wrote:
> Hi Nathaniel,
> 
> > On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote:
> >> g++.dg/modules/stdio-1_a.H currently FAILs on Solaris:
> >> 
> >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++17  scan-lang-dump module 
> >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'"
> >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a  scan-lang-dump module 
> >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'"
> >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b  scan-lang-dump module 
> >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'"
> >> 
> >> The problem is that the module file doesn't contain
> >> 
> >>  Depset:0 decl entity:95 function_decl:'::printf'
> >> 
> >> as expected by the test, but
> >> 
> >>  Depset:0 decl entity:26 function_decl:'::std::printf'
> >> 
> >> This happens because Solaris  declares printf in namespace std
> >> as allowed by C++11, Annex D, D.5.
> >> 
> >> This patch allows for both forms.
> >> 
> >> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
> >> x86_64-pc-linux-gnu.
> >> 
> >> Ok for trunk?
> >> 
> >>Rainer
> >
> > There are a couple of other tests that appear to potentially have a
> > similar issue:
> >
> > global-2_a.C
> > 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*'
> > added} module } }
> >
> > global-3_a.C
> > 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*'
> > added} module } }
> 
> neither module file contains "Reachable GMF" at all, with ::printf or
> otherwise.
> 

Yes, I think the test is aiming to check that such a declaration is not
added at all, and so that's correct. But if for some reason on some
system it did add "::std::printf" that would be a bug that would not be
caught by this test.

> > Which I suppose maybe also should be updated in the same way; I guess
> > they don't fail on Solaris because they aren't actually correctly
> > testing what they think they are.
> 
> Perhaps, but it would be useful to first understand what those tests are
> supposed to look like.  WRT global-3_a.C, printf doesn't occur at all,
> so this may just be a case of copy-and-paste.
> 
> Maybe Nathan, who authored the tests, can shed some light.
> 
> > Otherwise LGTM.
> 
> Thanks.  I'll go ahead and commit the patch as is, asjusting the other
> two once it's become clear what they should look like.
> 

Ah, I should have been clearer: I'm not sure I can approve, but I've
CC'd Jason in.

>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]

2024-05-13 Thread Rainer Orth
Hi Nathaniel,

> On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote:
>> g++.dg/modules/stdio-1_a.H currently FAILs on Solaris:
>> 
>> FAIL: g++.dg/modules/stdio-1_a.H -std=c++17  scan-lang-dump module "Depset:0 
>> decl entity:[0-9]* function_decl:'::printf'"
>> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a  scan-lang-dump module "Depset:0 
>> decl entity:[0-9]* function_decl:'::printf'"
>> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b  scan-lang-dump module "Depset:0 
>> decl entity:[0-9]* function_decl:'::printf'"
>> 
>> The problem is that the module file doesn't contain
>> 
>>  Depset:0 decl entity:95 function_decl:'::printf'
>> 
>> as expected by the test, but
>> 
>>  Depset:0 decl entity:26 function_decl:'::std::printf'
>> 
>> This happens because Solaris  declares printf in namespace std
>> as allowed by C++11, Annex D, D.5.
>> 
>> This patch allows for both forms.
>> 
>> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
>> x86_64-pc-linux-gnu.
>> 
>> Ok for trunk?
>> 
>>  Rainer
>
> There are a couple of other tests that appear to potentially have a
> similar issue:
>
> global-2_a.C
> 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*'
> added} module } }
>
> global-3_a.C
> 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*'
> added} module } }

neither module file contains "Reachable GMF" at all, with ::printf or
otherwise.

> Which I suppose maybe also should be updated in the same way; I guess
> they don't fail on Solaris because they aren't actually correctly
> testing what they think they are.

Perhaps, but it would be useful to first understand what those tests are
supposed to look like.  WRT global-3_a.C, printf doesn't occur at all,
so this may just be a case of copy-and-paste.

Maybe Nathan, who authored the tests, can shed some light.

> Otherwise LGTM.

Thanks.  I'll go ahead and commit the patch as is, adjusting the other
two once it's become clear what they should look like.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-13 Thread Nathaniel Shead
> > @@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree args)
> > if (tmpl != error_mark_node)
> > {
> >   /* The new TMPL is not an instantiation of anything, so we
> > -forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
> > +forget its origins.  It is also not a specialization of
> > +anything.  We don't reset CLASSTYPE_TI_TEMPLATE
> >  for the new type because that is supposed to be the
> >  corresponding template decl, i.e., TMPL.  */
> > + spec_entry elt;
> > + elt.tmpl = friend_tmpl;
> > + elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
> > + elt.spec = TREE_TYPE (tmpl);
> > + type_specializations->remove_elt ();
> 
> For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be
> unconditional.  OK.
> 
> Jason
> 

I'm looking to backport this patch to GCC 14 now that it's been on trunk
some time.  Here's the patch I'm aiming to add (squashed with the
changes from r15-220-gec2365e07537e8) after cherrypicking the
prerequisite commit r15-58-g2faf040335f9b4; is this OK?

Or should I keep it as two separate commits to make the cherrypicking
more obvious? Not entirely sure on the etiquette around this.

Bootstrapped and regtested on x86_64-pc-linux-gnu on top of the
releases/gcc-14 branch.

-- >8 --

This patch fixes a number of issues with the handling of temploid friend
declarations.

The primary issue is that instantiations of friend declarations should
attach the declaration to the same module as the befriending class, by
[module.unit] p7.1 and [temp.friend] p2; this could be a different
module from the current TU, and so needs special handling.

The other main issue here is that we can't assume that just because name
lookup didn't find a definition for a hidden class template, that it
doesn't exist at all: it could be a non-exported entity that we've
nevertheless streamed in from an imported module.  We need to ensure
that when instantiating template friend classes that we return the same
TEMPLATE_DECL that we got from our imports, otherwise we will get later
issues with 'duplicate_decls' (rightfully) complaining that they're
different when trying to merge.

This doesn't appear necessary for function templates due to the existing
name lookup handling already finding these hidden declarations.

PR c++/105320
PR c++/114275

gcc/cp/ChangeLog:

* cp-tree.h (propagate_defining_module): Declare.
(remove_defining_module): Declare.
(lookup_imported_hidden_friend): Declare.
* decl.cc (duplicate_decls): Also check if hidden decls can be
redeclared in this module. Call remove_defining_module on
to-be-freed newdecl.
* module.cc (imported_temploid_friends): New.
(init_modules): Initialize it.
(trees_out::decl_value): Write it; don't consider imported
temploid friends as attached to a module.
(trees_in::decl_value): Read it for non-discarded decls.
(get_originating_module_decl): Follow the owning decl for an
imported temploid friend.
(propagate_defining_module): New.
(remove_defining_module): New.
* name-lookup.cc (get_mergeable_namespace_binding): New.
(lookup_imported_hidden_friend): New.
* pt.cc (tsubst_friend_function): Propagate defining module for
new friend functions.
(tsubst_friend_class): Lookup imported hidden friends.  Check
for valid module attachment of existing names.  Propagate
defining module for new classes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-10_a.C: New test.
* g++.dg/modules/tpl-friend-10_b.C: New test.
* g++.dg/modules/tpl-friend-10_c.C: New test.
* g++.dg/modules/tpl-friend-10_d.C: New test.
* g++.dg/modules/tpl-friend-11_a.C: New test.
* g++.dg/modules/tpl-friend-11_b.C: New test.
* g++.dg/modules/tpl-friend-12_a.C: New test.
* g++.dg/modules/tpl-friend-12_b.C: New test.
* g++.dg/modules/tpl-friend-12_c.C: New test.
* g++.dg/modules/tpl-friend-12_d.C: New test.
* g++.dg/modules/tpl-friend-12_e.C: New test.
* g++.dg/modules/tpl-friend-12_f.C: New test.
* g++.dg/modules/tpl-friend-13_a.C: New test.
* g++.dg/modules/tpl-friend-13_b.C: New test.
* g++.dg/modules/tpl-friend-13_c.C: New test.
* g++.dg/modules/tpl-friend-13_d.C: New test.
* g++.dg/modules/tpl-friend-13_e.C: New test.
* g++.dg/modules/tpl-friend-13_f.C: New test.
* g++.dg/modules/tpl-friend-13_g.C: New test.
* g++.dg/modules/tpl-friend-14_a.C: New test.
* g++.dg/modules/tpl-friend-14_b.C: New test.
* g++.dg/modules/tpl-friend-14_c.C: New test.
* g++.dg/modules/tpl-friend-14_d.C: New test.
* g++.dg/modules/tpl-friend-9.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 
Reviewed-by: Patrick Palka 
---
 

Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]

2024-05-13 Thread Nathaniel Shead
On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote:
> g++.dg/modules/stdio-1_a.H currently FAILs on Solaris:
> 
> FAIL: g++.dg/modules/stdio-1_a.H -std=c++17  scan-lang-dump module "Depset:0 
> decl entity:[0-9]* function_decl:'::printf'"
> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a  scan-lang-dump module "Depset:0 
> decl entity:[0-9]* function_decl:'::printf'"
> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b  scan-lang-dump module "Depset:0 
> decl entity:[0-9]* function_decl:'::printf'"
> 
> The problem is that the module file doesn't contain
> 
>  Depset:0 decl entity:95 function_decl:'::printf'
> 
> as expected by the test, but
> 
>  Depset:0 decl entity:26 function_decl:'::std::printf'
> 
> This happens because Solaris  declares printf in namespace std
> as allowed by C++11, Annex D, D.5.
> 
> This patch allows for both forms.
> 
> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
> x86_64-pc-linux-gnu.
> 
> Ok for trunk?
> 
>   Rainer

There are a couple of other tests that appear to potentially have a
similar issue:

global-2_a.C
21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*' added} 
module } }

global-3_a.C
15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*' added} 
module } }

Which I suppose maybe also should be updated in the same way; I guess
they don't fail on Solaris because they aren't actually correctly
testing what they think they are.
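
As a rough illustration of both points, here is a sketch that uses C++ std::regex in place of the Tcl regexps DejaGnu actually applies. The dump_matches helper and the "Reachable GMF '::std::printf' added" line are assumptions made up for the example, not taken from a real dump. The updated pattern accepts both spellings, while the scan-lang-dump-not pattern cannot match the ::std::printf form at all, so such a test would pass vacuously:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the pattern occurs anywhere in the dump line,
// approximating an unanchored scan-lang-dump search.
bool dump_matches(const std::string& line, const std::string& pattern) {
  return std::regex_search(line, std::regex(pattern));
}

// Pattern before and after the patch to stdio-1_a.H.
const std::string old_pattern =
    "Depset:0 decl entity:[0-9]* function_decl:'::printf'";
const std::string new_pattern =
    "Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'";

// Pattern used by the scan-lang-dump-not checks in global-2_a.C.
const std::string gmf_pattern = "Reachable GMF '::printf[^\\n']*' added";
```

With these definitions, old_pattern only matches the '::printf' dump line, new_pattern matches both, and gmf_pattern does not match a line that names '::std::printf'.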

Otherwise LGTM.

Nathaniel

> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2024-05-13  Rainer Orth  
> 
>   gcc/testsuite:
>   PR c++/98529
>   * g++.dg/modules/stdio-1_a.H (scan-lang-dump): Allow for
>   ::std::printf.
> 

> diff --git a/gcc/testsuite/g++.dg/modules/stdio-1_a.H b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
> --- a/gcc/testsuite/g++.dg/modules/stdio-1_a.H
> +++ b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
> @@ -10,5 +10,5 @@
>  #endif
>  // There should be *lots* of depsets (209 for glibc today)
>  // { dg-final { scan-lang-dump {Writing section:60 } module } }
> -// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'::printf'} module } }
> +// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'} module } }
>  // { dg-final { scan-lang-dump {Depset:1 binding namespace_decl:'::printf'} module } }



Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-13 Thread Aldy Hernandez
On Thu, May 9, 2024 at 10:05 AM Mikael Morin  wrote:
>
> Hello,
>
> Le 07/05/2024 à 04:37, HAO CHEN GUI a écrit :
> > Hi,
> >The former patch adds isfinite optab for __builtin_isfinite.
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
> >
> >Thus the builtin might not be folded at front end. The range op for
> > isfinite is needed for value range analysis. This patch adds them.
> >
> >Compared to last version, this version fixes a typo.
> >
> >Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> > regressions. Is it OK for the trunk?
> >
> > Thanks
> > Gui Haochen
> >
> > ChangeLog
> > Value Range: Add range op for builtin isfinite
> >
> > > The former patch adds optab for builtin isfinite. Thus builtin isfinite might
> > > not be folded at front end.  So the range op for isfinite is needed for value
> > > range analysis.  This patch adds range op for builtin isfinite.
> >
> > gcc/
> >   * gimple-range-op.cc (class cfn_isfinite): New.
> >   (op_cfn_finite): New variables.
> >   (gimple_range_op_handler::maybe_builtin_call): Handle
> >   CFN_BUILT_IN_ISFINITE.
> >
> > gcc/testsuite/
> >   * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test.
> >
> > patch.diff
> > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> > index 9de130b4022..99c511728d3 100644
> > --- a/gcc/gimple-range-op.cc
> > +++ b/gcc/gimple-range-op.cc
> > @@ -1192,6 +1192,56 @@ public:
> > }
> >   } op_cfn_isinf;
> >
> > +//Implement range operator for CFN_BUILT_IN_ISFINITE
> > +class cfn_isfinite : public range_operator
> > +{
> > +public:
> > +  using range_operator::fold_range;
> > +  using range_operator::op1_range;
> > > +  virtual bool fold_range (irange &r, tree type, const frange &op1,
> > +const irange &, relation_trio) const override
> > +  {
> > +if (op1.undefined_p ())
> > +  return false;
> > +
> > +if (op1.known_isfinite ())
> > +  {
> > + r.set_nonzero (type);
> > + return true;
> > +  }
> > +
> > +if (op1.known_isnan ()
> > + || op1.known_isinf ())
> > +  {
> > + r.set_zero (type);
> > + return true;
> > +  }
> > +
> > +return false;
> I think the canonical API behaviour sets R to varying and returns true
> instead of just returning false if nothing is known about the range.

Correct.  If we know it's varying, we just set varying and return
true.  Returning false is usually reserved for "I have no idea".
However, every caller of fold_range() should know to ignore a return
of false, so you should be safe.

>
> I'm not sure whether it makes any difference; Aldy can probably tell.
> But if the type is bool, varying is [0,1] which is better than unknown
> range.

Also, I see you're setting zero/nonzero.  Is the return type known to
be boolean?  If so, we usually prefer one of:

r = range_true ()
r = range_false ()
r = range_true_and_false ();

It doesn't matter either way, but it's probably best to use these as
they force boolean_type_node automatically.
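
To make the three outcomes concrete, here is a minimal toy model. It does not use GCC's frange/irange classes; ToyFloatRange, ToyBoolRange and fold_isfinite are names invented for this sketch, and the input range is assumed to be non-empty. It folds isfinite to "true", "false" or "varying", which is the shape of result being discussed:

```cpp
#include <cmath>

// Toy stand-in for a floating-point range (not GCC's frange): a closed
// interval of possible values, if any, plus a "NaN is possible" flag.
struct ToyFloatRange {
  bool has_values;  // false means the range holds NaN only
  double lo, hi;    // meaningful only when has_values is true
  bool maybe_nan;
};

// Toy stand-in for a boolean result range.
enum class ToyBoolRange { False, True, Varying };

// Fold isfinite (op) to a three-state boolean range.
ToyBoolRange fold_isfinite(const ToyFloatRange& op) {
  if (!op.has_values)
    // NaN only: isfinite is known false.  An empty range is outside the
    // assumptions of this sketch, so answer conservatively.
    return op.maybe_nan ? ToyBoolRange::False : ToyBoolRange::Varying;

  const bool bounds_finite = std::isfinite(op.lo) && std::isfinite(op.hi);
  if (bounds_finite && !op.maybe_nan)
    return ToyBoolRange::True;

  // A single infinity (NaN, if also possible, is non-finite as well).
  const bool only_infinity = std::isinf(op.lo) && op.lo == op.hi;
  if (only_infinity)
    return ToyBoolRange::False;

  // Nothing is known: report the full boolean range rather than failing.
  return ToyBoolRange::Varying;
}
```

The last return corresponds to setting the result to varying and returning true, as opposed to returning false for "I have no idea".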

I don't have a problem with this patch, but I would prefer the
floating point savvy people to review this, as there are no members of
the ranger team that are floating point experts :).

Also, I see you mention in your original post that this patch was
needed as a follow-up to this one:

https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

I don't see the above patch in the source tree currently:

Thanks.
Aldy

>
> > +  }
> > > +  virtual bool op1_range (frange &r, tree type, const irange &lhs,
> > +   const frange &, relation_trio) const override
> > +  {
> > +if (lhs.zero_p ())
> > +  {
> > > + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
> > + // Set range to varying
> > + r.set_varying (type);
> > + return true;
> > +  }
> > +
> > > +if (!range_includes_zero_p (&lhs))
> > +  {
> > + nan_state nan (false);
> > + r.set (type, real_min_representable (type),
> > +real_max_representable (type), nan);
> > + return true;
> > +  }
> > +
> > +return false;
> Same here.
>
> > +  }
> > +} op_cfn_isfinite;
> > +
> >   // Implement range operator for CFN_BUILT_IN_
> >   class cfn_parity : public range_operator
> >   {
>



Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Segher Boessenkool
Hi!

On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote:
> For PR96866, when gcc print asm code for modifier "%a" which requires
> an address operand,

It requires a *memory* operand, and it outputs its address.  This is a
generic modifier btw (not rs6000).

> while the operand is with the constraint "X" which
> allow non-address form.  An error message would be reported to indicate
> the invalid asm operands.

"non-address form"?  Every mem has an address.

But 'X' is not memory.  What is it at all?  Why do we use that when you
*have to* have mem here?

The code you add that tests for address_operand looks wrong.  I would
expect it to test the operand is memory, instead :-)


Segher


Re: [PATCH] libstdc++: Use __builtin_shufflevector for simd split and concat

2024-05-13 Thread Jonathan Wakely
On Tue, 7 May 2024 at 14:42, Matthias Kretz  wrote:
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu and with Clang 18 on
> x86_64-linux-gnu.
>
> OK for trunk and backport(s)?

OK for all.


>
> -- 8< 
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/114958
> * include/experimental/bits/simd.h (__as_vector): Return scalar
> simd as one-element vector. Return vector from single-vector
> fixed_size simd.
> (__vec_shuffle): New.
> (__extract_part): Adjust return type signature.
> (split): Use __extract_part for any split into non-fixed_size
> simds.
> (concat): If the return type stores a single vector, use
> __vec_shuffle (which calls __builtin_shufflevector) to produce
> the return value.
> * include/experimental/bits/simd_builtin.h
> (__shift_elements_right): Removed.
> (__extract_part): Return single elements directly. Use
> __vec_shuffle (which calls __builtin_shufflevector) for all
> non-trivial cases.
> * include/experimental/bits/simd_fixed_size.h (__extract_part):
> Return single elements directly.
> * testsuite/experimental/simd/pr114958.cc: New test.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 161 +-
>  .../include/experimental/bits/simd_builtin.h  | 152 +
>  .../experimental/bits/simd_fixed_size.h   |   4 +-
>  .../testsuite/experimental/simd/pr114958.cc   |  20 +++
>  4 files changed, 145 insertions(+), 192 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/experimental/simd/pr114958.cc
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──



[PATCH] Refactor SLP reduction group discovery

2024-05-13 Thread Richard Biener
The following refactors a bit how we perform SLP reduction group
discovery possibly making it easier to have multiple reduction
groups later, esp. with single-lane SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
slp_inst_kind_reduc_group handling.
(vect_analyze_slp): Add the meat here.
---
 gcc/tree-vect-slp.cc | 67 ++--
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 8c18f5308e2..f34ed54a70b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3586,7 +3586,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
   slp_instance_kind kind,
   unsigned max_tree_size, unsigned *limit)
 {
-  unsigned int i;
   vec<stmt_vec_info> scalar_stmts;
 
   if (is_a <bb_vec_info> (vinfo))
@@ -3620,35 +3619,6 @@ vect_analyze_slp_instance (vec_info *vinfo,
   STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
= STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
 }
-  else if (kind == slp_inst_kind_reduc_group)
-{
-  /* Collect reduction statements.  */
-  const vec<stmt_vec_info> &reductions
-   = as_a <loop_vec_info> (vinfo)->reductions;
-  scalar_stmts.create (reductions.length ());
-  for (i = 0; reductions.iterate (i, &next_info); i++)
-   {
- gassign *g;
- next_info = vect_stmt_to_vectorize (next_info);
- if ((STMT_VINFO_RELEVANT_P (next_info)
-  || STMT_VINFO_LIVE_P (next_info))
- /* ???  Make sure we didn't skip a conversion around a reduction
-path.  In that case we'd have to reverse engineer that
-conversion stmt following the chain using reduc_idx and from
-the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
- /* Do not discover SLP reductions for lane-reducing ops, that
-will fail later.  */
- && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
- || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
- && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
- && gimple_assign_rhs_code (g) != SAD_EXPR)))
-   scalar_stmts.quick_push (next_info);
-   }
-  /* If less than two were relevant/live there's nothing to SLP.  */
-  if (scalar_stmts.length () < 2)
-   return false;
-}
   else
 gcc_unreachable ();
 
@@ -3740,9 +3710,40 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 
   /* Find SLP sequences starting from groups of reductions.  */
   if (loop_vinfo->reductions.length () > 1)
-   vect_analyze_slp_instance (vinfo, bst_map, loop_vinfo->reductions[0],
-  slp_inst_kind_reduc_group, max_tree_size,
-  &limit);
+   {
+ /* Collect reduction statements.  */
+ vec<stmt_vec_info> scalar_stmts;
+ scalar_stmts.create (loop_vinfo->reductions.length ());
+ for (auto next_info : loop_vinfo->reductions)
+   {
+ gassign *g;
+ next_info = vect_stmt_to_vectorize (next_info);
+ if ((STMT_VINFO_RELEVANT_P (next_info)
+  || STMT_VINFO_LIVE_P (next_info))
+ /* ???  Make sure we didn't skip a conversion around a
+reduction path.  In that case we'd have to reverse
+engineer that conversion stmt following the chain using
+reduc_idx and from the PHI using reduc_def.  */
+ && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
+ /* Do not discover SLP reductions for lane-reducing ops, that
+will fail later.  */
+ && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
+ || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
+ && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
+ && gimple_assign_rhs_code (g) != SAD_EXPR)))
+   scalar_stmts.quick_push (next_info);
+   }
+ if (scalar_stmts.length () > 1)
+   {
+ vec<stmt_vec_info> roots = vNULL;
+ vec<tree> remain = vNULL;
+ vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
+  scalar_stmts, roots, remain,
+  max_tree_size, &limit, bst_map, NULL);
+   }
+ else
+   scalar_stmts.release ();
+   }
 }
 
   hash_set<slp_tree> visited_patterns;
-- 
2.35.3


RE: [PATCH] Allow patterns in SLP reductions

2024-05-13 Thread Richard Biener
On Mon, 13 May 2024, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 10, 2024 2:07 PM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] Allow patterns in SLP reductions
> > 
> > On Fri, Mar 1, 2024 at 10:21 AM Richard Biener  wrote:
> > >
> > > The following removes the over-broad rejection of patterns for SLP
> > > reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
> > > during pattern detection.  That's also insufficient in case the
> > > pattern only appears on the reduction path.  Instead this implements
> > > the proper correctness check in vectorizable_reduction and guides
> > > SLP discovery to heuristically avoid forming later invalid groups.
> > >
> > > I also couldn't find any testcase that FAILs when allowing the SLP
> > > reductions to form so I've added one.
> > >
> > > I came across this for single-lane SLP reductions with the all-SLP
> > > work where we rely on patterns to properly vectorize COND_EXPR
> > > reductions.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.
> > 
> > Re-bootstrapped/tested, r15-361-g52d4691294c847
> 
> Awesome!
> 
> Does this now allow us to write new reductions using patterns? i.e.
> widening reductions?

Yes (SLP reductions, that is).  This is really only for SLP reductions
(not SLP reduction chains, not non-SLP reductions).  So it's just
a corner-case but since with SLP-only non-SLP reductions become
SLP reductions with a single lane that was important to fix ;)
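
For reference, the two interleaved reductions in the quoted vect-reduc-sad-9.c test can be modelled in plain scalar code. This is a sketch only: the SadPair struct and the interleaved_sad and run_test_inputs names are invented here, and the input setup mirrors the main () of the test below. Each accumulator is a sum of absolute differences, i.e. the kind of lane-reducing operation that must not be merged into one multi-lane SLP reduction group:

```cpp
#include <cstdlib>

constexpr int N = 64;

// Results of the two independent reductions (even and odd elements).
struct SadPair {
  int even;
  int odd;
};

// Scalar model of foo (): two sums of absolute differences that are
// interleaved in one loop, one over even and one over odd elements.
SadPair interleaved_sad(const unsigned char* x, const unsigned char* y,
                        int len) {
  SadPair r{0, 0};
  for (int i = 0; i < len; ++i) {
    r.even += std::abs(x[2 * i] - y[2 * i]);
    r.odd += std::abs(x[2 * i + 1] - y[2 * i + 1]);
  }
  return r;
}

// Same input initialisation as the test's main ().
SadPair run_test_inputs() {
  unsigned char X[N], Y[N];
  for (int i = 0; i < N / 2; ++i) {
    X[2 * i] = i;
    Y[2 * i] = N / 2 - i;
    X[2 * i + 1] = i;
    Y[2 * i + 1] = 0;
  }
  return interleaved_sad(X, Y, N / 2);
}
```

With N = 64 the even sum is the sum of |2i - 32| for i = 0..31, which equals (N/2)*(N/4), and the odd sum is 0 + 1 + ... + 31, which equals (N/2-1)*(N/2)/2; these are the two values the test compares against.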

Richard.

> Cheers,
> Tamar
> > 
> > Richard.
> > 
> > > Richard.
> > >
> > > * tree-vect-patterns.cc (vect_pattern_recog_1): Do not
> > > remove reductions involving patterns.
> > > * tree-vect-loop.cc (vectorizable_reduction): Reject SLP
> > > reduction groups with multiple lane-reducing reductions.
> > > * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
> > > SLP reduction groups avoid including lane-reducing ones.
> > >
> > > * gcc.dg/vect/vect-reduc-sad-9.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 
> > >  gcc/tree-vect-loop.cc| 15 +
> > >  gcc/tree-vect-patterns.cc| 13 
> > >  gcc/tree-vect-slp.cc | 26 +---
> > >  4 files changed, 101 insertions(+), 21 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > > new file mode 100644
> > > index 000..3c6af4510f4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > > @@ -0,0 +1,68 @@
> > > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > > +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } } */
> > > +/* { dg-require-effective-target vect_usad_char } */
> > > +
> > > +#include 
> > > +#include "tree-vect.h"
> > > +
> > > +#define N 64
> > > +
> > > +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > > +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > > +int abs (int);
> > > +
> > > +/* Sum of absolute differences between arrays of unsigned char types.
> > > +   Detected as a sad pattern.
> > > +   Vectorized on targets that support sad for unsigned chars.  */
> > > +
> > > +__attribute__ ((noinline)) int
> > > +foo (int len, int *res2)
> > > +{
> > > +  int i;
> > > +  int result = 0;
> > > +  int result2 = 0;
> > > +
> > > +  for (i = 0; i < len; i++)
> > > +{
> > > +  /* Make sure we are not using an SLP reduction for this.  */
> > > +  result += abs (X[2*i] - Y[2*i]);
> > > +  result2 += abs (X[2*i + 1] - Y[2*i + 1]);
> > > +}
> > > +
> > > +  *res2 = result2;
> > > +  return result;
> > > +}
> > > +
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > +  int i;
> > > +  int sad;
> > > +
> > > +  check_vect ();
> > > +
> > > +  for (i = 0; i < N/2; i++)
> > > +{
> > > +  X[2*i] = i;
> > > +  Y[2*i] = N/2 - i;
> > > +  X[2*i+1] = i;
> > > +  Y[2*i+1] = 0;
> > > +  __asm__ volatile ("");
> > > +}
> > > +
> > > +
> > > +  int sad2;
> > > +  sad = foo (N/2, &sad2);
> > > +  if (sad != (N/2)*(N/4))
> > > +abort ();
> > > +  if (sad2 != (N/2-1)*(N/2)/2)
> > > +abort ();
> > > +
> > > +  return 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
> > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > > +
> > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > index 35f1f8c7d42..13dcdba403a 100644
> > > --- a/gcc/tree-vect-loop.cc
> > > +++ b/gcc/tree-vect-loop.cc
> > > @@ -7703,6 +7703,21 

Re: [PATCH] c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]

2024-05-13 Thread Jakub Jelinek
On Fri, May 10, 2024 at 03:59:25PM -0400, Jason Merrill wrote:
> > 2024-05-09  Jakub Jelinek  
> > Jason Merrill  
> > 
> > PR lto/113208
> > * cp-tree.h (maybe_optimize_cdtor): Remove.
> > * decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
> > for implicit instantiations of maybe in charge ctors/dtors
> > declared inline.
> > (import_export_decl): Don't call maybe_optimize_cdtor.
> > (c_parse_final_cleanups): Formatting fixes.
> > * optimize.cc (can_alias_cdtor): Adjust condition, for
> > HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
> > if not DECL_INTERFACE_KNOWN.
> 
> > --- gcc/cp/optimize.cc.jj   2024-04-25 20:33:30.771858912 +0200
> > +++ gcc/cp/optimize.cc  2024-05-09 17:10:23.920478922 +0200
> > @@ -220,10 +220,8 @@ can_alias_cdtor (tree fn)
> > gcc_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn));
> > > /* Don't use aliases for weak/linkonce definitions unless we can put both
> >symbols in the same COMDAT group.  */
> > -  return (DECL_INTERFACE_KNOWN (fn)
> > - && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fn))
> > - && (!DECL_ONE_ONLY (fn)
> > > - || (HAVE_COMDAT_GROUP && DECL_WEAK (fn))));
> > +  return (DECL_WEAK (fn) ? (HAVE_COMDAT_GROUP && DECL_ONE_ONLY (fn))
> > +: (DECL_INTERFACE_KNOWN (fn) && !DECL_ONE_ONLY (fn)));
> 
> Hmm, would
> 
> (DECL_ONE_ONLY (fn) ? HAVE_COMDAT_GROUP
>  : (DECL_INTERFACE_KNOWN (fn) && !DECL_WEAK (fn)))
> 
> make sense instead?  I don't think DECL_WEAK is necessary for COMDAT.

I think it isn't indeed necessary for COMDAT, although e.g. comdat_linkage
will not call make_decl_one_only if !flag_weak.

But I think it is absolutely required for the alias cdtor optimization
in question, because otherwise it would be an ABI change.
Consider older version of GCC or some other compiler emitting
_ZN6vectorI12QualityValueEC1ERKS1_
and
_ZN6vectorI12QualityValueEC2ERKS1_
symbols not as aliases, each in their own comdat groups, so
.text._ZN6vectorI12QualityValueEC1ERKS1_ in _ZN6vectorI12QualityValueEC1ERKS1_
comdat group and
.text._ZN6vectorI12QualityValueEC2ERKS1_ in _ZN6vectorI12QualityValueEC2ERKS1_
comdat group.  And then comes GCC with the above patch without the DECL_WEAK
check in there, and decides to use alias, so
_ZN6vectorI12QualityValueEC1ERKS1_ is an alias to
_ZN6vectorI12QualityValueEC2ERKS1_ and both live in
.text._ZN6vectorI12QualityValueEC2ERKS1_ section in
_ZN6vectorI12QualityValueEC5ERKS1_ comdat group.  If you mix TUs with this,
the linker can keep one of the section sets from the
_ZN6vectorI12QualityValueEC1ERKS1_ and _ZN6vectorI12QualityValueEC2ERKS1_
and _ZN6vectorI12QualityValueEC5ERKS1_
comdat groups.  If there is no .weak for the symbols, this will fail to
link, one can emit it either the old way or the new way but never both, it
is part of an ABI.
While with .weak, mixing it is possible, worst case one gets some unused
code in the linked binary or shared library.  Of course the desirable case
is that there is no mixing and there is no unused code, but if it happens,
no big deal.  Without .weak it is a big deal.
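
To see where the three conditions under discussion differ, they can be written as plain boolean functions. This is a toy model, not the cp/optimize.cc code: ToyDecl and the function names are invented, and SUPPORTS_ONE_ONLY and HAVE_COMDAT_GROUP are both assumed to be true:

```cpp
// Toy model of the three decl flags involved.
struct ToyDecl {
  bool interface_known;  // DECL_INTERFACE_KNOWN
  bool weak;             // DECL_WEAK
  bool one_only;         // DECL_ONE_ONLY
};

// Assumptions for this sketch: the target supports both features.
constexpr bool kSupportsOneOnly = true;
constexpr bool kHaveComdatGroup = true;

// Condition before the patch.
bool old_condition(ToyDecl d) {
  return d.interface_known && (kSupportsOneOnly || !d.weak) &&
         (!d.one_only || (kHaveComdatGroup && d.weak));
}

// Condition in the posted patch.
bool patched_condition(ToyDecl d) {
  return d.weak ? (kHaveComdatGroup && d.one_only)
                : (d.interface_known && !d.one_only);
}

// Alternative suggested in review.
bool suggested_condition(ToyDecl d) {
  return d.one_only ? kHaveComdatGroup : (d.interface_known && !d.weak);
}
```

Under these assumptions the patched and suggested forms disagree exactly for a one-only but non-weak function: the suggestion would allow the alias, the patch would not, which is the case the ABI argument above is about.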

Jakub



RE: [PATCH] Allow patterns in SLP reductions

2024-05-13 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 10, 2024 2:07 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Allow patterns in SLP reductions
> 
> On Fri, Mar 1, 2024 at 10:21 AM Richard Biener  wrote:
> >
> > The following removes the over-broad rejection of patterns for SLP
> > reductions which is done by removing them from LOOP_VINFO_REDUCTIONS
> > during pattern detection.  That's also insufficient in case the
> > pattern only appears on the reduction path.  Instead this implements
> > the proper correctness check in vectorizable_reduction and guides
> > SLP discovery to heuristically avoid forming later invalid groups.
> >
> > I also couldn't find any testcase that FAILs when allowing the SLP
> > reductions to form so I've added one.
> >
> > I came across this for single-lane SLP reductions with the all-SLP
> > work where we rely on patterns to properly vectorize COND_EXPR
> > reductions.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.
> 
> Re-bootstrapped/tested, r15-361-g52d4691294c847

Awesome!

Does this now allow us to write new reductions using patterns? i.e. widening
reductions?

Cheers,
Tamar
> 
> Richard.
> 
> > Richard.
> >
> > * tree-vect-patterns.cc (vect_pattern_recog_1): Do not
> > remove reductions involving patterns.
> > * tree-vect-loop.cc (vectorizable_reduction): Reject SLP
> > reduction groups with multiple lane-reducing reductions.
> > * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering
> > SLP reduction groups avoid including lane-reducing ones.
> >
> > * gcc.dg/vect/vect-reduc-sad-9.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 
> >  gcc/tree-vect-loop.cc| 15 +
> >  gcc/tree-vect-patterns.cc| 13 
> >  gcc/tree-vect-slp.cc | 26 +---
> >  4 files changed, 101 insertions(+), 21 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > new file mode 100644
> > index 000..3c6af4510f4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c
> > @@ -0,0 +1,68 @@
> > +/* Disabling epilogues until we find a better way to deal with scans.  */
> > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> > +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } } */
> > +/* { dg-require-effective-target vect_usad_char } */
> > +
> > +#include 
> > +#include "tree-vect.h"
> > +
> > +#define N 64
> > +
> > +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__)));
> > +int abs (int);
> > +
> > +/* Sum of absolute differences between arrays of unsigned char types.
> > +   Detected as a sad pattern.
> > +   Vectorized on targets that support sad for unsigned chars.  */
> > +
> > +__attribute__ ((noinline)) int
> > +foo (int len, int *res2)
> > +{
> > +  int i;
> > +  int result = 0;
> > +  int result2 = 0;
> > +
> > +  for (i = 0; i < len; i++)
> > +{
> > +  /* Make sure we are not using an SLP reduction for this.  */
> > +  result += abs (X[2*i] - Y[2*i]);
> > +  result2 += abs (X[2*i + 1] - Y[2*i + 1]);
> > +}
> > +
> > +  *res2 = result2;
> > +  return result;
> > +}
> > +
> > +
> > +int
> > +main (void)
> > +{
> > +  int i;
> > +  int sad;
> > +
> > +  check_vect ();
> > +
> > +  for (i = 0; i < N/2; i++)
> > +{
> > +  X[2*i] = i;
> > +  Y[2*i] = N/2 - i;
> > +  X[2*i+1] = i;
> > +  Y[2*i+1] = 0;
> > +  __asm__ volatile ("");
> > +}
> > +
> > +
> > +  int sad2;
> > +  sad = foo (N/2, &sad2);
> > +  if (sad != (N/2)*(N/4))
> > +abort ();
> > +  if (sad2 != (N/2-1)*(N/2)/2)
> > +abort ();
> > +
> > +  return 0;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } } */
> > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> > +
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 35f1f8c7d42..13dcdba403a 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -7703,6 +7703,21 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >return false;
> >  }
> >
> > +  /* Lane-reducing ops also never can be used in a SLP reduction group
> > + since we'll mix lanes belonging to different reductions.  But it's
> > + OK to use them in a reduction chain or when the reduction group
> > + has just one element.  */
> > +  if (lane_reduc_code_p
> > +  && slp_node
> > +  && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
> > +  && SLP_TREE_LANES (slp_node) > 1)
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf_loc 

Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-13 Thread Mark Wielaard
Hi Evgeny,

Adding David to the CC, who might know the details.

On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote:
> Sunday, May 12, 2024
>
> Thank you for reviewing our changes related to the refactoring of
> extracting the MinGW implementation from ix64.
>
> It was expected to move the MinGW-related files without changes in
> this commit ("Reuse MinGW from i386 for AArch64") and apply the
> renaming in a follow-up commit, which has been done in 'Rename "x86
> Windows Options" to "Cygwin and MinGW Options"'.
>
> The script to update opt.urls files has been used.
> 
> > diff --git a/gcc/config/mingw/cygming.opt.urls b/gcc/config/mingw/cygming.opt.urls
> > index c624e22e4427..af11c4997609 100644
> > --- a/gcc/config/mingw/cygming.opt.urls
> > +++ b/gcc/config/mingw/cygming.opt.urls
> > @@ -1,4 +1,4 @@
> 
> > -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/cygming.opt and generated HTML
> > +; Autogenerated by regenerate-opt-urls.py from gcc/config/mingw/cygming.opt and generated HTML
> I am not sure why this comment has not been updated. Is it critical
> or it could be updated next time when it is needed?

Odd that the script didn't update this comment, it really should have.
It might be that running the script through make regenerate-opt-urls
inside the gcc build subdir invokes regenerate-opt-urls.py slightly
differently so that this line is updated.

> >  mconsole
> >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
> > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mdll)
> >  mnop-fun-dllimport
> >  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport)
> > 
> > -; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> > -;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
> > -;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> > +mthreads
> > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)
> 
> mthreads has the same issue before applying changes. Has something been
> changed recently?
> This is the change in patch series in 'Rename "x86 Windows Options" to
> "Cygwin and MinGW Options"' commit.
> 
> ; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> +;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
>  ;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> -;   duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1'

Again, it might be caused by invoking the script by hand vs with make
regenerate-opt-urls. I believe with the make option it will
renumber the suffixes making sure the urls are unique.

BTW. There is a CI buildbot that tries to regenerate all generated
files, which is how I spotted this:
https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
(It should also send email to the author of the patch on failure.)

Cheers,

Mark


RE: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int

2024-05-13 Thread Tamar Christina
Hi Pan,

> -Original Message-
> From: pan2...@intel.com 
> Sent: Monday, May 6, 2024 3:49 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com;
> hongtao@intel.com; Pan Li 
> Subject: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
> 
> From: Pan Li 
> 
> This patch depends on below scalar enabling patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html
> 
> For vectorize, we leverage the existing vect pattern recog to find
> the pattern similar to scalar and let the vectorizer to perform
> the rest part for standard name usadd3 in vector mode.
> The riscv vector backend have insn "Vector Single-Width Saturating
> Add and Subtract" which can be leveraged when expand the usadd3
> in vector mode.  For example:
> 
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
> 
>   for (i = 0; i < n; i++)
> out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
> 
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
> 
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
> 
> The below test suites are passed for this patch.
> * The riscv fully regression tests.
> * The aarch64 fully regression tests.
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
> 
>   PR target/51492
>   PR target/112600
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func
>   decl generated by match.pd match.
>   (vect_recog_sat_add_pattern): New func impl to recog the pattern
>   for unsigned SAT_ADD.
> 
> Signed-off-by: Pan Li 

Patch looks good to me, but I cannot approve so I'll pass it on to Richi.
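
As a quick sanity sketch of the loop form quoted above, the branchless expression can be checked against an explicit clamp. The sat_add_ref helper is an assumption introduced for comparison only; it is not part of the patch, and nothing here depends on whether the compiler actually emits .SAT_ADD:

```cpp
#include <cstdint>
#include <limits>

// Branchless loop form from the quoted example: when x + y wraps, the
// comparison yields 1, negating it gives an all-ones mask, and OR-ing
// that mask in saturates the result.
void vec_sat_add_u64(uint64_t* out, const uint64_t* x, const uint64_t* y,
                     unsigned n) {
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (-(uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

// Hypothetical reference: saturating add written with an explicit clamp.
uint64_t sat_add_ref(uint64_t x, uint64_t y) {
  const uint64_t max = std::numeric_limits<uint64_t>::max();
  return x > max - y ? max : x + y;
}
```

Both forms should agree for every input pair, including the wrap-around cases near the maximum value.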

Cheers,
Tamar

> ---
>  gcc/tree-vect-patterns.cc | 51 +++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..8ffcaf71d5c 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo,
>return pattern_stmt;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplied to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> + tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +{
> +  tree itype = TREE_TYPE (res_ops[0]);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype != NULL_TREE && direct_internal_fn_supported_p (
> + IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> + {
> +   *type_out = vtype;
> +   gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
> + res_ops[1]);
> +
> +   gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +   gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +   gimple_set_location (call, gimple_location (last_stmt));
> +
> +   vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> +   return call;
> + }
> +}
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
> 

RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-13 Thread Tamar Christina
Hi Pan,

> -----Original Message-----
> From: pan2...@intel.com 
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com;
> hongtao@intel.com; Pan Li 
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> 
> From: Pan Li 
> 
> This patch adds the middle-end representation for saturating add,
> i.e. the result of the add is set to the type's maximum on overflow.
> It matches a pattern similar to the one below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
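
As an illustration of the uint8_t cases listed above, a minimal C sketch
(not part of the patch; the names sat_add_u8 and check_sat_add_u8_examples
are made up here).  Note that C promotes uint8_t operands to int, so the
sum has to be truncated back explicitly before the comparison:

```c
#include <assert.h>
#include <stdint.h>

/* Branchless unsigned saturating add for uint8_t, following
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* x + y is computed in int; the cast wraps it modulo 256.  */
  uint8_t sum = (uint8_t) (x + y);
  /* The wrapped sum is smaller than x exactly when the add overflowed;
     the mask is then 0xff, otherwise 0.  */
  uint8_t mask = (uint8_t) -(sum < x);
  return (uint8_t) (sum | mask);
}

/* The four examples from the mail, plus one non-saturating case.  */
static void
check_sat_add_u8_examples (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
  assert (sat_add_u8 (3, 4) == 7);
}
```

For sat_add_u8 (1, 255) the sum wraps to 0, the comparison 0 < 1 sets the
mask to 0xff, and the OR yields 255.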
> 
> Given the example below for the unsigned scalar integer type uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
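
For reference, the "before" dump above corresponds roughly to the
following C, written with the GCC/Clang __builtin_add_overflow built-in.
This is only a sketch for comparison; the function names are made up here
and do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* The source form from the mail.  */
static uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Hand-written equivalent of the .ADD_OVERFLOW sequence: take the
   wrapped sum and the overflow flag, negate the flag into an all-ones
   or all-zeros mask, and OR it into the sum.  */
static uint64_t
sat_add_via_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  uint64_t ovf = __builtin_add_overflow (x, y, &sum); /* 0 or 1 */
  return sum | -ovf;
}

/* Both forms should agree on every input; spot-check a few.  */
static void
check_sat_add_forms_agree (void)
{
  const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
      assert (sat_add_branchless (v[i], v[j])
              == sat_add_via_overflow (v[i], v[j]));
}
```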
> 
> We perform the transform during widen_mult because the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW, or we may never catch the .SAT_ADD
> pattern.  Meanwhile, the isel pass runs after widen_mult, so we cannot
> perform the .SAT_ADD pattern match there, as the sub-expr will already
> have been optimized to .ADD_OVERFLOW.
> 
> The tests below pass with this patch:
> 1. The riscv full regression tests.
> 2. The aarch64 full regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
> 
>   PR target/51492
>   PR target/112600
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
>   to the return true switch case(s).
>   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
>   * match.pd: Add unsigned SAT_ADD match.
>   * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
>   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
>   func decl generated in match.pd match.
>   (match_saturation_arith): New func impl to match the saturation arith.
>   (math_opts_dom_walker::after_dom_children): Try match saturation
>   arith.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc|  1 +
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 28 
>  gcc/optabs.def|  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46
> +++
>  5 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>  case IFN_UBSAN_CHECK_MUL:
>  case IFN_ADD_OVERFLOW:
>  case IFN_MUL_OVERFLOW:
> +case IFN_SAT_ADD:
>  case IFN_VEC_WIDEN_PLUS:
>  case IFN_VEC_WIDEN_PLUS_LO:
>  case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
> smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> || POINTER_TYPE_P (itype))
>&& wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0))
> +  && types_match (type, TREE_TYPE (@0))
> +  && types_match (type, 

Re: [pushed 00/21] Various backports to gcc 13 (analyzer, jit, diagnostics)

2024-05-13 Thread Jakub Jelinek
On Thu, May 09, 2024 at 01:42:15PM -0400, David Malcolm wrote:
> I've pushed the following changes to releases/gcc-13
> as r13-8741-g89feb3557a0188 through r13-8761-gb7a2697733d19a.

Unfortunately many of the commits contained git commit message wording
that update_git_version can't cope with.
Wording like
(cherry picked from commit r14-1664-gfe9771b59f576f)
is wrong,
(cherry picked from commit .)
is reserved solely for what one gets from git cherry-pick -x
(i.e. the full commit hash without anything extra).
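
A rough sketch of the rule described above, in C.  It assumes the only
accepted form is the exact line written by git cherry-pick -x, that is
the fixed prefix followed by a 40-character hexadecimal hash and a
closing parenthesis.  The helper name is hypothetical, and this is not
the code update_git_version actually runs:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if LINE is exactly
   "(cherry picked from commit <40 hex digits>)".  */
static bool
is_plain_cherry_pick_trailer (const char *line)
{
  static const char prefix[] = "(cherry picked from commit ";
  const size_t prefix_len = sizeof prefix - 1;

  assert (line != NULL);
  if (strncmp (line, prefix, prefix_len) != 0)
    return false;

  const char *hash = line + prefix_len;
  /* isxdigit fails on the terminating NUL, so a short hash is
     rejected before we read past the end of the string.  */
  for (int i = 0; i < 40; i++)
    if (!isxdigit ((unsigned char) hash[i]))
      return false;

  return hash[40] == ')' && hash[41] == '\0';
}
```

With this check, a trailer carrying a describe-style name such as
r14-1664-gfe9771b59f576f is rejected, while a full hash is accepted.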

I had to ignore the following commits in the ChangeLog generation
because of this:

89feb3557a018893cfe50c2e07f91559bd3cde2b
ccf8d3e3d26c6ba3d5e11fffeed8d64018e9c060
e0c52905f666e3d23881f82dbf39466a24f009f4
b38472ffc1e631bd357573b44d956ce16d94e666
a0b13d0860848dd5f2876897ada1e22e4e681e91
b8c772cae97b54386f7853edf0f9897012bfa90b
810d35a7e054bcbb5b66d2e5924428e445f5fba9
0df1ee083434ac00ecb19582b1e5b25e105981b2
2c688f6afce4cbb414f5baab1199cd525f309fca
60dcb710b6b4aa22ea96abc8df6dfe9067f3d7fe
44968a0e00f656e9bb3e504bb2fa1a8282002015

Can you please add the ChangeLog entries for these by hand
(commits which only touch ChangeLog files are allowed and shouldn't
contain ChangeLog style entry in the commit message)?

Thanks.

Jakub



RE: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-13 Thread Evgeny Karpov
Sunday, May 12, 2024
Mark Wielaard  wrote:

> The new cygming.opt.urls and mingw.opt.urls in the
> gcc/config/mingw/cygming.opt.urls directory need to be generated by make
> regenerate-opt-urls in the gcc subdirectory. They still contained references
> to the gcc/config/i386 directory from which they were copied.
> 
> Fixes: 1f05dfc131c7 ("Reuse MinGW from i386 for AArch64")
> Fixes: e8d003736e6c ("Rename "x86 Windows Options" to "Cygwin and
> MinGW Options"")
> 
> gcc/ChangeLog:
> 
>   * config/mingw/cygming.opt.urls: Regenerate.
>   * config/mingw/mingw.opt.urls: Likewise.
> ---

Hello Mark,
Thank you for reviewing our changes related to the refactoring that extracts
the MinGW implementation from ix64.

The intent was to move the MinGW-related files without changes in this commit
("Reuse MinGW from i386 for AArch64") and apply the renaming in a follow-up
commit, which has been done in 'Rename "x86 Windows Options" to "Cygwin and
MinGW Options"'.

The script to update opt.urls files has been used.

>  gcc/config/mingw/cygming.opt.urls | 7 +++
>  gcc/config/mingw/mingw.opt.urls   | 2 +-
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/config/mingw/cygming.opt.urls
> b/gcc/config/mingw/cygming.opt.urls
> index c624e22e4427..af11c4997609 100644
> --- a/gcc/config/mingw/cygming.opt.urls
> +++ b/gcc/config/mingw/cygming.opt.urls
> @@ -1,4 +1,4 @@

> -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/cygming.opt
> and generated HTML
> +; Autogenerated by regenerate-opt-urls.py from
> +gcc/config/mingw/cygming.opt and generated HTML

I am not sure why this comment has not been updated. Is it critical, or could
it be updated the next time it is needed?

>
>  mconsole
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
> @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-
> mdll)
>  mnop-fun-dllimport
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport)
> 
> -; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> -;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
> -;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> +mthreads
> +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)

mthreads had the same issue before these changes were applied. Has something
changed recently?
This is the change from the patch series, in the 'Rename "x86 Windows Options"
to "Cygwin and MinGW Options"' commit.

; skipping UrlSuffix for 'mthreads' due to multiple URLs:
+;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
 ;   duplicate: 'gcc/x86-Options.html#index-mthreads'
-;   duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1'

Regards,
Evgeny

>  mwin32
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mwin32)
> diff --git a/gcc/config/mingw/mingw.opt.urls
> b/gcc/config/mingw/mingw.opt.urls index f8ee5be6a535..40fb086606b2
> 100644

> --- a/gcc/config/mingw/mingw.opt.urls
> +++ b/gcc/config/mingw/mingw.opt.urls
> @@ -1,4 +1,4 @@
> -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/mingw.opt
> and generated HTML
> +; Autogenerated by regenerate-opt-urls.py from
> +gcc/config/mingw/mingw.opt and generated HTML
> 
>  mcrtdll=
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mcrtdll)
> --
> 2.39.3



[PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]

2024-05-13 Thread Rainer Orth
g++.dg/modules/stdio-1_a.H currently FAILs on Solaris:

FAIL: g++.dg/modules/stdio-1_a.H -std=c++17  scan-lang-dump module "Depset:0 
decl entity:[0-9]* function_decl:'::printf'"
FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a  scan-lang-dump module "Depset:0 
decl entity:[0-9]* function_decl:'::printf'"
FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b  scan-lang-dump module "Depset:0 
decl entity:[0-9]* function_decl:'::printf'"

The problem is that the module file doesn't contain

 Depset:0 decl entity:95 function_decl:'::printf'

as expected by the test, but

 Depset:0 decl entity:26 function_decl:'::std::printf'

This happens because Solaris <stdio.h> declares printf in namespace std
as allowed by C++11, Annex D, D.5.

This patch allows for both forms.
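
To illustrate what the relaxed pattern accepts, here is a small C sketch
using POSIX regcomp/regexec.  The testsuite itself evaluates the pattern
with Tcl regular expressions, so this is only an approximation with an
equivalent extended regex; the helper name is made up:

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE contains a Depset:0 entry for printf, either in
   the global namespace or in namespace std.  */
static bool
matches_printf_depset (const char *line)
{
  static const char pattern[]
    = "Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'";
  regex_t re;

  assert (line != NULL);
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Both dump lines quoted above match; a printf in any other namespace does
not.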

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-13  Rainer Orth  

gcc/testsuite:
PR c++/98529
* g++.dg/modules/stdio-1_a.H (scan-lang-dump): Allow for
::std::printf.

diff --git a/gcc/testsuite/g++.dg/modules/stdio-1_a.H b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
--- a/gcc/testsuite/g++.dg/modules/stdio-1_a.H
+++ b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
@@ -10,5 +10,5 @@
 #endif
 // There should be *lots* of depsets (209 for glibc today)
 // { dg-final { scan-lang-dump {Writing section:60 } module } }
-// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'::printf'} module } }
+// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'} module } }
 // { dg-final { scan-lang-dump {Depset:1 binding namespace_decl:'::printf'} module } }


[COMMITTED] ada: Attributes Put_Image and Object_Size are defined by Ada 2022

2024-05-13 Thread Marc Poulhiès
From: Piotr Trojanek 

Recognize references to attributes Put_Image and Object_Size as
language-defined in Ada 2022 and implementation-defined in earlier
versions of Ada. Other attributes listed in Ada 2022 RM, K.2 and
currently implemented in GNAT are correctly categorized.

This change only affects code with restriction
No_Implementation_Attributes.

gcc/ada/

* sem_attr.adb (Attribute_22): Add Put_Image and Object_Size.
* sem_attr.ads (Attribute_Imp_Def): Remove Object_Size.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb |  4 +++-
 gcc/ada/sem_attr.ads | 11 ---
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 65442d45a85..b979ffdf0b1 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -181,7 +181,9 @@ package body Sem_Attr is
  (Attribute_Enum_Rep |
   Attribute_Enum_Val |
   Attribute_Index|
-  Attribute_Preelaborable_Initialization => True,
+  Attribute_Object_Size  |
+  Attribute_Preelaborable_Initialization |
+  Attribute_Put_Image=> True,
   others => False);
 
--  The following array contains all attributes that imply a modification
diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index 4c9f27043c6..65b7b534711 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -373,17 +373,6 @@ package Sem_Attr is
   --  other composite object passed by reference, there is no other way
   --  of specifying that a zero address should be passed.
 
-  -----------------
-  -- Object_Size --
-  -----------------
-
-  Attribute_Object_Size => True,
-  --  Type'Object_Size is the same as Type'Size for all types except
-  --  fixed-point types and discrete types. For fixed-point types and
-  --  discrete types, this attribute gives the size used for default
-  --  allocation of objects and components of the size. See section in
-  --  Einfo ("Handling of Type'Size values") for further details.
-
   -------------------------
   -- Passed_By_Reference --
   -------------------------
-- 
2.43.2



[COMMITTED] ada: Fix crash on Compile_Time_Warning in dead code

2024-05-13 Thread Marc Poulhiès
From: Bob Duff 

If a pragma Compile_Time_Warning triggers, and the pragma
is later removed because it is dead code, then the compiler
can return a bad exit code. This causes gprbuild to report
"*** compilation phase failed".

This is because Total_Errors_Detected, which is declared as Nat,
goes negative, causing Constraint_Error. In assertions-off mode,
the Constraint_Error is not detected, but the compiler nonetheless
reports a bad exit code.

This patch prevents that negative count.
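
The arithmetic behind the fix can be sketched in C as follows.  This is
only an analogue of the Ada change (Int'Max (Total, 0)); the function
name and the sample counts in the note below are hypothetical:

```c
#include <assert.h>

/* Analogue of the Treat_As_Error adjustment in Output_Messages:
   warnings are promoted to errors, except info messages and
   Compile_Time_Warning pragmas.  The result is clamped at zero,
   since a pragma in dead code is counted in
   compile_time_pragma_warnings but not in warnings.  */
static int
adjusted_total_errors (int errors, int warnings, int info_messages,
                       int compile_time_pragma_warnings)
{
  /* All four counters are naturals (Nat) on the Ada side.  */
  assert (errors >= 0 && warnings >= 0);
  assert (info_messages >= 0 && compile_time_pragma_warnings >= 0);

  int total = errors + warnings
              - info_messages - compile_time_pragma_warnings;
  return total > 0 ? total : 0;
}
```

With no errors, no remaining warnings and one Compile_Time_Warning that
sat in dead code, the unclamped total would be -1; the clamp reports 0.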

gcc/ada/

* errout.adb (Output_Messages): Protect against the total going
negative.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errout.adb | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index d28a410f47b..c4761bd1bc9 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3399,11 +3399,16 @@ package body Errout is
 
   if Warning_Mode = Treat_As_Error then
  declare
-Compile_Time_Pragma_Warnings : constant Int :=
+Compile_Time_Pragma_Warnings : constant Nat :=
Count_Compile_Time_Pragma_Warnings;
- begin
-Total_Errors_Detected := Total_Errors_Detected + Warnings_Detected
+Total : constant Int := Total_Errors_Detected + Warnings_Detected
- Warning_Info_Messages - Compile_Time_Pragma_Warnings;
+--  We need to protect against a negative Total here, because
+--  if a pragma Compile_Time_Warning occurs in dead code, it
+--  gets counted in Compile_Time_Pragma_Warnings but not in
+--  Warnings_Detected.
+ begin
+Total_Errors_Detected := Int'Max (Total, 0);
 Warnings_Detected :=
Warning_Info_Messages + Compile_Time_Pragma_Warnings;
  end;
-- 
2.43.2



[COMMITTED] ada: Refine type of a local variable

2024-05-13 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_util.adb (Has_No_Output): Iteration with
First_Formal/Next_Formal involves Entity_Ids.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index e9ab6650dac..03055039a1f 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -4203,7 +4203,7 @@ package body Sem_Util is
 ---
 
 function Has_No_Output (Subp : Entity_Id) return Boolean is
-   Param : Node_Id;
+   Param : Entity_Id;
 
 begin
--  A function has its result as output
-- 
2.43.2



Re: [PATCH] tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern [PR113982]

2024-05-13 Thread Richard Biener
On Mon, 13 May 2024, Jakub Jelinek wrote:

> Hi!
> 
> We pattern recognize already many different patterns, and closest to the
> requested one also
>yc = (type) y;
>zc = (type) z;
>x = yc + zc;
>w = (typeof_y) x;
>if (x > max)
> where y/z has the same unsigned type and type is a wider unsigned type
> and max is maximum value of the narrower unsigned type.
> But apparently people are creative in writing this in different ways;
> this requests
>yc = (type) y;
>zc = (type) z;
>x = yc + zc;
>w = (typeof_y) x;
>if (x >> narrower_type_bits)
> 
> The following patch implements that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.  Seeing the large matching code I wonder if using a match
in match.pd might be easier to maintain (eh, and I'd still
like to somehow see "inline" match patterns in source files, not
sure how, but requiring some gen* program extracting them).

Thanks,
Richard.

> 2024-05-13  Jakub Jelinek  
> 
>   PR middle-end/113982
>   * tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
>   for RSHIFT_EXPR by precision of maxval if shift result is only
>   used in a cast or comparison against zero.
>   (match_arith_overflow): Handle the RSHIFT_EXPR use case.
> 
>   * gcc.dg/pr113982.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj  2024-04-11 09:26:36.318369218 +0200
> +++ gcc/tree-ssa-math-opts.cc 2024-05-10 18:17:08.795744811 +0200
> @@ -3947,6 +3947,66 @@ arith_overflow_check_p (gimple *stmt, gi
>else
>  return 0;
>  
> +  if (maxval
> +  && ccode == RSHIFT_EXPR
> +  && crhs1 == lhs
> +  && TREE_CODE (crhs2) == INTEGER_CST
> +  && wi::to_widest (crhs2) == TYPE_PRECISION (TREE_TYPE (maxval)))
> +{
> +  tree shiftlhs = gimple_assign_lhs (use_stmt);
> +  if (!shiftlhs)
> + return 0;
> +  use_operand_p use;
> +  if (!single_imm_use (shiftlhs, &use, &cur_use_stmt))
> + return 0;
> +  if (gimple_code (cur_use_stmt) == GIMPLE_COND)
> + {
> +   ccode = gimple_cond_code (cur_use_stmt);
> +   crhs1 = gimple_cond_lhs (cur_use_stmt);
> +   crhs2 = gimple_cond_rhs (cur_use_stmt);
> + }
> +  else if (is_gimple_assign (cur_use_stmt))
> + {
> +   if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS)
> + {
> +   ccode = gimple_assign_rhs_code (cur_use_stmt);
> +   crhs1 = gimple_assign_rhs1 (cur_use_stmt);
> +   crhs2 = gimple_assign_rhs2 (cur_use_stmt);
> + }
> +   else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR)
> + {
> +   tree cond = gimple_assign_rhs1 (cur_use_stmt);
> +   if (COMPARISON_CLASS_P (cond))
> + {
> +   ccode = TREE_CODE (cond);
> +   crhs1 = TREE_OPERAND (cond, 0);
> +   crhs2 = TREE_OPERAND (cond, 1);
> + }
> +   else
> + return 0;
> + }
> +   else
> + {
> +   enum tree_code sc = gimple_assign_rhs_code (cur_use_stmt);
> +   tree castlhs = gimple_assign_lhs (cur_use_stmt);
> +   if (!CONVERT_EXPR_CODE_P (sc)
> +   || !castlhs
> +   || !INTEGRAL_TYPE_P (TREE_TYPE (castlhs))
> +   || (TYPE_PRECISION (TREE_TYPE (castlhs))
> +   > TYPE_PRECISION (TREE_TYPE (maxval))))
> + return 0;
> +   return 1;
> + }
> + }
> +  else
> + return 0;
> +  if ((ccode != EQ_EXPR && ccode != NE_EXPR)
> +   || crhs1 != shiftlhs
> +   || !integer_zerop (crhs2))
> + return 0;
> +  return 1;
> +}
> +
>if (TREE_CODE_CLASS (ccode) != tcc_comparison)
>  return 0;
>  
> @@ -4049,6 +4109,7 @@ arith_overflow_check_p (gimple *stmt, gi
> _8 = IMAGPART_EXPR <_7>;
> if (_8)
> and replace (utype) x with _9.
> +   Or with x >> popcount (max) instead of x > max.
>  
> Also recognize:
> x = ~z;
> @@ -4481,10 +4542,62 @@ match_arith_overflow (gimple_stmt_iterat
> gcc_checking_assert (is_gimple_assign (use_stmt));
> if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
>   {
> -   gimple_assign_set_rhs1 (use_stmt, ovf);
> -   gimple_assign_set_rhs2 (use_stmt, build_int_cst (type, 0));
> -   gimple_assign_set_rhs_code (use_stmt,
> -   ovf_use == 1 ? NE_EXPR : EQ_EXPR);
> +   if (gimple_assign_rhs_code (use_stmt) == RSHIFT_EXPR)
> + {
> +   g2 = gimple_build_assign (make_ssa_name (boolean_type_node),
> + ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> + ovf, build_int_cst (type, 0));
> +   gimple_stmt_iterator gsiu = gsi_for_stmt (use_stmt);
> +   gsi_insert_before (&gsiu, g2, GSI_SAME_STMT);
> +   gimple_assign_set_rhs_with_ops (&gsiu, NOP_EXPR,
> +

[COMMITTED] ada: Remove code that expected pre/post being split into conjuncts

2024-05-13 Thread Marc Poulhiès
From: Piotr Trojanek 

The removed code is no longer needed (and causes assertion failures).
Most likely it should have been using the Split_PPC flag.

gcc/ada/

* sem_util.adb (Is_Potentially_Unevaluated): Remove code for
recovering the original structure of expressions with AND THEN.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 1166c68b972..b5c33638b35 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -19582,39 +19582,14 @@ package body Sem_Util is
 
   --  Local variables
 
-  Par  : Node_Id;
   Expr : Node_Id;
+  Par  : Node_Id;
 
--  Start of processing for Is_Potentially_Unevaluated
 
begin
   Expr := N;
-  Par  := N;
-
-  --  A postcondition whose expression is a short-circuit is broken down
-  --  into individual aspects for better exception reporting. The original
-  --  short-circuit expression is rewritten as the second operand, and an
-  --  occurrence of 'Old in that operand is potentially unevaluated.
-  --  See sem_ch13.adb for details of this transformation. The reference
-  --  to 'Old may appear within an expression, so we must look for the
-  --  enclosing pragma argument in the tree that contains the reference.
-
-  while Present (Par)
-and then Nkind (Par) /= N_Pragma_Argument_Association
-  loop
- if Is_Rewrite_Substitution (Par)
-   and then Nkind (Original_Node (Par)) = N_And_Then
- then
-return True;
- end if;
-
- Par := Parent (Par);
-  end loop;
-
-  --  Other cases; 'Old appears within other expression (not the top-level
-  --  conjunct in a postcondition) with a potentially unevaluated operand.
-
-  Par := Parent (Expr);
+  Par  := Parent (Expr);
 
   while Present (Par)
 and then Nkind (Par) /= N_Pragma_Argument_Association
-- 
2.43.2



[COMMITTED] ada: Revert recent change for Put_Image and Object_Size attributes

2024-05-13 Thread Marc Poulhiès
From: Piotr Trojanek 

A recent change for attribute Object_Size caused spurious errors when
restriction No_Implementation_Attributes is active and attribute
Object_Size is introduced by expansion of dispatching operations.

Temporarily revert that change for further investigation.

gcc/ada/

* sem_attr.adb (Attribute_22): Remove Put_Image and Object_Size.
* sem_attr.ads (Attribute_Imp_Def): Restore Object_Size.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb |  4 +---
 gcc/ada/sem_attr.ads | 11 +++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index b979ffdf0b1..65442d45a85 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -181,9 +181,7 @@ package body Sem_Attr is
  (Attribute_Enum_Rep |
   Attribute_Enum_Val |
   Attribute_Index|
-  Attribute_Object_Size  |
-  Attribute_Preelaborable_Initialization |
-  Attribute_Put_Image=> True,
+  Attribute_Preelaborable_Initialization => True,
   others => False);
 
--  The following array contains all attributes that imply a modification
diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index 65b7b534711..4c9f27043c6 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -373,6 +373,17 @@ package Sem_Attr is
   --  other composite object passed by reference, there is no other way
   --  of specifying that a zero address should be passed.
 
+  -----------------
+  -- Object_Size --
+  -----------------
+
+  Attribute_Object_Size => True,
+  --  Type'Object_Size is the same as Type'Size for all types except
+  --  fixed-point types and discrete types. For fixed-point types and
+  --  discrete types, this attribute gives the size used for default
+  --  allocation of objects and components of the size. See section in
+  --  Einfo ("Handling of Type'Size values") for further details.
+
   -------------------------
   -- Passed_By_Reference --
   -------------------------
-- 
2.43.2


