Re: [PATCH] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

2024-01-18 Thread juzhe.zh...@rivai.ai
Could you add a test for vle with mask?

For example:

__riscv_vle8, which overloads __riscv_vle8_v_i8mf8_m and __riscv_vle8_v_u8mf8_m.

You are using the pointer type and the mask type to resolve it.

So the pointer type is expected to be const int8_t * or const uint8_t *.

Could you add tests for:
1. __riscv_vle8 (const int8_t *...)
2. __riscv_vle8 (const uint8_t *...)
3. __riscv_vle8 (const int32_t *...) ---> Since this pointer type doesn't match
   the expected type, I worry it may cause an ICE while resolving the API.

Thanks.




juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-01-19 15:44
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; zhengyu; pan2.li; xuli
Subject: [PATCH] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]
From: xuli 
 

[PATCH] RISC-V: Bugfix for resolve_overloaded_builtin[PR113420]

2024-01-18 Thread Li Xu
From: xuli 

Change the hash value of an overloaded intrinsic from considering
all parameter types to:
1. Encoding the vector data types.
2. Encoding the pointer type, in order to distinguish
   vle8_v_i8mf8_m(vbool64_t vm, const int8_t *rs1, size_t vl)
   and vle8_v_u8mf8_m(vbool64_t vm, const uint8_t *rs1, size_t vl).
3. Encoding the number of parameters, in order to distinguish
   vfadd_vv_f32mf2_rm(vfloat32mf2_t vs2, vfloat32mf2_t vs1, unsigned int frm, size_t vl)
   and vfadd_vv_f32mf2(vfloat32mf2_t vs2, vfloat32mf2_t vs1, size_t vl).
   The same goes for the vxrm intrinsics.
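A minimal host-side model of the proposed hashing rules, to make the three points concrete. Everything in it is hypothetical: struct arg, the integer mode numbers and the FNV-1a mixing stand in for GCC trees, TYPE_MODE and inchash::hash, and hashing the overload name first is an assumption about the surrounding code rather than something shown in the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one intrinsic parameter.  The real code
   inspects GCC trees (POINTER_TYPE_P, lookup_vector_type_attribute).  */
enum arg_kind { ARG_SCALAR, ARG_POINTER, ARG_VECTOR };

struct arg
{
  enum arg_kind kind;
  bool unsigned_p;  /* signedness of the type, or of the pointee */
  int mode;         /* stand-in for TYPE_MODE of the type or pointee */
};

/* FNV-1a mixing of a 32-bit value; a stand-in for inchash::hash.  */
static uint32_t mix (uint32_t h, uint32_t v)
{
  for (int i = 0; i < 4; i++)
    {
      h ^= (v >> (8 * i)) & 0xffu;
      h *= 16777619u;
    }
  return h;
}

/* Hash the overload name, then signedness and mode of vector and pointer
   parameters only.  For intrinsics that may take a rounding-mode operand,
   the first scalar parameter mixes in the parameter count and stops the
   walk, so variants with an extra frm/vxrm scalar hash differently.  */
static uint32_t overloaded_hash (const char *name, const struct arg *args,
                                 size_t nargs, bool may_require_rm)
{
  uint32_t h = 2166136261u;
  for (const char *p = name; *p != '\0'; p++)
    h = mix (h, (unsigned char) *p);
  for (size_t i = 0; i < nargs; i++)
    {
      if (args[i].kind != ARG_SCALAR)
        {
          h = mix (h, args[i].unsigned_p);
          h = mix (h, (uint32_t) args[i].mode);
        }
      else if (may_require_rm)
        {
          h = mix (h, (uint32_t) nargs);
          break;
        }
    }
  return h;
}
```

In this model the masked __riscv_vle8 forms taking const int8_t * and const uint8_t * hash differently because the pointee signedness is mixed in, and the vfadd forms with and without frm hash differently because the first scalar argument mixes in a count of 4 or 3.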

PR target/113420

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc (has_vxrm_or_frm_p): Remove.
(registered_function::overloaded_hash): Refactor.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr113420.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 88 +++
 .../gcc.target/riscv/rvv/base/pr113420.c  | 30 +++
 2 files changed, 43 insertions(+), 75 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr113420.c

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 25e0b6e56de..5240f9e1f02 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4271,24 +4271,22 @@ registered_function::overloaded_hash () const
 : TYPE_UNSIGNED (type);
   mode_p = POINTER_TYPE_P (type) ? TYPE_MODE (TREE_TYPE (type))
 : TYPE_MODE (type);
-  h.add_int (unsigned_p);
-  h.add_int (mode_p);
+  if (POINTER_TYPE_P (type) || lookup_vector_type_attribute (type))
+    {
+      h.add_int (unsigned_p);
+      h.add_int (mode_p);
+    }
+  else if (instance.base->may_require_vxrm_p ()
+           || instance.base->may_require_frm_p ())
+    {
+      h.add_int (argument_types.length ());
+      break;
+    }
 }
 
   return h.end ();
 }
 
-bool
-has_vxrm_or_frm_p (function_instance , const vec 
)
-{
-  if (instance.base->may_require_vxrm_p ()
-  || (instance.base->may_require_frm_p ()
- && (TREE_CODE (TREE_TYPE (arglist[arglist.length () - 2]))
- == INTEGER_TYPE)))
-return true;
-  return false;
-}
-
 hashval_t
 registered_function::overloaded_hash (const vec )
 {
@@ -4296,68 +4294,8 @@ registered_function::overloaded_hash (const vec )
   unsigned int len = arglist.length ();
 
   for (unsigned int i = 0; i < len; i++)
-{
-  /* vint8m1_t __riscv_vget_i8m1(vint8m2_t src, size_t index);
-When the user calls vget intrinsic, the __riscv_vget_i8m1(src, 1)
-   form is used. The compiler recognizes that the parameter index is signed
-   int, which is inconsistent with size_t, so the index is converted to
-   size_t type in order to get correct hash value. vint8m2_t
-   __riscv_vset(vint8m2_t dest, size_t index, vint8m1_t value); The reason
-   is the same as above. */
-  if ((instance.base == bases::vget && (i == (len - 1)))
- || ((instance.base == bases::vset
-   || instance.shape == shapes::crypto_vi)
- && (i == (len - 2
-   argument_types.safe_push (size_type_node);
-  /* Vector fixed-point arithmetic instructions requiring argument vxrm.
-For example: vuint32m4_t __riscv_vaaddu(vuint32m4_t vs2,
-  vuint32m4_t vs1, unsigned int vxrm, size_t vl); The user calls vaaddu
-  intrinsic in the form of __riscv_vaaddu(vs2, vs1, 2, vl). The compiler
-  recognizes that the parameter vxrm is a signed int, which is inconsistent
-  with the parameter unsigned int vxrm declared by intrinsic, so the
-  parameter vxrm is converted to an unsigned int type in order to get
-  correct hash value.
-
-  Vector Floating-Point Instructions requiring argument frm.
-  DEF_RVV_FUNCTION (vfadd, alu, full_preds, f_vvv_ops)
-  DEF_RVV_FUNCTION (vfadd_frm, alu_frm, full_preds, f_vvv_ops)
-  Taking vfadd as an example, theoretically we can add base or shape to the
-  hash value to distinguish whether the frm parameter is required.
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, float32_t rs1, size_t vl);
-  vfloat32m1_t __riscv_vfadd(vfloat32m1_t vs2, vfloat32m1_t vs1, unsigned int
-  frm, size_t vl);
-
-   However, the current registration mechanism of overloaded intinsic for gcc
-  limits the intrinsic obtained by entering the hook to always be vfadd, not
-  vfadd_frm. Therefore, the correct hash value cannot be obtained through the
-  parameter list and overload name, base or shape.
-  ++---+---+
-  | index  | name  | kind  |
-  ++---+---+
-  | 124733 | __riscv_vfadd | Overloaded| <- Hook fun code
-  

[PATCH] testsuite: Disable test for PR113292 on targets without TLS support

2024-01-18 Thread Nathaniel Shead
Verified on an x86_64-pc-linux-gnu host, using a cross-compiler to
arm-unknown-linux-gnueabihf configured with --enable-threads=0, that the
link test is correctly skipped. OK for trunk?

-- >8 --

This disables the new test added by r14-8168 on machines that don't have
TLS support, such as bare-metal ARM.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr113292_c.C: Require TLS.

Signed-off-by: Nathaniel Shead 
---
 gcc/testsuite/g++.dg/modules/pr113292_c.C | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/pr113292_c.C b/gcc/testsuite/g++.dg/modules/pr113292_c.C
index aa3f32ae818..c117c7cfcd4 100644
--- a/gcc/testsuite/g++.dg/modules/pr113292_c.C
+++ b/gcc/testsuite/g++.dg/modules/pr113292_c.C
@@ -1,6 +1,8 @@
 // PR c++/113292
 // { dg-module-do link }
+// { dg-add-options tls }
 // { dg-additional-options "-fmodules-ts" }
+// { dg-require-effective-target tls_runtime }
 
 import "pr113292_a.H";
 
-- 
2.43.0



Re: [PATCH 0/5] RISC-V: Relax the -march string for accept any order

2024-01-18 Thread Kito Cheng
Pushed to trunk :)

On Tue, Jan 16, 2024 at 10:33 PM Jeff Law  wrote:
>
>
>
> On 1/9/24 17:58, Kito Cheng wrote:
> > Oops, I should leave more context here:
> >
> > Actually we discussed this years ago, and most people agreed with it,
> > but I guess we just never followed through, and the ISA string wasn't so
> > terribly long at that moment; however, the number of extensions has grown
> > so fast in the last year that I think it's time to move this forward.
> >
> > Also we (SiFive) will send patches for clang/LLVM to relax that as well :)
> >
> > https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/14
> > 
> Then let's go forward.  It seems like as good a time as any with gcc-14
> and llvm-18 both right around the corner.
>
> jeff


Re: [PATCH] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-18 Thread Kito Cheng
Thanks! Generally LGTM, but I would wait one more week to see if there are
any other comments :)

On Fri, Jan 19, 2024 at 3:05 PM Monk Chiang  wrote:


[PATCH] RISC-V: Add split pattern to generate SFB instructions. [PR113095]

2024-01-18 Thread Monk Chiang
Since match.pd transforms (zero_one == 0) ? y : z <op> y
into ((typeof(y))zero_one * z) <op> y, add splitters to recognize
this expression and generate SFB instructions.
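The rewrite the splitters rely on can be checked on the host in plain C. This sketch, with hypothetical function names, compares the source form used in f1-f4 (generalized to bit B), the match.pd form ((typeof(y))zero_one * z) <op> y, the one-bit sign_extract-and-AND form the splitters match, and the conditional select they emit. It also encodes why the second splitter needs a bit position below 11, on the assumption that its AND becomes an andi with a signed 12-bit immediate.

```c
#include <assert.h>
#include <stdint.h>

enum op { OP_XOR, OP_OR };

static uint32_t apply (enum op op, uint32_t a, uint32_t b)
{
  return op == OP_XOR ? a ^ b : a | b;
}

/* The selected bit (0 or 1); B must be a valid bit position.  */
static uint32_t bit_of (uint32_t x, unsigned b)
{
  assert (b < 32);
  return (x >> b) & 1u;
}

/* Source form: (bit == 0) ? y : z <op> y.  */
static uint32_t form_select (enum op op, uint32_t x, unsigned b,
                             uint32_t y, uint32_t z)
{
  return bit_of (x, b) == 0 ? y : apply (op, z, y);
}

/* After match.pd: ((typeof (y)) zero_one * z) <op> y.  */
static uint32_t form_mult (enum op op, uint32_t x, unsigned b,
                           uint32_t y, uint32_t z)
{
  return apply (op, bit_of (x, b) * z, y);
}

/* What the splitters match: (sign_extract (x, 1, b) & z) <op> y; a one-bit
   sign_extract is either all-ones or zero.  */
static uint32_t form_and (enum op op, uint32_t x, unsigned b,
                          uint32_t y, uint32_t z)
{
  uint32_t mask = -bit_of (x, b);
  return apply (op, mask & z, y);
}

/* What the splitters emit for the AND: t = the bit (zero_extract, or
   x & (1 << b) in the second splitter), then (t != 0) ? z : 0, which SFB
   can implement with a short forward branch.  */
static uint32_t form_split (enum op op, uint32_t x, unsigned b,
                            uint32_t y, uint32_t z)
{
  uint32_t t = x & (UINT32_C (1) << b);
  uint32_t r = t != 0 ? z : 0;
  return apply (op, r, y);
}

static int forms_agree (enum op op, uint32_t x, unsigned b,
                        uint32_t y, uint32_t z)
{
  uint32_t r = form_select (op, x, b, y, z);
  return r == form_mult (op, x, b, y, z)
         && r == form_and (op, x, b, y, z)
         && r == form_split (op, x, b, y, z);
}

/* Second splitter: 1 << b must fit a signed 12-bit immediate (<= 2047).  */
static int andi_mask_fits (unsigned b)
{
  return (INT64_C (1) << b) <= 2047;
}
```

All four forms agree because when the bit is clear every form reduces to 0 <op> y = y, and when it is set every form reduces to z <op> y.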

gcc/ChangeLog:
PR target/113095
* config/riscv/sfb.md: New splitters to rewrite single bit
sign extension as the condition to SFB instructions.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/sfb.c: New test.
---
 gcc/config/riscv/sfb.md  | 32 
 gcc/testsuite/gcc.target/riscv/sfb.c | 24 +
 2 files changed, 56 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sfb.c

diff --git a/gcc/config/riscv/sfb.md b/gcc/config/riscv/sfb.md
index 8ab747142c8..520b12c22f9 100644
--- a/gcc/config/riscv/sfb.md
+++ b/gcc/config/riscv/sfb.md
@@ -35,3 +35,35 @@
   [(set_attr "length" "8")
(set_attr "type" "sfb_alu")
(set_attr "mode" "")])
+
+;; Combine creates this form ((typeof(y))zero_one * z) <op> y
+;; for SiFive short forward branches.
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+        (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+                               (const_int 1)
+                               (match_operand 2 "immediate_operand"))
+               (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 2)))
+   (set (match_dup 0) (if_then_else:X (ne:X (match_dup 4) (const_int 0))
+                                      (match_dup 3)
+                                      (const_int 0)))])
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+        (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+                               (const_int 1)
+                               (match_operand 2 "immediate_operand"))
+               (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_SFB_ALU && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (if_then_else:X (ne:X (match_dup 4) (const_int 0))
+                                      (match_dup 3)
+                                      (const_int 0)))]
+{
+  operands[2] = GEN_INT (1 << UINTVAL (operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/sfb.c b/gcc/testsuite/gcc.target/riscv/sfb.c
new file mode 100644
index 000..22f164051f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sfb.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32gc -mabi=ilp32d -mtune=sifive-7-series" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+/* { dg-final { scan-assembler-times "bne" 4 } } */
+/* { dg-final { scan-assembler-times "movcc" 4 } } */
-- 
2.40.1



Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-18 Thread Alexandre Oliva
On Jan 18, 2024, "Kewen.Lin"  wrote:

> Not sure if I missed something in the testing, could you
> kindly double check if those test cases started to fail from r14-6275 on your
> env?

My guess is that they started to fail when David attempted to bypass the
strub tests by changing the dg proc that detects strub support.  The
tests then detected the mismatch between the result of the proc and the
expected errors when strub is disabled properly.


Here's the cleanup patch I promised.  Sorry it took so long; I was
hitting bootstrap errors even on a pristine tree yesterday.

Regstrapped on x86_64-linux-gnu.  This patch restores the status on
sparc-sun-solaris from before Kewen's patch, while leaving all other
ports unchanged.  Ok to install?


strub: introduce STACK_ADDRESS_OFFSET

Since STACK_POINTER_OFFSET is not necessarily at the boundary between
caller- and callee-owned stack, as __builtin_stack_address() requires,
and both using it as if it were and not using it at all cause
problems, introduce a new macro that ports can define suitably,
without modifying STACK_POINTER_OFFSET.
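A small C model of what the patched expand_builtin_stack_address computes: the raw stack pointer, plus STACK_ADDRESS_OFFSET only when a port defines it. The numbers are assumptions for 64-bit SPARC (a 2047-byte stack bias and a 128-byte register-window save area as the first-parameter offset), and model_builtin_stack_address is a hypothetical name.

```c
#include <stdint.h>

/* Assumed 64-bit SPARC values, for illustration only.  */
#define SPARC_STACK_BIAS 2047
#define FIRST_PARM_OFFSET_64 128
#define STACK_POINTER_OFFSET (FIRST_PARM_OFFSET_64 + SPARC_STACK_BIAS)

/* The port opts in, as the sparc.h hunk does.  Remove this line to model
   a port that leaves STACK_ADDRESS_OFFSET undefined.  */
#define STACK_ADDRESS_OFFSET STACK_POINTER_OFFSET

/* Raw stack pointer, plus STACK_ADDRESS_OFFSET only if it is defined.  */
static uintptr_t model_builtin_stack_address (uintptr_t sp)
{
  uintptr_t ret = sp;
#ifdef STACK_ADDRESS_OFFSET
  ret += STACK_ADDRESS_OFFSET;
#endif
  return ret;
}
```

Without the STACK_ADDRESS_OFFSET definition the model returns sp unchanged, which is the behavior the patch keeps for every port other than SPARC.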


for  gcc/ChangeLog

PR middle-end/112917
PR middle-end/113100
* builtins.cc (expand_builtin_stack_address): Use
STACK_ADDRESS_OFFSET.
* doc/extend.texi (__builtin_stack_address): Adjust.
* config/sparc/sparc.h (STACK_ADDRESS_OFFSET): Define.
* doc/tm.texi.in (STACK_ADDRESS_OFFSET): Document.
* doc/tm.texi: Rebuilt.
---
 gcc/builtins.cc  |5 ++---
 gcc/config/sparc/sparc.h |7 +++
 gcc/doc/extend.texi  |2 +-
 gcc/doc/tm.texi  |   29 +
 gcc/doc/tm.texi.in   |   29 +
 5 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 09f2354f1144b..37df7dcda0a0e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5450,7 +5450,7 @@ expand_builtin_stack_address ()
   rtx ret = convert_to_mode (ptr_mode, copy_to_reg (stack_pointer_rtx),
 STACK_UNSIGNED);
 
-#ifdef SPARC_STACK_BOUNDARY_HACK
+#ifdef STACK_ADDRESS_OFFSET
   /* Unbias the stack pointer, bringing it to the boundary between the
  stack area claimed by the active function calling this builtin,
  and stack ranges that could get clobbered if it called another
@@ -5477,8 +5477,7 @@ expand_builtin_stack_address ()
  (caller) function's active area as well, whereas those pushed or
  allocated temporarily for a call are regarded as part of the
  callee's stack range, rather than the caller's.  */
-  if (SPARC_STACK_BOUNDARY_HACK)
-ret = plus_constant (ptr_mode, ret, STACK_POINTER_OFFSET);
+  ret = plus_constant (ptr_mode, ret, STACK_ADDRESS_OFFSET);
 #endif
 
   return force_reg (ptr_mode, ret);
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index fc064a92c22d9..fb074808d30d4 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -734,6 +734,13 @@ along with GCC; see the file COPYING3.  If not see
  parameter regs.  */
 #define STACK_POINTER_OFFSET (FIRST_PARM_OFFSET(0) + SPARC_STACK_BIAS)
 
+/* Unbias the stack pointer if needed, and move past the register save area,
+   that is never in use while a function is active, so that it is regarded as a
+   callee save area rather than as part of the function's own stack area.  This
+   enables __strub_leave() to do a better job of clearing the stack frame of a
+   previously-called sibling.  */
+#define STACK_ADDRESS_OFFSET STACK_POINTER_OFFSET
+
 /* Base register for access to local variables of the function.  */
 #define HARD_FRAME_POINTER_REGNUM 30
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0bc586d120e76..00d8aa390cc5e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12791,7 +12791,7 @@ situations.
 
 @deftypefn {Built-in Function} {void *} __builtin_stack_address ()
 This function returns the stack pointer register, offset by
-@code{STACK_POINTER_OFFSET}.
+@code{STACK_ADDRESS_OFFSET} if that's defined.
 
 Conceptually, the returned address returned by this built-in function is
 the boundary between the stack area allocated for use by its caller, and
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 69ae63c77de6e..c8b8b126b2424 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3456,6 +3456,35 @@ or type, otherwise return false.  The default implementation always returns
 true.
 @end deftypefn
 
+@defmac STACK_ADDRESS_OFFSET
+Offset from the stack pointer register to the boundary address between
+the stack area claimed by an active function, and stack ranges that
+could get clobbered if it called another function.  It should NOT
+encompass any stack red zone, that is used in leaf functions.
+
+This value is added to the stack pointer register to compute the address
+returned by @code{__builtin_stack_address}, and this is its only use.
+If this macro is not defined, no offset is added.  Defining it like

Re: [PATCH v2 2/2] LoongArch: When the code model is extreme, the symbol address is obtained through macro instructions regardless of the value of -mexplicit-relocs.

2024-01-18 Thread Xi Ruoyao
On Wed, 2024-01-17 at 17:57 +0800, chenglulu wrote:
> > > Virtual register 1479 will be used in insn 2744, but register 1479 was
> > > assigned the REG_UNUSED attribute in the previous instruction.
> > > 
> > > The attached file is the wrong file.
> > > The compilation command is as follows:
> > > 
> > > $ ./gcc/cc1 -fpreprocessed regrename.i -quiet -dp -dumpbase regrename.c
> > > -dumpbase-ext .c -mno-relax -mabi=lp64d -march=loongarch64 -mfpu=64
> > > -msimd=lasx -mcmodel=extreme -mtune=loongarch64 -g3 -O2
> > > -Wno-int-conversion -Wno-implicit-int -Wno-implicit-function-declaration
> > > -Wno-incompatible-pointer-types -version -o regrename.s
> > > -mexplicit-relocs=always -fdump-rtl-all-all
> > I've seen some "guality" test failures in GCC test suite as well.
> > Normally I just ignore the guality failures but this time they look very
> > suspicious.  I'll investigate these issues...
> > 
> I've also seen this type of failed regression tests and I'll continue to 
> look at this issue as well.

The guality regression is simple: I didn't call
delegitimize_mem_from_attrs (the default TARGET_DELEGITIMIZE_ADDRESS) in
the custom implementation.

This test case failed because the compiler believed that two
(UNSPEC_LA_PCREL_64_PART2 [(symbol)]) instances would always produce the
same result, but this isn't true because the result depends on the PC.  Thus
(pc) needed to be included in the RTX, like:

  [(set (match_operand:DI 0 "register_operand" "=r")
        (unspec:DI [(match_operand:DI 2 "") (pc)] UNSPEC_LA_PCREL_64_PART1))
   (set (match_operand:DI 1 "register_operand" "=r")
        (unspec:DI [(match_dup 2) (pc)] UNSPEC_LA_PCREL_64_PART2))]

With this, the buggy REG_UNUSED notes were gone.  But it then prevented
CSE when loading the address of __tls_get_addr (e.g. if we address
10 TLS LD symbols in a function, it would emit 10 instances of "la.global
__tls_get_addr"), so I added a REG_EQUAL note for it.  For symbols other
than __tls_get_addr, such notes are added automatically by optimization
passes.

Updated patch attached.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
From e9d789f8dcb52984b0f894fdecc402a49c5ad6d7 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao 
Date: Fri, 5 Jan 2024 18:40:06 +0800
Subject: [PATCH v2] LoongArch: Don't split the instructions containing relocs
 for extreme code model

The ABI mandates that the pcalau12i/addi.d/lu32i.d/lu52i.d instructions
used for addressing a symbol be adjacent.  So model them as "one large
instruction", i.e. a single define_insn, with two output registers.  The
real address is the sum of these two registers.

The advantage of this approach is that the RTL passes can still use ldx/stx
instructions to skip an add.d instruction.
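A host-side C sketch of why the address is the sum of two registers. It models pcalau12i, addi.d, lu32i.d and lu52i.d with simplified LA64 semantics (an assumption of this sketch, not quoted from the manual), picks the four immediates with a derivation of its own rather than the psABI relocation formulas, and checks that part1 + part2 reconstructs the target. split_pcrel64 and pcrel64_address are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of V to 64 bits, using unsigned math.  */
static uint64_t sext (uint64_t v, unsigned bits)
{
  uint64_t m = UINT64_C (1) << (bits - 1);
  v &= (UINT64_C (1) << bits) - 1;
  return (v ^ m) - m;
}

/* Simplified instruction semantics assumed by this sketch.  */
static uint64_t pcalau12i (uint64_t pc, uint32_t si20)
{
  return (pc & ~UINT64_C (0xfff)) + sext ((uint64_t) si20 << 12, 32);
}

static uint64_t addi_d (uint64_t rj, uint32_t si12)
{
  return rj + sext (si12, 12);
}

static uint64_t lu32i_d (uint64_t rd, uint32_t si20)
{
  return (sext (si20, 20) << 32) | (rd & UINT64_C (0xffffffff));
}

static uint64_t lu52i_d (uint64_t rj, uint32_t si12)
{
  return ((uint64_t) (si12 & 0xfff) << 52)
         | (rj & UINT64_C (0x000fffffffffffff));
}

struct pcrel64 { uint32_t hi20, lo12, lo20, hi12; };

/* Choose immediates so that part1 + part2 == dest for this PC.  */
static struct pcrel64 split_pcrel64 (uint64_t pc, uint64_t dest)
{
  struct pcrel64 f;
  uint64_t delta = dest - (pc & ~UINT64_C (0xfff));
  f.lo12 = dest & 0xfff;
  /* Low word that addi.d leaves in part2 (sign-extended lo12).  */
  uint64_t low32 = sext (f.lo12, 12) & UINT64_C (0xffffffff);
  uint64_t r = delta - low32;
  assert ((r & 0xfff) == 0);
  f.hi20 = (r >> 12) & 0xfffff;
  uint64_t r2 = r - sext (r, 32);   /* what lu32i.d/lu52i.d must supply */
  assert ((r2 & UINT64_C (0xffffffff)) == 0);
  f.lo20 = (r2 >> 32) & 0xfffff;
  f.hi12 = (r2 >> 52) & 0xfff;
  return f;
}

/* The two outputs of the "one large instruction"; their sum is the
   address, formed by an add.d or consumed directly by ldx.d/stx.d.  */
static uint64_t pcrel64_address (uint64_t pc, uint64_t dest)
{
  struct pcrel64 f = split_pcrel64 (pc, dest);
  uint64_t part1 = pcalau12i (pc, f.hi20);
  uint64_t part2 = lu52i_d (lu32i_d (addi_d (0, f.lo12), f.lo20), f.hi12);
  return part1 + part2;
}
```

In the model both parts depend on pc: part1 directly through pcalau12i, and part2 through the immediates chosen for that PC.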

gcc/ChangeLog:

	* config/loongarch/loongarch.md (unspec): Add
	UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
	(la_pcrel64_two_parts): New define_insn.
	* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
	typo in the comment.
	(loongarch_call_tls_get_addr): If TARGET_CMODEL_EXTREME, use
	la_pcrel64_two_parts for addressing the TLS symbol and
	__tls_get_addr.  Emit an REG_EQUAL note to allow CSE addressing
	__tls_get_addr.
	(loongarch_legitimize_tls_address): If TARGET_CMODEL_EXTREME,
	address TLS IE symbols with la_pcrel64_two_parts.
	(loongarch_split_symbol): If TARGET_CMODEL_EXTREME, address
	symbols with la_pcrel64_two_parts.
	(TARGET_DELEGITIMIZE_ADDRESS): Define.
	(loongarch_delegitimize_address): Implement the target hook.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
	Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
	instruction sequences are not reordered by the compiler.
	(NOIPA): Disallow interprocedural optimizations.
	* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
	duplicated from func-call-extreme-1.c, include it instead.
	(dg-options): Likewise.
	* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
	Likewise.
	* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
	Likewise.
	* gcc.target/loongarch/cmodel-extreme-1.c: New test.
	* gcc.target/loongarch/cmodel-extreme-2.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 135 +++---
 gcc/config/loongarch/loongarch.md |  21 +++
 .../gcc.target/loongarch/cmodel-extreme-1.c   |  18 +++
 .../gcc.target/loongarch/cmodel-extreme-2.c   |   7 +
 .../loongarch/func-call-extreme-1.c   |  14 +-
 .../loongarch/func-call-extreme-2.c   |  29 +---
 .../loongarch/func-call-extreme-3.c   |   2 +-
 .../loongarch/func-call-extreme-4.c   |   2 +-
 8 files changed, 144 insertions(+), 84 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/cmodel-extreme-2.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 82467474288..358d2f8f3f5 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ 

Re: Go patch committed: Move lowering pass after check types pass

2024-01-18 Thread Ian Lance Taylor
On Mon, Dec 18, 2023 at 5:32 PM Ian Lance Taylor  wrote:
>
> This Go frontend patch moves the lowering pass after the type
> determination and the type checking passes.  This lets us simplify
> some of the code that determines the type of an expression, which
> previously had to work correctly both before and after type
> determination.
>
> I'm doing this to help with future generic support.  For example, with
> generics, we can see code like
>
> func ident[T any](v T) T { return v }
>
> func F() int32 {
> 	s := int32(1)
> 	return ident(s)
> }
>
> Before this change, we would type check return statements in the
> lowering pass (see Return_statement::do_lower).  With a generic
> example like the above, that means we have to determine the type of s,
> and use that to infer the type arguments passed to ident, and use that
> to determine the result type of ident.  That is too much to do at
> lowering time.  Of course we can change the way that return statements
> work, but similar issues arise with index expressions, the types of
> closures for function literals, and probably other cases as well.
>
> Rather than try to deal with all those cases, we move the lowering
> pass after type checking.  This requires a bunch of changes, notably
> for determining constant types.  We have to add type checking for
> various constructs that formerly disappeared in the lowering pass. So
> it's a lot of shuffling.  Sorry for the size of the patch.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

Sorry, I forgot to commit the changes to some of the test files.  I've
committed this patch to fix them.  This fixes PR 113447.

Ian
3d7820c58f9466a80916dfa50dcdfde457b4c597
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue20185.go b/gcc/testsuite/go.test/test/fixedbugs/issue20185.go
index 9065868d7f2..24d74f09126 100644
--- a/gcc/testsuite/go.test/test/fixedbugs/issue20185.go
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue20185.go
@@ -10,7 +10,7 @@
 package p
 
 func F() {
-   switch t := nil.(type) { // ERROR "cannot type switch on non-interface value"
+   switch t := nil.(type) { // ERROR "cannot type switch on non-interface value|defined to nil type"
default:
_ = t
}
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue33386.go b/gcc/testsuite/go.test/test/fixedbugs/issue33386.go
index 7b2f565285e..c5073910a4c 100644
--- a/gcc/testsuite/go.test/test/fixedbugs/issue33386.go
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue33386.go
@@ -18,7 +18,7 @@ func _() {
 
 func _() {
defer func() { // no error here about deferred function
-   1 +// GCCGO_ERROR "value computed is not used"
+   1 +
}()// ERROR "expecting expression|expected operand"
 }
 
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue4085a.go b/gcc/testsuite/go.test/test/fixedbugs/issue4085a.go
index 200290a081d..f457fcf2b12 100644
--- a/gcc/testsuite/go.test/test/fixedbugs/issue4085a.go
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue4085a.go
@@ -10,9 +10,9 @@ type T []int
 
 func main() {
_ = make(T, -1)// ERROR "negative"
-   _ = make(T, 0.5)   // ERROR "constant 0.5 truncated to integer|non-integer len argument"
+   _ = make(T, 0.5)   // ERROR "truncated to integer|non-integer len argument"
_ = make(T, 1.0)   // ok
-   _ = make(T, 1<<63) // ERROR "len argument too large"
+   _ = make(T, 1<<63) // ERROR "integer constant overflow|len argument too large"
_ = make(T, 0, -1) // ERROR "negative cap"
_ = make(T, 10, 0) // ERROR "len larger than cap"
 }
diff --git a/gcc/testsuite/go.test/test/shift1.go b/gcc/testsuite/go.test/test/shift1.go
index d6a6c38839f..3b1aa9e6900 100644
--- a/gcc/testsuite/go.test/test/shift1.go
+++ b/gcc/testsuite/go.test/test/shift1.go
@@ -189,12 +189,12 @@ func _() {
var m1 map[int]string
delete(m1, 1<

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-18 Thread Michael Meissner
On Mon, Jan 15, 2024 at 06:25:13PM +0530, Ajit Agarwal wrote:
> Also Mike and Kewen suggested using this pass before the IRA register
> allocator. They are on the To list. They have other concerns about doing
> it after register allocation.
> 
> They have responded in another mail chain.

The problem with doing it after register allocation is it limits the hit rate
to the situation where the register allocation happened to guess right, and
allocated adjacent registers.

Note, the PowerPC has some twists:

1) load/store vector pair must use an even/odd VSX register pair.

2) Some instructions only operate on traditional FPR registers (VSX registers
0..31) and others only operate on traditional Altivec registers (VSX registers
32..63).  I.e. if you are doing a load vector pair, and you are going to do say
a V2DI vector add, you need to load the vector pair into Altivec registers to
avoid having to do a copy operation.

In general, I tend to feel that stuffing things into a larger register and then
using SUBREG is often going to generate other moves.  On the PowerPC
right now, we can't even use SUBREG of OOmode (the 256-bit opaque type), but
Peter has patches to deal with some of the issues.

But at the moment, we don't have support for expressing this load such that
register allocation can handle it.

Rather than using a large register mode, I tend to feel that we should enhance
match_parallel so that register allocation can allocate the registers
sequentially.  Now, I haven't looked at match_parallel for 15-20 years, but my
sense was it only worked for fixed registers generated elsewhere (such as for
the load/store string instruction support).

I.e. rather than doing something like:

(set (reg:OO )
 (mem:OO ))

(set (reg:V2DF )
 (subreg:V2DF (reg:OO ) 0))

(set (reg:V2DF )
 (subreg:V2DF (reg:OO ) 16))

; do stuff involving v2df_reg1 and v2df_reg2

(clobber (reg:OO )

(set (subreg:V2DF (reg:OO ) 0)
 (reg:V2DF ))

(set (subreg:V2DF (reg:OO ) 16)
 (reg:V2DF ))

(set (mem:OO )
 (reg:OO ))

We would do:

(parallel [(set (reg:V2DF )
(mem:V2DF ))
   (set (reg:V2DF )
(mem:V2DF )))])

; do stuff involving v2df_reg1 and v2df_reg2

(parallel [(set (mem:V2DF )
(reg:V2DF ))
   (set (mem:V2DF )
(reg:V2DF ))])

Now in those two parallels above, we would need to use match_parallel to ensure
that the registers are allocated sequentially (and in the PowerPC, start on an
even VSX register), and the addresses are bumped up by 16 bytes.
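To make those constraints concrete, here is a minimal C sketch of the checks a match_parallel predicate for the proposed two-element PARALLEL would have to perform. The struct vec_load type and the valid_vector_pair function are hypothetical illustrations, not rs6000 backend code; each SET is modelled as a VSX register number (0..63) plus the byte offset of its MEM.

```c
#include <stdbool.h>

/* Hypothetical model of one SET in the proposed PARALLEL: a V2DF value
   in VSX register REGNO (0..63) loaded from base + OFFSET bytes.
   This is not GCC's RTL representation.  */
struct vec_load
{
  int regno;
  long offset;
};

/* Return true if the N elements satisfy the constraints described above:
   exactly two elements, the first register even (an even/odd pair),
   registers consecutive, and each address 16 bytes past the previous one.
   Because the first register is even, a valid pair can never straddle
   the FPR (0..31) / Altivec (32..63) boundary.  */
bool
valid_vector_pair (const struct vec_load *elts, int n)
{
  if (n != 2)
    return false;
  if (elts[0].regno < 0 || elts[0].regno > 62 || elts[0].regno % 2 != 0)
    return false;
  for (int i = 1; i < n; i++)
    if (elts[i].regno != elts[0].regno + i
        || elts[i].offset != elts[0].offset + 16L * i)
      return false;
  return true;
}
```

A pair such as registers 34/35 at offsets 0/16 passes; starting on an odd register, or bumping the address by anything other than 16, is rejected.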

Ideally, the combiner should try to combine things, but it may be simpler to
use a separate MD pass.

It would be nice if we had a standard constraint mechanism like % that says
use % but add 1/2/3/etc. to the register number if it is a REG, or a
size*number added to a memory address if it is a MEM.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[PATCH] Adjust testcase gcc.target/i386/part-vect-copysignhf.c.

2024-01-18 Thread liuhongt
After vect_early_break is supported, more vectorization is enabled (3
COPYSIGN), so adjust the testcase for that.

Committed as an obvious fix.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-copysignhf.c: Remove
-ftree-vectorize from dg-options.
---
 gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
index 811617bc3dd..0fdcbaea363 100644
--- a/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
+++ b/gcc/testsuite/gcc.target/i386/part-vect-copysignhf.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target avx512fp16 } } */
-/* { dg-options "-O1 -mavx512fp16 -mavx512vl -ftree-vectorize -fdump-tree-slp-details -fdump-tree-optimized" } */
+/* { dg-options "-O1 -mavx512fp16 -mavx512vl -fdump-tree-slp-details -fdump-tree-optimized" } */
 
 extern void abort ();
 
-- 
2.31.1



Re: [PATCH] RISC-V: Tweak the wording for the sorry message

2024-01-18 Thread Kito Cheng
Thanks, pushed to trunk :)

On Fri, Jan 19, 2024 at 10:36 AM juzhe.zh...@rivai.ai wrote:
>
> OK
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Kito Cheng
> Date: 2024-01-19 10:34
> To: rep.dot.nop; jeffreyalaw; rdapp.gcc; juzhe.zhong; gcc-patches
> CC: Kito Cheng
> Subject: [PATCH] RISC-V: Tweak the wording for the sorry message
> Use "does not" rather than "cannot", because it's an implementation issue.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_override_options_internal): Tweak
> sorry message.
> ---
> gcc/config/riscv/riscv.cc | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f1d5129397f..dd6e68a08c2 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -8798,13 +8798,13 @@ riscv_override_options_internal (struct gcc_options *opts)
>   We can only allow TARGET_MIN_VLEN * 8 (LMUL) < 65535.  */
>if (TARGET_MIN_VLEN_OPTS (opts) > 4096)
> -sorry ("Current RISC-V GCC cannot support VLEN greater than 4096bit for "
> +sorry ("Current RISC-V GCC does not support VLEN greater than 4096bit for "
>"'V' Extension");
>/* FIXME: We don't support RVV in big-endian for now, we may enable RVV with
>   big-endian after finishing full coverage testing.  */
>if (TARGET_VECTOR && TARGET_BIG_ENDIAN)
> -sorry ("Current RISC-V GCC cannot support RVV in big-endian mode");
> +sorry ("Current RISC-V GCC does not support RVV in big-endian mode");
>/* Convert -march to a chunks count.  */
>riscv_vector_chunks = riscv_convert_vector_bits (opts);
> --
> 2.34.1
>
>


Re: [PATCH] RISC-V:Raname UNSPEC_CLMUL in vector-crypto.md

2024-01-18 Thread Kito Cheng
Thanks, pushed to trunk :)

On Fri, Jan 19, 2024 at 10:30 AM KuanLin Chen  wrote:
>
>  UNSPEC_CLMUL is already defined in a define_c_enum in riscv.md, so
>  it shouldn't be redefined in a define_int_iterator.
>
> gcc/ChangeLog:
>
> * config/riscv/vector-crypto.md (UNSPEC_CLMUL): Rename to UNSPEC_CLMUL_VC.
>


Re: [PATCH] RISC-V: Tweak the wording for the sorry message

2024-01-18 Thread juzhe.zh...@rivai.ai
OK



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2024-01-19 10:34
To: rep.dot.nop; jeffreyalaw; rdapp.gcc; juzhe.zhong; gcc-patches
CC: Kito Cheng
Subject: [PATCH] RISC-V: Tweak the wording for the sorry message
Use "does not" rather than "cannot", because it's an implementation issue.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_override_options_internal): Tweak
sorry message.
---
gcc/config/riscv/riscv.cc | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f1d5129397f..dd6e68a08c2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8798,13 +8798,13 @@ riscv_override_options_internal (struct gcc_options *opts)
  We can only allow TARGET_MIN_VLEN * 8 (LMUL) < 65535.  */
   if (TARGET_MIN_VLEN_OPTS (opts) > 4096)
-sorry ("Current RISC-V GCC cannot support VLEN greater than 4096bit for "
+sorry ("Current RISC-V GCC does not support VLEN greater than 4096bit for "
   "'V' Extension");
   /* FIXME: We don't support RVV in big-endian for now, we may enable RVV with
  big-endian after finishing full coverage testing.  */
   if (TARGET_VECTOR && TARGET_BIG_ENDIAN)
-sorry ("Current RISC-V GCC cannot support RVV in big-endian mode");
+sorry ("Current RISC-V GCC does not support RVV in big-endian mode");
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits (opts);
-- 
2.34.1
 
 


[PATCH] RISC-V: Tweak the wording for the sorry message

2024-01-18 Thread Kito Cheng
Use "does not" rather than "cannot", because it's an implementation issue.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_override_options_internal): Tweak
sorry message.
---
 gcc/config/riscv/riscv.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f1d5129397f..dd6e68a08c2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8798,13 +8798,13 @@ riscv_override_options_internal (struct gcc_options *opts)
 
  We can only allow TARGET_MIN_VLEN * 8 (LMUL) < 65535.  */
   if (TARGET_MIN_VLEN_OPTS (opts) > 4096)
-sorry ("Current RISC-V GCC cannot support VLEN greater than 4096bit for "
+sorry ("Current RISC-V GCC does not support VLEN greater than 4096bit for "
   "'V' Extension");
 
   /* FIXME: We don't support RVV in big-endian for now, we may enable RVV with
  big-endian after finishing full coverage testing.  */
   if (TARGET_VECTOR && TARGET_BIG_ENDIAN)
-sorry ("Current RISC-V GCC cannot support RVV in big-endian mode");
+sorry ("Current RISC-V GCC does not support RVV in big-endian mode");
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits (opts);
-- 
2.34.1



[PATCH] RISC-V:Raname UNSPEC_CLMUL in vector-crypto.md

2024-01-18 Thread KuanLin Chen
 UNSPEC_CLMUL is already defined in a define_c_enum in riscv.md, so
 it shouldn't be redefined in a define_int_iterator.

gcc/ChangeLog:

* config/riscv/vector-crypto.md (UNSPEC_CLMUL): Rename to UNSPEC_CLMUL_VC.


0001-RISC-V-Raname-UNSPEC_CLMUL-in-vector-crypto.md.patch


[COMMITTED] More precise documentation for cleanup attribute [PR110029]

2024-01-18 Thread Sandra Loosemore
gcc/ChangeLog
PR c/110029
* doc/extend.texi (Common Variable Attributes): Explain what
happens when multiple variables with cleanups are in the same scope.
---
 gcc/doc/extend.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 616e26d47dc..0bc586d120e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7782,6 +7782,11 @@ with static storage duration.  The function must take one parameter,
 a pointer to a type compatible with the variable.  The return value
 of the function (if any) is ignored.
 
+When multiple variables in the same scope have @code{cleanup}
+attributes, at exit from the scope their associated cleanup functions
+are run in reverse order of definition (last defined, first
+cleanup).
+
 If @option{-fexceptions} is enabled, then @var{cleanup_function}
 is run during the stack unwinding that happens during the
 processing of the exception.  Note that the @code{cleanup} attribute
-- 
2.31.1
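A minimal sketch of the ordering rule documented above, using hypothetical names (record_cleanup, cleanup_order_demo): three variables with cleanup attributes are defined in one block, and each cleanup appends its variable's value to a log when the block exits.

```c
/* Buffer recording the order in which cleanup functions run.  */
static char cleanup_log[8];
static int cleanup_count;

/* Cleanup function: receives a pointer to the variable leaving scope.  */
static void
record_cleanup (char *var)
{
  cleanup_log[cleanup_count++] = *var;
  cleanup_log[cleanup_count] = '\0';
}

/* Define three variables with cleanups in one scope and return the
   order in which their cleanup functions ran.  */
const char *
cleanup_order_demo (void)
{
  cleanup_count = 0;
  cleanup_log[0] = '\0';
  {
    char a __attribute__ ((cleanup (record_cleanup))) = 'a';
    char b __attribute__ ((cleanup (record_cleanup))) = 'b';
    char c __attribute__ ((cleanup (record_cleanup))) = 'c';
    (void) a;
    (void) b;
    (void) c;
  }
  return cleanup_log;   /* "cba": last defined, first cleaned up.  */
}
```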



[PATCH] Fix testcase failure on many platforms which don't support vect_int_max.

2024-01-18 Thread liuhongt
After r14-7124-g6686e16fda4190, the testcase can be optimized to
MAX_EXPR if the backend supports that. So I adjusted the testcase to
scan for MAX_EXPR, but it failed on many platforms which don't support
that.
As pinski mentioned, the target selector vect_no_int_min_max is only
available under the vect directory, so for simplicity, I adjusted the
testcase to scan for either MAX_EXPR or the original VEC_COND_EXPR.

Committed as an obvious fix.

gcc/testsuite/ChangeLog:

PR testsuite/113437
* gcc.dg/tree-ssa/pr95906.c: Scan either MAX_EXPR or
VEC_COND_EXPR.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr95906.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
index d15670f3e9e..ce43983f341 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c
@@ -9,4 +9,4 @@ v16i8 f(v16i8 a, v16i8 b)
 }
 
 /* { dg-final { scan-tree-dump-not "bit_(and|ior)_expr" "forwprop3" } } */
-/* { dg-final { scan-tree-dump-times "max_expr" 1 "forwprop3" } } */
+/* { dg-final { scan-tree-dump-times {(?n)(?:max_expr|vec_cond_expr)} 1 "forwprop3" } } */
-- 
2.31.1
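For readers unfamiliar with the pattern, here is a sketch of the kind of select-by-mask vector code involved, using GCC's generic vector extension. It is an assumed illustration, not the contents of pr95906.c, and select_max is a hypothetical name. The mask computes an element-wise signed maximum, which, per the description above, may end up as a MAX_EXPR on targets that support it or remain a VEC_COND_EXPR elsewhere.

```c
typedef signed char v16i8 __attribute__ ((vector_size (16)));

/* Element-wise signed maximum via a comparison mask.  */
v16i8
select_max (v16i8 a, v16i8 b)
{
  v16i8 cmp = (a > b);            /* Each lane: -1 (all ones) if a > b, else 0.  */
  return (cmp & a) | (~cmp & b);  /* Take a where the mask is set, else b.  */
}
```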



[COMMITTED] Improve documentation of noinline and noipa attributes [PR108470]

2024-01-18 Thread Sandra Loosemore
gcc/ChangeLog
PR ipa/108470
* doc/extend.texi (Common Function Attributes): Document that
noinline also disables some interprocedural optimizations and
improve flow to the part about using inline asm instead to
disable calls from being optimized away completely.  Remove the
sentence that says noipa is mainly for internal compiler testing.
---
 gcc/doc/extend.texi | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d879ad544b5..616e26d47dc 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3666,13 +3666,17 @@ propagation.
 @cindex @code{noinline} function attribute
 @item noinline
 This function attribute prevents a function from being considered for
-inlining.
+inlining.  It also disables some other interprocedural optimizations; it's
+preferable to use the more comprehensive @code{noipa} attribute instead
+if that is your goal.
+
 @c Don't enumerate the optimizations by name here; we try to be
 @c future-compatible with this mechanism.
-If the function does not have side effects, there are optimizations
-other than inlining that cause function calls to be optimized away,
-although the function call is live.  To keep such calls from being
-optimized away, put
+Even if a function is declared with the @code{noinline} attribute,
+there are optimizations other than inlining that can cause calls to be
+optimized away if it does not have side effects, although the function
+call is live.  To keep such calls from being optimized away, put
+
 @smallexample
 asm ("");
 @end smallexample
@@ -3691,8 +3695,7 @@ the body.  This attribute implies @code{noinline}, @code{noclone} and
 to a combination of other attributes, because its purpose is to suppress
 existing and future optimizations employing interprocedural analysis,
 including those that do not have an attribute suitable for disabling
-them individually.  This attribute is supported mainly for the purpose
-of testing the compiler.
+them individually.
 
 @cindex @code{nonnull} function attribute
 @cindex functions with non-null pointer arguments
-- 
2.31.1
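A short sketch of the two approaches the updated text describes; the function names are hypothetical. The empty asm statement inside the noinline function acts as a side effect so that its calls are kept, while noipa suppresses interprocedural optimization more comprehensively.

```c
/* noinline alone: if this function had no side effects, calls could
   still be removed.  The empty asm (written __asm__ so it also compiles
   in strict ISO modes) serves as the side effect the documentation
   recommends.  */
__attribute__ ((noinline)) int
kept_call (int x)
{
  __asm__ ("");
  return x + 1;
}

/* noipa: callers are optimized as if the body were unavailable
   (implies noinline and noclone, among others).  */
__attribute__ ((noipa)) int
opaque_add (int a, int b)
{
  return a + b;
}

/* A caller that discards the result; the asm in kept_call is meant to
   keep this call from being optimized away.  */
void
discard_result (void)
{
  kept_call (41);
}
```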



[r14-8206 Regression] FAIL: gfortran.dg/forall_1.f90 -O3 -g (test for excess errors) on Linux/x86_64

2024-01-18 Thread haochen.jiang
On Linux/x86_64,

0f3880d6ad0e40c4a8b6d94b2c93931cdf42 is the first bad commit
commit 0f3880d6ad0e40c4a8b6d94b2c93931cdf42
Author: Richard Biener 
Date:   Wed Jan 17 13:24:22 2024 +0100

tree-optimization/113374 - early break vect and virtual operands

caused

FAIL: gcc.c-torture/execute/20150611-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/20150611-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/20150611-1.c   -O3 -g  (internal compiler error: Segmentation fault)
FAIL: gcc.c-torture/execute/20150611-1.c   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/forall_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in add_phi_arg, at tree-phinodes.cc:366)
FAIL: gfortran.dg/forall_1.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/forall_1.f90   -O3 -g  (internal compiler error: in add_phi_arg, at tree-phinodes.cc:366)
FAIL: gfortran.dg/forall_1.f90   -O3 -g  (test for excess errors)

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-8206/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/20150611-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="execute.exp=gcc.c-torture/execute/20150611-1.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/forall_1.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/forall_1.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you encounter problems related to cascadelake, disabling AVX512F on the
command line might avoid them.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH] aarch64: Fix __builtin_apply with -mgeneral-regs-only [PR113486]

2024-01-18 Thread Andrew Pinski
The problem here is the builtin apply mechanism thinks the FP registers
are to be used due to get_raw_arg_mode not returning VOIDmode. This
fixes that oversight and the backend now returns VOIDmode for non-general-regs
if TARGET_GENERAL_REGS_ONLY is true.

Built and tested for aarch64-linux-gnu with no regressions.

PR target/113486

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_get_reg_raw_mode): For
TARGET_GENERAL_REGS_ONLY, return VOIDmode for non-GP_REGNUM_P regno.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_apply-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc  |  4 
 gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c | 12 
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e6bd3fd0bb4..a838cbba51d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7221,6 +7221,10 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type)
 static fixed_size_mode
 aarch64_get_reg_raw_mode (int regno)
 {
+  /* Don't use any non GP registers for __builtin_apply and
+ __builtin_return if general registers only mode is requested. */
+  if (TARGET_GENERAL_REGS_ONLY && !GP_REGNUM_P (regno))
+return as_a  (VOIDmode);
   if (TARGET_SVE && FP_REGNUM_P (regno))
 /* Don't use the SVE part of the register for __builtin_apply and
__builtin_return.  The SVE registers aren't used by the normal PCS,
diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c b/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c
new file mode 100644
index 000..d70abe037d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin_apply-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mgeneral-regs-only" } */
+/* PR target/113486 */
+
+
+/* __builtin_apply should not use FP registers if 
+   general registers only mode is requested. */
+void
+foo (void)
+{
+  __builtin_apply (foo, 0, 0);
+}
-- 
2.39.3



Re: [PATCH] Pass GUILE down to subdirectories

2024-01-18 Thread Tom Tromey
Andrew> This change is causing some problems for me.

Yeah, Tom de Vries as well.

Andrew> One of my build machines has 2 versions of guile installed.  One is
Andrew> guile 2.0.14 and the other is guile 2.2.21.

Andrew> When GDB configures itself the configure script figures out that it
Andrew> should use 2.2.21 to compile the guile libraries that GDB uses.

Andrew> However, when we actually build the guile libraries we do use guild2.2,
Andrew> but due to this 'GUILE = guile' line, guild2.2 uses guile 2.0.14 in
Andrew> order to perform the compile (I guess, I don't know the details of how
Andrew> guile compilation works).

Andrew> Unfortunately guile 2.0.14 compiles in a way which is not compatible
Andrew> with how GDB then tries to load the guile library.

I consider this a bug in guile -- it installs 'guild' with this:

#!/usr/bin/sh
# -*- scheme -*-
exec ${GUILE:-/usr/bin/guile2.2} $GUILE_FLAGS -e '(@@ (guild) main)' -s "$0" "$@"
!#

Allowing a system script to pick $GUILE here seems weird, especially for
a versioned install of "guild", where as you can see it already knows
the correct guile to use.

However -- I think it's better to just work around this.
I plan to back out this change.  Anyone needing to re-run cgen (which
itself ought to come with smarts here, but since it is un-maintained...)
can just specify this by hand.  I.e., the status quo ante.

I'll try to send a patch tomorrow.

Tom


Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Arthur Cohen

Hi Iain,

On 1/18/24 12:02, Iain Sandoe wrote:

Hi Arthur,


On 18 Jan 2024, at 10:30, Arthur Cohen  wrote:



On 1/18/24 10:13, Rainer Orth wrote:

Arthur Cohen  writes:

Using %lu to format size_t values breaks 32 bit targets, and %zu is not
supported by one of the hosts GCC aims to support - HPUX

But we do have uses of %zu in gcc/rust already!

diff --git a/gcc/rust/expand/rust-proc-macro.cc b/gcc/rust/expand/rust-proc-macro.cc
index e8618485b71..09680733e98 100644
--- a/gcc/rust/expand/rust-proc-macro.cc
+++ b/gcc/rust/expand/rust-proc-macro.cc
@@ -171,7 +171,7 @@ load_macros (std::string path)
if (array == nullptr)
  return {};
  -  rust_debug ("Found %lu procedural macros", array->length);
+  rust_debug ("Found %lu procedural macros", (unsigned long) array->length);

Not the best way either: array->length is std::uint64_t, so the format
should use
... %" PRIu64 " procedural...
instead.
I've attached my patch to PR rust/113461.


Yes, I was talking about this on IRC the other day - if we do run into a 
situation where we have more than UINT32_MAX procedural macros in memory we 
have big issues. These debug prints will probably end up getting removed soon 
as they clutter the output a lot for little information.

I don't mind doing it the right way for our regular prints, but we have not 
been using PRIu64 in our codebase so far, so I'd rather change all those 
offending format specifiers at once later down the line - this patch was 
pushed so that 32bit targets could bootstrap the Rust frontend for now.


For the sake of completeness, the issue does not just affect 32b hosts; if a 
64b host chooses (as Darwin does, so that 32b and 64b targets have the same 
representation) to make uint64_t “unsigned long long int”, then %lu breaks 
there too.
thanks
Iain



Thanks for the clarification! I'll definitely be more careful moving forward.

Kindly,

Arthur
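As a small C illustration of Rainer's suggestion (format_macro_count is a hypothetical name; the Rust frontend itself is C++, where the same macro is available through <cinttypes>): PRIu64 expands to the correct conversion specifier for uint64_t on each host, whether that type is unsigned long or, as Iain notes for Darwin, unsigned long long.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit count portably.  PRIu64 is a string literal such as
   "lu" or "llu", joined to the surrounding format by string-literal
   concatenation.  Returns the length snprintf reports.  */
int
format_macro_count (char *buf, size_t size, uint64_t length)
{
  return snprintf (buf, size, "Found %" PRIu64 " procedural macros", length);
}
```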


Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Arthur Cohen

Hi Rainer,

On 1/18/24 10:34, Rainer Orth wrote:

Hi Arthur,


Yes, I was talking about this on IRC the other day - if we do run into a
situation where we have more than UINT32_MAX procedural macros in memory
we have big issues. These debug prints will probably end up getting removed
soon as they clutter the output a lot for little information.


makes sense, especially if they break the build once in a while ;-)


I don't mind doing it the right way for our regular prints, but we have not
been using PRIu64 in our codebase so far, so I'd rather change all those
offending format specifiers at once later down the line - this patch
was pushed so that 32bit targets could bootstrap the Rust frontend for now.


Makes sense: using different styles throughout the codebase only creates
confusion.

On a related issue: didn't you have some 32-bit host in your CI?  I
remember having similar issues in the past which could easily be avoided
in advance this way.


We do have 32-bit runners in the buildbot Mark takes care of, but they 
were not running bootstrap builds so this was getting ignored as it only 
produced a warning. Definitely something I want to fix quickly.




Thanks.
 Rainer



[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-18 Thread Roger Sayle

This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.
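The claim that IOR and PLUS are interchangeable here rests on the two operands having no set bits in common, so no carries can occur. The following minimal C sketch uses 32-bit words as a stand-in for word_mode, and its function names are hypothetical. It checks this for a double-word left shift and for a rotate. The rotate case also shows why the expand_shift_1 hunk only uses PLUS when the amount is a constant: with N == 0 both shifted halves equal A, so IOR gives A but PLUS gives 2 * A.

```c
#include <stdint.h>

/* Left shift of a 64-bit value held as two 32-bit words, 0 < N < 32.
   The bits carried from LO into HI land exactly in the N low bits that
   HI << N leaves clear, so IOR and PLUS give the same result.  */
uint64_t
shl64_via_words (uint64_t x, unsigned n, int use_plus)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t carries = lo >> (32 - n);
  uint32_t shifted_hi = hi << n;
  uint32_t new_hi = use_plus ? shifted_hi + carries : shifted_hi | carries;
  uint32_t new_lo = lo << n;
  return ((uint64_t) new_hi << 32) | new_lo;
}

/* Rotate left by N (0 <= N < 32) with the two-shift formula
   (A << N) | (A >> ((-N) & (C - 1))), C = 32, combined with IOR or PLUS.
   For N == 0 both halves equal A, so PLUS yields 2 * A: PLUS is only
   safe when N is known to be nonzero.  */
uint32_t
rotl32_two_shifts (uint32_t a, unsigned n, int use_plus)
{
  uint32_t left = a << n;
  uint32_t right = a >> ((0u - n) & 31u);
  return use_plus ? left + right : left | right;
}
```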

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-18  Roger Sayle  

gcc/ChangeLog
* expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
to generate PLUS instead of IOR when unioning disjoint bitfields.
* optabs.cc (expand_subword_shift): Likewise.
(expand_binop): Likewise for double-word rotate.


Thanks in advance,
Roger
--

diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 5916d6ed1bc..d1900f97f0c 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2610,10 +2610,11 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
  else if (methods == OPTAB_LIB_WIDEN)
{
  /* If we have been unable to open-code this by a rotation,
-do it as the IOR of two shifts.  I.e., to rotate A
-by N bits, compute
+do it as the IOR or PLUS of two shifts.  I.e., to rotate
+A by N bits, compute
 (A << N) | ((unsigned) A >> ((-N) & (C - 1)))
-where C is the bitsize of A.
+where C is the bitsize of A.  If N cannot be zero,
+use PLUS instead of IOR.
 
 It is theoretically possible that the target machine might
 not be able to perform either shift and hence we would
@@ -2650,8 +2651,9 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
  temp1 = expand_shift_1 (left ? RSHIFT_EXPR : LSHIFT_EXPR,
  mode, shifted, other_amount,
  subtarget, 1);
- return expand_binop (mode, ior_optab, temp, temp1, target,
-  unsignedp, methods);
+ return expand_binop (mode,
+  CONST_INT_P (op1) ? add_optab : ior_optab,
+  temp, temp1, target, unsignedp, methods);
}
 
  temp = expand_binop (mode,
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..dcd3e406719 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -566,8 +566,8 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab,
   if (tmp == 0)
return false;
 
-  /* Now OR in the bits carried over from OUTOF_INPUT.  */
-  if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
+  /* Now OR/PLUS in the bits carried over from OUTOF_INPUT.  */
+  if (!force_expand_binop (word_mode, add_optab, tmp, carries,
   into_target, unsignedp, methods))
return false;
 }
@@ -1937,7 +1937,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
 NULL_RTX, unsignedp, next_methods);
 
  if (into_temp1 != 0 && into_temp2 != 0)
-   inter = expand_binop (word_mode, ior_optab, into_temp1, into_temp2,
+   inter = expand_binop (word_mode, add_optab, into_temp1, into_temp2,
  into_target, unsignedp, next_methods);
  else
inter = 0;
@@ -1953,7 +1953,7 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
  NULL_RTX, unsignedp, next_methods);
 
  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
-   inter = expand_binop 

Re: [PATCH] Remove remnant of removed Cygwin options from invoke.texi [PR108521]

2024-01-18 Thread Sandra Loosemore

On 1/18/24 12:41, Sandra Loosemore wrote:

From: Brian Inglis 

The -mcygwin option for x86 Windows was removed in 2010 by commit
3edeb30d044a4852881c34229e618b34f95b0d9e, but this reference was
overlooked.

gcc/ChangeLog
PR target/108521
* doc/invoke.texi (Option Summary): Remove -mcygwin and -mno-cygwin
from x86 Windows Options.


Oops, forgot to edit the tag in the patch before mailing.  This is committed 
now.

-Sandra


[PATCH] Remove remnant of removed Cygwin options from invoke.texi [PR108521]

2024-01-18 Thread Sandra Loosemore
From: Brian Inglis 

The -mcygwin option for x86 Windows was removed in 2010 by commit
3edeb30d044a4852881c34229e618b34f95b0d9e, but this reference was
overlooked.

gcc/ChangeLog
PR target/108521
* doc/invoke.texi (Option Summary): Remove -mcygwin and -mno-cygwin
from x86 Windows Options.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4d43dda9839..0ef2b894ea9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1490,7 +1490,7 @@ See RS/6000 and PowerPC Options.
 -munroll-only-small-loops -mlam=@var{choice}}
 
 @emph{x86 Windows Options}
-@gccoptlist{-mconsole  -mcrtdll=@var{library}  -mcygwin  -mno-cygwin  -mdll
+@gccoptlist{-mconsole  -mcrtdll=@var{library}  -mdll
 -mnop-fun-dllimport  -mthread
 -municode  -mwin32  -mwindows  -fno-set-stack-executable}
 
-- 
2.31.1



[COMMITTED] Restore documentation for const/volatile functions [PR107942]

2024-01-18 Thread Sandra Loosemore
In r5-7698-g8648c55f3b703a I accidentally removed the documentation of
GCC's special interpretation of const/volatile qualifiers on functions
from the function attributes section, thinking this was just a
bit-rotten leftover from old versions of GCC.  PR107942 points out
that this functionality is still present even though the docs are now gone.

I decided this material didn't really belong in the function
attributes discussion, but a new subsection in the general list of GCC
extensions to the C language.  And I agree with the comment in the
issue that we shouldn't really recommend this usage any more.

gcc/ChangeLog
PR c/107942
* doc/extend.texi (C Extensions): Add new section to menu.
(Function Attributes):  Move dangling index entries to
(Const and Volatile Functions): New section.
---
 gcc/doc/extend.texi | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d1893ad860c..d879ad544b5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -70,6 +70,7 @@ extensions, accepted by GCC in C90 mode and in C++.
 * Character Escapes::   @samp{\e} stands for the character @key{ESC}.
 * Alignment::   Determining the alignment of a function, type or variable.
 * Inline::  Defining inline functions (as fast as macros).
+* Const and Volatile Functions :: GCC interprets these specially in C.
 * Volatiles::   What constitutes an access to a volatile object.
 * Using Assembly Language with C:: Instructions and extensions for interfacing C with assembler.
 * Alternate Keywords::  @code{__const__}, @code{__asm__}, etc., for header files.
@@ -2522,8 +2523,6 @@ the enclosing block.
 @section Declaring Attributes of Functions
 @cindex function attributes
 @cindex declaring attributes of functions
-@cindex @code{volatile} applied to function
-@cindex @code{const} applied to function
 
 In GNU C and C++, you can use function attributes to specify certain
 function properties that may help the compiler optimize calls or
@@ -10397,6 +10396,40 @@ The definition in the header file causes most calls to the function
 to be inlined.  If any uses of the function remain, they refer to
 the single copy in the library.
 
+@node Const and Volatile Functions
+@section Const and Volatile Functions
+@cindex @code{const} applied to function
+@cindex @code{volatile} applied to function
+
+The C standard explicitly leaves the behavior of the @code{const} and
+@code{volatile} type qualifiers applied to functions undefined; these
+constructs can only arise through the use of @code{typedef}.  As an extension,
+GCC defines this use of the @code{const} qualifier to have the same meaning
+as the GCC @code{const} function attribute, and the @code{volatile} qualifier
+to be equivalent to the @code{noreturn} attribute.
+@xref{Common Function Attributes}, for more information.
+
+As examples of this usage,
+
+@smallexample
+
+/* @r{Equivalent to:}
+   void fatal () __attribute__ ((noreturn));  */
+typedef void voidfn ();
+volatile voidfn fatal;
+
+/* @r{Equivalent to:}
+   extern int square (int) __attribute__ ((const));  */
+typedef int intfn (int);
+extern const intfn square;
+@end smallexample
+
+In general, using function attributes instead is preferred, since the
+attributes make both the intent of the code and its reliance on a GNU
+extension explicit.  Additionally, using @code{const} and
+@code{volatile} in this way is specific to GNU C and does not work in
+GNU C++.
+
 @node Volatiles
 @section When is a Volatile Object Accessed?
 @cindex accessing volatiles
-- 
2.31.1
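Here is a minimal sketch of the attribute spelling that the documentation above recommends. Unlike the qualified-typedef form, it also compiles as GNU C++. The bodies of `square` and `fatal` and the callers `twice_square` and `checked_square` are hypothetical illustrations, and the overflow bound assumes a 32-bit `int`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Attribute form of "extern const intfn square;": the result depends only
// on the argument, so GCC may merge repeated calls with the same argument.
extern int square (int) __attribute__ ((const));

// Attribute form of "volatile voidfn fatal;": control never returns.
extern void fatal (const char *) __attribute__ ((noreturn));

int square (int x) { return x * x; }

void fatal (const char *msg)
{
  std::fprintf (stderr, "fatal: %s\n", msg);
  std::abort ();
}

// Hypothetical caller: because square is const, the two calls may be
// folded into a single call.
int twice_square (int x)
{
  return square (x) + square (x);
}

// Hypothetical caller: 46340 is the largest x with x * x <= INT_MAX for a
// 32-bit int.  GCC knows fatal does not return, so the check needs no else.
int checked_square (int x)
{
  if (x > 46340 || x < -46340)
    fatal ("square would overflow int");
  return square (x);
}
```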



[pushed] Fix ICE in -fdiagnostics-generate-patch [PR112684]

2024-01-18 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-8255-ge254d1224df306.

gcc/ChangeLog:
PR middle-end/112684
* toplev.cc (toplev::main): Don't ICE in
-fdiagnostics-generate-patch when exiting after options,
since no edit context will have been created.

Signed-off-by: David Malcolm 
---
 gcc/toplev.cc | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 55636ff6e80..175d4cd18fa 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -2323,11 +2323,8 @@ toplev::main (int argc, char **argv)
  emit some diagnostics here.  */
   invoke_plugin_callbacks (PLUGIN_FINISH, NULL);
 
-  if (flag_diagnostics_generate_patch)
+  if (auto edit_context_ptr = global_dc->get_edit_context ())
 {
-  auto edit_context_ptr = global_dc->get_edit_context ();
-  gcc_assert (edit_context_ptr);
-
   pretty_printer pp;
   pp_show_color (&pp) = pp_show_color (global_dc->printer);
   edit_context_ptr->print_diff (&pp, true);
-- 
2.26.3
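The fix replaces "trust the flag, then assert the pointer" with a declaration in the `if` condition. The sketch below isolates that pattern. `diagnostic_context`, `edit_context` and `maybe_generate_patch` are hypothetical stand-ins, not GCC's real classes. They model only the case where no edit context was ever created.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for GCC's diagnostic and edit contexts, modelling
// only the fact that the edit context may never have been created.
struct edit_context
{
  std::string print_diff () const { return "--- a/t.c\n+++ b/t.c\n"; }
};

struct diagnostic_context
{
  edit_context *m_edit_context = nullptr;
  edit_context *get_edit_context () const { return m_edit_context; }
};

// Before the fix, the option flag was trusted and the pointer asserted,
// which fires if the compiler exits right after option processing:
//   if (flag_diagnostics_generate_patch)
//     { auto p = dc.get_edit_context (); gcc_assert (p); ... }
// After the fix, the pointer itself is tested in the condition, so a
// missing edit context is simply skipped.
std::string maybe_generate_patch (const diagnostic_context &dc)
{
  if (auto edit_context_ptr = dc.get_edit_context ())
    return edit_context_ptr->print_diff ();
  return std::string ();
}
```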



[pushed] analyzer: fix ICE on strlen ((char *)&VECTOR_CST) [PR111361]

2024-01-18 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-8257-gd5604febcfb094.

gcc/analyzer/ChangeLog:
PR analyzer/111361
* region-model.cc (svalue_byte_range_has_null_terminator_1): The
initial byte of an all-zeroes SVAL is a zero byte.  Remove
gcc_unreachable from SK_CONSTANT for constants that aren't
STRING_CST or INTEGER_CST.

gcc/testsuite/ChangeLog:
PR analyzer/111361
* c-c++-common/analyzer/strlen-pr111361.c: New test.
* c-c++-common/analyzer/strncpy-1.c (test_zero_fill): Remove fixed
xfail.
* c-c++-common/analyzer/strncpy-pr111361.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc   |  9 -
 .../c-c++-common/analyzer/strlen-pr111361.c| 18 ++
 .../c-c++-common/analyzer/strncpy-1.c  |  3 +--
 .../c-c++-common/analyzer/strncpy-pr111361.c   |  8 
 4 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strlen-pr111361.c
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strncpy-pr111361.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index f01010cf630..dbb2149dbd4 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3605,6 +3605,14 @@ svalue_byte_range_has_null_terminator_1 (const svalue 
*sval,
 byte_offset_t *out_bytes_read,
 logger *logger)
 {
+  if (bytes.m_start_byte_offset == 0
+  && sval->all_zeroes_p ())
+{
+  /* The initial byte of an all-zeroes SVAL is a zero byte.  */
+  *out_bytes_read = 1;
+  return tristate (true);
+}
+
   switch (sval->get_kind ())
 {
 case SK_CONSTANT:
@@ -3631,7 +3639,6 @@ svalue_byte_range_has_null_terminator_1 (const svalue 
*sval,
return tristate::TS_UNKNOWN;
 
  default:
-   gcc_unreachable ();
break;
  }
   }
diff --git a/gcc/testsuite/c-c++-common/analyzer/strlen-pr111361.c 
b/gcc/testsuite/c-c++-common/analyzer/strlen-pr111361.c
new file mode 100644
index 000..b3b875c5a97
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strlen-pr111361.c
@@ -0,0 +1,18 @@
+#include "analyzer-decls.h"
+
+typedef int __attribute__((__vector_size__ (32))) V;
+
+typedef __SIZE_TYPE__ size_t;
+
+static size_t __attribute__((noinline))
+call_strlen (const char *p)
+{
+  return __builtin_strlen (p);
+}
+
+void
+foo (void *out)
+{
+  V v = (V) { };
+  __analyzer_eval (call_strlen ((const char *)&v) == 0); /* { dg-warning 
"TRUE" } */
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/strncpy-1.c 
b/gcc/testsuite/c-c++-common/analyzer/strncpy-1.c
index 3ca1d81d90c..8edaf26654d 100644
--- a/gcc/testsuite/c-c++-common/analyzer/strncpy-1.c
+++ b/gcc/testsuite/c-c++-common/analyzer/strncpy-1.c
@@ -44,8 +44,7 @@ test_zero_fill (char *dst)
   __analyzer_eval (dst[4] == '\0'); /* { dg-warning "TRUE" "correct" { xfail 
*-*-* } } */
   /* { dg-bogus "UNKNOWN" "status quo" { xfail *-*-* } .-1 } */
   __analyzer_eval (__analyzer_get_strlen (dst) == 0); /* { dg-warning "TRUE" } 
*/
-  __analyzer_eval (__analyzer_get_strlen (dst + 1) == 0); /* { dg-warning 
"TRUE" "correct" { xfail *-*-* } } */
-  /* { dg-bogus "UNKNOWN" "status quo" { xfail *-*-* } .-1 } */
+  __analyzer_eval (__analyzer_get_strlen (dst + 1) == 0); /* { dg-warning 
"TRUE" } */
 }
 
 char *test_unterminated_concrete_a (char *dst)
diff --git a/gcc/testsuite/c-c++-common/analyzer/strncpy-pr111361.c 
b/gcc/testsuite/c-c++-common/analyzer/strncpy-pr111361.c
new file mode 100644
index 000..da3eaeb6edb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strncpy-pr111361.c
@@ -0,0 +1,8 @@
+typedef int __attribute__((__vector_size__ (32))) V;
+
+void
+foo (char *out)
+{
+  V v = (V) { };
+  __builtin_strncpy (out, (char *)&v, 5);
+}
-- 
2.26.3
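This simplified sketch shows the two behavioural changes in the patch. The first is an early "all zeroes" shortcut before dispatching on the kind of value. The second is returning "unknown" for unhandled kinds instead of reaching `gcc_unreachable ()`. The `sval_model` and `tristate` types are hypothetical simplifications, not the analyzer's real `svalue` or `tristate` classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-valued result, loosely modelled on the analyzer's tristate.
enum class tristate { unknown, yes, no };

// Hypothetical, much-simplified model of a symbolic value.
struct sval_model
{
  enum class kind { concrete_bytes, zero_vector, other } k;
  std::vector<unsigned char> bytes;  // only meaningful for concrete_bytes
  bool all_zeroes_p () const { return k == kind::zero_vector; }
};

// Does the value contain a zero byte at or after START?  On success, write
// the number of bytes read, including the terminator, to *OUT_BYTES_READ.
tristate has_null_terminator (const sval_model &sval, std::size_t start,
                              std::size_t *out_bytes_read)
{
  // Shortcut mirroring the patch: the first byte of an all-zeroes value is
  // a zero byte, whatever kind of constant (e.g. a zero vector) it is.
  if (start == 0 && sval.all_zeroes_p ())
    {
      *out_bytes_read = 1;
      return tristate::yes;
    }

  switch (sval.k)
    {
    case sval_model::kind::concrete_bytes:
      for (std::size_t i = start; i < sval.bytes.size (); ++i)
        if (sval.bytes[i] == 0)
          {
            *out_bytes_read = i - start + 1;
            return tristate::yes;
          }
      return tristate::no;

    default:
      // Where the old code reached gcc_unreachable (), an unhandled kind
      // now just yields "unknown".
      return tristate::unknown;
    }
}
```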



[pushed] analyzer: fix offsets in has_null_terminator [PR112811]

2024-01-18 Thread David Malcolm
PR analyzer/112811 reports an ICE attempting to determine whether a
string is null-terminated.

The root cause is confusion in the code about whether byte offsets are
relative to the start of the base region, or relative to the bound
fragment within the region.

This patch rewrites the code to enforce a clearer separation between
the kinds of offset, fixing the ICE, and adds logging to help track
down future issues in this area of the code.
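The two kinds of offset can be made concrete with a minimal sketch. It converts a read offset relative to the base region into a byte range relative to the start of one bound fragment's value, which is the kind of range the rewritten helpers take. The names `byte_range` and `to_value_relative` are hypothetical here and do not reproduce the analyzer's actual classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical half-open byte range [start, start + size).
struct byte_range
{
  std::int64_t start;
  std::int64_t size;
  std::int64_t next () const { return start + size; }
};

// FRAGMENT is a byte range of the base region bound to some value.
// Convert ABS_START, which is relative to the start of the base region,
// into a range relative to the start of the fragment's value, clipped to
// the end of the fragment.  Returns false if the read starts outside it.
bool to_value_relative (const byte_range &fragment, std::int64_t abs_start,
                        byte_range *out_rel)
{
  if (abs_start < fragment.start || abs_start >= fragment.next ())
    return false;
  out_rel->start = abs_start - fragment.start;   // region offset -> value offset
  out_rel->size = fragment.next () - abs_start;  // bytes left in this fragment
  return true;
}
```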

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-8256-g84096e665c5f7d.

gcc/analyzer/ChangeLog:
PR analyzer/112811
* region-model.cc (fragment::dump_to_pp): New.
(fragment::has_null_terminator): Convert to...
(svalue_byte_range_has_null_terminator_1): ...this new function,
updating to use a byte_range relative to the start of the svalue.
(svalue_byte_range_has_null_terminator): New.
(fragment::string_cst_has_null_terminator): Convert to...
(string_cst_has_null_terminator): ...this, updating to use a
byte_range relative to the start of the svalue.
(iterable_cluster::dump_to_pp): New.
(region_model::scan_for_null_terminator): Add logging, moving body
to...
(region_model::scan_for_null_terminator_1): ...this new function,
adding more logging, and updating to use
svalue_byte_range_has_null_terminator.
* region-model.h (region_model::scan_for_null_terminator_1): New
decl.

gcc/testsuite/ChangeLog:
PR analyzer/112811
* c-c++-common/analyzer/strlen-pr112811.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  | 431 --
 gcc/analyzer/region-model.h   |   4 +
 .../c-c++-common/analyzer/strlen-pr112811.c   |  18 +
 3 files changed, 319 insertions(+), 134 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/analyzer/strlen-pr112811.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 95a52f66933..f01010cf630 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3528,144 +3528,206 @@ struct fragment
 return byte_range::cmp (f1->m_byte_range, f2->m_byte_range);
   }
 
-  /* Determine if there is a zero terminator somewhere in the
- bytes of this fragment, starting at START_READ_OFFSET (which
- is absolute to the start of the cluster as a whole), and stopping
- at the end of this fragment.
-
- Return a tristate:
- - true if there definitely is a zero byte, writing to *OUT_BYTES_READ
- the number of bytes from that would be read, including the zero byte.
- - false if there definitely isn't a zero byte
- - unknown if we don't know.  */
-  tristate has_null_terminator (byte_offset_t start_read_offset,
-   byte_offset_t *out_bytes_read) const
+  void
+  dump_to_pp (pretty_printer *pp) const
   {
-byte_offset_t rel_start_read_offset
-  = start_read_offset - m_byte_range.get_start_byte_offset ();
-gcc_assert (rel_start_read_offset >= 0);
-byte_offset_t available_bytes
-  = (m_byte_range.get_next_byte_offset () - start_read_offset);
-gcc_assert (available_bytes >= 0);
-
-if (rel_start_read_offset > INT_MAX)
-  return tristate::TS_UNKNOWN;
-HOST_WIDE_INT rel_start_read_offset_hwi = rel_start_read_offset.slow ();
-
-if (available_bytes > INT_MAX)
-  return tristate::TS_UNKNOWN;
-HOST_WIDE_INT available_bytes_hwi = available_bytes.slow ();
-
-switch (m_sval->get_kind ())
+pp_string (pp, "fragment(");
+m_byte_range.dump_to_pp (pp);
+pp_string (pp, ", sval: ");
+if (m_sval)
+  m_sval->dump_to_pp (pp, true);
+else
+  pp_string (pp, "nullptr");
+pp_string (pp, ")");
+  }
+
+  byte_range m_byte_range;
+  const svalue *m_sval;
+};
+
+/* Determine if there is a zero terminator somewhere in the
+   part of STRING_CST covered by BYTES (where BYTES is relative to the
+   start of the constant).
+
+   Return a tristate:
+   - true if there definitely is a zero byte, writing to *OUT_BYTES_READ
+   the number of bytes from that would be read, including the zero byte.
+   - false if there definitely isn't a zero byte
+   - unknown if we don't know.  */
+
+static tristate
+string_cst_has_null_terminator (tree string_cst,
+   const byte_range &bytes,
+   byte_offset_t *out_bytes_read)
+{
+  gcc_assert (bytes.m_start_byte_offset >= 0);
+  gcc_assert (bytes.m_start_byte_offset < TREE_STRING_LENGTH (string_cst));
+
+  /* Look for the first 0 byte within STRING_CST
+ from START_READ_OFFSET onwards.  */
+  const byte_offset_t num_bytes_to_search
+= std::min ((TREE_STRING_LENGTH (string_cst)
+   - bytes.m_start_byte_offset),
+  bytes.m_size_in_bytes);
+  const char 

[PATCH] Avoid ICE in single-bit logical RMWs on m68k-uclinux [PR108640]

2024-01-18 Thread Mikael Pettersson
When generating RMW logical operations on m68k, the backend
recognizes single-bit operations and rewrites them as bit
instructions on operands adjusted to address the intended byte.
When offsetting the addresses the backend keeps the modes as
SImode, even though the actual access will be in QImode.

The uclinux target defines M68K_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
which adds a check that the adjusted operand is within the bounds
of the original object.  Since the address has been offset it is
not, and the compiler ICEs.

The bug is that the modes of the adjusted operands should have been
narrowed to QImode, which is what this patch does.  Nearby code
which narrows to HImode gets that right.
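The byte offset and bit number used by the rewrite can be checked with a small host-side model. This is a hypothetical sketch: the names `bit_access`, `single_bit_access` and `bset_via_byte` do not come from m68k.cc. It assumes a memory operand and the big-endian byte order of m68k. For the testcase's `1 << 16`, it selects byte 1 of the word and bit 0 within that byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the rewrite in output_andsi3/output_iorsi3/
// output_xorsi3: a 32-bit RMW touching only bit LOGVAL becomes a bit
// instruction on one byte of the big-endian word.
struct bit_access
{
  int byte_offset;  // 3 - logval / 8; byte 0 holds bits 31..24
  int bit_in_byte;  // logval % 8
};

constexpr bit_access single_bit_access (int logval)
{
  return { 3 - logval / 8, logval % 8 };
}

// Emulate "bset #bit,offset(x)" on the big-endian image of WORD.  Only a
// single byte is accessed, which is why the adjusted operand should be
// QImode rather than SImode.
std::uint32_t bset_via_byte (std::uint32_t word, int logval)
{
  unsigned char be[4];
  for (int i = 0; i < 4; ++i)
    be[i] = static_cast<unsigned char> (word >> (24 - 8 * i));

  bit_access a = single_bit_access (logval);
  be[a.byte_offset] |= static_cast<unsigned char> (1u << a.bit_in_byte);

  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i)
    result = (result << 8) | be[i];
  return result;
}
```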

Bootstrapped and regression tested on m68k-linux-gnu.

Ok for master? (Note: I don't have commit rights.)

gcc/

PR target/108640
* config/m68k/m68k.cc (output_andsi3): Use QImode for
address adjusted for 1-byte RMW access.
(output_iorsi3): Likewise.
(output_xorsi3): Likewise.

gcc/testsuite/

PR target/108640
* gcc.target/m68k/pr108640.c: New test.
---
 gcc/config/m68k/m68k.cc  | 6 +++---
 gcc/testsuite/gcc.target/m68k/pr108640.c | 7 +++
 2 files changed, 10 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/m68k/pr108640.c

diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index e9325686b92..6cd45b53406 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -5471,7 +5471,7 @@ output_andsi3 (rtx *operands)
operands[1] = GEN_INT (logval);
   else
 {
- operands[0] = adjust_address (operands[0], SImode, 3 - (logval / 8));
+ operands[0] = adjust_address (operands[0], QImode, 3 - (logval / 8));
  operands[1] = GEN_INT (logval % 8);
 }
   return "bclr %1,%0";
@@ -5510,7 +5510,7 @@ output_iorsi3 (rtx *operands)
operands[1] = GEN_INT (logval);
   else
 {
- operands[0] = adjust_address (operands[0], SImode, 3 - (logval / 8));
+ operands[0] = adjust_address (operands[0], QImode, 3 - (logval / 8));
  operands[1] = GEN_INT (logval % 8);
}
   return "bset %1,%0";
@@ -5548,7 +5548,7 @@ output_xorsi3 (rtx *operands)
operands[1] = GEN_INT (logval);
   else
 {
- operands[0] = adjust_address (operands[0], SImode, 3 - (logval / 8));
+ operands[0] = adjust_address (operands[0], QImode, 3 - (logval / 8));
  operands[1] = GEN_INT (logval % 8);
}
   return "bchg %1,%0";
diff --git a/gcc/testsuite/gcc.target/m68k/pr108640.c 
b/gcc/testsuite/gcc.target/m68k/pr108640.c
new file mode 100644
index 000..5f3e8b49d42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr108640.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+int x;
+void andsi3(void) { x &= ~(1 << 16); }
+void iorsi3(void) { x |=  (1 << 16); }
+void xorsi3(void) { x ^=  (1 << 16); }
-- 
2.43.0



Re: [PATCH v2] c++: side effect in nullptr_t conversion fix

2024-01-18 Thread Dmitry Drozodv
[PATCH v3] c++: side effect in nullptr_t conversion fix

Hi,

> This seems to assume that a CONVERT_EXPR can't have any other
> side-effects nested in it.
>
> It seems to me a better approach is the one in keep_unused_object_arg
> and cp_gimplify_expr, basically
>
> if (TREE_THIS_VOLATILE (e))
>e = cp_build_addr_expr (e);
>
> since the address of a volatile lvalue is safe to put on the lhs of a
> COMPOUND_EXPR.

Thank you for your review! I have implemented what you suggested with a
little modification:
I cut the conversion nodes using STRIP_NOPS (expr), since the value is not
actually converted anywhere and should be ignored.

> I don't see why ocp_convert needs to change at all; if TREE_SIDE_EFFECTS
> is set we'll eventually get to cp_convert_to_pointer and can do the
> right thing there.

Without the changes in ocp_convert, cp_convert_to_pointer is never called:
because the 'to' and 'from' types are the same, an identity conversion
occurs and the side effect is preserved.

> Why not just
>
> DECL_INITIAL (dest) = nullptr_node;
>
> ?

Because it doesn't work: it does not completely replace the initialization,
and the read from the volatile nullptr remains.

I also added a test case, 'nullptr47.C', to check the "original" tree dump.
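The intended semantics can be illustrated in plain C++11. In this sketch, `make_null`, `make_null_calls` and the two wrapper functions are hypothetical. Converting a volatile `std::nullptr_t` object yields a null pointer without needing its value. An initializer with side effects must still be evaluated.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter used to observe a side effect.
static int make_null_calls = 0;

// A call with a side effect whose result has type std::nullptr_t.
std::nullptr_t make_null ()
{
  ++make_null_calls;
  return nullptr;
}

// Converting a (volatile) std::nullptr_t lvalue yields a null pointer
// constant ([conv.lval]p3), so the value of N is not needed.
int *from_volatile_nullptr ()
{
  volatile std::nullptr_t n = nullptr;
  int *p = n;
  return p;
}

// The initializer must still be evaluated for its side effect:
// make_null () is called even though its value does not matter.
int *from_call ()
{
  int *p = make_null ();
  return p;
}
```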

---
 gcc/cp/cvt.cc  | 20 ++---
 gcc/cp/typeck2.cc  |  8 +++
 gcc/testsuite/g++.dg/cpp0x/nullptr47.C | 30 ++
 3 files changed, 55 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nullptr47.C

diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index cbed847b343..578c9b8ef20 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -218,8 +218,20 @@ cp_convert_to_pointer (tree type, tree expr, bool
dofold,
   ? build_int_cst_type (type, -1)
   : build_int_cst (type, 0));

-  return (TREE_SIDE_EFFECTS (expr)
-  ? build2 (COMPOUND_EXPR, type, expr, val) : val);
+  tree e = STRIP_NOPS (expr);
+  if (TREE_THIS_VOLATILE (e))
+ {
+  /* can't drop expr with side effect (e.g. function call). */
+  e = cp_build_addr_expr(e, tf_warning_or_error);
+  return build2 (COMPOUND_EXPR, type, e, val);
+ }
+
+  /* C++ [conv.lval]p3:
+ If T is cv std::nullptr_t, the result is a null pointer constant. */
+  /* expr value must be ignored. */
+  return ((TREE_SIDE_EFFECTS(e))
+   ? build2(COMPOUND_EXPR, type, e, val)
+   : val);
 }
   else if (TYPE_PTRMEM_P (type) && INTEGRAL_CODE_P (form))
 {
@@ -743,9 +755,11 @@ ocp_convert (tree type, tree expr, int convtype, int
flags,
 {
   if (complain & tf_warning)
  maybe_warn_zero_as_null_pointer_constant (e, loc);
-
   if (!TREE_SIDE_EFFECTS (e))
  return nullptr_node;
+  else
+ /* process nullptr to nullptr conversion with side effect. */
+ return cp_convert_to_pointer(type, e, dofold, complain);
 }

   if (MAYBE_CLASS_TYPE_P (type) && (convtype & CONV_FORCE_TEMP))
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index ac0fefa24f2..cdabb32b289 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -770,6 +770,14 @@ split_nonconstant_init (tree dest, tree init)
&& array_of_runtime_bound_p (TREE_TYPE (dest)))
 code = build_vec_init (dest, NULL_TREE, init, /*value-init*/false,
/*from array*/1, tf_warning_or_error);
+  else if (TREE_CODE (TREE_TYPE(init)) == NULLPTR_TYPE)
+{
+  /* C++ [conv.lval]p3:
+ If T is cv std::nullptr_t, the result is a null pointer constant. */
+  tree ie = cp_build_init_expr(dest, nullptr_node);
+  // remain expr with side effect
+  code = add_stmt_to_compound(init, ie);
+}
   else
 code = cp_build_init_expr (dest, init);

diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr47.C
b/gcc/testsuite/g++.dg/cpp0x/nullptr47.C
new file mode 100644
index 000..f05e91465f7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr47.C
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-original" }
+
+int* foo()
+{
+volatile auto a_0 = nullptr;
+int* b_0 = a_0;
+return b_0;
+}
+/* nullptr to int pointer conversion: no load from a_0. Init b_0 by zero.
*/
+// { dg-final { scan-tree-dump-times "cleanup_point   int \\* b_0 = 0" 1
"original"} }
+
+auto __attribute__ ((noinline)) bar()
+{
+   /* side effect. Elimination of 'call bar()' will be incorrect. */
+   volatile int* b = (int *)0xff;
+   *b = 10;
+
+   volatile auto n = nullptr;
+   return n;
+}
+/* nullptr to nullptr conversion (identity): no load from n. Just return
zero. */
+// { dg-final { scan-tree-dump-times "return  =\[^\\n\\r\]* 0;" 1
"original"} }
+
+void foo_2()
+{
+  volatile auto a_2 = bar();
+}
+/* non-constant init: no store bar() result to a_2. Call bar() and init
a_2 by zero. */
+// { dg-final { scan-tree-dump-times "bar \\(\\);, a_2 = 0;" 1 "original"}
}
-- 
2.34.1

On Wed, Jan 17, 2024 at 12:10 AM Jason Merrill  wrote:

> On 1/11/24 15:34, Dmitry Drozodv wrote:
> > You are absolutely right, we can't throw all 

Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-18 Thread Jeff Law




On 1/17/24 20:53, Greg McGary wrote:
> On Tue, Jan 16, 2024 at 11:44 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>
>  > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary wrote:
>  > >
>  > > The sign bit of a sign-extending load cannot be known until runtime,
>  > > so don't attempt to simplify it in the combiner.
>  >
>  > It feels like this papers over an issue downstream?
>
> While the code comment is true, perhaps it obscures the primary intent,
> which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined
> to expand into a single memory-load instruction and no simplification is
> possible, so why waste time with further analysis or transformation? There
> are plenty of other conditions that also short circuit to "do nothing" and
> this seems just as straightforward as those others. Efforts to catch this
> further downstream add gratuitous complexity.
Because the real bug is likely still lurking, waiting for something else
to trigger it.


An early exit is fine when we're just trying to avoid unnecessary work, 
but there's something else going on here we need to understand first.


jeff


[PATCH] hwasan: Always set target_hwasan_flags

2024-01-18 Thread H.J. Lu
Fix the "make check" error:

Running .../gcc/testsuite/gcc.dg/hwasan/hwasan.exp ...
ERROR: tcl error sourcing .../gcc/testsuite/gcc.dg/hwasan/hwasan.exp.
ERROR: tcl error code TCL LOOKUP VARNAME target_hwasan_flags
ERROR: can't read "target_hwasan_flags": no such variable
...

on non-x86-64 targets.

* lib/hwasan-dg.exp (hwasan_init): Always set target_hwasan_flags.
---
 gcc/testsuite/lib/hwasan-dg.exp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/lib/hwasan-dg.exp b/gcc/testsuite/lib/hwasan-dg.exp
index 76057502ee6..8d66b4db3e3 100644
--- a/gcc/testsuite/lib/hwasan-dg.exp
+++ b/gcc/testsuite/lib/hwasan-dg.exp
@@ -119,6 +119,8 @@ proc hwasan_init { args } {
 
 if [istarget x86_64-*-*] {
   set target_hwasan_flags "-mlam=u57"
+} else {
+  set target_hwasan_flags ""
 }
 
 set link_flags ""
-- 
2.43.0



[PATCH] Another memory leak in vectorizable_store

2024-01-18 Thread Richard Biener
Similar to the last one.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-stmts.cc (vectorizable_store): Do not pre-allocate
operands vector.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 69d76c3b350..09749ae3817 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8542,7 +8542,7 @@ vectorizable_store (vec_info *vinfo,
 
   alias_off = build_int_cst (ref_type, 0);
   stmt_vec_info next_stmt_info = first_stmt_info;
-  auto_vec vec_oprnds (ncopies);
+  auto_vec vec_oprnds;
   /* For costing some adjacent vector stores, we'd like to cost with
 the total number of them once instead of cost each one by one. */
   unsigned int n_adjacent_stores = 0;
-- 
2.35.3


[pushed] Darwin, configure: Handle a missing substitution.

2024-01-18 Thread Iain Sandoe
Tested on x86_64 Darwin21 (has default rpath) and i686 darwin9 and
x86_64 Linux (no @rpath), pushed to trunk, thanks,
Iain

--- 8< ---

The configure substitution for enable_darwin_at_rpath has been
omitted, which leads to a failure to set ENABLE_DARWIN_AT_RPATH in
the testsuite site.exp (which leads to failure to add -B options
in some cases, breaking uninstalled testing there).

Since we already have substitutions for ENABLE_DARWIN_AT_RPATH_TRUE,
we can use that instead, which is what this patch does.

gcc/ChangeLog:

* Makefile.in: Emit ENABLE_DARWIN_AT_RPATH into site.exp
when ENABLE_DARWIN_AT_RPATH_TRUE is not '#'.

Signed-off-by: Iain Sandoe 
---
 gcc/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index deb12e17d25..95caa54a52b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -4303,7 +4303,7 @@ site.exp: ./config.status Makefile
  echo "set COMPAT_OPTIONS \"$(COMPAT_OPTIONS)\"" >> ./site.tmp; \
else true; \
fi
-   @if test "x@enable_darwin_at_rpath@" = "xyes" ; then \
+   @if test "X@ENABLE_DARWIN_AT_RPATH_TRUE@" != "X#" ; then \
  echo "set ENABLE_DARWIN_AT_RPATH 1" >> ./site.tmp; \
fi
@echo "## All variables above are generated by configure. Do Not Edit 
##" >> ./site.tmp
-- 
2.39.2 (Apple Git-143)



Re: [PATCH v3] Fix __builtin_nested_func_ptr_{created,deleted} symbol versions [PR113402]

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 02:59:23PM +, Iain Sandoe wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -8416,6 +8416,11 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
> machine_mode mode,
>  case BUILT_IN_ADJUST_DESCRIPTOR:
>return expand_builtin_adjust_descriptor (exp);
>  
> +case BUILT_IN_NESTED_PTR_CREATED:
> +case BUILT_IN_NESTED_PTR_DELETED:

Unsure if it is ok to have the BUILT_IN_ names so different from the actual
functions, if they shouldn't be
BUILT_IN_GCC_NESTED_FUNC_PTR_{CREATED,DELETED} instead.
The missing __ is what happens even with BUILT_IN_CLEAR_CACHE / __clear_cache.

> +  break; /* At present, no expansion, just call the function.  */
> +
> +

Just one empty newline, not 2.

>  case BUILT_IN_FORK:
>  case BUILT_IN_EXECL:
>  case BUILT_IN_EXECV:
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 4d97ca0eec9..fd040eb8d80 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -1084,8 +1084,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, 
> "__builtin_adjust_trampoline")
>  DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
>  DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
>  DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
> -DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, 
> "__builtin_nested_func_ptr_created")
> -DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, 
> "__builtin_nested_func_ptr_deleted")
> +DEF_EXT_LIB_BUILTIN (BUILT_IN_NESTED_PTR_CREATED, 
> "__gcc_nested_func_ptr_created", BT_FN_VOID_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
> +DEF_EXT_LIB_BUILTIN (BUILT_IN_NESTED_PTR_DELETED, 
> "__gcc_nested_func_ptr_deleted", BT_FN_VOID, ATTR_NOTHROW_LIST)

See above.

Otherwise LGTM.

Jakub



[PATCH v3] Fix __builtin_nested_func_ptr_{created, deleted} symbol versions [PR113402]

2024-01-18 Thread Iain Sandoe
In order to regularise the two new builtins as extension library builtins,
the scope of this patch has grown beyond a simple rename.

Tested on x86_64-darwin21 (default heap trampolines) and x86_64 Linux and
other Darwin platforms that are default executable stack.

How does this look now?
thanks
Iain

--- 8< ---

The symbols for the functions supporting heap-based trampolines were
exported at an incorrect symbol version, the following patch fixes that.

As requested in the PR, this also renames __builtin_nested_func_ptr* to
__gcc_nested_func_ptr*.  In carrying out the rename, we move the builtins
to use DEF_EXT_LIB_BUILTIN.

PR libgcc/113402

gcc/ChangeLog:

* builtins.cc (expand_builtin): Handle BUILT_IN_NESTED_PTR_CREATED
and BUILT_IN_NESTED_PTR_DELETED.
* builtins.def (BUILT_IN_NESTED_PTR_CREATED,
BUILT_IN_NESTED_PTR_DELETED): Make these builtins LIB-EXT and
rename the library fallbacks to __gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted.
* doc/invoke.texi: Rename these to __gcc_nested_func_ptr_created
and __gcc_nested_func_ptr_deleted.
* tree-nested.cc (finalize_nesting_tree_1): Use builtin_explicit for
BUILT_IN_NESTED_PTR_CREATED and BUILT_IN_NESTED_PTR_DELETED.
* tree.cc (build_common_builtin_nodes): Build the
BUILT_IN_NESTED_PTR_CREATED and BUILT_IN_NESTED_PTR_DELETED local
builtins only for non-explicit.

libgcc/ChangeLog:

* config/aarch64/heap-trampoline.c: Rename
__builtin_nested_func_ptr_created to __gcc_nested_func_ptr_created and
__builtin_nested_func_ptr_deleted to __gcc_nested_func_ptr_deleted.
* config/i386/heap-trampoline.c: Likewise.
* libgcc2.h: Likewise.
* libgcc-std.ver.in (GCC_7.0.0): Likewise and then move
__gcc_nested_func_ptr_created and
__gcc_nested_func_ptr_deleted from this symbol version to ...
(GCC_14.0.0): ... this one.

Signed-off-by: Iain Sandoe 
Co-authored-by: Jakub Jelinek  
---
 gcc/builtins.cc |  5 
 gcc/builtins.def|  4 ++--
 gcc/doc/invoke.texi |  4 ++--
 gcc/tree-nested.cc  |  4 ++--
 gcc/tree.cc | 31 ++---
 libgcc/config/aarch64/heap-trampoline.c |  8 +++
 libgcc/config/i386/heap-trampoline.c|  8 +++
 libgcc/libgcc-std.ver.in|  5 ++--
 libgcc/libgcc2.h|  4 ++--
 9 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 09f2354f114..cebd88142b0 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -8416,6 +8416,11 @@ expand_builtin (tree exp, rtx target, rtx subtarget, 
machine_mode mode,
 case BUILT_IN_ADJUST_DESCRIPTOR:
   return expand_builtin_adjust_descriptor (exp);
 
+case BUILT_IN_NESTED_PTR_CREATED:
+case BUILT_IN_NESTED_PTR_DELETED:
+  break; /* At present, no expansion, just call the function.  */
+
+
 case BUILT_IN_FORK:
 case BUILT_IN_EXECL:
 case BUILT_IN_EXECV:
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 4d97ca0eec9..fd040eb8d80 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1084,8 +1084,8 @@ DEF_BUILTIN_STUB (BUILT_IN_ADJUST_TRAMPOLINE, 
"__builtin_adjust_trampoline")
 DEF_BUILTIN_STUB (BUILT_IN_INIT_DESCRIPTOR, "__builtin_init_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_ADJUST_DESCRIPTOR, "__builtin_adjust_descriptor")
 DEF_BUILTIN_STUB (BUILT_IN_NONLOCAL_GOTO, "__builtin_nonlocal_goto")
-DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_CREATED, 
"__builtin_nested_func_ptr_created")
-DEF_BUILTIN_STUB (BUILT_IN_NESTED_PTR_DELETED, 
"__builtin_nested_func_ptr_deleted")
+DEF_EXT_LIB_BUILTIN (BUILT_IN_NESTED_PTR_CREATED, 
"__gcc_nested_func_ptr_created", BT_FN_VOID_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
+DEF_EXT_LIB_BUILTIN (BUILT_IN_NESTED_PTR_DELETED, 
"__gcc_nested_func_ptr_deleted", BT_FN_VOID, ATTR_NOTHROW_LIST)
 
 /* Implementing __builtin_setjmp.  */
 DEF_BUILTIN_STUB (BUILT_IN_SETJMP_SETUP, "__builtin_setjmp_setup")
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4d43dda9839..7a5ba9e7fb5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19457,8 +19457,8 @@ for nested functions.
 By default, trampolines are generated on stack.  However, certain platforms
 (such as the Apple M1) do not permit an executable stack.  Compiling with
 @option{-ftrampoline-impl=heap} generate calls to
-@code{__builtin_nested_func_ptr_created} and
-@code{__builtin_nested_func_ptr_deleted} in order to allocate and
+@code{__gcc_nested_func_ptr_created} and
+@code{__gcc_nested_func_ptr_deleted} in order to allocate and
 deallocate trampoline space on the executable heap.  These functions are
 implemented in libgcc, and will only be provided on specific targets:
 x86_64 Darwin, x86_64 and aarch64 Linux.  @emph{PLEASE NOTE}: Heap
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 

Re: HELP: Questions on unshare_expr

2024-01-18 Thread Qing Zhao


> On Jan 17, 2024, at 1:43 AM, Richard Biener  
> wrote:
> 
> On Wed, Jan 17, 2024 at 7:42 AM Richard Biener
>  wrote:
>> 
>> On Tue, Jan 16, 2024 at 9:26 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Jan 15, 2024, at 4:31 AM, Richard Biener  
 wrote:
 
> All my questions for unshare_expr relate to a  LTO bug that I currently 
> stuck with
> when using .ACCESS_WITH_SIZE in bound sanitizer (only with -flto, without 
> -flto, no issue):
> 
> [opc@qinzhao-aarch64-ol8 gcc]$ sh t
> during IPA pass: modref
> t.c:20:1: internal compiler error: tree code ‘ssa_name’ is not supported 
> in LTO streams
> 0x14c3993 lto_write_tree
>   ../../latest-gcc-write/gcc/lto-streamer-out.cc:561
> 0x14c3aeb lto_output_tree_1
> 
> And the value of the tree node that triggered the ICE is:
> (gdb) call debug_tree(expr)
> 
>   nothrow
>   def_stmt
>   version:13 in-free-list>
> 
> Is there any good way to debug LTO bug?
 
 This happens usually when you have a VLA type and its type fields are not
 properly gimplified which usually happens because the frontend fails to
 insert a gimplification point for it (a DECL_EXPR).
>>> 
>>> I found an old gcc bug
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97172
>>> ICE: tree code ‘ssa_name’ is not supported in LTO streams since 
>>> r11-3303-g6450f07388f9fe57
>>> 
>>> Which is very similar to the bug I am having right now.
>>> 
>>> After further study, I suspect that the issue I am having right now with 
>>> the LTO streaming also
>>> relate to “unshare_expr”, “save_expr”, and the combination of these two, I 
>>> suspect that
>>> the current gcc cannot handle the combination of these two correctly for my 
>>> case.
>>> 
>>> My testing case is:
>>> 
>>> #include <stdlib.h>
>>> void __attribute__((__noinline__)) setup_and_test_vla (int n1, int n2, int 
>>> m)
>>> {
>>>   struct foo {
>>>   int n;
>>>   int p[][n2][n1] __attribute__((counted_by(n)));
>>>   } *f;
>>> 
>>>   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n2][n1]));
>>>   f->n = m;
>>>   f->p[m][n2][n1]=1;
>>>   return;
>>> }
>>> 
>>> int main(int argc, char *argv[])
>>> {
>>>  setup_and_test_vla (10, 11, 20);
>>>  return 0;
>>> }
>>> 
>>> Failed with
>>> my_gcc -Os -fsanitize=bounds -flto
>>> 
>>> If changing either n1 or n2 to a constant, the testing passed.
>>> If deleting -flto, the testing passed too.
>>> 
>>> I double checked my code per the suggestions provided by you and Jakub in 
>>> this
>>> email thread, and I think the code should be fine.
>>> 
>>> The code is following:
>>> 
>>> =
>>> 504 /* Instrument array bounds for INDIRECT_REFs whose pointers are
>>> 505    POINTER_PLUS_EXPRs of calls to .ACCESS_WITH_SIZE. We create special
>>> 506    builtins that gets expanded in the sanopt pass, and make an array
>>> 507    dimension of it.  ARRAY is the pointer to the base of the array,
>>> 508    which is a call to .ACCESS_WITH_SIZE, *OFFSET is the offset to the
>>> 509    beginning of array.
>>> 510    Return NULL_TREE if no instrumentation is emitted.  */
>>> 511
>>> 512 tree
>>> 513 ubsan_instrument_bounds_indirect_ref (location_t loc, tree array, tree *offset)
>>> 514 {
>>> 515   if (!is_access_with_size_p (array))
>>> 516     return NULL_TREE;
>>> 517   tree bound = get_bound_from_access_with_size (array);
>>> 518   /* The type of the call to .ACCESS_WITH_SIZE is a pointer type to
>>> 519      the element of the array.  */
>>> 520   tree element_size = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array)));
>>> 521   gcc_assert (bound);
>>> 522
>>> 523   /* Given the offset, and the size of each element, the index can be
>>> 524      computed as: offset/element_size.  */
>>> 525   *offset = save_expr (*offset);
>>> 526   tree index = fold_build2 (EXACT_DIV_EXPR,
>>> 527                             sizetype, *offset,
>>> 528                             unshare_expr (element_size));
>>> 529   /* Create a "(T *) 0" tree node to describe the original array type.
>>> 530      We get the original array type from the first argument of the call to
>>> 531      .ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, num_bytes, -1).
>>> 532
>>> 533      Originally, REF is a COMPONENT_REF with the original array type,
>>> 534      it was converted to a pointer to an ADDR_EXPR, and the ADDR_EXPR's
>>> 535      first operand is the original COMPONENT_REF.  */
>>> 536   tree ref = CALL_EXPR_ARG (array, 0);
>>> 537   tree array_type
>>> 538     = unshare_expr (TREE_TYPE (TREE_OPERAND (TREE_OPERAND(ref, 0), 0)));
>>> 539   tree zero_with_type = build_int_cst (build_pointer_type (array_type), 0);
>>> 540   return build_call_expr_internal_loc (loc, IFN_UBSAN_BOUNDS,
>>> 541                                        void_type_node, 3, zero_with_type,
>>> 542                                        index, bound);
>>> 543 }
>>> 
>>> =
>>> 
>>> Inside gdb, the guilty IR failed in LTO streaming is from the above line 
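The index computation described in the quoted comment (index = offset / element_size) can be sketched in plain C++. This is a hypothetical model, not GCC code: `outer_index_from_offset` is an illustrative name, and it assumes the `int p[][n2][n1]` layout from the test case, where each outer element is an `int[n2][n1]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the offset-to-index step: for a flexible array
// member declared as `int p[][n2][n1]`, each outer element is an
// int[n2][n1], so element_size = n2 * n1 * sizeof(int) bytes.  For an
// access to p[i] the byte offset is i * element_size, which is why the
// quoted code can use an exact division.  Assumes n1 > 0 and n2 > 0.
std::size_t outer_index_from_offset(std::size_t offset_bytes,
                                    std::size_t n1, std::size_t n2) {
  const std::size_t element_size = n2 * n1 * sizeof(int);
  assert(offset_bytes % element_size == 0 && "offset must be exact");
  return offset_bytes / element_size;
}
```

With the test case's values (n1 = 10, n2 = 11), an access whose outer subscript is 20 gives an offset of 20 * 11 * 10 * sizeof(int) bytes and an index of 20, one past the last element allowed by `counted_by(n)` with n = 20.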

Re: [PATCH v5] RISC-V: Support XTheadVector extension

2024-01-18 Thread Christoph Müllner
On Fri, Jan 12, 2024 at 4:18 AM Jun Sha (Joshua)
 wrote:
>
> This patch series presents the GCC implementation of the XTheadVector
> extension [1].
>
> [1] https://github.com/T-head-Semi/thead-extension-spec/
>
> For some vector patterns that cannot be avoided, we use
> "!TARGET_XTHEADVECTOR" to disable them in order not to
> generate instructions that xtheadvector does not support,
> causing 10 changes in vector.md.
>
> For the th. prefix issue, we use current_output_insn and
> the ASM_OUTPUT_OPCODE hook instead of directly modifying
> patterns in vector.md.
>
> We have run the GCC test suite and can confirm that there
> are no regressions.
>
> Furthermore, we have run the tests in
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/main/examples,
> and all the tests passed.
>
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
>
> [PATCH v4] RISC-V: Introduce XTheadVector as a subset of V1.0.0
> [PATCH v5] RISC-V: Adds the prefix "th." for the instructions of XTheadVector
> [PATCH v6] RISC-V: Handle differences between XTheadvector and Vector
> [PATCH v6] RISC-V: Add support for xtheadvector-specific intrinsics
> [PATCH v6] RISC-V: Fix register overlap issue for some xtheadvector 
> instructions
> [PATCH v5] RISC-V: Rewrite some instructions using ASM targethook

All patches of this series got either "LGTM" or "OK":
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643339.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642798.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642799.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642800.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642801.html
* https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642802.html

As mentioned earlier, I have rebased the patches, retested them locally and
(after ensuring there are no regressions) pushed them.

To all involved people: thank you very much!
A special 'thank you' goes to Juzhe, who did a great job in reviewing
the patches and providing suggestions to get the code into shape!


[pushed] Objective-C/C++: Ensure sufficient setup for the preprocessor.

2024-01-18 Thread Iain Sandoe
This is a regression fix where non-trivial Objective-C parses would
ICE when given -save-temps (ICE in the lexer).

This is a short-term fix for stage-4.  ISTM that we should not really
be making use of these functions in lexing and hopefully in GCC-15
we can take a look at moving the functionality to a later phase.

Tested on i686, powerpc, x86_64 Darwin, x86_64 Linux, pushed to trunk,
thanks
Iain

--- 8< ---

The tokenizer makes use of functions that determine if identifiers
are interface or class names, and those functions need a hash map
to be set up.

This ensures that these are initialized before pre-process-only
jobs are run.

gcc/objc/ChangeLog:

* objc-act.cc (objc_init): Initialize interface and class
name hash maps before the preprocessor uses them.

Signed-off-by: Iain Sandoe 
---
 gcc/objc/objc-act.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/objc/objc-act.cc b/gcc/objc/objc-act.cc
index 143134832ff..cec64c4bfbd 100644
--- a/gcc/objc/objc-act.cc
+++ b/gcc/objc/objc-act.cc
@@ -345,6 +345,11 @@ bool
 objc_init (void)
 {
   bool ok;
+
+  /* Set up stuff used by the preprocessor as well as FE parser.  */
+  interface_hash_init ();
+  hash_init ();
+
 #ifdef OBJCPLUS
   if (cxx_init () == false)
 #else
@@ -374,8 +379,6 @@ objc_init (void)
 
   /* Set up stuff used by FE parser and all runtimes.  */
   errbuf = XNEWVEC (char, 1024 * 10);
-  interface_hash_init ();
-  hash_init ();
   objc_encoding_init ();
   /* ... and then check flags and set-up for the selected runtime ... */
   if (flag_next_runtime && flag_objc_abi >= 2)
-- 
2.39.2 (Apple Git-143)



[pushed] Darwin: Suppress adding embedded rpaths for earlier OS versions.

2024-01-18 Thread Iain Sandoe
The current setup leads to spurious test failures when we are building for
macOS 10.4 or earlier.

Tested on x86_64, i686, powerpc Darwin, x86_64 Linux, pushed to trunk,
thanks
Iain

--- 8< ---

When we have @rpath support by virtue of the OS version we're hosting on,
we still need to omit those rpath entries when targeting < 10.5 (or the
linker will complain).  To do this we (maybe ab-)use a property of the
spec function expansion that a non-null return value can be used as the
true input to a second spec (whereas, unfortunately, we cannot pass specs
to the version function at present).
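The gating described above can be modelled outside the spec language. The sketch below is hypothetical: `compare_versions` and `emit_rpath_entries` are illustrative names, and it does not reproduce GCC's `version-compare` spec function or what `%P` expands to. It only shows rpath-related output being suppressed when `-mmacosx-version-min` is below 10.5, with version components compared numerically so that 10.10 sorts after 10.9.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: compares dotted version strings component by
// component as integers, so "10.10" is newer than "10.9".  A missing
// component counts as 0.  Returns <0, 0 or >0, like strcmp.
int compare_versions(const std::string& a, const std::string& b) {
  std::istringstream sa(a), sb(b);
  std::string pa, pb;
  for (;;) {
    bool has_a = static_cast<bool>(std::getline(sa, pa, '.'));
    bool has_b = static_cast<bool>(std::getline(sb, pb, '.'));
    if (!has_a && !has_b)
      return 0;
    long va = has_a ? std::stol(pa) : 0;
    long vb = has_b ? std::stol(pb) : 0;
    if (va != vb)
      return va < vb ? -1 : 1;
  }
}

// Hypothetical model of the fixed DARWIN_RPATH_SPEC intent: rpath-related
// output (including whatever %P contributes) is only wanted when the
// deployment target is 10.5 or newer.
bool emit_rpath_entries(const std::string& macosx_version_min) {
  return compare_versions(macosx_version_min, "10.5") >= 0;
}
```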

gcc/ChangeLog:

* config/darwin.h (DARWIN_RPATH_SPEC): Arrange for the %P spec
to be conditional on macosx-version-min.

Signed-off-by: Iain Sandoe 
---
 gcc/config/darwin.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index e94a29c639c..cb96d67b3b1 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -612,8 +612,7 @@ extern GTY(()) int darwin_ms_struct;
director as one being loaded.  */
 #define DARWIN_RPATH_SPEC \
   "%:version-compare(>= 10.5 mmacosx-version-min= -rpath) \
-   %:version-compare(>= 10.5 mmacosx-version-min= @loader_path) \
-   %P "
+   %{%:version-compare(>= 10.5 mmacosx-version-min= @loader_path): %P }"
 #else
 #define DARWIN_RPATH_SPEC ""
 #endif
-- 
2.39.2 (Apple Git-143)



[pushed] Darwin: Fix a typo in Objective-C meta-data.

2024-01-18 Thread Iain Sandoe
Tested on i686, powerpc, x86_64 Darwin, x86_64 Linux, pushed to trunk,
thanks,
Iain

--- 8< ---

We have a typo in the metadata for assigning NSStrings to a specific
section for the V1 (32b) ABI.  When that is fixed we should never see
the case where the section needs to be deduced from the properties of
the DECLs.

gcc/ChangeLog:

* config/darwin.cc (darwin_objc1_section): Use the correct
meta-data version for constant strings.
(machopic_select_section): Assert if we fail to handle CFString
sections as Objective-C meta-data or directly.

Signed-off-by: Iain Sandoe 
---
 gcc/config/darwin.cc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
index b15f3b1a1d9..7f43718820b 100644
--- a/gcc/config/darwin.cc
+++ b/gcc/config/darwin.cc
@@ -1638,7 +1638,7 @@ darwin_objc1_section (tree decl ATTRIBUTE_UNUSED, tree meta, section * base)
   else if (startswith (p, "V1_CEXT"))
 return darwin_sections[objc1_class_ext_section];
 
-  else if (startswith (p, "V2_CSTR"))
+  else if (startswith (p, "V1_CSTR"))
 return darwin_sections[objc_constant_string_object_section];
 
   return base;
@@ -1782,7 +1782,7 @@ machopic_select_section (tree decl,
return base_section; /* GNU runtime is happy with it all in one pot.  */
 }
 
-  /* b) Constant string objects.  */
+  /* b) Constructors for constant NSstring [but not CFString] objects.  */
   if (TREE_CODE (decl) == CONSTRUCTOR
   && TREE_TYPE (decl)
   && TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
@@ -1804,6 +1804,12 @@ machopic_select_section (tree decl,
  else
return darwin_sections[objc_string_object_section];
}
+  else if (!strcmp (IDENTIFIER_POINTER (name), "__builtin_CFString"))
+   {
+ /* We should have handled __anon_cfstrings above.  */
+ gcc_checking_assert (0);
+ return darwin_sections[cfstring_constant_object_section];
+   }
   else
return base_section;
 }
-- 
2.39.2 (Apple Git-143)



Re: [PATCH] libstdc++: Fix constexpr _Safe_iterator in C++20 mode

2024-01-18 Thread Jonathan Wakely
On Thu, 18 Jan 2024 at 13:51, Patrick Palka  wrote:
>
> On Thu, 18 Jan 2024, Jonathan Wakely wrote:
>
> > On Thu, 18 Jan 2024 at 02:48, Patrick Palka wrote:
> > >
> > > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> >
> > Please add PR109536 to the commit message.
>
> Done.
>
> >
> >
> >
> > >
> > > -- >8 --
> > >
> > > Some _Safe_iterator member functions define a variable of non-literal
> > > type __gnu_cxx::__scoped_lock, which automatically disqualifies them from
> > > being constexpr in C++20 mode even if that code path is never constant
> > > evaluated.  This restriction was lifted by P2242R3 for C++23, but we
> > > need to work around it in C++20 mode.  To that end this patch defines
> > > a pair of macros that encapsulate the lambda-based workaround mentioned
> > > in that paper and uses them to make the functions valid C++20 constexpr
> > > functions.  The augmented std::vector test element_access/constexpr.cc
> > > now successfully compiles in C++20 mode with -D_GLIBCXX_DEBUG (and it
> > > tests all modified member functions).
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * include/debug/safe_base.h (_Safe_sequence_base::_M_swap):
> > > Remove _GLIBCXX20_CONSTEXPR.
> > > * include/debug/safe_iterator.h 
> > > (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN):
> > > (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END): Define.
> > > (_Safe_iterator::operator=): Use them around the code path that
> > > defines a variable of type __gnu_cxx::__scoped_lock.
> > > (_Safe_iterator::operator++): Likewise.
> > > (_Safe_iterator::operator--): Likewise.
> > > (_Safe_iterator::operator+=): Likewise.
> > > (_Safe_iterator::operator-=): Likewise.
> > > * testsuite/23_containers/vector/element_access/constexpr.cc
> > > (test_iterators): Also test copy and move assignment.
> > > * testsuite/std/ranges/adaptors/all.cc (test08) [_GLIBCXX_DEBUG]:
> > > Use std::vector unconditionally.
> > > ---
> > >  libstdc++-v3/include/debug/safe_base.h|  1 -
> > >  libstdc++-v3/include/debug/safe_iterator.h| 48 ++-
> > >  .../vector/element_access/constexpr.cc|  2 +
> > >  .../testsuite/std/ranges/adaptors/all.cc  |  4 --
> > >  4 files changed, 38 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
> > > index 107fef3cb02..d5fbe4b1320 100644
> > > --- a/libstdc++-v3/include/debug/safe_base.h
> > > +++ b/libstdc++-v3/include/debug/safe_base.h
> > > @@ -268,7 +268,6 @@ namespace __gnu_debug
> > >   *  operation is complete all iterators that originally referenced
> > >   *  one container now reference the other container.
> > >   */
> > > -_GLIBCXX20_CONSTEXPR
> > >  void
> > >  _M_swap(_Safe_sequence_base& __x) _GLIBCXX_USE_NOEXCEPT;
> > >
> > > diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
> > > index 1bc7c904ee0..929fd9b0ade 100644
> > > --- a/libstdc++-v3/include/debug/safe_iterator.h
> > > +++ b/libstdc++-v3/include/debug/safe_iterator.h
> > > @@ -65,6 +65,20 @@
> > >_GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, __msg_distance_bad,   \
> > >  __msg_distance_different)
> > >
> > > +// This pair of macros helps with writing valid C++20 constexpr functions that
> > > +// contain a non-constexpr code path that defines a non-literal variable, which
> > > +// was otherwise disallowed until P2242R3 for C++23.  We use them below for
> > > +// __gnu_cxx::__scoped_lock so that the containing functions are still
> > > +// considered valid C++20 constexpr functions.
> > > +
> > > +#if __cplusplus >= 202002L && __cpp_constexpr < 202110L
> > > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN [&]() -> void { do
> > > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END while(false); }();
> >
> > Do we need the do-while to create a single statement from the block?
> > Isn't the lambda body enough to create a single statement from it,
> > which can't be broken by a dangling else or anything like that?
>
> I was thinking that the do-while gives compile-time assurance that the
> macros are used properly and in particular every ..._BEGIN is matched
> with an ..._END, so that e.g.
>
>   _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
> do_stuff();
>   } // omitted ..._END
>
> doesn't parse.  But it turns out that won't parse even without the
> do-while, due to a missing semicolon.  And the parse error is much more
> readable when the do-while isn't used.
>
> One risk without the do-while is that the seemingly innocent
>
>  _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
>do_stuff();
>  };
>
> will parse, and (in C++20 mode) define a lambda that's never invoked,
> and thus do_stuff() is never invoked.  But tests should catch that,
> so consider 

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-18 Thread chenglulu



On 2024/1/18 at 4:49 PM, chenglulu wrote:


On 2024/1/18 at 3:44 PM, Xi Ruoyao wrote:

On Thu, 2024-01-18 at 15:15 +0800, chenglulu wrote:


gcc.dg/tree-ssa/scev-16.c is OK to move
gcc.dg/pr104992.c should simply add -fno-tree-vectorize to the used
options and remove the vect_* stuff

Hi Richard:

I have a question. I don't understand the purpose of adding
'-fno-tree-vectorize' here.

I don't think -fno-tree-vectorize will make a difference here. This
test case uses __attribute__((vector_size(...))) explicitly so the
vector operation will be used even if -fno-tree-vectorize.

Yes, I did the test and compared the intermediate results and saw no difference.


As for “remove the vect_* stuff”: I don't quite understand what that means either. :-(



The test case scev-16.c was moved to the vect directory in r14-8210.



[pushed] Darwin: Fix constant CFString code-gen [PR105522].

2024-01-18 Thread Iain Sandoe
@Richi, @Andrew - FIO since you were involved in the IRC discussion.

Tested on i686, powerpc, x86_64 Darwin (and x86_64 Linux), pushed
to trunk, thanks,
Iain

--- 8< ---

Although this only fires for one of the Darwin sub-ports, it is latent
elsewhere; it is also a regression compared with the Darwin system compiler.

In the code we imported from an earlier branch, CFString objects (which
are constant aggregates) are constructed as CONST_DECLs.  Although our
current documentation suggests that these are reserved for enumeration
values, in fact they are used elsewhere in the compiler for constants.
This includes Objective-C where they are used to form NSString constants.

In the particular case, we take the address of the constant and that
triggers varasm.cc:decode_addr_constant, which does not currently support
CONST_DECL.

If there is a general intent to allow/encourage wider use of CONST_DECL,
then we should fix decode_addr_constant to look through these and evaluate
the initializer (a two-line patch, but I'm not suggesting it for stage-4).

We also need to update the GCC internals documentation to allow for the
additional uses.

This patch is Darwin-local and fixes the problem by making the CFString
constants into regular variables that are TREE_CONSTANT and TREE_READONLY.
I plan to back-port this to the open branches once it has baked a while on trunk.

Since, for Darwin, the Objective-C default is to construct constant
NSString objects as CFStrings, this will also cover the majority of cases
there (this patch does not make any changes to Objective-C NSStrings).
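A minimal model of the name-based section choice that the patch adds to the read-only branch of `machopic_select_section`. `Section` and `select_ro_section` are hypothetical stand-ins for `darwin_sections[...]` and the real function; the 15-character prefix length matches the `strncmp (..., "__anon_cfstring", 15)` in the diff below.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the two darwin_sections[] entries involved.
enum class Section { CFStringConstant, ConstData };

// Model of the new read-only branch: a read-only, record-typed variable
// whose name starts with "__anon_cfstring" goes to the CFString constant
// section; any other read-only data goes to the generic const data section.
Section select_ro_section(bool is_var, bool is_record, const char* name) {
  static const char prefix[] = "__anon_cfstring";  // 15 characters
  if (is_var && is_record && name != nullptr
      && std::strncmp(name, prefix, sizeof prefix - 1) == 0)
    return Section::CFStringConstant;
  return Section::ConstData;
}
```

Matching on a name prefix lets the anonymous variables keep ordinary `VAR_DECL` handling everywhere else, while still landing in the dedicated CFString section.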

PR target/105522

gcc/ChangeLog:

* config/darwin.cc (machopic_select_section): Handle C and C++
CFStrings.
(darwin_rename_builtins): Move this out of the CFString code.
(darwin_libc_has_function): Likewise.
(darwin_build_constant_cfstring): Create an anonymous var to
hold each CFString.
* config/darwin.h (ASM_OUTPUT_LABELREF): Handle constant
CFStrings.

Signed-off-by: Iain Sandoe 
---
 gcc/config/darwin.cc| 100 ++--
 gcc/config/darwin.h |   2 +
 gcc/testsuite/gcc.dg/pr105522.c |  17 ++
 3 files changed, 76 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr105522.c

diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
index cf203dc4b3e..b15f3b1a1d9 100644
--- a/gcc/config/darwin.cc
+++ b/gcc/config/darwin.cc
@@ -1731,7 +1731,16 @@ machopic_select_section (tree decl,
base_section = darwin_sections[zobj_data_section];
}
   else if (ro)
-   base_section = darwin_sections[const_data_section];
+   {
+ if (VAR_P (decl) && TREE_TYPE (decl)
+ && TREE_CODE (TREE_TYPE (decl)) == RECORD_TYPE
+ && DECL_NAME (decl)
+ && strncmp (IDENTIFIER_POINTER (DECL_NAME (decl)),
+ "__anon_cfstring", 15) == 0)
+  base_section = darwin_sections[cfstring_constant_object_section];
+ else
+   base_section = darwin_sections[const_data_section];
+   }
   else
base_section = data_section;
   break;
@@ -1795,8 +1804,6 @@ machopic_select_section (tree decl,
  else
return darwin_sections[objc_string_object_section];
}
-  else if (!strcmp (IDENTIFIER_POINTER (name), "__builtin_CFString"))
-   return darwin_sections[cfstring_constant_object_section];
   else
return base_section;
 }
@@ -3612,6 +3619,29 @@ darwin_patch_builtins (void)
 }
 #endif
 
+void
+darwin_rename_builtins (void)
+{
+}
+
+/* Implementation for the TARGET_LIBC_HAS_FUNCTION hook.  */
+
+bool
+darwin_libc_has_function (enum function_class fn_class,
+ tree type ATTRIBUTE_UNUSED)
+{
+  if (fn_class == function_sincos && darwin_macosx_version_min)
+return (strverscmp (darwin_macosx_version_min, "10.9") >= 0);
+#if DARWIN_PPC && SUPPORT_DARWIN_LEGACY
+  if (fn_class == function_c99_math_complex
+  || fn_class == function_c99_misc)
+return (TARGET_64BIT
+   || (darwin_macosx_version_min &&
+   strverscmp (darwin_macosx_version_min, "10.3") >= 0));
+#endif
+  return default_libc_has_function (fn_class, type);
+}
+
 /*  CFStrings implementation.  */
 static GTY(()) tree cfstring_class_reference = NULL_TREE;
 static GTY(()) tree cfstring_type_node = NULL_TREE;
@@ -3629,7 +3659,7 @@ typedef struct GTY ((for_user)) cfstring_descriptor {
   /* The string literal.  */
   tree literal;
   /* The resulting constant CFString.  */
-  tree constructor;
+  tree ccf_str;
 } cfstring_descriptor;
 
 struct cfstring_hasher : ggc_ptr_hash
@@ -3704,7 +3734,7 @@ darwin_init_cfstring_builtins (unsigned builtin_cfstring)
   /* Make a lang-specific section - dup_lang_specific_decl makes a new node
  in place of the existing, which may be NULL.  */
   DECL_LANG_SPECIFIC (cfsfun) = NULL;
-  (*lang_hooks.dup_lang_specific_decl) (cfsfun);
+  

Re: [PATCH] libstdc++: Fix constexpr _Safe_iterator in C++20 mode

2024-01-18 Thread Patrick Palka
On Thu, 18 Jan 2024, Jonathan Wakely wrote:

> On Thu, 18 Jan 2024 at 02:48, Patrick Palka wrote:
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
> 
> Please add PR109536 to the commit message.

Done.

> 
> 
> 
> >
> > -- >8 --
> >
> > Some _Safe_iterator member functions define a variable of non-literal
> > type __gnu_cxx::__scoped_lock, which automatically disqualifies them from
> > being constexpr in C++20 mode even if that code path is never constant
> > evaluated.  This restriction was lifted by P2242R3 for C++23, but we
> > need to work around it in C++20 mode.  To that end this patch defines
> > a pair of macros that encapsulate the lambda-based workaround mentioned
> > in that paper and uses them to make the functions valid C++20 constexpr
> > functions.  The augmented std::vector test element_access/constexpr.cc
> > now successfully compiles in C++20 mode with -D_GLIBCXX_DEBUG (and it
> > tests all modified member functions).
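A minimal sketch of the lambda-based workaround described above, assuming a hypothetical `ScopedLock` in place of `__gnu_cxx::__scoped_lock` and a hypothetical `advance` function in place of the `_Safe_iterator` operators. It should compile as C++17 or C++20.

```cpp
#include <cassert>

// Hypothetical stand-in for __gnu_cxx::__scoped_lock: the user-provided,
// non-constexpr destructor makes it a non-literal type.
struct ScopedLock {
  ScopedLock() {}
  ~ScopedLock() {}
};

// Declaring `ScopedLock lock;` directly in this body is not allowed in a
// C++20 constexpr function (P2242R3 lifts that for C++23), even on a path
// that is never constant-evaluated.  Moving that path into an
// immediately-invoked lambda keeps advance() a valid constexpr function:
// the lambda's call operator is simply not constexpr, which is fine as
// long as it is never reached during constant evaluation.
constexpr int advance(int pos, bool attached) {
  if (attached) {
    [&]() -> void {
      ScopedLock lock;  // runtime-only path
      ++pos;
    }();
  } else {
    ++pos;
  }
  return pos;
}

// The non-locking path remains usable in constant expressions.
static_assert(advance(41, false) == 42, "constant-evaluable path");
```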
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/debug/safe_base.h (_Safe_sequence_base::_M_swap):
> > Remove _GLIBCXX20_CONSTEXPR.
> > * include/debug/safe_iterator.h 
> > (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN):
> > (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END): Define.
> > (_Safe_iterator::operator=): Use them around the code path that
> > defines a variable of type __gnu_cxx::__scoped_lock.
> > (_Safe_iterator::operator++): Likewise.
> > (_Safe_iterator::operator--): Likewise.
> > (_Safe_iterator::operator+=): Likewise.
> > (_Safe_iterator::operator-=): Likewise.
> > * testsuite/23_containers/vector/element_access/constexpr.cc
> > (test_iterators): Also test copy and move assignment.
> > * testsuite/std/ranges/adaptors/all.cc (test08) [_GLIBCXX_DEBUG]:
> > Use std::vector unconditionally.
> > ---
> >  libstdc++-v3/include/debug/safe_base.h|  1 -
> >  libstdc++-v3/include/debug/safe_iterator.h| 48 ++-
> >  .../vector/element_access/constexpr.cc|  2 +
> >  .../testsuite/std/ranges/adaptors/all.cc  |  4 --
> >  4 files changed, 38 insertions(+), 17 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
> > index 107fef3cb02..d5fbe4b1320 100644
> > --- a/libstdc++-v3/include/debug/safe_base.h
> > +++ b/libstdc++-v3/include/debug/safe_base.h
> > @@ -268,7 +268,6 @@ namespace __gnu_debug
> >   *  operation is complete all iterators that originally referenced
> >   *  one container now reference the other container.
> >   */
> > -_GLIBCXX20_CONSTEXPR
> >  void
> >  _M_swap(_Safe_sequence_base& __x) _GLIBCXX_USE_NOEXCEPT;
> >
> > diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
> > index 1bc7c904ee0..929fd9b0ade 100644
> > --- a/libstdc++-v3/include/debug/safe_iterator.h
> > +++ b/libstdc++-v3/include/debug/safe_iterator.h
> > @@ -65,6 +65,20 @@
> >_GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, __msg_distance_bad,   \
> >  __msg_distance_different)
> >
> > +// This pair of macros helps with writing valid C++20 constexpr functions that
> > +// contain a non-constexpr code path that defines a non-literal variable, which
> > +// was otherwise disallowed until P2242R3 for C++23.  We use them below for
> > +// __gnu_cxx::__scoped_lock so that the containing functions are still
> > +// considered valid C++20 constexpr functions.
> > +
> > +#if __cplusplus >= 202002L && __cpp_constexpr < 202110L
> > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN [&]() -> void { do
> > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END while(false); }();
> 
> Do we need the do-while to create a single statement from the block?
> Isn't the lambda body enough to create a single statement from it,
> which can't be broken by a dangling else or anything like that?

I was thinking that the do-while gives compile-time assurance that the
macros are used properly and in particular every ..._BEGIN is matched
with an ..._END, so that e.g.

  _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
do_stuff();
  } // omitted ..._END

doesn't parse.  But it turns out that won't parse even without the
do-while, due to a missing semicolon.  And the parse error is much more
readable when the do-while isn't used.

One risk without the do-while is that the seemingly innocent

 _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
   do_stuff();
 };

will parse, and (in C++20 mode) define a lambda that's never invoked,
and thus do_stuff() is never invoked.  But tests should catch that,
so consider the do-while removed.
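The risk described above, a stray `;` turning the scope into a lambda that is created but never called, can be seen in isolation. This sketch writes the two expansions out by hand rather than using the libstdc++ macros; `calls`, `invoked` and `never_invoked` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical counter used to observe whether the scoped body ran.
int calls = 0;

// Intended form: the trailing "()" invokes the lambda, so its body runs.
void invoked() {
  [&]() -> void { ++calls; }();
}

// The pitfall: without the trailing "()" the lambda expression is a
// discarded value.  The closure object is created, but its body never runs,
// and the code still compiles (at most with an unused-value warning).
void never_invoked() {
  [&]() -> void { ++calls; };
}
```

Since both forms compile, only a test that checks the side effect (here, `calls`) distinguishes them, which matches the point that tests should catch this misuse.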

> 
> 
> > +#else
> > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN
> > +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
> > +#endif
> > +
> >  namespace __gnu_debug
> >  {
> >/** Helper struct to deal with sequence offering a 

Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 18, 2024 at 02:13:55PM +0100, Jakub Jelinek wrote:
> > The == BITINT_TYPE check is non-essential, was just trying to keep existing
> > behavior otherwise.  I can certainly drop that.
> 
> So following then?

OK.

Thanks,
Richard.

> 2024-01-18  Jakub Jelinek  
>   Richard Biener  
> 
>   * cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode
>   VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
>   * expr.cc (expand_expr_real_1) : Do nothing
>   but adjust_address also for BLKmode mode and MEM op0.
> 
> --- gcc/cfgexpand.cc.jj   2024-01-16 11:45:16.159326506 +0100
> +++ gcc/cfgexpand.cc  2024-01-18 14:15:54.853008586 +0100
> @@ -6380,11 +6380,15 @@ discover_nonconstant_array_refs_r (tree
>/* References of size POLY_INT_CST to a fixed-size object must go
>   through memory.  It's more efficient to force that here than
>   to create temporary slots on the fly.
> - RTL expansion expectes TARGET_MEM_REF to always address actual memory.  */
> + RTL expansion expectes TARGET_MEM_REF to always address actual memory.
> + Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
> + to BLKmode type.  */
>else if (TREE_CODE (t) == TARGET_MEM_REF
>  || (TREE_CODE (t) == MEM_REF
>  && TYPE_SIZE (TREE_TYPE (t))
> -&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
> +&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
> +|| (TREE_CODE (t) == VIEW_CONVERT_EXPR
> +&& TYPE_MODE (TREE_TYPE (t)) == BLKmode))
>  {
>tree base = get_base_address (t);
>if (base
> --- gcc/expr.cc.jj2024-01-12 10:07:58.194851657 +0100
> +++ gcc/expr.cc   2024-01-18 14:15:31.970328685 +0100
> @@ -12389,6 +12389,10 @@ expand_expr_real_1 (tree exp, rtx target
>/* If the input and output modes are both the same, we are done.  */
>if (mode == GET_MODE (op0))
>   ;
> +  /* Similarly if the output mode is BLKmode and input is a MEM,
> +  adjust_address done below is all we need.  */
> +  else if (mode == BLKmode && MEM_P (op0))
> + ;
>/* If neither mode is BLKmode, and both modes are the same size
>then we can use gen_lowpart.  */
>else if (mode != BLKmode
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] hwasan: Check if Intel LAM_U57 is enabled

2024-01-18 Thread H.J. Lu
On Wed, Jan 17, 2024 at 8:51 PM Hongtao Liu  wrote:
>
> On Wed, Jan 10, 2024 at 12:47 AM H.J. Lu  wrote:
> >
> > When -fsanitize=hwaddress is used, libhwasan will try to enable LAM_U57
> > in the startup code.  Update the target check to enable hwaddress tests
> > if LAM_U57 is enabled.  Also compile hwaddress tests with -mlam=u57 on
> > x86-64 since hwasan requires LAM_U57 on x86-64.
> I've tested it on lam enabled SRF, and it passed all hwasan testcases
> except below
>
> FAIL: c-c++-common/hwasan/alloca-outside-caught.c   -O0  output pattern test
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O1
> scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2
> scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O3 -g
> scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -Os
> scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto
> -fno-use-linker-plugin -flto-partition=none   scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/hwasan-poison-optimisation.c   -O2 -flto
> -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times bl
> s*__hwasan_tag_mismatch4 1
> FAIL: c-c++-common/hwasan/vararray-outside-caught.c   -O0  output pattern test
>
> Basically they're testcase issues; the testcases need to be adjusted
> for x86. I'll commit a separate patch for those after this commit is
> upstream.
> I've also tested the patch on LAM-unsupported platforms, and all
> hwasan testcases show as unsupported.
> So the patch LGTM.
>
> >
> > * lib/hwasan-dg.exp (check_effective_target_hwaddress_exec):
> > Return 1 if Intel LAM_U57 is enabled.
> > (hwasan_init): Add -mlam=u57 on x86-64.

Pushed.  LAM has been enabled in GCC 13:

[hjl@gnu-cfl-3 tmp]$ gcc -fsanitize=hwaddress -mlam=u57 alloca-outside-caught.c
[hjl@gnu-cfl-3 tmp]$ ./a.out
FATAL: HWAddressSanitizer requires a kernel with tagged address ABI.
[hjl@gnu-cfl-3 tmp]$ strace ./a.out
...
arch_prctl(ARCH_GET_MAX_TAG_BITS, 0x7ffc56267708) = 0
write(2, "FATAL: HWAddressSanitizer requir"..., 69FATAL:
HWAddressSanitizer requires a kernel with tagged address ABI.

I'd like to backport it to GCC 13.

We should mention LAM in changes for GCC 13.

-- 
H.J.


Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 02:13:55PM +0100, Jakub Jelinek wrote:
> The == BITINT_TYPE check is non-essential, was just trying to keep existing
> behavior otherwise.  I can certainly drop that.

So following then?

2024-01-18  Jakub Jelinek  
Richard Biener  

* cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode
VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
* expr.cc (expand_expr_real_1) : Do nothing
but adjust_address also for BLKmode mode and MEM op0.

--- gcc/cfgexpand.cc.jj 2024-01-16 11:45:16.159326506 +0100
+++ gcc/cfgexpand.cc2024-01-18 14:15:54.853008586 +0100
@@ -6380,11 +6380,15 @@ discover_nonconstant_array_refs_r (tree
   /* References of size POLY_INT_CST to a fixed-size object must go
  through memory.  It's more efficient to force that here than
  to create temporary slots on the fly.
- RTL expansion expectes TARGET_MEM_REF to always address actual memory.  */
+ RTL expansion expectes TARGET_MEM_REF to always address actual memory.
+ Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
+ to BLKmode type.  */
   else if (TREE_CODE (t) == TARGET_MEM_REF
   || (TREE_CODE (t) == MEM_REF
   && TYPE_SIZE (TREE_TYPE (t))
-  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
+  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
+  || (TREE_CODE (t) == VIEW_CONVERT_EXPR
+  && TYPE_MODE (TREE_TYPE (t)) == BLKmode))
 {
   tree base = get_base_address (t);
   if (base
--- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
+++ gcc/expr.cc 2024-01-18 14:15:31.970328685 +0100
@@ -12389,6 +12389,10 @@ expand_expr_real_1 (tree exp, rtx target
   /* If the input and output modes are both the same, we are done.  */
   if (mode == GET_MODE (op0))
;
+  /* Similarly if the output mode is BLKmode and input is a MEM,
+adjust_address done below is all we need.  */
+  else if (mode == BLKmode && MEM_P (op0))
+   ;
   /* If neither mode is BLKmode, and both modes are the same size
 then we can use gen_lowpart.  */
   else if (mode != BLKmode


Jakub
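The order of the mode checks in the `VIEW_CONVERT_EXPR` case after this patch can be summarized with a toy model. Everything here is hypothetical and simplified: `Mode`, `Action` and `vce_action` stand in for `machine_mode`, the surrounding `if`/`else if` chain and the `rtx` operand, and the conditions after the `gen_lowpart` test (truncated in the diff above) are not modelled.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for machine modes and for the
// branch taken by the if/else-if chain.
enum class Mode { BLK, SI, SF, DI, TI };
enum class Action { Nothing, AdjustAddressOnly, GenLowpart, Other };

// Order of the first checks: identical modes need no conversion; a BLKmode
// result from a MEM operand only needs the later adjust_address (the new
// case); two equal-sized non-BLKmode modes can use gen_lowpart.
Action vce_action(Mode out, Mode in, bool in_is_mem, bool same_size) {
  if (out == in)
    return Action::Nothing;
  if (out == Mode::BLK && in_is_mem)
    return Action::AdjustAddressOnly;
  if (out != Mode::BLK && in != Mode::BLK && same_size)
    return Action::GenLowpart;
  return Action::Other;  // remaining cases are not modelled here
}
```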



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 01:57:49PM +0100, Richard Biener wrote:
> > - RTL expansion expectes TARGET_MEM_REF to always address actual memory.  */
> > + RTL expansion expectes TARGET_MEM_REF to always address actual memory.
> > + Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
> > + to BLKmode BITINT_TYPEs.  */
> >else if (TREE_CODE (t) == TARGET_MEM_REF
> >|| (TREE_CODE (t) == MEM_REF
> >&& TYPE_SIZE (TREE_TYPE (t))
> > -  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
> > +  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
> > +  || (TREE_CODE (t) == VIEW_CONVERT_EXPR
> > +  && TREE_CODE (TREE_TYPE (t)) == BITINT_TYPE
> > +  && TYPE_MODE (TREE_TYPE (t)) == BLKmode))
> 
> I'm still not getting what's special about BITINT_TYPE here, so
> shouldn't that apply to all BLKmode V_C_E?  But sure we can for
> now just handle BITINT_TYPE.
> 
> That hunk looks OK to me.

The == BITINT_TYPE check is non-essential, was just trying to keep existing
behavior otherwise.  I can certainly drop that.

> > --- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
> > +++ gcc/expr.cc 2024-01-18 13:38:19.677556646 +0100
> > @@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
> >   }
> >}
> >  
> > +  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
> > +aren't promoted to registers.  */
> > +  if (op0 == NULL_RTX
> > + && mode == BLKmode
> > + && TREE_CODE (type) == BITINT_TYPE
> > + && VAR_P (treeop0)
> > + && DECL_MODE (treeop0) != BLKmode
> > + && DECL_RTL_SET_P (treeop0)
> > + && MEM_P (DECL_RTL (treeop0)))
> > +   op0 = adjust_address (DECL_RTL (treeop0), BLKmode, 0);
> > +
> >if (!op0)
> > op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
> > NULL, inner_reference_p);
> 
> So we're now sure we have MEM_P (op0) after expand_expr_real,
> even without this change, right?  What's wrong with the
> suggestion to use

I wasn't sure if VAR_P (treeop0) && MEM_P (DECL_RTL (treeop0)) implies that
expand_expr_real will return a MEM, but I'm not able to find a path in which
it would return something different, so maybe ok.

>   if (mode == GET_MODE (op0) || (mode == BLKmode && MEM_P (op0))
> 
> thus not run into any of the special-casing?  We're doing just

It is true the later code will then do:
> 
>   op0 = adjust_address (op0, mode, 0);

so perhaps it is ok as you wrote it (though perhaps adding it as a separate
else if would allow a separate comment).

Jakub



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 18, 2024 at 01:34:53PM +0100, Richard Biener wrote:
> > So - if we simply do
> > 
> >   /* If the input and output modes are both the same, we are done.  */
> >   if (mode == GET_MODE (op0) || (mode == BLKmode && MEM_P (op0))
> > ;
> > 
> > ?  After all if we want BLKmode we want a MEM, and if we have one
> > we should be done already.  V_C_E isn't supposed to do any value
> > transform.
> 
> We'd need to make sure that op0 is actually MEM, which is not the case.
> 
> The following patch seems to work though, the 
> discover_nonconstant_array_refs_r
> part helps for -O2, the expr.cc part is needed at all optimization levels.
> 
> 2024-01-18  Jakub Jelinek  
> 
>   * cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode
>   VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
>   * expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If mode is
>   BLKmode and type a BITINT_TYPE from a non-BLKmode VAR_P with MEM
>   DECL_RTL, use adjust_address to BLKmode for it instead of expand_expr.
> 
> --- gcc/cfgexpand.cc.jj   2024-01-16 11:45:16.159326506 +0100
> +++ gcc/cfgexpand.cc  2024-01-18 13:41:54.579551282 +0100
> @@ -6380,11 +6380,16 @@ discover_nonconstant_array_refs_r (tree
>/* References of size POLY_INT_CST to a fixed-size object must go
>   through memory.  It's more efficient to force that here than
>   to create temporary slots on the fly.
> - RTL expansion expectes TARGET_MEM_REF to always address actual memory.  
> */
> + RTL expansion expectes TARGET_MEM_REF to always address actual memory.
> + Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
> + to BLKmode BITINT_TYPEs.  */
>else if (TREE_CODE (t) == TARGET_MEM_REF
>  || (TREE_CODE (t) == MEM_REF
>  && TYPE_SIZE (TREE_TYPE (t))
> -&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
> +&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
> +|| (TREE_CODE (t) == VIEW_CONVERT_EXPR
> +&& TREE_CODE (TREE_TYPE (t)) == BITINT_TYPE
> +&& TYPE_MODE (TREE_TYPE (t)) == BLKmode))

I'm still not getting what's special about BITINT_TYPE here so
shouldn't that apply to all BLKmode V_C_E?  But sure we can for
now just handle BITINT_TYPE.

That hunk looks OK to me.

>  {
>tree base = get_base_address (t);
>if (base
> --- gcc/expr.cc.jj2024-01-12 10:07:58.194851657 +0100
> +++ gcc/expr.cc   2024-01-18 13:38:19.677556646 +0100
> @@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
> }
>}
>  
> +  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
> +  aren't promoted to registers.  */
> +  if (op0 == NULL_RTX
> +   && mode == BLKmode
> +   && TREE_CODE (type) == BITINT_TYPE
> +   && VAR_P (treeop0)
> +   && DECL_MODE (treeop0) != BLKmode
> +   && DECL_RTL_SET_P (treeop0)
> +   && MEM_P (DECL_RTL (treeop0)))
> + op0 = adjust_address (DECL_RTL (treeop0), BLKmode, 0);
> +
>if (!op0)
>   op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
>   NULL, inner_reference_p);

So we're now sure we have MEM_P (op0) after expand_expr_real,
even without this change, right?  What's wrong with the
suggestion to use

  if (mode == GET_MODE (op0) || (mode == BLKmode && MEM_P (op0))

thus not run into any of the special-casing?  We're doing just

  op0 = adjust_address (op0, mode, 0);

then which is basically what you do above?

> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[committed] libstdc++: Avoid -Wmaybe-uninitialized warnings in text_encoding.cc

2024-01-18 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

These variables are only read from if we haven't reached the end of
either range, in which case they're guaranteed to be initialized to the
next alphanumeric character. But we can just initialize them to make the
compiler happy.

libstdc++-v3/ChangeLog:

* include/bits/unicode.h (__charset_alias_match): Initialize
__val_a and __val_b.
---
 libstdc++-v3/include/bits/unicode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/unicode.h 
b/libstdc++-v3/include/bits/unicode.h
index d025d21f3dd..51bf02e927f 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -1084,7 +1084,7 @@ inline namespace __v15_1_0
 while (true)
   {
// Find the value of the next alphanumeric character in each string.
-   unsigned char __val_a, __val_b;
+   unsigned char __val_a{}, __val_b{};
while (__ptr_a != __end_a
 && (__val_a = __map(*__ptr_a, __num_a)) == 127)
  ++__ptr_a;
-- 
2.43.0
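The false positive can be reproduced in isolation with a sketch like the one below. `map_char` and `next_alnum` are hypothetical stand-ins for `__map` and the scanning loop in `__charset_alias_match`, and the mapping assumes ASCII. As the commit message explains, `val` is only read after the loop has assigned it, so the `{}` value-initialization (to zero) does not change behavior. It only gives every path a defined value.

```cpp
#include <cassert>

// Hypothetical stand-in for __map: digits map to 0-9, letters to 10-35,
// anything else to 127 ("skip"). Assumes ASCII, where letters are contiguous.
inline unsigned char map_char(char c)
{
  if (c >= '0' && c <= '9')
    return static_cast<unsigned char>(c - '0');
  if (c >= 'a' && c <= 'z')
    return static_cast<unsigned char>(c - 'a' + 10);
  if (c >= 'A' && c <= 'Z')
    return static_cast<unsigned char>(c - 'A' + 10);
  return 127;
}

// Advances p past non-alphanumeric characters and returns the value of the
// next alphanumeric one, or 0 if the range [p, end) is exhausted.
inline unsigned char next_alnum(const char*& p, const char* end)
{
  // val is only read when p != end, in which case the loop has assigned it;
  // the {} initializer just keeps -Wmaybe-uninitialized quiet.
  unsigned char val{};
  while (p != end && (val = map_char(*p)) == 127)
    ++p;
  return p != end ? val : 0;
}
```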



[committed] libstdc++: Fix std::format test for Solaris [PR113450]

2024-01-18 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

When int8_t is a typedef for char (rather than signed char) this test
fails because it tries to format a char, which is treated differently
from formatting other integral types (including signed char).

Use signed char explicitly so the result doesn't depend on the
non-portable definition of int8_t.

libstdc++-v3/ChangeLog:

PR libstdc++/113450
* testsuite/std/format/functions/format.cc: Use signed char
instead of int8_t.
---
 libstdc++-v3/testsuite/std/format/functions/format.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc 
b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 63702edbd42..30c5fc22237 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -365,7 +365,7 @@ test_minmax()
 s = std::format("{:b}" , std::numeric_limits::max());
 VERIFY( s == '1' + ones );
   };
-  check(std::int8_t(0));
+  check((signed char)(0)); // int8_t is char on Solaris, see PR 113450
   check(std::int16_t(0));
   check(std::int32_t(0));
   check(std::int64_t(0));
-- 
2.43.0
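The type relationship behind this failure can be checked at compile time. In C++20 `std::format`, `char` is formatted as a character by default, while `signed char` and `unsigned char` are formatted as integers. `char` is a distinct type from both, even though it shares a representation with one of them. Whether `std::int8_t` names `char` or `signed char` is left to the platform. In the sketch below, `formats_as_character` is a hypothetical stand-in for that special case and is not a libstdc++ API.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// char, signed char and unsigned char are three distinct types.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// True on platforms (such as Solaris, per PR 113450) where int8_t is char.
constexpr bool int8_is_plain_char = std::is_same<std::int8_t, char>::value;

// Hypothetical classifier mirroring the char-only special case in formatting.
template <typename T>
constexpr bool formats_as_character()
{
  return std::is_same<T, char>::value;
}
```

Spelling the type as `signed char`, as the patch does, makes the test's expectations independent of `int8_is_plain_char`.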



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 01:34:53PM +0100, Richard Biener wrote:
> So - if we simply do
> 
>   /* If the input and output modes are both the same, we are done.  */
>   if (mode == GET_MODE (op0) || (mode == BLKmode && MEM_P (op0))
> ;
> 
> ?  After all if we want BLKmode we want a MEM, and if we have one
> we should be done already.  V_C_E isn't supposed to do any value
> transform.

We'd need to make sure that op0 is actually MEM, which is not the case.

The following patch seems to work though, the discover_nonconstant_array_refs_r
part helps for -O2, the expr.cc part is needed at all optimization levels.

2024-01-18  Jakub Jelinek  

* cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode
VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: If mode is
BLKmode and type a BITINT_TYPE from a non-BLKmode VAR_P with MEM
DECL_RTL, use adjust_address to BLKmode for it instead of expand_expr.

--- gcc/cfgexpand.cc.jj 2024-01-16 11:45:16.159326506 +0100
+++ gcc/cfgexpand.cc2024-01-18 13:41:54.579551282 +0100
@@ -6380,11 +6380,16 @@ discover_nonconstant_array_refs_r (tree
   /* References of size POLY_INT_CST to a fixed-size object must go
  through memory.  It's more efficient to force that here than
  to create temporary slots on the fly.
- RTL expansion expectes TARGET_MEM_REF to always address actual memory.  */
+ RTL expansion expectes TARGET_MEM_REF to always address actual memory.
+ Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
+ to BLKmode BITINT_TYPEs.  */
   else if (TREE_CODE (t) == TARGET_MEM_REF
   || (TREE_CODE (t) == MEM_REF
   && TYPE_SIZE (TREE_TYPE (t))
-  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
+  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
+  || (TREE_CODE (t) == VIEW_CONVERT_EXPR
+  && TREE_CODE (TREE_TYPE (t)) == BITINT_TYPE
+  && TYPE_MODE (TREE_TYPE (t)) == BLKmode))
 {
   tree base = get_base_address (t);
   if (base
--- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
+++ gcc/expr.cc 2024-01-18 13:38:19.677556646 +0100
@@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
  }
   }
 
+  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
+aren't promoted to registers.  */
+  if (op0 == NULL_RTX
+ && mode == BLKmode
+ && TREE_CODE (type) == BITINT_TYPE
+ && VAR_P (treeop0)
+ && DECL_MODE (treeop0) != BLKmode
+ && DECL_RTL_SET_P (treeop0)
+ && MEM_P (DECL_RTL (treeop0)))
+   op0 = adjust_address (DECL_RTL (treeop0), BLKmode, 0);
+
   if (!op0)
op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
NULL, inner_reference_p);


Jakub



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 18, 2024 at 01:16:45PM +0100, Richard Biener wrote:
> > > This doesn't actually do anything, because the base is TREE_ADDRESSABLE.
> > > The var gets both with -O0 and -O2 DECL_RTL like
> > > (mem/c:OI (plus:DI (reg/f:DI 95 virtual-stack-vars)
> > > (const_int -64 [0xffc0])) [2 bitint.2+0 S32 A128])
> > 
> > But then it's not promoted to register but instead somebody decides
> > it gets an integer mode instead of BLKmode.
> 
> It is promoted to register, expand_expr on treeop0 in that case
> sees it is OImode MEM and at least when optimize forces it into a register.
> 
> > > but the problem is that the expansion of the VAR_DECL because of the
> > > non-BLKmode is forced into a pseudo.
> > > --- gcc/expr.cc.jj2024-01-12 10:07:58.194851657 +0100
> > > +++ gcc/expr.cc   2024-01-18 11:56:07.142361031 +0100
> > > @@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
> > > }
> > >}
> > 
> > There's already an odd bit of code dealing with non-BLKmode to
> > BLKmode converts, suggesting those would need intermediate memory,
> > but likely not triggering because the base is a decl, not a
> > handled_component.  Does it work to go there also for DECL_P (treeop0)?
> 
> Tried that, it doesn't do anything interesting.  It can handle mostly
> the case where it is say a large structure element, the structure is
> BLKmode, but the element is not and then is cast to BLKmode
> VIEW_CONVERT_EXPR.
> 
> > > path which can't deal with BLKmode extraction from non-BLKmode.
> > > I guess we could in the above new expr.cc hunk perhaps
> > > also if (MEM_P (op0)) op0 = adjust_address (op0, BLKmode, 0);
> > 
> > Hmm, 'reduce_bit_field' is odd with V_C_E - if you disable that,
> > does it work?
> 
> Generally, we need reduce_bit_field to work as is even for _BitInt,
> in most spots it results in the reduction for bit-field precision
> which has to happen.
> If we'd somehow disable the
>   /* If the output type is a bit-field type, do an extraction.  */
>   else if (reduce_bit_field)
> return extract_bit_field (op0, TYPE_PRECISION (type), 0,
>   TYPE_UNSIGNED (type), NULL_RTX,
>   mode, mode, false, NULL);
> case for mode == BLKmode, then we'd trigger the
>   /* As a last resort, spill op0 to memory, and reload it in a
>  different mode.  */
>   else if (!MEM_P (op0))
> {
> case which would spill it again into memory and extract.  That would
> then not ICE, but I strongly doubt we'd be able to undo that later,
> at least not the stack allocations.  IMHO it is much better to keep
> the stuff in memory, instead of forcing it into a register and then
> forcing it into some other memory.
> 
> > How does it behave differently when the base is BLKmode instead
> > of OImode?
> 
> If op0 has BLKmode and mode is BLKmode too, then it triggers the
>   /* If the input and output modes are both the same, we are done.  */
>   if (mode == GET_MODE (op0))
> ;
> case and doesn't run into anything else, so the result is just the MEM
> with the array.  Which is what the hack to set BLKmode DECL_MODE achieves
> too.

So - if we simply do

  /* If the input and output modes are both the same, we are done.  */
  if (mode == GET_MODE (op0) || (mode == BLKmode && MEM_P (op0))
;

?  After all if we want BLKmode we want a MEM, and if we have one
we should be done already.  V_C_E isn't supposed to do any value
transform.

Richard.


[PATCH] tree-optimization/113475 - fix memory leak in phi_analyzer

2024-01-18 Thread Richard Biener
phi_analyzer leaks all phi_group objects it allocates.  The following
fixes this by maintaining a vector of allocated objects and releasing
them when the phi_analyzer object is destroyed.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/113475
* gimple-range-phi.h (phi_analyzer::m_phi_groups): New.
* gimple-range-phi.cc (phi_analyzer::phi_analyzer): Initialize.
(phi_analyzer::~phi_analyzer): Deallocate and free collected
phi_groups.
(phi_analyzer::process_phi): Record allocated phi_groups.
---
 gcc/gimple-range-phi.cc | 6 +-
 gcc/gimple-range-phi.h  | 1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-range-phi.cc b/gcc/gimple-range-phi.cc
index 5aee761c6f4..01900a35b32 100644
--- a/gcc/gimple-range-phi.cc
+++ b/gcc/gimple-range-phi.cc
@@ -254,7 +254,7 @@ phi_group::dump (FILE *f)
 
 // Construct a phi analyzer which uses range_query G to pick up values.
 
-phi_analyzer::phi_analyzer (range_query &g) : m_global (g)
+phi_analyzer::phi_analyzer (range_query &g) : m_global (g), m_phi_groups (vNULL)
 {
   m_work.create (0);
   m_work.safe_grow (20);
@@ -273,6 +273,9 @@ phi_analyzer::~phi_analyzer ()
   bitmap_obstack_release (&m_bitmaps);
   m_tab.release ();
   m_work.release ();
+  for (auto grp : m_phi_groups)
+delete grp;
+  m_phi_groups.release ();
 }
 
 //  Return the group, if any, that NAME is part of.  Do no analysis.
@@ -458,6 +461,7 @@ phi_analyzer::process_phi (gphi *phi)
  if (!cyc.range ().varying_p ())
{
  g = new phi_group (cyc);
+ m_phi_groups.safe_push (g);
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "PHI ANALYZER : New ");
diff --git a/gcc/gimple-range-phi.h b/gcc/gimple-range-phi.h
index 04747ba9784..a40aece5b22 100644
--- a/gcc/gimple-range-phi.h
+++ b/gcc/gimple-range-phi.h
@@ -87,6 +87,7 @@ protected:
 
   bitmap m_simple;   // Processed, not part of a group.
   bitmap m_current; // Potential group currently being analyzed.
+  vec<phi_group *> m_phi_groups;
   vec m_tab;
   bitmap_obstack m_bitmaps;
 };
-- 
2.35.3
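The ownership pattern the patch introduces (record every object handed out by `new`, then delete them all in the destructor) can be sketched in standard C++. `group` and `analyzer` are hypothetical stand-ins for `phi_group` and `phi_analyzer`, and `std::vector` replaces GCC's `vec`. `group` counts live instances so a leak would be visible.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for phi_group that counts live instances.
struct group
{
  static int live;
  group () { ++live; }
  ~group () { --live; }
};
int group::live = 0;

// Hypothetical analyzer: each group it allocates is also recorded, so the
// destructor can delete every one exactly once.
class analyzer
{
public:
  analyzer () = default;
  analyzer (const analyzer &) = delete;             // copying would double-delete
  analyzer &operator= (const analyzer &) = delete;

  ~analyzer ()
  {
    for (group *g : m_groups)
      delete g;
  }

  group *make_group ()
  {
    group *g = new group;
    m_groups.push_back (g);  // record ownership alongside the allocation
    return g;
  }

private:
  std::vector<group *> m_groups;
};
```

In ordinary C++, `std::vector<std::unique_ptr<group>>` would provide the same cleanup automatically and would also stay leak-free if `push_back` threw. The explicit loop mirrors the patch, which uses GCC's own `vec`.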


Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 01:16:45PM +0100, Richard Biener wrote:
> > This doesn't actually do anything, because the base is TREE_ADDRESSABLE.
> > The var gets both with -O0 and -O2 DECL_RTL like
> > (mem/c:OI (plus:DI (reg/f:DI 95 virtual-stack-vars)
> >   (const_int -64 [0xffc0])) [2 bitint.2+0 S32 A128])
> 
> But then it's not promoted to register but instead somebody decides
> it gets an integer mode instead of BLKmode.

It is promoted to register, expand_expr on treeop0 in that case
sees it is OImode MEM and at least when optimize forces it into a register.

> > but the problem is that the expansion of the VAR_DECL because of the
> > non-BLKmode is forced into a pseudo.
> > --- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
> > +++ gcc/expr.cc 2024-01-18 11:56:07.142361031 +0100
> > @@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
> >   }
> >}
> 
> There's already an odd bit of code dealing with non-BLKmode to
> BLKmode converts, suggesting those would need intermediate memory,
> but likely not triggering because the base is a decl, not a
> handled_component.  Does it work to go there also for DECL_P (treeop0)?

Tried that, it doesn't do anything interesting.  It can handle mostly
the case where it is say a large structure element, the structure is
BLKmode, but the element is not and then is cast to BLKmode
VIEW_CONVERT_EXPR.

> > path which can't deal with BLKmode extraction from non-BLKmode.
> > I guess we could in the above new expr.cc hunk perhaps
> > also if (MEM_P (op0)) op0 = adjust_address (op0, BLKmode, 0);
> 
> Hmm, 'reduce_bit_field' is odd with V_C_E - if you disable that,
> does it work?

Generally, we need reduce_bit_field to work as is even for _BitInt,
in most spots it results in the reduction for bit-field precision
which has to happen.
If we'd somehow disable the
  /* If the output type is a bit-field type, do an extraction.  */
  else if (reduce_bit_field)
return extract_bit_field (op0, TYPE_PRECISION (type), 0,
  TYPE_UNSIGNED (type), NULL_RTX,
  mode, mode, false, NULL);
case for mode == BLKmode, then we'd trigger the
  /* As a last resort, spill op0 to memory, and reload it in a
 different mode.  */
  else if (!MEM_P (op0))
{
case which would spill it again into memory and extract.  That would
then not ICE, but I strongly doubt we'd be able to undo that later,
at least not the stack allocations.  IMHO it is much better to keep
the stuff in memory, instead of forcing it into a register and then
forcing it into some other memory.

> How does it behave differently when the base is BLKmode instead
> of OImode?

If op0 has BLKmode and mode is BLKmode too, then it triggers the
  /* If the input and output modes are both the same, we are done.  */
  if (mode == GET_MODE (op0))
;
case and doesn't run into anything else, so the result is just the MEM
with the array.  Which is what the hack to set BLKmode DECL_MODE achieves
too.

Jakub



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Richard Biener
On Thu, 18 Jan 2024, Jakub Jelinek wrote:

> On Thu, Jan 18, 2024 at 08:27:51AM +0100, Richard Biener wrote:
> > On Thu, 18 Jan 2024, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > On aarch64 the backend decides to use non-BLKmode for some arrays
> > > like unsigned long[4] - OImode in that case, but the corresponding
> > > BITINT_TYPEs have BLKmode (like structures containing that many limb
> > > elements).  This both isn't a good idea (we really want such underlying 
> > > vars
> > > to live in memory and access them there, rather than live in registers and
> > > access their parts in there) and causes ICEs during expansion
> > > (VIEW_CONVERT_EXPR from such OImode array to BLKmode BITINT_TYPE), so the
> > > following patch makes sure such arrays reflect the BLKmode of BITINT_TYPEs
> > > it is accessed with (if any).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > So the issue is only manifesting during expansion?  I think it would
> > be better to detect the specific issue (V_C_E from register to BLKmode)
> > in discover_nonconstant_array_refs_r and force the register argument
> > to stack?
> 
> That doesn't really work, tried:
> --- gcc/cfgexpand.cc.jj   2024-01-16 11:45:16.159326506 +0100
> +++ gcc/cfgexpand.cc  2024-01-18 11:26:17.906447274 +0100
> @@ -6380,11 +6380,16 @@ discover_nonconstant_array_refs_r (tree
>/* References of size POLY_INT_CST to a fixed-size object must go
>   through memory.  It's more efficient to force that here than
>   to create temporary slots on the fly.
> - RTL expansion expectes TARGET_MEM_REF to always address actual memory.  
> */
> + RTL expansion expectes TARGET_MEM_REF to always address actual memory.
> + Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
> + to BLKmode BITINT_TYPEs.  */
>else if (TREE_CODE (t) == TARGET_MEM_REF
>  || (TREE_CODE (t) == MEM_REF
>  && TYPE_SIZE (TREE_TYPE (t))
> -&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
> +&& POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
> +|| (TREE_CODE (t) == VIEW_CONVERT_EXPR
> +&& TREE_CODE (TREE_TYPE (t)) == BITINT_TYPE
> +&& TYPE_MODE (TREE_TYPE (t)) == BLKmode))
>  {
>tree base = get_base_address (t);
>if (base
> This doesn't actually do anything, because the base is TREE_ADDRESSABLE.
> The var gets both with -O0 and -O2 DECL_RTL like
> (mem/c:OI (plus:DI (reg/f:DI 95 virtual-stack-vars)
> (const_int -64 [0xffc0])) [2 bitint.2+0 S32 A128])

But then it's not promoted to register but instead somebody decides
it gets an integer mode instead of BLKmode.

> but the problem is that the expansion of the VAR_DECL because of the
> non-BLKmode is forced into a pseudo.
> --- gcc/expr.cc.jj2024-01-12 10:07:58.194851657 +0100
> +++ gcc/expr.cc   2024-01-18 11:56:07.142361031 +0100
> @@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
> }
>}

There's already an odd bit of code dealing with non-BLKmode to
BLKmode converts, suggesting those would need intermediate memory,
but likely not triggering because the base is a decl, not a
handled_component.  Does it work to go there also for DECL_P (treeop0)?


> +  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
> +  aren't promoted to registers.  */
> +  if (op0 == NULL_RTX
> +   && mode == BLKmode
> +   && TREE_CODE (type) == BITINT_TYPE
> +   && modifier == EXPAND_NORMAL
> +   && VAR_P (treeop0)
> +   && DECL_MODE (treeop0) != BLKmode)
> + op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, EXPAND_MEMORY,
> + NULL, inner_reference_p);
> +
>if (!op0)
>   op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
>   NULL, inner_reference_p);
> doesn't work either, while we get the MEM op0 in that case, because
> mode != GET_MODE (op0) we still take the
>   /* If the output type is a bit-field type, do an extraction.  */
>   else if (reduce_bit_field)
> return extract_bit_field (op0, TYPE_PRECISION (type), 0,
>   TYPE_UNSIGNED (type), NULL_RTX,
>   mode, mode, false, NULL);
> path which can't deal with BLKmode extraction from non-BLKmode.
> I guess we could in the above new expr.cc hunk perhaps
> also if (MEM_P (op0)) op0 = adjust_address (op0, BLKmode, 0);

Hmm, 'reduce_bit_field' is odd with V_C_E - if you disable that,
does it work?

How does it behave differently when the base is BLKmode instead
of OImode?

Richard.


Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-18 Thread Ajit Agarwal
Hello Michael:

On 17/01/24 7:58 pm, Michael Matz wrote:
> Hello,
> 
> On Wed, 17 Jan 2024, Ajit Agarwal wrote:
> 
>>> first is even, since OOmode is only ok for even vsx register and its
>>> size makes it take two consecutive vsx registers.
>>>
>>> Hi Peter, is my understanding correct?
>>>
>>
>> I tried all the combinations in the past; RA is not allocating sequential
>> registers. I don't see any such code in RA that generates sequential
>> registers.
> 
> See HARD_REGNO_NREGS.  If you form a pseudo of a mode that's larger than a 
> native-sized hardreg (and the target is correctly set up) then the RA will 
> allocate the correct number of hardregs (consecutively) for this pseudo.  
> This is what Kewen was referring to by mentioning the OOmode for the new 
> hypothetical pseudo.  The individual parts of such pseudo will then need 
> to use subreg to access them.
> 
> So, when you work before RA you simply will transform this (I'm going to 
> use SImode and DImode for demonstration):
> 
>(set (reg:SI x) (mem:SI (addr)))
>(set (reg:SI y) (mem:SI (addr+4)))
>...
>( ...use1... (reg:SI x))
>( ...use2... (reg:SI y))
> 
> into this:
> 
>(set (reg:DI z) (mem:DI (addr)))
>...
>( ...use1... (subreg:SI (reg:DI z) 0))
>( ...use2... (subreg:SI (reg:DI z) 4))
> 
> For this to work the target needs to accept the (subreg...) in certain 
> operands of instruction patterns, which I assume was what Kewen also 
> referred to.  The register allocator will then assign hardregs X and X+1 
> to the pseudo-reg 'z'.  (Assuming that DImode is okay for hardreg X, and 
> HARD_REGNO_NREGS says that it needs two hardregs to hold DImode).
> 
> It will also replace the subregs by their appropriate concrete hardreg.
> 
> It seems your problems stem from trying to place your new pass somewhere 
> within the register-allocation pipeline, rather than simply completely 
> before.
> 

Thanks for the suggestions. It worked: with the above changes, the RA pass
allocates sequential registers.

I am working on common infrastructure, shared with AARCH64, for the register
pair load/store pass.

Thanks & Regards
Ajit

> 
> Ciao,
> Michael.


Re: [PATCH] libstdc++: Update baseline symbols for riscv64-linux

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 01:11:56PM +0100, Rainer Orth wrote:
> Andreas Schwab  writes:
> 
> > * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.
> 
> Speaking of baselines: is this a good time to update them for other
> targets, too, or would it be better to wait a little longer?

I think we usually do this in March or so, not in January.

Jakub



Re: [PATCH] libstdc++: Update baseline symbols for riscv64-linux

2024-01-18 Thread Rainer Orth
Andreas Schwab  writes:

>   * config/abi/post/riscv64-linux-gnu/baseline_symbols.txt: Update.

Speaking of baselines: is this a good time to update them for other
targets, too, or would it be better to wait a little longer?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix memory leak in vectorizable_store

2024-01-18 Thread Richard Biener
The following fixes a memory leak in vectorizable_store which happens
because the functions populating gvec_oprnds[i] will call .create ()
on the incoming vector, leaking what we've previously allocated.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-stmts.cc (vectorizable_store): Do not allocate
storage for gvec_oprnds elements.
---
 gcc/tree-vect-stmts.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cabd4e3ae86..69d76c3b350 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8772,7 +8772,7 @@ vectorizable_store (vec_info *vinfo,
   tree vec_mask = NULL;
   auto_delete_vec<auto_vec<tree>> gvec_oprnds (group_size);
   for (i = 0; i < group_size; i++)
-gvec_oprnds.quick_push (new auto_vec<tree> (ncopies));
+gvec_oprnds.quick_push (new auto_vec<tree> ());
 
   if (memory_access_type == VMAT_LOAD_STORE_LANES)
 {
-- 
2.35.3
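The leak can be modeled without GCC internals. In the sketch below, `model_vec` and `populate` are hypothetical. `model_vec` only counts allocations rather than allocating real memory, and its `create ()` behaves as the patch describes for the functions that populate `gvec_oprnds[i]`: it allocates fresh storage and makes any storage the vector already owned unreachable.

```cpp
#include <cassert>

// Hypothetical counting model of a vector whose create () allocates fresh
// storage without releasing storage it already owns (no real memory used).
struct model_vec
{
  static int live_blocks;
  bool owns = false;

  void create ()
  {
    ++live_blocks;   // a previously owned block, if any, is now unreachable
    owns = true;
  }

  void release ()
  {
    if (owns)
      {
        --live_blocks;
        owns = false;
      }
  }
};
int model_vec::live_blocks = 0;

// Stand-in for the callees that populate gvec_oprnds[i]: they call
// create () on whatever vector they are handed.
void populate (model_vec &v)
{
  v.create ();
}
```

Starting from a preallocated vector leaves one block behind after `release ()`, while starting from an empty one leaves nothing, which is why the patch drops the `(ncopies)` argument.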


Re: [RFC] Either fix or disable SME feature for `aarch64-w64-mingw32` target?

2024-01-18 Thread Radek Barton
Are there any further comments or suggestions? What needs to be done to get
this change merged? (Note that we don't have merge rights.)

Thank you.

Radek


[patch,avr,applied] Tabify some files.

2024-01-18 Thread Georg-Johann Lay

The C++ files in the avr backend use a mix of tabs and
8 spaces for indentation.  This patch rectifies that
according to the coding rules.

I applied this before the v14 branch is created so
that back-porting will be easier, at least from the new master
to v14.

Johann

--

gcc/
* config/avr/avr-log.cc: Tabify.
* config/avr/avr-devices.cc: Tabify.
* config/avr/avr-c.cc: Tabify.
* config/avr/driver-avr.cc: Tabify.
* config/avr/gen-avr-mmcu-texi.cc: Tabify.
* config/avr/gen-avr-mmcu-specs.cc: Tabify.

diff --git a/gcc/config/avr/gen-avr-mmcu-specs.cc b/gcc/config/avr/gen-avr-mmcu-specs.cc
index 72841b1bb42..02778aa3ce8 100644
--- a/gcc/config/avr/gen-avr-mmcu-specs.cc
+++ b/gcc/config/avr/gen-avr-mmcu-specs.cc
@@ -44,13 +44,13 @@
 #endif
 
 
-#define SPECFILE_DOC_URL\
+#define SPECFILE_DOC_URL\
   "https://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html"
 
-#define SPECFILE_USAGE_URL  \
+#define SPECFILE_USAGE_URL			\
   "https://gcc.gnu.org/gcc-5/changes.html"
 
-#define WIKI_URL\
+#define WIKI_URL	\
   "https://gcc.gnu.org/wiki/avr-gcc#spec-files"
 
 static const char header[] =
@@ -210,7 +210,7 @@ print_mcu (const avr_mcu_t *mcu)
 
   if (is_arch
   && (ARCH_AVR2 == arch_id
-  || ARCH_AVR25 == arch_id))
+	  || ARCH_AVR25 == arch_id))
 {
   // Leave "avr2" and "avr25" alone.  These two architectures are
   // the only ones that mix devices with 8-bit SP and 16-bit SP.
@@ -244,7 +244,7 @@ print_mcu (const avr_mcu_t *mcu)
 link_arch_spec = link_arch_flmap_spec;
 
   fprintf (f, "#\n"
-   "# Auto-generated specs for AVR ");
+	   "# Auto-generated specs for AVR ");
   if (is_arch)
 fprintf (f, "core architecture %s\n", arch->name);
   else
@@ -279,19 +279,19 @@ print_mcu (const avr_mcu_t *mcu)
   int n_flash = 1 + (mcu->flash_size - 1) / 0x1;
 
   fprintf (f, "*cc1_n_flash:\n"
-   "\t%%{!mn-flash=*:-mn-flash=%d}\n\n", n_flash);
+	   "\t%%{!mn-flash=*:-mn-flash=%d}\n\n", n_flash);
 
   fprintf (f, "*cc1_rmw:\n%s\n\n", rmw
-   ? "\t%{!mno-rmw: -mrmw}"
-   : "\t%{mrmw}");
+	   ? "\t%{!mno-rmw: -mrmw}"
+	   : "\t%{mrmw}");
 
   fprintf (f, "*cc1_errata_skip:\n%s\n\n", errata_skip
-   ? "\t%{!mno-skip-bug: -mskip-bug}"
-   : "\t%{!mskip-bug: -mno-skip-bug}");
+	   ? "\t%{!mno-skip-bug: -mskip-bug}"
+	   : "\t%{!mskip-bug: -mno-skip-bug}");
 
   fprintf (f, "*cc1_absdata:\n%s\n\n", absdata
-   ? "\t%{!mno-absdata: -mabsdata}"
-   : "\t%{mabsdata}");
+	   ? "\t%{!mno-absdata: -mabsdata}"
+	   : "\t%{mabsdata}");
 
   // -m[no-]rodata-in-ram basically affects linking, but sanity-check early.
   fprintf (f, "*cc1_misc:\n\t%%(check_rodata_in_ram)\n\n");
@@ -306,18 +306,18 @@ print_mcu (const avr_mcu_t *mcu)
 
 #ifdef HAVE_AS_AVR_MRMW_OPTION
   fprintf (f, "*asm_rmw:\n%s\n\n", rmw
-   ? "\t%{!mno-rmw: -mrmw}"
-   : "\t%{mrmw}");
+	   ? "\t%{!mno-rmw: -mrmw}"
+	   : "\t%{mrmw}");
 #endif // have avr-as -mrmw
 
 #ifdef HAVE_AS_AVR_MGCCISR_OPTION
   fprintf (f, "*asm_gccisr:\n%s\n\n",
-   "\t%{!mno-gas-isr-prologues: -mgcc-isr}");
+	   "\t%{!mno-gas-isr-prologues: -mgcc-isr}");
 #endif // have avr-as -mgcc-isr
 
   fprintf (f, "*asm_errata_skip:\n%s\n\n", errata_skip
-   ? "\t%{mno-skip-bug}"
-   : "\t%{!mskip-bug: -mno-skip-bug}");
+	   ? "\t%{mno-skip-bug}"
+	   : "\t%{!mskip-bug: -mno-skip-bug}");
 
   fprintf (f, "*asm_misc:\n" /* empty */ "\n\n");
 
@@ -349,14 +349,14 @@ print_mcu (const avr_mcu_t *mcu)
 {
   fprintf (f, "*link_data_start:\n");
   if (mcu->data_section_start
-  != arch->default_data_section_start)
-fprintf (f, "\t%%{!Tdata:-Tdata 0x%lX}",
- 0x80UL + mcu->data_section_start);
+	  != arch->default_data_section_start)
+	fprintf (f, "\t%%{!Tdata:-Tdata 0x%lX}",
+		 0x80UL + mcu->data_section_start);
   fprintf (f, "\n\n");
 
   fprintf (f, "*link_text_start:\n");
   if (mcu->text_section_start != 0x0)
-fprintf (f, "\t%%{!Ttext:-Ttext 0x%lX}", 0UL + mcu->text_section_start);
+	fprintf (f, "\t%%{!Ttext:-Ttext 0x%lX}", 0UL + mcu->text_section_start);
   fprintf (f, "\n\n");
 }
 
diff --git a/gcc/config/avr/gen-avr-mmcu-texi.cc b/gcc/config/avr/gen-avr-mmcu-texi.cc
index d928236e172..70aa430902e 100644
--- a/gcc/config/avr/gen-avr-mmcu-texi.cc
+++ b/gcc/config/avr/gen-avr-mmcu-texi.cc
@@ -118,23 +118,23 @@ comparator (const void *va, const void *vb)
 
   if (*a != *b)
 	return *a - *b;
-  
+
   a++;
   b++;
 }
 
   return *a - *b;
-} 
+}
 
 static void
 print_mcus (size_t n_mcus)
 {
   int duplicate = 0;
   size_t i;
-
+
   if (!n_mcus)
 return;
-
+
   qsort (mcus, n_mcus, sizeof (avr_mcu_t*), comparator);
 
   printf ("@*@var{mcu}@tie{}=");
diff --git a/gcc/config/avr/driver-avr.cc 

Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
And
--- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
+++ gcc/expr.cc 2024-01-18 12:08:16.412147569 +0100
@@ -12382,6 +12382,21 @@ expand_expr_real_1 (tree exp, rtx target
  }
   }
 
+  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
+aren't promoted to registers.  */
+  if (op0 == NULL_RTX
+ && mode == BLKmode
+ && TREE_CODE (type) == BITINT_TYPE
+ && modifier == EXPAND_NORMAL
+ && VAR_P (treeop0)
+ && DECL_MODE (treeop0) != BLKmode)
+   {
+ op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, EXPAND_MEMORY,
+ NULL, inner_reference_p);
+ if (MEM_P (op0))
+   op0 = adjust_address (op0, BLKmode, 0);
+   }
+
   if (!op0)
op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
NULL, inner_reference_p);
helps at -O0, but not at -O2, where even EXPAND_MEMORY doesn't actually
produce a MEM_P result; the value is still forced into a REG.

Jakub



Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Iain Sandoe
Hi Arthur,

> On 18 Jan 2024, at 10:30, Arthur Cohen  wrote:

> On 1/18/24 10:13, Rainer Orth wrote:
>> Arthur Cohen  writes:
>>> Using %lu to format size_t values breaks 32-bit targets, and %zu is not
>>> supported by one of the hosts GCC aims to support (HP-UX).
>> But we do have uses of %zu in gcc/rust already!
>>> diff --git a/gcc/rust/expand/rust-proc-macro.cc b/gcc/rust/expand/rust-proc-macro.cc
>>> index e8618485b71..09680733e98 100644
>>> --- a/gcc/rust/expand/rust-proc-macro.cc
>>> +++ b/gcc/rust/expand/rust-proc-macro.cc
>>> @@ -171,7 +171,7 @@ load_macros (std::string path)
>>>if (array == nullptr)
>>>  return {};
>>>  -  rust_debug ("Found %lu procedural macros", array->length);
>>> +  rust_debug ("Found %lu procedural macros", (unsigned long) array->length);
>> Not the best way either: array->length is std::uint64_t, so the format
>> should use
>> ... %" PRIu64 " procedural...
>> instead.
>> I've attached my patch to PR rust/113461.
> 
> Yes, I was talking about this on IRC the other day: if we do run into a
> situation where we have more than UINT32_MAX procedural macros in memory, we
> have big issues. These debug prints will probably end up getting removed soon 
> as they clutter the output a lot for little information.
> 
> I don't mind doing it the right way for our regular prints, but we have not 
> been using PRIu64 in our codebase so far, so I'd rather change all those 
> offending format specifiers at once later down the line; this patch was
> pushed so that 32bit targets could bootstrap the Rust frontend for now.

For the sake of completeness, the issue does not just affect 32-bit hosts: if a
64-bit host chooses (as Darwin does, so that 32-bit and 64-bit targets have the
same representation) to make uint64_t “unsigned long long int”, then %lu breaks
there too.
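To make the portable alternative suggested above concrete, here is a minimal C++ sketch, assuming the count is a std::uint64_t as in the rust-proc-macro.cc hunk. `format_macro_count` is a hypothetical helper, not part of the Rust frontend. The PRIu64 macro from <cinttypes> expands to whichever conversion matches std::uint64_t on the host, so the same source is correct whether that type is `unsigned long` or `unsigned long long`, and no cast can truncate values above UINT32_MAX.

```cpp
#include <cinttypes>  // PRIu64
#include <cstdint>    // std::uint64_t
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical helper: format a 64-bit count without casting it.  PRIu64
// expands to the conversion that matches std::uint64_t on this host,
// whether that type is unsigned long or unsigned long long.
std::string
format_macro_count (std::uint64_t length)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "Found %" PRIu64 " procedural macros",
                 length);
  return std::string (buf);
}
```

For example, `format_macro_count (5000000000)` prints the full value even on hosts where `unsigned long` is 32 bits, where the `(unsigned long)` cast would truncate it.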
thanks
Iain



Re: [PATCH] lower-bitint: Force some arrays corresponding to large/huge _BitInt SSA_NAMEs to BLKmode

2024-01-18 Thread Jakub Jelinek
On Thu, Jan 18, 2024 at 08:27:51AM +0100, Richard Biener wrote:
> On Thu, 18 Jan 2024, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > On aarch64 the backend decides to use non-BLKmode for some arrays
> > like unsigned long[4] - OImode in that case, but the corresponding
> > BITINT_TYPEs have BLKmode (like structures containing that many limb
> > elements).  This both isn't a good idea (we really want such underlying vars
> > to live in memory and access them there, rather than live in registers and
> > access their parts in there) and causes ICEs during expansion
> > (VIEW_CONVERT_EXPR from such OImode array to BLKmode BITINT_TYPE), so the
> > following patch makes sure such arrays reflect the BLKmode of BITINT_TYPEs
> > it is accessed with (if any).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> So the issue is only manifesting during expansion?  I think it would
> be better to detect the specific issue (V_C_E from register to BLKmode)
> in discover_nonconstant_array_refs_r and force the register argument
> to stack?

That doesn't really work; I tried:
--- gcc/cfgexpand.cc.jj 2024-01-16 11:45:16.159326506 +0100
+++ gcc/cfgexpand.cc2024-01-18 11:26:17.906447274 +0100
@@ -6380,11 +6380,16 @@ discover_nonconstant_array_refs_r (tree
   /* References of size POLY_INT_CST to a fixed-size object must go
  through memory.  It's more efficient to force that here than
  to create temporary slots on the fly.
- RTL expansion expectes TARGET_MEM_REF to always address actual memory.  */
+ RTL expansion expectes TARGET_MEM_REF to always address actual memory.
+ Also, force to stack non-BLKmode vars accessed through VIEW_CONVERT_EXPR
+ to BLKmode BITINT_TYPEs.  */
   else if (TREE_CODE (t) == TARGET_MEM_REF
   || (TREE_CODE (t) == MEM_REF
   && TYPE_SIZE (TREE_TYPE (t))
-  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t)
+  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
+  || (TREE_CODE (t) == VIEW_CONVERT_EXPR
+  && TREE_CODE (TREE_TYPE (t)) == BITINT_TYPE
+  && TYPE_MODE (TREE_TYPE (t)) == BLKmode))
 {
   tree base = get_base_address (t);
   if (base
This doesn't actually do anything, because the base is TREE_ADDRESSABLE.
With both -O0 and -O2 the var gets a DECL_RTL like
(mem/c:OI (plus:DI (reg/f:DI 95 virtual-stack-vars)
  (const_int -64 [0xffffffffffffffc0])) [2 bitint.2+0 S32 A128])
but the problem is that, because of its non-BLKmode mode, the expansion of
the VAR_DECL is forced into a pseudo.
--- gcc/expr.cc.jj  2024-01-12 10:07:58.194851657 +0100
+++ gcc/expr.cc 2024-01-18 11:56:07.142361031 +0100
@@ -12382,6 +12382,17 @@ expand_expr_real_1 (tree exp, rtx target
  }
   }
 
+  /* Ensure non-BLKmode array VAR_DECLs VCEd to BLKmode BITINT_TYPE
+aren't promoted to registers.  */
+  if (op0 == NULL_RTX
+ && mode == BLKmode
+ && TREE_CODE (type) == BITINT_TYPE
+ && modifier == EXPAND_NORMAL
+ && VAR_P (treeop0)
+ && DECL_MODE (treeop0) != BLKmode)
+   op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, EXPAND_MEMORY,
+   NULL, inner_reference_p);
+
   if (!op0)
op0 = expand_expr_real (treeop0, NULL_RTX, VOIDmode, modifier,
NULL, inner_reference_p);
doesn't work either: although we get a MEM op0 in that case, because
mode != GET_MODE (op0) we still take the
  /* If the output type is a bit-field type, do an extraction.  */
  else if (reduce_bit_field)
return extract_bit_field (op0, TYPE_PRECISION (type), 0,
  TYPE_UNSIGNED (type), NULL_RTX,
  mode, mode, false, NULL);
path, which can't deal with BLKmode extraction from non-BLKmode.
I guess that in the new expr.cc hunk above we could perhaps
also do if (MEM_P (op0)) op0 = adjust_address (op0, BLKmode, 0);

Jakub



[wwwdocs] Document new additions to libstdc++

2024-01-18 Thread Jonathan Wakely
Pushed to wwwdocs.

-- >8 --

std::generator, std::format improvements, std::text_encoding.
---
 htdocs/gcc-14/changes.html | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 5644de1e..951d005b 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -245,6 +245,7 @@ a work-in-progress.
   Improved experimental support for C++20, including:
 
 std::chrono::parse.
+Unicode-aware string handling in std::format.
 
   
   Improved experimental support for C++23, including:
@@ -252,6 +253,9 @@ a work-in-progress.
 The std::ranges::to function for converting
   ranges to containers.
 
+The std::generator view for getting results from
+  coroutines.
+
 The stacktrace header is supported by default.
 
 std::print and std::println
@@ -272,7 +276,13 @@ a work-in-progress.
 Functions for saturation arithmetic on integers.
 std::to_string now uses std::format.
 Enhanced formatting of pointers with std::format.
+The std::runtime_format function to allow using
+  non-literal format strings with std::format.
 Testable result types for charconv functions.
+The std::text_encoding class for identifying character
+  sets (requires linking with -lstdc++exp for some member
+  functions).
+
 
   
   Faster numeric conversions using std::to_string and
-- 
2.43.0



[PATCH] Fix memory leak in vect_analyze_loop_form

2024-01-18 Thread Richard Biener
The following fixes a memory leak in vect_analyze_loop_form, which fails
to free the loop body it gets.  It also allows more countable exits,
matching what we can handle later when we decide which exit to use
as the main exit.  Finally, some comments that no longer apply are adjusted.
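The leak follows a common pattern: a heap-allocated array is freed on the normal path but not on an early return. Below is a minimal C++ sketch of the fixed shape, using plain std::malloc/std::free to mirror the patch. `get_fake_body` and `body_ok` are hypothetical stand-ins for get_loop_body and the control-flow check, with each array element modelling one block's successor count.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for get_loop_body (): returns a malloc'ed array
// that the caller owns and must free.  Element i models the successor
// count of block i.
static int *
get_fake_body (std::size_t n)
{
  int *body = static_cast<int *> (std::malloc (n * sizeof (int)));
  if (body)
    for (std::size_t i = 0; i < n; i++)
      body[i] = static_cast<int> (i % 3);
  return body;
}

// Returns false if any block has more than MAX_SUCCS successors.  The
// array is released on the early-failure path as well as on the normal
// path, which is the shape of the leak fix.
static bool
body_ok (std::size_t n, int max_succs)
{
  int *bbs = get_fake_body (n);
  if (!bbs)
    return false;
  for (std::size_t i = 0; i < n; i++)
    if (bbs[i] > max_succs)
      {
        std::free (bbs);  // early return: free before leaving
        return false;
      }
  std::free (bbs);
  return true;
}
```

In new C++ code a std::vector or a std::unique_ptr with a free-based deleter would release the array on every path automatically; the explicit calls here only mirror the style of the patch.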

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-loop.cc (vec_init_loop_exit_info): Adjust comment,
prefer all later exits we can handle.
(vect_analyze_loop_form): Free the allocated loop body.
Adjust comments.
---
 gcc/tree-vect-loop.cc | 45 ++-
 1 file changed, 19 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 330c4571c8d..c815c606f21 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -977,8 +977,8 @@ vec_init_loop_exit_info (class loop *loop)
   if (exits.length () == 1)
 return exits[0];
 
-  /* If we have multiple exits we only support counting IV at the moment.  Analyze
- all exits and return one */
+  /* If we have multiple exits we only support counting IV at the moment.
+ Analyze all exits and return the last one we can analyze.  */
   class tree_niter_desc niter_desc;
   edge candidate = NULL;
   for (edge exit : exits)
@@ -990,7 +990,9 @@ vec_init_loop_exit_info (class loop *loop)
  && !chrec_contains_undetermined (niter_desc.niter))
{
  tree may_be_zero = niter_desc.may_be_zero;
- if (integer_zerop (may_be_zero)
+ if ((integer_zerop (may_be_zero)
+  || integer_nonzerop (may_be_zero)
+  || COMPARISON_CLASS_P (may_be_zero))
  && (!candidate
  || dominated_by_p (CDI_DOMINATORS, exit->src,
 candidate->src)))
@@ -1745,14 +1747,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 
   /* Check if we have any control flow that doesn't leave the loop.  */
   class loop *v_loop = loop->inner ? loop->inner : loop;
-  basic_block *bbs= get_loop_body (v_loop);
+  basic_block *bbs = get_loop_body (v_loop);
   for (unsigned i = 0; i < v_loop->num_nodes; i++)
 if (EDGE_COUNT (bbs[i]->succs) != 1
&& (EDGE_COUNT (bbs[i]->succs) != 2
|| !loop_exits_from_bb_p (bbs[i]->loop_father, bbs[i])))
-  return opt_result::failure_at (vect_location,
-"not vectorized:"
-" unsupported control flow in loop.\n");
+  {
+   free (bbs);
+   return opt_result::failure_at (vect_location,
+  "not vectorized:"
+  " unsupported control flow in loop.\n");
+  }
+  free (bbs);
 
   /* Different restrictions apply when we are considering an inner-most loop,
  vs. an outer (nested) loop.
@@ -1761,17 +1767,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
   info->inner_loop_cond = NULL;
   if (!loop->inner)
 {
-  /* Inner-most loop.  We currently require that the number of BBs is
-exactly 2 (the header and latch).  Vectorizable inner-most loops
-look like this:
-
-(pre-header)
-   |
-  header <+
-   | ||
-   | +--> latch --+
-   |
-(exit-bb)  */
+  /* Inner-most loop.  */
 
   if (empty_block_p (loop->header))
return opt_result::failure_at (vect_location,
@@ -1783,7 +1779,8 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
   edge entryedge;
 
   /* Nested loop. We currently require that the loop is doubly-nested,
-contains a single inner loop, and the number of BBs is exactly 5.
+contains a single inner loop with a single exit to the block
+with the single exit condition in the outer loop.
 Vectorizable outer-loops look like this:
 
(pre-header)
@@ -1796,7 +1793,7 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
   |
(exit-bb)
 
-The inner-loop has the properties expected of inner-most loops
+The inner-loop also has the properties expected of inner-most loops
 as described above.  */
 
   if ((loop->inner)->inner || (loop->inner)->next)
@@ -1845,16 +1842,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
   "not vectorized:"
   " too many incoming edges.\n");
 
-  /* We assume that the loop exit condition is at the end of the loop. i.e,
- that the loop is represented as a do-while (with a proper if-guard
- before the loop if needed), where the loop header contains all the
- executable statements, and the latch is empty.  */
+  

Re: Add -falign-all-functions

2024-01-18 Thread Richard Biener
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > On Wed, 17 Jan 2024, Jan Hubicka wrote:
> > 
> > > > 
> > > > I meant the new option might be named -fmin-function-alignment=
> > > > rather than -falign-all-functions because of how it should
> > > > override all other options.
> > > 
> > > I was also pondering about both names.  -falign-all-functions has the
> > > advantage that it is similar to all the other alignment flags that are
> > > all called -falign-XXX
> > > 
> > > But both options are fine for me.
> > > > 
> > > > Otherwise is there an updated patch to look at?
> > > 
> > > I will prepare one.  So shall I drop the max-skip support for alignment
> > > and rename the flag?
> > 
> > Yes.
> OK, here is updated version.
> Bootstrapped/regtested on x86_64-linux, OK?
> 
> gcc/ChangeLog:
> 
>   * common.opt (flimit-function-alignment): Reorder so file is
>   alphabetically ordered.
>   (flimit-function-alignment): New flag.

fmin-function-alignment

OK with that change.

Thanks,
Richard.
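The invoke.texi text in this patch describes -falign-functions=n as aligning to the next power of two greater than or equal to n. As a worked sketch of that rounding only (a hypothetical helper, not GCC's own code): n = 5 rounds up to 8, n = 16 stays 16, and n = 17 rounds up to 32.

```cpp
#include <cstdint>

// Hypothetical helper: the smallest power of two that is >= n.
// Assumes 1 <= n <= 2^31 so the shift cannot overflow.
constexpr std::uint32_t
next_pow2 (std::uint32_t n)
{
  std::uint32_t p = 1;
  while (p < n)
    p <<= 1;  // double until we reach or pass n
  return p;
}

static_assert (next_pow2 (5) == 8, "5 rounds up to 8");
```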

>   * doc/invoke.texi (-fmin-function-alignment): Document
>   (-falign-jumps,-falign-labels): Document that this is an optimization
>   bypassed in cold code.
>   * varasm.cc (assemble_start_function): Honor -fmin-function-alignment.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5f0a101bccb..6e85853f086 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1040,9 +1040,6 @@ Align the start of functions.
>  falign-functions=
>  Common RejectNegative Joined Var(str_align_functions) Optimization
>  
> -flimit-function-alignment
> -Common Var(flag_limit_function_alignment) Optimization Init(0)
> -
>  falign-jumps
>  Common Var(flag_align_jumps) Optimization
>  Align labels which are only reached by jumping.
> @@ -2277,6 +2274,10 @@ fmessage-length=
>  Common RejectNegative Joined UInteger
>  -fmessage-length=Limit diagnostics to  characters per line.  0 suppresses line-wrapping.
>  
> +fmin-function-alignment=
> +Common Joined RejectNegative UInteger Var(flag_min_function_alignment) Optimization
> +Align the start of every function.
> +
>  fmodulo-sched
>  Common Var(flag_modulo_sched) Optimization
>  Perform SMS based modulo scheduling before the first scheduling pass.
> @@ -2601,6 +2602,9 @@ starts and when the destructor finishes.
>  flifetime-dse=
>  Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization IntegerRange(0, 2)
>  
> +flimit-function-alignment
> +Common Var(flag_limit_function_alignment) Optimization Init(0)
> +
>  flive-patching
>  Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 43fd3c3a3cd..456374d9446 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -546,6 +546,7 @@ Objective-C and Objective-C++ Dialects}.
>  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
>  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
>  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}
> +-fmin-function-alignment=[@var{n}]
>  -fno-allocation-dce -fallow-store-data-races
>  -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}]
>  -fauto-inc-dec  -fbranch-probabilities
> @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two greater than or
>  equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
>  without crossing an @var{n}-byte alignment boundary.
> +This is an optimization of code performance and alignment is ignored for
> +functions considered cold.  If alignment is required for all functions,
> +use @option{-fmin-function-alignment}.
>  
>  If @var{m} is not specified, it defaults to @var{n}.
>  
> @@ -14240,6 +14244,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align loops to a power-of-two boundary.  If the loops are executed
>  many times, this makes up for any execution of the dummy padding
>  instructions.
> +This is an optimization of code performance and alignment is ignored for
> +loops considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14262,6 +14268,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align branch targets to a power-of-two boundary, for branch targets
>  where the targets can only be reached by jumping.  In this case,
>  no dummy operations need be executed.
> +This is an optimization of code performance and alignment is ignored for
> +jumps considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14275,6 +14283,14 @@ The maximum allowed @var{n} option value is 65536.
>  
>  Enabled at levels @option{-O2}, @option{-O3}.
>  
> +@opindex fmin-function-alignment=@var{n}
> +@item -fmin-function-alignment
> +Specify minimal alignment of functions to the next power-of-two greater than or
> +equal to @var{n}. Unlike 

Re: [PATCH] libstdc++: Fix constexpr _Safe_iterator in C++20 mode

2024-01-18 Thread Jonathan Wakely
On Thu, 18 Jan 2024 at 02:48, Patrick Palka wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

Please add PR109536 to the commit message.



>
> -- >8 --
>
> Some _Safe_iterator member functions define a variable of non-literal
> type __gnu_cxx::__scoped_lock, which automatically disqualifies them from
> being constexpr in C++20 mode even if that code path is never constant
> evaluated.  This restriction was lifted by P2242R3 for C++23, but we
> need to work around it in C++20 mode.  To that end this patch defines
> a pair of macros that encapsulate the lambda-based workaround mentioned
> in that paper and uses them to make the functions valid C++20 constexpr
> functions.  The augmented std::vector test element_access/constexpr.cc
> now successfully compiles in C++20 mode with -D_GLIBCXX_DEBUG (and it
> tests all modified member functions).
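The lambda-based workaround described above can be sketched in isolation. This is a minimal C++ example under stated assumptions, not the libstdc++ macros themselves: `demo_mutex` and `add_one` are hypothetical, and std::lock_guard stands in for __gnu_cxx::__scoped_lock as a type that is not a literal type (its destructor is not constexpr). Before P2242R3 a constexpr function body could not define such a variable directly, but it can call an immediately invoked lambda that does; that lambda is simply not constexpr, which only matters if the locking path is ever constant-evaluated.

```cpp
#include <mutex>

// Hypothetical lock standing in for _M_get_mutex ().
static std::mutex demo_mutex;

// Hypothetical constexpr function with a run-time-only locking path.
constexpr int
add_one (int value, bool take_lock)
{
  if (take_lock)
    // std::lock_guard is not a literal type, so before P2242R3 it could not
    // be defined directly in a constexpr function body.  Defining it inside
    // an immediately invoked lambda keeps add_one a valid constexpr
    // function; the lambda itself is just not constexpr.
    [&] () -> void {
      std::lock_guard<std::mutex> guard (demo_mutex);
      value += 1;
    } ();
  else
    value += 1;
  return value;
}

// The branch without the lock can still be constant-evaluated.
static_assert (add_one (41, false) == 42, "constant-evaluated path");
```

The static_assert exercises the lock-free branch at compile time, while a run-time call such as `add_one (1, true)` takes the lock.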
>
> libstdc++-v3/ChangeLog:
>
> * include/debug/safe_base.h (_Safe_sequence_base::_M_swap):
> Remove _GLIBCXX20_CONSTEXPR.
> * include/debug/safe_iterator.h (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN):
> (_GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END): Define.
> (_Safe_iterator::operator=): Use them around the code path that
> defines a variable of type __gnu_cxx::__scoped_lock.
> (_Safe_iterator::operator++): Likewise.
> (_Safe_iterator::operator--): Likewise.
> (_Safe_iterator::operator+=): Likewise.
> (_Safe_iterator::operator-=): Likewise.
> * testsuite/23_containers/vector/element_access/constexpr.cc
> (test_iterators): Also test copy and move assignment.
> * testsuite/std/ranges/adaptors/all.cc (test08) [_GLIBCXX_DEBUG]:
> Use std::vector unconditionally.
> ---
>  libstdc++-v3/include/debug/safe_base.h|  1 -
>  libstdc++-v3/include/debug/safe_iterator.h| 48 ++-
>  .../vector/element_access/constexpr.cc|  2 +
>  .../testsuite/std/ranges/adaptors/all.cc  |  4 --
>  4 files changed, 38 insertions(+), 17 deletions(-)
>
> diff --git a/libstdc++-v3/include/debug/safe_base.h b/libstdc++-v3/include/debug/safe_base.h
> index 107fef3cb02..d5fbe4b1320 100644
> --- a/libstdc++-v3/include/debug/safe_base.h
> +++ b/libstdc++-v3/include/debug/safe_base.h
> @@ -268,7 +268,6 @@ namespace __gnu_debug
>   *  operation is complete all iterators that originally referenced
>   *  one container now reference the other container.
>   */
> -_GLIBCXX20_CONSTEXPR
>  void
>  _M_swap(_Safe_sequence_base& __x) _GLIBCXX_USE_NOEXCEPT;
>
> diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
> index 1bc7c904ee0..929fd9b0ade 100644
> --- a/libstdc++-v3/include/debug/safe_iterator.h
> +++ b/libstdc++-v3/include/debug/safe_iterator.h
> @@ -65,6 +65,20 @@
>_GLIBCXX_DEBUG_VERIFY_OPERANDS(_Lhs, _Rhs, __msg_distance_bad,   \
>  __msg_distance_different)
>
> +// This pair of macros helps with writing valid C++20 constexpr functions that
> +// contain a non-constexpr code path that defines a non-literal variable, which
> +// was otherwise disallowed until P2242R3 for C++23.  We use them below for
> +// __gnu_cxx::__scoped_lock so that the containing functions are still
> +// considered valid C++20 constexpr functions.
> +
> +#if __cplusplus >= 202002L && __cpp_constexpr < 202110L
> +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN [&]() -> void { do
> +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END while(false); }();

Do we need the do-while to create a single statement from the block?
Isn't the lambda body enough to create a single statement from it,
which can't be broken by a dangling else or anything like that?


> +#else
> +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN
> +# define _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
> +#endif
> +
>  namespace __gnu_debug
>  {
>/** Helper struct to deal with sequence offering a before_begin
> @@ -266,11 +280,11 @@ namespace __gnu_debug
>   ._M_iterator(__x, "other"));
>
> if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
> - {
> + _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
> __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
> base() = __x.base();
> _M_version = __x._M_sequence->_M_version;
> - }
> + } _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_END
> else
>   {
> _M_detach();
> @@ -306,11 +320,11 @@ namespace __gnu_debug
>   return *this;
>
> if (this->_M_sequence && this->_M_sequence == __x._M_sequence)
> - {
> + _GLIBCXX20_CONSTEXPR_NON_LITERAL_SCOPE_BEGIN {
> __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
> base() = __x.base();
> _M_version = __x._M_sequence->_M_version;
> - }
> + } 

Re: [PATCH] RISC-V: Support vi variant for vec_cmp

2024-01-18 Thread Kito Cheng
LGTM, thanks :)

On Thu, Jan 18, 2024 at 5:59 PM Juzhe-Zhong  wrote:
>
> While running various benchmarks, I noticed that we lack vi-variant support for
> integer comparisons.
> That is, we can vectorize code into vadd.vi, but we can't vectorize into
> vmseq.vi.
>
> Consider the following case:
>
> void
> foo (int n, int **__restrict a)
> {
>   int b;
>   int c;
>   int d;
>   for (b = 0; b < n; b++)
>     for (long e = 8; e > 0; e--)
>       a[b][e] = a[b][e] == 15;
> }
>
> Before this patch:
>
> vsetivli zero,4,e32,m1,ta,ma
> vmv.v.i v4,15
> vmv.v.i v3,1
> vmv.v.i v2,0
> .L3:
> ld  a5,0(a1)
> addi a4,a5,4
> addi a5,a5,20
> vle32.v v1,0(a5)
> vle32.v v0,0(a4)
> vmseq.vv v0,v0,v4
>
> After this patch:
>
> ld  a5,0(a1)
> addi a4,a5,4
> addi a5,a5,20
> vle32.v v1,0(a5)
> vle32.v v0,0(a4)
> vmseq.vi v0,v0,15
>
> This feature was missing because of an oversight on our part; this patch
> supports the vi variant for vec_cmp, as other patterns (add, sub, etc.) already do.
>
> Tested with no regressions; OK for trunk?
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Support vi variant.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-3.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-4.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-5.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-6.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-7.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-8.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/cmp_vi-9.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/macro.h: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  4 +--
>  .../riscv/rvv/autovec/cmp/cmp_vi-1.c  | 16 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-2.c  | 16 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-3.c  | 28 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-4.c  | 28 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-5.c  | 16 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-6.c  | 16 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-7.c  | 28 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-8.c  | 28 +++
>  .../riscv/rvv/autovec/cmp/cmp_vi-9.c  | 18 
>  .../gcc.target/riscv/rvv/autovec/cmp/macro.h  | 11 
>  11 files changed, 207 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-9.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/macro.h
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 706cd9717cb..5ec1c59bdd4 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -664,7 +664,7 @@
>[(set (match_operand: 0 "register_operand")
> (match_operator: 1 "comparison_operator"
>   [(match_operand:V_VLSI 2 "register_operand")
> -  (match_operand:V_VLSI 3 "register_operand")]))]
> +  (match_operand:V_VLSI 3 "nonmemory_operand")]))]
>"TARGET_VECTOR"
>{
>  riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> @@ -677,7 +677,7 @@
>[(set (match_operand: 0 "register_operand")
> (match_operator: 1 "comparison_operator"
>   [(match_operand:V_VLSI 2 "register_operand")
> -  (match_operand:V_VLSI 3 "register_operand")]))]
> +  (match_operand:V_VLSI 3 "nonmemory_operand")]))]
>"TARGET_VECTOR"
>{
>  riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
> new file mode 100644
> index 000..10c232f77bd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +#include "macro.h"
> +
> +CMP_VI (ne_char, char, n, !=, 15)
> +CMP_VI 

[PATCH] RISC-V: Support vi variant for vec_cmp

2024-01-18 Thread Juzhe-Zhong
While running various benchmarks, I noticed that we lack vi-variant support for
integer comparisons.
That is, we can vectorize code into vadd.vi, but we can't vectorize into
vmseq.vi.

Consider the following case:

void
foo (int n, int **__restrict a)
{
  int b;
  int c;
  int d;
  for (b = 0; b < n; b++)
    for (long e = 8; e > 0; e--)
      a[b][e] = a[b][e] == 15;
}

Before this patch:

vsetivli zero,4,e32,m1,ta,ma
vmv.v.i v4,15
vmv.v.i v3,1
vmv.v.i v2,0
.L3:
ld  a5,0(a1)
addi a4,a5,4
addi a5,a5,20
vle32.v v1,0(a5)
vle32.v v0,0(a4)
vmseq.vv v0,v0,v4

After this patch:

ld  a5,0(a1)
addi a4,a5,4
addi a5,a5,20
vle32.v v1,0(a5)
vle32.v v0,0(a4)
vmseq.vi v0,v0,15

This feature was missing because of an oversight on our part; this patch
supports the vi variant for vec_cmp, as other patterns (add, sub, etc.) already do.
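For context on why the tests compare against 15: in the RVV 1.0 encoding, .vi forms such as vmseq.vi carry the constant as a 5-bit signed immediate (simm5), so only values in [-16, 15] can be encoded directly. The helper below is a hypothetical illustration of that range check, not the logic in riscv_vector::expand_vec_cmp, and it ignores per-comparison details (for example, RVV 1.0 has no vmslt.vi).

```cpp
#include <cstdint>

// Hypothetical helper: true if C fits the 5-bit signed immediate (simm5)
// of an RVV .vi instruction such as vmseq.vi, i.e. -16 <= C <= 15.
constexpr bool
fits_simm5 (std::int64_t c)
{
  return c >= -16 && c <= 15;
}
```

A constant such as 16 fails this check and would need a register operand instead.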

Tested with no regressions; OK for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md: Support vi variant.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-4.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-5.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-6.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-7.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-8.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/cmp_vi-9.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/macro.h: New test.

---
 gcc/config/riscv/autovec.md   |  4 +--
 .../riscv/rvv/autovec/cmp/cmp_vi-1.c  | 16 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-2.c  | 16 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-3.c  | 28 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-4.c  | 28 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-5.c  | 16 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-6.c  | 16 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-7.c  | 28 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-8.c  | 28 +++
 .../riscv/rvv/autovec/cmp/cmp_vi-9.c  | 18 
 .../gcc.target/riscv/rvv/autovec/cmp/macro.h  | 11 
 11 files changed, 207 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/macro.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 706cd9717cb..5ec1c59bdd4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -664,7 +664,7 @@
   [(set (match_operand: 0 "register_operand")
(match_operator: 1 "comparison_operator"
  [(match_operand:V_VLSI 2 "register_operand")
-  (match_operand:V_VLSI 3 "register_operand")]))]
+  (match_operand:V_VLSI 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
@@ -677,7 +677,7 @@
   [(set (match_operand: 0 "register_operand")
(match_operator: 1 "comparison_operator"
  [(match_operand:V_VLSI 2 "register_operand")
-  (match_operand:V_VLSI 3 "register_operand")]))]
+  (match_operand:V_VLSI 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
new file mode 100644
index 000..10c232f77bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/cmp_vi-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "macro.h"
+
+CMP_VI (ne_char, char, n, !=, 15)
+CMP_VI (ne_short, short, n, !=, 15)
+CMP_VI (ne_int, int, n, !=, 15)
+CMP_VI (ne_long, long, n, !=, 15)
+CMP_VI (ne_unsigned_char, unsigned char, n, !=, 15)
+CMP_VI (ne_unsigned_short, unsigned short, n, !=, 15)
+CMP_VI (ne_unsigned_int, unsigned int, n, !=, 15)
+CMP_VI (ne_unsigned_long, unsigned long, n, !=, 

Re: [PATCH v4] RISC-V: Introduce XTheadVector as a subset of V1.0.0

2024-01-18 Thread Kito Cheng
LGTM

On Fri, Jan 12, 2024 at 3:32 PM juzhe.zh...@rivai.ai
 wrote:
>
> This patch needs Kito's review; I can't approve it.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Jun Sha (Joshua)
> Date: 2024-01-12 11:20
> To: gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
> christoph.muellner; juzhe.zhong; kito.cheng; Jun Sha (Joshua); Jin Ma; 
> Xianmiao Qu
> Subject: [PATCH v4] RISC-V: Introduce XTheadVector as a subset of V1.0.0
> This patch is to introduce basic XTheadVector support
> (march string parsing and a test for __riscv_xtheadvector)
> according to https://github.com/T-head-Semi/thead-extension-spec/
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_subset_list::parse): Add new vendor extension.
> * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
> Add test macro.
> * config/riscv/riscv.opt:  Add new mask.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
> * gcc.target/riscv/rvv/xtheadvector.c: New test.
>
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
> gcc/common/config/riscv/riscv-common.cc   | 23 +++
> gcc/config/riscv/riscv-c.cc   |  8 +--
> gcc/config/riscv/riscv.opt|  2 ++
> .../riscv/predef-__riscv_th_v_intrinsic.c | 11 +
> .../gcc.target/riscv/rvv/xtheadvector.c   | 13 +++
> 5 files changed, 55 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 0301d170a41..449722070d4 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -368,6 +368,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
>{"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
>{"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1251,6 +1252,15 @@ riscv_subset_list::check_conflict_ext ()
>if (lookup ("zcmp"))
> error_at (m_loc, "%<-march=%s%>: zcd conflicts with zcmp", m_arch);
>  }
> +
> +  if ((lookup ("v") || lookup ("zve32x")
> + || lookup ("zve64x") || lookup ("zve32f")
> + || lookup ("zve64f") || lookup ("zve64d")
> + || lookup ("zvl32b") || lookup ("zvl64b")
> + || lookup ("zvl128b") || lookup ("zvfh"))
> + && lookup ("xtheadvector"))
> +error_at (m_loc, "%<-march=%s%>: xtheadvector conflicts with vector "
> +"extension or its sub-extensions", m_arch);
> }
> /* Parsing function for multi-letter extensions.
> @@ -1743,6 +1753,19 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
>{"xtheadmemidx",  _options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
>{"xtheadmempair", _options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
>{"xtheadsync",_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
> +  {"xtheadvector",  _options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
> +  {"xtheadvector",  _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_32},
> +  {"xtheadvector",  _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
> +  {"xtheadvector",  _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
> +  {"xtheadvector",  _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},
> +  {"xtheadvector",  _options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16},
> +  {"xtheadvector",  _options::x_riscv_zvl_flags, MASK_ZVL32B},
> +  {"xtheadvector",  _options::x_riscv_zvl_flags, MASK_ZVL64B},
> +  {"xtheadvector",  _options::x_riscv_zvl_flags, MASK_ZVL128B},
> +  {"xtheadvector",  _options::x_riscv_zf_subext, MASK_ZVFHMIN},
> +  {"xtheadvector",  _options::x_riscv_zf_subext, MASK_ZVFH},
> +  {"xtheadvector",  _options::x_target_flags, MASK_FULL_V},
> +  {"xtheadvector",  _options::x_target_flags, MASK_VECTOR},
>{"xventanacondops", _options::x_riscv_xventana_subext, MASK_XVENTANACONDOPS},
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index ba60cd8b555..422ddc2c308 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -142,6 +142,10 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
>  riscv_ext_version_value (0, 11));
>  }
> +   if (TARGET_XTHEADVECTOR)
> + builtin_define_with_int_value ("__riscv_th_v_intrinsic",
> +  riscv_ext_version_value (0, 11));
> +
>/* Define architecture extension test macros.  */
>builtin_define_with_int_value ("__riscv_arch_test", 1);
> @@ -195,8 +199,8 @@ riscv_pragma_intrinsic (cpp_reader *)
>  {
>if (!TARGET_VECTOR)
> {
> -   error ("%<#pragma riscv intrinsic%> option %qs needs 'V' extension "
> - 

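The new rule in check_conflict_ext rejects any -march string that combines xtheadvector with V or with one of the listed vector sub-extensions. A standalone sketch of that rule follows; `xtheadvector_conflicts` and the std::set representation are hypothetical stand-ins for riscv_subset_list and its lookup() method, and the extension list is copied from the patch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the check added to
// riscv_subset_list::check_conflict_ext: true when xtheadvector is combined
// with V or one of the vector sub-extensions named in the patch.
bool xtheadvector_conflicts (const std::set<std::string> &exts)
{
  static const char *const vector_exts[] = {
    "v", "zve32x", "zve64x", "zve32f", "zve64f", "zve64d",
    "zvl32b", "zvl64b", "zvl128b", "zvfh"
  };
  if (exts.count ("xtheadvector") == 0)
    return false;
  for (const char *e : vector_exts)
    if (exts.count (e) != 0)
      return true;
  return false;
}
```

For example, `rv64gc_xtheadvector` passes, while `rv64gcv_xtheadvector` is rejected with the "xtheadvector conflicts with vector extension or its sub-extensions" error.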
Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Rainer Orth
Hi Arthur,

> Yes, I was talking about this on IRC the other day - if we do run in a
> situation where we have more than UINT32_MAX procedural macros in memory 
> we have big issues. These debug prints will probably end up getting removed
> soon as they clutter the output a lot for little information.

makes sense, especially if they break the build once in a while ;-)

> I don't mind doing it the right way for our regular prints, but we have not
> been using PRIu64 in our codebase so far, so I'd rather change all those
> incriminating format specifiers at once later down the line - this patch
> was pushed so that 32bit targets could bootstrap the Rust frontend for now.

Makes sense: using different styles throughout the codebase only creates
confusion.

On a related issue: didn't you have some 32-bit host in your CI?  I
remember having similar issues in the past which could easily be avoided
in advance this way.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Arthur Cohen

Hi Rainer,

On 1/18/24 10:13, Rainer Orth wrote:
> Arthur Cohen  writes:
>
>> Using %lu to format size_t values breaks 32 bit targets, and %zu is not
>> supported by one of the hosts GCC aims to support - HPUX
>
> But we do have uses of %zu in gcc/rust already!
>
>> diff --git a/gcc/rust/expand/rust-proc-macro.cc b/gcc/rust/expand/rust-proc-macro.cc
>> index e8618485b71..09680733e98 100644
>> --- a/gcc/rust/expand/rust-proc-macro.cc
>> +++ b/gcc/rust/expand/rust-proc-macro.cc
>> @@ -171,7 +171,7 @@ load_macros (std::string path)
>>    if (array == nullptr)
>>      return {};
>>  
>> -  rust_debug ("Found %lu procedural macros", array->length);
>> +  rust_debug ("Found %lu procedural macros", (unsigned long) array->length);
>
> Not the best way either: array->length is std::uint64_t, so the format
> should use
>
> ... %" PRIu64 " procedural...
>
> instead.
>
> I've attached my patch to PR rust/113461.


Yes, I was talking about this on IRC the other day - if we do run in a 
situation where we have more than UINT32_MAX procedural macros in memory 
we have big issues. These debug prints will probably end up getting 
removed soon as they clutter the output a lot for little information.


I don't mind doing it the right way for our regular prints, but we have 
not been using PRIu64 in our codebase so far, so I'd rather change all 
those incriminating format specifiers at once later down the line - this 
patch was pushed so that 32bit targets could bootstrap the Rust frontend 
for now.


Best,

Arthur





Re: [PATCH] libstdc++: hashtable: No need to update before begin node in _M_remove_bucket_begin

2024-01-18 Thread Huanghui Nie
Yes, I have. I did a benchmark today.

In short: the run time drops by 0.4% to 1.2% for unordered_set::erase(begin())
and by 1.2% to 2.4% for erase(begin(), end()).


My test environment:

CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 2393.365 MHz, 56 CPUs

MEM: 256G

OS: CentOS-8.2

g++: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

Compile flags: -O3 -std=c++17


Results (time taken to erase 100 million elements):

erase(begin()):

| unordered_set size | 100      | 1,000    | 10,000   | 100,000  | 1,000,000 | 10,000,000 |
| Baseline (ms)      | 3827.736 | 3807.725 | 3830.168 | 3807.373 | 3798.713  | 3854.168   |
| Patched (ms)       | 3783.406 | 3789.460 | 3791.146 | 3778.033 | 3783.494  | 3808.137   |
| Reduction          | 1.16%    | 0.48%    | 1.02%    | 0.77%    | 0.40%     | 1.19%      |

erase(begin(), end()):

| unordered_set size | 100      | 1,000    | 10,000   | 100,000  | 1,000,000 | 10,000,000 |
| Baseline (ms)      | 2779.229 | 2768.550 | 2795.778 | 2767.385 | 2761.521  | 2804.099   |
| Patched (ms)       | 2712.759 | 2726.578 | 2752.224 | 2732.140 | 2718.953  | 2739.727   |
| Reduction          | 2.39%    | 1.52%    | 1.56%    | 1.27%    | 1.54%     | 2.30%      |


Please see the attachment for test code and detailed test result.
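The actual test code is only in the attachment, which is not reproduced here. The sketch below is a hypothetical micro-benchmark of the same general shape (fill a std::unordered_set, then empty it with erase(begin())), timing only the erase phase; the function name, key type and timing method are assumptions, and it is not expected to reproduce the figures above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <unordered_set>

// Hypothetical benchmark: repeatedly fill a set with `size` keys and empty it
// via erase(begin()) until at least `total` elements have been erased.
// Returns the time spent erasing, in milliseconds; *erased receives the count.
double time_erase_begin (std::size_t size, std::size_t total, std::size_t *erased)
{
  assert (size > 0);
  std::unordered_set<std::size_t> s;
  std::chrono::steady_clock::duration spent{};
  std::size_t done = 0;
  while (done < total)
    {
      for (std::size_t i = 0; i < size; ++i)
        s.insert (i);
      auto t0 = std::chrono::steady_clock::now ();
      while (!s.empty ())
        {
          s.erase (s.begin ());
          ++done;
        }
      spent += std::chrono::steady_clock::now () - t0;
    }
  *erased = done;
  return std::chrono::duration<double, std::milli> (spent).count ();
}
```

Comparing such timings meaningfully requires building the same program against the unpatched and the patched libstdc++ headers.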

On Thu, 18 Jan 2024 at 04:04, François Dumont wrote:

> Hi
>
> Looks like a great finding to me, this is indeed a useless check, thanks!
>
> Do you have any figures on the performance improvement? They might help
> to get approval, as GCC is currently in development stage 4, which
> normally accepts only bug fixes.
>
> François
> On 17/01/2024 09:11, Huanghui Nie wrote:
>
> Hi.
>
> While implementing a hash table modeled on the C++ STL, I found that when
> the STL hash table erases elements and the first erased element is the
> begin element, the before-begin node is assigned twice. This creates
> unnecessary performance overhead.
>
>
> First, let’s see the code implementation:
>
> In _M_remove_bucket_begin, _M_before_begin._M_nxt is assigned when
> &_M_before_begin == _M_buckets[__bkt]. That also means
> _M_buckets[__bkt]->_M_nxt is assigned under some conditions.
>
> _M_remove_bucket_begin is called by _M_erase and _M_extract_node:
>
>1. Case _M_erase a range: _M_remove_bucket_begin is called in a for
>loop when __is_bucket_begin is true. And if __is_bucket_begin is true and
>&_M_before_begin == _M_buckets[__bkt], __prev_n must be &_M_before_begin.
>__prev_n->_M_nxt is always assigned in _M_erase. That means
>_M_before_begin._M_nxt is always assigned, if _M_remove_bucket_begin is
>called and &_M_before_begin == _M_buckets[__bkt]. So there’s no need to
>assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
>2. Other cases: _M_remove_bucket_begin is called when __prev_n ==
>_M_buckets[__bkt]. And __prev_n->_M_nxt is always assigned in _M_erase and
>_M_before_begin. That means _M_buckets[__bkt]->_M_nxt is always assigned.
>So there's no need to assign _M_buckets[__bkt]->_M_nxt in
>_M_remove_bucket_begin.
>
> In summary, there’s no need to check &_M_before_begin == _M_buckets[__bkt]
> and assign _M_before_begin._M_nxt in _M_remove_bucket_begin.
>
>
> Then let’s see the responsibility of each method:
>
> The hash table in the C++ STL is composed of hash buckets and a node list.
> Updating the node list is the responsibility of the _M_erase and
> _M_extract_node methods; _M_remove_bucket_begin only needs to update the
> hash buckets. Updating _M_before_begin is part of updating the node list,
> so _M_remove_bucket_begin doesn't need to update _M_before_begin.
>
>
> Existing tests listed below cover this change:
>
> 23_containers/unordered_set/allocator/copy.cc
>
> 23_containers/unordered_set/allocator/copy_assign.cc
>
> 23_containers/unordered_set/allocator/move.cc
>
> 23_containers/unordered_set/allocator/move_assign.cc
>
> 23_containers/unordered_set/allocator/swap.cc
>
> 23_containers/unordered_set/erase/1.cc
>
> 23_containers/unordered_set/erase/24061-set.cc
>
> 23_containers/unordered_set/modifiers/extract.cc
>
> 23_containers/unordered_set/operations/count.cc
>
> 23_containers/unordered_set/requirements/exception/basic.cc
>
> 23_containers/unordered_map/allocator/copy.cc
>
> 23_containers/unordered_map/allocator/copy_assign.cc
>
> 23_containers/unordered_map/allocator/move.cc
>
> 23_containers/unordered_map/allocator/move_assign.cc
>
> 23_containers/unordered_map/allocator/swap.cc
>
> 23_containers/unordered_map/erase/1.cc
>
> 23_containers/unordered_map/erase/24061-map.cc
>
> 23_containers/unordered_map/modifiers/extract.cc
>
> 23_containers/unordered_map/modifiers/move_assign.cc
>
> 23_containers/unordered_map/operations/count.cc
>
> 23_containers/unordered_map/requirements/exception/basic.cc
>
>
> Regression tested on x86_64-pc-linux-gnu. Is it OK to commit?
>
>
> ---
>
> ChangeLog:
>
>
> libstdc++: hashtable: No need to update before begin node in
> _M_remove_bucket_begin
>
>
> 2024-01-16  Huanghui 

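To make the _M_remove_bucket_begin argument quoted above concrete, here is a heavily simplified model of the libstdc++ hashtable layout: one singly linked node list headed by a before-begin sentinel, with each non-empty bucket pointing at the node that precedes its first element. `MiniTable`, `Node` and `bkt` are hypothetical names, and hash caching, element counts and rehashing are left out. It shows that when the erased node is the begin node, `prev` is `&before_begin`, so the final `prev->next = n->next` in `erase` already updates the sentinel and `remove_bucket_begin` only has to fix bucket pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of the libstdc++ _Hashtable layout (not the real code).
struct Node
{
  Node *next = nullptr;
  std::size_t key = 0;
};

struct MiniTable
{
  Node before_begin;              // sentinel heading the single node list
  std::vector<Node *> buckets;    // node *before* each bucket's first node

  explicit MiniTable (std::size_t n) : buckets (n, nullptr) {}

  std::size_t bkt (const Node *n) const { return n->key % buckets.size (); }

  // Insert at the beginning of n's bucket, in the style of _M_insert_bucket_begin.
  void insert (Node *n)
  {
    std::size_t b = bkt (n);
    if (buckets[b])
      {
        n->next = buckets[b]->next;
        buckets[b]->next = n;
      }
    else
      {
        n->next = before_begin.next;
        before_begin.next = n;
        if (n->next)
          buckets[bkt (n->next)] = n;
        buckets[b] = &before_begin;
      }
  }

  // Bucket bookkeeping only, as the patch proposes: no before_begin.next update.
  void remove_bucket_begin (std::size_t b, Node *next, std::size_t next_b)
  {
    if (!next || next_b != b)
      {
        if (next)
          buckets[next_b] = buckets[b];
        buckets[b] = nullptr;
      }
  }

  // Unlink n given its predecessor prev, following the shape of _M_erase.
  void erase (Node *prev, Node *n)
  {
    std::size_t b = bkt (n);
    if (prev == buckets[b])
      remove_bucket_begin (b, n->next, n->next ? bkt (n->next) : 0);
    else if (n->next)
      {
        std::size_t next_b = bkt (n->next);
        if (next_b != b)
          buckets[next_b] = prev;
      }
    // Node-list update: when prev is &before_begin, this is the one
    // assignment to before_begin.next that is actually needed.
    prev->next = n->next;
  }

  void erase_begin ()
  {
    assert (before_begin.next != nullptr);
    erase (&before_begin, before_begin.next);
  }
};
```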
Re: [PATCH] i386: Add -masm=intel profiling support [PR113122]

2024-01-18 Thread Uros Bizjak
On Thu, Jan 18, 2024 at 8:31 AM Jakub Jelinek  wrote:
>
> Hi!
>
> x86_function_profiler emits assembly directly into file and only emits
> AT&T syntax.  The following patch adjusts it to emit MASM syntax
> if -masm=intel.
> As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
>
> I've tested using
> for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
> ./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
> ./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
> objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
> diff -up /tmp/1 /tmp/2; done
> that the emitted sequences are identical after assembly.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-01-18  Jakub Jelinek  
>
> PR target/113122
> * config/i386/i386.cc (x86_function_profiler): Add -masm=intel
> support.  Add missing space after , in emitted assembly in some
> cases.  Formatting fixes.
>
> * gcc.target/i386/pr113122-1.c: New test.
> * gcc.target/i386/pr113122-2.c: New test.
> * gcc.target/i386/pr113122-3.c: New test.
> * gcc.target/i386/pr113122-4.c: New test.

LGTM.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.cc.jj  2024-01-05 15:22:21.810685516 +0100
> +++ gcc/config/i386/i386.cc 2024-01-17 16:52:48.026177278 +0100
> @@ -22746,7 +22746,10 @@ x86_function_profiler (FILE *file, int l
>if (TARGET_64BIT)
>  {
>  #ifndef NO_PROFILE_COUNTERS
> -  fprintf (file, "\tleaq\t%sP%d(%%rip),%%r11\n", LPREFIX, labelno);
> +  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "\tlea\tr11, %sP%d[rip]\n", LPREFIX, labelno);
> +  else
> +   fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno);
>  #endif
>
>if (!TARGET_PECOFF)
> @@ -22757,12 +22760,29 @@ x86_function_profiler (FILE *file, int l
>   /* NB: R10 is caller-saved.  Although it can be used as a
>  static chain register, it is preserved when calling
>  mcount for nested functions.  */
> - fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> -  mcount_name);
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "1:\tmovabs\tr10, OFFSET FLAT:%s\n"
> +  "\tcall\tr10\n", mcount_name);
> + else
> +   fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> +mcount_name);
>   break;
> case CM_LARGE_PIC:
>  #ifdef NO_PROFILE_COUNTERS
> - fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   {
> + fprintf (file, "1:movabs\tr11, "
> +"OFFSET FLAT:_GLOBAL_OFFSET_TABLE_-1b\n");
> + fprintf (file, "\tlea\tr10, 1b[rip]\n");
> + fprintf (file, "\tadd\tr10, r11\n");
> + fprintf (file, "\tmovabs\tr11, OFFSET FLAT:%s@PLTOFF\n",
> +  mcount_name);
> + fprintf (file, "\tadd\tr10, r11\n");
> + fprintf (file, "\tcall\tr10\n");
> + break;
> +   }
> + fprintf (file,
> +  "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
>   fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
>   fprintf (file, "\taddq\t%%r11, %%r10\n");
>   fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name);
> @@ -22776,7 +22796,12 @@ x86_function_profiler (FILE *file, int l
> case CM_MEDIUM_PIC:
>   if (!ix86_direct_extern_access)
> {
> - fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n",
> +mcount_name);
> + else
> +   fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
> +mcount_name);
>   break;
> }
>   /* fall through */
> @@ -22791,23 +22816,37 @@ x86_function_profiler (FILE *file, int l
>else if (flag_pic)
>  {
>  #ifndef NO_PROFILE_COUNTERS
> -  fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
> -  LPREFIX, labelno);
> +  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file,
> +"\tlea\t" PROFILE_COUNT_REGISTER ", %sP%d@GOTOFF[ebx]\n",
> +LPREFIX, labelno);
> +  else
> +   fprintf (file,
> +"\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER "\n",
> +LPREFIX, labelno);
>  #endif
> -  fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> +  if 

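The core of the change is choosing between two spellings of the same instruction. Below is a minimal standalone sketch of the 64-bit profile-counter load; the `Dialect` enum is a hypothetical stand-in for ASSEMBLER_DIALECT, and `.L` is assumed as the ELF value of LPREFIX. Intel syntax puts the destination first and writes RIP-relative addressing as `[rip]`, while AT&T syntax puts the destination last and needs `%` on registers (escaped as `%%` in the format string).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for ASSEMBLER_DIALECT.
enum class Dialect { Att, Intel };

// Emit the profile-counter address load in the requested dialect,
// using the same format strings as the patch.
std::string counter_load (Dialect d, const char *lprefix, int labelno)
{
  char buf[128];
  if (d == Dialect::Intel)
    std::snprintf (buf, sizeof buf, "\tlea\tr11, %sP%d[rip]\n", lprefix, labelno);
  else
    std::snprintf (buf, sizeof buf, "\tleaq\t%sP%d(%%rip), %%r11\n", lprefix, labelno);
  return buf;
}
```

Both strings assemble to the same instruction, which is what the objdump comparison loop in the patch description checks.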
[patch,avr,applied] Minor fixes in device-specs generation

2024-01-18 Thread Georg-Johann Lay

This fixes a typo in a diagnostic emitted by a spec.
It also reuses one spec to simplify the specs file.

Johann

--

AVR: Fix typo in device-specs generation.  Reuse -m[no-]rodata-in-ram 
checker.


gcc/
* config/avr/gen-avr-mmcu-specs.cc (diagnose_mrodata_in_ram): Fix typo
in the diagnostic, and capitalize the device name.
(print_mcu): Generate specs such that:
<*check_rodata_in_ram>: New.
<*cc1_misc>: Use check_rodata_in_ram instead of cc1_rodata_in_ram.
<*link_misc>: Use check_rodata_in_ram instead of link_rodata_in_ram.
<*cc1_rodata_in_ram, *link_rodata_in_ram>: Remove.

diff --git a/gcc/config/avr/gen-avr-mmcu-specs.cc b/gcc/config/avr/gen-avr-mmcu-specs.cc
index eb9ab8854d8..72841b1bb42 100644
--- a/gcc/config/avr/gen-avr-mmcu-specs.cc
+++ b/gcc/config/avr/gen-avr-mmcu-specs.cc
@@ -143,22 +143,28 @@ diagnose_mrodata_in_ram (FILE *f, const char *spec, const avr_mcu_t *mcu)
   const bool rodata_in_flash = (arch_id == ARCH_AVRTINY
 || (arch_id == ARCH_AVRXMEGA3
 && have_avrxmega3_rodata_in_flash));
+  // Device name as used by the vendor, extracted from "__AVR_<Name>__".
+  char mcu_Name[50] = { 0 };
+  if (! is_arch)
+snprintf (mcu_Name, 1 + strlen (mcu->macro) - strlen ("__AVR___"),
+	  "%s", mcu->macro + strlen ("__AVR_"));
+
   fprintf (f, "%s:\n", spec);
   if (rodata_in_flash && is_arch)
-fprintf (f, "\t%%{mrodata-in-ram: %%e-mrodata-in-ram not supported"
+fprintf (f, "\t%%{mrodata-in-ram: %%e-mrodata-in-ram is not supported"
 	 " for %s}", mcu->name);
   else if (rodata_in_flash)
-fprintf (f, "\t%%{mrodata-in-ram: %%e-mrodata-in-ram not supported"
-	 " for %s (arch=%s)}", mcu->name, arch->name);
+fprintf (f, "\t%%{mrodata-in-ram: %%e-mrodata-in-ram is not supported"
+	 " for %s (arch=%s)}", mcu_Name, arch->name);
   else if (is_arch)
 {
   if (! have_flmap2 && ! have_flmap4)
-	fprintf (f, "\t%%{mno-rodata-in-ram: %%e-mno-rodata-in-ram not"
+	fprintf (f, "\t%%{mno-rodata-in-ram: %%e-mno-rodata-in-ram is not"
 		 " supported for %s}", mcu->name);
 }
   else if (! have_flmap)
-fprintf (f, "\t%%{mno-rodata-in-ram: %%e-mno-rodata-in-ram not supported"
-	 " for %s (arch=%s)}", mcu->name, arch->name);
+fprintf (f, "\t%%{mno-rodata-in-ram: %%e-mno-rodata-in-ram is not supported"
+	 " for %s (arch=%s)}", mcu_Name, arch->name);
   fprintf (f, "\n\n");
 }
 
@@ -265,6 +271,9 @@ print_mcu (const avr_mcu_t *mcu)
 }
 #endif  // WITH_AVRLIBC
 
+  // Diagnose usage of -m[no-]rodata-in-ram.
+  diagnose_mrodata_in_ram (f, "*check_rodata_in_ram", mcu);
+
   // avr-gcc specific specs for the compilation / the compiler proper.
 
   int n_flash = 1 + (mcu->flash_size - 1) / 0x1;
@@ -285,9 +294,7 @@ print_mcu (const avr_mcu_t *mcu)
: "\t%{mabsdata}");
 
   // -m[no-]rodata-in-ram basically affects linking, but sanity-check early.
-  diagnose_mrodata_in_ram (f, "*cc1_rodata_in_ram", mcu);
-
-  fprintf (f, "*cc1_misc:\n\t%%(cc1_rodata_in_ram)\n\n");
+  fprintf (f, "*cc1_misc:\n\t%%(check_rodata_in_ram)\n\n");
 
   // avr-gcc specific specs for assembling / the assembler.
 
@@ -332,9 +339,6 @@ print_mcu (const avr_mcu_t *mcu)
 
   fprintf (f, "*link_relax:\n\t%s\n\n", LINK_RELAX_SPEC);
 
-  // -m[no-]rodata-in-ram affects linking.  Sanity check its usage.
-  diagnose_mrodata_in_ram (f, "*link_rodata_in_ram", mcu);
-
   fprintf (f, "*link_arch:\n\t%s", link_arch_spec);
   if (is_device
   && flash_pm_offset)
@@ -356,7 +360,8 @@ print_mcu (const avr_mcu_t *mcu)
   fprintf (f, "\n\n");
 }
 
-  fprintf (f, "*link_misc:\n\t%%(link_rodata_in_ram)\n\n");
+  // -m[no-]rodata-in-ram affects linking.  Sanity check its usage.
+  fprintf (f, "*link_misc:\n\t%%(check_rodata_in_ram)\n\n");
 
   // Specs known to GCC.
 

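The mcu_Name extraction relies on the snprintf size argument to drop the trailing `__`: for `__AVR_ATmega328P__` (18 characters) the size is 1 + 18 - 8 = 11, so at most 10 characters plus the terminating NUL are written and `ATmega328P` comes out. A standalone sketch of the same computation follows; `vendor_name` is a hypothetical wrapper, and its asserts are added guards rather than part of the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Strip the "__AVR_" prefix and the trailing "__" from a device macro,
// using the same snprintf size computation as the patch.
std::string vendor_name (const char *macro)
{
  char name[50] = { 0 };
  const std::size_t len = std::strlen (macro);
  assert (len >= std::strlen ("__AVR___"));
  assert (1 + len - std::strlen ("__AVR___") <= sizeof name);
  std::snprintf (name, 1 + len - std::strlen ("__AVR___"),
                 "%s", macro + std::strlen ("__AVR_"));
  return name;
}
```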

Re: [COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Rainer Orth
Arthur Cohen  writes:

> Using %lu to format size_t values breaks 32 bit targets, and %zu is not
> supported by one of the hosts GCC aims to support - HPUX

But we do have uses of %zu in gcc/rust already!

> diff --git a/gcc/rust/expand/rust-proc-macro.cc b/gcc/rust/expand/rust-proc-macro.cc
> index e8618485b71..09680733e98 100644
> --- a/gcc/rust/expand/rust-proc-macro.cc
> +++ b/gcc/rust/expand/rust-proc-macro.cc
> @@ -171,7 +171,7 @@ load_macros (std::string path)
>if (array == nullptr)
>  return {};
>  
> -  rust_debug ("Found %lu procedural macros", array->length);
> +  rust_debug ("Found %lu procedural macros", (unsigned long) array->length);

Not the best way either: array->length is std::uint64_t, so the format
should use

... %" PRIu64 " procedural...

instead.

I've attached my patch to PR rust/113461.
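Rainer's suggestion in sketch form: PRIu64 from <cinttypes> expands to the correct conversion for std::uint64_t on the host (for example `lu` on LP64 and `llu` on ILP32 glibc targets), so no cast and no truncation are involved. snprintf stands in for rust_debug here, on the assumption that rust_debug takes a printf-style format; `found_macros_message` is a hypothetical name.

```cpp
#include <cassert>
#include <cinttypes>   // PRIu64
#include <cstdint>
#include <cstdio>
#include <string>

// Format a std::uint64_t count portably with PRIu64.
std::string found_macros_message (std::uint64_t length)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "Found %" PRIu64 " procedural macros", length);
  return buf;
}
```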

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[COMMITTED] rust_debug: Cast size_t values to unsigned long before printing.

2024-01-18 Thread Arthur Cohen
Using %lu to format size_t values breaks 32 bit targets, and %zu is not
supported by one of the hosts GCC aims to support - HPUX
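The committed approach, sketched with snprintf standing in for rust_debug_loc (assumed to be printf-like): cast the value to unsigned long and print it with %lu, which avoids %zu entirely. `candidates_message` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Print a size_t through an explicit cast so that %lu always matches the
// argument type, whatever size_t is on the host.
std::string candidates_message (std::size_t n)
{
  char buf[64];
  std::snprintf (buf, sizeof buf, "resolved to %lu candidates", (unsigned long) n);
  return buf;
}
```

The cast is always well defined, but it truncates values that do not fit in unsigned long: on ILP32 hosts a std::uint64_t above UINT32_MAX (the array->length case), and on LLP64 hosts such as 64-bit Windows even a size_t. For debug output of small counts, that is the trade-off this commit accepts.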

gcc/rust/ChangeLog:

* backend/rust-compile-base.cc (HIRCompileBase::resolve_method_address):
Cast size_t value to unsigned long.
* expand/rust-proc-macro.cc (load_macros): Likewise.
* typecheck/rust-hir-type-check-expr.cc (TypeCheckExpr::visit): 
Likewise.
---
 gcc/rust/backend/rust-compile-base.cc  | 3 ++-
 gcc/rust/expand/rust-proc-macro.cc | 2 +-
 gcc/rust/typecheck/rust-hir-type-check-expr.cc | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/rust/backend/rust-compile-base.cc b/gcc/rust/backend/rust-compile-base.cc
index b4a3685ad93..ae9f6707b72 100644
--- a/gcc/rust/backend/rust-compile-base.cc
+++ b/gcc/rust/backend/rust-compile-base.cc
@@ -965,7 +965,8 @@ HIRCompileBase::resolve_method_address (TyTy::FnType *fntype,
 }
 
   const Resolver::PathProbeCandidate *selectedCandidate = nullptr;
-  rust_debug_loc (expr_locus, "resolved to %lu candidates", candidates.size ());
+  rust_debug_loc (expr_locus, "resolved to %lu candidates",
+ (unsigned long) candidates.size ());
 
   // filter for the possible case of non fn type items
   std::set filteredFunctionCandidates;
diff --git a/gcc/rust/expand/rust-proc-macro.cc b/gcc/rust/expand/rust-proc-macro.cc
index e8618485b71..09680733e98 100644
--- a/gcc/rust/expand/rust-proc-macro.cc
+++ b/gcc/rust/expand/rust-proc-macro.cc
@@ -171,7 +171,7 @@ load_macros (std::string path)
   if (array == nullptr)
 return {};
 
-  rust_debug ("Found %lu procedural macros", array->length);
+  rust_debug ("Found %lu procedural macros", (unsigned long) array->length);
 
   return std::vector (array->macros,
array->macros + array->length);
diff --git a/gcc/rust/typecheck/rust-hir-type-check-expr.cc b/gcc/rust/typecheck/rust-hir-type-check-expr.cc
index 9dbf657958d..030e5f1b63c 100644
--- a/gcc/rust/typecheck/rust-hir-type-check-expr.cc
+++ b/gcc/rust/typecheck/rust-hir-type-check-expr.cc
@@ -1122,10 +1122,10 @@ TypeCheckExpr::visit (HIR::MethodCallExpr )
 
   auto candidate = *candidates.begin ();
   rust_debug_loc (expr.get_method_name ().get_locus (),
- "resolved method to: {%u} {%s} with [%zu] adjustments",
+ "resolved method to: {%u} {%s} with [%lu] adjustments",
  candidate.candidate.ty->get_ref (),
  candidate.candidate.ty->debug_str ().c_str (),
- candidate.adjustments.size ());
+ (unsigned long) candidate.adjustments.size ());
 
   // Get the adjusted self
   Adjuster adj (receiver_tyty);
-- 
2.42.1



Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-18 Thread chenglulu



On 2024/1/18 at 3:44 PM, Xi Ruoyao wrote:
> On Thu, 2024-01-18 at 15:15 +0800, chenglulu wrote:
>>> gcc.dg/tree-ssa/scev-16.c is OK to move
>>> gcc.dg/pr104992.c should simply add -fno-tree-vectorize to the used
>>> options and remove the vect_* stuff
>>
>> Hi Richard:
>>
>> I have a question. I don't understand the purpose of adding
>> '-fno-tree-vectorize' here.
>
> I don't think -fno-tree-vectorize will make a difference here.  This
> test case uses __attribute__((vector_size(...))) explicitly, so the
> vector operation will be used even with -fno-tree-vectorize.

Yes, I ran the test, compared the intermediate results, and saw no
difference.

I don't quite understand what "remove the vect_* stuff" means either. :-(
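Xi's point can be seen in a minimal example: with GCC's vector_size attribute the vector type is part of the source, so `a + b` below is a vector operation whether or not the tree vectorizer runs; -fno-tree-vectorize only controls automatic (loop and SLP) vectorization of scalar code. `v4si` and `add` are illustrative names, not taken from pr104992.c.

```cpp
#include <cassert>

// GCC vector extension: a 16-byte vector of four ints.
typedef int v4si __attribute__ ((vector_size (16)));

// Element-wise addition written directly on the vector type.
v4si add (v4si a, v4si b)
{
  return a + b;
}
```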



Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2024-01-18 Thread Matthias Kretz
On Thursday, 18 January 2024 08:40:48 CET Andrew Pinski wrote:
> On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz  wrote:
> > template <typename T>
> > struct Point
> > {
> >   T x, y, z;
> >   
> >   T distance_to_origin() {
> > return sqrt(x * x + y * y + z * z);
> >   }
> > };
> > 
> > Point is one point in 3D space, Point> stores multiple
> > points in 3D space and can work on them in parallel.
> > 
> > This implies that simd must have a sizeof. C++ is unlikely to get
> > sizeless types (the discussions were long, there were many papers, ...).
> > Should sizeless types in C++ ever happen, then composition is likely going
> > to be constrained to the last data member.
> 
> Even this is a bad design in general for simd. It means the code needs
> to know the size.

Yes and no. The developer writes size-agnostic code. The person compiling the 
code chooses the size (via -m flags) and thus the compiler sees fixed-size 
code.

> Also AoS vs SoA is always an interesting point here. In some cases you
> want an array of structs
> for speed and Point> does not work there at all. I guess
> This is all water under the bridge with how folks design code.
> You are basically pushing AoSoA idea here which is much worse idea than
> before.

I like to call it "array of vectorized struct" (AoVS) instead of AoSoA to 
emphasize the compiler-flags dependent memory layout.

I've been doing a lot of heterogeneous SIMD programming since 2009, starting 
with an outer loop vectorization across many TUs of a high-energy physics code 
targeting Intel Larrabee (pre-AVX512 ISA) and SSE2 with one source. In all 
these years my experience has been that, if the problem allows, AoVS is best 
in terms of performance and code generality & readability. I'd be interested 
to learn why you think differently.
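A small sketch of the Point<T> composition idea, using a GCC vector-extension type as a stand-in for a simd type (an assumption made to keep the example self-contained; std::experimental::simd itself is not used here). Since vector-extension types have no sqrt overload, the member computes the squared distance instead of the distance. Instantiated with float it handles one point; instantiated with the 4-lane `v4sf`, each member holds the coordinates of four points, which is the "array of vectorized struct" layout described above.

```cpp
#include <cassert>

// Stand-in for a 4-lane simd<float>: a GCC/Clang vector-extension type.
typedef float v4sf __attribute__ ((vector_size (16)));

// The same source works for one point (T = float) or several (T = v4sf).
template <typename T>
struct Point
{
  T x, y, z;

  T squared_distance_to_origin () const { return x * x + y * y + z * z; }
};
```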

> That being said sometimes it is not a vector of N elements you want to
> work on but rather 1/2/3 vector of  N elements. Seems like this is
> just pushing the idea one of one vector of one type of element which
> again is wrong push.

I might have misunderstood. You're saying that sometimes I want a  
even though my target CPU only has  registers? Yes! The 
std::experimental::simd spec and implementation isn't good enough in that area 
yet, but the C++26 paper(s) and my prototype implementation provides perfect 
SIMD + ILP translation of the expressed data-parallelism.

> Also more over, I guess pushing one idea of SIMD is worse than pushing
> any idea of SIMD. For Mathematical code, it is better for the compiler
> to do the vectorization than the user try to be semi-portable between
> different targets.

I guess I agree with that statement. But I wouldn't, in general, call the use 
of simd "the user try[ing] to be semi-portable". In my experience, working 
on physics code - a lot of math - using simd (as intended) is better in 
terms of performance and performance portability. As always, abuse is possible 
...

> This is what was learned on Fortran but I guess
> some folks in the C++ likes to expose the underlying HW instead of
> thinking high level here.

The C++ approach is to "leave no room for a lower-level language" while 
designing for high-level abstractions / usage.

> > With the above as our design constraints, SVE at first seems to be a bad
> > fit for implementing std::simd. However, if (at least initially) we accept
> > the need for different binaries for different SVE implementations, then
> > you
> > can look at the "scalable" part of SVE as an efficient way of reducing the
> > number of opcodes necessary for supporting all kinds of different vector
> > lengths. But otherwise you can treat it as fixed-size registers - which it
> > is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> > RDMA between different ARM implementations) all you need is a different
> > name for incompatible types. So std::simd on SVE256 must have a
> > different name on SVE512. Same for std::simd (which is currently
> > not the case with Srinivas's patch, I think, and needs to be resolved).
> 
> For SVE that is a bad design. It means The code is not portable at all.

When you say "code" you mean "source code", not binaries, right? I don't see 
how that follows.

- Matthias

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research   https://gsi.de
 std::simd
──





Re: [PATCH] riscv: Remove Bool keywords from riscv.opt

2024-01-18 Thread Kito Cheng
OK, thanks :)

On Thu, Jan 18, 2024 at 4:17 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As I wrote recently, Bool is an undocumented unsupported keyword, as
> can be seen by
> grep Bool doc/options.texi *.awk
> The option parsing just parses and ignores all keywords it doesn't handle.
> But, because it isn't a supported keyword, I think we shouldn't have it in
> *.opt files, because that just means people copy it over to other places
> even when it doesn't have any effect.
>
> Tested with a cross to riscv64-linux, none of the generated
> options.{h,cc} options-{save,urls}.cc
> files change with the patch, only optionlist does (but that is just
> used as a source for those files).
>
> Ok for trunk?
>
> 2024-01-18  Jakub Jelinek  
>
> * config/riscv/riscv.opt (mshorten-memrefs, mrelax, mcsr-check,
> minline-strcmp, minline-strncmp, minline-strlen,
> -param=riscv-vector-abi): Remove Bool keywords.
>
> --- gcc/config/riscv/riscv.opt.jj   2024-01-18 08:44:33.441919890 +0100
> +++ gcc/config/riscv/riscv.opt  2024-01-18 08:58:22.788359898 +0100
> @@ -103,7 +103,7 @@ Target Mask(SAVE_RESTORE)
>  Use smaller but slower prologue and epilogue code.
>
>  mshorten-memrefs
> -Target Bool Var(riscv_mshorten_memrefs) Init(1)
> +Target Var(riscv_mshorten_memrefs) Init(1)
>  Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
>  memory accesses to be generated as compressed instructions.  Currently targets
>  32-bit integer load/stores.
> @@ -134,12 +134,12 @@ Target Mask(EXPLICIT_RELOCS)
>  Use %reloc() operators, rather than assembly macros, to load addresses.
>
>  mrelax
> -Target Bool Var(riscv_mrelax) Init(1)
> +Target Var(riscv_mrelax) Init(1)
>  Take advantage of linker relaxations to reduce the number of instructions
>  required to materialize symbol addresses.
>
>  mcsr-check
> -Target Bool Var(riscv_mcsr_check) Init(0)
> +Target Var(riscv_mcsr_check) Init(0)
>  Enable the CSR checking for the ISA-dependent CRS and the read-only CSR.
>  The ISA-dependent CSR are only valid when the specific ISA is set.  The
>  read-only CSR can not be written by the CSR instructions.
> @@ -483,15 +483,15 @@ Target Var(TARGET_INLINE_SUBWORD_ATOMIC)
>  Always inline subword atomic operations.
>
>  minline-strcmp
> -Target Bool Var(riscv_inline_strcmp) Init(0)
> +Target Var(riscv_inline_strcmp) Init(0)
>  Inline strcmp calls if possible.
>
>  minline-strncmp
> -Target Bool Var(riscv_inline_strncmp) Init(0)
> +Target Var(riscv_inline_strncmp) Init(0)
>  Inline strncmp calls if possible.
>
>  minline-strlen
> -Target Bool Var(riscv_inline_strlen) Init(0)
> +Target Var(riscv_inline_strlen) Init(0)
>  Inline strlen calls if possible.
>
>  -param=riscv-strcmp-inline-limit=
> @@ -542,7 +542,7 @@ madjust-lmul-cost
>  Target Var(TARGET_ADJUST_LMUL_COST) Init(0)
>
>  -param=riscv-vector-abi
> -Target Undocumented Bool Var(riscv_vector_abi) Init(0)
> +Target Undocumented Var(riscv_vector_abi) Init(0)
>  Enable the use of vector registers for function arguments and return value.
>  This is an experimental switch and may be subject to change in the future.
>
>
> Jakub
>


[PATCH] riscv: Remove Bool keywords from riscv.opt

2024-01-18 Thread Jakub Jelinek
Hi!

As I wrote recently, Bool is an undocumented, unsupported keyword, as
can be seen by running
grep Bool doc/options.texi *.awk
The option processing parses and silently ignores any keyword it doesn't
handle.  Since Bool isn't a supported keyword, I think we shouldn't have it
in *.opt files: keeping it there just leads people to copy it to other
places, even though it has no effect.
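To illustrate why dropping Bool should leave the generated files unchanged,
here is a minimal C sketch under stated assumptions.  It models a
hypothetical property-line parser (struct opt_props, parse_props and
same_props are invented for this example) that, like the behaviour described
above, recognizes only the keywords it knows and skips every other token.
GCC's real option processing lives in the awk scripts (opt-functions.awk and
friends) and handles many more keywords; this is not that code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified model of scanning the property line of a .opt
   record.  Only a few keywords are recognized here; every other token,
   such as "Bool", is skipped without affecting the result.  This is an
   illustration only, not GCC's actual opt-functions.awk logic.  */

struct opt_props
{
  int target;          /* "Target" was present.  */
  int undocumented;    /* "Undocumented" was present.  */
  char var[64];        /* Argument of Var(...), or "" if absent.  */
  char init[32];       /* Argument of Init(...), or "" if absent.  */
};

/* Copy the text between the first '(' and the last ')' of TOK into OUT.  */
static void
copy_arg (const char *tok, char *out, size_t outsz)
{
  const char *open = strchr (tok, '(');
  const char *close = strrchr (tok, ')');
  size_t n;

  if (open == NULL || close == NULL || close < open)
    {
      out[0] = '\0';
      return;
    }
  n = (size_t) (close - open - 1);
  if (n >= outsz)
    n = outsz - 1;
  memcpy (out, open + 1, n);
  out[n] = '\0';
}

/* Parse a property line such as "Target Var(riscv_mrelax) Init(1)".  */
static struct opt_props
parse_props (const char *line)
{
  struct opt_props p;
  char buf[256];
  char *tok;

  assert (line != NULL);
  memset (&p, 0, sizeof p);
  strncpy (buf, line, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';

  for (tok = strtok (buf, " \t"); tok != NULL; tok = strtok (NULL, " \t"))
    {
      if (strcmp (tok, "Target") == 0)
        p.target = 1;
      else if (strcmp (tok, "Undocumented") == 0)
        p.undocumented = 1;
      else if (strncmp (tok, "Var(", 4) == 0)
        copy_arg (tok, p.var, sizeof p.var);
      else if (strncmp (tok, "Init(", 5) == 0)
        copy_arg (tok, p.init, sizeof p.init);
      /* Any other keyword, e.g. "Bool", falls through and is ignored.  */
    }
  return p;
}

/* Return nonzero if, in this model, two records parse to the same
   properties and so would produce the same generated output.  */
static int
same_props (struct opt_props a, struct opt_props b)
{
  return a.target == b.target
         && a.undocumented == b.undocumented
         && strcmp (a.var, b.var) == 0
         && strcmp (a.init, b.init) == 0;
}
```

In this model, comparing parse_props ("Target Bool Var(riscv_mrelax) Init(1)")
with parse_props ("Target Var(riscv_mrelax) Init(1)") via same_props yields
1, which mirrors the observation below that the generated options files do
not change.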

Tested with a cross to riscv64-linux: none of the generated
options.{h,cc} and options-{save,urls}.cc
files change with the patch; only optionlist does (but that is just
used as a source for those files).

Ok for trunk?

2024-01-18  Jakub Jelinek  

* config/riscv/riscv.opt (mshorten-memrefs, mrelax, mcsr-check,
minline-strcmp, minline-strncmp, minline-strlen,
-param=riscv-vector-abi): Remove Bool keywords.

--- gcc/config/riscv/riscv.opt.jj   2024-01-18 08:44:33.441919890 +0100
+++ gcc/config/riscv/riscv.opt  2024-01-18 08:58:22.788359898 +0100
@@ -103,7 +103,7 @@ Target Mask(SAVE_RESTORE)
 Use smaller but slower prologue and epilogue code.
 
 mshorten-memrefs
-Target Bool Var(riscv_mshorten_memrefs) Init(1)
+Target Var(riscv_mshorten_memrefs) Init(1)
 Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
 memory accesses to be generated as compressed instructions.  Currently targets
 32-bit integer load/stores.
@@ -134,12 +134,12 @@ Target Mask(EXPLICIT_RELOCS)
 Use %reloc() operators, rather than assembly macros, to load addresses.
 
 mrelax
-Target Bool Var(riscv_mrelax) Init(1)
+Target Var(riscv_mrelax) Init(1)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
 
 mcsr-check
-Target Bool Var(riscv_mcsr_check) Init(0)
+Target Var(riscv_mcsr_check) Init(0)
 Enable the CSR checking for the ISA-dependent CRS and the read-only CSR.
 The ISA-dependent CSR are only valid when the specific ISA is set.  The
 read-only CSR can not be written by the CSR instructions.
@@ -483,15 +483,15 @@ Target Var(TARGET_INLINE_SUBWORD_ATOMIC)
 Always inline subword atomic operations.
 
 minline-strcmp
-Target Bool Var(riscv_inline_strcmp) Init(0)
+Target Var(riscv_inline_strcmp) Init(0)
 Inline strcmp calls if possible.
 
 minline-strncmp
-Target Bool Var(riscv_inline_strncmp) Init(0)
+Target Var(riscv_inline_strncmp) Init(0)
 Inline strncmp calls if possible.
 
 minline-strlen
-Target Bool Var(riscv_inline_strlen) Init(0)
+Target Var(riscv_inline_strlen) Init(0)
 Inline strlen calls if possible.
 
 -param=riscv-strcmp-inline-limit=
@@ -542,7 +542,7 @@ madjust-lmul-cost
 Target Var(TARGET_ADJUST_LMUL_COST) Init(0)
 
 -param=riscv-vector-abi
-Target Undocumented Bool Var(riscv_vector_abi) Init(0)
+Target Undocumented Var(riscv_vector_abi) Init(0)
 Enable the use of vector registers for function arguments and return value.
 This is an experimental switch and may be subject to change in the future.
 

Jakub