[PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

2023-11-17 Thread Guo Jie
For the following immediate load operation in 
gcc/testsuite/gcc.target/loongarch/imm-load1.c:

long long r = 0x0101010101010101;

Before this patch:

lu12i.w $r15,16842752>>12
ori $r15,$r15,257
lu32i.d $r15,0x1010100000000>>32
lu52i.d $r15,$r15,0x100000000000000>>52

After this patch:

lu12i.w $r15,16842752>>12
ori $r15,$r15,257
bstrins.d   $r15,$r15,63,32
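
The effect of the shorter sequence can be modelled in plain C++. This is only a sketch under stated assumptions: the helper names (has_mirrored_halves, bstrins_d, load_mirrored) are hypothetical and are not GCC functions, and bstrins_d models the instruction as replacing bits [msb:lsb] of rd with bits [msb-lsb:0] of rj.

```cpp
#include <cstdint>

// Hypothetical helper: true when the upper and lower 32-bit halves of VALUE
// are equal, like the `loval == hival` test in the patch.
inline bool has_mirrored_halves(std::uint64_t value) {
    return (value >> 32) == (value & 0xffffffffULL);
}

// Model of `bstrins.d rd, rj, msb, lsb`: bits [msb:lsb] of rd are replaced
// by bits [msb-lsb:0] of rj; all other bits of rd are kept.
inline std::uint64_t bstrins_d(std::uint64_t rd, std::uint64_t rj,
                               unsigned msb, unsigned lsb) {
    const unsigned width = msb - lsb + 1;
    const std::uint64_t field =
        (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (rd & ~(field << lsb)) | ((rj & field) << lsb);
}

// Model of the three-instruction sequence: build the low 32 bits
// (lu12i.w + ori, result sign-extended to 64 bits), then copy them into
// bits 63..32 with bstrins.d.
inline std::uint64_t load_mirrored(std::uint32_t low) {
    std::uint64_t r = low;
    if (low & 0x80000000u)
        r |= 0xffffffff00000000ULL;  // sign extension of the 32-bit result
    return bstrins_d(r, r, 63, 32);
}
```

In this model load_mirrored(0x01010101) yields 0x0101010101010101, the constant from the test case, and the sign extension of the intermediate value does not matter because bstrins.d overwrites the whole upper half.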

gcc/ChangeLog:

* config/loongarch/loongarch.cc (enum loongarch_load_imm_method): Add 
new method.
(loongarch_build_integer): Add relevant implementations for new method.
(loongarch_move_integer): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load1.c: Change old check.
---
 gcc/config/loongarch/loongarch.cc | 22 ++-
 .../gcc.target/loongarch/imm-load1.c  |  3 ++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d05743bec87..58c00344d09 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -142,12 +142,16 @@ struct loongarch_address_info
 
METHOD_LU52I:
  Load 52-63 bit of the immediate number.
+
+   METHOD_MIRROR:
+ Copy 0-31 bit of the immediate number to 32-63bit.
 */
 enum loongarch_load_imm_method
 {
   METHOD_NORMAL,
   METHOD_LU32I,
-  METHOD_LU52I
+  METHOD_LU52I,
+  METHOD_MIRROR
 };
 
 struct loongarch_integer_op
@@ -1556,11 +1560,23 @@ loongarch_build_integer (struct loongarch_integer_op *codes,
 
   int sign31 = (value & (HOST_WIDE_INT_1U << 31)) >> 31;
   int sign51 = (value & (HOST_WIDE_INT_1U << 51)) >> 51;
+
+  unsigned HOST_WIDE_INT hival = value >> 32;
+  unsigned HOST_WIDE_INT loval = value << 32 >> 32;
+
   /* Determine whether the upper 32 bits are sign-extended from the lower
 32 bits. If it is, the instructions to load the high order can be
 ommitted.  */
   if (lu32i[sign31] && lu52i[sign31])
return cost;
+  /* If the lower 32 bits are the same as the upper 32 bits, just copy
+the lower 32 bits to the upper 32 bits.  */
+  else if (loval == hival)
+   {
+ codes[cost].method = METHOD_MIRROR;
+ codes[cost].curr_value = value;
+ return cost + 1;
+   }
   /* Determine whether bits 32-51 are sign-extended from the lower 32
 bits. If so, directly load 52-63 bits.  */
   else if (lu32i[sign31])
@@ -3230,6 +3246,10 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned HOST_WIDE_INT value)
   gen_rtx_AND (DImode, x, GEN_INT (0xf)),
   GEN_INT (codes[i].value));
  break;
+   case METHOD_MIRROR:
+ gcc_assert (mode == DImode);
+ emit_insn (gen_insvdi (x, GEN_INT (32), GEN_INT (32), x));
+ break;
default:
  gcc_unreachable ();
}
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load1.c b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
index 2ff02971239..f64cc2956a3 100644
--- a/gcc/testsuite/gcc.target/loongarch/imm-load1.c
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mabi=lp64d -O2" } */
-/* { dg-final { scan-assembler "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } */
+/* { dg-final { scan-assembler-not "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } */
+/* { dg-final { scan-assembler "test:.*lu12i\.w.*\n\tbstrins\.d.*\n\.L2:" } } */
 
 
 extern long long b[10];
-- 
2.20.1



Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-17 Thread waffl3x
The patch is coming along, I just have a quick question regarding
style. I make use of IILEs (immediately invoked lambda expressions) a
whole lot in my own code. I know that their use is controversial in
general, so I would prefer to ask instead of just submitting the patch
using them a bunch suddenly. I wouldn't have bothered either, but this
part is really miserable without them.

If that would be okay, I would suggest an additional exception to
bracing style for lambdas.
This:
[](){
  // stuff
};
Instead of this:
[]()
  {
    // stuff
  };

This is especially important for the IILE pattern IMO; otherwise it looks
really mediocre. If this isn't okay, I'll refactor all the IILEs that I
added, or just name them and call them instead. Whatever you think is
most appropriate.
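
For readers unfamiliar with the pattern, here is a minimal sketch of an immediately invoked lambda expression written in the compact brace style proposed above. The function classify and its contents are hypothetical and are not taken from the patch.

```cpp
#include <string>

// Hypothetical example, not code from the patch.
inline std::string classify(int n) {
    // The lambda is defined and called in a single expression, so `label`
    // can be const even though computing it takes several statements.
    const std::string label = [&]() {
        if (n < 0)
            return std::string("negative");
        if (n == 0)
            return std::string("zero");
        return std::string("positive");
    }();  // the trailing () invokes the lambda immediately
    return label;
}
```

Calling classify(0) would give "zero". Under the usual GNU bracing rules the lambda body braces would instead sit on their own indented lines, which is what the question is about.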

Alex


[PATCH v2 9/9] RISC-V: Disable fractional type intrinsics for the XTheadVector extension

2023-11-17 Thread Jun Sha (Joshua)
The XTheadVector extension does not support fractional operations,
so we need to delete the related intrinsics.

The types involved are as follows:
v(u)int8mf8_t,
v(u)int8mf4_t,
v(u)int8mf2_t,
v(u)int16mf4_t,
v(u)int16mf2_t,
v(u)int32mf2_t,
vfloat16mf4_t,
vfloat16mf2_t,
vfloat32mf2_t
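
All of the listed names carry an mf2, mf4 or mf8 suffix, which denotes a fractional register group (LMUL of 1/2, 1/4 or 1/8). As a rough illustration of which names fall into that set, here is a sketch that classifies a type name by its suffix. The helper is_fractional_lmul_type is hypothetical, handles only non-tuple names of the form shown above, and is not how the patch itself performs the check (the patch tests the machine mode of each type).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if NAME looks like "...mf2_t", "...mf4_t" or
// "...mf8_t", i.e. an RVV type name with fractional LMUL.
inline bool is_fractional_lmul_type(const std::string &name) {
    const std::size_t n = name.size();
    if (n < 5 || name.compare(n - 2, 2, "_t") != 0)
        return false;
    const char digit = name[n - 3];
    return name[n - 5] == 'm' && name[n - 4] == 'f'
           && (digit == '2' || digit == '4' || digit == '8');
}
```

With this sketch, "vint8mf8_t" and "vfloat16mf4_t" are classified as fractional, while "vint8m1_t" is not.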

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_v_ext_mode_p):
New extern.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
New function.
(build_one): If the checked types fail, no function is generated.
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fractional-type.c: New test.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-shapes.cc |  22 +++
 gcc/config/riscv/riscv-vector-switch.def  | 144 +-
 gcc/config/riscv/riscv.cc |   2 +-
 .../gcc.target/riscv/rvv/fractional-type.c|  79 ++
 5 files changed, 175 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/fractional-type.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8cdfadbcf10..7de4f81aa9a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -153,6 +153,7 @@ extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_v_ext_tuple_mode_p (machine_mode);
 extern bool riscv_v_ext_vls_mode_p (machine_mode);
+extern bool riscv_v_ext_mode_p (machine_mode);
 extern int riscv_get_v_regno_alignment (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index e24c535e496..dcdb9506ff2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -33,6 +33,24 @@
 
 namespace riscv_vector {
 
+/* Check whether the RET and ARGS are valid for the function.  */
+
+static bool
+check_type (tree ret, vec<tree> &args)
+{
+  tree arg;
+  unsigned i;
+
+  if (!ret || (builtin_type_p (ret) && !riscv_v_ext_mode_p (TYPE_MODE (ret))))
+    return false;
+
+  FOR_EACH_VEC_ELT (args, i, arg)
+    if (!arg || (builtin_type_p (arg) && !riscv_v_ext_mode_p (TYPE_MODE (arg))))
+      return false;
+
+  return true;
+}
+
 /* Add one function instance for GROUP, using operand suffix at index OI,
mode suffix at index PAIR && bi and predication suffix at index pred_idx.  */
 static void
@@ -49,6 +67,10 @@ build_one (function_builder &b, const function_group_info &group,
 group.ops_infos.types[vec_type_idx].index);
   b.allocate_argument_types (function_instance, argument_types);
   b.apply_predication (function_instance, return_type, argument_types);
+
+  if (TARGET_XTHEADVECTOR && !check_type (return_type, argument_types))
+return;
+
   b.add_overloaded_function (function_instance, *group.shape);
   b.add_unique_function (function_instance, (*group.shape), return_type,
 argument_types);
diff --git a/gcc/config/riscv/riscv-vector-switch.def b/gcc/config/riscv/riscv-vector-switch.def
index 5c9f9bcbc3e..f17f87f89c9 100644
--- a/gcc/config/riscv/riscv-vector-switch.def
+++ b/gcc/config/riscv/riscv-vector-switch.def
@@ -81,39 +81,39 @@ ENTRY (RVVM8QI, true, LMUL_8, 1)
 ENTRY (RVVM4QI, true, LMUL_4, 2)
 ENTRY (RVVM2QI, true, LMUL_2, 4)
 ENTRY (RVVM1QI, true, LMUL_1, 8)
-ENTRY (RVVMF2QI, true, LMUL_F2, 16)
-ENTRY (RVVMF4QI, true, LMUL_F4, 32)
-ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32, LMUL_F8, 64)
+ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16)
+ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32)
+ENTRY (RVVMF8QI, (TARGET_MIN_VLEN > 32) && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32.  */
 ENTRY (RVVM8HI, true, LMUL_8, 2)
 ENTRY (RVVM4HI, true, LMUL_4, 4)
 ENTRY (RVVM2HI, true, LMUL_2, 8)
 ENTRY (RVVM1HI, true, LMUL_1, 16)
-ENTRY (RVVMF2HI, true, LMUL_F2, 32)
-ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32, LMUL_F4, 64)
+ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32)
+ENTRY (RVVMF4HI, (TARGET_MIN_VLEN > 32) && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_16.  */
 ENTRY (RVVM8HF, TARGET_VECTOR_ELEN_FP_16, LMUL_8, 2)
 ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4)
 ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8)
 ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16)
-ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_F2, 32)
-ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && 

[PATCH v2 8/9] RISC-V: Add support for xtheadvector-specific load/store intrinsics

2023-11-17 Thread Jun Sha (Joshua)
This patch adds generation of the XTheadVector-specific
load/store instructions.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 122 +++
 .../riscv/riscv-vector-builtins-bases.h   |  30 ++
 .../riscv/riscv-vector-builtins-functions.def |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 100 ++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 300 +-
 .../riscv/thead-vector-builtins-functions.def |  30 ++
 gcc/config/riscv/thead-vector.md  | 235 ++
 gcc/config/riscv/vector.md|   1 +
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |  68 
 16 files changed, 1349 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c

diff --git 

[PATCH v2 6/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part4)

2023-11-17 Thread Jun Sha (Joshua)
Because the changes to instruction generation are large, we only duplicate
some typical tests from testsuite/gcc.target/riscv/rvv/base.

This patch adds some tests for ternary and unary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-8.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-9.c: New test.
* gcc.target/riscv/rvv/xtheadvector/unop_v_constraint-1.c: New test.
---
 .../rvv/xtheadvector/ternop_vv_constraint-1.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-2.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-3.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-4.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-5.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-6.c |  83 +++
 .../rvv/xtheadvector/ternop_vx_constraint-1.c |  71 ++
 .../rvv/xtheadvector/ternop_vx_constraint-2.c |  38 +
 .../rvv/xtheadvector/ternop_vx_constraint-3.c | 125 +
 .../rvv/xtheadvector/ternop_vx_constraint-4.c | 123 +
 .../rvv/xtheadvector/ternop_vx_constraint-5.c | 123 +
 .../rvv/xtheadvector/ternop_vx_constraint-6.c | 130 ++
 .../rvv/xtheadvector/ternop_vx_constraint-7.c | 130 ++
 .../rvv/xtheadvector/ternop_vx_constraint-8.c |  71 ++
 .../rvv/xtheadvector/ternop_vx_constraint-9.c |  71 ++
 .../rvv/xtheadvector/unop_v_constraint-1.c|  68 +
 16 files changed, 1448 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/unop_v_constraint-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
new file mode 100644
index 000..d98755e7040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcxtheadvector -mabi=ilp32d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include "riscv_th_vector.h"
+
+/*
+** f1:
+**  ...
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** 

[PATCH v2 5/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part3)

2023-11-17 Thread Jun Sha (Joshua)
Because the changes to instruction generation are large, we only duplicate
some typical tests from testsuite/gcc.target/riscv/rvv/base.

This patch adds some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-31.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-33.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-34.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-35.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-36.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-37.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-38.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-39.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-40.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-41.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-42.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-43.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-44.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-45.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-46.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-47.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-48.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-49.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-50.c: New test.
---
 .../rvv/xtheadvector/binop_vx_constraint-31.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-32.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-33.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-34.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-35.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-36.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-37.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-38.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-39.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-40.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-41.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-42.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-43.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-44.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-45.c | 123 ++
 .../rvv/xtheadvector/binop_vx_constraint-46.c |  72 ++
 .../rvv/xtheadvector/binop_vx_constraint-47.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-48.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-49.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-50.c |  18 +++
 20 files changed, 1238 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-31.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-33.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-34.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-35.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-36.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-37.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-38.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-39.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-40.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-43.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-44.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-46.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-47.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-48.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-49.c
 create mode 

[PATCH v2 4/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part2)

2023-11-17 Thread Jun Sha (Joshua)
Because the changes to instruction generation are large, we only duplicate
some typical tests from testsuite/gcc.target/riscv/rvv/base.

This patch adds some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-11.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-12.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-13.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-14.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-15.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-16.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-17.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-18.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-19.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-20.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-21.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-22.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-23.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-24.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-25.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-26.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-27.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-28.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-29.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-30.c: New test.
---
 .../rvv/xtheadvector/binop_vx_constraint-11.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-12.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-13.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-14.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-15.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-16.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-17.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-18.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-19.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-20.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-21.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-22.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-23.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-24.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-25.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-26.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-27.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-28.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-29.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-30.c | 68 +
 20 files changed, 1405 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-14.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-15.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-17.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-18.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-19.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-20.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-23.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-24.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-25.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-26.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-27.c
 create mode 100644 

[PATCH v2 3/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part1)

2023-11-17 Thread Jun Sha (Joshua)
Because the changes to instruction generation are large, we only duplicate
some typical tests from testsuite/gcc.target/riscv/rvv/base.

This patch adds some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-10.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-8.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-9.c: New test.
* gcc.target/riscv/rvv/xtheadvector/rvv-xtheadvector.exp: New test.
---
 .../rvv/xtheadvector/binop_vv_constraint-1.c  | 68 +
 .../rvv/xtheadvector/binop_vv_constraint-3.c  | 27 +++
 .../rvv/xtheadvector/binop_vv_constraint-4.c  | 27 +++
 .../rvv/xtheadvector/binop_vv_constraint-5.c  | 29 
 .../rvv/xtheadvector/binop_vv_constraint-6.c  | 28 +++
 .../rvv/xtheadvector/binop_vv_constraint-7.c  | 29 
 .../rvv/xtheadvector/binop_vx_constraint-1.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-10.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-2.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-3.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-4.c  | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-5.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-6.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-7.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-8.c  | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-9.c  | 68 +
 .../rvv/xtheadvector/rvv-xtheadvector.exp | 41 +++
 17 files changed, 939 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/rvv-xtheadvector.exp

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
new file mode 100644
index 000..172dfb6c228
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcxtheadvector -mabi=ilp32d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include "riscv_th_vector.h"
+
+/*
+** f1:
+**  ...
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)

[PATCH v2 2/9] RISC-V: Handle differences between xtheadvector and vector

2023-11-17 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation
between vector and xtheadvector, mainly by adding the th. prefix
to all xtheadvector instructions.
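
As a rough illustration of the prefixing idea, the sketch below expands a '^' punctuation directive in an output template to "th." when XTheadVector is selected and to nothing otherwise. The helper name expand_prefix and the template string "%^vadd.vv" are hypothetical and only stand in for the real riscv_print_operand machinery and the patterns in vector.md.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not GCC code: expands every "%^" in an output
// template to "th." when XTheadVector is selected, or to nothing otherwise.
std::string expand_prefix (std::string_view tmpl, bool xtheadvector)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size (); ++i)
    {
      if (tmpl[i] == '%' && i + 1 < tmpl.size () && tmpl[i + 1] == '^')
        {
          if (xtheadvector)
            out += "th.";
          ++i; // Skip the '^' as well.
        }
      else
        out += tmpl[i];
    }
  return out;
}
```

With this sketch, expand_prefix ("%^vadd.vv", true) yields "th.vadd.vv", while passing false yields plain "vadd.vv".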

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config.gcc: Add header for XTheadVector intrinsics.
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic):
Add XTheadVector.
* config/riscv/riscv.cc (riscv_print_operand):
Add new operand format directives.
(riscv_print_operand_punct_valid_p): Likewise.
* config/riscv/vector-iterators.md: Split any_int_unop
for not and neg.
* config/riscv/vector.md (@pred_):
Add th. for xtheadvector instructions.
* config/riscv/riscv_th_vector.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-c.cc   |   4 +-
 gcc/config/riscv/riscv.cc |  11 +-
 gcc/config/riscv/riscv_th_vector.h|  49 ++
 gcc/config/riscv/vector-iterators.md  |   4 +
 gcc/config/riscv/vector.md| 777 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 7 files changed, 466 insertions(+), 383 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ba6d63e33ac..e0fc2b1a27c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -548,7 +548,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 184fff905b2..0a17d5f6656 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -194,8 +194,8 @@ riscv_pragma_intrinsic (cpp_reader *)
 {
   if (!TARGET_VECTOR)
{
- error ("%<#pragma riscv intrinsic%> option %qs needs 'V' extension "
-"enabled",
+ error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
+"'XTHEADVECTOR' extension enabled",
 name);
  return;
}
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ecee7eb4727..754107cdaac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5323,7 +5323,7 @@ riscv_get_v_regno_alignment (machine_mode mode)
 static void
 riscv_print_operand (FILE *file, rtx op, int letter)
 {
-  /* `~` does not take an operand so op will be null
+  /* `~` and '^' does not take an operand so op will be null
  Check for before accessing op.
   */
   if (letter == '~')
@@ -5332,6 +5332,13 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputc('w', file);
   return;
 }
+
+  if (letter == '^')
+{
+  if (TARGET_XTHEADVECTOR)
+   fputs ("th.", file);
+  return;
+}
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
 
@@ -5584,7 +5591,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 static bool
 riscv_print_operand_punct_valid_p (unsigned char code)
 {
-  return (code == '~');
+  return (code == '~' || code == '^');
 }
 
 /* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
diff --git a/gcc/config/riscv/riscv_th_vector.h 
b/gcc/config/riscv/riscv_th_vector.h
new file mode 100644
index 000..194652032bc
--- /dev/null
+++ b/gcc/config/riscv/riscv_th_vector.h
@@ -0,0 +1,49 @@
+/* RISC-V 'XTheadVector' Extension intrinsics include file.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef 

[PATCH v2 1/9] RISC-V: minimal support for xtheadvector

2023-11-17 Thread Jun Sha (Joshua)
This patch introduces basic XTheadVector support
(march string parsing and a test for __riscv_xtheadvector)
according to https://github.com/T-head-Semi/thead-extension-spec/
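
A minimal sketch of how user code might test for the intrinsics macro follows. The encoding major * 1000000 + minor * 1000 is an assumption; it is chosen because it matches the value 11000 that the new predef test expects for version 0.11. The names ext_version_value and have_th_vector_intrinsics are hypothetical.

```cpp
// Assumed encoding (consistent with the 11000 expected by the new test
// for version 0.11): major * 1000000 + minor * 1000.
constexpr int ext_version_value (unsigned major, unsigned minor)
{
  return static_cast<int> (major * 1000000u + minor * 1000u);
}

static_assert (ext_version_value (0, 11) == 11000, "0.11 encodes as 11000");

// Reports whether the XTheadVector intrinsics macro is defined with at
// least version 0.11.  On targets without the extension this is false.
constexpr bool have_th_vector_intrinsics ()
{
#if defined (__riscv_th_v_intrinsic)
  return __riscv_th_v_intrinsic >= ext_version_value (0, 11);
#else
  return false;
#endif
}
```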

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse): Add new vendor extension.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Add test macro.
* config/riscv/riscv.opt: Add new mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
* gcc.target/riscv/rvv/xtheadvector.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc | 10 ++
 gcc/config/riscv/riscv-c.cc |  4 
 gcc/config/riscv/riscv.opt  |  2 ++
 .../riscv/predef-__riscv_th_v_intrinsic.c   | 11 +++
 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c   | 13 +
 5 files changed, 40 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..914924171fd 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -75,6 +75,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"v", "zvl128b"},
   {"v", "zve64d"},
+  {"xtheadvector", "zvl128b"},
+  {"xtheadvector", "zve64d"},
 
   {"zve32f", "f"},
   {"zve64f", "f"},
@@ -325,6 +327,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1495,6 +1498,10 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
 error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
   "extensions", arch);
 
+  if (subset_list->lookup ("v") && subset_list->lookup ("xtheadvector"))
+error_at (loc, "%<-march=%s%>: xtheadvector conflicts with vector "
+  "extensions", arch);
+
   /* 'H' hypervisor extension requires base ISA with 32 registers.  */
   if (subset_list->lookup ("e") && subset_list->lookup ("h"))
 error_at (loc, "%<-march=%s%>: h extension requires i extension", arch);
@@ -1680,6 +1687,9 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xtheadmemidx",  _options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
   {"xtheadmempair", _options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
   {"xtheadsync",_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+  {"xtheadvector",  _options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
+  {"xtheadvector",  _options::x_target_flags, MASK_FULL_V},
+  {"xtheadvector",  _options::x_target_flags, MASK_VECTOR},
 
   {"xventanacondops", _options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index b7f9ba204f7..184fff905b2 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -137,6 +137,10 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 riscv_ext_version_value (0, 11));
 }
 
+   if (TARGET_XTHEADVECTOR)
+ builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+riscv_ext_version_value (0, 11));
+
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 70d78151cee..72857aea352 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -438,6 +438,8 @@ Mask(XTHEADMEMPAIR) Var(riscv_xthead_subext)
 
 Mask(XTHEADSYNC)Var(riscv_xthead_subext)
 
+Mask(XTHEADVECTOR)  Var(riscv_xthead_subext)
+
 TargetVariable
 int riscv_xventana_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
new file mode 100644
index 000..1c764241db6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imafdcxtheadvector -mabi=lp64d" } */
+
+int main () {
+
+#if __riscv_th_v_intrinsic != 11000
+#error "__riscv_th_v_intrinsic"
+#endif
+
+  return 0;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
new file mode 100644
index 000..d52921e1314
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options 

[PATCH v2 0/9] RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jun Sha (Joshua)
This patch series presents the GCC implementation of the XTheadVector
extension [1].

[1] https://github.com/T-head-Semi/thead-extension-spec/

I updated my patch series, because I forgot to add co-authors in
the last version.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

RISC-V: minimal support for xtheadvector
RISC-V: Handle differences between xtheadvector and vector
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part1)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part2)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part3)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part4)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part5)
RISC-V: Add support for xtheadvector-specific load/store intrinsics
RISC-V: Disable fractional type intrinsics for XTheadVector

---
 gcc/common/config/riscv/riscv-common.cc   |  10 +
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-c.cc   |   8 +-
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  | 122 +++
 .../riscv/riscv-vector-builtins-bases.h   |  30 +
 .../riscv/riscv-vector-builtins-functions.def |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 122 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 300 ++-
 gcc/config/riscv/riscv-vector-switch.def  | 144 ++--
 gcc/config/riscv/riscv.cc |  13 +-
 gcc/config/riscv/riscv.opt|   2 +
 gcc/config/riscv/riscv_th_vector.h|  49 ++
 .../riscv/thead-vector-builtins-functions.def |  30 +
 gcc/config/riscv/thead-vector.md  | 235 ++
 gcc/config/riscv/vector-iterators.md  |   4 +
 gcc/config/riscv/vector.md| 778 +-
 .../riscv/predef-__riscv_th_v_intrinsic.c |  11 +
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 .../gcc.target/riscv/rvv/fractional-type.c|  79 ++
 .../gcc.target/riscv/rvv/xtheadvector.c   |  13 +
 .../rvv/xtheadvector/autovec/vadd-run-nofm.c  |   4 +
 .../riscv/rvv/xtheadvector/autovec/vadd-run.c |  81 ++
 .../xtheadvector/autovec/vadd-rv32gcv-nofm.c  |  10 +
 .../rvv/xtheadvector/autovec/vadd-rv32gcv.c   |   8 +
 .../xtheadvector/autovec/vadd-rv64gcv-nofm.c  |  10 +
 .../rvv/xtheadvector/autovec/vadd-rv64gcv.c   |   8 +
 .../rvv/xtheadvector/autovec/vadd-template.h  |  70 ++
 .../rvv/xtheadvector/autovec/vadd-zvfh-run.c  |  54 ++
 .../riscv/rvv/xtheadvector/autovec/vand-run.c |  75 ++
 .../rvv/xtheadvector/autovec/vand-rv32gcv.c   |   7 +
 .../rvv/xtheadvector/autovec/vand-rv64gcv.c   |   7 +
 .../rvv/xtheadvector/autovec/vand-template.h  |  61 ++
 .../rvv/xtheadvector/binop_vv_constraint-1.c  |  68 ++
 .../rvv/xtheadvector/binop_vv_constraint-3.c  |  27 +
 .../rvv/xtheadvector/binop_vv_constraint-4.c  |  27 +
 .../rvv/xtheadvector/binop_vv_constraint-5.c  |  29 +
 .../rvv/xtheadvector/binop_vv_constraint-6.c  |  28 +
 .../rvv/xtheadvector/binop_vv_constraint-7.c  |  29 +
 .../rvv/xtheadvector/binop_vx_constraint-1.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-10.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-11.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-12.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-13.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-14.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-15.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-16.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-17.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-18.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-19.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-2.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-20.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-21.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-22.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-23.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-24.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-25.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-26.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-27.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-28.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-29.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-3.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-30.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-31.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-32.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-33.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-34.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-35.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-36.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-37.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-38.c |  68 ++
 

Re: [PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Kito Cheng
LGTM, that's a really great clean up :)

On Sat, Nov 18, 2023 at 11:12 AM Juzhe-Zhong  wrote:
>
> This patch refactors the RVV iterators for easier maintenance.
>
> E.g.
>
> (define_mode_iterator V [
>   RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
>
>   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
>
>   (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
>   (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
>   (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
>
>   RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
>
>   (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
>   (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
> TARGET_MIN_VLEN > 32")
>
>   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
>
>   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
> ])
>
> change it into:
>
> (define_mode_iterator V [VI VF_ZVFHMIN])
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Refactor iterators.
>
> ---
>  gcc/config/riscv/vector-iterators.md | 661 +--
>  1 file changed, 124 insertions(+), 537 deletions(-)
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index f04c7fe5491..469875ce67c 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -108,48 +108,49 @@
>UNSPECV_FRM_RESTORE_EXIT
>  ])
>
> -(define_mode_iterator V [
> +(define_mode_iterator VI [
>RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
>
>RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
>
> -  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> -
>RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
>
> -  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
> -  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
> -
>(RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>(RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> +])
> +
> +;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
> +;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
> +;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
> +;; from load, store and convert for it.
> +;; It is not enough to set the "enabled" attribute to false
> +;; since this will only disable insn alternatives in reload but still
> +;; allow the instruction and mode to be matched during combine et al.
> +(define_mode_iterator VF [
> +  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
> +  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
> +  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
> +
> +  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
> +  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
>
>(RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>(RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
>  ])
>
> -(define_mode_iterator V_VLS [
> -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> -
> +(define_mode_iterator VF_ZVFHMIN [
>(RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
>(RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
>(RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
>
> -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> -
>(RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
>(RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
>
> -  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
> -  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> -
>(RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>(RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")

[PATCH] LoongArch: Modify MUSL_DYNAMIC_LINKER.

2023-11-17 Thread Lulu Cheng
Use no suffix at all in the musl dynamic linker name for the hard
float ABI. Use the -sf and -sp suffixes in the musl dynamic linker name
for the soft float and single precision ABIs. The following table
lists the musl interpreter name for each LoongArch64 ABI name.

musl interpreter            | LoongArch64 ABI
--------------------------- | -----------------
ld-musl-loongarch64.so.1    | loongarch64-lp64d
ld-musl-loongarch64-sp.so.1 | loongarch64-lp64f
ld-musl-loongarch64-sf.so.1 | loongarch64-lp64s
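
The mapping in the table can be sketched as a small lookup. The function name musl_interpreter_name is hypothetical, and the real selection is done by the MUSL_ABI_SPEC spec string rather than by code like this.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper mirroring the table: returns the musl interpreter
// file name for a -mabi= value, or an empty string for an unknown ABI.
std::string musl_interpreter_name (std::string_view abi)
{
  std::string suffix;
  if (abi == "lp64d")
    suffix = "";      // Hard float: no suffix.
  else if (abi == "lp64f")
    suffix = "-sp";   // Single precision.
  else if (abi == "lp64s")
    suffix = "-sf";   // Soft float.
  else
    return {};
  return "ld-musl-loongarch64" + suffix + ".so.1";
}
```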

gcc/ChangeLog:

* config/loongarch/gnu-user.h (MUSL_ABI_SPEC): Modify suffix.
---
 gcc/config/loongarch/gnu-user.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/gnu-user.h b/gcc/config/loongarch/gnu-user.h
index 9616d6e8a0b..e9f4bcef1d4 100644
--- a/gcc/config/loongarch/gnu-user.h
+++ b/gcc/config/loongarch/gnu-user.h
@@ -34,9 +34,9 @@ along with GCC; see the file COPYING3.  If not see
   "/lib" ABI_GRLEN_SPEC "/ld-linux-loongarch-" ABI_SPEC ".so.1"
 
 #define MUSL_ABI_SPEC \
-  "%{mabi=lp64d:-lp64d}" \
-  "%{mabi=lp64f:-lp64f}" \
-  "%{mabi=lp64s:-lp64s}"
+  "%{mabi=lp64d:}" \
+  "%{mabi=lp64f:-sp}" \
+  "%{mabi=lp64s:-sf}"
 
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER \
-- 
2.31.1



[PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Juzhe-Zhong
This patch refactors the RVV iterators for easier maintenance.

E.g. 

(define_mode_iterator V [
  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")

  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")

  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")

  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")

  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")

  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")

  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
])

change it into:

(define_mode_iterator V [VI VF_ZVFHMIN])

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Refactor iterators.

---
 gcc/config/riscv/vector-iterators.md | 661 +--
 1 file changed, 124 insertions(+), 537 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f04c7fe5491..469875ce67c 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -108,48 +108,49 @@
   UNSPECV_FRM_RESTORE_EXIT
 ])
 
-(define_mode_iterator V [
+(define_mode_iterator VI [
   RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
 
   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
 
-  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
   RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
 
-  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
-
   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
+])
+
+;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
+;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
+;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
+;; from load, store and convert for it.
+;; It is not enough to set the "enabled" attribute to false
+;; since this will only disable insn alternatives in reload but still
+;; allow the instruction and mode to be matched during combine et al.
+(define_mode_iterator VF [
+  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
+  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
+  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+
+  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
+  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
 
   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
-(define_mode_iterator V_VLS [
-  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
-
-  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
+(define_mode_iterator VF_ZVFHMIN [
   (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
   (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
   (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
 
-  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
   (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
   (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
 
-  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
-  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
-
   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
+])
 
-  ;; VLS modes.
+(define_mode_iterator VLSI [
   (V1QI "riscv_vector::vls_mode_valid_p (V1QImode)")
   (V2QI "riscv_vector::vls_mode_valid_p (V2QImode)")
   (V4QI "riscv_vector::vls_mode_valid_p (V4QImode)")
@@ -195,7 +196,45 @@
   (V64DI "riscv_vector::vls_mode_valid_p (V64DImode) && 

[pushed] analyzer: new warning: -Wanalyzer-infinite-loop [PR106147]

2023-11-17 Thread David Malcolm
This patch implements a new analyzer warning: -Wanalyzer-infinite-loop.

It works by examining the exploded graph once the latter has been
fully built.  It attempts to detect cycles in the exploded graph in
which:
- no externally visible work occurs
- no escape is possible from the cycle once it has been entered
- the program state is "sufficiently concrete" at each step:
  - no unknown activity could be occurring
  - the worklist was fully drained for each enode in the cycle
    i.e. every enode in the cycle is processed

For example, it correctly complains about this bogus "for" loop:

  int sum = 0;
  for (struct node *iter = n; iter; iter->next)
    sum += n->val;
  return sum;

like this:

infinite-loop-linked-list.c: In function ‘for_loop_noop_next’:
infinite-loop-linked-list.c:110:31: warning: infinite loop [CWE-835] 
[-Wanalyzer-infinite-loop]
  110 |   for (struct node *iter = n; iter; iter->next)
  |   ^~~~
  ‘for_loop_noop_next’: events 1-5
|
|  110 |   for (struct node *iter = n; iter; iter->next)
|  |   ^~~~
|  |   |
|  |   (1) infinite loop here
|  |   (2) when ‘iter’ is non-NULL: always 
following ‘true’ branch...
|  |   (5) ...to here
|  111 | sum += n->val;
|  | ~
|  | |   |
|  | |   (3) ...to here
|  | (4) looping back...
|
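
For comparison, here is a self-contained sketch of what the loop was presumably meant to do. The corrected form is an assumption about the intent (advance iter, and read iter->val rather than n->val); the struct layout and the name sum_list are hypothetical.

```cpp
struct node
{
  node *next;
  int val;
};

// The loop from the message, kept as a comment because it never terminates
// for a non-empty list: "iter->next" is evaluated but never assigned.
//
//   for (node *iter = n; iter; iter->next)
//     sum += n->val;

// Presumed intent (an assumption): advance iter and read iter->val.
int sum_list (const node *n)
{
  int sum = 0;
  for (const node *iter = n; iter; iter = iter->next)
    sum += iter->val;
  return sum;
}
```

Because iter now changes on every iteration, the loop terminates for any finite list.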

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

My integration test suite shows 12 true positives and 2 false positives,
which seems good enough for an initial implementation; I'll file bugs
for myself to track fixing the two false positives.

Pushed to trunk as r14-5566-g841008d3966c0f.

gcc/ChangeLog:
PR analyzer/106147
* Makefile.in (ANALYZER_OBJS): Add analyzer/infinite-loop.o.
* doc/invoke.texi: Add -fdump-analyzer-infinite-loop and
-Wanalyzer-infinite-loop.  Add missing CWE link for
-Wanalyzer-infinite-recursion.
* timevar.def (TV_ANALYZER_INFINITE_LOOPS): New.

gcc/analyzer/ChangeLog:
PR analyzer/106147
* analyzer.opt (Wanalyzer-infinite-loop): New option.
(fdump-analyzer-infinite-loop): New option.
* checker-event.h (start_cfg_edge_event::get_desc): Drop "final".
(start_cfg_edge_event::maybe_describe_condition): Convert from
private to protected.
* checker-path.h (checker_path::get_logger): New.
* diagnostic-manager.cc (process_worklist_item): Update for
new context param of maybe_update_for_edge.
* engine.cc
(impl_region_model_context::impl_region_model_context): Add
out_could_have_done_work param to both ctors and use it to
initialize m_out_could_have_done_work.
(impl_region_model_context::maybe_did_work): New vfunc
implementation.
(exploded_node::on_stmt): Add out_could_have_done_work param and
pass to ctxt ctor.
(exploded_node::on_stmt_pre): Treat setjmp and longjmp as "doing
work".
(exploded_node::on_longjmp): Likewise.
(exploded_edge::exploded_edge): Add "could_do_work" param and use
it to initialize m_could_do_work_p.
(exploded_edge::dump_dot_label): Add result of could_do_work_p.
(exploded_graph::add_function_entry): Mark edge as doing no work.
(exploded_graph::add_edge): Add "could_do_work" param and pass to
exploded_edge ctor.
(add_tainted_args_callback): Treat as doing no work.
(exploded_graph::process_worklist): Likewise when merging nodes.
(maybe_process_run_of_before_supernode_enodes::item): Likewise.
(exploded_graph::maybe_create_dynamic_call): Likewise.
(exploded_graph::process_node): Likewise for phi nodes.
Pass in a "could_have_done_work" bool when handling stmts and use
when creating edges.  Assume work is done at bifurcation.
(exploded_path::feasible_p): Update for new context param of
maybe_update_for_edge.
(feasibility_state::feasibility_state): New ctor.
(feasibility_state::operator=): New.
(feasibility_state::maybe_update_for_edge): Add ctxt param and use
it.  Fix missing newline when logging state.
(impl_run_checkers): Call exploded_graph::detect_infinite_loops.
* exploded-graph.h
(impl_region_model_context::impl_region_model_context): Add
out_could_have_done_work param to both ctors.
(impl_region_model_context::maybe_did_work): New decl.
(impl_region_model_context::checking_for_infinite_loop_p): New.
(impl_region_model_context::on_unusable_in_infinite_loop): New.
(impl_region_model_context::m_out_could_have_done_work): New
field.
(exploded_node::on_stmt): Add "out_could_have_done_work" 

[PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-17 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme
values of days and months [1]:

  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence they perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of the Euclidean division of __x + __y by __d
without overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solving the UB issues, __add_modulo allows shorter branchless
code on x86-64 and ARM [2].

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE
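
To make the idea concrete, here is a minimal sketch of an overflow-free add-modulo. It is not the implementation in the patch: it uses a branch and reduces each operand separately, whereas the patch aims for branchless code. The name add_modulo_sketch and the restriction on d are assumptions made for this illustration.

```cpp
#include <limits>
#include <type_traits>

// Returns the Euclidean remainder of x + y divided by d, computed without
// signed overflow.  Sketch only; not the code from the patch.
template <unsigned d, typename T>
constexpr unsigned add_modulo_sketch (unsigned x, T y)
{
  static_assert (std::is_integral_v<T> && std::is_signed_v<T>,
                 "y is expected to be a signed duration count");
  static_assert (d > 0 && d <= std::numeric_limits<unsigned>::max () / 2,
                 "x % d + (y mod d) must fit in unsigned");
  using U = std::make_unsigned_t<T>;

  unsigned ry = 0; // Euclidean remainder of y by d, in [0, d - 1].
  if (y >= 0)
    ry = static_cast<unsigned> (static_cast<U> (y) % d);
  else
    {
      // |y| computed in unsigned arithmetic, which wraps instead of
      // overflowing; this is well defined even for the minimum value of T.
      const U mag = static_cast<U> (U{0} - static_cast<U> (y));
      const unsigned r = static_cast<unsigned> (mag % d);
      ry = r == 0 ? 0 : d - r;
    }
  return (x % d + ry) % d;
}
```

Following the usage pattern shown in the patch comment, January plus months{-1} would be add_modulo_sketch<12> (1u + 11u, -1LL) + 1, which gives 12 (December), and the minimum long long count is handled without overflow.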

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Changes with respect to previous versions:
 v3: Fix screwed up email send with v2. (Sorry about that. I shall learn at
 some point.)
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 30 ++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 30 ++
 5 files changed, 114 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast<long long>(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 1
+  // or the addition of this value with static_cast<long long>(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template<unsigned __d, typename _Tp>
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we "subtract" from _Up(__y) an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 

[PATCH v2] The following functions invoke signed integer overflow (UB) for some extreme values of days and months [1]:

2023-11-17 Thread Cassio Neri
  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence they perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x + __y by __d without
overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solving the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---
 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 3 files changed, 54 insertions(+), 24 deletions(-)

 Changes with respect to previous versions:
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast<long long>(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 1
+  // or the addition of this value with static_cast<long long>(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template<unsigned __d, typename _Tp>
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we "subtract" from _Up(__y) an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - _Up(255 + d - 1) and b % d = d - 1.
+   auto const __offset = __y >= 0 ? _Up(0) : __b - _Up(-1);
+   return (__x + _Up(__y) + __offset) % __d;
   }

   inline constexpr unsigned __days_per_month[12]
@@ -700,8 +720,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend constexpr month
   operator+(const month& __x, const 

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I suspect it's going to be even worse if we have multiple patterns
>> with the same underlying RTL, but just different output strings.
No. We don't need to add (duplicate) any new patterns.
I know RVV GCC very well. I know how to do that.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-18 08:01
To: 钟居哲; palmer
CC: gcc-patches; kito.cheng; kito.cheng; cooper.joshua; rdapp.gcc
Subject: Re: RISC-V: Support XTheadVector extensions
 
 
On 11/17/23 16:16, 钟居哲 wrote:
> >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> >> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> >> the preferred way to go when all we need to do is prefix the instruction
> >> with "th.".
> 
> No. I don't think we need to add '^'. I don't want theadvector to touch
> any code in vector.md.
> Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
> People like me don't want to touch anything related to Thead.
> But anyway, I will take care of that in GCC-15.
I suspect it's going to be even worse if we have multiple patterns 
with the same underlying RTL, but just different output strings.
 
The standard way to handle that has been with an output modifier and/or 
ASSEMBLER_DIALECT.  If you look at the PA port for example, the 
assembler syntax changed dramatically between the PA1.0/PA1.1 era and 
the PA2.0 era.  But we support both variants trivially without 
duplicating all the patterns.
 
But we've got time to sort this out.  I don't think the code in question 
was targeted towards gcc-14.
 
 
jeff
 


Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jeff Law




On 11/17/23 16:16, 钟居哲 wrote:
> >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> >> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> >> the preferred way to go when all we need to do is prefix the instruction
> >> with "th.".
>
> No. I don't think we need to add '^'. I don't want theadvector to touch
> any code in vector.md.
> Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
> People like me don't want to touch anything related to Thead.
> But anyway, I will take care of that in GCC-15.

I suspect it's going to be even worse if we have multiple patterns 
with the same underlying RTL, but just different output strings.


The standard way to handle that has been with an output modifier and/or 
ASSEMBLER_DIALECT.  If you look at the PA port for example, the 
assembler syntax changed dramatically between the PA1.0/PA1.1 era and 
the PA2.0 era.  But we support both variants trivially without 
duplicating all the patterns.
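
To make the output-modifier idea concrete, here is a toy model of it. This is
not GCC code: expand_prefix_modifier and its xtheadvector flag are hypothetical
stand-ins for what a port's print-operand hook would do with the '%^'
punctuation discussed in this thread. It only shows how a single output
template can serve both dialects.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Toy model only: expand the "%^" punctuation in an output template.
// When xtheadvector is true, "%^" becomes the "th." prefix; otherwise it
// expands to nothing.  All other text, including operands such as "%0",
// is copied through unchanged.
std::string expand_prefix_modifier(std::string_view tmpl, bool xtheadvector)
{
    std::string out;
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        if (tmpl[i] == '%' && i + 1 < tmpl.size() && tmpl[i + 1] == '^') {
            if (xtheadvector)
                out += "th.";
            ++i;  // skip the '^'
        } else {
            out += tmpl[i];
        }
    }
    return out;
}
```

Under this model the template "%^vmulh.vx\t%0,%3,%z4%p1" yields the plain
RVV 1.0 mnemonic by default and the "th."-prefixed one when the flag is set,
so the pattern itself does not have to be duplicated.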


But we've got time to sort this out.  I don't think the code in question 
was targeted towards gcc-14.



jeff


Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
>> may also need to add '^' to the punct_valid_p hook.  But yes, this is
>> the preferred way to go when all we need to do is prefix the instruction
>> with "th.".

No. I don't think we need to add '^'. I don't want theadvector to touch any
code in vector.md.
Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
People like me don't want to touch anything related to Thead.
But anyway, I will take care of that in GCC-15.





juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-11-18 01:11
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; cooper.joshua; rdapp.gcc; jeffreyalaw
Subject: Re: RISC-V: Support XTheadVector extensions
On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote:
> 90% theadvector extension reusing current RVV 1.0 instructions patterns:
> Just change ASM, For example:
> 
> @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
>   (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] VMULH)
>(match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
>"TARGET_VECTOR"
> -  "vmulh.vx\t%0,%3,%z4%p1"
> +  "%^vmulh.vx\t%0,%3,%z4%p1"
>[(set_attr "type" "vimul")
> (set_attr "mode" "")])
> +  if (letter == '^')
> +{
> +  if (TARGET_XTHEADVECTOR)
> + fputs ("th.", file);
> +  return;
> +}
> 
> For almost all patterns, you just simply append "th." in the ASM prefix.
> like change "vmulh.vv" -> "th.vmulh.vv"
> 
> Almost all theadvector instructions are not new features,  all same as RVV1.0.
> Why do you invent the such ISA doesn't include any features that RVV1.0 
> doesn't satisfy ?
> 
> I am not explicitly object this patch. But I should know the reason.
 
There's some more in the later threads, but with the top posting it kind 
of got lost so I'm just replying here.
 
This really isn't T-Head's fault: we announced V-0.7 as a stable draft 
that was being implemented, and then T-Head went and implemented it.  
Most of that history has been scrubbed by RVI, but you can still find 
some stuff like this old talk on YouTube.
 
In general we've just figured out a way to make things work when HW 
vendors end up in a grey area in RISC-V land.  That obviously results in 
a bunch of pain for the SW people, but this stuff is only useful if we 
can run on real HW and that always involves some amount of pain.  
Hopefully we can get to a point where we make fewer problems for 
ourselves, but we've got a long history to dig out from and there's 
going to be a lot more of this in the future.
 
So I don't like this XTHeadV stuff, but I think we're best to take it: 
these guys tried to do the right thing and got thrown under the bus by 
RVI, we should help them.  This is almost certainly going to be a lot 
more pain than we're used to, just given the size of the extensions in 
question, but I still think it's the right way to go.
 
The other option is to essentially just tell them to fork the ISA, which 
isn't good for anyone.
 
> Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long 
> as all other RISC-V maintainers agree.
 
I agree this is gcc-15 material: there's a lot of subtle differences in 
behavior between 0.7 and 1.0, even when the mnemonics are the same.  
We're already pretty buried in testing for 14, so trying to pick up 
another target is going to be a huge headache (particularly one that's a 
bit special).
 
> 
> 
> 
> 
> juzhe.zh...@rivai.ai
 


[PATCH] libgccjit: Add ways to set the personality function

2023-11-17 Thread Antoni Boucher
Hi.
This adds functions to set the personality function (bug 112603).

I'm not sure I can make a test for this: it seems the personality
function will not be set if there are no try/catch inside the
functions.
Do you know a way to keep the personality function that is set in this
case?

Or should we wait until I send the patch for try/catch?

Thanks for the review.
From 6beb6452c7bac9ecbdaea750d61d6e6c6bd3ed8f Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Sun, 16 Apr 2023 13:19:20 -0400
Subject: [PATCH] libgccjit: Add ways to set the personality function

gcc/ChangeLog:
	PR jit/112603
	* expr.cc (build_personality_function_with_name): New function.
	* tree.cc (tree_cc_finalize): Cleanup gcc_eh_personality_decl.
	* tree.h (build_personality_function_with_name): New decl.

gcc/jit/ChangeLog:
	PR jit/112603
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/functions.rst: Document the functions
	gcc_jit_set_global_personality_function_name and
	gcc_jit_function_set_personality_function.
	* dummy-frontend.cc (jit_gc_root): New variable.
	(jit_preserve_from_gc): New function.
	(jit_langhook_init): Initialize new variables.
	(jit_langhook_eh_personality): New hook.
	(LANG_HOOKS_EH_PERSONALITY): New hook.
	* jit-playback.cc (set_personality_function): New function.
	* jit-playback.h: New decl.
	* jit-recording.cc
	(memento_of_set_personality_function::make_debug_string,
	recording::memento_of_set_personality_function::write_reproducer,
	recording::function::set_personality_function,
	recording::memento_of_set_personality_function::replay_into):
	New functions
	* jit-recording.h (class memento_of_set_personality_function):
	New class.
	(recording::function::set_personality_function): New function.
	* libgccjit.cc (gcc_jit_function_set_personality_function,
	gcc_jit_set_global_personality_function_name): New functions.
	* libgccjit.h (gcc_jit_set_global_personality_function_name,
	gcc_jit_function_set_personality_function): New functions.
	* libgccjit.map: New functions.

gcc/testsuite/ChangeLog:

	* jit.dg/test-personality-function.c: New test.
	* jit.dg/all-non-failing-tests.h: Mention
	test-personality-function.c.
---
 gcc/expr.cc   |  8 +++
 gcc/jit/docs/topics/compatibility.rst | 10 
 gcc/jit/docs/topics/functions.rst | 28 ++
 gcc/jit/dummy-frontend.cc | 36 
 gcc/jit/jit-playback.cc   |  8 +++
 gcc/jit/jit-playback.h|  3 +
 gcc/jit/jit-recording.cc  | 44 +++
 gcc/jit/jit-recording.h   | 23 
 gcc/jit/libgccjit.cc  | 22 
 gcc/jit/libgccjit.h   |  8 +++
 gcc/jit/libgccjit.map |  6 ++
 gcc/testsuite/jit.dg/all-non-failing-tests.h  |  3 +
 .../jit.dg/test-personality-function.c| 55 +++
 gcc/tree.cc   |  1 +
 gcc/tree.h|  1 +
 15 files changed, 256 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-personality-function.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 556bcf7ef59..25d50289b24 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13559,6 +13559,14 @@ build_personality_function (const char *lang)
 
   name = ACONCAT (("__", lang, "_personality", unwind_and_version, NULL));
 
+  return build_personality_function_with_name (name);
+}
+
+tree
+build_personality_function_with_name (const char *name)
+{
+  tree decl, type;
+
   type = build_function_type_list (unsigned_type_node,
    integer_type_node, integer_type_node,
    long_long_unsigned_type_node,
diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index ebede440ee4..31c3ef6401a 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -378,3 +378,13 @@ alignment of a variable:
 
 ``LIBGCCJIT_ABI_25`` covers the addition of
 :func:`gcc_jit_type_get_restrict`
+
+.. _LIBGCCJIT_ABI_26:
+
+``LIBGCCJIT_ABI_26``
+
+``LIBGCCJIT_ABI_26`` covers the addition of functions to set the personality
+function:
+
+  * :func:`gcc_jit_function_set_personality_function`
+  * :func:`gcc_jit_set_global_personality_function_name`
diff --git a/gcc/jit/docs/topics/functions.rst b/gcc/jit/docs/topics/functions.rst
index cf5cb716daf..e59885c3549 100644
--- a/gcc/jit/docs/topics/functions.rst
+++ b/gcc/jit/docs/topics/functions.rst
@@ -197,6 +197,34 @@ Functions
 
.. type:: gcc_jit_case
 
+.. function::  void
+   gcc_jit_function_set_personality_function (gcc_jit_function *fn,
+  gcc_jit_function *personality_func)
+
+   Set the personality function of ``fn`` to ``personality_func``.
+
+   were added in :ref:`LIBGCCJIT_ABI_26`; you can test for their presence
+   using
+
+   .. 

[PATCH] libgccjit: Add vector permutation and vector access operations

2023-11-17 Thread Antoni Boucher
Hi.
This patch adds vector permutation and vector access operations (bug
112602).

This was split from this patch:
https://gcc.gnu.org/pipermail/jit/2023q1/001606.html

Thanks for the review.
From 25b386334f22845d7ba1b60658730373eb6ddbb3 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Fri, 17 Nov 2023 17:23:28 -0500
Subject: [PATCH] libgccjit: Add vector permutation and vector access
 operations

gcc/jit/ChangeLog:
	PR jit/112602
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/expressions.rst: Document
	gcc_jit_context_new_rvalue_vector_perm and
	gcc_jit_context_new_vector_access.
	* jit-playback.cc (playback::context::new_rvalue_vector_perm,
	common_mark_addressable_vec,
	gnu_vector_type_p,
	lvalue_p,
	convert_vector_to_array_for_subscript,
	new_vector_access): new functions.
	* jit-playback.h (new_rvalue_vector_perm, new_vector_access):
	New functions.
	* jit-recording.cc (recording::context::new_rvalue_vector_perm,
	recording::context::new_vector_access,
	memento_of_new_rvalue_vector_perm,
	recording::memento_of_new_rvalue_vector_perm::replay_into,
	recording::memento_of_new_rvalue_vector_perm::visit_children,
	recording::memento_of_new_rvalue_vector_perm::make_debug_string,
	recording::memento_of_new_rvalue_vector_perm::write_reproducer,
	recording::vector_access::replay_into,
	recording::vector_access::visit_children,
	recording::vector_access::make_debug_string,
	recording::vector_access::write_reproducer): New methods.
	* jit-recording.h (class memento_of_new_rvalue_vector_perm,
	class vector_access): New classes.
	* libgccjit.cc (gcc_jit_context_new_vector_access,
	gcc_jit_context_new_rvalue_vector_perm): New functions.
	* libgccjit.h (gcc_jit_context_new_rvalue_vector_perm,
	gcc_jit_context_new_vector_access): New functions.
	* libgccjit.map: New functions.

gcc/testsuite/ChangeLog:
	PR jit/112602
	* jit.dg/all-non-failing-tests.h: New test test-vector-perm.c.
	* jit.dg/test-vector-perm.c: New test.
---
 gcc/jit/docs/topics/compatibility.rst|  10 ++
 gcc/jit/docs/topics/expressions.rst  |  53 ++
 gcc/jit/jit-playback.cc  | 150 
 gcc/jit/jit-playback.h   |  11 ++
 gcc/jit/jit-recording.cc | 169 +++
 gcc/jit/jit-recording.h  |  72 
 gcc/jit/libgccjit.cc | 109 
 gcc/jit/libgccjit.h  |  29 
 gcc/jit/libgccjit.map|   6 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  12 +-
 gcc/testsuite/jit.dg/test-vector-perm.c  |  96 +++
 11 files changed, 716 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-vector-perm.c

diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index ebede440ee4..a764e3968d1 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -378,3 +378,13 @@ alignment of a variable:
 
 ``LIBGCCJIT_ABI_25`` covers the addition of
 :func:`gcc_jit_type_get_restrict`
+
+
+.. _LIBGCCJIT_ABI_26:
+
+``LIBGCCJIT_ABI_26``
+
+``LIBGCCJIT_ABI_26`` covers the addition of functions to manipulate vectors:
+
+  * :func:`gcc_jit_context_new_rvalue_vector_perm`
+  * :func:`gcc_jit_context_new_vector_access`
diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst
index 42cfee36302..4a45aa13f5c 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -295,6 +295,35 @@ Vector expressions
 
   #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_rvalue_from_vector
 
+.. function:: gcc_jit_rvalue * \
+  gcc_jit_context_new_rvalue_vector_perm (gcc_jit_context *ctxt, \
+  gcc_jit_location *loc, \
+  gcc_jit_rvalue *elements1, \
+  gcc_jit_rvalue *elements2, \
+  gcc_jit_rvalue *mask);
+
+   Build a permutation of two vectors.
+
+   "elements1" and "elements2" should have the same type.
+   The length of "mask" and "elements1" should be the same.
+   The element type of "mask" should be integral.
+   The size of the element type of "mask" and "elements1" should be the same.
+
+   This entrypoint was added in :ref:`LIBGCCJIT_ABI_26`; you can test for
+   its presence using
+
+   .. code-block:: c
+
+  #ifdef LIBGCCJIT_HAVE_VECTOR_OPERATIONS
+
+Analogous to:
+
+.. code-block:: c
+
+   __builtin_shuffle (elements1, elements2, mask)
+
+in C.
+
 Unary Operations
 
 
@@ -1020,3 +1049,27 @@ Field access is provided separately for both lvalues and rvalues.
   PTR[INDEX]
 
in C (or, indeed, to ``PTR + INDEX``).
+
+.. function:: gcc_jit_lvalue *\
+  

[PATCH] Makefile.tpl: Avoid race condition in generating site.exp from the top level

2023-11-17 Thread Lewis Hyatt
Hello-

I often find it convenient to run a new c-c++-common test from the
main build dir like:

$ make -j 2 RUNTESTFLAGS=dg.exp=new-test.c check-gcc-{c,c++}

I noticed that sometimes this produces a corrupted site.exp and then no
tests work until it is remade manually. To avoid the issue, it is necessary
to do "cd gcc; make site.exp" before running a parallel make from the top
level directory. The below patch fixes it by just making that dependency on
site.exp explicit in the top level Makefile. Is it OK please? Thanks...

-Lewis

-- >8 --

A command like "make -j 2 check-gcc-c check-gcc-c++" run in the top level of
a fresh build directory does not work reliably. That will spawn two
independent make processes inside the "gcc" directory; each of those
will attempt to create site.exp if it doesn't exist, and they will interfere
with each other, often producing a corrupted or empty site.exp. Resolve that by
making these targets depend on a new phony target which makes sure site.exp
is created before starting the recursive makes.

ChangeLog:

* Makefile.in: Regenerate.
* Makefile.tpl: Add dependency on site.exp to check-gcc-* targets
---
 Makefile.in  | 30 +++---
 Makefile.tpl | 10 +-
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/Makefile.tpl b/Makefile.tpl
index 8b7783bb4f1..6e22adecd2f 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1639,9 +1639,17 @@ cross: all-build all-gas all-ld
 @endif gcc-no-bootstrap
 
 @if gcc
+
+.PHONY: gcc-site.exp
+gcc-site.exp:
+   r=`${PWD_COMMAND}`; export r; \
+   s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
+   $(HOST_EXPORTS) \
+   (cd gcc && $(MAKE) $(GCC_FLAGS_TO_PASS) site.exp);
+
 [+ FOR languages +]
 .PHONY: check-gcc-[+language+] check-[+language+]
-check-gcc-[+language+]:
+check-gcc-[+language+]: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
diff --git a/Makefile.in b/Makefile.in
index b65ab4953bc..da2344b3f3d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -62200,8 +62200,16 @@ cross: all-build all-gas all-ld
 
 @if gcc
 
+.PHONY: gcc-site.exp
+gcc-site.exp:
+   r=`${PWD_COMMAND}`; export r; \
+   s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
+   $(HOST_EXPORTS) \
+   (cd gcc && $(MAKE) $(GCC_FLAGS_TO_PASS) site.exp);
+
+
 .PHONY: check-gcc-c check-c
-check-gcc-c:
+check-gcc-c: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62209,7 +62217,7 @@ check-gcc-c:
 check-c: check-gcc-c
 
 .PHONY: check-gcc-c++ check-c++
-check-gcc-c++:
+check-gcc-c++: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62217,7 +62225,7 @@ check-gcc-c++:
 check-c++: check-gcc-c++ check-target-libstdc++-v3 check-target-libitm-c++ check-target-libgomp-c++
 
 .PHONY: check-gcc-fortran check-fortran
-check-gcc-fortran:
+check-gcc-fortran: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62225,7 +62233,7 @@ check-gcc-fortran:
 check-fortran: check-gcc-fortran check-target-libquadmath check-target-libgfortran check-target-libgomp-fortran
 
 .PHONY: check-gcc-ada check-ada
-check-gcc-ada:
+check-gcc-ada: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62233,7 +62241,7 @@ check-gcc-ada:
 check-ada: check-gcc-ada check-target-libada
 
 .PHONY: check-gcc-objc check-objc
-check-gcc-objc:
+check-gcc-objc: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62241,7 +62249,7 @@ check-gcc-objc:
 check-objc: check-gcc-objc check-target-libobjc
 
 .PHONY: check-gcc-obj-c++ check-obj-c++
-check-gcc-obj-c++:
+check-gcc-obj-c++: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62249,7 +62257,7 @@ check-gcc-obj-c++:
 check-obj-c++: check-gcc-obj-c++
 
 .PHONY: check-gcc-go check-go
-check-gcc-go:
+check-gcc-go: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62257,7 +62265,7 @@ check-gcc-go:
 check-go: check-gcc-go check-target-libgo check-gotools
 
 .PHONY: check-gcc-m2 check-m2
-check-gcc-m2:
+check-gcc-m2: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62265,7 +62273,7 @@ check-gcc-m2:
 check-m2: check-gcc-m2 check-target-libgm2
 
 .PHONY: check-gcc-d check-d
-check-gcc-d:
+check-gcc-d: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62273,7 

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
On Fri, 17 Nov 2023 at 15:32, Jonathan Wakely  wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> GCC generates better code for add_sat if we use:
>
> unsigned z = x + y;
> z |= -(z < x);
> return z;
>
> If the compiler can't be improved we should consider using that instead
> of __builtin_add_overflow.

I reported PR 112600 for the missed optimization. I added an optimized
sub_sat there as well.
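
For reference, the two forms being compared can be written side by side as
below. This is only a sketch for unsigned int: the function names are mine,
not the ones used in libstdc++, and the second function simply shows the usual
way __builtin_add_overflow is used for saturation.

```cpp
#include <limits>

// Branchless form quoted above: if the sum wrapped, (z < x) is 1 and
// -(z < x) converts to all-ones, so the OR saturates z to the maximum.
constexpr unsigned add_sat_branchless(unsigned x, unsigned y)
{
    unsigned z = x + y;   // wraps modulo 2^N on overflow
    z |= -(z < x);
    return z;
}

// Form based on the GCC builtin, which reports whether the sum overflowed.
inline unsigned add_sat_builtin(unsigned x, unsigned y)
{
    unsigned z = 0;
    if (__builtin_add_overflow(x, y, &z))
        return std::numeric_limits<unsigned>::max();
    return z;
}
```

Both functions return the same values for all inputs; the difference
discussed here is only in the code GCC generates for them.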



[PATCH] c++: P2280R4, Using unknown refs in constant expr [PR106650]

2023-11-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch is an attempt to implement (part of?) P2280, Using unknown
pointers and references in constant expressions.  (Note that R4 seems to
only allow References to unknown/Accesses via this, but not Pointers to
unknown.)

This patch works to the extent that the test case added in [expr.const]
works as expected, as well as the test in


Most importantly, the proposal makes this compile:

  template <typename T, size_t N>
  constexpr auto array_size(T (&)[N]) -> size_t {
  return N;
  }

  void check(int const (&param)[3]) {
  constexpr auto s = array_size(param);
  static_assert (s == 3);
  }

and I think it would be a pity not to have it in GCC 14.

What still doesn't work (and I don't know if it should) is the test in $3.2:

  struct A2 { constexpr int f() { return 0; } };
  struct B2 : virtual A2 {};
  void f2(B2& b) { constexpr int k = b.f(); }

where we say
error: '* & b' is not a constant expression

PR c++/106650

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Allow reference to
unknown as per P2280.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array-ptr6.C: Remove dg-error.
* g++.dg/cpp0x/constexpr-ref12.C: Likewise.
* g++.dg/cpp1y/lambda-generic-const10.C: Likewise.
* g++.dg/cpp0x/constexpr-ref13.C: New test.
* g++.dg/cpp1z/constexpr-ref1.C: New test.
* g++.dg/cpp1z/constexpr-ref2.C: New test.
* g++.dg/cpp2a/constexpr-ref1.C: New test.
---
 gcc/cp/constexpr.cc   |  2 +
 .../g++.dg/cpp0x/constexpr-array-ptr6.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C  |  4 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C  | 25 +
 .../g++.dg/cpp1y/lambda-generic-const10.C |  2 +-
 gcc/testsuite/g++.dg/cpp1z/constexpr-ref1.C   | 26 +
 gcc/testsuite/g++.dg/cpp1z/constexpr-ref2.C   | 23 
 gcc/testsuite/g++.dg/cpp2a/constexpr-ref1.C   | 54 +++
 8 files changed, 134 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-ref1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-ref2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-ref1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 344107d494b..d5e487801cc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7378,6 +7378,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
  r = build_constructor (TREE_TYPE (t), NULL);
  TREE_CONSTANT (r) = true;
}
+  else if (TYPE_REF_P (TREE_TYPE (t)))
+   /* P2280 allows references to unknown.  */;
   else
{
  if (!ctx->quiet)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
index 1c065120314..d212665e51f 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
@@ -12,7 +12,7 @@ constexpr auto sz_d = size(array_double);
 static_assert(sz_d == 3, "Array size failure");
 
 void f(bool (&param)[2]) {
-  static_assert(size(param) == 2, "Array size failure"); // { dg-error "" }
+  static_assert(size(param) == 2, "Array size failure");
   short data[] = {-1, 2, -45, 6, 88, 99, -345};
   static_assert(size(data) == 7, "Array size failure");
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
index 7c3ce66b4c9..f4500144946 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
@@ -40,7 +40,7 @@ void f(a ap, a& arp)
   static_assert (g(ar2),"");   // { dg-error "constant" }
   static_assert (h(ar2),"");   // { dg-error "constant" }
 
-  static_assert (arp.g(),"");  // { dg-error "constant" }
-  static_assert (g(arp),"");   // { dg-error "constant" }
+  static_assert (arp.g(),"");
+  static_assert (g(arp),"");
   static_assert (h(arp),"");   // { dg-error "constant" }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
new file mode 100644
index 000..4be729c2301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
@@ -0,0 +1,25 @@
+// P2280R4 - Using unknown pointers and references in constant expressions
+// PR c++/106650
+// { dg-do compile { target c++11 } }
+
+using size_t = decltype(sizeof(42));
+
+template <typename T, size_t N>
+constexpr auto array_size(T (&)[N]) -> size_t {
+return N;
+}
+
+void check(int const (&param)[3]) {
+int local[] = {1, 2, 3};
+constexpr auto s0 = array_size(local);
+constexpr auto s1 = array_size(param);
+}
+
+template <typename T, size_t N>
+constexpr size_t array_size_ptr(T (*)[N]) {
+return N;
+}
+
+void check_ptr(int const (*param)[3]) {
+constexpr auto s2 = array_size_ptr(param); // { 

Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law




On 11/17/23 14:08, Antoni Boucher wrote:

In contrast with the other frontends, libgccjit can be executed
multiple times in a row in the same process.
Yup.  I'm aware of that.  Even so calling init_emit_once more than one 
time still seems wrong.


jeff


Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Antoni Boucher
In contrast with the other frontends, libgccjit can be executed
multiple times in a row in the same process.
This is the source of multiple bugs due to global variables as can be
seen by several patches I sent these past years.

On Fri, 2023-11-17 at 14:06 -0700, Jeff Law wrote:
> 
> 
> On 11/16/23 15:36, Antoni Boucher wrote:
> > Hi.
> > This patch fixes a RTL bug when using some target-specific builtins
> > in
> > libgccjit (bug 112576).
> > 
> > The test use a function from an unmerged patch:
> > https://gcc.gnu.org/pipermail/jit/2023q1/001605.html
> > 
> > Thanks for the review!
> The natural question here is why does libgccjit call init_emit_once
> more 
> than one time?  The whole point of that routine is doing one time 
> initializations.  It's not supposed to be called more than once.
> 
> David?  Thoughts here?
> 
> jeff



Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law




On 11/16/23 15:36, Antoni Boucher wrote:

Hi.
This patch fixes a RTL bug when using some target-specific builtins in
libgccjit (bug 112576).

The test use a function from an unmerged patch:
https://gcc.gnu.org/pipermail/jit/2023q1/001605.html

Thanks for the review!
The natural question here is why does libgccjit call init_emit_once more 
than one time?  The whole point of that routine is doing one time 
initializations.  It's not supposed to be called more than once.


David?  Thoughts here?

jeff


[PATCH] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-17 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2; this is incorrect because roundM2 is defined as "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way that is not easy to hack on.

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).
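
To make the tie-breaking difference in item 2 concrete, here is a minimal C
sketch of the two rounding rules.  The function names are hypothetical and
are not GCC or libm APIs; the model only covers values whose truncation fits
in a long long, does not preserve the sign of zero, and does not model
FE_INEXACT at all.

```c
/* Hypothetical model of roundM2 semantics: round to nearest,
   ties away from zero.  */
static double round_ties_away(double x)
{
  long long t = (long long)x;     /* truncates toward zero */
  double frac = x - (double)t;    /* same sign as x, magnitude below 1 */
  if (frac >= 0.5)
    return (double)(t + 1);
  if (frac <= -0.5)
    return (double)(t - 1);
  return (double)t;
}

/* Hypothetical model of roundevenM2 semantics: round to nearest,
   ties to the even neighbour.  */
static double round_ties_even(double x)
{
  long long t = (long long)x;
  double frac = x - (double)t;
  int t_is_odd = (t % 2) != 0;    /* on a tie, step only if t is odd */
  if (frac > 0.5 || (frac == 0.5 && t_is_odd))
    return (double)(t + 1);
  if (frac < -0.5 || (frac == -0.5 && t_is_odd))
    return (double)(t - 1);
  return (double)t;
}
```

The two functions agree everywhere except on exact ties such as 2.5, which is
why an instruction implementing one cannot be used for the other.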

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_, lsx_vftint_s__,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
v4sf2,
v2df2, round2,
fix_trunc2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(_vfrint_): New
define_insn template for frint instructions.
(_vftint__):
Likewise, but for ftint instructions.
(2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l2): Likewise.
(rint2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(ftrunc2): Likewise.
(lrint2): Likewise.
(fix_trunc2): New define_insn_and_split.  It does
not require flag_fp_int_builtin_inexact.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu (with LASX enabled
in BOOT_CFLAGS).  Ok for trunk?

 gcc/config/loongarch/lasx.md  | 239 -
 gcc/config/loongarch/loongarch-builtins.cc|   4 -
 gcc/config/loongarch/loongarch.md |   7 +-
 gcc/config/loongarch/lsx.md   | 243 --
 gcc/config/loongarch/simd.md  | 204 +++
 .../loongarch/vect-frint-no-inexact.c |  48 
 .../gcc.target/loongarch/vect-frint.c |  82 ++
 .../loongarch/vect-ftint-no-inexact.c |  44 
 .../gcc.target/loongarch/vect-ftint.c |  80 ++
 9 files changed, 460 insertions(+), 491 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2e11f061202..d4a56c307c4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -53,7 +53,6 @@
   UNSPEC_LASX_XVFCMP_SULT
   UNSPEC_LASX_XVFCMP_SUN
   

[PATCH v2 3/6] LoongArch: Add evolution features of base ISA revisions

2023-11-17 Thread Xi Ruoyao
* config/loongarch/loongarch-def.h:
(loongarch_isa_base_features): Declare.  Define it in ...
* config/loongarch/loongarch-cpu.cc
(loongarch_isa_base_features): ... here.
(fill_native_cpu_config): If we know the base ISA of the CPU
model from PRID, use it instead of la64 (v1.0).  Check if all
expected features of this base ISA are available, emit a warning
if not.
* config/loongarch/loongarch-opts.cc (config_target_isa): Enable
the features implied by the base ISA if not -march=native.
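
The feature check described above amounts to a bitmask subset test.  A
minimal C sketch follows, assuming made-up bit values; the SKETCH_* macros
and the helper name are hypothetical and only stand in for the real
OPTION_MASK_ISA_* constants, whose values come from the generated options.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define SKETCH_FEAT_DIV32     (INT64_C(1) << 0)
#define SKETCH_FEAT_LD_SEQ_SA (INT64_C(1) << 1)

/* True if every feature implied by the base ISA was also detected.  */
static bool base_features_all_detected(int64_t detected, int64_t implied)
{
  return (detected & implied) == implied;
}
```

When this returns false for a CPU identified via PRID, the patch warns and
enables only the detected features.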
---
 gcc/config/loongarch/loongarch-cpu.cc  | 62 ++
 gcc/config/loongarch/loongarch-def.h   |  5 +++
 gcc/config/loongarch/loongarch-opts.cc |  3 ++
 3 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index f41e175257a..7acf1a9121d 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -32,6 +32,19 @@ along with GCC; see the file COPYING3.  If not see
 #include "loongarch-cpucfg-map.h"
 #include "loongarch-str.h"
 
+/* loongarch_isa_base_features defined here instead of loongarch-def.c
+   because we need to use options.h.  Pay attention on the order of elements
+   in the initializer becaue ISO C++ does not allow C99 designated
+   initializers!  */
+
+#define ISA_BASE_LA64V110_FEATURES \
+  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA)
+
+int64_t loongarch_isa_base_features[N_ISA_BASE_TYPES] = {
+  /* [ISA_BASE_LA64V100] = */ 0,
+  /* [ISA_BASE_LA64V110] = */ ISA_BASE_LA64V110_FEATURES,
+};
+
 /* Native CPU detection with "cpucfg" */
 static uint32_t cpucfg_cache[N_CPUCFG_WORDS] = { 0 };
 
@@ -127,24 +140,22 @@ fill_native_cpu_config (struct loongarch_target *tgt)
 With: base architecture (ARCH)
 At:   cpucfg_words[1][1:0] */
 
-  switch (cpucfg_cache[1] & 0x3)
-   {
- case 0x02:
-   tmp = ISA_BASE_LA64V100;
-   break;
-
- default:
-   fatal_error (UNKNOWN_LOCATION,
-"unknown native base architecture %<0x%x%>, "
-"%qs failed", (unsigned int) (cpucfg_cache[1] & 0x3),
-"-m" OPTSTR_ARCH "=" STR_CPU_NATIVE);
-   }
-
-  /* Check consistency with PRID presets.  */
-  if (native_cpu_type != CPU_NATIVE && tmp != preset.base)
-   warning (0, "base architecture %qs differs from PRID preset %qs",
-loongarch_isa_base_strings[tmp],
-loongarch_isa_base_strings[preset.base]);
+  if (native_cpu_type != CPU_NATIVE)
+   tmp = loongarch_cpu_default_isa[native_cpu_type].base;
+  else
+   switch (cpucfg_cache[1] & 0x3)
+ {
+   case 0x02:
+ tmp = ISA_BASE_LA64V100;
+ break;
+
+   default:
+ fatal_error (UNKNOWN_LOCATION,
+  "unknown native base architecture %<0x%x%>, "
+  "%qs failed",
+  (unsigned int) (cpucfg_cache[1] & 0x3),
+  "-m" OPTSTR_ARCH "=" STR_CPU_NATIVE);
+ }
 
   /* Use the native value anyways.  */
   preset.base = tmp;
@@ -227,6 +238,21 @@ fill_native_cpu_config (struct loongarch_target *tgt)
   for (const auto &entry : cpucfg_map)
if (cpucfg_cache[entry.cpucfg_word] & entry.cpucfg_bit)
  preset.evolution |= entry.isa_evolution_bit;
+
+  if (native_cpu_type != CPU_NATIVE)
+   {
+ /* Check if the local CPU really supports the features of the base
+ISA of probed native_cpu_type.  If any feature is not detected,
+either GCC or the hardware is buggy.  */
+ auto base_isa_feature = loongarch_isa_base_features[preset.base];
+ if ((preset.evolution & base_isa_feature) != base_isa_feature)
+   warning (0,
+"detected base architecture %qs, but some of its "
+"features are not detected; the detected base "
+"architecture may be unreliable, only detected "
+"features will be enabled",
+loongarch_isa_base_strings[preset.base]);
+   }
 }
 
   if (tune_native_p)
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 6123c8e0f19..af7bd635d6e 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -55,12 +55,17 @@ extern "C" {
 
 /* enum isa_base */
 extern const char* loongarch_isa_base_strings[];
+
 /* LoongArch V1.00.  */
 #define ISA_BASE_LA64V100 0
 /* LoongArch V1.10.  */
 #define ISA_BASE_LA64V110 1
 #define N_ISA_BASE_TYPES  2
 
+/* Unlike other arrays, this is defined in loongarch-cpu.cc.  The problem is
+   we cannot use the C++ header options.h in loongarch-def.c.  */
+extern int64_t loongarch_isa_base_features[];
+
 /* enum isa_ext_* */
 extern const 

[PATCH v2 4/6] LoongArch: Take the advantage of -mdiv32 if it's enabled

2023-11-17 Thread Xi Ruoyao
With -mdiv32, we can assume div.w[u] and mod.w[u] work on the low 32 bits
of a 64-bit GPR even if it's not sign-extended.
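
As a rough illustration of what "not sign-extended" means here, the
following C sketch models a 64-bit GPR as an int64_t.  The helper names are
hypothetical and not part of GCC, and the narrowing casts rely on GCC's
documented modulo behaviour for out-of-range conversions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: true if bits 63..32 of the register are copies
   of bit 31, i.e. the register holds a sign-extended 32-bit value.  */
static bool is_sign_extended(int64_t reg)
{
  return reg == (int64_t)(int32_t)(uint32_t)reg;
}

/* Hypothetical model of div.w with the div32 feature: only the low
   32 bits of each input are used and the quotient is sign-extended.
   The caller must avoid a zero divisor and INT32_MIN / -1.  */
static int64_t div_w_model(int64_t ra, int64_t rb)
{
  int32_t a = (int32_t)(uint32_t)ra;
  int32_t b = (int32_t)(uint32_t)rb;
  return (int64_t)(a / b);
}
```

Under this model the upper halves of the inputs never influence the result,
so the compiler no longer has to sign-extend them before dividing.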

gcc/ChangeLog:

* config/loongarch/loongarch.md (DIV): New mode iterator.
(3): Don't expand if TARGET_DIV32.
(di3_fake): Disable if TARGET_DIV32.
(*3): Allow SImode if TARGET_DIV32.
(si3_extended): New insn if TARGET_DIV32.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-div32.c: New test.
* gcc.target/loongarch/div-no-div32.c: New test.
---
 gcc/config/loongarch/loongarch.md | 31 ---
 .../gcc.target/loongarch/div-div32.c  | 31 +++
 .../gcc.target/loongarch/div-no-div32.c   | 11 +++
 3 files changed, 68 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 22814a3679c..a97e5ee094a 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -408,6 +408,10 @@ (define_mode_iterator LD_AT_LEAST_32_BIT [GPR ANYF])
 ;; st.w.
 (define_mode_iterator ST_ANY [QHWD ANYF])
 
+;; A mode for anything legal as a input of a div or mod instruction.
+(define_mode_iterator DIV [(DI "TARGET_64BIT")
+  (SI "!TARGET_64BIT || TARGET_DIV32")])
+
 ;; In GPR templates, a string like "mul." will expand to "mul.w" in the
 ;; 32-bit version and "mul.d" in the 64-bit version.
 (define_mode_attr d [(SI "w") (DI "d")])
@@ -914,7 +918,7 @@ (define_expand "3"
 (match_operand:GPR 2 "register_operand")))]
   ""
 {
- if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT && !TARGET_DIV32)
   {
 rtx reg1 = gen_reg_rtx (DImode);
 rtx reg2 = gen_reg_rtx (DImode);
@@ -934,9 +938,9 @@ (define_expand "3"
 })
 
 (define_insn "*3"
-  [(set (match_operand:X 0 "register_operand" "=r,,")
-   (any_div:X (match_operand:X 1 "register_operand" "r,r,0")
-  (match_operand:X 2 "register_operand" "r,r,r")))]
+  [(set (match_operand:DIV 0 "register_operand" "=r,,")
+   (any_div:DIV (match_operand:DIV 1 "register_operand" "r,r,0")
+(match_operand:DIV 2 "register_operand" "r,r,r")))]
   ""
 {
   return loongarch_output_division (".\t%0,%1,%2", operands);
@@ -949,6 +953,23 @@ (define_insn "*3"
(const_string "yes")
(const_string "no")))])
 
+(define_insn "si3_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r,,")
+   (sign_extend
+ (any_div:SI (match_operand:SI 1 "register_operand" "r,r,0")
+ (match_operand:SI 2 "register_operand" "r,r,r"]
+  "TARGET_64BIT && TARGET_DIV32"
+{
+  return loongarch_output_division (".w\t%0,%1,%2", operands);
+}
+  [(set_attr "type" "idiv")
+   (set_attr "mode" "SI")
+   (set (attr "enabled")
+  (if_then_else
+   (match_test "!!which_alternative == loongarch_check_zero_div_p()")
+   (const_string "yes")
+   (const_string "no")))])
+
 (define_insn "di3_fake"
   [(set (match_operand:DI 0 "register_operand" "=r,,")
(sign_extend:DI
@@ -957,7 +978,7 @@ (define_insn "di3_fake"
 (any_div:DI (match_operand:DI 1 "register_operand" "r,r,0")
 (match_operand:DI 2 "register_operand" "r,r,r")) 0)]
  UNSPEC_FAKE_ANY_DIV)))]
-  "TARGET_64BIT"
+  "TARGET_64BIT && !TARGET_DIV32"
 {
   return loongarch_output_division (".w\t%0,%1,%2", operands);
 }
diff --git a/gcc/testsuite/gcc.target/loongarch/div-div32.c 
b/gcc/testsuite/gcc.target/loongarch/div-div32.c
new file mode 100644
index 000..8b1f686eca2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-div32.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mdiv32" } */
+/* { dg-final { scan-assembler "div\.w" } } */
+/* { dg-final { scan-assembler "div\.wu" } } */
+/* { dg-final { scan-assembler "mod\.w" } } */
+/* { dg-final { scan-assembler "mod\.wu" } } */
+/* { dg-final { scan-assembler-not "slli\.w.*,0" } } */
+
+int
+divw (long a, long b)
+{
+  return (int)a / (int)b;
+}
+
+unsigned int
+divwu (long a, long b)
+{
+  return (unsigned int)a / (unsigned int)b;
+}
+
+int
+modw (long a, long b)
+{
+  return (int)a % (int)b;
+}
+
+unsigned int
+modwu (long a, long b)
+{
+  return (unsigned int)a % (unsigned int)b;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/div-no-div32.c 
b/gcc/testsuite/gcc.target/loongarch/div-no-div32.c
new file mode 100644
index 000..f0f697ba589
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-no-div32.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "div\.w" } } */
+/* { dg-final { scan-assembler "div\.wu" } } */
+/* { dg-final { scan-assembler "mod\.w" } } */
+/* { 

[PATCH v2 6/6] LoongArch: Add fine-grained control for LAM_BH and LAMCAS

2023-11-17 Thread Xi Ruoyao
gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in: (lam-bh, lamcas):
Add.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
* config/loongarch/loongarch-cpu.cc
(ISA_BASE_LA64V110_FEATURES): Include OPTION_MASK_ISA_LAM_BH
and OPTION_MASK_ISA_LAMCAS.
* config/loongarch/sync.md (atomic_add): Use
TARGET_LAM_BH instead of ISA_BASE_IS_LA64V110.  Remove empty
lines from assembly output.
(atomic_exchange_short): Likewise.
(atomic_exchange): Likewise.
(atomic_fetch_add_short): Likewise.
(atomic_fetch_add): Likewise.
(atomic_cas_value_strong_amcas): Use TARGET_LAMCAS instead
of ISA_BASE_IS_LA64V110.
(atomic_compare_and_swap): Likewise.
(atomic_compare_and_swap): Likewise.
(atomic_compare_and_swap): Likewise.
* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump
status of -mlam-bh and -mlamcas if -fverbose-asm.
---
 gcc/config/loongarch/genopts/isa-evolution.in |  2 ++
 gcc/config/loongarch/loongarch-cpu.cc |  3 ++-
 gcc/config/loongarch/loongarch-cpucfg-map.h   |  2 ++
 gcc/config/loongarch/loongarch-str.h  |  2 ++
 gcc/config/loongarch/loongarch.cc |  2 ++
 gcc/config/loongarch/loongarch.opt|  8 
 gcc/config/loongarch/sync.md  | 18 +-
 7 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in 
b/gcc/config/loongarch/genopts/isa-evolution.in
index e58f0d6a1a1..a6bc3f87f20 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -1,2 +1,4 @@
 2  26  div32   Support div.w[u] and mod.w[u] instructions with 
inputs not sign-extended.
+2  27  lam-bh  Support am{swap/add}[_db].{b/h} instructions.
+2  28  lamcas  Support amcas[_db].{b/h/w/d} instructions.
 3  23  ld-seq-sa   Do not need load-load barriers (dbar 0x700).
diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index 7acf1a9121d..622df47916f 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
initializers!  */
 
 #define ISA_BASE_LA64V110_FEATURES \
-  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA)
+  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA \
+   | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS)
 
 int64_t loongarch_isa_base_features[N_ISA_BASE_TYPES] = {
   /* [ISA_BASE_LA64V100] = */ 0,
diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h 
b/gcc/config/loongarch/loongarch-cpucfg-map.h
index 0c078c39786..02ff1671255 100644
--- a/gcc/config/loongarch/loongarch-cpucfg-map.h
+++ b/gcc/config/loongarch/loongarch-cpucfg-map.h
@@ -30,6 +30,8 @@ static constexpr struct {
   HOST_WIDE_INT isa_evolution_bit;
 } cpucfg_map[] = {
   { 2, 1u << 26, OPTION_MASK_ISA_DIV32 },
+  { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH },
+  { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS },
   { 3, 1u << 23, OPTION_MASK_ISA_LD_SEQ_SA },
 };
 
diff --git a/gcc/config/loongarch/loongarch-str.h 
b/gcc/config/loongarch/loongarch-str.h
index 889962e9ab0..0384493765c 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -70,6 +70,8 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_EXPLICIT_RELOCS_ALWAYS "always"
 
 #define OPTSTR_DIV32   "div32"
+#define OPTSTR_LAM_BH  "lam-bh"
+#define OPTSTR_LAMCAS  "lamcas"
 #define OPTSTR_LD_SEQ_SA   "ld-seq-sa"
 
 #endif /* LOONGARCH_STR_H */
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 5d3282c5e93..46a898b79b7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11451,6 +11451,8 @@ loongarch_asm_code_end (void)
   fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START,
   loongarch_isa_base_strings [la_target.isa.base]);
   DUMP_FEATURE (TARGET_DIV32);
+  DUMP_FEATURE (TARGET_LAM_BH);
+  DUMP_FEATURE (TARGET_LAMCAS);
   DUMP_FEATURE (TARGET_LD_SEQ_SA);
 }
 
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index a39eddc108b..4d36e3ec4de 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -267,6 +267,14 @@ mdiv32
 Target Mask(ISA_DIV32) Var(isa_evolution)
 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
 
+mlam-bh
+Target Mask(ISA_LAM_BH) Var(isa_evolution)
+Support am{swap/add}[_db].{b/h} instructions.
+
+mlamcas
+Target Mask(ISA_LAMCAS) Var(isa_evolution)
+Support amcas[_db].{b/h/w/d} instructions.
+
 mld-seq-sa
 Target Mask(ISA_LD_SEQ_SA) Var(isa_evolution)
 Do not need 

[PATCH v2 5/6] LoongArch: Don't emit dbar 0x700 if -mld-seq-sa

2023-11-17 Thread Xi Ruoyao
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that
two loads on the same address won't be reordered with each other".  Thus
we can omit the "load-load" barrier dbar 0x700.

This is only a micro-optimization because dbar 0x700 is already treated
as nop if the hardware supports LD_SEQ_SA.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand): Don't
print dbar 0x700 if TARGET_LD_SEQ_SA.
* config/loongarch/sync.md (atomic_load): Likewise.
---
 gcc/config/loongarch/loongarch.cc | 2 +-
 gcc/config/loongarch/sync.md  | 9 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b4bb2b6eeb5..5d3282c5e93 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6057,7 +6057,7 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
   if (loongarch_cas_failure_memorder_needs_acquire (
memmodel_from_int (INTVAL (op
fputs ("dbar\t0b10100", file);
-  else
+  else if (!TARGET_LD_SEQ_SA)
fputs ("dbar\t0x700", file);
   break;
 
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 67848d72b87..ce3ce89a61d 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -119,13 +119,14 @@ (define_insn "atomic_load"
 case MEMMODEL_SEQ_CST:
   return "dbar\t0x11\\n\\t"
 "ld.\t%0,%1\\n\\t"
-"dbar\t0x14\\n\\t";
+"dbar\t0x14";
 case MEMMODEL_ACQUIRE:
   return "ld.\t%0,%1\\n\\t"
-"dbar\t0x14\\n\\t";
+"dbar\t0x14";
 case MEMMODEL_RELAXED:
-  return "ld.\t%0,%1\\n\\t"
-"dbar\t0x700\\n\\t";
+  return TARGET_LD_SEQ_SA ? "ld.\t%0,%1\\n\\t"
+ : "ld.\t%0,%1\\n\\t"
+   "dbar\t0x700";
 
 default:
   /* The valid memory order variants are __ATOMIC_RELAXED, 
__ATOMIC_SEQ_CST,
-- 
2.42.1



[PATCH v2 2/6] LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution

2023-11-17 Thread Xi Ruoyao
LoongArch v1.10 introduced the concept of ISA evolution.  During ISA
evolution, many independent features can be added and enumerated via
CPUCFG.

Add a data file into genopts storing the CPUCFG word, bit, the name
of the command line option controlling if this feature should be used
for compilation, and the text description.  Make genstr.sh process these
info and add the command line options into loongarch.opt and
loongarch-str.h, and generate a new file loongarch-cpucfg-map.h for
mapping CPUCFG output to the corresponding option.  When handling
-march=native, use the information in loongarch-cpucfg-map.h to generate
the corresponding option mask.  Enable the features implied by -march
setting unless the user has explicitly disabled the feature.

The added options (-mdiv32 and -mld-seq-sa) are not really handled yet.
They'll be used in the following patches.
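
The mapping from CPUCFG output to an option mask can be sketched in C as
follows.  The word and bit positions match the two entries in
isa-evolution.in; everything else (the struct name, the SKETCH_* mask
values, and the helper function) is hypothetical and only approximates what
the generated loongarch-cpucfg-map.h and its consumer do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the generated OPTION_MASK_ISA_* values.  */
#define SKETCH_MASK_DIV32     (INT64_C(1) << 0)
#define SKETCH_MASK_LD_SEQ_SA (INT64_C(1) << 1)

struct cpucfg_entry_sketch {
  int word;         /* CPUCFG word index */
  uint32_t bit;     /* bit mask inside that word */
  int64_t feature;  /* option mask to set when the bit is present */
};

static const struct cpucfg_entry_sketch cpucfg_map_sketch[] = {
  { 2, UINT32_C(1) << 26, SKETCH_MASK_DIV32 },
  { 3, UINT32_C(1) << 23, SKETCH_MASK_LD_SEQ_SA },
};

/* Collect the feature mask for the CPUCFG words read from the CPU.  */
static int64_t features_from_cpucfg(const uint32_t *words, size_t n_words)
{
  int64_t mask = 0;
  size_t n = sizeof cpucfg_map_sketch / sizeof cpucfg_map_sketch[0];
  for (size_t i = 0; i < n; i++)
    {
      const struct cpucfg_entry_sketch *e = &cpucfg_map_sketch[i];
      if ((size_t) e->word < n_words && (words[e->word] & e->bit))
        mask |= e->feature;
    }
  return mask;
}
```

Keeping the table generated from one data file means a new feature needs only
one added line there, rather than edits in several places.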

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in: New data file.
* config/loongarch/genopts/genstr.sh: Translate info in
isa-evolution.in when generating loongarch-str.h, loongarch.opt,
and loongarch-cpucfg-map.h.
* config/loongarch/genopts/loongarch.opt.in (isa_evolution):
New variable.
* config/loongarch/t-loongarch: (loongarch-cpucfg-map.h): New
rule.
(loongarch-str.h): Depend on isa-evolution.in.
(loongarch.opt): Depend on isa-evolution.in.
(loongarch-cpu.o): Depend on loongarch-cpucfg-map.h.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch-def.h (loongarch_isa):  Add field
for evolution features.  Add helper function to enable features
in this field.
Probe native CPU capability and save the corresponding options
into preset.
* config/loongarch/loongarch-cpu.cc (fill_native_cpu_config):
Probe native CPU capability and save the corresponding options
into preset.
(cache_cpucfg): Simplify with C++11-style for loop.
(cpucfg_useful_idx, N_CPUCFG_WORDS): Move to ...
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Enable the ISA evolution
feature options implied by -march and not explicitly disabled.
(loongarch_asm_code_end): New function, print ISA information as
comments in the assembly if -fverbose-asm.  It makes it easier to
debug things like -march=native.
(TARGET_ASM_CODE_END): Define.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-cpucfg-map.h: Generate.
(cpucfg_useful_idx, N_CPUCFG_WORDS) ... here.
---
 gcc/config/loongarch/genopts/genstr.sh| 92 ++-
 gcc/config/loongarch/genopts/isa-evolution.in |  2 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  7 ++
 gcc/config/loongarch/loongarch-cpu.cc | 46 +-
 gcc/config/loongarch/loongarch-cpucfg-map.h   | 48 ++
 gcc/config/loongarch/loongarch-def.h  |  7 ++
 gcc/config/loongarch/loongarch-str.h  |  7 +-
 gcc/config/loongarch/loongarch.cc | 31 +++
 gcc/config/loongarch/loongarch.opt| 20 +++-
 gcc/config/loongarch/t-loongarch  | 21 -
 10 files changed, 245 insertions(+), 36 deletions(-)
 create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
 create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h

diff --git a/gcc/config/loongarch/genopts/genstr.sh 
b/gcc/config/loongarch/genopts/genstr.sh
index 04e785576bb..cc83496ae38 100755
--- a/gcc/config/loongarch/genopts/genstr.sh
+++ b/gcc/config/loongarch/genopts/genstr.sh
@@ -25,8 +25,8 @@ cd "$(dirname "$0")"
 # Generate a header containing definitions from the string table.
 gen_defines() {
 cat .  */
+
+#ifndef LOONGARCH_CPUCFG_MAP_H
+#define LOONGARCH_CPUCFG_MAP_H
+
+#include "options.h"
+
+static constexpr struct {
+  int cpucfg_word;
+  unsigned int cpucfg_bit;
+  HOST_WIDE_INT isa_evolution_bit;
+} cpucfg_map[] = {
+EOF
+
+# Generate the strings from isa-evolution.in.
+awk '{
+  gsub(/-/, "_", $3)
+  print("  { "$1", 1u << "$2", OPTION_MASK_ISA_"toupper($3)" },")
+}' isa-evolution.in
+
+echo "};"
+echo
+echo "static constexpr int cpucfg_useful_idx[] = {"
+
+awk 'BEGIN { print("  0,\n  1,\n  2,\n  16,\n  17,\n  18,\n  19,") }
+{if ($1+0 > max+0) max=$1; print("  "$1",")}' \
+   isa-evolution.in | sort -n | uniq
+
+echo "};"
+echo ""
+
+awk 'BEGIN { max=19 }
+{ if ($1+0 > max+0) max=$1 }
+END { print "static constexpr int N_CPUCFG_WORDS = "1+max";" }' \
+   isa-evolution.in
+
+echo "#endif /* LOONGARCH_CPUCFG_MAP_H */"
 }
 
 main() {
 case "$1" in
+   cpucfg-map) gen_cpucfg_map;;
header) gen_defines;;
opt) gen_options;;
-   *) echo "Unknown Command: \"$1\". Available: header, opt"; exit 1;;
+   *) echo "Unknown Command: \"$1\". Available: 

[PATCH v2 1/6] LoongArch: Fix internal error running "gcc -march=native" on LA664

2023-11-17 Thread Xi Ruoyao
On LA664, the PRID preset is ISA_BASE_LA64V110 but the base architecture
is guessed as ISA_BASE_LA64V100.  This causes a warning to be output:

cc1: warning: base architecture 'la64' differs from PRID preset '?'

But we've not set the "?" above in loongarch_isa_base_strings, thus it's
a nullptr and then an ICE is triggered.

Add ISA_BASE_LA64V110 to genopts and initialize
loongarch_isa_base_strings[ISA_BASE_LA64V110] correctly to fix the ICE.
The warning itself will be fixed later.
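
The failure mode can be reproduced in miniature with the C sketch below.
The array and enum names are hypothetical, not the real GCC declarations; the
strings are the ones this patch uses.  A table slot with no initializer is
zero-initialized, so it holds a null pointer.

```c
#include <stddef.h>

enum { SKETCH_BASE_V100 = 0, SKETCH_BASE_V110 = 1, SKETCH_N_BASE = 2 };

/* Before the fix: only one slot is initialized, so the slot for
   SKETCH_BASE_V110 is implicitly NULL.  */
static const char *const names_before_fix[SKETCH_N_BASE] = {
  [SKETCH_BASE_V100] = "la64",
};

/* After the fix: every slot has a string.  */
static const char *const names_after_fix[SKETCH_N_BASE] = {
  [SKETCH_BASE_V100] = "la64",
  [SKETCH_BASE_V110] = "la64v1.1",
};

/* Hypothetical defensive lookup that never returns NULL; the patch
   itself fills the table instead of adding such a helper.  */
static const char *name_or_placeholder(const char *const *table, int idx)
{
  return table[idx] != NULL ? table[idx] : "?";
}
```

Passing the NULL slot to a diagnostic format that expects a string is what
leads to the crash, which is why filling the table is sufficient here.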

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings:
(STR_ISA_BASE_LA64V110): Add.
* config/loongarch/genopts/loongarch.opt.in:
(ISA_BASE_LA64V110): Add.
* config/loongarch/loongarch-def.c
(loongarch_isa_base_strings): Initialize [ISA_BASE_LA64V110]
to STR_ISA_BASE_LA64V110.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
---
 gcc/config/loongarch/genopts/loongarch-strings | 1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  | 3 +++
 gcc/config/loongarch/loongarch-def.c   | 1 +
 gcc/config/loongarch/loongarch-str.h   | 1 +
 gcc/config/loongarch/loongarch.opt | 3 +++
 5 files changed, 9 insertions(+)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 7bc4824007e..b2070c83ed0 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -30,6 +30,7 @@ STR_CPU_LA664   la664
 
 # Base architecture
 STR_ISA_BASE_LA64V100 la64
+STR_ISA_BASE_LA64V110 la64v1.1
 
 # -mfpu
 OPTSTR_ISA_EXT_FPUfpu
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 00b4733d75b..b274b3fb21e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -32,6 +32,9 @@ Basic ISAs of LoongArch:
 EnumValue
 Enum(isa_base) String(@@STR_ISA_BASE_LA64V100@@) Value(ISA_BASE_LA64V100)
 
+EnumValue
+Enum(isa_base) String(@@STR_ISA_BASE_LA64V110@@) Value(ISA_BASE_LA64V110)
+
 ;; ISA extensions / adjustments
 Enum
 Name(isa_ext_fpu) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 067629141b6..f22d488acb2 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -165,6 +165,7 @@ loongarch_cpu_multipass_dfa_lookahead[N_TUNE_TYPES] = {
 const char*
 loongarch_isa_base_strings[N_ISA_BASE_TYPES] = {
   [ISA_BASE_LA64V100] = STR_ISA_BASE_LA64V100,
+  [ISA_BASE_LA64V110] = STR_ISA_BASE_LA64V110,
 };
 
 const char*
diff --git a/gcc/config/loongarch/loongarch-str.h 
b/gcc/config/loongarch/loongarch-str.h
index fc4f41bfc1e..114dbc692d7 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_CPU_LA664 "la664"
 
 #define STR_ISA_BASE_LA64V100 "la64"
+#define STR_ISA_BASE_LA64V110 "la64v1.1"
 
 #define OPTSTR_ISA_EXT_FPU "fpu"
 #define STR_NONE "none"
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index 7f129e53ba5..350ca30d232 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -39,6 +39,9 @@ Basic ISAs of LoongArch:
 EnumValue
 Enum(isa_base) String(la64) Value(ISA_BASE_LA64V100)
 
+EnumValue
+Enum(isa_base) String(la64v1.1) Value(ISA_BASE_LA64V110)
+
 ;; ISA extensions / adjustments
 Enum
 Name(isa_ext_fpu) Type(int)
-- 
2.42.1



[PATCH v2 0/6] Add LoongArch v1.1 div32 and ld-seq-sa support

2023-11-17 Thread Xi Ruoyao
Supersedes
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636795.html.

Requires
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636946.html.

Changes:

- Rebase on top of "Add LoongarchV1.1 instructions support".
- Do not translate loongarch-def.c to C++.  Use int64_t instead of
  HOST_WIDE_INT in loongarch-def.h.
- In genopts, also generates cpucfg_useful_idx[] and N_CPUCFG_WORDS.
  Use decimals instead of hexidecimals for CPUCFG word index to make awk
  happy to perform numerical comparision.
- Dump arch and feature info as comments in generated assembly if
  -fverbose-asm.  It's helpful for testing and debugging.

Xi Ruoyao (6):
  LoongArch: Fix internal error running "gcc -march=native" on LA664
  LoongArch: genopts: Add infrastructure to generate code for new
features in ISA evolution
  LoongArch: Add evolution features of base ISA revisions
  LoongArch: Take the advantage of -mdiv32 if it's enabled
  LoongArch: Don't emit dbar 0x700 if -mld-seq-sa
  LoongArch: Add fine-grained control for LAM_BH and LAMCAS

 gcc/config/loongarch/genopts/genstr.sh|  92 ++-
 gcc/config/loongarch/genopts/isa-evolution.in |   4 +
 .../loongarch/genopts/loongarch-strings   |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  10 ++
 gcc/config/loongarch/loongarch-cpu.cc | 105 +++---
 gcc/config/loongarch/loongarch-cpucfg-map.h   |  50 +
 gcc/config/loongarch/loongarch-def.c  |   1 +
 gcc/config/loongarch/loongarch-def.h  |  12 ++
 gcc/config/loongarch/loongarch-opts.cc|   3 +
 gcc/config/loongarch/loongarch-str.h  |  10 +-
 gcc/config/loongarch/loongarch.cc |  35 +-
 gcc/config/loongarch/loongarch.md |  31 +-
 gcc/config/loongarch/loongarch.opt|  31 +-
 gcc/config/loongarch/sync.md  |  25 +++--
 gcc/config/loongarch/t-loongarch  |  21 +++-
 .../gcc.target/loongarch/div-div32.c  |  31 ++
 .../gcc.target/loongarch/div-no-div32.c   |  11 ++
 17 files changed, 403 insertions(+), 70 deletions(-)
 create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
 create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c

-- 
2.42.1



[PATCH 7/7] lto: partition specific lto_clone_numbers

2023-11-17 Thread Michal Jires
Replace "lto_priv.$clone_number" with
"lto_priv.$partition_hash.$partition_specific_clone_number"
to reduce divergence for incremental LTO.
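
The naming scheme can be sketched roughly as follows. This is only an
illustration, not the patch's code: `private_name` and `clone_counters` are
hypothetical names, and the real suffix is assembled by GCC's
clone_function_name, whose target-specific separator handling is not
reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the (name, partition hash) -> counter map.
using clone_counters =
    std::map<std::pair<std::string, std::uint64_t>, unsigned>;

// Build "<name>.lto_priv.<partition_hash>.<n>", where n counts the
// clones of NAME inside one partition only.
std::string private_name(clone_counters &counters, const std::string &name,
                         std::uint64_t partition_hash) {
  assert(!name.empty());
  // operator[] value-initializes a missing counter to 0.
  unsigned &n = counters[{name, partition_hash}];
  std::string result = name + ".lto_priv." +
                       std::to_string(partition_hash) + "." +
                       std::to_string(n);
  ++n;
  return result;
}
```

Because the counter is keyed by the partition hash as well as the name, a
new clone in one partition does not renumber clones in any other partition.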

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/lto/ChangeLog:

* lto-partition.cc (set_clone_partition_name_checksum): New.
(CHECKSUM_STRING): New.
(privatize_symbol_name_1): Use partition hash for lto_priv.
(lto_promote_cross_file_statics): Use set_clone_partition_name_checksum.
(lto_promote_statics_nonwpa): Changed clone_map type.
---
 gcc/lto/lto-partition.cc | 49 +++-
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index eb31ecba0d3..a2ce24eea23 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "md5.h"
 
 #include 
 #include 
@@ -1516,8 +1517,36 @@ validize_symbol_for_target (symtab_node *node)
 }
 }
 
-/* Maps symbol names to unique lto clone counters.  */
-static hash_map *lto_clone_numbers;
+/* Maps symbol names with partition checksum to unique lto clone counters.  */
+using clone_map = hash_map>, unsigned>;
+static clone_map *lto_clone_numbers;
+uint64_t current_partition_checksum = 0;
+
+/* Computes a quick checksum to distinguish partitions of clone numbers.  */
+void
+set_clone_partition_name_checksum (ltrans_partition part)
+{
+#define CHECKSUM_STRING(FOO) md5_process_bytes ((FOO), strlen (FOO), &ctx)
+  struct md5_ctx ctx;
+  md5_init_ctx (&ctx);
+
+  CHECKSUM_STRING (part->name);
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+{
+  symtab_node *node = lsei_node (lsei);
+  CHECKSUM_STRING (node->name ());
+}
+
+  uint64_t checksum[2];
+  md5_finish_ctx (&ctx, checksum);
+  current_partition_checksum = checksum[0];
+#undef CHECKSUM_STRING
+}
 
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
represented by DECL.  */
@@ -1531,10 +1560,16 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 return false;
 
   const char *name = maybe_rewrite_identifier (name0);
-  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
+
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+std::pair {name, current_partition_checksum});
+
+  char lto_priv[32];
+  sprintf (lto_priv, "lto_priv.%lu", current_partition_checksum);
+
   symtab->change_decl_assembler_name (decl,
  clone_function_name (
- name, "lto_priv", clone_number));
+ name, lto_priv, clone_number));
   clone_number++;
 
   if (node->lto_file_data)
@@ -1735,11 +1770,13 @@ lto_promote_cross_file_statics (void)
   part->encoder = compute_ltrans_boundary (part->encoder);
 }
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
 
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
 {
+  set_clone_partition_name_checksum (ltrans_partitions[i]);
+
   lto_symtab_encoder_iterator lsei;
   lto_symtab_encoder_t encoder = ltrans_partitions[i]->encoder;
 
@@ -1778,7 +1815,7 @@ lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
   FOR_EACH_SYMBOL (node)
 {
   rename_statics (NULL, node);
-- 
2.42.1



[PATCH 6/7] lto: squash order of symbols in partitions

2023-11-17 Thread Michal Jires
This patch squashes order of symbols in individual partitions, so that
their relative order is conserved, but is not influenced by symbols in
other partitions.
Order of cloned symbols is set to 0. This should be fine because order
specifies order of symbols in input files, which cloned symbols are not
part of.

This is important for incremental LTO because otherwise a new symbol
shifts the order of all symbols with a higher order, which would make
them all diverge.
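
As a rough illustration of the idea (not the patch's implementation), the
remapping can be thought of as replacing each order with its rank inside
the partition. `squash_orders` is a hypothetical helper, and starting the
ranks at 0 is an assumption made only for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Map every original order found in one partition to its rank within
// that partition.  Relative order is preserved, absolute values are not.
std::unordered_map<int, int> squash_orders(std::vector<int> orders) {
  std::sort(orders.begin(), orders.end());
  orders.erase(std::unique(orders.begin(), orders.end()), orders.end());

  std::unordered_map<int, int> remap;
  for (std::size_t i = 0; i < orders.size(); ++i)
    remap[orders[i]] = static_cast<int>(i);

  assert(remap.size() == orders.size());
  return remap;
}
```

With this mapping, a partition whose symbols had orders 5, 12, 30 and one
whose symbols were shifted to 6, 13, 31 by a new symbol elsewhere both
stream the same values 0, 1, 2.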

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-cgraph.cc (lto_output_node): Add and use order_remap.
(lto_output_varpool_node): Likewise.
(output_symtab): Likewise.
* lto-streamer-out.cc (produce_asm): Likewise.
(output_function): Likewise.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(lto_output): Generate order_remap.
* lto-streamer.h (produce_asm): Add order_remap.
(output_symtab): Likewise.
---
 gcc/lto-cgraph.cc   | 20 
 gcc/lto-streamer-out.cc | 71 +
 gcc/lto-streamer.h  |  5 +--
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..a7530290fba 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -381,7 +381,8 @@ reachable_from_this_partition_p (struct cgraph_node *node, 
lto_symtab_encoder_t
 
 static void
 lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   unsigned int tag;
   struct bitpack_d bp;
@@ -405,7 +406,9 @@ lto_output_node (struct lto_simple_output_block *ob, struct 
cgraph_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   tag);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
 
   /* In WPA mode, we only output part of the call-graph.  Also, we
  fake cgraph node attributes.  There are two cases that we care.
@@ -585,7 +588,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct 
cgraph_node *node,
 
 static void
 lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node 
*node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   bool boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
   bool encode_initializer_p
@@ -602,7 +606,8 @@ lto_output_varpool_node (struct lto_simple_output_block 
*ob, varpool_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   LTO_symtab_variable);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
   lto_output_var_decl_ref (ob->decl_state, ob->main_stream, node->decl);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->externally_visible, 1);
@@ -967,7 +972,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */
 
 void
-output_symtab (void)
+output_symtab (hash_map, int>* order_remap)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -994,9 +999,10 @@ output_symtab (void)
 {
   symtab_node *node = lto_symtab_encoder_deref (encoder, i);
   if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
-lto_output_node (ob, cnode, encoder);
+   lto_output_node (ob, cnode, encoder, order_remap);
   else
-   lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder);
+   lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder,
+order_remap);
 }
 
   /* Go over the nodes in SET again to write edges.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index a1bbea8fc68..9448ab195d5 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2212,7 +2212,8 @@ output_cfg (struct output_block *ob, struct function *fn)
a function, set FN to the decl for that function.  */
 
 void
-produce_asm (struct output_block *ob, tree fn)
+produce_asm (struct output_block *ob, tree fn,
+hash_map, int>* order_remap)
 {
   enum lto_section_type section_type = ob->section_type;
   struct lto_function_header header;
@@ -2221,9 +,11 @@ produce_asm (struct output_block *ob, tree fn)
   if (section_type == LTO_section_function_body)
 {
   const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn));
-  section_name = lto_get_section_name (section_type, name,
-  

[PATCH 5/7] lto: Implement cache partitioning

2023-11-17 Thread Michal Jires
This patch implements new cache partitioning. It tries to keep symbols
from a single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source file. Where reasonably
possible, it only either combines several files into one final partition
or, if a file is large, splits the file into several final partitions.

Intermediate representation is partition_set which contains set of
groups of symbols (each group corresponding to original source file) and
number of final partitions this partition_set should split into.

First, partition_fixed_split splits the partition_set into a constant
number of partition_sets with equal numbers of symbol groups. If, for
example, there are 39 source files, the resulting partition_sets will
contain 10, 10, 10, and 9 source files. This splitting intentionally
ignores estimated instruction counts to minimize propagation of divergence.

Second, partition_over_target_split separates files that are too large
and splits them into individual symbols, to be combined back into several
smaller files in the next step.

Third, partition_binary_split splits the partition_set into two halves
until it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition.
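
The size arithmetic of the first step can be sketched as below. This is a
minimal illustration under assumptions: `fixed_split_sizes` is a
hypothetical helper, and the choice of 4 parts is inferred only from the
39-file example above, not from the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split N_GROUPS symbol groups into N_PARTS sets of near-equal size.
// The first (N_GROUPS % N_PARTS) sets receive one extra group.
std::vector<std::size_t> fixed_split_sizes(std::size_t n_groups,
                                           std::size_t n_parts) {
  assert(n_parts > 0);
  std::vector<std::size_t> sizes(n_parts, n_groups / n_parts);
  for (std::size_t i = 0; i < n_groups % n_parts; ++i)
    ++sizes[i];
  return sizes;
}
```

For 39 groups and 4 parts this yields 10, 10, 10, 9. Only the group count
enters the computation, so a change in one file's estimated size does not
move other files between sets.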

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* common.opt: Add cache partitioning.
* flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

* lto-partition.cc (new_partition): Use new_partition_no_push.
(new_partition_no_push): New.
(free_ltrans_partition): New.
(free_ltrans_partitions): Use free_ltrans_partition.
(join_partitions): New.
(split_partition_into_nodes): New.
(is_partition_reorder): New.
(class partition_set): New.
(distribute_n_partitions): New.
(partition_over_target_split): New.
(partition_binary_split): New.
(partition_fixed_split): New.
(class partitioner_base): New.
(class partitioner_default): New.
(lto_cache_map): New.
* lto-partition.h (lto_cache_map): New.
* lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

* gcc.dg/completion-2.c: Add -flto-partition=cache.
---
 gcc/common.opt  |   3 +
 gcc/flag-types.h|   3 +-
 gcc/lto/lto-partition.cc| 605 +++-
 gcc/lto/lto-partition.h |   1 +
 gcc/lto/lto.cc  |   2 +
 gcc/testsuite/gcc.dg/completion-2.c |   1 +
 6 files changed, 605 insertions(+), 10 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1cf3bdd3b51..fe5cf3c0a05 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2174,6 +2174,9 @@ Enum(lto_partition_model) String(1to1) 
Value(LTO_PARTITION_1TO1)
 EnumValue
 Enum(lto_partition_model) String(max) Value(LTO_PARTITION_MAX)
 
+EnumValue
+Enum(lto_partition_model) String(cache) Value(LTO_PARTITION_CACHE)
+
 flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) 
Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index c1852cd810c..59b3c23081b 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -393,7 +393,8 @@ enum lto_partition_model {
   LTO_PARTITION_ONE = 1,
   LTO_PARTITION_BALANCED = 2,
   LTO_PARTITION_1TO1 = 3,
-  LTO_PARTITION_MAX = 4
+  LTO_PARTITION_MAX = 4,
+  LTO_PARTITION_CACHE = 5
 };
 
 /* flag_lto_linker_output initialization values.  */
diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index e4c91213f4b..eb31ecba0d3 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -36,6 +36,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-partition.h"
 #include "sreal.h"
 
+#include 
+#include 
+
 vec ltrans_partitions;
 
 static void add_symbol_to_partition (ltrans_partition part, symtab_node *node);
@@ -59,20 +62,41 @@ cmp_partitions_order (const void *a, const void *b)
   return orderb - ordera;
 }
 
-/* Create new partition with name NAME.  */
-
+/* Create new partition with name NAME.
+   Does not push into ltrans_partitions.  */
 static ltrans_partition
-new_partition (const char *name)
+new_partition_no_push (const char *name)
 {
   ltrans_partition part = XCNEW (struct ltrans_partition_def);
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
   part->symbols = 0;
+  return part;
+}
+
+/* Create new partition with name NAME.  */
+
+static ltrans_partition
+new_partition (const char *name)
+{
+  ltrans_partition part = new_partition_no_push (name);
   ltrans_partitions.safe_push (part);
   return part;
 }
 
+/* Free memory used by ltrans partition.
+   Encoder can be kept to be freed after streaming.  */
+static void
+free_ltrans_partition (ltrans_partition part, bool delete_encoder)
+  {
+if 

[PATCH 3/7] Lockfile.

2023-11-17 Thread Michal Jires
This patch implements lockfile used for incremental LTO.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lockfile.o.
* lockfile.cc: New file.
* lockfile.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lockfile.cc | 136 
 gcc/lockfile.h  |  85 ++
 3 files changed, 224 insertions(+), 2 deletions(-)
 create mode 100644 gcc/lockfile.cc
 create mode 100644 gcc/lockfile.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..2c527245c81 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o
+  lto-wrapper.o collect-utils.o lockfile.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
new file mode 100644
index 000..9440e8938f3
--- /dev/null
+++ b/gcc/lockfile.cc
@@ -0,0 +1,136 @@
+/* File locking.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+
+#include "lockfile.h"
+
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Blocking call.  */
+int
+lockfile::lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Only locks if this filelock is not locked by any other process.
+   Return whether locking was successful.  */
+int
+lockfile::try_lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  if (fcntl (fd, F_SETLK, &s_flock) == -1)
+{
+  close (fd);
+  fd = -1;
+  return 1;
+}
+#endif
+  return 0;
+}
+
+/* Shared read lock.  Only read locks can be held concurrently.
+   If write lock is already held by this process, it will be
+   changed to read lock.
+   Blocking call.  */
+int
+lockfile::lock_read ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_RDLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unlock all previously placed locks.  */
+void
+lockfile::unlock ()
+{
+  if (fd >= 0)
+{
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_UNLCK;
+
+  fcntl (fd, F_SETLK, &s_flock);
+#endif
+  close (fd);
+  fd = -1;
+}
+}
+
+/* Are lockfiles supported?  */
+bool
+lockfile::lockfile_supported ()
+{
+#if HAVE_FCNTL_H
+  return true;
+#else
+  return false;
+#endif
+}
diff --git a/gcc/lockfile.h b/gcc/lockfile.h
new file mode 100644
index 000..afcbaf599c1
--- /dev/null
+++ b/gcc/lockfile.h
@@ -0,0 +1,85 @@
+/* File locking.
+   Copyright (C) 

[PATCH 4/7] lto: Implement ltrans cache

2023-11-17 Thread Michal Jires
This patch implements Incremental LTO as an ltrans cache.

The cache is active when the directory $GCC_LTRANS_CACHE is specified and
exists. It stores pairs of ltrans input/output files together with the hash
of the input file.
File locking is used to allow multiple GCC instances to use the same cache.
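
The lookup idea can be sketched in memory as follows. This is a simplified
illustration, not the patch's ltrans_file_cache: `ltrans_cache_sketch` and
`cache_item` are hypothetical, strings stand in for file contents,
std::hash stands in for the MD5 checksum, and file locking is left out
entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// One cached pair: the ltrans input, its hash, and the cached output.
struct cache_item {
  std::string input;
  std::size_t input_hash;
  std::string output;
};

class ltrans_cache_sketch {
public:
  // Remember that INPUT was compiled to OUTPUT.
  void add(const std::string &input, const std::string &output) {
    assert(!input.empty());
    items_.push_back({input, std::hash<std::string>{}(input), output});
  }

  // Return the cached output if an identical input was seen before.
  std::optional<std::string> lookup(const std::string &input) const {
    const std::size_t h = std::hash<std::string>{}(input);
    for (const cache_item &item : items_)
      // The hash is a cheap filter; the full comparison guards
      // against hash collisions.
      if (item.input_hash == h && item.input == input)
        return item.output;
    return std::nullopt;
  }

private:
  std::vector<cache_item> items_;
};
```

A hit means the ltrans step for that partition can be skipped and the
stored output reused; a miss means the partition is compiled and then
added to the cache.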

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lto-ltrans-cache.cc | 407 
 gcc/lto-ltrans-cache.h  | 164 
 gcc/lto-wrapper.cc  | 150 +--
 4 files changed, 711 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2c527245c81..495e5f3d069 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..0d43e548fb3
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,407 @@
+/* File caching.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "md5.h"
+#include "lto-ltrans-cache.h"
+
+#include 
+#include 
+#include 
+
+const md5_checksum_t INVALID_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns INVALID_CHECKSUM if not possible.
+ */
+static md5_checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return INVALID_CHECKSUM;
+
+  md5_checksum_t result;
+
+  int ret = md5_stream (file, &result);
+
+  if (ret)
+result = INVALID_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files byte by byte.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  FILE *f_first = fopen (first_filename, "rb");
+  if (!f_first)
+return false;
+
+  FILE *f_second = fopen (second_filename, "rb");
+  if (!f_second)
+{
+  fclose (f_first);
+  return false;
+}
+
+  bool ret = true;
+
+  for (;;)
+{
+  int c1, c2;
+  c1 = fgetc (f_first);
+  c2 = fgetc (f_second);
+
+  if (c1 != c2)
+   {
+ ret = false;
+ break;
+   }
+
+  if (c1 == EOF)
+   break;
+}
+
+  fclose (f_first);
+  fclose (f_second);
+  return ret;
+}
+
+/* Constructor of cache item.  */
+ltrans_file_cache::item::item (std::string input, std::string output,
+  md5_checksum_t input_checksum, uint32_t last_used):
+  input (std::move (input)), output (std::move (output)),
+  input_checksum (input_checksum), last_used (last_used)
+{
+  lock = lockfile (this->input + ".lock");
+}
+/* Destructor of cache item.  */
+ltrans_file_cache::item::~item ()
+{
+  lock.unlock ();
+}
+
+/* Reads next cache item from cachedata file.
+   Adds `dir/` prefix to filenames.  */
+static ltrans_file_cache::item*
+read_cache_item (FILE* f, const char* dir)
+{
+  md5_checksum_t checksum;
+  uint32_t last_used;
+
+  if (fread (&checksum, 1, checksum.size (), f) != checksum.size ())
+return NULL;
+  if (fread (&last_used, sizeof (last_used), 1, f) != 1)
+return NULL;
+
+  std::vector input (strlen (dir));
+  

[PATCH 2/7] lto: Remove random_seed from section name.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA.
---
 gcc/lto-streamer.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 4968fd13413..53275e32618 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,17 @@ lto_get_section_name (int section_type, const char *name,
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here. */
+ and merging should be ok here.
+
+ WPA output is sent to LTRANS directly inside of lto-wrapper, so name
+ uniqueness for external tools is not needed.
+ Randomness would inhibit incremental LTO.  */
   if (section_type == LTO_section_opts)
 strcpy (post, "");
   else if (f != NULL) 
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+strcpy (post, ".0");
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
-- 
2.42.1



[PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
---
 gcc/lto-opts.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc
index c9bee9d4197..0451e290c75 100644
--- a/gcc/lto-opts.cc
+++ b/gcc/lto-opts.cc
@@ -152,6 +152,7 @@ lto_write_options (void)
case OPT_fprofile_prefix_map_:
case OPT_fcanon_prefix_map:
case OPT_fwhole_program:
+   case OPT_fltrans_output_list_:
  continue;
 
default:
-- 
2.42.1



[PATCH 0/7] lto: Incremental LTO.

2023-11-17 Thread Michal Jires
Hi,
these patches implement Incremental LTO, specifically by caching results of
ltrans phase. Secondarily these patches contain changes to reduce divergence of
ltrans partitions so that they can be cached.

The aim is to reduce compile times for quick edit-compile cycles while using
LTO. Even with these minimal changes to the rest of GCC it works surprisingly
well. In current testing, self-compiling cc1 with individual commits used as
incremental changes, on average only ~1/3 of partitions need to be recompiled
with `-O2 -g0` and ~1/2 with `-O2 -g`, which directly reduces time spent in
the ltrans phase of LTO.

Unfortunately larger gains are a bit fragile. You may remember that during my
Cauldron talk I claimed reduction to ~1/6 and ~1/3 recompilations. That was
achieved with a branch from March. Since then at least two commits have
introduced new divergence of partitions, though they seem fixable in the
future.


Re: [PATCH] vect: Use statement vectype for conditional mask.

2023-11-17 Thread Robin Dapp
> No, you shouldn't place _7 != 0 inside the .COND_ADD but instead
> have an extra pattern stmt producing that so
> 
> patt_8 = _7 != 0;
> patt_9 = .COND_ADD (patt_8, ...);
> 
> that's probably still not enough, but I always quickly forget how
> bool patterns work ... basically a comparison like patt_8 = _7 != 0
> vectorizes to a mask (aka vector boolean) while any "data" uses
> of bools are replaced by mask ? 1 : 0; - there's a complication for
> bool data producing loads which is why we need to insert the
> "fake" compares to produce a mask.  IIRC.

I already had call handling to vect_recog_bool_pattern in working
shape when I realized that vect_recog_mask_conversion_pattern already
handles most of what I need.  The difference is that it doesn't do
 patt_8 = _7 != 0
but rather
 patt_8 =  () _7;

It works equally well and most of the code can be reused.

The attached was bootstrapped and regtested on x86 and aarch64
and regtested on riscv.

Regards
 Robin

Subject: [PATCH] vect: Add bool pattern handling for COND_OPs.

In order to handle masks properly for conditional operations this patch
teaches vect_recog_mask_conversion_pattern to also handle conditional
operations.  Now we convert e.g.

 _mask = *_6;
 _ifc123 = COND_OP (_mask, ...);

into
 _mask = *_6;
 patt200 = () _mask;
 patt201 = COND_OP (patt200, ...);

This way the mask will be properly recognized as boolean mask and the
correct vector mask will be generated.

gcc/ChangeLog:

PR middle-end/112406

* tree-vect-patterns.cc (build_mask_conversion):
(vect_convert_mask_for_vectype):

gcc/testsuite/ChangeLog:

* gfortran.dg/pr112406.f90: New test.
---
 gcc/testsuite/gfortran.dg/pr112406.f90 | 21 +
 gcc/tree-vect-patterns.cc  | 26 ++
 2 files changed, 39 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr112406.f90

diff --git a/gcc/testsuite/gfortran.dg/pr112406.f90 
b/gcc/testsuite/gfortran.dg/pr112406.f90
new file mode 100644
index 000..27e96df7e26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr112406.f90
@@ -0,0 +1,21 @@
+! { dg-do compile { target { aarch64-*-* || riscv*-*-* } } }
+! { dg-options "-Ofast -w -fprofile-generate" }
+! { dg-additional-options "-march=rv64gcv -mabi=lp64d" { target riscv*-*-* } }
+! { dg-additional-options "-march=armv8-a+sve" { target aarch64-*-* } }
+
+module brute_force
+  integer, parameter :: r=9
+   integer sudoku1(1, r)
+  contains
+subroutine brute
+integer l(r), u(r)
+   where(sudoku1(1, :) /= 1)
+l = 1
+  u = 1
+   end where
+do i1 = 1, u(1)
+   do
+  end do
+   end do
+end
+end
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731..696b70b76a8 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -5830,7 +5830,8 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
   tree rhs1_op0 = NULL_TREE, rhs1_op1 = NULL_TREE;
   tree rhs1_op0_type = NULL_TREE, rhs1_op1_type = NULL_TREE;
 
-  /* Check for MASK_LOAD ans MASK_STORE calls requiring mask conversion.  */
+  /* Check for MASK_LOAD and MASK_STORE as well as COND_OP calls requiring mask
+ conversion.  */
   if (is_gimple_call (last_stmt)
   && gimple_call_internal_p (last_stmt))
 {
@@ -5842,6 +5843,7 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
return NULL;
 
   bool store_p = internal_store_fn_p (ifn);
+  bool load_p = internal_load_fn_p (ifn);
   if (store_p)
{
  int rhs_index = internal_fn_stored_value_index (ifn);
@@ -5856,15 +5858,21 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
  vectype1 = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
}
 
+  if (!vectype1)
+   return NULL;
+
   tree mask_arg = gimple_call_arg (last_stmt, mask_argno);
   tree mask_arg_type = integer_type_for_mask (mask_arg, vinfo);
-  if (!mask_arg_type)
-   return NULL;
-  vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
+  if (mask_arg_type)
+   {
+ vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
 
-  if (!vectype1 || !vectype2
- || known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
-  TYPE_VECTOR_SUBPARTS (vectype2)))
+ if (!vectype2
+ || known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
+  TYPE_VECTOR_SUBPARTS (vectype2)))
+   return NULL;
+   }
+  else if (store_p || load_p)
return NULL;
 
   tmp = build_mask_conversion (vinfo, mask_arg, vectype1, stmt_vinfo);
@@ -5883,7 +5891,9 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
  gimple_call_set_lhs (pattern_stmt, lhs);
}
-  gimple_call_set_nothrow (pattern_stmt, true);
+
+  if (load_p || store_p)
+   gimple_call_set_nothrow (pattern_stmt, true);
 
   pattern_stmt_info = vinfo->add_stmt 

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2023-11-17 Thread Harald Anlauf

Hi Tobias,

On 11/17/23 12:38, Tobias Burnus wrote:

Hi Harald, hi all,

On 16.11.23 20:30, Harald Anlauf wrote:

According to the standard one can have 99 lines with only
"&" and then an ";", but then only 100 lines with 1 characters.


I believe a single '&' is not valid, you either need '&&' or something
else + '&'; thus, you can have only half a million lines + 1.


after looking at the F2023 standard again I wonder why
they did such a disservice to compiler developers...

You are right: a single '&' is not valid.

6.3.2.4 also has:

"When used for continuation, the “&” is not part of the statement"

And 6.3.2.5 (also 6.3.3.4): "The “;” is not part of the statement".

So a million "&"-continued lines is possible in free form.

For fixed form, 6.3.3.1 has: "If a source line contains only characters
of default kind, it shall contain exactly 72 characters; otherwise, its
maximum number of characters is processor dependent."

I wonder what I should make out of this...


In the code, I still use 1,000,000 but now with a comment.


Yeah, for the time being this is the most reasonable solution.
Let's claim that the 10^6 line limit is the new GNU standard ;-)

Cheers,
Harald




[PATCH 4/5] aarch64: Add ZT0

2023-11-17 Thread Richard Sandiford
SME2 adds a 512-bit lookup table called ZT0.  It is enabled
and disabled by PSTATE.ZA, just like ZA itself.  This patch
adds support for the register, including saving and restoring
contents.

The code reuses the V8DI that was added for LS64, including
the associated memory classification rules.  (The ZT0 range
is more restricted than the LS64 range, but that's enforced
by predicates and constraints.)
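
The call-site rules that the ChangeLog for this patch lists (under
aarch64_start_call_args, aarch64_expand_call, aarch64_end_call_args and
aarch64_mode_emit_local_sme_state) can be modelled roughly as below.  This is
only a reading of those ChangeLog entries, not code from the patch: the enum,
the function name and the four boolean parameters are all made up for the
sketch.

```cpp
#include <cassert>

// Hypothetical outcome of checking one call site with respect to ZT0.
enum class Zt0Action { Reject, SaveAndRestore, Nothing };

// Hypothetical model of the ZT0 call rules described in the ChangeLog.
//   caller_has_za / caller_has_zt0:      state the calling function has
//   callee_shares_za / callee_shares_zt0: state the callee shares with it
constexpr Zt0Action
zt0_call_action (bool caller_has_za, bool caller_has_zt0,
                 bool callee_shares_za, bool callee_shares_zt0)
{
  // A function without ZT0 state cannot call one that shares ZT0.
  if (callee_shares_zt0 && !caller_has_zt0)
    return Zt0Action::Reject;

  // A function with ZA state cannot call one that shares ZT0 but not ZA.
  if (callee_shares_zt0 && !callee_shares_za && caller_has_za)
    return Zt0Action::Reject;

  // The caller's ZT0 contents must survive any call that does not share
  // ZT0, whether the callee is shared-ZA or private-ZA.
  if (caller_has_zt0 && !callee_shares_zt0)
    return Zt0Action::SaveAndRestore;

  return Zt0Action::Nothing;
}
```

For instance, zt0_call_action (true, true, true, false) models a shared-ZA
call that does not share ZT0 and yields SaveAndRestore.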

gcc/
* config/aarch64/aarch64.md (ZT0_REGNUM): New constant.
(LAST_FAKE_REGNUM): Bump to include it.
* config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0.
(CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(machine_function): Add zt0_save_buffer.
(CUMULATIVE_ARGS): Add shared_zt0_flags.
* config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0.
(aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise.
(aarch64_function_arg): Add the shared ZT0 flags as an extra
limb of the parallel.
(aarch64_init_cumulative_args): Initialize shared_zt0_flags.
(aarch64_extra_live_on_entry): Handle ZT0_REGNUM.
(aarch64_epilogue_uses): Likewise.
(aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions.
(aarch64_restore_zt0): Likewise.
(aarch64_start_call_args): Reject calls to functions that share
ZT0 from functions that have no ZT0 state.  Save ZT0 around shared-ZA
calls that do not share ZT0.
(aarch64_expand_call): Handle ZT0.  Reject calls to functions that
share ZT0 but not ZA from functions with ZA state.
(aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions
that do not share ZT0.
(aarch64_set_current_function): Require +sme2 for functions that
have ZT0 state.
(aarch64_function_attribute_inlinable_p): Don't allow functions to
be inlined if they have local zt0 state.
(AARCH64_IPA_CLOBBERS_ZT0): New constant.
(aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0.
(aarch64_can_inline_p): Don't inline callees that clobber ZT0
into functions that have ZT0 state.
(aarch64_comp_type_attributes): Check for compatible ZT0 sharing.
(aarch64_optimize_mode_switching): Use mode switching if the
function has ZT0 state.
(aarch64_mode_emit_local_sme_state): Save and restore ZT0 around
calls to private-ZA functions.
(aarch64_mode_needed_local_sme_state): Require ZA to be active
for instructions that access ZT0.
(aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __ARM_STATE_ZT0.
* config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv.
(aarch64_asm_update_zt0): New insn.
(UNSPEC_RESTORE_ZT0): New unspec.
(aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns.
(aarch64_sme_str_zt0): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sme/zt0_state_1.c: New test.
* gcc.target/aarch64/sme/zt0_state_2.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_3.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_4.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_5.c: Likewise.
---
 gcc/config/aarch64/aarch64-c.cc   |   1 +
 gcc/config/aarch64/aarch64-sme.md |  63 +
 gcc/config/aarch64/aarch64.cc | 205 --
 gcc/config/aarch64/aarch64.h  |  14 +-
 gcc/config/aarch64/aarch64.md |   7 +-
 .../gcc.target/aarch64/sme/zt0_state_1.c  |  65 +
 .../gcc.target/aarch64/sme/zt0_state_2.c  |  31 +++
 .../gcc.target/aarch64/sme/zt0_state_3.c  |   6 +
 .../gcc.target/aarch64/sme/zt0_state_4.c  |  53 
 .../gcc.target/aarch64/sme/zt0_state_5.c  | 260 ++
 10 files changed, 670 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_5.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 2a8ca46987a..017380b7563 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -74,6 +74,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
 
   builtin_define ("__ARM_STATE_ZA");
+  builtin_define ("__ARM_STATE_ZT0");
 
   /* Define keyword attributes like __arm_streaming as macros that expand
  to the associated [[...]] attribute.  Use __extension__ in the attribute
diff --git a/gcc/config/aarch64/aarch64-sme.md 

[PATCH 2/5] aarch64: Add svcount_t

2023-11-17 Thread Richard Sandiford
Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks.  The SME2 ACLE defines an svcount_t type for
this interpretation.

I don't think we have a better way of representing counters than
the VNx16BI that we use for masks.  The patch therefore doesn't
add a new mode for this representation.  It's just something that
is interpreted in context, a bit like signed vs. unsigned integers.
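
As a rough illustration of the signed vs. unsigned analogy (plain C++, not
ACLE code; both helper names are invented for this sketch): the same 16 bits
can be read either way without any change to the storage, much as one
VNx16BI value can be read as a mask or as a counter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read the bits of a uint16_t as an int16_t.
// The storage is identical; only the interpretation changes.
inline std::int16_t
as_signed (std::uint16_t bits)
{
  std::int16_t out;
  std::memcpy (&out, &bits, sizeof out);
  return out;
}

// Hypothetical helper: the reverse reinterpretation.
inline std::uint16_t
as_unsigned (std::int16_t value)
{
  std::uint16_t out;
  std::memcpy (&out, &value, sizeof out);
  return out;
}
```

Here as_signed (0xFFFF) gives -1 and as_unsigned (-1) gives 0xFFFF; a round
trip through both helpers returns the original bits.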

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Handle reinterprets between svbool_t
and svcount_t.
(svreinterpret_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add
b<->c forms.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New
type suffix list.
(wrap_type_in_struct, register_type_decl): New functions, split out
from...
(register_tuple_type): ...here.
(register_builtin_types): Handle svcount_t.
(handle_arm_sve_h): Don't create tuples of svcount_t.
* config/aarch64/aarch64-sve-builtins.def (svcount_t): New type.
(c): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class.

gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test
for svcount_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
* g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P)
(TEST_DUAL_P_REV): New macros.
* gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test.
* gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing
an svcount_t.
* gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test
reinterprets involving svcount_t.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t.
* gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_12.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |   8 +-
 .../aarch64/aarch64-sve-builtins-base.def |   1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc| 157 -
 gcc/config/aarch64/aarch64-sve-builtins.def   |   2 +
 gcc/config/aarch64/aarch64-sve-builtins.h |   4 +-
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../aarch64/sve/acle/general-c++/svcount_1.C  |  10 +
 .../aarch64/sve/acle/asm/reinterpret_b.c  |  20 ++
 .../aarch64/sve/acle/asm/test_sve_acle.h  |  15 ++
 .../aarch64/sve/acle/general-c/load_1.c   |   4 +-
 .../aarch64/sve/acle/general-c/svcount_1.c|  10 +
 .../sve/acle/general-c/unary_convert_1.c  |   8 +-
 .../aarch64/sve/acle/general/attributes_7.c   |   1 +
 .../gcc.target/aarch64/sve/pcs/annotate_1.c   |   4 +
 .../gcc.target/aarch64/sve/pcs/annotate_2.c   |   4 +
 .../gcc.target/aarch64/sve/pcs/args_12.c  | 214 ++
 17 files changed, 402 insertions(+), 64 deletions(-)
 create mode 100644 
gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/svcount_1.C
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_b.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svcount_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_12.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 5b75b903e5f..7d9ec5a911f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2166,8 +2166,9 @@ public:
 
 /* Punt to rtl if the effect of the reinterpret on registers does not
conform to GCC's endianness model.  */
-if (!targetm.can_change_mode_class (f.vector_mode (0),
-   f.vector_mode (1), FP_REGS))
+if (GET_MODE_CLASS (f.vector_mode (0)) != MODE_VECTOR_BOOL
+   && !targetm.can_change_mode_class (f.vector_mode (0),
+  f.vector_mode (1), FP_REGS))
   return NULL;
 
 /* Otherwise svreinterpret corresponds directly to a VIEW_CONVERT_EXPR
@@ -2181,6 +2182,9 @@ public:
   expand (function_expander ) const override
   {
 machine_mode mode = e.tuple_mode (0);
+/* Handle svbool_t <-> svcount_t.  */
+if (mode == e.tuple_mode (1))
+  return e.args[0];
 return e.use_exact_insn (code_for_aarch64_sve_reinterpret (mode));
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def 
b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index ac53f35220d..a742c7bbc56 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -198,6 

[PATCH 3/5] aarch64: Add svboolx2_t

2023-11-17 Thread Richard Sandiford
SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.

The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work.  At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework.  All that changes on the PCS side is
that we now have an associated mode.
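
To make the "double-width" point concrete, here is a small sketch of the
element counts implied by the ADJUST_NUNITS lines in the aarch64-modes.def
hunk of this patch.  It assumes that aarch64_sve_vg is the number of 64-bit
granules in an SVE vector; the function names are invented for illustration.

```cpp
#include <cassert>

// Assumed meaning of aarch64_sve_vg: 64-bit granules per SVE vector.
constexpr unsigned
granules (unsigned vector_bits)
{
  return vector_bits / 64;
}

// Mirrors ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8).
constexpr unsigned
vnx16bi_units (unsigned vector_bits)
{
  return granules (vector_bits) * 8;
}

// Mirrors ADJUST_NUNITS (VNx32BI, aarch64_sve_vg * 16).
constexpr unsigned
vnx32bi_units (unsigned vector_bits)
{
  return granules (vector_bits) * 16;
}

// A pair of predicates always has twice the units of a single predicate.
static_assert (vnx32bi_units (128) == 2 * vnx16bi_units (128),
               "VNx32BI is double-width");
```

With 128-bit vectors this gives 16 units for VNx16BI and 32 for VNx32BI.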

gcc/
* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(register_tuple_type): Handle tuples of predicates.
(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
of predicates.
(pure_scalable_type_info::add_piece): Don't try to form pairs of
predicates.
(VEC_STRUCT): Generalize comment.
(aarch64_classify_vector_mode): Handle VNx32BI.
(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
predicates that have no associated mode, rather than allowing
an integer mode to be chosen.
(aarch64_hard_regno_nregs): Handle VNx32BI.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_split_double_move): New function, split out from...
(aarch64_split_128bit_move): ...here.
(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
(aarch64_pfalse_reg): Likewise.
(aarch64_sve_same_pred_for_ptest_p): Likewise.
(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
(aarch64_expand_mov_immediate): Restrict handling of boolean vector
constants to single-predicate modes.
(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
can be addressed.
(aarch64_class_max_nregs): Handle VNx32BI.
(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
VNx32BI.
(aarch64_mov_operand_p): Restrict predicate constant canonicalization
to single-predicate modes.
(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
* config/aarch64/constraints.md (PR_REGS): New predicate.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
stack offsets.
(ret_nonpst3): Remove XFAIL.
* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.
---
 gcc/config/aarch64/aarch64-modes.def  |   3 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc|  18 ++-
 gcc/config/aarch64/aarch64-sve.md |  22 +++
 gcc/config/aarch64/aarch64.cc | 136 --
 gcc/config/aarch64/constraints.md |   4 +
 .../aarch64/sve/acle/general-c/svboolx2_1.c   | 135 +
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c |   6 +-
 8 files changed, 272 insertions(+), 53 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index a3efc5b8484..ffca5517dec 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -48,16 +48,19 @@ ADJUST_FLOAT_FORMAT (HF, _half_format);
 
 /* Vector modes.  */
 
+VECTOR_BOOL_MODE (VNx32BI, 32, BI, 4);
 VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
 VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
 VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
 VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
+ADJUST_NUNITS (VNx32BI, aarch64_sve_vg * 16);
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
 ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
 ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
 
+ADJUST_ALIGNMENT (VNx32BI, 2);
 ADJUST_ALIGNMENT (VNx16BI, 2);
 ADJUST_ALIGNMENT (VNx8BI, 2);
 ADJUST_ALIGNMENT (VNx4BI, 2);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 3afb521c55c..25e2375c4fa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -948,6 +948,7 @@ rtx aarch64_simd_expand_builtin (int, tree, rtx);
 void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
 rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
 
+void aarch64_split_double_move (rtx, rtx, machine_mode);
 void aarch64_split_128bit_move (rtx, rtx);
 
 bool aarch64_split_128bit_move_p (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 

[PATCH 1/5] aarch64: Add +sme2

2023-11-17 Thread Richard Sandiford
gcc/
* doc/invoke.texi: Document +sme2.
* doc/sourcebuild.texi: Document aarch64_sme2.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
Add sme2.
* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme2): New
target test.
(check_effective_target_aarch64_asm_sme2_ok): Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def |  2 ++
 gcc/config/aarch64/aarch64.h |  4 
 gcc/doc/invoke.texi  |  3 ++-
 gcc/doc/sourcebuild.texi |  2 ++
 gcc/testsuite/lib/target-supports.exp| 14 +-
 5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1480e498bbb..c156d2ee76a 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -157,4 +157,6 @@ AARCH64_OPT_EXTENSION("sme-i16i64", SME_I16I64, (SME), (), 
(), "")
 
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
+AARCH64_OPT_EXTENSION("sme2", SME2, (SME), (), (), "sme2")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 9f690809e79..14205ce34b3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -227,6 +227,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SME   (aarch64_isa_flags & AARCH64_FL_SME)
 #define AARCH64_ISA_SME_I16I64(aarch64_isa_flags & AARCH64_FL_SME_I16I64)
 #define AARCH64_ISA_SME_F64F64(aarch64_isa_flags & AARCH64_FL_SME_F64F64)
+#define AARCH64_ISA_SME2  (aarch64_isa_flags & AARCH64_FL_SME2)
 #define AARCH64_ISA_V8_3A (aarch64_isa_flags & AARCH64_FL_V8_3A)
 #define AARCH64_ISA_DOTPROD   (aarch64_isa_flags & AARCH64_FL_DOTPROD)
 #define AARCH64_ISA_AES   (aarch64_isa_flags & AARCH64_FL_AES)
@@ -332,6 +333,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 /* The FEAT_SME_F64F64 extension to SME, enabled through +sme-f64f64.  */
 #define TARGET_SME_F64F64 (AARCH64_ISA_SME_F64F64)
 
+/* SME2 instructions, enabled through +sme2.  */
+#define TARGET_SME2 (AARCH64_ISA_SME2)
+
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3 (AARCH64_ISA_V8_3A)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bc56170aadb..475244bb4ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21065,7 +21065,8 @@ Enable the Scalable Matrix Extension.
 Enable the FEAT_SME_I16I64 extension to SME.
 @item sme-f64f64
 Enable the FEAT_SME_F64F64 extension to SME.
-
+@item sme2
+Enable the Scalable Matrix Extension 2.  This also enables SME instructions.
 @end table
 
 Feature @option{crypto} implies @option{aes}, @option{sha2}, and @option{simd},
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 448f5e08578..8d8d21f9fee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2318,6 +2318,8 @@ Binutils installed on test system supports relocation 
types required by -fpic
 for AArch64 small memory model.
 @item aarch64_sme
 AArch64 target that generates instructions for SME.
+@item aarch64_sme2
+AArch64 target that generates instructions for SME2.
 @item aarch64_sve_hw
 AArch64 target that is able to generate and execute SVE code (regardless of
 whether it does so by default).
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b9061e5a552..87ee26f9119 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4425,6 +4425,18 @@ proc check_effective_target_aarch64_sme { } {
 }]
 }
 
+# Return 1 if this is an AArch64 target that generates instructions for SME2.
+proc check_effective_target_aarch64_sme2 { } {
+if { ![istarget aarch64*-*-*] } {
+   return 0
+}
+return [check_no_compiler_messages aarch64_sme2 assembly {
+   #if !defined (__ARM_FEATURE_SME2)
+   #error FOO
+   #endif
+}]
+}
+
 # Return 1 if this is a compiler supporting ARC atomic operations
 proc check_effective_target_arc_atomic { } {
 return [check_no_compiler_messages arc_atomic assembly {
@@ -11621,7 +11633,7 @@ proc check_effective_target_aarch64_tiny { } {
 
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
  "i8mm" "f32mm" "f64mm" "bf16" "sb" "sve2" "ls64"
- "sme" "sme-i16i64" } {
+ "sme" "sme-i16i64" "sme2" } {
 eval [string map [list FUNC $aarch64_ext] {
proc check_effective_target_aarch64_asm_FUNC_ok { } {
  if { [istarget aarch64*-*-*] } {
-- 
2.25.1
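
For reference, user code could perform the same check that the new
effective-target test does, by looking at __ARM_FEATURE_SME2 (the macro name
is taken from the target-supports.exp hunk above).  The helper name below is
invented; this is a sketch, not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: true when the translation unit is compiled for a
// target on which the compiler defines __ARM_FEATURE_SME2.
constexpr bool
have_sme2 ()
{
#if defined (__ARM_FEATURE_SME2)
  return true;
#else
  return false;
#endif
}
```

On targets where the macro is not defined, have_sme2 () is simply false, so
the helper can guard SME2-only code paths at compile time.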



Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Daniel Krügler
On Fri, 17 Nov 2023 at 18:31, Jonathan Wakely wrote:
>
> On Fri, 17 Nov 2023 at 17:01, Daniel Krügler wrote:
> >
[..]
> > > +
> > > +namespace std _GLIBCXX_VISIBILITY(default)
> > > +{
> > > +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > +
> > > +  /// Add two integers, with saturation in case of overflow.
> > > +  template<typename _Tp> requires __is_standard_integer<_Tp>::value
> > > +constexpr _Tp
> > > +add_sat(_Tp __x, _Tp __y) noexcept
> > > +{
> > > +  _Tp __z;
> > > +  if (!__builtin_add_overflow(__x, __y, &__z))
> > > +   return __z;
> > > +  if constexpr (is_unsigned_v<_Tp>)
> > > +   return __gnu_cxx::__int_traits<_Tp>::__max;
> > > +  else if (__x < 0)
> > > +   return __gnu_cxx::__int_traits<_Tp>::__min;
> >
> > My apologies, but why does the sign of x decide the direction of the
> > result, shouldn't that be the sign of the returned value of z?
>
> z is incorrect at this point, it only has the correct value if no
> overflow occurred. But we know that an overflow occurred because the
> built-in returned true.
>
> We need to determine whether the overflow was positive, i.e. greater
> than numeric_limits<Tp>::max(), or negative, i.e. lower than
> numeric_limits<Tp>::min(). For unsigned types, it must have been a
> positive overflow, because neither value is negative so that's easy.
>
> If x is negative, then there is no possible y that can cause a
> positive overflow. If we consider Tp==int, then the maximum y is
> INT_MAX, so if x is negative, x+INT_MAX < INT_MAX. So if x is
> negative, we must have had a negative overflow, and so the result
> saturates to INT_MIN.
>
> If x is positive, there is no possible y that can cause a negative
> overflow. The minimum y is INT_MIN, and so if x is positive, x +
> INT_MIN > INT_MIN. So if x is positive, we must have had a positive
> overflow.
>
> (And x can't be zero, because 0+y would not overflow).

Ah right, thanks.

- Daniel
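
The argument above can be put into a standalone sketch.  This is not the
libstdc++ code: the function name is invented, std::numeric_limits stands in
for the internal __int_traits, and it relies on the GCC/Clang built-in
__builtin_add_overflow and on C++17.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Sketch of saturating addition: on overflow, the sign of x alone tells
// us in which direction the mathematical result left the range.
template<typename T>
  constexpr T
  add_sat_sketch (T x, T y) noexcept
  {
    static_assert (std::is_integral_v<T> && !std::is_same_v<T, bool>,
                   "integer types only");
    T z;
    if (!__builtin_add_overflow (x, y, &z))
      return z;                              // No overflow: z is exact.
    if constexpr (std::is_unsigned_v<T>)
      return std::numeric_limits<T>::max (); // Unsigned can only overflow upwards.
    else if (x < 0)
      return std::numeric_limits<T>::min (); // Negative x cannot overflow upwards.
    else
      return std::numeric_limits<T>::max (); // Positive x cannot overflow downwards.
  }
```

Note that z is never inspected after an overflow is reported, which is the
point made in the reply: only x (or equally y) is meaningful at that stage.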


aarch64: Add support for SME2

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME2.  It is gated behind
the earlier series for SME.

All of the detail is in the individual patch summaries.

Tested on aarch64-linux-gnu.

Richard


Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
On Fri, 17 Nov 2023 at 17:01, Daniel Krügler wrote:
>
> On Fri, 17 Nov 2023 at 16:32, Jonathan Wakely wrote:
> >
> > Tested x86_64-linux. Pushed to trunk.
> >
> > GCC generates better code for add_sat<unsigned> if we use:
> >
> > unsigned z = x + y;
> > z |= -(z < x);
> > return z;
> >
> > If the compiler can't be improved we should consider using that instead
> > of __builtin_add_overflow.
> >
> >
> > -- >8 --
> >
> >
> > This was approved for C++26 last week at the WG21 meeting in Kona.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * include/Makefile.am: Add new header.
> > * include/Makefile.in: Regenerate.
> > * include/bits/version.def (saturation_arithmetic): Define.
> > * include/bits/version.h: Regenerate.
> > * include/std/numeric: Include new header.
> > * include/bits/sat_arith.h: New file.
> > * testsuite/26_numerics/saturation/add.cc: New test.
> > * testsuite/26_numerics/saturation/cast.cc: New test.
> > * testsuite/26_numerics/saturation/div.cc: New test.
> > * testsuite/26_numerics/saturation/mul.cc: New test.
> > * testsuite/26_numerics/saturation/sub.cc: New test.
> > * testsuite/26_numerics/saturation/version.cc: New test.
> > ---
> >  libstdc++-v3/include/Makefile.am  |   1 +
> >  libstdc++-v3/include/Makefile.in  |   1 +
> >  libstdc++-v3/include/bits/sat_arith.h | 148 ++
> >  libstdc++-v3/include/bits/version.def |   8 +
> >  libstdc++-v3/include/bits/version.h   |  11 ++
> >  libstdc++-v3/include/std/numeric  |   5 +
> >  .../testsuite/26_numerics/saturation/add.cc   |  73 +
> >  .../testsuite/26_numerics/saturation/cast.cc  |  24 +++
> >  .../testsuite/26_numerics/saturation/div.cc   |  45 ++
> >  .../testsuite/26_numerics/saturation/mul.cc   |  34 
> >  .../testsuite/26_numerics/saturation/sub.cc   |  86 ++
> >  .../26_numerics/saturation/version.cc |  19 +++
> >  12 files changed, 455 insertions(+)
> >  create mode 100644 libstdc++-v3/include/bits/sat_arith.h
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/add.cc
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/cast.cc
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/div.cc
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/mul.cc
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/sub.cc
> >  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/version.cc
> >
> > diff --git a/libstdc++-v3/include/Makefile.am 
> > b/libstdc++-v3/include/Makefile.am
> > index dab9f720cbb..17d9d9cec31 100644
> > --- a/libstdc++-v3/include/Makefile.am
> > +++ b/libstdc++-v3/include/Makefile.am
> > @@ -142,6 +142,7 @@ bits_freestanding = \
> > ${bits_srcdir}/ranges_uninitialized.h \
> > ${bits_srcdir}/ranges_util.h \
> > ${bits_srcdir}/refwrap.h \
> > +   ${bits_srcdir}/sat_arith.h \
> > ${bits_srcdir}/stl_algo.h \
> > ${bits_srcdir}/stl_algobase.h \
> > ${bits_srcdir}/stl_construct.h \
> > diff --git a/libstdc++-v3/include/Makefile.in 
> > b/libstdc++-v3/include/Makefile.in
> > index 4f7ab2dfbab..f038af709cc 100644
> > --- a/libstdc++-v3/include/Makefile.in
> > +++ b/libstdc++-v3/include/Makefile.in
> > @@ -497,6 +497,7 @@ bits_freestanding = \
> > ${bits_srcdir}/ranges_uninitialized.h \
> > ${bits_srcdir}/ranges_util.h \
> > ${bits_srcdir}/refwrap.h \
> > +   ${bits_srcdir}/sat_arith.h \
> > ${bits_srcdir}/stl_algo.h \
> > ${bits_srcdir}/stl_algobase.h \
> > ${bits_srcdir}/stl_construct.h \
> > diff --git a/libstdc++-v3/include/bits/sat_arith.h 
> > b/libstdc++-v3/include/bits/sat_arith.h
> > new file mode 100644
> > index 000..71793467984
> > --- /dev/null
> > +++ b/libstdc++-v3/include/bits/sat_arith.h
> > @@ -0,0 +1,148 @@
> > +// Saturation arithmetic -*- C++ -*-
> > +
> > +// Copyright The GNU Toolchain Authors.
> > +//
> > +// This file is part of the GNU ISO C++ Library.  This library is free
> > +// software; you can redistribute it and/or modify it under the
> > +// terms of the GNU General Public License as published by the
> > +// Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +
> > +// This library is distributed in the hope that it will be useful,
> > +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +// GNU General Public License for more details.
> > +
> > +// Under Section 7 of GPL version 3, you are granted additional
> > +// permissions described in the GCC Runtime Library Exception, version
> > +// 3.1, as published by the Free Software Foundation.
> > +
> > +// You should have received a copy of the GNU General Public License and
> > +// a copy of the GCC Runtime Library 

[PATCH 20/21] aarch64: Enforce inlining restrictions for SME

2023-11-17 Thread Richard Sandiford
A function that has local ZA state cannot be inlined into its caller,
since we only support managing ZA switches at function scope.

A function whose body directly clobbers ZA state cannot be inlined into
a function with ZA state.

A function whose body requires a particular PSTATE.SM setting can only
be inlined into a function body that guarantees that PSTATE.SM setting.
The callee's function type doesn't matter here: one locally-streaming
function can be inlined into another.
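
The three rules above can be sketched as a single predicate.  The types and
field names here are invented for illustration and are not the data
structures used in aarch64.cc; the sketch also leaves out the ISA flag
checks that aarch64_can_inline_p performs.

```cpp
#include <cassert>

// Hypothetical PSTATE.SM description for a function body.
enum class Sm { Unspecified, On, Off };

// Hypothetical per-function summary.
struct FnInfo
{
  bool has_local_za;      // Creates its own ZA state.
  bool has_za_state;      // Has ZA state of any kind (local or shared).
  bool clobbers_za;       // Body directly clobbers ZA.
  Sm body_requires_sm;    // PSTATE.SM setting the body needs, if any.
  Sm body_guarantees_sm;  // PSTATE.SM setting the body runs with, if known.
};

// Sketch of the inlining rules stated in the commit message.
constexpr bool
can_inline (const FnInfo &caller, const FnInfo &callee)
{
  // ZA switches are only managed at function scope.
  if (callee.has_local_za)
    return false;

  // A ZA-clobbering body must not end up inside a function with ZA state.
  if (callee.clobbers_za && caller.has_za_state)
    return false;

  // The caller's body must guarantee whatever PSTATE.SM the callee needs.
  if (callee.body_requires_sm != Sm::Unspecified
      && callee.body_requires_sm != caller.body_guarantees_sm)
    return false;

  return true;
}
```

Only the bodies are compared for PSTATE.SM, not the function types, which is
why one locally-streaming function passes the check against another.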

gcc/
* config/aarch64/aarch64.cc: Include symbol-summary.h, ipa-prop.h,
and ipa-fnsummary.h
(aarch64_function_attribute_inlinable_p): New function.
(AARCH64_IPA_SM_FIXED, AARCH64_IPA_CLOBBERS_ZA): New constants.
(aarch64_need_ipa_fn_target_info): New function.
(aarch64_update_ipa_fn_target_info): Likewise.
(aarch64_can_inline_p): Restrict the previous ISA flag checks
to non-modal features.  Prevent callees that require a particular
PSTATE.SM state from being inlined into callers that can't guarantee
that state.  Also prevent callees that have ZA state from being
inlined into callers that don't.  Finally, prevent callees that
clobber ZA from being inlined into callers that have ZA state.
(TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P): Define.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sme/inlining_1.c: New test.
* gcc.target/aarch64/sme/inlining_2.c: Likewise.
* gcc.target/aarch64/sme/inlining_3.c: Likewise.
* gcc.target/aarch64/sme/inlining_4.c: Likewise.
* gcc.target/aarch64/sme/inlining_5.c: Likewise.
* gcc.target/aarch64/sme/inlining_6.c: Likewise.
* gcc.target/aarch64/sme/inlining_7.c: Likewise.
* gcc.target/aarch64/sme/inlining_8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 132 +-
 .../gcc.target/aarch64/sme/inlining_1.c   |  47 +++
 .../gcc.target/aarch64/sme/inlining_10.c  |  57 
 .../gcc.target/aarch64/sme/inlining_11.c  |  57 
 .../gcc.target/aarch64/sme/inlining_12.c  |  15 ++
 .../gcc.target/aarch64/sme/inlining_13.c  |  15 ++
 .../gcc.target/aarch64/sme/inlining_14.c  |  15 ++
 .../gcc.target/aarch64/sme/inlining_15.c  |  27 
 .../gcc.target/aarch64/sme/inlining_2.c   |  47 +++
 .../gcc.target/aarch64/sme/inlining_3.c   |  47 +++
 .../gcc.target/aarch64/sme/inlining_4.c   |  47 +++
 .../gcc.target/aarch64/sme/inlining_5.c   |  47 +++
 .../gcc.target/aarch64/sme/inlining_6.c   |  31 
 .../gcc.target/aarch64/sme/inlining_7.c   |  31 
 .../gcc.target/aarch64/sme/inlining_8.c   |  31 
 .../gcc.target/aarch64/sme/inlining_9.c   |  55 
 16 files changed, 696 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_10.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_11.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_12.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_13.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_14.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_15.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_8.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/inlining_9.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 340aa438d49..6fa77d79dd7 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -88,6 +88,9 @@
 #include "except.h"
 #include "tree-pass.h"
 #include "cfgbuild.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -21533,6 +21536,17 @@ aarch64_option_valid_attribute_p (tree fndecl, tree, 
tree args, int)
   return ret;
 }
 
+/* Implement TARGET_FUNCTION_ATTRIBUTE_INLINABLE_P.  Use an opt-out
+   rather than an opt-in list.  */
+
+static bool
+aarch64_function_attribute_inlinable_p (const_tree fndecl)
+{
+  /* A function that has local ZA state cannot be inlined into its caller,
+ since we only support managing ZA switches at function scope.  */
+  return !aarch64_fndecl_has_new_state (fndecl, "za");
+}
+
 /* Helper for aarch64_can_inline_p.  In the case where CALLER and CALLEE are

[PATCH 21/21] aarch64: Update sibcall handling for SME

2023-11-17 Thread Richard Sandiford
We only support tail calls between functions with the same PSTATE.ZA
setting ("private-ZA" to "private-ZA" and "shared-ZA" to "shared-ZA").

Only a normal non-streaming function can tail-call another non-streaming
function, and only a streaming function can tail-call another streaming
function.  Any function can tail-call a streaming-compatible function.
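
A compact way to see these rules is as a bit-mask test, in the spirit of the
"callee & ~caller" check visible in the aarch64_function_ok_for_sibcall hunk
below.  The encoding, type names and function name here are assumptions made
for the sketch, not the definitions used in aarch64.cc.

```cpp
#include <cassert>

// Assumed encoding: a streaming-compatible function requires no
// particular PSTATE.SM setting, so it contributes no bits.
constexpr unsigned SM_COMPATIBLE = 0u;
constexpr unsigned SM_ON = 1u;   // Streaming function.
constexpr unsigned SM_OFF = 2u;  // Normal non-streaming function.

enum class Za { Private, Shared };

// Hypothetical summary of a function type.
struct Fn
{
  unsigned sm;
  Za za;
};

// Sketch: a tail call is allowed only if both functions agree on
// PSTATE.ZA and the callee needs no PSTATE.SM bit the caller lacks.
constexpr bool
ok_for_sibcall (Fn caller, Fn callee)
{
  if (caller.za != callee.za)
    return false;
  return (callee.sm & ~caller.sm) == 0;
}
```

With this encoding a streaming-compatible callee is accepted from any
caller, while a streaming-compatible caller can only tail-call another
streaming-compatible function.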

gcc/
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
Enforce PSTATE.SM and PSTATE.ZA restrictions.
(aarch64_expand_epilogue): Save and restore the arguments
to a sibcall around any change to PSTATE.SM.

gcc/testsuite/
* gcc.target/aarch64/sme/sibcall_1.c: New test.
* gcc.target/aarch64/sme/sibcall_2.c: Likewise.
* gcc.target/aarch64/sme/sibcall_3.c: Likewise.
* gcc.target/aarch64/sme/sibcall_4.c: Likewise.
* gcc.target/aarch64/sme/sibcall_5.c: Likewise.
* gcc.target/aarch64/sme/sibcall_6.c: Likewise.
* gcc.target/aarch64/sme/sibcall_7.c: Likewise.
* gcc.target/aarch64/sme/sibcall_8.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc |  9 +++-
 .../gcc.target/aarch64/sme/sibcall_1.c| 45 +++
 .../gcc.target/aarch64/sme/sibcall_2.c| 45 +++
 .../gcc.target/aarch64/sme/sibcall_3.c| 45 +++
 .../gcc.target/aarch64/sme/sibcall_4.c| 45 +++
 .../gcc.target/aarch64/sme/sibcall_5.c| 45 +++
 .../gcc.target/aarch64/sme/sibcall_6.c| 26 +++
 .../gcc.target/aarch64/sme/sibcall_7.c| 26 +++
 .../gcc.target/aarch64/sme/sibcall_8.c| 19 
 9 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_8.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6fa77d79dd7..c8f99d5c991 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8498,6 +8498,11 @@ aarch64_function_ok_for_sibcall (tree, tree exp)
   if (crtl->abi->id () != expr_callee_abi (exp).id ())
 return false;
 
+  tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
+  if (aarch64_fntype_pstate_sm (fntype) & ~aarch64_cfun_incoming_pstate_sm ())
+return false;
+  if (aarch64_fntype_pstate_za (fntype) != aarch64_cfun_incoming_pstate_za ())
+return false;
   return true;
 }
 
@@ -11950,7 +11955,9 @@ aarch64_expand_epilogue (rtx_call_insn *sibcall)
guard_label = aarch64_guard_switch_pstate_sm (IP0_REGNUM,
  aarch64_isa_flags);
   aarch64_sme_mode_switch_regs return_switch;
-  if (crtl->return_rtx && REG_P (crtl->return_rtx))
+  if (sibcall)
+   return_switch.add_call_args (sibcall);
+  else if (crtl->return_rtx && REG_P (crtl->return_rtx))
return_switch.add_reg (GET_MODE (crtl->return_rtx),
   REGNO (crtl->return_rtx));
   return_switch.emit_prologue ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c 
b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
new file mode 100644
index 000..c7530de5c37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_1.c
@@ -0,0 +1,45 @@
+/* { dg-options "-O2" } */
+
+void sc_callee () [[arm::streaming_compatible]];
+void s_callee () [[arm::streaming]];
+void n_callee ();
+
+[[arm::locally_streaming]] __attribute__((noipa)) void
+sc_ls_callee () [[arm::streaming_compatible]] {}
+[[arm::locally_streaming]] __attribute__((noipa)) void
+n_ls_callee () {}
+
+void
+sc_to_sc () [[arm::streaming_compatible]]
+{
+  sc_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_callee} } } */
+
+void
+sc_to_s () [[arm::streaming_compatible]]
+{
+  s_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\ts_callee} } } */
+
+void
+sc_to_n () [[arm::streaming_compatible]]
+{
+  n_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_callee} } } */
+
+void
+sc_to_sc_ls () [[arm::streaming_compatible]]
+{
+  sc_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tb\tsc_ls_callee} } } */
+
+void
+sc_to_n_ls () [[arm::streaming_compatible]]
+{
+  n_ls_callee ();
+}
+/* { dg-final { scan-assembler {\tbl\tn_ls_callee} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c 
b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
new file mode 100644
index 000..8d1c8a9f901
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/sibcall_2.c
@@ -0,0 +1,45 @@
+/* { 

[PATCH 18/21] aarch64: Add support for __arm_locally_streaming

2023-11-17 Thread Richard Sandiford
This patch adds support for the __arm_locally_streaming attribute,
which allows a function to use SME internally without changing
the function's ABI.  The attribute is valid but redundant for
__arm_streaming functions.
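
The distinction between a function's ABI and the mode its body runs in
can be sketched as follows.  SmState, FnDecl and both helper functions
are hypothetical names used only for this example, and the
prologue_enables_sm rule is an assumption based on the description
above rather than a copy of the patch.

```cpp
enum class SmState { Off, On, Compatible };

struct FnDecl {
  SmState type_state;      // PSTATE.SM implied by the function type (its ABI)
  bool locally_streaming;  // carries __arm_locally_streaming
};

// PSTATE.SM while the body executes.  A locally-streaming function
// always runs its body in streaming mode, whatever its type says.
constexpr SmState body_sm_state(const FnDecl &fn) {
  return fn.locally_streaming ? SmState::On : fn.type_state;
}

// Whether the prologue has to enable streaming mode.  This is redundant
// when the function is already entered in streaming mode.
constexpr bool prologue_enables_sm(const FnDecl &fn) {
  return fn.locally_streaming && fn.type_state != SmState::On;
}
```

For a streaming-compatible function the switch in the prologue would
additionally depend on the incoming mode, which this sketch does not
model.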

gcc/
* config/aarch64/aarch64.cc (aarch64_arm_attribute_table): Add
arm::locally_streaming.
(aarch64_fndecl_is_locally_streaming): New function.
(aarch64_fndecl_sm_state): Handle locally-streaming functions.
(aarch64_cfun_enables_pstate_sm): New function.
(aarch64_add_offset): Add an argument that specifies whether
the streaming vector length should be used instead of the
prevailing one.
(aarch64_split_add_offset, aarch64_add_sp, aarch64_sub_sp): Likewise.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_expand_mov_immediate): Update calls accordingly.
(aarch64_need_old_pstate_sm): Return true for locally-streaming
streaming-compatible functions.
(aarch64_layout_frame): Force all call-preserved Z and P registers
to be saved and restored if the function switches PSTATE.SM in the
prologue.
(aarch64_get_separate_components): Disable shrink-wrapping of
such Z and P saves and restores.
(aarch64_use_late_prologue_epilogue): New function.
(aarch64_expand_prologue): Measure SVE lengths in the streaming
vector length for locally-streaming functions, then emit code
to enable streaming mode.
(aarch64_expand_epilogue): Likewise in reverse.
(TARGET_USE_LATE_PROLOGUE_EPILOGUE): Define.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_locally_streaming.

gcc/testsuite/
* gcc.target/aarch64/sme/locally_streaming_1.c: New test.
* gcc.target/aarch64/sme/locally_streaming_2.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_3.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_4.c: Likewise.
* gcc.target/aarch64/sme/keyword_macros_1.c: Add
__arm_locally_streaming.
* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
---
 gcc/config/aarch64/aarch64-c.cc   |   1 +
 gcc/config/aarch64/aarch64.cc | 233 +++--
 .../g++.target/aarch64/sme/keyword_macros_1.C |   1 +
 .../gcc.target/aarch64/sme/keyword_macros_1.c |   1 +
 .../aarch64/sme/locally_streaming_1.c | 466 ++
 .../aarch64/sme/locally_streaming_2.c | 177 +++
 .../aarch64/sme/locally_streaming_3.c | 273 ++
 .../aarch64/sme/locally_streaming_4.c | 145 ++
 8 files changed, 1259 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_4.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index f2fa5df1b82..2a8ca46987a 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -86,6 +86,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
 
   DEFINE_ARM_KEYWORD_MACRO ("streaming");
   DEFINE_ARM_KEYWORD_MACRO ("streaming_compatible");
+  DEFINE_ARM_KEYWORD_MACRO ("locally_streaming");
 
 #undef DEFINE_ARM_KEYWORD_MACRO
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 12753ac133e..6ad29a3a84f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3136,6 +3136,7 @@ static const attribute_spec aarch64_arm_attributes[] =
  NULL, attr_streaming_exclusions },
   { "streaming_compatible", 0, 0, false, true,  true,  true,
  NULL, attr_streaming_exclusions },
+  { "locally_streaming",  0, 0, true, false, false, false, NULL, NULL },
   { "new",   1, -1, true, false, false, false,
  handle_arm_new, NULL },
   { "preserves", 1, -1, false, true,  true,  true,
@@ -4445,6 +4446,16 @@ aarch64_fntype_isa_mode (const_tree fntype)
  | aarch64_fntype_pstate_za (fntype));
 }
 
+/* Return true if FNDECL uses streaming mode internally, as an
+   implementation choice.  */
+
+static bool
+aarch64_fndecl_is_locally_streaming (const_tree fndecl)
+{
+  return lookup_attribute ("arm", "locally_streaming",
+  DECL_ATTRIBUTES (fndecl));
+}
+
 /* Return the state of PSTATE.SM when compiling the body of
function FNDECL.  This might be different from the state of
PSTATE.SM on entry.  */
@@ -4452,6 +4463,9 @@ aarch64_fntype_isa_mode (const_tree fntype)
 static aarch64_feature_flags
 aarch64_fndecl_pstate_sm (const_tree fndecl)
 {
+  if (aarch64_fndecl_is_locally_streaming (fndecl))
+return AARCH64_FL_SM_ON;
+
   return aarch64_fntype_pstate_sm 

[PATCH 19/21] aarch64: Handle PSTATE.SM across abnormal edges

2023-11-17 Thread Richard Sandiford
PSTATE.SM is always off on entry to an exception handler, and on entry
to a nonlocal goto receiver.  Those entry points need to switch
PSTATE.SM back to the appropriate state for the current function.
In the case of streaming-compatible functions, they need to restore
the mode that the caller was originally using.

The requirement on nonlocal goto receivers means that nonlocal
jumps need to ensure that PSTATE.SM is zero.

gcc/
* config/aarch64/aarch64.cc: Include except.h.
(aarch64_sme_mode_switch_regs::add_call_preserved_reg): New function.
(aarch64_sme_mode_switch_regs::add_call_preserved_regs): Likewise.
(aarch64_need_old_pstate_sm): Return true if the function has
a nonlocal-goto or exception receiver.
(aarch64_switch_pstate_sm_for_landing_pad): New function.
(aarch64_switch_pstate_sm_for_jump): Likewise.
(pass_switch_pstate_sm::gate): Enable the pass for all
streaming and streaming-compatible functions.
(pass_switch_pstate_sm::execute): Handle non-local gotos and their
receivers.  Handle exception handler entry points.

gcc/testsuite/
* g++.target/aarch64/sme/exceptions_2.C: New test.
* gcc.target/aarch64/sme/nonlocal_goto_1.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_4.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_5.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_6.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_7.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 141 -
 .../g++.target/aarch64/sme/exceptions_2.C | 148 ++
 .../gcc.target/aarch64/sme/nonlocal_goto_1.c  |  58 +++
 .../gcc.target/aarch64/sme/nonlocal_goto_2.c  |  44 ++
 .../gcc.target/aarch64/sme/nonlocal_goto_3.c  |  46 ++
 .../gcc.target/aarch64/sme/nonlocal_goto_4.c  |  25 +++
 .../gcc.target/aarch64/sme/nonlocal_goto_5.c  |  26 +++
 .../gcc.target/aarch64/sme/nonlocal_goto_6.c  |  31 
 .../gcc.target/aarch64/sme/nonlocal_goto_7.c  |  25 +++
 9 files changed, 537 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sme/exceptions_2.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/nonlocal_goto_7.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 6ad29a3a84f..340aa438d49 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -85,6 +85,7 @@
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
 #include "ssa.h"
+#include "except.h"
 #include "tree-pass.h"
 #include "cfgbuild.h"
 
@@ -7132,6 +7133,8 @@ public:
   void add_reg (machine_mode, unsigned int);
   void add_call_args (rtx_call_insn *);
   void add_call_result (rtx_call_insn *);
+  void add_call_preserved_reg (unsigned int);
+  void add_call_preserved_regs (bitmap);
 
   void emit_prologue ();
   void emit_epilogue ();
@@ -7264,6 +7267,46 @@ aarch64_sme_mode_switch_regs::add_call_result 
(rtx_call_insn *call_insn)
 add_reg (GET_MODE (dest), REGNO (dest));
 }
 
+/* REGNO is a register that is call-preserved under the current function's ABI.
+   Record that it must be preserved around the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_preserved_reg (unsigned int regno)
+{
+  if (FP_REGNUM_P (regno))
+switch (crtl->abi->id ())
+  {
+  case ARM_PCS_SVE:
+   add_reg (VNx16QImode, regno);
+   break;
+  case ARM_PCS_SIMD:
+   add_reg (V16QImode, regno);
+   break;
+  case ARM_PCS_AAPCS64:
+   add_reg (DImode, regno);
+   break;
+  default:
+   gcc_unreachable ();
+  }
+  else if (PR_REGNUM_P (regno))
+add_reg (VNx16BImode, regno);
+}
+
+/* The hard registers in REGS are call-preserved under the current function's
+   ABI.  Record that they must be preserved around the mode switch.  */
+
+void
+aarch64_sme_mode_switch_regs::add_call_preserved_regs (bitmap regs)
+{
+  bitmap_iterator bi;
+  unsigned int regno;
+  EXECUTE_IF_SET_IN_BITMAP (regs, 0, regno, bi)
+if (HARD_REGISTER_NUM_P (regno))
+  add_call_preserved_reg (regno);
+else
+  break;
+}
+
 /* Emit code to save registers before the mode switch.  */
 
 void
@@ -9798,6 +9841,23 @@ aarch64_need_old_pstate_sm ()
   if (aarch64_cfun_enables_pstate_sm ())
 return true;
 
+  /* Non-local goto receivers are entered with 

[PATCH 14/21] aarch64: Add a VNx1TI mode

2023-11-17 Thread Richard Sandiford
Although TI isn't really a native SVE element mode, it's convenient
for SME if we define VNx1TI anyway, so that it can be used to
distinguish .Q ZA operations from others.  It's purely an RTL
convenience and isn't (yet) a valid storage mode.
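
The element counts that the ADJUST_NUNITS lines assign can be worked
through with a small helper.  units_per_vector is a hypothetical name;
vg stands for the number of 64-bit granules in an SVE vector (the role
played by aarch64_sve_vg in the patch), so vg is 2 for 128-bit vectors.

```cpp
// Number of elements of ELEMENT_BITS bits in NVECS SVE vectors, where
// each vector holds VG 64-bit granules.  For 128-bit elements this is
// vg * nvecs / 2, matching the exact_div in the patch.
constexpr unsigned units_per_vector(unsigned vg, unsigned element_bits,
                                    unsigned nvecs = 1) {
  return vg * nvecs * 64 / element_bits;
}
```

At the minimum vector length (vg == 2) this gives 8 halfwords, 4 words,
2 doublewords and a single 128-bit element per vector, which is where
the names VNx8HI, VNx4SI, VNx2DI and VNx1TI come from.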

gcc/
* config/aarch64/aarch64-modes.def: Add VNx1TI.
---
 gcc/config/aarch64/aarch64-modes.def | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 6b4f4e17dd5..a3efc5b8484 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -156,7 +156,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
for 8-bit, 16-bit, 32-bit and 64-bit elements respectively.  It isn't
strictly necessary to set the alignment here, since the default would
be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
-#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+#define SVE_MODES(NVECS, VB, VH, VS, VD, VT) \
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS, NVECS == 1 ? 1 : 4); \
   \
@@ -164,6 +164,7 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
   ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VT##TI, exact_div (aarch64_sve_vg * NVECS, 2)); \
   ADJUST_NUNITS (VH##BF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
   ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
@@ -173,17 +174,23 @@ ADV_SIMD_Q_REG_STRUCT_MODES (4, V4x16, V4x8, V4x4, V4x2)
   ADJUST_ALIGNMENT (VH##HI, 16); \
   ADJUST_ALIGNMENT (VS##SI, 16); \
   ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VT##TI, 16); \
   ADJUST_ALIGNMENT (VH##BF, 16); \
   ADJUST_ALIGNMENT (VH##HF, 16); \
   ADJUST_ALIGNMENT (VS##SF, 16); \
   ADJUST_ALIGNMENT (VD##DF, 16);
 
-/* Give SVE vectors the names normally used for 256-bit vectors.
-   The actual number depends on command-line flags.  */
-SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
-SVE_MODES (2, VNx32, VNx16, VNx8, VNx4)
-SVE_MODES (3, VNx48, VNx24, VNx12, VNx6)
-SVE_MODES (4, VNx64, VNx32, VNx16, VNx8)
+/* Give SVE vectors names of the form VNxX, where X describes what is
+   stored in each 128-bit unit.  The actual size of the mode depends
+   on command-line flags.
+
+   VNx1TI isn't really a native SVE mode, but it can be useful in some
+   limited situations.  */
+VECTOR_MODE_WITH_PREFIX (VNx, INT, TI, 1, 1);
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2, VNx1)
+SVE_MODES (2, VNx32, VNx16, VNx8, VNx4, VNx2)
+SVE_MODES (3, VNx48, VNx24, VNx12, VNx6, VNx3)
+SVE_MODES (4, VNx64, VNx32, VNx16, VNx8, VNx4)
 
 /* Partial SVE vectors:
 
-- 
2.25.1



[PATCH 13/21] aarch64: Add a register class for w12-w15

2023-11-17 Thread Richard Sandiford
Some SME instructions use w12-w15 to index ZA.  This patch
adds a register class for that range.
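
A minimal sketch of the range test and of the bitmask that describes
the class, assuming that the general registers are numbered from 0 so
that w12-w15 are registers 12 to 15.  The names below are illustrative
and are not the macros added by the patch.

```cpp
#include <cstdint>

// Assumption: x0-x30 occupy register numbers 0-30.
constexpr unsigned R12 = 12;
constexpr unsigned R15 = 15;

// Equivalent of an IN_RANGE test on the register number.
constexpr bool w12_w15_regnum_p(unsigned regno) {
  return regno >= R12 && regno <= R15;
}

// Build the first word of a register-class bitmask covering the
// registers FIRST to LAST inclusive (both below 32).
constexpr std::uint32_t class_mask(unsigned first, unsigned last) {
  std::uint32_t mask = 0;
  for (unsigned r = first; r <= last; ++r)
    mask |= std::uint32_t{1} << r;
  return mask;
}
```

Here class_mask (12, 15) has exactly bits 12 to 15 set, i.e. 0xf000.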

gcc/
* config/aarch64/aarch64.h (W12_W15_REGNUM_P): New macro.
(W12_W15_REGS): New register class.
(REG_CLASS_NAMES, REG_CLASS_CONTENTS): Add entries for it.
* config/aarch64/aarch64.cc (aarch64_regno_regclass)
(aarch64_class_max_nregs, aarch64_register_move_cost): Handle
W12_W15_REGS.
---
 gcc/config/aarch64/aarch64.cc | 12 +++-
 gcc/config/aarch64/aarch64.h  |  6 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2782feef0f3..1e4d1b03c0a 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14368,6 +14368,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (W12_W15_REGNUM_P (regno))
+return W12_W15_REGS;
+
   if (STUB_REGNUM_P (regno))
 return STUB_REGS;
 
@@ -14732,6 +14735,7 @@ aarch64_class_max_nregs (reg_class_t regclass, 
machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
 {
+case W12_W15_REGS:
 case STUB_REGS:
 case TAILCALL_ADDR_REGS:
 case POINTER_REGS:
@@ -17090,13 +17094,11 @@ aarch64_register_move_cost (machine_mode mode,
   const struct cpu_regmove_cost *regmove_cost
 = aarch64_tune_params.regmove_cost;
 
-  /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
-  || to == STUB_REGS)
+  /* Treat any subset of POINTER_REGS as though it were GENERAL_REGS.  */
+  if (reg_class_subset_p (to, POINTER_REGS))
 to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
-  || from == STUB_REGS)
+  if (reg_class_subset_p (from, POINTER_REGS))
 from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index dc544273d32..83bd8ebdad7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -660,6 +660,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
&& (REGNO) != R17_REGNUM \
&& (REGNO) != R30_REGNUM) \
 
+#define W12_W15_REGNUM_P(REGNO) \
+  IN_RANGE (REGNO, R12_REGNUM, R15_REGNUM)
+
 #define FP_REGNUM_P(REGNO) \
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -686,6 +689,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 enum reg_class
 {
   NO_REGS,
+  W12_W15_REGS,
   TAILCALL_ADDR_REGS,
   STUB_REGS,
   GENERAL_REGS,
@@ -710,6 +714,7 @@ enum reg_class
 #define REG_CLASS_NAMES\
 {  \
   "NO_REGS",   \
+  "W12_W15_REGS",  \
   "TAILCALL_ADDR_REGS",\
   "STUB_REGS", \
   "GENERAL_REGS",  \
@@ -731,6 +736,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS \
 {  \
   { 0x00000000, 0x00000000, 0x00000000 },  /* NO_REGS */   \
+  { 0x0000f000, 0x00000000, 0x00000000 },  /* W12_W15_REGS */  \
   { 0x00030000, 0x00000000, 0x00000000 },  /* TAILCALL_ADDR_REGS */\
   { 0x3ffcffff, 0x00000000, 0x00000000 },  /* STUB_REGS */ \
   { 0x7fffffff, 0x00000000, 0x00000003 },  /* GENERAL_REGS */  \
-- 
2.25.1



[PATCH 16/21] aarch64: Generalise _m rules for SVE intrinsics

2023-11-17 Thread Richard Sandiford
In SVE there was a simple rule that unary merging (_m) intrinsics
had a separate initial argument to specify the values of inactive
lanes, whereas other merging functions took inactive lanes from
the first operand to the operation.

That rule began to break down in SVE2, and it continues to do
so in SME.  This patch therefore adds a virtual function to
specify whether the separate initial argument is present or not.
The old rule is still the default.
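
The structure of the change can be sketched in isolation: a base class
supplies the old rule as the default and individual shapes override it.
Instance, Shape and NarrowTopShape are simplified, hypothetical
stand-ins for function_instance, function_shape and
unary_convert_narrowt_def.

```cpp
enum Predication { PRED_none, PRED_m, PRED_x, PRED_z };

struct Instance {
  Predication pred;
};

struct Shape {
  virtual ~Shape() = default;

  // Default: the original SVE rule.  Only unary merging (_m) functions
  // take a separate initial argument for the inactive lanes.
  virtual bool has_merge_argument_p(const Instance &inst,
                                    unsigned nargs) const {
    return nargs == 1 && inst.pred == PRED_m;
  }
};

// A shape whose first argument is present for every form of predication.
struct NarrowTopShape : Shape {
  bool has_merge_argument_p(const Instance &, unsigned) const override {
    return true;
  }
};
```

Callers then ask the shape rather than hard-coding the test on the
argument count and predication.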

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_shape::has_merge_argument_p): New member function.
* config/aarch64/aarch64-sve-builtins.cc
(function_resolver::check_gp_argument): Use it.
(function_expander::get_fallback_value): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(apply_predication): Likewise.
(unary_convert_narrowt_def::has_merge_argument_p): New function.
---
 gcc/config/aarch64/aarch64-sve-builtins-shapes.cc | 10 --
 gcc/config/aarch64/aarch64-sve-builtins.cc|  4 ++--
 gcc/config/aarch64/aarch64-sve-builtins.h | 13 +
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index aa5dbb5df9d..8f6c0515ed6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -66,8 +66,8 @@ apply_predication (const function_instance &instance, tree 
return_type,
 the same type as the result.  For unary_convert_narrowt it also
 provides the "bottom" half of active elements, and is present
 for all types of predication.  */
-  if ((argument_types.length () == 2 && instance.pred == PRED_m)
- || instance.shape == shapes::unary_convert_narrowt)
+  auto nargs = argument_types.length () - 1;
+  if (instance.shape->has_merge_argument_p (instance, nargs))
argument_types.quick_insert (0, return_type);
 }
 }
@@ -3273,6 +3273,12 @@ SHAPE (unary_convert)
predicate.  */
 struct unary_convert_narrowt_def : public overloaded_base<1>
 {
+  bool
+  has_merge_argument_p (const function_instance &, unsigned int) const override
+  {
+return true;
+  }
+
   void
   build (function_builder , const function_group_info ) const override
   {
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 41e7d88bffa..b2d16c318e9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2230,7 +2230,7 @@ function_resolver::check_gp_argument (unsigned int nops,
   if (pred != PRED_none)
 {
   /* Unary merge operations should use resolve_unary instead.  */
-  gcc_assert (nops != 1 || pred != PRED_m);
+  gcc_assert (!shape->has_merge_argument_p (*this, nops));
   nargs = nops + 1;
   if (!check_num_arguments (nargs)
  || !require_vector_type (i, VECTOR_TYPE_svbool_t))
@@ -2874,7 +2874,7 @@ function_expander::get_fallback_value (machine_mode mode, 
unsigned int nops,
 
   gcc_assert (pred == PRED_m || pred == PRED_x);
   if (merge_argno == DEFAULT_MERGE_ARGNO)
-merge_argno = nops == 1 && pred == PRED_m ? 0 : 1;
+merge_argno = shape->has_merge_argument_p (*this, nops) ? 0 : 1;
 
   if (merge_argno == 0)
 return args[argno++];
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 981a57d82d2..c65c1f6e959 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -676,6 +676,9 @@ public:
 class function_shape
 {
 public:
+  virtual bool has_merge_argument_p (const function_instance &,
+unsigned int) const;
+
   virtual bool explicit_type_suffix_p (unsigned int) const = 0;
 
   /* True if the group suffix is present in overloaded names.
@@ -948,6 +951,16 @@ function_base::vectors_per_tuple (const function_instance 
&instance) const
   return instance.group_suffix ().vectors_per_tuple;
 }
 
+/* Return true if INSTANCE (which has NARGS arguments) has an initial
+   vector argument whose only purpose is to specify the values of
+   inactive lanes.  */
+inline bool
+function_shape::has_merge_argument_p (const function_instance &instance,
+ unsigned int nargs) const
+{
+  return nargs == 1 && instance.pred == PRED_m;
+}
+
 /* Return the mode of the result of a call.  */
 inline machine_mode
 function_expander::result_mode () const
-- 
2.25.1



[PATCH 11/21] aarch64: Switch PSTATE.SM around calls

2023-11-17 Thread Richard Sandiford
This patch adds support for switching to the appropriate SME mode
for each call.  Switching to streaming mode requires an SMSTART SM
instruction and switching to non-streaming mode requires an SMSTOP SM
instruction.  If the call is being made from streaming-compatible code,
these switches are conditional on the current mode being the opposite
of the one that the call needs.
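
The choice of switch described above can be summarised as a small
decision function.  The Sm and Switch enumerations and
switch_before_call are hypothetical names for this sketch; "caller"
means the mode in which the calling code runs.  That a
streaming-compatible callee needs no switch is an assumption drawn from
the meaning of "streaming-compatible", not a statement from the text
above.

```cpp
enum class Sm { Off, On, Compatible };

enum class Switch {
  None,         // no mode change needed
  Smstart,      // unconditional SMSTART SM
  Smstop,       // unconditional SMSTOP SM
  CondSmstart,  // SMSTART SM only if currently non-streaming
  CondSmstop    // SMSTOP SM only if currently streaming
};

// Mode switch needed before a call from code running in CALLER mode to
// a function that expects CALLEE mode.
constexpr Switch switch_before_call(Sm caller, Sm callee) {
  if (callee == Sm::Compatible || callee == caller)
    return Switch::None;
  // Streaming-compatible code does not know its mode statically, so
  // the switch has to be guarded by a run-time test.
  const bool conditional = caller == Sm::Compatible;
  if (callee == Sm::On)
    return conditional ? Switch::CondSmstart : Switch::Smstart;
  return conditional ? Switch::CondSmstop : Switch::Smstop;
}
```

The inverse switch is needed after the call so that the caller resumes
in its own mode.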

Since changing PSTATE.SM changes the vector length and effectively
changes the ISA, the code to do the switching has to be emitted late.
The patch does this using a new pass that runs next to late prologue/
epilogue insertion.  (It doesn't use md_reorg because later additions
need the CFG.)

If a streaming-compatible function needs to switch mode for a call,
it must restore the original mode afterwards.  The old mode must
therefore be available immediately after the call.  The easiest
way of ensuring this is to force the use of a hard frame pointer
and ensure that the old state is saved at an in-range offset
from there.

Changing modes clobbers the Z and P registers, so we need to
save and restore live Z and P state around each mode switch.
However, mode switches are not expected to be performance
critical, so it seemed better to err on the side of being
correct rather than trying to optimise the save and restore
with surrounding code.

gcc/
* config/aarch64/aarch64-passes.def
(pass_late_thread_prologue_and_epilogue): New pass.
* config/aarch64/aarch64-sme.md: New file.
* config/aarch64/aarch64.md: Include it.
(*tb1): Rename to...
(@aarch64_tb): ...this.
(call, call_value, sibcall, sibcall_value): Don't require operand 2
to be a CONST_INT.
* config/aarch64/aarch64-protos.h (aarch64_emit_call_insn): Return
the insn.
(make_pass_switch_sm_state): Declare.
* config/aarch64/aarch64.h (TARGET_STREAMING_COMPATIBLE): New macro.
(CALL_USED_REGISTER): Mark VG as call-preserved.
(aarch64_frame::old_svcr_offset): New member variable.
(machine_function::call_switches_sm_state): Likewise.
(CUMULATIVE_ARGS::num_sme_mode_switch_args): Likewise.
(CUMULATIVE_ARGS::sme_mode_switch_args): Likewise.
* config/aarch64/aarch64.cc: Include tree-pass.h and cfgbuild.h.
(aarch64_cfun_incoming_pstate_sm): New function.
(aarch64_call_switches_pstate_sm): Likewise.
(aarch64_reg_save_mode): Return DImode for VG_REGNUM.
(aarch64_callee_isa_mode): New function.
(aarch64_insn_callee_isa_mode): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_switch_pstate_sm): Likewise.
(aarch64_sme_mode_switch_regs): New class.
(aarch64_record_sme_mode_switch_args): New function.
(aarch64_finish_sme_mode_switch_args): Likewise.
(aarch64_function_arg): Handle the end marker by returning a
PARALLEL that contains the ABI cookie that we used previously
alongside the result of aarch64_finish_sme_mode_switch_args.
(aarch64_init_cumulative_args): Initialize num_sme_mode_switch_args.
(aarch64_function_arg_advance): If a call would switch SM state,
record all argument registers that would need to be saved around
the mode switch.
(aarch64_need_old_pstate_sm): New function.
(aarch64_layout_frame): Decide whether the frame needs to store the
incoming value of PSTATE.SM and allocate a save slot for it if so.
If a function switches SME state, arrange to save the old value
of the DWARF VG register.  Handle the case where this is the only
register save slot above the FP.
(aarch64_save_callee_saves): Handle saves of the DWARF VG register.
(aarch64_get_separate_components): Prevent such saves from being
shrink-wrapped.
(aarch64_old_svcr_mem): New function.
(aarch64_read_old_svcr): Likewise.
(aarch64_guard_switch_pstate_sm): Likewise.
(aarch64_expand_prologue): Handle saves of the DWARF VG register.
Initialize any SVCR save slot.
(aarch64_expand_call): Allow the cookie to be PARALLEL that contains
both the UNSPEC_CALLEE_ABI value and a list of registers that need
to be preserved across a change to PSTATE.SM.  If the call does
involve such a change to PSTATE.SM, record the registers that
would be clobbered by this process.  Also emit an instruction
to mark the temporary change in VG.  Update call_switches_pstate_sm.
(aarch64_emit_call_insn): Return the emitted instruction.
(aarch64_frame_pointer_required): New function.
(aarch64_conditional_register_usage): Prevent VG_REGNUM from being
treated as a register operand.
(aarch64_switch_pstate_sm_for_call): New function.
(pass_data_switch_pstate_sm): New pass variable.
(pass_switch_pstate_sm): New pass class.
(make_pass_switch_pstate_sm): New 

[PATCH 15/21] aarch64: Generalise unspec_based_function_base

2023-11-17 Thread Richard Sandiford
Until now, SVE intrinsics that map directly to unspecs
have always used type suffix 0 to distinguish between signed
integers, unsigned integers, and floating-point values.
SME adds functions that need to use type suffix 1 instead.
This patch generalises the classes accordingly.
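
The generalisation amounts to parameterising the lookup on a suffix
index that defaults to 0.  The sketch below shows the idea with
simplified, hypothetical types (TypeSuffix, Instance, UnspecChooser);
the unspec values are arbitrary integers.

```cpp
struct TypeSuffix {
  bool integer_p;
  bool unsigned_p;
};

struct Instance {
  TypeSuffix suffixes[2];  // type suffix 0 and type suffix 1
};

class UnspecChooser {
public:
  constexpr UnspecChooser(int for_sint, int for_uint, int for_fp,
                          unsigned suffix_index = 0)
      : m_for_sint(for_sint), m_for_uint(for_uint), m_for_fp(for_fp),
        m_suffix_index(suffix_index) {}

  // Pick the code from whichever type suffix this chooser was told to
  // inspect.
  constexpr int unspec_for(const Instance &instance) const {
    const TypeSuffix &suffix = instance.suffixes[m_suffix_index];
    return !suffix.integer_p ? m_for_fp
           : suffix.unsigned_p ? m_for_uint
                               : m_for_sint;
  }

private:
  int m_for_sint;
  int m_for_uint;
  int m_for_fp;
  unsigned m_suffix_index;  // which type suffix selects the code
};
```

Existing users keep the default index of 0, so their behaviour is
unchanged; only functions that pass 1 explicitly look at the second
suffix.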

gcc/
* config/aarch64/aarch64-sve-builtins-functions.h
(unspec_based_function_base): Allow type suffix 1 to determine
the mode of the operation.
(unspec_based_function): Update accordingly.
(unspec_based_fused_function): Likewise.
(unspec_based_fused_lane_function): Likewise.
---
 .../aarch64/aarch64-sve-builtins-functions.h  | 29 ---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h 
b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 4a10102038a..be2561620f4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -234,18 +234,21 @@ class unspec_based_function_base : public function_base
 public:
   CONSTEXPR unspec_based_function_base (int unspec_for_sint,
int unspec_for_uint,
-   int unspec_for_fp)
+   int unspec_for_fp,
+   unsigned int suffix_index = 0)
 : m_unspec_for_sint (unspec_for_sint),
   m_unspec_for_uint (unspec_for_uint),
-  m_unspec_for_fp (unspec_for_fp)
+  m_unspec_for_fp (unspec_for_fp),
+  m_suffix_index (suffix_index)
   {}
 
   /* Return the unspec code to use for INSTANCE, based on type suffix 0.  */
   int
   unspec_for (const function_instance &instance) const
   {
-return (!instance.type_suffix (0).integer_p ? m_unspec_for_fp
-   : instance.type_suffix (0).unsigned_p ? m_unspec_for_uint
+auto &suffix = instance.type_suffix (m_suffix_index);
+return (!suffix.integer_p ? m_unspec_for_fp
+   : suffix.unsigned_p ? m_unspec_for_uint
: m_unspec_for_sint);
   }
 
@@ -254,6 +257,9 @@ public:
   int m_unspec_for_sint;
   int m_unspec_for_uint;
   int m_unspec_for_fp;
+
+  /* Which type suffix is used to choose between the unspecs.  */
+  unsigned int m_suffix_index;
 };
 
 /* A function_base for functions that have an associated unspec code.
@@ -306,7 +312,8 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-return e.use_exact_insn (CODE (unspec_for (e), e.vector_mode (0)));
+return e.use_exact_insn (CODE (unspec_for (e),
+  e.vector_mode (m_suffix_index)));
   }
 };
 
@@ -360,16 +367,16 @@ public:
   {
 int unspec = unspec_for (e);
 insn_code icode;
-if (e.type_suffix (0).float_p)
+if (e.type_suffix (m_suffix_index).float_p)
   {
/* Put the operands in the normal (fma ...) order, with the accumulator
   last.  This fits naturally since that's also the unprinted operand
   in the asm output.  */
e.rotate_inputs_left (0, e.pred != PRED_none ? 4 : 3);
-   icode = code_for_aarch64_sve (unspec, e.vector_mode (0));
+   icode = code_for_aarch64_sve (unspec, e.vector_mode (m_suffix_index));
   }
 else
-  icode = INT_CODE (unspec, e.vector_mode (0));
+  icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
 return e.use_exact_insn (icode);
   }
 };
@@ -390,16 +397,16 @@ public:
   {
 int unspec = unspec_for (e);
 insn_code icode;
-if (e.type_suffix (0).float_p)
+if (e.type_suffix (m_suffix_index).float_p)
   {
/* Put the operands in the normal (fma ...) order, with the accumulator
   last.  This fits naturally since that's also the unprinted operand
   in the asm output.  */
e.rotate_inputs_left (0, e.pred != PRED_none ? 5 : 4);
-   icode = code_for_aarch64_lane (unspec, e.vector_mode (0));
+   icode = code_for_aarch64_lane (unspec, e.vector_mode (m_suffix_index));
   }
 else
-  icode = INT_CODE (unspec, e.vector_mode (0));
+  icode = INT_CODE (unspec, e.vector_mode (m_suffix_index));
 return e.use_exact_insn (icode);
   }
 };
-- 
2.25.1



[PATCH 12/21] aarch64: Add support for SME ZA attributes

2023-11-17 Thread Richard Sandiford
SME has an array called ZA that can be enabled and disabled separately
from streaming mode.  A status bit called PSTATE.ZA indicates whether
ZA is currently enabled or not.

In C and C++, the state of PSTATE.ZA is controlled using function
attributes.  There are four attributes that can be attached to
function types to indicate that the function shares ZA with its
caller.  These are:

- arm::in("za")
- arm::out("za")
- arm::inout("za")
- arm::preserves("za")

If a function's type has one of these shared-ZA attributes,
PSTATE.ZA is specified to be 1 on entry to the function and on return
from the function.  Otherwise, the caller and callee have separate
ZA contexts; they do not use ZA to share data.

Although normal non-shared-ZA functions have a separate ZA context
from their callers, nested uses of ZA are expected to be rare.
The ABI therefore defines a cooperative lazy saving scheme that
allows saves and restores of ZA to be kept to a minimum.
(Callers still have the option of doing a full save and restore
if they prefer.)

Functions that want to use ZA internally have an arm::new("za")
attribute, which tells the compiler to enable PSTATE.ZA for
the duration of the function body.  It also tells the compiler
to commit any lazy save initiated by a caller.
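
The cooperative scheme described above can be pictured with a toy model
in portable C.  This is only a sketch under stated assumptions: the
variable tpidr2, the save_block structure and every function name below
are hypothetical stand-ins, and the real ABI differs in its details
(for example in how the caller detects that a save was committed).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only.  TPIDR2 stands in for the register that points at
   the caller's save block; all names are hypothetical.  */
struct save_block
{
  int saved_za;  /* Copy of ZA, filled in only if a save is committed.  */
  bool valid;    /* True once a callee has committed the save.  */
};

static struct save_block *tpidr2;  /* Null when no lazy save is pending.  */
static int za;                     /* Stand-in for the contents of ZA.  */
static int full_saves;             /* How often ZA was really saved.  */

/* Caller side: advertise where ZA could be saved, without saving it.  */
static void
start_private_za_call (struct save_block *block)
{
  block->valid = false;
  tpidr2 = block;
}

/* Callee side: commit a pending save before clobbering ZA.  */
static void
commit_lazy_save (void)
{
  if (tpidr2)
    {
      tpidr2->saved_za = za;
      tpidr2->valid = true;
      full_saves++;
      tpidr2 = NULL;
    }
}

/* Caller side: restore ZA only if a callee committed the save.  */
static void
end_private_za_call (struct save_block *block)
{
  if (block->valid)
    za = block->saved_za;
  tpidr2 = NULL;
}

/* A normal callee that never touches ZA.  */
static void
callee_without_za (void)
{
}

/* A callee that wants its own ZA contents.  */
static void
callee_with_new_za (void)
{
  commit_lazy_save ();
  za = 99;
}
```

In this model a call to callee_without_za costs no save at all, while a
call to callee_with_new_za triggers exactly one save and one restore,
which is the behaviour the lazy scheme is meant to achieve.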

The patch uses various abstract hard registers to track dataflow
relating to ZA.  See the comments in the patch for details.

The lazy save scheme is intended to be transparent to most normal
functions, so that they don't need to be recompiled for SME.
This is reflected in the way that most normal functions ignore
the new hard registers added in the patch.

As with arm::streaming and arm::streaming_compatible, the attributes are
also available as __arm_<attr>.  This has two advantages: it triggers an
error on compilers that don't understand the attributes, and it eases
use on C, where [[...]] attributes were only added in C23.

gcc/
* config/aarch64/aarch64-isa-modes.def (ZA_ON): New ISA mode.
* config/aarch64/aarch64-protos.h (aarch64_rdsvl_immediate_p)
(aarch64_output_rdsvl, aarch64_optimize_mode_switching)
(aarch64_restore_za): Declare.
* config/aarch64/constraints.md (UsR): New constraint.
* config/aarch64/aarch64.md (LOWERING_REGNUM, TPIDR_BLOCK_REGNUM)
(SME_STATE_REGNUM, TPIDR2_SETUP_REGNUM, ZA_FREE_REGNUM)
(ZA_SAVED_REGNUM, ZA_REGNUM, FIRST_FAKE_REGNUM): New constants.
(LAST_FAKE_REGNUM): Likewise.
(UNSPEC_SAVE_NZCV, UNSPEC_RESTORE_NZCV, UNSPEC_SME_VQ): New unspecs.
(arches): Add sme.
(arch_enabled): Handle it.
(*cb1): Rename to...
(aarch64_cb1): ...this.
(*movsi_aarch64): Add an alternative for RDSVL.
(*movdi_aarch64): Likewise.
(aarch64_save_nzcv, aarch64_restore_nzcv): New insns.
* config/aarch64/aarch64-sme.md (UNSPEC_SMSTOP_ZA)
(UNSPEC_INITIAL_ZERO_ZA, UNSPEC_TPIDR2_SAVE, UNSPEC_TPIDR2_RESTORE)
(UNSPEC_READ_TPIDR2, UNSPEC_WRITE_TPIDR2, UNSPEC_SETUP_LOCAL_TPIDR2)
(UNSPEC_RESTORE_ZA, UNSPEC_START_PRIVATE_ZA_CALL): New unspecs.
(UNSPEC_END_PRIVATE_ZA_CALL, UNSPEC_COMMIT_LAZY_SAVE): Likewise.
(UNSPECV_ASM_UPDATE_ZA): New unspecv.
(aarch64_tpidr2_save, aarch64_smstart_za, aarch64_smstop_za)
(aarch64_initial_zero_za, aarch64_setup_local_tpidr2)
(aarch64_clear_tpidr2, aarch64_write_tpidr2, aarch64_read_tpidr2)
(aarch64_tpidr2_restore, aarch64_restore_za, aarch64_asm_update_za)
(aarch64_start_private_za_call, aarch64_end_private_za_call)
(aarch64_commit_lazy_save): New patterns.
* config/aarch64/aarch64.h (AARCH64_ISA_ZA_ON, TARGET_ZA): New macros.
(FIXED_REGISTERS, REGISTER_NAMES): Add the new fake ZA registers.
(CALL_USED_REGISTERS): Replace with...
(CALL_REALLY_USED_REGISTERS): ...this and add the fake ZA registers.
(FIRST_PSEUDO_REGISTER): Bump to include the fake ZA registers.
(FAKE_REGS): New register class.
(REG_CLASS_NAMES): Update accordingly.
(REG_CLASS_CONTENTS): Likewise.
(machine_function::tpidr2_block): New member variable.
(machine_function::tpidr2_block_ptr): Likewise.
(machine_function::za_save_buffer): Likewise.
(machine_function::next_asm_update_za_id): Likewise.
(CUMULATIVE_ARGS::shared_za_flags): Likewise.
(aarch64_mode_entity, aarch64_local_sme_state): New enums.
(aarch64_tristate_mode): Likewise.
(OPTIMIZE_MODE_SWITCHING, NUM_MODES_FOR_MODE_SWITCHING): Define.
* config/aarch64/aarch64.cc (AARCH64_STATE_SHARED, AARCH64_STATE_IN)
(AARCH64_STATE_OUT): New constants.
(aarch64_attribute_shared_state_flags): New function.
(aarch64_lookup_shared_state_flags, aarch64_fndecl_has_new_state)
(aarch64_check_state_string, cmp_string_csts): Likewise.
(aarch64_merge_string_arguments, aarch64_check_arm_new_against_type)
(handle_arm_new, 

[PATCH 09/21] aarch64: Distinguish streaming-compatible AdvSIMD insns

2023-11-17 Thread Richard Sandiford
The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_high, but I'd rather provide it
than not.

gcc/
* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
(TARGET_SIMD): Require PSTATE.SM to be 0.
(AARCH64_ISA_SM_OFF): New macro.
* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
(aarch64_print_operand): Support '%Z'.
(aarch64_secondary_reload): Expect SVE moves to be used for
Advanced SIMD modes if SVE is enabled and non-streaming
Advanced SIMD isn't.
(aarch64_register_move_cost): Likewise.
(aarch64_simd_container_mode): Extend Advanced SIMD mode
handling to TARGET_BASE_SIMD.
(aarch64_expand_cpymem): Expand commentary.
* config/aarch64/aarch64.md (arches): Add base_simd and nobase_simd.
(arch_enabled): Handle it.
(*mov_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
(*movti_aarch64): Use an SVE move instruction if non-streaming
SIMD isn't available.
(*mov_aarch64): Likewise.
(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
(store_pair_dw_tftf): Likewise.
(loadwb_pair_): Likewise.
(storewb_pair_): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov):
Allow UMOV in streaming mode.
(*aarch64_simd_mov): Use an SVE move instruction
if non-streaming SIMD isn't available.
(aarch64_store_lane0): Depend on TARGET_FLOAT rather than
TARGET_SIMD.
(aarch64_simd_mov_from_low): Likewise.  Use fmov if
Advanced SIMD is completely disabled.
(aarch64_simd_mov_from_high): Use SVE EXT instructions if
non-streaming SIMD isn't available.

gcc/testsuite/
* gcc.target/aarch64/movdf_2.c: New test.
* gcc.target/aarch64/movdi_3.c: Likewise.
* gcc.target/aarch64/movhf_2.c: Likewise.
* gcc.target/aarch64/movhi_2.c: Likewise.
* gcc.target/aarch64/movqi_2.c: Likewise.
* gcc.target/aarch64/movsf_2.c: Likewise.
* gcc.target/aarch64/movsi_2.c: Likewise.
* gcc.target/aarch64/movtf_3.c: Likewise.
* gcc.target/aarch64/movtf_4.c: Likewise.
* gcc.target/aarch64/movti_3.c: Likewise.
* gcc.target/aarch64/movti_4.c: Likewise.
* gcc.target/aarch64/movv16qi_4.c: Likewise.
* gcc.target/aarch64/movv16qi_5.c: Likewise.
* gcc.target/aarch64/movv8qi_4.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md| 48 +--
 gcc/config/aarch64/aarch64.cc | 16 ++--
 gcc/config/aarch64/aarch64.h  | 12 ++-
 gcc/config/aarch64/aarch64.md | 79 +
 gcc/testsuite/gcc.target/aarch64/movdf_2.c| 51 +++
 gcc/testsuite/gcc.target/aarch64/movdi_3.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movhf_2.c| 53 
 gcc/testsuite/gcc.target/aarch64/movhi_2.c| 61 +
 gcc/testsuite/gcc.target/aarch64/movqi_2.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movsf_2.c| 51 +++
 gcc/testsuite/gcc.target/aarch64/movsi_2.c| 59 +
 gcc/testsuite/gcc.target/aarch64/movtf_3.c| 81 +
 gcc/testsuite/gcc.target/aarch64/movtf_4.c| 78 +
 gcc/testsuite/gcc.target/aarch64/movti_3.c| 86 +++
 gcc/testsuite/gcc.target/aarch64/movti_4.c| 83 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++
 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +
 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c  | 55 
 .../gcc.target/aarch64/sme/arm_neon_1.c   | 13 +++
 .../gcc.target/aarch64/sme/arm_neon_2.c   | 11 +++
 .../gcc.target/aarch64/sme/arm_neon_3.c   | 11 +++
 21 files changed, 1060 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c
 create mode 100644 

[PATCH 08/21] aarch64: Add +sme

2023-11-17 Thread Richard Sandiford
This patch adds the +sme ISA feature and requires it to be present
when compiling arm_streaming code.  (arm_streaming_compatible code
does not necessarily assume the presence of SME.  It just has to
work when SME is present and streaming mode is enabled.)

gcc/
* doc/invoke.texi: Document SME.
* doc/sourcebuild.texi: Document aarch64_sve.
* config/aarch64/aarch64-option-extensions.def (sme): Define.
* config/aarch64/aarch64.h (AARCH64_ISA_SME): New macro.
(TARGET_SME): Likewise.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Ensure that SME is present when compiling streaming code.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme): New
target test.
* gcc.target/aarch64/sme/aarch64-sme.exp: Force SME to be enabled
if it isn't by default.
* g++.target/aarch64/sme/aarch64-sme.exp: Likewise.
* gcc.target/aarch64/sme/streaming_mode_3.c: New test.
---
 .../aarch64/aarch64-option-extensions.def |  2 +
 gcc/config/aarch64/aarch64.cc | 33 ++
 gcc/config/aarch64/aarch64.h  |  5 ++
 gcc/doc/invoke.texi   |  2 +
 gcc/doc/sourcebuild.texi  |  2 +
 .../g++.target/aarch64/sme/aarch64-sme.exp| 10 ++-
 .../gcc.target/aarch64/sme/aarch64-sme.exp| 10 ++-
 .../gcc.target/aarch64/sme/streaming_mode_3.c | 63 +++
 .../gcc.target/aarch64/sme/streaming_mode_4.c | 22 +++
 gcc/testsuite/lib/target-supports.exp | 12 
 10 files changed, 157 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/streaming_mode_4.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 825f3bf7758..fb9ff1b66b2 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -151,4 +151,6 @@ AARCH64_OPT_EXTENSION("mops", MOPS, (), (), (), "")
 
 AARCH64_OPT_EXTENSION("cssc", CSSC, (), (), (), "cssc")
 
+AARCH64_OPT_EXTENSION("sme", SME, (BF16, SVE2), (), (), "sme")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1a4ef2a4396..fcaea87c737 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -11737,6 +11737,23 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, 
unsigned int *p2)
   return true;
 }
 
+/* Implement TARGET_START_CALL_ARGS.  */
+
+static void
+aarch64_start_call_args (cumulative_args_t ca_v)
+{
+  CUMULATIVE_ARGS *ca = get_cumulative_args (ca_v);
+
+  if (!TARGET_SME && (ca->isa_mode & AARCH64_FL_SM_ON))
+{
+  error ("calling a streaming function requires the ISA extension %qs",
+"sme");
+  inform (input_location, "you can enable %qs using the command-line"
+ " option %<-march%>, or by using the %<target%>"
+ " attribute or pragma", "sme");
+}
+}
+
 /* This function is used by the call expanders of the machine description.
RESULT is the register in which the result is returned.  It's NULL for
"call" and "sibcall".
@@ -18541,6 +18558,19 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
   && !fixed_regs[R18_REGNUM])
 error ("%<-fsanitize=shadow-call-stack%> requires %<-ffixed-x18%>");
 
+  if ((opts->x_aarch64_isa_flags & AARCH64_FL_SM_ON)
+  && !(opts->x_aarch64_isa_flags & AARCH64_FL_SME))
+{
+  error ("streaming functions require the ISA extension %qs", "sme");
+  inform (input_location, "you can enable %qs using the command-line"
+ " option %<-march%>, or by using the %<target%>"
+ " attribute or pragma", "sme");
+  opts->x_target_flags &= ~MASK_GENERAL_REGS_ONLY;
+  auto new_flags = (opts->x_aarch64_asm_isa_flags
+   | feature_deps::SME ().enable);
+  aarch64_set_asm_isa_flags (opts, new_flags);
+}
+
   initialize_aarch64_code_model (opts);
   initialize_aarch64_tls_size (opts);
   aarch64_tpidr_register = opts->x_aarch64_tpidr_reg;
@@ -28607,6 +28637,9 @@ aarch64_run_selftests (void)
 #undef TARGET_FUNCTION_VALUE_REGNO_P
 #define TARGET_FUNCTION_VALUE_REGNO_P aarch64_function_value_regno_p
 
+#undef TARGET_START_CALL_ARGS
+#define TARGET_START_CALL_ARGS aarch64_start_call_args
+
 #undef TARGET_GIMPLE_FOLD_BUILTIN
 #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 4c7d9409fbc..ded640e8c7b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -214,6 +214,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SVE2_BITPERM  (aarch64_isa_flags & AARCH64_FL_SVE2_BITPERM)
 #define AARCH64_ISA_SVE2_SHA3 (aarch64_isa_flags & AARCH64_FL_SVE2_SHA3)
 #define 

[PATCH 07/21] aarch64: Add arm_streaming(_compatible) attributes

2023-11-17 Thread Richard Sandiford
This patch adds support for recognising the SME arm::streaming
and arm::streaming_compatible attributes.  Together with the default
case of having neither attribute, these describe three situations:
arm::streaming means that the processor is definitely in
"streaming mode" (PSTATE.SM==1), having neither attribute means that
the processor is definitely not in streaming mode (PSTATE.SM==0), and
arm::streaming_compatible means that we don't know at compile time
either way.

As far as the compiler is concerned, this effectively creates three
ISA submodes: streaming mode enables things that are not available
in non-streaming mode, non-streaming mode enables things that are not
available in streaming mode, and streaming-compatible mode has to stick
to the common subset.  This means that some instructions are conditional
on PSTATE.SM==1 and some are conditional on PSTATE.SM==0.

I wondered about recording the streaming state in a new variable.
However, the set of available instructions is also influenced by
PSTATE.ZA (added later), so I think it makes sense to view this
as an instance of a more general mechanism.  Also, keeping the
PSTATE.SM state in the same flag variable as the other ISA
features makes it possible to sum up the requirements of an
ACLE function in a single value.

The patch therefore adds a new set of feature flags called "ISA modes".
Unlike the other two sets of flags (optional features and architecture-
level features), these ISA modes are not controlled directly by
command-line parameters or "target" attributes.
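
As a rough illustration of the idea, the sketch below keeps the
streaming state in the same bitmask as an ordinary feature bit, so that
one value sums up what an instruction needs.  The names FL_SVE2,
FL_SM_ON, FL_SM_OFF, the MODE_* macros and insn_available_p are
hypothetical and chosen for this example only; they are not the
identifiers used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only.  */
#define FL_SVE2    (UINT64_C (1) << 0)  /* An ordinary optional feature.  */
#define FL_SM_ON   (UINT64_C (1) << 1)  /* PSTATE.SM is known to be 1.  */
#define FL_SM_OFF  (UINT64_C (1) << 2)  /* PSTATE.SM is known to be 0.  */

/* The three submodes.  Streaming-compatible code knows nothing about
   PSTATE.SM, so it sets neither mode bit.  */
#define MODE_STREAMING             FL_SM_ON
#define MODE_NON_STREAMING         FL_SM_OFF
#define MODE_STREAMING_COMPATIBLE  UINT64_C (0)

/* Return true if every bit in REQUIRED is also set in AVAILABLE.  */
static bool
insn_available_p (uint64_t available, uint64_t required)
{
  return (available & required) == required;
}
```

With this layout an instruction that requires FL_SM_ON or FL_SM_OFF is
rejected in streaming-compatible code, which therefore ends up with the
common subset, without any special-case test for the mode.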

arm::streaming and arm::streaming_compatible are function type attributes
rather than function declaration attributes.  This means that we need
to find somewhere to copy the type information across to a function's
target options.  The patch does this in aarch64_set_current_function.

We also need to record which ISA mode a callee expects/requires
to be active on entry.  (The same mode is then active on return.)
The patch extends the current UNSPEC_CALLEE_ABI cookie to include
this information, as well as the PCS variant that it recorded
previously.

The attributes can also be written __arm_streaming and
__arm_streaming_compatible.  This has two advantages: it triggers
an error on compilers that don't understand the attributes, and it
eases use on C, where [[...]] attributes were only added in C23.

gcc/
* config/aarch64/aarch64-isa-modes.def: New file.
* config/aarch64/aarch64.h: Include it in the feature enumerations.
(AARCH64_FL_SM_STATE, AARCH64_FL_ISA_MODES): New constants.
(AARCH64_FL_DEFAULT_ISA_MODE): Likewise.
(AARCH64_ISA_MODE): New macro.
(CUMULATIVE_ARGS): Add an isa_mode field.
* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Declare.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
* config/aarch64/aarch64.cc (attr_streaming_exclusions)
(aarch64_gnu_attributes, aarch64_gnu_attribute_table)
(aarch64_arm_attributes, aarch64_arm_attribute_table): New tables.
(aarch64_attribute_table): Redefine to include the gnu and arm
attributes.
(aarch64_fntype_pstate_sm, aarch64_fntype_isa_mode): New functions.
(aarch64_fndecl_pstate_sm, aarch64_fndecl_isa_mode): Likewise.
(aarch64_gen_callee_cookie, aarch64_callee_abi): Likewise.
(aarch64_insn_callee_cookie, aarch64_insn_callee_abi): Use them.
(aarch64_function_arg, aarch64_output_mi_thunk): Likewise.
(aarch64_init_cumulative_args): Initialize the isa_mode field.
(aarch64_output_mi_thunk): Use aarch64_gen_callee_cookie to get
the ABI cookie.
(aarch64_override_options): Add the ISA mode to the feature set.
(aarch64_temporary_target::copy_from_fndecl): Likewise.
(aarch64_fndecl_options, aarch64_handle_attr_arch): Likewise.
(aarch64_set_current_function): Maintain the correct ISA mode.
(aarch64_tlsdesc_abi_id): Return an arm_pcs.
(aarch64_comp_type_attributes): Handle arm::streaming and
arm::streaming_compatible.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __arm_streaming and __arm_streaming_compatible.
* config/aarch64/aarch64.md (tlsdesc_small_): Use
aarch64_gen_callee_cookie to get the ABI cookie.
* config/aarch64/t-aarch64 (TM_H): Add all feature-related .def files.

gcc/testsuite/
* gcc.target/aarch64/sme/aarch64-sme.exp: New harness.
* gcc.target/aarch64/sme/streaming_mode_1.c: New test.
* gcc.target/aarch64/sme/streaming_mode_2.c: Likewise.
* gcc.target/aarch64/sme/keyword_macros_1.c: Likewise.
* g++.target/aarch64/sme/aarch64-sme.exp: New harness.
* g++.target/aarch64/sme/streaming_mode_1.C: New test.
* g++.target/aarch64/sme/streaming_mode_2.C: Likewise.
* g++.target/aarch64/sme/keyword_macros_1.C: Likewise.
* gcc.target/aarch64/auto-init-1.c: Only expect the call insn
to contain 1 (const_int 0), not 2.
---
 gcc/config/aarch64/aarch64-c.cc   |  14 ++
 

[PATCH 06/21] aarch64: Add tuple forms of svreinterpret

2023-11-17 Thread Richard Sandiford
SME2 adds a number of intrinsics that operate on tuples of 2 and 4
vectors.  The ACLE therefore extends the existing svreinterpret
intrinsics to handle tuples as well.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Punt on tuple forms.
(svreinterpret_impl::expand): Use tuple_mode instead of vector_mode.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret):
Extend to x1234 groups.
* config/aarch64/aarch64-sve-builtins-functions.h
(multi_vector_function::vectors_per_tuple): If the function has
a group suffix, get the number of vectors from there.
* config/aarch64/aarch64-sve-builtins-shapes.h (reinterpret): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (reinterpret_def)
(reinterpret): New function shape.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Handle
DEF_SVE_FUNCTION_GS.
(function_resolver::infer_vector_type_and_group_suffix): New
function.
* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_FUNCTION_GS): New
macro.
(DEF_SVE_FUNCTION): Forward to DEF_SVE_FUNCTION_GS by default.
* config/aarch64/aarch64-sve-builtins.h
(function_instance::tuple_mode): New member function.
(function_resolver::infer_vector_type_and_group_suffix): Likewise.
(function_base::vectors_per_tuple): Take the function instance
as argument and get the number from the group suffix.
(function_instance::vectors_per_tuple): Update accordingly.
* config/aarch64/iterators.md (SVE_FULLx2, SVE_FULLx3, SVE_FULLx4)
(SVE_ALL_STRUCT): New mode iterators.
(SVE_STRUCT): Redefine in terms of SVE_FULL*.
* config/aarch64/aarch64-sve.md (@aarch64_sve_reinterpret)
(*aarch64_sve_reinterpret): Extend to SVE structure modes.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_XN):
New macro.
* gcc.target/aarch64/sve/acle/asm/reinterpret_bf16.c: Add tests for
tuple forms.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/reinterpret_u8.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  5 +-
 .../aarch64/aarch64-sve-builtins-base.def |  2 +-
 .../aarch64/aarch64-sve-builtins-functions.h  |  7 ++-
 .../aarch64/aarch64-sve-builtins-shapes.cc| 30 +
 .../aarch64/aarch64-sve-builtins-shapes.h |  1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc| 52 +++-
 gcc/config/aarch64/aarch64-sve-builtins.def   |  8 ++-
 gcc/config/aarch64/aarch64-sve-builtins.h | 23 ++-
 gcc/config/aarch64/aarch64-sve.md |  8 +--
 gcc/config/aarch64/iterators.md   | 26 +---
 .../aarch64/sve/acle/asm/reinterpret_bf16.c   | 62 +++
 .../aarch64/sve/acle/asm/reinterpret_f16.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_f32.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_f64.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_s16.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_s32.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_s64.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_s8.c | 62 +++
 .../aarch64/sve/acle/asm/reinterpret_u16.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_u32.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_u64.c| 62 +++
 .../aarch64/sve/acle/asm/reinterpret_u8.c | 62 +++
 .../aarch64/sve/acle/asm/test_sve_acle.h  | 14 +
 23 files changed, 900 insertions(+), 20 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index b84e245eb3e..5b75b903e5f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2161,6 +2161,9 @@ public:
   gimple *
   fold (gimple_folder &f) const override
   {
+if (f.vectors_per_tuple () > 1)
+  return NULL;
+
 /* Punt to rtl if the effect of the reinterpret on registers does not
conform to GCC's endianness model.  */
 if 

[PATCH 05/21] aarch64: Add group suffixes to SVE intrinsics

2023-11-17 Thread Richard Sandiford
The SME2 ACLE adds a new "group" suffix component to the naming
convention for SVE intrinsics.  This is also used in the new tuple
forms of the svreinterpret intrinsics.

This patch adds support for group suffixes and defines the
x2, x3 and x4 suffixes that are needed for the svreinterprets.
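
To make the naming convention concrete, here is a small sketch that
glues a base name, two type suffixes and a group suffix together.  The
helper build_intrinsic_name is hypothetical and exists only for this
example; it ignores the mode and predication suffixes, and it is not
how the patch itself builds names.

```c
#include <stdio.h>
#include <string.h>

/* Write "<base><type0><type1><group>" into BUF, which has room for
   SIZE bytes.  Return the length of the name, or -1 if it does not
   fit.  Hypothetical helper, for illustration only.  */
static int
build_intrinsic_name (char *buf, size_t size, const char *base,
                      const char *type0, const char *type1,
                      const char *group)
{
  int len = snprintf (buf, size, "%s%s%s%s", base, type0, type1, group);
  return (len < 0 || (size_t) len >= size) ? -1 : len;
}
```

For example, passing "svreinterpret", "_s16", "_s8" and "_x2" produces
a name of the form svreinterpret_s16_s8_x2, while passing an empty
group suffix gives the existing single-vector spelling.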

gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc (build_one): Take
a group suffix index parameter.
(build_32_64, build_all): Update accordingly.  Iterate over all
group suffixes.
* config/aarch64/aarch64-sve-builtins-sve2.cc (svqrshl_impl::fold)
(svqshl_impl::fold, svrshl_impl::fold): Update function_instance
constructors.
* config/aarch64/aarch64-sve-builtins.cc (group_suffixes): New array.
(groups_none): New constant.
(function_groups): Initialize the groups field.
(function_instance::hash): Hash the group index.
(function_builder::get_name): Add the group suffix.
(function_builder::add_overloaded_functions): Iterate over all
group suffixes.
(function_resolver::lookup_form): Take a group suffix parameter.
(function_resolver::resolve_to): Likewise.
* config/aarch64/aarch64-sve-builtins.def (DEF_SVE_GROUP_SUFFIX): New
macro.
(x2, x3, x4): New group suffixes.
* config/aarch64/aarch64-sve-builtins.h (group_suffix_index): New enum.
(group_suffix_info): New structure.
(function_group_info::groups): New member variable.
(function_instance::group_suffix_id): Likewise.
(group_suffixes): New array.
(function_instance::operator==): Compare the group suffixes.
(function_instance::group_suffix): New function.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc| 53 ++--
 .../aarch64/aarch64-sve-builtins-sve2.cc  | 10 +--
 gcc/config/aarch64/aarch64-sve-builtins.cc| 84 +--
 gcc/config/aarch64/aarch64-sve-builtins.def   |  9 ++
 gcc/config/aarch64/aarch64-sve-builtins.h | 81 ++
 5 files changed, 165 insertions(+), 72 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index 1646afc7a0d..dc255fc59f2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -275,18 +275,20 @@ parse_signature (const function_instance &instance, const 
char *format,
 }
 
 /* Add one function instance for GROUP, using mode suffix MODE_SUFFIX_ID,
-   the type suffixes at index TI and the predication suffix at index PI.
-   The other arguments are as for build_all.  */
+   the type suffixes at index TI, the group suffixes at index GI, and the
+   predication suffix at index PI.  The other arguments are as for
+   build_all.  */
 static void
 build_one (function_builder &b, const char *signature,
   const function_group_info &group, mode_suffix_index mode_suffix_id,
-  unsigned int ti, unsigned int pi, bool force_direct_overloads)
+  unsigned int ti, unsigned int gi, unsigned int pi,
+  bool force_direct_overloads)
 {
   /* Byte forms of svdupq take 16 arguments.  */
   auto_vec<tree, 16> argument_types;
   function_instance instance (group.base_name, *group.base, *group.shape,
  mode_suffix_id, group.types[ti],
- group.preds[pi]);
+ group.groups[gi], group.preds[pi]);
   tree return_type = parse_signature (instance, signature, argument_types);
   apply_predication (instance, return_type, argument_types);
   b.add_unique_function (instance, return_type, argument_types,
@@ -312,24 +314,26 @@ build_32_64 (function_builder &b, const char *signature,
 mode_suffix_index mode64, bool force_direct_overloads = false)
 {
   for (unsigned int pi = 0; group.preds[pi] != NUM_PREDS; ++pi)
-if (group.types[0][0] == NUM_TYPE_SUFFIXES)
-  {
-   gcc_assert (mode32 != MODE_none && mode64 != MODE_none);
-   build_one (b, signature, group, mode32, 0, pi,
-  force_direct_overloads);
-   build_one (b, signature, group, mode64, 0, pi,
-  force_direct_overloads);
-  }
-else
-  for (unsigned int ti = 0; group.types[ti][0] != NUM_TYPE_SUFFIXES; ++ti)
+for (unsigned int gi = 0; group.groups[gi] != NUM_GROUP_SUFFIXES; ++gi)
+  if (group.types[0][0] == NUM_TYPE_SUFFIXES)
{
- unsigned int bits = type_suffixes[group.types[ti][0]].element_bits;
- gcc_assert (bits == 32 || bits == 64);
- mode_suffix_index mode = bits == 32 ? mode32 : mode64;
- if (mode != MODE_none)
-   build_one (b, signature, group, mode, ti, pi,
-  force_direct_overloads);
+ gcc_assert (mode32 != MODE_none && mode64 != MODE_none);
+ build_one (b, signature, group, mode32, 0, gi, pi,
+force_direct_overloads);
+ build_one (b, signature, group, 

[PATCH 04/21] aarch64: Make AARCH64_FL_SVE requirements explicit

2023-11-17 Thread Richard Sandiford
So far, all intrinsics covered by the aarch64-sve-builtins*
framework have (naturally enough) required at least SVE.
However, arm_sme.h defines a couple of intrinsics that can
be called by any code.  It's therefore necessary to make
the implicit SVE requirement explicit.

gcc/
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Remove
implied requirement on SVE.
* config/aarch64/aarch64-sve-builtins-base.def: Explicitly require SVE.
* config/aarch64/aarch64-sve-builtins-sve2.def: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.def  | 10 +-
 .../aarch64/aarch64-sve-builtins-sve2.def  | 18 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc |  2 +-
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def 
b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index 95ae1d71629..0484863d3f7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -17,7 +17,7 @@
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
-#define REQUIRED_EXTENSIONS 0
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE
 DEF_SVE_FUNCTION (svabd, binary_opt_n, all_arith, mxz)
 DEF_SVE_FUNCTION (svabs, unary, all_float_and_signed, mxz)
 DEF_SVE_FUNCTION (svacge, compare_opt_n, all_float, implicit)
@@ -318,7 +318,7 @@ DEF_SVE_FUNCTION (svzip2, binary, all_data, none)
 DEF_SVE_FUNCTION (svzip2, binary_pred, all_pred, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_BF16
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_BF16
 DEF_SVE_FUNCTION (svbfdot, ternary_bfloat_opt_n, s_float, none)
 DEF_SVE_FUNCTION (svbfdot_lane, ternary_bfloat_lanex2, s_float, none)
 DEF_SVE_FUNCTION (svbfmlalb, ternary_bfloat_opt_n, s_float, none)
@@ -330,7 +330,7 @@ DEF_SVE_FUNCTION (svcvt, unary_convert, cvt_bfloat, mxz)
 DEF_SVE_FUNCTION (svcvtnt, unary_convert_narrowt, cvt_bfloat, mx)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_I8MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_I8MM
 DEF_SVE_FUNCTION (svmmla, mmla, s_integer, none)
 DEF_SVE_FUNCTION (svusmmla, ternary_uintq_intq, s_signed, none)
 DEF_SVE_FUNCTION (svsudot, ternary_intq_uintq_opt_n, s_signed, none)
@@ -339,11 +339,11 @@ DEF_SVE_FUNCTION (svusdot, ternary_uintq_intq_opt_n, 
s_signed, none)
 DEF_SVE_FUNCTION (svusdot_lane, ternary_uintq_intq_lane, s_signed, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F32MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F32MM
 DEF_SVE_FUNCTION (svmmla, mmla, s_float, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_F64MM
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_F64MM
 DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 DEF_SVE_FUNCTION (svtrn1q, binary, all_data, none)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def 
b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
index dd6d1357d51..565393f3081 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sve2.def
@@ -17,7 +17,7 @@
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
-#define REQUIRED_EXTENSIONS AARCH64_FL_SVE2
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_SVE2
 DEF_SVE_FUNCTION (svaba, ternary_opt_n, all_integer, none)
 DEF_SVE_FUNCTION (svabalb, ternary_long_opt_n, hsd_integer, none)
 DEF_SVE_FUNCTION (svabalt, ternary_long_opt_n, hsd_integer, none)
@@ -189,7 +189,9 @@ DEF_SVE_FUNCTION (svwhilewr, compare_ptr, all_data, none)
 DEF_SVE_FUNCTION (svxar, ternary_shift_right_imm, all_integer, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_AES)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_SVE2 \
+| AARCH64_FL_SVE2_AES)
 DEF_SVE_FUNCTION (svaesd, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaese, binary, b_unsigned, none)
 DEF_SVE_FUNCTION (svaesmc, unary, b_unsigned, none)
@@ -198,17 +200,23 @@ DEF_SVE_FUNCTION (svpmullb_pair, binary_opt_n, 
d_unsigned, none)
 DEF_SVE_FUNCTION (svpmullt_pair, binary_opt_n, d_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_BITPERM)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+| AARCH64_FL_SVE2 \
+| AARCH64_FL_SVE2_BITPERM)
 DEF_SVE_FUNCTION (svbdep, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbext, binary_opt_n, all_unsigned, none)
 DEF_SVE_FUNCTION (svbgrp, binary_opt_n, all_unsigned, none)
 #undef REQUIRED_EXTENSIONS
 
-#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE2 | AARCH64_FL_SVE2_SHA3)
+#define REQUIRED_EXTENSIONS (AARCH64_FL_SVE \
+

[PATCH 03/21] aarch64: Use SVE's RDVL instruction

2023-11-17 Thread Richard Sandiford
We didn't previously use SVE's RDVL instruction, since the CNT*
forms are preferred and provide most of the range.  However,
there are some cases that RDVL can handle and CNT* can't,
and using RDVL-like instructions becomes important for SME.

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_rdvl_immediate_p)
(aarch64_output_sve_rdvl): Declare.
* config/aarch64/aarch64.cc (aarch64_sve_cnt_factor_p): New
function, split out from...
(aarch64_sve_cnt_immediate_p): ...here.
(aarch64_sve_rdvl_factor_p): New function.
(aarch64_sve_rdvl_immediate_p): Likewise.
(aarch64_output_sve_rdvl): Likewise.
(aarch64_offset_temporaries): Rewrite the SVE handling to use RDVL
for some cases.
(aarch64_expand_mov_immediate): Handle RDVL immediates.
(aarch64_mov_operand_p): Likewise.
* config/aarch64/constraints.md (Usr): New constraint.
* config/aarch64/aarch64.md (*mov_aarch64): Add an RDVL
alternative.
(*movsi_aarch64, *movdi_aarch64): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/cntb.c: Tweak expected output.
* gcc.target/aarch64/sve/acle/asm/cnth.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/cntd.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfb.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfh.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfw.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/prfd.c: Likewise.
* gcc.target/aarch64/sve/loop_add_4.c: Expect RDVL to be used
to calculate the -17 and 17 factors.
* gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise the 18 factor.
---
 gcc/config/aarch64/aarch64-protos.h   |   2 +
 gcc/config/aarch64/aarch64.cc | 191 --
 gcc/config/aarch64/aarch64.md |   3 +
 gcc/config/aarch64/constraints.md |   6 +
 .../gcc.target/aarch64/sve/acle/asm/cntb.c|  71 +--
 .../gcc.target/aarch64/sve/acle/asm/cntd.c|  12 +-
 .../gcc.target/aarch64/sve/acle/asm/cnth.c|  20 +-
 .../gcc.target/aarch64/sve/acle/asm/cntw.c|  16 +-
 .../gcc.target/aarch64/sve/acle/asm/prfb.c|   6 +-
 .../gcc.target/aarch64/sve/acle/asm/prfd.c|   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfh.c|   4 +-
 .../gcc.target/aarch64/sve/acle/asm/prfw.c|   4 +-
 .../gcc.target/aarch64/sve/loop_add_4.c   |   6 +-
 .../aarch64/sve/pcs/stack_clash_1.c   |   3 +-
 14 files changed, 225 insertions(+), 123 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 82e83402b75..7ebdec2f58c 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -798,6 +798,7 @@ bool aarch64_sve_mode_p (machine_mode);
 HOST_WIDE_INT aarch64_fold_sve_cnt_pat (aarch64_svpattern, unsigned int);
 bool aarch64_sve_cnt_immediate_p (rtx);
 bool aarch64_sve_scalar_inc_dec_immediate_p (rtx);
+bool aarch64_sve_rdvl_immediate_p (rtx);
 bool aarch64_sve_addvl_addpl_immediate_p (rtx);
 bool aarch64_sve_vector_inc_dec_immediate_p (rtx);
 int aarch64_add_offset_temporaries (rtx);
@@ -810,6 +811,7 @@ char *aarch64_output_sve_prefetch (const char *, rtx, const 
char *);
 char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
 char *aarch64_output_sve_cnt_pat_immediate (const char *, const char *, rtx *);
 char *aarch64_output_sve_scalar_inc_dec (rtx);
+char *aarch64_output_sve_rdvl (rtx);
 char *aarch64_output_sve_addvl_addpl (rtx);
 char *aarch64_output_sve_vector_inc_dec (const char *, rtx);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fc1492b43ae..622ab763306 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -5307,6 +5307,18 @@ aarch64_fold_sve_cnt_pat (aarch64_svpattern pattern, 
unsigned int nelts_per_vq)
   return -1;
 }
 
+/* Return true if a single CNT[BHWD] instruction can multiply FACTOR
+   by the number of 128-bit quadwords in an SVE vector.  */
+
+static bool
+aarch64_sve_cnt_factor_p (HOST_WIDE_INT factor)
+{
+  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
+  return (IN_RANGE (factor, 2, 16 * 16)
+ && (factor & 1) == 0
+ && factor <= 16 * (factor & -factor));
+}
+
 /* Return true if we can move VALUE into a register using a single
CNT[BHWD] instruction.  */
 
@@ -5314,11 +5326,7 @@ static bool
 aarch64_sve_cnt_immediate_p (poly_int64 value)
 {
   HOST_WIDE_INT factor = value.coeffs[0];
-  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
-  return (value.coeffs[1] == factor
- && IN_RANGE (factor, 2, 16 * 16)
- && (factor & 1) == 0
- && factor <= 16 * (factor & -factor));
+  return value.coeffs[1] == factor && aarch64_sve_cnt_factor_p (factor);
 }
 
 /* 
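
For readers following the hunk above, here is a minimal standalone sketch of the factor test that the patch splits out into aarch64_sve_cnt_factor_p.  It assumes long long in place of HOST_WIDE_INT and spells out IN_RANGE by hand; the name cnt_factor_p is made up for illustration and is not part of GCC.

```cpp
#include <cassert>

// Illustrative copy of the test in aarch64_sve_cnt_factor_p.
// Assumption: long long stands in for HOST_WIDE_INT, and the
// IN_RANGE macro is written out as two comparisons.
bool cnt_factor_p(long long factor)
{
  // The coefficient must be [1, 16] * {2, 4, 8, 16}.
  return factor >= 2 && factor <= 16 * 16      // overall range
         && (factor & 1) == 0                  // must be even
         // (factor & -factor) isolates the lowest set bit, so this
         // bounds the remaining multiplier by 16.
         && factor <= 16 * (factor & -factor);
}
```

With this sketch, 48 (3 * 16) is accepted, while 34 (17 * 2) is rejected because 17 lies outside [1, 16].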

[PATCH 02/21] aarch64: Add a result_mode helper function

2023-11-17 Thread Richard Sandiford
SME will add more intrinsics whose expansion code requires
the mode of the function return value.  This patch adds an
associated helper routine.

gcc/
* config/aarch64/aarch64-sve-builtins.h
(function_expander::result_mode): New member function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svld234_impl::expand): Use it.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::get_reg_target): Likewise.
---
 gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
 gcc/config/aarch64/aarch64-sve-builtins.cc  | 2 +-
 gcc/config/aarch64/aarch64-sve-builtins.h   | 9 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 9010ecca6da..b84e245eb3e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1506,7 +1506,7 @@ public:
   rtx
   expand (function_expander &e) const override
   {
-machine_mode tuple_mode = TYPE_MODE (TREE_TYPE (e.call_expr));
+machine_mode tuple_mode = e.result_mode ();
 insn_code icode = convert_optab_handler (vec_mask_load_lanes_optab,
 tuple_mode, e.vector_mode (0));
 return e.use_contiguous_load_insn (icode);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 75a51565ed2..8b7b885a8f4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2802,7 +2802,7 @@ function_expander::get_fallback_value (machine_mode mode, 
unsigned int nops,
 rtx
 function_expander::get_reg_target ()
 {
-  machine_mode target_mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
+  machine_mode target_mode = result_mode ();
   if (!possible_target || GET_MODE (possible_target) != target_mode)
 possible_target = gen_reg_rtx (target_mode);
   return possible_target;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 99bfd906a07..7cf8f45b3d5 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -529,6 +529,8 @@ public:
   insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
   machine_mode = E_VOIDmode);
 
+  machine_mode result_mode () const;
+
   bool overlaps_input_p (rtx);
 
   rtx convert_to_pmode (rtx);
@@ -878,6 +880,13 @@ function_base::call_properties (const function_instance 
) const
   return flags;
 }
 
+/* Return the mode of the result of a call.  */
+inline machine_mode
+function_expander::result_mode () const
+{
+  return TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)));
+}
+
 }
 
 #endif
-- 
2.25.1



[PATCH 01/21] aarch64: Generalise require_immediate_lane_index

2023-11-17 Thread Richard Sandiford
require_immediate_lane_index previously hard-coded the assumption
that the group size is determined by the argument immediately before
the index.  However, for SME, there are cases where it should be
determined by an earlier argument instead.

gcc/
* config/aarch64/aarch64-sve-builtins.h:
(function_checker::require_immediate_lane_index): Add an argument
for the index of the indexed vector argument.
* config/aarch64/aarch64-sve-builtins.cc
(function_checker::require_immediate_lane_index): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_bfloat_lane_base::check): Update accordingly.
(ternary_qq_lane_base::check): Likewise.
(binary_lane_def::check): Likewise.
(binary_long_lane_def::check): Likewise.
(ternary_lane_def::check): Likewise.
(ternary_lane_rotate_def::check): Likewise.
(ternary_long_lane_def::check): Likewise.
(ternary_qq_lane_rotate_def::check): Likewise.
---
 .../aarch64/aarch64-sve-builtins-shapes.cc | 16 
 gcc/config/aarch64/aarch64-sve-builtins.cc | 18 --
 gcc/config/aarch64/aarch64-sve-builtins.h  |  3 ++-
 3 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
index af816c4c9e7..1646afc7a0d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-shapes.cc
@@ -941,7 +941,7 @@ struct ternary_bfloat_lane_base
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (3, N);
+return c.require_immediate_lane_index (3, 2, N);
   }
 };
 
@@ -956,7 +956,7 @@ struct ternary_qq_lane_base
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (3, 4);
+return c.require_immediate_lane_index (3, 0);
   }
 };
 
@@ -1123,7 +1123,7 @@ struct binary_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (2);
+return c.require_immediate_lane_index (2, 1);
   }
 };
 SHAPE (binary_lane)
@@ -1162,7 +1162,7 @@ struct binary_long_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (2);
+return c.require_immediate_lane_index (2, 1);
   }
 };
 SHAPE (binary_long_lane)
@@ -2817,7 +2817,7 @@ struct ternary_lane_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (3);
+return c.require_immediate_lane_index (3, 2);
   }
 };
 SHAPE (ternary_lane)
@@ -2845,7 +2845,7 @@ struct ternary_lane_rotate_def : public overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-return (c.require_immediate_lane_index (3, 2)
+return (c.require_immediate_lane_index (3, 2, 2)
&& c.require_immediate_one_of (4, 0, 90, 180, 270));
   }
 };
@@ -2868,7 +2868,7 @@ struct ternary_long_lane_def
   bool
   check (function_checker &c) const override
   {
-return c.require_immediate_lane_index (3);
+return c.require_immediate_lane_index (3, 2);
   }
 };
 SHAPE (ternary_long_lane)
@@ -2965,7 +2965,7 @@ struct ternary_qq_lane_rotate_def : public 
overloaded_base<0>
   bool
   check (function_checker &c) const override
   {
-return (c.require_immediate_lane_index (3, 4)
+return (c.require_immediate_lane_index (3, 0)
&& c.require_immediate_one_of (4, 0, 90, 180, 270));
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 161a14edde7..75a51565ed2 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -2440,20 +2440,26 @@ function_checker::require_immediate_enum (unsigned int 
rel_argno, tree type)
   return false;
 }
 
-/* Check that argument REL_ARGNO is suitable for indexing argument
-   REL_ARGNO - 1, in groups of GROUP_SIZE elements.  REL_ARGNO counts
-   from the end of the predication arguments.  */
+/* The intrinsic conceptually divides vector argument REL_VEC_ARGNO into
+   groups of GROUP_SIZE elements.  Return true if argument REL_ARGNO is
+   a suitable constant index for selecting one of these groups.  The
+   selection happens within a 128-bit quadword, rather than the whole vector.
+
+   REL_ARGNO and REL_VEC_ARGNO count from the end of the predication
+   arguments.  */
 bool
 function_checker::require_immediate_lane_index (unsigned int rel_argno,
+   unsigned int rel_vec_argno,
unsigned int group_size)
 {
   unsigned int argno = m_base_arg + rel_argno;
   if (!argument_exists_p (argno))
 return true;
 
-  /* Get the type of the previous argument.  tree_argument_type wants a
- 1-based 
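
The new function comment in the hunk above says the index selects a group within a 128-bit quadword.  A small sketch of that arithmetic follows, assuming the valid indices are 0 up to (128 / (element bits * group size)) - 1.  The helper names are hypothetical and not taken from GCC.

```cpp
#include <cassert>

// Hypothetical helper: how many groups of GROUP_SIZE elements fit
// in one 128-bit quadword, for elements of ELEMENT_BITS bits.
unsigned lanes_per_quadword(unsigned element_bits, unsigned group_size)
{
  return 128 / (element_bits * group_size);
}

// Hypothetical helper: is INDEX a valid constant lane index under
// the assumption stated above?
bool lane_index_ok(unsigned index, unsigned element_bits,
                   unsigned group_size)
{
  return index < lanes_per_quadword(element_bits, group_size);
}
```

Under these assumptions, groups of four 8-bit elements and single 32-bit elements give the same limit of four lanes.  That would be consistent with the ternary_qq_lane_base change from (3, 4) to (3, 0), if argument 0 has elements four times as wide and the group size defaults to 1.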

[PATCH 00/21] aarch64: Add support for SME

2023-11-17 Thread Richard Sandiford
This series of patches adds support for SME.  A follow-on series
will add SME2 on top.

All of the detail is in the individual patch summaries.

The series can't go in yet, because it depends on:

  https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629713.html

and some reviewed-but-unpushed patches that follow on from it.

The series also depends on some patches that I posted last year and were
approved (thanks!), but that I didn't commit at the time because the SME
support didn't go in then.  I'll repost those when I push them.

Tested on aarch64-linux-gnu.  I tested on top of the late-combine pass,
so that might also be an unintentional dependency.

Richard


Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Palmer Dabbelt

On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote:

About 90% of the theadvector extension can reuse the current RVV 1.0 instruction
patterns by just changing the ASM. For example:

@@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
 (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] 
VMULH)
  (match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
   "TARGET_VECTOR"
-  "vmulh.vx\t%0,%3,%z4%p1"
+  "%^vmulh.vx\t%0,%3,%z4%p1"
   [(set_attr "type" "vimul")
(set_attr "mode" "")])
+  if (letter == '^')
+{
+  if (TARGET_XTHEADVECTOR)
+   fputs ("th.", file);
+  return;
+}

For almost all patterns, you simply prepend "th." to the ASM mnemonic,
e.g. change "vmulh.vv" -> "th.vmulh.vv".

Almost all theadvector instructions are not new features; they are the same as
RVV 1.0.  Why invent an ISA that doesn't include any features that RVV 1.0
doesn't already provide?

I am not explicitly objecting to this patch, but I would like to know the reason.


There's some more in the later threads, but with the top posting it kind 
of got lost so I'm just replying here.


This really isn't T-Head's fault: we announced V-0.7 as a stable draft 
that was being implemented, and then T-Head went and implemented it.  
Most of that history has been scrubbed by RVI, but you can still find 
some stuff like this old talk on YouTube.


In general we've just figured out a way to make things work when HW 
vendors end up in a grey area in RISC-V land.  That obviously results in 
a bunch of pain for the SW people, but this stuff is only useful if we 
can run on real HW and that always involves some amount of pain.  
Hopefully we can get to a point where we make fewer problems for 
ourselves, but we've got a long history to dig out from and there's 
going to be a lot more of this in the future.


So I don't like this XTHeadV stuff, but I think we're best to take it: 
these guys tried to do the right thing and got thrown under the bus by 
RVI, we should help them.  This is almost certainly going to be a lot 
more pain than we're used to, just given the size of the extensions in 
question, but I still think it's the right way to go.


The other option is to essentially just tell them to fork the ISA, which 
isn't good for anyone.



Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long as 
all other RISC-V maintainers agree.


I agree this is gcc-15 material: there's a lot of subtle differences in 
behavior between 0.7 and 1.0, even when the mnemonics are the same.  
We're already pretty buried in testing for 14, so trying to pick up 
another target is going to be a huge headache (particularly one that's a 
bit special).







juzhe.zh...@rivai.ai


Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Daniel Krügler
On Fri, 17 Nov 2023 at 16:32, Jonathan Wakely wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> GCC generates better code for add_sat if we use:
>
> unsigned z = x + y;
> z |= -(z < x);
> return z;
>
> If the compiler can't be improved we should consider using that instead
> of __builtin_add_overflow.
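
A rough standalone way to compare the two forms mentioned in the quoted text, for unsigned int only.  The function names are made up for illustration, and this is not claimed to match the code in sat_arith.h.

```cpp
#include <cassert>
#include <climits>

// Branchless form from the quoted text.
unsigned add_sat_branchless(unsigned x, unsigned y)
{
  unsigned z = x + y;   // wraps modulo 2^N on overflow
  z |= -(z < x);        // (z < x) is 1 exactly when the add wrapped;
                        // -1 converted to unsigned is all ones
  return z;
}

// Form based on the GCC/Clang overflow builtin.
unsigned add_sat_builtin(unsigned x, unsigned y)
{
  unsigned z;
  if (__builtin_add_overflow(x, y, &z))
    return UINT_MAX;    // saturate at the maximum value
  return z;
}
```

Both functions should return the same value for every input, so any difference is in the generated code rather than in the results.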
>
>
> -- >8 --
>
>
> This was approved for C++26 last week at the WG21 meeting in Kona.
>
> libstdc++-v3/ChangeLog:
>
> * include/Makefile.am: Add new header.
> * include/Makefile.in: Regenerate.
> * include/bits/version.def (saturation_arithmetic): Define.
> * include/bits/version.h: Regenerate.
> * include/std/numeric: Include new header.
> * include/bits/sat_arith.h: New file.
> * testsuite/26_numerics/saturation/add.cc: New test.
> * testsuite/26_numerics/saturation/cast.cc: New test.
> * testsuite/26_numerics/saturation/div.cc: New test.
> * testsuite/26_numerics/saturation/mul.cc: New test.
> * testsuite/26_numerics/saturation/sub.cc: New test.
> * testsuite/26_numerics/saturation/version.cc: New test.
> ---
>  libstdc++-v3/include/Makefile.am  |   1 +
>  libstdc++-v3/include/Makefile.in  |   1 +
>  libstdc++-v3/include/bits/sat_arith.h | 148 ++
>  libstdc++-v3/include/bits/version.def |   8 +
>  libstdc++-v3/include/bits/version.h   |  11 ++
>  libstdc++-v3/include/std/numeric  |   5 +
>  .../testsuite/26_numerics/saturation/add.cc   |  73 +
>  .../testsuite/26_numerics/saturation/cast.cc  |  24 +++
>  .../testsuite/26_numerics/saturation/div.cc   |  45 ++
>  .../testsuite/26_numerics/saturation/mul.cc   |  34 
>  .../testsuite/26_numerics/saturation/sub.cc   |  86 ++
>  .../26_numerics/saturation/version.cc |  19 +++
>  12 files changed, 455 insertions(+)
>  create mode 100644 libstdc++-v3/include/bits/sat_arith.h
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/add.cc
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/cast.cc
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/div.cc
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/mul.cc
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/sub.cc
>  create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/version.cc
>
> diff --git a/libstdc++-v3/include/Makefile.am 
> b/libstdc++-v3/include/Makefile.am
> index dab9f720cbb..17d9d9cec31 100644
> --- a/libstdc++-v3/include/Makefile.am
> +++ b/libstdc++-v3/include/Makefile.am
> @@ -142,6 +142,7 @@ bits_freestanding = \
> ${bits_srcdir}/ranges_uninitialized.h \
> ${bits_srcdir}/ranges_util.h \
> ${bits_srcdir}/refwrap.h \
> +   ${bits_srcdir}/sat_arith.h \
> ${bits_srcdir}/stl_algo.h \
> ${bits_srcdir}/stl_algobase.h \
> ${bits_srcdir}/stl_construct.h \
> diff --git a/libstdc++-v3/include/Makefile.in 
> b/libstdc++-v3/include/Makefile.in
> index 4f7ab2dfbab..f038af709cc 100644
> --- a/libstdc++-v3/include/Makefile.in
> +++ b/libstdc++-v3/include/Makefile.in
> @@ -497,6 +497,7 @@ bits_freestanding = \
> ${bits_srcdir}/ranges_uninitialized.h \
> ${bits_srcdir}/ranges_util.h \
> ${bits_srcdir}/refwrap.h \
> +   ${bits_srcdir}/sat_arith.h \
> ${bits_srcdir}/stl_algo.h \
> ${bits_srcdir}/stl_algobase.h \
> ${bits_srcdir}/stl_construct.h \
> diff --git a/libstdc++-v3/include/bits/sat_arith.h 
> b/libstdc++-v3/include/bits/sat_arith.h
> new file mode 100644
> index 000..71793467984
> --- /dev/null
> +++ b/libstdc++-v3/include/bits/sat_arith.h
> @@ -0,0 +1,148 @@
> +// Saturation arithmetic -*- C++ -*-
> +
> +// Copyright The GNU Toolchain Authors.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// Under Section 7 of GPL version 3, you are granted additional
> +// permissions described in the GCC Runtime Library Exception, version
> +// 3.1, as published by the Free Software Foundation.
> +
> +// You should have received a copy of the GNU General Public License and
> +// a copy of the GCC Runtime Library Exception along with this program;
> +// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +// .
> +
> +/** @file include/bits/sat_arith.h
> + *  This is an internal header file, included by other library headers.
> + *  Do not 

Re: Add 'libgomp.c++/static-local-variable-1.C'

2023-11-17 Thread Thomas Schwinge
Hi!

On 2023-11-17T16:24:46+0100, I wrote:
> [...] attached "Add 'libgomp.c++/static-local-variable-1.C'" [...]

Now, working on translating this into an OpenMP 'target' variant.  My
goal here is not necessarily to make this work now, but rather to figure
out whether '-fthreadsafe-statics' actually does or doesn't need to be
supported in offloading compilation, whether '__cxa_guard_acquire' is in
fact unreachable there.  (Currently the latter symbol isn't available in
offloading compilation; as you know I'm currently working on GPU
libstdc++ library support.)  However, GCC offloading compilation
currently fails differently, as follows:

r.cc:16:12: error: variable ‘_ZGVZL1fvE1s’ has been referenced in offloaded 
code but hasn’t been marked to be included in the offloaded code
   16 |   static S s;
  |^
lto1: fatal error: errors during merging of translation units
compilation terminated.
nvptx mkoffload: fatal error: 
build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
[...]

... with:

$ c++filt _ZGVZL1fvE1s
guard variable for f()::s
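
As background on that guard variable: with '-fthreadsafe-statics', the initialization of a function-local static is wrapped in a check of a guard, roughly as sketched below.  This is only an approximation.  A std::atomic flag plus a std::mutex stand in for the real guard object and the '__cxa_guard_acquire' / '__cxa_guard_release' calls, and all names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

static int constructions = 0;

struct S
{
  S() { ++constructions; }
};

// Stand-ins for what the compiler would emit for "static S s;".
static std::atomic<bool> guard_done{false};  // plays the role of the guard variable
static std::mutex guard_lock;                // plays the role of acquire/release
alignas(S) static unsigned char storage[sizeof(S)];
static S *s_ptr = nullptr;

S &f_lowered()
{
  // Fast path: already initialized, no locking needed.
  if (!guard_done.load(std::memory_order_acquire))
    {
      // Slow path: roughly where __cxa_guard_acquire would be called.
      std::lock_guard<std::mutex> hold(guard_lock);
      if (!guard_done.load(std::memory_order_relaxed))
        {
          s_ptr = ::new (static_cast<void *>(storage)) S;
          // Roughly where __cxa_guard_release would be called.
          guard_done.store(true, std::memory_order_release);
        }
    }
  return *s_ptr;
}
```

The slow path is the part that needs runtime support, which is why the question of whether '__cxa_guard_acquire' is reachable in offloaded code matters.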

Now I wonder how is that supposed to behave; is this valid OpenMP
'target' code at all?  Can you please help me find my way through the
OpenMP specification regarding this?

In OpenMP 5.2, 5.1.1 "Variables Referenced in a Construct", we have:

  - Variables with static storage duration that are declared in a scope 
inside the construct are shared.

Does this apply to a 'declare target'ed function 'f'?  (I was thinking:
"dynamic extent" of the scope of the 'target' construct?)  Ah, probably
that's 5.1.2 "Variables Referenced in a Region but not in a Construct":

  - Variables with static storage duration that are declared in called 
routines in the region are shared.

In 7.8 "Declare Target Directives", we have:

If a variable with static storage duration is declared in a device routine 
then the named variable is
treated as if it had appeared in an 'enter' clause on a declare target 
directive.

Similarly, in 13.8 "'target' Construct":

If a variable with static storage duration is declared in a 'target' 
construct that does not specify a
'device' clause in which the 'ancestor' _device-modifier_ appears then the 
named variable is
treated as if it had appeared in a 'enter' clause on a declare target 
directive.

Per those occurrences, and per GCC not raising an error when encountering
a static local variable, I assume this is intended to work "as expected"?

On the other hand, NVHPC nvc++ 23.1-0 fails, too:

NVC++-S-1062-Support procedure called within a compute region - 
__cxa_guard_acquire (r.cc: 16)
[local to r_cc]::f():
 16, Accelerator restriction: unsupported call to support routine 
'__cxa_guard_acquire'
NVC++/x86-64 Linux 23.1-0: compilation completed with severe errors

Hmm...


Regards
 Thomas


#pragma omp declare target

struct S
{
  S()
  {
  }

  ~S()
  {
  }
};

static void f()
{
  static S s;
}

#pragma omp end declare target

int main()
{
#pragma omp target
  {
f();
  }
}


Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jeff Law




On 11/17/23 04:39, juzhe.zh...@rivai.ai wrote:

About 90% of the theadvector extension can reuse the current RVV 1.0 instruction
patterns by just changing the ASM. For example:

@@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
 (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] 
VMULH)
  (match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
"TARGET_VECTOR"
-  "vmulh.vx\t%0,%3,%z4%p1"
+  "%^vmulh.vx\t%0,%3,%z4%p1"
[(set_attr "type" "vimul")
 (set_attr "mode" "")])

+  if (letter == '^')
+{
+  if (TARGET_XTHEADVECTOR)
+   fputs ("th.", file);
+  return;
+}
I assume this hunk is meant for riscv_output_operand in riscv.cc.  We 
may also need to add '^' to the punct_valid_p hook.  But yes, this is 
the preferred way to go when all we need to do is prefix the instruction 
with "th.".
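
To illustrate the idea outside of GCC, here is a toy sketch of a "%^" punctuation code that expands to "th." only when a flag is set.  The function expand_prefix and its flag parameter are hypothetical; the real handling would live in the operand printing code, and other '%' codes are deliberately left untouched here.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for the "%^" handling: copy TMPL to the output,
// replacing each "%^" with "th." if XTHEADVECTOR is true and with
// nothing otherwise.  All other characters are copied unchanged.
std::string expand_prefix(const std::string &tmpl, bool xtheadvector)
{
  std::string out;
  for (std::string::size_type i = 0; i < tmpl.size(); ++i)
    {
      if (tmpl[i] == '%' && i + 1 < tmpl.size() && tmpl[i + 1] == '^')
        {
          if (xtheadvector)
            out += "th.";
          ++i;  // skip the '^'
        }
      else
        out += tmpl[i];
    }
  return out;
}
```

The point of the design is that a single template string serves both targets, so existing patterns only need the "%^" prefix added.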





Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as 
long as all other RISC-V maintainers agree.
I *think* it's a gcc-15 issue.  Philipp T. and I briefly spoke about 
this at the RVI summit a couple weeks back and he indicated the thead 
vector work was targeting gcc-15.


Jeff


Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread David Edelsohn
On Fri, Nov 17, 2023 at 10:17 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović  wrote:
> >
> >>
> >> David Edelsohn  writes:
> >>
> >> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović 
> wrote:
> >> >
> >> > [snip]
> >> >> Sure, but my patch does insert --disable-shared:
> >> >>
> >> >> --8<---cut here---start->8---
> >> >> host_modules= { module= gettext; bootstrap=true; no_install=true;
> >> >> module_srcdir= "gettext/gettext-runtime";
> >> >> // We always build gettext with pic, because some
> >> packages
> >> >> (e.g. gdbserver)
> >> >> // need it in some configuratons, which is determined
> >> via
> >> >> nontrivial tests.
> >> >> // Always enabling pic seems to make sense for
> something
> >> >> tied to
> >> >> // user-facing output.
> >> >> extra_configure_flags='--disable-shared
> --disable-java
> >> >> --disable-csharp --with-pic';
> >> >> lib_path=intl/.libs; };
> >> >> --8<---cut here---end--->8---
> >> >>
> >> >> ... and it is applied:
> >> >>
> >> >> --8<---cut here---start->8---
> >> >> -bash-5.1$ ./config.status --config
> >> >> --srcdir=../../gcc/gettext/gettext-runtime
> --cache-file=./config.cache
> >> >>   --disable-werror --with-gmp=/opt/cfarm
> >> >>   --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
> >> >>   --with-included-gettext --program-transform-name=s,y,y,
> >> >>   --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
> >> >>   --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
> >> >>   --disable-intermodule --enable-checking=yes,types,extra
> >> >>   --disable-coverage --enable-languages=c,c++
> >> >>   --disable-build-format-warnings --disable-shared --disable-java
> >> >>   --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
> >> >>   host_alias=powerpc-ibm-aix7.3.1.0
> target_alias=powerpc-ibm-aix7.3.1.0
> >> >>   CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
> >> >>   -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
> >> >> --8<---cut here---end--->8---
> >> >>
> >> >> I'm unsure how to tell what the produced binaries are w.r.t static or
> >> >> shared, but I only see .o files inside intl/.libs/libintl.a, while I
> see
> >> >> a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)
> >> >>
> >> >
> >> > An AIX shared library created by libtool will look like
> >> > libfoo.a[libfoo.so.N], where N is the package major version number.
> >> > Normally with one file.
> >>
> >> > An AIX static library will look like libfoo.a[a.o, b.o, c.o]
> >> > with multiple object files.
> >> >
> >> > An AIX archive can contain a combination of shared objects and
> >> > normal object files.
> >> >
> >> > AIX normally uses the convention shr.o or shr_64.o for the name
> >> > of the shared object file.  Hint, hint, an AIX archive can contain
> >> > both 32 bit and 64 bit object files or shared objects.
> >> >
> >> > I don't know why the gettext build system would create
> >> > /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8)
> >> > if --disable-shared was requested.  That clearly is using the
> >> > naming of a libtool AIX shared object and failing due to
> >> > the missing shared object.  Although in this case, the problem
> >> > seems to be the shared library load path.  AIX uses LIBPATH,
> >> > not LD_LIBRARY_PATH.
> >>
> >> It doesn't create libintl.a with a libintl.so.8 inside of it.  The
> >> libintl.a contains a bunch of objects, as I'd expect of a static
> >> library:
> >>
> >> --8<---cut here---start->8---
> >> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a  | grep libintl
> >> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a
> >> bindtextdom.o
> >> dcgettext.o
> >> ...
> >> --8<---cut here---end--->8---
> >>
> >>
> >> > Also, for me, the out of tree path was
> >> >
> >> > gettext/gettext-runtime/intl/.libs
> >> >
> >> > Is your search path missing a level?
> >>
> >> No, the above is generated by the GCC build system and builds
> >> gettext-runtime directly (per Bruno's recommendation a while ago) as it
> >> is replacing intl/ of similar functionality.
> >>
> >> I'm currently building GCC with libintl with the threads hack you
> >> mentioned applied (as I got undefined references to the pthread
> >> functions you discovered).  I suspect that, bar this issue (which, IIUC,
> >> Bruno will fix in a new release?) the patch above will fix the issues
> >> you've encountered on AIX (note that if you want to use gettext in-tree,
> >> you'd still have to fetch gettext into the tree).
> >>
> >> Maybe we should provide a download-prerequisite-y script that skips
> >> everything but GNU gettext, to retain same behavior?
> >>
> >> Have a lovely day.

[PATCH 2/2] libstdc++: Ensure valid UTF-8 in std::vprint_unicode

2023-11-17 Thread Jonathan Wakely
This is a naive implementation of the UTF-8 validation algorithm, which
could definitely be optimized. But it's faster than using
std::codecvt_utf8 and checking the result of that, which is the only
existing code we have to do it in the library.

As the TODO suggests, we could do the UTF-8 to UTF-16 conversion at the
same time. But that is only needed for Windows and as I said in the 1/2
email, the output for Windows seems to be broken currently anyway and I
can't test it properly.
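
For anyone who wants to experiment with the approach, here is a reduced sketch of the same bound-tracking state machine that only reports validity.  Unlike the patch, it does not rewrite the string or insert U+FFFD, and the name is_valid_utf8 is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Validity check only; a simplified illustration, not the library code.
bool is_valid_utf8(const std::string &s)
{
  unsigned needed = 0;                 // continuation bytes still expected
  unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the next one

  for (unsigned char byte : s)
    {
      if (needed == 0)
        {
          if (byte <= 0x7F)            // ASCII
            continue;
          else if (byte < 0xC2)        // stray continuation or overlong lead
            return false;
          else if (byte <= 0xDF)       // two-byte sequence
            needed = 1;
          else if (byte <= 0xEF)       // three-byte sequence
            {
              if (byte == 0xE0)
                lo = 0xA0;             // reject overlong encodings
              else if (byte == 0xED)
                hi = 0x9F;             // reject surrogates
              needed = 2;
            }
          else if (byte <= 0xF4)       // four-byte sequence
            {
              if (byte == 0xF0)
                lo = 0x90;             // reject overlong encodings
              else if (byte == 0xF4)
                hi = 0x8F;             // reject values above U+10FFFF
              needed = 3;
            }
          else
            return false;
        }
      else
        {
          if (byte < lo || byte > hi)
            return false;
          lo = 0x80;                   // only the first continuation byte
          hi = 0xBF;                   // has a restricted range
          --needed;
        }
    }
  return needed == 0;                  // false if the input ends mid-sequence
}
```

The narrowed bounds after 0xE0, 0xED, 0xF0 and 0xF4 are what reject overlong forms, surrogates and out-of-range code points without decoding the code point itself.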

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/locale_conv.h (__to_valid_utf8): New function.
* include/std/ostream (vprint_unicode): Use it.
* include/std/print (vprint_unicode): Use it.
---
 libstdc++-v3/include/bits/locale_conv.h | 104 
 libstdc++-v3/include/std/ostream|  74 +++--
 libstdc++-v3/include/std/print  |   8 +-
 3 files changed, 160 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_conv.h 
b/libstdc++-v3/include/bits/locale_conv.h
index 284142a360a..f6ade1d0395 100644
--- a/libstdc++-v3/include/bits/locale_conv.h
+++ b/libstdc++-v3/include/bits/locale_conv.h
@@ -624,6 +624,110 @@ _GLIBCXX_END_NAMESPACE_CXX11
   bool _M_always_noconv;
 };
 
+#if __cplusplus >= 202002L
+  template
+  bool
+  __to_valid_utf8(string& __s)
+  {
+// TODO if _CharT is wchar_t then transcode at the same time.
+
+unsigned __seen = 0, __needed = 0;
+unsigned char __lo_bound = 0x80, __hi_bound = 0xBF;
+size_t __errors = 0;
+
+auto __q = __s.data(), __eoq = __q + __s.size();
+while (__q != __eoq)
+  {
+   unsigned char __byte = *__q;
+   if (__needed == 0)
+ {
+   if (__byte <= 0x7F)  // 0x00 to 0x7F
+ {
+   while (++__q != __eoq && (unsigned char)*__q <= 0x7F)
+ { } // Fast forward to the next non-ASCII character.
+   continue;
+ }
+   else if (__byte < 0xC2)
+ {
+   *__q = 0xFF;
+   ++__errors;
+ }
+   else if (__byte <= 0xDF) // 0xC2 to 0xDF
+ {
+   __needed = 1;
+ }
+   else if (__byte <= 0xEF) // 0xE0 to 0xEF
+ {
+   if (__byte == 0xE0)
+ __lo_bound = 0xA0;
+   else if (__byte == 0xED)
+ __hi_bound = 0x9F;
+
+   __needed = 2;
+ }
+   else if (__byte <= 0xF4) // 0xF0 to 0xF4
+ {
+   if (__byte == 0xF0)
+ __lo_bound = 0x90;
+   else if (__byte == 0xF4)
+ __hi_bound = 0x8F;
+
+   __needed = 3;
+ }
+   else
+ {
+   *__q = 0xFF;
+   ++__errors;
+ }
+ }
+   else
+ {
+   if (__byte < __lo_bound || __byte > __hi_bound)
+ {
+   *(__q - __seen - 1) = 0xFF;
+   __builtin_memset(__q - __seen, 0xFE, __seen);
+   ++__errors;
+   __needed = __seen = 0;
+   __lo_bound = 0x80;
+   __hi_bound = 0xBF;
+   continue; // Reprocess the current character.
+ }
+
+   __lo_bound = 0x80;
+   __hi_bound = 0xBF;
+   ++__seen;
+   if (__seen == __needed)
+ __needed = __seen = 0;
+ }
+   __q++;
+  }
+
+if (__needed)
+  {
+   // The string ends with an incomplete multibyte sequence.
+   if (__seen)
+ __s.resize(__s.size() - __seen);
+   __s.back() = 0xFF;
+   ++__errors;
+  }
+
+if (__errors == 0)
+  return true;
+
+string __s2;
+__s2.reserve(__s.size() + __errors * 2);
+for (unsigned char __byte : __s)
+  {
+   if (__byte == 0xFF)
+ __s2 += "\uFFFD";
+   else if (__byte != 0xFE)
+ __s2 += (char)__byte;
+  }
+__s = std::move(__s2);
+return false;
+  }
+#endif // C++20
+
   /// @} group locales
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index e81c39a7c80..760aaa206da 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -917,42 +917,68 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline void
   vprint_unicode(ostream& __os, string_view __fmt, format_args __args)
   {
-// TODO: diagnose invalid UTF-8 code units
-#ifdef _WIN32
-int __fd_for_console(std::streambuf*);
-void __write_utf16_to_console(int, string);
-
-// If stream refers to a terminal convert to UTF-16 and use WriteConsoleW.
-if (int __fd = __fd_for_console(__os.rdbuf()); __fd >= 0)
+ostream::sentry __cerb(__os);
+if (__cerb)
   {
-   ostream::sentry __cerb(__os);
-   if (__cerb)
+   string __out = std::vformat(__fmt, __args);
+   std::__to_valid_utf8(__out);
+
+#ifdef _WIN32
+   

[PATCH 1/2] libstdc++: Implement C++23 header [PR107760]

2023-11-17 Thread Jonathan Wakely
There's a TODO here about checking for invalid UTF-8, which is done by
the next patch.

I don't know if the Windows code actually works. I tried to test it with
mingw and Wine, but I got garbled text. But I'm not sure if that's my
code here, or the conversion to UTF-16, or how I'm testing, or just that
Wine in a Linux terminal doesn't properly emulate the Windows console, or
something else.

This needs tests, so I need to write them before pushing, but I still
plan to get that done for GCC 14.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/107760
* config/abi/pre/gnu.ver: Export new symbols.
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_lib_print): Define.
* include/bits/version.h: Regenerate.
* include/std/ostream (vprint_nonunicode, vprint_unicode)
(print, println): New functions.
* include/std/print: New file.
* src/c++20/Makefile.am: Add new source file.
* src/c++20/Makefile.in: Regenerate.
* src/c++98/globals_io.cc [_WIN32] (__fd_for_console): New
function.
* src/c++20/print.cc: New file.
---
 libstdc++-v3/config/abi/pre/gnu.ver   |   4 +
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/version.def |   9 ++
 libstdc++-v3/include/bits/version.h   |  29 --
 libstdc++-v3/include/std/ostream  | 102 
 libstdc++-v3/include/std/print| 128 ++
 libstdc++-v3/src/c++20/Makefile.am|   2 +-
 libstdc++-v3/src/c++20/Makefile.in|   4 +-
 libstdc++-v3/src/c++20/print.cc   |  35 +++
 libstdc++-v3/src/c++98/globals_io.cc  |  23 +
 11 files changed, 326 insertions(+), 12 deletions(-)
 create mode 100644 libstdc++-v3/include/std/print
 create mode 100644 libstdc++-v3/src/c++20/print.cc

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 15b50d51251..c7200929e34 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2514,6 +2514,10 @@ GLIBCXX_3.4.31 {
 
_ZNKSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
 
_ZNKSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEcvbEv;
 
+# These are only defined for *-*-mingw*
+_ZSt16__fd_for_consolePSt15basic_streambufIcSt11char_traitsIcEE;
+
_ZSt24__write_utf16_to_consoleiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;
+
 } GLIBCXX_3.4.30;
 
 GLIBCXX_3.4.32 {
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 17d9d9cec31..368b92eafbc 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -85,6 +85,7 @@ std_headers = \
${std_srcdir}/memory_resource \
${std_srcdir}/mutex \
${std_srcdir}/ostream \
+   ${std_srcdir}/print \
${std_srcdir}/queue \
${std_srcdir}/random \
${std_srcdir}/regex \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index f038af709cc..a31588c0100 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -441,6 +441,7 @@ std_freestanding = \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/memory_resource \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/mutex \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/ostream \
+@GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/print \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/queue \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/random \
 @GLIBCXX_HOSTED_TRUE@  ${std_srcdir}/regex \
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 15bd502f52c..8b5cace3775 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1578,6 +1578,15 @@ ftms = {
   };
 };
 
+ftms = {
+  name = print;
+  values = {
+v = 202211;
+cxxmin = 23;
+hosted = yes;
+  };
+};
+
 ftms = {
   name = spanstream;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 9563b6cd2f7..f197408e60f 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1923,6 +1923,17 @@
 #undef __glibcxx_want_out_ptr
 
 // from version.def line 1582
+#if !defined(__cpp_lib_print)
+# if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED
+#  define __glibcxx_print 202211L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_print)
+#   define __cpp_lib_print 202211L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_print) && defined(__glibcxx_want_print) */
+#undef __glibcxx_want_print
+
+// from version.def line 1591
 #if !defined(__cpp_lib_spanstream)
 # if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED && (__glibcxx_span)
 #  define __glibcxx_spanstream 202106L
diff --git a/libstdc++-v3/include/std/ostream 

[PATCH] libstdc++: Add fast path for std::format("{}", x) [PR110801]

2023-11-17 Thread Jonathan Wakely
I'll probably push this before stage 1 closes.

I might move the new lambda out to a struct at namespace scope first
though.
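
As a rough illustration of the direct-write idea behind `_M_get_pointer` and `_M_end_pointer`, here is a simplified sketch. It is not the libstdc++ code: `string_sink`, `get_pointer`, `end_pointer` and `format_int_direct` are hypothetical names, and the sink is assumed to wrap a `std::string` with no intermediate buffer.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>

// Hypothetical, heavily simplified sink that appends to a std::string.
class string_sink
{
  std::string& seq_;
  std::size_t committed_; // Size of seq_ before the pending direct write.

public:
  explicit string_sink(std::string& s) : seq_(s), committed_(s.size()) { }

  // Make room for up to n characters and return a pointer to write into.
  char* get_pointer(std::size_t n)
  {
    committed_ = seq_.size();
    seq_.resize(committed_ + n);
    return seq_.data() + committed_;
  }

  // Keep the first n characters that were written and drop the unused tail.
  void end_pointer(std::size_t n)
  { seq_.resize(committed_ + n); }
};

// Formats `value` straight into the string, with no temporary buffer.
std::string format_int_direct(std::string prefix, int value)
{
  string_sink sink(prefix);
  constexpr std::size_t max_digits = 16; // Ample for a 32-bit or 64-bit int.
  char* p = sink.get_pointer(max_digits);
  auto res = std::to_chars(p, p + max_digits, value);
  sink.end_pointer(static_cast<std::size_t>(res.ptr - p));
  return prefix;
}
```

The point of the two-step protocol is that the caller can over-reserve cheaply and then report how much was really used, which is the contract the patch documents for its own functions.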

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/110801
* include/std/format (_Sink_iter::_M_get_pointer)
(_Sink_iter::_M_end_pointer): New functions.
(_Sink::_M_get_pointer, _Sink::_M_end_pointer): New virtual
functions.
(_Seq_sink::_M_get_pointer, _Seq_sink::_M_end_pointer): New
functions.
(_Iter_sink::_M_get_pointer): Likewise.
(__do_vformat_to): Use new functions to optimize "{}" case.
---
 libstdc++-v3/include/std/format | 155 +++-
 1 file changed, 154 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 8ec1c8a0b9a..3a9c64e4ab9 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2442,6 +2442,10 @@ namespace __format
   iter_difference_t<_Out> size;
 };
 
+_GLIBCXX_BEGIN_NAMESPACE_CONTAINER
+template class vector;
+_GLIBCXX_END_NAMESPACE_CONTAINER
+
 /// @cond undocumented
 namespace __format
 {
@@ -2492,6 +2496,14 @@ namespace __format
   [[__gnu__::__always_inline__]]
   constexpr _Sink_iter
   operator++(int) { return *this; }
+
+  _CharT*
+  _M_get_pointer(size_t __n) const
+  { return _M_sink->_M_get_pointer(__n); }
+
+  void
+  _M_end_pointer(size_t __n) const
+  { _M_sink->_M_end_pointer(__n); }
 };
 
   // Abstract base class for type-erased character sinks.
@@ -2508,6 +2520,7 @@ namespace __format
   // Called when the span is full, to make more space available.
   // Precondition: _M_next != _M_span.begin()
   // Postcondition: _M_next != _M_span.end()
+  // TODO: remove the precondition? could make overflow handle it.
   virtual void _M_overflow() = 0;
 
 protected:
@@ -2572,6 +2585,32 @@ namespace __format
  }
   }
 
+  // If this returns a non-null pointer it can be used to write directly
+  // up to N characters to the sink to avoid unwanted buffering.
+  // If anything is written to the buffer then there must be a call to
+  // _M_end_pointer(n2) before any call to another member function of
+  // this object, where N2 is the number of characters written.
+  // TODO: rewrite this direct access as an RAII type that exposes a span.
+  virtual _CharT*
+  _M_get_pointer(size_t __n)
+  {
+   auto __avail = _M_unused();
+   if (__n <= __avail.size())
+ return __avail.data();
+
+   if (__n > _M_span.size()) // Cannot meet the request.
+ return nullptr;
+
+   _M_overflow(); // Make more space available.
+   __avail = _M_unused();
+   return __n <= __avail.size() ? __avail.data() : nullptr;
+  }
+
+  // pre: no calls to _M_write or _M_overflow since _M_get_pointer.
+  virtual void
+  _M_end_pointer(size_t __n)
+  { _M_next += __n; }
+
 public:
   _Sink(const _Sink&) = delete;
   _Sink& operator=(const _Sink&) = delete;
@@ -2596,6 +2635,8 @@ namespace __format
   { }
 };
 
+  using _GLIBCXX_STD_C::vector;
+
   // A sink that fills a sequence (e.g. std::string, std::vector, std::deque).
   // Writes to a buffer then appends that to the sequence when it fills up.
   template
@@ -2619,6 +2660,46 @@ namespace __format
this->_M_rewind();
   }
 
+  _CharT*
+  _M_get_pointer(size_t __n) override
+  {
+   if constexpr (__is_specialization_of<_Seq, basic_string>
+   || __is_specialization_of<_Seq, vector>)
+ {
+   // Flush the buffer to _M_seq first:
+   if (this->_M_used().size())
+ _M_overflow();
+   // Expand _M_seq to make __n new characters available:
+   const auto __sz = _M_seq.size();
+   if constexpr (is_same_v || is_same_v)
+ _M_seq.__resize_and_overwrite(__sz + __n,
+   [](auto, auto __n2) {
+ return __n2;
+   });
+   else
+ _M_seq.resize(__sz + __n);
+   // Set _M_used() to be a span over the original part of _M_seq:
+   this->_M_reset(_M_seq, __sz);
+   // And return a pointer to the new portion:
+   return this->_M_unused().data();
+ }
+   else // Try to use the base class' buffer.
+ return _Sink<_CharT>::_M_get_pointer(__n);
+  }
+
+  void
+  _M_end_pointer(size_t __n) override
+  {
+   if constexpr (__is_specialization_of<_Seq, basic_string>
+   || __is_specialization_of<_Seq, vector>)
+ {
+   // Truncate the sequence to the part that was actually written to:
+   _M_seq.resize(this->_M_used().size() + __n);
+   // Switch back to using buffer:
+   this->_M_reset(this->_M_buf);
+ }
+

[PATCH] libstdc++: Define std::ranges::to for C++23 (P1206R7) [PR111055]

2023-11-17 Thread Jonathan Wakely
This needs tests, and doesn't include the changes to the standard
containers to add insert_range etc. (but they work with ranges::to
anyway, using the existing member functions).
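
To illustrate why the conversion can work without `insert_range` and friends, here is a minimal sketch of one fallback: building the container from an iterator pair, which standard containers already support. `to_container` is a hypothetical name, and this shows only that single fallback, not the full set of construction strategies that `ranges::to` tries.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: converts any range to Container using only the
// iterator-pair constructor that standard containers already provide.
template<typename Container, typename Range>
Container to_container(const Range& r)
{
  return Container(std::begin(r), std::end(r));
}
```

For example, `to_container<std::vector<int>>(std::list<int>{1, 2, 3})` produces a vector holding 1, 2, 3.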

I plan to write the tests and push this tomorrow.

I've trimmed the boring bits of the version.h changes, which are caused
just by reordering some existing entries to be in alphabetical order.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/111055
* include/bits/ranges_base.h (from_range_t): Define new tag
type.
(from_range): Define new tag object.
* include/bits/version.def (ranges_to_container): Define.
* include/bits/version.h: Regenerate.
* include/std/ranges (ranges::to): Define.
---
 libstdc++-v3/include/bits/ranges_base.h |   6 +
 libstdc++-v3/include/bits/version.def   |  34 ++-
 libstdc++-v3/include/bits/version.h | 111 +-
 libstdc++-v3/include/std/ranges | 279 +++-
 4 files changed, 371 insertions(+), 59 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index 7fa43d1965a..555065b4ed7 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef __cpp_lib_concepts
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -1057,6 +1058,11 @@ namespace ranges
iterator_t<_Range>,
dangling>;
 
+#if __glibcxx_ranges_to_container // C++ >= 23
+  struct from_range_t { explicit from_range_t() = default; };
+  inline constexpr from_range_t from_range{};
+#endif
+
 } // namespace ranges
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index 447fdeb9519..15bd502f52c 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1370,19 +1370,21 @@ ftms = {
   };
 };
 
-ftms = {
-  name = to_underlying;
-  values = {
-v = 202102;
-cxxmin = 23;
-  };
-};
+//ftms = {
+//  name = container_ranges;
+//  values = {
+//v = 202202;
+//cxxmin = 23;
+//hosted = yes;
+//  };
+//};
 
 ftms = {
-  name = unreachable;
+  name = ranges_to_container;
   values = {
 v = 202202;
 cxxmin = 23;
+hosted = yes;
   };
 };
 
@@ -1614,6 +1616,22 @@ ftms = {
   };
 };
 
+ftms = {
+  name = to_underlying;
+  values = {
+v = 202102;
+cxxmin = 23;
+  };
+};
+
+ftms = {
+  name = unreachable;
+  values = {
+v = 202202;
+cxxmin = 23;
+  };
+};
+
 ftms = {
   name = fstream_native_handle;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 97c6d8508f4..9563b6cd2f7 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1658,29 +1658,18 @@
 #endif /* !defined(__cpp_lib_reference_from_temporary) && 
defined(__glibcxx_want_reference_from_temporary) */
 #undef __glibcxx_want_reference_from_temporary
 
-// from version.def line 1374
-#if !defined(__cpp_lib_to_underlying)
-# if (__cplusplus >= 202100L)
-#  define __glibcxx_to_underlying 202102L
-#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_to_underlying)
-#   define __cpp_lib_to_underlying 202102L
+// from version.def line 1383
+#if !defined(__cpp_lib_ranges_to_container)
+# if (__cplusplus >= 202100L) && _GLIBCXX_HOSTED
+#  define __glibcxx_ranges_to_container 202202L
+#  if defined(__glibcxx_want_all) || 
defined(__glibcxx_want_ranges_to_container)
+#   define __cpp_lib_ranges_to_container 202202L
 #  endif
 # endif
-#endif /* !defined(__cpp_lib_to_underlying) && 
defined(__glibcxx_want_to_underlying) */
-#undef __glibcxx_want_to_underlying
+#endif /* !defined(__cpp_lib_ranges_to_container) && 
defined(__glibcxx_want_ranges_to_container) */
+#undef __glibcxx_want_ranges_to_container
 
-// from version.def line 1382
-#if !defined(__cpp_lib_unreachable)
-# if (__cplusplus >= 202100L)
-#  define __glibcxx_unreachable 202202L
-#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_unreachable)
-#   define __cpp_lib_unreachable 202202L
-#  endif
-# endif
-#endif /* !defined(__cpp_lib_unreachable) && 
defined(__glibcxx_want_unreachable) */
-#undef __glibcxx_want_unreachable
-
-// from version.def line 1390
+// from version.def line 1392
 #if !defined(__cpp_lib_ranges_zip)
 # if (__cplusplus >= 202100L)
 #  define __glibcxx_ranges_zip 202110L
@@ -1977,7 +1966,29 @@
 #endif /* !defined(__cpp_lib_string_resize_and_overwrite) && 
defined(__glibcxx_want_string_resize_and_overwrite) */
 #undef __glibcxx_want_string_resize_and_overwrite
 
-// from version.def line 1618
+// from version.def line 1620
+#if !defined(__cpp_lib_to_underlying)
+# if (__cplusplus >= 202100L)
+#  define __glibcxx_to_underlying 202102L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_to_underlying)
+#   define 

[committed] libstdc++: Regenerate config.h.in

2023-11-17 Thread Jonathan Wakely
Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* config.h.in: Regenerate.
---
 libstdc++-v3/config.h.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index c0aa51af3f0..17da7bb9867 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -167,7 +167,7 @@
 /* Define to 1 if you have the `hypotl' function. */
 #undef HAVE_HYPOTL
 
-/* Define if you have the iconv() function. */
+/* Define if you have the iconv() function and it works. */
 #undef HAVE_ICONV
 
 /* Define to 1 if you have the  header file. */
-- 
2.41.0



Re: [PATCH v2] RISC-V: Implement target attribute

2023-11-17 Thread Andreas Schwab
In file included from 
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/memory:78,
 from ../../gcc/system.h:769,
 from ../../gcc/config/riscv/riscv-target-attr.cc:25:
In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const [with 
_Tp = char]',
inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
_Dp = std::default_delete]' at 
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
inlined from 'bool riscv_process_one_target_attr(char*, location_t, 
{anonymous}::riscv_target_attr_parser&)' at 
../../gcc/config/riscv/riscv-target-attr.cc:274:1,
inlined from 'bool riscv_process_target_attr(tree, location_t, 
gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:346:37:
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
 error: 'void operator delete(void*, std::size_t)' called on pointer returned 
from a mismatched allocation function [-Werror=mismatched-new-delete]
   93 | delete __ptr;
  | ^~~~
In function 'bool riscv_process_one_target_attr(char*, location_t, 
{anonymous}::riscv_target_attr_parser&)',
inlined from 'bool riscv_process_target_attr(tree, location_t, 
gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:346:37:
../../gcc/config/riscv/riscv-target-attr.cc:244:42: note: returned from 'void* 
operator new [](std::size_t)'
  244 |   std::unique_ptr buf (new char[len]);
  |  ^
In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const [with 
_Tp = char]',
inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
_Dp = std::default_delete]' at 
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
inlined from 'bool riscv_process_target_attr(tree, location_t, 
gcc_options*)' at ../../gcc/config/riscv/riscv-target-attr.cc:361:1:
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
 error: 'void operator delete(void*, std::size_t)' called on pointer returned 
from a mismatched allocation function [-Werror=mismatched-new-delete]
   93 | delete __ptr;
  | ^~~~
../../gcc/config/riscv/riscv-target-attr.cc: In function 'bool 
riscv_process_target_attr(tree, location_t, gcc_options*)':
../../gcc/config/riscv/riscv-target-attr.cc:330:42: note: returned from 'void* 
operator new [](std::size_t)'
  330 |   std::unique_ptr buf (new char[len]);
  |  ^
In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const [with 
_Tp = char]',
inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
_Dp = std::default_delete]' at 
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
inlined from 'bool {anonymous}::riscv_target_attr_parser::parse_arch(const 
char*)' at ../../gcc/config/riscv/riscv-target-attr.cc:140:5,
inlined from 'bool {anonymous}::riscv_target_attr_parser::handle_arch(const 
char*)' at ../../gcc/config/riscv/riscv-target-attr.cc:158:21:
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
 error: 'void operator delete(void*, std::size_t)' called on pointer returned 
from a mismatched allocation function [-Werror=mismatched-new-delete]
   93 | delete __ptr;
  | ^~~~
In member function 'bool 
{anonymous}::riscv_target_attr_parser::parse_arch(const char*)',
inlined from 'bool {anonymous}::riscv_target_attr_parser::handle_arch(const 
char*)' at ../../gcc/config/riscv/riscv-target-attr.cc:158:21:
../../gcc/config/riscv/riscv-target-attr.cc:108:46: note: returned from 'void* 
operator new [](std::size_t)'
  108 |   std::unique_ptr buf (new char[len]);
  |  ^
In member function 'void std::default_delete<_Tp>::operator()(_Tp*) const [with 
_Tp = char]',
inlined from 'std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp = char; 
_Dp = std::default_delete]' at 
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:398:17,
inlined from 'bool {anonymous}::riscv_target_attr_parser::parse_arch(const 
char*)' at ../../gcc/config/riscv/riscv-target-attr.cc:140:5,
inlined from 'bool {anonymous}::riscv_target_attr_parser::handle_arch(const 
char*)' at ../../gcc/config/riscv/riscv-target-attr.cc:158:21:
/daten/riscv64/gcc/gcc-20231117/Build/prev-riscv64-suse-linux/libstdc++-v3/include/bits/unique_ptr.h:93:9:
 error: 'void operator delete(void*, std::size_t)' called on pointer returned 
from a mismatched allocation function [-We
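
The diagnostics above report `_Tp = char` with `std::default_delete`, so the buffer allocated with `new char[len]` is released with scalar `delete`. A minimal sketch of the usual remedy, assuming the array form of `std::unique_ptr` is acceptable in that file; `copy_via_buffer` is a hypothetical function, not code from riscv-target-attr.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper showing the matched allocation and deallocation.
std::string copy_via_buffer(const char* str)
{
  std::size_t len = std::strlen(str) + 1;
  // unique_ptr<char[]> selects default_delete<char[]>, which calls
  // delete[] and therefore matches the new[] used here.
  std::unique_ptr<char[]> buf(new char[len]);
  std::memcpy(buf.get(), str, len);
  return std::string(buf.get());
}
```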

[committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

GCC generates better code for add_sat if we use:

unsigned z = x + y;
z |= -(z < x);
return z;

If the compiler can't be improved we should consider using that instead
of __builtin_add_overflow.
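
For comparison, here is a sketch of both forms for `unsigned`. The names `add_sat_mask` and `add_sat_builtin` are made up for this illustration; the claim about better code generation comes from the message above and is not verified here.

```cpp
#include <cassert>
#include <limits>

// The branch-free form quoted above.
inline unsigned add_sat_mask(unsigned x, unsigned y)
{
  unsigned z = x + y; // Wraps modulo 2^N on overflow.
  z |= -(z < x);      // (z < x) is 1 only on wraparound, so this ORs in
                      // all-ones then, and leaves z unchanged otherwise.
  return z;
}

// The form based on the GCC/Clang overflow builtin.
inline unsigned add_sat_builtin(unsigned x, unsigned y)
{
  unsigned z = 0;
  if (__builtin_add_overflow(x, y, &z))
    return std::numeric_limits<unsigned>::max();
  return z;
}
```

Both functions return the same value for every pair of inputs; only the generated code is expected to differ.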


-- >8 --


This was approved for C++26 last week at the WG21 meeting in Kona.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/bits/version.def (saturation_arithmetic): Define.
* include/bits/version.h: Regenerate.
* include/std/numeric: Include new header.
* include/bits/sat_arith.h: New file.
* testsuite/26_numerics/saturation/add.cc: New test.
* testsuite/26_numerics/saturation/cast.cc: New test.
* testsuite/26_numerics/saturation/div.cc: New test.
* testsuite/26_numerics/saturation/mul.cc: New test.
* testsuite/26_numerics/saturation/sub.cc: New test.
* testsuite/26_numerics/saturation/version.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 libstdc++-v3/include/bits/sat_arith.h | 148 ++
 libstdc++-v3/include/bits/version.def |   8 +
 libstdc++-v3/include/bits/version.h   |  11 ++
 libstdc++-v3/include/std/numeric  |   5 +
 .../testsuite/26_numerics/saturation/add.cc   |  73 +
 .../testsuite/26_numerics/saturation/cast.cc  |  24 +++
 .../testsuite/26_numerics/saturation/div.cc   |  45 ++
 .../testsuite/26_numerics/saturation/mul.cc   |  34 
 .../testsuite/26_numerics/saturation/sub.cc   |  86 ++
 .../26_numerics/saturation/version.cc |  19 +++
 12 files changed, 455 insertions(+)
 create mode 100644 libstdc++-v3/include/bits/sat_arith.h
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/add.cc
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/cast.cc
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/div.cc
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/mul.cc
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/sub.cc
 create mode 100644 libstdc++-v3/testsuite/26_numerics/saturation/version.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index dab9f720cbb..17d9d9cec31 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -142,6 +142,7 @@ bits_freestanding = \
${bits_srcdir}/ranges_uninitialized.h \
${bits_srcdir}/ranges_util.h \
${bits_srcdir}/refwrap.h \
+   ${bits_srcdir}/sat_arith.h \
${bits_srcdir}/stl_algo.h \
${bits_srcdir}/stl_algobase.h \
${bits_srcdir}/stl_construct.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 4f7ab2dfbab..f038af709cc 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -497,6 +497,7 @@ bits_freestanding = \
${bits_srcdir}/ranges_uninitialized.h \
${bits_srcdir}/ranges_util.h \
${bits_srcdir}/refwrap.h \
+   ${bits_srcdir}/sat_arith.h \
${bits_srcdir}/stl_algo.h \
${bits_srcdir}/stl_algobase.h \
${bits_srcdir}/stl_construct.h \
diff --git a/libstdc++-v3/include/bits/sat_arith.h 
b/libstdc++-v3/include/bits/sat_arith.h
new file mode 100644
index 000..71793467984
--- /dev/null
+++ b/libstdc++-v3/include/bits/sat_arith.h
@@ -0,0 +1,148 @@
+// Saturation arithmetic -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file include/bits/sat_arith.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{numeric}
+ */
+
+#ifndef _GLIBCXX_SAT_ARITH_H
+#define _GLIBCXX_SAT_ARITH_H 1
+
+#pragma GCC system_header
+
+#include 
+
+#ifdef __glibcxx_saturation_arithmetic // C++ >= 26
+
+#include 
+#include 
+
+namespace std 

Add 'libgomp.c++/static-local-variable-1.C'

2023-11-17 Thread Thomas Schwinge
Hi!

I found that with GCC's '-fthreadsafe-statics' implementation (..., which
is enabled by default) instrumented as follows:

--- libstdc++-v3/libsupc++/guard.cc
+++ libstdc++-v3/libsupc++/guard.cc
@@ -271,6 +273,7 @@ namespace __cxxabiv1
   extern "C"
   int __cxa_guard_acquire (__guard *g)
   {
+asm("int3");
 #ifdef __GTHREADS
 // If the target can reorder loads, we need to insert a read memory
 // barrier so that accesses to the guarded variable happen after the

..., there is only one single libgomp C++ test case where this triggers;
'libgomp.c++/taskloop-6.C':

Thread 1 "a.out" received signal SIGTRAP, Trace/breakpoint trap.
__cxxabiv1::__cxa_guard_acquire (g=0x60b228 (J)::i>) at [...]/source-gcc/libstdc++-v3/libsupc++/guard.cc:281
281 if (_GLIBCXX_GUARD_TEST_AND_ACQUIRE (g))
(gdb) bt
#0  __cxxabiv1::__cxa_guard_acquire (g=0x60b228 (J)::i>) at [...]/source-gcc/libstdc++-v3/libsupc++/guard.cc:281
#1  0x00404772 in f17<121> (j=...) at 
source-gcc/libgomp/testsuite/libgomp.c++/taskloop-6.C:300
#2  0x00401e11 in main () at 
source-gcc/libgomp/testsuite/libgomp.c++/taskloop-6.C:411

That test case however isn't per se testing behavior of a C++ static
local variable vs. OpenMP.

OK to push the attached "Add 'libgomp.c++/static-local-variable-1.C'"?
(Also, I'm happy to extend the test case to verify any additional
features you think are useful to be tested there.)

(With '-fno-threadsafe-statics', this fails, as expected.)
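
The property being tested can be shown in a much smaller sketch than the attached test case. The names `Counted`, `touch` and `run_parallel` are hypothetical. When built with `-fopenmp`, every thread of the parallel region calls `touch()`, yet the constructor is expected to run exactly once; without `-fopenmp` the pragmas are ignored and the code runs on a single thread.

```cpp
#include <cassert>

static int constructions = 0;
static int calls = 0;

struct Counted
{
  Counted()
  {
#pragma omp atomic
    ++constructions;
  }
};

static void touch()
{
#pragma omp atomic
  ++calls;

  // Initialization is serialized through __cxa_guard_acquire when
  // '-fthreadsafe-statics' is in effect (the default).
  static Counted instance;
  (void) instance;
}

// Runs touch() on every thread and returns how often Counted was constructed.
int run_parallel()
{
#pragma omp parallel
  {
    touch();
  }
  return constructions;
}
```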


Regards
 Thomas


>From d3140a1b4a649c5acb3735ef7fd04a4ebffe5e9a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 17 Nov 2023 16:06:25 +0100
Subject: [PATCH] Add 'libgomp.c++/static-local-variable-1.C'

A debug run may look as follows:

int main()
void f()
S::S()
void f()
S::S()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()
void f()/
void f()/
void f()/
  cSC = 1
  cf = 12
int main()/
S::~S()
S::~S()/

	libgomp/
	* testsuite/libgomp.c++/static-local-variable-1.C: New.
---
 .../libgomp.c++/static-local-variable-1.C | 95 +++
 1 file changed, 95 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.c++/static-local-variable-1.C

diff --git a/libgomp/testsuite/libgomp.c++/static-local-variable-1.C b/libgomp/testsuite/libgomp.c++/static-local-variable-1.C
new file mode 100644
index 000..3169ba77d8d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/static-local-variable-1.C
@@ -0,0 +1,95 @@
+// Test basic behavior of a C++ static local variable vs. OpenMP.
+
+#include 
+#include 
+
+#define DEBUG_PRINTF //__builtin_printf
+
+static int state;
+
+static int cSC, cSD, cf;
+
+struct S
+{
+  S()
+  {
+DEBUG_PRINTF("%s\n", __PRETTY_FUNCTION__);
+
+int c;
+#pragma omp atomic capture
+c = ++cSC;
+if (c != 1)
+  __builtin_abort();
+
+if (state++ != 1)
+  __builtin_abort();
+
+DEBUG_PRINTF("%s/\n", __PRETTY_FUNCTION__);
+  }
+
+  ~S()
+  {
+DEBUG_PRINTF("%s\n", __PRETTY_FUNCTION__);
+
+int c;
+#pragma omp atomic capture
+c = ++cSD;
+if (c != 1)
+  __builtin_abort();
+
+if (state++ != 3)
+  __builtin_abort();
+
+DEBUG_PRINTF("%s/\n", __PRETTY_FUNCTION__);
+// Exit '0', now that we've verified all is OK.
+_exit(0);
+  }
+};
+
+static void f()
+{
+  DEBUG_PRINTF("%s\n", __PRETTY_FUNCTION__);
+
+#pragma omp atomic
+  ++cf;
+
+  // 
+  static S s;
+
+  DEBUG_PRINTF("%s/\n", __PRETTY_FUNCTION__);
+}
+
+int main()
+{
+  DEBUG_PRINTF("%s\n", __PRETTY_FUNCTION__);
+
+  if (state++ != 0)
+__builtin_abort();
+
+  int nthreads;
+
+#pragma omp parallel
+  {
+#pragma omp master
+{
+  nthreads = omp_get_num_threads ();
+}
+
+f();
+  }
+
+  DEBUG_PRINTF("  cSC = %d\n", cSC);
+  DEBUG_PRINTF("  cf = %d\n", cf);
+  if (cSC != 1)
+__builtin_abort();
+  if (cf != nthreads)
+__builtin_abort();
+
+  if (state++ != 2)
+__builtin_abort();
+
+  DEBUG_PRINTF("%s/\n", __PRETTY_FUNCTION__);
+
+  // See '_exit(0);' elsewhere.
+  return 1;
+}
-- 
2.34.1



Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲
>> Yeah, just noticed that myself.  Anyway will do some more tests,
>> maybe my initial VLS analysis was somehow flawed.

You can check binop_vx_constraint-167.c ~ binop_vx_constraint-174.c

This patch is pre-approved if you make the change I suggested.
I'm about to go to sleep, so I won't be able to review it again.
Feel free to commit it once you've made that change.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-17 23:13
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> It must be correct. We already have test (intrinsic codes) for it.
 
Yeah, just noticed that myself.  Anyway will do some more tests,
maybe my initial VLS analysis was somehow flawed.
> Condition should be put into iterators (Add a new iterator for
> indexed load store).
 
Ah, that's what you meant.  Sure.
 
Regards
Robin
 


Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread Arsen Arsenović

David Edelsohn  writes:

> On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović  wrote:
>
>>
>> David Edelsohn  writes:
>>
>> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović  wrote:
>> >
>> > [snip]
>> >> Sure, but my patch does insert --disable-shared:
>> >>
>> >> --8<---cut here---start->8---
>> >> host_modules= { module= gettext; bootstrap=true; no_install=true;
>> >> module_srcdir= "gettext/gettext-runtime";
>> >> // We always build gettext with pic, because some packages (e.g. gdbserver)
>> >> // need it in some configurations, which is determined via nontrivial tests.
>> >> // Always enabling pic seems to make sense for something tied to
>> >> // user-facing output.
>> >> extra_configure_flags='--disable-shared --disable-java --disable-csharp --with-pic';
>> >> lib_path=intl/.libs; };
>> >> --8<---cut here---end--->8---
>> >>
>> >> ... and it is applied:
>> >>
>> >> --8<---cut here---start->8---
>> >> -bash-5.1$ ./config.status --config
>> >> --srcdir=../../gcc/gettext/gettext-runtime --cache-file=./config.cache
>> >>   --disable-werror --with-gmp=/opt/cfarm
>> >>   --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
>> >>   --with-included-gettext --program-transform-name=s,y,y,
>> >>   --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
>> >>   --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
>> >>   --disable-intermodule --enable-checking=yes,types,extra
>> >>   --disable-coverage --enable-languages=c,c++
>> >>   --disable-build-format-warnings --disable-shared --disable-java
>> >>   --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
>> >>   host_alias=powerpc-ibm-aix7.3.1.0 target_alias=powerpc-ibm-aix7.3.1.0
>> >>   CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
>> >>   -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
>> >> --8<---cut here---end--->8---
>> >>
>> >> I'm unsure how to tell what the produced binaries are w.r.t static or
>> >> shared, but I only see .o files inside intl/.libs/libintl.a, while I see
>> >> a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)
>> >>
>> >
>> > An AIX shared library created by libtool will look like
>> > libfoo.a[libfoo.so.N], where N is the package major version number.
>> > Normally with one file.
>>
>> > An AIX static library will look like libfoo.a[a.o, b.o, c.o]
>> > with multiple object files.
>> >
>> > An AIX archive can contain a combination of shared objects and
>> > normal object files.
>> >
>> > AIX normally uses the convention shr.o or shr_64.o for the name
>> > of the shared object file.  Hint, hint, an AIX archive can contain
>> > both 32 bit and 64 bit object files or shared objects.
>> >
>> > I don't know why the gettext build system would create
>> > /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8)
>> > if --disable-shared was requested.  That clearly is using the
>> > naming of a libtool AIX shared object and failing due to
>> > the missing shared object.  Although in this case, the problem
>> > seems to be the shared library load path.  AIX uses LIBPATH,
>> > not LD_LIBRARY_PATH.
>>
>> It doesn't create libintl.a with a libintl.so.8 inside of it.  The
>> libintl.a contains a bunch of objects, as I'd expect of a static
>> library:
>>
>> --8<---cut here---start->8---
>> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a  | grep libintl
>> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a
>> bindtextdom.o
>> dcgettext.o
>> ...
>> --8<---cut here---end--->8---
>>
>>
>> > Also, for me, the out of tree path was
>> >
>> > gettext/gettext-runtime/intl/.libs
>> >
>> > Is your search path missing a level?
>>
>> No, the above is generated by the GCC build system and builds
>> gettext-runtime directly (per Bruno's recommendation a while ago) as it
>> is replacing intl/ of similar functionality.
>>
>> I'm currently building GCC with libintl with the threads hack you
>> mentioned applied (as I got undefined references to the pthread
>> functions you discovered).  I suspect that, bar this issue (which, IIUC,
>> Bruno will fix in a new release?) the patch above will fix the issues
>> you've encountered on AIX (note that if you want to use gettext in-tree,
>> you'd still have to fetch gettext into the tree).
>>
>> Maybe we should provide a download-prerequisite-y script that skips
>> everything but GNU gettext, to retain same behavior?
>>
>> Have a lovely day.
>>
>
> I'm concerned that the gettext fixes are working around AIX support for
> libpthread.a as opposed to making --disable-threads function.

Indeed, my intention is to --disable-threads.  The goal of the
workaround is simply to test the patch I wrote.

> --enabled-threads=isoc use of 

Re: [PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

2023-11-17 Thread Jeff Law




On 11/17/23 07:18, Kito Cheng wrote:
> I haven't taken a closer look at the ira/lra dump yet, but my feeling
> is that this may be caused by the earlyclobber modifier not working as
> expected?
>
> Let me take a closer look tomorrow.

Remember that constraints aren't checked until register allocation.  So 
the combiner, splitters, etc don't know about "earlyclobber".  It's a 
relatively common goof.


Not sure if that's playing a role here, but I've seen it happen several 
times in the past.


Jeff


Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread Robin Dapp
> It must be correct. We already have test (intrinsic codes) for it.

Yeah, just noticed that myself.  Anyway will do some more tests,
maybe my initial VLS analysis was somehow flawed.
 
> Condition should be put into iterators (Add a new iterator for
> indexed load store).

Ah, that's what you meant.  Sure.

Regards
 Robin


Re: [PATCH v4] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-17 Thread Alex Coplan
On 03/11/2023 12:19, Marek Polacek wrote:
> On Wed, Sep 27, 2023 at 03:27:30PM +0100, Alex Coplan wrote:
> > Hi,
> > 
> > This is a v4 patch to address Jason's feedback here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630911.html
> > 
> > w.r.t. v3 it just removes a comment now that some uncertainty around
> > cxx_binary_literals has been resolved, and updates the documentation as
> > suggested to point to the Clang docs.
> > 
> > --
> > 
> > This patch implements clang's __has_feature and __has_extension in GCC.
> > Currently the patch aims to implement all documented features (and some
> > undocumented ones) following the documentation at
> > https://clang.llvm.org/docs/LanguageExtensions.html with the exception
> > of the legacy features for C++ type traits.  These are omitted, since as
> > the clang documentation notes, __has_builtin is the correct "modern" way
> > to query for these (which GCC already implements).
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu, bootstrapped on
> > x86_64-apple-darwin, darwin regtest in progress.  OK for trunk if
> > testing passes?
> 
> Thanks for the patch.  I only have a few minor comments.

Thanks a lot for the detailed review.  Please see the incremental change
from v4 to v5 attached (which addresses your comments).  The full v5
patch is posted here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637028.html

> 
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index aae57260097..1210953d33a 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -311,6 +311,43 @@ const struct fname_var_t fname_vars[] =
> >{NULL, 0, 0},
> >  };
> >  
> > +/* Flags to restrict availability of generic features that
> > +   are known to __has_{feature,extension}.  */
> > +
> > +enum
> > +{
> > +  HF_FLAG_EXT = 1, /* Available only as an extension.  */
> > +  HF_FLAG_SANITIZE = 2, /* Availability depends on sanitizer flags.  */
> > +};
> 
> Why not have a new HF_FLAG_ = 0 here and use it below...

Sure, I've used HF_FLAG_NONE for this in the updated patch.

> 
> > +/* Info for generic features which can be queried through
> > +   __has_{feature,extension}.  */
> > +
> > +struct hf_feature_info
> > +{
> > +  const char *ident;
> > +  unsigned flags;
> > +  unsigned mask;
> 
> Not enum sanitize_code for mask?

I initially intended the mask field to have a flexible interpretation
depending on the value of flags, i.e. it's only interpreted as
enum sanitize_code if flags has HF_FLAG_SANITIZE set.  Of course, at
the moment the mask field happens to only be used for sanitizer flags.

So personally I'd lean towards keeping the type as is with the view to
allowing re-purposing in the future, but happy to change it if you feel
strongly.
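
To illustrate what I mean by a flexible interpretation, here is a rough
standalone sketch.  It is not code from the patch: feature_info,
feature_available, the `strict' parameter and the SAN_* bit values are all
made up for the example and are not GCC's real sanitize_code values.  The
point is only that `mask' is read as sanitizer bits when, and only when,
`flags' has HF_FLAG_SANITIZE set.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the patch's enums; the SAN_* values are
// illustrative only and are not GCC's real sanitize_code values.
enum : unsigned { HF_FLAG_NONE = 0, HF_FLAG_EXT = 1, HF_FLAG_SANITIZE = 2 };
enum : unsigned { SAN_ADDRESS = 1u << 0, SAN_THREAD = 1u << 1 };

struct feature_info
{
  const char *ident;
  unsigned flags;
  unsigned mask;   // Meaning depends on FLAGS; see feature_available.
};

static constexpr feature_info table[] =
{
  { "address_sanitizer",         HF_FLAG_SANITIZE, SAN_ADDRESS },
  { "thread_sanitizer",          HF_FLAG_SANITIZE, SAN_THREAD },
  { "tls",                       HF_FLAG_NONE,     0 },
  { "gnu_asm_goto_with_outputs", HF_FLAG_EXT,      0 },
};

// Return true if NAME is available.  ENABLED_SANITIZERS is the set of
// active sanitizer bits.  STRICT models a __has_feature-style query,
// which rejects entries that are available only as extensions.
static bool
feature_available (const char *name, unsigned enabled_sanitizers, bool strict)
{
  for (const feature_info &info : table)
    {
      if (std::strcmp (info.ident, name) != 0)
        continue;
      if ((info.flags & HF_FLAG_EXT) && strict)
        return false;
      // MASK is interpreted as sanitizer bits only when the flag says so.
      if (info.flags & HF_FLAG_SANITIZE)
        return (enabled_sanitizers & info.mask) != 0;
      return true;
    }
  return false;
}
```

A future flag could give `mask' a different meaning without changing the
struct layout, which is the re-purposing I have in mind.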

> 
> > +};
> > +
> > +/* Table of generic features which can be queried through
> > +   __has_{feature,extension}.  */
> > +
> > +static const hf_feature_info has_feature_table[] =
> > +{
> > +  { "address_sanitizer",   HF_FLAG_SANITIZE, SANITIZE_ADDRESS },
> > +  { "thread_sanitizer",HF_FLAG_SANITIZE, SANITIZE_THREAD },
> > +  { "leak_sanitizer",  HF_FLAG_SANITIZE, SANITIZE_LEAK },
> > +  { "hwaddress_sanitizer", HF_FLAG_SANITIZE, SANITIZE_HWADDRESS },
> > +  { "undefined_behavior_sanitizer", HF_FLAG_SANITIZE, SANITIZE_UNDEFINED },
> > +  { "attribute_deprecated_with_message",  0, 0 },
> > +  { "attribute_unavailable_with_message", 0, 0 },
> > +  { "enumerator_attributes", 0, 0 },
> > +  { "tls", 0, 0 },
> 
> ...here?  Might be more obvious what it means then.
> 
> > +  { "gnu_asm_goto_with_outputs", HF_FLAG_EXT, 0 },
> > +  { "gnu_asm_goto_with_outputs_full",HF_FLAG_EXT, 0 }
> > +};
> > +
> >  /* Global visibility options.  */
> >  struct visibility_flags visibility_options;
> >  
> > @@ -9808,4 +9845,63 @@ c_strict_flex_array_level_of (tree array_field)
> >return strict_flex_array_level;
> >  }
> >  
> > +/* Map from identifiers to booleans.  Value is true for features, and
> > +   false for extensions.  Used to implement __has_{feature,extension}.  */
> > +
> > +using feature_map_t = hash_map <tree, bool>;
> > +static feature_map_t *feature_map = nullptr;
> 
> You don't need " = nullptr" here.

Dropped, thanks.

> 
> > +/* Register a feature for __has_{feature,extension}.  FEATURE_P is true
> > +   if the feature identified by NAME is a feature (as opposed to an
> > +   extension).  */
> > +
> > +void
> > +c_common_register_feature (const char *name, bool feature_p)
> > +{
> > +  bool dup = feature_map->put (get_identifier (name), feature_p);
> > +  gcc_checking_assert (!dup);
> > +}
> > +
> > +/* Lazily initialize hash table for __has_{feature,extension},
> > +   dispatching to the appropriate frontend to register language-specific
> 
> "front end"

Done.

> 
> > +   features.  */
> > +
> > +static void
> > +init_has_feature ()
> > +{
> > +  

Re: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread 钟居哲
>> I'm wondering whether the VLA modes in the iterator are correct.
>> Looks dubious to me but unsure, will need to create some tests
>> before continuing.

It must be correct. We already have test (intrinsic codes) for it.

>> What's the problem with those?  We probably won't reach there
>> because the indexed is considered invalid before but we could,
>> theoretically, still combine them?
The condition should be put into the iterators (add a new iterator for indexed
loads and stores), like you did for the RATIO iterators by adding !TARGET_64BIT.
That is easier to maintain, since the iterator code only appears once,
whereas this patch adds
GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)
in many places.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-11-17 22:55
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for 
rv32gcv.
> OK. Makes sense.
 
I'm wondering whether the VLA modes in the iterator are correct.
Looks dubious to me but unsure, will need to create some tests
before continuing.
 
> LGTM as long as you remove  all
> GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)
 
What's the problem with those?  We probably won't reach there
because the indexed is considered invalid before but we could,
theoretically, still combine them?
 
Regards
Robin
 


Re: [PATCH] RISC-V: Disallow 64-bit indexed loads and stores for rv32gcv.

2023-11-17 Thread Robin Dapp
> OK. Makes sense.

I'm wondering whether the VLA modes in the iterator are correct.
Looks dubious to me but unsure, will need to create some tests
before continuing.

> LGTM as long as you remove  all
> GET_MODE_BITSIZE (GET_MODE_INNER (mode)) <= GET_MODE_BITSIZE (Pmode)

What's the problem with those?  We probably won't reach there
because the indexed is considered invalid before but we could,
theoretically, still combine them?

Regards
 Robin


[PATCH v5] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-17 Thread Alex Coplan
Hi,

This is a v5 patch to address Marek's feedback here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635157.html

I also implemented Jason's suggestion to use constexpr for the tables
from this review:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634484.html

I'll attach the incremental change in reply to Marek's review to make
things easier to compare.

Bootstrapped/regtested on aarch64-linux-gnu.  Bootstrap/regtest on
x86_64-apple-darwin in progress (on top of this libsanitizer fix:
https://github.com/llvm/llvm-project/issues/72639).

OK for trunk if testing passes?

Thanks,
Alex

-- >8 --

This patch implements clang's __has_feature and __has_extension in GCC.
Currently the patch aims to implement all documented features (and some
undocumented ones) following the documentation at
https://clang.llvm.org/docs/LanguageExtensions.html with the exception
of the legacy features for C++ type traits.  These are omitted, since as
the clang documentation notes, __has_builtin is the correct "modern" way
to query for these (which GCC already implements).
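
As a usage illustration only (not part of the patch or its testsuite), code
that wants to query these macros portably might look like the sketch below.
The helper names built_with_asan, has_feature_tls and has_extension_tls are
made up for the example; the fallback definitions keep the #if lines valid
on compilers that provide neither macro.

```cpp
#include <cassert>

// Compilers without these macros would otherwise fail to parse the #if
// lines below, so give them a fallback that evaluates to 0.
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension(x) __has_feature(x)
#endif

// True when the compiler reports AddressSanitizer through __has_feature.
constexpr bool
built_with_asan ()
{
#if __has_feature(address_sanitizer)
  return true;
#else
  return false;
#endif
}

// The same identifier queried both ways.  Anything reported as a
// feature is expected to be reported as an extension too.
constexpr bool
has_feature_tls ()
{
#if __has_feature(tls)
  return true;
#else
  return false;
#endif
}

constexpr bool
has_extension_tls ()
{
#if __has_extension(tls)
  return true;
#else
  return false;
#endif
}
```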

gcc/c-family/ChangeLog:

PR c++/60512
* c-common.cc (struct hf_feature_info): New.
(c_common_register_feature): New.
(init_has_feature): New.
(has_feature_p): New.
* c-common.h (c_common_has_feature): New.
(c_family_register_lang_features): New.
(c_common_register_feature): New.
(has_feature_p): New.
(c_register_features): New.
(cp_register_features): New.
* c-lex.cc (init_c_lex): Plumb through has_feature callback.
(c_common_has_builtin): Generalize and move common part ...
(c_common_lex_availability_macro): ... here.
(c_common_has_feature): New.
* c-ppoutput.cc (init_pp_output): Plumb through has_feature.

gcc/c/ChangeLog:

PR c++/60512
* c-lang.cc (c_family_register_lang_features): New.
* c-objc-common.cc (struct c_feature_info): New.
(c_register_features): New.

gcc/cp/ChangeLog:

PR c++/60512
* cp-lang.cc (c_family_register_lang_features): New.
* cp-objcp-common.cc (struct cp_feature_selector): New.
(cp_feature_selector::has_feature): New.
(struct cp_feature_info): New.
(cp_register_features): New.

gcc/ChangeLog:

PR c++/60512
* doc/cpp.texi: Document __has_{feature,extension}.

gcc/objc/ChangeLog:

PR c++/60512
* objc-act.cc (struct objc_feature_info): New.
(objc_nonfragile_abi_p): New.
(objc_common_register_features): New.
* objc-act.h (objc_common_register_features): New.
* objc-lang.cc (c_family_register_lang_features): New.

gcc/objcp/ChangeLog:

PR c++/60512
* objcp-lang.cc (c_family_register_lang_features): New.

libcpp/ChangeLog:

PR c++/60512
* include/cpplib.h (struct cpp_callbacks): Add has_feature.
(enum cpp_builtin_type): Add BT_HAS_{FEATURE,EXTENSION}.
* init.cc: Add __has_{feature,extension}.
* macro.cc (_cpp_builtin_macro_text): Handle
BT_HAS_{FEATURE,EXTENSION}.

gcc/testsuite/ChangeLog:

PR c++/60512
* c-c++-common/has-feature-common.c: New test.
* c-c++-common/has-feature-pedantic.c: New test.
* g++.dg/ext/has-feature.C: New test.
* gcc.dg/asan/has-feature-asan.c: New test.
* gcc.dg/has-feature.c: New test.
* gcc.dg/ubsan/has-feature-ubsan.c: New test.
* obj-c++.dg/has-feature.mm: New test.
* objc.dg/has-feature.m: New test.
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 0ea0c4f4bef..f270fa2f5b5 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -311,6 +311,44 @@ const struct fname_var_t fname_vars[] =
   {NULL, 0, 0},
 };
 
+/* Flags to restrict availability of generic features that
+   are known to __has_{feature,extension}.  */
+
+enum
+{
+  HF_FLAG_NONE = 0,
+  HF_FLAG_EXT = 1, /* Available only as an extension.  */
+  HF_FLAG_SANITIZE = 2, /* Availability depends on sanitizer flags.  */
+};
+
+/* Info for generic features which can be queried through
+   __has_{feature,extension}.  */
+
+struct hf_feature_info
+{
+  const char *ident;
+  unsigned flags;
+  unsigned mask;
+};
+
+/* Table of generic features which can be queried through
+   __has_{feature,extension}.  */
+
+static constexpr hf_feature_info has_feature_table[] =
+{
+  { "address_sanitizer",   HF_FLAG_SANITIZE, SANITIZE_ADDRESS },
+  { "thread_sanitizer",HF_FLAG_SANITIZE, SANITIZE_THREAD },
+  { "leak_sanitizer",  HF_FLAG_SANITIZE, SANITIZE_LEAK },
+  { "hwaddress_sanitizer", HF_FLAG_SANITIZE, SANITIZE_HWADDRESS },
+  { "undefined_behavior_sanitizer", HF_FLAG_SANITIZE, SANITIZE_UNDEFINED },
+  { "attribute_deprecated_with_message",  HF_FLAG_NONE, 0 },
+  { "attribute_unavailable_with_message", HF_FLAG_NONE, 0 },
+  { "enumerator_attributes", 

[committed] libstdc++: Adjust std::in_range template parameter name

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This is more consistent with the specification in the standard.

libstdc++-v3/ChangeLog:

* include/std/utility (in_range): Rename _Up parameter to _Res.
---
 libstdc++-v3/include/std/utility | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/utility b/libstdc++-v3/include/std/utility
index 18bef7adccd..5d5fcc7da73 100644
--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -171,22 +171,22 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 cmp_greater_equal(_Tp __t, _Up __u) noexcept
 { return !std::cmp_less(__t, __u); }
 
-  template<typename _Up, typename _Tp>
+  template<typename _Res, typename _Tp>
 constexpr bool
 in_range(_Tp __t) noexcept
 {
-  static_assert(__is_standard_integer<_Up>::value);
+  static_assert(__is_standard_integer<_Res>::value);
   static_assert(__is_standard_integer<_Tp>::value);
   using __gnu_cxx::__int_traits;
 
-  if constexpr (is_signed_v<_Tp> == is_signed_v<_Up>)
-   return __int_traits<_Up>::__min <= __t
- && __t <= __int_traits<_Up>::__max;
+  if constexpr (is_signed_v<_Tp> == is_signed_v<_Res>)
+   return __int_traits<_Res>::__min <= __t
+ && __t <= __int_traits<_Res>::__max;
   else if constexpr (is_signed_v<_Tp>)
return __t >= 0
- && make_unsigned_t<_Tp>(__t) <= __int_traits<_Up>::__max;
+ && make_unsigned_t<_Tp>(__t) <= __int_traits<_Res>::__max;
   else
-   return __t <= make_unsigned_t<_Up>(__int_traits<_Up>::__max);
+   return __t <= make_unsigned_t<_Res>(__int_traits<_Res>::__max);
 }
 #endif // __cpp_lib_integer_comparison_functions
 
-- 
2.41.0
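
For readers who want to see why the function has three branches, the
following is a minimal C++17 sketch of the same logic.  It is an assumption-
laden restatement, not the libstdc++ code: in_range_sketch is a hypothetical
name, it uses std::numeric_limits instead of __int_traits, and it checks
std::is_integral_v rather than the stricter "standard integer type"
requirement of the real std::in_range (C++20).

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical C++17 restatement of the logic in the patch above.
// Res is the destination type, T the type of the value being tested.
template<typename Res, typename T>
constexpr bool
in_range_sketch (T t) noexcept
{
  static_assert (std::is_integral_v<Res> && std::is_integral_v<T>);
  using lim = std::numeric_limits<Res>;

  if constexpr (std::is_signed_v<T> == std::is_signed_v<Res>)
    // Same signedness: the usual arithmetic conversions preserve values.
    return lim::min () <= t && t <= lim::max ();
  else if constexpr (std::is_signed_v<T>)
    // Signed value, unsigned destination: negative values never fit.
    return t >= 0 && std::make_unsigned_t<T> (t) <= lim::max ();
  else
    // Unsigned value, signed destination: compare against the
    // destination's maximum converted to unsigned.
    return t <= std::make_unsigned_t<Res> (lim::max ());
}
```

The mixed-signedness branches exist because a plain `t <= max' comparison
would silently convert a negative value to a large unsigned one.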



[committed] libstdc++: Add more Doxygen comments and another test for std::out_ptr

2023-11-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Improve Doxygen comments for std::out_ptr etc. and add a test for the
feature test macro.  Also remove a redundant preprocessor condition.

Ideally the docs for std::out_ptr and std::inout_ptr would show examples
of how to use them and what they do, but that would take some effort.
I'll aim to do that before GCC 14 is released.

libstdc++-v3/ChangeLog:

* include/bits/out_ptr.h: Add Doxygen comments. Remove a
redundant preprocessor condition.
* testsuite/20_util/smartptr.adapt/version.cc: New test.
---
 libstdc++-v3/include/bits/out_ptr.h   | 41 ---
 .../20_util/smartptr.adapt/version.cc | 19 +
 2 files changed, 55 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/smartptr.adapt/version.cc

diff --git a/libstdc++-v3/include/bits/out_ptr.h 
b/libstdc++-v3/include/bits/out_ptr.h
index 49712fa7e31..aeeb6640441 100644
--- a/libstdc++-v3/include/bits/out_ptr.h
+++ b/libstdc++-v3/include/bits/out_ptr.h
@@ -43,8 +43,14 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-#ifdef __glibcxx_out_ptr // C++ >= 23
-  /// Adapt a smart pointer for functions taking an output pointer parameter.
+  /// Smart pointer adaptor for functions taking an output pointer parameter.
+  /**
+   * @tparam _Smart The type of pointer to adapt.
+   * @tparam _Pointer The type of pointer to convert to.
+   * @tparam _Args... Argument types used when resetting the smart pointer.
+   * @since C++23
+   * @headerfile <memory>
+   */
   template<typename _Smart, typename _Pointer, typename... _Args>
 class out_ptr_t
 {
@@ -276,7 +282,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template friend class inout_ptr_t;
 };
 
-  /// Adapt a smart pointer for functions taking an output pointer parameter.
+  /// Smart pointer adaptor for functions taking an inout pointer parameter.
+  /**
+   * @tparam _Smart The type of pointer to adapt.
+   * @tparam _Pointer The type of pointer to convert to.
+   * @tparam _Args... Argument types used when resetting the smart pointer.
+   * @since C++23
+   * @headerfile <memory>
+   */
   template<typename _Smart, typename _Pointer, typename... _Args>
 class inout_ptr_t
 {
@@ -367,6 +380,15 @@ namespace __detail
 }
 /// @endcond
 
+  /// Adapt a smart pointer for functions taking an output pointer parameter.
+  /**
+   * @tparam _Pointer The type of pointer to convert to.
+   * @param __s The pointer that should take ownership of the result.
+   * @param __args... Arguments to use when resetting the smart pointer.
+   * @return A std::inout_ptr_t referring to `__s`.
+   * @since C++23
+   * @headerfile <memory>
+   */
   template<typename _Pointer = void, typename _Smart, typename... _Args>
 inline auto
 out_ptr(_Smart& __s, _Args&&... __args)
@@ -379,6 +401,15 @@ namespace __detail
   return _Ret(__s, std::forward<_Args>(__args)...);
 }
 
+  /// Adapt a smart pointer for functions taking an inout pointer parameter.
+  /**
+   * @tparam _Pointer The type of pointer to convert to.
+   * @param __s The pointer that should take ownership of the result.
+   * @param __args... Arguments to use when resetting the smart pointer.
+   * @return A std::inout_ptr_t referring to `__s`.
+   * @since C++23
+   * @headerfile <memory>
+   */
   template<typename _Pointer = void, typename _Smart, typename... _Args>
 inline auto
 inout_ptr(_Smart& __s, _Args&&... __args)
@@ -391,6 +422,7 @@ namespace __detail
   return _Ret(__s, std::forward<_Args>(__args)...);
 }
 
+  /// @cond undocumented
   template
   template
 inline
@@ -422,11 +454,10 @@ namespace __detail
   else
__reset();
 }
-#endif // __glibcxx_out_ptr
+  /// @endcond
 
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
 #endif // __glibcxx_out_ptr
 #endif /* _GLIBCXX_OUT_PTR_H */
-
diff --git a/libstdc++-v3/testsuite/20_util/smartptr.adapt/version.cc 
b/libstdc++-v3/testsuite/20_util/smartptr.adapt/version.cc
new file mode 100644
index 000..5110f8b371e
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/smartptr.adapt/version.cc
@@ -0,0 +1,19 @@
+// { dg-do preprocess { target c++23 } }
+// { dg-add-options no_pch }
+
+#include <memory>
+
+#ifndef __cpp_lib_out_ptr
+# error "Feature test macro for out_ptr is missing in <memory>"
+#elif __cpp_lib_out_ptr < 202106L
+# error "Feature test macro for out_ptr has wrong value in <memory>"
+#endif
+
+#undef __cpp_lib_out_ptr
+#include <version>
+
+#ifndef __cpp_lib_out_ptr
+# error "Feature test macro for out_ptr is missing in <version>"
+#elif __cpp_lib_out_ptr < 202106L
+# error "Feature test macro for out_ptr has wrong value in <version>"
+#endif
-- 
2.41.0
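
Until the documentation gains real examples, the idea behind the adaptor can
be sketched as follows.  This is a deliberately simplified, hypothetical
stand-in written for C++17, not the libstdc++ implementation: out_ptr_sketch,
make_int and demo are invented names, and the sketch only handles
std::unique_ptr with the default deleter and no extra reset arguments.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, much-simplified stand-in for std::out_ptr_t (C++23).
template<typename T>
class out_ptr_sketch
{
  std::unique_ptr<T> &smart_;
  T *raw_ = nullptr;

public:
  // Release any object the smart pointer already owns, so the callee
  // starts from an empty slot.
  explicit out_ptr_sketch (std::unique_ptr<T> &s) : smart_ (s)
  { smart_.reset (); }

  out_ptr_sketch (const out_ptr_sketch &) = delete;
  out_ptr_sketch &operator= (const out_ptr_sketch &) = delete;

  // Hand the C-style function a place to store its result.
  operator T ** () { return &raw_; }

  // When the temporary dies (at the end of the full-expression that
  // contains the call), give whatever was stored to the smart pointer.
  ~out_ptr_sketch ()
  {
    if (raw_)
      smart_.reset (raw_);
  }
};

// Example C-style API that returns an object through an output parameter.
inline int
make_int (int **out, int value)
{
  *out = new int (value);
  return 0;
}

// Call the C-style API without ever touching a raw owning pointer.
inline int
demo ()
{
  std::unique_ptr<int> p;
  int rc = make_int (out_ptr_sketch<int> (p), 42);
  assert (rc == 0 && p && *p == 42);
  return *p;
}
```

With C++23 support, the call in demo would be written with the real adaptor
as make_int (std::out_ptr (p), 42).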



Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-17 Thread David Edelsohn
On Fri, Nov 17, 2023 at 3:46 AM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović  wrote:
> >
> > [snip]
> >> Sure, but my patch does insert --disable-shared:
> >>
> >> --8<---cut here---start->8---
> >> host_modules= { module= gettext; bootstrap=true; no_install=true;
> >> module_srcdir= "gettext/gettext-runtime";
> >> // We always build gettext with pic, because some packages (e.g. gdbserver)
> >> // need it in some configurations, which is determined via nontrivial tests.
> >> // Always enabling pic seems to make sense for something tied to
> >> // user-facing output.
> >> extra_configure_flags='--disable-shared --disable-java --disable-csharp --with-pic';
> >> lib_path=intl/.libs; };
> >> --8<---cut here---end--->8---
> >>
> >> ... and it is applied:
> >>
> >> --8<---cut here---start->8---
> >> -bash-5.1$ ./config.status --config
> >> --srcdir=../../gcc/gettext/gettext-runtime --cache-file=./config.cache
> >>   --disable-werror --with-gmp=/opt/cfarm
> >>   --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
> >>   --with-included-gettext --program-transform-name=s,y,y,
> >>   --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
> >>   --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
> >>   --disable-intermodule --enable-checking=yes,types,extra
> >>   --disable-coverage --enable-languages=c,c++
> >>   --disable-build-format-warnings --disable-shared --disable-java
> >>   --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
> >>   host_alias=powerpc-ibm-aix7.3.1.0 target_alias=powerpc-ibm-aix7.3.1.0
> >>   CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
> >>   -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
> >> --8<---cut here---end--->8---
> >>
> >> I'm unsure how to tell what the produced binaries are w.r.t static or
> >> shared, but I only see .o files inside intl/.libs/libintl.a, while I see
> >> a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)
> >>
> >
> > An AIX shared library created by libtool will look like
> > libfoo.a[libfoo.so.N], where N is the package major version number.
> > Normally with one file.
>
> > An AIX static library will look like libfoo.a[a.o, b.o, c.o]
> > with multiple object files.
> >
> > An AIX archive can contain a combination of shared objects and
> > normal object files.
> >
> > AIX normally uses the convention shr.o or shr_64.o for the name
> > of the shared object file.  Hint, hint, an AIX archive can contain
> > both 32 bit and 64 bit object files or shared objects.
> >
> > I don't know why the gettext build system would create
> > /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8)
> > if --disable-shared was requested.  That clearly is using the
> > naming of a libtool AIX shared object and failing due to
> > the missing shared object.  Although in this case, the problem
> > seems to be the shared library load path.  AIX uses LIBPATH,
> > not LD_LIBRARY_PATH.
>
> It doesn't create libintl.a with a libintl.so.8 inside of it.  The
> libintl.a contains a bunch of objects, as I'd expect of a static
> library:
>
> --8<---cut here---start->8---
> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a  | grep libintl
> -bash-5.1$ ar -t gettext/intl/.libs/libintl.a
> bindtextdom.o
> dcgettext.o
> ...
> --8<---cut here---end--->8---
>
>
> > Also, for me, the out of tree path was
> >
> > gettext/gettext-runtime/intl/.libs
> >
> > Is your search path missing a level?
>
> No, the above is generated by the GCC build system and builds
> gettext-runtime directly (per Bruno's recommendation a while ago) as it
> is replacing intl/ of similar functionality.
>
> I'm currently building GCC with libintl with the threads hack you
> mentioned applied (as I got undefined references to the pthread
> functions you discovered).  I suspect that, bar this issue (which, IIUC,
> Bruno will fix in a new release?) the patch above will fix the issues
> you've encountered on AIX (note that if you want to use gettext in-tree,
> you'd still have to fetch gettext into the tree).
>
> Maybe we should provide a download-prerequisite-y script that skips
> everything but GNU gettext, to retain same behavior?
>
> Have a lovely day.
>

I'm concerned that the gettext fixes are working around AIX support for
libpthread.a as opposed to making --disable-threads function.

Using mtx_* via --enable-threads=isoc is a workaround, but it still does not
allow users to truly disable threads.

Thanks, David


[committed] libstdc++: Fix Doxygen markup

2023-11-17 Thread Jonathan Wakely
Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h: Fix Doxygen markup.
---
 libstdc++-v3/include/bits/chrono_io.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h 
b/libstdc++-v3/include/bits/chrono_io.h
index 7352df095ff..16e8fc58dff 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2242,7 +2242,7 @@ namespace __detail
 using _Parser_t = _Parser>;
 
 } // namespace __detail
-/// ~endcond
+/// @endcond
 
   template>
-- 
2.41.0



Re: [PATCH] c++, v2: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]

2023-11-17 Thread Jakub Jelinek
On Fri, Nov 17, 2023 at 09:18:39AM -0500, Jason Merrill wrote:
> You recently pinged this patch, but I haven't seen an update since this
> review?

Oops, sorry, I've missed this and DR 2406 review posts in my inbox
during vacation, will get to that momentarily.

Thanks.

Jakub



Re: Darwin: Replace environment runpath with embedded [PR88590]

2023-11-17 Thread FX Coudert
>> I have done a full rebuild, and having looked more at the structure of 
>> libtool.m4 I am now convinced that having that line outside of the scope of 
>> _LT_DARWIN_LINKER_FEATURES is simply wrong (probably a copy-pasto or 
>> leftover from earlier code).
>> Having rebuilt everything, it only manifests itself in 
>> fixincludes/ChangeLog. Iain is traveling right now, but when he is back I 
>> would like to submit this patch if he agrees with the above. It was 
>> regtested on x86_64-apple-darwin21.

With the correct patch attached.



0001-Build-fix-error-in-fixinclude-configure.patch
Description: Binary data

