[Bug target/111078] csneg is not used for (cset) * 2 - 1

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111078

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2023-11-18

--- Comment #2 from Andrew Pinski  ---
Mine.
For f0 we should be able to match:
(set (reg:SI 98)
(plus:SI (ashift:SI (eq:SI (reg:CC 66 cc)
(const_int 0 [0]))
(const_int 1 [0x1]))
(const_int -1 [0xffffffffffffffff])))

here. Note -1 is important.

for f1 we should be able to match:
(set (reg:SI 98)
(ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
(const_int 0 [0])))
(const_int 1 [0x1])))

Though I wonder whether, for gimple, we should canonicalize to one form or
another ...
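
Both RTL shapes above yield +1 when the tested condition holds and -1 otherwise, which is why a single conditional-negate instruction could cover them. A minimal sketch of that equivalence at the source level follows. The function names are hypothetical and these are not the f0/f1 test cases from the bug report, which are not quoted in this comment.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not the bug's f0/f1 test cases.

// Shape matched for f0: (cond << 1) + (-1), i.e. (cset) * 2 - 1.
int select_shift_form(int a, int b) {
  return (static_cast<int>(a == b) << 1) - 1;
}

// Shape matched for f1: (-(inverted cond)) | 1.
int select_ior_form(int a, int b) {
  return -static_cast<int>(a != b) | 1;
}

// Reference form: select between 1 and its negation.
int select_reference(int a, int b) {
  return a == b ? 1 : -1;
}

// Asserts that both forms agree with the reference for one input pair.
void check_forms_agree(int a, int b) {
  assert(select_shift_form(a, b) == select_reference(a, b));
  assert(select_ior_form(a, b) == select_reference(a, b));
}
```

Calling `check_forms_agree` with an equal pair and an unequal pair exercises both outcomes.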

[PATCH] LoongArch: Optimize the loading of immediate numbers with the same high and low 32-bit values

2023-11-17 Thread Guo Jie
For the following immediate load operation in 
gcc/testsuite/gcc.target/loongarch/imm-load1.c:

long long r = 0x0101010101010101;

Before this patch:

lu12i.w $r15,16842752>>12
ori $r15,$r15,257
lu32i.d $r15,0x1010100000000>>32
lu52i.d $r15,$r15,0x100000000000000>>52

After this patch:

lu12i.w $r15,16842752>>12
ori $r15,$r15,257
bstrins.d   $r15,$r15,63,32
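
The idea behind the shorter sequence can be sketched in plain C++ as follows. This is an illustrative model only, not the GCC implementation: the helper names are hypothetical, and `mirror_low_half` merely models what `bstrins.d rd,rj,63,32` does when `rd` and `rj` are the same register.

```cpp
#include <cassert>
#include <cstdint>

// True when the upper and lower 32-bit halves of VALUE are identical,
// corresponding to the `loval == hival` test added by the patch.
bool halves_match(std::uint64_t value) {
  std::uint64_t hival = value >> 32;
  std::uint64_t loval = value & 0xffffffffULL;
  return hival == loval;
}

// Models bstrins.d with identical source and destination registers and
// the bit range 63..32: bits 31..0 are copied into bits 63..32, and the
// low half is left unchanged.
std::uint64_t mirror_low_half(std::uint64_t reg) {
  std::uint64_t low = reg & 0xffffffffULL;
  return (low << 32) | low;
}

// Rebuilds VALUE from its low half alone, as the new method does after
// lu12i.w + ori have loaded the low 32 bits.
std::uint64_t build_mirrored(std::uint64_t value) {
  assert(halves_match(value));
  std::uint64_t reg = value & 0xffffffffULL;
  return mirror_low_half(reg);
}
```

For the constant in the test case, `build_mirrored(0x0101010101010101)` reproduces the full 64-bit value from the two-instruction low half plus one mirroring step.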

gcc/ChangeLog:

* config/loongarch/loongarch.cc (enum loongarch_load_imm_method): Add 
new method.
(loongarch_build_integer): Add relevant implementations for new method.
(loongarch_move_integer): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load1.c: Change old check.
---
 gcc/config/loongarch/loongarch.cc | 22 ++-
 .../gcc.target/loongarch/imm-load1.c  |  3 ++-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index d05743bec87..58c00344d09 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -142,12 +142,16 @@ struct loongarch_address_info
 
METHOD_LU52I:
  Load 52-63 bit of the immediate number.
+
+   METHOD_MIRROR:
+ Copy 0-31 bit of the immediate number to 32-63bit.
 */
 enum loongarch_load_imm_method
 {
   METHOD_NORMAL,
   METHOD_LU32I,
-  METHOD_LU52I
+  METHOD_LU52I,
+  METHOD_MIRROR
 };
 
 struct loongarch_integer_op
@@ -1556,11 +1560,23 @@ loongarch_build_integer (struct loongarch_integer_op 
*codes,
 
   int sign31 = (value & (HOST_WIDE_INT_1U << 31)) >> 31;
   int sign51 = (value & (HOST_WIDE_INT_1U << 51)) >> 51;
+
+  unsigned HOST_WIDE_INT hival = value >> 32;
+  unsigned HOST_WIDE_INT loval = value << 32 >> 32;
+
   /* Determine whether the upper 32 bits are sign-extended from the lower
 32 bits. If it is, the instructions to load the high order can be
 ommitted.  */
   if (lu32i[sign31] && lu52i[sign31])
return cost;
+  /* If the lower 32 bits are the same as the upper 32 bits, just copy
+the lower 32 bits to the upper 32 bits.  */
+  else if (loval == hival)
+   {
+ codes[cost].method = METHOD_MIRROR;
+ codes[cost].curr_value = value;
+ return cost + 1;
+   }
   /* Determine whether bits 32-51 are sign-extended from the lower 32
 bits. If so, directly load 52-63 bits.  */
   else if (lu32i[sign31])
@@ -3230,6 +3246,10 @@ loongarch_move_integer (rtx temp, rtx dest, unsigned 
HOST_WIDE_INT value)
   gen_rtx_AND (DImode, x, GEN_INT (0xf)),
   GEN_INT (codes[i].value));
  break;
+   case METHOD_MIRROR:
+ gcc_assert (mode == DImode);
+ emit_insn (gen_insvdi (x, GEN_INT (32), GEN_INT (32), x));
+ break;
default:
  gcc_unreachable ();
}
diff --git a/gcc/testsuite/gcc.target/loongarch/imm-load1.c 
b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
index 2ff02971239..f64cc2956a3 100644
--- a/gcc/testsuite/gcc.target/loongarch/imm-load1.c
+++ b/gcc/testsuite/gcc.target/loongarch/imm-load1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mabi=lp64d -O2" } */
-/* { dg-final { scan-assembler "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } */
+/* { dg-final { scan-assembler-not "test:.*lu52i\.d.*\n\taddi\.w.*\n\.L2:" } } 
*/
+/* { dg-final { scan-assembler "test:.*lu12i\.w.*\n\tbstrins\.d.*\n\.L2:" } } 
*/
 
 
 extern long long b[10];
-- 
2.20.1



[Bug target/98477] aarch64: Unnecessary GPR -> FPR moves for conditional select

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98477

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Andrew Pinski  ---
I am going to look into this ...

[Bug target/112454] csinc (csel is though) is not being used when there is matches twice

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112454

--- Comment #3 from Andrew Pinski  ---

-1/~0 has the same issue as mentioned:
```
int finv(int a, int b, int c, int d)
{
  return (a == 2 ? -1 : b) + (c == 3 ? -1 : d);
}
```

[Bug target/112454] csinc (csel is though) is not being used when there is matches twice

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112454

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2023-11-18
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org

--- Comment #2 from Andrew Pinski  ---
Mine. This looks like a cost issue: the costs do not record that 1 (and ~0) are
free to create.

You can see the cost issue in the combine dump for the case of one csel:
Trying 37 -> 39:
   37: r98:SI=0x1
   39: r92:SI={(cc:CC!=0)?r100:SI:r98:SI}
  REG_DEAD r100:SI
  REG_DEAD cc:CC
  REG_DEAD r98:SI
Successfully matched this instruction:
(set (reg:SI 92 [  ])
(if_then_else:SI (ne (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 100)
(const_int 1 [0x1])))
allowing combination of insns 37 and 39
original costs 4 + 4 = 8
replacement cost 8
deferring deletion of insn with uid = 37.
modifying insn i3    39: r92:SI={(cc:CC!=0)?r100:SI:0x1}
  REG_DEAD cc:CC
  REG_DEAD r100:SI
deferring rescan insn with uid = 39.

The replacement cost should still be 4.
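
For orientation, source shapes of the kind under discussion might look like the sketch below. These functions are hypothetical: only the -1 variant (`finv`) is quoted in this thread, and the versions using the constant 1 are an assumption based on the bug title and the dump above.

```cpp
#include <cassert>  // only needed for the checks in the usage note

// Hypothetical shapes; not the test case attached to the bug.

// One select against the constant 1, matching the single-csel dump
// where r98 = 1 is folded into the if_then_else.
int one_select(int a, int b) {
  return a == 2 ? 1 : b;
}

// Two selects against 1 in the same function, the "matches twice"
// situation named in the bug title.
int two_selects(int a, int b, int c, int d) {
  return (a == 2 ? 1 : b) + (c == 3 ? 1 : d);
}
```

A quick sanity check of the semantics: `two_selects(2, 7, 3, 9)` returns 2, and `two_selects(0, 7, 0, 9)` returns 16.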

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-17 Thread waffl3x
The patch is coming along; I just have a quick question regarding
style. I use IILEs (immediately invoked lambda expressions) a
whole lot in my own code. I know that their use is controversial in
general, so I would prefer to ask instead of suddenly submitting the patch
with a bunch of them. I wouldn't have bothered with them either, but this
part is really miserable without them.

If that would be okay, I would suggest an additional exception to
bracing style for lambdas.
This:
[](){
  // stuff
};
Instead of this:
[]()
  {
// stuff
  };

This is especially important for the IILE pattern IMO; otherwise it looks
really mediocre. If this isn't okay, I'll refactor all the IILEs that I
added, or just name them and call them instead. Whatever you think is
most appropriate.
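
For readers unfamiliar with the pattern, a minimal sketch of an IILE written in the compact brace style proposed above is shown here. It is a hypothetical example, not code from the patch; `classify` and its strings are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical example of an immediately invoked lambda expression.
// The lambda is called on the spot, so KIND can be const even though
// computing it needs several branches.
std::string classify(int n) {
  const std::string kind = [&](){
    if (n < 0)
      return std::string("negative");
    if (n == 0)
      return std::string("zero");
    return std::string("positive");
  }();
  return kind;
}
```

The alternative mentioned above, naming the lambda and calling it on a separate line, computes the same result.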

Alex


[Bug tree-optimization/112416] absu is not detected

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112416

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
 CC||pinskia at gcc dot gnu.org
   Last reconfirmed||2023-11-18
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot 
gnu.org

--- Comment #1 from Andrew Pinski  ---
Mine.

[Bug target/112519] [14 Regression] wrong code with __builtin_sub_overflow_p() on x86_64-pc-linux-gnu

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112519

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from Andrew Pinski  ---
Fixed.

[Bug target/112518] [14 Regression] wrong code with __builtin_mul_overflow_p() and int128_t on x86_64-pc-linux-gnu

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112518

--- Comment #5 from Andrew Pinski  ---
Works for me with r14-5566-g841008d3966c0f .

[Bug middle-end/112560] [14 Regression] ICE in try_combine on pr112494.c

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-11-18
 CC||pinskia at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #2 from Andrew Pinski  ---
Confirmed.

[Bug middle-end/112560] [14 Regression] ICE in try_combine on pr112494.c

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560

--- Comment #1 from Andrew Pinski  ---
This code seems to have been originally added with
https://gcc.gnu.org/pipermail/gcc-patches/2011-April/311365.html 

The discussion on the patch continues into May:
https://gcc.gnu.org/pipermail/gcc-patches/2011-May/312415.html

[PATCH v2 9/9] RISC-V: Disable fractional type intrinsics for the XTheadVector extension

2023-11-17 Thread Jun Sha (Joshua)
Because the XTheadVector extension does not support fractional
operations, we need to delete the related intrinsics.

The types involved are as follows:
v(u)int8mf8_t,
v(u)int8mf4_t,
v(u)int8mf2_t,
v(u)int16mf4_t,
v(u)int16mf2_t,
v(u)int32mf2_t,
vfloat16mf4_t,
vfloat16mf2_t,
vfloat32mf2_t
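
The filtering this implies can be modelled roughly as below. This is a simplified, hypothetical sketch that works on type names rather than machine modes; it is not the GCC implementation, and it assumes the XTheadVector extension is enabled. An intrinsic is kept only if neither its return type nor any argument type is fractional.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: a type counts as fractional when its name carries
// an mf2, mf4, or mf8 LMUL marker, as in the list above.
bool is_fractional_type(const std::string &name) {
  return name.find("mf2") != std::string::npos
         || name.find("mf4") != std::string::npos
         || name.find("mf8") != std::string::npos;
}

// Returns true when an intrinsic with return type RET and argument
// types ARGS should still be generated.
bool keep_intrinsic(const std::string &ret,
                    const std::vector<std::string> &args) {
  if (is_fractional_type(ret))
    return false;
  for (const auto &arg : args)
    if (is_fractional_type(arg))
      return false;
  return true;
}
```

Under this model, an intrinsic taking and returning `vint8m1_t` is kept, while one that mentions `vint8mf8_t` anywhere in its signature is dropped.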

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_v_ext_mode_p):
New extern.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
New function.
(build_one): If the checked types fail, no function is generated.
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/fractional-type.c: New test.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-shapes.cc |  22 +++
 gcc/config/riscv/riscv-vector-switch.def  | 144 +-
 gcc/config/riscv/riscv.cc |   2 +-
 .../gcc.target/riscv/rvv/fractional-type.c|  79 ++
 5 files changed, 175 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/fractional-type.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8cdfadbcf10..7de4f81aa9a 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -153,6 +153,7 @@ extern poly_uint64 riscv_regmode_natural_size 
(machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_v_ext_tuple_mode_p (machine_mode);
 extern bool riscv_v_ext_vls_mode_p (machine_mode);
+extern bool riscv_v_ext_mode_p (machine_mode);
 extern int riscv_get_v_regno_alignment (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
 extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index e24c535e496..dcdb9506ff2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -33,6 +33,24 @@
 
 namespace riscv_vector {
 
+/* Check whether the RET and ARGS are valid for the function.  */
+
+static bool
+check_type (tree ret, vec<tree> &args)
+{
+  tree arg;
+  unsigned i;
+
+  if (!ret || (builtin_type_p (ret) && !riscv_v_ext_mode_p (TYPE_MODE (ret))))
+    return false;
+
+  FOR_EACH_VEC_ELT (args, i, arg)
+    if (!arg || (builtin_type_p (arg) && !riscv_v_ext_mode_p (TYPE_MODE (arg))))
+      return false;
+
+  return true;
+}
+
 /* Add one function instance for GROUP, using operand suffix at index OI,
mode suffix at index PAIR && bi and predication suffix at index pred_idx.  
*/
 static void
@@ -49,6 +67,10 @@ build_one (function_builder &b, const function_group_info &group,
 group.ops_infos.types[vec_type_idx].index);
   b.allocate_argument_types (function_instance, argument_types);
   b.apply_predication (function_instance, return_type, argument_types);
+
+  if (TARGET_XTHEADVECTOR && !check_type (return_type, argument_types))
+return;
+
   b.add_overloaded_function (function_instance, *group.shape);
   b.add_unique_function (function_instance, (*group.shape), return_type,
 argument_types);
diff --git a/gcc/config/riscv/riscv-vector-switch.def 
b/gcc/config/riscv/riscv-vector-switch.def
index 5c9f9bcbc3e..f17f87f89c9 100644
--- a/gcc/config/riscv/riscv-vector-switch.def
+++ b/gcc/config/riscv/riscv-vector-switch.def
@@ -81,39 +81,39 @@ ENTRY (RVVM8QI, true, LMUL_8, 1)
 ENTRY (RVVM4QI, true, LMUL_4, 2)
 ENTRY (RVVM2QI, true, LMUL_2, 4)
 ENTRY (RVVM1QI, true, LMUL_1, 8)
-ENTRY (RVVMF2QI, true, LMUL_F2, 16)
-ENTRY (RVVMF4QI, true, LMUL_F4, 32)
-ENTRY (RVVMF8QI, TARGET_MIN_VLEN > 32, LMUL_F8, 64)
+ENTRY (RVVMF2QI, !TARGET_XTHEADVECTOR, LMUL_F2, 16)
+ENTRY (RVVMF4QI, !TARGET_XTHEADVECTOR, LMUL_F4, 32)
+ENTRY (RVVMF8QI, (TARGET_MIN_VLEN > 32) && !TARGET_XTHEADVECTOR, LMUL_F8, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32.  */
 ENTRY (RVVM8HI, true, LMUL_8, 2)
 ENTRY (RVVM4HI, true, LMUL_4, 4)
 ENTRY (RVVM2HI, true, LMUL_2, 8)
 ENTRY (RVVM1HI, true, LMUL_1, 16)
-ENTRY (RVVMF2HI, true, LMUL_F2, 32)
-ENTRY (RVVMF4HI, TARGET_MIN_VLEN > 32, LMUL_F4, 64)
+ENTRY (RVVMF2HI, !TARGET_XTHEADVECTOR, LMUL_F2, 32)
+ENTRY (RVVMF4HI, (TARGET_MIN_VLEN > 32) && !TARGET_XTHEADVECTOR, LMUL_F4, 64)
 
 /* Disable modes if TARGET_MIN_VLEN == 32 or !TARGET_VECTOR_ELEN_FP_16.  */
 ENTRY (RVVM8HF, TARGET_VECTOR_ELEN_FP_16, LMUL_8, 2)
 ENTRY (RVVM4HF, TARGET_VECTOR_ELEN_FP_16, LMUL_4, 4)
 ENTRY (RVVM2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_2, 8)
 ENTRY (RVVM1HF, TARGET_VECTOR_ELEN_FP_16, LMUL_1, 16)
-ENTRY (RVVMF2HF, TARGET_VECTOR_ELEN_FP_16, LMUL_F2, 32)
-ENTRY (RVVMF4HF, TARGET_VECTOR_ELEN_FP_16 && 

[PATCH v2 8/9] RISC-V: Add support for xtheadvector-specific load/store intrinsics

2023-11-17 Thread Jun Sha (Joshua)
This patch adds generation of the XTheadVector-specific
load/store instructions.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class th_loadstore_width): Define new builtin bases.
(BASE): Define new builtin bases.
* config/riscv/riscv-vector-builtins-bases.h:
Define new builtin class.
* config/riscv/riscv-vector-builtins-functions.def (vlsegff):
Include thead-vector-builtins-functions.def.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct th_loadstore_width_def): Define new builtin shapes.
(struct th_indexed_loadstore_width_def):
Define new builtin shapes.
(SHAPE): Define new builtin shapes.
* config/riscv/riscv-vector-builtins-shapes.h:
Define new builtin shapes.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
(vint8m1_t): Add datatypes for XTheadVector.
(vint8m2_t): Likewise.
(vint8m4_t): Likewise.
(vint8m8_t): Likewise.
(vint16m1_t): Likewise.
(vint16m2_t): Likewise.
(vint16m4_t): Likewise.
(vint16m8_t): Likewise.
(vint32m1_t): Likewise.
(vint32m2_t): Likewise.
(vint32m4_t): Likewise.
(vint32m8_t): Likewise.
(vint64m1_t): Likewise.
(vint64m2_t): Likewise.
(vint64m4_t): Likewise.
(vint64m8_t): Likewise.
(vuint8m1_t): Likewise.
(vuint8m2_t): Likewise.
(vuint8m4_t): Likewise.
(vuint8m8_t): Likewise.
(vuint16m1_t): Likewise.
(vuint16m2_t): Likewise.
(vuint16m4_t): Likewise.
(vuint16m8_t): Likewise.
(vuint32m1_t): Likewise.
(vuint32m2_t): Likewise.
(vuint32m4_t): Likewise.
(vuint32m8_t): Likewise.
(vuint64m1_t): Likewise.
(vuint64m2_t): Likewise.
(vuint64m4_t): Likewise.
(vuint64m8_t): Likewise.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_I8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_I32_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U8_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U16_OPS): Add datatypes for XTheadVector.
(DEF_RVV_U32_OPS): Add datatypes for XTheadVector.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/thead-vector-builtins-functions.def: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c: New test.
* gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 122 +++
 .../riscv/riscv-vector-builtins-bases.h   |  30 ++
 .../riscv/riscv-vector-builtins-functions.def |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 100 ++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 300 +-
 .../riscv/thead-vector-builtins-functions.def |  30 ++
 gcc/config/riscv/thead-vector.md  | 235 ++
 gcc/config/riscv/vector.md|   1 +
 .../riscv/rvv/xtheadvector/vlb-vsb.c  |  68 
 .../riscv/rvv/xtheadvector/vlbu-vsb.c |  68 
 .../riscv/rvv/xtheadvector/vlh-vsh.c  |  68 
 .../riscv/rvv/xtheadvector/vlhu-vsh.c |  68 
 .../riscv/rvv/xtheadvector/vlw-vsw.c  |  68 
 .../riscv/rvv/xtheadvector/vlwu-vsw.c |  68 
 16 files changed, 1349 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/thead-vector-builtins-functions.def
 create mode 100644 gcc/config/riscv/thead-vector.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlb-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlbu-vsb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlh-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlhu-vsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlw-vsw.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/vlwu-vsw.c

diff --git 

[PATCH v2 6/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part4)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate
some typical tests in testsuite/gcc.target/riscv/rvv/base.

This patch is adding some tests for ternary and unary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-8.c: New test.
* gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-9.c: New test.
* gcc.target/riscv/rvv/xtheadvector/unop_v_constraint-1.c: New test.
---
 .../rvv/xtheadvector/ternop_vv_constraint-1.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-2.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-3.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-4.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-5.c |  83 +++
 .../rvv/xtheadvector/ternop_vv_constraint-6.c |  83 +++
 .../rvv/xtheadvector/ternop_vx_constraint-1.c |  71 ++
 .../rvv/xtheadvector/ternop_vx_constraint-2.c |  38 +
 .../rvv/xtheadvector/ternop_vx_constraint-3.c | 125 +
 .../rvv/xtheadvector/ternop_vx_constraint-4.c | 123 +
 .../rvv/xtheadvector/ternop_vx_constraint-5.c | 123 +
 .../rvv/xtheadvector/ternop_vx_constraint-6.c | 130 ++
 .../rvv/xtheadvector/ternop_vx_constraint-7.c | 130 ++
 .../rvv/xtheadvector/ternop_vx_constraint-8.c |  71 ++
 .../rvv/xtheadvector/ternop_vx_constraint-9.c |  71 ++
 .../rvv/xtheadvector/unop_v_constraint-1.c|  68 +
 16 files changed, 1448 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vx_constraint-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/unop_v_constraint-1.c

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
new file mode 100644
index 000..d98755e7040
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/ternop_vv_constraint-1.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcxtheadvector -mabi=ilp32d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include "riscv_th_vector.h"
+
+/*
+** f1:
+**  ...
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** th.vma[c-d][c-d]\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** 

[PATCH v2 5/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part3)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate
some typical tests in testsuite/gcc.target/riscv/rvv/base.

This patch is adding some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-31.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-33.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-34.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-35.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-36.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-37.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-38.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-39.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-40.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-41.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-42.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-43.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-44.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-45.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-46.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-47.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-48.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-49.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-50.c: New test.
---
 .../rvv/xtheadvector/binop_vx_constraint-31.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-32.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-33.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-34.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-35.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-36.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-37.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-38.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-39.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-40.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-41.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-42.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-43.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-44.c |  73 +++
 .../rvv/xtheadvector/binop_vx_constraint-45.c | 123 ++
 .../rvv/xtheadvector/binop_vx_constraint-46.c |  72 ++
 .../rvv/xtheadvector/binop_vx_constraint-47.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-48.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-49.c |  16 +++
 .../rvv/xtheadvector/binop_vx_constraint-50.c |  18 +++
 20 files changed, 1238 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-31.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-33.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-34.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-35.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-36.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-37.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-38.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-39.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-40.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-41.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-42.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-43.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-44.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-45.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-46.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-47.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-48.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-49.c
 create mode 

[PATCH v2 4/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part2)

2023-11-17 Thread Jun Sha (Joshua)
For big changes in instruction generation, we can only duplicate
some typical tests in testsuite/gcc.target/riscv/rvv/base.

This patch is adding some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-11.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-12.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-13.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-14.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-15.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-16.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-17.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-18.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-19.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-20.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-21.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-22.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-23.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-24.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-25.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-26.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-27.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-28.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-29.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-30.c: New test.
---
 .../rvv/xtheadvector/binop_vx_constraint-11.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-12.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-13.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-14.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-15.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-16.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-17.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-18.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-19.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-20.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-21.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-22.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-23.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-24.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-25.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-26.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-27.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-28.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-29.c | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-30.c | 68 +
 20 files changed, 1405 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-13.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-14.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-15.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-16.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-17.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-18.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-19.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-20.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-21.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-22.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-23.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-24.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-25.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-26.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-27.c
 create mode 100644 

[PATCH v2 3/9] RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part1)

2023-11-17 Thread Jun Sha (Joshua)
Because instruction generation changes substantially, we can only
duplicate some typical tests from testsuite/gcc.target/riscv/rvv/base.

This patch adds some tests for binary operations.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-10.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-3.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-4.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-5.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-6.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-7.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-8.c: New test.
* gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-9.c: New test.
* gcc.target/riscv/rvv/xtheadvector/rvv-xtheadvector.exp: New test.
---
 .../rvv/xtheadvector/binop_vv_constraint-1.c  | 68 +
 .../rvv/xtheadvector/binop_vv_constraint-3.c  | 27 +++
 .../rvv/xtheadvector/binop_vv_constraint-4.c  | 27 +++
 .../rvv/xtheadvector/binop_vv_constraint-5.c  | 29 
 .../rvv/xtheadvector/binop_vv_constraint-6.c  | 28 +++
 .../rvv/xtheadvector/binop_vv_constraint-7.c  | 29 
 .../rvv/xtheadvector/binop_vx_constraint-1.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-10.c | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-2.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-3.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-4.c  | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-5.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-6.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-7.c  | 68 +
 .../rvv/xtheadvector/binop_vx_constraint-8.c  | 73 +++
 .../rvv/xtheadvector/binop_vx_constraint-9.c  | 68 +
 .../rvv/xtheadvector/rvv-xtheadvector.exp | 41 +++
 17 files changed, 939 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vx_constraint-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/rvv-xtheadvector.exp

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
new file mode 100644
index 000..172dfb6c228
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/binop_vv_constraint-1.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcxtheadvector -mabi=ilp32d -O3" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+#include "riscv_th_vector.h"
+
+/*
+** f1:
+**  ...
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)
+** th.vle\.v\tv[0-9]+,0\([a-x0-9]+\)

[PATCH v2 2/9] RISC-V: Handle differences between xtheadvector and vector

2023-11-17 Thread Jun Sha (Joshua)
This patch handles the differences in instruction generation
between vector and xtheadvector, mainly by adding the th. prefix
to all xtheadvector instructions.
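
As an illustration only (not part of the patch), the prefixing idea can be
sketched as a small template expander. The function name expand_th_prefix
and the template string "%^vle.v" are assumptions made for this sketch; the
patch itself only shows that the '^' punctuation character prints "th." when
TARGET_XTHEADVECTOR is set and prints nothing otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy TEMPL into OUT, replacing every "%^" with
   "th." when XTHEADVECTOR is nonzero and with nothing otherwise.
   Returns 0 on success, -1 if OUT (of size LEN) is too small.  */
int
expand_th_prefix (const char *templ, int xtheadvector, char *out, size_t len)
{
  size_t o = 0;

  if (len == 0)
    return -1;

  for (size_t i = 0; templ[i] != '\0';)
    {
      const char *piece;
      size_t piece_len;

      if (templ[i] == '%' && templ[i + 1] == '^')
        {
          /* The punctuation directive takes no operand.  */
          piece = xtheadvector ? "th." : "";
          piece_len = strlen (piece);
          i += 2;
        }
      else
        {
          /* Any other character is copied through unchanged.  */
          piece = &templ[i];
          piece_len = 1;
          i += 1;
        }

      /* Keep one byte free for the terminating NUL.  */
      if (o + piece_len >= len)
        return -1;
      memcpy (out + o, piece, piece_len);
      o += piece_len;
    }

  out[o] = '\0';
  return 0;
}
```

With this sketch, "%^vle.v" expands to "th.vle.v" for XTheadVector and to
"vle.v" for the standard vector extension, so one template can serve both.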

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* config.gcc: Add header for XTheadVector intrinsics.
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic):
Add XTheadVector.
* config/riscv/riscv.cc (riscv_print_operand):
Add new operand format directives.
(riscv_print_operand_punct_valid_p): Likewise.
* config/riscv/vector-iterators.md: Split any_int_unop
for not and neg.
* config/riscv/vector.md (@pred_):
Add th. for xtheadvector instructions.
* config/riscv/riscv_th_vector.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-c.cc   |   4 +-
 gcc/config/riscv/riscv.cc |  11 +-
 gcc/config/riscv/riscv_th_vector.h|  49 ++
 gcc/config/riscv/vector-iterators.md  |   4 +
 gcc/config/riscv/vector.md| 777 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 7 files changed, 466 insertions(+), 383 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ba6d63e33ac..e0fc2b1a27c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -548,7 +548,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 184fff905b2..0a17d5f6656 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -194,8 +194,8 @@ riscv_pragma_intrinsic (cpp_reader *)
 {
   if (!TARGET_VECTOR)
{
- error ("%<#pragma riscv intrinsic%> option %qs needs 'V' extension "
-"enabled",
+ error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
+"'XTHEADVECTOR' extension enabled",
 name);
  return;
}
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ecee7eb4727..754107cdaac 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5323,7 +5323,7 @@ riscv_get_v_regno_alignment (machine_mode mode)
 static void
 riscv_print_operand (FILE *file, rtx op, int letter)
 {
-  /* `~` does not take an operand so op will be null
+  /* `~` and '^' does not take an operand so op will be null
  Check for before accessing op.
   */
   if (letter == '~')
@@ -5332,6 +5332,13 @@ riscv_print_operand (FILE *file, rtx op, int letter)
fputc('w', file);
   return;
 }
+
+  if (letter == '^')
+{
+  if (TARGET_XTHEADVECTOR)
+   fputs ("th.", file);
+  return;
+}
   machine_mode mode = GET_MODE (op);
   enum rtx_code code = GET_CODE (op);
 
@@ -5584,7 +5591,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 static bool
 riscv_print_operand_punct_valid_p (unsigned char code)
 {
-  return (code == '~');
+  return (code == '~' || code == '^');
 }
 
 /* Implement TARGET_PRINT_OPERAND_ADDRESS.  */
diff --git a/gcc/config/riscv/riscv_th_vector.h 
b/gcc/config/riscv/riscv_th_vector.h
new file mode 100644
index 000..194652032bc
--- /dev/null
+++ b/gcc/config/riscv/riscv_th_vector.h
@@ -0,0 +1,49 @@
+/* RISC-V 'XTheadVector' Extension intrinsics include file.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef 

[PATCH v2 1/9] RISC-V: minimal support for xtheadvector

2023-11-17 Thread Jun Sha (Joshua)
This patch introduces basic XTheadVector support
(march string parsing and a test for __riscv_xtheadvector)
according to https://github.com/T-head-Semi/thead-extension-spec/
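
As a rough sketch (not part of the patch), the value that the new test expects
for __riscv_th_v_intrinsic can be reproduced as follows. The helper name
ext_version_value is hypothetical, and the encoding major * 1000000 +
minor * 1000 is an assumption; it is consistent with the test in this patch,
which checks for 11000 when the macro is defined from version (0, 11).

```c
/* Hypothetical stand-in for the version encoding: the assumed scheme is
   major * 1000000 + minor * 1000.  */
int
ext_version_value (unsigned major, unsigned minor)
{
  return (int) (major * 1000000u + minor * 1000u);
}

/* Return 1 if the compiler defines __riscv_th_v_intrinsic with at least
   the given version, 0 otherwise (including on non-XTheadVector targets).  */
int
have_th_v_intrinsic (unsigned major, unsigned minor)
{
#ifdef __riscv_th_v_intrinsic
  return __riscv_th_v_intrinsic >= ext_version_value (major, minor);
#else
  (void) major;
  (void) minor;
  return 0;
#endif
}
```

User code could guard XTheadVector-specific intrinsics with a check of this
kind, in the same way the new test guards on the macro value.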

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::parse): Add new vendor extension.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Add test macro.
* config/riscv/riscv.opt: Add new mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-__riscv_th_v_intrinsic.c: New test.
* gcc.target/riscv/rvv/xtheadvector.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc | 10 ++
 gcc/config/riscv/riscv-c.cc |  4 
 gcc/config/riscv/riscv.opt  |  2 ++
 .../riscv/predef-__riscv_th_v_intrinsic.c   | 11 +++
 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c   | 13 +
 5 files changed, 40 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 526dbb7603b..914924171fd 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -75,6 +75,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"v", "zvl128b"},
   {"v", "zve64d"},
+  {"xtheadvector", "zvl128b"},
+  {"xtheadvector", "zve64d"},
 
   {"zve32f", "f"},
   {"zve64f", "f"},
@@ -325,6 +327,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"xtheadmemidx", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadmempair", ISA_SPEC_CLASS_NONE, 1, 0},
   {"xtheadsync", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xtheadvector", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1495,6 +1498,10 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
 error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
   "extensions", arch);
 
+  if (subset_list->lookup ("v") && subset_list->lookup ("xtheadvector"))
+error_at (loc, "%<-march=%s%>: xtheadvector conflicts with vector "
+  "extensions", arch);
+
   /* 'H' hypervisor extension requires base ISA with 32 registers.  */
   if (subset_list->lookup ("e") && subset_list->lookup ("h"))
 error_at (loc, "%<-march=%s%>: h extension requires i extension", arch);
@@ -1680,6 +1687,9 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"xtheadmemidx",  _options::x_riscv_xthead_subext, MASK_XTHEADMEMIDX},
   {"xtheadmempair", _options::x_riscv_xthead_subext, MASK_XTHEADMEMPAIR},
   {"xtheadsync",_options::x_riscv_xthead_subext, MASK_XTHEADSYNC},
+  {"xtheadvector",  _options::x_riscv_xthead_subext, MASK_XTHEADVECTOR},
+  {"xtheadvector",  _options::x_target_flags, MASK_FULL_V},
+  {"xtheadvector",  _options::x_target_flags, MASK_VECTOR},
 
   {"xventanacondops", _options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
 
diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index b7f9ba204f7..184fff905b2 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -137,6 +137,10 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 riscv_ext_version_value (0, 11));
 }
 
+   if (TARGET_XTHEADVECTOR)
+ builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+riscv_ext_version_value (0, 11));
+
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 70d78151cee..72857aea352 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -438,6 +438,8 @@ Mask(XTHEADMEMPAIR) Var(riscv_xthead_subext)
 
 Mask(XTHEADSYNC)Var(riscv_xthead_subext)
 
+Mask(XTHEADVECTOR)  Var(riscv_xthead_subext)
+
 TargetVariable
 int riscv_xventana_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
new file mode 100644
index 000..1c764241db6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-__riscv_th_v_intrinsic.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64imafdcxtheadvector -mabi=lp64d" } */
+
+int main () {
+
+#if __riscv_th_v_intrinsic != 11000
+#error "__riscv_th_v_intrinsic"
+#endif
+
+  return 0;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c 
b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
new file mode 100644
index 000..d52921e1314
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options 

[PATCH v2 0/9] RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jun Sha (Joshua)
This patch series presents the GCC implementation of the XTheadVector
extension [1].

[1] https://github.com/T-head-Semi/thead-extension-spec/

I updated my patch series, because I forgot to add co-authors in
the last version.

Contributors:
Jun Sha (Joshua) 
Jin Ma 
Christoph Müllner 

RISC-V: minimal support for xtheadvector
RISC-V: Handle differences between xtheadvector and vector
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part1)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part2)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part3)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part4)
RISC-V: Tests for overlapping RVV and XTheadVector instructions (Part5)
RISC-V: Add support for xtheadvector-specific load/store intrinsics
RISC-V: Disable fractional type intrinsics for XTheadVector

---
 gcc/common/config/riscv/riscv-common.cc   |  10 +
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-c.cc   |   8 +-
 gcc/config/riscv/riscv-protos.h   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  | 122 +++
 .../riscv/riscv-vector-builtins-bases.h   |  30 +
 .../riscv/riscv-vector-builtins-functions.def |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 122 +++
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 .../riscv/riscv-vector-builtins-types.def | 120 +++
 gcc/config/riscv/riscv-vector-builtins.cc | 300 ++-
 gcc/config/riscv/riscv-vector-switch.def  | 144 ++--
 gcc/config/riscv/riscv.cc |  13 +-
 gcc/config/riscv/riscv.opt|   2 +
 gcc/config/riscv/riscv_th_vector.h|  49 ++
 .../riscv/thead-vector-builtins-functions.def |  30 +
 gcc/config/riscv/thead-vector.md  | 235 ++
 gcc/config/riscv/vector-iterators.md  |   4 +
 gcc/config/riscv/vector.md| 778 +-
 .../riscv/predef-__riscv_th_v_intrinsic.c |  11 +
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 .../gcc.target/riscv/rvv/fractional-type.c|  79 ++
 .../gcc.target/riscv/rvv/xtheadvector.c   |  13 +
 .../rvv/xtheadvector/autovec/vadd-run-nofm.c  |   4 +
 .../riscv/rvv/xtheadvector/autovec/vadd-run.c |  81 ++
 .../xtheadvector/autovec/vadd-rv32gcv-nofm.c  |  10 +
 .../rvv/xtheadvector/autovec/vadd-rv32gcv.c   |   8 +
 .../xtheadvector/autovec/vadd-rv64gcv-nofm.c  |  10 +
 .../rvv/xtheadvector/autovec/vadd-rv64gcv.c   |   8 +
 .../rvv/xtheadvector/autovec/vadd-template.h  |  70 ++
 .../rvv/xtheadvector/autovec/vadd-zvfh-run.c  |  54 ++
 .../riscv/rvv/xtheadvector/autovec/vand-run.c |  75 ++
 .../rvv/xtheadvector/autovec/vand-rv32gcv.c   |   7 +
 .../rvv/xtheadvector/autovec/vand-rv64gcv.c   |   7 +
 .../rvv/xtheadvector/autovec/vand-template.h  |  61 ++
 .../rvv/xtheadvector/binop_vv_constraint-1.c  |  68 ++
 .../rvv/xtheadvector/binop_vv_constraint-3.c  |  27 +
 .../rvv/xtheadvector/binop_vv_constraint-4.c  |  27 +
 .../rvv/xtheadvector/binop_vv_constraint-5.c  |  29 +
 .../rvv/xtheadvector/binop_vv_constraint-6.c  |  28 +
 .../rvv/xtheadvector/binop_vv_constraint-7.c  |  29 +
 .../rvv/xtheadvector/binop_vx_constraint-1.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-10.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-11.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-12.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-13.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-14.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-15.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-16.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-17.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-18.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-19.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-2.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-20.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-21.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-22.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-23.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-24.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-25.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-26.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-27.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-28.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-29.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-3.c  |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-30.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-31.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-32.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-33.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-34.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-35.c |  73 ++
 .../rvv/xtheadvector/binop_vx_constraint-36.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-37.c |  68 ++
 .../rvv/xtheadvector/binop_vx_constraint-38.c |  68 ++
 

[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661 due to reassoc not handling maybe_undefs

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #10 from Andrew Pinski  ---
I have a patch but I don't like it since it requires touching
gimple-if-to-switch.cc, since that uses init_range_entry.  Maybe someone else
can come up with a better patch.

Re: [PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Kito Cheng
LGTM, that's a really great clean up :)

On Sat, Nov 18, 2023 at 11:12 AM Juzhe-Zhong  wrote:
>
> This patch refactors RVV iterators for easier maintenance.
>
> E.g.
>
> (define_mode_iterator V [
>   RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
>
>   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
>
>   (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
>   (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
>   (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
>
>   RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
>
>   (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
>   (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
> TARGET_MIN_VLEN > 32")
>
>   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
>
>   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
> ])
>
> change it into:
>
> (define_mode_iterator V [VI VF_ZVFHMIN])
>
> gcc/ChangeLog:
>
> * config/riscv/vector-iterators.md: Refactor iterators.
>
> ---
>  gcc/config/riscv/vector-iterators.md | 661 +--
>  1 file changed, 124 insertions(+), 537 deletions(-)
>
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index f04c7fe5491..469875ce67c 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -108,48 +108,49 @@
>UNSPECV_FRM_RESTORE_EXIT
>  ])
>
> -(define_mode_iterator V [
> +(define_mode_iterator VI [
>RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
>
>RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
>
> -  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
> -  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> -
>RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
>
> -  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
> -  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
> -
>(RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
>(RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> +])
> +
> +;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
> +;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
> +;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
> +;; from load, store and convert for it.
> +;; It is not enough to set the "enabled" attribute to false
> +;; since this will only disable insn alternatives in reload but still
> +;; allow the instruction and mode to be matched during combine et al.
> +(define_mode_iterator VF [
> +  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
> +  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
> +  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
> +
> +  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
> +  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
>
>(RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>(RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
>  ])
>
> -(define_mode_iterator V_VLS [
> -  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI 
> "TARGET_MIN_VLEN > 32")
> -
> -  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
> -
> +(define_mode_iterator VF_ZVFHMIN [
>(RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
> (RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
>(RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
>(RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
>
> -  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
> -
>(RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
> (RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
>(RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 
> && TARGET_MIN_VLEN > 32")
>
> -  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
> -  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
> -
>(RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
>(RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")

[PATCH] LoongArch: Modify MUSL_DYNAMIC_LINKER.

2023-11-17 Thread Lulu Cheng
Use no suffix at all in the musl dynamic linker name for hard
float ABI. Use -sf and -sp suffixes in musl dynamic linker name
for soft float and single precision ABIs. The following table
outlines the musl interpreter names for the LoongArch64 ABI names.

musl interpreter            | LoongArch64 ABI
--------------------------- | -----------------
ld-musl-loongarch64.so.1    | loongarch64-lp64d
ld-musl-loongarch64-sp.so.1 | loongarch64-lp64f
ld-musl-loongarch64-sf.so.1 | loongarch64-lp64s
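
For illustration only (not part of the patch), the mapping in the table above
can be sketched as a small lookup. The function names musl_abi_suffix and
musl_interp_name are hypothetical; only the ABI names and the interpreter
file names come from the table.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the musl interpreter suffix for a
   LoongArch64 -mabi= value, or NULL for an unknown ABI.  */
const char *
musl_abi_suffix (const char *abi)
{
  if (strcmp (abi, "lp64d") == 0)
    return "";      /* hard float: no suffix */
  if (strcmp (abi, "lp64f") == 0)
    return "-sp";   /* single precision */
  if (strcmp (abi, "lp64s") == 0)
    return "-sf";   /* soft float */
  return NULL;
}

/* Hypothetical helper: write the interpreter file name for ABI into BUF.
   Returns 0 on success, -1 on unknown ABI or if BUF is too small.  */
int
musl_interp_name (const char *abi, char *buf, size_t len)
{
  const char *suffix = musl_abi_suffix (abi);
  int n;

  if (suffix == NULL)
    return -1;
  n = snprintf (buf, len, "ld-musl-loongarch64%s.so.1", suffix);
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The sketch builds only the file name; the directory the interpreter lives in
is left out because the table does not state it.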

gcc/ChangeLog:

* config/loongarch/gnu-user.h (MUSL_ABI_SPEC): Modify suffix.
---
 gcc/config/loongarch/gnu-user.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/loongarch/gnu-user.h b/gcc/config/loongarch/gnu-user.h
index 9616d6e8a0b..e9f4bcef1d4 100644
--- a/gcc/config/loongarch/gnu-user.h
+++ b/gcc/config/loongarch/gnu-user.h
@@ -34,9 +34,9 @@ along with GCC; see the file COPYING3.  If not see
   "/lib" ABI_GRLEN_SPEC "/ld-linux-loongarch-" ABI_SPEC ".so.1"
 
 #define MUSL_ABI_SPEC \
-  "%{mabi=lp64d:-lp64d}" \
-  "%{mabi=lp64f:-lp64f}" \
-  "%{mabi=lp64s:-lp64s}"
+  "%{mabi=lp64d:}" \
+  "%{mabi=lp64f:-sp}" \
+  "%{mabi=lp64s:-sf}"
 
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER \
-- 
2.31.1



[PATCH] RISC-V: Refactor RVV iterators[NFC]

2023-11-17 Thread Juzhe-Zhong
This patch refactors RVV iterators for easier maintenance.

E.g. 

(define_mode_iterator V [
  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")

  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")

  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")

  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")

  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")

  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")

  (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
  (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
])

change it into:

(define_mode_iterator V [VI VF_ZVFHMIN])

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Refactor iterators.

---
 gcc/config/riscv/vector-iterators.md | 661 +--
 1 file changed, 124 insertions(+), 537 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f04c7fe5491..469875ce67c 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -108,48 +108,49 @@
   UNSPECV_FRM_RESTORE_EXIT
 ])
 
-(define_mode_iterator V [
+(define_mode_iterator VI [
   RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
 
   RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
 
-  (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
-  (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-
   RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
 
-  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
-  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
-
   (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
   (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
+])
+
+;; This iterator is the same as above but with TARGET_VECTOR_ELEN_FP_16
+;; changed to TARGET_ZVFH.  TARGET_VECTOR_ELEN_FP_16 is also true for
+;; TARGET_ZVFHMIN while we actually want to disable all instructions apart
+;; from load, store and convert for it.
+;; It is not enough to set the "enabled" attribute to false
+;; since this will only disable insn alternatives in reload but still
+;; allow the instruction and mode to be matched during combine et al.
+(define_mode_iterator VF [
+  (RVVM8HF "TARGET_ZVFH") (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH")
+  (RVVM1HF "TARGET_ZVFH") (RVVMF2HF "TARGET_ZVFH")
+  (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+
+  (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
+  (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
 
   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
 ])
 
-(define_mode_iterator V_VLS [
-  RVVM8QI RVVM4QI RVVM2QI RVVM1QI RVVMF2QI RVVMF4QI (RVVMF8QI "TARGET_MIN_VLEN 
> 32")
-
-  RVVM8HI RVVM4HI RVVM2HI RVVM1HI RVVMF2HI (RVVMF4HI "TARGET_MIN_VLEN > 32")
-
+(define_mode_iterator VF_ZVFHMIN [
   (RVVM8HF "TARGET_VECTOR_ELEN_FP_16") (RVVM4HF "TARGET_VECTOR_ELEN_FP_16") 
(RVVM2HF "TARGET_VECTOR_ELEN_FP_16")
   (RVVM1HF "TARGET_VECTOR_ELEN_FP_16") (RVVMF2HF "TARGET_VECTOR_ELEN_FP_16")
   (RVVMF4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
 
-  RVVM8SI RVVM4SI RVVM2SI RVVM1SI (RVVMF2SI "TARGET_MIN_VLEN > 32")
-
   (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32") 
(RVVM2SF "TARGET_VECTOR_ELEN_FP_32")
   (RVVM1SF "TARGET_VECTOR_ELEN_FP_32") (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN > 32")
 
-  (RVVM8DI "TARGET_VECTOR_ELEN_64") (RVVM4DI "TARGET_VECTOR_ELEN_64")
-  (RVVM2DI "TARGET_VECTOR_ELEN_64") (RVVM1DI "TARGET_VECTOR_ELEN_64")
-
   (RVVM8DF "TARGET_VECTOR_ELEN_FP_64") (RVVM4DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64") (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
+])
 
-  ;; VLS modes.
+(define_mode_iterator VLSI [
   (V1QI "riscv_vector::vls_mode_valid_p (V1QImode)")
   (V2QI "riscv_vector::vls_mode_valid_p (V2QImode)")
   (V4QI "riscv_vector::vls_mode_valid_p (V4QImode)")
@@ -195,7 +196,45 @@
   (V64DI "riscv_vector::vls_mode_valid_p (V64DImode) && 

[Bug driver/108865] gcc on Windows fails with Unicode path to source file

2023-11-17 Thread aoliva at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

Alexandre Oliva  changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu.org

--- Comment #44 from Alexandre Oliva  ---
All configure --with-* and --enable-* options are stored in shell variables
named with_* and enable_*, respectively, so it's just a matter of testing for
yes (for --enable) or rather for no (for explicit --disable).  Look for e.g.
--enable-initfini-array around line 1932 in $top_srcdir/gcc/configure.ac, or
with_avrlibc around line 1495 in gcc/config.gcc; you can do something similar
in gcc/config.host, without any explicit argument passing, because the
config.{host,gcc} files are sourced by configure, so they inherit all shell
variables.

Re: [ARM] unexpected sizeof() of a complex packed type

2023-11-17 Thread Andrew Pinski via Gcc
On Thu, Nov 16, 2023 at 8:42 AM Dmitry Antipov  wrote:
>
> (The following sample is taken from my LKML post at 
> https://lkml.org/lkml/2023/11/15/213)
>
> $ cat t-build-bug.c
>
> struct vring_tx_mac {
> unsigned int d[3];
> unsigned int ucode_cmd;
> } __attribute__((packed));
>
> struct vring_rx_mac {
> unsigned int d0;
> unsigned int d1;
> unsigned short w4;
> union { struct { unsigned short pn_15_0; unsigned int pn_47_16; } 
> __attribute__((packed));
> struct { unsigned short pn_15_0; unsigned int pn_47_16; } 
> __attribute__((packed)) pn;
> };
> } __attribute__((packed));
>
> struct wil_ring_dma_addr {
> unsigned int addr_low;
> unsigned short addr_high;
> } __attribute__((packed));
>
> struct vring_tx_dma {
> unsigned int d0;
> struct wil_ring_dma_addr addr;
> unsigned char ip_length;
> unsigned char b11;
> unsigned char error;
> unsigned char status;
> unsigned short length;
> } __attribute__((packed));
>
> struct vring_tx_desc {
> struct vring_tx_mac mac;
> struct vring_tx_dma dma;
> } __attribute__((packed));
>
> struct wil_ring_tx_enhanced_mac {
> unsigned int d[3];
> unsigned short tso_mss;
> unsigned short scratchpad;
> } __attribute__((packed));
>
> struct wil_ring_tx_enhanced_dma {
> unsigned char l4_hdr_len;
> unsigned char cmd;
> unsigned short w1;
> struct wil_ring_dma_addr addr;
> unsigned char ip_length;
> unsigned char b11;
> unsigned short addr_high_high;
> unsigned short length;
> } __attribute__((packed));
>
> struct wil_tx_enhanced_desc {
> struct wil_ring_tx_enhanced_mac mac;
> struct wil_ring_tx_enhanced_dma dma;
> } __attribute__((packed));
>
> union wil_tx_desc {
> struct vring_tx_desc legacy;
> struct wil_tx_enhanced_desc enhanced;
> } __attribute__((packed));
>
> struct vring_rx_dma {
> unsigned int d0;
> struct wil_ring_dma_addr addr;
> unsigned char ip_length;
> unsigned char b11;
> unsigned char error;
> unsigned char status;
> unsigned short length;
> } __attribute__((packed));
>
> struct vring_rx_desc {
> struct vring_rx_mac mac;
> struct vring_rx_dma dma;
> } __attribute__((packed));
>
> struct wil_ring_rx_enhanced_mac {
> unsigned int d[3];
> unsigned short buff_id;
> unsigned short reserved;
> } __attribute((packed));
>
> struct wil_ring_rx_enhanced_dma {
> unsigned int d0;
> struct wil_ring_dma_addr addr;
> unsigned short w5;
> unsigned short addr_high_high;
> unsigned short length;
> } __attribute((packed));
>
> struct wil_rx_enhanced_desc {
> struct wil_ring_rx_enhanced_mac mac;
> struct wil_ring_rx_enhanced_dma dma;
> } __attribute((packed));
>
> union wil_rx_desc {
> struct vring_rx_desc legacy;
> struct wil_rx_enhanced_desc enhanced;
> } __attribute__((packed));
>
> union wil_ring_desc {
> union wil_tx_desc tx;
> union wil_rx_desc rx;
> } __attribute__((packed));
>
> int f (void) {
> return sizeof(union wil_ring_desc);
> }
>
> $ arm-linux-gnu-gcc -v
> Using built-in specs.
> COLLECT_GCC=arm-linux-gnu-gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/arm-linux-gnueabi/13/lto-wrapper
> Target: arm-linux-gnueabi
> Configured with: ../gcc-13.2.1-20230728/configure --bindir=/usr/bin 
> --build=x86_64-redhat-linux-gnu --datadir=/usr/share --disable-decimal-float 
> --disable-dependency-tracking --disable-gold
> --disable-libgcj --disable-libgomp --disable-libmpx --disable-libquadmath 
> --disable-libssp --disable-libunwind-exceptions --disable-shared 
> --disable-silent-rules --disable-sjlj-exceptions
> --disable-threads --with-ld=/usr/bin/arm-linux-gnu-ld --enable-__cxa_atexit 
> --enable-checking=release --enable-gnu-unique-object --enable-initfini-array 
> --enable-languages=c,c++
> --enable-linker-build-id --enable-lto --enable-nls --enable-obsolete 
> --enable-plugin --enable-targets=all --exec-prefix=/usr 
> --host=x86_64-redhat-linux-gnu --includedir=/usr/include
> --infodir=/usr/share/info --libexecdir=/usr/libexec --localstatedir=/var 
> --mandir=/usr/share/man --prefix=/usr --program-prefix=arm-linux-gnu- 
> --sbindir=/usr/sbin --sharedstatedir=/var/lib
> --sysconfdir=/etc --target=arm-linux-gnueabi 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla/ 
> --with-gcc-major-version-only --with-isl --with-newlib 
> --with-plugin-ld=/usr/bin/arm-linux-gnu-ld
> --with-sysroot=/usr/arm-linux-gnu/sys-root --with-system-libunwind 
> --with-system-zlib --without-headers --with-tune=generic-armv7-a 
> --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16
> --with-abi=aapcs-linux --enable-gnu-indirect-function 
> --with-linker-hash-style=gnu
> Thread model: single
> Supported LTO compression algorithms: zlib 

[pushed] analyzer: new warning: -Wanalyzer-infinite-loop [PR106147]

2023-11-17 Thread David Malcolm
This patch implements a new analyzer warning: -Wanalyzer-infinite-loop.

It works by examining the exploded graph once the latter has been
fully built.  It attempts to detect cycles in the exploded graph in
which:
- no externally visible work occurs
- no escape is possible from the cycle once it has been entered
- the program state is "sufficiently concrete" at each step:
  - no unknown activity could be occurring
  - the worklist was fully drained for each enode in the cycle
i.e. every enode in the cycle is processed

For example, it correctly complains about this bogus "for" loop:

  int sum = 0;
  for (struct node *iter = n; iter; iter->next)
sum += n->val;
  return sum;

like this:

infinite-loop-linked-list.c: In function ‘for_loop_noop_next’:
infinite-loop-linked-list.c:110:31: warning: infinite loop [CWE-835] 
[-Wanalyzer-infinite-loop]
  110 |   for (struct node *iter = n; iter; iter->next)
  |   ^~~~
  ‘for_loop_noop_next’: events 1-5
|
|  110 |   for (struct node *iter = n; iter; iter->next)
|  |   ^~~~
|  |   |
|  |   (1) infinite loop here
|  |   (2) when ‘iter’ is non-NULL: always 
following ‘true’ branch...
|  |   (5) ...to here
|  111 | sum += n->val;
|  | ~
|  | |   |
|  | |   (3) ...to here
|  | (4) looping back...
|
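
For contrast, here is a minimal sketch of what the loop was presumably meant to do. The layout of struct node is not shown in the message, so the two-field definition below is an assumption, as is the name sum_list and the use of iter->val (the flagged loop reads n->val).

```cpp
#include <cassert>

// Hypothetical list node; the real test case's definition is not shown.
struct node {
  int val;
  node* next;
};

// Sums the values of a singly linked list.
int sum_list(const node* n) {
  int sum = 0;
  // "iter = iter->next" advances the cursor.  The flagged loop wrote
  // "iter->next", an expression with no effect, so iter never changed.
  for (const node* iter = n; iter; iter = iter->next)
    sum += iter->val;
  return sum;
}
```

With the assignment in place the loop condition eventually sees a null pointer, so this form should not satisfy the "no escape is possible" condition.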

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

My integration test suite shows 12 true positives and 2 false positives,
which seems good enough for an initial implementation; I'll file bugs
for myself to track fixing the two false positives.

Pushed to trunk as r14-5566-g841008d3966c0f.

gcc/ChangeLog:
PR analyzer/106147
* Makefile.in (ANALYZER_OBJS): Add analyzer/infinite-loop.o.
* doc/invoke.texi: Add -fdump-analyzer-infinite-loop and
-Wanalyzer-infinite-loop.  Add missing CWE link for
-Wanalyzer-infinite-recursion.
* timevar.def (TV_ANALYZER_INFINITE_LOOPS): New.

gcc/analyzer/ChangeLog:
PR analyzer/106147
* analyzer.opt (Wanalyzer-infinite-loop): New option.
(fdump-analyzer-infinite-loop): New option.
* checker-event.h (start_cfg_edge_event::get_desc): Drop "final".
(start_cfg_edge_event::maybe_describe_condition): Convert from
private to protected.
* checker-path.h (checker_path::get_logger): New.
* diagnostic-manager.cc (process_worklist_item): Update for
new context param of maybe_update_for_edge.
* engine.cc
(impl_region_model_context::impl_region_model_context): Add
out_could_have_done_work param to both ctors and use it to
initialize mm_out_could_have_done_work.
(impl_region_model_context::maybe_did_work): New vfunc
implementation.
(exploded_node::on_stmt): Add out_could_have_done_work param and
pass to ctxt ctor.
(exploded_node::on_stmt_pre): Treat setjmp and longjmp as "doing
work".
(exploded_node::on_longjmp): Likewise.
(exploded_edge::exploded_edge): Add "could_do_work" param and use
it to initialize m_could_do_work_p.
(exploded_edge::dump_dot_label): Add result of could_do_work_p.
(exploded_graph::add_function_entry): Mark edge as doing no work.
(exploded_graph::add_edge): Add "could_do_work" param and pass to
exploded_edge ctor.
(add_tainted_args_callback): Treat as doing no work.
(exploded_graph::process_worklist): Likewise when merging nodes.
(maybe_process_run_of_before_supernode_enodes::item): Likewise.
(exploded_graph::maybe_create_dynamic_call): Likewise.
(exploded_graph::process_node): Likewise for phi nodes.
Pass in a "could_have_done_work" bool when handling stmts and use
when creating edges.  Assume work is done at bifurcation.
(exploded_path::feasible_p): Update for new context param of
maybe_update_for_edge.
(feasibility_state::feasibility_state): New ctor.
(feasibility_state::operator=): New.
(feasibility_state::maybe_update_for_edge): Add ctxt param and use
it.  Fix missing newline when logging state.
(impl_run_checkers): Call exploded_graph::detect_infinite_loops.
* exploded-graph.h
(impl_region_model_context::impl_region_model_context): Add
out_could_have_done_work param to both ctors.
(impl_region_model_context::maybe_did_work): New decl.
(impl_region_model_context::checking_for_infinite_loop_p): New.
(impl_region_model_context::on_unusable_in_infinite_loop): New.
(impl_region_model_context::m_out_could_have_done_work): New
field.
(exploded_node::on_stmt): Add "out_could_have_done_work" 

[Bug analyzer/106147] RFE: -fanalyzer could complain about some cases of infinite loops and infinite recursion

2023-11-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106147

--- Comment #7 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:841008d3966c0fe7a80ec10703a50fbdab7620ac

commit r14-5566-g841008d3966c0fe7a80ec10703a50fbdab7620ac
Author: David Malcolm 
Date:   Fri Nov 17 19:55:25 2023 -0500

analyzer: new warning: -Wanalyzer-infinite-loop [PR106147]

This patch implements a new analyzer warning: -Wanalyzer-infinite-loop.

It works by examining the exploded graph once the latter has been
fully built.  It attempts to detect cycles in the exploded graph in
which:
- no externally visible work occurs
- no escape is possible from the cycle once it has been entered
- the program state is "sufficiently concrete" at each step:
  - no unknown activity could be occurring
  - the worklist was fully drained for each enode in the cycle
i.e. every enode in the cycle is processed

For example, it correctly complains about this bogus "for" loop:

  int sum = 0;
  for (struct node *iter = n; iter; iter->next)
sum += n->val;
  return sum;

like this:

infinite-loop-linked-list.c: In function ‘for_loop_noop_next’:
infinite-loop-linked-list.c:110:31: warning: infinite loop [CWE-835]
[-Wanalyzer-infinite-loop]
  110 |   for (struct node *iter = n; iter; iter->next)
  |   ^~~~
  ‘for_loop_noop_next’: events 1-5
|
|  110 |   for (struct node *iter = n; iter; iter->next)
|  |   ^~~~
|  |   |
|  |   (1) infinite loop here
|  |   (2) when ‘iter’ is non-NULL:
always following ‘true’ branch...
|  |   (5) ...to here
|  111 | sum += n->val;
|  | ~
|  | |   |
|  | |   (3) ...to here
|  | (4) looping back...
|
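
The conditions listed in the commit message (a cycle, no work, no escape) can be illustrated on a toy graph. This is only a sketch under stated assumptions: Edge and has_inescapable_idle_cycle are hypothetical names, the graph is a plain edge list rather than the analyzer's exploded graph, and the "sufficiently concrete" state condition is not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy edge; not the analyzer's exploded_edge.
struct Edge {
  std::size_t from;
  std::size_t to;
  bool could_do_work;
};

// Returns true if some node lies on a cycle that cannot be left and
// whose region contains no edge that could do work.
bool has_inescapable_idle_cycle(std::size_t num_nodes,
                                const std::vector<Edge>& edges) {
  // reach[u][v]: v is reachable from u by a path of at least one edge.
  std::vector<std::vector<bool>> reach(num_nodes,
                                       std::vector<bool>(num_nodes, false));
  for (const Edge& e : edges)
    reach[e.from][e.to] = true;
  // Transitive closure (Warshall's algorithm).
  for (std::size_t k = 0; k < num_nodes; ++k)
    for (std::size_t i = 0; i < num_nodes; ++i)
      for (std::size_t j = 0; j < num_nodes; ++j)
        if (reach[i][k] && reach[k][j])
          reach[i][j] = true;

  for (std::size_t u = 0; u < num_nodes; ++u) {
    if (!reach[u][u])
      continue;  // u is not on any cycle.
    bool trapped = true;
    // No escape: everything reachable from u can get back to u.
    for (std::size_t v = 0; v < num_nodes; ++v)
      if (reach[u][v] && !reach[v][u])
        trapped = false;
    // No work: every edge leaving a node of that region is idle.
    for (const Edge& e : edges)
      if ((e.from == u || reach[u][e.from]) && e.could_do_work)
        trapped = false;
    if (trapped)
      return true;
  }
  return false;
}
```

The cubic closure is chosen for brevity only; it says nothing about how the analyzer itself searches for cycles.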

gcc/ChangeLog:
PR analyzer/106147
* Makefile.in (ANALYZER_OBJS): Add analyzer/infinite-loop.o.
* doc/invoke.texi: Add -fdump-analyzer-infinite-loop and
-Wanalyzer-infinite-loop.  Add missing CWE link for
-Wanalyzer-infinite-recursion.
* timevar.def (TV_ANALYZER_INFINITE_LOOPS): New.

gcc/analyzer/ChangeLog:
PR analyzer/106147
* analyzer.opt (Wanalyzer-infinite-loop): New option.
(fdump-analyzer-infinite-loop): New option.
* checker-event.h (start_cfg_edge_event::get_desc): Drop "final".
(start_cfg_edge_event::maybe_describe_condition): Convert from
private to protected.
* checker-path.h (checker_path::get_logger): New.
* diagnostic-manager.cc (process_worklist_item): Update for
new context param of maybe_update_for_edge.
* engine.cc
(impl_region_model_context::impl_region_model_context): Add
out_could_have_done_work param to both ctors and use it to
initialize mm_out_could_have_done_work.
(impl_region_model_context::maybe_did_work): New vfunc
implementation.
(exploded_node::on_stmt): Add out_could_have_done_work param and
pass to ctxt ctor.
(exploded_node::on_stmt_pre): Treat setjmp and longjmp as "doing
work".
(exploded_node::on_longjmp): Likewise.
(exploded_edge::exploded_edge): Add "could_do_work" param and use
it to initialize m_could_do_work_p.
(exploded_edge::dump_dot_label): Add result of could_do_work_p.
(exploded_graph::add_function_entry): Mark edge as doing no work.
(exploded_graph::add_edge): Add "could_do_work" param and pass to
exploded_edge ctor.
(add_tainted_args_callback): Treat as doing no work.
(exploded_graph::process_worklist): Likewise when merging nodes.
(maybe_process_run_of_before_supernode_enodes::item): Likewise.
(exploded_graph::maybe_create_dynamic_call): Likewise.
(exploded_graph::process_node): Likewise for phi nodes.
Pass in a "could_have_done_work" bool when handling stmts and use
when creating edges.  Assume work is done at bifurcation.
(exploded_path::feasible_p): Update for new context param of
maybe_update_for_edge.
(feasibility_state::feasibility_state): New ctor.
(feasibility_state::operator=): New.
(feasibility_state::maybe_update_for_edge): Add ctxt param and use
it.  Fix missing newline when logging state.
(impl_run_checkers): Call exploded_graph::detect_infinite_loops.
* exploded-graph.h

[Bug c++/112588] ICE in make_decl_rtl when returning str literal when string header imported in module

2023-11-17 Thread nathanieloshead at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112588

Nathaniel Shead  changed:

   What|Removed |Added

 CC||nathanieloshead at gmail dot com

--- Comment #1 from Nathaniel Shead  ---
Minimised (the actual offender here is the with std::allocator):


  // test.h
  void f(int*);

  template 
  struct S {
void g(int n) { f(); }
  };

  template struct S;


  // a.cpp
  module;
  #include "test.h"
  export module test;


  // b.cpp
  #include "test.h"
  import test;


So far it seems the issue is that the PARM_DECL in the expression tree of the
body of the instantiation for `S::g` is a different node from the actual
PARM_DECL in g's DECL_ARGUMENTS; the latter gets RTL but the former does not.
The issue is in the deduplication logic for instantiations somewhere.

The following patch fixes this issue but causes other issues in the testsuite,
and I don't think this is the correct approach anyway:


diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 4f5b6e2747a..f2d191fc408 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8302,9 +8302,7 @@ trees_in::decl_value ()
   if (TREE_CODE (inner) == FUNCTION_DECL)
{
  tree e_inner = STRIP_TEMPLATE (existing);
- for (auto parm = DECL_ARGUMENTS (inner);
-  parm; parm = DECL_CHAIN (parm))
-   DECL_CONTEXT (parm) = e_inner;
+ DECL_ARGUMENTS (inner) = DECL_ARGUMENTS (e_inner);
}

   /* And our result is the existing node.  */


(I was originally working on this after attempting to reduce PR9.)

[Bug ipa/112601] [11/12/13/14 Regression] ICE in cgraph_node::verify_node(): error: invalid calls_comdat_local flag

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112601

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-11-18
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed. Looks like an older regression.

[PATCH v3] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-17 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme
values of days and months [1]:

  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence they perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x + __y by __d without
overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solving the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].
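
To make the difference between the built-in % operator and the Euclidean remainder concrete, here is a small sketch of the helper being replaced, under the non-reserved name euclidean_modulo (an illustrative name; it needs C++14 or later because of the branches inside a constexpr function). The helper itself is fine for any __n; the overflow described above happens earlier, when the caller adds or subtracts to form __n.

```cpp
#include <cassert>

// Remainder of the Euclidean division of n by d, always in [0, d - 1].
// Same body as the __modulo helper that the patch removes.
constexpr unsigned euclidean_modulo(long long n, unsigned d) {
  if (n >= 0)
    return n % d;
  else
    return (d + (n % d)) % d;
}

// The built-in operator truncates toward zero, so a negative dividend
// gives a nonpositive remainder...
static_assert(-1 % 12 == -1, "built-in % keeps the sign of the dividend");
// ...whereas the Euclidean remainder wraps around to d - 1.
static_assert(euclidean_modulo(-1, 12) == 11, "one step before 0 is 11");
static_assert(euclidean_modulo(25, 12) == 1, "nonnegative inputs use plain %");
```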

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Changes with respect to previous versions:
 v3: Fix screwed-up email sent with v2. (Sorry about that. I shall learn at
 some point.)
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 30 ++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 30 ++
 5 files changed, 114 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 
1
+  // or the addition of this value with static_cast(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template 
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we from _Up(__y)"subtract"  an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 

[PATCH v2] The following functions invoke signed integer overflow (UB) for some extreme values of days and months [1]:

2023-11-17 Thread Cassio Neri
  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1 the problem is that in libstdc++ days::rep is int64_t. Other
implementations use int32_t and cast operands to int64_t. Hence they perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);

For x86-64, long long is int64 so the cast is useless.  For #2, casting to a
larger type could help but all implementations follow the Standard's "Returns
clause" and evaluate:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++.  The fix uses this new function:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of Euclidean division of __x + __y by __d without
overflowing. This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

In addition to solving the UB issues, __add_modulo allows shorter branchless code
on x86-64 and ARM [2].
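
A standalone sketch of the unsigned-arithmetic approach follows, closely based on the patch body further down but with non-reserved names (add_modulo, d, T, U are illustrative). It assumes T is a signed integer type at least as wide as int and x <= 255 + d - 1, as the patch comments state, and needs C++14 or later.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Remainder of the Euclidean division of x + y by d, computed entirely
// in unsigned arithmetic so that no signed overflow can occur.
template <unsigned d, typename T>
constexpr unsigned add_modulo(unsigned x, T y) {
  static_assert(std::is_signed<T>::value && sizeof(T) >= sizeof(int),
                "sketch assumes a signed type at least as wide as int");
  using U = std::make_unsigned_t<T>;
  // b is the largest value with b % d == d - 1 that still leaves room
  // for adding x (at most 255 + d - 1) without wrapping.
  constexpr auto a = U(-1) - U(255 + d - 2);
  constexpr auto b = U(d * (a / d) - 1);
  // For negative y, U(y) equals y + 2^N.  Adding b - U(-1), which wraps
  // to b + 1 (a multiple of d), leaves x + y plus a multiple of d.
  const auto offset = y >= 0 ? U(0) : b - U(-1);
  return (x + U(y) + offset) % d;
}

// January (1) plus -1 months: index 11, i.e. December after the "+ 1".
static_assert(add_modulo<12>(1u + 11u, -1LL) == 11u, "wraps backwards");
// The extreme value that overflowed in the original formula.
static_assert(add_modulo<12>(12u, std::numeric_limits<long long>::min()) == 4u,
              "-2^63 is congruent to 4 modulo 12");
```

The second assertion follows from 2^63 being congruent to 8 modulo 12, so x + y = 12 - 2^63 is congruent to 4.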

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---
 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 3 files changed, 54 insertions(+), 24 deletions(-)

 Changes with respect to previous versions:
 v2: Replaced _T with _Tp and _U with _Up. Removed copyright+license from test.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..691bb106bb9 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided 
by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 
1
+  // or the addition of this value with static_cast(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template 
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y) != __y mod d.  To
+   // fix both issues we from _Up(__y)"subtract"  an __offset >=
+   // 255 + d - 1 to make room for the addition to __x and shift the modulo
+   // to the correct value.
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - _Up(255 + d - 1) and b % d = d - 1.
+   auto const __offset = __y >= 0 ? _Up(0) : __b - _Up(-1);
+   return (__x + _Up(__y) + __offset) % __d;
   }

   inline constexpr unsigned __days_per_month[12]
@@ -700,8 +720,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend constexpr month
   operator+(const month& __x, const 

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I suspect it's going to be even worse if we have multiple patterns
>> with the same underlying RTL, but just different output strings.
No. We don't need to add (duplicate) any new patterns.
I know RVV GCC very well. I know how to do that.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-11-18 08:01
To: 钟居哲; palmer
CC: gcc-patches; kito.cheng; kito.cheng; cooper.joshua; rdapp.gcc
Subject: Re: RISC-V: Support XTheadVector extensions
 
 
On 11/17/23 16:16, 钟居哲 wrote:
> >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> >> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> >> the preferred way to go when all we need to do is prefix the instruction
> >> with "th.".
> 
> No. I don't think we need to add '^'. I don't want theadvector to touch
> any code in vector.md.
> Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
> People like me don't want to touch anything related to Thead.
> But anyway, I will take care of that in GCC-15.
I suspect it's going to be even worse if we have multiple patterns 
with the same underlying RTL, but just different output strings.
 
The standard way to handle that has been with an output modifier and/or 
ASSEMBLER_DIALECT.  If you look at the PA port for example, the 
assembler syntax changed dramatically between the PA1.0/PA1.1 era and 
the PA2.0 era.  But we support both variants trivially without 
duplicating all the patterns.
 
But we've got time to sort this out.  I don't think the code in question 
was targeted towards gcc-14.
 
 
jeff
 


Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread Jeff Law




On 11/17/23 16:16, 钟居哲 wrote:
> >> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> >> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> >> the preferred way to go when all we need to do is prefix the instruction
> >> with "th.".
>
> No. I don't think we need to add '^'. I don't want theadvector to touch
> any code in vector.md.
> Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
> People like me don't want to touch anything related to Thead.
> But anyway, I will take care of that in GCC-15.

I suspect it's going to be even worse if we have multiple patterns 
with the same underlying RTL, but just different output strings.


The standard way to handle that has been with an output modifier and/or 
ASSEMBLER_DIALECT.  If you look at the PA port for example, the 
assembler syntax changed dramatically between the PA1.0/PA1.1 era and 
the PA2.0 era.  But we support both variants trivially without 
duplicating all the patterns.
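
The output-modifier idea discussed in this thread, a '%^' punctuation that prints "th." only when XTheadVector is the target, can be sketched outside GCC as plain string expansion. This is an illustration only: expand_prefix and its xtheadvector flag are hypothetical, and a real port handles this in its print-operand hook rather than by rewriting strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Expands "%^" in an assembler template: it becomes "th." when
// xtheadvector is set and disappears otherwise.  All other '%'
// sequences are passed through untouched in this sketch.
std::string expand_prefix(const std::string& asm_template, bool xtheadvector) {
  std::string out;
  for (std::size_t i = 0; i < asm_template.size(); ++i) {
    if (asm_template[i] == '%' && i + 1 < asm_template.size()
        && asm_template[i + 1] == '^') {
      if (xtheadvector)
        out += "th.";
      ++i;  // Skip the '^'.
    } else {
      out += asm_template[i];
    }
  }
  return out;
}
```

The point of the design is that one pattern with one template serves both assembler syntaxes, which is what avoids duplicating the patterns.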


But we've got time to sort this out.  I don't think the code in question 
was targeted towards gcc-14.



jeff


[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #9 from Andrew Pinski  ---
Adding -fdisable-tree-reassoc2 causes the problem not to happen so yes it is a
bug in reassoc. If I get some time next week I will look into adding the
support.

[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #8 from Andrew Pinski  ---
tree-ssa-reassoc.cc needs to call mark_ssa_maybe_undefs and check
ssa_name_maybe_undef_p, similar to the way tree-ssa-ifcombine.cc does it.

Re: Re: RISC-V: Support XTheadVector extensions

2023-11-17 Thread 钟居哲
>> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
>> may also need to add '^' to the punct_valid_p hook.  But yes, this is
>> the preferred way to go when all we need to do is prefix the instruction
>> with "th.".

No. I don't think we need to add '^'. I don't want theadvector to touch any
code in vector.md.
Mixing up theadvector with RVV1.0 is a nightmare for RVV maintenance.
People like me don't want to touch anything related to Thead.
But anyway, I will take care of that in GCC-15.





juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-11-18 01:11
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; cooper.joshua; rdapp.gcc; jeffreyalaw
Subject: Re: RISC-V: Support XTheadVector extensions
On Fri, 17 Nov 2023 03:39:48 PST (-0800), juzhe.zh...@rivai.ai wrote:
> About 90% of the theadvector extension reuses the current RVV 1.0 instruction
> patterns; only the ASM changes. For example:
> 
> @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
>   (match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] VMULH)
>(match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
>"TARGET_VECTOR"
> -  "vmulh.vx\t%0,%3,%z4%p1"
> +  "%^vmulh.vx\t%0,%3,%z4%p1"
>[(set_attr "type" "vimul")
> (set_attr "mode" "")])
> +  if (letter == '^')
> +{
> +  if (TARGET_XTHEADVECTOR)
> + fputs ("th.", file);
> +  return;
> +}
> 
> For almost all patterns, you simply prepend "th." to the ASM mnemonic,
> e.g. change "vmulh.vv" -> "th.vmulh.vv".
> 
> Almost none of the theadvector instructions are new features; they are all
> the same as RVV1.0.
> Why invent an ISA that has no features beyond what RVV1.0 already covers?
> 
> I do not explicitly object to this patch, but I would like to know the reason.
 
There's some more in the later threads, but with the top posting it kind 
of got lost so I'm just replying here.
 
This really isn't T-Head's fault: we announced V-0.7 as a stable draft 
that was being implemented, and then T-Head went and implemented it.  
Most of that history has been scrubbed by RVI, but you can still find 
some stuff like this old talk on YouTube 
.
 
In general we've just figured out a way to make things work when HW 
vendors end up in a grey area in RISC-V land.  That obviously results in 
a bunch of pain for the SW people, but this stuff is only useful if we 
can run on real HW and that always involves some amount of pain.  
Hopefully we can get to a point where we make fewer problems for 
ourselves, but we've got a long history to dig out from and there's 
going to be a lot more of this in the future.
 
So I don't like this XTHeadV stuff, but I think we're best to take it: 
these guys tried to do the right thing and got thrown under the bus by 
RVI, we should help them.  This is almost certainly going to be a lot 
more pain than we're used to, just given the size of the extensions in 
question, but I still think it's the right way to go.
 
The other option is to essentially just tell them to fork the ISA, which 
isn't good for anyone.
 
> Btw, stage 1 will close soon.  So I will review this patch on GCC-15 as long 
> as all other RISC-V maintainers agree.
 
I agree this is gcc-15 material: there's a lot of subtle differences in 
behavior between 0.7 and 1.0, even when the mnemonics are the same.  
We're already pretty buried in testing for 14, so trying to pick up 
another target is going to be a huge headache (particularly one that's a 
bit special).
 
> 
> 
> 
> 
> juzhe.zh...@rivai.ai
 


[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #7 from Andrew Pinski  ---
Wait a minute, h might be uninitialized.


So the issue is inside reassociate, which must not have recognized that
iftmp.7_30 is defined by a maybe-uninitialized variable ...

[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #6 from Andrew Pinski  ---
forwprop3 does:
```
  _12 = ~iftmp.7_30;
  h_40 = (unsigned int) _12;
...
  if (h_40 == 4294967295)
```

into:
```
  _75 = (unsigned int) iftmp.7_30;
  if (iftmp.7_30 == 0)
```

Which as far as I can tell is correct.
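
As a sanity check of that fold, here is a minimal C++ sketch. It assumes
iftmp.7_30 is a 32-bit signed int (the dump does not show its type), and the
names cond_before and cond_after are made up for illustration:

```cpp
#include <cstdint>

// Condition as written before forwprop3: invert, convert to unsigned,
// then compare against 4294967295 (0xffffffff).
constexpr bool cond_before(std::int32_t v)
{
  return static_cast<std::uint32_t>(~v) == 4294967295u;
}

// Condition after forwprop3: a plain comparison with zero.
constexpr bool cond_after(std::int32_t v)
{
  return v == 0;
}

// ~v has every bit set exactly when v has no bit set, so both forms agree.
static_assert(cond_before(0) && cond_after(0), "both true for zero");
static_assert(!cond_before(1) && !cond_after(1), "both false for one");
static_assert(!cond_before(-1) && !cond_after(-1), "both false for minus one");
```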

And then reassociate2 combines:
```
   [local count: 57431765]:
  if (iftmp.7_30 == 0)
goto ; [97.17%]
  else
goto ; [2.83%]

   [local count: 55807730]:
  if (i_27 != 0)
goto ; [100.00%]
  else
goto ; [0.00%]
```
into:
```
   [local count: 57431765]:
  _75 = iftmp.7_30 | i_27;
  _33 = _75 == 0;
  if (_33 != 0)
goto ; [97.17%]
  else
goto ; [2.83%]
```

or
```
if ((_30 == 0) & (i_27 == 0)) goto <17> else goto <19/20>
```
Which looks correct.
I don't see anything going wrong with the patch itself ...
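
The identity behind that combination can be checked in isolation. A small
sketch, assuming unsigned operands (the dump does not show the types of
iftmp.7_30 and i_27); the function names are hypothetical:

```cpp
// Two separate tests, in the spirit of the original pair of conditions.
constexpr bool both_zero_two_tests(unsigned a, unsigned b)
{
  return a == 0 && b == 0;
}

// One test on the OR of both values, as in the combined block.
constexpr bool both_zero_one_test(unsigned a, unsigned b)
{
  return (a | b) == 0;
}

// (a | b) has no bit set exactly when neither a nor b has one.
static_assert(both_zero_two_tests(0, 0) && both_zero_one_test(0, 0), "0, 0");
static_assert(!both_zero_two_tests(0, 1) && !both_zero_one_test(0, 1), "0, 1");
static_assert(!both_zero_two_tests(1, 0) && !both_zero_one_test(1, 0), "1, 0");
```

Unlike the two-test form, the single-test form reads both operands
unconditionally.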

[Bug jit/112603] Allow setting the personality function

2023-11-17 Thread bouanto at zoho dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112603

--- Comment #1 from Antoni  ---
Created attachment 56628
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56628&action=edit
Patch

[PATCH] libgccjit: Add ways to set the personality function

2023-11-17 Thread Antoni Boucher
Hi.
This adds functions to set the personality function (bug 112603).

I'm not sure I can make a test for this: it seems the personality
function will not be set if there are no try/catch inside the
functions.
Do you know a way to keep the personality function that is set in this
case?

Or should we wait until I send the patch for try/catch?

Thanks for the review.
From 6beb6452c7bac9ecbdaea750d61d6e6c6bd3ed8f Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Sun, 16 Apr 2023 13:19:20 -0400
Subject: [PATCH] libgccjit: Add ways to set the personality function

gcc/ChangeLog:
	PR jit/112603
	* expr.cc (build_personality_function_with_name): New function.
	* tree.cc (tree_cc_finalize): Cleanup gcc_eh_personality_decl.
	* tree.h (build_personality_function_with_name): New decl.

gcc/jit/ChangeLog:
	PR jit/112603
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/functions.rst: Document the functions
	gcc_jit_set_global_personality_function_name and
	gcc_jit_function_set_personality_function.
	* dummy-frontend.cc (jit_gc_root): New variable.
	(jit_preserve_from_gc): New function.
	(jit_langhook_init): Initialize new variables.
	(jit_langhook_eh_personality): New hook.
	(LANG_HOOKS_EH_PERSONALITY): New hook.
	* jit-playback.cc (set_personality_function): New function.
	* jit-playback.h: New decl.
	* jit-recording.cc
	(memento_of_set_personality_function::make_debug_string,
	recording::memento_of_set_personality_function::write_reproducer,
	recording::function::set_personality_function,
	recording::memento_of_set_personality_function::replay_into):
	New functions
	* jit-recording.h (class memento_of_set_personality_function):
	New class.
	(recording::function::set_personality_function): New function.
	* libgccjit.cc (gcc_jit_function_set_personality_function,
	gcc_jit_set_global_personality_function_name): New functions.
	* libgccjit.h (gcc_jit_set_global_personality_function_name,
	gcc_jit_function_set_personality_function): New functions.
	* libgccjit.map: New functions.

gcc/testsuite/ChangeLog:

	* jit.dg/test-personality-function.c: New test.
	* jit.dg/all-non-failing-tests.h: Mention
	test-personality-function.c.
---
 gcc/expr.cc   |  8 +++
 gcc/jit/docs/topics/compatibility.rst | 10 
 gcc/jit/docs/topics/functions.rst | 28 ++
 gcc/jit/dummy-frontend.cc | 36 
 gcc/jit/jit-playback.cc   |  8 +++
 gcc/jit/jit-playback.h|  3 +
 gcc/jit/jit-recording.cc  | 44 +++
 gcc/jit/jit-recording.h   | 23 
 gcc/jit/libgccjit.cc  | 22 
 gcc/jit/libgccjit.h   |  8 +++
 gcc/jit/libgccjit.map |  6 ++
 gcc/testsuite/jit.dg/all-non-failing-tests.h  |  3 +
 .../jit.dg/test-personality-function.c| 55 +++
 gcc/tree.cc   |  1 +
 gcc/tree.h|  1 +
 15 files changed, 256 insertions(+)
 create mode 100644 gcc/testsuite/jit.dg/test-personality-function.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 556bcf7ef59..25d50289b24 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13559,6 +13559,14 @@ build_personality_function (const char *lang)
 
   name = ACONCAT (("__", lang, "_personality", unwind_and_version, NULL));
 
+  return build_personality_function_with_name (name);
+}
+
+tree
+build_personality_function_with_name (const char *name)
+{
+  tree decl, type;
+
   type = build_function_type_list (unsigned_type_node,
    integer_type_node, integer_type_node,
    long_long_unsigned_type_node,
diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index ebede440ee4..31c3ef6401a 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -378,3 +378,13 @@ alignment of a variable:
 
 ``LIBGCCJIT_ABI_25`` covers the addition of
 :func:`gcc_jit_type_get_restrict`
+
+.. _LIBGCCJIT_ABI_26:
+
+``LIBGCCJIT_ABI_26``
+
+``LIBGCCJIT_ABI_26`` covers the addition of functions to set the personality
+function:
+
+  * :func:`gcc_jit_function_set_personality_function`
+  * :func:`gcc_jit_set_global_personality_function_name`
diff --git a/gcc/jit/docs/topics/functions.rst b/gcc/jit/docs/topics/functions.rst
index cf5cb716daf..e59885c3549 100644
--- a/gcc/jit/docs/topics/functions.rst
+++ b/gcc/jit/docs/topics/functions.rst
@@ -197,6 +197,34 @@ Functions
 
.. type:: gcc_jit_case
 
+.. function::  void
+   gcc_jit_function_set_personality_function (gcc_jit_function *fn,
+  gcc_jit_function *personality_func)
+
+   Set the personality function of ``fn`` to ``personality_func``.
+
+   were added in :ref:`LIBGCCJIT_ABI_26`; you can test for their presence
+   using
+
+   .. 

[Bug jit/112603] New: Allow setting the personality function

2023-11-17 Thread bouanto at zoho dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112603

            Bug ID: 112603
           Summary: Allow setting the personality function
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: jit
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: bouanto at zoho dot com
  Target Milestone: ---

I'll soon send a patch for this.

[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs) since r14-4661

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581

--- Comment #5 from Andrew Pinski  ---
Looking into what is going wrong.

gcc-12-20231117 is now available

2023-11-17 Thread GCC Administrator via Gcc
Snapshot gcc-12-20231117 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20231117/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 7fae9873a74c7a5a62044bb6a4cde8e3ac1a5e5d

You'll find:

 gcc-12-20231117.tar.xz   Complete GCC

  SHA256=1a07e402233828c436b7f68411813a49372db4360a386d0cbb3fa2acc1219504
  SHA1=5a47b17baf94579012bf2f9f2307368df0036619

Diffs from 12-20231110 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug jit/112602] Support vector permutation and access

2023-11-17 Thread bouanto at zoho dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112602

--- Comment #1 from Antoni  ---
Created attachment 56627
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56627&action=edit
Patch

[PATCH] libgccjit: Add vector permutation and vector access operations

2023-11-17 Thread Antoni Boucher
Hi.
This patch adds vector permutation and vector access operations (bug
112602).

This was split from this patch:
https://gcc.gnu.org/pipermail/jit/2023q1/001606.html

Thanks for the review.
From 25b386334f22845d7ba1b60658730373eb6ddbb3 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Fri, 17 Nov 2023 17:23:28 -0500
Subject: [PATCH] libgccjit: Add vector permutation and vector access
 operations

gcc/jit/ChangeLog:
	PR jit/112602
	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_26): New ABI tag.
	* docs/topics/expressions.rst: Document
	gcc_jit_context_new_rvalue_vector_perm and
	gcc_jit_context_new_vector_access.
	* jit-playback.cc (playback::context::new_rvalue_vector_perm,
	common_mark_addressable_vec,
	gnu_vector_type_p,
	lvalue_p,
	convert_vector_to_array_for_subscript,
	new_vector_access): new functions.
	* jit-playback.h (new_rvalue_vector_perm, new_vector_access):
	New functions.
	* jit-recording.cc (recording::context::new_rvalue_vector_perm,
	recording::context::new_vector_access,
	memento_of_new_rvalue_vector_perm,
	recording::memento_of_new_rvalue_vector_perm::replay_into,
	recording::memento_of_new_rvalue_vector_perm::visit_children,
	recording::memento_of_new_rvalue_vector_perm::make_debug_string,
	recording::memento_of_new_rvalue_vector_perm::write_reproducer,
	recording::vector_access::replay_into,
	recording::vector_access::visit_children,
	recording::vector_access::make_debug_string,
	recording::vector_access::write_reproducer): New methods.
	* jit-recording.h (class memento_of_new_rvalue_vector_perm,
	class vector_access): New classes.
	* libgccjit.cc (gcc_jit_context_new_vector_access,
	gcc_jit_context_new_rvalue_vector_perm): New functions.
	* libgccjit.h (gcc_jit_context_new_rvalue_vector_perm,
	gcc_jit_context_new_vector_access): New functions.
	* libgccjit.map: New functions.

gcc/testsuite/ChangeLog:
	PR jit/112602
	* jit.dg/all-non-failing-tests.h: New test test-vector-perm.c.
	* jit.dg/test-vector-perm.c: New test.
---
 gcc/jit/docs/topics/compatibility.rst|  10 ++
 gcc/jit/docs/topics/expressions.rst  |  53 ++
 gcc/jit/jit-playback.cc  | 150 
 gcc/jit/jit-playback.h   |  11 ++
 gcc/jit/jit-recording.cc | 169 +++
 gcc/jit/jit-recording.h  |  72 
 gcc/jit/libgccjit.cc | 109 
 gcc/jit/libgccjit.h  |  29 
 gcc/jit/libgccjit.map|   6 +
 gcc/testsuite/jit.dg/all-non-failing-tests.h |  12 +-
 gcc/testsuite/jit.dg/test-vector-perm.c  |  96 +++
 11 files changed, 716 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-vector-perm.c

diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index ebede440ee4..a764e3968d1 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -378,3 +378,13 @@ alignment of a variable:
 
 ``LIBGCCJIT_ABI_25`` covers the addition of
 :func:`gcc_jit_type_get_restrict`
+
+
+.. _LIBGCCJIT_ABI_26:
+
+``LIBGCCJIT_ABI_26``
+
+``LIBGCCJIT_ABI_26`` covers the addition of functions to manipulate vectors:
+
+  * :func:`gcc_jit_context_new_rvalue_vector_perm`
+  * :func:`gcc_jit_context_new_vector_access`
diff --git a/gcc/jit/docs/topics/expressions.rst b/gcc/jit/docs/topics/expressions.rst
index 42cfee36302..4a45aa13f5c 100644
--- a/gcc/jit/docs/topics/expressions.rst
+++ b/gcc/jit/docs/topics/expressions.rst
@@ -295,6 +295,35 @@ Vector expressions
 
   #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_rvalue_from_vector
 
+.. function:: gcc_jit_rvalue * \
+  gcc_jit_context_new_rvalue_vector_perm (gcc_jit_context *ctxt, \
+  gcc_jit_location *loc, \
+  gcc_jit_rvalue *elements1, \
+  gcc_jit_rvalue *elements2, \
+  gcc_jit_rvalue *mask);
+
+   Build a permutation of two vectors.
+
+   "elements1" and "elements2" should have the same type.
+   The length of "mask" and "elements1" should be the same.
+   The element type of "mask" should be integral.
+   The size of the element type of "mask" and "elements1" should be the same.
+
+   This entrypoint was added in :ref:`LIBGCCJIT_ABI_25`; you can test for
+   its presence using
+
+   .. code-block:: c
+
+  #ifdef LIBGCCJIT_HAVE_VECTOR_OPERATIONS
+
+Analogous to:
+
+.. code-block:: c
+
+   __builtin_shuffle (elements1, elements2, mask)
+
+in C.
+
 Unary Operations
 
 
@@ -1020,3 +1049,27 @@ Field access is provided separately for both lvalues and rvalues.
   PTR[INDEX]
 
in C (or, indeed, to ``PTR + INDEX``).
+
+.. function:: gcc_jit_lvalue *\
+  

[Bug ipa/112601] [11/12/13/14 Regression] ICE in cgraph_node::verify_node(): error: invalid calls_comdat_local flag

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112601

Andrew Pinski  changed:

           What    |Removed                     |Added

   Target Milestone|---                         |11.5
      Known to fail|                            |10.1.0
            Summary|ICE in                      |[11/12/13/14 Regression]
                   |cgraph_node::verify_node(): |ICE in
                   |error: invalid              |cgraph_node::verify_node():
                   |calls_comdat_local flag     |error: invalid
                   |                            |calls_comdat_local flag
      Known to work|                            |9.5.0
           Keywords|                            |ice-on-valid-code

[PATCH] Makefile.tpl: Avoid race condition in generating site.exp from the top level

2023-11-17 Thread Lewis Hyatt
Hello-

I often find it convenient to run a new c-c++-common test from the
main build dir like:

$ make -j 2 RUNTESTFLAGS=dg.exp=new-test.c check-gcc-{c,c++}

I noticed that sometimes this produces a corrupted site.exp and then no
tests work until it is remade manually. To avoid the issue, it is necessary
to do "cd gcc; make site.exp" before running a parallel make from the top
level directory. The below patch fixes it by just making that dependency on
site.exp explicit in the top level Makefile. Is it OK please? Thanks...

-Lewis

-- >8 --

A command like "make -j 2 check-gcc-c check-gcc-c++" run in the top level of
a fresh build directory does not work reliably. That will spawn two
independent make processes inside the "gcc" directory, and each of those
will attempt to create site.exp if it doesn't exist and will interfere with
each other, often producing a corrupted or empty site.exp. Resolve that by
making these targets depend on a new phony target which makes sure site.exp
is created first before starting the recursive makes.

ChangeLog:

* Makefile.in: Regenerate.
* Makefile.tpl: Add dependency on site.exp to check-gcc-* targets
---
 Makefile.in  | 30 +++---
 Makefile.tpl | 10 +-
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/Makefile.tpl b/Makefile.tpl
index 8b7783bb4f1..6e22adecd2f 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1639,9 +1639,17 @@ cross: all-build all-gas all-ld
 @endif gcc-no-bootstrap
 
 @if gcc
+
+.PHONY: gcc-site.exp
+gcc-site.exp:
+   r=`${PWD_COMMAND}`; export r; \
+   s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
+   $(HOST_EXPORTS) \
+   (cd gcc && $(MAKE) $(GCC_FLAGS_TO_PASS) site.exp);
+
 [+ FOR languages +]
 .PHONY: check-gcc-[+language+] check-[+language+]
-check-gcc-[+language+]:
+check-gcc-[+language+]: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
diff --git a/Makefile.in b/Makefile.in
index b65ab4953bc..da2344b3f3d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -62200,8 +62200,16 @@ cross: all-build all-gas all-ld
 
 @if gcc
 
+.PHONY: gcc-site.exp
+gcc-site.exp:
+   r=`${PWD_COMMAND}`; export r; \
+   s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
+   $(HOST_EXPORTS) \
+   (cd gcc && $(MAKE) $(GCC_FLAGS_TO_PASS) site.exp);
+
+
 .PHONY: check-gcc-c check-c
-check-gcc-c:
+check-gcc-c: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62209,7 +62217,7 @@ check-gcc-c:
 check-c: check-gcc-c
 
 .PHONY: check-gcc-c++ check-c++
-check-gcc-c++:
+check-gcc-c++: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62217,7 +62225,7 @@ check-gcc-c++:
 check-c++: check-gcc-c++ check-target-libstdc++-v3 check-target-libitm-c++ 
check-target-libgomp-c++
 
 .PHONY: check-gcc-fortran check-fortran
-check-gcc-fortran:
+check-gcc-fortran: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62225,7 +62233,7 @@ check-gcc-fortran:
 check-fortran: check-gcc-fortran check-target-libquadmath 
check-target-libgfortran check-target-libgomp-fortran
 
 .PHONY: check-gcc-ada check-ada
-check-gcc-ada:
+check-gcc-ada: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62233,7 +62241,7 @@ check-gcc-ada:
 check-ada: check-gcc-ada check-target-libada
 
 .PHONY: check-gcc-objc check-objc
-check-gcc-objc:
+check-gcc-objc: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62241,7 +62249,7 @@ check-gcc-objc:
 check-objc: check-gcc-objc check-target-libobjc
 
 .PHONY: check-gcc-obj-c++ check-obj-c++
-check-gcc-obj-c++:
+check-gcc-obj-c++: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62249,7 +62257,7 @@ check-gcc-obj-c++:
 check-obj-c++: check-gcc-obj-c++
 
 .PHONY: check-gcc-go check-go
-check-gcc-go:
+check-gcc-go: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62257,7 +62265,7 @@ check-gcc-go:
 check-go: check-gcc-go check-target-libgo check-gotools
 
 .PHONY: check-gcc-m2 check-m2
-check-gcc-m2:
+check-gcc-m2: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62265,7 +62273,7 @@ check-gcc-m2:
 check-m2: check-gcc-m2 check-target-libgm2
 
 .PHONY: check-gcc-d check-d
-check-gcc-d:
+check-gcc-d: gcc-site.exp
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
$(HOST_EXPORTS) \
@@ -62273,7 

[Bug jit/112602] New: Support vector permutation and access

2023-11-17 Thread bouanto at zoho dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112602

            Bug ID: 112602
           Summary: Support vector permutation and access
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: jit
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: bouanto at zoho dot com
  Target Milestone: ---

I'll soon send a patch for this feature request.

[Bug libstdc++/112596] GCC regex error in Opentelemetry C++

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112596

Jonathan Wakely  changed:

           What    |Removed                     |Added

   Last reconfirmed|                            |2023-11-17
             Status|UNCONFIRMED                 |WAITING
          Component|c++                         |libstdc++
     Ever confirmed|0                           |1

--- Comment #1 from Jonathan Wakely  ---
std::regex doesn't work in any release older than gcc-4.9.0 so I doubt it's
working properly with gcc-4.8.5

If it is working properly, then you're not actually using std::regex, so you
should just be able to remove it.

In any case, we can't do anything without the information we request for all
bug reports:
https://gcc.gnu.org/bugs/

Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Jonathan Wakely
On Fri, 17 Nov 2023 at 15:32, Jonathan Wakely  wrote:
>
> Tested x86_64-linux. Pushed to trunk.
>
> GCC generates better code for add_sat if we use:
>
> unsigned z = x + y;
> z |= -(z < x);
> return z;
>
> If the compiler can't be improved we should consider using that instead
> of __builtin_add_overflow.

I reported PR 112600 for the missed optimization. I added an optimized
sub_sat there as well.



[Bug ipa/112601] New: ICE in cgraph_node::verify_node(): error: invalid calls_comdat_local flag

2023-11-17 Thread slyfox at gcc dot gnu.org via Gcc-bugs
/slyfox/dev/git/gcc/configure --disable-multilib
--disable-bootstrap --disable-lto --disable-libsanitizer
--disable-libstdcxx-pch --enable-languages=c,c++ --disable-libgomp
--disable-libquadmath --disable-libvtv CFLAGS='-O1 -g0' CXXFLAGS='-O1 -g0'
LDFLAGS='-O1 -g0'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20231117 (experimental) (GCC)

[Bug c++/106650] [C++23] P2280 - Using unknown references in constant expressions

2023-11-17 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106650

Marek Polacek  changed:

           What    |Removed                     |Added

           Assignee|unassigned at gcc dot gnu.org  |mpolacek at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #3 from Marek Polacek  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637101.html
though I'm not sure it's complete.

[PATCH] c++: P2280R4, Using unknown refs in constant expr [PR106650]

2023-11-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch is an attempt to implement (part of?) P2280, Using unknown
pointers and references in constant expressions.  (Note that R4 seems to
only allow References to unknown/Accesses via this, but not Pointers to
unknown.)

This patch works to the extent that the test case added in [expr.const]
works as expected, as well as the test in


Most importantly, the proposal makes this compile:

  template <typename T, size_t N>
  constexpr auto array_size(T (&)[N]) -> size_t {
      return N;
  }

  void check(int const (&param)[3]) {
      constexpr auto s = array_size(param);
      static_assert (s == 3);
  }

and I think it would be a pity not to have it in GCC 14.
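
For compilers that don't implement P2280, the bound can still be taken from
the type of the reference rather than from the reference itself. A minimal
sketch of that workaround; array_size_of is a made-up helper, not something
from the patch:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: takes the array (or reference-to-array) *type*,
// so no reference is ever read during constant evaluation.
template <typename ArrayRef>
constexpr std::size_t array_size_of()
{
  return std::extent<typename std::remove_reference<ArrayRef>::type>::value;
}

inline std::size_t check(int const (&param)[3])
{
  // decltype(param) is an unevaluated operand, so this does not depend
  // on P2280 being implemented.
  constexpr std::size_t s = array_size_of<decltype(param)>();
  static_assert(s == 3, "array bound comes from the type");
  return s;
}
```

The cost is that every call site has to spell decltype, which is exactly the
kind of noise the proposal removes.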

What still doesn't work (and I don't know if it should) is the test in $3.2:

  struct A2 { constexpr int f() { return 0; } };
  struct B2 : virtual A2 {};
  void f2(B2 &b) { constexpr int k = b.f(); }

where we say
error: '* & b' is not a constant expression

PR c++/106650

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Allow reference to
unknown as per P2280.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-array-ptr6.C: Remove dg-error.
* g++.dg/cpp0x/constexpr-ref12.C: Likewise.
* g++.dg/cpp1y/lambda-generic-const10.C: Likewise.
* g++.dg/cpp0x/constexpr-ref13.C: New test.
* g++.dg/cpp1z/constexpr-ref1.C: New test.
* g++.dg/cpp1z/constexpr-ref2.C: New test.
* g++.dg/cpp2a/constexpr-ref1.C: New test.
---
 gcc/cp/constexpr.cc   |  2 +
 .../g++.dg/cpp0x/constexpr-array-ptr6.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C  |  4 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C  | 25 +
 .../g++.dg/cpp1y/lambda-generic-const10.C |  2 +-
 gcc/testsuite/g++.dg/cpp1z/constexpr-ref1.C   | 26 +
 gcc/testsuite/g++.dg/cpp1z/constexpr-ref2.C   | 23 
 gcc/testsuite/g++.dg/cpp2a/constexpr-ref1.C   | 54 +++
 8 files changed, 134 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-ref1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-ref2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-ref1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 344107d494b..d5e487801cc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7378,6 +7378,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
  r = build_constructor (TREE_TYPE (t), NULL);
  TREE_CONSTANT (r) = true;
}
+  else if (TYPE_REF_P (TREE_TYPE (t)))
+   /* P2280 allows references to unknown.  */;
   else
{
  if (!ctx->quiet)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
index 1c065120314..d212665e51f 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-array-ptr6.C
@@ -12,7 +12,7 @@ constexpr auto sz_d = size(array_double);
 static_assert(sz_d == 3, "Array size failure");
 
 void f(bool (&param)[2]) {
-  static_assert(size(param) == 2, "Array size failure"); // { dg-error "" }
+  static_assert(size(param) == 2, "Array size failure");
   short data[] = {-1, 2, -45, 6, 88, 99, -345};
   static_assert(size(data) == 7, "Array size failure");
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
index 7c3ce66b4c9..f4500144946 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref12.C
@@ -40,7 +40,7 @@ void f(a ap, a& arp)
   static_assert (g(ar2),"");   // { dg-error "constant" }
   static_assert (h(ar2),"");   // { dg-error "constant" }
 
-  static_assert (arp.g(),"");  // { dg-error "constant" }
-  static_assert (g(arp),"");   // { dg-error "constant" }
+  static_assert (arp.g(),"");
+  static_assert (g(arp),"");
   static_assert (h(arp),"");   // { dg-error "constant" }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
new file mode 100644
index 000..4be729c2301
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-ref13.C
@@ -0,0 +1,25 @@
+// P2280R4 - Using unknown pointers and references in constant expressions
+// PR c++/106650
+// { dg-do compile { target c++11 } }
+
+using size_t = decltype(sizeof(42));
+
+template <typename T, size_t N>
+constexpr auto array_size(T (&)[N]) -> size_t {
+return N;
+}
+
+void check(int const (&param)[3]) {
+int local[] = {1, 2, 3};
+constexpr auto s0 = array_size(local);
+constexpr auto s1 = array_size(param);
+}
+
+template <typename T, size_t N>
+constexpr size_t array_size_ptr(T (*)[N]) {
+return N;
+}
+
+void check_ptr(int const (*param)[3]) {
+constexpr auto s2 = array_size_ptr(param); // { 

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #2 from Jonathan Wakely  ---
For similar saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_sub_overflow(x, y, &z))
        return z;
    return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x - y;
    res &= -(res <= x);
    return res;
}

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
sub edi, esi
jb  .L3
mov eax, edi
ret
.L3:
xor eax, eax
ret
sub_sat2(unsigned int, unsigned int):
sub edi, esi
mov eax, 0
cmovnb  eax, edi
ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
subs    w2, w0, w1
mov w3, 0
cmp w0, w1
csel    w0, w2, w3, cs
ret
sub_sat2(unsigned int, unsigned int):
subs    w0, w0, w1
csel    w0, w0, wzr, cs
ret


Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
xor eax, eax
sub edi, esi
cmovae  eax, edi
ret
sub_sat2(unsigned int, unsigned int):
xor eax, eax
sub edi, esi
cmovae  eax, edi
ret
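
The masking trick in sub_sat2 isn't specific to unsigned int. Below is a
sketch generalized over unsigned types, with the extra casts that narrower
types need because of integer promotion. sub_sat_mask is a made-up name, and
this is only an illustration, not the libstdc++ implementation:

```cpp
#include <cstdint>
#include <type_traits>

template <typename T>
constexpr T sub_sat_mask(T x, T y) noexcept
{
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  // The difference wraps modulo 2^N when y > x.
  const T res = static_cast<T>(x - y);
  // res <= x holds exactly when there was no wrap-around; negating the
  // comparison result gives an all-ones mask (keep res) or zero (clamp).
  const T mask = static_cast<T>(-static_cast<int>(res <= x));
  return static_cast<T>(res & mask);
}

static_assert(sub_sat_mask(5u, 3u) == 2u, "no underflow");
static_assert(sub_sat_mask(3u, 5u) == 0u, "clamped to zero");
static_assert(sub_sat_mask(std::uint8_t{200}, std::uint8_t{100}) == 100,
              "8-bit, no underflow");
static_assert(sub_sat_mask(std::uint8_t{0}, std::uint8_t{1}) == 0,
              "8-bit, clamped to zero");
```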

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #1 from Jonathan Wakely  ---
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
adds    w0, w0, w1
bcs .L7
ret
.L7:
mov w0, -1
ret
add_sat2(unsigned int, unsigned int):
addsw0, w0, w1
csinv   w0, w0, wzr, cc
ret

[Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow

2023-11-17 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

            Bug ID: 112600
           Summary: Failed to optimize saturating addition using
                    __builtin_add_overflow
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

These two implementations of C++26 saturating addition (std::add_sat)
have equivalent behaviour:

unsigned
add_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_add_overflow(x, y, &z))
        return z;
    return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x + y;
    res |= -(res < x);
    return res;
}


For -O3 on x86_64 GCC uses a branch for the first one:

add_sat(unsigned int, unsigned int):
add edi, esi
jc  .L3
mov eax, edi
ret
.L3:
or  eax, -1
ret

For the second one we get better code:

add_sat2(unsigned int, unsigned int):
add edi, esi
sbb eax, eax
or  eax, edi
ret



Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
add edi, esi
mov eax, -1
cmovae  eax, edi
ret
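
One way to convince yourself that the two forms agree is to evaluate both at
compile time. This is only a sketch: it assumes a compiler that accepts
__builtin_add_overflow in constant expressions (recent GCC does), and z is
initialized only so that the function stays usable in constexpr before C++20:

```cpp
#include <limits>

constexpr unsigned add_sat(unsigned x, unsigned y) noexcept
{
  unsigned z = 0;
  if (!__builtin_add_overflow(x, y, &z))
    return z;
  return std::numeric_limits<unsigned>::max();
}

constexpr unsigned add_sat2(unsigned x, unsigned y) noexcept
{
  unsigned res = x + y;  // wraps around on overflow
  res |= -(res < x);     // res < x only after a wrap; then OR in all-ones
  return res;
}

constexpr unsigned umax = std::numeric_limits<unsigned>::max();
static_assert(add_sat(1u, 2u) == 3u && add_sat2(1u, 2u) == 3u, "no overflow");
static_assert(add_sat(umax, 1u) == umax && add_sat2(umax, 1u) == umax,
              "saturates at the maximum");
```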

[Bug target/112599] RISC-V regression testsuite errors with rv64gcv_zvl1024b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

--- Comment #1 from Patrick O'Neill  ---
Related issues:
128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
256: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597
512: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
1024: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

[Bug target/112598] RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

--- Comment #1 from Patrick O'Neill  ---
Related issues:
128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
256: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597
512: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
1024: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

[Bug target/112597] RISC-V regression testsuite errors with rv64gcv_zvl256b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597

--- Comment #1 from Patrick O'Neill  ---
Related issues:
128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
256: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597
512: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
1024: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

[Bug target/112583] RISC-V regression testsuite errors with rv64gcv_zvl128b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583

--- Comment #1 from Patrick O'Neill  ---
Related issues:
128: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
256: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597
512: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
1024: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

[Bug target/112599] New: RISC-V regression testsuite errors with rv64gcv_zvl1024b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112599

            Bug ID: 112599
           Summary: RISC-V regression testsuite errors with
                    rv64gcv_zvl1024b
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56626&action=edit
rv64gcv_zvl1024b testsuite failures 2023-11-17

Current testsuite status of rv64gcv_zvl1024b on GCC
5cb13173e85537a8a423b7b22b60ca3b6505f91e

I've started running zvl variants 128-1024b weekly on the postcommit CI. This
is my first time running these so if any of the failures look odd poke me here
or via email and I can dig into/share the logs.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/6898356494
Likely artifacts of interest:
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-debug-output.log
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-report.log
 
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-sum-files
 

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv64gcv_zvl1024b with glibc v2.37 on QEMU
v8.1.2

[Bug target/112598] New: RISC-V regression testsuite errors with rv64gcv_zvl512b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598

Bug ID: 112598
   Summary: RISC-V regression testsuite errors with
rv64gcv_zvl512b
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56625
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56625&action=edit
rv64gcv_zvl512b testsuite failures 2023-11-17

Current testsuite status of rv64gcv_zvl512b on GCC
5cb13173e85537a8a423b7b22b60ca3b6505f91e

I've started running zvl variants 128-1024b weekly on the postcommit CI. This
is my first time running these so if any of the failures look odd poke me here
or via email and I can dig into/share the logs.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/6898356494
Likely artifacts of interest:
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-debug-output.log
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-report.log
 
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-sum-files
 

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv64gcv_zvl512b with glibc v2.37 on QEMU
v8.1.2

[Bug target/112597] New: RISC-V regression testsuite errors with rv64gcv_zvl256b

2023-11-17 Thread patrick at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112597

Bug ID: 112597
   Summary: RISC-V regression testsuite errors with
rv64gcv_zvl256b
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: patrick at rivosinc dot com
  Target Milestone: ---

Created attachment 56624
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56624&action=edit
rv64gcv_zvl256b testsuite failures 2023-11-17

Current testsuite status of rv64gcv_zvl256b on GCC
5cb13173e85537a8a423b7b22b60ca3b6505f91e

I've started running zvl variants 128-1024b weekly on the postcommit CI. This
is my first time running these so if any of the failures look odd poke me here
or via email and I can dig into/share the logs.

Artifacts for this run can be downloaded here:
https://github.com/patrick-rivos/gcc-postcommit-ci/actions/runs/6898356494
Likely artifacts of interest:
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-debug-output.log
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-report.log
 
gcc-linux-rv64gcv_zvl-lp64d-5cb13173e85537a8a423b7b22b60ca3b6505f91e-multilib-sum-files
 

This is just a tracking issue, similar to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

I've attached the current results for rv64gcv_zvl256b with glibc v2.37 on QEMU
v8.1.2

Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law




On 11/17/23 14:08, Antoni Boucher wrote:
> In contrast with the other frontends, libgccjit can be executed
> multiple times in a row in the same process.

Yup.  I'm aware of that.  Even so calling init_emit_once more than one 
time still seems wrong.


jeff


Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Antoni Boucher
In contrast with the other frontends, libgccjit can be executed
multiple times in a row in the same process.
This is the source of multiple bugs due to global variables as can be
seen by several patches I sent these past years.
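
As a rough illustration of the kind of problem described here (a sketch only, not libgccjit or GCC code; next_temp_id, compile_unit and reset_compiler_state are hypothetical names), a global that is set up once per process leaks state into the second in-process compilation unless it is reset explicitly:

```c
/* Hypothetical global state, standing in for the compiler globals
   mentioned above.  Not taken from GCC.  */
static int next_temp_id;

/* Pretend to compile one unit: hand out the next temporary id.  */
int
compile_unit (void)
{
  return next_temp_id++;
}

/* What an in-process, multi-run user would need between two runs.  */
void
reset_compiler_state (void)
{
  next_temp_id = 0;
}
```

A second call to compile_unit without a reset sees the id left behind by the first call, which is harmless for a one-shot compiler process but not for a library that is run repeatedly.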

On Fri, 2023-11-17 at 14:06 -0700, Jeff Law wrote:
> 
> 
> On 11/16/23 15:36, Antoni Boucher wrote:
> > Hi.
> > This patch fixes a RTL bug when using some target-specific builtins
> > in
> > libgccjit (bug 112576).
> > 
> > The test use a function from an unmerged patch:
> > https://gcc.gnu.org/pipermail/jit/2023q1/001605.html
> > 
> > Thanks for the review!
> The natural question here is why does libgccjit call init_emit_once
> more 
> than one time?  The whole point of that routine is doing one time 
> initializations.  It's not supposed to be called more than once.
> 
> David?  Thoughts here?
> 
> jeff



Re: [PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-17 Thread Jeff Law




On 11/16/23 15:36, Antoni Boucher wrote:
> Hi.
> This patch fixes a RTL bug when using some target-specific builtins in
> libgccjit (bug 112576).
> 
> The test use a function from an unmerged patch:
> https://gcc.gnu.org/pipermail/jit/2023q1/001605.html
> 
> Thanks for the review!

The natural question here is why does libgccjit call init_emit_once more 
than one time?  The whole point of that routine is doing one time 
initializations.  It's not supposed to be called more than once.


David?  Thoughts here?

jeff


[Bug target/112578] LoongArch: Wrong code -with -mlsx -fno-fp-int-builtin-inexact

2023-11-17 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112578

Xi Ruoyao  changed:

   What    |Removed |Added

        URL|        |https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html
   See Also|        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107723
   Keywords|        |patch

--- Comment #3 from Xi Ruoyao  ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637097.html

This fixes most of -ml[a]sx -fno-fp-int-builtin-inexact issues on LoongArch,
except PR107723.

[PATCH] LoongArch: Fix usage of LSX and LASX frint/ftint instructions [PR112578]

2023-11-17 Thread Xi Ruoyao
The usage of LSX and LASX frint/ftint instructions had some problems:

1. These instructions raise FE_INEXACT, which is not allowed with
   -fno-fp-int-builtin-inexact for most C2x section F.10.6 functions
   (the only exceptions are rint, lrint, and llrint).
2. The "frint" instruction without explicit rounding mode is used for
   roundM2.  This is incorrect because roundM2 is defined as "rounding
   operand 1 to the *nearest* integer, rounding away from zero in the
   event of a tie".  We actually don't have such an instruction.  Our
   frintrne instruction is roundevenM2 (unfortunately, this is not
   documented).
3. These define_insn's are written in a way that is not easy to hack on.

So I removed these instructions and created a "simd.md" file, then added
them and the corresponding expanders there.  The advantage of the
simd.md file is we don't need to duplicate the RTL template twice (in
lsx.md and lasx.md).
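
To make point 2 concrete, here is a small scalar model of the two tie-breaking rules.  It is an illustrative sketch only, not the vector instructions themselves: the helper names are made up, and the casts limit it to values that fit in a long long.

```c
/* Round to nearest, ties away from zero (the roundM2 rule).  */
double
round_half_away (double x)
{
  double t = (double) (long long) x;   /* truncate toward zero */
  double frac = x - t;
  if (frac >= 0.5)
    return t + 1.0;
  if (frac <= -0.5)
    return t - 1.0;
  return t;
}

/* Round to nearest, ties to even (the roundevenM2 rule).  */
double
round_half_even (double x)
{
  double t = (double) (long long) x;   /* truncate toward zero */
  double frac = x - t;
  int t_is_odd = ((long long) t % 2) != 0;
  if (frac > 0.5 || (frac == 0.5 && t_is_odd))
    return t + 1.0;
  if (frac < -0.5 || (frac == -0.5 && t_is_odd))
    return t - 1.0;
  return t;
}
```

The two helpers agree everywhere except on exact halves, where the first moves away from zero and the second picks the even neighbour.  That difference is why an instruction with the second behaviour cannot implement roundM2.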

gcc/ChangeLog:

PR target/112578
* config/loongarch/lsx.md (UNSPEC_LSX_VFTINT_S,
UNSPEC_LSX_VFTINTRNE, UNSPEC_LSX_VFTINTRP,
UNSPEC_LSX_VFTINTRM, UNSPEC_LSX_VFRINTRNE_S,
UNSPEC_LSX_VFRINTRNE_D, UNSPEC_LSX_VFRINTRZ_S,
UNSPEC_LSX_VFRINTRZ_D, UNSPEC_LSX_VFRINTRP_S,
UNSPEC_LSX_VFRINTRP_D, UNSPEC_LSX_VFRINTRM_S,
UNSPEC_LSX_VFRINTRM_D): Remove.
(ILSX, FLSX): Move into ...
(VIMODE): Move into ...
(FRINT_S, FRINT_D): Remove.
(frint_pattern_s, frint_pattern_d, frint_suffix): Remove.
(lsx_vfrint_, lsx_vftint_s__,
lsx_vftintrne_w_s, lsx_vftintrne_l_d, lsx_vftintrp_w_s,
lsx_vftintrp_l_d, lsx_vftintrm_w_s, lsx_vftintrm_l_d,
lsx_vfrintrne_s, lsx_vfrintrne_d, lsx_vfrintrz_s,
lsx_vfrintrz_d, lsx_vfrintrp_s, lsx_vfrintrp_d,
lsx_vfrintrm_s, lsx_vfrintrm_d,
v4sf2,
v2df2, round2,
fix_trunc2): Remove.
* config/loongarch/lasx.md: Likewise.
* config/loongarch/simd.md: New file.
(ILSX, ILASX, FLSX, FLASX, VIMODE): ... here.
(IVEC, FVEC): New mode iterators.
(VIMODE): ... here.  Extend it to work for all LSX/LASX vector
modes.
(x, wu, simd_isa, WVEC, vimode, simdfmt, simdifmt_for_f,
elebits): New mode attributes.
(UNSPEC_SIMD_FRINTRP, UNSPEC_SIMD_FRINTRZ, UNSPEC_SIMD_FRINT,
UNSPEC_SIMD_FRINTRM, UNSPEC_SIMD_FRINTRNE): New unspecs.
(SIMD_FRINT): New int iterator.
(simd_frint_rounding, simd_frint_pattern): New int attributes.
(_vfrint_): New
define_insn template for frint instructions.
(_vftint__):
Likewise, but for ftint instructions.
(2): New define_expand with
flag_fp_int_builtin_inexact checked.
(l2): Likewise.
(rint2): New define_expand.  It does not require
flag_fp_int_builtin_inexact.
(ftrunc2): Likewise.
(lrint2): Likewise.
(fix_trunc2): New define_insn_and_split.  It does
not require flag_fp_int_builtin_inexact.
(include): Add lsx.md and lasx.md.
* config/loongarch/loongarch.md (include): Include simd.md,
instead of including lsx.md and lasx.md directly.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vftint_w_s, CODE_FOR_lsx_vftint_l_d,
CODE_FOR_lasx_xvftint_w_s, CODE_FOR_lasx_xvftint_l_d):
Remove.

gcc/testsuite/ChangeLog:

PR target/112578
* gcc.target/loongarch/vect-frint.c: New test.
* gcc.target/loongarch/vect-frint-no-inexact.c: New test.
* gcc.target/loongarch/vect-ftint.c: New test.
* gcc.target/loongarch/vect-ftint-no-inexact.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu (with LASX enabled
in BOOT_CFLAGS).  Ok for trunk?

 gcc/config/loongarch/lasx.md  | 239 -
 gcc/config/loongarch/loongarch-builtins.cc|   4 -
 gcc/config/loongarch/loongarch.md |   7 +-
 gcc/config/loongarch/lsx.md   | 243 --
 gcc/config/loongarch/simd.md  | 204 +++
 .../loongarch/vect-frint-no-inexact.c |  48 
 .../gcc.target/loongarch/vect-frint.c |  82 ++
 .../loongarch/vect-ftint-no-inexact.c |  44 
 .../gcc.target/loongarch/vect-ftint.c |  80 ++
 9 files changed, 460 insertions(+), 491 deletions(-)
 create mode 100644 gcc/config/loongarch/simd.md
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-frint.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint-no-inexact.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-ftint.c

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2e11f061202..d4a56c307c4 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -53,7 +53,6 @@
   UNSPEC_LASX_XVFCMP_SULT
   UNSPEC_LASX_XVFCMP_SUN
   

[PATCH v2 3/6] LoongArch: Add evolution features of base ISA revisions

2023-11-17 Thread Xi Ruoyao
* config/loongarch/loongarch-def.h:
(loongarch_isa_base_features): Declare.  Define it in ...
* config/loongarch/loongarch-cpu.cc
(loongarch_isa_base_features): ... here.
(fill_native_cpu_config): If we know the base ISA of the CPU
model from PRID, use it instead of la64 (v1.0).  Check if all
expected features of this base ISA are available, and emit a warning
if not.
* config/loongarch/loongarch-opts.cc (config_target_isa): Enable
the features implied by the base ISA if not -march=native.
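
The consistency check described for fill_native_cpu_config boils down to a bitmask subset test.  A minimal sketch follows, assuming made-up bit values for the two features (the real OPTION_MASK_ISA_* values are generated and not shown here); has_all_base_features is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define DEMO_MASK_DIV32      (INT64_C (1) << 0)
#define DEMO_MASK_LD_SEQ_SA  (INT64_C (1) << 1)
#define DEMO_V110_FEATURES   (DEMO_MASK_DIV32 | DEMO_MASK_LD_SEQ_SA)

/* True if every feature bit required by the base ISA was detected.  */
bool
has_all_base_features (int64_t detected, int64_t required)
{
  return (detected & required) == required;
}
```

When the helper returns false, the situation matches the warning case in the patch: the base ISA was inferred from the CPU model, but at least one feature it implies was not detected.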
---
 gcc/config/loongarch/loongarch-cpu.cc  | 62 ++
 gcc/config/loongarch/loongarch-def.h   |  5 +++
 gcc/config/loongarch/loongarch-opts.cc |  3 ++
 3 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index f41e175257a..7acf1a9121d 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -32,6 +32,19 @@ along with GCC; see the file COPYING3.  If not see
 #include "loongarch-cpucfg-map.h"
 #include "loongarch-str.h"
 
+/* loongarch_isa_base_features defined here instead of loongarch-def.c
+   because we need to use options.h.  Pay attention on the order of elements
+   in the initializer becaue ISO C++ does not allow C99 designated
+   initializers!  */
+
+#define ISA_BASE_LA64V110_FEATURES \
+  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA)
+
+int64_t loongarch_isa_base_features[N_ISA_BASE_TYPES] = {
+  /* [ISA_BASE_LA64V100] = */ 0,
+  /* [ISA_BASE_LA64V110] = */ ISA_BASE_LA64V110_FEATURES,
+};
+
 /* Native CPU detection with "cpucfg" */
 static uint32_t cpucfg_cache[N_CPUCFG_WORDS] = { 0 };
 
@@ -127,24 +140,22 @@ fill_native_cpu_config (struct loongarch_target *tgt)
 With: base architecture (ARCH)
 At:   cpucfg_words[1][1:0] */
 
-  switch (cpucfg_cache[1] & 0x3)
-   {
- case 0x02:
-   tmp = ISA_BASE_LA64V100;
-   break;
-
- default:
-   fatal_error (UNKNOWN_LOCATION,
-"unknown native base architecture %<0x%x%>, "
-"%qs failed", (unsigned int) (cpucfg_cache[1] & 0x3),
-"-m" OPTSTR_ARCH "=" STR_CPU_NATIVE);
-   }
-
-  /* Check consistency with PRID presets.  */
-  if (native_cpu_type != CPU_NATIVE && tmp != preset.base)
-   warning (0, "base architecture %qs differs from PRID preset %qs",
-loongarch_isa_base_strings[tmp],
-loongarch_isa_base_strings[preset.base]);
+  if (native_cpu_type != CPU_NATIVE)
+   tmp = loongarch_cpu_default_isa[native_cpu_type].base;
+  else
+   switch (cpucfg_cache[1] & 0x3)
+ {
+   case 0x02:
+ tmp = ISA_BASE_LA64V100;
+ break;
+
+   default:
+ fatal_error (UNKNOWN_LOCATION,
+  "unknown native base architecture %<0x%x%>, "
+  "%qs failed",
+  (unsigned int) (cpucfg_cache[1] & 0x3),
+  "-m" OPTSTR_ARCH "=" STR_CPU_NATIVE);
+ }
 
   /* Use the native value anyways.  */
   preset.base = tmp;
@@ -227,6 +238,21 @@ fill_native_cpu_config (struct loongarch_target *tgt)
   for (const auto &entry: cpucfg_map)
if (cpucfg_cache[entry.cpucfg_word] & entry.cpucfg_bit)
  preset.evolution |= entry.isa_evolution_bit;
+
+  if (native_cpu_type != CPU_NATIVE)
+   {
+ /* Check if the local CPU really supports the features of the base
+ISA of probed native_cpu_type.  If any feature is not detected,
+either GCC or the hardware is buggy.  */
+ auto base_isa_feature = loongarch_isa_base_features[preset.base];
+ if ((preset.evolution & base_isa_feature) != base_isa_feature)
+   warning (0,
+"detected base architecture %qs, but some of its "
+"features are not detected; the detected base "
+"architecture may be unreliable, only detected "
+"features will be enabled",
+loongarch_isa_base_strings[preset.base]);
+   }
 }
 
   if (tune_native_p)
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 6123c8e0f19..af7bd635d6e 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -55,12 +55,17 @@ extern "C" {
 
 /* enum isa_base */
 extern const char* loongarch_isa_base_strings[];
+
 /* LoongArch V1.00.  */
 #define ISA_BASE_LA64V100 0
 /* LoongArch V1.10.  */
 #define ISA_BASE_LA64V110 1
 #define N_ISA_BASE_TYPES  2
 
+/* Unlike other arrays, this is defined in loongarch-cpu.cc.  The problem is
+   we cannot use the C++ header options.h in loongarch-def.c.  */
+extern int64_t loongarch_isa_base_features[];
+
 /* enum isa_ext_* */
 extern const 

[PATCH v2 4/6] LoongArch: Take the advantage of -mdiv32 if it's enabled

2023-11-17 Thread Xi Ruoyao
With -mdiv32, we can assume div.w[u] and mod.w[u] work on the low 32 bits
of a 64-bit GPR even if it's not sign-extended.
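
As a rough model of that assumption (a sketch in plain C, not the hardware definition; div_w_model is a hypothetical name), a signed 32-bit divide that ignores the upper halves of its 64-bit inputs and sign-extends the quotient could look like this:

```c
#include <stdint.h>

/* Model of a 32-bit signed divide that only looks at the low 32 bits of
   each 64-bit register and sign-extends the 32-bit quotient.  The caller
   must avoid a zero divisor and INT32_MIN / -1.  */
int64_t
div_w_model (uint64_t rj, uint64_t rk)
{
  /* Out-of-range unsigned-to-signed conversion is implementation-defined
     in C; GCC wraps modulo 2^32, which is what this sketch relies on.  */
  int32_t a = (int32_t) (uint32_t) rj;
  int32_t b = (int32_t) (uint32_t) rk;
  return (int64_t) (a / b);
}
```

Whatever sits in the upper 32 bits of either input does not affect the result in this model, which is why the explicit sign-extension before the divide can be dropped.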

gcc/ChangeLog:

* config/loongarch/loongarch.md (DIV): New mode iterator.
(3): Don't expand if TARGET_DIV32.
(di3_fake): Disable if TARGET_DIV32.
(*3): Allow SImode if TARGET_DIV32.
(si3_extended): New insn if TARGET_DIV32.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/div-div32.c: New test.
* gcc.target/loongarch/div-no-div32.c: New test.
---
 gcc/config/loongarch/loongarch.md | 31 ---
 .../gcc.target/loongarch/div-div32.c  | 31 +++
 .../gcc.target/loongarch/div-no-div32.c   | 11 +++
 3 files changed, 68 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 22814a3679c..a97e5ee094a 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -408,6 +408,10 @@ (define_mode_iterator LD_AT_LEAST_32_BIT [GPR ANYF])
 ;; st.w.
 (define_mode_iterator ST_ANY [QHWD ANYF])
 
+;; A mode for anything legal as a input of a div or mod instruction.
+(define_mode_iterator DIV [(DI "TARGET_64BIT")
+  (SI "!TARGET_64BIT || TARGET_DIV32")])
+
 ;; In GPR templates, a string like "mul." will expand to "mul.w" in the
 ;; 32-bit version and "mul.d" in the 64-bit version.
 (define_mode_attr d [(SI "w") (DI "d")])
@@ -914,7 +918,7 @@ (define_expand "3"
 (match_operand:GPR 2 "register_operand")))]
   ""
 {
- if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT && !TARGET_DIV32)
   {
 rtx reg1 = gen_reg_rtx (DImode);
 rtx reg2 = gen_reg_rtx (DImode);
@@ -934,9 +938,9 @@ (define_expand "3"
 })
 
 (define_insn "*3"
-  [(set (match_operand:X 0 "register_operand" "=r,,")
-   (any_div:X (match_operand:X 1 "register_operand" "r,r,0")
-  (match_operand:X 2 "register_operand" "r,r,r")))]
+  [(set (match_operand:DIV 0 "register_operand" "=r,,")
+   (any_div:DIV (match_operand:DIV 1 "register_operand" "r,r,0")
+(match_operand:DIV 2 "register_operand" "r,r,r")))]
   ""
 {
   return loongarch_output_division (".\t%0,%1,%2", operands);
@@ -949,6 +953,23 @@ (define_insn "*3"
(const_string "yes")
(const_string "no")))])
 
+(define_insn "si3_extended"
+  [(set (match_operand:DI 0 "register_operand" "=r,,")
+   (sign_extend
+ (any_div:SI (match_operand:SI 1 "register_operand" "r,r,0")
+ (match_operand:SI 2 "register_operand" "r,r,r"]
+  "TARGET_64BIT && TARGET_DIV32"
+{
+  return loongarch_output_division (".w\t%0,%1,%2", operands);
+}
+  [(set_attr "type" "idiv")
+   (set_attr "mode" "SI")
+   (set (attr "enabled")
+  (if_then_else
+   (match_test "!!which_alternative == loongarch_check_zero_div_p()")
+   (const_string "yes")
+   (const_string "no")))])
+
 (define_insn "di3_fake"
   [(set (match_operand:DI 0 "register_operand" "=r,,")
(sign_extend:DI
@@ -957,7 +978,7 @@ (define_insn "di3_fake"
 (any_div:DI (match_operand:DI 1 "register_operand" "r,r,0")
 (match_operand:DI 2 "register_operand" "r,r,r")) 0)]
  UNSPEC_FAKE_ANY_DIV)))]
-  "TARGET_64BIT"
+  "TARGET_64BIT && !TARGET_DIV32"
 {
   return loongarch_output_division (".w\t%0,%1,%2", operands);
 }
diff --git a/gcc/testsuite/gcc.target/loongarch/div-div32.c 
b/gcc/testsuite/gcc.target/loongarch/div-div32.c
new file mode 100644
index 000..8b1f686eca2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-div32.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d -mdiv32" } */
+/* { dg-final { scan-assembler "div\.w" } } */
+/* { dg-final { scan-assembler "div\.wu" } } */
+/* { dg-final { scan-assembler "mod\.w" } } */
+/* { dg-final { scan-assembler "mod\.wu" } } */
+/* { dg-final { scan-assembler-not "slli\.w.*,0" } } */
+
+int
+divw (long a, long b)
+{
+  return (int)a / (int)b;
+}
+
+unsigned int
+divwu (long a, long b)
+{
+  return (unsigned int)a / (unsigned int)b;
+}
+
+int
+modw (long a, long b)
+{
+  return (int)a % (int)b;
+}
+
+unsigned int
+modwu (long a, long b)
+{
+  return (unsigned int)a % (unsigned int)b;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/div-no-div32.c 
b/gcc/testsuite/gcc.target/loongarch/div-no-div32.c
new file mode 100644
index 000..f0f697ba589
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/div-no-div32.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "div\.w" } } */
+/* { dg-final { scan-assembler "div\.wu" } } */
+/* { dg-final { scan-assembler "mod\.w" } } */
+/* { 

[PATCH v2 6/6] LoongArch: Add fine-grained control for LAM_BH and LAMCAS

2023-11-17 Thread Xi Ruoyao
gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in: (lam-bh, lamcas):
Add.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-cpucfg-map.h: Regenerate.
* config/loongarch/loongarch-cpu.cc
(ISA_BASE_LA64V110_FEATURES): Include OPTION_MASK_ISA_LAM_BH
and OPTION_MASK_ISA_LAMCAS.
* config/loongarch/sync.md (atomic_add): Use
TARGET_LAM_BH instead of ISA_BASE_IS_LA64V110.  Remove empty
lines from assembly output.
(atomic_exchange_short): Likewise.
(atomic_exchange): Likewise.
(atomic_fetch_add_short): Likewise.
(atomic_fetch_add): Likewise.
(atomic_cas_value_strong_amcas): Use TARGET_LAMCAS instead
of ISA_BASE_IS_LA64V110.
(atomic_compare_and_swap): Likewise.
(atomic_compare_and_swap): Likewise.
(atomic_compare_and_swap): Likewise.
* config/loongarch/loongarch.cc (loongarch_asm_code_end): Dump
status of -mlam-bh and -mlamcas if -fverbose-asm.
---
 gcc/config/loongarch/genopts/isa-evolution.in |  2 ++
 gcc/config/loongarch/loongarch-cpu.cc |  3 ++-
 gcc/config/loongarch/loongarch-cpucfg-map.h   |  2 ++
 gcc/config/loongarch/loongarch-str.h  |  2 ++
 gcc/config/loongarch/loongarch.cc |  2 ++
 gcc/config/loongarch/loongarch.opt|  8 
 gcc/config/loongarch/sync.md  | 18 +-
 7 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/config/loongarch/genopts/isa-evolution.in 
b/gcc/config/loongarch/genopts/isa-evolution.in
index e58f0d6a1a1..a6bc3f87f20 100644
--- a/gcc/config/loongarch/genopts/isa-evolution.in
+++ b/gcc/config/loongarch/genopts/isa-evolution.in
@@ -1,2 +1,4 @@
 2  26  div32   Support div.w[u] and mod.w[u] instructions with 
inputs not sign-extended.
+2  27  lam-bh  Support am{swap/add}[_db].{b/h} instructions.
+2  28  lamcas  Support amcas[_db].{b/h/w/d} instructions.
 3  23  ld-seq-sa   Do not need load-load barriers (dbar 0x700).
diff --git a/gcc/config/loongarch/loongarch-cpu.cc 
b/gcc/config/loongarch/loongarch-cpu.cc
index 7acf1a9121d..622df47916f 100644
--- a/gcc/config/loongarch/loongarch-cpu.cc
+++ b/gcc/config/loongarch/loongarch-cpu.cc
@@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
initializers!  */
 
 #define ISA_BASE_LA64V110_FEATURES \
-  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA)
+  (OPTION_MASK_ISA_DIV32 | OPTION_MASK_ISA_LD_SEQ_SA \
+   | OPTION_MASK_ISA_LAM_BH | OPTION_MASK_ISA_LAMCAS)
 
 int64_t loongarch_isa_base_features[N_ISA_BASE_TYPES] = {
   /* [ISA_BASE_LA64V100] = */ 0,
diff --git a/gcc/config/loongarch/loongarch-cpucfg-map.h 
b/gcc/config/loongarch/loongarch-cpucfg-map.h
index 0c078c39786..02ff1671255 100644
--- a/gcc/config/loongarch/loongarch-cpucfg-map.h
+++ b/gcc/config/loongarch/loongarch-cpucfg-map.h
@@ -30,6 +30,8 @@ static constexpr struct {
   HOST_WIDE_INT isa_evolution_bit;
 } cpucfg_map[] = {
   { 2, 1u << 26, OPTION_MASK_ISA_DIV32 },
+  { 2, 1u << 27, OPTION_MASK_ISA_LAM_BH },
+  { 2, 1u << 28, OPTION_MASK_ISA_LAMCAS },
   { 3, 1u << 23, OPTION_MASK_ISA_LD_SEQ_SA },
 };
 
diff --git a/gcc/config/loongarch/loongarch-str.h 
b/gcc/config/loongarch/loongarch-str.h
index 889962e9ab0..0384493765c 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -70,6 +70,8 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_EXPLICIT_RELOCS_ALWAYS "always"
 
 #define OPTSTR_DIV32   "div32"
+#define OPTSTR_LAM_BH  "lam-bh"
+#define OPTSTR_LAMCAS  "lamcas"
 #define OPTSTR_LD_SEQ_SA   "ld-seq-sa"
 
 #endif /* LOONGARCH_STR_H */
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 5d3282c5e93..46a898b79b7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -11451,6 +11451,8 @@ loongarch_asm_code_end (void)
   fprintf (asm_out_file, "%s Base ISA: %s\n", ASM_COMMENT_START,
   loongarch_isa_base_strings [la_target.isa.base]);
   DUMP_FEATURE (TARGET_DIV32);
+  DUMP_FEATURE (TARGET_LAM_BH);
+  DUMP_FEATURE (TARGET_LAMCAS);
   DUMP_FEATURE (TARGET_LD_SEQ_SA);
 }
 
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index a39eddc108b..4d36e3ec4de 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -267,6 +267,14 @@ mdiv32
 Target Mask(ISA_DIV32) Var(isa_evolution)
 Support div.w[u] and mod.w[u] instructions with inputs not sign-extended.
 
+mlam-bh
+Target Mask(ISA_LAM_BH) Var(isa_evolution)
+Support am{swap/add}[_db].{b/h} instructions.
+
+mlamcas
+Target Mask(ISA_LAMCAS) Var(isa_evolution)
+Support amcas[_db].{b/h/w/d} instructions.
+
 mld-seq-sa
 Target Mask(ISA_LD_SEQ_SA) Var(isa_evolution)
 Do not need 

[PATCH v2 5/6] LoongArch: Don't emit dbar 0x700 if -mld-seq-sa

2023-11-17 Thread Xi Ruoyao
This option (CPUCFG word 0x3 bit 23) means "the hardware guarantees that
two loads on the same address won't be reordered with each other".  Thus
we can omit the "load-load" barrier dbar 0x700.

This is only a micro-optimization because dbar 0x700 is already treated
as nop if the hardware supports LD_SEQ_SA.
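
For orientation, the source-level operation affected is a relaxed atomic load; per the sync.md hunk in this patch, the MEMMODEL_RELAXED case of atomic_load is where dbar 0x700 is dropped.  A minimal C11 sketch (load_relaxed is an illustrative name; the exact instructions emitted depend on the compiler version and flags, so none are claimed here):

```c
#include <stdatomic.h>

/* A relaxed atomic load: the kind of access whose trailing barrier
   this patch omits when -mld-seq-sa is in effect.  */
int
load_relaxed (_Atomic int *p)
{
  return atomic_load_explicit (p, memory_order_relaxed);
}
```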

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_print_operand): Don't
print dbar 0x700 if TARGET_LD_SEQ_SA.
* config/loongarch/sync.md (atomic_load): Likewise.
---
 gcc/config/loongarch/loongarch.cc | 2 +-
 gcc/config/loongarch/sync.md  | 9 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index b4bb2b6eeb5..5d3282c5e93 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6057,7 +6057,7 @@ loongarch_print_operand (FILE *file, rtx op, int letter)
   if (loongarch_cas_failure_memorder_needs_acquire (
memmodel_from_int (INTVAL (op
fputs ("dbar\t0b10100", file);
-  else
+  else if (!TARGET_LD_SEQ_SA)
fputs ("dbar\t0x700", file);
   break;
 
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md
index 67848d72b87..ce3ce89a61d 100644
--- a/gcc/config/loongarch/sync.md
+++ b/gcc/config/loongarch/sync.md
@@ -119,13 +119,14 @@ (define_insn "atomic_load"
 case MEMMODEL_SEQ_CST:
   return "dbar\t0x11\\n\\t"
 "ld.\t%0,%1\\n\\t"
-"dbar\t0x14\\n\\t";
+"dbar\t0x14";
 case MEMMODEL_ACQUIRE:
   return "ld.\t%0,%1\\n\\t"
-"dbar\t0x14\\n\\t";
+"dbar\t0x14";
 case MEMMODEL_RELAXED:
-  return "ld.\t%0,%1\\n\\t"
-"dbar\t0x700\\n\\t";
+  return TARGET_LD_SEQ_SA ? "ld.\t%0,%1\\n\\t"
+ : "ld.\t%0,%1\\n\\t"
+   "dbar\t0x700";
 
 default:
   /* The valid memory order variants are __ATOMIC_RELAXED, 
__ATOMIC_SEQ_CST,
-- 
2.42.1



[PATCH v2 2/6] LoongArch: genopts: Add infrastructure to generate code for new features in ISA evolution

2023-11-17 Thread Xi Ruoyao
LoongArch v1.10 introduced the concept of ISA evolution.  During ISA
evolution, many independent features can be added and enumerated via
CPUCFG.

Add a data file into genopts storing the CPUCFG word, the bit, the name
of the command line option controlling whether this feature should be used
for compilation, and the text description.  Make genstr.sh process this
info, add the command line options to loongarch.opt and
loongarch-str.h, and generate a new file loongarch-cpucfg-map.h for
mapping CPUCFG output to the corresponding option.  When handling
-march=native, use the information in loongarch-cpucfg-map.h to generate
the corresponding option mask.  Enable the features implied by -march
setting unless the user has explicitly disabled the feature.
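
The mapping from CPUCFG output to an option mask can be sketched as a small table walk.  This is an illustration under assumptions, not the generated header: the (word, bit) pairs 2/26 and 3/23 are the div32 and ld-seq-sa entries of isa-evolution.in, while the mask values, demo_map and evolution_from_cpucfg are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option mask values, for illustration only.  */
#define DEMO_MASK_DIV32      (INT64_C (1) << 0)
#define DEMO_MASK_LD_SEQ_SA  (INT64_C (1) << 1)

struct demo_cpucfg_entry
{
  int word;       /* index of the CPUCFG word */
  uint32_t bit;   /* bit to test in that word */
  int64_t mask;   /* option mask to set if the bit is present */
};

static const struct demo_cpucfg_entry demo_map[] = {
  { 2, UINT32_C (1) << 26, DEMO_MASK_DIV32 },
  { 3, UINT32_C (1) << 23, DEMO_MASK_LD_SEQ_SA },
};

/* Collect the option masks of all features whose CPUCFG bit is set.
   WORDS must have at least four elements.  */
int64_t
evolution_from_cpucfg (const uint32_t *words)
{
  int64_t mask = 0;
  for (size_t i = 0; i < sizeof demo_map / sizeof demo_map[0]; i++)
    if (words[demo_map[i].word] & demo_map[i].bit)
      mask |= demo_map[i].mask;
  return mask;
}
```

Keeping the table in one generated place means a new feature only needs a new line in the data file, which is the point of the genopts infrastructure described above.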

The added options (-mdiv32 and -mld-seq-sa) are not really handled yet.
They'll be used in the following patches.

gcc/ChangeLog:

* config/loongarch/genopts/isa-evolution.in: New data file.
* config/loongarch/genopts/genstr.sh: Translate info in
isa-evolution.in when generating loongarch-str.h, loongarch.opt,
and loongarch-cpucfg-map.h.
* config/loongarch/genopts/loongarch.opt.in (isa_evolution):
New variable.
* config/loongarch/t-loongarch: (loongarch-cpucfg-map.h): New
rule.
(loongarch-str.h): Depend on isa-evolution.in.
(loongarch.opt): Depend on isa-evolution.in.
(loongarch-cpu.o): Depend on loongarch-cpucfg-map.h.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch-def.h (loongarch_isa):  Add field
for evolution features.  Add helper function to enable features
in this field.
Probe native CPU capability and save the corresponding options
into preset.
* config/loongarch/loongarch-cpu.cc (fill_native_cpu_config):
Probe native CPU capability and save the corresponding options
into preset.
(cache_cpucfg): Simplify with C++11-style for loop.
(cpucfg_useful_idx, N_CPUCFG_WORDS): Move to ...
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Enable the ISA evolution
feature options implied by -march and not explicitly disabled.
(loongarch_asm_code_end): New function, print ISA information as
comments in the assembly if -fverbose-asm.  This makes it easier to
debug things like -march=native.
(TARGET_ASM_CODE_END): Define.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-cpucfg-map.h: Generate.
(cpucfg_useful_idx, N_CPUCFG_WORDS) ... here.
---
 gcc/config/loongarch/genopts/genstr.sh| 92 ++-
 gcc/config/loongarch/genopts/isa-evolution.in |  2 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  7 ++
 gcc/config/loongarch/loongarch-cpu.cc | 46 +-
 gcc/config/loongarch/loongarch-cpucfg-map.h   | 48 ++
 gcc/config/loongarch/loongarch-def.h  |  7 ++
 gcc/config/loongarch/loongarch-str.h  |  7 +-
 gcc/config/loongarch/loongarch.cc | 31 +++
 gcc/config/loongarch/loongarch.opt| 20 +++-
 gcc/config/loongarch/t-loongarch  | 21 -
 10 files changed, 245 insertions(+), 36 deletions(-)
 create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
 create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h

diff --git a/gcc/config/loongarch/genopts/genstr.sh 
b/gcc/config/loongarch/genopts/genstr.sh
index 04e785576bb..cc83496ae38 100755
--- a/gcc/config/loongarch/genopts/genstr.sh
+++ b/gcc/config/loongarch/genopts/genstr.sh
@@ -25,8 +25,8 @@ cd "$(dirname "$0")"
 # Generate a header containing definitions from the string table.
 gen_defines() {
 cat .  */
+
+#ifndef LOONGARCH_CPUCFG_MAP_H
+#define LOONGARCH_CPUCFG_MAP_H
+
+#include "options.h"
+
+static constexpr struct {
+  int cpucfg_word;
+  unsigned int cpucfg_bit;
+  HOST_WIDE_INT isa_evolution_bit;
+} cpucfg_map[] = {
+EOF
+
+# Generate the strings from isa-evolution.in.
+awk '{
+  gsub(/-/, "_", $3)
+  print("  { "$1", 1u << "$2", OPTION_MASK_ISA_"toupper($3)" },")
+}' isa-evolution.in
+
+echo "};"
+echo
+echo "static constexpr int cpucfg_useful_idx[] = {"
+
+awk 'BEGIN { print("  0,\n  1,\n  2,\n  16,\n  17,\n  18,\n  19,") }
+{if ($1+0 > max+0) max=$1; print("  "$1",")}' \
+   isa-evolution.in | sort -n | uniq
+
+echo "};"
+echo ""
+
+awk 'BEGIN { max=19 }
+{ if ($1+0 > max+0) max=$1 }
+END { print "static constexpr int N_CPUCFG_WORDS = "1+max";" }' \
+   isa-evolution.in
+
+echo "#endif /* LOONGARCH_CPUCFG_MAP_H */"
 }
 
 main() {
 case "$1" in
+   cpucfg-map) gen_cpucfg_map;;
header) gen_defines;;
opt) gen_options;;
-   *) echo "Unknown Command: \"$1\". Available: header, opt"; exit 1;;
+   *) echo "Unknown Command: \"$1\". Available: 

[PATCH v2 1/6] LoongArch: Fix internal error running "gcc -march=native" on LA664

2023-11-17 Thread Xi Ruoyao
On LA664, the PRID preset is ISA_BASE_LA64V110 but the base architecture
is guessed as ISA_BASE_LA64V100.  This causes a warning to be output:

cc1: warning: base architecture 'la64' differs from PRID preset '?'

But we haven't set the entry shown as "?" above in
loongarch_isa_base_strings, so it's a nullptr and an ICE is triggered.

Add ISA_BASE_LA64V110 to genopts and initialize
loongarch_isa_base_strings[ISA_BASE_LA64V110] correctly to fix the ICE.
The warning itself will be fixed later.
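
The underlying C behaviour can be shown with a tiny stand-alone sketch (the demo_* names are hypothetical and only mimic the shape of loongarch_isa_base_strings): any array slot without a designated initializer is implicitly a null pointer.

```c
#include <stddef.h>

enum { DEMO_BASE_V100, DEMO_BASE_V110, DEMO_N_BASE };

/* Before: only the first slot is initialized, so the second one is
   implicitly NULL.  */
static const char *const demo_strings_before[DEMO_N_BASE] = {
  [DEMO_BASE_V100] = "la64",
};

/* After: every slot has a string.  */
static const char *const demo_strings_after[DEMO_N_BASE] = {
  [DEMO_BASE_V100] = "la64",
  [DEMO_BASE_V110] = "la64v1.1",
};

/* Plain table lookup, with no NULL check, like a direct array index.  */
const char *
demo_base_name (const char *const *table, int base)
{
  return table[base];
}
```

Passing the null result from the first table to a string formatter is the sort of use that goes wrong; filling in the missing slot removes the null pointer at its source.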

gcc/ChangeLog:

* config/loongarch/genopts/loongarch-strings:
(STR_ISA_BASE_LA64V110): Add.
* config/loongarch/genopts/loongarch.opt.in:
(ISA_BASE_LA64V110): Add.
* config/loongarch/loongarch-def.c
(loongarch_isa_base_strings): Initialize [ISA_BASE_LA64V110]
to STR_ISA_BASE_LA64V110.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
---
 gcc/config/loongarch/genopts/loongarch-strings | 1 +
 gcc/config/loongarch/genopts/loongarch.opt.in  | 3 +++
 gcc/config/loongarch/loongarch-def.c   | 1 +
 gcc/config/loongarch/loongarch-str.h   | 1 +
 gcc/config/loongarch/loongarch.opt | 3 +++
 5 files changed, 9 insertions(+)

diff --git a/gcc/config/loongarch/genopts/loongarch-strings 
b/gcc/config/loongarch/genopts/loongarch-strings
index 7bc4824007e..b2070c83ed0 100644
--- a/gcc/config/loongarch/genopts/loongarch-strings
+++ b/gcc/config/loongarch/genopts/loongarch-strings
@@ -30,6 +30,7 @@ STR_CPU_LA664   la664
 
 # Base architecture
 STR_ISA_BASE_LA64V100 la64
+STR_ISA_BASE_LA64V110 la64v1.1
 
 # -mfpu
 OPTSTR_ISA_EXT_FPUfpu
diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in 
b/gcc/config/loongarch/genopts/loongarch.opt.in
index 00b4733d75b..b274b3fb21e 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -32,6 +32,9 @@ Basic ISAs of LoongArch:
 EnumValue
 Enum(isa_base) String(@@STR_ISA_BASE_LA64V100@@) Value(ISA_BASE_LA64V100)
 
+EnumValue
+Enum(isa_base) String(@@STR_ISA_BASE_LA64V110@@) Value(ISA_BASE_LA64V110)
+
 ;; ISA extensions / adjustments
 Enum
 Name(isa_ext_fpu) Type(int)
diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index 067629141b6..f22d488acb2 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -165,6 +165,7 @@ loongarch_cpu_multipass_dfa_lookahead[N_TUNE_TYPES] = {
 const char*
 loongarch_isa_base_strings[N_ISA_BASE_TYPES] = {
   [ISA_BASE_LA64V100] = STR_ISA_BASE_LA64V100,
+  [ISA_BASE_LA64V110] = STR_ISA_BASE_LA64V110,
 };
 
 const char*
diff --git a/gcc/config/loongarch/loongarch-str.h 
b/gcc/config/loongarch/loongarch-str.h
index fc4f41bfc1e..114dbc692d7 100644
--- a/gcc/config/loongarch/loongarch-str.h
+++ b/gcc/config/loongarch/loongarch-str.h
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #define STR_CPU_LA664 "la664"
 
 #define STR_ISA_BASE_LA64V100 "la64"
+#define STR_ISA_BASE_LA64V110 "la64v1.1"
 
 #define OPTSTR_ISA_EXT_FPU "fpu"
 #define STR_NONE "none"
diff --git a/gcc/config/loongarch/loongarch.opt 
b/gcc/config/loongarch/loongarch.opt
index 7f129e53ba5..350ca30d232 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -39,6 +39,9 @@ Basic ISAs of LoongArch:
 EnumValue
 Enum(isa_base) String(la64) Value(ISA_BASE_LA64V100)
 
+EnumValue
+Enum(isa_base) String(la64v1.1) Value(ISA_BASE_LA64V110)
+
 ;; ISA extensions / adjustments
 Enum
 Name(isa_ext_fpu) Type(int)
-- 
2.42.1



[PATCH v2 0/6] Add LoongArch v1.1 div32 and ld-seq-sa support

2023-11-17 Thread Xi Ruoyao
Supersedes
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636795.html.

Requires
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636946.html.

Changes:

- Rebase on top of "Add LoongarchV1.1 instructions support".
- Do not translate loongarch-def.c to C++.  Use int64_t instead of
  HOST_WIDE_INT in loongarch-def.h.
- In genopts, also generate cpucfg_useful_idx[] and N_CPUCFG_WORDS.
  Use decimal instead of hexadecimal for the CPUCFG word index so that
  awk can perform numerical comparison.
- Dump arch and feature info as comments in generated assembly if
  -fverbose-asm.  It's helpful for testing and debugging.

Xi Ruoyao (6):
  LoongArch: Fix internal error running "gcc -march=native" on LA664
  LoongArch: genopts: Add infrastructure to generate code for new
features in ISA evolution
  LoongArch: Add evolution features of base ISA revisions
  LoongArch: Take the advantage of -mdiv32 if it's enabled
  LoongArch: Don't emit dbar 0x700 if -mld-seq-sa
  LoongArch: Add fine-grained control for LAM_BH and LAMCAS

 gcc/config/loongarch/genopts/genstr.sh|  92 ++-
 gcc/config/loongarch/genopts/isa-evolution.in |   4 +
 .../loongarch/genopts/loongarch-strings   |   1 +
 gcc/config/loongarch/genopts/loongarch.opt.in |  10 ++
 gcc/config/loongarch/loongarch-cpu.cc | 105 +++---
 gcc/config/loongarch/loongarch-cpucfg-map.h   |  50 +
 gcc/config/loongarch/loongarch-def.c  |   1 +
 gcc/config/loongarch/loongarch-def.h  |  12 ++
 gcc/config/loongarch/loongarch-opts.cc|   3 +
 gcc/config/loongarch/loongarch-str.h  |  10 +-
 gcc/config/loongarch/loongarch.cc |  35 +-
 gcc/config/loongarch/loongarch.md |  31 +-
 gcc/config/loongarch/loongarch.opt|  31 +-
 gcc/config/loongarch/sync.md  |  25 +++--
 gcc/config/loongarch/t-loongarch  |  21 +++-
 .../gcc.target/loongarch/div-div32.c  |  31 ++
 .../gcc.target/loongarch/div-no-div32.c   |  11 ++
 17 files changed, 403 insertions(+), 70 deletions(-)
 create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
 create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c

-- 
2.42.1



[Bug c++/112596] New: GCC regex error in

2023-11-17 Thread svraghavan7 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112596

Bug ID: 112596
   Summary: GCC regex error in
   Product: gcc
   Version: 9.5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: svraghavan7 at gmail dot com
  Target Milestone: ---

Created attachment 56623
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56623&action=edit
Gdb Output trace

I have plugin code developed using OpenTelemetry C++. The code runs fine
with gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC), but when I
deploy the same code with gcc 9.5.0 or 11.4 (Ubuntu 9.5.0-1ubuntu1~22.04)
the plugin generates the following error.

terminate called after throwing an instance of '
std::bad_alloc'
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc  what():  std::bad_alloc

at the stage of executing this code
std::__cxx11::basic_regex
>::basic_regex, std::allocator
>(std::__cxx11::basic_string, std::allocator
> const&, std::regex_constants::syntax_option_type)

[Bug middle-end/112552] [14 Regression] ICE: in expand_insn, at optabs.cc:8305

2023-11-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112552

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Robin Dapp :

https://gcc.gnu.org/g:231bb992592a9e1bd7ce6583131acb1874c8e34e

commit r14-5564-g231bb992592a9e1bd7ce6583131acb1874c8e34e
Author: Robin Dapp 
Date:   Thu Nov 16 20:42:10 2023 +0100

vect: Pass truth type to vect_get_vec_defs.

For conditional operations the mask is loop invariant and cannot be
stored explicitly.  By default, for reductions, we deduce the vectype
from the statement or the loop but this does not work for conditional
operations.  Therefore this patch passes the truth type of the reduction
input vectype for the mask operand instead.  This will override the
other choices and make sure we have the proper mask vectype.

gcc/ChangeLog:

PR middle-end/112406
PR middle-end/112552

* tree-vect-loop.cc (vect_transform_reduction): Pass truth
vectype for mask operand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr112406.c: New test.
* gcc.target/riscv/rvv/autovec/pr112552.c: New test.

[Bug middle-end/112406] [14 Regression] Several SPECCPU 2017 benchmarks fail with on internal compiler error: in expand_insn, at optabs.cc:8305 after g:01c18f58d37865d5f3bbe93e666183b54ec608c7

2023-11-17 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Robin Dapp :

https://gcc.gnu.org/g:231bb992592a9e1bd7ce6583131acb1874c8e34e

commit r14-5564-g231bb992592a9e1bd7ce6583131acb1874c8e34e
Author: Robin Dapp 
Date:   Thu Nov 16 20:42:10 2023 +0100

vect: Pass truth type to vect_get_vec_defs.

For conditional operations the mask is loop invariant and cannot be
stored explicitly.  By default, for reductions, we deduce the vectype
from the statement or the loop but this does not work for conditional
operations.  Therefore this patch passes the truth type of the reduction
input vectype for the mask operand instead.  This will override the
other choices and make sure we have the proper mask vectype.

gcc/ChangeLog:

PR middle-end/112406
PR middle-end/112552

* tree-vect-loop.cc (vect_transform_reduction): Pass truth
vectype for mask operand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr112406.c: New test.
* gcc.target/riscv/rvv/autovec/pr112552.c: New test.

[PATCH 7/7] lto: partition specific lto_clone_numbers

2023-11-17 Thread Michal Jires
Replace "lto_priv.$clone_number" with
"lto_priv.$partition_hash.$partition_specific_clone_number"
to reduce divergence for incremental LTO.
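
The naming scheme can be sketched as a small formatting helper. This is an
illustration only: format_private_name is a hypothetical function, and the
'.' separators and the decimal rendering of the hash are assumptions made
for the sketch rather than a statement of what clone_function_name emits.
PRIu64 is used so the format specifier matches uint64_t on every host.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format NAME.lto_priv.HASH.NUMBER into BUF, writing at most SIZE bytes.
   Returns the length the full name needs, as snprintf does.  The layout
   is an assumption made for this sketch.  */
static int
format_private_name (char *buf, size_t size, const char *name,
                     uint64_t partition_hash, unsigned clone_number)
{
  assert (buf != NULL && name != NULL);
  return snprintf (buf, size, "%s.lto_priv.%" PRIu64 ".%u",
                   name, partition_hash, clone_number);
}
```

Because the counter is keyed by name and partition hash together, a new
private symbol in one partition no longer renumbers symbols of the same
name in other partitions.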

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/lto/ChangeLog:

* lto-partition.cc (set_clone_partition_name_checksum): New.
(CHECKSUM_STRING): New.
(privatize_symbol_name_1): Use partition hash for lto_priv.
(lto_promote_cross_file_statics): Use set_clone_partition_name_checksum.
(lto_promote_statics_nonwpa): Changed clone_map type.
---
 gcc/lto/lto-partition.cc | 49 +++-
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index eb31ecba0d3..a2ce24eea23 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "md5.h"
 
 #include 
 #include 
@@ -1516,8 +1517,36 @@ validize_symbol_for_target (symtab_node *node)
 }
 }
 
-/* Maps symbol names to unique lto clone counters.  */
-static hash_map *lto_clone_numbers;
+/* Maps symbol names with partition checksum to unique lto clone counters.  */
+using clone_map = hash_map>, unsigned>;
+static clone_map *lto_clone_numbers;
+uint64_t current_partition_checksum = 0;
+
+/* Computes a quick checksum to distinguish partitions of clone numbers.  */
+void
+set_clone_partition_name_checksum (ltrans_partition part)
+{
+#define CHECKSUM_STRING(FOO) md5_process_bytes ((FOO), strlen (FOO), &ctx)
+  struct md5_ctx ctx;
+  md5_init_ctx (&ctx);
+
+  CHECKSUM_STRING (part->name);
+
+  lto_symtab_encoder_iterator lsei;
+  lto_symtab_encoder_t encoder = part->encoder;
+
+  for (lsei = lsei_start (encoder); !lsei_end_p (lsei); lsei_next (&lsei))
+{
+  symtab_node *node = lsei_node (lsei);
+  CHECKSUM_STRING (node->name ());
+}
+
+  uint64_t checksum[2];
+  md5_finish_ctx (&ctx, checksum);
+  current_partition_checksum = checksum[0];
+#undef CHECKSUM_STRING
+}
 
 /* Helper for privatize_symbol_name.  Mangle NODE symbol name
represented by DECL.  */
@@ -1531,10 +1560,16 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 return false;
 
   const char *name = maybe_rewrite_identifier (name0);
-  unsigned &clone_number = lto_clone_numbers->get_or_insert (name);
+
+  unsigned &clone_number = lto_clone_numbers->get_or_insert (
+std::pair {name, current_partition_checksum});
+
+  char lto_priv[32];
+  sprintf (lto_priv, "lto_priv.%lu", current_partition_checksum);
+
   symtab->change_decl_assembler_name (decl,
  clone_function_name (
- name, "lto_priv", clone_number));
+ name, lto_priv, clone_number));
   clone_number++;
 
   if (node->lto_file_data)
@@ -1735,11 +1770,13 @@ lto_promote_cross_file_statics (void)
   part->encoder = compute_ltrans_boundary (part->encoder);
 }
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
 
   /* Look at boundaries and promote symbols as needed.  */
   for (i = 0; i < n_sets; i++)
 {
+  set_clone_partition_name_checksum (ltrans_partitions[i]);
+
   lto_symtab_encoder_iterator lsei;
   lto_symtab_encoder_t encoder = ltrans_partitions[i]->encoder;
 
@@ -1778,7 +1815,7 @@ lto_promote_statics_nonwpa (void)
 {
   symtab_node *node;
 
-  lto_clone_numbers = new hash_map;
+  lto_clone_numbers = new clone_map;
   FOR_EACH_SYMBOL (node)
 {
   rename_statics (NULL, node);
-- 
2.42.1



[PATCH 6/7] lto: squash order of symbols in partitions

2023-11-17 Thread Michal Jires
This patch squashes order of symbols in individual partitions, so that
their relative order is conserved, but is not influenced by symbols in
other partitions.
Order of cloned symbols is set to 0. This should be fine because order
specifies order of symbols in input files, which cloned symbols are not
part of.

This is important for incremental LTO because a new symbol would
otherwise shift the order of all symbols with a higher order, which would
make all of them diverge.
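
The effect of squashing can be sketched as replacing each order by its
rank within the partition. The helper names below (cmp_int_sketch,
squash_orders) are hypothetical, the ranks are 0-based for simplicity, and
the orders are assumed to be distinct; the numbering the patch itself uses
is not visible in this excerpt.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way comparison of two ints, usable with qsort and bsearch.  */
static int
cmp_int_sketch (const void *a, const void *b)
{
  int x = *(const int *) a;
  int y = *(const int *) b;
  return (x > y) - (x < y);
}

/* Replace each of the N entries of ORDER by its 0-based rank among the
   partition's own orders.  Assumes the orders are distinct.  Returns 0 on
   success and -1 if memory could not be allocated.  */
static int
squash_orders (int *order, size_t n)
{
  if (n == 0)
    return 0;
  assert (order != NULL);

  int *sorted = malloc (n * sizeof *sorted);
  if (sorted == NULL)
    return -1;
  for (size_t i = 0; i < n; i++)
    sorted[i] = order[i];
  qsort (sorted, n, sizeof *sorted, cmp_int_sketch);

  /* The position of an order in the sorted copy is its rank.  */
  for (size_t i = 0; i < n; i++)
    {
      int *hit = bsearch (&order[i], sorted, n, sizeof *sorted,
                          cmp_int_sketch);
      order[i] = (int) (hit - sorted);
    }
  free (sorted);
  return 0;
}
```

A partition holding orders {17, 3, 42} and the same partition after an
unrelated symbol shifted them to {18, 3, 43} both squash to {1, 0, 2}, so
the streamed output stays identical.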

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-cgraph.cc (lto_output_node): Add and use order_remap.
(lto_output_varpool_node): Likewise.
(output_symtab): Likewise.
* lto-streamer-out.cc (produce_asm): Likewise.
(output_function): Likewise.
(output_constructor): Likewise.
(copy_function_or_variable): Likewise.
(cmp_int): New.
(lto_output): Generate order_remap.
* lto-streamer.h (produce_asm): Add order_remap.
(output_symtab): Likewise.
---
 gcc/lto-cgraph.cc   | 20 
 gcc/lto-streamer-out.cc | 71 +
 gcc/lto-streamer.h  |  5 +--
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 32c0f5ac6db..a7530290fba 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -381,7 +381,8 @@ reachable_from_this_partition_p (struct cgraph_node *node, 
lto_symtab_encoder_t
 
 static void
 lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   unsigned int tag;
   struct bitpack_d bp;
@@ -405,7 +406,9 @@ lto_output_node (struct lto_simple_output_block *ob, struct 
cgraph_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   tag);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
 
   /* In WPA mode, we only output part of the call-graph.  Also, we
  fake cgraph node attributes.  There are two cases that we care.
@@ -585,7 +588,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct 
cgraph_node *node,
 
 static void
 lto_output_varpool_node (struct lto_simple_output_block *ob, varpool_node 
*node,
-lto_symtab_encoder_t encoder)
+lto_symtab_encoder_t encoder,
+hash_map, int>* order_remap)
 {
   bool boundary_p = !lto_symtab_encoder_in_partition_p (encoder, node);
   bool encode_initializer_p
@@ -602,7 +606,8 @@ lto_output_varpool_node (struct lto_simple_output_block 
*ob, varpool_node *node,
 
   streamer_write_enum (ob->main_stream, LTO_symtab_tags, LTO_symtab_last_tag,
   LTO_symtab_variable);
-  streamer_write_hwi_stream (ob->main_stream, node->order);
+  int order = flag_wpa ? *order_remap->get (node->order) : node->order;
+  streamer_write_hwi_stream (ob->main_stream, order);
   lto_output_var_decl_ref (ob->decl_state, ob->main_stream, node->decl);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, node->externally_visible, 1);
@@ -967,7 +972,7 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
 /* Output the part of the symtab in SET and VSET.  */
 
 void
-output_symtab (void)
+output_symtab (hash_map, int>* order_remap)
 {
   struct cgraph_node *node;
   struct lto_simple_output_block *ob;
@@ -994,9 +999,10 @@ output_symtab (void)
 {
   symtab_node *node = lto_symtab_encoder_deref (encoder, i);
   if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
-lto_output_node (ob, cnode, encoder);
+   lto_output_node (ob, cnode, encoder, order_remap);
   else
-   lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder);
+   lto_output_varpool_node (ob, dyn_cast<varpool_node *> (node), encoder,
+order_remap);
 }
 
   /* Go over the nodes in SET again to write edges.  */
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index a1bbea8fc68..9448ab195d5 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -2212,7 +2212,8 @@ output_cfg (struct output_block *ob, struct function *fn)
a function, set FN to the decl for that function.  */
 
 void
-produce_asm (struct output_block *ob, tree fn)
+produce_asm (struct output_block *ob, tree fn,
+hash_map, int>* order_remap)
 {
   enum lto_section_type section_type = ob->section_type;
   struct lto_function_header header;
@@ -2221,9 +,11 @@ produce_asm (struct output_block *ob, tree fn)
   if (section_type == LTO_section_function_body)
 {
   const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (fn));
-  section_name = lto_get_section_name (section_type, name,
-  

[PATCH 5/7] lto: Implement cache partitioning

2023-11-17 Thread Michal Jires
This patch implements a new cache partitioning. It tries to keep symbols
from a single source file together to minimize propagation of divergence.

It starts with symbols already grouped by source file. Where reasonably
possible, it either combines several files into one final partition or,
if a file is large, splits that file into several final partitions.

The intermediate representation is partition_set, which contains a set of
groups of symbols (each group corresponding to an original source file) and
the number of final partitions this partition_set should be split into.

First, partition_fixed_split splits a partition_set into a constant number
of partition_sets with equal numbers of symbol groups. If, for example,
there are 39 source files, the resulting partition_sets will contain 10,
10, 10, and 9 source files. This splitting intentionally ignores estimated
instruction counts to minimize propagation of divergence.
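
The 39-file example is plain integer division with the remainder spread
over the leading chunks. A minimal sketch, assuming a hypothetical helper
fixed_split_sizes that only computes chunk sizes and does not touch any
GCC data structure:

```c
#include <assert.h>
#include <stddef.h>

/* Store in SIZES[0..PARTS-1] the sizes of PARTS nearly equal chunks of
   N_GROUPS symbol groups.  The first N_GROUPS % PARTS chunks receive one
   extra group.  */
static void
fixed_split_sizes (size_t n_groups, size_t parts, size_t *sizes)
{
  assert (parts > 0 && sizes != NULL);
  size_t base = n_groups / parts;
  size_t extra = n_groups % parts;
  for (size_t i = 0; i < parts; i++)
    sizes[i] = base + (i < extra ? 1 : 0);
}
```

For 39 groups and 4 parts, base is 9 and extra is 3, which gives the
10, 10, 10, 9 split mentioned above. Only the group count matters here, so
a change in the size of one file does not move the chunk boundaries.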

Second, partition_over_target_split separates files that are too large and
splits them into individual symbols, to be combined back into several
smaller files in the next step.

Third, partition_binary_split splits a partition_set into two halves until
it should be split into only one final partition, at which point the
remaining symbols are joined into one final partition.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* common.opt: Add cache partitioning.
* flag-types.h (enum lto_partition_model): Likewise.

gcc/lto/ChangeLog:

* lto-partition.cc (new_partition): Use new_partition_no_push.
(new_partition_no_push): New.
(free_ltrans_partition): New.
(free_ltrans_partitions): Use free_ltrans_partition.
(join_partitions): New.
(split_partition_into_nodes): New.
(is_partition_reorder): New.
(class partition_set): New.
(distribute_n_partitions): New.
(partition_over_target_split): New.
(partition_binary_split): New.
(partition_fixed_split): New.
(class partitioner_base): New.
(class partitioner_default): New.
(lto_cache_map): New.
* lto-partition.h (lto_cache_map): New.
* lto.cc (do_whole_program_analysis): Use lto_cache_map.

gcc/testsuite/ChangeLog:

* gcc.dg/completion-2.c: Add -flto-partition=cache.
---
 gcc/common.opt  |   3 +
 gcc/flag-types.h|   3 +-
 gcc/lto/lto-partition.cc| 605 +++-
 gcc/lto/lto-partition.h |   1 +
 gcc/lto/lto.cc  |   2 +
 gcc/testsuite/gcc.dg/completion-2.c |   1 +
 6 files changed, 605 insertions(+), 10 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1cf3bdd3b51..fe5cf3c0a05 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2174,6 +2174,9 @@ Enum(lto_partition_model) String(1to1) 
Value(LTO_PARTITION_1TO1)
 EnumValue
 Enum(lto_partition_model) String(max) Value(LTO_PARTITION_MAX)
 
+EnumValue
+Enum(lto_partition_model) String(cache) Value(LTO_PARTITION_CACHE)
+
 flto-partition=
 Common Joined RejectNegative Enum(lto_partition_model) Var(flag_lto_partition) 
Init(LTO_PARTITION_BALANCED)
 Specify the algorithm to partition symbols and vars at linktime.
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index c1852cd810c..59b3c23081b 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -393,7 +393,8 @@ enum lto_partition_model {
   LTO_PARTITION_ONE = 1,
   LTO_PARTITION_BALANCED = 2,
   LTO_PARTITION_1TO1 = 3,
-  LTO_PARTITION_MAX = 4
+  LTO_PARTITION_MAX = 4,
+  LTO_PARTITION_CACHE = 5
 };
 
 /* flag_lto_linker_output initialization values.  */
diff --git a/gcc/lto/lto-partition.cc b/gcc/lto/lto-partition.cc
index e4c91213f4b..eb31ecba0d3 100644
--- a/gcc/lto/lto-partition.cc
+++ b/gcc/lto/lto-partition.cc
@@ -36,6 +36,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-partition.h"
 #include "sreal.h"
 
+#include 
+#include 
+
 vec ltrans_partitions;
 
 static void add_symbol_to_partition (ltrans_partition part, symtab_node *node);
@@ -59,20 +62,41 @@ cmp_partitions_order (const void *a, const void *b)
   return orderb - ordera;
 }
 
-/* Create new partition with name NAME.  */
-
+/* Create new partition with name NAME.
+   Does not push into ltrans_partitions.  */
 static ltrans_partition
-new_partition (const char *name)
+new_partition_no_push (const char *name)
 {
   ltrans_partition part = XCNEW (struct ltrans_partition_def);
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
   part->symbols = 0;
+  return part;
+}
+
+/* Create new partition with name NAME.  */
+
+static ltrans_partition
+new_partition (const char *name)
+{
+  ltrans_partition part = new_partition_no_push (name);
   ltrans_partitions.safe_push (part);
   return part;
 }
 
+/* Free memory used by ltrans partition.
+   Encoder can be kept to be freed after streaming.  */
+static void
+free_ltrans_partition (ltrans_partition part, bool delete_encoder)
+  {
+if 

[PATCH 3/7] Lockfile.

2023-11-17 Thread Michal Jires
This patch implements the lockfile used for incremental LTO.
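
For readers unfamiliar with POSIX advisory locks, the underlying mechanism
can be sketched in plain C. The function names below
(try_lock_write_sketch, unlock_sketch) are hypothetical and are not the
lockfile class from the patch; the sketch assumes a POSIX host where
fcntl record locks are available.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open PATH, creating it if needed, and try to take an exclusive advisory
   lock on the whole file without blocking.  Returns the open descriptor
   on success and -1 if the file cannot be opened or is locked by another
   process.  */
static int
try_lock_write_sketch (const char *path)
{
  assert (path != NULL);
  int fd = open (path, O_RDWR | O_CREAT, 0666);
  if (fd < 0)
    return -1;

  struct flock fl = { 0 };
  fl.l_type = F_WRLCK;     /* exclusive (write) lock */
  fl.l_whence = SEEK_SET;  /* l_start is relative to the file start */
  fl.l_start = 0;
  fl.l_len = 0;            /* zero length means "to end of file" */

  if (fcntl (fd, F_SETLK, &fl) == -1)
    {
      close (fd);
      return -1;
    }
  return fd;
}

/* Release the lock taken on FD and close it.  Does nothing for a
   negative descriptor.  */
static void
unlock_sketch (int fd)
{
  if (fd < 0)
    return;

  struct flock fl = { 0 };
  fl.l_type = F_UNLCK;
  fl.l_whence = SEEK_SET;
  fcntl (fd, F_SETLK, &fl);
  close (fd);
}
```

F_SETLK fails immediately when another process holds a conflicting lock,
whereas F_SETLKW waits for it. These locks are per process, so a second
attempt from the same process succeeds rather than conflicting.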

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lockfile.o.
* lockfile.cc: New file.
* lockfile.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lockfile.cc | 136 
 gcc/lockfile.h  |  85 ++
 3 files changed, 224 insertions(+), 2 deletions(-)
 create mode 100644 gcc/lockfile.cc
 create mode 100644 gcc/lockfile.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7b7a4ff789a..2c527245c81 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o
+  lto-wrapper.o collect-utils.o lockfile.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
   $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBS)
diff --git a/gcc/lockfile.cc b/gcc/lockfile.cc
new file mode 100644
index 000..9440e8938f3
--- /dev/null
+++ b/gcc/lockfile.cc
@@ -0,0 +1,136 @@
+/* File locking.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+
+#include "lockfile.h"
+
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Blocking call.  */
+int
+lockfile::lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unique write lock.  No other lock can be held on this lockfile.
+   Only locks if this filelock is not locked by any other process.
+   Return whether locking was successful.  */
+int
+lockfile::try_lock_write ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_WRLCK;
+
+  if (fcntl (fd, F_SETLK, &s_flock) == -1)
+{
+  close (fd);
+  fd = -1;
+  return 1;
+}
+#endif
+  return 0;
+}
+
+/* Shared read lock.  Only read lock can be held concurrently.
+   If write lock is already held by this process, it will be
+   changed to read lock.
+   Blocking call.  */
+int
+lockfile::lock_read ()
+{
+  fd = open (filename.c_str (), O_RDWR | O_CREAT, 0666);
+  if (fd < 0)
+return -1;
+
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_RDLCK;
+
+  while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR)
+continue;
+#endif
+  return 0;
+}
+
+/* Unlock all previously placed locks.  */
+void
+lockfile::unlock ()
+{
+  if (fd < 0)
+{
+#if HAVE_FCNTL_H
+  struct flock s_flock;
+
+  s_flock.l_whence = SEEK_SET;
+  s_flock.l_start = 0;
+  s_flock.l_len = 0;
+  s_flock.l_pid = getpid ();
+  s_flock.l_type = F_UNLCK;
+
+  fcntl (fd, F_SETLK, &s_flock);
+#endif
+  close (fd);
+  fd = -1;
+}
+}
+
+/* Are lockfiles supported?  */
+bool
+lockfile::lockfile_supported ()
+{
+#if HAVE_FCNTL_H
+  return true;
+#else
+  return false;
+#endif
+}
diff --git a/gcc/lockfile.h b/gcc/lockfile.h
new file mode 100644
index 000..afcbaf599c1
--- /dev/null
+++ b/gcc/lockfile.h
@@ -0,0 +1,85 @@
+/* File locking.
+   Copyright (C) 

[PATCH 4/7] lto: Implement ltrans cache

2023-11-17 Thread Michal Jires
This patch implements Incremental LTO as an ltrans cache.

The cache is active when the directory $GCC_LTRANS_CACHE is specified and exists.
It stores pairs of ltrans input/output files together with the input file hash.
File locking is used to allow multiple GCC instances to use the same cache.

Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* Makefile.in: Add lto-ltrans-cache.o.
* lto-wrapper.cc: Use ltrans cache.
* lto-ltrans-cache.cc: New file.
* lto-ltrans-cache.h: New file.
---
 gcc/Makefile.in |   5 +-
 gcc/lto-ltrans-cache.cc | 407 
 gcc/lto-ltrans-cache.h  | 164 
 gcc/lto-wrapper.cc  | 150 +--
 4 files changed, 711 insertions(+), 15 deletions(-)
 create mode 100644 gcc/lto-ltrans-cache.cc
 create mode 100644 gcc/lto-ltrans-cache.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2c527245c81..495e5f3d069 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1831,7 +1831,7 @@ ALL_HOST_BACKEND_OBJS = $(GCC_OBJS) $(OBJS) 
$(OBJS-libcommon) \
   $(OBJS-libcommon-target) main.o c-family/cppspec.o \
   $(COLLECT2_OBJS) $(EXTRA_GCC_OBJS) $(GCOV_OBJS) $(GCOV_DUMP_OBJS) \
   $(GCOV_TOOL_OBJS) $(GENGTYPE_OBJS) gcc-ar.o gcc-nm.o gcc-ranlib.o \
-  lto-wrapper.o collect-utils.o lockfile.o
+  lto-wrapper.o collect-utils.o lockfile.o lto-ltrans-cache.o
 
 # for anything that is shared use the cc1plus profile data, as that
 # is likely the most exercised during the build
@@ -2359,7 +2359,8 @@ collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
 CFLAGS-collect2.o += -DTARGET_MACHINE=\"$(target_noncanonical)\" \
@TARGET_SYSTEM_ROOT_DEFINE@
 
-LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o
+LTO_WRAPPER_OBJS = lto-wrapper.o collect-utils.o ggc-none.o lockfile.o \
+  lto-ltrans-cache.o
 
 lto-wrapper$(exeext): $(LTO_WRAPPER_OBJS) libcommon-target.a $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o T$@ \
diff --git a/gcc/lto-ltrans-cache.cc b/gcc/lto-ltrans-cache.cc
new file mode 100644
index 000..0d43e548fb3
--- /dev/null
+++ b/gcc/lto-ltrans-cache.cc
@@ -0,0 +1,407 @@
+/* File caching.
+   Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "md5.h"
+#include "lto-ltrans-cache.h"
+
+#include 
+#include 
+#include 
+
+const md5_checksum_t INVALID_CHECKSUM = {
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+};
+
+/* Computes checksum for given file, returns INVALID_CHECKSUM if not possible.
+ */
+static md5_checksum_t
+file_checksum (char const *filename)
+{
+  FILE *file = fopen (filename, "rb");
+
+  if (!file)
+return INVALID_CHECKSUM;
+
+  md5_checksum_t result;
+
+  int ret = md5_stream (file, &result);
+
+  if (ret)
+result = INVALID_CHECKSUM;
+
+  fclose (file);
+
+  return result;
+}
+
+/* Checks identity of two files byte by byte.  */
+static bool
+files_identical (char const *first_filename, char const *second_filename)
+{
+  FILE *f_first = fopen (first_filename, "rb");
+  if (!f_first)
+return false;
+
+  FILE *f_second = fopen (second_filename, "rb");
+  if (!f_second)
+{
+  fclose (f_first);
+  return false;
+}
+
+  bool ret = true;
+
+  for (;;)
+{
+  int c1, c2;
+  c1 = fgetc (f_first);
+  c2 = fgetc (f_second);
+
+  if (c1 != c2)
+   {
+ ret = false;
+ break;
+   }
+
+  if (c1 == EOF)
+   break;
+}
+
+  fclose (f_first);
+  fclose (f_second);
+  return ret;
+}
+
+/* Constructor of cache item.  */
+ltrans_file_cache::item::item (std::string input, std::string output,
+  md5_checksum_t input_checksum, uint32_t last_used):
+  input (std::move (input)), output (std::move (output)),
+  input_checksum (input_checksum), last_used (last_used)
+{
+  lock = lockfile (this->input + ".lock");
+}
+/* Destructor of cache item.  */
+ltrans_file_cache::item::~item ()
+{
+  lock.unlock ();
+}
+
+/* Reads next cache item from cachedata file.
+   Adds `dir/` prefix to filenames.  */
+static ltrans_file_cache::item*
+read_cache_item (FILE* f, const char* dir)
+{
+  md5_checksum_t checksum;
+  uint32_t last_used;
+
+  if (fread (&checksum, 1, checksum.size (), f) != checksum.size ())
+return NULL;
+  if (fread (&last_used, sizeof (last_used), 1, f) != 1)
+return NULL;
+
+  std::vector input (strlen (dir));
+  

[PATCH 2/7] lto: Remove random_seed from section name.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-streamer.cc (lto_get_section_name): Remove random_seed in WPA.
---
 gcc/lto-streamer.cc | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer.cc b/gcc/lto-streamer.cc
index 4968fd13413..53275e32618 100644
--- a/gcc/lto-streamer.cc
+++ b/gcc/lto-streamer.cc
@@ -132,11 +132,17 @@ lto_get_section_name (int section_type, const char *name,
  doesn't confuse the reader with merged sections.
 
  For options don't add a ID, the option reader cannot deal with them
- and merging should be ok here. */
+ and merging should be ok here.
+
+ WPA output is sent to LTRANS directly inside of lto-wrapper, so name
+ uniqueness for external tools is not needed.
+ Randomness would inhibit incremental LTO.  */
   if (section_type == LTO_section_opts)
 strcpy (post, "");
   else if (f != NULL) 
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
+  else if (flag_wpa)
+strcpy (post, ".0");
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
   char *res = concat (section_name_prefix, sep, add, post, NULL);
-- 
2.42.1



[PATCH 1/7] lto: Skip flag OPT_fltrans_output_list_.

2023-11-17 Thread Michal Jires
Bootstrapped/regtested on x86_64-pc-linux-gnu

gcc/ChangeLog:

* lto-opts.cc (lto_write_options): Skip OPT_fltrans_output_list_.
---
 gcc/lto-opts.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lto-opts.cc b/gcc/lto-opts.cc
index c9bee9d4197..0451e290c75 100644
--- a/gcc/lto-opts.cc
+++ b/gcc/lto-opts.cc
@@ -152,6 +152,7 @@ lto_write_options (void)
case OPT_fprofile_prefix_map_:
case OPT_fcanon_prefix_map:
case OPT_fwhole_program:
+   case OPT_fltrans_output_list_:
  continue;
 
default:
-- 
2.42.1



[PATCH 0/7] lto: Incremental LTO.

2023-11-17 Thread Michal Jires
Hi,
these patches implement Incremental LTO, specifically by caching the results
of the ltrans phase. Secondarily, they contain changes that reduce divergence
of ltrans partitions so that they can be cached.

The aim is to reduce compile times for quick edit-compile cycles while using
LTO. Even with these minimal changes to the rest of GCC it works surprisingly
well. In current testing, self-compiling cc1 with individual commits used as
incremental changes, on average only ~1/3 of partitions need to be recompiled
with `-O2 -g0` and ~1/2 with `-O2 -g`, which directly reduces the time spent
in the ltrans phase of LTO.

Unfortunately, larger gains are a bit fragile. You may remember that during my
Cauldron talk I claimed a reduction to ~1/6 and ~1/3 recompilations. That was
achieved with a branch from March. Since then at least two commits have
introduced new divergence of partitions, though they seem fixable in the
future.
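
The caching idea can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the series:
ltrans_cache_sketch is a hypothetical class, std::hash stands in for the real
checksum, and hash collisions are ignored.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical cache keyed by a hash of the partition contents.
// std::hash is a stand-in for a real checksum; collisions are ignored.
class ltrans_cache_sketch
{
public:
  // Return the cached output if this exact input was seen before;
  // otherwise run COMPILE, store its result and return that.
  template<typename Compile>
  const std::string &
  get (const std::string &partition, Compile compile)
  {
    std::size_t key = std::hash<std::string>{} (partition);
    auto it = m_entries.find (key);
    if (it != m_entries.end ())
      {
        ++m_hits;
        return it->second;
      }
    ++m_misses;
    return m_entries.emplace (key, compile (partition)).first->second;
  }

  unsigned hits () const { return m_hits; }
  unsigned misses () const { return m_misses; }

private:
  std::unordered_map<std::size_t, std::string> m_entries;
  unsigned m_hits = 0;
  unsigned m_misses = 0;
};
```

Any byte that differs between otherwise equivalent partitions (a random seed
in a section name, for example) changes the key and forces a miss, which is
why reducing partition divergence matters for the hit rate.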


[Bug middle-end/112584] Suboptimal stack usage on third memcpy

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112584

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c++/112590] structural constexpr class fails to instantiate

2023-11-17 Thread janezz55 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112590

--- Comment #3 from Janez Zemva  ---
Sorry for my last comment. I have prepared a more involved example:

https://github.com/user1095108/ca/blob/master/consttests.cpp

If

test();

is commented out, everything compiles, otherwise not.

Re: [PATCH] vect: Use statement vectype for conditional mask.

2023-11-17 Thread Robin Dapp
> No, you shouldn't place _7 != 0 inside the .COND_ADD but instead
> have an extra pattern stmt producing that so
> 
> patt_8 = _7 != 0;
> patt_9 = .COND_ADD (patt_8, ...);
> 
> that's probably still not enough, but I always quickly forget how
> bool patterns work ... basically a comparison like patt_8 = _7 != 0
> vectorizes to a mask (aka vector boolean) while any "data" uses
> of bools are replaced by mask ? 1 : 0; - there's a complication for
> bool data producing loads which is why we need to insert the
> "fake" compares to produce a mask.  IIRC.

I already had call handling to vect_recog_bool_pattern in working
shape when I realized that vect_recog_mask_conversion_pattern already
handles most of what I need.  The difference is that it doesn't do
 patt_8 = _7 != 0
but rather
 patt_8 =  () _7;

It works equally well and most of the code can be reused.

The attached was bootstrapped and regtested on x86 and aarch64
and regtested on riscv.

Regards
 Robin

Subject: [PATCH] vect: Add bool pattern handling for COND_OPs.

In order to handle masks properly for conditional operations this patch
teaches vect_recog_mask_conversion_pattern to also handle conditional
operations.  Now we convert e.g.

 _mask = *_6;
 _ifc123 = COND_OP (_mask, ...);

into
 _mask = *_6;
 patt200 = () _mask;
 patt201 = COND_OP (patt200, ...);

This way the mask will be properly recognized as boolean mask and the
correct vector mask will be generated.

gcc/ChangeLog:

PR middle-end/112406

* tree-vect-patterns.cc (build_mask_conversion):
(vect_convert_mask_for_vectype):

gcc/testsuite/ChangeLog:

* gfortran.dg/pr112406.f90: New test.
---
 gcc/testsuite/gfortran.dg/pr112406.f90 | 21 +
 gcc/tree-vect-patterns.cc  | 26 ++
 2 files changed, 39 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr112406.f90

diff --git a/gcc/testsuite/gfortran.dg/pr112406.f90 
b/gcc/testsuite/gfortran.dg/pr112406.f90
new file mode 100644
index 000..27e96df7e26
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr112406.f90
@@ -0,0 +1,21 @@
+! { dg-do compile { target { aarch64-*-* || riscv*-*-* } } }
+! { dg-options "-Ofast -w -fprofile-generate" }
+! { dg-additional-options "-march=rv64gcv -mabi=lp64d" { target riscv*-*-* } }
+! { dg-additional-options "-march=armv8-a+sve" { target aarch64-*-* } }
+
+module brute_force
+  integer, parameter :: r=9
+   integer sudoku1(1, r)
+  contains
+subroutine brute
+integer l(r), u(r)
+   where(sudoku1(1, :) /= 1)
+l = 1
+  u = 1
+   end where
+do i1 = 1, u(1)
+   do
+  end do
+   end do
+end
+end
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 7debe7f0731..696b70b76a8 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -5830,7 +5830,8 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
   tree rhs1_op0 = NULL_TREE, rhs1_op1 = NULL_TREE;
   tree rhs1_op0_type = NULL_TREE, rhs1_op1_type = NULL_TREE;
 
-  /* Check for MASK_LOAD ans MASK_STORE calls requiring mask conversion.  */
+  /* Check for MASK_LOAD and MASK_STORE as well as COND_OP calls requiring mask
+ conversion.  */
   if (is_gimple_call (last_stmt)
   && gimple_call_internal_p (last_stmt))
 {
@@ -5842,6 +5843,7 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
return NULL;
 
   bool store_p = internal_store_fn_p (ifn);
+  bool load_p = internal_load_fn_p (ifn);
   if (store_p)
{
  int rhs_index = internal_fn_stored_value_index (ifn);
@@ -5856,15 +5858,21 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
  vectype1 = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
}
 
+  if (!vectype1)
+   return NULL;
+
   tree mask_arg = gimple_call_arg (last_stmt, mask_argno);
   tree mask_arg_type = integer_type_for_mask (mask_arg, vinfo);
-  if (!mask_arg_type)
-   return NULL;
-  vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
+  if (mask_arg_type)
+   {
+ vectype2 = get_mask_type_for_scalar_type (vinfo, mask_arg_type);
 
-  if (!vectype1 || !vectype2
- || known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
-  TYPE_VECTOR_SUBPARTS (vectype2)))
+ if (!vectype2
+ || known_eq (TYPE_VECTOR_SUBPARTS (vectype1),
+  TYPE_VECTOR_SUBPARTS (vectype2)))
+   return NULL;
+   }
+  else if (store_p || load_p)
return NULL;
 
   tmp = build_mask_conversion (vinfo, mask_arg, vectype1, stmt_vinfo);
@@ -5883,7 +5891,9 @@ vect_recog_mask_conversion_pattern (vec_info *vinfo,
  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
  gimple_call_set_lhs (pattern_stmt, lhs);
}
-  gimple_call_set_nothrow (pattern_stmt, true);
+
+  if (load_p || store_p)
+   gimple_call_set_nothrow (pattern_stmt, true);
 
   pattern_stmt_info = vinfo->add_stmt 

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2023-11-17 Thread Harald Anlauf

Hi Tobias,

On 11/17/23 12:38, Tobias Burnus wrote:

Hi Harald, hi all,

On 16.11.23 20:30, Harald Anlauf wrote:

According to the standard one can have 99 lines with only
"&" and then an ";", but then only 100 lines with 1 characters.


I believe a single '&' is not valid, you either need '&&' or something
else + '&'; thus, you can have only half a million lines + 1.


after looking at the F2023 standard again I wonder why
they did such a disservice to compiler developers...

You are right: a single '&' is not valid.

6.3.2.4 also has:

"When used for continuation, the “&” is not part of the statement"

And 6.3.2.5 (also 6.3.3.4): "The “;” is not part of the statement".

So a million "&"-continued lines are possible in free form.

For fixed form, 6.3.3.1 has: "If a source line contains only characters
of default kind, it shall contain exactly 72 characters; otherwise, its
maximum number of characters is processor dependent."

I wonder what I should make of this...


In the code, I still use 1,000,000 but now with a comment.


Yeah, for the time being this is the most reasonable solution.
Let's claim that the 10^6 line limit is the new GNU standard ;-)

Cheers,
Harald




[Bug c++/112590] structural constexpr class fails to instantiate

2023-11-17 Thread janezz55 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112590

--- Comment #2 from Janez Zemva  ---
Very nice, but if I write:

int main()
{
  static constexpr S<10> s;
  return 0;
}

there will still be a compile error.

[Bug middle-end/112589] man gcc does not specify the default behavior of -fcf-protection when used without arguments

2023-11-17 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112589

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2023-11-17

--- Comment #1 from Andrew Pinski  ---
Looks like it has been undocumented since the documentation was added in
r8-3997-g771c6b44dd0353 .

Note the option itself was added in r8-3995-g5c5f0b65eebe36 .

```
fcf-protection
Common RejectNegative Alias(fcf-protection=,full)

fcf-protection=
Common Joined RejectNegative Enum(cf_protection_level) EnumSet
Var(flag_cf_protection) Init(CF_NONE)
-fcf-protection=[full|branch|return|none|check] Instrument functions with
checks to verify jump/call/return control-flow transfer
instructions have valid targets.

```
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fcf-protection


Confirmed.

[Bug c++/112590] structural constexpr class fails to instantiate

2023-11-17 Thread mital at mitalashok dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112590

Mital Ashok  changed:

   What|Removed |Added

 CC||mital at mitalashok dot co.uk

--- Comment #1 from Mital Ashok  ---
See [temp.arg.nontype]p3:

> For a non-type template-parameter of reference type, or for each non-static
> data member of reference or pointer type in a non-type template-parameter
> of class type or subobject thereof, the reference or pointer value shall
> not refer to or be the address of (respectively):
>  - A temporary object,
>  - [...]
>  - a subobject of one of the above.

And "f_{a_}" is initializing the pointer as a subobject of a temporary
object, so this is invalid. Though the error given might be wrong or
misleading.
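
As a small illustration of the rule quoted above, the sketch below uses a
plain pointer template parameter, so it does not need C++20. The names
global_value and load are made up for this example; the quoted wording
applies the same restriction to pointer members of a class-type template
argument.

```cpp
// Illustration only: a pointer template argument has to be the address of
// an object with static storage duration, never of a temporary or of a
// subobject of a temporary.
static constexpr int global_value = 42;

template<const int *Ptr>
constexpr int
load ()
{
  return *Ptr;
}

// OK: &global_value is the address of a static-storage object.
static_assert (load<&global_value> () == 42,
               "pointer to a static object is a valid template argument");
```

A pointer that refers into a temporary, such as the one initialized by
"f_{a_}", has no such stable address, which is why the instantiation is
rejected.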

[Bug c++/112595] New: ICE on invalid code: Literal class NTTP aggregate initialized with self-referential pointer

2023-11-17 Thread mital at mitalashok dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112595

Bug ID: 112595
   Summary: ICE on invalid code: Literal class NTTP aggregate
initialized with self-referential pointer
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mital at mitalashok dot co.uk
  Target Milestone: ---

This code is invalid because it tries to create a NTTP with a pointer to a
temporary object:



struct S {
S* self = this;
};

template
void f() {}

int main() {
f();
}

With either -std=c++20 or -std=c++23, this causes an internal compiler error:

: In function 'int main()':
:9:11: error: no matching function for call to 'f()'
9 | f();
  | ~~^~
:6:6: note: candidate: 'template void f()'
6 | void f() {}
  |  ^
:6:6: note:   template argument deduction/substitution failed:
:9:9: internal compiler error: in cxx_eval_constant_expression, at
cp/constexpr.cc:8236
9 | f();
  | ^
0x26a4c3e internal_error(char const*, ...)
???:0
0xb0affd fancy_abort(char const*, int, char const*)
???:0
0xb77c23 cxx_constant_value(tree_node*, tree_node*, int)
???:0
0xd1580b coerce_template_parms(tree_node*, tree_node*, tree_node*, int, bool)
???:0
0xd440fc fn_type_unification(tree_node*, tree_node*, tree_node*, tree_node*
const*, unsigned int, tree_node*, unification_kind_t, int, conversion**, bool,
bool)
???:0
0xb3a13d build_new_function_call(tree_node*, vec**, int)
???:0
0xd6991c finish_call_expr(tree_node*, vec**, bool,
bool, int)
???:0
0xcf5ac9 c_parse_file()
???:0
0xe37da9 c_common_parse_file()
???:0

This ICE doesn't happen if aggregate initialization isn't used (Changing to
`f()` or adding `constexpr S() = default;`)

[PATCH 4/5] aarch64: Add ZT0

2023-11-17 Thread Richard Sandiford
SME2 adds a 512-bit lookup table called ZT0.  It is enabled
and disabled by PSTATE.ZA, just like ZA itself.  This patch
adds support for the register, including saving and restoring
contents.

The code reuses the V8DI that was added for LS64, including
the associated memory classification rules.  (The ZT0 range
is more restricted than the LS64 range, but that's enforced
by predicates and constraints.)

gcc/
* config/aarch64/aarch64.md (ZT0_REGNUM): New constant.
(LAST_FAKE_REGNUM): Bump to include it.
* config/aarch64/aarch64.h (FIXED_REGISTERS): Add an entry for ZT0.
(CALL_REALLY_USED_REGISTERS, REGISTER_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(machine_function): Add zt0_save_buffer.
(CUMULATIVE_ARGS): Add shared_zt0_flags.
* config/aarch64/aarch64.cc (aarch64_check_state_string): Handle zt0.
(aarch64_fntype_pstate_za, aarch64_fndecl_pstate_za): Likewise.
(aarch64_function_arg): Add the shared ZT0 flags as an extra
limb of the parallel.
(aarch64_init_cumulative_args): Initialize shared_zt0_flags.
(aarch64_extra_live_on_entry): Handle ZT0_REGNUM.
(aarch64_epilogue_uses): Likewise.
(aarch64_get_zt0_save_buffer, aarch64_save_zt0): New functions.
(aarch64_restore_zt0): Likewise.
(aarch64_start_call_args): Reject calls to functions that share
ZT0 from functions that have no ZT0 state.  Save ZT0 around shared-ZA
calls that do not share ZT0.
(aarch64_expand_call): Handle ZT0.  Reject calls to functions that
share ZT0 but not ZA from functions with ZA state.
(aarch64_end_call_args): Restore ZT0 after calls to shared-ZA functions
that do not share ZT0.
(aarch64_set_current_function): Require +sme2 for functions that
have ZT0 state.
(aarch64_function_attribute_inlinable_p): Don't allow functions to
be inlined if they have local zt0 state.
(AARCH64_IPA_CLOBBERS_ZT0): New constant.
(aarch64_update_ipa_fn_target_info): Record asms that clobber ZT0.
(aarch64_can_inline_p): Don't inline callees that clobber ZT0
into functions that have ZT0 state.
(aarch64_comp_type_attributes): Check for compatible ZT0 sharing.
(aarch64_optimize_mode_switching): Use mode switching if the
function has ZT0 state.
(aarch64_mode_emit_local_sme_state): Save and restore ZT0 around
calls to private-ZA functions.
(aarch64_mode_needed_local_sme_state): Require ZA to be active
for instructions that access ZT0.
(aarch64_md_asm_adjust): Extend handling of ZA clobbers to ZT0.
* config/aarch64/aarch64-c.cc (aarch64_define_unconditional_macros):
Define __ARM_STATE_ZT0.
* config/aarch64/aarch64-sme.md (UNSPECV_ASM_UPDATE_ZT0): New unspecv.
(aarch64_asm_update_zt0): New insn.
(UNSPEC_RESTORE_ZT0): New unspec.
(aarch64_sme_ldr_zt0, aarch64_restore_zt0): New insns.
(aarch64_sme_str_zt0): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sme/zt0_state_1.c: New test.
* gcc.target/aarch64/sme/zt0_state_2.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_3.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_4.c: Likewise.
* gcc.target/aarch64/sme/zt0_state_5.c: Likewise.
---
 gcc/config/aarch64/aarch64-c.cc   |   1 +
 gcc/config/aarch64/aarch64-sme.md |  63 +
 gcc/config/aarch64/aarch64.cc | 205 --
 gcc/config/aarch64/aarch64.h  |  14 +-
 gcc/config/aarch64/aarch64.md |   7 +-
 .../gcc.target/aarch64/sme/zt0_state_1.c  |  65 +
 .../gcc.target/aarch64/sme/zt0_state_2.c  |  31 +++
 .../gcc.target/aarch64/sme/zt0_state_3.c  |   6 +
 .../gcc.target/aarch64/sme/zt0_state_4.c  |  53 
 .../gcc.target/aarch64/sme/zt0_state_5.c  | 260 ++
 10 files changed, 670 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/zt0_state_5.c

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 2a8ca46987a..017380b7563 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -74,6 +74,7 @@ aarch64_define_unconditional_macros (cpp_reader *pfile)
   builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
 
   builtin_define ("__ARM_STATE_ZA");
+  builtin_define ("__ARM_STATE_ZT0");
 
   /* Define keyword attributes like __arm_streaming as macros that expand
  to the associated [[...]] attribute.  Use __extension__ in the attribute
diff --git a/gcc/config/aarch64/aarch64-sme.md 

[PATCH 2/5] aarch64: Add svcount_t

2023-11-17 Thread Richard Sandiford
Some SME2 instructions interpret predicates as counters, rather than
as bit-per-byte masks.  The SME2 ACLE defines an svcount_t type for
this interpretation.

I don't think we have a better way of representing counters than
the VNx16BI that we use for masks.  The patch therefore doesn't
add a new mode for this representation.  It's just something that
is interpreted in context, a bit like signed vs. unsigned integers.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svreinterpret_impl::fold): Handle reinterprets between svbool_t
and svcount_t.
(svreinterpret_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-base.def (svreinterpret): Add
b<->c forms.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_reinterpret_b): New
type suffix list.
(wrap_type_in_struct, register_type_decl): New functions, split out
from...
(register_tuple_type): ...here.
(register_builtin_types): Handle svcount_t.
(handle_arm_sve_h): Don't create tuples of svcount_t.
* config/aarch64/aarch64-sve-builtins.def (svcount_t): New type.
(c): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_count): New type class.

gcc/testsuite/
* g++.target/aarch64/sve/acle/general-c++/mangle_1.C: Add test
for svcount_t.
* g++.target/aarch64/sve/acle/general-c++/mangle_2.C: Likewise.
* g++.target/aarch64/sve/acle/general-c++/svcount_1.C: New test.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h (TEST_DUAL_P)
(TEST_DUAL_P_REV): New macros.
* gcc.target/aarch64/sve/acle/asm/reinterpret_b.c: New test.
* gcc.target/aarch64/sve/acle/general-c/load_1.c: Test passing
an svcount_t.
* gcc.target/aarch64/sve/acle/general-c/svcount_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/unary_convert_1.c: Test
reinterprets involving svcount_t.
* gcc.target/aarch64/sve/acle/general/attributes_7.c: Test svcount_t.
* gcc.target/aarch64/sve/pcs/annotate_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_12.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |   8 +-
 .../aarch64/aarch64-sve-builtins-base.def |   1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc| 157 -
 gcc/config/aarch64/aarch64-sve-builtins.def   |   2 +
 gcc/config/aarch64/aarch64-sve-builtins.h |   4 +-
 .../aarch64/sve/acle/general-c++/mangle_1.C   |   2 +
 .../aarch64/sve/acle/general-c++/mangle_2.C   |   2 +
 .../aarch64/sve/acle/general-c++/svcount_1.C  |  10 +
 .../aarch64/sve/acle/asm/reinterpret_b.c  |  20 ++
 .../aarch64/sve/acle/asm/test_sve_acle.h  |  15 ++
 .../aarch64/sve/acle/general-c/load_1.c   |   4 +-
 .../aarch64/sve/acle/general-c/svcount_1.c|  10 +
 .../sve/acle/general-c/unary_convert_1.c  |   8 +-
 .../aarch64/sve/acle/general/attributes_7.c   |   1 +
 .../gcc.target/aarch64/sve/pcs/annotate_1.c   |   4 +
 .../gcc.target/aarch64/sve/pcs/annotate_2.c   |   4 +
 .../gcc.target/aarch64/sve/pcs/args_12.c  | 214 ++
 17 files changed, 402 insertions(+), 64 deletions(-)
 create mode 100644 
gcc/testsuite/g++.target/aarch64/sve/acle/general-c++/svcount_1.C
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/asm/reinterpret_b.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svcount_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_12.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 5b75b903e5f..7d9ec5a911f 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2166,8 +2166,9 @@ public:
 
 /* Punt to rtl if the effect of the reinterpret on registers does not
conform to GCC's endianness model.  */
-if (!targetm.can_change_mode_class (f.vector_mode (0),
-   f.vector_mode (1), FP_REGS))
+if (GET_MODE_CLASS (f.vector_mode (0)) != MODE_VECTOR_BOOL
+   && !targetm.can_change_mode_class (f.vector_mode (0),
+  f.vector_mode (1), FP_REGS))
   return NULL;
 
 /* Otherwise svreinterpret corresponds directly to a VIEW_CONVERT_EXPR
@@ -2181,6 +2182,9 @@ public:
   expand (function_expander ) const override
   {
 machine_mode mode = e.tuple_mode (0);
+/* Handle svbool_t <-> svcount_t.  */
+if (mode == e.tuple_mode (1))
+  return e.args[0];
 return e.use_exact_insn (code_for_aarch64_sve_reinterpret (mode));
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def 
b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index ac53f35220d..a742c7bbc56 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -198,6 

[PATCH 3/5] aarch64: Add svboolx2_t

2023-11-17 Thread Richard Sandiford
SME2 has some instructions that operate on pairs of predicates.
The SME2 ACLE defines an svboolx2_t type for the associated
intrinsics.

The patch uses a double-width predicate mode, VNx32BI, to represent
the contents, similarly to how data vector tuples work.  At present
there doesn't seem to be any need to define pairs for VNx2BI,
VNx4BI and VNx8BI.

We already supported pairs of svbool_ts at the PCS level, as part
of a more general framework.  All that changes on the PCS side is
that we now have an associated mode.

gcc/
* config/aarch64/aarch64-modes.def (VNx32BI): New mode.
* config/aarch64/aarch64-protos.h (aarch64_split_double_move): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(register_tuple_type): Handle tuples of predicates.
(handle_arm_sve_h): Define svboolx2_t as a pair of two svbool_ts.
* config/aarch64/aarch64-sve.md (movvnx32bi): New insn.
* config/aarch64/aarch64.cc
(pure_scalable_type_info::piece::get_rtx): Use VNx32BI for pairs
of predicates.
(pure_scalable_type_info::add_piece): Don't try to form pairs of
predicates.
(VEC_STRUCT): Generalize comment.
(aarch64_classify_vector_mode): Handle VNx32BI.
(aarch64_array_mode): Likewise.  Return BLKmode for arrays of
predicates that have no associated mode, rather than allowing
an integer mode to be chosen.
(aarch64_hard_regno_nregs): Handle VNx32BI.
(aarch64_hard_regno_mode_ok): Likewise.
(aarch64_split_double_move): New function, split out from...
(aarch64_split_128bit_move): ...here.
(aarch64_ptrue_reg): Tighten assert to aarch64_sve_pred_mode_p.
(aarch64_pfalse_reg): Likewise.
(aarch64_sve_same_pred_for_ptest_p): Likewise.
(aarch64_sme_mode_switch_regs::add_reg): Handle VNx32BI.
(aarch64_expand_mov_immediate): Restrict handling of boolean vector
constants to single-predicate modes.
(aarch64_classify_address): Handle VNx32BI, ensuring that both halves
can be addressed.
(aarch64_class_max_nregs): Handle VNx32BI.
(aarch64_member_type_forces_blk): Don't force BLKmode for svboolx2_t.
(aarch64_simd_valid_immediate): Allow all-zeros and all-ones for
VNx32BI.
(aarch64_mov_operand_p): Restrict predicate constant canonicalization
to single-predicate modes.
(aarch64_evpc_ext): Generalize exclusion to all predicate modes.
(aarch64_evpc_rev_local, aarch64_evpc_dup): Likewise.
* config/aarch64/constraints.md (PR_REGS): New predicate.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/struct_3_128.c (test_nonpst3): Adjust
stack offsets.
(ret_nonpst3): Remove XFAIL.
* gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c: New test.
---
 gcc/config/aarch64/aarch64-modes.def  |   3 +
 gcc/config/aarch64/aarch64-protos.h   |   1 +
 gcc/config/aarch64/aarch64-sve-builtins.cc|  18 ++-
 gcc/config/aarch64/aarch64-sve.md |  22 +++
 gcc/config/aarch64/aarch64.cc | 136 --
 gcc/config/aarch64/constraints.md |   4 +
 .../aarch64/sve/acle/general-c/svboolx2_1.c   | 135 +
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c |   6 +-
 8 files changed, 272 insertions(+), 53 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svboolx2_1.c

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index a3efc5b8484..ffca5517dec 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -48,16 +48,19 @@ ADJUST_FLOAT_FORMAT (HF, _half_format);
 
 /* Vector modes.  */
 
+VECTOR_BOOL_MODE (VNx32BI, 32, BI, 4);
 VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
 VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
 VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
 VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
+ADJUST_NUNITS (VNx32BI, aarch64_sve_vg * 16);
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
 ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
 ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
 
+ADJUST_ALIGNMENT (VNx32BI, 2);
 ADJUST_ALIGNMENT (VNx16BI, 2);
 ADJUST_ALIGNMENT (VNx8BI, 2);
 ADJUST_ALIGNMENT (VNx4BI, 2);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 3afb521c55c..25e2375c4fa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -948,6 +948,7 @@ rtx aarch64_simd_expand_builtin (int, tree, rtx);
 void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
 rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
 
+void aarch64_split_double_move (rtx, rtx, machine_mode);
 void aarch64_split_128bit_move (rtx, rtx);
 
 bool aarch64_split_128bit_move_p (rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 

[PATCH 1/5] aarch64: Add +sme2

2023-11-17 Thread Richard Sandiford
gcc/
* doc/invoke.texi: Document +sme2.
* doc/sourcebuild.texi: Document aarch64_sme2.
* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
Add sme2.
* config/aarch64/aarch64.h (AARCH64_ISA_SME2, TARGET_SME2): New macros.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_aarch64_sme2): New
target test.
(check_effective_target_aarch64_asm_sme2_ok): Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def |  2 ++
 gcc/config/aarch64/aarch64.h |  4 
 gcc/doc/invoke.texi  |  3 ++-
 gcc/doc/sourcebuild.texi |  2 ++
 gcc/testsuite/lib/target-supports.exp| 14 +-
 5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 1480e498bbb..c156d2ee76a 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -157,4 +157,6 @@ AARCH64_OPT_EXTENSION("sme-i16i64", SME_I16I64, (SME), (), 
(), "")
 
 AARCH64_OPT_EXTENSION("sme-f64f64", SME_F64F64, (SME), (), (), "")
 
+AARCH64_OPT_EXTENSION("sme2", SME2, (SME), (), (), "sme2")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 9f690809e79..14205ce34b3 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -227,6 +227,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 #define AARCH64_ISA_SME   (aarch64_isa_flags & AARCH64_FL_SME)
 #define AARCH64_ISA_SME_I16I64(aarch64_isa_flags & AARCH64_FL_SME_I16I64)
 #define AARCH64_ISA_SME_F64F64(aarch64_isa_flags & AARCH64_FL_SME_F64F64)
+#define AARCH64_ISA_SME2  (aarch64_isa_flags & AARCH64_FL_SME2)
 #define AARCH64_ISA_V8_3A (aarch64_isa_flags & AARCH64_FL_V8_3A)
 #define AARCH64_ISA_DOTPROD   (aarch64_isa_flags & AARCH64_FL_DOTPROD)
 #define AARCH64_ISA_AES   (aarch64_isa_flags & AARCH64_FL_AES)
@@ -332,6 +333,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 /* The FEAT_SME_F64F64 extension to SME, enabled through +sme-f64f64.  */
 #define TARGET_SME_F64F64 (AARCH64_ISA_SME_F64F64)
 
+/* SME2 instructions, enabled through +sme2.  */
+#define TARGET_SME2 (AARCH64_ISA_SME2)
+
 /* ARMv8.3-A features.  */
 #define TARGET_ARMV8_3 (AARCH64_ISA_V8_3A)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bc56170aadb..475244bb4ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21065,7 +21065,8 @@ Enable the Scalable Matrix Extension.
 Enable the FEAT_SME_I16I64 extension to SME.
 @item sme-f64f64
 Enable the FEAT_SME_F64F64 extension to SME.
-
+@item sme2
+Enable the Scalable Matrix Extension 2.  This also enables SME instructions.
 @end table
 
 Feature @option{crypto} implies @option{aes}, @option{sha2}, and @option{simd},
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 448f5e08578..8d8d21f9fee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2318,6 +2318,8 @@ Binutils installed on test system supports relocation 
types required by -fpic
 for AArch64 small memory model.
 @item aarch64_sme
 AArch64 target that generates instructions for SME.
+@item aarch64_sme2
+AArch64 target that generates instructions for SME2.
 @item aarch64_sve_hw
 AArch64 target that is able to generate and execute SVE code (regardless of
 whether it does so by default).
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b9061e5a552..87ee26f9119 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4425,6 +4425,18 @@ proc check_effective_target_aarch64_sme { } {
 }]
 }
 
+# Return 1 if this is an AArch64 target that generates instructions for SME2.
+proc check_effective_target_aarch64_sme2 { } {
+if { ![istarget aarch64*-*-*] } {
+   return 0
+}
+return [check_no_compiler_messages aarch64_sme2 assembly {
+   #if !defined (__ARM_FEATURE_SME2)
+   #error FOO
+   #endif
+}]
+}
+
 # Return 1 if this is a compiler supporting ARC atomic operations
 proc check_effective_target_arc_atomic { } {
 return [check_no_compiler_messages arc_atomic assembly {
@@ -11621,7 +11633,7 @@ proc check_effective_target_aarch64_tiny { } {
 
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
  "i8mm" "f32mm" "f64mm" "bf16" "sb" "sve2" "ls64"
- "sme" "sme-i16i64" } {
+ "sme" "sme-i16i64" "sme2" } {
 eval [string map [list FUNC $aarch64_ext] {
proc check_effective_target_aarch64_asm_FUNC_ok { } {
  if { [istarget aarch64*-*-*] } {
-- 
2.25.1



Re: [committed] libstdc++: Define C++26 saturation arithmetic functions (P0543R3)

2023-11-17 Thread Daniel Krügler
Am Fr., 17. Nov. 2023 um 18:31 Uhr schrieb Jonathan Wakely :
>
> On Fri, 17 Nov 2023 at 17:01, Daniel Krügler  
> wrote:
> >
[..]
> > > +
> > > +namespace std _GLIBCXX_VISIBILITY(default)
> > > +{
> > > +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> > > +
> > > +  /// Add two integers, with saturation in case of overflow.
> > > +  template requires __is_standard_integer<_Tp>::value
> > > +constexpr _Tp
> > > +add_sat(_Tp __x, _Tp __y) noexcept
> > > +{
> > > +  _Tp __z;
> > > +  if (!__builtin_add_overflow(__x, __y, &__z))
> > > +   return __z;
> > > +  if constexpr (is_unsigned_v<_Tp>)
> > > +   return __gnu_cxx::__int_traits<_Tp>::__max;
> > > +  else if (__x < 0)
> > > +   return __gnu_cxx::__int_traits<_Tp>::__min;
> >
> > My apologies, but why does the sign of x decide the direction of the
> > result, shouldn't that be the sign of the returned value of z?
>
> z is incorrect at this point, it only has the correct value if no
> overflow occurred. But we know that an overflow occurred because the
> built-in returned true.
>
> We need to determine whether the overflow was positive, i.e. greater
> than numeric_limits::max(), or negative, i.e. lower than
> numeric_limits::min(). For unsigned types, it must have been a
> positive overflow, because neither value is negative so that's easy.
>
> If x is negative, then there is no possible y that can cause a
> positive overflow. If we consider Tp==int, then the maximum y is
> INT_MAX, so if x is negative, x+INT_MAX < INT_MAX. So if x is
> negative, we must have had a negative overflow, and so the result
> saturates to INT_MIN.
>
> If x is positive, there is no possible y that can cause a negative
> overflow. The minimum y is INT_MIN, and so if x is positive, x +
> INT_MIN > INT_MIN. So if x is positive, we must have had a positive
> overflow.
>
> (And x can't be zero, because 0+y would not overflow).

Ah right, thanks.

- Daniel
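
For illustration, the argument above can be written as a standalone sketch.
This is not the libstdc++ implementation: add_sat_sketch is a hypothetical
name, it uses std::numeric_limits instead of __gnu_cxx::__int_traits, and it
assumes a compiler that provides __builtin_add_overflow (GCC or Clang).

```cpp
#include <limits>
#include <type_traits>

// Hypothetical standalone version of the saturating add discussed above.
template<typename T>
constexpr T
add_sat_sketch (T x, T y) noexcept
{
  static_assert (std::is_integral<T>::value, "integer types only");
  T z = 0;
  // The builtin returns false when x + y fits in T; z is then the sum.
  if (!__builtin_add_overflow (x, y, &z))
    return z;
  // From here on an overflow happened and z is not meaningful.
  // Unsigned addition can only overflow upwards.
  if (std::is_unsigned<T>::value)
    return std::numeric_limits<T>::max ();
  // Signed: a negative x can only overflow downwards, a positive x only
  // upwards, and x == 0 cannot overflow at all.
  if (x < 0)
    return std::numeric_limits<T>::min ();
  return std::numeric_limits<T>::max ();
}
```

The sign of x is enough to pick the direction because the sign of z after an
overflow says nothing reliable about the mathematical result.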


  1   2   3   >