Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 15:10 +0800, chenxiaolong wrote:
> On Mon, 2024-01-15 at 14:42 +0800, Xi Ruoyao wrote:
> > On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> > > Xi Ruoyao wrote at 12:11pm on Monday, January 15, 2024:
> > > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > >    * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > > > > >    options.
> > > > > > >    * gcc.dg/signbit-2.c: Ditto.
> > > > > > >    * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > > > > >    * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > > > > >    * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > > > > > 
> > > > > > I don't feel it right about the changes to pr104992.c and
> > > > > > scev-16.c because no other architectures add special options
> > > > > > there.  Why are we so special?
> > > > > Because on the LoongArch architecture, GCC requires the addition
> > > > > of vectorization options in order to generate vector code. Use the
> > > > > check_effective_target_vect_cmdline_needed command in the
> > > > > lib/target-supports.exp file to set whether the command line
> > > > > option is needed to enable vectorization. For example, on the
> > > > > ia64, x86, aarch64, and riscv architectures, vectorization is
> > > > > enabled by default.
> > > > 
> > > > But no.  The default baseline of 32-bit x86 is i686, which is
> > > > basically a Pentium III launched in 1999 without any vector
> > > > instructions.
> > > > 
> > > > We are still missing something here.
> > > > 
> > > There is a line
> > >   #define vector __attribute__((vector_size(4*sizeof(int))))
> > > I guess it is the syntax that needs to be supported.
> > 
> > This is always supported.  If the target does not have vector
> > instructions GCC will just expand vector arithmetic as a loop.
> > 
> > Maybe we should just move this test into gcc.dg/vect where the
> > framework automatically adds options like -mlsx or -msse2?
> > 
> 
> The "-mlsx" option is added by default when vectorization testing is
> enabled. However, using dg-options in a test file resets the
> compilation options for that file. Therefore, to detect vectorization
> on LoongArch, the "-mlsx" option has to be added explicitly.

Then it should use dg-additional-options instead of dg-options.
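
For illustration, a test header along these lines might look like the
sketch below.  The file body and the function name add4 are hypothetical
(this is not the actual pr104992.c), and the loongarch*-*-* target
selector is an assumption about how the option would be restricted.  The
point is only that dg-additional-options appends to the options already
in effect, whereas dg-options replaces the defaults.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* Appended to the options above, and only on LoongArch (assumed selector).  */
/* { dg-additional-options "-mlsx" { target loongarch*-*-* } } */

#define vector __attribute__((vector_size(4*sizeof(int))))

/* Hypothetical test body: element-wise addition of two 4 x int vectors.  */
vector int
add4 (vector int a, vector int b)
{
  return a + b;
}
```

On targets other than LoongArch the extra directive is simply ignored,
so the test stays identical for everyone else.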

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread chenglulu



On 2024/1/15 at 2:42 PM, Xi Ruoyao wrote:
> On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> > Xi Ruoyao wrote at 12:11 on Monday, January 15, 2024:
> > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >    * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > > > >    options.
> > > > > >    * gcc.dg/signbit-2.c: Ditto.
> > > > > >    * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > > > >    * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > > > >    * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > > > >
> > > > > I don't feel it right about the changes to pr104992.c and scev-16.c
> > > > > because no other architectures add special options there.  Why are we
> > > > > so special?
> > > >
> > > > Because on the LoongArch architecture, GCC requires the addition of
> > > > vectorization options in order to generate vector code. Use the
> > > > check_effective_target_vect_cmdline_needed command in the
> > > > lib/target-supports.exp file to set whether the command line option
> > > > is needed to enable vectorization. For example, on the ia64, x86,
> > > > aarch64, and riscv architectures, vectorization is enabled by default.
> > >
> > > But no.  The default baseline of 32-bit x86 is i686, which is basically
> > > a Pentium III launched in 1999 without any vector instructions.
> > >
> > > We are still missing something here.
> >
> > There is a line
> >   #define vector __attribute__((vector_size(4*sizeof(int))))
> > I guess it is the syntax that needs to be supported.
>
> This is always supported.  If the target does not have vector
> instructions GCC will just expand vector arithmetic as a loop.
>
> Maybe we should just move this test into gcc.dg/vect where the framework
> automatically adds options like -mlsx or -msse2?

The contents of pr104992.c and scev-16.c are related to vectorization.
It would be great if these two tests could be moved to the gcc.dg/vect
directory, but we are not sure whether moving them is allowed.
If they can be moved, how about doing that in this patch?


Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-14 Thread Kewen.Lin
Hi Haochen,

on 2024/1/11 16:28, HAO CHEN GUI wrote:
> Hi,
>   This patch eliminates unnecessary byte swaps for block clear on P8
> LE. For block clear, all the bytes are set to zero, so the byte order
> doesn't matter. The alignment of the destination can therefore be set
> to the store mode size instead of 1 byte in order to eliminate
> unnecessary byte swap instructions on P8 LE. The test case shows the
> problem.

I agree with Richi's concern.  A byte swap can be eliminated if the
byte-swapped result is known to be the same as the original value; one
typical case is a vector constant satisfying the predicate
const_vector_each_byte_same, and we can do some optimization for that.
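
To make the condition concrete, here is a small stand-alone sketch in
plain C (not GCC RTL).  The helper names each_byte_same, byte_swap16 and
swap_is_noop are hypothetical and only loosely modelled on the idea
behind const_vector_each_byte_same.  It shows that reversing the bytes
of a 16-byte value changes nothing when every byte is identical, which
is a sufficient (not necessary) condition for dropping the swap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when all N bytes at P have the same value.  */
bool
each_byte_same (const uint8_t *p, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (p[i] != p[0])
      return false;
  return true;
}

/* Reverse the byte order of the 16-byte value IN into OUT.  */
void
byte_swap16 (const uint8_t in[16], uint8_t out[16])
{
  for (size_t i = 0; i < 16; i++)
    out[i] = in[15 - i];
}

/* True when swapping IN leaves it unchanged, i.e. the swap is redundant.  */
bool
swap_is_noop (const uint8_t in[16])
{
  uint8_t out[16];
  byte_swap16 (in, out);
  return memcmp (in, out, 16) == 0;
}
```

An all-zero block, as produced by a block clear, trivially satisfies
each_byte_same, so its swap is a no-op.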

BR,
Kewen

> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for block clear on P8 LE
> 
> gcc/
>   PR target/113325
>   * config/rs6000/rs6000-string.cc (expand_block_clear): Set the
>   alignment of destination to the size of mode.
> 
> gcc/testsuite/
>   PR target/113325
>   * gcc.target/powerpc/pr113325.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 7f777666ba9..4c9b2cbeefc 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -140,7 +140,9 @@ expand_block_clear (rtx operands[])
>   }
> 
>dest = adjust_address (orig_dest, mode, offset);
> -
> +  /* Set the alignment of dest to the size of mode in order to
> +  avoid unnecessary byte swaps on LE.  */
> +  set_mem_align (dest, GET_MODE_SIZE (mode) * BITS_PER_UNIT);
>emit_move_insn (dest, CONST0_RTX (mode));
>  }
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c 
> b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 000..4a3cae019c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}



Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread chenxiaolong
On Mon, 2024-01-15 at 14:42 +0800, Xi Ruoyao wrote:
> On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> > Xi Ruoyao wrote at 12:11pm on Monday, January 15, 2024:
> > > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > >   * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > > > >   options.
> > > > > >   * gcc.dg/signbit-2.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > > > >   * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > > > >   * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > > > > 
> > > > > I don't feel it right about the changes to pr104992.c and
> > > > > scev-16.c because no other architectures add special options
> > > > > there.  Why are we so special?
> > > > Because on the LoongArch architecture, GCC requires the addition
> > > > of vectorization options in order to generate vector code. Use the
> > > > check_effective_target_vect_cmdline_needed command in the
> > > > lib/target-supports.exp file to set whether the command line
> > > > option is needed to enable vectorization. For example, on the
> > > > ia64, x86, aarch64, and riscv architectures, vectorization is
> > > > enabled by default.
> > > 
> > > But no.  The default baseline of 32-bit x86 is i686, which is
> > > basically a Pentium III launched in 1999 without any vector
> > > instructions.
> > > 
> > > We are still missing something here.
> > > 
> > There is a line
> >  #define vector __attribute__((vector_size(4*sizeof(int))))
> > I guess it is the syntax that needs to be supported.
> 
> This is always supported.  If the target does not have vector
> instructions GCC will just expand vector arithmetic as a loop.
> 
> Maybe we should just move this test into gcc.dg/vect where the
> framework automatically adds options like -mlsx or -msse2?
> 

The "-mlsx" option is added by default when vectorization testing is
enabled. However, using dg-options in a test file resets the
compilation options for that file. Therefore, to detect vectorization
on LoongArch, the "-mlsx" option has to be added explicitly.



[Committed] RISC-V: Fix attributes bug configuration of ternary instructions

2024-01-14 Thread Juzhe-Zhong
This patch fixes the following FAILs:

Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.c-torture/execute/pr68532.c   -O0  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr68532.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test

Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

Running target riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test

The root cause is that the attributes of the ternary instructions are
incorrect, which causes the AVL prop pass and the VSETVL pass to behave
incorrectly.

Tested with no regressions and committed.

PR target/113393

gcc/ChangeLog:

* config/riscv/vector.md: Fix ternary attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr113393-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr113393-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr113393-3.c: New test.

---
 gcc/config/riscv/vector.md| 42 +--
 .../gcc.target/riscv/rvv/autovec/pr113393-1.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/pr113393-2.c | 29 +
 .../gcc.target/riscv/rvv/autovec/pr113393-3.c |  5 +++
 4 files changed, 79 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113393-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113393-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113393-3.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c1a282a27b3..ee4ee059a50 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -715,7 +715,7 @@
   (const_int 1)
 
   (eq_attr "type" "vimuladd,vfmuladd")
-  (const_int 5)]
+  (const_int 2)]
(const_int INVALID_ATTRIBUTE)))
 
 ;; The index of operand[] represents the machine mode of the instruction.
@@ -5308,7 +5308,7 @@
vmv.v.v\t%0,%2\;vmadd.vv\t%0,%3,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "4")
+   (set_attr "merge_op_idx" "2")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref "riscv_vector::get_ta(operands[6])"))
(set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[7])"))
@@ -5339,7 +5339,7 @@
vmv.v.v\t%0,%4\;vmacc.vv\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "2")
+   (set_attr "merge_op_idx" "4")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref "riscv_vector::get_ta(operands[6])"))
(set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[7])"))
@@ -5392,7 +5392,7 @@
vmv.v.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "4")
+   (set_attr "merge_op_idx" "3")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref "riscv_vector::get_ta(operands[6])"))
(set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[7])"))
@@ -5424,7 +5424,7 @@
vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "2")
+   (set_attr "merge_op_idx" "4")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref "riscv_vector::get_ta(operands[6])"))
(set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[7])"))
@@ -5492,7 +5492,7 @@
vmv.v.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "4")
+   (set_attr "merge_op_idx" "3")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref "riscv_vector::get_ta(operands[6])"))
(set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[7])"))
@@ -5525,7 +5525,7 @@
vmv.v.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
-   (set_attr "merge_op_idx" "2")
+   (set_attr "merge_op_idx" "4")
(set_attr "vl_op_idx" "5")
(set (attr "ta") (symbol_ref 

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 14:32 +0800, YunQiang Su wrote:
> Xi Ruoyao wrote at 12:11 on Monday, January 15, 2024:
> > 
> > On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > > gcc/testsuite/ChangeLog:
> > > > > 
> > > > >   * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > > >   options.
> > > > >   * gcc.dg/signbit-2.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > > >   * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > > >   * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > > > 
> > > > I don't feel it right about the changes to pr104992.c and scev-16.c
> > > > because no other architectures add special options there.  Why are we
> > > > so special?
> > 
> > > 
> > > Because on the LoongArch architecture, GCC requires the addition of
> > > vectorization options in order to generate vector code. Use the
> > > check_effective_target_vect_cmdline_needed command in the
> > > lib/target-supports.exp file to set whether the command line option is
> > > needed to enable vectorization. For example, on the ia64, x86, aarch64,
> > > and riscv architectures, vectorization is enabled by default.
> > 
> > But no.  The default baseline of 32-bit x86 is i686, which is basically
> > a Pentium III launched in 1999 without any vector instructions.
> > 
> > We are still missing something here.
> > 
> There is a line
>  #define vector __attribute__((vector_size(4*sizeof(int))))
> I guess it is the syntax that needs to be supported.

This is always supported.  If the target does not have vector
instructions GCC will just expand vector arithmetic as a loop.
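
A small sketch of that point: the generic vector extension below is
accepted by GCC on any target, with or without SIMD instructions.  The
function name scale_add is made up for illustration.

```c
#define vector __attribute__((vector_size(4*sizeof(int))))

/* Computes a * k + b element-wise; the scalar K is applied to every
   element.  When the target has no suitable vector instructions, GCC
   lowers this to scalar operations on each of the four elements.  */
vector int
scale_add (vector int a, int k, vector int b)
{
  return a * k + b;
}
```

Only the generated code differs between targets; the source-level
semantics are the same everywhere.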

Maybe we should just move this test into gcc.dg/vect where the framework
automatically adds options like -mlsx or -msse2?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread YunQiang Su
Xi Ruoyao wrote at 12:11 on Monday, January 15, 2024:
>
> On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> > At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > >   options.
> > > >   * gcc.dg/signbit-2.c: Ditto.
> > > >   * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > > >   * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > > >   * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > >
> > > I don't feel it right about the changes to pr104992.c and scev-16.c
> > > because no other architectures add special options there.  Why are we
> > > so special?
>
> >
> > Because on the LoongArch architecture, GCC requires the addition of
> > vectorization options in order to generate vector code. Use the
> > check_effective_target_vect_cmdline_needed command in the
> > lib/target-supports.exp file to set whether the command line option is
> > needed to enable vectorization. For example, on the ia64, x86, aarch64,
> > and riscv architectures, vectorization is enabled by default.
>
> But no.  The default baseline of 32-bit x86 is i686, which is basically
> a Pentium III launched in 1999 without any vector instructions.
>
> We are still missing something here.
>
There is a line
 #define vector __attribute__((vector_size(4*sizeof(int))))
I guess it is the syntax that needs to be supported.




-- 
YunQiang Su


Re: [PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-14 Thread Kewen.Lin
Hi Haochen,

on 2024/1/12 14:48, HAO CHEN GUI wrote:
> Hi,
>   On P9 "setb" is used to set the result of block compare. So it works
> with m32 and mpowerpc64. On P8, carry bit is used. So it can't work
> with m32 and mpowerpc64. This patch enables block compare expand for
> m32 and mpowerpc64 on P9.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?

OK with two nits below tweaked.  Thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Enable block compare expand on P9 with m32 and mpowerpc64
> 
> gcc/
>   * config/rs6000/rs6000-string.cc (expand_block_compare): Enable
>   P9 with m32 and mpowerpc64.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-1.c: Exclude m32 and mpowerpc64.
>   * gcc.target/powerpc/block-cmp-4.c: Likewise.
>   * gcc.target/powerpc/block-cmp-8.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index 018b87f2501..346708071b5 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -1677,11 +1677,12 @@ expand_block_compare (rtx operands[])
>/* TARGET_POPCNTD is already guarded at expand cmpmemsi.  */
>gcc_assert (TARGET_POPCNTD);
> 
> -  /* This case is complicated to handle because the subtract
> - with carry instructions do not generate the 64-bit
> - carry and so we must emit code to calculate it ourselves.
> - We choose not to implement this yet.  */
> -  if (TARGET_32BIT && TARGET_POWERPC64)
> +  /* For P8, this case is complicated to handle because the subtract
> + with carry instructions do not generate the 64-bit carry and so
> + we must emit code to calculate it ourselves.  We skip it on P8
> + but setb works well on P9.  */
> +  if (TARGET_32BIT && TARGET_POWERPC64

Nit: Move "&& TARGET_POWERPC64" onto a separate line so that it reads better.
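
As a rough sketch of the layout meant here, with one condition per
line: the function below is a hypothetical stand-in, not the code in
rs6000-string.cc, and its three parameters merely model TARGET_32BIT,
TARGET_POWERPC64 and TARGET_P9_MISC.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the guard in expand_block_compare.
   Returns true when the block compare expansion would be skipped.  */
bool
skip_block_compare (bool target_32bit, bool target_powerpc64,
                    bool target_p9_misc)
{
  if (target_32bit
      && target_powerpc64
      && !target_p9_misc)
    return true;

  return false;
}
```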

> +  && !TARGET_P9_MISC)
>  return false;
> 
>/* Allow this param to shut off all expansion.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
> index bcf0cb2ab4f..cd076cf1dce 100644
> --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-1.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-vsx" } */
> +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
>  /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */
> 
>  /* Test that it still can do expand for memcmpsi instead of calling library
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
> index c86febae68a..9373b53a3a4 100644
> --- a/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-4.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile { target be } } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> +/* { dg-skip-if "" { has_arch_ppc64 && ilp32 } } */
>  /* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */
> 
>  /* Test that it does expand for memcmpsi instead of calling library on
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
> new file mode 100644
> index 000..b470f873973
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-8.c
> @@ -0,0 +1,8 @@
> +/* { dg-do run { target ilp32 } } */
> +/* { dg-options "-O2 -m32 -mpowerpc64" } */

Nit: -m32 isn't needed.

> +/* { dg-require-effective-target has_arch_ppc64 } */
> +/* { dg-timeout-factor 2 } */
> +
> +/* Verify memcmp on m32 mpowerpc64 */
> +
> +#include "../../gcc.dg/memcmp-1.c"
BR,
Kewen


Re: [PATCH 1/2] RISC-V: delete all the vector psabi checking.

2024-01-14 Thread juzhe.zh...@rivai.ai
I think you should also remove riscv_vector_abi, since the vector ABI
is ratified and we should enable the vector calling convention by
default.



juzhe.zh...@rivai.ai
 
From: yanzhang.wang
Date: 2024-01-15 14:00
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; lehua.ding; yanzhang.wang
Subject: [PATCH 1/2] RISC-V: delete all the vector psabi checking.
From: Yanzhang Wang 
 
Thanks to
https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/389, we
no longer need to maintain the psabi checking.
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_arg_has_vector): Delete.
(riscv_pass_in_vector_p): Delete.
(riscv_init_cumulative_args): Delete the checking.
(riscv_get_arg_info): Delete the checking.
(riscv_function_value): Delete the checking.
* config/riscv/riscv.h: Delete the member for checking.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/binop_vx_constraint-120.c: Delete the -Wno-psabi.
* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Ditto.
* gcc.target/riscv/rvv/base/pr110109-2.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-9.c: Ditto.
* gcc.target/riscv/rvv/base/spill-10.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: Ditto.
* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/base/vector-abi-1.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-2.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-3.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-4.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-5.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-6.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-7.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-8.c: Removed.
 
Signed-off-by: Yanzhang Wang 
 
---
I have tested the two patches locally and there is no regression.
 
---
gcc/config/riscv/riscv.cc | 80 +--
gcc/config/riscv/riscv.h  |  2 -
.../riscv/rvv/base/binop_vx_constraint-120.c  |  2 +-
.../rvv/base/integer_compare_insn_shortcut.c  |  2 +-
.../riscv/rvv/base/mask_insn_shortcut.c   |  2 +-
.../rvv/base/misc_vreinterpret_vbool_vint.c   |  2 +-
.../gcc.target/riscv/rvv/base/pr110109-2.c|  2 +-
.../gcc.target/riscv/rvv/base/scalar_move-9.c |  2 +-
.../gcc.target/riscv/rvv/base/spill-10.c  |  2 +-
.../gcc.target/riscv/rvv/base/spill-11.c  |  2 +-
.../gcc.target/riscv/rvv/base/spill-9.c   |  2 +-
.../gcc.target/riscv/rvv/base/vector-abi-1.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-2.c  | 15 
.../gcc.target/riscv/rvv/base/vector-abi-3.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-4.c  | 16 
.../gcc.target/riscv/rvv/base/vector-abi-5.c  | 20 -
.../gcc.target/riscv/rvv/base/vector-abi-6.c  | 20 -
.../gcc.target/riscv/rvv/base/vector-abi-7.c  | 14 
.../gcc.target/riscv/rvv/base/vector-abi-8.c  | 14 
.../gcc.target/riscv/rvv/base/vlmul_ext-1.c   |  2 +-
.../base/zero_base_load_store_optimization.c  |  2 +-
.../riscv/rvv/base/zvfh-intrinsic.c   |  2 +-
.../riscv/rvv/base/zvfh-over-zvfhmin.c|  2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|  2 +-
24 files changed, 15 insertions(+), 222 deletions(-)
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-7.c
delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-8.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..e7f7ce605db 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4844,59 +4844,6 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
   GEN_INT (offset2;
}
-/* Return true if a vector type is included in the type TYPE.  */
-
-static bool
-riscv_arg_has_vector (const_tree type)
-{
-  if (riscv_v_ext_mode_p (TYPE_MODE (type)))
-return true;
-
-  if (!COMPLETE_TYPE_P (type))
-return false;
-
-  switch (TREE_CODE (type))
-{
-case RECORD_TYPE:
-  /* If it is a record, it is further determined whether its fields have
- vector type.  */
-  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
- if (TREE_CODE (f) == FIELD_DECL)
-   {
- tree field_type = TREE_TYPE (f);
- if (!TYPE_P 

Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-14 Thread Kewen.Lin
Hi Haochen,

on 2024/1/10 09:35, HAO CHEN GUI wrote:
> Hi,
>   This patch refactors the function expand_compare_loop and splits it
> into two functions. One is for fixed length and the other is for
> variable length. These two functions share some low-level common
> helper functions.

I'd expect a refactoring not to introduce any functional changes, but
this patch has some enhancements as described below, so I think the
subject is off; it's more like a rework.

> 
>   Besides the above changes, the patch also does the following:
> 1. Don't generate the load and compare loop when max_bytes is less
> than the loop bytes.
> 2. Remove do_load_mask_compare as it's not needed. All sub-targets
> entering the function should support efficient overlapping load and
> compare.
> 3. Implement a variable-length overlapping load and compare for the
> case where the remaining bytes are fewer than the loop bytes in a
> variable-length compare. The 4k boundary test and the one-byte load and
> compare loop are removed as they're no longer needed.
> 4. Remove the code for "bytes > max_bytes" with fixed length as the
> case is already excluded by pre-checking.
> 5. Remove the run-time code for "bytes > max_bytes" with variable length
> as it should jump to the library call at the beginning.
> 6. Enhance do_overlap_load_compare to avoid overlapping load and compare
> when the remaining bytes can be loaded and compared by a smaller unit.
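
For readers unfamiliar with the term, the idea of an "overlapping load
and compare" can be sketched in plain C as below.  This is only an
illustration under simplifying assumptions: it checks for equality
rather than producing memcmp's ordering result, it uses a fixed 8-byte
unit, and the names load8 and blocks_equal are hypothetical.  It is not
the RTL expansion in rs6000-string.cc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from a possibly unaligned address.  */
static uint64_t
load8 (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: true when the N bytes at A and B are equal.
   Requires n >= 8 so that the overlapping tail load stays in bounds.  */
bool
blocks_equal (const unsigned char *a, const unsigned char *b, size_t n)
{
  size_t i = 0;

  /* Main loop: compare one 8-byte chunk per iteration.  */
  for (; i + 8 <= n; i += 8)
    if (load8 (a + i) != load8 (b + i))
      return false;

  /* Tail: instead of a byte-by-byte loop, reload the last 8 bytes.  The
     load overlaps bytes that were already compared, which is harmless.  */
  if (i < n)
    return load8 (a + n - 8) == load8 (b + n - 8);

  return true;
}
```

The overlapping tail load never reads past the end of either buffer,
which is why no page-boundary test or one-byte fallback loop is needed
in this scheme.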

Considering it's stage 4 now and the impact of this patch, let's defer
this to the next stage 1.  If possible, could you organize the above
changes into separate patches?

1) Refactor expand_compare_loop by splitting it into two functions
   without any functional changes.
2) Remove some useless code, like 2, 4 and 5.
3) Some more enhancements, like 1, 3 and 6.

It would be helpful for the review.  Thanks!

BR,
Kewen

> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Refactor expand_compare_loop and split it to two functions
> 
> The original expand_compare_loop has complicated logic as it's
> designed for both fixed and variable length.  This patch splits it into
> two functions and makes these two functions share common helper
> functions.  Also the 4K boundary test and the corresponding one-byte
> load and compare are replaced by a variable-length overlapping load and
> compare.  do_load_mask_compare is removed as all sub-targets entering
> the function have efficient overlapping load and compare, so the mask
> load is not needed.
> 
> gcc/
>   * config/rs6000/rs6000-string.cc (do_isel): Remove.
>   (do_load_mask_compare): Remove.
>   (do_reg_compare): New.
>   (do_load_and_compare): New.
>   (do_overlap_load_compare): Do load and compare with a small unit
>   other than overlapping load and compare when the remain bytes can
>   be done by one instruction.
>   (expand_compare_loop): Remove.
>   (get_max_inline_loop_bytes): New.
>   (do_load_compare_rest_of_loop): New.
>   (generate_6432_conversion): Set it to a static function and move
>   ahead of gen_diff_handle.
>   (gen_diff_handle): New.
>   (gen_load_compare_loop): New.
>   (gen_library_call): New.
>   (expand_compare_with_fixed_length): New.
>   (expand_compare_with_variable_length): New.
>   (expand_block_compare): Call expand_compare_with_variable_length
>   to expand block compare for variable length.  Call
>   expand_compare_with_fixed_length to expand block compare loop for
>   fixed length.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-5.c: New.
>   * gcc.target/powerpc/block-cmp-6.c: New.
>   * gcc.target/powerpc/block-cmp-7.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-string.cc 
> b/gcc/config/rs6000/rs6000-string.cc
> index f707bb2727e..018b87f2501 100644
> --- a/gcc/config/rs6000/rs6000-string.cc
> +++ b/gcc/config/rs6000/rs6000-string.cc
> @@ -404,21 +404,6 @@ do_ifelse (machine_mode cmpmode, rtx_code comparison,
>LABEL_NUSES (true_label) += 1;
>  }
> 
> -/* Emit an isel of the proper mode for DEST.
> -
> -   DEST is the isel destination register.
> -   SRC1 is the isel source if CR is true.
> -   SRC2 is the isel source if CR is false.
> -   CR is the condition for the isel.  */
> -static void
> -do_isel (rtx dest, rtx cmp, rtx src_t, rtx src_f, rtx cr)
> -{
> -  if (GET_MODE (dest) == DImode)
> -emit_insn (gen_isel_cc_di (dest, cmp, src_t, src_f, cr));
> -  else
> -emit_insn (gen_isel_cc_si (dest, cmp, src_t, src_f, cr));
> -}
> -
>  /* Emit a subtract of the proper mode for DEST.
> 
> DEST is the destination register for the subtract.
> @@ -499,65 +484,61 @@ do_rotl3 (rtx dest, rtx src1, rtx src2)
>  emit_insn (gen_rotlsi3 (dest, src1, src2));
>  }
> 
> -/* Generate rtl for a load, shift, and compare of less than a full word.
> -
> -   LOAD_MODE is the machine mode for the loads.
> -   DIFF is the reg for the difference.
> -   CMP_REM is the reg 

Re: [PATCH] strub: Only unbias stack point for SPARC_STACK_BOUNDARY_HACK [PR113100]

2024-01-14 Thread Kewen.Lin
on 2024/1/12 19:03, Alexandre Oliva wrote:
> On Jan 12, 2024, "Kewen.Lin"  wrote:
> 
>>>> By checking PR112917, IMHO we should keep this unbiasing
>>>> guarded under SPARC_STACK_BOUNDARY_HACK (TARGET_ARCH64 &&
>>>> TARGET_STACK_BIAS), similar to some existing code special
>>>> treating SPARC stack bias.
>>>
>>> I'm afraid this change will most certainly regress 32-bit sparc, because
>>> of the large register save area.
> 
>> Oh, I read the comments and commit logs in PR112917, mainly
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112917#{c4,c5,c6},
>> and the "sparc64" in subject of commit r14-6737 also implies
>> that this unbiasing is only required for sparc64, so I thought
>> it should be safe to guard with SPARC_STACK_BOUNDARY_HACK.
> 
> It is safe, in a way, because that protects potentially active stack
> areas, but it's unsafe in that it may leak data that stack scrubbing was
> supposed to scrub.  There's no conservative solution here, alas; we have
> to get it just right.
> 
> Specifically on sparc32, if __builtin_scrub_leave allocated its own
> frame (it doesn't) with the large register-save area for its potential
> (but inexistent) callees to use, it could overlap with a large chunk of
> the very stack frame that it's supposed to clear.

Thanks for the further explanation!

> 
> Unfortunately, this is slowly drifting away from the notion of stack
> address.  I mean, all of the following could conceivably be returned by
> __builtin_stack_address:
> 
> - the (biased) stack pointer
> 
> - the address of the red zone
> 
> - the unbiased stack pointer
> 
> - the address of the save area reserved by callees for potential callees
> 
> - the boundary between caller- and callee-used stack space
> 
> The last one is what we need for stack scrubbing, so that's what I'm
> planning to implement, but I'm pondering whether to change
> __builtin_stack_address() to take an extra argument to select among the
> different possibilities, or of other means to query these various
> offsets.  It feels like overthinking, so I'm trying to push these
> thoughts aside, but...  Does anyone think that would be a desirable
> feature?  We can always add it later.

One immature idea: maybe we can introduce a hook with clear meaning for
the last one and its default implementation still adopts the function
__builtin_stack_address directly, if this default implementation for
some port is imperfect, someone who is familiar with its own ABIs can
further enhance it with its own hook implementation.

BR,
Kewen
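
The hook idea above can be pictured with a small C model. This is only a sketch under stated assumptions: the `scrub_boundary_hook` type and the three functions are hypothetical names invented for illustration, not GCC target hooks, and the raw stack pointer is passed in as a plain integer instead of being read with `__builtin_stack_address`. The 2047-byte constant is the SPARC V9 stack bias.

```c
#include <stdint.h>

/* Hypothetical hook type: map a raw stack pointer value to the address
   that stack scrubbing should treat as its boundary.  */
typedef uintptr_t (*scrub_boundary_hook) (uintptr_t raw_sp);

/* Default: use the raw stack address unchanged, standing in for a
   direct use of __builtin_stack_address.  */
static uintptr_t
default_scrub_boundary (uintptr_t raw_sp)
{
  return raw_sp;
}

/* SPARC V9 keeps %sp biased by 2047 bytes; a 64-bit SPARC port could
   override the default to undo that bias.  */
#define SPARC64_STACK_BIAS 2047

static uintptr_t
sparc64_scrub_boundary (uintptr_t raw_sp)
{
  return raw_sp + SPARC64_STACK_BIAS;
}

/* Use the port-specific hook if one is installed, else the default.  */
static uintptr_t
scrub_boundary (scrub_boundary_hook port_hook, uintptr_t raw_sp)
{
  scrub_boundary_hook hook = port_hook ? port_hook : default_scrub_boundary;
  return hook (raw_sp);
}
```

With no port hook installed, `scrub_boundary (0, sp)` returns `sp` unchanged; passing `sparc64_scrub_boundary` returns `sp + 2047`. A real hook would presumably also have to account for the register save area discussed above, which this model leaves out.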


[PATCH 2/2] RISC-V: delete vector abi checking in all relevant tests.

2024-01-14 Thread yanzhang . wang
From: Yanzhang Wang 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: Delete the
  -Wno-psabi.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-return.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: Ditto.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm-error.c: Ditto.
* gcc.target/riscv/rvv/base/fixed-point-vxrm.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-cvt-f.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-cvt-x.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-cvt-xu.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-10.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-11.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-12.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-13.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-14.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-15.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-16.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-17.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-18.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-19.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-2.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-20.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-21.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-22.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-23.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-24.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-25.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-26.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-27.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-28.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-29.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-3.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-30.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-31.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-32.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-4.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: Ditto.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: Ditto.
* 

[PATCH 1/2] RISC-V: delete all the vector psabi checking.

2024-01-14 Thread yanzhang . wang
From: Yanzhang Wang 

Thanks to
https://hub.fgit.cf/riscv-non-isa/riscv-elf-psabi-doc/pull/389, we
no longer need to maintain the psabi checking.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_arg_has_vector): Delete.
(riscv_pass_in_vector_p): Delete.
(riscv_init_cumulative_args): Delete the checking.
(riscv_get_arg_info): Delete the checking.
(riscv_function_value): Delete the checking.
* config/riscv/riscv.h: Delete the member for checking.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-120.c: Delete the 
-Wno-psabi.
* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Ditto.
* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Ditto.
* gcc.target/riscv/rvv/base/pr110109-2.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-9.c: Ditto.
* gcc.target/riscv/rvv/base/spill-10.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: Ditto.
* gcc.target/riscv/rvv/base/zero_base_load_store_optimization.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Ditto.
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/vsetvl-1.c: Ditto.
* gcc.target/riscv/rvv/base/vector-abi-1.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-2.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-3.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-4.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-5.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-6.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-7.c: Removed.
* gcc.target/riscv/rvv/base/vector-abi-8.c: Removed.

Signed-off-by: Yanzhang Wang 

---
I have tested the two patches locally and there is no regression.

---
 gcc/config/riscv/riscv.cc | 80 +--
 gcc/config/riscv/riscv.h  |  2 -
 .../riscv/rvv/base/binop_vx_constraint-120.c  |  2 +-
 .../rvv/base/integer_compare_insn_shortcut.c  |  2 +-
 .../riscv/rvv/base/mask_insn_shortcut.c   |  2 +-
 .../rvv/base/misc_vreinterpret_vbool_vint.c   |  2 +-
 .../gcc.target/riscv/rvv/base/pr110109-2.c|  2 +-
 .../gcc.target/riscv/rvv/base/scalar_move-9.c |  2 +-
 .../gcc.target/riscv/rvv/base/spill-10.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |  2 +-
 .../gcc.target/riscv/rvv/base/spill-9.c   |  2 +-
 .../gcc.target/riscv/rvv/base/vector-abi-1.c  | 14 
 .../gcc.target/riscv/rvv/base/vector-abi-2.c  | 15 
 .../gcc.target/riscv/rvv/base/vector-abi-3.c  | 14 
 .../gcc.target/riscv/rvv/base/vector-abi-4.c  | 16 
 .../gcc.target/riscv/rvv/base/vector-abi-5.c  | 20 -
 .../gcc.target/riscv/rvv/base/vector-abi-6.c  | 20 -
 .../gcc.target/riscv/rvv/base/vector-abi-7.c  | 14 
 .../gcc.target/riscv/rvv/base/vector-abi-8.c  | 14 
 .../gcc.target/riscv/rvv/base/vlmul_ext-1.c   |  2 +-
 .../base/zero_base_load_store_optimization.c  |  2 +-
 .../riscv/rvv/base/zvfh-intrinsic.c   |  2 +-
 .../riscv/rvv/base/zvfh-over-zvfhmin.c|  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-1.c|  2 +-
 24 files changed, 15 insertions(+), 222 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-7.c
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-8.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 32183d63180..e7f7ce605db 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4844,59 +4844,6 @@ riscv_pass_fpr_pair (machine_mode mode, unsigned regno1,
   GEN_INT (offset2;
 }
 
-/* Return true if a vector type is included in the type TYPE.  */
-
-static bool
-riscv_arg_has_vector (const_tree type)
-{
-  if (riscv_v_ext_mode_p (TYPE_MODE (type)))
-return true;
-
-  if (!COMPLETE_TYPE_P (type))
-return false;
-
-  switch (TREE_CODE (type))
-{
-case RECORD_TYPE:
-  /* If it is a record, it is further determined whether its fields have
-vector type.  */
-  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
-   if (TREE_CODE (f) == FIELD_DECL)
- {
-   tree field_type = TREE_TYPE (f);
-   if (!TYPE_P (field_type))
- break;
-
-   if 

Re: MIPS: the method of getting GOT address for PIC code

2024-01-14 Thread YunQiang Su
YunQiang Su  wrote on Friday, August 25, 2023 at 15:16:
>
> When working on LLVM, I found this problem
> https://github.com/llvm/llvm-project/issues/64974.
> Maybe it's time for us to reconsider the way of getting GOT address
> for PIC code.
>

I have my draft patch pushed to GitHub:
https://github.com/wzssyqa/gcc/tree/pcrel
And the patch is also attached.

Any comment is welcome.

-- 
YunQiang Su


0001-MIPS-PCREL-support.patch
Description: Binary data


[PATCH] MIPS: avoid $gp store if global_pointer is not $gp

2024-01-14 Thread YunQiang Su
$GP is used when expanding GOT loads, and in later passes we try to
use a temporary register instead.

If that succeeds, we have no need to store and reload $gp. An example
of failure is a function that calls a preemptible function.

We shouldn't use $GP for any other purpose in the code we generate.
If a user's inline asm code clobbers $GP, it's their duty to save
and restore $GP.

gcc
* config/mips/mips.cc (mips_compute_frame_info): If another
register is used as global_pointer, mark $GP live false.

gcc/testsuite
* gcc.target/mips/mips.exp (mips_option_groups):
Add -mxgot/-mno-xgot options.
* gcc.target/mips/xgot-n32-avoid-gp.c: New test.
* gcc.target/mips/xgot-n32-need-gp.c: New test.
---
 gcc/config/mips/mips.cc   |  2 ++
 gcc/testsuite/gcc.target/mips/mips.exp|  1 +
 gcc/testsuite/gcc.target/mips/xgot-n32-avoid-gp.c | 11 +++
 gcc/testsuite/gcc.target/mips/xgot-n32-need-gp.c  | 11 +++
 4 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/xgot-n32-avoid-gp.c
 create mode 100644 gcc/testsuite/gcc.target/mips/xgot-n32-need-gp.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index e752019b5e2..30e99811ff6 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -11353,6 +11353,8 @@ mips_compute_frame_info (void)
  in, which is why the global_pointer field is initialised here and not
  earlier.  */
   cfun->machine->global_pointer = mips_global_pointer ();
+  if (cfun->machine->global_pointer != GLOBAL_POINTER_REGNUM)
+df_set_regs_ever_live (GLOBAL_POINTER_REGNUM, false);
 
   offset = frame->args_size + frame->cprestore_size;
 
diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 9f8d533cfa5..e028bc93b40 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -266,6 +266,7 @@ set mips_option_groups {
 stack-protector "-fstack-protector"
 stdlib "REQUIRES_STDLIB"
 unaligned-access "-m(no-|)unaligned-access"
+xgot "-m(no-|)xgot"
 }
 
 for { set option 0 } { $option < 32 } { incr option } {
diff --git a/gcc/testsuite/gcc.target/mips/xgot-n32-avoid-gp.c 
b/gcc/testsuite/gcc.target/mips/xgot-n32-avoid-gp.c
new file mode 100644
index 000..3f52fc5a765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/xgot-n32-avoid-gp.c
@@ -0,0 +1,11 @@
+/* Check if we skip store and load gp if there is no stub function call.  */
+/* { dg-options "-mips64r2 -mxgot -mabi=n32 -fPIC" } */
+
+extern int a;
+int
+foo ()
+{
+  return a;
+}
+/* { dg-final { scan-assembler-not "\tsd\t\\\$28," } } */
+/* { dg-final { scan-assembler-not "\tld\t\\\$28," } } */
diff --git a/gcc/testsuite/gcc.target/mips/xgot-n32-need-gp.c 
b/gcc/testsuite/gcc.target/mips/xgot-n32-need-gp.c
new file mode 100644
index 000..631409cb7fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/xgot-n32-need-gp.c
@@ -0,0 +1,11 @@
+/* We cannot skip store and load gp if there is stub function call.  */
+/* { dg-options "-mips64r2 -mxgot -mabi=n32 -fPIC" } */
+
+extern int f();
+int
+foo ()
+{
+  return f();
+}
+/* { dg-final { scan-assembler "\tsd\t\\\$28," } } */
+/* { dg-final { scan-assembler "\tld\t\\\$28," } } */
-- 
2.39.2



Re: [Patch] libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy* (was: [Patch] libgomp.texi: Document omp_pause_resource{,_all})

2024-01-14 Thread Sandra Loosemore

On 1/14/24 16:15, Tobias Burnus wrote:


+@node omp_target_memcpy
+@subsection @code{omp_target_memcpy} -- Copy data between devices
+@table @asis
+@item @emph{Description}:
+This routine tests copies @var{length} of bytes of data from the device
+identified by device number @var{src_device_num} to device 
@var{dst_device_num}.


Hmmm, I'm sure it's the train's fault :-) but "tests copies" makes no sense, 
and that's cut-and-pasted multiple times.  I think you just mean "copies" in 
all cases.



+@node omp_target_memcpy_rect
+@subsection @code{omp_target_memcpy_rect} -- Copy a subvolume of data between 
devices
+@table @asis
+@item @emph{Description}:
+This routine tests copies a subvolume of data from the device identified by
+device number @var{src_device_num} to device @var{dst_device_num}.  The
+subvolume of a multi-dimensional array of array dimension @var{num_dims} and
+each array element has a size of @var{element_size} bytes.  The @var{volume}


This is kind of garbled.  How about rephrasing that second sentence as

The array has @var{num_dims} and each array element has a size of 
@var{element_size} bytes.




+array specifies how many elements per dimension will be copied.  The full


s/will be/are/


+array in number of elements is given by the @var{dst_dimensions} and
+@var{src_dimensions} arguments for the array on the destination and source
+device, respectively.  The offset per dimension to the first element to


I think we can simplify that sentence, too, like

The full sizes of the destination and source arrays are given by the 
@var{dst_dimensions} and @var{src_dimensions} arguments, respectively.



+be copied is given by the @var{dst_offset} and @var{src_offset} arguments.
+The routine returns zero on success and non-zero otherwise.
+
+The OpenMP only requires that @var{num_dims} up to three is supported. In order


s/OpenMP/OpenMP specification/ ?


+to find implementation-specific maximally supported number of dimensions, the
+routine will return this value when invoked with a NULL pointer to both the


s/will return/returns/

either "null pointer" or "@code{NULL}" is preferable to "NULL pointer".


+@var{dst} and @var{src} arguments.  As GCC supports arbitrary dimensions, it
+will return INTMAX.


s/will return INTMAX/returns @code{INT_MAX}/


+
+The device-number arguments must be conforming device number, the @var{src} and


s/number,/numbers,/



+@var{dst} must be either both NULL or any of the following must be fulfilled:


same issue with "NULL" here, either "@code{NULL}" or "null pointers".

"any" seems unlikely to be useful.  Do you mean "all" of the following 
conditions?


+@var{element_size} and @var{num_dims} must be positive, the @var{volume}, 
offset
+and dimension arrays must have at least @var{num_dims} dimensions.
+Running this routine in a @code{target} region except on the initial device
+is not supported.


The part of the patch for omp_target_memcpy_rect_async has very similar 
problems and needs the same fixes.


-Sandra
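
To make the roles of @var{volume}, the offsets and the dimension arrays concrete, here is a host-only C model of a two-dimensional rectangular copy. It is a sketch, not the libgomp implementation: `copy_rect_2d` is a made-up name, the number of dimensions is fixed at two instead of being passed as @var{num_dims}, and the device-number arguments are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Host-only model of a 2-D rectangular copy.  volume, offsets and
   dimensions are counted in elements; element_size is in bytes.
   Returns 0 on success and nonzero if the subvolume does not fit.  */
static int
copy_rect_2d (void *dst, const void *src, size_t element_size,
              const size_t volume[2],
              const size_t dst_offset[2], const size_t src_offset[2],
              const size_t dst_dimensions[2], const size_t src_dimensions[2])
{
  if (element_size == 0)
    return 1;

  /* The subvolume must lie inside both full arrays.  */
  for (int k = 0; k < 2; k++)
    if (dst_offset[k] + volume[k] > dst_dimensions[k]
        || src_offset[k] + volume[k] > src_dimensions[k])
      return 1;

  /* Copy one contiguous row of the subvolume at a time.  */
  size_t row_bytes = volume[1] * element_size;
  for (size_t i = 0; i < volume[0]; i++)
    {
      const char *s = (const char *) src
        + ((src_offset[0] + i) * src_dimensions[1] + src_offset[1])
          * element_size;
      char *d = (char *) dst
        + ((dst_offset[0] + i) * dst_dimensions[1] + dst_offset[1])
          * element_size;
      memcpy (d, s, row_bytes);
    }
  return 0;
}
```

For a 3x4 source array and a 2x2 destination, a @var{volume} of {2, 2} with a source offset of {1, 1} copies the four elements at rows 1-2, columns 1-2 of the source into the whole destination.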


Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread Xi Ruoyao
On Mon, 2024-01-15 at 09:29 +0800, chenxiaolong wrote:
> At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> > At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > > options.
> > >   * gcc.dg/signbit-2.c: Ditto.
> > >   * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > >   * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > >   * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> > 
> > I don't feel it right about the changes to pr104992.c and scev-16.c
> > because no other architectures add special options there.  Why are we
> > so special?

> 
> Because on the LoongArch architecture, GCC requires the addition of
> vectorization options in order to generate vector code. Use the
> check_effective_target_vect_cmdline_needed procedure in the
> lib/target-supports.exp file to set whether a command-line option is
> needed to enable vectorization. For example, on the ia64, x86, aarch64,
> and riscv architectures, vectorization is enabled by default.

But no.  The default baseline of 32-bit x86 is i686, which is basically
a Pentium III launched in 1999 without any vector instructions.

We are still missing something here.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
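
For reference, the `vector` macro in pr104992.c is built on the GNU C `vector_size` attribute, which GCC and Clang accept whether or not a vector ISA is enabled. The sketch below is an illustration of that extension only; the `v4si` typedef and the two helper functions are made-up names and are not taken from the test. Without vector instructions the compiler is expected to lower the operations to scalar code, so what differs between targets is the generated code and the optimizer dumps, not whether the source compiles.

```c
/* GNU C generic vector of four ints (a GCC/Clang extension).  */
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Element-wise addition of two generic vectors.  */
static v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}

/* Sum the four lanes using vector subscripting.  */
static int
sum_v4si (v4si v)
{
  return v[0] + v[1] + v[2] + v[3];
}
```

Adding {1, 2, 3, 4} and {10, 20, 30, 40} with `add_v4si` gives {11, 22, 33, 44}, and `sum_v4si` of that result is 110.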


Re: [PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-14 Thread H.J. Lu
On Sun, Jan 7, 2024, 5:02 PM xndcn  wrote:

> Hi, I found that the __atomic_float constructor does not clear padding,
> while __compare_exchange assumes zeroed padding. So it is easy to
> reproduce an infinite loop on x86-64 with the long double type, like:
> ---
> -O0 -std=c++23 -mlong-double-80
> #include <atomic>
> #include <cstdio>
>
> #define T long double
> int main() {
> std::atomic<T> t(0.5);
> t.fetch_add(0.5);
> float x = t;
> printf("%f\n", x);
> }
> ---
>
> So we should add __builtin_clear_padding in __atomic_float constructor,
> just like the generic atomic struct.
>
> regtested on x86_64-linux. Is it OK for trunk?
>
> ---
> libstdc++: atomic: Add missing clear_padding in __atomic_float constructor.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_base.h: add __builtin_clear_padding in
> __atomic_float constructor.
> ---
>  libstdc++-v3/include/bits/atomic_base.h | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_base.h
> b/libstdc++-v3/include/bits/atomic_base.h
> index f4ce0fa53..d59c2209e 100644
> --- a/libstdc++-v3/include/bits/atomic_base.h
> +++ b/libstdc++-v3/include/bits/atomic_base.h
> @@ -1283,7 +1283,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>constexpr
>__atomic_float(_Fp __t) : _M_fp(__t)
> -  { }
> +  {
> +#if __has_builtin(__builtin_clear_padding)
> + if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Fp>())
> +  __builtin_clear_padding(std::__addressof(_M_fp));
> +#endif
> +  }
>
>__atomic_float(const __atomic_float&) = delete;
>__atomic_float& operator=(const __atomic_float&) = delete;
> --
> 2.25.1
>

Can you add a testcase?

Thanks.

H.J.

>
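
The failure mode described in the quoted report can be modelled in plain C without relying on how any particular compiler treats `long double` padding. The sketch below is a simulation under stated assumptions, not libstdc++ code: the object is a 16-byte buffer whose first 10 bytes stand for the value and whose last 6 stand for padding, and `bytewise_cas`, `clear_padding` and `try_update` are hypothetical names. It assumes, as the report says, that the compare-exchange path works on an expected value with zeroed padding.

```c
#include <string.h>

enum { VALUE_BYTES = 10, OBJECT_BYTES = 16 };

/* Compare-exchange that compares whole object representations,
   padding bytes included.  Returns 1 if the exchange happened.  */
static int
bytewise_cas (unsigned char *obj, unsigned char *expected,
              const unsigned char *desired)
{
  if (memcmp (obj, expected, OBJECT_BYTES) == 0)
    {
      memcpy (obj, desired, OBJECT_BYTES);
      return 1;
    }
  memcpy (expected, obj, OBJECT_BYTES);
  return 0;
}

/* Zero the bytes that do not take part in the value.  */
static void
clear_padding (unsigned char *obj)
{
  memset (obj + VALUE_BYTES, 0, OBJECT_BYTES - VALUE_BYTES);
}

/* One iteration of a fetch_add-style retry loop: snapshot the object,
   zero the snapshot's padding, then attempt the exchange.  */
static int
try_update (unsigned char *obj, const unsigned char *desired)
{
  unsigned char expected[OBJECT_BYTES];
  memcpy (expected, obj, OBJECT_BYTES);
  clear_padding (expected);
  return bytewise_cas (obj, expected, desired);
}
```

If the stored object was constructed with nonzero padding, `try_update` fails on every attempt, which is the infinite loop in the report. Once `clear_padding` has been applied to the stored object, the same call succeeds.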


Re: [PATCH] libsupc++: Fix UB terminating on foreign exception

2024-01-14 Thread Julia DeMille

On 2024-01-14 18:51, Julia DeMille wrote:
> I'm unsure if my patch actually fixes it with this demo -- I need to
> work out how to use a patched GCC without installing it on my system,
> but without it breaking from not having things it expects to exist on
> the system.


I've gotten this to work, and run into an unexpected situation. 
Something about the personality routine is causing a SIGABRT. 
Investigating further.


> I'm also going to go make sure that the Objective-C unwind personality
> is unique, otherwise we could have trouble.

Checked this -- it is.

--
Thanks,
Julia DeMille
she/her



Re: [PATCH] Pass GUILE down to subdirectories

2024-01-14 Thread Eric Gallager
On Sat, Jan 13, 2024 at 6:36 AM Andrew Burgess  wrote:
>
> Tom Tromey  writes:
>
> > When I enable cgen rebuilding in the binutils-gdb tree, the default is
> > to run cgen using 'guile'.  However, on my host, guile is guile 2.2,
> > which doesn't work for me -- I have to use guile3.0.
> >
> > This patch arranges to pass "GUILE" down to subdirectories, so I can
> > use 'make GUILE=guile3.0'.
> >
> > ChangeLog
> > 2023-12-30  Tom Tromey  
> >
> >   * Makefile.in: Rebuild.
> >   * Makefile.tpl (BASE_EXPORTS): Add GUILE.
> >   (GUILE): New variable.
> >   * Makefile.def (flags_to_pass): Add GUILE.
> > ---
> >  ChangeLog| 7 +++
> >  Makefile.def | 1 +
> >  Makefile.in  | 8 ++--
> >  Makefile.tpl | 7 +--
> >  4 files changed, 19 insertions(+), 4 deletions(-)
> >
> > diff --git a/Makefile.def b/Makefile.def
> > index 662e50fdc18..792919e561c 100644
> > --- a/Makefile.def
> > +++ b/Makefile.def
> > @@ -310,6 +310,7 @@ flags_to_pass = { flag= GNATBIND ; };
> >  flags_to_pass = { flag= GNATMAKE ; };
> >  flags_to_pass = { flag= GDC ; };
> >  flags_to_pass = { flag= GDCFLAGS ; };
> > +flags_to_pass = { flag= GUILE ; };
> >
> >  // Target tools
> >  flags_to_pass = { flag= AR_FOR_TARGET ; };
> > diff --git a/Makefile.in b/Makefile.in
> > index 48320bb549e..9a58d5a4f20 100644
> > --- a/Makefile.in
> > +++ b/Makefile.in
> > @@ -3,7 +3,7 @@
> >  #
> >  # Makefile for directory with subdirs to build.
> >  #   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
> > -#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 
> > 2010, 2011
> > +#   1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 
> > 2010, 2011, 2023
> >  #   Free Software Foundation
> >  #
> >  # This file is free software; you can redistribute it and/or modify
> > @@ -143,7 +143,8 @@ BASE_EXPORTS = \
> >   M4="$(M4)"; export M4; \
> >   SED="$(SED)"; export SED; \
> >   AWK="$(AWK)"; export AWK; \
> > - MAKEINFO="$(MAKEINFO)"; export MAKEINFO;
> > + MAKEINFO="$(MAKEINFO)"; export MAKEINFO; \
> > + GUILE="$(GUILE)"; export GUILE;
> >
> >  # This is the list of variables to export in the environment when
> >  # configuring subdirectories for the build system.
> > @@ -450,6 +451,8 @@ GM2FLAGS = $(CFLAGS)
> >
> >  PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >
> > +GUILE = guile
>
> Hi Tom,
>
> This change is causing some problems for me.
>
> One of my build machines has 2 versions of guile installed.  One is
> guile 2.0.14 and the other is guile 2.2.21.
>
> When GDB configures itself the configure script figures out that it
> should use 2.2.21 to compile the guile libraries that GDB uses.
>
> However, when we actually build the guile libraries we do use guild2.2,
> but due to this 'GUILE = guile' line, guild2.2 uses guile 2.0.14 in
> order to perform the compile (I guess, I don't know the details of how
> guile compilation works).
>
> Unfortunately guile 2.0.14 compiles in a way which is not compatible
> with how GDB then tries to load the guile library.
>
> Maybe better to show you what's going on:
>
>   $ pwd
>   /tmp/binutils-gdb/build/gdb/data-directory/guile
>   $ GUILE=guile /usr/bin/guild2.2 compile -Warity-mismatch -Wformat 
> -Wunused-toplevel -L . -o ./gdb.go ./gdb.scm
>   wrote `./gdb.go'
>   $ file gdb.go
>   gdb.go: Guile Object, little endian, 64bit, bytecode v2.0
>   $ GUILE=guile2.2 /usr/bin/guild2.2 compile -Warity-mismatch -Wformat 
> -Wunused-toplevel -L . -o ./gdb.go ./gdb.scm
>   wrote `./gdb.go'
>   $ file gdb.go
>   gdb.go: ELF 64-bit LSB shared object, no machine, version 1 (embedded), 
> dynamically linked, with debug_info, not stripped
>
> The first compile uses GUILE=guile, so I use guile 2.0.14, which results
> in a non-ELF being generated.  When I start GDB with this non-ELF in
> place, I see this:
>
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/
>   Exception caught while booting Guile.
>   Error in function "load-thunk-from-memory":
>   not an ELF file
>   ./gdb/gdb: warning: Could not complete Guile gdb module initialization from:
>   /tmp/binutils-gdb/build/gdb/data-directory/guile/gdb/boot.scm.
>   Limited Guile support is available.
>   Suggest passing --data-directory=/path/to/gdb/data-directory.
>   GDB Version: 15.1
>
>   (gdb)
>
> The second compile, with GUILE=guile2.2 results in an ELF being
> generated, and with this in place GDB starts just fine.
>
> Now, clearly the obvious answer is: don't have such an old, out of date
> version of guile installed.  But I think there's a bigger issue here.
> As guild version X will by default pick up the corresponding version of
> guile, shouldn't that be the default behaviour?
>
> My proposal would be that we change the line 'GUILE = guile' to instead
> be just 'GUILE ='.
>
> With this in place I can still override the choice of guile executable
> with:
>
>   make GUILE=guile
>
> but, if I don't do this then instead of forcing 'guile' as the default,
> we allow the guild 

[PATCH] RISC-V: Fix regression (GCC-14 compare with GCC-13.2) of SHA256 from coremark-pro

2024-01-14 Thread Juzhe-Zhong
This patch fixes a 70% performance drop from GCC-13.2 to GCC-14 with
-march=rv64gcv on real hardware.

The root cause is an incorrect cost model that causes inefficient
vectorization, which makes performance drop significantly.

So this patch does:

1. Adjust vector to scalar cost by introducing v to scalar reg move.
2. Adjust vec_construct cost since we do spend NUNITS instructions to
construct the vector.

Tested on both RV32/RV64 with no regression. OK for trunk?
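
The two adjustments can be summarized with a simplified C model. This is a sketch for illustration, not the GCC code: the `model_` names, the `vect_cost_kind` enum and the `patched` flag are invented here, and only the cost values (2 for each register move, NUNITS versus NUNITS - 1 for vec_construct) are taken from the diff below.

```c
#include <stdbool.h>

/* Simplified stand-ins for the statement kinds the patch touches.  */
enum vect_cost_kind { SCALAR_TO_VEC, VEC_TO_SCALAR, OTHER_STMT };

struct model_regmove_cost
{
  int gr2vr; /* general register to vector register */
  int fr2vr; /* FP register to vector register */
  int vr2gr; /* vector register to general register (new) */
  int vr2fr; /* vector register to FP register (new) */
};

/* Every register move costs 2, as in the patch.  */
static const struct model_regmove_cost model_cost = { 2, 2, 2, 2 };

/* Add the register-move cost for statements that cross between the
   scalar and vector register files.  */
static int
model_adjust_stmt_cost (enum vect_cost_kind kind, bool is_float,
                        int stmt_cost)
{
  switch (kind)
    {
    case SCALAR_TO_VEC:
      return stmt_cost + (is_float ? model_cost.fr2vr : model_cost.gr2vr);
    case VEC_TO_SCALAR:
      return stmt_cost + (is_float ? model_cost.vr2fr : model_cost.vr2gr);
    default:
      return stmt_cost;
    }
}

/* vec_construct cost before (NUNITS - 1) and after (NUNITS) the patch.  */
static int
model_vec_construct_cost (int nunits, bool patched)
{
  return patched ? nunits : nunits - 1;
}
```

In this model a vec_to_scalar statement with a base cost of 1 now costs 3, and constructing a 4-element vector costs 4 instead of 3, which makes the vectorizer less eager to pick such sequences.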
 
PR target/113247

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct regmove_vector_cost): Add vector 
to scalar regmove.
* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Ditto.
* config/riscv/riscv.cc (riscv_builtin_vectorization_cost): Adjust 
vec_construct cost.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/reduc-19.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/reduc-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/reduc-21.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113247-4.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-vector-costs.cc|   3 +
 gcc/config/riscv/riscv.cc |   4 +-
 .../vect/costmodel/riscv/rvv/pr113247-1.c | 195 ++
 .../vect/costmodel/riscv/rvv/pr113247-2.c |   6 +
 .../vect/costmodel/riscv/rvv/pr113247-3.c |   6 +
 .../vect/costmodel/riscv/rvv/pr113247-4.c |   6 +
 .../riscv/rvv/autovec/vls/reduc-19.c  |  11 +-
 .../riscv/rvv/autovec/vls/reduc-20.c  |  11 +-
 .../riscv/rvv/autovec/vls/reduc-21.c  |  11 +-
 10 files changed, 251 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-4.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4f3b677f4f9..21f6dadf113 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -255,6 +255,8 @@ struct regmove_vector_cost
 {
   const int GR2VR;
   const int FR2VR;
+  const int VR2GR;
+  const int VR2FR;
 };
 
 /* Cost for vector insn classes.  */
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 8adf5700890..298702d2807 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1069,6 +1069,9 @@ adjust_stmt_cost (enum vect_cost_for_stmt kind, tree 
vectype, int stmt_cost)
 case scalar_to_vec:
   return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->FR2VR
  : costs->regmove->GR2VR);
+case vec_to_scalar:
+  return stmt_cost += (FLOAT_TYPE_P (vectype) ? costs->regmove->VR2FR
+ : costs->regmove->VR2GR);
 default:
   break;
 }
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ee1a57b321d..568db90a27d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -395,6 +395,8 @@ static const scalable_vector_cost rvv_vla_vector_cost = {
 static const regmove_vector_cost rvv_regmove_vector_cost = {
   2, /* GR2VR  */
   2, /* FR2VR  */
+  2, /* VR2GR  */
+  2, /* VR2FR  */
 };
 
 /* Generic costs for vector insn classes.  It is supposed to be the vector cost
@@ -10522,7 +10524,7 @@ riscv_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
   return fp ? common_costs->fp_stmt_cost : common_costs->int_stmt_cost;
 
 case vec_construct:
-  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype)) - 1;
+  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
 
 default:
   gcc_unreachable ();
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
new file mode 100644
index 000..0d09a624a00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113247-1.c
@@ -0,0 +1,195 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize 
--param=riscv-autovec-lmul=dynamic" } */
+
+#include 
+
+#define Ch(x,y,z)   (z ^ (x & (y ^ z)))
+#define Maj(x,y,z)  ((x & y) | (z & (x | y)))
+
+#define SHR(x, n)(x >> n)
+#define ROTR(x,n)(SHR(x,n) | (x << (32 - n)))
+#define S1(x)(ROTR(x, 6) ^ ROTR(x,11) ^ ROTR(x,25))
+#define S0(x)(ROTR(x, 2) ^ ROTR(x,13) ^ ROTR(x,22))
+
+#define s1(x)(ROTR(x,17) ^ ROTR(x,19) ^  SHR(x,10))
+#define s0(x)(ROTR(x, 7) ^ ROTR(x,18) ^  SHR(x, 3))
+
+#define 

Re: [PATCH v2] LoongArch: testsuite:Added additional vectorization "-mlsx" option.

2024-01-14 Thread chenxiaolong
At 21:13 +0800 on Saturday, 2024-01-13, Xi Ruoyao wrote:
> At 15:28 +0800 on Saturday 2024-01-13, chenxiaolong wrote:
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/pr104992.c: Added additional "-mlsx" compilation
> > options.
> > * gcc.dg/signbit-2.c: Ditto.
> > * gcc.dg/tree-ssa/scev-16.c: Ditto.
> > * gfortran.dg/graphite/vect-pr40979.f90: Ditto.
> > * gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
> 
> I don't feel right about the changes to pr104992.c and scev-16.c
> because no other architectures add special options there.  Why are we
> so special?
> 
> > ---
> >  gcc/testsuite/gcc.dg/pr104992.c| 1 +
> >  gcc/testsuite/gcc.dg/signbit-2.c   | 1 +
> >  gcc/testsuite/gcc.dg/tree-ssa/scev-16.c| 1 +
> >  gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90| 1 +
> >  gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f | 1 +
> >  5 files changed, 5 insertions(+)
> > 
> > diff --git a/gcc/testsuite/gcc.dg/pr104992.c
> > b/gcc/testsuite/gcc.dg/pr104992.c
> > index 82f8c75559c..a77992fa491 100644
> > --- a/gcc/testsuite/gcc.dg/pr104992.c
> > +++ b/gcc/testsuite/gcc.dg/pr104992.c
> > @@ -1,6 +1,7 @@
> >  /* PR tree-optimization/104992 */
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -Wno-psabi -fdump-tree-optimized" } */
> > +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
> >  
> >  #define vector __attribute__((vector_size(4*sizeof(int))))
> >  
> > diff --git a/gcc/testsuite/gcc.dg/signbit-2.c
> > b/gcc/testsuite/gcc.dg/signbit-2.c
> > index 62bb4047d74..5511bb78149 100644
> > --- a/gcc/testsuite/gcc.dg/signbit-2.c
> > +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> > @@ -5,6 +5,7 @@
> >  /* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-
> > *-* x86_64-*-* } } } */
> >  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve }
> > } */
> >  /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok
> > } } */
> > +/* { dg-additional-options "-mlsx" { target loongarch_sx } } */
> >  /* { dg-skip-if "no fallback for MVE" { arm_mve } } */
> >  
> >  #include 
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > index 120f40c0b6c..06cfbbcfae5 100644
> > --- a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c
> > @@ -1,6 +1,7 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target vect_int } */
> >  /* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" }
> > */
> > +/* { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
> > */
> >  
> >  int A[1024 * 2];
> >  
> > diff --git a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> > b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> > index a42290948c4..6f2ad1166a4 100644
> > --- a/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> > +++ b/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
> > @@ -1,6 +1,7 @@
> >  ! { dg-do compile }
> >  ! { dg-require-effective-target vect_double }
> >  ! { dg-additional-options "-msse2" { target { { i?86-*-* x86_64-*-
> > * } && ilp32 } } }
> > +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
> >  
> >  module mqc_m
> >  integer, parameter, private :: longreal =
> > selected_real_kind(15,90)
> > diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
> > b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> > index 08965cc5e20..97b88821731 100644
> > --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> > +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
> > @@ -2,6 +2,7 @@
> >  ! { dg-require-effective-target vect_double }
> >  ! { dg-options "-O3 --param vect-max-peeling-for-alignment=0
> > -fpredictive-commoning -fdump-tree-pcom-details -std=legacy" }
> >  ! { dg-additional-options "-mprefer-avx128" { target { i?86-*-*
> > x86_64-*-* } } }
> > +! { dg-additional-options "-mlsx" { target { loongarch*-*-* } } }
> >  ! { dg-additional-options "-mzarch" { target { s390*-*-* } } }
> >  
> >  *** RESID COMPUTES THE RESIDUAL:  R = V - AU

Because on LoongArch, GCC needs an explicit vectorization option in order
to generate vector code. The check_effective_target_vect_cmdline_needed
procedure in lib/target-supports.exp records whether a command-line
option is needed to enable vectorization. On some architectures, for
example ia64, x86, aarch64 and riscv, vectorization is enabled by
default.
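
For readers unfamiliar with the tests under discussion, here is a minimal C sketch of the GCC generic vector extension that pr104992.c relies on through its `vector` macro. The names `v4si`, `add4` and `lane_sum` are illustrative and do not come from the testsuite. Whether the addition is emitted as SIMD instructions or lowered to scalar code depends on the target and on options such as "-mlsx", which is the point being debated in this thread.

```c
#include <assert.h>

/* Four ints per vector, the same shape as the `vector` macro in
   gcc.dg/pr104992.c.  The typedef name is illustrative.  */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Element-wise addition written with the generic vector extension.
   The compiler picks SIMD or scalar code depending on the enabled ISA.  */
v4si add4(v4si a, v4si b)
{
    return a + b;
}

/* Sum the four lanes using ordinary subscripting on the vector.  */
int lane_sum(v4si v)
{
    assert(sizeof(v4si) == 4 * sizeof(int));
    return v[0] + v[1] + v[2] + v[3];
}
```

Compiling this with and without a vector option and comparing the assembly is one way to see why a test that scans for vectorized output can pass on one target and fail on another.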



[PATCH] RISC-V: Adjust loop len by costing 1 when NITER < VF

2024-01-14 Thread Juzhe-Zhong
Update in v2: Add dynamic lmul test.

This patch fixes the regression between GCC 13.2.0 and trunk GCC (GCC-14)

GCC 13.2.0:

lui a5,%hi(a)
li  a4,19
sb  a4,%lo(a)(a5)
li  a0,0
ret

Trunk GCC:

vsetvli a5,zero,e8,mf2,ta,ma
li  a4,-32768
vid.v   v1
vsetvli zero,zero,e16,m1,ta,ma
addiw   a4,a4,104
vmv.v.i v3,15
lui a1,%hi(a)
li  a0,19
vsetvli zero,zero,e8,mf2,ta,ma
vadd.vi v1,v1,1
sb  a0,%lo(a)(a1)
vsetvli zero,zero,e16,m1,ta,ma
vzext.vf2   v2,v1
vmv.v.x v1,a4
vminu.vv v2,v2,v3
vsrl.vv v1,v1,v2
vslidedown.vi   v1,v1,17
vmv.x.s a0,v1
snez a0,a0
ret

The root cause is that we vectorize the code inefficiently because we don't
cost the loop len when NITERS < VF.
Following the loop-control costing of mask targets and rs6000 fixes the regression.

Tested with no regressions. OK for trunk?
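
The accounting that this patch adds can be modeled outside GCC. The sketch below is a standalone approximation, not GCC code: `struct rgroup_model` and `count_len_body_stmts` are hypothetical names, and the only assumption carried over from the patch is that a length control stored at index i stands for i + 1 vectors and is counted only when it is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's rgroup_controls.  Only the one property
   the patch inspects is modeled.  */
struct rgroup_model {
    int has_type;   /* nonzero when the control is in use */
};

/* Count the scalar statements charged to the loop body: a control stored
   at index i covers i + 1 vectors, so it contributes i + 1 statements.  */
unsigned int count_len_body_stmts(const struct rgroup_model *lens, size_t n)
{
    assert(lens != NULL || n == 0);
    unsigned int body_stmts = 0;
    for (size_t i = 0; i < n; i++)
        if (lens[i].has_type)
            body_stmts += (unsigned int) i + 1;
    return body_stmts;
}
```

With controls in use at indices 0 and 2, the model charges 1 + 3 = 4 statements to the vector body, which is what makes the short loop in the PR look less attractive to vectorize.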

PR target/113281

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc 
(costs::adjust_vect_cost_per_loop): New function.
(costs::finish_cost): Adjust cost for LOOP LEN with NITERS < VF.
* config/riscv/riscv-vector-costs.h: New function.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/pr113281-5.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 57 +++
 gcc/config/riscv/riscv-vector-costs.h |  2 +
 .../vect/costmodel/riscv/rvv/pr113281-3.c | 18 ++
 .../vect/costmodel/riscv/rvv/pr113281-4.c | 18 ++
 .../vect/costmodel/riscv/rvv/pr113281-5.c | 18 ++
 5 files changed, 113 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-4.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-5.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index 1c3708f23a0..8adf5700890 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1110,9 +1110,66 @@ costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
   return record_stmt_cost (stmt_info, where, count * stmt_cost);
 }
 
+/* For some target specific vectorization cost which can't be handled per stmt,
+   we check the requisite conditions and adjust the vectorization cost
+   accordingly if satisfied.  One typical example is to model and adjust
+   loop_len cost for known_lt (NITERS, VF).  */
+
+void
+costs::adjust_vect_cost_per_loop (loop_vec_info loop_vinfo)
+{
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)
+  && !LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  /* In middle-end loop vectorizer, we don't count the loop_len cost in
+vect_estimate_min_profitable_iters when NITERS < VF, that is, we only
+count cost of len that we need to iterate loop more than once with VF
+(m_num_vector_iterations > 1).  It's correct for most of the cases:
+
+E.g. VF = [4, 4]
+  for (int i = 0; i < 3; i ++)
+a[i] += b[i];
+
+We don't need to cost MIN_EXPR or SELECT_VL for the case above.
+
+However, for some inefficient vectorized cases, it does use MIN_EXPR
+to generate len.
+
+E.g. VF = [256, 256]
+
+Loop body:
+  # loop_len_110 = PHI <18(2), _119(11)>
+  ...
+  _117 = MIN_EXPR ;
+  _118 = 18 - _117;
+  _119 = MIN_EXPR <_118, POLY_INT_CST [256, 256]>;
+  ...
+
+Epilogue:
+  ...
+  _112 = .VEC_EXTRACT (vect_patt_27.14_109, _111);
+
+We cost 1 unconditionally for this situation like other targets which
+apply mask as the loop control.  */
+  rgroup_controls *rgc;
+  unsigned int num_vectors_m1;
+  unsigned int body_stmts = 0;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_LENS (loop_vinfo), num_vectors_m1, rgc)
+   if (rgc->type)
+ body_stmts += num_vectors_m1 + 1;
+
+  add_stmt_cost (body_stmts, scalar_stmt, NULL, NULL, NULL_TREE, 0,
+vect_body);
+}
+}
+
 void
 costs::finish_cost (const vector_costs *scalar_costs)
 {
+  if (loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo))
+{
+  adjust_vect_cost_per_loop (loop_vinfo);
+}
   vector_costs::finish_cost (scalar_costs);
 }
 
diff --git a/gcc/config/riscv/riscv-vector-costs.h 
b/gcc/config/riscv/riscv-vector-costs.h
index 9bf041bb65c..3defd45fd4c 100644
--- a/gcc/config/riscv/riscv-vector-costs.h
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -101,6 +101,8 @@ private:
  V_REGS spills according to the analysis.  */
   bool m_has_unexpected_spills_p = false;
   void record_potential_unexpected_spills (loop_vec_info);
+
+  

Re: [PATCH] libsupc++: Fix UB terminating on foreign exception

2024-01-14 Thread Julia DeMille

On 2024-01-14 01:52, Jonathan Wakely wrote:
> The reason for this is that the ChangeLog files are auto-generated from
> the git commit messages, not edited by hand. Patches to those files
> rarely apply cleanly anyway, because they change so frequently that
> patches are stale almost immediately.

Makes sense. I'm new to the GCC mailing lists, so that one was
unfamiliar to me.


> That would be great thanks. If not obvious, easy instructions for
> building the test would be helpful for Rust newbs such as myself!


I've actually managed to come up with a much more concise Objective-C 
demonstration. I've uploaded it at:

https://codeberg.org/jdemille/gcc-exception-ub-demo

I'm unsure if my patch actually fixes it with this demo -- I need to 
work out how to use a patched GCC without installing it on my system, 
but without it breaking from not having things it expects to exist on 
the system.


I'm also going to go make sure that the Objective-C unwind personality 
is unique, otherwise we could have trouble.


--
Thanks,
Julia DeMille
she/her



[PING][PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-01-14 Thread xndcn
Ping. Thanks.

xndcn wrote on Monday, January 8, 2024 at 09:01:

> Hi, I found that the __atomic_float constructor does not clear padding,
> while __compare_exchange assumes the padding is zeroed. So it is easy to
> reproduce an infinite loop on x86-64 with the long double type, like:
> ---
> -O0 -std=c++23 -mlong-double-80
> #include <atomic>
> #include <cstdio>
>
> #define T long double
> int main() {
> std::atomic<T> t(0.5);
> t.fetch_add(0.5);
> float x = t;
> printf("%f\n", x);
> }
> ---
>
> So we should add __builtin_clear_padding in __atomic_float constructor,
> just like the generic atomic struct.
>
> regtested on x86_64-linux. Is it OK for trunk?
>
> ---
> libstdc++: atomic: Add missing clear_padding in __atomic_float constructor.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_base.h: add __builtin_clear_padding in
> __atomic_float constructor.
> ---
>  libstdc++-v3/include/bits/atomic_base.h | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_base.h
> b/libstdc++-v3/include/bits/atomic_base.h
> index f4ce0fa53..d59c2209e 100644
> --- a/libstdc++-v3/include/bits/atomic_base.h
> +++ b/libstdc++-v3/include/bits/atomic_base.h
> @@ -1283,7 +1283,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>constexpr
>__atomic_float(_Fp __t) : _M_fp(__t)
> -  { }
> +  {
> +#if __has_builtin(__builtin_clear_padding)
> + if _GLIBCXX17_CONSTEXPR (__atomic_impl::__maybe_has_padding<_Fp>())
> +  __builtin_clear_padding(std::__addressof(_M_fp));
> +#endif
> +  }
>
>__atomic_float(const __atomic_float&) = delete;
>__atomic_float& operator=(const __atomic_float&) = delete;
> --
> 2.25.1
>
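
The failure mode described in the quoted report can be modeled with plain bytes. The sketch below is a simplified, non-atomic model and not the libstdc++ implementation: a 16-byte slot holds 10 value bytes followed by 6 padding bytes (the x86-64 layout of an 80-bit long double), and the compare step looks at all 16 bytes. All function names and the retry limit are made up for illustration.

```c
#include <assert.h>
#include <string.h>

enum { VALUE_BYTES = 10, SLOT_BYTES = 16 };

/* Build a slot image from a value, with the padding bytes zeroed.  */
void make_slot(unsigned char *slot, const unsigned char *value)
{
    assert(slot != NULL && value != NULL);
    memset(slot, 0, SLOT_BYTES);
    memcpy(slot, value, VALUE_BYTES);
}

/* Byte-wise compare-and-swap over the whole slot, padding included.  */
int cas_bytes(unsigned char *obj, const unsigned char *expected,
              const unsigned char *desired)
{
    if (memcmp(obj, expected, SLOT_BYTES) != 0)
        return 0;
    memcpy(obj, desired, SLOT_BYTES);
    return 1;
}

/* Try to replace old_value with new_value.  Returns the number of tries
   used, or -1 if the swap never succeeded within max_tries.  */
int update_with_retries(unsigned char *obj, const unsigned char *old_value,
                        const unsigned char *new_value, int max_tries)
{
    unsigned char expected[SLOT_BYTES], desired[SLOT_BYTES];
    make_slot(expected, old_value);   /* expected always has zero padding */
    make_slot(desired, new_value);
    for (int i = 1; i <= max_tries; i++)
        if (cas_bytes(obj, expected, desired))
            return i;
    return -1;
}
```

If the object was initialized without clearing its padding, the comparison can never match an expected image whose padding is zero, so the retry loop never terminates in the real code; initializing the object through `make_slot` makes the first attempt succeed.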


[Patch] libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy* (was: [Patch] libgomp.texi: Document omp_pause_resource{,_all})

2024-01-14 Thread Tobias Burnus

Hi Sandra, hi all,

Sandra Loosemore:

> On 1/14/24 07:26, Tobias Burnus wrote:
> I have some minor nits about typos and copy-editing.


Thanks. That's the downside of doing editing while being sleepy on a 
train. Updated and extended version (documenting also omp_target_memcpy) 
is attached. Warning: Still mostly done during a train ride.



> I assume the formatting of the interface syntax
> is consistent with how it's done elsewhere in the manual.


It should be consistent, but I think it eventually needs some cleanup, as
the indentation does not work for the continuation lines.


And I think some other improvements are needed, but that's a slow 
step-by-step process.


* * *

> Re the content, I see no documentation for omp_pause_resource_t or the
> equivalent in Fortran, or any hint about what the kind argument is for.


There is actually some Fortran documentation at 
https://gcc.gnu.org/onlinedocs/gfortran/OpenMP-Modules-OMP_005fLIB-and-OMP_005fLIB_005fKINDS.html 
(gcc/fortran/intrinsic.texi).


But I concur that moving it to libgomp.texi and adding a C version makes 
sense; see also PR110364 under "BTW" (2nd paragraph).


> If it's to explain implementation-specific
> features, then it should at least be documenting whether GCC supports
> additional pause kinds as permitted by the spec.


It doesn't - and it lacks the OpenMP 6.0 addition (post-TR12) as well.

Tobias

libgomp.texi: Document omp_pause_resource{,_all} and omp_target_memcpy*

libgomp/ChangeLog:

	* libgomp.texi (Runtime Library Routines): Document
	omp_pause_resource, omp_pause_resource_all and
	omp_target_memcpy{,_rect}{,_async}.

 libgomp/libgomp.texi | 329 ---
 1 file changed, 314 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 74d4ef34c43..d3adfd48545 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -561,7 +561,7 @@ specification in version 5.2.
 * Thread Affinity Routines::
 * Teams Region Routines::
 * Tasking Routines::
-@c * Resource Relinquishing Routines::
+* Resource Relinquishing Routines::
 * Device Information Routines::
 * Device Memory Routines::
 * Lock Routines::
@@ -1504,16 +1504,78 @@ and @code{false} represent their language-specific counterparts.
 
 
 
-@c @node Resource Relinquishing Routines
-@c @section Resource Relinquishing Routines
-@c
-@c Routines releasing resources used by the OpenMP runtime.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_pause_resource:: 
-@c * omp_pause_resource_all:: 
-@c @end menu
+@node Resource Relinquishing Routines
+@section Resource Relinquishing Routines
+
+Routines releasing resources used by the OpenMP runtime.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_pause_resource:: Release OpenMP resources on a device
+* omp_pause_resource_all:: Release OpenMP resources on all devices
+@end menu
+
+
+
+@node omp_pause_resource
+@subsection @code{omp_pause_resource} -- Release OpenMP resources on a device
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on and for the
+device specified by @var{device_num}; on success, zero is returned and non-zero
+otherwise.
+
+The value of @var{device_num} must be a conforming device number.  The routine
+may not be called from within any explicit region, and all explicit threads that
+do not bind to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@item   @tab @code{integer device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
+@end table
+
+
+
+@node omp_pause_resource_all
+@subsection @code{omp_pause_resource_all} -- Release OpenMP resources on all devices
+@table @asis
+@item @emph{Description}:
+Free resources used by the OpenMP program and the runtime library on all devices,
+including the host. On success, zero is returned and non-zero otherwise.
+
+The routine may not be called from within any explicit region, and all explicit
+threads that do not bind to the implicit parallel region must have finalized execution.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource_all(omp_pause_resource_t kind);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource_all(kind)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@end multitable
+
+@item @emph{See also}:

[committed] Disable tests for strdup/strndup on __hpux__ in various builtin-object-size tests

2024-01-14 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Disable tests for strdup/strndup on __hpux__

hppa*-*-hpux* doesn't have strdup or strndup.
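
The guard used in this commit can be shown in isolation. The sketch below is illustrative only: `HAVE_STRDUP_TESTS` and `dup_length` are hypothetical names that do not appear in the testsuite, and the `_POSIX_C_SOURCE` definition is an assumption made so that `strdup` is declared on POSIX hosts under strict language modes.

```c
#define _POSIX_C_SOURCE 200809L  /* request strdup's declaration on POSIX hosts */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same condition as the patched tests: skip the strdup path on AVR and HP-UX.  */
#if !defined(__AVR__) && !defined(__hpux__)
# define HAVE_STRDUP_TESTS 1
#else
# define HAVE_STRDUP_TESTS 0
#endif

/* Length of a duplicated copy of s, or -1 when the guard disabled the
   strdup path or the allocation failed.  */
long dup_length(const char *s)
{
    assert(s != NULL);
#if HAVE_STRDUP_TESTS
    char *copy = strdup(s);
    if (copy == NULL)
        return -1;
    long len = (long) strlen(copy);
    free(copy);
    return len;
#else
    return -1;
#endif
}
```

Guarding both the definition and the call site, as the patch does for test11 and test9, keeps the file compiling on targets where the function does not exist at all.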

2024-01-14  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-1.c: Disable tests for strdup/strndup
on __hpux__.
* gcc.dg/builtin-object-size-2.c: Likewise.
* gcc.dg/builtin-object-size-3.c: Likewise.
* gcc.dg/builtin-object-size-4.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
index 64c4bc4da39..4f7d4c0b370 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-1.c
@@ -621,7 +621,7 @@ test10 (void)
 }
 }
 
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -726,7 +726,7 @@ main (void)
   test8 ();
   test9 (1);
   test10 ();
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
   test11 ();
 #endif
   DONE ();
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-2.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-2.c
index da10b6b0632..37d3dcc6f56 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-2.c
@@ -536,7 +536,7 @@ test8 (unsigned cond)
 #endif
 }
 
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -639,7 +639,7 @@ main (void)
   test6 ();
   test7 ();
   test8 (1);
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
   test9 ();
 #endif
   DONE ();
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-3.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
index f23873bec38..f4d1ebf7027 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-3.c
@@ -628,7 +628,7 @@ test10 (void)
 }
 }
 
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -734,7 +734,7 @@ main (void)
   test8 ();
   test9 (1);
   test10 ();
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
   test11 ();
 #endif
   DONE ();
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
index dcb042f34b6..2887dd15042 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
@@ -509,7 +509,7 @@ test8 (unsigned cond)
 #endif
 }
 
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
 /* Tests for strdup/strndup.  */
 size_t
 __attribute__ ((noinline))
@@ -612,7 +612,7 @@ main (void)
   test6 ();
   test7 ();
   test8 (1);
-#ifndef __AVR__ /* avr has no strndup */
+#if !defined(__AVR__) && !defined(__hpux__) /* avr and hpux have no strndup */
   test9 ();
 #endif
   DONE ();




[committed] Skip several gcc.dg/builtin-dynamic-object-size tests on hppa*-*-hpux*

2024-01-14 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Skip several gcc.dg/builtin-dynamic-object-size tests on hppa*-*-hpux*

hppa*-*-hpux* doesn't have strdup or strndup.

2024-01-14  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c: Skip on hppa*-*-hpux*.
* gcc.dg/builtin-dynamic-object-size-1.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index c3ac6230d4d..173e7c755f4 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
 /* { dg-require-effective-target size20plus } */
+/* { dg-skip-if "no strndup" { hppa*-*-hpux* } } */
 
 #include "builtin-object-size-common.h"
 
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
index 8f17c8edcaf..ffa59985024 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-stringop-overread" } */
 /* { dg-require-effective-target alloca } */
+/* { dg-skip-if "no strndup" { hppa*-*-hpux* } } */
 
 #define __builtin_object_size __builtin_dynamic_object_size
 #include "builtin-object-size-1.c"
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
index 3677782ff1c..fff32da7aea 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-stringop-overread" } */
 /* { dg-require-effective-target alloca } */
+/* { dg-skip-if "no strndup" { hppa*-*-hpux* } } */
 
 #define __builtin_object_size __builtin_dynamic_object_size
 #include "builtin-object-size-2.c"
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
index 5b6987b7773..ac223d67b10 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-3.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-stringop-overread" } */
 /* { dg-require-effective-target alloca } */
+/* { dg-skip-if "no strndup" { hppa*-*-hpux* } } */
 
 #define __builtin_object_size __builtin_dynamic_object_size
 #include "builtin-object-size-3.c"
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
index 9d796224e96..fdf4284ae11 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-4.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wno-stringop-overread" } */
 /* { dg-require-effective-target alloca } */
+/* { dg-skip-if "no strndup" { hppa*-*-hpux* } } */
 
 #define __builtin_object_size __builtin_dynamic_object_size
 #include "builtin-object-size-4.c"




[PATCH/RFC] Add --with-dwarf4 configure option.

2024-01-14 Thread Roger Sayle

This patch fixes three of the four unexpected failures that I'm seeing
in the gcc testsuite on x86_64-pc-linux-gnu.  The three FAILs are:
FAIL: gcc.c-torture/execute/fprintf-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/printf-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/user-printf.c   -O3 -g  (test for excess errors)

and are caused by the linker/toolchain (GNU ld 2.27 on RedHat 7) issuing
a link-time warning:
/usr/bin/ld: Dwarf Error: found dwarf version '5', this reader only handles
version 2, 3 and 4 information.

This also explains why these c-torture tests only fail with -g.

One solution might be to tweak/improve GCC's testsuite to ignore
these warnings.  However, ideally it should also be possible to
configure gcc not to generate dwarf5 debugging information on
systems that don't/can't support it.  This patch supplements the
current --with-dwarf2 configure option with the addition of a
--with-dwarf4 option that adds a tm-dwarf4.h to $tm_file (using
the same mechanism as --with-dwarf2) that changes/redefines
DWARF_VERSION_DEFAULT to 4 (overriding the current default of 5).

This patch has been tested on x86_64-pc-linux-gnu, with a full
make bootstrap, both with and without --with-dwarf4.  This
fixes the three failures above, and causes no new failures outside
of the gcc.dg/guality directory.  Unfortunately, the guality
testsuite contains a large number of tests that assume support
for dwarf5 and don't (yet) check check_effective_target_dwarf5.
Hopefully, adding --with-dwarf4 will help improve/test the
portability of the guality testsuite.

Ok for mainline?  An alternative implementation might be to
allow integer values for $with_dwarf such that --with-dwarf5,
--with-dwarf3 etc. do the right thing.  In fact, I'd originally
misread the documentation and assumed --with-dwarf4 was already
supported.
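
The override mechanism described above can be sketched in a single translation unit. This is an illustration under an assumption: it supposes that the compiler's built-in default is supplied by a fallback guarded with #ifndef that is seen after the tm_file headers. The function `default_dwarf_version` is hypothetical and exists only to make the result observable.

```c
#include <assert.h>

/* Part 1: what a tm-dwarf4.h style header contributes when it is listed
   in tm_file.  */
#undef DWARF_VERSION_DEFAULT
#define DWARF_VERSION_DEFAULT 4

/* Part 2: a fallback default guarded by #ifndef (assumed shape).  It only
   takes effect when no earlier header defined the macro.  */
#ifndef DWARF_VERSION_DEFAULT
#define DWARF_VERSION_DEFAULT 5
#endif

/* Hypothetical accessor showing which definition won.  */
int default_dwarf_version(void)
{
    assert(DWARF_VERSION_DEFAULT >= 2 && DWARF_VERSION_DEFAULT <= 5);
    return DWARF_VERSION_DEFAULT;
}
```

Removing part 1 makes the fallback apply, which corresponds to configuring without the new option.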


2024-01-14  Roger Sayle  

gcc/ChangeLog
* configure.ac: Add a --with-dwarf4 option.
* configure: Regenerate.
* config/tm-dwarf4.h: New target file to define
DWARF_VERSION_DEFAULT to 4.


Thanks in advance,
Roger
--

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 596e5f2..2ce9093 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1036,6 +1036,11 @@ AC_ARG_WITH(dwarf2,
 dwarf2="$with_dwarf2",
 dwarf2=no)
 
+AC_ARG_WITH(dwarf4,
+[AS_HELP_STRING([--with-dwarf4], [force the default debug format to be DWARF 
4])],
+dwarf4="$with_dwarf4",
+dwarf4=no)
+
 AC_ARG_ENABLE(shared,
 [AS_HELP_STRING([--disable-shared], [don't provide a shared libgcc])],
 [
@@ -1916,6 +1921,10 @@ if test x"$dwarf2" = xyes
 then tm_file="$tm_file tm-dwarf2.h"
 fi
 
+if test x"$dwarf4" = xyes
+then tm_file="$tm_file tm-dwarf4.h"
+fi
+
 # Say what files are being used for the output code and MD file.
 echo "Using \`$srcdir/config/$out_file' for machine-specific logic."
 echo "Using \`$srcdir/config/$md_file' as machine description file."
diff --git a/gcc/config/tm-dwarf4.h b/gcc/config/tm-dwarf4.h
new file mode 100644
index 000..9557b40
--- /dev/null
+++ b/gcc/config/tm-dwarf4.h
@@ -0,0 +1,3 @@
+/* Make Dwarf4 debugging info the default */
+#undef  DWARF_VERSION_DEFAULT
+#define  DWARF_VERSION_DEFAULT 4


[committed] Fix dg-warning on hppa*64*-*-*

2024-01-14 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Fix dg-warning on hppa*64*-*-*

2024-01-14  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.dg/Wattributes-6.c: Fix dg-warning on hppa*64*-*-*.

diff --git a/gcc/testsuite/gcc.dg/Wattributes-6.c 
b/gcc/testsuite/gcc.dg/Wattributes-6.c
index 978f3f938e9..49a085def9e 100644
--- a/gcc/testsuite/gcc.dg/Wattributes-6.c
+++ b/gcc/testsuite/gcc.dg/Wattributes-6.c
@@ -408,7 +408,7 @@ finline_hot_noret_align (int);  /* { dg-warning "ignoring 
attribute .warn_unused
/* { dg-note"previous declaration here" "" 
{ target *-*-* } .-1 } */
 
 inline int ATTR ((aligned (4)))
-  finline_hot_noret_align (int);  /* { dg-warning "ignoring attribute .aligned 
\\(4\\). because it conflicts with attribute .aligned \\(8\\)." "" { target { ! 
{ hppa*64*-*-* } } } } */
+  finline_hot_noret_align (int);  /* { dg-warning "ignoring attribute .aligned 
\\(4\\). because it conflicts with attribute .aligned \\(8\\)." "" } */
 
 inline int ATTR ((aligned (8)))
 finline_hot_noret_align (int);  /* { dg-note   "previous declaration here" } */




[patch,wwwdocs,avr,applied] Add AVR news for v14.

2024-01-14 Thread Georg-Johann Lay

https://gcc.gnu.org/gcc-14/changes.html#avr

Johann

--

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 9c9dfa44..8c738683 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -342,7 +342,55 @@ a work-in-progress.
   
 

-
+AVR
+
+  On AVR64* and AVR128* devices, read-only data is now located in program
+memory per default and no longer in RAM.
+
+  Only a 32 KiB block of program memory can be used to store
+   .rodata in that way. Which block is used can be selected by
+   defining symbol __flmap.
+   As an alternative, the byte address of the block can be specified
+   by symbol __RODATA_FLASH_START__ which takes
+   precedence over __flmap.
+  The default uses the last 32 KiB block, which is also the
+   hardware default for bit field NVMCTRL_CTRLB.FLMAP.
+  When a block other than the last 32 KiB block is used to store
+   .rodata, then NVMCTRL_CTRLB.FLMAP
+   must be initialized accordingly by hand, or a version of
+   AVR-LibC that implements <a href="https://github.com/avrdudes/avr-libc/issues/931">#931</a>
+   must be used. The latter initializes NVMCTRL_CTRLB.FLMAP
+   in the startup code and according to the value
+   of __flmap resp.
+   __RODATA_FLASH_START__.
+  When AVR-LibC with #931 is used, then defining symbol
+   __flmap_lock to a non-zero value will set bit
+   NVMCTRL_CTRLB.FLMAPLOCK. This will protect
+   NVMCTRL_CTRLB.FLMAP from any further changes
+   which would be Undefined Behaviour in C/C++.
+  In order to return to the old placement of read-only data in RAM,
+   the new compiler option -mrodata-in-ram can be used.
+  Read-only data is located in output section .rodata,
+   whereas it is part of .text when located in RAM.
+  The feature is only available when the compiler is configured
+   with a version of Binutils that implements
+   <a href="https://sourceware.org/PR31124">PR31124</a>, which is the
+   case for Binutils v2.42 and up.
+
+  
+  A new compiler option -m[no]-rodata-in-ram has been added.
+The default is to locate read-only data in program memory for devices that
+support it, e.g. for AVR64* and AVR128* devices as explained above,
+and for devices from the
+<a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#avrxmega3">avrxmega3</a>
+and
+<a href="https://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#avrtiny">avrtiny</a> families.
+  
+  The new built-in macro __AVR_RODATA_IN_RAM__ is supported
+on all devices. It's defined to 0 or 1.
+  
+

 IA-32/x86-64
 


[committed] Skip several analyzer socket tests on hppa*-*-hpux*

2024-01-14 Thread John David Anglin
Tested on hppa64-hp-hpux11.11.  Committed to trunk.

Dave
---

Skip several analyzer socket tests on hppa*-*-hpux*

2024-01-14  John David Anglin  

gcc/testsuite/ChangeLog:

PR analyzer/113150
* c-c++-common/analyzer/fd-glibc-byte-stream-socket.c: Skip
on hppa*-*-hpux*.
* c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c: Likewise.
* c-c++-common/analyzer/fd-mappage-getaddrinfo-server.c: Likewise.
* c-c++-common/analyzer/fd-symbolic-socket.c: Likewise.
* gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c: Likewise.

diff --git a/gcc/testsuite/c-c++-common/analyzer/fd-glibc-byte-stream-socket.c 
b/gcc/testsuite/c-c++-common/analyzer/fd-glibc-byte-stream-socket.c
index d9666f99edd..fab8426acb9 100644
--- a/gcc/testsuite/c-c++-common/analyzer/fd-glibc-byte-stream-socket.c
+++ b/gcc/testsuite/c-c++-common/analyzer/fd-glibc-byte-stream-socket.c
@@ -1,6 +1,6 @@
 /* Example from glibc manual (16.9.6).  */
 /* { dg-require-effective-target sockets } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "" { hppa*-*-hpux* powerpc*-*-aix* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c b/gcc/testsuite/c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c
index 16da9333074..21dfe977db8 100644
--- a/gcc/testsuite/c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c
+++ b/gcc/testsuite/c-c++-common/analyzer/fd-manpage-getaddrinfo-client.c
@@ -28,7 +28,7 @@ the source, must acknowledge the copyright and authors of this work.
 
 /* { dg-require-effective-target sockets } */
 /* { dg-additional-options "-Wno-analyzer-too-complex" } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "" { hppa*-*-hpux* powerpc*-*-aix* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/c-c++-common/analyzer/fd-mappage-getaddrinfo-server.c b/gcc/testsuite/c-c++-common/analyzer/fd-mappage-getaddrinfo-server.c
index c02ee6ff643..2e9cec4abf0 100644
--- a/gcc/testsuite/c-c++-common/analyzer/fd-mappage-getaddrinfo-server.c
+++ b/gcc/testsuite/c-c++-common/analyzer/fd-mappage-getaddrinfo-server.c
@@ -27,7 +27,7 @@ the source, must acknowledge the copyright and authors of this work.
 */
 
 /* { dg-require-effective-target sockets } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "" { hppa*-*-hpux* powerpc*-*-aix* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/c-c++-common/analyzer/fd-symbolic-socket.c b/gcc/testsuite/c-c++-common/analyzer/fd-symbolic-socket.c
index d7dc46a2d47..32264fd9701 100644
--- a/gcc/testsuite/c-c++-common/analyzer/fd-symbolic-socket.c
+++ b/gcc/testsuite/c-c++-common/analyzer/fd-symbolic-socket.c
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target sockets } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "" { hppa*-*-hpux* powerpc*-*-aix* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c b/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c
index d8b697d323e..fcbcc740170 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c
@@ -1,7 +1,7 @@
 /* Example from glibc manual (16.9.7).  */
 /* { dg-require-effective-target sockets } */
 /* { dg-additional-options "-Wno-analyzer-too-complex" } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-skip-if "" { hppa*-*-hpux* powerpc*-*-aix* } } */
 
 #include 
 #include 




Re: [Patch] libgomp.texi: Document omp_pause_resource{,_all}

2024-01-14 Thread Sandra Loosemore

On 1/14/24 07:26, Tobias Burnus wrote:
> This documents two more OpenMP (5.0) routines, omp_pause_resource and
> omp_pause_resource_all.
>
> Comments, remarks, suggestions - to the patch or the documentation in general?


I have some minor nits about typos and copy-editing.  I assume the formatting
of the interface syntax is consistent with how it's done elsewhere in the manual.


+@node Resource Relinquishing Routines
+@section Resource Relinquishing Routines
+
+Routines releasing resources used by the OpenMP runtime.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_pause_resource:: Release OpenMP ressouces on a device
+* omp_pause_resource_all:: Release OpenMP ressouces on all devices


s/ressouces/resources/g (there are more instances below)


+@end menu
+
+
+
+@node omp_pause_resource
+@subsection @code{omp_pause_resource} -- Release OpenMP ressouces on a device
+@table @asis
+@item @emph{Description}:
+Free resources used by OpenMP programm and runtime library on and for the


s/OpenMP programm/an OpenMP program/g (same mistake below)


+device specified by @var{device_num}; on success, zero is returned and non-zero
+otherwise.
+
+The value of @var{device_num} must be valid device number.  The effect when


s/valid device number/a valid device number/


+invoked from within a @code{target} region is unspecified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@item   @tab @code{integer device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
+@end table
+
+
+
+@node omp_pause_resource_all
+@subsection @code{omp_pause_resource_all} -- Release OpenMP ressouces on all devices
+@table @asis
+@item @emph{Description}:
+Free resources used by OpenMP programm and runtime library on all devices, including
+the host. On success, zero is returned and non-zero otherwise.
+
+The effect when invoked from within a @code{target} region is unspecified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_pause_resource}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.44.
+@end table
+
+


Re the content, I see no documentation for omp_pause_resource_t or the 
equivalent in Fortran, or any hint about what the kind argument is for.  I 
understand this is in the OpenMP spec but if you're going to make users read 
the spec anyway I wonder what the purpose of the GCC documentation is.  :-S  If 
it's to explain implementation-specific features, then it should at least be 
documenting whether GCC supports additional pause kinds as permitted by the spec.
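
To make the question about the kind argument concrete, here is a hedged sketch
of how the two routines might be called from C.  OpenMP 5.0 defines the pause
kinds omp_pause_soft and omp_pause_hard, and names the second routine
omp_pause_resource_all.  The helper names release_host_resources and
release_all_resources are invented for this example, and the definitions in
the #else branch are assumed stand-ins that exist only so the file also builds
without -fopenmp; they are not part of libgomp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed stand-ins so the sketch builds without -fopenmp; a real program
   gets these declarations from <omp.h>.  */
typedef enum { omp_pause_soft = 1, omp_pause_hard = 2 } omp_pause_resource_t;
static int omp_get_initial_device (void) { return 0; }
static int omp_pause_resource (omp_pause_resource_t kind, int device_num)
{ (void) kind; (void) device_num; return 0; }
static int omp_pause_resource_all (omp_pause_resource_t kind)
{ (void) kind; return 0; }
#endif

/* Hypothetical helper: ask the runtime to release what it holds for the
   host device.  Returns 1 if the routine reported success (zero).  */
static int
release_host_resources (omp_pause_resource_t kind)
{
  return omp_pause_resource (kind, omp_get_initial_device ()) == 0;
}

/* Hypothetical helper: the same for all devices including the host.
   Call it outside of any target region.  */
static int
release_all_resources (omp_pause_resource_t kind)
{
  return omp_pause_resource_all (kind) == 0;
}
```

A caller would typically pass omp_pause_soft when the program intends to use
OpenMP again soon, and omp_pause_hard when it wants to give back as much as
possible; which resources are actually released in each case is up to the
implementation.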


-Sandra



[PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-14 Thread Ajit Agarwal
Hello All:

This patch adds the vecload pass, which replaces adjacent lxv memory accesses
with lxvp instructions. The pass is added before the ira pass.

The vecload pass removes one of the defined adjacent lxv loads and replaces it
with lxvp. Because one of the defining loads is removed, the allocno has only
uses but no defs.

As a result, the IRA pass does not assign register pairs, i.e. registers in
sequence. Changes are made in the IRA register allocator to assign sequential
registers to adjacent loads.

Some of the registers are cleared and are not set as profitable registers
because a zero cost is greater than negative costs; checks are added to compare
positive costs.

LRA is changed so that it does not reassign them to different registers and
keeps the sequential register pairs intact.


contrib/check_GNU_style.sh run on patch looks good.

Bootstrapped and regtested for powerpc64-linux-gnu.

Spec2017 benchmarks are run and I get impressive benefits for some of the FP
benchmarks.

Thanks & Regards
Ajit


rs6000: New  pass for replacement of adjacent lxv with lxvp.

New pass to replace adjacent memory addresses lxv with lxvp.
This pass is registered before ira rtl pass.

2024-01-14  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: Registered vecload pass.
* config/rs6000/rs6000-vecload-opt.cc: Add new pass.
* config.gcc: Add new executable.
* config/rs6000/rs6000-protos.h: Add new prototype for vecload
pass.
* config/rs6000/rs6000.cc: Add new prototype for vecload pass.
* config/rs6000/t-rs6000: Add new rule.
* ira-color.cc: Form register pair with adjacent loads.
* lra-assigns.cc: Skip modifying register pair assignment.
* lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
* lra.cc: Initialize pseudo_conflict field.
* ira-build.cc: Use of REG_FREQ.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/vecload.C: New test.
* g++.target/powerpc/vecload1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   4 +-
 gcc/config/rs6000/rs6000-passes.def   |   4 +
 gcc/config/rs6000/rs6000-protos.h |   5 +-
 gcc/config/rs6000/rs6000-vecload-opt.cc   | 432 ++
 gcc/config/rs6000/rs6000.cc   |   8 +-
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/ira-color.cc  | 220 -
 gcc/lra-assigns.cc| 118 -
 gcc/lra-int.h |   2 +
 gcc/lra.cc|   1 +
 gcc/testsuite/g++.target/powerpc/vecload.C|  15 +
 gcc/testsuite/g++.target/powerpc/vecload1.C   |  22 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 13 files changed, 816 insertions(+), 24 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..4cf15e807de 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -518,7 +518,7 @@ or1k*-*-*)
;;
 powerpc*-*-*)
cpu_type=rs6000
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
@@ -555,7 +555,7 @@ riscv*)
;;
 rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
+   extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
index ca899d5f7af..8bd172dd779 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -29,6 +29,10 @@ along with GCC; see the file COPYING3.  If not see
  for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
 
+  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
+ instruction.  */
+  INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);
+
   /* Pass to do the PCREL_OPT optimization that combines the load of an
  external symbol's address along with a single load or store using that
  address as a base register.  */
diff --git a/gcc/config/rs6000/rs6000-protos.h 

Re: [patch, avr, ping #3] PR target/112944: Support .rodata in RAM for AVR64* and AVR128* devices

2024-01-14 Thread Jeff Law




On 1/14/24 06:05, Georg-Johann Lay wrote:

Ping #3

RFA: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640140.html
Ping #1 https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640981.html
Ping #2 https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641912.html

This is a patch that locates .rodata in flash for some AVR
devices that can support it.  All new functionality depends
on Binutils PR31124 and is switched on by configure checks
for the new emulations.

https://sourceware.org/PR31124 is already upstream.

For explanation of the gcc part see commit message below.

Most of the patch is adjusting device-specs generation.

If there are no objections, I would apply this in the
next week or so, so that it is part of v14.


Johann


--

AVR: Support .rodata in Flash for AVR64* and AVR128* Devices.

These devices see a 32 KiB block of their program memory (flash) in
the RAM address space.  This can be used to support .rodata in flash
provided Binutils support PR31124 (Add new emulations which locate
.rodata in flash).  This patch does the following:

* configure checks availability of Binutils PR31124.

* Add new command line options -mrodata-in-ram and -mflmap.
While -mflmap is for internal usage (communicate hardware properties
from device-specs to the compiler proper), -mrodata-in-ram is a user
space option that allows returning to the current rodata-in-ram layout.

* Adjust gen-avr-mmcu-specs.cc so that device-specs are generated
that sanity check options, and that translate -m[no-]rodata-in-ram
to its emulation.

* Objects in .rodata don't drag __do_copy_data.

* Document new options and built-in macros.

 PR target/112944

gcc/
 * configure.ac [target=avr]: Check availability of emulations
 avrxmega2_flmap and avrxmega4_flmap, resulting in new config vars
 HAVE_LD_AVR_AVRXMEGA2_FLMAP and HAVE_LD_AVR_AVRXMEGA4_FLMAP.
 * configure: Regenerate.
 * config.in: Regenerate.
 * doc/invoke.texi (AVR Options): Document -mflmap, -mrodata-in-ram,
 __AVR_HAVE_FLMAP__, __AVR_RODATA_IN_RAM__.
 * doc/avr-mmcu.texi: Regenerate.

 * gcc/config/avr/avr.opt (-mflmap, -mrodata-in-ram): New options.
 * config/avr/avr-arch.h (enum avr_device_specific_features):
 Add AVR_ISA_FLMAP.
 * config/avr/avr-mcus.def (AVR_MCU) [avr64*, avr128*]: Set isa flag
 AVR_ISA_FLMAP.
 * gcc/config/avr/avr.cc (avr_arch_index, avr_has_rodata_p): New vars.
 (avr_set_core_architecture): Set avr_arch_index.
 (have_avrxmega2_flmap, have_avrxmega4_flmap)
 (have_avrxmega3_rodata_in_flash): Set new static const bool according
 to configure results.
 (avr_rodata_in_flash_p): New function using them.
 (avr_asm_init_sections): Let readonly_data_section->unnamed.callback
 track avr_need_copy_data_p only if not avr_rodata_in_flash_p().
 (avr_asm_named_section): Track avr_has_rodata_p.
 (avr_file_end): Emit __do_copy_data also when avr_has_rodata_p
 and not avr_rodata_in_flash_p ().
 * config/avr/specs.h (CC1_SPEC): Add %(cc1_rodata_in_ram).
 (LINK_SPEC): Add %(link_rodata_in_ram).
 (LINK_ARCH_SPEC): Remove.
 * gcc/config/avr/gen-avr-mmcu-specs.cc (have_avrxmega3_rodata_in_flash)
 (have_avrxmega2_flmap, have_avrxmega4_flmap): Set new static
 const bool according to configure results.
 (diagnose_mrodata_in_ram): New function.
 (print_mcu): Generate specs with the following changes:
 <*cc1_misc, *asm_misc, *link_misc>: New specs so that we don't
 need to extend avr/specs.h each time we add a new bell or whistle.
 <*cc1_rodata_in_ram, *link_rodata_in_ram>: New specs to diagnose
 -m[no-]rodata-in-ram.
 <*cpp_rodata_in_ram>: New. Does -D__AVR_RODATA_IN_RAM__=0/1.
 <*cpp_mcu>: Add -D__AVR_AVR_FLMAP__ if it applies.
 <*cpp>: Add %(cpp_rodata_in_ram).
 <*link_arch>: Use emulation avrxmega2_flmap, avrxmega4_flmap as
 requested.
 <*self_spec>: Add -mflmap or %

I think you should go ahead and move forward with this patch.  As you 
know Denis isn't active anymore and you probably know the port better 
than anyone at this point.


I spot checked the patch and didn't see anything obviously wrong. The 
patch appears to self-configure based on binutils features, so there are no 
concerns in that space.  As you indicated, most of the changes are 
adjusting the specs.


jeff


[committed] Fix MIPS bootstrap

2024-01-14 Thread Jeff Law
mips bootstraps have been broken for a while.  They've been triggering 
an error about mutually exclusive equal-tests always being false when 
building gencondmd.


This was ultimately tracked down to the ior<mode>3_mips16_asmacro 
pattern.  The pattern uses the GPR mode iterator which looks like this:


(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])


The condition for the pattern looks like this:

  "ISA_HAS_MIPS16E2"

And if you dig into ISA_HAS_MIPS16E2:

/* The MIPS16e V2 instructions are available.  */
#define ISA_HAS_MIPS16E2   (TARGET_MIPS16 && TARGET_MIPS16E2 \
&& !TARGET_64BIT)


The way the mode iterator is handled is by adding its condition to the 
pattern's condition when we expand copies of the pattern resulting in 
this condition for one of the two generated patterns:


(TARGET_MIPS16 && TARGET_MIPS16E2 && !TARGET_64BIT) && TARGET_64BIT

This can never be true because of the TARGET_64BIT tests.
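
The contradiction can also be checked by brute force over the three flags.
The following is only an illustrative sketch in plain C: the int parameters
stand in for the real TARGET_* macros, and all function names are made up for
this example rather than taken from the MIPS backend.

```c
/* Stand-in for ISA_HAS_MIPS16E2; the parameters model the TARGET_* flags.  */
static int
isa_has_mips16e2 (int mips16, int mips16e2, int is64)
{
  return mips16 && mips16e2 && !is64;
}

/* Condition of the DI copy: pattern condition && iterator condition.  */
static int
di_copy_condition (int mips16, int mips16e2, int is64)
{
  return isa_has_mips16e2 (mips16, mips16e2, is64) && is64;
}

/* Condition of the SI copy: the iterator adds no condition for SI.  */
static int
si_copy_condition (int mips16, int mips16e2, int is64)
{
  return isa_has_mips16e2 (mips16, mips16e2, is64);
}

/* Count the flag combinations (out of 8) for which COND is true.  */
static int
count_satisfiable (int (*cond) (int, int, int))
{
  int n = 0;
  for (int bits = 0; bits < 8; bits++)
    n += cond (bits & 1, (bits >> 1) & 1, (bits >> 2) & 1) != 0;
  return n;
}
```

count_satisfiable (di_copy_condition) is 0 because the condition requires the
64-bit flag to be both clear and set, while count_satisfiable
(si_copy_condition) is 1, so only the SI copy of the pattern can ever match.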

The fix is trivial.  Don't use a mode iterator on that pattern.

Bootstrapped on mips64el.  I don't have any tests to compare against, so 
no regression test data.


Pushed to the trunk,
Jeff



commit e927cfa842c16bea902500e69ab4eca2ef15af4e
Author: Jeff Law 
Date:   Sun Jan 14 07:53:49 2024 -0700

[committed] Fix MIPS bootstrap

mips bootstraps have been broken for a while.  They've been triggering an error
about mutually exclusive equal-tests always being false when building
gencondmd.

This was ultimately tracked down to the ior<mode>3_mips16_asmacro pattern.  The
pattern uses the GPR mode iterator which looks like this:

(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])

The condition for the pattern looks like this:

  "ISA_HAS_MIPS16E2"

And if you dig into ISA_HAS_MIPS16E2:

/* The MIPS16e V2 instructions are available.  */
#define ISA_HAS_MIPS16E2   (TARGET_MIPS16 && TARGET_MIPS16E2 \
&& !TARGET_64BIT)

The way the mode iterator is handled is by adding its condition to the
pattern's condition when we expand copies of the pattern resulting in this
condition for one of the two generated patterns:

(TARGET_MIPS16 && TARGET_MIPS16E2 && !TARGET_64BIT) && TARGET_64BIT

This can never be true because of the TARGET_64BIT tests.

The fix is trivial.  Don't use a mode iterator on that pattern.

Bootstrapped on mips64el.  I don't have any tests to compare against, so no
regression test data.

gcc/
* config/mips/mips.md (ior<mode>3_mips16_asmacro): Use SImode,
not the GPR iterator.  Adjust pattern name and mode attribute
accordingly.

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 17dfcbd6722..b0fb5850a9e 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -3440,16 +3440,16 @@ (define_insn "*ior<mode>3"
(set_attr "compression" "micromips,*,*")
(set_attr "mode" "<MODE>")])
 
-(define_insn "*ior<mode>3_mips16_asmacro"
-  [(set (match_operand:GPR 0 "register_operand" "=d,d")
-   (ior:GPR (match_operand:GPR 1 "register_operand" "%0,0")
-(match_operand:GPR 2 "uns_arith_operand" "d,K")))]
+(define_insn "*iorsi3_mips16_asmacro"
+  [(set (match_operand:SI 0 "register_operand" "=d,d")
+   (ior:SI (match_operand:SI 1 "register_operand" "%0,0")
+   (match_operand:SI 2 "uns_arith_operand" "d,K")))]
   "ISA_HAS_MIPS16E2"
   "@
or\t%0,%2
ori\t%0,%x2"
[(set_attr "alu_type" "or")
-(set_attr "mode" "<MODE>")
+(set_attr "mode" "SI")
 (set_attr "extended_mips16" "*,yes")])
 
 (define_insn "*ior<mode>3_mips16"


[Patch] libgomp.texi: Document omp_pause_resource{,_all}

2024-01-14 Thread Tobias Burnus
This documents two more OpenMP (5.0) routines, omp_pause_resource and 
omp_pause_resource_all.


Comments, remarks, suggestions - to the patch or the documentation in 
general?


Tobias

PS: When looking at it, I found an issue in the spec with regards to a 
new constant (post TR12, hence, not added here) and the missing 
unspecified behavior when invoked from within a target region; that's 
now tracked as OpenMP spec issue #3793.


PPS: Still to be documented routines: omp_target_memcpy* and the 
places/affinity routines. (Plus OMPT, interop and TR11/TR12/... but 
those have to be implemented first.)

libgomp.texi: Document omp_pause_resource{,_all}

libgomp/ChangeLog:

	* libgomp.texi (Runtime Library Routines): Document
	omp_pause_resource and omp_pause_resource_all.

 libgomp/libgomp.texi | 82 +---
 1 file changed, 71 insertions(+), 11 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 74d4ef34c43..4946dfe2c84 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -561,7 +561,7 @@ specification in version 5.2.
 * Thread Affinity Routines::
 * Teams Region Routines::
 * Tasking Routines::
-@c * Resource Relinquishing Routines::
+* Resource Relinquishing Routines::
 * Device Information Routines::
 * Device Memory Routines::
 * Lock Routines::
@@ -1504,16 +1504,76 @@ and @code{false} represent their language-specific counterparts.
 
 
 
-@c @node Resource Relinquishing Routines
-@c @section Resource Relinquishing Routines
-@c
-@c Routines releasing resources used by the OpenMP runtime.
-@c They have C linkage and do not throw exceptions.
-@c
-@c @menu
-@c * omp_pause_resource:: 
-@c * omp_pause_resource_all:: 
-@c @end menu
+@node Resource Relinquishing Routines
+@section Resource Relinquishing Routines
+
+Routines releasing resources used by the OpenMP runtime.
+They have C linkage and do not throw exceptions.
+
+@menu
+* omp_pause_resource:: Release OpenMP ressouces on a device
+* omp_pause_resource_all:: Release OpenMP ressouces on all devices
+@end menu
+
+
+
+@node omp_pause_resource
+@subsection @code{omp_pause_resource} -- Release OpenMP ressouces on a device
+@table @asis
+@item @emph{Description}:
+Free resources used by OpenMP programm and runtime library on and for the
+device specified by @var{device_num}; on success, zero is returned and non-zero
+otherwise.
+
+The value of @var{device_num} must be valid device number.  The effect when
+invoked from within a @code{target} region is unspecified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@item   @tab @code{integer device_num}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
+@end table
+
+
+
+@node omp_pause_resource_all
+@subsection @code{omp_pause_resource_all} -- Release OpenMP ressouces on all devices
+@table @asis
+@item @emph{Description}:
+Free resources used by OpenMP programm and runtime library on all devices, including
+the host. On success, zero is returned and non-zero otherwise.
+
+The effect when invoked from within a @code{target} region is unspecified.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind)}
+@item   @tab @code{integer (kind=omp_pause_resource_kind) kind}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_pause_resource}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.44.
+@end table
+
+
 
 @node Device Information Routines
 @section Device Information Routines


[patch,avr,ping #3] PR target/112944: Support .rodata in RAM for AVR64* and AVR128* devices

2024-01-14 Thread Georg-Johann Lay

Ping #3

RFA: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640140.html
Ping #1 https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640981.html
Ping #2 https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641912.html

This is a patch that locates .rodata in flash for some AVR
devices that can support it.  All new functionality depends
on Binutils PR31124 and is switched on by configure checks
for the new emulations.

https://sourceware.org/PR31124 is already upstream.

For explanation of the gcc part see commit message below.

Most of the patch is adjusting device-specs generation.

If there are no objections, I would apply this in the
next week or so, so that it is part of v14.


Johann


--

AVR: Support .rodata in Flash for AVR64* and AVR128* Devices.

These devices see a 32 KiB block of their program memory (flash) in
the RAM address space.  This can be used to support .rodata in flash
provided Binutils support PR31124 (Add new emulations which locate
.rodata in flash).  This patch does the following:

* configure checks availability of Binutils PR31124.

* Add new command line options -mrodata-in-ram and -mflmap.
While -mflmap is for internal usage (communicate hardware properties
from device-specs to the compiler proper), -mrodata-in-ram is a user
space option that allows returning to the current rodata-in-ram layout.

* Adjust gen-avr-mmcu-specs.cc so that device-specs are generated
that sanity check options, and that translate -m[no-]rodata-in-ram
to its emulation.

* Objects in .rodata don't drag __do_copy_data.

* Document new options and built-in macros.
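
As an illustration of how user code might consume the new built-in macro, here
is a minimal sketch.  It relies only on __AVR_RODATA_IN_RAM__ being defined to
0 or 1 as described above; the helper names rodata_in_ram_setting and
rodata_placement are made up for this example, and the fallback value -1 is an
assumption that merely lets the file build with compilers that do not define
the macro.

```c
/* Value of the built-in macro (0 or 1), or -1 when the compiler does not
   define it, e.g. on a non-AVR host or with an older avr-gcc.  */
#ifdef __AVR_RODATA_IN_RAM__
#define RODATA_IN_RAM_SETTING __AVR_RODATA_IN_RAM__
#else
#define RODATA_IN_RAM_SETTING (-1)
#endif

/* Hypothetical helper: report the setting this file was compiled with.  */
static int
rodata_in_ram_setting (void)
{
  return RODATA_IN_RAM_SETTING;
}

/* Hypothetical helper: describe where .rodata lives for a given setting.  */
static const char *
rodata_placement (int setting)
{
  switch (setting)
    {
    case 0:
      return "flash";
    case 1:
      return "RAM";
    default:
      return "unknown";
    }
}
```

Code that needs different handling for constants, for example deciding whether
to copy a table before use, could branch on rodata_in_ram_setting () or test
the macro directly in an #if.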

PR target/112944

gcc/
* configure.ac [target=avr]: Check availability of emulations
avrxmega2_flmap and avrxmega4_flmap, resulting in new config vars
HAVE_LD_AVR_AVRXMEGA2_FLMAP and HAVE_LD_AVR_AVRXMEGA4_FLMAP.
* configure: Regenerate.
* config.in: Regenerate.
* doc/invoke.texi (AVR Options): Document -mflmap, -mrodata-in-ram,
__AVR_HAVE_FLMAP__, __AVR_RODATA_IN_RAM__.
* doc/avr-mmcu.texi: Regenerate.

* gcc/config/avr/avr.opt (-mflmap, -mrodata-in-ram): New options.
* config/avr/avr-arch.h (enum avr_device_specific_features):
Add AVR_ISA_FLMAP.
* config/avr/avr-mcus.def (AVR_MCU) [avr64*, avr128*]: Set isa flag
AVR_ISA_FLMAP.
* gcc/config/avr/avr.cc (avr_arch_index, avr_has_rodata_p): New vars.
(avr_set_core_architecture): Set avr_arch_index.
(have_avrxmega2_flmap, have_avrxmega4_flmap)
(have_avrxmega3_rodata_in_flash): Set new static const bool according
to configure results.
(avr_rodata_in_flash_p): New function using them.
(avr_asm_init_sections): Let readonly_data_section->unnamed.callback
track avr_need_copy_data_p only if not avr_rodata_in_flash_p().
(avr_asm_named_section): Track avr_has_rodata_p.
(avr_file_end): Emit __do_copy_data also when avr_has_rodata_p
and not avr_rodata_in_flash_p ().
* config/avr/specs.h (CC1_SPEC): Add %(cc1_rodata_in_ram).
(LINK_SPEC): Add %(link_rodata_in_ram).
(LINK_ARCH_SPEC): Remove.
* gcc/config/avr/gen-avr-mmcu-specs.cc (have_avrxmega3_rodata_in_flash)
(have_avrxmega2_flmap, have_avrxmega4_flmap): Set new static
const bool according to configure results.
(diagnose_mrodata_in_ram): New function.
(print_mcu): Generate specs with the following changes:
<*cc1_misc, *asm_misc, *link_misc>: New specs so that we don't
need to extend avr/specs.h each time we add a new bell or whistle.
<*cc1_rodata_in_ram, *link_rodata_in_ram>: New specs to diagnose
-m[no-]rodata-in-ram.
<*cpp_rodata_in_ram>: New. Does -D__AVR_RODATA_IN_RAM__=0/1.
<*cpp_mcu>: Add -D__AVR_AVR_FLMAP__ if it applies.
<*cpp>: Add %(cpp_rodata_in_ram).
<*link_arch>: Use emulation avrxmega2_flmap, avrxmega4_flmap as
requested.
<*self_spec>: Add -mflmap or %

diff --git a/gcc/config.in b/gcc/config.in
index b499bbfdda7..99fd2d89fe3 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1679,6 +1679,12 @@
 #endif
 
 
+/* Define if your linker supports emulation avrxmega2_flmap. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA2_FLMAP
+#endif
+
+
 /* Define if your default avr linker script for avrxmega3 leaves .rodata in
flash. */
 #ifndef USED_FOR_TARGET
@@ -1686,6 +1692,12 @@
 #endif
 
 
+/* Define if your linker supports emulation avrxmega4_flmap. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_LD_AVR_AVRXMEGA4_FLMAP
+#endif
+
+
 /* Define if your linker supports -z bndplt */
 #ifndef USED_FOR_TARGET
 #undef HAVE_LD_BNDPLT_SUPPORT
diff --git a/gcc/config/avr/avr-arch.h b/gcc/config/avr/avr-arch.h
index 03b3263d529..d0a297d81e4 100644
--- a/gcc/config/avr/avr-arch.h
+++ b/gcc/config/avr/avr-arch.h
@@ -166,7 +166,35 @@ AVR_ISA_RCALL
   assume these instructions are not available and we set the