Re: [RFC] light expander sra for parameters and returns

2023-07-23 Thread Jiufu Guo via Gcc-patches


Hi Martin,

I'm not sure what your current opinion is about re-using the ipa-sra code
in the light-expander-sra. If there is anything I can provide, please
let me know.

I'm also thinking about the differences between the expander-sra, ipa-sra
and tree-sra. 1. For stmt walking, expander-sra has special behavior
for return stmts, and is also slightly special for assign stmts. Also, phi
stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
I'm also wondering whether we need a tree structure; it would be useful
when checking overlaps, but it is not used in the expander-sra for now.
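
As a rough illustration of the overlap check mentioned above, here is a
minimal C++ sketch of a flat (non-tree) access list. The `access` struct,
its fields, and both helper functions are hypothetical names for
illustration only; they are not the tree-sra or expander-sra data
structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified access record: a bit offset and a bit size
// within the aggregate.  The names are illustrative only.
struct access
{
  std::int64_t offset;  // start of the access, in bits
  std::int64_t size;    // length of the access, in bits
};

// Two accesses partially overlap when they intersect but neither one
// fully contains the other.
static bool
partially_overlaps (const access &a, const access &b)
{
  std::int64_t a_end = a.offset + a.size;
  std::int64_t b_end = b.offset + b.size;
  bool intersect = a.offset < b_end && b.offset < a_end;
  bool a_in_b = b.offset <= a.offset && a_end <= b_end;
  bool b_in_a = a.offset <= b.offset && b_end <= a_end;
  return intersect && !a_in_b && !b_in_a;
}

// Flat check: compare every pair of accesses.
static bool
any_partial_overlap (const std::vector<access> &accs)
{
  for (std::size_t i = 0; i < accs.size (); i++)
    for (std::size_t j = i + 1; j < accs.size (); j++)
      if (partially_overlaps (accs[i], accs[j]))
        return true;
  return false;
}
```

With a flat list every pair is compared; keeping the accesses in a tree
ordered by offset and size would be one way to avoid the quadratic scan.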

For ipa-sra and tree-sra, I notice that there is some similar code,
but of course there are differences. It seems the differences
are 'intended', for example: 1. when creating and accessing,
'size != max_size' is acceptable in tree-sra but not in ipa-sra.
2. 'AGGREGATE_TYPE_P' is accepted by ipa-sra in some cases, but
is not OK for tree-sra.
I'm wondering whether those slight differences block re-use of the code
between ipa-sra and tree-sra.

The expander-sra may be lighter; for example, maybe we can use
FOR_EACH_IMM_USE_STMT to check the usage of each parameter, without
needing to walk all the stmts.


BR,
Jeff (Jiufu Guo)


Jiufu Guo via Gcc-patches  writes:

> Hi Martin,
>
> Jiufu Guo via Gcc-patches  writes:
>
>> Hi,
>>
>> Martin Jambor  writes:
>>
>>> Hi,
>>>
>>> On Tue, May 30 2023, Richard Biener wrote:
 On Mon, 29 May 2023, Jiufu Guo wrote:

> Hi,
> 
> Previously, I was investigating some struct parameters and returns related
> PRs 69143/65421/108073.
> 
> Investigating the issues case by case, and drafting patches for each of
> them one by one. This would help us to enhance code incrementally.
> While, this way, patches would interact with each other and implement
> different codes for similar issues (because of the different paths in
> gimple/rtl).  We may have a common fix for those issues.
> 
> We know a few other related PRs (such as meta-bug PR101926) exist. For
> those PRs in different targets with different symptoms (and also different
> root cause), I would expect a method could help some of them, but it may
> be hard to handle all of them in one fix.
> 
> With investigation and check discussion for the issues, I remember a
> suggestion from Richard: it would be nice to perform some SRA-like
> analysis for the accesses on the structs (parameter/returns).
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
> This may be a 'fairly common method' for those issues. With this idea,
> I drafted a patch as below in this mail.
> 
> I also thought about directly using tree-sra.cc, e.g. enhance it and
> rerun it at the end of GIMPLE passes. While since some issues are
> introduced inside the expander, so below patch also co-works with other
> parts of the expander. And since we already have tree-sra in gimple pass,
> we only need to take more care on parameter and return in this patch:
> other decls could be handled well in tree-sra.
> 
> The steps of this patch are:
> 1. Collect struct type parameters and returns, and then scan the function
> to get the accesses on them. And figure out the accesses which would be
> profitable to be scalarized (using registers of the parameter/return).
> Now, reading on parameter and writing on returns are checked in the
> current patch.
> 2. When/after the scalar registers are determined/expanded for the return
> or parameters, compute the corresponding scalar register(s) for each
> accesses of the return/parameter, and prepare the scalar RTLs for those
> accesses.
> 3. When using/expanding the accesses expression, leverage the
> computed/prepared scalars directly.
> 
> This patch is tested on ppc64 both LE and BE.
> To continue, I would ask for comments and suggestions first. And then I
> would update/enhance accordingly.  Thanks in advance!

 Thanks for working on this - the description above sounds exactly like
 what should be done.

 Now - I'd like the code to re-use the access tree data structure from
 SRA plus at least the worker creating the accesses from a stmt.
>>>
>
> I'm thinking about which part of the code can be re-used from
> ipa-sra and tree-sra.
> It seems there are some similar concepts between them:
> "access with offset/size", "collect and check candidates",
> "analyze accesses"...
>
> While because the purposes are different, the logic and behavior
> between them (ipa-sra, tree-sra, and expander-sra) are different,
> even for similar concepts.
>
> The same behavior and similar concepts may be reusable. The list below
> may be part of them.
> *. allocate and maintain access
>    basic access structure: offset, size, reverse
> *. type or expr checking
> *. disqualify
> *. scan and build expr 

[PATCH] libstdc++: Add missing constexpr specifier and function overloads

2023-07-23 Thread Deev Patel via Gcc-patches
Hi,

A couple of virtual functions in the libstdc++ format header are marked
constexpr in the base class, but not in the derived class. This was causing
build failures when compiling the latest GCC libstdc++ with Clang 16 in
C++20 mode. Adding the constexpr specifier resolves the issue.

2023-07-23  Deev Patel  

* include/std/format: Add missing constexpr specifiers on function
overloads.


From ac34afa1109b4c82e5cc377f49abf55422b89529 Mon Sep 17 00:00:00 2001
From: Deev Patel 
Date: Sun, 23 Jul 2023 20:08:46 -0700
Subject: [PATCH] [libstdc++] Add missing constexpr specifiers on function
 overloads

---
 libstdc++-v3/include/std/format | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 9710bff3c03..0c6069b2681 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3554,14 +3554,14 @@ namespace __format

   using iterator = typename _Scanner<_CharT>::iterator;

-  void
+  constexpr void
   _M_on_chars(iterator __last) override
   {
basic_string_view<_CharT> __str(this->begin(), __last);
_M_fc.advance_to(__format::__write(_M_fc.out(), __str));
   }

-  void
+  constexpr void
   _M_format_arg(size_t __id) override
   {
using _Context = basic_format_context<_Out, _CharT>;
-- 
2.41.0


RE: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-23 Thread Li, Pan2 via Gcc-patches
PATCH v6, linked below, passed riscv.exp/rvv.exp on both rv32 and rv64.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625315.html

Pan

From: Li, Pan2
Sent: Monday, July 24, 2023 9:52 AM
To: 'juzhe.zh...@rivai.ai'; gcc-patches
Cc: Kito.cheng; Wang, Yanzhang
Subject: RE: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding

Thanks Juzhe for reviewing. It looks like there are some conflicts with
upstream; I will send PATCH v6 after it passes all the tests.

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Monday, July 24, 2023 8:54 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Kito.cheng <kito.ch...@sifive.com>; Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang <yanzhang.w...@intel.com>
Subject: Re: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding

Overall LGTM from my side.
But we should wait for Kito's review.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-07-23 21:11
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding
From: Pan Li <pan2...@intel.com>

In basic dynamic rounding mode, we simply ignore call instructions. In
this PATCH, we would like to take care of calls.

During the call, the frm may be updated or kept as is. Thus, we must
ensure at least two things.

1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.

We will perform some steps to make the above happen.

1. Mark the call instruction with the new mode DYN_CALL.
2. Mark the instruction after the CALL from NONE to DYN.
3. When emitting for a DYN_CALL, we restore the frm value.
4. When emitting from a DYN_CALL, we back up the frm value.

Consider the following flow as an example.

           +-------------+
           | Entry (DYN) | <- frrm a5
           +-------------+
          /               \
    +-------+           +-----------+
    | VFADD |           | VFADD RTZ |  <- fsrmi 1(RTZ)
    +-------+           +-----------+
        |                     |
    +-------+           +-----------+
    | CALL  |           | CALL      |  <- fsrm a5
    +-------+           +-----------+
        |                     |
+-----------+             +-------+
| SHIFT     | <- frrm a5  | VFADD |  <- frrm a5
+-----------+             +-------+
        |                    /
+-----------+               /
| VFADD RUP | <- fsrmi 3(RUP)
+-----------+             /
         \               /
        +-----------------+
        | Exit (DYN_EXIT) | <- fsrm a5
        +-----------------+

When a call is the last insn of a bb, we take care of it when needed
for each insn by inserting one frm backup (frrm) insn at the end of
the current bb.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zh...@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_reconcile_call_as_bb_end): New function for call as
the last insn of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* 

Re: [PATCHv2, rs6000] Generate mfvsrwz for all subtargets and remove redundant zero extend [PR106769]

2023-07-23 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/7/21 09:32, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies vsx extract expand and generates mfvsrwz/stxsiwx
> for all subtargets when the mode is V4SI and the index of extracted element
> is 1 for BE and 2 for LE. Also this patch adds an insn pattern for mfvsrwz
> which can help eliminate redundant zero extend.
> 
>   Compared to last version, the main change is to add a new expand for V4SI
> and separate "vsx_extract_si" to 2 insn patterns.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622101.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Generate mfvsrwz for all subtargets and remove redundant zero extend
> 
> mfvsrwz has lower latency than xxextractuw or vextuw[lr]x.  So it should be
> generated even with p9 vector enabled.  Also the instruction is already
> zero extended.  A combine pattern is needed to eliminate redundant zero
> extend instructions.
> 
> gcc/
>   PR target/106769
>   * config/rs6000/vsx.md (expand vsx_extract_): Set it only
>   for V8HI and V16QI.
>   (vsx_extract_v4si): New expand for V4SI.
>   (*vsx_extract__di_p9): Not generate the insn when it can
>   be generated by mfvsrwz.
>   (mfvsrwz): New insn pattern for zero extended vsx_extract_v4si.
>   (*vsx_extract_si): Removed.
>   (vsx_extract_v4si_0): New insn pattern to deal with V4SI extract
>   when the index of extracted element is 1 with BE and 2 with LE.
>   (vsx_extract_v4si_1): New insn and split pattern which deals with
>   the cases not handled by vsx_extract_v4si_0.
> 
> gcc/testsuite/
>   PR target/106769
>   * gcc.target/powerpc/pr106769.h: New.
>   * gcc.target/powerpc/pr106769-p8.c: New.
>   * gcc.target/powerpc/pr106769-p9.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0a34ceebeb5..ad249441bcf 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3722,9 +3722,9 @@ (define_insn "vsx_xxpermdi2__1"
>  (define_expand  "vsx_extract_"
>[(parallel [(set (match_operand: 0 "gpc_reg_operand")
>  (vec_select:
> - (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand")
> + (match_operand:VSX_EXTRACT_I2 1 "gpc_reg_operand")
>   (parallel [(match_operand:QI 2 "const_int_operand")])))
> -   (clobber (match_scratch:VSX_EXTRACT_I 3))])]
> +   (clobber (match_scratch:VSX_EXTRACT_I2 3))])]
>"VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
>  {
>/* If we have ISA 3.0, we can do a xxextractuw/vextractu{b,h}.  */
> @@ -3736,6 +3736,23 @@ (define_expand  "vsx_extract_"
>  }
>  })
> 
> +(define_expand  "vsx_extract_v4si"
> +  [(parallel [(set (match_operand:SI 0 "gpc_reg_operand")
> +(vec_select:SI
> + (match_operand:V4SI 1 "gpc_reg_operand")
> + (parallel [(match_operand:QI 2 "const_0_to_3_operand")])))
> +   (clobber (match_scratch:V4SI 3))])]
> +  "TARGET_DIRECT_MOVE_64BIT"
> +{

Nit: Maybe add a comment here for why we special-case op2.
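
One way to read the special case, as a sketch: the condition in the patch
treats element 1 on BE and element 2 on LE as the same physical word,
because LE element numbering counts from the opposite end of the vector.
The helper name below is hypothetical and only mirrors the
`BYTES_BIG_ENDIAN ? 1 : 2` test; it is not code from vsx.md.

```cpp
#include <cassert>

// Hypothetical helper: which V4SI element index is special-cased,
// following the condition (BYTES_BIG_ENDIAN ? 1 : 2) in the patch.
static int
direct_move_word_index (bool bytes_big_endian)
{
  const int nunits = 4;    // V4SI has four 32-bit elements
  const int be_index = 1;  // the word in question, in BE element numbering
  // LE numbering counts the same elements from the opposite end.
  return bytes_big_endian ? be_index : nunits - 1 - be_index;
}
```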

> +  if (TARGET_P9_VECTOR
> +  && INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))
> +{
> +  emit_insn (gen_vsx_extract_v4si_p9 (operands[0], operands[1],
> +   operands[2]));
> +  DONE;
> +}
> +})
> +

Nit: Move "(define_insn \"vsx_extract_v4si_0\"..." up here to ensure
it takes the first priority in matching.

>  (define_insn "vsx_extract__p9"
>[(set (match_operand: 0 "gpc_reg_operand" "=r,")
>   (vec_select:
> @@ -3798,7 +3815,9 @@ (define_insn_and_split "*vsx_extract__di_p9"
> (match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,")
> (parallel [(match_operand:QI 2 "const_int_operand" "n,n")]
> (clobber (match_scratch:SI 3 "=r,X"))]
> -  "VECTOR_MEM_VSX_P (mode) && TARGET_VEXTRACTUB"
> +  "TARGET_VEXTRACTUB
> +   && (mode != V4SImode
> +   || INTVAL (operands[2]) != (BYTES_BIG_ENDIAN ? 1 : 2))"

I'd expect that under condition TARGET_VEXTRACTUB, we won't get this kind of
pattern with V4SI and 1/2 op2 now?  Instead of putting one condition to exclude
it, IMHO it's better to assert op2 isn't 1 or 2 in its splitters.

>"#"
>"&& reload_completed"
>[(parallel [(set (match_dup 4)
> @@ -3830,58 +3849,78 @@ (define_insn_and_split "*vsx_extract__store_p9"
> (set (match_dup 0)
>   (match_dup 3))])
> 
> -(define_insn_and_split  "*vsx_extract_si"
> +(define_insn "mfvsrwz"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> + (zero_extend:DI
> +   (vec_select:SI
> + (match_operand:V4SI 1 "vsx_register_operand" "wa")
> + (parallel [(match_operand:QI 2 "const_int_operand" "n")]
> +   (clobber (match_scratch:V4SI 3 "=v"))]
> +  "TARGET_DIRECT_MOVE_64BIT
> +   && INTVAL (operands[2]) == (BYTES_BIG_ENDIAN ? 1 : 2)"
> +  "mfvsrwz %0,%x1"
> +  [(set_attr "type" 

[PATCH v6] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-23 Thread Pan Li via Gcc-patches
From: Pan Li 

In basic dynamic rounding mode, we simply ignore call instructions. In
this PATCH, we would like to take care of calls.

During the call, the frm may be updated or kept as is. Thus, we must
ensure at least two things.

1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.

We will perform some steps to make the above happen.

1. Mark the call instruction with the new mode DYN_CALL.
2. Mark the instruction after the CALL from NONE to DYN.
3. When emitting for a DYN_CALL, we restore the frm value.
4. When emitting from a DYN_CALL, we back up the frm value.

Consider the following flow as an example.

           +-------------+
           | Entry (DYN) | <- frrm a5
           +-------------+
          /               \
    +-------+           +-----------+
    | VFADD |           | VFADD RTZ |  <- fsrmi 1(RTZ)
    +-------+           +-----------+
        |                     |
    +-------+           +-----------+
    | CALL  |           | CALL      |  <- fsrm a5
    +-------+           +-----------+
        |                     |
+-----------+             +-------+
| SHIFT     | <- frrm a5  | VFADD |  <- frrm a5
+-----------+             +-------+
        |                    /
+-----------+               /
| VFADD RUP | <- fsrmi 3(RUP)
+-----------+             /
         \               /
        +-----------------+
        | Exit (DYN_EXIT) | <- fsrm a5
        +-----------------+

When a call is the last insn of a bb, we take care of it when needed
for each insn by inserting one frm backup (frrm) insn at the end of
the current bb.
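
The flow above can be modelled with a small, self-contained C++ sketch. It
is only a toy model under stated assumptions: `frm_model`, `run_with_call`
and the `callee_sets` parameter are hypothetical names, the backup field
stands in for a5, and nothing here is taken from riscv.cc.

```cpp
#include <cassert>

// Rounding mode encodings of the frm field.
enum rounding_mode { RNE = 0, RTZ = 1, RDN = 2, RUP = 3, RMM = 4 };

// Toy model of the frm CSR plus the backup register (a5 in the flow).
struct frm_model
{
  int frm;     // current value of the frm CSR
  int backup;  // saved dynamic value (a5)

  void frrm () { backup = frm; }         // back up frm
  void fsrm () { frm = backup; }         // restore frm from the backup
  void fsrmi (int mode) { frm = mode; }  // set a static rounding mode
};

// Run "static-mode op; call; later static-mode op; exit" and return the
// frm value the callee observed.  CALLEE_SETS is the mode the callee
// writes to frm, or -1 if it leaves frm alone.
static int
run_with_call (frm_model &m, int static_mode, int callee_sets)
{
  m.frrm ();              // entry (DYN): back up the dynamic frm
  m.fsrmi (static_mode);  // e.g. VFADD RTZ
  m.fsrm ();              // emit for DYN_CALL: restore before the call
  int seen_by_callee = m.frm;
  if (callee_sets >= 0)
    m.frm = callee_sets;  // the callee may update frm
  m.frrm ();              // emit from DYN_CALL: back up after the call
  m.fsrmi (RUP);          // a later static-mode instruction
  m.fsrm ();              // exit (DYN_EXIT): restore
  return seen_by_callee;
}
```

In this model the callee never observes the static mode set before the
call, and a mode written by the callee is still in frm at exit, which
corresponds to the two requirements listed earlier.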

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_reconcile_call_as_bb_end): New function for call as
the last insn of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-57.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-58.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-59.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-60.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-61.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-62.c: New test.
   

[Bug c++/108626] GCC doesn't deduplicate string literals for const char*const and const char[]

2023-07-23 Thread de34 at live dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108626

--- Comment #9 from Jiang An  ---
See also CWG2753.
https://cplusplus.github.io/CWG/issues/2753.html

[Bug c++/104095] g++ diagnosis may use non-standard terminology: "constant" instead of "literal", "integer" instead of "integral"

2023-07-23 Thread de34 at live dot cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104095

Jiang An  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |de34 at live dot cn

--- Comment #2 from Jiang An  ---
The C standard says "floating constant" while the C++ standard says
"floating-point literal".

I wonder whether it makes sense for gcc to use different standardese terms
for C and C++...

Re: [PATCH 2/2 ver 5] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-23 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/7/22 07:38, Carl Love wrote:
> GCC maintainers:
> 
> Version 5, Fixed patch description, the first argument should be of
> type vector.  Fixed comment in vsx.md to say "Vector and scalar
> extract_elt iterator/attr ".  Removed a few of the changes in
> version 4.  Specifically, reverted the names of REPLACE_ELT_V_sh back
> to REPLACE_ELT_sh and REPLACE_ELT_V_max back to REPLACE_ELT_max.
> Combined the REPLACE_ELT_char and REPLACE_ELT_V_char mode attributes
> into REPLACE_ELT_char.  Put the "dg-do link" directive back into the
> vec-replace-word-runnable_1.c test file.  The patch was tested with the
> updated patch 1 in the series on Power 8 LE/BE, Power 9 LE/BE and Power
> 10 with no regressions.
> 

snip...

> 
> rs6000, fix vec_replace_unaligned built-in arguments
> 
> The first argument of the vec_replace_unaligned built-in should always be
> of type vector unsigned char, as specified in gcc/doc/extend.texi.
> 
> This patch fixes the builtin definitions and updates the test cases to use
> the correct arguments.  The original test file is renamed and a second test
> file is added for a new test case.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def: Rename
>   __builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
>   __builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
>   __builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
>   __builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
>   __builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
>   __builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
>   Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
>   VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
>   VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
>   VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
>   Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
>   vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
>   vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
>   vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
>   * config/rs6000/rs6000-c.cc (find_instance): Add case
>   RS6000_OVLD_VEC_REPLACE_UN.
>   * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
>   Fix first argument type.  Rename VREPLACE_UN_UV4SI as
>   VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
>   VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
>   VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
>   VREPLACE_UN_V2DF as VREPLACE_UN_DF.
>   * config/rs6000/vsx.md (REPLACE_ELT): Renamed the mode_iterator

Nit: s/Renamed/Rename/

>   REPLACE_ELT_V for vector modes.
>   (REPLACE_ELT): New scalar mode iterator.
>   (REPLACE_ELT_char): Add scalar attributes.
>   (vreplace_un_): Change iterator and mode attribute.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
>   vec-replace-word-runnable_1.c.

Ditto.

>   * gcc.target/powerpc/vec-replace-word-runnable_1.c
>   (dg-options): Add -flax-vector-conversions.
>   (vec_replace_unaligned): Fix first argument type.
>   (vresult_uchar): Fix expected results.
>   (vec_replace_unaligned): Update for loop to check uchar results.
>   Remove extra spaces in if statements. Insert missing spaces in
>   for statements.
>   * gcc.target/powerpc/vec-replace-word-runnable_2.c: New test file.
> ---

[snip...]

>  
>  [VEC_REVB, vec_revb, __builtin_vec_revb]
>vss __builtin_vec_revb (vss);
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0c269e4e8d9..7a4cf492ea5 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -380,10 +380,13 @@ (define_int_attr xvcvbf16   [(UNSPEC_VSX_XVCVSPBF16 "xvcvspbf16")
>  ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
>  (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
>  
> -;; Vector extract_elt iterator/attr for 32-bit and 64-bit elements
> -(define_mode_iterator REPLACE_ELT [V4SI V4SF V2DI V2DF])
> +;; Vector and scalar extract_elt iterator/attr for 32-bit and 64-bit elements

Nit: Since you touched this comment line, note that extract_elt was already
wrong before.  Maybe s/extract_elt/vector replace/?

> +(define_mode_iterator REPLACE_ELT_V [V4SI V4SF V2DI V2DF])
> +(define_mode_iterator REPLACE_ELT [SI SF DI DF])
>  (define_mode_attr REPLACE_ELT_char [(V4SI "w") (V4SF "w")
> - (V2DI  "d") (V2DF "d")])
> + (V2DI "d") (V2DF "d")
> + (SI "w") (SF "w")
> + (DI "d") (DF "d")])
>  (define_mode_attr REPLACE_ELT_sh [(V4SI "2") (V4SF "2")
> (V2DI  "3") (V2DF "3")])
>  (define_mode_attr REPLACE_ELT_max [(V4SI "12") 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-23 Thread Hao Liu OS via Gcc-patches
Hi Richard,

Gentle ping.  Is it ok for trunk?

Or will you have a patch covering this fix?

Thanks,
-Hao


From: Hao Liu OS 
Sent: Wednesday, July 19, 2023 12:33
To: GCC-patches@gcc.gnu.org
Cc: richard.sandif...@arm.com
Subject: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

This only affects the new costs in the aarch64 backend.  Currently, the
reduction latency of the vector body is too large because it is multiplied
by the stmt count.  As the scalar reduction latency is small, the new cost
model may think "scalar code would issue more quickly" and increase the
vector body cost a lot, which misses vectorization opportunities.
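
To make the effect concrete, here is a minimal sketch of the two ways of
accumulating the latency. The function names are hypothetical stand-ins
for the single MAX expression in count_ops, and the values used in the
note after the code (a base latency of 2 and a count of 4) are
illustrative assumptions, not measurements.

```cpp
#include <algorithm>
#include <cassert>

// Old behaviour: the per-statement latency is scaled by the stmt count.
static unsigned
latency_old (unsigned current, unsigned base, unsigned count)
{
  return std::max (current, base * count);
}

// New behaviour: the count no longer inflates the latency.
static unsigned
latency_new (unsigned current, unsigned base, unsigned /* count */)
{
  return std::max (current, base);
}
```

With base = 2 and count = 4, the old form yields 8 while the new form
yields 2.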

Tested by bootstrapping on aarch64-linux-gnu.

gcc/ChangeLog:

PR target/110625
* config/aarch64/aarch64.cc (count_ops): Remove the '* count'
for reduction_latency.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr110625.c: New testcase.
---
 gcc/config/aarch64/aarch64.cc   |  5 +--
 gcc/testsuite/gcc.target/aarch64/pr110625.c | 46 +
 2 files changed, 47 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 560e5431636..27afa64b7d5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16788,10 +16788,7 @@ aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
 {
   unsigned int base
= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
-
-  /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
-that's not yet the case.  */
-  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
+  ops->reduction_latency = MAX (ops->reduction_latency, base);
 }

   /* Assume that multiply-adds will become a single operation.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625.c b/gcc/testsuite/gcc.target/aarch64/pr110625.c
new file mode 100644
index 000..0965cac33a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr110625.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details -fno-tree-slp-vectorize" } */
+/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
+
+/* Do not increase the vector body cost due to the incorrect reduction latency
+Original vector body cost = 51
+Scalar issue estimate:
+  ...
+  reduction latency = 2
+  estimated min cycles per iteration = 2.00
+  estimated cycles per vector iteration (for VF 2) = 4.00
+Vector issue estimate:
+  ...
+  reduction latency = 8  <-- Too large
+  estimated min cycles per iteration = 8.00
+Increasing body cost to 102 because scalar code would issue more quickly
+  ...
+missed:  cost model: the vector iteration cost = 102 divided by the scalar iteration cost = 44 is greater or equal to the vectorization factor = 2.
+missed:  not vectorized: vectorization not profitable.  */
+
+typedef struct
+{
+  unsigned short m1, m2, m3, m4;
+} the_struct_t;
+typedef struct
+{
+  double m1, m2, m3, m4, m5;
+} the_struct2_t;
+
+double
+bar (the_struct2_t *);
+
+double
+foo (double *k, unsigned int n, the_struct_t *the_struct)
+{
+  unsigned int u;
+  the_struct2_t result;
+  for (u = 0; u < n; u++, k--)
+{
+  result.m1 += (*k) * the_struct[u].m1;
+  result.m2 += (*k) * the_struct[u].m2;
+  result.m3 += (*k) * the_struct[u].m3;
+  result.m4 += (*k) * the_struct[u].m4;
+}
+  return bar (&result);
+}
--
2.34.1


Re: [PATCH 1/2 ver 2] rs6000, add argument to function find_instance

2023-07-23 Thread Kewen.Lin via Gcc-patches
Hi Carl,

on 2023/7/22 07:38, Carl Love wrote:
> GCC maintainers:
> 
> Version 2:  Updated a number of formatting and spacing issues.   Added
> the NARGS description to the header comment for function find_instance.
> This patch was tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
> with no regressions.
> 
> The rs6000 function find_instance assumes that it is called for
> built-ins with only two arguments.  There is no checking for the actual
> number of arguments used in the built-in.  This patch adds an
> additional parameter to the function call containing the number of
> arguments in the built-in.  The function will now do the needed checks
> for all of the arguments.
> 
> This fix is needed for the next patch in the series that fixes the
> vec_replace_unaligned built-in.c test.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> 
> 
> -
> rs6000, add argument to function find_instance
> 
> The function find_instance assumes it is called to check a built-in with
> only two arguments.  This patch extends the function by adding a parameter
> specifying the number of built-in arguments to check.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
>   specifies the number of built-in arguments to check.
>   (altivec_resolve_overloaded_builtin): Update calls to find_instance
>   to pass the number of built-in arguments to be checked.
> ---
>  gcc/config/rs6000/rs6000-c.cc | 40 +++
>  1 file changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index a353bca19ef..de35490de42 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1668,18 +1668,20 @@ resolve_vec_step (resolution *res, vec *arglist, unsigned nargs)
>  /* Look for a matching instance in a chain of instances.  INSTANCE points to
> the chain of instances; INSTANCE_CODE is the code identifying the specific
> built-in being searched for; FCODE is the overloaded function code; TYPES
> -   contains an array of two types that must match the types of the instance's
> -   parameters; and ARGS contains an array of two arguments to be passed to
> -   the instance.  If found, resolve the built-in and return it, unless the
> -   built-in is not supported in context.  In that case, set
> -   UNSUPPORTED_BUILTIN to true.  If we don't match, return error_mark_node
> -   and leave UNSUPPORTED_BUILTIN alone.  */
> +   contains an array of NARGS types that must match the types of the
> +   instance's parameters; ARGS contains an array of NARGS arguments to be
> +   passed to the instance; and NARGS is the number of built-in arguments to
> +   check.  If found, resolve the built-in and return it, unless the built-in
> +   is not supported in context.  In that case, set UNSUPPORTED_BUILTIN to
> +   true.  If we don't match, return error_mark_node and leave
> +   UNSUPPORTED_BUILTIN alone.
> +*/

Minor nit: "*/" doesn't need to start with a new line, just "... alone.  */".

This patch is okay for trunk with this fixed.  Thanks!
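
For readers following along, the matching described in the updated comment can be sketched roughly as below.  This is only an illustration: the `instance` struct, the `type_id` enum and `find_matching_instance` are hypothetical stand-ins, not the real rs6000 data structures or the actual `find_instance` signature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the real rs6000 types; illustration only.
enum type_id { T_V4SI, T_V2DI, T_V16QI, T_INT };

struct instance {
  int code;                 // identifies the specific built-in
  type_id parm_types[3];    // parameter types of this instance
  const instance *next;     // next instance in the chain
};

// Return the first instance in CHAIN whose code is CODE and whose first
// NARGS parameter types equal TYPES, or nullptr if nothing matches.
const instance *
find_matching_instance (const instance *chain, int code,
                        const type_id *types, std::size_t nargs)
{
  assert (nargs <= 3);
  for (const instance *inst = chain; inst; inst = inst->next)
    {
      if (inst->code != code)
        continue;
      bool match = true;
      for (std::size_t i = 0; i < nargs && match; ++i)
        match = inst->parm_types[i] == types[i];
      if (match)
        return inst;
    }
  return nullptr;
}
```

With NARGS passed in, a caller that only cares about the first two arguments can match an instance whose third parameter differs, which a fixed two-argument check could not express for three-argument built-ins.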

BR,
Kewen



Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-07-23 Thread Kewen.Lin via Gcc-patches
Hi Iain,

on 2023/7/22 23:58, Iain Sandoe wrote:
> Hi Kewen,
> 
> This patch breaks bootstrap on powerpc-darwin (which has Altivec, but not 
> VSX) while building libgfortran.
> 
>> On 3 Jul 2023, at 04:19, Kewen.Lin via Gcc-patches  
>> wrote:
> 
> Please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110776
> thanks
> Iain
> 

Thanks for reporting!  I'll have a look at it.


BR,
Kewen


[Bug target/110776] [14 Regression] powerpc-darwin bootstrap broken after r14-2490 with ICE rs6000.cc:5069 building libgfortran

2023-07-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110776

Kewen Lin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-24

--- Comment #2 from Kewen Lin  ---
Thanks for reporting and sorry for the breakage. I'll have a look first.

(In reply to Iain Sandoe from comment #0)
> The ICE seems to be because rs6000_builtin_vectorization_cost () is called
> with a request for a misaligned load (which we do not support).  It
> reproduces on a cross from x86_64.
> 
> This is in compiling libgfortran generated code (so nothing Darwin-specific,
> other than being an Altivec platform).

Thanks for the information.

> 
> A philosophical question; if a request is made for the cost of doing
> something unsupported - should we not return "infinity" rather than ICEing?  
> 
> Presumably, the alternative is that the middle end needs to know that some
> kinds of operation are not supported and therefore not to try and cost them
> (speculation here; I have no knowledge of the relevant code).

I think that's what's being adopted now: if the target doesn't support
unaligned loads, the middle-end should treat the access as
dr_unaligned_unsupported (dr_alignment_support) and use VECT_MAX_COST, so the
target is never expected to be queried with unaligned_load.  Maybe some path
was changed by the culprit commit.
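
A rough sketch of that division of labour, under the assumption of a much simplified model: the enum, `MAX_COST`, `target_load_cost` and `load_cost` below are illustrative names only, not the vectorizer's or the rs6000 back end's actual code.

```cpp
#include <cassert>
#include <climits>

// Simplified, hypothetical model; not the vectorizer's real code.
enum alignment_support { aligned, unaligned_supported, unaligned_unsupported };

const int MAX_COST = INT_MAX / 2;  // stand-in for VECT_MAX_COST

// Stand-in for a target cost hook that cannot cost misaligned loads.
int target_load_cost (bool misaligned)
{
  assert (!misaligned && "target does not support misaligned loads");
  return 1;
}

// Middle-end side: decide the cost without ever asking the target about
// an access it has already declared unsupported.
int load_cost (alignment_support s)
{
  switch (s)
    {
    case aligned:
      return target_load_cost (false);
    case unaligned_supported:
      return 2;  // arbitrary illustrative cost; the hook is not consulted
    case unaligned_unsupported:
      return MAX_COST;  // prohibitive cost instead of a target query
    }
  return MAX_COST;
}
```

In this model the assertion in the hook can only fire if some caller bypasses `load_cost`, which is the kind of path change suspected above.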

[Bug target/110786] bpf: make use of the V4 byte swap instructions

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110786

Jose E. Marchesi  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |jemarch at gcc dot gnu.org
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-24

[Bug c++/110785] [c++14+] Incorrect return type deduction for const auto with no return statement

2023-07-23 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110785

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Keywords|                            |rejects-valid
   Last reconfirmed|                            |2023-07-24
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed, not a regression.  This has been a bug since GCC 4.9.0, when C++14
support was added ...

RE: [PATCH v1] RISC-V: Bugfix for allowing incorrect dyn for static rounding

2023-07-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan


[Bug target/110786] New: bpf: make use of the V4 byte swap instructions

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110786

            Bug ID: 110786
           Summary: bpf: make use of the V4 byte swap instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jemarch at gcc dot gnu.org
  Target Milestone: ---

The BPF V4 ISA introduces support for byte-swap instructions: bswap16, bswap32
and bswap64.  GCC shall be adapted to use these instructions for the builtins:

  __builtin_bswap{16,32,64}
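
For reference, a small sketch of the kind of source that would exercise these builtins.  The wrapper names and `check_swaps` are made up for illustration; whether a given BPF compiler actually selects the V4 instructions for them depends on the work requested here.

```cpp
#include <cassert>
#include <cstdint>

// Thin wrappers around the GCC byte-swap builtins named in the report.
std::uint16_t swap16 (std::uint16_t x) { return __builtin_bswap16 (x); }
std::uint32_t swap32 (std::uint32_t x) { return __builtin_bswap32 (x); }
std::uint64_t swap64 (std::uint64_t x) { return __builtin_bswap64 (x); }

// Small self-check showing the byte order each builtin produces.
void check_swaps ()
{
  assert (swap16 (0x1122) == 0x2211);
  assert (swap32 (0x11223344u) == 0x44332211u);
  assert (swap64 (0x1122334455667788ull) == 0x8877665544332211ull);
}
```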

[Bug c++/110785] New: Incorrect return type deduction for const auto with no return statement

2023-07-23 Thread bbi5291 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110785

            Bug ID: 110785
           Summary: Incorrect return type deduction for const auto with no
                    return statement
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bbi5291 at gmail dot com
  Target Milestone: ---

With `g++ -std=c++2b` the following is rejected:
---
const auto f() { }
const void (*fp)() = &f;
---
with error message:
---
:2:22: error: invalid conversion from 'void (*)()' to 'const void
(*)()' [-fpermissive]
    2 | const void (*fp)() = &f;
      |                      ^~
      |                      |
      |                      void (*)()
---

However, the type of `&f` should be `const void (*)()`.

Clang does the right thing here, as does MSVC. GCC also does the right thing if
the body of `f` contains the statement `return;`, instead of no return
statement.

Re: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-23 Thread juzhe.zh...@rivai.ai
Overall LGTM from my side.
But we should wait for Kito's review.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-07-23 21:11
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding
From: Pan Li 
 
In basic dynamic rounding mode, we simply ignore call instructions.  This
patch takes care of calls.
 
During a call, the frm may be updated or kept as is.  Thus, we must
ensure at least two things.
 
1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.
 
We perform the following steps to make this happen.
 
1. Mark the call instruction with the new mode DYN_CALL.
2. Mark the instruction after the CALL from NONE to DYN.
3. When emitting for a DYN_CALL, restore the frm value.
4. When emitting from a DYN_CALL, back up the frm value.
 
The following flow illustrates this.
 
           +-------------+
           | Entry (DYN) | <- frrm a5
           +-------------+
          /               \
    +-------+             +-----------+
    | VFADD |             | VFADD RTZ |  <- fsrmi 1(RTZ)
    +-------+             +-----------+
        |                      |
    +-------+             +-------+
    | CALL  |             | CALL  |  <- fsrm a5
    +-------+             +-------+
        |                     |
    +-------+             +-------+
    | SHIFT | <- frrm a5  | VFADD |  <- frrm a5
    +-------+             +-------+
        |                    /
  +-----------+             /
  | VFADD RUP | <- fsrmi 3(RUP)
  +-----------+           /
         \               /
        +-----------------+
        | Exit (DYN_EXIT) | <- fsrm a5
        +-----------------+
 
When a call is the last insn of a bb, we take care of it when needed
for each insn by inserting one frm backup (frrm) insn at the end of
the current bb.
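 
The restore-before-call and backup-after-call steps can be modelled with a toy simulation such as the one below.  Everything in it (`frm`, `backup`, `entry`, `static_op`, `call`, `exit_fn`, `callee_sets_rup`) is a hypothetical stand-in for the CSR and the emitted frrm/fsrm/fsrmi insns; it is not the riscv.cc implementation.

```cpp
#include <cassert>

// Toy model of the frm CSR and the backup register (a5 in the diagram).
// All names are illustrative; this is not the riscv.cc code.
enum rounding { RNE = 0, RTZ = 1, RDN = 2, RUP = 3, RMM = 4 };

static rounding frm = RNE;             // models the frm CSR
static rounding backup = RNE;          // models the saved dynamic value
static rounding seen_by_callee = RNE;  // what the callee observed

void entry ()                  { backup = frm; }  // frrm a5
void static_op (rounding mode) { frm = mode; }    // fsrmi <mode>
void exit_fn ()                { frm = backup; }  // fsrm a5

// A call: restore the dynamic frm first (step 3), run the callee, then
// back up whatever value the callee left behind (step 4).
void call (void (*callee) ())
{
  frm = backup;                // fsrm a5
  callee ();
  backup = frm;                // frrm a5
}

// Example callee that records the frm it sees and then changes it.
void callee_sets_rup ()
{
  seen_by_callee = frm;
  frm = RUP;
}
```

In this model a static RTZ operation before the call never leaks into the callee, and a value the callee writes survives later static operations and is what the function exits with.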
 
Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_reconcile_call_as_bb_end): New function for call as
the last insn of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-57.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-58.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-59.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-60.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-61.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-62.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-63.c: New test.
* 

Re: [PATCH v1] RISC-V: Bugfix for allowing incorrect dyn for static rounding

2023-07-23 Thread juzhe.zh...@rivai.ai
Ok. You can commit it.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-07-23 21:54
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for allowing incorrect dyn for static 
rounding
From: Pan Li 
 
According to the spec, the dyn rounding mode is invalid for RVV
floating-point; this patch fixes that.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): Take range check.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-frm-error.c: Update cases.
* gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: Removed.
---
.../riscv/riscv-vector-builtins-shapes.cc |  3 +-
.../riscv/rvv/base/float-point-frm-error.c|  6 ++--
.../riscv/rvv/base/float-point-frm-insert-6.c | 33 ---
3 files changed, 4 insertions(+), 38 deletions(-)
delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc 
b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 69a67106418..22b5fe256df 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -285,8 +285,7 @@ struct alu_frm_def : public build_base
   {
unsigned int frm_num = c.arg_num () - 2;
- return c.require_immediate_range_or (frm_num, FRM_STATIC_MIN,
-  FRM_STATIC_MAX, FRM_DYN);
+ return c.require_immediate (frm_num, FRM_STATIC_MIN, FRM_STATIC_MAX);
   }
 return true;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
index 4ebaa15ab0b..01d82d4e661 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
@@ -7,9 +7,9 @@ typedef float float32_t;
void test_float_point_frm_error (float32_t *out, vfloat32m1_t op1, vfloat32m1_t 
op2, size_t vl)
{
-  vfloat32m1_t v1 = __riscv_vfadd_vv_f32m1_rm (op1, op2, 5, vl); /* { dg-error 
{passing 5 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\] or 7} } */
-  vfloat32m1_t v2 = __riscv_vfadd_vv_f32m1_rm (v1, v1, 6, vl);   /* { dg-error 
{passing 6 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\] or 7} } */
-  vfloat32m1_t v3 = __riscv_vfadd_vv_f32m1_rm (v2, v2, 8, vl);   /* { dg-error 
{passing 8 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\] or 7} } */
+  vfloat32m1_t v1 = __riscv_vfadd_vv_f32m1_rm (op1, op2, 5, vl); /* { dg-error 
{passing 5 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\]} } */
+  vfloat32m1_t v2 = __riscv_vfadd_vv_f32m1_rm (v1, v1, 6, vl);   /* { dg-error 
{passing 6 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\]} } */
+  vfloat32m1_t v3 = __riscv_vfadd_vv_f32m1_rm (v2, v2, 8, vl);   /* { dg-error 
{passing 8 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value 
in the range \[0, 4\]} } */
   __riscv_vse32_v_f32m1 (out, v3, vl);
}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c
deleted file mode 100644
index 1ef0e015d8f..000
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
-
-#include "riscv_vector.h"
-
-typedef float float32_t;
-
-vfloat32m1_t
-test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
-  return __riscv_vfadd_vv_f32m1_rm (op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
- size_t vl) {
-  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
-  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
- size_t vl) {
-  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 7, vl);
-}
-
-/* { dg-final { scan-assembler-times 
{vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
-/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
-/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */
-/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */
-- 
2.34.1
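 
The effect of the range check after this patch can be summarised with the small sketch below.  The constant values follow the ranges quoted in the updated dg-error strings ([0, 4], with 7 formerly accepted); `valid_static_frm` and `check_frm_range` are hypothetical helpers for illustration, not the function_checker API.

```cpp
#include <cassert>

// Values follow the range quoted in the test's error messages; the
// helper itself is a hypothetical stand-in for the real check.
const int FRM_STATIC_MIN = 0;
const int FRM_STATIC_MAX = 4;
const int FRM_DYN = 7;

// After the patch only the static range is accepted; FRM_DYN is rejected.
bool valid_static_frm (int value)
{
  return value >= FRM_STATIC_MIN && value <= FRM_STATIC_MAX;
}

// Small self-check mirroring the cases in float-point-frm-error.c.
void check_frm_range ()
{
  assert (valid_static_frm (0));
  assert (valid_static_frm (4));
  assert (!valid_static_frm (5));
  assert (!valid_static_frm (FRM_DYN));
}
```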
 
 


Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-23 Thread Ben Boeckel via Gcc
On Fri, Jul 21, 2023 at 16:23:07 -0400, Nathan Sidwell wrote:
> It occurs to me that the model I am envisioning is similar to CMake's object
> libraries.  Object libraries are a convenient name for a bunch of object
> files.  IIUC they're linked by naming the individual object files (or I think
> they could be implemented as a static lib linked with --whole-archive
> path/to/libfoo.a -no-whole-archive).  But for this conversation consider them
> a bunch of separate object files with a convenient group name.

Yes, `--whole-archive` would work great if it had any kind of
portability across CMake's platform set.

> Consider also that object libraries could themselves contain object libraries
> (I don't know if they can, but it seems like a useful concept).  Then one
> could create an object library from a collection of object files and object
> libraries (recursively).  CMake would handle the transitive graph.

I think this detail is relevant, but you can use
`$<TARGET_OBJECTS:...>` as `INTERFACE` sources and it would act
like that, but it is an explicit thing. Instead, `OBJECT` libraries
*only* provide their objects to targets that *directly* link them. If
not, given this:

A (OBJECT library)
B (library of some kind; links PUBLIC to A)
C (links to B)

If `A` has things like linker flags (or, more likely, libraries) as part
of its usage requirements, C will get them on its link line. However, if
OBJECT files are transitive in the same way, the linker (on most
platforms at least) chokes because it now has duplicates of all of A's
symbols: those from the B library and those from A's objects on the link
line.

> Now, allow an object library to itself have some kind of tangible, on-disk 
> representation.  *BUT* not like a static library -- it doesn't include the 
> object files.
> 
> 
> Now that immediately maps onto modules.
> 
> CMI: Object library
> Direct imports: Direct object libraries of an object library
> 
> This is why I don't understand the need to explicitly indicate the indirect
> imports of a CMI.  CMake knows them, because it knows the graph.

Sure, *CMake* knows them, but the *build tool* (typically `make` or
`ninja`) needs to be told because it is what is actually executing
the build graph. The way this is communicated is via `-MF` files and
that's what I'm providing in this patch. Note that `ninja` does not
allow rules to specify such dependencies for other rules than the one it
is reading the file for.
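
To make the depfile point concrete, here is a minimal sketch of the Make-style rule such a file carries.  `make_dep_rule` and the file names in the usage note are made up for illustration; the exact entries the patch emits are not shown in this message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Format a Make-style dependency rule: "target: dep1 dep2 ...\n".
std::string
make_dep_rule (const std::string &target,
               const std::vector<std::string> &deps)
{
  assert (!target.empty ());
  std::string rule = target + ":";
  for (const std::string &d : deps)
    rule += " " + d;
  rule += "\n";
  return rule;
}
```

For a hypothetical `foo.o` built from `foo.cpp` that imports a module whose CMI is `bar.gcm`, the rule would read `foo.o: foo.cpp bar.gcm`, which is what lets the build tool rebuild `foo.o` when the CMI changes.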

--Ben


[Bug target/110784] New: bpf: make use of the V4 sign-extended move instructions

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110784

            Bug ID: 110784
           Summary: bpf: make use of the V4 sign-extended move
                    instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jemarch at gcc dot gnu.org
  Target Milestone: ---

The BPF V4 ISA introduces three new instructions to move 16-, 32- and 64-bit
signed values to 64-bit registers with sign extension.  We shall make GCC
use these instructions whenever feasible/desirable, but only with -mcpu=v4 or
later.
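
As an illustration of source that needs such a move, the sketch below widens narrower signed values to 64 bits.  The function names are made up, and whether a given compiler emits the V4 instructions for them is exactly what this report asks for.

```cpp
#include <cassert>
#include <cstdint>

// Widening a signed value to 64 bits requires sign extension.
std::int64_t widen16 (std::int16_t x) { return x; }
std::int64_t widen32 (std::int32_t x) { return x; }

// Small self-check: negative inputs keep their value, so the upper
// bits of the 64-bit result are all ones.
void check_widen ()
{
  assert (widen16 (-2) == -2);
  assert (widen32 (-2147483647 - 1) == -2147483648LL);
  assert (static_cast<std::uint64_t> (widen16 (-1)) == 0xFFFFFFFFFFFFFFFFull);
}
```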

[Bug target/110783] New: bpf; make sure of V4 signed division/modulus instructions

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110783

            Bug ID: 110783
           Summary: bpf; make sure of V4 signed division/modulus
                    instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jemarch at gcc dot gnu.org
  Target Milestone: ---

The BPF V4 ISA introduces support for signed division and signed modulus
instructions.  Up to now, we were supporting these instructions in GCC only for
-mxbpf.  We shall change the compiler in order to generate them with -mcpu=v4
or later instead.
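
For context, this is the kind of source-level operation involved; the sketch shows why an unsigned divide cannot simply be reused.  `sdiv`, `smod` and `check_signed_div` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Signed division and modulus; C++ truncates the quotient toward zero.
std::int64_t sdiv (std::int64_t a, std::int64_t b)
{
  assert (b != 0);
  return a / b;
}

std::int64_t smod (std::int64_t a, std::int64_t b)
{
  assert (b != 0);
  return a % b;
}

// Small self-check contrasting signed and unsigned results for the
// same bit pattern.
void check_signed_div ()
{
  assert (sdiv (-7, 2) == -3);
  assert (smod (-7, 2) == -1);
  std::uint64_t bits = static_cast<std::uint64_t> (std::int64_t (-7));
  assert (bits / 2 == 0x7FFFFFFFFFFFFFFCull);
}
```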

[Bug target/110782] New: bpf: make use of the V4 sign-extended load instructions

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110782

            Bug ID: 110782
           Summary: bpf: make use of the V4 sign-extended load
                    instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jemarch at gcc dot gnu.org
  Target Milestone: ---

The BPF V4 ISA introduces sign-extended load instructions, to load 8-bit,
16-bit, 32-bit and 64-bit values from memory to a 64-bit register with sign
extension.  We should make GCC use these instructions whenever
feasible/desirable.
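
A sketch of loads that would be candidates, assuming made-up function names; the point is only that the value comes from memory and must be sign-extended into a 64-bit result.

```cpp
#include <cassert>
#include <cstdint>

// Each function reads a narrow signed value from memory and returns it
// widened to 64 bits, which requires sign extension of the loaded value.
std::int64_t load_s8 (const std::int8_t *p)   { return *p; }
std::int64_t load_s16 (const std::int16_t *p) { return *p; }
std::int64_t load_s32 (const std::int32_t *p) { return *p; }

// Small self-check with negative values stored in memory.
void check_loads ()
{
  std::int8_t b = -128;
  std::int16_t h = -300;
  std::int32_t w = -70000;
  assert (load_s8 (&b) == -128);
  assert (load_s16 (&h) == -300);
  assert (load_s32 (&w) == -70000);
}
```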

[Bug target/110781] New: bpf: make use of the V4 long-range jump instruction (jal/gotol)

2023-07-23 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110781

            Bug ID: 110781
           Summary: bpf: make use of the V4 long-range jump instruction
                    (jal/gotol)
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jemarch at gcc dot gnu.org
  Target Milestone: ---

The BPF V4 ISA introduces a new unconditional jump instruction (jal/gotol) that
has a much wider target range than the previous ja/goto instruction:
PC-relative 32-bit instead of PC-relative 16-bit.

We shall make GCC generate this instruction when it needs to emit jumps that
would be too long for the regular ja/goto instruction, but only if -mcpu=v4 is
used.
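
The selection rule described above could look roughly like this.  It is a sketch under assumptions: `jump_kind`, `fits_signed` and `choose_jump` are hypothetical names, and the offset is taken to be in whatever unit the instruction encoding uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of jump selection; not GCC's internal representation.
enum jump_kind { JUMP_SHORT, JUMP_LONG, JUMP_UNREACHABLE };

// True if V fits in a signed two's-complement field of BITS bits.
bool fits_signed (std::int64_t v, int bits)
{
  assert (bits > 0 && bits < 64);
  std::int64_t lo = -(std::int64_t (1) << (bits - 1));
  std::int64_t hi = (std::int64_t (1) << (bits - 1)) - 1;
  return v >= lo && v <= hi;
}

// Prefer the regular 16-bit jump; fall back to the 32-bit form only
// when the V4 instructions are enabled.
jump_kind choose_jump (std::int64_t offset, bool cpu_v4)
{
  if (fits_signed (offset, 16))
    return JUMP_SHORT;
  if (cpu_v4 && fits_signed (offset, 32))
    return JUMP_LONG;
  return JUMP_UNREACHABLE;
}
```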

gcc-14-20230723 is now available

2023-07-23 Thread GCC Administrator via Gcc
Snapshot gcc-14-20230723 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/14-20230723/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 14 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch master 
revision bbc1a102735c72e3c5a4dede8ab382813d12b058

You'll find:

 gcc-14-20230723.tar.xz   Complete GCC

  SHA256=36b2833b1b59f693d7c321d971c142038b74332a9aa4a320e24fbf849fa79322
  SHA1=e6bf2c865a0d3b47b7005f69e1fd1eb5782f0fb7

Diffs from 14-20230716 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-14
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[r14-2709 Regression] FAIL: gcc.target/i386/pr93089-3.c scan-assembler vmulps[^\n\r]*zmm on Linux/x86_64

2023-07-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

65ff4a45b11b5ab13ef849bd5721ab28ff316202 is the first bad commit
commit 65ff4a45b11b5ab13ef849bd5721ab28ff316202
Author: Jan Hubicka 
Date:   Fri Jul 21 14:54:23 2023 +0200

loop-ch improvements, part 5

caused

FAIL: gcc.target/i386/pr93089-2.c scan-assembler vmulps[^\n\r]*zmm
FAIL: gcc.target/i386/pr93089-3.c scan-assembler vmulps[^\n\r]*zmm

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2709/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-2.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-2.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-2.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-2.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-3.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-3.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-3.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr93089-3.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email.  For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[PATCH V2 5/5] OpenMP: Fortran support for imperfectly-nested loops

2023-07-23 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.
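
For readers unfamiliar with the feature, here is a small C++ analogue of an imperfectly-nested collapsed loop (the patch itself concerns Fortran).  `fill_rows` is a made-up example; without OpenMP enabled the pragma is ignored and the loops simply run serially.

```cpp
#include <cassert>
#include <vector>

// Fill a rows x cols grid where each element holds i + j.  The
// assignment to BASE sits between the two collapsed loops, which is
// the "intervening code" that OpenMP 5.0 permits.
std::vector<int> fill_rows (int rows, int cols)
{
  assert (rows >= 0 && cols >= 0);
  std::vector<int> out (static_cast<std::size_t> (rows) * cols);
  int *data = out.data ();
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < rows; ++i)
    {
      int base = i * cols;          // intervening code
      for (int j = 0; j < cols; ++j)
        data[base + j] = i + j;
    }
  return out;
}
```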

gcc/fortran/ChangeLog
* gfortran.h (struct gfc_namespace): Add omp_structured_block bit.
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(find_nested_loop_in_chain): New.
(find_nested_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(make_structured_block): New.
(restructure_intervening_code): New.
(is_outer_iteration_variable): Do not assume loops are perfectly
nested.
(check_nested_loop_in_chain): New.
(check_nested_loop_in_block_state): New.
(check_nested_loop_in_block_symbol): New.
(check_nested_loop_in_block): New.
(expr_uses_intervening_var): New.
(is_intervening_var): New.
(expr_is_invariant): Do not assume loops are perfectly nested.
(resolve_omp_do): Handle imperfectly-nested loops.
* trans-stmt.cc (gfc_trans_block_construct): Generate
OMP_STRUCTURED_BLOCK if magic bit is set on block namespace.

gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect-gotos.f90: New.
* gfortran.dg/gomp/imperfect-invalid-scope.f90: New.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.

libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/target-imperfect1.f90: New.
* testsuite/libgomp.fortran/target-imperfect2.f90: New.
* testsuite/libgomp.fortran/target-imperfect3.f90: New.
* testsuite/libgomp.fortran/target-imperfect4.f90: New.
---
 gcc/fortran/gfortran.h|   3 +
 gcc/fortran/openmp.cc | 765 +++---
 gcc/fortran/trans-stmt.cc |   7 +-
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |   6 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |  10 +-
 .../gfortran.dg/gomp/imperfect-gotos.f90  |  69 ++
 .../gomp/imperfect-invalid-scope.f90  |  81 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90 |  39 +
 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90 |  56 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90 |  29 +
 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90 |  36 +
 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90 |  67 ++
 .../libgomp.fortran/imperfect-destructor.f90  | 142 
 .../testsuite/libgomp.fortran/imperfect1.f90  |  67 ++
 .../testsuite/libgomp.fortran/imperfect2.f90  | 102 +++
 .../testsuite/libgomp.fortran/imperfect3.f90  | 110 +++
 .../testsuite/libgomp.fortran/imperfect4.f90  | 121 +++
 .../libgomp.fortran/target-imperfect1.f90 |  72 ++
 .../libgomp.fortran/target-imperfect2.f90 | 110 +++
 .../libgomp.fortran/target-imperfect3.f90 | 116 +++
 .../libgomp.fortran/target-imperfect4.f90 | 126 +++
 21 files changed, 2025 insertions(+), 109 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect-gotos.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect-invalid-scope.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect-destructor.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/imperfect1.f90
 create 

[PATCH V2 3/5] OpenMP: C++ support for imperfectly-nested loops

2023-07-23 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements.  Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.

gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(check_omp_intervening_code): New.
(cp_parser_statement_seq_opt): Special-case nested loops, blocks,
and other constructs for OpenMP loops.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_convert_omp_range_for): Likewise.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten.  Add more comments.
(insert_structured_blocks): New.
(find_structured_blocks): New.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop.  Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (finish_omp_for): Try harder to preserve location
of loop variable init expression for use in diagnostics.
(struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
* g++.dg/gomp/attrs-imperfect1.C: New test.
* g++.dg/gomp/attrs-imperfect2.C: New test.
* g++.dg/gomp/attrs-imperfect3.C: New test.
* g++.dg/gomp/attrs-imperfect4.C: New test.
* g++.dg/gomp/attrs-imperfect5.C: New test.
* g++.dg/gomp/pr41967.C: Adjust expected error patterns.
* g++.dg/gomp/tpl-imperfect-gotos.C: New test.
* g++.dg/gomp/tpl-imperfect-invalid-scope.C: New test.

libgomp/ChangeLog
* testsuite/libgomp.c++/attrs-imperfect1.C: New test.
* testsuite/libgomp.c++/attrs-imperfect2.C: New test.
* testsuite/libgomp.c++/attrs-imperfect3.C: New test.
* testsuite/libgomp.c++/attrs-imperfect4.C: New test.
* testsuite/libgomp.c++/attrs-imperfect5.C: New test.
* testsuite/libgomp.c++/attrs-imperfect6.C: New test.
* testsuite/libgomp.c++/imperfect-class-1.C: New test.
* testsuite/libgomp.c++/imperfect-class-2.C: New test.
* testsuite/libgomp.c++/imperfect-class-3.C: New test.
* testsuite/libgomp.c++/imperfect-destructor.C: New test.
* testsuite/libgomp.c++/imperfect-template-1.C: New test.
* testsuite/libgomp.c++/imperfect-template-2.C: New test.
* testsuite/libgomp.c++/imperfect-template-3.C: New test.
---
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1315 -
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |3 +-
 gcc/cp/semantics.cc   |  117 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |4 +-
 gcc/testsuite/g++.dg/gomp/attrs-imperfect1.C  |   38 +
 gcc/testsuite/g++.dg/gomp/attrs-imperfect2.C  |   34 +
 gcc/testsuite/g++.dg/gomp/attrs-imperfect3.C  |   33 +
 gcc/testsuite/g++.dg/gomp/attrs-imperfect4.C  |   33 +
 gcc/testsuite/g++.dg/gomp/attrs-imperfect5.C  |   57 +
 gcc/testsuite/g++.dg/gomp/pr41967.C   |2 +-
 .../g++.dg/gomp/tpl-imperfect-gotos.C |  161 ++
 .../g++.dg/gomp/tpl-imperfect-invalid-scope.C |   

[PATCH V2 1/5] OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.

2023-07-23 Thread Sandra Loosemore
In order to detect invalid jumps in and out of intervening code in
imperfectly-nested loops, the front ends need to insert some sort of
marker to identify the structured block sequences that they push into
the inner body of the loop.  The error checking happens in the
diagnose_omp_blocks pass, between gimplification and OMP lowering, so
we need both GENERIC and GIMPLE representations of these markers.
They are removed in OMP lowering so no subsequent passes need to know
about them.

This patch doesn't include any front-end changes to generate the new
data structures.

gcc/cp/ChangeLog
* constexpr.cc (cxx_eval_constant_expression): Handle
OMP_STRUCTURED_BLOCK.
* pt.cc (tsubst_expr): Likewise.

gcc/ChangeLog
* doc/generic.texi (OpenMP): Document OMP_STRUCTURED_BLOCK.
* doc/gimple.texi (GIMPLE instruction set): Add
GIMPLE_OMP_STRUCTURED_BLOCK.
(GIMPLE_OMP_STRUCTURED_BLOCK): New subsection.
* gimple-low.cc (lower_stmt): Error on GIMPLE_OMP_STRUCTURED_BLOCK.
* gimple-pretty-print.cc (dump_gimple_omp_block): Handle
GIMPLE_OMP_STRUCTURED_BLOCK.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_structured_block): New.
* gimple.def (GIMPLE_OMP_STRUCTURED_BLOCK): New.
* gimple.h (gimple_build_omp_structured_block): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
(CASE_GIMPLE_OMP): Likewise.
* gimplify.cc (is_gimple_stmt): Handle OMP_STRUCTURED_BLOCK.
(gimplify_expr): Likewise.
* omp-expand.cc (GIMPLE_OMP_STRUCTURED_BLOCK): Error on
GIMPLE_OMP_STRUCTURED_BLOCK.
* omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_STRUCTURED_BLOCK.
(lower_omp_1): Likewise.
(diagnose_sb_1): Likewise.
(diagnose_sb_2): Likewise.
* tree-inline.cc (remap_gimple_stmt): Handle
GIMPLE_OMP_STRUCTURED_BLOCK.
(estimate_num_insns): Likewise.
* tree-nested.cc (convert_nonlocal_reference_stmt): Likewise.
(convert_local_reference_stmt): Likewise.
(convert_gimple_call): Likewise.
* tree-pretty-print.cc (dump_generic_node): Handle
OMP_STRUCTURED_BLOCK.
* tree.def (OMP_STRUCTURED_BLOCK): New.
* tree.h (OMP_STRUCTURED_BLOCK_BODY): New.
---
 gcc/cp/constexpr.cc|  1 +
 gcc/cp/pt.cc   |  1 +
 gcc/doc/generic.texi   | 14 ++
 gcc/doc/gimple.texi| 19 +++
 gcc/gimple-low.cc  |  5 +
 gcc/gimple-pretty-print.cc |  6 +-
 gcc/gimple-walk.cc |  1 +
 gcc/gimple.cc  | 15 +++
 gcc/gimple.def |  5 +
 gcc/gimple.h   |  3 +++
 gcc/gimplify.cc|  6 ++
 gcc/omp-expand.cc  |  5 +
 gcc/omp-low.cc | 11 +++
 gcc/tree-inline.cc |  6 ++
 gcc/tree-nested.cc |  3 +++
 gcc/tree-pretty-print.cc   |  4 
 gcc/tree.def   |  9 +
 gcc/tree.h |  3 +++
 18 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index cca0435bafc..a43e5d7f29d 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8055,6 +8055,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
 case OMP_SCAN:
 case OMP_SCOPE:
 case OMP_SECTION:
+case OMP_STRUCTURED_BLOCK:
 case OMP_MASTER:
 case OMP_MASKED:
 case OMP_TASKGROUP:
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d7d774fd9e5..303f72353c0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19692,6 +19692,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   break;
 
 case OMP_MASTER:
+case OMP_STRUCTURED_BLOCK:
   omp_parallel_combined_clauses = NULL;
   /* FALLTHRU */
 case OMP_SECTION:
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 3f9bddd7eae..c8d6ef062f6 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -2488,6 +2488,20 @@ In some cases, @code{OMP_CONTINUE} is placed right before
 occur right after the looping body, it will be emitted between
 @code{OMP_CONTINUE} and @code{OMP_RETURN}.
 
+@item OMP_STRUCTURED_BLOCK
+
+This is another statement that doesn't correspond to an OpenMP directive.
+It is used to mark sections of code in another directive that must
+be structured block sequences, in particular for sequences of intervening code
+in the body of an @code{OMP_FOR}.  It is not necessary to use this when the
+entire body of a directive is required to be a structured block sequence,
+since that is implicit in the representation of the corresponding node.
+
+This tree node is used only to allow error checking transfers of control
+in/out of the structured block sequence after gimplification.
+It has a single operand (@code{OMP_STRUCTURED_BLOCK_BODY}) that is
+the code within the structured 

[PATCH V2 4/5] OpenMP: New C/C++ testcases for imperfectly nested loops.

2023-07-23 Thread Sandra Loosemore
gcc/testsuite/ChangeLog
* c-c++-common/gomp/imperfect-attributes.c: New.
* c-c++-common/gomp/imperfect-badloops.c: New.
* c-c++-common/gomp/imperfect-blocks.c: New.
* c-c++-common/gomp/imperfect-extension.c: New.
* c-c++-common/gomp/imperfect-gotos.c: New.
* c-c++-common/gomp/imperfect-invalid-scope.c: New.
* c-c++-common/gomp/imperfect-labels.c: New.
* c-c++-common/gomp/imperfect-legacy-syntax.c: New.
* c-c++-common/gomp/imperfect-pragmas.c: New.
* c-c++-common/gomp/imperfect1.c: New.
* c-c++-common/gomp/imperfect2.c: New.
* c-c++-common/gomp/imperfect3.c: New.
* c-c++-common/gomp/imperfect4.c: New.
* c-c++-common/gomp/imperfect5.c: New.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/imperfect1.c: New.
* testsuite/libgomp.c-c++-common/imperfect2.c: New.
* testsuite/libgomp.c-c++-common/imperfect3.c: New.
* testsuite/libgomp.c-c++-common/imperfect4.c: New.
* testsuite/libgomp.c-c++-common/imperfect5.c: New.
* testsuite/libgomp.c-c++-common/imperfect6.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect1.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect2.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect3.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect4.c: New.
---
 .../c-c++-common/gomp/imperfect-attributes.c  |  81 
 .../c-c++-common/gomp/imperfect-badloops.c|  50 +
 .../c-c++-common/gomp/imperfect-blocks.c  |  75 
 .../c-c++-common/gomp/imperfect-extension.c   |  55 ++
 .../c-c++-common/gomp/imperfect-gotos.c   | 174 ++
 .../gomp/imperfect-invalid-scope.c|  77 
 .../c-c++-common/gomp/imperfect-labels.c  |  85 +
 .../gomp/imperfect-legacy-syntax.c|  44 +
 .../c-c++-common/gomp/imperfect-pragmas.c |  85 +
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |  38 
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |  34 
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |  33 
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |  33 
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |  57 ++
 .../libgomp.c-c++-common/imperfect1.c |  76 
 .../libgomp.c-c++-common/imperfect2.c | 114 
 .../libgomp.c-c++-common/imperfect3.c | 119 
 .../libgomp.c-c++-common/imperfect4.c | 117 
 .../libgomp.c-c++-common/imperfect5.c |  49 +
 .../libgomp.c-c++-common/imperfect6.c | 115 
 .../libgomp.c-c++-common/target-imperfect1.c  |  81 
 .../libgomp.c-c++-common/target-imperfect2.c  | 122 
 .../libgomp.c-c++-common/target-imperfect3.c  | 125 +
 .../libgomp.c-c++-common/target-imperfect4.c  | 122 
 24 files changed, 1961 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-attributes.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-badloops.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-blocks.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-extension.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-gotos.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-invalid-scope.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-labels.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-legacy-syntax.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect-pragmas.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect4.c

diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect-attributes.c 
b/gcc/testsuite/c-c++-common/gomp/imperfect-attributes.c
new file mode 100644
index 000..776295ce22a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect-attributes.c
@@ -0,0 +1,81 @@
+/* { dg-do compile { target { c || c++11 } 

[PATCH V2 0/5] OpenMP: support for imperfectly-nested loops

2023-07-23 Thread Sandra Loosemore
Here is the latest version of my imperfectly-nested loops patches.
Compared to the initial version I'd posted in April

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617103.html

this version includes many minor cosmetic fixes suggested by Jakub in
his initial review (also present in the version I committed to the
OG13 branch last month), many new test cases to cover various corner
cases, and code fixes so that C and C++ at least behave consistently
even if the spec is unclear.  The most intrusive of those fixes is
that I couldn't figure out how to make jumping between different
structured blocks of intervening code in the same OMP loop construct
produce errors without introducing new GENERIC and GIMPLE data
structures to represent a structured block without any other
associated OpenMP semantics; that's now part 1 of the patch series.

There are a few things from the review comments I haven't done anything
about:

* I left omp-api.h alone because the Fortran front end needs those
  declarations without everything else in omp-general.h.

* I didn't think I ought to be speculatively implementing extensions
  like allowing "do { ... } while (0);" in intervening code.  If it's
  really important for supporting macros, I suppose it will make it
  into a future version of the OpenMP spec.

* I didn't understand the comment about needing to add "#pragma omp
  ordered doacross(source) and sink" to the testcase for errors with
  the "ordered" clause.  Isn't that only for cross-iteration
  data dependencies?  There aren't any in that loop.  Also note that some
  of my new corner-case tests use the "ordered" clause to trigger an
  error to check that things are being correctly parsed as intervening
  code, so if there is something really bogus there that must be fixed,
  it now affects other test cases as well.

* Likewise I didn't know what to do with coming up with a better
  testcase for "scan".  I could not find an existing testcase with nested
  loops that I could just add intervening code to, and when I made
  another attempt to write a new one from scratch I quickly realized I
  couldn't do much better than the existing one, which Tobias had
  originally helped me with.

-Sandra


Sandra Loosemore (5):
  OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.
  OpenMP:  C front end support for imperfectly-nested loops
  OpenMP: C++ support for imperfectly-nested loops
  OpenMP: New C/C++ testcases for imperfectly nested loops.
  OpenMP: Fortran support for imperfectly-nested loops

 gcc/c-family/c-common.h   |1 +
 gcc/c-family/c-omp.cc |  151 ++
 gcc/c/c-parser.cc |  860 +++
 gcc/cp/constexpr.cc   |1 +
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1315 -
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |4 +-
 gcc/cp/semantics.cc   |  117 +-
 gcc/doc/generic.texi  |   14 +
 gcc/doc/gimple.texi   |   19 +
 gcc/fortran/gfortran.h|3 +
 gcc/fortran/openmp.cc |  765 --
 gcc/fortran/trans-stmt.cc |7 +-
 gcc/gimple-low.cc |5 +
 gcc/gimple-pretty-print.cc|6 +-
 gcc/gimple-walk.cc|1 +
 gcc/gimple.cc |   15 +
 gcc/gimple.def|5 +
 gcc/gimple.h  |3 +
 gcc/gimplify.cc   |6 +
 gcc/omp-api.h |   32 +
 gcc/omp-expand.cc |5 +
 gcc/omp-general.cc|  134 ++
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|  140 +-
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |   16 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |4 +-
 .../c-c++-common/gomp/imperfect-attributes.c  |   81 +
 .../c-c++-common/gomp/imperfect-badloops.c|   50 +
 .../c-c++-common/gomp/imperfect-blocks.c  |   75 +
 .../c-c++-common/gomp/imperfect-extension.c   |   55 +
 .../c-c++-common/gomp/imperfect-gotos.c   |  174 +++
 .../gomp/imperfect-invalid-scope.c|   77 +
 .../c-c++-common/gomp/imperfect-labels.c  |   85 ++
 .../gomp/imperfect-legacy-syntax.c|   44 +
 .../c-c++-common/gomp/imperfect-pragmas.c |   85 ++
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |   38 +
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |   34 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |   33 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |   33 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |   57 +
 gcc/testsuite/g++.dg/gomp/attrs-imperfect1.C  |   38 +
 

[PATCH V2 2/5] OpenMP: C front end support for imperfectly-nested loops

2023-07-23 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.

New common C/C++ testcases are in a separate patch.
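
For illustration only (this is not taken from the patch or its testsuite),
here is a minimal sketch of the kind of loop nest this enables.  The function
name sum_imperfect and its parameters are made up for the example; the
declaration of "row" between the two for headers is the intervening code.
When built without -fopenmp the pragma is ignored and the function computes
the same sum serially.

```c
#include <stddef.h>

/* Hypothetical example: sum an N-by-M row-major matrix with a collapsed,
   imperfectly-nested loop.  The declaration of ROW sits between the two
   loop headers, so it is "intervening code" in OpenMP 5.0 terms.  */
long
sum_imperfect (const int *a, size_t n, size_t m)
{
  long total = 0;

#pragma omp parallel for collapse(2) reduction(+:total)
  for (size_t i = 0; i < n; i++)
    {
      const int *row = a + i * m;	/* intervening code */
      for (size_t j = 0; j < m; j++)
	total += row[j];
    }

  return total;
}
```

Note that the inner loop bound uses only m, not a variable declared in the
intervening code, and that the intervening statement has no side effects, so
it does not matter how often it is executed.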

gcc/c-family/ChangeLog
* c-common.h (c_omp_check_loop_binding_exprs): Declare.
* c-omp.cc: Include tree-iterator.h.
(find_binding_in_body): New.
(check_loop_binding_expr_r): New.
(LOCATION_OR): New.
(check_loop_binding_expr): New.
(c_omp_check_loop_binding_exprs): New.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(check_omp_intervening_code): New.
(add_structured_block_stmt): New.
(c_parser_compound_statement_nostart): Recognize intervening code,
nested loops, and other things that need special handling in
OpenMP loop constructs.
(c_parser_while_statement): Error on loop in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Error on calls to
the OpenMP runtime in intervening code.
(c_parser_pragma): Error on OpenMP pragmas in intervening code.
(c_parser_omp_loop_nest): New.
(c_parser_omp_for_loop): Rewrite to use recursive descent, calling
c_parser_omp_loop_nest to do the heavy lifting.

gcc/ChangeLog
* omp-api.h: New.
* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Update for new C error behavior.
* c-c++-common/goacc/tile-2.c: Likewise.
* gcc.dg/gomp/collapse-1.c: Likewise.
---
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.cc | 151 +++
 gcc/c/c-parser.cc | 860 +-
 gcc/omp-api.h |  32 +
 gcc/omp-general.cc| 134 +++
 gcc/omp-general.h |   1 +
 gcc/omp-low.cc| 129 ---
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |  16 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |   4 +-
 gcc/testsuite/gcc.dg/gomp/collapse-1.c|  10 +-
 10 files changed, 942 insertions(+), 396 deletions(-)
 create mode 100644 gcc/omp-api.h

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..a05a4b54e1c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1296,6 +1296,7 @@ extern tree c_finish_omp_for (location_t, enum tree_code, 
tree, tree, tree,
 extern bool c_omp_check_loop_iv (tree, tree, walk_tree_lh);
 extern bool c_omp_check_loop_iv_exprs (location_t, enum tree_code, tree, int,
   tree, tree, tree, walk_tree_lh);
+extern bool c_omp_check_loop_binding_exprs (tree, vec<tree> *);
 extern tree c_finish_oacc_wait (location_t, tree, tree);
 extern tree c_oacc_split_loop_clauses (tree, tree *, bool);
 extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 4faddb00bbc..9b7d7f789e3 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "langhooks.h"
 #include "bitmap.h"
+#include "tree-iterator.h"
 
 
 /* Complete a #pragma oacc wait construct.  LOC is the location of
@@ -1728,6 +1729,156 @@ c_omp_check_loop_iv_exprs (location_t stmt_loc, enum 
tree_code code,
   return !data.fail;
 }
 
+
+/* Helper function for c_omp_check_loop_binding_exprs: look for a binding
+   of DECL in BODY.  Only traverse things that might be containers for
+   intervening code in an OMP loop.  Returns the BIND_EXPR or DECL_EXPR
+   if found, otherwise null.  */
+
+static tree
+find_binding_in_body (tree decl, tree body)
+{
+  if (!body)
+return NULL_TREE;
+
+  switch (TREE_CODE (body))
+{
+case BIND_EXPR:
+  for (tree b = BIND_EXPR_VARS (body); b; b = DECL_CHAIN (b))
+   if (b == decl)
+ return body;
+  return find_binding_in_body (decl, BIND_EXPR_BODY (body));
+
+case DECL_EXPR:
+  if (DECL_EXPR_DECL (body) == decl)
+   return body;
+  return NULL_TREE;
+
+case STATEMENT_LIST:
+  for (tree_stmt_iterator si = tsi_start (body); !tsi_end_p (si);
+  tsi_next ())
+ 

[Bug tree-optimization/110780] aarch64 NEON redundant displaced ld3

2023-07-23 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|target                      |tree-optimization
   Last reconfirmed|                            |2023-07-23
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
The vectorizer produces:
  vectp_pBE.28_115 = (unsigned char[48] *) ivtmp.73_338;
  _218 = ivtmp.68_335 + 1;
  vectp_pCSI2.19_107 = (unsigned char[48] *) _218;
  vectp_pCSI2.10_98 = (unsigned char[48] *) ivtmp.68_335;
  vect_array.12 = .LOAD_LANES (MEM  [(uint8_t
*)vectp_pCSI2.10_98]);
  vect__1.13_100 = vect_array.12[0];
  vect__1.14_101 = vect_array.12[1];
  vect__1.15_102 = vect_array.12[2];
  vect_array.12 ={v} {CLOBBER};
  vect__4.16_103 = vect__1.14_101 >> 4;
  vect__22.17_104 = vect__1.15_102 << 4;
  vect__5.18_105 = vect__4.16_103 | vect__22.17_104;
  vect_array.21 = .LOAD_LANES (MEM  [(uint8_t
*)vectp_pCSI2.19_107]);
  vect__6.22_109 = vect_array.21[0];
  vect__6.23_110 = vect_array.21[1];
  vect_array.21 ={v} {CLOBBER};

Here vect__6.22_109 is the same as vect__1.14_101 and vect__6.23_110 is the
same as vect__1.15_102 (if I did this correctly).

[Bug target/110780] New: aarch64 NEON redundant displaced ld3

2023-07-23 Thread nate at thatsmathematics dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

Bug ID: 110780
   Summary: aarch64 NEON redundant displaced ld3
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nate at thatsmathematics dot com
  Target Milestone: ---

Compile the following with gcc 14.0.0 20230723 on aarch64 with -O3:

#include <stdint.h>
void CSI2toBE12(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
while (pCSI2 < pCSI2LineEnd) {
pBE[0] = pCSI2[0];
pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
pCSI2 += 3;
pBE += 3;
}
}

Godbolt: https://godbolt.org/z/WshTPKzY5

In the inner loop (.L5 of the godbolt asm) we have

ld3 {v25.16b - v27.16b}, [x3]
add x6, x3, 1
// no intervening stores
ld3 {v25.16b - v27.16b}, [x6]

The second load is redundant.  v25, v26 are the same as what was already in
v26, v27 respectively.  The value loaded into v27 is new but it is not used in
the subsequent code.

This might also account for some extra later complexity, because it means that
the last 48 bytes of the input can't be handled by this loop (or else the
second load would be out of bounds by one byte) and so must be handled
specially.
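
The redundancy claim can be checked with a scalar model of the two loads.
This is only a sketch: ld3_model and displaced_ld3_is_redundant are names
made up here, and the model assumes the usual ld3 behaviour of
de-interleaving 48 consecutive bytes into three 16-byte registers.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16

/* Scalar model of "ld3 {v0.16b - v2.16b}, [p]": de-interleave 48 bytes
   so that out[k][i] == p[3 * i + k].  */
void
ld3_model (const uint8_t *p, uint8_t out[3][LANES])
{
  for (int i = 0; i < LANES; i++)
    for (int k = 0; k < 3; k++)
      out[k][i] = p[3 * i + k];
}

/* Return 1 if the first two registers loaded from P + 1 equal the last
   two registers loaded from P.  P must point to at least 49 bytes.  */
int
displaced_ld3_is_redundant (const uint8_t *p)
{
  uint8_t a[3][LANES], b[3][LANES];

  ld3_model (p, a);
  ld3_model (p + 1, b);		/* reads p[1] .. p[48] */

  return memcmp (b[0], a[1], LANES) == 0
	 && memcmp (b[1], a[2], LANES) == 0;
}
```

The displaced load touches p[48], one byte past the 48 bytes read by the
first load, which is the out-of-bounds concern mentioned above.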

[Bug modula2/110779] SysClock can not read the clock

2023-07-23 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110779

Gaius Mulley  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-07-23
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED

--- Comment #1 from Gaius Mulley  ---
$ gm2 -g TetSysClock.mod
$ ./a.out 

 clock can NOT be read

confirmed - many thanks for the bug report - will fix!

[Bug modula2/110779] New: SysClock can not read the clock

2023-07-23 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110779

Bug ID: 110779
   Summary: SysClock can not read the clock
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: gaius at gcc dot gnu.org
  Target Milestone: ---

Relayed from the gm2 mailing list:

Could you please do a quick check?  My versions of the compiler now
always seem to return "FALSE" for SysClock.CanGetClock().

I just noticed this when I updated a test routine and saw that I was
always testing against the same set of data: the random number
generator uses a seed that depends on the system time.


-

Short check routine:

MODULE TstSysClock;

IMPORT SysClock;
IMPORT STextIO;

BEGIN
   STextIO.WriteLn;
   STextIO.WriteLn;
   IF SysClock.CanGetClock() THEN
 STextIO.WriteString(" clock can be read");
   ELSE
 STextIO.WriteString(" clock can NOT be read");
   END;
   STextIO.WriteLn;
   STextIO.WriteLn;
END TstSysClock.

[Bug fortran/98433] double free detected in tcache 2, after merge of structures

2023-07-23 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98433

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code

--- Comment #8 from anlauf at gcc dot gnu.org ---
We are now generating a shallow copy instead of a deep copy for the line:

x3 = merge(x1, x2, .false.)

This gives:

  {
struct t D.4345;
struct t D.4346;
struct t D.4347;

D.4345 = x1;
D.4346 = x2;
D.4347 = x3;
x3 = D.4346;
if ((real(kind=4)[0:] * restrict) D.4347.v.data != 0B)
  {
__builtin_free ((void *) D.4347.v.data);
(real(kind=4)[0:] * restrict) D.4347.v.data = 0B;
  }
  }

while e.g. the assignment x3 = x2 produces:

...
x3 = x2;
if ((void *) x2.v.data != 0B)
  {
D.4346 = (x2.v.dim[0].ubound - x2.v.dim[0].lbound) + 1;
D.4347 = NON_LVALUE_EXPR ;
D.4348 = (void * restrict) __builtin_malloc (MAX_EXPR <(unsigned
long) (D.4347 * 4), 1>);
x3.v.data = D.4348;
__builtin_memcpy ((real(kind=4)[0:] * restrict) x3.v.data,
(real(kind=4)[0:] * restrict) x2.v.data, (unsigned long) (D.4347 * 4));
  }
else
  {
x3.v.data = 0B;
  }
...
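
As a rough C analogy of the two dumps above (a sketch only: struct t here is
a simplified stand-in for the Fortran type, and copy_shallow / copy_deep are
names made up for the example), the difference is whether the destination
ends up owning its own buffer:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a derived type with one allocatable component.  */
struct t
{
  float *data;			/* NULL when unallocated */
  size_t n;			/* number of elements */
};

/* Shallow copy: DST->data aliases SRC->data afterwards, so freeing both
   objects later would free the same buffer twice.  */
void
copy_shallow (struct t *dst, const struct t *src)
{
  *dst = *src;
}

/* Deep copy: DST gets its own buffer.  Any buffer DST owned before is
   released.  Returns 0 on success, -1 if allocation fails.  */
int
copy_deep (struct t *dst, const struct t *src)
{
  float *fresh = NULL;

  if (src->data != NULL)
    {
      size_t bytes = src->n * sizeof (float);
      fresh = malloc (bytes ? bytes : 1);
      if (fresh == NULL)
	return -1;
      memcpy (fresh, src->data, bytes);
    }

  free (dst->data);		/* free (NULL) is a no-op */
  dst->data = fresh;
  dst->n = src->n;
  return 0;
}
```

After copy_shallow the two objects share one allocation; after copy_deep each
can be freed independently.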

[Bug c++/72825] ICE on invalid C++ code on x86_64-linux-gnu (internal compiler error: tree check: expected array_type, have error_mark in array_ref_low_bound, at tree.c:13013)

2023-07-23 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72825

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

[Bug c/110699] [12/13/14 Regression] internal compiler error: tree check: expected array_type, have error_mark in array_ref_low_bound, at tree.cc:12754

2023-07-23 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110699

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.4                        |14.0

[PATCH v1] RISC-V: Bugfix for allowing incorrect dyn for static rounding

2023-07-23 Thread Pan Li via Gcc-patches
From: Pan Li 

According to the spec, the dyn rounding mode is invalid for RVV
floating-point; this patch fixes that by rejecting it.
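
The effect of the change can be sketched in plain C.  This is not the GCC
implementation: frm_ok_before and frm_ok_after are hypothetical helpers, and
the numeric values 0, 4 and 7 are assumptions read off the "range [0, 4] or 7"
wording in the testsuite diagnostics.

```c
#include <stdbool.h>

/* Assumed values, inferred from the diagnostics in the updated tests.  */
enum
{
  FRM_STATIC_MIN = 0,
  FRM_STATIC_MAX = 4,
  FRM_DYN = 7
};

/* Old behaviour: a static rounding mode or dyn was accepted.  */
bool
frm_ok_before (int frm)
{
  return (frm >= FRM_STATIC_MIN && frm <= FRM_STATIC_MAX) || frm == FRM_DYN;
}

/* New behaviour: only the static rounding modes are accepted.  */
bool
frm_ok_after (int frm)
{
  return frm >= FRM_STATIC_MIN && frm <= FRM_STATIC_MAX;
}
```

With this model, 7 passes the old check but fails the new one, while 5, 6
and 8 are rejected by both.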

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): Take range check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-error.c: Update cases.
* gcc.target/riscv/rvv/base/float-point-frm-insert-6.c: Removed.
---
 .../riscv/riscv-vector-builtins-shapes.cc |  3 +-
 .../riscv/rvv/base/float-point-frm-error.c|  6 ++--
 .../riscv/rvv/base/float-point-frm-insert-6.c | 33 ---
 3 files changed, 4 insertions(+), 38 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
index 69a67106418..22b5fe256df 100644
--- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
@@ -285,8 +285,7 @@ struct alu_frm_def : public build_base
   {
unsigned int frm_num = c.arg_num () - 2;
 
-   return c.require_immediate_range_or (frm_num, FRM_STATIC_MIN,
-FRM_STATIC_MAX, FRM_DYN);
+   return c.require_immediate (frm_num, FRM_STATIC_MIN, FRM_STATIC_MAX);
   }
 
 return true;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
index 4ebaa15ab0b..01d82d4e661 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
@@ -7,9 +7,9 @@ typedef float float32_t;
 
 void test_float_point_frm_error (float32_t *out, vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
 {
-  vfloat32m1_t v1 = __riscv_vfadd_vv_f32m1_rm (op1, op2, 5, vl); /* { dg-error {passing 5 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\] or 7} } */
-  vfloat32m1_t v2 = __riscv_vfadd_vv_f32m1_rm (v1, v1, 6, vl);   /* { dg-error {passing 6 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\] or 7} } */
-  vfloat32m1_t v3 = __riscv_vfadd_vv_f32m1_rm (v2, v2, 8, vl);   /* { dg-error {passing 8 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\] or 7} } */
+  vfloat32m1_t v1 = __riscv_vfadd_vv_f32m1_rm (op1, op2, 5, vl); /* { dg-error {passing 5 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\]} } */
+  vfloat32m1_t v2 = __riscv_vfadd_vv_f32m1_rm (v1, v1, 6, vl);   /* { dg-error {passing 6 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\]} } */
+  vfloat32m1_t v3 = __riscv_vfadd_vv_f32m1_rm (v2, v2, 8, vl);   /* { dg-error {passing 8 to argument 3 of '__riscv_vfadd_vv_f32m1_rm', which expects a value in the range \[0, 4\]} } */
 
   __riscv_vse32_v_f32m1 (out, v3, vl);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c
deleted file mode 100644
index 1ef0e015d8f..000
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-insert-6.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
-
-#include "riscv_vector.h"
-
-typedef float float32_t;
-
-vfloat32m1_t
-test_riscv_vfadd_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
-  return __riscv_vfadd_vv_f32m1_rm (op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vv_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
-size_t vl) {
-  return __riscv_vfadd_vv_f32m1_m_rm(mask, op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vf_f32m1_rm(vfloat32m1_t op1, float32_t op2, size_t vl) {
-  return __riscv_vfadd_vf_f32m1_rm(op1, op2, 7, vl);
-}
-
-vfloat32m1_t
-test_vfadd_vf_f32m1_m_rm(vbool32_t mask, vfloat32m1_t op1, float32_t op2,
-size_t vl) {
-  return __riscv_vfadd_vf_f32m1_m_rm(mask, op1, op2, 7, vl);
-}
-
-/* { dg-final { scan-assembler-times {vfadd\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
-/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
-/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */
-/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */
-- 
2.34.1



[PATCH v5] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-23 Thread Pan Li via Gcc-patches
From: Pan Li 

The basic dynamic rounding mode support simply ignores call instructions.
This patch takes care of calls.

During the call, the frm may be updated or kept as is. Thus, we must
ensure at least two things.

1. The static frm before the call should not pollute the frm value in the call.
2. The updated frm value in the call should be sticky after the call completes.

We perform the following steps to make this happen.

1. Mark call instruction with new mode DYN_CALL.
2. Mark the instruction after CALL from NONE to DYN.
3. When emitting for a DYN_CALL, we restore the frm value.
4. When emitting from a DYN_CALL, we back up the frm value.

Consider the following flow as an example.

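                 +-------------+
                 | Entry (DYN) | <- frrm a5
                 +-------------+
                /               \
    +-----------+               +-----------+
    | VFADD     |               | VFADD RTZ | <- fsrmi 1(RTZ)
    +-----------+               +-----------+
          |                           |
    +-----------+               +-----------+
    | CALL      |               | CALL      | <- fsrm a5
    +-----------+               +-----------+
          |                           |
    +-----------+               +-----------+
    | SHIFT     | <- frrm a5    | VFADD     | <- frrm a5
    +-----------+               +-----------+
          |                          /
    +-----------+                   /
    | VFADD RUP | <- fsrmi 3(RUP)  /
    +-----------+                 /
             \                   /
               +-----------------+
               | Exit (DYN_EXIT) | <- fsrm a5
               +-----------------+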

When a call is the last insn of a bb, we handle it, where needed, by
inserting one frm backup (frrm) insn at the end of the current bb.

Signed-off-by: Pan Li 
Co-Authored-By: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv.cc (DYNAMIC_FRM_RTL): New macro.
(STATIC_FRM_P): Ditto.
(struct mode_switching_info): New struct for mode switching.
(struct machine_function): Add new field mode switching.
(riscv_emit_frm_mode_set): Add DYN_CALL emit.
(riscv_frm_adjust_mode_after_call): New function for call mode.
(riscv_frm_reconcile_call_as_bb_end): New function for call as
the last insn of bb.
(riscv_frm_mode_needed): New function for frm mode needed.
(riscv_mode_needed): Extract function for frm.
(riscv_frm_mode_after): Add DYN_CALL after.
(riscv_mode_entry): Remove backup rtl initialization.
* config/riscv/vector.md (frm_mode): Add dyn_call.
(fsrmsi_restore_exit): Rename to _volatile.
(fsrmsi_restore_volatile): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-insert-7.c: Adjust
test cases.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-33.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-34.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-35.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-36.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-37.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-38.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-39.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-40.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-41.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-42.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-43.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-44.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-45.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-47.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-48.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-49.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-50.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-51.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-52.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-53.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-54.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-55.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-56.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-57.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-58.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-59.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-60.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-61.c: New test.
* gcc.target/riscv/rvv/base/float-point-dynamic-frm-62.c: New test.
   

[Bug c++/72825] ICE on invalid C++ code on x86_64-linux-gnu (internal compiler error: tree check: expected array_type, have error_mark in array_ref_low_bound, at tree.c:13013)

2023-07-23 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72825

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle  ---
This issue has been fixed on mainline (for GCC 14), by the patch for PR 110699.

[Bug c/109598] [12/13/14 Regression] ICE: tree check: expected array_type, have error_mark in array_ref_low_bound, at tree.cc

2023-07-23 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109598

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle  ---
This issue has been fixed on mainline (for GCC 14), by the patch for PR 110699.

[Bug c/110699] [12/13/14 Regression] internal compiler error: tree check: expected array_type, have error_mark in array_ref_low_bound, at tree.cc:12754

2023-07-23 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110699

Roger Sayle  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #3 from Roger Sayle  ---
This issue has been fixed on mainline for GCC 14.

[r14-2655 Regression] FAIL: g++.dg/gomp/pr58567.C -std=c++98 (test for excess errors) on Linux/x86_64

2023-07-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

92d1425ca7804000cfe8aa635cf363a87d362d75 is the first bad commit
commit 92d1425ca7804000cfe8aa635cf363a87d362d75
Author: Patrick Palka 
Date:   Wed Jul 19 16:10:20 2023 -0400

c++: redundant targ coercion for var/alias tmpls

caused

FAIL: g++.dg/gomp/pr58567.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/gomp/pr58567.C  -std=c++17 (test for excess errors)
FAIL: g++.dg/gomp/pr58567.C  -std=c++20 (test for excess errors)
FAIL: g++.dg/gomp/pr58567.C  -std=c++98 (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2655/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/pr58567.C --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/pr58567.C --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/pr58567.C --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="gomp.exp=g++.dg/gomp/pr58567.C --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


[Bug middle-end/110750] [og13] nvptx offloading FAILs 'libgomp.c-c++-common/target-implicit-map-4.c' execution test

2023-07-23 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110750

--- Comment #1 from Thomas Schwinge  ---
... but it *does not* FAIL in my powerpc64le testing with Nvidia Tesla V100
(but *does* FAIL with powerpc64le Nvidia Quadro GV100 in the same way as it
does for x86_64 originally reported here).

Re: [PATCH] Fix 100864: `(a&!b) | b` is not optimized to `a | b` for comparisons

2023-07-23 Thread Richard Biener via Gcc-patches



> On 23.07.2023 at 01:27, Andrew Pinski via Gcc-patches wrote:
> 
> This adds a special case of the `(a&~b) | b` pattern where
> `b` and `~b` are comparisons.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Don’t we have an existing match for inversions that we could amend?

> gcc/ChangeLog:
> 
>PR tree-optimization/100864
>* match.pd ((~x & y) | x -> x | y): Add comparison variant.
> 
> gcc/testsuite/ChangeLog:
> 
>* gcc.dg/tree-ssa/bitops-3.c: New test.
> ---
> gcc/match.pd | 17 +-
> gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 
> 2 files changed, 83 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index bfd15d6cd4a..dd4a2df537d 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1928,7 +1928,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* (~x & y) | x -> x | y */
>  (simplify
>   (bitop:c (rbitop:c (bit_not @0) @1) @0)
> -  (bitop @0 @1)))
> +  (bitop @0 @1))
> + /* Similar but for comparisons which have been inverted already,
> +Note it is hard to simulate the inverted tcc_comparison due
> +NaNs; That is == and != are sometimes inversions and sometimes not.
> +So a double for loop is needed and then compare the inverse code
> +with the result of invert_tree_comparison is needed.
> +This works fine for vector compares as -1 and 0 are bitwise
> +inverses.  */
> + (for cmp (tcc_comparison)
> +  (for icmp (tcc_comparison)
> +   (simplify
> +(bitop:c (rbitop:c (icmp @0 @1) @2) (cmp@3 @0 @1))
> + (with { enum tree_code ic = invert_tree_comparison
> + (cmp, HONOR_NANS (@0)); }
> +  (if (ic == icmp)
> +   (bitop @3 @2)))
> 
> /* ((x | y) & z) | x -> (z & y) | x */
> (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> new file mode 100644
> index 000..68fff4edce9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> @@ -0,0 +1,67 @@
> +/* PR tree-optimization/100864 */
> +
> +/* { dg-do run } */
> +/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
> +
> +#define op_ne !=
> +#define op_eq ==
> +#define op_lt <
> +#define op_le <=
> +#define op_gt >
> +#define op_ge >=
> +
> +#define operators(t) \
> +t(ne) \
> +t(eq) \
> +t(lt) \
> +t(le) \
> +t(gt) \
> +t(ge)
> +
> +#define cmpfunc(v, op) \
> +__attribute__((noipa)) \
> +_Bool func_##op##_##v(v int a, v int b, v _Bool e) \
> +{ \
> +  v _Bool c = (a op_##op b); \
> +  v _Bool d = !c; \
> +  return (e & d) | c; \
> +}
> +
> +#define cmp_funcs(op) \
> +cmpfunc(, op) \
> +cmpfunc(volatile , op)
> +
> +operators(cmp_funcs)
> +
> +#define test(op) \
> +if (func_##op##_ (a, b, e) != func_##op##_volatile (a, b, e)) \
> + __builtin_abort();
> + 
> +int main()
> +{
> +  for(int a = -3; a <= 3; a++)
> +for(int b = -3; b <= 3; b++)
> +  {
> +_Bool e = 0;
> +operators(test)
> +e = 1;
> +operators(test)
> +  }
> +  return 0;
> +}
> +
> +/* Check to make sure we optimize `(a&!b) | b` -> `a | b`. */
> +/* There are 6 different comparison operators testing here. */
> +/* bit_not_expr and bit_and_expr should show up for each one (volatile). */
> +/* Each operator should show up twice
> +   (except for `!=` which shows up 2*6 (each tester) + 2 (the 2 loops) extra = 16). */
> +/* bit_ior_expr will show up for each operator twice (non-volatile and volatile). */
> +/* { dg-final { scan-tree-dump-times "ne_expr,"  16 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "eq_expr,"   2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "lt_expr,"   2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "le_expr,"   2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "gt_expr,"   2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "ge_expr,"   2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "bit_not_expr,"  6 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "bit_and_expr,"  6 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "bit_ior_expr," 12 "optimized"} } */
> \ No newline at end of file
> -- 
> 2.31.1
> 


Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2023-07-23 Thread Andreas Schwab
On Jul 22 2023, Andrew Pinski via Gcc-patches wrote:

> The problem is that -fasynchronous-unwind-tables is on by default for riscv linux.
> We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> to .eh_frame data from crtbeginT.o instead of the user-defined object
> during static linking.
>
> This turns it off.

Since this is a recurring problem, and difficult to notice (see how long
the aarch64 case went unnoticed), it should be fixed generically,
instead of having to patch every case separately.

> diff --git a/libgcc/config/riscv/t-crtstuff b/libgcc/config/riscv/t-crtstuff
> new file mode 100644
> index 000..685d11b3e66
> --- /dev/null
> +++ b/libgcc/config/riscv/t-crtstuff
> @@ -0,0 +1,5 @@
> +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv linux
> +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> +# during static linking.
> +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables

What about CRTSTUFF_T_CFLAGS_S?

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."