Re: Re: [PATCH 3/5] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-07-26 Thread Xiao Zeng
On Wed, Jul 26, 2023 at 01:55:00 AM Andreas Schwab wrote:
>
>On Jul 19 2023, Xiao Zeng wrote:
>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 38d8eb2fcf5..7e6b24bd232 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -2448,6 +2448,17 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
>> outer_code, int opno ATTRIBUTE_UN
>>    *total = COSTS_N_INSNS (1);
>>    return true;
>>  }
>> +  else if (TARGET_ZICOND && outer_code == SET &&
>> +   ((GET_CODE (XEXP (x, 1)) == REG && XEXP (x, 2) == 
>> const0_rtx) ||
>> +   (GET_CODE (XEXP (x, 2)) == REG && XEXP (x, 1) == const0_rtx) 
>> ||
>> +   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
>> +    XEXP (x, 1) == XEXP (XEXP (x, 0), 0)) ||
>> +   (GET_CODE (XEXP (x, 1)) == REG && GET_CODE (XEXP (x, 2)) &&
>> +    XEXP (x, 2) == XEXP (XEXP (x, 0), 0
>
>Line breaks before the operator, not after.
>
>--
>Andreas Schwab, sch...@linux-m68k.org
>GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
>"And now for something completely different." 

Thank you for pointing out the code formatting issue. I will fix it in the
next version of the patch.
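
For readers unfamiliar with the convention Andreas mentions: the GNU coding style breaks a long condition before a binary operator, so each continuation line starts with the operator. A minimal sketch on a stand-alone predicate follows; the `Operand` struct and `one_arm_is_zero` function are hypothetical illustrations, not part of the patch or of GCC.

```cpp
// Hypothetical stand-in for an operand: either a register or a constant.
struct Operand
{
  bool is_reg;  // true for a register operand
  long value;   // constant value, ignored for registers
};

// Return true when exactly one arm of a select is a register and the
// other arm is the constant zero.
bool
one_arm_is_zero (const Operand &a, const Operand &b)
{
  // GNU style: the line break comes before '&&' and '||', not after them.
  return ((a.is_reg && !b.is_reg && b.value == 0)
          || (b.is_reg && !a.is_reg && a.value == 0));
}
```

With this layout the operators line up at the start of each continuation line, which makes the structure of a long condition easier to scan.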

Re: [PATCH] c++: Fix ICE with parameter pack of decltype(auto) [PR103497]

2023-07-26 Thread Jason Merrill via Gcc-patches

On 6/30/23 03:05, Nathaniel Shead wrote:

On Thu, Jun 29, 2023 at 01:43:07PM -0400, Jason Merrill wrote:

On 6/24/23 09:24, Nathaniel Shead wrote:

On Fri, Jun 23, 2023 at 11:59:51AM -0400, Patrick Palka wrote:

Hi,

On Sat, 22 Apr 2023, Nathaniel Shead via Gcc-patches wrote:


Bootstrapped and tested on x86_64-pc-linux-gnu.

-- 8< --

This patch raises an error early when the decltype(auto) specifier is
used as a parameter of a function. This prevents any issues with an
unexpected tree type later on when performing the call.


Thanks very much for the patch!  Some minor comments below.



PR 103497


We should include the bug component name when referring to the PR in the
commit message (i.e. PR c++/103497) so that upon pushing the patch the
post-commit hook automatically adds a comment to the PR referring to the
commit.  I could be wrong but AFAIK the hook only performs this when the
component name is included.


Thanks for the review! Fixed.



gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_type_specifier): Add check for
decltype(auto) as function parameter.

gcc/testsuite/ChangeLog:

* g++.dg/pr103497.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/parser.cc| 10 ++
   gcc/testsuite/g++.dg/pr103497.C |  7 +++
   2 files changed, 17 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/pr103497.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index e5f032f2330..1415e07e152 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19884,6 +19884,16 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 && cp_lexer_peek_nth_token (parser->lexer, 2)->type != CPP_SCOPE)
   {
 type = saved_checks_value (token->u.tree_check_value);
+  /* Within a function parameter declaration, decltype(auto) is always an
+error.  */
+  if (parser->auto_is_implicit_function_template_parm_p
+ && TREE_CODE (type) == TEMPLATE_TYPE_PARM


We could check is_auto (type) here instead, to avoid any confusion with
checking AUTO_IS_DECLTYPE for a non-auto TEMPLATE_TYPE_PARM.


+ && AUTO_IS_DECLTYPE (type))
+   {
+ error_at (token->location,
+   "cannot declare a parameter with %<decltype(auto)%>");
+ type = error_mark_node;
+   }
 if (decl_specs)
{
  cp_parser_set_decl_spec_type (decl_specs, type,
diff --git a/gcc/testsuite/g++.dg/pr103497.C b/gcc/testsuite/g++.dg/pr103497.C
new file mode 100644
index 000..bcd421c2907
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr103497.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++14 } }
+
+void foo(decltype(auto)... args);  // { dg-error "parameter with 
.decltype.auto..|no parameter packs" }


I noticed for

void foo(decltype(auto) arg);

we already issue an identical error from grokdeclarator.  Perhaps we could
instead extend the error handling there to detect decltype(auto)... as well,
rather than adding new error handling in cp_parser_simple_type_specifier?
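
As background for the discussion above, here is a small sketch of where `decltype(auto)` is allowed and where it is not. The function names are made up for illustration; the two rejected declarations are shown only as comments, since they are the ill-formed forms this thread is about.

```cpp
#include <type_traits>

int global_value = 42;
int &get_ref () { return global_value; }

// OK: as a return type, decltype(auto) keeps the reference.
decltype(auto) keep_ref () { return get_ref (); }

// OK: plain auto decays the reference to a value.
auto drop_ref () { return get_ref (); }

// Ill-formed, and the subject of this thread:
//   void foo (decltype(auto) arg);
//   void foo (decltype(auto)... args);

// A variadic function is written with an ordinary template parameter pack.
template <typename... Ts>
int count_args (Ts &&...) { return static_cast<int> (sizeof... (Ts)); }

static_assert (std::is_same<decltype (keep_ref ()), int &>::value,
               "decltype(auto) preserves the reference");
static_assert (std::is_same<decltype (drop_ref ()), int>::value,
               "auto drops the reference");
```

The sketch needs C++14 or later, matching the `target c++14` selector used in the testcase.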


Ah thanks, I didn't notice this; this simplifies the change a fair bit.
How about this patch instead?

Regtested on x86_64-pc-linux-gnu.

-- 8< --

This patch ensures that checks for usages of 'auto' in function
parameters also consider parameter packs, since 'type_uses_auto' does
not seem to consider this case.

PR c++/103497

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Check for decltype(auto) in
parameter pack.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/decl.cc| 3 +++
   gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C | 8 
   2 files changed, 11 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 60f107d50c4..aaf691fce68 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14044,6 +14044,9 @@ grokdeclarator (const cp_declarator *declarator,
error ("cannot use %<::%> in parameter declaration");
 tree auto_node = type_uses_auto (type);
+  if (!auto_node && parameter_pack_p)
+   auto_node = type_uses_auto (PACK_EXPANSION_PATTERN (type));


Hmm, I wonder if type_uses_auto should look into PACK_EXPANSION_PATTERN
itself.  Would that break anything?


I gave that a try and it seems to work fine.

Regtested on x86_64-pc-linux-gnu.


Pushed, thanks.


-- 8< --

This patch ensures 'type_uses_auto' also checks for usages of 'auto' in
parameter packs.

PR c++/103497

gcc/cp/ChangeLog:

* pt.cc (type_uses_auto): Check inside parameter packs.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/decltype-auto-103497.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc  | 7 ++-
  gcc/testsuite/g++.dg/cpp1y/decltype-auto-103497.C | 8 
  2 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 

RE: [PATCH] Replace invariant ternlog operands

2023-07-26 Thread Liu, Hongtao via Gcc-patches



> -----Original Message-----
> From: Yan Simonaytes 
> Sent: Wednesday, July 26, 2023 2:11 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Uros Bizjak ;
> Yan Simonaytes 
> Subject: [PATCH] Replace invariant ternlog operands
> 
> Sometimes GCC generates ternlog with three operands, but some of them are
> invariant, i.e. the result does not depend on them.
> For example:
> 
> vpternlogq $252, %zmm2, %zmm1, %zmm0
> 
> In this case the zmm2 register isn't used by ternlog.
> So we should replace zmm2 with zmm0 or zmm1:
> 
> vpternlogq $252, %zmm0, %zmm1, %zmm0
> 
> When the third operand of ternlog is memory and both others are invariant,
> we should add a load instruction from this memory to a register and replace
> the first and the second operands with this register.
> So instead of
> 
> vpternlogq $85, (%rdi), %zmm1, %zmm0
> 
> we should emit
> 
> vmovdqa64 (%rdi), %zmm0
> vpternlogq $85, %zmm0, %zmm0, %zmm0
> 
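
The invariance test described above can be checked outside the compiler. The sketch below assumes the usual ternlog encoding, in which bit `(a << 2) | (b << 1) | c` of the immediate is the result for inputs a (first operand), b (second) and c (third). The function names are illustrative and operate on a plain integer rather than on GCC's `rtx` operands. `invariant_mask` uses the same three comparisons as the patch; `invariant_mask_bruteforce` derives the same answer directly from the truth table.

```cpp
#include <cstdint>

// Result of the three-input boolean function encoded by imm8 for inputs
// a, b, c (assumed encoding: index = a*4 + b*2 + c).
constexpr bool ternlog_bit (std::uint8_t imm8, bool a, bool b, bool c)
{
  return (imm8 >> ((a << 2) | (b << 1) | c)) & 1;
}

// Reference version: an operand is invariant when flipping it never
// changes the result.  Bit 0/1/2 of the mask stands for operand 1/2/3.
int invariant_mask_bruteforce (std::uint8_t imm8)
{
  bool a_matters = false, b_matters = false, c_matters = false;
  for (int x = 0; x < 2; ++x)
    for (int y = 0; y < 2; ++y)
      {
        a_matters |= ternlog_bit (imm8, 0, x, y) != ternlog_bit (imm8, 1, x, y);
        b_matters |= ternlog_bit (imm8, x, 0, y) != ternlog_bit (imm8, x, 1, y);
        c_matters |= ternlog_bit (imm8, x, y, 0) != ternlog_bit (imm8, x, y, 1);
      }
  return (a_matters ? 0 : 1) | (b_matters ? 0 : 2) | (c_matters ? 0 : 4);
}

// Closed form with the same comparisons as the patch.
int invariant_mask (std::uint8_t imm8)
{
  int mask = 0;
  if (((imm8 >> 4) & 0xF) == (imm8 & 0xF))    // halves equal: operand 1 unused
    mask |= 1;
  if (((imm8 >> 2) & 0x33) == (imm8 & 0x33))  // bit pairs equal: operand 2 unused
    mask |= 1 << 1;
  if (((imm8 >> 1) & 0x55) == (imm8 & 0x55))  // neighbours equal: operand 3 unused
    mask |= 1 << 2;
  return mask;
}
```

For the two immediates in the description, `invariant_mask (252)` yields 4 (only the third operand is invariant) and `invariant_mask (85)` yields 3 (the first and second operands are invariant).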
> gcc/ChangeLog:
> 
> * config/i386/i386.cc (ternlog_invariant_operand_mask): New helper
>   function for replacing invariant operands.
> (reduce_ternlog_operands): Likewise.
> * config/i386/i386-protos.h (ternlog_invariant_operand_mask):
> Prototype here.
> (reduce_ternlog_operands): Likewise.
> * config/i386/sse.md:
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/i386/reduce-ternlog-operands-1.c: New test.
> * gcc.target/i386/reduce-ternlog-operands-2.c: New test.
> ---
>  gcc/config/i386/i386-protos.h |  2 +
>  gcc/config/i386/i386.cc   | 45 +++
>  gcc/config/i386/sse.md| 43 ++
>  .../i386/reduce-ternlog-operands-1.c  | 20 +
>  .../i386/reduce-ternlog-operands-2.c  | 11 +
>  5 files changed, 121 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-
> 1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/reduce-ternlog-operands-
> 2.c
> 
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 27fe73ca65c..49398ef9936 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -57,6 +57,8 @@ extern int standard_80387_constant_p (rtx);  extern
> const char *standard_80387_constant_opcode (rtx);  extern rtx
> standard_80387_constant_rtx (int);  extern int standard_sse_constant_p (rtx,
> machine_mode);
> +extern int ternlog_invariant_operand_mask (rtx *operands); extern void
> +reduce_ternlog_operands (rtx *operands);
>  extern const char *standard_sse_constant_opcode (rtx_insn *, rtx *);  extern
> bool ix86_standard_x87sse_constant_load_p (const rtx_insn *, rtx);  extern
> bool ix86_pre_reload_split (void); diff --git a/gcc/config/i386/i386.cc
> b/gcc/config/i386/i386.cc index f0d6167e667..140de478571 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -5070,6 +5070,51 @@ ix86_check_no_addr_space (rtx insn)
>  }
>return true;
>  }
> +
> +/* Return mask of invariant operands:
> +   bit number 0 1 2
> +   operand number 1 2 3.  */
> +
> +int
> +ternlog_invariant_operand_mask (rtx *operands) {
> +  int mask = 0;
> +  int imm8 = XINT (operands[4], 0);
> +
> +  if (((imm8 >> 4) & 0xF) == (imm8 & 0xF))
> +mask |= 1;
> +  if (((imm8 >> 2) & 0x33) == (imm8 & 0x33))
> +mask |= (1 << 1);
> +  if (((imm8 >> 1) & 0x55) == (imm8 & 0x55))
> +mask |= (1 << 2);
> +
> +  return mask;
> +}
> +
> +/* Replace one of the unused operators with the one used.  */
> +
> +void
> +reduce_ternlog_operands (rtx *operands) {
> +  int mask = ternlog_invariant_operand_mask (operands);
> +
> +  if (mask & 1) /* the first operand is invariant.  */
> +operands[1] = operands[2];
> +
> +  if (mask & 2) /* the second operand is invariant.  */
> +operands[2] = operands[1];
> +
> +  if (mask & 4)  /* the third operand is invariant.  */
> +   operands[3] = operands[1];
> +  else if (!MEM_P (operands[3]))
> +{
> +  if (mask & 1) /* the first operand is invariant.  */
> + operands[1] = operands[3];
> +  if (mask & 2) /* the second operands is invariant.  */
> + operands[2] = operands[3];
> +}
> +}
> +
> 
> 
> 
>  /* Initialize the table of extra 80387 mathematical constants.  */
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> a2099373123..f88d82b315c 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -12625,6 +12625,49 @@
> (symbol_ref " == 64 || TARGET_AVX512VL")
> (const_string "*")))])
> 
> +;; If the first and the second operands of ternlog are invariant and ;;
> +the third operand is memory ;; then we should add load third operand
> +from memory to register and ;; replace first and second operands with
> +this register (define_split
> +  [(set (match_operand:V 0 "register_operand")
> + (unspec:V
> +   [(match_operand:V 1 "register_operand")
> +(match_operand:V 2 "register_operand")
> +(match_operand:V 3 

Re: [PATCH] vect: Treat VMAT_ELEMENTWISE as scalar load in costing [PR110776]

2023-07-26 Thread Kewen.Lin via Gcc-patches
on 2023/7/26 18:02, Richard Biener wrote:
> On Wed, Jul 26, 2023 at 4:52 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> PR110776 exposes one issue that we could query unaligned
>> load for vector type but actually no unaligned vector load
>> is supported there.  The reason is that the costed load is
>> with single-lane vector type and its memory access type is
>> VMAT_ELEMENTWISE, we actually take it as scalar load and
>> set its alignment_support_scheme as dr_unaligned_supported.
>>
>> To avoid the ICE as exposed, following Rich's suggestion,
>> this patch is to make VMAT_ELEMENTWISE be costed as scalar
>> load.
>>
>> Bootstrapped and regress-tested on x86_64-redhat-linux,
>> powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>>
>> Is it ok for trunk?
> 
> OK.

Thanks Richi, pushed as r14-2813.

BR,
Kewen


RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Li, Pan2 via Gcc-patches
 > rtx_insn *last = BB_END (bb);
 > emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

The frrmsi insn needs to be placed after the CALL (aka last), so I bet we
should use emit_insn_after_noloc here.
Unfortunately, that hits an ICE like the one below. I am still investigating
Jeff's suggestion for this.

../../../.././gcc/libstdc++-v3/libsupc++/vec.cc:292:3: error: flow control insn 
inside a basic block.
../../../.././gcc/libstdc++-v3/libsupc++/vec.cc:292:3: internal compiler error: 
in rtl_verify_bb_insns, at cfgrtl.cc:2796

> Why do we appear to return a different mode here?  We already request
> FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
> we do not change the mode so we could just always return the incoming
> mode?

Because we need to emit two insns when we meet a call.
One before the call: mode_needed must return DYN_CALL, so that the emit part
knows the mode switches to DYN_CALL and restores frm.
One after the call: mode_after must return DYN_CALL, so that the emit part for
the next insn knows prev_mode is DYN_CALL and backs up frm.
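
The restore-before / backup-after scheme can be pictured with a toy model. This is only a sketch under simplifying assumptions: it handles a single straight-line block, it always backs up after a call, and every name in it (`Kind`, `plan_frm_switches`, the emitted strings) is hypothetical rather than taken from the patch or from GCC's mode-switching pass.

```cpp
#include <string>
#include <vector>

// Hypothetical instruction kinds in a toy straight-line block.
enum class Kind { Call, StaticRounding, Other };

// Walk the block and record where frm would be set, restored or backed up.
std::vector<std::string>
plan_frm_switches (const std::vector<Kind> &block)
{
  std::vector<std::string> out;
  bool frm_is_static = false;  // true while frm holds a static rounding mode
  for (Kind k : block)
    switch (k)
      {
      case Kind::StaticRounding:
        if (!frm_is_static)
          {
            out.push_back ("set-static-frm");
            frm_is_static = true;
          }
        out.push_back ("static-rounding-insn");
        break;
      case Kind::Call:
        // Before the call: the callee must see the dynamic rounding mode.
        if (frm_is_static)
          {
            out.push_back ("restore-frm");
            frm_is_static = false;
          }
        out.push_back ("call");
        // After the call: the callee may have changed frm, so save it again.
        out.push_back ("backup-frm");
        break;
      case Kind::Other:
        out.push_back ("other");
        break;
      }
  return out;
}
```

For a block consisting of a static-rounding insn, a call and another static-rounding insn, the plan places a restore directly before the call and a backup directly after it, which is the pairing described above.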

> frm_unknown_dynamic_p checks CALL_P which has already been checked
> before.  It returns FRM_MODE_DYN instead of FRM_MODE_DYN_CALL, though.

Thanks for pointing this out, and will cleanup in PATCH v8.

> Here and in similar cases, NEW_FRM is not exactly telling.  Can't we
> use "should be " and then

Thanks and will fix in v8.

> NON -> FRM.

Thanks and will fix in v8.

> This causes a FAIL for me.  I believe the scan directives are off by one.

Will double check about it for both rv32/rv64 tests.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Wednesday, July 26, 2023 9:08 PM
To: Kito Cheng ; Li, Pan2 
Cc: rdapp@gmail.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, 
Yanzhang 
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

So after thinking about it again - I'm still not really sure
I like treating every function as essentially an fesetround.
There is a reason why fesetround is special.  Does LLVM behave
the same way?

But supposing we really, really want it and assuming there's consensus:

+  start_sequence ();
+  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+  rtx_insn *backup_insn = get_insns ();
+  end_sequence ();

A comment here would be nice why we need a sequence for a single
instruction.  I'm not fully aware what insert_insn_end_basic_block
does but won't a

  rtx_insn *last = BB_END (bb);
  emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

suffice?  One way or another, these kinds of non-local
constructs here don't seem entirely rock solid.

@@ -7843,6 +7946,11 @@ riscv_vxrm_mode_after (rtx_insn *insn, int mode)
 static int
 riscv_frm_mode_after (rtx_insn *insn, int mode)
 {
+  STATIC_FRM_P (cfun) = STATIC_FRM_P (cfun) || riscv_static_frm_mode_p (mode);
+
+  if (CALL_P (insn))
+return FRM_MODE_DYN_CALL;

Why do we appear to return a different mode here?  We already request
FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
we do not change the mode so we could just always return the incoming
mode?

This is not part of this patch but related and originally I assumed
that we would untangle things after the initial patch, so:

   if (frm_unknown_dynamic_p (insn))
 return FRM_MODE_DYN;

frm_unknown_dynamic_p checks CALL_P which has already been checked
before.  It returns FRM_MODE_DYN instead of FRM_MODE_DYN_CALL, though.

Apart from that, the function is called unknown_dynamic but we check
for a SET of FRM?  Wouldn't something that sets FRM rather be a "static"
rounding-mode instruction? (using the "static" wording from before)

Then we also still have

  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
return get_attr_frm_mode (insn);

from before.  Isn't that pretty much the same?


+  assert_equal (NEW_FRM, get_frm (),
+   "The value of frm register should be NEW_FRM.");

Here and in similar cases, NEW_FRM is not exactly telling.  Can't we
use "should be " and then 

+  fprintf (stdout, "%s %d, but get %d != %d\n", message, a, b);

or similar?
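
One possible shape for such a check, printing both the expected and the actual value, is sketched below. The helper name `check_equal` and its signature are assumptions made for illustration; they are not the `assert_equal` helper from the patch's testcases.

```cpp
#include <cstdio>

// Hypothetical test helper: compare two values and, on mismatch, report
// what was checked together with the expected and the actual value.
bool
check_equal (int expected, int actual, const char *what)
{
  if (expected == actual)
    return true;
  std::fprintf (stdout, "%s should be %d, but got %d\n",
                what, expected, actual);
  return false;
}
```

A call such as `check_equal (new_frm, get_frm (), "The value of frm register")` would then name the concrete values in the failure message, where `new_frm` and `get_frm` stand in for whatever the testcase provides.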

+   will do the mode switch from MODE_CALL to MODE_NON_NONE natively.

NON -> FRM.

+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"

This causes a FAIL for me.  I believe the scan directives are off by one.

Are you going to do asm directives in a separate patch?
Similar to vxrm_unknown_p we could just check for one here
and handle it similarly to a call.  Would need some more tests, though.

Regards
 Robin



RE: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe and Jeff for the suggestions. An approach like
emit_insn_before_noloc results in the ICE below.

../../../.././gcc/libstdc++-v3/libsupc++/new_opant.cc:42:1: error: flow control 
insn inside a basic block.
../../../.././gcc/libstdc++-v3/libsupc++/new_opant.cc:42:1: internal compiler 
error: in rtl_verify_bb_insns, at cfgrtl.cc:2796

Then I tried the approach below, but it also hits an ICE:

../../../.././gcc/libstdc++-v3/libsupc++/eh_personality.cc:805:1: internal 
compiler error: in insert_insn_on_edge, at cfgrtl.cc:1976.

insert_insn_end_basic_block has some special handling when the end of the bb is a CALL.

Pan

From: 钟居哲 
Sent: Thursday, July 27, 2023 6:56 AM
To: Jeff Law ; rdapp.gcc ; 
kito.cheng ; Li, Pan2 
Cc: gcc-patches ; Wang, Yanzhang 

Subject: Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Thanks Jeff.

Hi, Pan:
Please try inserting an edge and putting 'frrm' on that edge, instead of
inserting at the end of the block, to see whether it works
(I tried it once but I don't remember what happened).

Try it with the following code:
edge eg;
edge_iterator ei;
FOR_EACH_EDGE (eg, ei, bb->succs)
  {
    start_sequence ();
    emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
    rtx_insn *backup_insn = get_insns ();
    end_sequence ();
    insert_insn_on_edge (backup_insn, eg);
  }

and see how it goes.

I am not sure whether it is correct; Jeff could comment on that.

Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-07-27 06:46
To: 钟居哲; rdapp.gcc; 
kito.cheng; pan2.li
CC: gcc-patches; 
yanzhang.wang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding


On 7/26/23 16:21, 钟居哲 wrote:
> Hi, Jeff.
>
> insert_insn_end_basic_block is to handle the following case:
>
> bb 1:
> ...
> CALL  ---> BB_END of bb
> bb 2:
> vfadd rne
>
> You can see there are no instructions after CALL.
>
> So we use insert_insn_end_basic_block to insert a "frrm" at the end of
> bb 1.
>
> I know typically it's better to insert an edge between bb 1 and bb 2,
> then put "frrm" on that edge.
> However, it causes an ICE.
We'd need to know the reason for the ICE.

>
> If we really need to follow this approach, it seems that we need to
> modify the "mode_sw" pass?
> Currently, we are avoiding changing the code of the pass.
Generally wise, but sometimes we do need to change generic bits.  Let's
dive a bit into this.

We have more freedom here to loosen the profitability constraints since
it's a target specific pass, but let's at least understand what's
going on with the ICE, then make some decisions about the best way forward.

jeff



[committed] [RISC-V] Fix expected diagnostic messages in testsuite

2023-07-26 Thread Jeff Law
Whoops, this should have gone in with the fixes to the RISC-V 
diagnostics from earlier this week.


Committed to the trunk.

Jeff
commit 6f709f79c915a1ea82220a44e9f4a144d5eedfd1
Author: Jeff Law 
Date:   Wed Jul 26 19:25:33 2023 -0600

[committed] [RISC-V] Fix expected diagnostic messages in testsuite

Whoops, this should have gone in with the fixes to the RISC-V
diagnostics from earlier this week.

gcc/testsuite
* gcc.target/riscv/arch-23.c: Update expected diagnostic messages.
* gcc.target/riscv/pr102957.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/riscv/arch-23.c 
b/gcc/testsuite/gcc.target/riscv/arch-23.c
index 1cb55889c50..fca5425790c 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-23.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-23.c
@@ -6,6 +6,6 @@ int foo()
 
 /* { dg-error "ISA string is not in canonical order. 'c'" "" { target *-*-* } 
0 } */
 /* { dg-error "extension 'w' is unsupported standard single letter extension" 
"" { target *-*-* } 0 } */
-/* { dg-error "extension 'zvl' starts with `z` but is unsupported standard 
extension" "" { target *-*-* } 0 } */
-/* { dg-error "extension 's123' starts with `s` but is unsupported standard 
supervisor extension" "" { target *-*-* } 0 } */
-/* { dg-error "extension 'x123' starts with `x` but is unsupported 
non-standard extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 'zvl' starts with 'z' but is unsupported standard 
extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 's123' starts with 's' but is unsupported standard 
supervisor extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 'x123' starts with 'x' but is unsupported 
non-standard extension" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/pr102957.c 
b/gcc/testsuite/gcc.target/riscv/pr102957.c
index 322d49c62d4..5273ee6c501 100644
--- a/gcc/testsuite/gcc.target/riscv/pr102957.c
+++ b/gcc/testsuite/gcc.target/riscv/pr102957.c
@@ -4,4 +4,4 @@ int foo()
 {
 }
 
-/* { dg-error "extension 'zb' starts with `z` but is unsupported standard 
extension" "" { target *-*-* } 0 } */
+/* { dg-error "extension 'zb' starts with 'z' but is unsupported standard 
extension" "" { target *-*-* } 0 } */


Re: [PATCH] c++: devirtualization of array destruction [PR110057]

2023-07-26 Thread Ng YongXiang via Gcc-patches
Hi Jason,

I've made the following changes.

1. Add pr83054-2.C
2. Move the devirt tests to tree-ssa.
3. Remove dg do run for devirt tests
4. Add // PR c++/110057
5. Generate commit message with git gcc-commit-mklog
6. Check commit format with git gcc-verify

Thanks!
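
For context, the kind of code this patch targets can be sketched as follows. The class and function names are invented for illustration. The point is that every element of an array is a complete object of the declared element type, so the destructor calls for the elements do not need virtual dispatch, whereas `delete` through a base pointer does.

```cpp
struct Base
{
  static int destroyed;
  virtual ~Base () { ++destroyed; }
};
int Base::destroyed = 0;

struct Derived : Base
{
  static int derived_destroyed;
  ~Derived () override { ++derived_destroyed; }
};
int Derived::derived_destroyed = 0;

// Each element is exactly a Derived, so the three destructor calls at the
// end of the scope can be direct calls to Derived::~Derived.
void
make_and_destroy_array ()
{
  Derived items[3];
  (void) items;
}

// Here the dynamic type is unknown, so the destructor call stays virtual.
void
delete_through_base (Base *p)
{
  delete p;
}
```

The observable behaviour is the same with or without devirtualization; only the generated call sequence differs, which is why the new tests inspect tree dumps rather than run the program.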

On Thu, Jul 27, 2023 at 12:30 AM Jason Merrill  wrote:

> On 7/26/23 12:00, Ng YongXiang wrote:
> > Hi Jason,
> >
> > Thanks for the reply and review. I've attached an updated patch with the
> > change log and sign off.
> >
> > The change made in gcc/testsuite/g++.dg/warn/pr83054.C is because I
> > think there is no more warning since we have already devirtualized the
> > destruction for the array.
>
> Makes sense, and it's good to have your adjusted testcase in the
> testsuite, it should just be a new one (maybe pr83054-2.C).
>
> > Apologies for the poor formatting. It is my first time contributing. Do
> > let me know if there's any stuff I've missed and feel free to modify the
> > patch where you deem necessary.
>
> No worries!
>
> The ChangeLog entries still need some adjustment, according to git
> gcc-verify (from contrib/gcc-git-customization.sh, see
> https://gcc.gnu.org/gitwrite.html):
>
> ERR: line should start with a tab: "* init.c: Call non virtual
> destructor of objects in array"
> ERR: line should start with a tab: "*
> g++.dg/devirt-array-destructor-1.C: New."
> ERR: line should start with a tab: "*
> g++.dg/devirt-array-destructor-2.C: New."
> ERR: line should start with a tab: "* g++.dg/warn/pr83054.C:
> Remove expected warnings caused by devirtualization"
> ERR: PR 110057 in subject but not in changelog: "c++: devirtualization
> of array destruction [PR110057]"
>
> git gcc-commit-mklog (also from gcc-git-customization.sh) makes
> generating ChangeLog entries a lot simpler.
>
> > * g++.dg/devirt-array-destructor-1.C: New.
>
> Tests that look at tree-optimization dump files should go in the
> g++.dg/tree-ssa subdirectory.
>
> > +/* { dg-do run } */
>
> It seems unnecessary to execute these tests, I'd think the default of {
> dg-do compile } would be fine.
>
> It's also good to have a
>
> // PR c++/110057
>
> line at the top of the testcase for future reference.  gcc-commit-mklog
> also uses that to add the PR number to the ChangeLog.
>
> Jason
>
>
From 04e9b412e49a3966f84edd3afda66ebdb729efdc Mon Sep 17 00:00:00 2001
From: yongxiangng 
Date: Thu, 27 Jul 2023 08:01:42 +0800
Subject: [PATCH 1/1] [PATCH] c++: devirtualization of array destruction
 [PR110057]

	PR c++/110057
	PR ipa/83054

gcc/cp/ChangeLog:

	* init.cc (build_vec_delete_1): Devirtualize array destruction.

gcc/testsuite/ChangeLog:

	* g++.dg/warn/pr83054.C: Remove devirtualization warning.
	* g++.dg/tree-ssa/devirt-array-destructor-1.C: New test.
	* g++.dg/tree-ssa/devirt-array-destructor-2.C: New test.
	* g++.dg/warn/pr83054-2.C: New test.

Signed-off-by: Ng Yong Xiang 
---
 gcc/cp/init.cc|  8 ++--
 .../tree-ssa/devirt-array-destructor-1.C  | 28 
 .../tree-ssa/devirt-array-destructor-2.C  | 29 
 gcc/testsuite/g++.dg/warn/pr83054-2.C | 44 +++
 gcc/testsuite/g++.dg/warn/pr83054.C   |  2 +-
 5 files changed, 106 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
 create mode 100644 gcc/testsuite/g++.dg/warn/pr83054-2.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 6ccda365b04..69ab51d0a4b 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4112,8 +4112,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
   if (type_build_dtor_call (type))
 	{
 	  tmp = build_delete (loc, ptype, base, sfk_complete_destructor,
-			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-			  complain);
+			  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+			  1, complain);
 	  if (tmp == error_mark_node)
 	return error_mark_node;
 	}
@@ -4143,8 +4143,8 @@ build_vec_delete_1 (location_t loc, tree base, tree maxindex, tree type,
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
   tmp = build_delete (loc, ptype, tbase, sfk_complete_destructor,
-		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR, 1,
-		  complain);
+		  LOOKUP_NORMAL|LOOKUP_DESTRUCTOR|LOOKUP_NONVIRTUAL,
+		  1, complain);
   if (tmp == error_mark_node)
 return error_mark_node;
   body = build_compound_expr (loc, body, tmp);
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
new file mode 100644
index 000..ce8dc2a57cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
@@ -0,0 +1,28 @@
+// PR c++/110057
+/* { dg-do-compile } */
+/* Virtual calls should be devirtualized because we know dynamic type of object in array at compile time */
+/* { dg-options "-O3 

Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread 钟居哲
Thanks Jeff.

Hi, Pan:
Please try inserting an edge and putting 'frrm' on that edge, instead of
inserting at the end of the block, to see whether it works
(I tried it once but I don't remember what happened).

Try it with the following code:
edge eg;
edge_iterator ei;
FOR_EACH_EDGE (eg, ei, bb->succs)
  {
    start_sequence ();
    emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
    rtx_insn *backup_insn = get_insns ();
    end_sequence ();
    insert_insn_on_edge (backup_insn, eg);
  }

and see how it goes.

I am not sure whether it is correct; Jeff could comment on that.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-07-27 06:46
To: 钟居哲; rdapp.gcc; kito.cheng; pan2.li
CC: gcc-patches; yanzhang.wang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding
 
 
On 7/26/23 16:21, 钟居哲 wrote:
> Hi, Jeff.
> 
> insert_insn_end_basic_block is to handle the following case:
> 
> bb 1:
> ...
> CALL  ---> BB_END of bb
> bb 2:
> vfadd rne
> 
> You can see there are no instructions after CALL.
> 
> So we use insert_insn_end_basic_block to insert a "frrm" at the end of
> bb 1.
> 
> I know typically it's better to insert an edge between bb 1 and bb 2,
> then put "frrm" on that edge.
> However, it causes an ICE.
We'd need to know the reason for the ICE.
 
> 
> If we really need to follow this approach, it seems that we need to
> modify the "mode_sw" pass?
> Currently, we are avoiding changing the code of the pass.
Generally wise, but sometimes we do need to change generic bits.  Let's 
dive a bit into this.
 
We have more freedom here to loosen the profitability constraints since
it's a target specific pass, but let's at least understand what's
going on with the ICE, then make some decisions about the best way forward.
 
jeff
 


Re: Re: [PATCH] RISC-V: Enable basic VLS modes support

2023-07-26 Thread 钟居哲

>> Any specific reason for MAX_BITS_PER_WORD instead of
>> GET_MODE_BITSIZE (Pmode)?  In general I like the idea to switch
>> to scalar moves here but couldn't it already be debatable for
>> a 64-bit move on rv32 i.e. a question of costing?
Take V2SImode for example.

I think I prefer the following sequence:
lw
lw
sw
sw

instead of:
vsetvli zero, 2, e32, mf2
vle
vse

on an RV32 system.


>> Here as well as before, why the assert?  Is this intended for
>> easier debugging later?
Because before it could be false, whereas now it should always be true for VLS modes.

>> These are the same as for the fractional modes but we define
>> the expanders for all modes instead.  legitimize_move will only
>> create the _lra if the mode is indeed smaller than a vector and
>> I assume the scratch will prevent us from ever generating the
>> insn any other way.  Still, couldn't we add the new modes to
>> V_FRACT?  Or didn't you want to mix VLA and VLS modes there?
>> The others seem to be mixed, though.
No, please take a look at the splitting: VLS modes are different from
V_FRACT.

>> What do we need this for?
>> Maybe for this?  Won't MIN suffice?
Address comments.

>> Why do these iterator modes have a condition on VLS while the following
>> ones don't?  It's probably not terribly important as it works either way
>> but still sticks out.
Address comments.


>> Btw. as a general remark.  In the past I also found the single-element
>> vectors helpful for codegen but that might be obsolete.  Not in scope
>> for this patch.
Address comments. I saw many targets added single-element vector.
I will add it in V2.




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-27 04:27
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Enable basic VLS modes support
Hi Juzhe,
 
just some small remarks, all in all no major concerns.
 
> +   vmv%m1r.v\t%0,%1"
> +  "&& (!register_operand (operands[0], mode)
> +   || !register_operand (operands[1], mode))"
> +  [(const_int 0)]
> +  {
> +unsigned size = GET_MODE_BITSIZE (mode).to_constant ();
> +if (size <= MAX_BITS_PER_WORD
 
Any specific reason for MAX_BITS_PER_WORD instead of
GET_MODE_BITSIZE (Pmode)?  In general I like the idea to switch
to scalar moves here but couldn't it already be debatable for
a 64-bit move on rv32 i.e. a question of costing?
 
> +  gcc_assert (ok_p);
> +  DONE;
 
Here as well as before, why the assert?  Is this intended for
easier debugging later?
 
> +
> +(define_expand "@mov_lra"
> +  [(parallel
> +[(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand")
> +   (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand"))
> +   (clobber (match_scratch:P 2))])]
> +  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +{})
> +
> +(define_insn_and_split "*mov_lra"
 
These are the same as for the fractional modes but we define
the expanders for all modes instead.  legitimize_move will only
create the _lra if the mode is indeed smaller than a vector and
I assume the scratch will prevent us from ever generating the
insn any other way.  Still, couldn't we add the new modes to
V_FRACT?  Or didn't you want to mix VLA and VLS modes there?
The others seem to be mixed, though.
> +#define INCLUDE_ALGORITHM
 
What do we need this for?
 
> +  int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
> +  if (size < TARGET_MIN_VLEN)
> + {
> +   int factor = TARGET_MIN_VLEN / size;
> +   if (inner_size == 8)
> + factor = std::min (factor, 8);
 
Maybe for this?  Won't MIN suffice?
 
> +  ;; VLS modes.
> +  (V2QI "TARGET_VECTOR_VLS")
 
Why do these iterator modes have a condition on VLS while the following
ones don't?  It's probably not terribly important as it works either way
but still sticks out.
 
Btw. as a general remark.  In the past I also found the single-element
vectors helpful for codegen but that might be obsolete.  Not in scope
for this patch.
 
Regards
Robin
 
 


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/26/23 16:21, 钟居哲 wrote:

> Hi, Jeff.
>
> insert_insn_end_basic_block is to handle the following case:
>
> bb 1:
> ...
> CALL ---> BB_END of bb
> bb 2:
> vfadd rne
>
> You can see there are no instructions after CALL.
>
> So we use insert_insn_end_basic_block to insert a "frrm" at the end of
> bb 1.
>
> I know typically it's better to insert an edge between bb 1 and bb 2,
> then put "frrm" on that edge.
>
> However, it causes an ICE.

We'd need to know the reason for the ICE.

> If we really need to follow this approach, it seems that we need to
> modify the "mode_sw" PASS?
>
> Currently, we are avoiding changing the code of the pass.

Generally wise, but sometimes we do need to change generic bits.  Let's
dive a bit into this.


We have more freedom here to loosen the profitability constraints since 
it's a target-specific pass, but let's at least understand what's 
going on with the ICE, then make some decisions about the best way forward.


jeff


Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-26 Thread Lewis Hyatt via Gcc-patches
On Wed, Jul 26, 2023 at 5:36 PM Jason Merrill  wrote:
>
> On 6/30/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> >
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> >
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> > sees these tokens too.
> >
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-common.h (c_init_preprocess): Declare new function.
> >   * c-opts.cc (c_common_init): Call it.
> >   * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> >   (pragma_diagnostic_lex): ...this.
> >   (pragma_diagnostic_lex_pp): Remove.
> >   (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> >   all modes.
> >   (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> >   usage.
> >   * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
> >
> > gcc/c/ChangeLog:
> >
> >   * c-parser.cc (pragma_lex): Support preprocess-only mode.
> >   (pragma_lex_discard_to_eol): New function.
> >   (c_init_preprocess): New function.
> >
> > gcc/cp/ChangeLog:
> >
> >   * parser.cc (c_init_preprocess): New function.
> >   (maybe_read_tokens_for_pragma_lex): New function.
> >   (pragma_lex): Support preprocess-only mode.
> >   (pragma_lex_discard_to_eol): New function.
> >
> > libcpp/ChangeLog:
> >
> >   * include/cpplib.h (struct cpp_callbacks): Add new callback
> >   on_token_lex.
> >   * macro.cc (cpp_get_token_1): Support new callback.
> > ---
> >
> > Notes:
> >  Hello-
> >
> >  In r13-1544, I added support for processing `#pragma GCC diagnostic' in
> >  preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
> >  that patch I called into libcpp directly to obtain the tokens needed to
> >  process the pragma. As part of the review, Jason noted that it would
> >  probably be better to make pragma_lex () usable in preprocess-only mode, and
> >  we decided just to add a comment about that for the time being, and to go
> >  ahead and implement that in the future, if it became necessary to support
> >  other pragmas during preprocessing.
> >
> >  I think now is a good time to proceed with that plan, because I would like
> >  to fix PR87299, which is about another pragma (#pragma GCC target) not
> >  working in preprocess-only mode. This patch makes the necessary changes for
> >  pragma_lex () to work in preprocess-only mode.
> >
> >  I have also added a new callback, on_token_lex (), to libcpp. This is so the
> >  preprocessor can see and stream out all the tokens that pragma_lex () gets
> >  from libcpp, since it won't otherwise see them.  This seemed the simplest
> >  approach to me. Another possibility would be to add a wrapper function in
> >  c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
> >  also stream the token in preprocess-only mode, and then change all calls
> >  into libcpp in that file to use the wrapper function.  The libcpp callback
> >  seemed cleaner to me FWIW.
>
> I think the other way sounds better to me; there are only three calls to
> cpp_get_... in c_lex_with_flags.
>
> The rest of the patch looks good.

Thank you very much for the feedback. I will test it this way and send
the updated version.

-Lewis


Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread 钟居哲
Hi, Jeff.

insert_insn_end_basic_block is to handle the following case:

bb 1:
...
CALL ---> BB_END of bb
bb 2:
vfadd rne

You can see there are no instructions after CALL.

So we use insert_insn_end_basic_block to insert a "frrm" at the end of bb 1.

I know typically it's better to insert an edge between bb 1 and bb 2, then put
"frrm" on that edge.
However, it causes an ICE.

If we really need to follow this approach, it seems that we need to modify the
"mode_sw" PASS?
Currently, we are avoiding changing the code of the pass.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-07-27 05:40
To: Robin Dapp; Kito Cheng; Li, Pan2
CC: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding
 
 
On 7/26/23 07:08, Robin Dapp via Gcc-patches wrote:
> So after thinking about it again - I'm still not really sure
> I like treating every function as essentially an fesetround.
> There is a reason why fesetround is special.  Does LLVM behave
> the same way?
> 
> But supposing we really, really want it and assuming there's consensus:
> 
> +  start_sequence ();
> +  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
> +  rtx_insn *backup_insn = get_insns ();
> +  end_sequence ();
> 
> A comment here would be nice why we need a sequence for a single
> instruction.  I'm not fully aware what insert_insn_end_basic_block
> does but won't a
> 
>rtx_insn *last = BB_END (bb);
>emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);
> 
> suffice?  One way or another, these kinds of non-local
> constructs here don't seem entirely rock solid.
Typically an LCM algorithm needs to insert on edges rather than at the 
end of blocks -- this is particularly important to preserve its property 
that on no path through the CFG can we have more evaluations of the 
expression after PRE/LCM than before PRE/LCM.
 
The other thing edge insertions do is simplify the abnormal critical 
edge problems.  I'd have to dig into the precise details, but in the 
generic PRE/LCM code we clobber the available expressions on critical 
edges so that we don't try to hold a value live across that edge.
 
Thus the insertion point will tend to be the normal edge of a block that 
ends with a call to a potentially throwing function.
 
Inserting on the edge also significantly simplifies handling of 
conditional branches ;-)
 
Jeff
 


Re: [PATCH] Fix typo in insn name.

2023-07-26 Thread Michael Meissner via Gcc-patches
On Wed, Jul 26, 2023 at 01:54:01PM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> on 2023/7/11 03:59, Michael Meissner wrote:
> > In doing other work, I noticed that there was an insn:
> > 
> > vsx_extract_v4sf_<mode>_load
> > 
> > Which did not have an iterator.  I removed the useless <mode>.
> 
> It actually has a mode iterator, the "P" is used for clobber.
> 
> The whole pattern of this define_insn_and_split is
> 
> (define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
>   [(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
>   (vec_select:SF
>(match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
>(parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n,n,n")])))
>(clobber (match_scratch:P 3 "=&b,&b,&b,&b"))] <== *P used here*
> 
> Its definition is:
> 
> (define_mode_iterator P [(SI "TARGET_32BIT") (DI "TARGET_64BIT")])
> 
> I guess we can just leave it there?
> 
> BR,
> Kewen

Yes, I didn't notice the :P in the insn.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/26/23 07:08, Robin Dapp via Gcc-patches wrote:

So after thinking about it again - I'm still not really sure
I like treating every function as essentially an fesetround.
There is a reason why fesetround is special.  Does LLVM behave
the same way?

But supposing we really, really want it and assuming there's consensus:

+  start_sequence ();
+  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+  rtx_insn *backup_insn = get_insns ();
+  end_sequence ();

A comment here would be nice why we need a sequence for a single
instruction.  I'm not fully aware what insert_insn_end_basic_block
does but won't a

   rtx_insn *last = BB_END (bb);
   emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

suffice?  One way or another, these kinds of non-local
constructs here don't seem entirely rock solid.

Typically an LCM algorithm needs to insert on edges rather than at the 
end of blocks -- this is particularly important to preserve its property 
that on no path through the CFG can we have more evaluations of the 
expression after PRE/LCM than before PRE/LCM.


The other thing edge insertions do is simplify the abnormal critical 
edge problems.  I'd have to dig into the precise details, but in the 
generic PRE/LCM code we clobber the available expressions on critical 
edges so that we don't try to hold a value live across that edge.


Thus the insertion point will tend to be the normal edge of a block that 
ends with a call to a potentially throwing function.


Inserting on the edge also significantly simplifies handling of 
conditional branches ;-)


Jeff
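
For readers following the thread, the path-count property described above can be illustrated with a minimal sketch. It is a toy model, not GCC code: Block, Edge, Placement and evaluations_on_path are hypothetical names, and the CFG is just a list of block names per path. With block A branching to B and C and the expression needed only on the way to C, placing it at the end of A adds an evaluation on the A->B path, while placing it on the A->C edge does not.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Block = std::string;
using Edge = std::pair<Block, Block>;

// Where copies of one expression (think: one "frrm") have been placed.
struct Placement
{
  std::map<Block, int> at_block_end;  // copies appended to a block
  std::map<Edge, int> on_edge;        // copies inserted on an edge
};

// Count how often the expression is evaluated along one path through
// the (toy) CFG.  A path is a non-empty sequence of block names.
int evaluations_on_path (const std::vector<Block> &path, const Placement &p)
{
  assert (!path.empty ());
  int count = 0;
  for (std::size_t i = 0; i < path.size (); ++i)
    {
      auto b = p.at_block_end.find (path[i]);
      if (b != p.at_block_end.end ())
        count += b->second;
      if (i + 1 < path.size ())
        {
          auto e = p.on_edge.find (Edge (path[i], path[i + 1]));
          if (e != p.on_edge.end ())
            count += e->second;
        }
    }
  return count;
}
```

Comparing the two placements on the paths {A, B} and {A, C} shows the difference: only the end-of-block placement changes the count on the path that never needed the expression.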


Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-26 Thread Jason Merrill via Gcc-patches

On 6/30/23 18:59, Lewis Hyatt wrote:

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
add a new libcpp callback, on_token_lex (), that ensures the preprocessor
sees these tokens too.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare new function.
* c-opts.cc (c_common_init): Call it.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_callbacks): Add new callback
on_token_lex.
* macro.cc (cpp_get_token_1): Support new callback.
---

Notes:
 Hello-
 
 In r13-1544, I added support for processing `#pragma GCC diagnostic' in
 preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
 that patch I called into libcpp directly to obtain the tokens needed to
 process the pragma. As part of the review, Jason noted that it would
 probably be better to make pragma_lex () usable in preprocess-only mode, and
 we decided just to add a comment about that for the time being, and to go
 ahead and implement that in the future, if it became necessary to support
 other pragmas during preprocessing.
 
 I think now is a good time to proceed with that plan, because I would like
 to fix PR87299, which is about another pragma (#pragma GCC target) not
 working in preprocess-only mode. This patch makes the necessary changes for
 pragma_lex () to work in preprocess-only mode.
 
 I have also added a new callback, on_token_lex (), to libcpp. This is so the
 preprocessor can see and stream out all the tokens that pragma_lex () gets
 from libcpp, since it won't otherwise see them.  This seemed the simplest
 approach to me. Another possibility would be to add a wrapper function in
 c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
 also stream the token in preprocess-only mode, and then change all calls
 into libcpp in that file to use the wrapper function.  The libcpp callback
 seemed cleaner to me FWIW.


I think the other way sounds better to me; there are only three calls to 
cpp_get_... in c_lex_with_flags.


The rest of the patch looks good.

Jason
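
As a rough illustration of the wrapper approach preferred above, here is a minimal, self-contained sketch. Nothing in it is libcpp or GCC API: Token, FakeLexer, Frontend and get_token_and_stream are hypothetical stand-ins, with FakeLexer playing the role of the cpp_get_... calls. The idea is only that every token request goes through one function, which also records the token when running in preprocess-only mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type; an empty spelling stands for end of input.
struct Token
{
  std::string spelling;
};

// Hypothetical stand-in for the library that hands out tokens.
struct FakeLexer
{
  std::vector<Token> tokens;
  std::size_t next = 0;

  Token get ()
  {
    assert (next <= tokens.size ());
    return next < tokens.size () ? tokens[next++] : Token{""};
  }
};

// Hypothetical front end: all token requests go through the wrapper,
// so the preprocess-only output sees every token exactly once.
struct Frontend
{
  FakeLexer lexer;
  bool preprocess_only = false;
  std::vector<std::string> streamed;

  Token get_token_and_stream ()
  {
    Token t = lexer.get ();
    if (preprocess_only && !t.spelling.empty ())
      streamed.push_back (t.spelling);
    return t;
  }
};
```

In this model a pragma handler that pulls tokens through get_token_and_stream needs no separate callback for the output side; whether that carries over cleanly to the real code is exactly what the follow-up testing would have to show.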



Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-07-26 Thread Jeff Law via Gcc-patches



On 7/19/23 04:11, Xiao Zeng wrote:

This patch completes the recognition of the basic semantics
defined in the spec, namely:

Conditional zero, if condition is equal to zero
   rd = (rs2 == 0) ? 0 : rs1
Conditional zero, if condition is non zero
   rd = (rs2 != 0) ? 0 : rs1

gcc/ChangeLog:

* config/riscv/riscv.md: Include zicond.md
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics.c: New test.

So as I mentioned earlier today, adjusting the insn condition to use 
rtx_equal_p seems to be the right way to go.  Attached is a V2 of this 
patch that implements that.  It was trivial enough to do that there's no 
need to break this patch down further.


I've pushed V2 to the trunk.
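
For reference, the two primitive semantics quoted above can be written out as a small sketch. The function names are hypothetical, and select_via_zicond is an illustration of how a general select can be composed from the two primitives; it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// czero.eqz rd, rs1, rs2:  rd = (rs2 == 0) ? 0 : rs1
inline std::uint64_t czero_eqz (std::uint64_t rs1, std::uint64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

// czero.nez rd, rs1, rs2:  rd = (rs2 != 0) ? 0 : rs1
inline std::uint64_t czero_nez (std::uint64_t rs1, std::uint64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

// Hypothetical helper: cond ? a : b built from the two primitives.
// Exactly one of the two operands survives, so OR-ing them is enough.
inline std::uint64_t select_via_zicond (std::uint64_t cond,
                                        std::uint64_t a, std::uint64_t b)
{
  return czero_eqz (a, cond) | czero_nez (b, cond);
}

// Small self-check of the composition.
inline void check_select ()
{
  assert (select_via_zicond (1, 10, 20) == 10);
  assert (select_via_zicond (0, 10, 20) == 20);
}
```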

Thanks,
Jeff

commit 74290c664d1d4c067a996253fe50ec671668
Author: Xiao Zeng 
Date:   Wed Jul 26 11:59:59 2023 -0600

[PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

This patch completes the recognition of the basic semantics
defined in the spec, namely:

Conditional zero, if condition is equal to zero
  rd = (rs2 == 0) ? 0 : rs1
Conditional zero, if condition is non zero
  rd = (rs2 != 0) ? 0 : rs1

gcc/ChangeLog:

* config/riscv/riscv.md: Include zicond.md
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics.c: New test.

Co-authored-by: Philipp Tomsich 
Co-authored-by: Raphael Zinsly 
Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 24515bcf706..8d8fc93bb14 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3319,3 +3319,4 @@ (define_expand "msubhisi4"
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "zicond.md")
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
new file mode 100644
index 000..723a22422e1
--- /dev/null
+++ b/gcc/config/riscv/zicond.md
@@ -0,0 +1,84 @@
+;; Machine description for the RISC-V Zicond extension
+;; Copyright (C) 2022-23 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr eqz [(eq "nez") (ne "eqz")])
+(define_code_attr nez [(eq "eqz") (ne "nez")])
+
+;; Zicond
+(define_insn "*czero.."
+  [(set (match_operand:GPR 0 "register_operand"  "=r")
+(if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+(const_int 0))
+  (match_operand:GPR 2 "register_operand""r")
+  (const_int 0)))]
+  "TARGET_ZICOND"
+  "czero.<eqz>\t%0,%2,%1"
+)
+
+(define_insn "*czero.."
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(if_then_else:GPR (eq_or_ne (match_operand:ANYI 1 "register_operand" "r")
+(const_int 0))
+  (const_int 0)
+  (match_operand:GPR 2 "register_operand"   "r")))]
+  "TARGET_ZICOND"
+  "czero.<nez>\t%0,%2,%1"
+)
+
+;; Special optimization under eq/ne in primitive semantics
+(define_insn "*czero.eqz..opt1"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "1")
+  (match_operand:GPR 3 "register_operand" "r")))]
+  "TARGET_ZICOND && rtx_equal_p (operands[1], operands[2])"
+  "czero.eqz\t%0,%3,%1"
+)
+
+(define_insn "*czero.eqz..opt2"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (eq (match_operand:ANYI 1 "register_operand" "r")
+  (const_int 0))
+  (match_operand:GPR 2 "register_operand" "r")
+  (match_operand:GPR 3 "register_operand" "1")))]
+  "TARGET_ZICOND && rtx_equal_p (operands[1],  operands[3])"
+  "czero.nez\t%0,%2,%1"
+)
+
+(define_insn "*czero.nez..opt3"
+  [(set (match_operand:GPR 0 "register_operand"   "=r")
+(if_then_else:GPR (ne (match_operand:ANYI 1 

Re: [PATCH 1/5] [RISC-V] Recognize Zicond extension

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/19/23 04:11, Xiao Zeng wrote:

This patch is the minimal support for the Zicond extension, including
the extension name, mask and target definition.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* config/riscv/riscv-opts.h (MASK_ZICOND): New mask.
(TARGET_ZICOND): New target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-20.c: New test.
* gcc.target/riscv/attribute-21.c: New test.

I've pushed this to the trunk.
jeff


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Robin Dapp via Gcc-patches
> I would like to propose that we stay focused and move forward with this
> patch itself; the other underlying RVV floating-point API support and
> the full RVV intrinsic API tests depend on it.

Sorry, I didn't mean to ditch LCM/mode switching.  I believe it is doing
a pretty good job and we should continue to use it.  The changes in this
patch (and the ones before) seem to follow a certain plan but, at least
to me, it only became obvious with this last patch.  We're already lost
in details when the fundamentals are not agreed upon yet.  It would have
been easier to discuss (and quicker to "focus and move forward") if the
cover letter had already laid out the possible alternatives and their
respective pros and cons instead, even more so when many things depend
on it.

Still, three things:
 
 (1) I'm fully on board with restoring the rounding mode after changing
 it implicitly via an intrinsic (guess everybody is).  This needs to be
 done anyway and also implies a costly fsrm.  "Forcing" it before a call
 can most likely be treated like any other DYN instruction requiring the
 "entry" rounding mode.  Likewise restoring at function exit.  The
 placement of the necessary restores LCM can handle reasonably well.

 (2) What I'm not entirely happy with is assuming that every function
 call can _change_ the rounding mode and we always need to re-backup it.
 I realize that it might be a necessary evil because all other options
 are worse.  Assuming no change through a call makes properly using
 fesetround-like calls impossible as they would clobber our backup
 register.  This patch takes the approach to re-backup after every call.
 As-is, wouldn't we also need to make sure that GCC knows that a call
 clobbers the FRM (via clobber: (reg:SI 69 frm)) so we don't accidentally
 move something beyond it?

 (3) One other option I can think of is "localized" re-backup of the
 FRM before each mode-changing intrinsic.  That would result in more
 redundant save/restore insns around those than with the call proposal
 and is therefore likely worse.  Whether that is relevant when the restore
 is slow anyway might be debatable.  Yet, it's not a given that storing
 the FRM always is an in-order operation, it has just mostly been
 that way historically.  Another conceivable option (and maybe even
 the right thing to do) would be special treatment, like a propagating
 flag etc. for fesetround.  That's common code and not likely to happen
 or land soon, though.
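
The backup/restore discipline in (1) can be sketched at the C++ library level, as an analogy only: the snippet below uses the standard <cfenv> functions, not the RVV intrinsics or the frrm/fsrm instructions, and ScopedRounding is a hypothetical helper name. It assumes the platform defines FE_UPWARD. Reading the ambient mode on entry corresponds to the backup, and the destructor corresponds to the restore.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII helper: back up the ambient rounding mode, switch
// to a specific one, and restore the backup when leaving the scope.
class ScopedRounding
{
public:
  explicit ScopedRounding (int mode)
    : saved_ (std::fegetround ())        // "backup" of the entry mode
  {
    int rc = std::fesetround (mode);     // switch to the requested mode
    assert (rc == 0);
    (void) rc;
  }

  ~ScopedRounding ()
  {
    std::fesetround (saved_);            // "restore" on exit
  }

  ScopedRounding (const ScopedRounding &) = delete;
  ScopedRounding &operator= (const ScopedRounding &) = delete;

private:
  int saved_;
};
```

The concern in (2) maps onto this model directly: if code inside the scope calls something fesetround-like, the saved value no longer reflects what the caller expects to see afterwards, which is why the backup would have to be refreshed after calls.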

Regards
 Robin


Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-26 Thread Lewis Hyatt via Gcc-patches
May I please ping this?
I am just about ready with the followup patch that fixes PR87299, but
it depends on this one. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623364.html

-Lewis

On Fri, Jun 30, 2023 at 6:59 PM Lewis Hyatt  wrote:
>
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
>
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
>
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> sees these tokens too.
>
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
>
> gcc/c-family/ChangeLog:
>
> * c-common.h (c_init_preprocess): Declare new function.
> * c-opts.cc (c_common_init): Call it.
> * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> (pragma_diagnostic_lex): ...this.
> (pragma_diagnostic_lex_pp): Remove.
> (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> all modes.
> (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> usage.
> * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
>
> gcc/c/ChangeLog:
>
> * c-parser.cc (pragma_lex): Support preprocess-only mode.
> (pragma_lex_discard_to_eol): New function.
> (c_init_preprocess): New function.
>
> gcc/cp/ChangeLog:
>
> * parser.cc (c_init_preprocess): New function.
> (maybe_read_tokens_for_pragma_lex): New function.
> (pragma_lex): Support preprocess-only mode.
> (pragma_lex_discard_to_eol): New function.
>
> libcpp/ChangeLog:
>
> * include/cpplib.h (struct cpp_callbacks): Add new callback
> on_token_lex.
> * macro.cc (cpp_get_token_1): Support new callback.
> ---
>
> Notes:
> Hello-
>
> In r13-1544, I added support for processing `#pragma GCC diagnostic' in
> preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
> that patch I called into libcpp directly to obtain the tokens needed to
> process the pragma. As part of the review, Jason noted that it would
> probably be better to make pragma_lex () usable in preprocess-only mode, and
> we decided just to add a comment about that for the time being, and to go
> ahead and implement that in the future, if it became necessary to support
> other pragmas during preprocessing.
>
> I think now is a good time to proceed with that plan, because I would like
> to fix PR87299, which is about another pragma (#pragma GCC target) not
> working in preprocess-only mode. This patch makes the necessary changes for
> pragma_lex () to work in preprocess-only mode.
>
> I have also added a new callback, on_token_lex (), to libcpp. This is so the
> preprocessor can see and stream out all the tokens that pragma_lex () gets
> from libcpp, since it won't otherwise see them.  This seemed the simplest
> approach to me. Another possibility would be to add a wrapper function in
> c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
> also stream the token in preprocess-only mode, and then change all calls
> into libcpp in that file to use the wrapper function.  The libcpp callback
> seemed cleaner to me FWIW.
>
> There are no new tests added here, since it's just a change of
> implementation covered by existing tests. Bootstrap + regtest all languages
> looks good on x86-64 Linux.
>
> Please let me know what you think? Thanks!
>
> -Lewis
>
>  gcc/c-family/c-common.h  |  3 +++
>  gcc/c-family/c-opts.cc   |  1 +
>  gcc/c-family/c-pragma.cc | 56 ++--
>  gcc/c-family/c-pragma.h  |  2 ++
>  gcc/c/c-parser.cc| 34 
>  gcc/cp/parser.cc | 50 +++
>  libcpp/include/cpplib.h  |  4 

Re: [PATCH 1/2][frontend] Add novector C++ pragma

2023-07-26 Thread Jason Merrill via Gcc-patches

On 7/26/23 15:32, Tamar Christina wrote:

+
+   cp_token *tok = pragma_tok;
+
+   do
  {
-   tok = cp_lexer_consume_token (parser->lexer);
-   ivdep = cp_parser_pragma_ivdep (parser, tok);
-   tok = cp_lexer_peek_token (the_parser->lexer);
+   switch (cp_parser_pragma_kind (tok))
+ {
+   case PRAGMA_IVDEP:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   ivdep = cp_parser_pragma_ivdep (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_UNROLL:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   unroll = cp_parser_pragma_unroll (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   case PRAGMA_NOVECTOR:
+ {
+   if (tok != pragma_tok)
+ tok = cp_lexer_consume_token (parser->lexer);
+   novector = cp_parser_pragma_novector (parser, tok);
+   tok = cp_lexer_peek_token (the_parser->lexer);
+   break;
+ }
+   default:
+ gcc_unreachable ();


This unreachable seems to assert that if a pragma follows one of these
pragmas, it must be another one of these pragmas?  That seems wrong;
instead of hitting gcc_unreachable() in that case we should fall through to the
diagnostic below.



Ah, good shout. Since it has to exit two levels I had to introduce a bool
for controlling the loop iterations.  New patch below.
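
The control-flow point can be shown with a minimal sketch that is independent of the parser. PragmaKind, LoopPragmas and collect_loop_pragmas are hypothetical names, and the vector of kinds stands in for the token stream. Because a break inside a switch only leaves the switch, the default case clears a flag that the enclosing loop tests, so an unrelated pragma ends the scan and is left for the caller to diagnose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class PragmaKind { Ivdep, Unroll, Novector, Other };

// Which loop pragmas were seen, and how many tokens were consumed.
struct LoopPragmas
{
  bool ivdep = false;
  bool unroll = false;
  bool novector = false;
  std::size_t consumed = 0;
};

LoopPragmas collect_loop_pragmas (const std::vector<PragmaKind> &toks)
{
  LoopPragmas r;
  bool keep_going = !toks.empty ();
  while (keep_going)
    {
      switch (toks[r.consumed])
        {
        case PragmaKind::Ivdep:
          r.ivdep = true;
          break;
        case PragmaKind::Unroll:
          r.unroll = true;
          break;
        case PragmaKind::Novector:
          r.novector = true;
          break;
        default:
          // Not one of ours: stop scanning instead of asserting.
          keep_going = false;
          break;
        }
      if (keep_going)
        {
          ++r.consumed;
          keep_going = r.consumed < toks.size ();
        }
    }
  assert (r.consumed <= toks.size ());
  return r;
}
```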

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?


OK.


Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch ---

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8398223311194837441107cb335d497ff5f5ec1c..bece7bff1f01a23cfc94386fd3295a0be8c462fe 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
  #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
  #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
  #define RANGE_FOR_IVDEP(NODE) TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_5 (RANGE_FOR_STMT_CHECK (NODE))
  
  /* STMT_EXPR accessor.  */

  #define STMT_EXPR_STMT(NODE)  TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body  (tree);
  
  /* In parser.cc */

  extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
- unsigned short);
+ unsigned short, bool);
  extern void cp_convert_omp_range_for (tree &, vec<tree, va_gc> *, tree &,
  tree &, tree &, tree &, tree &, tree &);
  extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause   (tree);
  extern void finish_else_clause(tree);
  extern void finish_if_stmt(tree);
  extern tree begin_while_stmt  (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
+bool);
  extern void finish_while_stmt (tree);
  extern tree begin_do_stmt (void);
  extern void finish_do_body(tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short);
+extern void finish_do_stmt (tree, tree, bool, unsigned short,
+bool);
  extern tree finish_return_stmt(tree);
  extern tree begin_for_scope   (tree *);

Re: [PATCH] RISC-V: Enable basic VLS modes support

2023-07-26 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

just some small remarks, all in all no major concerns.

> +   vmv%m1r.v\t%0,%1"
> +  "&& (!register_operand (operands[0], mode)
> +   || !register_operand (operands[1], mode))"
> +  [(const_int 0)]
> +  {
> +unsigned size = GET_MODE_BITSIZE (mode).to_constant ();
> +if (size <= MAX_BITS_PER_WORD

Any specific reason for MAX_BITS_PER_WORD instead of
GET_MODE_BITSIZE (Pmode)?  In general I like the idea to switch
to scalar moves here but couldn't it already be debatable for
a 64-bit move on rv32 i.e. a question of costing?

> +  gcc_assert (ok_p);
> +  DONE;

Here as well as before, why the assert?  Is this intended for
easier debugging later?

> +
> +(define_expand "@mov_lra"
> +  [(parallel
> +[(set (match_operand:VLS_AVL_REG 0 "reg_or_mem_operand")
> +   (match_operand:VLS_AVL_REG 1 "reg_or_mem_operand"))
> +   (clobber (match_scratch:P 2))])]
> +  "TARGET_VECTOR && (lra_in_progress || reload_completed)"
> +{})
> +
> +(define_insn_and_split "*mov_lra"

These are the same as for the fractional modes but we define
the expanders for all modes instead.  legitimize_move will only
create the _lra if the mode is indeed smaller than a vector and
I assume the scratch will prevent us from ever generating the
insn any other way.  Still, couldn't we add the new modes to
V_FRACT?  Or didn't you want to mix VLA and VLS modes there?
The others seem to be mixed, though.

> +#define INCLUDE_ALGORITHM

What do we need this for?

> +  int inner_size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
> +  if (size < TARGET_MIN_VLEN)
> + {
> +   int factor = TARGET_MIN_VLEN / size;
> +   if (inner_size == 8)
> + factor = std::min (factor, 8);

Maybe for this?  Won't MIN suffice?

> +  ;; VLS modes.
> +  (V2QI "TARGET_VECTOR_VLS")

Why do these iterator modes have a condition on VLS while the following
ones don't?  It's probably not terribly important as it works either way
but still sticks out.

Btw. as a general remark.  In the past I also found the single-element
vectors helpful for codegen but that might be obsolete.  Not in scope
for this patch.

Regards
 Robin



Re: [PATCH v2][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-07-26 Thread Jason Merrill via Gcc-patches

On 6/28/23 06:35, Alex Coplan wrote:

Hi,

This patch implements clang's __has_feature and __has_extension in GCC.
This is a v2 of the original RFC posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Changes since v1:
  - Follow the clang behaviour where -pedantic-errors means that
__has_extension behaves exactly like __has_feature.
  - We're now more conservative with reporting C++ features as extensions
available in C++98. For features where we issue a pedwarn in C++98
mode, we no longer report these as available extensions for C++98.
  - Switch to using a hash_map to store the features. As well as ensuring
lookup is constant time, this allows us to dynamically register
features (right now based on frontend, but later we could allow the
target to register additional features).
  - Also implement some Objective-C features, add a langhook to dispatch
to each frontend to allow it to register language-specific features.
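
For readers unfamiliar with these macros, the usual way to query them portably can be sketched as follows. This is only an illustration, not part of the patch: the fallback definitions and the helper name `supports_binary_literals` are assumptions made for the example, and the answer depends on the compiler and language mode in use.

```cpp
#include <cassert>

// Compilers that do not provide the macros get a conservative fallback,
// so the queries below always compile.
#ifndef __has_feature
#  define __has_feature(x) 0
#endif
#ifndef __has_extension
#  define __has_extension(x) __has_feature(x)
#endif

// __has_feature asks "is this standard in the current language mode?",
// __has_extension asks "can I use it at all in this mode?".
constexpr bool binary_literals_feature = __has_feature(cxx_binary_literals);
constexpr bool binary_literals_usable = __has_extension(cxx_binary_literals);

// Hypothetical helper: 1 if the compiler reports binary literals as usable.
int supports_binary_literals() {
    return binary_literals_usable ? 1 : 0;
}
```

With -pedantic-errors, the v2 change described above means both queries would be expected to give the same answer.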


Hmm, it seems questionable to use a generic langhook for something that 
the generic code doesn't care about, only the c-family front ends.  A 
common pattern in c-family is to declare a signature in c-common.h and 
define it differently for the various front-ends, i.e. in the *-lang.cc 
files.



There is an outstanding question around what to do with
cxx_binary_literals in the C frontend for C2x. Should we introduce a new
c_binary_literals feature that is a feature in C2x and an extension
below that, or should we just continue using the cxx_binary_literals
feature and mark that as a standard feature in C2x? See the comment in
c_feature_table in the patch.


What does clang do here?


There is also some doubt over what to do with the undocumented "tls"
feature.  In clang this is gated on whether the target supports TLS, but
in clang (unlike GCC) it is a hard error to use TLS when the target
doesn't support it.  In GCC I believe you can always use TLS, you just
get emulated TLS in the case that the target doesn't support it
natively.  So in this patch GCC always reports having the "tls" feature.
Would appreciate if anyone has feedback on this aspect.


Hmm, I don't think GCC always supports TLS, given that the testsuite has 
a predicate to check for that support (and others to check for emulated 
or native support).


But I think it's right to report having "tls" for emulated support.


I know Iain was concerned that it should be possible to have
target-specific features. Hopefully it is clear that the design in this
patch is more amenable to this. I think for Darwin it should be possible
to add a targetcm hook to register additional features (either passing
through a callback to allow the target code to add to the hash_map, or
exposing a separate langhook that the target can call to register
features).


The design seems a bit complicated still, with putting a callback into 
the map.  Do we need the callbacks?  Do we expect the value of 
__has_feature to change at different points in compilation?  Does that 
happen in clang?



Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin. Any
thoughts?


Most of the patch needs more comments, particularly before various 
top-level definitions.


Jason



Re: [PATCH, v3] Fortran: diagnose strings of non-constant length in DATA statements [PR68569]

2023-07-26 Thread Steve Kargl via Gcc-patches
On Wed, Jul 26, 2023 at 09:33:22PM +0200, Harald Anlauf via Fortran wrote:
> I am going to get the brown bag for today...  This is now the right
> corrected patch.
> 
> Sorry for all the noise!
> 

Third time's a charm (as the saying goes).

Looks good to me.  Thanks for the patch.

-- 
Steve


RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-07-26 Thread Tamar Christina via Gcc-patches
Hi,

This is a respin of the patch, taking in the feedback received on the C++
part.

It also serves as a ping.



Hi All,

FORTRAN currently has a pragma NOVECTOR for indicating that vectorization should
not be applied to a particular loop.

ICC/ICX also has such a pragma for C and C++ called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.

This patch proposes a #pragma GCC novector that does the same for C
as gfortran does for FORTRAN and what ICC/ICX does for C.

I added only some basic tests here, but the next patch in the series uses this
in roughly 800 tests in the testsuite.
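
As a rough illustration of the intended usage (a sketch only, not taken from the patch or its tests; the function name `sum_scalar` is made up for the example), the pragma is placed directly before the loop it applies to. Compilers that do not know the pragma typically just ignore it, possibly with a warning.

```cpp
#include <cassert>
#include <cstddef>

// Sums an array while asking the compiler not to vectorize the loop.
// The result is the same with or without the pragma; only codegen differs.
int sum_scalar(const int* data, std::size_t n) {
    int total = 0;
#pragma GCC novector
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```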

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement, c_parser_statement_after_labels,
c_parse_pragma_novector, c_parser_pragma): Wire through novector and
default to false.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-novector-pragma.c: New test.

--- inline copy of patch ---

diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -87,6 +87,7 @@ enum pragma_kind {
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
   PRAGMA_UNROLL,
+  PRAGMA_NOVECTOR,
 
   PRAGMA_FIRST_EXTERNAL
 };
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1862,6 +1862,10 @@ init_pragma (void)
 cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
  false, false);
 
+  if (!flag_preprocess_only)
+cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR,
+ false, false);
+
 #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
   c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);
 #else
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..74f3cbb0d61b5f4c0eb300672f495dde3f1517f7 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *,
  location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
-static void c_parser_do_statement (c_parser *, bool, unsigned short);
-static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
+ bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short, bool);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool,
+   bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p,
  c_parser_switch_statement (parser, if_p);
  break;
case RID_WHILE:
- c_parser_while_statement (parser, false, 0, if_p);
+ c_parser_while_statement (parser, false, 0, false, if_p);
  break;
case RID_DO:
- c_parser_do_statement (parser, false, 0);
+ c_parser_do_statement (parser, false, 0, false);
  break;
case RID_FOR:
- c_parser_for_statement (parser, false, 0, if_p);
+ c_parser_for_statement (parser, false, 0, false, if_p);
  break;
case RID_GOTO:
  c_parser_consume_token (parser);
@@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p)
 
 static void
 c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
- bool *if_p)
+ bool novector, bool *if_p)
 {
   tree block, cond, body;
   unsigned char save_in_statement;
@@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
   build_int_cst (integer_type_node,
  annot_expr_unroll_kind),
   build_int_cst (integer_type_node, unroll));
+  if (novector && cond != error_mark_node)
+cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+  build_int_cst 

[PATCH, v3] Fortran: diagnose strings of non-constant length in DATA statements [PR68569]

2023-07-26 Thread Harald Anlauf via Gcc-patches
I am going to get the brown bag for today...  This is now the right 
corrected patch.


Sorry for all the noise!

Harald

On 26.07.23 at 21:17, Harald Anlauf via Gcc-patches wrote:

Dear all,

the original submission missed the adjustments of the expected
patterns of 2 tests.  This is corrected in the new attachments.

On 26.07.23 at 21:10, Harald Anlauf via Gcc-patches wrote:

Dear all,

the attached patch fixes an ICE-on-invalid after use of strings of
non-constant length in DATA statements.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



Thanks,
Harald

From d872b8ffc121fd57d47aa7d3d12d9ba86389f092 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 26 Jul 2023 21:12:45 +0200
Subject: [PATCH] Fortran: diagnose strings of non-constant length in DATA
 statements [PR68569]

gcc/fortran/ChangeLog:

	PR fortran/68569
	* resolve.cc (check_data_variable): Do not accept strings with
	deferred length or non-constant length in a DATA statement.
	Reject also substrings of string variables of non-constant length.

gcc/testsuite/ChangeLog:

	PR fortran/68569
	* gfortran.dg/data_char_4.f90: Adjust expected diagnostic.
	* gfortran.dg/data_char_5.f90: Likewise.
	* gfortran.dg/data_char_6.f90: New test.
---
 gcc/fortran/resolve.cc| 22 ++-
 gcc/testsuite/gfortran.dg/data_char_4.f90 |  2 +-
 gcc/testsuite/gfortran.dg/data_char_5.f90 |  8 +++
 gcc/testsuite/gfortran.dg/data_char_6.f90 | 26 +++
 4 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/data_char_6.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index f7cfdfc133f..3cd470ddcca 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -16771,7 +16771,6 @@ check_data_variable (gfc_data_variable *var, locus *where)
 return false;
 
   ar = NULL;
-  mpz_init_set_si (offset, 0);
   e = var->expr;
 
   if (e->expr_type == EXPR_FUNCTION && e->value.function.isym
@@ -16838,8 +16837,24 @@ check_data_variable (gfc_data_variable *var, locus *where)
 		 "attribute", ref->u.c.component->name, &e->where);
 	  return false;
 	}
+
+  /* Reject substrings of strings of non-constant length.  */
+  if (ref->type == REF_SUBSTRING
+	  && ref->u.ss.length
+	  && ref->u.ss.length->length
+	  && !gfc_is_constant_expr (ref->u.ss.length->length))
+	goto bad_charlen;
 }
 
+  /* Reject strings with deferred length or non-constant length.  */
+  if (e->ts.type == BT_CHARACTER
+  && (e->ts.deferred
+	  || (e->ts.u.cl->length
+	  && !gfc_is_constant_expr (e->ts.u.cl->length))))
+goto bad_charlen;
+
+  mpz_init_set_si (offset, 0);
+
   if (e->rank == 0 || has_pointer)
 {
   mpz_init_set_ui (size, 1);
@@ -16967,6 +16982,11 @@ check_data_variable (gfc_data_variable *var, locus *where)
   mpz_clear (offset);
 
   return t;
+
+bad_charlen:
+  gfc_error ("Non-constant character length at %L in DATA statement",
+	 &e->where);
+  return false;
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/data_char_4.f90 b/gcc/testsuite/gfortran.dg/data_char_4.f90
index ed0782ce8a0..fa5e0a0134a 100644
--- a/gcc/testsuite/gfortran.dg/data_char_4.f90
+++ b/gcc/testsuite/gfortran.dg/data_char_4.f90
@@ -4,7 +4,7 @@
 
 program p
   character(l) :: c(2) ! { dg-error "must have constant character length" }
-  data c /'a', 'b'/
+  data c /'a', 'b'/! { dg-error "Non-constant character length" }
   common c
 end
 
diff --git a/gcc/testsuite/gfortran.dg/data_char_5.f90 b/gcc/testsuite/gfortran.dg/data_char_5.f90
index ea26687e3d5..7556e63c01b 100644
--- a/gcc/testsuite/gfortran.dg/data_char_5.f90
+++ b/gcc/testsuite/gfortran.dg/data_char_5.f90
@@ -4,12 +4,12 @@
 subroutine sub ()
   integer :: ll = 4
   block
-character(ll) :: c(2) ! { dg-error "non-constant" }
-data c /'a', 'b'/
+character(ll) :: c(2)
+data c /'a', 'b'/ ! { dg-error "Non-constant character length" }
   end block
 contains
   subroutine sub1 ()
-character(ll) :: d(2) ! { dg-error "non-constant" }
-data d /'a', 'b'/
+character(ll) :: d(2)
+data d /'a', 'b'/ ! { dg-error "Non-constant character length" }
   end subroutine sub1
 end subroutine sub
diff --git a/gcc/testsuite/gfortran.dg/data_char_6.f90 b/gcc/testsuite/gfortran.dg/data_char_6.f90
new file mode 100644
index 000..4e32c647d4d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/data_char_6.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! PR fortran/68569 - ICE with automatic character object and DATA 
+! Contributed by G. Steinmetz
+
+subroutine s1 (n)
+  implicit none
+  integer, intent(in) :: n
+  character(n) :: x
+  data x /'a'/ ! { dg-error "Non-constant character length" }
+end
+
+subroutine s2 (n)
+  implicit none
+  integer, intent(in) :: n
+  character(n) :: x
+  data x(1:1) /'a'/! { dg-error "Non-constant character length" }
+end
+
+subroutine s3 ()
+  implicit none
+  type t
+ character(:) :: c ! { dg-error "must be a POINTER or 

RE: [PATCH 1/2][frontend] Add novector C++ pragma

2023-07-26 Thread Tamar Christina via Gcc-patches
> > +
> > +   cp_token *tok = pragma_tok;
> > +
> > +   do
> >   {
> > -   tok = cp_lexer_consume_token (parser->lexer);
> > -   ivdep = cp_parser_pragma_ivdep (parser, tok);
> > -   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   switch (cp_parser_pragma_kind (tok))
> > + {
> > +   case PRAGMA_IVDEP:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   ivdep = cp_parser_pragma_ivdep (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_UNROLL:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   unroll = cp_parser_pragma_unroll (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   case PRAGMA_NOVECTOR:
> > + {
> > +   if (tok != pragma_tok)
> > + tok = cp_lexer_consume_token (parser->lexer);
> > +   novector = cp_parser_pragma_novector (parser, tok);
> > +   tok = cp_lexer_peek_token (the_parser->lexer);
> > +   break;
> > + }
> > +   default:
> > + gcc_unreachable ();
> 
> This unreachable seems to assert that if a pragma follows one of these
> pragmas, it must be another one of these pragmas?  That seems wrong;
> instead of hitting gcc_unreachable() in that case we should fall through to the
> diagnostic below.
> 

Ah, good shout.  Since it has to exit two levels I had to introduce a bool
for controlling the loop iterations.  New patch below.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/cp/ChangeLog:

* cp-tree.h (RANGE_FOR_NOVECTOR): New.
(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Add novector param.
* init.cc (build_vec_init): Default novector to false.
* method.cc (build_comparison_op): Likewise.
* parser.cc (cp_parser_statement): Likewise.
(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
cp_convert_range_for, cp_parser_iteration_statement,
cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
(cp_parser_pragma_novector): New.
* pt.cc (tsubst_expr): Likewise.
* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
finish_for_cond): Likewise.

gcc/ChangeLog:

* doc/extend.texi: Document it.

gcc/testsuite/ChangeLog:

* g++.dg/vect/vect.exp (support vect- prefix).
* g++.dg/vect/vect-novector-pragma.cc: New test.

--- inline copy of patch ---

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8398223311194837441107cb335d497ff5f5ec1c..bece7bff1f01a23cfc94386fd3295a0be8c462fe 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t)
 #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4)
 #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5)
 #define RANGE_FOR_IVDEP(NODE)  TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE))
+#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_5 (RANGE_FOR_STMT_CHECK (NODE))
 
 /* STMT_EXPR accessor.  */
 #define STMT_EXPR_STMT(NODE)   TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0)
@@ -7286,7 +7287,7 @@ extern bool maybe_clone_body  (tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
- unsigned short);
+ unsigned short, bool);
 extern void cp_convert_omp_range_for (tree &, vec *, tree &,
  tree &, tree &, tree &, tree &, tree &);
 extern void cp_finish_omp_range_for (tree, tree);
@@ -7609,16 +7610,19 @@ extern void begin_else_clause   (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt   (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
+bool);
 extern void finish_while_stmt  (tree);
 extern tree begin_do_stmt  (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short);
+extern void finish_do_stmt (tree, tree, bool, unsigned short,
+bool);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope(tree *);
 extern tree begin_for_stmt (tree, tree);
 

[PATCH, v2] Fortran: diagnose strings of non-constant length in DATA statements [PR68569]

2023-07-26 Thread Harald Anlauf via Gcc-patches

Dear all,

the original submission missed the adjustments of the expected
patterns of 2 tests.  This is corrected in the new attachments.

On 26.07.23 at 21:10, Harald Anlauf via Gcc-patches wrote:

Dear all,

the attached patch fixes an ICE-on-invalid after use of strings of
non-constant length in DATA statements.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



Thanks,
Harald

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index f7cfdfc133f..cd8e223edce 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -16771,7 +16787,6 @@ check_data_variable (gfc_data_variable *var, locus *where)
 return false;
 
   ar = NULL;
-  mpz_init_set_si (offset, 0);
   e = var->expr;
 
   if (e->expr_type == EXPR_FUNCTION && e->value.function.isym
@@ -16838,8 +16853,24 @@ check_data_variable (gfc_data_variable *var, locus *where)
 		 "attribute", ref->u.c.component->name, &e->where);
 	  return false;
 	}
+
+  /* Reject substrings of strings of non-constant length.  */
+  if (ref->type == REF_SUBSTRING
+	  && ref->u.ss.length
+	  && ref->u.ss.length->length
+	  && !gfc_is_constant_expr (ref->u.ss.length->length))
+	goto bad_charlen;
 }
 
+  /* Reject deferred length character and strings of non-constant length.  */
+  if (e->ts.type == BT_CHARACTER
+  && (e->ts.deferred
+	  || (e->ts.u.cl->length
+	  && !gfc_is_constant_expr (e->ts.u.cl->length))))
+goto bad_charlen;
+
+  mpz_init_set_si (offset, 0);
+
   if (e->rank == 0 || has_pointer)
 {
   mpz_init_set_ui (size, 1);
@@ -16967,6 +16998,11 @@ check_data_variable (gfc_data_variable *var, locus *where)
   mpz_clear (offset);
 
   return t;
+
+bad_charlen:
+  gfc_error ("Non-constant character length at %L in DATA statement",
+	 &e->where);
+  return false;
 }
 
 


[PATCH] Fortran: diagnose strings of non-constant length in DATA statements [PR68569]

2023-07-26 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch fixes an ICE-on-invalid after use of strings of
non-constant length in DATA statements.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From b5b13db48c169ef18a8b75739bd4f22f9fd5654e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 26 Jul 2023 20:46:50 +0200
Subject: [PATCH] Fortran: diagnose strings of non-constant length in DATA
 statements [PR68569]

gcc/fortran/ChangeLog:

	PR fortran/68569
	* resolve.cc (check_data_variable): Do not accept strings with
	deferred length or non-constant length in a DATA statement.
	Reject also substrings of string variables of non-constant length.

gcc/testsuite/ChangeLog:

	PR fortran/68569
	* gfortran.dg/data_char_6.f90: New test.
---
 gcc/fortran/resolve.cc| 22 ++-
 gcc/testsuite/gfortran.dg/data_char_6.f90 | 26 +++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/data_char_6.f90

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index f7cfdfc133f..3cd470ddcca 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -16771,7 +16771,6 @@ check_data_variable (gfc_data_variable *var, locus *where)
 return false;

   ar = NULL;
-  mpz_init_set_si (offset, 0);
   e = var->expr;

   if (e->expr_type == EXPR_FUNCTION && e->value.function.isym
@@ -16838,8 +16837,24 @@ check_data_variable (gfc_data_variable *var, locus *where)
 		 "attribute", ref->u.c.component->name, &e->where);
 	  return false;
 	}
+
+  /* Reject substrings of strings of non-constant length.  */
+  if (ref->type == REF_SUBSTRING
+	  && ref->u.ss.length
+	  && ref->u.ss.length->length
+	  && !gfc_is_constant_expr (ref->u.ss.length->length))
+	goto bad_charlen;
 }

+  /* Reject strings with deferred length or non-constant length.  */
+  if (e->ts.type == BT_CHARACTER
+  && (e->ts.deferred
+	  || (e->ts.u.cl->length
+	  && !gfc_is_constant_expr (e->ts.u.cl->length))))
+goto bad_charlen;
+
+  mpz_init_set_si (offset, 0);
+
   if (e->rank == 0 || has_pointer)
 {
   mpz_init_set_ui (size, 1);
@@ -16967,6 +16982,11 @@ check_data_variable (gfc_data_variable *var, locus *where)
   mpz_clear (offset);

   return t;
+
+bad_charlen:
+  gfc_error ("Non-constant character length at %L in DATA statement",
+	 &e->where);
+  return false;
 }


diff --git a/gcc/testsuite/gfortran.dg/data_char_6.f90 b/gcc/testsuite/gfortran.dg/data_char_6.f90
new file mode 100644
index 000..4e32c647d4d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/data_char_6.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! PR fortran/68569 - ICE with automatic character object and DATA
+! Contributed by G. Steinmetz
+
+subroutine s1 (n)
+  implicit none
+  integer, intent(in) :: n
+  character(n) :: x
+  data x /'a'/ ! { dg-error "Non-constant character length" }
+end
+
+subroutine s2 (n)
+  implicit none
+  integer, intent(in) :: n
+  character(n) :: x
+  data x(1:1) /'a'/! { dg-error "Non-constant character length" }
+end
+
+subroutine s3 ()
+  implicit none
+  type t
+ character(:) :: c ! { dg-error "must be a POINTER or ALLOCATABLE" }
+  end type t
+  type(t) :: tp
+  data tp%c /'a'/  ! { dg-error "Non-constant character length" }
+end
--
2.35.3



Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-26 Thread Drew Ross via Gcc-patches
Here is what I came up with for combining the two:

/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
   unsigned x OR truncate into the precision(type) - c lowest bits
   of signed x (if they have mode precision or a precision of 1)  */
(simplify
 (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
 (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
  (if (TYPE_UNSIGNED (type))
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1))
   (if (INTEGRAL_TYPE_P (type))
    (with {
      int width = element_precision (type) - tree_to_uhwi (@1);
      tree stype = build_nonstandard_integer_type (width, 0);
     }
     (if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
      (convert (convert:stype @0))))))))

Let me know what you think.
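
For reference, the two identities the combined pattern relies on can be checked outside the compiler with a small sketch like the one below. It is not part of the patch: the helper names are made up for the example, and it assumes 32-bit values, two's-complement conversion, and an arithmetic right shift for signed operands (which GCC provides).

```cpp
#include <cassert>
#include <cstdint>

// (x << c) >> c on an unsigned 32-bit value.
static uint32_t shl_shr_unsigned(uint32_t x, unsigned c) {
    return (x << c) >> c;
}

// The masked form used for unsigned types: x & ((unsigned)-1 >> c).
static uint32_t mask_low_bits(uint32_t x, unsigned c) {
    return x & (UINT32_MAX >> c);
}

// (x << c) >> c on a signed 32-bit value. The left shift is done on the
// unsigned representation to avoid signed overflow in the example itself.
static int32_t shl_shr_signed(int32_t x, unsigned c) {
    return static_cast<int32_t>(static_cast<uint32_t>(x) << c) >> c;
}

// Truncate x to its low `width` bits, then sign-extend back to 32 bits.
// This models (convert (convert:stype @0)) with width = 32 - c.
static int32_t sign_extend_low_bits(int32_t x, unsigned width) {
    uint32_t mask = (width == 32) ? UINT32_MAX : ((1u << width) - 1u);
    uint32_t low = static_cast<uint32_t>(x) & mask;
    uint32_t sign = 1u << (width - 1);
    return static_cast<int32_t>((low ^ sign) - sign);
}
```

With c = 31 the signed form reduces to a one-bit truncation, i.e. the -(x & 1) case from the PR title.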

> Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?

I will look into this.

Drew

On Wed, Jul 26, 2023 at 4:40 AM Richard Biener 
wrote:

> On Tue, Jul 25, 2023 at 9:26 PM Drew Ross  wrote:
> >
> > > With that fixed I think for non-vector integrals the above is the most
> suitable
> > > canonical form of a sign-extension.  Note it should also work for any
> other
> > > constant shift amount - just use the appropriate intermediate
> precision for
> > > the truncating type.
> > > We _might_ want
> > > to consider to only use the converts when the intermediate type has
> > > mode precision (and as a special case allow one bit as in your above
> case)
> > > so it can expand to (sign_extend: (subreg: reg)).
> >
> > Here is a pattern that only matches to truncations that result in
> mode precision (or precision of 1):
> >
> > (simplify
> >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >  (if (INTEGRAL_TYPE_P (type)
> >   && !TYPE_UNSIGNED (type)
> >   && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN
> (TREE_TYPE (@1
> >   (with {
> > int width = element_precision (type) - tree_to_uhwi (@1);
> > tree stype = build_nonstandard_integer_type (width, 0);
> >}
> >(if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> > (convert (convert:stype @0))
> >
> > Look ok?
>
> I suppose so.  Can you see to amend the existing
>
> /* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
>types.  */
> (simplify
>  (rshift (lshift @0 INTEGER_CST@1) @1)
>  (if (TYPE_UNSIGNED (type)
>   && (wi::ltu_p (wi::to_wide (@1), element_precision (type
>   (bit_and @0 (rshift { build_minus_one_cst (type); } @1
>
> pattern?  You will get a duplicate pattern diagnostic otherwise.  It
> also looks like this
> one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
> some cases of widening/truncating converts between the shifts?
>
> Richard.
>
> > > You might also want to verify what RTL expansion
> > > produces before/after - it at least shouldn't be worse.
> >
> > The RTL is slightly better for the mode precision cases and slightly
> worse for the precision 1 case.
> >
> > > That said - do you have any testcase where the canonicalization is an
> enabler
> > > for further transforms or was this requested stand-alone?
> >
> > No, I don't have any specific test cases. This patch is just in response
> to pr101955.
> >
> > On Tue, Jul 25, 2023 at 2:55 AM Richard Biener <
> richard.guent...@gmail.com> wrote:
> >>
> >> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek  wrote:
> >> >
> >> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches
> wrote:
> >> > > So would something like
> >> > >
> >> > > (simplify
> >> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
> >> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
> >> > >  (if (INTEGRAL_TYPE_P (type)
> >> > >   && !TYPE_UNSIGNED (type)
> >> > >   && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
> >> > >   (convert (convert:stype @0)
> >> > >
> >> > > work?
> >> >
> >> > Certainly swap the if and with and the (with then should be indented
> by 1
> >> > column to the right of (if and (convert one further (the reason for
> the
> >> > swapping is not to call build_nonstandard_integer_type when it will
> not be
> >> > needed, which will be probably far more often than an actual match).
> >>
> >> With that fixed I think for non-vector integrals the above is the most
> suitable
> >> canonical form of a sign-extension.  Note it should also work for any
> other
> >> constant shift amount - just use the appropriate intermediate precision
> for
> >> the truncating type.  You might also want to verify what RTL expansion
> >> produces before/after - it at least shouldn't be worse.  We _might_ want
> >> to consider to only use the converts when the intermediate type has
> >> mode precision (and as a special case allow one bit as in your above
> case)
> >> so it can expand to (sign_extend: (subreg: reg)).
> >>
> >> > As discussed privately, the above isn't what we want for vectors and
> 

[committed] libstdc++: Require C++11 for 23_containers/vector/bool/110807.cc [PR110807]

2023-07-26 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

-- >8 --

This new test uses uniform initialization syntax, so requires C++11 or
later.

libstdc++-v3/ChangeLog:

PR libstdc++/110807
* testsuite/23_containers/vector/bool/110807.cc: Require c++11.
---
 libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc 
b/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc
index 5c019bd9524..2e9d4019edb 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc
@@ -1,5 +1,5 @@
 // { dg-options "-O2" }
-// { dg-do compile }
+// { dg-do compile { target c++11 } }
 
 // Bug 110807
 // Copy list initialisation of a vector raises a warning with -O2
-- 
2.41.0



Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/19/23 04:11, Xiao Zeng wrote:

This patch completes the recognition of the basic semantics
defined in the spec, namely:

Conditional zero, if condition is equal to zero
   rd = (rs2 == 0) ? 0 : rs1
Conditional zero, if condition is non zero
   rd = (rs2 != 0) ? 0 : rs1

gcc/ChangeLog:

* config/riscv/riscv.md: Include zicond.md
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-primitiveSemantics.c: New test.

So I played with this a bit today.  I originally thought that using 
match_dup was the right way to go for those 4 secondary patterns.  But 
after further pondering it's not ideal.


match_dup will require pointer equality within the RTL structure.  That 
could inhibit detection in two cases.  First, SUBREGs.   SUBREGs are not 
shared.  So we'd never match if we had a SUBREG expression.


Second, post register allocation we can have the same looking RTX, but 
it may not be pointer equal.


The SUBREG issue also means that we don't want to use a REGNO (x) == 
REGNO (y) style check because those macros are only valid on REG 
expressions.  We could strip the SUBREG, but that's usually awkward to 
do in a pattern's condition.


The net result is we probably should use rtx_equal_p which I was hoping 
to avoid.  I'm testing with that change to the 4 secondary patterns 
right now.  Assuming that passes (and I have no reason to think it 
won't) then I'll go ahead and commit #1 and #2 from this series which is 
all I have time for today.
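
The difference between pointer equality and structural equality described above can be illustrated with a toy expression node. This is a minimal sketch only: toy_rtx and toy_rtx_equal_p are hypothetical stand-ins, not GCC's rtx or its rtx_equal_p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an RTL expression; not GCC's real rtx layout.  */
enum toy_code { TOY_REG, TOY_SUBREG };

struct toy_rtx {
  enum toy_code code;
  int regno;                    /* meaningful for TOY_REG only */
  int byte;                     /* meaningful for TOY_SUBREG only */
  const struct toy_rtx *inner;  /* meaningful for TOY_SUBREG only */
};

/* Structural comparison: two distinct nodes compare equal when they
   describe the same expression, even if they are different objects.  */
static bool toy_rtx_equal_p(const struct toy_rtx *x, const struct toy_rtx *y)
{
  if (x == y)
    return true;                /* pointer-equal is trivially equal */
  if (x == NULL || y == NULL || x->code != y->code)
    return false;
  switch (x->code) {
  case TOY_REG:
    return x->regno == y->regno;
  case TOY_SUBREG:
    return x->byte == y->byte && toy_rtx_equal_p(x->inner, y->inner);
  }
  return false;
}
```

Two separately allocated TOY_SUBREG nodes wrapping the same register fail an `x == y` test but pass toy_rtx_equal_p, which is the unshared-SUBREG situation in miniature.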




Jeff


Re: [gcc13 backport 12/12] riscv: fix error: control reaches end of non-void function

2023-07-26 Thread Patrick O'Neill

This final patch fixes an error introduced by patch 9/12 in this series.
I'll backport alongside the other patches once the 13 branch is unfrozen :)

On 7/25/23 18:22, Kito Cheng wrote:

OK for backport :)

On Wed, Jul 26, 2023 at 2:11 AM Patrick O'Neill  wrote:

From: Martin Liska 

Fixes:
gcc/config/riscv/sync.md:66:1: error: control reaches end of non-void function [-Werror=return-type]
66 |   [(set (attr "length") (const_int 4))])
| ^

 PR target/109713

gcc/ChangeLog:

 * config/riscv/sync.md: Add gcc_unreachable to a switch.
---
  gcc/config/riscv/sync.md | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 6e7c762ac57..9fc626267de 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -62,6 +62,8 @@
 return "fence\tr,rw";
  else if (model == MEMMODEL_RELEASE)
 return "fence\trw,w";
+else
+   gcc_unreachable ();
}
[(set (attr "length") (const_int 4))])

--
2.34.1
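
For context, the shape of the problem is an if / else-if chain in which every branch returns but the compiler cannot prove the chain is exhaustive. A minimal sketch follows; the enum, the function name and the first fence string are assumptions for illustration, and abort () stands in for gcc_unreachable (), which is internal to GCC.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the memory model enumeration.  */
enum toy_memmodel { TOY_SEQ_CST, TOY_ACQUIRE, TOY_RELEASE };

static const char *toy_fence_for(enum toy_memmodel model)
{
  if (model == TOY_SEQ_CST)
    return "fence\trw,rw";
  else if (model == TOY_ACQUIRE)
    return "fence\tr,rw";
  else if (model == TOY_RELEASE)
    return "fence\trw,w";
  else
    /* Without this final branch the function can fall off its end,
       which -Wreturn-type reports.  abort () never returns, so the
       compiler knows no path reaches the closing brace.  */
    abort();
}
```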



Re: [gcc-13] Backport PR10280 fix

2023-07-26 Thread Prathamesh Kulkarni via Gcc-patches
Sorry, I meant PR110280 in subject line (not PR10280).

On Wed, 26 Jul 2023 at 23:03, Prathamesh Kulkarni
 wrote:
>
> Hi Richard,
> Sorry for the delay in backport to gcc-13.
> The attached patch (cherry picked from master) is bootstrapped+tested
> on aarch64-linux-gnu with SVE enabled on gcc-13 branch.
> OK to commit to gcc-13 branch ?
>
> Thanks,
> Prathamesh


[gcc-13] Backport PR10280 fix

2023-07-26 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
Sorry for the delay in backport to gcc-13.
The attached patch (cherry picked from master) is bootstrapped+tested
on aarch64-linux-gnu with SVE enabled on gcc-13 branch.
OK to commit to gcc-13 branch ?

Thanks,
Prathamesh
[aarch64/match.pd] Fix ICE observed in PR110280.

gcc/ChangeLog:
PR tree-optimization/110280
* match.pd (vec_perm_expr(v, v, mask) -> v): Explicitly build vector
using build_vector_from_val with the element of input operand, and
mask's type if operand and mask's types don't match.

gcc/testsuite/ChangeLog:
PR tree-optimization/110280
* gcc.target/aarch64/sve/pr110280.c: New test.

(cherry picked from commit 85d8e0d8d5342ec8b4e6a54e22741c30b33c6f04)

diff --git a/gcc/match.pd b/gcc/match.pd
index 91182448250..c3bb4fbc0a7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8292,7 +8292,14 @@ and,
 
 (simplify
  (vec_perm vec_same_elem_p@0 @0 @1)
- @0)
+ (if (types_match (type, TREE_TYPE (@0)))
+  @0
+  (with
+   {
+ tree elem = uniform_vector_p (@0);
+   }
+   (if (elem)
+{ build_vector_from_val (type, elem); }
 
 /* Push VEC_PERM earlier if that may help FMA perception (PR101895).  */
 (simplify
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
new file mode 100644
index 000..d3279f38362
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr110280.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include "arm_sve.h"
+
+svuint32_t l()
+{
+  _Alignas(16) const unsigned int lanes[4] = {0, 0, 0, 0};
+  return svld1rq_u32(svptrue_b8(), lanes);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
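
The idea behind the simplification can be sketched on plain arrays: permuting a vector whose lanes are all equal gives the same result as splatting that element into a vector of the result's length, which is why the result can be rebuilt from the element when the input and result types differ. The toy_* helpers below are hypothetical and only model the lane arithmetic, not GCC's tree representation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true and stores the repeated value in *elem when all n
   lanes of v hold the same value.  */
static bool toy_uniform_p(const unsigned *v, size_t n, unsigned *elem)
{
  if (n == 0)
    return false;
  for (size_t i = 1; i < n; i++)
    if (v[i] != v[0])
      return false;
  *elem = v[0];
  return true;
}

/* Toy two-input permute with both inputs equal to v: lane i of the
   m-lane result is v[mask[i] % n].  */
static void toy_vec_perm(const unsigned *v, size_t n,
                         const size_t *mask, size_t m, unsigned *out)
{
  for (size_t i = 0; i < m; i++)
    out[i] = v[mask[i] % n];
}

/* Build an m-lane vector with every lane set to elem.  */
static void toy_splat(unsigned elem, size_t m, unsigned *out)
{
  for (size_t i = 0; i < m; i++)
    out[i] = elem;
}
```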


Re: [PATCH] c++: constexpr empty subobject confusion [PR110197]

2023-07-26 Thread Jason Merrill via Gcc-patches

On 7/26/23 12:57, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13 (later)?


OK.


-- >8 --

Now that init_subob_ctx no longer sets new_ctx.ctor for a subobject of
empty type, it seems we need to ensure its callers cxx_eval_bare_aggregate
and cxx_eval_vec_init_1 consistently omit entries for such subobjects in
the parent ctx->ctor.  We also need to allow cxx_eval_array_reference
to synthesize an empty element object even if the array CONSTRUCTOR
has CONSTRUCTOR_NO_CLEARING set.

PR c++/110197

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Return a synthesized
empty subobject even if CONSTRUCTOR_NO_CLEARING is set.
(cxx_eval_bare_aggregate): Set 'no_slot' to true more generally
whenever new_ctx.ctor is empty, i.e. for any subobject of empty
type.
(cxx_eval_vec_init_1): Define 'no_slot' as above and use it
accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty18.C: New test.
* g++.dg/cpp0x/constexpr-empty19.C: New test.
---
  gcc/cp/constexpr.cc   | 23 +--
  .../g++.dg/cpp0x/constexpr-empty18.C  |  7 ++
  .../g++.dg/cpp0x/constexpr-empty19.C  | 12 ++
  3 files changed, 35 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f2fcb54626d..da2c3116810 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4297,6 +4297,9 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
  
/* Not found.  */
  
+  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
+return build_constructor (elem_type, NULL);
+
if (TREE_CODE (ary) == CONSTRUCTOR
&& CONSTRUCTOR_NO_CLEARING (ary))
  {
@@ -4314,9 +4317,7 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
   directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
tree val;
constexpr_ctx new_ctx;
-  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
-return build_constructor (elem_type, NULL);
-  else if (CP_AGGREGATE_TYPE_P (elem_type))
+  if (CP_AGGREGATE_TYPE_P (elem_type))
  {
tree empty_ctor = build_constructor (init_list_type_node, NULL);
val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
@@ -5095,9 +5096,9 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
  {
tree orig_value = value;
-  /* Like in cxx_eval_store_expression, omit entries for empty fields.  */
-  bool no_slot = TREE_CODE (type) == RECORD_TYPE && is_empty_field (index);
init_subob_ctx (ctx, new_ctx, index, value);
+  /* Like in cxx_eval_store_expression, omit entries for empty fields.  */
+  bool no_slot = new_ctx.ctor == NULL_TREE;
int pos_hint = -1;
if (new_ctx.ctor != ctx->ctor && !no_slot)
{
@@ -5261,7 +5262,8 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
bool reuse = false;
constexpr_ctx new_ctx;
init_subob_ctx (ctx, new_ctx, idx, pre_init ? init : elttype);
-  if (new_ctx.ctor != ctx->ctor)
+  bool no_slot = new_ctx.ctor == NULL_TREE;
+  if (new_ctx.ctor != ctx->ctor && !no_slot)
{
  if (zeroed_out)
CONSTRUCTOR_NO_CLEARING (new_ctx.ctor) = false;
@@ -5306,7 +5308,14 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
}
if (*non_constant_p)
break;
-  if (new_ctx.ctor != ctx->ctor)
+  if (no_slot)
+   {
+ /* This is an initializer for an empty subobject; now that we've
+checked that it's constant, we can ignore it.  */
+ gcc_checking_assert (i == 0);
+ break;
+   }
+  else if (new_ctx.ctor != ctx->ctor)
{
  /* We appended this element above; update the value.  */
  gcc_assert ((*p)->last().index == idx);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
new file mode 100644
index 000..4bb9e3dcb64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
@@ -0,0 +1,7 @@
+// PR c++/110197
+// { dg-do compile { target c++11 } }
+
+struct A { constexpr A(int) { } };
+struct B { A a; };
+constexpr B f(int n) { return B{A{n}}; }
+constexpr B b = f(1);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C
new file mode 100644
index 000..5ad67682c5b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C
@@ -0,0 +1,12 @@
+// PR c++/110197
+// { dg-do compile { target c++11 } }
+
+struct A {
+  constexpr A() : A(__builtin_is_constant_evaluated()) { }
+  constexpr A(int) { }
+};
+constexpr 

Re: [PATCH] c++: unifying REAL_CSTs [PR110809]

2023-07-26 Thread Jason Merrill via Gcc-patches

On 7/26/23 12:57, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13 (after the branch is unfrozen)?


OK.


-- >8 --

This teaches unify how to compare two REAL_CSTs.

PR c++/110809

gcc/cp/ChangeLog:

* pt.cc (unify) <case INTEGER_CST>: Generalize to handle
REAL_CST as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-float3.C: New test.
---
  gcc/cp/pt.cc|  5 +++--
  gcc/testsuite/g++.dg/cpp2a/nontype-float3.C | 12 
  2 files changed, 15 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-float3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d342ab5929a..1e09f304490 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24890,12 +24890,13 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
/* Types INTEGER_CST and MINUS_EXPR can come from array bounds.  */
/* Type INTEGER_CST can come from ordinary constant template args.  */
  case INTEGER_CST:
+case REAL_CST:
while (CONVERT_EXPR_P (arg))
arg = TREE_OPERAND (arg, 0);
  
-  if (TREE_CODE (arg) != INTEGER_CST)
+  if (TREE_CODE (arg) != TREE_CODE (parm))
return unify_template_argument_mismatch (explain_p, parm, arg);
-  return (tree_int_cst_equal (parm, arg)
+  return (simple_cst_equal (parm, arg)
  ? unify_success (explain_p)
  : unify_template_argument_mismatch (explain_p, parm, arg));
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C b/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C
new file mode 100644
index 000..044fb99905a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C
@@ -0,0 +1,12 @@
+// PR c++/110809
+// { dg-do compile { target c++20 } }
+
+template struct A { };
+
+template void f(A);
+template void f(A);
+
+int main() {
+  f(A{});
+  f(A{}); // { dg-error "no match" }
+}




[PATCH] c++: constexpr empty subobject confusion [PR110197]

2023-07-26 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13 (later)?

-- >8 --

Now that init_subob_ctx no longer sets new_ctx.ctor for a subobject of
empty type, it seems we need to ensure its callers cxx_eval_bare_aggregate
and cxx_eval_vec_init_1 consistently omit entries for such subobjects in
the parent ctx->ctor.  We also need to allow cxx_eval_array_reference
to synthesize an empty element object even if the array CONSTRUCTOR
has CONSTRUCTOR_NO_CLEARING set.

PR c++/110197

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Return a synthesized
empty subobject even if CONSTRUCTOR_NO_CLEARING is set.
(cxx_eval_bare_aggregate): Set 'no_slot' to true more generally
whenever new_ctx.ctor is empty, i.e. for any subobject of empty
type.
(cxx_eval_vec_init_1): Define 'no_slot' as above and use it
accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty18.C: New test.
* g++.dg/cpp0x/constexpr-empty19.C: New test.
---
 gcc/cp/constexpr.cc   | 23 +--
 .../g++.dg/cpp0x/constexpr-empty18.C  |  7 ++
 .../g++.dg/cpp0x/constexpr-empty19.C  | 12 ++
 3 files changed, 35 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f2fcb54626d..da2c3116810 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4297,6 +4297,9 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
 
   /* Not found.  */
 
+  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
+return build_constructor (elem_type, NULL);
+
   if (TREE_CODE (ary) == CONSTRUCTOR
   && CONSTRUCTOR_NO_CLEARING (ary))
 {
@@ -4314,9 +4317,7 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
  directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
   tree val;
   constexpr_ctx new_ctx;
-  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
-return build_constructor (elem_type, NULL);
-  else if (CP_AGGREGATE_TYPE_P (elem_type))
+  if (CP_AGGREGATE_TYPE_P (elem_type))
 {
   tree empty_ctor = build_constructor (init_list_type_node, NULL);
   val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
@@ -5095,9 +5096,9 @@ cxx_eval_bare_aggregate (const constexpr_ctx *ctx, tree t,
   FOR_EACH_CONSTRUCTOR_ELT (v, i, index, value)
 {
   tree orig_value = value;
-  /* Like in cxx_eval_store_expression, omit entries for empty fields.  */
-  bool no_slot = TREE_CODE (type) == RECORD_TYPE && is_empty_field (index);
   init_subob_ctx (ctx, new_ctx, index, value);
+  /* Like in cxx_eval_store_expression, omit entries for empty fields.  */
+  bool no_slot = new_ctx.ctor == NULL_TREE;
   int pos_hint = -1;
   if (new_ctx.ctor != ctx->ctor && !no_slot)
{
@@ -5261,7 +5262,8 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
   bool reuse = false;
   constexpr_ctx new_ctx;
   init_subob_ctx (ctx, new_ctx, idx, pre_init ? init : elttype);
-  if (new_ctx.ctor != ctx->ctor)
+  bool no_slot = new_ctx.ctor == NULL_TREE;
+  if (new_ctx.ctor != ctx->ctor && !no_slot)
{
  if (zeroed_out)
CONSTRUCTOR_NO_CLEARING (new_ctx.ctor) = false;
@@ -5306,7 +5308,14 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
}
   if (*non_constant_p)
break;
-  if (new_ctx.ctor != ctx->ctor)
+  if (no_slot)
+   {
+ /* This is an initializer for an empty subobject; now that we've
+checked that it's constant, we can ignore it.  */
+ gcc_checking_assert (i == 0);
+ break;
+   }
+  else if (new_ctx.ctor != ctx->ctor)
{
  /* We appended this element above; update the value.  */
  gcc_assert ((*p)->last().index == idx);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
new file mode 100644
index 000..4bb9e3dcb64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty18.C
@@ -0,0 +1,7 @@
+// PR c++/110197
+// { dg-do compile { target c++11 } }
+
+struct A { constexpr A(int) { } };
+struct B { A a; };
+constexpr B f(int n) { return B{A{n}}; }
+constexpr B b = f(1);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C
new file mode 100644
index 000..5ad67682c5b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty19.C
@@ -0,0 +1,12 @@
+// PR c++/110197
+// { dg-do compile { target c++11 } }
+
+struct A {
+  constexpr A() : A(__builtin_is_constant_evaluated()) { }
+  constexpr A(int) { }
+};
+constexpr A a1[1] = {{}};
+constexpr A a2[2] = {{}, {}};
+constexpr A a3[3] = {{}, {}, 

[PATCH] c++: unifying REAL_CSTs [PR110809]

2023-07-26 Thread Patrick Palka via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13 (after the branch is unfrozen)?

-- >8 --

This teaches unify how to compare two REAL_CSTs.

PR c++/110809

gcc/cp/ChangeLog:

* pt.cc (unify) <case INTEGER_CST>: Generalize to handle
REAL_CST as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-float3.C: New test.
---
 gcc/cp/pt.cc|  5 +++--
 gcc/testsuite/g++.dg/cpp2a/nontype-float3.C | 12 
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-float3.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d342ab5929a..1e09f304490 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -24890,12 +24890,13 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict,
   /* Types INTEGER_CST and MINUS_EXPR can come from array bounds.  */
   /* Type INTEGER_CST can come from ordinary constant template args.  */
 case INTEGER_CST:
+case REAL_CST:
   while (CONVERT_EXPR_P (arg))
arg = TREE_OPERAND (arg, 0);
 
-  if (TREE_CODE (arg) != INTEGER_CST)
+  if (TREE_CODE (arg) != TREE_CODE (parm))
return unify_template_argument_mismatch (explain_p, parm, arg);
-  return (tree_int_cst_equal (parm, arg)
+  return (simple_cst_equal (parm, arg)
  ? unify_success (explain_p)
  : unify_template_argument_mismatch (explain_p, parm, arg));
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C b/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C
new file mode 100644
index 000..044fb99905a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-float3.C
@@ -0,0 +1,12 @@
+// PR c++/110809
+// { dg-do compile { target c++20 } }
+
+template struct A { };
+
+template void f(A);
+template void f(A);
+
+int main() {
+  f(A{});
+  f(A{}); // { dg-error "no match" }
+}
-- 
2.41.0.450.ga80be15292



New Ukrainian PO file for 'gcc' (version 13.1.0)

2023-07-26 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-13.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[pushed] c++: member vs global template [PR106310]

2023-07-26 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

For backward compatibility we still want to allow patterns like
this->A::foo, but the template keyword in a qualified name is
specifically to specify that a dependent name is a template, so don't look
in the enclosing scope at all.

Also fix handling of dependent bases: if member lookup in the current
instantiation fails and we have dependent bases, the lookup is dependent.
We were already handling that for the case where lookup in the enclosing
scope also fails, but we also want it to affect that lookup itself.

PR c++/106310

gcc/cp/ChangeLog:

* parser.cc (cp_parser_template_name): Skip non-member
lookup after the template keyword.
(cp_parser_lookup_name): Pass down template_keyword_p.

gcc/testsuite/ChangeLog:

* g++.dg/template/template-keyword4.C: New test.
---
 gcc/cp/parser.cc  | 20 ---
 .../g++.dg/template/template-keyword4.C   | 18 +
 2 files changed, 31 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/template-keyword4.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..571997733be 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -2748,7 +2748,7 @@ static tree cp_parser_objc_struct_declaration
 /* Utility Routines */
 
 static cp_expr cp_parser_lookup_name
-  (cp_parser *, tree, enum tag_types, bool, bool, bool, tree *, location_t);
+  (cp_parser *, tree, enum tag_types, int, bool, bool, tree *, location_t);
 static tree cp_parser_lookup_name_simple
   (cp_parser *, tree, location_t);
 static tree cp_parser_maybe_treat_template_as_class
@@ -18771,7 +18771,7 @@ cp_parser_template_name (cp_parser* parser,
   /* Look up the name.  */
   decl = cp_parser_lookup_name (parser, identifier,
tag_type,
-   /*is_template=*/true,
+   /*is_template=*/1 + template_keyword_p,
/*is_namespace=*/false,
check_dependency_p,
/*ambiguous_decls=*/NULL,
@@ -31173,7 +31173,7 @@ prefer_type_arg (tag_types tag_type)
refer to types are ignored.
 
If IS_TEMPLATE is TRUE, bindings that do not refer to templates are
-   ignored.
+   ignored.  If IS_TEMPLATE IS 2, the 'template' keyword was specified.
 
If IS_NAMESPACE is TRUE, bindings that do not refer to namespaces
are ignored.
@@ -31188,7 +31188,7 @@ prefer_type_arg (tag_types tag_type)
 static cp_expr
 cp_parser_lookup_name (cp_parser *parser, tree name,
   enum tag_types tag_type,
-  bool is_template,
+  int is_template,
   bool is_namespace,
   bool check_dependency,
   tree *ambiguous_decls,
@@ -31372,7 +31372,14 @@ cp_parser_lookup_name (cp_parser *parser, tree name,
   else
decl = NULL_TREE;
 
-  if (!decl)
+  /* If we didn't find a member and have dependent bases, the member lookup
+is now dependent.  */
+  if (!dep && !decl && any_dependent_bases_p (object_type))
+   dep = true;
+
+  if (dep && is_template == 2)
+   /* The template keyword specifies a dependent template.  */;
+  else if (!decl)
/* Look it up in the enclosing context.  DR 141: When looking for a
   template-name after -> or ., only consider class templates.  */
decl = lookup_name (name, is_namespace ? LOOK_want::NAMESPACE
@@ -31400,8 +31407,7 @@ cp_parser_lookup_name (cp_parser *parser, tree name,
 
   /* If we know we're looking for a type (e.g. A in p->A::x),
 mock up a typename.  */
-  if (!decl && object_type && tag_type != none_type
- && dependentish_scope_p (object_type))
+  if (!decl && dep && tag_type != none_type)
{
  tree type = build_typename_type (object_type, name, name,
   typename_type);
diff --git a/gcc/testsuite/g++.dg/template/template-keyword4.C b/gcc/testsuite/g++.dg/template/template-keyword4.C
new file mode 100644
index 000..a7ab9bb8ca6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/template-keyword4.C
@@ -0,0 +1,18 @@
+// PR c++/106310
+
+template 
+struct set{};
+
+template< typename T >
+struct Base
+{
+  template< int > int set(T const &);
+};
+
+template< typename T >
+struct Derived : Base< T >
+{
+  void f(T const ) {
+this->template set< 0 >(arg);
+  }
+};

base-commit: 21da32d995c8b574c929ec420cd3b0fcfe6fa4fe
-- 
2.39.3



Re: [PATCH] c++: devirtualization of array destruction [PR110057]

2023-07-26 Thread Jason Merrill via Gcc-patches

On 7/26/23 12:00, Ng YongXiang wrote:

Hi Jason,

Thanks for the reply and review. I've attached an updated patch with the 
change log and sign off.


The change made in gcc/testsuite/g++.dg/warn/pr83054.C is because I 
think there is no more warning since we have already devirtualized the 
destruction for the array.


Makes sense, and it's good to have your adjusted testcase in the 
testsuite, it should just be a new one (maybe pr83054-2.C).


Apologies for the poor formatting. It is my first time contributing. Do 
let me know if there's any stuff I've missed and feel free to modify the 
patch where you deem necessary.


No worries!

The ChangeLog entries still need some adjustment, according to git 
gcc-verify (from contrib/gcc-git-customization.sh, see 
https://gcc.gnu.org/gitwrite.html):


ERR: line should start with a tab: "* init.c: Call non virtual destructor of objects in array"
ERR: line should start with a tab: "* g++.dg/devirt-array-destructor-1.C: New."
ERR: line should start with a tab: "* g++.dg/devirt-array-destructor-2.C: New."
ERR: line should start with a tab: "* g++.dg/warn/pr83054.C: Remove expected warnings caused by devirtualization"
ERR: PR 110057 in subject but not in changelog: "c++: devirtualization of array destruction [PR110057]"


git gcc-commit-mklog (also from gcc-git-customization.sh) makes 
generating ChangeLog entries a lot simpler.



* g++.dg/devirt-array-destructor-1.C: New.


Tests that look at tree-optimization dump files should go in the 
g++.dg/tree-ssa subdirectory.



+/* { dg-do run } */


It seems unnecessary to execute these tests, I'd think the default of { 
dg-do compile } would be fine.


It's also good to have a

// PR c++/110057

line at the top of the testcase for future reference.  gcc-commit-mklog 
also uses that to add the PR number to the ChangeLog.


Jason



Re: [PATCH v2 2/2] bpf: add v3 atomic instructions

2023-07-26 Thread Jose E. Marchesi via Gcc-patches


OK.
Thanks!
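
For reference, the operation families named in the quoted description correspond at the C level to the generic __atomic builtins. The sketch below only shows those builtins on a host compiler; the wrapper names and the relaxed memory order are assumptions, and no claim is made here about which eBPF instructions are selected for each call.

```c
#include <stdint.h>

/* Fetching AND: atomically *p &= v and return the previous value.  */
static uint64_t toy_fetch_and(uint64_t *p, uint64_t v)
{
  return __atomic_fetch_and(p, v, __ATOMIC_RELAXED);
}

/* Atomic exchange: store v and return the previous value.  */
static uint64_t toy_exchange(uint64_t *p, uint64_t v)
{
  return __atomic_exchange_n(p, v, __ATOMIC_RELAXED);
}

/* Atomic compare-and-exchange: store desired only if *p == expected.
   Returns nonzero on success.  */
static int toy_cmpxchg(uint64_t *p, uint64_t expected, uint64_t desired)
{
  return __atomic_compare_exchange_n(p, &expected, desired,
                                     0 /* strong */,
                                     __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```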

> [Changes from v1: fix merge issue in invoke.texi]
>
> This patch adds support for the general atomic operations introduced in
> eBPF v3. In addition to the existing atomic add instruction, this adds:
>  - Atomic and, or, xor
>  - Fetching versions of these operations (including add)
>  - Atomic exchange
>  - Atomic compare-and-exchange
>
> To control emission of these instructions, a new target option
> -m[no-]v3-atomics is added. This option is enabled by -mcpu=v3
> and above.
>
> Support for these instructions was recently added in binutils.
>
> gcc/
>
>   * config/bpf/bpf.opt (mv3-atomics): New option.
>   * config/bpf/bpf.cc (bpf_option_override): Handle it here.
>   * config/bpf/bpf.h (enum_reg_class): Add R0 class.
>   (REG_CLASS_NAMES): Likewise.
>   (REG_CLASS_CONTENTS): Likewise.
>   (REGNO_REG_CLASS): Handle R0.
>   * config/bpf/bpf.md (UNSPEC_XADD): Rename to UNSPEC_AADD.
>   (UNSPEC_AAND): New unspec.
>   (UNSPEC_AOR): Likewise.
>   (UNSPEC_AXOR): Likewise.
>   (UNSPEC_AFADD): Likewise.
>   (UNSPEC_AFAND): Likewise.
>   (UNSPEC_AFOR): Likewise.
>   (UNSPEC_AFXOR): Likewise.
>   (UNSPEC_AXCHG): Likewise.
>   (UNSPEC_ACMPX): Likewise.
>   (atomic_add): Use UNSPEC_AADD and atomic type attribute.
>   Move to...
>   * config/bpf/atomic.md: ...Here. New file.
>   * config/bpf/constraints.md (t): New constraint for R0.
>   * doc/invoke.texi (eBPF Options): Document -mv3-atomics.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/atomic-cmpxchg-1.c: New test.
>   * gcc.target/bpf/atomic-cmpxchg-2.c: New test.
>   * gcc.target/bpf/atomic-fetch-op-1.c: New test.
>   * gcc.target/bpf/atomic-fetch-op-2.c: New test.
>   * gcc.target/bpf/atomic-fetch-op-3.c: New test.
>   * gcc.target/bpf/atomic-op-1.c: New test.
>   * gcc.target/bpf/atomic-op-2.c: New test.
>   * gcc.target/bpf/atomic-op-3.c: New test.
>   * gcc.target/bpf/atomic-xchg-1.c: New test.
>   * gcc.target/bpf/atomic-xchg-2.c: New test.
> ---
>  gcc/config/bpf/atomic.md  | 185 ++
>  gcc/config/bpf/bpf.cc |   3 +
>  gcc/config/bpf/bpf.h  |   6 +-
>  gcc/config/bpf/bpf.md |  29 ++-
>  gcc/config/bpf/bpf.opt|   4 +
>  gcc/config/bpf/constraints.md |   3 +
>  gcc/doc/invoke.texi   |   8 +-
>  .../gcc.target/bpf/atomic-cmpxchg-1.c |  19 ++
>  .../gcc.target/bpf/atomic-cmpxchg-2.c |  19 ++
>  .../gcc.target/bpf/atomic-fetch-op-1.c|  50 +
>  .../gcc.target/bpf/atomic-fetch-op-2.c|  50 +
>  .../gcc.target/bpf/atomic-fetch-op-3.c|  49 +
>  gcc/testsuite/gcc.target/bpf/atomic-op-1.c|  49 +
>  gcc/testsuite/gcc.target/bpf/atomic-op-2.c|  49 +
>  gcc/testsuite/gcc.target/bpf/atomic-op-3.c|  49 +
>  gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c  |  20 ++
>  gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c  |  20 ++
>  17 files changed, 593 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/config/bpf/atomic.md
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-cmpxchg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-fetch-op-3.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-op-3.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/atomic-xchg-2.c
>
> diff --git a/gcc/config/bpf/atomic.md b/gcc/config/bpf/atomic.md
> new file mode 100644
> index 000..caf8cc15cd4
> --- /dev/null
> +++ b/gcc/config/bpf/atomic.md
> @@ -0,0 +1,185 @@
> +;; Machine description for eBPF.
> +;; Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +
> +(define_mode_iterator AMO [SI DI])
> +
> +;;; Plain atomic modify operations.
> +
> +;; Non-fetching atomic add predates all 

Re: RISC-V: Replace unspec with bitreverse in riscv_brev8_ insn

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/26/23 02:21, Kito Cheng via Gcc-patches wrote:

My understanding is that the semantics are slightly different: brev8
only reverses the bits within each byte, whereas bitreverse reverses the
bits across the whole content of the mode, so expressing riscv_brev8_si
as bitreverse would mean reversing the bits across all 32 bits.

Using RV32 as example:
UNSPEC_BREV8:
rd[0...7]  = rs[7...0]
rd[8...15]  = rs[15...8]
rd[16...23]  = rs[23...16]
rd[24...31]  = rs[31...24]

bitreverse:
rd[0...31] = rs[31...0]
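
The two semantics can be compared side by side in plain C. This is a sketch for illustration: brev8_32 and bitreverse_32 are made-up helper names that model the behaviour described above on a 32-bit value, not the GCC patterns themselves.

```c
#include <stdint.h>

/* Reverse the eight bits of a single byte.  */
static uint8_t reverse_byte(uint8_t b)
{
  b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
  b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
  b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
  return b;
}

/* brev8 model: each byte is bit-reversed in place; byte order is kept.  */
static uint32_t brev8_32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++)
    r |= (uint32_t)reverse_byte((uint8_t)(x >> (8 * i))) << (8 * i);
  return r;
}

/* bitreverse model: bit i of the input becomes bit 31 - i of the result.  */
static uint32_t bitreverse_32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    if (x & (UINT32_C(1) << i))
      r |= UINT32_C(1) << (31 - i);
  return r;
}
```

For an input of 1, brev8_32 yields 0x00000080 while bitreverse_32 yields 0x80000000; the two only agree once a byte swap is applied on top of brev8.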

Yea, I think you're right Kito.  Goof on our side.

Jivan, I think this explains why it's not working for all of Mariam's 
cases -- odds are the cases where it is working are for the reversed crc8.


Let's drop this since it doesn't match the semantics of GCC's bitreverse.

Mariam's call on whether or not to utilize brev8 for the crc8 cases 
where it's likely faster than other sequences to reverse bits.


jeff


Re: [PATCH] tree-optimization/106081 - elide redundant permute

2023-07-26 Thread Jeff Law via Gcc-patches




On 7/26/23 07:27, Richard Biener via Gcc-patches wrote:

The following patch makes sure to elide a redundant permute that
can be merged with existing splats represented as load permutations
as we now do for non-grouped SLP loads.  This is the last bit
missing to fix this PR where the main fix was already done by
r14-2117-gdd86a5a69cbda4

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106081
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Assign layout -1 to splats.

* gcc.dg/vect/pr106081.c: New testcase.
:-)  Glad to see how easy this ended up being after the work you put 
into pushing permutes around a couple years ago.


jeff


[C PATCH] Synthesize nonnull attribute for parameters declared with static

2023-07-26 Thread Martin Uecker



C programmers increasingly use static to indicate that
pointer parameters are non-null.  Clang can exploit this
for warnings and optimizations.  GCC issues some warnings
for such parameters, but not all of the warnings it has
for nonnull.  Below is a patch that automatically adds a
nonnull attribute to such arguments and removes the special,
more limited nonnull warnings for static.  This patch found
some misplaced annotations in one of my projects via
-Wnonnull-compare, which clang does not seem to have,
so I think this could be useful.



Parameters declared with `static` are nonnull. We synthesize
an artificial nonnull attribute for such parameters to get the
same warnings and optimizations.
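
As a reminder of the syntax in question, here is a minimal sketch of a parameter declared with `static`; the function name sum_ints is made up for illustration.

```c
#include <stddef.h>

/* "a[static 1]" promises the caller passes a pointer to at least one
   int, so a null pointer is never a valid argument here.  */
static long sum_ints(size_t n, const int a[static 1])
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += a[i];
  return total;
}

/* A call such as sum_ints (0, NULL) violates the declaration; that is
   the kind of call the nonnull warnings are meant to catch.  */
```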

Bootstrapped and regression tested on x86.


gcc/c-family/:
* c-attribs.cc (build_attr_access_from_parms): Synthesize
nonnull attribute for parameters declared with `static`.

gcc/:
* gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes):
Remove warning for parameters declared with `static`.

gcc/testsuite/:
* gcc.dg/Wnonnull-8.c: Adapt test.
* gcc.dg/Wnonnull-9.c: New test.


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..ae7ffeb1f65 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -5280,6 +5280,7 @@ build_attr_access_from_parms (tree parms, bool
skip_voidptr)
 arg2pos.put (arg, argpos);
 }
+ tree nnlist = NULL_TREE;
 argpos = 0;
 for (tree arg = parms; arg; arg = TREE_CHAIN (arg), ++argpos)
 {
@@ -5313,6 +5314,11 @@ build_attr_access_from_parms (tree parms, bool
skip_voidptr)
 tree str = TREE_VALUE (argspec);
 const char *s = TREE_STRING_POINTER (str);
+ /* Collect the list of nonnull arguments which use "[static ..]". */
+ if (s != NULL && s[0] == '[' && s[1] == 's')
+ nnlist = tree_cons (NULL_TREE, build_int_cst (integer_type_node,
+ argpos + 1), nnlist);
+
 /* Create the attribute access string from the arg spec string,
 optionally followed by position of the VLA bound argument if
 it is one. */
@@ -5380,6 +5386,10 @@ build_attr_access_from_parms (tree parms, bool
skip_voidptr)
 if (!spec.length ())
 return NULL_TREE;
+ /* If we have nonnull arguments, synthesize an attribute. */
+ if (nnlist != NULL_TREE)
+ nnlist = build_tree_list (get_identifier ("nonnull"), nnlist);
+
 /* Attribute access takes a two or three arguments. Wrap VBLIST in
 another list in case it has more nodes than would otherwise fit. */
 vblist = build_tree_list (NULL_TREE, vblist);
@@ -5390,7 +5400,7 @@ build_attr_access_from_parms (tree parms, bool
skip_voidptr)
 tree str = build_string (spec.length (), spec.c_str ());
 tree attrargs = tree_cons (NULL_TREE, str, vblist);
 tree name = get_identifier ("access");
- return build_tree_list (name, attrargs);
+ return tree_cons (name, attrargs, nnlist);
 }
 /* Handle a "nothrow" attribute; arguments as in
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index ac07a6f9b95..b1cd06c 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3504,16 +3504,6 @@ pass_waccess::maybe_check_access_sizes (rdwr_map
*rwm, tree fndecl, tree fntype,
 ptridx + 1, sizidx + 1, sizstr))
 arg_warned = OPT_Wnonnull;
 }
- else if (access_size && access.second.static_p)
- {
- /* Warn about null pointers for [static N] array arguments
- but do not warn for ordinary (i.e., nonstatic) arrays. */
- if (warning_at (loc, OPT_Wnonnull,
- "argument %i to %<%T[static %E]%> "
- "is null where non-null expected",
- ptridx + 1, argtype, access_nelts))
- arg_warned = OPT_Wnonnull;
- }
 if (arg_warned != no_warning)
 {
diff --git a/gcc/testsuite/gcc.dg/Wnonnull-8.c
b/gcc/testsuite/gcc.dg/Wnonnull-8.c
index 02871a76689..b24fd67cebc 100644
--- a/gcc/testsuite/gcc.dg/Wnonnull-8.c
+++ b/gcc/testsuite/gcc.dg/Wnonnull-8.c
@@ -10,5 +10,5 @@ foo (int a[static 7])
 int
 main ()
 {
- foo ((int *) 0); /* { dg-warning "argument 1 to 'int\\\[static 7\\\]' is null where non-null expected" } */
+ foo ((int *) 0); /* { dg-warning "argument 1 null where non-null expected" } */
 }
diff --git a/gcc/testsuite/gcc.dg/Wnonnull-9.c
b/gcc/testsuite/gcc.dg/Wnonnull-9.c
new file mode 100644
index 000..1c57eefd2ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wnonnull-9.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall" } */
+
+
+void
+foo (int a[static 1])
+{
+ if ((void*)0 == a) /* { dg-warning "argument" "compared to NULL" } */
+ return;
+}
+
+int
+main ()
+{
+ foo ((void*)0); /* { dg-warning "argument 1 null where non-null expected" } */
+}
+





[committed] libstdc++: Avoid bogus overflow warnings in std::vector [PR110807]

2023-07-26 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk. Backport to gcc-13 will come after
the 13.2 release.

-- >8 --

GCC thinks the allocation can alter the object being copied if it's
globally reachable, so doesn't realize that [x.begin(), x.end()) after
the allocation is the same as x.size() before it.

This causes a testsuite failure when testing with -D_GLIBCXX_DEBUG:
FAIL: 23_containers/vector/bool/swap.cc (test for excess errors)
A fix is to move the calls to x.begin() and x.end() to before the
allocation.

A similar problem exists in vector::_M_insert_range where *this is
globally reachable, as reported in PR libstdc++/110807. That can also be
fixed by moving calls to begin() and end() before the allocation.
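
As a rough illustration of the reordering described above (a sketch only, using a hypothetical toy type `int_buf` rather than the real vector<bool> internals), the source range is read into locals before the allocation, so nothing the allocation might do can appear to change it:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical toy buffer; it only mirrors the shape of the fix and is
// not the libstdc++ implementation.
struct int_buf
{
  int* start = nullptr;
  int* finish = nullptr;

  explicit int_buf(std::size_t n)
  : start(new int[n]()), finish(start + n) { }

  int_buf(const int_buf& x)
  {
    // Read the source range first ...
    const int* xbegin = x.start;
    const int* xend = x.finish;
    assert(xbegin <= xend);
    // ... and only then allocate, so the copy below uses the values
    // that were observed before the allocation.
    start = new int[xend - xbegin];
    finish = std::copy(xbegin, xend, start);
  }

  int_buf& operator=(const int_buf&) = delete;
  ~int_buf() { delete[] start; }

  std::size_t size() const
  { return static_cast<std::size_t>(finish - start); }
};
```

The behaviour is the same either way; the point of the ordering is only to give the optimizer a range whose size it can relate to the allocation size.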

libstdc++-v3/ChangeLog:

PR libstdc++/110807
* include/bits/stl_bvector.h (vector(const vector&)): Access
iterators before allocating.
* include/bits/vector.tcc (vector::_M_insert_range):
Likewise.
* testsuite/23_containers/vector/bool/110807.cc: New test.
---
 libstdc++-v3/include/bits/stl_bvector.h|  3 ++-
 libstdc++-v3/include/bits/vector.tcc   |  5 +++--
 .../testsuite/23_containers/vector/bool/110807.cc  | 14 ++
 3 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc

diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index ad462c5933c..8d18bcaffd4 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -773,8 +773,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   vector(const vector& __x)
   : _Base(_Bit_alloc_traits::_S_select_on_copy(__x._M_get_Bit_allocator()))
   {
+   const_iterator __xbegin = __x.begin(), __xend = __x.end();
_M_initialize(__x.size());
-   _M_copy_aligned(__x.begin(), __x.end(), begin());
+   _M_copy_aligned(__xbegin, __xend, begin());
   }
 
 #if __cplusplus >= 201103L
diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index f592c72dec2..ada396c9b30 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -980,11 +980,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  {
const size_type __len =
  _M_check_len(__n, "vector::_M_insert_range");
+   const iterator __begin = begin(), __end = end();
_Bit_pointer __q = this->_M_allocate(__len);
iterator __start(std::__addressof(*__q), 0);
-   iterator __i = _M_copy_aligned(begin(), __position, __start);
+   iterator __i = _M_copy_aligned(__begin, __position, __start);
__i = std::copy(__first, __last, __i);
-   iterator __finish = std::copy(__position, end(), __i);
+   iterator __finish = std::copy(__position, __end, __i);
this->_M_deallocate();
this->_M_impl._M_end_of_storage = __q + _S_nword(__len);
this->_M_impl._M_start = __start;
diff --git a/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc 
b/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc
new file mode 100644
index 000..5c019bd9524
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/bool/110807.cc
@@ -0,0 +1,14 @@
+// { dg-options "-O2" }
+// { dg-do compile }
+
+// Bug 110807
+// Copy list initialisation of a vector raises a warning with -O2
+
+#include <vector>
+
+std::vector<bool> byCallSpread;
+
+void f()
+{
+  byCallSpread = {true};
+}
-- 
2.41.0



[committed] libstdc++: Add deprecated attribute to std::random_shuffle declarations

2023-07-26 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

Probably worth backporting to 12 and 13.

-- >8 --

We already have these attributes on the definitions in <bits/stl_algo.h>
but they don't work due to PR c++/84542. Add the attributes to the
declarations in <bits/algorithmfwd.h> as well, and add a test.
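
For code that now sees the deprecation warning, the suggested replacement is std::shuffle with an explicit engine. A minimal sketch follows; `shuffle_range` and the choice of std::mt19937 with a caller-supplied seed are assumptions for illustration, not part of the patch:

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Hypothetical helper: shuffle [first, last) with a seedable engine
// instead of calling the deprecated std::random_shuffle.
inline void shuffle_range(int* first, int* last, unsigned seed)
{
  assert(first <= last);
  std::mt19937 gen(seed);          // uniform random bit generator
  std::shuffle(first, last, gen);  // available since C++11
}
```

Unlike std::random_shuffle, the source of randomness is passed in explicitly, so results are reproducible for a given seed on a given implementation.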

libstdc++-v3/ChangeLog:

* include/bits/algorithmfwd.h (random_shuffle): Add deprecated
attribute.
* include/bits/stl_algo.h (random_shuffle): Correct comments.
* testsuite/25_algorithms/random_shuffle/1.cc: Disable
deprecated warnings.
* testsuite/25_algorithms/random_shuffle/59603.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/moveable.cc: Likewise.
* testsuite/25_algorithms/random_shuffle/deprecated.cc: New
test.
---
 libstdc++-v3/include/bits/algorithmfwd.h  |  2 ++
 libstdc++-v3/include/bits/stl_algo.h  |  6 +++---
 .../25_algorithms/random_shuffle/1.cc |  1 +
 .../25_algorithms/random_shuffle/59603.cc |  1 +
 .../random_shuffle/deprecated.cc  | 19 +++
 .../25_algorithms/random_shuffle/moveable.cc  |  1 +
 6 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/25_algorithms/random_shuffle/deprecated.cc

diff --git a/libstdc++-v3/include/bits/algorithmfwd.h 
b/libstdc++-v3/include/bits/algorithmfwd.h
index 0d623901cd2..130943d2340 100644
--- a/libstdc++-v3/include/bits/algorithmfwd.h
+++ b/libstdc++-v3/include/bits/algorithmfwd.h
@@ -832,10 +832,12 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 
 #if _GLIBCXX_HOSTED
   template
+_GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
 void
 random_shuffle(_RAIter, _RAIter);
 
   template
+_GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
 void
 random_shuffle(_RAIter, _RAIter,
 #if __cplusplus >= 201103L
diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 2c52ed51402..3abf0f69d46 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -4479,7 +4479,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  equally likely.
*
*  @deprecated
-   *  Since C++14 `std::random_shuffle` is not part of the C++ standard.
+   *  Since C++17, `std::random_shuffle` is not part of the C++ standard.
*  Use `std::shuffle` instead, which was introduced in C++11.
   */
   template
@@ -4518,7 +4518,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
*  range `[0, N)`.
*
*  @deprecated
-   *  Since C++14 `std::random_shuffle` is not part of the C++ standard.
+   *  Since C++17, `std::random_shuffle` is not part of the C++ standard.
*  Use `std::shuffle` instead, which was introduced in C++11.
   */
   template
@@ -4546,7 +4546,7 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
}
 }
 #endif // HOSTED
-#endif // C++11 || USE_DEPRECATED
+#endif // <= C++11 || USE_DEPRECATED
 
   /**
*  @brief Move elements for which a predicate is true to the beginning
diff --git a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/1.cc 
b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/1.cc
index 200c097f4a7..2a8c6323794 100644
--- a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/1.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/1.cc
@@ -15,6 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
+// { dg-options "-Wno-deprecated-declarations" }
 // { dg-add-options using-deprecated }
 // { dg-require-effective-target hosted }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/59603.cc 
b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/59603.cc
index c678eb333f2..a17429efba0 100644
--- a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/59603.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/59603.cc
@@ -16,6 +16,7 @@
 // <http://www.gnu.org/licenses/>.
 
 // { dg-do run { target c++11 } }
+// { dg-options "-Wno-deprecated-declarations" }
 // { dg-add-options using-deprecated }
 // { dg-require-debug-mode "" }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/deprecated.cc 
b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/deprecated.cc
new file mode 100644
index 000..4bc58a6447e
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/deprecated.cc
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-add-options using-deprecated }
+// { dg-require-effective-target hosted }
+
+// std::random_shuffle was deprecated in C++14 and removed in C++17.
+
+#include <algorithm>
+
+std::ptrdiff_t rando(std::ptrdiff_t n);
+
+void
+test_depr(int* first, int* last)
+{
+  std::random_shuffle(first, last);
+  // { dg-warning "deprecated" "" { target c++14 } 14 }
+
+  std::random_shuffle(first, last, rando);
+  // { dg-warning "deprecated" "" { target c++14 } 17 }
+}
diff --git a/libstdc++-v3/testsuite/25_algorithms/random_shuffle/moveable.cc 
b/libstdc++-v3/testsuite/25_algorithms/random_shuffle/moveable.cc
index 2e2cadaf862..e74e6f1c866 100644
--- 

[PATCH] c++: devirtualization of array destruction [PR110057]

2023-07-26 Thread Ng YongXiang via Gcc-patches
Hi Jason,

Thanks for the reply and review. I've attached an updated patch with the
change log and sign off.

The change made in gcc/testsuite/g++.dg/warn/pr83054.C is because I think
there is no more warning since we have already devirtualized the
destruction for the array.
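
As a minimal sketch of the situation (an illustration only, not one of the tests in the patch; `Base` and `destroy_local_array` are made-up names): the elements of an array object have exactly the declared element type, so each destructor call at the end of the scope can be bound statically even though the destructor is virtual.

```cpp
#include <cassert>

// Hypothetical type with a virtual destructor that counts its calls.
struct Base
{
  static int destroyed;
  virtual ~Base() { ++destroyed; }
};
int Base::destroyed = 0;

// Destroys a local array and reports how many destructors ran.
int destroy_local_array()
{
  const int before = Base::destroyed;
  {
    Base array[3];
    (void) array;
  } // Every element is exactly a Base, so ~Base runs three times here.
  const int count = Base::destroyed - before;
  assert(count == 3);
  return count;
}
```

Whether the calls are emitted as direct or virtual calls is not observable from this code; it only shows the kind of array destruction the change is about.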

Apologies for the poor formatting. It is my first time contributing. Do let
me know if there's any stuff I've missed and feel free to modify the patch
where you deem necessary.

Thanks!

On Wed, Jul 26, 2023 at 12:25 PM Jason Merrill  wrote:

> On 7/12/23 10:10, Ng YongXiang via Gcc-patches wrote:
> > Component:
> > c++
> >
> > Bug ID:
> > 110057
> >
> > Bugzilla link:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057
> >
> > Description:
> > Array should not call virtual destructor of object when array is destructed
> >
> > ChangeLog:
> >
> > 2023-07-12  Ng YongXiang
> >
> > PR c++
> > * Devirtualize auto generated destructor calls of array
> >
> > cp/
> > * init.c: Call non virtual destructor of objects in array
> >
> > testsuite/
> > * g++.dg/devirt-array-destructor-1.C: New.
> > * g++.dg/devirt-array-destructor-2.C: New.
> >
> >
> > On Wed, Jul 12, 2023 at 5:02 PM Xi Ruoyao  wrote:
> >
> >> On Wed, 2023-07-12 at 16:58 +0800, Ng YongXiang via Gcc-patches wrote:
> >>> I'm writing to seek a review for an issue I filed some time ago.
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110057 . A proposed patch is
> >>> attached in the bug tracker as well.
> >>
> >> You should send the patch to gcc-patches@gcc.gnu.org for a review, see
> >> https://gcc.gnu.org/contribute.html for the details.  Generally we
> >> consider patches attached in bugzilla as drafts.
>
> Thanks!  The change makes sense under
> https://eel.is/c++draft/expr.delete#3.sentence-2 , but please look again
> at contribute.html.
>
> In particular, the Legal section; you don't seem to have a copyright
> assignment with the FSF, nor do I see a DCO certification
> (https://gcc.gnu.org/dco.html) in your patch.
>
> Like the examples in contribute.html, the subject line should be more
> like "[PATCH] c++: devirtualization of array destruction [PR110057]"
>
> The ChangeLog entry should be in the commit message.
>
> >  * g++.dg/warn/pr83054.C: Change expected number of devirtualized calls
>
> This isn't just changing the expected number, it's also changing the
> array from a local variable to dynamically allocated, which is a big
> change to what's being tested.  If you want to test the dynamic case,
> please add a new test instead of making this change.
>
> > diff --git a/gcc/testsuite/g++.dg/warn/pr83054.C
> b/gcc/testsuite/g++.dg/warn/pr83054.C
> > index 5285f94acee..7cd0951713d 100644
> > --- a/gcc/testsuite/g++.dg/warn/pr83054.C
> > +++ b/gcc/testsuite/g++.dg/warn/pr83054.C
> > @@ -10,7 +10,7 @@
> >  #endif
> >
> >  extern "C" int printf (const char *, ...);
> > -struct foo // { dg-warning "final would enable devirtualization of 5 calls" }
> > +struct foo // { dg-warning "final would enable devirtualization of 1 call" }
> >  {
> >static int count;
> >void print (int i, int j) { printf ("foo[%d][%d] = %d\n", i, j, x); }
> > @@ -29,19 +29,15 @@ int foo::count;
> >
> >  int main ()
> >  {
> > -  {
> > -foo array[3][3];
> > -for (int i = 0; i < 3; i++)
> > -  {
> > -   for (int j = 0; j < 3; j++)
> > - {
> > -   printf("[%d][%d] = %x\n", i, j, (void *)[i][j]);
> > - }
> > -  }
> > -  // The count should be nine, if not, fail the test.
> > -  if (foo::count != 9)
> > -   return 1;
> > -  }
> > +  foo* arr[9];
> > +  for (int i = 0; i < 9; ++i)
> > +arr[i] = new foo();
> > +  if (foo::count != 9)
> > +return 1;
> > +  for (int i = 0; i < 9; ++i)
> > +arr[i]->print(i / 3, i % 3);
> > +  for (int i = 0; i < 9; ++i)
> > +delete arr[i];
>
>
>
From 2b7cbd8e0787c8e20f0464dbf610908a9f3c68f7 Mon Sep 17 00:00:00 2001
From: yongxiangng 
Date: Wed, 26 Jul 2023 23:45:25 +0800
Subject: [PATCH 1/1] [PATCH] c++: devirtualization of array destruction
 [PR110057]

cp/ChangeLog:

* init.cc: Call non virtual destructor of objects in array

testsuite/ChangeLog:

* g++.dg/devirt-array-destructor-1.C: New.
* g++.dg/devirt-array-destructor-2.C: New.
* g++.dg/warn/pr83054.C: Remove expected warnings caused by devirtualization

Signed-off-by: Ng Yong Xiang 
---
 gcc/cp/init.cc|  8 +++---
 .../g++.dg/devirt-array-destructor-1.C| 27 ++
 .../g++.dg/devirt-array-destructor-2.C| 28 +++
 gcc/testsuite/g++.dg/warn/pr83054.C   |  2 +-
 4 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/devirt-array-destructor-1.C
 create mode 100644 gcc/testsuite/g++.dg/devirt-array-destructor-2.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 6ccda365b04..69ab51d0a4b 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4112,8 +4112,8 @@ build_vec_delete_1 

Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Palmer Dabbelt

On Wed, 26 Jul 2023 08:34:14 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

I would say LCM/PRE is the key to this set of static rounding mode
intrinsics; otherwise I think it will push people to use the dynamic
rounding mode with fesetround, or inline asm to set the rounding mode, for
performance reasons. That is the opposite of the design concept: we want to
provide a reliable and performant way to precisely control the rounding mode.

The function call issue could in theory be resolved by the fenv_access
pragma, since it can act as an annotation telling the compiler whether some
function modifies fenv or not, but unfortunately it's not well modeled within
GCC yet, so we must be conservative to make sure we don't break anything.

Also, the LLVM side is trying to implement some simple LCM/PRE to
optimize that, so I believe we need LCM/PRE based mode switching to do that.


IMO that's a perfectly reasonable way to start: let's just get something 
that's correct and simple; if we need to do more complicated stuff later 
we can always add it.


There's going to be a very small amount of this code written by a very 
small number of people (that are likely very close to the compiler teams 
doing the optimizations here), so we can just all work with each other 
to sort out any important performance issues as we go.


I think whether LCM or entry/exit performs better is probably just going 
to boil down to some uarch/workload specific decisions, so as long as 
whatever we have is correct and reasonably simple it seems fine for now.  
Given how little of this code there's going to be it's probably not 
worth spending a ton of time on things until we have a concrete use case 
to drive things.


Let's just make sure to also update the intrinsic spec to get rid of the 
grey area here, that way we can point to something if we want to 
optimize differently in the future.



Li, Pan2 wrote on Wed, Jul 26, 2023 at 22:31:


As Juzhe mentioned, the problem of the CALL is resolved by LCM/PRE
similar to the VSETVL pass, which is well proven up to a point.

I would like to propose that we stay focused and move forward with this patch
itself; the other underlying RVV floating-point API support and the full RVV
intrinsic API tests depend on it.

Of course, I am working on PATCH v8, and thanks again for Robin's comments.



Pan



*From:* 钟居哲 
*Sent:* Wednesday, July 26, 2023 10:18 PM
*To:* rdapp.gcc ; Li, Pan2 
*Cc:* rdapp.gcc ; kito.cheng ;
gcc-patches ; Wang, Yanzhang <
yanzhang.w...@intel.com>
*Subject:* Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
dynamic rounding



Explicitly back up and restore for each intrinsic, just the same as we did
for CALL in this patch.

I don't have data to prove how well LCM/PRE-based mode switching performs,
but I trust it.

LCM/PRE is the key optimization method of the VSETVL pass, which is doing
a good job on VSETVL instruction optimizations.

I don't think we should give up the chance to use LCM/PRE and just blindly
back up and restore for each intrinsic.




--

juzhe.zh...@rivai.ai



*From:* Robin Dapp 

*Date:* 2023-07-26 21:46

*To:* juzhe.zhong ; Li, Pan2 

*CC:* rdapp.gcc ; Kito Cheng ;
gcc-patches@gcc.gnu.org; Wang, Yanzhang 

*Subject:* Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
dynamic rounding

> current llvm didn't do any pre optimization.  They always
> backup+restore for each rounding mode intrinsic

I see.  There is still the option of lazily restoring the
(entry) FRM before a function call but not read the FRM
after every call.  Do we have any data on how good or bad the
mode-switching LCM works when we explicitly backup and restore
for each intrinsic?



Regards

Robin






[committed] Add check_vect in a testcase

2023-07-26 Thread Matthew Malcomson via Gcc-patches
Also reformat a comment that had too long lines.
Commit c5bd0e5870a introduced both of the problems fixed here.

Committed as obvious after ensuring that when running the testcase on
AArch64 it still fails before the original c5bd0e5870a commit and passes
after it.

gcc/ChangeLog:

* tree-vect-stmts.cc (get_group_load_store_type): Reformat
comment.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-multi-peel-gaps.c: Add `check_vect` call into
`main` of this testcase.



### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c 
b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
index 
1aab4c5a14d1e8346d89587bd9544a1516535a45..7e7db3c7a45fa1ecc748f51353170ec66c5dfed7
 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
@@ -9,6 +9,7 @@
 /* { dg-require-effective-target mmap } */
 #include 
 #include 
+#include "tree-vect.h"
 
 #define MMAP_SIZE 0x2
 #define ADDRESS 0x112200
@@ -24,6 +25,7 @@ void initialise_s(int *s) { }
 int main() {
 void *s_mapping;
 void *end_s;
+check_vect ();
 s_mapping = mmap ((void *)ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 if (s_mapping == MAP_FAILED)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
99e61f8a85dc6cebbb3cd4183f8bd7fdfe9aa1de..5018bd23e6ec457a8e4790d333752b7e009228e0
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2214,8 +2214,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info 
stmt_info,
 we can end up with no gap recorded but still excess
 elements accessed, see PR103116.  Make sure we peel for
 gaps if necessary and sufficient and give up if not.
-If there is a combination of the access not covering the full vector and
-a gap recorded then we may need to peel twice.  */
+
+If there is a combination of the access not covering the full
+vector and a gap recorded then we may need to peel twice.  */
  if (loop_vinfo
  && *memory_access_type == VMAT_CONTIGUOUS
  && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()





Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Kito Cheng via Gcc-patches
I would say LCM/PRE is the key to this set of static rounding mode
intrinsics; otherwise I think it will push people to use the dynamic
rounding mode with fesetround, or inline asm to set the rounding mode, for
performance reasons. That is the opposite of the design concept: we want to
provide a reliable and performant way to precisely control the rounding mode.

The function call issue could in theory be resolved by the fenv_access
pragma, since it can act as an annotation telling the compiler whether some
function modifies fenv or not, but unfortunately it's not well modeled within
GCC yet, so we must be conservative to make sure we don't break anything.

Also, the LLVM side is trying to implement some simple LCM/PRE to
optimize that, so I believe we need LCM/PRE based mode switching to do that.
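
For readers following the thread, the explicit backup-and-restore scheme that is being compared against LCM/PRE can be sketched with the portable <cfenv> interface. This is only an analogy: it uses fegetround/fesetround rather than the RISC-V FRM register, and `with_rounding_mode` is a hypothetical helper, not anything in the patch.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: run `op` under rounding mode `mode`, then put the
// caller's rounding mode back.  Every use pays for one save and two
// switches, which is the cost a mode-switching pass tries to avoid by
// hoisting or merging redundant switches.
template <typename F>
auto with_rounding_mode(int mode, F op) -> decltype(op())
{
  const int saved = std::fegetround();   // back up the current mode
  int rc = std::fesetround(mode);        // switch to the requested mode
  assert(rc == 0);
  auto result = op();
  rc = std::fesetround(saved);           // restore the caller's mode
  assert(rc == 0);
  (void) rc;
  return result;
}
```

Two adjacent operations that want the same mode would each save and restore here, whereas a pass based on LCM/PRE could in principle keep the mode set across both.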


Li, Pan2 wrote on Wed, Jul 26, 2023 at 22:31:

> As Juzhe mentioned, the problem of the CALL is resolved by LCM/PRE
> similar to the VSETVL pass, which is well proven up to a point.
>
> I would like to propose that we stay focused and move forward with this patch
> itself; the other underlying RVV floating-point API support and the full RVV
> intrinsic API tests depend on it.
>
> Of course, I am working on PATCH v8, and thanks again for Robin's comments.
>
>
>
> Pan
>
>
>
> *From:* 钟居哲 
> *Sent:* Wednesday, July 26, 2023 10:18 PM
> *To:* rdapp.gcc ; Li, Pan2 
> *Cc:* rdapp.gcc ; kito.cheng ;
> gcc-patches ; Wang, Yanzhang <
> yanzhang.w...@intel.com>
> *Subject:* Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
> dynamic rounding
>
>
>
> Explicitly back up and restore for each intrinsic, just the same as we did
> for CALL in this patch.
>
> I don't have data to prove how well LCM/PRE-based mode switching performs,
> but I trust it.
>
> LCM/PRE is the key optimization method of the VSETVL pass, which is doing
> a good job on VSETVL instruction optimizations.
>
> I don't think we should give up the chance to use LCM/PRE and just blindly
> back up and restore for each intrinsic.
>
>
>
>
> --
>
> juzhe.zh...@rivai.ai
>
>
>
> *From:* Robin Dapp 
>
> *Date:* 2023-07-26 21:46
>
> *To:* juzhe.zhong ; Li, Pan2 
>
> *CC:* rdapp.gcc ; Kito Cheng ;
> gcc-patches@gcc.gnu.org; Wang, Yanzhang 
>
> *Subject:* Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
> dynamic rounding
>
> > current llvm didn't do any pre optimization.  They always
> > backup+restore for each rounding mode intrinsic
>
> I see.  There is still the option of lazily restoring the
> (entry) FRM before a function call but not read the FRM
> after every call.  Do we have any data on how good or bad the
> mode-switching LCM works when we explicitly backup and restore
> for each intrinsic?
>
>
>
> Regards
>
> Robin
>
>
>
>


[PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-07-26 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch enables the use of mixed-types for simd clones for AArch64 
and adds aarch64 as a target_vect_simd_clones.


Bootstrapped and regression tested on aarch64-unknown-linux-gnu

gcc/ChangeLog:

* config/aarch64/aarch64.cc (currently_supported_simd_type): 
Remove.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Use NFS type 
to determine simdlen.


gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
* c-c++-common/gomp/declare-variant-14.c: Add aarch64 checks 
and remove warning check.

* g++.dg/gomp/attrs-10.C: Likewise.
* g++.dg/gomp/declare-simd-1.C: Likewise.
* g++.dg/gomp/declare-simd-3.C: Likewise.
* g++.dg/gomp/declare-simd-4.C: Likewise.
* gcc.dg/gomp/declare-simd-3.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* c-c++-common/gomp/pr60823-1.c: Remove warning check.
* c-c++-common/gomp/pr60823-3.c: Likewise.
* g++.dg/gomp/declare-simd-7.C: Likewise.
* g++.dg/gomp/declare-simd-8.C: Likewise.
* g++.dg/gomp/pr88182.C: Likewise.
* gcc.dg/declare-simd.c: Likewise.
* gcc.dg/gomp/declare-simd-1.c: Likewise.
* gcc.dg/gomp/pr87895-1.c: Likewise.
* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
* gfortran.dg/gomp/pr79154-1.f90: Likewise.
* gfortran.dg/gomp/pr83977.f90: Likewise.
* gcc.dg/gomp/pr87887-1.c: Add warning test.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/pr99542.c: Update warning test.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
560e5431636ef46c41d56faa0c4e95be78f64b50..ac6350a44481628a947a0f20e034acf92cde63ec
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27194,21 +27194,6 @@ supported_simd_type (tree t)
   return false;
 }
 
-/* Return true for types that currently are supported as SIMD return
-   or argument types.  */
-
-static bool
-currently_supported_simd_type (tree t, tree b)
-{
-  if (COMPLEX_FLOAT_TYPE_P (t))
-return false;
-
-  if (TYPE_SIZE (t) != TYPE_SIZE (b))
-return false;
-
-  return supported_simd_type (t);
-}
-
 /* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
 
 static int
@@ -27217,7 +27202,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
tree base_type, int num,
bool explicit_p)
 {
-  tree t, ret_type;
+  tree t, ret_type, nfs_type;
   unsigned int elt_bits, count;
   unsigned HOST_WIDE_INT const_simdlen;
   poly_uint64 vec_bits;
@@ -27240,55 +27225,61 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
 }
 
   ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  /* According to AArch64's Vector ABI the type that determines the simdlen is
+ the narrowest of types, so we ignore base_type for AArch64.  */
   if (TREE_CODE (ret_type) != VOID_TYPE
-  && !currently_supported_simd_type (ret_type, base_type))
+  && !supported_simd_type (ret_type))
 {
   if (!explicit_p)
;
-  else if (TYPE_SIZE (ret_type) != TYPE_SIZE (base_type))
-   warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "GCC does not currently support mixed size types "
-   "for % functions");
-  else if (supported_simd_type (ret_type))
+  else if (COMPLEX_FLOAT_TYPE_P (ret_type))
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
"GCC does not currently support return type %qT "
-   "for % functions", ret_type);
+   "for simd", ret_type);
   else
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "unsupported return type %qT for % functions",
+   "unsupported return type %qT for simd",
ret_type);
   return 0;
 }
 
+  nfs_type = ret_type;
   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
   bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
-
   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
t && t != void_list_node; t = TREE_CHAIN (t), i++)
 {
   tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
-
   if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
- && !currently_supported_simd_type (arg_type, base_type))
+ && !supported_simd_type (arg_type))
{
  if (!explicit_p)
;
- else if (TYPE_SIZE (arg_type) != TYPE_SIZE (base_type))
+ else if (COMPLEX_FLOAT_TYPE_P (ret_type))
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "GCC does not currently support mixed size types "
- 

[pushed] analyzer: add symbol base class, moving region id to there [PR104940]

2023-07-26 Thread David Malcolm via Gcc-patches
This patch introduces a "symbol" base class that region and svalue
both inherit from, generalizing the ID from the region class so it's
also used by svalues.  This gives a way of sorting regions and svalues
into creation order, which I've found useful in my experiments with
adding SMT support (PR analyzer/104940).
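
The shape of the change can be sketched roughly as follows. This is a simplified illustration under assumed names (`id_allocator`, `sort_by_creation_order`, and the stripped-down classes are hypothetical), not the actual code in gcc/analyzer:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Common base holding the creation-order id.
class symbol
{
public:
  virtual ~symbol() = default;
  unsigned get_id() const { return m_id; }
protected:
  explicit symbol(unsigned id) : m_id(id) { }
private:
  unsigned m_id;
};

// Both kinds of object draw their ids from the same sequence.
class region : public symbol
{
public:
  explicit region(unsigned id) : symbol(id) { }
};

class svalue : public symbol
{
public:
  explicit svalue(unsigned id) : symbol(id) { }
};

// Hypothetical stand-in for the manager that hands out ids.
class id_allocator
{
public:
  unsigned alloc() { return m_next++; }
private:
  unsigned m_next = 0;
};

// Sort a mixed list of regions and svalues into creation order.
inline void sort_by_creation_order(std::vector<const symbol*>& syms)
{
  std::sort(syms.begin(), syms.end(),
            [](const symbol* a, const symbol* b) {
              assert(a && b);
              return a->get_id() < b->get_id();
            });
}
```

Because ids come from one shared counter, the ordering is deterministic and does not depend on pointer values.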

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-2793-g9d804f9b2709b3.

gcc/ChangeLog:
PR analyzer/104940
* Makefile.in (ANALYZER_OBJS): Add analyzer/symbol.o.

gcc/analyzer/ChangeLog:
PR analyzer/104940
* region-model-manager.cc
(region_model_manager::region_model_manager): Update for
generalizing region ids to also cover svalues.
(region_model_manager::get_or_create_constant_svalue): Likewise.
(region_model_manager::get_or_create_unknown_svalue): Likewise.
(region_model_manager::create_unique_svalue): Likewise.
(region_model_manager::get_or_create_initial_value): Likewise.
(region_model_manager::get_or_create_setjmp_svalue): Likewise.
(region_model_manager::get_or_create_poisoned_svalue): Likewise.
(region_model_manager::get_ptr_svalue): Likewise.
(region_model_manager::get_or_create_unaryop): Likewise.
(region_model_manager::get_or_create_binop): Likewise.
(region_model_manager::get_or_create_sub_svalue): Likewise.
(region_model_manager::get_or_create_repeated_svalue): Likewise.
(region_model_manager::get_or_create_bits_within): Likewise.
(region_model_manager::get_or_create_unmergeable): Likewise.
(region_model_manager::get_or_create_widening_svalue): Likewise.
(region_model_manager::get_or_create_compound_svalue): Likewise.
(region_model_manager::get_or_create_conjured_svalue): Likewise.
(region_model_manager::get_or_create_asm_output_svalue): Likewise.
(region_model_manager::get_or_create_const_fn_result_svalue):
Likewise.
(region_model_manager::get_region_for_fndecl): Likewise.
(region_model_manager::get_region_for_label): Likewise.
(region_model_manager::get_region_for_global): Likewise.
(region_model_manager::get_field_region): Likewise.
(region_model_manager::get_element_region): Likewise.
(region_model_manager::get_offset_region): Likewise.
(region_model_manager::get_sized_region): Likewise.
(region_model_manager::get_cast_region): Likewise.
(region_model_manager::get_frame_region): Likewise.
(region_model_manager::get_symbolic_region): Likewise.
(region_model_manager::get_region_for_string): Likewise.
(region_model_manager::get_bit_range): Likewise.
(region_model_manager::get_var_arg_region): Likewise.
(region_model_manager::get_region_for_unexpected_tree_code):
Likewise.
(region_model_manager::get_or_create_region_for_heap_alloc):
Likewise.
(region_model_manager::create_region_for_alloca): Likewise.
(region_model_manager::log_stats): Likewise.
* region-model-manager.h (region_model_manager::get_num_regions):
Replace with...
(region_model_manager::get_num_symbols): ...this.
(region_model_manager::alloc_region_id): Replace with...
(region_model_manager::alloc_symbol_id): ...this.
(region_model_manager::m_next_region_id): Replace with...
(region_model_manager::m_next_symbol_id): ...this.
* region-model.cc (selftest::test_get_representative_tree): Update
for generalizing region ids to also cover svalues.
(selftest::test_binop_svalue_folding): Likewise.
(selftest::test_state_merging): Likewise.
* region.cc (region::cmp_ids): Delete, in favor of
symbol::cmp_ids.
(region::region): Update for introduction of symbol base class.
(frame_region::get_region_for_local): Likewise.
(root_region::root_region): Likewise.
(symbolic_region::symbolic_region): Likewise.
* region.h: Replace include of "analyzer/complexity.h" with
"analyzer/symbol.h".
(class region): Make a subclass of symbol.
(region::get_id): Delete in favor of symbol::get_id.
(region::cmp_ids): Delete in favor of symbol::cmp_ids.
(region::get_complexity): Delete in favor of
symbol::get_complexity.
(region::region): Use symbol::id_t for "id" param.
(region::m_complexity): Move field to symbol base class.
(region::m_id): Likewise.
(space_region::space_region): Use symbol::id_t for "id" param.
(frame_region::frame_region): Likewise.
(globals_region::globals_region): Likewise.
(code_region::code_region): Likewise.
(function_region::function_region): Likewise.
(label_region::label_region): Likewise.
(stack_region::stack_region): Likewise.
(heap_region::heap_region): Likewise.

RE: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Li, Pan2 via Gcc-patches
As Juzhe mentioned, the CALL problem is resolved by LCM/PRE, similar to 
the VSETVL pass, which is well proven up to a point.

I would like to propose that we stay focused and move this patch itself 
forward; the remaining RVV floating-point API support and the full tests of 
the RVV intrinsic API depend on it.

Of course, I am working on PATCH v8, and thanks again for Robin's comments.

Pan

From: 钟居哲 
Sent: Wednesday, July 26, 2023 10:18 PM
To: rdapp.gcc ; Li, Pan2 
Cc: rdapp.gcc ; kito.cheng ; 
gcc-patches ; Wang, Yanzhang 
Subject: Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Explicitly back up and restore for each intrinsic, just the same as we did for 
CALL in this patch.

I don't have the data to prove how well LCM/PRE of mode switching works, but I 
trust it, since LCM/PRE is the key optimization method of the VSETVL pass, which 
is doing a good job on VSETVL instruction optimizations.

I don't think we should give up the LCM/PRE opportunity and just back up and 
restore for each intrinsic blindly.



juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-07-26 21:46
To: juzhe.zhong; Li, Pan2
CC: rdapp.gcc; Kito Cheng; gcc-patches@gcc.gnu.org; Wang, Yanzhang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding
> current llvm didn't do any pre optimization.  They always
> backup+restore for each rounding mode intrinsic

I see.  There is still the option of lazily restoring the
(entry) FRM before a function call but not read the FRM
after every call.  Do we have any data on how good or bad the
mode-switching LCM works when we explicitly backup and restore
for each intrinsic?

Regards
Robin



Re: [WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-26 Thread David Malcolm via Gcc-patches
On Wed, 2023-07-26 at 11:27 +0200, Benjamin Priour wrote:
> On Sat, Jul 22, 2023 at 12:04 AM David Malcolm 
> wrote:
> 
> > On Fri, 2023-07-21 at 17:35 +0200, Benjamin Priour wrote:
> > > Hi,
> > > 
> > > Upon David's request I've joined the in progress patch to the
> > > below
> > > email.
> > > I hope it makes more sense now.
> > > 
> > > Best,
> > > Benjamin.
> > 
> > Thanks for posting the work-in-progress patch; it makes the idea
> > clearer.
> > 
> > Some thoughts about this:
> > 
> > - I like the idea of defaulting to *not* showing events within
> > system
> > headers, which the patch achieves
> > - I don't like the combination of never/system with maxdepth, in
> > that
> > it seems complicated and I don't think a user is likely to
> > experiment
> > with different depths.
> > - Hence I think it would work better as a simple boolean, perhaps
> >   "-fanalyzer-show-events-in-system-headers"
> >   or somesuch?  It seems like the sort of thing that we want to
> > provide
> > a sensible default for, but have the option of turning off for
> > debugging the analyzer itself, but I don't expect an end-user to
> > touch
> > that option.
> > 
> 
> A boolean sounds good, I will trust your experience with the end-user
> here,
> especially since  and "never" had some overlap, it could
> have
> been confusing.
> 
> 
> > FWIW the patch seems to have been mangled somewhat via email, so I
> > don't have a sense of what the actual output from patched analyzer
> > looks like.  What should we output to the user with -fanalyzer and
> > no
> > other options for the case in PR 110543?  Currently, for
> > https://godbolt.org/z/sb9dM9Gqa trunk emits 12 events, of which
> > probably only this last one is useful:
> > 
> >   (12) dereference of NULL 'a.std::__shared_ptr_access > __gnu_cxx::_S_atomic, false, false>::operator->()'
> > 
> > What does the output look like with your patch?
> > 
> 
> The plan with this patch was to get events :
> (1) entry to 'main'
> (2) calling 'std::__shared_ptr_access false>::operator->' from 'main'
> (12) dereference of NULL 'a.std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->()'
> (11) returning to 'main' from 'std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->'
> 
> This way, we get the entry and exit point to the system headers ( (2)
> and
> (11) ), and the actual injurious event ( (12) ).
> We could however go as you suggest, with an even more succinct path
> and only
> keep (1) and (12).

I think we could still have events (11) and (12): what if for call and
return events we consider the location of both the call site and the
called function: we suppress if both are in a system header, but don't
suppress if only one is a system header.  That way by default we'd show
the call into a system header and show the return from the system
header, but we'd suppress all the implementation details within the
system header.

Does that seem like it could work?
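
A minimal sketch of that rule, assuming each call or return event knows whether its call site and its callee are in a system header. The names (event_kind, suppress_event_p) are hypothetical and not taken from the analyzer sources; how the final diagnostic event is kept is not modelled here.

```cpp
// Hypothetical event classification for the sketch.
enum class event_kind { call, return_, other };

// Decide whether an event should be hidden from the emitted path.
//
// For call/return events, SITE_IN_SYSTEM_HEADER describes the call site
// and CALLEE_IN_SYSTEM_HEADER the called function.  For any other event,
// SITE_IN_SYSTEM_HEADER is the event's own location and the second flag
// is ignored.
inline bool
suppress_event_p (event_kind kind,
                  bool site_in_system_header,
                  bool callee_in_system_header)
{
  if (kind == event_kind::call || kind == event_kind::return_)
    /* Only hide the edge when both ends are inside system headers.  */
    return site_in_system_header && callee_in_system_header;

  /* Implementation details inside a system header are hidden.  */
  return site_in_system_header;
}
```

With this rule the call from 'main' into the header and the return back to 'main' stay visible, while calls between two system-header functions are dropped.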

Thanks
Dave



> 
> Thanks,
> Benjamin
> 
> 
> > Thanks
> > Dave
> > 
> 
> > 
> > > 
> > > -- Forwarded message -
> > > From: Benjamin Priour 
> > > Date: Tue, Jul 18, 2023 at 3:30 PM
> > > Subject: [RFC] analyzer: Add optional trim of the analyzer
> > > diagnostics
> > > going too deep [PR110543]
> > > To: , David Malcolm
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > I'd like to request comments on a patch I am writing for
> > > PR110543.
> > > The goal of this patch is to reduce the noise of the analyzer
> > > emitted
> > > diagnostics when dealing with
> > > system headers, or simply diagnostic paths that are too long. The
> > > new
> > > option only affects the display
> > > of the diagnostics, but doesn't hinder the actual analysis.
> > > 
> > > I've defaulted the new option to "system", thus preventing the
> > > diagnostic
> > > paths from showing system headers.
> > > "never" corresponds to the pre-patch behavior, whereas you can
> > > also
> > > specify
> > > an unsigned value 
> > > that prevents paths to go deeper than  frames.
> > > 
> > > fanalyzer-trim-diagnostics=
> > > > Common Joined RejectNegative ToLower
> > > > Var(flag_analyzer_trim_diagnostics)
> > > > Init("system")
> > > > -fanalyzer-trim-diagnostics=[never|system|] Trim
> > > > diagnostics
> > > > path that are too long before emission.
> > > > 
> > > 
> > > > Does it sound reasonable and user-friendly?
> > > 
> > > Regstrapping was a success against trunk, although one of the
> > > newly
> > > added
> > > test case fails for c++14.
> > > Note that the test case below was done with "never", thus behaves
> > > exactly
> > > as the pre-patch analyzer
> > > on x86_64-linux-gnu.
> > > 
> > > /* { dg-additional-options "-fdiagnostics-plain-output
> > > > -fdiagnostics-path-format=inline-events -fanalyzer-trim-
> > > > diagnostics=never"
> > > > } */
> > > > /* { dg-skip-if "" { c++98_only }  } */
> > > > 
> > > > > #include <memory>
> > > > > struct A {int x; int y;};
> > > > > 
> > > > > int main () {
> > > > >   std::shared_ptr<A> a;
> > 

Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread 钟居哲
Explicitly back up and restore for each intrinsic, just the same as we did for 
CALL in this patch.

I don't have the data to prove how well LCM/PRE of mode switching works, but I 
trust it, since LCM/PRE is the key optimization method of the VSETVL pass, which 
is doing a good job on VSETVL instruction optimizations.

I don't think we should give up the LCM/PRE opportunity and just back up and 
restore for each intrinsic blindly.
 



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-26 21:46
To: juzhe.zhong; Li, Pan2
CC: rdapp.gcc; Kito Cheng; gcc-patches@gcc.gnu.org; Wang, Yanzhang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding
> current llvm didn't do any pre optimization.  They always
> backup+restore for each rounding mode intrinsic
 
I see.  There is still the option of lazily restoring the
(entry) FRM before a function call but not read the FRM
after every call.  Do we have any data on how good or bad the
mode-switching LCM works when we explicitly backup and restore
for each intrinsic?
 
Regards
Robin
 


[committed] testsuite: Fix gfortran.dg/ieee/comparisons_3.F90 testsuite failures

2023-07-26 Thread Uros Bizjak via Gcc-patches
The testcase should use dg-additional-options instead of dg-options to
not overwrite default compile flags that include path for finding
the IEEE modules.

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/comparisons_3.F90: Use dg-additional-options
instead of dg-options.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gfortran.dg/ieee/comparisons_3.F90 
b/gcc/testsuite/gfortran.dg/ieee/comparisons_3.F90
index c15678fec35..40e8466c132 100644
--- a/gcc/testsuite/gfortran.dg/ieee/comparisons_3.F90
+++ b/gcc/testsuite/gfortran.dg/ieee/comparisons_3.F90
@@ -1,5 +1,5 @@
 ! { dg-do run }
-! { dg-options "-ffree-line-length-none" }
+! { dg-additional-options "-ffree-line-length-none" }
 program foo
   use ieee_arithmetic
   use iso_fortran_env


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Robin Dapp via Gcc-patches


> A CSR write can be expensive; it will flush the whole pipeline in some
> RISC-V core implementations…
Hopefully not a flush but just sequentializing, but yes, it's usually a
performance concern.  However, if we set the rounding mode to something
else for an intrinsic and then call a function, we want to restore it
one way or another, right?

That's also the reason why glibc has done a lot of work to minimize
fesetround calls (or other fcsr setters).

Regards
 Robin
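
To make the "restore it one way or another" point concrete at the C library level, here is a small sketch using the standard <cfenv> interface. It is only an analogy to the FRM handling discussed in this thread, not the patch's mechanism; the class name scoped_rounding_mode is hypothetical. It skips the write when the requested mode is already active, in the same spirit as minimizing fesetround calls.

```cpp
#include <cfenv>

// Hypothetical RAII guard: switch the rounding mode for a scope and put
// the previous one back on exit, writing only when something changes.
class scoped_rounding_mode
{
public:
  explicit scoped_rounding_mode (int mode)
    : m_saved (std::fegetround ()), m_changed (false)
  {
    /* fesetround returns 0 on success.  */
    if (mode != m_saved)
      m_changed = (std::fesetround (mode) == 0);
  }

  ~scoped_rounding_mode ()
  {
    if (m_changed)
      std::fesetround (m_saved);
  }

  scoped_rounding_mode (const scoped_rounding_mode &) = delete;
  scoped_rounding_mode &operator= (const scoped_rounding_mode &) = delete;

private:
  int m_saved;     // Mode that was active when the guard was created.
  bool m_changed;  // True if the constructor actually switched modes.
};
```

Any function called after the guard goes out of scope sees the rounding mode that was in effect before it, which is the property the thread is trying to preserve around calls.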


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Kito Cheng via Gcc-patches
A CSR write can be expensive; it will flush the whole pipeline in some RISC-V
core implementations…

Kito Cheng 於 2023年7月26日 週三,21:57寫道:

> Sorry for the late ack on the LLVM part. I can say they implemented the same
> model/semantics; it was done by one of our team members too, and I have
> regular meetings with that guy :P
>
> Robin Dapp via Gcc-patches 於 2023年7月26日
> 週三,21:47寫道:
>
>> > current llvm didn't do any pre optimization.  They always
>> > backup+restore for each rounding mode intrinsic
>>
>> I see.  There is still the option of lazily restoring the
>> (entry) FRM before a function call but not read the FRM
>> after every call.  Do we have any data on how good or bad the
>> mode-switching LCM works when we explicitly backup and restore
>> for each intrinsic?
>>
>> Regards
>>  Robin
>>
>


Re: [PATCH v2][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-07-26 Thread Alex Coplan via Gcc-patches
On 28/06/2023 11:35, Alex Coplan via Gcc-patches wrote:
> Hi,
> 
> This patch implements clang's __has_feature and __has_extension in GCC.
> This is a v2 of the original RFC posted here:

Ping. The Objective-C parts have been approved, but the C, C++, and generic bits
need review.

Let me know if there's anything I can do to make it easier to review, e.g. would
it help to split into a series which adds the language-specific bits in separate
patches?

Thanks,
Alex

> 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
> 
> Changes since v1:
>  - Follow the clang behaviour where -pedantic-errors means that
>    __has_extension behaves exactly like __has_feature.
>  - We're now more conservative with reporting C++ features as extensions
>    available in C++98. For features where we issue a pedwarn in C++98
>    mode, we no longer report these as available extensions for C++98.
>  - Switch to using a hash_map to store the features. As well as ensuring
>    lookup is constant time, this allows us to dynamically register
>    features (right now based on frontend, but later we could allow the
>    target to register additional features).
>  - Also implement some Objective-C features, add a langhook to dispatch
>    to each frontend to allow it to register language-specific features.
> 
> There is an outstanding question around what to do with
> cxx_binary_literals in the C frontend for C2x. Should we introduce a new
> c_binary_literals feature that is a feature in C2x and an extension
> below that, or should we just continue using the cxx_binary_literals
> feature and mark that as a standard feature in C2x? See the comment in
> c_feature_table in the patch.
> 
> There is also some doubt over what to do with the undocumented "tls"
> feature.  In clang this is gated on whether the target supports TLS, but
> in clang (unlike GCC) it is a hard error to use TLS when the target
> doesn't support it.  In GCC I believe you can always use TLS, you just
> get emulated TLS in the case that the target doesn't support it
> natively.  So in this patch GCC always reports having the "tls" feature.
> Would appreciate if anyone has feedback on this aspect.
> 
> I know Iain was concerned that it should be possible to have
> target-specific features. Hopefully it is clear that the design in this
> patch is more amenable to this. I think for Darwin it should be possible
> to add a targetcm hook to register additional features (either passing
> through a callback to allow the target code to add to the hash_map, or
> exposing a separate langhook that the target can call to register
> features).
> 
> Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin. Any
> thoughts?
> 
> Thanks,
> Alex
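
For readers unfamiliar with the builtins, a short usage sketch follows. It relies on the fallback idiom from the clang documentation so that it also builds with compilers lacking __has_feature and __has_extension; cxx_binary_literals is the feature name mentioned above, while low_nibble is a made-up helper for illustration.

```cpp
// Fallbacks for compilers that do not provide the builtins.
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension(x) __has_feature (x)
#endif

// Hypothetical helper: keep the low four bits of V, using a binary
// literal when the compiler reports support for them.
inline int
low_nibble (int v)
{
#if __has_extension (cxx_binary_literals)
  return v & 0b1111;
#else
  return v & 0xF;
#endif
}
```

Both branches compute the same value, so the check only selects which spelling the compiler gets to see.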
> 
> --
> 
> Co-Authored-By: Iain Sandoe 
> 
> gcc/c-family/ChangeLog:
> 
>   PR c++/60512
>   * c-common.cc (struct hf_feature_info): New.
>   (struct hf_table_entry): New.
>   (hf_generic_predicate): New.
>   (c_common_register_feature): New.
>   (init_has_feature): New.
>   (has_feature_p): New.
>   * c-common.h (c_common_has_feature): New.
>   (has_feature_p): New.
>   (c_common_register_feature): New.
>   (c_register_features): New.
>   (cp_register_features): New.
>   * c-lex.cc (init_c_lex): Plumb through has_feature callback.
>   (c_common_has_builtin): Generalize and move common part ...
>   (c_common_lex_availability_macro): ... here.
>   (c_common_has_feature): New.
>   * c-ppoutput.cc (init_pp_output): Plumb through has_feature.
> 
> gcc/c/ChangeLog:
> 
>   PR c++/60512
>   * c-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
>   c_register_features.
>   * c-objc-common.cc (struct c_feature_info): New.
>   (c_has_feature): New.
>   (c_register_features): New.
> 
> gcc/cp/ChangeLog:
> 
>   PR c++/60512
>   * cp-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
>   cp_register_features.
>   * cp-objcp-common.cc (struct cp_feature_selector): New.
>   (cp_feature_selector::has_feature): New.
>   (struct cp_feature_info): New.
>   (cp_has_feature): New.
>   (cp_register_features): New.
> 
> gcc/ChangeLog:
> 
>   PR c++/60512
>   * doc/cpp.texi: Document __has_{feature,extension}.
>   * langhooks-def.h (LANG_HOOKS_REGISTER_FEATURES): New.
>   (LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_REGISTER_FEATURES.
>   * langhooks.h (struct lang_hooks): Add register_features hook.
> 
> gcc/objc/ChangeLog:
> 
>   PR c++/60512
>   * objc-act.cc (struct objc_feature_info): New.
>   (objc_nonfragile_abi_p): New.
>   (objc_has_feature): New.
>   (objc_common_register_features): New.
>   * objc-act.h (objc_register_features): New.
>   (objc_common_register_features): New.
>   * objc-lang.cc (LANG_HOOKS_REGISTER_FEATURES): Implement with
>   objc_register_features.
>   (objc_register_features): New.
> 
> gcc/objcp/ChangeLog:
> 
>   PR 

Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Kito Cheng via Gcc-patches
Sorry for the late ack on the LLVM part. I can say they implemented the same
model/semantics; it was done by one of our team members too, and I have
regular meetings with that guy :P

Robin Dapp via Gcc-patches 於 2023年7月26日 週三,21:47寫道:

> > current llvm didn't do any pre optimization.  They always
> > backup+restore for each rounding mode intrinsic
>
> I see.  There is still the option of lazily restoring the
> (entry) FRM before a function call but not read the FRM
> after every call.  Do we have any data on how good or bad the
> mode-switching LCM works when we explicitly backup and restore
> for each intrinsic?
>
> Regards
>  Robin
>


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Robin Dapp via Gcc-patches
> current llvm didn't do any pre optimization.  They always
> backup+restore for each rounding mode intrinsic

I see.  There is still the option of lazily restoring the
(entry) FRM before a function call but not read the FRM
after every call.  Do we have any data on how good or bad the
mode-switching LCM works when we explicitly backup and restore
for each intrinsic?

Regards
 Robin


RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Li, Pan2 via Gcc-patches
For clarification, the LLVM side has little reference value, because it will 
always back up FRM before vfadd and then restore it after vfadd. For example, 
see the clang trunk code-gen below.

test_float_point_dynamic_frm:   # @test_float_point_dynamic_frm
        vsetvli zero, a0, e32, m1, ta, ma

        fsrmi   a0, 1           <- set and backup
        vfadd.vv        v9, v8, v9
        fsrm    a0              <- restore

        fsrmi   a0, 2
        vfadd.vv        v9, v8, v9
        fsrm    a0

        fsrmi   a0, 4
        vfadd.vv        v9, v8, v9
        fsrm    a0

        fsrmi   a0, 3
        vfadd.vv        v8, v8, v9
        fsrm    a0

        ret

Pan
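
To put rough numbers on the difference, here is a toy model (not GCC or LLVM code) that counts FRM writes for a straight-line sequence under two strategies: per-intrinsic backup and restore as in the clang output above, and a lazy scheme that writes only when the wanted mode differs from the current one and restores the entry value before calls and at function exit. All names are hypothetical, and the model ignores FRM reads and the possibility that a callee changes FRM.

```cpp
#include <vector>

// Symbolic "whatever FRM held on function entry".  Static modes are 0..4.
const int FRM_DYN = -1;

struct op
{
  bool is_call;  // True for a function call, false for an intrinsic.
  int frm;       // Static rounding mode wanted by an intrinsic.
};

// Per-intrinsic backup and restore: two writes around every intrinsic.
inline unsigned
count_frm_writes_eager (const std::vector<op> &ops)
{
  unsigned writes = 0;
  for (const op &o : ops)
    if (!o.is_call)
      writes += 2;  /* fsrmi (set and save), then fsrm (restore).  */
  return writes;
}

// Lazy scheme: write only on a mode change; calls want the entry value.
inline unsigned
count_frm_writes_lazy (const std::vector<op> &ops)
{
  unsigned writes = 0;
  int current = FRM_DYN;
  for (const op &o : ops)
    {
      int wanted = o.is_call ? FRM_DYN : o.frm;
      if (wanted != current)
        {
          current = wanted;
          ++writes;
        }
    }
  if (current != FRM_DYN)
    ++writes;  /* Restore the entry value before returning.  */
  return writes;
}
```

For the four-vfadd sequence above (modes 1, 2, 4, 3) the model gives 8 writes for the eager scheme and 5 for the lazy one; the gap widens when consecutive intrinsics share a mode.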

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Wednesday, July 26, 2023 9:35 PM
To: juzhe.zhong ; Robin Dapp 
Cc: Kito Cheng ; rdapp@gmail.com; 
gcc-patches@gcc.gnu.org; Wang, Yanzhang 
Subject: RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

Thanks, Robin, for the comments.

Yes, you can refer to this link to compare the difference between GCC and 
LLVM. I am trying to understand the comments and will send the V8 later.

https://godbolt.org/z/4E434vaqv

Pan

From: juzhe.zhong 
Sent: Wednesday, July 26, 2023 9:13 PM
To: Robin Dapp 
Cc: Kito Cheng ; Li, Pan2 ; 
rdapp@gmail.com; gcc-patches@gcc.gnu.org; Wang, Yanzhang 

Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

current llvm didn't do any pre optimization.  They always backup+restore for 
each rounding mode intrinsic

We should not reference current llvm
 Replied Message 
From: Robin Dapp
Date: 07/26/2023 21:08
To: Kito Cheng; Li, Pan2
Cc: rdapp@gmail.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

So after thinking about it again - I'm still not really sure
I like treating every function as essentially an fesetround.
There is a reason why fesetround is special.  Does LLVM behave
the same way?

But supposing we really, really want it and assuming there's consensus:

+  start_sequence ();
+  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+  rtx_insn *backup_insn = get_insns ();
+  end_sequence ();

A comment here would be nice why we need a sequence for a single
instruction.  I'm not fully aware what insert_insn_end_basic_block
does but won't a

 rtx_insn *last = BB_END (bb);
 emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

suffice?  One way or another, these kinds of non-local
constructs here don't seem entirely rock solid.

@@ -7843,6 +7946,11 @@ riscv_vxrm_mode_after (rtx_insn *insn, int mode)
static int
riscv_frm_mode_after (rtx_insn *insn, int mode)
{
+  STATIC_FRM_P (cfun) = STATIC_FRM_P (cfun) || riscv_static_frm_mode_p (mode);
+
+  if (CALL_P (insn))
+return FRM_MODE_DYN_CALL;

Why do we appear to return a different mode here?  We already request
FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
we do not change the mode so we could just always return the incoming
mode?

This is not part of this patch but related and originally I assumed
that we would untangle things after the initial patch, so:

  if (frm_unknown_dynamic_p (insn))
return FRM_MODE_DYN;

frm_unknown_dynamic_p checks CALL_P which has already been checked
before.  It returns FRM_MODE_DYN instead of FRM_MODE_DYN_CALL, though.

Apart from that, the function is called unknown_dynamic but we check
for a SET of FRM?  Wouldn't something that sets FRM rather be a "static"
rounding-mode instruction? (using the "static" wording from before)

Then we also still have

 if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
   return get_attr_frm_mode (insn);

from before.  Isn't that pretty much the same?


+  assert_equal (NEW_FRM, get_frm (),
+   "The value of frm register should be NEW_FRM.");

Here and in similar cases, NEW_FRM is not exactly telling.  Can't we
use "should be " and then

+  fprintf (stdout, "%s %d, but get %d != %d\n", message, a, b);

or similar?

+   will do the mode switch from MODE_CALL to MODE_NON_NONE natively.

NON -> FRM.

+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"

This causes a FAIL for me.  I believe the scan directives are off by one.

Are you going to do asm directives in a separate patch?
Similar to vxrm_unknown_p we could just check for one here
and handle it similarly to a call.  Would need some more tests, though.

Regards
Robin


Re: [PATCH] match.pd: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-26 Thread Drew Ross via Gcc-patches
Thanks for catching and fixing this, David and Andrew.

Drew

On Tue, Jul 25, 2023 at 5:59 PM Andrew Pinski  wrote:

> On Tue, Jul 25, 2023 at 1:54 PM Andrew Pinski  wrote:
> >
> > On Tue, Jul 25, 2023 at 12:45 PM Jakub Jelinek via Gcc-patches
> >  wrote:
> > >
> > > On Tue, Jul 25, 2023 at 03:42:21PM -0400, David Edelsohn via
> Gcc-patches wrote:
> > > > Hi, Drew
> > > >
> > > > Thanks for addressing this missed optimization.
> > > >
> > > > The testcase includes an incorrect assumption: signed char, which
> > > > causes the testcase to fail on PowerPC.
> > > >
> > > > Should the testcase be updated to specify signed char in the function
> > > > signatures or should -fsigned-char be added to the command line
> > > > options?
> > >
> > > I think we should use signed char instead of char in the testcase.
> >
> > I also think it should be `signed char` instead as I mentioned in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110803 .
>
> Committed the testsuite fix as r14-2767-g67357270772b91 .
>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Andrew
> >
> > >
> > > Jakub
> > >
>
>
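
As a quick sanity check of the identity in the subject line, and of why the testcase should spell out signed char, the sketch below evaluates both forms with explicitly signed operands. The function names are made up for illustration; plain char is avoided because its signedness differs between targets (it is unsigned by default on PowerPC, as noted in the thread).

```cpp
#include <limits>

// True on targets where plain char is signed; this is target-dependent,
// which is why the functions below use signed char explicitly.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;

// Original form: (~X | Y) ^ X.
inline signed char
before_fold (signed char x, signed char y)
{
  return (~x | y) ^ x;
}

// Simplified form: ~(X & Y).
inline signed char
after_fold (signed char x, signed char y)
{
  return ~(x & y);
}
```

Bit by bit: where X is 0 both forms give 1, and where X is 1 both give ~Y, so the two functions agree for every pair of inputs.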


RE: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Li, Pan2 via Gcc-patches
Thanks, Robin, for the comments.

Yes, you can refer to this link to compare the difference between GCC and 
LLVM. I am trying to understand the comments and will send the V8 later.

https://godbolt.org/z/4E434vaqv

Pan

From: juzhe.zhong 
Sent: Wednesday, July 26, 2023 9:13 PM
To: Robin Dapp 
Cc: Kito Cheng ; Li, Pan2 ; 
rdapp@gmail.com; gcc-patches@gcc.gnu.org; Wang, Yanzhang 

Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic 
rounding

current llvm didn't do any pre optimization.  They always backup+restore for 
each rounding mode intrinsic

We should not reference current llvm
 Replied Message 
From: Robin Dapp
Date: 07/26/2023 21:08
To: Kito Cheng; Li, Pan2
Cc: rdapp@gmail.com; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
Subject: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

So after thinking about it again - I'm still not really sure
I like treating every function as essentially an fesetround.
There is a reason why fesetround is special.  Does LLVM behave
the same way?

But supposing we really, really want it and assuming there's consensus:

+  start_sequence ();
+  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+  rtx_insn *backup_insn = get_insns ();
+  end_sequence ();

A comment here would be nice why we need a sequence for a single
instruction.  I'm not fully aware what insert_insn_end_basic_block
does but won't a

 rtx_insn *last = BB_END (bb);
 emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

suffice?  One way or another, these kinds of non-local
constructs here don't seem entirely rock solid.

@@ -7843,6 +7946,11 @@ riscv_vxrm_mode_after (rtx_insn *insn, int mode)
static int
riscv_frm_mode_after (rtx_insn *insn, int mode)
{
+  STATIC_FRM_P (cfun) = STATIC_FRM_P (cfun) || riscv_static_frm_mode_p (mode);
+
+  if (CALL_P (insn))
+return FRM_MODE_DYN_CALL;

Why do we appear to return a different mode here?  We already request
FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
we do not change the mode so we could just always return the incoming
mode?

This is not part of this patch but related and originally I assumed
that we would untangle things after the initial patch, so:

  if (frm_unknown_dynamic_p (insn))
return FRM_MODE_DYN;

frm_unknown_dynamic_p checks CALL_P which has already been checked
before.  It returns FRM_MODE_DYN instead of FRM_MODE_DYN_CALL, though.

Apart from that, the function is called unknown_dynamic but we check
for a SET of FRM?  Wouldn't something that sets FRM rather be a "static"
rounding-mode instruction? (using the "static" wording from before)

Then we also still have

 if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
   return get_attr_frm_mode (insn);

from before.  Isn't that pretty much the same?


+  assert_equal (NEW_FRM, get_frm (),
+   "The value of frm register should be NEW_FRM.");

Here and in similar cases, NEW_FRM is not exactly telling.  Can't we
use "should be " and then

+  fprintf (stdout, "%s %d, but get %d != %d\n", message, a, b);

or similar?

+   will do the mode switch from MODE_CALL to MODE_NON_NONE natively.

NON -> FRM.

+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"

This causes a FAIL for me.  I believe the scan directives are off by one.

Are you going to do asm directives in a separate patch?
Similar to vxrm_unknown_p we could just check for one here
and handle it similarly to a call.  Would need some more tests, though.

Regards
Robin


[PATCH] tree-optimization/106081 - elide redundant permute

2023-07-26 Thread Richard Biener via Gcc-patches
The following patch makes sure to elide a redundant permute that
can be merged with existing splats represented as load permutations
as we now do for non-grouped SLP loads.  This is the last bit
missing to fix this PR, where the main fix was already done by
r14-2117-gdd86a5a69cbda4.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
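
The reason such a permute is redundant can be shown with a toy model, independent of the SLP data structures: a splat has the same value in every lane, so any lane permutation of it yields the same vector. The helpers below (permute, is_splat) are hypothetical and only illustrate that property.

```cpp
#include <array>
#include <cstddef>

// Apply a lane permutation: lane I of the result is lane SEL[I] of V.
template <std::size_t N>
std::array<double, N>
permute (const std::array<double, N> &v,
         const std::array<std::size_t, N> &sel)
{
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = v[sel[i] % N];
  return out;
}

// True if every lane of V holds the same value.
template <std::size_t N>
bool
is_splat (const std::array<double, N> &v)
{
  for (std::size_t i = 1; i < N; ++i)
    if (v[i] != v[0])
      return false;
  return true;
}
```

Since a splat is unchanged by every selector, a node that produces one has no reason to prefer any particular layout over another.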

PR tree-optimization/106081
* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
Assign layout -1 to splats.

* gcc.dg/vect/pr106081.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr106081.c | 33 
 gcc/tree-vect-slp.cc |  5 -
 2 files changed, 37 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr106081.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr106081.c 
b/gcc/testsuite/gcc.dg/vect/pr106081.c
new file mode 100644
index 000..8f97af2d642
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr106081.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffast-math -fdump-tree-optimized" } */
+/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vect_unpack } */
+/* { dg-require-effective-target vect_intdouble_cvt } */
+/* { dg-require-effective-target vect_perm } */
+
+struct pixels
+{
+  short a,b,c,d;
+} *pixels;
+struct dpixels
+{
+  double a,b,c,d;
+};
+
+double
+test(double *k)
+{
+  struct dpixels results={};
+  for (int u=0; u<1000*16;u++,k--)
+{
+  results.a += *k*pixels[u].a;
+  results.b += *k*pixels[u].b;
+  results.c += *k*pixels[u].c;
+  results.d += *k*pixels[u].d;
+}
+  return results.a+results.b*2+results.c*3+results.d*4;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM" 4 "optimized" { target 
x86_64-*-* i?86-*-* } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e4430248ab5..a1b153035e1 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4605,7 +4605,10 @@ vect_optimize_slp_pass::start_choosing_layouts ()
 IFN_MASK_LOADs).  */
  gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
  if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
-   continue;
+   {
+ partition.layout = -1;
+ continue;
+   }
  dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
  imin = DR_GROUP_SIZE (dr_stmt) + 1;
  tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
-- 
2.35.3


Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point dynamic rounding

2023-07-26 Thread Robin Dapp via Gcc-patches
So after thinking about it again - I'm still not really sure
I like treating every function as essentially an fesetround.
There is a reason why fesetround is special.  Does LLVM behave
the same way?

But supposing we really, really want it and assuming there's consensus:

+  start_sequence ();
+  emit_insn (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)));
+  rtx_insn *backup_insn = get_insns ();
+  end_sequence ();

A comment explaining why we need a sequence for a single instruction
would be nice here.  I'm not fully aware of what
insert_insn_end_basic_block does, but wouldn't a

  rtx_insn *last = BB_END (bb);
  emit_insn_before_noloc (gen_frrmsi (DYNAMIC_FRM_RTL (cfun)), last, bb);

suffice?  One way or another, these kinds of non-local
constructs don't seem entirely rock solid.

@@ -7843,6 +7946,11 @@ riscv_vxrm_mode_after (rtx_insn *insn, int mode)
 static int
 riscv_frm_mode_after (rtx_insn *insn, int mode)
 {
+  STATIC_FRM_P (cfun) = STATIC_FRM_P (cfun) || riscv_static_frm_mode_p (mode);
+
+  if (CALL_P (insn))
+return FRM_MODE_DYN_CALL;

Why do we appear to return a different mode here?  We already request
FRM_MODE_DYN_CALL in mode_needed.  It looks like in the whole function
we do not change the mode so we could just always return the incoming
mode?

This is not part of this patch but related and originally I assumed
that we would untangle things after the initial patch, so:

   if (frm_unknown_dynamic_p (insn))
 return FRM_MODE_DYN;

frm_unknown_dynamic_p checks CALL_P which has already been checked
before.  It returns FRM_MODE_DYN instead of FRM_MODE_DYN_CALL, though.

Apart from that, the function is called unknown_dynamic but we check
for a SET of FRM?  Wouldn't something that sets FRM rather be a "static"
rounding-mode instruction? (using the "static" wording from before)

Then we also still have

  if (reg_mentioned_p (gen_rtx_REG (SImode, FRM_REGNUM), PATTERN (insn)))
return get_attr_frm_mode (insn);

from before.  Isn't that pretty much the same?


+  assert_equal (NEW_FRM, get_frm (),
+   "The value of frm register should be NEW_FRM.");

Here and in similar cases, NEW_FRM is not exactly telling.  Can't we
use "should be " and then 

+  fprintf (stdout, "%s %d, but get %d != %d\n", message, a, b);

or similar?
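
To illustrate the kind of message I mean, here is a minimal sketch.  The
helper name assert_equal_int and the plain int arguments are my own
assumptions for illustration, not what the patch's testsuite helper
actually looks like:

```c
#include <stdio.h>

/* Hypothetical helper, not the one from the patch: compares an expected
   and an actual value and, on mismatch, prints both numbers so the log
   shows the real register contents rather than a macro name.
   Returns 1 on match and 0 on mismatch so callers can count failures.  */
static int
assert_equal_int (int expected, int actual, const char *what)
{
  if (expected == actual)
    return 1;

  fprintf (stdout, "%s should be %d, but got %d\n", what, expected, actual);
  return 0;
}
```

With something along these lines a failing check would report both the
expected and the observed rounding mode.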

+   will do the mode switch from MODE_CALL to MODE_NON_NONE natively.

NON -> FRM.

+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-dynamic-frm-46.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"

This causes a FAIL for me.  I believe the scan directives are off by one.

Are you going to do asm directives in a separate patch?
Similar to vxrm_unknown_p we could just check for one here
and handle it similarly to a call.  Would need some more tests, though.

Regards
 Robin



Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Hao Liu OS via Gcc-patches
> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && vect_is_reduction (stmt_info))
>
> to:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && STMT_VINFO_LIVE_P (stmt_info)
>   && vect_is_reduction (stmt_info))

I tried this and it does avoid the ICE.  But the reduction_latency calculation
is then skipped as well: after the modification, reduction_latency is 0 for
this case.  Previously it was 1 and 2 for scalar and vector respectively.

IMHO, to keep it consistent with the previous result, should we move the
STMT_VINFO_LIVE_P check down, inside the if?  For example:

  /* Calculate the minimum cycles per iteration imposed by a reduction
 operation.  */
  if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
  && vect_is_reduction (stmt_info))
{
  unsigned int base
= aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
  if (STMT_VINFO_LIVE_P (stmt_info) && STMT_VINFO_FORCE_SINGLE_CYCLE (
info_for_reduction (m_vinfo, stmt_info)))
/* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
   and then accumulate that, but at the moment the loop-carried
   dependency includes all copies.  */
ops->reduction_latency = MAX (ops->reduction_latency, base * count);
  else
ops->reduction_latency = MAX (ops->reduction_latency, base);
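
To make the intended behaviour concrete, here is a standalone model of the
update above.  The function and parameter names are made up for
illustration and do not exist in aarch64.cc; live_p stands in for
STMT_VINFO_LIVE_P and single_cycle_p for STMT_VINFO_FORCE_SINGLE_CYCLE:

```c
#include <stdbool.h>

/* Illustrative helper: maximum of two unsigned values.  */
static unsigned int
max_uint (unsigned int a, unsigned int b)
{
  return a > b ? a : b;
}

/* Model of the proposed update: only a live statement of a forced
   single-cycle reduction multiplies the base latency by COUNT; every
   other reduction statement still contributes BASE, so the recorded
   latency no longer drops to 0.  */
static unsigned int
update_reduction_latency (unsigned int current, unsigned int base,
                          unsigned int count, bool live_p,
                          bool single_cycle_p)
{
  if (live_p && single_cycle_p)
    return max_uint (current, base * count);
  return max_uint (current, base);
}
```

In this model a non-live reduction statement with base 2 yields a latency
of 2 rather than 0, which matches the previous result for the vector case.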

Thanks,
Hao


From: Richard Sandiford 
Sent: Wednesday, July 26, 2023 17:14
To: Richard Biener
Cc: Hao Liu OS; GCC-patches@gcc.gnu.org
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
multiplying count [PR110625]

Richard Biener  writes:
> On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're 
>> > not papering over an issue elsewhere.
>>
>> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is 
>> the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>>
>>   :
>>   # res_18 = PHI 
>>   # i_20 = PHI 
>>   _1 = (long unsigned int) i_20;
>>   _2 = _1 * 2;
>>   _3 = x_14(D) + _2;
>>   _4 = *_3;
>>   _5 = (unsigned short) _4;
>>   res.0_6 = (unsigned short) res_18;
>>   _7 = _5 + res.0_6; <-- The current stmt_info
>>   res_15 = (short int) _7;
>>   i_16 = i_20 + 1;
>>   if (n_11(D) > i_16)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   goto ;
>>
>> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI > 0(6)>"?
>> The status here is:
>>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>
> Not all stmts in the SSA cycle forming the reduction have
> STMT_VINFO_REDUC_DEF set,
> only the last (latch def) and live stmts have at the moment.

Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:

  if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
  && vect_is_reduction (stmt_info))

to:

  if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
  && STMT_VINFO_LIVE_P (stmt_info)
  && vect_is_reduction (stmt_info))

instead of using a null check.

I see that vectorizable_reduction calculates a reduc_chain_length.
Would it be OK to store that in the stmt_vec_info?  I suppose the
AArch64 code should be multiplying by that as well.  (It would be a
separate patch from this one though.)

Richard


>
> Richard.
>
>> Thanks,
>> Hao
>>
>> 
>> From: Richard Sandiford 
>> Sent: Tuesday, July 25, 2023 17:44
>> To: Hao Liu OS
>> Cc: GCC-patches@gcc.gnu.org
>> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
>> multiplying count [PR110625]
>>
>> Hao Liu OS  writes:
>> > Hi,
>> >
>> > Thanks for the suggestion.  I tested it and found a gcc_assert failure:
>> > gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
>> > info_for_reduction, at tree-vect-loop.cc:5473)
>> >
>> > It is caused by empty STMT_VINFO_REDUC_DEF.
>>
>> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
>> we're not papering over an issue elsewhere.
>>
>> Thanks,
>> Richard
>>
>>   So, I added an extra check before checking single_defuse_cycle. The 
>> updated patch is below.  Is it OK for trunk?
>> >
>> > ---
>> >
>> > The new costs should only count reduction latency by multiplying count for
>> > single_defuse_cycle.  For other situations, this will increase the 
>> > reduction
>> > latency a lot and miss vectorization opportunities.
>> >
>> > Tested on aarch64-linux-gnu.
>> >
>> > gcc/ChangeLog:
>> >
>> >   PR target/110625
>> >   * config/aarch64/aarch64.cc (count_ops): Only '* count' for
>> >   single_defuse_cycle while counting 

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 2:26 PM Richard Biener
 wrote:
>
> On Wed, Jul 26, 2023 at 9:21 AM Tejas Belagod  wrote:
> >
> > On 7/17/23 5:46 PM, Richard Biener wrote:
> > > On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod  
> > > wrote:
> > >>
> > >> On 7/13/23 4:05 PM, Richard Biener wrote:
> > >>> On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  
> > >>> wrote:
> > 
> >  On 7/3/23 1:31 PM, Richard Biener wrote:
> > > On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  
> > > wrote:
> > >>
> > >> On 6/29/23 6:55 PM, Richard Biener wrote:
> > >>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod 
> > >>>  wrote:
> > 
> > 
> > 
> > 
> > 
> >  From: Richard Biener 
> >  Date: Tuesday, June 27, 2023 at 12:58 PM
> >  To: Tejas Belagod 
> >  Cc: gcc-patches@gcc.gnu.org 
> >  Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> > 
> >  On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod 
> >   wrote:
> > >
> > >
> > >
> > >
> > >
> > > From: Richard Biener 
> > > Date: Monday, June 26, 2023 at 2:23 PM
> > > To: Tejas Belagod 
> > > Cc: gcc-patches@gcc.gnu.org 
> > > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> > >
> > > On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> > >  wrote:
> > >>
> > >> Hi,
> > >>
> > >> Packed Boolean Vectors
> > >> --
> > >>
> > >> I'd like to propose a feature addition to GNU Vector extensions 
> > >> to add packed
> > >> boolean vectors (PBV).  This has been discussed in the past 
> > >> here[1] and a variant has
> > >> been implemented in Clang recently[2].
> > >>
> > >> With predication features being added to vector architectures 
> > >> (SVE, MVE, AVX),
> > >> it is a useful feature to have to model predication on targets.  
> > >> This could
> > >> find its use in intrinsics or just used as is as a GNU vector 
> > >> extension being
> > >> mapped to underlying target features.  For example, the packed 
> > >> boolean vector
> > >> could directly map to a predicate register on SVE.
> > >>
> > >> Also, this new packed boolean type GNU extension can be used 
> > >> with SVE ACLE
> > >> intrinsics to replace a fixed-length svbool_t.
> > >>
> > >> Here are a few options to represent the packed boolean vector 
> > >> type.
> > >
> > > The GIMPLE frontend uses a new 'vector_mask' attribute:
> > >
> > > typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > > typedef v8si v8sib __attribute__((vector_mask));
> > >
> > > it gets you a vector type that's the appropriate (dependent on 
> > > the
> > > target) vector
> > > mask type for the vector data type (v8si in this case).
> > >
> > >
> > >
> > > Thanks Richard.
> > >
> > > Having had a quick look at the implementation, it does seem to 
> > > tick the boxes.
> > >
> > > I must admit I haven't dug deep, but if the target hook allows 
> > > the mask to be
> > >
> > > defined in way that is target-friendly (and I don't know how much 
> > > effort it will
> > >
> > > be to migrate the attribute to more front-ends), it should do the 
> > > job nicely.
> > >
> > > Let me go back and dig a bit deeper and get back with questions 
> > > if any.
> > 
> > 
> >  Let me add that the advantage of this is the compiler doesn't need
> >  to support weird explicitely laid out packed boolean vectors that 
> >  do
> >  not match what the target supports and the user doesn't need to 
> >  know
> >  what the target supports (and thus have an #ifdef maze around 
> >  explicitely
> >  specified layouts).
> > 
> >  Sorry for the delayed response – I spent a day experimenting with 
> >  vector_mask.
> > 
> > 
> > 
> >  Yeah, this is what option 4 in the RFC is trying to achieve – be 
> >  portable enough
> > 
> >  to avoid having to sprinkle the code with ifdefs.
> > 
> > 
> >  It does remove some flexibility though, for example with -mavx512f 
> >  -mavx512vl
> >  you'll get AVX512 style masks for V4SImode data vectors but of 
> >  course the
> >  target still supports SSE2/AVX2 style masks as well, but those 
> >  would not be
> >  available as "packed boolean vectors", though they are of 

Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 9:21 AM Tejas Belagod  wrote:
>
> On 7/17/23 5:46 PM, Richard Biener wrote:
> > On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod  
> > wrote:
> >>
> >> On 7/13/23 4:05 PM, Richard Biener wrote:
> >>> On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  
> >>> wrote:
> 
>  On 7/3/23 1:31 PM, Richard Biener wrote:
> > On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  
> > wrote:
> >>
> >> On 6/29/23 6:55 PM, Richard Biener wrote:
> >>> On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  
> >>> wrote:
> 
> 
> 
> 
> 
>  From: Richard Biener 
>  Date: Tuesday, June 27, 2023 at 12:58 PM
>  To: Tejas Belagod 
>  Cc: gcc-patches@gcc.gnu.org 
>  Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> 
>  On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod 
>   wrote:
> >
> >
> >
> >
> >
> > From: Richard Biener 
> > Date: Monday, June 26, 2023 at 2:23 PM
> > To: Tejas Belagod 
> > Cc: gcc-patches@gcc.gnu.org 
> > Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
> >
> > On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> Packed Boolean Vectors
> >> --
> >>
> >> I'd like to propose a feature addition to GNU Vector extensions to 
> >> add packed
> >> boolean vectors (PBV).  This has been discussed in the past 
> >> here[1] and a variant has
> >> been implemented in Clang recently[2].
> >>
> >> With predication features being added to vector architectures 
> >> (SVE, MVE, AVX),
> >> it is a useful feature to have to model predication on targets.  
> >> This could
> >> find its use in intrinsics or just used as is as a GNU vector 
> >> extension being
> >> mapped to underlying target features.  For example, the packed 
> >> boolean vector
> >> could directly map to a predicate register on SVE.
> >>
> >> Also, this new packed boolean type GNU extension can be used with 
> >> SVE ACLE
> >> intrinsics to replace a fixed-length svbool_t.
> >>
> >> Here are a few options to represent the packed boolean vector type.
> >
> > The GIMPLE frontend uses a new 'vector_mask' attribute:
> >
> > typedef int v8si __attribute__((vector_size(8*sizeof(int))));
> > typedef v8si v8sib __attribute__((vector_mask));
> >
> > it gets you a vector type that's the appropriate (dependent on the
> > target) vector
> > mask type for the vector data type (v8si in this case).
> >
> >
> >
> > Thanks Richard.
> >
> > Having had a quick look at the implementation, it does seem to tick 
> > the boxes.
> >
> > I must admit I haven't dug deep, but if the target hook allows the 
> > mask to be
> >
> > defined in way that is target-friendly (and I don't know how much 
> > effort it will
> >
> > be to migrate the attribute to more front-ends), it should do the 
> > job nicely.
> >
> > Let me go back and dig a bit deeper and get back with questions if 
> > any.
> 
> 
>  Let me add that the advantage of this is the compiler doesn't need
>  to support weird explicitly laid out packed boolean vectors that do
>  not match what the target supports and the user doesn't need to know
>  what the target supports (and thus have an #ifdef maze around 
>  explicitly
>  specified layouts).
> 
>  Sorry for the delayed response – I spent a day experimenting with 
>  vector_mask.
> 
> 
> 
>  Yeah, this is what option 4 in the RFC is trying to achieve – be 
>  portable enough
> 
>  to avoid having to sprinkle the code with ifdefs.
> 
> 
>  It does remove some flexibility though, for example with -mavx512f 
>  -mavx512vl
>  you'll get AVX512 style masks for V4SImode data vectors but of 
>  course the
>  target still supports SSE2/AVX2 style masks as well, but those would 
>  not be
>  available as "packed boolean vectors", though they are of course in 
>  fact
>  equal to V4SImode data vectors with -1 or 0 values, so in this 
>  particular
>  case it might not matter.
> 
>  That said, the vector_mask attribute will get you V4SImode vectors 
>  with
>  signed boolean elements of 32 bits for V4SImode data vectors with
>  SSE2/AVX2.
> 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 12:12 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
> >> >  wrote:
> >> >>
> >> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that 
> >> >> > we're not papering over an issue elsewhere.
> >> >>
> >> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  
> >> >> Below is the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
> >> >>
> >> >>   :
> >> >>   # res_18 = PHI 
> >> >>   # i_20 = PHI 
> >> >>   _1 = (long unsigned int) i_20;
> >> >>   _2 = _1 * 2;
> >> >>   _3 = x_14(D) + _2;
> >> >>   _4 = *_3;
> >> >>   _5 = (unsigned short) _4;
> >> >>   res.0_6 = (unsigned short) res_18;
> >> >>   _7 = _5 + res.0_6; <-- The current 
> >> >> stmt_info
> >> >>   res_15 = (short int) _7;
> >> >>   i_16 = i_20 + 1;
> >> >>   if (n_11(D) > i_16)
> >> >> goto ;
> >> >>   else
> >> >> goto ;
> >> >>
> >> >>   :
> >> >>   goto ;
> >> >>
> >> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI 
> >> >> "?
> >> >> The status here is:
> >> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
> >> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
> >> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
> >> >
> >> > Not all stmts in the SSA cycle forming the reduction have
> >> > STMT_VINFO_REDUC_DEF set,
> >> > only the last (latch def) and live stmts have at the moment.
> >>
> >> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
> >>
> >>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
> >>   && vect_is_reduction (stmt_info))
> >>
> >> to:
> >>
> >>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
> >>   && STMT_VINFO_LIVE_P (stmt_info)
> >>   && vect_is_reduction (stmt_info))
> >>
> >> instead of using a null check.
> >
> > But as seen you will miss stmts that are part of the reduction then?
>
> Yeah, but the code is doing a maximum of all the reductions in the loop:
>
>   /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
>  that's not yet the case.  */
>   ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>
> So as it stands, we only need to see each reduction (as opposed to each
> reduction statement) once.
>
> But we do want to know the length of each reduction...

Yeah, it really shows that the current costing infrastructure is far from
perfect here ...

We could pass you the stmt_info for the reduction PHI (aka the
info_for_reduction)
once with a special kind, vect_reduction, so you could walk relevant stmts
from that.  You get a number of add_stmts for each reduction, that could
be a hint of the length as well ...  (but I think we always pass the 'main'
stmt_info here)

> > In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> > as well.  See vectorizable_reduction in the
> >
> >   while (reduc_def != PHI_RESULT (reduc_def_phi))
> >
> > loop.
>
> Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.
>
> >> I see that vectorizable_reduction calculates a reduc_chain_length.
> >> Would it be OK to store that in the stmt_vec_info?  I suppose the
> >> AArch64 code should be multiplying by that as well.  (It would be a
> >> separate patch from this one though.)
> >
> > I don't think that's too relevant here (it also counts noop conversions).
>
> Bah.  I'm loath to copy that loop and just pick out the relevant
> statements though.
>
> I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
> maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
> then take the maximum of those total latencies.  It sounds pretty
> complex though...

Ick.  Yeah.  I think most of the costing is still nearly GIGO, I hardly see any
loop vectorization disqualified because of cost (happens for BB SLP though).

Richard.

> Thanks,
> Richard
>


[PATCH] PR rtl-optimization/110701: Fix SUBREG SET_DEST handling in combine.

2023-07-26 Thread Roger Sayle

This patch is my proposed fix to PR rtl-optimization 110701, a latent bug
in combine's record_dead_and_set_regs_1 exposed by recent improvements to
simplify_subreg.

The issue involves the handling of (normal) SUBREG SET_DESTs as in the
instruction:

(set (subreg:HI (reg:SI x) 0) (expr:HI y))

The semantics of this are that the bits specified by the SUBREG are set
to the SET_SRC, y, and that the other bits of the SET_DEST are left/become
undefined.  To simplify the explanation, we'll only consider lowpart SUBREGs
(though in theory non-lowpart SUBREGs could be handled), and ignore the fact
that bits outside of the lowpart WORD retain their original values (treating
these as undefined is a missed optimization rather than an incorrect-code
bug, and only affects targets with words narrower than 64 bits).

The bug is that combine simulates the behaviour of the above instruction,
for calculating nonzero_bits and set_sign_bit_copies, in the function
record_value_for_reg, by using the equivalent of:

(set (reg:SI x) (subreg:SI (expr:HI y)))

by calling gen_lowpart on the SET_SRC.  Alas, the semantics of this
revised instruction aren't always equivalent to the original.

In the test case for PR110701, the original instruction

(set (subreg:HI (reg:SI x) 0)
 (and:HI (subreg:HI (reg:SI y) 0)
 (const_int 340)))

which (by definition) leaves the top bits of x undefined, is mistakenly
considered to be equivalent to

(set (reg:SI x) (and:SI (reg:SI y) (const_int 340)))

where gen_lowpart's freedom to do anything with paradoxical SUBREG bits,
has now cleared the high bits.  The same bug also triggers when the
SET_SRC is say (subreg:HI (reg:DI z)), where gen_lowpart transforms
this into (subreg:SI (reg:DI z)) which defines bits 16-31 to be the
same as bits 16-31 of z.

The fix is that after calling record_value_for_reg, we need to mark
the bits that should be undefined as undefined, in case gen_lowpart,
which performs transforms appropriate for r-values, has changed the
interpretation of the SUBREG when used as an l-value.
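
As a rough illustration of the difference, here is a toy model of the
nonzero_bits bookkeeping for a 32-bit register written through its HImode
lowpart.  This is a simplified sketch under my own assumptions; the
function names are invented and are not taken from combine.cc:

```c
#include <stdint.h>

/* Toy model of nonzero_bits tracking for a 32-bit register: a set bit
   in the mask means "this bit may be nonzero".  */

/* Mask recorded when the lowpart store is treated like a full-register
   store of the widened source: the upper bits are (wrongly) assumed to
   be zero.  */
static uint32_t
nonzero_bits_as_full_set (uint32_t src_nonzero_bits)
{
  return src_nonzero_bits & 0xffffu;
}

/* Mask recorded when the bits outside the written HImode lowpart are
   treated as undefined, i.e. possibly nonzero.  */
static uint32_t
nonzero_bits_as_lowpart_set (uint32_t src_nonzero_bits)
{
  return (src_nonzero_bits & 0xffffu) | 0xffff0000u;
}
```

For the (const_int 340) mask in the example above, the first variant
claims that only bits within 0x154 can be set, whereas the second leaves
bits 16-31 as possibly nonzero.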


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

I've a version of this patch that preserves the original bits outside
of the lowpart WORD that can be submitted as a follow-up, but this is
the piece that addresses the wrong code regression.


2023-07-26  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/110701
* combine.cc (record_dead_and_set_regs_1): Split comment into
pieces placed before the relevant clauses.  When the SET_DEST
is a partial_subreg_p, mark the bits outside of the updated
portion of the destination as undefined.

gcc/testsuite/ChangeLog
PR rtl-optimization/110701
* gcc.target/i386/pr110701.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 4bf867d..c5ebb78 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -13337,27 +13337,43 @@ record_dead_and_set_regs_1 (rtx dest, const_rtx setter, void *data)
 
   if (REG_P (dest))
 {
-  /* If we are setting the whole register, we know its value.  Otherwise
-show that we don't know the value.  We can handle a SUBREG if it's
-the low part, but we must be careful with paradoxical SUBREGs on
-RISC architectures because we cannot strip e.g. an extension around
-a load and record the naked load since the RTL middle-end considers
-that the upper bits are defined according to LOAD_EXTEND_OP.  */
+  /* If we are setting the whole register, we know its value.  */
   if (GET_CODE (setter) == SET && dest == SET_DEST (setter))
record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
+  /* We can handle a SUBREG if it's the low part, but we must be
+careful with paradoxical SUBREGs on RISC architectures because
+we cannot strip e.g. an extension around a load and record the
+naked load since the RTL middle-end considers that the upper bits
+are defined according to LOAD_EXTEND_OP.  */
   else if (GET_CODE (setter) == SET
   && GET_CODE (SET_DEST (setter)) == SUBREG
   && SUBREG_REG (SET_DEST (setter)) == dest
   && known_le (GET_MODE_PRECISION (GET_MODE (dest)),
BITS_PER_WORD)
   && subreg_lowpart_p (SET_DEST (setter)))
-   record_value_for_reg (dest, record_dead_insn,
- WORD_REGISTER_OPERATIONS
- && word_register_operation_p (SET_SRC (setter))
- && paradoxical_subreg_p (SET_DEST (setter))
- ? SET_SRC (setter)
- : gen_lowpart (GET_MODE (dest),
-SET_SRC (setter)));
+   {
+ if (WORD_REGISTER_OPERATIONS
+ 

[COMMITTED] [range-ops] Remove special case for handling bitmasks in casts.

2023-07-26 Thread Aldy Hernandez via Gcc-patches
Now that we can generically handle bitmasks for unary operators,
there's no need to special case them.

gcc/ChangeLog:

* range-op-mixed.h (class operator_cast): Add update_bitmask.
* range-op.cc (operator_cast::update_bitmask): New.
(operator_cast::fold_range): Call update_bitmask.
---
 gcc/range-op-mixed.h |  2 ++
 gcc/range-op.cc  | 23 ---
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 6944742ecbc..91a4fcc3989 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -346,6 +346,8 @@ public:
   relation_kind lhs_op1_relation (const irange ,
  const irange , const irange ,
  relation_kind) const final override;
+  void update_bitmask (irange , const irange ,
+  const irange ) const final override;
 private:
   bool truncating_cast_p (const irange , const irange ) const;
   bool inside_domain_p (const wide_int , const wide_int ,
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6b5d4f2accd..be8f8c48d7c 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2867,24 +2867,17 @@ operator_cast::fold_range (irange , tree type ATTRIBUTE_UNUSED,
return true;
 }
 
-  // Update the bitmask.  Truncating casts are problematic unless
-  // the conversion fits in the resulting outer type.
-  irange_bitmask bm = inner.get_bitmask ();
-  if (truncating_cast_p (inner, outer)
-  && wi::rshift (bm.mask (),
-wi::uhwi (TYPE_PRECISION (outer.type ()),
-  TYPE_PRECISION (inner.type ())),
-TYPE_SIGN (inner.type ())) != 0)
-return true;
-  unsigned prec = TYPE_PRECISION (type);
-  signop sign = TYPE_SIGN (inner.type ());
-  bm = irange_bitmask (wide_int::from (bm.value (), prec, sign),
-  wide_int::from (bm.mask (), prec, sign));
-  r.update_bitmask (bm);
-
+  update_bitmask (r, inner, outer);
   return true;
 }
 
+void
+operator_cast::update_bitmask (irange , const irange ,
+  const irange ) const
+{
+  update_known_bitmask (r, CONVERT_EXPR, lh, rh);
+}
+
 bool
 operator_cast::op1_range (irange , tree type,
  const irange ,
-- 
2.41.0



RE: [PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Juzhe.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, July 26, 2023 4:25 PM
To: juzhe.zh...@rivai.ai
Cc: Li Xu ; gcc-patches ; 
palmer 
Subject: Re: [PATCH] RISC-V: Fix vector tuple intrinsic

OK, thanks

On Wed, Jul 26, 2023 at 4:22 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM from my side.
>
> It should be V3 though, never mind.
> No need to send V3 again.
>
> Give Kito a chance to chime in with more comments.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Li Xu
> Date: 2023-07-26 16:18
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; Li Xu
> Subject: [PATCH] RISC-V: Fix vector tuple intrinsic
> Consider the following case:
> void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
> vint32mf2x3_t v_tuple, size_t vl) {
>   return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
> }
>
> Compiler failed with:
> test.c:19:1: internal compiler error: in vl_vtype_info, at 
> config/riscv/riscv-vsetvl.cc:1679
>19 | }
>   | ^
> 0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
> unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
> 0x143f788 get_vl_vtype_info
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
> 0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
> 0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
> 0x14407ee pass_vsetvl::init()
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
> 0x14471cf pass_vsetvl::execute(function*)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
> scalar type to float16, eliminate warning.
> (vfloat16mf4x3_t): Ditto.
> (vfloat16mf4x4_t): Ditto.
> (vfloat16mf4x5_t): Ditto.
> (vfloat16mf4x6_t): Ditto.
> (vfloat16mf4x7_t): Ditto.
> (vfloat16mf4x8_t): Ditto.
> (vfloat16mf2x2_t): Ditto.
> (vfloat16mf2x3_t): Ditto.
> (vfloat16mf2x4_t): Ditto.
> (vfloat16mf2x5_t): Ditto.
> (vfloat16mf2x6_t): Ditto.
> (vfloat16mf2x7_t): Ditto.
> (vfloat16mf2x8_t): Ditto.
> (vfloat16m1x2_t): Ditto.
> (vfloat16m1x3_t): Ditto.
> (vfloat16m1x4_t): Ditto.
> (vfloat16m1x5_t): Ditto.
> (vfloat16m1x6_t): Ditto.
> (vfloat16m1x7_t): Ditto.
> (vfloat16m1x8_t): Ditto.
> (vfloat16m2x2_t): Ditto.
> (vfloat16m2x3_t): Ditto.
> (vfloat16m2x4_t): Ditto.
> (vfloat16m4x2_t): Ditto.
> * config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
> * config/riscv/vector.md: add tuple mode in attr sew.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
> ---
> gcc/config/riscv/riscv-vector-builtins.def| 50 +--
> gcc/config/riscv/vector-iterators.md  |  1 +
> gcc/config/riscv/vector.md|  1 +
> .../riscv/rvv/base/tuple-intrinsic.c  | 23 +
> 4 files changed, 50 insertions(+), 25 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
> b/gcc/config/riscv/riscv-vector-builtins.def
> index 0e49480703b..6661629aad8 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.def
> +++ b/gcc/config/riscv/riscv-vector-builtins.def
> @@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, 
> uint64, RVVM8DI, _u64m8, _u64,
> DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, 
> _f16mf4,
>   _f16, _e16mf4)
> /* Define tuple types for SEW = 16, LMUL = MF4. */
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, 
> vfloat16mf4_t, float, 2, _f16mf4x2)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, 
> vfloat16mf4_t, float, 3, _f16mf4x3)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, 
> vfloat16mf4_t, float, 4, _f16mf4x4)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, 
> vfloat16mf4_t, float, 5, _f16mf4x5)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, 
> vfloat16mf4_t, float, 6, _f16mf4x6)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, 
> vfloat16mf4_t, float, 7, _f16mf4x7)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, 
> vfloat16mf4_t, float, 8, _f16mf4x8)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, 
> vfloat16mf4_t, float16, 2, _f16mf4x2)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, 
> vfloat16mf4_t, float16, 3, _f16mf4x3)
> +DEF_RVV_TUPLE_TYPE 

[committed] i386: Clear upper half of XMM register for V2SFmode operations [PR110762]

2023-07-26 Thread Uros Bizjak via Gcc-patches
Clear the upper half of a V4SFmode operand register in front of all
potentially trapping instructions. The testcase:

--cut here--
typedef float v2sf __attribute__((vector_size(8)));
typedef float v4sf __attribute__((vector_size(16)));

v2sf test(v4sf x, v4sf y)
{
  v2sf x2, y2;

  x2 = __builtin_shufflevector (x, x, 0, 1);
  y2 = __builtin_shufflevector (y, y, 0, 1);

  return x2 + y2;
}
--cut here--

now compiles to:

movq%xmm1, %xmm1# 9 [c=4 l=4]  *vec_concatv4sf_0
movq%xmm0, %xmm0# 10[c=4 l=4]  *vec_concatv4sf_0
addps   %xmm1, %xmm0# 11[c=12 l=3]  *addv4sf3/0

This approach addresses issues with exceptions, as well as issues with
denormal/invalid values. An obvious exception to the rule is division,
where a value != 0.0 should be loaded into the upper half of the
denominator to avoid a division-by-zero exception.
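
To make the division case concrete, here is a minimal sketch in plain C.
It is not the code GCC emits: it only emulates a 2-lane division carried
out in a 4-lane register. The helper names div_v2sf_as_v4sf and is_nan are
hypothetical, and 1.0f is just one possible padding value != 0.0.

```c
#include <assert.h>

/* Hypothetical helper: emulate a 2-lane float division performed in a
   4-lane register.  The numerator's two upper lanes are cleared to 0.0f;
   PAD is the value placed in the two upper lanes of the denominator.  */
void
div_v2sf_as_v4sf (const float a[2], const float b[2], float pad, float out[4])
{
  assert (a && b && out);

  /* volatile keeps the compiler from folding the divisions away.  */
  volatile float num[4] = { a[0], a[1], 0.0f, 0.0f };
  volatile float den[4] = { b[0], b[1], pad, pad };

  for (int i = 0; i < 4; i++)
    out[i] = num[i] / den[i];
}

/* A NaN is the only floating-point value that compares unequal to itself.  */
int
is_nan (float x)
{
  volatile float v = x;
  return v != v;
}
```

With the numerator's upper lanes cleared, padding the denominator with 0.0f
makes lanes 2 and 3 evaluate 0.0/0.0, which yields NaN and would raise a
floating-point exception on real hardware. Padding with 1.0f gives a quiet
0.0 in those lanes, while lanes 0 and 1 are unaffected either way.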

The patch effectively tightens the solution from PR95046 by clearing upper
halves of all operand registers before every potentially trapping instruction.
The testcase:

--cut here--
typedef float __attribute__((vector_size(8))) v2sf;

v2sf test (v2sf a, v2sf b, v2sf c)
{
  return a * b - c;
}
--cut here--

compiles to:

movq%xmm1, %xmm1# 8 [c=4 l=4]  *vec_concatv4sf_0
movq%xmm0, %xmm0# 9 [c=4 l=4]  *vec_concatv4sf_0
movq%xmm2, %xmm2# 12[c=4 l=4]  *vec_concatv4sf_0
mulps   %xmm1, %xmm0# 10[c=16 l=3]  *mulv4sf3/0
movq%xmm0, %xmm0# 13[c=4 l=4]  *vec_concatv4sf_0
subps   %xmm2, %xmm0# 14[c=12 l=3]  *subv4sf3/0

The implementation emits V4SFmode operations, so we can remove all "emulated"
SSE2 V2SFmode trapping instructions and remove "emulated" SSE2 V2SFmode
alternatives from 3dNOW! insn patterns.

PR target/110762

gcc/ChangeLog:

* config/i386/i386.md (plusminusmult): New code iterator.
* config/i386/mmx.md (mmxdoublevecmode): New mode attribute.
(movq__to_sse): New expander.
(v2sf3): Macroize expander from addv2sf3,
subv2sf3 and mulv2sf3 using plusminusmult code iterator.  Rewrite
as a wrapper around V4SFmode operation.
(mmx_addv2sf3): Change operand 1 and operand 2 predicates to
nonimmediate_operand.
(*mmx_addv2sf3): Remove SSE alternatives.  Change operand 1 and
operand 2 predicates to nonimmediate_operand.
(mmx_subv2sf3): Change operand 2 predicate to nonimmediate_operand.
(mmx_subrv2sf3): Change operand 1 predicate to nonimmediate_operand.
(*mmx_subv2sf3): Remove SSE alternatives.  Change operand 1 and
operand 2 predicates to nonimmediate_operand.
(mmx_mulv2sf3): Change operand 1 and operand 2 predicates to
nonimmediate_operand.
(*mmx_mulv2sf3): Remove SSE alternatives.  Change operand 1 and
operand 2 predicates to nonimmediate_operand.
(divv2sf3): Rewrite as a wrapper around V4SFmode operation.
(v2sf3): Ditto.
(mmx_v2sf3): Change operand 1 and operand 2
predicates to nonimmediate_operand.
(*mmx_v2sf3): Remove SSE alternatives.  Change
operand 1 and operand 2 predicates to nonimmediate_operand.
(mmx_ieee_v2sf3): Ditto.
(sqrtv2sf2): Rewrite as a wrapper around V4SFmode operation.
(*mmx_haddv2sf3_low): Ditto.
(*mmx_hsubv2sf3_low): Ditto.
(vec_addsubv2sf3): Ditto.
(*mmx_maskcmpv2sf3_comm): Remove.
(*mmx_maskcmpv2sf3): Remove.
(vec_cmpv2sfv2si): Rewrite as a wrapper around V4SFmode operation.
(vcondv2sf): Ditto.
(fmav2sf4): Ditto.
(fmsv2sf4): Ditto.
(fnmav2sf4): Ditto.
(fnmsv2sf4): Ditto.
(fix_truncv2sfv2si2): Ditto.
(fixuns_truncv2sfv2si2): Ditto.
(mmx_fix_truncv2sfv2si2): Remove SSE alternatives.
Change operand 1 predicate to nonimmediate_operand.
(floatv2siv2sf2): Rewrite as a wrapper around V4SFmode operation.
(floatunsv2siv2sf2): Ditto.
(mmx_floatv2siv2sf2): Remove SSE alternatives.
Change operand 1 predicate to nonimmediate_operand.
(nearbyintv2sf2): Rewrite as a wrapper around V4SFmode operation.
(rintv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.
(*mmx_roundv2sf2): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110762.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4db210cc795..cedba3b90f0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -933,6 +933,7 @@ (define_asm_attributes
(set_attr "type" "multi")])
 
 (define_code_iterator plusminus [plus minus])
+(define_code_iterator plusminusmult [plus minus mult])
 (define_code_iterator plusminusmultdiv [plus minus mult div])
 
 (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus])
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>> >  wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that 
>> >> > we're not papering over an issue elsewhere.
>> >>
>> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below 
>> >> is the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>> >>
>> >>   :
>> >>   # res_18 = PHI 
>> >>   # i_20 = PHI 
>> >>   _1 = (long unsigned int) i_20;
>> >>   _2 = _1 * 2;
>> >>   _3 = x_14(D) + _2;
>> >>   _4 = *_3;
>> >>   _5 = (unsigned short) _4;
>> >>   res.0_6 = (unsigned short) res_18;
>> >>   _7 = _5 + res.0_6; <-- The current stmt_info
>> >>   res_15 = (short int) _7;
>> >>   i_16 = i_20 + 1;
>> >>   if (n_11(D) > i_16)
>> >> goto ;
>> >>   else
>> >> goto ;
>> >>
>> >>   :
>> >>   goto ;
>> >>
>> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI 
>> >> "?
>> >> The status here is:
>> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>> >
>> > Not all stmts in the SSA cycle forming the reduction have
>> > STMT_VINFO_REDUC_DEF set,
>> > only the last (latch def) and live stmts have at the moment.
>>
>> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && vect_is_reduction (stmt_info))
>>
>> to:
>>
>>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>>   && STMT_VINFO_LIVE_P (stmt_info)
>>   && vect_is_reduction (stmt_info))
>>
>> instead of using a null check.
>
> But as seen you will miss stmts that are part of the reduction then?

Yeah, but the code is doing a maximum of all the reductions in the loop:

  /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
 that's not yet the case.  */
  ops->reduction_latency = MAX (ops->reduction_latency, base * count);

So as it stands, we only need to see each reduction (as opposed to each
reduction statement) once.

But we do want to know the length of each reduction...

> In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> as well.  See vectorizable_reduction in the
>
>   while (reduc_def != PHI_RESULT (reduc_def_phi))
>
> loop.

Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.

>> I see that vectorizable_reduction calculates a reduc_chain_length.
>> Would it be OK to store that in the stmt_vec_info?  I suppose the
>> AArch64 code should be multiplying by that as well.  (It would be a
>> separate patch from this one though.)
>
> I don't think that's too relevant here (it also counts noop conversions).

Bah.  I'm loath to copy that loop and just pick out the relevant
statements though.

I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
then take the maximum of those total latencies.  It sounds pretty
complex though...
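
A minimal sketch of that idea, assuming every statement could be mapped to
an identifier for its reduction (standing in for STMT_VINFO_REDUC_DEF). The
struct, the small fixed-size table used in place of a real hash map, and the
function name are all hypothetical and not part of GCC:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a costed statement: REDUC_ID identifies the
   reduction the statement belongs to, LATENCY is its contribution to that
   reduction's chain.  */
struct costed_stmt
{
  int reduc_id;
  unsigned int latency;
};

enum { MAX_REDUCTIONS = 16 };

/* Sum the latencies per reduction, then return the largest total.
   A linear-probe table replaces the hash map to keep the sketch short.  */
unsigned int
max_total_reduction_latency (const struct costed_stmt *stmts, size_t n)
{
  int ids[MAX_REDUCTIONS];
  unsigned int totals[MAX_REDUCTIONS];
  size_t used = 0;
  unsigned int best = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Find the slot for this reduction, creating it on first sight.  */
      size_t slot = 0;
      while (slot < used && ids[slot] != stmts[i].reduc_id)
        slot++;
      if (slot == used)
        {
          assert (used < MAX_REDUCTIONS);
          ids[used] = stmts[i].reduc_id;
          totals[used] = 0;
          used++;
        }

      /* Totals only grow, so the running maximum can be updated here.  */
      totals[slot] += stmts[i].latency;
      if (totals[slot] > best)
        best = totals[slot];
    }
  return best;
}
```

The difference from the current MAX (ops->reduction_latency, base * count)
is that statements of the same reduction add up before the maximum is
taken, so the length of each reduction chain is reflected in the result.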

Thanks,
Richard



[COMMITTED] bpf: fix generation of neg and neg32 BPF instructions

2023-07-26 Thread Jose E. Marchesi via Gcc-patches
This patch fixes GCC to generate correct neg and neg32 instructions,
which do not take a source register operand.  A couple of new tests
are added.

Tested in bpf-unknown-none.

gcc/ChangeLog

2023-07-26  Jose E. Marchesi  

* config/bpf/bpf.md: Fix neg{SI,DI}2 insn.

gcc/testsuite/ChangeLog

2023-07-26  Jose E. Marchesi  

* gcc.target/bpf/neg-1.c: New test.
* gcc.target/bpf/neg-pseudoc-1.c: Likewise.
---
 gcc/config/bpf/bpf.md|  4 ++--
 gcc/testsuite/gcc.target/bpf/neg-1.c | 14 ++
 gcc/testsuite/gcc.target/bpf/neg-pseudoc-1.c | 14 ++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/neg-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/neg-pseudoc-1.c

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 579a8213b09..1b5e1900d4f 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -142,9 +142,9 @@ (define_insn "sub3"
 ;;; Negation
 (define_insn "neg2"
   [(set (match_operand:AM 0 "register_operand"   "=r,r")
-(neg:AM (match_operand:AM 1 "reg_or_imm_operand" " r,I")))]
+(neg:AM (match_operand:AM 1 "reg_or_imm_operand" " 0,I")))]
   ""
-  "{neg\t%0,%1|%w0 = -%w1}"
+  "{neg\t%0|%w0 = -%w1}"
   [(set_attr "type" "")])
 
 ;;; Multiplication
diff --git a/gcc/testsuite/gcc.target/bpf/neg-1.c 
b/gcc/testsuite/gcc.target/bpf/neg-1.c
new file mode 100644
index 000..9ffb956859d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/neg-1.c
@@ -0,0 +1,14 @@
+/* Check negr and negr32 instructions.  */
+
+/* { dg-do compile } */
+/* { dg-options "-malu32" } */
+
+long foo (long a, long b, int x, int y)
+{
+  a = -b;
+  x = -y;
+  return a + x;
+}
+
+/* { dg-final { scan-assembler "neg\t%r.\n" } } */
+/* { dg-final { scan-assembler "neg32\t%r.\n" } } */
diff --git a/gcc/testsuite/gcc.target/bpf/neg-pseudoc-1.c 
b/gcc/testsuite/gcc.target/bpf/neg-pseudoc-1.c
new file mode 100644
index 000..a4fb687f04a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/neg-pseudoc-1.c
@@ -0,0 +1,14 @@
+/* Check negr and negr32 instructions (pseudoc asm dialect.)  */
+
+/* { dg-do compile } */
+/* { dg-options "-malu32 -masm=pseudoc" } */
+
+long foo (long a, long b, int x, int y)
+{
+  a = -b;
+  x = -y;
+  return a + x;
+}
+
+/* { dg-final { scan-assembler {\t(r.) = -\1\n} } } */
+/* { dg-final { scan-assembler {\t(w.) = -\1\n} } } */
-- 
2.30.2



Re: [PATCH] vect: Treat VMAT_ELEMENTWISE as scalar load in costing [PR110776]

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 4:52 AM Kewen.Lin  wrote:
>
> Hi,
>
> PR110776 exposes one issue that we could query unaligned
> load for vector type but actually no unaligned vector load
> is supported there.  The reason is that the costed load is
> with single-lane vector type and its memory access type is
> VMAT_ELEMENTWISE, we actually take it as scalar load and
> set its alignment_support_scheme as dr_unaligned_supported.
>
> To avoid the ICE as exposed, following Rich's suggestion,
> this patch is to make VMAT_ELEMENTWISE be costed as scalar
> load.
>
> Bootstrapped and regress-tested on x86_64-redhat-linux,
> powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

OK.

> BR,
> Kewen
> -
>
> Co-authored-by: Richard Biener 
>
> PR tree-optimization/110776
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_load): Always cost VMAT_ELEMENTWISE
> as scalar load.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr110776.c: New test.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr110776.c | 22 +
>  gcc/tree-vect-stmts.cc  |  5 -
>  2 files changed, 26 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr110776.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110776.c 
> b/gcc/testsuite/gcc.target/powerpc/pr110776.c
> new file mode 100644
> index 000..749159fd675
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110776.c
> @@ -0,0 +1,22 @@
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power6 -maltivec" } */
> +
> +/* Verify there is no ICE.  */
> +
> +int a;
> +long *b;
> +int
> +c ()
> +{
> +  long e;
> +  int d = 0;
> +  for (long f; f; f++)
> +{
> +  e = b[f * a];
> +  if (e)
> +   d = 1;
> +}
> +  if (d)
> +for (;;)
> +  ;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index ed28fbdced3..09705200594 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9840,7 +9840,10 @@ vectorizable_load (vec_info *vinfo,
> {
>   if (costing_p)
> {
> - if (VECTOR_TYPE_P (ltype))
> + /* For VMAT_ELEMENTWISE, just cost it as scalar_load to
> +avoid ICE, see PR110776.  */
> + if (VECTOR_TYPE_P (ltype)
> + && memory_access_type != VMAT_ELEMENTWISE)
> vect_get_load_cost (vinfo, stmt_info, 1,
> alignment_support_scheme, 
> misalignment,
> false, _cost, nullptr, 
> cost_vec,
> --
> 2.39.1


Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
> >  wrote:
> >>
> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that 
> >> > we're not papering over an issue elsewhere.
> >>
> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below 
> >> is the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
> >>
> >>   :
> >>   # res_18 = PHI 
> >>   # i_20 = PHI 
> >>   _1 = (long unsigned int) i_20;
> >>   _2 = _1 * 2;
> >>   _3 = x_14(D) + _2;
> >>   _4 = *_3;
> >>   _5 = (unsigned short) _4;
> >>   res.0_6 = (unsigned short) res_18;
> >>   _7 = _5 + res.0_6; <-- The current stmt_info
> >>   res_15 = (short int) _7;
> >>   i_16 = i_20 + 1;
> >>   if (n_11(D) > i_16)
> >> goto ;
> >>   else
> >> goto ;
> >>
> >>   :
> >>   goto ;
> >>
> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI 
> >> "?
> >> The status here is:
> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
> >
> > Not all stmts in the SSA cycle forming the reduction have
> > STMT_VINFO_REDUC_DEF set,
> > only the last (latch def) and live stmts have at the moment.
>
> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && vect_is_reduction (stmt_info))
>
> to:
>
>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>   && STMT_VINFO_LIVE_P (stmt_info)
>   && vect_is_reduction (stmt_info))
>
> instead of using a null check.

But as seen you will miss stmts that are part of the reduction then?
In principle we could put STMT_VINFO_REDUC_DEF to other stmts
as well.  See vectorizable_reduction in the

  while (reduc_def != PHI_RESULT (reduc_def_phi))

loop.

> I see that vectorizable_reduction calculates a reduc_chain_length.
> Would it be OK to store that in the stmt_vec_info?  I suppose the
> AArch64 code should be multiplying by that as well.  (It would be a
> separate patch from this one though.)

I don't think that's too relevant here (it also counts noop conversions).

Richard.

>
> Richard
>
>
> >
> > Richard.
> >
> >> Thanks,
> >> Hao
> >>
> >> 
> >> From: Richard Sandiford 
> >> Sent: Tuesday, July 25, 2023 17:44
> >> To: Hao Liu OS
> >> Cc: GCC-patches@gcc.gnu.org
> >> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency 
> >> by multiplying count [PR110625]
> >>
> >> Hao Liu OS  writes:
> >> > Hi,
> >> >
> >> > Thanks for the suggestion.  I tested it and found a gcc_assert failure:
> >> > gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
> >> > info_for_reduction, at tree-vect-loop.cc:5473)
> >> >
> >> > It is caused by empty STMT_VINFO_REDUC_DEF.
> >>
> >> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
> >> we're not papering over an issue elsewhere.
> >>
> >> Thanks,
> >> Richard
> >>
> >>   So, I added an extra check before checking single_defuse_cycle. The 
> >> updated patch is below.  Is it OK for trunk?
> >> >
> >> > ---
> >> >
> >> > The new costs should only count reduction latency by multiplying count 
> >> > for
> >> > single_defuse_cycle.  For other situations, this will increase the 
> >> > reduction
> >> > latency a lot and miss vectorization opportunities.
> >> >
> >> > Tested on aarch64-linux-gnu.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >   PR target/110625
> >> >   * config/aarch64/aarch64.cc (count_ops): Only '* count' for
> >> >   single_defuse_cycle while counting reduction_latency.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >   * gcc.target/aarch64/pr110625_1.c: New testcase.
> >> >   * gcc.target/aarch64/pr110625_2.c: New testcase.
> >> > ---
> >> >  gcc/config/aarch64/aarch64.cc | 13 --
> >> >  gcc/testsuite/gcc.target/aarch64/pr110625_1.c | 46 +++
> >> >  gcc/testsuite/gcc.target/aarch64/pr110625_2.c | 14 ++
> >> >  3 files changed, 69 insertions(+), 4 deletions(-)
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> >> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_2.c
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
> >> > b/gcc/config/aarch64/aarch64.cc
> >> > index 560e5431636..478a4e00110 100644
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -16788,10 +16788,15 @@ aarch64_vector_costs::count_ops (unsigned int 
> >> > count, vect_cost_for_stmt kind,
> >> >  {
> >> >unsigned int base
> >> >   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, 
> >> > m_vec_flags);
> >> > -
> >> > -  /* ??? Ideally we'd do COUNT reductions in parallel, but 
> >> > unfortunately
> >> > -  that's not yet 

[committed] libgomp.texi: Add status item, @ref and document omp_in_explicit_task

2023-07-26 Thread Tobias Burnus

I stumbled recently over:
(a) 'omp target defaultmap(to: all)' - now added to the impl.status (as 'N')
(b) a @code that should have been @ref in our .texi
(c) while I was there, I completed the 'Tasking Routines'
section by adding the missing, third routine omp_in_explicit_task.
(Documentation is still missing for about 20 already implemented routines.)

Committed as Rev. r14-2784-g819f3d3692cbfe

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company; Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Commercial register: 
München, HRB 106955
commit 819f3d3692cbfe69ed7861da6ef47805914bb0b8
Author: Tobias Burnus 
Date:   Wed Jul 26 11:52:20 2023 +0200

libgomp.texi: Add status item, @ref and document omp_in_explicit_task

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.2 features): Add 'all' for 'defaultmap' as 'N'.
(Tasking Routines): Document omp_in_explicit_task.
(Implementation-defined ICV Initialization): Use @ref not @code.
---
 libgomp/libgomp.texi | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 9d3b2ae54cb..4ac01e977ec 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -426,6 +426,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
   @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
 @item Initial value of @var{default-device-var} ICV with
   @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
+@item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab N @tab
 @item @emph{interop_types} in any position of the modifier list for the @code{init} clause
   of the @code{interop} construct @tab N @tab
 @end multitable
@@ -1370,7 +1371,7 @@ They have C linkage and do not throw exceptions.
 
 @menu
 * omp_get_max_task_priority::   Maximum task priority value that can be set
-@c * omp_in_explicit_task:: 
+* omp_in_explicit_task::Whether a given task is an explicit task
 * omp_in_final::Whether in final or included task region
 @end menu
 
@@ -1399,6 +1400,32 @@ This function obtains the maximum allowed priority number for tasks.
 
 
 
+@node omp_in_explicit_task
+@subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
+@table @asis
+@item @emph{Description}:
+The function returns the @var{explicit-task-var} ICV; it returns true when the
+encountering task was generated by a task-generating construct such as
+@code{target}, @code{task} or @code{taskloop}.  Otherwise, the encountering task
+is in an implicit task region such as generated by the implicit or explicit
+@code{parallel} region and @code{omp_in_explicit_task} returns false.
+
+@item @emph{C/C++}
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
+@end multitable
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
+@end table
+
+
+
 @node omp_in_final
 @subsection @code{omp_in_final} -- Whether in final or included task region
 @table @asis
@@ -4802,7 +4829,7 @@ offloading devices (it's not clear if they should be):
 @item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
 @item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
 @item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
-@item @var{nthreads-var} @tab See @code{OMP_NUM_THREADS}.
+@item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
 @item @var{num-devices-var} @tab Number of non-host devices found
 by GCC's run-time library
 @item @var{num-procs-var} @tab The number of CPU cores on the


[PATCH] tree-optimization/110799 - fix bug in code hoisting

2023-07-26 Thread Richard Biener via Gcc-patches
The code hoisting part of GIMPLE PRE failed to adjust the TBAA behavior
of common loads when the alias set of the ref was the same
but the base alias set was not.  It also failed to adjust the
base behavior, assuming it would match.  The following plugs this
hole.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110799
* tree-ssa-pre.cc (compute_avail): More thoroughly match
up TBAA behavior of redundant loads.

* gcc.dg/torture/pr110799.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110799.c | 46 +
 gcc/tree-ssa-pre.cc | 15 +---
 2 files changed, 56 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110799.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110799.c 
b/gcc/testsuite/gcc.dg/torture/pr110799.c
new file mode 100644
index 000..53f06f079e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110799.c
@@ -0,0 +1,46 @@
+/* { dg-do run { target { { *-*-linux* *-*-gnu* *-*-uclinux* } && mmap } } } */
+
+#include 
+#include 
+#include 
+
+struct S {
+int a;
+};
+struct M {
+int a, b;
+};
+
+int __attribute__((noipa))
+f(struct S *p, int c, int d)
+{
+  int r;
+  if (c)
+{
+  if (d)
+   r = p->a;
+  else
+   r = ((struct M*)p)->a;
+}
+  else
+r = ((struct M*)p)->b;
+  return r;
+}
+
+int main ()
+{
+  long pgsz = sysconf(_SC_PAGESIZE);
+  if (pgsz < sizeof (struct M))
+return 0;
+  char *p = mmap ((void *) 0, 2 * pgsz, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS,
+ -1, 0);
+  if (p == MAP_FAILED)
+return 0;
+  if (mprotect (p, pgsz, PROT_READ | PROT_WRITE))
+return 0;
+  struct S *q = (struct S *)(p + pgsz) - 1;
+  q->a = 42;
+  if (f (q, 1, 1) != 42)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index e33c5ba80e2..0f2e458395c 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -4217,8 +4217,10 @@ compute_avail (function *fun)
  /* TBAA behavior is an obvious part so make sure
 that the hashtable one covers this as well
 by adjusting the ref alias set and its base.  */
- if (ref->set == set
- || alias_set_subset_of (set, ref->set))
+ if ((ref->set == set
+  || alias_set_subset_of (set, ref->set))
+ && (ref->base_set == base_set
+ || alias_set_subset_of (base_set, ref->base_set)))
;
  else if (ref1->opcode != ref2->opcode
   || (ref1->opcode != MEM_REF
@@ -4230,16 +4232,19 @@ compute_avail (function *fun)
  operands.release ();
  continue;
}
- else if (alias_set_subset_of (ref->set, set))
+ else if (ref->set == set
+  || alias_set_subset_of (ref->set, set))
{
+ tree reft = reference_alias_ptr_type (rhs1);
  ref->set = set;
+ ref->base_set = set;
  if (ref1->opcode == MEM_REF)
ref1->op0
- = wide_int_to_tree (TREE_TYPE (ref2->op0),
+ = wide_int_to_tree (reft,
  wi::to_wide (ref1->op0));
  else
ref1->op2
- = wide_int_to_tree (TREE_TYPE (ref2->op2),
+ = wide_int_to_tree (reft,
  wi::to_wide (ref1->op2));
}
  else
-- 
2.35.3


Re: [WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-26 Thread Benjamin Priour via Gcc-patches
On Sat, Jul 22, 2023 at 12:04 AM David Malcolm  wrote:

> On Fri, 2023-07-21 at 17:35 +0200, Benjamin Priour wrote:
> > Hi,
> >
> > Upon David's request I've joined the in progress patch to the below
> > email.
> > I hope it makes more sense now.
> >
> > Best,
> > Benjamin.
>
> Thanks for posting the work-in-progress patch; it makes the idea
> clearer.
>
> Some thoughts about this:
>
> - I like the idea of defaulting to *not* showing events within system
> headers, which the patch achieves
> - I don't like the combination of never/system with maxdepth, in that
> it seems complicated and I don't think a user is likely to experiment
> with different depths.
> - Hence I think it would work better as a simple boolean, perhaps
>   "-fanalyzer-show-events-in-system-headers"
>   or somesuch?  It seems like the sort of thing that we want to provide
> a sensible default for, but have the option of turning off for
> debugging the analyzer itself, but I don't expect an end-user to touch
> that option.
>

A boolean sounds good, I will trust your experience with the end-user here,
especially since  and "never" had some overlap, it could have
been confusing.


> FWIW the patch seems to have been mangled somewhat via email, so I
> don't have a sense of what the actual output from patched analyzer
> looks like.  What should we output to the user with -fanalyzer and no
> other options for the case in PR 110543?  Currently, for
> https://godbolt.org/z/sb9dM9Gqa trunk emits 12 events, of which
> probably only this last one is useful:
>
>   (12) dereference of NULL 'a.std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->()'
>
> What does the output look like with your patch?
>

The plan with this patch was to get these events:
(1) entry to 'main'
(2) calling 'std::__shared_ptr_access::operator->' from 'main'
(12) dereference of NULL 'a.std::__shared_ptr_access::operator->()'
(11) returning to 'main' from 'std::__shared_ptr_access::operator->'

This way, we get the entry and exit point to the system headers ( (2) and
(11) ), and the actual injurious event ( (12) ).
We could however go as you suggest, with an even more succinct path and only
keep (1) and (12).
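
The two options can be sketched as a simple filter over the event path.
This is only an illustration under stated assumptions: the event kinds, the
struct and trim_path are hypothetical names, not the analyzer's actual data
structures, and how each event is classified is assumed rather than taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a diagnostic path event.  */
enum event_kind
{
  EV_USER,      /* event located in user code, e.g. entry to 'main' */
  EV_BOUNDARY,  /* call into, or return from, a system header */
  EV_INTERNAL,  /* event purely inside system headers */
  EV_FINAL      /* the event the warning itself is reported on */
};

struct path_event
{
  int id;
  enum event_kind kind;
};

/* Copy the events to keep from IN to OUT, preserving their order, and
   return how many were kept.  User-code events and the final event are
   always kept; boundary events only when KEEP_BOUNDARIES is nonzero.  */
size_t
trim_path (const struct path_event *in, size_t n, int keep_boundaries,
           struct path_event *out)
{
  size_t kept = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      int keep = (in[i].kind == EV_USER
                  || in[i].kind == EV_FINAL
                  || (keep_boundaries && in[i].kind == EV_BOUNDARY));
      if (keep)
        out[kept++] = in[i];
    }
  return kept;
}
```

If events (2) and (11) are classified as boundary events and (3) to (10) as
internal, keeping the boundaries leaves (1), (2), (11) and (12), while
dropping them leaves only (1) and (12).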

Thanks,
Benjamin


> Thanks
> Dave
>

>
> >
> > -- Forwarded message -
> > From: Benjamin Priour 
> > Date: Tue, Jul 18, 2023 at 3:30 PM
> > Subject: [RFC] analyzer: Add optional trim of the analyzer
> > diagnostics
> > going too deep [PR110543]
> > To: , David Malcolm 
> >
> >
> > Hi,
> >
> > I'd like to request comments on a patch I am writing for PR110543.
> > The goal of this patch is to reduce the noise of the analyzer emitted
> > diagnostics when dealing with
> > system headers, or simply diagnostic paths that are too long. The new
> > option only affects the display
> > of the diagnostics, but doesn't hinder the actual analysis.
> >
> > I've defaulted the new option to "system", thus preventing the
> > diagnostic
> > paths from showing system headers.
> > "never" corresponds to the pre-patch behavior, whereas you can also
> > specify
> > an unsigned value 
> > that prevents paths to go deeper than  frames.
> >
> > fanalyzer-trim-diagnostics=
> > > Common Joined RejectNegative ToLower
> > > Var(flag_analyzer_trim_diagnostics)
> > > Init("system")
> > > -fanalyzer-trim-diagnostics=[never|system|] Trim
> > > diagnostics
> > > path that are too long before emission.
> > >
> >
> > Does it sound reasonable and user-friendly?
> >
> > Regstrapping was a success against trunk, although one of the newly
> > added
> > test cases fails for c++14.
> > Note that the test case below was done with "never", thus behaves
> > exactly
> > as the pre-patch analyzer
> > on x86_64-linux-gnu.
> >
> > /* { dg-additional-options "-fdiagnostics-plain-output
> > > -fdiagnostics-path-format=inline-events -fanalyzer-trim-
> > > diagnostics=never"
> > > } */
> > > /* { dg-skip-if "" { c++98_only }  } */
> > >
> > > #include 
> > > struct A {int x; int y;};
> > >
> > > int main () {
> > >   std::shared_ptr a;
> > >   a->x = 4; /* { dg-line deref_a } */
> > >   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a
> > > } */
> > >
> > >   return 0;
> > > }
> > >
> > > /* { dg-begin-multiline-output "" }
> > >   'int main()': events 1-2
> > > |
> > > |
> > > +--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> > > 
> > > > ::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> > >  >::operator->() const [with _Tp = A;
> > > __gnu_cxx::_Lock_policy
> > > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool
> > >  =
> > > false]': events 3-4
> > >|
> > >|
> > >+--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> > >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > > ,  >::_M_get() const [with _Tp = A;
> > > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool
> > >  =
> > > false; bool  = false]': events 5-6
> > >   |
> > >   |
> > >   +--> 'std::__shared_ptr<_Tp, 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>  wrote:
>>
>> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're 
>> > not papering over an issue elsewhere.
>>
>> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is 
>> the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>>
>>   :
>>   # res_18 = PHI 
>>   # i_20 = PHI 
>>   _1 = (long unsigned int) i_20;
>>   _2 = _1 * 2;
>>   _3 = x_14(D) + _2;
>>   _4 = *_3;
>>   _5 = (unsigned short) _4;
>>   res.0_6 = (unsigned short) res_18;
>>   _7 = _5 + res.0_6; <-- The current stmt_info
>>   res_15 = (short int) _7;
>>   i_16 = i_20 + 1;
>>   if (n_11(D) > i_16)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   goto ;
>>
>> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI > 0(6)>"?
>> The status here is:
>>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>
> Not all stmts in the SSA cycle forming the reduction have
> STMT_VINFO_REDUC_DEF set,
> only the last (latch def) and live stmts have at the moment.

Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:

  if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
  && vect_is_reduction (stmt_info))

to:

  if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
  && STMT_VINFO_LIVE_P (stmt_info)
  && vect_is_reduction (stmt_info))

instead of using a null check.

I see that vectorizable_reduction calculates a reduc_chain_length.
Would it be OK to store that in the stmt_vec_info?  I suppose the
AArch64 code should be multiplying by that as well.  (It would be a
separate patch from this one though.)
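
For reference, the latency rule that the quoted patch description is aiming
for can be sketched roughly as below.  This is only a stand-alone model, not
GCC code: update_reduction_latency, max_u and the single_def_use_cycle flag
are hypothetical names, with the flag standing in for whatever check ends up
guarding the "* count" case.

```cpp
// Hypothetical model of the rule described in the patch; not GCC code.
constexpr unsigned
max_u (unsigned a, unsigned b)
{
  return a > b ? a : b;
}

// prev:  reduction latency accumulated so far
// base:  latency of one in-loop reduction operation
// count: number of copies of the statement
// single_def_use_cycle: true if all copies sit on one loop-carried chain
constexpr unsigned
update_reduction_latency (unsigned prev, unsigned base, unsigned count,
                          bool single_def_use_cycle)
{
  // Only a single def-use cycle serialises all COUNT copies; otherwise
  // the copies are independent and the latency is that of one operation.
  return max_u (prev, single_def_use_cycle ? base * count : base);
}
```

With base = 2 and count = 4 the model gives 8 for a single def-use cycle and
2 otherwise.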

Richard


>
> Richard.
>
>> Thanks,
>> Hao
>>
>> 
>> From: Richard Sandiford 
>> Sent: Tuesday, July 25, 2023 17:44
>> To: Hao Liu OS
>> Cc: GCC-patches@gcc.gnu.org
>> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
>> multiplying count [PR110625]
>>
>> Hao Liu OS  writes:
>> > Hi,
>> >
>> > Thanks for the suggestion.  I tested it and found a gcc_assert failure:
>> > gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
>> > info_for_reduction, at tree-vect-loop.cc:5473)
>> >
>> > It is caused by empty STMT_VINFO_REDUC_DEF.
>>
>> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
>> we're not papering over an issue elsewhere.
>>
>> Thanks,
>> Richard
>>
>>   So, I added an extra check before checking single_defuse_cycle. The 
>> updated patch is below.  Is it OK for trunk?
>> >
>> > ---
>> >
>> > The new costs should only count reduction latency by multiplying count for
>> > single_defuse_cycle.  For other situations, this will increase the 
>> > reduction
>> > latency a lot and miss vectorization opportunities.
>> >
>> > Tested on aarch64-linux-gnu.
>> >
>> > gcc/ChangeLog:
>> >
>> >   PR target/110625
>> >   * config/aarch64/aarch64.cc (count_ops): Only '* count' for
>> >   single_defuse_cycle while counting reduction_latency.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >   * gcc.target/aarch64/pr110625_1.c: New testcase.
>> >   * gcc.target/aarch64/pr110625_2.c: New testcase.
>> > ---
>> >  gcc/config/aarch64/aarch64.cc | 13 --
>> >  gcc/testsuite/gcc.target/aarch64/pr110625_1.c | 46 +++
>> >  gcc/testsuite/gcc.target/aarch64/pr110625_2.c | 14 ++
>> >  3 files changed, 69 insertions(+), 4 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_1.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_2.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 560e5431636..478a4e00110 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -16788,10 +16788,15 @@ aarch64_vector_costs::count_ops (unsigned int 
>> > count, vect_cost_for_stmt kind,
>> >  {
>> >unsigned int base
>> >   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, 
>> > m_vec_flags);
>> > -
>> > -  /* ??? Ideally we'd do COUNT reductions in parallel, but 
>> > unfortunately
>> > -  that's not yet the case.  */
>> > -  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>> > +  if (STMT_VINFO_REDUC_DEF (stmt_info)
>> > +   && STMT_VINFO_FORCE_SINGLE_CYCLE (
>> > + info_for_reduction (m_vinfo, stmt_info)))
>> > + /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
>> > +and then accumulate that, but at the moment the loop-carried
>> > +dependency includes all copies.  */
>> > + ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>> > +  else
>> > + ops->reduction_latency = MAX (ops->reduction_latency, base);
>> >  }
>> >
>> >/* 

Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]

2023-07-26 Thread Richard Biener via Gcc-patches
On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
 wrote:
>
> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're 
> > not papering over an issue elsewhere.
>
> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is 
> the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>
>   :
>   # res_18 = PHI 
>   # i_20 = PHI 
>   _1 = (long unsigned int) i_20;
>   _2 = _1 * 2;
>   _3 = x_14(D) + _2;
>   _4 = *_3;
>   _5 = (unsigned short) _4;
>   res.0_6 = (unsigned short) res_18;
>   _7 = _5 + res.0_6; <-- The current stmt_info
>   res_15 = (short int) _7;
>   i_16 = i_20 + 1;
>   if (n_11(D) > i_16)
> goto ;
>   else
> goto ;
>
>   :
>   goto ;
>
> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 = PHI  0(6)>"?
> The status here is:
>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0

Not all stmts in the SSA cycle forming the reduction have
STMT_VINFO_REDUC_DEF set,
only the last (latch def) and live stmts have at the moment.

Richard.

> Thanks,
> Hao
>
> 
> From: Richard Sandiford 
> Sent: Tuesday, July 25, 2023 17:44
> To: Hao Liu OS
> Cc: GCC-patches@gcc.gnu.org
> Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by 
> multiplying count [PR110625]
>
> Hao Liu OS  writes:
> > Hi,
> >
> > Thanks for the suggestion.  I tested it and found a gcc_assert failure:
> > gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in 
> > info_for_reduction, at tree-vect-loop.cc:5473)
> >
> > It is caused by empty STMT_VINFO_REDUC_DEF.
>
> When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that
> we're not papering over an issue elsewhere.
>
> Thanks,
> Richard
>
>   So, I added an extra check before checking single_defuse_cycle. The updated 
> patch is below.  Is it OK for trunk?
> >
> > ---
> >
> > The new costs should only count reduction latency by multiplying count for
> > single_defuse_cycle.  For other situations, this will increase the reduction
> > latency a lot and miss vectorization opportunities.
> >
> > Tested on aarch64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> >   PR target/110625
> >   * config/aarch64/aarch64.cc (count_ops): Only '* count' for
> >   single_defuse_cycle while counting reduction_latency.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pr110625_1.c: New testcase.
> >   * gcc.target/aarch64/pr110625_2.c: New testcase.
> > ---
> >  gcc/config/aarch64/aarch64.cc | 13 --
> >  gcc/testsuite/gcc.target/aarch64/pr110625_1.c | 46 +++
> >  gcc/testsuite/gcc.target/aarch64/pr110625_2.c | 14 ++
> >  3 files changed, 69 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110625_2.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 560e5431636..478a4e00110 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -16788,10 +16788,15 @@ aarch64_vector_costs::count_ops (unsigned int 
> > count, vect_cost_for_stmt kind,
> >  {
> >unsigned int base
> >   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, m_vec_flags);
> > -
> > -  /* ??? Ideally we'd do COUNT reductions in parallel, but 
> > unfortunately
> > -  that's not yet the case.  */
> > -  ops->reduction_latency = MAX (ops->reduction_latency, base * count);
> > +  if (STMT_VINFO_REDUC_DEF (stmt_info)
> > +   && STMT_VINFO_FORCE_SINGLE_CYCLE (
> > + info_for_reduction (m_vinfo, stmt_info)))
> > + /* ??? Ideally we'd use a tree to reduce the copies down to 1 vector,
> > +and then accumulate that, but at the moment the loop-carried
> > +dependency includes all copies.  */
> > + ops->reduction_latency = MAX (ops->reduction_latency, base * count);
> > +  else
> > + ops->reduction_latency = MAX (ops->reduction_latency, base);
> >  }
> >
> >/* Assume that multiply-adds will become a single operation.  */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr110625_1.c 
> > b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> > new file mode 100644
> > index 000..0965cac33a0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr110625_1.c
> > @@ -0,0 +1,46 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast -mcpu=neoverse-n2 -fdump-tree-vect-details 
> > -fno-tree-slp-vectorize" } */
> > +/* { dg-final { scan-tree-dump-not "reduction latency = 8" "vect" } } */
> > +
> > +/* Do not increase the vector body cost due to the incorrect reduction 
> > latency
> > +Original vector body cost = 51
> > +Scalar issue estimate:
> > +  ...
> > +  reduction latency = 2
> > +  estimated min cycles per 

[COMMITTED] [range-ops] Handle bitmasks for ABSU_EXPR.

2023-07-26 Thread Aldy Hernandez via Gcc-patches
gcc/ChangeLog:

* range-op.cc (class operator_absu): Add update_bitmask.
(operator_absu::update_bitmask): New.
---
 gcc/range-op.cc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index bfab53caea0..5653ca0d186 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4221,6 +4221,8 @@ class operator_absu : public range_operator
   virtual void wi_fold (irange , tree type,
const wide_int _lb, const wide_int _ub,
const wide_int _lb, const wide_int _ub) const;
+  virtual void update_bitmask (irange &r, const irange &lh,
+                               const irange &rh) const final override;
 } op_absu;
 
 void
@@ -4258,6 +4260,13 @@ operator_absu::wi_fold (irange , tree type,
   r = int_range<1> (type, new_lb, new_ub);
 }
 
+void
+operator_absu::update_bitmask (irange &r, const irange &lh,
+                               const irange &rh) const
+{
+  update_known_bitmask (r, ABSU_EXPR, lh, rh);
+}
+
 
 bool
 operator_negate::fold_range (irange , tree type,
-- 
2.41.0



[COMMITTED] [range-ops] Handle bitmasks for ABS_EXPR.

2023-07-26 Thread Aldy Hernandez via Gcc-patches
gcc/ChangeLog:

* range-op-mixed.h (class operator_abs): Add update_bitmask.
* range-op.cc (operator_abs::update_bitmask): New.
---
 gcc/range-op-mixed.h | 2 ++
 gcc/range-op.cc  | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index ead41ed0515..70550c52232 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -408,6 +408,8 @@ class operator_abs : public range_operator
   bool op1_range (frange , tree type,
  const frange , const frange ,
  relation_trio rel = TRIO_VARYING) const final override;
+  void update_bitmask (irange &r, const irange &lh,
+                       const irange &rh) const final override;
 private:
   void wi_fold (irange , tree type, const wide_int _lb,
const wide_int _ub, const wide_int _lb,
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 13ba973a08d..bfab53caea0 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4208,6 +4208,12 @@ operator_abs::op1_range (irange , tree type,
   return true;
 }
 
+void
+operator_abs::update_bitmask (irange &r, const irange &lh,
+                              const irange &rh) const
+{
+  update_known_bitmask (r, ABS_EXPR, lh, rh);
+}
 
 class operator_absu : public range_operator
 {
-- 
2.41.0



[COMMITTED] [range-ops] Handle bitmasks for BIT_NOT_EXPR.

2023-07-26 Thread Aldy Hernandez via Gcc-patches
gcc/ChangeLog:

* range-op-mixed.h (class operator_bitwise_not): Add update_bitmask.
* range-op.cc (operator_bitwise_not::update_bitmask): New.
---
 gcc/range-op-mixed.h | 2 ++
 gcc/range-op.cc  | 7 +++
 2 files changed, 9 insertions(+)

diff --git a/gcc/range-op-mixed.h b/gcc/range-op-mixed.h
index 6944742ecbc..ead41ed0515 100644
--- a/gcc/range-op-mixed.h
+++ b/gcc/range-op-mixed.h
@@ -551,6 +551,8 @@ public:
   bool op1_range (irange , tree type,
  const irange , const irange ,
  relation_trio rel = TRIO_VARYING) const final override;
+  void update_bitmask (irange &r, const irange &lh,
+                       const irange &rh) const final override;
 };
 
 class operator_bitwise_xor : public range_operator
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index d959a3e93dc..13ba973a08d 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -4027,6 +4027,13 @@ operator_bitwise_not::op1_range (irange , tree type,
   return fold_range (r, type, lhs, op2);
 }
 
+void
+operator_bitwise_not::update_bitmask (irange &r, const irange &lh,
+                                      const irange &rh) const
+{
+  update_known_bitmask (r, BIT_NOT_EXPR, lh, rh);
+}
+
 
 bool
 operator_cst::fold_range (irange , tree type ATTRIBUTE_UNUSED,
-- 
2.41.0



[COMMITTED] [range-ops] Handle bitmasks for unary operators.

2023-07-26 Thread Aldy Hernandez via Gcc-patches
It looks like we missed out on bitmasks for unary operators because we
were using bit_value_binop exclusively.  This patch hands off to
bit_value_unop when appropriate, thus allowing us to handle ABS and
BIT_NOT_EXPR, and others.  Follow-up patches will add the tweaks for the
range-ops entries themselves.
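
As a rough illustration of what a known-bits pair looks like for a unary
versus a binary operation, here is a small stand-alone sketch.  It is not
the range-ops or CCP code: KnownBits, known_not and known_and are
hypothetical names, and the convention that a set mask bit means "unknown"
is an assumption made for the example.

```cpp
#include <cstdint>

// Hypothetical stand-in for a (value, mask) pair.  A bit set in MASK is
// unknown; where MASK is clear, VALUE holds the known bit.  VALUE is kept
// zero in unknown positions.
struct KnownBits
{
  uint32_t value;
  uint32_t mask;
};

// Unary case: ~x flips every known bit and leaves unknown bits unknown.
constexpr KnownBits
known_not (KnownBits in)
{
  return KnownBits{ ~in.value & ~in.mask, in.mask };
}

// Result bits of a & b that cannot be determined: at least one input bit
// is unknown and neither input bit is a known zero.
constexpr uint32_t
and_unknown (KnownBits a, KnownBits b)
{
  return (a.mask | b.mask) & (a.value | a.mask) & (b.value | b.mask);
}

// Binary case: a & b is a known 1 only where both inputs are known 1.
constexpr KnownBits
known_and (KnownBits a, KnownBits b)
{
  return KnownBits{ a.value & b.value & ~and_unknown (a, b),
                    and_unknown (a, b) };
}
```

The unary helper needs only one operand's pair, which is the distinction the
dispatch on the operation's arity is meant to capture.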

gcc/ChangeLog:

* range-op.cc (update_known_bitmask): Handle unary operators.
---
 gcc/range-op.cc | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 6b5d4f2accd..d959a3e93dc 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -385,15 +385,29 @@ update_known_bitmask (irange , tree_code code,
   irange_bitmask lh_bits = lh.get_bitmask ();
   irange_bitmask rh_bits = rh.get_bitmask ();
 
-  bit_value_binop (code, sign, prec, &widest_value, &widest_mask,
-  TYPE_SIGN (lh.type ()),
-  TYPE_PRECISION (lh.type ()),
-  widest_int::from (lh_bits.value (), sign),
-  widest_int::from (lh_bits.mask (), sign),
-  TYPE_SIGN (rh.type ()),
-  TYPE_PRECISION (rh.type ()),
-  widest_int::from (rh_bits.value (), sign),
-  widest_int::from (rh_bits.mask (), sign));
+  switch (get_gimple_rhs_class (code))
+{
+case GIMPLE_UNARY_RHS:
+  bit_value_unop (code, sign, prec, &widest_value, &widest_mask,
+ TYPE_SIGN (lh.type ()),
+ TYPE_PRECISION (lh.type ()),
+ widest_int::from (lh_bits.value (), sign),
+ widest_int::from (lh_bits.mask (), sign));
+  break;
+case GIMPLE_BINARY_RHS:
+  bit_value_binop (code, sign, prec, &widest_value, &widest_mask,
+  TYPE_SIGN (lh.type ()),
+  TYPE_PRECISION (lh.type ()),
+  widest_int::from (lh_bits.value (), sign),
+  widest_int::from (lh_bits.mask (), sign),
+  TYPE_SIGN (rh.type ()),
+  TYPE_PRECISION (rh.type ()),
+  widest_int::from (rh_bits.value (), sign),
+  widest_int::from (rh_bits.mask (), sign));
+  break;
+default:
+  gcc_unreachable ();
+}
 
   wide_int mask = wide_int::from (widest_mask, prec, sign);
   wide_int value = wide_int::from (widest_value, prec, sign);
-- 
2.41.0



Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-26 Thread Richard Biener via Gcc-patches
On Tue, Jul 25, 2023 at 9:26 PM Drew Ross  wrote:
>
> > With that fixed I think for non-vector integrals the above is the most 
> > suitable
> > canonical form of a sign-extension.  Note it should also work for any other
> > constant shift amount - just use the appropriate intermediate precision for
> > the truncating type.
> > We _might_ want
> > to consider to only use the converts when the intermediate type has
> > mode precision (and as a special case allow one bit as in your above case)
> > so it can expand to (sign_extend: (subreg: reg)).
>
> Here is a pattern that only matches truncations that result in mode 
> precision (or precision of 1):
>
> (simplify
>  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>  (if (INTEGRAL_TYPE_P (type)
>   && !TYPE_UNSIGNED (type)
>   && wi::gt_p (element_precision (type), wi::to_wide (@1), TYPE_SIGN 
> (TREE_TYPE (@1
>   (with {
> int width = element_precision (type) - tree_to_uhwi (@1);
> tree stype = build_nonstandard_integer_type (width, 0);
>}
>(if (TYPE_PRECISION (stype) == 1 || type_has_mode_precision_p (stype))
> (convert (convert:stype @0))
>
> Look ok?

I suppose so.  Can you see to amend the existing

/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
   types.  */
(simplify
 (rshift (lshift @0 INTEGER_CST@1) @1)
 (if (TYPE_UNSIGNED (type)
  && (wi::ltu_p (wi::to_wide (@1), element_precision (type
  (bit_and @0 (rshift { build_minus_one_cst (type); } @1

pattern?  You will get a duplicate pattern diagnostic otherwise.  It
also looks like this
one has the (nop_convert? ..) missing.  Btw, I wonder whether we can handle
some cases of widening/truncating converts between the shifts?
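
To make the equivalence under discussion concrete, here is a small
stand-alone sketch.  It is not match.pd code; shift_pair, via_narrow_type
and via_low_bit are hypothetical helper names, and the example assumes a
32-bit int with the two's complement conversions and arithmetic right shift
that GCC provides.

```cpp
#include <cstdint>

// (x << c) >> c on a signed 32-bit value.  The left shift is done on the
// unsigned type so that it is well defined; the conversion back to int32_t
// and the arithmetic right shift rely on two's complement behaviour.
constexpr int32_t
shift_pair (int32_t x, unsigned c)
{
  return static_cast<int32_t> (static_cast<uint32_t> (x) << c) >> c;
}

// The same value for c == 24: truncate to 8 bits, then sign-extend.
constexpr int32_t
via_narrow_type (int32_t x)
{
  return static_cast<int32_t> (static_cast<int8_t> (x));
}

// The c == 31 special case from the PR: a one-bit signed field is 0 or -1.
constexpr int32_t
via_low_bit (int32_t x)
{
  return -(x & 1);
}
```

For c == 24 the intermediate type has QImode precision, which is the case
where the pair of converts can expand to a plain sign extension.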

Richard.

> > You might also want to verify what RTL expansion
> > produces before/after - it at least shouldn't be worse.
>
> The RTL is slightly better for the mode precision cases and slightly worse 
> for the precision 1 case.
>
> > That said - do you have any testcase where the canonicalization is an 
> > enabler
> > for further transforms or was this requested stand-alone?
>
> No, I don't have any specific test cases. This patch is just in response to 
> pr101955.
>
> On Tue, Jul 25, 2023 at 2:55 AM Richard Biener  
> wrote:
>>
>> On Mon, Jul 24, 2023 at 9:42 PM Jakub Jelinek  wrote:
>> >
>> > On Mon, Jul 24, 2023 at 03:29:54PM -0400, Drew Ross via Gcc-patches wrote:
>> > > So would something like
>> > >
>> > > (simplify
>> > >  (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
>> > >  (with { tree stype = build_nonstandard_integer_type (1, 0); }
>> > >  (if (INTEGRAL_TYPE_P (type)
>> > >   && !TYPE_UNSIGNED (type)
>> > >   && wi::eq_p (wi::to_wide (@1), element_precision (type) - 1))
>> > >   (convert (convert:stype @0)
>> > >
>> > > work?
>> >
>> > Certainly swap the if and with and the (with then should be indented by 1
>> > column to the right of (if and (convert one further (the reason for the
>> > swapping is not to call build_nonstandard_integer_type when it will not be
>> > needed, which will probably be far more often than an actual match).
>>
>> With that fixed I think for non-vector integrals the above is the most 
>> suitable
>> canonical form of a sign-extension.  Note it should also work for any other
>> constant shift amount - just use the appropriate intermediate precision for
>> the truncating type.  You might also want to verify what RTL expansion
>> produces before/after - it at least shouldn't be worse.  We _might_ want
>> to consider to only use the converts when the intermediate type has
>> mode precision (and as a special case allow one bit as in your above case)
>> so it can expand to (sign_extend: (subreg: reg)).
>>
>> > As discussed privately, the above isn't what we want for vectors and the 2
>> > shifts are probably best on most arches because even when using -(x & 1) 
>> > the
>> > { 1, 1, 1, ... } vector would often need to be loaded from memory.
>>
>> I think for vectors a vpcmpgt {0,0,0,..}, %xmm is the cheapest way of
>> producing the result.  Note that to reflect this on GIMPLE you'd need
>>
>>   _2 = _1 < { 0,0...};
>>   res = _2 ? { -1, -1, ...} : { 0, 0,...};
>>
>> because whether the ISA has a way to produce all-ones masks isn't known.
>>
>> For scalars using -(T)(_1 < 0) would also be possible.
>>
>> That said - do you have any testcase where the canonicalization is an enabler
>> for further transforms or was this requested stand-alone?
>>
>> Thanks,
>> Richard.
>>
>> > Jakub
>> >
>>


Re: [PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread Kito Cheng via Gcc-patches
OK, thanks

On Wed, Jul 26, 2023 at 4:22 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM from my side.
>
> It should be V3 though, never mind.
> No need to send V3 again.
>
> Give Kito a chance to chime in with more comments.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Li Xu
> Date: 2023-07-26 16:18
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; Li Xu
> Subject: [PATCH] RISC-V: Fix vector tuple intrinsic
> Consider the following case:
> void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
> vint32mf2x3_t v_tuple, size_t vl) {
>   return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
> }
>
> Compiler failed with:
> test.c:19:1: internal compiler error: in vl_vtype_info, at 
> config/riscv/riscv-vsetvl.cc:1679
>19 | }
>   | ^
> 0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
> unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
> 0x143f788 get_vl_vtype_info
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
> 0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
> 0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
> 0x14407ee pass_vsetvl::init()
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
> 0x14471cf pass_vsetvl::execute(function*)
> ../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
> scalar type to float16, eliminate warning.
> (vfloat16mf4x3_t): Ditto.
> (vfloat16mf4x4_t): Ditto.
> (vfloat16mf4x5_t): Ditto.
> (vfloat16mf4x6_t): Ditto.
> (vfloat16mf4x7_t): Ditto.
> (vfloat16mf4x8_t): Ditto.
> (vfloat16mf2x2_t): Ditto.
> (vfloat16mf2x3_t): Ditto.
> (vfloat16mf2x4_t): Ditto.
> (vfloat16mf2x5_t): Ditto.
> (vfloat16mf2x6_t): Ditto.
> (vfloat16mf2x7_t): Ditto.
> (vfloat16mf2x8_t): Ditto.
> (vfloat16m1x2_t): Ditto.
> (vfloat16m1x3_t): Ditto.
> (vfloat16m1x4_t): Ditto.
> (vfloat16m1x5_t): Ditto.
> (vfloat16m1x6_t): Ditto.
> (vfloat16m1x7_t): Ditto.
> (vfloat16m1x8_t): Ditto.
> (vfloat16m2x2_t): Ditto.
> (vfloat16m2x3_t): Ditto.
> (vfloat16m2x4_t): Ditto.
> (vfloat16m4x2_t): Ditto.
> * config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
> * config/riscv/vector.md: add tuple mode in attr sew.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
> ---
> gcc/config/riscv/riscv-vector-builtins.def| 50 +--
> gcc/config/riscv/vector-iterators.md  |  1 +
> gcc/config/riscv/vector.md|  1 +
> .../riscv/rvv/base/tuple-intrinsic.c  | 23 +
> 4 files changed, 50 insertions(+), 25 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
> b/gcc/config/riscv/riscv-vector-builtins.def
> index 0e49480703b..6661629aad8 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.def
> +++ b/gcc/config/riscv/riscv-vector-builtins.def
> @@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, 
> uint64, RVVM8DI, _u64m8, _u64,
> DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, 
> _f16mf4,
>   _f16, _e16mf4)
> /* Define tuple types for SEW = 16, LMUL = MF4. */
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, 
> vfloat16mf4_t, float, 2, _f16mf4x2)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, 
> vfloat16mf4_t, float, 3, _f16mf4x3)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, 
> vfloat16mf4_t, float, 4, _f16mf4x4)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, 
> vfloat16mf4_t, float, 5, _f16mf4x5)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, 
> vfloat16mf4_t, float, 6, _f16mf4x6)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, 
> vfloat16mf4_t, float, 7, _f16mf4x7)
> -DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, 
> vfloat16mf4_t, float, 8, _f16mf4x8)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, 
> vfloat16mf4_t, float16, 2, _f16mf4x2)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, 
> vfloat16mf4_t, float16, 3, _f16mf4x3)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, 
> vfloat16mf4_t, float16, 4, _f16mf4x4)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, 
> vfloat16mf4_t, float16, 5, _f16mf4x5)
> +DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, 
> vfloat16mf4_t, float16, 6, 

Re: [PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread juzhe.zh...@rivai.ai
LGTM from my side.

It should be V3 though, never mind.
No need to send V3 again.

Give Kito a chance to chime in with more comments.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-07-26 16:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix vector tuple intrinsic
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}
 
Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
---
gcc/config/riscv/riscv-vector-builtins.def| 50 +--
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md|  1 +
.../riscv/rvv/base/tuple-intrinsic.c  | 23 +
4 files changed, 50 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, _f16mf4,
  _f16, _e16mf4)
/* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
/* LMUL = 1/2.  */
DEF_RVV_TYPE (vfloat16mf2_t, 18, __rvv_float16mf2_t, float16, RVVMF2HF, _f16mf2,
  

Re: Re: [PATCH v4] RISC-V: Fixbug for fsflags instruction error using immediate.

2023-07-26 Thread Kito Cheng via Gcc-patches
Committed, thanks :)

On Wed, Jul 26, 2023 at 3:39 PM Kito Cheng  wrote:
>
> Oh, yeah, my bad, there is no fscsri, gonna test and push :)
>
> On Wed, Jul 26, 2023 at 3:20 PM juzhe.zh...@rivai.ai
>  wrote:
> >
> > I just checked SPEC:
> >
> > fscsr rd, rs csrrw rd, fcsr, rs
> > Swap FP control/status register
> > fscsr rs csrrw x0, fcsr, rs
> > Write FP control/status register
> >
> > It seems that fscsr doesn't have immediate form? I am not sure.
> >
> >
> > juzhe.zh...@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-07-26 15:07
> > To: Jin Ma
> > CC: gcc-patches; jeffreyalaw; palmer; richard.sandiford; philipp.tomsich; 
> > christoph.muellner; rdapp.gcc; juzhe.zhong; jinma.contrib
> > Subject: Re: [PATCH v4] RISC-V: Fixbug for fsflags instruction error using 
> > immediate.
> > On Wed, Jul 26, 2023 at 1:41 PM Jin Ma via Gcc-patches
> >  wrote:
> > >
> > > The pattern mistakenly believes that fsflags can use immediate numbers,
> > > but in fact it does not support it. Immediate numbers should use fsflagsi.
> > >
> > > For example:
> > > __builtin_riscv_fsflags(4);
> > >
> > > The following error occurred.
> > > /tmp/ccoWdWqT.s: Assembler messages:
> > > /tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/riscv.md: Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/fsflags.c: New test.
> > > ---
> > >  gcc/config/riscv/riscv.md|  4 ++--
> > >  gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
> > >  2 files changed, 18 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c
> > >
> > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > > index 4615e811947..24515bcf706 100644
> > > --- a/gcc/config/riscv/riscv.md
> > > +++ b/gcc/config/riscv/riscv.md
> > > @@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
> > >"frcsr\t%0")
> > >
> > >  (define_insn "riscv_fscsr"
> > > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] 
> > > UNSPECV_FSCSR)]
> > > +  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] 
> > > UNSPECV_FSCSR)]
> > >"TARGET_HARD_FLOAT || TARGET_ZFINX"
> > >"fscsr\t%0")
> >
> > Wait, this patch still drops K?
> >


Re: RISC-V: Replace unspec with bitreverse in riscv_brev8_ insn

2023-07-26 Thread Kito Cheng via Gcc-patches
My understanding is that the semantics are slightly different: brev8 only
reverses the bits within each byte, but bitreverse reverses the bits across
the whole content of the mode, e.g. riscv_brev8_si would reverse the bits
across all 32 bits.

Using RV32 as example:
UNSPEC_BREV8:
rd[0...7]  = rs[7...0]
rd[8...15]  = rs[15...8]
rd[16...23]  = rs[23...16]
rd[24...31]  = rs[31...24]

bitreverse:
rd[0...31] = rs[31...0]
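
The difference can also be sketched in plain C++.  This is only an
illustration of the two semantics, not the RTL or the intrinsic itself;
brev8_32, bswap_32 and bitreverse_32 are hypothetical helper names.

```cpp
#include <cstdint>

// Swap adjacent bits, then adjacent bit pairs, then adjacent nibbles.
constexpr uint32_t
swap1 (uint32_t x)
{
  return ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
}

constexpr uint32_t
swap2 (uint32_t x)
{
  return ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
}

constexpr uint32_t
swap4 (uint32_t x)
{
  return ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
}

// brev8 semantics: reverse the bits inside each byte, bytes stay in place.
constexpr uint32_t
brev8_32 (uint32_t x)
{
  return swap4 (swap2 (swap1 (x)));
}

// Reverse the order of the four bytes.
constexpr uint32_t
bswap_32 (uint32_t x)
{
  return (x << 24) | ((x & 0xFF00u) << 8) | ((x >> 8) & 0xFF00u) | (x >> 24);
}

// bitreverse semantics: reverse all 32 bits, i.e. brev8 plus a byte swap.
constexpr uint32_t
bitreverse_32 (uint32_t x)
{
  return bswap_32 (brev8_32 (x));
}
```

For an input of 1, brev8_32 gives 0x80 while bitreverse_32 gives 0x80000000,
so the two only agree when the mode is a single byte wide.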

On Wed, Jul 26, 2023 at 3:55 PM Jivan Hakobyan via Gcc-patches
 wrote:
>
> This small patch replaces unspec opcode with bitreverse in
> riscv_brev8_ insn.
>
> gcc/ChangeLog:
> * config/riscv/crypto.md (UNSPEC_BREV8): Remove.
> (riscv_brev8_): Use bitreverse opcode.
>
>
> --
> With the best regards
> Jivan Hakobyan


Re: [PATCH] Initialize value in bit_value_unop.

2023-07-26 Thread Richard Biener via Gcc-patches
On Tue, Jul 25, 2023 at 9:08 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> bit_value_binop initializes VAL regardless of the final mask.  It even
> has a comment to that effect:
>
>   /* Ensure that VAL is initialized (to any value).  */
>
> However, bit_value_unop, which in theory shares the same API, does not.
> This causes range-ops to choke on uninitialized VALs for some inputs to
> ABS.
>
> Instead of fixing the callers, it's cleaner to make bit_value_unop and
> bit_value_binop consistent.
>
> OK for trunk?

OK

> gcc/ChangeLog:
>
> * tree-ssa-ccp.cc (bit_value_unop): Initialize val when appropriate.
> ---
>  gcc/tree-ssa-ccp.cc | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 73fb7c11c64..15e65f16008 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -1359,7 +1359,10 @@ bit_value_unop (enum tree_code code, signop type_sgn, 
> int type_precision,
>  case ABS_EXPR:
>  case ABSU_EXPR:
>if (wi::sext (rmask, rtype_precision) == -1)
> -   *mask = -1;
> +   {
> + *mask = -1;
> + *val = 0;
> +   }
>else if (wi::neg_p (rmask))
> {
>   /* Result is either rval or -rval.  */
> @@ -1385,6 +1388,7 @@ bit_value_unop (enum tree_code code, signop type_sgn, 
> int type_precision,
>
>  default:
>*mask = -1;
> +  *val = 0;
>break;
>  }
>  }
> --
> 2.41.0
>
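
The consistency argument in the patch description can be illustrated with a
small, self-contained C sketch. It is not the GCC implementation: the name
toy_bit_value_abs, the plain 32-bit integer types, and the simplified
handling of partially known operands are assumptions made for illustration.
The point is only that the value out-parameter is written on every path,
including the ones where the mask says nothing is known.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a value/mask transfer function.
   A set bit in a mask means "this bit is unknown"; the value holds the
   known bits.  Both out-parameters are written on every path.  */
static void
toy_bit_value_abs (uint32_t rval, uint32_t rmask,
                   uint32_t *val, uint32_t *mask)
{
  if (rmask == 0)
    {
      /* Operand fully known: compute the absolute value with unsigned
         arithmetic, which avoids signed overflow for the minimum value.  */
      *mask = 0;
      *val = (rval & 0x80000000u) ? 0u - rval : rval;
    }
  else
    {
      /* Operand not fully known: this sketch gives up and reports
         "everything unknown", but still initializes *val (to any value)
         so that callers never read an uninitialized object.  */
      *mask = UINT32_MAX;
      *val = 0;
    }
}
```

A caller can then read both outputs unconditionally, without having to
inspect the mask first to decide whether the value is safe to touch.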


[PATCH] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread Li Xu
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}

Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins.def| 50 +--
 gcc/config/riscv/vector-iterators.md  |  1 +
 gcc/config/riscv/vector.md|  1 +
 .../riscv/rvv/base/tuple-intrinsic.c  | 23 +
 4 files changed, 50 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c

diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
 DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, 
_f16mf4,
  _f16, _e16mf4)
 /* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
 /* LMUL = 1/2.  */
 DEF_RVV_TYPE (vfloat16mf2_t, 18, __rvv_float16mf2_t, float16, RVVMF2HF, 
_f16mf2,
  _f16, _e16mf2)
 /* Define tuple types for SEW = 16, LMUL = MF2. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x2_t, 20, __rvv_float16mf2x2_t, vfloat16mf2_t, 
float, 2, _f16mf2x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x3_t, 20, __rvv_float16mf2x3_t, vfloat16mf2_t, 
float, 3, _f16mf2x3)

Re: [PATCH v2] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread juzhe.zh...@rivai.ai
A minor fix:
+#include 

Better change it into:
+#include "riscv_vector.h"

since we have a riscv_vector.h wrapper header added by Kito.

I don't remember why we should use it; maybe Kito can explain.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-07-26 16:00
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH v2] RISC-V: Fix vector tuple intrinsic
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}
 
Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
---
gcc/config/riscv/riscv-vector-builtins.def| 50 +--
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md|  1 +
.../riscv/rvv/base/tuple-intrinsic.c  | 23 +
4 files changed, 50 insertions(+), 25 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, _f16mf4,
  _f16, _e16mf4)
/* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
/* LMUL = 1/2.  */
DEF_RVV_TYPE 

[PATCH v2] RISC-V: Fix vector tuple intrinsic

2023-07-26 Thread Li Xu
Consider the following case:
void test_vsoxseg3ei32_v_i32mf2x3(int32_t *base, vuint32mf2_t bindex, 
vint32mf2x3_t v_tuple, size_t vl) {
  return __riscv_vsoxseg3ei32_v_i32mf2x3(base, bindex, v_tuple, vl);
}

Compiler failed with:
test.c:19:1: internal compiler error: in vl_vtype_info, at 
config/riscv/riscv-vsetvl.cc:1679
   19 | }
  | ^
0x1439ec2 riscv_vector::vl_vtype_info::vl_vtype_info(riscv_vector::avl_info, 
unsigned char, riscv_vector::vlmul_type, unsigned char, bool, bool)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1679
0x143f788 get_vl_vtype_info
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:807
0x143f788 riscv_vector::vector_insn_info::parse_insn(rtl_ssa::insn_info*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:1843
0x1440371 riscv_vector::vector_infos_manager::vector_infos_manager()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:2350
0x14407ee pass_vsetvl::init()
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4581
0x14471cf pass_vsetvl::execute(function*)
../.././riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Change 
scalar type to float16, eliminate warning.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/vector-iterators.md: add RVVM4x2DF in iterator V4T.
* config/riscv/vector.md: add tuple mode in attr sew.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-intrinsic.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins.def| 50 +--
 gcc/config/riscv/vector-iterators.md  |  1 +
 gcc/config/riscv/vector.md|  1 +
 .../riscv/rvv/base/tuple-intrinsic.c  | 23 +
 4 files changed, 50 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-intrinsic.c

diff --git a/gcc/config/riscv/riscv-vector-builtins.def 
b/gcc/config/riscv/riscv-vector-builtins.def
index 0e49480703b..6661629aad8 100644
--- a/gcc/config/riscv/riscv-vector-builtins.def
+++ b/gcc/config/riscv/riscv-vector-builtins.def
@@ -441,47 +441,47 @@ DEF_RVV_TYPE (vuint64m8_t, 16, __rvv_uint64m8_t, uint64, 
RVVM8DI, _u64m8, _u64,
 DEF_RVV_TYPE (vfloat16mf4_t, 18, __rvv_float16mf4_t, float16, RVVMF4HF, 
_f16mf4,
  _f16, _e16mf4)
 /* Define tuple types for SEW = 16, LMUL = MF4. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float, 2, _f16mf4x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float, 3, _f16mf4x3)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float, 4, _f16mf4x4)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float, 5, _f16mf4x5)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float, 6, _f16mf4x6)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float, 7, _f16mf4x7)
-DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float, 8, _f16mf4x8)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x2_t, 20, __rvv_float16mf4x2_t, vfloat16mf4_t, 
float16, 2, _f16mf4x2)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x3_t, 20, __rvv_float16mf4x3_t, vfloat16mf4_t, 
float16, 3, _f16mf4x3)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x4_t, 20, __rvv_float16mf4x4_t, vfloat16mf4_t, 
float16, 4, _f16mf4x4)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x5_t, 20, __rvv_float16mf4x5_t, vfloat16mf4_t, 
float16, 5, _f16mf4x5)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x6_t, 20, __rvv_float16mf4x6_t, vfloat16mf4_t, 
float16, 6, _f16mf4x6)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x7_t, 20, __rvv_float16mf4x7_t, vfloat16mf4_t, 
float16, 7, _f16mf4x7)
+DEF_RVV_TUPLE_TYPE (vfloat16mf4x8_t, 20, __rvv_float16mf4x8_t, vfloat16mf4_t, 
float16, 8, _f16mf4x8)
 /* LMUL = 1/2.  */
 DEF_RVV_TYPE (vfloat16mf2_t, 18, __rvv_float16mf2_t, float16, RVVMF2HF, 
_f16mf2,
  _f16, _e16mf2)
 /* Define tuple types for SEW = 16, LMUL = MF2. */
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x2_t, 20, __rvv_float16mf2x2_t, vfloat16mf2_t, 
float, 2, _f16mf2x2)
-DEF_RVV_TUPLE_TYPE (vfloat16mf2x3_t, 20, __rvv_float16mf2x3_t, vfloat16mf2_t, 
float, 3, _f16mf2x3)

RISC-V: Replace unspec with bitreverse in riscv_brev8_ insn

2023-07-26 Thread Jivan Hakobyan via Gcc-patches
This small patch replaces unspec opcode with bitreverse in
riscv_brev8_ insn.

gcc/ChangeLog:
* config/riscv/crypto.md (UNSPEC_BREV8): Remove.
(riscv_brev8_): Use bitreverse opcode.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index e4b7f0190df..d40e108b10d 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -19,7 +19,6 @@
 
 (define_c_enum "unspec" [
 ;; Zbkb unspecs
-UNSPEC_BREV8
 UNSPEC_ZIP
 UNSPEC_UNZIP
 UNSPEC_PACK
@@ -73,8 +72,7 @@
 ;; ZBKB extension
 (define_insn "riscv_brev8_"
   [(set (match_operand:X 0 "register_operand" "=r")
-(unspec:X [(match_operand:X 1 "register_operand" "r")]
-  UNSPEC_BREV8))]
+(bitreverse:X (match_operand:X 1 "register_operand" "r")))]
   "TARGET_ZBKB"
   "brev8\t%0,%1"
   [(set_attr "type" "crypto")])


Re: Re: [PATCH v4] RISC-V: Fixbug for fsflags instruction error using immediate.

2023-07-26 Thread Kito Cheng via Gcc-patches
Oh, yeah, my bad, there is no fscsri, gonna test and push :)

On Wed, Jul 26, 2023 at 3:20 PM juzhe.zh...@rivai.ai
 wrote:
>
> I just checked SPEC:
>
> fscsr rd, rs csrrw rd, fcsr, rs
> Swap FP control/status register
> fscsr rs csrrw x0, fcsr, rs
> Write FP control/status register
>
> It seems that fscsr doesn't have immediate form? I am not sure.
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-07-26 15:07
> To: Jin Ma
> CC: gcc-patches; jeffreyalaw; palmer; richard.sandiford; philipp.tomsich; 
> christoph.muellner; rdapp.gcc; juzhe.zhong; jinma.contrib
> Subject: Re: [PATCH v4] RISC-V: Fixbug for fsflags instruction error using 
> immediate.
> On Wed, Jul 26, 2023 at 1:41 PM Jin Ma via Gcc-patches
>  wrote:
> >
> > The pattern mistakenly believes that fsflags can use immediate numbers,
> > but in fact it does not support it. Immediate numbers should use fsflagsi.
> >
> > For example:
> > __builtin_riscv_fsflags(4);
> >
> > The following error occurred.
> > /tmp/ccoWdWqT.s: Assembler messages:
> > /tmp/ccoWdWqT.s:14: Error: illegal operands `fsflags 4'
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.md: Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/riscv/fsflags.c: New test.
> > ---
> >  gcc/config/riscv/riscv.md|  4 ++--
> >  gcc/testsuite/gcc.target/riscv/fsflags.c | 16 
> >  2 files changed, 18 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fsflags.c
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 4615e811947..24515bcf706 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -3074,7 +3074,7 @@ (define_insn "riscv_frcsr"
> >"frcsr\t%0")
> >
> >  (define_insn "riscv_fscsr"
> > -  [(unspec_volatile [(match_operand:SI 0 "csr_operand" "rK")] 
> > UNSPECV_FSCSR)]
> > +  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] 
> > UNSPECV_FSCSR)]
> >"TARGET_HARD_FLOAT || TARGET_ZFINX"
> >"fscsr\t%0")
>
> Wait, this patch still drops K?
>

