[Bug ipa/109445] r13-6372-g822a11a1e642e0 regression due to noline with -Ofast -march=sapphirerapids -funroll-loops -flto, 541.leela_r performance decrease by 2-3%

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109445

Andrew Pinski  changed:

   What|Removed |Added

  Component|libstdc++   |ipa
 CC||marxin at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
This just seems like bad luck.

Maybe look at the inline dumps to see what the cost difference is.

[Bug libstdc++/109445] New: r13-6372-g822a11a1e642e0 regression due to noline with -Ofast -march=sapphirerapids -funroll-loops -flto, 541.leela_r performance decrease by 2-3%

2023-04-06 Thread zhangjungcc at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109445

Bug ID: 109445
   Summary: r13-6372-g822a11a1e642e0 regression due to noline with
-Ofast -march=sapphirerapids -funroll-loops -flto,
541.leela_r performance decrease by 2-3%
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhangjungcc at gmail dot com
  Target Milestone: ---

r13-6372-g822a11a1e642e0 causes a 2-3% performance regression in 541.leela_r
with -Ofast -march=sapphirerapids -funroll-loops -flto, because random () is
no longer inlined.

Below is the inline dump: the left side is before the commit, the right side is
after the commit.

   [local count: 210861628]:   [local count: 210861628]:
  # DEBUG BEGIN_STMT   # DEBUG BEGIN_STMT
  _466 = s_rng;_466 = s_rng;
--
  _607 = _466;<>   _561 = _466;
  _118 = _607; _118 = _561;
--
  _35 = this_72(D)->board.D.5191.m_empty_cnt; =_35 = this_72(D)->board.D.5191.m_empty_cnt;
  _5 = _35 & 65535;_5 = _35 & 65535;
  # DEBUG this => _118 # DEBUG this => _118
  max_458 = (const uint16) _5; max_458 = (const uint16) _5;
  # DEBUG max => max_458   # DEBUG max => max_458
  # DEBUG BEGIN_STMT   # DEBUG BEGIN_STMT
--
  # DEBUG this => _118<>
  # DEBUG BEGIN_STMT
  # DEBUG mask => 4294967295
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  _467 = _118->s1;
  _468 = _467 << 13;
  _469 = _467 ^ _468;
  b_470 = _469 >> 19;
  # DEBUG b => b_470
  # DEBUG BEGIN_STMT
  _471 = _467 << 12;
  _472 = _471 & 4294959104;
  _473 = b_470 ^ _472;
  _118->s1 = _473;
  # DEBUG BEGIN_STMT
  _474 = _118->s2;
  _475 = _474 << 2;
  _476 = _474 ^ _475;
  b_477 = _476 >> 25;
  # DEBUG b => b_477
  # DEBUG BEGIN_STMT
  _478 = _474 << 4;
  _479 = _478 & 4294967168;
  _480 = b_477 ^ _479;
  _118->s2 = _480;
  # DEBUG BEGIN_STMT
  _481 = _118->s3;
  _482 = _481 << 3;
  _483 = _481 ^ _482;
  b_484 = _483 >> 11;
  # DEBUG b => b_484
  # DEBUG BEGIN_STMT
  _485 = _481 << 17;
  _486 = _485 & 4292870144;
  _487 = b_484 ^ _486;
  _118->s3 = _487;
  # DEBUG BEGIN_STMT
  _488 = _473 ^ _480;
  _489 = _487 ^ _488;
  _611 = _489; _459 = random (_118);
  # DEBUG this => NULL
  # DEBUG b => NULL
  _459 = _611;
--
  _460 = _459 >> 16;  =_460 = _459 >> 16;
  _461 = (unsigned int) max_458;   _461 = (unsigned int) max_458;
  _462 = _460 * _461;  _462 = _460 * _461;
  _463 = _462 >> 16;   _463 = _462 >> 16;
--
  _612 = _463;<>   _563 = _463;
--
  # DEBUG this => NULL=# DEBUG this => NULL
  # DEBUG max => NULL  # DEBUG max => NULL
--
  _120 = _612;<>   _120 = _563;
--
  vidx_121 = (int) _120;  =vidx_121 = (int) _120;
  # DEBUG vidx => vidx_121 # DEBUG vidx => vidx_121
  # DEBUG BEGIN_STMT   # DEBUG BEGIN_STMT
  _37 = this_72(D)->board.D.5191.m_tomove; _37 = this_72(D)->board.D.5191.m_tomove;
--
  # DEBUG D#1845 => 1 <>   # DEBUG D#1824 => 1
--
  # DEBUG this => this_72(D)  =# DEBUG this => this_72(D)
  # DEBUG color => _37 # DEBUG color => _37
  # DEBUG vidx => vidx_121 # DEBUG vidx => vidx_121
  # DEBUG allow_sa => 1# DEBUG allow_sa => 1
  # DEBUG BEGIN_STMT   # DEBUG BEGIN_STMT
--
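For readers comparing the two sides of the dump: the inlined statements on the left appear to implement a three-component combined Tausworthe generator (the shift amounts and masks match L'Ecuyer's taus88), followed by the range reduction ((random () >> 16) * max) >> 16. The C sketch below is a hypothetical reconstruction from the dump alone; the member names s1, s2 and s3 come from the dump, while struct rng_state, rng_seed, rng_random and rng_randint are illustrative names, not leela's actual source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction; only s1, s2, s3 are taken from the dump.  */
struct rng_state
{
  uint32_t s1, s2, s3;
};

/* taus88 needs s1 > 1, s2 > 7 and s3 > 15 to avoid degenerate states.  */
static void rng_seed (struct rng_state *r, uint32_t s1, uint32_t s2,
                      uint32_t s3)
{
  assert (s1 > 1 && s2 > 7 && s3 > 15);
  r->s1 = s1;
  r->s2 = s2;
  r->s3 = s3;
}

/* One generator step, using the same shifts and masks as the dump.  */
static uint32_t rng_random (struct rng_state *r)
{
  uint32_t b;
  b = ((r->s1 << 13) ^ r->s1) >> 19;
  r->s1 = ((r->s1 << 12) & 4294959104u) ^ b;   /* mask 0xFFFFE000 */
  b = ((r->s2 << 2) ^ r->s2) >> 25;
  r->s2 = ((r->s2 << 4) & 4294967168u) ^ b;    /* mask 0xFFFFFF80 */
  b = ((r->s3 << 3) ^ r->s3) >> 11;
  r->s3 = ((r->s3 << 17) & 4292870144u) ^ b;   /* mask 0xFFE00000 */
  return r->s1 ^ r->s2 ^ r->s3;
}

/* Range reduction seen in the dump: ((random () >> 16) * max) >> 16.  */
static uint16_t rng_randint (struct rng_state *r, uint16_t max)
{
  return (uint16_t) (((rng_random (r) >> 16) * (uint32_t) max) >> 16);
}
```

Because the top 16 bits are at most 65535, the product divided by 65536 is always below max, so for max > 0 the result lies in [0, max); in the dump, max is derived from m_empty_cnt and the result becomes vidx.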
  

[PATCH] PR target/109402: v850 (not v850e) variant of __muldi3() moves sp in reversed direction [PR109402]

2023-04-06 Thread
The code I am referring to is in /libgcc/config/v850/lib1funcs.S, at lines
2214 and 2259, as of commit 8b1204d7.

These lines adjust the stack pointer.
I think the two adjustments are in the reverse order: the stack is shrunk
before the body and grown again after it.
There is one more consideration: this version of __muldi3() does not use any
local stack storage, so the problem can be resolved simply by removing both
sp operations.

I cannot show a short way to reproduce it, because the problem only shows up
across procedures, for example with interrupt service routines that use
stack storage.

In my environment, the following patch works well.

---

diff --git a/libgcc/config/v850/lib1funcs.S b/libgcc/config/v850/lib1funcs.S
index 00dd61d..99e79bf 100644
--- a/libgcc/config/v850/lib1funcs.S
+++ b/libgcc/config/v850/lib1funcs.S
@@ -2211,7 +2211,6 @@
 ___muldi3:
 #ifdef __v850__
 jarl  __save_r26_r31, r10
-addi  16,  sp, sp
 mov   r6,  r28
 shr   15,  r28
 movea lo(32767), r0, r14
@@ -2256,7 +2255,6 @@
 mulh  r12, r6
 mov   r28,  r17
 mulh  r10, r17
-add   -16, sp
 mov   r28,  r12
 mulh  r8,  r12
 add   r17, r18
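The failure mode described for __muldi3() can be illustrated with a toy C model. It assumes, as on v850, a stack that grows toward lower addresses and 4-byte words, so releasing 16 bytes means moving sp up by 4 words; it also assumes the released words still hold data the routine needs later (here, stand-ins for the registers saved by __save_r26_r31). All names are hypothetical and this is not the libgcc code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy downward-growing stack of 32-bit words (hypothetical model).  */
enum { STACK_WORDS = 64 };

struct toy_stack
{
  uint32_t mem[STACK_WORDS];
  int sp;                        /* index of the lowest word in use */
};

static void toy_push (struct toy_stack *s, uint32_t v)
{
  assert (s->sp > 0);
  s->mem[--s->sp] = v;
}

/* An "interrupt" that pushes NWORDS words and then returns: sp ends up
   where it started, but memory below sp has been overwritten.  */
static void toy_interrupt (struct toy_stack *s, int nwords)
{
  for (int i = 0; i < nwords; i++)
    toy_push (s, 0xDEADBEEFu);
  s->sp += nwords;
}

/* Save six values, move sp up by RELEASE_WORDS (like "addi 16, sp, sp"
   when RELEASE_WORDS is 4), take an interrupt, move sp back (like
   "add -16, sp"), and report whether the saved values survived.  */
static int saved_values_survive (int release_words)
{
  struct toy_stack s = { {0}, STACK_WORDS };
  const int nsaved = 6;
  for (int i = 0; i < nsaved; i++)
    toy_push (&s, 100u + (uint32_t) i);
  int base = s.sp;
  s.sp += release_words;         /* saved words now lie below sp */
  toy_interrupt (&s, 8);
  s.sp -= release_words;
  for (int i = 0; i < nsaved; i++)
    if (s.mem[base + i] != 100u + (uint32_t) (nsaved - 1 - i))
      return 0;
  return 1;
}
```

In this model saved_values_survive (4) reports corruption while saved_values_survive (0) does not, which is consistent with the patch's choice of simply dropping both sp adjustments.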




Re: [PATCH] [testsuite] [ppc] expect vectorization in gen-vect-11c.c

2023-04-06 Thread Alexandre Oliva via Gcc-patches
On Apr  6, 2023, "Kewen.Lin"  wrote:

> on 2023/4/6 13:20, Alexandre Oliva wrote:
>> I confirm I observe the problem with gcc-12 targeting ppc64-vx7r2,
>> containing the backported patch, and that the loop is vectorized,
>> failing the test.

I take that back.  My notes indicate I looked into this failure on March
15th.  The patch you referenced was dated Feb 10, so I assumed it was
already in when I looked into it: my confirmation amounted to checking
what I had observed according to my notes, and when.

But now that you asked me to investigate it again, I used a far more
recent tree, and I failed to duplicate it.  Digging further, I found out
the patch, despite its commit date, was only merged into gcc-12 on March
16th.  What I was missing to get the intended effects of the fix was
just a fresher tree that actually contained the fix.

I suppose this means we don't need the testsuite tweak, after all.
Patch withdrawn.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] [testsuite] [ppc] skip ppc-fortran if fortran is disabled

2023-04-06 Thread Alexandre Oliva via Gcc-patches
Hello, Kewen,

On Apr  6, 2023, "Kewen.Lin"  wrote:

> on 2023/4/6 14:19, Alexandre Oliva wrote:

>> Skip ppc-fortran.exp if a trivial fortran program cannot be compiled.

> IIUC, without this patch and under the configuration disabling fortran,
> all the cases in this sub-testsuite would fail?  Thanks for fixing!

Yup

> super nit: this check only needs proc check_no_compiler_messages,
> can it be moved a bit upward just after line "load_lib gfortran-dg.exp"
> then it can skip more unnecessary codes?

I wasn't sure, so I'd put it after supporting code.  Turns out it can.
Here's what I've just finished retesting, and am thus checking in.
Thanks,

> OK with this nit fixed (if you agree).  Thanks!


[testsuite] [ppc] skip ppc-fortran if fortran is disabled

Skip ppc-fortran.exp if a trivial fortran program cannot be compiled.


for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: Test for
fortran compiler, skip if missing.
---
 .../gcc.target/powerpc/ppc-fortran/ppc-fortran.exp |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
index f7e99ac848753..f7b7c05487cda 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
@@ -21,6 +21,16 @@ if { ![istarget powerpc*-*-*] && ![istarget rs6000-*-*] } then {
   return
 }
 
+# Make sure there is a fortran compiler to test.
+if { ![check_no_compiler_messages fortran_available assembly {
+! Fortran
+program P
+  stop
+end program P
+} ""] } {
+return
+}
+
 # Load support procs.
 load_lib gfortran-dg.exp
 




[PATCH V2] RISC-V: Modified validation information for contracts-tmpl-spec2.C

2023-04-06 Thread shiyulong
From: yulong 

This patch fixes the failure of contracts-tmpl-spec2.C.
When running the DejaGnu test, I found that the output did not match the
output expected by the testcase, so I modified the expected output, and now
the test passes.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/contracts-tmpl-spec2.C: Delete some expected output.

---
 gcc/testsuite/g++.dg/contracts/contracts-tmpl-spec2.C | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/contracts/contracts-tmpl-spec2.C b/gcc/testsuite/g++.dg/contracts/contracts-tmpl-spec2.C
index 82117671b2d..17048584ac9 100644
--- a/gcc/testsuite/g++.dg/contracts/contracts-tmpl-spec2.C
+++ b/gcc/testsuite/g++.dg/contracts/contracts-tmpl-spec2.C
@@ -369,15 +369,9 @@ int main(int, char**)
 // { dg-output {contract violation in function G3::f at .*:148: s > 2(\n|\r\n|\r)} }
 // { dg-output {\[continue:on\](\n|\r\n|\r)} }
 // { dg-output {G3 full int double(\n|\r\n|\r)} }
-// { dg-output {contract violation in function G3::f at .*:124: t > 0(\n|\r\n|\r)} }
-// { dg-output {\[continue:on\](\n|\r\n|\r)} }
-// { dg-output {contract violation in function G3::f at .*:125: s > 0(\n|\r\n|\r)} }
-// { dg-output {\[continue:on\](\n|\r\n|\r)} }
 // { dg-output {G3 general T S(\n|\r\n|\r)} }
 // { dg-output {contract violation in function G3::f at .*:139: t > 1(\n|\r\n|\r)} }
 // { dg-output {\[continue:on\](\n|\r\n|\r)} }
-// { dg-output {contract violation in function G3::f at .*:140: s > 1(\n|\r\n|\r)} }
-// { dg-output {\[continue:on\](\n|\r\n|\r)} }
 // { dg-output {G3 partial int S(\n|\r\n|\r)} }
 // { dg-output {G3 full int C(\n|\r\n|\r)} }
 // { dg-output {G3 full int C(\n|\r\n|\r)} }
-- 
2.25.1



[PATCH V4] RISC-V: Fix a redefinition bug for the fd-4.c

2023-04-06 Thread shiyulong
From: yulong 

This patch fixes a redefinition bug.
fd-4.c defines mode_t itself, but that duplicates the definition in types.h,
which is included by stdio.h.

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/fd-4.c: Delete the definition of mode_t.

---
 gcc/testsuite/gcc.dg/analyzer/fd-4.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-4.c b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
index 994bad84342..9ec015679e9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
@@ -13,11 +13,6 @@ int read (int fd, void *buf, int nbytes);
 #define O_WRONLY 1
 #define O_RDWR 2
 
-typedef enum {
-  S_IRWXU
-  // etc
-} mode_t;
-
 int creat (const char *, mode_t mode);
 
 void
-- 
2.25.1



RE: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-06 Thread Li, Pan2 via Gcc-patches
The bootstrap on x86 passed with this patch applied, on top of commit
a8c8351cf4fedb842988eed4f73304019c361e86 (13.0.1 20230407).

Pan

-----Original Message-----
From: Gcc-patches  On Behalf Of juzhe.zh...@rivai.ai
Sent: Friday, April 7, 2023 9:48 AM
To: gcc-patches@gcc.gnu.org
Cc: richard.sandif...@arm.com; rguent...@suse.de; jeffreya...@gmail.com; Juzhe-Zhong 
Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization


[RFC] arm: atomics: ARMv7 doubleword atomicity

2023-04-06 Thread mudrievskyjpetro via Gcc
Dear maintainers of GCC arm port,

Can you share your thoughts on the email I've sent to the mailing list?
I originally sent it to Will Deacon and to the gcc and linux mailing lists, but
no one has responded, so I'm pinging you directly.

https://gcc.gnu.org/pipermail/gcc/2023-April/241093.html

---
Peter


Re: [PATCH] [PR99708] [rs6000] don't expect __ibm128 with 64-bit long double

2023-04-06 Thread Alexandre Oliva via Gcc-patches
On Apr  6, 2023, "Kewen.Lin"  wrote:

> The reason why personally I preferred to fix it with xfail is that:

Got it.  I'm convinced, and I agree.

I tried an xfail in the initial dg-do, but that is no good for a compile
error, so I went for a dg-bogus xfail.  I hope that will still have the
intended effect when __ibm128 is defined when it currently isn't.

There is a dg-skip-if in this test on the trunk, covering some targets,
that IIRC are longdouble64, so maybe that's related and I could have
dropped them, but I wasn't sure, so I left them alone.

Regstrapped on ppc64-linux-gnu (pass), also tested on ppc64-vx7r2/gcc-12
(xfail).  Ok to install?


[PR99708] [rs6000] don't expect __ibm128 with 64-bit long double

When long double is 64-bit wide, as on vxworks, the rs6000 backend
defines neither the __ibm128 type nor the __SIZEOF_IBM128__ macro, but
pr99708.c expected both to be always defined.  Adjust the test to
match the implementation.
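The underlying idea, checking a type only when the compiler advertises it through its __SIZEOF_*__ macro, can be sketched in plain C as below. This is an alternative illustration, not what the patch does: the patch keeps the unconditional check and marks the resulting error as an expected failure with a dg-bogus/xfail on longdouble64 targets. The function name check_float_type_sizes is hypothetical.

```c
#include <assert.h>

/* Checks each extended floating-point type whose __SIZEOF_*__ macro the
   compiler defines, and returns how many types were checked.  A missing
   macro (e.g. __SIZEOF_IBM128__ when long double is 64-bit) means that
   type is skipped rather than reported as an error.  */
static int check_float_type_sizes (void)
{
  int checked = 0;
#ifdef __SIZEOF_FLOAT128__
  assert (__SIZEOF_FLOAT128__ == sizeof (__float128));
  checked++;
#endif
#ifdef __SIZEOF_IBM128__
  assert (__SIZEOF_IBM128__ == sizeof (__ibm128));
  checked++;
#endif
  return checked;
}
```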


for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/pr99708.c: Accept lack of
__SIZEOF_IBM128__ when long double is 64-bit wide.
---
 gcc/testsuite/gcc.target/powerpc/pr99708.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr99708.c b/gcc/testsuite/gcc.target/powerpc/pr99708.c
index 02b40ebc40d3d..66a5f88479330 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr99708.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr99708.c
@@ -14,7 +14,7 @@
 int main (void)
 {
   if (__SIZEOF_FLOAT128__ != sizeof (__float128)
-  || __SIZEOF_IBM128__ != sizeof (__ibm128))
+  || __SIZEOF_IBM128__ != sizeof (__ibm128)) /* { dg-bogus "undeclared" "" { xfail longdouble64 } } */
 abort ();
 
   return 0;




[PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch adds the WHILE_LEN pattern.
It is inspired by the simple RVV ISA example "vvaddint32.s":
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s

More details are in the "vect_set_loop_controls_by_while_len" implementation
and its comments.

Consider the following case:
#define N 16
int src[N];
int dest[N];

void
foo (int n)
{
  for (int i = 0; i < n; i++)
dest[i] = src[i];
}

-march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
-fno-vect-cost-model -fno-tree-loop-distribute-patterns:

foo:
ble a0,zero,.L1
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
addi a3,a4,64
csrr a2,vlenb
.L3:
vsetvli a5,a0,e32,m1,ta,ma
vle32.v v1,0(a4)
sub a0,a0,a5
vse32.v v1,0(a3)
add a4,a4,a2
add a3,a3,a2
bne a0,zero,.L3
.L1:
ret
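The md.texi text in this patch defines WHILE_LEN as operand0 = MIN (operand1, operand2). The scalar C sketch below models the resulting decrement-IV loop: each iteration asks for LEN = MIN (remaining, VF) elements, processes them, and subtracts LEN from the remaining count, much as "vsetvli a5,a0,..." and "sub a0,a0,a5" do in the RVV output above. The function names and the VF parameter are hypothetical, and the inner loop merely stands in for one vector load/store pair.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of WHILE_LEN: operand0 = MIN (operand1, operand2).  */
static size_t while_len (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Decrement-IV copy loop driven by while_len (hypothetical model).  */
static void copy_decrement_iv (int *dest, const int *src, size_t n,
                               size_t vf)
{
  assert (vf > 0);
  while (n > 0)
    {
      size_t len = while_len (n, vf);
      for (size_t j = 0; j < len; j++)   /* one "vle32.v" + "vse32.v" */
        dest[j] = src[j];
      src += len;
      dest += len;
      n -= len;                          /* like "sub a0,a0,a5" */
    }
}
```

One difference to keep in mind: this sketch advances the pointers by LEN, whereas the generated code above advances them by vlenb.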

gcc/ChangeLog:

* doc/md.texi: Add WHILE_LEN support.
* internal-fn.cc (while_len_direct): Ditto.
(expand_while_len_optab_fn): Ditto.
(direct_while_len_optab_supported_p): Ditto.
* internal-fn.def (WHILE_LEN): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-ssa-loop-manip.cc (create_iv): Ditto.
* tree-ssa-loop-manip.h (create_iv): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): Ditto.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (vect_get_loop_len): Ditto.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_get_loop_len): Ditto.

---
 gcc/doc/md.texi |  14 +++
 gcc/internal-fn.cc  |  29 ++
 gcc/internal-fn.def |   1 +
 gcc/optabs.def  |   1 +
 gcc/tree-ssa-loop-manip.cc  |   4 +-
 gcc/tree-ssa-loop-manip.h   |   2 +-
 gcc/tree-vect-loop-manip.cc | 186 ++--
 gcc/tree-vect-loop.cc   |  35 +--
 gcc/tree-vect-stmts.cc  |   9 +-
 gcc/tree-vectorizer.h   |   4 +-
 10 files changed, 264 insertions(+), 21 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8e3113599fd..72178ab014c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{while_len@var{m}@var{n}} instruction pattern
+@item @code{while_len@var{m}@var{n}}
+Set operand 0 to the number of active vector elements to be updated.
+Operand 1 is the total number of elements that need to be updated, and
+operand 2 is the vectorization factor.
+The operation is equivalent to:
+
+@smallexample
+operand0 = MIN (operand1, operand2);
+@end smallexample
+Operand 2 can be a const_poly_int or a poly_int related to the vector
+mode size.  Some targets, such as RISC-V, have a standalone instruction
+to compute MIN (n, MODE SIZE), which saves a general-purpose register.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e..5f44def90d3 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -127,6 +127,7 @@ init_internal_fns ()
 #define cond_binary_direct { 1, 1, true }
 #define cond_ternary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
+#define while_len_direct { 0, 0, false }
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
@@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Expand WHILE_LEN call STMT using optab OPTAB.  */
+static void
+expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  expand_operand ops[3];
+  tree rhs_type[2];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
+
+  for (unsigned int i = 0; i < gimple_call_num_args (stmt); ++i)
+    {
+      tree rhs = gimple_call_arg (stmt, i);
+      rhs_type[i] = TREE_TYPE (rhs);
+      rtx rhs_rtx = expand_normal (rhs);
+      create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
+    }
+
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (rhs_type[0]));
+
+  expand_insn (icode, 3, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+    emit_move_insn (lhs_rtx, ops[0].value);
+}
+
 /* Expand a call to a convert-like optab using the operands in STMT.
FN has a single output operand and NARGS input operands.  */
 
@@ -3843,6 +3871,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 

Re: Re: [PATCH 2/3] RISC-V: Enable basic RVV auto-vectorization and support WHILE_LEN/LEN_LOAD/LEN_STORE pattern

2023-04-06 Thread juzhe.zh...@rivai.ai
I have addressed all the comments and fixed them in these split patches:

These 5 patches include only RISC-V port changes:
https://patchwork.sourceware.org/project/gcc/patch/20230407011143.46004-1-juzhe.zh...@rivai.ai/
 
https://patchwork.sourceware.org/project/gcc/patch/20230407012129.63142-1-juzhe.zh...@rivai.ai/
 
https://patchwork.sourceware.org/project/gcc/patch/20230407012503.65215-1-juzhe.zh...@rivai.ai/
 
https://patchwork.sourceware.org/project/gcc/patch/20230407013413.127686-1-juzhe.zh...@rivai.ai/
 
https://patchwork.sourceware.org/project/gcc/patch/20230407013701.129875-1-juzhe.zh...@rivai.ai/
 

I will resend a separate patch containing only the middle-end changes for
WHILE_LEN pattern support.
Please ignore this series of patches.

Thanks!


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-07 00:04
To: juzhe.zhong
CC: gcc-patches; palmer; richard.sandiford; rguenther; jeffreyalaw
Subject: Re: [PATCH 2/3] RISC-V: Enable basic RVV auto-vectorization and support WHILE_LEN/LEN_LOAD/LEN_STORE pattern
Are the changes to riscv-vsetvl.cc necessary for autovec, or are they an
additional optimization for the autovec use case? If it's the latter, I
would suggest splitting them out.

Please also split the fixed-vlmax part out into a separate patch; that
would be easier to review.
 
On Thu, Apr 6, 2023 at 10:44 PM  wrote:
>
> From: Juzhe-Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add 
> compile option for RVV auto-vectorization.
> (enum riscv_autovec_lmul_enum): Ditto.
> * config/riscv/riscv-protos.h (get_vector_mode): Remove unused global 
> function.
> (preferred_simd_mode): Enable basic auto-vectorization for RVV.
> (expand_while_len): Enable while_len pattern.
> * config/riscv/riscv-v.cc (get_avl_type_rtx): Ditto.
> (autovec_use_vlmax_p): New function.
> (preferred_simd_mode): New function.
> (expand_while_len): Ditto.
> * config/riscv/riscv-vector-switch.def (ENTRY): Disable SEW = 64 for 
> MIN_VLEN > 32 but EEW = 32.
 
Is this a bug fix? If so, please send it as a separate patch.
 
> * config/riscv/riscv-vsetvl.cc (get_all_successors): New function.
> (get_all_overlap_blocks): Ditto.
> (local_eliminate_vsetvl_insn): Ditto.
> (vector_insn_info::skip_avl_compatible_p): Ditto.
> (vector_insn_info::merge): Ditto.
> (pass_vsetvl::compute_local_backward_infos): Enhance VSETVL PASS for 
> RVV auto-vectorization.
> (pass_vsetvl::global_eliminate_vsetvl_p): Ditto.
> (pass_vsetvl::cleanup_insns): Ditto.
> * config/riscv/riscv-vsetvl.h: Ditto.
> * config/riscv/riscv.cc (riscv_convert_vector_bits): Add basic RVV 
> auto-vectorization support.
> (riscv_preferred_simd_mode): Ditto.
> (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
> * config/riscv/riscv.opt: Add compile option.
> * config/riscv/vector.md: Add RVV auto-vectorization.
> * config/riscv/autovec.md: New file.
>
> ---
>  gcc/config/riscv/autovec.md  |  63 +++
>  gcc/config/riscv/riscv-opts.h|  16 ++
>  gcc/config/riscv/riscv-protos.h  |   3 +-
>  gcc/config/riscv/riscv-v.cc  |  61 ++-
>  gcc/config/riscv/riscv-vector-switch.def |  47 +++--
>  gcc/config/riscv/riscv-vsetvl.cc | 210 ++-
>  gcc/config/riscv/riscv-vsetvl.h  |   1 +
>  gcc/config/riscv/riscv.cc|  34 +++-
>  gcc/config/riscv/riscv.opt   |  40 +
>  gcc/config/riscv/vector.md   |   6 +-
>  10 files changed, 457 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/riscv/autovec.md
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> new file mode 100644
> index 000..ff616d81586
> --- /dev/null
> +++ b/gcc/config/riscv/autovec.md
> @@ -0,0 +1,63 @@
> +;; Machine description for auto-vectorization using RVV for GNU compiler.
> +;; Copyright (C) 2023-2023 Free Software Foundation, Inc.
 
2023 rather than 2023-2023
 
> +;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; .
> +
> +;; =
> +;; == While_len
> +;; 

[PATCH] RISC-V: Add testcases for RVV auto-vectorization

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add auto-vectorization testing.
* gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-3.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.


---
 .../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-1.h   | 304 ++
 .../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-2.h   | 546 ++
 .../autovec/partial/multiple_rgroup_run-1.c   |  19 +
 .../autovec/partial/multiple_rgroup_run-2.c   |  19 +
 .../rvv/autovec/partial/single_rgroup-1.c |   8 +
 .../rvv/autovec/partial/single_rgroup-1.h | 106 
 .../rvv/autovec/partial/single_rgroup_run-1.c |  19 +
 .../gcc.target/riscv/rvv/autovec/template-1.h |  68 +++
 .../gcc.target/riscv/rvv/autovec/v-1.c|   4 +
 .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   5 +
 .../gcc.target/riscv/rvv/autovec/zve32f-3.c   |   6 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-3.c   |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   5 +
 .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-3.c   |   6 +
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-3.c   |   6 +
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-3.c   |   6 +
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  16 +
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-17.c   |   2 +-
 39 files changed, 1252 insertions(+), 1 deletion(-)
 create mode 100644 

[PATCH] RISC-V: Add local user vsetvl instruction elimination

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch improves auto-vectorization code generation by eliminating a redundant local vsetvl instruction.

Before this patch:

Loop:
vsetvl a5,a2...
vsetvl zero,a5...
vle

After this patch:

Loop:
vsetvl a5,a2
vle
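
The transformation can be modeled roughly as follows. This is a minimal sketch, not the pass itself: the `Insn` record, `local_eliminate`, and the string comparison of vtypes are hypothetical simplifications of `local_eliminate_vsetvl_insn` and `skip_avl_compatible_p`, assuming the compatibility test reduces to identical vtype settings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, much-simplified model of the instructions in one basic block.
struct Insn {
  enum Kind { Vsetvl, VsetvlZero, VectorOp, Other } kind;
  std::string def;    // register written by the insn (Vsetvl: the new VL)
  std::string avl;    // AVL operand of Vsetvl / VsetvlZero
  std::string vtype;  // e.g. "e32,m1" for Vsetvl / VsetvlZero
  bool clobbers_vl_or_vtype = false;  // stands in for calls, asm, VL/VTYPE defs
};

// If bb[i] is "vsetvl aN,...", find the first vector op after it.  When that
// op is configured by "vsetvl zero,aN,..." with the same vtype and aN still
// holds bb[i]'s result, the "vsetvl zero" is redundant and is erased.
// Returns true if an instruction was removed.
bool local_eliminate(std::vector<Insn>& bb, std::size_t i) {
  assert(i < bb.size());
  if (bb[i].kind != Insn::Vsetvl)
    return false;
  const std::string vl = bb[i].def;
  for (std::size_t j = i + 1; j < bb.size(); ++j) {
    const Insn& cur = bb[j];
    if (cur.clobbers_vl_or_vtype)
      return false;
    if (cur.kind == Insn::VectorOp) {
      // Only the instruction right before the vector op is inspected,
      // loosely mirroring the patch's use of PREV_INSN.
      const Insn& prev = bb[j - 1];
      if (prev.kind != Insn::VsetvlZero || prev.avl != vl)
        return false;
      // Simplified stand-in for skip_avl_compatible_p: equal vtypes.
      if (prev.vtype != bb[i].vtype)
        return false;
      bb.erase(bb.begin() + static_cast<std::ptrdiff_t>(j - 1));
      return true;
    }
    // Another write to aN means it no longer holds bb[i]'s result.
    if (cur.def == vl)
      return false;
  }
  return false;
}
```

With the loop from the commit message, the `vsetvl zero,a5,...` between `vsetvl a5,a2,...` and `vle` is removed. The real pass also merges the two demands with `merge (..., LOCAL_MERGE)` and rewrites the user vsetvl through `change_vsetvl_insn`, which this model skips.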

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New 
function.
(vector_insn_info::skip_avl_compatible_p): Ditto.
(vector_insn_info::merge): Remove default value.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
* config/riscv/riscv-vsetvl.h: Ditto.

---
 gcc/config/riscv/riscv-vsetvl.cc | 71 +++-
 gcc/config/riscv/riscv-vsetvl.h  |  1 +
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 7e8a5376705..b402035f7a5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1054,6 +1054,51 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info )
   change_insn (rinsn, new_pat);
 }
 
+static void
+local_eliminate_vsetvl_insn (const vector_insn_info &dem)
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+{
+  rtx vl = get_vl (rinsn);
+  for (insn_info *i = insn->next_nondebug_insn ();
+  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+   {
+ if (i->is_call () || i->is_asm ()
+ || find_access (i->defs (), VL_REGNUM)
+ || find_access (i->defs (), VTYPE_REGNUM))
+   return;
+
+ if (has_vtype_op (i->rtl ()))
+   {
+ if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl ())))
+   return;
+ rtx avl = get_avl (i->rtl ());
+ if (avl != vl)
+   return;
+ set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+ if (def->insn () != insn)
+   return;
+
+ vector_insn_info new_info;
+ new_info.parse_insn (i);
+ if (!new_info.skip_avl_compatible_p (dem))
+   return;
+
+ new_info.set_avl_info (dem.get_avl_info ());
+ new_info = dem.merge (new_info, LOCAL_MERGE);
+ change_vsetvl_insn (insn, new_info);
+ eliminate_insn (PREV_INSN (i->rtl ()));
+ return;
+   }
+   }
+}
+}
+
 static bool
 source_equal_p (insn_info *insn1, insn_info *insn2)
 {
@@ -1984,6 +2029,19 @@ vector_insn_info::compatible_p (const vector_insn_info ) const
   return true;
 }
 
+bool
+vector_insn_info::skip_avl_compatible_p (const vector_insn_info &other) const
+{
+  gcc_assert (valid_or_dirty_p () && other.valid_or_dirty_p ()
+ && "Can't compare invalid demanded infos");
+  unsigned array_size = sizeof (incompatible_conds) / sizeof (demands_cond);
+  /* Bypass AVL incompatible cases.  */
+  for (unsigned i = 1; i < array_size; i++)
+if (incompatible_conds[i].dual_incompatible_p (*this, other))
+  return false;
+  return true;
+}
+
 bool
 vector_insn_info::compatible_avl_p (const vl_vtype_info ) const
 {
@@ -2178,7 +2236,7 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info ,
 
 vector_insn_info
 vector_insn_info::merge (const vector_insn_info &merge_info,
-enum merge_type type = LOCAL_MERGE) const
+enum merge_type type) const
 {
   if (!vsetvl_insn_p (get_insn ()->rtl ()))
 gcc_assert (this->compatible_p (merge_info)
@@ -2716,7 +2774,7 @@ pass_vsetvl::compute_local_backward_infos (const bb_info *bb)
&& !reg_available_p (insn, change))
  && change.compatible_p (info))
{
- info = change.merge (info);
+ info = change.merge (info, LOCAL_MERGE);
  /* Fix PR109399, we should update user vsetvl instruction
 if there is a change in demand fusion.  */
  if (vsetvl_insn_p (insn->rtl ()))
@@ -3998,6 +4056,15 @@ pass_vsetvl::cleanup_insns (void) const
   for (insn_info *insn : bb->real_nondebug_insns ())
{
  rtx_insn *rinsn = insn->rtl ();
+ const auto &dem = m_vector_manager->vector_insn_infos[insn->uid ()];
+ /* Eliminate local vsetvl:
+  bb 0:
+  vsetvl a5,a6,...
+  vsetvl zero,a5.
+
+Eliminate vsetvl in bb2 when a5 is only coming from
+bb 0.  */
+ local_eliminate_vsetvl_insn (dem);
 
  if (vlmax_avl_insn_p (rinsn))
{
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index d05472c86a0..d7a6c14e931 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -380,6 +380,7 @@ public:
   void fuse_mask_policy (const vector_insn_info &, const 

[PATCH] RISC-V: Enable basic RVV auto-vectorization support

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

Enable basic auto-vectorization support for WHILE_LEN/LEN_LOAD/LEN_STORE.

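As a rough illustration of how these three operations fit together in a length-controlled loop, here is a scalar C++ model. It is a sketch under assumptions: `kVF`, `while_len`, `len_load`, `len_store` and `add_one` are hypothetical names, and WHILE_LEN is assumed to return min(remaining, VF); on RVV the actual length comes from vsetvl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t kVF = 4;  // assumed number of elements per vector

// Assumed WHILE_LEN semantics: number of elements to process this iteration.
std::size_t while_len(std::size_t remaining, std::size_t vf) {
  return std::min(remaining, vf);
}

// LEN_LOAD: fill only the first `len` lanes from memory.
void len_load(int (&vreg)[kVF], const int* src, std::size_t len) {
  assert(len <= kVF);
  for (std::size_t k = 0; k < len; ++k)
    vreg[k] = src[k];
}

// LEN_STORE: write only the first `len` lanes back to memory.
void len_store(int* dst, const int (&vreg)[kVF], std::size_t len) {
  assert(len <= kVF);
  for (std::size_t k = 0; k < len; ++k)
    dst[k] = vreg[k];
}

// Length-controlled form of: for (i = 0; i < n; ++i) dst[i] = src[i] + 1;
void add_one(int* dst, const int* src, std::size_t n) {
  for (std::size_t i = 0; i < n;) {
    const std::size_t len = while_len(n - i, kVF);
    int v[kVF];
    len_load(v, src + i, len);
    for (std::size_t k = 0; k < len; ++k)  // the "vector" add on active lanes
      v[k] += 1;
    len_store(dst + i, v, len);
    i += len;
  }
}
```

Because the final iteration simply runs with a shorter length, no scalar epilogue loop is needed; the real expanders emit RVV instructions rather than these element loops.
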
gcc/ChangeLog:

* config/riscv/riscv-protos.h (preferred_simd_mode): New function.
(expand_while_len): Ditto.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Ditto.
(preferred_simd_mode): Ditto.
(expand_while_len): Ditto.
* config/riscv/riscv.cc (riscv_convert_vector_bits): Add basic 
auto-vectorization support.
(riscv_preferred_simd_mode): New function.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): New targethook for RVV 
auto-vectorization support.
* config/riscv/vector.md: Add basic autovec.
* config/riscv/autovec.md: New file.

---
 gcc/config/riscv/autovec.md | 63 ++
 gcc/config/riscv/riscv-protos.h |  2 +
 gcc/config/riscv/riscv-v.cc | 78 +
 gcc/config/riscv/riscv.cc   | 24 +-
 gcc/config/riscv/vector.md  |  4 +-
 5 files changed, 169 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/autovec.md

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
new file mode 100644
index 000..34561383041
--- /dev/null
+++ b/gcc/config/riscv/autovec.md
@@ -0,0 +1,63 @@
+;; Machine description for auto-vectorization using RVV for GNU compiler.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; =
+;; == While_len
+;; =
+
+(define_expand "while_len<mode>"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:P 1 "vector_length_operand")
+   (match_operand:P 2 "")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_while_len (operands);
+  DONE;
+})
+
+;; =
+;; == Loads/Stores
+;; =
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization support.
+;; We will replace them when len_maskload/len_maskstore is supported in loop vectorizer.
+(define_expand "len_load_<mode>"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (<MODE>mode), operands[0],
+ operands[1], operands[2], <MODE>mode);
+  DONE;
+})
+
+(define_expand "len_store_<mode>"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (<MODE>mode), operands[0],
+ operands[1], operands[2], <MODE>mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4611447ddde..6cd91987199 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -206,6 +206,8 @@ enum vlen_enum
 bool slide1_sew64_helper (int, machine_mode, machine_mode,
  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
+machine_mode preferred_simd_mode (scalar_mode);
+void expand_while_len (rtx *);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index ed3c5e0756f..84d33fcdd14 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -43,6 +43,7 @@
 #include "optabs.h"
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
+#include "targhooks.h"
 
 using namespace riscv_vector;
 
@@ -729,4 +730,81 @@ gen_avl_for_scalar_move (rtx avl)
 }
 }
 
+/* SCALABLE means that the vector-length is agnostic (run-time invariant and
+   compile-time unknown). FIXED means that the vector-length is specific
+   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
+   auto-vectorization using VLMAX vsetvl configuration.  */
+static bool
+autovec_use_vlmax_p (void)
+{
+  return riscv_autovec_preference == RVV_SCALABLE
+|| riscv_autovec_preference == 

[PATCH] RISC-V: Add RVV auto-vectorization compile option

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch adds the compile options used by the next patch, which enables
basic RVV auto-vectorization: VLA auto-vectorization (RVV_SCALABLE) and
fixed-length VLS auto-vectorization (RVV_FIXED_VLMAX).

We will support RVV_FIXED_VLMIN in the future.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add 
RVV auto-vectorization compile option.
(enum riscv_autovec_lmul_enum): Ditto.
* config/riscv/riscv.opt: Ditto.

---
 gcc/config/riscv/riscv-opts.h | 15 ++
 gcc/config/riscv/riscv.opt| 37 +++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..4207db240ea 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,21 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RISC-V auto-vectorization preference.  */
+enum riscv_autovec_preference_enum {
+  NO_AUTOVEC,
+  RVV_SCALABLE,
+  RVV_FIXED_VLMAX
+};
+
+/* RISC-V auto-vectorization RVV LMUL.  */
+enum riscv_autovec_lmul_enum {
+  RVV_M1 = 1,
+  RVV_M2 = 2,
+  RVV_M4 = 4,
+  RVV_M8 = 8
+};
+
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..ef1bdfcfe28 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -254,3 +254,40 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
 misa-spec=
 Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
 Set the version of RISC-V ISA spec.
+
+Enum
+Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
+The RISC-V auto-vectorization preference:
+
+EnumValue
+Enum(riscv_autovec_preference) String(none) Value(NO_AUTOVEC)
+
+EnumValue
+Enum(riscv_autovec_preference) String(scalable) Value(RVV_SCALABLE)
+
+EnumValue
+Enum(riscv_autovec_preference) String(fixed-vlmax) Value(RVV_FIXED_VLMAX)
+
+-param=riscv-autovec-preference=
+Target RejectNegative Joined Enum(riscv_autovec_preference) Var(riscv_autovec_preference) Init(NO_AUTOVEC)
+-param=riscv-autovec-preference=   Set the preference of auto-vectorization in the RISC-V port.
+
+Enum
+Name(riscv_autovec_lmul) Type(enum riscv_autovec_lmul_enum)
+The RVV possible LMUL:
+
+EnumValue
+Enum(riscv_autovec_lmul) String(m1) Value(RVV_M1)
+
+EnumValue
+Enum(riscv_autovec_lmul) String(m2) Value(RVV_M2)
+
+EnumValue
+Enum(riscv_autovec_lmul) String(m4) Value(RVV_M4)
+
+EnumValue
+Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
+
+-param=riscv-autovec-lmul=
+Target RejectNegative Joined Enum(riscv_autovec_lmul) Var(riscv_autovec_lmul) Init(RVV_M1)
+-param=riscv-autovec-lmul= Set the RVV LMUL of auto-vectorization in the RISC-V port.
-- 
2.36.3
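
The two enums and the `String()`/`Value()` tables in this patch can be mirrored in C++ to show what each `--param` value selects. This is a hypothetical sketch: GCC generates its own option-handling code from riscv.opt, and `parse_preference`, `parse_lmul` and `vlmax` are illustrative names. `vlmax` applies the RVV relation VLMAX = LMUL * VLEN / SEW to show what a larger LMUL provides.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Copies of the enums this patch adds to riscv-opts.h.
enum riscv_autovec_preference_enum { NO_AUTOVEC, RVV_SCALABLE, RVV_FIXED_VLMAX };
enum riscv_autovec_lmul_enum { RVV_M1 = 1, RVV_M2 = 2, RVV_M4 = 4, RVV_M8 = 8 };

// Hypothetical parser mirroring the EnumValue entries of
// --param=riscv-autovec-preference=.
std::optional<riscv_autovec_preference_enum> parse_preference(std::string_view s) {
  if (s == "none") return NO_AUTOVEC;
  if (s == "scalable") return RVV_SCALABLE;
  if (s == "fixed-vlmax") return RVV_FIXED_VLMAX;
  return std::nullopt;  // not listed in riscv.opt
}

// Hypothetical parser mirroring the EnumValue entries of
// --param=riscv-autovec-lmul=.
std::optional<riscv_autovec_lmul_enum> parse_lmul(std::string_view s) {
  if (s == "m1") return RVV_M1;
  if (s == "m2") return RVV_M2;
  if (s == "m4") return RVV_M4;
  if (s == "m8") return RVV_M8;
  return std::nullopt;
}

// Elements in one register group: VLMAX = LMUL * VLEN / SEW.
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, riscv_autovec_lmul_enum lmul) {
  assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
  return vlen_bits / sew_bits * static_cast<unsigned>(lmul);
}
```

Both options default to `none` and `m1` (`Init(NO_AUTOVEC)` and `Init(RVV_M1)`), so auto-vectorization stays off unless it is requested explicitly.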



[PATCH] RISC-V: Fix incorrect condition of EEW = 64 mode

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch should be merged before this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614935.html

According to the RVV ISA, EEW = 64 is enabled only when -march=*zve64*.
The current condition is incorrect, since -march=*zve32*_zvl64b would
enable EEW = 64.
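
The difference between the old and new conditions can be seen with a small model. It is a sketch only: `VectorConfig` and the two predicates are hypothetical, and TARGET_VECTOR_ELEN_64 is modeled as ELEN >= 64. A Zve32x configuration with Zvl64b has ELEN = 32 but a minimum VLEN of 64.

```cpp
#include <cassert>

// Hypothetical summary of the vector configuration selected by -march.
struct VectorConfig {
  unsigned elen;      // widest supported element, in bits (32 or 64)
  unsigned min_vlen;  // minimum VLEN in bits (from Zvl*b)
};

// Old condition for VNx2DI/VNx4DI/VNx8DI: TARGET_MIN_VLEN > 32.
bool sew64_enabled_old(const VectorConfig& c) { return c.min_vlen > 32; }

// New condition: TARGET_VECTOR_ELEN_64, modeled here as ELEN >= 64.
bool sew64_enabled_new(const VectorConfig& c) { return c.elen >= 64; }
```

For a Zve32x plus Zvl64b configuration, the old predicate enables 64-bit element modes although Zve32x has no 64-bit elements; the new predicate does not, while Zve64x configurations are unaffected.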

gcc/ChangeLog:

* config/riscv/riscv-vector-switch.def (ENTRY): Change to 
TARGET_VECTOR_ELEN_64.

---
 gcc/config/riscv/riscv-vector-switch.def | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-switch.def b/gcc/config/riscv/riscv-vector-switch.def
index bfb591773dc..b772c282769 100644
--- a/gcc/config/riscv/riscv-vector-switch.def
+++ b/gcc/config/riscv/riscv-vector-switch.def
@@ -187,11 +187,11 @@ ENTRY (VNx1SF, TARGET_VECTOR_FP32 && TARGET_MIN_VLEN < 128, LMUL_1, 32, LMUL_F2,
For double-precision floating-point, we need TARGET_VECTOR_FP64 ==
RVV_ENABLE.  */
 /* SEW = 64. Disable VNx1DImode/VNx1DFmode when TARGET_MIN_VLEN >= 128.  */
-ENTRY (VNx16DI, TARGET_MIN_VLEN >= 128, LMUL_RESERVED, 0, LMUL_RESERVED, 0, LMUL_8, 8)
-ENTRY (VNx8DI, TARGET_MIN_VLEN > 32, LMUL_RESERVED, 0, LMUL_8, 8, LMUL_4, 16)
-ENTRY (VNx4DI, TARGET_MIN_VLEN > 32, LMUL_RESERVED, 0, LMUL_4, 16, LMUL_2, 32)
-ENTRY (VNx2DI, TARGET_MIN_VLEN > 32, LMUL_RESERVED, 0, LMUL_2, 32, LMUL_1, 64)
-ENTRY (VNx1DI, TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128, LMUL_RESERVED, 0, LMUL_1, 64, LMUL_RESERVED, 0)
+ENTRY (VNx16DI, TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128, LMUL_RESERVED, 0, LMUL_RESERVED, 0, LMUL_8, 8)
+ENTRY (VNx8DI, TARGET_VECTOR_ELEN_64, LMUL_RESERVED, 0, LMUL_8, 8, LMUL_4, 16)
+ENTRY (VNx4DI, TARGET_VECTOR_ELEN_64, LMUL_RESERVED, 0, LMUL_4, 16, LMUL_2, 32)
+ENTRY (VNx2DI, TARGET_VECTOR_ELEN_64, LMUL_RESERVED, 0, LMUL_2, 32, LMUL_1, 64)
+ENTRY (VNx1DI, TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128, LMUL_RESERVED, 0, LMUL_1, 64, LMUL_RESERVED, 0)
 
 ENTRY (VNx16DF, TARGET_VECTOR_FP64 && (TARGET_MIN_VLEN >= 128), LMUL_RESERVED, 0, LMUL_RESERVED, 0, LMUL_8, 8)
 ENTRY (VNx8DF, TARGET_VECTOR_FP64 && (TARGET_MIN_VLEN > 32), LMUL_RESERVED, 0,
-- 
2.36.3



Re: [PATCHv4] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2023-04-06 Thread Andrew Pinski via Gcc-patches
On Tue, Apr 4, 2023 at 10:48 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Andrew Pinski via Gcc-patches  writes:
> > The problem here is that aarch64_expand_setmem does not change the alignment
> > for strict alignment case.
> > This is version 4 of the fix; the major change from the last version is fixing
> > the way store pairs are handled, which allows storing two SImode values
> > at a time.
> > This also adds a testcase to show a case with -mstrict-align we can do
> > the store word pair stores.
>
> Heh.  The patch seems to be getting more complicated. :-)

Note I am no longer working on this patch. Review cycles for this
patch are just too long for me.

Thanks,
Andrew Pinski
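
The constraint that the quoted patch deals with can be illustrated independently of the expander: with -mstrict-align every store must be naturally aligned, so the known alignment of the destination caps the store width. The sketch below is hypothetical and is not the algorithm in aarch64_expand_setmem; it assumes a power-of-two alignment and a 16-byte widest store, and only lists the store widths a strictly aligned memset could use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: widths (in bytes) of the stores used to clear `size`
// bytes starting at an address aligned to `align`.  Widths never exceed
// `align` and are emitted widest first, so every store starts at a multiple
// of its own width and is therefore naturally aligned.
std::vector<std::size_t> strict_align_store_widths(std::size_t size,
                                                   std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);  // power of two
  std::vector<std::size_t> widths;
  const std::size_t widest = align < 16 ? align : 16;  // assumed 16-byte max
  for (std::size_t w = widest; w > 0; w /= 2)
    while (size >= w) {
      widths.push_back(w);
      size -= w;
    }
  return widths;
}
```

For a 207-byte buffer with 4-byte alignment this gives 51 four-byte stores followed by a 2-byte and a 1-byte store; runs of equal 4-byte entries are where an SImode store pair (the `gen_store_pair_sw_sisi` case the patch adds) could halve the number of instructions.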

>
> > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> >
> >   PR target/103100
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc (aarch64_gen_store_pair):
> >   Add support for SImode.
> >   (aarch64_set_one_block_and_progress_pointer):
> >   Add use_pairs argument; rewrite and simplify the
> >   code.
> >   (aarch64_can_use_pair_load_stores): New function.
> >   (aarch64_expand_setmem): Rewrite mode selection to
> >   better handle strict alignment and non ld/stp pair case.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/memset-strict-align-1.c: Update test.
> >   Reduce the size down to 207 and make s1 global and aligned
> >   to 16 bytes.
> >   * gcc.target/aarch64/memset-strict-align-2.c: New test.
> >   * gcc.target/aarch64/memset-strict-align-3.c: New test.
> > ---
> >  gcc/config/aarch64/aarch64.cc | 136 ++
> >  .../aarch64/memset-strict-align-1.c   |  19 ++-
> >  .../aarch64/memset-strict-align-2.c   |  14 ++
> >  .../aarch64/memset-strict-align-3.c   |  15 ++
> >  4 files changed, 113 insertions(+), 71 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-3.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 5c40b6ed22a..3eaf9bd608a 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -8850,6 +8850,9 @@ aarch64_gen_store_pair (machine_mode mode, rtx mem1, rtx reg1, rtx mem2,
> >  {
> >switch (mode)
> >  {
> > +case E_SImode:
> > +  return gen_store_pair_sw_sisi (mem1, reg1, mem2, reg2);
> > +
> >  case E_DImode:
> >return gen_store_pair_dw_didi (mem1, reg1, mem2, reg2);
> >
> > @@ -24896,42 +24899,49 @@ aarch64_expand_cpymem (rtx *operands)
> > SRC is a register we have created with the duplicated value to be set.  */
> >  static void
> >  aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
> > - machine_mode mode)
> > + machine_mode mode, bool use_pairs)
>
> It would be good to update the comment, since this no longer matches
> the aarch64_copy_one_block_and_progress_pointers interface very closely.
>
> >  {
> > +  rtx reg = src;
> >/* If we are copying 128bits or 256bits, we can do that straight from
> >   the SIMD register we prepared.  */
> > -  if (known_eq (GET_MODE_BITSIZE (mode), 256))
> > -{
> > -  mode = GET_MODE (src);
> > -  /* "Cast" the *dst to the correct mode.  */
> > -  *dst = adjust_address (*dst, mode, 0);
> > -  /* Emit the memset.  */
> > -  emit_insn (aarch64_gen_store_pair (mode, *dst, src,
> > -  aarch64_progress_pointer (*dst), src));
> > -
> > -  /* Move the pointers forward.  */
> > -  *dst = aarch64_move_pointer (*dst, 32);
> > -  return;
> > -}
> >if (known_eq (GET_MODE_BITSIZE (mode), 128))
> > -{
> > -  /* "Cast" the *dst to the correct mode.  */
> > -  *dst = adjust_address (*dst, GET_MODE (src), 0);
> > -  /* Emit the memset.  */
> > -  emit_move_insn (*dst, src);
> > -  /* Move the pointers forward.  */
> > -  *dst = aarch64_move_pointer (*dst, 16);
> > -  return;
> > -}
> > -  /* For copying less, we have to extract the right amount from src.  */
> > -  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
> > +mode = GET_MODE(src);
>
> Nit: space before "(".
>
> > +  else
> > +/* For copying less, we have to extract the right amount from src.  */
> > +reg = lowpart_subreg (mode, src, GET_MODE (src));
> >
> >/* "Cast" the *dst to the correct mode.  */
> >*dst = adjust_address (*dst, mode, 0);
> >/* Emit the memset.  */
> > -  emit_move_insn (*dst, reg);
> > +  if (use_pairs)
> > +emit_insn (aarch64_gen_store_pair (mode, *dst, reg,
> > +aarch64_progress_pointer (*dst),
> > +reg));
> > +  else
> > +emit_move_insn (*dst, reg);
> > +
> >/* Move the pointer 

Re: 'g++.dg/modules/modules.exp': don't leak local 'unsupported' proc [PR108899]

2023-04-06 Thread Alexandre Oliva via Gcc-patches
On Apr  6, 2023, Thomas Schwinge  wrote:

> Eh, given your "Ooh, nice, I didn't know [...]" comment in
> :

Oh my, you're right, I apologize, I misremembered.  When I wrote "before
I saw your patch" yesterday, I meant the formal, already-tested patch
submission, that I recall seeing while I tested the patchlet I'd posted.
I forgot you had included that patch also in the initial report, but I
see it there too.
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614884.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614880.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614857.html

I learned that tcl trick from you indeed, and that much I remember
clearly: I've long sought but failed to find a way to do that.

Alas, for some reason, I had a misrecollection that you had merely
recommended using that trick, instead of including an actual patch, in
the report I claimed to have based the patch on.  I suppose I may have
drawn that wrong conclusion from my having set out to write a patch
myself, instead of recommending the approval of yours.  That, in turn,
was presumably because there was an additional issue that needed fixing,
and that you had asked me to look into.  Anyhow, it's probably a safe
bet that I based our patch on yours indeed, but I wouldn't be able to
confirm or deny it either way: those details have unfortunately faded
away from my memory.

Anyway, it was based on the misrecollection that I stated "before even
seeing your patch", and I acknowledge that I was wrong, and probably
also overthinking the whole issue ;-)

Please accept my embarrassed apologies.  I think I had better memory
when I was younger, but I'm not really sure, I can't recall ;-D

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[Bug sanitizer/109444] Possible array overflow without diagnosis in memcpy if called within a virtual method scenario

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109444

--- Comment #1 from Andrew Pinski  ---
There are padding bytes in Foo: because Foo contains a vtable pointer, its
alignment must be the same as a pointer's.
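
This can be checked with a cut-down copy of the report's classes. The sketch assumes the Itanium C++ ABI used by GCC on Linux (vptr first, then `b`), and `foo_tail_padding` is a hypothetical helper. Since `txt` occupies 8 bytes and `dst` holds 6, the memcpy writes 2 bytes past `dst`; in `Foo` those bytes presumably land in tail padding that is still inside the object, so AddressSanitizer sees no redzone access, while `Foo2` is exactly 6 bytes long.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Bar { std::int8_t dst[6]; };

struct Base {
  virtual ~Base() = default;
  virtual void func() = 0;
};

struct Foo : Base {  // scenario 1: has a vptr
  void func() override {}
  Bar b{};
};

struct Foo2 {  // scenario 2: no vptr
  Bar b{};
};

constexpr std::size_t kCopied = sizeof("1234567");             // 8 bytes
constexpr std::size_t kOverflow = kCopied - sizeof(Bar::dst);  // 2 bytes

// ASSUMPTION (Itanium C++ ABI): the vptr is at offset 0 and b follows it
// directly, so whatever remains of sizeof(Foo) is tail padding.
constexpr std::size_t foo_tail_padding() {
  return sizeof(Foo) - sizeof(void*) - sizeof(Bar);
}
```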

[Bug sanitizer/109444] New: Possible array overflow without diagnosis in memcpy if called within a virtual method scenario

2023-04-06 Thread mohamed.selim at dxc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109444

Bug ID: 109444
   Summary: Possible array overflow without diagnosis in memcpy if
called within a virtual method scenario
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mohamed.selim at dxc dot com
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

std::memcpy can overflow the destination array without triggering the expected
sanitizer diagnostic when memcpy is used in a virtual method scenario
(scenario 1).

In scenario 2, where std::memcpy is called from a normal (non-virtual) method,
the overflow is diagnosed as expected.


#include <cstdint>
#include <cstring>
#include <iostream>

// 8-byte string literal: 7 characters plus the terminating NUL
const char txt[] = "1234567";

class Bar
{
 public:
  constexpr Bar() : dst{}
  {
  }
  std::int8_t dst[6];
};

void test(Bar& b)
{
std::cout << "starting memcpy.\n";

std::cout << "size of bytes to be copied: " << sizeof(txt) <<"\n";
std::cout << "dst array size: " << sizeof(b.dst) << "\n";
std::memcpy(b.dst, txt, sizeof(txt));
}

class Base
{
public:
virtual ~Base() = default;
virtual void func() = 0;

};

// 1 - Foo inherits Base, virtual method implementation
class Foo: public Base
{
public:
void func() override
{
test(b);
}

private:
Bar b{};
};

// 2 - no inheritance
class Foo2
{
public:
void func()
{
test(b);
}

private:
Bar b{};
};

// -std=c++14 -fsanitize=address -fsanitize=undefined -static-libasan -static-libubsan

int main()
{
Foo c{}; // scenario 1, no sanitizer diagnosis
//Foo2 c{}; // scenario 2, triggers sanitizer diagnosis
c.func();
return 0;
}

gcc-10-20230406 is now available

2023-04-06 Thread GCC Administrator via Gcc
Snapshot gcc-10-20230406 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20230406/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision a23627d8af71214e27798a2083a337b98b284f3e

You'll find:

 gcc-10-20230406.tar.xz   Complete GCC

  SHA256=c31397f723e0638ef2467f0e945c00cc58c6deb4524a67a56196f9185942d994
  SHA1=cd9f72a518bd5b3329a1eb9cc8b2d60daf8a56d8

Diffs from 10-20230330 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug c++/109356] Enhancement idea to provide clearer missing brace line number

2023-04-06 Thread jg at jguk dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109356

--- Comment #5 from Jonny Grant  ---
I see it is more complicated than I imagined. Thank you for looking into it.

Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds

2023-04-06 Thread David Malcolm via Gcc
On Thu, 2023-04-06 at 13:02 +0200, Benjamin Priour wrote:
> Hi David,
> I haven't yet looked into your suggestions; I probably won't have time
> until tomorrow, actually :/
> Still, here are some updates:
> 
> On Thu, Apr 6, 2023 at 2:32 AM David Malcolm 
> wrote:
> 
> > On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> > > Hi David,
> 

[...snip...]

> > There's quite a bit of duplication here.  My recollection is that
> > there's code in the analyzer that's meant to be eliminating some of
> > this e.g. we want to show the OOB when consecutive_oob_in_frame is
> > called directly; we *don't* want to show it when
> > consecutive_oob_in_frame is called by goo.  Perhaps this deduplication
> > code isn't working?  Can you reproduce similar behavior with C, or is
> > it specific to C++?
> > 
> > 
> The behavior is identical in C and C++. I will look at this code; any
> hint as to where it starts?
> Otherwise I will find it the good old way.

It's in diagnostic-manager.cc:
diagnostic_manager::emit_saved_diagnostics uses class dedupe_winners to
try to deduplicate the various saved_diagnostic instances, using
dedupe_key::operator== (and also its hash function).  One of the things
dedupe_key::operator== uses is saved_diagnostic::operator==, which does
the bulk of the work to decide "are these two really the 'same'
diagnostic?"
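
The general shape of this key-based deduplication is sketched below. It is a simplified, hypothetical model rather than the analyzer's code: `SavedDiag`, `DedupeKey` and `dedupe` are illustrative names, the key compares only the warning kind and location, and keeping the diagnostic with the shortest path is an assumption about the selection policy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a saved diagnostic.
struct SavedDiag {
  std::string kind;         // e.g. "out-of-bounds"
  std::string location;     // statement the warning points at
  std::size_t path_length;  // number of events in its diagnostic path
};

// Two diagnostics with equal keys are treated as "the same" warning.
struct DedupeKey {
  std::string kind;
  std::string location;
  bool operator==(const DedupeKey& other) const {
    return kind == other.kind && location == other.location;
  }
};

struct DedupeKeyHash {
  std::size_t operator()(const DedupeKey& k) const {
    const std::size_t h1 = std::hash<std::string>{}(k.kind);
    const std::size_t h2 = std::hash<std::string>{}(k.location);
    return h1 ^ (h2 << 1);
  }
};

// Keep one winner per key, preferring the diagnostic with the shortest path.
std::vector<SavedDiag> dedupe(const std::vector<SavedDiag>& saved) {
  std::unordered_map<DedupeKey, SavedDiag, DedupeKeyHash> winners;
  for (const SavedDiag& sd : saved) {
    DedupeKey key{sd.kind, sd.location};
    auto it = winners.find(key);
    if (it == winners.end())
      winners.emplace(key, sd);
    else if (sd.path_length < it->second.path_length)
      it->second = sd;
  }
  std::vector<SavedDiag> out;
  out.reserve(winners.size());
  for (const auto& entry : winners)
    out.push_back(entry.second);
  return out;
}
```

The real `saved_diagnostic::operator==` compares much more than a kind and a location, which is where duplicates such as the repeated OOB warnings could slip through.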


> > > 
> > > First, as the subject line reads, I get a
> > > -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-bounds.
> > > I feel it might be too much, as fixing the OOB would fix the former.
> > > So maybe only the OOB should result in a warning here?
> > 
> > Yes, that's a good point.  Can you file a bug about this in bugzilla
> > please?  (and feel free to assign it to yourself if you want to have a
> > go at fixing it)
> > 
> 
> Unfortunately the Assignee field is grayed out for me in both
> enter_bug.cgi and show_bug.cgi.
> I've also created a new tracker bug for out-of-bounds, as there are a
> number of related bugs.

I think you get more access rights in our bugzilla if you log in with a
gcc.gnu.org email address.  You can request one here:
  https://sourceware.org/cgi-bin/pdw/ps_form.cgi
and cite me as approving the request.


> > 
> > Maybe we could fix this by having region_model::check_region_bounds
> > return a bool that signifies if the access is valid, and propagating
> > that value up through callers so that we can return a
> > non-poisoned_svalue at the point where we'd normally return an
> > "uninitialized" poisoned_svalue.
> > 
> > Alternatively, we could simply terminate any analysis path in which an
> > OOB access is detected (by implementing the
> > pending_diagnostic::terminate_path_p virtual function for class
> > out_of_bounds).
> > 
> 
> I'm adding your suggestions as comment to the filed bugs so as to not
> forget them.

Thanks.


> > > Second, it seems that if a frame was a cause of an OOB (either by
> > > containing the spurious code or by being a caller to such code), it
> > > will only emit one set of warnings, rather than one at each unique
> > > compromising statement.
> > 
> > Maybe.  There's a pending_diagnostic::supercedes_p virtual function
> > that perhaps we could implement for out_of_bounds (or its subclasses).
> > 
> > 
> 
> > > 
> > > Finally, I think the diagnostic path should only go as deep as the
> > > declaration of the injurious index.
> > 
> > I'm not quite sure what you mean by this, sorry.
> > 
> > 
> That was indeed not the best explanation so far. I was actually sort of
> suggesting to emit OOB warnings only on direct call sites; you did too,
> so in a way you have answered me on this.
> 
> Just an addition though: if there is an OOB independent of its enclosing
> function's parameters, I think it might make sense not to emit this
> particular OOB outside the function definition itself, meaning that no
> OOB should be emitted on call sites to this function for this
> particular array access.
> (Typically, consecutive_oob_in_frame () shouldn't have resulted in more
> than one warning, since the OOB within is independent of its
> parameters.)

Yes; I think we're in agreement here.

> 
> I believe right now the expected behavior is to issue warnings only on
> actual function calls, so that a function never called won't result in
> warnings.

IIRC that was the case in GCC 10, but I changed it in
8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86.

> As a result, the initial analysis of each function should never result
> in warnings (which is actually the case for malloc-leak, though not for
> OOB).
> Thus we would need to tweak this into actually diagnosing the issues on
> initial analysis (those that can be, at least), so that they are saved
> for later use whenever the function is actually called. Then we would
> emit them once, and only once, because by nature these diagnostics are
> parameter-independent.
> I hope I made it clearer, not more convoluted.
> 

[Bug tree-optimization/35269] missed optimization of std::vector access.

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269

AK  changed:

   What|Removed |Added

 CC||hiraditya at msn dot com

--- Comment #2 from AK  ---
I posted a revised version of this bug here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

Re: PR target/70243: Do not generate fmaddfp and fnmsubfp

2023-04-06 Thread Segher Boessenkool
Hi!

On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors

Those are not existing instructions.  You mean "vmaddfp" etc.

> than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> these instructions seems to break Eigen.

Those instructions use round-to-nearest-ties-to-even, like all other
VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
almost all programs expect that rounding mode anyway, so this is not a
problem in practice.  What happened on Eigen is that the Linux kernel
starts every new process with VSCR[NJ]=1, breaking pretty much
everything that wants floating point for non-toy purposes.  (There
currently is a bug on LE that sets the wrong bit, hiding the problem in
that configuration, but it is intended there as well).
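
The effect of VSCR[NJ]=1 can be approximated in scalar C++. This is a hedged model, not a description of the hardware: `flush_denormal` and `vmaddfp_nj` are hypothetical names, and non-Java mode is modeled as replacing subnormal inputs and results with a zero of the same sign.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of VMX non-Java mode: subnormals become signed zero.
float flush_denormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Fused multiply-add as modeled under NJ=1: flush the inputs and the result.
float vmaddfp_nj(float a, float b, float c) {
  return flush_denormal(std::fma(flush_denormal(a), flush_denormal(b),
                                 flush_denormal(c)));
}
```

With default IEEE behavior, `std::fma(1e-20f, 1e-20f, 0.0f)` is a tiny but nonzero subnormal; under the modeled flush it becomes zero, which illustrates why NJ=1 matters for non-toy floating-point code.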

> GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
> as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> advantage of the Altivec instructions is that they are 4 operand instructions
> (i.e. the target register does not have to overlap with one of the input
> registers).  This can eliminate an extra move instruction.  The
> disadvantage is that they do not round the same way as the VSX instructions.

And it gets the VSCR[NJ] setting applied.  Yup.

> This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> instructions as alternatives in the VSX instruction insn support, and in the
> Altivec insns it adds a test to prevent the insn from being used if VSX is
> available.  I also added a test to the regression test suite.

Please leave the latter out; it does not belong in this patch.  If you
want a patch to do that, have it deal with *all* VMX FP insns.  There also are
add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
the only ones that use the NJ bit or the RN bits, but please check.

> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -750,12 +750,15 @@ (define_insn "altivec_vsel4"
>  
>  ;; Fused multiply add.
>  
> +;; If we are using VSX instructions, do not generate the vmaddfp instruction
> +;; since is has different rounding behavior than the xvmaddsp instruction.
> +

No blank lines please.

>  (define_insn "*altivec_fmav4sf4"
>[(set (match_operand:V4SF 0 "register_operand" "=v")
>   (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> (match_operand:V4SF 2 "register_operand" "v")
> (match_operand:V4SF 3 "register_operand" "v")))]
> -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"

This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
macro instead?

> -;; Fused vector multiply/add instructions. Support the classical Altivec
> -;; versions of fma, which allows the target to be a separate register from the
> -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> -;; multiply.
> +;; Fused vector multiply/add instructions. Do not use the classical Altivec

(Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
like it either, VMX is a much nicer and more regular name).

> +;; versions of fma.  Those instructions allows the target to be a separate
> +;; register from the 3 inputs, but they have different rounding behaviors.
>  
>  (define_insn "*vsx_fmav4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>   (fma:V4SF
> -   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> -   (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> +   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> +   (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
>"VECTOR_UNIT_VSX_P (V4SFmode)"
>"@
> xvmaddasp %x0,%x1,%x2
> -   xvmaddmsp %x0,%x1,%x3
> -   vmaddfp %0,%1,%2,%3"
> +   xvmaddmsp %x0,%x1,%x3"
>[(set_attr "type" "vecfloat")])

So this part looks okay, and it alone is safe for GCC 13 as well.

>  (define_insn "*vsx_nfmsv4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>   (neg:V4SF
>(fma:V4SF
> -(match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -(match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> +(match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +(match_operand:V4SF 2 "vsx_register_operand" "wa,0")
>  (neg:V4SF
> -  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")]
> +  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")]
>"VECTOR_UNIT_VSX_P (V4SFmode)"
>"@
> xvnmsubasp %x0,%x1,%x2
> -   xvnmsubmsp 

Re: 'g++.dg/modules/modules.exp': don't leak local 'unsupported' proc [PR108899]

2023-04-06 Thread Thomas Schwinge
Hi Alexandre!

On 2023-04-05T23:38:43-0300, Alexandre Oliva via Gcc-patches wrote:
> On Apr  5, 2023, Thomas Schwinge  wrote:
>> With...
>
>> Co-authored-by: Thomas Schwinge 
>
>> ... added, I suppose.
>
> I wrote the patch based on your report, before even seeing your patch

Eh, given your "Ooh, nice, I didn't know [...]" comment in
:

On 2023-03-30T04:00:03-0300, Alexandre Oliva via Gcc-patches wrote:
| On Mar 29, 2023, Thomas Schwinge  wrote:
|> ..., this isn't sufficient.  Instead, we should undo the 'rename' at the
|> end of 'g++.dg/modules/modules.exp'.  OK to push the attached
|> "'g++.dg/modules/modules.exp': don't leak local 'unsupported' proc [PR108899]"
|> after proper testing?
|
| Ooh, nice, I didn't know how to drop the renaming after we were done
| with it, [...]

..., I had certainly assumed that you'd learned "how to drop [...]" from
looking at my patch.

> though I only posted mine later, so I tried to give you credit for the
> report in the commit message, but if you feel that the note is
> appropriate, sure :-)  Thanks again!

Thanks.


> Here's what I'm checking in.
>
>
> testsuite: fix proc unsupported overriding in modules.exp [PR108899]
>
> The overrider of proc unsupported in modules.exp had two problems
> reported by Thomas Schwinge, even after Jakub Jelínek's fix:
>
> - it remained in effect while running other dejagnu testsets
>
> - it didn't quote correctly the argument list passed to it, which
>   caused test names to be surrounded by curly braces, as in:
>
> UNSUPPORTED: {...}
>
> This patch fixes both issues

Confirmed, thanks.


Grüße
 Thomas


> obsoleting and reverting Jakub's change,
> by dropping the overrider and renaming the saved proc back, and by
> using uplevel's argument list splicing.
>
>
> Co-authored-by: Thomas Schwinge 
>
> for  gcc/testsuite/ChangeLog
>
>   PR testsuite/108899
>   * g++.dg/modules/modules.exp (unsupported): Drop renaming.
>   Fix quoting.
> ---
>  gcc/testsuite/g++.dg/modules/modules.exp |   20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp
> index 80aa392bc7f3b..dc302d3d0af48 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -319,15 +319,11 @@ cleanup_module_files [find $DEFAULT_REPO *.gcm]
>  # so that, after an unsupported result in dg-test, we can skip rather
>  # than fail subsequent related tests.
>  set module_do {"compile" "P"}
> -if { [info procs unsupported] != [list] \
> -  && [info procs saved-unsupported] == [list] } {
> -rename unsupported saved-unsupported
> -
> -proc unsupported { args } {
> - global module_do
> - lset module_do 1 "N"
> - return [saved-unsupported $args]
> -}
> +rename unsupported modules-saved-unsupported
> +proc unsupported { args } {
> +global module_do
> +lset module_do 1 "N"
> +return [uplevel 1 modules-saved-unsupported $args]
>  }
>
>  # not grouped tests, sadly tcl doesn't have negated glob
> @@ -412,4 +408,10 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>  }
>  }
>
> +# Restore the original unsupported proc, lest it will affect
> +# subsequent test runs, or even fail renaming if we run modules.exp
> +# for multiple targets/multilibs/options.
> +rename unsupported {}
> +rename modules-saved-unsupported unsupported
> +
>  dg-finish
>
>
>
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Bug tree-optimization/109443] missed optimization of std::vector access (Related to issue 35269)

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

--- Comment #1 from AK  ---
Link to issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269, from which I
derived the testcase.

[Bug tree-optimization/109443] New: missed optimization of std::vector access (Related to issue 35269)

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443

Bug ID: 109443
   Summary: missed optimization of std::vector access (Related to
issue 35269)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Here is a slightly modified code example from issue #35269. Both accesses are
similar, but different code is generated: the function `h` has better codegen
than `g` for some reason.


$ g++ -O3 -std=c++20 -fno-exceptions

#include <vector>

void f(int);

void g(std::vector<int> v)
{
for (std::vector<int>::size_type i = 0; i < v.size(); i++)
f( v[ i ] );
}

void h(std::vector<int> v)
{
for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
f( *i );
}


g(std::vector<int, std::allocator<int> >):
mov rdx, QWORD PTR [rdi]
cmp QWORD PTR [rdi+8], rdx
je  .L6
push rbp
mov rbp, rdi
push rbx
xor ebx, ebx
sub rsp, 8
.L3:
mov edi, DWORD PTR [rdx+rbx*4]
add rbx, 1
call f(int)
mov rdx, QWORD PTR [rbp+0]
mov rax, QWORD PTR [rbp+8]
sub rax, rdx
sar rax, 2
cmp rbx, rax
jb  .L3
add rsp, 8
pop rbx
pop rbp
ret
.L6:
ret



h(std::vector<int, std::allocator<int> >):
push rbp
push rbx
sub rsp, 8
mov rbx, QWORD PTR [rdi]
cmp rbx, QWORD PTR [rdi+8]
je  .L10
mov rbp, rdi
.L12:
mov edi, DWORD PTR [rbx]
add rbx, 4
call f(int)
cmp QWORD PTR [rbp+8], rbx
jne .L12
.L10:
add rsp, 8
pop rbx
pop rbp
ret
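In `g`, the size is recomputed after every call to `f(int)` (two loads, a subtraction and a shift), while `h` only reloads `end()`. Until the optimizer handles this, one possible source-level workaround is sketched below. `g_hoisted` and its callable parameter `f` are hypothetical stand-ins, not part of the report, and the sketch assumes nothing else can modify `v` during the loop (it is a by-value parameter). Whether GCC then produces code like `h` has not been verified here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workaround sketch: read the size and data pointer once,
// before the loop, instead of after every call to f.  This relies on v
// being a by-value parameter that f cannot modify.
template <class Fn>
void g_hoisted(std::vector<int> v, Fn f)
{
    const std::size_t n = v.size();
    const int* data = v.data();
    for (std::size_t i = 0; i < n; i++)
        f(data[i]);
}
```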

[Bug c++/109433] [12/13 Regression] ICE with -std=c++11 and static constexpr array inside a template constexpr

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109433

--- Comment #2 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #1)
> With -std=c++11, started to ICE with r12-6326-ge948436eab818c527dd6.


> With -std=c++14, started to ICE with r9-1483-g307193b82cecb8ab79cf.

Yes, that is exactly the same revision where PR 109431 started to ICE too.

[Bug target/109435] overaligned structs are not passed correctly for mips64

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109435

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=88469

--- Comment #2 from Andrew Pinski  ---
The aarch64 and arm backends had a similar issue which was fixed too.

[Bug tree-optimization/109442] New: Dead local copy of std::vector not removed from function

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442

Bug ID: 109442
   Summary: Dead local copy of std::vector not removed from
function
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include <vector>
using T = int; // assumed, as in the reporter's related testcases

T vat1(std::vector<T> v1) {
auto v = v1;
return 10;
}

g++ -O3 -std=c++20 -fno-exceptions

vat1(std::vector<int, std::allocator<int> >):
mov rax, QWORD PTR [rdi+8]
sub rax, QWORD PTR [rdi]
je  .L11
push rbp
mov rbp, rax
movabs  rax, 9223372036854775804
push rbx
sub rsp, 8
cmp rax, rbp
jb  .L15
mov rbx, rdi
mov rdi, rbp
call operator new(unsigned long)
mov rsi, QWORD PTR [rbx]
mov rdx, QWORD PTR [rbx+8]
mov rdi, rax
sub rdx, rsi
cmp rdx, 4
jle .L16
call memmove
mov rdi, rax
.L6:
mov rsi, rbp
call operator delete(void*, unsigned long)
add rsp, 8
mov eax, 10
pop rbx
pop rbp
ret
.L11:
mov eax, 10
ret
.L15:
call std::__throw_bad_array_new_length()
.L16:
jne .L6
mov eax, DWORD PTR [rsi]
mov DWORD PTR [rdi], eax
jmp .L6

[patch] 'omp scan' struct block seq update for OpenMP 5.x

2023-04-06 Thread Tobias Burnus

This is scheduled for GCC 13; the issue was found by Sandra and Frederik.

'omp scan' has undergone quite a few changes:

In 5.0 it was added with a preceding and succeeding structured block.

In 5.1, 'structured block' was replaced by 'structured-block-sequence'
defined as "...a sequence of two or more executable statements..."

In 5.2, the s.b.s. became "... a sequence of zero or more executable
statements ...". While there are restrictions against orphaned
separating directives, having zero statements seems to be fine (albeit odd).

The C/C++ parser already permitted >= 1 executable statements (albeit the
testcases didn't use that; they used {...} to enclose multiple
executable statements). The Fortran parser was very strict and
required exactly one executable statement before/after the scan.
Now both accept >= 0.

I have opted to add a warning if there are zero statements before or
after the scan.

For C/C++: while technically 'for (...)' directly followed by "#pragma omp scan ..."
is valid ("for" is followed by a structured block), both before and with the
attached patch there is an error if the final-loop-body is not enclosed in
curly braces. I don't feel like changing the parser for such an odd use, and
for any sensible use the curly braces are required anyway.
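A minimal C++ sketch of the loop shape discussed above, assuming GCC's OpenMP support (`-fopenmp` or `-fopenmp-simd`); the function `inclusive_prefix_sum` and its parameters are hypothetical. The input phase before `#pragma omp scan` is a structured-block-sequence of two statements, and the final-loop-body is enclosed in braces. Without OpenMP enabled, the pragmas are ignored and the loop runs serially with the same result.

```cpp
// Inclusive prefix sum using an OpenMP 5.x scan (hypothetical example).
// b[i] receives a[0] + ... + a[i]; squares[i] receives a[i] * a[i].
void inclusive_prefix_sum(const int* a, int* b, int* squares, int n)
{
    int r = 0;
#pragma omp simd reduction(inscan, + : r)
    for (int i = 0; i < n; i++) {
        squares[i] = a[i] * a[i];   // input phase, statement 1
        r += a[i];                  // input phase, statement 2
#pragma omp scan inclusive(r)
        b[i] = r;                   // scan phase: r is the running total
    }
}
```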

Thoughts? Comments?

* * *

Warning: It seems as if nearly all warnings related to OpenMP are emitted with
option "0" (i.e. not tied to any -W flag) and are on by default, with very few
exceptions. I am wondering whether we should add a -Wopenmp and change all
those 0 to warn_openmp, to make it easier for the user to use -Wno-openmp or
-Werror=openmp, or just to '2>&1 | grep Wopenmp' to find them.

Thoughts?

* * *

Tobias
'omp scan' struct block seq update for OpenMP 5.x

While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence',
denoting 2 or more executable statements in OpenMP 5.1 (oops!) and zero or more
in OpenMP 5.2. This updates C/C++ to accept zero statements (but still requires
the '{' ... '}' for the final-loop-body) and updates Fortran to accept zero or
more than one statement.

If there is no preceding or succeeding executable statement, a warning is
shown.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_scan_loop_body): Handle
	zero exec statements before/after 'omp scan'.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_scan_loop_body): Handle
	zero exec statements before/after 'omp scan'.

gcc/fortran/ChangeLog:

	* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
	or more than one exec statements before/after 'omp scan'.
	* trans-openmp.cc (gfc_trans_omp_do): Likewise.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/scan-1.c: New test.
	* testsuite/libgomp.c/scan-23.c: New test.
	* testsuite/libgomp.fortran/scan-2.f90: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
	* gfortran.dg/gomp/loop-2.f90: Likewise.
	* gfortran.dg/gomp/reduction5.f90: Likewise.
	* gfortran.dg/gomp/reduction6.f90: Likewise.
	* gfortran.dg/gomp/scan-1.f90: Likewise.
	* gfortran.dg/gomp/taskloop-2.f90: Likewise.
	* c-c++-common/gomp/scan-6.c: New test.
	* gfortran.dg/gomp/scan-8.f90: New test.

 gcc/c/c-parser.cc   |  22 -
 gcc/cp/parser.cc|  24 -
 gcc/fortran/openmp.cc   |  35 +--
 gcc/fortran/trans-openmp.cc |  31 +++---
 gcc/testsuite/c-c++-common/gomp/scan-6.c|  95 +++
 gcc/testsuite/g++.dg/gomp/attrs-7.C |   8 +-
 gcc/testsuite/gfortran.dg/gomp/loop-2.f90   |  10 +-
 gcc/testsuite/gfortran.dg/gomp/reduction5.f90   |   2 +-
 gcc/testsuite/gfortran.dg/gomp/reduction6.f90   |   4 +-
 gcc/testsuite/gfortran.dg/gomp/scan-1.f90   |   9 +-
 gcc/testsuite/gfortran.dg/gomp/scan-8.f90   |  96 +++
 gcc/testsuite/gfortran.dg/gomp/taskloop-2.f90   |  12 +--
 libgomp/testsuite/libgomp.c-c++-common/scan-1.c |  68 +
 libgomp/testsuite/libgomp.c/scan-23.c   | 121 
 libgomp/testsuite/libgomp.fortran/scan-2.f90|  59 
 15 files changed, 545 insertions(+), 51 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 21bc3167ce2..9398c7a5271 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -20112,6 +20112,7 @@ c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
   tree substmt;
   location_t loc;
   tree clauses = NULL_TREE;
+  bool found_scan = false;
 
   loc = c_parser_peek_token (parser)->location;
   if (!open_brace_parsed
@@ -20122,7 +20123,15 @@ c_parser_omp_scan_loop_body (c_parser *parser, bool open_brace_parsed)
   return;
 }
 
-  substmt = 

[Bug c/86584] Incorrect -Wsequence-point warning on structure member

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86584

--- Comment #2 from Andrew Pinski  ---
I think there is just a missing warning for the plain decl case, and the
structure member case is not a spurious warning.

[Bug c/86584] Incorrect -Wsequence-point warning on structure member

2023-04-06 Thread oliver at futaura dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86584

oliver at futaura dot co.uk changed:

   What|Removed |Added

 CC||oliver at futaura dot co.uk

--- Comment #1 from oliver at futaura dot co.uk ---
Confirmed here. Still happens with gcc (adtools build 11.2.0) 11.2.0:

test.c: In function 'main':
test.c:13:24: warning: operation on 's.f' may be undefined [-Wsequence-point]
   13 | func(, s.f = 1);
  |^~~

It really doesn't make much sense for gcc to think this is a problem. I am going
to have to disable the warning here, as I can't seem to figure out a way around
it without unnecessarily rewriting huge chunks of code.

[Bug tree-optimization/109441] missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

--- Comment #1 from AK  ---
I guess a better test case is this:

#include <vector>
using namespace std;

using T = int;


T v(std::vector<T> v) {
T s = 0;
std::fill(v.begin(), v.end(), T());
for (auto i = 0; i < v.size(); ++i) {
s += v[i];
}

return s;
}

which has similar effect.

$ g++ -O3 -std=c++17

v(std::vector<int, std::allocator<int> >):
push rbp
push rbx
sub rsp, 8
mov rbp, QWORD PTR [rdi+8]
mov rcx, QWORD PTR [rdi]
cmp rcx, rbp
je  .L7
sub rbp, rcx
mov rdi, rcx
xor esi, esi
mov rbx, rcx
mov rdx, rbp
call memset
mov rdi, rbp
mov edx, 1
mov rcx, rbx
sar rdi, 2
test rbp, rbp
cmovne  rdx, rdi
cmp rbp, 12
jbe .L8
mov rax, rdx
pxor xmm0, xmm0
shr rax, 2
sal rax, 4
add rax, rbx
.L4:
movdqu  xmm2, XMMWORD PTR [rbx]
add rbx, 16
paddd   xmm0, xmm2
cmp rbx, rax
jne .L4
movdqa  xmm1, xmm0
psrldq  xmm1, 8
paddd   xmm0, xmm1
movdqa  xmm1, xmm0
psrldq  xmm1, 4
paddd   xmm0, xmm1
movd eax, xmm0
test dl, 3
je  .L1
and rdx, -4
mov esi, edx
.L3:
add eax, DWORD PTR [rcx+rdx*4]
lea edx, [rsi+1]
movsx   rdx, edx
cmp rdx, rdi
jnb .L1
add esi, 2
lea r8, [0+rdx*4]
add eax, DWORD PTR [rcx+rdx*4]
movsx   rsi, esi
cmp rsi, rdi
jnb .L1
add eax, DWORD PTR [rcx+4+r8]
.L1:
add rsp, 8
pop rbx
pop rbp
ret
.L7:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.L8:
xor eax, eax
xor esi, esi
xor edx, edx
jmp .L3

[Bug tree-optimization/109441] missed optimization when all elements of vector are known

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Severity|normal  |enhancement

[Bug tree-optimization/109440] Missed optimization of vector::at when a function is called inside the loop

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||alias

--- Comment #1 from Andrew Pinski  ---
There are a few things; the most obvious is that the middle-end thinks the
content of v can change via the call to bar.

If you make a local copy, there is still a missed VRP optimization, which can
be fixed by changing the type of i to size_t.
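One reading of this suggestion, as a sketch only: copy `v` into a local whose address never escapes, so the call cannot change its size, and index with `std::size_t` so that value-range propagation can, in principle, prove the `at()` range check redundant. `vat_local` and the predicate parameter standing in for `bar()` are hypothetical. The sketch also starts `s` at 0, which the testcase leaves uninitialized, and the copy itself has a cost. Whether GCC actually drops the check for this form has not been verified here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rewrite of vat(): local copy plus a size_t index.
template <class Pred>
int vat_local(const std::vector<int>& v, Pred bar)
{
    const std::vector<int> local = v;  // its address never escapes to bar()
    int s = 0;
    for (std::size_t i = 0; i < local.size(); ++i) {  // size_t, not int
        if (bar())
            s += local.at(i);  // i < local.size() holds here, so the check
                               // cannot fire
    }
    return s;
}
```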

[Bug tree-optimization/109441] New: missed optimization when all elements of vector are known

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441

Bug ID: 109441
   Summary: missed optimization when all elements of vector are
known
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

Reference: https://godbolt.org/z/af4x6zhz9

When all elements of the vector are 0, the compiler should be able to remove
the loop and just return 0.

Testcase:

#include <vector>
using namespace std;

using T = int;


T v() {
T s = 0;
std::vector<T> v;
v.resize(1000, 0);
for (auto i = 0; i < v.size(); ++i) {
s += v[i];
}

return s;
}



$ g++ -O3 -std=c++17

.LC0:
  .string "vector::_M_fill_insert"
v():
  push rbx
  pxor xmm0, xmm0
  mov edx, 1000
  xor esi, esi
  sub rsp, 48
  lea rcx, [rsp+12]
  lea rdi, [rsp+16]
  mov QWORD PTR [rsp+32], 0
  mov DWORD PTR [rsp+12], 0
  movaps XMMWORD PTR [rsp+16], xmm0
  call std::vector<int, std::allocator<int> >::_M_fill_insert(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, unsigned long, int const&)
  mov rdx, QWORD PTR [rsp+24]
  mov rdi, QWORD PTR [rsp+16]
  mov rax, rdx
  sub rax, rdi
  mov rsi, rax
  sar rsi, 2
  cmp rdx, rdi
  je .L99
  test rax, rax
  mov ecx, 1
  cmovne rcx, rsi
  cmp rax, 12
  jbe .L107
  mov rdx, rcx
  pxor xmm0, xmm0
  mov rax, rdi
  shr rdx, 2
  sal rdx, 4
  add rdx, rdi
.L101:
  movdqu xmm2, XMMWORD PTR [rax]
  add rax, 16
  paddd xmm0, xmm2
  cmp rdx, rax
  jne .L101
  movdqa xmm1, xmm0
  psrldq xmm1, 8
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  psrldq xmm1, 4
  paddd xmm0, xmm1
  movd ebx, xmm0
  test cl, 3
  je .L99
  and rcx, -4
  mov eax, ecx
.L100:
  lea edx, [rax+1]
  add ebx, DWORD PTR [rdi+rcx*4]
  movsx rdx, edx
  cmp rdx, rsi
  jnb .L99
  add eax, 2
  lea rcx, [0+rdx*4]
  add ebx, DWORD PTR [rdi+rdx*4]
  cdqe
  cmp rax, rsi
  jnb .L99
  add ebx, DWORD PTR [rdi+4+rcx]
.L99:
  test rdi, rdi
  je .L98
  mov rsi, QWORD PTR [rsp+32]
  sub rsi, rdi
  call operator delete(void*, unsigned long)
.L98:
  add rsp, 48
  mov eax, ebx
  pop rbx
  ret
.L107:
  xor eax, eax
  xor ecx, ecx
  jmp .L100
  mov rbx, rax
  jmp .L105
v() [clone .cold]:

[Bug target/70243] PowerPC V4SFmode should not use Altivec instructions on VSX systems

2023-04-06 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70243

Segher Boessenkool  changed:

   What|Removed |Added

 Status|WAITING |NEW
   Priority|P3  |P1

--- Comment #6 from Segher Boessenkool  ---
We should not use any VMX insn unless explicitly asked for it, since those
do not work as expected if VSCR[NJ]=1, which unfortunately is the default on
Linux (but not on powerpc64le-linux; that is a separate (kernel) bug).

Rounding mode does not matter too much if we have some subset of fast-math
anyway (the only rounding mode in VMX is round-to-nearest-ties-to-even, which
is the default for almost everything else).
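For reference, round-to-nearest-ties-to-even resolves an exact tie toward the even neighbour. A minimal C++ illustration using the standard `<cfenv>` and `<cmath>` facilities; the wrapper name `round_ties_even` is hypothetical.

```cpp
#include <cfenv>
#include <cmath>

// Round x to an integral value with round-to-nearest, ties-to-even, the
// IEEE 754 default mode (and the only rounding mode VMX supports).
double round_ties_even(double x)
{
    std::fesetround(FE_TONEAREST);  // make the mode explicit
    return std::nearbyint(x);       // honours the current rounding mode
}
```

So 2.5 rounds to 2, 3.5 rounds to 4 and -2.5 rounds to -2, while a non-tie such as 2.6 still rounds to 3.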

But NJ=1 makes arithmetic behave completely unexpectedly, and it isn't
actually faster than NJ=0 on modern hardware anyway.  We cannot change the
default for setting NJ because some code might rely on it, unfortunately.
Luckily disabling generating all VMX insns automatically (i.e. without it
being explicitly asked for) isn't all that expensive, just ends up as a few
more move instructions here and there.

This isn't a regression, but we should have this in GCC 13.

[Bug tree-optimization/109440] New: Missed optimization of vector::at when a function is called inside the loop

2023-04-06 Thread hiraditya at msn dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440

Bug ID: 109440
   Summary: Missed optimization of vector::at when a function is
called inside the loop
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hiraditya at msn dot com
  Target Milestone: ---

#include
#include
using namespace std;

bool bar();

using T = int;

T vat(std::vector<T> v) {
T s;
for (auto i = 0; i < v.size(); ++i) {
if (bar())
s += v.at(i);
}

return s;
}


$ gcc -O2 -fexceptions -fno-unroll-loops


.LC0:
.string "vector::_M_range_check: __n (which is %zu) >= this->size() (which is %zu)"
vat(std::vector<int, std::allocator<int> >):
mov rax, QWORD PTR [rdi]
cmp QWORD PTR [rdi+8], rax
je  .L9
push r12
push rbp
mov rbp, rdi
push rbx
xor ebx, ebx
jmp .L6
.L14:
mov rax, QWORD PTR [rbp+8]
sub rax, QWORD PTR [rbp+0]
add rbx, 1
sar rax, 2
cmp rbx, rax
jnb .L13
.L6:
call bar()
test al, al
je  .L14
mov rcx, QWORD PTR [rbp+0]
mov rdx, QWORD PTR [rbp+8]
sub rdx, rcx
sar rdx, 2
mov rax, rdx
cmp rbx, rdx
jnb .L15
add r12d, DWORD PTR [rcx+rbx*4]
add rbx, 1
cmp rbx, rax
jb  .L6
.L13:
mov eax, r12d
pop rbx
pop rbp
pop r12
ret
.L9:
mov eax, r12d
ret
.L15:
mov rsi, rbx
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call std::__throw_out_of_range_fmt(char const*, ...)

[committed][testsuite] arm: remove unused variables from test

2023-04-06 Thread Stamatis Markianos-Wright via Gcc-patches

Hi all,

This is just a minor issue I found with a previous test
of mine that caused it to fail in C++ mode due to these
unused const variables being uninitialised. I forgot to
remove these after removing some test cases that did use
them.
I removed the test cases, because I came to the
conclusion that the const-ness of the immediate was
irrelevant to the test itself.
Removing the variables now makes the test PASS.
Committed as Obvious.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: Remove
    unused variables.
    * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: Remove
    unused variables.




diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
index 7492e9b22bd..a2787a47859 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
@@ -19,15 +19,6 @@ int16_t i6;
 int32_t i7;
 int64_t i8;
 
-const int ci1;
-const short ci2;
-const long ci3;
-const long long ci4;
-const int8_t ci5;
-const int16_t ci6;
-const int32_t ci7;
-const int64_t ci8;
-
 float16x8_t floatvec;
 int16x8_t intvec;
 
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
index 9a921bf40e8..7b88f462e17 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
@@ -13,15 +13,6 @@ int16_t i6;
 int32_t i7;
 int64_t i8;
 
-const int ci1;
-const short ci2;
-const long ci3;
-const long long ci4;
-const int8_t ci5;
-const int16_t ci6;
-const int32_t ci7;
-const int64_t ci8;
-
 int16x8_t intvec;
 
 void test(void)


[Bug target/108177] MVE predicated stores to same address get optimized away

2023-04-06 Thread stammark at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108177

--- Comment #5 from Stam Markianos-Wright  ---
With the fix to MVE auto_inc having gone in as
ddc9b5ee13cd686c8674f92d46045563c06a23ea I have found that this fix keeps the
auto-inc on these predicated stores broken.

It seems to fail in auto_inc_dec at this condition:

```
  /* Make sure this reg appears only once in this insn.  */
  if (count_occurrences (PATTERN (mem_insn.insn), mem_insn.reg0, 1) != 1)
{
  if (dump_file)
fprintf (dump_file, "mem count failure\n");
  return false;
}
```
(which makes sense with the pattern now having the MEM appear twice)


I guess this is not urgent since this is only a performance impact on one
instruction. Also if the change needs to be in the auto-inc pass instead of the
backend, then likely something for GCC14, but I thought this would be a good
place to record this ;)

Does anyone have any ideas on this? I also wonder what the AVX case does for this.

Ping: [PATCH v2][RFC] vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported

2023-04-06 Thread Kevin Lee
May I ping this patch?
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614700.html
Any suggestions and comments would be appreciated. Thank you!

Sincerely,
Kevin Lee


[Bug analyzer/109439] New: RFE: Spurious -Wanalyzer-use-of-uninitialized-value tagging along -Wanalyzer-out-of-bounds

2023-04-06 Thread priour.be at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109439

Bug ID: 109439
   Summary: RFE: Spurious -Wanalyzer-use-of-uninitialized-value
tagging along -Wanalyzer-out-of-bounds
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: priour.be at gmail dot com
CC: priour.be at gmail dot com
  Target Milestone: ---
 Build: 13.0.1 20230328 (experimental)

For every -Wanalyzer-out-of-bounds, a corresponding
-Wanalyzer-use-of-uninitialized-value seemingly tags along.
As fixing the former would most likely also fix the latter, it would make sense
to only have the OOB diagnosed.

Tested on trunk.

int would_like_only_oob ()
{
int arr[] = {1,2,3,4,5,6,7};
int y1 = arr[9]; // 2 warnings instead of only OOB warning are emitted here
return y1;
} 

In the mail list, David suggested that

"Maybe we could fix this by having region_model::check_region_bounds
return a bool that signifies if the access is valid, and propagating
that value up through callers so that we can return a non-
poisoned_svalue at the point where we'd normally return an
"uninitialized" poisoned_svalue.

Alternatively, we could simply terminate any analysis path in which an
OOB access is detected (by implementing the
pending_diagnostic::terminate_path_p virtual function for class
out_of_bounds)."

[Bug ipa/109318] [12/13 Regression] csmith: -fipa-cp seems to cause trouble since r12-2523-g13586172d0b70c

2023-04-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109318

--- Comment #10 from Martin Jambor  ---
The problem is actually slightly different; I have just attached a possible fix
for both to PR 107769.

[Bug ipa/107769] [12/13 Regression] -flto with -Os/-O2/-O3 emitted code with gcc 12.x segfaults via mutated global in .rodata since r12-2887-ga6da2cddcf0e959d

2023-04-06 Thread jamborm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769

--- Comment #7 from Martin Jambor  ---
Created attachment 54817
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54817=edit
potential patch

I am testing the attached patch.  I'd like to think about the whole situation a
bit more next week, but this seems like a way to fix this and PR 109318.

[Bug analyzer/109438] New: Excessive Duplication of -Wanalyzer-out-of-bounds warnings

2023-04-06 Thread priour.be at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109438

Bug ID: 109438
   Summary: Excessive Duplication of -Wanalyzer-out-of-bounds
warnings
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: priour.be at gmail dot com
CC: priour.be at gmail dot com
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-pc-linux-gnu
 Build: 13.0.1 20230328 (experimental)

The reproducer below demonstrates excessive duplication of the same warning,
-Wanalyzer-out-of-bounds. The warning appears three times, at points (1),
(2) and (3), yet none appears at (4) (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109437)

See https://godbolt.org/z/Tcrv8h614 for live warnings on trunk, both gcc and
g++.


int consecutive_oob_in_frame () // one warning from analysis of initial EN
{
int arr[] = {1,2,3,4,5,6,7};
int y1 = arr[9]; // injurious line (1)
return y1;
}

int goo () {
int x = consecutive_oob_in_frame (); // causes another warning (2)
return 2 * x;
}

int main () {
goo (); // causes another warning (3)
consecutive_oob_in_frame (); // silent (4)
return 0;
}

The expected behavior would be for the warning to appear whenever
consecutive_oob_in_frame is called directly, i.e. the above should
instead emit it at points (2) and (4).

[Bug analyzer/109437] New: -Wanalyzer-out-of-bounds is emitted at most once per frame.

2023-04-06 Thread priour.be at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109437

Bug ID: 109437
   Summary: -Wanalyzer-out-of-bounds is emitted at most once per
frame.
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: priour.be at gmail dot com
CC: priour.be at gmail dot com
  Target Milestone: ---
 Build: 13.0.1 20230328 (experimental)

OOB refers to Out-Of-Bounds.

Curiously, it seems that if a frame was the cause of an OOB access (either by containing
the offending code or by being a caller of such code), it only emits one set
of warnings, rather than one at each unique offending statement.


int consecutive_oob_in_frame ()
{
int arr[] = {1,2,3,4,5,6,7};
int y1 = arr[9]; // only  this one is diagnosed
int y2 = arr[10]; // no OOB warning emitted here ...
int y3 = arr[50]; // ... nor here.
return (y1+y2+y3);
}

int main () {
consecutive_oob_in_frame (); // OOB warning emitted
int x [] = {1,2};
x[5]; /* silent, probably because another set of OOB warnings
has already been issued with this frame being the source */
return 0;
}


As per David's suggestion, it might be worth implementing the
pending_diagnostic::supercedes_p vfunc for the OOB checker.

[Bug c++/88061] section attributes of variable templates are ignored

2023-04-06 Thread barry.revzin at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88061

--- Comment #6 from Barry Revzin  ---
Any action on this one?

The current workaround is to change code that would ideally look like this (which is
pretty clean, in my opinion):

template 
void foo() {
[[gnu::section(".meow")]] static int value = 0;
}

to code that looks like:

template 
void foo() {
static int PUT_IN_MEOW_value = 0;
}

and add a linker script that moves these variables over:

.meow : {
KEEP(*(SORT(.*PUT_IN_MEOW_*)))
}

But this is, to put it mildly, less than ideal.

Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-06 Thread Alejandro Colomar via Gcc
Hi Eric,

On 4/6/23 18:24, Eric Blake wrote:
> On Wed, Apr 05, 2023 at 02:42:04AM +0200, Alejandro Colomar wrote:
>> Hi Eric,
>>
>> I'm going to reply to both your emails here so that GCC is CCed, and they can
>> suggest better stuff.  I'm worried about sending something to POSIX without
>> enough eyes checking it.  So this will be a long email.
> 
> Because your mail landed in a publicly archived mailing list, the
> POSIX folks saw it anyways ;)

:)

> 
> ...
>>>
>>> Whether gcc already has all the attributes you need is not my area of
>>> expertise.  In my skim of the glibc list conversation, I saw mention
>>> of attribute [[gnu::transparent_union]] rather than [[__may_alias__]] -
>>> if that's a better implementation-defined extension that does what we
>>> need, then use it.  The standard developers were a bit uncomfortable
>>> directly putting [[gnu::transparent_union]] in the standard, but
>>> [[__may_alias__]] was noncontroversial (it's in the namespace reserved
>>> for the implementation)
>>
>> Not really; implementation-defined attributes are required to use an
>> implementation-defined prefix like 'gnu::'.  So [[__may_alias__]] is
>> reserved by ISO C, AFAIR.  Maybe it would be better to just mention
>> attributes without any specific attribute name; being fuzzy about it
>> would help avoid making promises that we can't hold.
> 
> On this point, the group agreed, and we intentionally loosened the
> wording to just mention an implementation-defined extension, rather
> than giving any specific attribute name.
> 
> ...
>>
>> I would just make it more fuzzy about which standard version did what.
>> How about this?:
>>
>> [[
>> Note that defining the sockaddr_storage and sockaddr structures using
>> only mechanisms defined in editions of the ISO C standard may produce
>> aliasing diagnostics.  Because of the large body of existing code
>> utilizing sockets in a way that could trigger undefined behavior due
>> to strict aliasing rules, this standard mandates that the various socket
>> address structures can alias each other for accessing their first member,
> 
> The sa_family_t member is not necessarily the first member on all
> platforms (it happens to be first in Linux, but as a counter-example,
> https://man.freebsd.org/cgi/man.cgi?query=unix&sektion=4 shows
> sun_family as the second one-byte field in struct sockaddr_un).  The
> emphasis is on dereferencing the family member (whatever offset it is
> at) to learn what cast to use to then safely access the rest of the
> storage.
> 
> As such, here's the updated wording that the Austin Group tried today
> (and we plan on starting a 30-day interpretation feedback window if
> there are still adjustments to be made to the POSIX wording):
> 
> https://austingroupbugs.net/view.php?id=1641#c6255

Thanks!  That wording (both paragraphs) LGTM.

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5




Re: [PATCH] sockaddr.3type: Document that sockaddr_storage is the API to be used

2023-04-06 Thread Eric Blake via Gcc
On Wed, Apr 05, 2023 at 02:42:04AM +0200, Alejandro Colomar wrote:
> Hi Eric,
> 
> I'm going to reply to both your emails here so that GCC is CCed, and they can
> suggest better stuff.  I'm worried about sending something to POSIX without
> enough eyes checking it.  So this will be a long email.

Because your mail landed in a publicly archived mailing list, the
POSIX folks saw it anyways ;)

...
> > 
> > Whether gcc already has all the attributes you need is not my area of
> > expertise.  In my skim of the glibc list conversation, I saw mention
> > of attribute [[gnu::transparent_union]] rather than [[__may_alias__]] -
> > if that's a better implementation-defined extension that does what we
> > need, then use it.  The standard developers were a bit uncomfortable
> > directly putting [[gnu::transparent_union]] in the standard, but
> > [[__may_alias__]] was noncontroversial (it's in the namespace reserved
> > for the implementation)
> 
> Not really; implementation-defined attributes are required to use an
> implementation-defined prefix like 'gnu::'.  So [[__may_alias__]] is
> reserved by ISO C, AFAIR.  Maybe it would be better to just mention
> attributes without any specific attribute name; being fuzzy about it
> would help avoid making promises that we can't hold.

On this point, the group agreed, and we intentionally loosened the
wording to just mention an implementation-defined extension, rather
than giving any specific attribute name.

...
> 
> I would just make it more fuzzy about which standard version did what.
> How about this?:
> 
> [[
> Note that defining the sockaddr_storage and sockaddr structures using
> only mechanisms defined in editions of the ISO C standard may produce
> aliasing diagnostics.  Because of the large body of existing code
> utilizing sockets in a way that could trigger undefined behavior due
> to strict aliasing rules, this standard mandates that the various socket
> address structures can alias each other for accessing their first member,

The sa_family_t member is not necessarily the first member on all
platforms (it happens to be first in Linux, but as a counter-example,
https://man.freebsd.org/cgi/man.cgi?query=unix&sektion=4 shows
sun_family as the second one-byte field in struct sockaddr_un).  The
emphasis is on dereferencing the family member (whatever offset it is
at) to learn what cast to use to then safely access the rest of the
storage.
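A minimal C sketch of that access pattern, assuming a POSIX system with IPv6 support: read only `ss_family` through its named member, then let its value decide which concrete structure the storage is copied into. `addr_port` is a hypothetical helper for illustration, not part of the proposed POSIX wording.

```c
#include <arpa/inet.h>   /* ntohs */
#include <assert.h>
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <string.h>      /* memcpy */
#include <sys/socket.h>  /* struct sockaddr_storage, AF_* */

/* Hypothetical helper: return the port (host byte order) stored in an
   AF_INET or AF_INET6 address, or -1 for any other family. */
static int addr_port(const struct sockaddr_storage *ss)
{
    assert(ss != NULL);

    /* Step 1: look only at the family member.  Its offset differs between
       platforms, so always access it by name, never by byte position. */
    switch (ss->ss_family) {
    case AF_INET: {
        /* Step 2: the family selects the concrete type to copy into. */
        struct sockaddr_in sin;
        memcpy(&sin, ss, sizeof sin);
        return ntohs(sin.sin_port);
    }
    case AF_INET6: {
        struct sockaddr_in6 sin6;
        memcpy(&sin6, ss, sizeof sin6);
        return ntohs(sin6.sin6_port);
    }
    default:
        return -1;
    }
}
```

Copying with `memcpy` is a deliberately conservative choice in this sketch: it sidesteps the strict-aliasing question instead of relying on the implementation-defined extension that the new wording permits.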

As such, here's the updated wording that the Austin Group tried today
(and we plan on starting a 30-day interpretation feedback window if
there are still adjustments to be made to the POSIX wording):

https://austingroupbugs.net/view.php?id=1641#c6255

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Bug target/109436] New: AArch64: suboptimal codegen in 128 bit constant stores

2023-04-06 Thread sinan.lin at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109436

Bug ID: 109436
   Summary: AArch64: suboptimal codegen in 128 bit constant stores
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sinan.lin at linux dot alibaba.com
  Target Milestone: ---

When splitting a 128-bit constant, one of the resulting halves may be the
constant 0. In that case we could store from the zero register (xzr) and avoid
the "mov reg, 0" instruction.

e.g.
```
__int128 Data;

void init() {
Data = 0xf;
}
```

gcc
```
init:
adrp x0, .LANCHOR0
add x0, x0, :lo12:.LANCHOR0
mov x2, 1048575
mov x3, 0
stp x2, x3, [x0]
ret
```


clang
```
init:
mov w8, #1048575
adrp x9, Data
add x9, x9, :lo12:Data
stp x8, xzr, [x9]
ret
```
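A rough C model of the saving, assuming a GCC/Clang host that provides `unsigned __int128`: split the constant into its two 64-bit halves and, when a half is zero, name the zero register as the `stp` source instead of materializing 0. `count_moves` is a hypothetical cost model, not GCC's splitter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model: count the "mov" instructions needed to get both
   64-bit halves of VALUE into registers before a single STP.  Each non-zero
   half is assumed to take exactly one mov, which holds for small constants
   such as 0xfffff but not for arbitrary 64-bit immediates. */
static int count_moves(unsigned __int128 value, int use_zero_reg)
{
    uint64_t lo = (uint64_t)value;
    uint64_t hi = (uint64_t)(value >> 64);
    int moves = 0;

    /* With use_zero_reg set, a zero half is stored straight from xzr. */
    if (lo != 0 || !use_zero_reg)
        moves++;
    if (hi != 0 || !use_zero_reg)
        moves++;
    return moves;
}
```

With `use_zero_reg == 0` the model gives the two `mov`s in the GCC listing; with `use_zero_reg == 1` it gives the single `mov` that pairs with `stp x8, xzr` in the clang listing.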

Re: [PATCH 2/3] RISC-V: Enable basic RVV auto-vectorization and support WHILE_LEN/LEN_LOAD/LEN_STORE pattern

2023-04-06 Thread Kito Cheng via Gcc-patches
Are the changes to riscv-vsetvl.cc necessary for autovec, or are they an
additional optimization for the autovec use case? I would suggest
splitting them out if it's the latter.

And please split the fixed-vlmax part into a separate patch; that would be
easier to review.

On Thu, Apr 6, 2023 at 10:44 PM  wrote:
>
> From: Juzhe-Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add 
> compile option for RVV auto-vectorization.
> (enum riscv_autovec_lmul_enum): Ditto.
> * config/riscv/riscv-protos.h (get_vector_mode): Remove unused global 
> function.
> (preferred_simd_mode): Enable basic auto-vectorization for RVV.
> (expand_while_len): Enable while_len pattern.
> * config/riscv/riscv-v.cc (get_avl_type_rtx): Ditto.
> (autovec_use_vlmax_p): New function.
> (preferred_simd_mode): New function.
> (expand_while_len): Ditto.
> * config/riscv/riscv-vector-switch.def (ENTRY): Disable SEW = 64 for 
> MIN_VLEN > 32 but EEW = 32.

Is it a bug fix? Please send a separate patch if it's a bug.

> * config/riscv/riscv-vsetvl.cc (get_all_successors): New function.
> (get_all_overlap_blocks): Ditto.
> (local_eliminate_vsetvl_insn): Ditto.
> (vector_insn_info::skip_avl_compatible_p): Ditto.
> (vector_insn_info::merge): Ditto.
> (pass_vsetvl::compute_local_backward_infos): Enhance VSETVL PASS for 
> RVV auto-vectorization.
> (pass_vsetvl::global_eliminate_vsetvl_p): Ditto.
> (pass_vsetvl::cleanup_insns): Ditto.
> * config/riscv/riscv-vsetvl.h: Ditto.
> * config/riscv/riscv.cc (riscv_convert_vector_bits): Add basic RVV 
> auto-vectorization support.
> (riscv_preferred_simd_mode): Ditto.
> (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
> * config/riscv/riscv.opt: Add compile option.
> * config/riscv/vector.md: Add RVV auto-vectorization.
> * config/riscv/autovec.md: New file.
>
> ---
>  gcc/config/riscv/autovec.md  |  63 +++
>  gcc/config/riscv/riscv-opts.h|  16 ++
>  gcc/config/riscv/riscv-protos.h  |   3 +-
>  gcc/config/riscv/riscv-v.cc  |  61 ++-
>  gcc/config/riscv/riscv-vector-switch.def |  47 +++--
>  gcc/config/riscv/riscv-vsetvl.cc | 210 ++-
>  gcc/config/riscv/riscv-vsetvl.h  |   1 +
>  gcc/config/riscv/riscv.cc|  34 +++-
>  gcc/config/riscv/riscv.opt   |  40 +
>  gcc/config/riscv/vector.md   |   6 +-
>  10 files changed, 457 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/riscv/autovec.md
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> new file mode 100644
> index 000..ff616d81586
> --- /dev/null
> +++ b/gcc/config/riscv/autovec.md
> @@ -0,0 +1,63 @@
> +;; Machine description for auto-vectorization using RVV for GNU compiler.
> +;; Copyright (C) 2023-2023 Free Software Foundation, Inc.

2023 rather than 2023-2023

> +;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +;; =
> +;; == While_len
> +;; =
> +
> +(define_expand "while_len"
> +  [(match_operand:P 0 "register_operand")
> +   (match_operand:P 1 "vector_length_operand")
> +   (match_operand:P 2 "")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_while_len (operands);
> +  DONE;
> +})
> +
> +;; =
> +;; == Loads/Stores
> +;; =
> +
> +;; len_load/len_store is sub-optimal pattern for RVV auto-vectorization 
> support.

Google Docs says you need an `a`: "is a sub-optimal" :P


> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 4611447ddde..7db0deb4dbf 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -184,7 +184,6 @@ enum mask_policy
>  enum tail_policy get_prefer_tail_policy ();
>  enum mask_policy get_prefer_mask_policy ();
>  rtx get_avl_type_rtx (enum avl_type);
> 

Re: [RFC PATCH] driver: unfilter default library path [PR 104707]

2023-04-06 Thread Michael Matz via Gcc
Hello,

On Thu, 6 Apr 2023, Shiqi Zhang wrote:

> Currently, gcc deliberately filters out default library paths "/lib/" and
> "/usr/lib/", causing some linkers like mold to fail to find libraries.

If linkers claim to be a compatible replacement for other linkers then 
they certainly should behave in a similar way.  In this case: look into 
/lib and /usr/lib when something isn't found till then.

> This behavior was introduced at least 31 years ago in the initial
> revision of the git repo; personally, I think it's obsolete because:
>  1. The less than 20 bytes of saving is negligible compared to the command
> line argument space of most hosts today.

That's not the issue that is solved by ignoring these paths in the driver 
for %D/%I directives.  The issue is (traditionally) that even if the 
startfiles sit in /usr/lib (say), you don't want to add -L/usr/lib to the 
linker command line because the user might have added -L/usr/local/lib 
explicitly into her link command and, depending on the order of spec file 
entries, the -L/usr/lib would be added in front, interacting with the 
expectations of where libraries are found.

Hence: never add something in (essentially) random places that is default 
fallback anyway.  (Obviously the above problem could be solved in a 
different, more complicated, way.  But this is the way it was solved since 
about forever).
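A rough C sketch of the filtering described above, assuming the driver holds a list of candidate library directories: directories the linker already searches by default are skipped when building `-L` options, so they can never land ahead of a user's explicit `-L/usr/local/lib`. All names here (`default_dirs`, `is_default_dir`, `build_L_options`) are hypothetical; the real driver code is not reproduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: directories the linker searches on its own anyway. */
static const char *const default_dirs[] = { "/lib/", "/usr/lib/" };

static int is_default_dir(const char *dir)
{
    for (size_t i = 0; i < sizeof default_dirs / sizeof default_dirs[0]; i++)
        if (strcmp(dir, default_dirs[i]) == 0)
            return 1;
    return 0;
}

/* Append " -L<dir>" to OUT for every non-default directory in DIRS and
   return the number of -L options emitted. */
static int build_L_options(const char *const *dirs, size_t n,
                           char *out, size_t out_size)
{
    int emitted = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (is_default_dir(dirs[i]))
            continue; /* leave it to the linker's built-in fallback */
        size_t len = strlen(out);
        snprintf(out + len, out_size - len, " -L%s", dirs[i]);
        emitted++;
    }
    return emitted;
}
```

Dropping the defaults rather than appending them last is the simpler of the two options Michael mentions; an ordering-aware scheme would also work but needs more bookkeeping.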

If mold doesn't look into {,/usr}/lib{,64} (as appropriate) by default 
then that's the problem of mold.


Ciao,
Michael.


[Bug target/109402] v850: non-v850e version of __muldi3() in /libgcc/config/v850/lib1funcs.S operates sp in reversed direction

2023-04-06 Thread mikpelinux at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109402

--- Comment #2 from Mikael Pettersson  ---
Please send patches to gcc-patches for review.

[Bug target/107674] [11/12/13 Regressions] arm: MVE codegen regressions on VCTP and vector LDR/STR instructions

2023-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107674

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Richard Earnshaw :

https://gcc.gnu.org/g:ddc9b5ee13cd686c8674f92d46045563c06a23ea

commit r13-7114-gddc9b5ee13cd686c8674f92d46045563c06a23ea
Author: Richard Earnshaw 
Date:   Thu Apr 6 14:44:30 2023 +0100

arm: mve: fix auto-inc generation [PR107674]

My change r13-416-g485a0ae0982abe caused the compiler to stop
generating auto-inc operations on mve loads and stores.  The fix
is to check whether there is a replacement register available
when in strict mode and the register is still a pseudo.

gcc:

PR target/107674
* config/arm/arm.cc (arm_effective_regno): New function.
(mve_vector_mem_operand): Use it.

Re: [PATCH] RISC-V: Add RVV auto-vectorization testcase

2023-04-06 Thread Kito Cheng via Gcc-patches
You included the asm output by accident  :P

On Thu, Apr 6, 2023 at 10:45 PM  wrote:
>
> From: Juzhe-Zhong 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add testing for RVV 
> auto-vectorization.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Adapt testcase.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.s: New test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
> test.
> * gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
> test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
> * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New 
> test.
> * gcc.target/riscv/rvv/autovec/template-1.h: New test.
> * gcc.target/riscv/rvv/autovec/v-1.c: New test.
> * gcc.target/riscv/rvv/autovec/v-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
> * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.
>
> ---
>  .../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
>  .../rvv/autovec/partial/multiple_rgroup-1.h   | 304 +++
>  .../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
>  .../rvv/autovec/partial/multiple_rgroup-2.h   | 546 
>  .../rvv/autovec/partial/multiple_rgroup-2.s   | 774 ++
>  .../autovec/partial/multiple_rgroup_run-1.c   |  19 +
>  .../autovec/partial/multiple_rgroup_run-2.c   |  19 +
>  .../rvv/autovec/partial/single_rgroup-1.c |   8 +
>  .../rvv/autovec/partial/single_rgroup-1.h | 106 +++
>  .../rvv/autovec/partial/single_rgroup_run-1.c |  19 +
>  .../gcc.target/riscv/rvv/autovec/template-1.h |  68 ++
>  .../gcc.target/riscv/rvv/autovec/v-1.c|   4 +
>  .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   4 +
>  .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   5 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   4 +
>  .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   4 +
>  .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
>  .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   5 +
>  .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   4 +
>  .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   4 +
>  .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   4 +
>  .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   4 +
>  .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   4 +
>  .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   4 +
>  .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
>  .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   4 +
>  .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   4 +
>  .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   4 +
>  .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  16 +
>  .../gcc.target/riscv/rvv/vsetvl/vsetvl-17.c   |   2 +-
>  35 files changed, 1996 insertions(+), 1 deletion(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
>  create mode 100644 
> 

[committed] arm: mve: fix auto-inc generation [PR107674]

2023-04-06 Thread Richard Earnshaw via Gcc-patches

My change r13-416-g485a0ae0982abe caused the compiler to stop
generating auto-inc operations on mve loads and stores.  The fix
is to check whether there is a replacement register available
when in strict mode and the register is still a pseudo.

gcc:

PR target/107674
* config/arm/arm.cc (arm_effective_regno): New function.
(mve_vector_mem_operand): Use it.
---
 gcc/config/arm/arm.cc | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index a46627bc375..bf7ff9a9704 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -13639,6 +13639,19 @@ arm_coproc_mem_operand_no_writeback (rtx op)
   return arm_coproc_mem_operand_wb (op, 0);
 }
 
+/* In non-STRICT mode, return the register number; in STRICT mode return
+   the hard regno or the replacement if it won't be a mem.  Otherwise, return
+   the original pseudo number.  */
+static int
+arm_effective_regno (rtx op, bool strict)
+{
+  gcc_assert (REG_P (op));
+  if (!strict || REGNO (op) < FIRST_PSEUDO_REGISTER
+      || !reg_renumber || reg_renumber[REGNO (op)] < 0)
+    return REGNO (op);
+  return reg_renumber[REGNO (op)];
+}
+
 /* This function returns TRUE on matching mode and op.
 1. For given modes, check for [Rn], return TRUE for Rn <= LO_REGS.
 2. For other modes, check for [Rn], return TRUE for Rn < R15 (expect R13).  */
@@ -13651,7 +13664,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
   /* Match: (mem (reg)).  */
   if (REG_P (op))
 {
-  int reg_no = REGNO (op);
+  reg_no = arm_effective_regno (op, strict);
   return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
 	   ? reg_no <= LAST_LO_REGNUM
 	   : reg_no < LAST_ARM_REGNUM)
@@ -13662,7 +13675,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
   if (code == POST_INC || code == PRE_DEC
   || code == PRE_INC || code == POST_DEC)
 {
-  reg_no = REGNO (XEXP (op, 0));
+  reg_no = arm_effective_regno (XEXP (op, 0), strict);
   return (((mode == E_V8QImode || mode == E_V4QImode || mode == E_V4HImode)
 	   ? reg_no <= LAST_LO_REGNUM
 	   :(reg_no < LAST_ARM_REGNUM && reg_no != SP_REGNUM))
@@ -13678,7 +13691,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool strict)
 	   || (reload_completed && code == PLUS && REG_P (XEXP (op, 0))
 	   && GET_CODE (XEXP (op, 1)) == CONST_INT))
 {
-  reg_no = REGNO (XEXP (op, 0));
+  reg_no = arm_effective_regno (XEXP (op, 0), strict);
   if (code == PLUS)
 	val = INTVAL (XEXP (op, 1));
   else


PR target/70243: Do not generate fmaddfp and fnmsubfp

2023-04-06 Thread Michael Meissner via Gcc-patches
The Altivec instructions vmaddfp and vnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen.

GCC has generated the Altivec vmaddfp and vnmsubfp instructions on VSX systems
as an alternative to the xvmadd{a,m}sp and xvnmsub{a,m}sp instructions.  The
advantage of the Altivec instructions is that they are 4-operand instructions
(i.e. the target register does not have to overlap with one of the input
registers), which can eliminate an extra move instruction.  The
disadvantage is that they do not round the same way as the VSX instructions.

This patch eliminates the generation of the Altivec vmaddfp and vnmsubfp
instructions as alternatives in the VSX insn support, and in the
Altivec insns it adds a test to prevent the insn from being used if VSX is
available.  I also added a test to the regression test suite.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and test on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-06   Michael Meissner  

gcc/

PR target/70243
* config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
vmaddfp and vnmsubfp from being generated on VSX systems.
(altivec_vnmsubfp): Likewise.
* config/rs6000/vsx.md (vsx_fmav4sf4): Do not generate vmaddfp or
vnmsubfp.
(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

PR target/70243
* gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/altivec.md   |  9 +++--
 gcc/config/rs6000/vsx.md   | 29 +++
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++
 3 files changed, 61 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@ (define_insn "altivec_vsel4"
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since it has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
  (match_operand:V4SF 2 "register_operand" "v")
  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@ (define_insn "vstril_p_direct_"
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since it has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(neg:V4SF
@@ -991,7 +996,7 @@ (define_insn "*altivec_vnmsubfp"
   (match_operand:V4SF 2 "register_operand" "v")
   (neg:V4SF
(match_operand:V4SF 3 "register_operand" "v")]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt2_internal"
   "xtsqrtp %0,%x1"
   [(set_attr "type" "")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allow the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
(fma:V4SF
- (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
- (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
- (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+ (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+ (match_operand:V4SF 2 

[Bug target/82028] Windows x86_64 should not pass float aggregates in xmm

2023-04-06 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82028

--- Comment #7 from LIU Hao  ---
clang generates 14 bytes:

```
mov rax, 0x7FFFFFFFFFFFFFFF   # 48 B8 FF FF FF FF FF FF FF 7F
and rax, rcx  # 48 23 C1
ret   # C3
```

but in principle this function requires only 8 bytes:

```
lea rax, qword ptr [rcx + rcx]# 48 8D 04 09
shr rax, 1# 48 D1 E8
ret   # C3
```
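Both sequences compute the same value: adding a register to itself shifts bit 63 out, and the logical right shift by one brings a zero back in, which is exactly an `and` with 0x7FFFFFFFFFFFFFFF. A small C check of that identity (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* What the 14-byte sequence computes: clear bit 63 with a 64-bit mask. */
static uint64_t clear_sign_mask(uint64_t x)
{
    return x & 0x7FFFFFFFFFFFFFFFull;
}

/* What the 8-byte lea/shr sequence computes: x + x (i.e. x << 1, with
   bit 63 discarded by unsigned wrap-around), then a logical shift right. */
static uint64_t clear_sign_shift(uint64_t x)
{
    return (x + x) >> 1;
}
```

For a `double` reinterpreted as its bit pattern, both functions act as `fabs`: 0xBFF0000000000000 (-1.0) becomes 0x3FF0000000000000 (1.0).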

Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jeff Law via Gcc-patches




On 4/6/23 04:31, Jakub Jelinek wrote:
> If we want to fix it in the combiner, I think the fix would be the following.
> The optimization is about
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> and IMHO we can only optimize it into
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> if we know that the upper bits of the REG are zeros.

But in WORD_REGISTER_OPERATIONS, that inner AND variant operates on a
full word.  So I think they're equivalent.  But maybe I'm getting myself
confused again.

> Now, this patch fixes the PR, but certainly generates worse (but correct)
> code than the dse.cc patch.  So perhaps we want both of them?

I think the dse patch has value independently of this discussion, though
I think it's more of a gcc-14 thing.

> As before, I unfortunately can't test it on riscv-linux (could perhaps try
> that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
> target, but my last bootstrap attempt there failed miserably because of the
> Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
> for that once I test it).

I can spin it here when the time comes.

jeff




[Bug target/109416] Missed constant propagation cases after reload

2023-04-06 Thread sinan.lin at linux dot alibaba.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109416

--- Comment #3 from Sinan  ---
Hi Andrew,

Thank you for taking the time to explain the issue. I appreciate it.

I think the issue with init/init2 and the issue with init3 might be different.
Regarding init3, any 32-bit backend attempting to split a complex constant will
encounter such a suboptimal case.

I tried MIPS with GCC 12, and here are the outputs for `init` and `init3`:

init:
lui $2,%hi(Data)
move $5,$0
move $4,$0
sw  $5,%lo(Data+4)($2)
jr  $31
sw  $4,%lo(Data)($2)

init3:
lui $2,%hi(Data)
li  $5,15 # 0xf
li  $4,15 # 0xf
sw  $5,%lo(Data+4)($2)
jr  $31
sw  $4,%lo(Data)($2)

Register $4 or $5 could be eliminated, since both are loaded with the same
value (and in `init`, the stores could use $0 directly).
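A minimal C sketch of the missed reuse, assuming a 32-bit target that splits a 64-bit constant into two word-sized halves and has a hard-wired zero register like MIPS `$0`: equal halves can share one register, and a zero half can be stored from `$0`. `regs_needed` is a hypothetical helper, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of scratch registers needed to store the 64-bit
   constant VALUE as two 32-bit words on a target with a zero register. */
static int regs_needed(uint64_t value)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    if (lo == 0 && hi == 0)
        return 0;           /* both words can be stored from $0 */
    if (lo == hi || lo == 0 || hi == 0)
        return 1;           /* one register, reused or paired with $0 */
    return 2;
}
```

Under this model `init` (all zero) needs no scratch register and `init3` (0x0000000F0000000F) needs one, whereas the GCC 12 output above uses two in both cases.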

Re: [PATCH] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-06 Thread Kito Cheng via Gcc-patches
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 2e91d019f6c..90c69b52bb4 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -43,6 +43,7 @@
> #include "optabs.h"
> #include "tm-constrs.h"
> #include "rtx-vector-builder.h"
> +#include "diagnostic-core.h"
> using namespace riscv_vector;
> @@ -724,4 +725,82 @@ gen_avl_for_scalar_move (rtx avl)
>  }
> }
> +/* Generate a sequence of instructions that zero registers specified by
> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> +   zeroed.  */
> +static HARD_REG_SET
> +gpr_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)

Drop this; call default_zero_call_used_regs instead of building our own.

> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +
> +  for (unsigned regno = GP_REG_FIRST; regno <= GP_REG_LAST; ++regno)
> +{
> +  if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> + continue;
> +
> +  rtx reg = regno_reg_rtx[regno];
> +  machine_mode mode = GET_MODE (reg);
> +  emit_move_insn (reg, CONST0_RTX (mode));
> +
> +  SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +}
> +
> +  return zeroed_hardregs;
> +}
> +
> +static HARD_REG_SET
> +vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)

Please move this into riscv.cc.

> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +
> +  /* Find a register to hold vl.  */
> +  unsigned vl_regno = GP_REG_LAST + 1;

Use INVALID_REGNUM as the sentinel value.

> +  for (unsigned regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)

Start from `GP_REG_FIRST + 1`

> +{
> +  /* If vl and avl both are x0, the existing vl is kept.  */
> +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno) && regno != X0_REGNUM)

Then we don't need to check `regno != X0_REGNUM` here.

> + {
> +   vl_regno = regno;
> +   break;
> + }
> +}
> +
> +  if (vl_regno > GP_REG_LAST)
> +sorry ("can't allocate vl register for %qs on this target",
> +"-fzero-call-used-regs");
> +
> +  rtx vl = gen_rtx_REG (Pmode, vl_regno); /* vl is VLMAX.  */
> +  for (unsigned regno = V_REG_FIRST; regno <= V_REG_LAST; ++regno)
> +{
> +  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> + {
> +   rtx target = regno_reg_rtx[regno];
> +   machine_mode mode = GET_MODE (target);
> +   poly_uint16 nunits = GET_MODE_NUNITS (mode);
> +   machine_mode mask_mode = get_vector_mode (BImode, nunits).require ();
> +
> +   emit_vlmax_vsetvl (mode, vl);

You can add a variable to track whether the vlmax vsetvl has already been
emitted, and skip emitting it again if so.

e.g.

if (!emitted_vlmax_vsetvl)
  emit_vlmax_vsetvl (mode, vl);

emitted_vlmax_vsetvl = true;

Add a new function, maybe named emit_hard_vlmax_vsetvl, to prevent the
vsetvli instruction from being removed when optimization is enabled.
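
A rough C model of the three review suggestions: an INVALID_REGNUM sentinel, scanning from GP_REG_FIRST + 1 so x0 is never picked, and emitting the VLMAX vsetvl only once. Register sets are modelled as 32-bit masks, the constants are stand-ins, and pick_vl_regno and count_vsetvls are hypothetical helpers, not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for GCC's definitions; values chosen for this model only.  */
#define GP_REG_FIRST 0u
#define GP_REG_LAST 31u
#define INVALID_REGNUM (~0u)

/* Pick a GPR to hold vl from the registers that will be zeroed anyway.
   Starting at GP_REG_FIRST + 1 skips x0, so no explicit X0 check is needed.  */
static unsigned
pick_vl_regno (uint32_t need_zeroed_gprs)
{
  for (unsigned regno = GP_REG_FIRST + 1; regno <= GP_REG_LAST; regno++)
    if (need_zeroed_gprs & (UINT32_C (1) << regno))
      return regno;
  return INVALID_REGNUM;   /* the caller would report sorry () here */
}

/* Count the vsetvl instructions needed to zero the vector registers in
   NEED_ZEROED_VREGS when the VLMAX vsetvl is emitted at most once.  */
static unsigned
count_vsetvls (uint32_t need_zeroed_vregs)
{
  bool emitted_vlmax_vsetvl = false;
  unsigned count = 0;

  for (unsigned regno = 0; regno < 32; regno++)
    {
      if (!(need_zeroed_vregs & (UINT32_C (1) << regno)))
        continue;
      if (!emitted_vlmax_vsetvl)
        count++;   /* stands in for emitting the vsetvl */
      emitted_vlmax_vsetvl = true;
      /* ...the per-register zeroing move would be emitted here... */
    }
  return count;
}
```

Zeroing all 32 vector registers then needs a single vsetvl rather than one per register.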
---
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4611447ddde..5244e8dcbf0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -159,6 +159,7 @@ bool check_builtin_call (location_t,
vec, unsigned int,
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 bool legitimize_move (rtx, rtx, machine_mode);
 void emit_vlmax_vsetvl (machine_mode, rtx);
+void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
 void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
 void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 90c69b52bb4..6d34e3a2b6c 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -119,6 +119,20 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }

+/* Emit a vlmax vsetvl instruction with side effects.  This should only be used
+   when optimization is turned off or when emitting after the vsetvl insertion pass.  */
+void
+emit_hard_vlmax_vsetvl (machine_mode vmode, rtx vl)
+{
+  unsigned int sew = get_sew (vmode);
+  enum vlmul_type vlmul = get_vlmul (vmode);
+  unsigned int ratio = calculate_ratio (sew, vlmul);
+
+  emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
+gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
+const0_rtx));
+}
+
 void
 emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
 {
@@ -127,9 +141,7 @@ emit_vlmax_vsetvl (machine_mode vmode, rtx vl)
   unsigned int ratio = calculate_ratio (sew, vlmul);

   if (!optimize)
-emit_insn (gen_vsetvl (Pmode, vl, RVV_VLMAX, gen_int_mode (sew, Pmode),
-  gen_int_mode (get_vlmul (vmode), Pmode), const0_rtx,
-  const0_rtx));
+emit_hard_vlmax_vsetvl (vmode, vl);
   else
 emit_insn (gen_vlmax_avl (Pmode, vl, gen_int_mode (ratio, Pmode)));
 }

---


> +   emit_vlmax_op (code_for_pred_mov (mode), 

Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jeff Law via Gcc-patches




On 4/6/23 04:15, Eric Botcazou wrote:

Originally I didn't really see this as an operation.  But the more I
ponder it, the more it feels like an operation that should thus be subject
to WORD_REGISTER_OPERATIONS.

While it's not really binding on RTL semantics, if we look at how some
architectures implement reg->reg copies, they're actually implemented
with an ADD or IOR -- so a real operation under the hood.

If we accept a subreg copy as an operation, and thus subject to
WORD_REGISTER_OPERATIONS, then that would seem to imply that combine is
the real problem here.  Otherwise DSE is the culprit.
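
To make the two readings concrete, here is a small C model, assuming 32-bit word registers with an HImode value in the low 16 bits. The function names are hypothetical, and zero extension in the second reading is an assumption chosen purely for illustration.

```c
#include <stdint.h>

/* Hypothetical model: registers are 32-bit words and an HImode value lives
   in the low 16 bits.  The question is what the upper 16 bits of the
   destination hold after (set (reg:HI d) (subreg:HI (reg:SI s) 0)).  */

/* Reading 1: a plain word-sized register copy, so the upper bits of the
   source word are carried over unchanged.  */
static uint32_t
hi_copy_as_word_move (uint32_t src_word)
{
  return src_word;
}

/* Reading 2: the copy is an "operation" subject to
   WORD_REGISTER_OPERATIONS, so the upper bits are well defined.  Zero
   extension is assumed here purely for illustration.  */
static uint32_t
hi_copy_as_extending_op (uint32_t src_word)
{
  return src_word & UINT32_C (0xffff);
}
```

With a source word of 0x0008084c, a later (subreg:SI (reg:HI d) 0) would observe 0x0008084c under the first reading but 0x0000084c under the second, which is the difference that matters when deciding where the bug lies.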


Yes, I agree that there is an ambiguity for subreg copy operations.  At some
point I tried to define what register operations are and added a predicate to
that effect (word_register_operation_p); while it returns true for SUBREG,
it's an opt-out predicate so this does not mean much.
Yea, I saw word_register_operation_p.  I was hesitant to treat it as a 
canonical definition of what ops are and are not subject to 
WORD_REGISTER_OPERATIONS.




I don't think that DSE does anything wrong: as I wrote in the PR, defining
WORD_REGISTER_OPERATIONS should not prevent any particular form of RTL.
That was the conclusion I'd come to, predicated on treating SUBREGs as 
affected by WORD_REGISTER_OPERATIONS.




I therefore think that the problem is in the combiner and probably in the
intermediate step shown by Jakub:

"Then after that try_combine we do:
  record_value_for_reg (dest, record_dead_insn,
                        WORD_REGISTER_OPERATIONS
                        && word_register_operation_p (SET_SRC (setter))
                        && paradoxical_subreg_p (SET_DEST (setter))
                        ? SET_SRC (setter)
                        : gen_lowpart (GET_MODE (dest),
                                       SET_SRC (setter)));
and the 3 conditions are true here and so record value of the whole setter.
That then records among other things nonzero_bits as 0x8084c."

That's a recent addition of mine (ae20d760b1ed69f631c3bf9351bf7e5005d52297)
and I think that it probably abuses WORD_REGISTER_OPERATIONS and should either
be reverted or restricted to the load case documented in its comment.  I can
provide testing on SPARC if need be.
I think that's the job for today.  Pan2, Jakub and myself have all
zeroed in on this code in combine.


jeff



Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jeff Law via Gcc-patches





On 4/6/23 03:37, Li, Pan2 wrote:

Yes, RISC-V's riscv.h defines WORD_REGISTER_OPERATIONS as 1, while
aarch64.h defines it as 0, with the comments below. I have no idea whether
that can fit RISC-V.
I don't see any fundamental reason why it won't work.  Most of the
expansion code already has code to widen types as necessary.  And the fact
that we have a subset of 32-bit ops, even in 64-bit mode, makes
WORD_REGISTER_OPERATIONS 0 a more sensible choice.


Jeff


Re: [PATCH] dse: Handle SUBREGs of word REGs differently for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jeff Law via Gcc-patches




On 4/6/23 03:31, Richard Sandiford wrote:

Jeff Law  writes:

On 4/5/23 10:48, Jakub Jelinek wrote:

On Wed, Apr 05, 2023 at 10:17:59AM -0600, Jeff Law wrote:

It is true that an instruction like
(insn 8 7 9 2 (set (reg:HI 141)
   (subreg:HI (reg:SI 142) 0)) "aauu.c":6:18 181 {*movhi_internal}
(nil))
can appear in the IL on WORD_REGISTER_OPERATIONS target, but I think the
upper bits shouldn't be random garbage in that case, it should be zero
extended or sign extended.

Well, that's one of the core questions here.  What are the state of the
upper 16 bits of (reg:HI 141)?  The WORD_REGISTER_OPERATIONS docs aren't
100% clear as we're not really doing any operation.

So again, I think we need to decide if the DSE transformation is correct or
not.  I *think* we can agree that insn 39 is OK.  It's really the semantics
of insn 47 that I think we need to agree on.  What is the state of the upper
16 bits of (reg:HI 175) after insn 47?


I'm afraid I don't know the answers here; I think Eric is the
WORD_REGISTER_OPERATIONS expert these days (most of the major
targets are !WORD_REGISTER_OPERATIONS).

Hopefully he'll chime in.


Just curious: have you experimented with making RISC-V
!WORD_REGISTER_OPERATIONS too?  Realise it's not the right way
to fix the bug, just curious in general.
We haven't experimented with it AFAIK.  We don't have a full set of 
SImode operations, but we may have enough of a subset to try. 
Alternately, we could potentially see what happens if we ignore the 
32bit ops that we do support.  Both general directions are probably 
worth exploring, but not right now and probably not even for gcc-14 
(where we're going to be busy as hell on the vector side).




Not defining it seems to have worked well for AArch64.  And IMO
the semantics are much easier to follow when there is no special
treatment of upper bits.  Subregs are hard enough to reason about
as it is...
Amen to that.  My sense is that the risc-v port relies far too heavily 
on SUBREGs and the WORD_REGISTER_OPERATIONS just makes reasoning about 
correctness harder.


jeff


[PATCH] RISC-V: Add RVV auto-vectorization testcase

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add testing for RVV auto-vectorization.
* gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.s: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/template-1.h: New test.
* gcc.target/riscv/rvv/autovec/v-1.c: New test.
* gcc.target/riscv/rvv/autovec/v-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: New test.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-2.c: New test.

---
 .../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-1.h   | 304 +++
 .../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-2.h   | 546 
 .../rvv/autovec/partial/multiple_rgroup-2.s   | 774 ++
 .../autovec/partial/multiple_rgroup_run-1.c   |  19 +
 .../autovec/partial/multiple_rgroup_run-2.c   |  19 +
 .../rvv/autovec/partial/single_rgroup-1.c |   8 +
 .../rvv/autovec/partial/single_rgroup-1.h | 106 +++
 .../rvv/autovec/partial/single_rgroup_run-1.c |  19 +
 .../gcc.target/riscv/rvv/autovec/template-1.h |  68 ++
 .../gcc.target/riscv/rvv/autovec/v-1.c|   4 +
 .../gcc.target/riscv/rvv/autovec/v-2.c|   6 +
 .../gcc.target/riscv/rvv/autovec/zve32f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32f-2.c   |   5 +
 .../riscv/rvv/autovec/zve32f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve32f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve32x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve32x-2.c   |   6 +
 .../riscv/rvv/autovec/zve32x_zvl128b-1.c  |   5 +
 .../riscv/rvv/autovec/zve32x_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64d-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64d-2.c   |   4 +
 .../riscv/rvv/autovec/zve64d_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64d_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64f-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64f-2.c   |   4 +
 .../riscv/rvv/autovec/zve64f_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64f_zvl128b-2.c  |   6 +
 .../gcc.target/riscv/rvv/autovec/zve64x-1.c   |   4 +
 .../gcc.target/riscv/rvv/autovec/zve64x-2.c   |   4 +
 .../riscv/rvv/autovec/zve64x_zvl128b-1.c  |   4 +
 .../riscv/rvv/autovec/zve64x_zvl128b-2.c  |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  16 +
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-17.c   |   2 +-
 35 files changed, 1996 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.s
 create mode 100644 

[PATCH 2/3] RISC-V: Enable basic RVV auto-vectorization and support WHILE_LEN/LEN_LOAD/LEN_STORE pattern

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add 
compile option for RVV auto-vectorization.
(enum riscv_autovec_lmul_enum): Ditto.
* config/riscv/riscv-protos.h (get_vector_mode): Remove unused global 
function.
(preferred_simd_mode): Enable basic auto-vectorization for RVV.
(expand_while_len): Enable while_len pattern.
* config/riscv/riscv-v.cc (get_avl_type_rtx): Ditto.
(autovec_use_vlmax_p): New function.
(preferred_simd_mode): New function.
(expand_while_len): Ditto.
* config/riscv/riscv-vector-switch.def (ENTRY): Disable SEW = 64 for 
MIN_VLEN > 32 but EEW = 32.
* config/riscv/riscv-vsetvl.cc (get_all_successors): New function.
(get_all_overlap_blocks): Ditto.
(local_eliminate_vsetvl_insn): Ditto.
(vector_insn_info::skip_avl_compatible_p): Ditto.
(vector_insn_info::merge): Ditto.
(pass_vsetvl::compute_local_backward_infos): Enhance VSETVL pass for RVV
auto-vectorization.
(pass_vsetvl::global_eliminate_vsetvl_p): Ditto.
(pass_vsetvl::cleanup_insns): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.
* config/riscv/riscv.cc (riscv_convert_vector_bits): Add basic RVV 
auto-vectorization support.
(riscv_preferred_simd_mode): Ditto.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
* config/riscv/riscv.opt: Add compile option.
* config/riscv/vector.md: Add RVV auto-vectorization.
* config/riscv/autovec.md: New file.

---
 gcc/config/riscv/autovec.md  |  63 +++
 gcc/config/riscv/riscv-opts.h|  16 ++
 gcc/config/riscv/riscv-protos.h  |   3 +-
 gcc/config/riscv/riscv-v.cc  |  61 ++-
 gcc/config/riscv/riscv-vector-switch.def |  47 +++--
 gcc/config/riscv/riscv-vsetvl.cc | 210 ++-
 gcc/config/riscv/riscv-vsetvl.h  |   1 +
 gcc/config/riscv/riscv.cc|  34 +++-
 gcc/config/riscv/riscv.opt   |  40 +
 gcc/config/riscv/vector.md   |   6 +-
 10 files changed, 457 insertions(+), 24 deletions(-)
 create mode 100644 gcc/config/riscv/autovec.md

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
new file mode 100644
index 000..ff616d81586
--- /dev/null
+++ b/gcc/config/riscv/autovec.md
@@ -0,0 +1,63 @@
+;; Machine description for auto-vectorization using RVV for GNU compiler.
+;; Copyright (C) 2023-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;; =
+;; == While_len
+;; =
+
+(define_expand "while_len"
+  [(match_operand:P 0 "register_operand")
+   (match_operand:P 1 "vector_length_operand")
+   (match_operand:P 2 "")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_while_len (operands);
+  DONE;
+})
+
+;; =
+;; == Loads/Stores
+;; =
+
+;; len_load/len_store are sub-optimal patterns for RVV auto-vectorization support.
+;; We will replace them once len_maskload/len_maskstore is supported in the loop vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..22b79b65de5 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ 

[PATCH 0/3] RISC-V: Enable basic auto-vectorization for RVV

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

PATCH 1: Add the WHILE_LEN pattern to the loop vectorizer to support a decrement IV for RVV.
PATCH 2: Enable basic auto-vectorization for RVV in the RISC-V port.
PATCH 3: Add testcases for basic RVV auto-vectorization of the WHILE_LEN pattern,
 including single rgroup tests and multiple rgroup tests of SLP.


Juzhe-Zhong (3):
  VECT: Add WHILE_LEN pattern to support decrement IV manipulation for
loop vectorizer.
  RISC-V: Enable basic RVV auto-vectorization and support
WHILE_LEN/LEN_LOAD/LEN_STORE pattern
  RISC-V: Add testcase for basic RVV auto-vectorization

 gcc/config/riscv/autovec.md   |  63 ++
 gcc/config/riscv/riscv-opts.h |  16 +
 gcc/config/riscv/riscv-protos.h   |   3 +-
 gcc/config/riscv/riscv-v.cc   |  61 +-
 gcc/config/riscv/riscv-vector-switch.def  |  47 +-
 gcc/config/riscv/riscv-vsetvl.cc  | 210 -
 gcc/config/riscv/riscv-vsetvl.h   |   1 +
 gcc/config/riscv/riscv.cc |  34 +-
 gcc/config/riscv/riscv.opt|  40 +
 gcc/config/riscv/vector.md|   6 +-
 gcc/doc/md.texi   |  14 +
 gcc/internal-fn.cc|  29 +
 gcc/internal-fn.def   |   1 +
 gcc/optabs.def|   1 +
 gcc/testsuite/gcc.target/riscv/rvv/api/vadc.c | 361 
 gcc/testsuite/gcc.target/riscv/rvv/api/vadd.c | 713 
 gcc/testsuite/gcc.target/riscv/rvv/api/vand.c | 713 
 .../gcc.target/riscv/rvv/api/vcpop.c  |  65 ++
 gcc/testsuite/gcc.target/riscv/rvv/api/vdiv.c | 361 
 .../gcc.target/riscv/rvv/api/vdivu.c  | 361 
 .../gcc.target/riscv/rvv/api/vfirst.c |  65 ++
 gcc/testsuite/gcc.target/riscv/rvv/api/vid.c  | 185 +
 .../gcc.target/riscv/rvv/api/viota.c  | 185 +
 .../gcc.target/riscv/rvv/api/vle16.c  | 105 +++
 .../gcc.target/riscv/rvv/api/vle32.c  | 129 +++
 .../gcc.target/riscv/rvv/api/vle64.c  |  73 ++
 gcc/testsuite/gcc.target/riscv/rvv/api/vle8.c | 121 +++
 gcc/testsuite/gcc.target/riscv/rvv/api/vlm.c  |  37 +
 .../gcc.target/riscv/rvv/api/vloxei16.c   | 385 +
 .../gcc.target/riscv/rvv/api/vloxei32.c   | 353 
 .../gcc.target/riscv/rvv/api/vloxei64.c   | 297 +++
 .../gcc.target/riscv/rvv/api/vloxei8.c| 401 +
 .../gcc.target/riscv/rvv/api/vlse16.c | 105 +++
 .../gcc.target/riscv/rvv/api/vlse32.c | 129 +++
 .../gcc.target/riscv/rvv/api/vlse64.c |  73 ++
 .../gcc.target/riscv/rvv/api/vlse8.c  | 121 +++
 .../gcc.target/riscv/rvv/api/vluxei16.c   | 385 +
 .../gcc.target/riscv/rvv/api/vluxei32.c   | 353 
 .../gcc.target/riscv/rvv/api/vluxei64.c   | 297 +++
 .../gcc.target/riscv/rvv/api/vluxei8.c| 401 +
 .../gcc.target/riscv/rvv/api/vmacc.c  | 713 
 .../gcc.target/riscv/rvv/api/vmadc.c  | 713 
 .../gcc.target/riscv/rvv/api/vmadd.c  | 713 
 .../gcc.target/riscv/rvv/api/vmand.c  |  37 +
 .../gcc.target/riscv/rvv/api/vmandn.c |  37 +
 gcc/testsuite/gcc.target/riscv/rvv/api/vmax.c | 361 
 .../gcc.target/riscv/rvv/api/vmaxu.c  | 361 
 .../gcc.target/riscv/rvv/api/vmclr.c  |  37 +
 .../gcc.target/riscv/rvv/api/vmerge.c | 361 
 gcc/testsuite/gcc.target/riscv/rvv/api/vmin.c | 361 
 .../gcc.target/riscv/rvv/api/vminu.c  | 361 
 gcc/testsuite/gcc.target/riscv/rvv/api/vmmv.c |  37 +
 .../gcc.target/riscv/rvv/api/vmnand.c |  37 +
 .../gcc.target/riscv/rvv/api/vmnor.c  |  37 +
 .../gcc.target/riscv/rvv/api/vmnot.c  |  37 +
 gcc/testsuite/gcc.target/riscv/rvv/api/vmor.c |  37 +
 .../gcc.target/riscv/rvv/api/vmorn.c  |  37 +
 .../gcc.target/riscv/rvv/api/vmsbc.c  | 713 
 .../gcc.target/riscv/rvv/api/vmsbf.c  |  65 ++
 .../gcc.target/riscv/rvv/api/vmseq.c  | 713 
 .../gcc.target/riscv/rvv/api/vmset.c  |  37 +
 .../gcc.target/riscv/rvv/api/vmsge.c  | 361 
 .../gcc.target/riscv/rvv/api/vmsgeu.c | 361 
 .../gcc.target/riscv/rvv/api/vmsgt.c  | 361 
 .../gcc.target/riscv/rvv/api/vmsgtu.c | 361 
 .../gcc.target/riscv/rvv/api/vmsif.c  |  65 ++
 .../gcc.target/riscv/rvv/api/vmsle.c  | 361 
 .../gcc.target/riscv/rvv/api/vmsleu.c | 361 
 .../gcc.target/riscv/rvv/api/vmslt.c  | 361 
 .../gcc.target/riscv/rvv/api/vmsltu.c | 361 
 .../gcc.target/riscv/rvv/api/vmsne.c  | 713 
 .../gcc.target/riscv/rvv/api/vmsof.c  |  65 ++
 gcc/testsuite/gcc.target/riscv/rvv/api/vmul.c | 713 
 

[PATCH 1/3] VECT: Add WHILE_LEN pattern to support decrement IV manipulation for loop vectorizer.

2023-04-06 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch adds the WHILE_LEN pattern.
It is inspired by the simple "vvaddint32.s" example from the RVV ISA spec:
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s


More details are in the "vect_set_loop_controls_by_while_len" implementation
and its comments.

Consider the following case:
#define N 16
int src[N];
int dest[N];

void
foo (int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = src[i];
}

-march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
-fno-vect-cost-model -fno-tree-loop-distribute-patterns:
foo:
ble a0,zero,.L1
lui a4,%hi(.LANCHOR0)
addi a4,a4,%lo(.LANCHOR0)
addi a3,a4,64
csrr a2,vlenb
.L3:
vsetvli a5,a0,e32,m1,ta,ma
vle32.v v1,0(a4)
sub a0,a0,a5
vse32.v v1,0(a3)
add a4,a4,a2
add a3,a3,a2
bne a0,zero,.L3
.L1:
ret
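
As a loose scalar model of the loop above, the sketch below follows the MIN (operand1, operand2) definition given in the md.texi hunk. The names while_len_model and copy_with_decrement_iv are hypothetical, and the real vsetvli, vle32.v and vse32.v are only approximated by a plain element loop.

```c
#include <stddef.h>

/* Scalar model of WHILE_LEN as documented in the md.texi hunk:
   operand0 = MIN (operand1, operand2), i.e. the number of elements
   still to process, capped at the vectorization factor VF.  */
static size_t
while_len_model (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Decrement-IV copy loop in the rough shape of the vsetvli loop above.
   Returns the number of loop iterations executed.  */
static size_t
copy_with_decrement_iv (int *dest, const int *src, size_t n, size_t vf)
{
  size_t iterations = 0;

  while (n > 0)
    {
      size_t len = while_len_model (n, vf);   /* like vsetvli a5,a0,...  */
      for (size_t i = 0; i < len; i++)        /* like vle32.v + vse32.v  */
        dest[i] = src[i];
      src += len;
      dest += len;
      n -= len;                               /* like sub a0,a0,a5       */
      iterations++;
    }
  return iterations;
}
```

The point of the decrement IV is that the last, partial iteration needs no separate epilogue: the active length simply shrinks to whatever remains.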

Multiple rgroups for SLP are also supported.
More testcases are in gcc/testsuite/gcc.target/riscv/rvv/autovec.

gcc/ChangeLog:

* doc/md.texi: Add while_len support.
* internal-fn.cc (while_len_direct): Ditto.
(expand_while_len_optab_fn): Ditto.
(direct_while_len_optab_supported_p): Ditto.
* internal-fn.def (WHILE_LEN): Ditto.
* optabs.def (OPTAB_D): Ditto.
* tree-ssa-loop-manip.cc (create_iv): Ditto.
* tree-ssa-loop-manip.h (create_iv): Ditto.
* tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): New 
function.
(vect_set_loop_condition_partial_vectors): Add while_len support.
* tree-vect-loop.cc (vect_get_loop_len): Ditto.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_get_loop_len): Ditto.

---
 gcc/doc/md.texi |  14 +++
 gcc/internal-fn.cc  |  29 ++
 gcc/internal-fn.def |   1 +
 gcc/optabs.def  |   1 +
 gcc/tree-ssa-loop-manip.cc  |   4 +-
 gcc/tree-ssa-loop-manip.h   |   2 +-
 gcc/tree-vect-loop-manip.cc | 186 ++--
 gcc/tree-vect-loop.cc   |  35 +--
 gcc/tree-vect-stmts.cc  |   9 +-
 gcc/tree-vectorizer.h   |   4 +-
 10 files changed, 264 insertions(+), 21 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8e3113599fd..72178ab014c 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{while_len@var{m}@var{n}} instruction pattern
+@item @code{while_len@var{m}@var{n}}
+Set operand 0 to the number of active elements in the vector to be updated.
+Operand 1 is the total number of elements that need to be updated.
+Operand 2 is the vectorization factor.
+The operation is equivalent to:
+
+@smallexample
+operand0 = MIN (operand1, operand2);
+@end smallexample
+Operand 2 can be a const_poly_int or a poly_int related to the vector mode size.
+Some targets, like RISC-V, have a standalone instruction to get MIN (n, MODE SIZE),
+so that a use of a general purpose register can be saved.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e..5f44def90d3 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -127,6 +127,7 @@ init_internal_fns ()
 #define cond_binary_direct { 1, 1, true }
 #define cond_ternary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
+#define while_len_direct { 0, 0, false }
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
@@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, 
convert_optab optab)
 emit_move_insn (lhs_rtx, ops[0].value);
 }
 
+/* Expand WHILE_LEN call STMT using optab OPTAB.  */
+static void
+expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  expand_operand ops[3];
+  tree rhs_type[2];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  rtx lhs_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], lhs_rtx, TYPE_MODE (lhs_type));
+
+  for (unsigned int i = 0; i < gimple_call_num_args (stmt); ++i)
+{
+  tree rhs = gimple_call_arg (stmt, i);
+  rhs_type[i] = TREE_TYPE (rhs);
+  rtx rhs_rtx = expand_normal (rhs);
+  create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
+}
+
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (rhs_type[0]));
+
+  expand_insn (icode, 3, ops);
+  if (!rtx_equal_p (lhs_rtx, ops[0].value))
+emit_move_insn (lhs_rtx, ops[0].value);
+}
+
 /* Expand a call to a convert-like optab using the operands in STMT.
FN has a single output operand and NARGS input operands.  */
 
@@ -3843,6 +3871,7 @@ 

Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jeff Law via Gcc-patches




On 4/6/23 04:31, Jakub Jelinek wrote:



As before, I unfortunately can't test it on riscv-linux (could perhaps try
that on sparc-solaris on GCC Farm which is another WORD_REGISTER_OPERATIONS
target, but my last bootstrap attempt there failed miserably because of the
Don't bootstrap at midnight issue in cp/Make-lang.in; I'll post a patch
for that once I test it).
I can do that test.  It'll take most of a day once it starts, so not 
eager to fire it up until we've settled on a patch.


jeff


[Bug other/109435] [MIPS64R6] Typedef struct alignment returns incorrect results

2023-04-06 Thread jovan.dmitrovic at syrmia dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109435

--- Comment #1 from Jovan Dmitrović  ---
This is compile command that I used:

mipsisa64r6-linux-gnuabi64-gcc -march=mips64r6 -mabi=64 -O0 -o foo foo.c -static

I used the MIPS gcc package from Ubuntu's package repository.
Also, I used qemu-mips64 to run the executable file.

Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Eric Botcazou via Gcc-patches
> If the
> (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> to
> (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
> invalid operation is then in
> simplify_context::simplify_binary_operation_1
> case AND:
> ...
>   if (HWI_COMPUTABLE_MODE_P (mode))
> {
>   HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
>   HOST_WIDE_INT nzop1;
>   if (CONST_INT_P (trueop1))
> {
>   HOST_WIDE_INT val1 = INTVAL (trueop1);
>   /* If we are turning off bits already known off in OP0, we
> need not do an AND.  */
>   if ((nzop0 & ~val1) == 0)
> return op0;
> }
> We have there op0==trueop0 (reg:HI 175) and op1==trueop1 (const_int 2124
> [0x84c]).
> We then for integral? modes smaller than word_mode would then need to
> actually check nonzero_bits in the word_mode (on paradoxical subreg of
> trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
> nonzero_bits in the wider mode would be all we need (although the
> subsequent (nzop1 & nzop0) == 0 case probably wants to have the current
> nonzero_bits calls), not really sure what for WORD_REGISTER_OPERATIONS
> means AND with a constant which has the most significant bit set for the
> upper bits.

Yes, I agree that there is a tension between this AND case and the swapping 
done in the combiner for WORD_REGISTER_OPERATIONS.  I also agree that it would 
make sense to call nonzero_bits on word_mode instead of mode here in this case
because AND is a word_register_operation_p.
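
A small C model of why the mode passed to nonzero_bits matters here, using the 0x8084c nonzero bits and the 0x84c mask from this thread. The helpers mode_mask and and_is_redundant are hypothetical and only mirror the `(nzop0 & ~val1) == 0` shortcut quoted above; they are not GCC code.

```c
#include <stdint.h>

/* Mask of the bits that fit in a mode of BITS bits (16 = HImode,
   32 = word_mode on a 32-bit WORD_REGISTER_OPERATIONS target).  */
static uint64_t
mode_mask (unsigned bits)
{
  return bits >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << bits) - 1;
}

/* Model of the AND shortcut: the AND with VAL can be dropped when it only
   clears bits already known to be zero, judged using NONZERO truncated to
   a mode of BITS bits.  */
static int
and_is_redundant (uint64_t nonzero, uint64_t val, unsigned bits)
{
  uint64_t nz = nonzero & mode_mask (bits);
  return (nz & ~val & mode_mask (bits)) == 0;
}
```

In HImode the recorded 0x8084c truncates to 0x84c, so the AND looks removable; in word_mode bit 19 survives and the AND is kept.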

> So, perhaps just in the return op0; case add further code for
> WORD_REGISTER_OPERATIONS and sub-word modes which will call nonzero_bits
> again for the word mode and decide if it is still safe.

Does it work to just replace mode by word_mode in the calls to nonzero_bits?

> That patch doesn't change anything at all on the testcase, it is still
> miscompiled.

OK, too bad, thanks for trying it!

-- 
Eric Botcazou




[Bug other/109435] New: [MIPS64R6] Typedef struct alignment returns incorrect results

2023-04-06 Thread jovan.dmitrovic at syrmia dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109435

Bug ID: 109435
   Summary: [MIPS64R6] Typedef struct alignment returns incorrect
results
   Product: gcc
   Version: 10.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jovan.dmitrovic at syrmia dot com
  Target Milestone: ---

Consider the following testcase:


#include <stdio.h>

typedef struct uint8 {
  unsigned v[8];
} uint8 __attribute__ ((aligned(128)));

unsigned callee(int x, uint8 a) {
return a.v[0];
}

uint8 identity(uint8 in) {
return in;
}

int main() {
uint8 vec = {1, 2, 3, 4, 5, 6, 7, 8};
printf("res1: %d\n", callee(99, identity(vec)));
uint8 temp = identity(vec);
printf("res2: %d\n", callee(99, temp));
}


When this code is compiled for MIPS64 R6, output is as follows:

res1: 3
res2: 1

However, when the aligned attribute is removed from the testcase, both res1 and
res2 become 1, which is what was expected in the first place.
Furthermore, this bug is reproducible only if callee has the unused argument x.
Adding more arguments or switching their order also produces the expected
values. The type of x doesn't seem to affect the result.
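
A self-contained variant of the testcase can make the failure checkable without reading printed output. This is only a sketch: it keeps the reporter's uint8, callee and identity, while uint8_alignment and both_paths_agree are hypothetical helper names. On an unaffected target (e.g. x86_64) both checks should hold; going by the report, both_paths_agree would be expected to return 0 when built for MIPS64 R6 with the affected compiler.

```c
#include <stddef.h>

/* Same type as in the report: the aligned attribute is on the typedef. */
typedef struct uint8 {
  unsigned v[8];
} uint8 __attribute__ ((aligned (128)));

/* Same shape as the report's callee; x is intentionally unused. */
unsigned callee (int x, uint8 a)
{
  (void) x;
  return a.v[0];
}

uint8 identity (uint8 in)
{
  return in;
}

/* Hypothetical helper: the alignment GCC gives the typedef. */
size_t uint8_alignment (void)
{
  return _Alignof (uint8);
}

/* Hypothetical helper: runs both call shapes from the report and
   returns 1 only if both yield the expected value 1. */
int both_paths_agree (void)
{
  uint8 vec = { { 1, 2, 3, 4, 5, 6, 7, 8 } };
  unsigned res1 = callee (99, identity (vec));
  uint8 temp = identity (vec);
  unsigned res2 = callee (99, temp);
  return res1 == 1 && res2 == 1;
}
```

Because the attribute is attached to the typedef, _Alignof reports 128 for uint8 even though its members only need the alignment of unsigned; the report identifies this over-alignment as necessary to trigger the wrong result.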

Re: [PATCH] riscv: Fix genrvv-type-indexer dependencies

2023-04-06 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Thu, Apr 6, 2023 at 5:46 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> I've noticed
> make: Circular build/genrvv-type-indexer.o <- gtype-desc.h dependency dropped.
>
> The following patch fixes that.  The RTL_BASE_H variable includes a lot of
> headers which the generator doesn't include, including gtype-desc.h.
> I've preprocessed it and checked all gcc/libiberty headers against what is
> included in the other dependency variables and here is what I found:
> 1) coretypes.h includes align.h, poly-int.h and poly-int-types.h which
>weren't listed (most dependencies are thankfully handled automatically,
>so it isn't that big a deal except for these generators and the like)
> 2) system.h includes filenames.h (already listed) but filenames.h includes
>hashtab.h; instead of adding FILENAMES_H I've just added the dependency
>to SYSTEM_H
> 3) $(RTL_BASE_H) wasn't really needed at all and insn-modes.h is already
>included in $(CORETYPES_H)
>
> I'll bootstrap/regtest this on x86_64-linux tonight, ok for trunk?
>
> 2023-04-06  Jakub Jelinek  
>
> * Makefile.in (CORETYPES_H): Depend on align.h, poly-int.h and
> poly-int-types.h.
> (SYSTEM_H): Depend on $(HASHTAB_H).
> * config/riscv/t-riscv (build/genrvv-type-indexer.o): Remove unused
> dependency on $(RTL_BASE_H), remove redundant dependency on
> insn-modes.h.
>
> --- gcc/Makefile.in.jj  2023-03-21 11:04:19.034831460 +0100
> +++ gcc/Makefile.in 2023-04-06 10:55:58.457207062 +0200
> @@ -945,7 +945,8 @@ TARGET_DEF_H = target-def.h target-hooks
>  C_TARGET_DEF_H = c-family/c-target-def.h c-family/c-target-hooks-def.h \
>$(TREE_H) $(C_COMMON_H) $(HOOKS_H) common/common-targhooks.h
>  CORETYPES_H = coretypes.h insn-modes.h signop.h wide-int.h wide-int-print.h \
> -  insn-modes-inline.h $(MACHMODE_H) double-int.h
> +  insn-modes-inline.h $(MACHMODE_H) double-int.h align.h poly-int.h \
> +  poly-int-types.h
>  RTL_BASE_H = $(CORETYPES_H) rtl.h rtl.def reg-notes.def \
>insn-notes.def $(INPUT_H) $(REAL_H) statistics.h $(VEC_H) \
>$(FIXED_VALUE_H) alias.h $(HASHTAB_H)
> @@ -998,7 +999,8 @@ C_COMMON_H = c-family/c-common.h c-famil
>  C_PRAGMA_H = c-family/c-pragma.h $(CPPLIB_H)
>  C_TREE_H = c/c-tree.h $(C_COMMON_H) $(DIAGNOSTIC_H)
>  SYSTEM_H = system.h hwint.h $(srcdir)/../include/libiberty.h \
> -   $(srcdir)/../include/safe-ctype.h $(srcdir)/../include/filenames.h
> +   $(srcdir)/../include/safe-ctype.h $(srcdir)/../include/filenames.h \
> +   $(HASHTAB_H)
>  PREDICT_H = predict.h predict.def
>  CPPLIB_H = $(srcdir)/../libcpp/include/line-map.h \
> $(srcdir)/../libcpp/include/cpplib.h
> --- gcc/config/riscv/t-riscv.jj 2023-03-31 09:26:47.996219555 +0200
> +++ gcc/config/riscv/t-riscv2023-04-06 10:56:48.166479250 +0200
> @@ -102,8 +102,8 @@ $(common_out_file): $(srcdir)/config/ris
>  $(srcdir)/config/riscv/riscv-protos.h \
>  $(srcdir)/config/riscv/riscv-subset.h
>
> -build/genrvv-type-indexer.o: $(srcdir)/config/riscv/genrvv-type-indexer.cc $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)  \
> -  $(CORETYPES_H) $(GTM_H) errors.h $(GENSUPPORT_H) insn-modes.h
> +build/genrvv-type-indexer.o: $(srcdir)/config/riscv/genrvv-type-indexer.cc $(BCONFIG_H) $(SYSTEM_H)\
> +  $(CORETYPES_H) $(GTM_H) errors.h $(GENSUPPORT_H)
>
>  build/genrvv-type-indexer$(build_exeext): build/genrvv-type-indexer.o
> +$(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
>
>
> Jakub
>


[PATCH] gcov: add info about "calls" to JSON output format

2023-04-06 Thread Martin Liška
The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed after stage1 opens?

Thanks,
Martin

gcc/ChangeLog:

* doc/gcov.texi: Document the new "calls" field and document
the API bump.
* gcov.cc (output_intermediate_json_line): Output info about
calls.
(generate_results): Bump version to 2.

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov-17.C: Add call to a noreturn function.
* g++.dg/gcov/test-gcov-17.py: Cover new format.
* lib/gcov.exp: Add options for gcov that emit the extra info.
---
 gcc/doc/gcov.texi | 27 +--
 gcc/gcov.cc   | 12 +-
 gcc/testsuite/g++.dg/gcov/gcov-17.C   |  7 ++
 gcc/testsuite/g++.dg/gcov/test-gcov-17.py | 17 ++
 gcc/testsuite/lib/gcov.exp|  2 +-
 5 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/gcc/doc/gcov.texi b/gcc/doc/gcov.texi
index d39cce3a683..6739ebb3643 100644
--- a/gcc/doc/gcov.texi
+++ b/gcc/doc/gcov.texi
@@ -195,7 +195,7 @@ Structure of the JSON is following:
 @{
   "current_working_directory": "foo/bar",
   "data_file": "a.out",
-  "format_version": "1",
+  "format_version": "2",
   "gcc_version": "11.1.1 20210510"
   "files": ["$file"]
 @}
@@ -214,6 +214,12 @@ a compilation unit was compiled
 @item
 @var{format_version}: semantic version of the format
 
+Changes in version @emph{2}:
+@itemize @bullet
+@item
+@var{calls}: information about function calls is added
+@end itemize
+
 @item
 @var{gcc_version}: version of the GCC compiler
 @end itemize
@@ -292,6 +298,7 @@ Each @var{line} has the following form:
 @smallexample
 @{
   "branches": ["$branch"],
+  "calls": ["$call"],
   "count": 2,
   "line_number": 15,
   "unexecuted_block": false,
@@ -299,7 +306,7 @@ Each @var{line} has the following form:
 @}
 @end smallexample
 
-Branches are present only with @var{-b} option.
+Branches and calls are present only with @var{-b} option.
 Fields of the @var{line} element have following semantics:
 
 @itemize @bullet
@@ -341,6 +348,22 @@ Fields of the @var{branch} element have following semantics:
 @var{throw}: true when the branch is an exceptional branch
 @end itemize
 
+Each @var{call} has the following form:
+
+@smallexample
+@{
+  "returned": 11,
+@}
+@end smallexample
+
+Fields of the @var{call} element have following semantics:
+
+@itemize @bullet
+@item
+@var{returned}: number of times a function call returned (call count is equal
+to @var{line::count})
+@end itemize
+
 @item -H
 @itemx --human-readable
 Write counts in human readable format (like 24.6k).
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index 2ec7248cc0e..88324143640 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1116,6 +1116,9 @@ output_intermediate_json_line (json::array *object,
   json::array *branches = new json::array ();
   lineo->set ("branches", branches);
 
+  json::array *calls = new json::array ();
+  lineo->set ("calls", calls);
+
   vector::const_iterator it;
   if (flag_branches)
 for (it = line->branches.begin (); it != line->branches.end ();
@@ -1130,6 +1133,13 @@ output_intermediate_json_line (json::array *object,
 new json::literal ((*it)->fall_through));
branches->append (branch);
  }
+   else if ((*it)->is_call_non_return)
+ {
+   json::object *call = new json::object ();
+   gcov_type returns = (*it)->src->count - (*it)->count;
+   call->set ("returned", new json::integer_number (returns));
+   calls->append (call);
+ }
   }
 
   object->append (lineo);
@@ -1523,7 +1533,7 @@ generate_results (const char *file_name)
   gcov_intermediate_filename = get_gcov_intermediate_filename (file_name);
 
   json::object *root = new json::object ();
-  root->set ("format_version", new json::string ("1"));
+  root->set ("format_version", new json::string ("2"));
   root->set ("gcc_version", new json::string (version_string));
 
   if (bbg_cwd != NULL)
diff --git a/gcc/testsuite/g++.dg/gcov/gcov-17.C b/gcc/testsuite/g++.dg/gcov/gcov-17.C
index d11883cfd39..efe019599a5 100644
--- a/gcc/testsuite/g++.dg/gcov/gcov-17.C
+++ b/gcc/testsuite/g++.dg/gcov/gcov-17.C
@@ -15,6 +15,11 @@ private:
 template class Foo;
 template class Foo;
 
+static void noret()
+{
+  __builtin_exit (0);
+}
+
 int
 main (void)
 {
@@ -34,6 +39,8 @@ main (void)
 __builtin_printf ("Failure\n");
   else
 __builtin_printf ("Success\n");
+
+  noret ();
   return 0;
 }
 
diff --git a/gcc/testsuite/g++.dg/gcov/test-gcov-17.py b/gcc/testsuite/g++.dg/gcov/test-gcov-17.py
index ec5df3dec03..a0b8b09b85c 100644
--- a/gcc/testsuite/g++.dg/gcov/test-gcov-17.py
+++ b/gcc/testsuite/g++.dg/gcov/test-gcov-17.py
@@ -12,7 +12,7 @@ def test_basics(gcov):
 files = gcov['files']
 assert len(files) == 1
 functions = files[0]['functions']
-assert len(functions) == 5
+assert len(functions) == 6
 
 
 def 
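
The new "returned" value is computed per call arc as the count of the block containing the call minus the count on the arc that models the call not returning, i.e. (*it)->src->count - (*it)->count in the patch. The sketch below reproduces that arithmetic and the shape of the resulting "calls" array; struct call_arc and format_calls are hypothetical, and the exact whitespace of gcov's JSON writer is not reproduced.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one call arc in gcov's per-line data. */
struct call_arc {
  long long block_count;     /* executions of the block holding the call */
  long long noreturn_count;  /* times the call did not return */
};

/* Mirrors the patch: returns = (*it)->src->count - (*it)->count. */
long long call_returned (const struct call_arc *arc)
{
  return arc->block_count - arc->noreturn_count;
}

/* Writes a "calls" array such as [{"returned": 11}] into BUF.
   Returns the resulting length, or -1 if BUF is too small. */
int format_calls (char *buf, size_t size, const struct call_arc *arcs,
                  size_t n)
{
  if (size < 3)
    return -1;
  strcpy (buf, "[");
  for (size_t i = 0; i < n; i++)
    {
      size_t used = strlen (buf);
      int w = snprintf (buf + used, size - used, "%s{\"returned\": %lld}",
                        i ? ", " : "", call_returned (&arcs[i]));
      if (w < 0 || (size_t) w >= size - used)
        return -1;
    }
  size_t used = strlen (buf);
  if (size - used < 2)
    return -1;
  strcpy (buf + used, "]");
  return (int) strlen (buf);
}
```

For the noret () call added to gcov-17.C, the block runs once and the call never returns, so block_count = 1, noreturn_count = 1 and "returned" is 0.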

Re: [PATCH] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-06 Thread juzhe.zh...@rivai.ai
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7317,6 +7317,12 @@ riscv_shamt_matches_mask_p (int shamt, HOST_WIDE_INT mask)
 #undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
 #define TARGET_DWARF_POLY_INDETERMINATE_VALUE riscv_dwarf_poly_indeterminate_value
 
+namespace riscv_vector {
+extern HARD_REG_SET riscv_zero_call_used_regs (HARD_REG_SET);
+}

The riscv_vector namespace declaration should be put in riscv-protos.h, since
there is already a riscv_vector namespace there.


juzhe.zh...@rivai.ai
 

[PATCH] RISC-V: Fix regression of -fzero-call-used-regs=all

2023-04-06 Thread yanzhang.wang--- via Gcc-patches
From: Yanzhang Wang 

This patch registers a RISC-V-specific function for
TARGET_ZERO_CALL_USED_REGS instead of the default one in targhooks.cc.
It clears the GPRs and the relevant vector registers.

PR 109104

gcc/ChangeLog:

* config/riscv/riscv-v.cc (gpr_zero_call_used_regs): New function.
(vector_zero_call_used_regs): Likewise.
(riscv_zero_call_used_regs): Likewise.
* config/riscv/riscv.cc (riscv_zero_call_used_regs): Declare.
(TARGET_ZERO_CALL_USED_REGS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zero-scratch-regs-1.c: New test.
* gcc.target/riscv/zero-scratch-regs-2.c: New test.

Signed-off-by: Yanzhang Wang 
Co-authored-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
Co-authored-by: Kito Cheng 
---
 gcc/config/riscv/riscv-v.cc   | 79 +++
 gcc/config/riscv/riscv.cc |  6 ++
 .../gcc.target/riscv/zero-scratch-regs-1.c|  9 +++
 .../gcc.target/riscv/zero-scratch-regs-2.c| 24 ++
 4 files changed, 118 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-scratch-regs-2.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2e91d019f6c..90c69b52bb4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -43,6 +43,7 @@
 #include "optabs.h"
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
+#include "diagnostic-core.h"
 
 using namespace riscv_vector;
 
@@ -724,4 +725,82 @@ gen_avl_for_scalar_move (rtx avl)
 }
 }
 
+/* Generate a sequence of instructions that zero registers specified by
+   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
+   zeroed.  */
+static HARD_REG_SET
+gpr_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  for (unsigned regno = GP_REG_FIRST; regno <= GP_REG_LAST; ++regno)
+{
+  if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+   continue;
+
+  rtx reg = regno_reg_rtx[regno];
+  machine_mode mode = GET_MODE (reg);
+  emit_move_insn (reg, CONST0_RTX (mode));
+
+  SET_HARD_REG_BIT (zeroed_hardregs, regno);
+}
+
+  return zeroed_hardregs;
+}
+
+static HARD_REG_SET
+vector_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  /* Find a register to hold vl.  */
+  unsigned vl_regno = GP_REG_LAST + 1;
+  for (unsigned regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
+{
+  /* If vl and avl both are x0, the existing vl is kept.  */
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno) && regno != X0_REGNUM)
+   {
+ vl_regno = regno;
+ break;
+   }
+}
+
+  if (vl_regno > GP_REG_LAST)
+sorry ("cannot allocate vl register for %qs on this target",
+  "-fzero-call-used-regs");
+
+  rtx vl = gen_rtx_REG (Pmode, vl_regno); /* vl is VLMAX.  */
+  for (unsigned regno = V_REG_FIRST; regno <= V_REG_LAST; ++regno)
+{
+  if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+   {
+ rtx target = regno_reg_rtx[regno];
+ machine_mode mode = GET_MODE (target);
+ poly_uint16 nunits = GET_MODE_NUNITS (mode);
+ machine_mode mask_mode = get_vector_mode (BImode, nunits).require ();
+
+ emit_vlmax_vsetvl (mode, vl);
+ emit_vlmax_op (code_for_pred_mov (mode), target, CONST0_RTX (mode),
+vl, mask_mode);
+
+ SET_HARD_REG_BIT (zeroed_hardregs, regno);
+   }
+}
+
+  return zeroed_hardregs;
+}
+
+HARD_REG_SET
+riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+
+  if (TARGET_VECTOR)
+zeroed_hardregs |= vector_zero_call_used_regs (need_zeroed_hardregs);
+
+  return zeroed_hardregs | gpr_zero_call_used_regs (need_zeroed_hardregs);
+}
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5f542932d13..e176f2d9f34 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7317,6 +7317,12 @@ riscv_shamt_matches_mask_p (int shamt, HOST_WIDE_INT mask)
 #undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
 #define TARGET_DWARF_POLY_INDETERMINATE_VALUE riscv_dwarf_poly_indeterminate_value
 
+namespace riscv_vector {
+extern HARD_REG_SET riscv_zero_call_used_regs (HARD_REG_SET);
+}
+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS riscv_vector::riscv_zero_call_used_regs
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
new file mode 100644
index 000..2d9dfeb9dc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zero-scratch-regs-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -fno-stack-protector -fno-PIC" 
} 
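
The register choice in vector_zero_call_used_regs can be modelled with plain bitmasks. This is a hypothetical simplification, not GCC code: GPRs x0..x31 and vector registers v0..v31 are each represented as bits of a uint32_t rather than as a HARD_REG_SET, and the sorry () path is reduced to a 0 return.

```c
#include <stdint.h>

/* Hypothetical model: bit N of a mask stands for xN (GPRs) or vN (vector
   registers); GCC itself uses HARD_REG_SET and different numbering. */
struct zeroed_sets {
  uint32_t gprs;
  uint32_t vregs;
};

/* First GPR other than x0 that is to be zeroed, as in the patch's search
   for a register to hold vl; -1 if there is none. */
int pick_vl_regno (uint32_t need_gprs)
{
  for (int regno = 1; regno < 32; regno++)
    if (need_gprs & (UINT32_C (1) << regno))
      return regno;
  return -1;
}

/* Models riscv_zero_call_used_regs.  Returns 0 where the patch would
   call sorry () because no vl register is available. */
int model_zero_call_used_regs (uint32_t need_gprs, uint32_t need_vregs,
                               int target_vector, struct zeroed_sets *out)
{
  out->gprs = need_gprs;  /* every requested GPR is zeroed */
  out->vregs = 0;
  if (!target_vector)
    return 1;             /* vector registers are left alone */
  if (pick_vl_regno (need_gprs) < 0)
    return 0;
  out->vregs = need_vregs;
  return 1;
}
```

x0 is skipped because a vsetvli with both rd and rs1 set to x0 keeps the current vl instead of requesting VLMAX, as the comment in the patch notes. The patch also zeroes the vector registers before the GPRs, so the GPR borrowed for vl is itself cleared afterwards.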

[Bug c++/107853] [10/11 Regression] variadic template with a variadic template friend with a requires of fold expression

2023-04-06 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107853

Patrick Palka  changed:

   What|Removed |Added

Summary|[10/11/12 Regression]   |[10/11 Regression] variadic
   |variadic template with a|template with a variadic
   |variadic template friend|template friend with a
   |with a requires of fold |requires of fold expression
   |expression  |

--- Comment #9 from Patrick Palka  ---
Fixed for GCC 13 and 12.3 so far.

[Bug c++/109425] mismatched argument pack lengths while expanding

2023-04-06 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109425

Patrick Palka  changed:

   What|Removed |Added

 CC||ppalka at gcc dot gnu.org

--- Comment #4 from Patrick Palka  ---
(In reply to Hannes Hauswedell from comment #2)
> Thanks for the quick reply, and nice that it is already fixed for 13!
> 
> I assume this will not be backported? It wouldn't be a huge problem, because
> it is possible to workaround with non-friend operators.

It's already been backported to the 12 branch, so the upcoming GCC 12.3 (or a
recent enough snapshot of the release branch) will also include the fix.  No
decision yet about backporting to the 11/10 branches.  Having to work around
this by defining these functions as non-friends is pretty inconvenient.

[Bug c++/109425] mismatched argument pack lengths while expanding

2023-04-06 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109425

--- Comment #3 from Marek Polacek  ---
(In reply to Hannes Hauswedell from comment #2)
> Thanks for the quick reply, and nice that it is already fixed for 13!
> 
> I assume this will not be backported? It wouldn't be a huge problem, because
> it is possible to workaround with non-friend operators.

I think this is up to Patrick to decide.

[Bug c++/109431] [10/11/12/13 Regression] internal compiler error: in output_constructor_regular_field with static constexpr array inside a template constexpr function

2023-04-06 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109431

Marek Polacek  changed:

   What|Removed |Added

   Keywords|needs-bisection |
 CC||mpolacek at gcc dot gnu.org

--- Comment #3 from Marek Polacek  ---
Started with r262173:

commit 307193b82cecb8ab79cf8880d642e1a3acb9c9f6
Author: Jason Merrill 
Date:   Tue Jun 26 22:59:44 2018 -0400

PR c++/86320 - memory-hog with std::array of pair

Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive

2023-04-06 Thread Frederik Harwath via Gcc-patches

Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):
   

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test



Thank you for reporting the failures! They are caused by mistakes in the 
test code, not the implementation. I have attached a patch which fixes 
the failures.


I have been able to reproduce the failures with -m32. With the patch 
they went away, even over 100 repeated test executions ;-).



Best regards,

Frederik
From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
	* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
	* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
	and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j
 
 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
 integer :: i,j
 
-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
-- 
2.36.1
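
The two fixes in the tests (initialize the accumulator, then combine per-thread partial sums with a reduction clause) look like this in C. The loop bounds mirror the tests' do i = 1,10,3 and do j = 1,10,3 loops, but the i * j body is a hypothetical stand-in, since the tests' loop bodies are not shown here.

```c
/* Accumulator initialized before the construct; per-thread partial sums
   are combined by the reduction clause.  Built without -fopenmp, the
   pragma is ignored and the loops run serially with the same result. */
long sum_products (void)
{
  long sum = 0;
#pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)
    for (int j = 1; j <= 10; j += 3)
      sum += (long) i * j;   /* hypothetical loop body */
  return sum;
}
```

Without the reduction clause every thread would update the shared sum concurrently, which is a data race, and without sum = 0 the result starts from an indeterminate value; either can produce failures that show up only in some configurations.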



[Bug c++/109433] [12/13 Regression] ICE with -std=c++11 and static constexpr array inside a template constexpr

2023-04-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109433

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||jason at gcc dot gnu.org
   Keywords|needs-bisection |
 Status|UNCONFIRMED |NEW
   Priority|P3  |P2
 Ever confirmed|0   |1
   Last reconfirmed||2023-04-06

--- Comment #1 from Jakub Jelinek  ---
With -std=c++11, started to ICE with r12-6326-ge948436eab818c527dd6.
With -std=c++14, started to ICE with r9-1483-g307193b82cecb8ab79cf.

[Bug lto/109428] GCC did not fix CVE-2022-37434, a heap overflow bug introduced by its dependency zlib code.

2023-04-06 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109428

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #6 from Xi Ruoyao  ---
FWIW use --with-system-zlib when you configure GCC if you want to use zlib
installed on the system instead of the shipped copy.

[Bug tree-optimization/109417] [13 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: Segmentation fault since r13-6945

2023-04-06 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109417

Andrew Macleod  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Andrew Macleod  ---
fixed.

[Bug tree-optimization/109417] [13 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: Segmentation fault since r13-6945

2023-04-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109417

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:7f056d5f4a0b9e29561d0375d5b4ad42c9f3f61e

commit r13-7113-g7f056d5f4a0b9e29561d0375d5b4ad42c9f3f61e
Author: Andrew MacLeod 
Date:   Wed Apr 5 15:59:38 2023 -0400

Check if dependency is valid before using in may_recompute_p.

When the IL is rewritten after a statement has been processed and
    dependencies cached, it's possible that an ssa-name in the dependency
cache is no longer in the IL.  Check this before trying to recompute.

PR tree-optimization/109417
gcc/
* gimple-range-gori.cc (gori_compute::may_recompute_p): Check if
dependency is in SSA_NAME_FREE_LIST.

gcc/testsuite/
* gcc.dg/pr109417.c: New.

[Bug target/82028] Windows x86_64 should not pass float aggregates in xmm

2023-04-06 Thread lh_mouse at 126 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82028

--- Comment #6 from LIU Hao  ---
Looks like this has been fixed?  https://gcc.godbolt.org/z/xP5E76aYz

Despite that, however, GCC generates suboptimal code that uses an XMM register
to perform the bitwise AND operation.

[Bug tree-optimization/109434] [12/13 Regression] std::optional weird -Wmaybe-unitialized and behaviour with -O2

2023-04-06 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109434

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-04-06
  Known to fail||12.1.0
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Keywords||needs-bisection,
   ||needs-reduction
Summary|std::optional weird |[12/13 Regression]
   |-Wmaybe-unitialized and |std::optional weird
   |behaviour with -O2  |-Wmaybe-unitialized and
   ||behaviour with -O2
   Target Milestone|--- |12.3
  Known to work||10.1.0, 11.1.0

--- Comment #1 from Andrew Pinski  ---
DSE2 removes the original store:

  optInt = {};
  optInt = foo ();


  Deleted dead store: optInt = {};

Even though foo can throw ...

Confirmed.
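
C has no exceptions, but setjmp/longjmp gives an analogous way for a callee to leave without returning, which shows why a store before such a call is not dead. This is an illustrative analogy, not a reduction of the std::optional testcase; observe, may_escape and optint_state are hypothetical names.

```c
#include <setjmp.h>

static jmp_buf escape_env;
static int optint_state;   /* stands in for optInt */

/* Stand-in for foo (): either returns normally or leaves without
   returning, the C analogue of throwing. */
static int may_escape (int escape)
{
  if (escape)
    longjmp (escape_env, 1);
  return 42;
}

/* Returns the state seen by the caller after may_escape returns or
   escapes. */
int observe (int escape)
{
  optint_state = -1;                  /* stale value from before */
  if (setjmp (escape_env) != 0)
    return optint_state;              /* reached only via longjmp */
  optint_state = 0;                   /* like optInt = {}: must not be deleted */
  optint_state = may_escape (escape); /* like optInt = foo () */
  return optint_state;
}
```

When may_escape leaves via longjmp, the caller observes the value written by the first store, so deleting that store would change behaviour.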

Re: [PATCH] Add ssp_nonshared to link commandline for musl targets

2023-04-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 06, 2023 at 05:03:22PM +0530, Yash Shinde via Gcc-patches wrote:
> When -fstack-protector options are enabled we need to
> link with ssp_nonshared on musl, since it does not provide
> __stack_chk_fail_local(); essentially it provides libssp
> but not libssp_nonshared, something like
> TARGET_LIBC_PROVIDES_SSP_BUT_NOT_SSP_NONSHARED,
> whereas for glibc the needed symbols are already present
> in the libc_nonshared library, so we do not need any
> library helper on glibc-based systems, but musl needs
> libssp_nonshared from gcc.
> 
> diff --git a/gcc/config/linux.h b/gcc/config/linux.h
> index e3aca79..33f9265bb93 100644
> --- a/gcc/config/linux.h
> +++ b/gcc/config/linux.h
> @@ -189,6 +189,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>}
>  #endif
>  
> +#ifdef TARGET_LIBC_PROVIDES_SSP
> +#undef LINK_SSP_SPEC
> +#define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all" \
> +   "|fstack-protector-strong|fstack-protector-explicit" \
> +   ":-lssp_nonshared}"
> +#endif

This links with -lssp_nonshared even for glibc; that certainly shouldn't be
done.

Jakub



[PATCH] Search target sysroot gcc version specific dirs with multilib.

2023-04-06 Thread Yash Shinde via Gcc-patches
From: Khem Raj 

We install the gcc libraries (such as crtbegin.o) into
//5.2.0/
which is a default search path for GCC (aka multi_suffix in the
code below).  is 'machine' in gcc's terminology. We use
these directories so that multiple gcc versions could in theory
co-exist on target.

We only want to build one gcc-cross-canadian per arch and have this work
for all multilibs.  can be handled by mapping the multilib
 to the one used by gcc-cross-canadian, e.g.
mips64-polkmllib32-linux
is symlinked to by mips64-poky-linux.

The default gcc search path in the target sysroot for a "lib64" multilib
is:

/lib32/mips64-poky-linux/5.2.0/
/lib32/../lib64/
/usr/lib32/mips64-poky-linux/5.2.0/
/usr/lib32/../lib64/
/lib32/
/usr/lib32/

which means that the lib32 crtbegin.o will be found and the lib64 ones
will not, which leads to compiler failures.

This patch injects a multilib version of that path at the front, so the
lib64 binaries are found first. With this change the search path becomes:

/lib32/../lib64/mips64-poky-linux/5.2.0/
/lib32/mips64-poky-linux/5.2.0/
/lib32/../lib64/
/usr/lib32/../lib64/mips64-poky-linux/5.2.0/
/usr/lib32/mips64-poky-linux/5.2.0/
/usr/lib32/../lib64/
/lib32/
/usr/lib32/

Signed-off-by: Khem Raj 
Signed-off-by: Yash Shinde 
---
 gcc/gcc.cc | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 16bb07f2cdc..4e5e3079804 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2801,7 +2801,7 @@ for_each_path (const struct path_prefix *paths,
   if (path == NULL)
{
  len = paths->max_len + extra_space + 1;
- len += MAX (MAX (suffix_len, multi_os_dir_len), multiarch_len);
+ len += MAX ((suffix_len + multi_os_dir_len), multiarch_len);
  path = XNEWVEC (char, len);
}
 
@@ -2813,6 +2813,33 @@ for_each_path (const struct path_prefix *paths,
  /* Look first in MACHINE/VERSION subdirectory.  */
  if (!skip_multi_dir)
{
+ if (!(pl->os_multilib ? skip_multi_os_dir : skip_multi_dir))
+  {
+const char *this_multi;
+size_t this_multi_len;
+
+if (pl->os_multilib)
+  {
+this_multi = multi_os_dir;
+this_multi_len = multi_os_dir_len;
+  }
+else
+  {
+this_multi = multi_dir;
+this_multi_len = multi_dir_len;
+  }
+
+/* Look in multilib MACHINE/VERSION subdirectory first */
+if (this_multi_len)
+  {
+memcpy (path + len, this_multi, this_multi_len + 1);
+memcpy (path + len + this_multi_len, multi_suffix, suffix_len + 1);
+ret = callback (path, callback_info);
+   if (ret)
+break;
+  }
+  }
+
  memcpy (path + len, multi_suffix, suffix_len + 1);
  ret = callback (path, callback_info);
  if (ret)
-- 
2.34.1
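
The lookup order described in the commit message for a single prefix can be modelled with plain string concatenation. This sketch is hypothetical (candidate_dirs, join3 and DIR_MAX are not GCC names) and only covers the three entries listed per prefix, but it also shows why the patch changes the buffer size: the new first entry appends both the multilib directory and the MACHINE/VERSION suffix, so the extra space must cover suffix_len + multi_os_dir_len rather than the larger of the two.

```c
#include <stdio.h>
#include <string.h>

#define DIR_MAX 128   /* hypothetical fixed buffer size */

/* Writes A, B and C concatenated into DST; returns 1 if it fit. */
static int join3 (char *dst, const char *a, const char *b, const char *c)
{
  int w = snprintf (dst, DIR_MAX, "%s%s%s", a, b, c);
  return w >= 0 && w < DIR_MAX;
}

/* Hypothetical model of the directories tried for one PREFIX with the
   patch applied.  Returns how many entries were written to OUT, or -1 if
   one did not fit. */
int candidate_dirs (const char *prefix, const char *multi_dir,
                    const char *multi_suffix, char out[][DIR_MAX])
{
  size_t multi_len = strlen (multi_dir);
  int n = 0;

  /* New with the patch: PREFIX + multilib dir + MACHINE/VERSION. */
  if (multi_len > 0 && !join3 (out[n++], prefix, multi_dir, multi_suffix))
    return -1;
  /* Existing: PREFIX + MACHINE/VERSION. */
  if (!join3 (out[n++], prefix, multi_suffix, ""))
    return -1;
  /* Existing: PREFIX + multilib dir. */
  if (multi_len > 0 && !join3 (out[n++], prefix, multi_dir, ""))
    return -1;
  return n;
}
```

With prefix "/lib32/", multilib directory "../lib64/" and suffix "mips64-poky-linux/5.2.0/", the three entries match the first three lines of the new search path listed above.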



[PATCH] gcc: armv4: pass fix-v4bx to linker to support EABI.

2023-04-06 Thread Khem Raj via Gcc-patches
The LINK_SPEC for Linux gets overwritten by linux-eabi.h, which
means the value of TARGET_FIX_V4BX_SPEC gets lost and, as a result,
the option is not passed to the linker when choosing -march=armv4.
This patch redefines it in linux-eabi.h and reinserts it
for EABI-defaulting toolchains.

Signed-off-by: Khem Raj 
Signed-off-by: Yash Shinde 

---
 gcc/config/arm/linux-eabi.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index a119875599d..e8b64c17b01 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -88,10 +88,14 @@
 #define MUSL_DYNAMIC_LINKER \
  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"
 
+/* For armv4 we pass --fix-v4bx to linker to support EABI */
+#undef TARGET_FIX_V4BX_SPEC
+#define TARGET_FIX_V4BX_SPEC "%{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4: --fix-v4bx}"
+
 /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
use the GNU/Linux version, not the generic BPABI version.  */
 #undef  LINK_SPEC
-#define LINK_SPEC EABI_LINK_SPEC   \
+#define LINK_SPEC TARGET_FIX_V4BX_SPEC EABI_LINK_SPEC  \
   LINUX_OR_ANDROID_LD (LINUX_TARGET_LINK_SPEC, \
   LINUX_TARGET_LINK_SPEC " " ANDROID_LINK_SPEC)
 
-- 
2.34.1



Re: [PATCH] combine: Fix simplify_comparison AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-06 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 06, 2023 at 12:51:20PM +0200, Eric Botcazou wrote:
> > If we want to fix it in the combiner, I think the fix would be following.
> > The optimization is about
> > (and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
> > and IMHO we can only optimize it into
> > (subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
> > if we know that the upper bits of the REG are zeros.
> 
> The reasoning is that, for WORD_REGISTER_OPERATIONS, the subword AND operation
> is done on the full word register, in other words that it's in effect:
> 
> (subreg:SI (and:SI (reg:SI xxx) (const_int 0x84c)) 0)
> 
> that is equivalent to the initial RTL so correct for WORD_REGISTER_OPERATIONS.

If the
(and:SI (subreg:SI (reg:HI xxx) 0) (const_int 0x84c))
to
(subreg:SI (and:HI (reg:HI xxx) (const_int 0x84c)) 0)
transformation is kosher for WORD_REGISTER_OPERATIONS, then I guess the
invalid operation is then in
simplify_context::simplify_binary_operation_1
    case AND:
      ...
      if (HWI_COMPUTABLE_MODE_P (mode))
        {
          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
          HOST_WIDE_INT nzop1;
          if (CONST_INT_P (trueop1))
            {
              HOST_WIDE_INT val1 = INTVAL (trueop1);
              /* If we are turning off bits already known off in OP0, we need
                 not do an AND.  */
              if ((nzop0 & ~val1) == 0)
                return op0;
            }
There we have op0 == trueop0 == (reg:HI 175) and op1 == trueop1 ==
(const_int 2124 [0x84c]).
For integral(?) modes smaller than word_mode we would then need to
actually check nonzero_bits in word_mode (on a paradoxical subreg of
trueop0?).  If INTVAL (trueop1) is >= 0, then I think just doing
nonzero_bits in the wider mode would be all we need (although the
subsequent (nzop1 & nzop0) == 0 case probably wants to keep the current
nonzero_bits calls); I'm not really sure what, for WORD_REGISTER_OPERATIONS,
an AND with a constant which has the most significant bit set means for
the upper bits.

So perhaps just add further code in the return op0; case for
WORD_REGISTER_OPERATIONS and sub-word modes, which calls nonzero_bits
again in word mode and decides whether it is still safe.

> > Now, this patch fixes the PR, but certainly generates worse (but correct)
> > code than the dse.cc patch.  So perhaps we want both of them?
> 
> What happens if you disable the step I mentioned (patchlet attached)?

That patch doesn't change anything at all on the testcase, it is still
miscompiled.

Jakub



[PATCH] Add ssp_nonshared to link commandline for musl targets

2023-04-06 Thread Yash Shinde via Gcc-patches
From: Khem Raj 

When -fstack-protector options are enabled we need to link with
ssp_nonshared on musl, since musl does not provide
__stack_chk_fail_local().  In effect musl provides libssp but not
libssp_nonshared, something like
TARGET_LIBC_PROVIDES_SSP_BUT_NOT_SSP_NONSHARED.  For glibc, the needed
symbols are already present in the libc_nonshared library, so no
library helper is needed on glibc-based systems, but musl needs
libssp_nonshared from gcc.

Signed-off-by: Khem Raj 
Signed-off-by: Yash Shinde 
---
 gcc/config/linux.h  |  7 +++
 gcc/config/rs6000/linux.h   | 10 ++
 gcc/config/rs6000/linux64.h | 10 ++
 3 files changed, 27 insertions(+)

diff --git a/gcc/config/linux.h b/gcc/config/linux.h
index e3aca79..33f9265bb93 100644
--- a/gcc/config/linux.h
+++ b/gcc/config/linux.h
@@ -189,6 +189,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   }
 #endif
 
+#ifdef TARGET_LIBC_PROVIDES_SSP
+#undef LINK_SSP_SPEC
+#define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all" \
+ "|fstack-protector-strong|fstack-protector-explicit" \
+ ":-lssp_nonshared}"
+#endif
+
 #if (DEFAULT_LIBC == LIBC_UCLIBC) && defined (SINGLE_LIBC) /* uClinux */
 /* This is a *uclinux* target.  We don't define below macros to normal linux
versions, because doing so would require *uclinux* targets to include
diff --git a/gcc/config/rs6000/linux.h b/gcc/config/rs6000/linux.h
index 5d21befe8e4..4fc17e781ba 100644
--- a/gcc/config/rs6000/linux.h
+++ b/gcc/config/rs6000/linux.h
@@ -99,6 +99,16 @@
 " -m elf32ppclinux")
 #endif
 
+/* link libssp_nonshared.a with musl */
+#if DEFAULT_LIBC == LIBC_MUSL
+#ifdef TARGET_LIBC_PROVIDES_SSP
+#undef LINK_SSP_SPEC
+#define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all" \
+ "|fstack-protector-strong|fstack-protector-explicit" \
+ ":-lssp_nonshared}"
+#endif
+#endif
+
 #undef LINK_OS_LINUX_SPEC
 #define LINK_OS_LINUX_SPEC LINK_OS_LINUX_EMUL " %{!shared: %{!static: \
   %{!static-pie: \
diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index 9e457033d11..49c9f6e2105 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -377,6 +377,16 @@ extern int dot_symbols;
   " -m elf64ppc")
 #endif
 
+/* link libssp_nonshared.a with musl */
+#if DEFAULT_LIBC == LIBC_MUSL
+#ifdef TARGET_LIBC_PROVIDES_SSP
+#undef LINK_SSP_SPEC
+#define LINK_SSP_SPEC "%{fstack-protector|fstack-protector-all" \
+ "|fstack-protector-strong|fstack-protector-explicit" \
+ ":-lssp_nonshared}"
+#endif
+#endif
+
 #define LINK_OS_LINUX_SPEC32 LINK_OS_LINUX_EMUL32 " %{!shared: %{!static: \
   %{!static-pie: \
 %{rdynamic:-export-dynamic} \
-- 
2.34.1



[Bug c++/109434] New: std::optional weird -Wmaybe-unitialized and behaviour with -O2

2023-04-06 Thread tomas.pecka at cesnet dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109434

Bug ID: 109434
   Summary: std::optional weird -Wmaybe-unitialized and behaviour
with -O2
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tomas.pecka at cesnet dot cz
  Target Milestone: ---

Created attachment 54816
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54816=edit
reproducer

Hello,

I've seen a compiler warning on some code (see attachment) that looks bogus.

  In file included from optional.cpp:2:
  In member function ‘constexpr bool std::_Optional_base_impl<_Tp, _Dp>::_M_is_engaged() const [with _Tp = int; _Dp = std::_Optional_base]’,
  inlined from ‘constexpr bool std::optional<_Tp>::has_value() const [with _Tp = int]’ at /usr/include/c++/12.2.1/optional:988:35,
  inlined from ‘int main()’ at optional.cpp:21:63:
  /usr/include/c++/12.2.1/optional:471:58: warning: ‘*(unsigned char*)((char*) + offsetof(std::optional,std::optional::.std::_Optional_base::_M_payload.std::_Optional_payload::.std::_Optional_payload_base::_M_engaged))’ may be used uninitialized [-Wmaybe-uninitialized]
    471 |   { return static_cast(this)->_M_payload._M_engaged; }
        |  ^~
  optional.cpp: In function ‘int main()’:
  optional.cpp:15:24: note: ‘*(unsigned char*)((char*) + offsetof(std::optional,std::optional::.std::_Optional_base::_M_payload.std::_Optional_payload::.std::_Optional_payload_base::_M_engaged))’ was declared here
     15 | std::optional optInt;
        |^~

The warning appears only when compiling with -Wall -O2. When I run the
executable, I see unexpected output, e.g.

  catch 56
  val=72704.00

which looks like an uninitialized variable really is being read. When run
under valgrind, I see reports about uninitialized variables.

The code seems to work correctly when compiled with -O1 or lower, and the
executable behaves as expected as well.
Compiling and executing with clang++ works too, regardless of the
optimization level.
I don't have older versions of g++ (< 12), but compiling the attached code on
godbolt with g++ 11 and lower does not trigger any warning.

Interestingly, declaring optDbl as std::optional makes the code
behave correctly.

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-04-06 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 6 Apr 2023 at 16:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 4 Apr 2023 at 23:35, Richard Sandiford
> >  wrote:
> >> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> > index cd9cace3c9b..3de79060619 100644
> >> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> >> > @@ -817,6 +817,62 @@ public:
> >> >
> >> >  class svdupq_impl : public quiet
> >> >  {
> >> > +private:
> >> > +  gimple *
> >> > +  fold_nonconst_dupq (gimple_folder , unsigned factor) const
> >> > +  {
> >> > +/* Lower lhs = svdupq (arg0, arg1, ..., argN) into:
> >> > +   tmp = {arg0, arg1, ..., argN}
> >> > +   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
> >> > +
> >> > +/* TODO: Revisit to handle factor by padding zeros.  */
> >> > +if (factor > 1)
> >> > +  return NULL;
> >>
> >> Isn't the key thing here predicate vs. vector rather than factor == 1 vs.
> >> factor != 1?  Do we generate good code for b8, where factor should be 1?
> > Hi,
> > It generates the following code for svdup_n_b8:
> > https://pastebin.com/ypYt590c
>
> Hmm, yeah, not pretty :-)  But it's not pretty without either.
>
> > I suppose lowering to ctor+vec_perm_expr is not really useful
> > for this case because it won't simplify ctor, unlike the above case of
> > svdupq_s32 (x[0], x[1], x[2], x[3]);
> > However, I wonder if it's still a good idea to lower svdupq for predicates,
> > for representing svdupq (or other intrinsics) using GIMPLE constructs as
> > far as possible?
>
> It's possible, but I think we'd need an example in which it's a clear
> benefit.
Sorry, I posted the wrong test case above.
For the following test:
svbool_t f(uint8x16_t x)
{
  return svdupq_n_b8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
                      x[8], x[9], x[10], x[11], x[12],
                      x[13], x[14], x[15]);
}

Code-gen:
https://pastebin.com/maexgeJn

I suppose it's equivalent to the following?

svbool_t f2(uint8x16_t x)
{
  svuint8_t tmp = svdupq_n_u8 ((bool) x[0], (bool) x[1], (bool) x[2], (bool) x[3],
                               (bool) x[4], (bool) x[5], (bool) x[6], (bool) x[7],
                               (bool) x[8], (bool) x[9], (bool) x[10], (bool) x[11],
                               (bool) x[12], (bool) x[13], (bool) x[14], (bool) x[15]);
  return svcmpne_n_u8 (svptrue_b8 (), tmp, 0);
}

which generates:
f2:
.LFB3901:
        .cfi_startproc
        movi    v1.16b, 0x1
        ptrue   p0.b, all
        cmeq    v0.16b, v0.16b, #0
        bic     v0.16b, v1.16b, v0.16b
        dup     z0.q, z0.q[0]
        cmpne   p0.b, p0/z, z0.b, #0
        ret

Thanks,
Prathamesh
>
> > In the attached patch, it simply punts if the type suffix is b,
> > and doesn't try to fold the call.
>
> Yeah, think that's best for now.
>
> >> > +
> >> > +if (BYTES_BIG_ENDIAN)
> >> > +  return NULL;
> >> > +
> >> > +tree lhs = gimple_call_lhs (f.call);
> >> > +if (TREE_CODE (lhs) != SSA_NAME)
> >> > +  return NULL;
> >>
> >> Why is this check needed?
> > This was a left-over from something else I was doing wrongly. Sorry I
> > forgot to remove it.
> >>
> >> > +tree lhs_type = TREE_TYPE (lhs);
> >> > +tree elt_type = TREE_TYPE (lhs_type);
> >> > +scalar_mode elt_mode = GET_MODE_INNER (TYPE_MODE (elt_type));
> >>
> >> Aren't we already dealing with a scalar type here?  I'd have expected
> >> SCALAR_TYPE_MODE rather than GET_MODE_INNER (TYPE_MODE ...).
> > Ugh, sorry, I had most of the code copied over from svld1rq_impl for
> > building VEC_PERM_EXPR with VLA mask and adjusted it,
> > but overlooked this :/
> >>
> >> > +machine_mode vq_mode = aarch64_vq_mode (elt_mode).require ();
> >> > +tree vq_type = build_vector_type_for_mode (elt_type, vq_mode);
> >> > +
> >> > +unsigned nargs = gimple_call_num_args (f.call);
> >> > +vec *v;
> >> > +vec_alloc (v, nargs);
> >> > +for (unsigned i = 0; i < nargs; i++)
> >> > +  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_call_arg (f.call, i));
> >> > +tree vec = build_constructor (vq_type, v);
> >> > +
> >> > +tree access_type
> >> > +  = build_aligned_type (vq_type, TYPE_ALIGN (elt_type));
> >>
> >> Nit: seems to fit on one line.  But do we need this?  We're not accessing
> >> memory, so I'd have expected vq_type to be OK as-is.
> >>
> >> > +tree tmp = make_ssa_name_fn (cfun, access_type, 0);
> >> > +gimple *g = gimple_build_assign (tmp, vec);
> >> > +
> >> > +gimple_seq stmts = NULL;
> >> > +gimple_seq_add_stmt_without_update (, g);
> >> > +
> >> > +int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
> >>
> >> Looks like we should be able to use nargs instead of source_nelts.
> > Does the attached patch look OK ?
> >
> > Thanks,
> > Prathamesh
> >>
> >
> >> Thanks,
> >> Richard
> >>
> >> > +poly_uint64 

[Bug c/109426] Gcc runs into Infinite loop

2023-04-06 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109426

--- Comment #6 from Jonathan Wakely  ---
It's a pattern with this person:
https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=RESOLVED_known_to_fail_type=allwords_known_to_work_type=allwords=zhonghao%40pku.org.cn=1=substring_id=377697=gcc_format=advanced=INVALID

See PR 86306 for discussion and their justification for these low quality
reports. I think they're genuinely trying to be helpful, just not succeeding.

Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds

2023-04-06 Thread Benjamin Priour via Gcc
Hi David,
I haven't looked into your suggestions yet, and probably won't have time until
tomorrow actually :/
Still, here are some updates.

On Thu, Apr 6, 2023 at 2:32 AM David Malcolm  wrote:

> On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> > Hi David,
> >
> > I used the below code snippet to experiment with out-of-bounds (OOB) on
> > trunk. Three things occurred that I believe could see some improvement. See
> > https://godbolt.org/z/57n459cEs for the warnings.
> >
> > int consecutive_oob_in_frame ()
> > {
> >     int arr[] = {1,2,3,4,5,6,7};
> >     int y1 = arr[9]; // only this one gets warnings (3*2 actually), expect only 1 OOB though
> >     int y2 = arr[10]; // expect a warning too, despite fooling with asm
> >     int y3 = arr[50]; // expect a warning for that one too (see asm)
> >     return (y1+y2+y3);
> > }
> >
> > int goo () {
> >     int x = consecutive_oob_in_frame (); // causes another pair of warnings
> >     return 2 * x;
> > }
> >
> > int main () {
> >     goo (); // causes another pair of warnings
> >     consecutive_oob_in_frame (); // silent
> >     int x [] = {1,2};
> >     x[5]; /* silent, probably because another set of OOB warnings
> >              has already been issued with this frame being the source */
> >     return 0;
> > }
>
>

> There's quite a bit of duplication here.  My recollection is that
> there's code in the analyzer that's meant to be eliminating some of
> this e.g. we want to show the OOB when consecutive_oob_in_frame is
> called directly; we *don't* want to show it when
> consecutive_oob_in_frame is called by goo.  Perhaps this deduplication
> code isn't working?  Can you reproduce similar behavior with C, or is
> it specific to C++?
>
>
The behavior is identical in both C and C++. I will look at this code; any
hint as to where it starts?
Otherwise I'll find it the good old way.


>
> >
> > First, as the subject line reads, I get a
> > -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-bounds. I
> > feel it might be too much, as fixing the OOB would fix the former.
> > So maybe only OOB could result in a warning here?
>
> Yes, that's a good point.  Can you file a bug about this in bugzilla
> please?  (and feel free to assign it to yourself if you want to have a
> go at fixing it)
>

Unfortunately, the Assignee field is grayed out for me in both enter_bug.cgi
and show_bug.cgi.
I've also created a new tracker bug for out-of-bounds, as there are a number
of related bugs.


>
> Maybe we could fix this by having region_model::check_region_bounds
> return a bool that signifies if the access is valid, and propagating
> that value up through callers so that we can return a non-poisoned_svalue
> at the point where we'd normally return an "uninitialized" poisoned_svalue.
>
> Alternatively, we could simply terminate any analysis path in which an
> OOB access is detected (by implementing the
> pending_diagnostic::terminate_path_p virtual function for class
> out_of_bounds).
>

I'm adding your suggestions as comments on the filed bugs so as not to
forget them.


>
> >
> > Second, it seems that if a frame was a cause for an OOB (either by
> > containing the spurious code or by being a caller to such code), it will
> > only emit one set of warnings, rather than one at each unique compromising
> > statement.
>
> Maybe.  There's a pending_diagnostic::supercedes_p virtual function
> that perhaps we could implement for out_of_bounds (or its subclasses).
>
>

> >
> > Finally, I think the diagnostic path should only go as deep as the
> > declaration of the injurious index.
>
> I'm not quite sure what you mean by this, sorry.
>
>
Indeed, not the best explanation so far. I was actually suggesting
emitting OOB warnings only at direct call sites; you suggested that too,
so in a way you have already answered me on this.

Just an addition, though: if there is an OOB access that is independent of
its enclosing function's parameters, I think it might make sense not to
emit a warning for that particular OOB outside the function definition
itself, meaning that no OOB warning should be emitted at call sites of
this function for this particular array access.
(Typically, consecutive_oob_in_frame () shouldn't have resulted in more
than one warning, since the OOB accesses within it are independent of its
parameters.)

I believe the current expected behavior is to issue warnings only for
actual function calls, so that a function that is never called won't
produce warnings. As a result, the initial analysis of each function
should never result in warnings (that is actually the case for
malloc-leak, but not for OOB).
Thus we would need to change this so that issues are actually diagnosed
during the initial analysis (at least those that can be) and saved for
later use whenever the function is actually called. Then we would emit
them once, and only once, because by nature these diagnostics are
parameter-independent.
I hope I made it clearer, not more convoluted.

>
> > Also, have you considered extending the current call summaries 

[PATCH 2/3] RFC - match.pd: simplify debug dump checks

2023-04-06 Thread Tamar Christina via Gcc-patches
Hi All,

Just sending these so people can test the series.

This is a small quality-of-life improvement to the match.pd code generator: it
saves time by not re-evaluating the condition for printing debug information in
every function.

There is a small but consistent runtime and compile-time win here.  The runtime
win comes from not having to evaluate the condition over again, and on Arm
platforms we now use the new test-and-branch support for booleans so that only
a single instruction is needed here.

The compile-time win comes from not having to do all the string parsing for the
printf and having less string interning to do.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for GCC 14?

Thanks,
Tamar

gcc/ChangeLog:

PR bootstrap/84402
* dumpfile.h (dump_folding_p): New.
* dumpfile.cc (set_dump_file): Use it.
* generic-match-head.cc (dump_debug): New.
* gimple-match-head.cc (dump_debug): New.
* genmatch.cc (output_line_directive): Support outputting only line
because file is implied.
(dt_simplify::gen_1): Call dump_debug instead of printf.

--- inline copy of patch -- 
diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
index 7d5eca899dcc98676a9ce7a7efff8e439854ff89..e7b595ddecdcca9983d9584b8b2417ae1941c7d4 100644
--- a/gcc/dumpfile.h
+++ b/gcc/dumpfile.h
@@ -522,6 +522,7 @@ parse_dump_option (const char *, const char **);
 extern FILE *dump_file;
 extern dump_flags_t dump_flags;
 extern const char *dump_file_name;
+extern bool dump_folding_p;
 
 extern bool dumps_are_enabled;
 
diff --git a/gcc/dumpfile.cc b/gcc/dumpfile.cc
index 51f68c8c6b40051ba3125c84298ee44ca52f5d17..f805aa73f3aa244d847149eec26505181ce4efe8 100644
--- a/gcc/dumpfile.cc
+++ b/gcc/dumpfile.cc
@@ -63,6 +63,7 @@ FILE *dump_file = NULL;
 const char *dump_file_name;
 dump_flags_t dump_flags;
 bool dumps_are_enabled = false;
+bool dump_folding_p = false;
 
 
 /* Set global "dump_file" to NEW_DUMP_FILE, refreshing the "dumps_are_enabled"
@@ -73,6 +74,7 @@ set_dump_file (FILE *new_dump_file)
 {
   dumpfile_ensure_any_optinfo_are_flushed ();
   dump_file = new_dump_file;
+  dump_folding_p = dump_file && (dump_flags & TDF_FOLDING);
   dump_context::get ().refresh_dumps_are_enabled ();
 }
 
diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index f011204c5be450663231bdece0596317b37f9f9b..16b8f9f3b61d3d5651a5a41a8c0552f50b55cc7c 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -102,3 +102,17 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+/* Helper method for debug printing to reduce string parsing overhead.  Keep
+   in sync with version in gimple-match-head.cc.  */
+
+static
+void dump_debug (bool simplify, int loc, const char *file, int lineno)
+{
+  if (simplify)
+fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n", "match.pd", loc,
+file, lineno);
+  else
+fprintf (dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", loc,
+file, lineno);
+}
\ No newline at end of file
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 638606b2502f640e59527fc5a0b23fa3bedd0cee..bd7c6ff4a3fb89d456b02242707fd823b737f20d 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -185,7 +185,8 @@ fprintf_indent (FILE *f, unsigned int indent, const char *format, ...)
 
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+  bool dumpfile = false, bool fnargs = false,
+  bool loc_only = false)
 {
   const line_map_ordinary *map;
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, );
@@ -204,7 +205,9 @@ output_line_directive (FILE *f, location_t location,
   else
++file;
 
-  if (fnargs)
+  if (loc_only)
+   fprintf (f, "%d", loc.line);
+  else if (fnargs)
fprintf (f, "\"%s\", %d", file, loc.line);
   else
fprintf (f, "%s:%d", file, loc.line);
@@ -3431,14 +3434,11 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, operand *result)
   needs_label = true;
 }
 
-  fprintf_indent (f, indent, "if (UNLIKELY (dump_file && (dump_flags & TDF_FOLDING))) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  fprintf_indent (f, indent, "if (UNLIKELY (dump_folding_p)) "
+   "dump_debug (%s, ", s->kind == simplify::SIMPLIFY ? "true" : "false");
   output_line_directive (f,
 result ? result->location : s->match->location, true,
-true);
+true, true);
   fprintf (f, ", __FILE__, __LINE__);\n");
 
   fprintf_indent (f, indent, "{\n");
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index ec603f9d043c3924ea442bb49b5300a3573503cf..ae0c5c8a74fd9f1acdb616014941b11961e96c04 100644
--- a/gcc/gimple-match-head.cc
