Pushed: [PATCH 0/8] aarch64: testsuite: Fix test failures with --enable-default-pie or --enable-default-ssp

2023-03-06 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-03-02 at 10:26 +, Richard Sandiford wrote:
> Xi Ruoyao  writes:
> > Hi,
> > 
> > This patch series fixes a lot of test failures with --enable-default-pie
> > or --enable-default-ssp for AArch64 target.  Only test files are changed
> > to disable PIE or SSP to satisfy the expectation of the developer who
> > programmed the test.
> > 
> > Bootstrapped and regtested on aarch64-linux-gnu.  Ok for trunk?
> 
> OK for the series.  Thanks for doing this!

Pushed r13-6516 .. r13-6523.

> 
> Richard
> 
> > Xi Ruoyao (8):
> >   aarch64: testsuite: disable PIE for aapcs64 tests [PR70150]
> >   aarch64: testsuite: disable PIE for tests with large code model
> >     [PR70150]
> >   aarch64: testsuite: disable PIE for fuse_adrp_add_1.c [PR70150]
> >   aarch64: testsuite: disable stack protector for sve-pcs tests
> >   aarch64: testsuite: disable stack protector for pr103147-10 tests
> >   aarch64: testsuite: disable stack protector for auto-init-7.c
> >   aarch64: testsuite: disable stack protector for pr104005.c
> >   aarch64: testsuite: disable stack protector for tests relying on stack
> >     offset
> > 
> >  gcc/testsuite/g++.target/aarch64/pr103147-10.C | 2 +-
> >  gcc/testsuite/gcc.dg/tls/pr78796.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/auto-init-7.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/fuse_adrp_add_1.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr103147-10.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr104005.c    | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr63304_1.c   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr70120-2.c   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr78733.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr79041-2.c   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr94530.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/pr94577.c | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/reload-valid-spoff.c  | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/shrink_wrap_1.c   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c   | 2 +-
> >  gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c   | 2 +-
> >  .../gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp | 7 ---
> >  gcc/testsuite/gcc.target/aarch64/test_frame_17.c   | 2 +-
> >  19 files changed, 22 insertions(+), 21 deletions(-)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 1/2] gcov: Fix "do-while" structure in case statement leads to incorrect code coverage [PR93680]

2023-03-06 Thread Xionghu Luo via Gcc-patches
On 2023/3/6 16:11, Richard Biener wrote:
On Mon, Mar 6, 2023 at 8:22 AM Xionghu Luo  wrote:
On 2023/3/2 18:45, Richard Biener wrote:

small.gcno:  648:  block 2:`small.c':1, 3, 4, 6
small.gcno:  688:0145:  36:LINES
small.gcno:  700:  block 3:`small.c':8, 9
small.gcno:  732:0145:  32:LINES
small.gcno:  744:  block 5:`small.c':10
-small.gcno:  772:0145:  32:LINES
-small.gcno:  784:  block 6:`small.c':12
-small.gcno:  812:0145:  36:LINES
-small.gcno:  824:  block 7:`small.c':12, 13
+small.gcno:  772:0145:  36:LINES
+small.gcno:  784:  block 6:`small.c':12, 13
+small.gcno:  816:0145:  32:LINES
+small.gcno:  828:  block 8:`small.c':14
small.gcno:  856:0145:  32:LINES
-small.gcno:  868:  block 8:`small.c':14
-small.gcno:  896:0145:  32:LINES
-small.gcno:  908:  block 9:`small.c':17
+small.gcno:  868:  block 9:`small.c':17


Looking at the CFG and the instrumentation shows

 :
PROF_edge_counter_17 = __gcov0.f[0];
PROF_edge_counter_18 = PROF_edge_counter_17 + 1;
__gcov0.f[0] = PROF_edge_counter_18;
[t.c:3:7] p_6 = 0;
[t.c:5:3] switch (s_7(D))  [INV], [t.c:7:5] case 0:
 [INV], [t.c:11:5] case 1:  [INV]>

 :
# n_1 = PHI 
# p_3 = PHI <[t.c:3:7] p_6(2), [t.c:8:15] p_12(4)>
[t.c:7:5] :
[t.c:8:15] p_12 = p_3 + 1;
[t.c:8:28] n_13 = n_1 + -1;
[t.c:8:28] if (n_13 != 0)
  goto ; [INV]
else
  goto ; [INV]

 :
PROF_edge_counter_21 = __gcov0.f[2];
PROF_edge_counter_22 = PROF_edge_counter_21 + 1;
__gcov0.f[2] = PROF_edge_counter_22;
[t.c:7:5] goto ; [100.00%]

 :
PROF_edge_counter_23 = __gcov0.f[3];
PROF_edge_counter_24 = PROF_edge_counter_23 + 1;
__gcov0.f[3] = PROF_edge_counter_24;
[t.c:9:16] _14 = p_12;
[t.c:9:16] goto ; [INV]

so the reason this goes wrong is that gcov associates the "wrong" counter
with the block containing the 'case' label(s); for case 0 it should have
chosen the counter from bb5, but it likely computed the count of bb3?

It might be that ordering blocks differently puts the instrumentation in
different blocks, or makes gcov's association choose different blocks, but
that means it's just luck and not fixing the actual issue?

To me it looks like the correct thing to investigate is switch statement
and/or case label handling.  One can also see that  having line number 7 is
wrong to the extent that the position of the label doesn't match the number
of times it executes in the source.  So placement of the label is wrong
here, possibly caused by CFG cleanup after CFG build (but generally labels
are not used for anything once the CFG is built, and coverage
instrumentation is late, so it might fail due to us moving labels).  It
might be OK to avoid moving labels for --coverage, but then coverage should
possibly look at edges rather than labels?
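
The source of small.c is not included in this message.  A reconstruction
that appears consistent with the line and column numbers in the dumps
(a do-while loop directly under each case label) might look like the sketch
below.  The exact layout is an assumption inferred from the dumps, not the
reporter's file.

```cpp
int f (int s, int n)
{
  int p = 0;

  switch (s)
    {
    case 0:
      do { p++; } while (--n);
      return p;

    case 1:
      do { p++; } while (--n);
      return p;
    }

  return 0;
}
/* Assumed reconstruction: the case label on line 7 and the loop header on
   line 8 end up sharing a basic block, so the label's line is counted once
   per loop iteration rather than once per entry into the case.  */
```

With this shape, f (0, 3) enters case 0 once but runs the loop body three
times, which is the situation where a per-line count for line 7 can come out
too high.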



Thanks, I investigated the labels.  They already seem to go wrong very early,
between the .gimple and .cfg dumps, much like PR90574:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90574

.gimple:

int f (int s, int n)
[small.c:2:1] {
int D.2755;
int p;

[small.c:3:7] p = 0;
[small.c:5:3] switch (s) , [small.c:7:5] case 0: , 
[small.c:11:5] case 1: >
[small.c:7:5] :  <= case label
:<= loop label
[small.c:8:13] p = p + 1;
[small.c:8:26] n = n + -1;
[small.c:8:26] if (n != 0) goto ; else goto ;
:
[small.c:9:14] D.2755 = p;
[small.c:9:14] return D.2755;
[small.c:11:5] :
:
[small.c:12:13] p = p + 1;
[small.c:12:26] n = n + -1;
[small.c:12:26] if (n != 0) goto ; else goto ;
:
[small.c:13:14] D.2755 = p;
[small.c:13:14] return D.2755;
:
[small.c:16:10] D.2755 = 0;
[small.c:16:10] return D.2755;
}

.cfg:

int f (int s, int n)
{
int p;
int D.2755;

 :
[small.c:3:7] p = 0;
[small.c:5:3] switch (s)  [INV], [small.c:7:5] case 0:  [INV], 
[small.c:11:5] case 1:  [INV]>

 :
[small.c:7:5] :   <= case 0
[small.c:8:13 discrim 1] p = p + 1;
[small.c:8:26 discrim 1] n = n + -1;
[small.c:8:26 discrim 1] if (n != 0)
  goto ; [INV]
else
  goto ; [INV]

 :
[small.c:9:14] D.2755 = p;
[small.c:9:14] goto ; [INV]

 :
[small.c:11:5] :  <= case 1
[small.c:12:13 discrim 1] p = p + 1;
[small.c:12:26 discrim 1] n = n + -1;
[small.c:12:26 discrim 1] if (n != 0)
  goto ; [INV]
else
  goto ; [INV]


The labels are unexpectedly merged into the loop, so I tried the fix below:
under --coverage, start a new basic block if two labels are not on the same
line:


index 10ca86714f4..b788198ac31 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -2860,6 +2860,13 @@ stmt_starts_bb_p (gimple *stmt, gimple *prev_stmt)
|| !DECL_ARTIFICIAL (gimple

[PATCH] RISC-V: Add fault first load C/C++ support

2023-03-06 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc (riscv_gimple_fold_builtin): New 
function.
* config/riscv/riscv-protos.h (riscv_gimple_fold_builtin): Ditto.
(gimple_fold_builtin):  Ditto.
* config/riscv/riscv-vector-builtins-bases.cc (class read_vl): New 
class.
(class vleff): Ditto.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (read_vl): Ditto.
(vleff): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct read_vl_def): 
Ditto.
(struct fault_load_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc 
(rvv_arg_type_info::get_tree_type): Add size_ptr.
(gimple_folder::gimple_folder): New class.
(gimple_folder::fold): Ditto.
(gimple_fold_builtin): New function.
(get_read_vl_instance): Ditto.
(get_read_vl_decl): Ditto.
* config/riscv/riscv-vector-builtins.def (size_ptr): Add size_ptr.
* config/riscv/riscv-vector-builtins.h (class gimple_folder): New class.
(get_read_vl_instance): New function.
(get_read_vl_decl):  Ditto.
* config/riscv/riscv-vsetvl.cc (fault_first_load_p): Ditto.
(read_vl_insn_p): Ditto.
(available_occurrence_p): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(gen_vsetvl_pat): Adapt for vleff support.
(get_forward_read_vl_insn): New function.
(get_backward_fault_first_load_insn): Ditto.
(source_equal_p): Adapt for vleff support.
(first_ratio_invalid_for_second_sew_p): Remove.
(first_ratio_invalid_for_second_lmul_p): Ditto.
(first_lmul_less_than_second_lmul_p): Ditto.
(first_ratio_less_than_second_ratio_p): Ditto.
(support_relaxed_compatible_p): New function.
(vector_insn_info::operator>): Remove.
(vector_insn_info::operator>=): Refine.
(vector_insn_info::parse_insn): Adapt for vleff support.
(vector_insn_info::compatible_p): Ditto.
(vector_insn_info::update_fault_first_load_avl): New function.
(pass_vsetvl::transfer_after): Adapt for vleff support.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_insns): Ditto.
* config/riscv/riscv-vsetvl.def (DEF_INCOMPATIBLE_COND): Remove 
redundant conditions.
* config/riscv/riscv-vsetvl.h (struct demands_cond): New function.
* config/riscv/riscv.cc (TARGET_GIMPLE_FOLD_BUILTIN): New target hook.
* config/riscv/riscv.md: Adapt for vleff support.
* config/riscv/t-riscv: Ditto.
* config/riscv/vector-iterators.md: New iterator.
* config/riscv/vector.md (read_vlsi): New pattern.
(read_vldi_zero_extend): Ditto.
(@pred_fault_load): Ditto.

---
 gcc/config/riscv/riscv-builtins.cc|  31 ++
 gcc/config/riscv/riscv-protos.h   |   2 +
 .../riscv/riscv-vector-builtins-bases.cc  |  86 -
 .../riscv/riscv-vector-builtins-bases.h   |   2 +
 .../riscv/riscv-vector-builtins-functions.def |   7 +-
 .../riscv/riscv-vector-builtins-shapes.cc |  58 
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +
 gcc/config/riscv/riscv-vector-builtins.cc |  83 -
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 gcc/config/riscv/riscv-vector-builtins.h  |  25 ++
 gcc/config/riscv/riscv-vsetvl.cc  | 323 +++---
 gcc/config/riscv/riscv-vsetvl.def | 189 +-
 gcc/config/riscv/riscv-vsetvl.h   |  10 +-
 gcc/config/riscv/riscv.cc |   3 +
 gcc/config/riscv/riscv.md |   8 +-
 gcc/config/riscv/t-riscv  |   3 +-
 gcc/config/riscv/vector-iterators.md  |   1 +
 gcc/config/riscv/vector.md|  53 ++-
 18 files changed, 575 insertions(+), 312 deletions(-)

diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 390f8a38309..b1c4b7547d7 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -38,6 +38,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "langhooks.h"
 #include "tm_p.h"
+#include "backend.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
 
 /* Macros to create an enumeration identifier for a function prototype.  */
 #define RISCV_FTYPE_NAME0(A) RISCV_##A##_FTYPE
@@ -332,6 +335,34 @@ riscv_expand_builtin_direct (enum insn_code icode, rtx 
target, tree exp,
   return riscv_expand_builtin_insn (icode, opno, ops, has_target_p);
 }
 
+/* Implement TARGET_GIMPLE_FOLD_BUILTIN.  */
+
+bool
+riscv_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  gcall *stmt = as_a (gsi_stmt (*gsi));
+  tree fndecl = gimple_call_fndecl (stmt);
+  unsigned int code = DECL_MD_FUNCTION_CODE (fndecl);
+  unsigned i

Enable UTF-8 code page in driver and compiler on 64-bit mingw host [PR108865]

2023-03-06 Thread Costas Argyris via Gcc-patches
Hi

This is a proposal for addressing

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108865

by integrating the UTF-8 manifest file into gcc's build process for the
64-bit mingw host.

The analysis and discussion leading up to the latest patch are written in
the bug report.

The patch attached in this email is exactly the same as the one posted last
in the bug report.

I should also mention that, in case of approval, I would need someone with
write access to make the commit for me please.

Thanks,
Costas


0001-Enable-UTF-8-code-page-on-Windows-64-bit-host-PR1088.patch
Description: Binary data


Re: Ping: [PATCH 1/2] testsuite: Provide means to regexp in multiline patterns

2023-03-06 Thread Hans-Peter Nilsson via Gcc-patches
> From: Mike Stump 
> Date: Mon, 6 Mar 2023 02:05:35 -0800

> Ok

Thanks!  The server-side hook didn't like my ChangeLog
entry:

* lib/multiline.exp (_build_multiline_regex): Map
"{re:" to "(", ":re}" to ")" and ":re?}" to ")?".

It seems I forgot to validate that patch by
contrib/gcc-changelog/git_check_commit.py, which complains:

Checking c0debd6f586ef76f1ceabfed11d7eaf8f6d1b110: FAILED
ERR: bad wrapping of parenthesis: " "{re:" to "(", ":re}" to ")" and 
":re?}" to ")?"."

I gave in and took the easy way out; not fixing the bug in
that script, but instead "wrapped the parenthesis" to:

* lib/multiline.exp (_build_multiline_regex): Map
"{re:" to "(", similarly ")?" from ":re?}" and the
same without question mark.

I hope to make amends by fixing git_check_commit.py, if
given guidance.

brgds, H-P


[PATCH] c++: noexcept and copy elision [PR109030]

2023-03-06 Thread Marek Polacek via Gcc-patches
When processing a noexcept, constructors aren't elided: build_over_call
has
 /* It's unsafe to elide the constructor when handling
a noexcept-expression, it may evaluate to the wrong
value (c++/53025).  */
 && (force_elide || cp_noexcept_operand == 0))
so the assert I added recently needs to be relaxed a little bit.
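
As a standalone illustration of why the constructor call has to stay visible
inside a noexcept-expression, here is a minimal sketch.  The types Quiet and
Loud are made up for this example and do not come from the PR or the
testsuite.  The value of the noexcept operator depends on the exception
specification of the copy constructor its operand would call, so dropping
that call could change the answer.

```cpp
// Hypothetical types, not taken from the PR or the testsuite.
struct Quiet
{
  Quiet () noexcept {}
  Quiet (const Quiet &) noexcept {}
};

struct Loud
{
  Loud () noexcept {}
  Loud (const Loud &) {}   // potentially throwing copy constructor
};

Quiet q;
Loud l;

// Copying from an lvalue always calls the copy constructor, so the noexcept
// operator has to look at that constructor's exception specification.
static_assert (noexcept (Quiet (q)), "copying Quiet cannot throw");
static_assert (!noexcept (Loud (l)), "copying Loud may throw");
```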

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/109030

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Relax assert.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept77.C: New test.
---
 gcc/cp/constexpr.cc | 6 +-
 gcc/testsuite/g++.dg/cpp0x/noexcept77.C | 9 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept77.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 364695b762c..5384d0e8e46 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2869,7 +2869,11 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 
   /* We used to shortcut trivial constructor/op= here, but nowadays
  we can only get a trivial function here with -fno-elide-constructors.  */
-  gcc_checking_assert (!trivial_fn_p (fun) || !flag_elide_constructors);
+  gcc_checking_assert (!trivial_fn_p (fun)
+  || !flag_elide_constructors
+  /* We don't elide constructors when processing
+ a noexcept-expression.  */
+  || cp_noexcept_operand);
 
   bool non_constant_args = false;
   new_call.bindings
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept77.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept77.C
new file mode 100644
index 000..16db8eb79ee
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept77.C
@@ -0,0 +1,9 @@
+// PR c++/109030
+// { dg-do compile { target c++11 } }
+
+struct foo { };
+
+struct __as_receiver {
+  foo empty_env;
+};
+void sched(foo __fun) noexcept(noexcept(__as_receiver{__fun})) { }

base-commit: dfb14cdd796ad9df6b5f2def047ef36b29385902
-- 
2.39.2



[PATCH] libstdc++: use copy_file_range, improve sendfile in filesystem::copy_file

2023-03-06 Thread Jannik Glückert via Gcc-patches
The current copy_file implementation is suboptimal. It uses sendfile
only for files smaller than 2GB, falls back to a userspace copy for
anything larger, and does not support copy_file_range at all.
copy_file_range is of increasing importance with the adoption of
reflinks in filesystems.

I am pretty sure I got some of the formatting wrong, feel free to tear apart.
I don't know if sendfile has identical semantics on linux as it does
on solaris, if someone could test with a big file that'd be great.
Otherwise, this should not regress. The implementation will fall back
to sendfile / userspace copy if copy_file_range is not available for
the target paths.

The copy implementations for sendfile and copy_file_range were put
into separate functions and the callee code simplified to the point
where you can basically just copy-paste it to add a new
implementation, should new interesting syscalls pop up.
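
To make the fallback chain concrete, here is a minimal sketch of the idea,
assuming Linux with a glibc that declares copy_file_range.  It is not the
code from the patch: the function names are hypothetical, and for brevity it
falls back to a plain read/write loop on any copy_file_range failure, whereas
the sendfile path in the patch only falls back for ENOSYS and EINVAL.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: copy LENGTH bytes using the file offsets of both
// descriptors.  One call may copy less than requested, hence the loop.
bool
copy_with_copy_file_range (int fd_in, int fd_out, size_t length) noexcept
{
  size_t bytes_left = length;
  while (bytes_left > 0)
    {
      ssize_t n = ::copy_file_range (fd_in, nullptr, fd_out, nullptr,
                                     bytes_left, 0);
      if (n < 0)
        return false;           // errno describes the failure
      if (n == 0)
        break;                  // reached end of input early
      bytes_left -= static_cast<size_t> (n);
    }
  return true;
}

// Hypothetical userspace fallback: continues from the current offsets.
bool
copy_with_read_write (int fd_in, int fd_out) noexcept
{
  char buf[65536];
  for (;;)
    {
      ssize_t n = ::read (fd_in, buf, sizeof buf);
      if (n < 0)
        return false;
      if (n == 0)
        return true;
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = ::write (fd_out, buf + off, n - off);
          if (w < 0)
            return false;
          off += w;
        }
    }
}

// Try the fast path first, then the portable one (assumed policy).
bool
copy_fd (int fd_in, int fd_out, size_t length) noexcept
{
  return copy_with_copy_file_range (fd_in, fd_out, length)
         || copy_with_read_write (fd_in, fd_out);
}

// Usage sketch: write a payload, copy it, and read it back.
bool
demo_roundtrip (const char *src_path, const char *dst_path)
{
  const char payload[] = "copy_file_range demo payload";
  const size_t len = sizeof payload - 1;
  int in = ::open (src_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  int out = ::open (dst_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
  bool ok = in >= 0 && out >= 0
            && ::write (in, payload, len) == static_cast<ssize_t> (len)
            && ::lseek (in, 0, SEEK_SET) == 0
            && copy_fd (in, out, len);
  char back[sizeof payload] = {};
  ok = ok && ::pread (out, back, len, 0) == static_cast<ssize_t> (len)
       && std::memcmp (back, payload, len) == 0;
  if (in >= 0)
    ::close (in);
  if (out >= 0)
    ::close (out);
  ::unlink (src_path);
  ::unlink (dst_path);
  return ok;
}
```

Because both helpers work on the descriptors' current offsets, the fallback
simply continues where a partially successful fast path stopped.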

Best
Jannik
From 306f9d5e1076ff936ef35942bca546ce188fba81 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jannik=20Gl=C3=BCckert?= 
Date: Mon, 6 Mar 2023 20:52:08 +0100
Subject: [PATCH 1/2] libstdc++: also use sendfile for big files

we were previously only using sendfile for files smaller than 2GB, as
sendfile needs to be called repeatedly for files bigger than that.

some quick numbers, copying a 16GB file, average of 10 repetitions:
old:
real: 13.4s
user: 0.14s
sys : 7.43s
new:
real: 8.90s
user: 0.00s
sys : 3.68s

libstdc++-v3/ChangeLog:

* src/filesystem/ops-common.h: enable sendfile for files
  >2GB in std::filesystem::copy_file
---
 libstdc++-v3/src/filesystem/ops-common.h | 77 
 1 file changed, 40 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/src/filesystem/ops-common.h b/libstdc++-v3/src/filesystem/ops-common.h
index abbfca43e5c..d8afc6a4d64 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++ b/libstdc++-v3/src/filesystem/ops-common.h
@@ -358,6 +358,24 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
   }
 
 #ifdef NEED_DO_COPY_FILE
+#if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  bool
+  copy_file_sendfile(int fd_in, int fd_out, size_t length) noexcept
+  {
+size_t bytes_left = length;
+off_t offset = 0;
+ssize_t bytes_copied;
+do {
+  bytes_copied = ::sendfile(fd_out, fd_in, &offset, bytes_left);
+  if (bytes_copied < 0)
+{
+  return false;
+}
+  bytes_left -= bytes_copied;
+} while (bytes_left > 0 && bytes_copied > 0);
+return true;
+  }
+#endif
   bool
   do_copy_file(const char_type* from, const char_type* to,
 	   std::filesystem::copy_options_existing_file options,
@@ -498,28 +516,30 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 	return false;
   }
 
-size_t count = from_st->st_size;
+bool has_copied = false;
+
 #if defined _GLIBCXX_USE_SENDFILE && ! defined _GLIBCXX_FILESYSTEM_IS_WINDOWS
-off_t offset = 0;
-ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
-if (n < 0 && errno != ENOSYS && errno != EINVAL)
+if (!has_copied)
+  has_copied = copy_file_sendfile(in.fd, out.fd, from_st->st_size);
+if (!has_copied)
   {
-	ec.assign(errno, std::generic_category());
-	return false;
+  if (errno != ENOSYS && errno != EINVAL)
+{
+  ec.assign(errno, std::generic_category());
+  return false;
+}
   }
-if ((size_t)n == count)
+#endif
+
+if (has_copied)
   {
-	if (!out.close() || !in.close())
-	  {
-	ec.assign(errno, std::generic_category());
-	return false;
-	  }
-	ec.clear();
-	return true;
+if (!out.close() || !in.close())
+  {
+	  ec.assign(errno, std::generic_category());
+	  return false;
+  }
+return true;
   }
-else if (n > 0)
-  count -= n;
-#endif // _GLIBCXX_USE_SENDFILE
 
 using std::ios;
 __gnu_cxx::stdio_filebuf sbin(in.fd, ios::in|ios::binary);
@@ -530,29 +550,12 @@ _GLIBCXX_BEGIN_NAMESPACE_FILESYSTEM
 if (sbout.is_open())
   out.fd = -1;
 
-#ifdef _GLIBCXX_USE_SENDFILE
-if (n != 0)
+if (!(std::ostream(&sbout) << &sbin))
   {
-	if (n < 0)
-	  n = 0;
-
-	const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
-	const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
-
-	const std::streampos errpos(std::streamoff(-1));
-	if (p1 == errpos || p2 == errpos)
-	  {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-	  }
+  ec = std::make_error_code(std::errc::io_error);
+  return false;
   }
-#endif
 
-if (count && !(std::ostream(&sbout) << &sbin))
-  {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-  }
 if (!sbout.close() || !sbin.close())
   {
 	ec.assign(errno, std::generic_category());
-- 
2.39.2

From 72b7ad044246e496d90b5f241f59bd0b69e214fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jannik=20Gl=C3=BCckert?= 
Date: Mon, 6 Mar 2023 23:11:41 +0100
Subject: [PATCH 2/2] 

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Segher Boessenkool
Hi!

On Mon, Mar 06, 2023 at 07:13:08PM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > Most importantly, what makes you think this is a problem for aarch64
> > only?  If it actually is, you can fix it in the aarch64 config!  Either
> > with or without new hooks, whatever works best.
> 
> The point is that I don't think it's a problem for AArch64 only.
> I think it's a generic issue that should be solved in a generic way
> (which is what the patch is trying to do).  The suggestion to restrict
> it to AArch64 came from Jakub.
> 
> The reason I'm pushing back against a hook is precisely because
> I don't want to solve this in AArch64-specific code.

But it is many times worse still to do it in target-specific magic code
disguised as generic code :-(

If there is no clear explanation why combine should do X, then it
probably should not.

> I'm not sure we would be talking about restricting this to AArch64
> if the patch had been posted in stage 1.  If people are concerned
> about doing this for all targets in stage 4 (which they seem to be),

Not me, not in principle.  But it takes more time than we have left in
stage 4 to handle this, even for only combine.  We should give the other
target maintainers much longer as well.

> I thought the #ifdef was the simplest way of addressing that concern.

An #ifdef is a way of making a change that is not finished yet not hurt
the other targets.  It still hurts generic development, which indirectly
hurts all targets.

> And I don't think what the patch does is ad hoc.

It is almost impossible to explain what it does and why that is a good
thing, why it is what we want, what we should do here; and certainly not
in a compact, terse, focused way.  It has all the hallmarks of ad hoc
patches.

> Reorganising the
> expression in this way isn't something new.  extract_left_shift already
> does a similar thing (and does it for all targets).

That is not similar at all, no.

/* See if X (of mode MODE) contains an ASHIFT of COUNT or more bits that
   can be commuted with any other operations in X.  Return X without
   that shift if so.  */

If you can factor out a utility function like that, with an actual nice
description like that, it would be a much more palatable patch.


Segher


Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Segher Boessenkool
On Mon, Mar 06, 2023 at 04:34:59PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Mon, Mar 06, 2023 at 03:08:00PM +, Richard Sandiford via Gcc-patches 
> > wrote:
> >> Segher Boessenkool  writes:
> >> > On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote:
> >> >> How about the patch below?
> >> >
> >> > What about it?  What would make it any better than the previous?
> >> 
> >> It does what Jeff suggested in the quoted message: work within the existing
> >> extract/make_compound_operation scheme rather than try to opt out of it.
> >
> > That still feels like it could be risky in stage4, affecting various other
> > FEs which would be expecting ANDs in their patterns instead of *_EXTEND, no?
> > So, at least we'd need something like Segher ran to test it on various
> > targets on Linux kernel (but would be really nice to get also i?86/x86_64).
> >
> > If it were on the aarch64 side just one pattern, I'd suggest a pre-reload
> > splitter, but unfortunately the sign extends (and zero extends?) are handled
> > in legitimate address hook.  Also, I see nonzero_bits only called in
> > rs6000's combine splitter and s390'x canonicalize_comparison target hook,
> > nowhere else in the backends, so I think using it outside of the combiner
> > isn't desirable.
> >
> > Could we have a target hook to canonicalize memory addresses for combiner,
> > like we have that targetm.canonicalize_comparison ?
> 
> I don't think a hook makes sense as a long-term design decision.
> The canonicalisation we're doing here isn't logically AArch64-specific,
> and in general, the less variation in RTL rules between targets, the better.

C1 is trunk, C2 is the previous patch, C3 is this one:

$ perl sizes.pl --percent C[123]
                    C1        C2        C3
       alpha   7082243  100.066%  100.000%
         arc   4207975  100.015%  100.000%
         arm  11518624  100.008%  100.000%
       arm64  24514565  100.067%  100.033%
       armhf  16661684  100.098%  100.000%
        csky   4031841  100.002%  100.000%
        i386         0         0         0
        ia64  20354295  100.029%  100.000%
        m68k   4394084  100.023%  100.000%
  microblaze   6549965  100.014%  100.000%
        mips  10684680  100.024%  100.000%
      mips64   8171850  100.002%  100.000%
       nios2   4356713  100.012%  100.000%
    openrisc   5010570  100.003%  100.000%
      parisc   8406294  100.002%  100.000%
    parisc64         0         0         0
     powerpc  11104901   99.992%  100.000%
   powerpc64  24532358  100.057%  100.000%
 powerpc64le  21293219  100.062%  100.000%
     riscv32   2028474  100.131%  100.000%
     riscv64   9515453  100.120%  100.000%
        s390  20519612  100.279%  100.000%
          sh         0         0         0
     shnommu   1840960  100.012%  100.000%
       sparc   5314422  100.004%  100.000%
     sparc64   7964129   99.992%  100.000%
      x86_64         0         0         0
      xtensa   2925723  100.070%  100.000%

It does absolutely nothing for all those other targets you say it is
beneficial for; and it is a net *negative* for aarch64 itself!


Segher


[PATCH v2] c++: error with constexpr operator() [PR107939]

2023-03-06 Thread Marek Polacek via Gcc-patches
On Mon, Mar 06, 2023 at 11:12:56AM -0500, Jason Merrill wrote:
> On 3/3/23 12:51, Marek Polacek wrote:
> > Similarly to PR107938, this also started with r11-557, whereby 
> > cp_finish_decl
> > can call check_initializer even in a template for a constexpr initializer.
> > 
> > Here we are rejecting
> > 
> >extern const Q q;
> > 
> >template
> >constexpr auto p = q(0);
> > 
> > even though q has a constexpr operator().  It's deemed non-const by
> > decl_maybe_constant_var_p because even though 'q' is const it is not
> > of integral/enum type.  I think the fix is for p_c_e to treat q(0) as
> > potentially-constant, as below.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?
> > 
> > PR c++/107939
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constexpr.cc (is_constexpr_function_object): New.
> > (potential_constant_expression_1): Treat an object with constexpr
> > operator() as potentially-constant.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1y/var-templ74.C: Remove dg-error.
> > * g++.dg/cpp1y/var-templ77.C: New test.
> > ---
> >   gcc/cp/constexpr.cc  | 23 ++-
> >   gcc/testsuite/g++.dg/cpp1y/var-templ74.C |  2 +-
> >   gcc/testsuite/g++.dg/cpp1y/var-templ77.C | 14 ++
> >   3 files changed, 37 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ77.C
> > 
> > diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> > index acf9847a4d1..7d786f332b4 100644
> > --- a/gcc/cp/constexpr.cc
> > +++ b/gcc/cp/constexpr.cc
> > @@ -8929,6 +8929,24 @@ check_for_return_continue (tree *tp, int 
> > *walk_subtrees, void *data)
> > return NULL_TREE;
> >   }
> > +/* Return true iff TYPE is a class with constexpr operator().  */
> > +
> > +static bool
> > +is_constexpr_function_object (tree type)
> > +{
> > +  if (!CLASS_TYPE_P (type))
> > +return false;
> > +
> > +  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> > +if (TREE_CODE (f) == FUNCTION_DECL
> > +   && DECL_OVERLOADED_OPERATOR_P (f)
> > +   && DECL_OVERLOADED_OPERATOR_IS (f, CALL_EXPR)
> > +   && DECL_DECLARED_CONSTEXPR_P (f))
> > +  return true;
> > +
> > +  return false;
> > +}
> > +
> >   /* Return true if T denotes a potentially constant expression.  Issue
> >  diagnostic as appropriate under control of FLAGS.  If WANT_RVAL is 
> > true,
> >  an lvalue-rvalue conversion is implied.  If NOW is true, we want to
> > @@ -9160,7 +9178,10 @@ potential_constant_expression_1 (tree t, bool 
> > want_rval, bool strict, bool now,
> >   }
> > else if (fun)
> > {
> > -   if (RECUR (fun, rval))
> > +   if (VAR_P (fun)
> > +   && is_constexpr_function_object (TREE_TYPE (fun)))
> > + /* Could be an object with constexpr operator().  */;
> 
> I guess if fun is not a function pointer, we don't know if we're using it as
> an lvalue or rvalue

Presumably the operator function could return this, making it an lvalue?
I'm not sure I'm really clear on this.

> , so we want to pass 'any' for want_rval, which should
> make this work; 

Yes, want_rval==false means that p_c_e/VAR_DECL will not issue the
hard error.

> I don't think we need to be specific about constexpr op(),
> as a constexpr conversion operator to fn* could also do the trick.

Ah, those surrogate classes.  I couldn't reproduce the problem with
them, though I'm adding a test for it anyway.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Similarly to PR107938, this also started with r11-557, whereby cp_finish_decl
can call check_initializer even in a template for a constexpr initializer.

Here we are rejecting

  extern const Q q;

  template
  constexpr auto p = q(0);

even though q has a constexpr operator().  It's deemed non-const by
decl_maybe_constant_var_p because even though 'q' is const it is not
of integral/enum type.

If fun is not a function pointer, we don't know if we're using it as an
lvalue or rvalue, so with this patch we pass 'any' for want_rval.  With
that, p_c_e/VAR_DECL doesn't flat out reject the underlying VAR_DECL.
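
For illustration only, the two kinds of callable objects discussed here can
be sketched as below.  This is a non-template reduction with a made-up
definition of Q and a hypothetical Surrogate class, not the contents of
var-templ77.C.  The rejected case in the PR additionally needs the
initializer to sit in a variable template.

```cpp
// Hypothetical stand-in for the class Q in the report.
struct Q
{
  constexpr int operator() (int x) const { return x + 1; }
};

// 'q' is const, but neither constexpr nor of integral/enum type.
extern const Q q;

// Calling q(0) never reads q's value, so the call can still be a
// constant expression.
constexpr int p = q (0);
static_assert (p == 1, "q(0) is usable in a constant expression");

const Q q{};

// A surrogate: a class with a constexpr conversion to a function pointer.
constexpr int twice (int x) { return 2 * x; }

struct Surrogate
{
  using fn = int (*) (int);
  constexpr operator fn () const { return twice; }
};

constexpr Surrogate s{};
static_assert (s (21) == 42, "surrogate call is a constant expression");
```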

PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1) : Pass
'any' when recursing on a VAR_DECL and not a pointer to function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ74.C: Remove dg-error.
* g++.dg/cpp1y/var-templ77.C: New test.
---
 gcc/cp/constexpr.cc  |  8 --
 gcc/testsuite/g++.dg/cpp1y/var-templ74.C |  2 +-
 gcc/testsuite/g++.dg/cpp1y/var-templ77.C | 32 
 3 files changed, 39 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ77.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 364695b762c..3079561f2e8 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9179,8 +9179,12 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool no

[PATCH v6] c++: -Wdangling-reference with reference wrapper [PR107532]

2023-03-06 Thread Marek Polacek via Gcc-patches
On Fri, Mar 03, 2023 at 09:30:38PM -0500, Jason Merrill wrote:
> On 3/3/23 12:50, Marek Polacek wrote:
> > switch (TREE_CODE (expr))
> >   {
> >   case CALL_EXPR:
> > @@ -13831,7 +13895,8 @@ do_warn_dangling_reference (tree expr)
> >  std::pair v = std::minmax(1, 2);
> >which also creates a dangling reference, because std::minmax
> >returns std::pair(b, a).  */
> > -   if (!(TYPE_REF_OBJ_P (rettype) || std_pair_ref_ref_p (rettype)))
> > +   if (!arg_p
> > +   && (!(TYPE_REF_OBJ_P (rettype) || std_pair_ref_ref_p (rettype
> 
> Instead of checking !arg_p maybe the std_pair_ref_ref_p call should change
> to reference_like_class_p (which in turn should check std_pair_ref_ref_p)?

Could do.  I suppose the logic is that for std::pair
arguments we want to see through it to get at its arguments.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Here, -Wdangling-reference triggers where it probably shouldn't, causing
some grief.  The code in question uses a reference wrapper with a member
function returning a reference to a subobject of a non-temporary object:

  const Plane & meta = fm.planes().inner();

I've tried a few approaches, e.g., checking that the member function's
return type is the same as the type of the enclosing class (which is
the case for member functions returning *this), but that then breaks
Wdangling-reference4.C with std::optional.

This patch adjusts do_warn_dangling_reference so that we look through
reference wrapper classes (meaning, has a reference member and a
constructor taking the same reference type, or is std::reference_wrapper
or std::ranges::ref_view) and don't warn for them, supposing that the
member function returns a reference to a non-temporary object.
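
A minimal sketch of the pattern, with made-up class names (Plane is borrowed
from the snippet above; PlaneRef and FrameMeta are hypothetical), shows why
such code does not actually dangle.  The wrapper temporary goes away at the
end of the full expression, but the reference it hands out points into an
object that lives longer.

```cpp
#include <cassert>

struct Plane { int id; };

// A reference wrapper in the sense described above: it has a reference
// member and a constructor taking the same reference type.
struct PlaneRef
{
  const Plane &ref;
  explicit PlaneRef (const Plane &p) : ref (p) {}
  const Plane &inner () const { return ref; }
};

// Hypothetical owner of the Plane object.
struct FrameMeta
{
  Plane plane;
  PlaneRef planes () const { return PlaneRef (plane); }
};

int
plane_id (const FrameMeta &fm)
{
  // The PlaneRef temporary dies at the end of this statement, but 'meta'
  // refers to fm.plane, which outlives it, so nothing dangles.
  const Plane &meta = fm.planes ().inner ();
  return meta.id;
}
```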

PR c++/107532

gcc/cp/ChangeLog:

* call.cc (reference_like_class_p): New.
(do_warn_dangling_reference): Add new bool parameter.  See through
reference_like_class_p.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-reference8.C: New test.
* g++.dg/warn/Wdangling-reference9.C: New test.
---
 gcc/cp/call.cc| 97 ---
 .../g++.dg/warn/Wdangling-reference8.C| 77 +++
 .../g++.dg/warn/Wdangling-reference9.C| 21 
 3 files changed, 181 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference8.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wdangling-reference9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 048b2b052f8..a43980b6e15 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -13779,6 +13779,52 @@ std_pair_ref_ref_p (tree t)
   return true;
 }
 
+/* Return true if a class CTYPE is either std::reference_wrapper or
+   std::ref_view, or a reference wrapper class.  We consider a class
+   a reference wrapper class if it has a reference member and a
+   constructor taking the same reference type.  */
+
+static bool
+reference_like_class_p (tree ctype)
+{
+  if (!CLASS_TYPE_P (ctype))
+return false;
+
+  /* Also accept a std::pair.  */
+  if (std_pair_ref_ref_p (ctype))
+return true;
+
+  tree tdecl = TYPE_NAME (TYPE_MAIN_VARIANT (ctype));
+  if (decl_in_std_namespace_p (tdecl))
+{
+  tree name = DECL_NAME (tdecl);
+  return (name
+ && (id_equal (name, "reference_wrapper")
+ || id_equal (name, "ref_view")));
+}
+  for (tree fields = TYPE_FIELDS (ctype);
+   fields;
+   fields = DECL_CHAIN (fields))
+{
+  if (TREE_CODE (fields) != FIELD_DECL || DECL_ARTIFICIAL (fields))
+   continue;
+  tree type = TREE_TYPE (fields);
+  if (!TYPE_REF_P (type))
+   continue;
+  /* OK, the field is a reference member.  Do we have a constructor
+taking its type?  */
+  for (tree fn : ovl_range (CLASSTYPE_CONSTRUCTORS (ctype)))
+   {
+ tree args = FUNCTION_FIRST_USER_PARMTYPE (fn);
+ if (args
+ && same_type_p (TREE_VALUE (args), type)
+ && TREE_CHAIN (args) == void_list_node)
+   return true;
+   }
+}
+  return false;
+}
+
 /* Helper for maybe_warn_dangling_reference to find a problematic CALL_EXPR
that initializes the LHS (and at least one of its arguments represents
a temporary, as outlined in maybe_warn_dangling_reference), or NULL_TREE
@@ -13793,12 +13839,36 @@ std_pair_ref_ref_p (tree t)
  const int& y = (f(1), 42); // NULL_TREE
  const int& z = f(f(1)); // f(f(1))
 
-   EXPR is the initializer.  */
+   EXPR is the initializer.  If ARG_P is true, we're processing an argument
+   to a function; the point is to distinguish between, for example,
+
+ Ref::inner (&TARGET_EXPR )
+
+   where we shouldn't warn, and
+
+ Ref::inner (&TARGET_EXPR )>)
+
+   where we should warn (Ref is a reference_like_class_p so we see through
+   it.  */
 
 static tree
-do_warn_dangling_reference (tree expr)
+do_warn_dangling_reference (tree expr, bool arg_p)
 {
 

Re: [PATCH] testsuite: Support scanning tree-dumps

2023-03-06 Thread Mike Stump via Gcc-patches
On Mar 6, 2023, at 10:52 AM, Hans-Peter Nilsson via Gcc-patches 
 wrote:
> 
> Ok to apply?

Ok.

>   * lib/target-supports.exp (check_compile): Support scanning tree-dumps.



Re: [PATCH 1/3] testsuite: Add tail_call effective target

2023-03-06 Thread Mike Stump via Gcc-patches
On Mar 6, 2023, at 10:45 AM, Hans-Peter Nilsson via Gcc-patches 
 wrote:
> 
> Ok to commit?

Ok.

> -- >8 --
> The RTL "expand" dump is the first RTL dump, and it also appears to be
> the earliest trace of the target having implemented sibcalls.
> Including the "," in the pattern searched for, to try and avoid
> possible false matches, but there doesn't appear to be any identifiers
> or target names nearby so this is just belts and suspenders.  Using
> "tail_call" as a shorter and more commonly used term than a derivative
> of "sibling calls", and expecting only gcc folks to have heard of
> "sibcalls".
> 
>   * lib/target-supports.exp (check_effective_target_tail_call): New.



PING Re: [RFC] internal documentation for OMP_FOR

2023-03-06 Thread Sandra Loosemore

Ping!

https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612298.html

On 2/18/23 22:21, Sandra Loosemore wrote:
I've been working on support for OpenMP imperfectly-nested loops.  In 
the process I have gone astray multiple times because of 
incorrect/inadequate internal documentation for the OMP_FOR tree 
structure.  The code changes I've been working on are Stage 1 material, 
but the documentation improvements are independent and can be fixed now. 
  Here is a patch I put together for the internals manual; can other 
people familiar with this functionality review it for technical 
correctness?  Even after working with this code for months, I'm still 
not sure I understand all the nuances.  :-P  The comments in tree.def 
are also wrong, but once we get the content right in the manual I can 
just copy the changes there for the final version of the patch.


-Sandra




Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool  writes:
> On Mon, Mar 06, 2023 at 04:34:59PM +, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > Could we have a target hook to canonicalize memory addresses for combiner,
>> > like we have that targetm.canonicalize_comparison ?
>> 
>> I don't think a hook makes sense as a long-term design decision.
>> The canonicalisation we're doing here isn't logically AArch64-specific,
>> and in general, the less variation in RTL rules between targets, the better.
>
> Some targets do not want all insanity allowed for other targets.  We
> have quite a few examples of this already.  But of course a hook like
> the proposed one can be abused a lot to do completely unrelated things.
> We'll just have to trust target maintainers to have good taste and some
> wisdom (because not everyone else looks at all target patches).  What
> else is new :-)
>
>> But if you mean adding target control as a GCC 13 hack, to avoid any
>> effect on other targets, then TBH, I'd prefer just sticking it in an
>> #ifdef GCC_AARCH64_H :-)
>
> And I will NAK that for all the same reasons: it is unmaintainable, it
> makes things harder instead of solving problems, it is a completely
> ad-hoc code change.
>
>> That makes it 100% clear that it's a
>> temporary hack to restrict the target impact rather than something
>> based on fundamentals.  We can then revisit for GCC 14.
>
> And that will never happen, you know this as well as anyone else :-(
>
> Most importantly, what makes you think this is a problem for aarch64
> only?  If it actually is, you can fix it in the aarch64 config!  Either
> with or without new hooks, whatever works best.

The point is that I don't think it's a problem for AArch64 only.
I think it's a generic issue that should be solved in a generic way
(which is what the patch is trying to do).  The suggestion to restrict
it to AArch64 came from Jakub.

The reason I'm pushing back against a hook is precisely because
I don't want to solve this in AArch64-specific code.

I'm not sure we would be talking about restricting this to AArch64
if the patch had been posted in stage 1.  If people are concerned
about doing this for all targets in stage 4 (which they seem to be),
I thought the #ifdef was the simplest way of addressing that concern.
"Revisit for GCC 14" would be a case of removing the #ifdef in stage 1.

And I don't think what the patch does is ad hoc.  Reorganising the
expression in this way isn't something new.  extract_left_shift already
does a similar thing (and does it for all targets).
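
For readers less familiar with the kind of rewrite under discussion, the underlying arithmetic identity can be sketched in plain C++. The shift amount and mask below are chosen purely for illustration and are not taken from the patch or the PR: masking a shifted value with a shifted low-bit mask is the same as zero-extending the narrow value first and shifting afterwards.

```cpp
#include <cstdint>

// "AND of a shift" form: shift first, then keep bits 2..9.
constexpr std::uint64_t masked_shift(std::uint64_t x)
{
  return (x << 2) & 0x3fcULL;
}

// "shift of a zero_extend" form: truncate to 8 bits (which
// zero-extends on conversion back to 64 bits), then shift.
constexpr std::uint64_t extended_shift(std::uint64_t x)
{
  return static_cast<std::uint64_t>(static_cast<std::uint8_t>(x)) << 2;
}

static_assert(masked_shift(0x1ff) == extended_shift(0x1ff),
              "both forms keep only the low 8 bits of x, shifted by 2");
```

Both functions compute the same value for every input, so which form is treated as canonical is a question of representation, not of semantics.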

Thanks,
Richard


Re: [PATCH]AArch64: Fix codegen regressions around tbz.

2023-03-06 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, January 27, 2023 12:26 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64: Fix codegen regressions around tbz.
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > We were analyzing code quality after recent changes and have noticed
>> > that the tbz support somehow managed to increase the number of
>> > branches overall rather than decrease them.
>> >
>> > While investigating this we figured out that the problem is that when
>> > an existing &  exists in gimple and the instruction is
>> > generated because of the range information gotten from the ANDed
>> > constant that we end up with the situation that you get a NOP AND in the
>> > RTL expansion.
>> >
>> > This is not a problem as CSE will take care of it normally.  The issue is
>> > when this original AND was done in a location where PRE or FRE "lift" the
>> > AND to a different basic block.  This triggers a problem when the
>> > resulting value is not single use.  Instead of having an AND and tbz,
>> > we end up generating an AND + TST + BR if the mode is HI or QI.
>> >
>> > This CSE across BB was a problem before but this change made it worse.
>> > Our branch patterns rely on combine being able to fold AND or
>> > zero_extends into the instructions.
>> >
>> > To work around this (since a proper fix is outside of the scope of
>> > stage-4) we are limiting the new tbranch optab to only HI and QI mode
>> > values.  This isn't a problem because these two modes are modes for
>> > which we don't have CBZ support, so they are the problematic cases to
>> > begin with.  Additionally booleans are QI.
>> >
>> > The second thing we're doing is limiting the only legal bitpos to pos 0,
>> > i.e. only the bottom bit.  This is so that we prevent the double ANDs as
>> > much as possible.
>> >
>> > Now most other cases, i.e. where we had an explicit & in the source
>> > code are still handled correctly by the anonymous
>> > (*tb1)
>> > pattern that was added along with tbranch support.
>> >
>> > This means we don't expand the superfluous AND here, and while it
>> > doesn't fix the problem that in the cross BB case we lose tbz, it also
>> > doesn't make things worse.
>> >
>> > With these tweaks we've now reduced the number of insn uniformly as
>> > originally expected.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* config/aarch64/aarch64.md (tbranch_3): Restrict to SHORT
>> >and bottom bit only.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >* gcc.target/aarch64/tbz_2.c: New test.
>> 
>> Agreed that reducing the scope of the new optimisation seems like a safe
>> compromise for GCC 13.  But could you add a testcase that shows the effect
>> of both changes (reducing the mode selection and the bit selection)?  The
>> test above passes even without the patch.
>> 
>> It would be good to have a PR tracking this too.
>
> I've been trying to isolate a small testcase to include, and it's not been
> trivial as GCC will do various transformations on smaller sequences to still
> do the right thing.
>
> I have various testcases where GCC is doing the wrong thing for the branches,
> but as soon as my repro for the cases this fixes gets too small, the problem
> gets hidden.

Thanks for sharing one of these off-list.  It's a bit of a contrived
example, but:

void g(int);

void
f (unsigned int x, _Bool y)
{
  for (int i = 0; i < 100; ++i)
{
  if ((x >> 31) | y)
g (1);
  for (int j = 0; j < 100; ++j)
g (2);
}
}

generates an extra AND without the patch, so a test for:

  { dg-final { scan-assembler-times {and\t} 1 } }

would cover it.  This seemed to be at least related to the problem
in one of the functions in the bigger test.

The patch is OK with a testcase along those lines in addition to the
original tbz_2.c.  (I agree we should keep the tbz_2.c one, for any
extra coverage it gives.)
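
To make the branch condition in the example above concrete, the following sketch spells out what it tests. It assumes a 32-bit unsigned value, and the function names are made up for illustration. It says nothing about which instructions GCC actually selects.

```cpp
#include <cstdint>

// The condition as written in the example: after the shift only
// bit 0 can be set, and a boolean contributes only bit 0 as well.
bool cond_as_written(std::uint32_t x, bool y)
{
  return ((x >> 31) | static_cast<std::uint32_t>(y)) != 0;
}

// The same predicate phrased as explicit single-bit tests.
bool cond_as_bit_tests(std::uint32_t x, bool y)
{
  const bool top_bit_set = (x & 0x80000000u) != 0;
  return top_bit_set || y;
}
```

Because the combined value only ever has its bottom bit set, the branch is a natural candidate for a single-bit test at position 0.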

>> Personally, I think we should try to get to the stage where gimple does the
>> CSE optimisations we need, and where the tbranch optab can generate a tbz
>> directly (rather than splitting it apart and hoping that combine will put it
>> back together later).
>
> Agreed, but that doesn't solve all the problems though. GCC is in general
> quite bad at branch layouts, especially wrt. branch distances. For instance
> BB rotation doesn't take distance into account. And TBZ, CBZ, B have
> different ranges.  Since the distance isn't taken into account we end up
> "optimizing" the branch and then at codegen doing an emergency distance
> enlargement using a TST + B to replace whatever we optimized to.

Hmm, true.

> LLVM does much better in all of these scenarios, so it's likely that the 
> entire branch strategy needs
>

[PATCH] testsuite: Support scanning tree-dumps

2023-03-06 Thread Hans-Peter Nilsson via Gcc-patches
This is sort-of a spin-off from effective_target_tail_call: I thought
that'd best be implemented by scanning a tree-dump, specifically
-fdump-tree-optimized, but the "tail call" found there is emitted for
*all* targets.  Debugged and ready to apply, putting it out for
consideration as someone will need it (or should use it) sooner rather
than later...  Best committed rather than sitting in mail-archives so:
Ok to apply?
-- >8 --
No planned usage.

* lib/target-supports.exp (check_compile): Support scanning tree-dumps.
---
 gcc/testsuite/lib/target-supports.exp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4236c920baeb..0ca7a9680bb4 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -87,6 +87,7 @@ proc check_compile {basename type contents args} {
assembly { set output ${basename}[pid].s }
object { set output ${basename}[pid].o }
executable { set output ${basename}[pid].exe }
+   "tree-*" -
"rtl-*" {
set output ${basename}[pid].s
lappend options "additional_flags=-fdump-$type"
@@ -108,6 +109,9 @@ proc check_compile {basename type contents args} {
 if [regexp "rtl-(.*)" $type dummy rtl_type] {
set scan_output "[glob $src.\[0-9\]\[0-9\]\[0-9\]r.$rtl_type]"
file delete $output
+} elseif [regexp "tree-(.*)" $type dummy tree_type] {
+   set scan_output "[glob $src.\[0-9\]\[0-9\]\[0-9\]t.$tree_type]"
+   file delete $output
 }
 
 # Restore additional_sources.
-- 
2.30.2
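
The glob patterns in the patch above select dump files by name: the source name, a dot, three digits, a one-letter dump kind ("t" for tree dumps, "r" for RTL dumps), a dot, and the pass name. A minimal sketch of that naming check follows. The helper is_dump_file is hypothetical and not part of the testsuite, and the pass numbers used with it are only examples, since they vary between GCC versions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: does NAME have the shape SRC.NNNk.PASS, where
// NNN is exactly three digits and k is the dump kind ('t' or 'r')?
bool is_dump_file(const std::string& name, const std::string& src,
                  char kind, const std::string& pass)
{
  const std::string prefix = src + ".";
  // Expected length: prefix + three digits + kind + '.' + pass.
  if (name.size() != prefix.size() + 5 + pass.size())
    return false;
  if (name.compare(0, prefix.size(), prefix) != 0)
    return false;
  for (std::size_t i = 0; i < 3; ++i)
    if (!std::isdigit(static_cast<unsigned char>(name[prefix.size() + i])))
      return false;
  const std::size_t pos = prefix.size() + 3;
  return name[pos] == kind
         && name[pos + 1] == '.'
         && name.compare(pos + 2, std::string::npos, pass) == 0;
}
```

For example, a name such as "tail.c.254t.optimized" would be accepted for kind 't' and pass "optimized", but rejected for kind 'r'.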



[PATCH 3/3] testsuite: Gate gcc.dg/plugin/must-tail-call-1.c and -2.c on tail_call

2023-03-06 Thread Hans-Peter Nilsson via Gcc-patches
Borderline obvious when tail_call is available, so I'll then apply.
-- >8 --
While gcc.dg/plugin/must-tail-call-2.c passes for all targets even
without this, the error message is, for a target like cris-elf that
doesn't implement sibling calls: "error: cannot tail-call: machine
description does not have a sibcall_epilogue instruction pattern"
rather than "error: cannot tail-call: callee returns a structure".
Also, it'd be confusing to exclude must-tail-call-1.c but not
must-tail-call-2.c

* gcc.dg/plugin/must-tail-call-1.c, gcc.dg/plugin/must-tail-call-2.c:
Gate on effective target tail_call.
---
 gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c | 1 +
 gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
index 1495a48232a6..3a6d4cceaba7 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
@@ -1,3 +1,4 @@
+/* { dg-do compile { target tail_call } } */
 /* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
 
 extern void abort (void);
diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
index c6dfecd32458..d51d15cc0879 100644
--- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-2.c
@@ -1,3 +1,4 @@
+/* { dg-do compile { target tail_call } } */
 /* Allow nested functions.  */
 /* { dg-options "-Wno-pedantic" } */
 
-- 
2.30.2



[PATCH 2/3] doc: Document testsuite check_effective_target_tail_call

2023-03-06 Thread Hans-Peter Nilsson via Gcc-patches
Will commit as obvious, when the 1/3 tail_call is applied.
-- >8 --
Spot-checked the PDF output for sanity.

* doc/sourcebuild.texi: Document check_effective_target_tail_call.
---
 gcc/doc/sourcebuild.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index c348a1e47cc3..80bef7f0a0e2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2844,6 +2844,9 @@ Target supports named sections.
 Target uses natural alignment (aligned to type size) for types of
 32 bits or less.
 
+@item tail_call
+Target supports tail-call optimizations.
+
 @item target_natural_alignment_64
 Target uses natural alignment (aligned to type size) for types of
 64 bits or less.
-- 
2.30.2



[PATCH 1/3] testsuite: Add tail_call effective target

2023-03-06 Thread Hans-Peter Nilsson via Gcc-patches
Ok to commit?
-- >8 --
The RTL "expand" dump is the first RTL dump, and it also appears to be
the earliest trace of the target having implemented sibcalls.
Including the "," in the pattern searched for, to try and avoid
possible false matches, but there doesn't appear to be any identifiers
or target names nearby so this is just belts and suspenders.  Using
"tail_call" as a shorter and more commonly used term than a derivative
of "sibling calls", and expecting only gcc folks to have heard of
"sibcalls".

* lib/target-supports.exp (check_effective_target_tail_call): New.
---
 gcc/testsuite/lib/target-supports.exp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 0ca7a9680bb4..958537b3b7c0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11684,6 +11684,15 @@ proc check_effective_target_frame_pointer_for_non_leaf { } {
   return 0
 }
 
+# Return 1 if the target can perform tail-call optimizations of the
+# most trivial type.
+proc check_effective_target_tail_call { } {
+return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
+   __attribute__((__noipa__)) void foo (void) { }
+   __attribute__((__noipa__)) void bar (void) { foo(); }
+} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
+}
+
 # Return 1 if the target's calling sequence or its ABI
 # create implicit stack probes at or prior to function entry.
 proc check_effective_target_caller_implicit_probes { } {
-- 
2.30.2
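
For context, the probe in the patch above compiles a function whose last action is a call to another function. A slightly less trivial sketch of the same shape is shown here. The names callee and caller are illustrative only, and whether a given compiler and target actually turns the call into a sibling call depends on the optimization level and the target's capabilities.

```cpp
// The callee does some work and returns a value.
int callee(int x)
{
  return x + 1;
}

// The call to callee is in tail position: its result is returned
// unchanged and nothing else happens afterwards, so the caller's
// frame is no longer needed once the call is made.
int caller(int x)
{
  return callee(x * 2);
}
```

On a target that implements sibling calls, a call in this position is the kind that may be emitted as a jump instead of a call followed by a return.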



Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Segher Boessenkool
On Mon, Mar 06, 2023 at 04:34:59PM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > Could we have a target hook to canonicalize memory addresses for combiner,
> > like we have that targetm.canonicalize_comparison ?
> 
> I don't think a hook makes sense as a long-term design decision.
> The canonicalisation we're doing here isn't logically AArch64-specific,
> and in general, the less variation in RTL rules between targets, the better.

Some targets do not want all insanity allowed for other targets.  We
have quite a few examples of this already.  But of course a hook like
the proposed one can be abused a lot to do completely unrelated things.
We'll just have to trust target maintainers to have good taste and some
wisdom (because not everyone else looks at all target patches).  What
else is new :-)

> But if you mean adding target control as a GCC 13 hack, to avoid any
> effect on other targets, then TBH, I'd prefer just sticking it in an
> #ifdef GCC_AARCH64_H :-)

And I will NAK that for all the same reasons: it is unmaintainable, it
makes things harder instead of solving problems, it is a completely
ad-hoc code change.

> That makes it 100% clear that it's a
> temporary hack to restrict the target impact rather than something
> based on fundamentals.  We can then revisit for GCC 14.

And that will never happen, you know this as well as anyone else :-(

Most importantly, what makes you think this is a problem for aarch64
only?  If it actually is, you can fix it in the aarch64 config!  Either
with or without new hooks, whatever works best.


Segher


Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Segher Boessenkool
On Mon, Mar 06, 2023 at 05:18:50PM +0100, Jakub Jelinek wrote:
> On Mon, Mar 06, 2023 at 03:08:00PM +, Richard Sandiford via Gcc-patches 
> wrote:
> That still feels like it could be risky in stage4, affecting various other
> FEs which would be expecting ANDs in their patterns instead of *_EXTEND, no?
> So, at least we'd need something like Segher ran to test it on various
> targets on Linux kernel (but would be really nice to get also i?86/x86_64).

It is running.  Still without x86 though, but I'll add that later
hopefully, also for the previous runs.

> If it were on the aarch64 side just one pattern, I'd suggest a pre-reload
> splitter, but unfortunately the sign extends (and zero extends?) are handled
> in legitimate address hook.  Also, I see nonzero_bits only called in
> rs6000's combine splitter and s390's canonicalize_comparison target hook,
> nowhere else in the backends, so I think using it outside of the combiner
> isn't desirable.

nonzero_bits cannot be used in insn conditions.  This is a well-known
long-standing problem.

The problem is that it can give different output in the passes after
combine than it does in combine itself, since combine does more thorough
analysis.  This then causes insns generated in combine to no longer be
recognised later -> kaboom, ICE.

> Could we have a target hook to canonicalize memory addresses for combiner,
> like we have that targetm.canonicalize_comparison ?

If it makes sense, sure.  And it is implemented in a sensible spot.  It
has to stay maintainable :-)

Looking forward to a patch,


Segher


Re: [PATCH] libstdc++: Limit allocations in _Rb_tree 1/2

2023-03-06 Thread Jonathan Wakely via Gcc-patches
On Wed, 22 Feb 2023 at 06:06, François Dumont via Libstdc++
 wrote:
>
> Here is eventually a working proposal.
>
> Compared to the unordered container approach we need to find out what
> type is going to be used to call the comparer. Otherwise we might
> reinstantiate a temporary each time we call the comparer. For example in
> case of const char* insertion with a less comparer we would
> create a string_view instance on each comparer call and so each time do
> a strlen.

That's what std::less<void> is for. I don't think we need to spend
time trying to solve the problem again for std::less<T> when
std::less<void> already exists.

If the concern is strings vs const char*, we could explore your
suggestion in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088#c1
(keeping https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96088#c4 in
mind) so that we optimize that specific case.

> This code is tricky and does not cover all use cases. For those uncovered
> cases the default behavior is to create a key_type instance which will
> be moved to storage if needed.

Yes, it's tricky, and doesn't handle all cases, but slows down
compilation for all cases.

Also, I think you'll get ambiguous overloads for a comparison type
that has both first_argument_type and is_transparent defined, because
it will match two overloads.

I don't think we should be recreating the logic of transparent
comparison functions, and we shouldn't be reintroducing dependencies
on first_argument_type (in C++20 users should be able to use that
typedef for anything, e.g. make it a typedef for void, and the library
shouldn't care ... I think that would break with your patch).
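
As background on transparent comparison functions, here is a minimal sketch of heterogeneous lookup with the standard transparent comparator (C++14 or later). The alias StringSet and the helper has_key are made up for this illustration. Note that this covers lookup only, not the insertion path that the patch is concerned with.

```cpp
#include <functional>
#include <set>
#include <string>

// std::less<> (i.e. std::less<void>) defines is_transparent, which
// enables the templated find/count/lower_bound overloads.
using StringSet = std::set<std::string, std::less<>>;

bool has_key(const StringSet& s, const char* key)
{
  // The const char* is compared against the stored strings directly;
  // no temporary std::string is constructed for the lookup.
  return s.find(key) != s.end();
}
```

With a non-transparent comparator such as std::less<std::string>, the same call would first convert the key to a std::string.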


> Is there any plan to create a builtin function to get help from the
> compiler to find out this type ? Something like std::invoke_result but
> giving also the actual argument types.

No, I don't think so.


>
>  libstdc++: [_Rb_tree] Limit allocations on unique insertions [PR 96088]
>
>  Detect when invoking the comparer requires an allocation using the
> noexcept
>  qualification of the functor. In this case guess the type needed to
> invoke
>  the comparer and create a temporary instance used for all comparer
> invocations.
>  This temporary instance will be eventually moved to storage
> location if it is to
>  insert. Avoid to allocate a node and construct the stored value
> otherwise.
>
>  libstdc++-v3/ChangeLog:
>
>  PR libstdc++/96088
>  * include/bits/stl_function.h
>  (std::less<>::operator()): Add noexcept qualification.
>  (std::greater::operator()): Likewise.
> (std::_Identity<>::operator<_Tp2>(_Tp2&&)): New perfect forwarding operator.
> (std::_Select1st<>::operator<_Pair2>(_Pair2&&)): New move operator.
>  * include/bits/stl_tree.h
> (_Rb_tree<>::_ConvertToValueType<>): New helper type.
>  (_Rb_tree<>::__has_firstargument): Likewise.
>  (_Rb_tree<>::_S_get_key_type_aux): New helper method, use
> latter.
>  (_Rb_tree<>::_S_get_key_type): New helper method, use latter.
>  (_Rb_tree<>::__key_type_t): New.
>  (_Rb_tree<>::__is_comparable_lhs): New.
>  (_Rb_tree<>::__is_comparable_rhs): New.
>  (_Rb_tree<>::__is_comparable): New, use latters.
>  (_Rb_tree<>::__is_nothrow_comparable_lhs): New.
>  (_Rb_tree<>::__is_nothrow_comparable_rhs): New.
>  (_Rb_tree<>::__is_nothrow_comparable): New, use latters.
>  (_Rb_tree<>::_S_forward_key): New.
>  (_Rb_tree<>::_M_get_insert_unique_pos_tr): New.
>  (_Rb_tree<>::_M_emplace_unique_kv): New.
>  (_Rb_tree<>::_M_emplace_unique_aux): New, use latter.
>  (_Rb_tree<>::_M_emplace_unique): New, use latter.
>  (_Rb_tree<>::_Auto_node::_S_build): New.
>  * testsuite/23_containers/map/96088.cc: New test case.
>  * testsuite/23_containers/multimap/96088.cc: New test case.
>  * testsuite/23_containers/multiset/96088.cc: New test case.
>  * testsuite/23_containers/set/96088.cc: New test case.
>
> François



Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Mon, Mar 06, 2023 at 03:08:00PM +, Richard Sandiford via Gcc-patches 
> wrote:
>> Segher Boessenkool  writes:
>> > On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote:
>> >> How about the patch below?
>> >
>> > What about it?  What would make it any better than the previous?
>> 
>> It does what Jeff suggested in the quoted message: work within the existing
>> extract/make_compound_operation scheme rather than try to opt out of it.
>
> That still feels like it could be risky in stage4, affecting various other
> FEs which would be expecting ANDs in their patterns instead of *_EXTEND, no?
> So, at least we'd need something like Segher ran to test it on various
> targets on Linux kernel (but would be really nice to get also i?86/x86_64).
>
> If it were on the aarch64 side just one pattern, I'd suggest a pre-reload
> splitter, but unfortunately the sign extends (and zero extends?) are handled
> in legitimate address hook.  Also, I see nonzero_bits only called in
> rs6000's combine splitter and s390's canonicalize_comparison target hook,
> nowhere else in the backends, so I think using it outside of the combiner
> isn't desirable.
>
> Could we have a target hook to canonicalize memory addresses for combiner,
> like we have that targetm.canonicalize_comparison ?

I don't think a hook makes sense as a long-term design decision.
The canonicalisation we're doing here isn't logically AArch64-specific,
and in general, the less variation in RTL rules between targets, the better.

But if you mean adding target control as a GCC 13 hack, to avoid any
effect on other targets, then TBH, I'd prefer just sticking it in an
#ifdef GCC_AARCH64_H :-)  That makes it 100% clear that it's a
temporary hack to restrict the target impact rather than something
based on fundamentals.  We can then revisit for GCC 14.

Thanks,
Richard


Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 06, 2023 at 03:08:00PM +, Richard Sandiford via Gcc-patches 
wrote:
> Segher Boessenkool  writes:
> > On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote:
> >> How about the patch below?
> >
> > What about it?  What would make it any better than the previous?
> 
> It does what Jeff suggested in the quoted message: work within the existing
> extract/make_compound_operation scheme rather than try to opt out of it.

That still feels like it could be risky in stage4, affecting various other
FEs which would be expecting ANDs in their patterns instead of *_EXTEND, no?
So, at least we'd need something like Segher ran to test it on various
targets on Linux kernel (but would be really nice to get also i?86/x86_64).

If it were on the aarch64 side just one pattern, I'd suggest a pre-reload
splitter, but unfortunately the sign extends (and zero extends?) are handled
in legitimate address hook.  Also, I see nonzero_bits only called in
rs6000's combine splitter and s390's canonicalize_comparison target hook,
nowhere else in the backends, so I think using it outside of the combiner
isn't desirable.

Could we have a target hook to canonicalize memory addresses for combiner,
like we have that targetm.canonicalize_comparison ?

Jakub



Re: [PATCH] c++: error with constexpr operator() [PR107939]

2023-03-06 Thread Jason Merrill via Gcc-patches

On 3/3/23 12:51, Marek Polacek wrote:

Similarly to PR107938, this also started with r11-557, whereby cp_finish_decl
can call check_initializer even in a template for a constexpr initializer.

Here we are rejecting

   extern const Q q;

   template
   constexpr auto p = q(0);

even though q has a constexpr operator().  It's deemed non-const by
decl_maybe_constant_var_p because even though 'q' is const it is not
of integral/enum type.  I think the fix is for p_c_e to treat q(0) as
potentially-constant, as below.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c++/107939

gcc/cp/ChangeLog:

* constexpr.cc (is_constexpr_function_object): New.
(potential_constant_expression_1): Treat an object with constexpr
operator() as potentially-constant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ74.C: Remove dg-error.
* g++.dg/cpp1y/var-templ77.C: New test.
---
  gcc/cp/constexpr.cc  | 23 ++-
  gcc/testsuite/g++.dg/cpp1y/var-templ74.C |  2 +-
  gcc/testsuite/g++.dg/cpp1y/var-templ77.C | 14 ++
  3 files changed, 37 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ77.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index acf9847a4d1..7d786f332b4 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8929,6 +8929,24 @@ check_for_return_continue (tree *tp, int *walk_subtrees, void *data)
return NULL_TREE;
  }
  
+/* Return true iff TYPE is a class with constexpr operator().  */

+
+static bool
+is_constexpr_function_object (tree type)
+{
+  if (!CLASS_TYPE_P (type))
+return false;
+
+  for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
+if (TREE_CODE (f) == FUNCTION_DECL
+   && DECL_OVERLOADED_OPERATOR_P (f)
+   && DECL_OVERLOADED_OPERATOR_IS (f, CALL_EXPR)
+   && DECL_DECLARED_CONSTEXPR_P (f))
+  return true;
+
+  return false;
+}
+
  /* Return true if T denotes a potentially constant expression.  Issue
 diagnostic as appropriate under control of FLAGS.  If WANT_RVAL is true,
 an lvalue-rvalue conversion is implied.  If NOW is true, we want to
@@ -9160,7 +9178,10 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
  }
else if (fun)
{
-   if (RECUR (fun, rval))
+   if (VAR_P (fun)
+   && is_constexpr_function_object (TREE_TYPE (fun)))
+ /* Could be an object with constexpr operator().  */;


I guess if fun is not a function pointer, we don't know if we're using 
it as an lvalue or rvalue, so we want to pass 'any' for want_rval, which 
should make this work; I don't think we need to be specific about 
constexpr op(), as a constexpr conversion operator to fn* could also do 
the trick.
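
The conversion-operator case mentioned above can be sketched as follows. The names twice, fn and Conv are hypothetical. The point is only that an object without operator() can still be called in a constant expression if it has a constexpr conversion to a function pointer.

```cpp
constexpr int twice(int x) { return 2 * x; }

using fn = int (int);

// No operator() here: calls on a Conv object go through the
// conversion function to a pointer to function.
struct Conv {
  constexpr operator fn* () const { return twice; }
};

constexpr Conv c{};

// c(21) converts c to a function pointer and then calls twice(21),
// all within a constant expression.
static_assert(c(21) == 42, "call through constexpr conversion to fn*");
```

A check that looks only for a constexpr operator() would not recognize Conv as callable in a constant expression.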



+   else if (RECUR (fun, rval))
  /* Might end up being a constant function pointer.  */;
else
  return false;
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ74.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ74.C
index 4e2e800a6eb..c76a7d949ac 100644
--- a/gcc/testsuite/g++.dg/cpp1y/var-templ74.C
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ74.C
@@ -9,7 +9,7 @@ struct Q {
  extern const Q q;
  
  template

-constexpr const Q* p = q(0); // { dg-bogus "not usable" "PR107939" { xfail 
*-*-* } }
+constexpr const Q* p = q(0);
  
  void

  g ()
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ77.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ77.C
new file mode 100644
index 000..b480f54b001
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ77.C
@@ -0,0 +1,14 @@
+// PR c++/107939
+// { dg-do compile { target c++14 } }
+
+struct Q {
+  struct P {
+const Q* p;
+  };
+  int n;
+  constexpr P operator()(int) const { return {this}; }
+};
+
+extern const Q q;
+template
+constexpr auto p = q(0);

base-commit: 9056d0df830c5a295d7594d517d409d10476990d




Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Segher Boessenkool  writes:
> On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote:
>> How about the patch below?
>
> What about it?  What would make it any better than the previous?

It does what Jeff suggested in the quoted message: work within the existing
extract/make_compound_operation scheme rather than try to opt out of it.

Richard




Re: [PATCH v4] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes:
> From: Pan Li 
>
>   Fix the bug of the rvv bool mode precision with the adjustment.
>   The bit size of vbool*_t will be adjusted to
>   [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>   adjusted mode precision of vbool*_t will help the underlying passes to
>   make the right decision for both correctness and optimization.
>
>   Given below sample code:
>   void test_1(int8_t * restrict in, int8_t * restrict out)
>   {
>     vbool8_t v2 = *(vbool8_t*)in;
>     vbool16_t v5 = *(vbool16_t*)in;
>     *(vbool16_t*)(out + 200) = v5;
>     *(vbool8_t*)(out + 100) = v2;
>   }
>
>   Before the precision adjustment:
>   addi    a4,a1,100
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a1,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a4)
>   // Need one vsetvli and vlm.v for correctness here.
>   vsm.v   v24,0(a1)
>
>   After the precision adjustment:
>   csrr    t0,vlenb
>   slli    t1,t0,1
>   csrr    a3,vlenb
>   sub     sp,sp,t1
>   slli    a4,a3,1
>   add     a4,a4,sp
>   sub     a3,a4,a3
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a2,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a3)
>   addi    a1,a1,100
>   vsetvli a4,zero,e8,mf2,ta,ma
>   csrr    t0,vlenb
>   vlm.v   v25,0(a3)
>   vsm.v   v25,0(a2)
>   slli    t1,t0,1
>   vsetvli a5,zero,e8,m1,ta,ma
>   vsm.v   v24,0(a1)
>   add     sp,sp,t1
>   jr      ra
>
>   However, there may be some optimization opportunities after
>   the mode precision adjustment. They can be taken care of in
>   the RISC-V backend in separate PR(s).
>
>   PR 108185
>   PR 108654
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>   * config/riscv/riscv.cc (riscv_v_adjust_precision):
>   * config/riscv/riscv.h (riscv_v_adjust_precision):
>   * genmodes.cc (ADJUST_PRECISION):
>   (emit_mode_adjustments):

OK for the genmodes.cc part, thanks.

Richard

> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr108185-1.c: New test.
>   * gcc.target/riscv/pr108185-2.c: New test.
>   * gcc.target/riscv/pr108185-3.c: New test.
>   * gcc.target/riscv/pr108185-4.c: New test.
>   * gcc.target/riscv/pr108185-5.c: New test.
>   * gcc.target/riscv/pr108185-6.c: New test.
>   * gcc.target/riscv/pr108185-7.c: New test.
>   * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> ---
>  gcc/config/riscv/riscv-modes.def|  8 +++
>  gcc/config/riscv/riscv.cc   | 12 
>  gcc/config/riscv/riscv.h|  1 +
>  gcc/genmodes.cc | 28 +++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 600 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>  
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
>  /*
> | Mode| MI

RE: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Li, Pan2 via Gcc-patches
Got it, and it makes sense to me from the perspective of defensive
programming.

Thanks a lot. I have updated the patch to v4; see the link below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613465.html

Pan

-Original Message-
From: Richard Sandiford  
Sent: Monday, March 6, 2023 9:41 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; 
rguent...@suse.de
Subject: Re: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

pan2...@intel.com writes:
> From: Pan Li 
>
>   Fix the bug of the rvv bool mode precision with the adjustment.
>   The bit size of vbool*_t will be adjusted to
>   [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>   adjusted mode precision of vbool*_t will help the underlying passes to
>   make the right decision for both correctness and optimization.
>
>   Given below sample code:
>   void test_1(int8_t * restrict in, int8_t * restrict out)
>   {
>     vbool8_t v2 = *(vbool8_t*)in;
>     vbool16_t v5 = *(vbool16_t*)in;
>     *(vbool16_t*)(out + 200) = v5;
>     *(vbool8_t*)(out + 100) = v2;
>   }
>
>   Before the precision adjustment:
>   addi    a4,a1,100
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a1,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a4)
>   // Need one vsetvli and vlm.v for correctness here.
>   vsm.v   v24,0(a1)
>
>   After the precision adjustment:
>   csrr    t0,vlenb
>   slli    t1,t0,1
>   csrr    a3,vlenb
>   sub     sp,sp,t1
>   slli    a4,a3,1
>   add     a4,a4,sp
>   sub     a3,a4,a3
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a2,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a3)
>   addi    a1,a1,100
>   vsetvli a4,zero,e8,mf2,ta,ma
>   csrr    t0,vlenb
>   vlm.v   v25,0(a3)
>   vsm.v   v25,0(a2)
>   slli    t1,t0,1
>   vsetvli a5,zero,e8,m1,ta,ma
>   vsm.v   v24,0(a1)
>   add     sp,sp,t1
>   jr      ra
>
>   However, there may be some optimization opportunities after
>   the mode precision adjustment. They can be taken care of in
>   the RISC-V backend in separate PR(s).
>
>   PR 108185
>   PR 108654
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>   * config/riscv/riscv.cc (riscv_v_adjust_precision):
>   * config/riscv/riscv.h (riscv_v_adjust_precision):
>   * genmodes.cc (ADJUST_PRECISION):
>   (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr108185-1.c: New test.
>   * gcc.target/riscv/pr108185-2.c: New test.
>   * gcc.target/riscv/pr108185-3.c: New test.
>   * gcc.target/riscv/pr108185-4.c: New test.
>   * gcc.target/riscv/pr108185-5.c: New test.
>   * gcc.target/riscv/pr108185-6.c: New test.
>   * gcc.target/riscv/pr108185-7.c: New test.
>   * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-modes.def|  8 +++
>  gcc/config/riscv/riscv.cc   | 12 
>  gcc/config/riscv/riscv.h|  1 +
>  gcc/genmodes.cc | 26 ++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 598 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>  
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1)); 
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BI

[PATCH v4] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread pan2.li--- via Gcc-patches
From: Pan Li 

Fix the bug of the rvv bool mode precision with the adjustment.
The bit size of vbool*_t will be adjusted to
[1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
adjusted mode precision of vbool*_t will help the underlying passes to
make the right decision for both correctness and optimization.

Given below sample code:
void test_1(int8_t * restrict in, int8_t * restrict out)
{
  vbool8_t v2 = *(vbool8_t*)in;
  vbool16_t v5 = *(vbool16_t*)in;
  *(vbool16_t*)(out + 200) = v5;
  *(vbool8_t*)(out + 100) = v2;
}

Before the precision adjustment:
addi    a4,a1,100
vsetvli a5,zero,e8,m1,ta,ma
addi    a1,a1,200
vlm.v   v24,0(a0)
vsm.v   v24,0(a4)
// Need one vsetvli and vlm.v for correctness here.
vsm.v   v24,0(a1)

After the precision adjustment:
csrr    t0,vlenb
slli    t1,t0,1
csrr    a3,vlenb
sub     sp,sp,t1
slli    a4,a3,1
add     a4,a4,sp
sub     a3,a4,a3
vsetvli a5,zero,e8,m1,ta,ma
addi    a2,a1,200
vlm.v   v24,0(a0)
vsm.v   v24,0(a3)
addi    a1,a1,100
vsetvli a4,zero,e8,mf2,ta,ma
csrr    t0,vlenb
vlm.v   v25,0(a3)
vsm.v   v25,0(a2)
slli    t1,t0,1
vsetvli a5,zero,e8,m1,ta,ma
vsm.v   v24,0(a1)
add     sp,sp,t1
jr      ra

However, there may be some optimization opportunities after
the mode precision adjustment. They can be taken care of in
the RISC-V backend in separate PR(s).

PR 108185
PR 108654

gcc/ChangeLog:

* config/riscv/riscv-modes.def (ADJUST_PRECISION):
* config/riscv/riscv.cc (riscv_v_adjust_precision):
* config/riscv/riscv.h (riscv_v_adjust_precision):
* genmodes.cc (ADJUST_PRECISION):
(emit_mode_adjustments):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
---
 gcc/config/riscv/riscv-modes.def|  8 +++
 gcc/config/riscv/riscv.cc   | 12 
 gcc/config/riscv/riscv.h|  1 +
 gcc/genmodes.cc | 28 +++-
 gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
 12 files changed, 600 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..110bddce851 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
riscv_bytes_per_vector_chunk);
 ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
 ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
 
+ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
+ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
+ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
+ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
+ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
+ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
+ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
+
 /*
| Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
| | LMUL| SEW/LMUL| LMUL| SEW/LMUL|
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Segher Boessenkool
On Mon, Mar 06, 2023 at 12:47:06PM +, Richard Sandiford wrote:
> How about the patch below?

What about it?  What would make it any better than the previous?

Oh, and please do not send new patches in old threads :-(


Segher


Re: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes:
> From: Pan Li 
>
>   Fix the bug of the rvv bool mode precision with the adjustment.
>   The bit size of vbool*_t will be adjusted to
>   [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>   adjusted mode precision of vbool*_t will help the underlying passes to
>   make the right decision for both correctness and optimization.
>
>   Given below sample code:
>   void test_1(int8_t * restrict in, int8_t * restrict out)
>   {
>     vbool8_t v2 = *(vbool8_t*)in;
>     vbool16_t v5 = *(vbool16_t*)in;
>     *(vbool16_t*)(out + 200) = v5;
>     *(vbool8_t*)(out + 100) = v2;
>   }
>
>   Before the precision adjustment:
>   addi    a4,a1,100
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a1,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a4)
>   // Need one vsetvli and vlm.v for correctness here.
>   vsm.v   v24,0(a1)
>
>   After the precision adjustment:
>   csrr    t0,vlenb
>   slli    t1,t0,1
>   csrr    a3,vlenb
>   sub     sp,sp,t1
>   slli    a4,a3,1
>   add     a4,a4,sp
>   sub     a3,a4,a3
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a2,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a3)
>   addi    a1,a1,100
>   vsetvli a4,zero,e8,mf2,ta,ma
>   csrr    t0,vlenb
>   vlm.v   v25,0(a3)
>   vsm.v   v25,0(a2)
>   slli    t1,t0,1
>   vsetvli a5,zero,e8,m1,ta,ma
>   vsm.v   v24,0(a1)
>   add     sp,sp,t1
>   jr      ra
>
>   However, there may be some optimization opportunities after
>   the mode precision adjustment. They can be taken care of in
>   the RISC-V backend in separate PR(s).
>
>   PR 108185
>   PR 108654
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>   * config/riscv/riscv.cc (riscv_v_adjust_precision):
>   * config/riscv/riscv.h (riscv_v_adjust_precision):
>   * genmodes.cc (ADJUST_PRECISION):
>   (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr108185-1.c: New test.
>   * gcc.target/riscv/pr108185-2.c: New test.
>   * gcc.target/riscv/pr108185-3.c: New test.
>   * gcc.target/riscv/pr108185-4.c: New test.
>   * gcc.target/riscv/pr108185-5.c: New test.
>   * gcc.target/riscv/pr108185-6.c: New test.
>   * gcc.target/riscv/pr108185-7.c: New test.
>   * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-modes.def|  8 +++
>  gcc/config/riscv/riscv.cc   | 12 
>  gcc/config/riscv/riscv.h|  1 +
>  gcc/genmodes.cc | 26 ++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 598 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>  
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
>  /*
> | Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> | | LMU

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-06 Thread Andrew Stubbs

On 03/03/2023 17:05, Paul-Antoine Arras wrote:

On 02/03/2023 18:18, Andrew Stubbs wrote:

On 01/03/2023 16:56, Paul-Antoine Arras wrote:

This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It 
also allows the exec register to be saved in SGPRs to avoid spilling 
to memory.

Tested on GCN3 Fiji gfx803.

OK for trunk?


Not quite yet, but it's only a few cosmetic issues, I think.


+(define_insn_and_split "3"
+  [(set (match_operand:V_DI 0 "register_operand"  "=  v")
+    (minmaxop:V_DI
+  (match_operand:V_DI 1 "gcn_alu_operand" "%  v")
+  (match_operand:V_DI 2 "gcn_alu_operand" "   v")))
+    (clobber (reg:DI VCC_REG))]


No need to make it commutative when the two operands have the same 
constraints. There are a few more instances of this later.



+    if ( == smin ||  == smax)
+  emit_insn (gen_vec_cmpdi (vcc, minp ? gen_rtx_LT 
(VOIDmode, 0, 0) :
+    gen_rtx_GT (VOIDmode, 0, 0), 
operands[1], operands[2]));

+    else
+  emit_insn (gen_vec_cmpdi (vcc, minp ? gen_rtx_LTU 
(VOIDmode, 0, 0) :
+    gen_rtx_GTU (VOIDmode, 0, 0), 
operands[1], operands[2]));

+


Long lines need to be wrapped, here and elsewhere.


The amended patch attached should fix those issues. Let me know if it 
looks good to you.


OK to commit, thanks.

Andrew


[PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-06 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>> 
>> The regression was reported during stage-1. A patch was provided during 
>> stage 1 and the discussions around combine stalled.
>> 
>> The regression for AArch64 needs to be fixed in GCC 13. The hit is too big 
>> just to "take".
>> 
>> So we need a way forward, even if it's stage-4.
> Then it needs to be in a way that works within the design constraints of 
> combine.
>
> As Segher has indicated, using a magic constant to say "this is always 
> cheap enough" isn't acceptable.  Furthermore, what this patch changes is 
> combine's internal canonicalization of extensions into shift pairs.
>
> So I think another path forward needs to be found.  I don't see hacking 
> up expand_compound_operation is viable.
>
> Jeff

How about the patch below?  Tested on aarch64-linux-gnu,
but I'll test on x86_64-linux-gnu and powerpc64le-linux-gnu
before committing.

-

The PR contains a case where we want GCC to combine a sign_extend
into an address (which is something that GCC 12 could do).  After
substitution, the sign_extend goes through the usual
expand_compound_operation wrangler and, after taking nonzero_bits
into account, make_compound_operation is presented with:

  X1: (and (mult (subreg x) (const_int N2)) (const_int N1))

where:

(a) the subreg is paradoxical
(b) N2 is a power of 2
(c) all bits outside N1/N2 are known to be zero in x

This is equivalent to:

  X2: (mult (and (subreg x) (const_int N1/N2)) (const_int N2))

Given in this form, existing code would already use (c) to convert
the inner "and" to a zero_extend:

  (mult (zero_extend x) (const_int N2))

This patch makes the code handle X1 as well as X2.
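
The equivalence of X1 and X2 can be checked with plain integer arithmetic.  The following is only a sketch under simplifying assumptions, not code from the patch: it models N2 as 2**n, N1 as the mask m, and the narrow mode as 8 bits wide, so that N1/N2 is exactly 0xff.  All function names are hypothetical.

```cpp
#include <cstdint>

// X1 shape: (and (mult x 2**n) m)
constexpr std::uint64_t x1_form(std::uint64_t x, unsigned n, std::uint64_t m) {
  return (x << n) & m;
}

// X2 shape: (mult (and x (m >> n)) 2**n)
constexpr std::uint64_t x2_form(std::uint64_t x, unsigned n, std::uint64_t m) {
  return (x & (m >> n)) << n;
}

// With m >> n == 0xff the inner "and" keeps exactly the low 8 bits,
// which is what zero-extending an 8-bit value does.
constexpr std::uint64_t zext8_form(std::uint64_t x, unsigned n) {
  return static_cast<std::uint64_t>(static_cast<std::uint8_t>(x)) << n;
}

// One worked instance: x = 0x1234, N2 = 4, N1 = 0x3fc, N1/N2 = 0xff.
static_assert(x1_form(0x1234, 2, 0x3fc) == 0xd0, "X1");
static_assert(x2_form(0x1234, 2, 0x3fc) == 0xd0, "X2");
static_assert(zext8_form(0x1234, 2) == 0xd0, "zero_extend form");

// Check a range of inputs and shift amounts; true if all three forms agree.
bool forms_agree() {
  for (unsigned n = 0; n < 8; ++n)
    for (std::uint64_t x = 0; x < 4096; ++x) {
      const std::uint64_t m = std::uint64_t{0xff} << n;
      if (x1_form(x, n, m) != x2_form(x, n, m)) return false;
      if (x2_form(x, n, m) != zext8_form(x, n)) return false;
    }
  return true;
}
```

The real transformation uses condition (c), the known-zero bits of x, rather than requiring the mask to be the full narrow-mode mask; the sketch covers only the simplest instance of that condition.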

Logically, it would make sense to do the same for ASHIFT, which
would be the canonical form outside memory addresses.  However, it
seemed better to do the minimum possible, given the late stage in
the release cycle.

gcc/
PR rtl-optimization/106594
* combine.cc (make_compound_operation_int): Extend the AND to
ZERO_EXTEND transformation so that it can handle an intervening
multiplication by a power of two.

gcc/testsuite/
* gcc.target/aarch64/pr106594.c: New test.
---
 gcc/combine.cc  | 60 ++---
 gcc/testsuite/gcc.target/aarch64/pr106594.c | 21 
 2 files changed, 63 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr106594.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 053879500b7..b45042bbafd 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -8188,28 +8188,52 @@ make_compound_operation_int (scalar_int_mode mode, rtx 
*x_ptr,
   /* If the one operand is a paradoxical subreg of a register or memory and
 the constant (limited to the smaller mode) has only zero bits where
 the sub expression has known zero bits, this can be expressed as
-a zero_extend.  */
-  else if (GET_CODE (XEXP (x, 0)) == SUBREG)
-   {
- rtx sub;
+a zero_extend.
+
+Look also for the case where the operand is such a subreg that
+is multiplied by 2**N:
 
- sub = XEXP (XEXP (x, 0), 0);
- machine_mode sub_mode = GET_MODE (sub);
- int sub_width;
- if ((REG_P (sub) || MEM_P (sub))
- && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
- && sub_width < mode_width)
+  (and (mult ... 2**N) M) --> (mult (and ... M>>N) 2**N) -> ...  */
+  else
+   {
+ rtx y = XEXP (x, 0);
+ rtx top = y;
+ int shift = 0;
+ if (GET_CODE (y) == MULT
+ && CONST_INT_P (XEXP (y, 1))
+ && pow2p_hwi (INTVAL (XEXP (y, 1))))
{
- unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
- unsigned HOST_WIDE_INT mask;
+ shift = exact_log2 (INTVAL (XEXP (y, 1)));
+ y = XEXP (y, 0);
+   }
+ if (GET_CODE (y) == SUBREG)
+   {
+ rtx sub;
 
- /* original AND constant with all the known zero bits set */
- mask = UINTVAL (XEXP (x, 1)) | (~nonzero_bits (sub, sub_mode));
- if ((mask & mode_mask) == mode_mask)
+ sub = XEXP (y, 0);
+ machine_mode sub_mode = GET_MODE (sub);
+ int sub_width;
+ if ((REG_P (sub) || MEM_P (sub))
+ && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
+ && sub_width < mode_width)
{
- new_rtx = make_compound_operation (sub, next_code);
- new_rtx = make_extraction (mode, new_rtx, 0, 0, sub_width,
-1, 0, in_code == COMPARE);
+ unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
+ unsigned HOST_WIDE_INT mask;
+
+ /* The shifted AND constant with all t

RE: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-06 Thread Tamar Christina via Gcc-patches
> Hi!
> 
> On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> > On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> > Yes, *look* better: I have seen no proof or indication that this would
> 
> ("looks", I cannot type, sorry)
> 
> > actually generate better code, not even on just aarch, let alone on
> > the majority of targets.  As I said I have a test running, you may be
> > lucky even :-)  It has to run for about six hours more and after that
> > it needs analysis still (a few more hours if it isn't obviously always
> > better or worse), so expect results tomorrow night at the earliest.
> 
> The results are in:
> 
> $ perl sizes.pl --percent C[12]
>                     C1        C2
>        alpha   7082243  100.066%
>          arc   4207975  100.015%
>          arm  11518624  100.008%
>        arm64  24514565  100.067%
>        armhf  16661684  100.098%
>         csky   4031841  100.002%
>         i386         0         0
>         ia64  20354295  100.029%
>         m68k   4394084  100.023%
>   microblaze   6549965  100.014%
>         mips  10684680  100.024%
>       mips64   8171850  100.002%
>        nios2   4356713  100.012%
>     openrisc   5010570  100.003%
>       parisc   8406294  100.002%
>     parisc64         0         0
>      powerpc  11104901   99.992%
>    powerpc64  24532358  100.057%
>  powerpc64le  21293219  100.062%
>      riscv32   2028474  100.131%
>      riscv64   9515453  100.120%
>         s390  20519612  100.279%
>           sh         0         0
>      shnommu   1840960  100.012%
>        sparc   5314422  100.004%
>      sparc64   7964129   99.992%
>       x86_64         0         0
>       xtensa   2925723  100.070%
> 
> 
> C1 is the original, C2 with your patch.  These numbers are the code sizes of a
> Linux kernel, some defconfig for every arch.  This is a good measure of how
> effective combine was.
> 
> The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
> everywhere else.  Look at that s390 number!  Or riscv, or most of the arm
> variants (including aarch64).
> 
> Do you want me to look in detail what causes this regression on some
> particular target, i.e. why we really still need the expand_compound
> functionality there?
> 

Hi,

Thanks for having a look! I think the Richards are exploring a different 
solution on the PR
so I don't think it's worth looking at now (maybe in stage-1?).  Thanks for 
checking though!

I appreciate you all helping to get this fixed!

Kind Regards,
Tamar

> (Btw.  "0" means the target did not build.  For the x86 targets this is just 
> more
> -Werror madness that seeped in it seems.  For parisc64 and sh it is the choice
> of config.  Will fix.)
> 
> 
> Segher


Re: [PATCH] PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.

2023-03-06 Thread Segher Boessenkool
Hi!

On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> On Sun, Mar 05, 2023 at 08:43:20PM +, Tamar Christina wrote:
> Yes, *look* better: I have seen no proof or indication that this would

("looks", I cannot type, sorry)

> actually generate better code, not even on just aarch, let alone on the
> majority of targets.  As I said I have a test running, you may be lucky
> even :-)  It has to run for about six hours more and after that it needs
> analysis still (a few more hours if it isn't obviously always better or
> worse), so expect results tomorrow night at the earliest.

The results are in:

$ perl sizes.pl --percent C[12]
                    C1        C2
       alpha   7082243  100.066%
         arc   4207975  100.015%
         arm  11518624  100.008%
       arm64  24514565  100.067%
       armhf  16661684  100.098%
        csky   4031841  100.002%
        i386         0         0
        ia64  20354295  100.029%
        m68k   4394084  100.023%
  microblaze   6549965  100.014%
        mips  10684680  100.024%
      mips64   8171850  100.002%
       nios2   4356713  100.012%
    openrisc   5010570  100.003%
      parisc   8406294  100.002%
    parisc64         0         0
     powerpc  11104901   99.992%
   powerpc64  24532358  100.057%
 powerpc64le  21293219  100.062%
     riscv32   2028474  100.131%
     riscv64   9515453  100.120%
        s390  20519612  100.279%
          sh         0         0
     shnommu   1840960  100.012%
       sparc   5314422  100.004%
     sparc64   7964129   99.992%
      x86_64         0         0
      xtensa   2925723  100.070%


C1 is the original, C2 with your patch.  These numbers are the code
sizes of a Linux kernel, some defconfig for every arch.  This is a good
measure of how effective combine was.

The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
everywhere else.  Look at that s390 number!  Or riscv, or most of the
arm variants (including aarch64).

Do you want me to look in detail what causes this regression on some
particular target, i.e. why we really still need the expand_compound
functionality there?

(Btw.  "0" means the target did not build.  For the x86 targets this is
just more -Werror madness that seeped in it seems.  For parisc64 and sh
it is the choice of config.  Will fix.)


Segher


Re: [wwwdocs] document modula-2 in gcc-13/changes.html (and index.html)

2023-03-06 Thread Gaius Mulley via Gcc-patches
Gerald Pfeifer  writes:

> Hi Gaius,
>
> apologies, I thought you had pushed the updated patch and only now 
> realized it's not in yet.
>
> Please look into the few bits below and then go ahead and push.
>
> On Mon, 6 Feb 2023, Gaius Mulley wrote:
>>  * htdocs/frontends.html: An update to say the front end is now in
>>  the development trunk.
>
> Here we'd usually simply say
>
>   * frontends: The Modula-2 front end is now on the 
>   development trunk.
>
> skipping "An update to say", or at least "An update to" and the "htdocs/" 
> part (and adding which front end).
>
>>  * htdocs/gcc-13/changes.html: A description of which dialects are
>>  supported and the user level front end changes so far.
>
>   * gcc-13: Note which Modula-2 dialects are...
>
>>  * htdocs/index.html: Proposed news entry.
>
>   * index: Announce Modula-2 inclusion 
>
> (or something like that).
>
>>  * htdocs/onlinedocs/index.html: PDF, PS and HTML documentation
>>  links.
>   * onlinedocs: Add links to Modula-2 documentation
>
>
>> +The compiler is operational with GCC 10, GCC 11 GCC 12 (on
>
> ...GCC 11, and GCC 12...
>
>> +GNU/Linux x86 systems).  The front end is now in the GCC development
>> +trunk (GCC 13).  The front end is mostly written in Modula-2 and it
>>  includes a bootstrap tool which translates Modula-2 into C/C++.
>
> How about "It is mostly written in Modula-2 and includes..." to avoid
> repetition of "the front end"?
>
>
> Thank you,
> Gerald

Hi Gerald,

no problem at all and many thanks for the advice.  I'll push the changes
shortly,

regards,
Gaius


Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-06 Thread Richard Biener via Gcc-patches
On Mon, 6 Mar 2023, Jakub Jelinek wrote:

> On Mon, Mar 06, 2023 at 11:01:18AM +, Richard Biener wrote:
> > +  auto_mpfr &operator=(const auto_mpfr &) = delete;
> > +  auto_mpz &operator=(const auto_mpz &) = delete;
> 
> Just formatting nit, space before (.
> 
> Looks like nice improvement and thanks Jonathan for the suggestions ;)

Good, I've queued it for stage1 unless fortran folks want to pick it
up earlier for the purpose of fixing leaks.

Richard.


RE: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-06 Thread Tamar Christina via Gcc-patches
Ping,

And updated the hook to allow differentiating between ISAs.

As Andy said before, initializing a ranger instance is cheap but not free, and if
the intention is to call it often during a pass it should be instantiated at
pass startup and passed along to the places that need it.  This is a big
refactoring and doesn't seem right to do in this PR.  But we should do it in GCC 14.

Currently we only instantiate it after a long series of much cheaper checks.
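
As background for the hook, the two decomposition strategies can be compared
on scalars.  The following is only a sketch and is not code from the patch: it
assumes unsigned 16-bit elements divided by 255, does the arithmetic in 32
bits, and uses the hypothetical helper names div255_shifts, div255_magic and
div255_forms_agree.

```cpp
#include <cassert>
#include <cstdint>

// Shift-based form: (x + ((x + 257) >> 8)) >> 8, evaluated in 32 bits so
// that the intermediate sums cannot wrap.
static inline uint16_t div255_shifts (uint16_t x)
{
  uint32_t wide = x;
  return (uint16_t) ((wide + ((wide + 257u) >> 8)) >> 8);
}

// Multiply-based form: multiply by a fixed-point reciprocal of 255
// (0x8081 scaled by 2^23) and shift the product back down.
static inline uint16_t div255_magic (uint16_t x)
{
  return (uint16_t) (((uint32_t) x * 0x8081u) >> 23);
}

// Returns true if both forms agree with x / 255 for every 16-bit x.
static bool div255_forms_agree ()
{
  for (uint32_t x = 0; x <= 0xffffu; ++x)
    if (div255_shifts ((uint16_t) x) != x / 255u
        || div255_magic ((uint16_t) x) != x / 255u)
      return false;
  return true;
}
```

Which form is cheaper depends on the target, which is the choice the hook
leaves to the backend.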

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* target.def (preferred_div_as_shifts_over_mult): New.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
* targhooks.h (default_preferred_div_as_shifts_over_mult): New.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.

gcc/testsuite/ChangeLog:

PR target/108583
* gcc.dg/vect/vect-div-bitmask-4.c: New test.
* gcc.dg/vect/vect-div-bitmask-5.c: New test.

--- inline copy of patch ---

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 50a8872a6695b18b9bed0d393bacf733833633db..f69f7f036272e867ea1c3fee851b117f057f68c5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6137,6 +6137,10 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (const_tree @var{type})
+If possible, when decomposing a division operation of vectors of
+type @var{type} during vectorization, prefer to use shifts rather than
+multiplication by magic constants.
 @end deftypefn
 
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
diff --git a/gcc/target.def b/gcc/target.def
index e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..bdee9b7f9c941508738fac49593b5baa525e2915 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1868,6 +1868,16 @@ correct for most targets.",
  poly_uint64, (const_tree type),
  default_preferred_vector_alignment)
 
+/* Returns whether the target has a preference for decomposing divisions using
+   shifts rather than multiplies.  */
+DEFHOOK
+(preferred_div_as_shifts_over_mult,
+ "If possible, when decomposing a division operation of vectors of\n\
+type @var{type} during vectorization, prefer to use shifts rather than\n\
+multiplication by magic constants.",
+ bool, (const_tree type),
+ default_preferred_div_as_shifts_over_mult)
+
 /* Return true if vector alignment is reachable (by peeling N
iterations) for the given scalar type.  */
 DEFHOOK
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a6a4809ca91baa5d7fad2244549317a31390f0c2..a207963b9e6eb9300df0043e1b79aa6c941d0f7f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
 extern unsigned HOST_WIDE_INT default_shift_truncation_mask
   (machine_mode);
 extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
+extern bool default_preferred_div_as_shifts_over_mult
+  (const_tree);
 extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);
 
 extern tree default_stack_protect_guard (void);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 211525720a620d6f533e2da91e03877337a931e7..becea6ef4b6329cfa0b676f8d844630fbdc97f20 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
   return TYPE_ALIGN (type);
 }
 
+/* The default implementation of
+   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
+
+bool
+default_preferred_div_as_shifts_over_mult (const_tree /* type */)
+{
+  return false;
+}
+
 /* By default assume vectors of element TYPE require a multiple of the natural
alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is false.  */
 bool
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index ..c81f8946922250234bf759e0a0a04ea8c1f73e3c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target vect_int } */
+
+#include 
+#include "tree-vect.h"
+
+typedef unsigned __attribute__((__vector_size__ (16))) V;
+
+static __attribute__((__noinline__)) __att

RE: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-06 Thread Tamar Christina via Gcc-patches
Ping,

And updating the hook.

There are no new tests, as new correctness tests were added to the mid-end and
codegen tests for this already exist.
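
For reference, the split performed by the new *bitmask_shift_plus pattern
can be modelled on a single lane.  This is a scalar sketch under stated
assumptions, not the vector code itself: it assumes 16-bit lanes (so n is 8),
and addhn_u16, uaddw_u16, shift_plus_u16 and split_matches_expression are
hypothetical names that only mimic what ADDHN and UADDW compute per element.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of ADDHN on one 16-bit lane: add with 16-bit wraparound and
// keep only the top 8 bits of the sum.
static inline uint8_t addhn_u16 (uint16_t a, uint16_t b)
{
  return (uint8_t) ((uint16_t) (a + b) >> 8);
}

// Scalar model of UADDW on one lane: zero-extend the 8-bit operand and add
// it to the 16-bit operand, again with 16-bit wraparound.
static inline uint16_t uaddw_u16 (uint16_t c, uint8_t t)
{
  return (uint16_t) (c + t);
}

// The expression matched by the pattern, ((a + b) >> n) + c with n == 8.
static inline uint16_t shift_plus_u16 (uint16_t a, uint16_t b, uint16_t c)
{
  uint16_t sum = (uint16_t) (a + b);
  return (uint16_t) ((sum >> 8) + c);
}

// Returns true if the two-step split matches the expression on a sample of
// inputs that includes the wraparound corner cases.
static bool split_matches_expression ()
{
  const uint16_t samples[] = { 0, 1, 255, 256, 257, 0x7fff, 0x8000,
                               0xff00, 0xffff };
  for (uint16_t a : samples)
    for (uint16_t b : samples)
      for (uint16_t c : samples)
        if (uaddw_u16 (c, addhn_u16 (a, b)) != shift_plus_u16 (a, b, c))
          return false;
  return true;
}
```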

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): Remove.
(*bitmask_shift_plus): New.
* config/aarch64/aarch64-sve2.md (*bitmask_shift_plus): New.
(@aarch64_bitmask_udiv3): Remove.
* config/aarch64/aarch64.cc
(aarch64_vectorize_can_special_div_by_constant,
TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
(TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
aarch64_vectorize_preferred_div_as_shifts_over_mult): New.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7f212bf37cd2c120dceb7efa733c9fa76226f029..e1ecb88634f93d380ef534093ea6599dc7278108 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4867,60 +4867,27 @@ (define_expand "aarch64_hn2"
   }
 )
 
-;; div optimizations using narrowings
-;; we can do the division e.g. shorts by 255 faster by calculating it as
-;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
-;; double the precision of x.
-;;
-;; If we imagine a short as being composed of two blocks of bytes then
-;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
-;; adding 1 to each sub component:
-;;
-;;      short value of 16-bits
-;; ┌──────────────┬────────────────┐
-;; │              │                │
-;; └──────────────┴────────────────┘
-;;   8-bit part1 ▲  8-bit part2   ▲
-;;               │                │
-;;               │                │
-;;              +1               +1
-;;
-;; after the first addition, we have to shift right by 8, and narrow the
-;; results back to a byte.  Remember that the addition must be done in
-;; double the precision of the input.  Since 8 is half the size of a short
-;; we can use a narrowing halfing instruction in AArch64, addhn which also
-;; does the addition in a wider precision and narrows back to a byte.  The
-;; shift itself is implicit in the operation as it writes back only the top
-;; half of the result. i.e. bits 2*esize-1:esize.
-;;
-;; Since we have narrowed the result of the first part back to a byte, for
-;; the second addition we can use a widening addition, uaddw.
-;;
-;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8.
-;;
-;; The shift is later optimized by combine to a uzp2 with movi #0.
-(define_expand "@aarch64_bitmask_udiv3"
-  [(match_operand:VQN 0 "register_operand")
-   (match_operand:VQN 1 "register_operand")
-   (match_operand:VQN 2 "immediate_operand")]
+;; Optimize ((a + b) >> n) + c where n is half the bitsize of the vector
+(define_insn_and_split "*bitmask_shift_plus"
+  [(set (match_operand:VQN 0 "register_operand" "=&w")
+   (plus:VQN
+ (lshiftrt:VQN
+   (plus:VQN (match_operand:VQN 1 "register_operand" "w")
+ (match_operand:VQN 2 "register_operand" "w"))
+   (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_exact_top" "Dr"))
+ (match_operand:VQN 4 "register_operand" "w")))]
   "TARGET_SIMD"
+  "#"
+  "&& true"
+  [(const_int 0)]
 {
-  unsigned HOST_WIDE_INT size
-= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
-  rtx elt = unwrap_const_vec_duplicate (operands[2]);
-  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
-FAIL;
-
-  rtx addend = gen_reg_rtx (mode);
-  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
-  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
-  rtx tmp1 = gen_reg_rtx (mode);
-  rtx tmp2 = gen_reg_rtx (mode);
-  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
-  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
-  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize);
-  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
-  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector));
+  rtx tmp;
+  if (can_create_pseudo_p ())
+tmp = gen_reg_rtx (mode);
+  else
+tmp = gen_rtx_REG (mode, REGNO (operands[0]));
+  emit_insn (gen_aarch64_addhn (tmp, operands[1], operands[2]));
+  emit_insn (gen_aarch64_uaddw (operands[0], operands[4], tmp));
   DONE;
 })
 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 40c0728a7e6f00c395c360ce7625bc2e4a018809..bed44d7d6873877386222d56144cc115e3953a61 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2317,41 +2317,24 @@ (define_insn "@aarch64_sve_"
 ;;  [INT] Misc optab implementations
 ;; -
 ;; Includes:
-;; - aarch64_bitmask_udiv
+;; - bitmask_shift_plus
 ;; -
 
-;; div

RE: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-06 Thread Tamar Christina via Gcc-patches
Ping.

And updated the patch to reject cases that we don't expect or can't handle
cleanly for now.
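
To make the intent of the new range-ops concrete, here is a simplified model
of how a widening add or multiply folds two input ranges.  It is a sketch
only and is not the wi_fold code from the patch: it assumes 8-bit operands
widened to 16 bits, uses plain 32-bit integers instead of wide_int, and the
names range32, widen_plus_fold, widen_mult_fold and handles_sign_combination
are hypothetical.  The last helper mirrors the sign check in
maybe_non_standard.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A closed interval [lo, hi], held in a type wide enough that 8-bit inputs
// can never overflow it.
struct range32 { int32_t lo; int32_t hi; };

// Widening add: the operation is done in the wider type, so the bounds
// simply add without any wraparound.
static range32 widen_plus_fold (range32 a, range32 b)
{
  return { a.lo + b.lo, a.hi + b.hi };
}

// Widening multiply: with signed operands any of the four corner products
// may be the minimum or the maximum.
static range32 widen_mult_fold (range32 a, range32 b)
{
  int32_t p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
  return { *std::min_element (p, p + 4), *std::max_element (p, p + 4) };
}

// Mirrors the rejection in the patch: mismatched operand signs together
// with a signed result are not handled.
static bool handles_sign_combination (bool signed1, bool signed2,
                                      bool signed_ret)
{
  return !((signed1 != signed2) && signed_ret);
}
```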

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* gimple-range-op.h (gimple_range_op_handler): Add maybe_non_standard.
* gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler):
Use it.
(gimple_range_op_handler::maybe_non_standard): New.
* range-op.cc (class operator_widen_plus_signed,
operator_widen_plus_signed::wi_fold, class operator_widen_plus_unsigned,
operator_widen_plus_unsigned::wi_fold, class operator_widen_mult_signed,
operator_widen_mult_signed::wi_fold, class operator_widen_mult_unsigned,
operator_widen_mult_unsigned::wi_fold,
ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New.
* range-op.h (ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New

Co-Authored-By: Andrew MacLeod 

--- Inline copy of patch ---

diff --git a/gcc/gimple-range-op.h b/gcc/gimple-range-op.h
index 743b858126e333ea9590c0f175aacb476260c048..1bf63c5ce6f5db924a1f5907ab4539e376281bd0 100644
--- a/gcc/gimple-range-op.h
+++ b/gcc/gimple-range-op.h
@@ -41,6 +41,7 @@ public:
 relation_trio = TRIO_VARYING);
 private:
   void maybe_builtin_call ();
+  void maybe_non_standard ();
   gimple *m_stmt;
   tree m_op1, m_op2;
 };
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d9dfdc56939bb62ade72726b15c3d5e87e4ddcd1..a5d625387e712c170e1e68f6a7d494027f6ef0d0 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -179,6 +179,8 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
   // statements.
   if (is_a  (m_stmt))
 maybe_builtin_call ();
+  else
+maybe_non_standard ();
 }
 
 // Calculate what we can determine of the range of this unary
@@ -764,6 +766,57 @@ public:
   }
 } op_cfn_parity;
 
+// Set up a gimple_range_op_handler for any nonstandard function which can be
+// supported via range-ops.
+
+void
+gimple_range_op_handler::maybe_non_standard ()
+{
+  range_operator *signed_op = ptr_op_widen_mult_signed;
+  range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
+switch (gimple_assign_rhs_code (m_stmt))
+  {
+   case WIDEN_PLUS_EXPR:
+   {
+ signed_op = ptr_op_widen_plus_signed;
+ unsigned_op = ptr_op_widen_plus_unsigned;
+   }
+   gcc_fallthrough ();
+   case WIDEN_MULT_EXPR:
+   {
+ m_valid = false;
+ m_op1 = gimple_assign_rhs1 (m_stmt);
+ m_op2 = gimple_assign_rhs2 (m_stmt);
+ tree ret = gimple_assign_lhs (m_stmt);
+ bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+ bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+ bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
+
+ /* Normally these operands should all have the same sign, but
+some passes violate this by taking mismatched sign args.  At
+the moment the only one that's possible is mismatched inputs and
+unsigned output.  Once ranger supports signs for the operands we
+can properly fix it; for now only accept the case we can do
+correctly.  */
+ if ((signed1 ^ signed2) && signed_ret)
+   return;
+
+ m_valid = true;
+ if (signed2 && !signed1)
+   std::swap (m_op1, m_op2);
+
+ if (signed1 || signed2)
+   m_int = signed_op;
+ else
+   m_int = unsigned_op;
+ break;
+   }
+   default:
+ break;
+  }
+}
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
diff --git a/gcc/range-op.h b/gcc/range-op.h
index f00b747f08a1fa8404c63bfe5a931b4048008b03..b1eeac70df81f2bdf228af7adff5399e7ac5e5d6 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -311,4 +311,8 @@ private:
 // This holds the range op table for floating point operations.
 extern floating_op_table *floating_tree_table;
 
+extern range_operator *ptr_op_widen_mult_signed;
+extern range_operator *ptr_op_widen_mult_unsigned;
+extern range_operator *ptr_op_widen_plus_signed;
+extern range_operator *ptr_op_widen_plus_unsigned;
 #endif // GCC_RANGE_OP_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 5c67bce6d3aab81ad3186b902e09d6a96878d9bb..718ccb6f074e1a2a9ef1b7a5d4e879898d4a7fc3 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1556,6 +1556,73 @@ operator_plus::op2_range (irange &r, tree type,
   return op1_range (r, type, lhs, op1, rel.swap_op1_op2 ());
 }
 
+class operator_widen_plus_signed : public range_operator
+{
+public:
+  virtual void wi_fold (irange &r, tree type,
+   const wide_int &lh_lb,
+   

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-06 Thread Jakub Jelinek via Gcc-patches
On Mon, Mar 06, 2023 at 11:01:18AM +, Richard Biener wrote:
> +  auto_mpfr &operator=(const auto_mpfr &) = delete;
> +  auto_mpz &operator=(const auto_mpz &) = delete;

Just formatting nit, space before (.

Looks like nice improvement and thanks Jonathan for the suggestions ;)

Jakub



Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-06 Thread Jonathan Wakely via Gcc-patches
On Mon, 6 Mar 2023 at 11:01, Richard Biener  wrote:
>
> On Mon, 6 Mar 2023, Jonathan Wakely wrote:
>
> > On Mon, 6 Mar 2023 at 10:11, Richard Biener  wrote:
> > >
> > > The following adds two RAII classes, one for mpz_t and one for mpfr_t
> > > making object lifetime management easier.  Both formerly required
> > > explicit initialization with {mpz,mpfr}_init and release with
> > > {mpz,mpfr}_clear.
> > >
> > > I've converted two example places (where lifetime is trivial).
> > >
> > > I've so far only built cc1 with the change.  Any comments?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > * system.h (class auto_mpz): New,
> > > * realmpfr.h (class auto_mpfr): Likewise.
> > > * fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
> > > (do_mpfr_arg2): Likewise.
> > > * tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz;
> > > ---
> > >  gcc/fold-const-call.cc |  8 ++--
> > >  gcc/realmpfr.h | 15 +++
> > >  gcc/system.h   | 14 ++
> > >  gcc/tree-ssa-loop-niter.cc | 10 +-
> > >  4 files changed, 32 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
> > > index 43819c1f984..fa0b287cc8a 100644
> > > --- a/gcc/fold-const-call.cc
> > > +++ b/gcc/fold-const-call.cc
> > > @@ -130,14 +130,12 @@ do_mpfr_arg1 (real_value *result,
> > >
> > >int prec = format->p;
> > >mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> > > -  mpfr_t m;
> > >
> > > -  mpfr_init2 (m, prec);
> > > +  auto_mpfr m (prec);
> > >mpfr_from_real (m, arg, MPFR_RNDN);
> > >mpfr_clear_flags ();
> > >bool inexact = func (m, m, rnd);
> > >bool ok = do_mpfr_ckconv (result, m, inexact, format);
> > > -  mpfr_clear (m);
> > >
> > >return ok;
> > >  }
> > > @@ -224,14 +222,12 @@ do_mpfr_arg2 (real_value *result,
> > >
> > >int prec = format->p;
> > >mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> > > -  mpfr_t m;
> > >
> > > -  mpfr_init2 (m, prec);
> > > +  auto_mpfr m (prec);
> > >mpfr_from_real (m, arg1, MPFR_RNDN);
> > >mpfr_clear_flags ();
> > >bool inexact = func (m, arg0.to_shwi (), m, rnd);
> > >bool ok = do_mpfr_ckconv (result, m, inexact, format);
> > > -  mpfr_clear (m);
> > >
> > >return ok;
> > >  }
> > > diff --git a/gcc/realmpfr.h b/gcc/realmpfr.h
> > > index 5e032c05f25..2db2ecc94d4 100644
> > > --- a/gcc/realmpfr.h
> > > +++ b/gcc/realmpfr.h
> > > @@ -24,6 +24,21 @@
> > >  #include 
> > >  #include 
> > >
> > > +class auto_mpfr
> > > +{
> > > +public:
> > > +  auto_mpfr () { mpfr_init (m_mpfr); }
> > > +  explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2 (m_mpfr, prec); }
> > > +  ~auto_mpfr () { mpfr_clear (m_mpfr); }
> > > +
> > > +  operator mpfr_t& () { return m_mpfr; }
> >
> >
> > This implicit conversion makes the following mistake possible, if code
> > is incorrectly converted to use it:
> >
> > auto_mpfr m (prec);
> > // ...
> > mpfr_clear (m);  // oops!
> >
> > You could prevent that by adding this to the class body:
> >
> > friend void mpfr_clear (auto_mpfr&) = delete;
> >
> > This will be a better match for calls to mpfr_clear(m) than using the
> > implicit conversion then calling the real function, and will give an
> > error if used:
> > auto.cc:20:13: error: use of deleted function 'void mpfr_clear(auto_mpfr&)'
> >
> > This deleted friend will not be a candidate for calls to mpfr_clear
> > with an argument of any other type, only for calls with an argument of
> > type auto_mpfr.
>
> OK, it might be OK to mpfr_clear() twice and/or mpfr_clear/mpfr_init
> again.  Quite possibly mpfr_init should get the same treatment; mixing
> auto_* with explicit lifetime management is bad.

Ah yes, good point.

> > > +
> > > +  auto_mpfr (const auto_mpfr &) = delete;
> >
> > This class has an implicitly-defined assignment operator, which would
> > result in leaks and double-frees.
> > You should add:
> >auto_mpfr &operator=(const auto_mpfr &) = delete;
> > This ensures it can't be copied by construction or assignment.
> >
> > The same two comments apply to auto_mpz.
>
> Thanks a lot, I've adjusted the patch to the one below.

LGTM.



Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-06 Thread Richard Biener via Gcc-patches
On Mon, 6 Mar 2023, Jonathan Wakely wrote:

> On Mon, 6 Mar 2023 at 10:11, Richard Biener  wrote:
> >
> > The following adds two RAII classes, one for mpz_t and one for mpfr_t
> > making object lifetime management easier.  Both formerly required
> > explicit initialization with {mpz,mpfr}_init and release with
> > {mpz,mpfr}_clear.
> >
> > I've converted two example places (where lifetime is trivial).
> >
> > I've so far only built cc1 with the change.  Any comments?
> >
> > Thanks,
> > Richard.
> >
> > * system.h (class auto_mpz): New,
> > * realmpfr.h (class auto_mpfr): Likewise.
> > * fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
> > (do_mpfr_arg2): Likewise.
> > * tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz;
> > ---
> >  gcc/fold-const-call.cc |  8 ++--
> >  gcc/realmpfr.h | 15 +++
> >  gcc/system.h   | 14 ++
> >  gcc/tree-ssa-loop-niter.cc | 10 +-
> >  4 files changed, 32 insertions(+), 15 deletions(-)
> >
> > diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
> > index 43819c1f984..fa0b287cc8a 100644
> > --- a/gcc/fold-const-call.cc
> > +++ b/gcc/fold-const-call.cc
> > @@ -130,14 +130,12 @@ do_mpfr_arg1 (real_value *result,
> >
> >int prec = format->p;
> >mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> > -  mpfr_t m;
> >
> > -  mpfr_init2 (m, prec);
> > +  auto_mpfr m (prec);
> >mpfr_from_real (m, arg, MPFR_RNDN);
> >mpfr_clear_flags ();
> >bool inexact = func (m, m, rnd);
> >bool ok = do_mpfr_ckconv (result, m, inexact, format);
> > -  mpfr_clear (m);
> >
> >return ok;
> >  }
> > @@ -224,14 +222,12 @@ do_mpfr_arg2 (real_value *result,
> >
> >int prec = format->p;
> >mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> > -  mpfr_t m;
> >
> > -  mpfr_init2 (m, prec);
> > +  auto_mpfr m (prec);
> >mpfr_from_real (m, arg1, MPFR_RNDN);
> >mpfr_clear_flags ();
> >bool inexact = func (m, arg0.to_shwi (), m, rnd);
> >bool ok = do_mpfr_ckconv (result, m, inexact, format);
> > -  mpfr_clear (m);
> >
> >return ok;
> >  }
> > diff --git a/gcc/realmpfr.h b/gcc/realmpfr.h
> > index 5e032c05f25..2db2ecc94d4 100644
> > --- a/gcc/realmpfr.h
> > +++ b/gcc/realmpfr.h
> > @@ -24,6 +24,21 @@
> >  #include 
> >  #include 
> >
> > +class auto_mpfr
> > +{
> > +public:
> > +  auto_mpfr () { mpfr_init (m_mpfr); }
> > +  explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2 (m_mpfr, prec); }
> > +  ~auto_mpfr () { mpfr_clear (m_mpfr); }
> > +
> > +  operator mpfr_t& () { return m_mpfr; }
> 
> 
> This implicit conversion makes the following mistake possible, if code
> is incorrectly converted to use it:
> 
> auto_mpfr m (prec);
> // ...
> mpfr_clear (m);  // oops!
> 
> You could prevent that by adding this to the class body:
> 
> friend void mpfr_clear (auto_mpfr&) = delete;
> 
> This will be a better match for calls to mpfr_clear(m) than using the
> implicit conversion then calling the real function, and will give an
> error if used:
> auto.cc:20:13: error: use of deleted function 'void mpfr_clear(auto_mpfr&)'
>
> This deleted friend will not be a candidate for calls to mpfr_clear
> with an argument of any other type, only for calls with an argument of
> type auto_mpfr.

OK, it might be OK to mpfr_clear() twice and/or mpfr_clear/mpfr_init
again.  Quite possibly mpfr_init should get the same treatment; mixing
auto_* with explicit lifetime management is bad.

> > +
> > +  auto_mpfr (const auto_mpfr &) = delete;
> 
> This class has an implicitly-defined assignment operator, which would
> result in leaks and double-frees.
> You should add:
>auto_mpfr &operator=(const auto_mpfr &) = delete;
> This ensures it can't be copied by construction or assignment.
> 
> The same two comments apply to auto_mpz.

Thanks a lot, I've adjusted the patch to the one below.

Richard.

>From c2736b929a3d0440432f31e65f5c89f4ec9dc21d Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 6 Mar 2023 11:06:38 +0100
Subject: [PATCH] [RFC] RAII auto_mpfr and autp_mpz
To: gcc-patches@gcc.gnu.org

The following adds two RAII classes, one for mpz_t and one for mpfr_t
making object lifetime management easier.  Both formerly required
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.

I've converted two example places (where lifetime is trivial).

* system.h (class auto_mpz): New,
* realmpfr.h (class auto_mpfr): Likewise.
* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
(do_mpfr_arg2): Likewise.
* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz;
---
 gcc/fold-const-call.cc |  8 ++--
 gcc/realmpfr.h | 20 
 gcc/system.h   | 18 ++
 gcc/tree-ssa-loop-niter.cc | 10 +-
 4 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
index 43819

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-06 Thread Jonathan Wakely via Gcc-patches
On Mon, 6 Mar 2023 at 10:11, Richard Biener  wrote:
>
> The following adds two RAII classes, one for mpz_t and one for mpfr_t
> making object lifetime management easier.  Both formerly required
> explicit initialization with {mpz,mpfr}_init and release with
> {mpz,mpfr}_clear.
>
> I've converted two example places (where lifetime is trivial).
>
> I've so far only built cc1 with the change.  Any comments?
>
> Thanks,
> Richard.
>
> * system.h (class auto_mpz): New,
> * realmpfr.h (class auto_mpfr): Likewise.
> * fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
> (do_mpfr_arg2): Likewise.
> * tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz;
> ---
>  gcc/fold-const-call.cc |  8 ++--
>  gcc/realmpfr.h | 15 +++
>  gcc/system.h   | 14 ++
>  gcc/tree-ssa-loop-niter.cc | 10 +-
>  4 files changed, 32 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
> index 43819c1f984..fa0b287cc8a 100644
> --- a/gcc/fold-const-call.cc
> +++ b/gcc/fold-const-call.cc
> @@ -130,14 +130,12 @@ do_mpfr_arg1 (real_value *result,
>
>int prec = format->p;
>mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> -  mpfr_t m;
>
> -  mpfr_init2 (m, prec);
> +  auto_mpfr m (prec);
>mpfr_from_real (m, arg, MPFR_RNDN);
>mpfr_clear_flags ();
>bool inexact = func (m, m, rnd);
>bool ok = do_mpfr_ckconv (result, m, inexact, format);
> -  mpfr_clear (m);
>
>return ok;
>  }
> @@ -224,14 +222,12 @@ do_mpfr_arg2 (real_value *result,
>
>int prec = format->p;
>mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
> -  mpfr_t m;
>
> -  mpfr_init2 (m, prec);
> +  auto_mpfr m (prec);
>mpfr_from_real (m, arg1, MPFR_RNDN);
>mpfr_clear_flags ();
>bool inexact = func (m, arg0.to_shwi (), m, rnd);
>bool ok = do_mpfr_ckconv (result, m, inexact, format);
> -  mpfr_clear (m);
>
>return ok;
>  }
> diff --git a/gcc/realmpfr.h b/gcc/realmpfr.h
> index 5e032c05f25..2db2ecc94d4 100644
> --- a/gcc/realmpfr.h
> +++ b/gcc/realmpfr.h
> @@ -24,6 +24,21 @@
>  #include 
>  #include 
>
> +class auto_mpfr
> +{
> +public:
> +  auto_mpfr () { mpfr_init (m_mpfr); }
> +  explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2 (m_mpfr, prec); }
> +  ~auto_mpfr () { mpfr_clear (m_mpfr); }
> +
> +  operator mpfr_t& () { return m_mpfr; }


This implicit conversion makes the following mistake possible, if code
is incorrectly converted to use it:

auto_mpfr m (prec);
// ...
mpfr_clear (m);  // oops!

You could prevent that by adding this to the class body:

friend void mpfr_clear (auto_mpfr&) = delete;

This will be a better match for calls to mpfr_clear(m) than using the
implicit conversion then calling the real function, and will give an
error if used:
auto.cc:20:13: error: use of deleted function 'void mpfr_clear(auto_mpfr&)'

This deleted friend will not be a candidate for calls to mpfr_clear
with an argument of any other type, only for calls with an argument of
type auto_mpfr.
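
The deleted-friend technique described above can be tried without MPFR.  The
sketch below is an illustration under stated assumptions: res_t, res_init and
res_clear are hypothetical stand-ins for mpfr_t, mpfr_init and mpfr_clear,
auto_res stands in for auto_mpfr, and the can_clear trait exists only so the
effect can be checked without provoking a hard compile error.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stand-in for a C handle that, like mpfr_t, is an array of one struct.
struct res_struct { int live; };
typedef res_struct res_t[1];

static int live_count = 0;
static void res_init (res_t r) { r[0].live = 1; ++live_count; }
static void res_clear (res_t r) { r[0].live = 0; --live_count; }

class auto_res
{
public:
  auto_res () { res_init (m_res); }
  ~auto_res () { res_clear (m_res); }

  // Implicit conversion so the wrapper can be passed to the C-style API.
  operator res_t& () { return m_res; }

  auto_res (const auto_res &) = delete;
  auto_res &operator= (const auto_res &) = delete;

  // Exact match for auto_res arguments, so it wins over the conversion to
  // res_t and turns a manual res_clear (m) into a compile-time error.
  friend void res_clear (auto_res &) = delete;

private:
  res_t m_res;
};

// Detects whether res_clear (T&) is callable; selecting a deleted function
// counts as a substitution failure.
template <typename T, typename = void>
struct can_clear : std::false_type {};
template <typename T>
struct can_clear<T, decltype (res_clear (std::declval<T &> ()))>
  : std::true_type {};

// Returns true if a scoped auto_res is initialized and released exactly once.
static bool scoped_cleanup_works ()
{
  int before = live_count;
  {
    auto_res m;
    if (live_count != before + 1)
      return false;
  }
  return live_count == before;
}
```

The destructor is unaffected because it passes the raw res_t member, for which
the deleted friend is not a candidate.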

> +
> +  auto_mpfr (const auto_mpfr &) = delete;

This class has an implicitly-defined assignment operator, which would
result in leaks and double-frees.
You should add:
   auto_mpfr &operator=(const auto_mpfr &) = delete;
This ensures it can't be copied by construction or assignment.

The same two comments apply to auto_mpz.
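
The copy problem can also be shown with a stand-in type.  This is a sketch
with hypothetical names (buf_t, buf_init, buf_clear, leaky_buf, auto_buf), not
the GCC classes.  It deliberately never performs the bad copy, since that
would be undefined behaviour; it only checks what the compiler would permit.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

// Stand-in for a C handle that owns heap memory, shaped like mpz_t/mpfr_t.
struct buf_struct { void *data; };
typedef buf_struct buf_t[1];

static int live_bufs = 0;
static void buf_init (buf_t b) { b[0].data = std::malloc (16); ++live_bufs; }
static void buf_clear (buf_t b) { std::free (b[0].data); b[0].data = nullptr; --live_bufs; }

// Without deleted copy operations the compiler generates memberwise copies.
// Assigning one leaky_buf to another would overwrite (leak) one pointer and
// leave two destructors freeing the other.
class leaky_buf
{
public:
  leaky_buf () { buf_init (m_buf); }
  ~leaky_buf () { buf_clear (m_buf); }
private:
  buf_t m_buf;
};

// With both copy operations deleted the mistake no longer compiles.
class auto_buf
{
public:
  auto_buf () { buf_init (m_buf); }
  ~auto_buf () { buf_clear (m_buf); }
  auto_buf (const auto_buf &) = delete;
  auto_buf &operator= (const auto_buf &) = delete;
private:
  buf_t m_buf;
};

// Returns true if a scoped auto_buf releases its buffer on scope exit.
static bool auto_buf_scope_ok ()
{
  int before = live_bufs;
  {
    auto_buf b;
    if (live_bufs != before + 1)
      return false;
  }
  return live_bufs == before;
}
```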



Re: [committed] testsuite: Fix up syntax errors in scan-tree-dump-times target selectors

2023-03-06 Thread Robin Dapp via Gcc-patches
Hi,

> This broke the tests, I'm seeing syntax errors:
> ERROR: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects: error executing dg-final: syntax error in target selector "target !  vect_partial_vectors || vect32  || s390_vx"
> ERROR: gcc.dg/vect/slp-3.c: error executing dg-final: syntax error in target selector "target !  vect_partial_vectors || vect32  || s390_vx"
> ERROR: gcc.dg/vect/slp-multitypes-11.c -flto -ffat-lto-objects: error executing dg-final: syntax error in target selector "target vect_unpack && vect_partial_vectors_usage_1 &&  ! s390_vx"
> ERROR: gcc.dg/vect/slp-multitypes-11.c: error executing dg-final: syntax error in target selector "target vect_unpack && vect_partial_vectors_usage_1 &&  ! s390_vx"

it appears that we are still missing some braces:

diff --git a/gcc/testsuite/gcc.dg/vect/slp-3.c b/gcc/testsuite/gcc.dg/vect/slp-3.c
index a0c6a72995bb..760b3fa35a2a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-3.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-3.c
@@ -144,4 +144,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { ! { vect_partial_vectors || vect32 } } || s390_vx } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect" { target { { vect_partial_vectors || vect32 } && { ! s390_vx } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { { ! { vect_partial_vectors || vect32 } } || s390_vx } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { vect_partial_vectors || vect32 } && { ! s390_vx } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { { vect_partial_vectors || vect32 } && { ! s390_vx } } } } } */

Would you mind double-checking and committing if it's OK?

I keep making mistakes with the dejagnu syntax.  I suppose there is no better
way to test the selector (and regex) syntax than just running an individual
test case?

Thanks
 Robin


[PATCH] tree-optimization/109025 - fixup double reduction detection

2023-03-06 Thread Richard Biener via Gcc-patches
The following closes a gap in double reduction detection where we
in the outer loop analysis fail to verify the inner LC PHI use is
the latch definition of the inner loop PHI.  That latch definition
is used to detect that an inner loop is part of a double reduction
when later doing the inner loop analysis.
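
For readers less familiar with the term, the loop shape usually meant by a
double reduction looks roughly like the sketch below.  This is an
illustration and not code from the patch or the testcase; sum_2d and
sum_2d_demo are hypothetical names.  The accumulator is carried around both
loops, and the value that leaves the inner loop is exactly the one defined on
the inner loop's latch, which is the property the fix now verifies.

```cpp
#include <cassert>

// `sum` enters the inner loop from the outer loop's header, is updated on
// every inner iteration, and the final inner value feeds straight back into
// the next outer iteration.
static int sum_2d (const int (*a)[4], int rows)
{
  int sum = 0;
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < 4; ++j)
      sum += a[i][j];
  return sum;
}

// Small fixed input so the helper can be exercised directly.
static int sum_2d_demo ()
{
  const int a[2][4] = { { 1, 2, 3, 4 }, { 5, 6, 7, 8 } };
  return sum_2d (a, 2);
}
```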

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109025
* tree-vect-loop.cc (vect_is_simple_reduction): Verify
the inner LC PHI use is the inner loop PHI latch definition
before classifying an outer PHI as double reduction.

* gcc.dg/vect/pr109025.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr109025.c | 14 ++
 gcc/tree-vect-loop.cc|  6 +-
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr109025.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr109025.c b/gcc/testsuite/gcc.dg/vect/pr109025.c
new file mode 100644
index 000..13fb0ce4ba9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr109025.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int func_4(int t, int b)
+{
+  for (int tt1 = 0; tt1 < 128 ; tt1 ++)
+{
+  for (int tt = 0; tt < 128; tt ++)
+   if (b)
+ t |= 3;
+  t |= 3;
+}
+  return t;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b17e8745d3f..320c15f144b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3894,6 +3894,8 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
   return NULL;
 }
 
+  /* Verify there is an inner cycle composed of the PHI phi_use_stmt
+and the latch definition op1.  */
   gimple *def1 = SSA_NAME_DEF_STMT (op1);
   if (gimple_bb (def1)
  && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))
@@ -3901,7 +3903,9 @@ vect_is_simple_reduction (loop_vec_info loop_info, stmt_vec_info phi_info,
  && flow_bb_inside_loop_p (loop->inner, gimple_bb (def1))
  && (is_gimple_assign (def1) || is_gimple_call (def1))
  && is_a <gphi *> (phi_use_stmt)
- && flow_bb_inside_loop_p (loop->inner, gimple_bb (phi_use_stmt)))
+ && flow_bb_inside_loop_p (loop->inner, gimple_bb (phi_use_stmt))
+ && (op1 == PHI_ARG_DEF_FROM_EDGE (phi_use_stmt,
+   loop_latch_edge (loop->inner))))
 {
   if (dump_enabled_p ())
 report_vect_op (MSG_NOTE, def_stmt,
-- 
2.35.3


RE: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-03-06 Thread Li, Pan2 via Gcc-patches
Thank you, Kito.

Hi Richard Sandiford,

Could you please continue reviewing this patch? Thank you and have a
nice day!

Pan

-Original Message-
From: Kito Cheng  
Sent: Friday, March 3, 2023 4:06 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; rguent...@suse.de; 
richard.sandif...@arm.com
Subject: Re: [PATCH v3] RISC-V: Bugfix for rvv bool mode precision adjustment

Thanks!

RISC-V part is LGTM, and I would like to wait for Richard Sandiford to say OK 
to the genmodes.cc part :)


On Fri, Mar 3, 2023 at 10:31 AM  wrote:
>
> From: Pan Li 
>
> Fix the bug of the rvv bool mode precision with the adjustment.
> The bit size of vbool*_t will be adjusted to
> [1, 2, 4, 8, 16, 32, 64] according to the RVV 1.0 ISA spec. The
> adjusted mode precision of vbool*_t will help the underlying passes
> make the right decision for both correctness and optimization.
>
> Given below sample code:
> void test_1(int8_t * restrict in, int8_t * restrict out)
> {
>   vbool8_t v2 = *(vbool8_t*)in;
>   vbool16_t v5 = *(vbool16_t*)in;
>   *(vbool16_t*)(out + 200) = v5;
>   *(vbool8_t*)(out + 100) = v2;
> }
>
> Before the precision adjustment:
> addi    a4,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> addi    a1,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a4)
> // Need one vsetvli and vlm.v for correctness here.
> vsm.v   v24,0(a1)
>
> After the precision adjustment:
> csrr    t0,vlenb
> slli    t1,t0,1
> csrr    a3,vlenb
> sub     sp,sp,t1
> slli    a4,a3,1
> add     a4,a4,sp
> sub     a3,a4,a3
> vsetvli a5,zero,e8,m1,ta,ma
> addi    a2,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a3)
> addi    a1,a1,100
> vsetvli a4,zero,e8,mf2,ta,ma
> csrr    t0,vlenb
> vlm.v   v25,0(a3)
> vsm.v   v25,0(a2)
> slli    t1,t0,1
> vsetvli a5,zero,e8,m1,ta,ma
> vsm.v   v24,0(a1)
> add     sp,sp,t1
> jr      ra
>
> However, there may be some optimization opportunities after
> the mode precision adjustment. They can be taken care of in
> the RISC-V backend in separate follow-up PR(s).
>
> PR 108185
> PR 108654
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (ADJUST_PRECISION):
> * config/riscv/riscv.cc (riscv_v_adjust_precision):
> * config/riscv/riscv.h (riscv_v_adjust_precision):
> * genmodes.cc (ADJUST_PRECISION):
> (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr108185-1.c: New test.
> * gcc.target/riscv/pr108185-2.c: New test.
> * gcc.target/riscv/pr108185-3.c: New test.
> * gcc.target/riscv/pr108185-4.c: New test.
> * gcc.target/riscv/pr108185-5.c: New test.
> * gcc.target/riscv/pr108185-6.c: New test.
> * gcc.target/riscv/pr108185-7.c: New test.
> * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-modes.def            |  8 +++
>  gcc/config/riscv/riscv.cc                   | 12
>  gcc/config/riscv/riscv.h                    |  1 +
>  gcc/genmodes.cc                             | 26 ++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 598 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);  ADJUST_BYTESIZE (VNx32BI, 
> riscv_vector_chunks * riscv_bytes_per_vector_chunk);  ADJUST_BYTESIZE 
> (VNx64BI, r

[PATCH] [RFC] RAII auto_mpfr and auto_mpz

2023-03-06 Thread Richard Biener via Gcc-patches
The following adds two RAII classes, one for mpz_t and one for mpfr_t
making object lifetime management easier.  Both formerly required
explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.
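
The pattern can be sketched without GMP or MPFR at hand.  In the sketch below every toy_* name and auto_toy are hypothetical stand-ins for an init/clear style C API and its wrapper; only the shape (constructor initializes, destructor clears, conversion operator hands out the raw handle) follows the classes in the patch.

```cpp
#include <cassert>

// Stand-in for a C-style handle with explicit init/clear, in the spirit of
// mpz_t.  Everything named toy_* is hypothetical and is not GMP.
struct toy_num { long value; bool live; };
typedef toy_num toy_t[1];          // array-of-one, like mpz_t

static int toy_live_count = 0;     // handles that still need clearing

inline void toy_init (toy_t x) { x->value = 0; x->live = true; ++toy_live_count; }
inline void toy_clear (toy_t x) { x->live = false; --toy_live_count; }
inline void toy_set (toy_t x, long v) { x->value = v; }

// RAII wrapper: the constructor initializes, the destructor clears.
class auto_toy
{
public:
  auto_toy () { toy_init (m_toy); }
  ~auto_toy () { toy_clear (m_toy); }

  // Lets the wrapper be passed wherever the C API expects a toy_t.
  operator toy_t& () { return m_toy; }

  auto_toy (const auto_toy &) = delete;
  auto_toy &operator= (const auto_toy &) = delete;

private:
  toy_t m_toy;
};

// Early returns no longer need a matching toy_clear call.
inline long
double_or_zero (long v)
{
  auto_toy n;
  toy_set (n, v);
  if (v < 0)
    return 0;
  toy_t &raw = n;
  return raw[0].value * 2;
}
```

Deleting the copy operations matters here because a copied handle would otherwise be cleared twice.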

I've converted two example places (where lifetime is trivial).

So far I've only built cc1 with the change.  Any comments?

Thanks,
Richard.

* system.h (class auto_mpz): New.
* realmpfr.h (class auto_mpfr): Likewise.
* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
(do_mpfr_arg2): Likewise.
* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz.
---
 gcc/fold-const-call.cc |  8 ++--
 gcc/realmpfr.h | 15 +++
 gcc/system.h   | 14 ++
 gcc/tree-ssa-loop-niter.cc | 10 +-
 4 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/gcc/fold-const-call.cc b/gcc/fold-const-call.cc
index 43819c1f984..fa0b287cc8a 100644
--- a/gcc/fold-const-call.cc
+++ b/gcc/fold-const-call.cc
@@ -130,14 +130,12 @@ do_mpfr_arg1 (real_value *result,
 
   int prec = format->p;
   mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
-  mpfr_t m;
 
-  mpfr_init2 (m, prec);
+  auto_mpfr m (prec);
   mpfr_from_real (m, arg, MPFR_RNDN);
   mpfr_clear_flags ();
   bool inexact = func (m, m, rnd);
   bool ok = do_mpfr_ckconv (result, m, inexact, format);
-  mpfr_clear (m);
 
   return ok;
 }
@@ -224,14 +222,12 @@ do_mpfr_arg2 (real_value *result,
 
   int prec = format->p;
   mpfr_rnd_t rnd = format->round_towards_zero ? MPFR_RNDZ : MPFR_RNDN;
-  mpfr_t m;
 
-  mpfr_init2 (m, prec);
+  auto_mpfr m (prec);
   mpfr_from_real (m, arg1, MPFR_RNDN);
   mpfr_clear_flags ();
   bool inexact = func (m, arg0.to_shwi (), m, rnd);
   bool ok = do_mpfr_ckconv (result, m, inexact, format);
-  mpfr_clear (m);
 
   return ok;
 }
diff --git a/gcc/realmpfr.h b/gcc/realmpfr.h
index 5e032c05f25..2db2ecc94d4 100644
--- a/gcc/realmpfr.h
+++ b/gcc/realmpfr.h
@@ -24,6 +24,21 @@
 #include 
 #include 
 
+class auto_mpfr
+{
+public:
+  auto_mpfr () { mpfr_init (m_mpfr); }
+  explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2 (m_mpfr, prec); }
+  ~auto_mpfr () { mpfr_clear (m_mpfr); }
+
+  operator mpfr_t& () { return m_mpfr; }
+
+  auto_mpfr (const auto_mpfr &) = delete;
+
+private:
+  mpfr_t m_mpfr;
+};
+
 /* Convert between MPFR and REAL_VALUE_TYPE.  The caller is
responsible for initializing and clearing the MPFR parameter.  */
 
diff --git a/gcc/system.h b/gcc/system.h
index 64cd5a49258..99f6c410481 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -701,6 +701,20 @@ extern int vsnprintf (char *, size_t, const char *, 
va_list);
 /* Do not introduce a gmp.h dependency on the build system.  */
 #ifndef GENERATOR_FILE
 #include <gmp.h>
+
+class auto_mpz
+{
+public:
+  auto_mpz () { mpz_init (m_mpz); }
+  ~auto_mpz () { mpz_clear (m_mpz); }
+
+  operator mpz_t& () { return m_mpz; }
+
+  auto_mpz (const auto_mpz &) = delete;
+
+private:
+  mpz_t m_mpz;
+};
 #endif
 
 /* Get libiberty declarations.  */
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index dc4c7a418f6..dcfba2fc7ae 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -722,7 +722,6 @@ bound_difference (class loop *loop, tree x, tree y, bounds 
*bnds)
   tree type = TREE_TYPE (x);
   tree varx, vary;
   mpz_t offx, offy;
-  mpz_t minx, maxx, miny, maxy;
   int cnt = 0;
   edge e;
   basic_block bb;
@@ -754,19 +753,12 @@ bound_difference (class loop *loop, tree x, tree y, 
bounds *bnds)
 {
   /* Otherwise, use the value ranges to determine the initial
 estimates on below and up.  */
-  mpz_init (minx);
-  mpz_init (maxx);
-  mpz_init (miny);
-  mpz_init (maxy);
+  auto_mpz minx, maxx, miny, maxy;
   determine_value_range (loop, type, varx, offx, minx, maxx);
   determine_value_range (loop, type, vary, offy, miny, maxy);
 
   mpz_sub (bnds->below, minx, maxy);
   mpz_sub (bnds->up, maxx, miny);
-  mpz_clear (minx);
-  mpz_clear (maxx);
-  mpz_clear (miny);
-  mpz_clear (maxy);
 }
 
   /* If both X and Y are constants, we cannot get any more precise.  */
-- 
2.35.3


Enable scatter for generic

2023-03-06 Thread Jan Hubicka via Gcc-patches
Hi,
while adding tunes to disable scatters on znver4 I mistakenly also disabled
them on generic.  This patch fixes it.
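
For readers less used to the selector masks in x86-tune.def, here is a small C++ sketch of the effect.  The bit values and the helper tune_enabled are hypothetical (the real m_* masks are defined elsewhere in the i386 backend); the sketch only assumes that a tuning applies to a processor when that processor's bit is set in the selector.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical processor bits; the real m_* masks in GCC have other values.
constexpr std::uint64_t m_ZNVER4  = std::uint64_t{1} << 0;
constexpr std::uint64_t m_GENERIC = std::uint64_t{1} << 1;
constexpr std::uint64_t m_OTHER   = std::uint64_t{1} << 2;

// A tuning is on for a processor when that processor's bit is in the mask.
constexpr bool
tune_enabled (std::uint64_t tune_mask, std::uint64_t cpu)
{
  return (tune_mask & cpu) != 0;
}

// Selector before and after the patch.
constexpr std::uint64_t before = ~(m_ZNVER4 | m_GENERIC);
constexpr std::uint64_t after  = ~(m_ZNVER4);
```

With these definitions the generic bit is cleared in `before` and set in `after`, while the znver4 bit stays cleared in both.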

Bootstrapped/regtested x86_64, committed.

Honza

gcc/ChangeLog:

2023-03-06  Jan Hubicka  

* config/i386/x86-tune.def (X86_TUNE_USE_SCATTER_2PARTS): Enable for
generic.
(X86_TUNE_USE_SCATTER_4PARTS): Likewise.
(X86_TUNE_USE_SCATTER): Likewise.

diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index 3054656a12c..9d603cc84e4 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -486,7 +486,7 @@ DEF_TUNE (X86_TUNE_USE_GATHER_2PARTS, "use_gather_2parts",
 /* X86_TUNE_USE_SCATTER_2PARTS: Use scater instructions for vectors with 2
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER_2PARTS, "use_scatter_2parts",
- ~(m_ZNVER4 | m_GENERIC))
+ ~(m_ZNVER4))
 
 /* X86_TUNE_USE_GATHER_4PARTS: Use gather instructions for vectors with 4
elements.  */
@@ -496,7 +496,7 @@ DEF_TUNE (X86_TUNE_USE_GATHER_4PARTS, "use_gather_4parts",
 /* X86_TUNE_USE_SCATTER_4PARTS: Use scater instructions for vectors with 4
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER_4PARTS, "use_scatter_4parts",
- ~(m_ZNVER4 | m_GENERIC))
+ ~(m_ZNVER4))
 
 /* X86_TUNE_USE_GATHER: Use gather instructions for vectors with 8 or more
elements.  */
@@ -506,7 +506,7 @@ DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
 /* X86_TUNE_USE_SCATTER: Use scater instructions for vectors with 8 or more
elements.  */
 DEF_TUNE (X86_TUNE_USE_SCATTER, "use_scatter",
- ~(m_ZNVER4 | m_GENERIC))
+ ~(m_ZNVER4))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
smaller FMA chain.  */


Re: Ping: [PATCH 1/2] testsuite: Provide means to regexp in multiline patterns

2023-03-06 Thread Mike Stump via Gcc-patches
Ok

On Mar 3, 2023, at 5:58 PM, Hans-Peter Nilsson  wrote:
> 
> Ping...
> 
>> From: Hans-Peter Nilsson 
>> Date: Fri, 24 Feb 2023 20:16:03 +0100
>> 
>> Ok to commit?



Re: [PATCHv2, gfortran] Escalate failure when Hollerith constant to real conversion fails [PR103628]

2023-03-06 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/3/3 20:54, Tobias Burnus wrote:
> Hi Haochen,
> 
> On 03.03.23 10:56, HAO CHEN GUI via Gcc-patches wrote:
>> Sure, I will merge it into the patch and do the regression test.
> Thanks :-)
>> Additionally, Kewen suggested:
>>>> Since this test case is powerpc only, I think it can be moved to
>>>> gcc.target/powerpc/ppc-fortran.
>>> Which sounds reasonable.
>> Test cases under gcc.target are tested by check-gcc-c. It greps "warning"
>> and "error" (C style, lower case) from the output while check-gcc-fortran
>> greps "Warning" and "Error" (upper case). As the test case needs to check
>> the "Warning" and "Error" messages, I have to put it in the gfortran.dg
>> directory. What's your opinion?
> 
> Thanks for digging and giving a reason.

+1!

I just posted a patch [1] that makes ppc-fortran.exp support what your
patch needs here.  I verified that it works with this revision; could you
double-check with your updated revision?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613442.html

> 
> Looks as if at some point, adapting
> gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp to handle
> this as well could make sense.
> 
> But placing it - as you did - under gcc/testsuite/gfortran.dg is fine
> and surely the simpler solution. Thus, leave it as is.

Yeah, either way works for me.

Thanks again!

BR,
Kewen


[PATCH] testsuite, rs6000: Adjust ppc-fortran.exp to support dg-{warning,error}

2023-03-06 Thread Kewen.Lin via Gcc-patches
Hi,

According to Haochen's finding in [1], ppc-fortran.exp currently
doesn't support Fortran specific warning or error messages well.
Looking into it, this is because gfortran uses different
warning/error prefixes, as follows:

set gcc_warning_prefix "\[Ww\]arning:"
set gcc_error_prefix "(Fatal )?\[Ee\]rror:"

compared to:

set gcc_warning_prefix "warning:"
set gcc_error_prefix "(fatal )?error:"

So this patch overrides these two prefixes so that ppc-fortran.exp
supports dg-{warning,error} checks.
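
To see why the default prefixes miss gfortran output, the four patterns can be tried as ECMAScript regexes in C++.  This is only a sketch: has_prefix and the variable names are hypothetical, and the backslashes before the brackets are dropped because in the Tcl strings they only prevent command substitution.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if LINE contains a match for the prefix regex PREFIX_RE.
inline bool
has_prefix (const std::string &line, const std::string &prefix_re)
{
  return std::regex_search (line, std::regex (prefix_re));
}

// Default prefixes, as quoted above.
const std::string c_warning = "warning:";
const std::string c_error = "(fatal )?error:";

// Fortran prefixes, as quoted above (Tcl escapes removed).
const std::string fortran_warning = "[Ww]arning:";
const std::string fortran_error = "(Fatal )?[Ee]rror:";
```

A capitalized "Warning:" or "Error:" line matches only the Fortran patterns, while lower-case diagnostics match both sets.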

Tested on powerpc64-linux-gnu P7/P8/P9 and
powerpc64le-linux-gnu P9/P10.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613302.html

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp: Override
gcc_{warning,error}_prefix with Fortran specific one used in
gfortran_init.
---
 gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp 
b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
index a556d7b48a3..f7e99ac8487 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
@@ -58,6 +58,11 @@ proc dg-compile-aux-modules { args } {
 }
 }

+# Override gcc_{warning,error}_prefix with Fortran specific prefixes used
+# in gfortran_init to support dg-{warning,error} checks.
+set gcc_warning_prefix "\[Ww\]arning:"
+set gcc_error_prefix "(Fatal )?\[Ee\]rror:"
+
 # Main loop.
 gfortran-dg-runtest [lsort \
[glob -nocomplain $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ] ] "" 
$DEFAULT_FFLAGS
--
2.39.1


[PATCH] rs6000, libgcc: Fix bump size for powerpc64 elfv1 ABI [PR108727]

2023-03-06 Thread Kewen.Lin via Gcc-patches
Hi,

As PR108727 shows, when cleanup code called by the stack
unwinder calls function _Unwind_Resume, it goes through a PLT
stub like:

   function .plt_call._Unwind_Resume:

=> 0x10003580 <+0>:  std     r2,40(r1)
   0x10003584 <+4>:  ld      r12,-31760(r2)
   0x10003588 <+8>:  mtctr   r12
   0x1000358c <+12>: ld      r2,-31752(r2)
   0x10003590 <+16>: cmpldi  r2,0
   0x10003594 <+20>: bnectr+
   0x10003598 <+24>: b       0x100031a4 <_Unwind_Resume@plt>

It wants to save TOC base (r2) to r1 + 40, but we only
bump the stack segment by 32 bytes as follows:

   stdu %r29,-32(%r3)

This means the access falls outside the stack segment allocated
by __generic_morestack.  If the touched area isn't writable,
as in this failure, it causes a segmentation fault.
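
The arithmetic behind this can be written down as a small C++ sketch.  The offsets come from the instructions quoted above; the names are hypothetical, and the sketch assumes that the old value of r3 marks the upper end of the usable area, so that the new frame spans exactly the bump size above r1.

```cpp
#include <cassert>

// Offsets taken from the message above; 8 bytes is what a doubleword
// "std" writes.
constexpr int toc_save_offset = 40;  // std r2,40(r1)
constexpr int store_width = 8;
constexpr int old_bump = 32;         // stdu %r29,-32(%r3)

// Smallest bump that keeps the TOC save inside the new stack segment.
constexpr int min_bump = toc_save_offset + store_width;

// True if a store at r1 + 40 stays below the upper end r1 + BUMP.
constexpr bool
store_fits (int bump)
{
  return toc_save_offset + store_width <= bump;
}
```

Under that assumption a 32-byte bump leaves the store at r1 + 40 entirely outside the segment, and anything below 48 bytes is too small.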

So fix the bump size by using a reasonable value, PARAMS.

Bootstrapped and regtested on powerpc64-linux-gnu P{8,9} and
powerpc64le-linux-gnu P{8,9,10}.

Alan ack'ed this in that PR; I'm going to push this soon.

BR,
Kewen
-
PR libgcc/108727

libgcc/ChangeLog:

* config/rs6000/morestack.S (__morestack): Use PARAMS for new stack
bump size.
---
 libgcc/config/rs6000/morestack.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index 5e7ad133303..f2fea6abb10 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -205,12 +205,12 @@ ENTRY0(__morestack)
bl JUMP_TARGET(__generic_morestack)

 # Start using new stack
-   stdu %r29,-32(%r3)  # back-chain
+   stdu %r29,-PARAMS(%r3)  # back-chain
mr %r1,%r3

 # Set __private_ss stack guard for the new stack.
ld %r12,NEWSTACKSIZE_SAVE(%r29) # modified size
-   addi %r3,%r3,BACKOFF-32
+   addi %r3,%r3,BACKOFF-PARAMS
sub %r3,%r3,%r12
 # Note that a signal frame has $pc pointing at the instruction
 # where the signal occurred.  For something like a timer
--
2.39.2


Re: [PATCH 0/2] LoongArch: testsuite: Fix tests related to stack

2023-03-06 Thread Lulu Cheng



On 2023/3/6 at 4:18 PM, Xi Ruoyao wrote:
> On Mon, 2023-03-06 at 16:12 +0800, Lulu Cheng wrote:
> > Has the first patch been merged into the main branch yet?
> >
> > I think there is one more test case that needs to be modified:
> >
> > --- a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
> > +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
> > @@ -1,7 +1,7 @@
> >   /* Test that LoongArch backend stack drop operation optimized. */
> >
> >   /* { dg-do compile } */
> > -/* { dg-options "-O2 -mabi=lp64d" } */
> > +/* { dg-options "-O2 -mabi=lp64d -fno-stack-protector" } */
> >   /* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */
>
> The first patch contains this hunk.  It's r13-6501 now.

Ok!

Thanks! :-)


Re: [PATCH 0/2] LoongArch: testsuite: Fix tests related to stack

2023-03-06 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-03-06 at 16:12 +0800, Lulu Cheng wrote:
> Has the first patch been merged into the main branch yet?
> 
> I think there is one more test case that needs to be modified:
> 
> --- a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
> +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
> @@ -1,7 +1,7 @@
>   /* Test that LoongArch backend stack drop operation optimized. */
> 
>   /* { dg-do compile } */
> -/* { dg-options "-O2 -mabi=lp64d" } */
> +/* { dg-options "-O2 -mabi=lp64d -fno-stack-protector" } */
>   /* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */

The first patch contains this hunk.  It's r13-6501 now.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 0/2] LoongArch: testsuite: Fix tests related to stack

2023-03-06 Thread Lulu Cheng



On 2023/3/6 at 4:02 PM, Xi Ruoyao wrote:
> On Mon, 2023-03-06 at 13:59 +0800, Xi Ruoyao via Gcc-patches wrote:
> > On Mon, 2023-03-06 at 11:16 +0800, Xi Ruoyao wrote:
> >
> > /* snip */
> >
> > > > > > Sorry for the late reply, the first patch I think is fine. But I
> > > > > > haven't reproduced the problem of the second mail.
> > > > > >
> > > > > > Is there any special option in the configuration?
> > > > >
> > > > > Oh some strange thing might be happening... I'll try to figure out what
> > > > > has caused the behavior difference.
> > > >
> > > > Oh no, the difference is caused by --enable-default-pie.
> > > >
> > > > Maybe I should just add -fno-PIE for the dg-options.  But now I'm still
> > > > puzzled: why would -fPIE affect code generation on LoongArch?  AFAIK all
> > > > the code we are generating is position independent (at least for now).
> >
> > Without -fPIE, the compiler stores a register with no reason:
>
> Pushed the first patch as r13-6501.  The second one is dropped and I've
> created PR109035 for the "unnecessary store" issue (after some failed
> attempts to triage it).


Has the first patch been merged into the main branch yet?

I think there is one more test case that needs to be modified:

--- a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
+++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
@@ -1,7 +1,7 @@
 /* Test that LoongArch backend stack drop operation optimized. */

 /* { dg-do compile } */
-/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-options "-O2 -mabi=lp64d -fno-stack-protector" } */
 /* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */



> > $ cat t.c
> > int test(int x)
> > {
> >     char buf[128 << 10];
> >     return buf[x];
> > }
> > $ ./gcc/cc1 t.c -nostdinc  -O2 -fdump-rtl-all -o- 2>/dev/null | grep test: -A20
> > test:
> > .LFB0 = .
> >     lu12i.w $r13,-135168>>12    # 0xfffdf000
> >     ori     $r13,$r13,4080
> >     add.d   $r3,$r3,$r13
> > .LCFI0 = .
> >     lu12i.w $r12,-131072>>12    # 0xfffe
> >     lu12i.w $r13,131072>>12     # 0x2
> >     add.d   $r13,$r13,$r12
> >     addi.d  $r12,$r3,16
> >     add.d   $r12,$r13,$r12
> >     lu12i.w $r13,131072>>12     # 0x2
> >     st.d    $r12,$r3,8
> >     ori     $r13,$r13,16
> >     ldx.b   $r4,$r12,$r4
> >     add.d   $r3,$r3,$r13
> > .LCFI1 = .
> >     jr      $r1
> > .LFE0:
> >     .size   test, .-test
> >     .section        .eh_frame,"aw",@progbits
> >
> > Note the "st.d  $r12,$r3,8" instruction is completely meaningless.
> >
> > The t.c.300r.ira dump contains some "interesting" thing:
> >
> > Pass 0 for finding pseudo/allocno costs
> >
> >     a0 (r87,l0) best GR_REGS, allocno GR_REGS
> >     a1 (r84,l0) best NO_REGS, allocno NO_REGS
> >     a2 (r83,l0) best GR_REGS, allocno GR_REGS
> >
> >   a0(r87,l0) costs: SIBCALL_REGS:2000,2000 JIRL_REGS:2000,2000
> > CSR_REGS:2000,2000 GR_REGS:2000,2000 FP_REGS:8000,8000 ALL_REGS:32000,32000
> > MEM:8000,8000
> >   a1(r84,l0) costs: SIBCALL_REGS:100,100 JIRL_REGS:100,100
> > CSR_REGS:100,100 GR_REGS:100,100 FP_REGS:1004000,1004000
> > ALL_REGS:1016000,1016000 MEM:1004000,1004000
> >   a2(r83,l0) costs: SIBCALL_REGS:100,100 JIRL_REGS:100,100
> > CSR_REGS:100,100 GR_REGS:100,100 FP_REGS:1004000,1004000
> > ALL_REGS:1008000,1008000 MEM:1004000,1004000
> >
> >
> > Here r84 is the pseudo register for ($frame - 131072).  Any idea why the
> > compiler selects "NO_REGS" here?
> >
> > FWIW RISC-V port suffers the same issue:
> > https://godbolt.org/z/aPorqj73b.


Re: [PATCH 1/2] gcov: Fix "do-while" structure in case statement leads to incorrect code coverage [PR93680]

2023-03-06 Thread Richard Biener via Gcc-patches
On Mon, Mar 6, 2023 at 8:22 AM Xionghu Luo  wrote:
>
>
>
> On 2023/3/2 18:45, Richard Biener wrote:
> >>
> >>
> >>small.gcno:  648:  block 2:`small.c':1, 3, 4, 6
> >>small.gcno:  688:0145:  36:LINES
> >>small.gcno:  700:  block 3:`small.c':8, 9
> >>small.gcno:  732:0145:  32:LINES
> >>small.gcno:  744:  block 5:`small.c':10
> >> -small.gcno:  772:0145:  32:LINES
> >> -small.gcno:  784:  block 6:`small.c':12
> >> -small.gcno:  812:0145:  36:LINES
> >> -small.gcno:  824:  block 7:`small.c':12, 13
> >> +small.gcno:  772:0145:  36:LINES
> >> +small.gcno:  784:  block 6:`small.c':12, 13
> >> +small.gcno:  816:0145:  32:LINES
> >> +small.gcno:  828:  block 8:`small.c':14
> >>small.gcno:  856:0145:  32:LINES
> >> -small.gcno:  868:  block 8:`small.c':14
> >> -small.gcno:  896:0145:  32:LINES
> >> -small.gcno:  908:  block 9:`small.c':17
> >> +small.gcno:  868:  block 9:`small.c':17
> >
> > Looking at the CFG and the instrumentation shows
> >
> > :
> >PROF_edge_counter_17 = __gcov0.f[0];
> >PROF_edge_counter_18 = PROF_edge_counter_17 + 1;
> >__gcov0.f[0] = PROF_edge_counter_18;
> >[t.c:3:7] p_6 = 0;
> >[t.c:5:3] switch (s_7(D))  [INV], [t.c:7:5] case 0:
> >  [INV], [t.c:11:5] case 1:  [INV]>
> >
> > :
> ># n_1 = PHI 
> ># p_3 = PHI <[t.c:3:7] p_6(2), [t.c:8:15] p_12(4)>
> > [t.c:7:5] :
> >[t.c:8:15] p_12 = p_3 + 1;
> >[t.c:8:28] n_13 = n_1 + -1;
> >[t.c:8:28] if (n_13 != 0)
> >  goto ; [INV]
> >else
> >  goto ; [INV]
> >
> > :
> >PROF_edge_counter_21 = __gcov0.f[2];
> >PROF_edge_counter_22 = PROF_edge_counter_21 + 1;
> >__gcov0.f[2] = PROF_edge_counter_22;
> >[t.c:7:5] goto ; [100.00%]
> >
> > :
> >PROF_edge_counter_23 = __gcov0.f[3];
> >PROF_edge_counter_24 = PROF_edge_counter_23 + 1;
> >__gcov0.f[3] = PROF_edge_counter_24;
> >[t.c:9:16] _14 = p_12;
> >[t.c:9:16] goto ; [INV]
> >
> > so the reason this goes wrong is that gcov associates the "wrong"
> > counter with the block containing
> > the 'case' label(s), for the case 0 it should have chosen the counter
> > from bb5 but it likely
> > computed the count of bb3?
> >
> > It might be that ordering blocks differently puts the instrumentation
> > to different blocks or it
> > makes gcovs association chose different blocks but that means it's
> > just luck and not fixing
> > the actual issue?
> >
> > To me it looks like the correct thing to investigate is switch
> > statement and/or case label
> > handling.  One can also see that  having line number 7 is wrong to
> > the extent that
> > the position of the label doesn't match the number of times it
> > executes in the source.  So
> > placement of the label is wrong here, possibly caused by CFG cleanup
> > after CFG build
> > (but generally labels are not used for anything once the CFG is built
> > and coverage
> > instrumentation is late so it might fail due to us moving labels).  It
> > might be OK to
> > avoid moving labels for --coverage but then coverage should possibly
> > look at edges
> > rather than labels?
> >
>
> Thanks, I investigated the Labels, it seems wrong at the beginning from
> .gimple to .cfg very early quite like PR90574:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90574
>
> .gimple:
>
> int f (int s, int n)
> [small.c:2:1] {
>int D.2755;
>int p;
>
>[small.c:3:7] p = 0;
>[small.c:5:3] switch (s) , [small.c:7:5] case 0: 
> , [small.c:11:5] case 1: >
>[small.c:7:5] :  <= case label
>:<= loop label
>[small.c:8:13] p = p + 1;
>[small.c:8:26] n = n + -1;
>[small.c:8:26] if (n != 0) goto ; else goto ;
>:
>[small.c:9:14] D.2755 = p;
>[small.c:9:14] return D.2755;
>[small.c:11:5] :
>:
>[small.c:12:13] p = p + 1;
>[small.c:12:26] n = n + -1;
>[small.c:12:26] if (n != 0) goto ; else goto ;
>:
>[small.c:13:14] D.2755 = p;
>[small.c:13:14] return D.2755;
>:
>[small.c:16:10] D.2755 = 0;
>[small.c:16:10] return D.2755;
> }
>
> .cfg:
>
> int f (int s, int n)
> {
>int p;
>int D.2755;
>
> :
>[small.c:3:7] p = 0;
>[small.c:5:3] switch (s)  [INV], [small.c:7:5] case 0:  
> [INV], [small.c:11:5] case 1:  [INV]>
>
> :
> [small.c:7:5] :   <= case 0
>[small.c:8:13 discrim 1] p = p + 1;
>[small.c:8:26 discrim 1] n = n + -1;
>[small.c:8:26 discrim 1] if (n != 0)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>[small.c:9:14] D.2755 = p;
>[small.c:9:14] goto ; [INV]
>
> :
> [small.c:11:5] :  <= case 1
>[small.c:12:13 discrim 1] p = p + 1;
>[small.c:12:26 discrim 1] n = n + -1;
>[small.c:12:26 discrim 1] if (n != 0)
>  goto ; [INV]
>else
>  goto ; [INV]

Re: [PATCH 0/2] LoongArch: testsuite: Fix tests related to stack

2023-03-06 Thread Xi Ruoyao via Gcc-patches
On Mon, 2023-03-06 at 13:59 +0800, Xi Ruoyao via Gcc-patches wrote:
> On Mon, 2023-03-06 at 11:16 +0800, Xi Ruoyao wrote:
> 
> /* snip */
> 
> > > > Sorry for the late reply, the first patch I think is fine. But I 
> > > > haven't 
> > > > reproduced the problem of the second mail.
> > > > 
> > > > Is there any special option in the configuration?
> > > 
> > > Oh some strange thing might be happening... I'll try to figure out what
> > > has caused the behavior difference.
> > 
> > Oh no, the difference is caused by --enable-default-pie.
> > 
> > Maybe I should just add -fno-PIE for the dg-options.  But now I'm still
> > puzzled: why would -fPIE affect code generation on LoongArch?  AFAIK all
> > the code we are generating is position independent (at least for now).
> 
> Without -fPIE, the compiler stores a register with no reason:

Pushed the first patch as r13-6501.  The second one is dropped and I've
created PR109035 for the "unnecessary store" issue (after some failed
attempts to triage it).

> $ cat t.c
> int test(int x)
> {
> char buf[128 << 10];
> return buf[x];
> }
> $ ./gcc/cc1 t.c -nostdinc  -O2 -fdump-rtl-all -o- 2>/dev/null | grep test: 
> -A20
> test:
> .LFB0 = .
>     lu12i.w $r13,-135168>>12    # 0xfffdf000
>     ori     $r13,$r13,4080
>     add.d   $r3,$r3,$r13
> .LCFI0 = .
>     lu12i.w $r12,-131072>>12    # 0xfffe
>     lu12i.w $r13,131072>>12     # 0x2
>     add.d   $r13,$r13,$r12
>     addi.d  $r12,$r3,16
>     add.d   $r12,$r13,$r12
>     lu12i.w $r13,131072>>12     # 0x2
>     st.d    $r12,$r3,8
>     ori     $r13,$r13,16
>     ldx.b   $r4,$r12,$r4
>     add.d   $r3,$r3,$r13
> .LCFI1 = .
>     jr      $r1
> .LFE0:
>     .size   test, .-test
>     .section        .eh_frame,"aw",@progbits
> 
> Note the "st.d  $r12,$r3,8" instruction is completely meaningless.
> 
> The t.c.300r.ira dump contains some "interesting" thing:
> 
> Pass 0 for finding pseudo/allocno costs
> 
>     a0 (r87,l0) best GR_REGS, allocno GR_REGS
>     a1 (r84,l0) best NO_REGS, allocno NO_REGS
>     a2 (r83,l0) best GR_REGS, allocno GR_REGS
> 
>   a0(r87,l0) costs: SIBCALL_REGS:2000,2000 JIRL_REGS:2000,2000 
> CSR_REGS:2000,2000 GR_REGS:2000,2000 FP_REGS:8000,8000 ALL_REGS:32000,32000 
> MEM:8000,8000
>   a1(r84,l0) costs: SIBCALL_REGS:100,100 JIRL_REGS:100,100 
> CSR_REGS:100,100 GR_REGS:100,100 FP_REGS:1004000,1004000 
> ALL_REGS:1016000,1016000 MEM:1004000,1004000
>   a2(r83,l0) costs: SIBCALL_REGS:100,100 JIRL_REGS:100,100 
> CSR_REGS:100,100 GR_REGS:100,100 FP_REGS:1004000,1004000 
> ALL_REGS:1008000,1008000 MEM:1004000,1004000
> 
> 
> Here r84 is the pseudo register for ($frame - 131072).  Any idea why the
> compiler selects "NO_REGS" here?
> 
> FWIW RISC-V port suffers the same issue:
> https://godbolt.org/z/aPorqj73b.


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Pushed: [PATCH v2] LoongArch: Stop -mfpu from silently breaking ABI [PR109000]

2023-03-06 Thread Xi Ruoyao via Gcc-patches
Pushed r13-6500 and r12-9225.

On Mon, 2023-03-06 at 15:21 +0800, Lulu Cheng wrote:
> 
> On 2023/3/3 at 4:16 PM, Xi Ruoyao wrote:
> > In the toolchain convention, we describe -mfpu= as:
> > 
> > "Selects the allowed set of basic floating-point instructions and
> > registers. This option should not change the FP calling convention
> > unless it's necessary."
> > 
> > Though not explicitly stated, the rationale of this rule is to allow
> > combinations like "-mabi=lp64s -mfpu=64".  This is useful for running
> > applications with the LP64S/F ABI on double-float-capable LoongArch
> > hardware, using a math library with the LP64S/F ABI but native
> > double-float HW instructions, for better performance.
> > 
> > And now a case in the Linux kernel has again proven the usefulness of
> > this kind of combination.  The AMDGPU DCN kernel driver needs to
> > perform some floating-point operations, but the entire kernel uses the
> > LP64S ABI.  So the translation units of the AMDGPU DCN driver need to
> > be compiled with -mfpu=64 (the kernel lacks the soft-FP routines in
> > libgcc), but also with -mabi=lp64s (or you can't link them with the
> > rest of the kernel).
> > 
> > Unfortunately, GCC currently uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
> > determine the floating-point calling convention.  This causes
> > "-mfpu=64" to silently allow using $fa* to pass parameters and return
> > values EVEN IF -mabi=lp64s is used.  To make things worse, the
> > generated object file has SOFT-FLOAT set in the eflags field, so the
> > linker will happily link it with other LP64S ABI object files, but
> > obviously this will lead to bad results at runtime.  And for now all
> > loongarch64 CPU models (-march settings) imply -mfpu=64 by default, so
> > the issue makes a single "-mabi=lp64s" option basically broken
> > (fortunately most projects, e.g. the Linux kernel, have used
> > -msoft-float, which implies both -mabi=lp64s and -mfpu=none, as we've
> > recommended in the toolchain convention doc).
> > 
> > The fix is simple: use TARGET_*_FLOAT_ABI instead.
> > 
> > I consider this a bug fix: the behavior difference from the toolchain
> > convention doc is a bug, and generating object files with the
> > SOFT-FLOAT flag but parameters/return values passed through FPRs is
> > definitely a bug.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk
> > and release/gcc-12 branch?
> 
> LGTM!
> 
> Thanks!
> 
> > 
> > gcc/ChangeLog:
> > 
> > PR target/109000
> > * config/loongarch/loongarch.h (FP_RETURN): Use
> > TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
> > (UNITS_PER_FP_ARG): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/109000
> > * gcc.target/loongarch/flt-abi-isa-1.c: New test.
> > * gcc.target/loongarch/flt-abi-isa-2.c: New test.
> > * gcc.target/loongarch/flt-abi-isa-3.c: New test.
> > * gcc.target/loongarch/flt-abi-isa-4.c: New test.
> > ---
> >   gcc/config/loongarch/loongarch.h                   |  4 ++--
> >   gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c | 14 ++
> >   gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c | 10 ++
> >   gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c |  9 +
> >   gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c | 10 ++
> >   5 files changed, 45 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-2.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-3.c
> >   create mode 100644 gcc/testsuite/gcc.target/loongarch/flt-abi-isa-4.c
> > 
> > diff --git a/gcc/config/loongarch/loongarch.h
> > b/gcc/config/loongarch/loongarch.h
> > index f4e903d46bb..f8167875646 100644
> > --- a/gcc/config/loongarch/loongarch.h
> > +++ b/gcc/config/loongarch/loongarch.h
> > @@ -676,7 +676,7 @@ enum reg_class
> >  point values.  */
> >   
> >   #define GP_RETURN (GP_REG_FIRST + 4)
> > -#define FP_RETURN ((TARGET_SOFT_FLOAT) ? GP_RETURN : (FP_REG_FIRST
> > + 0))
> > +#define FP_RETURN ((TARGET_SOFT_FLOAT_ABI) ? GP_RETURN :
> > (FP_REG_FIRST + 0))
> >   
> >   #define MAX_ARGS_IN_REGISTERS 8
> >   
> > @@ -1154,6 +1154,6 @@ struct GTY (()) machine_function
> >   /* The largest type that can be passed in floating-point
> > registers.  */
> >   /* TODO: according to mabi.  */
> >   #define UNITS_PER_FP_ARG  \
> > -  (TARGET_HARD_FLOAT ? (TARGET_DOUBLE_FLOAT ? 8 : 4) : 0)
> > +  (TARGET_HARD_FLOAT_ABI ? (TARGET_DOUBLE_FLOAT_ABI ? 8 : 4) : 0)
> >   
> >   #define FUNCTION_VALUE_REGNO_P(N) ((N) == GP_RETURN || (N) ==
> > FP_RETURN)
> > diff --git a/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> > b/gcc/testsuite/gcc.target/loongarch/flt-abi-isa-1.c
> > new file mode 100644
> > index 000..1c9490f6a87
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/loongarc