[PATCH] PR target/103069: Relax cmpxchg loop for x86 target

2021-11-12 Thread Hongyu Wang via Gcc-patches
Hi,

From the CPU's point of view, getting a cache line for writing is more
expensive than reading.  See Appendix A.2 Spinlock in:

https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf

The full compare-and-swap grabs the cache line exclusively and causes
excessive cache-line bouncing.

The atomic_fetch_{or,xor,and,nand} builtins generate a cmpxchg loop under
-march=x86-64, like:

movl(%rdi), %eax
.L2:
movl%eax, %edx
movl%eax, %r8d
orl %esi, %edx
lock cmpxchgl   %edx, (%rdi)
jne .L2
movl%r8d, %eax
ret

To relax the above loop, GCC should first emit a normal load and compare,
jumping to .L2 when the cmpxchgl would fail.  At .L2 a PAUSE is inserted
before retrying, to yield the CPU to the other hyperthread and to save
power, so the code becomes

movl(%rdi), %eax
.L4:
movl(%rdi), %ecx
movl%eax, %edx
orl %esi, %edx
cmpl%eax, %ecx
jne .L2
lock cmpxchgl   %edx, (%rdi)
jne .L4
.L2:
rep nop
jmp .L4
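
For reference, here is a minimal C sketch of the same load/compare/pause
pattern using the GCC atomic builtins (an illustration of the emitted
shape, not the expander's actual output):

/* Hedged sketch of the relaxed fetch-or loop; illustrative only.  */
static inline int
relaxed_fetch_or (int *ptr, int val)
{
  int expected = __atomic_load_n (ptr, __ATOMIC_RELAXED);
  for (;;)
    {
      int desired = expected | val;
      /* Plain load and compare first, so the locked cmpxchg only runs
         when it is likely to succeed.  */
      if (__atomic_load_n (ptr, __ATOMIC_RELAXED) == expected
          && __atomic_compare_exchange_n (ptr, &expected, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return expected;  /* the old value, as atomic_fetch_or returns */
      expected = __atomic_load_n (ptr, __ATOMIC_RELAXED);
      __builtin_ia32_pause ();  /* rep nop: yield the sibling hyperthread */
    }
}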

This patch adds corresponding atomic_fetch_op expanders to insert the load,
compare, and pause for all the atomic logic fetch builtins, and adds the
-mrelax-cmpxchg-loop flag to control whether the relaxed loop is generated.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for master?

gcc/ChangeLog:

PR target/103069
* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
New expand function.
* config/i386/i386-options.c (ix86_target_string): Add
-mrelax-cmpxchg-loop flag.
(ix86_valid_target_attribute_inner_p): Likewise.
* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
New expand function prototype.
* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
for SI,HI,QI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
* doc/invoke.texi: Document -mrelax-cmpxchg-loop.

gcc/testsuite/ChangeLog:

PR target/103069
* gcc.target/i386/pr103069-1.c: New test.
* gcc.target/i386/pr103069-2.c: Ditto.
---
 gcc/config/i386/i386-expand.c  |  77 ++
 gcc/config/i386/i386-options.c |   7 +-
 gcc/config/i386/i386-protos.h  |   2 +
 gcc/config/i386/i386.opt   |   4 +
 gcc/config/i386/sync.md| 117 +
 gcc/doc/invoke.texi|   9 +-
 gcc/testsuite/gcc.target/i386/pr103069-1.c |  35 ++
 gcc/testsuite/gcc.target/i386/pr103069-2.c |  70 
 8 files changed, 319 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103069-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103069-2.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 088e6af2258..f8a61835d85 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -23138,4 +23138,81 @@ ix86_expand_divmod_libfunc (rtx libfunc, machine_mode mode,
   *rem_p = rem;
 }
 
+void ix86_expand_atomic_fetch_op_loop (rtx target, rtx mem, rtx val,
+  enum rtx_code code, bool after,
+  bool doubleword)
+{
+  rtx old_reg, new_reg, old_mem, success, oldval, new_mem;
+  rtx_code_label *loop_label, *pause_label;
+  machine_mode mode = GET_MODE (target);
+
+  old_reg = gen_reg_rtx (mode);
+  new_reg = old_reg;
+  loop_label = gen_label_rtx ();
+  pause_label = gen_label_rtx ();
+  old_mem = copy_to_reg (mem);
+  emit_label (loop_label);
+  emit_move_insn (old_reg, old_mem);
+
+  /* return value for atomic_fetch_op.  */
+  if (!after)
+emit_move_insn (target, old_reg);
+
+  if (code == NOT)
+{
+  new_reg = expand_simple_binop (mode, AND, new_reg, val, NULL_RTX,
+true, OPTAB_LIB_WIDEN);
+  new_reg = expand_simple_unop (mode, code, new_reg, NULL_RTX, true);
+}
+  else
+new_reg = expand_simple_binop (mode, code, new_reg, val, NULL_RTX,
+  true, OPTAB_LIB_WIDEN);
+
+  /* return value for atomic_op_fetch.  */
+  if (after)
+emit_move_insn (target, new_reg);
+
+  /* Load memory again inside loop.  */
+  new_mem = copy_to_reg (mem);
+  /* Compare mem value with expected value.  */
+
+  if (doubleword)
+{
+  machine_mode half_mode = (mode == DImode)? SImode : DImode;
+  rtx low_new_mem = gen_lowpart (half_mode, new_mem);
+  rtx low_old_mem = gen_lowpart (half_mode, old_mem);
+  rtx high_new_mem = gen_highpart (half_mode, new_mem);
+  rtx high_old_mem = gen_highpart 

Fix modref and handling of some builtins

2021-11-12 Thread Jan Hubicka via Gcc-patches
Hi,
ipa-modref gets confused by the EAF flags of memcpy, because parameter 1 is
escaping but used only directly.  In modref we do not track values saved to
memory and thus we clear all other flags on each store.  This needs to also
happen when the called function escapes a parameter.
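
To illustrate (a hand-written example under assumed semantics, not taken
from the PR):

#include <string.h>

extern void g (void);
int *global_copy;

int
f (int *p)
{
  /* memcpy only reads its source directly, but copying the pointer
     value itself into global_copy is what escapes p.  */
  memcpy (&global_copy, &p, sizeof p);
  *p = 1;
  g ();       /* may reach *p through global_copy and modify it */
  return *p;  /* must be re-read; assuming it is still 1 would be wrong */
}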

gcc/ChangeLog:

PR tree-optimization/103182
* ipa-modref.c (callee_to_caller_flags): Fix merging of flags.
(modref_eaf_analysis::analyze_ssa_name): Fix merging of flags.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index e999c2c5d1e..90985cc1326 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1888,19 +1888,18 @@ callee_to_caller_flags (int call_flags, bool ignore_stores,
  that is not the same as caller returning it.  */
   call_flags |= EAF_NOT_RETURNED_DIRECTLY
| EAF_NOT_RETURNED_INDIRECTLY;
-  /* TODO: We miss return value propagation.
- Be conservative and if value escapes to memory
- also mark it as escaping.  */
   if (!ignore_stores && !(call_flags & EAF_UNUSED))
 {
+  /* If value escapes we are no longer able to track what happens
+with it because we can read it from the escaped location
+anytime.  */
   if (!(call_flags & EAF_NO_DIRECT_ESCAPE))
-   lattice.merge (~(EAF_NOT_RETURNED_DIRECTLY
-| EAF_NOT_RETURNED_INDIRECTLY
-| EAF_NO_DIRECT_READ
-| EAF_UNUSED));
-  if (!(call_flags & EAF_NO_INDIRECT_ESCAPE))
+   lattice.merge (0);
+  else if (!(call_flags & EAF_NO_INDIRECT_ESCAPE))
lattice.merge (~(EAF_NOT_RETURNED_INDIRECTLY
 | EAF_NO_DIRECT_READ
+| EAF_NO_INDIRECT_READ
+| EAF_NO_INDIRECT_CLOBBER
 | EAF_UNUSED));
 }
   else
@@ -2036,18 +2035,17 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
 not_returned and escape has same meaning.
 However passing arg to return slot is different.  If
 the callee's return slot is returned it means that
-arg is written to itself which is an escape.  */
+arg is written to itself which is an escape.
+Since we do not track the memory it is written to we
+need to give up on analysing it.  */
  if (!isretslot)
{
  if (!(call_flags & (EAF_NOT_RETURNED_DIRECTLY
  | EAF_UNUSED)))
-   m_lattice[index].merge (~(EAF_NO_DIRECT_ESCAPE
- | EAF_UNUSED));
- if (!(call_flags & (EAF_NOT_RETURNED_INDIRECTLY
- | EAF_UNUSED)))
-   m_lattice[index].merge (~(EAF_NO_INDIRECT_ESCAPE
- | EAF_NO_DIRECT_READ
- | EAF_UNUSED));
+   m_lattice[index].merge (0);
+ else gcc_checking_assert
+   (call_flags & (EAF_NOT_RETURNED_INDIRECTLY
+  | EAF_UNUSED));
  call_flags = callee_to_caller_flags
   (call_flags, false,
m_lattice[index]);


Re: [COMMITTED] path solver: Solve PHI imports first for ranges.

2021-11-12 Thread Andrew MacLeod via Gcc-patches

On 11/12/21 14:50, Richard Biener via Gcc-patches wrote:

On November 12, 2021 8:46:25 PM GMT+01:00, Aldy Hernandez via Gcc-patches wrote:

PHIs must be resolved first while solving ranges in a block,
regardless of where they appear in the import bitmap.  We went through
a similar exercise for the relational code, but missed these.

Must not all stmts be resolved in program order (for optimality at least)?


Generally, imports are live-on-entry values to a block, so their order is
not particularly important; they are all simultaneous.  PHIs are also
considered imports for data-flow purposes, but they happen before the
first stmt, all simultaneously.  They need to be distinguished because
PHI arguments can refer to other PHI defs which may be in this block,
live around a back edge, and we need to be sure we get the right version.
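
A small example of that shape (hypothetical, not the testcase from the
PR): two PHIs whose arguments refer to each other's defs around the back
edge, so both must be resolved from the live-on-entry values first:

int
f (int n)
{
  int a = 0, b = 1;
  for (int i = 0; i < n; i++)
    {
      /* In the loop header this becomes roughly:
           a_1 = PHI <0(entry), b_2(latch)>
           b_2 = PHI <1(entry), a_1(latch)>
         Each PHI argument names the other PHI's def, so resolving one
         with an already-updated value of the other would pick the
         wrong version.  */
      int t = a;
      a = b;
      b = t;
    }
  return a - b;
}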


We should look closer to be sure this isn't an accidental fix that
leaves the root problem.  We need to be sure *all* the PHI arguments
are resolved from outside this block.  What's the testcase?





Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103202
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Solve PHI imports first.
---
gcc/gimple-range-path.cc | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b9aceaf2565..71b290434cb 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -365,12 +365,23 @@ path_range_query::compute_ranges_in_block (basic_block bb)
clear_cache (name);
 }

-  // Solve imports defined in this block.
+  // Solve imports defined in this block, starting with the PHIs...
+  for (gphi_iterator iter = gsi_start_phis (bb); !gsi_end_p (iter);
+   gsi_next (&iter))
+{
+  gphi *phi = iter.phi ();
+  tree name = gimple_phi_result (phi);
+
+  if (import_p (name) && range_defined_in_block (r, name, bb))
+   set_cache (r, name);
+}
+  // ...and then the rest of the imports.
   EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
 {
   tree name = ssa_name (i);

-  if (range_defined_in_block (r, name, bb))
+  if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
+ && range_defined_in_block (r, name, bb))
set_cache (r, name);
 }





Re: [PATCH] libstdc++: Use GCC_TRY_COMPILE_OR_LINK for getentropy, arc4random

2021-11-12 Thread Jonathan Wakely via Gcc-patches
On Fri, 12 Nov 2021, 20:24 Hans-Peter Nilsson via Libstdc++, <libstd...@gcc.gnu.org> wrote:

> Since r12-5056-g3439657b0286, there has been a regression in
> test results; an additional 100 FAILs running the g++ and
> libstdc++ testsuite on cris-elf, a newlib target.  The
> failures are linker errors, not finding a definition for
> getentropy.  It appears newlib has since 2017-12-03
> declarations of getentropy and arc4random, and provides an
> implementation of arc4random using getentropy, but provides no
> definition of getentropy, not even a stub yielding ENOSYS.
> This is similar to what it does for many other functions too.
>
> While fixing newlib (like adding said stub) would likely help,
> it still leaves older newlib releases hanging.  Thankfully,
> the libstdc++ configury test can be improved to try linking
> where possible; using the bespoke GCC_TRY_COMPILE_OR_LINK
> instead of AC_TRY_COMPILE.  BTW, I see a lack of consistency;
> some tests use AC_TRY_COMPILE and some GCC_TRY_COMPILE_OR_LINK
> for no apparent reason,


Almost certainly due to me not knowing what I'm doing.


but this commit just amends
> r12-5056-g3439657b0286.
>
> Testing for cris-elf is underway and the log says so far the
> related regressions are fixed.  Ok to commit?
>


OK, thanks!



> libstdc++-v3:
> PR libstdc++/103166
> * acinclude.m4 (GLIBCXX_CHECK_GETENTROPY,
> GLIBCXX_CHECK_ARC4RANDOM):
> Use GCC_TRY_COMPILE_OR_LINK instead of AC_TRY_COMPILE.
> * configure: Regenerate.
> ---
>  libstdc++-v3/acinclude.m4 |  4 ++--
>  libstdc++-v3/configure| 53
> +--
>  2 files changed, 53 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 4adfdf646acb..30bd92d37f23 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -4839,7 +4839,7 @@ AC_DEFUN([GLIBCXX_CHECK_GETENTROPY], [
>AC_LANG_CPLUSPLUS
>AC_MSG_CHECKING([for getentropy])
>AC_CACHE_VAL(glibcxx_cv_getentropy, [
> -  AC_TRY_COMPILE(
> +  GCC_TRY_COMPILE_OR_LINK(
> [#include <unistd.h>],
> [unsigned i;
>  ::getentropy(&i, sizeof(i));],
> @@ -4862,7 +4862,7 @@ AC_DEFUN([GLIBCXX_CHECK_ARC4RANDOM], [
>AC_LANG_CPLUSPLUS
>AC_MSG_CHECKING([for arc4random])
>AC_CACHE_VAL(glibcxx_cv_arc4random, [
> -  AC_TRY_COMPILE(
> +  GCC_TRY_COMPILE_OR_LINK(
> [#include <stdlib.h>],
> [unsigned i = ::arc4random();],
> [glibcxx_cv_arc4random=yes], [glibcxx_cv_arc4random=no])
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 3a572475546f..3eb391f409f2 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -75445,7 +75445,8 @@ $as_echo_n "checking for getentropy... " >&6; }
>$as_echo_n "(cached) " >&6
>  else
>
> -  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +  if test x$gcc_no_link = xyes; then
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>  /* end confdefs.h.  */
>  #include <unistd.h>
>  int
> @@ -75463,6 +75464,30 @@ else
>glibcxx_cv_getentropy=no
>  fi
>  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +else
> +  if test x$gcc_no_link = xyes; then
> +  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES."
> "$LINENO" 5
> +fi
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +#include <unistd.h>
> +int
> +main ()
> +{
> +unsigned i;
> +::getentropy(&i, sizeof(i));
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_cxx_try_link "$LINENO"; then :
> +  glibcxx_cv_getentropy=yes
> +else
> +  glibcxx_cv_getentropy=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +conftest$ac_exeext conftest.$ac_ext
> +fi
>
>  fi
>
> @@ -75496,7 +75521,8 @@ $as_echo_n "checking for arc4random... " >&6; }
>$as_echo_n "(cached) " >&6
>  else
>
> -  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +  if test x$gcc_no_link = xyes; then
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>  /* end confdefs.h.  */
>  #include <stdlib.h>
>  int
> @@ -75513,6 +75539,29 @@ else
>glibcxx_cv_arc4random=no
>  fi
>  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +else
> +  if test x$gcc_no_link = xyes; then
> +  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES."
> "$LINENO" 5
> +fi
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +#include <stdlib.h>
> +int
> +main ()
> +{
> +unsigned i = ::arc4random();
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_cxx_try_link "$LINENO"; then :
> +  glibcxx_cv_arc4random=yes
> +else
> +  glibcxx_cv_arc4random=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext \
> +conftest$ac_exeext conftest.$ac_ext
> +fi
>
>  fi
>
> --
> 2.11.0
>
>


Re: [PATCH] PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

2021-11-12 Thread Hans-Peter Nilsson
On Mon, 8 Nov 2021, Maciej W. Rozycki wrote:
> On Sun, 7 Nov 2021, Hans-Peter Nilsson wrote:
> > (I thought you'd use 6cb68940dcf9 and do the same for VAX.)
>
>  I could, easily, but being confined to gcc/config/cris I don't expect it
> to be included in the build let alone trigger anything.

There was some level of misunderstanding here, but even looking
closer, my suggestion wouldn't help.  Still, I'll be more
verbose:

I don't think you got me here.  I mean, do as in 6cb68940dcf9
and for VAX create a define_insn_and_split, recognized at reload
time, where you recognize the MULT form and translate that into
the ASHIFT form, to help reload.

But looking closer, the situation requires a non-cc0-clobbering
insn, but AFAICT VAX only has MOVA and it clobbers
VAX_PSL_REGNUM.  Nevermind...

brgds, H-P


Fix wrong code with pure functions

2021-11-12 Thread Jan Hubicka via Gcc-patches
Fix wrong code with pure functions

I introduced a bug into find_func_aliases_for_call in the handling of pure
functions.  Instead of reading global memory, pure functions were believed
to write global memory.  This results in misoptimization of the testcase
at -O1.

The change to pta-callused.c updates the template for the new behaviour of
the constraint generation.  We copy nonlocal memory to calluse, which is
correct but not strictly necessary, because later we take care to add the
nonlocal_p flag manually.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR tree-optimization/103209
* tree-ssa-structalias.c (find_func_aliases_for_call): Fix
use of handle_rhs_call.

gcc/testsuite/ChangeLog:

PR tree-optimization/103209
* gcc.dg/tree-ssa/pta-callused.c: Update template.
* gcc.c-torture/execute/pr103209.c: New test.

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr103209.c 
b/gcc/testsuite/gcc.c-torture/execute/pr103209.c
new file mode 100644
index 000..481689396f4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr103209.c
@@ -0,0 +1,36 @@
+#include 
+#include 
+
+int32_t a[6];
+int64_t b;
+int32_t *c;
+int32_t **d = &c;
+int64_t *e = &b;
+int32_t **const *f = &d;
+int32_t **const **g = &f;
+int32_t *h();
+static int16_t j();
+static uint32_t k(int8_t, const int32_t *, int64_t);
+static uint32_t l() {
+  int32_t *m = [3];
+  int32_t n = 0;
+  int8_t o = 0;
+  int32_t *p[] = {&n, &n, &n, &n};
+  uint32_t q[6][1][2] = {};
+  for (o = 0; o <= 1; o = 6)
+if (h(j(k(3, 0, q[2][0][0]), &n), n) == p[3])
+  *m = *e;
+  return 0;
+}
+int32_t *h(uint32_t, int32_t) { return ***g; }
+int16_t j(uint32_t, int32_t *r) { **f = r; return 0;}
+uint32_t k(int8_t, const int32_t *, int64_t) { *e = 3; return 0;}
+int main() {
+  int i = 0;
+  l();
+  for (i = 0; i < 6; i++){
+if (i == 3 && a[i] != 3)
+   __builtin_abort ();
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-callused.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pta-callused.c
index aa639b45dc2..b9a57d8d135 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pta-callused.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-callused.c
@@ -22,5 +22,5 @@ int bar (int b)
   return *foo ();
 }
 
-/* { dg-final { scan-tree-dump "CALLUSED\\(\[0-9\]+\\) = { NONLOCAL f.* i q }" "alias" } } */
+/* { dg-final { scan-tree-dump "CALLUSED\\(\[0-9\]+\\) = { ESCAPED NONLOCAL f.* i q }" "alias" } } */
 
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 153ddf57a61..34fd47fdf47 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4996,7 +4996,7 @@ find_func_aliases_for_call (struct function *fn, gcall *t)
 reachable from their arguments, but they are not an escape
 point for reachable memory of their arguments.  */
   else if (flags & (ECF_PURE|ECF_LOOPING_CONST_OR_PURE))
-   handle_rhs_call (t, , implicit_pure_eaf_flags, true, false);
+   handle_rhs_call (t, , implicit_pure_eaf_flags, false, true);
   /* If the call is to a replaceable operator delete and results
 from a delete expression as opposed to a direct call to
 such operator, then the effects for PTA (in particular


Re: [RFC PATCH] or1k: Fix clobbering of _mcount argument if fPIC is enabled

2021-11-12 Thread Stafford Horne via Gcc-patches
I have committed this as is.

-Stafford

On Tue, Nov 09, 2021 at 09:13:08PM +0900, Stafford Horne wrote:
> Recently we changed the PROFILE_HOOK _mcount call to pass in the link
> register as an argument.  This actually does not work when the _mcount
> call uses a PLT because the GOT register setup code ends up getting
> inserted before the PROFILE_HOOK and clobbers the link register
> argument.
> 
> These glibc tests are failing:
>   gmon/tst-gmon-pie-gprof
>   gmon/tst-gmon-static-gprof
> 
> This patch fixes this by saving the instruction that stores the Link
> Register to the _mcount argument and then inserts the GOT register setup
> instructions after that.
> 
> For example:
> 
> main.c:
> 
> extern int e;
> 
> int f2(int a) {
>   return a + e;
> }
> 
> int f1(int a) {
>   return f2 (a + a);
> }
> 
> int main(int argc, char ** argv) {
>   return f1 (argc);
> }
> 
> Compiled:
> 
> or1k-smh-linux-gnu-gcc -Wall -c -O2 -fPIC -pg -S main.c
> 
> Before Fix:
> 
> main:
> l.addi  r1, r1, -16
> l.sw    8(r1), r2
> l.sw    0(r1), r16
> l.addi  r2, r1, 16    # Keeping FP, but not needed
> l.sw    4(r1), r18
> l.sw    12(r1), r9
> l.jal   8             # GOT Setup clobbers r9 (Link Register)
>  l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
> l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
> l.add   r16, r16, r9
> l.or    r18, r3, r3
> l.or    r3, r9, r9    # This is not the original LR
> l.jal   plt(_mcount)
>  l.nop
> 
> l.jal   plt(f1)
>  l.or    r3, r18, r18
> l.lwz   r9, 12(r1)
> l.lwz   r16, 0(r1)
> l.lwz   r18, 4(r1)
> l.jr    r9
>  l.addi  r1, r1, 16
>  l.addi  r1, r1, 16
> 
> After the fix:
> 
> main:
> l.addi  r1, r1, -12
> l.sw    0(r1), r16
> l.sw    4(r1), r18
> l.sw    8(r1), r9
> l.or    r18, r3, r3
> l.or    r3, r9, r9    # We now have r9 (LR) set early
> l.jal   8             # Clobbers r9 (Link Register)
>  l.movhi r16, gotpchi(_GLOBAL_OFFSET_TABLE_-4)
> l.ori   r16, r16, gotpclo(_GLOBAL_OFFSET_TABLE_+0)
> l.add   r16, r16, r9
> l.jal   plt(_mcount)
>  l.nop
> 
> l.jal   plt(f1)
>  l.or    r3, r18, r18
> l.lwz   r9, 8(r1)
> l.lwz   r16, 0(r1)
> l.lwz   r18, 4(r1)
> l.jr    r9
>  l.addi  r1, r1, 12
> 
> Fixes: 308531d148a ("or1k: Add return address argument to _mcount call")
> 
> gcc/ChangeLog:
>   * config/or1k/or1k-protos.h (or1k_profile_hook): New function.
>   * config/or1k/or1k.h (PROFILE_HOOK): Change macro to reference
>   new function or1k_profile_hook.
>   * config/or1k/or1k.c (struct machine_function): Add new field
>   set_mcount_arg_insn.
>   (or1k_profile_hook): New function.
>   (or1k_init_pic_reg): Update to inject pic rtx after _mcount arg
>   when profiling.
>   (or1k_frame_pointer_required): Frame pointer no longer needed
>   when profiling.
> ---
> I am sending this as RFC as I think there should be a better way to handle
> this but I am not sure how that would be.
> 
> An earlier patch I tried was to store the link register to a temporary 
> register
> then pass the temporary register as an argument to _mcount, however
> optimizations caused the link register to still get clobbered.
> 
> Any thoughts will be helpful.
> 
> -Stafford
> 
>  gcc/config/or1k/or1k-protos.h |  1 +
>  gcc/config/or1k/or1k.c| 49 ---
>  gcc/config/or1k/or1k.h|  8 +-
>  3 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/or1k/or1k-protos.h b/gcc/config/or1k/or1k-protos.h
> index bbb54c8f790..56554f2937f 100644
> --- a/gcc/config/or1k/or1k-protos.h
> +++ b/gcc/config/or1k/or1k-protos.h
> @@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
>  extern HOST_WIDE_INT or1k_initial_elimination_offset (int, int);
>  extern void or1k_expand_prologue (void);
>  extern void or1k_expand_epilogue (void);
> +extern void or1k_profile_hook (void);
>  extern void or1k_expand_eh_return (rtx);
>  extern rtx  or1k_initial_frame_addr (void);
>  extern rtx  or1k_dynamic_chain_addr (rtx);
> diff --git a/gcc/config/or1k/or1k.c b/gcc/config/or1k/or1k.c
> index e772a7addea..335c4c5decf 100644
> --- a/gcc/config/or1k/or1k.c
> +++ b/gcc/config/or1k/or1k.c
> @@ -73,6 +73,10 @@ struct GTY(()) machine_function
>  
>/* Remember where the set_got_placeholder is located.  */
>rtx_insn *set_got_insn;
> +
> +  /* Remember where mcount args are stored so we can insert set_got_insn
> + after.  */
> +  rtx_insn *set_mcount_arg_insn;
>  };
>  
>  /* Zero initialization is OK for all current fields.  */
> @@ -415,6 +419,25 @@ or1k_expand_epilogue (void)
>  EH_RETURN_STACKADJ_RTX));
>  }
>  
> 

Re: [PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-12 Thread Martin Jambor
Hi,

On Fri, Nov 12 2021, Martin Jambor wrote:
> Hi,
>
> using -fno-semantic-interposition has been reported by various people
> to bring about considerable speed up at the cost of strict compliance
> to the ELF symbol interposition rules  See for example
> https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
>
> As such I believe it should be implied by our -Ofast optimization
> level, not only so that benchmarks that can benefit run faster, but
> also so that people looking at -Ofast documentation for options that
> could speed their programs find it.
>
> I have verified that with the following patch IPA-CP sees
> flag_semantic_interposition set to zero at Ofast and that info and pdf
> manual builds fine with the documentation change.  I am bootstrapping
> and testing it now in order to comply with submission criteria but I
> don't think an Ofast change gets much tested.
>
> Assuming it passes, is the patch OK?  (If it is, I will also add a note
> about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> after I commit the patch.)
>

Unfortunately, I was wrong, there are testcases which use the optimize
attribute to switch a function to Ofast and those ICE because
-fsemantic-interposition is not an optimization flag and only
optimization flags can change in an optimize attribute (AFAIK, I only
had a quick glance at the results).

I am not sure what the right way to tackle this is, whether to set the
flag at Ofast in some nonstandard way or to make the flag an optimization
flag (probably affecting function definitions; having it affect
call-sites seems too fine-grained).  I will try to discuss this on IRC on
Monday (and hope such a change is still doable early in stage3).

Sorry for posting this a bit prematurely,

Martin
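
For context, the speedup mechanism described above in a minimal
hand-written example (an illustration, not part of the patch):

/* With -fPIC and semantic interposition allowed, the call to add1 must
   assume add1 could be interposed by another DSO at run time, so it
   cannot be inlined and is emitted as a call through the PLT.  With
   -fno-semantic-interposition the reference binds locally and the
   compiler is free to inline it.  */
int
add1 (int x)
{
  return x + 1;
}

int
twice_plus_two (int x)
{
  return add1 (add1 (x));
}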

>
>
> gcc/ChangeLog:
>
> 2021-11-12  Martin Jambor  
>
>   * opts.c (default_options_table): Switch off
>   flag_semantic_interposition at Ofast.
>   * doc/invoke.texi (Optimize Options): Document that Ofast switches off
>   -fsemantic-interposition.
> ---
>  gcc/doc/invoke.texi | 1 +
>  gcc/opts.c  | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2ea23d07c4c..fd16c91aec8 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10551,6 +10551,7 @@ valid for all standard-compliant programs.
>  It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
>  and the Fortran-specific @option{-fstack-arrays}, unless
>  @option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}.
> +It turns off @option{-fsemantic-interposition}.
>  
>  @item -Og
>  @opindex Og
> diff --git a/gcc/opts.c b/gcc/opts.c
> index caed6255500..3da53d8f890 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -682,6 +682,7 @@ static const struct default_options 
> default_options_table[] =
>  /* -Ofast adds optimizations to -O3.  */
>  { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
>  { OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },
> +{ OPT_LEVELS_FAST, OPT_fsemantic_interposition, NULL, 0 },
>  
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>};
> -- 
> 2.33.0


[PATCH] fixincludes: simplify handling for access() failure [PR21283, PR80047]

2021-11-12 Thread Xi Ruoyao via Gcc-patches
POSIX says:

On some implementations, if buf is a null pointer, getcwd() may obtain
size bytes of memory using malloc(). In this case, the pointer returned
by getcwd() may be used as the argument in a subsequent call to free().
Invoking getcwd() with buf as a null pointer is not recommended in
conforming applications.

This produces an error building GCC with --enable-werror-always:

../../../fixincludes/fixincl.c: In function ‘process’:
../../../fixincludes/fixincl.c:1356:7: error: argument 1 is null but
the corresponding size argument 2 value is 4096 [-Werror=nonnull]

POSIX suggests calling getcwd() with progressively larger buffers until
it no longer fails with [ERANGE].  However, it's highly unlikely that
this error-handling route is ever used.

So we can simplify it instead of writing too much code.  We give up on
using getcwd(), because `make` will output a `Leaving directory ...`
message containing the path to the cwd when we call abort().
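
For reference, the progressively-growing getcwd() loop POSIX suggests
would look roughly like this (a standalone sketch; the patch avoids
needing it):

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

static char *
xgetcwd (void)
{
  size_t size = 256;
  for (;;)
    {
      char *buf = malloc (size);
      if (!buf)
        return NULL;
      if (getcwd (buf, size))
        return buf;        /* caller frees */
      free (buf);
      if (errno != ERANGE)
        return NULL;       /* a real failure, not a short buffer */
      size *= 2;
    }
}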

fixincludes/ChangeLog:

PR other/21823
PR bootstrap/80047
* fixincl.c (process): Simplify the handling for highly
  unlikely access() failure, to avoid using non-standard
  extensions.
---
 fixincludes/fixincl.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 6dba2f6e830..ee57fbf61b4 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -1352,11 +1352,10 @@ process (void)
 
   if (access (pz_curr_file, R_OK) != 0)
 {
-  int erno = errno;
-  fprintf (stderr, "Cannot access %s from %s\n\terror %d (%s)\n",
-   pz_curr_file, getcwd ((char *) NULL, MAXPATHLEN),
-   erno, xstrerror (erno));
-  return;
+  /* Some really strange error happened. */
+  fprintf (stderr, "Cannot access %s: %s\n", pz_curr_file,
+  xstrerror (errno));
+  abort();
 }
 
   pz_curr_data = load_file (pz_curr_file);
-- 
2.33.1

> On Fri, 2021-11-12 at 12:59 -0800, Bruce Korb wrote:
> > If you are going to be excruciatingly, painfully correct, free() is
> > going to be unhappy about freeing a static string in the event
> > getcwd() fails for some inexplicable reason. I'd replace the free() +
> > return with a call to exit. Maybe even:
> 
> It's free (buf), not free (cwd).  buf won't point to a static string.
> 
> buf may be NULL though, but free (NULL) is legal (no-op).
> 
> 
> > > if (VERY_UNLIKELY (access (pz_curr_file, R_OK) != 0)) abort()
> 
> Perhaps just 
> 
> if (access (pz_curr_file, R_OK) != 0)
>   {
> /* Some really inexplicable error happens. */
> fprintf (stderr, "Cannot access %s: %s",
>  pz_curr_file, xstrerror (errno));
> abort();
>   }
> 
> It will show which file can't be accessed so it's possible to
> diagnose.
> And the working directory will be output by "make" when the fixincl
> command fails anyway, so we don't need to really care about it.


Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Alexander Monakov via Gcc-patches



On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Nov 12, 2021 at 08:47:09PM +0100, Jakub Jelinek wrote:
> > The problem is that the argument of the num_teams clause isn't always known
> > before target is launched.
> 
> There was a design mistake that the clause has been put on teams rather than
> on target (well, for host teams we need it on teams), and 5.1 actually
> partially fixes this up for thread_limit by allowing that clause on both,
> but not for num_teams.

If this is a mistake in the standard, can GCC say "the spec is bad; fix the
spec" and refuse to implement support, since it penalizes the common case?

Technically, this could be implemented without penalizing the common case via
CUDA "dynamic parallelism" where you initially launch just one block on the
device that figures out the dimensions and then performs a GPU-side launch of
the required amount of blocks, but that's a nontrivial amount of work.

I looked over your patch. I sent a small nitpick about 'nocommon' in a separate
message, and I still think it's better to adjust GOMP_OFFLOAD_run to take into
account the lower bound when it's known on the host side (otherwise you do
static scheduling of blocks which is going to be inferior to dynamic scheduling:
imagine lower bound is 3, and maximum resident blocks is 2: then you first do
teams 0 and 1 in parallel, then you do team 2 from the 0'th block, while in fact
you want to do it from whichever block finished its initial team first).
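
A sketch of that dynamic scheme (hand-written pseudo-C; run_team and the
counter layout are assumptions, not libgomp code):

extern void run_team (int team);  /* hypothetical per-team body */
static int next_team;  /* device-global counter, assumed initialized to
                          the number of resident blocks before launch */

void
block_main (int block_id, int num_teams)
{
  /* Blocks 0 and 1 start on teams 0 and 1; whichever finishes first
     atomically claims team 2, and so on.  */
  for (int team = block_id; team < num_teams;
       team = __atomic_fetch_add (&next_team, 1, __ATOMIC_RELAXED))
    run_team (team);
}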

Alexander


[r12-5201 Regression] FAIL: g++.dg/pr98499.C -std=gnu++98 execution test on Linux/x86_64

2021-11-12 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

4526ec20f17a6182f754da9460d9d944dd123cc0 is the first bad commit
commit 4526ec20f17a6182f754da9460d9d944dd123cc0
Author: Jan Hubicka 
Date:   Fri Nov 12 16:34:03 2021 +0100

Fix ICE in tree-ssa-structalias.c

caused

FAIL: g++.dg/pr98499.C  -std=gnu++14 execution test
FAIL: g++.dg/pr98499.C  -std=gnu++17 execution test
FAIL: g++.dg/pr98499.C  -std=gnu++2a execution test
FAIL: g++.dg/pr98499.C  -std=gnu++98 execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5201/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr98499.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr98499.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr98499.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr98499.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact
me at skpgkp2 at gmail dot com)


Re: [PATCH] Combine malloc + memset to calloc

2021-11-12 Thread Arnaud Charlet via Gcc-patches
> I apologize this is the diff I meant to send:

Thanks for sending this diff.

Note that in order to allow a review (and approval) of your change,
you need to send also an explanation of your change, as well as the
corresponding commit log.

Thanks in advance!

Arno


Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Alexander Monakov via Gcc-patches
On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:

> --- libgomp/config/nvptx/team.c.jj	2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/team.c   2021-11-12 17:49:02.847341650 +0100
> @@ -32,6 +32,7 @@
>  #include 
>  
>  struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
> +int __gomp_team_num __attribute__((shared));

It's going to be weird to have two declarations next to each other, one with
'nocommon', one without. Could you have 'nocommon' also on the new one, and
then, if you like, to add extern declarations for both variables and drop the
attribute (in a separate patch)?

Alexander


Re: [PATCH] fixincludes: fix portability issues about getcwd() [PR21283, PR80047]

2021-11-12 Thread Xi Ruoyao via Gcc-patches
On Fri, 2021-11-12 at 12:59 -0800, Bruce Korb wrote:
> If you are going to be excruciatingly, painfully correct, free() is
> going to be unhappy about freeing a static string in the event
> getcwd() fails for some inexplicable reason. I'd replace the free() +
> return with a call to exit. Maybe even:

It's free (buf), not free (cwd).  buf won't point to a static string.

buf may be NULL though, but free (NULL) is legal (no-op).


> > if (VERY_UNLIKELY (access (pz_curr_file, R_OK) != 0)) abort()

Perhaps just 

if (access (pz_curr_file, R_OK) != 0)
  {
/* Some really inexplicable error happens. */
fprintf (stderr, "Cannot access %s: %s",
 pz_curr_file, xstrerror (errno));
abort();
  }

It will show which file can't be accessed so it's possible to diagnose.
And the working directory will be output by "make" when the fixincl
command fails anyway, so we don't need to really care about it.

> On 11/11/21 8:33 AM, Xi Ruoyao wrote:
>  
> > ---
> >  fixincludes/fixincl.c | 13 +++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
> > index 6dba2f6e830..1580c67efec 100644
> > --- a/fixincludes/fixincl.c
> > +++ b/fixincludes/fixincl.c
> > @@ -1353,9 +1353,18 @@ process (void)
> >    if (access (pz_curr_file, R_OK) != 0)
> >  {
> >    int erno = errno;
> > +  char *buf = NULL;
> > +  const char *cwd = NULL;
> > +  for (size_t size = 256; !cwd; size += size)
> > +   {
> > + buf = xrealloc (buf, size);
> > + cwd = getcwd (buf, size);
> > + if (!cwd && errno != ERANGE)
> > +   cwd = "the working directory";
> > +   }
> >    fprintf (stderr, "Cannot access %s from %s\n\terror %d
> > (%s)\n",
> > -   pz_curr_file, getcwd ((char *) NULL, MAXPATHLEN),
> > -   erno, xstrerror (erno));
> > +  pz_curr_file, cwd, erno, xstrerror (erno));
> > +  free (buf);
> >    return;
> >  }
> >  
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] Combine malloc + memset to calloc

2021-11-12 Thread Seija K. via Gcc-patches
I apologize; this is the diff I meant to send:

diff --git a/gcc/ada/terminals.c b/gcc/ada/terminals.c
index a2dd4895d48..25d9acda752 100644
--- a/gcc/ada/terminals.c
+++ b/gcc/ada/terminals.c
@@ -609,8 +609,7 @@ __gnat_setup_communication (struct TTY_Process**
process_out) /* output param */
 {
   struct TTY_Process* process;

-  process = (struct TTY_Process*)malloc (sizeof (struct TTY_Process));
-  ZeroMemory (process, sizeof (struct TTY_Process));
+  process = (struct TTY_Process*)calloc (1, sizeof (struct TTY_Process));
   *process_out = process;

   return 0;
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c
b/gcc/config/rs6000/rs6000-gen-builtins.c
index 1655a2fd765..2c895a2d9a9 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -1307,8 +1307,7 @@ parse_args (prototype *protoptr)
   do {
 consume_whitespace ();
 int oldpos = pos;
-typelist *argentry = (typelist *) malloc (sizeof (typelist));
-memset (argentry, 0, sizeof *argentry);
+typelist *argentry = (typelist *) calloc (1, sizeof (typelist));
+typeinfo *argtype = &argentry->info;
 success = match_type (argtype, VOID_NOTOK);
 if (success)
diff --git a/gcc/d/dmd/ctfeexpr.c b/gcc/d/dmd/ctfeexpr.c
index a8e97833ad0..1acad62c371 100644
--- a/gcc/d/dmd/ctfeexpr.c
+++ b/gcc/d/dmd/ctfeexpr.c
@@ -1350,8 +1350,7 @@ int ctfeRawCmp(Loc loc, Expression *e1, Expression
*e2)
 if (es2->keys->length != dim)
 return 1;

-bool *used = (bool *)mem.xmalloc(sizeof(bool) * dim);
-memset(used, 0, sizeof(bool) * dim);
+bool *used = (bool *) mem.xcalloc (dim, sizeof(bool));

 for (size_t i = 0; i < dim; ++i)
 {
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..f5bff8b9441 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3081,9 +3081,16 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  0).exists ())
  {
   unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
-  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
-  memset (buf, (init_type == AUTO_INIT_PATTERN
- ? INIT_PATTERN_VALUE : 0), total_bytes);
+  unsigned char *buf;
+if (init_type == AUTO_INIT_PATTERN)
+  {
+buf = (unsigned char *) xmalloc (total_bytes);
+memset (buf, INIT_PATTERN_VALUE, total_bytes);
+  }
+else
+  {
+buf = (unsigned char *) xcalloc (1, total_bytes);
+  }
   tree itype = build_nonstandard_integer_type
  (total_bytes * BITS_PER_UNIT, 1);
   wide_int w = wi::from_buffer (buf, total_bytes);
diff --git a/libiberty/calloc.c b/libiberty/calloc.c
index f4bd27b1cd2..1ef4156d28a 100644
--- a/libiberty/calloc.c
+++ b/libiberty/calloc.c
@@ -17,7 +17,7 @@ Uses @code{malloc} to allocate storage for @var{nelem}
objects of

 /* For systems with larger pointers than ints, this must be declared.  */
 PTR malloc (size_t);
-void bzero (PTR, size_t);
+void memset (PTR, int, size_t);

 PTR
 calloc (size_t nelem, size_t elsize)
@@ -28,7 +28,7 @@ calloc (size_t nelem, size_t elsize)
 nelem = elsize = 1;

   ptr = malloc (nelem * elsize);
-  if (ptr) bzero (ptr, nelem * elsize);
+  if (ptr) memset (ptr, 0, nelem * elsize);

   return ptr;
 }
diff --git a/libiberty/partition.c b/libiberty/partition.c
index 81e5fc0f79a..75512d67258 100644
--- a/libiberty/partition.c
+++ b/libiberty/partition.c
@@ -146,8 +146,7 @@ partition_print (partition part, FILE *fp)
   int e;

   /* Flag the elements we've already printed.  */
-  done = (char *) xmalloc (num_elements);
-  memset (done, 0, num_elements);
+  done = (char *) xcalloc (num_elements, 1);

   /* A buffer used to sort elements in a class.  */
   class_elements = (int *) xmalloc (num_elements * sizeof (int));
diff --git a/libobjc/gc.c b/libobjc/gc.c
index 57895e61930..95a75f5cb2e 100644
--- a/libobjc/gc.c
+++ b/libobjc/gc.c
@@ -307,10 +307,9 @@ __objc_generate_gc_type_description (Class class)
  / sizeof (void *));
   size = ROUND (bits_no, BITS_PER_WORD) / BITS_PER_WORD;
   mask = objc_atomic_malloc (size * sizeof (int));
-  memset (mask, 0, size * sizeof (int));

   class_structure_type = objc_atomic_malloc (type_size);
-  *class_structure_type = current = 0;
+  current = 0;
   __objc_class_structure_encoding (class, &class_structure_type,
   &type_size, &current);
   if (current + 1 == type_size)


[PATCH] Combine malloc + memset to calloc

2021-11-12 Thread Seija K. via Gcc-patches
diff --git a/gcc/ada/terminals.c b/gcc/ada/terminals.c
index a2dd4895d48..25d9acda752 100644
--- a/gcc/ada/terminals.c
+++ b/gcc/ada/terminals.c
@@ -609,8 +609,7 @@ __gnat_setup_communication (struct TTY_Process**
process_out) /* output param */
 {
   struct TTY_Process* process;

-  process = (struct TTY_Process*)malloc (sizeof (struct TTY_Process));
-  ZeroMemory (process, sizeof (struct TTY_Process));
+  process = (struct TTY_Process*)calloc (1, sizeof (struct TTY_Process));
   *process_out = process;

   return 0;
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c
b/gcc/config/rs6000/rs6000-gen-builtins.c
index 1655a2fd765..2c895a2d9a9 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -1307,8 +1307,7 @@ parse_args (prototype *protoptr)
   do {
 consume_whitespace ();
 int oldpos = pos;
-typelist *argentry = (typelist *) malloc (sizeof (typelist));
-memset (argentry, 0, sizeof *argentry);
+typelist *argentry = (typelist *) calloc (1, sizeof (typelist));
+typeinfo *argtype = &argentry->info;
 success = match_type (argtype, VOID_NOTOK);
 if (success)
diff --git a/gcc/d/dmd/ctfeexpr.c b/gcc/d/dmd/ctfeexpr.c
index a8e97833ad0..0086aceef84 100644
--- a/gcc/d/dmd/ctfeexpr.c
+++ b/gcc/d/dmd/ctfeexpr.c
@@ -1350,8 +1350,7 @@ int ctfeRawCmp(Loc loc, Expression *e1, Expression
*e2)
 if (es2->keys->length != dim)
 return 1;

-bool *used = (bool *)mem.xmalloc(sizeof(bool) * dim);
-memset(used, 0, sizeof(bool) * dim);
+bool *used = (bool *) mem.xcalloc(dim, sizeof(bool));

 for (size_t i = 0; i < dim; ++i)
 {
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..f5bff8b9441 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3081,9 +3081,16 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  0).exists ())
  {
   unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
-  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
-  memset (buf, (init_type == AUTO_INIT_PATTERN
- ? INIT_PATTERN_VALUE : 0), total_bytes);
+  unsigned char *buf;
+if (init_type == AUTO_INIT_PATTERN)
+  {
+buf = (unsigned char *) xmalloc (total_bytes);
+memset (buf, INIT_PATTERN_VALUE, total_bytes);
+  }
+else
+  {
+buf = (unsigned char *) xcalloc (1, total_bytes);
+  }
   tree itype = build_nonstandard_integer_type
  (total_bytes * BITS_PER_UNIT, 1);
   wide_int w = wi::from_buffer (buf, total_bytes);
diff --git a/libiberty/calloc.c b/libiberty/calloc.c
index f4bd27b1cd2..1ef4156d28a 100644
--- a/libiberty/calloc.c
+++ b/libiberty/calloc.c
@@ -17,7 +17,7 @@ Uses @code{malloc} to allocate storage for @var{nelem}
objects of

 /* For systems with larger pointers than ints, this must be declared.  */
 PTR malloc (size_t);
-void bzero (PTR, size_t);
+void memset (PTR, int, size_t);

 PTR
 calloc (size_t nelem, size_t elsize)
@@ -28,7 +28,7 @@ calloc (size_t nelem, size_t elsize)
 nelem = elsize = 1;

   ptr = malloc (nelem * elsize);
-  if (ptr) bzero (ptr, nelem * elsize);
+  if (ptr) memset (ptr, 0, nelem * elsize);

   return ptr;
 }
diff --git a/libiberty/partition.c b/libiberty/partition.c
index 81e5fc0f79a..75512d67258 100644
--- a/libiberty/partition.c
+++ b/libiberty/partition.c
@@ -146,8 +146,7 @@ partition_print (partition part, FILE *fp)
   int e;

   /* Flag the elements we've already printed.  */
-  done = (char *) xmalloc (num_elements);
-  memset (done, 0, num_elements);
+  done = (char *) xcalloc (num_elements, 1);

   /* A buffer used to sort elements in a class.  */
   class_elements = (int *) xmalloc (num_elements * sizeof (int));
diff --git a/libobjc/gc.c b/libobjc/gc.c
index 57895e61930..95a75f5cb2e 100644
--- a/libobjc/gc.c
+++ b/libobjc/gc.c
@@ -307,10 +307,9 @@ __objc_generate_gc_type_description (Class class)
  / sizeof (void *));
   size = ROUND (bits_no, BITS_PER_WORD) / BITS_PER_WORD;
   mask = objc_atomic_malloc (size * sizeof (int));
-  memset (mask, 0, size * sizeof (int));

   class_structure_type = objc_atomic_malloc (type_size);
-  *class_structure_type = current = 0;
+  current = 0;
   __objc_class_structure_encoding (class, &class_structure_type,
   &type_size, &current);
   if (current + 1 == type_size)


Re: [PATCH] fixincludes: fix portability issues about getcwd() [PR21283, PR80047]

2021-11-12 Thread Bruce Korb via Gcc-patches
If you are going to be excruciatingly, painfully correct, free() is 
going to be unhappy about freeing a static string in the event getcwd() 
fails for some inexplicable reason. I'd replace the free() + return with 
a call to exit. Maybe even:


   if (VERY_UNLIKELY (access (pz_curr_file, R_OK) != 0)) abort()

On 11/11/21 8:33 AM, Xi Ruoyao wrote:

---
  fixincludes/fixincl.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 6dba2f6e830..1580c67efec 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -1353,9 +1353,18 @@ process (void)
if (access (pz_curr_file, R_OK) != 0)
  {
int erno = errno;
+  char *buf = NULL;
+  const char *cwd = NULL;
+  for (size_t size = 256; !cwd; size += size)
+   {
+ buf = xrealloc (buf, size);
+ cwd = getcwd (buf, size);
+ if (!cwd && errno != ERANGE)
+   cwd = "the working directory";
+   }
fprintf (stderr, "Cannot access %s from %s\n\terror %d (%s)\n",
-   pz_curr_file, getcwd ((char *) NULL, MAXPATHLEN),
-   erno, xstrerror (erno));
+  pz_curr_file, cwd, erno, xstrerror (erno));
+ free (buf); return;
  }
  


Re: [PATCH] PR fortran/102368 - Failure to compile program using the C_SIZEOF function in ISO_C_BINDING

2021-11-12 Thread Harald Anlauf via Gcc-patches

Hi Bernhard,

On 12.11.21 at 21:18, Bernhard Reutner-Fischer via Fortran wrote:

On Fri, 12 Nov 2021 18:39:48 +0100
Harald Anlauf via Fortran  wrote:

Sounds plausible.


this is what I thought, too.  And nvfortran and flang accept the
testcase, as well as crayftn (cce/12.0.2).

Intel accepts the first case (a), but rejects the second (b).
I asked in the Intel forum.  Steve Lionel doubts that the code is
valid.

There might be some confusion on my side, but having Cray on my
side feels good.  (Although the PR was entered into bugzilla by
a Cray employee).


Nits:


diff --git a/gcc/testsuite/gfortran.dg/c_sizeof_7.f90 
b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90
new file mode 100644
index 000..3cfa3371f72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90


[I'd name this .f08, no?]


@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-options "-std=f2008 -fdump-tree-original" }


[and drop the -std]


+! { dg-final { scan-tree-dump-times "_gfortran_stop_numeric" 0 "original" } }


[ ...-times 0 == scan-tree-dump-not ]


Good point.


+! PR fortran/102368
+
+program main
+  use, intrinsic :: iso_c_binding
+  implicit none
+  character(kind=c_char, len=*), parameter :: a = 'abc'
+  character(kind=c_char, len=8):: b


character(kind=c_char, len=-42) :: c ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=-0) :: d ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=0) :: e ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=+0) :: f ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=0.0d) :: g ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=3.) :: h ! { dg-error "positive integer greater than 
0" }
character(kind=c_char, len=.031415e2) :: i ! { dg-error "positive integer greater 
than 0" }
...
are caught elsewhere if one assumes that len should be a positive int > 0 
(didn't look)
Also did not look if
character(kind=c_char, len=SELECTED_REAL_KIND(10)) :: j ! is that constant? 
Should it be?


These things should already be handled in general and
elsewhere, as they are not about interoperability.


+  if (c_sizeof (a) /= 3) stop 1
+  if (c_sizeof (b) /= 8) stop 2


indeed.
cheers,


+end program main
--
2.26.2





Thanks,
Harald





[PATCH] Combine malloc + memset to calloc

2021-11-12 Thread Seija K. via Gcc-patches
diff --git a/gcc/ada/terminals.c b/gcc/ada/terminals.c
index a2dd4895d48..25d9acda752 100644
--- a/gcc/ada/terminals.c
+++ b/gcc/ada/terminals.c
@@ -609,8 +609,7 @@ __gnat_setup_communication (struct TTY_Process**
process_out) /* output param */
 {
   struct TTY_Process* process;

-  process = (struct TTY_Process*)malloc (sizeof (struct TTY_Process));
-  ZeroMemory (process, sizeof (struct TTY_Process));
+  process = (struct TTY_Process*)calloc (1, sizeof (struct TTY_Process));
   *process_out = process;

   return 0;
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c
b/gcc/config/rs6000/rs6000-gen-builtins.c
index 1655a2fd765..2c895a2d9a9 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -1307,8 +1307,7 @@ parse_args (prototype *protoptr)
   do {
 consume_whitespace ();
 int oldpos = pos;
-typelist *argentry = (typelist *) malloc (sizeof (typelist));
-memset (argentry, 0, sizeof *argentry);
+typelist *argentry = (typelist *) calloc (1, sizeof (typelist));
+typeinfo *argtype = &argentry->info;
 success = match_type (argtype, VOID_NOTOK);
 if (success)
diff --git a/gcc/d/dmd/ctfeexpr.c b/gcc/d/dmd/ctfeexpr.c
index a8e97833ad0..401ed748f43 100644
--- a/gcc/d/dmd/ctfeexpr.c
+++ b/gcc/d/dmd/ctfeexpr.c
@@ -1350,8 +1350,7 @@ int ctfeRawCmp(Loc loc, Expression *e1, Expression
*e2)
 if (es2->keys->length != dim)
 return 1;

-bool *used = (bool *)mem.xmalloc(sizeof(bool) * dim);
-memset(used, 0, sizeof(bool) * dim);
+bool *used = (bool *)mem.xcalloc(dim, sizeof(bool));

 for (size_t i = 0; i < dim; ++i)
 {
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..f5bff8b9441 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3081,9 +3081,16 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  0).exists ())
  {
   unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
-  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
-  memset (buf, (init_type == AUTO_INIT_PATTERN
- ? INIT_PATTERN_VALUE : 0), total_bytes);
+  unsigned char *buf;
+if (init_type == AUTO_INIT_PATTERN)
+  {
+buf = (unsigned char *) xmalloc (total_bytes);
+memset (buf, INIT_PATTERN_VALUE, total_bytes);
+  }
+else
+  {
+buf = (unsigned char *) xcalloc (1, total_bytes);
+  }
   tree itype = build_nonstandard_integer_type
  (total_bytes * BITS_PER_UNIT, 1);
   wide_int w = wi::from_buffer (buf, total_bytes);
diff --git a/libiberty/calloc.c b/libiberty/calloc.c
index f4bd27b1cd2..1ef4156d28a 100644
--- a/libiberty/calloc.c
+++ b/libiberty/calloc.c
@@ -17,7 +17,7 @@ Uses @code{malloc} to allocate storage for @var{nelem}
objects of

 /* For systems with larger pointers than ints, this must be declared.  */
 PTR malloc (size_t);
-void bzero (PTR, size_t);
+void memset (PTR, int, size_t);

 PTR
 calloc (size_t nelem, size_t elsize)
@@ -28,7 +28,7 @@ calloc (size_t nelem, size_t elsize)
 nelem = elsize = 1;

   ptr = malloc (nelem * elsize);
-  if (ptr) bzero (ptr, nelem * elsize);
+  if (ptr) memset (ptr, 0, nelem * elsize);

   return ptr;
 }
diff --git a/libiberty/partition.c b/libiberty/partition.c
index 81e5fc0f79a..75512d67258 100644
--- a/libiberty/partition.c
+++ b/libiberty/partition.c
@@ -146,8 +146,7 @@ partition_print (partition part, FILE *fp)
   int e;

   /* Flag the elements we've already printed.  */
-  done = (char *) xmalloc (num_elements);
-  memset (done, 0, num_elements);
+  done = (char *) xcalloc (num_elements, 1);

   /* A buffer used to sort elements in a class.  */
   class_elements = (int *) xmalloc (num_elements * sizeof (int));
diff --git a/libobjc/gc.c b/libobjc/gc.c
index 57895e61930..95a75f5cb2e 100644
--- a/libobjc/gc.c
+++ b/libobjc/gc.c
@@ -307,10 +307,9 @@ __objc_generate_gc_type_description (Class class)
  / sizeof (void *));
   size = ROUND (bits_no, BITS_PER_WORD) / BITS_PER_WORD;
   mask = objc_atomic_malloc (size * sizeof (int));
-  memset (mask, 0, size * sizeof (int));

   class_structure_type = objc_atomic_malloc (type_size);
-  *class_structure_type = current = 0;
+  current = 0;
   __objc_class_structure_encoding (class, &class_structure_type,
   &type_size, &current);
   if (current + 1 == type_size)


[PATCH] libstdc++: Use GCC_TRY_COMPILE_OR_LINK for getentropy, arc4random

2021-11-12 Thread Hans-Peter Nilsson via Gcc-patches
Since r12-5056-g3439657b0286, there has been a regression in
test results; an additional 100 FAILs running the g++ and
libstdc++ testsuite on cris-elf, a newlib target.  The
failures are linker errors, not finding a definition for
getentropy.  It appears newlib has since 2017-12-03
declarations of getentropy and arc4random, and provides an
implementation of arc4random using getentropy, but provides no
definition of getentropy, not even a stub yielding ENOSYS.
This is similar to what it does for many other functions too.
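
For illustration, such a stub could look roughly like this (an assumption
about what newlib might add, not actual newlib code):

#include <stddef.h>
#include <errno.h>

/* A definition that satisfies the linker but fails at run time.  */
int
getentropy (void *buffer, size_t length)
{
  (void) buffer;
  (void) length;
  errno = ENOSYS;
  return -1;
}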

While fixing newlib (like adding said stub) would likely help,
it still leaves older newlib releases hanging.  Thankfully,
the libstdc++ configury test can be improved to try linking
where possible; using the bespoke GCC_TRY_COMPILE_OR_LINK
instead of AC_TRY_COMPILE.  BTW, I see a lack of consistency;
some tests use AC_TRY_COMPILE and some GCC_TRY_COMPILE_OR_LINK
for no apparent reason, but this commit just amends
r12-5056-g3439657b0286.

Testing for cris-elf is underway and the log says so far the
related regressions are fixed.  Ok to commit?

libstdc++-v3:
PR libstdc++/103166
* acinclude.m4 (GLIBCXX_CHECK_GETENTROPY, GLIBCXX_CHECK_ARC4RANDOM):
Use GCC_TRY_COMPILE_OR_LINK instead of AC_TRY_COMPILE.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 |  4 ++--
 libstdc++-v3/configure| 53 +--
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 4adfdf646acb..30bd92d37f23 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4839,7 +4839,7 @@ AC_DEFUN([GLIBCXX_CHECK_GETENTROPY], [
   AC_LANG_CPLUSPLUS
   AC_MSG_CHECKING([for getentropy])
   AC_CACHE_VAL(glibcxx_cv_getentropy, [
-  AC_TRY_COMPILE(
+  GCC_TRY_COMPILE_OR_LINK(
[#include <unistd.h>],
[unsigned i;
 ::getentropy(&i, sizeof(i));],
@@ -4862,7 +4862,7 @@ AC_DEFUN([GLIBCXX_CHECK_ARC4RANDOM], [
   AC_LANG_CPLUSPLUS
   AC_MSG_CHECKING([for arc4random])
   AC_CACHE_VAL(glibcxx_cv_arc4random, [
-  AC_TRY_COMPILE(
+  GCC_TRY_COMPILE_OR_LINK(
[#include <stdlib.h>],
[unsigned i = ::arc4random();],
[glibcxx_cv_arc4random=yes], [glibcxx_cv_arc4random=no])
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 3a572475546f..3eb391f409f2 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -75445,7 +75445,8 @@ $as_echo_n "checking for getentropy... " >&6; }
   $as_echo_n "(cached) " >&6
 else
 
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+  if test x$gcc_no_link = xyes; then
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <unistd.h>
 int
@@ -75463,6 +75464,30 @@ else
   glibcxx_cv_getentropy=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  if test x$gcc_no_link = xyes; then
+  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
+fi
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <unistd.h>
+int
+main ()
+{
+unsigned i;
+::getentropy(&i, sizeof(i));
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  glibcxx_cv_getentropy=yes
+else
+  glibcxx_cv_getentropy=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+fi
 
 fi
 
@@ -75496,7 +75521,8 @@ $as_echo_n "checking for arc4random... " >&6; }
   $as_echo_n "(cached) " >&6
 else
 
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+  if test x$gcc_no_link = xyes; then
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include <stdlib.h>
 int
@@ -75513,6 +75539,29 @@ else
   glibcxx_cv_arc4random=no
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+else
+  if test x$gcc_no_link = xyes; then
+  as_fn_error $? "Link tests are not allowed after GCC_NO_EXECUTABLES." 
"$LINENO" 5
+fi
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <stdlib.h>
+int
+main ()
+{
+unsigned i = ::arc4random();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_link "$LINENO"; then :
+  glibcxx_cv_arc4random=yes
+else
+  glibcxx_cv_arc4random=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+fi
 
 fi
 
-- 
2.11.0



Re: [PATCH] PR fortran/102368 - Failure to compile program using the C_SIZEOF function in ISO_C_BINDING

2021-11-12 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 12 Nov 2021 18:39:48 +0100
Harald Anlauf via Fortran  wrote:

Sounds plausible.
Nits:

> diff --git a/gcc/testsuite/gfortran.dg/c_sizeof_7.f90 
> b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90
> new file mode 100644
> index 000..3cfa3371f72
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90

[I'd name this .f08, no?]

> @@ -0,0 +1,13 @@
> +! { dg-do compile }
> +! { dg-options "-std=f2008 -fdump-tree-original" }

[and drop the -std]

> +! { dg-final { scan-tree-dump-times "_gfortran_stop_numeric" 0 "original" } }

[ ...-times 0 == scan-tree-dump-not ]

> +! PR fortran/102368
> +
> +program main
> +  use, intrinsic :: iso_c_binding
> +  implicit none
> +  character(kind=c_char, len=*), parameter :: a = 'abc'
> +  character(kind=c_char, len=8):: b

character(kind=c_char, len=-42) :: c ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=-0) :: d ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=0) :: e ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=+0) :: f ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=0.0d) :: g ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=3.) :: h ! { dg-error "positive integer greater than 0" }
character(kind=c_char, len=.031415e2) :: i ! { dg-error "positive integer greater than 0" }
...
are caught elsewhere if one assumes that len should be a positive int > 0 
(didn't look)
Also did not look if
character(kind=c_char, len=SELECTED_REAL_KIND(10)) :: j ! is that constant? Should it be?

> +  if (c_sizeof (a) /= 3) stop 1
> +  if (c_sizeof (b) /= 8) stop 2

indeed.
cheers,

> +end program main
> --
> 2.26.2
> 



Re: [COMMITTED] path solver: Solve PHI imports first for ranges.

2021-11-12 Thread Aldy Hernandez via Gcc-patches
On Fri, Nov 12, 2021, 20:50 Richard Biener 
wrote:

> On November 12, 2021 8:46:25 PM GMT+01:00, Aldy Hernandez via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >PHIs must be resolved first while solving ranges in a block,
> >regardless of where they appear in the import bitmap.  We went through
> >a similar exercise for the relational code, but missed these.
>
> Must not all stmts be resolved in program order (for optimality at least)?
>

The recursion takes care of that.  Dependencies get taken care of before the
definitions that need them.  I've yet to see a case where we get it wrong,
even in the presence of loops and interdependencies.  Well, except in the
PHIs, because we should've done them first. :-)
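
For a concrete (invented) example, take a block whose imports are x_3
and y_4:

  # x_3 = PHI <x_1(2), x_2(3)>
  y_4 = x_3 + 1;

A dependency like y_4 on x_3 is handled by the recursion whatever the
bitmap order; the PHIs were the one exception, hence solving them first.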

Aldy


Re: [wwwdocs][PATCH] Document deprecation of OpenMP MIC offloading in GCC 12

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 8:26:40 PM GMT+01:00, "H.J. Lu via Gcc-patches" 
 wrote:
>---
> htdocs/gcc-12/changes.html | 4 
> 1 file changed, 4 insertions(+)

Ok. 

Thanks, 
Richard. 

>diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
>index 5f12fb42..c133786d 100644
>--- a/htdocs/gcc-12/changes.html
>+++ b/htdocs/gcc-12/changes.html
>@@ -49,6 +49,10 @@ a work-in-progress.
> which still supports -std=f95 and is recommended to be used
> instead in general.
>   
>+  
>+OpenMP offloading to Intel MIC has been deprecated and will be removed
>+in a future release.
>+  
>   
> The cr16 target with the cr16-*-* configuration
> has been obsoleted and will be removed in a future release.



Re: [PATCH] Remove dead code.

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 8:18:59 PM GMT+01:00, "H.J. Lu"  
wrote:
>On Fri, Nov 12, 2021 at 11:15 AM Richard Biener
> wrote:
>>
>> On November 12, 2021 3:41:41 PM GMT+01:00, "H.J. Lu via Gcc-patches" 
>>  wrote:
>> >On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
>> >>
>> >> On 11/8/21 15:19, Jeff Law wrote:
>> >> >
>> >> >
>> >> > On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
>> >> >> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
>> >> >>> This fixes issue reported in the PR.
>> >> >>>
>> >> >>> Ready to be installed?
>> >> >> I'm not sure.  liboffloadmic is copied from upstream, so the right
>> >> >> thing if we want to do anything at all (if we don't remove it, nothing
>> >> >> bad happens, the condition is never true anyway, whether removed away
>> >> >> in the source or removed by the compiler) would be to let Intel fix it 
>> >> >> in
>> >> >> their source and update from that.
>> >> >> But I have no idea where it even lives upstream.
>> >> > I thought MIC as an architecture was dead, so it could well be the case 
>> >> > that there isn't a viable upstream anymore for that code.
>> >> >
>> >> > jeff
>> >>
>> >> @H.J. ?
>> >>
>> >
>> >We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
>> >MIC offload in GCC 13.
>>
>> Can you document that in gcc-12/changes.html in the caveats section please?
>>
>
>I will do that.
>
>Can you review my last wwwdocs change:
>
>https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578344.html

That change is OK. 

Richard. 

>Thanks.
>



Re: [COMMITTED] path solver: Solve PHI imports first for ranges.

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 8:46:25 PM GMT+01:00, Aldy Hernandez via Gcc-patches 
 wrote:
>PHIs must be resolved first while solving ranges in a block,
>regardless of where they appear in the import bitmap.  We went through
>a similar exercise for the relational code, but missed these.

Must not all stmts be resolved in program order (for optimality at least)? 

>Tested on x86-64 & ppc64le Linux.
>
>gcc/ChangeLog:
>
>   PR tree-optimization/103202
>   * gimple-range-path.cc
>   (path_range_query::compute_ranges_in_block): Solve PHI imports first.
>---
> gcc/gimple-range-path.cc | 15 +--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
>diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
>index b9aceaf2565..71b290434cb 100644
>--- a/gcc/gimple-range-path.cc
>+++ b/gcc/gimple-range-path.cc
>@@ -365,12 +365,23 @@ path_range_query::compute_ranges_in_block (basic_block bb)
>   clear_cache (name);
> }
> 
>-  // Solve imports defined in this block.
>+  // Solve imports defined in this block, starting with the PHIs...
>+  for (gphi_iterator iter = gsi_start_phis (bb); !gsi_end_p (iter);
>+   gsi_next (&iter))
>+{
>+  gphi *phi = iter.phi ();
>+  tree name = gimple_phi_result (phi);
>+
>+  if (import_p (name) && range_defined_in_block (r, name, bb))
>+  set_cache (r, name);
>+}
>+  // ...and then the rest of the imports.
>   EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
> {
>   tree name = ssa_name (i);
> 
>-  if (range_defined_in_block (r, name, bb))
>+  if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
>+&& range_defined_in_block (r, name, bb))
>   set_cache (r, name);
> }
> 



PING: [PATCH] rs6000: MMA test case emits wrong code when building a vector pair

2021-11-12 Thread Peter Bergner via Gcc-patches
I'd like to ping the following patch.

Peter


On 10/27/21 8:37 PM, Peter Bergner via Gcc-patches wrote:
> PR102976 shows a test case where we generate wrong code when building
> a vector pair from 2 vector registers.  The bug here is that with unlucky
> register assignments, we can clobber one of the input operands before
> we write both registers of the output operand.  The solution is to use
> early-clobbers in the assemble pair and accumulator patterns.
> 
> This passed bootstrap and regtesting with no regressions and our
> OpenBLAS team has confirmed it fixes the issues they reported.
> Ok for mainline?
> 
> Ok for GCC 11 too after a few days on trunk?
> 
> Peter
> 
> 
> gcc/
>   PR target/102976
>   * config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for
>   output operand.
>   (*mma_assemble_acc): Likewise.
> 
> gcc/testsuite/
>   PR target/102976
>   * gcc.target/powerpc/pr102976.c: New test.
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 1990a2183f6..f0ea99963f7 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair"
>  })
>  
>  (define_insn_and_split "*vsx_assemble_pair"
> -  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
> +  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
>   (unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
>   (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")]
>   UNSPEC_MMA_ASSEMBLE))]
> @@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc"
>  })
>  
>  (define_insn_and_split "*mma_assemble_acc"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
>   (unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
>   (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")
>   (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c
> new file mode 100644
> index 000..a8de8f056f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
> @@ -0,0 +1,14 @@
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +
> +#include <altivec.h>
> +void
> +bug (__vector_pair *dst)
> +{
> +  register vector unsigned char vec0 asm ("vs44");
> +  register vector unsigned char vec1 asm ("vs32");
> +  __builtin_vsx_build_pair (dst, vec0, vec1);
> +}
> +
> +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
> +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */
> 



Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 08:47:09PM +0100, Jakub Jelinek wrote:
> The problem is that the argument of the num_teams clause isn't always known
> before target is launched.

It was a design mistake to put the clause on teams rather than on target
(well, for host teams we need it on teams), and 5.1 actually partially
fixes this up for thread_limit by allowing that clause on both, but not
for num_teams.

Jakub



Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 10:16:11PM +0300, Alexander Monakov wrote:
> I suspect there may be a misunderstanding here, or maybe your explanation is
> incomplete. I don't think the intention of the standard was to force such
> complexity. You can launch as many blocks on the GPU as you like, limited only
> by the bitwidth of the indexing register used in hardware, NVIDIA guarantees
> at least INT_MAX blocks (in fact almost 1<<63 blocks if you launch a
> three-dimensional grid with INT_MAX x 65535 x 65535 blocks).
> 
> The hardware will schedule blocks automatically (so for example if the 
> hardware
> can run 40 blocks simultaneously and you launch 100, the hardware may launch
> blocks 0 to 39, then when one of those finishes it will launch the 40'th block
> and so on).
> 
> So isn't the solution simply to adjust the logic around
> nvptx_adjust_launch_bounds in GOMP_OFFLOAD_run, that is, if there's a lower
> bound specified, use it instead of what adjust_launch_bounds is computing as
> max_blocks?

The problem is that the argument of the num_teams clause isn't always known
before target is launched.
While gimplify.c tries hard to figure it out as often as possible and the
standard makes it easy for the combined target teams case where we say
that the expressions in the num_teams/thread_limit clauses are evaluated on
the host before the target construct - in that case the plugin is told the
expected number and unless CUDA decides to allocate fewer than requested,
we are fine, there are cases where target is not combined with teams where
per the spec the expressions need to be evaluated on the target, not on the
host (gimplify still tries to optimize some of those cases by e.g. seeing if
it is some simple arithmetic expression where all the vars would be
firstprivatized), and in that case we create some default number of CTAs and
only later on find out what the user asked for.
extern int foo (void);
#pragma omp declare target to (foo)
void bar (void)
{
  #pragma omp target
  #pragma omp teams num_teams (foo ())
  ;
}
is such a case, we simply don't know and foo () needs to be called in
target.  In OpenMP 5.0 we had the option to always create fewer teams if
we decided so (of course at least 1), but in 5.1 we don't have that option,
if there is just one expression, we need to create exactly that many teams,
if it is num_teams (foo () - 10 : foo () + 10), we need to be within that
range (inclusive).

Jakub



[COMMITTED] path solver: Solve PHI imports first for ranges.

2021-11-12 Thread Aldy Hernandez via Gcc-patches
PHIs must be resolved first while solving ranges in a block,
regardless of where they appear in the import bitmap.  We went through
a similar exercise for the relational code, but missed these.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

PR tree-optimization/103202
* gimple-range-path.cc
(path_range_query::compute_ranges_in_block): Solve PHI imports first.
---
 gcc/gimple-range-path.cc | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index b9aceaf2565..71b290434cb 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -365,12 +365,23 @@ path_range_query::compute_ranges_in_block (basic_block bb)
clear_cache (name);
 }
 
-  // Solve imports defined in this block.
+  // Solve imports defined in this block, starting with the PHIs...
+  for (gphi_iterator iter = gsi_start_phis (bb); !gsi_end_p (iter);
+   gsi_next (&iter))
+{
+  gphi *phi = iter.phi ();
+  tree name = gimple_phi_result (phi);
+
+  if (import_p (name) && range_defined_in_block (r, name, bb))
+   set_cache (r, name);
+}
+  // ...and then the rest of the imports.
   EXECUTE_IF_SET_IN_BITMAP (m_imports, 0, i, bi)
 {
   tree name = ssa_name (i);
 
-  if (range_defined_in_block (r, name, bb))
+  if (gimple_code (SSA_NAME_DEF_STMT (name)) != GIMPLE_PHI
+ && range_defined_in_block (r, name, bb))
set_cache (r, name);
 }
 
-- 
2.31.1



[wwwdocs][PATCH] Document deprecation of OpenMP MIC offloading in GCC 12

2021-11-12 Thread H.J. Lu via Gcc-patches
---
 htdocs/gcc-12/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 5f12fb42..c133786d 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -49,6 +49,10 @@ a work-in-progress.
 which still supports -std=f95 and is recommended to be used
 instead in general.
   
+  
+OpenMP offloading to Intel MIC has been deprecated and will be removed
+in a future release.
+  
   
 The cr16 target with the cr16-*-* configuration
 has been obsoleted and will be removed in a future release.
-- 
2.33.1



Re: [PATCH] Remove dead code.

2021-11-12 Thread H.J. Lu via Gcc-patches
On Fri, Nov 12, 2021 at 11:15 AM Richard Biener
 wrote:
>
> On November 12, 2021 3:41:41 PM GMT+01:00, "H.J. Lu via Gcc-patches" 
>  wrote:
> >On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
> >>
> >> On 11/8/21 15:19, Jeff Law wrote:
> >> >
> >> >
> >> > On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
> >> >> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
> >> >>> This fixes issue reported in the PR.
> >> >>>
> >> >>> Ready to be installed?
> >> >> I'm not sure.  liboffloadmic is copied from upstream, so the right
> >> >> thing if we want to do anything at all (if we don't remove it, nothing
> >> >> bad happens, the condition is never true anyway, whether removed away
> >> >> in the source or removed by the compiler) would be to let Intel fix it 
> >> >> in
> >> >> their source and update from that.
> >> >> But I have no idea where it even lives upstream.
> >> > I thought MIC as an architecture was dead, so it could well be the case 
> >> > that there isn't a viable upstream anymore for that code.
> >> >
> >> > jeff
> >>
> >> @H.J. ?
> >>
> >
> >We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
> >MIC offload in GCC 13.
>
> Can you document that in gcc-12/changes.html in the caveats section please?
>

I will do that.

Can you review my last wwwdocs change:

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578344.html

Thanks.

-- 
H.J.


Re: Enable pure/const discovery in modref

2021-11-12 Thread Jan Hubicka via Gcc-patches
> Hi Honza,
> 
> On Thu, 11 Nov 2021 17:39:18 +0100
> Jan Hubicka via Gcc-patches  wrote:
> 
> > diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
> > index 422b52fba4b..550bdeded16 100644
> > --- a/gcc/ipa-pure-const.c
> > +++ b/gcc/ipa-pure-const.c
> > @@ -1513,6 +1611,9 @@ propagate_pure_const (void)
> >   enum pure_const_state_e edge_state = IPA_CONST;
> >   bool edge_looping = false;
> >  
> > + if (e->recursive_p ())
> > +   looping = true;
> > +
> >   if (e->recursive_p ())
> > looping = true;
> >  
> 
> This seems redundant, no?

Yes, artifact of breaking up the patch :(
I also noticed that I mixed up the looping flags (there are two variables,
one called looping and the other this_looping; the second is the one I
should have used and the first is the one I actually used).

Fixed as follows.


gcc/ChangeLog:

* ipa-pure-const.c (propagate_pure_const): Remove redundant check;
fix call of ipa_make_function_const and ipa_make_function_pure.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index b831844afa6..5056850c0a8 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1611,9 +1611,6 @@ propagate_pure_const (void)
  enum pure_const_state_e edge_state = IPA_CONST;
  bool edge_looping = false;
 
- if (e->recursive_p ())
-   looping = true;
-
  if (e->recursive_p ())
looping = true;
 
@@ -1800,11 +1797,11 @@ propagate_pure_const (void)
switch (this_state)
  {
  case IPA_CONST:
-   remove_p |= ipa_make_function_const (node, looping, false);
+   remove_p |= ipa_make_function_const (node, this_looping, false);
break;
 
  case IPA_PURE:
-   remove_p |= ipa_make_function_pure (node, looping, false);
+   remove_p |= ipa_make_function_pure (node, this_looping, false);
break;
 
  default:


Re: [PATCH] Replace more DEBUG_EXPR_DECL creations with build_debug_expr_decl

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 3:39:56 PM GMT+01:00, Martin Jambor  
wrote:
>Hi,
>
>On Tue, Nov 09 2021, Richard Biener wrote:
>> On Mon, 8 Nov 2021, Martin Jambor wrote:
>>> this patch introduces a helper function build_debug_expr_decl to build
>>> DEBUG_EXPR_DECL tree nodes in the most common way and replaces with a
>>> call of this function all code pieces which build such a DECL itself
>>> and sets its mode to the TYPE_MODE of its type.
>>> 
>>> There still remain 11 instances of open-coded creation of a
>>> DEBUG_EXPR_DECL which set the mode of the DECL to something else.  It
>>> would probably be a good idea to figure out that has any effect and if
>>> not, convert them to calls of build_debug_expr_decl too.  But this
>>> patch deliberately does not introduce any functional changes.
>>> 
>>> Bootstrapped and tested on x86_64-linux, OK for trunk?
>>
>> OK (the const_tree suggestion is a good one).
>>
>> For the remaining cases I'd simply use
>>
>> decl = build_debug_expr_decl (type);
>> SET_DECL_MODE (decl) = ...;
>>
>> and thus override the mode afterwards, maybe adding a comment to
>> check whether that's necessary.  As said, the only case where it
>> might matter is when we create a debug decl replacement for a FIELD_DECL,
>> so maybe for those SRA things we create for DWARF "piece" info?
>>
>
>Like this?  This patch replaces all but one remaining open coded
>constructions of DEBUG_EXPR_DECL with calls to build_debug_expr_decl,
>even if - in order not to introduce any functional change - the mode of
>the constructed decl is then overwritten.
>
>It is not clear if changing the mode has any effect in practice and
>therefore I have added a FIXME note to code which does it, as
>requested.
>
>After this patch, DEBUG_EXPR_DECLs are created only by
>build_debug_expr_decl and make_debug_expr_from_rtl which looks like
>it should be left alone.
>
>Bootstrapped and tested on x86_64-linux.  OK for trunk?

Yes. 

Thanks, 
Richard. 

>I have also compared the generated DWARF (with readelf -w) of cc1plus
>generated by a compiler with this patch and one with the mode setting
>removed (on top of this patch) and there were no differences
>whatsoever.  So perhaps we can just remove it?  I have not
>bootstrapped that patch yet, though.

I guess that for one case it mattered (and we might have a testcase to show
that), and the other cases were just cut and pasted from the "wrong" place... 

Richard. 

>Thanks,
>
>Martin
>
>
>gcc/ChangeLog:
>
>2021-11-11  Martin Jambor  
>
>   * cfgexpand.c (expand_gimple_basic_block): Use build_debug_expr_decl,
>   add a fixme note about the mode assignment perhaps being unnecessary.
>   * ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
>   Likewise.
>   (ipa_param_body_adjustments::mark_dead_statements): Likewise.
>   (ipa_param_body_adjustments::reset_debug_stmts): Likewise.
>   * tree-inline.c (remap_ssa_name): Likewise.
>   (tree_function_versioning): Likewise.
>   * tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
>   * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
>   * tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
>---
> gcc/cfgexpand.c  |  5 ++---
> gcc/ipa-param-manipulation.c | 17 +++--
> gcc/tree-inline.c| 17 +++--
> gcc/tree-into-ssa.c  |  7 +++
> gcc/tree-ssa-loop-ivopts.c   |  5 ++---
> gcc/tree-ssa.c   |  5 ++---
> 6 files changed, 23 insertions(+), 33 deletions(-)
>
>diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>index 55ff75bd78e..eb6466f4be6 100644
>--- a/gcc/cfgexpand.c
>+++ b/gcc/cfgexpand.c
>@@ -5898,18 +5898,17 @@ expand_gimple_basic_block (basic_block bb, bool 
>disable_tail_calls)
>  temporary.  */
>   gimple *debugstmt;
>   tree value = gimple_assign_rhs_to_tree (def);
>-  tree vexpr = make_node (DEBUG_EXPR_DECL);
>+  tree vexpr = build_debug_expr_decl (TREE_TYPE (value));
>   rtx val;
>   machine_mode mode;
> 
>   set_curr_insn_location (gimple_location (def));
> 
>-  DECL_ARTIFICIAL (vexpr) = 1;
>-  TREE_TYPE (vexpr) = TREE_TYPE (value);
>   if (DECL_P (value))
> mode = DECL_MODE (value);
>   else
> mode = TYPE_MODE (TREE_TYPE (value));
>+  /* FIXME: Is setting the mode really necessary? */
>   SET_DECL_MODE (vexpr, mode);
> 
>   val = gen_rtx_VAR_LOCATION
>diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
>index ae3149718ca..a230735d71e 100644
>--- a/gcc/ipa-param-manipulation.c
>+++ b/gcc/ipa-param-manipulation.c
>@@ -831,9 +831,8 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
> }
> if (ddecl == NULL)
>   {
>-ddecl = make_node (DEBUG_EXPR_DECL);
>-

Re: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Alexander Monakov via Gcc-patches
Hello Jakub,

On Fri, 12 Nov 2021, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Nov 12, 2021 at 02:27:16PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches 
> > wrote:
> > > This patch assumes that .shared variables are initialized to 0,
> > > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists
> > > in Table 7. .shared as non-initializable.  If that isn't the case,
> > > we need to initialize it somewhere for the case of #pragma omp target
> > > without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ?
> > 
> > A quick look at libgcc/config/nvptx/crt0.c shows the target supports
> > __attribute__((shared)), so perhaps either following instead, or, if
> > .shared isn't preinitialized to zero, defining the variable in
> > libgcc/config/nvptx/crt0.c , adding there __gomp_team_num = 0;
> > and adding extern keyword before int __gomp_team_num 
> > __attribute__((shared));
> > in libgomp/config/nvptx/target.c.
> 
> And finally here is a third version, which fixes a typo in the previous
> patch (in instead of int) and actually initializes the shared var because
> PTX documentation doesn't say anything about how the shared vars are
> initialized.
> 
> Tested on x86_64-linux with nvptx-none offloading, ok for trunk?

I suspect there may be a misunderstanding here, or maybe your explanation is
incomplete. I don't think the intention of the standard was to force such
complexity. You can launch as many blocks on the GPU as you like, limited only
by the bitwidth of the indexing register used in hardware, NVIDIA guarantees
at least INT_MAX blocks (in fact almost 1<<63 blocks if you launch a
three-dimensional grid with INT_MAX x 65535 x 65535 blocks).

The hardware will schedule blocks automatically (so for example if the hardware
can run 40 blocks simultaneously and you launch 100, the hardware may launch
blocks 0 to 39, then when one of those finishes it will launch the 40'th block
and so on).
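
In CUDA terms, a minimal sketch of the same point (illustrative only,
not from the original mail):

  // Oversubscribing the device is fine: blocks beyond what fits are
  // scheduled by hardware as earlier ones retire.
  __global__ void kernel (int *out) { out[blockIdx.x] = blockIdx.x; }

  int main ()
  {
    int *out;
    cudaMalloc (&out, 100 * sizeof (int));
    kernel<<<100, 1>>> (out);  // 100 blocks even if only ~40 run at once
    cudaDeviceSynchronize ();
    return 0;
  }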

So isn't the solution simply to adjust the logic around
nvptx_adjust_launch_bounds in GOMP_OFFLOAD_run, that is, if there's a lower
bound specified, use it instead of what adjust_launch_bounds is computing as
max_blocks?

Yours,
Alexander


Re: [PATCH] Remove dead code.

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 3:41:41 PM GMT+01:00, "H.J. Lu via Gcc-patches" 
 wrote:
>On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
>>
>> On 11/8/21 15:19, Jeff Law wrote:
>> >
>> >
>> > On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
>> >> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
>> >>> This fixes issue reported in the PR.
>> >>>
>> >>> Ready to be installed?
>> >> I'm not sure.  liboffloadmic is copied from upstream, so the right
>> >> thing if we want to do anything at all (if we don't remove it, nothing
>> >> bad happens, the condition is never true anyway, whether removed away
>> >> in the source or removed by the compiler) would be to let Intel fix it in
>> >> their source and update from that.
>> >> But I have no idea where it even lives upstream.
>> > I thought MIC as an architecture was dead, so it could well be the case 
>> > that there isn't a viable upstream anymore for that code.
>> >
>> > jeff
>>
>> @H.J. ?
>>
>
>We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
>MIC offload in GCC 13.

Can you document that in gcc-12/changes.html in the caveats section please?

Thanks, 
Richard. 

>



Re: [PATCH 2/5] vect: Use generalised accessors to build SLP nodes

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 6:59:22 PM GMT+01:00, Richard Sandiford via Gcc-patches 
 wrote:
>This patch adds:
>
>- gimple_num_args
>- gimple_arg
>- gimple_arg_ptr
>
>for accessing rhs operands of an assignment, call or PHI.  This is
>similar to the existing gimple_get_lhs.
>
>I guess there's a danger that these routines could be overused,
>such as in cases where gimple_assign_rhs1 etc. would be more
>appropriate.  I think the routines are still worth having though.
>These days, most new operations are added as internal functions rather
>than tree codes, so it's useful to be able to handle assignments and
>calls in a consistent way.
>
>The patch also generalises the way that SLP child nodes map
>to gimple stmt operands.  This is useful for later patches.
>
>Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Nice.

Ok.
Thanks, 
Richard. 

>Richard
>
>
>gcc/
>   * gimple.h (gimple_num_args, gimple_arg, gimple_arg_ptr): New
>   functions.
>   * tree-vect-slp.c (cond_expr_maps, arg2_map): New variables.
>   (vect_get_operand_map): New function.
>   (vect_get_and_check_slp_defs): Fix outdated comment.
>   Use vect_get_operand_map and new gimple argument accessors.
>   (vect_build_slp_tree_2): Likewise.
>---
> gcc/gimple.h|  38 
> gcc/tree-vect-slp.c | 148 +++-
> 2 files changed, 114 insertions(+), 72 deletions(-)
>
>diff --git a/gcc/gimple.h b/gcc/gimple.h
>index 3cde3cde7fe..f7fdefc5362 100644
>--- a/gcc/gimple.h
>+++ b/gcc/gimple.h
>@@ -4692,6 +4692,44 @@ gimple_phi_arg_has_location (const gphi *phi, size_t i)
>   return gimple_phi_arg_location (phi, i) != UNKNOWN_LOCATION;
> }
> 
>+/* Return the number of arguments that can be accessed by gimple_arg.  */
>+
>+static inline unsigned
>+gimple_num_args (const gimple *gs)
>+{
>+  if (auto phi = dyn_cast<const gphi *> (gs))
>+    return gimple_phi_num_args (phi);
>+  if (auto call = dyn_cast<const gcall *> (gs))
>+    return gimple_call_num_args (call);
>+  return gimple_num_ops (as_a <const gassign *> (gs)) - 1;
>+}
>+
>+/* GS must be an assignment, a call, or a PHI.
>+   If it's an assignment, return rhs operand I.
>+   If it's a call, return function argument I.
>+   If it's a PHI, return the value of PHI argument I.  */
>+
>+static inline tree
>+gimple_arg (const gimple *gs, unsigned int i)
>+{
>+  if (auto phi = dyn_cast<const gphi *> (gs))
>+    return gimple_phi_arg_def (phi, i);
>+  if (auto call = dyn_cast<const gcall *> (gs))
>+    return gimple_call_arg (call, i);
>+  return gimple_op (as_a <const gassign *> (gs), i + 1);
>+}
>+
>+/* Return a pointer to gimple_arg (GS, I).  */
>+
>+static inline tree *
>+gimple_arg_ptr (gimple *gs, unsigned int i)
>+{
>+  if (auto phi = dyn_cast<gphi *> (gs))
>+    return gimple_phi_arg_def_ptr (phi, i);
>+  if (auto call = dyn_cast<gcall *> (gs))
>+    return gimple_call_arg_ptr (call, i);
>+  return gimple_op_ptr (as_a <gassign *> (gs), i + 1);
>+}
>+}
> 
> /* Return the region number for GIMPLE_RESX RESX_STMT.  */
> 
>diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>index f4123cf830a..2594ab7607f 100644
>--- a/gcc/tree-vect-slp.c
>+++ b/gcc/tree-vect-slp.c
>@@ -454,15 +454,57 @@ vect_def_types_match (enum vect_def_type dta, enum 
>vect_def_type dtb)
> && (dtb == vect_external_def || dtb == vect_constant_def)));
> }
> 
>+static const int cond_expr_maps[3][5] = {
>+  { 4, -1, -2, 1, 2 },
>+  { 4, -2, -1, 1, 2 },
>+  { 4, -1, -2, 2, 1 }
>+};
>+static const int arg2_map[] = { 1, 2 };
>+
>+/* For most SLP statements, there is a one-to-one mapping between
>+   gimple arguments and child nodes.  If that is not true for STMT,
>+   return an array that contains:
>+
>+   - the number of child nodes, followed by
>+   - for each child node, the index of the argument associated with that node.
>+ The special index -1 is the first operand of an embedded comparison and
>+ the special index -2 is the second operand of an embedded comparison.
>+
>+   SWAP is as for vect_get_and_check_slp_defs.  */
>+
>+static const int *
>+vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
>+{
>+  if (auto assign = dyn_cast<const gassign *> (stmt))
>+{
>+  if (gimple_assign_rhs_code (assign) == COND_EXPR
>+&& COMPARISON_CLASS_P (gimple_assign_rhs1 (assign)))
>+  return cond_expr_maps[swap];
>+}
>+  gcc_assert (!swap);
>+  if (auto call = dyn_cast<const gcall *> (stmt))
>+{
>+  if (gimple_call_internal_p (call))
>+  switch (gimple_call_internal_fn (call))
>+{
>+case IFN_MASK_LOAD:
>+  return arg2_map;
>+
>+default:
>+  break;
>+}
>+}
>+  return nullptr;
>+}
>+
> /* Get the defs for the rhs of STMT (collect them in OPRNDS_INFO), check that
>they are of a valid type and that they match the defs of the first stmt of
>the SLP group (stored in OPRNDS_INFO).  This function tries to match stmts
>-   by swapping operands of STMTS[STMT_NUM] when possible.  Non-zero *SWAP
>-   indicates swap is required for cond_expr stmts.  Specifically, *SWAP
>+   by swapping operands of 

Re: [PATCH 1/5] vect: Use code_helper when building SLP nodes

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 6:57:29 PM GMT+01:00, Richard Sandiford via Gcc-patches 
 wrote:
>This patch uses code_helper to represent the common (and
>alternative) operations when building an SLP node.  It's not
>much of a saving on its own, but it helps with later patches.
>
>Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>gcc/
>   * tree-vect-slp.c (vect_build_slp_tree_1): Use code_helper
>   to record the operations performed by statements, only using
>   CALL_EXPR for things that don't map to built-in or internal
>   functions.  For shifts, require all shift amounts to be equal
>   if optab_vector is not supported but optab_scalar is.
>---
> gcc/tree-vect-slp.c | 77 +++--
> 1 file changed, 26 insertions(+), 51 deletions(-)
>
>diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>index 94c75497495..f4123cf830a 100644
>--- a/gcc/tree-vect-slp.c
>+++ b/gcc/tree-vect-slp.c
>@@ -876,17 +876,13 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
>*swap,
> {
>   unsigned int i;
>   stmt_vec_info first_stmt_info = stmts[0];
>-  enum tree_code first_stmt_code = ERROR_MARK;
>-  enum tree_code alt_stmt_code = ERROR_MARK;
>-  enum tree_code rhs_code = ERROR_MARK;
>-  enum tree_code first_cond_code = ERROR_MARK;
>+  code_helper first_stmt_code = ERROR_MARK;
>+  code_helper alt_stmt_code = ERROR_MARK;
>+  code_helper rhs_code = ERROR_MARK;
>+  code_helper first_cond_code = ERROR_MARK;
>   tree lhs;
>   bool need_same_oprnds = false;
>   tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>-  optab optab;
>-  int icode;
>-  machine_mode optab_op2_mode;
>-  machine_mode vec_mode;
>   stmt_vec_info first_load = NULL, prev_first_load = NULL;
>   bool first_stmt_load_p = false, load_p = false;
>   bool first_stmt_phi_p = false, phi_p = false;
>@@ -966,13 +962,16 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
>*swap,
>   gcall *call_stmt = dyn_cast <gcall *> (stmt);
>   if (call_stmt)
>   {
>-rhs_code = CALL_EXPR;
>+combined_fn cfn = gimple_call_combined_fn (call_stmt);
>+if (cfn != CFN_LAST)
>+  rhs_code = cfn;
>+else
>+  rhs_code = CALL_EXPR;
> 
>-if (gimple_call_internal_p (stmt, IFN_MASK_LOAD))
>+if (cfn == CFN_MASK_LOAD)
>   load_p = true;
>-else if ((gimple_call_internal_p (call_stmt)
>-  && (!vectorizable_internal_fn_p
>-  (gimple_call_internal_fn (call_stmt
>+else if ((internal_fn_p (cfn)
>+  && !vectorizable_internal_fn_p (as_internal_fn (cfn)))
>  || gimple_call_tail_p (call_stmt)
>  || gimple_call_noreturn_p (call_stmt)
>  || gimple_call_chain (call_stmt))
>@@ -1013,32 +1012,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
>*swap,
> || rhs_code == LROTATE_EXPR
> || rhs_code == RROTATE_EXPR)
>   {
>-vec_mode = TYPE_MODE (vectype);
>-
> /* First see if we have a vector/vector shift.  */
>-optab = optab_for_tree_code (rhs_code, vectype,
>- optab_vector);
>-
>-if (!optab
>-|| optab_handler (optab, vec_mode) == CODE_FOR_nothing)
>+if (!directly_supported_p (rhs_code, vectype, optab_vector))
>   {
> /* No vector/vector shift, try for a vector/scalar shift.  */
>-optab = optab_for_tree_code (rhs_code, vectype,
>- optab_scalar);
>-
>-if (!optab)
>-  {
>-if (dump_enabled_p ())
>-  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>-   "Build SLP failed: no optab.\n");
>-if (is_a <bb_vec_info> (vinfo) && i != 0)
>-  continue;
>-/* Fatal mismatch.  */
>-matches[0] = false;
>-return false;
>-  }
>-icode = (int) optab_handler (optab, vec_mode);
>-if (icode == CODE_FOR_nothing)
>+if (!directly_supported_p (rhs_code, vectype, optab_scalar))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>@@ -1050,12 +1028,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
>*swap,
> matches[0] = false;
> return false;
>   }
>-optab_op2_mode = insn_data[icode].operand[2].mode;
>-if (!VECTOR_MODE_P (optab_op2_mode))
>-  {
>-need_same_oprnds = true;
>-first_op1 = gimple_assign_rhs2 (stmt);
>-  }
>+need_same_oprnds = true;
>+

Re: [PATCH] vect: Fix SVE mask_gather_load/store_store tests

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 6:55:47 PM GMT+01:00, Richard Sandiford via Gcc-patches 
 wrote:
>If-conversion now applies rewrite_to_defined_overflow to the
>address calculation in an IFN_MASK_LOAD.  This means that we
>end up with:
>
>cast_base = (uintptr_t) base;
>uncast_sum = cast_base + offset;
>sum = (orig_type *) uncast_sum;
>
>If the target supports IFN_MASK_GATHER_LOAD with pointer-sized
>offsets for the given vectype, we wouldn't look through the sum
>cast and so would needlessly vectorise the uncast_sum addition.
>
>This showed up as several failures in gcc.target/aarch64/sve.
>
>Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>gcc/
>   * tree-vect-data-refs.c (vect_check_gather_scatter): Continue
>   processing conversions if the current offset is a pointer.
>---
> gcc/tree-vect-data-refs.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>index f1d7f01a9ce..888ad72f3a9 100644
>--- a/gcc/tree-vect-data-refs.c
>+++ b/gcc/tree-vect-data-refs.c
>@@ -4139,6 +4139,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
>loop_vec_info loop_vinfo,
> /* Don't include the conversion if the target is happy with
>the current offset type.  */
> if (use_ifn_p
>+&& !POINTER_TYPE_P (TREE_TYPE (off))
> && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
>  masked_p, vectype, memory_type,
>  TREE_TYPE (off), scale, ,



Re: [PATCH] vect: Fix vect_is_reduction

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 6:54:29 PM GMT+01:00, Richard Sandiford via Gcc-patches 
 wrote:
>The current definition of vect_is_reduction (provided for target
>costing) misses some pattern statements.
>
>Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

This now will return true for all stmts on the reduction path (not sure for the 
PHI node though) 

Ok if that's intentional. 

Richard. 

>Richard
>
>
>gcc/
>   * tree-vectorizer.h (vect_is_reduction): Use STMT_VINFO_REDUC_IDX.
>
>gcc/testsuite/
>   * gcc.target/aarch64/sve/cost_model_13.c: New test.
>---
> .../gcc.target/aarch64/sve/cost_model_13.c   | 16 
> gcc/tree-vectorizer.h|  3 +--
> 2 files changed, 17 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c
>
>diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c
>new file mode 100644
>index 000..95f2ce91f80
>--- /dev/null
>+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c
>@@ -0,0 +1,16 @@
>+/* { dg-options "-O3 -mtune=neoverse-v1" } */
>+
>+int
>+f11 (short *restrict x, int n)
>+{
>+  short res = 0;
>+  for (int i = 0; i < n; ++i)
>+res += x[i];
>+  return res;
>+}
>+
>+/* We should use SVE rather than Advanced SIMD.  */
>+/* { dg-final { scan-assembler {\tld1h\tz[0-9]+\.h,} } } */
>+/* { dg-final { scan-assembler {\tadd\tz[0-9]+\.h,} } } */
>+/* { dg-final { scan-assembler-not {\tldr\tq[0-9]+,} } } */
>+/* { dg-final { scan-assembler-not {\tv[0-9]+\.8h,} } } */
>diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>index 0eb13d6cc74..76e81ea546a 100644
>--- a/gcc/tree-vectorizer.h
>+++ b/gcc/tree-vectorizer.h
>@@ -2372,8 +2372,7 @@ vect_is_store_elt_extraction (vect_cost_for_stmt kind, 
>stmt_vec_info stmt_info)
> inline bool
> vect_is_reduction (stmt_vec_info stmt_info)
> {
>-  return (STMT_VINFO_REDUC_DEF (stmt_info)
>-|| VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)));
>+  return STMT_VINFO_REDUC_IDX (stmt_info) >= 0;
> }
> 
> /* If STMT_INFO describes a reduction, return the vect_reduction_type



[committed] analyzer: "__analyzer_dump_state" has no side-effects

2021-11-12 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 72f1c1c452198ba1df6f70959180b201cedc506e.

gcc/analyzer/ChangeLog:
* engine.cc (exploded_node::on_stmt_pre): Return when handling
"__analyzer_dump_state".

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index f21f8e5b78a..b29a21cce30 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1325,8 +1325,11 @@ exploded_node::on_stmt_pre (exploded_graph &eg,
  return;
}
   else if (is_special_named_call_p (call, "__analyzer_dump_state", 2))
-   state->impl_call_analyzer_dump_state (call, eg.get_ext_state (),
- ctxt);
+   {
+ state->impl_call_analyzer_dump_state (call, eg.get_ext_state (),
+   ctxt);
+ return;
+   }
   else if (is_setjmp_call_p (call))
{
  state->m_region_model->on_setjmp (call, this, ctxt);
-- 
2.26.3



Re: [PATCH] vect: Pass mode to gather/scatter tests

2021-11-12 Thread Richard Biener via Gcc-patches
On November 12, 2021 6:53:04 PM GMT+01:00, Richard Sandiford via Gcc-patches 
 wrote:
>vect_check_gather_scatter had a binary “does this target support
>internal gather/scatter functions” test.  This dates from the time when
>we only handled gathers and scatters via direct target support, with
>x86_64 using built-in functions and aarch64 using IFNs.  But now that we
>can emulate gathers, we need to check whether the gather for a particular
>mode is going to be emulated or not.
>
>Without this, enabling SVE regresses emulated Advanced SIMD gather
>sequences in cases where SVE isn't used.
>
>Livermore kernel 15 can now be vectorised with Advanced SIMD when
>SVE is enabled.
>
>Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok. 

Richard. 

>Richard
>
>
>gcc/
>   * genopinit.c (main): Turn supports_vec_gather_load and
>   supports_vec_scatter_store into signed char arrays and remove
>   supports_vec_gather_load_cached and supports_vec_scatter_store_cached.
>   * optabs-query.c (supports_vec_convert_optab_p): Add a mode parameter.
>   If the mode is not VOIDmode, test only for that mode.
>   (supports_vec_gather_load_p): Likewise.
>   (supports_vec_scatter_store_p): Likewise.
>   * optabs-query.h (supports_vec_gather_load_p): Likewise.
>   (supports_vec_scatter_store_p): Likewise.
>   * tree-vect-data-refs.c (vect_check_gather_scatter): Pass the
>   vector mode to supports_vec_gather_load_p and
>   supports_vec_scatter_store_p.
>
>gcc/testsuite/
>   * gfortran.dg/vect/vect-8.f90: Bump number of vectorized loops
>   to 25 for SVE.
>   * gcc.target/aarch64/sve/gather_load_10.c: New test.
>---
> gcc/genopinit.c   | 11 ++--
> gcc/optabs-query.c| 55 +--
> gcc/optabs-query.h|  4 +-
> .../gcc.target/aarch64/sve/gather_load_10.c   | 18 ++
> gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  3 +-
> gcc/tree-vect-data-refs.c |  4 +-
> 6 files changed, 56 insertions(+), 39 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/gather_load_10.c
>
>diff --git a/gcc/genopinit.c b/gcc/genopinit.c
>index 195ddf74fa2..c6be748079d 100644
>--- a/gcc/genopinit.c
>+++ b/gcc/genopinit.c
>@@ -313,12 +313,11 @@ main (int argc, const char **argv)
>  "  /* Patterns that are used by optabs that are enabled for this 
> target.  */\n"
>  "  bool pat_enable[NUM_OPTAB_PATTERNS];\n"
>  "\n"
>- "  /* Cache if the target supports vec_gather_load for at least one 
>vector\n"
>- " mode.  */\n"
>- "  bool supports_vec_gather_load;\n"
>- "  bool supports_vec_gather_load_cached;\n"
>- "  bool supports_vec_scatter_store;\n"
>- "  bool supports_vec_scatter_store_cached;\n"
>+ "  /* Index VOIDmode caches if the target supports vec_gather_load 
>for any\n"
>+ " vector mode.  Every other index X caches specifically for mode 
>X.\n"
>+ " 1 means yes, -1 means no.  */\n"
>+ "  signed char supports_vec_gather_load[NUM_MACHINE_MODES];\n"
>+ "  signed char supports_vec_scatter_store[NUM_MACHINE_MODES];\n"
>  "};\n"
>  "extern void init_all_optabs (struct target_optabs *);\n"
>  "\n"
>diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
>index a6dd0fed610..1c0778cba55 100644
>--- a/gcc/optabs-query.c
>+++ b/gcc/optabs-query.c
>@@ -712,13 +712,16 @@ lshift_cheap_p (bool speed_p)
>   return cheap[speed_p];
> }
> 
>-/* Return true if vector conversion optab OP supports at least one mode,
>-   given that the second mode is always an integer vector.  */
>+/* If MODE is not VOIDmode, return true if vector conversion optab OP supports
>+   that mode, given that the second mode is always an integer vector.
>+   If MODE is VOIDmode, return true if OP supports any vector mode.  */
> 
> static bool
>-supports_vec_convert_optab_p (optab op)
>+supports_vec_convert_optab_p (optab op, machine_mode mode)
> {
>-  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
>+  int start = mode == VOIDmode ? 0 : mode;
>+  int end = mode == VOIDmode ? MAX_MACHINE_MODE : mode;
>+  for (int i = start; i <= end; ++i)
> if (VECTOR_MODE_P ((machine_mode) i))
>   for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
>   if (convert_optab_handler (op, (machine_mode) i,
>@@ -728,39 +731,35 @@ supports_vec_convert_optab_p (optab op)
>   return false;
> }
> 
>-/* Return true if vec_gather_load is available for at least one vector
>-   mode.  */
>+/* If MODE is not VOIDmode, return true if vec_gather_load is available for
>+   that mode.  If MODE is VOIDmode, return true if gather_load is available
>+   for at least one vector mode.  */
> 
> bool
>-supports_vec_gather_load_p ()
>+supports_vec_gather_load_p (machine_mode mode)
> {
>-  if (this_fn_optabs->supports_vec_gather_load_cached)
>-return 

[r12-5194 Regression] FAIL: c-c++-common/goacc/firstprivate-mappings-1.c scan-tree-dump omplower "(?n)#pragma omp target oacc_parallel firstprivate\\(array_li.[0-9]+\\) map\\(from:array_so \\[len: 4\\

2021-11-12 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

b7e20480630e3eeb9eed8b3941da3b3f0c22c969 is the first bad commit
commit b7e20480630e3eeb9eed8b3941da3b3f0c22c969
Author: Chung-Lin Tang 
Date:   Fri Nov 12 20:29:00 2021 +0800

openmp: Relax handling of implicit map vs. existing device mappings

caused

FAIL: c-c++-common/goacc/firstprivate-mappings-1.c scan-tree-dump omplower 
"(?n)#pragma omp target oacc_parallel firstprivate\\(array_li.[0-9]+\\) 
map\\(from:array_so \\[len: 4\\]\\) \\["

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5194/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="goacc.exp=c-c++-common/goacc/firstprivate-mappings-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="goacc.exp=c-c++-common/goacc/firstprivate-mappings-1.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH v2] Check optab before transforming atomic bit test and operations

2021-11-12 Thread H.J. Lu via Gcc-patches
On Fri, Nov 12, 2021 at 8:13 AM Jakub Jelinek  wrote:
>
> On Fri, Nov 12, 2021 at 07:55:26AM -0800, H.J. Lu wrote:
> > > I have following patch queued for testing for this...
> > >
> > > 2021-11-12  Jakub Jelinek  
> > >
> > > PR target/103205
> > > * config/i386/sync.md (atomic_bit_test_and_set,
> > > atomic_bit_test_and_complement,
> > > atomic_bit_test_and_reset): Use OPTAB_WIDEN instead of
> > > OPTAB_DIRECT.
> > >
> > > * gcc.target/i386/pr103205.c: New test.
> >
> > Can you include my tests?  Or you can leave out your test and I can check
> > in my tests after your fix has been checked in.
>
> I'd prefer the latter.
>

Here is the v2 patch on top of yours.

-- 
H.J.
From 9520fa78ae04e845905d8bb2bab88cf429bf7840 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 12 Nov 2021 07:21:43 -0800
Subject: [PATCH v2] Check optab before transforming atomic bit test and
 operations

Check optab before transforming equivalent, but slightly different cases
of atomic bit test and operations to their canonical forms.
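
For instance (an illustrative case along the lines of the new tests):

  extern short foo;

  int
  keep_bit (void)
  {
    /* Only worth canonicalizing to an atomic bit-test-and-reset when the
       target provides the optab for this mode; otherwise the plain
       cmpxchg loop should be kept.  */
    return __sync_fetch_and_and (&foo, ~1) & 1;
  }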

gcc/

	PR target/103205
	* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check optab
	before transforming equivalent, but slightly different cases to
	their canonical forms.

gcc/testsuite/

	PR target/103205
	* gcc.target/i386/pr103205-1a.c: New test.
	* gcc.target/i386/pr103205-1b.c: Likewise.
	* gcc.target/i386/pr103205-2a.c: Likewise.
	* gcc.target/i386/pr103205-2b.c: Likewise.
	* gcc.target/i386/pr103205-3.c: Likewise.
	* gcc.target/i386/pr103205-4.c: Likewise.
---
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-1a.c b/gcc/testsuite/gcc.target/i386/pr103205-1a.c
new file mode 100644
index 000..3ea74b68059
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=himode_math" } */
+
+extern short foo;
+
+int
+foo1 (void)
+{
+  return __sync_fetch_and_and(&foo, ~1) & 1;
+}
+
+int
+foo2 (void)
+{
+  return __sync_fetch_and_or (&foo, 1) & 1;
+}
+
+int
+foo3 (void)
+{
+  return __sync_fetch_and_xor (&foo, 1) & 1;
+}
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 1 } } */
+/* { dg-final { scan-assembler-not "cmpxchgw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-1b.c b/gcc/testsuite/gcc.target/i386/pr103205-1b.c
new file mode 100644
index 000..061ffb8f95f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-1b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */
+
+#include "pr103205-1a.c"
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 1 } } */
+/* { dg-final { scan-assembler-not "cmpxchgw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-2a.c b/gcc/testsuite/gcc.target/i386/pr103205-2a.c
new file mode 100644
index 000..4b2fb1f7c29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-2a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=himode_math" } */
+
+extern unsigned short foo;
+
+unsigned short
+foo1 (void)
+{
+  return __sync_fetch_and_and(&foo, ~1) & 1;
+}
+
+unsigned short
+foo2 (void)
+{
+  return __sync_fetch_and_or (&foo, 1) & 1;
+}
+
+unsigned short
+foo3 (void)
+{
+  return __sync_fetch_and_xor (&foo, 1) & 1;
+}
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 1 } } */
+/* { dg-final { scan-assembler-not "cmpxchgw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-2b.c b/gcc/testsuite/gcc.target/i386/pr103205-2b.c
new file mode 100644
index 000..0190d7c8c20
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-2b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */
+
+#include "pr103205-2a.c"
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 1 } } */
+/* { dg-final { scan-assembler-not "cmpxchgw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-3.c b/gcc/testsuite/gcc.target/i386/pr103205-3.c
new file mode 100644
index 000..8500f6d7e63
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-3.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+extern char foo;
+
+int
+foo1 (void)
+{
+  return __sync_fetch_and_and(&foo, ~1) & 1;
+}
+
+int
+foo2 (void)
+{
+  return __sync_fetch_and_or (&foo, 1) & 1;
+}
+
+int
+foo3 (void)
+{
+  return __sync_fetch_and_xor (&foo, 1) & 1;
+}
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchgb" 3 } } */
diff --git 

[PATCH 5/5] vect: Support masked gather loads with SLP

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch extends the previous SLP gather load support so
that it can handle masked loads too.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vect-slp.c (arg1_arg4_map): New variable.
(vect_get_operand_map): Handle IFN_MASK_GATHER_LOAD.
(vect_build_slp_tree_1): Likewise.
(vect_build_slp_tree_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Expect the mask to be
the last SLP child node rather than the first.

gcc/testsuite/
* gcc.dg/vect/vect-gather-3.c: New test.
* gcc.dg/vect/vect-gather-4.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_8.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-gather-3.c | 64 ++
 gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 48 ++
 .../aarch64/sve/mask_gather_load_8.c  | 65 +++
 gcc/tree-vect-slp.c   | 15 -
 gcc/tree-vect-stmts.c | 21 --
 5 files changed, 203 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-3.c b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
new file mode 100644
index 000..738bd3f3106
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-3.c
@@ -0,0 +1,64 @@
+#include "tree-vect.h"
+
+#define N 16
+
+void __attribute__((noipa))
+f (int *restrict y, int *restrict x, int *restrict indices)
+{
+  for (int i = 0; i < N; ++i)
+{
+  y[i * 2] = (indices[i * 2] < N * 2
+ ? x[indices[i * 2]] + 1
+ : 1);
+  y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2
+ ? x[indices[i * 2 + 1]] + 2
+ : 2);
+}
+}
+
+int y[N * 2];
+int x[N * 2] = {
+  72704, 52152, 51301, 96681,
+  57937, 60490, 34504, 60944,
+  42225, 28333, 88336, 74300,
+  29250, 20484, 38852, 91536,
+  86917, 63941, 31590, 21998,
+  22419, 26974, 28668, 13968,
+  3451, 20247, 44089, 85521,
+  22871, 87362, 50555, 85939
+};
+int indices[N * 2] = {
+  15, 0x1, 0xcafe0, 19,
+  7, 22, 19, 1,
+  0x2, 0x7, 15, 30,
+  5, 12, 11, 11,
+  10, 25, 5, 20,
+  22, 24, 32, 28,
+  30, 19, 6, 0xabcdef,
+  7, 12, 8, 21
+};
+int expected[N * 2] = {
+  91537, 2, 1, 22000,
+  60945, 28670, 21999, 52154,
+  1, 2, 91537, 50557,
+  60491, 29252, 74301, 74302,
+  88337, 20249, 60491, 22421,
+  28669, 3453, 1, 22873,
+  50556, 22000, 34505, 2,
+  60945, 29252, 42226, 26976
+};
+
+int
+main (void)
+{
+  check_vect ();
+
+  f (y, x, indices);
+  for (int i = 0; i < 32; ++i)
+if (y[i] != expected[i])
+  __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target { 
vect_gather_load_ifn && vect_masked_load } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
new file mode 100644
index 000..ee2e4e4999a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+
+#define N 16
+
+void
+f1 (int *restrict y, int *restrict x1, int *restrict x2,
+int *restrict indices)
+{
+  for (int i = 0; i < N; ++i)
+{
+  y[i * 2] = (indices[i * 2] < N * 2
+ ? x1[indices[i * 2]] + 1
+ : 1);
+  y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2
+ ? x2[indices[i * 2 + 1]] + 2
+ : 2);
+}
+}
+
+void
+f2 (int *restrict y, int *restrict x, int *restrict indices)
+{
+  for (int i = 0; i < N; ++i)
+{
+  y[i * 2] = (indices[i * 2] < N * 2
+ ? x[indices[i * 2]] + 1
+ : 1);
+  y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2
+ ? x[indices[i * 2 + 1] * 2] + 2
+ : 2);
+}
+}
+
+void
+f3 (int *restrict y, int *restrict x, int *restrict indices)
+{
+  for (int i = 0; i < N; ++i)
+{
+  y[i * 2] = (indices[i * 2] < N * 2
+ ? x[indices[i * 2]] + 1
+ : 1);
+  y[i * 2 + 1] = (indices[i * 2 + 1] < N * 2
+ ? x[(unsigned int) indices[i * 2 + 1]] + 2
+ : 2);
+}
+}
+
+/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect { 
target vect_gather_load_ifn } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c
new file mode 100644
index 000..95767f30a80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_8.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+void
+f1 (int32_t *restrict y, int32_t *restrict x, int32_t *restrict index)
+{
+  for (int i = 0; i < 100; ++i)
+{
+  y[i * 2] = (index[i * 2] < 128

[PATCH 4/5] if-conv: Apply VN to hoisted conversions

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch is a prerequisite for a later one.  At the moment,
if-conversion converts predicated POINTER_PLUS_EXPRs into
non-wrapping forms, which for:

… = base + offset

becomes:

tmp = (unsigned long) base
… = tmp + offset

It then hoists these conversions out of the loop where possible.

However, because “base” is a valid gimple operand, there can be
multiple POINTER_PLUS_EXPRs with the same base, which can in turn
lead to multiple instances of the same conversion.  The later VN pass
is (and I think needs to be) restricted to the new if-converted code,
whereas here we're deliberately inserting the conversions before the
.LOOP_VECTORIZED condition:

/* If we versioned loop then make sure to insert invariant
   stmts before the .LOOP_VECTORIZED check since the vectorizer
   will re-use that for things like runtime alias versioning
   whose condition can end up using those invariants.  */

We can therefore enter the vectoriser with redundant conversions.
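
For example (hypothetical GIMPLE, not from the testsuite), two accesses
off the same base:

  _1 = base + off1;
  _2 = .MASK_LOAD (_1, ...);
  _3 = base + off2;
  _4 = .MASK_LOAD (_3, ...);

each get their own tmpN = (unsigned long) base copy after the rewrite,
and only VN can merge the duplicates.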

The easiest fix seemed to be to defer the hoisting until after VN.
This catches other hoisting opportunities too.

Hoisting the code from the (artificial) loop in pr99102.c means
that it's no longer worth vectorising.  The patch forces vectorisation
instead of relying on the cost model.

The patch also reverts pr87007-4.c and pr87007-5.c back to their
original forms, undoing changes in 783dc66f9ccb0019c3dad.
The code at the time the tests were added was:

testl   %edi, %edi
je  .L10
vxorps  %xmm1, %xmm1, %xmm1
vsqrtsd d3(%rip), %xmm1, %xmm0
vsqrtsd d2(%rip), %xmm1, %xmm1
...
.L10:
ret

with the operations being hoisted, and the vxorps was specifically
wanted (compared to the previous code).  This patch restores the code
to that form, with the hoisted operations and the vxorps.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-if-conv.c: Include tree-eh.h.
(predicate_statements): Remove pe argument.  Don't hoist
statements here.
(combine_blocks): Remove pe argument.
(ifcvt_can_hoist, ifcvt_can_hoist_further): New functions.
(ifcvt_hoist_invariants): Likewise.
(tree_if_conversion): Update call to combine_blocks.  Call
ifcvt_hoist_invariants after VN.

gcc/testsuite/
* gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model.

Revert:

2020-09-09  Richard Biener  

* gcc.target/i386/pr87007-4.c: Adjust.
* gcc.target/i386/pr87007-5.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr99102.c   |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-4.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr87007-5.c |   2 +-
 gcc/tree-if-conv.c| 122 --
 4 files changed, 114 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c b/gcc/testsuite/gcc.dg/vect/pr99102.c
index 6c1a13f0783..0d030d15c86 100644
--- a/gcc/testsuite/gcc.dg/vect/pr99102.c
+++ b/gcc/testsuite/gcc.dg/vect/pr99102.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
 /* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve256_hw } } */
 long a[44];
 short d, e = -7;
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c b/gcc/testsuite/gcc.target/i386/pr87007-4.c
index 9c4b8005af3..e91bdcbac44 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-4.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = ceil (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index e4d956a5d7f..20d13cf650b 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -15,4 +15,4 @@ foo (int n, int k)
   d1 = sqrt (d3);
 }
 
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index e88ddc9f788..0ad557a2f4d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -121,6 +121,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-ssa-dse.h"
 #include "tree-vectorizer.h"
+#include "tree-eh.h"
 
 /* Only handle PHIs with no more arguments unless we are asked to by
simd pragma.  */
@@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree cond,
 */
 
 static void
-predicate_statements (loop_p loop, edge pe)
+predicate_statements (loop_p loop)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
   auto_vec vect_sizes;
@@ -2597,13 +2598,7 @@ predicate_statements 

[PATCH v2][GCC] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.

2021-11-12 Thread Srinath Parvathaneni via Gcc-patches
Hello,

This patch teaches the DWARF support in GCC about the RA_AUTH_CODE pseudo
hard register, and about the .save {ra_auth_code} and .cfi_offset
ra_auth_code DWARF directives for the PAC feature in the Armv8.1-M
architecture.

The RA_AUTH_CODE register number is 107 and its DWARF register number
is 143.

When compiled with the "arm-none-eabi-gcc -O2 -mthumb
-march=armv8.1-m.main+pacbti -S -fasynchronous-unwind-tables -g"
command line options, the directives supported in this patch look like this:

...
push{ip}
.save {ra_auth_code}
.cfi_def_cfa_offset 8
.cfi_offset 143, -8
...
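
As an illustrative (hypothetical) source example: any non-leaf function
is the kind of code that needs the prologue above, since the signed
return address has to be saved across the call.

/* Hypothetical test: bar is an external call, so foo must save the
   return-address authentication code, producing the
   .save {ra_auth_code} / .cfi_offset 143 pair shown above.  */
int bar (int);

int
foo (int x)
{
  return bar (x) + 1;
}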

This patch can be committed after the patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583407.html
is committed.

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2021-11-12  Srinath Parvathaneni  

* config/arm/aout.h (ra_auth_code): Add to enum.
* config/arm/arm.c (emit_multi_reg_push): Add RA_AUTH_CODE register to
dwarf frame expression instead of IP_REGNUM.
(arm_expand_prologue): Mark as frame related insn.
(arm_regno_class): Check for pac pseudo register.
(arm_dbx_register_number): Assign ra_auth_code register number in dwarf.
(arm_unwind_emit_sequence): Print .save directive with ra_auth_code
register.
(arm_conditional_register_usage): Mark ra_auth_code in fixed registers.
* config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
(IS_PAC_Pseudo_REGNUM): Define.
(enum reg_class): Add PAC_REG entry.
* config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2021-11-12  Srinath Parvathaneni  

* g++.target/arm/pac-1.C: New test.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 25a2812a663742893b928398b0d3948e97f1905b..c69e299e012f46c8d0711830125dbf2f6b2e93d7 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -74,7 +74,8 @@
   "wr8",   "wr9",   "wr10",  "wr11",   \
   "wr12",  "wr13",  "wr14",  "wr15",   \
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",  \
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",\
+  "ra_auth_code"   \
 }
 #endif
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0..f31944e85c9ab83501f156d138e2aea1bcb5b79d 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -815,7 +815,8 @@ extern const int arm_arch_cde_coproc_bits[];
s16-s31   S VFP variable (aka d8-d15).
vfpcc   Not a real register.  Represents the VFP condition
code flags.
-   vpr Used to represent MVE VPR predication.  */
+   vpr Used to represent MVE VPR predication.
+   ra_auth_codePseudo register to save PAC.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |  [fp]
@@ -856,7 +857,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 /* 1 for registers not available across function calls.
@@ -886,7 +887,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -1062,10 +1063,10 @@ extern const int arm_arch_cde_coproc_bits[];
&& (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR + 1 Pseudo register to save PAC.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   107
+#define FIRST_PSEUDO_REGISTER   108
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1248,12 +1249,15 @@ extern int arm_regs_in_sequence[];
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\
   SP_REGNUM, PC_REGNUM, APSRQ_REGNUM,  \
-  APSRGE_REGNUM, VPR_REGNUM\
+  APSRGE_REGNUM, VPR_REGNUM, RA_AUTH_CODE  \
 }
 
 #define IS_VPR_REGNUM(REGNUM) \
   ((REGNUM) == VPR_REGNUM)
 
+#define IS_PAC_Pseudo_REGNUM(REGNUM) \
+  ((REGNUM) == RA_AUTH_CODE)
+
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
@@ -1292,6 +1296,7 @@ enum 

[PATCH 3/5] vect: Support gather loads with SLP

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch adds SLP support for IFN_GATHER_LOAD.  Like the SLP
support for IFN_MASK_LOAD, it works by treating only some of the
arguments as child nodes.  Unlike IFN_MASK_LOAD, it requires the
other arguments (base, scale, and extension type) to be the same
for all calls in the group.  It does not require/expect the loads
to be in a group (which probably wouldn't make sense for gathers).

I was worried about the possible alias effect of moving gathers
around to be part of the same SLP group.  The patch therefore
makes vect_analyze_data_ref_dependence treat gathers and scatters
as a top-level concern, punting if the accesses aren't completely
independent and if the user hasn't told us that a particular
VF is safe.  I think in practice we already punted in the same
circumstances; the idea is just to make it more explicit.
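
For illustration (my example, not from the patch): "#pragma omp simd
safelen" is one way for a user to assert that a particular VF is safe
even though the gather and the surrounding accesses might overlap:

/* Illustrative only; compile with -fopenmp-simd.  y and x may alias,
   but safelen(8) asserts that executing up to 8 iterations in
   parallel is safe, so the dependence check need not punt.  */
void
f (int *y, int *x, int *idx, int n)
{
#pragma omp simd safelen(8)
  for (int i = 0; i < n; ++i)
    y[i] = x[idx[i]] + 1;
}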

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* doc/sourcebuild.texi (vect_gather_load_ifn): Document.
* tree-vect-data-refs.c (vect_analyze_data_ref_dependence):
Commonize safelen handling.  Punt for anything involving
gathers and scatters unless safelen says otherwise.
* tree-vect-slp.c (arg1_map): New variable.
(vect_get_operand_map): Handle IFN_GATHER_LOAD.
(vect_build_slp_tree_1): Likewise.
(vect_build_slp_tree_2): Likewise.
(compatible_calls_p): If vect_get_operand_map returns nonnull,
check that any skipped arguments are equal.
(vect_slp_analyze_node_operations_1): Tighten reduction check.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Take
an ncopies argument.
(vect_get_gather_scatter_ops): Take slp_node and ncopies arguments.
Handle SLP nodes.
(vectorizable_store, vectorizable_load): Adjust accordingly.

gcc/testsuite/
* lib/target-supports.exp
(check_effective_target_vect_gather_load_ifn): New target test.
* gcc.dg/vect/vect-gather-1.c: New test.
* gcc.dg/vect/vect-gather-2.c: Likewise.
* gcc.target/aarch64/sve/gather_load_11.c: Likewise.
---
 gcc/doc/sourcebuild.texi  |  4 ++
 gcc/testsuite/gcc.dg/vect/vect-gather-1.c | 60 +
 gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 36 +++
 .../gcc.target/aarch64/sve/gather_load_11.c   | 49 ++
 gcc/testsuite/lib/target-supports.exp |  6 ++
 gcc/tree-vect-data-refs.c | 64 +--
 gcc/tree-vect-slp.c   | 29 +++--
 gcc/tree-vect-stmts.c | 26 
 8 files changed, 223 insertions(+), 51 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/gather_load_11.c

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 40b1e0d8167..702cd0c53e4 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1639,6 +1639,10 @@ Target supports vector masked loads.
 @item vect_masked_store
 Target supports vector masked stores.
 
+@item vect_gather_load_ifn
+Target supports vector gather loads using internal functions
+(rather than via built-in functions or emulation).
+
 @item vect_scatter_store
 Target supports vector scatter stores.
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-1.c b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
new file mode 100644
index 000..4cee73fc775
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-1.c
@@ -0,0 +1,60 @@
+#include "tree-vect.h"
+
+#define N 16
+
+void __attribute__((noipa))
+f (int *restrict y, int *restrict x, int *restrict indices)
+{
+  for (int i = 0; i < N; ++i)
+{
+  y[i * 2] = x[indices[i * 2]] + 1;
+  y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+}
+}
+
+int y[N * 2];
+int x[N * 2] = {
+  72704, 52152, 51301, 96681,
+  57937, 60490, 34504, 60944,
+  42225, 28333, 88336, 74300,
+  29250, 20484, 38852, 91536,
+  86917, 63941, 31590, 21998,
+  22419, 26974, 28668, 13968,
+  3451, 20247, 44089, 85521,
+  22871, 87362, 50555, 85939
+};
+int indices[N * 2] = {
+  15, 16, 9, 19,
+  7, 22, 19, 1,
+  22, 13, 15, 30,
+  5, 12, 11, 11,
+  10, 25, 5, 20,
+  22, 24, 24, 28,
+  30, 19, 6, 4,
+  7, 12, 8, 21
+};
+int expected[N * 2] = {
+  91537, 86919, 28334, 22000,
+  60945, 28670, 21999, 52154,
+  28669, 20486, 91537, 50557,
+  60491, 29252, 74301, 74302,
+  88337, 20249, 60491, 22421,
+  28669, 3453, 3452, 22873,
+  50556, 22000, 34505, 57939,
+  60945, 29252, 42226, 26976
+};
+
+int
+main (void)
+{
+  check_vect ();
+
+  f (y, x, indices);
+  for (int i = 0; i < 32; ++i)
+if (y[i] != expected[i])
+  __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c 

[PATCH 2/5] vect: Use generalised accessors to build SLP nodes

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch adds:

- gimple_num_args
- gimple_arg
- gimple_arg_ptr

for accessing rhs operands of an assignment, call or PHI.  This is
similar to the existing gimple_get_lhs.

I guess there's a danger that these routines could be overused,
such as in cases where gimple_assign_rhs1 etc. would be more
appropriate.  I think the routines are still worth having though.
These days, most new operations are added as internal functions rather
than tree codes, so it's useful to be able to handle assignments and
calls in a consistent way.
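
A minimal sketch of a hypothetical consumer (not part of the patch),
showing the uniform style the accessors allow:

/* Hypothetical helper: walk the rhs operands of an assignment, call
   or PHI without switching on the statement kind.  */
static void
visit_args (const gimple *stmt)
{
  for (unsigned int i = 0; i < gimple_num_args (stmt); ++i)
    {
      tree arg = gimple_arg (stmt, i);
      /* ... process ARG ... */
      (void) arg;
    }
}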

The patch also generalises the way that SLP child nodes map
to gimple stmt operands.  This is useful for later patches.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* gimple.h (gimple_num_args, gimple_arg, gimple_arg_ptr): New
functions.
* tree-vect-slp.c (cond_expr_maps, arg2_map): New variables.
(vect_get_operand_map): New function.
(vect_get_and_check_slp_defs): Fix outdated comment.
Use vect_get_operand_map and new gimple argument accessors.
(vect_build_slp_tree_2): Likewise.
---
 gcc/gimple.h|  38 
 gcc/tree-vect-slp.c | 148 +++-
 2 files changed, 114 insertions(+), 72 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 3cde3cde7fe..f7fdefc5362 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -4692,6 +4692,44 @@ gimple_phi_arg_has_location (const gphi *phi, size_t i)
   return gimple_phi_arg_location (phi, i) != UNKNOWN_LOCATION;
 }
 
+/* Return the number of arguments that can be accessed by gimple_arg.  */
+
+static inline unsigned
+gimple_num_args (const gimple *gs)
+{
+  if (auto phi = dyn_cast <const gphi *> (gs))
+    return gimple_phi_num_args (phi);
+  if (auto call = dyn_cast <const gcall *> (gs))
+    return gimple_call_num_args (call);
+  return gimple_num_ops (as_a <const gassign *> (gs)) - 1;
+}
+
+/* GS must be an assignment, a call, or a PHI.
+   If it's an assignment, return rhs operand I.
+   If it's a call, return function argument I.
+   If it's a PHI, return the value of PHI argument I.  */
+
+static inline tree
+gimple_arg (const gimple *gs, unsigned int i)
+{
+  if (auto phi = dyn_cast <const gphi *> (gs))
+    return gimple_phi_arg_def (phi, i);
+  if (auto call = dyn_cast <const gcall *> (gs))
+    return gimple_call_arg (call, i);
+  return gimple_op (as_a <const gassign *> (gs), i + 1);
+}
+
+/* Return a pointer to gimple_arg (GS, I).  */
+
+static inline tree *
+gimple_arg_ptr (gimple *gs, unsigned int i)
+{
+  if (auto phi = dyn_cast <gphi *> (gs))
+    return gimple_phi_arg_def_ptr (phi, i);
+  if (auto call = dyn_cast <gcall *> (gs))
+    return gimple_call_arg_ptr (call, i);
+  return gimple_op_ptr (as_a <gassign *> (gs), i + 1);
+}
 
 /* Return the region number for GIMPLE_RESX RESX_STMT.  */
 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f4123cf830a..2594ab7607f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -454,15 +454,57 @@ vect_def_types_match (enum vect_def_type dta, enum 
vect_def_type dtb)
  && (dtb == vect_external_def || dtb == vect_constant_def)));
 }
 
+static const int cond_expr_maps[3][5] = {
+  { 4, -1, -2, 1, 2 },
+  { 4, -2, -1, 1, 2 },
+  { 4, -1, -2, 2, 1 }
+};
+static const int arg2_map[] = { 1, 2 };
+
+/* For most SLP statements, there is a one-to-one mapping between
+   gimple arguments and child nodes.  If that is not true for STMT,
+   return an array that contains:
+
+   - the number of child nodes, followed by
+   - for each child node, the index of the argument associated with that node.
+ The special index -1 is the first operand of an embedded comparison and
+ the special index -2 is the second operand of an embedded comparison.
+
+   SWAP is as for vect_get_and_check_slp_defs.  */
+
+static const int *
+vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
+{
+  if (auto assign = dyn_cast <const gassign *> (stmt))
+    {
+      if (gimple_assign_rhs_code (assign) == COND_EXPR
+          && COMPARISON_CLASS_P (gimple_assign_rhs1 (assign)))
+        return cond_expr_maps[swap];
+    }
+  gcc_assert (!swap);
+  if (auto call = dyn_cast <const gcall *> (stmt))
+    {
+      if (gimple_call_internal_p (call))
+        switch (gimple_call_internal_fn (call))
+          {
+          case IFN_MASK_LOAD:
+            return arg2_map;
+
+          default:
+            break;
+          }
+    }
+  return nullptr;
+}
+
 /* Get the defs for the rhs of STMT (collect them in OPRNDS_INFO), check that
they are of a valid type and that they match the defs of the first stmt of
the SLP group (stored in OPRNDS_INFO).  This function tries to match stmts
-   by swapping operands of STMTS[STMT_NUM] when possible.  Non-zero *SWAP
-   indicates swap is required for cond_expr stmts.  Specifically, *SWAP
+   by swapping operands of STMTS[STMT_NUM] when possible.  Non-zero SWAP
+   indicates swap is required for cond_expr stmts.  Specifically, SWAP
is 1 if STMT is cond and operands of comparison need to be swapped;
-   *SWAP is 2 if STMT is cond and code of comparison 

[PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 02:27:16PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches wrote:
> > This patch assumes that .shared variables are initialized to 0,
> > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists
> > in Table 7. .shared as non-initializable.  If that isn't the case,
> > we need to initialize it somewhere for the case of #pragma omp target
> > without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ?
> 
> A quick look at libgcc/config/nvptx/crt0.c shows the target supports
> __attribute__((shared)), so perhaps either following instead, or, if
> .shared isn't preinitialized to zero, defining the variable in
> libgcc/config/nvptx/crt0.c , adding there __gomp_team_num = 0;
> and adding extern keyword before int __gomp_team_num __attribute__((shared));
> in libgomp/config/nvptx/target.c.

And finally here is a third version, which fixes a typo in the previous
patch ("in" instead of "int") and actually initializes the shared
variable, because the PTX documentation doesn't say anything about how
.shared variables are initialized.
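
For reference, a (hypothetical) example of the OpenMP 5.1 feature being
honored here; with this patch, when fewer CTAs are launched than the
requested lower bound, GOMP_teams4 hands each CTA several logical teams
by bumping __gomp_team_num in steps of the number of blocks:

/* Illustrative only: OpenMP 5.1 num_teams accepts a lower:upper
   range, so at least 16 teams must execute the region even if the
   device launches fewer CTAs.  */
#include <omp.h>

void
foo (void)
{
  #pragma omp target teams num_teams(16 : 32)
  {
    int team = omp_get_team_num ();
    /* ... work distributed across at least 16 teams ... */
    (void) team;
  }
}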

Tested on x86_64-linux with nvptx-none offloading, ok for trunk?

2021-11-12  Jakub Jelinek  

* config/nvptx/team.c (__gomp_team_num): Define as
__attribute__((shared)) var.
(gomp_nvptx_main): Initialize __gomp_team_num to 0.
* config/nvptx/target.c (__gomp_team_num): Declare as
extern __attribute__((shared)) var.
(GOMP_teams4): Use __gomp_team_num as the team number instead of
%ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
is bigger than num_blocks, use num_teams_lower teams and arrange for
bumping of __gomp_team_num if !first and returning false once we run
out of teams.
* config/nvptx/teams.c (__gomp_team_num): Declare as
extern __attribute__((shared)) var.
(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.

--- libgomp/config/nvptx/team.c.jj  2021-05-25 13:43:02.793121350 +0200
+++ libgomp/config/nvptx/team.c 2021-11-12 17:49:02.847341650 +0100
@@ -32,6 +32,7 @@
 #include 
 
 struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
+int __gomp_team_num __attribute__((shared));
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
@@ -57,6 +58,7 @@ gomp_nvptx_main (void (*fn) (void *), vo
   /* Starting additional threads is not supported.  */
   gomp_global_icv.dyn_var = true;
 
+  __gomp_team_num = 0;
   nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
   memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
 
--- libgomp/config/nvptx/target.c.jj2021-11-12 15:57:29.400632875 +0100
+++ libgomp/config/nvptx/target.c   2021-11-12 17:47:39.499533296 +0100
@@ -26,28 +26,41 @@
 #include "libgomp.h"
 #include 
 
+extern int __gomp_team_num __attribute__((shared));
+
 bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 unsigned int thread_limit, bool first)
 {
+  unsigned int num_blocks, block_id;
+  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
   if (!first)
-return false;
+{
+  unsigned int team_num;
+  if (num_blocks > gomp_num_teams_var)
+   return false;
+  team_num = __gomp_team_num;
+  if (team_num > gomp_num_teams_var - num_blocks)
+   return false;
+  __gomp_team_num = team_num + num_blocks;
+  return true;
+}
   if (thread_limit)
 {
   struct gomp_task_icv *icv = gomp_icv (true);
   icv->thread_limit_var
= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }
-  unsigned int num_blocks, block_id;
-  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
-  /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple
- times for some CTAs.  */
-  (void) num_teams_lower;
-  if (!num_teams_upper || num_teams_upper >= num_blocks)
+  if (!num_teams_upper)
 num_teams_upper = num_blocks;
-  else if (block_id >= num_teams_upper)
+  else if (num_blocks < num_teams_lower)
+num_teams_upper = num_teams_lower;
+  else if (num_blocks < num_teams_upper)
+num_teams_upper = num_blocks;
+  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+  if (block_id >= num_teams_upper)
 return false;
+  __gomp_team_num = block_id;
   gomp_num_teams_var = num_teams_upper - 1;
   return true;
 }
--- libgomp/config/nvptx/teams.c.jj 2021-05-25 13:43:02.793121350 +0200
+++ libgomp/config/nvptx/teams.c2021-11-12 17:37:18.933361024 +0100
@@ -28,6 +28,8 @@
 
 #include "libgomp.h"
 
+extern int __gomp_team_num __attribute__((shared));
+
 void
 GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
unsigned int thread_limit, unsigned int flags)
@@ -48,9 +50,7 @@ omp_get_num_teams (void)
 int
 omp_get_team_num (void)
 {
-  int ctaid;
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
-  

[PATCH 1/5] vect: Use code_helper when building SLP nodes

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch uses code_helper to represent the common (and
alternative) operations when building an SLP node.  It's not
much of a saving on its own, but it helps with later patches.
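
In sketch form (condensed from the hunks below; code_helper and
combined_fn are GCC-internal types, so this fragment is only meaningful
inside GCC): the same variable can now record either a tree code or a
combined function:

/* Sketch of the new pattern: code_helper converts implicitly from
   both tree_code and combined_fn.  */
code_helper rhs_code = ERROR_MARK;
if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
  {
    combined_fn cfn = gimple_call_combined_fn (call_stmt);
    if (cfn != CFN_LAST)
      rhs_code = cfn;
    else
      rhs_code = CALL_EXPR;
  }
else
  rhs_code = gimple_assign_rhs_code (as_a <gassign *> (stmt));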

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vect-slp.c (vect_build_slp_tree_1): Use code_helper
to record the operations performed by statements, only using
CALL_EXPR for things that don't map to built-in or internal
functions.  For shifts, require all shift amounts to be equal
if optab_vector is not supported but optab_scalar is.
---
 gcc/tree-vect-slp.c | 77 +++--
 1 file changed, 26 insertions(+), 51 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 94c75497495..f4123cf830a 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -876,17 +876,13 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
 {
   unsigned int i;
   stmt_vec_info first_stmt_info = stmts[0];
-  enum tree_code first_stmt_code = ERROR_MARK;
-  enum tree_code alt_stmt_code = ERROR_MARK;
-  enum tree_code rhs_code = ERROR_MARK;
-  enum tree_code first_cond_code = ERROR_MARK;
+  code_helper first_stmt_code = ERROR_MARK;
+  code_helper alt_stmt_code = ERROR_MARK;
+  code_helper rhs_code = ERROR_MARK;
+  code_helper first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
   tree vectype = NULL_TREE, first_op1 = NULL_TREE;
-  optab optab;
-  int icode;
-  machine_mode optab_op2_mode;
-  machine_mode vec_mode;
   stmt_vec_info first_load = NULL, prev_first_load = NULL;
   bool first_stmt_load_p = false, load_p = false;
   bool first_stmt_phi_p = false, phi_p = false;
@@ -966,13 +962,16 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   gcall *call_stmt = dyn_cast <gcall *> (stmt);
   if (call_stmt)
{
- rhs_code = CALL_EXPR;
+ combined_fn cfn = gimple_call_combined_fn (call_stmt);
+ if (cfn != CFN_LAST)
+   rhs_code = cfn;
+ else
+   rhs_code = CALL_EXPR;
 
- if (gimple_call_internal_p (stmt, IFN_MASK_LOAD))
+ if (cfn == CFN_MASK_LOAD)
load_p = true;
- else if ((gimple_call_internal_p (call_stmt)
-   && (!vectorizable_internal_fn_p
-   (gimple_call_internal_fn (call_stmt
+ else if ((internal_fn_p (cfn)
+   && !vectorizable_internal_fn_p (as_internal_fn (cfn)))
   || gimple_call_tail_p (call_stmt)
   || gimple_call_noreturn_p (call_stmt)
   || gimple_call_chain (call_stmt))
@@ -1013,32 +1012,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  || rhs_code == LROTATE_EXPR
  || rhs_code == RROTATE_EXPR)
{
- vec_mode = TYPE_MODE (vectype);
-
  /* First see if we have a vector/vector shift.  */
- optab = optab_for_tree_code (rhs_code, vectype,
-  optab_vector);
-
- if (!optab
- || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+ if (!directly_supported_p (rhs_code, vectype, optab_vector))
{
  /* No vector/vector shift, try for a vector/scalar shift.  */
- optab = optab_for_tree_code (rhs_code, vectype,
-  optab_scalar);
-
- if (!optab)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"Build SLP failed: no optab.\n");
- if (is_a <bb_vec_info> (vinfo) && i != 0)
-   continue;
- /* Fatal mismatch.  */
- matches[0] = false;
- return false;
-   }
- icode = (int) optab_handler (optab, vec_mode);
- if (icode == CODE_FOR_nothing)
+ if (!directly_supported_p (rhs_code, vectype, optab_scalar))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1050,12 +1028,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  matches[0] = false;
  return false;
}
- optab_op2_mode = insn_data[icode].operand[2].mode;
- if (!VECTOR_MODE_P (optab_op2_mode))
-   {
- need_same_oprnds = true;
- first_op1 = gimple_assign_rhs2 (stmt);
-   }
+ need_same_oprnds = true;
+ first_op1 = gimple_assign_rhs2 (stmt);
}
}
  else if (rhs_code == WIDEN_LSHIFT_EXPR)
@@ -1081,8 +1055,7 @@ 

[PATCH] vect: Fix SVE mask_gather_load/store_store tests

2021-11-12 Thread Richard Sandiford via Gcc-patches
If-conversion now applies rewrite_to_defined_overflow to the
address calculation in an IFN_MASK_LOAD.  This means that we
end up with:

cast_base = (uintptr_t) base;
uncast_sum = cast_base + offset;
sum = (orig_type *) uncast_sum;

If the target supports IFN_MASK_GATHER_LOAD with pointer-sized
offsets for the given vectype, we wouldn't look through the sum
cast and so would needlessly vectorise the uncast_sum addition.

This showed up as several failures in gcc.target/aarch64/sve.
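
A hedged example of the kind of loop affected (my reconstruction, not
taken from the failing tests):

/* Illustrative only: the conditional gather becomes an
   IFN_MASK_GATHER_LOAD after if-conversion, with the address
   calculation rewritten through uintptr_t as shown above.  */
void
f (int *restrict y, int *x, int *idx, int *c, int n)
{
  for (int i = 0; i < n; ++i)
    if (c[i])
      y[i] = x[idx[i]];
}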

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vect-data-refs.c (vect_check_gather_scatter): Continue
processing conversions if the current offset is a pointer.
---
 gcc/tree-vect-data-refs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1d7f01a9ce..888ad72f3a9 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -4139,6 +4139,7 @@ vect_check_gather_scatter (stmt_vec_info stmt_info, 
loop_vec_info loop_vinfo,
  /* Don't include the conversion if the target is happy with
 the current offset type.  */
  if (use_ifn_p
+ && !POINTER_TYPE_P (TREE_TYPE (off))
  && vect_gather_scatter_fn_p (loop_vinfo, DR_IS_READ (dr),
   masked_p, vectype, memory_type,
   TREE_TYPE (off), scale, ,
-- 
2.25.1



[PATCH] vect: Fix vect_is_reduction

2021-11-12 Thread Richard Sandiford via Gcc-patches
The current definition of vect_is_reduction (provided for target
costing) misses some pattern statements.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (vect_is_reduction): Use STMT_VINFO_REDUC_IDX.

gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_13.c: New test.
---
 .../gcc.target/aarch64/sve/cost_model_13.c   | 16 
 gcc/tree-vectorizer.h|  3 +--
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c
new file mode 100644
index 000..95f2ce91f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_13.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O3 -mtune=neoverse-v1" } */
+
+int
+f11 (short *restrict x, int n)
+{
+  short res = 0;
+  for (int i = 0; i < n; ++i)
+res += x[i];
+  return res;
+}
+
+/* We should use SVE rather than Advanced SIMD.  */
+/* { dg-final { scan-assembler {\tld1h\tz[0-9]+\.h,} } } */
+/* { dg-final { scan-assembler {\tadd\tz[0-9]+\.h,} } } */
+/* { dg-final { scan-assembler-not {\tldr\tq[0-9]+,} } } */
+/* { dg-final { scan-assembler-not {\tv[0-9]+\.8h,} } } */
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 0eb13d6cc74..76e81ea546a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2372,8 +2372,7 @@ vect_is_store_elt_extraction (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info)
 inline bool
 vect_is_reduction (stmt_vec_info stmt_info)
 {
-  return (STMT_VINFO_REDUC_DEF (stmt_info)
- || VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)));
+  return STMT_VINFO_REDUC_IDX (stmt_info) >= 0;
 }
 
 /* If STMT_INFO describes a reduction, return the vect_reduction_type
-- 
2.25.1



[PATCH] vect: Pass mode to gather/scatter tests

2021-11-12 Thread Richard Sandiford via Gcc-patches
vect_check_gather_scatter had a binary “does this target support
internal gather/scatter functions” test.  This dates from the time when
we only handled gathers and scatters via direct target support, with
x86_64 using built-in functions and aarch64 using IFNs.  But now that we
can emulate gathers, we need to check whether the gather for a particular
mode is going to be emulated or not.

Without this, enabling SVE regresses emulated Advanced SIMD gather
sequences in cases where SVE isn't used.

Livermore kernel 15 can now be vectorised with Advanced SIMD when
SVE is enabled.

Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* genopinit.c (main): Turn supports_vec_gather_load and
supports_vec_scatter_store into signed char arrays and remove
supports_vec_gather_load_cached and supports_vec_scatter_store_cached.
* optabs-query.c (supports_vec_convert_optab_p): Add a mode parameter.
If the mode is not VOIDmode, test only for that mode.
(supports_vec_gather_load_p): Likewise.
(supports_vec_scatter_store_p): Likewise.
* optabs-query.h (supports_vec_gather_load_p): Likewise.
(supports_vec_scatter_store_p): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Pass the
vector mode to supports_vec_gather_load_p and
supports_vec_scatter_store_p.

gcc/testsuite/
* gfortran.dg/vect/vect-8.f90: Bump number of vectorized loops
to 25 for SVE.
* gcc.target/aarch64/sve/gather_load_10.c: New test.
---
 gcc/genopinit.c   | 11 ++--
 gcc/optabs-query.c| 55 +--
 gcc/optabs-query.h|  4 +-
 .../gcc.target/aarch64/sve/gather_load_10.c   | 18 ++
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  3 +-
 gcc/tree-vect-data-refs.c |  4 +-
 6 files changed, 56 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/gather_load_10.c

diff --git a/gcc/genopinit.c b/gcc/genopinit.c
index 195ddf74fa2..c6be748079d 100644
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -313,12 +313,11 @@ main (int argc, const char **argv)
   "  /* Patterns that are used by optabs that are enabled for this 
target.  */\n"
   "  bool pat_enable[NUM_OPTAB_PATTERNS];\n"
   "\n"
-  "  /* Cache if the target supports vec_gather_load for at least one 
vector\n"
-  " mode.  */\n"
-  "  bool supports_vec_gather_load;\n"
-  "  bool supports_vec_gather_load_cached;\n"
-  "  bool supports_vec_scatter_store;\n"
-  "  bool supports_vec_scatter_store_cached;\n"
+  "  /* Index VOIDmode caches if the target supports vec_gather_load 
for any\n"
+  " vector mode.  Every other index X caches specifically for mode 
X.\n"
+  " 1 means yes, -1 means no.  */\n"
+  "  signed char supports_vec_gather_load[NUM_MACHINE_MODES];\n"
+  "  signed char supports_vec_scatter_store[NUM_MACHINE_MODES];\n"
   "};\n"
   "extern void init_all_optabs (struct target_optabs *);\n"
   "\n"
diff --git a/gcc/optabs-query.c b/gcc/optabs-query.c
index a6dd0fed610..1c0778cba55 100644
--- a/gcc/optabs-query.c
+++ b/gcc/optabs-query.c
@@ -712,13 +712,16 @@ lshift_cheap_p (bool speed_p)
   return cheap[speed_p];
 }
 
-/* Return true if vector conversion optab OP supports at least one mode,
-   given that the second mode is always an integer vector.  */
+/* If MODE is not VOIDmode, return true if vector conversion optab OP supports
+   that mode, given that the second mode is always an integer vector.
+   If MODE is VOIDmode, return true if OP supports any vector mode.  */
 
 static bool
-supports_vec_convert_optab_p (optab op)
+supports_vec_convert_optab_p (optab op, machine_mode mode)
 {
-  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+  int start = mode == VOIDmode ? 0 : mode;
+  int end = mode == VOIDmode ? MAX_MACHINE_MODE : mode;
+  for (int i = start; i <= end; ++i)
 if (VECTOR_MODE_P ((machine_mode) i))
   for (int j = MIN_MODE_VECTOR_INT; j < MAX_MODE_VECTOR_INT; ++j)
if (convert_optab_handler (op, (machine_mode) i,
@@ -728,39 +731,35 @@ supports_vec_convert_optab_p (optab op)
   return false;
 }
 
-/* Return true if vec_gather_load is available for at least one vector
-   mode.  */
+/* If MODE is not VOIDmode, return true if vec_gather_load is available for
+   that mode.  If MODE is VOIDmode, return true if gather_load is available
+   for at least one vector mode.  */
 
 bool
-supports_vec_gather_load_p ()
+supports_vec_gather_load_p (machine_mode mode)
 {
-  if (this_fn_optabs->supports_vec_gather_load_cached)
-return this_fn_optabs->supports_vec_gather_load;
+  if (!this_fn_optabs->supports_vec_gather_load[mode])
+this_fn_optabs->supports_vec_gather_load[mode]
+  = (supports_vec_convert_optab_p 

[committed] aarch64: Remove redundant costing code

2021-11-12 Thread Richard Sandiford via Gcc-patches
Previous patches made some of the complex parts of the issue rate
code redundant.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_vector_op::n_advsimd_ops): Delete.
(aarch64_vector_op::m_seen_loads): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Don't push to
m_advsimd_ops.
(aarch64_vector_op::count_ops): Remove vectype and factor parameters.
Remove code that tries to predict different vec_flags from the
current loop's.
(aarch64_vector_costs::add_stmt_cost): Update accordingly.
Remove m_advsimd_ops handling.
---
 gcc/config/aarch64/aarch64.c | 142 ---
 1 file changed, 30 insertions(+), 112 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e2f3bf3765..d8410fc52f2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14914,8 +14914,8 @@ public:
 private:
   void record_potential_advsimd_unrolling (loop_vec_info);
   void analyze_loop_vinfo (loop_vec_info);
-  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, tree,
- aarch64_vec_op_count *, unsigned int);
+  void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info,
+ aarch64_vec_op_count *);
   fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
fractional_cost, unsigned int,
unsigned int *, bool *);
@@ -14959,16 +14959,6 @@ private:
  or vector loop.  There is one entry for each tuning option of
  interest.  */
   auto_vec<aarch64_vec_op_count> m_ops;
-
-  /* Used only when vectorizing loops for SVE.  For the first element of M_OPS,
- it estimates what the equivalent Advanced SIMD-only code would need
- in order to perform the same work as one iteration of the SVE loop.  */
-  auto_vec<aarch64_vec_op_count> m_advsimd_ops;
-
-  /* Used to detect cases in which we end up costing the same load twice,
- once to account for results that are actually used and once to account
- for unused results.  */
-  hash_map, unsigned int> m_seen_loads;
 };
 
 aarch64_vector_costs::aarch64_vector_costs (vec_info *vinfo,
@@ -14980,8 +14970,6 @@ aarch64_vector_costs::aarch64_vector_costs (vec_info 
*vinfo,
   if (auto *issue_info = aarch64_tune_params.vec_costs->issue_info)
 {
   m_ops.quick_push ({ issue_info, m_vec_flags });
-  if (m_vec_flags & VEC_ANY_SVE)
-   m_advsimd_ops.quick_push ({ issue_info, VEC_ADVSIMD });
   if (aarch64_tune_params.vec_costs == &neoverse512tvb_vector_cost)
{
  unsigned int vf_factor = (m_vec_flags & VEC_ANY_SVE) ? 2 : 1;
@@ -15620,26 +15608,19 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
   return stmt_cost;
 }
 
-/* COUNT, KIND, STMT_INFO and VECTYPE are the same as for
-   vector_costs::add_stmt_cost and they describe an operation in the
-   body of a vector loop.  Record issue information relating to the vector
-   operation in OPS, where OPS is one of m_ops or m_advsimd_ops; see the
-   comments above those variables for details.
-
-   FACTOR says how many iterations of the loop described by VEC_FLAGS would be
-   needed to match one iteration of the vector loop in VINFO.  */
+/* COUNT, KIND and STMT_INFO are the same as for vector_costs::add_stmt_cost
+   and they describe an operation in the body of a vector loop.  Record issue
+   information relating to the vector operation in OPS.  */
 void
 aarch64_vector_costs::count_ops (unsigned int count, vect_cost_for_stmt kind,
-stmt_vec_info stmt_info, tree vectype,
-aarch64_vec_op_count *ops,
-unsigned int factor)
+stmt_vec_info stmt_info,
+aarch64_vec_op_count *ops)
 {
   const aarch64_base_vec_issue_info *base_issue = ops->base_issue_info ();
   if (!base_issue)
 return;
   const aarch64_simd_vec_issue_info *simd_issue = ops->simd_issue_info ();
   const aarch64_sve_vec_issue_info *sve_issue = ops->sve_issue_info ();
-  unsigned int vec_flags = ops->vec_flags ();
 
   /* Calculate the minimum cycles per iteration imposed by a reduction
  operation.  */
@@ -15647,46 +15628,17 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
   && vect_is_reduction (stmt_info))
 {
   unsigned int base
-   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, vec_flags);
-  if (vect_reduc_type (m_vinfo, stmt_info) == FOLD_LEFT_REDUCTION)
-   {
- if (vectype && aarch64_sve_mode_p (TYPE_MODE (vectype)))
-   {
- /* When costing an SVE FADDA, the vectorizer treats vec_to_scalar
-as a single operation, whereas for Advanced SIMD it is a
-per-element one.  Increase the factor accordingly, both for
-the 

[committed] aarch64: Use new hooks for vector comparisons

2021-11-12 Thread Richard Sandiford via Gcc-patches
Previously we tried to account for the different issue rates of
the various vector modes by guessing what the Advanced SIMD version
of an SVE loop would look like and what its issue rate was likely to be.
We'd then increase the cost of the SVE loop if the Advanced SIMD loop
might issue more quickly.

This patch moves that logic to better_main_loop_than_p, so that we
can compare loops side-by-side rather than having to guess.  This also
means we can apply the issue rate heuristics to *any* vector loop
comparison, rather than just weighting SVE vs. Advanced SIMD.

The actual heuristics are otherwise unchanged.  We're just
applying them in a different place.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_saw_sve_only_op)
(aarch64_sve_only_stmt_p): Delete.
(aarch64_vector_costs::prefer_unrolled_loop): New function,
extracted from adjust_body_cost.
(aarch64_vector_costs::better_main_loop_than_p): New function,
using heuristics extracted from adjust_body_cost and
adjust_body_cost_sve.
(aarch64_vector_costs::adjust_body_cost_sve): Remove
advsimd_cycles_per_iter and could_use_advsimd parameters.
Update after changes above.
(aarch64_vector_costs::adjust_body_cost): Update after changes above.
---
 gcc/config/aarch64/aarch64.c | 291 +--
 1 file changed, 145 insertions(+), 146 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5fa64fe5350..1e2f3bf3765 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14909,6 +14909,7 @@ public:
  int misalign,
  vect_cost_model_location where) override;
   void finish_cost (const vector_costs *) override;
+  bool better_main_loop_than_p (const vector_costs *other) const override;
 
 private:
   void record_potential_advsimd_unrolling (loop_vec_info);
@@ -14916,20 +14917,16 @@ private:
   void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, tree,
  aarch64_vec_op_count *, unsigned int);
   fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
-   fractional_cost, fractional_cost,
-   bool, unsigned int, unsigned int *,
-   bool *);
+   fractional_cost, unsigned int,
+   unsigned int *, bool *);
   unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
 unsigned int);
+  bool prefer_unrolled_loop () const;
 
   /* True if we have performed one-time initialization based on the
  vec_info.  */
   bool m_analyzed_vinfo = false;
 
-  /* True if we've seen an SVE operation that we cannot currently vectorize
- using Advanced SIMD.  */
-  bool m_saw_sve_only_op = false;
-
   /* - If M_VEC_FLAGS is zero then we're costing the original scalar code.
  - If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then we're costing Advanced
SIMD code.
@@ -15306,42 +15303,6 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
   return false;
 }
 
-/* Return true if the vectorized form of STMT_INFO is something that is only
-   possible when using SVE instead of Advanced SIMD.  VECTYPE is the type of
-   the vector that STMT_INFO is operating on.  */
-static bool
-aarch64_sve_only_stmt_p (stmt_vec_info stmt_info, tree vectype)
-{
-  if (!aarch64_sve_mode_p (TYPE_MODE (vectype)))
-return false;
-
-  if (STMT_VINFO_DATA_REF (stmt_info))
-{
-  /* Check for true gathers and scatters (rather than just strided accesses
-that we've chosen to implement using gathers and scatters).  Although
-in principle we could use elementwise accesses for Advanced SIMD,
-the vectorizer doesn't yet support that.  */
-  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
-   return true;
-
-  /* Check for masked loads and stores.  */
-  if (auto *call = dyn_cast <gcall *> (stmt_info->stmt))
-   if (gimple_call_internal_p (call)
-   && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0)
- return true;
-}
-
-  /* Check for 64-bit integer multiplications.  */
-  auto *assign = dyn_cast <gassign *> (stmt_info->stmt);
-  if (assign
-  && gimple_assign_rhs_code (assign) == MULT_EXPR
-  && GET_MODE_INNER (TYPE_MODE (vectype)) == DImode
-  && !integer_pow2p (gimple_assign_rhs2 (assign)))
-return true;
-
-  return false;
-}
-
 /* We are considering implementing STMT_INFO using SVE.  If STMT_INFO is an
in-loop reduction that SVE supports directly, return its latency in cycles,
otherwise return zero.  SVE_COSTS specifies the latencies of the relevant
@@ -15866,9 +15827,6 @@ aarch64_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
  of just 

[committed] aarch64: Add vf_factor to aarch64_vec_op_count

2021-11-12 Thread Richard Sandiford via Gcc-patches
-mtune=neoverse-512tvb sets the likely SVE vector length to 128 bits,
but it also takes into account Neoverse V1, which is a 256-bit target.
This patch adds this VF (VL) factor to aarch64_vec_op_count.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count::m_vf_factor):
New member variable.
(aarch64_vec_op_count::aarch64_vec_op_count): Add a parameter for it.
(aarch64_vec_op_count::vf_factor): New function.
(aarch64_vector_costs::aarch64_vector_costs): When costing for
neoverse-512tvb, pass a vf_factor of 2 for the Neoverse V1 version
of an SVE loop.
(aarch64_vector_costs::adjust_body_cost): Read the vf factor
instead of hard-coding 2.
---
 gcc/config/aarch64/aarch64.c | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 241cef8c5d9..5fa64fe5350 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14711,9 +14711,12 @@ class aarch64_vec_op_count
 {
 public:
   aarch64_vec_op_count () = default;
-  aarch64_vec_op_count (const aarch64_vec_issue_info *, unsigned int);
+  aarch64_vec_op_count (const aarch64_vec_issue_info *, unsigned int,
+   unsigned int = 1);
 
   unsigned int vec_flags () const { return m_vec_flags; }
+  unsigned int vf_factor () const { return m_vf_factor; }
+
   const aarch64_base_vec_issue_info *base_issue_info () const;
   const aarch64_simd_vec_issue_info *simd_issue_info () const;
   const aarch64_sve_vec_issue_info *sve_issue_info () const;
@@ -14753,13 +14756,23 @@ private:
  - If M_VEC_FLAGS & VEC_ANY_SVE is nonzero then this structure describes
SVE code.  */
   unsigned int m_vec_flags = 0;
+
+  /* Assume that, when the code is executing on the core described
+ by M_ISSUE_INFO, one iteration of the loop will handle M_VF_FACTOR
+ times more data than the vectorizer anticipates.
+
+ This is only ever different from 1 for SVE.  It allows us to consider
+ what would happen on a 256-bit SVE target even when the -mtune
+ parameters say that the “likely” SVE length is 128 bits.  */
+  unsigned int m_vf_factor = 1;
 };
 
 aarch64_vec_op_count::
 aarch64_vec_op_count (const aarch64_vec_issue_info *issue_info,
- unsigned int vec_flags)
+ unsigned int vec_flags, unsigned int vf_factor)
   : m_issue_info (issue_info),
-m_vec_flags (vec_flags)
+m_vec_flags (vec_flags),
+m_vf_factor (vf_factor)
 {
 }
 
@@ -14973,7 +14986,11 @@ aarch64_vector_costs::aarch64_vector_costs (vec_info 
*vinfo,
   if (m_vec_flags & VEC_ANY_SVE)
m_advsimd_ops.quick_push ({ issue_info, VEC_ADVSIMD });
   if (aarch64_tune_params.vec_costs == &neoverse512tvb_vector_cost)
-	m_ops.quick_push ({ &neoversev1_vec_issue_info, m_vec_flags });
+   {
+ unsigned int vf_factor = (m_vec_flags & VEC_ANY_SVE) ? 2 : 1;
+ m_ops.quick_push ({ &neoversev1_vec_issue_info, m_vec_flags,
+ vf_factor });
+   }
 }
 }
 
@@ -16111,8 +16128,9 @@ adjust_body_cost (loop_vec_info loop_vinfo,
  if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "Neoverse V1 estimate:\n");
- adjust_body_cost_sve (_ops[1], scalar_cycles_per_iter * 2,
-   advsimd_cycles_per_iter * 2,
+ auto vf_factor = m_ops[1].vf_factor ();
+ adjust_body_cost_sve (_ops[1], scalar_cycles_per_iter * vf_factor,
+   advsimd_cycles_per_iter * vf_factor,
could_use_advsimd, orig_body_cost,
_cost, _disparage);
}
-- 
2.25.1



[PATCH] PR fortran/102368 - Failure to compile program using the C_SIZEOF function in ISO_C_BINDING

2021-11-12 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

F2008:15.3.5 relaxed the condition on interoperable character variables
and now allows constant length values different from one.  Similar text
appears in F2018:18.3.4.
This required an adjustment in the interoperability check.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 1fc44a5bf0b294021490f3c0a1539982a09000f5 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 12 Nov 2021 18:32:18 +0100
Subject: [PATCH] Fortran: fix interoperability check for character variables
 for F2008

gcc/fortran/ChangeLog:

	PR fortran/102368
	* check.c (is_c_interoperable): F2008:15.3.5 relaxed the condition
	on interoperable character variables and allows values different
	from one.

gcc/testsuite/ChangeLog:

	PR fortran/102368
	* gfortran.dg/c_sizeof_7.f90: New test.
---
 gcc/fortran/check.c  | 20 ++--
 gcc/testsuite/gfortran.dg/c_sizeof_7.f90 | 13 +
 2 files changed, 27 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/c_sizeof_7.f90

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index ffa07b510cd..69a2e35e81b 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -5272,13 +5272,21 @@ is_c_interoperable (gfc_expr *expr, const char **msg, bool c_loc, bool c_f_ptr)
 	&& !gfc_simplify_expr (expr->ts.u.cl->length, 0))
   gfc_internal_error ("is_c_interoperable(): gfc_simplify_expr failed");

-if (!c_loc && expr->ts.u.cl
-	&& (!expr->ts.u.cl->length
-	|| expr->ts.u.cl->length->expr_type != EXPR_CONSTANT
-	|| mpz_cmp_si (expr->ts.u.cl->length->value.integer, 1) != 0))
+if (!c_loc && expr->ts.u.cl)
   {
-	*msg = "Type shall have a character length of 1";
-	return false;
+	bool len_ok = (expr->ts.u.cl->length
+		   && expr->ts.u.cl->length->expr_type == EXPR_CONSTANT);
+
+	/* F2003:15.2.1 required the length of a character variable to be one.
+	   F2008:15.3.5 relaxed this to constant length. */
+	if (len_ok && !(gfc_option.allow_std & GFC_STD_F2008))
+	  len_ok = mpz_cmp_si (expr->ts.u.cl->length->value.integer, 1) == 0;
+
+	if (!len_ok)
+	  {
+	*msg = "Type shall have a character length of 1";
+	return false;
+	  }
   }
 }

diff --git a/gcc/testsuite/gfortran.dg/c_sizeof_7.f90 b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90
new file mode 100644
index 000..3cfa3371f72
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/c_sizeof_7.f90
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-options "-std=f2008 -fdump-tree-original" }
+! { dg-final { scan-tree-dump-times "_gfortran_stop_numeric" 0 "original" } }
+! PR fortran/102368
+
+program main
+  use, intrinsic :: iso_c_binding
+  implicit none
+  character(kind=c_char, len=*), parameter :: a = 'abc'
+  character(kind=c_char, len=8):: b
+  if (c_sizeof (a) /= 3) stop 1
+  if (c_sizeof (b) /= 8) stop 2
+end program main
--
2.26.2



[committed] aarch64: Move cycle estimation into aarch64_vec_op_count

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch just moves the main cycle estimation routines
into aarch64_vec_op_count.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c
(aarch64_vec_op_count::rename_cycles_per_iter): New function.
(aarch64_vec_op_count::min_nonpred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_pred_cycles_per_iter): Likewise.
(aarch64_vec_op_count::min_cycles_per_iter): Likewise.
(aarch64_vec_op_count::dump): Move earlier in file.  Dump the
above properties too.
(aarch64_estimate_min_cycles_per_iter): Delete.
(adjust_body_cost): Use aarch64_vec_op_count::min_cycles_per_iter
instead of aarch64_estimate_min_cycles_per_iter.  Rely on the dump
routine to print CPI estimates.
(adjust_body_cost_sve): Likewise.  Use the other functions above
instead of doing the work inline.
---
 gcc/config/aarch64/aarch64.c | 203 ++-
 1 file changed, 105 insertions(+), 98 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 71c44d6327e..241cef8c5d9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14718,6 +14718,11 @@ public:
   const aarch64_simd_vec_issue_info *simd_issue_info () const;
   const aarch64_sve_vec_issue_info *sve_issue_info () const;
 
+  fractional_cost rename_cycles_per_iter () const;
+  fractional_cost min_nonpred_cycles_per_iter () const;
+  fractional_cost min_pred_cycles_per_iter () const;
+  fractional_cost min_cycles_per_iter () const;
+
   void dump () const;
 
   /* The number of individual "general" operations.  See the comments
@@ -14791,6 +14796,95 @@ aarch64_vec_op_count::sve_issue_info () const
   return nullptr;
 }
 
+/* Estimate the minimum number of cycles per iteration needed to rename
+   the instructions.
+
+   ??? For now this is done inline rather than via cost tables, since it
+   isn't clear how it should be parameterized for the general case.  */
+fractional_cost
+aarch64_vec_op_count::rename_cycles_per_iter () const
+{
+  if (sve_issue_info () == &neoverse512tvb_sve_issue_info)
+/* + 1 for an addition.  We've already counted a general op for each
+   store, so we don't need to account for stores separately.  The branch
+   reads no registers and so does not need to be counted either.
+
+   ??? This value is very much on the pessimistic side, but seems to work
+   pretty well in practice.  */
+return { general_ops + loads + pred_ops + 1, 5 };
+
+  return 0;
+}
+
+/* Like min_cycles_per_iter, but excluding predicate operations.  */
+fractional_cost
+aarch64_vec_op_count::min_nonpred_cycles_per_iter () const
+{
+  auto *issue_info = base_issue_info ();
+
+  fractional_cost cycles = MAX (reduction_latency, 1);
+  cycles = std::max (cycles, { stores, issue_info->stores_per_cycle });
+  cycles = std::max (cycles, { loads + stores,
+  issue_info->loads_stores_per_cycle });
+  cycles = std::max (cycles, { general_ops,
+  issue_info->general_ops_per_cycle });
+  cycles = std::max (cycles, rename_cycles_per_iter ());
+  return cycles;
+}
+
+/* Like min_cycles_per_iter, but including only the predicate operations.  */
+fractional_cost
+aarch64_vec_op_count::min_pred_cycles_per_iter () const
+{
+  if (auto *issue_info = sve_issue_info ())
+return { pred_ops, issue_info->pred_ops_per_cycle };
+  return 0;
+}
+
+/* Estimate the minimum number of cycles needed to issue the operations.
+   This is a very simplistic model!  */
+fractional_cost
+aarch64_vec_op_count::min_cycles_per_iter () const
+{
+  return std::max (min_nonpred_cycles_per_iter (),
+  min_pred_cycles_per_iter ());
+}
+
+/* Dump information about the structure.  */
+void
+aarch64_vec_op_count::dump () const
+{
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  load operations = %d\n", loads);
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  store operations = %d\n", stores);
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  general operations = %d\n", general_ops);
+  if (sve_issue_info ())
+dump_printf_loc (MSG_NOTE, vect_location,
+"  predicate operations = %d\n", pred_ops);
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  reduction latency = %d\n", reduction_latency);
+  if (auto rcpi = rename_cycles_per_iter ())
+dump_printf_loc (MSG_NOTE, vect_location,
+"  estimated cycles per iteration to rename = %f\n",
+rcpi.as_double ());
+  if (auto pred_cpi = min_pred_cycles_per_iter ())
+{
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  estimated min cycles per iteration"
+  " without predication = %f\n",
+  min_nonpred_cycles_per_iter ().as_double ());
+  dump_printf_loc (MSG_NOTE, vect_location,
+  "  

[committed] aarch64: Use an array of aarch64_vec_op_counts

2021-11-12 Thread Richard Sandiford via Gcc-patches
-mtune=neoverse-512tvb uses two issue rates, one for Neoverse V1
and one with more generic parameters.  We use both rates when
making a choice between scalar, Advanced SIMD and SVE code.

Previously we calculated the Neoverse V1 issue rates from the
more generic issue rates, but by removing m_scalar_ops and
(later) m_advsimd_ops, it becomes easier to track multiple
issue rates directly.

This patch therefore converts m_ops and (temporarily) m_advsimd_ops
into arrays.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_vec_op_count): Allow default
initialization.
(aarch64_vec_op_count::base_issue_info): Remove handling of null
issue_infos.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vector_costs::m_ops): Turn into a vector.
(aarch64_vector_costs::m_advsimd_ops): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Add entries to
the vectors based on aarch64_tune_params.
(aarch64_vector_costs::analyze_loop_vinfo): Update the pred_ops
of all entries in m_ops.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for all
entries in m_ops.
(aarch64_estimate_min_cycles_per_iter): Remove issue_info
parameter and get the information from the ops instead.
(aarch64_vector_costs::adjust_body_cost_sve): Take a
aarch64_vec_issue_info instead of a aarch64_vec_op_count.
(aarch64_vector_costs::adjust_body_cost): Update call accordingly.
Exit earlier if m_ops is empty for either cost structure.
---
 gcc/config/aarch64/aarch64.c | 115 ++-
 1 file changed, 60 insertions(+), 55 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3944c095e1d..71c44d6327e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14710,6 +14710,7 @@ aarch64_first_cycle_multipass_dfa_lookahead_guard 
(rtx_insn *insn,
 class aarch64_vec_op_count
 {
 public:
+  aarch64_vec_op_count () = default;
   aarch64_vec_op_count (const aarch64_vec_issue_info *, unsigned int);
 
   unsigned int vec_flags () const { return m_vec_flags; }
@@ -14739,14 +14740,14 @@ public:
 
 private:
   /* The issue information for the core.  */
-  const aarch64_vec_issue_info *m_issue_info;
+  const aarch64_vec_issue_info *m_issue_info = nullptr;
 
   /* - If M_VEC_FLAGS is zero then this structure describes scalar code
  - If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then this structure describes
Advanced SIMD code.
  - If M_VEC_FLAGS & VEC_ANY_SVE is nonzero then this structure describes
SVE code.  */
-  unsigned int m_vec_flags;
+  unsigned int m_vec_flags = 0;
 };
 
 aarch64_vec_op_count::
@@ -14765,9 +14766,7 @@ aarch64_vec_op_count::base_issue_info () const
 {
   if (auto *ret = simd_issue_info ())
 return ret;
-  if (m_issue_info)
-return m_issue_info->scalar;
-  return nullptr;
+  return m_issue_info->scalar;
 }
 
 /* If the structure describes vector code and we have associated issue
@@ -14777,7 +14776,7 @@ aarch64_vec_op_count::simd_issue_info () const
 {
   if (auto *ret = sve_issue_info ())
 return ret;
-  if (m_issue_info && m_vec_flags)
+  if (m_vec_flags)
 return m_issue_info->advsimd;
   return nullptr;
 }
@@ -14787,7 +14786,7 @@ aarch64_vec_op_count::simd_issue_info () const
 const aarch64_sve_vec_issue_info *
 aarch64_vec_op_count::sve_issue_info () const
 {
-  if (m_issue_info && (m_vec_flags & VEC_ANY_SVE))
+  if (m_vec_flags & VEC_ANY_SVE)
 return m_issue_info->sve;
   return nullptr;
 }
@@ -14809,7 +14808,7 @@ private:
   void analyze_loop_vinfo (loop_vec_info);
   void count_ops (unsigned int, vect_cost_for_stmt, stmt_vec_info, tree,
  aarch64_vec_op_count *, unsigned int);
-  fractional_cost adjust_body_cost_sve (const aarch64_vec_issue_info *,
+  fractional_cost adjust_body_cost_sve (const aarch64_vec_op_count *,
fractional_cost, fractional_cost,
bool, unsigned int, unsigned int *,
bool *);
@@ -14853,13 +14852,14 @@ private:
 
   /* Used only when vectorizing loops.  Estimates the number and kind of
  operations that would be needed by one iteration of the scalar
- or vector loop.  */
-  aarch64_vec_op_count m_ops;
+ or vector loop.  There is one entry for each tuning option of
+ interest.  */
+  auto_vec<aarch64_vec_op_count, 2> m_ops;
 
-  /* Used only when vectorizing loops for SVE.  It estimates what the
- equivalent Advanced SIMD-only code would need in order to perform
- the same work as one iteration of the SVE loop.  */
-  aarch64_vec_op_count m_advsimd_ops;
+  /* Used only when vectorizing loops for SVE.  For the first element of M_OPS,
+ it estimates what the equivalent Advanced SIMD-only code would need
+ in order to perform the same work as one iteration of the SVE loop.  */

[committed] aarch64: Use real scalar op counts

2021-11-12 Thread Richard Sandiford via Gcc-patches
Now that vector finish_costs is passed the associated scalar costs,
we can record the scalar issue information while computing the scalar
costs, rather than trying to estimate it while computing the vector
costs.

This simplifies things a little, but the main motivation is to improve
accuracy.
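
A hedged sketch of the new adjust_body_cost logic described below (the
variable names and exact arithmetic here are assumptions):

  /* Cycles per iteration of the real scalar loop, recorded while the
     scalar version was being costed...  */
  fractional_cost scalar_cycles = scalar_costs->m_ops.min_cycles_per_iter ();

  /* ...multiplied by the estimated VF to give the scalar cost of the
     work done by one vector iteration.  */
  fractional_cost scalar_cycles_per_vect_iter = scalar_cycles * estimated_vf;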

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_vector_costs::m_scalar_ops)
(aarch64_vector_costs::m_sve_ops): Replace with...
(aarch64_vector_costs::m_ops): ...this.
(aarch64_vector_costs::analyze_loop_vinfo): Update accordingly.
(aarch64_vector_costs::adjust_body_cost_sve): Likewise.
(aarch64_vector_costs::aarch64_vector_costs): Likewise.
Initialize m_vec_flags here rather than in add_stmt_cost.
(aarch64_vector_costs::count_ops): Test for scalar reductions too.
Allow vectype to be null.
(aarch64_vector_costs::add_stmt_cost): Call count_ops for scalar
code too.  Don't require vectype to be nonnull.
(aarch64_vector_costs::adjust_body_cost): Take the loop_vec_info
and scalar costs as parameters.  Use the scalar costs to determine
the cycles per iteration of the scalar loop, then multiply it
by the estimated VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.
---
 gcc/config/aarch64/aarch64.c | 182 +--
 1 file changed, 88 insertions(+), 94 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d8bbc66c226..3944c095e1d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14813,7 +14813,8 @@ private:
fractional_cost, fractional_cost,
bool, unsigned int, unsigned int *,
bool *);
-  unsigned int adjust_body_cost (unsigned int);
+  unsigned int adjust_body_cost (loop_vec_info, const aarch64_vector_costs *,
+unsigned int);
 
   /* True if we have performed one-time initialization based on the
  vec_info.  */
@@ -14850,22 +14851,16 @@ private:
  iterate, otherwise it is zero.  */
   uint64_t m_num_vector_iterations = 0;
 
-  /* Used only when vectorizing loops.  Estimates the number and kind of scalar
- operations that would be needed to perform the same work as one iteration
- of the vector loop.  */
-  aarch64_vec_op_count m_scalar_ops;
+  /* Used only when vectorizing loops.  Estimates the number and kind of
+ operations that would be needed by one iteration of the scalar
+ or vector loop.  */
+  aarch64_vec_op_count m_ops;
 
-  /* Used only when vectorizing loops.  If M_VEC_FLAGS & VEC_ADVSIMD,
- this structure estimates the number and kind of operations that the
- vector loop would contain.  If M_VEC_FLAGS & VEC_SVE, the structure
- estimates what the equivalent Advanced SIMD-only code would need in
- order to perform the same work as one iteration of the SVE loop.  */
+  /* Used only when vectorizing loops for SVE.  It estimates what the
+ equivalent Advanced SIMD-only code would need in order to perform
+ the same work as one iteration of the SVE loop.  */
   aarch64_vec_op_count m_advsimd_ops;
 
-  /* Used only when vectorizing loops with SVE.  It estimates the number and
- kind of operations that the SVE loop would contain.  */
-  aarch64_vec_op_count m_sve_ops;
-
   /* Used to detect cases in which we end up costing the same load twice,
  once to account for results that are actually used and once to account
  for unused results.  */
@@ -14875,9 +14870,10 @@ private:
 aarch64_vector_costs::aarch64_vector_costs (vec_info *vinfo,
bool costing_for_scalar)
   : vector_costs (vinfo, costing_for_scalar),
-m_scalar_ops (aarch64_tune_params.vec_costs->issue_info, 0),
-m_advsimd_ops (aarch64_tune_params.vec_costs->issue_info, VEC_ADVSIMD),
-m_sve_ops (aarch64_tune_params.vec_costs->issue_info, VEC_ANY_SVE)
+m_vec_flags (costing_for_scalar ? 0
+: aarch64_classify_vector_mode (vinfo->vector_mode)),
+m_ops (aarch64_tune_params.vec_costs->issue_info, m_vec_flags),
+m_advsimd_ops (aarch64_tune_params.vec_costs->issue_info, VEC_ADVSIMD)
 {
 }
 
@@ -15016,7 +15012,7 @@ aarch64_vector_costs::analyze_loop_vinfo (loop_vec_info 
loop_vinfo)
   FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), num_vectors_m1, rgm)
if (rgm->type)
  num_masks += num_vectors_m1 + 1;
-  m_sve_ops.pred_ops += num_masks * issue_info->sve->while_pred_ops;
+  m_ops.pred_ops += num_masks * issue_info->sve->while_pred_ops;
 }
 }
 
@@ -15550,8 +15546,8 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
 /* COUNT, KIND, STMT_INFO and VECTYPE are the same as for
vector_costs::add_stmt_cost and they describe an operation in the

[committed] aarch64: Get floatness from stmt_info

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch gets the floatness of a memory access from the data
reference rather than the vectype.  This makes it more suitable
for use in scalar costing code.
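
For instance (illustrative only): when costing the scalar loop there is
no meaningful vectype to inspect, but a memory access still knows its
scalar type through the data reference:

  /* aarch64_dr_type returns NULL_TREE for non-memory statements,
     so guard the FLOAT_TYPE_P check.  */
  tree type = aarch64_dr_type (stmt_info);
  bool fp_access = type && FLOAT_TYPE_P (type);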

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_dr_type): New function.
(aarch64_vector_costs::count_ops): Use it rather than the
vectype to determine floatness.
---
 gcc/config/aarch64/aarch64.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 416362beefd..d8bbc66c226 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14920,6 +14920,16 @@ aarch64_simd_vec_costs_for_flags (unsigned int flags)
   return costs->advsimd;
 }
 
+/* If STMT_INFO is a memory reference, return the scalar memory type,
+   otherwise return null.  */
+static tree
+aarch64_dr_type (stmt_vec_info stmt_info)
+{
+  if (auto dr = STMT_VINFO_DATA_REF (stmt_info))
+return TREE_TYPE (DR_REF (dr));
+  return NULL_TREE;
+}
+
 /* Decide whether to use the unrolling heuristic described above
m_unrolled_advsimd_niters, updating that field if so.  LOOP_VINFO
describes the loop that we're vectorizing.  */
@@ -15649,7 +15659,7 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
prev_count = num_copies;
}
   ops->loads += num_copies;
-  if (vec_flags || FLOAT_TYPE_P (vectype))
+  if (vec_flags || FLOAT_TYPE_P (aarch64_dr_type (stmt_info)))
ops->general_ops += base_issue->fp_simd_load_general_ops * num_copies;
   break;
 
@@ -15657,7 +15667,7 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 case unaligned_store:
 case scalar_store:
   ops->stores += num_copies;
-  if (vec_flags || FLOAT_TYPE_P (vectype))
+  if (vec_flags || FLOAT_TYPE_P (aarch64_dr_type (stmt_info)))
ops->general_ops += base_issue->fp_simd_store_general_ops * num_copies;
   break;
 }
-- 
2.25.1



[committed] aarch64: Remove vectype from latency tests

2021-11-12 Thread Richard Sandiford via Gcc-patches
This patch gets the scalar mode of a reduction operation from the
gimple stmt rather than the vectype.  This makes it more suitable
for use in scalar costs.
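
In other words (a sketch, not the committed hunk):

  /* The reduction's scalar mode comes straight from the type of the
     statement's lhs, so no vectype is needed; e.g. E_DFmode for a
     double fold-left reduction.  */
  machine_mode mode
    = TYPE_MODE (TREE_TYPE (gimple_get_lhs (stmt_info->stmt)));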

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_sve_in_loop_reduction_latency):
Remove vectype parameter and get floatness from the type of the
stmt lhs instead.
(arch64_in_loop_reduction_latency): Likewise.
(aarch64_detect_vector_stmt_subtype): Update caller.
(aarch64_vector_costs::count_ops): Likewise.
---
 gcc/config/aarch64/aarch64.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c8a3cb38473..416362beefd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15220,14 +15220,13 @@ aarch64_sve_only_stmt_p (stmt_vec_info stmt_info, 
tree vectype)
   return false;
 }
 
-/* We are considering implementing STMT_INFO using SVE vector type VECTYPE.
-   If STMT_INFO is an in-loop reduction that SVE supports directly, return
-   its latency in cycles, otherwise return zero.  SVE_COSTS specifies the
-   latencies of the relevant instructions.  */
+/* We are considering implementing STMT_INFO using SVE.  If STMT_INFO is an
+   in-loop reduction that SVE supports directly, return its latency in cycles,
+   otherwise return zero.  SVE_COSTS specifies the latencies of the relevant
+   instructions.  */
 static unsigned int
 aarch64_sve_in_loop_reduction_latency (vec_info *vinfo,
   stmt_vec_info stmt_info,
-  tree vectype,
   const sve_vec_cost *sve_costs)
 {
   switch (vect_reduc_type (vinfo, stmt_info))
@@ -15236,7 +15235,7 @@ aarch64_sve_in_loop_reduction_latency (vec_info *vinfo,
   return sve_costs->clast_cost;
 
 case FOLD_LEFT_REDUCTION:
-  switch (GET_MODE_INNER (TYPE_MODE (vectype)))
+  switch (TYPE_MODE (TREE_TYPE (gimple_get_lhs (stmt_info->stmt))))
{
case E_HFmode:
case E_BFmode:
@@ -15268,14 +15267,10 @@ aarch64_sve_in_loop_reduction_latency (vec_info 
*vinfo,
  Advanced SIMD implementation.
 
- If VEC_FLAGS & VEC_ANY_SVE, return the loop carry latency of the
- SVE implementation.
-
-   VECTYPE is the type of vector that the vectorizer is considering using
-   for STMT_INFO, which might be different from the type of vector described
-   by VEC_FLAGS.  */
+ SVE implementation.  */
 static unsigned int
 aarch64_in_loop_reduction_latency (vec_info *vinfo, stmt_vec_info stmt_info,
-  tree vectype, unsigned int vec_flags)
+  unsigned int vec_flags)
 {
   const cpu_vector_cost *vec_costs = aarch64_tune_params.vec_costs;
   const sve_vec_cost *sve_costs = nullptr;
@@ -15287,16 +15282,16 @@ aarch64_in_loop_reduction_latency (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (sve_costs)
 {
   unsigned int latency
-   = aarch64_sve_in_loop_reduction_latency (vinfo, stmt_info, vectype,
-sve_costs);
+   = aarch64_sve_in_loop_reduction_latency (vinfo, stmt_info, sve_costs);
   if (latency)
return latency;
 }
 
   /* Handle scalar costs.  */
+  bool is_float = FLOAT_TYPE_P (TREE_TYPE (gimple_get_lhs (stmt_info->stmt)));
   if (vec_flags == 0)
 {
-  if (FLOAT_TYPE_P (vectype))
+  if (is_float)
return vec_costs->scalar_fp_stmt_cost;
   return vec_costs->scalar_int_stmt_cost;
 }
@@ -15305,7 +15300,7 @@ aarch64_in_loop_reduction_latency (vec_info *vinfo, 
stmt_vec_info stmt_info,
  with a vector reduction outside the loop.  */
   const simd_vec_cost *simd_costs
 = aarch64_simd_vec_costs_for_flags (vec_flags);
-  if (FLOAT_TYPE_P (vectype))
+  if (is_float)
 return simd_costs->fp_stmt_cost;
   return simd_costs->int_stmt_cost;
 }
@@ -15382,8 +15377,7 @@ aarch64_detect_vector_stmt_subtype (vec_info *vinfo, 
vect_cost_for_stmt kind,
   && sve_costs)
 {
   unsigned int latency
-   = aarch64_sve_in_loop_reduction_latency (vinfo, stmt_info, vectype,
-sve_costs);
+   = aarch64_sve_in_loop_reduction_latency (vinfo, stmt_info, sve_costs);
   if (latency)
return latency;
 }
@@ -15570,8 +15564,7 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
   && vect_is_reduction (stmt_info))
 {
   unsigned int base
-   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, vectype,
-vec_flags);
+   = aarch64_in_loop_reduction_latency (m_vinfo, stmt_info, vec_flags);
   if (vect_reduc_type (m_vinfo, stmt_info) == FOLD_LEFT_REDUCTION)
{
  if (aarch64_sve_mode_p (TYPE_MODE (vectype)))
-- 
2.25.1



[committed] aarch64: Fold aarch64_sve_op_count into aarch64_vec_op_count

2021-11-12 Thread Richard Sandiford via Gcc-patches
Later patches make aarch64 use the new vector hooks.  We then
only need to track one set of ops for each aarch64_vector_costs
structure.  This in turn means that it's more convenient to merge
aarch64_sve_op_count and aarch64_vec_op_count.

The patch also adds issue info and vec flags to aarch64_vec_op_count,
so that the structure is more self-descriptive.  This simplifies some
things later.
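
A small usage sketch of the merged structure (num_masks stands in for
whatever count the caller has computed):

  aarch64_vec_op_count ops (aarch64_tune_params.vec_costs->issue_info,
			    VEC_ANY_SVE);
  /* The object now knows which issue info applies to it.  */
  if (const aarch64_sve_vec_issue_info *sve = ops.sve_issue_info ())
    ops.pred_ops += num_masks * sve->while_pred_ops;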

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_sve_op_count): Fold into...
(aarch64_vec_op_count): ...this.  Add a constructor.
(aarch64_vec_op_count::vec_flags): New function.
(aarch64_vec_op_count::base_issue_info): Likewise.
(aarch64_vec_op_count::simd_issue_info): Likewise.
(aarch64_vec_op_count::sve_issue_info): Likewise.
(aarch64_vec_op_count::m_issue_info): New member variable.
(aarch64_vec_op_count::m_vec_flags): Likewise.
(aarch64_vector_costs): Add a constructor.
(aarch64_vector_costs::m_sve_ops): Change type to aarch64_vec_op_count.
(aarch64_vector_costs::aarch64_vector_costs): New function.
Initialize m_scalar_ops, m_advsimd_ops and m_sve_ops.
(aarch64_vector_costs::count_ops): Remove vec_flags and
issue_info parameters, using the new aarch64_vec_op_count
functions instead.
(aarch64_vector_costs::add_stmt_cost): Update call accordingly.
(aarch64_sve_op_count::dump): Fold into...
(aarch64_vec_op_count::dump): ..here.
---
 gcc/config/aarch64/aarch64.c | 153 ++-
 1 file changed, 96 insertions(+), 57 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 850288d0e01..c8a3cb38473 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14707,8 +14707,16 @@ aarch64_first_cycle_multipass_dfa_lookahead_guard 
(rtx_insn *insn,
 /* Information about how the CPU would issue the scalar, Advanced SIMD
or SVE version of a vector loop, using the scheme defined by the
aarch64_base_vec_issue_info hierarchy of structures.  */
-struct aarch64_vec_op_count
+class aarch64_vec_op_count
 {
+public:
+  aarch64_vec_op_count (const aarch64_vec_issue_info *, unsigned int);
+
+  unsigned int vec_flags () const { return m_vec_flags; }
+  const aarch64_base_vec_issue_info *base_issue_info () const;
+  const aarch64_simd_vec_issue_info *simd_issue_info () const;
+  const aarch64_sve_vec_issue_info *sve_issue_info () const;
+
   void dump () const;
 
   /* The number of individual "general" operations.  See the comments
@@ -14724,23 +14732,71 @@ struct aarch64_vec_op_count
  operations, which in the vector code become associated with
  reductions.  */
   unsigned int reduction_latency = 0;
-};
-
-/* Extends aarch64_vec_op_count with SVE-specific information.  */
-struct aarch64_sve_op_count : aarch64_vec_op_count
-{
-  void dump () const;
 
   /* The number of individual predicate operations.  See the comments
  in aarch64_sve_vec_issue_info for details.  */
   unsigned int pred_ops = 0;
+
+private:
+  /* The issue information for the core.  */
+  const aarch64_vec_issue_info *m_issue_info;
+
+  /* - If M_VEC_FLAGS is zero then this structure describes scalar code
+ - If M_VEC_FLAGS & VEC_ADVSIMD is nonzero then this structure describes
+   Advanced SIMD code.
+ - If M_VEC_FLAGS & VEC_ANY_SVE is nonzero then this structure describes
+   SVE code.  */
+  unsigned int m_vec_flags;
 };
 
+aarch64_vec_op_count::
+aarch64_vec_op_count (const aarch64_vec_issue_info *issue_info,
+ unsigned int vec_flags)
+  : m_issue_info (issue_info),
+m_vec_flags (vec_flags)
+{
+}
+
+/* Return the base issue information (i.e. the parts that make sense
+   for both scalar and vector code).  Return null if we have no issue
+   information.  */
+const aarch64_base_vec_issue_info *
+aarch64_vec_op_count::base_issue_info () const
+{
+  if (auto *ret = simd_issue_info ())
+return ret;
+  if (m_issue_info)
+return m_issue_info->scalar;
+  return nullptr;
+}
+
+/* If the structure describes vector code and we have associated issue
+   information, return that issue information, otherwise return null.  */
+const aarch64_simd_vec_issue_info *
+aarch64_vec_op_count::simd_issue_info () const
+{
+  if (auto *ret = sve_issue_info ())
+return ret;
+  if (m_issue_info && m_vec_flags)
+return m_issue_info->advsimd;
+  return nullptr;
+}
+
+/* If the structure describes SVE code and we have associated issue
+   information, return that issue information, otherwise return null.  */
+const aarch64_sve_vec_issue_info *
+aarch64_vec_op_count::sve_issue_info () const
+{
+  if (m_issue_info && (m_vec_flags & VEC_ANY_SVE))
+return m_issue_info->sve;
+  return nullptr;
+}
+
 /* Information about vector code that we're in the process of costing.  */
 class aarch64_vector_costs : public vector_costs
 {
 public:
-  using 

[committed] aarch64: Detect more consecutive MEMs

2021-11-12 Thread Richard Sandiford via Gcc-patches
For tests like:

int res[2];
void
f1 (int x, int y)
{
  res[0] = res[1] = x + y;
}

we generated:

add w0, w0, w1
adrpx1, .LANCHOR0
add x2, x1, :lo12:.LANCHOR0
str w0, [x1, #:lo12:.LANCHOR0]
str w0, [x2, 4]
ret

Using [x1, #:lo12:.LANCHOR0] for the first store prevented the
two stores being recognised as a pair.  However, the MEM_EXPR
and MEM_OFFSET information tell us that the MEMs really are
consecutive.  The peephole2 context then guarantees that the
first address is equivalent to [x2, 0], so the two stores can be
combined into a single stp.

While there: the reg_mentioned_p tests for loads were probably correct,
but seemed a bit indirect.  We're matching two consecutive loads,
so the thing we need to test is that the second MEM in the original
sequence doesn't depend on the result of the first load in the
original sequence.

Tested on aarch64-linux-gnu & applied.

Richard


gcc/
* config/aarch64/aarch64.c: Include tree-dfa.h.
(aarch64_check_consecutive_mems): New function that takes MEM_EXPR
and MEM_OFFSET into account.
(aarch64_swap_ldrstr_operands): Use it.
(aarch64_operands_ok_for_ldpstp): Likewise.  Check that the
address of the second memory doesn't depend on the result of
the first load.

gcc/testsuite/
* gcc.target/aarch64/stp_1.c: New test.
---
 gcc/config/aarch64/aarch64.c | 156 +++
 gcc/testsuite/gcc.target/aarch64/stp_1.c |  29 +
 2 files changed, 133 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a8f53b85d92..850288d0e01 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -79,6 +79,7 @@
 #include "tree-ssa-loop-niter.h"
 #include "fractional-cost.h"
 #include "rtlanal.h"
+#include "tree-dfa.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -24569,6 +24570,97 @@ aarch64_sched_adjust_priority (rtx_insn *insn, int 
priority)
   return priority;
 }
 
+/* Check if *MEM1 and *MEM2 are consecutive memory references and,
+   if they are, try to make them use constant offsets from the same base
+   register.  Return true on success.  When returning true, set *REVERSED
+   to true if *MEM1 comes after *MEM2, false if *MEM1 comes before *MEM2.  */
+static bool
+aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
+{
+  *reversed = false;
+  if (GET_RTX_CLASS (GET_CODE (XEXP (*mem1, 0))) == RTX_AUTOINC
+  || GET_RTX_CLASS (GET_CODE (XEXP (*mem2, 0))) == RTX_AUTOINC)
+return false;
+
+  if (!MEM_SIZE_KNOWN_P (*mem1) || !MEM_SIZE_KNOWN_P (*mem2))
+return false;
+
+  auto size1 = MEM_SIZE (*mem1);
+  auto size2 = MEM_SIZE (*mem2);
+
+  rtx base1, base2, offset1, offset2;
+  extract_base_offset_in_addr (*mem1, &base1, &offset1);
+  extract_base_offset_in_addr (*mem2, &base2, &offset2);
+
+  /* Make sure at least one memory is in base+offset form.  */
+  if (!(base1 && offset1) && !(base2 && offset2))
+return false;
+
+  /* If both mems already use the same base register, just check the
+ offsets.  */
+  if (base1 && base2 && rtx_equal_p (base1, base2))
+{
+  if (!offset1 || !offset2)
+   return false;
+
+  if (known_eq (UINTVAL (offset1) + size1, UINTVAL (offset2)))
+   return true;
+
+  if (known_eq (UINTVAL (offset2) + size2, UINTVAL (offset1)))
+   {
+ *reversed = true;
+ return true;
+   }
+
+  return false;
+}
+
+  /* Otherwise, check whether the MEM_EXPRs and MEM_OFFSETs together
+ guarantee that the values are consecutive.  */
+  if (MEM_EXPR (*mem1)
+  && MEM_EXPR (*mem2)
+  && MEM_OFFSET_KNOWN_P (*mem1)
+  && MEM_OFFSET_KNOWN_P (*mem2))
+{
+  poly_int64 expr_offset1;
+  poly_int64 expr_offset2;
+  tree expr_base1 = get_addr_base_and_unit_offset (MEM_EXPR (*mem1),
+  &expr_offset1);
+  tree expr_base2 = get_addr_base_and_unit_offset (MEM_EXPR (*mem2),
+  &expr_offset2);
+  if (!expr_base1
+ || !expr_base2
+ || !operand_equal_p (expr_base1, expr_base2, OEP_ADDRESS_OF))
+   return false;
+
+  expr_offset1 += MEM_OFFSET (*mem1);
+  expr_offset2 += MEM_OFFSET (*mem2);
+
+  if (known_eq (expr_offset1 + size1, expr_offset2))
+   ;
+  else if (known_eq (expr_offset2 + size2, expr_offset1))
+   *reversed = true;
+  else
+   return false;
+
+  if (base2)
+   {
+ rtx addr1 = plus_constant (Pmode, XEXP (*mem2, 0),
+expr_offset1 - expr_offset2);
+ *mem1 = replace_equiv_address_nv (*mem1, addr1);
+   }
+  else
+   {
+ rtx addr2 = plus_constant (Pmode, XEXP (*mem1, 0),
+expr_offset2 - expr_offset1);
+ *mem2 = replace_equiv_address_nv (*mem2, addr2);

Re: [PATCH 1/3] gimple-fold: Transform stp*cpy_chk to str*cpy directly

2021-11-12 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 12 Nov 2021 at 01:12, Siddhesh Poyarekar  wrote:
>
> Avoid going through another folding cycle and use the ignore flag to
> directly transform BUILT_IN_STPCPY_CHK to BUILT_IN_STRCPY when set,
> likewise for BUILT_IN_STPNCPY_CHK to BUILT_IN_STPNCPY.
>
> Dump the transformation in dump_file so that we can verify in tests that
> the direct transformation actually happened.
>
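A minimal example of the kind of call this affects (the identifiers are
made up; the result is unused, so the _chk call can fold straight to
strcpy without a second folding cycle):

  extern char buf[32];

  void
  set_name (const char *s)
  {
    __builtin___stpcpy_chk (buf, s, __builtin_object_size (buf, 0));
  }
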
> gcc/ChangeLog:
>
> * gimple-fold.c (gimple_fold_builtin_stxcpy_chk,
> gimple_fold_builtin_stxncpy_chk): Use BUILT_IN_STRNCPY if return
> value is not used.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/fold-stringops.c: New test.
>
> Signed-off-by: Siddhesh Poyarekar 
> ---
>  gcc/gimple-fold.c   | 50 +
>  gcc/testsuite/gcc.dg/fold-stringops-1.c | 23 
>  2 files changed, 57 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/fold-stringops-1.c
>
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index 6e25a7c05db..92e15784803 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -3088,6 +3088,19 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator 
> *gsi,
>return true;
>  }
>
> +static void
> +dump_transformation (gimple *from, gimple *to)
I assume that both from and to will always be builtin calls ?
In that case, perhaps better to use gcall * here (and in rest of patch).
Also, needs a top-level comment describing the function.
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
Perhaps better to use dump_enabled_p ?
> +{
> +  fprintf (dump_file, "transformed ");
Perhaps use dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, ...) ?
I think you can use gimple_location to get the location.

Thanks,
Prathamesh
> +  print_generic_expr (dump_file, gimple_call_fn (from), dump_flags);
> +  fprintf (dump_file, " to ");
> +  print_generic_expr (dump_file, gimple_call_fn (to), dump_flags);
> +  fprintf (dump_file, "\n");
> +}
> +}
> +
>  /* Fold a call to the __st[rp]cpy_chk builtin.
> DEST, SRC, and SIZE are the arguments to the call.
> IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
> @@ -3184,12 +3197,13 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator 
> *gsi,
>  }
>
>/* If __builtin_st{r,p}cpy_chk is used, assume st{r,p}cpy is available.  */
> -  fn = builtin_decl_explicit (fcode == BUILT_IN_STPCPY_CHK
> +  fn = builtin_decl_explicit (fcode == BUILT_IN_STPCPY_CHK && !ignore
>   ? BUILT_IN_STPCPY : BUILT_IN_STRCPY);
>if (!fn)
>  return false;
>
>gimple *repl = gimple_build_call (fn, 2, dest, src);
> +  dump_transformation (stmt, repl);
>replace_call_with_call_and_fold (gsi, repl);
>return true;
>  }
> @@ -3209,19 +3223,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator 
> *gsi,
>bool ignore = gimple_call_lhs (stmt) == NULL_TREE;
>tree fn;
>
> -  if (fcode == BUILT_IN_STPNCPY_CHK && ignore)
> -{
> -   /* If return value of __stpncpy_chk is ignored,
> -  optimize into __strncpy_chk.  */
> -   fn = builtin_decl_explicit (BUILT_IN_STRNCPY_CHK);
> -   if (fn)
> -{
> -  gimple *repl = gimple_build_call (fn, 4, dest, src, len, size);
> -  replace_call_with_call_and_fold (gsi, repl);
> -  return true;
> -}
> -}
> -
>if (! tree_fits_uhwi_p (size))
>  return false;
>
> @@ -3234,7 +3235,23 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator 
> *gsi,
>  For MAXLEN only allow optimizing into non-_ocs function
>  if SIZE is >= MAXLEN, never convert to __ocs_fail ().  */
>   if (maxlen == NULL_TREE || ! tree_fits_uhwi_p (maxlen))
> -   return false;
> +   {
> + if (fcode == BUILT_IN_STPNCPY_CHK && ignore)
> +   {
> + /* If return value of __stpncpy_chk is ignored,
> +optimize into __strncpy_chk.  */
> + fn = builtin_decl_explicit (BUILT_IN_STRNCPY_CHK);
> + if (fn)
> +   {
> + gimple *repl = gimple_build_call (fn, 4, dest, src, len,
> +   size);
> + replace_call_with_call_and_fold (gsi, repl);
> + return true;
> +   }
> +   }
> +
> + return false;
> +   }
> }
>else
> maxlen = len;
> @@ -3244,12 +3261,13 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator 
> *gsi,
>  }
>
>/* If __builtin_st{r,p}ncpy_chk is used, assume st{r,p}ncpy is available.  
> */
> -  fn = builtin_decl_explicit (fcode == BUILT_IN_STPNCPY_CHK
> +  fn = builtin_decl_explicit (fcode == BUILT_IN_STPNCPY_CHK && !ignore
>   ? BUILT_IN_STPNCPY : BUILT_IN_STRNCPY);
>if (!fn)
>  return false;
>
>gimple *repl = gimple_build_call (fn, 3, dest, src, len);
> +  

Re: [PATCH] gcc: vxworks: fix providing stdint.h header

2021-11-12 Thread Olivier Hainque via Gcc-patches
Hi Rasmus,

For stdbool we have had to use a trick similar to the one we had for
stdint (the need to preinclude yvals.h), which we will need to
propagate somehow.  I'm not yet sure how to reconcile that with
your observations.

Olivier

> On 12 Nov 2021, at 11:15, Rasmus Villemoes  wrote:
> 
> Commit bbbc05957e (Arrange to preinclude yvals.h ahead of stdint on
> VxWorks 7) breaks the build of libstdc++ for our VxWorks 5 platform.
> 
> In file included from 
> .../gcc-build/powerpc-wrs-vxworks/libstdc++-v3/include/memory:72,
> from .../gcc-src/libstdc++-v3/include/precompiled/stdc++.h:82:
> .../gcc-build/powerpc-wrs-vxworks/libstdc++-v3/include/bits/align.h:36:10: 
> fatal error: stdint.h: No such file or directory
>   36 | #include <stdint.h> // uintptr_t
>  |  ^~
> compilation terminated.
> Makefile:1861: recipe for target 
> 'powerpc-wrs-vxworks/bits/stdc++.h.gch/O2ggnu++0x.gch' failed
> make[5]: *** [powerpc-wrs-vxworks/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
> 
> The problem is that the stdint.h header does not exist (in the
> gcc/include/ directory) during the build, but is only added at "make
> install" time.
> 
> For the approach with an extra makefile fragment to work, that rule
> would have to fire after stmp-int-hdrs as it does now (i.e., after the
> common logic has removed stdint.h), but it must also run before we
> actually start building target libraries that depend on having a
> stdint.h - and I can't find something reasonable to make the rule a
> dependency of.
> 
> I agree with the intent of avoiding "altering the common stdint-gcc.h
> with unpleasant vxworks specific bits". The best approach I could come
> up with is adding another variant of "use_gcc_stdint", "preserve",
> meaning "leave whatever the target's extra_headers settings put inside
> gcc/include/ alone". There's no change in behaviour for any of the
> existing values "none", "wrap" or "provide".
> 
> gcc/ChangeLog:
> 
>   * Makefile.in (stmp-int-hdrs): Only remove include/stdint.h when
>   $(USE_GCC_STDINT) != "preserve".
>   * config.gcc: Document new possible value of use_gcc_stdint:
>   "preserve".
>   * config.gcc (vxworks): Add ../vxworks/stdint.h to
>   extra_headers and set use_gcc_stdint=preserve.
>   * config/t-vxworks: Remove install-stdint.h rule.
> ---
> 
> I have previously sent something similar to Olivier privately, hoping
> I could come up with a better/cleaner fix. But I have failed, so now
> I've taken what I had and added the necessary documentation and
> changelog bits.
> 
> Better ideas are of course welcome. I thought of using "custom"
> instead of "preserve", but chose the latter since "wrap" and "provide"
> are verbs.
> 
> gcc/Makefile.in  |  4 +++-
> gcc/config.gcc   | 11 ++-
> gcc/config/t-vxworks | 12 
> 3 files changed, 9 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 571e9c28e29..759982f1d7d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -3132,7 +3132,9 @@ stmp-int-hdrs: $(STMP_FIXINC) $(T_GLIMITS_H) $(USER_H) 
> fixinc_list
>   chmod a+r include/$$file; \
> fi; \
>   done
> - rm -f include/stdint.h
> + if [ $(USE_GCC_STDINT) != preserve ]; then \
> +   rm -f include/stdint.h; \
> + fi
>   if [ $(USE_GCC_STDINT) = wrap ]; then \
> rm -f include/stdint-gcc.h; \
> cp $(srcdir)/ginclude/stdint-gcc.h include/stdint-gcc.h; \
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index edd12655c4a..7a236e1a967 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -129,8 +129,9 @@
> #  use_gcc_stdint If "wrap", install a version of stdint.h that
> # wraps the system's copy for hosted compilations;
> # if "provide", provide a version of systems without
> -#such a system header; otherwise "none", do not
> -#provide such a header at all.
> +#such a system header; if "preserve", keep the copy
> +#installed via the target's extra_headers; otherwise
> +#"none", do not provide such a header at all.
> #
> #  extra_programs List of extra executables compiled for this target
> # machine, used when linking.
> @@ -1024,11 +1025,11 @@ case ${target} in
>   tm_file="${tm_file} vxworks-stdint.h"
> 
>   # .. only through the yvals conditional wrapping mentioned above
> -  # to abide by the VxWorks 7 expectations.  The final copy is performed
> -  # explicitly by a t-vxworks Makefile rule.
> +  # to abide by the VxWorks 7 expectations.
> 
> -  use_gcc_stdint=none
> +  use_gcc_stdint=preserve
>   extra_headers="${extra_headers} ../../ginclude/stdint-gcc.h"
> +  extra_headers="${extra_headers} ../vxworks/stdint.h"
> 
>   case ${enable_threads} in
> no) ;;
> diff --git a/gcc/config/t-vxworks b/gcc/config/t-vxworks
> index 5a06ebe1b87..a544bedf634 100644
> --- 

Re: [PATCH][V2] rs6000: Remove unnecessary option manipulation.

2021-11-12 Thread Martin Liška

On 11/12/21 16:58, Segher Boessenkool wrote:

On Fri, Nov 12, 2021 at 03:34:17PM +0100, Martin Liška wrote:

On 11/11/21 18:52, Segher Boessenkool wrote:

You forgot to send the commit message though?


No, the patch is simple so I didn't write any message (except commit title).


How is a maintainer supposed to know what the patch is about, then?  Not
all of us are clairvoyant.


Oh yeah, lemme explain it.




--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
/* Explicit -funroll-loops turns -munroll-only-small-loops off, and
   turns -frename-registers on.  */
if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-  && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) &&
flag_unroll_all_loops))
  {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-   unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-   flag_rename_registers = 1;
if (!OPTION_SET_P (flag_cunroll_grow_size))
flag_cunroll_grow_size = 1;
  }


So some explanation for these two changes would be good to have.


It's explained in the ChangeLog entry.


It is not.  Besides, a changelog should describe *what* changed, not
*why*, anyway.


All right.




diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1)
Save
  Analyze and remove doubleword swaps from VSX computations.
  
  munroll-only-small-loops

-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
EnabledBy(funroll-loops)


You used format=flowed it seems?  Don't.  Patches are mangled with it :-(


No, it's correct:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583310.html


Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

It is not correct.  Please fix.


Ok, I've just used directly git send-email, please search for V4.

Thanks,
Martin




Segher





Re: [PATCH] x86: Require TARGET_HIMODE_MATH for HImode atomic bit expanders

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 07:55:26AM -0800, H.J. Lu wrote:
> > I have the following patch queued for testing for this...
> >
> > 2021-11-12  Jakub Jelinek  
> >
> > PR target/103205
> > * config/i386/sync.md (atomic_bit_test_and_set,
> > atomic_bit_test_and_complement,
> > atomic_bit_test_and_reset): Use OPTAB_WIDEN instead of
> > OPTAB_DIRECT.
> >
> > * gcc.target/i386/pr103205.c: New test.
> 
> Can you include my tests?  Or you can leave out your test and I can check
> in my tests after your fix has been checked in.

I'd prefer the latter.

Jakub



[PATCH][V4] rs6000: Remove unnecessary option manipulation.

2021-11-12 Thread Martin Liska
Do not set flag_rename_registers; it's already enabled with
EnabledBy(funroll-loops) in the common.opt file.  Use EnabledBy for
unroll_only_small_loops as well, which is the canonical way to express
option dependencies.

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_override_options_after_change):
Do not set flag_rename_registers and unroll_only_small_loops.
* config/rs6000/rs6000.opt: Use EnabledBy for unroll_only_small_loops.
---
 gcc/config/rs6000/rs6000.c   | 7 +--
 gcc/config/rs6000/rs6000.opt | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e4843eb0f1c..5550113a94c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3466,13 +3466,8 @@ rs6000_override_options_after_change (void)
   /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
  turns -frename-registers on.  */
   if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-  && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops))
 {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-   unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-   flag_rename_registers = 1;
   if (!OPTION_SET_P (flag_cunroll_grow_size))
flag_cunroll_grow_size = 1;
 }
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
 munroll-only-small-loops
-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save 
EnabledBy(funroll-loops)
 ; Use conservative small loop unrolling.
 
 mpower9-misc
-- 
2.33.1



Re: [Patch] Fortran/openmp: Fix '!$omp end'

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 04:56:37PM +0100, Tobias Burnus wrote:
> Fortran/openmp: Fix '!$omp end'
> 
> gcc/fortran/ChangeLog:
> 
>   * parse.c (decode_omp_directive): Fix permitting 'nowait' for some
>   combined directives, add missing 'omp end ... loop'.
>   (gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
>   * openmp.c (resolve_omp_clauses): Add missing combined loop constructs
>   case values to the 'if(directive-name: ...)' check.
>   * trans-openmp.c (gfc_split_omp_clauses): Put nowait on target if
>   first leaf construct accepting it.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
>   * gfortran.dg/gomp/clauses-1.f90: New test.
>   * gfortran.dg/gomp/nowait-2.f90: New test.
>   * gfortran.dg/gomp/nowait-3.f90: New test.

Mostly good, except:

> @@ -6132,10 +6134,9 @@ gfc_split_omp_clauses (gfc_code *code,
> if (mask & GFC_OMP_MASK_TEAMS && innermost != GFC_OMP_MASK_TEAMS)
>   gfc_add_clause_implicitly ([GFC_OMP_SPLIT_TEAMS],
>   code->ext.omp_clauses, false, false);
> -   if (((mask & (GFC_OMP_MASK_PARALLEL | GFC_OMP_MASK_DO))
> - == (GFC_OMP_MASK_PARALLEL | GFC_OMP_MASK_DO))
> -   && !is_loop)
> -clausesa[GFC_OMP_SPLIT_DO].nowait = true;
> +   if ((mask & (GFC_OMP_MASK_PARALLEL | GFC_OMP_MASK_DO))
> +   == (GFC_OMP_MASK_PARALLEL | GFC_OMP_MASK_DO))
> +clausesa[GFC_OMP_SPLIT_DO].nowait = false;
>  }

this.  In the standard, yes, for parallel {do,sections,workshare}
indeed the do/sections/workshare doesn't get nowait (either
it is not allowed to specify it at all, or if combined with
target, nowait should go to target and nothing else).
But, for the middle-end, we actually want nowait true
whenever a worksharing construct is combined with parallel,
because when the worksharing construct ends, doing a barrier there
will mean we wait, then immediately get to the implicit barrier at the end
of parallel.
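
For example (C shown for brevity; the Fortran case is analogous):

  /* The inner loop's end-of-construct barrier buys nothing here: the
     parallel region's own barrier follows immediately, so the
     middle-end wants an implicit nowait on the "for"/"do" part.  */
  #pragma omp parallel for
  for (int i = 0; i < n; i++)
    a[i] = b[i];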

c_omp_split_clauses does:
  /* Add implicit nowait clause on
 #pragma omp parallel {for,for simd,sections}.  */
  if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_NUM_THREADS)) != 0)
switch (code)
  {
  case OMP_FOR:
  case OMP_SIMD:
if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_SCHEDULE)) != 0)
  cclauses[C_OMP_CLAUSE_SPLIT_FOR]
= build_omp_clause (loc, OMP_CLAUSE_NOWAIT);
break;
  case OMP_SECTIONS:
cclauses[C_OMP_CLAUSE_SPLIT_SECTIONS]
  = build_omp_clause (loc, OMP_CLAUSE_NOWAIT);
break;
  default:
break;
  }
and I think the previous code did exactly that.

So, the patch is ok for trunk without the above hunk.

Jakub



Re: [PATCH][V2] rs6000: Remove unnecessary option manipulation.

2021-11-12 Thread Segher Boessenkool
On Fri, Nov 12, 2021 at 03:34:17PM +0100, Martin Liška wrote:
> On 11/11/21 18:52, Segher Boessenkool wrote:
> >You forgot to send the commit message though?
> 
> No, the patch is simple so I didn't write any message (except commit title).

How is a maintainer supposed to know what the patch is about, then?  Not
all of us are clairvoyant.

> >>--- a/gcc/config/rs6000/rs6000.c
> >>+++ b/gcc/config/rs6000/rs6000.c
> >>@@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
> >>/* Explicit -funroll-loops turns -munroll-only-small-loops off, and
> >>   turns -frename-registers on.  */
> >>if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
> >>-   || (OPTION_SET_P (flag_unroll_all_loops)
> >>-  && flag_unroll_all_loops))
> >>+   || (OPTION_SET_P (flag_unroll_all_loops) && 
> >>flag_unroll_all_loops))
> >>  {
> >>-  if (!OPTION_SET_P (unroll_only_small_loops))
> >>-   unroll_only_small_loops = 0;
> >>-  if (!OPTION_SET_P (flag_rename_registers))
> >>-   flag_rename_registers = 1;
> >>if (!OPTION_SET_P (flag_cunroll_grow_size))
> >>flag_cunroll_grow_size = 1;
> >>  }
> >
> >So some explanation for these two changes would be good to have.
> 
> It's explained in the ChangeLog entry.

It is not.  Besides, a changelog should describe *what* changed, not
*why*, anyway.

> >>diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> >>index 9d7878f144a..faeb7423ca7 100644
> >>--- a/gcc/config/rs6000/rs6000.opt
> >>+++ b/gcc/config/rs6000/rs6000.opt
> >>@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1)
> >>Save
> >>  Analyze and remove doubleword swaps from VSX computations.
> >>  
> >>  munroll-only-small-loops
> >>-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
> >>+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
> >>EnabledBy(funroll-loops)
> >
> >You used format=flowed it seems?  Don't.  Patches are mangled with it :-(
> 
> No, it's correct:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583310.html

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

It is not correct.  Please fix.


Segher


Re: [Patch] Fortran/openmp: Fix '!$omp end'

2021-11-12 Thread Tobias Burnus

On 12.11.21 13:02, Jakub Jelinek wrote:

3) anything combined with target allows it


... and puts it on 'target' as it shouldn't be on 'for' or 'do' in
'target ... parallel do/for ...', I'd guess.

Updated patch attached.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran/openmp: Fix '!$omp end'

gcc/fortran/ChangeLog:

	* parse.c (decode_omp_directive): Fix permitting 'nowait' for some
	combined directives, add missing 'omp end ... loop'.
	(gfc_ascii_statement): Fix ST_OMP_END_TEAMS_LOOP result.
	* openmp.c (resolve_omp_clauses): Add missing combined loop constructs
	case values to the 'if(directive-name: ...)' check.
	* trans-openmp.c (gfc_split_omp_clauses): Put nowait on target if
	first leaf construct accepting it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/unexpected-end.f90: Update dg-error.
	* gfortran.dg/gomp/clauses-1.f90: New test.
	* gfortran.dg/gomp/nowait-2.f90: New test.
	* gfortran.dg/gomp/nowait-3.f90: New test.

 gcc/fortran/openmp.c  |   3 +
 gcc/fortran/parse.c   |  31 +-
 gcc/fortran/trans-openmp.c|   9 +-
 gcc/testsuite/gfortran.dg/gomp/clauses-1.f90  | 667 ++
 gcc/testsuite/gfortran.dg/gomp/nowait-2.f90   | 315 ++
 gcc/testsuite/gfortran.dg/gomp/unexpected-end.f90 |  12 +-
 6 files changed, 1016 insertions(+), 21 deletions(-)

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 7b2df0d0be3..2893ab2befb 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -6232,6 +6232,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 
 	case EXEC_OMP_PARALLEL:
 	case EXEC_OMP_PARALLEL_DO:
+	case EXEC_OMP_PARALLEL_LOOP:
 	case EXEC_OMP_PARALLEL_MASKED:
 	case EXEC_OMP_PARALLEL_MASTER:
 	case EXEC_OMP_PARALLEL_SECTIONS:
@@ -6285,6 +6286,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	case EXEC_OMP_TARGET:
 	case EXEC_OMP_TARGET_TEAMS:
 	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE:
+	case EXEC_OMP_TARGET_TEAMS_LOOP:
 	  ok = ifc == OMP_IF_TARGET;
 	  break;
 
@@ -6312,6 +6314,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	case EXEC_OMP_TARGET_TEAMS_DISTRIBUTE_PARALLEL_DO:
 	case EXEC_OMP_TARGET_PARALLEL:
 	case EXEC_OMP_TARGET_PARALLEL_DO:
+	case EXEC_OMP_TARGET_PARALLEL_LOOP:
 	  ok = ifc == OMP_IF_TARGET || ifc == OMP_IF_PARALLEL;
 	  break;
 
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 12aa80ec45c..94b677f2a70 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -924,6 +924,7 @@ decode_omp_directive (void)
   matcho ("end distribute", gfc_match_omp_eos_error, ST_OMP_END_DISTRIBUTE);
   matchs ("end do simd", gfc_match_omp_end_nowait, ST_OMP_END_DO_SIMD);
   matcho ("end do", gfc_match_omp_end_nowait, ST_OMP_END_DO);
+  matcho ("end loop", gfc_match_omp_eos_error, ST_OMP_END_LOOP);
   matchs ("end simd", gfc_match_omp_eos_error, ST_OMP_END_SIMD);
   matcho ("end masked taskloop simd", gfc_match_omp_eos_error,
 	  ST_OMP_END_MASKED_TASKLOOP_SIMD);
@@ -939,6 +940,8 @@ decode_omp_directive (void)
   matchs ("end parallel do simd", gfc_match_omp_eos_error,
 	  ST_OMP_END_PARALLEL_DO_SIMD);
   matcho ("end parallel do", gfc_match_omp_eos_error, ST_OMP_END_PARALLEL_DO);
+  matcho ("end parallel loop", gfc_match_omp_eos_error,
+	  ST_OMP_END_PARALLEL_LOOP);
   matcho ("end parallel masked taskloop simd", gfc_match_omp_eos_error,
 	  ST_OMP_END_PARALLEL_MASKED_TASKLOOP_SIMD);
   matcho ("end parallel masked taskloop", gfc_match_omp_eos_error,
@@ -960,24 +963,29 @@ decode_omp_directive (void)
   matcho ("end sections", gfc_match_omp_end_nowait, ST_OMP_END_SECTIONS);
   matcho ("end single", gfc_match_omp_end_single, ST_OMP_END_SINGLE);
   matcho ("end target data", gfc_match_omp_eos_error, ST_OMP_END_TARGET_DATA);
-  matchs ("end target parallel do simd", gfc_match_omp_eos_error,
+  matchs ("end target parallel do simd", gfc_match_omp_end_nowait,
 	  ST_OMP_END_TARGET_PARALLEL_DO_SIMD);
-  matcho ("end target parallel do", gfc_match_omp_eos_error,
+  matcho ("end target parallel do", gfc_match_omp_end_nowait,
 	  ST_OMP_END_TARGET_PARALLEL_DO);
-  matcho ("end target parallel", gfc_match_omp_eos_error,
+  matcho ("end target parallel loop", gfc_match_omp_end_nowait,
+	  ST_OMP_END_TARGET_PARALLEL_LOOP);
+  matcho ("end target parallel", gfc_match_omp_end_nowait,
 	  ST_OMP_END_TARGET_PARALLEL);
-  matchs ("end target simd", gfc_match_omp_eos_error, ST_OMP_END_TARGET_SIMD);
+  matchs ("end target simd", gfc_match_omp_end_nowait, ST_OMP_END_TARGET_SIMD);
   

Re: [PATCH] x86: Require TARGET_HIMODE_MATH for HImode atomic bit expanders

2021-11-12 Thread H.J. Lu via Gcc-patches
On Fri, Nov 12, 2021 at 7:34 AM Jakub Jelinek  wrote:
>
> On Fri, Nov 12, 2021 at 07:29:03AM -0800, H.J. Lu wrote:
> > Check optab before transforming equivalent, but slightly different cases
> > to their canonical forms in optimize_atomic_bit_test_and and require
> > TARGET_HIMODE_MATH in HImode atomic bit expanders.
> >
> > gcc/
> >
> >   PR target/103205
> >   * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check optab
> >   before transforming equivalent, but slightly different cases to
> >   their canonical forms.
> >   * config/i386/sync.md (atomic_bit_test_and_set): Require
> >   TARGET_HIMODE_MATH for HImode.
> >   (atomic_bit_test_and_complement): Likewise.
> >   (atomic_bit_test_and_reset): Likewise.
> >
> > gcc/testsuite/
> >
> >   PR target/103205
> >   * gcc.target/i386/pr103205-1a.c: New test.
> >   * gcc.target/i386/pr103205-1b.c: Likewise.
> >   * gcc.target/i386/pr103205-2a.c: Likewise.
> >   * gcc.target/i386/pr103205-2b.c: Likewise.
> >   * gcc.target/i386/pr103205-3.c: Likewise.
> >   * gcc.target/i386/pr103205-4.c: Likewise.
>
> Why?  When one uses 16-bit atomics, no matter what he does there will be
> some HImode math (at least the atomic instruction).  And the rest can be
> dealt with.

I withdrew my patch.

> I have the following patch queued for testing for this...
>
> 2021-11-12  Jakub Jelinek  
>
> PR target/103205
> * config/i386/sync.md (atomic_bit_test_and_set,
> atomic_bit_test_and_complement,
> atomic_bit_test_and_reset): Use OPTAB_WIDEN instead of
> OPTAB_DIRECT.
>
> * gcc.target/i386/pr103205.c: New test.

Can you include my tests?  Or you can leave out your test and I can check
in my tests after your fix has been checked in.

Thanks.

> --- gcc/config/i386/sync.md.jj  2021-10-04 19:53:01.025005548 +0200
> +++ gcc/config/i386/sync.md 2021-11-12 15:27:47.387273428 +0100
> @@ -726,7 +726,7 @@ (define_expand "atomic_bit_test_and_set<
>rtx result = convert_modes (mode, QImode, tem, 1);
>if (operands[4] == const0_rtx)
>  result = expand_simple_binop (mode, ASHIFT, result,
> - operands[2], operands[0], 0, OPTAB_DIRECT);
> + operands[2], operands[0], 0, OPTAB_WIDEN);
>if (result != operands[0])
>  emit_move_insn (operands[0], result);
>DONE;
> @@ -763,7 +763,7 @@ (define_expand "atomic_bit_test_and_comp
>rtx result = convert_modes (mode, QImode, tem, 1);
>if (operands[4] == const0_rtx)
>  result = expand_simple_binop (mode, ASHIFT, result,
> - operands[2], operands[0], 0, OPTAB_DIRECT);
> + operands[2], operands[0], 0, OPTAB_WIDEN);
>if (result != operands[0])
>  emit_move_insn (operands[0], result);
>DONE;
> @@ -801,7 +801,7 @@ (define_expand "atomic_bit_test_and_rese
>rtx result = convert_modes (mode, QImode, tem, 1);
>if (operands[4] == const0_rtx)
>  result = expand_simple_binop (mode, ASHIFT, result,
> - operands[2], operands[0], 0, OPTAB_DIRECT);
> + operands[2], operands[0], 0, OPTAB_WIDEN);
>if (result != operands[0])
>  emit_move_insn (operands[0], result);
>DONE;
> --- gcc/testsuite/gcc.target/i386/pr103205.c.jj 2021-11-12 15:47:21.218380790 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr103205.c2021-11-12 15:46:39.546980182 
> +0100
> @@ -0,0 +1,11 @@
> +/* PR target/103205 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */
> +
> +unsigned short a;
> +
> +unsigned short
> +foo (void)
> +{
> +  return __sync_fetch_and_and (, ~1) & 1;
> +}
>
>
> Jakub
>


-- 
H.J.


Re: [PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-12 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> using -fno-semantic-interposition has been reported by various people
> to bring about considerable speed-ups at the cost of strict compliance
> to the ELF symbol interposition rules.  See for example
> https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> 
> As such I believe it should be implied by our -Ofast optimization
> level, not only so that benchmarks that can benefit run faster, but
> also so that people looking at -Ofast documentation for options that
> could speed their programs find it.
> 
> I have verified that with the following patch IPA-CP sees
> flag_semantic_interposition set to zero at Ofast and that info and pdf
> manual builds fine with the documentation change.  I am bootstrapping
> and testing it now in order to comply with submission criteria but I
> don't think an Ofast change gets much tested.
> 
> Assuming it passes, is the patch OK?  (If it is, I will also add a note
> about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> after I commit the patch.)
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-11-12  Martin Jambor  
> 
>   * opts.c (default_options_table): Switch off
>   flag_semantic_interposition at Ofast.
>   * doc/invoke.texi (Optimize Options): Document that Ofast switches off
>   -fsemantic-interposition.
OK,
thanks!
Honza


[PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-12 Thread Martin Jambor
Hi,

using -fno-semantic-interposition has been reported by various people
to bring about considerable speed-ups at the cost of strict compliance
to the ELF symbol interposition rules.  See for example
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

As such I believe it should be implied by our -Ofast optimization
level, not only so that benchmarks that can benefit run faster, but
also so that people looking at -Ofast documentation for options that
could speed their programs find it.
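
A toy example of what the option buys (assume this is compiled into a
shared library with -O2 -fPIC):

  int hot (int x) { return x * 2 + 1; }

  int
  driver (int x)
  {
    /* Under the default -fsemantic-interposition these calls must
       allow for "hot" being interposed at load time, which blocks
       inlining; with -fno-semantic-interposition GCC may bind to and
       inline the local definition.  */
    return hot (x) + hot (x + 1);
  }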

I have verified that with the following patch IPA-CP sees
flag_semantic_interposition set to zero at Ofast and that info and pdf
manual builds fine with the documentation change.  I am bootstrapping
and testing it now in order to comply with submission criteria but I
don't think an Ofast change gets much tested.

Assuming it passes, is the patch OK?  (If it is, I will also add a note
about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
after I commit the patch.)

Thanks,

Martin


gcc/ChangeLog:

2021-11-12  Martin Jambor  

* opts.c (default_options_table): Switch off
flag_semantic_interposition at Ofast.
* doc/invoke.texi (Optimize Options): Document that Ofast switches off
-fsemantic-interposition.
---
 gcc/doc/invoke.texi | 1 +
 gcc/opts.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2ea23d07c4c..fd16c91aec8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10551,6 +10551,7 @@ valid for all standard-compliant programs.
 It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
 and the Fortran-specific @option{-fstack-arrays}, unless
 @option{-fmax-stack-var-size} is specified, and @option{-fno-protect-parens}.
+It turns off @option{-fsemantic-interposition}.
 
 @item -Og
 @opindex Og
diff --git a/gcc/opts.c b/gcc/opts.c
index caed6255500..3da53d8f890 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -682,6 +682,7 @@ static const struct default_options default_options_table[] 
=
 /* -Ofast adds optimizations to -O3.  */
 { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
 { OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },
+{ OPT_LEVELS_FAST, OPT_fsemantic_interposition, NULL, 0 },
 
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
-- 
2.33.0



Fix ICE in tree-ssa-structalias

2021-11-12 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes an ICE in the sanity check of the computed EAF flags:
a parameter cannot escape, be clobbered or be returned indirectly
without being read.  I moved the check earlier and fixed the incorrect
flag updates.
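
A tiny illustration of the invariant being asserted (escape () is a
hypothetical external function):

  extern void escape (int *);

  void
  f (int **p)
  {
    /* *p escapes indirectly, but only because p itself is read first;
       hence EAF_NO_DIRECT_READ must imply the EAF_NO_INDIRECT_* and
       EAF_NOT_RETURNED_INDIRECTLY flags.  */
    escape (*p);
  }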

Bootstrapped/regtested x86_64-linux, committed.

Honza

PR tree-optimization/103175
* ipa-modref.c (modref_lattice::merge): Add sanity check.
(callee_to_caller_flags): Make flags adjustment sane.
(modref_eaf_analysis::analyze_ssa_name): Likewise.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 44b3427a202..e999c2c5d1e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1681,6 +1681,13 @@ modref_lattice::merge (int f)
 {
   if (f & EAF_UNUSED)
 return false;
+  /* Check that the flags seem sane: if the function does not read the
+ parameter, it cannot access it indirectly.  */
+  gcc_checking_assert (!(f & EAF_NO_DIRECT_READ)
+  || ((f & EAF_NO_INDIRECT_READ)
+  && (f & EAF_NO_INDIRECT_CLOBBER)
+  && (f & EAF_NO_INDIRECT_ESCAPE)
+  && (f & EAF_NOT_RETURNED_INDIRECTLY)));
   if ((flags & f) != flags)
 {
   flags &= f;
@@ -1889,9 +1896,11 @@ callee_to_caller_flags (int call_flags, bool 
ignore_stores,
   if (!(call_flags & EAF_NO_DIRECT_ESCAPE))
lattice.merge (~(EAF_NOT_RETURNED_DIRECTLY
 | EAF_NOT_RETURNED_INDIRECTLY
+| EAF_NO_DIRECT_READ
 | EAF_UNUSED));
   if (!(call_flags & EAF_NO_INDIRECT_ESCAPE))
lattice.merge (~(EAF_NOT_RETURNED_INDIRECTLY
+| EAF_NO_DIRECT_READ
 | EAF_UNUSED));
 }
   else
@@ -2033,11 +2042,11 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
  if (!(call_flags & (EAF_NOT_RETURNED_DIRECTLY
  | EAF_UNUSED)))
m_lattice[index].merge (~(EAF_NO_DIRECT_ESCAPE
- | EAF_NO_INDIRECT_ESCAPE
  | EAF_UNUSED));
  if (!(call_flags & (EAF_NOT_RETURNED_INDIRECTLY
  | EAF_UNUSED)))
m_lattice[index].merge (~(EAF_NO_INDIRECT_ESCAPE
+ | EAF_NO_DIRECT_READ
  | EAF_UNUSED));
  call_flags = callee_to_caller_flags
   (call_flags, false,


Re: [PATCH] x86: Require TARGET_HIMODE_MATH for HImode atomic bit expanders

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 07:29:03AM -0800, H.J. Lu wrote:
> Check optab before transforming equivalent, but slightly different cases
> to their canonical forms in optimize_atomic_bit_test_and and require
> TARGET_HIMODE_MATH in HImode atomic bit expanders.
> 
> gcc/
> 
>   PR target/103205
>   * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check optab
>   before transforming equivalent, but slightly different cases to
>   their canonical forms.
>   * config/i386/sync.md (atomic_bit_test_and_set): Require
>   TARGET_HIMODE_MATH for HImode.
>   (atomic_bit_test_and_complement): Likewise.
>   (atomic_bit_test_and_reset): Likewise.
> 
> gcc/testsuite/
> 
>   PR target/103205
>   * gcc.target/i386/pr103205-1a.c: New test.
>   * gcc.target/i386/pr103205-1b.c: Likewise.
>   * gcc.target/i386/pr103205-2a.c: Likewise.
>   * gcc.target/i386/pr103205-2b.c: Likewise.
>   * gcc.target/i386/pr103205-3.c: Likewise.
>   * gcc.target/i386/pr103205-4.c: Likewise.

Why?  When one uses 16-bit atomics, no matter what he does there will be
some HImode math (at least the atomic instruction).  And the rest can be
dealt with.
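
Conceptually, OPTAB_WIDEN lets expand_simple_binop fall back to a wider
mode when the operation isn't supported in the requested one.  A
hand-written equivalent of the HImode shift would be something like this
sketch (operand names as in the expander):

  rtx wide = convert_modes (SImode, HImode, result, 1);
  wide = expand_simple_binop (SImode, ASHIFT, wide, operands[2],
			      NULL_RTX, 0, OPTAB_DIRECT);
  result = gen_lowpart (HImode, wide);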

I have the following patch queued for testing for this...

2021-11-12  Jakub Jelinek  

PR target/103205
* config/i386/sync.md (atomic_bit_test_and_set,
atomic_bit_test_and_complement,
atomic_bit_test_and_reset): Use OPTAB_WIDEN instead of
OPTAB_DIRECT.

* gcc.target/i386/pr103205.c: New test.

--- gcc/config/i386/sync.md.jj  2021-10-04 19:53:01.025005548 +0200
+++ gcc/config/i386/sync.md 2021-11-12 15:27:47.387273428 +0100
@@ -726,7 +726,7 @@ (define_expand "atomic_bit_test_and_set<mode>"
   rtx result = convert_modes (mode, QImode, tem, 1);
   if (operands[4] == const0_rtx)
 result = expand_simple_binop (mode, ASHIFT, result,
- operands[2], operands[0], 0, OPTAB_DIRECT);
+ operands[2], operands[0], 0, OPTAB_WIDEN);
   if (result != operands[0])
 emit_move_insn (operands[0], result);
   DONE;
@@ -763,7 +763,7 @@ (define_expand "atomic_bit_test_and_complement<mode>"
   rtx result = convert_modes (mode, QImode, tem, 1);
   if (operands[4] == const0_rtx)
 result = expand_simple_binop (mode, ASHIFT, result,
- operands[2], operands[0], 0, OPTAB_DIRECT);
+ operands[2], operands[0], 0, OPTAB_WIDEN);
   if (result != operands[0])
 emit_move_insn (operands[0], result);
   DONE;
@@ -801,7 +801,7 @@ (define_expand "atomic_bit_test_and_reset<mode>"
   rtx result = convert_modes (mode, QImode, tem, 1);
   if (operands[4] == const0_rtx)
 result = expand_simple_binop (mode, ASHIFT, result,
- operands[2], operands[0], 0, OPTAB_DIRECT);
+ operands[2], operands[0], 0, OPTAB_WIDEN);
   if (result != operands[0])
 emit_move_insn (operands[0], result);
   DONE;
--- gcc/testsuite/gcc.target/i386/pr103205.c.jj	2021-11-12 15:47:21.218380790 +0100
+++ gcc/testsuite/gcc.target/i386/pr103205.c	2021-11-12 15:46:39.546980182 +0100
@@ -0,0 +1,11 @@
+/* PR target/103205 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */
+
+unsigned short a;
+
+unsigned short
+foo (void)
+{
+  return __sync_fetch_and_and (&a, ~1) & 1;
+}


Jakub



[PATCH] x86: Require TARGET_HIMODE_MATH for HImode atomic bit expanders

2021-11-12 Thread H.J. Lu via Gcc-patches
Check optab before transforming equivalent, but slightly different cases
to their canonical forms in optimize_atomic_bit_test_and and require
TARGET_HIMODE_MATH in HImode atomic bit expanders.

gcc/

PR target/103205
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Check optab
before transforming equivalent, but slightly different cases to
their canonical forms.
* config/i386/sync.md (atomic_bit_test_and_set): Require
TARGET_HIMODE_MATH for HImode.
(atomic_bit_test_and_complement): Likewise.
(atomic_bit_test_and_reset): Likewise.

gcc/testsuite/

PR target/103205
* gcc.target/i386/pr103205-1a.c: New test.
* gcc.target/i386/pr103205-1b.c: Likewise.
* gcc.target/i386/pr103205-2a.c: Likewise.
* gcc.target/i386/pr103205-2b.c: Likewise.
* gcc.target/i386/pr103205-3.c: Likewise.
* gcc.target/i386/pr103205-4.c: Likewise.
---
 gcc/config/i386/sync.md |  6 ++--
 gcc/testsuite/gcc.target/i386/pr103205-1a.c | 27 
 gcc/testsuite/gcc.target/i386/pr103205-1b.c |  9 ++
 gcc/testsuite/gcc.target/i386/pr103205-2a.c | 26 
 gcc/testsuite/gcc.target/i386/pr103205-2b.c |  9 ++
 gcc/testsuite/gcc.target/i386/pr103205-3.c  | 11 +++
 gcc/testsuite/gcc.target/i386/pr103205-4.c  | 11 +++
 gcc/tree-ssa-ccp.c  | 34 -
 8 files changed, 115 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103205-4.c

diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 05a835256bb..68c4314c21c 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -717,7 +717,7 @@ (define_expand "atomic_bit_test_and_set<mode>"
(match_operand:SWI248 2 "nonmemory_operand")
(match_operand:SI 3 "const_int_operand") ;; model
(match_operand:SI 4 "const_int_operand")]
-  ""
+  "<MODE>mode != HImode || TARGET_HIMODE_MATH"
 {
   emit_insn (gen_atomic_bit_test_and_set_1 (operands[1], operands[2],
  operands[3]));
@@ -753,7 +753,7 @@ (define_expand "atomic_bit_test_and_complement<mode>"
(match_operand:SWI248 2 "nonmemory_operand")
(match_operand:SI 3 "const_int_operand") ;; model
(match_operand:SI 4 "const_int_operand")]
-  ""
+  "<MODE>mode != HImode || TARGET_HIMODE_MATH"
 {
   emit_insn (gen_atomic_bit_test_and_complement_1 (operands[1],
 operands[2],
@@ -792,7 +792,7 @@ (define_expand "atomic_bit_test_and_reset<mode>"
(match_operand:SWI248 2 "nonmemory_operand")
(match_operand:SI 3 "const_int_operand") ;; model
(match_operand:SI 4 "const_int_operand")]
-  ""
+  "<MODE>mode != HImode || TARGET_HIMODE_MATH"
 {
   emit_insn (gen_atomic_bit_test_and_reset_1 (operands[1], operands[2],
operands[3]));
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-1a.c b/gcc/testsuite/gcc.target/i386/pr103205-1a.c
new file mode 100644
index 000..3ea74b68059
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-1a.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=himode_math" } */
+
+extern short foo;
+
+int
+foo1 (void)
+{
+  return __sync_fetch_and_and (&foo, ~1) & 1;
+}
+
+int
+foo2 (void)
+{
+  return __sync_fetch_and_or (&foo, 1) & 1;
+}
+
+int
+foo3 (void)
+{
+  return __sync_fetch_and_xor (&foo, 1) & 1;
+}
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btrw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btsw" 1 } } */
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*btcw" 1 } } */
+/* { dg-final { scan-assembler-not "cmpxchgw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-1b.c b/gcc/testsuite/gcc.target/i386/pr103205-1b.c
new file mode 100644
index 000..4ce24b5011e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-1b.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=^himode_math" } */
+
+#include "pr103205-1a.c"
+
+/* { dg-final { scan-assembler-times "lock;?\[ \t\]*cmpxchgw" 3 } } */
+/* { dg-final { scan-assembler-not "btrw" } } */
+/* { dg-final { scan-assembler-not "btsw" } } */
+/* { dg-final { scan-assembler-not "btcw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr103205-2a.c b/gcc/testsuite/gcc.target/i386/pr103205-2a.c
new file mode 100644
index 000..7eb7122aaaf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103205-2a.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune-ctrl=himode_math" } */
+
+extern unsigned short foo;
+unsigned short
+foo1 (void)
+{
+  return 

[committed] libgomp: Unbreak gcn offload build

2021-11-12 Thread Jakub Jelinek via Gcc-patches
Hi!

My recent libgomp change apparently broke the libgomp build for gcn offloading.
The problem is that gcn, unlike nvptx, doesn't override the teams.c source
file, and the patch I've committed assumed that all the non-LIBGOMP_USE_PTHREADS
targets do not use it.  My understanding is that gcn included the
omp_get_num_teams and omp_get_team_num definitions in both icv-device.o and
teams.o, with only the definitions in the former working correctly.

This patch brings gcn into sync with how nvptx does it: teams.c is
overridden and provides a dummy GOMP_teams_reg plus the
omp_get_{num_teams,team_num} definitions, while icv-device.c no longer
provides those.

Tobias said he has build-tested this with gcn offloading.
Committed to trunk.

2021-11-12  Jakub Jelinek  

PR target/103201
* config/gcn/icv-device.c (omp_get_num_teams, omp_get_team_num): Move
to ...
* config/gcn/teams.c: ... here.  New file.

--- libgomp/config/gcn/icv-device.c.jj  2021-08-05 17:30:59.085260610 +0200
+++ libgomp/config/gcn/icv-device.c 2021-11-12 10:37:19.249351143 +0100
@@ -52,18 +52,6 @@ omp_get_num_devices (void)
 }
 
 int
-omp_get_num_teams (void)
-{
-  return gomp_num_teams_var + 1;
-}
-
-int __attribute__ ((__optimize__ ("O2")))
-omp_get_team_num (void)
-{
-  return __builtin_gcn_dim_pos (0);
-}
-
-int
 omp_is_initial_device (void)
 {
   /* AMD GCN is an accelerator-only target.  */
@@ -84,7 +72,5 @@ ialias (omp_set_default_device)
 ialias (omp_get_default_device)
 ialias (omp_get_initial_device)
 ialias (omp_get_num_devices)
-ialias (omp_get_num_teams)
-ialias (omp_get_team_num)
 ialias (omp_is_initial_device)
 ialias (omp_get_device_num)
--- libgomp/config/gcn/teams.c.jj   2021-11-12 10:37:47.227951052 +0100
+++ libgomp/config/gcn/teams.c  2021-11-12 10:39:34.010426094 +0100
@@ -0,0 +1,54 @@
+/* Copyright (C) 2015-2021 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This file defines OpenMP API entry points that accelerator targets are
+   expected to replace.  */
+
+#include "libgomp.h"
+
+void
+GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
+   unsigned int thread_limit, unsigned int flags)
+{
+  (void) fn;
+  (void) data;
+  (void) flags;
+  (void) num_teams;
+  (void) thread_limit;
+}
+
+int
+omp_get_num_teams (void)
+{
+  return gomp_num_teams_var + 1;
+}
+
+int __attribute__ ((__optimize__ ("O2")))
+omp_get_team_num (void)
+{
+  return __builtin_gcn_dim_pos (0);
+}
+
+ialias (omp_get_num_teams)
+ialias (omp_get_team_num)


Jakub



[PATCH][GCC] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.

2021-11-12 Thread Srinath Parvathaneni via Gcc-patches
Hello,

This patch teaches the DWARF support in GCC about the RA_AUTH_CODE pseudo
hard-register and also about the .save {ra_auth_code} and
.cfi_offset ra_auth_code DWARF directives for the PAC feature in the
Armv8.1-M architecture.

The RA_AUTH_CODE register number is 107 and its DWARF register number is 143.

When compiled with the "arm-none-eabi-gcc -O2 -mthumb -march=armv8.1-m.main+pacbti
-S -fasynchronous-unwind-tables -g" command line options, the directives
supported in this patch look like this:

...
push{ip}
.save {ra_auth_code}
.cfi_def_cfa_offset 8
.cfi_offset 143, -8
...

This patch can be committed after the patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583407.html
is committed.

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2021-11-12  Srinath Parvathaneni  

* config/arm/aout.h (ra_auth_code): Add to enum.
* config/arm/arm.c (emit_multi_reg_push): Add RA_AUTH_CODE register to
dwarf frame expression instead of IP_REGNUM.
(arm_expand_prologue): Mark as frame related insn.
(arm_regno_class): Check for the PAC pseudo register.
(arm_dbx_register_number): Assign ra_auth_code register number in dwarf.
(arm_unwind_emit_sequence): Print .save directive with ra_auth_code
register.
(arm_conditional_register_usage): Mark ra_auth_code in fixed registers.
* config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
(IS_PAC_Pseudo_REGNUM): Define.
(enum reg_class): Add PAC_REG entry.
* config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2021-11-12  Srinath Parvathaneni  

* gcc.target/arm/pac-6.c: New test.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 25a2812a663742893b928398b0d3948e97f1905b..c69e299e012f46c8d0711830125dbf2f6b2e93d7 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -74,7 +74,8 @@
   "wr8",   "wr9",   "wr10",  "wr11",   \
   "wr12",  "wr13",  "wr14",  "wr15",   \
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",  \
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",\
+  "ra_auth_code"   \
 }
 #endif
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8e6ef41f6b065217d1af3f4f1cb85b2d8fbd0dc0..f31944e85c9ab83501f156d138e2aea1bcb5b79d 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -815,7 +815,8 @@ extern const int arm_arch_cde_coproc_bits[];
s16-s31   S VFP variable (aka d8-d15).
vfpcc   Not a real register.  Represents the VFP condition
code flags.
-   vpr Used to represent MVE VPR predication.  */
+   vpr Used to represent MVE VPR predication.
+   ra_auth_codePseudo register to save PAC.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |  [fp]
@@ -856,7 +857,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 /* 1 for registers not available across function calls.
@@ -886,7 +887,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -1062,10 +1063,10 @@ extern const int arm_arch_cde_coproc_bits[];
&& (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR + 1 Pseudo register to save PAC.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   107
+#define FIRST_PSEUDO_REGISTER   108
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1248,12 +1249,15 @@ extern int arm_regs_in_sequence[];
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\
   SP_REGNUM, PC_REGNUM, APSRQ_REGNUM,  \
-  APSRGE_REGNUM, VPR_REGNUM\
+  APSRGE_REGNUM, VPR_REGNUM, RA_AUTH_CODE  \
 }
 
 #define IS_VPR_REGNUM(REGNUM) \
   ((REGNUM) == VPR_REGNUM)
 
+#define IS_PAC_Pseudo_REGNUM(REGNUM) \
+  ((REGNUM) == RA_AUTH_CODE)
+
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
@@ -1292,6 +1296,7 @@ enum 

Re: [PATCH] Remove dead code.

2021-11-12 Thread H.J. Lu via Gcc-patches
On Fri, Nov 12, 2021 at 6:44 AM Martin Liška  wrote:
>
> On 11/12/21 15:41, H.J. Lu wrote:
> > On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
> >>
> >> On 11/8/21 15:19, Jeff Law wrote:
> >>>
> >>>
> >>> On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
>  On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
> > This fixes issue reported in the PR.
> >
> > Ready to be installed?
>  I'm not sure.  liboffloadmic is copied from upstream, so the right
>  thing if we want to do anything at all (if we don't remove it, nothing
>  bad happens, the condition is never true anyway, whether removed away
>  in the source or removed by the compiler) would be to let Intel fix it in
>  their source and update from that.
>  But I have no idea where it even lives upstream.
> >>> I thought MIC as an architecture was dead, so it could well be the case 
> >>> that there isn't a viable upstream anymore for that code.
> >>>
> >>> jeff
> >>
> >> @H.J. ?
> >>
> >
> > We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
> > MIC offload in GCC 13.
>
> I see. So do you want the patch to be installed or not?
>

I prefer to leave it alone and close the PR with WONTFIX.

-- 
H.J.


Re: [PATCH] Remove dead code.

2021-11-12 Thread Martin Liška

On 11/12/21 15:41, H.J. Lu wrote:
> On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
>>
>> On 11/8/21 15:19, Jeff Law wrote:
>>>
>>>
>>> On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
>>>> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
>>>>> This fixes issue reported in the PR.
>>>>>
>>>>> Ready to be installed?
>>>> I'm not sure.  liboffloadmic is copied from upstream, so the right
>>>> thing if we want to do anything at all (if we don't remove it, nothing
>>>> bad happens, the condition is never true anyway, whether removed away
>>>> in the source or removed by the compiler) would be to let Intel fix it in
>>>> their source and update from that.
>>>> But I have no idea where it even lives upstream.
>>> I thought MIC as an architecture was dead, so it could well be the case
>>> that there isn't a viable upstream anymore for that code.
>>>
>>> jeff
>>
>> @H.J. ?
>>
> We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
> MIC offload in GCC 13.

I see. So do you want the patch to be installed or not?

Cheers,
Martin








Re: [PATCH] Remove dead code.

2021-11-12 Thread H.J. Lu via Gcc-patches
On Fri, Nov 12, 2021 at 6:27 AM Martin Liška  wrote:
>
> On 11/8/21 15:19, Jeff Law wrote:
> >
> >
> > On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
> >> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
> >>> This fixes issue reported in the PR.
> >>>
> >>> Ready to be installed?
> >> I'm not sure.  liboffloadmic is copied from upstream, so the right
> >> thing if we want to do anything at all (if we don't remove it, nothing
> >> bad happens, the condition is never true anyway, whether removed away
> >> in the source or removed by the compiler) would be to let Intel fix it in
> >> their source and update from that.
> >> But I have no idea where it even lives upstream.
> > I thought MIC as an architecture was dead, so it could well be the case 
> > that there isn't a viable upstream anymore for that code.
> >
> > jeff
>
> @H.J. ?
>

We'd like to deprecate MIC offload in GCC 12.  We will remove all traces of
MIC offload in GCC 13.


-- 
H.J.


[PATCH] Replace more DEBUG_EXPR_DECL creations with build_debug_expr_decl

2021-11-12 Thread Martin Jambor
Hi,

On Tue, Nov 09 2021, Richard Biener wrote:
> On Mon, 8 Nov 2021, Martin Jambor wrote:
>> this patch introduces a helper function build_debug_expr_decl to build
>> DEBUG_EXPR_DECL tree nodes in the most common way and replaces with a
>> call of this function all code pieces which build such a DECL itself
>> and sets its mode to the TYPE_MODE of its type.
>> 
>> There still remain 11 instances of open-coded creation of a
>> DEBUG_EXPR_DECL which set the mode of the DECL to something else.  It
>> would probably be a good idea to figure out that has any effect and if
>> not, convert them to calls of build_debug_expr_decl too.  But this
>> patch deliberately does not introduce any functional changes.
>> 
>> Bootstrapped and tested on x86_64-linux, OK for trunk?
>
> OK (the const_tree suggestion is a good one).
>
> For the remaining cases I'd simply use
>
> decl = build_debug_expr_decl (type);
> SET_DECL_MODE (decl) = ...;
>
> and thus override the mode afterwards, maybe adding a comment to
> check whether that's necessary.  As said, the only case where it
> might matter is when we create a debug decl replacement for a FIELD_DECL,
> so maybe for those SRA things we create for DWARF "piece" info?
>

Like this?  This patch replaces all but one of the remaining open-coded
constructions of DEBUG_EXPR_DECL with calls to build_debug_expr_decl,
even if - in order not to introduce any functional change - the mode of
the constructed decl is then overwritten.

It is not clear if changing the mode has any effect in practice and
therefore I have added a FIXME note to code which does it, as
requested.
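
For reference, a sketch of the helper now used everywhere (the shape as
introduced by the earlier patch in this series; treat the details as an
approximation, not an exact quote of tree.c):

/* Build a DEBUG_EXPR_DECL of TYPE with the mode taken from the type.  */
tree
build_debug_expr_decl (tree type)
{
  tree vexpr = make_node (DEBUG_EXPR_DECL);
  DECL_ARTIFICIAL (vexpr) = 1;
  TREE_TYPE (vexpr) = type;
  SET_DECL_MODE (vexpr, TYPE_MODE (type));
  return vexpr;
}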

After this patch, DEBUG_EXPR_DECLs are created only by
build_debug_expr_decl and make_debug_expr_from_rtl which looks like
it should be left alone.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

I have also compared the generated DWARF (with readelf -w) of cc1plus
generated by a compiler with this patch and one with the mode setting
removed (on top of this patch) and there were no differences
whatsoever.  So perhaps we can just remove it?  I have not
bootstrapped that patch yet, though.

Thanks,

Martin


gcc/ChangeLog:

2021-11-11  Martin Jambor  

* cfgexpand.c (expand_gimple_basic_block): Use build_debug_expr_decl,
add a fixme note about the mode assignment perhaps being unnecessary.
* ipa-param-manipulation.c (ipa_param_adjustments::modify_call):
Likewise.
(ipa_param_body_adjustments::mark_dead_statements): Likewise.
(ipa_param_body_adjustments::reset_debug_stmts): Likewise.
* tree-inline.c (remap_ssa_name): Likewise.
(tree_function_versioning): Likewise.
* tree-into-ssa.c (rewrite_debug_stmt_uses): Likewise.
* tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
* tree-ssa.c (insert_debug_temp_for_var_def): Likewise.
---
 gcc/cfgexpand.c  |  5 ++---
 gcc/ipa-param-manipulation.c | 17 +++--
 gcc/tree-inline.c| 17 +++--
 gcc/tree-into-ssa.c  |  7 +++
 gcc/tree-ssa-loop-ivopts.c   |  5 ++---
 gcc/tree-ssa.c   |  5 ++---
 6 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 55ff75bd78e..eb6466f4be6 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5898,18 +5898,17 @@ expand_gimple_basic_block (basic_block bb, bool disable_tail_calls)
   temporary.  */
gimple *debugstmt;
tree value = gimple_assign_rhs_to_tree (def);
-   tree vexpr = make_node (DEBUG_EXPR_DECL);
+   tree vexpr = build_debug_expr_decl (TREE_TYPE (value));
rtx val;
machine_mode mode;
 
set_curr_insn_location (gimple_location (def));
 
-   DECL_ARTIFICIAL (vexpr) = 1;
-   TREE_TYPE (vexpr) = TREE_TYPE (value);
if (DECL_P (value))
  mode = DECL_MODE (value);
else
  mode = TYPE_MODE (TREE_TYPE (value));
+   /* FIXME: Is setting the mode really necessary? */
SET_DECL_MODE (vexpr, mode);
 
val = gen_rtx_VAR_LOCATION
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index ae3149718ca..a230735d71e 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -831,9 +831,8 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
  if (ddecl == NULL)
{
- ddecl = make_node (DEBUG_EXPR_DECL);
- DECL_ARTIFICIAL (ddecl) = 1;
- TREE_TYPE (ddecl) = TREE_TYPE (origin);
+ ddecl = build_debug_expr_decl (TREE_TYPE (origin));
+ /* FIXME: Is setting the mode really necessary? */
  SET_DECL_MODE (ddecl, DECL_MODE (origin));
 
  vec_safe_push (*debug_args, origin);
@@ -1063,9 +1062,8 @@ 

Re: [PATCH] Remove MAY_HAVE_DEBUG_MARKER_STMTS and MAY_HAVE_DEBUG_BIND_STMTS.

2021-11-12 Thread Martin Liška

@Alexandre: PING

On 10/18/21 12:05, Richard Biener wrote:

On Mon, Oct 18, 2021 at 10:54 AM Martin Liška  wrote:


The macros correspond 1:1 to option flags and make it harder
to find all usages of the flags.
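
For orientation, the two macros being removed were, assuming the tree.h
definitions of the time (a sketch, not an exact quote), plain aliases for
option flags:

#define MAY_HAVE_DEBUG_MARKER_STMTS debug_nonbind_markers_p
#define MAY_HAVE_DEBUG_BIND_STMTS flag_var_tracking_assignments

while the combined MAY_HAVE_DEBUG_STMTS is kept.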

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


Hmm, they were introduced on purpose - since you leave around
MAY_HAVE_DEBUG_STMTS they conceptually make the code
easier to understand.

So I'm not sure if we want this change.  CCed Alex so maybe he
can weigh in.

Richard.


Thanks,
Martin

gcc/c-family/ChangeLog:

 * c-gimplify.c (genericize_c_loop): Use option directly.

gcc/c/ChangeLog:

 * c-parser.c (add_debug_begin_stmt): Use option directly.

gcc/ChangeLog:

 * cfgexpand.c (pass_expand::execute): Use option directly.
 * function.c (allocate_struct_function): Likewise.
 * gimple-low.c (lower_function_body): Likewise.
 (lower_stmt): Likewise.
 * gimple-ssa-backprop.c (backprop::prepare_change): Likewise.
 * ipa-param-manipulation.c (ipa_param_adjustments::modify_call): 
Likewise.
 * ipa-split.c (split_function): Likewise.
 * lto-streamer-in.c (input_function): Likewise.
 * sese.c (sese_insert_phis_for_liveouts): Likewise.
 * ssa-iterators.h (num_imm_uses): Likewise.
 * tree-cfg.c (make_blocks): Likewise.
 (gimple_merge_blocks): Likewise.
 * tree-inline.c (tree_function_versioning): Likewise.
 * tree-loop-distribution.c (generate_loops_for_partition): Likewise.
 * tree-sra.c (analyze_access_subtree): Likewise.
 * tree-ssa-dce.c (remove_dead_stmt): Likewise.
 * tree-ssa-loop-ivopts.c (remove_unused_ivs): Likewise.
 * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
 * tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise.
 * tree-ssa-tail-merge.c (tail_merge_optimize): Likewise.
 * tree-ssa-threadedge.c (propagate_threaded_block_debug_into): 
Likewise.
 * tree-ssa.c (gimple_replace_ssa_lhs): Likewise.
 (target_for_debug_bind): Likewise.
 (insert_debug_temp_for_var_def): Likewise.
 (insert_debug_temps_for_defs): Likewise.
 (reset_debug_uses): Likewise.
 * tree-ssanames.c (release_ssa_name_fn): Likewise.
 * tree-vect-loop-manip.c (adjust_vec_debug_stmts): Likewise.
 (adjust_debug_stmts): Likewise.
 (adjust_phi_and_debug_stmts): Likewise.
 (vect_do_peeling): Likewise.
 * tree-vect-loop.c (vect_transform_loop_stmt): Likewise.
 (vect_transform_loop): Likewise.
 * tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): Remove
 (MAY_HAVE_DEBUG_BIND_STMTS): Remove.
 (MAY_HAVE_DEBUG_STMTS): Use options directly.

gcc/cp/ChangeLog:

 * parser.c (add_debug_begin_stmt): Use option directly.
---
   gcc/c-family/c-gimplify.c|  4 ++--
   gcc/c/c-parser.c |  2 +-
   gcc/cfgexpand.c  |  2 +-
   gcc/cp/parser.c  |  2 +-
   gcc/function.c   |  2 +-
   gcc/gimple-low.c |  4 ++--
   gcc/gimple-ssa-backprop.c|  2 +-
   gcc/ipa-param-manipulation.c |  2 +-
   gcc/ipa-split.c  |  6 +++---
   gcc/lto-streamer-in.c|  4 ++--
   gcc/sese.c   |  2 +-
   gcc/ssa-iterators.h  |  2 +-
   gcc/tree-cfg.c   |  4 ++--
   gcc/tree-inline.c|  2 +-
   gcc/tree-loop-distribution.c |  2 +-
   gcc/tree-sra.c   |  2 +-
   gcc/tree-ssa-dce.c   |  2 +-
   gcc/tree-ssa-loop-ivopts.c   |  2 +-
   gcc/tree-ssa-phiopt.c|  2 +-
   gcc/tree-ssa-reassoc.c   |  2 +-
   gcc/tree-ssa-tail-merge.c|  2 +-
   gcc/tree-ssa-threadedge.c|  2 +-
   gcc/tree-ssa.c   | 10 +-
   gcc/tree-ssanames.c  |  2 +-
   gcc/tree-vect-loop-manip.c   |  8 
   gcc/tree-vect-loop.c |  4 ++--
   gcc/tree.h   |  7 +--
   27 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/gcc/c-family/c-gimplify.c b/gcc/c-family/c-gimplify.c
index 0d38b706f4c..d9cf051a680 100644
--- a/gcc/c-family/c-gimplify.c
+++ b/gcc/c-family/c-gimplify.c
@@ -295,7 +295,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, tree cond, tree body,
 finish_bc_block (_list, bc_continue, clab);
 if (incr)
   {
-  if (MAY_HAVE_DEBUG_MARKER_STMTS && incr_locus != UNKNOWN_LOCATION)
+  if (debug_nonbind_markers_p && incr_locus != UNKNOWN_LOCATION)
 {
   tree d = build0 (DEBUG_BEGIN_STMT, void_type_node);
   SET_EXPR_LOCATION (d, expr_loc_or_loc (incr, start_locus));
@@ -305,7 +305,7 @@ genericize_c_loop (tree *stmt_p, location_t start_locus, tree cond, tree body,
   }
 append_to_statement_list (entry, _list);

-  if (MAY_HAVE_DEBUG_MARKER_STMTS && cond_locus != UNKNOWN_LOCATION)
+  if (debug_nonbind_markers_p && cond_locus != UNKNOWN_LOCATION)
   {
 

[PATCH] Fortran: Use build_debug_expr_decl to create DEBUG_DECL_EXPRs

2021-11-12 Thread Martin Jambor
Hi,

This patch converts one more open coded construction of a
DEBUG_EXPR_DECL to a call of build_debug_expr_decl that I missed in my
previous patch because it happens to be in the Fortran front-end.

Bootstrapped and tested on x86_64-linux.  Since this should have
been done by an earlier approved patch, I consider it also approved
and will commit it in a moment.

Thanks,

Martin


gcc/fortran/ChangeLog:

2021-11-11  Martin Jambor  

* trans-types.c (gfc_get_array_descr_info): Use build_debug_expr_decl
instead of building DEBUG_EXPR_DECL manually.
---
 gcc/fortran/trans-types.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index 947ab5a099b..e5d36d5a58f 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -3417,10 +3417,8 @@ gfc_get_array_descr_info (const_tree type, struct array_descr_info *info)
   base_decl = GFC_TYPE_ARRAY_BASE_DECL (type, indirect);
   if (!base_decl)
 {
-  base_decl = make_node (DEBUG_EXPR_DECL);
-  DECL_ARTIFICIAL (base_decl) = 1;
-  TREE_TYPE (base_decl) = indirect ? build_pointer_type (ptype) : ptype;
-  SET_DECL_MODE (base_decl, TYPE_MODE (TREE_TYPE (base_decl)));
+  base_decl = build_debug_expr_decl (indirect
+? build_pointer_type (ptype) : ptype);
   GFC_TYPE_ARRAY_BASE_DECL (type, indirect) = base_decl;
 }
   info->base_decl = base_decl;
-- 
2.33.0



[PATCH][V3] rs6000: Remove unnecessary option manipulation.

2021-11-12 Thread Martin Liska
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_override_options_after_change):
Do not set flag_rename_registers, it's already enabled with 
EnabledBy(funroll-loops).
Use EnabledBy for unroll_only_small_loops.
* config/rs6000/rs6000.opt: Use EnabledBy for unroll_only_small_loops.
---
 gcc/config/rs6000/rs6000.c   | 7 +--
 gcc/config/rs6000/rs6000.opt | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e4843eb0f1c..5550113a94c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3466,13 +3466,8 @@ rs6000_override_options_after_change (void)
   /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
  turns -frename-registers on.  */
   if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-  && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops))
 {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-   unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-   flag_rename_registers = 1;
   if (!OPTION_SET_P (flag_cunroll_grow_size))
flag_cunroll_grow_size = 1;
 }
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
 munroll-only-small-loops
-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save EnabledBy(funroll-loops)
 ; Use conservative small loop unrolling.
 
 mpower9-misc
-- 
2.33.1



Re: [PATCH][V2] rs6000: Remove unnecessary option manipulation.

2021-11-12 Thread Martin Liška

On 11/11/21 18:52, Segher Boessenkool wrote:

Hi!

On Thu, Nov 04, 2021 at 01:36:06PM +0100, Martin Liška wrote:

Sending the patch in a separate thread.




Hello.


You forgot to send the commit message though?


No, the patch is simple, so I didn't write any message (except the commit title).




* config/rs6000/rs6000.c (rs6000_override_options_after_change):
Do not set flag_rename_registers, it's already enabled with
EnabledBy(funroll-loops).
Use EnabledBy for unroll_only_small_loops.
* config/rs6000/rs6000.opt: Use EnabledBy for
unroll_only_small_loops.


Please don't put newlines in random places.  It makes reading changelogs
much harder than needed.


All right, I'm going to update it in V3.




--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
/* Explicit -funroll-loops turns -munroll-only-small-loops off, and
   turns -frename-registers on.  */
if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-  && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops))
  {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-   unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-   flag_rename_registers = 1;
if (!OPTION_SET_P (flag_cunroll_grow_size))
flag_cunroll_grow_size = 1;
  }


So some explanation for these two changes would be good to have.


It's explained in the ChangeLog entry.




diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1)
Save
  Analyze and remove doubleword swaps from VSX computations.
  
  munroll-only-small-loops

-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save EnabledBy(funroll-loops)


You used format=flowed it seems?  Don't.  Patches are mangled with it :-(


No, it's correct:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583310.html

Martin




Segher





Re: [PATCH] Remove dead code.

2021-11-12 Thread Martin Liška

On 11/8/21 15:19, Jeff Law wrote:
> On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:
>> On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
>>> This fixes issue reported in the PR.
>>>
>>> Ready to be installed?
>> I'm not sure.  liboffloadmic is copied from upstream, so the right
>> thing if we want to do anything at all (if we don't remove it, nothing
>> bad happens, the condition is never true anyway, whether removed away
>> in the source or removed by the compiler) would be to let Intel fix it in
>> their source and update from that.
>> But I have no idea where it even lives upstream.
> I thought MIC as an architecture was dead, so it could well be the case
> that there isn't a viable upstream anymore for that code.
>
> jeff

@H.J. ?

Martin


[PATCH][pushed] testsuite: Filter out TSVC test on Power [PR103051]

2021-11-12 Thread Martin Liška

Pushed to master.

Martin

PR testsuite/103051

gcc/testsuite/ChangeLog:

* gcc.dg/vect/tsvc/vect-tsvc-s112.c: Skip test for old Power
CPUs.
---
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s112.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s112.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s112.c
index 3c6ae49f212..c8afaf73f3b 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s112.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s112.c
@@ -36,4 +36,4 @@ int main (int argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
\ No newline at end of file
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { ! powerpc*-*-* } || has_arch_pwr8 } } } } */
--
2.33.1



Re: [PATCH] libbacktrace: fix UBSAN issues

2021-11-12 Thread Martin Liška

On 11/11/21 20:21, Ian Lance Taylor wrote:

On Thu, Nov 11, 2021 at 7:39 AM Martin Liška  wrote:


Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

Fix issues mentioned in the PR.

 PR libbacktrace/103167

libbacktrace/ChangeLog:

 * elf.c (elf_uncompress_lzma_block): Cast to unsigned int.
 (elf_uncompress_lzma): Likewise.
 * xztest.c (test_samples): memcpy only if v > 0.

Co-Authored-By: Andrew Pinski 
---
   libbacktrace/elf.c| 8 
   libbacktrace/xztest.c | 2 +-
   2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c
index 79d56146fc6..e69ac41c88b 100644
--- a/libbacktrace/elf.c
+++ b/libbacktrace/elf.c
@@ -3175,7 +3175,7 @@ elf_uncompress_lzma_block (const unsigned char *compressed,
 stream_crc = (compressed[off]
 | (compressed[off + 1] << 8)
 | (compressed[off + 2] << 16)
-   | (compressed[off + 3] << 24));
+   | ((unsigned)(compressed[off + 3]) << 24));


Thanks, but this kind of thing looks strange and is therefore likely
to break again in the future.  I suggest instead

   stream_crc = ((uint32_t) compressed[off]
  | ((uint32_t) compressed[off + 1] << 8)
  | ((uint32_t) compressed[off + 2] << 16)
  | ((uint32_t) compressed[off + 3] << 24));

Same for the similar cases elsewhere.


Sure, done and pushed as g:83310a08a2bc52b6e8c3a3e3216b4e723e58c961.

Thanks,
Martin



Ian





[PATCH] tree-optimization/102880 - improve CD-DCE

2021-11-12 Thread Richard Biener via Gcc-patches
The PR shows a missed control-dependent DCE caused by CFG cleanup
merging a forwarder resulting in a partially degenerate PHI node.
With control-dependent DCE we need to mark control dependences
of incoming edges into PHIs as necessary but that is unnecessarily
conservative for the case when two edges have the same value.
There is no easy way to mark only a subset of control dependences
of both edges necessary so the fix is to produce forwarder blocks
where then the control dependence captures the requirements more
precisely.
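
A minimal C shape of the situation (my illustration, not the PR testcase):

/* Two predecessors feed the same PHI argument; with a forwarder block
   merging the two zero edges, DCE only needs to mark the forwarder's
   control dependence as necessary, not that of both original edges.  */
int
f (int a, int b, int x)
{
  int r;
  if (a)
    r = 0;   /* edge 1: value 0 */
  else if (b)
    r = 0;   /* edge 2: same value 0 */
  else
    r = x;   /* edge 3: different value */
  return r;
}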

The same CFG massaging could be useful at RTL expansion time to
reduce the number of copies we need to insert on edges.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I need
to think about a FAIL of gcc.dg/tree-ssa/ssa-hoist-4.c that this
is causing, where we fail to discover a MAX_EXPR because
CFG cleanup undoes the forwarder "the other way around",
producing IL that's not recognized by phiopt.

Still the patch is here for general comments on the approach
and implementation.

Thanks,
Richard.

2021-11-12  Richard Biener  

PR tree-optimization/102880
* tree-ssa-dce.c (sort_phi_args): New function.
(make_forwarders_with_degenerate_phis): Likewise.
(perform_tree_ssa_dce): Call
make_forwarders_with_degenerate_phis.

* gcc.dg/tree-ssa/pr102880.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr102880.c |  27 
 gcc/tree-ssa-dce.c   | 171 ++-
 2 files changed, 194 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102880.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c b/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c
new file mode 100644
index 000..0306deedb6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102880.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void foo(void);
+
+static int b, c, d, e, f, ah;
+static short g, ai, am, aq, as;
+static char an, at, av, ax, ay;
+static char a(char h, char i) { return i == 0 || h && i == 1 ? 0 : h % i; }
+static void ae(int h) {
+  if (a(b, h))
+foo();
+
+}
+int main() {
+  ae(1);
+  ay = a(0, ay);
+  ax = a(g, aq);
+  at = a(0, as);
+  av = a(c, 1);
+  an = a(am, f);
+  int al = e || ((a(1, ah) && b) & d) == 2;
+  ai = al;
+}
+
+/* We should eliminate the call to foo.  */
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index 1281e67489c..dbf02c434de 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
+#include "tree-ssa.h"
 
 static struct stmt_stats
 {
@@ -1612,6 +1613,164 @@ tree_dce_done (bool aggressive)
   worklist.release ();
 }
 
+/* Sort PHI argument values for make_forwarders_with_degenerate_phis.  */
+
+static int
+sort_phi_args (const void *a_, const void *b_)
+{
+  auto *a = (const std::pair<edge, hashval_t> *) a_;
+  auto *b = (const std::pair<edge, hashval_t> *) b_;
+  hashval_t ha = a->second;
+  hashval_t hb = b->second;
+  if (ha < hb)
+return -1;
+  else if (ha > hb)
+return 1;
+  else
+return 0;
+}
+
+/* Look for a non-virtual PHIs and make a forwarder block when all PHIs
+   have the same argument on a set of edges.  This is to not consider
+   control dependences of individual edges for same values but only for
+   the common set.  */
+
+static unsigned
+make_forwarders_with_degenerate_phis (function *fn)
+{
+  unsigned todo = 0;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fn)
+{
+  /* Only PHIs with three or more arguments have opportunities.  */
+  if (EDGE_COUNT (bb->preds) < 3)
+   continue;
+  /* Do not touch loop headers.  */
+  if (bb->loop_father->header == bb)
+   continue;
+
+  /* Take one PHI node as template to look for identical
+arguments.  Build a vector of candidates forming sets
+of argument edges with equal values.  Note optimality
+depends on the particular choice of the template PHI
+since equal arguments are unordered leaving other PHIs
+with more than one set of equal arguments within this
+argument range unsorted.  We'd have to break ties by
+looking at other PHI nodes.  */
+  gphi_iterator gsi = gsi_start_nonvirtual_phis (bb);
+  if (gsi_end_p (gsi))
+   continue;
+  gphi *phi = gsi.phi ();
+  auto_vec<std::pair<edge, hashval_t>, 8> args;
+  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+   {
+ edge e = gimple_phi_arg_edge (phi, i);
+ /* Skip abnormal edges since we cannot redirect them.  */
+ if (e->flags & EDGE_ABNORMAL)
+   continue;
+ /* Skip loop exit edges when we are in loop-closed SSA form
+since the forwarder we'd create does not have a PHI node.  */
+ if (loops_state_satisfies_p (LOOP_CLOSED_SSA)
+ && loop_exit_edge_p 

[committed] jit: fix -Werror=format-overflow= in testsuite [PR103199]

2021-11-12 Thread David Malcolm via Gcc-patches
Successfully regression-tested on x86_64-pc-linux-gnu, improves jit.sum:
  FAIL: 2->0 (-2)
  PASS: 12043->12073 (+30)

Pushed to trunk as r12-5196-gaa1fd30df56d752e3d5a81af409875a1f1e3e327.
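
The overflow the warning points at is easy to size by hand (my sketch; the
committed fix simply enlarges the buffer to be safely oversized):

/* "instr%i" can expand to "instr-2147483648": 5 + 11 chars + NUL = 17
   bytes, one more than the old char buf[16].  A tight sizing would be: */
char buf[sizeof "instr" + 11];  /* 6 + 11 = 17 bytes */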

gcc/jit/ChangeLog:
PR jit/103199
* docs/examples/tut04-toyvm/toyvm.c (toyvm_function_compile):
Increase size of buffer.
* docs/examples/tut04-toyvm/toyvm.cc
(compilation_state::create_function): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/jit/docs/examples/tut04-toyvm/toyvm.c  | 2 +-
 gcc/jit/docs/examples/tut04-toyvm/toyvm.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
index 8ea716e2d0a..63418f92d14 100644
--- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
+++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.c
@@ -561,7 +561,7 @@ toyvm_function_compile (toyvm_function *fn)
   /* Create a block per operation.  */
   for (pc = 0; pc < fn->fn_num_ops; pc++)
 {
-  char buf[16];
+  char buf[100];
   sprintf (buf, "instr%i", pc);
   state.op_blocks[pc] = gcc_jit_function_new_block (state.fn, buf);
 }
diff --git a/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc b/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
index 7e9550159ad..81c8045af2c 100644
--- a/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
+++ b/gcc/jit/docs/examples/tut04-toyvm/toyvm.cc
@@ -633,7 +633,7 @@ compilation_state::create_function (const char *funcname)
   /* Create a block per operation.  */
   for (int pc = 0; pc < toyvmfn.fn_num_ops; pc++)
 {
-  char buf[16];
+  char buf[100];
   sprintf (buf, "instr%i", pc);
   op_blocks[pc] = fn.new_block (buf);
 }
-- 
2.26.3



[PATCH] libgomp, nvptx, v2: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 12, 2021 at 02:20:23PM +0100, Jakub Jelinek via Gcc-patches wrote:
> This patch assumes that .shared variables are initialized to 0,
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists
> in Table 7. .shared as non-initializable.  If that isn't the case,
> we need to initialize it somewhere for the case of #pragma omp target
> without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ?

A quick look at libgcc/config/nvptx/crt0.c shows the target supports
__attribute__((shared)), so perhaps use the following instead, or, if
.shared isn't preinitialized to zero, define the variable in
libgcc/config/nvptx/crt0.c, add __gomp_team_num = 0; there,
and add the extern keyword before int __gomp_team_num __attribute__((shared));
in libgomp/config/nvptx/target.c.
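
A sketch of the team numbering this arranges (my reading of the patch, not
code from it): with B hardware CTAs and T requested teams, CTA b executes
the logical teams b, b + B, b + 2*B, ... serially until they exceed T - 1.

/* Hypothetical pseudo-driver of the GOMP_teams4 contract below; run_team
   is a stand-in for one execution of the target region body.  */
for (team = block_id; team <= gomp_num_teams_var; team += num_blocks)
  run_team (team);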

2021-11-12  Jakub Jelinek  

* config/nvptx/target.c (__gomp_team_num): Define as
__attribute__((shared)) var.
(GOMP_teams4): Use __gomp_team_num as the team number instead of
%ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
is bigger than num_blocks, use num_teams_lower teams and arrange for
bumping of __gomp_team_num if !first and returning false once we run
out of teams.
* config/nvptx/teams.c (__gomp_team_num): Declare as
extern __attribute__((shared)) var.
(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.

--- libgomp/config/nvptx/target.c.jj2021-11-12 12:41:11.433501988 +0100
+++ libgomp/config/nvptx/target.c   2021-11-12 14:21:39.451426717 +0100
@@ -26,28 +26,41 @@
 #include "libgomp.h"
 #include 
 
+int __gomp_team_num __attribute__((shared));
+
 bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 unsigned int thread_limit, bool first)
 {
+  unsigned int num_blocks, block_id;
+  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
   if (!first)
-return false;
+{
+  unsigned int team_num;
+  if (num_blocks > gomp_num_teams_var)
+   return false;
+  team_num = __gomp_team_num;
+  if (team_num > gomp_num_teams_var - num_blocks)
+   return false;
+  __gomp_team_num = team_num + num_blocks;
+  return true;
+}
   if (thread_limit)
 {
   struct gomp_task_icv *icv = gomp_icv (true);
   icv->thread_limit_var
= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }
-  unsigned int num_blocks, block_id;
-  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
-  /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple
- times for some CTAs.  */
-  (void) num_teams_lower;
-  if (!num_teams_upper || num_teams_upper >= num_blocks)
+  if (!num_teams_upper)
 num_teams_upper = num_blocks;
-  else if (block_id >= num_teams_upper)
+  else if (num_blocks < num_teams_lower)
+num_teams_upper = num_teams_lower;
+  else if (num_blocks < num_teams_upper)
+num_teams_upper = num_blocks;
+  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+  if (block_id >= num_teams_upper)
 return false;
+  __gomp_team_num = block_id;
   gomp_num_teams_var = num_teams_upper - 1;
   return true;
 }
--- libgomp/config/nvptx/teams.c.jj 2021-01-05 00:13:58.255297642 +0100
+++ libgomp/config/nvptx/teams.c2021-11-12 14:22:06.443039863 +0100
@@ -28,6 +28,8 @@
 
 #include "libgomp.h"
 
+extern int __gomp_team_num __attribute__((shared));
+
 void
 GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
unsigned int thread_limit, unsigned int flags)
@@ -48,9 +50,7 @@ omp_get_num_teams (void)
 int
 omp_get_team_num (void)
 {
-  int ctaid;
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
-  return ctaid;
+  return __gomp_team_num;
 }
 
 ialias (omp_get_num_teams)


Jakub



[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound

2021-11-12 Thread Jakub Jelinek via Gcc-patches
Hi!

Here is a completely untested attempt at implementing what I was talking
about: for num_teams_upper 0, or whenever num_teams_lower <= num_blocks,
the current implementation is fine, but if the user explicitly asks for more
teams than we can provide in hardware, we need to stop assuming that
omp_get_team_num () is equal to the hw team id.  Instead we need to use some
team-specific memory (I believe it is .shared for PTX), or, if none is
provided, an array indexed by the hw team id, and run some teams serially
within the same hw thread.

This patch assumes that .shared variables are initialized to 0,
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html lists
in Table 7. .shared as non-initializable.  If that isn't the case,
we need to initialize it somewhere for the case of #pragma omp target
without #pragma omp teams in it, maybe in libgcc/config/nvptx/crt0.c ?

2021-11-12  Jakub Jelinek  

* config/nvptx/target.c (__gomp_team_num): Define using inline asm as
a .shared var.
(GOMP_teams4): Use __gomp_team_num as the team number instead of
%ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
is bigger than num_blocks, use num_teams_lower teams and arrange for
bumping of __gomp_team_num if !first and returning false once we run
out of teams.
* config/nvptx/teams.c (__gomp_team_num): Declare using inline asm as
an external .shared var.
(omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.

--- libgomp/config/nvptx/target.c.jj2021-11-12 12:41:11.433501988 +0100
+++ libgomp/config/nvptx/target.c   2021-11-12 14:02:56.231477929 +0100
@@ -26,28 +26,43 @@
 #include "libgomp.h"
 #include 
 
+asm ("\n// BEGIN GLOBAL VAR DECL: __gomp_team_num"
+ "\n.visible .shared .align 4 .u32 __gomp_team_num[1];");
+
 bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 unsigned int thread_limit, bool first)
 {
+  unsigned int num_blocks, block_id;
+  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
   if (!first)
-return false;
+{
+  unsigned int team_num;
+  if (num_blocks > gomp_num_teams_var)
+   return false;
+  asm ("ld.shared.u32\t%0, [__gomp_team_num]" : "=r" (team_num));
+  if (team_num > gomp_num_teams_var - num_blocks)
+   return false;
+  asm ("st.shared.u32\t[__gomp_team_num], %0"
+  : : "r" (team_num + num_blocks));
+  return true;
+}
   if (thread_limit)
 {
   struct gomp_task_icv *icv = gomp_icv (true);
   icv->thread_limit_var
= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }
-  unsigned int num_blocks, block_id;
-  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
-  /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple
- times for some CTAs.  */
-  (void) num_teams_lower;
-  if (!num_teams_upper || num_teams_upper >= num_blocks)
+  if (!num_teams_upper)
 num_teams_upper = num_blocks;
-  else if (block_id >= num_teams_upper)
+  else if (num_blocks < num_teams_lower)
+num_teams_upper = num_teams_lower;
+  else if (num_blocks < num_teams_upper)
+num_teams_upper = num_blocks;
+  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
+  if (block_id >= num_teams_upper)
 return false;
+  asm ("st.shared.u32\t[__gomp_team_num], %0" : : "r" (block_id));
   gomp_num_teams_var = num_teams_upper - 1;
   return true;
 }
--- libgomp/config/nvptx/teams.c.jj 2021-01-05 00:13:58.255297642 +0100
+++ libgomp/config/nvptx/teams.c2021-11-12 13:55:59.950421993 +0100
@@ -28,6 +28,9 @@
 
 #include "libgomp.h"
 
+asm ("\n// BEGIN GLOBAL VAR DECL: __gomp_team_num"
+ "\n.extern .shared .align 4 .u32 __gomp_team_num[1];");
+
 void
 GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
unsigned int thread_limit, unsigned int flags)
@@ -48,9 +50,9 @@ omp_get_num_teams (void)
 int
 omp_get_team_num (void)
 {
-  int ctaid;
-  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
-  return ctaid;
+  int team_num;
+  asm ("ld.shared.u32\t%0, [__gomp_team_num]" : "=r" (team_num));
+  return team_num;
 }
 
 ialias (omp_get_num_teams)

Jakub



Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-11-12 Thread Richard Biener via Gcc-patches
On Thu, 11 Nov 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> This is the rebased and reworked version of the unroll patch.  I wasn't
> entirely sure whether I should compare the costs of the unrolled loop_vinfo
> with the original loop_vinfo it was unrolled from. I did now, but I wasn't too
> sure whether it was a good idea to... Any thoughts on this?

+  /* Apply the suggested unrolling factor, this was determined by the backend
+     during finish_cost the first time we ran the analysis for this
+     vector mode.  */
+  if (loop_vinfo->suggested_unroll_factor > 1)
+    {
+      poly_uint64 unrolled_vf
+	= LOOP_VINFO_VECT_FACTOR (loop_vinfo) * loop_vinfo->suggested_unroll_factor;
+      /* Make sure the unrolled vectorization factor is less than the max
+         vectorization factor.  */
+      unsigned HOST_WIDE_INT max_vf = LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo);
+      if (max_vf == MAX_VECTORIZATION_FACTOR || known_le (unrolled_vf, max_vf))
+	LOOP_VINFO_VECT_FACTOR (loop_vinfo) = unrolled_vf;
+      else
+	return opt_result::failure_at (vect_location,
+				       "unrolling failed: unrolled"
+				       " vectorization factor larger than"
+				       " maximum vectorization factor: %d\n",
+				       LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo));
+    }
+
   /* This is the point where we can re-start analysis with SLP forced off.  */
 start_over:

So we're honoring suggested_unroll_factor here but you still have the
now unused hunk

+vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal,
+		     unsigned *suggested_unroll_factor, poly_uint64 min_vf = 2)
 {

I also wonder whether vect_analyze_loop_2 could at least prune
suggested_unroll_factor as set by vect_analyze_loop_costing with its
knowledge of max_vf itself?  That would avoid using the at the moment
unused LOOP_VINFO_MAX_VECT_FACTOR?

I think all the things you do in vect_can_unroll should be figured
out with the re-analysis, and I'd just amend vect_analyze_loop_1
with a suggested unroll factor parameter like it has main_loop_vinfo
for the epilogue case.  The main loop adjustment would the be in the

  if (first_loop_vinfo == NULL)
{
  first_loop_vinfo = loop_vinfo;
  first_loop_i = loop_vinfo_i;
  first_loop_next_i = mode_i;
}

spot only, adding

if (loop_vinfo->suggested_unroll_factor != 1)
  {
suggested_unroll_factor = loop_vinfo->suggested_unroll_factor;
mode_i = first_loop_i;
if (dump)
  dump_print ("Trying unrolling by %d\n");
continue;
  }

and a reset of suggested_unroll_factor after the vect_analyze_loop_1
call?  (that's basically pushing another analysis case to the
poor-man's "worklist")

Richard.

> Regards,
> 
> Andre
> 
> 
> gcc/ChangeLog:
> 
>     * tree-vect-loop.c (vect_estimate_min_profitable_iters): Add
> suggested_unroll_factor parameter.
>     (vect_analyze_loop_costing): Likewise.
>     (vect_determine_partial_vectors_and_peeling): Don't mask an 
> unrolled loop.
>     (vect_analyze_loop_2): Support unrolling of loops.
>     (vect_can_unroll): New function.
>     (vect_try_unrolling): New function.
>     (vect_analyze_loop_1): Add suggested_unroll_factor parameter 
> and use it.
>     (vect_analyze_loop): Call vect_try_unrolling when unrolling suggested.
>     (vectorizable_reduction): Don't single_defuse_cycle when unrolling.
>     * tree-vectorizer.h (_loop_vec_info::_loop_vec_info):  Add 
> suggested_unroll_factor member.
>         (vector_costs::vector_costs): Add m_suggested_unroll_factor member.
>     (vector_costs::suggested_unroll_factor): New getter.
>     (finish_cost): Add suggested_unroll_factor out parameter and 
> set it.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend


Fix ipa-modref pure/const discovery

2021-11-12 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes a bug I introduced while breaking up the bigger change.
We currently cannot use pure/const to discover looping pures,
since the lack of global memory writes/stores does not imply we can CSE
calls to the function.  This is witnessed by testsuite cases doing volatile
asm, and can also happen if e.g. a function returns the result of malloc.
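
A hedged example of exactly this situation (mine, not from the testsuite):

/* wrap neither reads nor writes global memory, yet two calls to it must
   not be CSEd, because each call returns a distinct allocation.  */
void *
wrap (unsigned n)
{
  return __builtin_malloc (n);
}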

I have a followup patch to add the analysis, but will first look into
the current LTO bootstrap ICE.

Bootstrapped/regtested x86_64-linux, comitted.
PR ipa/103200
* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
not mark pure/const function if there are side-effects.
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 72006251f29..44b3427a202 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2790,7 +2790,8 @@ analyze_function (function *f, bool ipa)
 
   if (!ipa && flag_ipa_pure_const)
 {
-  if (!summary->stores->every_base && !summary->stores->bases)
+  if (!summary->stores->every_base && !summary->stores->bases
+ && !summary->side_effects)
{
  if (!summary->loads->every_base && !summary->loads->bases)
fixup_cfg = ipa_make_function_const
@@ -4380,7 +4381,8 @@ modref_propagate_in_scc (cgraph_node *component_node)
modref_summary_lto *summary_lto = summaries_lto
  ? summaries_lto->get (cur)
  : NULL;
-   if (summary && !summary->stores->every_base && !summary->stores->bases)
+   if (summary && !summary->stores->every_base && !summary->stores->bases
+   && !summary->side_effects)
  {
if (!summary->loads->every_base && !summary->loads->bases)
  pureconst |= ipa_make_function_const
@@ -4390,7 +4392,7 @@ modref_propagate_in_scc (cgraph_node *component_node)
 (cur, summary->side_effects, false);
  }
if (summary_lto && !summary_lto->stores->every_base
-   && !summary_lto->stores->bases)
+   && !summary_lto->stores->bases && !summary_lto->side_effects)
  {
if (!summary_lto->loads->every_base && !summary_lto->loads->bases)
  pureconst |= ipa_make_function_const


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-11-12 Thread Robin Dapp via Gcc-patches
Hi Richard,

> It's hard to judge this in isolation because it's not clear when
> and how the new arguments are going to be used, but it seems OK
> in principle.  Do you still want:
> 
>   /* If earliest == jump, try to build the cmove insn directly.
>  This is helpful when combine has created some complex condition
>  (like for alpha's cmovlbs) that we can't hope to regenerate
>  through the normal interface.  */
> 
>   if (if_info->cond_earliest == if_info->jump)
> {
> 
> to be used when cc_cmp and rev_cc_cmp are nonnull?

My initial hunch was to just leave it in place as I did not manage to
trigger it.  As it is going to be called and costed both ways (with
cc_cmp, rev_cc_cmp and without) it is probably better to move it into
the else branch.

The single usage of this is in patch 5/7.  We are passing the already
existing condition from the jump and its reverse to see if the backend
can come up with something better than when creating a new comparison.

>> +static rtx emit_conditional_move (rtx, rtx, rtx, rtx, machine_mode);
>> +rtx emit_conditional_move (rtx, rtx, rtx, rtx, rtx, machine_mode);
> 
> This is redundant with the header file declaration.
> 

Removed it.

> I think it'd be better to call one of these functions something else,
> rather than make the interpretation of the third parameter depend on
> the total number of parameters.  In the second overload, the comparison
> rtx effectively replaces four parameters of the existing
> emit_conditional_move, so perhaps that's the one that should remain
> emit_conditional_move.  Maybe the first one should be called
> emit_conditional_move_with_rev or something.

I'm not entirely fond of calling the first one _with_rev, because
essentially both try the normal and reversed variants, but I agree that
the naming is not ideal.  I don't have any great ideas on how to
properly untangle it, so I will go with your suggestions in order to
move forward.  As there is only one caller of the second function, we
could also let the caller handle the reversing.  Then the third
function would need to be non-static, though.

I have already renamed the third, static emit_conditional_move locally
to emit_conditional_move_1.

> Part of me wonders if this would be simpler if we created a structure
> to describe a comparison and passed that around instead of individual
> fields, but I guess it could become a rat hole.

I also thought about this, as it would allow us to use either
representation as required by the usage site.  I even tried it locally
in a branch, but it indeed became ugly quickly, so I postponed it for
now.
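
For the record, one hypothetical shape such a structure could take
(names invented here for illustration, not taken from any committed
patch):

  /* Describes a comparison either by its parts or by a ready-made
     comparison rtx together with its reversed form.  */
  struct comparison_info
  {
    enum rtx_code code;  /* comparison code, e.g. EQ, LT  */
    rtx op0, op1;        /* operands being compared  */
    rtx cc_cmp;          /* existing comparison rtx, or NULL_RTX  */
    rtx rev_cc_cmp;      /* its reversed form, or NULL_RTX  */
  };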

Regards
 Robin


Re: Use modref summary to DSE calls to non-pure functions

2021-11-12 Thread Richard Biener via Gcc-patches
On Fri, Nov 12, 2021 at 12:39 PM Jan Hubicka wrote:
>
> Hi,
> this is the updated patch.  It moves the summary walk checking whether
> we can possibly succeed on DSE to the summary->finalize member function,
> so it is done once per summary, and refactors dse_optimize_call to be
> called from dse_optimize_stmt after the early checks.
>
> I did not try to handle the special case of parm_offset_known, but we
> can do it incrementally.  I think initializing the range with the offset
> being the minimum poly_int64 and max_size unknown, as suggested, is
> going to work.  I am a bit worried that this is in bits, so we get a
> 2^61 range instead of 2^64, but I guess one cannot offset a pointer back
> and forth that far in a valid program?

Not sure indeed.  I'd only special-case when the call argument is &decl,
then the start offset can be simply zero.

> Bootstrapped/regtested x86_64-linux, LTO bootstrap in progress, OK if it
> succeeds?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * ipa-modref.c (modref_summary::modref_summary): Clear new flags.
> (modref_summary::dump): Dump try_dse.
> (modref_summary::finalize): Add FUN parameter; compute try_dse.
> (analyze_function): Update.
> (read_section): Update.
> (update_signature): Update.
> (pass_ipa_modref::execute): Update.
> * ipa-modref.h (struct modref_summary):
> * tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Export.
> * tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
> * tree-ssa-dse.c: Include cgraph.h, ipa-modref-tree.h and
> ipa-modref.h
> (dse_optimize_call): New function.
> (dse_optimize_stmt): Use it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/modref-dse-1.c: New test.
> * gcc.dg/tree-ssa/modref-dse-2.c: New test.
>
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index c6efacb0e20..ea6a27ae767 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -276,7 +276,8 @@ static GTY(()) fast_function_summary <modref_summary *, va_gc>
>
>  modref_summary::modref_summary ()
>: loads (NULL), stores (NULL), retslot_flags (0), static_chain_flags (0),
> -writes_errno (false), side_effects (false)
> +writes_errno (false), side_effects (false), global_memory_read (false),
> +global_memory_written (false), try_dse (false)
>  {
>  }
>
> @@ -605,6 +606,8 @@ modref_summary::dump (FILE *out)
>  fprintf (out, "  Global memory read\n");
>if (global_memory_written)
>  fprintf (out, "  Global memory written\n");
> +  if (try_dse)
> +fprintf (out, "  Try dse\n");
>if (arg_flags.length ())
>  {
>for (unsigned int i = 0; i < arg_flags.length (); i++)
> @@ -661,12 +664,56 @@ modref_summary_lto::dump (FILE *out)
>  }
>
>  /* Called after summary is produced and before it is used by local analysis.
> -   Can be called multiple times in case summary needs to update signature.  */
> +   Can be called multiple times in case summary needs to update signature.
> +   FUN is the decl of the function the summary is attached to.  */
>  void
> -modref_summary::finalize ()
> +modref_summary::finalize (tree fun)
>  {
>global_memory_read = !loads || loads->global_access_p ();
>global_memory_written = !stores || stores->global_access_p ();
> +
> +  /* We can do DSE if we know function has no side effects and
> + we can analyse all stores.  Disable dse if there are too many
> + stores to try.  */
> +  if (side_effects || global_memory_written || writes_errno)
> +try_dse = false;
> +  else
> +{
> +  try_dse = true;
> +  size_t i, j, k;
> +  int num_tests = 0, max_tests
> +   = opt_for_fn (fun, param_modref_max_tests);
> +  modref_base_node <alias_set_type> *base_node;
> +  modref_ref_node <alias_set_type> *ref_node;
> +  modref_access_node *access_node;
> +  FOR_EACH_VEC_SAFE_ELT (stores->bases, i, base_node)
> +   {
> + if (base_node->every_ref)
> +   {
> + try_dse = false;
> + break;
> +   }
> + FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
> +   {
> + if (ref_node->every_access)
> +   {
> + try_dse = false;
> + break;
> +   }
> + FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
> +   if (num_tests++ > max_tests
> +   || !access_node->parm_offset_known)
> + {
> +   try_dse = false;
> +   break;
> + }
> + if (!try_dse)
> +   break;
> +   }
> + if (!try_dse)
> +   break;
> +   }
> +}
>  }
>
>  /* Get function summary for FUNC if it exists, return NULL otherwise.  */
> @@ -2803,7 +2850,7 @@ analyze_function (function *f, bool ipa)
>summary = NULL;
>  }
>else if (summary)
> -summary->finalize ();
> +summary->finalize (current_function_decl);
>if (summary_lto && !summary_lto->useful_p (ecf_flags))
>  {
>
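
As a concrete illustration of the transformation this patch enables (in
the spirit of the new modref-dse tests, but not the actual testcase):

  struct s { int a, b; };

  __attribute__ ((noinline)) static void
  store_param (struct s *p)
  {
    p->a = 0;
    p->b = 0;
  }

  int
  f (void)
  {
    struct s tmp;
    store_param (&tmp);  /* Every store lands in the dead local TMP and
                            the modref summary records no side effects,
                            so DSE can now remove the whole call.  */
    return 0;
  }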

[committed] libstdc++: Print assertion messages to stderr [PR59675]

2021-11-12 Thread Jonathan Wakely via Gcc-patches
This replaces the printf used by failed debug assertions with fprintf,
so we can write to stderr.

To avoid including <stdio.h>, the assert function is moved into the
library. To avoid programs using a vague-linkage definition of the old
inline function, the function is renamed. Code compiled with old
versions of GCC might still call the old function, but code compiled
with the newer GCC will call the new function and write to stderr.
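
For reference, a minimal sketch of what the new out-of-line definition
can look like (illustrative only; the real implementation lives in
src/c++11/debug.cc and may differ in details):

  #include <cstdio>
  #include <cstdlib>

  namespace std
  {
    [[noreturn]] void
    __glibcxx_assert_fail (const char* file, int line,
                           const char* function, const char* condition)
    noexcept
    {
      /* fprintf instead of printf, so the message goes to stderr.  */
      std::fprintf (stderr, "%s:%d: %s: Assertion '%s' failed.\n",
                    file, line, function, condition);
      std::abort ();
    }
  }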

libstdc++-v3/ChangeLog:

PR libstdc++/59675
* acinclude.m4 (libtool_VERSION): Bump version.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Add version and
export new symbol.
* configure: Regenerate.
* include/bits/c++config (__replacement_assert): Remove, declare
__glibcxx_assert_fail instead.
* src/c++11/debug.cc (__glibcxx_assert_fail): New function to
replace __replacement_assert, writing to stderr instead of
stdout.
* testsuite/util/testsuite_abi.cc: Update latest version.
---
 libstdc++-v3/acinclude.m4|  2 +-
 libstdc++-v3/config/abi/pre/gnu.ver  |  6 +
 libstdc++-v3/configure   |  2 +-
 libstdc++-v3/include/bits/c++config  | 27 
 libstdc++-v3/src/c++11/debug.cc  | 18 -
 libstdc++-v3/testsuite/util/testsuite_abi.cc |  3 ++-
 6 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 497af5723e1..4adfdf646ac 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3798,7 +3798,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:29:0
+libtool_VERSION=6:30:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 5323c7f0604..8f3c7b3827e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2397,6 +2397,12 @@ GLIBCXX_3.4.29 {
 
 } GLIBCXX_3.4.28;
 
+GLIBCXX_3.4.30 {
+
+_ZSt21__glibcxx_assert_fail*;
+
+} GLIBCXX_3.4.29;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 21371031b66..3a572475546 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -74892,7 +74892,7 @@ $as_echo "$as_me: WARNING: === Symbol versioning will be disabled." >&2;}
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:29:0
+libtool_VERSION=6:30:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index a6495809671..4b7fa659300 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -526,22 +526,17 @@ namespace std
   // Avoid the use of assert, because we're trying to keep the <cassert>
   // include out of the mix.
   extern "C++" _GLIBCXX_NORETURN
-  inline void
-  __replacement_assert(const char* __file, int __line,
-  const char* __function, const char* __condition)
-  _GLIBCXX_NOEXCEPT
-  {
-__builtin_printf("%s:%d: %s: Assertion '%s' failed.\n", __file, __line,
-__function, __condition);
-__builtin_abort();
-  }
+  void
+  __glibcxx_assert_fail(const char* __file, int __line,
+   const char* __function, const char* __condition)
+  _GLIBCXX_NOEXCEPT;
 }
-#define __glibcxx_assert_impl(_Condition) \
-  if (__builtin_expect(!bool(_Condition), false)) \
-  {   \
-__glibcxx_constexpr_assert(false);\
-std::__replacement_assert(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
- #_Condition);\
+#define __glibcxx_assert_impl(_Condition)  \
+  if (__builtin_expect(!bool(_Condition), false))  \
+  {\
+__glibcxx_constexpr_assert(false); \
+    std::__glibcxx_assert_fail(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
+  #_Condition);\
   }
 # else // ! VERBOSE_ASSERT
 # define __glibcxx_assert_impl(_Condition) \
@@ -550,7 +545,7 @@ namespace std
 __glibcxx_constexpr_assert(false); \
 __builtin_abort(); \
   }
-#endif
+# endif
 #endif
 
 #if defined(_GLIBCXX_ASSERTIONS)
diff --git a/libstdc++-v3/src/c++11/debug.cc b/libstdc++-v3/src/c++11/debug.cc
index 0128535135e..77cb2a2c7ed 100644
--- a/libstdc++-v3/src/c++11/debug.cc
+++ b/libstdc++-v3/src/c++11/debug.cc
@@ -33,7 +33,8 @@
 
