date:20210629

[POWER10] __morestack calls from pcrel code

2021-06-29 Thread Alan Modra via Gcc-patches

Compiling gcc/testsuite/gcc.dg/split-*.c and others with -mcpu=power10
and linking with a non-pcrel libgcc results in crashes due to the
power10 pcrel code not having r2 set for the generic-morestack.c
functions called from __morestack.  There is also a problem when
non-pcrel code calls a pcrel libgcc.  See the patch comments.

A similar situation theoretically occurs with ELFv1 multi-toc
executables, when __morestack might be located in a different toc
group to its caller.  This patch makes no attempt to fix that, since
the gold linker does not support multi-toc (gold is needed for proper
support of -fsplit-stack code) nor does gcc emit __morestack calls
that support multi-toc.

Bootstrapped and regression tested powerpc64le-linux with both
-mcpu=power10 and -mcpu=power9.  OK for mainline and backporting to
gcc-11 and gcc-10?

* config/rs6000/morestack.S (R2_SAVE): Define.
(__morestack): Save and restore r2.  Set up r2 for called
functions.

diff --git a/libgcc/config/rs6000/morestack.S b/libgcc/config/rs6000/morestack.S
index 4a07de927c5..a2e255e5c21 100644
--- a/libgcc/config/rs6000/morestack.S
+++ b/libgcc/config/rs6000/morestack.S
@@ -31,6 +31,7 @@
 #define PARAMS 48
 #endif
 #define MORESTACK_FRAMESIZE  (PARAMS+96)
+#define R2_SAVE  -MORESTACK_FRAMESIZE+PARAMS-8
 #define PARAMREG_SAVE  -MORESTACK_FRAMESIZE+PARAMS+0
 #define STATIC_CHAIN_SAVE  -MORESTACK_FRAMESIZE+PARAMS+64
 #define R29_SAVE   -MORESTACK_FRAMESIZE+PARAMS+72
@@ -143,6 +144,17 @@ ENTRY0(__morestack_non_split)
 # cr7 must also be preserved.
 
 ENTRY0(__morestack)
+
+#if _CALL_ELF == 2
+# Functions with localentry bits of zero cannot make calls if those
+# calls might change r2.  This is true generally, and also true for
+# __morestack with its special calling convention.  When __morestack's
+# caller is non-pcrel but libgcc is pcrel, the functions called here
+# might modify r2.  r2 must be preserved on exit, and also restored
+# for the call back to our caller.
+   std %r2,R2_SAVE(%r1)
+#endif
+
 # Save parameter passing registers, our arguments, lr, r29
 # and use r29 as a frame pointer.
std %r3,PARAMREG_SAVE+0(%r1)
@@ -161,10 +173,24 @@ ENTRY0(__morestack)
std %r12,LINKREG_SAVE(%r1)
std %r3,NEWSTACKSIZE_SAVE(%r1)  # new stack size
mr %r29,%r1
+#if _CALL_ELF == 2
+   .cfi_offset %r2,R2_SAVE
+#endif
.cfi_offset %r29,R29_SAVE
.cfi_def_cfa_register %r29
stdu %r1,-MORESTACK_FRAMESIZE(%r1)
 
+#if _CALL_ELF == 2 && !defined __PCREL__
+# If this isn't a pcrel libgcc then the functions we call here will
+# require r2 to be valid.  If __morestack is called from pcrel code r2
+# won't be valid.  Set it up.
+   bcl 20,31,1f
+1:
+   mflr %r12
+   addis %r2,%r12,.TOC.-1b@ha
+   addi %r2,%r2,.TOC.-1b@l
+#endif
+
# void __morestack_block_signals (void)
bl JUMP_TARGET(__morestack_block_signals)
 
@@ -199,6 +225,9 @@ ENTRY0(__morestack)
 # instructions after __morestack's return address.
 #
ld %r12,LINKREG_SAVE(%r29)
+#if _CALL_ELF == 2
+   ld %r2,R2_SAVE(%r29)
+#endif
ld %r3,PARAMREG_SAVE+0(%r29)# restore arg regs
ld %r4,PARAMREG_SAVE+8(%r29)
ld %r5,PARAMREG_SAVE+16(%r29)
@@ -228,6 +257,15 @@ ENTRY0(__morestack)
std %r10,PARAMREG_SAVE+56(%r29)
 #endif
 
+#if _CALL_ELF == 2 && !defined __PCREL__
+# r2 was restored for calling back into our caller.  Set it up again.
+   bcl 20,31,1f
+1:
+   mflr %r12
+   addis %r2,%r12,.TOC.-1b@ha
+   addi %r2,%r2,.TOC.-1b@l
+#endif
+
bl JUMP_TARGET(__morestack_block_signals)
 
# void *__generic_releasestack (size_t *pavailable)
@@ -249,6 +287,9 @@ ENTRY0(__morestack)
 # Restore return value regs, and return.
ld %r0,LINKREG_SAVE(%r29)
mtlr %r0
+#if _CALL_ELF == 2
+   ld %r2,R2_SAVE(%r29)
+#endif
ld %r3,PARAMREG_SAVE+0(%r29)
ld %r4,PARAMREG_SAVE+8(%r29)
ld %r5,PARAMREG_SAVE+16(%r29)

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] Add stmt context in simplify_using_ranges.

2021-06-29 Thread Aldy Hernandez via Gcc-patches





On 6/29/21 9:09 PM, Andrew MacLeod wrote:
We added context to a lot of simplify_using_ranges, but we didn't catch 
all the places.   This provides the originating stmt to the missing 
cases which resolve a few EVRP testcases when running in ranger-only mode.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew




Thanks for doing this.  I've done a half-assed job at passing context 
around; probably only when it yielded a discrepancy with evrp.


 
 bool

-simplify_using_ranges::op_with_boolean_value_range_p (tree op)
+simplify_using_ranges::op_with_boolean_value_range_p (tree op, gimple *s)
 {
   if (TYPE_PRECISION (TREE_TYPE (op)) == 1)
 


I know you like single letter arguments, but I find them confusing when 
the method is more than a few lines long.  Besides, "stmt" is what is 
used throughout vr-values.c.


And speaking of passing statements around, I wonder if it'd be best to 
have m_stmt and possibly m_gsi as class fields.  After all, we never 
change them, and they're used by most methods.


Aldy

Re: [PATCH 0/2] Ranger-based backwards threader implementation.

2021-06-29 Thread Aldy Hernandez via Gcc-patches





On 6/29/21 11:22 PM, Martin Sebor wrote:

On 6/29/21 4:27 AM, Aldy Hernandez wrote:



On 6/29/21 1:19 AM, Martin Sebor wrote:

On 6/28/21 10:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is the ranger-based backwards threader.  It is divided into two
parts: the solver and the path discovery bits.

The solver is generic enough, that it may be of use to other passes,
so it's been abstracted into its own separate class/file.  Andrew and
I have already gone over it, so I don't think a review is necessary.
Besides, it's technically an extension of the ranger infrastructure.

On the other hand, the path discovery bits could benefit from the
watchful eye of the jump threading experts.

Documenting the solver in a [ranger-tech] post is on my TODO list,
as I think it would be useful as an example of GORI as a general
tool, outside the VRP world.

As I have mentioned elsewhere, I have gone through each test and
documented the reasons why they were adjusted (when useful).  The
reviewer(s) may benefit from looking at the test notes.

I have added a --param=threader-mode={ranger,legacy} option, which I
hope to remove shortly after.  It has been useful for diagnosing
issues in the past, though perhaps not so much now.  I've left it
in case there's a remote interest in using it during stage1, but
removing it could be a huge cleanup to tree-ssa-threadbackward.c.

If/when accepted, I will open 2-3 PRs with the XFAILed tests as
requested.  I am still working on distilling a C counterpart for
the libphobos missing thread edge.  It'll hopefully be ready by the
time the review is done.

A version of this patchset with the verification code has
been tested on x86-64, ppc64, ppc64le, and aarch64 (all Linux).

I am currently re-testing on x86-64 Linux, but will not re-test on the
rest of the architectures because...OMG aarch64 is so slow!


I applied the series and ran a subset of tests and didn't see any
failures, just the three XPASSes below.  The Wfree-nonheap-object
tests you mentioned in the other post all pass.  Looks like you
got past that problem?

XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 32)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 46)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 60)


A couple of comments on the tests below (I haven't looked at the meat
of the patch):



Thanks.
Aldy

Aldy Hernandez (2):
   Implement basic block path solver.
   Backwards jump threader rewrite with ranger.

  gcc/Makefile.in   |   6 +
  gcc/flag-types.h  |   7 +
  gcc/params.opt    |  17 +
  .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
  gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
  gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +


The change here just adds the comment:

+/* This looks like the threader caused the entire loop to collapse, and the
+   warning pass can't determine the arguments to memcpy.  */
+

Since the test passes I'm not sure I understand what the comment
is trying to say.  Is it still accurate and necessary?


This seems like it came from the ranger branch which had slightly 
different code, particularly it made use of a full ranger with 
equivalences.  It looks like this could have failed in the branch, but 
no longer does.  I have removed the comment.


Okay, thanks.






  gcc/testsuite/gcc.dg/loop-unswitch-2.c    |   2 +-
  gcc/testsuite/gcc.dg/old-style-asm-1.c    |   5 +-
  gcc/testsuite/gcc.dg/pr68317.c    |   4 +-
  gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
  gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
  gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
  gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
  .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-


I wonder if breaking up the test function into five, one for each
of the tests it does, would be a better way to avoid the IL changes
than disabling all the threading passes.  Like in the attached patch.


As the author of the original test, I completely defer to you :).

Attached is the latest version with your suggested changes, as well as 
a gimple FE test for the previously discussed failing libphobos test.


The tests look good.

In the new APIs, instead of taking vec by value can you please change
them to either by-const-reference if they don't change the vec or by-
reference if they do?  I'm in the midst of changing code to do that
with the goal of eventually removing all by-value vec arguments.
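
For illustration only, the request above can be sketched with std::vector standing in for GCC's vec (an assumption made so the example is self-contained; the function names are hypothetical and are not the APIs from the patch):

```cpp
#include <cassert>
#include <vector>

// Read-only access: take the container by const reference, so no copy
// is made and the callee cannot modify the caller's data.
int sum_path (const std::vector<int> &path)
{
  int total = 0;
  for (int bb : path)
    total += bb;
  return total;
}

// The callee is meant to modify the container: take it by non-const
// reference so the change is visible to the caller.
void append_block (std::vector<int> &path, int bb)
{
  path.push_back (bb);
}
```

With a by-value parameter, append_block would change only a local copy, and sum_path would pay for a copy it does not need.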


Sure.

Aldy

[PATCH 4/4] poison input_location and cfun in one spot

2021-06-29 Thread Trevor Saunders

This simply confirms we can poison them in a small region.

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/ChangeLog:

* gimple-range.cc (disable_ranger): Prevent access to cfun and
input_location.
---
 gcc/gimple-range.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 1851339c528..d4a3a6e46be 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+#include "poison.h"
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
@@ -509,6 +510,8 @@ enable_ranger (struct function *fun)
 void
 disable_ranger (struct function *fun)
 {
+  auto_poison<location_t> pil (input_location);
+  auto_poison<function *> pcfun (cfun_poison);
   delete fun->x_range_query;
 
   fun->x_range_query = &global_ranges;
-- 
2.20.1

[PATCH 2/4] allow poisoning input_location in ranges it should not be used

2021-06-29 Thread Trevor Saunders

This makes it possible to assert if input_location is used during the lifetime
of a scope.  This will allow us to find places that currently use it within a
function and its callees, or prevent adding uses within the lifetime of a
function after all existing uses are removed.
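
A possible reading of the "Adjust" entries below, sketched with hypothetical stand-in types (loc_t, op_loc and wrapped_loc are not GCC names): once the global is wrapped in a class, passing it where a class with an implicit constructor is expected would need two user-defined conversions in a row, which C++ does not chain implicitly. One of the two steps therefore presumably has to be written out at the call site.

```cpp
#include <cassert>
#include <type_traits>

typedef unsigned int loc_t;       // stand-in for location_t

struct op_loc                     // stand-in for op_location_t
{
  loc_t m_loc;
  op_loc (loc_t l) : m_loc (l) {} // implicit constructor from loc_t
};

struct wrapped_loc                // stand-in for the wrapped global
{
  loc_t m_val;
  operator loc_t & () { return m_val; }
};

loc_t build_op (const op_loc &l) { return l.m_loc; }

// wrapped_loc -> loc_t -> op_loc needs two user-defined conversions,
// so the implicit conversion is rejected.
static_assert (!std::is_convertible<wrapped_loc &, op_loc>::value,
               "two user-defined conversions are not chained implicitly");
```

Writing either step explicitly, as in build_op (op_loc (w)) or build_op (loc_t (w)), leaves a single implicit user-defined conversion and compiles.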

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/cp/ChangeLog:

* call.c (add_builtin_candidate): Adjust.
* decl.c (compute_array_index_type_loc): Likewise.
* decl2.c (get_guard_cond): Likewise.
(one_static_initialization_or_destruction): Likewise.
(do_static_initialization_or_destruction): Likewise.
* init.c (build_new_1): Likewise.
(build_vec_init): Likewise.
* module.cc (finish_module_processing): Likewise.
* parser.c (cp_convert_range_for): Likewise.
(cp_parser_perform_range_for_lookup): Likewise.
(cp_parser_omp_for_incr): Likewise.
(cp_convert_omp_range_for): Likewise.
* pt.c (fold_expression): Likewise.
(tsubst_copy_and_build): Likewise.
* typeck.c (common_pointer_type): Likewise.
(cp_build_array_ref): Likewise.
(get_member_function_from_ptrfunc): Likewise.
(cp_build_unary_op): Likewise.
(convert_ptrmem): Likewise.
(cp_build_modify_expr): Likewise.
(build_ptrmemfunc): Likewise.

gcc/ChangeLog:

* diagnostic.c (internal_error): Remove use of input_location.
* input.c (input_location): Change type to poisonable.
* input.h (input_location): Adjust prototype.

gcc/objc/ChangeLog:

* objc-next-runtime-abi-02.c (build_v2_objc_method_fixup_call): Adjust.
---
 gcc/cp/call.c   |  2 +-
 gcc/cp/decl.c   |  2 +-
 gcc/cp/decl2.c  | 12 +--
 gcc/cp/init.c   | 14 ++--
 gcc/cp/module.cc|  2 +-
 gcc/cp/parser.c | 11 +-
 gcc/cp/pt.c |  4 ++--
 gcc/cp/typeck.c | 33 -
 gcc/diagnostic.c|  2 +-
 gcc/input.c |  2 +-
 gcc/input.h |  3 ++-
 gcc/objc/objc-next-runtime-abi-02.c |  2 +-
 12 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index e4df72ec1a3..c94fe0b3bd2 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -3126,7 +3126,7 @@ add_builtin_candidate (struct z_candidate **candidates, enum tree_code code,
 {
   if (TYPE_PTR_OR_PTRMEM_P (type1))
{
- tree cptype = composite_pointer_type (input_location,
+ tree cptype = composite_pointer_type (op_location_t (input_location),
type1, type2,
error_mark_node,
error_mark_node,
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index fa6af6fec11..84e2bdae6bf 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10884,7 +10884,7 @@ compute_array_index_type_loc (location_t name_loc, tree name, tree size,
 cp_build_binary_op will be appropriately folded.  */
   {
processing_template_decl_sentinel s;
-   itype = cp_build_binary_op (input_location,
+   itype = cp_build_binary_op (op_location_t (input_location),
MINUS_EXPR,
cp_convert (ssizetype, size, complain),
cp_convert (ssizetype, integer_one_node,
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 090a83bd670..ddb7e248c63 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3386,7 +3386,7 @@ get_guard_cond (tree guard, bool thread_safe)
   guard_value = integer_one_node;
   if (!same_type_p (TREE_TYPE (guard_value), TREE_TYPE (guard)))
guard_value = fold_convert (TREE_TYPE (guard), guard_value);
-  guard = cp_build_binary_op (input_location,
+  guard = cp_build_binary_op (location_t (input_location),
  BIT_AND_EXPR, guard, guard_value,
  tf_warning_or_error);
 }
@@ -3394,7 +3394,7 @@ get_guard_cond (tree guard, bool thread_safe)
   guard_value = integer_zero_node;
   if (!same_type_p (TREE_TYPE (guard_value), TREE_TYPE (guard)))
 guard_value = fold_convert (TREE_TYPE (guard), guard_value);
-  return cp_build_binary_op (input_location,
+  return cp_build_binary_op (location_t (input_location),
 EQ_EXPR, guard, guard_value,
 tf_warning_or_error);
 }
@@ -4056,7 +4056,7 @@ one_static_initialization_or_destruction (tree decl, tree init, bool initp)
 last to destroy the variable.  */
   else if (initp)
guard_cond
- = cp_build_binary_op (input_location,
+ = cp_build_binary_op (location_t (input_location),
EQ_EXPR,

[PATCH 3/4] allow poisoning cfun

2021-06-29 Thread Trevor Saunders

Since cfun is already a macro in most of the compiler, we redefine it to point
to a second variable, to avoid having to support C++ objects as GC roots.
However we keep the existing function * global for internal use as a gc root.
It is unfortunate the two globals need to stay in sync, but there are only a
couple of places that update it, and this seems much easier than getting
gengtype to properly handle objects as roots and in pch generation and use.
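
A minimal sketch of the two-variable scheme described above, using hypothetical stand-ins (function_stub, wrapped_ptr, raw_current, current_wrapped, current_fn and set_current are illustration names, not GCC's): a raw pointer global stays available as the root, a wrapper object mirrors it, and the read-only macro goes through the wrapper with "+ 0" so that the result is not an lvalue.

```cpp
#include <cassert>
#include <type_traits>

struct function_stub { int id; };  // stand-in for struct function

// Minimal wrapper offering the implicit conversion the macro relies on.
class wrapped_ptr
{
public:
  explicit wrapped_ptr (function_stub *p) : m_val (p) {}
  operator function_stub *& () { return m_val; }
  wrapped_ptr &operator= (function_stub *p) { m_val = p; return *this; }
private:
  function_stub *m_val;
};

static function_stub *raw_current = nullptr;   // plays the raw global
static wrapped_ptr current_wrapped (nullptr);  // plays the wrapped copy

// Readers use the macro; "+ 0" turns the converted pointer into a
// prvalue, so "current_fn = x" does not compile.
#define current_fn (current_wrapped + 0)

static_assert (std::is_same<decltype (current_fn), function_stub *>::value,
               "the macro yields a pointer value, not a reference");

// The single update point keeps both globals in sync.
void set_current (function_stub *fn)
{
  raw_current = fn;
  current_wrapped = fn;
}
```

Any update path that bypasses set_current would leave the two variables out of sync, which is the cost the message mentions.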

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/cp/ChangeLog:

* module.cc (module_state::read_cluster): Set cfun_poison as well as
cfun.

gcc/ChangeLog:

* function.c (cfun_poison): New global.
(set_cfun): Set cfun_poison.
(allocate_struct_function): Likewise.
* function.h (cfun_poison): New declaration.
(cfun): Adjust.
---
 gcc/cp/module.cc | 2 ++
 gcc/function.c   | 4 
 gcc/function.h   | 4 +++-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 72f32487e51..2f1126211e6 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -14868,6 +14868,8 @@ module_state::read_cluster (unsigned snum)
  redesigning that API right now.  */
 #undef cfun
   cfun = old_cfun;
+  cfun_poison = old_cfun;
+  cfun_poison = old_cfun;
   current_function_decl = old_cfd;
   comparing_dependent_aliases--;
 
diff --git a/gcc/function.c b/gcc/function.c
index 00b2fe70c7d..87e8bc86166 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "function-abi.h"
 #include "value-range.h"
 #include "gimple-range.h"
+#include "poison.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -118,6 +119,7 @@ struct machine_function * (*init_machine_status) (void);
 
 /* The currently compiled function.  */
 struct function *cfun = 0;
+poisonable<function *> cfun_poison (0);
 
 /* These hashes record the prologue and epilogue insns.  */
 
@@ -4715,6 +4717,7 @@ set_cfun (struct function *new_cfun, bool force)
   if (cfun != new_cfun || force)
 {
   cfun = new_cfun;
+  cfun_poison = new_cfun;
   invoke_set_current_function_hook (new_cfun ? new_cfun->decl : NULL_TREE);
   redirect_edge_var_map_empty ();
 }
@@ -4797,6 +4800,7 @@ allocate_struct_function (tree fndecl, bool abstract_p)
   tree fntype = fndecl ? TREE_TYPE (fndecl) : NULL_TREE;
 
   cfun = ggc_cleared_alloc<function> ();
+  cfun_poison = cfun;
 
   init_eh_for_function ();
 
diff --git a/gcc/function.h b/gcc/function.h
index 0db51775e7c..82f7510bdc3 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_FUNCTION_H
 #define GCC_FUNCTION_H
 
+template<typename T> class poisonable;
 
 /* Stack of pending (incomplete) sequences saved by `start_sequence'.
Each element describes one pending sequence.
@@ -459,11 +460,12 @@ void record_dynamic_alloc (tree decl_or_exp);
 
 /* The function currently being compiled.  */
 extern GTY(()) struct function *cfun;
+extern poisonable<function *> cfun_poison;
 
 /* In order to ensure that cfun is not set directly, we redefine it so
that it is not an lvalue.  Rather than assign to cfun, use
push_cfun or set_cfun.  */
-#define cfun (cfun + 0)
+#define cfun (cfun_poison + 0)
 
 /* Nonzero if we've already converted virtual regs to hard regs.  */
 extern int virtuals_instantiated;
-- 
2.20.1

[PATCH 1/4] add utility to poison globals that should not be used

2021-06-29 Thread Trevor Saunders

This provides a class to wrap globals like input_location or cfun that should
not be used directly and will ideally go away.  This class tracks if access to
the global is currently blocked and asserts if accessed when that is not
allowed.  It also adds a class to mark access as blocked for the lifetime of the
scope.
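
A simplified sketch of how such a pair of classes behaves. It is not the code from the patch: the names poisonable_sketch, scoped_poison, current_line and read_line are hypothetical, and an exception replaces gcc_assert so that the effect can be observed without aborting.

```cpp
#include <cassert>
#include <stdexcept>

template<typename T> class scoped_poison;

// Wrapper around a global value that refuses access while poisoned.
template<typename T>
class poisonable_sketch
{
public:
  explicit poisonable_sketch (T val) : m_val (val), m_poisoned (false) {}

  operator T& ()
  {
    if (m_poisoned)
      throw std::logic_error ("poisoned global accessed");
    return m_val;
  }

private:
  friend class scoped_poison<T>;
  T m_val;
  bool m_poisoned;
};

// RAII guard: poisons the target for the lifetime of the enclosing scope.
template<typename T>
class scoped_poison
{
public:
  explicit scoped_poison (poisonable_sketch<T> &p) : m_target (p)
  { p.m_poisoned = true; }
  ~scoped_poison () { m_target.m_poisoned = false; }

  scoped_poison (const scoped_poison &) = delete;
  scoped_poison &operator= (const scoped_poison &) = delete;

private:
  poisonable_sketch<T> &m_target;
};

static poisonable_sketch<int> current_line (42);

// A callee that reads the global directly.
static int read_line () { return current_line; }

// Returns true if a read made by a callee inside the guarded scope is caught.
bool poisoned_read_is_caught ()
{
  scoped_poison<int> guard (current_line);
  try
    {
      read_line ();
    }
  catch (const std::logic_error &)
    {
      return true;
    }
  return false;
}
```

The guard covers callees as well as the function that creates it, because the poisoned flag lives in the global wrapper rather than in the scope.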

bootstrapped and regtested on x86_64-linux-gnu, ok?

Trev

gcc/ChangeLog:

* poison.h: New file.
---
 gcc/poison.h | 88 
 1 file changed, 88 insertions(+)
 create mode 100644 gcc/poison.h

diff --git a/gcc/poison.h b/gcc/poison.h
new file mode 100644
index 000..239ab1cb91a
--- /dev/null
+++ b/gcc/poison.h
@@ -0,0 +1,88 @@
+/* Simple utility to poison globals that should be avoided.
+
+   Copyright (C) 2021 the GNU Toolchain Authors
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_POISON_H
+#define GCC_POISON_H
+
+template<typename T> class auto_poison;
+
+/* This class is intended to be used as a transparent wrapper around the type of
+   a global object that we would like to stop using.  */
+template<typename T>
+class poisonable
+{
+public:
+  poisonable () : m_val (), m_poisoned (false) {}
+  explicit poisonable (T val) : m_val (val), m_poisoned (false) {}
+
+  operator T& ()
+{
+  gcc_assert (!m_poisoned);
+  return m_val;
+}
+
+  poisonable &operator= (T val)
+{
+  gcc_assert (!m_poisoned);
+  m_val = val;
+  return *this;
+}
+
+  T *operator& ()
+{
+  gcc_assert (!m_poisoned);
+  return &m_val;
+}
+
+  poisonable (const poisonable &) = delete;
+  poisonable (poisonable &&) = delete;
+  poisonable &operator= (const poisonable &) = delete;
+  poisonable &operator= (poisonable &&) = delete;
+
+private:
+  friend class auto_poison<T>;
+
+  T m_val;
+  bool m_poisoned;
+  };
+
+/* This class provides a way to make a global inaccessible in the given scope,
+   and any functions called within that scope.  */
+template<typename T>
+class auto_poison
+{
+public:
+  auto_poison (poisonable<T> &p) : m_target (p)
+  {
+gcc_assert (!p.m_poisoned);
+p.m_poisoned = true;
+  }
+  ~auto_poison () { m_target.m_poisoned = false; }
+
+  auto_poison (const auto_poison &) = delete;
+  auto_poison (auto_poison &&) = delete;
+  auto_poison &operator= (const auto_poison &) = delete;
+  auto_poison &operator= (auto_poison &&) = delete;
+
+private:
+  poisonable<T> &m_target;
+};
+
+#endif
-- 
2.20.1

Re: [PATCH] Port GCC documentation to Sphinx

2021-06-29 Thread Martin Liška


On 6/29/21 12:50 PM, Richard Earnshaw wrote:



On 29/06/2021 11:09, Martin Liška wrote:

On 6/28/21 5:33 PM, Joseph Myers wrote:

Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
version also available for review?


I've just uploaded them here:
https://splichal.eu/gccsphinx-final/

Martin



In the HTML version of the gcc manual the sidebar has an "Option index" link but no link 
to the general index.  When you follow that link the page contents is just a link to the 
"index" where everything is all lumped together.

If we can't have separate indexes for options and general entries, I think it 
would make more sense for the Option index link to be removed entirely.


Fully agree with you. Thanks for the feedback and I've changed that to the 
standard Sphinx section,
see e.g. https://splichal.eu/gccsphinx-final/html/gcc/indices-and-tables.html

Martin



R.

Re: [PATCH v3] fixinc: don't "fix" machine names in __has_include(...) [PR91085]

2021-06-29 Thread Xi Ruoyao via Gcc-patches

On Tue, 2021-06-29 at 08:53 -0700, Bruce Korb wrote:
> On 6/28/21 10:26 PM, Xi Ruoyao wrote:
> > v3:
> >    use memmem/memchr instead of trivial loops
> >    split most of the logic into a static function
> >    avoid hardcoded magic number
> >    adjust test
> Looks good to me. :)

Thanks for review!

Pushed, with PR number added in ChangeLog to please the hook :).
-- 
Xi Ruoyao

Ping: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-06-29 Thread Xionghu Luo via Gcc-patches


Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html


On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/6/9 07:25, Segher Boessenkool wrote:

On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:

vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for 
vmrghlb.


(vmrglb)


+  if (BYTES_BIG_ENDIAN)
+    emit_insn (
+  gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (
+  gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]));


Please don't indent like that, it doesn't match what we do elsewhere.
For better or for worse (for worse imo), we use deep hanging indents.
If you have to, you can do something like

   rtx insn;
   if (BYTES_BIG_ENDIAN)
     insn = gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]);
   else
     insn = gen_altivec_vmrglb_direct (operands[0], operands[2], operands[1]);
   emit_insn (insn);

(this is better even, in that it has only one emit_insn), or even

   rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
   : gen_altivec_vmrglb_direct;
   if (!BYTES_BIG_ENDIAN)
 std::swap (operands[1], operands[2]);
   emit_insn (fun (operands[0], operands[1], operands[2]));

Well, C++ does not allow that last example like that, sigh, so
   rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
                                                 : gen_altivec_vmrglb_direct;

This is shorter than the other two options ;-)
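
For readers outside GCC, the function-pointer form can be tried in isolation. The sketch below uses hypothetical stand-ins (insn_stub, gen_merge_high, gen_merge_low, expand_merge_high) in place of rtx and the real generator functions, so it only illustrates the selection-and-swap idiom, not the rs6000 expander itself:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for an emitted instruction: which generator ran, and with
// which operand order.
struct insn_stub
{
  const char *name;
  int dest, src1, src2;
};

static insn_stub gen_merge_high (int d, int a, int b)
{ return { "mrgh", d, a, b }; }

static insn_stub gen_merge_low (int d, int a, int b)
{ return { "mrgl", d, a, b }; }

insn_stub expand_merge_high (bool big_endian, int operands[3])
{
  // Pick the generator once, with the full prototype in the pointer type.
  insn_stub (*fun) (int, int, int) = big_endian ? gen_merge_high
                                                : gen_merge_low;
  // Little endian uses the other instruction with swapped inputs.
  if (!big_endian)
    std::swap (operands[1], operands[2]);
  return fun (operands[0], operands[1], operands[2]);
}
```

Note that the swap modifies the caller's operand array in place, just as the std::swap in the suggestion above does.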


Changed.




+(define_insn "altivec_vmrghb_direct"
    [(set (match_operand:V16QI 0 "register_operand" "=v")
+    (vec_select:V16QI


This should be indented one space more.


    "TARGET_ALTIVEC"
    "@
-   xxmrghw %x0,%x1,%x2
-   vmrghw %0,%1,%2"
+  xxmrghw %x0,%x1,%x2
+  vmrghw %0,%1,%2"


The original indent was correct, please restore.


-  emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
+  emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));


When you see a mode as part of a pattern name, chances are that it will
be a good candidate for using parameterized names with.  (But don't do
that now, just keep it in mind as a nice cleanup to do).


OK.



@@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 : CODE_FOR_altivec_vmrglh_direct),
    {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
  { OPTION_MASK_ALTIVEC,
-  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
-   : CODE_FOR_altivec_vmrglw_direct),
+  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
+   : CODE_FOR_altivec_vmrglw_direct_v4si),


The correct way is to align the ? and the : (or put everything on one
line of course, if that fits)

The parens around this are not needed btw, and are a distraction.


Changed.




--- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
@@ -317,10 +317,10 @@ int main ()
  /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
  /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
-/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
+/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
  /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
-/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
-/* { dg-final { scan-assembler-times "xxmrglw" 8 } } */
+/* { dg-final { scan-assembler-times "xxmrghw" 4 } } */
+/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
  /* { dg-final { scan-assembler-times "vmrglh" 8 } } */
  /* { dg-final { scan-assembler-times "xxlnor" 6 } } */
  /* { dg-final { scan-assembler-times {\mvpkudus\M} 1 } } */
@@ -347,7 +347,7 @@ int main ()
  /* { dg-final { scan-assembler-times "vspltb" 6 } } */
  /* { dg-final { scan-assembler-times "vspltw" 0 } } */
  /* { dg-final { scan-assembler-times "vmrgow" 8 } } */
-/* { dg-final { scan-assembler-times "vmrglb" 5 { target le } } } */
+/* { dg-final { scan-assembler-times "vmrglb" 4 { target le } } } */
  /* { dg-final { scan-assembler-times "vmrglb" 6 { target be } } } */
  /* { dg-final { scan-assembler-times "vmrgew" 8 } } */
  /* { dg-final { scan-assembler-times "vsplth" 8 } } */


Are those changes correct?  It looks like a vmrglb became a vmrghb, and
that 4 each of xxmrghw and xxmrglw disappeared?  Both seem wrong?



This case is built with "-mdejagnu-cpu=power8 -O0 -mno-fold-gimple -dp"
and it also counted the generated instruction patterns.

1) "vsx_xxmrghw_v4si" is replaced by "altivec_vmrglw_direct_v4si/0", so 
it decreases from 8 to 4. (Likewise for vsx_xxmrglw_v4si.)


     li 9,48  # 1282 [c=4 l=4]  *movdi_internal64/3
-   lxvd2x 0,31,9    # 31   [c=8 l=4]  *vsx_lxvd2x4_le_v4si
-   xxpermdi 0,0,0,2 # 32   [c=4 l=4]  xxswapd_v4si
-   xxmrglw 0,0,12   # 33   [c=4 l=4]  vsx_xxmrghw_v4si
+

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/29/21 4:58 AM, Richard Biener wrote:

On Mon, Jun 28, 2021 at 8:07 PM Martin Sebor  wrote:


On 6/28/21 2:07 AM, Richard Biener wrote:

On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:


On 6/25/21 4:11 PM, Jason Merrill wrote:

On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec
along
with a few simple tests.  It makes auto_vec safe to use in
containers
that expect copyable and assignable element types and passes
bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the
properties
expected of any other generic container: copyable and
assignable.  If
we also want another vector type with this restriction I suggest
to add
another "noncopyable" type and make that property explicit in
its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).
Looking around
I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're
writing C++11.


The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion
richi mentions.  And many uses of auto_vec will pass them as vec,
which will still do a shallow copy.  I think it's probably better
to disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing
an auto_vec by value as a vec, and thus doing a shallow copy, would
be to add a vec ctor taking an auto_vec, and delete that.  This would
mean if you want to pass an auto_vec to a vec interface, it needs to
be by reference.  We might as well do the same for operator=, though
that isn't as important.
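
A minimal sketch of that idea, using hypothetical stand-in types (toy_vec and toy_auto_vec are invented here and are not the declarations in vec.h).  It assumes the owning type derives from the shallow one.  The deleted constructor is a better match than the shallow copy constructor whenever the argument is a toy_auto_vec, so a by-value conversion is rejected at compile time while passing by reference keeps working:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

template <typename T> struct toy_auto_vec;

// Hypothetical stand-in for vec<T>: a handle with shallow-copy semantics.
template <typename T>
struct toy_vec
{
  T *data = nullptr;
  std::size_t len = 0;

  toy_vec () = default;
  toy_vec (const toy_vec &) = default;            // shallow copy stays legal
  toy_vec &operator= (const toy_vec &) = default;

  // The idea from the thread: creating or assigning a toy_vec from a
  // toy_auto_vec is rejected at compile time.
  toy_vec (const toy_auto_vec<T> &) = delete;
  toy_vec &operator= (const toy_auto_vec<T> &) = delete;
};

// Hypothetical stand-in for auto_vec<T>: owns its storage.
template <typename T>
struct toy_auto_vec : toy_vec<T>
{
  explicit toy_auto_vec (std::size_t n)
  {
    this->data = new T[n] ();
    this->len = n;
  }
  ~toy_auto_vec () { delete[] this->data; }
  toy_auto_vec (const toy_auto_vec &) = delete;
  toy_auto_vec &operator= (const toy_auto_vec &) = delete;
};

// A "vec interface" that takes its argument by reference still accepts a
// toy_auto_vec through the ordinary derived-to-base conversion.
inline std::size_t length_by_ref (const toy_vec<int> &v) { return v.len; }

// A by-value conversion would select the deleted constructor.
static_assert (!std::is_convertible<toy_auto_vec<int> &, toy_vec<int>>::value,
               "toy_auto_vec must not decay into a shallow toy_vec copy");
static_assert (std::is_convertible<toy_vec<int> &, toy_vec<int>>::value,
               "plain toy_vec copies are unaffected");
```

With these toy types, a function declared as taking toy_vec<int> by value cannot be called with a toy_auto_vec<int> argument, which is the compile-time diagnostic the suggestion is after.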


Thanks, that sounds like a good idea.  Attached is an implementatio

Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-06-29 Thread Xionghu Luo via Gcc-patches


Gentle ping ^2, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568143.html


On 2021/5/14 15:13, Xionghu Luo via Gcc-patches wrote:

Test SPEC2017 Ofast P8LE for this patch : 511.povray_r +1.14%,
526.blender_r +1.72%, no obvious changes to others.


On 2021/5/6 10:36, Xionghu Luo via Gcc-patches wrote:

Gentle ping, thanks.


On 2021/4/16 15:10, Xiong Hu Luo wrote:

fmod/fmodf and remainder/remainderf can be expanded inline instead of
emitting a library call when building with fast-math, which is much faster.

fmodf:
  fdivs   f0,f1,f2
  friz    f0,f0
  fnmsubs f1,f2,f0,f1

remainderf:
  fdivs   f0,f1,f2
  frin    f0,f0
  fnmsubs f1,f2,f0,f1
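
As a rough scalar model of the two sequences above (a sketch only; fast_fmod and fast_remainder are names made up for illustration, and std::fma stands in for the fused fnmsub step), the arithmetic is x - y * trunc (x / y) for fmod and x - y * round (x / y) for remainder:

```cpp
#include <cassert>
#include <cmath>

// Models fdiv / friz / fnmsub: x - y * trunc (x / y).
inline double fast_fmod (double x, double y)
{
  double q = std::trunc (x / y);   // friz: round toward zero
  return std::fma (-y, q, x);      // fnmsub: x - y * q with one rounding
}

// Models fdiv / frin / fnmsub: x - y * round (x / y).
inline double fast_remainder (double x, double y)
{
  double q = std::round (x / y);   // frin: nearest, ties away from zero
  return std::fma (-y, q, x);
}
```

Because the quotient x / y is itself rounded before it is converted to an integer, this shortcut can differ from the library result when |x / y| is large, and it does nothing special for zero divisors, infinities or NaNs.  That is presumably why the expansion is only enabled under flag_unsafe_math_optimizations.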

gcc/ChangeLog:

2021-04-16  Xionghu Luo  

PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-04-16  Xionghu Luo  

PR target/97142
* gcc.target/powerpc/pr97142.c: New test.
---
  gcc/config/rs6000/rs6000.md    | 36 ++
  gcc/testsuite/gcc.target/powerpc/pr97142.c | 30 ++
  2 files changed, 66 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a1315523fec..7e0e94e6ba4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4902,6 +4902,42 @@ (define_insn "fre"
    [(set_attr "type" "fp")
 (set_attr "isa" "*,")])
+(define_expand "fmod<mode>3"
+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+    (use (match_operand:SFDF 1 "gpc_reg_operand"))
+    (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
+
+  rtx friz = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_btrunc<mode>2 (friz, div));
+
+  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], friz, operands[1]));
+  DONE;
+ })
+
+(define_expand "remainder<mode>3"
+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+    (use (match_operand:SFDF 1 "gpc_reg_operand"))
+    (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
+
+  rtx frin = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_round<mode>2 (frin, div));
+
+  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], frin, operands[1]));
+  DONE;
+ })
+
  (define_insn "*rsqrt2"
    [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
  (unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c b/gcc/testsuite/gcc.target/powerpc/pr97142.c
new file mode 100644
index 000..48f25ca5b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+#include <math.h>
+
+float test1 (float x, float y)
+{
+  return fmodf (x, y);
+}
+
+double test2 (double x, double y)
+{
+  return fmod (x, y);
+}
+
+float test3 (float x, float y)
+{
+  return remainderf (x, y);
+}
+
+double test4 (double x, double y)
+{
+  return remainder (x, y);
+}
+
+/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
+/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainder\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainderf\M} } } */
+







--
Thanks,
Xionghu

Ping: [PATCH] rs6000: Fix wrong code generation for vec_sel [PR94613]

2021-06-29 Thread Xionghu Luo via Gcc-patches


Gentle ping, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570333.html


On 2021/5/14 14:57, Xionghu Luo via Gcc-patches wrote:

Hi,

On 2021/5/13 18:49, Segher Boessenkool wrote:

Hi!

On Fri, Apr 30, 2021 at 01:32:58AM -0500, Xionghu Luo wrote:

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per element selection is a
subset of per bit-wise selection, with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3&op1) | (op3&op2),
(~op3&op1) | (op2&op3),
(op3&op2) | (~op3&op1),
(op2&op3) | (~op3&op1),
(op1&~op3) | (op3&op2),
(op1&~op3) | (op2&op3),
(op3&op2) | (op1&~op3),
(op2&op3) | (op1&~op3),

The combine pass will swap (op1&~op3) to (~op3&op1) because of commutative
canonicalisation, which reduces the list to the FIRST 4 patterns, but it won't
swap (op2&op3) | (~op3&op1) to (~op3&op1) | (op2&op3), so this patch
handles it with two patterns that differ in the position of NOT op3 and
checks equality inside them.
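
To make the bit-wise semantics concrete, here is a small scalar sketch (bit_select and all_orderings_agree are illustrative names, and a 32-bit integer stands in for a vector register).  It computes op0 := (op1 & ~op3) | (op2 & op3) and checks that the eight operand orderings listed above all denote the same value:

```cpp
#include <cassert>
#include <cstdint>

// Bit-wise select: take bits of op2 where the mask op3 is 1 and bits of
// op1 where it is 0.
inline std::uint32_t bit_select (std::uint32_t op1, std::uint32_t op2,
                                 std::uint32_t op3)
{
  return (op1 & ~op3) | (op2 & op3);
}

// The eight syntactic forms from the list; they differ only in operand
// order, so they must agree for every input.
inline bool all_orderings_agree (std::uint32_t op1, std::uint32_t op2,
                                 std::uint32_t op3)
{
  const std::uint32_t forms[8] = {
    (~op3 & op1) | (op3 & op2),
    (~op3 & op1) | (op2 & op3),
    (op3 & op2) | (~op3 & op1),
    (op2 & op3) | (~op3 & op1),
    (op1 & ~op3) | (op3 & op2),
    (op1 & ~op3) | (op2 & op3),
    (op3 & op2) | (op1 & ~op3),
    (op2 & op3) | (op1 & ~op3),
  };
  for (std::uint32_t f : forms)
    if (f != forms[0])
      return false;
  return true;
}
```

A per-element select is the special case where each element of the mask is either all ones or all zeros; a mask such as 0x0F0F0F0F mixes bits from both inputs within one element, which an IF_THEN_ELSE on whole elements cannot express.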


Yup, that latter case does not have canonicalisation rules.  Btw, not
only combine does this canonicalisation: everything should,
non-canonical RTL is invalid RTL (in the instruction stream, you can do
everything in temporary code of course, as long as the RTL isn't
malformed).


-(define_insn "*altivec_vsel"
+(define_insn "altivec_vsel"
    [(set (match_operand:VM 0 "altivec_register_operand" "=v")
-    (if_then_else:VM
- (ne:CC (match_operand:VM 1 "altivec_register_operand" "v")
-    (match_operand:VM 4 "zero_constant" ""))
- (match_operand:VM 2 "altivec_register_operand" "v")
- (match_operand:VM 3 "altivec_register_operand" "v")))]
-  "VECTOR_MEM_ALTIVEC_P (mode)"
-  "vsel %0,%3,%2,%1"
+    (ior:VM
+ (and:VM
+  (not:VM (match_operand:VM 3 "altivec_register_operand" "v"))
+  (match_operand:VM 1 "altivec_register_operand" "v"))
+ (and:VM
+  (match_operand:VM 2 "altivec_register_operand" "v")
+  (match_operand:VM 4 "altivec_register_operand" "v"))))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
+  && (rtx_equal_p (operands[2], operands[3])
+  || rtx_equal_p (operands[4], operands[3]))"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+  return "vsel %0,%1,%4,%3";
+    else
+  return "vsel %0,%1,%2,%3";
+  }
    [(set_attr "type" "vecmove")])


That rtx_equal_p stuff is nice and tricky, but it is a bit too tricky I
think.  So please write this as two patterns (and keep the expand if
that helps).


I was a bit concerned that writing two patterns for each vsel would
duplicate a lot of code: 4 similar patterns in altivec.md and another 4
in vsx.md would be difficult to maintain.  However, I have updated the
patch since you prefer it this way, and, as you pointed out, the xxsel
patterns in vsx.md could be folded by a later patch.




+(define_insn "altivec_vsel2"


(same here of course).


  ;; Fused multiply add.
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index f5676255387..d65bdc01055 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3362,11 +3362,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_unsigned_V2DI },
    { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
  RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI },
-  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI,
+  { ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_2DI_UNS,


Are the _uns things still used for anything?  But, let's not change
this until Bill's stuff is in :-)

Why do you want to change this here, btw?  I don't understand.


OK, they are actually the "unsigned type" overloads of the builtin functions.
Changing them or not makes no functional difference so far, so I will revert
this change in the updated patch.




+  if (target == 0
+  || GET_MODE (target) != tmode
+  || ! (*insn_data[icode].operand[0].predicate) (target, tmode))


No space after ! and other unary operators (except for casts and other
operators you write with alphanumerics, like "sizeof").  I know you
copied this code, but :-)


OK, thanks.



@@ -15608,8 +15606,6 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
  case GEU:
  case LTU:
  case LEU:
-  /* Mark unsigned tests with CCUNSmode.  */
-  cc_mode = CCUNSmode;
    /* Invert condition to avoid compound test if necessary.  */
    if (rcode == GEU || rcode == LEU)


So this is related to the _uns thing.  Could you split off that change?
Probably as an earlier patch (but either works for me).


This is not related to the ALTIVEC_BUILTIN_VSEL_2DI_UNS things.  Previously
cc_mode was a parameter used to generate the condition for the IF_THEN_ELSE
instruction; now that we use the IOR (AND... AND...) style it is no longer
needed, so it is removed to avoid a build error.


-  c

[PATCH] AIX code CSECT alignment

2021-06-29 Thread David Edelsohn via Gcc-patches

aix: align text CSECTs to at least 32 bytes.

Bootstrapped on powerpc-ibm-aix7.2.3.0.

Thanks, David

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_xcoff_section_type_flags):
Increase code CSECT alignment to at least 32 bytes.
* config/rs6000/xcoff.h (TEXT_SECTION_ASM_OP): Add 32 byte
alignment designation.
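
For readers unfamiliar with the XCOFF notation: the alignment operand of the AIX assembler's .csect pseudo-op is the base-2 logarithm of the alignment in bytes, so a 32-byte designation is written as 5.  A small sketch of that arithmetic and of the "at least 32 bytes" rule follows; the helper names are invented for illustration, and 8 bits per unit is an assumption:

```cpp
#include <cassert>

// Base-2 logarithm of a power-of-two byte alignment, as used for the
// .csect alignment operand.
constexpr unsigned align_log2 (unsigned bytes)
{
  return bytes <= 1 ? 0 : 1 + align_log2 (bytes / 2);
}

// Code CSECT rule from the ChangeLog: use the declaration's own
// alignment (given in bits) when it is stricter, otherwise 32 bytes.
constexpr unsigned code_csect_align (unsigned decl_align_bits,
                                     unsigned bits_per_unit = 8)
{
  return decl_align_bits / bits_per_unit > 32
         ? decl_align_bits / bits_per_unit
         : 32;
}

static_assert (align_log2 (32) == 5, "32-byte alignment is written as 5");
```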

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2c249e186e1..075c156ae13 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21361,8 +21362,11 @@ rs6000_xcoff_section_type_flags (tree decl, const char *name, int reloc)
 flags |= SECTION_BSS;

   /* Align to at least UNIT size.  */
-  if ((flags & SECTION_CODE) != 0 || !decl || !DECL_P (decl))
+  if (!decl || !DECL_P (decl))
 align = MIN_UNITS_PER_WORD;
+  /* Align code CSECT to at least 32 bytes.  */
+  else if ((flags & SECTION_CODE) != 0)
+align = MAX ((DECL_ALIGN (decl) / BITS_PER_UNIT), 32);
   else
 /* Increase alignment of large objects if not already stricter.  */
 align = MAX ((DECL_ALIGN (decl) / BITS_PER_UNIT),
diff --git a/gcc/config/rs6000/xcoff.h b/gcc/config/rs6000/xcoff.h
index 5ba565f63bb..f3546fadf33 100644
--- a/gcc/config/rs6000/xcoff.h
+++ b/gcc/config/rs6000/xcoff.h
@@ -249,7 +250,7 @@
 #define DOUBLE_INT_ASM_OP "\t.llong\t"

 /* Output before instructions.  */
-#define TEXT_SECTION_ASM_OP "\t.csect .text[PR]"
+#define TEXT_SECTION_ASM_OP "\t.csect .text[PR],5"

 /* Output before writable data.  */
 #define DATA_SECTION_ASM_OP \

HELP!! How to inhibit optimizations applied to .DEFERRED_INIT argument?

2021-06-29 Thread Qing Zhao via Gcc-patches

Hi, 

I am testing the 4th patch of -ftrivial-auto-var-init with CPU2017 today, and 
found the following issues:

In the dump file of “*t.i.031t.objsz1”, we have:

 :
  __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
  __s2_len_218 = .DEFERRED_INIT (__s2_len_177, 2);
  __s2_len_219 = 7;
  if (__s2_len_219 <= 3)
goto ; [INV]
  else
goto ; [INV]

   :
  _1 = (long unsigned int) i_175;
 

However, after “ccp”, in “t.i.032t.ccp1”, we have:

 :
  __s1_len_217 = .DEFERRED_INIT (__s1_len_176, 2);
  __s2_len_218 = .DEFERRED_INIT (7, 2);
  _36 = (long unsigned int) i_175;
  _37 = _36 * 8;
  _38 = argv_220(D) + _37;


It looks like the “ccp” optimization replaced the first argument of the call
to .DEFERRED_INIT with the constant 7.
This should be avoided.

(NOTE: this issue existed in the previous patches; however, it is only exposed
with this version, since I added more verification
code in tree-cfg.c to verify the call to .DEFERRED_INIT.)

I am wondering what the best solution to this problem is.

Can we add an attribute to the internal function argument to prevent later
optimizations from being applied to it?
Or should we just update the “ccp” phase to handle calls to .DEFERRED_INIT
specially?  (I am not sure whether other phases have the
same issue.)

Let me know if you have any suggestion.

Thanks a lot for your help.

Qing

Re: [PATCH 2/3 V2] Fix IEEE 128-bit min/max test.

2021-06-29 Thread Segher Boessenkool

On Thu, Jun 17, 2021 at 06:56:09PM -0400, Michael Meissner wrote:
> The 'lp64' test
> was needed because big endian 32-bit code cannot enable the IEEE 128-bit
> floating point instructions.

No, *does not* enable them.  After

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2c249e186e1e..d4aac4164cfe 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4281,7 +4281,7 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_isa_flags &= ~OPTION_MASK_FLOAT128_HW;
 }

-  if (TARGET_FLOAT128_HW && !TARGET_64BIT)
+  if (0&& TARGET_FLOAT128_HW && !TARGET_64BIT)
 {
   if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
error ("%qs requires %qs", "%<-mfloat128-hardware%>", "-m64");

you can compile fine with -m32 if you add -mfloat128-hardware as well
(it is disabled for BE as well, that should be fixed as well a few lines
up from there).

Can you show any code that will not work please?  Not allowing QP float
with -m32 causes many more problems than just allowing it would.

>   * gcc.target/powerpc/float128-minmax.c: Adjust expected code for
>   power10.
>   * lib/target-supports.exp (check_effective_target_has_arch_pwr10):
>   New target support.

Just "New." please.

>  /* { dg-require-effective-target powerpc_p9vector_ok } */

Please try whether you can lose that line as well.

Okay for trunk, and for 11 after the usual soak.  Thanks!

Segher

Re: [PATCH 2/4] openacc: Fix async bugs in several OpenACC test cases

2021-06-29 Thread Julian Brown

On Tue, 29 Jun 2021 16:42:02 -0700
Julian Brown  wrote:

> Several OpenACC tests accidentally abuse async semantics, leading to
> race conditions & test failures.  This patch fixes those tests.
> 
> Tested with offloading to AMD GCN. I can probably self-approve this as
> a testcase change only, unless anyone objects.

Forgot to say: this was previously posted as part of the AMD GCN
worker-partitioning series here:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566081.html

But I noticed that the worker-partitioning patches do not (now?) have to
be present for the tests in question to fail.

Thanks,

Julian

[PATCH 4/4] openacc: Profiling-interface fixes for asynchronous operations

2021-06-29 Thread Julian Brown

This patch fixes some problems with the OpenACC profiling interface when
used with asynchronous offload operations. The profiling operations
themselves are now launched asynchronously, as previously they measured
the wrong thing, and/or executed at the same time as the operation they
were supposed to be measuring.

A consequence of this change is that "enqueueing" profiling callbacks
are no longer predictably ordered with respect to the callbacks
relating to the execution of asynchronous operations themselves. The
acc_prof-parallel-1.c test is un-XFAILed and adjusted accordingly.

This patch was posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01024.html

Tested with offloading to AMD GCN. OK for mainline?

Thanks,

Julian

2021-06-29  Julian Brown  

libgomp/
* oacc-host.c (host_openacc_async_queue_callback): Invoke callback
function immediately.
* oacc-mem.c (goacc_enter_exit_data_internal): Call
queue_async_prof_dispatch for asynchronous profile-event dispatches.
* oacc-parallel.c (struct async_prof_callback_info,
async_prof_dispatch, queue_async_prof_dispatch): New.
(GOACC_parallel_keyed): Call queue_async_prof_dispatch for asynchronous
profile-event dispatches.
(GOACC_update): Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c
(cb_compute_construct_start): Remove/fix TODO.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Remove
XFAIL.
(cb_exit_data_start): Tweak expected state values.
(cb_exit_data_end): Likewise.
(cb_compute_construct_start): Remove/fix TODO.
(cb_compute_construct_end): Don't do adjustments for
acc_ev_enqueue_launch_start/acc_ev_enqueue_launch_end callbacks.
(cb_compute_construct_end): Tweak expected state values.
(cb_enqueue_launch_start, cb_enqueue_launch_end): Don't expect
launch-enqueue operations to happen synchronously with respect to
profiling events on async streams.
(main): Tweak expected state values.
---
 libgomp/oacc-host.c   |   5 +-
 libgomp/oacc-mem.c|  32 ++-
 libgomp/oacc-parallel.c   | 190 ++
 .../acc_prof-init-1.c |   5 +-
 .../acc_prof-parallel-1.c |  66 ++
 5 files changed, 194 insertions(+), 104 deletions(-)

diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index f3bbd2b9c61..1cbff4caace 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -204,10 +204,9 @@ host_openacc_async_dev2host (int ord __attribute__ ((unused)),
 static void
 host_openacc_async_queue_callback (struct goacc_asyncqueue *aq
   __attribute__ ((unused)),
-  void (*callback_fn)(void *)
-  __attribute__ ((unused)),
-  void *userptr __attribute__ ((unused)))
+  void (*callback_fn)(void *), void *userptr)
 {
+  callback_fn (userptr);
 }
 
 static struct goacc_asyncqueue *
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index 5988db0b886..f0bd907cf07 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -1317,6 +1317,12 @@ goacc_exit_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum,
   gomp_mutex_unlock (&acc_dev->lock);
 }
 
+struct async_prof_callback_info *
+queue_async_prof_dispatch (struct gomp_device_descr *devicep, goacc_aq aq,
+  acc_prof_info *prof_info, acc_event_info *event_info,
+  acc_api_info *api_info,
+  struct async_prof_callback_info *prev_info);
+
 static void
 goacc_enter_exit_data_internal (int flags_m, size_t mapnum, void **hostaddrs,
size_t *sizes, unsigned short *kinds,
@@ -1327,6 +1333,7 @@ goacc_enter_exit_data_internal (int flags_m, size_t mapnum, void **hostaddrs,
 
   struct goacc_thread *thr;
   struct gomp_device_descr *acc_dev;
+  struct async_prof_callback_info *data_start_info = NULL;
 
   goacc_lazy_initialize ();
 
@@ -1382,9 +1389,19 @@ goacc_enter_exit_data_internal (int flags_m, size_t mapnum, void **hostaddrs,
   api_info.async_handle = NULL;
 }
 
+  goacc_aq aq = get_goacc_asyncqueue (async);
+
   if (profiling_p)
-goacc_profiling_dispatch (&prof_info, &enter_exit_data_event_info,
- &api_info);
+{
+  if (aq)
+   data_start_info
+ = queue_async_prof_dispatch (acc_dev, aq, &prof_info,
+  &enter_exit_data_event_info, &api_info,
+  NULL);
+  else
+   goacc_profiling_dispatch (&prof_info, &enter_exit_data_event_info,
+ &api_info);
+}
 
   if ((acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
   || (fl

[PATCH 3/4] openacc: Fix asynchronous host-to-device copies in libgomp runtime

2021-06-29 Thread Julian Brown

This patch fixes several places in libgomp/target.c where "ephemeral" data
(on the stack or in temporary heap locations) may be used as the source of
an asynchronous host-to-device copy that may not complete before the host
data disappears.  Versions of the patch have been posted several times
before, but this one (at Chung-Lin Tang's prior suggestion, IIRC) moves
all logic into target.c rather than pushing it out to each target plugin.
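
The hazard and the snapshot approach can be modelled without any offloading at all.  The sketch below is a toy, not libgomp code: toy_queue and toy_copy_host2dev are invented names, and the queue simply runs its operations when wait () is called.  With the ephemeral flag set, the source bytes are copied at enqueue time and the temporary buffer is released once the queued closure is destroyed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Toy async queue (hypothetical): queued work runs only in wait ().
struct toy_queue
{
  std::vector<std::function<void ()>> pending;

  void enqueue (std::function<void ()> op) { pending.push_back (std::move (op)); }

  void wait ()
  {
    for (auto &op : pending)
      op ();
    pending.clear ();   // destroys the closures, freeing any snapshots
  }
};

// Toy host-to-device copy with an EPHEMERAL flag.
inline void toy_copy_host2dev (toy_queue &q, unsigned char *dst,
                               const unsigned char *src, std::size_t n,
                               bool ephemeral)
{
  if (ephemeral)
    {
      // Snapshot the source now; the closure owns the temporary buffer.
      std::vector<unsigned char> snapshot (src, src + n);
      q.enqueue ([dst, snapshot] {
        std::memcpy (dst, snapshot.data (), snapshot.size ());
      });
    }
  else
    // The source is read later, so it must stay valid until wait ().
    q.enqueue ([=] { std::memcpy (dst, src, n); });
}

// Copy from a scratch buffer that is reused right after the call.
inline unsigned char copy_from_scratch (bool ephemeral)
{
  unsigned char scratch[1] = { 42 };
  unsigned char dev[1] = { 0 };
  toy_queue q;
  toy_copy_host2dev (q, dev, scratch, 1, ephemeral);
  scratch[0] = 0;   // host data "disappears" before the copy runs
  q.wait ();
  return dev[0];
}
```

In this model copy_from_scratch (true) delivers the original byte while copy_from_scratch (false) delivers the overwritten one, which is the kind of stale-source copy the patch is meant to prevent.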

An existing, but flawed, workaround for this problem in the AMD GCN
libgomp offloading plugin is currently present on mainline, and was
posted for the og9 branch here:

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg00901.html

and previous versions of this patch were posted here (for mainline/og9):

  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01482.html
  https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01026.html

This patch exposes a problem with OpenACC profiling support that is
fixed by the next patch in the series. The acc_prof-parallel-1.c test
is XFAILed for now.

Tested with offloading to AMD GCN. OK?

Julian

2021-06-29  Julian Brown  

libgomp/
* libgomp.h (gomp_copy_host2dev): Update prototype.
(memcpy_tofrom_device, update_dev_host): Add new argument to
gomp_copy_host2dev (false).
* plugin/plugin-gcn.c (struct copy_data): Remove free_src field.
(copy_data): Don't free free_src.
(queue_push_copy): Remove free_src handling.
(GOMP_OFFLOAD_dev2dev): Update call to queue_push_copy.
(GOMP_OFFLOAD_openacc_async_host2dev): Remove source-data snapshotting.
(GOMP_OFFLOAD_openacc_async_dev2host): Update call to queue_push_copy.
* target.c (goacc_device_copy_async): Remove.
(gomp_copy_host2dev): Add EPHEMERAL parameter. Snapshot source data
when true, and set up deferred freeing of temporary buffer.
(gomp_copy_dev2host): Inline device-to-host copy handling instead of
calling goacc_device_copy_async.
(gomp_map_vars_existing): Update calls to gomp_copy_host2dev with
appropriate ephemeral argument.
(gomp_map_pointer, gomp_attach_pointer, gomp_detach_pointer,
gomp_update): Likewise.
(gomp_map_vars_internal): Likewise. Don't use coalescing buffer for
async copies.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: XFAIL for
now.
---
 libgomp/libgomp.h |   2 +-
 libgomp/oacc-mem.c|   4 +-
 libgomp/plugin/plugin-gcn.c   |  20 +---
 libgomp/target.c  | 111 +++---
 .../acc_prof-parallel-1.c |   2 +
 5 files changed, 77 insertions(+), 62 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8d25dc8e2a8..e8901da1069 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1226,7 +1226,7 @@ extern void gomp_acc_declare_allocate (bool, size_t, void **, size_t *,
 struct gomp_coalesce_buf;
 extern void gomp_copy_host2dev (struct gomp_device_descr *,
struct goacc_asyncqueue *, void *, const void *,
-   size_t, struct gomp_coalesce_buf *);
+   size_t, bool, struct gomp_coalesce_buf *);
 extern void gomp_copy_dev2host (struct gomp_device_descr *,
struct goacc_asyncqueue *, void *, const void *,
size_t);
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index c21508f3739..5988db0b886 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -202,7 +202,7 @@ memcpy_tofrom_device (bool from, void *d, void *h, size_t s, int async,
   if (from)
 gomp_copy_dev2host (thr->dev, aq, h, d, s);
   else
-gomp_copy_host2dev (thr->dev, aq, d, h, s, /* TODO: cbuf? */ NULL);
+gomp_copy_host2dev (thr->dev, aq, d, h, s, false, /* TODO: cbuf? */ NULL);
 
   if (profiling_p)
 {
@@ -874,7 +874,7 @@ update_dev_host (int is_dev, void *h, size_t s, int async)
   goacc_aq aq = get_goacc_asyncqueue (async);
 
   if (is_dev)
-gomp_copy_host2dev (acc_dev, aq, d, h, s, /* TODO: cbuf? */ NULL);
+gomp_copy_host2dev (acc_dev, aq, d, h, s, false, /* TODO: cbuf? */ NULL);
   else
 gomp_copy_dev2host (acc_dev, aq, h, d, s);
 
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index cfed42a2d4d..98da48b77cb 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -292,7 +292,6 @@ struct copy_data
   void *dst;
   const void *src;
   size_t len;
-  bool free_src;
   struct goacc_asyncqueue *aq;
 };
 
@@ -2914,8 +2913,6 @@ copy_data (void *data_)
 data->aq->agent->device_id, data->aq->id, data->len, data->src,
 data->dst);
   hsa_memory_copy_wrapper (data->dst, data->src, data->len);
-  if (data->free_src)
-free ((void *) data->src);
   free (data);
 }
 
@@ -2934,7 +2931,7 @@ gomp_offload_free (void *ptr)
 
 static void
 q

[PATCH 2/4] openacc: Fix async bugs in several OpenACC test cases

2021-06-29 Thread Julian Brown

Several OpenACC tests accidentally abuse async semantics, leading to
race conditions & test failures.  This patch fixes those tests.

Tested with offloading to AMD GCN. I can probably self-approve this as
a testcase change only, unless anyone objects.

Thanks,

Julian

2021-06-29  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c: Fix async
behaviour and increase number of iterations.
* testsuite/libgomp.oacc-fortran/lib-16-2.f90: Fix async behaviour.
* testsuite/libgomp.oacc-fortran/lib-16.f90: Likewise.
---
 .../libgomp.oacc-c-c++-common/deep-copy-10.c   | 14 --
 .../testsuite/libgomp.oacc-fortran/lib-16-2.f90|  5 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90  |  5 +
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c
index 573a8214bf0..dadb6d37942 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c
@@ -1,6 +1,8 @@
 #include 
 
-/* Test asyncronous attach and detach operation.  */
+#define ITERATIONS 1023
+
+/* Test asynchronous attach and detach operation.  */
 
 typedef struct {
   int *a;
@@ -25,13 +27,13 @@ main (int argc, char* argv[])
 
 #pragma acc enter data copyin(m)
 
-  for (int i = 0; i < 99; i++)
+  for (int i = 0; i < ITERATIONS; i++)
 {
   int j;
-#pragma acc parallel loop copy(m.a[0:N]) async(i % 2)
+#pragma acc parallel loop copy(m.a[0:N]) async(0)
   for (j = 0; j < N; j++)
m.a[j]++;
-#pragma acc parallel loop copy(m.b[0:N]) async((i + 1) % 2)
+#pragma acc parallel loop copy(m.b[0:N]) async(1)
   for (j = 0; j < N; j++)
m.b[j]++;
 }
@@ -40,9 +42,9 @@ main (int argc, char* argv[])
 
   for (i = 0; i < N; i++)
 {
-  if (m.a[i] != 99)
+  if (m.a[i] != ITERATIONS)
abort ();
-  if (m.b[i] != 99)
+  if (m.b[i] != ITERATIONS)
abort ();
 }
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
index ddd557d3be0..e2e47c967fa 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-16-2.f90
@@ -27,6 +27,9 @@ program main
 
   if (acc_is_present (h) .neqv. .TRUE.) stop 1
 
+  ! We must wait for the update to be done.
+  call acc_wait (async)
+
   h(:) = 0
 
   call acc_copyout_async (h, sizeof (h), async)
@@ -45,6 +48,8 @@ program main
   
   if (acc_is_present (h) .neqv. .TRUE.) stop 3
 
+  call acc_wait (async)
+
   do i = 1, N
 if (h(i) /= i + i) stop 4
   end do 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90 b/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
index ccd1ce6ee18..ef9a6f6626c 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/lib-16.f90
@@ -27,6 +27,9 @@ program main
 
   if (acc_is_present (h) .neqv. .TRUE.) stop 1
 
+  ! We must wait for the update to be done.
+  call acc_wait (async)
+
   h(:) = 0
 
   call acc_copyout_async (h, sizeof (h), async)
@@ -45,6 +48,8 @@ program main
   
   if (acc_is_present (h) .neqv. .TRUE.) stop 3
 
+  call acc_wait (async)
+
   do i = 1, N
 if (h(i) /= i + i) stop 4
   end do 
-- 
2.29.2

[PATCH 1/4] openacc: Async fix for lib-94 testcase

2021-06-29 Thread Julian Brown

The test case performs an asynchronous host-to-device copy and then
immediately clobbers the data on the host via "memset", leading to a race
condition.  This patch moves the memset after an acc_wait call instead.
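
The ordering problem can be shown with a deterministic toy model (a sketch only: toy_queue and toy_copyin_async are made-up stand-ins for the OpenACC runtime, and the queue defers all work until wait (), which is the worst case a real asynchronous copy may hit):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Toy async queue (hypothetical): queued work runs only in wait ().
struct toy_queue
{
  std::vector<std::function<void ()>> pending;

  void enqueue (std::function<void ()> op) { pending.push_back (std::move (op)); }

  void wait ()
  {
    for (auto &op : pending)
      op ();
    pending.clear ();
  }
};

// Stand-in for acc_copyin_async: SRC is read when the copy runs, not
// when it is queued.
inline void toy_copyin_async (toy_queue &q, unsigned char *dst,
                              const unsigned char *src, std::size_t n)
{
  q.enqueue ([=] { std::memcpy (dst, src, n); });
}

// Old test order: clobber the host buffer, then wait.
inline bool old_order_keeps_data ()
{
  unsigned char host[4] = { 1, 2, 3, 4 };
  unsigned char dev[4] = { 0, 0, 0, 0 };
  toy_queue q;
  toy_copyin_async (q, dev, host, sizeof host);
  std::memset (host, 0, sizeof host);
  q.wait ();
  return dev[0] == 1 && dev[3] == 4;
}

// Fixed order: wait for the copy, then clobber the host buffer.
inline bool new_order_keeps_data ()
{
  unsigned char host[4] = { 1, 2, 3, 4 };
  unsigned char dev[4] = { 0, 0, 0, 0 };
  toy_queue q;
  toy_copyin_async (q, dev, host, sizeof host);
  q.wait ();
  std::memset (host, 0, sizeof host);
  return dev[0] == 1 && dev[3] == 4;
}
```

On real hardware the old order fails only some of the time, depending on when the copy actually executes; the model just pins down the unlucky schedule.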

Tested with offloading to AMD GCN.

I can probably self-approve this as a testcase change only, unless
anyone objects.

Thanks,

Julian

2021-06-29  Julian Brown  

libgomp/
* testsuite/libgomp.oacc-c-c++-common/lib-94.c: Fix race condition.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
index 54497237b0c..baa3ac83f04 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-94.c
@@ -22,10 +22,10 @@ main (int argc, char **argv)
 
   acc_copyin_async (h, N, async);
 
-  memset (h, 0, N);
-
   acc_wait (async);
 
+  memset (h, 0, N);
+
   acc_copyout_async (h, N, async + 1);
 
   acc_wait (async + 1);
-- 
2.29.2

[PATCH 0/4] openacc: Async fixes

2021-06-29 Thread Julian Brown

This patch series contains fixes for various problems with async support
for OpenACC at present:

 - Asynchronous host-to-device copies invoked from within libgomp
   (target.c) could copy bad data to the target -- and the workaround
   for that currently used in the AMD GCN target plugin could lead to
   a different problem (a race condition).

 - The OpenACC profiling-interface implementation did not measure
   asynchronous operations properly.

 - Several test cases misuse OpenACC asynchronous support (more race
   conditions).

Further comments on individual patches. Tested with offloading to AMD
GCN. OK for mainline?

Thanks,

Julian

Julian Brown (4):
  openacc: Async fix for lib-94 testcase
  openacc: Fix async bugs in several OpenACC test cases
  openacc: Fix asynchronous host-to-device copies in libgomp runtime
  openacc: Profiling-interface fixes for asynchronous operations

 libgomp/libgomp.h |   2 +-
 libgomp/oacc-host.c   |   5 +-
 libgomp/oacc-mem.c|  36 +++-
 libgomp/oacc-parallel.c   | 190 ++
 libgomp/plugin/plugin-gcn.c   |  20 +-
 libgomp/target.c  | 111 ++
 .../acc_prof-init-1.c |   5 +-
 .../acc_prof-parallel-1.c |  64 ++
 .../libgomp.oacc-c-c++-common/deep-copy-10.c  |  14 +-
 .../libgomp.oacc-c-c++-common/lib-94.c|   4 +-
 .../libgomp.oacc-fortran/lib-16-2.f90 |   5 +
 .../testsuite/libgomp.oacc-fortran/lib-16.f90 |   5 +
 12 files changed, 289 insertions(+), 172 deletions(-)

-- 
2.29.2

Re: [PATCH] c++: DR2397 - auto specifier for * and & to arrays [PR100975]

2021-06-29 Thread Marek Polacek via Gcc-patches

On Tue, Jun 29, 2021 at 03:50:27PM -0400, Jason Merrill wrote:
> On 6/29/21 3:25 PM, Marek Polacek wrote:
> > --- a/gcc/testsuite/g++.dg/cpp0x/auto3.C
> > +++ b/gcc/testsuite/g++.dg/cpp0x/auto3.C
> > @@ -10,7 +10,7 @@ auto x;   // { dg-error "auto" }
> >   auto i = 42, j = 42.0;// { dg-error "auto" }
> >   // New CWG issue
> 
> Let's at least update this comment to quote [dcl.type.auto.deduct]/2: "T
> shall not be an array type".  I guess "unable to deduce" is a suitable
> diagnostic for that error.

Fixed.

> > diff --git a/gcc/testsuite/g++.dg/diagnostic/auto1.C 
> > b/gcc/testsuite/g++.dg/diagnostic/auto1.C
> > index ee2eefd59aa..9d9979e3fdc 100644
> > --- a/gcc/testsuite/g++.dg/diagnostic/auto1.C
> > +++ b/gcc/testsuite/g++.dg/diagnostic/auto1.C
> > @@ -1,4 +1,5 @@
> >   // PR c++/86915
> >   // { dg-do compile { target c++17 } }
> > +// Allowed since DR2397.
> 
> Well, not really; any attempt to use this template should hit the same
> problem as above of trying to do auto deduction where T is an array type.
> Please add to the testcase to get the error.

Hmm, this

template struct S { };
static int arr[1];
S s;

won't give an error: I think it's because we coerce the auto tparm into 'auto*'
before deducing and so don't get the type mismatch error.  That seems to be
in line with how 'template' works, though.

So I think we don't need to change this in the patch.  Do you agree?

Marek

Re: [PATCH 0/2] Ranger-based backwards threader implementation.

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/29/21 4:27 AM, Aldy Hernandez wrote:



On 6/29/21 1:19 AM, Martin Sebor wrote:

On 6/28/21 10:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is the ranger-based backwards threader.  It is divided into two
parts: the solver and the path discovery bits.

The solver is generic enough that it may be of use to other passes,
so it's been abstracted into its own separate class/file.  Andrew and
I have already gone over it, so I don't think a review is necessary.
Besides, it's technically an extension of the ranger infrastructure.

On the other hand, the path discovery bits could benefit from the
watchful eye of the jump threading experts.

Documenting the solver in a [ranger-tech] post is on my TODO list,
as I think it would be useful as an example of GORI as a general
tool, outside the VRP world.

As I have mentioned elsewhere, I have gone through each test and
documented the reasons why they were adjusted (when useful).  The
reviewer(s) may benefit from looking at the test notes.

I have added a --param=threader-mode={ranger,legacy} option, which I
hope to remove shortly after.  It has been useful for diagnosing
issues in the past, though perhaps not so much now.  I've left it
in case there's a remote interest in using it during stage1, but
removing it could be a huge cleanup to tree-ssa-threadbackward.c.

If/when accepted, I will open 2-3 PRs with the XFAILed tests as
requested.  I am still working on distilling a C counterpart for
the libphobos missing thread edge.  It'll hopefully be ready by the
time the review is done.

A version of this patchset with the verification code has
been tested on x86-64, ppc64, ppc64le, and aarch64 (all Linux).

I am currently re-testing on x86-64 Linux, but will not re-test on the
rest of the architectures because...OMG aarch64 is so slow!


I applied the series and ran a subset of tests and didn't see any
failures, just the three XPASSes below.  The Wfree-nonheap-object
tests you mentioned in the other post all pass.  Looks like you
got past that problem?

XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 32)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 46)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 60)

A couple of comments on the tests below (I haven't looked at the meat
of the patch):



Thanks.
Aldy

Aldy Hernandez (2):
   Implement basic block path solver.
   Backwards jump threader rewrite with ranger.

  gcc/Makefile.in   |   6 +
  gcc/flag-types.h  |   7 +
  gcc/params.opt    |  17 +
  .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
  gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
  gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +


The change here just adds the comment:

+/* This looks like the threader caused the entire loop to collapse, and the
+   warning pass can't determine the arguments to memcpy.  */
+

Since the test passes I'm not sure I understand what the comment
is trying to say.  Is it still accurate and necessary?


This seems like it came from the ranger branch which had slightly 
different code, particularly it made use of a full ranger with 
equivalences.  It looks like this could have failed in the branch, but 
no longer does.  I have removed the comment.


Okay, thanks.






  gcc/testsuite/gcc.dg/loop-unswitch-2.c    |   2 +-
  gcc/testsuite/gcc.dg/old-style-asm-1.c    |   5 +-
  gcc/testsuite/gcc.dg/pr68317.c    |   4 +-
  gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
  gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
  gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
  gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
  .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-


I wonder if breaking up the test function into five, one for each
of the tests it does, would be a better way to avoid the IL changes
than disabling all the threading passes.  Like in the attached patch.


As the author of the original test, I completely defer to you :).

Attached is the latest version with your suggested changes, as well as a 
gimple FE test for the previously discussed failing libphobos test.


The tests look good.

In the new APIs, instead of taking vec by value can you please change
them to either by-const-reference if they don't change the vec or by-
reference if they do?  I'm in the midst of changing code to do that
with the goal of eventually removing all by-value vec arguments.
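
A minimal sketch of the convention requested here.  It uses std::vector as
a stand-in for GCC's vec so that the snippet compiles on its own; that
substitution and both function names are assumptions made for
illustration, not part of the patch:

```cpp
#include <cstddef>
#include <vector>

// Read-only access: take the container by const reference, so no copy
// is made and the callee cannot modify the caller's data.
std::size_t count_nonzero (const std::vector<int> &path)
{
  std::size_t n = 0;
  for (int v : path)
    if (v != 0)
      ++n;
  return n;
}

// The callee modifies the container: take it by non-const reference so
// the change is visible to the caller.
void drop_zeros (std::vector<int> &path)
{
  std::size_t out = 0;
  for (int v : path)
    if (v != 0)
      path[out++] = v;   // write index never passes the read index
  path.resize (out);
}
```

With by-value parameters, drop_zeros would silently operate on a copy,
which is the kind of surprise the by-reference signatures rule out.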

Thanks
Martin



Thanks.
Aldy

[PATCH] [RFC] libgcc: Add a backchain fallback to _Unwind_Backtrace() on PowerPC

2021-06-29 Thread Raphael Moreira Zinsly via Gcc-patches

Hi all,

There is a patch proposed on glibc that removes the powerpc backtrace
implementation [1]. As discussed in that thread, it would be helpful to have a
backchain fallback on libgcc before removing it.
This patch is moving part of that code to libgcc in order to do that. Is it
acceptable to have the trace_arg struct here or should this be handled by the
trace function passed to _Unwind_Backtrace()?
Any comments are appreciated.

Best Regards,
Raphael Moreira Zinsly

[1] https://sourceware.org/pipermail/libc-alpha/2021-February/122600.html
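
For readers who want to experiment with the idea away from PowerPC, here
is a portable sketch of a back-chain walk over a hand-built chain of
frames.  The struct and function names are hypothetical, and the real
code reads the chain from %r1 rather than from a pointer argument.  As
far as the posted diff shows, the WIP loop compares count against
arg->size only in the signal trampoline branch; this sketch bounds every
store instead:

```cpp
#include <cstddef>

// Hypothetical stand-in for the ABI frame header described in the patch.
struct frame
{
  frame *next;              // back chain to the caller's frame, null at the end
  long condition_register;  // cr save slot (unused by the walk)
  void *return_address;     // saved LR of this frame
};

// Walk the back chain starting at TOP, storing at most SIZE return
// addresses into ARRAY.  Returns the number of entries written.
std::size_t walk_backchain (const frame *top, void **array, std::size_t size)
{
  std::size_t count = 0;
  for (const frame *cur = top; cur != nullptr && count < size; cur = cur->next)
    array[count++] = cur->return_address;
  return count;
}
```

The walk stops at whichever comes first, the null back chain or a full
output array.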

---
 libgcc/config/rs6000/linux-unwind.h | 58 -
 libgcc/unwind.inc   | 12 +-
 2 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/rs6000/linux-unwind.h b/libgcc/config/rs6000/linux-unwind.h
index acdc948f85d..1f4b5e72d47 100644
--- a/libgcc/config/rs6000/linux-unwind.h
+++ b/libgcc/config/rs6000/linux-unwind.h
@@ -203,7 +203,7 @@ ppc_fallback_frame_state (struct _Unwind_Context *context,
   int i;
 
   if (regs == NULL)
-return _URC_END_OF_STACK;
+return _URC_NORMAL_STOP;
 
   new_cfa = regs->gpr[__LIBGCC_STACK_POINTER_REGNUM__];
   fs->regs.cfa_how = CFA_REG_OFFSET;
@@ -352,3 +352,59 @@ frob_update_context (struct _Unwind_Context *context, _Unwind_FrameState *fs ATT
 }
 #endif
 }
+
+#define MD_BACKCHAIN_FALLBACK ppc_backchain_fallback
+
+struct trace_arg
+{
+  void **array;
+  struct unwind_link *unwind_link;
+  _Unwind_Word cfa;
+  int cnt;
+  int size;
+};
+
+/* This is the stack layout we see with every stack frame.
+   Note that every routine is required by the ABI to lay out the stack
+   like this.
+
+            +----------------+        +-----------------+
+    %r1  -> | %r1 last frame--------> | %r1 last frame--->...  --> NULL
+            |                |        |                 |
+            | cr save        |        | cr save         |
+            |                |        |                 |
+            | (unused)       |        | return address  |
+            +----------------+        +-----------------+
+*/
+struct layout
+{
+  struct layout *next;
+  long int condition_register;
+  void *return_address;
+};
+
+void ppc_backchain_fallback (void *a)
+{
+  struct layout *current;
+  struct trace_arg *arg = a;
+
+  /* Force gcc to spill LR.  */
+  asm volatile ("" : "=l"(current));
+
+  /* Get the address on top-of-stack.  */
+  asm volatile ("ld %0,0(1)" : "=r"(current));
+
+  for (int count = 0; current != NULL; current = current->next, count++)
+{
+  arg->array[count] = current->return_address;
+
+  /* Check if the symbol is the signal trampoline and get the interrupted
+   * symbol address from the trampoline saved area. (WIP)  */
+  if (IS_SIGTRAMP_ADDRESS(current->return_address))
+   {
+ if (count + 1 == arg->size)
+   break;
+ // Get sigframe, update arg->array[++count] and current.
+   }
+}
+}
diff --git a/libgcc/unwind.inc b/libgcc/unwind.inc
index aa48d104fd0..955722b1743 100644
--- a/libgcc/unwind.inc
+++ b/libgcc/unwind.inc
@@ -300,14 +300,22 @@ _Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument)
 
   /* Set up fs to describe the FDE for the caller of context.  */
   code = uw_frame_state_for (&context, &fs);
-  if (code != _URC_NO_REASON && code != _URC_END_OF_STACK)
+  if (code != _URC_NO_REASON && code != _URC_END_OF_STACK
+  && code != _URC_NORMAL_STOP)
return _URC_FATAL_PHASE1_ERROR;
 
   /* Call trace function.  */
   if ((*trace) (&context, trace_argument) != _URC_NO_REASON)
return _URC_FATAL_PHASE1_ERROR;
 
-  /* We're done at end of stack.  */   
+  /* Do a backchain if there is no CFI data.  */
+  if (code == _URC_NORMAL_STOP)
+   {
+ MD_BACKCHAIN_FALLBACK(trace_argument);
+ break;
+   }
+
+  /* We're done at end of stack.  */
   if (code == _URC_END_OF_STACK)
break;
 
-- 
2.29.2

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-29 Thread Christophe Lyon via Gcc-patches

On Tue, Jun 29, 2021 at 10:48 PM Rainer Orth 
wrote:

> Hi Tobias,
>
> > On 29.06.21 13:58, Jakub Jelinek wrote:
> >
> >> Also, wonder if we shouldn't print the list of configured targets in
> that
> >> case, see candidates_list_and_hint functions and its callers.
> >> And it is unclear why we use fatal_error, can't unknown offload target
> names
> >> be simply ignored after emitting error?
> >
> > Done so – as the changes now became a bit larger, I have attached the
> > new version of the patch – despite the LGTM.
>
> this patch broke Solaris bootstrap (both 32 and 64-bit sparc and x86):
>
> /vol/gcc/src/hg/master/local/gcc/gcc.c: In function 'bool
> check_offload_target_name(const char*, ptrdiff_t)':
> /vol/gcc/src/hg/master/local/gcc/gcc.c:4010:23: error: writing 1 byte into
> a region of size 0 [-Werror=stringop-overflow=]
>  4010 |   cand[n - c] = '\0';
>   |   ^~
> In file included from /vol/gcc/src/hg/master/local/gcc/system.h:706,
>  from /vol/gcc/src/hg/master/local/gcc/gcc.c:31:
> /vol/gcc/src/hg/master/local/gcc/../include/libiberty.h:733:36: note: at
> offset 1 into destination object of size 1 allocated by '__builtin_alloca'
>   733 | # define alloca(x) __builtin_alloca(x)
>   |^~~
> /vol/gcc/src/hg/master/local/gcc/gcc.c:4000:29: note: in expansion of
> macro 'alloca'
>  4000 |   char *cand = (char *) alloca (strlen (OFFLOAD_TARGETS) + 1);
>   | ^~
>
> Rainer
>
>
Also seeing:
FAIL:  compiler driver --help=common option(s): "^ +-.*[^:.]$" absent from
output: "  -foffload== Specify options for the offloading
targets"

looks related to this patch.

Thanks,

Christophe

-- 
>
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>

[pushed] c++: don't treat member var as var template

2021-06-29 Thread Jason Merrill via Gcc-patches

While looking at a partial instantiation issue I noticed that we were
wrongly hitting the partial instantiation code when instantiating a static
data member of a class template.  I don't think this broke anything, but we
don't need to do that (small) extra work.
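
To make the distinction concrete, here is a small C++14 sketch; the
names are made up for illustration and do not come from the testsuite.
A variable template can have partial specializations, whereas a static
data member of a class template is instantiated along with its class
and has none of its own:

```cpp
// A variable template; it can be partially specialized.
template <typename T>
constexpr int pointer_depth = 0;

template <typename T>
constexpr int pointer_depth<T *> = 1 + pointer_depth<T>;

// A static data member of a class template.  It is a templated
// variable, but not a variable template, so there are no partial
// specializations of the member itself to look up.
template <typename T>
struct holder
{
  static constexpr int size = sizeof (T);
};

static_assert (pointer_depth<int **> == 2, "partial specialization chosen");
static_assert (holder<char>::size == 1, "member instantiated with its class");
```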

gcc/cp/ChangeLog:

* pt.c (instantiate_decl): Only consider partial specializations of
actual variable templates.
---
 gcc/cp/pt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f2039e09cd7..d2936c106ba 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -26003,7 +26003,7 @@ instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
   td = template_for_substitution (d);
   args = gen_args;
 
-  if (VAR_P (d))
+  if (variable_template_specialization_p (d))
 {
   /* Look up an explicit specialization, if any.  */
   tree tid = lookup_template_variable (gen_tmpl, gen_args);

base-commit: 13c906f43f473ee9ff16d80590789d719f2190a4
-- 
2.27.0

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-29 Thread Rainer Orth

Hi Tobias,

> On 29.06.21 13:58, Jakub Jelinek wrote:
>
>> Also, wonder if we shouldn't print the list of configured targets in that
>> case, see candidates_list_and_hint functions and its callers.
>> And it is unclear why we use fatal_error, can't unknown offload target names
>> be simply ignored after emitting error?
>
> Done so – as the changes now became a bit larger, I have attached the
> new version of the patch – despite the LGTM.

this patch broke Solaris bootstrap (both 32 and 64-bit sparc and x86):

/vol/gcc/src/hg/master/local/gcc/gcc.c: In function 'bool check_offload_target_name(const char*, ptrdiff_t)':
/vol/gcc/src/hg/master/local/gcc/gcc.c:4010:23: error: writing 1 byte into a region of size 0 [-Werror=stringop-overflow=]
 4010 |   cand[n - c] = '\0';
  |   ^~
In file included from /vol/gcc/src/hg/master/local/gcc/system.h:706,
 from /vol/gcc/src/hg/master/local/gcc/gcc.c:31:
/vol/gcc/src/hg/master/local/gcc/../include/libiberty.h:733:36: note: at offset 1 into destination object of size 1 allocated by '__builtin_alloca'
  733 | # define alloca(x) __builtin_alloca(x)
  |^~~
/vol/gcc/src/hg/master/local/gcc/gcc.c:4000:29: note: in expansion of macro 'alloca'
 4000 |   char *cand = (char *) alloca (strlen (OFFLOAD_TARGETS) + 1);
  | ^~
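
For illustration only, and not the fix that was applied: one way to
avoid sizing a scratch buffer by hand is to split the comma-separated
target list with std::string.  The function names are hypothetical and
the target strings are just sample inputs:

```cpp
#include <string>
#include <vector>

// Split a comma-separated list such as "nvptx-none,amdgcn-amdhsa".
// An empty input (no offload targets configured) yields an empty vector.
std::vector<std::string> split_targets (const std::string &list)
{
  std::vector<std::string> out;
  std::string::size_type start = 0;
  while (start < list.size ())
    {
      std::string::size_type comma = list.find (',', start);
      if (comma == std::string::npos)
        comma = list.size ();
      if (comma > start)                       // skip empty entries
        out.push_back (list.substr (start, comma - start));
      start = comma + 1;
    }
  return out;
}

// Return true if NAME is exactly one of the entries in LIST.
bool is_known_target (const std::string &list, const std::string &name)
{
  for (const std::string &t : split_targets (list))
    if (t == name)
      return true;
  return false;
}
```

Because each candidate owns its storage, the empty-list configuration
needs no special-case buffer arithmetic.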

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Go patch committed: In composite literals use temps only for interfaces

2021-06-29 Thread Ian Lance Taylor via Gcc-patches

On Tue, Jun 29, 2021 at 11:01 AM Ian Lance Taylor  wrote:
>
> This patch to the Go frontend reduces the number of temporaries that
> the compiler generates for composite literals.  For a composite
> literal we only need to introduce a temporary variable if we may be
> converting to an interface type, so only do it then.  This saves over
> 80% of compilation time when using gccgo to compile
> cmd/internal/obj/x86, as the GCC middle-end spends a lot of time
> pointlessly computing interactions between temporary variables (GCC PR
> 101064).  This is for that PR and for https://golang.org/issue/46600.
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline and GCC 11 branch.

Following up on this, this patch, for mainline only, stops generating
temporaries for composite literals at all.  After the change above we
were generating temporaries for composite literals when a conversion
to interface type was required.  However, Cherry's
https://golang.org/cl/176459 changed the compiler to insert explicit
type conversions.  And those explicit type conversions insert the
required temporaries in Type_conversion_expression::do_flatten.  So in
practice the composite literal do_flatten methods would never insert
temporaries, as the values they see would always be multi_eval_safe.
So just remove the unnecessary do_flatten methods.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
65047dc414205e433c3bcf2a200efcbab329e06b
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f7bcc8c484a..ab1384d698b 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-cad187fe3aceb2a7d964b64c70dfa8c8ad24ce65
+01cb2b5e69a2d08ef3cc1ea023c22ed9b79f5114
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 94342b2f9b8..a0472acc209 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -15147,48 +15147,6 @@ Struct_construction_expression::do_copy()
   return ret;
 }
 
-// Flatten a struct construction expression.  Store the values into
-// temporaries if they may need interface conversion.
-
-Expression*
-Struct_construction_expression::do_flatten(Gogo*, Named_object*,
-  Statement_inserter* inserter)
-{
-  if (this->vals() == NULL)
-return this;
-
-  // If this is a constant struct, we don't need temporaries.
-  if (this->is_constant_struct() || this->is_static_initializer())
-return this;
-
-  Location loc = this->location();
-  const Struct_field_list* fields = this->type_->struct_type()->fields();
-  Struct_field_list::const_iterator pf = fields->begin();
-  for (Expression_list::iterator pv = this->vals()->begin();
-   pv != this->vals()->end();
-   ++pv, ++pf)
-{
-  go_assert(pf != fields->end());
-  if (*pv != NULL)
-   {
-  if ((*pv)->is_error_expression() || (*pv)->type()->is_error_type())
-{
-  go_assert(saw_errors());
-  return Expression::make_error(loc);
-}
- if (pf->type()->interface_type() != NULL
- && !(*pv)->is_multi_eval_safe())
-   {
- Temporary_statement* temp =
-   Statement::make_temporary(NULL, *pv, loc);
- inserter->insert(temp);
- *pv = Expression::make_temporary_reference(temp, loc);
-   }
-   }
-}
-  return this;
-}
-
 // Make implicit type conversions explicit.
 
 void
@@ -15451,55 +15409,6 @@ Array_construction_expression::do_check_types(Gogo*)
 }
 }
 
-// Flatten an array construction expression.  Store the values into
-// temporaries if they may need interface conversion.
-
-Expression*
-Array_construction_expression::do_flatten(Gogo*, Named_object*,
-  Statement_inserter* inserter)
-{
-  if (this->is_error_expression())
-{
-  go_assert(saw_errors());
-  return this;
-}
-
-  if (this->vals() == NULL)
-return this;
-
-  // If this is a constant array, we don't need temporaries.
-  if (this->is_constant_array() || this->is_static_initializer())
-return this;
-
-  // If the array element type is not an interface type, we don't need
-  // temporaries.
-  if (this->type_->array_type()->element_type()->interface_type() == NULL)
-return this;
-
-  Location loc = this->location();
-  for (Expression_list::iterator pv = this->vals()->begin();
-   pv != this->vals()->end();
-   ++pv)
-{
-  if (*pv != NULL)
-   {
-  if ((*pv)->is_error_expression() || (*pv)->type()->is_error_type())
-{
-  go_assert(saw_errors());
-  return Expression::make_error(loc);
-}
- if (!(*pv)->is_multi_eval_safe())
-   {
- Temporary_statement* temp =
-   Stateme

[RFC] ipa: Adjust references to identify read-only globals

2021-06-29 Thread Martin Jambor

Hi,

this patch has been motivated by SPEC 2017's 544.nab_r in which there is
a static variable which is never written to and so zero throughout the
run-time of the benchmark.  However, it is passed by reference to a
function in which it is read and (after some multiplications) passed
into __builtin_exp which in turn unnecessarily consumes almost 10% of
the total benchmark run-time.  The situation is illustrated by the added
testcase remref-3.c.
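
A hypothetical reduction of the pattern, written for this note and not
the actual testcase: a static variable that is never written, whose
address is passed to a helper that only reads through the pointer and
feeds the result to exp.

```cpp
#include <cmath>

// Never written anywhere in this translation unit, so it stays 0.0.
static double damping;

// Reads the variable only through the pointer it is handed.
static double scaled_exp (const double *factor, double x)
{
  return std::exp (*factor * x);
}

double evaluate (double x)
{
  // Passing the address creates an address reference to 'damping'.
  // If that were known to be only a load, the variable could be
  // treated as read-only and the exp call folded to a constant.
  return scaled_exp (&damping, x);
}
```

Here evaluate always returns exp(0), which is the kind of result the
optimization aims to compute at compile time.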

The patch adds a flag to ipa-prop descriptor of each parameter to mark
such parameters.  IPA-CP and inlining then take the effort to remove
IPA_REF_ADDR references in the caller and only add IPA_REF_LOAD
reference to the clone/overall inlined function.  This is sufficient
for subsequent symbol table analysis code to identify the read-only
variable as such and optimize the code.

I plan to compile a number of packages with the patch to test it some
more and get a bit better idea of its impact.  But it has passed
bootstrap, LTObootstrap and testing on x86_64-linux and i686-linux and
so unless I find any problem, I would like to commit it at some point
next month without any major changes, so I'd be grateful for any
feedback even now.

Martin

gcc/ChangeLog:

2021-06-29  Martin Jambor  

* cgraph.h (ipa_replace_map): New field force_load_ref.
* ipa-prop.h (ipa_param_descriptor): Reduce precision of move_cost,
added new flag load_dereferenced, adjusted comments.
(ipa_get_param_dereferenced): New function.
(ipa_set_param_dereferenced): Likewise.
* cgraphclones.c (cgraph_node::create_virtual_clone): Follow it.
* ipa-cp.c: Include gimple.h.
(ipcp_discover_new_direct_edges): Take into account dereferenced flag.
(get_replacement_map): New parameter force_load_ref, set the
appropriate flag in ipa_replace_map if set.
(struct symbol_and_index_together): New type.
(adjust_references_in_act_callers): New function.
(adjust_references_in_caller): Likewise.
(create_specialized_node): When appropriate, call
adjust_references_in_caller and force only load references.
* ipa-prop.c (load_from_dereferenced_name): New function.
(ipa_analyze_controlled_uses): Also detect loads from a
dereference, harden testing of call statements.
(ipa_write_node_info): Stream the dereferenced flag.
(ipa_read_node_info): Likewise.
(ipa_set_jf_constant): Also create refdesc when jump function
references a variable.
(cgraph_node_for_jfunc): Rename to symtab_node_for_jfunc, work
also on references of variables and return a symtab_node.  Adjust
all callers.
(propagate_controlled_uses): Also remove references to VAR_DECLs.

gcc/testsuite/ChangeLog:

2021-06-29  Martin Jambor  

* gcc.dg/ipa/remref-3.c: New test.
* gcc.dg/ipa/remref-4.c: Likewise.
* gcc.dg/ipa/remref-5.c: Likewise.
* gcc.dg/ipa/remref-6.c: Likewise.
---
 gcc/cgraph.h|   3 +
 gcc/cgraphclones.c  |  10 +-
 gcc/ipa-cp.c| 146 ++--
 gcc/ipa-prop.c  | 166 ++--
 gcc/ipa-prop.h  |  27 -
 gcc/testsuite/gcc.dg/ipa/remref-3.c |  23 
 gcc/testsuite/gcc.dg/ipa/remref-4.c |  31 ++
 gcc/testsuite/gcc.dg/ipa/remref-5.c |  38 +++
 gcc/testsuite/gcc.dg/ipa/remref-6.c |  24 
 9 files changed, 419 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-5.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-6.c

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 9f4338fdf87..0fc20cd4517 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -700,6 +700,9 @@ struct GTY(()) ipa_replace_map
   tree new_tree;
   /* Parameter number to replace, when old_tree is NULL.  */
   int parm_num;
+  /* Set if the newly added reference should not be an address one, but a load
+ one from the operand of the ADDR_EXPR in NEW_TREE.  */
+  unsigned force_load_ref : 1;
 };
 
 enum cgraph_simd_clone_arg_type
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 9f86463b42d..8ec58769c80 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -636,7 +636,15 @@ cgraph_node::create_virtual_clone (vec 
redirect_callers,
   || in_lto_p)
 new_node->unique_name = true;
   FOR_EACH_VEC_SAFE_ELT (tree_map, i, map)
-new_node->maybe_create_reference (map->new_tree, NULL);
+{
+  tree repl = map->new_tree;
+  if (map->force_load_ref)
+   {
+ gcc_assert (TREE_CODE (repl) == ADDR_EXPR);
+ repl = TREE_OPERAND (repl, 0);
+   }
+  new_node->maybe_create_reference (repl, NULL);
+}
 
   if (ipa_transforms_to_apply.exists ())
 new_node->ipa_transforms_to_apply
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 57c18

Re: [PATCH][gcc] Allow functions without C-style ellipsis to use format attribute

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/27/21 10:24 PM, Tuan Le Quang via Gcc-patches wrote:

Hi,

Currently, the format attribute can be used to type-check arguments
against a format string. However, only functions with a C-style
ellipsis can use it.
Supporting this attribute for non-variadic functions (functions without a
C-style ellipsis) is convenient, especially when writing C++ code: we can
use it for C++ variadic template functions like this:
template
__attribute__((format(printf, 1, 2))) void myPrint (const char * fmt,
Args...args)


The main benefit of variadic function templates over C vararg
functions is that they make use of the type system for type safety.
I'm not sure I see applying attribute format to them as a very
compelling use case.  (I'd expect the format string in a variadic
function template to use generic conversion specifiers, say %@ or
some such, and only let the caller specify things like flags, width
and precision but not type conversion specifiers).  Is there one
where relying on the type system isn't good enough?
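
To illustrate what relying on the type system can look like, here is a
minimal sketch with made-up names: a variadic template that formats
each argument through operator<<, so the argument types choose the
conversion and there is no format string to get wrong.

```cpp
#include <sstream>
#include <string>

// Base case: nothing left to append.
inline void append_all (std::ostringstream &)
{
}

// Append the first argument, then recurse on the rest.  The overload
// of operator<< is selected from each argument's static type.
template <typename T, typename... Rest>
void append_all (std::ostringstream &os, const T &first, const Rest &... rest)
{
  os << first;
  append_all (os, rest...);
}

// Concatenate the textual form of all arguments.
template <typename... Args>
std::string concat (const Args &... args)
{
  std::ostringstream os;
  append_all (os, args...);
  return os.str ();
}
```

A call such as concat ("x=", 42) cannot mismatch a conversion specifier
and an argument type, which is the safety the format attribute provides
for printf-style functions.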


This patch will introduce these changes:
1. It is no longer an error simply to have a function with the format
attribute but no C-style variadic arguments


I'm a little on the fence about this.  On the one hand it seems
unexpected to apply format checking to ordinary (non-variadic)
functions.  On the other, I can't think of anything wrong with it
and it even seems like it could be useful to specify a format for
a fixed number of arguments of fixed types.  Do you have an actual
use case for it or did it just fall out of the variadic template
implementation?


2. Functions are subjected to warnings/errors as before, except errors
mentioned in point 1 about not being variadic. For example, when a
non-variadic function has wrong arguments, e.g
__attribute__((format(printf, 1, 1))) or when being type-checked.

Note that behaviours of C-style variadic functions do not change, errors
and warnings are given as before.

This patch does it by:
1.   Relaxing several conditions for format attribute:
  -  Will only use POSARG_ELLIPSIS flag to call `get_constant` when
getting attribute arguments of a variadic function
  -  Relax the check for the last argument of the attribute (will not
require an ellipsis argument)
  -  (Before this patch) After passing the above check, current gcc will
call `get_constant` to get the function parameter that the third attribute
argument is pointing to. If POSARG_ELLIPSIS is set, `get_constant` will
look for `...`. If not, `get_constant` will look for a C-style string. Note
that POSARG_ELLIPSIS is set automatically for getting the third attribute
argument.
 (After this patch) POSARG_ELLIPSIS is set only when the function
has C-style '...'. Now, if POSARG_ELLIPSIS is not set, `get_constant` will
not check whether the third argument of format attribute points to a
C-style string.
2.   Modifying expected outcome of a testcase in objc testsuite, where we
expect a warning instead of an error
3.   Adding 2 test files

Successfully bootstrapped and regression tested on x86_64-pc-linux-gnu.

Signed-off-by: Le Quang Tuan 

gcc/c-family/ChangeLog:

* c-attribs.c (positional_argument): allow third argument of format
attribute to point to parameters of any type if the function is not C-style
variadic
* c-format.c (decode_format_attr): read third argument with POSARG_ELLIPSIS
only if the function has a variable argument
(handle_format_attribute): relax explicit checks for non-variadic functions

gcc/testsuite/ChangeLog:

* gcc.dg/format/attr-3.c: modify comment
* objc.dg/attributes/method-format-1.m: errors do not hold anymore, a
warning is given instead
* g++.dg/warn/format9.C: New test with usage of variadic templates.
* gcc.dg/format/attr-9.c: New test.

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 6bf492afcc0..7a17ce671de 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -714,6 +714,11 @@ positional_argument (const_tree fntype, const_tree
atname, tree pos,
return NULL_TREE;
   }

+  /* For format attribute with argno >= 3, we don't expect any type
+   */
+  if (argno >= 3 && strcmp (IDENTIFIER_POINTER(atname), "format") == 0 &&
!(flags & POSARG_ELLIPSIS ) )
+return pos;


Hardcoding knowledge of individual attributes in this function doesn't
seem very robust.  Avoiding that is the purpose of the flags argument.
I'd suggest adding a bit to the posargflags enum.

Also, at this point, (flags & POSARG_ELLIPSIS) should be zero as
a result of the test above (not shown) so repeating the test shouldn't
be necessary.

Martin

Re: [PATCH 2/4] Allow match-and-simplified phiopt to run in early phiopt

2021-06-29 Thread Andrew Pinski via Gcc-patches

On Tue, Jun 29, 2021 at 12:14 PM Martin Sebor via Gcc-patches
 wrote:
>
> On 6/27/21 5:24 PM, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski 
> >
> > To move a few things more to match-and-simplify from phiopt,
> > we need to allow match_simplify_replacement to run in early
> > phiopt. To do this we add a replacement for gimple_simplify
> > that is explicitly for phiopt.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no
> > regressions.
> >
> > gcc/ChangeLog:
> >
> >   * tree-ssa-phiopt.c (match_simplify_replacement):
> >   Add early_p argument. Call gimple_simplify_phiopt
> >   instead of gimple_simplify.
> >   (tree_ssa_phiopt_worker): Update call to
> >   match_simplify_replacement and allow unconditionally.
> >   (phiopt_early_allow): New function.
> >   (gimple_simplify_phiopt): New function.
> > ---
> >   gcc/tree-ssa-phiopt.c | 89 ++-
> >   1 file changed, 70 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> > index ab12e85569d..17bc597851b 100644
> > --- a/gcc/tree-ssa-phiopt.c
> > +++ b/gcc/tree-ssa-phiopt.c
> > @@ -50,12 +50,13 @@ along with GCC; see the file COPYING3.  If not see
> >   #include "gimple-fold.h"
> >   #include "internal-fn.h"
> >   #include "gimple-range.h"
> > +#include "gimple-match.h"
> >
> >   static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
> >   static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
> >  tree, tree);
> >   static bool match_simplify_replacement (basic_block, basic_block,
> > - edge, edge, gphi *, tree, tree);
> > + edge, edge, gphi *, tree, tree, bool);
> >   static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
> > tree,
> >   gimple *);
> >   static int value_replacement (basic_block, basic_block,
> > @@ -345,9 +346,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> > do_hoist_loads, bool early_p)
> > /* Do the replacement of conditional if it can be done.  */
> > if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, 
> > arg1))
> >   cfgchanged = true;
> > -   else if (!early_p
> > -&& match_simplify_replacement (bb, bb1, e1, e2, phi,
> > -   arg0, arg1))
> > +   else if (match_simplify_replacement (bb, bb1, e1, e2, phi,
> > +arg0, arg1,
> > +early_p))
> >   cfgchanged = true;
> > else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> >   cfgchanged = true;
> > @@ -811,6 +812,67 @@ two_value_replacement (basic_block cond_bb, 
> > basic_block middle_bb,
> > return true;
> >   }
> >
> > +/* Return TRUE if CODE should be allowed during early phiopt.
> > +   Currently this is to allow MIN/MAX and ABS/NEGATE.  */
> > +static bool
> > +phiopt_early_allow (enum tree_code code)
> > +{
> > +  switch (code)
> > +{
> > +  case MIN_EXPR:
> > +  case MAX_EXPR:
> > +  case ABS_EXPR:
> > +  case ABSU_EXPR:
> > +  case NEGATE_EXPR:
> > +  case SSA_NAME:
> > + return true;
> > +  default:
> > + return false;
> > +}
> > +}
> > +
> > +/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
> > +   Return NULL if nothing can be simplified or the resulting simplified 
> > value
> > +   with parts pushed if EARLY_P was true. Also rejects non allowed tree 
> > code
> > +   if EARLY_P is set.
> > +   Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and 
> > tries
> > +   to simplify CMP ? ARG0 : ARG1.  */
> > +static tree
> > +gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
> > + tree arg0, tree arg1,
> > + gimple_seq *seq)
> > +{
> > +  tree result;
> > +  enum tree_code comp_code = gimple_cond_code (comp_stmt);
> > +  location_t loc = gimple_location (comp_stmt);
> > +  tree cmp0 = gimple_cond_lhs (comp_stmt);
> > +  tree cmp1 = gimple_cond_rhs (comp_stmt);
> > +  /* To handle special cases like floating point comparison, it is easier 
> > and
> > + less error-prone to build a tree and gimplify it on the fly though it 
> > is
> > + less efficient.
> > + Don't use fold_build2 here as that might create (bool)a instead of 
> > just
> > + "a != 0".  */
> > +  tree cond = build2_loc (loc, comp_code, boolean_type_node,
> > +   cmp0, cmp1);
> > +  gimple_match_op op (gimple_match_cond::UNCOND,
> > +   COND_EXPR, type, cond, arg0, arg1);
> > +
> > +  if (op.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
> > +{
> > +  /* Early we want only to allow some generated tree codes. */
> > +  if (!early_p
> > +   ||

Re: [PATCH] c++: DR2397 - auto specifier for * and & to arrays [PR100975]

2021-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/21 3:25 PM, Marek Polacek wrote:

This patch implements DR2397, which removes the restriction in
[dcl.array]p4 that the array element type may not be a placeholder
type.  We don't need to worry about decltype(auto) here, so this
allows code like

   int a[3];
   auto (*p)[3] = &a;
   auto (&r)[3] = a;

However, note that

   auto (&&r)[2] = { 1, 2 };
   auto arr[2] = { 1, 2 };

still doesn't work (although one day it might) and neither does

   int arr[5];
   auto x[5] = arr;

given that auto deduction is performed in terms of function template
argument deduction, so the array decays to *.
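
The correspondence with function template argument deduction can be
sketched with ordinary templates, which compile regardless of whether
the compiler implements DR2397.  The helper names are invented for this
illustration:

```cpp
#include <type_traits>

// 'auto (&r)[3] = a;' deduces like a call to this template with 'a':
// the parameter is a reference to an array, so T is the element type.
template <typename T>
constexpr bool ref_to_array_is_int (T (&)[3])
{
  return std::is_same<T, int>::value;
}

// A by-value parameter behaves like 'auto x = arr;': the array
// argument decays to a pointer before T is deduced.
template <typename T>
constexpr bool by_value_is_pointer (T)
{
  return std::is_same<T, int *>::value;
}
```

Given int a[3], both ref_to_array_is_int (a) and by_value_is_pointer (a)
return true, matching the reference and decay cases described above.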

Bootstrapped/regtested on x86_64-pc-linux-gnu.  Does this look OK or
have I missed a case we want to support?

PR c++/100975
DR 2397

gcc/cp/ChangeLog:

* decl.c (create_array_type_for_decl): Allow array of auto.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/auto24.C: Remove dg-error.
* g++.dg/cpp0x/auto3.C: Adjust dg-error.
* g++.dg/cpp0x/auto42.C: Likewise.
* g++.dg/cpp0x/initlist75.C: Likewise.
* g++.dg/cpp0x/initlist80.C: Likewise.
* g++.dg/diagnostic/auto1.C: Remove dg-error.
* g++.dg/cpp23/auto-array.C: New test.
---
  gcc/cp/decl.c   | 11 
  gcc/testsuite/g++.dg/cpp0x/auto24.C |  3 ++-
  gcc/testsuite/g++.dg/cpp0x/auto3.C  |  2 +-
  gcc/testsuite/g++.dg/cpp0x/auto42.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/initlist75.C |  2 +-
  gcc/testsuite/g++.dg/cpp0x/initlist80.C |  2 +-
  gcc/testsuite/g++.dg/cpp23/auto-array.C | 36 +
  gcc/testsuite/g++.dg/diagnostic/auto1.C |  3 ++-
  8 files changed, 44 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-array.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index fa6af6fec11..7672947e64a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10969,17 +10969,6 @@ create_array_type_for_decl (tree name, tree type, tree 
size, location_t loc)
if (type == error_mark_node || size == error_mark_node)
  return error_mark_node;
  
-  /* 8.3.4/1: If the type of the identifier of D contains the auto

- type-specifier, the program is ill-formed.  */
-  if (type_uses_auto (type))
-{
-  if (name)
-   error_at (loc, "%qD declared as array of %qT", name, type);
-  else
-   error ("creating array of %qT", type);
-  return error_mark_node;
-}
-
/* If there are some types which cannot be array elements,
   issue an error-message and return.  */
switch (TREE_CODE (type))
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto24.C 
b/gcc/testsuite/g++.dg/cpp0x/auto24.C
index 193f92e977a..ac1ba24f72d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto24.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto24.C
@@ -1,5 +1,6 @@
  // PR c++/48599
  // { dg-do compile { target c++11 } }
+// Allowed since DR2397.
  
  int v[1];

-auto (*p)[1] = &v; // { dg-error "8:.p. declared as array of .auto" }
+auto (*p)[1] = &v;
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto3.C 
b/gcc/testsuite/g++.dg/cpp0x/auto3.C
index 2cd0520023d..56439408a0b 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto3.C
@@ -10,7 +10,7 @@ auto x;   // { dg-error "auto" }
  auto i = 42, j = 42.0;// { dg-error "auto" }
  
  // New CWG issue


Let's at least update this comment to quote [dcl.type.auto.deduct]/2: "T 
shall not be an array type".  I guess "unable to deduce" is a suitable 
diagnostic for that error.



-auto a[2] = { 1, 2 };  // { dg-error "6:.a. declared as array of 
.auto" }
+auto a[2] = { 1, 2 };  // { dg-error "20:unable to deduce" }
  
  template

  struct A { };
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto42.C 
b/gcc/testsuite/g++.dg/cpp0x/auto42.C
index 8d15fc96f09..5b2f6779aaf 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto42.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto42.C
@@ -5,5 +5,5 @@
  
  void foo(int i)

  {
-  auto x[1] = { 0 };   // { dg-error "8:.x. declared as array of 
.auto" }
+  auto x[1] = { 0 };   // { dg-error "19:unable to deduce" }
  }
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist75.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist75.C
index 9a45087c5e4..f572f5181ad 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist75.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist75.C
@@ -3,4 +3,4 @@
  
  #include 
  
-auto foo[] = {};// { dg-error "6:.foo. declared as array of .auto" }

+auto foo[] = {};// { dg-error "15:unable to deduce" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist80.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
index 15723be16f8..a6ab40ca349 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist80.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
@@ -3,4 +3,4 @@
  
  #include 
  
-auto x[2] = {};			// { dg-error "6:.x. declared as array of .auto" }

+auto x[2] = {};// { dg-error "14:unable to deduce" }
diff --git a/gcc/testsuite/g++.dg/cpp23/auto-array.C 
b/gcc/testsuite/g

Re: [PATCH 2/2] c++: Extend PR96204 fix to variable templates

2021-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/21 1:57 PM, Patrick Palka wrote:

r12-1829 corrected the access scope during partial specialization
matching of class templates, but neglected the variable template case.
This patch moves the access scope adjustment to inside
most_specialized_partial_spec, so that all callers can benefit.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/96204

gcc/cp/ChangeLog:

* pt.c (instantiate_class_template_1): Remove call to
push_nested_class and pop_nested_class added by r12-1829.
(most_specialized_partial_spec): Use push_access_scope_guard
and deferring_access_check_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/template/access40b.C: New test.
---
  gcc/cp/pt.c   | 12 +++
  gcc/testsuite/g++.dg/template/access40b.C | 26 +++
  2 files changed, 34 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/access40b.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bd8b17ca047..1e2e2ba5329 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11776,11 +11776,8 @@ instantiate_class_template_1 (tree type)
deferring_access_check_sentinel acs (dk_no_deferred);
  
/* Determine what specialization of the original template to

- instantiate; do this relative to the scope of the class for
- sake of access checking.  */
-  push_nested_class (type);
+ instantiate.  */
t = most_specialized_partial_spec (type, tf_warning_or_error);
-  pop_nested_class ();
if (t == error_mark_node)
  return error_mark_node;
else if (t)
@@ -24989,26 +24986,33 @@ most_specialized_partial_spec (tree target, 
tsubst_flags_t complain)
tree outer_args = NULL_TREE;
tree tmpl, args;
  
+  tree decl;

if (TYPE_P (target))
  {
tree tinfo = CLASSTYPE_TEMPLATE_INFO (target);
tmpl = TI_TEMPLATE (tinfo);
args = TI_ARGS (tinfo);
+  decl = TYPE_NAME (target);
  }
else if (TREE_CODE (target) == TEMPLATE_ID_EXPR)
  {
tmpl = TREE_OPERAND (target, 0);
args = TREE_OPERAND (target, 1);
+  decl = DECL_TEMPLATE_RESULT (tmpl);


Hmm, this won't get us the right scope; we get here for the result of 
finish_template_variable, where tmpl is the most general template and 
args are args for it.  So in the below testcase, tmpl is outer::N:


template  struct outer {
  template 
  static constexpr int f() { return N; };

  template 
  static const int N = f();
};

template 
template 
const int outer::N = 1;

int i = outer::N;

Oddly, I notice that we also get here for static data members of class 
templates that are not themselves templates, as in mem-partial1.C that I 
adapted the above from.  Fixed by the attached patch.


Since the type of the variable depends on the specialization, we can't 
actually get the decl before doing the resolution, but we should be able 
to push into the right enclosing class.  Perhaps we should pass the 
partially instantiated template and its args to lookup_template_variable 
instead of the most general template and its args.



  }
else if (VAR_P (target))
  {
tree tinfo = DECL_TEMPLATE_INFO (target);
tmpl = TI_TEMPLATE (tinfo);
args = TI_ARGS (tinfo);
+  decl = target;
  }
else
  gcc_unreachable ();
  
+  push_access_scope_guard pas (decl);

+  deferring_access_check_sentinel acs (dk_no_deferred);
+
tree main_tmpl = most_general_template (tmpl);
  
/* For determining which partial specialization to use, only the

diff --git a/gcc/testsuite/g++.dg/template/access40b.C 
b/gcc/testsuite/g++.dg/template/access40b.C
new file mode 100644
index 000..040e3d18096
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40b.C
@@ -0,0 +1,26 @@
+// PR c++/96204
+// { dg-do compile { target c++14 } }
+// A variant of access40.C where has_type_member is a variable template instead
+// of a class template.
+
+template
+constexpr bool has_type_member = false;
+
+template
+constexpr bool has_type_member = true;
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct Parent;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization does not match despite Child::type
+// being accessible from the current scope.
+static_assert(!has_type_member, "");
+  }
+};



From b7b34f555b54f97a9d2315d6c6a552e27e2faa9c Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 29 Jun 2021 15:11:25 -0400
Subject: [PATCH] vartmp
To: gcc-patches@gcc.gnu.org

---
 gcc/cp/pt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f2039e09cd7..d2936c106ba 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -26003,7 +26003,7 @@ instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
   td = template_for_substitution (d);
   args = gen_args;
 
-  if (VAR_P (d))
+  if (variable_template_specialization_p (d))
 {
   /* Look up an exp

[PATCH] c++: DR2397 - auto specifier for * and & to arrays [PR100975]

2021-06-29 Thread Marek Polacek via Gcc-patches

This patch implements DR2397, which removes the restriction in
[dcl.array]p4 that the array element type may not be a placeholder
type.  We don't need to worry about decltype(auto) here, so this
allows code like

  int a[3];
  auto (*p)[3] = &a;
  auto (&r)[3] = a;

However, note that

  auto (&&r)[2] = { 1, 2 };
  auto arr[2] = { 1, 2 };

still doesn't work (although one day it might) and neither does

  int arr[5];
  auto x[5] = arr;

given that auto deduction is performed in terms of function template
argument deduction, so the array decays to *.
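
The link to function template argument deduction can be illustrated with
ordinary function templates, which compile whether or not the compiler
implements DR2397.  This is only a minimal sketch: the helper names
elem_of_ref, elem_of_ptr and by_value are hypothetical and are not part of
the patch or its testsuite.

```cpp
#include <type_traits>

// Hypothetical helpers, declared only so that decltype can inspect the
// deduced T.  They are never called, so no definitions are needed.

// Analogue of `auto (&r)[3] = a;`: T is deduced from the element type.
template <class T> T elem_of_ref (T (&)[3]);

// Analogue of `auto (*p)[3] = &a;`: again T is the element type.
template <class T> T elem_of_ptr (T (*)[3]);

// Analogue of plain `auto x = arr;`: the parameter is taken by value, so
// an array argument decays to a pointer before T is deduced.
template <class T> T by_value (T);

int a[3];

static_assert (std::is_same<decltype (elem_of_ref (a)), int>::value,
	       "reference to array: T is the element type");
static_assert (std::is_same<decltype (elem_of_ptr (&a)), int>::value,
	       "pointer to array: T is the element type");
static_assert (std::is_same<decltype (by_value (a)), int *>::value,
	       "by value: the array has decayed to a pointer");
```

In the first two cases the array bound is part of the parameter type, so
only the element type is left to deduce; in the last case nothing of the
array type survives the decay.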

Bootstrapped/regtested on x86_64-pc-linux-gnu.  Does this look OK or
have I missed a case we want to support?

PR c++/100975
DR 2397

gcc/cp/ChangeLog:

* decl.c (create_array_type_for_decl): Allow array of auto.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/auto24.C: Remove dg-error.
* g++.dg/cpp0x/auto3.C: Adjust dg-error.
* g++.dg/cpp0x/auto42.C: Likewise.
* g++.dg/cpp0x/initlist75.C: Likewise.
* g++.dg/cpp0x/initlist80.C: Likewise.
* g++.dg/diagnostic/auto1.C: Remove dg-error.
* g++.dg/cpp23/auto-array.C: New test.
---
 gcc/cp/decl.c   | 11 
 gcc/testsuite/g++.dg/cpp0x/auto24.C |  3 ++-
 gcc/testsuite/g++.dg/cpp0x/auto3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/auto42.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/initlist75.C |  2 +-
 gcc/testsuite/g++.dg/cpp0x/initlist80.C |  2 +-
 gcc/testsuite/g++.dg/cpp23/auto-array.C | 36 +
 gcc/testsuite/g++.dg/diagnostic/auto1.C |  3 ++-
 8 files changed, 44 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-array.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index fa6af6fec11..7672947e64a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10969,17 +10969,6 @@ create_array_type_for_decl (tree name, tree type, tree 
size, location_t loc)
   if (type == error_mark_node || size == error_mark_node)
 return error_mark_node;
 
-  /* 8.3.4/1: If the type of the identifier of D contains the auto
- type-specifier, the program is ill-formed.  */
-  if (type_uses_auto (type))
-{
-  if (name)
-   error_at (loc, "%qD declared as array of %qT", name, type);
-  else
-   error ("creating array of %qT", type);
-  return error_mark_node;
-}
-
   /* If there are some types which cannot be array elements,
  issue an error-message and return.  */
   switch (TREE_CODE (type))
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto24.C 
b/gcc/testsuite/g++.dg/cpp0x/auto24.C
index 193f92e977a..ac1ba24f72d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto24.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto24.C
@@ -1,5 +1,6 @@
 // PR c++/48599
 // { dg-do compile { target c++11 } }
+// Allowed since DR2397.
 
 int v[1];
-auto (*p)[1] = &v; // { dg-error "8:.p. declared as array of 
.auto" }
+auto (*p)[1] = &v;
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto3.C 
b/gcc/testsuite/g++.dg/cpp0x/auto3.C
index 2cd0520023d..56439408a0b 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto3.C
@@ -10,7 +10,7 @@ auto x;   // { dg-error "auto" }
 auto i = 42, j = 42.0; // { dg-error "auto" }
 
 // New CWG issue
-auto a[2] = { 1, 2 };  // { dg-error "6:.a. declared as array of 
.auto" }
+auto a[2] = { 1, 2 };  // { dg-error "20:unable to deduce" }
 
 template
 struct A { };
diff --git a/gcc/testsuite/g++.dg/cpp0x/auto42.C 
b/gcc/testsuite/g++.dg/cpp0x/auto42.C
index 8d15fc96f09..5b2f6779aaf 100644
--- a/gcc/testsuite/g++.dg/cpp0x/auto42.C
+++ b/gcc/testsuite/g++.dg/cpp0x/auto42.C
@@ -5,5 +5,5 @@
 
 void foo(int i)
 {
-  auto x[1] = { 0 };   // { dg-error "8:.x. declared as array of 
.auto" }
+  auto x[1] = { 0 };   // { dg-error "19:unable to deduce" }
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist75.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist75.C
index 9a45087c5e4..f572f5181ad 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist75.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist75.C
@@ -3,4 +3,4 @@
 
 #include 
 
-auto foo[] = {};// { dg-error "6:.foo. declared as array of .auto" }
+auto foo[] = {};// { dg-error "15:unable to deduce" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist80.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
index 15723be16f8..a6ab40ca349 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist80.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist80.C
@@ -3,4 +3,4 @@
 
 #include 
 
-auto x[2] = {};// { dg-error "6:.x. declared as array 
of .auto" }
+auto x[2] = {};// { dg-error "14:unable to deduce" }
diff --git a/gcc/testsuite/g++.dg/cpp23/auto-array.C 
b/gcc/testsuite/g++.dg/cpp23/auto-array.C
new file mode 100644
index 000..42f2b0c5cf4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/auto-array.C
@@ -0,0 +1,36 @@
+// PR c++/100975
+// DR 2397 - auto specifier for pointers and references to arrays 
+// { dg-d

Go patch committed: Set DECL_NAMELESS for temporaries

2021-06-29 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend sets DECL_NAMELESS for temporary
variables.  This is consistent with what create_tmp_var does, but is
used for cases where we can't use create_tmp_var.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* go-gcc.cc (Gcc_backend::static_chain_variable): Set
DECL_NAMELESS on the new decl.
(Gcc_backend::temporary_variable): Likewise.
(Gcc_backend::function): Set DECL_NAMELESS on the result decl.
diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index 41f309e7294..f812796c959 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -2853,6 +2853,7 @@ Gcc_backend::static_chain_variable(Bfunction* function, 
const std::string& name,
   TREE_USED(decl) = 1;
   DECL_ARTIFICIAL(decl) = 1;
   DECL_IGNORED_P(decl) = 1;
+  DECL_NAMELESS(decl) = 1;
   TREE_READONLY(decl) = 1;
 
   struct function *f = DECL_STRUCT_FUNCTION(fndecl);
@@ -2912,6 +2913,7 @@ Gcc_backend::temporary_variable(Bfunction* function, 
Bblock* bblock,
   type_tree);
   DECL_ARTIFICIAL(var) = 1;
   DECL_IGNORED_P(var) = 1;
+  DECL_NAMELESS(var) = 1;
   TREE_USED(var) = 1;
   DECL_CONTEXT(var) = decl;
 
@@ -3290,6 +3292,7 @@ Gcc_backend::function(Btype* fntype, const std::string& 
name,
   build_decl(location.gcc_location(), RESULT_DECL, NULL_TREE, restype);
   DECL_ARTIFICIAL(resdecl) = 1;
   DECL_IGNORED_P(resdecl) = 1;
+  DECL_NAMELESS(resdecl) = 1;
   DECL_CONTEXT(resdecl) = decl;
   DECL_RESULT(decl) = resdecl;
 }

Re: [PATCH 2/4] Allow match-and-simplified phiopt to run in early phiopt

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/27/21 5:24 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To move a few things more to match-and-simplify from phiopt,
we need to allow match_simplify_replacement to run in early
phiopt. To do this we add a replacement for gimple_simplify
that is explicitly for phiopt.
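
For context, the conditionals in question have the shape CMP ? ARG0 : ARG1.
The sketch below lists source-level forms whose value can be expressed with
the tree codes that the patch allows early (MIN_EXPR, MAX_EXPR, ABS_EXPR).
The function names are hypothetical, and the sketch makes no claim about
which pass actually performs each transformation.

```cpp
// Hypothetical examples of CMP ? ARG0 : ARG1 conditionals.

// Same value as MIN_EXPR <a, b>.
constexpr int min_like (int a, int b) { return a < b ? a : b; }

// Same value as MAX_EXPR <a, b>.
constexpr int max_like (int a, int b) { return a > b ? a : b; }

// Same value as ABS_EXPR <a> (undefined for the most negative int).
constexpr int abs_like (int a) { return a < 0 ? -a : a; }

static_assert (min_like (3, 8) == 3, "smaller operand selected");
static_assert (max_like (3, 8) == 8, "larger operand selected");
static_assert (abs_like (-5) == 5, "negative input negated");
```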

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (match_simplify_replacement):
Add early_p argument. Call gimple_simplify_phiopt
instead of gimple_simplify.
(tree_ssa_phiopt_worker): Update call to
match_simplify_replacement and allow unconditionally.
(phiopt_early_allow): New function.
(gimple_simplify_phiopt): New function.
---
  gcc/tree-ssa-phiopt.c | 89 ++-
  1 file changed, 70 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index ab12e85569d..17bc597851b 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -50,12 +50,13 @@ along with GCC; see the file COPYING3.  If not see
  #include "gimple-fold.h"
  #include "internal-fn.h"
  #include "gimple-range.h"
+#include "gimple-match.h"
  
  static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);

  static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
   tree, tree);
  static bool match_simplify_replacement (basic_block, basic_block,
-   edge, edge, gphi *, tree, tree);
+   edge, edge, gphi *, tree, tree, bool);
  static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
tree,
gimple *);
  static int value_replacement (basic_block, basic_block,
@@ -345,9 +346,9 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
do_hoist_loads, bool early_p)
  /* Do the replacement of conditional if it can be done.  */
  if (!early_p && two_value_replacement (bb, bb1, e2, phi, arg0, arg1))
cfgchanged = true;
- else if (!early_p
-  && match_simplify_replacement (bb, bb1, e1, e2, phi,
- arg0, arg1))
+ else if (match_simplify_replacement (bb, bb1, e1, e2, phi,
+  arg0, arg1,
+  early_p))
cfgchanged = true;
  else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
cfgchanged = true;
@@ -811,6 +812,67 @@ two_value_replacement (basic_block cond_bb, basic_block 
middle_bb,
return true;
  }
  
+/* Return TRUE if CODE should be allowed during early phiopt.

+   Currently this is to allow MIN/MAX and ABS/NEGATE.  */
+static bool
+phiopt_early_allow (enum tree_code code)
+{
+  switch (code)
+{
+  case MIN_EXPR:
+  case MAX_EXPR:
+  case ABS_EXPR:
+  case ABSU_EXPR:
+  case NEGATE_EXPR:
+  case SSA_NAME:
+   return true;
+  default:
+   return false;
+}
+}
+
+/* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT.
+   Return NULL if nothing can be simplified or the resulting simplified value
+   with parts pushed if EARLY_P was true. Also rejects non allowed tree code
+   if EARLY_P is set.
+   Takes the comparison from COMP_STMT and two args, ARG0 and ARG1 and tries
+   to simplify CMP ? ARG0 : ARG1.  */
+static tree
+gimple_simplify_phiopt (bool early_p, tree type, gimple *comp_stmt,
+   tree arg0, tree arg1,
+   gimple_seq *seq)
+{
+  tree result;
+  enum tree_code comp_code = gimple_cond_code (comp_stmt);
+  location_t loc = gimple_location (comp_stmt);
+  tree cmp0 = gimple_cond_lhs (comp_stmt);
+  tree cmp1 = gimple_cond_rhs (comp_stmt);
+  /* To handle special cases like floating point comparison, it is easier and
+ less error-prone to build a tree and gimplify it on the fly though it is
+ less efficient.
+ Don't use fold_build2 here as that might create (bool)a instead of just
+ "a != 0".  */
+  tree cond = build2_loc (loc, comp_code, boolean_type_node,
+ cmp0, cmp1);
+  gimple_match_op op (gimple_match_cond::UNCOND,
+ COND_EXPR, type, cond, arg0, arg1);
+
+  if (op.resimplify (early_p ? NULL : seq, follow_all_ssa_edges))
+{
+  /* Early we want only to allow some generated tree codes. */
+  if (!early_p
+ || op.code.is_tree_code ()
+ || phiopt_early_allow ((tree_code)op.code))
+   {
+ result = maybe_push_res_to_seq (&op, seq);
+ if (result)
+   return result;


It looks to me like the last if statement is redundant and could
be replaced by

  return maybe_push_res_to_seq (&op, seq);

thus also making the result variable redundant, further simplifying
the code.
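
The equivalence behind this suggestion can be shown in isolation.  In the
sketch below, lookup is a hypothetical stand-in for maybe_push_res_to_seq
(it is not GCC code): when the fall-through path returns null anyway,
testing the result before returning it changes nothing.

```cpp
#include <cstddef>

static int table[2] = { 7, 9 };

// Hypothetical stand-in for a call that may return null.
const int *
lookup (int key)
{
  return (key >= 0 && key < 2) ? &table[key] : NULL;
}

// Shape of the code as posted: test the result, else fall through.
const int *
with_check (bool allowed, int key)
{
  if (allowed)
    {
      const int *result = lookup (key);
      if (result)
	return result;
    }
  return NULL;
}

// Shape after the suggested simplification: return the call directly.
const int *
without_check (bool allowed, int key)
{
  if (allowed)
    return lookup (key);
  return NULL;
}

// True when both shapes return the same pointer for the given inputs.
bool
agree (bool allowed, int key)
{
  return with_check (allowed, key) == without_check (allowed, key);
}
```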

Martin


+   }
+}
+
+  return NULL;
+}
+
  /*  The function match_simplify_rep

[PATCH] tree-optimization/101254 - Fix MINUS_EXPR relations.

2021-06-29 Thread Andrew MacLeod via Gcc-patches

We were incorrectly calculating the LHS of a MINUS_EXPR for wrapping 
signed values.  This patch cleans that up and fully fleshes out the 
possible LHS of OP1 - OP2 given a known relation between them.
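
To see why a [1, max] result range is wrong for wrapping types, consider
8-bit signed arithmetic with wrapping semantics.  The sketch below models
the wraparound through unsigned arithmetic; wrap_sub is a hypothetical
helper for illustration only and is not code from range-op.cc.

```cpp
#include <cstdint>

// Hypothetical model of 8-bit signed subtraction that wraps on overflow.
// The subtraction is done modulo 256 in an unsigned type and converted
// back; that conversion is modular on GCC (and guaranteed since C++20).
constexpr std::int8_t
wrap_sub (std::int8_t op1, std::int8_t op2)
{
  return static_cast<std::int8_t> (
    static_cast<std::uint8_t> (static_cast<std::uint8_t> (op1)
			       - static_cast<std::uint8_t> (op2)));
}

// op1 > op2 with no overflow: the difference is positive.
static_assert (wrap_sub (10, -10) == 20, "no wrap");
// op1 > op2, but the true difference (255) wraps to -1.
static_assert (wrap_sub (127, -128) == -1, "wraps negative");
// op1 < op2, but the true difference (-255) wraps to 1.
static_assert (wrap_sub (-128, 127) == 1, "wraps positive");
```

With wrapping, the only thing a strict inequality guarantees about the
difference is that it is nonzero, which corresponds to the ~[0, 0] range
the patch uses for GT_EXPR and LT_EXPR in the wrapping case.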


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From a96d8d67d0073a7031c0712bc3fb7759417b2125 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 29 Jun 2021 10:52:58 -0400
Subject: [PATCH 3/3] Fix MINUS_EXPR relations.

Flesh out and correct relations for both wrapping and non-wrapping values.

	gcc/
	PR tree-optimization/101254
	* range-op.cc (operator_minus::op1_op2_relation_effect): Check for
	wrapping/non-wrapping when setting the result range.

	gcc/testsuite
	* gcc.dg/pr101254.c: New.
---
 gcc/range-op.cc | 64 -
 gcc/testsuite/gcc.dg/pr101254.c | 27 ++
 2 files changed, 74 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101254.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 29ee9e0f645..97b9843e095 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1314,24 +1314,54 @@ operator_minus::op1_op2_relation_effect (irange &lhs_range, tree type,
   unsigned prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);
 
-  switch (rel)
+  // == and != produce [0,0] and ~[0,0] regardless of wrapping.
+  if (rel == EQ_EXPR)
+rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
+  else if (rel == NE_EXPR)
+rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
+			  VR_ANTI_RANGE);
+  else if (TYPE_OVERFLOW_WRAPS (type))
 {
-  // op1 > op2,  op1 - op2 can be restricted to  [1, max]
-  case GT_EXPR:
-	rel_range = int_range<2> (type, wi::one (prec),
-  wi::max_value (prec, sgn));
-	break;
-  // op1 >= op2,  op1 - op2 can be restricted to  [0, max]
-  case GE_EXPR:
-	rel_range = int_range<2> (type, wi::zero (prec),
-  wi::max_value (prec, sgn));
-	break;
-  // op1 == op2,  op1 - op2 can be restricted to  [0, 0]
-  case EQ_EXPR:
-	rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
-	break;
-  default:
-	return false;
+  switch (rel)
+	{
+	  // For wrapping signed values and unsigned, if op1 > op2 or
+	  // op1 < op2, then op1 - op2 can be restricted to ~[0, 0].
+	  case GT_EXPR:
+	  case LT_EXPR:
+	  rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
+	VR_ANTI_RANGE);
+	break;
+	  default:
+	return false;
+	}
+}
+  else
+{
+  switch (rel)
+	{
+	  // op1 > op2, op1 - op2 can be restricted to [1, +INF]
+	  case GT_EXPR:
+	rel_range = int_range<2> (type, wi::one (prec),
+  wi::max_value (prec, sgn));
+	break;
+	  // op1 >= op2, op1 - op2 can be restricted to [0, +INF]
+	  case GE_EXPR:
+	rel_range = int_range<2> (type, wi::zero (prec),
+  wi::max_value (prec, sgn));
+	break;
+	  // op1 < op2, op1 - op2 can be restricted to [-INF, -1]
+	  case LT_EXPR:
+	rel_range = int_range<2> (type, wi::min_value (prec, sgn),
+  wi::minus_one (prec));
+	break;
+	  // op1 <= op2, op1 - op2 can be restricted to [-INF, 0]
+	  case LE_EXPR:
+	rel_range = int_range<2> (type, wi::min_value (prec, sgn),
+  wi::zero (prec));
+	break;
+	  default:
+	return false;
+	}
 }
   lhs_range.intersect (rel_range);
   return true;
diff --git a/gcc/testsuite/gcc.dg/pr101254.c b/gcc/testsuite/gcc.dg/pr101254.c
new file mode 100644
index 000..b2460ed86f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101254.c
@@ -0,0 +1,27 @@
+/* PR tree-optimization/101254 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fwrapv" } */
+
+int
+foo (long long imin, long long imax)
+{
+  if (imin > imax)
+return 0;
+  else if (imax - imin < 0 || (imax - imin) + 1 < 0)
+return 0;
+  return 1;
+}
+
+int
+main ()
+{
+  long long imax = __LONG_LONG_MAX__;
+  long long imin = -imax - 1; 
+  if (!foo (-10, 10))
+__builtin_abort ();
+  if (foo (-10, imax))
+__builtin_abort ();
+  if (foo (imin, imax))
+__builtin_abort ();
+  return 0;
+}
-- 
2.17.2

[PATCH] Allow PHIs to pick up global values.

2021-06-29 Thread Andrew MacLeod via Gcc-patches

EVRP appears to allow PHIs to pick up global values before inlining.
This simply matches that behaviour with ranger and allows better
equality when running in ranger-only mode.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From 604dce2d74d3417970e23e7ad38322d1adbca2e2 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 25 Jun 2021 15:31:39 -0400
Subject: [PATCH 2/3] Allow PHIs to pick up global values.

We can also apply known global values to PHI nodes in EVRP.

	* value-query.cc (gimple_range_global): Allow phis.
---
 gcc/value-query.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 17dfdb1ccbe..730a2149275 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -419,7 +419,8 @@ gimple_range_global (tree name)
   gcc_checking_assert (gimple_range_ssa_p (name));
   tree type = TREE_TYPE (name);
 
-  if (SSA_NAME_IS_DEFAULT_DEF (name) || (cfun && cfun->after_inlining))
+  if (SSA_NAME_IS_DEFAULT_DEF (name) || (cfun && cfun->after_inlining)
+  || is_a (SSA_NAME_DEF_STMT (name)))
 {
   value_range vr;
   get_range_global (vr, name);
-- 
2.17.2

[PATCH] Add stmt context in simplify_using_ranges.

2021-06-29 Thread Andrew MacLeod via Gcc-patches

We added context to a lot of simplify_using_ranges, but we didn't catch
all the places.  This provides the originating stmt to the missing
cases, which resolves a few EVRP testcases when running in ranger-only mode.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


From a7e655ae4016eaf04e261ff32fc67a14ebb0e329 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 25 Jun 2021 11:24:30 -0400
Subject: [PATCH 1/3] Add stmt context in simplify_using_ranges.

There were places where simplify_using_ranges was not utilizing the stmt context.

	* vr-values.c (vr_values::vrp_stmt_computes_nonzero): Use stmt.
	(simplify_using_ranges::op_with_boolean_value_range_p): Add a
	statement for location context.
	(check_for_binary_op_overflow): Ditto.
	(simplify_using_ranges::get_vr_for_comparison): Ditto.
	(simplify_using_ranges::compare_name_with_value): Ditto.
	(simplify_using_ranges::compare_names): Ditto.
	(vrp_evaluate_conditional_warnv_with_ops_using_ranges): Ditto.
	(simplify_using_ranges::simplify_truth_ops_using_ranges): Ditto.
	(simplify_using_ranges::simplify_min_or_max_using_ranges): Ditto.
	(simplify_using_ranges::simplify_internal_call_using_ranges): Ditto.
	(simplify_using_ranges::two_valued_val_range_p): Ditto.
	(simplify_using_ranges::simplify): Ditto.
	* vr-values.h: Adjust prototypes.
---
 gcc/vr-values.c | 71 ++---
 gcc/vr-values.h | 14 +-
 2 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 3ae2c68499d..190676de2c0 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -429,7 +429,7 @@ vr_values::vrp_stmt_computes_nonzero (gimple *stmt)
 		  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (expr
 	{
 	  const value_range_equiv *vr
-		= get_value_range (TREE_OPERAND (base, 0));
+		= get_value_range (TREE_OPERAND (base, 0), stmt);
 	  if (!range_includes_zero_p (vr))
 		return true;
 	}
@@ -486,7 +486,7 @@ vr_values::op_with_constant_singleton_value_range (tree op)
 /* Return true if op is in a boolean [0, 1] value-range.  */
 
 bool
-simplify_using_ranges::op_with_boolean_value_range_p (tree op)
+simplify_using_ranges::op_with_boolean_value_range_p (tree op, gimple *s)
 {
   if (TYPE_PRECISION (TREE_TYPE (op)) == 1)
 return true;
@@ -500,7 +500,7 @@ simplify_using_ranges::op_with_boolean_value_range_p (tree op)
 
   /* ?? Errr, this should probably check for [0,0] and [1,1] as well
  as [0,1].  */
-  const value_range *vr = query->get_value_range (op);
+  const value_range *vr = query->get_value_range (op, s);
   return *vr == value_range (build_zero_cst (TREE_TYPE (op)),
 			 build_one_cst (TREE_TYPE (op)));
 }
@@ -1057,18 +1057,18 @@ vr_values::extract_range_from_comparison (value_range_equiv *vr,
 static bool
 check_for_binary_op_overflow (range_query *query,
 			  enum tree_code subcode, tree type,
-			  tree op0, tree op1, bool *ovf)
+			  tree op0, tree op1, bool *ovf, gimple *s = NULL)
 {
   value_range vr0, vr1;
   if (TREE_CODE (op0) == SSA_NAME)
-vr0 = *query->get_value_range (op0);
+vr0 = *query->get_value_range (op0, s);
   else if (TREE_CODE (op0) == INTEGER_CST)
 vr0.set (op0);
   else
 vr0.set_varying (TREE_TYPE (op0));
 
   if (TREE_CODE (op1) == SSA_NAME)
-vr1 = *query->get_value_range (op1);
+vr1 = *query->get_value_range (op1, s);
   else if (TREE_CODE (op1) == INTEGER_CST)
 vr1.set (op1);
   else
@@ -1980,10 +1980,11 @@ vr_values::vrp_visit_assignment_or_call (gimple *stmt, tree *output_p,
is varying or undefined.  Uses TEM as storage for the alternate range.  */
 
 const value_range_equiv *
-simplify_using_ranges::get_vr_for_comparison (int i, value_range_equiv *tem)
+simplify_using_ranges::get_vr_for_comparison (int i, value_range_equiv *tem,
+	  gimple *s)
 {
   /* Shallow-copy equiv bitmap.  */
-  const value_range_equiv *vr = query->get_value_range (ssa_name (i));
+  const value_range_equiv *vr = query->get_value_range (ssa_name (i), s);
 
   /* If name N_i does not have a valid range, use N_i as its own
  range.  This allows us to compare against names that may
@@ -2005,10 +2006,11 @@ simplify_using_ranges::get_vr_for_comparison (int i, value_range_equiv *tem)
 tree
 simplify_using_ranges::compare_name_with_value
 (enum tree_code comp, tree var, tree val,
- bool *strict_overflow_p, bool use_equiv_p)
+ bool *strict_overflow_p, bool use_equiv_p,
+ gimple *s)
 {
   /* Get the set of equivalences for VAR.  */
-  bitmap e = query->get_value_range (var)->equiv ();
+  bitmap e = query->get_value_range (var, s)->equiv ();
 
   /* Start at -1.  Set it to 0 if we do a comparison without relying
  on overflow, or 1 if all comparisons rely on overflow.  */
@@ -2017,7 +2019,7 @@ simplify_using_ranges::compare_name_with_value
   /* Compare vars' value range with val.  */
   value_range_equiv tem_vr;
   const value_range_equiv *equiv_vr
-= get_vr_for_comparison (SSA_NAME_

Re: [PATCH 1/4] Duplicate the range information of the phi onto the new ssa_name

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/27/21 5:24 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

Since match_simplify_replacement uses gimple_simplify, a new ssa name is
sometimes created; when we then replace the phi edge with this new ssa
name, the range information on the phi is lost.
Duplicating the range info inside replace_phi_edge_with_variable is a
better option than doing it at each call to
replace_phi_edge_with_variable, which is what is done today.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.c (replace_phi_edge_with_variable): Duplicate range
info if we're the only things setting the target PHI.
(value_replacement): Don't duplicate range here.
(minmax_replacement): Likewise.
---
  gcc/tree-ssa-phiopt.c | 43 ++-
  1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 1777bff2f7c..ab12e85569d 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -391,6 +391,32 @@ replace_phi_edge_with_variable (basic_block cond_block,
basic_block bb = gimple_bb (phi);
basic_block block_to_remove;
gimple_stmt_iterator gsi;
+  tree phi_result = PHI_RESULT (phi);
+
+  /* Duplicate range info if we're the only things setting the target PHI.


I'm not too familiar with the code so the comments are helpful.  But
I don't understand what you mean by "we're the only things" above.
(What's "we" and what might be some of the other "things?")  Can you
please clarify that comment?


+ This is needed as later on, the new_tree will be replacing
+ The assignement of the PHI.
+ For an example:
+ bb1:
+ _4 = min
+ goto bb2
+
+ range<-INF,255>
+ a_3 = PHI<_4(1)>
+ bb3:
+
+ use(a_3)
+ And _4 gets prograted into the use of a_3 and losing the range info.
+ This can't be done for more than 2 incoming edges as the progration
+ won't happen.  */


Presumably you mean propagated and propagation above?

Martin


+  if (TREE_CODE (new_tree) == SSA_NAME
+  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
+  && INTEGRAL_TYPE_P (TREE_TYPE (phi_result))
+  && !SSA_NAME_RANGE_INFO (new_tree)
+  && SSA_NAME_RANGE_INFO (phi_result))
+duplicate_ssa_name_range_info (new_tree,
+  SSA_NAME_RANGE_TYPE (phi_result),
+  SSA_NAME_RANGE_INFO (phi_result));
  
/* Change the PHI argument to new.  */

SET_USE (PHI_ARG_DEF_PTR (phi, e->dest_idx), new_tree);
@@ -1385,16 +1411,6 @@ value_replacement (basic_block cond_bb, basic_block 
middle_bb,
   :
   # u_3 = PHI   */
reset_flow_sensitive_info (lhs);
-  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
-   {
- /* If available, we can use VR of phi result at least.  */
- tree phires = gimple_phi_result (phi);
- struct range_info_def *phires_range_info
-   = SSA_NAME_RANGE_INFO (phires);
- if (phires_range_info)
-   duplicate_ssa_name_range_info (lhs, SSA_NAME_RANGE_TYPE (phires),
-  phires_range_info);
-   }
gimple_stmt_iterator gsi_from;
for (int i = prep_cnt - 1; i >= 0; --i)
{
@@ -1794,13 +1810,6 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb,
gimple_seq stmts = NULL;
tree phi_result = PHI_RESULT (phi);
result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
-  /* Duplicate range info if we're the only things setting the target PHI.  */
-  if (!gimple_seq_empty_p (stmts)
-  && EDGE_COUNT (gimple_bb (phi)->preds) == 2
-  && !POINTER_TYPE_P (TREE_TYPE (phi_result))
-  && SSA_NAME_RANGE_INFO (phi_result))
-duplicate_ssa_name_range_info (result, SSA_NAME_RANGE_TYPE (phi_result),
-  SSA_NAME_RANGE_INFO (phi_result));
  
gsi = gsi_last_bb (cond_bb);

gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);

Re: [PATCH] c++: cxx_eval_array_reference and empty elt type [PR101194]

2021-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/21 2:25 PM, Patrick Palka wrote:

Here the initializer for 'x' is represented as an empty CONSTRUCTOR
due to its empty element type.  So during constexpr evaluation of the
ARRAY_REF 'x[0]', we end up trying to lazily value initialize the
omitted element at index 0, which fails because the element type is not
default initializable.

This patch makes cxx_eval_array_reference handle specially the case
where the element type is empty.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/101194

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): When the element type
is empty, just return an empty CONSTRUCTOR for an omitted
element instead of attempting value initialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty16.C: New test.
---
  gcc/cp/constexpr.c |  4 +++-
  gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C | 10 ++
  2 files changed, 13 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4cd9db33a1a..39787f3f5d5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3845,7 +3845,9 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
   directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
tree val;
constexpr_ctx new_ctx;
-  if (CP_AGGREGATE_TYPE_P (elem_type))
+  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
+return build_constructor (elem_type, NULL);
+  else if (CP_AGGREGATE_TYPE_P (elem_type))
  {
tree empty_ctor = build_constructor (init_list_type_node, NULL);
val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C
new file mode 100644
index 000..79be244a1d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C
@@ -0,0 +1,10 @@
+// PR c++/101194
+// { dg-do compile { target c++11 } }
+
+struct nodefault {
+  constexpr nodefault(int) { }
+};
+
+constexpr nodefault x[1] = { nodefault{0} };
+
+constexpr nodefault y = x[0];

Re: [PATCH 1/2] c++: Fix push_access_scope and introduce RAII wrapper for it

2021-06-29 Thread Jason Merrill via Gcc-patches


On 6/29/21 1:57 PM, Patrick Palka wrote:

When push_access_scope is passed a TYPE_DECL for a class type (which
can happen during e.g. satisfaction), we undesirably push only the
enclosing context of the class instead of the class itself.  This causes
us to mishandle e.g. testcase below due to us not entering the scope of
A before checking its constraints.

This patch adjusts push_access_scope accordingly, and introduces an
RAII wrapper for it.  We also make use of this wrapper right away by
replacing the only use of push_nested_class_guard with this new wrapper,
which means we can remove this old wrapper (whose functionality is
basically subsumed by the new wrapper).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): Use
push_access_scope_guard instead of push_nested_class_guard.
* cp-tree.h (struct push_nested_class_guard): Replace with ...
(struct push_access_scope_guard): ... this.
* pt.c (push_access_scope): When the argument corresponds to
a class type, push the class instead of its context.
(pop_access_scope): Adjust accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-access2.C: New test.
---
  gcc/cp/constraint.cc  |  7 +-
  gcc/cp/cp-tree.h  | 23 +++
  gcc/cp/pt.c   |  9 +++-
  gcc/testsuite/g++.dg/cpp2a/concepts-access2.C | 13 +++
  4 files changed, 35 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-access2.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 6df3ca6ce32..99d3ccc6998 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -926,12 +926,7 @@ get_normalized_constraints_from_decl (tree d, bool diag = 
false)
tree norm = NULL_TREE;
if (tree ci = get_constraints (decl))
  {
-  push_nested_class_guard pncs (DECL_CONTEXT (d));
-
-  temp_override ovr (current_function_decl);
-  if (TREE_CODE (decl) == FUNCTION_DECL)
-   current_function_decl = decl;
-
+  push_access_scope_guard pas (decl);
norm = get_normalized_constraints_from_info (ci, tmpl, diag);
  }
  
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h

index 6f713719589..58da7460001 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8463,21 +8463,24 @@ is_constrained_auto (const_tree t)
return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t);
  }
  
-/* RAII class to push/pop class scope T; if T is not a class, do nothing.  */

+/* RAII class to push/pop the access scope for T.  */
  
-struct push_nested_class_guard

+struct push_access_scope_guard
  {
-  bool push;
-  push_nested_class_guard (tree t)
-: push (t && CLASS_TYPE_P (t))
+  tree decl;
+  push_access_scope_guard (tree t)
+: decl (t)
{
-if (push)
-  push_nested_class (t);
+if (VAR_OR_FUNCTION_DECL_P (decl)
+   || TREE_CODE (decl) == TYPE_DECL)
+  push_access_scope (decl);
+else
+  decl = NULL_TREE;
}
-  ~push_nested_class_guard ()
+  ~push_access_scope_guard ()
{
-if (push)
-  pop_nested_class ();
+if (decl)
+  pop_access_scope (decl);
}
  };
  
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c

index f2039e09cd7..bd8b17ca047 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -224,7 +224,7 @@ static void instantiate_body (tree pattern, tree args, tree 
d, bool nested);
  /* Make the current scope suitable for access checking when we are
 processing T.  T can be FUNCTION_DECL for instantiated function
 template, VAR_DECL for static member variable, or TYPE_DECL for
-   alias template (needed by instantiate_decl).  */
+   for a class or alias template (needed by instantiate_decl).  */
  
  void

  push_access_scope (tree t)
@@ -234,6 +234,10 @@ push_access_scope (tree t)
  
if (DECL_FRIEND_CONTEXT (t))

  push_nested_class (DECL_FRIEND_CONTEXT (t));
+  else if (TREE_CODE (t) == TYPE_DECL
+  && CLASS_TYPE_P (TREE_TYPE (t))
+  && DECL_ORIGINAL_TYPE (t) == NULL_TREE)


I suspect DECL_IMPLICIT_TYPEDEF_P is a better test for this case.


+push_nested_class (TREE_TYPE (t));
else if (DECL_CLASS_SCOPE_P (t))
  push_nested_class (DECL_CONTEXT (t));
else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
@@ -260,6 +264,9 @@ pop_access_scope (tree t)
  current_function_decl = saved_access_scope->pop();
  
if (DECL_FRIEND_CONTEXT (t)

+  || (TREE_CODE (t) == TYPE_DECL
+ && CLASS_TYPE_P (TREE_TYPE (t))
+ && DECL_ORIGINAL_TYPE (t) == NULL_TREE)
|| DECL_CLASS_SCOPE_P (t)
|| (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
  pop_nested_class ();
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-access2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-access2.C
new file mode 100644
index 000..8ddcad236e3
--- /dev/null
+++ b/gcc/testsuite/g++.

[PATCH] c++: cxx_eval_array_reference and empty elt type [PR101194]

2021-06-29 Thread Patrick Palka via Gcc-patches

Here the initializer for 'x' is represented as an empty CONSTRUCTOR
due to its empty element type.  So during constexpr evaluation of the
ARRAY_REF 'x[0]', we end up trying to lazily value initialize the
omitted element at index 0, which fails because the element type is not
default initializable.

This patch makes cxx_eval_array_reference handle specially the case
where the element type is empty.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/101194

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): When the element type
is empty, just return an empty CONSTRUCTOR for an omitted
element instead of attempting value initialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-empty16.C: New test.
---
 gcc/cp/constexpr.c |  4 +++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C | 10 ++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 4cd9db33a1a..39787f3f5d5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3845,7 +3845,9 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
  directly for non-aggregates to avoid creating a garbage CONSTRUCTOR.  */
   tree val;
   constexpr_ctx new_ctx;
-  if (CP_AGGREGATE_TYPE_P (elem_type))
+  if (is_really_empty_class (elem_type, /*ignore_vptr*/false))
+return build_constructor (elem_type, NULL);
+  else if (CP_AGGREGATE_TYPE_P (elem_type))
 {
   tree empty_ctor = build_constructor (init_list_type_node, NULL);
   val = digest_init (elem_type, empty_ctor, tf_warning_or_error);
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C
new file mode 100644
index 000..79be244a1d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-empty16.C
@@ -0,0 +1,10 @@
+// PR c++/101194
+// { dg-do compile { target c++11 } }
+
+struct nodefault {
+  constexpr nodefault(int) { }
+};
+
+constexpr nodefault x[1] = { nodefault{0} };
+
+constexpr nodefault y = x[0];
-- 
2.32.0.93.g670b81a890

[PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics

2021-06-29 Thread Paul A. Clarke via Gcc-patches

Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/sse4_1-ptest.c: Copy from
gcc/testsuite/gcc.target/i386.
---
 .../gcc.target/powerpc/sse4_1-ptest-1.c   | 117 ++
 1 file changed, 117 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
new file mode 100644
index ..69d13d57770d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+static int
+make_ptestz (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, z;
+
+  mask.x = m;
+  val.x = v;
+
+  z = 1;
+  for (i = 0; i < 16; i++)
+if ((mask.c[i] & val.c[i]))
+  {
+   z = 0;
+   break;
+  }
+  return z;
+}
+
+static int
+make_ptestc (__m128i m, __m128i v)
+{
+  union
+{
+  __m128i x;
+  unsigned char c[16];
+} val, mask;
+  int i, c;
+
+  mask.x = m;
+  val.x = v;
+
+  c = 1;
+  for (i = 0; i < 16; i++)
+if ((val.c[i] & ~mask.c[i]))
+  {
+   c = 0;
+   break;
+  }
+  return c;
+}
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x;
+  unsigned int i[4];
+} val[4];
+  int i, j, l;
+  int res[32];
+
+  val[0].i[0] = 0x;
+  val[0].i[1] = 0x;
+  val[0].i[2] = 0x;
+  val[0].i[3] = 0x;
+
+  val[1].i[0] = 0x;
+  val[1].i[1] = 0x;
+  val[1].i[2] = 0x;
+  val[1].i[3] = 0x;
+
+  val[2].i[0] = 0;
+  val[2].i[1] = 0;
+  val[2].i[2] = 0;
+  val[2].i[3] = 0;
+
+  val[3].i[0] = 0x;
+  val[3].i[1] = 0x;
+  val[3].i[2] = 0x;
+  val[3].i[3] = 0x;
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   res[l++] = _mm_testz_si128 (val[j].x, val[i].x);
+   res[l++] = _mm_testc_si128 (val[j].x, val[i].x);
+  }
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+for(j = 0; j < 4; j++)
+  {
+   if (res[l++] != make_ptestz (val[j].x, val[i].x))
+ abort ();
+   if (res[l++] != make_ptestc (val[j].x, val[i].x))
+ abort ();
+  }
+
+  if (res[2] != _mm_testz_si128 (val[1].x, val[0].x))
+abort ();
+
+  if (res[3] != _mm_testc_si128 (val[1].x, val[0].x))
+abort ();
+}
-- 
2.27.0

[PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics

2021-06-29 Thread Paul A. Clarke via Gcc-patches

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  

gcc/testsuite/ChangeLog:
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c: Copy
from gcc/testsuite/gcc.target/i386.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
 .../gcc.target/powerpc/sse4_1-blendpd.c   | 89 ++
 .../gcc.target/powerpc/sse4_1-blendps-2.c | 81 +
 .../gcc.target/powerpc/sse4_1-blendps.c   | 90 +++
 .../gcc.target/powerpc/sse4_1-blendvpd.c  | 65 ++
 4 files changed, 325 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index ..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+{
+  __m128d x[NUM];
+  double d[NUM * 2];
+} dst, src1, src2;
+  union
+{
+  __m128d x;
+  double d[2];
+} src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+  if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+   abort ();
+}
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index ..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+{
+  src1[i] = i * i * sign;
+  src2[i] = (i + 20) * sign;
+  sign = -sign;
+}
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if ((MASK & (1 << j)))
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2;
+  union
+{
+  __m128 x;
+  float f[4];
+} src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK); 
+  if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+   abort ();
+}
+
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
new file mode 100644
index ..2f114b69a84b
--- /dev/null
+++ b/gcc/tes

[PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics

2021-06-29 Thread Paul A. Clarke via Gcc-patches

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
_mm_blend_ps, _mm_blendv_ps): New.
---
 gcc/config/rs6000/smmintrin.h | 46 +++
 1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b0000 |
+   (__imm8 & 0b01) * 0b;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+(__imm8 & 0b0100) * 0b1100 |
+(__imm8 & 0b0010) * 0b0110 |
+(__imm8 & 0b0001) * 0b0011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
 {
-- 
2.27.0
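
For readers checking the vector code above against the x86 behaviour, here is a minimal scalar sketch of what the double-precision blends are expected to compute. It is not part of the patch: `Vec2d`, `ref_blend_pd`, `ref_blendv_pd`, and `check_ref_blend` are hypothetical names, and a two-element array stands in for `__m128d`. The immediate form mirrors the `check_blendpd` helper in the accompanying tests; the variable form assumes selection on the sign bit of each mask element, which is what the `vec_cmplt` against zero suggests.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for __m128d: two doubles, element 0 first.
using Vec2d = std::array<double, 2>;

// Models _mm_blend_pd: bit j of imm8 selects b[j] over a[j].
Vec2d ref_blend_pd(const Vec2d& a, const Vec2d& b, int imm8) {
  Vec2d r = a;
  for (int j = 0; j < 2; ++j)
    if (imm8 & (1 << j))
      r[j] = b[j];
  return r;
}

// Models _mm_blendv_pd: the sign bit of mask[j] selects b[j] over a[j].
Vec2d ref_blendv_pd(const Vec2d& a, const Vec2d& b, const Vec2d& mask) {
  Vec2d r = a;
  for (int j = 0; j < 2; ++j)
    if (std::signbit(mask[j]))
      r[j] = b[j];
  return r;
}

// Small self-check showing the intended usage.
void check_ref_blend() {
  const Vec2d a{1.0, 2.0}, b{3.0, 4.0};
  assert((ref_blend_pd(a, b, 0x1) == Vec2d{3.0, 2.0}));
  assert((ref_blend_pd(a, b, 0x3) == b));
  assert((ref_blendv_pd(a, b, Vec2d{-0.0, 1.0}) == Vec2d{3.0, 2.0}));
}
```

The single-precision variants would follow the same shape with four lanes and a four-bit immediate.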

[PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics

2021-06-29 Thread Paul A. Clarke via Gcc-patches

2021-06-29  Paul A. Clarke  

gcc/ChangeLog:
* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros): New.
---
 gcc/config/rs6000/smmintrin.h | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..1b8cad135ed0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
 #endif
-- 
2.27.0
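
As a reading aid for the header additions above, the following is a minimal scalar sketch of the return values the three ptest-style intrinsics are expected to produce. It is not part of the patch: `Bytes16`, `ref_testz`, `ref_testc`, `ref_testnzc`, and `check_ref_ptest` are hypothetical names, and a 16-byte array stands in for `__m128i`. As the comments in the patch note, only the integer result is modelled, not any processor flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __m128i: 16 raw bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Models _mm_testz_si128: 1 when (a & b) has no bit set, else 0.
int ref_testz(const Bytes16& a, const Bytes16& b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if ((a[i] & b[i]) != 0)
      return 0;
  return 1;
}

// Models _mm_testc_si128: 1 when (~a & b) has no bit set, else 0.
int ref_testc(const Bytes16& a, const Bytes16& b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if ((static_cast<std::uint8_t>(~a[i]) & b[i]) != 0)
      return 0;
  return 1;
}

// Models _mm_testnzc_si128: 1 only when both results above are 0.
int ref_testnzc(const Bytes16& a, const Bytes16& b) {
  return ref_testz(a, b) == 0 && ref_testc(a, b) == 0;
}

// Small self-check showing the intended usage.
void check_ref_ptest() {
  Bytes16 zeros{};
  Bytes16 ones;
  ones.fill(0xff);
  assert(ref_testz(zeros, ones) == 1);
  assert(ref_testc(ones, ones) == 1);
  assert(ref_testnzc(ones, ones) == 0);
}
```

Under this model, `_mm_test_all_zeros (a, mask)` corresponds to `ref_testz (a, mask)` and `_mm_test_mix_ones_zeros (a, mask)` to `ref_testnzc (a, mask)`.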

[PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics

2021-06-29 Thread Paul A. Clarke via Gcc-patches

Paul A. Clarke (4):
  rs6000: Add support for SSE4.1 "test" intrinsics
  rs6000: Add tests for SSE4.1 "test" intrinsics
  rs6000: Add support for SSE4.1 "blend" intrinsics
  rs6000: Add tests for SSE4.1 "blend" intrinsics

 gcc/config/rs6000/smmintrin.h |  96 ++
 .../gcc.target/powerpc/sse4_1-blendpd.c   |  89 +
 .../gcc.target/powerpc/sse4_1-blendps-2.c |  81 
 .../gcc.target/powerpc/sse4_1-blendps.c   |  90 ++
 .../gcc.target/powerpc/sse4_1-blendvpd.c  |  65 ++
 .../gcc.target/powerpc/sse4_1-ptest-1.c   | 117 ++
 6 files changed, 538 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

-- 
2.27.0

Go patch committed: In composite literals use temps only for interfaces

2021-06-29 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend reduces the number of temporaries that
the compiler generates for composite literals.  For a composite
literal we only need to introduce a temporary variable if we may be
converting to an interface type, so only do it then.  This saves over
80% of compilation time when using gccgo to compile
cmd/internal/obj/x86, as the GCC middle-end spends a lot of time
pointlessly computing interactions between temporary variables (GCC PR
101064).  This is for that PR and for https://golang.org/issue/46600.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline and GCC 11 branch.

Ian
98df89168805cd34409307ef0495721d69acef74
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index f16fb9facc3..f7bcc8c484a 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-bcafcb3c39530bb325514d6377747eb3127d1a03
+cad187fe3aceb2a7d964b64c70dfa8c8ad24ce65
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 5d45e4baab4..94342b2f9b8 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -15148,7 +15148,7 @@ Struct_construction_expression::do_copy()
 }
 
 // Flatten a struct construction expression.  Store the values into
-// temporaries in case they need interface conversion.
+// temporaries if they may need interface conversion.
 
 Expression*
 Struct_construction_expression::do_flatten(Gogo*, Named_object*,
@@ -15162,10 +15162,13 @@ Struct_construction_expression::do_flatten(Gogo*, 
Named_object*,
 return this;
 
   Location loc = this->location();
+  const Struct_field_list* fields = this->type_->struct_type()->fields();
+  Struct_field_list::const_iterator pf = fields->begin();
   for (Expression_list::iterator pv = this->vals()->begin();
pv != this->vals()->end();
-   ++pv)
+   ++pv, ++pf)
 {
+  go_assert(pf != fields->end());
   if (*pv != NULL)
{
   if ((*pv)->is_error_expression() || (*pv)->type()->is_error_type())
@@ -15173,7 +15176,8 @@ Struct_construction_expression::do_flatten(Gogo*, 
Named_object*,
   go_assert(saw_errors());
   return Expression::make_error(loc);
 }
- if (!(*pv)->is_multi_eval_safe())
+ if (pf->type()->interface_type() != NULL
+ && !(*pv)->is_multi_eval_safe())
{
  Temporary_statement* temp =
Statement::make_temporary(NULL, *pv, loc);
@@ -15448,7 +15452,7 @@ Array_construction_expression::do_check_types(Gogo*)
 }
 
 // Flatten an array construction expression.  Store the values into
-// temporaries in case they need interface conversion.
+// temporaries if they may need interface conversion.
 
 Expression*
 Array_construction_expression::do_flatten(Gogo*, Named_object*,
@@ -15467,6 +15471,11 @@ Array_construction_expression::do_flatten(Gogo*, 
Named_object*,
   if (this->is_constant_array() || this->is_static_initializer())
 return this;
 
+  // If the array element type is not an interface type, we don't need
+  // temporaries.
+  if (this->type_->array_type()->element_type()->interface_type() == NULL)
+return this;
+
   Location loc = this->location();
   for (Expression_list::iterator pv = this->vals()->begin();
pv != this->vals()->end();

Re: [PATCH] Port GCC documentation to Sphinx

2021-06-29 Thread Eli Zaretskii via Gcc-patches

> Date: Tue, 29 Jun 2021 19:57:11 +0300
> From: Eli Zaretskii via Gcc 
> Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, jos...@codesourcery.com
> 
> Or how about this:
> 
>   `Overall Options'
> 
>See Options Controlling the Kind of Output.
> 
>*note -c. *note -S. *note -E. *note -o. ‘`file'’
>*note -dumpbase. ‘`dumpbase'’ *note -dumpbase-ext.
>‘`auxdropsuf'’ *note -dumpdir. ‘`dumppfx'’ ‘-x’ ‘`language'’
>*note -v. *note -###. *note –help.‘[=`class'[,...]]’
>*note –target-help. *note –version. *note -pass-exit-codes
>. *note -pipe. *note -specs.‘=`file'’ *note -wrapper
>.‘@`file'’ *note -ffile-prefix-map.‘=`old'=`new'’ *note
>-fplugin.‘=`file'’ ‘-fplugin-arg-’‘`name'=`arg'’
>‘-fdump-ada-spec’‘[-`slim']’ *note -fada-spec-parent.‘=`unit'’
>*note -fdump-go-spec.‘=`file'’

I see that when I copied text into the mail, the "see" that Emacs
displays got replaced by "*note" (which is what actually appears in
the Info file).  So if you want to understand my references to the
ubiquitous "see", imagine that each "*note" is displayed as "see".

Apologies for any confusion.

[PATCH 1/2] c++: Fix push_access_scope and introduce RAII wrapper for it

2021-06-29 Thread Patrick Palka via Gcc-patches

When push_access_scope is passed a TYPE_DECL for a class type (which
can happen during e.g. satisfaction), we undesirably push only the
enclosing context of the class instead of the class itself.  This causes
us to mishandle e.g. testcase below due to us not entering the scope of
A before checking its constraints.

This patch adjusts push_access_scope accordingly, and introduces an
RAII wrapper for it.  We also make use of this wrapper right away by
replacing the only use of push_nested_class_guard with this new wrapper,
which means we can remove this old wrapper (whose functionality is
basically subsumed by the new wrapper).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?
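
The guard described above follows the usual RAII shape: push in the constructor only when the argument qualifies, remember whether a push happened, and pop in the destructor only in that case. A minimal self-contained sketch of that pattern, outside of GCC, might look like the following. Everything in it is hypothetical (`Decl`, `Kind`, `scope_stack`, `push_scope`, `pop_scope`, `scope_guard`, `check_scope_guard`); it only illustrates the control flow and does not use the compiler's data structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for a declaration and the compiler's scope stack.
enum class Kind { Function, Variable, Type, Other };
struct Decl {
  Kind kind;
  std::string scope;
};
static std::vector<std::string> scope_stack;

void push_scope(const Decl& d) { scope_stack.push_back(d.scope); }
void pop_scope(const Decl&) { scope_stack.pop_back(); }

// RAII guard: pushes only for function, variable, or type declarations,
// and pops in the destructor only if the constructor pushed.
struct scope_guard {
  const Decl* decl;
  explicit scope_guard(const Decl& d) : decl(&d) {
    if (d.kind != Kind::Other)
      push_scope(d);
    else
      decl = nullptr;  // nothing pushed, so nothing to pop later
  }
  ~scope_guard() {
    if (decl)
      pop_scope(*decl);
  }
  scope_guard(const scope_guard&) = delete;
  scope_guard& operator=(const scope_guard&) = delete;
};

// Small self-check showing the intended usage.
void check_scope_guard() {
  const Decl a{Kind::Type, "A"};
  const Decl other{Kind::Other, "x"};
  {
    scope_guard g(a);
    assert(scope_stack.size() == 1 && scope_stack.back() == "A");
    {
      scope_guard g2(other);  // not a qualifying declaration: no push
      assert(scope_stack.size() == 1);
    }
    assert(scope_stack.size() == 1);
  }
  assert(scope_stack.empty());
}
```

Storing the declaration (rather than a bool) lets the destructor hand the same argument back to the pop routine, which matters when the pop needs to repeat the test the push made.
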

gcc/cp/ChangeLog:

* constraint.cc (get_normalized_constraints_from_decl): Use
push_access_scope_guard instead of push_nested_class_guard.
* cp-tree.h (struct push_nested_class_guard): Replace with ...
(struct push_access_scope_guard): ... this.
* pt.c (push_access_scope): When the argument corresponds to
a class type, push the class instead of its context.
(pop_access_scope): Adjust accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-access2.C: New test.
---
 gcc/cp/constraint.cc  |  7 +-
 gcc/cp/cp-tree.h  | 23 +++
 gcc/cp/pt.c   |  9 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-access2.C | 13 +++
 4 files changed, 35 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-access2.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 6df3ca6ce32..99d3ccc6998 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -926,12 +926,7 @@ get_normalized_constraints_from_decl (tree d, bool diag = 
false)
   tree norm = NULL_TREE;
   if (tree ci = get_constraints (decl))
 {
-  push_nested_class_guard pncs (DECL_CONTEXT (d));
-
-  temp_override ovr (current_function_decl);
-  if (TREE_CODE (decl) == FUNCTION_DECL)
-   current_function_decl = decl;
-
+  push_access_scope_guard pas (decl);
   norm = get_normalized_constraints_from_info (ci, tmpl, diag);
 }
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6f713719589..58da7460001 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8463,21 +8463,24 @@ is_constrained_auto (const_tree t)
   return is_auto (t) && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t);
 }
 
-/* RAII class to push/pop class scope T; if T is not a class, do nothing.  */
+/* RAII class to push/pop the access scope for T.  */
 
-struct push_nested_class_guard
+struct push_access_scope_guard
 {
-  bool push;
-  push_nested_class_guard (tree t)
-: push (t && CLASS_TYPE_P (t))
+  tree decl;
+  push_access_scope_guard (tree t)
+: decl (t)
   {
-if (push)
-  push_nested_class (t);
+if (VAR_OR_FUNCTION_DECL_P (decl)
+   || TREE_CODE (decl) == TYPE_DECL)
+  push_access_scope (decl);
+else
+  decl = NULL_TREE;
   }
-  ~push_nested_class_guard ()
+  ~push_access_scope_guard ()
   {
-if (push)
-  pop_nested_class ();
+if (decl)
+  pop_access_scope (decl);
   }
 };
 
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f2039e09cd7..bd8b17ca047 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -224,7 +224,7 @@ static void instantiate_body (tree pattern, tree args, tree 
d, bool nested);
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
template, VAR_DECL for static member variable, or TYPE_DECL for
-   alias template (needed by instantiate_decl).  */
+   for a class or alias template (needed by instantiate_decl).  */
 
 void
 push_access_scope (tree t)
@@ -234,6 +234,10 @@ push_access_scope (tree t)
 
   if (DECL_FRIEND_CONTEXT (t))
 push_nested_class (DECL_FRIEND_CONTEXT (t));
+  else if (TREE_CODE (t) == TYPE_DECL
+  && CLASS_TYPE_P (TREE_TYPE (t))
+  && DECL_ORIGINAL_TYPE (t) == NULL_TREE)
+push_nested_class (TREE_TYPE (t));
   else if (DECL_CLASS_SCOPE_P (t))
 push_nested_class (DECL_CONTEXT (t));
   else if (deduction_guide_p (t) && DECL_ARTIFICIAL (t))
@@ -260,6 +264,9 @@ pop_access_scope (tree t)
 current_function_decl = saved_access_scope->pop();
 
   if (DECL_FRIEND_CONTEXT (t)
+  || (TREE_CODE (t) == TYPE_DECL
+ && CLASS_TYPE_P (TREE_TYPE (t))
+ && DECL_ORIGINAL_TYPE (t) == NULL_TREE)
   || DECL_CLASS_SCOPE_P (t)
   || (deduction_guide_p (t) && DECL_ARTIFICIAL (t)))
 pop_nested_class ();
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-access2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-access2.C
new file mode 100644
index 000..8ddcad236e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-access2.C
@@ -0,0 +1,13 @@
+// { dg-do compile { target concepts } }
+
+template requires T::value struct A { };
+template requires T::value st

[PATCH 2/2] c++: Extend PR96204 fix to variable templates

2021-06-29 Thread Patrick Palka via Gcc-patches

r12-1829 corrected the access scope during partial specialization
matching of class templates, but neglected the variable template case.
This patch moves the access scope adjustment to inside
most_specialized_partial_spec, so that all callers can benefit.
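
As an illustration of why moving the adjustment into the callee helps, here is a minimal RAII sketch of the guard idea.  It is not GCC code: scope_stack, scope_guard_sketch and depth_while_guarded are hypothetical names, and a plain string stands in for the tree node.  The guard pushes only when it has something to push and pops only if it pushed, so every caller of the guarded function gets the same scope handling without repeating it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the compiler's access-scope stack.
static std::vector<std::string> scope_stack;

// Pushes a scope on construction (only for a non-empty name) and pops
// it on destruction, but only if the constructor actually pushed.
struct scope_guard_sketch
{
  bool pushed;

  explicit scope_guard_sketch (const std::string &name)
    : pushed (!name.empty ())
  {
    if (pushed)
      scope_stack.push_back (name);
  }

  ~scope_guard_sketch ()
  {
    if (pushed)
      {
        assert (!scope_stack.empty ());
        scope_stack.pop_back ();
      }
  }

  // A guard owns exactly one push, so copying it makes no sense.
  scope_guard_sketch (const scope_guard_sketch &) = delete;
  scope_guard_sketch &operator= (const scope_guard_sketch &) = delete;
};

// The guard lives inside the callee, so callers need no push/pop pair.
std::size_t
depth_while_guarded (const std::string &name)
{
  scope_guard_sketch guard (name);
  return scope_stack.size ();
}
```

With the guard inside the function, a caller cannot forget the matching pop, which is the property the patch relies on when it removes the explicit push/pop pair from instantiate_class_template_1.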

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/96204

gcc/cp/ChangeLog:

* pt.c (instantiate_class_template_1): Remove call to
push_nested_class and pop_nested_class added by r12-1829.
(most_specialized_partial_spec): Use push_access_scope_guard
and deferring_access_check_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/template/access40b.C: New test.
---
 gcc/cp/pt.c   | 12 +++
 gcc/testsuite/g++.dg/template/access40b.C | 26 +++
 2 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access40b.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bd8b17ca047..1e2e2ba5329 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11776,11 +11776,8 @@ instantiate_class_template_1 (tree type)
   deferring_access_check_sentinel acs (dk_no_deferred);
 
   /* Determine what specialization of the original template to
- instantiate; do this relative to the scope of the class for
- sake of access checking.  */
-  push_nested_class (type);
+ instantiate.  */
   t = most_specialized_partial_spec (type, tf_warning_or_error);
-  pop_nested_class ();
   if (t == error_mark_node)
 return error_mark_node;
   else if (t)
@@ -24989,26 +24986,33 @@ most_specialized_partial_spec (tree target, tsubst_flags_t complain)
   tree outer_args = NULL_TREE;
   tree tmpl, args;
 
+  tree decl;
   if (TYPE_P (target))
 {
   tree tinfo = CLASSTYPE_TEMPLATE_INFO (target);
   tmpl = TI_TEMPLATE (tinfo);
   args = TI_ARGS (tinfo);
+  decl = TYPE_NAME (target);
 }
   else if (TREE_CODE (target) == TEMPLATE_ID_EXPR)
 {
   tmpl = TREE_OPERAND (target, 0);
   args = TREE_OPERAND (target, 1);
+  decl = DECL_TEMPLATE_RESULT (tmpl);
 }
   else if (VAR_P (target))
 {
   tree tinfo = DECL_TEMPLATE_INFO (target);
   tmpl = TI_TEMPLATE (tinfo);
   args = TI_ARGS (tinfo);
+  decl = target;
 }
   else
 gcc_unreachable ();
 
+  push_access_scope_guard pas (decl);
+  deferring_access_check_sentinel acs (dk_no_deferred);
+
   tree main_tmpl = most_general_template (tmpl);
 
   /* For determining which partial specialization to use, only the
diff --git a/gcc/testsuite/g++.dg/template/access40b.C b/gcc/testsuite/g++.dg/template/access40b.C
new file mode 100644
index 000..040e3d18096
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access40b.C
@@ -0,0 +1,26 @@
+// PR c++/96204
+// { dg-do compile { target c++14 } }
+// A variant of access40.C where has_type_member is a variable template instead
+// of a class template.
+
+template
+constexpr bool has_type_member = false;
+
+template
+constexpr bool has_type_member = true;
+
+struct Parent;
+
+struct Child {
+private:
+  friend struct Parent;
+  typedef void type;
+};
+
+struct Parent {
+  static void f() {
+// The partial specialization does not match despite Child::type
+// being accessible from the current scope.
+static_assert(!has_type_member, "");
+  }
+};
-- 
2.32.0.93.g670b81a890

[PATCH] i386: Add V2SFmode vec_addsub pattern [PR95046]

2021-06-29 Thread Uros Bizjak via Gcc-patches

gcc/

2021-06-21  Uroš Bizjak  

PR target/95046
* config/i386/mmx.md (vec_addsubv2sf3): New insn pattern.

gcc/testsuite/

2021-06-21  Uroš Bizjak  

PR target/95046
* gcc.target/i386/pr95046-9.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e887f03474d..5f10572718d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -788,6 +788,24 @@ (define_insn "*mmx_haddsubv2sf3"
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
+(define_insn "vec_addsubv2sf3"
+  [(set (match_operand:V2SF 0 "register_operand" "=x,x")
+   (vec_merge:V2SF
+ (minus:V2SF
+   (match_operand:V2SF 1 "register_operand" "0,x")
+   (match_operand:V2SF 2 "register_operand" "x,x"))
+ (plus:V2SF (match_dup 1) (match_dup 2))
+ (const_int 1)))]
+  "TARGET_SSE3 && TARGET_MMX_WITH_SSE"
+  "@
+   addsubps\t{%2, %0|%0, %2}
+   vaddsubps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "prefix_rep" "1,*")
+   (set_attr "mode" "V4SF")])
+
 ;
 ;;
 ;; Parallel single-precision floating point comparisons
diff --git a/gcc/testsuite/gcc.target/i386/pr95046-9.c b/gcc/testsuite/gcc.target/i386/pr95046-9.c
new file mode 100644
index 000..54e948ccfb0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr95046-9.c
@@ -0,0 +1,14 @@
+/* PR target/95046 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O3 -msse3" } */
+
+float r[2], a[2], b[2];
+
+void
+test (void)
+{
+  r[0] = a[0] - b[0];
+  r[1] = a[1] + b[1];
+}
+
+/* { dg-final { scan-assembler "\tv?addsubps" } } */
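
For readers less used to RTL, the vec_merge in the new pattern can be modelled in scalar C++.  This is only a sketch under the usual reading of vec_merge (bit i of the mask set means element i comes from the first operand); vec_merge_model and addsub_v2sf are hypothetical names, not part of the patch.

```cpp
#include <array>
#include <cstddef>

// Scalar model of (vec_merge X Y MASK): element i is taken from X when
// bit i of MASK is set, otherwise from Y.
template <std::size_t N>
std::array<float, N>
vec_merge_model (const std::array<float, N> &x,
                 const std::array<float, N> &y,
                 unsigned mask)
{
  std::array<float, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = ((mask >> i) & 1u) ? x[i] : y[i];
  return r;
}

// Model of the V2SF pattern: (vec_merge (minus a b) (plus a b) 1),
// i.e. element 0 is a - b and element 1 is a + b.
std::array<float, 2>
addsub_v2sf (const std::array<float, 2> &a, const std::array<float, 2> &b)
{
  std::array<float, 2> diff = { a[0] - b[0], a[1] - b[1] };
  std::array<float, 2> sum = { a[0] + b[0], a[1] + b[1] };
  return vec_merge_model<2> (diff, sum, 1u);
}
```

This matches the shape of the test above: r[0] = a[0] - b[0] and r[1] = a[1] + b[1].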

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Martin Sebor via Gcc-patches


On 6/29/21 8:43 AM, Jason Merrill wrote:

On 6/28/21 2:07 PM, Martin Sebor wrote:

On 6/28/21 2:07 AM, Richard Biener wrote:

On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:


On 6/25/21 4:11 PM, Jason Merrill wrote:

On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in containers
that expect copyable and assignable element types and passes bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the properties
expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction I suggest to add
another "noncopyable" type and make that property explicit in its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).  Looking
around I see that vec<> does not do deep copying.  Making auto_vec<> do
it might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're
writing C++11.

The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion
richi mentions.  And many uses of auto_vec will pass them as vec,
which will still do a shallow copy.  I think it's probably better
to disable the copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing
an auto_vec by value as a vec, and thus doing a shallow copy, would
be to add a vec ctor taking an auto_vec, and delete that.  This would
mean if you want to pass an auto_vec to a vec interface, it needs to
be by reference.  We might as well do the same for operator=, though
that isn't as important.
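
The deleted-constructor idea can be sketched outside of GCC as follows.  The types are hypothetical stand-ins (shallow_vec for vec<>, owning_vec for auto_vec) and greatly simplified; the only point is that a deleted converting constructor wins overload resolution over the base-class copy constructor, so by-value conversion is rejected while by-reference passing still works.

```cpp
#include <cstddef>
#include <type_traits>

template <typename T> struct owning_vec;

// Hypothetical stand-in for vec<>: a shallow handle that does not own
// the storage it points to.
template <typename T>
struct shallow_vec
{
  T *data = nullptr;
  std::size_t len = 0;

  shallow_vec () = default;

  // Deleted on purpose: binding an owning_vec here is an exact match,
  // so it is preferred over the implicit copy constructor (which would
  // need a derived-to-base conversion) and the by-value copy is rejected.
  shallow_vec (const owning_vec<T> &) = delete;
};

// Hypothetical stand-in for auto_vec: owns and frees its storage.
template <typename T>
struct owning_vec : shallow_vec<T>
{
  explicit owning_vec (std::size_t n)
  {
    this->data = new T[n] ();
    this->len = n;
  }
  ~owning_vec () { delete[] this->data; }

  owning_vec (const owning_vec &) = delete;
  owning_vec &operator= (const owning_vec &) = delete;
};

// Interfaces taking the shallow type by reference keep working.
std::size_t
length_of (const shallow_vec<int> &v)
{
  return v.len;
}

// By-value conversion is refused at compile time ...
static_assert (!std::is_convertible<const owning_vec<int> &,
                                    shallow_vec<int>>::value,
               "shallow copy of an owning vector must not compile");
// ... while binding a reference to the base is still allowed.
static_assert (std::is_convertible<owning_vec<int> &,
                                   const shallow_vec<int> &>::value,
               "passing by reference must keep working");
```

Whether the real vec/auto_vec hierarchy can adopt this unchanged depends on details not shown here, such as vec's use in unions discussed earlier in the thread.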


Thanks, that sounds like a good idea.  Attached is an im

Re: [PATCH] Port GCC documentation to Sphinx

2021-06-29 Thread Eli Zaretskii via Gcc-patches

> From: Martin Liška 
> Date: Tue, 29 Jun 2021 12:09:23 +0200
> Cc: GCC Development , gcc-patches@gcc.gnu.org
> 
> On 6/28/21 5:33 PM, Joseph Myers wrote:
> > Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
> > version also available for review?
> 
> I've just uploaded them here:
> https://splichal.eu/gccsphinx-final/

Thanks.

I'm an Info junkie, so I grabbed gcc.info from there and skimmed
through it.  Please allow me a few unsolicited comments:

1. It sounds like Sphinx is heavily biased towards HTML format, and as
   a result uglifies the Info format?

For example, many cross-references (AFAIU introduced as part of
migration to Sphinx) make the text illegible in Emacs.  Example:

  This standard, in both its forms, is commonly known as `C89', or
  occasionally as `C90', from the dates of ratification.  To select this
  standard in GCC, use one of the options *note -ansi *note -std
  .‘=c90’ or *note -std.‘=iso9899:1990’; to obtain all the diagnostics
  required by the standard, you should also specify *note -pedantic.
  (or *note -pedantic-errors. if you want them to be errors rather
  than warnings).  See *note Options Controlling C Dialect.
  [...]
  An amendment to the 1990 standard was published in 1995.  This amendment
  added digraphs and ‘__STDC_VERSION__’ to the language, but otherwise
  concerned the library.  This amendment is commonly known as `AMD1'; the
  amended standard is sometimes known as `C94' or `C95'.  To select this
  standard in GCC, use the option *note -std.‘=iso9899:199409’ (with,
  as for other standard versions, *note -pedantic. to receive all
  required diagnostics).

Or how about this:

  `Overall Options'

   See Options Controlling the Kind of Output.

   *note -c. *note -S. *note -E. *note -o. ‘`file'’
   *note -dumpbase. ‘`dumpbase'’ *note -dumpbase-ext.
   ‘`auxdropsuf'’ *note -dumpdir. ‘`dumppfx'’ ‘-x’ ‘`language'’
   *note -v. *note -###. *note –help.‘[=`class'[,...]]’
   *note –target-help. *note –version. *note -pass-exit-codes
   . *note -pipe. *note -specs.‘=`file'’ *note -wrapper
   .‘@`file'’ *note -ffile-prefix-map.‘=`old'=`new'’ *note
   -fplugin.‘=`file'’ ‘-fplugin-arg-’‘`name'=`arg'’
   ‘-fdump-ada-spec’‘[-`slim']’ *note -fada-spec-parent.‘=`unit'’
   *note -fdump-go-spec.‘=`file'’

In the first line, the emphasis became quotes, which sounds sub-optimal.
In the second line, the hyperlink was lost.
And the rest is not really readable.

Compare this with the original:

  _Overall Options_
   *Note Options Controlling the Kind of Output.
-c  -S  -E  -o FILE  -x LANGUAGE
-v  -###  --help[=CLASS[,...]]  --target-help  --version
-pass-exit-codes  -pipe  -specs=FILE  -wrapper
@FILE  -ffile-prefix-map=OLD=NEW
-fplugin=FILE  -fplugin-arg-NAME=ARG
-fdump-ada-spec[-slim]  -fada-spec-parent=UNIT  -fdump-go-spec=FILE

(Admittedly, Emacs by default hides some of the text of a
cross-reference, but not hiding them in this case produces an even
less legible text.)

In general, it is a well-known rule that Texinfo documentation should
NOT use @ref{foo} as if @ref will disappear without a trace, leaving
just the hyperlink to 'foo'.  Looks like the rewritten manual uses
that a lot.

This "see" consistently gets in the way throughout the entire manual.
A few more examples:

   -- Option: -flocal-ivars

   Default option value for *note -fno-local-ivars.
   ...
   For example *note -std.‘=gnu90 -Wpedantic’ warns about C++ style
   ‘//’ comments, while *note -std.‘=gnu99 -Wpedantic’ does not.
   ...
   If this option is not provided but *note -Wabi.‘=`n'’ is, that
   version is used for compatibility aliases.
   ...
   Below *note -std.‘=c++20’, *note -fconcepts. enables
   support for the C++ Extensions for Concepts Technical
   Specification, ISO 19217 (2015).
   ...
  gcov [ *note -v. | *note –version. ] [ ‘-h’ | *note –help. ]

2. The translation of @var produces double-quoting in Info, here's an
   example:

  The usual way to run GCC is to run the executable called ‘gcc’, or
  ‘`machine'-gcc’ when cross-compiling, or ‘`machine'-gcc-`version'’ to
  run a specific version of GCC.

vs, the old

   The usual way to run GCC is to run the executable called 'gcc', or
  'MACHINE-gcc' when cross-compiling, or 'MACHINE-gcc-VERSION' to run a
  specific version of GCC.

I think the new variant is less readable and more confusing, because
it isn't clear whether the quotes are part of the text.  Here's an
extreme example:

  ‘@`file'’

   Read command-line options from ‘`file'’.  The options read are
   inserted in place of the original ‘@`file'’ option.  If ‘`file'’
   does not exist, or cannot be read, then the option will be treated
   literally, and not removed.

3. Some cross-references lost the hyperlinks:

  See option-index, for an index to GCC’s options.

  ("option-index" was a hyperlink

Re: [wwwdocs] gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)

2021-06-29 Thread Julian Brown

On Tue, 29 Jun 2021 17:34:00 +0200
Tobias Burnus  wrote:

> This documents AMD GCN's new much-more complete TI-mode
> (__int128_t) support, that was as v2 just posted by Julian
> and should get committed very soon.

Thank you!

> gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index b854c4e6..599443e7 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -62,6 +62,14 @@ a work-in-progress.
>OpenACC. It warns about potentially suboptimal choices related
> to OpenACC parallelism.
>
> +  The offload target code generation for OpenMP and OpenACC can
> now
> +  be better adjusted using the new

Re: GCC documentation: porting to Sphinx

2021-06-29 Thread Arnaud Charlet

> >In particular can you explain the motivation behind all the changes in the
> >gcc/ada/doc directory?
> 
> Sure:
> 1) All Sphinx manuals live in a directory where index page is called index.rst.
> That's why I moved e.g. this: gcc/ada/doc/{gnat_rm.rst => gnat_rm/index.rst}
> 2) I moved latex_elements.py to ada_latex_elements.py as it clashes with
> Sphinx global variable you modify in Sphinx config files
> 3) I created a shared Ada config (adabaseconf.py) that extends doc/baseconf.py
> and sets what is shared in between 3 Ada manuals.
> 4) gnu_free_documentation_license.rst is taken from $root/doc/

OK, this is really lots of changes, if we could minimize these changes
that would be best (and sorry but posting a link to a tarball also doesn't
help reviews, it was actually better with a link to a git repo previously...
At least the Ada part itself shouldn't be too big in particular once
simplified so could be posted standalone).

Arno

Re: [PATCH v3] fixinc: don't "fix" machine names in __has_include(...) [PR91085]

2021-06-29 Thread Bruce Korb via Gcc-patches


On 6/28/21 10:26 PM, Xi Ruoyao wrote:

v3:
   use memmem/memchr instead of trivial loops
   split most of the logic into a static function
   avoid hardcoded magic number
   adjust test

Looks good to me. :)

[wwwdocs] gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)

2021-06-29 Thread Tobias Burnus


This documents AMD GCN's new, much more complete TI-mode
(__int128_t) support, v2 of which was just posted by Julian
and should get committed very soon.

Additionally, -foffload= (previously undocumented) has been
split into -foffload= and -foffload-options= and is now
documented.  Hence, both flags are now in the release notes,
linking to their documentation.  (The links are broken until
the next cron run.)

Comments? Concerns? Remarks?

Tobias

gcc-12/changes.html: GCN - add TI mode, mention -foffload(-options)
diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index b854c4e6..599443e7 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -62,6 +62,14 @@ a work-in-progress.
   OpenACC. It warns about potentially suboptimal choices related to
   OpenACC parallelism.
   
+  The offload target code generation for OpenMP and OpenACC can now
+  be better adjusted using the new https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options";
+  >-foffload-options= flag and the pre-existing but now
+  documented https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload";
+  >-foffload= flag.
+  
 
 
 
@@ -104,6 +112,8 @@ a work-in-progress.
 AMD Radeon (GCN)
 
   Debug experience with ROCGDB has been improved.
+  Support for the type __int128_t/integer(kind=16)
+  was added.

Re: [PATCH 4/7] Allow match-and-simplified phiopt to run in early phiopt

2021-06-29 Thread Jeff Law via Gcc-patches





On 6/25/2021 2:24 AM, Richard Biener wrote:

On Thu, Jun 24, 2021 at 6:24 PM Jeff Law via Gcc-patches
 wrote:



On 6/23/2021 4:19 PM, apinski--- via Gcc-patches wrote:

From: Andrew Pinski 

To move a few things more to match-and-simplify from phiopt,
we need to allow match_simplify_replacement to run in early
phiopt. To do this we add a replacement for gimple_simplify
that is explicitly for phiopt.

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

   * tree-ssa-phiopt.c (match_simplify_replacement):
   Add early_p argument. Call gimple_simplify_phiopt
   instead of gimple_simplify.
   (tree_ssa_phiopt_worker): Update call to
   match_simplify_replacement and allow unconditionally.
   (phiopt_early_allow): New function.
   (gimple_simplify_phiopt): New function.

So the two questions on my end are why did we restrict when this could
run before and why restrict the codes we're willing to optimize in the
early phase?  Not an ACK or NAK at this point, just trying to understand
a bit of the history.

I've done this because jump threading likes to see the CFG structure
and some of the testcases use if () return 0/1 which are prone to be
value-replaced by phiopt.  At the point I added the early phiopt I
didn't want to go and fixup all the testcases to avoid the phiopt transforms
nor did I want to investigate the "real" impact on code - the purpose
of early phiopt was exactly to get min/max/abs replacement done so
that was the way of least resistance ...

I'd rather not revisit this decision as part of the match-and-simplify
series but of course if anyone dares to do the detective work she'll be
welcome.
Thanks for the background.   So Andrew is largely just preserving this 
property.  Works for me.   And just to be explicit 4/7 is fine for the 
trunk.


jeff
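
To make the shapes under discussion concrete, here is a source-level sketch in C++ (not GIMPLE, and not taken from the patch; all four function names are hypothetical).  The branchy forms are the diamonds that phiopt sees; the folded forms are what a min replacement or a 0/1 value replacement amounts to.

```cpp
#include <algorithm>

// Branchy min: two arms joined by a PHI at the merge point.
int
min_branchy (int a, int b)
{
  int r;
  if (a < b)
    r = a;
  else
    r = b;
  return r;
}

// The same computation once the PHI is replaced by a min operation.
int
min_folded (int a, int b)
{
  return std::min (a, b);
}

// The "if () return 0/1" shape mentioned above: it keeps a visible
// branch in the CFG, which jump threading can still look at.
int
flag_branchy (int x)
{
  if (x != 0)
    return 1;
  return 0;
}

// After value replacement the branch is gone and only the comparison
// result remains.
int
flag_folded (int x)
{
  return x != 0;
}
```

Each pair computes the same value; the difference is only whether the control-flow structure survives to later passes.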

Re: [PATCH] Rearrange detection of temporary directory for NetBSD

2021-06-29 Thread Jeff Law via Gcc-patches





On 6/28/2021 4:45 PM, Gerald Pfeifer wrote:

On Thu, 26 Mar 2020, Kamil Rytarowski wrote:

On 25.03.2020 23:36, Jeff Law wrote:

I wouldn't mind dropping /usr/tmp.  That's so antiquated that it'd be
non-controversial.  Can you send that as a separate patch?

Behavior for !__NetBSD__ is out of interest.

This is not a very useful approach in a collaborative project like GCC.

Incremental changes (including cleanups) help and are a good way to get
engaged, improve the overall code base, and gain support from others
(who may not have any interest in the __NetBSD__ case, but be willing
to collaborate).

@Jeff, is the following what you had in mind?

It passed testing on i686-unknown-freebsd12; okay to push?

Gerald


commit 8365565396cee65aeb6c2e4bfad74e095a3c388c
Author: Gerald Pfeifer 
Date:   Tue Jun 29 00:39:15 2021 +0200

 libiberty: No longer use /usr/tmp
 
 /usr/tmp is antiquated and not present on decently modern systems.
 Remove it from consideration when choosing a directory for temporary
 files.
 
 libiberty:
 
 2021-06-29  Gerald Pfeifer  
 
 * make-temp-file.c (usrtmp): Remove.

 (choose_tmpdir): Remove use of usrtmp.
Yup.  This is fine.  You might consider updating the comment which 
references /usr/tmp in choose_tmpdir along the way.

jeff
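
For context, the kind of fallback chain being trimmed can be sketched as below.  This is not the libiberty implementation: choose_tmpdir_sketch and usable_dir are hypothetical names, and the candidate list and its order are assumptions made for illustration only.  The sketch simply shows a first-usable-directory search with no /usr/tmp entry.

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Returns DIR if it is non-null and we may read, write and search it,
// otherwise a null pointer.
static const char *
usable_dir (const char *dir)
{
  if (dir != nullptr && access (dir, R_OK | W_OK | X_OK) == 0)
    return dir;
  return nullptr;
}

// Picks the first usable candidate: environment variables first, then
// fixed system directories, then the current directory as a last resort.
std::string
choose_tmpdir_sketch ()
{
  const char *candidates[] = {
    std::getenv ("TMPDIR"),
    std::getenv ("TMP"),
    std::getenv ("TEMP"),
    "/var/tmp",
    "/tmp",
  };

  for (const char *candidate : candidates)
    if (const char *dir = usable_dir (candidate))
      return dir;

  return ".";
}
```

Dropping one fixed entry from such a list changes behaviour only on systems where every earlier candidate is unusable, which is why the removal is uncontroversial.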

Re: [COMMITTED V10 3/7] CTF/BTF debug formats

2021-06-29 Thread Joseph Myers

On Tue, 29 Jun 2021, David Edelsohn via Gcc-patches wrote:

> On Tue, Jun 29, 2021 at 10:33 AM Joseph Myers  wrote:
> >
> > There's now a build failure for sparc64-linux-gnu:
> >
> > In file included from ./tm_p.h:4:0,
> >  from /scratch/jmyers/glibc-bot/src/gcc/gcc/ctfout.c:24:
> > /scratch/jmyers/glibc-bot/src/gcc/gcc/config/sparc/sparc-protos.h:46:47: error: use of enum 'memmodel' without previous declaration
> >  extern void sparc_emit_membar_for_model (enum memmodel, int, int);
> >^
> >
> > (and likewise in btfout.c).
> 
> I see memmodel.h included before tm_p.h.  Do you want to commit that,
> or should I?

I've committed this patch.

> Are you aware of other dependencies for other targets?

This is the only failure I've seen with my build-many-glibcs.py bot, but 
that only covers architectures supported by glibc.


bootstrap: Include memmodel.h in btfout.c and ctfout.c before tm_p.h

This fixes a "use of enum 'memmodel' without previous declaration"
error in sparc-protos.h.

Minimally tested that this fixes the build-many-glibcs.py compilers
build for sparc64-linux-gnu.

* btfout.c, ctfout.c: Include "memmodel.h".

diff --git a/gcc/btfout.c b/gcc/btfout.c
index 2316dea5f27..e58c969825a 100644
--- a/gcc/btfout.c
+++ b/gcc/btfout.c
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
+#include "memmodel.h"
 #include "tm_p.h"
 #include "output.h"
 #include "dwarf2asm.h"
diff --git a/gcc/ctfout.c b/gcc/ctfout.c
index 71d7a62e6ef..682d8529a58 100644
--- a/gcc/ctfout.c
+++ b/gcc/ctfout.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
+#include "memmodel.h"
 #include "tm_p.h"
 #include "output.h"
 #include "dwarf2asm.h"

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders

2021-06-29 Thread Julian Brown

On Fri, 18 Jun 2021 15:55:09 +0100
Andrew Stubbs  wrote:

> On 18/06/2021 15:19, Julian Brown wrote:
> > This patch improves 64-bit multiplication for AMD GCN: patterns for
> > unsigned and signed 32x32->64 bit multiplication have been added,
> > and also 64x64->64 bit multiplication is now open-coded rather than
> > calling a library function (which may be a win for code size as
> > well as speed: the function calling sequence isn't particularly
> > concise for GCN).
> > 
> > The mulsi3_highpart pattern has also been extended for GCN5+,
> > since that ISA version supports high-part result multiply
> > instructions with SGPR operands.
> > 
> > The DImode multiply implementation is lost from libgcc if we build
> > it for DImode/TImode rather than SImode/DImode, a change we make in
> > a later patch in this series.

[snip]

> Most of the rest of the backend expands 64-bit operations to 32-bit 
> pairs much later, using define_insn_and_split, because there were
> lots of issues with splitting it early. I don't recall exactly what
> right now, unfortunately. (It might have been related to spilling
> only half the value to the stack?) It also makes it hard to debug, I
> think.

FTR, I followed up on this here:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573911.html

Julian

[PATCH 3/3] amdgcn: Add [us]mulsid3/muldi3 patterns

2021-06-29 Thread Julian Brown

This patch improves 64-bit multiplication for AMD GCN: patterns for
unsigned and signed 32x32->64 bit multiplication have been added, and
also 64x64->64 bit multiplication is now open-coded rather than calling
a library function (which may be a win for code size as well as speed:
the function calling sequence isn't particularly concise for GCN).

This version of the patch uses define_insn_and_split in order to keep
multiply operations together during RTL optimisations up to register
allocation: judging by inspection of small test cases, this appears to
produce more compact code than the previous approach using an expander.
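
For reference, the 64x64->64 split can be modelled in plain C++.  This is only a sketch: muldi3_sketch is a hypothetical name, and the three steps are meant to mirror the widening-multiply, multiply and add sequence emitted by the muldi3 splitter, not to reproduce the generated GCN code.

```cpp
#include <cstdint>

// 64x64->64 multiply built from 32-bit pieces:
//   a * b = alo*blo + ((alo*bhi + ahi*blo) << 32)   (mod 2^64)
// The ahi*bhi term only affects bits 64 and up, so it is dropped.
std::uint64_t
muldi3_sketch (std::uint64_t a, std::uint64_t b)
{
  std::uint32_t alo = static_cast<std::uint32_t> (a);
  std::uint32_t ahi = static_cast<std::uint32_t> (a >> 32);
  std::uint32_t blo = static_cast<std::uint32_t> (b);
  std::uint32_t bhi = static_cast<std::uint32_t> (b >> 32);

  // Unsigned widening multiply of the low halves (the umulsidi3 step).
  std::uint64_t result = static_cast<std::uint64_t> (alo) * blo;
  std::uint32_t lo = static_cast<std::uint32_t> (result);
  std::uint32_t hi = static_cast<std::uint32_t> (result >> 32);

  // Two 32-bit cross products, each added into the high half; only
  // their low 32 bits matter, so wrap-around is intended.
  hi += alo * bhi;
  hi += ahi * blo;

  return (static_cast<std::uint64_t> (hi) << 32) | lo;
}
```

The result agrees with the native 64-bit product for all inputs, since unsigned arithmetic wraps modulo 2^64.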

I will apply shortly.

Julian

2021-06-29  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsidi3, mulsidi3_reg, mulsidi3_imm,
muldi3): Add patterns.
---
 gcc/config/gcn/gcn.md | 94 +++
 1 file changed, 94 insertions(+)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index d1d49981ebb..82f7a468bce 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1457,6 +1457,100 @@
(set_attr "length" "4,8,8")
(set_attr "gcn_version" "gcn5,gcn5,*")])
 
+(define_expand "mulsidi3"
+  [(set (match_operand:DI 0 "register_operand" "")
+   (mult:DI (any_extend:DI
+  (match_operand:SI 1 "register_operand" ""))
+(any_extend:DI
+  (match_operand:SI 2 "nonmemory_operand" ""))))]
+  ""
+{
+  if (can_create_pseudo_p ()
+  && !TARGET_GCN5
+  && !gcn_inline_immediate_operand (operands[2], SImode))
+operands[2] = force_reg (SImode, operands[2]);
+
+  if (REG_P (operands[2]))
+emit_insn (gen_mulsidi3_reg (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_mulsidi3_imm (operands[0], operands[1], operands[2]));
+
+  DONE;
+})
+
+(define_insn_and_split "mulsidi3_reg"
+  [(set (match_operand:DI 0 "register_operand"   "=&Sg, &v")
+   (mult:DI (any_extend:DI
+  (match_operand:SI 1 "register_operand" "%Sg,  v"))
+(any_extend:DI
+  (match_operand:SI 2 "register_operand"  "Sg,vSv"))))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+  {
+rtx dstlo = gen_lowpart (SImode, operands[0]);
+rtx dsthi = gen_highpart_mode (SImode, DImode, operands[0]);
+emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+DONE;
+  }
+  [(set_attr "gcn_version" "gcn5,*")])
+
+(define_insn_and_split "mulsidi3_imm"
+  [(set (match_operand:DI 0 "register_operand""=&Sg,&Sg,&v")
+   (mult:DI (any_extend:DI
+  (match_operand:SI 1 "register_operand"   "Sg, Sg, v"))
+(match_operand:DI 2 "gcn_32bit_immediate_operand"
+"A,  B, A")))]
+  "TARGET_GCN5 || gcn_inline_immediate_operand (operands[2], SImode)"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+rtx dstlo = gen_lowpart (SImode, operands[0]);
+rtx dsthi = gen_highpart_mode (SImode, DImode, operands[0]);
+emit_insn (gen_mulsi3 (dstlo, operands[1], operands[2]));
+emit_insn (gen_mulsi3_highpart (dsthi, operands[1], operands[2]));
+DONE;
+  }
+  [(set_attr "gcn_version" "gcn5,gcn5,*")])
+
+(define_insn_and_split "muldi3"
+  [(set (match_operand:DI 0 "register_operand" "=&Sg,&Sg, &v,&v")
+   (mult:DI (match_operand:DI 1 "register_operand" "%Sg, Sg,  v, v")
+(match_operand:DI 2 "nonmemory_operand" "Sg,  i,vSv, A")))
+   (clobber (match_scratch:SI 3 "=&Sg,&Sg,&v,&v"))
+   (clobber (match_scratch:BI 4  "=cs, cs, X, X"))
+   (clobber (match_scratch:DI 5   "=X,  X,cV,cV"))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+  {
+rtx tmp = operands[3];
+rtx dsthi = gen_highpart_mode (SImode, DImode, operands[0]);
+rtx op1lo = gcn_operand_part (DImode, operands[1], 0);
+rtx op1hi = gcn_operand_part (DImode, operands[1], 1);
+rtx op2lo = gcn_operand_part (DImode, operands[2], 0);
+rtx op2hi = gcn_operand_part (DImode, operands[2], 1);
+emit_insn (gen_umulsidi3 (operands[0], op1lo, op2lo));
+emit_insn (gen_mulsi3 (tmp, op1lo, op2hi));
+rtx add = gen_rtx_SET (dsthi, gen_rtx_PLUS (SImode, dsthi, tmp));
+rtx clob1 = gen_rtx_CLOBBER (VOIDmode, operands[4]);
+rtx clob2 = gen_rtx_CLOBBER (VOIDmode, operands[5]);
+add = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (3, add, clob1, clob2));
+emit_insn (add);
+emit_insn (gen_mulsi3 (tmp, op1hi, op2lo));
+add = gen_rtx_SET (dsthi, gen_rtx_PLUS (SImode, dsthi, tmp));
+clob1 = gen_rtx_CLOBBER (VOIDmode, operands[4]);
+clob2 = gen_rtx_CLOBBER (VOIDmode, operands[5]);
+add = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (3, add, clob1, clob2));
+emit_insn (add);
+DONE;
+  }
+  [(set_attr "gcn_version" "gcn5,gcn5,*,*")])
+
 (define_insn "mulhisi3"
   [(set (match_operand:SI 0 "register_operand" "=v")

[PATCH 2/3] amdgcn: Add [us]mulsi3_highpart SGPR alternatives

2021-06-29 Thread Julian Brown

This patch splits the mulsi3_highpart pattern into an expander and
register/immediate alternatives (to avoid meaningless sign/zero_extends on
constants), and adds alternatives for SGPR high-part multiply instructions
on GCN5+.
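
As a reminder of what a high-part multiply computes, here is a scalar C++ model of the RTL shape used by these patterns (truncate of a 64-bit product shifted right by 32).  The function names are hypothetical and the code says nothing about which GCN instruction is selected.

```cpp
#include <cstdint>

// Unsigned high part: zero-extend both operands to 64 bits, multiply,
// and keep bits 63..32 of the product.
std::uint32_t
umulsi3_highpart_model (std::uint32_t a, std::uint32_t b)
{
  std::uint64_t product = static_cast<std::uint64_t> (a) * b;
  return static_cast<std::uint32_t> (product >> 32);
}

// Signed high part: sign-extend instead.  The logical right shift is
// done on the unsigned bit pattern, matching lshiftrt in the RTL, so no
// implementation-defined signed shift is involved.
std::int32_t
smulsi3_highpart_model (std::int32_t a, std::int32_t b)
{
  std::int64_t product = static_cast<std::int64_t> (a) * b;
  std::uint64_t bits = static_cast<std::uint64_t> (product);
  return static_cast<std::int32_t> (static_cast<std::uint32_t> (bits >> 32));
}
```

The signed and unsigned variants differ only in the extension applied before the multiply, which is why a single pattern parameterised over any_extend covers both.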

I will apply shortly.

Julian

2021-06-29  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsi3_highpart): Change to expander.
(mulsi3_highpart_reg, mulsi3_highpart_imm): New patterns.
---
 gcc/config/gcn/gcn.md | 62 ++-
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index cca45522fba..d1d49981ebb 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1394,20 +1394,68 @@
 (define_code_attr iu [(sign_extend "i") (zero_extend "u")])
 (define_code_attr e [(sign_extend "e") (zero_extend "")])
 
-(define_insn "mulsi3_highpart"
-  [(set (match_operand:SI 0 "register_operand""= v")
+(define_expand "mulsi3_highpart"
+  [(set (match_operand:SI 0 "register_operand" "")
(truncate:SI
  (lshiftrt:DI
(mult:DI
  (any_extend:DI
-   (match_operand:SI 1 "register_operand" "% v"))
+   (match_operand:SI 1 "register_operand" ""))
  (any_extend:DI
-   (match_operand:SI 2 "register_operand" "vSv")))
+   (match_operand:SI 2 "gcn_alu_operand"  "")))
(const_int 32))))]
   ""
-  "v_mul_hi0\t%0, %2, %1"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
+{
+  if (can_create_pseudo_p ()
+  && !TARGET_GCN5
+  && !gcn_inline_immediate_operand (operands[2], SImode))
+operands[2] = force_reg (SImode, operands[2]);
+
+  if (REG_P (operands[2]))
+emit_insn (gen_mulsi3_highpart_reg (operands[0], operands[1],
+   operands[2]));
+  else
+emit_insn (gen_mulsi3_highpart_imm (operands[0], operands[1],
+   operands[2]));
+
+  DONE;
+})
+
+(define_insn "mulsi3_highpart_reg"
+  [(set (match_operand:SI 0 "register_operand""=Sg,  v")
+   (truncate:SI
+ (lshiftrt:DI
+   (mult:DI
+ (any_extend:DI
+   (match_operand:SI 1 "register_operand" "%Sg,  v"))
+ (any_extend:DI
+   (match_operand:SI 2 "register_operand"  "Sg,vSv")))
+   (const_int 32))))]
+  ""
+  "@
+  s_mul_hi0\t%0, %1, %2
+  v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,vop3a")
+   (set_attr "length" "4,8")
+   (set_attr "gcn_version" "gcn5,*")])
+
+(define_insn "mulsi3_highpart_imm"
+  [(set (match_operand:SI 0 "register_operand"   "=Sg,Sg,v")
+   (truncate:SI
+ (lshiftrt:DI
+   (mult:DI
+ (any_extend:DI
+   (match_operand:SI 1 "register_operand" "Sg,Sg,v"))
+ (match_operand:DI 2 "gcn_32bit_immediate_operand" "A, B,A"))
+   (const_int 32))))]
+  "TARGET_GCN5 || gcn_inline_immediate_operand (operands[2], SImode)"
+  "@
+  s_mul_hi0\t%0, %1, %2
+  s_mul_hi0\t%0, %1, %2
+  v_mul_hi0\t%0, %2, %1"
+  [(set_attr "type" "sop2,sop2,vop3a")
+   (set_attr "length" "4,8,8")
+   (set_attr "gcn_version" "gcn5,gcn5,*")])
 
 (define_insn "mulhisi3"
   [(set (match_operand:SI 0 "register_operand" "=v")
-- 
2.29.2

[PATCH 1/3] amdgcn: Mark s_mulk_i32 as clobbering SCC

2021-06-29 Thread Julian Brown

The s_mulk_i32 instruction sets the SCC status register according to
whether the multiplication overflows, but that is not currently modelled
in the GCN backend.  AFAIK this is a latent bug and hasn't been noticed
"in the wild", but it should be fixed.
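
To illustrate why an unmodelled flag write matters, here is a toy C++ model. It is only a sketch under stated assumptions: `ScalarUnit`, `cmp_lt` and `mulk` are hypothetical names, and the model simply assumes that a compare result is kept in the same one-bit status register that the multiply overwrites.

```cpp
#include <cstdint>

// Toy model of a scalar unit with a one-bit status register (hypothetical).
struct ScalarUnit {
    bool scc = false;

    // Compare: records "a < b" in scc.
    void cmp_lt(std::int32_t a, std::int32_t b) { scc = a < b; }

    // Multiply that overwrites scc with an overflow indication as a side
    // effect, as described for s_mulk_i32 above.
    std::int32_t mulk(std::int32_t a, std::int32_t k) {
        std::int64_t wide = static_cast<std::int64_t>(a) * k;
        scc = wide < INT32_MIN || wide > INT32_MAX;
        return static_cast<std::int32_t>(static_cast<std::uint32_t>(wide));
    }
};

// Returns the flag value a consumer would observe if a multiply were
// scheduled between the compare and the use of its result.
inline bool flag_after_interleaved_multiply(std::int32_t a, std::int32_t b) {
    ScalarUnit u;
    u.cmp_lt(a, b);  // producer of the flag
    u.mulk(3, 5);    // unrelated multiply; no overflow, so scc becomes false
    return u.scc;    // consumer sees the overwritten value
}
```

If the compiler does not know that the multiply writes the flag, nothing stops it from placing the multiply between the compare and its consumer; declaring the clobber is what rules that out.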

I will commit shortly.

Julian

2021-06-29  Julian Brown  

gcc/
* config/gcn/gcn.md (mulsi3): Make s_mulk_i32 variant clobber SCC.
---
 gcc/config/gcn/gcn.md | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index b5f895a93e2..cca45522fba 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -1371,10 +1371,13 @@
 
 ; Vector multiply has vop3a encoding, but no corresponding vop2a, so no long
 ; immediate.
+; The "s_mulk_i32" variant sets SCC to indicate overflow (which we don't care
+; about here, but we need to indicate the clobbering).
 (define_insn "mulsi3"
   [(set (match_operand:SI 0 "register_operand""= Sg,Sg, Sg,   v")
 (mult:SI (match_operand:SI 1 "gcn_alu_operand" "%SgA, 0,SgA,   v")
-(match_operand:SI 2 "gcn_alu_operand" " SgA, J,  B,vASv")))]
+(match_operand:SI 2 "gcn_alu_operand" " SgA, J,  B,vASv")))
+   (clobber (match_scratch:BI 3 "=X,cs,  X,   X"))]
   ""
   "@
s_mul_i32\t%0, %1, %2
-- 
2.29.2

[PATCH 0/3] amdgcn: Integer multiplication improvements

2021-06-29 Thread Julian Brown

These three patches replace the following one from the previously-posted
series:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573161.html

As suggested by Andrew Stubbs, I've changed the early lowering of DImode
multiplication to a define_insn_and_split to keep the operation together
until register-allocation time.  This indeed seems to improve generated
code.
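
As a reading aid, the arithmetic performed by the split body in patch 3/3 (one widening multiply of the low halves, then two 32-bit multiplies added into the high half) can be sketched in plain C++. This is my reading of the split, not code from the series, and the function name is hypothetical.

```cpp
#include <cstdint>

// 64x64->64 multiply built from 32-bit pieces.
inline std::uint64_t mul64_from_32bit_parts(std::uint64_t a, std::uint64_t b) {
    std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
    std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    std::uint32_t b_hi = static_cast<std::uint32_t>(b >> 32);

    // Full 64-bit product of the low halves (a widening 32x32->64 multiply).
    std::uint64_t result = static_cast<std::uint64_t>(a_lo) * b_lo;

    // The cross terms only affect the upper 32 bits.  a_hi * b_hi would land
    // entirely above bit 63 and is therefore dropped.
    std::uint32_t hi = static_cast<std::uint32_t>(result >> 32);
    hi += a_lo * b_hi;  // 32-bit multiply, wraps modulo 2^32
    hi += a_hi * b_lo;

    return (static_cast<std::uint64_t>(hi) << 32)
           | static_cast<std::uint32_t>(result);
}
```

Keeping the whole operation as one insn until after register allocation means the allocator sees a single DImode multiply rather than three multiplies and two adds with their own temporaries.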

I also noticed that SImode multiplication can silently clobber SCC if
the "s_mulk_i32" alternative is chosen, so I've done a drive-by fix for
that too.

Re-tested on GCN bare metal.

Julian

Julian Brown (3):
  amdgcn: Mark s_mulk_i32 as clobbering SCC
  amdgcn: Add [us]mulsi3_highpart SGPR alternatives
  amdgcn: Add [us]mulsid3/muldi3 patterns

 gcc/config/gcn/gcn.md | 161 +++---
 1 file changed, 153 insertions(+), 8 deletions(-)

-- 
2.29.2

Re: [COMMITTED V10 3/7] CTF/BTF debug formats

2021-06-29 Thread David Edelsohn via Gcc-patches

On Tue, Jun 29, 2021 at 10:33 AM Joseph Myers  wrote:
>
> There's now a build failure for sparc64-linux-gnu:
>
> In file included from ./tm_p.h:4:0,
>  from /scratch/jmyers/glibc-bot/src/gcc/gcc/ctfout.c:24:
> /scratch/jmyers/glibc-bot/src/gcc/gcc/config/sparc/sparc-protos.h:46:47: error: use of enum 'memmodel' without previous declaration
>  extern void sparc_emit_membar_for_model (enum memmodel, int, int);
>^
>
> (and likewise in btfout.c).

I see memmodel.h included before tm_p.h.  Do you want to commit that,
or should I?

Are you aware of other dependencies for other targets?

Thanks, David

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Jason Merrill via Gcc-patches


On 6/28/21 2:07 PM, Martin Sebor wrote:

On 6/28/21 2:07 AM, Richard Biener wrote:

On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:


On 6/25/21 4:11 PM, Jason Merrill wrote:

On 6/25/21 4:51 PM, Martin Sebor wrote:

On 6/1/21 3:38 PM, Jason Merrill wrote:

On 6/1/21 3:56 PM, Martin Sebor wrote:

On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign 
because
the class manages its own memory but doesn't define (or 
delete)

either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec
along
with a few simple tests.  It makes auto_vec safe to use in
containers
that expect copyable and assignable element types and passes
bootstrap
and regression testing on x86_64-linux.
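
The hazard described above is the usual one for a class that owns memory but leaves copying to the compiler. A minimal sketch of the fix follows, using a hypothetical `owning_buf` class rather than the real auto_vec: the copy constructor allocates fresh storage, so two objects never free the same block.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

class owning_buf {
public:
    explicit owning_buf(std::size_t n) : data_(new int[n]()), len_(n) {}
    ~owning_buf() { delete[] data_; }

    // Deep copy: allocate new storage and copy the elements across.
    owning_buf(const owning_buf &other)
        : data_(new int[other.len_]), len_(other.len_) {
        std::copy(other.data_, other.data_ + len_, data_);
    }

    // Copy-and-swap assignment; the by-value parameter is the copy, and its
    // destructor releases the storage this object held before.
    owning_buf &operator=(owning_buf other) {
        std::swap(data_, other.data_);
        std::swap(len_, other.len_);
        return *this;
    }

    int &operator[](std::size_t i) { return data_[i]; }
    const int &operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return len_; }

private:
    int *data_;
    std::size_t len_;
};
```

Without the two user-defined members, the implicitly generated copies would duplicate only the pointer, and both destructors would then release the same allocation.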


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the
properties
expected of any other generic container: copyable and
assignable.  If
we also want another vector type with this restriction I suggest
to add
another "noncopyable" type and make that property explicit in
its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).
Looking around
I see that vec<> does not do deep copying.  Making auto_vec<> 
do it
might be surprising (I added the move capability to match how 
vec<>

is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all 
(because
of their use in unions).  That's something we might have to 
live with

but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're
writing C++11.

The auto_vec class was introduced to fill the need for a 
conventional
sequence container with a ctor and dtor.  The missing copy ctor 
and

assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.

The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion
richi mentions.  And many uses of auto_vec will pass them as vec,
which will still do a shallow copy.  I think it's probably better
to disable the copy special members for auto_vec until we fix 
vec<>.


There are at least a couple of problems that get in the way of 
fixing

all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by 
providing

copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.

It doesn't seem to me that having a safely copyable auto_vec needs
to be put on hold until the rats nest above is untangled.  It won't
make anything worse than it is.  (I have a project that depends on
a sane auto_vec working).

A couple of alternatives to solving this are to use std::vector or
write an equivalent vector class just for GCC.


It occurs to me that another way to work around the issue of passing
an auto_vec by value as a vec, and thus doing a shallow copy, would
be to add a vec ctor taking an auto_vec, and delete that.  This would
mean if you want to pass an auto_vec to a vec interface, it needs to
be by reference.  We might as well do the same for operator=, though
that isn't as important.
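
A minimal sketch of that idea, using hypothetical stand-in classes (`toy_vec`, `toy_auto_vec`) rather than the real vec.h types: the base class deletes its constructor and assignment from the owning subclass, so by-value conversion fails to compile while binding by reference keeps working.

```cpp
#include <cstddef>
#include <type_traits>

template <typename T> class toy_auto_vec;

// Non-owning handle with shallow-copy semantics.
template <typename T>
struct toy_vec {
    T *data = nullptr;
    std::size_t len = 0;

    toy_vec() = default;
    toy_vec(const toy_vec &) = default;
    toy_vec &operator=(const toy_vec &) = default;

    // Converting from the owning subclass by value would be a shallow copy
    // of memory the owner is going to free, so reject it at compile time.
    toy_vec(const toy_auto_vec<T> &) = delete;
    toy_vec &operator=(const toy_auto_vec<T> &) = delete;
};

// Owning subclass: allocates in the constructor, frees in the destructor.
template <typename T>
class toy_auto_vec : public toy_vec<T> {
public:
    toy_auto_vec() = default;
    explicit toy_auto_vec(std::size_t n) {
        this->data = new T[n]();
        this->len = n;
    }
    ~toy_auto_vec() { delete[] this->data; }
    toy_auto_vec(const toy_auto_vec &) = delete;
    toy_auto_vec &operator=(const toy_auto_vec &) = delete;
};

// Accepts an owner by reference; no constructor call is involved.
inline std::size_t length_by_ref(const toy_vec<int> &v) { return v.len; }

// Accepts only a plain handle; passing a toy_auto_vec<int> here selects the
// deleted constructor and is rejected by the compiler.
inline std::size_t length_by_value(toy_vec<int> v) { return v.len; }
```

Overload resolution prefers the deleted constructor because it is an exact match for the subclass, whereas the ordinary copy constructor would need a derived-to-base conversion.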


Thanks, that sounds like a good idea.  Attached is an implementation
of this change.  Since the auto_vec cop

Re: [COMMITTED V10 3/7] CTF/BTF debug formats

2021-06-29 Thread Joseph Myers

There's now a build failure for sparc64-linux-gnu:

In file included from ./tm_p.h:4:0,
 from /scratch/jmyers/glibc-bot/src/gcc/gcc/ctfout.c:24:
/scratch/jmyers/glibc-bot/src/gcc/gcc/config/sparc/sparc-protos.h:46:47: error: use of enum 'memmodel' without previous declaration
 extern void sparc_emit_membar_for_model (enum memmodel, int, int);
   ^

(and likewise in btfout.c).

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-29 Thread Jakub Jelinek via Gcc-patches

On Tue, Jun 29, 2021 at 03:47:03PM +0200, Tobias Burnus wrote:
> gcc/ChangeLog:
> 
> * common.opt (-foffload=): Update description.
>   (-foffload-options=): New.
> * doc/invoke.texi (C Language Options): Document
>   -foffload and -foffload-options.
> * gcc.c (check_offload_target_name): New, split off from
>   handle_foffload_option.
> (check_foffload_target_names): New.
> (handle_foffload_option): Handle -foffload=default.
> (driver_handle_option): Update for -foffload-options.
> * lto-opts.c (lto_write_options): Use -foffload-options
>   instead of -foffload.
> * lto-wrapper.c (merge_and_complain, append_offload_options):
>   Likewise.
> * opts.c (common_handle_option): Likewise.
> 
> libgomp/ChangeLog:
> 
> * testsuite/libgomp.c-c++-common/reduction-16.c: Replace
>   -foffload=nvptx-none= by -foffload-options=nvptx-none= to
>   avoid disabling other offload targets.
> * testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
> * testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
> * testsuite/libgomp.c/target-44.c: Likewise.

Ok, thanks.

Jakub

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-29 Thread Tobias Burnus


First, the doc-sorting patch has now been applied separately as

https://gcc.gnu.org/g:d479ddc0d9854905d03a3290b203a5dcb8db07eb

On 29.06.21 13:58, Jakub Jelinek wrote:


Also, wonder if we shouldn't print the list of configured targets in that
case, see candidates_list_and_hint functions and its callers.
And it is unclear why we use fatal_error, can't unknown offload target names
be simply ignored after emitting error?


Done so – as the changes now became a bit larger, I have attached the
new version of the patch – despite the LGTM.

Tobias

Add 'default' to -foffload=; document that flag [PR67300]

As -foffload={options,targets,targets=options} is very convoluted,
it has been split into -foffload=targets (supporting the old syntax
for backward compatibility) and -foffload-options={options,target=options}.

Only the new syntax is documented.

Additionally, -foffload=default is supported, which can reset the
devices after -foffload=disable / -foffload=targets to the default,
if needed.
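
To make the two accepted forms of the -foffload-options= argument concrete, here is a small standalone sketch of how such an argument could be split. The helper name is hypothetical and this is not the driver code; it only follows the rule used in the patch that an argument starting with '-' carries no target list.

```cpp
#include <string>
#include <utility>

// Splits ARG (the text after "-foffload-options=") into {targets, options}.
inline std::pair<std::string, std::string>
split_offload_options(const std::string &arg) {
    // A leading '-' means the argument consists of options only, which then
    // apply to every offload target.
    if (!arg.empty() && arg[0] == '-')
        return {"", arg};

    std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos)
        return {arg, ""};  // target list without "=options"

    return {arg.substr(0, eq), arg.substr(eq + 1)};
}
```

For example, "nvptx-none=-O3" yields the target list "nvptx-none" and the options "-O3", while "-O3" alone yields an empty target list.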

gcc/ChangeLog:

* common.opt (-foffload=): Update description.
	(-foffload-options=): New.
* doc/invoke.texi (C Language Options): Document
	-foffload and -foffload-options.
* gcc.c (check_offload_target_name): New, split off from
	handle_foffload_option.
(check_foffload_target_names): New.
(handle_foffload_option): Handle -foffload=default.
(driver_handle_option): Update for -foffload-options.
* lto-opts.c (lto_write_options): Use -foffload-options
	instead of -foffload.
* lto-wrapper.c (merge_and_complain, append_offload_options):
	Likewise.
* opts.c (common_handle_option): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/reduction-16.c: Replace
	-foffload=nvptx-none= by -foffload-options=nvptx-none= to
	avoid disabling other offload targets.
* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
* testsuite/libgomp.c/target-44.c: Likewise.

 gcc/common.opt |  10 +-
 gcc/doc/invoke.texi|  41 +++
 gcc/gcc.c  | 121 +
 gcc/lto-opts.c |   3 +-
 gcc/lto-wrapper.c  |  10 +-
 gcc/opts.c |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-16.c  |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-5.c   |   2 +-
 .../testsuite/libgomp.c-c++-common/reduction-6.c   |   2 +-
 libgomp/testsuite/libgomp.c/target-44.c|   2 +-
 10 files changed, 158 insertions(+), 37 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1dd4456e577..eaee74c580a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2100,9 +2100,15 @@ fnon-call-exceptions
 Common Var(flag_non_call_exceptions) Optimization
 Support synchronous non-call exceptions.
 
+; -foffload=<targets> is documented
+; -foffload=<targets>=<options> is supported for backward compatibility
 foffload=
-Common Driver Joined MissingArgError(options or targets missing after %qs)
--foffload=<targets>=<options>	Specify offloading targets and options for them.
+Driver Joined MissingArgError(targets missing after %qs)
+-foffload=<targets>	Specify offloading targets
+
+foffload-options=
+Common Driver Joined MissingArgError(options or targets=options missing after %qs)
+-foffload==	Specify options for the offloading targets
 
 foffload-abi=
 Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index bf529090d92..a9fd5fdc104 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,6 +202,7 @@ in the following sections.
 -fno-builtin  -fno-builtin-@var{function}  -fcond-mismatch @gol
 -ffreestanding  -fgimple  -fgnu-tm  -fgnu89-inline  -fhosted @gol
 -flax-vector-conversions  -fms-extensions @gol
+-foffload=@var{arg}  -foffload-options=@var{arg} @gol
 -fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fpermitted-flt-eval-methods=@var{standard} @gol
@@ -2627,6 +2628,46 @@ fields within structs/unions}, for details.
 Note that this option is off for all targets except for x86
 targets using ms-abi.
 
+@item -foffload=disable
+@itemx -foffload=default
+@itemx -foffload=@var{target-list}
+@opindex foffload
+@cindex Offloading targets
+@cindex OpenACC offloading targets
+@cindex OpenMP offloading targets
+Specify for which OpenMP and OpenACC offload targets code should be generated.
+The default behavior, equivalent to @option{-foffload=default}, is to generate
+code for all supported offload targets.  The @option{-foffload=disable} form
+generates code only for the host fallback, while
+@option{-foffload=@var{target-list}} g

[PATCH] Add forward propagation to SLP "any" permutes

2021-06-29 Thread Richard Biener

This adds a forward propagation phase to the permute optimization
machinery which allows us to handle "any" permute for all kinds of
nodes.  To match previous behavior cost-wise we still do not allow
non-external/constant nodes to be duplicated for multiple permutes
and this is ensured during propagation itself.

Bootstrap & regtest in progress on x86_64-unknown-linux-gnu.

2021-06-29  Richard Biener  

* tree-vect-slp.c (vect_optimize_slp): Forward propagate
to "any" permute nodes and relax "any" permute propagation
during iterative backward propagation.
---
 gcc/tree-vect-slp.c | 81 +++--
 1 file changed, 63 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 524bfaa1c7f..9155af499b3 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3729,16 +3729,11 @@ vect_optimize_slp (vec_info *vinfo)
perm = vertices[idx].perm_out;
  else
{
- perm = -1;
- bool all_constant = true;
+ perm = vertices[idx].get_perm_in ();
  for (graph_edge *succ = slpg->vertices[idx].succ;
   succ; succ = succ->succ_next)
{
  int succ_idx = succ->dest;
- slp_tree succ_node = vertices[succ_idx].node;
- if (SLP_TREE_DEF_TYPE (succ_node) != vect_external_def
- && SLP_TREE_DEF_TYPE (succ_node) != vect_constant_def)
-   all_constant = false;
  int succ_perm = vertices[succ_idx].perm_out;
  /* Handle unvisited (and constant) nodes optimistically.  */
  /* ???  But for constants once we want to handle
@@ -3750,25 +3745,34 @@ vect_optimize_slp (vec_info *vinfo)
continue;
  if (perm == -1)
perm = succ_perm;
- else if (succ_perm == 0)
+ else if (succ_perm == 0
+  || !vect_slp_perms_eq (perms, perm, succ_perm))
{
  perm = 0;
  break;
}
- else if (!vect_slp_perms_eq (perms, perm, succ_perm))
+   }
+
+ /* If this is a node we do not want to eventually unshare
+but it can be permuted at will, verify all users have
+the same permutations registered and otherwise drop to
+zero.  */
+ if (perm == -1
+ && SLP_TREE_DEF_TYPE (node) != vect_external_def
+ && SLP_TREE_DEF_TYPE (node) != vect_constant_def)
+   {
+ int preds_perm = -1;
+ for (graph_edge *pred = slpg->vertices[idx].pred;
+  pred; pred = pred->pred_next)
{
- perm = 0;
- break;
+ int pred_perm = vertices[pred->src].get_perm_in ();
+ if (preds_perm == -1)
+   preds_perm = pred_perm;
+ else if (!vect_slp_perms_eq (perms,
+  pred_perm, preds_perm))
+   perm = 0;
}
}
- /* We still lack a forward propagation of materializations
-and thus only allow "any" permutes on constant or external
-nodes which we handle during materialization by looking
-at SLP children.  So avoid having internal "any" permutes
-for now, see gcc.dg/vect/bb-slp-71.c for a testcase that
-breaks when removing this restriction.  */
- if (perm == -1 && all_constant)
-   perm = 0;
 
  if (!vect_slp_perms_eq (perms, perm,
  vertices[idx].get_perm_in ()))
@@ -3836,6 +3840,47 @@ vect_optimize_slp (vec_info *vinfo)
}
 }
   while (changed);
+  statistics_counter_event (cfun, "SLP optimize perm iterations", iteration);
+
+  /* Compute pre-order.  */
+  auto_vec<int> heads;
+  heads.reserve (vinfo->slp_instances.length ());
+  for (slp_instance inst : vinfo->slp_instances)
+    heads.quick_push (SLP_INSTANCE_TREE (inst)->vertex);
+  auto_vec<int> po;
+  graphds_dfs (slpg, &heads[0], heads.length (), &po, true, NULL, NULL);
+
+  /* Propagate materialized permutes to "any" permute nodes.  For heads
+ ending up as "any" (reductions with just invariants), set them to
+ no permute.  */
+  for (int idx : heads)
+if (vertices[idx].perm_out == -1)
+  vertices[idx].perm_out = 0;
+  for (i = po.length (); i > 0; --i)
+{
+  int idx = po[i-1];
+  int perm_in = vertices[idx].get_perm_in ();
+  slp_tree node = vertices[idx].node;
+  if (SLP_TREE_DEF_TYPE (node) == vect_external_def
+ || SLP_TREE_DEF_TYPE (node) == vect_constant_def)
+   continue;
+  gcc_assert (perm_i

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Trevor Saunders

On Fri, Jun 25, 2021 at 02:51:58PM -0600, Martin Sebor via Gcc-patches wrote:
> On 6/1/21 3:38 PM, Jason Merrill wrote:
> > On 6/1/21 3:56 PM, Martin Sebor wrote:
> > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
> > > > > > wrote:
> > > > > > > 
> > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign 
> > > > > > > > > because
> > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > delete)
> > > > > > > > > either special function.  Since I first ran into the problem,
> > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > assignment operator.
> > > > > > > > > 
> > > > > > > > > The attached patch adds the two special functions to auto_vec 
> > > > > > > > > along
> > > > > > > > > with a few simple tests.  It makes auto_vec
> > > > > > > > > safe to use in containers
> > > > > > > > > that expect copyable and assignable element
> > > > > > > > > types and passes bootstrap
> > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > 
> > > > > > > > The question is whether we want such uses to appear since those
> > > > > > > > can be quite inefficient?  Thus the option is to
> > > > > > > > delete those operators?
> > > > > > > 
> > > > > > > I would strongly prefer the generic vector class to
> > > > > > > have the properties
> > > > > > > expected of any other generic container: copyable and assignable. 
> > > > > > >  If
> > > > > > > we also want another vector type with this
> > > > > > > restriction I suggest to add
> > > > > > > another "noncopyable" type and make that property
> > > > > > > explicit in its name.
> > > > > > > I can submit one in a followup patch if you think we need one.
> > > > > > 
> > > > > > I'm not sure (and not strictly against the copy and
> > > > > > assign). Looking around
> > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> do it
> > > > > > might be surprising (I added the move capability to match how vec<>
> > > > > > is used - as "reference" to a vector)
> > > > > 
> > > > > The vec base classes are special: they have no ctors at all (because
> > > > > of their use in unions).  That's something we might have to live with
> > > > > but it's not a model to follow in ordinary containers.
> > > > 
> > > > I don't think we have to live with it anymore, now that we're
> > > > writing C++11.
> > > > 
> > > > > The auto_vec class was introduced to fill the need for a conventional
> > > > > sequence container with a ctor and dtor.  The missing copy ctor and
> > > > > assignment operators were an oversight, not a deliberate feature.
> > > > > This change fixes that oversight.
> > > > > 
> > > > > The revised patch also adds a copy ctor/assignment to the auto_vec
> > > > > primary template (that's also missing it).  In addition, it adds
> > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > assignment as you prefer.
> > > > 
> > > > Hmm, adding another class doesn't really help with the confusion
> > > > richi mentions.  And many uses of auto_vec will pass them as
> > > > vec, which will still do a shallow copy.  I think it's probably
> > > > better to disable the copy special members for auto_vec until we
> > > > fix vec<>.
> > > 
> > > There are at least a couple of problems that get in the way of fixing
> > > all of vec to act like a well-behaved C++ container:
> > > 
> > > 1) The embedded vec has a trailing "flexible" array member with its
> > > instances having different size.  They're initialized by memset and
> > > copied by memcpy.  The class can't have copy ctors or assignments
> > > but it should disable/delete them instead.
> > > 
> > > 2) The heap-based vec is used throughout GCC with the assumption of
> > > shallow copy semantics (not just as function arguments but also as
> > > members of other such POD classes).  This can be changed by providing
> > > copy and move ctors and assignment operators for it, and also for
> > > some of the classes in which it's a member and that are used with
> > > the same assumption.
> > > 
> > > 3) The heap-based vec::block_remove() assumes its elements are PODs.
> > > That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
> > > and tree-vect-patterns.c).
> > > 
> > > I took a stab at both and while (1) is easy, (2) is shaping up to
> > > be a big and tricky project.  Tricky because it involves using
> > > std::move in places where what's moved is subsequently still used.
> > > I can keep plugging away at it but it won't change the fact that
> > > the e

Re: [Patch] Add 'default' to -foffload=; document that flag [PR67300]

2021-06-29 Thread Jakub Jelinek via Gcc-patches

On Mon, Jun 28, 2021 at 05:51:30PM +0200, Tobias Burnus wrote:
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

I think it would be better to commit the reorderings in invoke.texi
separately from the -foffload* changes, because otherwise people will keep
wondering what actually really changed.
It can go in before or after (and please take into account Sandra's
review comments).

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -3977,6 +3977,68 @@ driver_wrong_lang_callback (const struct cl_decoded_option *decoded,
>  static const char *spec_lang = 0;
>  static int last_language_n_infiles;
>  
> +
> +/* Check that GCC is configured to support the offload target.  */
> +
> +static void
> +check_offload_target_name (const char *target, ptrdiff_t len)
> +{
> +  const char *n, *c = OFFLOAD_TARGETS;
> +  char *target2 = NULL;
> +  while (c)
> +{
> +  n = strchr (c, ',');
> +  if (n == NULL)
> + n = strchr (c, '\0');
> +  if (len == n - c && strncmp (target, c, n - c) == 0)
> + break;
> +  c = *n ? n + 1 : NULL;
> +}
> +  if (!c)
> +{
> +  if (target[len] != '\0')
> + {
> +   target2 = XNEWVEC (char, len + 1);
> +   memcpy (target2, target, len);
> +   target2[len] = '\0';
> + }
> +  fatal_error (input_location,
> +  "GCC is not configured to support %qs as offload target",
> +  target2 ? target2 : target);

Can't this be done without target2 with
  fatal_error (input_location,
   "GCC is not configured to support %q.*s as offload target",
   len, target);
instead, regardless if target[len] is 0 or not?
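
The `%q.*s` directive above belongs to GCC's own diagnostic machinery. As an analogy only, the same idea can be sketched with the standard printf-style `%.*s`, which prints at most a given number of bytes and so needs no NUL-terminated copy. The helper names below are hypothetical, and the list walk is a simplified version of the loop in the quoted hunk.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Returns true if the first LEN bytes of TARGET name an entry of the
// comma-separated LIST.
inline bool name_in_list(const char *target, std::size_t len,
                         const char *list) {
    const char *c = list;
    while (c) {
        const char *n = std::strchr(c, ',');
        if (n == nullptr)
            n = c + std::strlen(c);
        if (len == static_cast<std::size_t>(n - c)
            && std::strncmp(target, c, len) == 0)
            return true;
        c = *n ? n + 1 : nullptr;
    }
    return false;
}

// Formats a message naming the first LEN bytes of TARGET without first
// copying them into a NUL-terminated temporary.
inline std::string unsupported_target_message(const char *target,
                                              std::size_t len) {
    char buf[128];
    std::snprintf(buf, sizeof buf,
                  "not configured to support '%.*s' as offload target",
                  static_cast<int>(len), target);
    return buf;
}
```

With the precision supplied as an argument, the allocation, copy and later free of a temporary buffer all become unnecessary.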

The message should be consistent between this function and
handle_foffload_option (on the q in particular).

Also, wonder if we shouldn't print the list of configured targets in that
case, see candidates_list_and_hint functions and its callers.
And it is unclear why we use fatal_error, can't unknown offload target names
be simply ignored after emitting error?

> +  XDELETEVEC (target2);
> +}
> +}
> +
> +/* Sanity check for -foffload-options.  */
> +
> +static void
> +check_foffload_target_names (const char *arg)
> +{
> +  const char *cur, *next, *end;
> +  /* If option argument starts with '-' then no target is specified and we
> + do not need to parse it.  */
> +  if (arg[0] == '-')
> +return;
> +  end = strchr (arg, '=');
> +  if (end == NULL)
> +{
> +  error ("%<=options%> missing after %<-foffload-options=target%>");

Neither options nor target are keywords, so IMHO those shouldn't appear
in between %< and %> but after the %>, so
"%<=%>options missing after %<-foffload-options=%>target"
?

Otherwise LGTM.

Jakub

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Martin Jambor

Hi,

On Tue, Jun 29 2021, Richard Biener via Gcc-patches wrote:
> On Mon, Jun 28, 2021 at 8:07 PM Martin Sebor  wrote:

[...]

>>
>> vNULL can bind to a const vec& (via the vec conversion ctor) but
>> not to vec&.  The three functions that in the patch are passed
>> vNULL modify the argument when it's not vNULL but not otherwise.
>> An alternate design is to have them take a vec* and pass in
>> a plain NULL (or nullptr) instead of vNULL.  That would require
>> some surgery on the function bodies that I've been trying to
>> avoid in the first pass.
>
> But I wonder if since you now identified them they could be massaged
> prior to doing the change.
>
> I do hope we end up not needing .to_vec () after all, if no users remain ;)

I am afraid that the decay from ipa_auto_call_arg_values to
ipa_call_arg_values would still need them.

The auto variant is very useful for ipa-cp but the non-auto one is
necessary because of how ipa_cached_call_context is structured and how
it operates.

I tried to rework ipa_cached_call_context last summer - though I do not
remember it enough to be 100% sure that it would avoid the need for
to_vec - but Honza strongly preferred how it works now.

Martin

[PATCH] Refactor SLP permute opt propagation

2021-06-29 Thread Richard Biener

This rewrites the SLP permute opt propagation to elide the visited
bit for an incoming permute of -1 as well as allowing the initial
propagation to take more than one iteration before starting on
materialization.  As we still lack propagation in the reverse
direction I've added gcc.dg/vect/bb-slp-71.c and a stopgap to
restrict "any" permute handling to the supported cases.
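
To show what handling -1 in the comparison means, here is a standalone sketch of an equality test over permutation indices, where -1 stands for a permutation that has not been determined yet. It uses std::vector instead of GCC's vec and a hypothetical function name, so treat it as an illustration of the rule rather than the patched function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// perms[i] is the i-th recorded lane permutation.  An index of -1 means no
// permutation has been determined yet; it compares equal only to itself and
// must never be used to index PERMS.
inline bool perms_eq(const std::vector<std::vector<unsigned>> &perms,
                     int perm_a, int perm_b) {
    if (perm_a == perm_b)
        return true;
    if (perm_a == -1 || perm_b == -1)
        return false;
    assert(static_cast<std::size_t>(perm_a) < perms.size());
    assert(static_cast<std::size_t>(perm_b) < perms.size());
    return perms[perm_a] == perms[perm_b];
}
```

Two distinct indices still compare equal when they refer to identical lane orders, which is what the length check and memcmp in the patch establish.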

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2021-06-29  Richard Biener  

* tree-vect-slp.c (slpg_vertex::visited): Remove.
(vect_slp_perms_eq): Handle -1 permutes.
(vect_optimize_slp): Rewrite permute propagation.

* gcc.dg/vect/pr67790.c: Un-XFAIL.
* gcc.dg/vect/bb-slp-71.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-71.c |  32 ++
 gcc/testsuite/gcc.dg/vect/pr67790.c   |   2 +-
 gcc/tree-vect-slp.c   | 140 --
 3 files changed, 120 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-71.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-71.c b/gcc/testsuite/gcc.dg/vect/bb-slp-71.c
new file mode 100644
index 000..6816511cd0f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-71.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int a[4], b[4];
+
+void __attribute__((noipa))
+foo(int x, int y)
+{
+  int tem0 = x + 1;
+  int tem1 = y + 2;
+  int tem2 = x + 3;
+  int tem3 = y + 4;
+  a[0] = tem0 + b[1];
+  a[1] = tem1 + b[0];
+  a[2] = tem2 + b[2];
+  a[3] = tem3 + b[3];
+}
+
+int main()
+{
+  check_vect ();
+
+  b[0] = 10;
+  b[1] = 14;
+  b[2] = 18;
+  b[3] = 22;
+  foo (-1, -3);
+  if (a[0] != 14 || a[1] != 9 || a[2] != 20 || a[3] != 23)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr67790.c b/gcc/testsuite/gcc.dg/vect/pr67790.c
index 0555d41abf7..32eacd91fda 100644
--- a/gcc/testsuite/gcc.dg/vect/pr67790.c
+++ b/gcc/testsuite/gcc.dg/vect/pr67790.c
@@ -38,4 +38,4 @@ int main()
 }
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 63b6e6a24b9..524bfaa1c7f 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3470,12 +3470,11 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 struct slpg_vertex
 {
   slpg_vertex (slp_tree node_)
-: node (node_), visited (0), perm_out (0), materialize (0) {}
+: node (node_), perm_out (-1), materialize (0) {}
 
   int get_perm_in () const { return materialize ? materialize : perm_out; }
 
   slp_tree node;
-  unsigned visited : 1;
   /* The permutation on the outgoing lanes (towards SLP parents).  */
   int perm_out;
   /* The permutation that is applied by this node.  perm_out is
@@ -3567,7 +3566,8 @@ vect_slp_perms_eq (const vec > &perms,
   int perm_a, int perm_b)
 {
   return (perm_a == perm_b
- || (perms[perm_a].length () == perms[perm_b].length ()
+ || (perm_a != -1 && perm_b != -1
+ && perms[perm_a].length () == perms[perm_b].length ()
  && memcmp (&perms[perm_a][0], &perms[perm_b][0],
 sizeof (unsigned) * perms[perm_a].length ()) == 0));
 }
@@ -3614,7 +3614,7 @@ vect_optimize_slp (vec_info *vinfo)
   /* Leafs do not change across iterations.  Note leafs also double
 as entries to the reverse graph.  */
   if (!slpg->vertices[idx].succ)
-   vertices[idx].visited = 1;
+   vertices[idx].perm_out = 0;
   /* Loads are the only thing generating permutes.  */
   if (!SLP_TREE_LOAD_PERMUTATION (node).exists ())
continue;
@@ -3668,12 +3668,17 @@ vect_optimize_slp (vec_info *vinfo)
 
   /* Propagate permutes along the graph and compute materialization points.  */
   bool changed;
+  bool do_materialization = false;
   unsigned iteration = 0;
   do
 {
   changed = false;
   ++iteration;
 
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"SLP optimize iteration %d\n", iteration);
+
   for (i = vertices.length (); i > 0 ; --i)
{
  int idx = ipo[i-1];
@@ -3685,19 +3690,21 @@ vect_optimize_slp (vec_info *vinfo)
  || SLP_TREE_DEF_TYPE (node) == vect_constant_def)
continue;
 
- vertices[idx].visited = 1;
-
  /* We still eventually have failed backedge SLP nodes in the
 graph, those are only cancelled when analyzing operations.
 Simply treat them as transparent ops, propagating permutes
 through them.  */
  if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
{
- /* We do not handle stores with a permutation.  */
+ /* We do not handle stores with a permutation, so all
+incoming permutes must have been materialize

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-29 Thread Richard Biener via Gcc-patches

On Mon, Jun 28, 2021 at 8:07 PM Martin Sebor  wrote:
>
> On 6/28/21 2:07 AM, Richard Biener wrote:
> > On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:
> >>
> >> On 6/25/21 4:11 PM, Jason Merrill wrote:
> >>> On 6/25/21 4:51 PM, Martin Sebor wrote:
>  On 6/1/21 3:38 PM, Jason Merrill wrote:
> > On 6/1/21 3:56 PM, Martin Sebor wrote:
> >> On 5/27/21 2:53 PM, Jason Merrill wrote:
> >>> On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
>  On 4/27/21 8:04 AM, Richard Biener wrote:
> > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
> > wrote:
> >>
> >> On 4/27/21 1:58 AM, Richard Biener wrote:
> >>> On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> >>>  wrote:
> 
>  PR 90904 notes that auto_vec is unsafe to copy and assign because
>  the class manages its own memory but doesn't define (or delete)
>  either special function.  Since I first ran into the problem,
>  auto_vec has grown a move ctor and move assignment from
>  a dynamically-allocated vec but still no copy ctor or copy
>  assignment operator.
> 
>  The attached patch adds the two special functions to auto_vec along
>  with a few simple tests.  It makes auto_vec safe to use in containers
>  that expect copyable and assignable element types and passes bootstrap
>  and regression testing on x86_64-linux.
> >>>
> >>> The question is whether we want such uses to appear since those
> >>> can be quite inefficient?  Thus the option is to delete those
> >>> operators?
> >>
> >> I would strongly prefer the generic vector class to have the properties
> >> expected of any other generic container: copyable and assignable.  If
> >> we also want another vector type with this restriction I suggest to add
> >> another "noncopyable" type and make that property explicit in its name.
> >> I can submit one in a followup patch if you think we need one.
> >
> > I'm not sure (and not strictly against the copy and assign).  Looking
> > around I see that vec<> does not do deep copying.  Making auto_vec<> do it
> > might be surprising (I added the move capability to match how vec<>
> > is used - as "reference" to a vector)
> 
>  The vec base classes are special: they have no ctors at all (because
>  of their use in unions).  That's something we might have to live with
>  but it's not a model to follow in ordinary containers.
> >>>
> >>> I don't think we have to live with it anymore, now that we're
> >>> writing C++11.
> >>>
>  The auto_vec class was introduced to fill the need for a conventional
>  sequence container with a ctor and dtor.  The missing copy ctor and
>  assignment operators were an oversight, not a deliberate feature.
>  This change fixes that oversight.
> 
>  The revised patch also adds a copy ctor/assignment to the auto_vec
>  primary template (that's also missing it).  In addition, it adds
>  a new class called auto_vec_ncopy that disables copying and
>  assignment as you prefer.
> >>>
> >>> Hmm, adding another class doesn't really help with the confusion
> >>> richi mentions.  And many uses of auto_vec will pass them as vec,
> >>> which will still do a shallow copy.  I think it's probably better
> >>> to disable the copy special members for auto_vec until we fix vec<>.
> >>
> >> There are at least a couple of problems that get in the way of fixing
> >> all of vec to act like a well-behaved C++ container:
> >>
> >> 1) The embedded vec has a trailing "flexible" array member with its
> >> instances having different size.  They're initialized by memset and
> >> copied by memcpy.  The class can't have copy ctors or assignments
> >> but it should disable/delete them instead.
> >>
> >> 2) The heap-based vec is used throughout GCC with the assumption of
> >> shallow copy semantics (not just as function arguments but also as
> >> members of other such POD classes).  This can be changed by providing
> >> copy and move ctors and assignment operators for it, and also for
> >> some of the classes in which it's a member and that are used with
> >> the same assumption.
> >>
> >> 3) The heap-based vec::block_remove() assumes its elements are PODs.
> >> That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
> >> and tree-vect-patterns.c).
> >>
> >> I took a stab at both and while (1) is easy, (2) is shaping up to
> >> be a big and tricky project.  Tricky because it involves using
>
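
As an aside on the shallow versus deep copy distinction debated above, here is a minimal sketch.  It does not use GCC's vec.h; shallow_vec and owning_vec are hypothetical types that only mimic the behaviours being discussed (a handle that copies the pointer, and an owning container with a copy ctor and copy assignment).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical handle type: like vec<>, copying it copies only the pointer.
struct shallow_vec {
  int *data;
  std::size_t len;
};

// Hypothetical owning type: roughly what an auto_vec with a copy ctor and
// copy assignment would look like.
class owning_vec {
public:
  explicit owning_vec(std::size_t n) : data_(new int[n]()), len_(n) {}
  owning_vec(const owning_vec &o) : data_(new int[o.len_]), len_(o.len_) {
    std::copy(o.data_, o.data_ + len_, data_);   // deep copy of the elements
  }
  owning_vec &operator=(const owning_vec &o) {
    if (this != &o) {
      owning_vec tmp(o);                         // copy first, then swap
      std::swap(data_, tmp.data_);
      std::swap(len_, tmp.len_);
    }
    return *this;
  }
  ~owning_vec() { delete[] data_; }
  int &operator[](std::size_t i) { return data_[i]; }
  shallow_vec view() { return shallow_vec{data_, len_}; }  // non-owning handle
private:
  int *data_;
  std::size_t len_;
};

// Returns true if the deep copy stays independent while the handle aliases.
bool copies_behave_as_expected() {
  owning_vec a(2);            // elements start at zero
  owning_vec b = a;           // deep copy
  shallow_vec h = a.view();
  shallow_vec h2 = h;         // shallow copy, still points into a
  a[0] = 42;
  assert(h2.len == 2);
  return b[0] == 0 && h2.data[0] == 42;
}
```

The cost concern raised in the thread is visible here: every owning_vec copy allocates and copies all elements, while copying the handle is just two words.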

Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2021-06-29 Thread Richard Biener via Gcc-patches

On Mon, Jun 28, 2021 at 3:15 PM Andrew MacLeod  wrote:
>
> On 6/27/21 11:46 AM, Aldy Hernandez wrote:
> >
> >
> > On 6/25/21 9:38 AM, Richard Biener wrote:
> >> On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod 
> >> wrote:
> >>>
> >>> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
>  On 6/24/21 8:29 AM, Richard Biener wrote:
> 
> 
>  The original function in EVRP currently looks like:
> 
>    === BB 2 
>    :
>   if (a_5(D) == b_6(D))
> goto ; [INV]
>   else
> goto ; [INV]
> 
>  === BB 8 
>  Equivalence set : [a_5(D), b_6(D)] edge 2->8 provides
>  a_5 and b_6 as equivalences
>    :
>   goto ; [100.00%]
> 
>  === BB 6 
>    :
>   # i_1 = PHI <0(8), i_10(5)>
>   if (i_1 < a_5(D))
> goto ; [INV]
>   else
> goto ; [INV]
> 
>  === BB 3 
>  Relational : (i_1 < a_5(D)) edge 6->3 provides
>  this relation
>    :
>   if (i_1 == b_6(D))
> goto ; [INV]
>   else
> goto ; [INV]
> 
> 
>  So it knows that a_5 and b_6 are equivalent, and it knows that i_1 <
>  a_5 in BB3 as well..
> 
>  so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
>  currently aren't.   I think I had turned on equivalence mapping during
>  relational processing, so should be able to tag that without
>  transitive relations...  I'll have a look at why.
> 
>  And once we get a bit further along, you will be able to access this
>  without ranger.. if one wants to simply register the relations
>  directly.
> 
>  Anyway, I'll get back to you why it's currently being missed.
> 
>  Andrew
> 
> 
> 
> >>> As promised.  There was a typo in the equivalency comparisons... so it
> >>> was getting missed.  With the fix, the oracle identifies the relation
> >>> and evrp will now fold that expression away and the IL becomes:
> >>>
> >>>  :
> >>> if (a_5(D) == b_6(D))
> >>>   goto ; [INV]
> >>> else
> >>>   goto ; [INV]
> >>>
> >>>  :
> >>> i_10 = i_1 + 1;
> >>>
> >>>  :
> >>> # i_1 = PHI <0(2), i_10(3)>
> >>> if (i_1 < a_5(D))
> >>>   goto ; [INV]
> >>> else
> >>>   goto ; [INV]
> >>>
> >>>  :
> >>> return;
> >>>
> >>> for the other cases you quote, there are no predictions such that if a
> >>> != 0 then this equivalency exists...
> >>>
> >>> +  if (a != 0)
> >>> +{
> >>> +  c = b;
> >>> +}
> >>>
> >>> but the oracle would register that in the TRUE block,  c and b are
> >>> equivalent... so some other pass that was interested in tracking
> >>> conditions that make a block relevant would be able to compare
> >>> relations...
> >>
> >> I guess to fully leverage optimizations for cases like
> >>
> >>if (a != 0)
> >>  c = b;
> >>...
> >>if (a != 0)
> >>  {
> >>  if (c == b)
> >> ...
> >>  }
> >>
> >> That is, we'd do simplifications exposed by jump threading but
> >> without actually doing the jump threading (which will of course
> >> not allow all possible simplifications w/o inserting extra PHIs
> >> for computations we might want to re-use).
> >
> > FWIW, as I mention in the PR, if the upcoming threader work could be
> > taught to use the relation oracle, it could easily solve the
> > conditional flowing through the a!=0 path.  However, we wouldn't be
> > able to thread it because in this particular case, the path crosses
> > loop boundaries.
> >
> > I leave it to Jeff/others to pontificate on whether the jump-threader
> > path duplicator could be taught to thread through loops. ??
> >
> > Aldy
> >
> This is still bouncing around in my head. I think we have the tools to
> do this better than via threading.  Ranger is now trivially capable of
> calculating when a predicate expression is true or false at another
> location in the IL. Combine this with flagging relations that are true
> when the predicate is, and that relation could be simply added into the
> oracle.
>
> ie:
>
>   :
>  if (a_5(D) != 0)
>goto ; [INV]
>  else
>goto ; [INV]
>
>   :
>   :
>  # c_1 = PHI 
>
> the predicate and relations are:
>  (a_5 != 0)  ->  c_1 == b_7
>  (a_5 == 0) -> c_1 == c_6
>
> then :
>
>  :
>  # i_2 = PHI <0(4), i_12(8)>
>  if (c_1 > i_2)
>goto ; [INV]
>  else
>goto ; [INV]
>
>   :9->5 registers c_1 > i_2
> with the oracle
>  if (a_5(D) != 0)
>goto ; [INV]
>  else
>goto ; [INV]
>
>   :
(*) >  if (i_2 == b_7(D))
>goto ; [INV]
>  else
>goto ; [INV]
> ..
> If we know to check the predicate list in bb_6, ranger can answer the
> question: on the branch in bb6, a_5 != 0.
> This in turn means the predi
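
A source-level sketch of the reasoning in the quoted IL, assuming the IL corresponds to a loop of roughly this shape (the function name and the hit counter are hypothetical, added only to make the dead branch observable):

```cpp
#include <cassert>

// Assumed source shape: c is set to b when a != 0, the loop runs while
// c > i, and the inner test i == b sits under the same a != 0 predicate.
int count_hits(int a, int b, int c)
{
  if (a != 0)
    c = b;                       // (a != 0) -> c == b
  int hits = 0;
  for (int i = 0; c > i; ++i)    // loop body is entered only when c > i
    if (a != 0)
      if (i == b)                // c == b and c > i imply i != b
        ++hits;
  return hits;
}

// Checks a few small inputs; the inner branch is never taken.
void count_hits_selfcheck()
{
  for (int a = -1; a <= 1; ++a)
    for (int b = -2; b <= 5; ++b)
      for (int c = -2; c <= 5; ++c)
        assert(count_hits(a, b, c) == 0 || a == 0);
}
```

When a == 0 nothing relates c to b, so the inner test is not even reached; when a != 0 the two relations together make i == b false, which is the fold being discussed.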

Re: [PATCH] Port GCC documentation to Sphinx

2021-06-29 Thread Richard Earnshaw via Gcc-patches





On 29/06/2021 11:09, Martin Liška wrote:

On 6/28/21 5:33 PM, Joseph Myers wrote:

Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
version also available for review?


I've just uploaded them here:
https://splichal.eu/gccsphinx-final/

Martin



In the HTML version of the gcc manual the sidebar has an "Option index" 
link but no link to the general index.  When you follow that link the 
page content is just a link to the "index" where everything is all
lumped together.


If we can't have separate indexes for options and general entries, I 
think it would make more sense for the Option index link to be removed 
entirely.


R.

Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-29 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 28 Jun 2021 at 14:48, Christophe LYON
 wrote:
>
>
> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >
> >> -Original Message-
> >> From: Prathamesh Kulkarni 
> >> Sent: 28 June 2021 09:38
> >> To: Kyrylo Tkachov 
> >> Cc: Christophe Lyon ; gcc Patches  >> patc...@gcc.gnu.org>
> >> Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
> >> constructor
> >>
> >> On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov 
> >> wrote:
> >>>
> >>>
>  -Original Message-
>  From: Prathamesh Kulkarni 
>  Sent: 14 June 2021 09:02
>  To: Christophe Lyon 
>  Cc: gcc Patches ; Kyrylo Tkachov
>  
>  Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
>  constructor
> 
>  On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>   wrote:
> > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> 
>  wrote:
> >> On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> >>  wrote:
> >>> Hi,
> >>> As mentioned in PR, for the following test-case:
> >>>
> >>> #include 
> >>>
> >>> bfloat16x4_t f1 (bfloat16_t a)
> >>> {
> >>>return vdup_n_bf16 (a);
> >>> }
> >>>
> >>> bfloat16x4_t f2 (bfloat16_t a)
> >>> {
> >>>return (bfloat16x4_t) {a, a, a, a};
> >>> }
> >>>
> > >>> Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > >>> -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> >>>
> >>> f1:
> > >>>  vdup.16 d16, r0
> > >>>  vmov    r0, r1, d16  @ v4bf
> > >>>  bx      lr
> >>>
> >>> f2:
> > >>>  mov     r3, r0  @ __bf16
> > >>>  adr     r1, .L4
> > >>>  ldrd    r0, [r1]
> > >>>  mov     r2, r3  @ __bf16
> > >>>  mov     ip, r3  @ __bf16
> > >>>  bfi     r1, r2, #0, #16
> > >>>  bfi     r0, ip, #0, #16
> > >>>  bfi     r1, r3, #16, #16
> > >>>  bfi     r0, r2, #16, #16
> > >>>  bx      lr
> >>>
> > >>> This seems to happen because vec_init pattern in neon.md has VDQ mode
> > >>> iterator, which doesn't include V4BF. In attached patch, I changed mode
> > >>> to VDQX which seems to work for the test-case, and the compiler now
> > >>> generates:
> > >>> f2:
> > >>>  vdup.16 d16, r0
> > >>>  vmov    r0, r1, d16  @ v4bf
> > >>>  bx      lr
> >>>
> > >>> However, the pattern is also gated on TARGET_HAVE_MVE and I am not
> > >>> sure if either VDQ or VDQX are correct modes for MVE since MVE has
> > >>> only 128-bit vectors ?
> >>>
> >> I think patterns common to both Neon and MVE should be moved to
> >> vec-common.md, I don't know why such patterns were left in
> >> neon.md.
> > > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > > I am not sure if we should separate the pattern ?
> > > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE as
> > > in attached patch so
> > > it will call neon_expand_vector_init only for 128-bit vectors ?
> > > Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
>  ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
>  (attaching patch as text).
> 
> >>> --- a/gcc/config/arm/neon.md
> >>> +++ b/gcc/config/arm/neon.md
> >>> @@ -459,10 +459,12 @@
> >>>   )
> >>>
> >>>   (define_expand "vec_init"
> >>> -  [(match_operand:VDQ 0 "s_register_operand")
> >>> +  [(match_operand:VDQX 0 "s_register_operand")
> >>>  (match_operand 1 "" "")]
> >>> "TARGET_NEON || TARGET_HAVE_MVE"
> >>>   {
> > >>> +  if (TARGET_HAVE_MVE && GET_MODE_SIZE (GET_MODE (operands[0])) != 16)
> > >>> +    FAIL;
> >>> neon_expand_vector_init (operands[0], operands[1]);
> >>> DONE;
> >>>   })
> >>>
> >>> I think we should move this to vec-common.md like Christophe said.
> > >>> Perhaps rather than making it FAIL for non-16 MVE sizes we just disable
> > >>> it in the expander condition?
> > >>> "TARGET_NEON || (TARGET_HAVE_MVE && GET_MODE_SIZE (<VDQ>mode) != 16)"
> >> Is it OK to use mode ? Because using mode resulted in lot
> >> of build errors.
> > >> Also, I think the comparison should be inverted, ie, GET_MODE_SIZE
> > >> (mode) == 16 since we want to make the pattern pass if target is MVE
> > >> and vector size is 16 bytes ?
> >> Do these changes in attached patch look OK ?
> > Yes, you're right.
>
>
> Can't this be ARM_HAVE__ARITH like in most expanders in vec-common.md?
>
> (maybe with a && !TARGET_REALLY_IWMMXT if needed)
I wonder if this should be ARM_HAVE__LDST instead since we're
initializing the vector ?

Thanks,
Prathamesh
>
>
> Christophe
>
>
> > Ok.
> > Thanks,
> > Kyrill
> >
> >
> >> Thanks,
> >> Prathamesh
> >>> Thanks,
> >>> Kyrill
> >>>
>  Thanks,
>  Prathamesh
> > Thanks,
> > Prathamesh
> >> That being said, I sugges

Re: [PATCH 0/2] Ranger-based backwards threader implementation.

2021-06-29 Thread Aldy Hernandez via Gcc-patches




On 6/29/21 1:19 AM, Martin Sebor wrote:

On 6/28/21 10:21 AM, Aldy Hernandez via Gcc-patches wrote:

This is the ranger-based backwards threader.  It is divided into two
parts: the solver and the path discovery bits.

The solver is generic enough, that it may be of use to other passes,
so it's been abstracted into its own separate class/file.  Andrew and
I have already gone over it, so I don't think a review is necessary.
Besides, it's technically an extension of the ranger infrastructure.

On the other hand, the path discovery bits could benefit from the
watchful eye of the jump threading experts.

Documenting the solver in a [ranger-tech] post is on my TODO list,
as I think it would be useful as an example of GORI as a general
tool, outside the VRP world.

As I have mentioned elsewhere, I have gone through each test and
documented the reasons why they were adjusted (when useful).  The
reviewer(s) may benefit from looking at the test notes.

I have added a --param=threader-mode={ranger,legacy} option, which I
hope to remove shortly after.  It has been useful for diagnosing
issues in the past, though perhaps not so much now.  I've left it
in case there's a remote interest in using it during stage1, but
removing it could be a huge cleanup to tree-ssa-threadbackward.c.

If/when accepted, I will open 2-3 PRs with the XFAILed tests as
requested.  I am still working on distilling a C counterpart for
the libphobos missing thread edge.  It'll hopefully be ready by the
time the review is done.

A version of this patchset with the verification code has
been tested on x86-64, ppc64, ppc64le, and aarch64 (all Linux).

I am currently re-testing on x86-64 Linux, but will not re-test on the
rest of the architectures because...OMG aarch64 is so slow!


I applied the series and ran a subset of tests and didn't see any
failures, just the three XPASSes below.  The Wfree-nonheap-object
tests you mentioned in the other post all pass.  Looks like you
got past that problem?

XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 32)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 46)
XPASS: gcc.dg/uninit-pr61112.c pr61112 (test for bogus messages, line 60)

A couple of comments on the tests below (I haven't looked at the meat
of the patch):



Thanks.
Aldy

Aldy Hernandez (2):
   Implement basic block path solver.
   Backwards jump threader rewrite with ranger.

  gcc/Makefile.in   |   6 +
  gcc/flag-types.h  |   7 +
  gcc/params.opt    |  17 +
  .../g++.dg/debug/dwarf2/deallocator.C |   3 +-
  gcc/testsuite/gcc.c-torture/compile/pr83510.c |  33 ++
  gcc/testsuite/gcc.dg/Wrestrict-22.c   |   3 +


The change here just adds the comment:

+/* This looks like the threader caused the entire loop to collapse, and the
+   warning pass can't determine the arguments to memcpy.  */
+

Since the test passes I'm not sure I understand what the comment
is trying to say.  Is it still accurate and necessary?


This seems like it came from the ranger branch which had slightly 
different code, particularly it made use of a full ranger with 
equivalences.  It looks like this could have failed in the branch, but 
no longer does.  I have removed the comment.





  gcc/testsuite/gcc.dg/loop-unswitch-2.c    |   2 +-
  gcc/testsuite/gcc.dg/old-style-asm-1.c    |   5 +-
  gcc/testsuite/gcc.dg/pr68317.c    |   4 +-
  gcc/testsuite/gcc.dg/pr97567-2.c  |   2 +-
  gcc/testsuite/gcc.dg/predict-9.c  |   4 +-
  gcc/testsuite/gcc.dg/shrink-wrap-loop.c   |  53 ++
  gcc/testsuite/gcc.dg/sibcall-1.c  |  10 +
  .../gcc.dg/tree-ssa/builtin-sprintf-3.c   |   5 +-


I wonder if breaking up the test function into five, one for each
of the tests it does, would be a better way to avoid the IL changes
than disabling all the threading passes.  Like in the attached patch.


As the author of the original test, I completely defer to you :).

Attached is the latest version with your suggested changes, as well as a 
gimple FE test for the previously discussed failing libphobos test.


Thanks.
Aldy
From a373ff1b936f39c8372ba88c4a462dd61a78c535 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 15 Jun 2021 12:32:51 +0200
Subject: [PATCH 2/2] Backwards jump threader rewrite with ranger.

This is a rewrite of the backwards threader with a ranger based solver.

The code is divided into two parts: the path solver in
tree-ssa-path-solver.*, and the path discovery in
tree-ssa-threadbackward.c.

The legacy code is still available with --param=threader-mode=legacy,
but will be removed shortly after.

gcc/ChangeLog:

	* Makefile.in (tree-ssa-loop-im.o-warn): New.
	* flag-types.h (enum threader_mode): New.
	* params.opt: Add entry for --param=threader-mode.
	* tree-ssa-threadbackward.c (THREADER_ITERATIVE_MODE): New.
	(class back_threader): New.
	(back_threader::back_threader)

Re: [PATCH] Port GCC documentation to Sphinx

2021-06-29 Thread Martin Liška


On 6/28/21 5:33 PM, Joseph Myers wrote:

Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
version also available for review?


I've just uploaded them here:
https://splichal.eu/gccsphinx-final/

Martin

Re: [PATCH] New hook adjust_iv_update_pos

2021-06-29 Thread Xionghu Luo via Gcc-patches




On 2021/6/28 16:25, Richard Biener wrote:
> On Mon, Jun 28, 2021 at 10:07 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/6/25 18:02, Richard Biener wrote:
>>> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:



 On 2021/6/25 16:54, Richard Biener wrote:
> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
>  wrote:
>>
>> From: Xiong Hu Luo 
>>
>> adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
>> on Power.  For example, it generates a mismatched address offset after
>> adjusting the iv update statement position:
>>
>>  [local count: 70988443]:
>> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>> ivtmp.30_415 = ivtmp.30_414 + 1;
>> _34 = ref_180 + 18446744073709551615;
>> _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>> if (_84 == _86)
>>  goto ; [94.50%]
>>  else
>>  goto ; [5.50%]
>>
>> Disabling it produces:
>>
>>  [local count: 70988443]:
>> _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>> _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
>> ivtmp.30_415 = ivtmp.30_414 + 1;
>> if (_84 == _86)
>>  goto ; [94.50%]
>>  else
>>  goto ; [5.50%]
>>
>> Then the later loop unroll pass can benefit from the same address offset
>> with different base addresses, which reduces register dependency.
>> This patch could improve performance by 10% for a typical case on Power;
>> no performance change was observed for X86 or Aarch64 because small loops
>> are not unrolled on these platforms.  Any comments?
>
> The case you quote is special in that if we hoisted the IV update before
> the other MEM _also_ used in the condition it would be fine again.

 Thanks.  I tried to hoist the IV update statement before the first MEM
 (Fix 2); it shows even worse performance because the loop is not unrolled
 (two more "base-1" are generated in gimple, then loop->ninsns is 11, so the
 small loop is not unrolled).  Changing the threshold from 10 to 12 in
 rs6000_loop_unroll_adjust makes it unroll 2 times as well, and the
 performance is the SAME as with the IV update statement in the *MIDDLE*
 (trunk).
 From the ASM, we can see the index register %r4 is used in two iterations,
 which may be a bottleneck for hiding instruction latency?

 Then it seems reasonable that the performance would be better if we keep the
 IV update statement at *LAST* (Fix 1).

 (Fix 2):
  [local count: 70988443]:
 ivtmp.30_415 = ivtmp.30_414 + 1;
 _34 = ip_229 + 18446744073709551615;
 _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
 _33 = ref_180 + 18446744073709551615;
 _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
 if (_84 == _86)
   goto ; [94.50%]
 else
   goto ; [5.50%]


 .L67:
   lbzx %r12,%r24,%r4
   lbzx %r25,%r7,%r4
   cmpw %cr0,%r12,%r25
   bne %cr0,.L11
   mr %r26,%r4
   addi %r4,%r4,1
   lbzx %r12,%r24,%r4
   lbzx %r25,%r7,%r4
   mr %r6,%r26
   cmpw %cr0,%r12,%r25
   bne %cr0,.L11
   mr %r26,%r4
 .L12:
   cmpdi %cr0,%r10,1
   addi %r4,%r26,1
   mr %r6,%r26
   addi %r10,%r10,-1
   bne %cr0,.L67

>
> Now, adjust_iv_update_pos doesn't seem to check that the
> condition actually uses the IV use stmt def, so it likely applies to
> too many cases.
>
> Unfortunately the introducing rev didn't come with a testcase,
> but still I think fixing up adjust_iv_update_pos is better than
> introducing a way to short-cut it per target decision.
>
> One "fix" might be to add a check that either the condition
> lhs or rhs is the def of the IV use and the other operand
> is invariant.  Or if it's of similar structure hoist across the
> other iv-use as well.  Not that I understand the argument
> about the overlapping life-range.
>
> You also don't provide a complete testcase ...
>

 Attached the test code, will also add it to the patch in a future version.
 The issue comes from a very small hot loop:

   do {
 len++;
   } while(len < maxlen && ip[len] == ref[len]);
>>>
>>> unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
>>> {
>>> unsigned int len = 2;
>>> do {
>>> len++;
>>> }while(len < maxlen && ip[len] == ref[len]);
>>> return len;
>>> }
>>>
>>> I can see the effect on this loop on x86_64 as well, we end up with
>>>
>>> .L6:
>>>   movzbl  (%rdi,%rax), %ecx
>>>   addq    $1, %rax
>>>   cmpb    -1(%rsi,%rax), %cl
>>>   jne     .L1
>>> .L3:
>>>   movl    %eax, %r8d
>>>   cmpl    %edx, %eax
>>>

Re: [PATCH] match.pd: Avoid (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST opt in GENERIC when sanitizing [PR101210]

2021-06-29 Thread Richard Biener

On Tue, 29 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> When we have (intptr_t) x == cst where x has REFERENCE_TYPE, this
> optimization creates x == cst out of it where cst has REFERENCE_TYPE.
> If it is done in GENERIC folding, it can result in ubsan failures
> where the INTEGER_CST with REFERENCE_TYPE is instrumented.
> 
> Fixed by deferring it to GIMPLE folding in this case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2021-06-29  Jakub Jelinek  
> 
>   PR c++/101210
>   * match.pd ((intptr_t)x eq/ne CST to x eq/ne (typeof x) CST): Don't
>   perform the optimization in GENERIC when sanitizing and x has a
>   reference type.
> 
>   * g++.dg/ubsan/pr101210.C: New test.
> 
> --- gcc/match.pd.jj   2021-06-14 12:27:18.605410685 +0200
> +++ gcc/match.pd  2021-06-28 10:08:22.535038549 +0200
> @@ -5124,7 +5124,12 @@ (define_operator_list COND_TERNARY
>(cmp (convert @0) INTEGER_CST@1)
>(if (((POINTER_TYPE_P (TREE_TYPE (@0))
>&& !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@0)))
> -  && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
> +  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +  /* Don't perform this optimization in GENERIC if @0 has reference
> + type when sanitizing.  See PR101210.  */
> +  && !(GENERIC
> +   && TREE_CODE (TREE_TYPE (@0)) == REFERENCE_TYPE
> +   && (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT
>   || (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>   && POINTER_TYPE_P (TREE_TYPE (@1))
>   && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@1)
> --- gcc/testsuite/g++.dg/ubsan/pr101210.C.jj  2021-06-28 10:08:37.773825299 +0200
> +++ gcc/testsuite/g++.dg/ubsan/pr101210.C 2021-06-28 10:06:10.647884171 +0200
> @@ -0,0 +1,13 @@
> +// PR c++/101210
> +// { dg-do run }
> +// { dg-options "-fsanitize=null,alignment -fno-sanitize-recover=null,alignment" }
> +
> +int v[2];
> +int
> +main ()
> +{
> +  int x;
> +  int &y = x;
> +  v[0] = reinterpret_cast<__INTPTR_TYPE__>(&y) == 0;
> +  v[1] = reinterpret_cast<__INTPTR_TYPE__>(&y) == 1;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

[PATCH] tree-optimization/101242 - fix reverse graph entry detection

2021-06-29 Thread Richard Biener

This avoids detecting random unrelated nodes as possible entries
to not backwards reachable regions of the SLP graph.  Instead
explicitly add the problematic nodes.

This temporarily XFAILs gcc.dg/vect/pr67790.c until I get the
permute propagation adjusted to when it needs more than one
optimistic iteration.
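
A rough standalone sketch of the force-leaf idea, assuming a null child stands for an operand that has no SLP node (for example a PHI's initial value).  The node type and the function names are hypothetical miniatures, not the vectorizer's data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical miniature of an SLP node: a null child stands for an operand
// that is not represented by a node of its own.
struct node {
  std::vector<node *> children;
};

// Collect all nodes reachable from `n` and record which of them act as leafs.
void build_vertices(node *n, std::set<node *> &visited,
                    std::vector<node *> &vertices,
                    std::vector<node *> &leafs)
{
  if (!visited.insert(n).second)
    return;                       // already seen; this also stops on cycles
  vertices.push_back(n);

  bool leaf = true;
  bool force_leaf = false;
  for (node *child : n->children) {
    if (child) {
      leaf = false;
      build_vertices(child, visited, vertices, leafs);
    } else {
      force_leaf = true;          // unrepresented operand
    }
  }
  if (leaf || force_leaf)
    leafs.push_back(n);
}

// A two-node cycle whose only way in is the PHI's unrepresented operand.
std::size_t leafs_in_cycle()
{
  node phi, add;
  phi.children = {nullptr, &add};
  add.children = {&phi};
  std::set<node *> visited;
  std::vector<node *> vertices, leafs;
  build_vertices(&add, visited, vertices, leafs);
  assert(vertices.size() == 2);
  return leafs.size();            // only `phi` is recorded as a leaf
}
```

Without the force_leaf flag this cycle would contribute no leaf at all, which is the situation the removed per-instance fallback tried to patch up after the fact.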

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-29  Richard Biener  

PR tree-optimization/101242
* tree-vect-slp.c (vect_slp_build_vertices): Force-add
PHIs with not represented initial values as leafs.

* gcc.dg/vect/bb-slp-pr101242.c: New testcase.
* gcc.dg/vect/pr67790.c: XFAIL scan for zero VEC_PERM_EXPR.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr101242.c | 38 +
 gcc/testsuite/gcc.dg/vect/pr67790.c |  2 +-
 gcc/tree-vect-slp.c | 24 ++---
 3 files changed, 50 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101242.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101242.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101242.c
new file mode 100644
index 000..d8854468df4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101242.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+
+typedef struct {
+  double real;
+  double imag;
+} complex;
+typedef struct {
+  complex e[3][3];
+} su3_matrix;
+su3_matrix check_su3_c;
+double check_su3_ar, check_su3_ari, check_su3_max;
+int arireturn();
+int check_su3() {
+  check_su3_ar = check_su3_c.e[0][0].real * check_su3_c.e[1][0].real +
+ check_su3_c.e[0][0].imag * check_su3_c.e[1][0].imag +
+ check_su3_c.e[0][1].real * check_su3_c.e[1][1].real +
+ check_su3_c.e[0][1].imag * check_su3_c.e[1][1].imag +
+ check_su3_c.e[0][2].real * check_su3_c.e[1][2].real +
+ check_su3_c.e[0][2].imag * check_su3_c.e[1][2].imag;
+  check_su3_max = check_su3_c.e[0][0].real * check_su3_c.e[2][0].real +
+  check_su3_c.e[0][0].imag * check_su3_c.e[2][0].imag +
+  check_su3_c.e[0][1].real * check_su3_c.e[2][1].real +
+  check_su3_c.e[0][1].imag * check_su3_c.e[2][1].imag +
+  check_su3_c.e[0][2].real * check_su3_c.e[2][2].real +
+  check_su3_c.e[0][2].imag * check_su3_c.e[2][2].imag;
+  check_su3_ari = check_su3_ar;
+  if (check_su3_ari)
+check_su3_max = check_su3_c.e[1][0].real * check_su3_c.e[2][0].real +
+check_su3_c.e[1][0].imag * check_su3_c.e[2][0].imag +
+check_su3_c.e[1][1].real * check_su3_c.e[2][1].real +
+check_su3_c.e[1][1].imag * check_su3_c.e[2][1].imag +
+check_su3_c.e[1][2].real * check_su3_c.e[2][2].real +
+check_su3_c.e[1][2].imag * check_su3_c.e[2][2].imag;
+  if (check_su3_max)
+arireturn();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr67790.c b/gcc/testsuite/gcc.dg/vect/pr67790.c
index 32eacd91fda..0555d41abf7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr67790.c
+++ b/gcc/testsuite/gcc.dg/vect/pr67790.c
@@ -38,4 +38,4 @@ int main()
 }
 
 /* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail *-*-* } } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 5401dbe4d5e..63b6e6a24b9 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -3499,13 +3499,21 @@ vect_slp_build_vertices (hash_set &visited, slp_tree node,
   vertices.safe_push (slpg_vertex (node));
 
   bool leaf = true;
+  bool force_leaf = false;
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 if (child)
   {
leaf = false;
vect_slp_build_vertices (visited, child, vertices, leafs);
   }
-  if (leaf)
+else
+  force_leaf = true;
+  /* Since SLP discovery works along use-def edges all cycles have an
+ entry - but there's the exception of cycles where we do not handle
+ the entry explicitly (but with a NULL SLP node), like some reductions
+ and inductions.  Force those SLP PHIs to act as leafs to make them
+ backwards reachable.  */
+  if (leaf || force_leaf)
 leafs.safe_push (node->vertex);
 }
 
@@ -3519,18 +3527,8 @@ vect_slp_build_vertices (vec_info *info, vec &vertices,
   unsigned i;
   slp_instance instance;
   FOR_EACH_VEC_ELT (info->slp_instances, i, instance)
-{
-  unsigned n_v = vertices.length ();
-  unsigned n_l = leafs.length ();
-  vect_slp_build_vertices (visited, SLP_INSTANCE_TREE (instance), vertices,
-  leafs);
-  /* If we added vertices but no entries to the reverse graph we've
-added a cycle that is not backwards-reachable.   Push the entry
-to mimic as leaf then.  */
-  if (vertices.length () > n_v
- && lea

[PATCH] match.pd: Avoid (intptr_t)x eq/ne CST to x eq/ne (typeof x) CST opt in GENERIC when sanitizing [PR101210]

2021-06-29 Thread Jakub Jelinek via Gcc-patches

Hi!

When we have (intptr_t) x == cst where x has REFERENCE_TYPE, this
optimization creates x == cst out of it where cst has REFERENCE_TYPE.
If it is done in GENERIC folding, it can result in ubsan failures
where the INTEGER_CST with REFERENCE_TYPE is instrumented.

Fixed by deferring it to GIMPLE folding in this case.
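
The guard described above can be modelled outside the compiler. What follows is a toy sketch, not GCC code: the bit values `kSanitizeNull` and `kSanitizeAlignment` and the helper `may_fold_pointer_compare` are hypothetical stand-ins for `SANITIZE_NULL`, `SANITIZE_ALIGNMENT` and the condition in match.pd. It only shows which combinations of inputs suppress the simplification.

```cpp
#include <cstdint>

// Hypothetical bit values chosen for illustration; the real SANITIZE_*
// constants are defined inside GCC and may differ.
constexpr std::uint32_t kSanitizeNull = 1u << 0;
constexpr std::uint32_t kSanitizeAlignment = 1u << 1;

// Returns true when the "(intptr_t) x ==/!= CST" simplification may run.
//   in_generic:        the fold is attempted on GENERIC rather than GIMPLE
//   is_reference_type: x has REFERENCE_TYPE
//   sanitize_flags:    active sanitizer bits (a model of flag_sanitize)
constexpr bool may_fold_pointer_compare(bool in_generic,
                                        bool is_reference_type,
                                        std::uint32_t sanitize_flags) {
  const bool sanitizing =
      (sanitize_flags & (kSanitizeNull | kSanitizeAlignment)) != 0;
  // Only the combination of all three conditions blocks the fold, so the
  // same expression can still be simplified later during GIMPLE folding.
  return !(in_generic && is_reference_type && sanitizing);
}
```

In this model the fold is skipped only for a reference-typed operand in GENERIC with one of the two sanitizers active; every other combination is unchanged.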

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-29  Jakub Jelinek  

PR c++/101210
* match.pd ((intptr_t)x eq/ne CST to x eq/ne (typeof x) CST): Don't
perform the optimization in GENERIC when sanitizing and x has a
reference type.

* g++.dg/ubsan/pr101210.C: New test.

--- gcc/match.pd.jj 2021-06-14 12:27:18.605410685 +0200
+++ gcc/match.pd 2021-06-28 10:08:22.535038549 +0200
@@ -5124,7 +5124,12 @@ (define_operator_list COND_TERNARY
   (cmp (convert @0) INTEGER_CST@1)
   (if (((POINTER_TYPE_P (TREE_TYPE (@0))
 && !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@0)))
-&& INTEGRAL_TYPE_P (TREE_TYPE (@1)))
+&& INTEGRAL_TYPE_P (TREE_TYPE (@1))
+/* Don't perform this optimization in GENERIC if @0 has reference
+   type when sanitizing.  See PR101210.  */
+&& !(GENERIC
+ && TREE_CODE (TREE_TYPE (@0)) == REFERENCE_TYPE
+ && (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))))
|| (INTEGRAL_TYPE_P (TREE_TYPE (@0))
&& POINTER_TYPE_P (TREE_TYPE (@1))
&& !FUNC_OR_METHOD_TYPE_P (TREE_TYPE (TREE_TYPE (@1)))))
--- gcc/testsuite/g++.dg/ubsan/pr101210.C.jj 2021-06-28 10:08:37.773825299 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr101210.C 2021-06-28 10:06:10.647884171 +0200
@@ -0,0 +1,13 @@
+// PR c++/101210
+// { dg-do run }
+// { dg-options "-fsanitize=null,alignment -fno-sanitize-recover=null,alignment" }
+
+int v[2];
+int
+main ()
+{
+  int x;
+  int &y = x;
+  v[0] = reinterpret_cast<__INTPTR_TYPE__>(&y) == 0;
+  v[1] = reinterpret_cast<__INTPTR_TYPE__>(&y) == 1;
+}

Jakub

Re: [PATCH v5 2/2] x86: Add vec_duplicate expander

2021-06-29 Thread Richard Sandiford via Gcc-patches

"H.J. Lu via Gcc-patches"  writes:
> On Mon, Jun 28, 2021 at 5:36 AM Richard Sandiford
>  wrote:
>>
>> "H.J. Lu"  writes:
>> > On Sun, Jun 27, 2021 at 2:00 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> "H.J. Lu via Gcc-patches"  writes:
>> >> > On Sun, Jun 27, 2021 at 1:43 AM Richard Sandiford
>> >> >  wrote:
>> >> >>
>> >> >> "H.J. Lu"  writes:
>> >> >> > 1. Update vec_duplicate to allow to fail so that backend can only allow
>> >> >> > broadcasting an integer constant to a vector when broadcast instruction
>> >> >> > is available.  This can be used by memset expander to avoid vec_duplicate
>> >> >> > when loading from constant pool is more efficient.
>> >> >>
>> >> >> I don't see any changes in target-independent code though, other than
>> >> >> the doc update.  It's still the case that (existing) uses of
>> >> >> vec_duplicate_optab do not allow it to fail.
>> >> >
>> >> > I have a followup patch set on
>> >> >
>> >> > https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pieces/broadcast
>> >> >
>> >> > to use it to expand memset with vector broadcast:
>> >> >
>> >> > https://gitlab.com/x86-gcc/gcc/-/commit/991c87f8a83ca736ae9ed92baa3ebadca289f6e3
>> >> >
>> >> > For SSE2 which doesn't have vector broadcast, the constant vector broadcast
>> >> > expander returns FAIL and load from constant pool will be used.
>> >>
>> >> Hmm, but as Jeff and I mentioned in the earlier replies,
>> >> vec_duplicate_optab shouldn't be used for constants.  Constants
>> >> should go via the move expanders instead.
>> >>
>> >> In a previous message I suggested:
>> >>
>> >>   … would it work to change:
>> >>
>> >> /* Try using vec_duplicate_optab for uniform vectors.  */
>> >> if (!TREE_SIDE_EFFECTS (exp)
>> >> && VECTOR_MODE_P (mode)
>> >> && eltmode == GET_MODE_INNER (mode)
>> >> && ((icode = optab_handler (vec_duplicate_optab, mode))
>> >> != CODE_FOR_nothing)
>> >> && (elt = uniform_vector_p (exp)))
>> >>
>> >>   to something like:
>> >>
>> >> /* Try using vec_duplicate_optab for uniform vectors.  */
>> >> if (!TREE_SIDE_EFFECTS (exp)
>> >> && VECTOR_MODE_P (mode)
>> >> && eltmode == GET_MODE_INNER (mode)
>> >> && (elt = uniform_vector_p (exp)))
>> >>   {
>> >> if (TREE_CODE (elt) == INTEGER_CST
>> >> || TREE_CODE (elt) == POLY_INT_CST
>> >> || TREE_CODE (elt) == REAL_CST
>> >> || TREE_CODE (elt) == FIXED_CST)
>> >>   {
>> >> rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
>> >> emit_move_insn (target, src);
>> >> break;
>> >>   }
>> >> …
>> >>   }
>> >>
>> >> if that code was the source of the constant operand.  If we're adding a
>> >> new use of vec_duplicate_optab then that should be similarly protected
>> >> against constant operands.
>> >>
>> >
>> > Your comments apply to my initial vec_duplicate patch that caused the
>> > gcc.dg/pr100239.c failure.  It has been fixed by
>> >
>> > commit ffe3a37f54ab866d85bdde48c2a32be5e09d8515
>> > Author: Richard Biener 
>> > Date:   Mon Jun 7 20:08:13 2021 +0200
>> >
>> > middle-end/100951 - make sure to generate VECTOR_CST in lowering
>> >
>> > When vector lowering creates piecewise ops make sure to create
>> > VECTOR_CSTs instead of CONSTRUCTORs when possible.
>> >
>> > The problem I am running into now is in my memset vector broadcast
>> > patch.  In order to optimize vector broadcast for memset, I need to
>> > generate a pseudo register for
>> >
>> >  __builtin_memset (ops, 3, 38);
>> >
>> > only when vector broadcast is available:
>> >
>> >   rtx target = nullptr;
>> >
>> >   unsigned int nunits = GET_MODE_SIZE (mode) / GET_MODE_SIZE (QImode);
>> >   machine_mode vector_mode;
>> >   if (!mode_for_vector (QImode, nunits).exists (&vector_mode))
>> > gcc_unreachable ();
>> >
>> >   enum insn_code icode = optab_handler (vec_duplicate_optab,
>> > vector_mode);
>> >   if (icode != CODE_FOR_nothing)
>> > {
>> >   rtx reg = targetm.gen_memset_scratch_rtx (vector_mode);
>> >   class expand_operand ops[2];
>> >   create_output_operand (&ops[0], reg, vector_mode);
>> >   create_input_operand (&ops[1], data, QImode);
>> >   if (maybe_expand_insn (icode, 2, ops))
>> > {
>> >   if (!rtx_equal_p (reg, ops[0].value))
>> > emit_move_insn (reg, ops[0].value);
>> >   target = lowpart_subreg (mode, reg, vector_mode);
>> > }
>> > }
>> >
>> >   return target;  <<< Return nullptr to load from constant pool.
>>
>> I don't think this is a correct use of vec_duplicate_optab.  If the
>> scalar operand is a constant then the move should always go through
>> the move expanders instead, as a move from a CONST_VECTOR.
>
> Like this

[PATCH 2/2] RISC-V: Add ldr/str instruction for T-HEAD.

2021-06-29 Thread Jojo R via Gcc-patches

gcc/
* gcc/config/riscv/riscv-opts.h (TARGET_LDR): New.
(TARGET_LDUR): Likewise.
* gcc/config/riscv/riscv.h (INDEX_REG_CLASS): Use TARGET_LDR.
(REGNO_OK_FOR_INDEX_P): Use TARGET_LDR.
(REG_OK_FOR_INDEX_P): Use REGNO_OK_FOR_INDEX_P.
* gcc/config/riscv/riscv.c (riscv_address_type): Add ADDRESS_REG_REG,
ADDRESS_REG_UREG.
(riscv_address_info): Add shift.
(riscv_classify_address_index): New.
(riscv_classify_address): Use riscv_classify_address_index.
(riscv_legitimize_address_index_p): New.
(riscv_output_move_index): New.
(riscv_output_move): Add parameter, Use riscv_output_move_index.
(riscv_print_operand_address): Use ADDRESS_REG_REG, ADDRESS_REG_UREG.
* gcc/config/riscv/riscv-protos.h (riscv_output_move): Update riscv_output_move.
* gcc/config/riscv/riscv.md (zero_extendsidi2): Use riscv_output_move.
(zero_extendhi2): Likewise.
(zero_extendqi2): Likewise.
(extendsidi2): Likewise.
(extend2): Likewise.
* gcc/config/riscv/predicates.md (sync_memory_operand): New.
* gcc/config/riscv/sync.md (atomic_store): Use sync_memory_operand.
(atomic_): Likewise.
(atomic_fetch_): Likewise.
(atomic_exchange): Likewise.
(atomic_cas_value_strong): Likewise.
(atomic_compare_and_swap): Likewise.
(atomic_test_and_set): Likewise.

gcc/testsuite/
* gcc.target/riscv/xthead/riscv-xthead.exp: New.
* gcc.target/riscv/xthead/ldr.c: Likewise.
---
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-opts.h |   3 +
 gcc/config/riscv/riscv-protos.h   |   2 +-
 gcc/config/riscv/riscv.c  | 234 --
 gcc/config/riscv/riscv.h  |   7 +-
 gcc/config/riscv/riscv.md |  50 ++--
 gcc/config/riscv/sync.md  |  14 +-
 gcc/testsuite/gcc.target/riscv/xthead/ldr.c   |  34 +++
 .../gcc.target/riscv/xthead/riscv-xthead.exp  |  41 +++
 9 files changed, 348 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xthead/ldr.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xthead/riscv-xthead.exp

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 232115135544..802e7a40e880 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -217,3 +217,7 @@
 {
   return riscv_gpr_save_operation_p (op);
 })
+
+(define_predicate "sync_memory_operand"
+  (and (match_operand 0 "memory_operand")
+   (match_code "reg" "0")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index a2d84a66f037..d3163cb2377c 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -76,4 +76,7 @@ enum stack_protector_guard {
 #define MASK_XTHEAD_C (1 << 0)
 #define TARGET_XTHEAD_C ((riscv_x_subext & MASK_XTHEAD_C) != 0)
 
+#define TARGET_LDR (TARGET_XTHEAD_C)
+#define TARGET_LDUR (TARGET_XTHEAD_C)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 43d7224d6941..3a218f327c42 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -52,9 +52,9 @@ extern bool riscv_legitimize_move (machine_mode, rtx, rtx);
 extern rtx riscv_subword (rtx, bool);
 extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
-extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
 #ifdef RTX_CODE
+extern const char *riscv_output_move (rtx, rtx, rtx_code outer = UNKNOWN);
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
 extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 576960bb37cb..7d321826f669 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -80,6 +80,12 @@ along with GCC; see the file COPYING3.  If not see
A natural register + offset address.  The register satisfies
riscv_valid_base_register_p and the offset is a const_arith_operand.
 
+  ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+  ADDRESS_REG_UREG
+   A base register indexed by (optionally scaled) zero-extended register.
+
ADDRESS_LO_SUM
A LO_SUM rtx.  The first operand is a valid base register and
the second operand is a symbolic address.
@@ -91,6 +97,8 @@ along with GCC; see the file COPYING3.  If not see
A constant symbolic address.  */
 enum riscv_address_type {
   ADDRESS_REG,
+  ADDRESS_REG_REG,
+  ADDRESS_REG_UREG,
   ADDRESS_LO_SUM,
   ADDRESS_CONST_INT,
   ADDRESS_SYMBOLIC
@@ -175,6 +183,11 @@ struct riscv_arg_info {
ADDRESS_REG
REG is the base r

[PATCH 1/2] RISC-V: Add arch flags for T-HEAD.

2021-06-29 Thread Jojo R via Gcc-patches

gcc/
* gcc/config/riscv/riscv.opt (riscv_x_subext): New.
* gcc/config/riscv/riscv-opts.h (MASK_XTHEAD_C): New.
(TARGET_XTHEAD_C): Likewise.
* gcc/common/config/riscv/riscv-common.c
(riscv_ext_flag_table): Use riscv_x_subext & MASK_XTHEAD_C.
---
 gcc/common/config/riscv/riscv-common.c | 2 ++
 gcc/config/riscv/riscv-opts.h  | 3 +++
 gcc/config/riscv/riscv.opt | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.c b/gcc/common/config/riscv/riscv-common.c
index 10868fd417dc..a62080129259 100644
--- a/gcc/common/config/riscv/riscv-common.c
+++ b/gcc/common/config/riscv/riscv-common.c
@@ -906,6 +906,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
 
+  {"xtheadc", &gcc_options::x_riscv_x_subext, MASK_XTHEAD_C},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index f4cf6ca4b823..a2d84a66f037 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -73,4 +73,7 @@ enum stack_protector_guard {
 #define TARGET_ZICSR    ((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
 
+#define MASK_XTHEAD_C (1 << 0)
+#define TARGET_XTHEAD_C ((riscv_x_subext & MASK_XTHEAD_C) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 5ff85c214307..84176aea05e9 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -195,6 +195,9 @@ long riscv_stack_protector_guard_offset = 0
 TargetVariable
 int riscv_zi_subext
 
+TargetVariable
+int riscv_x_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.24.3 (Apple Git-128)
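
The table-driven flag lookup that the patch above extends can be sketched in isolation. This is a simplified model under stated assumptions, not the code in riscv-common.c: `ToyOptions`, `ExtFlagEntry`, `enable_extension` and `target_xthead_c` are hypothetical names, and the `MASK_ZICSR`/`MASK_ZIFENCEI` values are illustrative. Only `MASK_XTHEAD_C (1 << 0)` and the `"xtheadc"` entry are taken from the patch.

```cpp
#include <cstring>

// Toy stand-in for gcc_options; only the two sub-extension words are modelled.
struct ToyOptions {
  int riscv_zi_subext = 0;
  int riscv_x_subext = 0;
};

constexpr int MASK_ZICSR = 1 << 0;     // illustrative value
constexpr int MASK_ZIFENCEI = 1 << 1;  // illustrative value
constexpr int MASK_XTHEAD_C = 1 << 0;  // value as defined in the patch

// One row per extension: its name, the options word it lives in, and its bit.
struct ExtFlagEntry {
  const char *ext;
  int ToyOptions::*var_ref;
  int mask;
};

const ExtFlagEntry ext_flag_table[] = {
    {"zicsr", &ToyOptions::riscv_zi_subext, MASK_ZICSR},
    {"zifencei", &ToyOptions::riscv_zi_subext, MASK_ZIFENCEI},
    {"xtheadc", &ToyOptions::riscv_x_subext, MASK_XTHEAD_C},
    {nullptr, nullptr, 0},  // sentinel terminating the table
};

// Sets the bit for `ext` in `opts`; returns false for an unknown extension.
bool enable_extension(ToyOptions &opts, const char *ext) {
  for (const ExtFlagEntry *e = ext_flag_table; e->ext != nullptr; ++e) {
    if (std::strcmp(e->ext, ext) == 0) {
      opts.*(e->var_ref) |= e->mask;
      return true;
    }
  }
  return false;
}

// Model of the TARGET_XTHEAD_C test macro.
bool target_xthead_c(const ToyOptions &opts) {
  return (opts.riscv_x_subext & MASK_XTHEAD_C) != 0;
}
```

Because each sub-extension family has its own word, `MASK_XTHEAD_C` can reuse bit 0 without colliding with `MASK_ZICSR` in this model.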

[PATCH 0/2] RISC-V: Add ldr/str instruction for T-HEAD.

2021-06-29 Thread Jojo R via Gcc-patches

T-HEAD extends the base ISA with some custom instructions for its cores.
These patches add support for the ldr/str instructions, which resemble Arm's
LDR: the addressing mode is a base register indexed by an (optionally scaled)
register.
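
The addressing form described above can be written out as plain arithmetic. This is a sketch under assumptions, not a statement of the T-HEAD encoding: it assumes a 64-bit target, a shift amount smaller than 64, and that the zero-extended variant (`ADDRESS_REG_UREG` in patch 2/2) extends the low 32 bits of the index. The function names are hypothetical.

```cpp
#include <cstdint>

// ADDRESS_REG_REG: base + (index << shift), using the full index register.
constexpr std::uint64_t reg_reg_address(std::uint64_t base,
                                        std::uint64_t index,
                                        unsigned shift) {
  return base + (index << shift);
}

// ADDRESS_REG_UREG: the low 32 bits of the index are zero-extended to 64 bits
// before scaling (the 32-bit width is an assumption of this sketch).
constexpr std::uint64_t reg_ureg_address(std::uint64_t base,
                                         std::uint64_t index,
                                         unsigned shift) {
  const std::uint64_t zext = static_cast<std::uint32_t>(index);
  return base + (zext << shift);
}
```

For example, indexing an array of 4-byte elements corresponds to a shift of 2, so element 3 of an array at 0x1000 is addressed at 0x100c.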

Re: pdp11: Fix warnings to allow compilation with a recent GCC and --enable-werror-always

2021-06-29 Thread Jan-Benedict Glaw

On Mon, 2021-06-28 16:17:13 +, Koning, Paul  wrote:
> > On Jun 28, 2021, at 11:33 AM, Jan-Benedict Glaw  wrote:
> > 
> > I'd like to install this patch to let the pdp11-aout configuration
> > build again with eg.
[...]
> > Okay for master?
> Yes, thanks!

Pushed. Thanks!

MfG, JBG


Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with a * b

2021-06-29 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 21 Jun 2021 at 14:04, Prathamesh Kulkarni
 wrote:
>
> On Mon, 14 Jun 2021 at 13:27, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Wed, 26 May 2021 at 14:07, Marc Glisse  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > > > >
> > > > > > > > The attached patch removes calls to builtins in vmul_n* (a, b) with __a * __b.
> > > > > >
> > > > > > > I am not familiar with neon, but are __a and __b unsigned here? Otherwise,
> > > > > > > is vmul_n already undefined in case of overflow?
> > > > > Hi Marc,
> > > > > Sorry for late reply, for vmul_n_s*, I think they are signed
> > > > > (intx_t).
> > > > Oops, I meant intx_t.
> > > > > > I am not sure how the intrinsic should behave in case of signed overflow,
> > > > > > but I am assuming it's OK since vmul_s* intrinsics leave it undefined too.
> > > > > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of overflow?
> > > The attached version fixes one fallout I missed earlier.
> > > Is this OK to commit ?
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
> ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
ping * 3 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > --
> > > > > > Marc Glisse
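
On the overflow question in the quoted thread: in C and C++ a signed multiplication that overflows is undefined, while unsigned multiplication wraps modulo 2^N. The scalar sketch below illustrates that difference only. It is not the arm_neon.h implementation and not what the patch does; `wrapping_mul_s32` is a hypothetical helper showing one way to get wrap-around semantics for a signed lane.

```cpp
#include <cstdint>

// Hypothetical scalar helper, not part of arm_neon.h.
// Multiplies two signed 32-bit values with wrap-around semantics by doing the
// arithmetic in uint32_t, where overflow is well defined (modulo 2^32).
inline std::int32_t wrapping_mul_s32(std::int32_t a, std::int32_t b) {
  const std::uint32_t product =
      static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
  // Modular conversion back to signed: guaranteed since C++20, and the
  // behaviour GCC documents for earlier standards.
  return static_cast<std::int32_t>(product);
}
```

Writing the product directly as `a * b` on signed operands instead leaves the overflow case undefined, which is the behaviour being asked about in the thread.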

Re: [ARM] PR66791: Gate comparison in vca intrinsics on __FAST_MATH__

2021-06-29 Thread Prathamesh Kulkarni via Gcc-patches

On Tue, 22 Jun 2021 at 15:04, Prathamesh Kulkarni
 wrote:
>
> Hi,
> The attached patch gates abs(__a) cmp abs(__b) for vca intrinsics on
> __FAST_MATH__. I moved vabs intrinsics before vcage_f32 since vca
> intrinsics use those.
> Bootstrapped+tested on arm-linux-gnueabihf.
> OK to commit ?
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573384.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
