[PATCH] Fix ifcombine ICE (PR tree-optimization/92115)

2019-10-16 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because ifcombine_ifandif attempts to set
GIMPLE_COND condition to a condition that might trap (which is allowed
only in COND_EXPR/VEC_COND_EXPR).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2019-10-17  Jakub Jelinek  

PR tree-optimization/92115
* tree-ssa-ifcombine.c (ifcombine_ifandif): Force condition into
temporary if it could trap.

* gcc.dg/pr92115.c: New test.

--- gcc/tree-ssa-ifcombine.c.jj 2019-09-20 12:25:42.232479343 +0200
+++ gcc/tree-ssa-ifcombine.c2019-10-16 10:05:06.826174814 +0200
@@ -599,6 +599,12 @@ ifcombine_ifandif (basic_block inner_con
   t = canonicalize_cond_expr_cond (t);
   if (!t)
return false;
+  if (!is_gimple_condexpr_for_cond (t))
+   {
+ gsi = gsi_for_stmt (inner_cond);
+ t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr_for_cond,
+ NULL, true, GSI_SAME_STMT);
+   }
   gimple_cond_set_condition_from_tree (inner_cond, t);
   update_stmt (inner_cond);
 
--- gcc/testsuite/gcc.dg/pr92115.c.jj   2019-10-16 10:07:35.923924633 +0200
+++ gcc/testsuite/gcc.dg/pr92115.c  2019-10-16 10:06:56.831514691 +0200
@@ -0,0 +1,10 @@
+/* PR tree-optimization/92115 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fexceptions -ffinite-math-only -fnon-call-exceptions -fsignaling-nans -fno-signed-zeros" } */
+
+void
+foo (double x)
+{
+  if (x == 0.0 && !__builtin_signbit (x))
+__builtin_abort ();
+}

Jakub


[committed] Add testcase for already fixed omp simd ICE (PR fortran/87752)

2019-10-16 Thread Jakub Jelinek
Hi!

This has been fixed already with r273096, but it is useful to also have a
Fortran testcase.  Committed to trunk as obvious.

2019-10-17  Jakub Jelinek  

PR fortran/87752
* gfortran.dg/gomp/pr87752.f90: New test.

--- gcc/testsuite/gfortran.dg/gomp/pr87752.f90.jj   2019-10-16 
10:19:09.584439754 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr87752.f90  2019-10-16 10:19:02.343549209 
+0200
@@ -0,0 +1,12 @@
+! PR fortran/87752
+! { dg-do compile }
+! { dg-additional-options "-Ofast" }
+
+subroutine foo (n, u, v)
+  integer :: n
+  real, pointer :: u(:), v(:)
+  !$omp parallel do simd
+  do i = 1, n
+u(:) = v(:)
+  end do
+end

Jakub


Re: [PATCH] [x86] Add detection of Icelake Client and Server

2019-10-16 Thread Uros Bizjak
> gcc/ChangeLog:
> * config/i386/driver-i386.c (host_detect_local_cpu): Handle
>   icelake-client and icelake-server.
> * testsuite/gcc.target/i386/builtin_target.c (check_intel_cpu_model):
>   Verify icelakes are detected correctly.
>
> libgcc/ChangeLog:
> * config/i386/cpuinfo.c (get_intel_cpu): Handle icelake-client
>   and icelake-server.

Please also state how you bootstrapped and tested the patch.

Otherwise OK.

Thanks,
Uros.


Re: [PATCH] i386: Add clear_ratio to processor_costs

2019-10-16 Thread Uros Bizjak
On Wed, Oct 16, 2019 at 5:06 PM H.J. Lu  wrote:
>
> i386.h has
>
>  #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
>
> It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
> to processor_costs, sets it to the minimum of 6 and move_ratio in all
> cost models and defines CLEAR_RATIO with clear_ratio.
>
> * config/i386/i386.h (processor_costs): Add clear_ratio.
> (CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
> * config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
> of 6 and move_ratio in all cost models.
>
> OK for trunk?

LGTM. Are these numbers backed by some benchmark results?

Thanks,
Uros.

>
> --
> H.J.


Re: [PATCH] Fix -fdebug-types-section ICE, PR91887

2019-10-16 Thread Jason Merrill

On 10/16/19 6:11 AM, Richard Biener wrote:


The following makes sure we correctly identify a parm DIE created
early in a formal parameter pack during late annotation.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?


OK, thanks.

Jason



Re: [RFC, Darwin, PPC] Fix PR 65342.

2019-10-16 Thread Alan Modra
On Sat, Oct 12, 2019 at 05:39:51PM -0500, Segher Boessenkool wrote:
> On Sat, Oct 12, 2019 at 10:13:16PM +0100, Iain Sandoe wrote:
> > For 32bit cases this isn't a problem since we can load/store to unaligned
> > addresses using D-mode insns.
> 
> Can you?  -m32 -mpowerpc64?  We did have a bug with this before, maybe
> six years ago or so...  Alan, do you remember?  It required some assembler
> work IIRC.

Yes, the ppc32 ABI doesn't have the relocs to support DS fields.
Rather than defining a whole series of _DS (and _DQ!) relocs, the
linker inspects the instruction being relocated and complains if the
relocation would modify opcode bits.  See is_insn_ds_form in
bfd/elf32-ppc.c.  We do the same on ppc64 for DQ field insns.

> I'll have another look through this (esp. the generic part) when I'm fresh
> awake (but not before coffee!).  Alan, can you have a look as well please?

It looks reasonable to me.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-16 Thread Jason Merrill

On 10/16/19 11:59 AM, Paolo Carlini wrote:
... the below, slightly extended patch: 1- Makes sure the 
in_system_header_at calls surviving in decl.c get the same location used 
for the corresponding diagnostic


Hmm, we probably want to change permerror to respect warn_system_headers 
like warning and pedwarn.


Jason


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jeff Law
On 10/16/19 9:43 AM, Martin Sebor wrote:
> On 10/16/19 9:11 AM, Richard Sandiford wrote:
>> Sorry for the slow reply.
>>
>> Bernd Edlinger  writes:
>>> Hi,
>>>
>>> this is probably on the border to obvious.
>>>
>>> The REGEXP_xxx macros in genautomata are invoked
>>> recursively, and the local values are all named _regexp
>>> and shadow each other.
>>>
>>>
>>> Fixed by using different names _regexp1..6 for each
>>> macro.
>>
>> Sorry to repeat the complaint about numerical suffixes, but I think
>> we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
>> and similarly for the other macros.  But a similar fix to rtl.h might
>> be better.
> 
> Should the warning trigger when the shadowing name results from
> macro expansion?  The author of a macro can't (in general) know
> what context it's going to be used, and when different macros
> come from two different third party headers, it would seem
> pointless to force their users to jump through hoops just to
> avoid the innocuous shadowing.  Such as in this example:
One could make the argument that if you want to avoid shadowing, then
you should avoid code within macros for this precise reason.  And if
you're getting code within macros from 3rd parties, then well, you're in
for a world of pain if you're going to try to be shadow-free.



jeff


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Eric Gallager
On 10/16/19, Jakub Jelinek  wrote:
> On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
>> > The counter example would be:
>> > #define F(x) \
>> >__extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
>> > #define G(x) \
>> >__extension__ (({ __typeof__ (x) _x = x; F(_x); }))
>> > where a -Wshadow diagnostics could point the author at a serious bug,
>> > because in the expansion it will be __typeof__ (_x) _x = _x; ...
>>
>> True.  I don't suppose there is a way to make it so the warning
>> triggers for the counter example and not for the original, is
>> there?
>
> Maybe look through the macro nesting context and if the shadowing
> declaration comes from the same macro as shadowed declaration
> or macro included directly or indirectly from the macro with shadowed
> declaration, warn, otherwise not?
> This might still not warn in case where the scope of the shadowing
> declaration is created from multiple macros ({ coming from one,
> }) from another one, but otherwise could work.
> Perhaps -Wshadow-local needs multiple modes, the default one that
> will have this macro handling and full one (=2) which would warn
> regardless of macro definitions.
>

I'm worried about the proliferation of the number of '=' signs here...
there's already confusion as to whether the first '=' represents
levels or just a different spelling of names, adding a second would
only compound the confusion.

>   Jakub
>


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Wed, Oct 16, 2019 at 09:04:18PM +0100, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > This isn't canonical RTL.  Does combine not simplify this?
>> >
>> > Or, rather, it should not be what we canonicalise to: nothing is defined
>> > here.
>> 
>> But when nothing is defined, let's match what we get :-)
>
> Of course.
>
>> If someone wants to add a new canonical form then the ports should of
>> course adapt, but until then I think the patch is doing the right thing.
>
> We used to generate this, until GCC 5.  There aren't many ports that have
> adapted yet.

The patch has testcases, so this won't be a silent failure for SVE2
if things change again in future.

>> > If the mask is not a constant, we really shouldn't generate a totally
>> > different form.  The xor-and-xor form is very hard to handle, too.
>> >
>> > Expand currently generates this, because gimple thinks this is simpler.
>> > I think this should be fixed.
>> 
>> But the constant form is effectively folding away the NOT.
>> Without it the equivalent rtl uses 4 operations rather than 3:
>> 
>>   (ior (and A C) (and B (not C)))
>
> RTL canonicalisation rules are not based around number of ops.

Not to the exclusion of all else, sure.  But my point was that there
are reasons why forcing the (ior ...) form for non-constants might not
be a strict improvement.

> For example, we do (and (not A) (not B)) rather than (not (ior A B)).

Right, hence my complaint about this the other day on IRC. :-)
I hadn't noticed until then that gimple had a different rule.

> Instead, there are other rules (like here: push "not"s inward,
> which can be applied locally with the wanted result).

Sure.  But I think it's common ground that there's no existing
rtl rule that applies naturally to (xor (and (xor A B) C) B),
where there's no (not ...) to push down.

>> And folding 4 operations gets us into 4-insn combinations, which are
>> obviously more limited (for good reason).
>
> But on most machines it doesn't need to combine more than two or three
> insns to get here.  Reducing the depth of the tree is more useful...  That
> is 3 in both cases here, but "andc" is common on many machines, so that
> makes it only two deep.
>
>> As you say, it's no accident that we get this form, it's something
>> that match.pd specifically chose.  And I think there should be a
>> strong justification for having an RTL canonical form that reverses
>> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
>> to be able to undo the gimple transforms in all cases.
>
> Canonical RTL is different in many ways, already.

Sure, wasn't claiming otherwise.  But most of the rtl canonicalisation
rules predate gimple by some distance, so while the individual choices
are certainly deliberate, the differences weren't necessarily planned
as differences.  Whereas here we're talking about adding a new rtl rule
with the full knowledge that it's the opposite of the equivalent gimple
rule.  If we're going to move in one direction, it seems better to move
towards making the rules more consistent rather than towards deliberately
making them (even) less consistent.

> "Not as powerful", I have no idea what you mean, btw.  RTL is much closer
> to the real machine, so is a lot *more* powerful than Gimple for modelling
> machine instructions (where Gimple is much nicer for higher-level
> optimisations).  We need both.

I meant rtl passes aren't generally as powerful as gimple passes
(which wasn't what I said :-)).  E.g. match.pd sees potential
combinations on gimple stmts that combine wouldn't see for the
corresponding rtl insns.

Richard


[PATCH] [OBVIOUS] Fix old file reference in gcc/cp/cp-gimplify.c

2019-10-16 Thread Luis Machado
I've found this stale reference while looking at cp-gimplify.c. tree-gimplify.c
no longer exists and its contents were merged into gimple.c.

Seems obvious enough.

gcc/cp/ChangeLog:

2019-10-16  Luis Machado  

* cp-gimplify.c: Fix reference to non-existing tree-gimplify.c file.

Signed-off-by: Luis Machado 
---
 gcc/cp/cp-gimplify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 154fa70ec06..0ab0438f601 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -1,4 +1,4 @@
-/* C++-specific tree lowering bits; see also c-gimplify.c and tree-gimple.c.
+/* C++-specific tree lowering bits; see also c-gimplify.c and gimple.c.
 
Copyright (C) 2002-2019 Free Software Foundation, Inc.
Contributed by Jason Merrill 
-- 
2.17.1



Re: [PATCH] handle local aggregate initialization in strlen, take 2 (PR 83821)

2019-10-16 Thread Jakub Jelinek
On Mon, Oct 14, 2019 at 06:23:22PM -0600, Martin Sebor wrote:
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/83821
>   * tree-ssa-strlen.c (maybe_invalidate): Add argument.  Consider
>   the length of a string when available.

> +   fprintf (dump_file,
> +"  statement may clobber string %zu long\n",
> +tree_to_uhwi (size));
> + else
> +   fprintf (dump_file,
> +"  statement may clobber string\n");

This broke bootstrap on i686-linux and likely many other hosts.

I'll commit the following as obvious if it passes bootstrap.

2019-10-16  Jakub Jelinek  

* tree-ssa-strlen.c (maybe_invalidate): Use
HOST_WIDE_INT_PRINT_UNSIGNED instead of "%zu".

--- gcc/tree-ssa-strlen.c.jj2019-10-16 23:10:18.641727500 +0200
+++ gcc/tree-ssa-strlen.c   2019-10-16 23:44:43.202636942 +0200
@@ -1130,7 +1130,8 @@ maybe_invalidate (gimple *stmt, bool zer
  {
if (size && tree_fits_uhwi_p (size))
  fprintf (dump_file,
-  "  statement may clobber string %zu long\n",
+  "  statement may clobber string "
+  HOST_WIDE_INT_PRINT_UNSIGNED " long\n",
   tree_to_uhwi (size));
else
  fprintf (dump_file,


Jakub


[PATCH] [x86] Add detection of Icelake Client and Server

2019-10-16 Thread Thiago Macieira
gcc/ChangeLog:
* config/i386/driver-i386.c (host_detect_local_cpu): Handle
  icelake-client and icelake-server.
* testsuite/gcc.target/i386/builtin_target.c (check_intel_cpu_model):
  Verify icelakes are detected correctly.

libgcc/ChangeLog:
* config/i386/cpuinfo.c (get_intel_cpu): Handle icelake-client
  and icelake-server.
---
 gcc/config/i386/driver-i386.c  | 11 +++
 gcc/testsuite/gcc.target/i386/builtin_target.c | 11 +++
 libgcc/config/i386/cpuinfo.c   | 13 +
 3 files changed, 35 insertions(+)

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 8e8b4d21950..996308c7eaa 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -855,6 +855,17 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
  /* Cannon Lake.  */
  cpu = "cannonlake";
  break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ cpu = "icelake-server";
+ break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ cpu = "icelake-client";
+ break;
case 0x85:
  /* Knights Mill.  */
  cpu = "knm";
diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c 
b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 7a8b6e805ed..ef15d48f84d 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -124,6 +124,17 @@ check_intel_cpu_model (unsigned int family, unsigned int 
model,
  /* Cannon Lake.  */
  assert (__builtin_cpu_is ("cannonlake"));
  break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ assert (__builtin_cpu_is ("icelake-client"));
+ break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ assert (__builtin_cpu_is ("icelake-server"));
+ break;
case 0x17:
case 0x1d:
  /* Penryn.  */
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 5659ec89546..cc9c62f053b 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -232,6 +232,19 @@ get_intel_cpu (unsigned int family, unsigned int model, 
unsigned int brand_id)
  __cpu_model.__cpu_type = INTEL_COREI7;
  __cpu_model.__cpu_subtype = INTEL_COREI7_CANNONLAKE;
  break;
+   case 0x7d:
+   case 0x7e:
+   case 0x9d:
+ /* Icelake Client. */
+ __cpu_model.__cpu_type = INTEL_COREI7;
+ __cpu_model.__cpu_subtype = INTEL_COREI7_ICELAKE_CLIENT;
+ break;
+   case 0x6a:
+   case 0x6c:
+ /* Icelake Server. */
+ __cpu_model.__cpu_type = INTEL_COREI7;
+ __cpu_model.__cpu_subtype = INTEL_COREI7_ICELAKE_SERVER;
+ break;
case 0x17:
case 0x1d:
  /* Penryn.  */
-- 
2.23.0



Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread JeanHeyd Meneide
Thanks, Jason! I fixed those last things and I put the changelog below
in the e-mail. I'll figure out how to write a good changelog in a
commit message on the command line soon. :D



2019-10-16  JeanHeyd Meneide 

gcc/

* escaped_string.h: New. Refactored out of tree.c to make more
broadly available (e.g. to parser.c, cvt.c).
* tree.c: remove escaped_string class

gcc/c-family

* c-lex.c - update attribute value

gcc/cp/

* tree.c: Implement p1301 - nodiscard("should have a reason"))
Added C++2a nodiscard string message handling.
Increase nodiscard argument handling max_length from 0
to 1.
* parser.c: add requirement that nodiscard only be seen
once in attribute-list
* cvt.c: add nodiscard message to output, if applicable
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..f2d2ba6cafb 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %<nodiscard%>%s"));
+  const char* raw_msg = (has_msg ? static_cast<const char*>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
+ format, fn, raw_msg))
inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
+  && (attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype))))
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = has_msg ?
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>: %<%s%>") :
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %<nodiscard%>%s");
+  const char* raw_msg = (has_msg ? static_cast<const char*>(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
+ format, rettype, raw_msg))
{
  if (fn)
inform (DECL_SOURCE_LOCATION (fn),
@@ -1180,7 +1204,7 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
 instantiations be affected by an ABI property that is, or at
 least ought to be transparent to the language.  */
   if (tree fn = cp_get_callee_fndecl_nofold (expr))
-   if (DECL_CONSTRUCTOR_P (fn) || DECL_DESTRUCTOR_P (fn))
+   if (DECL_DESTRUCTOR_P (fn))
  return expr;
 
   maybe_warn_nodiscard (expr, implicit);
diff --g

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 09:04:18PM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > This isn't canonical RTL.  Does combine not simplify this?
> >
> > Or, rather, it should not be what we canonicalise to: nothing is defined
> > here.
> 
> But when nothing is defined, let's match what we get :-)

Of course.

> If someone wants to add a new canonical form then the ports should of
> course adapt, but until then I think the patch is doing the right thing.

We used to generate this, until GCC 5.  There aren't many ports that have
adapted yet.

> > If the mask is not a constant, we really shouldn't generate a totally
> > different form.  The xor-and-xor form is very hard to handle, too.
> >
> > Expand currently generates this, because gimple thinks this is simpler.
> > I think this should be fixed.
> 
> But the constant form is effectively folding away the NOT.
> Without it the equivalent rtl uses 4 operations rather than 3:
> 
>   (ior (and A C) (and B (not C)))

RTL canonicalisation rules are not based around number of ops.  For example,
we do  (and (not A) (not B))  rather than  (not (ior A B)).  Instead,
there are other rules (like here: push "not"s inward, which can be applied
locally with the wanted result).

> And folding 4 operations gets us into 4-insn combinations, which are
> obviously more limited (for good reason).

But on most machines it doesn't need to combine more than two or three
insns to get here.  Reducing the depth of the tree is more useful...  That
is 3 in both cases here, but "andc" is common on many machines, so that
makes it only two deep.

> As you say, it's no accident that we get this form, it's something
> that match.pd specifically chose.  And I think there should be a
> strong justification for having an RTL canonical form that reverses
> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> to be able to undo the gimple transforms in all cases.

Canonical RTL is different in many ways, already.

"Not as powerful", I have no idea what you mean, btw.  RTL is much closer
to the real machine, so is a lot *more* powerful than Gimple for modelling
machine instructions (where Gimple is much nicer for higher-level
optimisations).  We need both.


Segher


Re: [PATCH][AArch64] PR79262: Adjust vector cost

2019-10-16 Thread Richard Sandiford
Wilco Dijkstra  writes:
> ping
>
> PR79262 has been fixed for almost all AArch64 cpus, however the example is 
> still
> vectorized in a few cases, resulting in lower performance.  Increase the cost 
> of
> vector-to-scalar moves so it is more similar to the other vector costs. As a 
> result
> -mcpu=cortex-a53 no longer vectorizes the testcase - libquantum and SPECv6
> performance improves.
>
> OK for commit?
>
> ChangeLog:
> 2018-01-22  Wilco Dijkstra  
>
> PR target/79262
> * config/aarch64/aarch64.c (generic_vector_cost): Adjust 
> vec_to_scalar_cost.

OK, thanks, and sorry for the delay.

qdf24xx_vector_cost is the only specific CPU cost table with a
vec_to_scalar_cost as low as 1.  It's not obvious how emphatic
that choice is though.  It looks like qdf24xx_vector_cost might
(very reasonably!) have started out as a copy of the generic costs
with some targeted changes.

But even if 1 is accurate there from a h/w perspective, the problem
is that the vectoriser's costings have a tendency to miss additional
overhead involved in scalarisation.  Although increasing the cost
to avoid that might be a bit of a hack, it's the accepted hack.

So I suspect in practice all CPUs will benefit from a higher cost,
not just those whose CPU tables already have one.  On that basis,
increasing the generic cost by the smallest possible amount should
be a good change across the board.

If anyone finds a counter-example, please let us know or file a bug.

Richard

> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> c6a83c881038873d8b68e36f906783be63ddde56..43f5b7162152ca92a916f4febee01f624c375202
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -403,7 +403,7 @@ static const struct cpu_vector_cost generic_vector_cost =
>1, /* vec_int_stmt_cost  */
>1, /* vec_fp_stmt_cost  */
>2, /* vec_permute_cost  */
> -  1, /* vec_to_scalar_cost  */
> +  2, /* vec_to_scalar_cost  */
>1, /* scalar_to_vec_cost  */
>1, /* vec_align_load_cost  */
>1, /* vec_unalign_load_cost  */


[PATCH] RISC-V: Include more registers in SIBCALL_REGS.

2019-10-16 Thread Jim Wilson
This finishes the part 1 of 2 patch submitted by Andrew Burgess on Aug 19.
This adds the argument registers but not t0 (aka x5) to SIBCALL_REGS.  It
also adds the missing riscv_regno_to_class change.

Tested with cross riscv32-elf and riscv64-linux toolchain build and check.
There were no regressions.  I see about a 0.01% code size reduction for the
C and libstdc++ libraries.

Committed.

Jim

gcc/
* config/riscv/riscv.h (REG_CLASS_CONTENTS): Add argument passing
regs to SIBCALL_REGS.
* config/riscv/riscv.c (riscv_regno_to_class): Change argument
passing regs to SIBCALL_REGS.
---
 gcc/config/riscv/riscv.c | 6 +++---
 gcc/config/riscv/riscv.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index b8a8778b92c..77a3ad94aa8 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -256,9 +256,9 @@ enum riscv_microarchitecture_type riscv_microarchitecture;
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
   GR_REGS, GR_REGS,SIBCALL_REGS,   SIBCALL_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
-  JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
+  JALR_REGS,   JALR_REGS,  SIBCALL_REGS,   SIBCALL_REGS,
+  SIBCALL_REGS,SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
+  SIBCALL_REGS,SIBCALL_REGS,   JALR_REGS,  JALR_REGS,
   JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
   JALR_REGS,   JALR_REGS,  JALR_REGS,  JALR_REGS,
   SIBCALL_REGS,SIBCALL_REGS,   SIBCALL_REGS,   SIBCALL_REGS,
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 5fc9be8edbf..246494663f6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -400,7 +400,7 @@ enum reg_class
 #define REG_CLASS_CONTENTS \
 {  \
   { 0x, 0x, 0x },  /* NO_REGS */   \
-  { 0xf0c0, 0x, 0x },  /* SIBCALL_REGS */  \
+  { 0xf003fcc0, 0x, 0x },  /* SIBCALL_REGS */  \
   { 0xffc0, 0x, 0x },  /* JALR_REGS */ \
   { 0x, 0x, 0x },  /* GR_REGS */   \
   { 0x, 0x, 0x },  /* FP_REGS */   \
-- 
2.17.1



Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread Jason Merrill

On 10/15/19 8:31 PM, JeanHeyd Meneide wrote:

Attached is a patch for p1301 that improves in the way Jason Merrill
specified earlier
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html)


Great, thanks!

This mail is missing ChangeLog entries.  My guess is that you're using 
git diff to create the patch file; git show or (even better) git 
format-patch will also include the commit message.



+/* Check that the attribute ATTRIBU,TE appears at most once in the


Stray added ,


-  else if (TREE_CODE (expr) == TARGET_EXPR
-  && lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
+  else if (TREE_CODE (expr) == TARGET_EXPR)
 {
   /* The TARGET_EXPR confuses do_warn_unused_result into thinking that the
 result is used, so handle that case here.  */
-  if (fn)
+  if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))

...

+  else if ((attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (type))))


The first two if/else should have already handled nodiscard; this else 
was only intended to handle cases where warn_unused_result wants a 
warning and nodiscard doesn't, i.e. when there's an explicit cast to 
void.  We shouldn't need to change anything here.


Jason



Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-16 Thread Jason Merrill
On 10/16/19 12:27 PM, Jakub Jelinek wrote:
> On Fri, Oct 11, 2019 at 04:14:16PM -0400, Jason Merrill wrote:
>>> On x86_64 and most other targets, cleanup here (if non-NULL) is the
>>> CALL_EXPR, as destructor return type is void, but on arm, as the dtor return
>>> type is some pointer, the CALL_EXPR is wrapped into a NOP_EXPR to void.
>>> protected_set_expr_location then on x86_64 clears the CALL_EXPR location,
>>> but on arm only NOP_EXPR location.
>>>
>>> The following patch (totally untested) should fix that.
>>>
>>> For the warning location, perhaps we could special case destructor calls
>>> in push_cx_call_context (to offset the intentional clearing of location for
>>> debugging purposes), if they don't have location set, don't use
>>> input_location for them, but try to pick DECL_SOURCE_LOCATION for the
>>> variable being destructed?
>>
>> Expanding the CLEANUP_EXPR of a CLEANUP_STMT could use the EXPR_LOCATION of
>> the CLEANUP_STMT.  Or the EXPR_LOCATION of *jump_target, if suitable.
>
> The already previously posted patch (now attached as first) has now been
> bootstrapped/regtested on x86_64-linux and i686-linux, and regardless if we
> improve the location or not should fix the arm vs. the rest of the world
> difference.  Is that ok for trunk?

OK.

> As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
> anything, the diagnostics was still
> constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
> constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
> constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
>  5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error 
> "inline assembly is not a constant expression" }
>|  ^~~
> constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in 
> a ‘constexpr’ function in C++2a
> as without that change.

That's because the patch changes EXPR_LOCATION for evaluation of the
CLEANUP_BODY, but it should be for evaluation of CLEANUP_EXPR instead.


Jason


Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-16 Thread Jason Merrill

On 10/16/19 12:27 PM, Jakub Jelinek wrote:

On Fri, Oct 11, 2019 at 04:14:16PM -0400, Jason Merrill wrote:

On x86_64 and most other targets, cleanup here (if non-NULL) is the
CALL_EXPR, as destructor return type is void, but on arm, as the dtor return
type is some pointer, the CALL_EXPR is wrapped into a NOP_EXPR to void.
protected_set_expr_location then on x86_64 clears the CALL_EXPR location,
but on arm only NOP_EXPR location.

The following patch (totally untested) should fix that.

For the warning location, perhaps we could special case destructor calls
in push_cx_call_context (to offset the intentional clearing of location for
debugging purposes), if they don't have location set, don't use
input_location for them, but try to pick DECL_SOURCE_LOCATION for the
variable being destructed?


Expanding the CLEANUP_EXPR of a CLEANUP_STMT could use the EXPR_LOCATION of
the CLEANUP_STMT.  Or the EXPR_LOCATION of *jump_target, if suitable.


The already previously posted patch (now attached as first) has now been
bootstrapped/regtested on x86_64-linux and i686-linux, and regardless if we
improve the location or not should fix the arm vs. the rest of the world
difference.  Is that ok for trunk?


OK.


As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
anything; the diagnostic was still
constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
 5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error "inline 
assembly is not a constant expression" }
   |  ^~~
constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in a 
‘constexpr’ function in C++2a
as without that change.


That's because the patch changes EXPR_LOCATION for evaluation of the 
CLEANUP_BODY, but it should be for evaluation of CLEANUP_EXPR instead.


Jason


Re: [PATCH] Help compiler detect invalid code

2019-10-16 Thread François Dumont

Here is a version with __detail::__copy and __detail::__copy_backward.

I preferred to introduce the __detail namespace because __copy is quite a 
common name, so putting it in the __detail namespace will, I hope, clarify 
that it is for internal usage only.


I even hesitated to put more details into this namespace, maybe for 
another patch later.


    * include/bits/stl_algobase.h (__memmove): Replace by...
    (__detail::__copy) ...that. Return void, loop as long as __n != 0.
    (__copy_move<_IsMove, true,
    std::random_access_iterator_tag>::__copy_m): Adapt to use latter.
    (__detail::__copy_backward): New.
    (__copy_move_backward<_IsMove, true,
    std::random_access_iterator_tag>::__copy_m): Adapt to use latter.
    (__copy_move_backward_a): Remove std::is_constant_evaluated block.
    * testsuite/25_algorithms/copy/constexpr.cc (test): Add check on copied
    values.
    * testsuite/25_algorithms/copy_backward/constexpr.cc (test): Likewise
    and rename in test1.
    (test2): New.
    * testsuite/25_algorithms/copy/constexpr_neg.cc: New.
    * testsuite/25_algorithms/copy_backward/constexpr.cc: New.
    * testsuite/25_algorithms/equal/constexpr_neg.cc: New.
    * testsuite/25_algorithms/move/constexpr.cc: New.
    * testsuite/25_algorithms/move/constexpr_neg.cc: New.

François

On 10/10/19 10:03 PM, Jonathan Wakely wrote:

On 01/10/19 22:05 +0200, François Dumont wrote:

On 9/27/19 1:24 PM, Jonathan Wakely wrote:

On 20/09/19 07:08 +0200, François Dumont wrote:
I already realized that the previous patch would be too controversial to 
be accepted.


In this new version I just implement a real memmove in __memmove so


A real memmove doesn't just work backwards, it needs to detect any
overlaps and work forwards *or* backwards, as needed.
OK, good to know. I understand now why using __builtin_memcpy didn't 
show any performance enhancement when I tested it!


I think your change will break this case:

#include 

constexpr int f(int i, int j, int k)
{
 int arr[5] = { 0, 0, i, j, k };
 std::move(arr+2, arr+5, arr);
 return arr[0] + arr[1] + arr[2];
}

static_assert( f(1, 2, 3) == 6 );

This is valid because std::move only requires that the result iterator
is not in the input range, but it's OK for the two ranges to overlap.

I haven't tested it, but I think with your change the array will end
up containing {3, 2, 3, 2, 3} instead of {1, 2, 3, 2, 3}.

Indeed, I've added a std::move constexpr test in this new proposal 
which demonstrates that.


The C++ Standard clearly states that [copy|move]_backward is done 
backward. So in this new proposal I add a __memcopy used 
in copy/move and keep __memmove for the *_backward algos. Both use 
__builtin_memmove as before.


Then they *really* need better names now (__memmove was already a bad
name, but now it's terrible). If the difference is that one goes
forwards and one goes backwards, the names should reflect that.

I'll review it properly tomorrow.




diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 98d324827ed..dc6b3d3fc76 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -77,19 +77,22 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+namespace __detail
+{
   /*
* A constexpr wrapper for __builtin_memmove.
+   * When constant-evaluated performs a forward copy.
* @param __num The number of elements of type _Tp (not bytes).
*/
   template
 _GLIBCXX14_CONSTEXPR
-inline void*
-__memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+inline void
+__copy(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
 {
 #ifdef __cpp_lib_is_constant_evaluated
   if (std::is_constant_evaluated())
 	{
-	  for(; __num > 0; --__num)
+	  for (; __num != 0; --__num)
 	{
 	  if constexpr (_IsMove)
 		*__dst = std::move(*__src);
@@ -98,13 +101,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  ++__src;
 	  ++__dst;
 	}
-	  return __dst;
 	}
   else
 #endif
-	return __builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
-  return __dst;
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
+}
+
+  /*
+   * A constexpr wrapper for __builtin_memmove.
+   * When constant-evaluated performs a backward copy.
+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+_GLIBCXX14_CONSTEXPR
+inline void
+__copy_backward(_Tp* __dst, const _Tp* __src, ptrdiff_t __num)
+{
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+	{
+	  __dst += __num;
+	  __src += __num;
+	  for (; __num != 0; --__num)
+	{
+	  if constexpr (_IsMove)
+		*--__dst = std::move(*--__src);
+	  else
+		*--__dst = *--__src;
+	}
+	}
+  else
+#endif
+	__builtin_memmove(__dst, __src, sizeof(_Tp) * __num);
 }
+} // namespace __detail
 
   /*
* A constexpr wrapper for __builtin_memcmp.
@@ -446,7 +476,7 @@ _GLIBCXX_BEGIN_NAMESP

Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi,
>
> [ Please don't use application/octet-stream attachments.  Thanks! ]
>
> On Wed, Oct 16, 2019 at 04:24:29PM +, Yuliang Wang wrote:
>> +;; Unpredicated bitwise select.
>> +(define_insn "*aarch64_sve2_bsl"
>> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
>> +(xor:SVE_I
>> +  (and:SVE_I
>> +(xor:SVE_I
>> +  (match_operand:SVE_I 1 "register_operand" ", w")
>> +  (match_operand:SVE_I 2 "register_operand" ", w"))
>> +(match_operand:SVE_I 3 "register_operand" "w, w"))
>> +  (match_dup BSL_3RD)))]
>
> This isn't canonical RTL.  Does combine not simplify this?
>
> Or, rather, it should not be what we canonicalise to: nothing is defined
> here.

But when nothing is defined, let's match what we get :-)

If someone wants to add a new canonical form then the ports should of
course adapt, but until then I think the patch is doing the right thing.

> We normally get something like
>
> Trying 7, 8 -> 9:
> 7: r127:SI=r130:DI#4^r125:DI#4
>   REG_DEAD r130:DI
> 8: r128:SI=r127:SI&0x2000
>   REG_DEAD r127:SI
> 9: r126:SI=r128:SI^r125:DI#4
>   REG_DEAD r128:SI
>   REG_DEAD r125:DI
> Successfully matched this instruction:
> (set (reg:SI 126)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 4)
> (const_int 536870912 [0x2000]))
> (and:SI (subreg:SI (reg/v:DI 125 [ yD.2902+-4 ]) 4)
> (const_int -536870913 [0xdfff]
>
> If the mask is not a constant, we really shouldn't generate a totally
> different form.  The xor-and-xor form is very hard to handle, too.
>
> Expand currently generates this, because gimple thinks this is simpler.
> I think this should be fixed.

But the constant form is effectively folding away the NOT.
Without it the equivalent rtl uses 4 operations rather than 3:

  (ior (and A C) (and B (not C)))

And folding 4 operations gets us into 4-insn combinations, which are
obviously more limited (for good reason).

As you say, it's no accident that we get this form, it's something
that match.pd specifically chose.  And I think there should be a
strong justification for having an RTL canonical form that reverses
a gimple decision.  RTL isn't as powerful as gimple and so isn't going
to be able to undo the gimple transforms in all cases.

Thanks,
Richard


Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Richard Sandiford
Thanks for the patch, looks really good.

Yuliang Wang  writes:
> +;; Use NBSL for vector NOR.
> +(define_insn_and_rewrite "*aarch64_sve2_nor"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
> + (unspec:SVE_I
> +   [(match_operand 3)
> +(and:SVE_I
> +  (not:SVE_I
> +(match_operand:SVE_I 1 "register_operand" "w, 0, w"))
> +  (not:SVE_I
> +(match_operand:SVE_I 2 "register_operand" "0, w, w")))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %1.d, %0.d
> +  nbsl\t%0.d, %0.d, %2.d, %0.d
> +  movprfx\t%0, %2\;nbsl\t%0.d, %0.d, %1.d, %0.d"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +operands[3] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,*,yes")]
> +)
> +
> +;; Use NBSL for vector NAND.
> +(define_insn_and_rewrite "*aarch64_sve2_nand"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
> + (unspec:SVE_I
> +   [(match_operand 3)
> +(ior:SVE_I
> +  (not:SVE_I
> +(match_operand:SVE_I 1 "register_operand" "w, 0, w"))
> +  (not:SVE_I
> +(match_operand:SVE_I 2 "register_operand" "0, w, w")))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %1.d, %1.d
> +  nbsl\t%0.d, %0.d, %2.d, %2.d
> +  movprfx\t%0, %2\;nbsl\t%0.d, %0.d, %1.d, %1.d"
> +  "&& !CONSTANT_P (operands[3])"
> +  {
> +operands[3] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,*,yes")]
> +)

For these two it should in theory be slightly better to use "%" on
operand 1 to make the operation commutative, rather than provide a
separate matching alternative for operand 2.  E.g.:

(define_insn_and_rewrite "*aarch64_sve2_nand"
  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
(unspec:SVE_I
  [(match_operand 3)
   (ior:SVE_I
 (not:SVE_I
   (match_operand:SVE_I 1 "register_operand" "%0, w"))
 (not:SVE_I
   (match_operand:SVE_I 2 "register_operand" "w, w")))]
  UNSPEC_PRED_X))]
  "TARGET_SVE2"
  "@
  nbsl\t%0.d, %0.d, %2.d, %2.d
  movprfx\t%0, %1\;nbsl\t%0.d, %0.d, %2.d, %2.d"
  "&& !CONSTANT_P (operands[3])"
  {
operands[3] = CONSTM1_RTX (mode);
  }
  [(set_attr "movprfx" "*,*,yes")]
)

(But the EOR3 pattern is fine as-is, since that supports any
permutation of the three inputs.)

> +;; Unpredicated bitwise select.
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl\t%0.d, %0.d, %.d, %3.d"
> +  [(set_attr "movprfx" "*,yes")]
> +)

This is sufficiently far from the documented logic that it might
be worth a comment.  E.g.:

;; (op3 ? bsl_mov : bsl_3rd) == (((bsl_mov ^ bsl_3rd) & op3) ^ bsl_3rd)

Similarly for the later patterns.

> +;; Unpredicated bitwise inverted select.
> +(define_insn_and_rewrite "*aarch64_sve2_nbsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(and:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w"))
> +  (match_operand:SVE_I 3 "register_operand" "w, w"))
> +(match_dup BSL_3RD)))]
> +   UNSPEC_PRED_X))]
> +  "TARGET_SVE2"
> +  "@
> +  nbsl\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;nbsl\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted first operand.
> +(define_insn_and_rewrite "*aarch64_sve2_bsl1n"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
> + (xor:SVE_I
> +   (and:SVE_I
> + (unspec:SVE_I
> +   [(match_operand 4)
> +(not:SVE_I
> +  (xor:SVE_I
> +(match_operand:SVE_I 1 "register_operand" ", w")
> +(match_operand:SVE_I 2 "register_operand" ", w")))]
> +   UNSPEC_PRED_X)
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]
> +  "TARGET_SVE2"
> +  "@
> +  bsl1n\t%0.d, %0.d, %.d, %3.d
> +  movprfx\t%0, %\;bsl1n\t%0.d, %0.d, %.d, %3.d"
> +  "&& !CONSTANT_P (operands[4])"
> +  {
> +operands[4] = CONSTM1_RTX (mode);
> +  }
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
> +;; Unpredicated bitwise select with inverted second operand.
> +(define_insn_and_rewrite "*aarch64_sve2_bsl2n"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Segher Boessenkool
On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
> PS The counterexample nicely illustrates why -Wself-init should
> be in -Wall like in Clang or MSVC, or at least in -Wextra like in
> ICC.  Let me take it as a reminder to submit a patch for GCC 10.

c-family/c-gimplify.c says:

  /* This is handled mostly by gimplify.c, but we have to deal with
 not warning about int x = x; as it is a GCC extension to turn off
 this warning but only if warn_init_self is zero.  */

A lot of code will start to warn if you turn on -Winit-self by default
(in -Wall or -W), since forever we have had this GCC extension, and
people expect it.  (Or so I fear, feel free to prove me wrong :-) )


Segher


[PATCH] [og9] Re-do OpenACC private variable resolution

2019-10-16 Thread Julian Brown
This patch (for the openacc-gcc-9-branch) reworks how the partitioning
level for OpenACC "private" variables is calculated and represented in
the compiler. There have been two previous approaches:

 - The first (by Chung-Lin Tang) recorded which variables should be
   made private per-gang in each front end (i.e. separately in C, C++
   and Fortran) using a new attribute "oacc gangprivate". This was deemed
   too early; the final determination about which loops are assigned
   which parallelism level has not yet been made at parse time.

 - The second, last discussed here:

 https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00726.html

   moved the analysis of OpenACC contexts to determine parallelism levels
   to omp-low.c (but kept the "oacc gangprivate" attribute and the NVPTX
   backend parts). However (as mentioned in that mail), this is still
   too early: in fact the final determination of the parallelism level
   for each loop (especially for loops without explicit gang/worker/vector
   clauses) does not happen until we reach the device compiler, in the
   oaccloops pass.

This patch builds on the second approach, but delays fixing the parallelism
level of each "private" variable (those that are addressable, and declared
private using OpenACC clauses or by defining them in a scope nested
within a compute region or partitioned loop) until the oaccdevlow pass.
This is done by adding a new internal UNIQUE function (OACC_PRIVATE)
that lists (the address of) each private variable as an argument.
These new internal functions fit into the existing scheme for demarking
OpenACC loops, as described in comments in the patch.

Use of the "oacc gangprivate" attribute is now restricted to the NVPTX
backend (and could probably be replaced with some lighter-weight mechanism
as a followup).

Tested with offloading to NVPTX and GCN (separately). I will apply to
the openacc-gcc-9-branch shortly.

Thanks,

Julian

ChangeLog

gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
(gcn_goacc_adjust_gangprivate_decl): Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(nvptx_goacc_adjust_private_decl): New function.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* doc/tm.texi: Regenerated.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert before
fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its
gimple call arguments as appropriate. Don't set
oacc_partitioning_levels in omp_context. Pass private_marker to
lower_oacc_reductions.
(oacc_record_private_var_clauses): Don't check for NULL ctx.
(make_oacc_private_marker): New function.
(lower_omp_for): Only call oacc_record_vars_in_bind for
OpenACC contexts.  Create private marker and pass to
lower_oacc_head_tail.
(lower_omp_target): Remove unnecessary call to
oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
Create private marker and pass to lower_oacc_reductions.
(process_oacc_gangprivate_1): Remove.
(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC.  Don't
iterate over contexts calling process_oacc_gangprivate_1.
* omp-offload.c (oacc_loop_xform_head_tail): Treat
private-variable markers like fork/join when transforming head/tail
sequences.
(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
"oacc gangprivate" attributes to determine partitioning level of
variables.
* omp-sese.c (find_gangprivate_vars): New function.
(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
of "oacc gangprivate" attribute to determine which variables are
gang-private.
(oacc_do_neutering): Use find_gangprivate_vars.
* target.def (adjust_gangprivate_decl): Rename to...
(adjust_private_decl): ...this.  Update documentation (briefly).

libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
oaccdevlow dump and update scanned output.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
Add missing atomic to force worker partiti

[PATCH] [og9] Fix libgomp serial-dims.c test for AMD GCN

2019-10-16 Thread Julian Brown
This patch adds support for AMD GCN offloading to the
libgomp.oacc-c-c++-common/serial-dims.c test case.

I will apply to the og9 branch shortly.

Thanks,

Julian

ChangeLog

libgomp/
* testsuite/libgomp.oacc-c-c++-common/serial-dims.c: Support AMD GCN.
---
 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
index 3895405b2cf..e373ebd37b7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c
@@ -69,6 +69,13 @@ int main ()
  /* The GCC nvptx back end enforces vector_length (32).  */
  vectors_actual = 32;
}
+  else if (acc_on_device (acc_device_gcn))
+   {
+ /* AMD GCN relies on the autovectorizer for the vector dimension:
+the loop below isn't likely to be vectorized, so vectors_actual
+is effectively 1.  */
+ vectors_actual = 1;
+   }
   else if (!acc_on_device (acc_device_host))
__builtin_abort ();
 #pragma acc loop gang \
-- 
2.23.0



Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Segher Boessenkool
Hi,

[ Please don't use application/octet-stream attachments.  Thanks! ]

On Wed, Oct 16, 2019 at 04:24:29PM +, Yuliang Wang wrote:
> +;; Unpredicated bitwise select.
> +(define_insn "*aarch64_sve2_bsl"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
> + (xor:SVE_I
> +   (and:SVE_I
> + (xor:SVE_I
> +   (match_operand:SVE_I 1 "register_operand" ", w")
> +   (match_operand:SVE_I 2 "register_operand" ", w"))
> + (match_operand:SVE_I 3 "register_operand" "w, w"))
> +   (match_dup BSL_3RD)))]

This isn't canonical RTL.  Does combine not simplify this?

Or, rather, it should not be what we canonicalise to: nothing is defined
here.

We normally get something like

Trying 7, 8 -> 9:
7: r127:SI=r130:DI#4^r125:DI#4
  REG_DEAD r130:DI
8: r128:SI=r127:SI&0x2000
  REG_DEAD r127:SI
9: r126:SI=r128:SI^r125:DI#4
  REG_DEAD r128:SI
  REG_DEAD r125:DI
Successfully matched this instruction:
(set (reg:SI 126)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 4)
(const_int 536870912 [0x2000]))
(and:SI (subreg:SI (reg/v:DI 125 [ yD.2902+-4 ]) 4)
(const_int -536870913 [0xdfff]

If the mask is not a constant, we really shouldn't generate a totally
different form.  The xor-and-xor form is very hard to handle, too.

Expand currently generates this, because gimple thinks this is simpler.
I think this should be fixed.


Segher


Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread JeanHeyd Meneide
Dear Dave,

 Thanks for sharing all of that! It was very helpful to read it
over again, as was the discussion on IRC yesterday.

 As a bit of a "that was strange" moment, I ran the builds again
and did NOT do --disable-bootstrap with the patch on a different
machine. They built and ran fine, with no segfaults during calls to
error(...). I probably forgot to properly clean/build something in my
debugging environment, which caused most of the problems.

 I already assigned my Copyright to the FSF, so that part should
be fine! Is there anything else I should be doing?

Sincerely,
JeanHeyd

On Wed, Oct 16, 2019 at 8:51 AM David Malcolm  wrote:
>
> On Tue, 2019-10-15 at 20:31 -0400, JeanHeyd Meneide wrote:
> > Attached is a patch for p1301 that improves in the way Jason Merrill
> > specified earlier
> > (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html), but it
> > keeps segfaulting on my build of GCC. I don't know what changes I've
> > made that cause it to segfault: it does so whenever the error()
> > function is called but the backtraces aren't showing me anything
> > conclusive.
> >
> > The tests test what I expect them to and the output is fine, I just
> > can't get the segfaults to stop, so I'm putting the patch up so
> > someone can critique what I've written, or someone else to test it
> > too. Sorry for taking so long.
> >
> > Thanks,
> > JeanHeyd
>
> Thanks for posting these patches.
>
> I believe you're a new GCC contributor - welcome!
>
> Having said "welcome", for good or ill, GCC has some legalese that we
> have to follow - to get a patch accepted, see the "Legal Prerequisites"
> section of: https://gcc.gnu.org/contribute.html so you'll need to
> follow that at some point  (I hate having to say this).
>
> I got the impression on IRC yesterday that you were having some trouble
> getting your debugging environment set up - it sounded to me like
> you've been "making do" by adding print statements to the parser and
> recompiling each time.
>
> If that's the case, then the best next step is probably to figure out
> how to get you debugging cc1plus using a debugger.
>
> Normally, I run:
>   ./xgcc -B.
> and add  "-wrapper gdb,--args" to the command line so that the driver
> invokes gdb.
>
> Another way is to invoke gdb and have it directly invoke cc1plus:
>gdb --args ./cc1plus -quiet x.C
>
> In both cases, run it from /gcc
>
> See also:
> https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html
> which has further hints and info on a "nicer" debugging experience.
>
> You mentioned that you normally use gdbserver, which makes me think
> that you're most comfortable using gdb from an IDE.  I'm not very
> familiar with gdbserver, but hopefully someone here knows the recipe
> for this that will let you comfortably debug the frontend - without
> needing to add print statements and recompile each time (ugh).
>
> Note that for hacking on the frontend and debugging, I normally
> configure gcc with --disable-bootstrap which saves a *lot* of time;
> similarly I also only build the "gcc" subdirectory and its dependencies
> (I still do a full bootstrap and regression test in a separate
> directory once I've got a candidate patch that I want to test properly
> before submitting to the mailing list).
>
> [I know that you know some/all of this, but I'm posting it here for the
> benefit of people reading the list archives]
>
> Hope this is helpful, and welcome again.
> Dave
>
>


[PATCH] Communicate lto-wrapper and ld through a file

2019-10-16 Thread Giuliano Belinassi
Hi,

Previously, lto-wrapper communicated with ld by creating a pipe from
lto-wrapper's stdout to ld's stdin. This patch uses a temporary file for
this communication, releasing stdout to be used for debugging.

I've run the full testsuite and bootstrapped with LTO on linux x86_64, and
found no issues so far. Do I need to write a testcase for this feature?

Giuliano.

gcc/ChangeLog
2019-10-16  Giuliano Belinassi  

* lto-wrapper.c (STATIC_LEN): New macro.
(to_ld): New.
(find_crtoffloadtable): Print to file to_ld.
(find_ld_list_file): New.
(main): Check if to_ld is valid or is stdout.

libiberty/ChangeLog
2019-10-16  Giuliano Belinassi  

* pex-unix.c (pex_unix_exec_child): Check for PEX_KEEP_STD_IO flag.
(to_ld): New.

include/ChangeLog
2019-10-16  Giuliano Belinassi  

* libiberty.h (PEX_KEEP_STD_IO): New macro.

lto-plugin/ChangeLog
2019-10-16  Giuliano Belinassi  

* lto-plugin.c (exec_lto_wrapper): Replace pipe from stdout to temporary
file, and pass its name in argv.

diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c
index 9a7bbd0c022..d794b493d5d 100644
--- gcc/lto-wrapper.c
+++ gcc/lto-wrapper.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "lto-section-names.h"
 #include "collect-utils.h"
 
+#define STATIC_LEN(x) (sizeof (x) / sizeof (*x))
+
 /* Environment variable, used for passing the names of offload targets from GCC
driver to lto-wrapper.  */
 #define OFFLOAD_TARGET_NAMES_ENV	"OFFLOAD_TARGET_NAMES"
@@ -74,6 +76,8 @@ static char *makefile;
 static unsigned int num_deb_objs;
 static const char **early_debug_object_names;
 
+static FILE *to_ld;
+
 const char tool_name[] = "lto-wrapper";
 
 /* Delete tempfiles.  Called from utils_cleanup.  */
@@ -955,7 +959,7 @@ find_crtoffloadtable (void)
 	/* The linker will delete the filename we give it, so make a copy.  */
 	char *crtoffloadtable = make_temp_file (".crtoffloadtable.o");
 	copy_file (crtoffloadtable, paths[i]);
-	printf ("%s\n", crtoffloadtable);
+	fprintf (to_ld, "%s\n", crtoffloadtable);
 	XDELETEVEC (crtoffloadtable);
 	break;
   }
@@ -1556,7 +1560,7 @@ cont1:
 	{
 	  find_crtoffloadtable ();
 	  for (i = 0; offload_names[i]; i++)
-	printf ("%s\n", offload_names[i]);
+	fprintf (to_ld, "%s\n", offload_names[i]);
 	  free_array_of_ptrs ((void **) offload_names, i);
 	}
 }
@@ -1686,12 +1690,12 @@ cont1:
 
   if (lto_mode == LTO_MODE_LTO)
 {
-  printf ("%s\n", flto_out);
+  fprintf (to_ld, "%s\n", flto_out);
   if (!skip_debug)
 	{
 	  for (i = 0; i < ltoobj_argc; ++i)
 	if (early_debug_object_names[i] != NULL)
-	  printf ("%s\n", early_debug_object_names[i]);	  
+	  fprintf (to_ld, "%s\n", early_debug_object_names[i]);
 	}
   /* These now belong to collect2.  */
   free (flto_out);
@@ -1866,15 +1870,15 @@ cont:
 	}
   for (i = 0; i < nr; ++i)
 	{
-	  fputs (output_names[i], stdout);
-	  putc ('\n', stdout);
+	  fputs (output_names[i], to_ld);
+	  putc ('\n', to_ld);
 	  free (input_names[i]);
 	}
   if (!skip_debug)
 	{
 	  for (i = 0; i < ltoobj_argc; ++i)
 	if (early_debug_object_names[i] != NULL)
-	  printf ("%s\n", early_debug_object_names[i]);	  
+	  fprintf (to_ld, "%s\n", early_debug_object_names[i]);
 	}
   nr = 0;
   free (ltrans_priorities);
@@ -1892,13 +1896,43 @@ cont:
   obstack_free (&argv_obstack, NULL);
 }
 
+/* Find out whether lto-wrapper was asked to write the file names destined
+   for the linker to a file instead of stdout.  */
+
+static const char *find_ld_list_file (int* argc, char *argv[])
+{
+  int i;
+  static const char param[] = "--to_ld_list=";
+
+  for (i = 1; i < *argc; i++)
+{
+  if (argv[i] != NULL && !strncmp (argv[i], param, STATIC_LEN (param) - 1))
+	{
+	  /* Found.  Retrieve the path to the file.  */
+
+	  /* Find the first '=' of the string and return a pointer to
+	 the text after it.  */
+	  const char *path = argv[i];
+	  argv[i] = NULL;
+	  (*argc)--;
+	  while (*path != '\0')
+	if (*path++ == '=')
+		return path;
+	}
+}
+
+
+  /* Not found.  */
+  return NULL;
+}
+
 
 /* Entry point.  */
 
 int
 main (int argc, char *argv[])
 {
-  const char *p;
+  const char *p, *files_list;
 
   init_opts_obstack ();
 
@@ -1934,11 +1968,22 @@ main (int argc, char *argv[])
   signal (SIGCHLD, SIG_DFL);
 #endif
 
+  files_list = find_ld_list_file (&argc, argv);
+  to_ld = files_list ? fopen (files_list, "a") : stdout;
+
+  if (!to_ld)
+{
+  fprintf (stderr, "%s: failed to open parameter file.\n", argv[0]);
+  abort ();
+}
+
   /* We may be called with all the arguments stored in some file and
  passed with @file.  Expand them into argv before processing.  */
   expandargv (&argc, &argv);
 
   run_gcc (argc, argv);
 
+  fclose (to_ld);
+
   return 0;
 }
diff --git include/libiberty.h include/libiberty.h
index 71192a29377..cc3940f91f3 100644
--- include/libiberty.h
+++ include/libiberty.h
@@ -406,6 +406,9 @@ extern void hex_init (void)

Re: [gomp4.1] Start of structure element mapping support

2019-10-16 Thread Jakub Jelinek
On Wed, Oct 16, 2019 at 03:22:52PM +0200, Thomas Schwinge wrote:
> Stumbled over this while reviewing Julian's "Factor out duplicate code in
> gimplify_scan_omp_clauses":

> ..., which here gets written to...
> 
> > +   if (base != decl)
> > + break;
> > +   gcc_assert (offset == NULL_TREE
> > +   || TREE_CODE (offset) == INTEGER_CST);
> 
> ..., but here we again check 'offset', not 'offset2'...

Yes, it indeed should be offset2 == NULL_TREE and
TREE_CODE (offset2) == INTEGER_CST, thanks for catching that.

> Should the second highlighted 'gcc_assert' be changed as follows,
> suitably adapted for current GCC trunk, of course?  (Not yet tested.)  If
> approving such a patch, please respond with "Reviewed-by: NAME "
> so that your effort will be recorded in the commit log, see
> .
> 
> - gcc_assert (offset == NULL_TREE
> - || TREE_CODE (offset) == INTEGER_CST);
> + gcc_assert (offset2 == NULL_TREE
> + || TREE_CODE (offset2) == INTEGER_CST);

Preapproved for trunk if it passes bootstrap/regtest.

Jakub


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jakub Jelinek
On Wed, Oct 16, 2019 at 10:03:51AM -0600, Martin Sebor wrote:
> > The counter example would be:
> > #define F(x) \
> >__extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> > #define G(x) \
> >__extension__ (({ __typeof__ (x) _x = x; F(_x); }))
> > where a -Wshadow diagnostics could point the author at a serious bug,
> > because in the expansion it will be __typeof__ (_x) _x = _x; ...
> 
> True.  I don't suppose there is a way to make it so the warning
> triggers for the counter example and not for the original, is
> there?

Maybe look through the macro nesting context and if the shadowing
declaration comes from the same macro as shadowed declaration
or macro included directly or indirectly from the macro with shadowed
declaration, warn, otherwise not?
This might still not warn in case where the scope of the shadowing
declaration is created from multiple macros ({ coming from one,
}) from another one, but otherwise could work.
Perhaps -Wshadow=local needs multiple modes, the default one that
will have this macro handling and a full one (=2) which would warn
regardless of macro definitions.

Jakub


[arm] fix bootstrap failure due to uninitialized warning

2019-10-16 Thread Richard Earnshaw (lists)
The Arm port is failing bootstrap because GCC is now warning about an 
uninitialized array.


The code is complex enough that I certainly can't be sure the compiler 
is wrong, so perhaps the best fix here is just to memset the entire 
array before use.


* config/arm/arm.c (neon_valid_immediate): Clear bytes before use.

Applied to trunk.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0975dc071..75a011029f1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12243,7 +12252,7 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 
   unsigned int i, elsize = 0, idx = 0, n_elts;
   unsigned int innersize;
-  unsigned char bytes[16];
+  unsigned char bytes[16] = {};
   int immtype = -1, matches;
   unsigned int invmask = inverse ? 0xff : 0;
   bool vector = GET_CODE (op) == CONST_VECTOR;


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Joseph Myers
On Wed, 16 Oct 2019, Jakub Jelinek wrote:

> The counter example would be:
> #define F(x) \
>   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> #define G(x) \
>   __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
> where a -Wshadow diagnostics could point the author at a serious bug,
> because in the expansion it will be __typeof__ (_x) _x = _x; ...

And this is not theoretical, 
 
 was a real 
bug in glibc soft-fp where shadowing of variables called _c1 and _c2 in 
two macros resulted in wrong code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix constexpr-dtor3.C FAIL on arm

2019-10-16 Thread Jakub Jelinek
On Fri, Oct 11, 2019 at 04:14:16PM -0400, Jason Merrill wrote:
> > On x86_64 and most other targets, cleanup here (if non-NULL) is the
> > CALL_EXPR, as destructor return type is void, but on arm, as the dtor return
> > type is some pointer, the CALL_EXPR is wrapped into a NOP_EXPR to void.
> > protected_set_expr_location then on x86_64 clears the CALL_EXPR location,
> > but on arm only NOP_EXPR location.
> > 
> > The following patch (totally untested) should fix that.
> > 
> > For the warning location, perhaps we could special case destructor calls
> > in push_cx_call_context (to offset the intentional clearing of location for
> > debugging purposes), if they don't have location set, don't use
> > input_location for them, but try to pick DECL_SOURCE_LOCATION for the
> > variable being destructed?
> 
> Expanding the CLEANUP_EXPR of a CLEANUP_STMT could use the EXPR_LOCATION of
> the CLEANUP_STMT.  Or the EXPR_LOCATION of *jump_target, if suitable.

The already previously posted patch (now attached as first) has now been
bootstrapped/regtested on x86_64-linux and i686-linux, and regardless if we
improve the location or not should fix the arm vs. the rest of the world
difference.  Is that ok for trunk?

As for CLEANUP_STMT, I've tried it (the second patch), but it didn't change
anything, the diagnostics was still
constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
constexpr-dtor3.C:16:24:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error 
"inline assembly is not a constant expression" }
  |  ^~~
constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in a 
‘constexpr’ function in C++2a
as without that change.

I've also tried the third patch, tested so far with check-c++-all, which
changes that to
constexpr-dtor3.C:16:23:   in ‘constexpr’ expansion of ‘f4()’
constexpr-dtor3.C:12:6:   in ‘constexpr’ expansion of ‘(& w13)->W7::~W7()’
constexpr-dtor3.C:5:34: error: inline assembly is not a constant expression
5 |   constexpr ~W7 () { if (w == 5) asm (""); w = 3; } // { dg-error 
"inline assembly is not a constant expression" }
  |  ^~~
constexpr-dtor3.C:5:34: note: only unevaluated inline assembly is allowed in a 
‘constexpr’ function in C++2a

Jakub
2019-10-16  Jakub Jelinek  

* decl.c (cxx_maybe_build_cleanup): When clearing location of cleanup,
if cleanup is a nop, clear location of its operand too.

--- gcc/cp/decl.c.jj2019-10-10 01:33:38.154943945 +0200
+++ gcc/cp/decl.c   2019-10-11 10:09:24.321277942 +0200
@@ -16864,6 +16864,8 @@ cxx_maybe_build_cleanup (tree decl, tsub
  the end of the block.  So let's unset the location of the
  destructor call instead.  */
   protected_set_expr_location (cleanup, UNKNOWN_LOCATION);
+  if (cleanup && CONVERT_EXPR_P (cleanup))
+protected_set_expr_location (TREE_OPERAND (cleanup, 0), UNKNOWN_LOCATION);
 
   if (cleanup
   && DECL_P (decl)
2019-10-16  Jakub Jelinek  

* constexpr.c (cxx_eval_constant_expression) :
Temporarily change input_location to CLEANUP_STMT location.

--- gcc/cp/constexpr.c.jj   2019-10-10 01:33:38.185943480 +0200
+++ gcc/cp/constexpr.c  2019-10-11 22:54:32.628051700 +0200
@@ -4980,9 +4980,13 @@ cxx_eval_constant_expression (const cons
 case CLEANUP_STMT:
   {
tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
+   location_t loc = input_location;
+   if (EXPR_HAS_LOCATION (t))
+ input_location = EXPR_LOCATION (t);
r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
  non_constant_p, overflow_p,
  jump_target);
+   input_location = loc;
if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
  /* Also evaluate the cleanup.  If we weren't skipping at the
 start of the CLEANUP_BODY, change jump_target temporarily
2019-10-16  Jakub Jelinek  

* decl.c (cxx_maybe_build_cleanup): When clearing location of cleanup,
if cleanup is a nop, clear location of its operand too.
* constexpr.c (push_cx_call_context): For calls to destructors, use
DECL_SOURCE_LOCATION of destructed variable in preference to
input_location.

* g++.dg/cpp2a/constexpr-dtor3.C: Expect in 'constexpr' expansion of
message on the line with variable declaration.

--- gcc/cp/decl.c.jj2019-10-16 09:30:57.490109872 +0200
+++ gcc/cp/decl.c   2019-10-16 17:45:48.647529567 +0200
@@ -16864,6 +16864,8 @@ cxx_maybe_build_cleanup (tree decl, tsub
  the end of the block.  So let's unset the location of the
  destructor call instead.  */
   protected_set_expr_location (cleanup, UNKNOWN_LOCATION);
+  if (cleanup && CONVERT_EXPR_P (cleanup))
+prot

[AArch64][SVE2] Support for EOR3 and variants of BSL

2019-10-16 Thread Yuliang Wang
Hi,

This patch adds combine pass support for the following SVE2 bitwise logic 
instructions:

- EOR3  (3-way vector exclusive OR)
- BSL   (bitwise select)
- NBSL  (inverted bitwise select)
- BSL1N (bitwise select with first input inverted)
- BSL2N (bitwise select with second input inverted)

Example template snippet:

void foo (TYPE *a, TYPE *b, TYPE *c, TYPE *d, int n)
{
  for (int i = 0; i < n; i++)
a[i] = OP (b[i], c[i], d[i]);
}

EOR3:

  // #define OP(x,y,z) ((x) ^ (y) ^ (z))

  beforeeor z1.d, z1.d, z2.d
eor z0.d, z0.d, z1.d
  ...
  after eor3z0.d, z0.d, z1.d, z2.d

BSL:

  // #define OP(x,y,z) (((x) & (z)) | ((y) & ~(z)))

  beforeeor z0.d, z0.d, z1.d
and z0.d, z0.d, z2.d
eor z0.d, z0.d, z1.d
  ...
  after bsl z0.d, z0.d, z1.d, z2.d

NBSL:

  // #define OP(x,y,z) ~(((x) & (z)) | ((y) & ~(z)))

  beforeeor z0.d, z0.d, z1.d
and z0.d, z0.d, z2.d
eor z0.d, z0.d, z1.d
not z0.s, p1/m, z0.s
  ...
  after nbslz0.d, z0.d, z1.d, z2.d

BSL1N:

  // #define OP(x,y,z) ((~(x) & (z)) | ((y) & ~(z)))

  beforeeor z0.d, z0.d, z1.d
bic z0.d, z2.d, z0.d
eor z0.d, z0.d, z1.d
  ...
  after bsl1n   z0.d, z0.d, z1.d, z2.d

BSL2N:

  // #define OP(x,y,z) (((x) & (z)) | (~(y) & ~(z)))

  beforeorr z0.d, z1.d, z0.d
and z1.d, z1.d, z2.d
not z0.s, p1/m, z0.s
orr z0.d, z0.d, z1.d
  ...
  after bsl2n   z0.d, z0.d, z1.d, z2.d

Additionally, vector NOR and NAND operations are now optimized with NBSL:

  NOR   x, y  ->  NBSL  x, y, x
  NAND  x, y  ->  NBSL  x, y, y

Built and tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-10-16  Yuliang Wang  

* config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3)
(aarch64_sve2_nor, aarch64_sve2_nand)
(aarch64_sve2_bsl, aarch64_sve2_nbsl)
(aarch64_sve2_bsl1n, aarch64_sve2_bsl2n):
New combine patterns.
* config/aarch64/iterators.md (BSL_3RD): New int iterator for the above.
(bsl_1st, bsl_2nd, bsl_3rd, bsl_mov): Attributes for the above.
* config/aarch64/aarch64.h (AARCH64_ISA_SVE2_AES, AARCH64_ISA_SVE2_SM4)
(AARCH64_ISA_SVE2_SHA3, AARCH64_ISA_SVE2_BITPERM): New ISA flag macros.
(TARGET_SVE2_AES, TARGET_SVE2_SM4, TARGET_SVE2_SHA3)
(TARGET_SVE2_BITPERM): New CPU targets.

gcc/testsuite/ChangeLog:

2019-10-16  Yuliang Wang  

* gcc.target/aarch64/sve2/eor3_1.c: New test.
* gcc.target/aarch64/sve2/eor3_2.c: As above.
* gcc.target/aarch64/sve2/nlogic_1.c: As above.
* gcc.target/aarch64/sve2/nlogic_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_1.c: As above.
* gcc.target/aarch64/sve2/bitsel_2.c: As above.
* gcc.target/aarch64/sve2/bitsel_3.c: As above.
* gcc.target/aarch64/sve2/bitsel_4.c: As above.


rb11975.patch
Description: rb11975.patch


Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2019-10-16 Thread Richard Earnshaw (lists)

On 11/10/2019 00:08, Ramana Radhakrishnan wrote:

On Thu, Oct 10, 2019 at 7:06 PM Richard Sandiford
 wrote:


Wilco Dijkstra  writes:

ping

Contrary to all documentation, SLOW_BYTE_ACCESS simply means accessing
bitfields by their declared type, which results in better codegeneration on 
practically
any target.


The name is confusing, but the documentation looks accurate to me:

 Define this macro as a C expression which is nonzero if accessing less
 than a word of memory (i.e.@: a @code{char} or a @code{short}) is no
 faster than accessing a word of memory, i.e., if such access
 require more than one instruction or if there is no difference in cost
 between byte and (aligned) word loads.

 When this macro is not defined, the compiler will access a field by
 finding the smallest containing object; when it is defined, a fullword
 load will be used if alignment permits.  Unless bytes accesses are
 faster than word accesses, using word accesses is preferable since it
 may eliminate subsequent memory access if subsequent accesses occur to
 other fields in the same word of the structure, but to different bytes.


I'm thinking we should completely remove all trace of SLOW_BYTE_ACCESS
from GCC as it's confusing and useless.


I disagree.  Some targets can optimise single-bit operations when the
container is a byte, for example.


OK for commit until we get rid of it?

ChangeLog:
2017-11-17  Wilco Dijkstra  

 gcc/
 * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
056110afb228fb919e837c04aa5e5552a4868ec3..d8f4d129a02fb89eb00d256aba8c4764d6026078
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -769,14 +769,9 @@ typedef struct
 if given data not on the nominal alignment.  */
  #define STRICT_ALIGNMENTTARGET_STRICT_ALIGN

-/* Define this macro to be non-zero if accessing less than a word of
-   memory is no faster than accessing a word of memory, i.e., if such
-   accesses require more than one instruction or if there is no
-   difference in cost.
-   Although there's no difference in instruction count or cycles,
-   in AArch64 we don't want to expand to a sub-word to a 64-bit access
-   if we don't have to, for power-saving reasons.  */
-#define SLOW_BYTE_ACCESS   0
+/* Contrary to all documentation, this enables wide bitfield accesses,
+   which results in better code when accessing multiple bitfields.  */
+#define SLOW_BYTE_ACCESS   1

  #define NO_FUNCTION_CSE 1


I agree this makes sense from a performance point of view, and I think
the existing comment is admitting that AArch64 has the properties that
would normally cause us to set SLOW_BYTE_ACCESS to 1.  But the comment
is claiming that there's a power-saving benefit to leaving it off.

It seems like a weak argument though.  Bitfields are used when several
values are packed into the same integer, so there's a high likelihood
we'll need the whole integer anyway.  Avoiding the redundancies described
in the documentation should if anything help with power usage.

Maybe the main concern was using a 64-bit access when a 32-bit one
would do, since 32-bit bitfield containers are the most common.  But the:

  && GET_MODE_ALIGNMENT (mode) <= align

condition in get_best_mode should avoid that unless the 64-bit
access is naturally aligned.  (See the big comment above for the
pros and cons of this.)

So I think we should change the macro value unless anyone can back up the
power-saving claim.  Let's wait a week (more) to see if anyone objects.


IIRC, that power saving comment comes from the original port and
probably from when
the port was first written which is probably more than 10 years now.



Yes.  Don't forget that at that time the INSV and EXTV expanders only 
operated on a single mode, which on AArch64 was 64 bits.  IIRC, at the 
time this was written the compiler would widen everything to that size 
if there was a bitfield op and that led to worse code.


So it's probably not as relevant now as it once was.

R.


regards
Ramana

Ramana



The comment change isn't OK though.  Please keep the first paragraph
and just reword the second to say that's why we set the value to 1.

Thanks,
Richard




[wwwdocs, committed] Fix GCC 8.2 release date (was: wwwdocs/htdocs/gcc-8 index.html)

2019-10-16 Thread Thomas Schwinge
Hi!

On 2018-07-26T11:56:36+, ja...@gcc.gnu.org wrote:
> Files modified in the GCC repository. Log entry:
>
> Fix up a typo in the release year.

..., but the day also needs to be fixed.  ;-)

Pushed to wwwdocs the attached commit
0dd4c6860fe284cef2df33ec98b2754c25d10438 "Fix GCC 8.2 release date".


Grüße
 Thomas


From 0dd4c6860fe284cef2df33ec98b2754c25d10438 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Oct 2019 18:02:00 +0200
Subject: [PATCH] Fix GCC 8.2 release date

---
 htdocs/gcc-8/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-8/index.html b/htdocs/gcc-8/index.html
index fb315d17..54bb3809 100644
--- a/htdocs/gcc-8/index.html
+++ b/htdocs/gcc-8/index.html
@@ -29,7 +29,7 @@ GCC 8.2 relative to previous releases of GCC.
 
 
 GCC 8.2
-Jul 14, 2018
+Jul 26, 2018
 (changes,
  http://gcc.gnu.org/onlinedocs/8.2.0/";>documentation)
 
-- 
2.17.1



signature.asc
Description: PGP signature


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Martin Sebor

On 10/16/19 9:50 AM, Jakub Jelinek wrote:

On Wed, Oct 16, 2019 at 09:43:49AM -0600, Martin Sebor wrote:

Should the warning trigger when the shadowing name results from
macro expansion?  The author of a macro can't (in general) know
what context it's going to be used, and when different macros
come from two different third party headers, it would seem
pointless to force their users to jump through hoops just to
avoid the innocuous shadowing.  Such as in this example:

#define Abs(x) \
   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))

#define Min(x, y) \
   __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < _y ?
_x : _y; }))

int f (int x, int y)
{
   return Abs (Min (x, y));   // -Wshadow for _x?
}


The counter example would be:
#define F(x) \
   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
#define G(x) \
   __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
where a -Wshadow diagnostics could point the author at a serious bug,
because in the expansion it will be __typeof__ (_x) _x = _x; ...


True.  I don't suppose there is a way to make it so the warning
triggers for the counter example and not for the original, is
there?

Martin

PS The counterexample nicely illustrates why -Wself-init should
be in -Wall like in Clang or MSVC, or at least in -Wextra like in
ICC.  Let me take it as a reminder to submit a patch for GCC 10.


Re: [C++ Patch] Remove most uses of in_system_header_at

2019-10-16 Thread Paolo Carlini
... the below, slightly extended patch: 1- makes sure the 
in_system_header_at calls surviving in decl.c get the same location used 
for the corresponding diagnostic; 2- exploits locations[ds_typedef] in an 
error_at in grokdeclarator.


Tested, as usual, on x86_64-linux.

Thanks, Paolo.

/

/cp
2019-10-16  Paolo Carlini  

* decl.c (grokfndecl): Remove redundant use of in_system_header_at.
(compute_array_index_type_loc): Likewise.
(grokdeclarator): Likewise.
* error.c (cp_printer): Likewise.
* lambda.c (add_default_capture): Likewise.
* parser.c (cp_parser_primary_expression): Likewise.
(cp_parser_selection_statement): Likewise.
(cp_parser_toplevel_declaration): Likewise.
(cp_parser_enumerator_list): Likewise.
(cp_parser_using_declaration): Likewise.
(cp_parser_member_declaration): Likewise.
(cp_parser_exception_specification_opt): Likewise.
(cp_parser_std_attribute_spec): Likewise.
* pt.c (do_decl_instantiation): Likewise.
(do_type_instantiation): Likewise.
* typeck.c (cp_build_unary_op): Likewise.

* decl.c (check_tag_decl): Pass to in_system_header_at the same
location used for the permerror.
(grokdeclarator): Likewise.

* decl.c (check_tag_decl): Use locations[ds_typedef] in error_at.

/testsuite
2019-10-16  Paolo Carlini  

* g++.old-deja/g++.other/decl9.C: Check locations too.
Index: cp/decl.c
===
--- cp/decl.c   (revision 276984)
+++ cp/decl.c   (working copy)
@@ -4933,9 +4933,9 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
  "multiple types in one declaration");
   else if (declspecs->redefined_builtin_type)
 {
-  if (!in_system_header_at (input_location))
-   permerror (declspecs->locations[ds_redefined_builtin_type_spec],
-  "redeclaration of C++ built-in type %qT",
+  location_t loc = declspecs->locations[ds_redefined_builtin_type_spec];
+  if (!in_system_header_at (loc))
+   permerror (loc, "redeclaration of C++ built-in type %qT",
   declspecs->redefined_builtin_type);
   return NULL_TREE;
 }
@@ -4984,7 +4984,8 @@ check_tag_decl (cp_decl_specifier_seq *declspecs,
 --end example]  */
   if (saw_typedef)
{
- error ("missing type-name in typedef-declaration");
+ error_at (declspecs->locations[ds_typedef],
+   "missing type-name in typedef-declaration");
  return NULL_TREE;
}
   /* Anonymous unions are objects, so they can have specifiers.  */;
@@ -9328,7 +9329,6 @@ grokfndecl (tree ctype,
}
  /* 17.6.3.3.5  */
  if (suffix[0] != '_'
- && !in_system_header_at (location)
  && !current_function_decl && !(friendp && !funcdef_flag))
warning_at (location, OPT_Wliteral_suffix,
"literal operator suffixes not preceded by %<_%>"
@@ -10036,8 +10036,6 @@ compute_array_index_type_loc (location_t name_loc,
   indicated by the state of complain), so that
   another substitution can be found.  */
return error_mark_node;
- else if (in_system_header_at (input_location))
-   /* Allow them in system headers because glibc uses them.  */;
  else if (name)
pedwarn (loc, OPT_Wpedantic,
 "ISO C++ forbids zero-size array %qD", name);
@@ -11004,7 +11002,7 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (type_was_error_mark_node)
/* We've already issued an error, don't complain more.  */;
-  else if (in_system_header_at (input_location) || flag_ms_extensions)
+  else if (in_system_header_at (id_loc) || flag_ms_extensions)
/* Allow it, sigh.  */;
   else if (! is_main)
permerror (id_loc, "ISO C++ forbids declaration of %qs with no type",
@@ -11037,7 +11035,7 @@ grokdeclarator (const cp_declarator *declarator,
}
   /* Don't pedwarn if the alternate "__intN__" form has been used instead
 of "__intN".  */
-  else if (!int_n_alt && pedantic && ! in_system_header_at 
(input_location))
+  else if (!int_n_alt && pedantic)
pedwarn (declspecs->locations[ds_type_spec], OPT_Wpedantic,
 "ISO C++ does not support %<__int%d%> for %qs",
 int_n_data[declspecs->int_n_idx].bitsize, name);
@@ -12695,10 +12693,7 @@ grokdeclarator (const cp_declarator *declarator,
else
  {
/* Array is a flexible member.  */
-   if (in_system_header_at (input_location))
- /* Do not warn on flexible array members in system
-headers because glibc uses them.  */;
-   else if (name)
+   if (name)
  pedwarn (id_loc, OPT_Wpedantic,
 

Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Jakub Jelinek
On Wed, Oct 16, 2019 at 09:43:49AM -0600, Martin Sebor wrote:
> Should the warning trigger when the shadowing name results from
> macro expansion?  The author of a macro can't (in general) know
> what context it's going to be used, and when different macros
> come from two different third party headers, it would seem
> pointless to force their users to jump through hoops just to
> avoid the innocuous shadowing.  Such as in this example:
> 
> #define Abs(x) \
>   __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
> 
> #define Min(x, y) \
>   __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < _y ?
> _x : _y; }))
> 
> int f (int x, int y)
> {
>   return Abs (Min (x, y));   // -Wshadow for _x?
> }

The counter example would be:
#define F(x) \
  __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))
#define G(x) \
  __extension__ (({ __typeof__ (x) _x = x; F(_x); }))
where a -Wshadow diagnostics could point the author at a serious bug,
because in the expansion it will be __typeof__ (_x) _x = _x; ...

Jakub


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Martin Sebor

On 10/16/19 9:11 AM, Richard Sandiford wrote:

Sorry for the slow reply.

Bernd Edlinger  writes:

Hi,

this is probably on the border to obvious.

The REGEXP_xxx macros in genautomata are invoked
recursively, and the local values are all named _regexp
and shadow each other.


Fixed by using different names _regexp1..6 for each
macro.


Sorry to repeat the complaint about numerical suffixes, but I think
we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
and similarly for the other macros.  But a similar fix to rtl.h might
be better.


Should the warning trigger when the shadowing name results from
macro expansion?  The author of a macro can't (in general) know
what context it's going to be used, and when different macros
come from two different third party headers, it would seem
pointless to force their users to jump through hoops just to
avoid the innocuous shadowing.  Such as in this example:

#define Abs(x) \
  __extension__ (({ __typeof__ (x) _x = x; _x < 0 ? -_x : _x; }))

#define Min(x, y) \
  __extension__ (({ __typeof__ (x) _x = x; __typeof__ (y) _y = y; _x < 
_y ? _x : _y; }))


int f (int x, int y)
{
  return Abs (Min (x, y));   // -Wshadow for _x?
}

Martin


Thanks,
Richard


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

2019-10-04  Bernd Edlinger  

* genautomata.c (REGEXP_UNIT, REGEXP_RESERV, REGEXP_SEQUENCE,
REGEXP_REPEAT, REGEXP_ALLOF, REGEXP_ONEOF): Rename local vars.

Index: gcc/genautomata.c
===
--- gcc/genautomata.c   (revision 276484)
+++ gcc/genautomata.c   (working copy)
@@ -984,46 +984,46 @@ decl_mode_check_failed (enum decl_mode mode, const
  
  
  #define REGEXP_UNIT(r) __extension__	\

-(({ struct regexp *const _regexp = (r);
\
- if (_regexp->mode != rm_unit)  \
-   regexp_mode_check_failed (_regexp->mode, "rm_unit",\
+(({ struct regexp *const _regex1 = (r);
\
+ if (_regex1->mode != rm_unit)  \
+   regexp_mode_check_failed (_regex1->mode, "rm_unit",\
   __FILE__, __LINE__, __FUNCTION__);   \
- &(_regexp)->regexp.unit; }))
+ &(_regex1)->regexp.unit; }))
  
  #define REGEXP_RESERV(r) __extension__	\

-(({ struct regexp *const _regexp = (r);
\
- if (_regexp->mode != rm_reserv)\
-   regexp_mode_check_failed (_regexp->mode, "rm_reserv",  \
+(({ struct regexp *const _regex2 = (r);
\
+ if (_regex2->mode != rm_reserv)\
+   regexp_mode_check_failed (_regex2->mode, "rm_reserv",  \
   __FILE__, __LINE__, __FUNCTION__);   \
- &(_regexp)->regexp.reserv; }))
+ &(_regex2)->regexp.reserv; }))
  
  #define REGEXP_SEQUENCE(r) __extension__\

-(({ struct regexp *const _regexp = (r);
\
- if (_regexp->mode != rm_sequence)  \
-   regexp_mode_check_failed (_regexp->mode, "rm_sequence",\
+(({ struct regexp *const _regex3 = (r);
\
+ if (_regex3->mode != rm_sequence)  \
+   regexp_mode_check_failed (_regex3->mode, "rm_sequence",\
   __FILE__, __LINE__, __FUNCTION__);   \
- &(_regexp)->regexp.sequence; }))
+ &(_regex3)->regexp.sequence; }))
  
  #define REGEXP_REPEAT(r) __extension__	\

-(({ struct regexp *const _regexp = (r);
\
- if (_regexp->mode != rm_repeat)\
-   regexp_mode_check_failed (_regexp->mode, "rm_repeat",  \
+(({ struct regexp *const _regex4 = (r);
\
+ if (_regex4->mode != rm_repeat)\
+   regexp_mode_check_failed (_regex4->mode, "rm_repeat",  \
   __FILE__, __LINE__, __FUNCTION__);   \
- &(_regexp)->regexp.repeat; }))
+ &(_regex4)->regexp.repeat; }))
  
  #define REGEXP_ALLOF(r) __extension__	\

-(({ struct regexp *const _regexp = (r);
\
- if (_regexp->mode != rm_allof) \
-   regexp_mode_check_failed (_regexp->mode, "rm_allof",   \
+(({ struct regexp *const _regex5 = (r);
\
+ if (_regex5->mode != rm_allof) \
+   regexp_mode_check_failed (_regex5->mode, "rm_allof",   \
   __FILE__, __LINE__, __FUNC

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Richard Earnshaw (lists)

On 16/10/2019 13:13, Wilco Dijkstra wrote:

Hi Christophe,


I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1


That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
         adds    r0, r0, #1
         b       .L27
.L6:
         ldr     r4, [r2, #12]
         adds    r0, r0, #4
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         ldr     r4, [r2, #12]
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
.L27:
         ldr     r4, [r2, #12]
         cmp     ip, r0
         ldr     lr, [r1]
         str     lr, [r3, r4, lsl #2]
         bne     .L6
         pop     {r4, pc}

It seems minor changes in scheduling allow blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on blocks
with different profile stats - particularly on loops where it inhibits 
scheduling of
the loop itself.

Cheers,
Wilco



So what's your proposed solution?  Leaving the test failing is not an 
option.


Re: [PATCH] [MIPS] Remove unnecessary moves around dpadd and dpsub

2019-10-16 Thread Jeff Law
On 10/16/19 9:03 AM, Mihailo Stojanovic wrote:
> Unnecessary moves around dpadd and dpsub are caused by different pseudos
> being assigned to the input-output operands which correspond to the same
> register.
> 
> This forces the same pseudo to the input-output operands, which removes
> unnecesary moves.
> 
> Tested on mips-mti-linux-gnu.
> 
> gcc/ChangeLog:
> 
> * gcc/config/mips/mips.c (mips_expand_builtin_insn): Force the
> operands which correspond to the same input-output register to
> have the same pseudo assigned to them.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c: New test.
THanks.  Installed.
jeff


Re: [PATCH] Fix -Wshadow=local warnings in genautomata.c

2019-10-16 Thread Richard Sandiford
Sorry for the slow reply.

Bernd Edlinger  writes:
> Hi,
>
> this is probably on the border to obvious.
>
> The REGEXP_xxx macros in genautomata are invoked
> recursively, and the local values are all named _regexp
> and shadow each other.
>
>
> Fixed by using different names _regexp1..6 for each
> macro.

Sorry to repeat the complaint about numerical suffixes, but I think
we'd need better names.  E.g. _regexp_unit or _re_unit for REGEXP_UNIT
and similarly for the other macros.  But a similar fix to rtl.h might
be better.

Thanks,
Richard

> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?
>
>
> Thanks
> Bernd.
>
> 2019-10-04  Bernd Edlinger  
>
>   * genautomata.c (REGEXP_UNIT, REGEXP_RESERV, REGEXP_SEQUENCE,
>   REGEXP_REPEAT, REGEXP_ALLOF, REGEXP_ONEOF): Rename local vars.
>
> Index: gcc/genautomata.c
> ===
> --- gcc/genautomata.c (revision 276484)
> +++ gcc/genautomata.c (working copy)
> @@ -984,46 +984,46 @@ decl_mode_check_failed (enum decl_mode mode, const
>  
>  
>  #define REGEXP_UNIT(r) __extension__ \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_unit)   \
> -   regexp_mode_check_failed (_regexp->mode, "rm_unit",   \
> +(({ struct regexp *const _regex1 = (r);  
> \
> + if (_regex1->mode != rm_unit)   \
> +   regexp_mode_check_failed (_regex1->mode, "rm_unit",   \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.unit; }))
> + &(_regex1)->regexp.unit; }))
>  
>  #define REGEXP_RESERV(r) __extension__   
> \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_reserv) \
> -   regexp_mode_check_failed (_regexp->mode, "rm_reserv", \
> +(({ struct regexp *const _regex2 = (r);  
> \
> + if (_regex2->mode != rm_reserv) \
> +   regexp_mode_check_failed (_regex2->mode, "rm_reserv", \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.reserv; }))
> + &(_regex2)->regexp.reserv; }))
>  
>  #define REGEXP_SEQUENCE(r) __extension__ \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_sequence)   
> \
> -   regexp_mode_check_failed (_regexp->mode, "rm_sequence",   
> \
> +(({ struct regexp *const _regex3 = (r);  
> \
> + if (_regex3->mode != rm_sequence)   
> \
> +   regexp_mode_check_failed (_regex3->mode, "rm_sequence",   
> \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.sequence; }))
> + &(_regex3)->regexp.sequence; }))
>  
>  #define REGEXP_REPEAT(r) __extension__   
> \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_repeat) \
> -   regexp_mode_check_failed (_regexp->mode, "rm_repeat", \
> +(({ struct regexp *const _regex4 = (r);  
> \
> + if (_regex4->mode != rm_repeat) \
> +   regexp_mode_check_failed (_regex4->mode, "rm_repeat", \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.repeat; }))
> + &(_regex4)->regexp.repeat; }))
>  
>  #define REGEXP_ALLOF(r) __extension__
> \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_allof)  \
> -   regexp_mode_check_failed (_regexp->mode, "rm_allof",  \
> +(({ struct regexp *const _regex5 = (r);  
> \
> + if (_regex5->mode != rm_allof)  \
> +   regexp_mode_check_failed (_regex5->mode, "rm_allof",  \
>  __FILE__, __LINE__, __FUNCTION__);   \
> - &(_regexp)->regexp.allof; }))
> + &(_regex5)->regexp.allof; }))
>  
>  #define REGEXP_ONEOF(r) __extension__
> \
> -(({ struct regexp *const _regexp = (r);  
> \
> - if (_regexp->mode != rm_oneof)  \
> -   regexp_mode_check_failed (_regexp->mode, "rm_oneof",  \
> +(({ struct regexp *const _regex6 = (r); 

Re: [PATCH] handle local aggregate initialization in strlen, take 2 (PR 83821)

2019-10-16 Thread Jeff Law
On 10/14/19 6:23 PM, Martin Sebor wrote:
> When a subsequent element or member of a local aggregate containing
> a prior character array is initialized the strlen pass discards
> the length it computed for the prior element/member.  E.g., here:
> 
>   struct { char a[4], b[4]; } s = { "1", "12" };
> 
> even though strlen (s.b) is folded to 2, strlen (s.a) is not.  (Ditto
> for other stores even to members of other types.)  This causes hundreds
> (over 700 in GCC) to thousands (nearly 3,000 in Binutils/GDB and some
> 36,000 in the kernel) of instances of previously computed string lengths
> to end up discarded and so besides emitting less than optimal code also
> defeats buffer overflow detection in such cases.
> 
> Attached is a resubmission of a previously approved patch that I never
> committed (the original had a bug that was noted during review that
> I subsequently fixed but I didn't remember to post the corrected patch).
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-83821.diff
> 
> PR tree-optimization/83821 - local aggregate initialization defeats strlen 
> optimization
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/83821
>   * tree-ssa-strlen.c (maybe_invalidate): Add argument.  Consider
>   the length of a string when available.
>   (handle_builtin_memset): Add argument.
>   (handle_store, strlen_check_and_optimize_call): Same.
>   (check_and_optimize_stmt): Same.  Pass it to callees.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/83821
>   * c-c++-common/Warray-bounds-4.c: Remove XFAIL.
>   * gcc.dg/strlenopt-80.c: New test.
>   * gcc.dg/strlenopt-81.c: Same.
>   * gcc.dg/strlenopt-82.c: Same.
>   * gcc.dg/strlenopt-83.c: Same.
>   * gcc.dg/strlenopt-84.c: Same.
>   * gcc.dg/tree-ssa/calloc-4.c: Same.
>   * gcc.dg/tree-ssa/calloc-5.c: Same.
OK.

Jeff


Re: [PATCH] handle string copies with non-constant lengths (PR 91996)

2019-10-16 Thread Jeff Law
On 10/15/19 3:24 PM, Martin Sebor wrote:
> The attached patch removes a FIXME added recently to the strlen
> pass as a reminder to extend the handling of multi-byte stores
> of characters copied from non-constant strings with constant
> lengths to strings with non-constant lengths in some known range.
> For the string length range information it relies on the EVRP
> instance introduced into the pass with the sprintf integration
> and so far only used by sprintf.  (This is just a small first
> step in generalizing the strlen pass to take advantage of strlen
> ranges.)
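[Editor's sketch of the kind of pattern this enables, under the assumption that range info bounds the copied length; the function name is hypothetical and this is not the patch's testcase:]

```c
#include <string.h>
#include <stdlib.h>

/* The copied length n is non-constant, but inside the branch the EVRP
   instance knows n <= 3, so after the copy the pass can fold relational
   tests such as strlen (buf) > 3 to false and drop the abort call.  */
size_t bounded_copy_len (const char *s)
{
  size_t n = strlen (s);
  if (n < 4)
    {
      char buf[8];
      memcpy (buf, s, n + 1);   /* copies the terminating NUL too */
      if (strlen (buf) > 3)     /* foldable: strlen (buf) == n <= 3 */
        abort ();
      return strlen (buf);
    }
  return 0;
}
```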
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-91996.diff
> 
> PR tree-optimization/91996 - fold non-constant strlen relational expressions
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/91996
>   * gcc.dg/strlenopt-80.c: New test.
>   * gcc.dg/strlenopt-81.c: New test.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/91996
>   * tree-ssa-strlen.c (maybe_warn_pointless_strcmp): Improve location
>   information.
>   (compare_nonzero_chars): Add an overload.
>   (count_nonzero_bytes): Add an argument.  Call overload above.
>   Handle non-constant lengths in some range.
>   (handle_store): Add an argument.
>   (check_and_optimize_stmt): Pass an argument to handle_store.
OK
jeff


[PATCH] i386: Add clear_ratio to processor_costs

2019-10-16 Thread H.J. Lu
i386.h has

 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)

It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
to processor_costs, sets it to the minimum of 6 and move_ratio in all
cost models and defines CLEAR_RATIO with clear_ratio.
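[Editor's note: for context, CLEAR_RATIO governs whether a memory clear is expanded inline as scalar stores or lowered to a memset call. A minimal sketch of affected code, assuming a hypothetical struct; illustrative only:]

```c
#include <string.h>

/* Clearing this struct takes six 8-byte stores on x86-64.  Whether GCC
   emits them inline or keeps the memset call depends on the cost model's
   clear_ratio (previously capped at 6 via MIN in the CLEAR_RATIO macro).  */
struct block { long w[6]; };

void clear_block (struct block *p)
{
  memset (p, 0, sizeof *p);   /* candidate for inline store expansion */
}
```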

* config/i386/i386.h (processor_costs): Add clear_ratio.
(CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
* config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
of 6 and move_ratio in all cost models.

OK for trunk?

Thanks.

-- 
H.J.
From 266be647ebab13a461a91d6cdcb0f9c792a3714a Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 15 Oct 2019 08:12:47 -0700
Subject: [PATCH] i386: Add clear_ratio to processor_costs

i386.h has

 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)

It is impossible to have CLEAR_RATIO > 6.  This patch adds clear_ratio
to processor_costs, sets it to the minimum of 6 and move_ratio in all
cost models and defines CLEAR_RATIO with clear_ratio.

	* config/i386/i386.h (processor_costs): Add clear_ratio.
	(CLEAR_RATIO): Remove MIN and use ix86_cost->clear_ratio.
	* config/i386/x86-tune-costs.h: Set clear_ratio to the minimum
	of 6 and move_ratio in all cost models.
---
 gcc/config/i386/i386.h   |  4 +++-
 gcc/config/i386/x86-tune-costs.h | 24 
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 4e37336c7140..afa0aa83ddf3 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -291,6 +291,8 @@ struct processor_costs {
   const int large_insn;		/* insns larger than this cost more */
   const int move_ratio;		/* The threshold of number of scalar
    memory-to-memory move insns.  */
+  const int clear_ratio;	/* The threshold of number of scalar
+   memory clearing insns.  */
   const int int_load[3];	/* cost of loading integer registers
    in QImode, HImode and SImode relative
    to reg-reg move (2).  */
@@ -1947,7 +1949,7 @@ typedef struct ix86_args {
 /* If a clear memory operation would take CLEAR_RATIO or more simple
move-instruction sequences, we will do a clrmem or libcall instead.  */
 
-#define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
+#define CLEAR_RATIO(speed) ((speed) ? ix86_cost->clear_ratio : 2)
 
 /* Define if shifts truncate the shift count which implies one can
omit a sign-extension or zero-extension of a shift count.
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 8e6f4b5d3ea5..99816aeaebc1 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -81,6 +81,7 @@ struct processor_costs ix86_size_cost = {/* costs for tuning for size */
   COSTS_N_BYTES (3),			/* cost of movzx */
   0,	/* "large" insn */
   2,	/* MOVE_RATIO */
+  2,	/* CLEAR_RATIO */
   {2, 2, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -185,6 +186,7 @@ struct processor_costs i386_cost = {	/* 386 specific costs */
   COSTS_N_INSNS (2),			/* cost of movzx */
   15,	/* "large" insn */
   3,	/* MOVE_RATIO */
+  3,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -286,6 +288,7 @@ struct processor_costs i486_cost = {	/* 486 specific costs */
   COSTS_N_INSNS (2),			/* cost of movzx */
   15,	/* "large" insn */
   3,	/* MOVE_RATIO */
+  3,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -389,6 +392,7 @@ struct processor_costs pentium_cost = {
   COSTS_N_INSNS (2),			/* cost of movzx */
   8,	/* "large" insn */
   6,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -483,6 +487,7 @@ struct processor_costs lakemont_cost = {
   COSTS_N_INSNS (2),			/* cost of movzx */
   8,	/* "large" insn */
   17,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {2, 4, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -592,6 +597,7 @@ struct processor_costs pentiumpro_cost = {
   COSTS_N_INSNS (1),			/* cost of movzx */
   8,	/* "large" insn */
   6,	/* MOVE_RATIO */
+  6,	/* CLEAR_RATIO */
   {4, 4, 4},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to reg-reg move (2).  */
@@ -692,6 +698,7 @@ struct processor_costs geode_cost = {
   COSTS_N_INSNS (1),			/* cost of movzx */
   8,	/* "large" insn */
   4,	/* MOVE_RATIO */
+  4,	/* CLEAR_RATIO */
   {2, 2, 2},/* cost of loading integer registers
 	   in QImode, HImode and SImode.
 	   Relative to 

[PATCH] [MIPS] Remove unnecessary moves around dpadd and dpsub

2019-10-16 Thread Mihailo Stojanovic
Unnecessary moves around dpadd and dpsub are caused by different pseudos
being assigned to the input-output operands which correspond to the same
register.

This forces the same pseudo to the input-output operands, which removes
unnecessary moves.

Tested on mips-mti-linux-gnu.

gcc/ChangeLog:

* config/mips/mips.c (mips_expand_builtin_insn): Force the
operands which correspond to the same input-output register to
have the same pseudo assigned to them.

gcc/testsuite/ChangeLog:

* gcc.target/mips/msa-dpadd-dpsub.c: New test.
---
 gcc/config/mips/mips.c  | 20 ++
 gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c | 28 +
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 7f6a0db..3a77097 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16960,6 +16960,26 @@ mips_expand_builtin_insn (enum insn_code icode, unsigned int nops,
   std::swap (ops[1], ops[3]);
   break;
 
+case CODE_FOR_msa_dpadd_s_w:
+case CODE_FOR_msa_dpadd_s_h:
+case CODE_FOR_msa_dpadd_s_d:
+case CODE_FOR_msa_dpadd_u_w:
+case CODE_FOR_msa_dpadd_u_h:
+case CODE_FOR_msa_dpadd_u_d:
+case CODE_FOR_msa_dpsub_s_w:
+case CODE_FOR_msa_dpsub_s_h:
+case CODE_FOR_msa_dpsub_s_d:
+case CODE_FOR_msa_dpsub_u_w:
+case CODE_FOR_msa_dpsub_u_h:
+case CODE_FOR_msa_dpsub_u_d:
+  /* Force the operands which correspond to the same in-out register
+ to have the same pseudo assigned to them.  If the input operand
+ is not REG, create one for it.  */
+  if (!REG_P (ops[1].value))
+   ops[1].value = copy_to_mode_reg (ops[1].mode, ops[1].value);
+  create_output_operand (&ops[0], ops[1].value, ops[1].mode);
+  break;
+
 default:
   break;
   }
diff --git a/gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c b/gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c
new file mode 100644
index 000..c665bdf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/msa-dpadd-dpsub.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mfp64 -mhard-float -mmsa" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+typedef short v8i16 __attribute__ ((vector_size (16)));
+typedef int v4i32 __attribute__ ((vector_size (16)));
+
+void foo (int *x, v8i16 *y, v8i16 *z)
+{
+  v4i32 acc[4];
+
+  acc[0] = __builtin_msa_ld_w(x, 0);
+  acc[1] = __builtin_msa_ld_w(x, 16);
+  acc[2] = __builtin_msa_ld_w(x, 32);
+  acc[3] = __builtin_msa_ld_w(x, 48);
+
+  acc[0] = __builtin_msa_dpadd_s_w(acc[0], y[0], z[0]);
+  acc[1] = __builtin_msa_dpadd_s_w(acc[1], y[1], z[0]);
+  acc[2] = __builtin_msa_dpsub_s_w(acc[2], y[0], z[1]);
+  acc[3] = __builtin_msa_dpsub_s_w(acc[3], y[1], z[1]);
+
+  __builtin_msa_st_w(acc[0], x, 0);
+  __builtin_msa_st_w(acc[1], x, 16);
+  __builtin_msa_st_w(acc[2], x, 32);
+  __builtin_msa_st_w(acc[3], x, 48);
+}
+
+/* { dg-final { scan-assembler-not "move.v" } } */
-- 
2.7.4



Re: [PATCH] find_partition_fixes: remove unused bbs_in_cold_partition variable

2019-10-16 Thread Jeff Law
On 10/16/19 8:26 AM, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on x86_64-redhat-linux.
> I noticed this while looking into PR92007.
> 
> gcc/ChangeLog:
> 
> 2019-10-16  Ilya Leoshkevich  
> 
>   * cfgrtl.c (find_partition_fixes): Remove bbs_in_cold_partition.
OK.
Thanks,
Jeff


[PATCH] V6, #17 of 17: Add stack protection test

2019-10-16 Thread Michael Meissner
This is a new test for the stack protection code that was added in V6 patch #4.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c: New
test to make sure -fstack-protect-strong works with prefixed
addressing.

Index: gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c	(revision 277050)
+++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */
+
+/* Test that we can handle large stack frames with -fstack-protector-strong and
+   prefixed addressing.  */
+
+extern long foo (char *);
+
+long
+bar (void)
+{
+  char buffer[0x2];
+  return foo (buffer) + 1;
+}
+
+/* { dg-final { scan-assembler {\mpld\M}  } } */
+/* { dg-final { scan-assembler {\mpstd\M} } } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #16 of 17: Wrong subject, should have been update @pcrel

2019-10-16 Thread Michael Meissner
Note, patch #16 had the wrong subject line.  It should have said that the patch
modifies @pcrel to use an explicit (0),1.  Sorry about that.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #16 of 17: New test for stack protection

2019-10-16 Thread Michael Meissner
This patch adds an explicit (0),1 to labels used with the @pcrel syntax.  The
intention is to make sure that the user does not use an instruction that assumes
PC-relative instructions can take a base register (as I did in the V4 patches).
This was V5 patch #15.  This patch is optional.  If it is not applied the code
will still work.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/rs6000.c (print_operand_address): Add (0),1 to
@pcrel to catch errant usage.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277029)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -13272,7 +13272,10 @@ print_operand_address (FILE *file, rtx x
   if (SYMBOL_REF_P (x) && !SYMBOL_REF_LOCAL_P (x))
fprintf (file, "@got");
 
-  fprintf (file, "@pcrel");
+  /* Specifically add (0),1 to catch uses where a @pcrel was added to an
+address with a base register, since the hardware does not support
+adding a base register to a PC-relative address.  */
+  fprintf (file, "@pcrel(0),1");
 }
   else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
   || GET_CODE (x) == LABEL_REF)

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #15 of 17: Add PC-relative tests

2019-10-16 Thread Michael Meissner
This patch adds PC-relative tests for the various types, and verifies that
expected instructions are generated.  This is the same as V5 patch #14.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h: New set of
tests to test prefixed addressing on 'future' system with
PC-relative tests.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-qi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sf.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-si.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-udi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uhi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-uqi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-usi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-pcrel-v2df.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (revision 276779)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for _Decimal64.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (revision 276779)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-df.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DFmode.  */
+
+#define TYPE double
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c  (revision 276779)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-di.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for DImode.  */
+
+#define TYPE long
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c  (revision 276779)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-hi.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether pc-relative prefixed
+   instructions are generated for HImode.  */
+
+#define TYPE short
+
+#include "prefix-pcrel.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c  (revision 276779)
+++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+

Re: [PATCH 3/4] Factor out duplicate code in gimplify_scan_omp_clauses

2019-10-16 Thread Thomas Schwinge
Hi Julian!

On 2019-10-06T15:32:36-0700, Julian Brown  wrote:
> This patch factors out some code in gimplify_scan_omp_clauses into two
> new outlined functions.

Yay for such simplification, and yay for documenting what's going on!

> Previously approved here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01309.html
>
> FAOD, OK for trunk?

I have not reviewed whether it's a suitable refactoring; I'm not at all
familiar with that code.

I started to "functionally" review it by re-inlining your outlined code,
and checking the differences to the original code.  Additionally to the
'gcc_assert' item I just raised with Jakub, there are a few more things I
don't understand:

>   gcc/
>   * gimplify.c (insert_struct_comp_map, check_base_and_compare_lt): New.
>   (gimplify_scan_omp_clauses): Outline duplicated code into calls to
>   above two functions.
> ---
>  gcc/gimplify.c | 307 -
>  1 file changed, 174 insertions(+), 133 deletions(-)

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -8120,6 +8120,160 @@ gimplify_omp_depend (tree *list_p, gimple_seq *pre_p)
>return 1;
>  }
>  
> +/* Insert a GOMP_MAP_ALLOC or GOMP_MAP_RELEASE node following a
> +   GOMP_MAP_STRUCT mapping.  C is an always_pointer mapping.  STRUCT_NODE is
> +   the struct node to insert the new mapping after (when the struct node is
> +   initially created).  PREV_NODE is the first of two or three mappings for a
> +   pointer, and is either:
> + - the node before C, when a pair of mappings is used, e.g. for a C/C++
> +   array section.
> + - not the node before C.  This is true when we have a 
> reference-to-pointer
> +   type (with a mapping for the reference and for the pointer), or for
> +   Fortran derived-type mappings with a GOMP_MAP_TO_PSET.
> +   If SCP is non-null, the new node is inserted before *SCP.
> +   if SCP is null, the new node is inserted before PREV_NODE.
> +   The return type is:
> + - PREV_NODE, if SCP is non-null.
> + - The newly-created ALLOC or RELEASE node, if SCP is null.
> + - The second newly-created ALLOC or RELEASE node, if we are mapping a
> +   reference to a pointer.  */
> +
> +static tree
> +insert_struct_comp_map (enum tree_code code, tree c, tree struct_node,
> + tree prev_node, tree *scp)
> +{
> +  enum gomp_map_kind mkind
> += ((code == OMP_TARGET_EXIT_DATA || code == OACC_EXIT_DATA)
> +   ? GOMP_MAP_RELEASE : GOMP_MAP_ALLOC);

The original code does not handle 'OACC_EXIT_DATA', so that will be a
change separate from this refactoring?

> +
> +  tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
> +  tree cl = scp ? prev_node : c2;
> +  OMP_CLAUSE_SET_MAP_KIND (c2, mkind);
> +  OMP_CLAUSE_DECL (c2) = unshare_expr (OMP_CLAUSE_DECL (c));
> +  OMP_CLAUSE_CHAIN (c2) = scp ? *scp : prev_node;
> +  OMP_CLAUSE_SIZE (c2) = TYPE_SIZE_UNIT (ptr_type_node);
> +  if (struct_node)
> +OMP_CLAUSE_CHAIN (struct_node) = c2;
> +
> +  /* We might need to create an additional mapping if we have a reference to 
> a
> + pointer (in C++).  Don't do this if we have something other than a
> + GOMP_MAP_ALWAYS_POINTER though, i.e. a GOMP_MAP_TO_PSET.  */
> +  if (OMP_CLAUSE_CHAIN (prev_node) != c
> +  && OMP_CLAUSE_CODE (OMP_CLAUSE_CHAIN (prev_node)) == OMP_CLAUSE_MAP
> +  && (OMP_CLAUSE_MAP_KIND (OMP_CLAUSE_CHAIN (prev_node))
> +   == GOMP_MAP_ALWAYS_POINTER))

In the original code, and with 'prev_node' being '*prev_list_p', this
condition was just 'if (OMP_CLAUSE_CHAIN (*prev_list_p) != c)', without
the 'GOMP_MAP_ALWAYS_POINTER' handling, so that'd be another change
separate from this refactoring?

> +{
> +  tree c4 = OMP_CLAUSE_CHAIN (prev_node);
> +  tree c3 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP);
> +  OMP_CLAUSE_SET_MAP_KIND (c3, mkind);
> +  OMP_CLAUSE_DECL (c3) = unshare_expr (OMP_CLAUSE_DECL (c4));
> +  OMP_CLAUSE_SIZE (c3) = TYPE_SIZE_UNIT (ptr_type_node);
> +  OMP_CLAUSE_CHAIN (c3) = prev_node;
> +  if (!scp)
> + OMP_CLAUSE_CHAIN (c2) = c3;
> +  else
> + cl = c3;
> +}
> +
> +  if (scp)
> +*scp = c2;
> +
> +  return cl;
> +}
> +
> +/* Called initially with ORIG_BASE non-null, sets PREV_BITPOS and 
> PREV_POFFSET
> +   to the offset of the field given in BASE.  Return type is 1 if BASE is 
> equal
> +   to *ORIG_BASE after stripping off ARRAY_REF and INDIRECT_REF nodes and
> +   calling get_inner_reference, else 0.
> +
> +   Called subsequently with ORIG_BASE null, compares the offset of the field
> +   given in BASE to PREV_BITPOS, PREV_POFFSET. Returns -1 if the base object
> +   has changed, 0 if the new value has a higher bit position than that
> +   described by the aforementioned arguments, or 1 if the new value is less
> +   than them.  Used for (insertion) sorting components after a 
> GOMP_MAP_STRUCT
> +   mapping.  */

Hmm, that doesn't seem to be the most intuit

[PATCH] V6, #14 of 17: Add prefixed load/store tests with large offsets

2019-10-16 Thread Michael Meissner
This patch adds a bunch of tests for each of the types to verify that it
generates the appropriate instructions for addresses that do not fit in a
16-bit offset.  This patch is essentially V5 patch #13.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* gcc/testsuite/gcc.target/powerpc/prefix-large.h: New set of
tests to test prefixed addressing on 'future' system with large
numeric offsets.
* gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-df.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-di.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-qi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-sf.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-si.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-udi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-uhi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-uqi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-usi.c: New test.
* gcc/testsuite/gcc.target/powerpc/prefix-large-v2df.c: New test.

Index: gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (revision 276777)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE _Decimal64
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-df.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (revision 276777)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-df.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE double
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplfd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstfd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-di.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-di.c  (revision 276777)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-di.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE long
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mpld\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c  (revision 276777)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-hi.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests for prefixed instructions testing whether we can generate a prefixed
+   load/store instruction that has a 34-bit offset.  */
+
+#define TYPE short
+
+#include "prefix-large.h"
+
+/* { dg-final { scan-assembler-times {\mplh[az]\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpsth\M} 2 } } */
Index: gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c  (revision 276777)
+++ gcc/testsuite/gcc.target/powerpc/prefix-large-kf.c  (workin

[PATCH] V6, #13 of 17: Add test for prefix pre-modify

2019-10-16 Thread Michael Meissner
This patch adds a test to make sure the GCC compiler does not try to issue a
pre-modify prefixed address load/store since the prefixed instructions do not
support an update form.  This patch was in V5 patch #12 but it was split out.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* gcc.target/powerpc/prefix-premodify.c: New test to make sure we
do not generate PRE_INC, PRE_DEC, or PRE_MODIFY on prefixed loads
or stores.

Index: gcc/testsuite/gcc.target/powerpc/prefix-premodify.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-premodify.c (revision 277039)
+++ gcc/testsuite/gcc.target/powerpc/prefix-premodify.c (working copy)
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Make sure that we don't try to generate a prefixed form of the load and
+   store with update instructions.  */
+
+#ifndef SIZE
+#define SIZE 5
+#endif
+
+struct foo {
+  unsigned int field;
+  char pad[SIZE];
+};
+
+struct foo *inc_load (struct foo *p, unsigned int *q)
+{
+  *q = (++p)->field;
+  return p;
+}
+
+struct foo *dec_load (struct foo *p, unsigned int *q)
+{
+  *q = (--p)->field;
+  return p;
+}
+
+struct foo *inc_store (struct foo *p, unsigned int *q)
+{
+  (++p)->field = *q;
+  return p;
+}
+
+struct foo *dec_store (struct foo *p, unsigned int *q)
+{
+  (--p)->field = *q;
+  return p;
+}
+
+/* { dg-final { scan-assembler-times {\mpli\M|\mpla\M|\mpaddi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mplwz\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mp?lwzu\M}  } } */
/* { dg-final { scan-assembler-not   {\mp?stwu\M}  } } */
+/* { dg-final { scan-assembler-not   {\maddis\M}   } } */
+/* { dg-final { scan-assembler-not   {\maddi\M}} } */

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #12 of 17: Add prefix test for DS/DQ instructions

2019-10-16 Thread Michael Meissner
This patch adds a new test that makes sure the appropriate prefixed instruction
is generated if a memory is attempted for DS-format or DQ-format instructions
where the offset does not fit the DS or DQ constraints.  This patch was in V5
patch #12 and was split out in this patch.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* gcc.target/powerpc/prefix-odd-memory.c: New test to make sure
prefixed instructions are generated if an offset would not be
legal for the non-prefixed DS/DQ instructions.

Index: gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c
===
--- gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c	(revision 277037)
+++ gcc/testsuite/gcc.target/powerpc/prefix-odd-memory.c(working copy)
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Tests whether we can generate a prefixed load/store operation for addresses
+   that don't meet DS/DQ alignment constraints.  */
+
+unsigned long
+load_uc_odd (unsigned char *p)
+{
+  return p[1]; /* should generate LBZ.  */
+}
+
+long
+load_sc_odd (signed char *p)
+{
+  return p[1]; /* should generate LBZ + EXTSB.  */
+}
+
+unsigned long
+load_us_odd (unsigned char *p)
+{
+  return *(unsigned short *)(p + 1);   /* should generate LHZ.  */
+}
+
+long
+load_ss_odd (unsigned char *p)
+{
+  return *(short *)(p + 1);/* should generate LHA.  */
+}
+
+unsigned long
+load_ui_odd (unsigned char *p)
+{
+  return *(unsigned int *)(p + 1); /* should generate LWZ.  */
+}
+
+long
+load_si_odd (unsigned char *p)
+{
+  return *(int *)(p + 1);  /* should generate PLWA.  */
+}
+
+unsigned long
+load_ul_odd (unsigned char *p)
+{
+  return *(unsigned long *)(p + 1);/* should generate PLD.  */
+}
+
+long
+load_sl_odd (unsigned char *p)
+{
+  return *(long *)(p + 1); /* should generate PLD.  */
+}
+
+float
+load_float_odd (unsigned char *p)
+{
+  return *(float *)(p + 1);/* should generate LFS.  */
+}
+
+double
+load_double_odd (unsigned char *p)
+{
+  return *(double *)(p + 1);   /* should generate LFD.  */
+}
+
+__ieee128
+load_ieee128_odd (unsigned char *p)
+{
+  return *(__ieee128 *)(p + 1);/* should generate PLXV.  */
+}
+
+void
+store_uc_odd (unsigned char uc, unsigned char *p)
+{
+  p[1] = uc;   /* should generate STB.  */
+}
+
+void
+store_sc_odd (signed char sc, signed char *p)
+{
+  p[1] = sc;   /* should generate STB.  */
+}
+
+void
+store_us_odd (unsigned short us, unsigned char *p)
+{
+  *(unsigned short *)(p + 1) = us; /* should generate STH.  */
+}
+
+void
+store_ss_odd (signed short ss, unsigned char *p)
+{
+  *(signed short *)(p + 1) = ss;   /* should generate STH.  */
+}
+
+void
+store_ui_odd (unsigned int ui, unsigned char *p)
+{
+  *(unsigned int *)(p + 1) = ui;   /* should generate STW.  */
+}
+
+void
+store_si_odd (signed int si, unsigned char *p)
+{
+  *(signed int *)(p + 1) = si; /* should generate STW.  */
+}
+
+void
+store_ul_odd (unsigned long ul, unsigned char *p)
+{
+  *(unsigned long *)(p + 1) = ul;  /* should generate PSTD.  */
+}
+
+void
+store_sl_odd (signed long sl, unsigned char *p)
+{
+  *(signed long *)(p + 1) = sl;/* should generate PSTD.  */
+}
+
+void
+store_float_odd (float f, unsigned char *p)
+{
+  *(float *)(p + 1) = f;   /* should generate STF.  */
+}
+
+void
+store_double_odd (double d, unsigned char *p)
+{
+  *(double *)(p + 1) = d;  /* should generate STD.  */
+}
+
+void
+store_ieee128_odd (__ieee128 ieee, unsigned char *p)
+{
+  *(__ieee128 *)(p + 1) = ieee;/* should generate PSTXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlbz\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlfs\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlha\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mlwz\M}   1 } } */
+/* { dg-final { scan-assembler-times {\mpld\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mplwa\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mplxv\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mpstd\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mpstxv\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstb\

[PATCH] find_partition_fixes: remove unused bbs_in_cold_partition variable

2019-10-16 Thread Ilya Leoshkevich
Bootstrapped and regtested on x86_64-redhat-linux.
I noticed this while looking into PR92007.

gcc/ChangeLog:

2019-10-16  Ilya Leoshkevich  

* cfgrtl.c (find_partition_fixes): Remove bbs_in_cold_partition.
---
 gcc/cfgrtl.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 39fc7aa36bf..f279d083a6c 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -2375,7 +2375,6 @@ static vec<basic_block>
 find_partition_fixes (bool flag_only)
 {
   basic_block bb;
-  vec<basic_block> bbs_in_cold_partition = vNULL;
   vec<basic_block> bbs_to_fix = vNULL;
   hash_set<basic_block> set;
 
@@ -2394,7 +2393,6 @@ find_partition_fixes (bool flag_only)
else
  BB_SET_PARTITION (bb, BB_COLD_PARTITION);
bbs_to_fix.safe_push (bb);
-   bbs_in_cold_partition.safe_push (bb);
   }
 
   return bbs_to_fix;
-- 
2.23.0



[PATCH] V6, #11 of 17: Add PADDI tests

2019-10-16 Thread Michael Meissner
This patch adds three tests for the PADDI instruction.  These were originally
part of V5 patch #12, but were split out.

2019-10-15   Michael Meissner  

* gcc.target/powerpc/paddi-1.c: New test to test using PLI to
load up a large DImode constant.
* gcc.target/powerpc/paddi-2.c: New test to test using PLI to
load up a large SImode constant.
* gcc.target/powerpc/paddi-3.c: New test to test using PADDI to
add a large DImode constant.

Index: gcc/testsuite/gcc.target/powerpc/paddi-1.c
===
--- gcc/testsuite/gcc.target/powerpc/paddi-1.c  (revision 276774)
+++ gcc/testsuite/gcc.target/powerpc/paddi-1.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PADDI is generated to add a large constant.  */
+unsigned long
+add (unsigned long a)
+{
+  return a + 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpaddi\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/paddi-2.c
===
--- gcc/testsuite/gcc.target/powerpc/paddi-2.c  (revision 276774)
+++ gcc/testsuite/gcc.target/powerpc/paddi-2.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant.  */
+unsigned long
+large (void)
+{
+  return 0x12345678UL;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/paddi-3.c
===
--- gcc/testsuite/gcc.target/powerpc/paddi-3.c  (revision 276774)
+++ gcc/testsuite/gcc.target/powerpc/paddi-3.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=future" } */
+
+/* Test that PLI (PADDI) is generated to load a large constant for SImode.  */
+void
+large_si (unsigned int *p)
+{
+  *p = 0x12345U;
+}
+
+/* { dg-final { scan-assembler {\mpli\M} } } */



[PATCH] V6, #10 of 17: Update target-supports.exp

2019-10-16 Thread Michael Meissner
This patch adds two new effective-target tests for the testsuite.  One checks
whether prefixed instructions with 34-bit offsets are supported.  The other
checks whether PC-relative instructions are supported.  This was originally part
of V5 patch #12, but it was split out into a separate patch.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15   Michael Meissner  

* lib/target-supports.exp
(check_effective_target_powerpc_future_ok): Do not require 64-bit
or Linux support before doing the test.  Use a 32-bit constant in
PLI.
(check_effective_target_powerpc_prefixed_addr_ok): New effective
target test to see if prefixed memory instructions are supported.
(check_effective_target_powerpc_pcrel_ok): New effective target
test to test whether PC-relative addressing is supported.
(is-effective-target): Add test for the PowerPC 'future' hardware
support.

Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 276974)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -5307,16 +5307,14 @@ proc check_effective_target_powerpc_p9mo
 }
 }
 
-# Return 1 if this is a PowerPC target supporting -mfuture.
-# Limit this to 64-bit linux systems for now until other
-# targets support FUTURE.
+# Return 1 if this is a PowerPC target supporting -mcpu=future.
 
 proc check_effective_target_powerpc_future_ok { } {
-if { ([istarget powerpc64*-*-linux*]) } {
+if { ([istarget powerpc*-*-*]) } {
return [check_no_compiler_messages powerpc_future_ok object {
int main (void) {
long e;
-   asm ("pli %0,%1" : "=r" (e) : "n" (0x12345));
+   asm ("pli %0,%1" : "=r" (e) : "n" (0x1234));
return e;
}
} "-mfuture"]
@@ -5325,6 +5323,46 @@ proc check_effective_target_powerpc_futu
 }
 }
 
+# Return 1 if this is a PowerPC target supporting -mcpu=future.  The compiler
+# must support large numeric prefixed addresses by default when -mfuture is
+# used.  We test loading up a large constant to verify that the full 34-bit
+# offset for prefixed instructions is supported and we check for a prefixed
+# load as well.
+
+proc check_effective_target_powerpc_prefixed_addr_ok { } {
+if { ([istarget powerpc*-*-*]) } {
+   return [check_no_compiler_messages powerpc_prefixed_addr_ok object {
+   int main (void) {
+   extern long l[];
+   long e, e2;
+   asm ("pli %0,%1" : "=r" (e) : "n" (0x12345678));
+   asm ("pld %0,0x12345678(%1)" : "=r" (e2) : "r" (& l[0]));
+   return e - e2;
+   }
+   } "-mfuture"]
+} else {
+   return 0
+}
+}
+
+# Return 1 if this is a PowerPC target supporting -mfuture.  The compiler must
+# support PC-relative addressing when -mcpu=future is used to pass this test.
+
+proc check_effective_target_powerpc_pcrel_ok { } {
+if { ([istarget powerpc*-*-*]) } {
+   return [check_no_compiler_messages powerpc_pcrel_ok object {
+ int main (void) {
+ static int s __attribute__((__used__));
+ int e;
+ asm ("plwa %0,s@pcrel(0),1" : "=r" (e));
+ return e;
+ }
+ } "-mfuture"]
+  } else {
+ return 0
+  }
+}
+
 # Return 1 if this is a PowerPC target supporting -mfloat128 via either
 # software emulation on power7/power8 systems or hardware support on power9.
 
@@ -7223,6 +7261,7 @@ proc is-effective-target { arg } {
  "named_sections" { set selected [check_named_sections_available] }
  "gc_sections"{ set selected [check_gc_sections_available] }
  "cxa_atexit" { set selected [check_cxa_atexit_available] }
+ "powerpc_future_hw" { set selected [check_powerpc_future_hw_available] }
  default  { error "unknown effective target keyword `$arg'" }
}
 }



[PATCH] V6, #9 of 17: Change defaults on Linux 64-bit to enable -mpcrel

2019-10-16 Thread Michael Meissner
This patch changes the defaults for 64-bit Linux so that -mpcrel and
-mprefixed-addr are enabled when -mcpu=future is used.  Other OS targets do not
enable these switches by default.  This is the same as V5 patch #11.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/linux64.h (TARGET_PREFIXED_ADDR_DEFAULT): Enable
prefixed addressing by default.
(TARGET_PCREL_DEFAULT): Enable pc-relative addressing by default.
* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Only
enable -mprefixed-addr and -mpcrel if the OS tm.h says to enable
it.
(ADDRESSING_FUTURE_MASKS): New mask macro.
(OTHER_FUTURE_MASKS): Use ADDRESSING_FUTURE_MASKS.
* config/rs6000/rs6000.c (TARGET_PREFIXED_ADDR_DEFAULT): Do not
enable -mprefixed-addr unless the OS tm.h says to.
(TARGET_PCREL_DEFAULT): Do not enable -mpcrel unless the OS tm.h
says to.
(rs6000_option_override_internal): Do not enable -mprefixed-addr
or -mpcrel unless the OS tm.h says to enable it.  Add more checks
for -mcpu=future.

Index: gcc/config/rs6000/linux64.h
===
--- gcc/config/rs6000/linux64.h (revision 276974)
+++ gcc/config/rs6000/linux64.h (working copy)
@@ -640,3 +640,11 @@ extern int dot_symbols;
enabling the __float128 keyword.  */
 #undef TARGET_FLOAT128_ENABLE_TYPE
 #define TARGET_FLOAT128_ENABLE_TYPE 1
+
+/* Enable support for pc-relative and numeric prefixed addressing on the
+   'future' system.  */
+#undef  TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT   1
+
+#undef  TARGET_PCREL_DEFAULT
+#define TARGET_PCREL_DEFAULT   1
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 276974)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -75,15 +75,21 @@
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_P9_VECTOR)
 
-/* Support for a future processor's features.  Do not enable -mpcrel until it
-   is fully functional.  */
+/* Support for a future processor's features.  The prefixed and pc-relative
+   addressing bits are not added here.  Instead, rs6000.c adds them if the OS
+   tm.h says that it supports the addressing modes.  */
-#define ISA_FUTURE_MASKS_SERVER(ISA_3_0_MASKS_SERVER   \
-| OPTION_MASK_FUTURE   \
+| OPTION_MASK_FUTURE)
+
+/* Addressing related flags on a future processor.  These flags are broken out
+   because not all targets will support either pc-relative addressing, or even
+   prefixed addressing, and we want to clear all of the addressing bits
+   on targets that cannot support prefixed/pcrel addressing.  */
+#define ADDRESSING_FUTURE_MASKS(OPTION_MASK_PCREL  \
 | OPTION_MASK_PREFIXED_ADDR)
 
 /* Flags that need to be turned off if -mno-future.  */
-#define OTHER_FUTURE_MASKS (OPTION_MASK_PCREL  \
-| OPTION_MASK_PREFIXED_ADDR)
+#define OTHER_FUTURE_MASKS ADDRESSING_FUTURE_MASKS
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS  (OPTION_MASK_FLOAT128_HW\
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277026)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -98,6 +98,16 @@
 #endif
 #endif
 
+/* Set up the defaults for whether prefixed addressing is used, and if it is
+   used, whether we want to turn on pc-relative support by default.  */
+#ifndef TARGET_PREFIXED_ADDR_DEFAULT
+#define TARGET_PREFIXED_ADDR_DEFAULT   0
+#endif
+
+#ifndef TARGET_PCREL_DEFAULT
+#define TARGET_PCREL_DEFAULT   0
+#endif
+
 /* Support targetm.vectorize.builtin_mask_for_load.  */
 GTY(()) tree altivec_builtin_mask_for_load;
 
@@ -2532,6 +2542,14 @@ rs6000_debug_reg_global (void)
   if (TARGET_DIRECT_MOVE_128)
 fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
 (int)VECTOR_ELEMENT_MFVSRLD_64BIT);
+
+  if (TARGET_FUTURE)
+{
+  fprintf (stderr, DEBUG_FMT_D, "TARGET_PREFIXED_ADDR_DEFAULT",
+  TARGET_PREFIXED_ADDR_DEFAULT);
+  fprintf (stderr, DEBUG_FMT_D, "TARGET_PCREL_DEFAULT",
+  TARGET_PCREL_DEFAULT);
+}
 }
 
 
@@ -4012,26 +4030,6 @@ rs6000_option_over

[PATCH] V6, #8 of 17: Use PADDI to add 34-bit constants

2019-10-16 Thread Michael Meissner
This patch uses the PADDI instruction to add 34-bit constants that cannot be
added with a single ADDI or ADDIS instruction.  This patch is the same as V5
patch #10.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/predicates.md (add_operand): Add support for
PADDI.
* config/rs6000/rs6000.md (add3): Add support for PADDI.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 277025)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -839,7 +839,8 @@ (define_special_predicate "indexed_addre
 (define_predicate "add_operand"
   (if_then_else (match_code "const_int")
 (match_test "satisfies_constraint_I (op)
-|| satisfies_constraint_L (op)")
+|| satisfies_constraint_L (op)
+|| satisfies_constraint_eI (op)")
 (match_operand 0 "gpc_reg_operand")))
 
 ;; Return 1 if the operand is either a non-special register, or 0, or -1.
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 277027)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -1760,15 +1760,17 @@ (define_expand "add3"
 })
 
 (define_insn "*add3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r")
-   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b")
- (match_operand:GPR 2 "add_operand" "r,I,L")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
+   (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
+ (match_operand:GPR 2 "add_operand" "r,I,L,eI")))]
   ""
   "@
add %0,%1,%2
addi %0,%1,%2
-   addis %0,%1,%v2"
-  [(set_attr "type" "add")])
+   addis %0,%1,%v2
+   addi %0,%1,%2"
+  [(set_attr "type" "add")
+   (set_attr "isa" "*,*,*,fut")])
 
 (define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")



[PATCH] V6, #7 of 17: Use PADDI/PLI to load up 32-bit SImode constants

2019-10-16 Thread Michael Meissner
This patch uses PADDI (PLI) to load 32-bit SImode constants that cannot be
loaded with a single ADDI or ADDIS instruction.  This patch is the same as V5
patch #9.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/rs6000.md (movsi_internal1): Add support to load
up 32-bit SImode integer constants with PADDI.
(movsi integer constant splitter): Do not split constant if PADDI
can load it up directly.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 277026)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -6908,22 +6908,22 @@ (define_insn "movsi_low"
 
 ;; MR   LA   LWZ  LFIWZX   LXSIWZX
 ;; STW  STFIWX   STXSIWX  LI   LIS
-;; #XXLORXXSPLTIB 0   XXSPLTIB -1  VSPLTISW
-;; XXLXOR 0 XXLORC -1P9 const MTVSRWZ  MFVSRWZ
-;; MF%1 MT%0 NOP
+;; PLI  #XXLORXXSPLTIB 0   XXSPLTIB -1
+;; VSPLTISW XXLXOR 0 XXLORC -1P9 const MTVSRWZ
+;; MFVSRWZ  MF%1 MT%0 NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
"=r, r,   r,   d,   v,
 m,  Z,   Z,   r,   r,
-r,  wa,  wa,  wa,  v,
-wa, v,   v,   wa,  r,
-r,  *h,  *h")
+r,  r,   wa,  wa,  wa,
+v,  wa,  v,   v,   wa,
+r,  r,   *h,  *h")
(match_operand:SI 1 "input_operand"
"r,  U,   m,   Z,   Z,
 r,  d,   v,   I,   L,
-n,  wa,  O,   wM,  wB,
-O,  wM,  wS,  r,   wa,
-*h, r,   0"))]
+eI, n,   wa,  O,   wM,
+wB, O,   wM,  wS,  r,
+wa, *h,  r,   0"))]
   "gpc_reg_operand (operands[0], SImode)
|| gpc_reg_operand (operands[1], SImode)"
   "@
@@ -6937,6 +6937,7 @@ (define_insn "*movsi_internal1"
stxsiwx %x1,%y0
li %0,%1
lis %0,%v1
+   li %0,%1
#
xxlor %x0,%x1,%x1
xxspltib %x0,0
@@ -6953,21 +6954,21 @@ (define_insn "*movsi_internal1"
   [(set_attr "type"
"*,  *,   load,fpload,  fpload,
 store,  fpstore, fpstore, *,   *,
-*,  veclogical,  vecsimple,   vecsimple,   vecsimple,
-veclogical, veclogical,  vecsimple,   mffgpr,  mftgpr,
-*,  *,   *")
+*,  *,   veclogical,  vecsimple,   vecsimple,
+vecsimple,  veclogical,  veclogical,  vecsimple,   mffgpr,
+mftgpr, *,   *,   *")
(set_attr "length"
"*,  *,   *,   *,   *,
 *,  *,   *,   *,   *,
-8,  *,   *,   *,   *,
-*,  *,   8,   *,   *,
-*,  *,   *")
+*,  8,   *,   *,   *,
+*,  *,   *,   8,   *,
+*,  *,   *,   *")
(set_attr "isa"
"*,  *,   *,   p8v, p8v,
 *,  p8v, p8v, *,   *,
-*,  p8v, p9v, p9v, p8v,
-p9v,p8v, p9v, p8v, p8v,
-*,  *,   *")])
+fut,*,   p8v, p9v, p9v,
+p8v,p9v, p8v, p9v, p8v,
+p8v,*,   *,   *")])
 
 ;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
 ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
@@ -7112,14 +7113,15 @@ (define_insn 

[PATCH] V6, #6 of 17: Use PADDI/PLI to load up 34-bit DImode constants

2019-10-16 Thread Michael Meissner
This patch uses PADDI (PLI) to load up 34-bit DImode constants.  This is the
same patch as V5 patch #8.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/rs6000.c (num_insns_constant_gpr): Add support for
PADDI to load up and/or add 34-bit integer constants.
(rs6000_rtx_costs): Treat constants loaded up with PADDI with the
same cost as normal 16-bit constants.
* config/rs6000/rs6000.md (movdi_internal64): Add support to load
up 34-bit integer constants with PADDI.
(movdi integer constant splitter): Add comment about PADDI.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277025)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5524,7 +5524,7 @@ static int
 num_insns_constant_gpr (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x1)
+  if (SIGNED_16BIT_OFFSET_P (value))
 return 1;
 
   /* constant loadable with addis */
@@ -5532,6 +5532,10 @@ num_insns_constant_gpr (HOST_WIDE_INT va
   && (value >> 31 == -1 || value >> 31 == 0))
 return 1;
 
+  /* PADDI can support up to 34 bit signed integers.  */
+  else if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (value))
+return 1;
+
   else if (TARGET_POWERPC64)
 {
   HOST_WIDE_INT low  = ((value & 0x) ^ 0x8000) - 0x8000;
@@ -20641,7 +20645,8 @@ rs6000_rtx_costs (rtx x, machine_mode mo
|| outer_code == PLUS
|| outer_code == MINUS)
   && (satisfies_constraint_I (x)
-  || satisfies_constraint_L (x)))
+  || satisfies_constraint_L (x)
+  || satisfies_constraint_eI (x)))
  || (outer_code == AND
  && (satisfies_constraint_K (x)
  || (mode == SImode
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 277024)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8823,24 +8823,24 @@ (define_split
   DONE;
 })
 
-;;  GPR store  GPR load   GPR move   GPR li GPR lis GPR #
-;;  FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
-;;  AVX load   VSX move   P9 0   P9 -1  AVX 0/-1VSX 0
-;;  VSX -1 P9 const   AVX const  From SPR   To SPR  SPR<->SPR
-;;  VSX->GPR   GPR->VSX
+;;  GPR store  GPR load   GPR move   GPR li GPR lis GPR pli
+;;  GPR #  FPR store  FPR load   FPR move   AVX store   AVX store
+;;  AVX load   AVX load   VSX move   P9 0   P9 -1   AVX 0/-1
+;;  VSX 0  VSX -1 P9 const   AVX const  From SPRTo SPR
+;;  SPR<->SPR  VSX->GPR   GPR->VSX
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
"=YZ,   r, r, r, r,  r,
-m, ^d,^d,wY,Z,  $v,
-$v,^wa,   wa,wa,v,  wa,
-wa,v, v, r, *h, *h,
-?r,?wa")
+r, m, ^d,^d,wY, Z,
+$v,$v,^wa,   wa,wa, v,
+wa,wa,v, v, r,  *h,
+*h,?r,?wa")
(match_operand:DI 1 "input_operand"
-   "r, YZ,r, I, L,  nF,
-^d,m, ^d,^v,$v, wY,
-Z, ^wa,   Oj,wM,OjwM,   Oj,
-wM,wS,wB,*h,r,  0,
-wa,r"))]
+   "r, YZ,r, I, L,  eI,
+nF,^d,m, ^d,^v, $v,
+wY,Z, ^wa,   Oj,wM, OjwM,
+Oj,wM,wS,wB,*h, r,
+0, wa,r"))]
   "TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], DImode)
|| gpc_reg_operand (operands[1], DImode))"
@@ -8850,6 +8850,7 @@ (define_insn "*movdi_internal64"
mr %0,%1
li %0,%1
lis %0,%v1
+   li %0,%1
#
stfd%U0%X0 %1,%0
lfd%U1%X1 %0,%1
@@ -8873,26 +8874,28 @@ (define_insn "

[PATCH] V6, #5 of 17: Add prefixed instruction support to vector extract optimizations

2019-10-16 Thread Michael Meissner
This patch updates the support for optimizing vector extracts to know about
prefixed addressing.  There are two parts to the patch:

1) If a vector extract with a constant element number extracts an element from
a vector residing in memory that uses a prefixed address (either numeric or
PC-relative), the offset for the element number is folded into the address of
the vector and a scalar load is done.

2) If a vector extract with a variable element number extracts an element from
a vector residing in memory that uses a prefixed address, the optimization is
not done, because we would need two temporary base registers for the
calculation and currently we only have one.  Instead, the vector is loaded into
a vector register, and the element extract is done from the value in the
register.  We discovered this while running real code through the Mambo
simulator.  The compiler previously tried to use the single base register both
to hold the address and to generate the element offset.

This patch adds a new constraint (em) that prevents using prefixed addresses.
Without the constraint the register allocator would recreate the prefixed
address for the insn.

This patch updates V5 patch #7 which did the same thing.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/constraints.md (em constraint): New constraint for
non-prefixed memory.
* config/rs6000/predicates.md (non_prefixed_memory): New
predicate.
(reg_or_non_prefixed_memory): New predicate.
* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
for optimizing extracting a constant vector element from a vector
that uses a prefixed address.  If the element number is variable
and the address uses a prefixed address, abort.
* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator):
Do not allow combining prefixed memory with a variable vector
extract.
(vsx_extract_v4sf_var): Do not allow combining prefixed memory
with a variable vector extract.
(vsx_extract__var, VSX_EXTRACT_I iterator): Do not allow
combining prefixed memory with a variable vector extract.
(vsx_extract__mode_var): Do not allow combining
prefixed memory with a variable vector extract.
* doc/md.texi (PowerPC constraints): Document the em constraint.

Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 276974)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -202,6 +202,11 @@ (define_constraint "H"
 
 ;; Memory constraints
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a prefixed address."
+  (and (match_code "mem")
+   (match_test "non_prefixed_memory (op, mode)")))
+
 (define_memory_constraint "es"
   "A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  Unlike @samp{m}, this constraint
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 277024)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1822,3 +1822,24 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a memory address that does not use a prefixed
+;; address.
+(define_predicate "non_prefixed_memory"
+  (match_code "mem")
+{
+  /* If the operand is not a valid memory operand even if it is not prefixed,
+ do not return true.  */
+  if (!memory_operand (op, mode))
+return false;
+
+  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+})
+
+;; Return true if the operand is either a register or it is a non-prefixed
+;; memory operand.
+(define_predicate "reg_or_non_prefixed_memory"
+  (match_code "reg,subreg,mem")
+{
+  return gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode);
+})
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277018)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6702,6 +6702,7 @@ rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6739,6 +6740,38 @@ rs6000_adjust_vec_address (rtx scalar_re
   else if 

Re: [PATCH V2] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-16 Thread Jiufu Guo
Segher Boessenkool  writes:

> Hi!
>
>>  the callee explicitly disables some isa_flags the caller is using.  
>
> No trailing spaces please.
Updated.
>
>> +  /* The callee's options must be a subset of the caller's options, i.e.
>> + a vsx function may inline an altivec function, but a non-vsx function
>> + must not inline a vsx function.  However, for those options that the
>
> no-vsx instead of non-vsx?  Or make it clear even more explicitly that this is
> about having something explicitly disabled?  (The code is clear
> though).
Thanks.
>
>> +
>> +static inline void __attribute__ ((__always_inline__, target ("no-vsx")))
>> +foo () /* { dg-error "inlining failed in call to .* target specific option 
>> mismatch" } */
>
> .* can match across lines.  Not a huge deal here of course -- but maybe
> adding  (?n)  to the regexp works?  (Just at the very start of it).
It seems (?n) does not help in this case, so I kept the old regexp.
>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-2.c
>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr70010-3.c
>> new file mode 100644
>
> Do you want to test anything in those two new testcases?  Other than "it
> compiles"?
If compilation passes, it indicates that inlining works as expected for the
caller and callee, so I did not check anything else here.
>
>> +/* { dg-require-effective-target powerpc_vsx_ok } */
>
> This line isn't necessary anymore I think?  Or if it is, it is needed in
> all these new testcases.
Updated.
>
> Okay for trunk.  Thanks to both of you!
>
> Also okay for 9 and 8, after waiting a week to see if there is fallout.
>
>
> Segher

I just committed the patch.

Thanks to all for your help! Segher, Richard, Peter, Andreas, Iain, and
everyone else.

BR,
Jiufu


[PATCH] V6, #4 of 17: Add prefixed instruction support to stack protect insns

2019-10-16 Thread Michael Meissner
This patch fixes the stack protection insns to support stack offsets that do
not fit in 16 bits on the 'future' system, using prefixed loads and stores.

This rewrites V5 patch #5.  In earlier versions, I had a variant of this
patch, but I was asked to restrict the protect insns to non-prefixed insns,
which I did in V5.  However, in the V5 review, I was asked to support prefixed
instructions once again.  This patch was updated to use the new support in V6
patch #1.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/predicates.md (prefixed_memory): New predicate.
* config/rs6000/rs6000.md (stack_protect_setdi): Deal with either
address being a prefixed load/store.
(stack_protect_testdi): Deal with either address being a prefixed
load.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 277022)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1815,3 +1815,10 @@ (define_predicate "pcrel_external_addres
 (define_predicate "pcrel_local_or_external_address"
   (ior (match_operand 0 "pcrel_local_address")
(match_operand 0 "pcrel_external_address")))
+
+;; Return true if the operand is a memory address that uses a prefixed address.
+(define_predicate "prefixed_memory"
+  (match_code "mem")
+{
+  return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+})
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 277052)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -11559,14 +11559,44 @@ (define_insn "stack_protect_setsi"
   [(set_attr "type" "three")
(set_attr "length" "12")])
 
+;; We can't use the prefixed attribute here because there are two memory
+;; instructions.  We can't split the insn due to the fact that this operation
+;; needs to be done in one piece.
 (define_insn "stack_protect_setdi"
   [(set (match_operand:DI 0 "memory_operand" "=Y")
(unspec:DI [(match_operand:DI 1 "memory_operand" "Y")] UNSPEC_SP_SET))
(set (match_scratch:DI 2 "=&r") (const_int 0))]
   "TARGET_64BIT"
-  "ld%U1%X1 %2,%1\;std%U0%X0 %2,%0\;li %2,0"
+{
+  if (prefixed_memory (operands[1], DImode))
+    output_asm_insn ("pld %2,%1", operands);
+  else
+    output_asm_insn ("ld%U1%X1 %2,%1", operands);
+
+  if (prefixed_memory (operands[0], DImode))
+    output_asm_insn ("pstd %2,%0", operands);
+  else
+    output_asm_insn ("std%U0%X0 %2,%0", operands);
+
+  return "li %2,0";
+}
   [(set_attr "type" "three")
-   (set_attr "length" "12")])
+
+  ;; Back to back prefixed memory instructions take 20 bytes (8 bytes for each
+  ;; prefixed instruction + 4 bytes for the possible NOP).  Add in 4 bytes for
+  ;; the LI 0 at the end.
+   (set_attr "prefixed" "no")
+   (set_attr "num_insns" "3")
+   (set (attr "length")
+   (cond [(and (match_operand 0 "prefixed_memory")
+   (match_operand 1 "prefixed_memory"))
+  (const_string "24")
+
+  (ior (match_operand 0 "prefixed_memory")
+   (match_operand 1 "prefixed_memory"))
+  (const_string "20")]
+
+ (const_string "12")))])
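As a sanity check, the byte arithmetic that the length-attribute expression
above encodes can be sketched in plain C.  This is a hedged illustration, not
GCC code; the counts follow the comment in the patch (8 bytes per prefixed
memory insn, a shared 4-byte allowance for the possible alignment NOP, and
4 bytes for the trailing "li %2,0"):

```c
#include <assert.h>
#include <stdbool.h>

/* Length in bytes of the stack_protect_setdi sequence: a load, a store,
   and a trailing "li %2,0".  A prefixed memory insn is budgeted at 8
   bytes, plus one 4-byte NOP allowance for the 64-byte-boundary rule;
   a non-prefixed insn is 4 bytes.  */
static int
stack_protect_setdi_length (bool load_prefixed, bool store_prefixed)
{
  if (load_prefixed && store_prefixed)
    return 24;   /* 8 + 8 + possible NOP (4) + li (4) */
  if (load_prefixed || store_prefixed)
    return 20;   /* 8 + 4 + possible NOP (4) + li (4) */
  return 12;     /* three ordinary 4-byte instructions */
}
```

These values match the 24/20/12 constants in the attribute expression above.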
 
 (define_expand "stack_protect_test"
   [(match_operand 0 "memory_operand")
@@ -11605,6 +11635,9 @@ (define_insn "stack_protect_testsi"
lwz%U1%X1 %3,%1\;lwz%U2%X2 %4,%2\;cmplw %0,%3,%4\;li %3,0\;li %4,0"
   [(set_attr "length" "16,20")])
 
+;; We can't use the prefixed attribute here because there are two memory
+;; instructions.  We can't split the insn due to the fact that this operation
+;; needs to be done in one piece.
 (define_insn "stack_protect_testdi"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=x,?y")
 (unspec:CCEQ [(match_operand:DI 1 "memory_operand" "Y,Y")
@@ -11613,10 +11646,45 @@ (define_insn "stack_protect_testdi"
(set (match_scratch:DI 4 "=r,r") (const_int 0))
(clobber (match_scratch:DI 3 "=&r,&r"))]
   "TARGET_64BIT"
-  "@
-   ld%U1%X1 %3,%1\;ld%U2%X2 %4,%2\;xor. %3,%3,%4\;li %4,0
-   ld%U1%X1 %3,%1\;ld%U2%X2 %4,%2\;cmpld %0,%3,%4\;li %3,0\;li %4,0"
-  [(set_attr "length" "16,20")])
+{
+  if (prefixed_memory (operands[1], DImode))
+    output_asm_insn ("pld %3,%1", operands);
+  else
+    output_asm_insn ("ld%U1%X1 %3,%1", operands);
+
+  if (prefixed_memory (operands[2], DImode))
+    output_asm_insn ("pld %4,%2", operands);
+  else
+    output_asm_insn ("ld%U2%X2 %4,%2", operands);
+
+  if (which_alternative == 0)
+    output_asm_insn ("xor. %3,%3,%4", operands);
+  else
+    output_asm_insn ("cmpld %0,%3,%4\;li %3,0", operands);

[PATCH] V6, #3 of 17: Update lwa_operand for prefixed PLWA

2019-10-16 Thread Michael Meissner
This patch allows using load SImode with sign extend to DImode to generate the
PLWA instruction on the 'future' machine if the offset for the load has the
bottom 2 bits being non-zero.  The normal LWA instruction is a DS format
instruction, and it needs the bottom 2 bits to be 0.
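The DS-form constraint can be illustrated with a small hedged sketch.  The
limits below are illustrative of the encoding, not the actual rs6000
predicate:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A DS-form instruction such as LWA encodes a 16-bit signed displacement
   whose bottom two bits must be zero.  The prefixed PLWA carries a much
   larger displacement with no alignment requirement, so an offset that
   fails this test can still be handled via PLWA on machines that support
   prefixed instructions.  */
static bool
ds_form_offset_ok (int64_t offset)
{
  return offset >= -32768 && offset <= 32767 && (offset & 3) == 0;
}
```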

This patch was originally part of V5 patch #4.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/predicates.md (lwa_operand): If the bottom two
bits of the offset for the memory address are non-zero, use PLWA
if prefixed instructions are available.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 277017)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -932,6 +932,14 @@ (define_predicate "lwa_operand"
 return false;
 
   addr = XEXP (inner, 0);
+
+  /* The LWA instruction uses the DS-form instruction format which requires
+ that the bottom two bits of the offset must be 0.  The prefixed PLWA does
+ not have this restriction.  While the actual load from memory is 32-bits,
+ we pass in DImode here to test for using a DS instruction.  */
+  if (address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
+    return true;
+
   if (GET_CODE (addr) == PRE_INC
   || GET_CODE (addr) == PRE_DEC
   || (GET_CODE (addr) == PRE_MODIFY

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #2 of 17: Minor code reformat

2019-10-16 Thread Michael Meissner
This patch tweaks code formatting issues that I noticed while making the
previous patch for some of the 128-bit mode move instructions.  Originally
this was part of V5 patch #2, but it has been moved to a separate patch.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/rs6000.md (mov_64bit_dm): Reformat.
(movtd_64bit_nodm): Reformat.
(mov_32bit): Reformat.
(mov_softfloat): Reformat.
(FMOVE128_GPR splitter): Reformat.
(DIFD splitter): Reformat.
(TI2 splitter): Reformat.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 277018)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -,7 +,10 @@ (define_insn_and_split "*mov_64bit
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
   [(set_attr "length" "8")
(set_attr "isa" "*,*,*,*,*,*,*,*,p8v,p8v")
(set_attr "max_prefixed_insns" "2")
@@ -7792,7 +7795,10 @@ (define_insn_and_split "*movtd_64bit_nod
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
   [(set_attr "length" "8,8,8,12,12,8")
(set_attr "max_prefixed_insns" "2")
(set_attr "num_insns" "2,2,2,3,3,2")])
@@ -7809,7 +7815,10 @@ (define_insn_and_split "*mov_32bit
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
   [(set_attr "length" "8,8,8,8,20,20,16")])
 
 (define_insn_and_split "*mov_softfloat"
@@ -7821,7 +7830,10 @@ (define_insn_and_split "*mov_softf
   "#"
   "&& reload_completed"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
   [(set_attr_alternative "length"
[(if_then_else (match_test "TARGET_POWERPC64")
(const_string "8")
@@ -8613,7 +8625,10 @@ (define_split
|| (!vsx_register_operand (operands[0], mode)
&& !vsx_register_operand (operands[1], mode)))"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+})
 
 ;; Move SFmode to a VSX from a GPR register.  Because scalar floating point
 ;; type is stored internally as double precision in the VSX registers, we have
@@ -8803,7 +8818,10 @@ (define_split
&& gpr_or_gpr_p (operands[0], operands[1])
&& !direct_move_p (operands[0], operands[1])"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+})
 
;;  GPR store  GPR load   GPR move   GPR li GPR lis GPR #
;;  FPR store  FPR load   FPR move   AVX store  AVX store   AVX load
@@ -9030,7 +9048,10 @@ (define_split
&& !direct_move_p (operands[0], operands[1])
&& !quad_load_store_p (operands[0], operands[1])"
   [(pc)]
-{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; })
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+})
 
 (define_expand "setmemsi"
   [(parallel [(set (match_operand:BLK 0 "")

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[PATCH] V6, #1 of 17: Use ADJUST_INSN_LENGTH for prefixed instructions

2019-10-16 Thread Michael Meissner
This patch uses the target hook ADJUST_INSN_LENGTH to change the length of
instructions that contain prefixed memory/add instructions.

There are 2 new insn attributes:

1) num_insns: If non-zero, returns the number of machine instructions in an
insn.  This simplifies the calculations in rs6000_insn_cost.

2) max_prefixed_insns: Returns the maximum number of prefixed instructions in
an insn.  Normally this is 1, but in the insns that load up 128-bit values into
GPRs, it will be 2.

This patch replaces patches #2, #3, and #6 from the V5 patch series.

Along with the other patches, I have done bootstraps on a little endian power8
system, and there were no regressions in the test suite.  I have built both
Spec 2006 and Spec 2017 with all of these patches installed using -mcpu=future,
and there were no failures.  Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_insn_cost): Use num_insns insn
attribute if it exists, rather than the insn size.  If we use the
insn size, adjust the size to remove the extra size that prefixed
instructions take.
* config/rs6000/rs6000.h (ADJUST_INSN_LENGTH): New target hook to
update the instruction size if prefixed instructions are used.
* config/rs6000/rs6000.md (prefixed_length attribute): Delete.
(non_prefixed_length attribute): Delete.
(num_insns attribute): New insn attribute to return the number of
instructions.
(max_prefixed_insns attribute): New insn attribute to return the
maximum number of prefixed instructions in an insn.
(length attribute): Do not adjust for prefixed instructions here,
punt to ADJUST_INSN_LENGTH.
(mov_64bit): Set max_prefixed_insns and num_insns.
(movtd_64bit_nodm): Set max_prefixed_insns and num_insns.
(mov_ppc64): Set max_prefixed_insns and num_insns.
* config/rs6000/vsx.md: (vsx_mov_64bit): Set
max_prefixed_insns and num_insns.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 277017)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20973,14 +20973,32 @@ rs6000_insn_cost (rtx_insn *insn, bool s
   if (recog_memoized (insn) < 0)
 return 0;
 
+  /* If we are optimizing for size, just use the length.  */
   if (!speed)
 return get_attr_length (insn);
 
+  /* Use the cost if provided.  */
   int cost = get_attr_cost (insn);
   if (cost > 0)
 return cost;
 
-  int n = get_attr_length (insn) / 4;
+  /* If the insn tells us how many insns there are, use that.  Otherwise use
+ the length/4.  Adjust the insn length to remove the extra size that
+ prefixed instructions take.  */
+  int n = get_attr_num_insns (insn);
+  if (n == 0)
+{
+  int length = get_attr_length (insn);
+  if (get_attr_prefixed (insn) == PREFIXED_YES)
+   {
+ int adjust = 0;
+ ADJUST_INSN_LENGTH (insn, adjust);
+ length -= adjust;
+   }
+
+  n = length / 4;
+}
+
   enum attr_type type = get_attr_type (insn);
 
   switch (type)
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 277017)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -1847,9 +1847,30 @@ extern scalar_int_mode rs6000_pmode;
 /* Adjust the length of an INSN.  LENGTH is the currently-computed length and
should be adjusted to reflect any required changes.  This macro is used when
there is some systematic length adjustment required that would be difficult
-   to express in the length attribute.  */
+   to express in the length attribute.
 
-/* #define ADJUST_INSN_LENGTH(X,LENGTH) */
+   In the PowerPC, we use this to adjust the length of an instruction if one or
+   more prefixed instructions are generated, using the attribute
+   num_prefixed_insns.  A prefixed instruction is 8 bytes instead of 4, but the
+   hardware requires that a prefixed instruction not cross a 64-byte boundary.
+   This means the compiler has to assume the length of the first prefixed
+   instruction is 12 bytes instead of 8 bytes.  Since the length is already set
+   for the non-prefixed instruction, we just need to update for the
+   difference.  */
+
+#define ADJUST_INSN_LENGTH(INSN,LENGTH)    \
+{  \
+  if (NONJUMP_INSN_P (INSN))   \
+{  \
+  rtx pattern = PATTERN (INSN);\
+  if (GET_CODE (pattern) != USE && GET_CODE (pattern) != CLOBBER   \
+ && get_attr_prefixed (INSN) == PREFIXED_YES)  \
+   {  
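The adjustment described in the ADJUST_INSN_LENGTH comment above can be
sketched in plain C.  This is a hedged reconstruction (the real macro works
on RTL insns and attributes); the byte counts follow the comment:

```c
#include <assert.h>

/* The length attribute counts each instruction as 4 bytes.  A prefixed
   instruction is really 8 bytes, and the first one in an insn also
   budgets a 4-byte NOP so that it cannot cross a 64-byte boundary,
   hence 12 bytes for the first prefixed instruction.  */
static int
adjusted_insn_length (int base_length, int n_prefixed)
{
  if (n_prefixed == 0)
    return base_length;
  /* +4 per prefixed insn (8 bytes instead of 4) and +4 once for the NOP.  */
  return base_length + n_prefixed * 4 + 4;
}
```

Applied to the 12-byte stack_protect_setdi sequence, this yields 20 bytes
with one prefixed memory access and 24 with two, matching the constants in
that insn's length attribute.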

Re: [gomp4.1] Start of structure element mapping support

2019-10-16 Thread Thomas Schwinge
Hi Jakub!

Stumbled over this while reviewing Julian's "Factor out duplicate code in
gimplify_scan_omp_clauses":

On 2015-07-31T18:16:10+0200, Jakub Jelinek  wrote:
> This patch is the start of implementation of struct element mapping.

Not quite the same, but similar code is still present in GCC trunk.

> --- gcc/gimplify.c.jj 2015-07-31 16:55:01.482411392 +0200
> +++ gcc/gimplify.c2015-07-31 16:57:22.307320290 +0200

> +   tree offset;

Here we define 'offset'...

> +   HOST_WIDE_INT bitsize, bitpos;
> +   machine_mode mode;
> +   int unsignedp, volatilep = 0;
> +   tree base
> + = get_inner_reference (OMP_CLAUSE_DECL (c), &bitsize,
> +&bitpos, &offset, &mode, &unsignedp,
> +&volatilep, false);

..., which here gets written to...

> +   gcc_assert (base == decl
> +   && (offset == NULL_TREE
> +   || TREE_CODE (offset) == INTEGER_CST));

..., and here gets checked...

> +
> +   splay_tree_node n
> + = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
> +   if (n == NULL || (n->value & GOVD_MAP) == 0)
> + {
> +   [...]
> + }
> +   else
> + {
> +   tree *osc = struct_map_to_clause->get (decl), *sc;
> +   if (OMP_CLAUSE_MAP_KIND (c) & GOMP_MAP_FLAG_ALWAYS)
> + n->value |= GOVD_SEEN;
> +   offset_int o1, o2;
> +   if (offset)
> + o1 = wi::to_offset (offset);

..., and here used.

> +   else
> + o1 = 0;
> +   if (bitpos)
> + o1 = o1 + bitpos / BITS_PER_UNIT;
> +   for (sc = &OMP_CLAUSE_CHAIN (*osc); *sc != c;
> +sc = &OMP_CLAUSE_CHAIN (*sc))
> + if (TREE_CODE (OMP_CLAUSE_DECL (*sc)) != COMPONENT_REF)
> +   break;
> + else
> +   {
> + tree offset2;

Here we define 'offset2'...

> + HOST_WIDE_INT bitsize2, bitpos2;
> + base = get_inner_reference (OMP_CLAUSE_DECL (*sc),
> + &bitsize2, &bitpos2,
> + &offset2, &mode,
> + &unsignedp, &volatilep,
> + false);

..., which here gets written to...

> + if (base != decl)
> +   break;
> + gcc_assert (offset == NULL_TREE
> + || TREE_CODE (offset) == INTEGER_CST);

..., but here we again check 'offset', not 'offset2'...

> + tree d1 = OMP_CLAUSE_DECL (*sc);
> + tree d2 = OMP_CLAUSE_DECL (c);
> + while (TREE_CODE (d1) == COMPONENT_REF)
> +   if (TREE_CODE (d2) == COMPONENT_REF
> +   && TREE_OPERAND (d1, 1)
> +  == TREE_OPERAND (d2, 1))
> + {
> +   d1 = TREE_OPERAND (d1, 0);
> +   d2 = TREE_OPERAND (d2, 0);
> + }
> +   else
> + break;
> + if (d1 == d2)
> +   {
> + error_at (OMP_CLAUSE_LOCATION (c),
> +   "%qE appears more than once in map "
> +   "clauses", OMP_CLAUSE_DECL (c));
> + remove = true;
> + break;
> +   }
> + if (offset2)
> +   o2 = wi::to_offset (offset2);

.., but here again we use 'offset2'.

> + else
> +   o2 = 0;
> + if (bitpos2)
> +   o2 = o2 + bitpos2 / BITS_PER_UNIT;
> + if (wi::ltu_p (o1, o2)
> + || (wi::eq_p (o1, o2) && bitpos < bitpos2))
> +   break;
> +   }
> +   if (!remove)
> + OMP_CLAUSE_SIZE (*osc)
> +   = size_binop (PLUS_EXPR, OMP_CLAUSE_SIZE (*osc),
> + size_one_node);
> +   if (!remove && *sc != c)
> + {
> +   *list_p = OMP_CLAUSE_CHAIN (c);
> +   OMP_CLAUSE_CHAIN (c) = *sc;
> + 

[PATCH] Relax integer condition reduction, simplify vect_is_simple_reduction

2019-10-16 Thread Richard Biener


It happens that we cannot have differently typed data and index for
integer condition reductions right now, for whatever reason.
The following makes that work, even for double data and integer index.
There's hope this enables some relevant amount of extra vectorization.

Actually this is fallout from simplifying vect_is_simple_reduction
down to SSA cycle detection and moving reduction validity / handling
checks to vectorizable_reduction (thus a single place).

I've decided to take an intermediate step here as I enable more
vectorization.  Which also needed the vect_transform_stmt change.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

* tree-vect-loop.c (vect_valid_reduction_input_p): Remove.
(vect_is_simple_reduction): Delay checking to
vectorizable_reduction and relax the checking.
(vectorizable_reduction): Check we have a simple use.  Check
for bogus condition reductions.
* tree-vect-stmts.c (vect_transform_stmt): Make sure we
are looking at the last stmt in a pattern sequence when
filling in backedge PHI values.

* gcc.dg/vect/vect-cond-reduc-3.c: New testcase.
* gcc.dg/vect/vect-cond-reduc-4.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-3.c
new file mode 100644
index 000..a5b3849a8c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-3.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_float } */
+
+#include "tree-vect.h"
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with different types.  */
+
+int
+condition_reduction (float *a, float min_v)
+{
+  int last = 0;
+
+  for (int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  float a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27
+  };
+
+  check_vect ();
+
+  int ret = condition_reduction (a, 10);
+  if (ret != 18)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
+
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
new file mode 100644
index 000..6b6d17fb93c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-4.c
@@ -0,0 +1,45 @@
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_double } */
+
+#include "tree-vect.h"
+
+extern void abort (void) __attribute__ ((noreturn));
+
+#define N 27
+
+/* Condition reduction with different types.  */
+
+int
+condition_reduction (double *a, double min_v)
+{
+  int last = 0;
+
+  for (int i = 0; i < N; i++)
+if (a[i] < min_v)
+  last = i;
+
+  return last;
+}
+
+int
+main (void)
+{
+  double a[N] = {
+  11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+  1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+  21, 22, 23, 24, 25, 26, 27
+  };
+
+  check_vect ();
+
+  int ret = condition_reduction (a, 10);
+  if (ret != 18)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { target { ! vect_fold_extract_last } } } } */
+
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 455251070d0..0530d6643b4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2532,21 +2532,6 @@ report_vect_op (dump_flags_t msg_type, gimple *stmt, const char *msg)
   dump_printf_loc (msg_type, vect_location, "%s%G", msg, stmt);
 }
 
-/* DEF_STMT_INFO occurs in a loop that contains a potential reduction
-   operation.  Return true if the results of DEF_STMT_INFO are something
-   that can be accumulated by such a reduction.  */
-
-static bool
-vect_valid_reduction_input_p (stmt_vec_info def_stmt_info)
-{
-  return (is_gimple_assign (def_stmt_info->stmt)
- || is_gimple_call (def_stmt_info->stmt)
- || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_induction_def
- || (gimple_code (def_stmt_info->stmt) == GIMPLE_PHI
- && STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def
- && !is_loop_header_bb_p (gimple_bb (def_stmt_info->stmt))));
-}
-
 /* Return true if we need an in-order reduction for operation CODE
on type TYPE.  NEED_WRAPPING_INTEGRAL_OVERFLOW is true if integer
overflow must wrap.  */
@@ -2748,13 +2733,7 @@ ve

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-16 Thread David Malcolm
On Tue, 2019-10-15 at 20:31 -0400, JeanHeyd Meneide wrote:
> Attached is a patch for p1301 that improves in the way Jason Merrill
> specified earlier
> (https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html), but it
> keeps segfaulting on my build of GCC. I don't know what changes I've
> made that cause it to segfault: it does so whenever the error()
> function is called but the backtraces aren't showing me anything
> conclusive.
> 
> The tests test what I expect them to and the output is fine, I just
> can't get the segfaults to stop, so I'm putting the patch up so
> someone can critique what I've written, or someone else to test it
> too. Sorry for taking so long.
> 
> Thanks,
> JeanHeyd

Thanks for posting these patches.

I believe you're a new GCC contributor - welcome!

Having said "welcome", for good or ill, GCC has some legalese that we
have to follow - to get a patch accepted, see the "Legal Prerequisites"
section of: https://gcc.gnu.org/contribute.html so you'll need to
follow that at some point  (I hate having to say this).

I got the impression on IRC yesterday that you were having some trouble
getting your debugging environment set up - it sounded to me like
you've been "making do" by adding print statements to the parser and
recompiling each time.

If that's the case, then the best next step is probably to figure out
how to get you debugging cc1plus using a debugger.

Normally, I run:
  ./xgcc -B.
and add  "-wrapper gdb,--args" to the command line so that the driver
invokes gdb.

Another way is to invoke gdb and have it directly invoke cc1plus:
   gdb --args ./cc1plus -quiet x.C

In both cases, run it from /gcc

See also:
https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html
which has further hints and info on a "nicer" debugging experience.

You mentioned that you normally use gdbserver, which makes me think
that you're most comfortable using gdb from an IDE.  I'm not very
familiar with gdbserver, but hopefully someone here knows the recipe
for this that will let you comfortably debug the frontend - without
needing to add print statements and recompile each time (ugh).

Note that for hacking on the frontend and debugging, I normally
configure gcc with --disable-bootstrap which saves a *lot* of time;
similarly I also only build the "gcc" subdirectory and its dependencies
(I still do a full bootstrap and regression test in a separate
directory once I've got a candidate patch that I want to test properly
before submitting to the mailing list).

[I know that you know some/all of this, but I'm posting it here for the
benefit of people reading the list archives]

Hope this is helpful, and welcome again.
Dave




PowerPC future machine patches, version 6

2019-10-16 Thread Michael Meissner
This is version 6 of the patches for the PowerPC 'future' machine.  There are
currently 17 patches in this series.

Compared to the V5 patches, the following changes have been made:

1) The length calculation for memory references involving prefixed addresses
has been moved to the target hook ADJUST_INSN_LENGTH.

2) There is a new insn attribute (num_insns) that gives the number of machine
instructions in an insn, rather than having rs6000_insn_cost try to figure
out on the fly how many instructions an insn contains from the insn length.

3) I moved a patch reformatting the code to be a separate patch.

4) I separated the predicates patches, putting the lwa_operand modification as
a separate patch, and the other new predicate functions are with the other
changes that use them.

5) I reworked stack_protect_setdi and stack_protect_testdi to once again allow
prefixed instructions.  I have added a test for this as a later patch.

6) I have slightly reworked the vector extract patches to fit in with the new
scheme.  This patch still has the restriction that it will not combine a vector
extract from memory if the vector is accessed through a PC-relative address and the
element number is variable.  This is due to not having two temporary registers
(one for the PC-relative address, and one for the variable offset).

7) I split up the miscellaneous tests into separate patches, including a
separate patch to add new effective targets for tests.

Note, on October 17th and 18th, I likely will have limited email access.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-16 Thread Wilco Dijkstra
Hi Christophe,

> I've noticed that your patch caused a regression:
> FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
> "internal loop alignment added" 1

That's just a testism - it only tests for loop alignment and doesn't
consider the possibility of the loop being jumped into like this:

.L17:
        adds    r0, r0, #1
        b       .L27
.L6:
        ldr     r4, [r2, #12]
        adds    r0, r0, #4
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        ldr     r4, [r2, #12]
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
.L27:
        ldr     r4, [r2, #12]
        cmp     ip, r0
        ldr     lr, [r1]
        str     lr, [r3, r4, lsl #2]
        bne     .L6
        pop     {r4, pc}

It seems minor changes in scheduling allow blocks to be commoned or not.
The underlying issue is that commoning like this should not be allowed on
blocks with different profile stats - particularly on loops, where it
inhibits scheduling of the loop itself.

Cheers,
Wilco

Re: [SVE] PR86753

2019-10-16 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
>  wrote:
>>
>> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
>>  wrote:
>> >
>> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
>> >  wrote:
>> > >
>> > > Leaving the main review to Richard, just some comments...
>> > >
>> > > Prathamesh Kulkarni  writes:
>> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
>> > > >
>> > > > When STMT_INFO is vectorized as a nested cycle, for_reduction is 
>> > > > true.
>> > > >
>> > > > +   For COND_EXPR if T comes from masked load, and is 
>> > > > conditional
>> > > > +   on C, we apply loop mask to result of vector comparison, if it's 
>> > > > present.
>> > > > +   Similarly for E, if it is conditional on !C.
>> > > > +
>> > > > Return true if STMT_INFO is vectorizable in this way.  */
>> > > >
>> > > >  bool
>> > >
>> > > I think this is a bit misleading.  But IMO it'd be better not to have
>> > > a comment here and just rely on the one in the main function body.
>> > > This optimisation isn't really changing the vectorisation strategy,
>> > > and the comment could easily get forgotten if things change in future.
>> > >
>> > > > [...]
>> > > > @@ -,6 +10006,35 @@ vectorizable_condition (stmt_vec_info 
>> > > > stmt_info, gimple_stmt_iterator *gsi,
>> > > >/* Handle cond expr.  */
>> > > >for (j = 0; j < ncopies; j++)
>> > > >  {
>> > > > +  tree loop_mask = NULL_TREE;
>> > > > +  bool swap_cond_operands = false;
>> > > > +
>> > > > +  /* Look up if there is a loop mask associated with the
>> > > > +  scalar cond, or it's inverse.  */
>> > >
>> > > Maybe:
>> > >
>> > >See whether another part of the vectorized code applies a loop
>> > >mask to the condition, or to its inverse.
>> > >
>> > > > +
>> > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>> > > > + {
>> > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
>> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > > + {
>> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
>> > > > vectype, j);
>> > > > + }
>> > > > +   else
>> > > > + {
>> > > > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
>> > > > +   cond.code = invert_tree_comparison (cond.code, honor_nans);
>> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>> > > > + {
>> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
>> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
>> > > > +   vectype, j);
>> > > > +   cond_code = cond.code;
>> > > > +   swap_cond_operands = true;
>> > > > + }
>> > > > + }
>> > > > + }
>> > > > +
>> > > >stmt_vec_info new_stmt_info = NULL;
>> > > >if (j == 0)
>> > > >   {
>> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info 
>> > > > stmt_info, gimple_stmt_iterator *gsi,
>> > > >   }
>> > > >   }
>> > > >   }
>> > > > +
>> > > > +   /* If loop mask is present, then AND it with
>> > >
>> > > Maybe "If we decided to apply a loop mask, ..."
>> > >
>> > > > +  result of vec comparison, so later passes (fre4)
>> > >
>> > > Probably better not to name the pass -- could easily change in future.
>> > >
>> > > > +  will reuse the same condition used in masked load.
>> > >
>> > > Could be a masked store, or potentially other things too.
>> > > So maybe just "will reuse the masked condition"?
>> > >
>> > > > +
>> > > > +  For example:
>> > > > +  for (int i = 0; i < 100; ++i)
>> > > > +x[i] = y[i] ? z[i] : 10;
>> > > > +
>> > > > +  results in following optimized GIMPLE:
>> > > > +
>> > > > +  mask__35.8_43 = vect__4.7_41 != { 0, ... };
>> > > > +  vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
>> > > > +  _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 
>> > > > 0B];
>> > > > +  vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
>> > > > +  vect_iftmp.12_52 = VEC_COND_EXPR <vec_mask_and_46,
>> > > > +vect_iftmp.11_47, { 10, ... }>;
>> > > > +
>> > > > +  instead of recomputing vec != { 0, ... } in vec_cond_expr  
>> > > > */
>> > >
>> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
>> > > is the main goal, whereas we could do that just by forcing a 
>> > > three-operand
>> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
>> > > and its masked form aren't both live at the same time.  So maybe:
>> > >
>> > >  instead of using a masked and unmasked forms of
>> > >  vect__4.7_41 != { 0, ... }

Re: Add expr_callee_abi

2019-10-16 Thread Richard Sandiford
Richard Biener  writes:
> On October 14, 2019 2:53:36 PM GMT+02:00, Richard Sandiford 
>  wrote:
>>Richard Biener  writes:
>>> On Fri, Oct 11, 2019 at 4:39 PM Richard Sandiford
>>>  wrote:

 This turned out to be useful for the SVE PCS support, and is a
>>natural
 tree-level analogue of insn_callee_abi.

 Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

 Richard


 2019-10-11  Richard Sandiford  

 gcc/
 * function-abi.h (expr_callee_abi): Declare.
 * function-abi.cc (expr_callee_abi): New function.

 Index: gcc/function-abi.h
 ===
 --- gcc/function-abi.h  2019-09-30 17:39:33.514597856 +0100
 +++ gcc/function-abi.h  2019-10-11 15:38:54.141605718 +0100
 @@ -315,5 +315,6 @@ call_clobbered_in_region_p (unsigned int
  extern const predefined_function_abi &fntype_abi (const_tree);
  extern function_abi fndecl_abi (const_tree);
  extern function_abi insn_callee_abi (const rtx_insn *);
 +extern function_abi expr_callee_abi (const_tree);

  #endif
 Index: gcc/function-abi.cc
 ===
 --- gcc/function-abi.cc 2019-09-30 17:39:33.514597856 +0100
 +++ gcc/function-abi.cc 2019-10-11 15:38:54.141605718 +0100
 @@ -229,3 +229,32 @@ insn_callee_abi (const rtx_insn *insn)

return default_function_abi;
  }
 +
 +/* Return the ABI of the function called by CALL_EXPR EXP.  Return
>>the
 +   default ABI for erroneous calls.  */
 +
 +function_abi
 +expr_callee_abi (const_tree exp)
 +{
 +  gcc_assert (TREE_CODE (exp) == CALL_EXPR);
 +
 +  if (tree fndecl = get_callee_fndecl (exp))
 +return fndecl_abi (fndecl);
>>>
>>> Please not.  The ABI in effect on the call is that of
>>> the type of CALL_EXPR_FN, what GIMPLE optimizers
>>> propagated as fndecl here doesn't matter.
>>
>>expr_callee_abi is returning the ABI of the callee function, ignoring
>>any additional effects involved in actually calling it.  It's supposed
>>to take advantage of local IPA information where possible, e.g. by
>>ignoring call-clobbered registers that the target doesn't actually
>>clobber.
>>
>>So if get_callee_fndecl returns either the correct FUNCTION_DECL or
>>null,
>>using it gives better information than just using the type.  Failing to
>>propagate the decl (or not having a decl to propagate) just means that
>>we don't take advantage of the IPA information.
>
> So we don't ever want to use this for stdcall vs. msabi or so, where the
> indirect call function type might be correctly attributed while a function
> decl is not?

Right.  This wouldn't be the right query for that case, since the
function is only telling us about the target.

 +
 +  tree callee = CALL_EXPR_FN (exp);
 +  if (callee == error_mark_node)
 +return default_function_abi;
 +
 +  tree type = TREE_TYPE (callee);
 +  if (type == error_mark_node)
 +return default_function_abi;
 +
 +  if (POINTER_TYPE_P (type))
 +{
 +  type = TREE_TYPE (type);
 +  if (type == error_mark_node)
 +   return default_function_abi;
 +}
>>>
>>> so when it's not a POINTER_TYPE (it always shold be!)
>>> then you're handing arbitrary types to fntype_abi.
>>
>>fntype_abi asserts that TYPE is an appropriate type.  There's no safe
>>value we can return if it isn't.  (Well, apart from the error_mark_node
>>cases above, where the assumption is that compilation is going to fail
>>anyway.)
>>
>>I can change it to assert for POINTER_TYPE_P if that's guaranteed
>>everywhere.
>
> That's probably better. 

OK, here's what I applied after retesting with the ACLE stuff
that uses it.

Thanks,
Richard


2019-10-16  Richard Sandiford  

gcc/
* function-abi.cc (expr_callee_abi): Assert for POINTER_TYPE_P.

Index: gcc/function-abi.cc
===
--- gcc/function-abi.cc 2019-10-14 09:05:57.142808555 +0100
+++ gcc/function-abi.cc 2019-10-16 11:58:35.362779932 +0100
@@ -249,12 +249,6 @@ expr_callee_abi (const_tree exp)
   if (type == error_mark_node)
 return default_function_abi;
 
-  if (POINTER_TYPE_P (type))
-{
-  type = TREE_TYPE (type);
-  if (type == error_mark_node)
-   return default_function_abi;
-}
-
-  return fntype_abi (type);
+  gcc_assert (POINTER_TYPE_P (type));
+  return fntype_abi (TREE_TYPE (type));
 }


[committed][AArch64] Add partial SVE vector modes

2019-10-16 Thread Richard Sandiford
This patch adds extra vector modes that represent a half, quarter or
eighth of what an SVE vector can hold.  This is useful for describing
the memory vector involved in an extending load or truncating store.
It might also be useful in future for representing "unpacked" SVE
registers, i.e. registers that contain values in the low bits of a
wider containing element.
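As a concrete source example (my illustration, not from the patch): in a widening copy loop like the one below, the memory vector of an SVE extending load holds fewer bytes than a full vector register, which is what the new partial modes describe. Plausibly, for int elements the byte load would use a quarter-of-a-vector mode such as VNx4QI before extending to VNx4SI, though the exact mode chosen depends on the target options.

```cpp
// Illustrative only: when vectorized for SVE, the byte load below reads a
// partial (quarter-width) vector of bytes and sign-extends each element to
// fill a full vector of ints.
void widen_copy (int *dst, const signed char *src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = src[i];   // extending load: signed char -> int
}
```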

The new modes could have the same width as an Advanced SIMD mode for
certain -msve-vector-bits=N options, so we need to ensure that they
come later in the mode list and that Advanced SIMD modes always "win".

Tested on aarch64-linux-gnu and aarch64_be-elf, applied as r277062.

Richard


2019-10-16  Richard Sandiford  

gcc/
* genmodes.c (mode_data::order): New field.
(blank_mode): Update accordingly.
(VECTOR_MODES_WITH_PREFIX): Add an order parameter.
(make_vector_modes): Likewise.
(VECTOR_MODES): Update use accordingly.
(cmp_modes): Sort by the new order field ahead of sorting by size.
* config/aarch64/aarch64-modes.def (VNx2QI, VN2xHI, VNx2SI)
(VNx4QI, VNx4HI, VNx8QI): New partial vector modes.
* config/aarch64/aarch64.c (VEC_PARTIAL): New flag value.
(aarch64_classify_vector_mode): Handle the new partial modes.
(aarch64_vl_bytes): New function.
(aarch64_hard_regno_nregs): Use it instead of BYTES_PER_SVE_VECTOR
when counting the number of registers in an SVE mode.
(aarch64_class_max_nregs): Likewise.
(aarch64_hard_regno_mode_ok): Don't allow partial vectors
in registers yet.
(aarch64_classify_address): Treat partial vectors analogously
to full vectors.
(aarch64_print_address_internal): Consolidate the printing of
MUL VL addresses, using aarch64_vl_bytes as the number of
bytes represented by "VL".
(aarch64_vector_mode_supported_p): Reject partial vector modes.

Index: gcc/genmodes.c
===
--- gcc/genmodes.c  2019-10-11 15:38:53.0 +0100
+++ gcc/genmodes.c  2019-10-16 11:52:42.541299208 +0100
@@ -53,6 +53,7 @@ struct mode_data
 
   const char *name;/* printable mode name -- SI, not SImode */
   enum mode_class cl;  /* this mode class */
+  unsigned int order;  /* top-level sorting order */
   unsigned int precision;  /* size in bits, equiv to TYPE_PRECISION */
   unsigned int bytesize;   /* storage size in addressable units */
   unsigned int ncomponents;/* number of subunits */
@@ -85,7 +86,7 @@ struct mode_data
 
 static const struct mode_data blank_mode = {
   0, "", MAX_MODE_CLASS,
-  -1U, -1U, -1U, -1U,
+  0, -1U, -1U, -1U, -1U,
   0, 0, 0, 0, 0, 0,
   "", 0, 0, 0, 0, false, false, 0
 };
@@ -484,14 +485,15 @@ make_complex_modes (enum mode_class cl,
 }
 }
 
-/* For all modes in class CL, construct vector modes of width
-   WIDTH, having as many components as necessary.  */
-#define VECTOR_MODES_WITH_PREFIX(PREFIX, C, W) \
-  make_vector_modes (MODE_##C, #PREFIX, W, __FILE__, __LINE__)
-#define VECTOR_MODES(C, W) VECTOR_MODES_WITH_PREFIX (V, C, W)
+/* For all modes in class CL, construct vector modes of width WIDTH,
+   having as many components as necessary.  ORDER is the sorting order
+   of the mode, with smaller numbers indicating a higher priority.  */
+#define VECTOR_MODES_WITH_PREFIX(PREFIX, C, W, ORDER) \
+  make_vector_modes (MODE_##C, #PREFIX, W, ORDER, __FILE__, __LINE__)
+#define VECTOR_MODES(C, W) VECTOR_MODES_WITH_PREFIX (V, C, W, 0)
 static void ATTRIBUTE_UNUSED
 make_vector_modes (enum mode_class cl, const char *prefix, unsigned int width,
-  const char *file, unsigned int line)
+  unsigned int order, const char *file, unsigned int line)
 {
   struct mode_data *m;
   struct mode_data *v;
@@ -530,6 +532,7 @@ make_vector_modes (enum mode_class cl, c
}
 
   v = new_mode (vclass, xstrdup (buf), file, line);
+  v->order = order;
   v->component = m;
   v->ncomponents = ncomponents;
 }
@@ -832,6 +835,11 @@ cmp_modes (const void *a, const void *b)
   const struct mode_data *const m = *(const struct mode_data *const*)a;
   const struct mode_data *const n = *(const struct mode_data *const*)b;
 
+  if (m->order > n->order)
+return 1;
+  else if (m->order < n->order)
+return -1;
+
   if (m->bytesize > n->bytesize)
 return 1;
   else if (m->bytesize < n->bytesize)
Index: gcc/config/aarch64/aarch64-modes.def
===
--- gcc/config/aarch64/aarch64-modes.def	2019-10-11 15:38:53.0 +0100
+++ gcc/config/aarch64/aarch64-modes.def	2019-10-16 11:52:42.537299236 +0100
@@ -82,8 +82,8 @@ INT_MODE (XI, 64);
strictly necessary to set the alignment here, since the default would
be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
 #define SVE_MODES(NVECS, VB, VH, VS, VD) \
-  VECTOR_MODES

[committed][AArch64] Improve poly_int handling in aarch64_layout_frame

2019-10-16 Thread Richard Sandiford
I'd used known_lt when converting these conditions to poly_int,
but on reflection that was a bad choice.  The code isn't just
doing a range check; it specifically needs constants that will
fit in a certain encoding.
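The distinction can be sketched with a toy model (entirely illustrative; the real class lives in GCC's poly-int.h): a length-polynomial value c0 + c1 * VL can be provably below a bound for every possible vector length without being a single compile-time constant, and only the latter can be placed in a fixed instruction encoding.

```cpp
// Toy model of a value c0 + c1 * VL, where the runtime vector length VL
// is known to lie in [1, 4] units.  Illustrative only.
struct poly { long c0, c1; };

// True if the value is below N for every possible VL (assumes c1 >= 0).
bool known_lt (poly p, long n) { return p.c0 + p.c1 * 4 < n; }

// True only when the value is a single compile-time constant.
bool is_constant (poly p, long *out)
{
  if (p.c1 != 0)
    return false;
  *out = p.c0;
  return true;
}
// poly{16, 16} satisfies known_lt (p, 512), yet has no single constant
// value, so it cannot be used where an encoding needs one constant.
```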

Tested on aarch64-linux-gnu and aarch64_be-elf, applied as r277061.

Richard


2019-10-16  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_layout_frame): Use is_constant
rather than known_lt when choosing frame layouts.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-10-16 11:47:34.727496649 +0100
+++ gcc/config/aarch64/aarch64.c2019-10-16 11:50:14.386356922 +0100
@@ -5447,7 +5447,7 @@ #define SLOT_REQUIRED (-1)
   else if (frame.wb_candidate1 != INVALID_REGNUM)
 max_push_offset = 256;
 
-  HOST_WIDE_INT const_size, const_fp_offset;
+  HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
   if (frame.frame_size.is_constant (&const_size)
   && const_size < max_push_offset
   && known_eq (crtl->outgoing_args_size, 0))
@@ -5457,16 +5457,18 @@ #define SLOT_REQUIRED (-1)
 stp reg3, reg4, [sp, 16]  */
   frame.callee_adjust = const_size;
 }
-  else if (known_lt (crtl->outgoing_args_size + frame.saved_regs_size, 512)
+  else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
+  && const_outgoing_args_size + frame.saved_regs_size < 512
   && !(cfun->calls_alloca
-   && known_lt (frame.hard_fp_offset, max_push_offset)))
+   && frame.hard_fp_offset.is_constant (&const_fp_offset)
+   && const_fp_offset < max_push_offset))
 {
   /* Frame with small outgoing arguments:
 sub sp, sp, frame_size
 stp reg1, reg2, [sp, outgoing_args_size]
 stp reg3, reg4, [sp, outgoing_args_size + 16]  */
   frame.initial_adjust = frame.frame_size;
-  frame.callee_offset = frame.frame_size - frame.hard_fp_offset;
+  frame.callee_offset = const_outgoing_args_size;
 }
   else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
   && const_fp_offset < max_push_offset)


[committed][AArch64] Add an assert to aarch64_layout_frame

2019-10-16 Thread Richard Sandiford
This patch adds an assert that all the individual *_adjust allocations
add up to the full frame size.  With that safety net, it seemed slightly
clearer to use crtl->outgoing_args_size as the final adjustment where
appropriate, to match what's used in the comments.

This is a bit overkill on its own, but I need to add more cases for SVE.
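Stated abstractly (a sketch, not GCC code), the asserted invariant is that the prologue's separate stack adjustments partition the whole frame:

```cpp
// Sketch of the invariant: the three adjustments must sum to the frame size.
struct frame_layout
{
  long initial_adjust, callee_adjust, final_adjust, frame_size;
};

bool layout_consistent (const frame_layout &f)
{
  return f.initial_adjust + f.callee_adjust + f.final_adjust == f.frame_size;
}
```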

Tested on aarch64-linux-gnu and aarch64_be-elf, applied as r277060.

Richard


2019-10-16  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_layout_frame): Assert
that all the adjustments add up to the full frame size.
Use crtl->outgoing_args_size directly as the final adjustment
where appropriate.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-10-16 11:44:27.776831039 +0100
+++ gcc/config/aarch64/aarch64.c2019-10-16 11:47:06.023701540 +0100
@@ -5476,7 +5476,7 @@ #define SLOT_REQUIRED (-1)
 stp reg3, reg4, [sp, 16]
 sub sp, sp, outgoing_args_size  */
   frame.callee_adjust = const_fp_offset;
-  frame.final_adjust = frame.frame_size - frame.callee_adjust;
+  frame.final_adjust = crtl->outgoing_args_size;
 }
   else
 {
@@ -5487,9 +5487,14 @@ #define SLOT_REQUIRED (-1)
 stp reg3, reg4, [sp, 16]
 sub sp, sp, outgoing_args_size  */
   frame.initial_adjust = frame.hard_fp_offset;
-  frame.final_adjust = frame.frame_size - frame.initial_adjust;
+  frame.final_adjust = crtl->outgoing_args_size;
 }
 
+  /* Make sure the individual adjustments add up to the full frame size.  */
+  gcc_assert (known_eq (frame.initial_adjust
+   + frame.callee_adjust
+   + frame.final_adjust, frame.frame_size));
+
   frame.laid_out = true;
 }
 


[committed][AArch64] Use frame reference in aarch64_layout_frame

2019-10-16 Thread Richard Sandiford
Using the full path "cfun->machine->frame" in aarch64_layout_frame
led to awkward formatting in some follow-on patches, so it seemed
worth using a local reference instead.
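The idiom itself, in isolation (types and field names here are simplified stand-ins for the real aarch64_frame):

```cpp
// Hypothetical illustration: bind a reference once instead of spelling
// out the full access path on every line.
struct aarch64_frame_model { unsigned wb_candidate1, wb_candidate2; };
struct machine_model { aarch64_frame_model frame; };

void layout (machine_model *machine)
{
  aarch64_frame_model &frame = machine->frame;  // one binding...
  frame.wb_candidate1 = 29;                     // ...then short accesses
  frame.wb_candidate2 = 30;
}
```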

Tested on aarch64-linux-gnu and aarch64_be-elf, applied as r277059.

Richard


2019-10-16  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_layout_frame): Use a local
"frame" reference instead of always referring directly to
"cfun->machine->frame".

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-10-11 15:41:20.0 +0100
+++ gcc/config/aarch64/aarch64.c2019-10-16 11:39:45.870842844 +0100
@@ -5321,8 +5321,9 @@ aarch64_layout_frame (void)
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;
   bool simd_function = (crtl->abi->id () == ARM_PCS_SIMD);
+  aarch64_frame &frame = cfun->machine->frame;
 
-  cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();
+  frame.emit_frame_chain = aarch64_needs_frame_chain ();
 
   /* Adjust the outgoing arguments size if required.  Keep it in sync with what
  the mid-end is doing.  */
@@ -5331,21 +5332,20 @@ aarch64_layout_frame (void)
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
 
-  cfun->machine->frame.wb_candidate1 = INVALID_REGNUM;
-  cfun->machine->frame.wb_candidate2 = INVALID_REGNUM;
+  frame.wb_candidate1 = INVALID_REGNUM;
+  frame.wb_candidate2 = INVALID_REGNUM;
 
   /* First mark all the registers that really need to be saved...  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
-cfun->machine->frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
+frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
-cfun->machine->frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
+frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
 
   /* ... that includes the eh data registers (if needed)...  */
   if (crtl->calls_eh_return)
 for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; regno++)
-  cfun->machine->frame.reg_offset[EH_RETURN_DATA_REGNO (regno)]
-   = SLOT_REQUIRED;
+  frame.reg_offset[EH_RETURN_DATA_REGNO (regno)] = SLOT_REQUIRED;
 
   /* ... and any callee saved register that dataflow says is live.  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
@@ -5353,41 +5353,40 @@ #define SLOT_REQUIRED (-1)
&& !fixed_regs[regno]
&& (regno == R30_REGNUM
|| !crtl->abi->clobbers_full_reg_p (regno)))
-  cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+  frame.reg_offset[regno] = SLOT_REQUIRED;
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
&& !fixed_regs[regno]
&& !crtl->abi->clobbers_full_reg_p (regno))
   {
-   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+   frame.reg_offset[regno] = SLOT_REQUIRED;
last_fp_reg = regno;
   }
 
-  if (cfun->machine->frame.emit_frame_chain)
+  if (frame.emit_frame_chain)
 {
   /* FP and LR are placed in the linkage record.  */
-  cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
-  cfun->machine->frame.wb_candidate1 = R29_REGNUM;
-  cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
-  cfun->machine->frame.wb_candidate2 = R30_REGNUM;
+  frame.reg_offset[R29_REGNUM] = 0;
+  frame.wb_candidate1 = R29_REGNUM;
+  frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
+  frame.wb_candidate2 = R30_REGNUM;
   offset = 2 * UNITS_PER_WORD;
 }
 
   /* With stack-clash, LR must be saved in non-leaf functions.  */
   gcc_assert (crtl->is_leaf
- || (cfun->machine->frame.reg_offset[R30_REGNUM]
- != SLOT_NOT_REQUIRED));
+ || frame.reg_offset[R30_REGNUM] != SLOT_NOT_REQUIRED);
 
   /* Now assign stack slots for them.  */
   for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
-if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
+if (frame.reg_offset[regno] == SLOT_REQUIRED)
   {
-   cfun->machine->frame.reg_offset[regno] = offset;
-   if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
- cfun->machine->frame.wb_candidate1 = regno;
-   else if (cfun->machine->frame.wb_candidate2 == INVALID_REGNUM)
- cfun->machine->frame.wb_candidate2 = regno;
+   frame.reg_offset[regno] = offset;
+   if (frame.wb_candidate1 == INVALID_REGNUM)
+ frame.wb_candidate1 = regno;
+   else if (frame.wb_candidate2 == INVALID_REGNUM)
+ frame.wb_candidate2 = regno;
offset += UNITS_PER_WORD;
   }
 
@@ -5396,7 +5395,7 @@ #define SLOT_REQUIRED (-1)
   bool has_align_gap = offset != max_int_offset;
 
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
-if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
+if (frame.reg_offset[regno] == SLOT_REQUIRED)
   {
/* If th

Re: [PATCH] Use __is_same_as for std::is_same and std::is_same_v

2019-10-16 Thread Jonathan Wakely

On 16/10/19 10:43 +0100, Jonathan Wakely wrote:

On 16/10/19 10:42 +0100, Jonathan Wakely wrote:

On 12/10/19 18:15 +0200, Romain Geissler wrote:

Le sam. 12 oct. 2019 à 17:44, Romain Geissler
 a écrit :


It looks like this creates the following error when I try to bootstrap
clang 9.0.0 using the latest gcc and libstdc++ from trunk. Note that
with g++, there is no problem, however it looks like clang++ has some
problem with the new header. I don't know if this is an issue on
libstdc++ side or clang++ side.


__is_same_as is not implemented as a compiler builtin in clang++


Sigh, I guess that explains why we weren't using it already.


unlike g++, thus the error on clang 9.0.0 side. It looks like current
clang trunk doesn't define __is_same_as either. Until clang implements
this support (if it ever will), usage of __is_same_as in libstdc++
shall be conditional, only when __has_builtin(__is_same_as) is true
(however that would require that gcc implements __has_builtin, as
implemented here:
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00062.html )


No, we don't need GCC to support __has_builtin, because we know GCC
supports __is_same_as so there's no need to check for it.

So something like this would work fine:

#if __GNUC__ >= 10
# define _GLIBCXX_HAVE__IS_SAME_AS
#elif defined(__has_builtin)
# if __has_builtin(__is_same_as)
#  define _GLIBCXX_HAVE__IS_SAME_AS
# endif
#endif


It looks like Clang provides __is_same instead.

I'll commit a fix today.


This is what I've committed to trunk, after testing on x86_64-linux,
and checking it with Clang.

Thanks for testing and reporting the problem.
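For reference, the restored non-builtin fallback is the classic partial-specialization definition; a standalone sketch (renamed to avoid colliding with std::is_same):

```cpp
#include <type_traits>  // only for true_type/false_type, as in the header

// Standalone version of the fallback: the partial specialization matches
// exactly when both template arguments are the same type.
template<typename T, typename U>
struct my_is_same : std::false_type { };

template<typename T>
struct my_is_same<T, T> : std::true_type { };
```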

commit 98afe2e1a9ee322885ce526f3df6ee612566da96
Author: Jonathan Wakely 
Date:   Wed Oct 16 11:00:17 2019 +0100

Only use GCC-specific __is_same_as built-in conditionally

Clang doesn't support __is_same_as but provides __is_same instead.
Restore the original implementation (pre r276891) when neither of those
built-ins is available.

* include/bits/c++config (_GLIBCXX_BUILTIN_IS_SAME_AS): Define to
one of __is_same_as or __is_same when available.
* include/std/concepts (__detail::__same_as): Use std::is_same_v.
* include/std/type_traits (is_same) [_GLIBCXX_BUILTIN_IS_SAME_AS]:
Use new macro instead of __is_same_as.
(is_same) [!_GLIBCXX_BUILTIN_IS_SAME_AS]: Restore partial
specialization.
(is_same_v) [_GLIBCXX_BUILTIN_IS_SAME_AS]: Use new macro.
(is_same_v) [!_GLIBCXX_BUILTIN_IS_SAME_AS]: Use std::is_same.

diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index c8e099aaadd..32db60f39e5 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -633,6 +633,7 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_HAS_UNIQ_OBJ_REP 1
 # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
 # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
+# define _GLIBCXX_BUILTIN_IS_SAME_AS(T, U) __is_same_as(T, U)
 # if __GNUC__ >= 9
 #  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
 # endif
@@ -650,6 +651,9 @@ namespace std
 # if __has_builtin(__builtin_is_constant_evaluated)
 #  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
 # endif
+# if ! __is_identifier(__is_same)
+#  define _GLIBCXX_BUILTIN_IS_SAME_AS(T, U) __is_same(T, U)
+# endif
 #endif // GCC
 
 // PSTL configuration
diff --git a/libstdc++-v3/include/std/concepts b/libstdc++-v3/include/std/concepts
index 6b1150a32d2..6d740eb663c 100644
--- a/libstdc++-v3/include/std/concepts
+++ b/libstdc++-v3/include/std/concepts
@@ -23,7 +23,7 @@
 // .
 
 /** @file include/concepts
- *  This is __a Standard C++ Library header.
+ *  This is a Standard C++ Library header.
  *  @ingroup concepts
  */
 
@@ -54,7 +54,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 template
-  concept __same_as = __is_same_as(_Tp, _Up);
+  concept __same_as = std::is_same_v<_Tp, _Up>;
   } // namespace __detail
 
   /// [concept.same], concept same_as
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 4de5daa9f06..8e787a994c3 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1390,9 +1390,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// is_same
   template
 struct is_same
-: public integral_constant
+#ifdef _GLIBCXX_BUILTIN_IS_SAME_AS
+: public integral_constant
+#else
+: public false_type
+#endif
 { };
 
+#ifndef _GLIBCXX_BUILTIN_IS_SAME_AS
+  template
+struct is_same<_Tp, _Tp>
+: public true_type
+{ };
+#endif
+
   /// is_base_of
   template
 struct is_base_of
@@ -3154,8 +3165,13 @@ template 
   inline constexpr size_t rank_v = rank<_Tp>::value;
 template 
   inline constexpr size_t extent_v = extent<_Tp, _Idx>::value;
+#ifdef _GLIBCXX_BUILTIN_IS_SAME_AS
 template 
-  inline constexpr bool is_same_v = __is_same_as(_Tp, _Up);
+  inline c

[PATCH] Fix -fdebug-types-section ICE, PR91887

2019-10-16 Thread Richard Biener


The following makes sure we correctly identify a parm DIE created
early in a formal parameter pack during late annotation.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

OK?

Thanks,
Richard.

2019-10-16  Richard Biener  

PR debug/91887
* dwarf2out.c (gen_formal_parameter_die): Also try to match
context_die against a DW_TAG_GNU_formal_parameter_pack parent.

* g++.dg/debug/dwarf2/pr91887.C: New testcase.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 277053)
+++ gcc/dwarf2out.c (working copy)
@@ -22284,19 +22284,18 @@ gen_formal_parameter_die (tree node, tre
   /* If the contexts differ, we may not be talking about the same
 thing.
 ???  When in LTO the DIE parent is the "abstract" copy and the
-context_die is the specification "copy".  But this whole block
-should eventually be no longer needed.  */
-  if (parm_die && parm_die->die_parent != context_die && !in_lto_p)
+context_die is the specification "copy".  */
+  if (parm_die
+ && parm_die->die_parent != context_die
+ && (parm_die->die_parent->die_tag != DW_TAG_GNU_formal_parameter_pack
+ || parm_die->die_parent->die_parent != context_die)
+ && !in_lto_p)
{
- if (!DECL_ABSTRACT_P (node))
-   {
- /* This can happen when creating an inlined instance, in
-which case we need to create a new DIE that will get
-annotated with DW_AT_abstract_origin.  */
- parm_die = NULL;
-   }
- else
-   gcc_unreachable ();
+ gcc_assert (!DECL_ABSTRACT_P (node));
+ /* This can happen when creating a concrete instance, in
+which case we need to create a new DIE that will get
+annotated with DW_AT_abstract_origin.  */
+ parm_die = NULL;
}
 
   if (parm_die && parm_die->die_parent == NULL)
Index: gcc/testsuite/g++.dg/debug/dwarf2/pr91887.C
===
--- gcc/testsuite/g++.dg/debug/dwarf2/pr91887.C (nonexistent)
+++ gcc/testsuite/g++.dg/debug/dwarf2/pr91887.C (working copy)
@@ -0,0 +1,12 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-g -fdebug-types-section" }
+class A {
+public:
+  A();
+  template  A(U);
+};
+template  struct B { typedef A type; };
+template 
+int Bind(R(Args...), typename B::type...) { return 0; }
+void KeepBufferRefs(A, A) { A a, b(Bind(KeepBufferRefs, a, b)); }


[PATCH] Fix PR92119

2019-10-16 Thread Richard Biener


Committed as obvious.

Richard.

2019-10-16  Richard Biener  

PR tree-optimization/92119
* tree-vect-patterns.c (vect_recog_rotate_pattern): Guard
against missing bswap lhs.

Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 277053)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -2199,7 +2199,8 @@ vect_recog_rotate_pattern (stmt_vec_info
   lhs = gimple_call_lhs (last_stmt);
   oprnd0 = gimple_call_arg (last_stmt, 0);
   type = TREE_TYPE (oprnd0);
-  if (TYPE_PRECISION (TREE_TYPE (lhs)) != 16
+  if (!lhs
+ || TYPE_PRECISION (TREE_TYPE (lhs)) != 16
  || TYPE_PRECISION (type) <= 16
  || TREE_CODE (oprnd0) != SSA_NAME
  || BITS_PER_UNIT != 8


Re: [PATCH V2] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-16 Thread Segher Boessenkool
Hi!

On Wed, Oct 16, 2019 at 04:50:21PM +0800, Jiufu Guo wrote:
> In PR70010, a function is marked with target(no-vsx) to disable VSX code
> generation.  To avoid VSX code generation, this function should not be
> inlined into a VSX function.
> 
> In the previous implementation, a no-vsx target was treated as a subset
> of a vsx target, even when the user had set the no-vsx attribute.
> 
> As discussed in previous mails, to fix the bug, in the current logic when
> checking whether the caller's ISA flags support the callee's ISA flags, we
> just need to add a test that enforces that the caller's ISA flags match
> exactly the callee's flags, for those flags that were explicitly set in the
> callee.  If the caller has no target attribute, the options from the command
> line are used.  Here is a patch updated with comments from previous mails.

> gcc/
> 2019-10-12  Peter Bergner 
>   Jiufu Guo  
> 
>   PR target/70010
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Prohibit inlining if
>   the callee explicitly disables some isa_flags the caller is using.  

No trailing spaces please.

> +  /* If the caller has option attributes, then use them.
> +  Otherwise, use the command line options.  */
> +  if (caller_tree)
> + caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
> +  else
> + caller_isa = rs6000_isa_flags;

Okay, it's very clear with that comment -- or it *seems* easy, anyway :-)

> +  /* The callee's options must be a subset of the caller's options, i.e.
> +  a vsx function may inline an altivec function, but a non-vsx function
> +  must not inline a vsx function.  However, for those options that the

no-vsx instead of non-vsx?  Or make it clear even more explicitly that this is
about having something explicitly disabled?  (The code is clear though).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -flto -mvsx" } */
> +
> +vector int c, a, b;
> +
> +static inline void __attribute__ ((__always_inline__, target ("no-vsx")))
> +foo () /* { dg-error "inlining failed in call to .* target specific option 
> mismatch" } */

.* can match across lines.  Not a huge deal here of course -- but maybe
adding  (?n)  to the regexp works?  (Just at the very start of it).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-2.c

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-3.c 
> b/gcc/testsuite/gcc.target/powerpc/pr70010-3.c
> new file mode 100644

Do you want to test anything in those two new testcases?  Other than "it
compiles"?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010-4.c

> +foo () /* { dg-error "inlining failed in call to .* target specific option 
> mismatch" } */

(.* again)

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010.c 
> b/gcc/testsuite/gcc.target/powerpc/pr70010.c
> new file mode 100644
> index 000..affd0c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr70010.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */

This line isn't necessary anymore I think?  Or if it is, it is needed in
all these new testcases.

> +/* { dg-options "-O2 -finline-functions" } */
> +/* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */

Okay for trunk.  Thanks to both of you!

Also okay for 9 and 8, after waiting a week to see if there is fallout.


Segher



Re: [PATCH] Fix PR91975, tame PRE some more

2019-10-16 Thread Richard Biener
On Mon, 7 Oct 2019, Richard Biener wrote:

> 
> The following tries to address the issue that PRE is quite happy
> to introduce new IVs in loops just because it can compute some
> constant value on the loop entry edge.  In principle there's
> already code that should work against that but it simply searches
> for a optimize_edge_for_speed () edge.  That still considers
> the loop entry edge to be worth optimizing because it ends
> up as maybe_hot_edge_p (e) for -O2 which compares the edge count
> against the entry block count.  For PRE we want something more
> local (comparing to the destination block count).
> 
> Now for the simple testcases this shouldn't make a difference
> but hot/cold uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) which
> isn't the same as profile_probabilities likely or very likely...
> 
> Still one key of the patch is that we compare the sum of the
> edge counts where the value is available (and thus the redundancy
> elimination happens) with the count we have to insert rather
> than looking for a single optimize_edge_for_speed_p edge.
> 
> For that I've used
> 
> if (avail_count < block->count.apply_probability
> (profile_probability::unlikely ()))
> 
> so we block insertion if the redundancies would overall be "unlikely".
> 
> I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
> while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
> default value) that seems to match better for scaling a profile-count?
> 
> Honza?

Honza - does this sound sensible?  Can you comment on the
maybe_hot_count_p param use?

I wanted to avoid specifically looking to avoid inserting on
backedges but instead improve the optimize_edge_for_speed
heuristic.

> Bootstrap & regtest running on x86-64-unknown-linux-gnu.
> 
> Does the above predicate look sane or am I on a wrong track with
> using the destination block count here (I realize even the "locally cold"
> entries into the block might be quite hot globally).
> 
> For a 1:1 translation of the existing code to sth using the
> original predicate but summing over edges I could use
> !maybe_hot_count_p (cfun, avail_count)?  But then we're back to
> PRE doing the unwanted insertions.  Changing maybe_hot_count_p
> to use HOT_BB_COUNT_FRACTION doesn't make any difference there
> (obviously).
> 
> Thanks,
> Richard.
> 
> 2019-10-06  Richard Biener  
> 
>   PR tree-optimization/91975
>   * tree-ssa-pre.c (do_pre_regular_insertion): Adjust
>   profitability check to use the sum of all edge counts the
>   value is available on and check against unlikely execution
>   of the block.
> 
>   * gcc.dg/tree-ssa/ldist-39.c: New testcase.
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> new file mode 100644
> index 000..a63548979ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ldist-details" } */
> +
> +#define T int
> +
> +const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +T b[sizeof a / sizeof *a];
> +
> +void f0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +d[i] = s[i];
> +}
> +
> +void g0 (void)
> +{
> +  const T *s = a;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +*d++ = *s++;
> +}
> +
> +extern const T c[sizeof a / sizeof *a];
> +
> +void f1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +d[i] = s[i];
> +}
> +
> +void g1 (void)
> +{
> +  const T *s = c;
> +  T *d = b;
> +  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
> +*d++ = *s++;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
> diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
> index c618601a184..af49ba388c1 100644
> --- a/gcc/tree-ssa-pre.c
> +++ b/gcc/tree-ssa-pre.c
> @@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, 
> basic_block dom)
> pre_expr eprime = NULL;
> edge_iterator ei;
> pre_expr edoubleprime = NULL;
> -   bool do_insertion = false;
> +   profile_count avail_count = profile_count::zero ();
>  
> val = get_expr_value_id (expr);
> if (bitmap_set_contains_value (PHI_GEN (block), val))
> @@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, 
> basic_block dom)
>   {
> avail[pred->dest_idx] = edoubleprime;
> by_some = true;
> -   /* We want to perform insertions to remove a redundancy on
> -  a path in the CFG we want to optimize for speed.  */
> -   if (optimize_edge_for_speed_p (pred))
> - do_insertion = true;
> +   avail_count += pred->count ();
> if (first_s == NULL)
>   first_s = edoubleprime;
> els

Re: [PATCH 2/4] Use gomp_map_val for OpenACC host-to-device address translation

2019-10-16 Thread Thomas Schwinge
Hi!

On 2019-10-06T15:32:35-0700, Julian Brown  wrote:
> This patch uses gomp_map_val for OpenACC host-to-device address
> translation instead of open-coding the device address calculation.

(As has been discussed before.)

(And then, also see  "'GOACC_parallel_keyed'
should use 'GOMP_MAP_VARS_TARGET'" for additional simplification.)

> OK for trunk?

Thanks, that's OK, with one item addressed (see below).  To record the
review effort please include "Reviewed-by: Thomas Schwinge
" in the commit log, see
.

>   libgomp/
>   * libgomp.h (gomp_map_val): Add prototype.
>   * oacc-parallel.c (GOACC_parallel_keyed): Use gomp_map_val instead of
>   open-coding device-address calculation.
>   * target.c (gomp_map_val): Make global.
> ---
>  libgomp/libgomp.h   | 1 +
>  libgomp/oacc-parallel.c | 8 ++--
>  libgomp/target.c| 2 +-
>  3 files changed, 4 insertions(+), 7 deletions(-)

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -1095,6 +1095,7 @@ extern void gomp_copy_host2dev (struct 
> gomp_device_descr *,
>  extern void gomp_copy_dev2host (struct gomp_device_descr *,
>   struct goacc_asyncqueue *, void *, const void *,
>   size_t);
> +extern uintptr_t gomp_map_val (struct target_mem_desc *, void **, size_t);
>  
>  #ifdef RC_CHECKING
>  extern void dump_tgt (const char *, struct target_mem_desc *);
> diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
> index 18feac5f31c..02472c56270 100644
> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -350,12 +350,8 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
>  
>devaddrs = gomp_alloca (sizeof (void *) * mapnum);
>for (i = 0; i < mapnum; i++)
> -if (tgt->list[i].key != NULL)
> -  devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
> -   + tgt->list[i].key->tgt_offset
> -   + tgt->list[i].offset);
> -else
> -  devaddrs[i] = NULL;
> +devaddrs[i] = (void *) gomp_map_val (tgt, hostaddrs, i);
> +
>if (aq == NULL)
>  acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs, dims,
>   tgt);
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 29cb7ca8348..aa7a1df1e46 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -670,7 +670,7 @@ gomp_map_fields_existing (struct target_mem_desc *tgt,
> (void *) cur_node.host_end);
>  }
>  
> -static inline uintptr_t
> +uintptr_t
>  gomp_map_val (struct target_mem_desc *tgt, void **hostaddrs, size_t i)

In  I asked
whether that "Should be 'attribute_hidden'"?


Grüße
 Thomas




Re: [PATCH] OpenACC reference count consistency checking

2019-10-16 Thread Thomas Schwinge
Hi!

On 2019-10-03T09:35:05-0700, Julian Brown  wrote:
> This patch provides self-checking for consistency of the OpenACC
> reference-counting implementation in libgomp.

Earlier submissions included description what exactly this is doing,
.
Will be good to get that into the libgomp manual -- or into the code as
comments, given this is an internal-only thing?

> Tested alongside (and
> dependent on) the patch posted adjacent to this one that overhauls said
> reference-counting implementation.

(I'm reviewing that one.)

> Tested (with RC_CHECKING enabled) with offloading to NVPTX. OK for trunk?

So, for this patch, we'll first need to agree on what approach we
generally want to take.  (Jakub?)

Adding "dead" code ('#ifdef RC_CHECKING', off by default) is not a good
thing for maintenance, might easily become out of date, etc.  On the
other hand, I do understand that the functionality is useful --
supposedly not just for us developers, but also for users, to detect
wrong OpenACC data/region directives, API calls, etc.?

In a different context (OpenMP 5 OMPT), the option was mentioned to have
"a variant library that is a replacement for libgomp if more detailed
instrumentation is needed".  (That shouldn't be difficult to implement
given the Automake build system.)  Is that an option for this thing here,
too?  That is, an ABI-compatible "debugging" libgomp, with more run-time
consistency checking etc., with this one here being a start, and more can
be added later?


> 2019-10-02  Julian Brown  
>
>   libgomp/
>   * libgomp.h (RC_CHECKING): New macro, disabled by default, guarding all
>   hunks in this patch.
>   (target_mem_desc): Add refcount_chk, mark fields.
>   (splay_tree_key_s): Add refcount_chk field.
>   (dump_tgt, gomp_rc_check): Add prototypes.
>   * oacc-parallel.c (GOACC_parallel_keyed): Add refcount self-check code.
>   (GOACC_data_start, GOACC_data_end, GOACC_enter_exit_data): Likewise.
>   * target.c (stdio.h): Include.
>   (dump_tgt, rc_check_clear, rc_check_count, rc_check_verify,
>   gomp_rc_check): New functions to consistency-check reference counts.
> ---
>  libgomp/libgomp.h   |  18 
>  libgomp/oacc-parallel.c |  33 
>  libgomp/target.c| 178 
>  3 files changed, 229 insertions(+)

Just to make sure: earlier versions submitted (see link above) included
more changes, for example to "async" code, a new 'struct async_tgt_use'
etc., which is no longer present here.  Is that because that's no longer
needed given the "recent" "Async-rework"?

> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 6b7ed7248a1..19553a37c13 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -873,9 +873,17 @@ struct target_var_desc {
>uintptr_t length;
>  };
>  
> +/* Uncomment to enable reference-count consistency checking (for development
> +   use only).  */
> +//#define RC_CHECKING 1
> +
>  struct target_mem_desc {
>/* Reference count.  */
>uintptr_t refcount;
> +#ifdef RC_CHECKING
> +  uintptr_t refcount_chk;
> +  bool mark;
> +#endif
>/* All the splay nodes allocated together.  */
>splay_tree_node array;
>/* Start of the target region.  */
> @@ -930,6 +938,10 @@ struct splay_tree_key_s {
>   If set to VREFCOUNT_LINK_KEY (for OpenMP, where this field is not 
> otherwise
>   needed), the union below represents a link key.  */
>uintptr_t virtual_refcount;
> +#ifdef RC_CHECKING
> +  /* The recalculated reference count, for verification.  */
> +  uintptr_t refcount_chk;
> +#endif
>union {
>  /* Pointer to the original mapping of "omp declare target link" object.
> Only used for OpenMP.  */
> @@ -1084,6 +1096,12 @@ extern void gomp_copy_dev2host (struct 
> gomp_device_descr *,
>   struct goacc_asyncqueue *, void *, const void *,
>   size_t);
>  
> +#ifdef RC_CHECKING
> +extern void dump_tgt (const char *, struct target_mem_desc *);
> +extern void gomp_rc_check (struct gomp_device_descr *,
> +struct target_mem_desc *);
> +#endif
> +
>  extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
> size_t, void **, void **,
> size_t *, void *, bool,
> diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
> index 7e72d9c6b24..18feac5f31c 100644
> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -339,6 +339,15 @@ GOACC_parallel_keyed (int flags_m, void (*fn) (void *),
>   &api_info);
>  }
>
> +#ifdef RC_CHECKING
> +  gomp_mutex_lock (&acc_dev->lock);
> +  assert (tgt);
> +  dump_tgt (__FUNCTION__, tgt);
> +  tgt->prev = thr->mapped_data;
> +  gomp_rc_check (acc_dev, tgt);
> +  gomp_mutex_unlock (&acc_dev->lock);
> +#endif

[PATCH V2] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-16 Thread Jiufu Guo


Hi,

In PR70010, a function is marked with target(no-vsx) to disable VSX code
generation.  To keep it free of VSX code, such a function must not be
inlined into a VSX function.

In the previous implementation, a non-vsx target was treated as a subset
of a vsx target, even when the user explicitly set the no-vsx attribute.

As discussed in previous mails, to fix the bug we just need to add a test
to the logic that checks whether the caller's ISA flags support the
callee's: the caller's flags must match the callee's exactly for those
flags that were explicitly set in the callee.  If the caller has no
target attribute, the command-line options are used instead.  Here is a
patch updated with the comments from the previous mails.

Bootstrapped/regtested on ppc64le-linux, ok for trunk, and backport to 9?

Jiufu
BR


gcc/
2019-10-12  Peter Bergner 
Jiufu Guo  

PR target/70010
* config/rs6000/rs6000.c (rs6000_can_inline_p): Prohibit inlining if
the callee explicitly disables some isa_flags the caller is using.  

gcc.testsuite/
2019-10-12  Peter Bergner 
Jiufu Guo  

PR target/70010
* gcc.target/powerpc/pr70010.c: New test.  
* gcc.target/powerpc/pr70010-1.c: New test.  
* gcc.target/powerpc/pr70010-2.c: New test.  
* gcc.target/powerpc/pr70010-3.c: New test.  
* gcc.target/powerpc/pr70010-4.c: New test.  


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d1434a9..26985d3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23964,25 +23964,31 @@ rs6000_can_inline_p (tree caller, tree callee)
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If callee has no option attributes, then it is ok to inline.  */
+  /* If the callee has no option attributes, then it is ok to inline.  */
   if (!callee_tree)
 ret = true;
 
-  /* If caller has no option attributes, but callee does then it is not ok to
- inline.  */
-  else if (!caller_tree)
-ret = false;
-
   else
 {
-  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  HOST_WIDE_INT caller_isa;
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
 
-  /* Callee's options should a subset of the caller's, i.e. a vsx function
-can inline an altivec function but a non-vsx function can't inline a
-vsx function.  */
-  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
- == callee_opts->x_rs6000_isa_flags)
+  /* If the caller has option attributes, then use them.
+Otherwise, use the command line options.  */
+  if (caller_tree)
+   caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
+  else
+   caller_isa = rs6000_isa_flags;
+
+  /* The callee's options must be a subset of the caller's options, i.e.
+a vsx function may inline an altivec function, but a non-vsx function
+must not inline a vsx function.  However, for those options that the
+callee has explicitly set, then we must enforce that the callee's
+and caller's options match exactly; see PR70010.  */
+  if (((caller_isa & callee_isa) == callee_isa)
+ && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
ret = true;
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr70010-1.c
new file mode 100644
index 000..78870db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70010-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -flto -mvsx" } */
+
+vector int c, a, b;
+
+static inline void __attribute__ ((__always_inline__, target ("no-vsx")))
+foo () /* { dg-error "inlining failed in call to .* target specific option 
mismatch" } */
+{
+  c = a + b;
+}
+
+int
+main ()
+{
+  foo (); /* { dg-message "called from here" } */
+  c = a + b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr70010-2.c
new file mode 100644
index 000..4c09b21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70010-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -flto -mno-vsx" } */
+
+vector int c, a, b;
+
+static inline void __attribute__ ((__always_inline__, target ("no-vsx")))
+foo ()
+{
+  c = a + b;
+}
+
+int
+main ()
+{
+  foo ();
+  c = a + b;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70010-3.c 
b/gcc/testsuite/gcc.target/powerpc/pr70010-3.c
new file mode 100644
index 000..bca3187
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70010-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-vsx" } */
+
+vector int c, a, b;
+
+static inline void __

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-16 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>> Sure, the "extern array of unknown size" case isn't about section anchors.
>> But this part of my message (snipped above) was about the other case
>> (objects of known size), and applied to individual objects as well as
>> section anchors.
>>
>> What I was trying to say is: yes, we need better offsets for references
>> to extern objects of unknown size.  But your patch does that by reducing
>> the offset range for all objects, including ones with known sizes in
>> which the documented ranges should be safe.
>>
>> I was trying to explain why I don't think we need to reduce the range
>> in that case too.  If offset_within_block_p then any offset should be
>> OK (the aggressive interpretation) or the original documented ranges
>> should be OK.  I think we only need the smaller range when
>> offset_within_block_p returns false.
>
> Right, I see now.  Yes, given offset_within_block_p is not sufficient, we
> could test both conditions.  Here is the updated patch:
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 7ee31a66b12d7354759f06449955e933421f5fe0..31c394a7e8567dd7d4f1698e5ba98aeb8807df38
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14079,26 +14079,30 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT 
> offset)
>   the offset does not cause overflow of the final address.  But
>   we have no way of knowing the address of symbol at compile time
>   so we can't accurately say if the distance between the PC and
> - symbol + offset is outside the addressible range of +/-1M in the
> - TINY code model.  So we rely on images not being greater than
> - 1M and cap the offset at 1M and anything beyond 1M will have to
> - be loaded using an alternative mechanism.  Furthermore if the
> - symbol is a weak reference to something that isn't known to
> - resolve to a symbol in this module, then force to memory.  */
> -  if ((SYMBOL_REF_WEAK (x)
> -   && !aarch64_symbol_binds_local_p (x))
> -  || !IN_RANGE (offset, -1048575, 1048575))
> + symbol + offset is outside the addressible range of +/-1MB in the
> + TINY code model.  So we limit the maximum offset to +/-64KB and
> + assume the offset to the symbol is not larger than +/-(1MB - 64KB).
> + Furthermore force to memory if the symbol is a weak reference to
> + something that doesn't resolve to a symbol in this module.  */
> +
> +  if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
> +return SYMBOL_FORCE_TO_MEM;
> +  if (!(IN_RANGE (offset, -0x1, 0x1)
> +|| offset_within_block_p (x, offset)))
>  return SYMBOL_FORCE_TO_MEM;
> +
>return SYMBOL_TINY_ABSOLUTE;
>
>  case AARCH64_CMODEL_SMALL:
>/* Same reasoning as the tiny code model, but the offset cap here is
> - 4G.  */
> -  if ((SYMBOL_REF_WEAK (x)
> -   && !aarch64_symbol_binds_local_p (x))
> -  || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
> -HOST_WIDE_INT_C (4294967264)))
> + 1MB, allowing +/-3.9GB for the offset to the symbol.  */
> +
> +  if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
> +return SYMBOL_FORCE_TO_MEM;
> +  if (!(IN_RANGE (offset, -0x10, 0x10)
> +|| offset_within_block_p (x, offset)))
>  return SYMBOL_FORCE_TO_MEM;

Think this is just the mailer eating tabs, but: || not properly indented.

OK otherwise, thanks.

Richard


Re: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-16 Thread Feng Xue OS
Hi Philipp,

  This patch is still under code review and might still need some time.  Thanks,

Feng


From: Philipp Tomsich 
Sent: Wednesday, October 16, 2019 12:05 AM
To: Feng Xue OS
Cc: Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org; Christoph Müllner; 
erick.oc...@theobroma-systems.com
Subject: Re: [PATCH V4] Extend IPA-CP to support arithmetically-computed 
value-passing on by-ref argument (PR ipa/91682)

Feng,

this now looks fine to me: what is the current schedule to get this merged?

Thanks,
Philipp.

> On 19.09.2019, at 16:30, Feng Xue OS  
> wrote:
>
> Fix a bug on unary/binary operation check.
>
> Feng
> ---
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 33d52fe5537..f218f1093b8 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1244,23 +1244,23 @@ initialize_node_lattices (struct cgraph_node *node)
>   }
> }
>
> -/* Return the result of a (possibly arithmetic) pass through jump function
> -   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
> -   to which the result is passed.  Return NULL_TREE if that cannot be
> -   determined or be considered an interprocedural invariant.  */
> +/* Return the result of a (possibly arithmetic) operation on the constant
> +   value INPUT.  OPERAND is 2nd operand for binary operation.  RES_TYPE is
> +   the type of the parameter to which the result is passed.  Return
> +   NULL_TREE if that cannot be determined or be considered an
> +   interprocedural invariant.  */
>
> static tree
> -ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
> - tree res_type)
> +ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand,
> +  tree res_type)
> {
>   tree res;
>
> -  if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
> +  if (opcode == NOP_EXPR)
> return input;
>   if (!is_gimple_ip_invariant (input))
> return NULL_TREE;
>
> -  tree_code opcode = ipa_get_jf_pass_through_operation (jfunc);
>   if (!res_type)
> {
>   if (TREE_CODE_CLASS (opcode) == tcc_comparison)
> @@ -1274,8 +1274,7 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
> *jfunc, tree input,
>   if (TREE_CODE_CLASS (opcode) == tcc_unary)
> res = fold_unary (opcode, res_type, input);
>   else
> -res = fold_binary (opcode, res_type, input,
> -ipa_get_jf_pass_through_operand (jfunc));
> +res = fold_binary (opcode, res_type, input, operand);
>
>   if (res && !is_gimple_ip_invariant (res))
> return NULL_TREE;
> @@ -1283,6 +1282,21 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
> *jfunc, tree input,
>   return res;
> }
>
> +/* Return the result of a (possibly arithmetic) pass through jump function
> +   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
> +   to which the result is passed.  Return NULL_TREE if that cannot be
> +   determined or be considered an interprocedural invariant.  */
> +
> +static tree
> +ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
> + tree res_type)
> +{
> +  return ipa_get_jf_arith_result (ipa_get_jf_pass_through_operation (jfunc),
> +   input,
> +   ipa_get_jf_pass_through_operand (jfunc),
> +   res_type);
> +}
> +
> /* Return the result of an ancestor jump function JFUNC on the constant value
>INPUT.  Return NULL_TREE if that cannot be determined.  */
>
> @@ -1416,6 +1430,146 @@ ipa_context_from_jfunc (ipa_node_params *info, 
> cgraph_edge *cs, int csidx,
>   return ctx;
> }
>
> +/* See if NODE is a clone with a known aggregate value at a given OFFSET of a
> +   parameter with the given INDEX.  */
> +
> +static tree
> +get_clone_agg_value (struct cgraph_node *node, HOST_WIDE_INT offset,
> +  int index)
> +{
> +  struct ipa_agg_replacement_value *aggval;
> +
> +  aggval = ipa_get_agg_replacements_for_node (node);
> +  while (aggval)
> +{
> +  if (aggval->offset == offset
> +   && aggval->index == index)
> + return aggval->value;
> +  aggval = aggval->next;
> +}
> +  return NULL_TREE;
> +}
> +
> +/* Determine whether ITEM, jump function for an aggregate part, evaluates to 
> a
> +   single known constant value and if so, return it.  Otherwise return NULL.
> +   NODE and INFO describes the caller node or the one it is inlined to, and
> +   its related info.  */
> +
> +static tree
> +ipa_agg_value_from_node (class ipa_node_params *info,
> +  struct cgraph_node *node,
> +  struct ipa_agg_jf_item *item)
> +{
> +  tree value = NULL_TREE;
> +  int src_idx;
> +
> +  if (item->offset < 0 || item->jftype == IPA_JF_UNKNOWN)
> +return NULL_TREE;
> +
> +  if (item->jftype == IPA_JF_CONST)
> +return item->value.constant;
> +
> +  gcc_checking_assert (item->jftype == IPA_JF_PASS_THROUGH
> +|| i

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-16 Thread Jakub Jelinek
On Wed, Oct 16, 2019 at 03:38:38AM -0400, Aldy Hernandez wrote:
> Would you take care of this, or shall I?

Will defer to you, I have quite a lot of stuff on my plate ATM.

Jakub


Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-16 Thread Aldy Hernandez

On 10/15/19 2:16 PM, Jakub Jelinek wrote:

On Tue, Oct 15, 2019 at 08:35:07AM -0400, Aldy Hernandez wrote:

I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
with more reports for armv8l, pru, and s390x.

Comparing the dumps between 64 and 32-bit, I see

-_1: int * [1B, -1B]
+_1: int * [1B, 4294967295B]


I wonder why 32-bit targets are displaying 4294967295 instead of -1.  Or are
pointers 64 bits here?


Because the dump method does:
   if (INTEGRAL_TYPE_P (ttype)
   && vrp_val_is_max (max ())
   && TYPE_PRECISION (ttype) != 1)
 fprintf (file, "+INF");
   else
 print_generic_expr (file, max ());
so for integral types and maximum value, it prints +INF, but not for
pointers.


Ah, I see.


Perhaps we want to print +INF also for pointers,
   if ((INTEGRAL_TYPE_P (ttype) || POINTER_TYPE_P (ttype))
  && vrp_val_is_max (max (), true)
   && TYPE_PRECISION (ttype) != 1)
 fprintf (file, "+INF");
   else
 print_generic_expr (file, max ());


That sounds reasonable, though I would use supports_type_p() instead of 
open-coding the check for INTEGRAL and POINTER.


Would you take care of this, or shall I?


but maybe vrp_val_is_{min,max} should be rewritten for pointer types to be
more efficient, don't create trees, for min just use integer_zerop and for
max just compare wide_int?


That sounds like a separate issue, but sure.  No complaints.

Note that I highly dislike the whole handle_pointers=bool argument in 
vrp_*val*, even though I added it.  I think it should default to the 
handle_pointers behavior, though I was unsure what would break, so I 
kept existing behavior gated by a bool (yuck).


Thanks.
Aldy