date:20160202

Re: [PATCH] Fix c/69643, named address space wrong-code

2016-02-02 Thread Richard Henderson


On 02/03/2016 06:05 PM, Richard Biener wrote:
 I wasn't aware that STRIP_NOPS strips ADDR_SPACE_CONVERT_EXPR.


Isn't this maybe failing to use that?  (I'm unable to look at the attachment
from my phone.)


The test case does fail to use ADDR_SPACE_CONVERT_EXPR.
Perhaps it's because of the intermediate cast to uintptr_t?

Of course, for this case, the intermediate cast is required
because __seg_[fg]s are *not* subsets of ADDR_SPACE_GENERIC,
and thus a direct cast between the pointer types results in
an error message.


r~

Re: [PATCH] Fix c/69643, named address space wrong-code

2016-02-02 Thread Richard Biener

On February 3, 2016 7:03:54 AM GMT+01:00, Richard Henderson  
wrote:
>In gimple_fold_indirect_ref, we STRIP_NOPS, find the ADDR_EXPR, and
>fold everything away.
>
>I can't imagine it ever being correct to drop an address space change
>between pointers, so I've modified tree_nop_conversion_p.  Anything else
>seems to require more checks at every place we use STRIP_NOPS.
>
>Ok?

I wasn't aware that STRIP_NOPS strips ADDR_SPACE_CONVERT_EXPR.

Isn't this maybe failing to use that?  (I'm unable to look at the attachment
from my phone.)

Richard.

>
>r~

Go patch committed: Mark stub functions with $stub, skip them in runtime.Callers

2016-02-02 Thread Ian Lance Taylor

It's been a long-standing problem in gccgo that the testing package
does not report the correct file/line information when using a method
like t.Error.  This is because the testing type uses an embedded type,
and methods like t.Error are actually inherited from the embedded
type.  This means that the method is a stub.  In the gc toolchain,
stubs are thunks that jump directly to the code and do not remain on
the stack.  In the gccgo toolchain, they do remain on the stack, which
means that code that calls runtime.Caller will see them in places
where the gc toolchain does not.

This patch fixes the problem by marking stub functions with $stub in
their name, and skipping $stub functions in runtime.Caller.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.
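
For readers who want the name test in isolation, here is a minimal sketch
of the check the libgo callback performs after this patch.  The function
name frame_is_hidden and the sample symbol names are made up for
illustration; only the "$recover" and "$stub" suffix comparisons come from
the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a frame whose symbol name is FUNCTION should be left
   out of the caller list.  This mirrors the suffix test in the patch:
   locate the last '$' and compare the text that starts there.
   The function name is hypothetical, not part of libgo.  */
bool
frame_is_hidden (const char *function)
{
  assert (function != NULL);

  const char *p = strrchr (function, '$');
  if (p == NULL)
    return false;

  /* "$recover" thunks were already skipped before the patch.  */
  if (strcmp (p, "$recover") == 0)
    return true;

  /* New with the patch: names carrying the "$stub" marker.  */
  return strncmp (p, "$stub", 5) == 0;
}
```

With this test a name such as "testing.T.Error$stub" is dropped from the
trace, while an unmarked name such as "testing.common.Error" is kept.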

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 233097)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a408bef550251926c28673818db2c64302faac1d
+c70e74c116d08c6f2e787551eb1366983815c032
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 233097)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -9632,13 +9632,14 @@ Type::build_stub_methods(Gogo* gogo, con
package = NULL;
   else
package = type->named_type()->named_object()->package();
+  std::string stub_name = name + "$stub";
   Named_object* stub;
   if (package != NULL)
-   stub = Named_object::make_function_declaration(name, package,
+   stub = Named_object::make_function_declaration(stub_name, package,
   stub_type, location);
   else
{
- stub = gogo->start_function(name, stub_type, false,
+ stub = gogo->start_function(stub_name, stub_type, false,
  fntype->location());
  Type::build_one_stub_method(gogo, m, buf, stub_params,
  fntype->is_varargs(), location);
Index: libgo/runtime/go-callers.c
===
--- libgo/runtime/go-callers.c  (revision 232239)
+++ libgo/runtime/go-callers.c  (working copy)
@@ -74,6 +74,8 @@ callback (void *data, uintptr_t pc, cons
   p = __builtin_strrchr (function, '$');
   if (p != NULL && __builtin_strcmp(p, "$recover") == 0)
return 0;
+  if (p != NULL && __builtin_strncmp(p, "$stub", 5) == 0)
+   return 0;
 }
 
   if (arg->skip > 0)

Re: [PATCH] [graphite] document that isl-0.16 is supported

2016-02-02 Thread Sebastian Huber

On 03/02/16 07:29, Sebastian Huber wrote:

On 02/02/16 19:00, Mike Stump wrote:
On Feb 2, 2016, at 2:23 AM, Sebastian 
Huber  wrote:
>It would be good to have a recommended version as well (similar for 
cloog, gmp, mpc and mpfr). If you present me three versions which 
one should I choose as a naive user?
The latest release, or the one on your system.  This is so basic that 
we expect you to already know this.

>Are the versions in the contrib/download_prerequisites script the 
recommended ones?

Yes, they are.

If it is so basic to choose the latest release or the one on the 
system, then why does contrib/download_prerequisites use ancient 
versions, e.g. the six year old GMP 4.3.2?

There is exactly one version of GMP, MPC and MPFR available via:

ftp://gcc.gnu.org/pub/gcc/infrastructure/

This doesn't suggest to me that I should use the latest release.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

Re: [PATCH] [graphite] document that isl-0.16 is supported

2016-02-02 Thread Sebastian Huber

On 02/02/16 19:00, Mike Stump wrote:

On Feb 2, 2016, at 2:23 AM, Sebastian Huber 
 wrote:

>It would be good to have a recommended version as well (similar for cloog, 
gmp, mpc and mpfr). If you present me three versions which one should I choose as 
a naive user?

The latest release, or the one on your system.  This is so basic that we expect 
you to already know this.

>Are the versions in the contrib/download_prerequisites script the recommended 
ones?

Yes, they are.

If it is so basic to choose the latest release or the one on the system, 
then why does contrib/download_prerequisites use ancient versions, e.g. 
the six year old GMP 4.3.2?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

[PATCH] Fix c/69643, named address space wrong-code

2016-02-02 Thread Richard Henderson

In gimple_fold_indirect_ref, we STRIP_NOPS, find the ADDR_EXPR, and fold 
everything away.


I can't imagine it ever being correct to drop an address space change between 
pointers, so I've modified tree_nop_conversion_p.  Anything else seems to 
require more checks at every place we use STRIP_NOPS.
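
The rule can be tried outside the compiler with a toy model.  The sketch
below is not GCC code: struct toy_type, the AS_* constants and
toy_nop_conversion_p are invented for illustration, and the final
"same precision" comparison is only a stand-in for the rest of the real
predicate.  What it does reproduce is the new early exit: a conversion
that enters, leaves, or changes a non-generic address space is never
treated as a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address-space ids; 0 plays the role of ADDR_SPACE_GENERIC.  */
enum { AS_GENERIC = 0, AS_SEG_FS = 1, AS_SEG_GS = 2 };

/* Toy stand-in for a type node.  */
struct toy_type
{
  bool is_pointer;
  int pointee_addr_space;   /* Meaningful only when is_pointer is set.  */
  unsigned precision;       /* Width in bits.  */
};

bool
toy_nop_conversion_p (const struct toy_type *outer,
                      const struct toy_type *inner)
{
  assert (outer != NULL && inner != NULL);

  bool outer_named = (outer->is_pointer
                      && outer->pointee_addr_space != AS_GENERIC);
  bool inner_named = (inner->is_pointer
                      && inner->pointee_addr_space != AS_GENERIC);

  if (outer_named)
    {
      /* Casting into a named address space from a non-pointer, or from
         a pointer into a different space, must be preserved.  */
      if (!inner->is_pointer
          || outer->pointee_addr_space != inner->pointee_addr_space)
        return false;
    }
  else if (inner_named)
    /* Casting out of a named address space must be preserved too.  */
    return false;

  /* Simplified stand-in for the remaining checks.  */
  return outer->precision == inner->precision;
}
```

In this model the cast from uintptr_t to "int __seg_gs *" used in
addr-space-5.c is kept, whereas a cast between uintptr_t and a generic
pointer of the same width can still be stripped.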


Ok?


r~
diff --git a/gcc/testsuite/gcc.target/i386/addr-space-4.c b/gcc/testsuite/gcc.target/i386/addr-space-4.c
new file mode 100644
index 000..3e0966d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/addr-space-4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { scan-assembler "gs:" } } */
+
+#define uintptr_t __SIZE_TYPE__
+
+struct S { int a, b, c; };
+
+extern struct S __seg_gs s;
+
+int foo (void)
+{
+  int r;
+  r = s.c;
+  return r;
+}
diff --git a/gcc/testsuite/gcc.target/i386/addr-space-5.c b/gcc/testsuite/gcc.target/i386/addr-space-5.c
new file mode 100644
index 000..4f73f95
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/addr-space-5.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { scan-assembler "gs:" } } */
+
+#define uintptr_t __SIZE_TYPE__
+
+struct S { int a, b, c; };
+
+extern struct S s;
+
+int ct_state3 (void)
+{
+  int r;
+  r = *((int __seg_gs *) (uintptr_t) &s.c);
+  return r;
+}
diff --git a/gcc/tree.c b/gcc/tree.c
index fa7646b..07cb9d9 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -12219,6 +12219,23 @@ block_ultimate_origin (const_tree block)
 bool
 tree_nop_conversion_p (const_tree outer_type, const_tree inner_type)
 {
+  /* Do not strip casts into or out of differing address spaces.  */
+  if (POINTER_TYPE_P (outer_type)
+  && TYPE_ADDR_SPACE (TREE_TYPE (outer_type)) != ADDR_SPACE_GENERIC)
+{
+  if (!POINTER_TYPE_P (inner_type)
+ || (TYPE_ADDR_SPACE (TREE_TYPE (outer_type))
+ != TYPE_ADDR_SPACE (TREE_TYPE (inner_type))))
+   return false;
+}
+  else if (POINTER_TYPE_P (inner_type)
+  && TYPE_ADDR_SPACE (TREE_TYPE (inner_type)) != ADDR_SPACE_GENERIC)
+{
+  /* We already know that outer_type is not a pointer with
+a non-generic address space.  */
+  return false;
+}
+
   /* Use precision rather then machine mode when we can, which gives
  the correct answer even for submode (bit-field) types.  */
   if ((INTEGRAL_TYPE_P (outer_type)

Go patch committed: Unpack method names when sorting them

2016-02-02 Thread Ian Lance Taylor

When using type reflection, you occasionally need to know the order of
a type's methods.  The order is simply an alphabetical sort.
Unfortunately, gccgo was not unpacking names before sorting them,
meaning that a type with a combination of exported and unexported
methods would have them in the wrong order.  This patch fixes the
problem.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.
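
To see why the raw sort goes wrong, here is a small sketch in C.  It
assumes that a hidden (unexported) name is stored as ".PKGPATH.NAME" and
that unpacking returns the part after the last '.'; that encoding and all
function names below are assumptions made for the example, not a copy of
the gofrontend code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed encoding: a hidden name looks like ".PKGPATH.NAME".
   Return the NAME part; exported names are returned unchanged.  */
const char *
unpack_hidden_name (const char *name)
{
  assert (name != NULL);
  if (name[0] != '.')
    return name;
  return strrchr (name, '.') + 1;
}

/* qsort comparator that orders by the unpacked names.  */
int
compare_unpacked (const void *a, const void *b)
{
  const char *const *x = a;
  const char *const *y = b;
  return strcmp (unpack_hidden_name (*x), unpack_hidden_name (*y));
}

/* Sort N method names in place the way the patched comparators do.  */
void
sort_method_names (const char **names, size_t n)
{
  qsort (names, n, sizeof names[0], compare_unpacked);
}
```

Under that assumption, comparing the packed strings would put
".pkg.apple" ahead of "Mango" and "Zebra" because '.' sorts before any
letter, while comparing the unpacked names puts "apple" last.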

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 232892)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8dce33f24dd3a34e3574c1d2604428586b63c1aa
+a408bef550251926c28673818db2c64302faac1d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 232855)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -2307,7 +2307,10 @@ class Sort_methods
   bool
   operator()(const std::pair& m1,
 const std::pair& m2) const
-  { return m1.first < m2.first; }
+  {
+return (Gogo::unpack_hidden_name(m1.first)
+   < Gogo::unpack_hidden_name(m2.first));
+  }
 };
 
 // Return a composite literal for the type method table for this type.
@@ -7684,7 +7687,8 @@ Interface_type::get_backend_methods(Gogo
   mfields[i].location = loc;
 
   // Sanity check: the names should be sorted.
-  go_assert(p->name() > last_name);
+  go_assert(Gogo::unpack_hidden_name(p->name())
+   > Gogo::unpack_hidden_name(last_name));
   last_name = p->name();
 }
 
@@ -10489,7 +10493,10 @@ struct Typed_identifier_list_sort
  public:
   bool
   operator()(const Typed_identifier& t1, const Typed_identifier& t2) const
-  { return t1.name() < t2.name(); }
+  {
+return (Gogo::unpack_hidden_name(t1.name())
+   < Gogo::unpack_hidden_name(t2.name()));
+  }
 };
 
 void

New Ukrainian PO file for 'cpplib' (version 6.1-b20160131)

2016-02-02 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Ukrainian team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/uk.po

(This file, 'cpplib-6.1-b20160131.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian

2016-02-02 Thread Charles Baylis

On 1 February 2016 at 17:14, Kyrill Tkachov  wrote:

> Indeed I see the new passes on armeb-none-eabi.
> However, the new FAILs that I see are ICEs, not just vectorisation failures,
> so they need to be looked at.
>
> The ICEs that I see are:
> FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
> FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)

Thanks. I hadn't seen these because I wasn't running the "expensive" tests.

> Seems that the code in expr.c asserts that expand_vec_perm returned a
> non-NULL result.

It seems that my implementation of arm_evpc_neon_vuzp doesn't handle
the one vector case correctly. I'm testing a fix.

> I'll look at the patches in more detail, but in the meantime I notice that
> there are some GNU style issues that should be resolved, like starting
> comments with a capital letter, two spaces after full stop, two spaces
> between full stop and close comment, as well as some lines over 80
> characters. The check_GNU_style.sh script in the contrib/ directory can
> help catch some (if not all) of these.

OK, I'll fix those.

> Also, can you please send any follow-up versions of the two patches as
> separate emails, so that we can more easily keep track of which comment
> goes with which patch.

Will do.

Re: [PATCH] fix #69251 - [6 Regression] ICE in unify_array_domain on a flexible array member

2016-02-02 Thread Martin Sebor


On 02/02/2016 05:28 AM, Jason Merrill wrote:

On 01/25/2016 05:55 PM, Martin Sebor wrote:

The downside of this approach is that it prevents everything but
the front end from distinguishing flexible array members from
arrays of unspecified or unknown bounds.  The immediate impact
is that it prevents us from maintaining ABI compatibility with GCC
5 (with -fabi-version=9) and from diagnosing the mangling change.
This means should we decide to adopt this approach, the final
version of the patch for c++/69277 mentioned above that's still
pending approval will need to be tweaked to have the ABI checks
removed.


That's unfortunate, but I think acceptable.


* decl.c (compute_array_index_type): Return null for flexible array
members.


Instead of this, I would think we can remove the calls to
compute_array_index_type added by your earlier patch, as well as many
other changes from that patch to handle null TYPE_MAX_VALUE.


Yes, that's possible but it didn't seem essential at this stage.
I wanted to make only conservative changes to avoid any further
fallout.  I also wasn't sure whether the ABI issue above would
make this approach unviable.




* tree.c (array_of_runtime_bound_p): Handle gracefully array types
with null TYPE_MAX_VALUE.


This seems unneeded.


(build_ctor_subob_ref): Loosen debug checking to handle flexible
array members.


And this shouldn't need the TYPE_MAX_VALUE check.


I went ahead and made the requested changes.  They might seem
perfectly innocuous to you but the removal of the tests for
TYPE_MAX_VALUE(t) being null makes me nervous at this stage.
I'm not nearly comfortable enough with the code to be confident
that they're all 100% safe.  I defer to your better judgment
on this.

In the patch, I also corrected some transgressions against
the GNU formatting rules I introduced in the original
change.

I've regtested the patch on x86_64.

Please let me know if this patch is good to commit.  In light
of Jakub's comment today on bug 69277, I have the impression
he would like to have the mangling change in the Fedora
mass rebuild or release and I don't want to hold that up.
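
For anyone following the thread without the construct in mind, here is
a minimal C sketch of a flexible array member.  The struct and function
names are invented for the example; G++ accepts the same declaration as
an extension, which is what the PRs above exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A record whose trailing array has no declared bound.  */
struct message
{
  size_t len;
  char text[];   /* Flexible array member; must be the last field.  */
};

/* Allocate a message with room for a copy of S, including its NUL.
   Returns NULL if the allocation fails; the caller frees the result.  */
struct message *
make_message (const char *s)
{
  assert (s != NULL);

  size_t n = strlen (s) + 1;
  struct message *m = malloc (sizeof *m + n);
  if (m == NULL)
    return NULL;

  m->len = n - 1;
  memcpy (m->text, s, n);
  return m;
}
```

The array contributes nothing to sizeof (struct message), so the storage
for it has to be added by hand at allocation time.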

Thanks
Martin
PR c++/69251 - [6 Regression] ICE in unify_array_domain on a flexible array
   member
PR c++/69253 - [6 Regression] ICE in cxx_incomplete_type_diagnostic initializing
   a flexible array member with empty string
PR c++/69290 - [6 Regression] ICE on invalid initialization of a flexible array
   member
PR c++/69277 - [6 Regression] ICE mangling a flexible array member
PR c++/69349 - template substitution error for flexible array members

gcc/testsuite/ChangeLog:
2016-02-02  Martin Sebor  

	PR c++/69251
	PR c++/69253
	PR c++/69290
	PR c++/69277
	PR c++/69349
	* g++.dg/ext/flexarray-mangle-2.C: New test.
	* g++.dg/ext/flexarray-mangle.C: New test.
	* g++.dg/ext/flexarray-subst.C: New test.
	* g++.dg/ext/flexary11.C: New test.
	* g++.dg/ext/flexary12.C: New test.
	* g++.dg/ext/flexary13.C: New test.
	* g++.dg/ext/flexary14.C: New test.
	* g++.dg/other/dump-ada-spec-2.C: Adjust.

gcc/cp/ChangeLog:
2016-02-02  Martin Sebor  

	PR c++/69251
	PR c++/69253
	PR c++/69290
	PR c++/69277
	PR c++/69349
	* class.c (walk_subobject_offsets): Avoid testing the upper bound
	of a flexible array member for equality to null.
	(find_flexarrays): Remove spurious whitespace introduced in r231665.
	(diagnose_flexarrays): Avoid checking the upper bound of arrays.
	(check_flexarrays): Same.
	* decl.c (compute_array_index_type): Avoid special case for flexible
	array members.
	(grokdeclarator): Avoid calling compute_array_index_type for flexible
	array members.
	* error.c (dump_type_suffix): Revert changes introduced in r231665
	and rendered unnecessary by the changes above.
	* pt.c (tsubst):  Same.
	* tree.c (build_ctor_subob_ref): Handle flexible array members.
	* typeck2.c (digest_init_r): Revert changes introduced in r231665.
	(process_init_constructor_array): Same.
	(process_init_constructor_record): Same.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 45d8a24..9876197 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4120,9 +4120,8 @@ walk_subobject_offsets (tree type,
   /* Avoid recursing into objects that are not interesting.  */
   if (!CLASS_TYPE_P (element_type)
 	  || !CLASSTYPE_CONTAINS_EMPTY_CLASS_P (element_type)
-	  || !domain
-	  /* Flexible array members have no upper bound.  */
-	  || !TYPE_MAX_VALUE (domain))
+	  /* Flexible array members have a null domain.  */
+	  || !domain)
 	return 0;
 
   /* Step through each of the elements in the array.  */
@@ -6645,7 +6644,7 @@ find_flexarrays (tree t, flexmems_t *fmem)
   for (next = fld;
 	   (next = DECL_CHAIN (next))
 	 && TREE_CODE (next) != FIELD_DECL; );
-  
+
   tree fldtype = TREE_TYPE (fld);
   if (TREE_CODE (fld) != TYPE_DECL
 	  && RECORD_OR_UNION_TYPE_P (fldtype)
@@ -6672,22 +6671,20 @@ find_flexarrays (tree t, flexmems_t *fmem)
 	  /* Remember the first non-stati

New Vietnamese PO file for 'cpplib' (version 6.1-b20160131)

2016-02-02 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Vietnamese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/vi.po

(This file, 'cpplib-6.1-b20160131.vi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Marcin Kościelnicki

libgcc/ChangeLog:

* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
* config/s390/morestack.S: New file.
* config/s390/t-stack-s390: New file.
* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

* common/config/s390/s390-common.c (s390_supports_split_stack):
New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
* config/s390/s390.c (struct machine_function): New field
split_stack_varargs_pointer.
(s390_register_info): Mark r12 as clobbered if it'll be used as temp
in s390_emit_prologue.
(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
vararg pointer.
(morestack_ref): New global.
(SPLIT_STACK_AVAILABLE): New macro.
(s390_expand_split_stack_prologue): New function.
(s390_live_on_entry): New function.
(s390_va_start): Use split-stack vararg pointer if appropriate.
(s390_asm_file_end): Emit the split-stack note sections.
(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
(UNSPECV_SPLIT_STACK_CALL): New unspec.
(UNSPECV_SPLIT_STACK_DATA): New unspec.
(split_stack_prologue): New expand.
(split_stack_space_check): New expand.
(split_stack_data): New insn.
(split_stack_call): New expand.
(split_stack_call_*): New insn.
(split_stack_cond_call): New expand.
(split_stack_cond_call_*): New insn.
---
Comment fixed, split_stack_marker gone, reorg gone.  Generated code seems sane,
but testsuite still running.

I will need to modify the gold patch to handle the "leaf function taking
non-split stack function address" issue - this will likely require messing
with the target independent plumbing, the hook for that doesn't seem to get
enough params.

 gcc/ChangeLog|  30 ++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h|   1 +
 gcc/config/s390/s390.c   | 214 +++-
 gcc/config/s390/s390.md  | 138 
 libgcc/ChangeLog |   7 +
 libgcc/config.host   |   4 +-
 libgcc/config/s390/morestack.S   | 609 +++
 libgcc/config/s390/t-stack-s390  |   2 +
 libgcc/generic-morestack.c   |   4 +
 10 files changed, 1016 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..568dff4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2016-02-02  Marcin Kościelnicki  
+
+   * common/config/s390/s390-common.c (s390_supports_split_stack):
+   New function.
+   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
+   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+   * config/s390/s390.c (struct machine_function): New field
+   split_stack_varargs_pointer.
+   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
+   in s390_emit_prologue.
+   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+   vararg pointer.
+   (morestack_ref): New global.
+   (SPLIT_STACK_AVAILABLE): New macro.
+   (s390_expand_split_stack_prologue): New function.
+   (s390_live_on_entry): New function.
+   (s390_va_start): Use split-stack vararg pointer if appropriate.
+   (s390_asm_file_end): Emit the split-stack note sections.
+   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+   (UNSPECV_SPLIT_STACK_CALL): New unspec.
+   (UNSPECV_SPLIT_STACK_DATA): New unspec.
+   (split_stack_prologue): New expand.
+   (split_stack_space_check): New expand.
+   (split_stack_data): New insn.
+   (split_stack_call): New expand.
+   (split_stack_call_*): New insn.
+   (split_stack_cond_call): New expand.
+   (split_stack_cond_call_*): New insn.
+
 2016-02-02  Thomas Schwinge  
 
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
 }
 }
 
+/* -fsplit-stack uses a field in the TCB, available with glibc-2.23.
+   We don't verify it, since earlier versions just have padding at
+   its place, which works just as well.  */
+
+static bool
+s390_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+  struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
 #undef TARGET_DEFAULT_TARGET_FLAGS
 #define TARGET_DEFAULT_TARGE

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-02 Thread Steve Ellcey

On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote:

> We need to be careful of examples like:
> 
>   struct __attribute__ ((aligned (8))) s { _Complex float x; };
>   void foo (struct s *ptr, struct s val) { *ptr = val; }
> 
> "x" gets SCmode, which has an alignment of 4.  And it's OK for TYPE_MODE
> to have a smaller alignment than the type -- it's just not allowed to
> have a larger alignment (and even that restriction only applies because
> this is a STRICT_ALIGNMENT target).  So the structure itself inherits
> this SCmode.
> 
> The patch therefore changes how we handle foo() for -mabi=32 -msoft-float.
> Before the patch "val" is passed in $6 and $7.  After the patch it's
> passed in $5 and $6.  clang behaves like the unpatched GCC.
> 
> If instead we use:
> 
>   struct __attribute__ ((aligned (8))) s { float x; float y; };
>   void foo (struct s *ptr, struct s val) { *ptr = val; }
> 
> then the structure has BLKmode and the alignment is honoured both before
> and after the patch.
> 
> There's no real ABI reason for handling the two cases differently.
> The fact that one gets BLKmode and the other doesn't is down
> to GCC internals.
> 
> We also have to be careful about the STRICT_ALIGNMENT thing.
> At the moment that's hard-coded to 1 for MIPS, but it's possible that
> it could become configurable in future, like it is for aarch64 and
> rs6000.  !STRICT_ALIGNMENT allows TYPE_MODE to have a larger
> alignment than the type, so underaligned structures would get modes
> when they didn't previously.  That would in turn change how they're
> passed as arguments.
> 
> Thanks,
> Richard

Richard,

Can you explain why the GCC internals cause us to get SCmode instead of
BLKmode for the example with _Complex?  I don't understand that.  It
seems wrong to me and I don't understand where it is coming from.
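
For reference, the two structures from Richard's mail can be compared at
the source level with the sketch below.  The struct tags and helper
functions are made up for the example, and it needs a compiler that
understands the GNU aligned attribute.  It only shows that the two types
have the same size and alignment as far as C is concerned; the mode GCC
picks internally (SCmode or BLKmode) is not visible from here.

```c
#include <assert.h>
#include <stddef.h>

/* The two layouts contrasted in the quoted mail.  */
struct __attribute__ ((aligned (8))) s_complex { _Complex float x; };
struct __attribute__ ((aligned (8))) s_pair { float x; float y; };

/* The attribute requests 8-byte alignment for both types.  */
static_assert (_Alignof (struct s_complex) == 8, "aligned (8) honoured");
static_assert (_Alignof (struct s_pair) == 8, "aligned (8) honoured");

/* Return nonzero if the two types agree in size and alignment.  */
int
same_layout (void)
{
  return (sizeof (struct s_complex) == sizeof (struct s_pair)
          && _Alignof (struct s_complex) == _Alignof (struct s_pair));
}

/* Alignment requested for the _Complex variant, in bytes.  */
size_t
requested_alignment (void)
{
  return _Alignof (struct s_complex);
}
```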

Steve Ellcey
sell...@imgtec.com

New Brazilian Portuguese PO file for 'cpplib' (version 6.1-b20160131)

2016-02-02 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Brazilian Portuguese team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/pt_BR.po

(This file, 'cpplib-6.1-b20160131.pt_BR.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.

2016-02-02 Thread Joern Wolfgang Rennecke




On 01/02/16 13:57, Claudiu Zissulescu wrote:

In this patch, we add support for the new FPU instructions available with
ARC V2 processors.  The new FPU instructions cover both single and
double precision IEEE formats. While the single precision is available
for both ARC EM and ARC HS processors, the double precision is only
available for ARC HS. ARC EM will make use of the double precision assist
instructions which are in fact FPX double instructions.  The double
floating point precision instructions are making use of the odd-even
register pairs to hold 64-bit datums, exactly like in the case of ldd/std
instructions.

Additional to the mods required by FPU instructions to be supported by
GCC, we forced all the 64 bit datum to use odd-even register pairs (HS
only), as we observed a better usage of the ldd/std, and less generated
move instructions.  A byproduct of this optimization, is a new ABI, which
places the 64-bit arguments into odd-even register pairs.  This behavior
can be selected using -mabi option.

Feedback is welcomed,


  VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
+
+/* FPU conditon flags. */

Typo

+   error ("FPU double precission options are available for ARC HS only.");


There should be no period at the end of the error message string.

+  if (TARGET_HS && (arc_fpu_build & FPX_DP))
+   error ("FPU double precission assist "

Typo.  And Ditto.


+  case EQ:
+  case NE:
+  case UNORDERED:
+  case UNLT:
+  case UNLE:
+  case UNGT:
+  case UNGE:
+   return CC_FPUmode;
+
+  case LT:
+  case LE:
+  case GT:
+  case GE:
+  case ORDERED:
+   return CC_FPUEmode;

cse and other code transformations are likely to do better if you use
just one mode for these.  It is also very odd to have comparisons and their
inverse use different modes.  Have you done any benchmarking for this?

@@ -1282,6 +1363,16 @@ arc_conditional_register_usage (void)
arc_hard_regno_mode_ok[60] = 1 << (int) S_MODE;
 }

+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+ registers.  */
+  if (TARGET_HS)
+{
+  for (regno = 1; regno < 32; regno +=2)
+   {
+ arc_hard_regno_mode_ok[regno] = S_MODES;
+   }
+}
+

Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
arguments in odd registers?  I fear you might run into reload trouble when
trying to access the values.
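
To make the concern concrete, here is a toy model of the restriction the
hunk above introduces.  The MODE_S and MODE_D bits, the table and the
function names are invented for illustration and are not the ARC port's
data structures; the only thing taken from the patch is the loop that
limits every odd-numbered core register to single-word modes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode-class bits: single-word and double-word values.  */
enum { MODE_S = 1 << 0, MODE_D = 1 << 1 };

#define NUM_CORE_REGS 32

/* Which mode classes each core register may hold.  */
unsigned int regno_mode_ok[NUM_CORE_REGS];

/* Fill the table.  With EVEN_ODD_PAIRS set, odd registers are limited
   to single-word modes, as in the quoted loop.  */
void
init_register_modes (bool even_odd_pairs)
{
  for (int regno = 0; regno < NUM_CORE_REGS; regno++)
    regno_mode_ok[regno] = MODE_S | MODE_D;

  if (even_odd_pairs)
    for (int regno = 1; regno < NUM_CORE_REGS; regno += 2)
      regno_mode_ok[regno] = MODE_S;
}

/* Return true if register REGNO may start a value of class MODE_BIT.  */
bool
can_hold (int regno, unsigned int mode_bit)
{
  assert (regno >= 0 && regno < NUM_CORE_REGS);
  return (regno_mode_ok[regno] & mode_bit) != 0;
}
```

In this model a 64-bit argument that the calling convention places
starting at r3 has no register that is allowed to hold it, which is the
kind of mismatch the question is about.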

+arc_hard_regno_nregs (int regno,
...
+  if ((regno < FIRST_PSEUDO_REGISTER)
+  && (HARD_REGNO_MODE_OK (regno, mode)
+ || (mode == BLKmode)))
+return words;
+  return 0;

This prima facie contradicts HARD_REGNO_NREGS, which considers the
larger sizes of simd vector and dma config registers.
I see that there is no actual conflict as the vector registers are not
used for argument passing, but the comment in the function only states
what the function does - not quite correctly, as detailed before - and
not what it is for.

So, either the mxp support has to be removed before this patch goes in,
or arc_hard_regno_nregs has to handle simd registers properly, or the
comment at the top should state the limited applicability of this
function, and there should be an assert to check that the register
number passed is suitable - e.g.:
gcc_assert (regno < ARC_FIRST_SIMD_VR_REG)

+/* Given an CUMULATIVE_ARGS, this function returns an RTX if the

Typo: C is not a vowel.

+  if (!named && TARGET_HS)
+{
+  /* For unamed args don't try fill up the reg-holes.  */
+  reg_idx = cum->last_reg;
+  /* Only interested in the number of regs.  */

You should make up your mind what the priorities for stdarg are.
Traditionally, lots of gcc ports have supported broken code that lacks
declarations of variadic functions, and furthermore have placed
emphasis on simplicity of varargs/stdarg callee code, at the expense
of normal code.  Often for compatibility with a pre-existing
compiler, sometimes by just copying from existing ports without
stopping to consider the ramifications.
If you make argument passing different for stdarg declared functions,
the broken code that lacks declarations won't work any more.
Ignoring registers for argument passing is not helping the caller's
code density.  So the only objective that might be furthered here
is stdarg callee simplicity.  But if you really want that, and ignore
compatibility with broken code, the logical thing to do is not to
pass any unnamed arguments in registers.

If stdarg caller's code size is considered important, and stdarg
callees mostly irrelevant (as mostly associated with I/O, and
linked in just once per function), this aligns well with supporting
broken code: it shouldn't matter if the argument is anonymous or
not, it's the same effort for the caller to pass it.

One further thing to consider when forging new ABIs is that
partial argument passing is there solely for the convenience of
stdarg callees, and/or the programmer who wrote that part of
the t

[PATCH] Add GCC Runtime Library Exception to include/plugin-api.h

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 1:35 PM, Cary Coutant  wrote:
>> include/plugin-api.h defines an ABI between linker and compiler,
>> which can be used to implement linker plug-in by any compilers.
>> I'd like to add GCC Runtime Library Exception to include/plugin-api.h
>> so that the linker plug-in can have non-GPL licenses.
>
> This is OK with me.
>
> -cary

Here is a patch.  OK for trunk?

Thanks.

-- 
H.J.
From 3f8f62505774116d5de233ca36f60e3f8a840516 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Tue, 2 Feb 2016 14:02:03 -0800
Subject: [PATCH] Add GCC Runtime Library Exception to plugin-api.h

	* COPYING.RUNTIME: New file.
	* plugin-api.h: Add GCC Runtime Library Exception.
---
 include/COPYING.RUNTIME | 73 +
 include/plugin-api.h| 11 +++-
 2 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 include/COPYING.RUNTIME

diff --git a/include/COPYING.RUNTIME b/include/COPYING.RUNTIME
new file mode 100644
index 000..e1b3c69
--- /dev/null
+++ b/include/COPYING.RUNTIME
@@ -0,0 +1,73 @@
+GCC RUNTIME LIBRARY EXCEPTION
+
+Version 3.1, 31 March 2009
+
+Copyright (C) 2009 Free Software Foundation, Inc. 
+
+Everyone is permitted to copy and distribute verbatim copies of this
+license document, but changing it is not allowed.
+
+This GCC Runtime Library Exception ("Exception") is an additional
+permission under section 7 of the GNU General Public License, version
+3 ("GPLv3"). It applies to a given file (the "Runtime Library") that
+bears a notice placed by the copyright holder of the file stating that
+the file is governed by GPLv3 along with this Exception.
+
+When you use GCC to compile a program, GCC may combine portions of
+certain GCC header files and runtime libraries with the compiled
+program. The purpose of this Exception is to allow compilation of
+non-GPL (including proprietary) programs to use, in this way, the
+header files and runtime libraries covered by this Exception.
+
+0. Definitions.
+
+A file is an "Independent Module" if it either requires the Runtime
+Library for execution after a Compilation Process, or makes use of an
+interface provided by the Runtime Library, but is not otherwise based
+on the Runtime Library.
+
+"GCC" means a version of the GNU Compiler Collection, with or without
+modifications, governed by version 3 (or a specified later version) of
+the GNU General Public License (GPL) with the option of using any
+subsequent versions published by the FSF.
+
+"GPL-compatible Software" is software whose conditions of propagation,
+modification and use would permit combination with GCC in accord with
+the license of GCC.
+
+"Target Code" refers to output from any compiler for a real or virtual
+target processor architecture, in executable form or suitable for
+input to an assembler, loader, linker and/or execution
+phase. Notwithstanding that, Target Code does not include data in any
+format that is used as a compiler intermediate representation, or used
+for producing a compiler intermediate representation.
+
+The "Compilation Process" transforms code entirely represented in
+non-intermediate languages designed for human-written code, and/or in
+Java Virtual Machine byte code, into Target Code. Thus, for example,
+use of source code generators and preprocessors need not be considered
+part of the Compilation Process, since the Compilation Process can be
+understood as starting with the output of the generators or
+preprocessors.
+
+A Compilation Process is "Eligible" if it is done using GCC, alone or
+with other GPL-compatible software, or if it is done without using any
+work based on GCC. For example, using non-GPL-compatible Software to
+optimize any GCC intermediate representations would not qualify as an
+Eligible Compilation Process.
+
+1. Grant of Additional Permission.
+
+You have permission to propagate a work of Target Code formed by
+combining the Runtime Library with Independent Modules, even if such
+propagation would otherwise violate the terms of GPLv3, provided that
+all Target Code was generated by Eligible Compilation Processes. You
+may then convey such a combination under terms of your choice,
+consistent with the licensing of the Independent Modules.
+
+2. No Weakening of GCC Copyleft.
+
+The availability of this Exception does not imply any general
+presumption that third-party software is unaffected by the copyleft
+requirements of the license of GCC.
+
diff --git a/include/plugin-api.h b/include/plugin-api.h
index d7f9ee3..46686be 100644
--- a/include/plugin-api.h
+++ b/include/plugin-api.h
@@ -18,7 +18,16 @@
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
-   MA 02110-1301, USA.  */
+   MA 02110-1301, USA.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Excepti

Re: Turnoff prefetching for -march=znver1

2016-02-02 Thread Uros Bizjak

On Tue, Feb 2, 2016 at 9:28 PM, Stepanyan, Victoria
 wrote:
> Hi Maintainers,
>
> This patch disables prefetching for -march=znver1 which is turned on by 
> default.
>
> gcc/ChangeLog:
>
> 2016-02-02 Victoria Stepanyan 
>
> * gcc/config/i386/x86-tune.def: Disable default prefetching for 
> -march=znver1
>
> Ok for trunk?

OK.

Thanks,
Uros.

Re: [PATCH] c/69540 - update documentation on -l

2016-02-02 Thread Arkadiusz Drabczyk

On 2016-02-02, Sandra Loosemore  wrote:
> I see that the documentation of -l does need to be updated to mention 
> .so files, but I think your patch doesn't go far enough.  It's already 
> confusing because that sentence says "The only difference is...", and 
> then mentions *two* things it does differently, and you're adding even 
> more things.
>
> Instead, I suggest dropping this confusing sentence entirely and putting 
> the new information a couple paragraphs higher up:
>
>> The linker searches a standard list of directories for the library,
>> which is actually a file named @file{lib@var{library}.a}. The linker
>> then uses this file as if it had been specified precisely by name.
>
> How about just changing that to read
>
> ...a file named @file{lib@var{library}.so}; or, if shared libraries are 
> not supported, are disabled via @option{-static}, or no @samp{.so} file 
> is found, @file{lib@var{library}.a}.

Nice, indeed, more readable than what I came up with plus info on
-static added.  Looks good to me.

-- 
Arkadiusz Drabczyk

[wwwdocs] Add common C++ issues to /gcc-6/porting_to.html

2016-02-02 Thread Jonathan Wakely


This documents the most likely problems for C++ programs using GCC 6.

Committed to CVS.

Index: htdocs/gcc-6/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/porting_to.html,v
retrieving revision 1.2
diff -u -r1.2 porting_to.html
--- htdocs/gcc-6/porting_to.html	27 Jan 2016 14:40:26 -	1.2
+++ htdocs/gcc-6/porting_to.html	2 Feb 2016 20:32:29 -
@@ -33,6 +33,132 @@
 
 C++ language issues
 
+Default standard is now GNU++14
+
+
+GCC defaults to -std=gnu++14 instead of -std=gnu++98.
+This brings several changes that users should be aware of.  The following
+paragraphs describe some of these changes and suggest how to deal with them.
+
+
+Some users might prefer to stay with gnu++98, in which case we suggest to
+use the -std=gnu++98 command-line option, perhaps by putting it
+in CXXFLAGS or similar variables in Makefiles.
+
+Narrowing conversions
+
+
+The C++11 standard does not allow "narrowing conversions" inside braced
+initialization lists, meaning conversions to a type with less precision or
+a smaller range, for example:
+
+
+int i = 127;
+char s[] = { i, 256 };
+
+
+
+In the above example the value 127 would fit in char but
+because it's not a constant it is still a narrowing conversion. If the value
+256 is larger than CHAR_MAX then that is also a narrowing
+conversion. Narrowing conversions can be avoided by using an explicit cast,
+e.g. (char)i.
+
+
+Invalid literal suffixes
+
+
+The C++11 "user-defined literals" feature allows custom suffixes to be added
+to literals, so that for example "Hello, world!"s creates a
+std::string object. This means that code relying on string
+concatenation of string literals and macros might fail to compile, for
+example using printf("%"PRIu64, uint64_value) is not valid in
+C++11, because PRIu64 is parsed as a literal suffix. To fix
+the code to compile in C++11 add whitespace between the string literal and the
+macro: printf("%" PRIu64, uint64_value).
+
+
+Cannot convert 'bool' to 'T*'
+
+
+The current C++ standard only allows integer literals to be used as null
+pointer constants, so other constants such as false and
+(1 - 1) cannot be used where a null pointer is desired. Code that
+fails to compile with this error should be changed to use nullptr,
+or 0, or NULL.
+
+
+Cannot convert 'std::ostream' to 'bool'
+
+
+Since C++11 iostream classes are no longer implicitly convertible to
+void* so it is no longer valid to do something like:
+
+
+  bool valid(std::ostream& os) { return os; }
+
+
+
+Such code must be changed to convert the iostream object to bool
+explicitly:
+
+
+
+  bool valid(std::ostream& os) { return (bool)os; }
+
+
+Header dependency changes
+
+
+The  header has been changed to reduce the
+number of other headers it includes in C++11 mode or above.
+As such, C++ programs that used components defined in
+, , or
+ without explicitly including the right headers
+will no longer compile.
+
+
+Header  changes
+
+
+Some C libraries declare obsolete int isinf(double) or
+int isnan(double) functions in the 
+header. These functions conflict with standard C++ functions with the same
+name but a different return type (the C++ functions return bool).
+When the obsolete functions are declared by the C library the C++ library
+will use them and import them into namespace std
+instead of defining the correct signatures.
+
+
+Header  changes
+
+
+The C++ library now provides its own  header that
+wraps the C library header of the same name. The C++ header defines
+additional overloads of some functions and ensures that all standard
+functions are defined as real functions and not as macros.
+Code which assumes that sin, cos, pow,
+isfinite etc. are macros may no longer compile.
+
+
+Header  changes
+
+
+The C++ library now provides its own  header that
+wraps the C library header of the same name. The C++ header defines
+additional overloads of some functions and ensures that all standard
+functions are defined as real functions and not as macros.
+Code which assumes that abs, malloc etc.
+are macros may no longer compile.
+
+
+
+Programs which provide their own wrappers for 
+or other standard headers are operating outside the standard and so are
+responsible for ensuring their headers work correctly with the headers in
+the C++ standard library.
+
+
 -Wmisleading-indentation
 
 A new warning -Wmisleading-indentation was added
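
As a rough illustration of three of the fixes described in the patch above (narrowing conversions, literal suffixes, and the stream-to-bool conversion), here is a small C++ sketch.  The function names are made up for the example; the fixes themselves are the ones the text suggests, written with static_cast rather than C-style casts.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>
#include <sstream>
#include <string>

// Narrowing: a non-constant int inside a braced initializer needs an
// explicit cast, even when its value would fit in a char.
inline char first_char(int i) {
    char s[] = { static_cast<char>(i), 'x' };
    return s[0];
}

// Literal suffixes: keep whitespace between the string literal and the
// macro, otherwise PRIu64 is parsed as a user-defined literal suffix.
inline std::string format_u64(std::uint64_t value) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" PRIu64, value);
    return buf;
}

// Streams: the conversion to bool is explicit since C++11.
inline bool valid(std::ostream& os) { return static_cast<bool>(os); }
```

Each function compiles with -std=gnu++14; removing the cast in first_char or valid, or the space before PRIu64, should reproduce the corresponding diagnostic.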

[PATCH] vector-compare-4.c

2016-02-02 Thread Segher Boessenkool

This testcase fails on 32-bit powerpc-linux with

Excess errors:
/home/segher/src/gcc/gcc/testsuite/c-c++-common/vector-compare-4.c:31:1: 
warning: GCC vector returned by reference: non-standard ABI extension with no 
compatibility guarantee

Fix this as in vector-compare-2.c .

Tested on powerpc64-linux, -m32 and -m64; installing as obvious.


Segher


2016-02-02  Segher Boessenkool  

testsuite/
* c-c++-common/vector-compare-4.c: Prune "non-standard ABI extension"
warning.

---
 gcc/testsuite/c-c++-common/vector-compare-4.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/vector-compare-4.c 
b/gcc/testsuite/c-c++-common/vector-compare-4.c
index b44f474..b05decc 100644
--- a/gcc/testsuite/c-c++-common/vector-compare-4.c
+++ b/gcc/testsuite/c-c++-common/vector-compare-4.c
@@ -1,6 +1,8 @@
 /* PR c/68062 */
 /* { dg-do compile } */
 /* { dg-options "-Wsign-compare" } */
+/* Ignore warning on some powerpc configurations. */
+/* { dg-prune-output "non-standard ABI extension" } */
 
 typedef signed char __attribute__ ((vector_size (4))) v4qi;
 typedef unsigned char __attribute__ ((vector_size (4))) uv4qi;
-- 
1.9.3

Turnoff prefetching for -march=znver1

2016-02-02 Thread Stepanyan, Victoria

Hi Maintainers,

This patch disables prefetching for -march=znver1 which is turned on by default.

gcc/ChangeLog:

2016-02-02 Victoria Stepanyan 

* gcc/config/i386/x86-tune.def: Disable default prefetching for 
-march=znver1

Ok for trunk?

Victoria

--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -218,7 +218,7 @@ DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
at -O3.  For the moment, the prefetching seems badly tuned for Intel
chips.  */
DEF_TUNE (X86_TUNE_SOFTWARE_PREFETCHING_BENEFICIAL, 
"software_prefetching_beneficial",
-  m_K6_GEODE | m_AMD_MULTIPLE)
+  m_K6_GEODE | m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER)

/* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
on 16-bit immediate moves into memory on Core2 and Corei7.  */
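
The effect of the change can be pictured with a small C++ sketch, assuming each processor family is one bit and a tuning applies when the selected processor's bit is in the tuning's mask.  All names and bit positions below are hypothetical stand-ins, not the real m_* definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical processor bits; the real m_* masks live in the i386 back end.
constexpr std::uint32_t kK6Geode  = 1u << 0;
constexpr std::uint32_t kAthlonK8 = 1u << 1;
constexpr std::uint32_t kAmdFam10 = 1u << 2;
constexpr std::uint32_t kBdver    = 1u << 3;
constexpr std::uint32_t kBtver    = 1u << 4;
constexpr std::uint32_t kZnver1   = 1u << 5;

// Mask after the patch: the AMD families are listed one by one and
// znver1 is left out, so the tuning no longer applies to it.
constexpr std::uint32_t kPrefetchBeneficial =
    kK6Geode | kAthlonK8 | kAmdFam10 | kBdver | kBtver;

// A tuning is enabled when the selected processor's bit is in the mask.
constexpr bool tune_enabled(std::uint32_t mask, std::uint32_t cpu) {
    return (mask & cpu) != 0;
}
```

Spelling the families out, instead of using a combined mask, is what lets a single processor be dropped without touching the others.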

[Patch, Fortran] PR 69495: unused-label warning does not tell which flag triggered it

2016-02-02 Thread Janus Weil

Hi all,

here is a diagnostics patch, which makes sure that the responsible
flag is printed in several warning messages (for which this was still
missing).

The only case that I'm not completely sure about is the hunk in
intrinsic.c. In particular I was not able to trigger this warning and
found no occurrence of it in the testsuite. Could someone check if the
flag that I'm using there is correct, please?

As a small extra the patch also mentions the -Wpedantic flag in the
gfortran documentation.

It regtests cleanly on x86_64-linux-gnu. Ok for trunk?

Cheers,
Janus


2016-02-01  Janus Weil  

PR fortran/69495
* invoke.texi: Mention -Wpedantic as an alias of -pedantic.
* check.c (gfc_check_transfer): Mention responsible flag in warning
message.
* frontend-passes.c (do_warn_function_elimination): Ditto.
* intrinsic.c (gfc_check_intrinsic_standard): Ditto.
* resolve.c (resolve_elemental_actual): Ditto.
(resolve_operator): Ditto.
(warn_unused_fortran_label): Ditto.
* trans-common.c (translate_common): Ditto.


2016-02-01  Janus Weil  

PR fortran/69495
* gfortran.dg/elemental_optional_args_6.f90: Use -Wpedantic flag.
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (Revision 233091)
+++ gcc/fortran/check.c (Arbeitskopie)
@@ -5180,9 +5180,9 @@ gfc_check_transfer (gfc_expr *source, gfc_expr *mo
 return true;
 
   if (source_size < result_size)
-gfc_warning (0, "Intrinsic TRANSFER at %L has partly undefined result: "
-"source size %ld < result size %ld", &source->where,
-(long) source_size, (long) result_size);
+gfc_warning (OPT_Wsurprising, "Intrinsic TRANSFER at %L has partly "
+"undefined result: source size %ld < result size %ld",
+&source->where, (long) source_size, (long) result_size);
 
   return true;
 }
Index: gcc/fortran/frontend-passes.c
===
--- gcc/fortran/frontend-passes.c   (Revision 233091)
+++ gcc/fortran/frontend-passes.c   (Arbeitskopie)
@@ -715,11 +715,11 @@ do_warn_function_elimination (gfc_expr *e)
   if (e->expr_type != EXPR_FUNCTION)
 return;
   if (e->value.function.esym)
-gfc_warning (0, "Removing call to function %qs at %L",
-e->value.function.esym->name, &(e->where));
+gfc_warning (OPT_Wfunction_elimination, "Removing call to function %qs "
+"at %L", e->value.function.esym->name, &(e->where));
   else if (e->value.function.isym)
-gfc_warning (0, "Removing call to function %qs at %L",
-e->value.function.isym->name, &(e->where));
+gfc_warning (OPT_Wfunction_elimination, "Removing call to function %qs "
+"at %L", e->value.function.isym->name, &(e->where));
 }
 /* Callback function for the code walker for doing common function
elimination.  This builds up the list of functions in the expression
Index: gcc/fortran/intrinsic.c
===
--- gcc/fortran/intrinsic.c (Revision 233091)
+++ gcc/fortran/intrinsic.c (Arbeitskopie)
@@ -4369,7 +4369,7 @@ gfc_check_intrinsic_standard (const gfc_intrinsic_
 {
   /* Do only print a warning if not a GNU extension.  */
   if (!silent && isym->standard != GFC_STD_GNU)
-   gfc_warning (0, "Intrinsic %qs (is %s) is used at %L",
+   gfc_warning (OPT_Wintrinsics_std, "Intrinsic %qs (is %s) is used at %L",
 isym->name, _(symstd_msg), &where);
 
   return true;
Index: gcc/fortran/invoke.texi
===
--- gcc/fortran/invoke.texi (Revision 233091)
+++ gcc/fortran/invoke.texi (Arbeitskopie)
@@ -709,8 +709,10 @@ Check the code for syntax errors, but do not actua
 will generate module files for each module present in the code, but no
 other output file.
 
-@item -pedantic
+@item -Wpedantic
+@itemx -pedantic
 @opindex @code{pedantic}
+@opindex @code{Wpedantic}
 Issue warnings for uses of extensions to Fortran 95.
 @option{-pedantic} also applies to C-language constructs where they
 occur in GNU Fortran source files, such as use of @samp{\e} in a
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (Revision 233091)
+++ gcc/fortran/resolve.c   (Arbeitskopie)
@@ -2127,9 +2127,9 @@ resolve_elemental_actual (gfc_expr *expr, gfc_code
  && (set_by_optional || arg->expr->rank != rank)
  && !(isym && isym->id == GFC_ISYM_CONVERSION))
{
- gfc_warning (0, "%qs at %L is an array and OPTIONAL; IF IT IS "
-  "MISSING, it cannot be the actual argument of an "
-  "ELEMENTAL procedure unless there is a non-optional "
+ gfc_warning (OPT_Wpedantic, "%qs at %L is an array and OPTIONAL; "
+

New template for 'gcc' made available

2016-02-02 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

http://translationproject.org/POT-files/gcc-6.1-b20160131.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

ftp://gcc.gnu.org/pub/gcc/snapshots/6-20160131/gcc-6-20160131.tar.bz2

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] c/69540 - update documentation on -l

2016-02-02 Thread Sandra Loosemore


On 01/30/2016 10:33 AM, Arkadiusz Drabczyk wrote:

* doc/invoke.texi: update documentation WRT .so libraries in -l
---
  gcc/ChangeLog   | 4 
  gcc/doc/invoke.texi | 8 +---
  2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1d60690..0a6acdb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-01-30  Arkadiusz Drabczyk  
+
+   * doc/invoke.texi: update documentation WRT .so libraries in -l
+
  2016-01-29  Martin Jambor  

* hsa-gen.c (get_memory_order_name): Mask with MEMMODEL_BASE_MASK.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ba0b4b2..8b1b329 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10440,9 +10440,11 @@ whose members are object files.  The linker handles an 
archive file by
  scanning through it for members which define symbols that have so far
  been referenced but not defined.  But if the file that is found is an
  ordinary object file, it is linked in the usual fashion.  The only
-difference between using an @option{-l} option and specifying a file name
-is that @option{-l} surrounds @var{library} with @samp{lib} and @samp{.a}
-and searches several directories.
+difference between using an @option{-l} option and specifying a file
+name is that @option{-l} surrounds @var{library} with @samp{lib} and
+@samp{.so} on systems with shared libraries support or with @samp{.a} if
+@var{library} with @samp{.so} is not found and on all other systems and
+searches several directories.

  @item -lobjc
  @opindex lobjc



I see that the documentation of -l does need to be updated to mention 
.so files, but I think your patch doesn't go far enough.  It's already 
confusing because that sentence says "The only difference is...", and 
then mentions *two* things it does differently, and you're adding even 
more things.


Instead, I suggest dropping this confusing sentence entirely and putting 
the new information a couple paragraphs higher up:



The linker searches a standard list of directories for the library,
which is actually a file named @file{lib@var{library}.a}. The linker
then uses this file as if it had been specified precisely by name.


How about just changing that to read

...a file named @file{lib@var{library}.so}; or, if shared libraries are 
not supported, are disabled via @option{-static}, or no @samp{.so} file 
is found, @file{lib@var{library}.a}.


??

-Sandra
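
The proposed wording can be read as a simple preference order, sketched below in C++.  This is only a model of the documented behaviour under the assumptions stated in the comments; the function name is hypothetical and a real linker also has to walk its list of search directories.

```cpp
#include <cassert>
#include <string>
#include <vector>

// File names tried for "-l<library>", in order of preference, following
// the proposed text: lib<library>.so first, unless shared libraries are
// unsupported or disabled via -static, then lib<library>.a.
inline std::vector<std::string> library_candidates(const std::string& library,
                                                   bool shared_supported,
                                                   bool static_requested) {
    std::vector<std::string> names;
    if (shared_supported && !static_requested)
        names.push_back("lib" + library + ".so");
    names.push_back("lib" + library + ".a");  // fallback, or the only choice
    return names;
}
```

For example, library_candidates("m", true, false) yields libm.so followed by libm.a, while passing true for static_requested leaves only libm.a.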

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Marcin Kościelnicki


On 02/02/16 19:33, Ulrich Weigand wrote:

Marcin Kościelnicki wrote:


Here we go.  I've also removed the "see below", since I don't really
see anything below...


The "see below" refers to this code (which I agree isn't really obvious):

   if (TARGET_TPF_PROFILING)
 {
   /* Generate a BAS instruction to serve as a function
  entry intercept to facilitate the use of tracing
  algorithms located at the branch target.  */
   emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoids using r14 as temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)


Very well, I'll improve the comment.



I do have two other issues; sorry for bringing those up again although
they've been discussed in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
   [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
  (match_operand 1 "bras_sym_operand" "X")
  (match_operand 2 "consttable_operand" "X")
  (match_operand 3 "consttable_operand" "X")]
 UNSPECV_SPLIT_STACK_DATA)]
   ""
{
   switch_to_section (targetm.asm_out.function_rodata_section
   (current_function_decl));

   output_asm_insn (\".align 3\", operands);
   (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
  CODE_LABEL_NUMBER (operands[0]));
   output_asm_insn (\".quad %2\", operands);
   output_asm_insn (\".quad %3\", operands);
   output_asm_insn (\".quad %1-%0\", operands);

   switch_to_section (current_function_section ());
   return "";
}
   [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.


When I wasn't using reorg, I had problems with gcc deleting the label in 
.rodata, since it wasn't used by any jump instruction.  I guess having a 
whole-block instruction that emits the label on its own should solve the 
issue, though - let's try that.



The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:


Both currently supported
architectures always emit split-stack code on every function.


At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

   if (!info->push_p)
 return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there a reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.


Try that on powerpc64(le):

$ cat a.c
#include <stdio.h>

void f(void) {
}

typedef void (*fptr)(void);

fptr g(void);

int main() {
printf("%p\n", g());
}

$ cat b.c
void f(void);

typedef void (*fptr)(void);

fptr g(void) {
return f;
}

$ gcc -O3 -fsplit-stack -c b.c
$ gcc -O3 -c a.c
$ gcc a.o b.o -fuse-ld=gold

I don't have a recent enough gcc for powerpc, but from what I've seen in 
the code, this should explode with a linker error.


Of course, mixin

New template for 'cpplib' made available

2016-02-02 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'cpplib' has been made available
to the language teams for translation.  It is archived as:

http://translationproject.org/POT-files/cpplib-6.1-b20160131.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

ftp://gcc.gnu.org/pub/gcc/snapshots/6-20160131/gcc-6-20160131.tar.bz2

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PING] Add new mexecute-only arm option.

2016-02-02 Thread Sandra Loosemore


On 02/02/2016 02:06 AM, mickael guene wrote:

Hi All,

  Ping for following thread :

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01968.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01969.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01970.html


Two comments:

(1) MIPS has had a similar option for quite some time called 
-mcode-readable=.  It might be less confusing to use a similar name for 
the ARM option with the similar reversed sense to -mexecute-only, even 
if it doesn't need to be a tristate flag like for MIPS.


(2) I suggest changing the help string for the command line option


+
+mexecute-only
+Target Report Var(target_execute_only) Init(0)
+Forbid load into text sections.


to use the same wording as the documentation in the manual:


+@item -mexecute-only
+@opindex mexecute-only
+Disable read memory access inside code sections.  Only code fetching is
+allowed.
+This option is off by default.
+


Or at least, "load into text sections" is confusing.  (You load *from* 
the text section, not *into* it, right?)


-Sandra

[committed, PATCH] Add IA MCU tests for passing/returning of empty structures/unions

2016-02-02 Thread H.J. Lu

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8277dff..afe4720 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2016-02-02  H.J. Lu  
+
+   * gcc.target/i386/iamcu/test_empty_structs_and_unions.c: New test.
+
 2016-02-02  James Norris  
 
* c-c++-common/goacc/routine-5.c: Add tests.
diff --git 
a/gcc/testsuite/gcc.target/i386/iamcu/test_empty_structs_and_unions.c 
b/gcc/testsuite/gcc.target/i386/iamcu/test_empty_structs_and_unions.c
new file mode 100644
index 000..15209e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/iamcu/test_empty_structs_and_unions.c
@@ -0,0 +1,61 @@
+/* This tests passing and returning of empty structures and unions.  */
+
+#include "defines.h"
+#include "args.h"
+
+struct IntegerRegisters iregbits = { ~0, ~0, ~0, ~0, ~0, ~0 };
+struct IntegerRegisters iregs;
+unsigned int num_iregs;
+
+struct empty_struct
+{
+};
+
+struct empty_struct
+check_struct_passing(struct empty_struct s0 ATTRIBUTE_UNUSED,
+struct empty_struct s1 ATTRIBUTE_UNUSED,
+int i0 ATTRIBUTE_UNUSED)
+{
+  struct empty_struct s;
+  check_int_arguments;
+  return s;
+}
+
+#define check_struct_passing WRAP_CALL(check_struct_passing)
+
+union empty_union
+{
+};
+
+union empty_union
+check_union_passing(union empty_union u0 ATTRIBUTE_UNUSED,
+   union empty_union u1 ATTRIBUTE_UNUSED,
+   int i0 ATTRIBUTE_UNUSED)
+{
+  union empty_union u;
+  check_int_arguments;
+  return u;
+}
+
+#define check_union_passing WRAP_CALL(check_union_passing)
+
+int
+main (void)
+{
+  struct empty_struct s;
+  union empty_union u;
+
+  clear_struct_registers;
+  iregs.I0 = 32;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_union_passing(u,u,32);
+
+  clear_struct_registers;
+  iregs.I0 = 33;
+  num_iregs = 1;
+  clear_int_hardware_registers;
+  check_struct_passing(s,s,33);
+
+  return 0;
+}

Re: [Patch, fortran, pr67451, gcc-5, v1] [5/6 Regression] ICE with sourced allocation from coarray

2016-02-02 Thread Paul Richard Thomas

Hi Andre,

This one looks good too. As every day goes by, I see more and more why
Tobias was so keen to incorporate all objects into a single descriptor
type :-)

OK for 5-branch.

Thanks for both the patches

Paul

On 1 February 2016 at 13:34, Andre Vehreschild  wrote:
> Oh, well, now with attachments. I am sorry.
>
> - Andre
>
> On Mon, 1 Feb 2016 13:20:24 +0100
> Andre Vehreschild  wrote:
>
>> Hi all,
>>
>> here is the backport of the patch for pr67451 for gcc-5. Because the
>> structure of the allocate() in trunk is quite different the patch looks
>> somewhat different, too, but essentially does the same.
>>
>> Bootstrapped and regtests ok on x86_64-linux-gnu/F23.
>>
>> Ok for gcc-5-branch?
>>
>> Here is the link to the mainline patch:
>> https://gcc.gnu.org/ml/fortran/2016-01/msg00093.html
>>
>> Regards,
>>   Andre
>>
>> On Fri, 29 Jan 2016 19:17:24 +0100
>> Andre Vehreschild  wrote:
>>
>> > Hi all,
>> >
>> > attached is a patch to fix a regression in current gfortran when a
>> > coarray is used in the source=-expression of an allocate(). The ICE was
>> > caused by the class information, i.e., _vptr and so on, not at the
>> > expected place. The patch fixes this.
>> >
>> > The patch also fixes pr69418, which I will flag as a duplicate in a
>> > second.
>> >
>> > Bootstrapped and regtested ok on x86_64-linux-gnu/F23.
>> >
>> > Ok for trunk?
>> >
>> > Backport to gcc-5 is pending, albeit more difficult, because the
>> > allocate() implementation on 5 is not as advanced as the one in 6.
>> >
>> > Regards,
>> > Andre
>>
>>
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Ulrich Weigand

Marcin Kościelnicki wrote:

> Here we go.  I've also removed the "see below", since I don't really
> see anything below...

The "see below" refers to this code (which I agree isn't really obvious):

  if (TARGET_TPF_PROFILING)
{
  /* Generate a BAS instruction to serve as a function
 entry intercept to facilitate the use of tracing
 algorithms located at the branch target.  */
  emit_insn (gen_prologue_tpf ());

What is not explicitly called out here is that this tracing function
actually refers to some hard registers, in particular r14, and assumes
they still have the original contents as at function entry.

That is why the prolog code avoids using r14 as a temporary if the TPF
tracing mechanism is in use.  Now I think this doesn't apply to r12,
so this part of your patch should still be fine.  (In addition, TPF
is not going to support split stacks --or indeed the Go language--
anyway, so it doesn't really matter all that much.)

I do have two other issues; sorry for bringing those up again although
they've been discussed in the past, but I still think we can find
some improvements here ...

The first is the question Andreas brought up, why we need the extra
set of insns introduced by s390_reorg.  I think this may really have
been necessary for the ESA case where data elements had to be intermixed
into code at a specific location.  But since we no longer support ESA,
we now just have a data block that can be placed anywhere.  For example,
we could just have an insn (at any point in the prolog stream) that
simply emits the full data block during final output, along the lines of
(note: needs to be updated for SImode vs. DImode.):

(define_insn "split_stack_data"
  [(unspec_volatile [(match_operand 0 "bras_sym_operand" "X")
 (match_operand 1 "bras_sym_operand" "X")
 (match_operand 2 "consttable_operand" "X")
 (match_operand 3 "consttable_operand" "X")]
UNSPECV_SPLIT_STACK_DATA)]
  ""
{
  switch_to_section (targetm.asm_out.function_rodata_section
  (current_function_decl));

  output_asm_insn (\".align 3\", operands);
  (*targetm.asm_out.internal_label) (asm_out_file, \"L\",
 CODE_LABEL_NUMBER (operands[0]));
  output_asm_insn (\".quad %2\", operands);
  output_asm_insn (\".quad %3\", operands);
  output_asm_insn (\".quad %1-%0\", operands);

  switch_to_section (current_function_section ());
  return "";
}
  [(set_attr "length" "0")])

Or possibly even cleaner, we can simply define the data block at the
tree level as if it were an initialized global variable of a certain
struct type, and just leave it to common code to emit it as usual.
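
A minimal C sketch of that tree-level alternative, to make the layout
concrete.  It assumes the three .quad entries are a frame size, an
argument size and a label difference; the struct name, the field names
and the helper are hypothetical, and only the "three 64-bit entries in
this order" layout is taken from the insn above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the split-stack parameter block: three
   64-bit entries, in the order the insn above emits them.  */
struct split_stack_data
{
  uint64_t frame_size;     /* assumed meaning of operand 2 */
  uint64_t args_size;      /* assumed meaning of operand 3 */
  int64_t target_offset;   /* operand 1 minus operand 0 (label difference) */
};

/* Build the block the way a static initializer would.  TARGET and
   LABEL stand in for the two symbol addresses.  */
static struct split_stack_data
make_split_stack_data (uint64_t frame_size, uint64_t args_size,
                       intptr_t target, intptr_t label)
{
  struct split_stack_data d
    = { frame_size, args_size, (int64_t) (target - label) };

  /* The block is expected to be exactly three quads, no padding.  */
  assert (sizeof d == 3 * sizeof (uint64_t));
  return d;
}
```

With such a type, common code would emit the block like any other
initialized read-only object, and the back end would only need to
reference its label.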

Then we just have the code bits, but I don't really see much
difference between the split_stack_call and split_stack_sibcall
patterns (apart from the data block), so if code flow is OK with
the former insns, it should be OK with the latter too ..

[ Or else, if there *are* code flow issues, the other alternative
would be to emit the full call sequence, code and data, from a
single insn pattern during final output.  This might have the extra
benefit that the assembler sequence is fully fixed, and thus easier
to detect in the linker.  ]

Getting rid of the extra transformation in s390_reorg would not
just remove a bunch of code from the back-end (always good!),
it would also speed up compile time a bit.

The second issue I'm still not sure about is the magic nop marker
for frameless functions.  In an earlier mail you wrote:

> Both currently supported 
> architectures always emit split-stack code on every function.

At least for rs6000 this doesn't appear to be true; in
rs6000_expand_split_stack_prologue we have:

  if (!info->push_p)
return;

so it does nothing for frameless routines.

Now on i386 we do indeed generate code for frameless routines;
in fact, the *same* full stack check is generated as for any
other routine.  Now I'm wondering: is there a reason why
this check would be necessary (and there's simply a bug in
the rs6000 implementation)?  Then we obviously should do the
same on s390.

On the other hand, if rs6000 works fine *without* any code
in frameless routines, why wouldn't that work for s390 too?

Emitting a nop (that is always executed) still looks weird to me.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com

Re: [Patch, fortran, pr67451, v1] [5/6 Regression] ICE with sourced allocation from coarray

2016-02-02 Thread Paul Richard Thomas

Hi Andre,

This looks to be OK for trunk.

I'll move to the 5-branch patch right away.

Thanks

Paul

On 29 January 2016 at 19:17, Andre Vehreschild  wrote:
> Hi all,
>
> attached is a patch to fix a regression in current gfortran when a
> coarray is used in the source=-expression of an allocate(). The ICE was
> caused by the class information, i.e., _vptr and so on, not at the
> expected place. The patch fixes this.
>
> The patch also fixes pr69418, which I will flag as a duplicate in a
> second.
>
> Bootstrapped and regtested ok on x86_64-linux-gnu/F23.
>
> Ok for trunk?
>
> Backport to gcc-5 is pending, albeit more difficult, because the
> allocate() implementation on 5 is not as advanced as the one in 6.
>
> Regards,
> Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein

Re: [PATCH] [graphite] document that isl-0.16 is supported

2016-02-02 Thread Mike Stump

On Feb 2, 2016, at 2:23 AM, Sebastian Huber 
 wrote:
> It would be good to have a recommended version as well (similar for cloog, 
> gmp, mpc and mpfr). If you present me three versions which one should I 
> choose as a naive user?

The latest release, or the one on your system.  This is so basic that we expect 
you to already know this.

> Are the versions in the contrib/download_prerequisites script the recommended 
> ones?

Yes, they are.

Re: PR 69577: Invalid RA of destination subregs

2016-02-02 Thread Richard Sandiford

[Resending without disclaimer, sorry]

Uros Bizjak  writes:
> On Tue, Feb 2, 2016 at 5:54 PM, Kyrill Tkachov
>  wrote:
>> Hi Richard,
>>
>>
>> On 02/02/16 14:56, Richard Sandiford wrote:
>>>
>>> In PR 69577 we have:
>>>
>>>A: (set (reg:V2TI X) ...)
>>>B: (set (subreg:TI (reg:V2TI X) 0) ...)
>>>
>>> X gets allocated to an AVX register, as usual for V2TI.  The problem is
>>> that the movti for B doesn't then preserve the other half of X, even
>>> though the subreg semantics are supposed to guarantee that.
>>>
>>> If instead the same value had been set by:
>>>
>>>A': (set (subreg:TI (reg:V2TI X) 16) ...)
>>>B: (set (subreg:TI (reg:V2TI X) 0) ...)
>>>
>>> the subreg in A' would have prevented the use of AVX registers for X,
>>> since you can't directly access the high part.
>>>
>>> IMO these are really the same thing.  An alternative way to view it
>>> is that the original sequence is equivalent to:
>>>
>>>A: (set (reg:V2TI X) ...)
>>>B1: (set (subreg:TI (reg:V2TI X) 0) ...)
>>>B2: (set (subreg:TI (reg:V2TI X) 16) (subreg:TI (reg:V2TI X) 16))
>>>
>>> in which B2 is a no-op and therefore implicit.  The handling ought
>>> to be the same regardless of whether there is an rtl insn that
>>> explicitly assigns to (subreg:TI (reg:V2TI X) 16).
>>>
>>> This patch implements that idea.  Hopefully the comments explain
>>> what's going on.
>>>
>>> Tested on x86_64-linux-gnu so far.  Will test on aarch64-linux-gnu and
>>> arm-linux-gnueabihf as well.  OK to install if the additional testing
>>> succeeds?
>>
>>
>> For me this patch causes an ICE when building libgcc during an
>> aarch64-none-elf build.
>> It's a segfault with the trace:
>> 0xb0ac2a crash_signal
>> $SRC/gcc/toplev.c:335
>> 0xa7cfd7 init_subregs_of_mode()
>> $SRC/gcc/reginfo.c:1345
>> 0x96fc4b init_costs
>> $SRC/gcc/ira-costs.c:2187
>> 0x97419e ira_set_pseudo_classes(bool, _IO_FILE*)
>> $SRC/gcc/ira-costs.c:2237
>> 0x106fd1e alloc_global_sched_pressure_data
>> $SRC/gcc/haifa-sched.c:7244
>> 0x106fd1e sched_init()
>> $SRC/gcc/haifa-sched.c:7394
>> 0x107109a haifa_sched_init()
>> $SRC/gcc/haifa-sched.c:7406
>> 0xab37ac schedule_insns()
>> $SRC/gcc/sched-rgn.c:3504
>> 0xab3f5b rest_of_handle_sched
>> $SRC/gcc/sched-rgn.c:3717
>> 0xab3f5b execute
>> $SRC/gcc/sched-rgn.c:3825
>
> Also on x86_64-linux-gnu when building -m32 multilib:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00d28264 in init_subregs_of_mode () at
> /home/uros/gcc-svn/trunk/gcc/reginfo.c:1345
> 1345        FOR_EACH_INSN_DEF (def, insn)
> (gdb) p insn
> $1 = (rtx_insn *) 0x7fffef9f4d40
> (gdb) p debug_rtx (insn)
> (code_label 60 31 39 10 9 "" [3 uses])
> $2 = void
> (gdb) p def
> $3 = (df_ref) 0x0

Bah, sorry.  I test with --enable-checking=yes,rtl,df, and it turns out
that df checking masks this kind of problem.  -m32 builds (and tests)
fine with it but not without.

Here's the patch again with the obvious fix.  Retesting now with just
--enable-checking=yes,rtl.

Thanks,
Richard


gcc/
PR rtl-optimization/69577
* reginfo.c (record_subregs_of_mode): Add a partial_def parameter.
(find_subregs_of_mode): Update accordingly.  Iterate over partial
definitions.

gcc/testsuite/
PR rtl-optimization/69577
* gcc.target/i386/pr69577.c: New test.

diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 6814eed..ccf53bf 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1244,8 +1244,16 @@ simplifiable_subregs (const subreg_shape &shape)
 static HARD_REG_SET **valid_mode_changes;
 static obstack valid_mode_changes_obstack;
 
+/* Restrict the choice of register for SUBREG_REG (SUBREG) based
+   on information about SUBREG.
+
+   If PARTIAL_DEF, SUBREG is a partial definition of a multipart inner
+   register and we want to ensure that the other parts of the inner
+   register are correctly preserved.  If !PARTIAL_DEF we need to
+   ensure that SUBREG itself can be formed.  */
+
 static void
-record_subregs_of_mode (rtx subreg)
+record_subregs_of_mode (rtx subreg, bool partial_def)
 {
   unsigned int regno;
 
@@ -1256,15 +1264,41 @@ record_subregs_of_mode (rtx subreg)
   if (regno < FIRST_PSEUDO_REGISTER)
 return;
 
+  subreg_shape shape (shape_of_subreg (subreg));
+  if (partial_def)
+{
+  /* The number of independently-accessible SHAPE.outer_mode values
+in SHAPE.inner_mode is GET_MODE_SIZE (SHAPE.inner_mode) / SIZE.
+We need to check that the assignment will preserve all the other
+SIZE-byte chunks in the inner register besides the one that
+includes SUBREG.
+
+In practice it is enough to check whether an equivalent
+SHAPE.inner_mode value in an adjacent SIZE-byte chunk can be formed.
+If the underlying registers are small enough, both subregs will
+be valid.  If the underlying registers are too large, one of the
+subregs will be invalid.
+
+This

[hsa branch] Map collapse(2) and collapse(3) to HSA grid dimensions

2016-02-02 Thread Martin Jambor

Hi,

with HSA merged, the hsa branch can be used for development of new
features again.  Thus, I have committed there a patch which I finished
after the merge proposal and thus I kept in a private branch so far,
which allows collapse(2) and collapse(3) clauses to be gridified and
the individual loops to be directly mapped to HSA grid dimensions.

In order to achieve that, I needed to introduce HSA-specific builtins
which expand to HSAIL instructions giving information about specific
HSA grid dimensions.  I hope I have done that right, any comments are
welcome.

Other than that, the changes are small because as I was restructuring
the code, I was moving it in this direction for some time already.
Committed to the branch (a few days ago actually, sorry for that).
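
For readers unfamiliar with the construct, here is a small C sketch of
the kind of loop nest a collapse(2) clause applies to.  Whether this
exact form is recognized as gridifiable depends on the pattern matching
in the branch; the function name and array sizes are made up for
illustration.  Without -fopenmp the pragma is ignored and the loops run
sequentially with the same result.

```c
#include <assert.h>

#define ROWS 4
#define COLS 8

/* Fill OUT[i][j] = i * COLS + j.  With collapse(2) both loops form a
   single two-dimensional iteration space, which is what could be
   mapped onto two grid dimensions.  */
static void
fill_grid (int out[ROWS][COLS])
{
  assert (out != 0);

#pragma omp target teams distribute parallel for collapse(2) \
  map(from: out[0:ROWS])
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      out[i][j] = i * COLS + j;
}
```

A collapse(3) nest would look the same with one more perfectly nested
loop.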

Thanks,

Martin


2016-01-26  Martin Jambor  

gcc/
* Makefile.in (BUILTINS_DEF): Add hsa-builtins.def.
* builtins.def: Include hsa-builtins.def.
(DEF_HSA_BUILTIN): Define.
* hsa-builtins.def: New file.
* hsa-gen.c (query_hsa_grid): Accept dimension as an hsa_op_immed.
Add a new override.
(gen_hsa_insns_for_call): Handle BUILT_IN_HSA_GET_WORKITEM_ABSID.
* omp-low.c (grid_get_kernel_launch_attributes): Support up to
three dimensions.
(grid_expand_omp_for_loop): Likewise.
(lower_omp_for_lastprivate): Do not extract looptemps from grid loops.
(grid_target_follows_gridifiable_pattern): Allow collapse up to 3.
* tree-inline.h (copy_body_data): New field
decl_creation_prevention_level.  Moved remap_var_for_cilk to minimize
padding.

gcc/fortran/
* f95-lang.c: Include hsa-builtins.def.
(DEF_HSA_BUILTIN): Define.

libgomp/
* plugin/plugin-hsa.c (parse_target_attributes): Support up to three
dimensions.
(get_group_size): New function.
(GOMP_OFFLOAD_run): Support up to three dimensions.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ab9cbbf..a996708 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -899,7 +899,8 @@ RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 PARAMS_H = params.h params-enum.h params.def
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
-   gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def
+   gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def \
+   hsa-builtins.def
 INTERNAL_FN_DEF = internal-fn.def
 INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2fc7f65..14d2335 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -188,6 +188,16 @@ along with GCC; see the file COPYING3.  If not see
|| flag_cilkplus \
|| flag_offload_abi != OFFLOAD_ABI_UNSET))
 
+#undef DEF_HSA_BUILTIN
+#ifdef ENABLE_HSA
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
+  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
+   false, false, true, ATTRS, false, \
+  (!flag_disable_hsa))
+#else
+#define DEF_HSA_BUILTIN(ENUM, NAME, TYPE, ATTRS)
+#endif
+
 /* Builtin used by implementation of Cilk Plus.  Most of these are decomposed
by the compiler but a few are implemented in libcilkrts.  */ 
 #undef DEF_CILK_BUILTIN_STUB
@@ -932,6 +942,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
ATTR_NOTHROW_LEAF_LIST)
 /* Offloading and Multi Processing builtins.  */
 #include "omp-builtins.def"
 
+/* Heterogeneous Systems Architecture.  */
+#include "hsa-builtins.def"
+
 /* Cilk keywords builtins.  */
 #include "cilk-builtins.def"
 
diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 9c3a311..efa750de 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -1234,6 +1234,17 @@ gfc_init_builtin_functions (void)
 #undef DEF_GOMP_BUILTIN
 }
 
+#ifdef ENABLE_HSA
+  if (!flag_disable_hsa)
+{
+#undef DEF_HSA_BUILTIN
+#define DEF_HSA_BUILTIN(code, name, type, attr) \
+  gfc_define_builtin ("__builtin_" name, builtin_types[type], \
+ code, name, attr);
+#include "../hsa-builtins.def"
+}
+#endif
+
   gfc_define_builtin ("__builtin_trap", builtin_types[BT_FN_VOID],
  BUILT_IN_TRAP, NULL, ATTR_NOTHROW_LEAF_LIST);
   TREE_THIS_VOLATILE (builtin_decl_explicit (BUILT_IN_TRAP)) = 1;
diff --git a/gcc/hsa-builtins.def b/gcc/hsa-builtins.def
new file mode 100644
index 000..e4681c1
--- /dev/null
+++ b/gcc/hsa-builtins.def
@@ -0,0 +1,31 @@
+/* This file contains the definitions and documentation for the
+   Offloading and Multi Processing builtins used in the GNU compiler.
+   Copyright (C) 2005-2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3

[patch, Fortran, pr69296, v1] [6 Regression] [F03] Problem with associate and vector subscript

2016-02-02 Thread Andre Vehreschild

Hi all,

the attached patch fixes a regression that was most likely introduced
by one of my former patches, when in an associate() the rank of the
associated variable could not be determined at parse time correctly.
The patch now adds a flag to the association list indicating that the
rank of the associated variable has only been guessed. In the resolve
phase the rank is corrected when the guess was wrong.
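
The guess-then-correct idea can be sketched in C as a toy model.  This
is not the gfortran code: the enum, the struct and both function names
are hypothetical, and the parse-time rule is simplified to "a subscript
of unknown shape is counted as a scalar".

```c
#include <assert.h>

enum dimen_kind { DIM_ELEMENT, DIM_RANGE, DIM_UNKNOWN };

struct assoc
{
  int rank;
  unsigned rankguessed : 1;
};

/* Parse time: only ranges are known to contribute a dimension.  A
   subscript of unknown shape (e.g. a vector subscript) is counted as
   a scalar, so the result is flagged as a guess.  */
static struct assoc
parse_guess_rank (const enum dimen_kind *dims, int n)
{
  struct assoc a = { 0, 1 };

  assert (n >= 0);
  for (int d = 0; d < n; d++)
    if (dims[d] == DIM_RANGE)
      a.rank++;
  return a;
}

/* Resolution: the real rank of the target is known now, so a guessed
   rank is overwritten if it turned out to be wrong.  */
static void
resolve_rank (struct assoc *a, int target_rank)
{
  if (a->rankguessed && a->rank != target_rank)
    a->rank = target_rank;
  a->rankguessed = 0;
}
```

For a target like a(:,i(:,1)) the model guesses rank 1 at parse time
and corrects it to 2 during resolution.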

Bootstrapped and regtested ok on x86_64-linux-gnu/F23.

Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 8441b8c..33fffd8 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2344,6 +2344,9 @@ typedef struct gfc_association_list
  for memory handling.  */
   unsigned dangling:1;
 
+  /* True when the rank of the target expression is guessed during parsing.  */
+  unsigned rankguessed:1;
+
   char name[GFC_MAX_SYMBOL_LEN + 1];
   gfc_symtree *st; /* Symtree corresponding to name.  */
   locus where;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 5dcab70..7bce47f 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -4098,6 +4098,7 @@ parse_associate (void)
 	  int dim, rank = 0;
 	  if (array_ref)
 	{
+	  a->rankguessed = 1;
 	  /* Count the dimension, that have a non-scalar extend.  */
 	  for (dim = 0; dim < array_ref->dimen; ++dim)
 		if (array_ref->dimen_type[dim] != DIMEN_ELEMENT
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 8752fd4..8fb7a95 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -4777,7 +4777,7 @@ fail:
 /* Given a variable expression node, compute the rank of the expression by
examining the base symbol and any reference structures it may have.  */
 
-static void
+void
 expression_rank (gfc_expr *e)
 {
   gfc_ref *ref;
@@ -8153,16 +8153,19 @@ resolve_assoc_var (gfc_symbol* sym, bool resolve_target)
   if (target->rank != 0)
 {
   gfc_array_spec *as;
-  if (sym->ts.type != BT_CLASS && !sym->as)
+  /* The rank may be incorrectly guessed at parsing, therefore make sure
+	 it is corrected now.  */
+  if (sym->ts.type != BT_CLASS && (!sym->as || sym->assoc->rankguessed))
 	{
-	  as = gfc_get_array_spec ();
+	  if (!sym->as)
+	sym->as = gfc_get_array_spec ();
+	  as = sym->as;
 	  as->rank = target->rank;
 	  as->type = AS_DEFERRED;
 	  as->corank = gfc_get_corank (target);
 	  sym->attr.dimension = 1;
 	  if (as->corank != 0)
 	sym->attr.codimension = 1;
-	  sym->as = as;
 	}
 }
   else
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 5143c31..cb54499 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -1569,7 +1569,9 @@ trans_associate_var (gfc_symbol *sym, gfc_wrapped_block *block)
   if (sym->attr.subref_array_pointer)
 	{
 	  gcc_assert (e->expr_type == EXPR_VARIABLE);
-	  tmp = e->symtree->n.sym->backend_decl;
+	  tmp = e->symtree->n.sym->ts.type == BT_CLASS
+	  ? gfc_class_data_get (e->symtree->n.sym->backend_decl)
+	  : e->symtree->n.sym->backend_decl;
 	  tmp = gfc_get_element_type (TREE_TYPE (tmp));
 	  tmp = fold_convert (gfc_array_index_type, size_in_bytes (tmp));
 	  gfc_add_modify (&se.pre, GFC_DECL_SPAN(desc), tmp);
diff --git a/gcc/testsuite/gfortran.dg/associate_19.f03 b/gcc/testsuite/gfortran.dg/associate_19.f03
new file mode 100644
index 000..76534c5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_19.f03
@@ -0,0 +1,23 @@
+! { dg-do run }
+!
+! Contributed by mreste...@gmail.com
+! Adapated by Andre Vehreschild  
+! Test that fix for PR69296 is working.
+
+program p
+ implicit none
+
+ integer :: j, a(2,6), i(3,2)
+
+  a(1,:) = (/ ( j , j=1,6) /)
+  a(2,:) = (/ ( -10*j , j=1,6) /)
+
+  i(:,1) = (/ 1 , 3 , 5 /)
+  i(:,2) = (/ 4 , 5 , 6 /)
+
+  associate( ai => a(:,i(:,1)) )
+if (any(shape(ai) /= [2, 3])) call abort()
+if (any(reshape(ai, [6]) /= [1 , -10, 3, -30, 5, -50])) call abort()
+  end associate
+
+end program p
diff --git a/gcc/testsuite/gfortran.dg/associate_20.f03 b/gcc/testsuite/gfortran.dg/associate_20.f03
new file mode 100644
index 000..9d420ef
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_20.f03
@@ -0,0 +1,31 @@
+! { dg-do run }
+!
+! Contributed by mreste...@gmail.com
+! Adapated by Andre Vehreschild  
+! Test that fix for PR69296 is working.
+
+program p
+  implicit none
+
+  type foo
+integer :: i
+  end type
+
+  integer :: j, i(3,2)
+  class(foo), allocatable :: a(:,:)
+
+  allocate (a(2,6))
+
+  a(1,:)%i = (/ ( j , j=1,6) /)
+  a(2,:)%i = (/ ( -10*j , j=1,6) /)
+
+  i(:,1) = (/ 1 , 3 , 5 /)
+  i(:,2) = (/ 4 , 5 , 6 /)
+
+  associate( ai => a(:,i(:,1))%i )
+if (any(shape(ai) /= [2, 3])) call abort()
+if (any(reshape(ai, [6]) /= [1 , -10, 3, -30, 5, -50])) call abort()
+  end associate
+
+  deallocate(a)
+end program p
gcc/fortran/ChangeLog:

2016-02-02  Andre Vehreschild  

PR fortran/69296
* gfortran.h

Re: PR 69577: Invalid RA of destination subregs

2016-02-02 Thread Uros Bizjak

On Tue, Feb 2, 2016 at 5:54 PM, Kyrill Tkachov
 wrote:
> Hi Richard,
>
>
> On 02/02/16 14:56, Richard Sandiford wrote:
>>
>> In PR 69577 we have:
>>
>>A: (set (reg:V2TI X) ...)
>>B: (set (subreg:TI (reg:V2TI X) 0) ...)
>>
>> X gets allocated to an AVX register, as usual for V2TI.  The problem is
>> that the movti for B doesn't then preserve the other half of X, even
>> though the subreg semantics are supposed to guarantee that.
>>
>> If instead the same value had been set by:
>>
>>A': (set (subreg:TI (reg:V2TI X) 16) ...)
>>B: (set (subreg:TI (reg:V2TI X) 0) ...)
>>
>> the subreg in A' would have prevented the use of AVX registers for X,
>> since you can't directly access the high part.
>>
>> IMO these are really the same thing.  An alternative way to view it
>> is that the original sequence is equivalent to:
>>
>>A: (set (reg:V2TI X) ...)
>>B1: (set (subreg:TI (reg:V2TI X) 0) ...)
>>B2: (set (subreg:TI (reg:V2TI X) 16) (subreg:TI (reg:V2TI X) 16))
>>
>> in which B2 is a no-op and therefore implicit.  The handling ought
>> to be the same regardless of whether there is an rtl insn that
>> explicitly assigns to (subreg:TI (reg:V2TI X) 16).
>>
>> This patch implements that idea.  Hopefully the comments explain
>> what's going on.
>>
>> Tested on x86_64-linux-gnu so far.  Will test on aarch64-linux-gnu and
>> arm-linux-gnueabihf as well.  OK to install if the additional testing
>> succeeds?
>
>
> For me this patch causes an ICE when building libgcc during an
> aarch64-none-elf build.
> It's a segfault with the trace:
> 0xb0ac2a crash_signal
> $SRC/gcc/toplev.c:335
> 0xa7cfd7 init_subregs_of_mode()
> $SRC/gcc/reginfo.c:1345
> 0x96fc4b init_costs
> $SRC/gcc/ira-costs.c:2187
> 0x97419e ira_set_pseudo_classes(bool, _IO_FILE*)
> $SRC/gcc/ira-costs.c:2237
> 0x106fd1e alloc_global_sched_pressure_data
> $SRC/gcc/haifa-sched.c:7244
> 0x106fd1e sched_init()
> $SRC/gcc/haifa-sched.c:7394
> 0x107109a haifa_sched_init()
> $SRC/gcc/haifa-sched.c:7406
> 0xab37ac schedule_insns()
> $SRC/gcc/sched-rgn.c:3504
> 0xab3f5b rest_of_handle_sched
> $SRC/gcc/sched-rgn.c:3717
> 0xab3f5b execute
> $SRC/gcc/sched-rgn.c:3825

Also on x86_64-linux-gnu when building -m32 multilib:

Program received signal SIGSEGV, Segmentation fault.
0x00d28264 in init_subregs_of_mode () at
/home/uros/gcc-svn/trunk/gcc/reginfo.c:1345
1345        FOR_EACH_INSN_DEF (def, insn)
(gdb) p insn
$1 = (rtx_insn *) 0x7fffef9f4d40
(gdb) p debug_rtx (insn)
(code_label 60 31 39 10 9 "" [3 uses])
$2 = void
(gdb) p def
$3 = (df_ref) 0x0

Uros.

Re: PR 69577: Invalid RA of destination subregs

2016-02-02 Thread Kyrill Tkachov


Hi Richard,

On 02/02/16 14:56, Richard Sandiford wrote:

In PR 69577 we have:

   A: (set (reg:V2TI X) ...)
   B: (set (subreg:TI (reg:V2TI X) 0) ...)

X gets allocated to an AVX register, as usual for V2TI.  The problem is
that the movti for B doesn't then preserve the other half of X, even
though the subreg semantics are supposed to guarantee that.

If instead the same value had been set by:

   A': (set (subreg:TI (reg:V2TI X) 16) ...)
   B: (set (subreg:TI (reg:V2TI X) 0) ...)

the subreg in A' would have prevented the use of AVX registers for X,
since you can't directly access the high part.

IMO these are really the same thing.  An alternative way to view it
is that the original sequence is equivalent to:

   A: (set (reg:V2TI X) ...)
   B1: (set (subreg:TI (reg:V2TI X) 0) ...)
   B2: (set (subreg:TI (reg:V2TI X) 16) (subreg:TI (reg:V2TI X) 16))

in which B2 is a no-op and therefore implicit.  The handling ought
to be the same regardless of whether there is an rtl insn that
explicitly assigns to (subreg:TI (reg:V2TI X) 16).

This patch implements that idea.  Hopefully the comments explain
what's going on.

Tested on x86_64-linux-gnu so far.  Will test on aarch64-linux-gnu and
arm-linux-gnueabihf as well.  OK to install if the additional testing
succeeds?


For me this patch causes an ICE when building libgcc during an aarch64-none-elf 
build.
It's a segfault with the trace:
0xb0ac2a crash_signal
$SRC/gcc/toplev.c:335
0xa7cfd7 init_subregs_of_mode()
$SRC/gcc/reginfo.c:1345
0x96fc4b init_costs
$SRC/gcc/ira-costs.c:2187
0x97419e ira_set_pseudo_classes(bool, _IO_FILE*)
$SRC/gcc/ira-costs.c:2237
0x106fd1e alloc_global_sched_pressure_data
$SRC/gcc/haifa-sched.c:7244
0x106fd1e sched_init()
$SRC/gcc/haifa-sched.c:7394
0x107109a haifa_sched_init()
$SRC/gcc/haifa-sched.c:7406
0xab37ac schedule_insns()
$SRC/gcc/sched-rgn.c:3504
0xab3f5b rest_of_handle_sched
$SRC/gcc/sched-rgn.c:3717
0xab3f5b execute
$SRC/gcc/sched-rgn.c:3825

Thanks,
Kyrill



Thanks,
Richard


diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 6814eed..afb36aa 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1244,8 +1244,16 @@ simplifiable_subregs (const subreg_shape &shape)
  static HARD_REG_SET **valid_mode_changes;
  static obstack valid_mode_changes_obstack;
  
+/* Restrict the choice of register for SUBREG_REG (SUBREG) based

+   on information about SUBREG.
+
+   If PARTIAL_DEF, SUBREG is a partial definition of a multipart inner
+   register and we want to ensure that the other parts of the inner
+   register are correctly preserved.  If !PARTIAL_DEF we need to
+   ensure that SUBREG itself can be formed.  */
+
  static void
-record_subregs_of_mode (rtx subreg)
+record_subregs_of_mode (rtx subreg, bool partial_def)
  {
unsigned int regno;
  
@@ -1256,15 +1264,41 @@ record_subregs_of_mode (rtx subreg)

if (regno < FIRST_PSEUDO_REGISTER)
  return;
  
+  subreg_shape shape (shape_of_subreg (subreg));

+  if (partial_def)
+{
+  /* The number of independently-accessible SHAPE.outer_mode values
+in SHAPE.inner_mode is GET_MODE_SIZE (SHAPE.inner_mode) / SIZE.
+We need to check that the assignment will preserve all the other
+SIZE-byte chunks in the inner register besides the one that
+includes SUBREG.
+
+In practice it is enough to check whether an equivalent
+SHAPE.inner_mode value in an adjacent SIZE-byte chunk can be formed.
+If the underlying registers are small enough, both subregs will
+be valid.  If the underlying registers are too large, one of the
+subregs will be invalid.
+
+This relies on the fact that we've already been passed
+SUBREG with PARTIAL_DEF set to false.  */
+  unsigned int size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
+  GET_MODE_SIZE (shape.outer_mode));
+  gcc_checking_assert (size < GET_MODE_SIZE (shape.inner_mode));
+  if (shape.offset >= size)
+   shape.offset -= size;
+  else
+   shape.offset += size;
+}
+
if (valid_mode_changes[regno])
  AND_HARD_REG_SET (*valid_mode_changes[regno],
- simplifiable_subregs (shape_of_subreg (subreg)));
+ simplifiable_subregs (shape));
else
  {
valid_mode_changes[regno]
= XOBNEW (&valid_mode_changes_obstack, HARD_REG_SET);
COPY_HARD_REG_SET (*valid_mode_changes[regno],
-simplifiable_subregs (shape_of_subreg (subreg)));
+simplifiable_subregs (shape));
  }
  }
  
@@ -1277,7 +1311,7 @@ find_subregs_of_mode (rtx x)

int i;
  
if (code == SUBREG)

-record_subregs_of_mode (x);
+record_subregs_of_mode (x, false);
  
/* Time for some deep diving.  */

for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
@@ -1304,8 +1338,15 @@ init_subregs_of_mode (void)
  
FOR_EACH_BB_FN (bb, cfun)
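
The offset adjustment in the record_subregs_of_mode hunk above can be
tried out in isolation with the following C sketch.  The function name
is hypothetical, and the natural register size is passed in as a plain
parameter because its real value is target-specific; the arithmetic
itself follows the hunk.

```c
#include <assert.h>

/* Return the byte offset of the chunk adjacent to the one at OFFSET.
   The chunk size is the larger of the natural register size and the
   outer mode size, and must be smaller than the inner mode size.  */
static unsigned int
adjacent_chunk_offset (unsigned int offset, unsigned int inner_size,
                       unsigned int outer_size, unsigned int natural_size)
{
  unsigned int size = natural_size > outer_size ? natural_size : outer_size;

  assert (size < inner_size);
  return offset >= size ? offset - size : offset + size;
}
```

For the PR's example (a TI write at offset 0 of a 32-byte V2TI value)
the function returns 16, i.e. the check asks whether
(subreg:TI (reg:V2TI X) 16) could also be formed.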

Re: [PATCH] Fix PR c++/69139 (deduction failure with trailing return type)

2016-02-02 Thread Patrick Palka

On Tue, Jan 26, 2016 at 11:31 AM, Patrick Palka  wrote:
> This patch makes the parser more robust in determining whether an 'auto'
> specifier that appears in a parameter declaration corresponds to a
> placeholder for a late return type, or corresponds to an implicit
> template parameter as for an abbreviated function template.
>
> Bootstrap + regtest in progress on x86_64-pc-linux-gnu, will also test
> this change against Boost.  OK to commit if testing succeeds?  What
> about for GCC 4.9/5?
>
> gcc/cp/ChangeLog:
>
> PR c++/69139
> * parser.c (cp_parser_simple_type_specifier): Make the check
> for disambiguating between an 'auto' placeholder and an implicit
> template parameter more robust.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/69139
> * g++.dg/cpp0x/auto47.C: New test.
> ---
>  gcc/cp/parser.c | 33 +++--
>  gcc/testsuite/g++.dg/cpp0x/auto47.C | 20 
>  2 files changed, 43 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/auto47.C
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index d03b0c9..56c834f 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -16032,20 +16032,33 @@ cp_parser_simple_type_specifier (cp_parser* parser,
>   /* The 'auto' might be the placeholder return type for a function 
> decl
>  with trailing return type.  */
>   bool have_trailing_return_fn_decl = false;
> - if (cp_lexer_peek_nth_token (parser->lexer, 2)->type
> - == CPP_OPEN_PAREN)
> +
> + cp_parser_parse_tentatively (parser);
> + cp_lexer_consume_token (parser->lexer);
> + while (cp_lexer_next_token_is_not (parser->lexer, CPP_EQ)
> +&& cp_lexer_next_token_is_not (parser->lexer, CPP_COMMA)
> +&& cp_lexer_next_token_is_not (parser->lexer, 
> CPP_CLOSE_PAREN)
> +&& cp_lexer_next_token_is_not (parser->lexer, CPP_EOF))
> {
> - cp_parser_parse_tentatively (parser);
> - cp_lexer_consume_token (parser->lexer);
> - cp_lexer_consume_token (parser->lexer);
> - if (cp_parser_skip_to_closing_parenthesis (parser,
> + if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
> +   {
> + cp_lexer_consume_token (parser->lexer);
> + cp_parser_skip_to_closing_parenthesis (parser,
>  /*recovering*/false,
>  /*or_comma*/false,
> -
> /*consume_paren*/true))
> -   have_trailing_return_fn_decl
> - = cp_lexer_next_token_is (parser->lexer, CPP_DEREF);
> - cp_parser_abort_tentative_parse (parser);
> +
> /*consume_paren*/true);
> + continue;
> +   }
> +
> + if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
> +   {
> + have_trailing_return_fn_decl = true;
> + break;
> +   }
> +
> + cp_lexer_consume_token (parser->lexer);
> }
> + cp_parser_abort_tentative_parse (parser);
>
>   if (have_trailing_return_fn_decl)
> {
> diff --git a/gcc/testsuite/g++.dg/cpp0x/auto47.C 
> b/gcc/testsuite/g++.dg/cpp0x/auto47.C
> new file mode 100644
> index 000..08adf31
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/auto47.C
> @@ -0,0 +1,20 @@
> +// PR c++/69139
> +// { dg-do compile { target c++11 } }
> +
> +auto get(int) -> int { return {}; }
> +template  int f(auto (*)(int) -> R) { return {}; }
> +int i = f(get);
> +
> +int foo1 (auto (int) -> char);
> +
> +int foo2 (auto f(int) -> char);
> +
> +int foo2 (auto (f)(int) -> char);
> +
> +int foo3 (auto (*f)(int) -> char);
> +
> +int foo4 (auto (*const **&f)(int) -> char);
> +
> +int foo5 (auto (*const **&f)(int, int *) -> char);
> +
> +int foo6 (auto (int) const -> char); // { dg-error "const" }
> --
> 2.7.0.134.gf5046bd.dirty
>

Ping.
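
For reference, the scan introduced by the quoted parser.c hunk can be
modelled in C roughly as follows.  This is a toy that works on
characters instead of preprocessor tokens, and the function names are
hypothetical; it only illustrates the control flow (stop at '=', ','
or ')', skip parenthesized groups, succeed on "->").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Skip past the ')' matching an already-consumed '('.  Returns a
   pointer just after it, or to the terminating NUL if unbalanced.  */
static const char *
skip_to_closing_paren (const char *p)
{
  int depth = 1;

  while (*p && depth > 0)
    {
      if (*p == '(')
        depth++;
      else if (*p == ')')
        depth--;
      p++;
    }
  return p;
}

/* P points just after an 'auto' in a parameter declaration.  Return
   true if a "->" follows before the declarator ends.  */
static bool
auto_has_trailing_return (const char *p)
{
  assert (p != NULL);
  while (*p && *p != '=' && *p != ',' && *p != ')')
    {
      if (*p == '(')
        {
          p = skip_to_closing_paren (p + 1);
          continue;
        }
      if (p[0] == '-' && p[1] == '>')
        return true;
      p++;
    }
  return false;
}
```

Unlike the old check, which only looked at the token right after
'auto', this also accepts declarators such as (*f)(int) and
(*const **&f)(int, int *).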

[PATCH, 386]: Fix PR67032, Geode optimizations incorrectly return -NaN

2016-02-02 Thread Uros Bizjak

Hello!

The problem, exposed by the testcase in the PR, was the generation of
unwanted MMX registers. Instructions that touch %mm registers switch
the x87 register stack to MMX mode and thereby clobber all x87
registers in the register stack.

The core of the problem was the totally bogus cost values for MMX and
SSE moves, which tricked the register allocator into allocating an
unnecessary %mm register.

The patch changes these values to those of the pentiumpro processor.

BTW: I gave up on constructing a testcase, because even a small
perturbation of the source caused the %mm registers to disappear.

2016-02-02  Uros Bizjak  

PR target/67032
* config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Also,
I have checked that there were no MMX registers generated for the
original (preprocessed) testcase.

Patch was committed to mainline SVN and will be committed to all
release branches.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b500233..121e802 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -595,17 +595,17 @@ struct processor_costs geode_cost = {
   {4, 6, 6},   /* cost of storing fp registers
   in SFmode, DFmode and XFmode */
 
-  1,   /* cost of moving MMX register */
-  {1, 1},  /* cost of loading MMX registers
+  2,   /* cost of moving MMX register */
+  {2, 2},  /* cost of loading MMX registers
   in SImode and DImode */
-  {1, 1},  /* cost of storing MMX registers
+  {2, 2},  /* cost of storing MMX registers
   in SImode and DImode */
-  1,   /* cost of moving SSE register */
-  {1, 1, 1},   /* cost of loading SSE registers
+  2,   /* cost of moving SSE register */
+  {2, 2, 8},   /* cost of loading SSE registers
   in SImode, DImode and TImode */
-  {1, 1, 1},   /* cost of storing SSE registers
+  {2, 2, 8},   /* cost of storing SSE registers
   in SImode, DImode and TImode */
-  1,   /* MMX or SSE register to integer */
+  3,   /* MMX or SSE register to integer */
   64,  /* size of l1 cache.  */
   128, /* size of l2 cache.  */
   32,  /* size of prefetch block */

Re: [Patch, avr] Restore default value of PARAM_ALLOW_STORE_DATA_RACES to 1

2016-02-02 Thread Denis Chertykov

2016-02-01 16:56 GMT+03:00 Senthil Kumar Selvaraj:
>
> Hi,
>
>   This patch sets PARAM_ALLOW_STORE_DATA_RACES to 1 (the default until
>   a year back), to avoid code size regressions in trunk (and probably
>   5.x) for the AVR target.
>
>   Consider the following piece of code
>
> volatile int z;
> void foo(int x)
> {
> static char i;
> for (i=0; i< 4; ++i)
> {
> if (x > 2)
> z = 1;
> else
> z = 2;
> }
> }
>
> Unmodified gcc trunk generates this
>
> movw r20,r24
> sts i.1495,__zero_reg__
> ldi r25,0
> ldi r18,0
> ldi r22,lo8(2)
> ldi r23,0
> ldi r30,lo8(1)
> ldi r31,0
> .L2:
> cpi r25,lo8(4)
> brne .L5
> cpse r18,__zero_reg__
> sts i.1495,r25
> .L1:
> ret
> .L5:
> cpi r20,3
> cpc r21,__zero_reg__
> brlt .L3
> sts z+1,r31
> sts z,r30
> .L4:
> subi r25,lo8(-(1))
> ldi r18,lo8(1)
> rjmp .L2
> .L3:
> sts z+1,r23
> sts z,r22
> rjmp .L4
> .size   foo, .-foo
> .local  i.1495
> .comm   i.1495,1,1
> .comm   z,2,1
> .ident  "GCC: (GNU) 6.0.0 20160201 (experimental)"
>
> Note the usage of an extra reg (r18) that is used as a flag to
> record loop entry (in .L4), and the conditional store of r25 to i in .L2.
>
> In 4.x, there is no extra reg usage - only a single unconditional set of
> i to r25 at the end of the loop.
>
> Digging into the code, I found that LIM checks
> PARAM_ALLOW_STORE_DATA_RACES and introduces the flag to avoid store data
> races - see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52558. The
> default value of the param was set to zero a year and a half back - see
> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01548.html.
>
> For AVR, I guess assuming any store can cause a data race is too
> pessimistic for the general case. Globals shared with interrupts will
> need special handling for atomic access anyway, so I thought we should
> revert the default back to allow store data races.
>
> If this is ok, could someone commit please? I don't have commit access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2016-02-01  Senthil Kumar Selvaraj  
>
> * config/avr/avr.c (avr_option_override): Set
> PARAM_ALLOW_STORE_DATA_RACES to 1.
>

Committed.

Denis.
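
For readers following the quoted analysis, the two code shapes can be
sketched in plain C. This is only an illustration of the idea, not GCC
output: foo_flagged mimics store motion guarded by a flag (what trunk
generated, with the flag playing the role of r18), and
foo_unconditional mimics the single unconditional store described for
4.x. Both function names and the local names are hypothetical.

```c
volatile int z;
static char i;

/* Store motion with a guard flag: the counter lives in a local during
   the loop and is written back to i only if the loop body ran, so no
   store is introduced on a path that originally had none.  */
void foo_flagged(int x)
{
    char tmp;
    int stored = 0;              /* hypothetical name; r18 in the listing */

    i = 0;
    for (tmp = 0; tmp < 4; ++tmp) {
        z = (x > 2) ? 1 : 2;
        stored = 1;
    }
    if (stored)
        i = tmp;
}

/* Without the flag: one unconditional store to i after the loop.  */
void foo_unconditional(int x)
{
    char tmp;

    for (tmp = 0; tmp < 4; ++tmp)
        z = (x > 2) ? 1 : 2;
    i = tmp;
}
```

Both variants leave i and z in the same final state for this loop; the
difference is only the extra flag and the conditional store, which is
the code size cost discussed above.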

Re: [PATCH] Fix wide_int unsigned division (PR tree-optimization/69546, take 2)

2016-02-02 Thread Richard Sandiford

Jakub Jelinek  writes:
> On Sat, Jan 30, 2016 at 02:04:45PM +, Richard Sandiford wrote:
>> Not sure what to call it.  Maybe canonize_uhwi?  Like canonize, except
>> that it takes a uhwi instead of a length.
>> 
>> > Can that be done as a follow-up?  Certainly it would need
>> > to take the uhwi to store, pointer to the array of hwis, and precision.
>> 
>> Yeah, guess it can wait.
>
> So like this?
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-02-01  Jakub Jelinek  
>
>   * wide-int.cc (canonize_uhwi): New function.
>   (wi::divmod_internal): Use it.
>
> --- gcc/wide-int.cc.jj 2016-01-30 19:03:35.0 +0100
> +++ gcc/wide-int.cc   2016-02-01 12:28:23.501519292 +0100
> @@ -118,6 +118,20 @@ canonize (HOST_WIDE_INT *val, unsigned i
>return 1;
>  }
>  
> +/* VAL[0] is unsigned result of operation.  Canonize it by adding
> +   another 0 block if needed, and return number of blocks needed.  */
> +
> +static inline unsigned int
> +canonize_uhwi (HOST_WIDE_INT *val, unsigned int precision)

s/is unsigned result of operation/is the unsigned result of an operation/

LGTM otherwise, thanks.

Richard
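
The quoted hunk stops at the function signature, so the body is not
visible here. A minimal sketch of what the comment describes, assuming
64-bit blocks and using a hypothetical name (canonize_uhwi_sketch) to
make clear it is not the committed code:

```c
#include <stdint.h>

#define HOST_BITS_PER_WIDE_INT 64   /* assumption: 64-bit blocks */
typedef int64_t HOST_WIDE_INT;

/* val[0] holds an unsigned result stored in a signed block.  If its
   top bit is set and the precision is wider than one block, the value
   would read back as negative, so append a zero block.  Returns the
   number of blocks used.  */
static unsigned int
canonize_uhwi_sketch (HOST_WIDE_INT *val, unsigned int precision)
{
  if (val[0] < 0 && precision > HOST_BITS_PER_WIDE_INT)
    {
      val[1] = 0;
      return 2;
    }
  return 1;
}
```

When the precision fits in a single block there is nothing to extend,
so the sketch returns 1 and leaves val[1] untouched.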

[PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Marcin Kościelnicki

libgcc/ChangeLog:

* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
* config/s390/morestack.S: New file.
* config/s390/t-stack-s390: New file.
* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

* common/config/s390/s390-common.c (s390_supports_split_stack):
New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
* config/s390/s390.c (struct machine_function): New field
split_stack_varargs_pointer.
(s390_register_info): Mark r12 as clobbered if it'll be used as temp
in s390_emit_prologue.
(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
vararg pointer.
(morestack_ref): New global.
(SPLIT_STACK_AVAILABLE): New macro.
(s390_expand_split_stack_prologue): New function.
(s390_expand_split_stack_call): New function.
(s390_live_on_entry): New function.
(s390_va_start): Use split-stack vararg pointer if appropriate.
(s390_reorg): Lower the split-stack pseudo-insns.
(s390_asm_file_end): Emit the split-stack note sections.
(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
(UNSPECV_SPLIT_STACK_CALL): New unspec.
(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
(UNSPECV_SPLIT_STACK_MARKER): New unspec.
(split_stack_prologue): New expand.
(split_stack_call): New expand.
(split_stack_call_*): New insn.
(split_stack_cond_call): New expand.
(split_stack_cond_call_*): New insn.
(split_stack_space_check): New expand.
(split_stack_sibcall): New expand.
(split_stack_sibcall_*): New insn.
(split_stack_cond_sibcall): New expand.
(split_stack_cond_sibcall_*): New insn.
(split_stack_marker): New insn.
---
Here we go.  I've also removed the "see below", since I don't really
see anything below...

 gcc/ChangeLog                        |  37 +++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 323 ++-
 gcc/config/s390/s390.md              | 177 ++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1171 insertions(+), 7 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..af86079 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,40 @@
+2016-02-02  Marcin Kościelnicki  
+
+   * common/config/s390/s390-common.c (s390_supports_split_stack):
+   New function.
+   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
+   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+   * config/s390/s390.c (struct machine_function): New field
+   split_stack_varargs_pointer.
+   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
+   in s390_emit_prologue.
+   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+   vararg pointer.
+   (morestack_ref): New global.
+   (SPLIT_STACK_AVAILABLE): New macro.
+   (s390_expand_split_stack_prologue): New function.
+   (s390_expand_split_stack_call): New function.
+   (s390_live_on_entry): New function.
+   (s390_va_start): Use split-stack vararg pointer if appropriate.
+   (s390_reorg): Lower the split-stack pseudo-insns.
+   (s390_asm_file_end): Emit the split-stack note sections.
+   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+   (UNSPECV_SPLIT_STACK_CALL): New unspec.
+   (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+   (UNSPECV_SPLIT_STACK_MARKER): New unspec.
+   (split_stack_prologue): New expand.
+   (split_stack_call): New expand.
+   (split_stack_call_*): New insn.
+   (split_stack_cond_call): New expand.
+   (split_stack_cond_call_*): New insn.
+   (split_stack_space_check): New expand.
+   (split_stack_sibcall): New expand.
+   (split_stack_sibcall_*): New insn.
+   (split_stack_cond_sibcall): New expand.
+   (split_stack_cond_sibcall_*): New insn.
+   (split_stack_marker): New insn.
+
 2016-02-02  Thomas Schwinge  
 
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
diff --git a/gcc/common/config/s390/s390-common.c b/gcc/common/config/s390/s390-common.c
index 4519c21..1e497e6 100644
--- a/gcc/common/config/s390/s390-common.c
+++ b/gcc/common/config/s390/s390-common.c
@@ -105,6 +105,17 @@ s390_handle_option (struct gcc_options *op

[PATCH COMMITTED] MAINTAINERS (Write After Approval): Add myself

2016-02-02 Thread Claudiu Zissulescu

From: claziss 

Adding myself.

2016-02-02  Claudiu Zissulescu  

* MAINTAINERS (Write After Approval): Add myself.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233077 138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   | 4 
 MAINTAINERS | 1 +
 2 files changed, 5 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 9a7e48f..4ea16fa 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2016-02-02  Claudiu Zissulescu  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+
 2016-01-29  Sebastian Pop  
 
* config/isl.m4: Add comments about isl-0.16.
diff --git a/MAINTAINERS b/MAINTAINERS
index f8fa798..a7f2beb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -635,6 +635,7 @@ Yufeng Zhang

 Shujing Zhao   
 Jon Ziegler
 Roman Zippel   
+Claudiu Zissulescu 
 Josef Zlomek   
 
Bug database only accounts
-- 
1.9.1

Re: [PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Andreas Krebbel

On 02/02/2016 03:52 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
>   * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>   * config/s390/morestack.S: New file.
>   * config/s390/t-stack-s390: New file.
>   * generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>   * config/s390/s390.c (struct machine_function): New field
>   split_stack_varargs_pointer.
>   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>   in s390_emit_prologue.
>   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>   vararg pointer.
>   (morestack_ref): New global.
>   (SPLIT_STACK_AVAILABLE): New macro.
>   (s390_expand_split_stack_prologue): New function.
>   (s390_expand_split_stack_call): New function.
>   (s390_live_on_entry): New function.
>   (s390_va_start): Use split-stack vararg pointer if appropriate.
>   (s390_reorg): Lower the split-stack pseudo-insns.
>   (s390_asm_file_end): Emit the split-stack note sections.
>   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL): New unspec.
>   (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
>   (UNSPECV_SPLIT_STACK_MARKER): New unspec.
>   (split_stack_prologue): New expand.
>   (split_stack_call): New expand.
>   (split_stack_call_*): New insn.
>   (split_stack_cond_call): New expand.
>   (split_stack_cond_call_*): New insn.
>   (split_stack_space_check): New expand.
>   (split_stack_sibcall): New expand.
>   (split_stack_sibcall_*): New insn.
>   (split_stack_cond_sibcall): New expand.
>   (split_stack_cond_sibcall_*): New insn.
>   (split_stack_marker): New insn.
> ---
> I've implemented most of your requested changes, with two exceptions:
> 
> - I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
>   since this causes constraint error - I suppose it just cannot be used after
>   reload.
Right. As an elimination reg it cannot be used in the code path called
from s390_reorg.

> - It seems to me there's no problem with TPF and r1 - the conditional you
>   mention is meant to avoid modifying r14 (which we do - by aiming at r1 and
>   r12 for arg pointer and temp, respectively), not to ensure use of r1 as the
>   temporary.  Unless there's a good reason to avoid modifying r12, the code
>   seems fine to me.
Ok. The comment above this check then no longer seems to be correct.
Could you please adjust it as well? It should read "avoid register 14"
then.

  /* Choose best register to use for temp use within prologue.
 See below for why TPF must use the register 1.  */

  if (!has_hard_reg_initial_val (Pmode, RETURN_REGNUM)
  && !crtl->is_leaf
  && !TARGET_TPF_PROFILING)
temp_reg = gen_rtx_REG (Pmode, RETURN_REGNUM);
...

-Andreas-



> 
> As for the testcase we discussed, I'll submit it as a separate patch.
> 
> 
>  gcc/ChangeLog                        |  37 +++
>  gcc/common/config/s390/s390-common.c |  14 +
>  gcc/config/s390/s390-protos.h        |   1 +
>  gcc/config/s390/s390.c               | 321 +-
>  gcc/config/s390/s390.md              | 177 ++
>  libgcc/ChangeLog                     |   7 +
>  libgcc/config.host                   |   4 +-
>  libgcc/config/s390/morestack.S       | 609 +++
>  libgcc/config/s390/t-stack-s390      |   2 +
>  libgcc/generic-morestack.c           |   4 +
>  10 files changed, 1170 insertions(+), 6 deletions(-)
>  create mode 100644 libgcc/config/s390/morestack.S
>  create mode 100644 libgcc/config/s390/t-stack-s390
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 9a2cec8..af86079 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,40 @@
> +2016-02-02  Marcin Kościelnicki  
> +
> + * common/config/s390/s390-common.c (s390_supports_split_stack):
> + New function.
> + (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> + * config/s390/s390.c (struct machine_function): New field
> + split_stack_varargs_pointer.
> + (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> + in s390_emit_prologue.
> + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> + vararg pointer.
> + (morestack_ref): New global.
> + (SPLIT_STACK_AVAILABLE): New macro.
> + (s390_expand_split_stack_prologue): New function.
> + (s390_expand_split_stack_call): New function.
> + (s390_live_on_entry): New function.
> + (s390_va_start): Use split-stack vararg pointer if appropriate.
> + (s390_

PR 69577: Invalid RA of destination subregs

2016-02-02 Thread Richard Sandiford

In PR 69577 we have:

  A: (set (reg:V2TI X) ...)
  B: (set (subreg:TI (reg:V2TI X) 0) ...)

X gets allocated to an AVX register, as usual for V2TI.  The problem is
that the movti for B doesn't then preserve the other half of X, even
though the subreg semantics are supposed to guarantee that.

If instead the same value had been set by:

  A': (set (subreg:TI (reg:V2TI X) 16) ...)
  B: (set (subreg:TI (reg:V2TI X) 0) ...)

the subreg in A' would have prevented the use of AVX registers for X,
since you can't directly access the high part.

IMO these are really the same thing.  An alternative way to view it
is that the original sequence is equivalent to:

  A: (set (reg:V2TI X) ...)
  B1: (set (subreg:TI (reg:V2TI X) 0) ...)
  B2: (set (subreg:TI (reg:V2TI X) 16) (subreg:TI (reg:V2TI X) 16))

in which B2 is a no-op and therefore implicit.  The handling ought
to be the same regardless of whether there is an rtl insn that
explicitly assigns to (subreg:TI (reg:V2TI X) 16).

This patch implements that idea.  Hopefully the comments explain
what's going on.

Tested on x86_64-linux-gnu so far.  Will test on aarch64-linux-gnu and
arm-linux-gnueabihf as well.  OK to install if the additional testing
succeeds?

Thanks,
Richard
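
As a rough illustration of the partial-definition check in the patch
below, the chunk arithmetic can be sketched on plain integers. The
helper names are hypothetical, and the natural register size of 8 bytes
used in the usage note is an assumption for the example, not something
stated in this message.

```c
/* SIZE is the larger of the natural register size and the outer mode
   size, as in the MAX () expression in the patch.  */
static unsigned int
chunk_size (unsigned int natural_size, unsigned int outer_size)
{
  return natural_size > outer_size ? natural_size : outer_size;
}

/* Return the byte offset of the SIZE-byte chunk adjacent to the one at
   OFFSET, mirroring the offset adjustment for PARTIAL_DEF.  */
static unsigned int
adjacent_chunk_offset (unsigned int offset, unsigned int size)
{
  return offset >= size ? offset - size : offset + size;
}
```

For the V2TI case above (32-byte inner mode, 16-byte TImode outer
mode), chunk_size (8, 16) is 16, so a partial definition at offset 0 is
checked as if a subreg at offset 16 also had to be formable, and the
other way round.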


diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 6814eed..afb36aa 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -1244,8 +1244,16 @@ simplifiable_subregs (const subreg_shape &shape)
 static HARD_REG_SET **valid_mode_changes;
 static obstack valid_mode_changes_obstack;
 
+/* Restrict the choice of register for SUBREG_REG (SUBREG) based
+   on information about SUBREG.
+
+   If PARTIAL_DEF, SUBREG is a partial definition of a multipart inner
+   register and we want to ensure that the other parts of the inner
+   register are correctly preserved.  If !PARTIAL_DEF we need to
+   ensure that SUBREG itself can be formed.  */
+
 static void
-record_subregs_of_mode (rtx subreg)
+record_subregs_of_mode (rtx subreg, bool partial_def)
 {
   unsigned int regno;
 
@@ -1256,15 +1264,41 @@ record_subregs_of_mode (rtx subreg)
   if (regno < FIRST_PSEUDO_REGISTER)
 return;
 
+  subreg_shape shape (shape_of_subreg (subreg));
+  if (partial_def)
+{
+  /* The number of independently-accessible SHAPE.outer_mode values
+in SHAPE.inner_mode is GET_MODE_SIZE (SHAPE.inner_mode) / SIZE.
+We need to check that the assignment will preserve all the other
+SIZE-byte chunks in the inner register besides the one that
+includes SUBREG.
+
+In practice it is enough to check whether an equivalent
+SHAPE.inner_mode value in an adjacent SIZE-byte chunk can be formed.
+If the underlying registers are small enough, both subregs will
+be valid.  If the underlying registers are too large, one of the
+subregs will be invalid.
+
+This relies on the fact that we've already been passed
+SUBREG with PARTIAL_DEF set to false.  */
+  unsigned int size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
+  GET_MODE_SIZE (shape.outer_mode));
+  gcc_checking_assert (size < GET_MODE_SIZE (shape.inner_mode));
+  if (shape.offset >= size)
+   shape.offset -= size;
+  else
+   shape.offset += size;
+}
+
   if (valid_mode_changes[regno])
 AND_HARD_REG_SET (*valid_mode_changes[regno],
- simplifiable_subregs (shape_of_subreg (subreg)));
+ simplifiable_subregs (shape));
   else
 {
   valid_mode_changes[regno]
= XOBNEW (&valid_mode_changes_obstack, HARD_REG_SET);
   COPY_HARD_REG_SET (*valid_mode_changes[regno],
-simplifiable_subregs (shape_of_subreg (subreg)));
+simplifiable_subregs (shape));
 }
 }
 
@@ -1277,7 +1311,7 @@ find_subregs_of_mode (rtx x)
   int i;
 
   if (code == SUBREG)
-record_subregs_of_mode (x);
+record_subregs_of_mode (x, false);
 
   /* Time for some deep diving.  */
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
@@ -1304,8 +1338,15 @@ init_subregs_of_mode (void)
 
   FOR_EACH_BB_FN (bb, cfun)
 FOR_BB_INSNS (bb, insn)
-  if (NONDEBUG_INSN_P (insn))
-find_subregs_of_mode (PATTERN (insn));
+  {
+   if (NONDEBUG_INSN_P (insn))
+ find_subregs_of_mode (PATTERN (insn));
+   df_ref def;
+   FOR_EACH_INSN_DEF (def, insn)
+ if (DF_REF_FLAGS_IS_SET (def, DF_REF_PARTIAL)
+ && df_read_modify_subreg_p (DF_REF_REG (def)))
+   record_subregs_of_mode (DF_REF_REG (def), true);
+  }
 }
 
 const HARD_REG_SET *
diff --git a/gcc/testsuite/gcc.target/i386/pr69577.c b/gcc/testsuite/gcc.target/i386/pr69577.c
new file mode 100644
index 000..d680539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr69577.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O -fno-forward-propagate -fno-split-wide-types -mavx"

Re: [PATCH] Fix PR64748

2016-02-02 Thread James Norris


Hi!

On 02/01/2016 02:03 PM, Jakub Jelinek wrote:

On Mon, Feb 01, 2016 at 01:41:50PM -0600, James Norris wrote:

The attached patch resolves c/PR64748. The patch
adds the use of parm's with the deviceptr clause.


 [snip snip]

--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10760,7 +10760,7 @@ c_parser_oacc_data_clause_deviceptr (c_parser *parser, tree list)
 c_parser_omp_var_list_parens() should construct a list of
 locations to go along with the var list.  */

-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))


Please don't write !(x == y) but x != y.


Fixed.




--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -30087,7 +30087,7 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
 c_parser_omp_var_list_parens should construct a list of
 locations to go along with the var list.  */

-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))
error_at (loc, "%qD is not a variable", v);
else if (TREE_TYPE (v) == error_mark_node)
;


For C++, all these diagnostics are premature: if processing_template_decl
you really often don't know what the type will be, not sure if you always
know at least if it is a VAR_DECL, PARM_DECL or something else.  I bet you
can easily ICE with the current POINTER_TYPE_P (TREE_TYPE (v)) check as
in templates the type can be NULL, or it could be some lang type and only
later on become POINTER_TYPE, etc.
For C++ the diagnostics need to be done during finish_omp_clauses or so, not
earlier.


The check has been moved to finish_omp_clauses (). I put the check at
the tail end of the checking, as I wasn't able to determine whether the
if-else-if sequence implies a checking precedence.

Thanks for the review!

Jim
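
For context, the kind of code the C change is meant to accept looks
roughly like the following. This is a hedged sketch, not the testcase
from the patch: the function name and its arguments are made up for
illustration.

```c
/* p is a function parameter (a PARM_DECL) named in a deviceptr clause,
   which is the case PR c/64748 is about.  */
void
scale (float *p, int n, float k)
{
#pragma acc parallel loop deviceptr (p)
  for (int idx = 0; idx < n; idx++)
    p[idx] *= k;
}
```

Built without -fopenacc the pragma is ignored and the loop simply runs
on the host. With a real accelerator, p must actually hold a device
address, for example one obtained from acc_malloc.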


= ChangeLog entries...

gcc/testsuite/

PR c/64748
* c-c++-common/goacc/deviceptr-1.c: Add tests.
* g++.dg/goacc/deviceptr-1.c: New file.


gcc/cp/

PR c/64748
* parser.c (cp_parser_oacc_data_clause_deviceptr): Remove checking.
* semantics.c (finish_omp_clauses): Add deviceptr checking.


gcc/c/

PR c/64748
* c-parser.c (c_parser_oacc_data_clause_deviceptr): Allow parms.



diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 5341f04..f2d114c 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-XX  James Norris  
+
+	PR c/64748
+	* c-parser.c (c_parser_oacc_data_clause_deviceptr): Allow parms.
+
 2016-01-27  Jakub Jelinek  
 
 	PR debug/66869
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index eede3a7..229fd6e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10760,7 +10760,7 @@ c_parser_oacc_data_clause_deviceptr (c_parser *parser, tree list)
 	 c_parser_omp_var_list_parens() should construct a list of
 	 locations to go along with the var list.  */
 
-  if (!VAR_P (v))
+  if (!VAR_P (v) && TREE_CODE (v) != PARM_DECL)
 	error_at (loc, "%qD is not a variable", v);
   else if (TREE_TYPE (v) == error_mark_node)
 	;
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 3b5c9d5..76cf5b1 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,9 @@
+2016-02-XX  James Norris  
+
+	PR c/64748
+	* parser.c (cp_parser_oacc_data_clause_deviceptr): Remove checking.
+	* semantics.c (finish_omp_clauses): Add deviceptr checking.
+
 2016-01-29  Jakub Jelinek  
 
 	PR debug/66869
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d03b0c9..10f3627 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -30080,20 +30080,6 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
   for (t = vars; t; t = TREE_CHAIN (t))
 {
   tree v = TREE_PURPOSE (t);
-
-  /* FIXME diagnostics: Ideally we should keep individual
-	 locations for all the variables in the var list to make the
-	 following errors more precise.  Perhaps
-	 c_parser_omp_var_list_parens should construct a list of
-	 locations to go along with the var list.  */
-
-  if (!VAR_P (v))
-	error_at (loc, "%qD is not a variable", v);
-  else if (TREE_TYPE (v) == error_mark_node)
-	;
-  else if (!POINTER_TYPE_P (TREE_TYPE (v)))
-	error_at (loc, "%qD is not a pointer variable", v);
-
   tree u = build_omp_clause (loc, OMP_CLAUSE_MAP);
   OMP_CLAUSE_SET_MAP_KIND (u, GOMP_MAP_FORCE_DEVICEPTR);
   OMP_CLAUSE_DECL (u) = v;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 95c4f19..1e376b1 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6683,6 +6683,14 @@ finish_omp_clauses (tree clauses, bool allow_fields, bool declare_simd)
 	  error ("%qD appears both in data and map clauses", t);
 	  remove = true;
 	}
+	  else if (!processing_template_decl
+		   && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+		   && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR
+		   && !POINTER_TYPE_P (TREE_TYPE (t)))
+	{
+	  error ("%qD is not a pointer variable", t);
+	  remove = true;
+	}
 	  else
 	{

[PATCH] s390: Add -fsplit-stack support

2016-02-02 Thread Marcin Kościelnicki

libgcc/ChangeLog:

* config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
* config/s390/morestack.S: New file.
* config/s390/t-stack-s390: New file.
* generic-morestack.c (__splitstack_find): Add s390-specific code.

gcc/ChangeLog:

* common/config/s390/s390-common.c (s390_supports_split_stack):
New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
* config/s390/s390.c (struct machine_function): New field
split_stack_varargs_pointer.
(s390_register_info): Mark r12 as clobbered if it'll be used as temp
in s390_emit_prologue.
(s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
vararg pointer.
(morestack_ref): New global.
(SPLIT_STACK_AVAILABLE): New macro.
(s390_expand_split_stack_prologue): New function.
(s390_expand_split_stack_call): New function.
(s390_live_on_entry): New function.
(s390_va_start): Use split-stack vararg pointer if appropriate.
(s390_reorg): Lower the split-stack pseudo-insns.
(s390_asm_file_end): Emit the split-stack note sections.
(TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
* config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
(UNSPECV_SPLIT_STACK_CALL): New unspec.
(UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
(UNSPECV_SPLIT_STACK_MARKER): New unspec.
(split_stack_prologue): New expand.
(split_stack_call): New expand.
(split_stack_call_*): New insn.
(split_stack_cond_call): New expand.
(split_stack_cond_call_*): New insn.
(split_stack_space_check): New expand.
(split_stack_sibcall): New expand.
(split_stack_sibcall_*): New insn.
(split_stack_cond_sibcall): New expand.
(split_stack_cond_sibcall_*): New insn.
(split_stack_marker): New insn.
---
I've implemented most of your requested changes, with two exceptions:

- I don't use virtual_incoming_args_rtx in s390_expand_split_stack_prologue,
  since this causes a constraint error - I suppose it simply cannot be used
  after reload.
- It seems to me there's no problem with TPF and r1 - the conditional you
  mention is meant to avoid modifying r14 (which we do - by aiming at r1 and
  r12 for arg pointer and temp, respectively), not to ensure use of r1 as the
  temporary.  Unless there's a good reason to avoid modifying r12, the code
  seems fine to me.

As for the testcase we discussed, I'll submit it as a separate patch.


 gcc/ChangeLog                        |  37 +++
 gcc/common/config/s390/s390-common.c |  14 +
 gcc/config/s390/s390-protos.h        |   1 +
 gcc/config/s390/s390.c               | 321 +-
 gcc/config/s390/s390.md              | 177 ++
 libgcc/ChangeLog                     |   7 +
 libgcc/config.host                   |   4 +-
 libgcc/config/s390/morestack.S       | 609 +++
 libgcc/config/s390/t-stack-s390      |   2 +
 libgcc/generic-morestack.c           |   4 +
 10 files changed, 1170 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/s390/morestack.S
 create mode 100644 libgcc/config/s390/t-stack-s390

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a2cec8..af86079 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,40 @@
+2016-02-02  Marcin Kościelnicki  
+
+   * common/config/s390/s390-common.c (s390_supports_split_stack):
+   New function.
+   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
+   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
+   * config/s390/s390.c (struct machine_function): New field
+   split_stack_varargs_pointer.
+   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
+   in s390_emit_prologue.
+   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
+   vararg pointer.
+   (morestack_ref): New global.
+   (SPLIT_STACK_AVAILABLE): New macro.
+   (s390_expand_split_stack_prologue): New function.
+   (s390_expand_split_stack_call): New function.
+   (s390_live_on_entry): New function.
+   (s390_va_start): Use split-stack vararg pointer if appropriate.
+   (s390_reorg): Lower the split-stack pseudo-insns.
+   (s390_asm_file_end): Emit the split-stack note sections.
+   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
+   * config/s390/s390.md (UNSPEC_STACK_CHECK): New unspec.
+   (UNSPECV_SPLIT_STACK_CALL): New unspec.
+   (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
+   (UNSPECV_SPLIT_STACK_MARKER): New unspec.
+   (split_stack_prologue): New expand.
+   (split_stack_call): New expand.
+   (split_stack_call_*): New insn.
+   (split_stack_cond_call): New expand.
+   (split_stack_cond_call_*): New insn.
+   (split_stack_space_check): New expand.
+   (split_stack_sibcall): New expand.
+

Re: [hsa merge 00/10] Merge of HSA branch

2016-02-02 Thread Martin Jambor

Hi,

On Thu, Jan 28, 2016 at 08:18:27AM -0700, Gerald Pfeifer wrote:
> 
> This is okay with the changes/considering the questions above.
> 

thanks for the feedback.  I have committed the following after
incorporating the comments.

Martin

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html 25 Jan 2016 15:09:55 -  1.52
+++ changes.html 2 Feb 2016 14:09:11 -
@@ -272,6 +272,30 @@

 
 
+Heterogeneous Systems Architecture
+   
+ GCC can now generate HSAIL (Heterogeneous System Architecture
+   Intermediate Language) for simple OpenMP device constructs if
+   configured with --enable-offload-targets=hsa.  A new
+   libgomp plugin then runs the HSA GPU kernels implementing these
+   constructs on HSA capable GPUs via a standard HSA run time.
+   
+   If the HSA compilation back end determines it cannot output HSAIL
+   for a particular input, it gives a warning by default.  These
+   warnings can be suppressed with -Wno-hsa.  To give a few
+   examples, the HSA back end does not implement compilation of code
+   using function pointers, automatic allocation of variable sized
+   arrays, functions with variadic arguments as well as a number of
+   other less common programming constructs.
+
+   When compilation for HSA is enabled, the compiler attempts to
+   compile composite OpenMP constructs
+
+#pragma omp target teams distribute parallel for
+into parallel HSA GPU kernels.
+ 
+   
+
 IA-32/x86-64

  GCC now supports the Intel CPU named Skylake with AVX-512 extensions




Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.993
diff -u -r1.993 index.html
--- index.html 30 Jan 2016 06:01:48 -  1.993
+++ index.html 2 Feb 2016 14:10:25 -
@@ -50,6 +50,13 @@
 News
 
 
+ Heterogeneous Systems Architecture support
+ [2016-01-27]
+ http://www.hsafoundation.com/ Heterogeneous Systems
+ Architecture 1.0 https://gcc.gnu.org/gcc-6/changes.html#hsa
+ support was added to GCC, contributed by Martin Jambor, Martin Liška
+ and Michael Matz from SUSE.
+
 GCC 5.3 released
 [2015-12-04]
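
The composite construct named in the changes.html text can be tried
with a small loop such as the one below. This is a minimal sketch: the
function name and the map clauses are illustrative assumptions, and
whether a given loop is actually turned into an HSA kernel depends on
the restrictions the text lists.

```c
/* Element-wise vector addition offloaded with the composite OpenMP
   construct mentioned in the release notes.  */
void
vec_add (int n, const float *a, const float *b, float *c)
{
#pragma omp target teams distribute parallel for \
    map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int idx = 0; idx < n; idx++)
    c[idx] = a[idx] + b[idx];
}
```

Without -fopenmp the pragma is ignored and the loop runs serially on
the host, which is convenient for checking results before enabling
offloading with --enable-offload-targets=hsa.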

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Ilya Enkovich

2016-02-02 17:03 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:55 AM, Ilya Enkovich  wrote:
>> 2016-02-02 16:25 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 5:21 AM, Ilya Enkovich  
>>> wrote:
 2016-02-02 16:14 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  
> wrote:
>> 2016-02-02 16:06 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  
>>> wrote:
 2016-02-02 15:46 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  
>> wrote:
>>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
 On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  
 wrote:

 >> The bottom line is  ix86_minimum_alignment must return the 
 >> correct
 >> number for DImode or you can just turn off STV.   My suggestion 
 >> is
 >> to use my patch.
 >
 > Uros, any preferences here?  I mean, it is possible to use
 > e.g. the ix86_option_override_internal and have H.J's 
 > ix86_minimum_alignment
 > change as a safety net, in the usual case for 
 > -mpreferred-stack-boundary=2
 > we'll just disable TARGET_STV and ix86_minimum_alignment change 
 > won't do
 > anything, as TARGET_STV will be false, and if for whatever case 
 > it gets
 > through (target attribute, -mincoming-stack-boundary=, ...)
 > ix86_minimum_alignment will be there to ensure enough stack 
 > alignment.
 > Most of the smaller -mpreferred-stack-boundary= uses are 
 > -mno-sse anyway,
 > and that is something we don't want to affect.

 IMO, we should disable STV when -mpreferred-stack-boundary < 3, as 
 STV
 is only an optimization. Perhaps we can also emit a "sorry" for
 explicit -mstv in case stack boundary requirement is not satisfied.
 *If* there is a need for -mstv with smaller stack boundary, we can
 revisit this decision for later gcc versions.

 I think disabling STV is less surprising option than increasing 
 stack
 boundary behind the user's back.
>>>
>>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>>> ok for trunk then (alone or with additional sorry, incremental or 
>>> not?)?
>>> I believe it does just that.
>>
>> This patch is WRONG.
>>
>> --
>> H.J.
>
> You will run into the same ICE with
>
> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>
> in a leaf function which needs DImode spill/fill.

 Why would we need DImode spill/fill having no DImode registers?

>>>
>>> Because STV is enabled with
>>>
>>>  -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>
>> I misread it as -mpreferred-... So why would we fail having a proper
>> preferred stack alignment? AFAIK leaf function doesn't affect
>> alignment until we finalize it after RA.
>>
>
> /* Finalize stack_realign_needed flag, which will guide prologue/epilogue
>to be generated in correct form.  */
> static void
> ix86_finalize_stack_realign_flags (void)
> {
>   /* Check if stack realign is really needed after reload, and
>  stores result in cfun */
>   unsigned int incoming_stack_boundary
> = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
>? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
>   unsigned int stack_realign
> = (incoming_stack_boundary
>< (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
>   ? crtl->max_used_stack_slot_alignment
> ^^

 We call it after RA when all spill slots are allocated and check if we
 may relax stack alignment. Don't see any problem here.
>>>
>>> Please see
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69454#c26
>>>
>>> Why did LRA crash then?
>>
>> Because it tries a patch [1] which doesn't fix stack alignment and STV
>> enabling and therefore doesn't resolve the problem when
>> -mpreferred-stack-boundary=2 is used.
>>
>
> No, it is because RA doesn't increase stack alignment.  You have to
> have the correct stack alignment requirement before entering RA.

And it's too late to do that after the STV pass, which is why we disable
STV when the stack is not properly aligned. I think this argument is going
in circles.

Thanks,
Ilya

>
>
> --
> H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 5:55 AM, Ilya Enkovich  wrote:
> 2016-02-02 16:25 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 5:21 AM, Ilya Enkovich  wrote:
>>> 2016-02-02 16:14 GMT+03:00 H.J. Lu :
 On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  
 wrote:
> 2016-02-02 16:06 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  
>> wrote:
>>> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
 On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  
> wrote:
>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  
>>> wrote:
>>>
>>> >> The bottom line is  ix86_minimum_alignment must return the 
>>> >> correct
>>> >> number for DImode or you can just turn off STV.   My suggestion 
>>> >> is
>>> >> to use my patch.
>>> >
>>> > Uros, any preferences here?  I mean, it is possible to use
>>> > e.g. the ix86_option_override_internal and have H.J's 
>>> > ix86_minimum_alignment
>>> > change as a safety net, in the usual case for 
>>> > -mpreferred-stack-boundary=2
>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change 
>>> > won't do
>>> > anything, as TARGET_STV will be false, and if for whatever case 
>>> > it gets
>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>> > ix86_minimum_alignment will be there to ensure enough stack 
>>> > alignment.
>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
>>> > anyway,
>>> > and that is something we don't want to affect.
>>>
>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as 
>>> STV
>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>> revisit this decision for later gcc versions.
>>>
>>> I think disabling STV is less surprising option than increasing 
>>> stack
>>> boundary behind the user's back.
>>
>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>> ok for trunk then (alone or with additional sorry, incremental or 
>> not?)?
>> I believe it does just that.
>
> This patch is WRONG.
>
> --
> H.J.

 You will run into the same ICE with

 -mincoming-stack-boundary=2 -msse2 -O2 -m32

 in a leaf function which needs DImode spill/fill.
>>>
>>> Why would we need DImode spill/fill having no DImode registers?
>>>
>>
>> Because STV is enabled with
>>
>>  -mincoming-stack-boundary=2 -msse2 -O2 -m32
>
> I misread it as -mpreferred-... So why would we fail having a proper
> preferred stack alignment? AFAIK leaf function doesn't affect
> alignment until we finalize it after RA.
>

 /* Finalize stack_realign_needed flag, which will guide prologue/epilogue
to be generated in correct form.  */
 static void
 ix86_finalize_stack_realign_flags (void)
 {
   /* Check if stack realign is really needed after reload, and
  stores result in cfun */
   unsigned int incoming_stack_boundary
 = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
   unsigned int stack_realign
 = (incoming_stack_boundary
< (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
   ? crtl->max_used_stack_slot_alignment
 ^^
>>>
>>> We call it after RA when all spill slots are allocated and check if we
>>> may relax stack alignment. Don't see any problem here.
>>
>> Please see
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69454#c26
>>
>> Why did LRA crash then?
>
> Because it tries a patch [1] which doesn't fix stack alignment and STV
> enabling and therefore doesn't resolve the problem when
> -mpreferred-stack-boundary=2 is used.
>

No, it is because RA doesn't increase stack alignment.  You have to
have the correct stack alignment requirement before entering RA.


-- 
H.J.

Re: [OpenACC 0/7] host_data construct

2016-02-02 Thread Thomas Schwinge

Hi!

On Wed, 2 Dec 2015 16:58:45 +0100, I wrote:
> On Mon, 30 Nov 2015 19:30:34 +, Julian Brown  
> wrote:
> > --- a/libgomp/oacc-parallel.c
> > +++ b/libgomp/oacc-parallel.c
> 
> > +void
> > +GOACC_host_data (int device, size_t mapnum,
> > +void **hostaddrs, size_t *sizes, unsigned short *kinds)
> > +{
> > +[...]
> > +}
> 
> Isn't that identical to GOACC_data_start?  Can we thus get rid of it?

Yes, we can.  As GOACC_host_data has not been part of GCC 5's libgomp
ABI, it's OK to just remove it; committed "as obvious" in r233074:

commit 2bf3f448431be10baa9755df5faeed6b2f6508f8
Author: tschwinge 
Date:   Tue Feb 2 13:53:55 2016 +

Merge BUILT_IN_GOACC_HOST_DATA into BUILT_IN_GOACC_DATA_START

gcc/
* omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
* omp-low.c (expand_omp_target): Use BUILT_IN_GOACC_DATA_START
instead.
libgomp/
* libgomp.map (GOACC_2.0): Remove GOACC_host_data.
* oacc-parallel.c (GOACC_host_data): Remove function definition.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233074 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |  6 ++
 gcc/omp-builtins.def|  2 --
 gcc/omp-low.c   |  5 +
 libgomp/ChangeLog   |  3 +++
 libgomp/libgomp.map |  1 -
 libgomp/oacc-parallel.c | 40 
 6 files changed, 10 insertions(+), 47 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 05741331..9a2cec8 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-02-02  Thomas Schwinge  
+
+   * omp-builtins.def (BUILT_IN_GOACC_HOST_DATA): Remove.
+   * omp-low.c (expand_omp_target): Use BUILT_IN_GOACC_DATA_START
+   instead.
+
 2016-02-02  Richard Biener  
 
PR tree-optimization/69606
diff --git gcc/omp-builtins.def gcc/omp-builtins.def
index 60199b0..ea012df 100644
--- gcc/omp-builtins.def
+++ gcc/omp-builtins.def
@@ -47,8 +47,6 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE, "GOACC_update",
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
   BT_FN_VOID_INT_INT_VAR,
   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_HOST_DATA, "GOACC_host_data",
-  BT_FN_VOID_INT_SIZE_PTR_PTR_PTR, ATTR_NOTHROW_LIST)
 
 DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git gcc/omp-low.c gcc/omp-low.c
index 0b70274..d41688b 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -13186,6 +13186,7 @@ expand_omp_target (struct omp_region *region)
   start_ix = BUILT_IN_GOACC_PARALLEL;
   break;
 case GF_OMP_TARGET_KIND_OACC_DATA:
+case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
   start_ix = BUILT_IN_GOACC_DATA_START;
   break;
 case GF_OMP_TARGET_KIND_OACC_UPDATE:
@@ -13197,9 +13198,6 @@ expand_omp_target (struct omp_region *region)
 case GF_OMP_TARGET_KIND_OACC_DECLARE:
   start_ix = BUILT_IN_GOACC_DECLARE;
   break;
-case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
-  start_ix = BUILT_IN_GOACC_HOST_DATA;
-  break;
 default:
   gcc_unreachable ();
 }
@@ -13324,7 +13322,6 @@ expand_omp_target (struct omp_region *region)
 case BUILT_IN_GOACC_DATA_START:
 case BUILT_IN_GOACC_DECLARE:
 case BUILT_IN_GOMP_TARGET_DATA:
-case BUILT_IN_GOACC_HOST_DATA:
   break;
 case BUILT_IN_GOMP_TARGET:
 case BUILT_IN_GOMP_TARGET_UPDATE:
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 6c9bf6a..250240d 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,8 @@
 2016-02-02  Thomas Schwinge  
 
+   * libgomp.map (GOACC_2.0): Remove GOACC_host_data.
+   * oacc-parallel.c (GOACC_host_data): Remove function definition.
+
* testsuite/lib/libgomp.exp: Skip hsa offloading for OpenACC test
cases.
 
diff --git libgomp/libgomp.map libgomp/libgomp.map
index ea9344d..4d42c42 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -394,7 +394,6 @@ GOACC_2.0.1 {
   global:
GOACC_declare;
GOACC_parallel_keyed;
-   GOACC_host_data;
 } GOACC_2.0;
 
 GOMP_PLUGIN_1.0 {
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index f22ba41..bc24651 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -490,46 +490,6 @@ GOACC_wait (int async, int num_waits, ...)
 goacc_thread ()->dev->openacc.async_wait_all_async_func (acc_async_noval);
 }
 
-void
-GOACC_host_data (int device, size_t mapnum,
-void **hostaddrs, size_t *sizes, unsigned short *kinds)
-{
-  bool host_fallback = device == GOMP_DEVICE_HOST_FALLBACK;
-  struct target_mem_desc *tgt;
-
-#ifdef HAVE_INTTYPES_H
-  gomp_debug (0, "%s: mapnum=%"PRIu64", hostaddrs=%p, size=%p, kinds=%p\n",
- __FUNCTION__, (uint64_t) mapnum, hostaddrs, sizes, kinds);
-#else
-  gomp_debug (0, "%s: mapnum=%lu, hostaddrs=%p, sizes=%p, kinds=%p\n",
- __FUNCTION__, (unsigned long) mapnum, hostaddrs, si

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Ilya Enkovich

2016-02-02 16:25 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:21 AM, Ilya Enkovich  wrote:
>> 2016-02-02 16:14 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  
>>> wrote:
 2016-02-02 16:06 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  
> wrote:
>> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
 On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  
>> wrote:
>>
>> >> The bottom line is  ix86_minimum_alignment must return the correct
>> >> number for DImode or you can just turn off STV.   My suggestion is
>> >> to use my patch.
>> >
>> > Uros, any preferences here?  I mean, it is possible to use
>> > e.g. the ix86_option_override_internal and have H.J's 
>> > ix86_minimum_alignment
>> > change as a safety net, in the usual case for 
>> > -mpreferred-stack-boundary=2
>> > we'll just disable TARGET_STV and ix86_minimum_alignment change 
>> > won't do
>> > anything, as TARGET_STV will be false, and if for whatever case it 
>> > gets
>> > through (target attribute, -mincoming-stack-boundary=, ...)
>> > ix86_minimum_alignment will be there to ensure enough stack 
>> > alignment.
>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
>> > anyway,
>> > and that is something we don't want to affect.
>>
>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as 
>> STV
>> is only an optimization. Perhaps we can also emit a "sorry" for
>> explicit -mstv in case stack boundary requirement is not satisfied.
>> *If* there is a need for -mstv with smaller stack boundary, we can
>> revisit this decision for later gcc versions.
>>
>> I think disabling STV is less surprising option than increasing stack
>> boundary behind the user's back.
>
> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
> ok for trunk then (alone or with additional sorry, incremental or 
> not?)?
> I believe it does just that.

 This patch is WRONG.

 --
 H.J.
>>>
>>> You will run into the same ICE with
>>>
>>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>>
>>> in a leaf function which needs DImode spill/fill.
>>
>> Why would we need DImode spill/fill having no DImode registers?
>>
>
> Because STV is enabled with
>
>  -mincoming-stack-boundary=2 -msse2 -O2 -m32

 I misread it as -mpreferred-... So why would we fail having a proper
 preferred stack alignment? AFAIK leaf function doesn't affect
 alignment until we finalize it after RA.

>>>
>>> /* Finalize stack_realign_needed flag, which will guide prologue/epilogue
>>>to be generated in correct form.  */
>>> static void
>>> ix86_finalize_stack_realign_flags (void)
>>> {
>>>   /* Check if stack realign is really needed after reload, and
>>>  stores result in cfun */
>>>   unsigned int incoming_stack_boundary
>>> = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
>>>? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
>>>   unsigned int stack_realign
>>> = (incoming_stack_boundary
>>>< (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
>>>   ? crtl->max_used_stack_slot_alignment
>>> ^^
>>
>> We call it after RA when all spill slots are allocated and check if we
>> may relax stack alignment. Don't see any problem here.
>
> Please see
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69454#c26
>
> Why did LRA crash then?

Because it tries a patch [1] which doesn't fix stack alignment and STV
enabling and therefore doesn't resolve the problem when
-mpreferred-stack-boundary=2 is used.

Thanks,
Ilya
--
[1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=37468&action=diff

>
>
> --
> H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 5:21 AM, Ilya Enkovich  wrote:
> 2016-02-02 16:14 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  wrote:
>>> 2016-02-02 16:06 GMT+03:00 H.J. Lu :
 On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  
 wrote:
> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
 On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  
> wrote:
>
> >> The bottom line is  ix86_minimum_alignment must return the correct
> >> number for DImode or you can just turn off STV.   My suggestion is
> >> to use my patch.
> >
> > Uros, any preferences here?  I mean, it is possible to use
> > e.g. the ix86_option_override_internal and have H.J's 
> > ix86_minimum_alignment
> > change as a safety net, in the usual case for 
> > -mpreferred-stack-boundary=2
> > we'll just disable TARGET_STV and ix86_minimum_alignment change 
> > won't do
> > anything, as TARGET_STV will be false, and if for whatever case it 
> > gets
> > through (target attribute, -mincoming-stack-boundary=, ...)
> > ix86_minimum_alignment will be there to ensure enough stack 
> > alignment.
> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
> > anyway,
> > and that is something we don't want to affect.
>
> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
> is only an optimization. Perhaps we can also emit a "sorry" for
> explicit -mstv in case stack boundary requirement is not satisfied.
> *If* there is a need for -mstv with smaller stack boundary, we can
> revisit this decision for later gcc versions.
>
> I think disabling STV is less surprising option than increasing stack
> boundary behind the user's back.

 So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
 ok for trunk then (alone or with additional sorry, incremental or 
 not?)?
 I believe it does just that.
>>>
>>> This patch is WRONG.
>>>
>>> --
>>> H.J.
>>
>> You will run into the same ICE with
>>
>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>
>> in a leaf function which needs DImode spill/fill.
>
> Why would we need DImode spill/fill having no DImode registers?
>

 Because STV is enabled with

  -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>>
>>> I misread it as -mpreferred-... So why would we fail having a proper
>>> preferred stack alignment? AFAIK leaf function doesn't affect
>>> alignment until we finalize it after RA.
>>>
>>
>> /* Finalize stack_realign_needed flag, which will guide prologue/epilogue
>>to be generated in correct form.  */
>> static void
>> ix86_finalize_stack_realign_flags (void)
>> {
>>   /* Check if stack realign is really needed after reload, and
>>  stores result in cfun */
>>   unsigned int incoming_stack_boundary
>> = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
>>? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
>>   unsigned int stack_realign
>> = (incoming_stack_boundary
>>< (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
>>   ? crtl->max_used_stack_slot_alignment
>> ^^
>
> We call it after RA when all spill slots are allocated and check if we
> may relax stack alignment. Don't see any problem here.

Please see

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69454#c26

Why did LRA crash then?


-- 
H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Ilya Enkovich

2016-02-02 16:14 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  wrote:
>> 2016-02-02 16:06 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  
>>> wrote:
 2016-02-02 15:46 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
 On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  
 wrote:

 >> The bottom line is  ix86_minimum_alignment must return the correct
 >> number for DImode or you can just turn off STV.   My suggestion is
 >> to use my patch.
 >
 > Uros, any preferences here?  I mean, it is possible to use
 > e.g. the ix86_option_override_internal and have H.J's 
 > ix86_minimum_alignment
 > change as a safety net, in the usual case for 
 > -mpreferred-stack-boundary=2
 > we'll just disable TARGET_STV and ix86_minimum_alignment change 
 > won't do
 > anything, as TARGET_STV will be false, and if for whatever case it 
 > gets
 > through (target attribute, -mincoming-stack-boundary=, ...)
 > ix86_minimum_alignment will be there to ensure enough stack 
 > alignment.
 > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
 > anyway,
 > and that is something we don't want to affect.

 IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
 is only an optimization. Perhaps we can also emit a "sorry" for
 explicit -mstv in case stack boundary requirement is not satisfied.
 *If* there is a need for -mstv with smaller stack boundary, we can
 revisit this decision for later gcc versions.

 I think disabling STV is less surprising option than increasing stack
 boundary behind the user's back.
>>>
>>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>>> I believe it does just that.
>>
>> This patch is WRONG.
>>
>> --
>> H.J.
>
> You will run into the same ICE with
>
> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>
> in a leaf function which needs DImode spill/fill.

 Why would we need DImode spill/fill having no DImode registers?

>>>
>>> Because STV is enabled with
>>>
>>>  -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>
>> I misread it as -mpreferred-... So why would we fail having a proper
>> preferred stack alignment? AFAIK leaf function doesn't affect
>> alignment until we finalize it after RA.
>>
>
> /* Finalize stack_realign_needed flag, which will guide prologue/epilogue
>to be generated in correct form.  */
> static void
> ix86_finalize_stack_realign_flags (void)
> {
>   /* Check if stack realign is really needed after reload, and
>  stores result in cfun */
>   unsigned int incoming_stack_boundary
> = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
>? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
>   unsigned int stack_realign
> = (incoming_stack_boundary
>< (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
>   ? crtl->max_used_stack_slot_alignment
> ^^

We call it after RA when all spill slots are allocated and check if we
may relax stack alignment. Don't see any problem here.

Thanks,
Ilya

>
> For a leaf function, we check max_used_stack_slot_alignment.
> Since ix86_minimum_alignment returns 32 for DImode,
> we won't realign the stack for DImode spill/fill.
>
>   : crtl->stack_alignment_needed));
>
>
> --
> H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 5:09 AM, Jakub Jelinek  wrote:
> On Tue, Feb 02, 2016 at 04:46:26AM -0800, H.J. Lu wrote:
>> >> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>> >> ok for trunk then (alone or with additional sorry, incremental or not?)?
>> >> I believe it does just that.
>> >
>> > This patch is WRONG.
>> >
>> > --
>> > H.J.
>>
>> You will run into the same ICE with
>>
>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>
>> in a leaf function which needs DImode spill/fill.
>
> So are you arguing for changing
> +  /* Disable STV if -mpreferred-stack-boundary=2 - the needed
> + stack realignment will be extra cost the pass doesn't take into
> + account and the pass can't realign the stack.  */
> +  if (ix86_preferred_stack_boundary < 64)
> +opts->x_target_flags &= ~MASK_STV;
> to
> +  /* Disable STV if -m{preferred,incoming}-stack-boundary=2 - the needed
> + stack realignment will be extra cost the pass doesn't take into
> + account and the pass can't realign the stack.  */
> +  if (ix86_preferred_stack_boundary < 64
> +  || ix86_incoming_stack_boundary < 64)
> +opts->x_target_flags &= ~MASK_STV;
> I'm fine with that.
>

There are many checks/asserts for stack alignment.  At minimum,
we should assert STV is off in ix86_minimum_alignment before
returning 32 for DImode.
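
As a rough illustration of the assertion suggested here, the following is a minimal sketch and not the actual i386 back-end code. The names sketch_mode, sketch_minimum_alignment and the stv_enabled parameter are hypothetical stand-ins, and the real ix86_minimum_alignment checks more conditions (target word size, preferred stack boundary, user-specified alignment) that are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's machine modes; only what the sketch needs.  */
enum sketch_mode { SKETCH_SImode, SKETCH_DImode };

/* Simplified model of a minimum-alignment hook for a 32-bit target whose
   preferred stack boundary is below 64 bits.  ALIGN and the result are in
   bits.  STV_ENABLED models TARGET_STV (an assumption of this sketch).  */
static unsigned int
sketch_minimum_alignment (enum sketch_mode mode, unsigned int align,
                          bool stv_enabled)
{
  if (mode == SKETCH_DImode && align == 64)
    {
      /* Lowering DImode to 32-bit alignment is only safe when STV cannot
         introduce 64-bit vector spills, so insist that it is off.  */
      assert (!stv_enabled);
      return 32;
    }
  return align;
}
```

With STV disabled, a DImode request for 64-bit alignment is lowered to 32; with STV enabled the same request would trip the assertion instead of silently under-aligning a spill slot.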


-- 
H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 5:11 AM, Ilya Enkovich  wrote:
> 2016-02-02 16:06 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  wrote:
>>> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
 On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>>
>>> >> The bottom line is  ix86_minimum_alignment must return the correct
>>> >> number for DImode or you can just turn off STV.   My suggestion is
>>> >> to use my patch.
>>> >
>>> > Uros, any preferences here?  I mean, it is possible to use
>>> > e.g. the ix86_option_override_internal and have H.J's 
>>> > ix86_minimum_alignment
>>> > change as a safety net, in the usual case for 
>>> > -mpreferred-stack-boundary=2
>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't 
>>> > do
>>> > anything, as TARGET_STV will be false, and if for whatever case it 
>>> > gets
>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
>>> > anyway,
>>> > and that is something we don't want to affect.
>>>
>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>> revisit this decision for later gcc versions.
>>>
>>> I think disabling STV is less surprising option than increasing stack
>>> boundary behind the user's back.
>>
>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>> I believe it does just that.
>
> This patch is WRONG.
>
> --
> H.J.

 You will run into the same ICE with

 -mincoming-stack-boundary=2 -msse2 -O2 -m32

 in a leaf function which needs DImode spill/fill.
>>>
>>> Why would we need DImode spill/fill having no DImode registers?
>>>
>>
>> Because STV is enabled with
>>
>>  -mincoming-stack-boundary=2 -msse2 -O2 -m32
>
> I misread it as -mpreferred-... So why would we fail having a proper
> preferred stack alignment? AFAIK leaf function doesn't affect
> alignment until we finalize it after RA.
>

/* Finalize stack_realign_needed flag, which will guide prologue/epilogue
   to be generated in correct form.  */
static void
ix86_finalize_stack_realign_flags (void)
{
  /* Check if stack realign is really needed after reload, and
 stores result in cfun */
  unsigned int incoming_stack_boundary
= (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
   ? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
  unsigned int stack_realign
= (incoming_stack_boundary
   < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
  ? crtl->max_used_stack_slot_alignment
^^

For a leaf function, we check max_used_stack_slot_alignment.
Since ix86_minimum_alignment returns 32 for DImode,
we won't realign the stack for DImode spill/fill.

  : crtl->stack_alignment_needed));
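
The comparison quoted above can be modelled in isolation. This is only a sketch: struct sketch_crtl and sketch_stack_realign are hypothetical names that mirror the fields referenced in the excerpt, the global ix86_incoming_stack_boundary and the TLS-descriptor check are passed in as plain parameters, and all alignments are in bits.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the crtl fields used in the excerpt.  */
struct sketch_crtl
{
  unsigned int parm_stack_boundary;
  unsigned int max_used_stack_slot_alignment;
  unsigned int stack_alignment_needed;
  bool is_leaf;
};

/* Return true when the model decides the stack must be realigned:
   the effective incoming boundary is smaller than what the function
   needs.  Leaf functions that do not call a TLS descriptor only look
   at the alignment of the stack slots actually used.  */
static bool
sketch_stack_realign (const struct sketch_crtl *c,
                      unsigned int incoming_boundary,
                      bool calls_tls_descriptor)
{
  unsigned int incoming
    = (c->parm_stack_boundary > incoming_boundary
       ? c->parm_stack_boundary : incoming_boundary);
  unsigned int needed
    = (c->is_leaf && !calls_tls_descriptor
       ? c->max_used_stack_slot_alignment
       : c->stack_alignment_needed);
  return incoming < needed;
}
```

In this model, a leaf function with a 32-bit incoming boundary whose spill slots were recorded as needing only 32 bits is not realigned, which is the situation described for DImode spill/fill.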


-- 
H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Ilya Enkovich

2016-02-02 16:06 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  wrote:
>> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
 On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>
>> >> The bottom line is  ix86_minimum_alignment must return the correct
>> >> number for DImode or you can just turn off STV.   My suggestion is
>> >> to use my patch.
>> >
>> > Uros, any preferences here?  I mean, it is possible to use
>> > e.g. the ix86_option_override_internal and have H.J's 
>> > ix86_minimum_alignment
>> > change as a safety net, in the usual case for 
>> > -mpreferred-stack-boundary=2
>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't 
>> > do
>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>> > through (target attribute, -mincoming-stack-boundary=, ...)
>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse 
>> > anyway,
>> > and that is something we don't want to affect.
>>
>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>> is only an optimization. Perhaps we can also emit a "sorry" for
>> explicit -mstv in case stack boundary requirement is not satisfied.
>> *If* there is a need for -mstv with smaller stack boundary, we can
>> revisit this decision for later gcc versions.
>>
>> I think disabling STV is less surprising option than increasing stack
>> boundary behind the user's back.
>
> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
> ok for trunk then (alone or with additional sorry, incremental or not?)?
> I believe it does just that.

 This patch is WRONG.

 --
 H.J.
>>>
>>> You will run into the same ICE with
>>>
>>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>>
>>> in a leaf function which needs DImode spill/fill.
>>
>> Why would we need DImode spill/fill having no DImode registers?
>>
>
> Because STV is enabled with
>
>  -mincoming-stack-boundary=2 -msse2 -O2 -m32

I misread it as -mpreferred-... So why would we fail having a proper
preferred stack alignment? AFAIK leaf function doesn't affect
alignment until we finalize it after RA.

>
>
> --
> H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Jakub Jelinek

On Tue, Feb 02, 2016 at 04:46:26AM -0800, H.J. Lu wrote:
> >> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
> >> ok for trunk then (alone or with additional sorry, incremental or not?)?
> >> I believe it does just that.
> >
> > This patch is WRONG.
> >
> > --
> > H.J.
> 
> You will run into the same ICE with
> 
> -mincoming-stack-boundary=2 -msse2 -O2 -m32
> 
> in a leaf function which needs DImode spill/fill.

So are you arguing for changing
+  /* Disable STV if -mpreferred-stack-boundary=2 - the needed
+ stack realignment will be extra cost the pass doesn't take into
+ account and the pass can't realign the stack.  */
+  if (ix86_preferred_stack_boundary < 64)
+opts->x_target_flags &= ~MASK_STV;
to
+  /* Disable STV if -m{preferred,incoming}-stack-boundary=2 - the needed
+ stack realignment will be extra cost the pass doesn't take into
+ account and the pass can't realign the stack.  */
+  if (ix86_preferred_stack_boundary < 64
+  || ix86_incoming_stack_boundary < 64)
+opts->x_target_flags &= ~MASK_STV;
I'm fine with that.
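
A standalone model of the second variant may make the condition easier to check by hand. It is a sketch under stated assumptions: SKETCH_MASK_STV is a made-up bit value rather than GCC's MASK_STV, both function names are hypothetical, and the conversion from the -m{preferred,incoming}-stack-boundary=N argument to bits is assumed to be 2^N bytes times 8.

```c
/* Hypothetical stand-in for the MASK_STV bit; the real value differs.  */
#define SKETCH_MASK_STV 0x1u

/* Convert a stack-boundary option argument N into bits, assuming the
   boundary is 2^N bytes: N=2 gives 32 bits, N=3 gives 64 bits.  */
static unsigned int
sketch_boundary_bits (unsigned int n)
{
  return (1u << n) * 8u;
}

/* Clear the STV bit when either boundary (in bits) is below 64,
   mirroring the combined preferred/incoming check shown above.  */
static unsigned int
sketch_override_stv (unsigned int target_flags,
                     unsigned int preferred_stack_boundary,
                     unsigned int incoming_stack_boundary)
{
  if (preferred_stack_boundary < 64 || incoming_stack_boundary < 64)
    target_flags &= ~SKETCH_MASK_STV;
  return target_flags;
}
```

Under these assumptions, -mincoming-stack-boundary=2 maps to 32 bits, so the STV bit is cleared even when the preferred boundary stays at 128 bits.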

Jakub

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Uros Bizjak

On Tue, Feb 2, 2016 at 2:06 PM, H.J. Lu  wrote:
> On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  wrote:
>> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
>>> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>>>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>>>>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>>>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>>>>>
>>>>>> >> The bottom line is  ix86_minimum_alignment must return the correct
>>>>>> >> number for DImode or you can just turn off STV.   My suggestion is
>>>>>> >> to use my patch.
>>>>>> >
>>>>>> > Uros, any preferences here?  I mean, it is possible to use
>>>>>> > e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
>>>>>> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>>>>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>>>>>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>>>>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>>>>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>>>>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>>>>>> > and that is something we don't want to affect.
>>>>>>
>>>>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>>>>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>>>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>>>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>>>>> revisit this decision for later gcc versions.
>>>>>>
>>>>>> I think disabling STV is less surprising option than increasing stack
>>>>>> boundary behind the user's back.
>>>>>
>>>>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>>>>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>>>>> I believe it does just that.
>>>>
>>>> This patch is WRONG.
>>>>
>>>> --
>>>> H.J.
>>>
>>> You will run into the same ICE with
>>>
>>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>>
>>> in a leaf function which needs DImode spill/fill.
>>
>> Why would we need DImode spill/fill having no DImode registers?
>>
>
> Because STV is enabled with
>
>  -mincoming-stack-boundary=2 -msse2 -O2 -m32

But this is the whole trick. Since stack alignment requirements won't
be satisfied, we disable STV even with -msse2.

Uros.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 5:03 AM, Ilya Enkovich  wrote:
> 2016-02-02 15:46 GMT+03:00 H.J. Lu :
>> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>>>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>>>>
>>>>> >> The bottom line is  ix86_minimum_alignment must return the correct
>>>>> >> number for DImode or you can just turn off STV.   My suggestion is
>>>>> >> to use my patch.
>>>>> >
>>>>> > Uros, any preferences here?  I mean, it is possible to use
>>>>> > e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
>>>>> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>>>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>>>>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>>>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>>>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>>>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>>>>> > and that is something we don't want to affect.
>>>>>
>>>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>>>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>>>> revisit this decision for later gcc versions.
>>>>>
>>>>> I think disabling STV is less surprising option than increasing stack
>>>>> boundary behind the user's back.
>>>>
>>>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>>>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>>>> I believe it does just that.
>>>
>>> This patch is WRONG.
>>>
>>> --
>>> H.J.
>>
>> You will run into the same ICE with
>>
>> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>>
>> in a leaf function which needs DImode spill/fill.
>
> Why would we need DImode spill/fill having no DImode registers?
>

Because STV is enabled with

 -mincoming-stack-boundary=2 -msse2 -O2 -m32


-- 
H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Ilya Enkovich

2016-02-02 15:46 GMT+03:00 H.J. Lu :
> On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
>> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>>>
>>>> >> The bottom line is  ix86_minimum_alignment must return the correct
>>>> >> number for DImode or you can just turn off STV.   My suggestion is
>>>> >> to use my patch.
>>>> >
>>>> > Uros, any preferences here?  I mean, it is possible to use
>>>> > e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
>>>> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>>>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>>>> > and that is something we don't want to affect.
>>>>
>>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>>> revisit this decision for later gcc versions.
>>>>
>>>> I think disabling STV is less surprising option than increasing stack
>>>> boundary behind the user's back.
>>>
>>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>>> I believe it does just that.
>>
>> This patch is WRONG.
>>
>> --
>> H.J.
>
> You will run into the same ICE with
>
> -mincoming-stack-boundary=2 -msse2 -O2 -m32
>
> in a leaf function which needs DImode spill/fill.

Why would we need DImode spill/fill having no DImode registers?

Thanks,
Ilya

>
>
> --
> H.J.

Re: [hsa merge 01/10] Configury changes and new options

2016-02-02 Thread Thomas Schwinge

Hi!

On Wed, 13 Jan 2016 18:39:26 +0100, Martin Jambor  wrote:
> this patch contains changes to the configuration mechanism and offload
> bits, so that users can build compilers with HSA support.
> 
> It is a re-post of
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00714.html, which has
> already been approved by Jakub after a few changes
> (https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01284.html).

On top of that, I applied four patches "as obvious":

> --- a/libgomp/plugin/configfrag.ac
> +++ b/libgomp/plugin/configfrag.ac
> @@ -81,6 +81,62 @@ AC_SUBST(PLUGIN_NVPTX_CPPFLAGS)
>  AC_SUBST(PLUGIN_NVPTX_LDFLAGS)
>  AC_SUBST(PLUGIN_NVPTX_LIBS)
>  
> +# Look for HSA run-time, its includes and libraries
> +
> +HSA_RUNTIME_INCLUDE=
> +HSA_RUNTIME_LIB=
> +AC_SUBST(HSA_RUNTIME_INCLUDE)
> +AC_SUBST(HSA_RUNTIME_LIB)
> +HSA_RUNTIME_CPPFLAGS=
> +HSA_RUNTIME_LDFLAGS=
> +
> +AC_ARG_WITH(hsa-runtime,
> + [AS_HELP_STRING([--with-hsa-runtime=PATH],
> + [specify prefix directory for installed HSA run-time package.
> +  Equivalent to --with-hsa-runtime-include=PATH/include
> +  plus --with-hsa-runtime-lib=PATH/lib])])
> +AC_ARG_WITH(hsa-runtime-include,
> + [AS_HELP_STRING([--with-hsa-runtime-include=PATH],
> + [specify directory for installed HSA run-time include files])])
> +AC_ARG_WITH(hsa-runtime-lib,
> + [AS_HELP_STRING([--with-hsa-runtime-lib=PATH],
> + [specify directory for the installed HSA run-time library])])
> +if test "x$with_hsa_runtime" != x; then
> +  HSA_RUNTIME_INCLUDE=$with_hsa_runtime/include
> +  HSA_RUNTIME_LIB=$with_hsa_runtime/lib
> +fi
> +if test "x$with_hsa_runtime_include" != x; then
> +  HSA_RUNTIME_INCLUDE=$with_hsa_runtime_include
> +fi
> +if test "x$with_hsa_runtime_lib" != x; then
> +  HSA_RUNTIME_LIB=$with_hsa_runtime_lib
> +fi
> +if test "x$HSA_RUNTIME_INCLUDE" != x; then
> +  HSA_RUNTIME_CPPFLAGS=-I$HSA_RUNTIME_INCLUDE
> +fi
> +if test "x$HSA_RUNTIME_LIB" != x; then
> +  HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
> +fi
> +
> +AC_ARG_WITH(hsa-kmt-lib,
> + [AS_HELP_STRING([--with-hsa-kmt-lib=PATH],
> + [specify directory for installed HSA KMT library.])])
> +if test "x$with_hsa_kmt_lib" != x; then
> +  HSA_RUNTIME_LDFLAGS="$HSA_RUNTIME_LDFLAGS -L$with_hsa_kmt_lib"
> +  HSA_RUNTIME_LIB=
> +fi
> +
> +PLUGIN_HSA=0
> +PLUGIN_HSA_CPPFLAGS=
> +PLUGIN_HSA_LDFLAGS=
> +PLUGIN_HSA_LIBS=
> +AC_SUBST(PLUGIN_HSA)
> +AC_SUBST(PLUGIN_HSA_CPPFLAGS)
> +AC_SUBST(PLUGIN_HSA_LDFLAGS)
> +AC_SUBST(PLUGIN_HSA_LIBS)
> +
> +
> +
>  # Get offload targets and path to install tree of offloading compiler.
>  offload_additional_options=
>  offload_additional_lib_paths=
> @@ -122,6 +178,49 @@ if test x"$enable_offload_targets" != x; then
>   ;;
>   esac
>   ;;
> +  hsa*)
> + case "${target}" in
> +   x86_64-*-*)
> + case " ${CC} ${CFLAGS} " in
> +   *" -m32 "*)
> + PLUGIN_HSA=0
> + ;;
> +   *)
> + tgt_name=hsa
> + PLUGIN_HSA=$tgt
> + PLUGIN_HSA_CPPFLAGS=$HSA_RUNTIME_CPPFLAGS
> + PLUGIN_HSA_LDFLAGS=$HSA_RUNTIME_LDFLAGS
> + PLUGIN_HSA_LIBS="-lhsa-runtime64 -lhsakmt"
> +
> + PLUGIN_HSA_save_CPPFLAGS=$CPPFLAGS
> + CPPFLAGS="$PLUGIN_HSA_CPPFLAGS $CPPFLAGS"
> + PLUGIN_HSA_save_LDFLAGS=$LDFLAGS
> + LDFLAGS="$PLUGIN_HSA_LDFLAGS $LDFLAGS"
> + PLUGIN_HSA_save_LIBS=$LIBS
> + LIBS="$PLUGIN_HSA_LIBS $LIBS"
> +
> + AC_LINK_IFELSE(
> +   [AC_LANG_PROGRAM(
> + [#include "hsa.h"],
> +   [hsa_status_t status = hsa_init ()])],
> +   [PLUGIN_HSA=1])
> + CPPFLAGS=$PLUGIN_HSA_save_CPPFLAGS
> + LDFLAGS=$PLUGIN_HSA_save_LDFLAGS
> + LIBS=$PLUGIN_HSA_save_LIBS
> + case $PLUGIN_HSA in
> +   hsa*)
> + HSA_PLUGIN=0
> + AC_MSG_ERROR([HSA run-time package required for HSA support])
> + ;;
> + esac
> + ;;
> +   esac
> + ;;
> +   *-*-*)
> + PLUGIN_HSA=0
> +;;
> +esac
> +;;
>*)
>   AC_MSG_ERROR([unknown offload target specified])
>   ;;

For the cases where PLUGIN_HSA is set to "0", we also shouldn't configure
libgomp for hsa offloading: loading the plugin will expectedly fail if we
don't build it; committed in r233070:

commit bf57e97e03d046a6109d976bc098e197f220d1b9
Author: tschwinge 
Date:   Tue Feb 2 12:48:04 2016 +

libgomp: Don't configure for offloading target if we don't build the corresponding plugin

libgomp/
* plugin/configfrag.ac: Don't configure for offloading target if
we don't build the corresponding plugin.
* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@233070 138bc75d-0d04-0410-961f-82ee72b054a4
---

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 4:30 AM, H.J. Lu  wrote:
> On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
>> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>>
>>> >> The bottom line is  ix86_minimum_alignment must return the correct
>>> >> number for DImode or you can just turn off STV.   My suggestion is
>>> >> to use my patch.
>>> >
>>> > Uros, any preferences here?  I mean, it is possible to use
>>> > e.g. the ix86_option_override_internal and have H.J's 
>>> > ix86_minimum_alignment
>>> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>>> > through (target attribute, -mincoming-stack-boundary=, ...)
>>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>>> > and that is something we don't want to affect.
>>>
>>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>>> is only an optimization. Perhaps we can also emit a "sorry" for
>>> explicit -mstv in case stack boundary requirement is not satisfied.
>>> *If* there is a need for -mstv with smaller stack boundary, we can
>>> revisit this decision for later gcc versions.
>>>
>>> I think disabling STV is less surprising option than increasing stack
>>> boundary behind the user's back.
>>
>> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
>> ok for trunk then (alone or with additional sorry, incremental or not?)?
>> I believe it does just that.
>
> This patch is WRONG.
>
> --
> H.J.

You will run into the same ICE with

-mincoming-stack-boundary=2 -msse2 -O2 -m32

in a leaf function which needs DImode spill/fill.


-- 
H.J.

[PATCH] Fix PR69606

2016-02-02 Thread Richard Biener


The following fixes yet another bogus range info case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-02  Richard Biener  

PR tree-optimization/69606
* tree-ssa-math-opts.c (bswap_replace): Clear flow sensitive
info on the result before moving a stmt.

* gcc.dg/torture/pr69606.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 233067)
--- gcc/tree-ssa-math-opts.c(working copy)
*** bswap_replace (gimple *cur_stmt, gimple
*** 2622,2627 
--- 2622,2629 
/* Move cur_stmt just before  one of the load of the original
 to ensure it has the same VUSE.  See PR61517 for what could
 go wrong.  */
+   if (gimple_bb (cur_stmt) != gimple_bb (src_stmt))
+   reset_flow_sensitive_info (gimple_assign_lhs (cur_stmt));
gsi_move_before (&gsi, &gsi_ins);
gsi = gsi_for_stmt (cur_stmt);
  
Index: gcc/testsuite/gcc.dg/torture/pr69606.c
===
*** gcc/testsuite/gcc.dg/torture/pr69606.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69606.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do run } */
+ 
+ char a;
+ unsigned short b;
+ int c, d;
+ unsigned char e;
+ 
+ int
+ main ()
+ {
+   int f = 1, g = ~a;
+   if (b > f)
+ {
+   e = b; 
+   d = b | e; 
+   g = 0;
+ }
+   c = 1 % g;
+   return 0; 
+ }

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 4:29 AM, Jakub Jelinek  wrote:
> On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
>> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>>
>> >> The bottom line is  ix86_minimum_alignment must return the correct
>> >> number for DImode or you can just turn off STV.   My suggestion is
>> >> to use my patch.
>> >
>> > Uros, any preferences here?  I mean, it is possible to use
>> > e.g. the ix86_option_override_internal and have H.J's 
>> > ix86_minimum_alignment
>> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>> > anything, as TARGET_STV will be false, and if for whatever case it gets
>> > through (target attribute, -mincoming-stack-boundary=, ...)
>> > ix86_minimum_alignment will be there to ensure enough stack alignment.
>> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>> > and that is something we don't want to affect.
>>
>> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
>> is only an optimization. Perhaps we can also emit a "sorry" for
>> explicit -mstv in case stack boundary requirement is not satisfied.
>> *If* there is a need for -mstv with smaller stack boundary, we can
>> revisit this decision for later gcc versions.
>>
>> I think disabling STV is less surprising option than increasing stack
>> boundary behind the user's back.
>
> So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
> ok for trunk then (alone or with additional sorry, incremental or not?)?
> I believe it does just that.

This patch is WRONG.

-- 
H.J.

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Jakub Jelinek

On Tue, Feb 02, 2016 at 01:24:26PM +0100, Uros Bizjak wrote:
> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
> 
> >> The bottom line is  ix86_minimum_alignment must return the correct
> >> number for DImode or you can just turn off STV.   My suggestion is
> >> to use my patch.
> >
> > Uros, any preferences here?  I mean, it is possible to use
> > e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
> > change as a safety net, in the usual case for -mpreferred-stack-boundary=2
> > we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
> > anything, as TARGET_STV will be false, and if for whatever case it gets
> > through (target attribute, -mincoming-stack-boundary=, ...)
> > ix86_minimum_alignment will be there to ensure enough stack alignment.
> > Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
> > and that is something we don't want to affect.
> 
> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
> is only an optimization. Perhaps we can also emit a "sorry" for
> explicit -mstv in case stack boundary requirement is not satisfied.
> *If* there is a need for -mstv with smaller stack boundary, we can
> revisit this decision for later gcc versions.
> 
> I think disabling STV is less surprising option than increasing stack
> boundary behind the user's back.

So, is http://gcc.gnu.org/ml/gcc-patches/2016-01/msg02129.html
ok for trunk then (alone or with additional sorry, incremental or not?)?
I believe it does just that.

Jakub

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread H.J. Lu

On Tue, Feb 2, 2016 at 4:24 AM, Uros Bizjak  wrote:
> On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:
>
>>> The bottom line is  ix86_minimum_alignment must return the correct
>>> number for DImode or you can just turn off STV.   My suggestion is
>>> to use my patch.
>>
>> Uros, any preferences here?  I mean, it is possible to use
>> e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
>> change as a safety net, in the usual case for -mpreferred-stack-boundary=2
>> we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
>> anything, as TARGET_STV will be false, and if for whatever case it gets
>> through (target attribute, -mincoming-stack-boundary=, ...)
>> ix86_minimum_alignment will be there to ensure enough stack alignment.
>> Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
>> and that is something we don't want to affect.
>
> IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
> is only an optimization. Perhaps we can also emit a "sorry" for
> explicit -mstv in case stack boundary requirement is not satisfied.
> *If* there is a need for -mstv with smaller stack boundary, we can
> revisit this decision for later gcc versions.
>
> I think disabling STV is less surprising option than increasing stack
> boundary behind the user's back.
>
> Uros.

My ix86_minimum_alignment change is for correctness, which is required for
DImode spill/fill unless LRA supports unaligned spill/fill.   Disabling
TARGET_STV for  -mpreferred-stack-boundary=2 is an optimization.

Let me point it out again.  Checking  -mpreferred-stack-boundary < 3 to
disable STV won't work for 32-bit incoming stack boundary and 64-bit preferred
stack boundary.  In this case, STV is on.  When LRA needs 64-bit aligned stack
slot, stack must be realigned.  But for leaf function, we may not realign
stack if ix86_minimum_alignment returns 32 for DImode.   We must return 64
for DImode if STV is on in ix86_minimum_alignment.

-- 
H.J.
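
The argument above can be restated as a small model.  This is a sketch
under assumptions, not the ix86_minimum_alignment change itself (that
patch is not quoted in this message): model_min_align_dimode and
model_needs_realign are hypothetical names, and the realignment rule is
simplified to "needed alignment exceeds what the caller guarantees".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: minimum stack-slot alignment, in bits, reported
   for a DImode value on a 32-bit target.  With STV on, DImode values
   may be spilled from vector registers, so 64 bits is reported.  */
unsigned int
model_min_align_dimode (bool stv_enabled)
{
  return stv_enabled ? 64u : 32u;
}

/* Simplified rule: the stack is realigned only when the alignment a
   function asks for exceeds the incoming boundary.  */
bool
model_needs_realign (unsigned int incoming_bits, unsigned int needed_bits)
{
  assert (incoming_bits > 0 && needed_bits > 0);
  return needed_bits > incoming_bits;
}
```

In this model, a 32-bit incoming boundary with STV on requests
realignment only because 64 is reported for DImode; reporting 32 would
leave the leaf function's spill slot under-aligned.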

Re: [PATCH] fix #69251 - [6 Regression] ICE in unify_array_domain on a flexible array member

2016-02-02 Thread Jason Merrill


On 01/25/2016 05:55 PM, Martin Sebor wrote:
> The downside of this approach is that it prevents everything but
> the front end from distinguishing flexible array members from
> arrays of unspecified or unknown bounds.  The immediate impact
> is that it prevents us from maintaining ABI compatibility with GCC
> 5 (with -fabi-version=9) and from diagnosing the mangling change.
> This means should we decide to adopt this approach, the final
> version of the patch for c++/69277 mentioned above that's still
> pending approval will need to be tweaked to have the ABI checks
> removed.

That's unfortunate, but I think acceptable.

> * decl.c (compute_array_index_type): Return null for flexible array
> members.

Instead of this, I would think we can remove the calls to
compute_array_index_type added by your earlier patch, as well as many
other changes from that patch to handle null TYPE_MAX_VALUE.

> * tree.c (array_of_runtime_bound_p): Handle gracefully array types
> with null TYPE_MAX_VALUE.

This seems unneeded.

> (build_ctor_subob_ref): Loosen debug checking to handle flexible
> array members.

And this shouldn't need the TYPE_MAX_VALUE check.

Jason

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Uros Bizjak

On Tue, Feb 2, 2016 at 12:53 PM, Jakub Jelinek  wrote:

>> The bottom line is  ix86_minimum_alignment must return the correct
>> number for DImode or you can just turn off STV.   My suggestion is
>> to use my patch.
>
> Uros, any preferences here?  I mean, it is possible to use
> e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
> change as a safety net, in the usual case for -mpreferred-stack-boundary=2
> we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
> anything, as TARGET_STV will be false, and if for whatever case it gets
> through (target attribute, -mincoming-stack-boundary=, ...)
> ix86_minimum_alignment will be there to ensure enough stack alignment.
> Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
> and that is something we don't want to affect.

IMO, we should disable STV when -mpreferred-stack-boundary < 3, as STV
is only an optimization. Perhaps we can also emit a "sorry" for
explicit -mstv in case stack boundary requirement is not satisfied.
*If* there is a need for -mstv with smaller stack boundary, we can
revisit this decision for later gcc versions.

I think disabling STV is less surprising option than increasing stack
boundary behind the user's back.

Uros.

Re: [PATCH] target/68972 - g++.dg/cpp1y/vla-initlist1.C test case fails on powerpc64le

2016-02-02 Thread Jason Merrill


On 01/29/2016 02:25 AM, Martin Sebor wrote:
> What seems to happen is that the call to __builtin_alloca_with_align
> uses the stdu (store with update) instruction to store and bump down
> the stack pointer (SP) at the same time (standard for powerpc64le)
> to make room for the VLA.  The subsequent code then reads the saved
> value of the SP and uses it as the address of the VLA. Since the SP
> is 64 bits wide, it clobbers the first two words of the VLA.  The
> test looks at the second element, expecting it to be unchanged, but
> what it finds is the upper word of the saved SP. Since the saved SP
> value doesn't get read I don't see anything wrong with this.

Interesting.  What's the point of storing the SP if it isn't going to be
used to restore it later?

I think I'd prefer to disable the test on targets with quirks like this
rather than everywhere.


Jason

Re: [PATCH, PR target/69454] Disable TARGET_STV when stack is not properly aligned

2016-02-02 Thread Jakub Jelinek

On Thu, Jan 28, 2016 at 04:42:02AM -0800, H.J. Lu wrote:
> >> 2016-01-27  Jakub Jelinek  
> >> Ilya Enkovich  
> >>
> >> PR target/69454
> >> * config/i386/i386.c (convert_scalars_to_vector): Remove
> >> stack alignment fixes.
> >> (ix86_option_override_internal): Disable TARGET_STV if stack
> >> is not properly aligned.
> >>
> >> gcc/testsuite/
> >>
> >> 2016-01-27  Ilya Enkovich  
> >>
> >> PR target/69454
> >> * gcc.target/i386/pr69454-1.c: New test.
> >> * gcc.target/i386/pr69454-2.c: New test.
> >>
> >>
> >> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >> index 34b57a4..9fb8db8 100644
> >> --- a/gcc/config/i386/i386.c
> >> +++ b/gcc/config/i386/i386.c
> >> @@ -3588,16 +3588,6 @@ convert_scalars_to_vector ()
> >>bitmap_obstack_release (NULL);
> >>df_process_deferred_rescans ();
> >>
> >> -  /* Conversion means we may have 128bit register spills/fills
> >> - which require aligned stack.  */
> >> -  if (converted_insns)
> >> -{
> >> -  if (crtl->stack_alignment_needed < 128)
> >> -   crtl->stack_alignment_needed = 128;
> >> -  if (crtl->stack_alignment_estimated < 128)
> >> -   crtl->stack_alignment_estimated = 128;
> >> -}
> >> -
> >>return 0;
> >>  }
> >>
> >> @@ -5453,6 +5443,11 @@ ix86_option_override_internal (bool main_args_p,
> >>  opts->x_target_flags |= MASK_VZEROUPPER;
> >>if (!(opts_set->x_target_flags & MASK_STV))
> >>  opts->x_target_flags |= MASK_STV;
> >> +  /* Disable STV if -mpreferred-stack-boundary=2 - the needed
> >> + stack realignment will be extra cost the pass doesn't take into
> >> + account and the pass can't realign the stack.  */
> >> +  if (ix86_preferred_stack_boundary < 64)
> >> +opts->x_target_flags &= ~MASK_STV;
> >>
> >> This won't work for 32-bit incoming stack boundary and 64-bit preferred
> >> stack boundary.  In this case, STV won't be off.  When LRA needs 64-bit
> >> aligned stack slot, stack must be realigned.  But for leaf function, we may
> >> not realign stack if ix86_minimum_alignment returns 32 for DImode.   You
> >> must either add assert (!TARGET_STV) before returning 32 for DImode or
> >> return 64 for DImode if STV is on in ix86_minimum_alignment.
> >
> > TARGET_STV doesn't mean STV pass will run. We can check alignment in STV
> > pass gate and this assert would be wrong. If we decide STV to be dependent 
> > on
> > stack alignment then we shouldn't make alignment be dependent on STV. I can 
> > add
> > assert into convert_scalars_to_vector to check
> > crtl->stack_alignment_estimated >= 64
> > by that moment.
> >
> 
> The bottom line is  ix86_minimum_alignment must return the correct
> number for DImode or you can just turn off STV.   My suggestion is
> to use my patch.

Uros, any preferences here?  I mean, it is possible to use
e.g. the ix86_option_override_internal and have H.J's ix86_minimum_alignment
change as a safety net, in the usual case for -mpreferred-stack-boundary=2
we'll just disable TARGET_STV and ix86_minimum_alignment change won't do
anything, as TARGET_STV will be false, and if for whatever case it gets
through (target attribute, -mincoming-stack-boundary=, ...)
ix86_minimum_alignment will be there to ensure enough stack alignment.
Most of the smaller -mpreferred-stack-boundary= uses are -mno-sse anyway,
and that is something we don't want to affect.

Jakub

[PATCH] combine: distribute_notes again (PR69567, PR64682)

2016-02-02 Thread Segher Boessenkool

As it happens the patch I did over a year ago for PR64682 isn't quite
correct.  This is PR69567.  This fixes it.

Tested on the separate testcases; also did bootstrap + testsuite on
powerpc64-linux and x86_64-linux.  I'll commit this tomorrow or so if
no one sees something wrong with it.


Segher


2016-02-02  Segher Boessenkool  

PR rtl-optimization/64682
PR rtl-optimization/69567
* combine.c (distribute_notes) : Place the death note
before I2 only if the register is both used and set in I2.

---
 gcc/combine.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index ad79c44..7251078 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -13888,6 +13888,7 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
tem_insn = from_insn;
  else
{
+ tem_insn = i3;
  if (from_insn
  && CALL_P (from_insn)
  && find_reg_fusage (from_insn, USE, XEXP (note, 0)))
@@ -13896,7 +13897,14 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
place = i3;
  else if (i2 != 0 && next_nonnote_nondebug_insn (i2) == i3
   && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
-   place = i2;
+   {
+ place = i2;
+ /* If the new I2 sets the same register that is marked dead
+in the note, the note now should not be put on I2, as the
+note refers to a previous incarnation of the reg.  */
+ if (reg_set_p (XEXP (note, 0), PATTERN (i2)))
+   tem_insn = i2;
+   }
  else if ((rtx_equal_p (XEXP (note, 0), elim_i2)
&& !(i2mod
 && reg_overlap_mentioned_p (XEXP (note, 0),
@@ -13904,12 +13912,6 @@ distribute_notes (rtx notes, rtx_insn *from_insn, rtx_insn *i3, rtx_insn *i2,
   || rtx_equal_p (XEXP (note, 0), elim_i1)
   || rtx_equal_p (XEXP (note, 0), elim_i0))
break;
- tem_insn = i3;
- /* If the new I2 sets the same register that is marked dead
-in the note, the note now should not be put on I2, as the
-note refers to a previous incarnation of the reg.  */
- if (i2 != 0 && reg_set_p (XEXP (note, 0), PATTERN (i2)))
-   tem_insn = i2;
}
 
  if (place == 0)
-- 
1.9.3

[PATCH] Fix PR69595, bogus -Warray-bound warning

2016-02-02 Thread Richard Biener


The following is the minimal approach to fix this -Warray-bounds at
this stage.  We're pretty bad at optimizing range tests that
get optimizable during GIMPLE only (fold can handle quite some cases).
Too bad to optimize this before VRP gets to warn about out-of-bound
array accesses.  Later reassoc handles this case, but pass reordering
at this stage is not appropriate.

So the following implements range test simplifications that result
in true/false to catch not executable code early.

In the past I've always pondered that we need some better infrastructure
for collecting and combining conditions (similar to what we have
with tree-affine for affine combinations).  We have multiple passes
dealing with a subset of condition combinations (CFG / non-CFG,
re-writing and just statically computing).  Sharing this with some
proper infrastructure is the way to go.

I pondered to deal with this case in VRP itself but it doesn't really
fit there.

The patch handles slightly more cases than necessary for the PR
(in particular the && variant).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-02-02  Richard Biener  

PR tree-optimization/69595
* match.pd: Add range test simplifications to true/false. 

* gcc.dg/Warray-bounds-17.c: New testcase.

Index: gcc/match.pd
===
*** gcc/match.pd(revision 233067)
--- gcc/match.pd(working copy)
*** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
*** 2094,2099 
--- 2094,2117 
   (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1))
   @2)
  
+ /* Simple range test simplifications.  */
+ /* A < B || A >= B -> true.  */
+ (for test1 (lt le ne)
+  test2 (ge gt eq)
+  (simplify
+   (bit_ior:c (test1 @0 @1) (test2 @0 @1))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+|| VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
+{ constant_boolean_node (true, type); })))
+ /* A < B && A >= B -> false.  */
+ (for test1 (lt lt lt le ne eq)
+  test2 (ge gt eq gt eq gt)
+  (simplify
+   (bit_and:c (test1 @0 @1) (test2 @0 @1))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+|| VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
+{ constant_boolean_node (false, type); })))
+ 
  /* -A CMP -B -> B CMP A.  */
  (for cmp (tcc_comparison)
   scmp (swapped_tcc_comparison)
Index: gcc/testsuite/gcc.dg/Warray-bounds-17.c
===
*** gcc/testsuite/gcc.dg/Warray-bounds-17.c (revision 0)
--- gcc/testsuite/gcc.dg/Warray-bounds-17.c (working copy)
***
*** 0 
--- 1,13 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -Warray-bounds" } */
+ 
+ char *y;
+ void foo (int sysnum)
+ {
+   static char *x[] = {};
+   int nsyscalls = sizeof x / sizeof x[0];
+   if (sysnum < 0 || sysnum >= nsyscalls)
+ return;
+   else
+ y = x[sysnum]; /* { dg-bogus "above array bounds" } */
+ }
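
The operator pairings in the match.pd hunk above can be sanity-checked
by brute force outside the compiler.  This is only an illustrative
sketch; range_identities_hold, lt_or_ge_double and nan_breaks_lt_or_ge
are hypothetical helper names, not part of the patch.  The NaN helper
shows one reason such a fold would not be safe for floating point,
which is presumably why the pattern is guarded by INTEGRAL_TYPE_P.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Check every pair (a, b) in [-range, range] against the pairings
   listed in the patch.  */
bool
range_identities_hold (int range)
{
  assert (range >= 0);
  for (int a = -range; a <= range; ++a)
    for (int b = -range; b <= range; ++b)
      {
	/* bit_ior pairings (lt,ge) (le,gt) (ne,eq): always true.  */
	if (!((a < b) | (a >= b))) return false;
	if (!((a <= b) | (a > b))) return false;
	if (!((a != b) | (a == b))) return false;
	/* bit_and pairings (lt,ge) (lt,gt) (lt,eq) (le,gt) (ne,eq)
	   (eq,gt): always false.  */
	if ((a < b) & (a >= b)) return false;
	if ((a < b) & (a > b)) return false;
	if ((a < b) & (a == b)) return false;
	if ((a <= b) & (a > b)) return false;
	if ((a != b) & (a == b)) return false;
	if ((a == b) & (a > b)) return false;
      }
  return true;
}

/* The same disjunction on doubles.  */
bool
lt_or_ge_double (double a, double b)
{
  return (a < b) || (a >= b);
}

/* With a NaN operand both comparisons are false under IEEE semantics,
   so "A < B || A >= B" is not a tautology for floating point.  */
bool
nan_breaks_lt_or_ge (void)
{
  return !lt_or_ge_double (NAN, 1.0);
}
```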

Re: [aarch64] Improve TImode constant moves

2016-02-02 Thread James Greenhalgh

On Sun, Jan 24, 2016 at 02:54:32AM -0800, Richard Henderson wrote:
> This looks to be an incomplete transition of the aarch64 backend to
> CONST_WIDE_INT.  I haven't checked to see if it's a regression from
> gcc5, but I suspect not, since there should have been similar checks
> for CONST_DOUBLE.
> 
> This is probably gcc7 fodder, but it helped me debug another TImode PR.

When the time comes, this is OK.

Thanks,
James

>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle CONST_WIDE_INT.
>   (aarch64_legitimate_constant_p): Accept CONST_SCALAR_INT_P.
>   * config/aarch64/predicates.md (aarch64_movti_operand): Accept
>   const_wide_int and const_scalar_int_operand.
>   (aarch64_reg_or_imm): Likewise.
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index df3dec0..38c7443 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -6227,6 +6227,17 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer 
> ATTRIBUTE_UNUSED,
>   }
>return true;
>  
> +case CONST_WIDE_INT:
> +  *cost = 0;
> +  for (unsigned int n = CONST_WIDE_INT_NUNITS(x), i = 0; i < n; ++i)
> + {
> +   unsigned HOST_WIDE_INT e = CONST_WIDE_INT_ELT(x, i);
> +   if (e != 0)
> + *cost += COSTS_N_INSNS (aarch64_internal_mov_immediate
> + (NULL_RTX, GEN_INT (e), false, DImode));
> + }
> +  return true;
> +
>  case CONST_DOUBLE:
>if (speed)
>   {
> @@ -9400,6 +9411,9 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x)
>&& aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0
>  return true;
>  
> +  if (CONST_SCALAR_INT_P (x))
> +return true;
> +
>return aarch64_constant_address_p (x);
>  }
>  
> diff --git a/gcc/config/aarch64/predicates.md 
> b/gcc/config/aarch64/predicates.md
> index e96dc00..3eb33fa 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -217,15 +217,15 @@
>(match_test "aarch64_mov_operand_p (op, mode)")
>  
>  (define_predicate "aarch64_movti_operand"
> -  (and (match_code "reg,subreg,mem,const_int")
> +  (and (match_code "reg,subreg,mem,const_int,const_wide_int")
> (ior (match_operand 0 "register_operand")
>   (ior (match_operand 0 "memory_operand")
> -  (match_operand 0 "const_int_operand")
> +  (match_operand 0 "const_scalar_int_operand")
>  
>  (define_predicate "aarch64_reg_or_imm"
> -  (and (match_code "reg,subreg,const_int")
> +  (and (match_code "reg,subreg,const_int,const_wide_int")
> (ior (match_operand 0 "register_operand")
> - (match_operand 0 "const_int_operand"
> + (match_operand 0 "const_scalar_int_operand"
>  
>  ;; True for integer comparisons and for FP comparisons other than LTGT or 
> UNEQ.
>  (define_special_predicate "aarch64_comparison_operator"
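
The shape of the new CONST_WIDE_INT case in aarch64_rtx_costs (sum a per-element cost, skipping zero elements) can be modelled outside GCC. The sketch below is only an illustration under stated assumptions: `toy_mov_insns` is a hypothetical, crude stand-in for aarch64_internal_mov_immediate that charges one instruction per non-zero 16-bit chunk, and the COSTS_N_INSNS scaling used in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the backend's immediate-move counter:
   one instruction per non-zero 16-bit chunk of the 64-bit element.
   The real backend function is smarter than this.  */
unsigned toy_mov_insns(uint64_t e)
{
    unsigned n = 0;
    for (int shift = 0; shift < 64; shift += 16)
        if ((e >> shift) & 0xffff)
            n++;
    return n;
}

/* Same loop shape as the patch: walk the 64-bit elements of the wide
   constant and add up the cost of each non-zero one.  */
unsigned toy_wide_int_cost(const uint64_t *elts, size_t nunits)
{
    unsigned cost = 0;

    assert(elts != NULL || nunits == 0);
    for (size_t i = 0; i < nunits; i++)
        if (elts[i] != 0)
            cost += toy_mov_insns(elts[i]);
    return cost;
}
```

In this toy model a 128-bit constant whose upper half is zero costs no more than the corresponding 64-bit constant, which matches the zero-element skip in the patch.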

Re: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.

2016-02-02 Thread James Greenhalgh

On Wed, Jan 20, 2016 at 03:22:11PM +, James Greenhalgh wrote:
> 
> Hi,
> 
> In a number of cases where we try to create vectors we end up spilling to the
> stack and then filling. This is one example distilled from a couple of
> micro-benchmarks where the issue shows up. The reason for the extra cost
> in this case is the unnecessary use of the stack. The patch attempts to
> finesse this by using lane loads or vector inserts to produce the right
> results.
> 
> This patch is mostly Ramana's work, I've just cleaned it up a little.
> 
> This has been in a number of our trees lately, and we haven't seen any
> regressions. I've also bootstrapped and tested it, and run a set of
> benchmarks to show no regressions on Cortex-A57 or Cortex-A53.
> 
> The patch fixes some regressions caused by the more aggressive vectorization
> in GCC6, so I'd like to propose it to go in even though we are in Stage 4.
> 
> OK?

*Ping*

I just ran into this while investigating another performance regression. It
would be nice to get this fixed.

Thanks,
James


> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2016-01-20  James Greenhalgh  
>   Ramana Radhakrishnan  
> 
>   * config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor,
>   always use lane loads to construct non-constant vectors.
> 
> gcc/testsuite/
> 
> 2016-01-20  James Greenhalgh  
>   Ramana Radhakrishnan  
> 
>   * gcc.target/aarch64/vector_initialization_nostack.c: New.
> 

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 03bc1b9..3787b38 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10985,28 +10985,37 @@ aarch64_simd_make_constant (rtx vals)
>  return NULL_RTX;
>  }
>  
> +/* Expand a vector initialisation sequence, such that TARGET is
> +   initialised to contain VALS.  */
> +
>  void
>  aarch64_expand_vector_init (rtx target, rtx vals)
>  {
>machine_mode mode = GET_MODE (target);
>machine_mode inner_mode = GET_MODE_INNER (mode);
> +  /* The number of vector elements.  */
>int n_elts = GET_MODE_NUNITS (mode);
> +  /* The number of vector elements which are not constant.  */
>int n_var = 0;
>rtx any_const = NULL_RTX;
> +  /* The first element of vals.  */
> +  rtx v0 = XVECEXP (vals, 0, 0);
>bool all_same = true;
>  
> +  /* Count the number of variable elements to initialise.  */
>for (int i = 0; i < n_elts; ++i)
>  {
>rtx x = XVECEXP (vals, 0, i);
> -  if (!CONST_INT_P (x) && !CONST_DOUBLE_P (x))
> +  if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
>   ++n_var;
>else
>   any_const = x;
>  
> -  if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
> - all_same = false;
> +  all_same &= rtx_equal_p (x, v0);
>  }
>  
> +  /* No variable elements, hand off to aarch64_simd_make_constant which knows
> + how best to handle this.  */
>if (n_var == 0)
>  {
>rtx constant = aarch64_simd_make_constant (vals);
> @@ -11020,14 +11029,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
>/* Splat a single non-constant element if we can.  */
>if (all_same)
>  {
> -  rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
> +  rtx x = copy_to_mode_reg (inner_mode, v0);
>aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
>return;
>  }
>  
> -  /* Half the fields (or less) are non-constant.  Load constant then 
> overwrite
> - varying fields.  Hope that this is more efficient than using the stack. 
>  */
> -  if (n_var <= n_elts/2)
> +  /* Initialise a vector which is part-variable.  We want to first try
> + to build those lanes which are constant in the most efficient way we
> + can.  */
> +  if (n_var != n_elts)
>  {
>rtx copy = copy_rtx (vals);
>  
> @@ -11054,31 +11064,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
> XVECEXP (copy, 0, i) = subst;
>   }
>aarch64_expand_vector_init (target, copy);
> +}
>  
> -  /* Insert variables.  */
> -  enum insn_code icode = optab_handler (vec_set_optab, mode);
> -  gcc_assert (icode != CODE_FOR_nothing);
> +  /* Insert the variable lanes directly.  */
>  
> -  for (int i = 0; i < n_elts; i++)
> - {
> -   rtx x = XVECEXP (vals, 0, i);
> -   if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
> - continue;
> -   x = copy_to_mode_reg (inner_mode, x);
> -   emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
> - }
> -  return;
> -}
> +  enum insn_code icode = optab_handler (vec_set_optab, mode);
> +  gcc_assert (icode != CODE_FOR_nothing);
>  
> -  /* Construct the vector in memory one field at a time
> - and load the whole vector.  */
> -  rtx mem = assign_stack_temp (mode, GET_MODE_SIZE (mode));
>for (int i = 0; i < n_elts; i++)
> -emit_move_insn (adjust_address_nv (mem, inner_mode,
> - i * GET_MODE_SIZE (inner_mode)),
> - XVECEXP

Re: [PATCH] [graphite] document that isl-0.16 is supported

2016-02-02 Thread Sebastian Huber

On 01/02/16 19:27, Mike Stump wrote:

On Jan 29, 2016, at 8:10 AM, Sebastian Pop  wrote:

>diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
>index 062f42c..3df7974 100644
>--- a/gcc/doc/install.texi
>+++ b/gcc/doc/install.texi
>@@ -383,7 +383,7 @@ installed but it is not in your default library search 
path, the
>@option{--with-mpc} configure option should be used.  See also
>@option{--with-mpc-lib} and @option{--with-mpc-include}.
>
>-@item isl Library version 0.15 or 0.14.
>+@item isl Library version 0.16, 0.15, or 0.14.

So, they should commit to compatibility with apis vended, and if they do, I 
think we should say 0.14 or later.  This doesn’t mean that we won’t need fixes 
from time to time or that they will always do this, but generally that is true. 
If we (they) deviate from this, _then_ we document the exceptions.  If release 
after release goes out, with all newer versions not working, then the list of 
known good versions is best; but, I’d say that something is terribly broken if 
we had to do that.

It would be good to have a recommended version as well (similar for 
cloog, gmp, mpc and mpfr). If you present me three versions which one 
should I choose as a naive user? Are the versions in the 
contrib/download_prerequisites script the recommended ones?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.

Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2016-02-02 Thread James Greenhalgh

On Tue, Jan 26, 2016 at 05:39:24PM +, Wilco Dijkstra wrote:
> ping (note the regressions discussed below are addressed by 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html)

OK, but please be extra vigilant for any fallout on AArch64 after this
and the follow-up linked above is applied.

Thanks,
James

> James Greenhalgh wrote:
> > On Wed, Dec 16, 2015 at 01:05:21PM +, Wilco Dijkstra wrote:
> > > James Greenhalgh wrote:
> > > > On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > > > > ping
> > > > >
> > > > > > -Original Message-
> > > > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > > > Sent: 06 November 2015 20:06
> > > > > > To: 'gcc-patches@gcc.gnu.org'
> > > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > >
> > > > > > This patch adds support for the 
> > > > > > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the 
> > > > > > register
> > > > > > allocator always uses ALL_REGS even when it has a much higher cost. 
> > > > > > The
> > > > > > hook changes the class to either FP_REGS or GENERAL_REGS depending 
> > > > > > on the
> > > > > > mode of the register. This results in better register allocation 
> > > > > > overall,
> > > > > > fewer spills and reduced codesize - particularly in SPEC2006 gamess.
> > > > > >
> > > > > > GCC regression passes with several minor fixes.
> > > > > >
> > > > > > OK for commit?
> > > > > >
> > > > > > ChangeLog:
> > > > > > 2015-11-06  Wilco Dijkstra  
> > > > > >
> > > > > >   * gcc/config/aarch64/aarch64.c
> > > > > >   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > > >   (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > > >   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > > >   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > >   (test_corners_sisd_di): Improve force to SIMD register.
> > > > > >   (test_corners_sisd_si): Likewise.
> > > > > >   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with 
> > > > > > -O2.
> > > > > >   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > > >   Remove scan-assembler check for ldr.
> > > >
> > > > Drop the gcc/ from the ChangeLog.
> > > >
> > > > > > --
> > > > > >  gcc/config/aarch64/aarch64.c   | 22 
> > > > > > ++
> > > > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > > > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > > > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> > > >
> > > > These testsuite changes concern me a bit, and you don't mention them 
> > > > beyond
> > > > saying they are minor fixes...
> > >
> > > Well any changes to register allocator preferencing would cause fallout in
> > > tests that are assuming which register is allocated, especially if they 
> > > use
> > > nasty inline assembler hacks to do so...
> >
> > Sure, but the testcases here each operate on data that should live in
> > FP_REGS given the initial conditions that the nasty hacks try to mimic -
> > that's what makes the regressions notable.
> >
> > >
> > > > > >  #define FCVTDEF(ftype,itype) \
> > > > > >  void \
> > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > index 363f554..8465c89 100644
> > > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > > >  {
> > > > > >force_simd_di (b);
> > > > > >b = b >> 63;
> > > > > > +  force_simd_di (b);
> > > > > >b = b >> 0;
> > > > > >b += b >> 65; /* { dg-warning "right shift count >= width of 
> > > > > > type" } */
> > > > > > -  force_simd_di (b);
> > > >
> > > > This one I don't understand, but seems to say that we've decided to move
> > > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > > > another register allocator regression?
> > >
> > > No, basically the register allocator is now making better decisions as to
> > > where to allocate integer variables. It will only allocate them to FP
> > > registers if they are primarily used by other FP operations. The
> > > force_simd_di inline assembler tries to mimic FP uses, and if there are
> > > enough of them at the right places then everything works as expected.  If
> > > however you do 3 consecutive integer operations then the allocator will 
> > > now
> > > correctly prefer to allocate them to the integer registers (while 
> > > previously
> > > it wouldn't, which is inefficient).
> >
> > I'm not sure I understand this argument in the abstract (though I believe
> > it for some of the supported core

[PING] Add new mexecute-only arm option.

2016-02-02 Thread mickael guene


Hi All,

Ping for the following thread:

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01968.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01969.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01970.html

Thanks
Mickael

Re: [PATCH] Fix compile/memory hog in the combiner (PR rtl-optimization/69592)

2016-02-02 Thread Jakub Jelinek

On Tue, Feb 02, 2016 at 12:50:39AM +0100, Bernd Schmidt wrote:
> On 02/01/2016 09:34 PM, Jakub Jelinek wrote:
> >On the following testcase we completely uselessly consume about 5.5GB
> >of RAM and lots of compile time.  The problem is the code to avoid
> >exponential behavior of nonzero_bits/num_sign_bit_copies on binary
> >arithmetics rtxes, which causes us to recurse even when handling
> >of those rtxes is going to ignore those arguments.
> >So, this patch limits those only to the cases where we are going
> >to recurse on both arguments, for rtxes where we don't look at arguments
> >at all or where we only recurse on a single argument it doesn't make
> >sense.  On the testcase, one of the rtxes where the new predicates
> >return false but ARITHMETIC_P is true, is COMPARE, in particular
> >(compare:CCC (plus (x) (y)) (x)), where it is known even without
> >looking at the operands that only one bit is possibly non-zero and
> >number of sign bit copies is always 1.  But without the patch we
> >needlessly recurse on x, which is set by another similar operation etc.
> 
> Hmm, so the code we have to eliminate performance problems is itself causing
> them?

Yes.

> I don't see any code handling COMPARE in nonzero_bits1, only the various
> EQ/NE/etc. codes.

Right.

> I think I have a slight preference for listing the cases where we know we
> can avoid the exponential behaviour workaround - i.e. just test for compares
> and return false for them. Otherwise someone might add another of the
> arithmetic codes to nonzero_bits without noticing they have to adjust this
> function as well.

The problem is that there are more codes that aren't handled and thus
listing those that need to be handled is shorter and IMO more maintainable.
There are 32 ARITHMETIC_P codes, and nonzero_bits1 handles by recursing on
both operands 14 of them, and num_sign_bit_copies1 only 10.

I wonder if it wouldn't be better to pass around some structure, containing
for the common case fixed size cache and perhaps fall back to hash_map if
there are more calls to cache than that.  Plus perhaps a recursion depth, so
that we avoid other pathological cases.

Jakub
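
The idea floated in the last paragraph (a context structure carrying a fixed-size cache plus a recursion depth) might look roughly like the C sketch below. This is not code from the combiner: every name, the slot count and the depth limit are hypothetical, the hash_map fallback is omitted (a full cache simply stops recording), and "possibly non-zero bits" is modelled as a plain OR of the operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64   /* hypothetical "common case" cache size */
#define MAX_DEPTH   128  /* hypothetical recursion limit */
#define MAX_CHAIN   40   /* largest chain built by the demo helper */

struct node {
    const struct node *op0, *op1;  /* both NULL for a leaf */
    uint64_t leaf_bits;            /* possibly non-zero bits of a leaf */
};

struct bits_ctx {
    const struct node *key[CACHE_SLOTS];
    uint64_t value[CACHE_SLOTS];
    size_t used;
    unsigned depth;
    unsigned long visits;          /* instrumentation only */
};

/* Toy analogue of a nonzero-bits query on a binary operation.  */
uint64_t possible_bits(const struct node *n, struct bits_ctx *ctx)
{
    if (n->op0 == NULL)
        return n->leaf_bits;

    /* Fixed-size cache, searched linearly.  */
    for (size_t i = 0; i < ctx->used; i++)
        if (ctx->key[i] == n)
            return ctx->value[i];

    /* Too deep: give the conservative answer (any bit may be set).  */
    if (ctx->depth >= MAX_DEPTH)
        return UINT64_MAX;

    ctx->visits++;
    ctx->depth++;
    uint64_t r = possible_bits(n->op0, ctx) | possible_bits(n->op1, ctx);
    ctx->depth--;

    /* Record the result while there is room; a real implementation
       might switch to a hash map here instead.  */
    if (ctx->used < CACHE_SLOTS) {
        ctx->key[ctx->used] = n;
        ctx->value[ctx->used] = r;
        ctx->used++;
    }
    return r;
}

/* Build a chain in which both operands of each node are the previous
   node, the shape that is exponential without caching, and report how
   many interior nodes were actually evaluated.  */
unsigned long visits_for_chain(unsigned length, uint64_t *bits_out)
{
    struct node nodes[MAX_CHAIN + 1];
    struct bits_ctx ctx;

    assert(length <= MAX_CHAIN);
    memset(&ctx, 0, sizeof ctx);
    memset(nodes, 0, sizeof nodes);
    nodes[0].leaf_bits = 1;
    for (unsigned i = 1; i <= length; i++)
        nodes[i].op0 = nodes[i].op1 = &nodes[i - 1];

    *bits_out = possible_bits(&nodes[length], &ctx);
    return ctx.visits;
}
```

With the cache in place, a chain of 40 such nodes evaluates each interior node once, where an uncached walk of the same chain would make on the order of 2^40 calls.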
