date:20160112

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2016-01-12 Thread Jeff Law


On 01/12/2016 08:11 AM, Richard Biener wrote:

On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:

On 01/11/2016 03:32 AM, Richard Biener wrote:



Yeah, reassoc is largely about canonicalization.


Plus doing it in TER is almost certainly more complex than getting it
right in reassoc to begin with.



I guess canonicalizing differently is ok but you'll still create
((a & b) & 1) & c then if you only change the above place.


What's best for that expression would depend on factors like whether or not
the target can exploit ILP.  ie (a & b) & (1 & c) exposes more parallelism
while (((a & b) & c) & 1) is not good for parallelism, but does expose the
bit test.

reassoc currently generates ((a & 1) & b) & c which is dreadful as there's
no ILP or chance of creating a bit test.  My patch shuffles things around,
but still doesn't expose the ILP or bit test in the 4 operand case.  Based
on the comments in reassoc, it didn't seem like the author thought anything
beyond the 3-operand case was worth handling. So my patch just handles the
3-operand case.





So I'm not sure what pattern the backend is looking for?


It just wants the constant last in the sequence.  That exposes bit clear,
set, flip, test, etc idioms.


But those don't feed another bit operation, right?  Thus we'd like to see
((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
are designed to feed conditionals (aka CC consuming ops)?

At the gimple level they could feed a conditional, or be part of a
series of ops on an SSA_NAME that eventually gets stored to memory, etc.
At the RTL level they'll feed CC consumers and bit manipulations of
pseudos or memory.


For the 3-op case, we always want the constant last.  For the 4-op case 
it's less clear.  Though ((a & b) & c) & 1 is certainly better than ((a 
& b) & 1) & c.
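
For reference, a small sketch in plain C of the three shapes discussed
above.  It only shows that they compute the same value and which one
ends in a bit test; it says nothing about what reassoc emits for a given
target.  The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Shape reassoc currently generates per the thread: ((a & 1) & b) & c.  */
uint32_t and_const_first (uint32_t a, uint32_t b, uint32_t c)
{
  return ((a & 1u) & b) & c;
}

/* Constant last: ((a & b) & c) & 1, i.e. a test of bit 0 of a & b & c.  */
uint32_t and_const_last (uint32_t a, uint32_t b, uint32_t c)
{
  return ((a & b) & c) & 1u;
}

/* Balanced: (a & b) & (c & 1); the two inner ANDs are independent.  */
uint32_t and_balanced (uint32_t a, uint32_t b, uint32_t c)
{
  return (a & b) & (c & 1u);
}

/* Exhaustively compare the three shapes on a small range.  */
void check_shapes (void)
{
  for (uint32_t a = 0; a < 8; a++)
    for (uint32_t b = 0; b < 8; b++)
      for (uint32_t c = 0; c < 8; c++)
        {
          uint32_t ref = and_const_last (a, b, c);
          assert (and_const_first (a, b, c) == ref);
          assert (and_balanced (a, b, c) == ref);
        }
}
```

Since AND is associative and commutative the choice is purely about
which intermediate values exist, which is why it can be left to
canonicalization.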


Jeff

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Jim Wilson

On Tue, Jan 12, 2016 at 5:40 PM, Jim Wilson  wrote:
> The info is in here
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932
> See the comments on gcc.target/arm/wmul-[123].c which no longer
> generate smulbb etc instructions, which are 16x16=32 expanding
> multiplies which are faster on some older parts that have them.  They
> are present in armv5e and higher architecture versions.

I forgot about the ldrub/ldrsb problem.  ldrub is preferred,
particularly for older targets, e.g. thumb1, as it accepts more
addressing modes than ldrsb.  We can't get ldrub if PROMOTE_MODE
doesn't do unsigned extension.

So we have a number of bad choices here:
1) We can remove sign-changing promotions from PROMOTE_MODE, and
accept slower code for pre-thumb2 architectures.
2) We can add sign-changing promotions to function_promote_mode, and
accept a minor ABI change.
3) We can add strange and probably fragile extensions to the middle
end to work around the ARM back end problem.
4) We can just leave the ARM port broken and let it occasionally
generate incorrect code.

Option 4 is the one that we've been using for the last 8 months or so.
I think we should do either 1 or 2, though that depends on what the
ARM maintainers are willing to accept.
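
To make the correctness side concrete, here is a minimal sketch in plain
C (not ARM back end code, and the function names are hypothetical) of
why a zero-extended value cannot stand in for a signed char without a
later fixup.  It assumes 8-bit chars.

```c
#include <assert.h>

/* Widen a signed char to int; the language requires sign extension.  */
int widen_signed (signed char c)
{
  return c;
}

/* Model of what a zero-extending byte load would leave in the register
   if nothing corrected it afterwards.  */
int widen_as_unsigned (signed char c)
{
  return (unsigned char) c;
}

/* The two agree for non-negative values and differ for negative ones.  */
void check_widening (void)
{
  assert (widen_signed (5) == widen_as_unsigned (5));
  assert (widen_signed (-1) == -1);
  assert (widen_as_unsigned (-1) == 255);
}
```

Any of options 1 to 3 has to guarantee the first behaviour for values
that are really signed; option 4 is the case where the second one leaks
through.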

Jim

[PATCH][PR tree-optimization/pr67755] Fix profile insanity adjustments

2016-01-12 Thread Jeff Law



tree-ssa-threadupdate.c has code to compensate for situations where 
threading will produce "profile insanities".  Specifically if we have 
multiple jump thread paths through a common joiner to different final 
targets, then we can end up increasing the counts on one path to 
compensate for counts that will get lost when threading the later path.


I feel that code is papering over the real issue elsewhere in the 
threader's profile update code, but I'm not sure we can fix without some 
significant revamping, which is probably riskier than we'd like at this 
stage.


So this patch detects more accurately when those adjustments may be 
needed rather than just blindly applying them all the time.


The net result is more accurate counts/probabilities in cases where 
there's a single destination for one or more jump threads through a 
common join block as is the case in the BZ.  I also ran some older GCC 
.i files and saw it making similar fixes in a half-dozen or so files via 
that test.


Additionally, I verified that there are no new "Invalid sum of..."
messages across a blob of .i files I had lying around with the patch
installed for both vrp and dom passes.


And, of course the usual bootstrap and regression testing on 
x86_64-linux-gnu and verification of the testcase using a ppc64 cross 
compiler.


Installed on the trunk.

Jeff
commit f1af94e0b32743de707f7f872975034ff1a56b64
Author: law 
Date:   Wed Jan 13 04:17:36 2016 +

[PATCH][PR tree-optimization/pr67755] Fix profile insanity adjustments

PR tree-optimization/pr67755
* tree-ssa-threadupdate.c (struct ssa_local_info_t): Add new field
"need_profile_correction".
(thread_block_1): Initialize new field to false by default.  If we
have multiple thread paths through a common joiner to different
final targets, then set new field to true.
(compute_path_counts): Only do count adjustment when it's really
needed.

PR tree-optimization/67755
* gcc.dg/tree-ssa/pr67755.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@232313 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6423a37..f8f818b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2016-01-12  Jeff Law  
+
+   PR tree-optimization/pr67755
+   * tree-ssa-threadupdate.c (struct ssa_local_info_t): Add new field
+   "need_profile_correction".
+   (thread_block_1): Initialize new field to false by default.  If we
+   have multiple thread paths through a common joiner to different
+   final targets, then set new field to true.
+   (compute_path_counts): Only do count adjustment when it's really
+   needed.
+
 2016-01-12  Sandra Loosemore 
 
* doc/invoke.texi (Spec Files): Move section down in file, past
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 11f3b0c..240fae7 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,4 +1,9 @@
-2015-01-13  Thomas Preud'homme  
+2016-01-13  Jeff Law  
+
+   PR tree-optimization/67755
+   * gcc.dg/tree-ssa/pr67755.c: New test.
+
+2016-01-13  Thomas Preud'homme  
 
* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Replace static
pass number in output by a star.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr67755.c b/gcc/testsuite/gcc.dg/tree-ssa/pr67755.c
new file mode 100644
index 000..64ffd0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr67755.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-dom2-details-blocks" } */
+/* We want to verify no outgoing edge from a conditional
+   has a probability of 100%.  */
+/* { dg-final { scan-tree-dump-not "succ:\[ \]+. .100.0%.  .\(TRUE|FALSE\)_VALUE" "dom2"} } */
+
+
+void (*zend_block_interruptions) (void);
+
+int * _zend_mm_alloc_int (int * heap, long int size)
+{
+  int *best_fit;
+  long int true_size = (size < 15 ? 32 : size);
+
+  if (zend_block_interruptions)
+    zend_block_interruptions ();
+
+  if (__builtin_expect ((true_size < 543), 1))
+    best_fit = heap + 2;
+  else
+    best_fit = heap;
+
+  return best_fit;
+}
+
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 1bf9ae6..4783c4b 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -239,6 +239,11 @@ struct ssa_local_info_t
 
   /* Blocks duplicated for the thread.  */
   bitmap duplicate_blocks;
+
+  /* When we have multiple paths through a joiner which reach different
+ final destinations, then we may need to correct for potential
+ profile insanities.  */
+  bool need_profile_correction;
 };
 
 /* Passes which use the jump threading code register jump threading
@@ -826,7 +831,8 @@ compute_path_counts (struct redirection_data *rd,
  So

Re: [PATCH] PR testsuite/69181: ensure expected multiline outputs is cleared per-test

2016-01-12 Thread Jeff Law


On 01/12/2016 12:34 PM, David Malcolm wrote:


I looked at this code, and there are two near-identical blocks which
reset all these variables. You are modifying only one of them, leaving
the one inside the if { catch } thing unchanged - is this intentional?


I'm not particularly strong at Tcl, but am I right in thinking that
given that we have this:

if { [ catch { eval saved-dg-test $args } errmsg ] } {
(A) set and unset various things
error $errmsg $saved_info
}
(B) set and unset the same various things as (A)

that (B) will always be reached, and that the duplicates in (A) are
redundant? (unless they affect "error")

Seems like it would, but, well it's TCL, so who in the hell knows.




I see that this pattern was introduced back in r67696 aka
91a385a522a94154f9e0cd940c5937177737af02:

Strangely, I can't find the patch in the archives nor any discussion for
the patch.  It seems to have appeared from nowhere.   My search-fu must 
be weak tonight.  It may not have helped understand why this code is the 
way it is anyway.


This duplication screams that it ought to be its own procedure if we're 
going to keep the apparently duplicated behaviour.


Jeff

[doc, 2/n] invoke.texi: move spec file section

2016-01-12 Thread Sandra Loosemore

I've checked in this patch to move the section that documents spec file 
formats towards the end of the invoke.texi chapter, so it isn't stuck 
randomly in the middle of the discussion of unrelated command-line 
options.  I made no changes to the text of the section being moved here.


-Sandra

2016-01-12  Sandra Loosemore 

	gcc/
	* doc/invoke.texi (Spec Files): Move section down in file, past
	all command-line option descriptions.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 232306)
+++ gcc/doc/invoke.texi	(working copy)
@@ -146,11 +146,11 @@ only one of these two forms, whichever o
 * Link Options::Specifying libraries and so on.
 * Directory Options::   Where to find header files and libraries.
 Where to find the compiler executable files.
-* Spec Files::  How to pass switches to sub-processes.
 * Submodel Options::Specifying minor hardware or convention variations,
 such as 68010 vs 68020.
 * Code Gen Options::Specifying conventions for function calls, data layout
 and register usage.
+* Spec Files::  How to pass switches to sub-processes.
 * Environment Variables:: Env vars that affect GCC.
 * Precompiled Headers:: Compiling a header once, and using it many times.
 @end menu
@@ -11840,586 +11840,6 @@ for header files.  Thus, @option{-I-} an
 independent.
 @end table
 
-@c man end
-
-@node Spec Files
-@section Specifying Subprocesses and the Switches to Pass to Them
-@cindex Spec Files
-
-@command{gcc} is a driver program.  It performs its job by invoking a
-sequence of other programs to do the work of compiling, assembling and
-linking.  GCC interprets its command-line parameters and uses these to
-deduce which programs it should invoke, and which command-line options
-it ought to place on their command lines.  This behavior is controlled
-by @dfn{spec strings}.  In most cases there is one spec string for each
-program that GCC can invoke, but a few programs have multiple spec
-strings to control their behavior.  The spec strings built into GCC can
-be overridden by using the @option{-specs=} command-line switch to specify
-a spec file.
-
-@dfn{Spec files} are plaintext files that are used to construct spec
-strings.  They consist of a sequence of directives separated by blank
-lines.  The type of directive is determined by the first non-whitespace
-character on the line, which can be one of the following:
-
-@table @code
-@item %@var{command}
-Issues a @var{command} to the spec file processor.  The commands that can
-appear here are:
-
-@table @code
-@item %include <@var{file}>
-@cindex @code{%include}
-Search for @var{file} and insert its text at the current point in the
-specs file.
-
-@item %include_noerr <@var{file}>
-@cindex @code{%include_noerr}
-Just like @samp{%include}, but do not generate an error message if the include
-file cannot be found.
-
-@item %rename @var{old_name} @var{new_name}
-@cindex @code{%rename}
-Rename the spec string @var{old_name} to @var{new_name}.
-
-@end table
-
-@item *[@var{spec_name}]:
-This tells the compiler to create, override or delete the named spec
-string.  All lines after this directive up to the next directive or
-blank line are considered to be the text for the spec string.  If this
-results in an empty string then the spec is deleted.  (Or, if the
-spec did not exist, then nothing happens.)  Otherwise, if the spec
-does not currently exist a new spec is created.  If the spec does
-exist then its contents are overridden by the text of this
-directive, unless the first character of that text is the @samp{+}
-character, in which case the text is appended to the spec.
-
-@item [@var{suffix}]:
-Creates a new @samp{[@var{suffix}] spec} pair.  All lines after this directive
-and up to the next directive or blank line are considered to make up the
-spec string for the indicated suffix.  When the compiler encounters an
-input file with the named suffix, it processes the spec string in
-order to work out how to compile that file.  For example:
-
-@smallexample
-.ZZ:
-z-compile -input %i
-@end smallexample
-
-This says that any input file whose name ends in @samp{.ZZ} should be
-passed to the program @samp{z-compile}, which should be invoked with the
-command-line switch @option{-input} and with the result of performing the
-@samp{%i} substitution.  (See below.)
-
-As an alternative to providing a spec string, the text following a
-suffix directive can be one of the following:
-
-@table @code
-@item @@@var{language}
-This says that the suffix is an alias for a known @var{language}.  This is
-similar to using the @option{-x} command-line switch to GCC to specify a
-language explicitly.  For example:
-
-@smallexample
-.ZZ:
-@@c++
-@end smallexample
-
-Says that .ZZ files are, in fact, C++ source files.
-
-@item #@var{name}
-This causes an error messages

[PATCH] Fix PR69174

2016-01-12 Thread Richard Biener


This fixes an oversight in strided permuted SLP loads which miscomputed
the number of required loads.
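
As a rough illustration of the corrected count: the patch below uses a
round-up division of group_size * vf by nunits for the permuted case.
The sketch here just isolates that arithmetic; the parameter names
follow the patch, while the function name is made up.

```c
#include <assert.h>

/* Number of vectors with NUNITS lanes needed to cover the whole group,
   i.e. GROUP_SIZE * VF scalar elements, rounded up.  */
unsigned loads_for_whole_group (unsigned group_size, unsigned vf,
                                unsigned nunits)
{
  return (group_size * vf + nunits - 1) / nunits;
}

/* A few hand-computed cases.  */
void check_loads (void)
{
  assert (loads_for_whole_group (3, 4, 4) == 3);   /* 12 elements */
  assert (loads_for_whole_group (3, 2, 4) == 2);   /* 6 elements, rounded up */
  assert (loads_for_whole_group (4, 4, 4) == 4);   /* 16 elements */
}
```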

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69174
* tree-vect-stmts.c (vect_mark_relevant): Remove excessive vertical
space.
(vectorizable_load): Properly compute the number of loads needed
for permuted strided SLP loads and do not spuriously assign
to SLP_TREE_VEC_STMTS.

* gcc.dg/torture/pr69174.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 232213)
--- gcc/tree-vect-stmts.c   (working copy)
*** vect_mark_relevant (vec *workl
*** 190,197 
gimple *pattern_stmt;
  
if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location,
!  "mark relevant %d, live %d.\n", relevant, live_p);
  
/* If this stmt is an original stmt in a pattern, we might need to mark its
   related pattern stmt instead of the original stmt.  However, such stmts
--- 190,200 
gimple *pattern_stmt;
  
if (dump_enabled_p ())
! {
!   dump_printf_loc (MSG_NOTE, vect_location,
!  "mark relevant %d, live %d: ", relevant, live_p);
!   dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
! }
  
/* If this stmt is an original stmt in a pattern, we might need to mark its
   related pattern stmt instead of the original stmt.  However, such stmts
*** vectorizable_load (gimple *stmt, gimple_
*** 6748,6756 
  else
ltype = vectype;
  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
! ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
  if (slp_perm)
!   dr_chain.create (ncopies);
}
for (j = 0; j < ncopies; j++)
{
--- 6751,6766 
  else
ltype = vectype;
  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
! /* For SLP permutation support we need to load the whole group,
!not only the number of vector stmts the permutation result
!fits in.  */
  if (slp_perm)
!   {
! ncopies = (group_size * vf + nunits - 1) / nunits;
! dr_chain.create (ncopies);
!   }
! else
!   ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
}
for (j = 0; j < ncopies; j++)
{
*** vectorizable_load (gimple *stmt, gimple_
*** 6798,6806 
  
  if (slp)
{
- SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
  if (slp_perm)
dr_chain.quick_push (gimple_assign_lhs (new_stmt));
}
  else
{
--- 6808,6817 
  
  if (slp)
{
  if (slp_perm)
dr_chain.quick_push (gimple_assign_lhs (new_stmt));
+ else
+   SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
}
  else
{
Index: gcc/testsuite/gcc.dg/torture/pr69174.c
===
*** gcc/testsuite/gcc.dg/torture/pr69174.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69174.c  (working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ 
+ typedef int pixval;
+ typedef struct { pixval r, g, b; } xel;
+ int convertRow_sample, convertRaster_col;
+ short *convertRow_samplebuf;
+ xel *convertRow_xelrow;
+ short convertRow_spp;
+ void fn1() {
+ int *alpharow;
+ for (; convertRaster_col;
+      ++convertRaster_col, convertRow_sample += convertRow_spp) {
+   convertRow_xelrow[convertRaster_col].r =
+   convertRow_xelrow[convertRaster_col].g =
+   convertRow_xelrow[convertRaster_col].b =
+   convertRow_samplebuf[convertRow_sample];
+   alpharow[convertRaster_col] = convertRow_samplebuf[convertRow_sample + 3];
+ }
+ }

Re: [PATCH, ARM] Fix target/69180] #pragma GCC target should not warn about redefined macros

2016-01-12 Thread Kyrill Tkachov



On 12/01/16 09:00, Christian Bruel wrote:



On 01/11/2016 03:37 PM, Kyrill Tkachov wrote:

On 11/01/16 12:54, Christian Bruel wrote:

Hi Kyrill,

On 01/11/2016 12:32 PM, Kyrill Tkachov wrote:

Hi Christian,

On 07/01/16 15:40, Christian Bruel wrote:
As discussed with Kyrill
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00307.html), this patch
avoids a confusing (for the testsuite) macro redefinition warning or
pedantic error when the user changes FP versions implicitly with a
#pragma GCC target. The warning is kept when the macro is redefined
explicitly by the user.

tested on arm-linux-gnueabi for {,-mfpu=neon-fp-armv8,-mfpu=neon}


Index: config/arm/arm-c.c
===
--- config/arm/arm-c.c(revision 232101)
+++ config/arm/arm-c.c(working copy)
@@ -23,6 +23,7 @@
#include "c-family/c-common.h"
#include "tm_p.h"
#include "c-family/c-pragma.h"
+#include "stringpool.h"

/* Output C specific EABI object attributes.  These can not be done in
   arm.c because they require information from the C frontend.  */
@@ -245,8 +246,18 @@ arm_pragma_target_parse (tree args, tree

  /* Update macros.  */
  gcc_assert (cur_opt->x_target_flags == target_flags);
-  /* This one can be redefined by the pragma without warning.  */
-  cpp_undef (parse_in, "__ARM_FP");
+
+  /* Don't warn for macros that have context sensitive values depending on
+ other attributes.
+ See warn_of_redefinition, Reset after cpp_create_definition.  */
+  tree acond_macro = get_identifier ("__ARM_NEON_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL ;
+
+  acond_macro = get_identifier ("__ARM_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL;
+
+  acond_macro = get_identifier ("__ARM_FEATURE_LDREX");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL;

I see this mechanism also being used by rs6000, s390 and spu but I'm not very
familiar with it.
Could you please provide a short explanation of what NODE_CONDITIONAL means?
I suspect this is ok, but I'd like to get a better understanding of what's going
on here.

This is part of a larger support for context-sensitive keywords implemented for
rs6000 (see https://gcc.gnu.org/ml/gcc-patches/2007-12/msg00306.html).

On ARM those preprocessor macros are always defined so we don't need to define
the macro_to_expand cpp hook.  However their value does legitimately change in
the specific #pragma target path so we reuse this logic for this path.
The macro will always be correctly recognized on the other paths (#ifdef, ...)
because the NODE_CONDITIONAL bit is cleared when defined (see
cpp_create_definition). The idea of the original rs6000 patch is that if a
macro is user-defined it is not context-sensitive.
So this is absolutely a reuse of a subpart of a larger support, but this logic
fits and works well for our goal, given that the preprocessor value can change
between target contexts, and that the bit is not set for a "normal" builtin
definition.

In short: ask `warn_of_redefinition` to be permissive about those macro
redefinitions when we come from a pragma target definition, as if we were
redefining a context-sensitive macro; the difference is that it is always
defined.
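
A toy model of that flag handling may help; it is not libcpp code.  The
struct, the TOY_CONDITIONAL bit and both functions are hypothetical
stand-ins for the hash node, NODE_CONDITIONAL, cpp_create_definition
and the pragma path, and the macro values are only illustrative.

```c
#include <assert.h>
#include <string.h>

#define TOY_CONDITIONAL 0x1u   /* stands in for NODE_CONDITIONAL */

struct toy_macro
{
  const char *value;   /* current expansion, or NULL if undefined */
  unsigned flags;
};

/* Define or redefine M.  Returns 1 if a redefinition warning would be
   issued in this model, 0 otherwise.  The flag is cleared once the
   definition is made, as the thread describes for cpp_create_definition.  */
int toy_define (struct toy_macro *m, const char *value)
{
  int warn = m->value != NULL
             && strcmp (m->value, value) != 0
             && !(m->flags & TOY_CONDITIONAL);
  m->value = value;
  m->flags &= ~TOY_CONDITIONAL;
  return warn;
}

/* What the pragma path does in this model: mark the macro, then redefine.  */
int toy_pragma_redefine (struct toy_macro *m, const char *value)
{
  m->flags |= TOY_CONDITIONAL;
  return toy_define (m, value);
}

/* Pragma-driven change is silent; a later user redefinition still warns.  */
void check_toy_macro (void)
{
  struct toy_macro fp = { "0xc", 0 };
  assert (toy_pragma_redefine (&fp, "0x4") == 0);
  assert (fp.flags == 0);
  assert (toy_define (&fp, "0xe") == 1);
}
```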

does this sound clear :-) ?


Thanks, it's much clearer now.
A couple of comments on the patch then

+  tree acond_macro = get_identifier ("__ARM_NEON_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL ;

So what happens if __ARM_FP was never defined, does get_identifier return 
NULL_TREE?
If so, won't C_CPP_HASHNODE (acond_macro)->flags ICE?


get_identifier returns an allocated tree, even if not in the pool already. So
it won't ICE.



Index: testsuite/gcc.target/arm/pr69180.c
===
--- testsuite/gcc.target/arm/pr69180.c(revision 0)
+++ testsuite/gcc.target/arm/pr69180.c(working copy)
@@ -0,0 +1,16 @@
+/* PR target/69180
+   Check that __ARM_NEON_FP redefinition warns for user setting and not for
+   #pragma GCC target.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-mfloat-abi=softfp -mfpu=neon" } */
+

I believe we should use /* { dg-add-options arm_neon } */ here.


I also first did this, but the test would fail because -pedantic-errors set by
DEFAULT_CFLAGS turns the warning into an error. So I preferred to reset
the options explicitly.



Thanks for the explanations.
This is ok for trunk.

Kyrill



Thanks,
Kyrill

[PATCH] Fix PR69168

2016-01-12 Thread Richard Biener


The following fixes the assumption that we consistently have patterns
used in SLP.  That's not true given we skip them if the original
def is live or relevant during SLP analysis.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69168
* tree-vect-loop.c (vect_analyze_loop_2): Reset both main and
pattern stmt SLP type.
* tree-vect-slp.c (vect_detect_hybrid_slp_stmts): Patterns may
end up unused so cope with that case.

* gcc.dg/torture/pr69168.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 232231)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2189,10 +2189,11 @@ again:
   !gsi_end_p (si); gsi_next ())
{
  stmt_vec_info stmt_info = vinfo_for_stmt (gsi_stmt (si));
+ STMT_SLP_TYPE (stmt_info) = loop_vect;
  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
- gcc_assert (STMT_SLP_TYPE (stmt_info) == loop_vect);
  stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
+ STMT_SLP_TYPE (stmt_info) = loop_vect;
  for (gimple_stmt_iterator pi
 = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
   !gsi_end_p (pi); gsi_next ())
@@ -2201,7 +2202,6 @@ again:
  STMT_SLP_TYPE (vinfo_for_stmt (pstmt)) = loop_vect;
}
}
- STMT_SLP_TYPE (stmt_info) = loop_vect;
}
 }
   /* Free optimized alias test DDRS.  */
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 232231)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2016,10 +2016,10 @@ vect_detect_hybrid_slp_stmts (slp_tree n
 {
   /* Check if a pure SLP stmt has uses in non-SLP stmts.  */
   gcc_checking_assert (PURE_SLP_STMT (stmt_vinfo));
-  /* We always get the pattern stmt here, but for immediate
-uses we have to use the LHS of the original stmt.  */
-  gcc_checking_assert (!STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
-  if (STMT_VINFO_RELATED_STMT (stmt_vinfo))
+  /* If we get a pattern stmt here we have to use the LHS of the
+ original stmt for immediate uses.  */
+  if (! STMT_VINFO_IN_PATTERN_P (stmt_vinfo)
+ && STMT_VINFO_RELATED_STMT (stmt_vinfo))
stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
   if (TREE_CODE (gimple_op (stmt, 0)) == SSA_NAME)
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, gimple_op (stmt, 0))
Index: gcc/testsuite/gcc.dg/torture/pr69168.c
===
--- gcc/testsuite/gcc.dg/torture/pr69168.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr69168.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+long a, b, e;
+short *c;
+int *d;
+void fn1()
+{
+  int i;
+  for (; e; e--)
+{
+  i = 2;
+  for (; i; i--)
+   a = b = *d++ / (1 << 9);
+  b = b ? 8 : a;
+  *c++ = *c++ = b;
+}
+}

[PATCH, testsuite] Stabilize test result output of dump-noaddr

2016-01-12 Thread Thomas Preud'homme

Hi,

Every time the static pass numbers change, the testsuite output for
dump-noaddr changes too, leading to a series of noise lines like the
following under dg-cmp-results:

PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O1  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2 -flto -fno-use-linker-plugin -flto-partition=none  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O3 -g  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -Og -g  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -Os  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O1  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2 -flto -fno-use-linker-plugin -flto-partition=none  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O3 -g  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -Og -g  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -Os  comparison

This patch solves this problem by replacing the static pass number in the
output by a star, allowing for a stable output while retaining easy
copy/pasting in shell.
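
To show the intended effect on a dump name, here is a sketch in C of
the same rewrite.  The patch itself does this with a Tcl regsub; the C
function and its name are only an illustration, written by hand rather
than with a regex library.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy NAME into OUT (capacity OUTSZ), replacing a trailing
   ".<digits><t|r|i>.<pass>" with ".*<t|r|i>.<pass>".  Names that do not
   end in that shape are copied unchanged.  Returns OUT.  */
char *star_pass_number (const char *name, char *out, size_t outsz)
{
  const char *last_dot = strrchr (name, '.');

  /* Need a non-empty pass name after the last dot and a dump kind
     letter right before it.  */
  if (last_dot && last_dot[1] != '\0' && last_dot - name >= 3
      && strchr ("tri", last_dot[-1]))
    {
      const char *kind = last_dot - 1;
      const char *p = kind;

      /* Walk back over the digits of the static pass number.  */
      while (p > name && isdigit ((unsigned char) p[-1]))
        p--;

      /* At least one digit, preceded by a dot.  */
      if (p < kind && p > name && p[-1] == '.')
        {
          snprintf (out, outsz, "%.*s*%s", (int) (p - name), name, kind);
          return out;
        }
    }

  snprintf (out, outsz, "%s", name);
  return out;
}

/* Both pass numbers from the noise lines map to the same name.  */
void check_star_pass_number (void)
{
  char a[128], b[128];
  star_pass_number ("dump-noaddr.c.036t.fre1", a, sizeof a);
  star_pass_number ("dump-noaddr.c.034t.fre1", b, sizeof b);
  assert (strcmp (a, "dump-noaddr.c.*t.fre1") == 0);
  assert (strcmp (a, b) == 0);
}
```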

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-12-30  Thomas Preud'homme  

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Replace static
pass number in output by a star.


diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
index a8174e0..001dd6b 100644
--- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
+++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
@@ -18,6 +18,7 @@ proc dump_compare { src options } {
foreach dump1 [lsort [glob -nocomplain dump1/*]] {
regsub dump1/ $dump1 dump2/ dump2
set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"
+   regsub {\.\d+((t|r|i)\.[^.]+)$} $dumptail {.*\1} dumptail
#puts "$option $dump1"
set tmp [ diff "$dump1" "$dump2" ]
if { $tmp == 0 } {


Is this ok for stage3?

Best regards,

Thomas

[PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries


Hi,

This patch fixes PR69110, a wrong-code bug in autopar.


I.

Consider testcase test.c:
...
#define N 1000

unsigned int i = 0;

static void __attribute__((noinline, noclone))
foo (void)
{
  unsigned int z;

  for (z = 0; z < N; ++z)
    ++i;
}

extern void abort (void);

int
main (void)
{
  foo ();
  if (i != N)
    abort ();

  return 0;
}
...

When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the 
test fails:

...
$ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd -P)//install/lib64 -fno-tree-loop-im

$ ./a.out
Aborted (core dumped)
$
...


II.

Before parloops, at ivcanon we have the loop body:
...
  :
  # z_10 = PHI 
  # ivtmp_12 = PHI 
  i.1_4 = i;
  _5 = i.1_4 + 1;
  i = _5;
  z_7 = z_10 + 1;
  ivtmp_2 = ivtmp_12 - 1;
  if (ivtmp_2 != 0)
goto ;
  else
goto ;
...

There's a loop-carried dependency in i, that is, the read from i in 
iteration z == 1 depends on the write to i in iteration z == 0. So the 
loop cannot be parallelized. The test-case fails because parloops still 
parallelizes the loop.



III.

Since the loop carried dependency is in-memory, it is not handled by the 
code analyzing reductions, since that code ignores the virtual phi.


So, AFAIU, this loop carried dependency should be handled by the 
dependency testing in loop_parallel_p. And loop_parallel_p returns true 
for this loop.


A comment in loop_parallel_p reads: "Check for problems with 
dependences.  If the loop can be reversed, the iterations are independent."


AFAIU, the loop order can actually be reversed. But, it cannot be 
executed in parallel.


So from this perspective, it seems in this case the comment matches the 
check, but the check is not sufficient.
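
The distinction between "reversible" and "parallelizable" can be
sketched with a deterministic model.  This is not what parloops
generates; run_two_workers_lockstep is a hypothetical single-threaded
simulation of one bad schedule, in which two workers each take half of
the iterations and both load i before either stores it.

```c
#include <assert.h>

#define N 1000

/* Sequential loop, forward order: N increments of the counter.  */
unsigned run_forward (void)
{
  unsigned i = 0;
  for (unsigned z = 0; z < N; ++z)
    ++i;
  return i;
}

/* Same loop with the iteration order reversed.  */
unsigned run_reversed (void)
{
  unsigned i = 0;
  for (unsigned z = N; z-- > 0; )
    ++i;
  return i;
}

/* Model of two workers running in lockstep on a shared counter.  */
unsigned run_two_workers_lockstep (void)
{
  unsigned i = 0;
  for (unsigned step = 0; step < N / 2; ++step)
    {
      unsigned load0 = i;   /* worker 0: i.1 = i */
      unsigned load1 = i;   /* worker 1: i.1 = i */
      i = load0 + 1;        /* worker 0: i = i.1 + 1 */
      i = load1 + 1;        /* worker 1: same store, update is lost */
    }
  return i;
}

/* Reversal preserves the result; the lockstep schedule does not.  */
void check_loop_orders (void)
{
  assert (run_forward () == N);
  assert (run_reversed () == N);
  assert (run_two_workers_lockstep () == N / 2);
}
```

Under this schedule half of the increments are lost, which is the kind
of result the abort in the testcase catches.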



IV.

OTOH, if we replace the declaration of i with i[1], and replace the 
references of i with i[0], we see that loop_parallel_p fails.  So the 
loop_parallel_p check in this case seems sufficient, and there's 
something else that causes the check to fail in this case.


The difference is in the generated data ref:
- in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
  vector with a single element: access function 0.
- in the 'i' case, we set DR_ACCESS_FNS to NULL.

This difference causes different handling in the dependency generation, 
in particular in add_distance_for_zero_overlaps which has no effect for 
the 'i' case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of 
the NULL access_fns of both the source and sink data refs).


From this perspective, it seems that the loop_parallel_p check is 
sufficient, and that dr_analyze_indices shouldn't return a NULL 
access_fns for 'i'.



V.

When compiling with graphite using -floop-parallelize-all --param 
graphite-min-loops-per-function=1, we find:

...
[scop-detection-fail] Graphite cannot handle data-refs in stmt:
# VUSE <.MEM_11>
i.1_4 = i;
...

The function scop_detection::stmt_has_simple_data_refs_p returns false 
because of the code recently added for PR66980 at r228357:

...
  int nb_subscripts = DR_NUM_DIMENSIONS (dr);

  if (nb_subscripts < 1)
{
  free_data_refs (drs);
  return false;
}
...

[ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]

This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis 
has failed'.


From this perspective, it seems that the dependence handling should 
bail out once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or 
DR_ACCESS_FNS == 0).



VI.

This test-case used to pass in 4.6 because in 
find_data_references_in_stmt we had:

...
  /* FIXME -- data dependence analysis does not work correctly for
 objects with invariant addresses in loop nests.  Let us fail
 here until the problem is fixed.  */
  if (dr_address_invariant_p (dr) && nest)
{
  free_data_ref (dr);
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
 "\tFAILED as dr address is invariant\n");
  ret = false;
  break;
}
...

That FIXME was removed in the fix for PR46787, at r175704.

The test-case fails in 4.8, and I guess from there onwards.


VII.

The attached patch fixes the problem by returning a zero access function 
for 'i' in dr_analyze_indices.


[ But I can also imagine a solution similar to the graphite fix:
...
@@ -3997,6 +3999,12 @@ find_data_references_in_stmt
   dr = create_data_ref (nest, loop_containing_stmt (stmt),
ref->ref, stmt, ref->is_read);
   gcc_assert (dr != NULL);
+  if (DR_NUM_DIMENSIONS (dr) == 0)
+   {
+ datarefs->release ();
+ return false;
+   }
+
   datarefs->safe_push (dr);
 }
   references.release ();
...

I'm not familiar enough with the dependency analysis code to know where 
exactly this should be fixed. ]


Bootstrapped and reg-tested on x86_64.

OK for trunk?

OK for release branches?

Thanks,
- Tom

Ping: check initializer to be zero in .bss-like sections

2016-01-12 Thread Jan Beulich

>>> On 10.12.15 at 08:21,  wrote:
> Just like gas, which has recently learned to reject such initializers,
> gcc shouldn't accept such either.
> ---
> The only question really is whether the new test case should be limited
> to certain targets - I haven't been able to figure out possible valid
> qualifiers to use here.
> 
> gcc/
> 2015-12-10  Jan Beulich  
> 
>   * varasm.c (get_variable_section): Validate initializer in
>   named .bss-like sections.
> 
> gcc/testsuite/
> 2015-12-10  Jan Beulich  
> 
>   * gcc.dg/bss.c: New.
> 
> --- 2015-12-09/gcc/testsuite/gcc.dg/bss.c
> +++ 2015-12-09/gcc/testsuite/gcc.dg/bss.c
> @@ -0,0 +1,8 @@
> +/* Test non-zero initializers in .bss-like sections get properly refused.  
> */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int __attribute__((section(".bss.local"))) x = 1; /* { dg-error "" "zero 
> init" } */
> +int *__attribute__((section(".bss.local"))) px =  /* { dg-error "" "zero 
> init" } */
> +int __attribute__((section(".bss.local"))) y = 0;
> +int *__attribute__((section(".bss.local"))) py = (void*)0;
> --- 2015-12-09/gcc/varasm.c
> +++ 2015-12-09/gcc/varasm.c
> @@ -1150,7 +1150,18 @@ get_variable_section (tree decl, bool pr
>  
>resolve_unique_section (decl, reloc, flag_data_sections);
>if (IN_NAMED_SECTION (decl))
> -return get_named_section (decl, NULL, reloc);
> +{
> +  section *sect = get_named_section (decl, NULL, reloc);
> +
> +  if ((sect->common.flags & SECTION_BSS) && !bss_initializer_p (decl))
> + {
> +   error_at (DECL_SOURCE_LOCATION (decl),
> + "only zero initializers are allowed in section %qs",
> + sect->named.name);
> +   DECL_INITIAL (decl) = error_mark_node;
> + }
> +  return sect;
> +}
>  
>if (ADDR_SPACE_GENERIC_P (as)
>&& !DECL_THREAD_LOCAL_P (decl)
> 
> 
>

[PATCH] Fix PR69157

2016-01-12 Thread Richard Biener


With the work-around-limited-IL patch I put in earlier we now need
to cope with dt_external stmts (in self-referencing SLPs).  Thus
keep the def-type check to the analysis phase (larger refactoring
to split analysis and transform phase more properly is not appropriate
at this stage).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69157
* tree-vect-stmts.c (vectorizable_mask_load_store): Check
stmts def type only during analyze phase.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.

* gcc.dg/torture/pr69157.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 232213)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_mask_load_store (gimple *st
*** 1757,1763 
if (!STMT_VINFO_RELEVANT_P (stmt_info))
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (!STMT_VINFO_DATA_REF (stmt_info))
--- 1760,1767 
if (!STMT_VINFO_RELEVANT_P (stmt_info))
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (!STMT_VINFO_DATA_REF (stmt_info))
*** vectorizable_call (gimple *gs, gimple_st
*** 2206,2212 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is GS a vectorizable call?   */
--- 2210,2217 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is GS a vectorizable call?   */
*** vectorizable_simd_clone_call (gimple *st
*** 2811,2817 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (gimple_call_lhs (stmt)
--- 2816,2823 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (gimple_call_lhs (stmt)
*** vectorizable_conversion (gimple *stmt, g
*** 3669,3675 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (!is_gimple_assign (stmt))
--- 3675,3682 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (!is_gimple_assign (stmt))
*** vectorizable_assignment (gimple *stmt, g
*** 4246,4252 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is vectorizable assignment?  */
--- 4253,4260 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is vectorizable assignment?  */
*** vectorizable_shift (gimple *stmt, gimple
*** 4462,4468 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
--- 4470,4477 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
*** vectorizable_operation (gimple *stmt, gi
*** 4823,4829 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
--- 4832,4839 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
*** vectorizable_store (gimple *stmt, gimple
*** 5248,5254 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)

RE: [PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-01-12 Thread Thomas Preud'homme

> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 1:58 PM
> 
> Hi,
> 
> We decided to apply the following patch to the ARM embedded 5 branch.
> This is *not* intended for trunk for now. We will send a separate email
> for trunk.

And now a rebased patch on top of trunk.


> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch fixes some assumptions related to M profile
> architectures. Currently GCC (mostly libgcc) contains several assumptions
> that the only ARM architecture with Thumb-1 only instructions is ARMv6-M
> and the only one with Thumb-2 only instructions is ARMv7-M. ARMv8-M [1]
> makes this wrong since ARMv8-M baseline is also (mostly) Thumb-1 only and
> ARMv8-M mainline is also Thumb-2 only. This patch replaces checks for
> __ARM_ARCH_*__ with checks against __ARM_ARCH_ISA_THUMB and
> __ARM_ARCH_ISA_ARM. For instance, Thumb-1 only can be checked with
> #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 1).
> It also fixes the guard for DIV code to not apply to ARMv8-M
> Baseline since it uses Thumb-2 instructions.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entries are as follows:
> 
> 
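
For readers unfamiliar with the ACLE macros mentioned above, here is a minimal sketch of the kind of check the patch moves to. It is not taken from the patch: the enum and the function names are hypothetical, and on a non-ARM target the macros are simply undefined, so the code falls through to the "other" case.

```c
#include <assert.h>

/* Hypothetical classification, for illustration only.  */
enum isa_kind { ISA_OTHER, ISA_THUMB1_ONLY, ISA_THUMB2_ONLY };

/* Classify the current compilation using the ACLE feature macros
   rather than per-architecture macros such as __ARM_ARCH_6M__.  */
static enum isa_kind
current_isa_kind (void)
{
#if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 1)
  return ISA_THUMB1_ONLY;	/* e.g. ARMv6-M, ARMv8-M baseline */
#elif !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 2)
  return ISA_THUMB2_ONLY;	/* e.g. ARMv7-M, ARMv8-M mainline */
#else
  return ISA_OTHER;		/* ARM state available, or not an ARM target */
#endif
}

/* Map a classification to a printable name.  */
static const char *
isa_kind_name (enum isa_kind kind)
{
  switch (kind)
    {
    case ISA_THUMB1_ONLY: return "Thumb-1 only";
    case ISA_THUMB2_ONLY: return "Thumb-2 only";
    case ISA_OTHER: return "other";
    }
  assert (!"unknown isa_kind");
  return "";
}
```

The point of the sketch is that a new architecture such as ARMv8-M baseline lands in the right branch without anyone adding another __ARM_ARCH_*__ test.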

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
decide whether to prevent some libgcc routines being included for some
multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
link between this condition and the one in
libgcc/config/arm/lib1func.S.
* config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH.
(TARGET_ARM_V7M): Likewise.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
__ARM_ARCH_ISA_ARM to test for Cortex-M devices.


*** libgcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

* config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather
than ARMv6-M.
* config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
for all Thumb-1 only targets.
(__only_thumb1__): Define for all Thumb-1 only targets.
(THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__.
(EQUIV): Likewise.
(ARM_FUNC_ALIAS): Likewise.
(umodsi3): Add check to __only_thumb1__ to guard the idiv version.
(modsi3): Likewise.
(HAVE_ARM_CLZ): Remove block defining it.
(clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and
check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ.
(clzdi2): Likewise.
(ctzsi2): Likewise.
(L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
__ARM_ARCH_6M__ in guard for checking whether it is defined.
(final includes): Test for __only_thumb1__ rather than
__ARM_ARCH_6M__ and add comment to indicate the connection between
this condition and the one in gcc/config/arm/elf.h.
* config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
__ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
* config/arm/t-softfp: Likewise.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index fd999dd..0d23f39 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2182,8 +2182,10 @@ extern int making_const_table;
 #define TARGET_ARM_ARCH\
   (arm_base_arch)  \
 
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \
+   && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \
+   && arm_arch_thumb2)
 
 /* The highest Thumb instruction set version supported by the chip.  */
 #define TARGET_ARM_ARCH_ISA_THUMB  \
diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index 3795728..579a580 100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -148,8 +148,9 @@
   while (0)
 
 /* Horrible hack: We want to prevent some libgcc routines being included
-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 #undef L_fixdfsi
 #undef L_fixunsdfsi
 #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..3f96826 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3221,10 +3221,8 @@ proc check_effective_target_arm_cortex_m { } {

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Andreas Schwab

gcc.dg/ifcvt-5.c fails on ia64:

From ifcvt-5.c.223r.ce1:

== Pass 2 ==


== no more changes

1 possible IF blocks searched.
1 IF blocks converted.
2 true changes made.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak

On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  wrote:
> On Mon, 11 Jan 2016, H.J. Lu wrote:
>
>> Here is the updated patch.  Joseph, is this OK?
>
> I have no objections to this patch.

Thinking some more, it looks to me that we also have to return 2 when
SSE2 (SSE doubles) is not enabled.
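
For context, user code sees this setting as FLT_EVAL_METHOD in <float.h>. The following is a minimal sketch, assuming a C99 or later compiler; the helper names are hypothetical and the strings only paraphrase the meanings the C standard assigns to each value.

```c
#include <assert.h>
#include <float.h>

/* Describe a FLT_EVAL_METHOD value.  Only -1, 0, 1 and 2 have
   standard meanings; other negative values are implementation-defined.  */
static const char *
flt_eval_method_name (int method)
{
  switch (method)
    {
    case 0: return "each type's own range and precision";
    case 1: return "float and double evaluated as double";
    case 2: return "everything evaluated as long double";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* Describe the method this translation unit was compiled with.  */
static const char *
current_flt_eval_method_name (void)
{
  return flt_eval_method_name (FLT_EVAL_METHOD);
}

/* Small usage check documenting the mapping above.  */
static void
flt_eval_method_examples (void)
{
  assert (flt_eval_method_name (2)[0] == 'e');
  assert (flt_eval_method_name (-1)[0] == 'i');
  assert (current_flt_eval_method_name ()[0] != '\0');
}
```

Compiling this with different -mfpmath and -msse2 settings on i386 should show which value a given option combination selects.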

I'm testing the following patch:

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 6c63871..b71cf4f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
argc, const char **argv);
only SSE, rounding is correct; when using both SSE and the FPU,
the rounding precision is indeterminate, since either may be chosen
apparently at random.  */
-#define TARGET_FLT_EVAL_METHOD \
-  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
+#define TARGET_FLT_EVAL_METHOD \
+  (TARGET_MIX_SSE_I387 ? -1\
+   : (TARGET_80387 && !TARGET_SSE2 && !TARGET_SSE_MATH) ? 2 : 0)

 /* Whether to allow x87 floating-point arithmetic on MODE (one of
SFmode, DFmode and XFmode) in the current excess precision

Uros.

Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Richard Biener

On Tue, 12 Jan 2016, Tom de Vries wrote:

> Hi,
> 
> This patch fixes PR69110, a wrong-code bug in autopar.
> 
> 
> I.
> 
> consider testcase test.c:
> ...
> #define N 1000
> 
> unsigned int i = 0;
> 
> static void __attribute__((noinline, noclone))
> foo (void)
> {
>   unsigned int z;
> 
>   for (z = 0; z < N; ++z)
> ++i;
> }
> 
> extern void abort (void);
> 
> int
> main (void)
> {
>   foo ();
>   if (i != N)
> abort ();
> 
>   return 0;
> }
> ...
> 
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
> 
> 
> II.
> 
> Before parloops, at ivcanon we have the loop body:
> ...
>   :
>   # z_10 = PHI 
>   # ivtmp_12 = PHI 
>   i.1_4 = i;
>   _5 = i.1_4 + 1;
>   i = _5;
>   z_7 = z_10 + 1;
>   ivtmp_2 = ivtmp_12 - 1;
>   if (ivtmp_2 != 0)
> goto ;
>   else
> goto ;
> ...
> 
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
> 
> 
> III.
> 
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
> 
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
> 
> A comment in loop_parallel_p reads: "Check for problems with dependences.  If
> the loop can be reversed, the iterations are independent."
> 
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
> 
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
> 
> 
> IV.
> 
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails.  So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
> 
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
>   vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
> 
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
> 
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
> 
> 
> V.
> 
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
> 
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
>   int nb_subscripts = DR_NUM_DIMENSIONS (dr);
> 
>   if (nb_subscripts < 1)
>     {
>       free_data_refs (drs);
>       return false;
>     }
> ...
> 
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
> 
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
> 
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
> 
> 
> VI.
> 
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
>   /* FIXME -- data dependence analysis does not work correctly for
>      objects with invariant addresses in loop nests.  Let us fail
>      here until the problem is fixed.  */
>   if (dr_address_invariant_p (dr) && nest)
>     {
>       free_data_ref (dr);
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file,
>                  "\tFAILED as dr address is invariant\n");
>       ret = false;
>       break;
>     }
> ...
> 
> That FIXME was removed in the fix for PR46787, at r175704.
> 
> The test-case fails in 4.8, and I guess from there onwards.
> 
> 
> VII.
> 
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
> 
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
>dr = create_data_ref (nest, loop_containing_stmt (stmt),
> ref->ref, stmt, ref->is_read);
>gcc_assert (dr != NULL);
> +  if (DR_NUM_DIMENSIONS (dr) == 0)
> +   {

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread James Greenhalgh

On Mon, Jan 11, 2016 at 04:57:56PM -0600, Evandro Menezes wrote:
> On 01/11/2016 05:53 AM, James Greenhalgh wrote:
> >I'd like to switch the logic around in aarch64.c such that
> >-mlow-precision-recip-sqrt causes us to always emit the low-precision
> >software expansion for reciprocal square root. I have two reasons to do
> >this; first is consistency across -mcpu targets, second is enabling more
> >-mcpu targets to use the flag for peak tuning.
> >
> >I don't much like that the precision we use for -mlow-precision-recip-sqrt
> >differs between cores (and possibly compiler revisions). Yes, we're
> >under -ffast-math but I take this flag to mean the user explicitly wants the
> >low-precision expansion, and we should not diverge from that based on an
> >internal decision as to what is optimal for performance in the
> >high-precision case. I'd prefer to keep things as predictable as possible,
> >and here that means always emitting the low-precision expansion when asked.
> >
> >Judging by the comments in the thread proposing the reciprocal square
> >root optimisation, this will benefit all cores currently supported by GCC.
> >To be clear, we would still not expand in the high-precision case for any
> >cores which do not explicitly ask for it. Currently that is Cortex-A57
> >and xgene, though I will be proposing a patch to remove Cortex-A57 from
> >that list shortly.
> >
> >Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> >is intended as a tuning flag for situations where performance is more
> >important than precision, but the current logic requires setting an
> >internal flag which also changes the performance characteristics where
> >high-precision is needed. This conflates two decisions the target might
> >want to make, and reduces the applicability of an option targets might
> >want to enable for performance. In particular, I'd still like to see
> >-mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> >sequence for floats under Cortex-A57.
> >
> >Based on that reasoning, this patch makes the appropriate change to the
> >logic. I've checked with the current -mcpu values to ensure that behaviour
> >without -mlow-precision-recip-sqrt does not change, and that behaviour
> >with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> >
> >I've also put this through bootstrap and test on aarch64-none-linux-gnu
> >with no issues.
> >
> >OK?
> 
> Yes, it LGTM.

Thanks.

> I appreciate the idea of uniformity when an option is specified,
> which led me to think if it wouldn't be a good idea to add an option
> that would have the effect of forcing the emission of the reciprocal
> square root, effectively forcing the flag
> AARCH64_EXTRA_TUNE_RECIP_SQRT on, regardless of the tuning flags for
> the given core.  I think that this flag would be particularly useful
> when specifying flags for specific functions, irrespective of the
> core.
> 
> Thoughts?

Currently you can do this using the (mostly unsupported) -moverride
mechanism as -moverride=tune=recip_sqrt from the command line.
I'm not sure how reliable using this from
__attribute__((target("override=tune=recip_sqrt"))) would be, I wrote a small
testcase that didn't work as intended, but whether that is a bug or a
design decision I'm not yet sure. I think the logic for parsing the
target attribute is set up to reapply the command-line override string
over whichever tuning options you apply through the attribute, rather than
to allow you to apply a per-function override.

As to whether we'd want to expose this as a fully supported,
user-visible setting, I'd rather not. Our claim is that for the
higher-precision sequences the results are close enough that we can
consider this like reassociation width or other core-specific tuning
parameters that we don't expose. What I'm hoping to avoid is a
proliferation of supported options which are not in anybody's regular
testing matrix. This one would not be so bad as it is automatically
enabled by some cores. For now I'd rather not add the option.

Thanks,
James

Re: [PATCH PR68911]Check overflow when computing range information from loop niter bound

2016-01-12 Thread Richard Biener

On Mon, Jan 11, 2016 at 5:11 PM, Bin Cheng  wrote:
> Hi,
> A wrong code bug is reported in PR68911, in which GCC generates an infinite
> loop for the example below after the loop niter analysis changes.  After that
> change, scev_probably_wraps_p identifies that e_1 in the case below never
> overflows/wraps:
> :
> e_15 = e_1 + 1;
>
> :
> # e_1 = PHI 
> if (e_1 <= 93)
>   goto ;
> else
>   goto ;
>
> The loop niter analysis gives us:
> Analyzing # of iterations of loop 2
>   exit condition [e_8, + , 1] <= 93
>   bounds on difference of bases: -4294967202 ... 93
>   result:
> zero if e_8 > 94
> # of iterations 94 - e_8, bounded by 94
>
> I think the analysis is good.  When scev_probably_wraps_p returns false, it 
> may imply two possible cases.
> CASE 1) If loop's latch gets executed, then we know the SCEV doesn't 
> overflow/wrap during loop execution.
> CASE 2) If loop's latch isn't executed, i.e., the loop exits immediately at 
> its first check on exit condition.  In this case the SCEV doesn't 
> overflow/wrap because it isn't increased at all.
>
> The real problem I think is VRP only checks scev_probably_wraps_p then 
> assumes SCEV won't overflow/wrap after niter.bound iterations.  This is not 
> true for CASE 2).  If we have a large enough starting value for e_1, for 
> example, 0xfff8 in this example, e_1 is guaranteed not to overflow/wrap only
> because the loop exits immediately, not after niter.bound iterations.  Here
> VRP assuming "e_1 + niter.bound" doesn't overflow/wrap is wrong.
>
> This patch fixes the issue by adding overflow check in range information 
> computed for "e_1 + niter.bound".  It catches overflow/wrap of the expression 
> when loop may exit immediately.
>
> With this change, actually I think it may be possible for us to remove the 
> call to scev_probably_wraps_p, though I didn't do that in this patch.
>
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-01-10  Bin Cheng  
>
> PR tree-optimization/68911
> * tree-vrp.c (adjust_range_with_scev): Check overflow in range
> information computed for expression "init + nit * step".
>
> gcc/testsuite/ChangeLog
> 2016-01-10  Bin Cheng  
>
> PR tree-optimization/68911
> * gcc.c-torture/execute/pr68911.c: New test.
>
>

Re: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-01-12 Thread Nick Clifton


Hi Kaushik,


+/* Structure for G13 MDUC registers.  */
+struct mduc_reg_type
+{
+  unsigned int address;
+  enum machine_mode mode;
+  bool is_volatile;
+};
+
+struct mduc_reg_type  mduc_regs[NUM_OF_MDUC_REGS] =
+  {{0xf00e8, QImode, true},
+   {0x0, HImode, true},
+   {0x2, HImode, true},
+   {0xf2224, HImode, true},
+   {0xf00e0, HImode, true},
+   {0xf00e2, HImode, true}};


If the is_volatile field is true for all members of this array, why 
bother having it at all ?  (If I remember correctly in your previous 
patch only some of these addresses were being treated as volatile 
registers, not all of them).




+/* Check if the block uses mul/div insns for G13 target.  */
+static bool
+check_mduc_usage ()


Add a void type to the declaration.  Ie:

  check_mduc_usage (void)



+{
+  rtx insn;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+  {


You should have a blank line between the end of the variable 
declarations and the start of the code.




+FOR_BB_INSNS (bb, insn)
+{
+  if (get_attr_is_g13_muldiv_insn (insn) == IS_G13_MULDIV_INSN_YES)
+return true;


I am not sure - but it might be safer to check INSN_P(insn) first 
before checking for the muldiv attribute.




+  for (int i = 0; i mode, GEN_INT (reg->address));

   MEM_VOLATILE_P (mem_mduc) = reg->is_volatile;
   if (reg->mode == QImode)
 emit_insn (gen_movqi (gen_rtx_REG (QImode, A_REG), mem_mduc));
   else
 emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
   emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
}


fs = cfun->machine->framesize_locals + cfun->machine->framesize_outgoing;
+  if (MUST_SAVE_MDUC_REGISTERS && (!crtl->is_leaf || check_mduc_usage ()))
+fs = fs + NUM_OF_MDUC_REGS * 2;
if (fs > 0)
  {
/* If we need to subtract more than 254*3 then it is faster and
@@ -1426,6 +1490,8 @@
else
  {
fs = cfun->machine->framesize_locals + 
cfun->machine->framesize_outgoing;
+  if (MUST_SAVE_MDUC_REGISTERS && (!crtl->is_leaf || check_mduc_usage ()))
+fs = fs + NUM_OF_MDUC_REGS * 2;
if (fs > 254 * 3)


No - this is wrong.  "fs" is the amount of extra space needed in the 
stack frame to hold local variables and outgoing variables.  It should 
not include the stack space used for already pushed registers.




Index: gcc/config/rl78/rl78.h
===
--- gcc/config/rl78/rl78.h(revision 2871)
+++ gcc/config/rl78/rl78.h(working copy)
@@ -28,6 +28,7 @@
  #define TARGET_G14(rl78_cpu_type == CPU_G14)


+#define NUM_OF_MDUC_REGS 6


Why define this here ?  It is only ever used in rl78.c and it can be
computed automatically by applying the ARRAY_SIZE macro to the mduc_regs 
array.
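
A minimal sketch of what is meant, as standalone C rather than RL78 backend code.  GCC already gets ARRAY_SIZE from libiberty.h; it is redefined here only so the sketch compiles on its own.  The struct, the array name and the addresses are placeholders, not the real MDUC register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same definition as the one GCC uses; repeated here so the sketch
   is self-contained.  */
#define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

/* Hypothetical stand-in for struct mduc_reg_type.  */
struct mduc_reg_sketch
{
  unsigned int address;	/* placeholder address, not a real register */
  bool is_byte;		/* true for a byte-sized register */
};

/* No explicit dimension: the initializer decides the length.  */
static const struct mduc_reg_sketch mduc_regs_sketch[] =
  {{0x1000, true},
   {0x1002, false},
   {0x1004, false},
   {0x1006, false},
   {0x1008, false},
   {0x100a, false}};

/* Bytes of stack needed to save every register, two bytes per push.  */
static size_t
mduc_save_area_bytes (void)
{
  return ARRAY_SIZE (mduc_regs_sketch) * 2;
}

/* Small usage check: the count follows the initializer.  */
static void
mduc_array_size_examples (void)
{
  assert (ARRAY_SIZE (mduc_regs_sketch) == 6);
  assert (mduc_save_area_bytes () == 12);
}
```

With this arrangement, adding or removing a register only touches the initializer and there is no separate count to keep in sync.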





Index: gcc/config/rl78/rl78.opt
===
--- gcc/config/rl78/rl78.opt(revision 2871)
+++ gcc/config/rl78/rl78.opt(working copy)
@@ -103,4 +103,10 @@
  Target Mask(ES0)
  Assume ES is zero throughout program execution, use ES: for read-only data.

+msave-mduc-in-interrupts
+Target Mask(SAVE_MDUC_REGISTERS)
+Stores the MDUC registers in interrupt handlers for G13 target.

+mno-save-mduc-in-interrupts
+Target RejectNegative Mask(NO_SAVE_MDUC_REGISTERS)
+Does not save the MDUC registers in interrupt handlers for G13 target.


This looks wrong.  Surely you only need the msave-mduc-in-interrupts 
definition.  That will automatically allow -mno-save-mduc-in-interrupts, 
since it does not have the RejectNegative attribute.  Also there is no 
need to have two separate target mask bits.  Just SAVE_MDUC_REGISTERS 
will do.






Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi(revision 2871)
+++ gcc/doc/invoke.texi(working copy)


You should also add the name of the new option to the Machine Dependent 
Options section of the manual.  (Approx line 896 in invoke.texi)




+@item -msave-mduc-in-interrupts
+@item -mno-save-mduc-in-interrupts
+@opindex msave-mduc-in-interrupts
+@opindex mno-save-mduc-in-interrupts
+Specifies that interrupt handler functions should preserve the
+MDUC registers.  This is only necessary if normal code might use
+the MDUC registers, for example because it

C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek

Seems that people find compile-time error on the following testcase overly
pedantic.  I.e. that "enum A { X = -1 << 1 };" should compile, at least with
-fpermissive.  So I've changed the error_at into permerror and the return value
of cxx_eval_check_shift_p now depends on flag_permissive.  Luckily, I didn't
have to modify any of the existing tests.
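
For illustration, the four conditions being diagnosed can be sketched in plain C as below.  This is not the code in constexpr.c: the names are hypothetical, it assumes a 32-bit int for the examples, and the overflow test follows the C11 rule (result must be representable in int), which is slightly stricter than the C++11 rule the front end implements.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result codes, one per diagnostic in the patch.  */
enum shift_problem
{
  SHIFT_OK,
  SHIFT_COUNT_NEGATIVE,
  SHIFT_COUNT_TOO_LARGE,
  SHIFT_LHS_NEGATIVE,
  SHIFT_OVERFLOW
};

/* Classify LHS << RHS for operands of type int without ever
   performing an undefined shift.  */
static enum shift_problem
check_left_shift (int lhs, int rhs)
{
  const int prec = (int) (sizeof (int) * CHAR_BIT);

  if (rhs < 0)
    return SHIFT_COUNT_NEGATIVE;
  if (rhs >= prec)
    return SHIFT_COUNT_TOO_LARGE;
  if (lhs < 0)
    return SHIFT_LHS_NEGATIVE;
  /* LHS << RHS fits in int only if LHS <= INT_MAX >> RHS.  */
  if (lhs > (INT_MAX >> rhs))
    return SHIFT_OVERFLOW;
  return SHIFT_OK;
}

/* The enumerators from the new test, plus one valid shift.  */
static void
check_left_shift_examples (void)
{
  assert (check_left_shift (1, 4) == SHIFT_OK);
  assert (check_left_shift (-1, 4) == SHIFT_LHS_NEGATIVE);
  assert (check_left_shift (1, -4) == SHIFT_COUNT_NEGATIVE);
  assert (check_left_shift (1, 100) == SHIFT_COUNT_TOO_LARGE);
  assert (check_left_shift (31, 30) == SHIFT_OVERFLOW);
}
```

Each non-OK result corresponds to one of the messages that the patch downgrades from error_at to permerror.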

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and return negated flag_permissive.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..dbcc242 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C 
gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..7223e68 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
+enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target c++11 } }

Marek
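
The overflow test in the last constexpr.c hunk can be tried outside the compiler. Below is a minimal sketch in plain C of the rule quoted in the patch comment, assuming a non-negative left operand and a shift count already known to be smaller than the precision. The function name and its parameters are illustrative and are not part of GCC.

```c
#include <stdbool.h>

/* Illustrative helper, not GCC code.  Mirrors the rule from the patch
   comment: if (unsigned) x >> ((prec - 1) - y) is greater than 1, then
   x << y does not fit in a signed type of PREC bits.
   Assumes 0 <= y < prec and that PREC is no larger than the width of
   unsigned int.  */
static bool
left_shift_overflows (unsigned x, int y, int prec)
{
  return (x >> ((prec - 1) - y)) > 1u;
}
```

With a 32-bit int this flags 31 << 30 (the DD enumerator in the test) but accepts 1 << 30 and 1 << 31.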

[PATCH] Handle inter-block notes before BARRIER in rtl merge_blocks (PR target/69175, take 2)

2016-01-12 Thread Jakub Jelinek

On Sat, Jan 09, 2016 at 12:27:02AM +0100, Bernd Schmidt wrote:
> Well, I checked a bit more. Most callers of merge_blocks seem to already
> look for barriers if they are a concern and remove them. This occurs
> multiple times in ifcvt.c and cfgcleanup.c. Oddly,
> merge_blocks_move_predecessor_nojumps uses next_nonnote_insn to find the
> barrier, while merge_blocks_move_successor_nojumps uses just NEXT_INSN. That
> should probably be fixed too.
> 
> So the situation is a bit odd in that most callers remove the barrier but
> merge_blocks tries to handle an isolated barrier as well. The area could
> probably cleaned up a little, but on the whole I still lean towards
> requiring the caller to remove an isolated barrier. That leaves the RTL in a
> more consistent state before the call to merge_blocks.

So is the following ok for trunk?
Bootstrapped/regtested on x86_64-linux and i686-linux, and Kyrill has kindly
bootstrapped/regtested it on arm too.

2016-01-12  Jakub Jelinek  

PR target/69175
* ifcvt.c (cond_exec_process_if_block): When removing the last
insn from then_bb, remove also any possible barriers that follow it.

* g++.dg/opt/pr69175.C: New test.

--- gcc/ifcvt.c.jj  2016-01-04 14:55:53.0 +0100
+++ gcc/ifcvt.c 2016-01-11 16:13:22.833174933 +0100
@@ -739,7 +739,7 @@ cond_exec_process_if_block (ce_if_block
   rtx_insn *from = then_first_tail;
   if (!INSN_P (from))
from = find_active_insn_after (then_bb, from);
-  delete_insn_chain (from, BB_END (then_bb), false);
+  delete_insn_chain (from, get_last_bb_insn (then_bb), false);
 }
   if (else_last_head)
 delete_insn_chain (first_active_insn (else_bb), else_last_head, false);
--- gcc/testsuite/g++.dg/opt/pr69175.C.jj   2016-01-08 13:04:04.084805432 
+0100
+++ gcc/testsuite/g++.dg/opt/pr69175.C  2016-01-08 13:03:47.0 +0100
@@ -0,0 +1,29 @@
+// PR target/69175
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -mthumb" { target { arm_hard_vfp_ok && arm_thumb2_ok } } }
+
+struct A { A *c, *d; } a;
+struct B { A *e; A *f; void foo (); };
+void *b;
+
+void
+B::foo ()
+{
+  if (b) 
+{
+  A *n = (A *) b;
+  if (b == e)
+   if (n == f)
+ e = __null;
+   else
+ e->c = __null;
+  else
+   n->d->c = &a;
+  n->d = e;
+  if (e == __null)
+   e = f = n;
+  else
+   e = n;
+}
+}


Jakub
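
To illustrate why the end of the deleted range matters, here is a toy model in C, assuming an insn stream represented as a flat array. It only shows the idea of extending the range over barriers that directly follow the block end; the names are made up and this is not the implementation of get_last_bb_insn.

```c
#include <stddef.h>

/* Toy stand-ins for RTL insns; none of these names are GCC's.  */
enum toy_kind { TOY_INSN, TOY_BARRIER };

/* Return the index of the last element that belongs with a block whose
   final insn sits at index BB_END: barriers that directly follow it are
   included.  N is the number of elements in STREAM.  */
static size_t
toy_last_with_barriers (const enum toy_kind *stream, size_t n, size_t bb_end)
{
  size_t last = bb_end;
  while (last + 1 < n && stream[last + 1] == TOY_BARRIER)
    last++;
  return last;
}
```

Deleting from FROM up to this index, rather than up to BB_END, leaves no isolated barrier behind for merge_blocks to trip over.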

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  wrote:
> > On Mon, 11 Jan 2016, H.J. Lu wrote:
> >
> >> Here is the updated patch.  Joseph, is this OK?
> >
> > I have no objections to this patch.
> 
> Thinking some more, it looks to me that we also have to return 2 when
> SSE2 (SSE doubles) is not enabled.
> 
> I'm testing following patch:

That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
whether sse2 is enabled or not, normal floating point operations will be
performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
Do you want to do this because some instructions might be vectorized and
therefore end up in sse registers?  For -std=c99 that shouldn't happen,
already the C FE would promote all the arithmetics to be done in long
doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
honor FLT_EVAL_METHOD and vectorized ones don't.
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 6c63871..b71cf4f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
> argc, const char **argv);
> only SSE, rounding is correct; when using both SSE and the FPU,
> the rounding precision is indeterminate, since either may be chosen
> apparently at random.  */
> -#define TARGET_FLT_EVAL_METHOD \
> -  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
> +#define TARGET_FLT_EVAL_METHOD \
> +  (TARGET_MIX_SSE_I387 ? -1\
> +   : (TARGET_80387 && !TARGET_SSE2 && !TARGET_SSE_MATH) ? 2 : 0)
> 
>  /* Whether to allow x87 floating-point arithmetic on MODE (one of
> SFmode, DFmode and XFmode) in the current excess precision
> 
> Uros.

Jakub
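
For reference, the unpatched macro quoted above can be modelled as an ordinary function. This is a sketch only; the three parameters are illustrative stand-ins for TARGET_MIX_SSE_I387, TARGET_80387 and TARGET_SSE_MATH.

```c
#include <stdbool.h>

/* Plain-C model of the existing TARGET_FLT_EVAL_METHOD definition (the
   lines removed by the proposed diff).  Parameter names are illustrative.  */
static int
model_flt_eval_method (bool mix_sse_i387, bool has_80387, bool sse_math)
{
  return mix_sse_i387 ? -1 : (has_80387 && !sse_math) ? 2 : 0;
}
```

In this model the result is 2 whenever the x87 is used for math, independent of whether SSE2 is enabled.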

[PATCH, PING] DWARF: process all TYPE_DECL nodes when iterating on scopes

2016-01-12 Thread Pierre-Marie de Rodat


Hello,

Ping for the patch submitted in 
. Thanks!


--
Pierre-Marie de Rodat

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread James Greenhalgh

On Tue, Jan 12, 2016 at 05:53:21AM +, Kumar, Venkataramanan wrote:
> Hi James,
> 
> > -Original Message-
> > From: James Greenhalgh [mailto:james.greenha...@arm.com]
> > Sent: Monday, January 11, 2016 5:24 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: n...@arm.com; marcus.shawcr...@arm.com;
> > richard.earns...@arm.com; Kumar, Venkataramanan;
> > philipp.toms...@theobroma-systems.com; pins...@gmail.com;
> > kyrylo.tkac...@arm.com; e.mene...@samsung.com
> > Subject: [Patch AArch64] Use software sqrt expansion always for -mlow-
> > precision-recip-sqrt
> > 
> > 
> > Hi,
> > 
> > I'd like to switch the logic around in aarch64.c such that -mlow-precision-
> > recip-sqrt causes us to always emit the low-precision software expansion for
> > reciprocal square root. I have two reasons to do this; first is consistency
> > across -mcpu targets, second is enabling more -mcpu targets to use the flag
> > for peak tuning.
> > 
> > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > differs between cores (and possibly compiler revisions). Yes, we're under -
> > ffast-math but I take this flag to mean the user explicitly wants the low-
> > precision expansion, and we should not diverge from that based on an
> > internal decision as to what is optimal for performance in the 
> > high-precision
> > case. I'd prefer to keep things as predictable as possible, and here that
> > means always emitting the low-precision expansion when asked.
> > 
> > Judging by the comments in the thread proposing the reciprocal square root
> > optimisation, this will benefit all cores currently supported by GCC.
> > To be clear, we would still not expand in the high-precision case for any 
> > cores
> > which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> > though I will be proposing a patch to remove Cortex-A57 from that list
> > shortly.
> > 
> > Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> > is intended as a tuning flag for situations where performance is more
> > important than precision, but the current logic requires setting an internal
> > flag which also changes the performance characteristics where high-precision
> > is needed. This conflates two decisions the target might want to make, and
> > reduces the applicability of an option targets might want to enable for
> > performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> > continue to emit the cheaper, low-precision sequence for floats under
> > Cortex-A57.
> > 
> > Based on that reasoning, this patch makes the appropriate change to the
> > logic. I've checked with the current -mcpu values to ensure that behaviour
> > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> > 
> > I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> > no issues.
> > 
> > OK?
> > 
> > Thanks,
> > James
> > 
> 
> Yes, I like enabling this optimization for all CPU targets via
> -mlow-precision-recip-sqrt.
>  
> If my understanding is correct for cortex-a57 we now need to use only
> -mlow-precision-recip-sqrt to emit software sqrt expansion?
> 
> In the below code 
> ---snip---
> void
> aarch64_emit_swrsqrt (rtx dst, rtx src)
> {
> 
> 
>   int iterations = double_mode ? 3 : 2;
> 
>   if (flag_mrecip_low_precision_sqrt)
> iterations--;
>  ---snip---
> 
> Now in the cortex-a57 case we will always do 2 and 1 steps for double and
> float, and 3 and 2 will never be used. Should we make 2 and 1 the default? Or
> does any target still need to use 3 and 2?

The code here should handle two cases:

  1) Normal -Ofast case -> Some targets use the estimate expansion with
 3 iterations for double, 2 for float. Other targets use the hardware
 fsqrt/fdiv instructions.
  2) -mlow-precision-recip-sqrt -> All targets use the estimate expansion
 with 2 iterations for double, 1 for float.

-mlow-precision-recip-sqrt is a specialisation to be used only when the
programmer knows the lower precision is acceptable. It should not be on
by default...

> Ps: I remember reducing iterations benefited gromacs but caused some VE in
> other FP benchmarks.  

... For exactly this reason :-)

Thanks,
James
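
The trade-off between iteration count and precision can be sketched with a textbook Newton-Raphson refinement of 1/sqrt(a). This is an assumption-laden illustration in portable C: it stands in for the hardware estimate and step instructions, the helper names are invented, and only the iteration counts (3/2 normally, 2/1 with the flag) come from the thread.

```c
/* One Newton-Raphson step for 1/sqrt(a): x' = x * (1.5 - 0.5 * a * x * x).  */
static double
rsqrt_step (double a, double x)
{
  return x * (1.5 - 0.5 * a * x * x);
}

/* Iteration counts as described in the thread: 3 for double and 2 for
   float, one fewer when the low-precision flag is given.  */
static int
rsqrt_iterations (int double_mode, int low_precision)
{
  int iterations = double_mode ? 3 : 2;
  if (low_precision)
    iterations--;
  return iterations;
}

/* Refine ESTIMATE with the given number of steps.  */
static double
rsqrt_refine (double a, double estimate, int iterations)
{
  double x = estimate;
  for (int i = 0; i < iterations; i++)
    x = rsqrt_step (a, x);
  return x;
}

/* Absolute error of the refined value against a known exact result.  */
static double
rsqrt_abs_error (double a, double estimate, int iterations, double exact)
{
  double diff = rsqrt_refine (a, estimate, iterations) - exact;
  return diff < 0 ? -diff : diff;
}
```

Starting from a crude estimate of 0.4 for a = 4.0, each additional step shrinks the error against the exact 0.5, which is the precision given up when a step is dropped.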

Re: [PR tree-optimization/64946] Push integer type conversion to ABS_EXPR argument when possible.

2016-01-12 Thread Matthew Wahab


On 11/01/16 17:46, Richard Biener wrote:
> On January 11, 2016 5:36:33 PM GMT+01:00, Bernd Schmidt wrote:
> > On 01/11/2016 05:33 PM, Matthew Wahab wrote:
> > >
> > > The case I'm trying to fix has (short)abs((int)short_var). I'd thought
> > > that if abs(short_var) was undefined because the result couldn't be
> > > represented then the type conversion from int to short would also be
> > > undefined. In fact, it's implementation defined and S4.5 of the GCC
> > > manual says that the value is reduced until it can be represented. So
> > > (short)abs((int)short_var) will produce a value when abs(short_var) is
> > > undefined meaning this transformation isn't correct.
> > >
> > > I'll drop this patch.
> >
> > Maybe we could have an optab and corresponding internal function for an
> > abs that's always defined.
>
> I'd like to have ABSU_EXPR (or allow unsigned result on ABS_EXPR).

I'll see if I can do anything along those lines. This looks like something for
stage 1 though.


Matthew
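
The problematic case can be written down in a few lines of C. This is a sketch assuming int is wider than short, which holds on common targets but is not guaranteed by the language; the function names are illustrative.

```c
#include <limits.h>
#include <stdlib.h>

/* abs computed in int: well defined for every short value as long as
   int is wider than short.  */
static int
abs_in_int (short v)
{
  return abs ((int) v);
}

/* Narrow the int result back to short.  For SHRT_MIN the value
   SHRT_MAX + 1 does not fit, so the conversion is implementation-defined
   rather than undefined.  */
static short
abs_narrowed (short v)
{
  return (short) abs_in_int (v);
}
```

For SHRT_MIN, abs_in_int yields SHRT_MAX + 1 and abs_narrowed still produces some value, whereas an abs performed directly in short has no representable result. That difference is why the transformation is not valid.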

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 12:04:22PM +, Kyrill Tkachov wrote:
> >2016-01-12  Kugan Vivekanandarajah  
> >
> >* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PRMOTED

I'd like to just point at the ChangeLog typo - PRMOTED instead of PROMOTED.

Jakub

Re: [PATCH] Handle inter-block notes before BARRIER in rtl merge_blocks (PR target/69175, take 2)

2016-01-12 Thread Bernd Schmidt


On 01/12/2016 11:17 AM, Jakub Jelinek wrote:


PR target/69175
* ifcvt.c (cond_exec_process_if_block): When removing the last
insn from then_bb, remove also any possible barriers that follow it.

* g++.dg/opt/pr69175.C: New test.


Ok.


Bernd

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 01:32:05PM +0100, Uros Bizjak wrote:
> Using this patch, SSE math won't be emitted for a simple testcase
> using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:
> 
> float test (float a, float b)
> {
>   return a + b;
> }
> 
> since we start with:
> 
> test (float a, float b)
> {
>   long double _2;
>   long double _4;
>   long double _5;
>   float _6;
> 
>   <bb 2>:
>   _2 = (long double) a_1(D);
>   _4 = (long double) b_3(D);
>   _5 = _2 + _4;
>   _6 = (float) _5;
>   return _6;
> }
> 
> This is counter-intuitive, so I'd say we leave things as they are. The
> situation where only floats are evaluated as floats and doubles are
> evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.

Well, for the -fexcess-precision=standard case (== -std=c99) FLT_EVAL_METHOD
2 doesn't hurt, that forces in the FE long double computation.  While if it
is 0 with -msse -mfpmath=sse, it means that the FE leaves computations as is
and they are computed in float precision for floats and in long double
precision for doubles.  For -fexcess-precision=fast it is different, because
the FE doesn't do anything, so in the end it is mixed in that case.
So, for -msse -mfpmath=sse, I think either we need FLT_EVAL_METHOD 2 or -1
or 2 for -fexcess-precision=standard and -1 for -fexcess-precision=fast.

Jakub
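
As a reminder of what the values mean to user code, C99 ties the float_t typedef in <math.h> to FLT_EVAL_METHOD. A small sketch that checks this relationship on the host (the function name is made up):

```c
#include <float.h>
#include <math.h>

/* Return 1 if the host's float_t is consistent with FLT_EVAL_METHOD as
   C99 describes it: 0 means float, 1 means double, 2 means long double.
   Other values (such as -1, indeterminate) are not checked.  */
static int
float_t_is_consistent (void)
{
  switch (FLT_EVAL_METHOD)
    {
    case 0:
      return sizeof (float_t) == sizeof (float);
    case 1:
      return sizeof (float_t) == sizeof (double);
    case 2:
      return sizeof (float_t) == sizeof (long double);
    default:
      return 1;
    }
}
```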

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 01:52:01PM +0100, Marek Polacek wrote:
> --- gcc/testsuite/g++.dg/warn/permissive-1.C
> +++ gcc/testsuite/g++.dg/warn/permissive-1.C
> @@ -0,0 +1,8 @@
> +// PR c++/68979
> +// { dg-do compile }
> +// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow 
> -Wno-shift-count-negative" }
> +
> +enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { 
> target c++11 } }
> +enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
> +enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
> +enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target 
> c++11 } }

Shouldn't this test be limited to
// { dg-do compile { target int32 } }
or better yet replace the 100 and 30 above with
say __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 and __SIZEOF_INT__ * __CHAR_BIT__ - 2
?
I'd guess that on say int16 targets, or int64 targets (if we have any at
some point) or int128 targets this wouldn't do what you are expecting.
{ target int32 } is not exactly right, because it still assumes __CHAR_BIT__ == 
8
and for other char sizes it could fail.

Jakub
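
The arithmetic behind the suggested replacement counts, sketched in C. It assumes int has no padding bits, so that sizeof (int) * CHAR_BIT equals its precision; the function names are illustrative.

```c
#include <limits.h>

/* Precision of int on the host, assuming no padding bits.  */
static int
int_precision (void)
{
  return (int) (sizeof (int) * CHAR_BIT);
}

/* Counterpart of __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4: always at least
   the precision, so the "count >= precision" diagnostic fires whatever
   the size of int is.  */
static int
oversized_count (void)
{
  return int_precision () * 4 - 4;
}

/* Counterpart of __SIZEOF_INT__ * __CHAR_BIT__ - 2: a valid count that
   sits just below the sign bit.  */
static int
near_top_count (void)
{
  return int_precision () - 2;
}
```

With a 32-bit int these give 124 and 30; on an int16 target they would give 60 and 14, which keeps the test meaningful there.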

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jason Merrill

Changing the diagnostic is OK, but cxx_eval_check_shift_p should return 
true regardless of flag_permissive, so that SFINAE results follow the 
standard.


Jason

Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
> GCC 5 has the same issue.  This patch should be backported to GCC 5
> with
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
> 
> which supersedes:
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision=231269
> 
> OK to backport Jakub's and my patch for GCC 5?

I think I'd prefer just r231269 and my patch for the branch, to make the
changes as small as possible, leave the cleanup on the trunk only.
But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

Jakub

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek

On Tue, Jan 12, 2016 at 08:27:47AM -0500, Jason Merrill wrote:
> Changing the diagnostic is OK, but cxx_eval_check_shift_p should return true
> regardless of flag_permissive, so that SFINAE results follow the standard.

There's a complication, because if I keep returning true, we'll give a
compile-time error like this:

permissive-1.C:5:18: warning: left operand of shift expression ‘(-1 << 4)’ is
negative [-fpermissive]
 enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

permissive-1.C:5:21: error: enumerator value for ‘AA’ is not an integer
constant
 enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

So I suppose that wouldn't really help.  :(

Marek

Re: [PATCH] Fix up my recent change to vect_get_constant_vectors (PR tree-optimization/69207)

2016-01-12 Thread Ilya Enkovich

2016-01-11 20:13 GMT+03:00 Jakub Jelinek :
> Hi!
>
> Based on discussions on IRC, I'm submitting following fix for a regression
> on aarch64 - partial reversion (the case where VCE works too, just I thought
> using NOP_EXPR would be nicer) and change in the assert - op better be
> some integral value if converting it to VECTOR_BOOLEAN_TYPE_P's element
> type.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-01-11  Jakub Jelinek  
>
> PR tree-optimization/69207
> * tree-vect-slp.c (vect_get_constant_vectors): For
> VECTOR_BOOLEAN_TYPE_P, assert op has integral type instead of
> fold_convertible_p to vector_type's element type, and always
> use VCE for non-VECTOR_BOOLEAN_TYPE_P.
>
> --- gcc/tree-vect-slp.c.jj  2016-01-08 21:45:57.0 +0100
> +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100
> @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_
>   gimple *init_stmt;
>   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
> {
> - gcc_assert (fold_convertible_p (TREE_TYPE (vector_type),
> - op));
> + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
>   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, 
> op);

In vect_init_vector we had to introduce COND_EXPR to choose between 0 and -1 for
boolean vectors.  Shouldn't we do similar in SLP?

Thanks,
Ilya

> }
> - else if (fold_convertible_p (TREE_TYPE (vector_type), op))
> -   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, op);
>   else
> {
>   op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type),
>
> Jakub
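
The 0 / -1 selection Ilya refers to can be shown per element in plain C. This is only an illustration of the encoding, with a made-up function name; it says nothing about how the vectorizer builds the statement.

```c
/* Per-element view of a boolean vector constant: true becomes all-ones
   (-1), false becomes 0, whatever nonzero value the scalar had.  */
static int
mask_element_from_bool (int value)
{
  return value ? -1 : 0;
}
```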

Re: [C++ PATCH] Fix ICE due to Cilk+ related cp_gimplify_expr bug (PR objc++/68511, PR c++/69213)

2016-01-12 Thread Jason Merrill


OK.

Jason

Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread H.J. Lu

On Tue, Jan 12, 2016 at 5:12 AM, Kirill Yukhin  wrote:
> Hello Jakub
> On 08 Jan 21:20, Jakub Jelinek wrote:
>> Hi!
>>
>> This patch fixes
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> regressions that were introduced recently by fixing up the masked store
>> check for misalignment.
>> The problem is that for v2df/v4df/v4sf/v8sf masked stores
>> ix86_expand_special_args_builtin failed to set aligned_mem and thus didn't
>> set correct memory alignment.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Followed your discussion w/ HJ.
> I think that the mentioned intrinsics should assume proper alignment and this
> agrees with the SDM.
>
> So, your patch is ok for main trunk.
>
> --
> Thanks, K
>
>
>>
>> 2016-01-08  Jakub Jelinek  
>>
>>   PR target/69198
>>   * config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
>>   aligned_mem is properly set for AVX512-VL floating point masked
>>   stores.
>>
>> --- gcc/config/i386/i386.c.jj 2016-01-08 07:31:11.0 +0100
>> +++ gcc/config/i386/i386.c2016-01-08 18:16:21.030354042 +0100
>> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>>memory = 0;
>>break;
>>  case VOID_FTYPE_PV8DF_V8DF_UQI:
>> +case VOID_FTYPE_PV4DF_V4DF_UQI:
>> +case VOID_FTYPE_PV2DF_V2DF_UQI:
>>  case VOID_FTYPE_PV16SF_V16SF_UHI:
>> +case VOID_FTYPE_PV8SF_V8SF_UQI:
>> +case VOID_FTYPE_PV4SF_V4SF_UQI:
>>  case VOID_FTYPE_PV8DI_V8DI_UQI:
>>  case VOID_FTYPE_PV4DI_V4DI_UQI:
>>  case VOID_FTYPE_PV2DI_V2DI_UQI:
>> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>>  case VOID_FTYPE_PV16QI_V16QI_UHI:
>>  case VOID_FTYPE_PV32QI_V32QI_USI:
>>  case VOID_FTYPE_PV64QI_V64QI_UDI:
>> -case VOID_FTYPE_PV4DF_V4DF_UQI:
>> -case VOID_FTYPE_PV2DF_V2DF_UQI:
>> -case VOID_FTYPE_PV8SF_V8SF_UQI:
>> -case VOID_FTYPE_PV4SF_V4SF_UQI:
>>nargs = 2;
>>klass = store;
>>/* Reserve memory operand for target.  */
>>
>>   Jakub

GCC 5 has the same issue.  This patch should be backported to GCC 5
with

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html

which supersedes:

https://gcc.gnu.org/viewcvs/gcc?view=revision=231269

OK to backport Jakub's and my patch for GCC 5?

-- 
H.J.

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote:
> GOMP_kernel_launch_attributes should not be there (it is a
> remnant from before the device-specific target arguments) and
> should be moved just to the HSA plugin.  I'll prepare a patch today.
> 
> While we do not have to share GOMP_hsa_kernel_dispatch, we actually do
> use it in both the plugin and the compiler, where we only use it in
> an offsetof, so that we only have the structure defined once.

But, even using it in offsetof might be wrong, the compiler could be a
cross-compiler, and you'd use offsetof on the host, while you want it for
the target, and that would be different.
So, IMHO you need to build (unless you already have) the structure as a tree
type, lay it out, and then you can look at TYPE_SIZE_UNIT or
DECL_FIELD_OFFSET and the like.

Jakub
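
A small illustration of the concern, using a hypothetical two-pointer structure rather than the real GOMP_hsa_kernel_dispatch layout:

```c
#include <stddef.h>

/* Hypothetical layout, invented for this example.  */
struct toy_dispatch
{
  void *queue;
  void *kernarg;
};

/* offsetof is evaluated by whichever compiler builds this code, so the
   result reflects the pointer size of the machine the compiler runs on,
   which in a cross-compiler is the host and not the target.  */
static size_t
host_kernarg_offset (void)
{
  return offsetof (struct toy_dispatch, kernarg);
}
```

The value is typically 8 when built for a 64-bit host and 4 for a 32-bit one. A cross-compiler that hard-coded the host's answer would emit the wrong offset for a target with a different pointer size, hence the suggestion to lay the type out as a tree.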

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor

Hi,

On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> > > > --- a/libgomp/task.c
> > > > +++ b/libgomp/task.c
> > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > >ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > > +  free (ttask->firstprivate_copies);
> > > >gomp_target_task_completion (team, task);
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > 
> > > So, this function should have a special case for the SHARED_MEM case and
> > > handle it similarly to how GOMP_taskgroup_end handles the finish_cancelled:
> > > case.  Just note that the target task is missing from certain queues at that
> > > point.
> > 
> > I'm afraid I need some help here.  I do not quite understand how
> > finish_cancelled in GOMP_taskgroup_end is similar, it seems to be doing
> > much more than freeing one pointer.  What exactly is the issue with
> > the above?
> > 
> > Nevertheless, after reading through bits of task.c again, I wonder
> > whether any copying (for both shared memory target and the host) in
> > gomp_target_task_fn is actually necessary because it seems to be also
> > done in gomp_create_target_task.  Does that not apply somehow?
> 
> The target task is scheduled for the first action as normal task, and the
> scheduling of it already removes it from some of the queues (each task is
> put into 1-3 queues), i.e. actions performed mostly by
> gomp_task_run_pre.  Then the team task lock is unlocked and the task is run.
> Finally, for normal tasks, gomp_task_run_post_handle_depend,
> gomp_task_run_post_remove_parent, etc. is run.  Now, for async target tasks
> that have something running on some other device at that point, we don't do
> that, but instead make it GOMP_TASK_ASYNC_RUNNING.  And continue with other
> stuff, until gomp_target_task_completion is run.
> For non-shared mem that needs to readd the task again into the queues, so
> that it will be scheduled again.  But you don't need that for shared mem
> target tasks, they can just free the firstprivate_copies and finalize the
> task.
> At the time gomp_target_task_completion is called, the task is pretty much
> in the same state as it is around the finish_cancelled:; label.
> So instead of what the gomp_target_task_completion function does,
> you would for SHARED_MEM do something like:
>   size_t new_tasks
>     = gomp_task_run_post_handle_depend (task, team);
>   gomp_task_run_post_remove_parent (task);
>   gomp_clear_parent (&task->children_queue);
>   gomp_task_run_post_remove_taskgroup (task);
>   team->task_count--;
>   do_wake = 0;
>   if (new_tasks > 1)
>     {
>       do_wake = team->nthreads - team->task_running_count
>                 - !task->in_tied_task;
>       if (do_wake > new_tasks)
>         do_wake = new_tasks;
>     }
>   // Unlike other places, the following will be also run with the
>   // task_lock held, but I'm afraid there is nothing to do about it.
>   // See the comment in gomp_target_task_completion.
>   gomp_finish_task (task);
>   free (task);
>   if (do_wake)
>     gomp_team_barrier_wake (&team->barrier, do_wake);
> 

I tried the above but libgomp testcase target-33.c always got stuck
within GOMP_taskgroup_end call, more specifically in
gomp_team_barrier_wait_end in config/linux/bar.c where the first
call to gomp_barrier_handle_tasks left the barrier->generation as
BAR_WAITING_FOR_TASK and then nothing ever happened, even as the
callbacks fired.

After looking into the tasking mechanism for basically the whole day
yesterday, I *think* I fixed it by calling
gomp_team_barrier_set_task_pending from the callback and another hunk
in gomp_barrier_handle_tasks so that it clears that barrier flag even
if it has not picked up any tasks.  Please let me know if you think it
makes sense.

If so, I'll include it in an HSA patch set I hope to generate today.
Otherwise I guess I'd prefer to remove the shared-memory path and
revert to old behavior as a temporary measure until we find out what
was wrong.

Thanks and sorry that it took me so long to resolve this,

Martin
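
The wake-up count in the snippet Jakub suggested can be isolated as a pure function, which makes the clamping easier to see. This is a simplified sketch: it uses plain ints throughout and invented parameter names in place of the team and task fields.

```c
/* Number of threads to wake after a task finishes, following the quoted
   snippet: nothing unless more than one new task became runnable,
   otherwise the idle thread count, capped at the number of new tasks.  */
static int
compute_do_wake (int new_tasks, int nthreads, int task_running_count,
                 int in_tied_task)
{
  int do_wake = 0;
  if (new_tasks > 1)
    {
      do_wake = nthreads - task_running_count - !in_tied_task;
      if (do_wake > new_tasks)
        do_wake = new_tasks;
    }
  return do_wake;
}
```

For example, with 8 threads, 2 of them running tasks, and 3 newly runnable tasks, the unclamped value is 5 and the result is 3.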


diff --git a/libgomp/task.c b/libgomp/task.c
index ab5df51..828c1fb 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, 
struct gomp_task *task)
 gomp_team_barrier_wake (&team->barrier, 1);
 }
 
+static inline size_t
+gomp_task_run_post_handle_depend (struct gomp_task *child_task,
+ struct gomp_team *team);
+static inline void
+gomp_task_run_post_remove_parent (struct gomp_task *child_task);
+static inline void
+gomp_task_run_post_remove_taskgroup (struct gomp_task *child_task);
+
 /* Signal that a target task TTASK has completed the asynchronously
running phase

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Andreas Schwab

Yuri Rumyantsev  writes:

> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing

Sure, go ahead.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[Committed, PATCH] Define STDINT_LONG32 and add predefined integer types for IAMCU

2016-01-12 Thread H.J. Lu

Define STDINT_LONG32 to 0, add SIZE_TYPE, PTRDIFF_TYPE and WCHAR_TYPE
for IAMCU to make integer types compatible with i386 Linux.

Checked into trunk.

H.J.

PR target/68456
PR target/69226
* config/i386/iamcu.h (SIZE_TYPE): New macro.
(PTRDIFF_TYPE): Likewise.
(WCHAR_TYPE): Likewise.
(WCHAR_TYPE_SIZE): Likewise.
(STDINT_LONG32): Likewise.
---
 gcc/config/i386/iamcu.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/config/i386/iamcu.h b/gcc/config/i386/iamcu.h
index 53afbc0..e16c9d63 100644
--- a/gcc/config/i386/iamcu.h
+++ b/gcc/config/i386/iamcu.h
@@ -94,3 +94,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 goto DONE; \
   }
\
   } while (0)
+
+#undef SIZE_TYPE
+#define SIZE_TYPE "unsigned int"
+
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE "int"
+
+#undef WCHAR_TYPE
+#define WCHAR_TYPE "long int"
+
+#undef WCHAR_TYPE_SIZE
+#define WCHAR_TYPE_SIZE BITS_PER_WORD
+
+/* Use int, instead of long int, for int32_t and uint32_t.  */
+#undef STDINT_LONG32
+#define STDINT_LONG32 0
-- 
2.5.0

Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Uros Bizjak

On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>> with
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>
>> which supersedes:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision=231269
>>
>> OK to backport Jakub's and my patch for GCC 5?
>
> I think I'd prefer just r231269 and my patch for the branch, to make the
> changes as small as possible, leave the cleanup on the trunk only.
> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

I agree with Jakub.

Those two patches are OK for backport.

Thanks,
Uros.

[gomp4] fix kernel reductions

2016-01-12 Thread Nathan Sidwell

This patch fixes an ICE encountered with the Houston's testsuite when kernel 
optimizations are enabled.


The reduction is implemented via a cmp (compare-and-swap) loop, but later than
the omp code usually does that lowering.  At the point it happens for kernels,
loops must have simple latches, which this patch implements by splitting the
non-simple latch's back edge (which is what force_single_succ_latches does when
run over the loop structure).


applied to gomp4

nathan
2016-01-08  Nathan Sidwell  

	gcc/
	* omp-low.c (expand_omp_atomic_pipeline): Pay attention to
	LOOPS_HAVE_SIMPLE_LATCHES state.

2016-01-12  Nathan Sidwell  

	gcc/testsuite/
	* gcc.dg/goacc/kern-1.c: New.

Index: omp-low.c
===
--- omp-low.c	(revision 232179)
+++ omp-low.c	(revision 232180)
@@ -12370,6 +12370,9 @@ expand_omp_atomic_pipeline (basic_block
   loop->header = loop_header;
   loop->latch = store_bb;
   add_loop (loop, loop_header->loop_father);
+  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+/* Split the edge from store_bb to loop_header */
+split_edge (e);
 
   if (gimple_in_ssa_p (cfun))
 update_ssa (TODO_update_ssa_no_phi);
Index: gcc.dg/goacc/kern-1.c
===
--- gcc.dg/goacc/kern-1.c	(revision 0)
+++ gcc.dg/goacc/kern-1.c	(working copy)
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -O2 -ftree-parallelize-loops=32" } */
+
+/* The reduction on sum could cause an ICE with a non-simple latch loop.   */
+
+int printf (char const *, ...);
+
+int
+main ()
+{
+  int i;
+  double a[1000], sum = 0;
+
+  
+#pragma acc kernels pcopyin(a[0:1000])
+#pragma acc loop reduction(+:sum)
+  for(int i=0; i<1000; i++) {
+sum += a[i];
+  }
+
+  printf ("%lf\n", sum);
+
+  return 0;
+}
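
For readers unfamiliar with the loop shape that the reduction is lowered to, here is a user-level sketch of a compare-and-swap loop that adds to a double, written with C11 atomics. It is an analogy, not the code the compiler emits: the helper names are invented, double is assumed to be 64 bits wide, and the example runs in a single thread.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add ADDEND to the double stored bitwise in *CELL using a
   compare-and-swap loop.  Assumes a 64-bit double.  */
static void
cas_add_double (_Atomic uint64_t *cell, double addend)
{
  uint64_t old_bits = atomic_load (cell);
  uint64_t new_bits;
  do
    {
      double value;
      memcpy (&value, &old_bits, sizeof value);
      value += addend;
      memcpy (&new_bits, &value, sizeof new_bits);
      /* If the exchange fails, old_bits is refreshed with the current
         contents and the back edge to the loop header is taken again.  */
    }
  while (!atomic_compare_exchange_weak (cell, &old_bits, new_bits));
}

/* Sum N elements of A through the CAS helper, starting from 0.0.  */
static double
cas_sum (const double *a, size_t n)
{
  _Atomic uint64_t cell = 0;	/* All-zero bits are +0.0 in IEEE 754.  */
  for (size_t i = 0; i < n; i++)
    cas_add_double (&cell, a[i]);
  uint64_t bits = atomic_load (&cell);
  double result;
  memcpy (&result, &bits, sizeof result);
  return result;
}
```

The do/while has a single back edge from the store to the header, which corresponds to the edge from store_bb to loop_header that the patch splits.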

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jason Merrill


On 01/12/2016 09:05 AM, Marek Polacek wrote:
> On Tue, Jan 12, 2016 at 08:27:47AM -0500, Jason Merrill wrote:
> > Changing the diagnostic is OK, but cxx_eval_check_shift_p should return true
> > regardless of flag_permissive, so that SFINAE results follow the standard.
>
> There's a complication, because if I keep returning true, we'll give a
> compile-time error like this:
>
> permissive-1.C:5:18: warning: left operand of shift expression ‘(-1 << 4)’ is
> negative [-fpermissive]
>   enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
> target c++11 } }
>
> permissive-1.C:5:21: error: enumerator value for ‘AA’ is not an integer
> constant
>   enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
> target c++11 } }
>
> So I suppose that wouldn't really help.  :(

In that case, we need to return (!flag_permissive || ctx->quiet).

Jason
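
The suggested return value, spelled out as a standalone function so the four combinations can be checked. The function name is illustrative; the parameters mirror flag_permissive and ctx->quiet.

```c
#include <stdbool.h>

/* True means "treat the shift as not a constant expression".  Only the
   combination of -fpermissive with a non-quiet (diagnosing) context
   lets the expression through; quiet contexts such as SFINAE always see
   the standard-mandated failure.  */
static bool
shift_check_result (bool flag_permissive, bool quiet)
{
  return !flag_permissive || quiet;
}
```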

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek

On Tue, Jan 12, 2016 at 09:09:38AM -0500, Jason Merrill wrote:
> In that case, we need to return (!flag_permissive || ctx->quiet).

Thanks.  So is this one ok once it passes testing?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and adjust the return value.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..36a1e42 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return (!flag_permissive || ctx->quiet);
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return (!flag_permissive || ctx->quiet);
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return (!flag_permissive || ctx->quiet);
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return (!flag_permissive || ctx->quiet);
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C 
gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..bfaca76 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile { target int32 } }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow 
-Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { 
target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 }; // { dg-warning 
"operand of shift expression" }
+enum D { DD = 10 << __SIZEOF_INT__ * __CHAR_BIT__ - 2 }; // { dg-warning 
"shift expression" "" { target c++11 } }

Marek

[PATCH] Fix PR69077

2016-01-12 Thread Richard Biener


The following fixes PR69077.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

lto/
PR lto/69077
* lto-symtab.c (lto_symtab_prevailing_virtual_decl): Properly
merge TREE_ADDRESSABLE and DECL_POSSIBLY_INLINED flags.

* g++.dg/lto/pr69077_0.C: New testcase.
* g++.dg/lto/pr69077_1.C: Likewise.
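
The "merge in both directions" idea can be shown with a toy model.  DeclFlags and merge_both_ways are hypothetical names, the two members merely stand in for TREE_ADDRESSABLE and DECL_POSSIBLY_INLINED, and the FUNCTION_DECL guard of the real change is left out.

```cpp
#include <cassert>

// Toy model; not the GCC data structures.
struct DeclFlags {
  bool addressable = false;        // stands in for TREE_ADDRESSABLE
  bool possibly_inlined = false;   // stands in for DECL_POSSIBLY_INLINED
};

// After the call both objects carry the union of the flags, so it does not
// matter which of the two decls ends up being used.
inline void merge_both_ways(DeclFlags &a, DeclFlags &b)
{
  a.addressable |= b.addressable;
  b.addressable |= a.addressable;
  a.possibly_inlined |= b.possibly_inlined;
  b.possibly_inlined |= a.possibly_inlined;
  assert(a.addressable == b.addressable);
  assert(a.possibly_inlined == b.possibly_inlined);
}
```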

Index: gcc/lto/lto-symtab.c
===
*** gcc/lto/lto-symtab.c(revision 232261)
--- gcc/lto/lto-symtab.c(working copy)
*** lto_symtab_prevailing_virtual_decl (tree
*** 997,1002 
--- 997,1014 
  n = n->next_sharing_asm_name;
if (n)
  {
+   /* Merge decl state in both directions, we may still end up using
+the other decl.  */
+   TREE_ADDRESSABLE (n->decl) |= TREE_ADDRESSABLE (decl);
+   TREE_ADDRESSABLE (decl) |= TREE_ADDRESSABLE (n->decl);
+ 
+   if (TREE_CODE (decl) == FUNCTION_DECL)
+   {
+ /* Merge decl state in both directions, we may still end up using
+the other decl.  */
+ DECL_POSSIBLY_INLINED (n->decl) |= DECL_POSSIBLY_INLINED (decl);
+ DECL_POSSIBLY_INLINED (decl) |= DECL_POSSIBLY_INLINED (n->decl);
+   }
lto_symtab_prevail_decl (n->decl, decl);
decl = n->decl;
  }
Index: gcc/testsuite/g++.dg/lto/pr69077_0.C
===
*** gcc/testsuite/g++.dg/lto/pr69077_0.C(revision 0)
--- gcc/testsuite/g++.dg/lto/pr69077_0.C(working copy)
***
*** 0 
--- 1,14 
+ // { dg-lto-do link }
+ // { dg-lto-options { { -O3 -g -flto } } }
+ // { dg-extra-ld-options "-r -nostdlib" }
+ 
+ struct cStdDev
+ {
+   long ns;
+   virtual double mean() const {  return ns;  }
+ };
+ 
+ struct cWeightedStdDev : public cStdDev {
+ virtual int netPack();
+ };
+ int cWeightedStdDev::netPack() { }
Index: gcc/testsuite/g++.dg/lto/pr69077_1.C
===
*** gcc/testsuite/g++.dg/lto/pr69077_1.C(revision 0)
--- gcc/testsuite/g++.dg/lto/pr69077_1.C(working copy)
***
*** 0 
--- 1,15 
+ struct cStdDev
+ {
+   long ns;
+   virtual double mean() const {  return ns;  }
+ };
+ 
+ struct sf
+ {
+   void recordScalar(double value);
+   cStdDev eedStats;
+   virtual void finish();
+ };
+ void sf::finish() {
+ recordScalar(eedStats.mean());
+ }

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 02:46:52PM +0100, Martin Jambor wrote:
> diff --git a/libgomp/task.c b/libgomp/task.c
> index ab5df51..828c1fb 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -584,8 +592,34 @@ GOMP_PLUGIN_target_task_completion (void *data)
>gomp_mutex_unlock (>task_lock);
>  }
>ttask->state = GOMP_TARGET_TASK_FINISHED;
> -  free (ttask->firstprivate_copies);
> -  gomp_target_task_completion (team, task);
> +
> +  if (ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)

First of all, I'm surprised you've changed
GOMP_PLUGIN_target_task_completion rather than gomp_target_task_completion.
The difference between those two is that the latter is run not just from
the async event, but also if GOMP_PLUGIN_target_task_completion happens to
run before the gomp_mutex_lock (&team->task_lock); is acquired in the
various spots before
child_task->kind = GOMP_TASK_ASYNC_RUNNING;
The point is if the async completion happens too early for the thread
spawning it to notice, we want to complete it only when the spawning thread
is ready for that.

But looking at GOMP_PLUGIN_target_task_completion, I see we have a bug in
there,
  gomp_mutex_lock (&team->task_lock);
  if (ttask->state == GOMP_TARGET_TASK_READY_TO_RUN)
    {
      ttask->state = GOMP_TARGET_TASK_FINISHED;
      gomp_mutex_unlock (&team->task_lock);
    }
  ttask->state = GOMP_TARGET_TASK_FINISHED;
  gomp_target_task_completion (team, task);
  gomp_mutex_unlock (&team->task_lock);
I think there was meant to be a return; after the first unlock, otherwise
it doubly unlocks the same lock, and performs gomp_target_task_completion
without the lock held, which may cause great havoc.

I'll test the obvious change here.
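
To make the failure mode concrete, here is a toy model of the two control flows.  It is not libgomp code: TrackedLock, Task and both functions are hypothetical, the lock only records misuse instead of synchronizing anything, and the "fixed" variant is simply the quoted code with the early return added.

```cpp
#include <cassert>

// Toy lock that records misuse; it does no real synchronization.
struct TrackedLock {
  bool held = false;
  int bad_unlocks = 0;             // unlock() calls made while not held
  void lock() { assert(!held); held = true; }
  void unlock() { if (!held) ++bad_unlocks; held = false; }
};

enum class State { ReadyToRun, Running, Finished };

struct Task {
  State state = State::Running;
  int completions_without_lock = 0;
};

// Stands in for gomp_target_task_completion, which expects the lock held.
inline void complete(Task &t, const TrackedLock &l)
{
  if (!l.held)
    ++t.completions_without_lock;
}

// Shape of the quoted code: falls through after the first unlock.
inline void completion_buggy(Task &t, TrackedLock &l)
{
  l.lock();
  if (t.state == State::ReadyToRun)
    {
      t.state = State::Finished;
      l.unlock();
    }
  t.state = State::Finished;
  complete(t, l);
  l.unlock();
}

// Same code with the early return added.
inline void completion_fixed(Task &t, TrackedLock &l)
{
  l.lock();
  if (t.state == State::ReadyToRun)
    {
      t.state = State::Finished;
      l.unlock();
      return;
    }
  t.state = State::Finished;
  complete(t, l);
  l.unlock();
}
```

With a task in the ReadyToRun state, the first variant records one unlock of a lock that is not held and one completion run without the lock; the second records neither.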

> +{
> +  free (ttask->firstprivate_copies);
> +  size_t new_tasks
> + = gomp_task_run_post_handle_depend (task, team);
> +  gomp_task_run_post_remove_parent (task);
> +  gomp_clear_parent (>children_queue);
> +  gomp_task_run_post_remove_taskgroup (task);
> +  team->task_count--;
> +  int do_wake = 0;
> +  if (new_tasks)
> + {
> +   do_wake = team->nthreads - team->task_running_count
> + - !task->in_tied_task;
> +   if (do_wake > new_tasks)
> + do_wake = new_tasks;
> + }
> +  /* Unlike other places, the following will be also run with the 
> task_lock
> + held, but there is nothing to do about it.  See the comment in
> + gomp_target_task_completion.  */
> +  gomp_finish_task (task);
> +  free (task);
> +  gomp_team_barrier_set_task_pending (>barrier);

This one really looks weird.  I mean, this should be done if we increase the
number of team's tasks, and gomp_task_run_post_handle_depend should do that
if it adds new tasks (IMHO it does), but if new_tasks is 0, then
there is no new task to schedule and therefore it should not be set.

> +  gomp_team_barrier_wake (>barrier, do_wake ? do_wake : 1);
> +}
> +  else
> +gomp_target_task_completion (team, task);
>gomp_mutex_unlock (>task_lock);
>  }
>  
> @@ -1275,7 +1309,12 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
> thr->task = task;
>   }
>else
> - return;
> + {
> +   if (team->task_count == 0
> +   && gomp_team_barrier_waiting_for_tasks (>barrier))
> + gomp_team_barrier_done (>barrier, state);
> +   return;
> + }
>gomp_mutex_lock (>task_lock);
>if (child_task)
>   {

And this hunk looks wrong too.  gomp_team_barrier_done shouldn't be called
without the lock held, there is no waking and I don't understand the
rationale for why you think current gomp_barrier_handle_tasks is wrong.

Anyway, if you make the HSA branch work the less efficient way of creating a
task that just frees the firstprivate copies, and post after the merge into
trunk a WIP patch that includes this, plus if there are clear instructions
how to build the HSA stuff on the wiki, my son has a box with AMD Kaveri,
so I'll try to debug it there.

Jakub

Re: [PATCH] c++/58109 - alignas() fails to compile with constant expression

2016-01-12 Thread Martin Sebor


On 01/11/2016 10:20 PM, Jason Merrill wrote:

On 12/22/2015 09:32 PM, Martin Sebor wrote:

+  if (is_attribute_p ("aligned", name)
+  || is_attribute_p ("vector_size", name))
+{
+  /* Attribute argument may be a dependent indentifier.  */
+  if (tree t = args ? TREE_VALUE (args) : NULL_TREE)
+if (value_dependent_expression_p (t)
+|| type_dependent_expression_p (t))
+  return true;
+}


Instead of this, is_late_template_attribute should be fixed to check
attribute_takes_identifier_p.


attribute_takes_identifier_p() returns false for the aligned
attribute and for vector_size (it returns true only for
attributes cleanup, format, and mode, and none others).

Are you suggesting to also change attribute_takes_identifier_p
to return true for these attributes?  That would likely mean
changes to the C front end as well.
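
For context, this is the kind of code the quoted hunk is concerned with: an alignment argument that is value-dependent until the template is instantiated.  It is a minimal sketch and not the testcase from the PR; Aligned is a hypothetical example type.

```cpp
#include <cstddef>

// N is a template parameter, so the alignas argument cannot be evaluated
// until Aligned<N> is instantiated.
template <std::size_t N>
struct Aligned {
  alignas(N) unsigned char storage[N];
};

static_assert(alignof(Aligned<16>) == 16, "alignment comes from the argument");
static_assert(alignof(Aligned<64>) == 64, "and is evaluated per instantiation");
```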

Thanks
Martin

[gomp4] OpenACC documentation for libgomp.

2016-01-12 Thread James Norris


Hi,

Backported:

2016-01-12  James Norris  

* libgomp.texi: Updates for OpenACC.

from trunk.

Thanks,
Jim
Index: ChangeLog.gomp
===
--- ChangeLog.gomp	(revision 232292)
+++ ChangeLog.gomp	(working copy)
@@ -1,3 +1,9 @@
+2016-01-12  James Norris  
+
+	Backport from trunk:
+	2016-01-12  James Norris  
+	* libgomp.texi: Updates for OpenACC.
+
 2016-01-11  Thomas Schwinge  
 
 	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: Remove
Index: libgomp.texi
===
--- libgomp.texi	(revision 232292)
+++ libgomp.texi	(working copy)
@@ -94,6 +94,14 @@
 @comment  better formatting.
 @comment
 @menu
+* Enabling OpenMP::  How to enable OpenMP for your
+ applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
+ interface.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
 * Enabling OpenACC:: How to enable OpenACC for your
  applications.
 * OpenACC Runtime Library Routines:: The OpenACC runtime application
@@ -104,14 +112,6 @@
  asynchronous operations.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
  NVIDIA CUBLAS library.
-* Enabling OpenMP::  How to enable OpenMP for your
- applications.
-* OpenMP Runtime Library Routines: Runtime Library Routines.
- The OpenMP runtime application programming
- interface.
-* OpenMP Environment Variables: Environment Variables.
- Influencing OpenMP runtime behavior with
- environment variables.
 * The libgomp ABI::  Notes on the external libgomp ABI.
 * Reporting Bugs::   How to report bugs in the GNU Offloading
  and Multi Processing Runtime Library.
@@ -126,643 +126,6 @@
 
 
 @c -
-@c Enabling OpenACC
-@c -
-
-@node Enabling OpenACC
-@chapter Enabling OpenACC
-
-To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
-@code{#pragma acc} in C/C++ and @code{!$accp} directives in free form,
-@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
-@code{!$} conditional compilation sentinels in free form and @code{c$},
-@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
-arranges for automatic linking of the OpenACC runtime library 
-(@ref{OpenACC Runtime Library Routines}).
-
-A complete description of all OpenACC directives accepted may be found in 
-the @uref{http://www.openacc.org/, OpenMP Application Programming
-Interface} manual, version 2.0.
-
-Note that this is an experimental feature, incomplete, and subject to
-change in future versions of GCC.  See
-@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
-
-
-
-@c -
-@c OpenACC Runtime Library Routines
-@c -
-
-@node OpenACC Runtime Library Routines
-@chapter OpenACC Runtime Library Routines
-
-The runtime routines described here are defined by section 3 of the OpenACC
-specifications in version 2.0.
-They have C linkage, and do not throw exceptions.
-Generally, they are available only for the host, with the exception of
-@code{acc_on_device}, which is available for both the host and the
-acceleration device.
-
-@menu
-* acc_get_num_devices:: Get number of devices for the given device type
-* acc_set_device_type::
-* acc_get_device_type::
-* acc_set_device_num::
-* acc_get_device_num::
-* acc_init::
-* acc_shutdown::
-* acc_on_device::   Whether executing on a particular device
-* acc_malloc::
-* acc_free::
-* acc_copyin::
-* acc_present_or_copyin::
-* acc_create::
-* acc_present_or_create::
-* acc_copyout::
-* acc_delete::
-* acc_update_device::
-* acc_update_self::
-* acc_map_data::
-* acc_unmap_data::
-* acc_deviceptr::
-* acc_hostptr::
-* acc_is_present::
-* acc_memcpy_to_device::
-* acc_memcpy_from_device::
-
-API routines for target platforms.
-
-* acc_get_current_cuda_device::
-*

Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 06:51:31PM +0100, Martin Jambor wrote:
> On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> > >   remap_decl (old_var, id);
> > > }
> > > - phase 2 - do the full remap_decls, but during that arrange that
> > >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> > >   decl
> > 
> > ...they would not be copied here because remap_decl would not be
> > duplicating stuff.  So I'd end up with an original local decl when I
> > actually need a duplicate.
> > 
> 
> ugh, I'm trying to be too fast and obviously forgot about the
> id->remapping_type_depth part of the proposed condition.
> 
> Still, when could relying solely on id->remapping_type_depth fail?

Well, those functions are used for numerous purposes, and you'd only
want to leave not-yet-remapped decls alone when id->remapping_type_depth
is non-zero while inside of the copy_gimple_seq_and_replace_locals
path (and only for the remap_decls in there), so you need IMHO some
flag to distinguish that.

And the reason for the above suggested 2 phases, where the first phase just
calls remap_decl and nothing else on the non-VLAs is to make sure that
if a VLA type or DECL_VALUE_EXPR uses (usually scalar) vars declared in the
same bind block, then those are processed first.
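
The ordering argument can be illustrated with a toy model.  Nothing here is GCC code: Decl, Remapper and the ".copy" naming are hypothetical, and a "VLA" is reduced to a decl that names the variable its bound depends on.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a local declaration.
struct Decl {
  std::string name;
  std::string size_var;   // non-empty: a VLA whose bound uses that variable
};

struct Remapper {
  std::map<std::string, std::string> copies;   // original -> duplicate

  // Phase-2 lookup policy: a name with no copy yet is returned unchanged.
  std::string lookup(const std::string &n) const
  {
    auto it = copies.find(n);
    return it == copies.end() ? n : it->second;
  }

  // Returns, for each duplicated VLA, the bound it ends up referring to.
  std::map<std::string, std::string> run(const std::vector<Decl> &decls)
  {
    for (const Decl &d : decls)        // phase 1: plain variables only
      if (d.size_var.empty())
        copies[d.name] = d.name + ".copy";

    std::map<std::string, std::string> bounds;
    for (const Decl &d : decls)        // phase 2: the VLAs
      if (!d.size_var.empty())
        {
          copies[d.name] = d.name + ".copy";
          bounds[copies[d.name]] = lookup(d.size_var);
        }
    return bounds;
  }
};
```

Because phase 1 has already duplicated every plain variable, the VLA copy picks up the copy of its bound even when the VLA happens to come first in the list, while names that were never duplicated fall through unchanged.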

Jakub

Re: [PATCH] PR testsuite/69181: ensure expected multiline outputs is cleared per-test

2016-01-12 Thread David Malcolm

On Sat, 2016-01-09 at 03:07 +0100, Bernd Schmidt wrote:
> On 01/09/2016 01:51 AM, David Malcolm wrote:
> > The root cause here is that the logic to reset the list of expected
> > multiline outputs was being run from:
> >handle-multiline-outputs, called by
> >  prune.exp's prune_gcc_output
> > and none of that happens if the test is skipped by a target exclusion
> > in dg-do.
> 
> Thanks for tackling this.
> 
> > diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> > index f9ec206..f778bca 100644
> > --- a/gcc/testsuite/lib/gcc-dg.exp
> > +++ b/gcc/testsuite/lib/gcc-dg.exp
> > @@ -836,6 +836,7 @@ if { [info procs saved-dg-test] == [list] } {
> > global testname_with_flags
> > global set_target_env_var
> > global keep_saved_temps_suffixes
> > +   global multiline_expected_outputs
> >
> > if { [ catch { eval saved-dg-test $args } errmsg ] } {
> > set saved_info $errorInfo
> > @@ -871,6 +872,7 @@ if { [info procs saved-dg-test] == [list] } {
> > if [info exists testname_with_flags] {
> > unset testname_with_flags
> > }
> > +   set multiline_expected_outputs []
> >   }
> >   }
> 
> I looked at this code, and there are two near-identical blocks which 
> reset all these variables. You are modifying only one of them, leaving 
> the one inside the if { catch } thing unchanged - is this intentional?

I'm not particularly strong at Tcl, but am I right in thinking that
given that we have this:

if { [ catch { eval saved-dg-test $args } errmsg ] } {
(A) set and unset various things
error $errmsg $saved_info
}
   (B) set and unset the same various things as (A)

that (B) will always be reached, and that the duplicates in (A) are
redundant? (unless they affect "error")

I see that this pattern was introduced back in r67696 aka
91a385a522a94154f9e0cd940c5937177737af02:

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 39ccaf6..c660eca 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2003-06-09  Mark Mitchell  
+
+   * lib/gcc-dg.exp (dg-test): Clear additional_files and
+   additional_sources.
+
 2003-05-21  David Taylor  
 
* gcc.dg/Wpadded.c: New file.
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index aade663..1feadc4 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -294,3 +294,31 @@ proc dg-require-gc-sections { args } {
return
 }
 }
+
+# We need to make sure that additional_files and additional_sources
+# are both cleared out after every test.  It is not enough to clear
+# them out *before* the next test run because gcc-target-compile gets
+# run directly from some .exp files (outside of any test).  (Those
+# uses should eventually be eliminated.) 
+
+# Because the DG framework doesn't provide a hook that is run at the
+# end of a test, we must replace dg-test with a wrapper.
+
+if { [info procs saved-dg-test] == [list] } {
+rename dg-test saved-dg-test
+
+proc dg-test { args } {
+   global additional_files
+   global additional_sources
+   global errorInfo
+
+   if { [ catch { eval saved-dg-test $args } errmsg ] } {
+   set saved_info $errorInfo
+   set additional_files ""
+   set additional_sources ""
+   error $errmsg $saved_info
+   }
+   set additional_files ""
+   set additional_sources ""
+}
+}

and this pattern has been extended over the years.

I *could* add the
  set multiline_expected_outputs []
to the block guarded by the if {}, but it feels like cargo-culting to
me.  Am I missing something?
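
One way to reason about the question is a C++ analogy, offered only as a sketch: it assumes that Tcl's error command leaves the proc at that point, much like a rethrow.  Under that assumption the code after the if is not reached on the failure path, so the cleanup in (A) would not be redundant.  Counters and wrapped_test are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>

// Counters standing in for the "(A)" and "(B)" cleanup blocks.
struct Counters { int a = 0; int b = 0; };

// The handler cleans up and re-raises; the code after the try block cleans
// up on the normal path.
inline void wrapped_test(bool fail, Counters &c)
{
  try
    {
      if (fail)
        throw std::runtime_error("test failed");
    }
  catch (...)
    {
      ++c.a;      // (A): runs only when the wrapped call raised
      throw;      // plays the role of "error $errmsg $saved_info"
    }
  ++c.b;          // (B): skipped when the handler re-raises
}
```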


> Otherwise this looks reasonable IMO.
> 
> 
> Bernd

Re: [patch] libstdc++/68276 and libstdc++68995 qualification in

2016-01-12 Thread Jonathan Wakely


On 21/12/15 13:02 +, Jonathan Wakely wrote:

Two patches to add missing std:: qualification to prevent ADL
problems. Both are regressions, 68276 only on trunk, but 68995 has
been broken since 4.8.0 (but only affects people mixing TR1 with
C++11, and I was already rude about them in Bugzilla so won't do it
again here ;-)
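
A minimal sketch of the kind of ADL problem that explicit qualification avoids.  The namespaces old_lib and new_lib are hypothetical stand-ins for std::tr1 and std; this is not the code from either PR.

```cpp
namespace old_lib {                 // plays the role of std::tr1
  struct handle {};
  template <typename T> constexpr int get_id(const T &) { return 1; }
}

namespace new_lib {                 // plays the role of std
  template <typename T> constexpr int get_id(const T &) { return 2; }

  template <typename T> constexpr int use(const T &t)
  {
    // An unqualified get_id(t) would also consider old_lib::get_id through
    // ADL when T is old_lib::handle, and the call would be ambiguous.
    return new_lib::get_id(t);
  }
}

static_assert(new_lib::use(old_lib::handle{}) == 2, "qualified call is unambiguous");
```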


For the branches I added a better test for 68995, this extends the
test on trunk to match what's on the branches now.

Tested x86_64-linux, committed to trunk.
commit 574125855cb79becc19ed564040a0ca1b23ebabc
Author: Jonathan Wakely 
Date:   Tue Jan 12 19:19:02 2016 +

Extend std::function test for PR 68995

	* testsuite/20_util/function/68995.cc: Test reference_wrapper cases.

diff --git a/libstdc++-v3/testsuite/20_util/function/68995.cc b/libstdc++-v3/testsuite/20_util/function/68995.cc
index 78712d6..5690657 100644
--- a/libstdc++-v3/testsuite/20_util/function/68995.cc
+++ b/libstdc++-v3/testsuite/20_util/function/68995.cc
@@ -25,3 +25,8 @@
 std::tr1::shared_ptr test() { return {}; }
 
 std::function func = test;
+std::function funcr = std::ref(test);
+
+void test2(std::tr1::shared_ptr) { }
+
+std::function func2 = std::ref(test2);

Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev3 (enable libgcc conversions)

2016-01-12 Thread Michael Meissner

On Tue, Jan 12, 2016 at 12:18:55AM +, Joseph Myers wrote:
> On Mon, 11 Jan 2016, Michael Meissner wrote:
> 
> > I fixed the #ifdef to use __NO_FPRS__ (thanks for the heads up on that).  I
> > also believe I fixed the various formatting issues.  These two patches 
> > build on
> > a big endian power7 host and little endian power8 host with no regressions 
> > in
> > the testsuite (the gcc patch is included here, but it hasn't changed since 
> > the
> > previous version of this patch).  Are they ok to be checked in?
> 
> Are you sure you sent the right patch version?  I don't see those fixes in 
> this one.

You are right.  I did not update the patches from the changes I had made in the
branch.

[gcc]
2016-01-12  Michael Meissner  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): Add support
for pack/unpack functions for __ibm128.
(PACK_IF): Likewise.
(UNPACK_IF): Likewise.

* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for __ibm128 pack/unpack functions.
(rs6000_invalid_builtin): Likewise.
(rs6000_init_builtins): Likewise.
(rs6000_opt_masks): Likewise.

* config/rs6000/rs6000.h (MASK_FLOAT128): Add short name.
(RS6000_BTM_FLOAT128): Add support for __ibm128 pack/unpack
functions.
(RS6000_BTM_COMMON): Likewise.

* config/rs6000/rs6000.md (f128_vsx): New mode attribute.
(unpack): Use FMOVE128_FPR iterator instead of FMOVE128, to
disallow __builtin_{pack,unpack}_longdouble if long double is IEEE
128-bit floating point.  Add support for the double values to be
in Altivec registers for TF/IF packing and unpacking, but restrict
TD packing sub-fields to be FPR registers.  Don't allow overlapped
register support for packing.  Allow pack inputs to be memory
locations.  Don't build generator functions for unpack_dm
and unpack_nodm.
(unpack_dm): Likewise.
(unpack_nodm): Likewise.
(pack): Likewise.

* config/rs6000/rs6000-builtin.def (__builtin_pack_ibm128): Add
built-in functions to pack/unpack explicit __ibm128 values.
(__builtin_unpack_ibm128): Likewise.

* doc/extend.texi (PowerPC Built-in Functions): Document
__builtin_pack_ibm128 and __builtin_unpack_ibm128.

[libgcc]
2016-01-12  Michael Meissner  
Steven Munroe 
Tulio Magno Quites Machado Filho 

* config/rs6000/sfp-exceptions.c: New file to provide exception
support for IEEE 128-bit floating point.

* config/rs6000/float128-hw.c: New file for ISA 3.0 IEEE 128-bit
floating point hardware support.

* config/rs6000/floattikf.c: New files for IEEE 128-bit floating
point conversions.
* config/rs6000/fixunskfti.c: Likewise.
* config/rs6000/fixkfti.c: Likewise.
* config/rs6000/floatuntikf.c: Likewise.
* config/rs6000/extendkftf2-sw.c: Likewise.
* config/rs6000/trunctfkf2-sw.c: Likewise.

* config/rs6000/float128-ifunc.c: New file to pick either IEEE
128-bit floating point software emulation or use ISA 3.0 hardware
support if it is available.

* config/rs6000/quad-float128.h: New file to support IEEE 128-bit
floating point.

* config/rs6000/t-float128: New Makefile fragments to enable
building __float128 emulation support.
* config/rs6000/t-float128-hw: Likewise.

* config/rs6000/float128-sed: New file to convert TF names to KF
names for PowerPC IEEE 128-bit floating point support.

* config/rs6000/sfp-machine.h (_FP_W_TYPE_SIZE): Use 64-bit types
when building on 64-bit systems, or when VSX is enabled.
(_FP_W_TYPE): Likewise.
(_FP_WS_TYPE): Likewise.
(_FP_I_TYPE): Likewise.
(TItype): Define on 64-bit systems.
(UTItype): Likewise.
(TI_BITS): Likewise.
(_FP_MUL_MEAT_D): Add support for using 64-bit types.
(_FP_MUL_MEAT_Q): Likewise.
(_FP_DIV_MEAT_D): Likewise.
(_FP_DIV_MEAT_Q): Likewise.
(_FP_NANFRAC_D): Likewise.
(_FP_NANFRAC_Q): Likewise.
(ISA_BIT): Add exception support if we are being compiled on a
machine with hardware floating point support to build the IEEE
128-bit emulation functions.
(FP_EX_INVALID): Likewise.
(FP_EX_OVERFLOW): Likewise.
(FP_EX_UNDERFLOW): Likewise.
(FP_EX_DIVZERO): Likewise.
(FP_EX_INEXACT): Likewise.
(FP_EX_ALL): Likewise.
(__sfp_handle_exceptions): Likewise.
(FP_HANDLE_EXCEPTIONS): Likewise.
(FP_RND_NEAREST): Likewise.
(FP_RND_ZERO): Likewise.
(FP_RND_PINF): Likewise.
(FP_RND_MINF): Likewise.
(FP_RND_MASK): Likewise.
(_FP_DECL_EX):

Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Martin Jambor

On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> > remap_decl (old_var, id);
> > }
> > - phase 2 - do the full remap_decls, but during that arrange that
> >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> >   decl
> 
> ...they would not be copied here because remap_decl would not be
> duplicating stuff.  So I'd end up with an original local decl when I
> actually need a duplicate.
> 

ugh, I'm trying to be too fast and obviously forgot about the
id->remapping_type_depth part of the proposed condition.

Still, when could relying solely on id->remapping_type_depth fail?

Sorry for the noise,

Martin

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor

Hi,

On Tue, Jan 12, 2016 at 02:38:15PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote:
> > GOMP_kernel_launch_attributes should not be there (it is a
> > reminiscence from before the device-specific target arguments) and
> > should be moved just to the HSA plugin.  I'll prepare a patch today.
> > 
> > While we do not have to share GOMP_hsa_kernel_dispatch, we actually do
> > use them in both the plugin and the compiler, where we only use it in
> > an offsetof, so that we only have the structure defined once.
> 
> But, even using it in offsetof might be wrong, the compiler could be a
> cross-compiler, and you'd use offsetof on the host, while you want it for
> the target, and that would be different.
> So, IMHO you need (unless you already have) built the structure as a tree
> type, lay it out, and then you can use at TYPE_SIZE_UNIT or
> DECL_FIELD_OFFSET and the like.
> 

I see. For now I have just put a FIXME there but have talked to Martin
about laying out the type properly.  This is what I have committed to
the branch.
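
To show why a host-side offsetof can disagree with the target, here is a small sketch that computes field offsets from target sizes instead.  Field, field_offsets and dispatch_prefix are hypothetical helpers, and the layout rule (natural alignment, no packing, 8-byte aligned uint64_t) is an assumption that real ABIs may not share.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one field on the *target*.
struct Field { std::size_t size; std::size_t align; };

// Offsets under the assumption of natural alignment and no packing.
inline std::vector<std::size_t> field_offsets(const std::vector<Field> &fields)
{
  std::vector<std::size_t> offsets;
  std::size_t pos = 0;
  for (const Field &f : fields)
    {
      assert(f.align != 0);
      pos = (pos + f.align - 1) / f.align * f.align;   // round up to alignment
      offsets.push_back(pos);
      pos += f.size;
    }
  return offsets;
}

// First four members of GOMP_hsa_kernel_dispatch: three pointers, then the
// uint64_t "object".
inline std::vector<Field> dispatch_prefix(std::size_t ptr_size)
{
  return { {ptr_size, ptr_size}, {ptr_size, ptr_size},
           {ptr_size, ptr_size}, {8, 8} };
}
```

With 8-byte pointers the object member lands at offset 24, with 4-byte pointers at offset 16, so an offset taken on a 64-bit host would be wrong for a 32-bit target under these assumptions.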

Thanks,

Martin

2016-01-12  Martin Jambor  

include/
* gomp-constants.h (GOMP_kernel_launch_attributes): Removed.
(GOMP_hsa_kernel_dispatch): Likewise.

libgomp/
* plugin/plugin-hsa.c (GOMP_kernel_launch_attributes): Moved here.
(GOMP_hsa_kernel_dispatch): Likewise.

gcc/
* hsa-gen.c (GOMP_hsa_kernel_dispatch): Moved here.
---
 gcc/hsa-gen.c   | 35 +
 include/gomp-constants.h| 44 --
 libgomp/plugin/plugin-hsa.c | 47 +
 3 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 1715b57..f633dfd 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3747,6 +3747,41 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   hbb->append_insn (basic);
 }
 
+/* Collection of information needed for a dispatch of a kernel from a
+   kernel.  Keep in sync with libgomp's plugin-hsa.c.
+
+   FIXME: In order to support cross-compilations, we need to lay ot the type as
+   a tree and then use field_decl positions.
+ */
+
+struct GOMP_hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Levels-var ICV.  */
+  uint64_t omp_level;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct GOMP_hsa_kernel_dispatch **children_dispatches;
+};
+
 /* Return an HSA register that will contain number of threads for
a future dispatched kernel.  Instructions are added to HBB.  */
 
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 1dae474..a8e7723 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -256,48 +256,4 @@ enum gomp_map_kind
 /* Identifiers of device-specific target arguments.  */
 #define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES  (1 << 8)
 
-/* Structure describing the run-time and grid properties of an HSA kernel
-   lauch.  */
-
-struct GOMP_kernel_launch_attributes
-{
-  /* Number of dimensions the workload has.  Maximum number is 3.  */
-  uint32_t ndim;
-  /* Size of the grid in the three respective dimensions.  */
-  uint32_t gdims[3];
-  /* Size of work-groups in the respective dimensions.  */
-  uint32_t wdims[3];
-};
-
-/* Collection of information needed for a dispatch of a kernel from a
-   kernel.  */
-
-struct GOMP_hsa_kernel_dispatch
-{
-  /* Pointer to a command queue associated with a kernel dispatch agent.  */
-  void *queue;
-  /* Pointer to reserved memory for OMP data struct copying.  */
-  void *omp_data_memory;
-  /* Pointer to a memory space used for kernel arguments passing.  */
-  void *kernarg_address;
-  /* Kernel object.  */
-  uint64_t object;
-  /* Synchronization signal used for dispatch synchronization.  */
-  uint64_t signal;
-  /* Private segment size.  */
-  uint32_t private_segment_size;
-  /* Group segment size.  */
-  uint32_t group_segment_size;
-  /* Number of children kernel dispatches.  */
-  uint64_t kernel_dispatch_count;
-  /* Number of threads.  */
-  uint32_t omp_num_threads;
-  /* Debug purpose argument.  */
-  uint64_t debug;
-  /* Levels-var ICV.  */
-  uint64_t omp_level;
-  /* Kernel dispatch structures created for

Re: [Patch, libstdc++/68877] Reimplement __is_[nothrow_]swappable

2016-01-12 Thread Jonathan Wakely


On 23/12/15 22:15 +0100, Daniel Krügler wrote:


   PR libstdc++/68877
   * include/std/type_traits: Following N4511, reimplement __is_swappable and
   __is_nothrow_swappable. Move __is_swappable to namespace std, adjust
   callers. Use __is_nothrow_swappable in swap.
   * include/bits/move.h: Use __is_nothrow_swappable in swap.
   * testsuite/20_util/is_nothrow_swappable/value.cc: Extend; remove
   __is_swappable related tests.
   * testsuite/20_util/is_swappable/value.cc: New.
   * testsuite/20_util/is_swappable/requirements/explicit_instantiation.cc:
   New.
   * testsuite/20_util/is_swappable/requirements/typedefs.cc: New.
   * testsuite/25_algorithms/swap/68877.cc: New.


Committed to trunk now, thanks!

Re: [Patch, libstdc++/68877] Reimplement __is_[nothrow_]swappable

2016-01-12 Thread Jonathan Wakely

On 12 January 2016 at 20:38, Daniel Krügler wrote:
> Ping - this is a tentative reminder for this patch proposal.

I was just about to commit it an hour ago and my machine crashed. It's done now.

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread Kyrill Tkachov


Hi all,

On 12/01/16 11:32, James Greenhalgh wrote:

On Mon, Jan 11, 2016 at 04:57:56PM -0600, Evandro Menezes wrote:

On 01/11/2016 05:53 AM, James Greenhalgh wrote:

I'd like to switch the logic around in aarch64.c such that
-mlow-precision-recip-sqrt causes us to always emit the low-precision
software expansion for reciprocal square root. I have two reasons to do
this; first is consistency across -mcpu targets, second is enabling more
-mcpu targets to use the flag for peak tuning.

I don't much like that the precision we use for -mlow-precision-recip-sqrt
differs between cores (and possibly compiler revisions). Yes, we're
under -ffast-math but I take this flag to mean the user explicitly wants the
low-precision expansion, and we should not diverge from that based on an
internal decision as to what is optimal for performance in the
high-precision case. I'd prefer to keep things as predictable as possible,
and here that means always emitting the low-precision expansion when asked.

Judging by the comments in the thread proposing the reciprocal square
root optimisation, this will benefit all cores currently supported by GCC.
To be clear, we would still not expand in the high-precision case for any
cores which do not explicitly ask for it. Currently that is Cortex-A57
and xgene, though I will be proposing a patch to remove Cortex-A57 from
that list shortly.

Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
is intended as a tuning flag for situations where performance is more
important than precision, but the current logic requires setting an
internal flag which also changes the performance characteristics where
high-precision is needed. This conflates two decisions the target might
want to make, and reduces the applicability of an option targets might
want to enable for performance. In particular, I'd still like to see
-mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
sequence for floats under Cortex-A57.

Based on that reasoning, this patch makes the appropriate change to the
logic. I've checked with the current -mcpu values to ensure that behaviour
without -mlow-precision-recip-sqrt does not change, and that behaviour
with -mlow-precision-recip-sqrt is to emit the low precision sequences.

I've also put this through bootstrap and test on aarch64-none-linux-gnu
with no issues.

OK?

Yes, it LGTM.

Thanks.


I appreciate the idea of uniformity when an option is specified,
which led me to wonder whether it would be a good idea to add an option
that would have the effect of forcing the emission of the reciprocal
square root, effectively forcing the flag
AARCH64_EXTRA_TUNE_RECIP_SQRT on, regardless of the tuning flags for
the given core.  I think that this flag would be particularly useful
when specifying flags for specific functions, irrespective of the
core.

Thoughts?

Currently you can do this using the (mostly unsupported) -moverride
mechanism as -moverride=tune=recip_sqrt from the command line.
I'm not sure how reliable using this from
__attribute__((target("override=tune=recip_sqrt"))) would be, I wrote a small
testcase that didn't work as intended, but whether that is a bug or a
design decision I'm not yet sure. I think the logic for parsing the
target attribute is set up to reapply the command-line override string
over whichever tuning options you apply through the attribute, rather than
to allow you to apply a per-function override.


As a clarification: we don't support an "override" target attribute on aarch64.
I had a patch earlier in the year to hook up the override string parsing
machinery into the attributes parsing code, but didn't end up proposing it.
IIRC the syntax of the override string (using '=' multiple times) would
needlessly complicate the parsing code for something that's not intended to
be used by regular users but rather by power users that are exploring gcc
internals.

Thanks,
Kyrill


As to whether we'd want to expose this as a fully supported,
user-visible setting, I'd rather not. Our claim is that for the
higher-precision sequences the results are close enough that we can
consider this like reassociation width or other core-specific tuning
parameters that we don't expose. What I'm hoping to avoid is a
proliferation of supported options which are not in anybody's regular
testing matrix. This one would not be so bad as it is automatically
enabled by some cores. For now I'd rather not add the option.

Thanks,
James
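
For readers who have not seen what a "software expansion" of reciprocal
square root amounts to, here is a rough sketch in C: a crude initial
estimate refined by Newton-Raphson steps of the form
x' = x * (1.5 - 0.5 * a * x * x), where a lower-precision variant simply
runs fewer steps. The names rsqrt_seed and approx_rsqrtf, the bit-trick
seed (standing in for a hardware estimate instruction) and the step counts
are assumptions made for illustration; this is not the sequence aarch64.c
emits.

```c
#include <stdint.h>
#include <string.h>

/* Crude initial estimate of 1/sqrt(a) for positive, normal A.  This is a
   stand-in for a hardware estimate instruction; hypothetical helper used
   for illustration only.  */
static float
rsqrt_seed (float a)
{
  uint32_t bits;
  float x;

  memcpy (&bits, &a, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Refine the seed with STEPS Newton-Raphson iterations.  Each iteration
   roughly squares the relative error, so dropping one step trades
   precision for a shorter instruction sequence.  */
float
approx_rsqrtf (float a, int steps)
{
  float x = rsqrt_seed (a);

  for (int i = 0; i < steps; i++)
    x = x * (1.5f - 0.5f * a * x * x);
  return x;
}
```

Calling approx_rsqrtf (4.0f, 1) and approx_rsqrtf (4.0f, 2) shows the
trade-off: both are close to 0.5, and the two-step result is the closer
one.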

Re: [PATCH, testsuite] Stabilize test result output of dump-noaddr

2016-01-12 Thread Mike Stump

On Jan 12, 2016, at 12:48 AM, Thomas Preud'homme wrote:
> This patch solves this problem by replacing the static pass number in the
> output by a star, allowing for a stable output while retaining easy
> copy/pasting in shell.

> Is this ok for stage3?

Ok.

[PATCH, i386, AVX512] PR target/69228: Restrict default masks for prefetch gathers/scatters instructions.

2016-01-12 Thread Alexander Fomin

This patch addresses PR target/69228. Expanding non-mask builtins
for prefetch gather/scatter insns results in using default mask.
Although Intel ISA Extensions Programming Reference statement about
EVEX.aaa field in prefetch gather/scatter insns encoding is a bit
opaque, no default mask is allowed for that family.

Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?

Thanks,
Alexander
---
gcc/

PR target/69228
* config/i386/sse.md (define_expand "avx512pf_gatherpfsf"):
Change first operand predicate from register_or_constm1_operand
to register_operand.
(define_expand "avx512pf_gatherpfdf"): Likewise.
(define_expand "avx512pf_scatterpfsf"): Likewise.
(define_expand "avx512pf_scatterpfdf"): Likewise.
(define_insn "*avx512pf_gatherpfsf"): Remove.
(define_insn "*avx512pf_gatherpfdf"): Likewise.
(define_insn "*avx512pf_scatterpfsf"): Likewise.
(define_insn "*avx512pf_scatterpfdf"): Likewise.
* config/i386/i386.c (ix86_expand_builtin): Remove first operand
comparison with constm1_rtx from vec_prefetch_gen part.

gcc/testsuite

PR target/69228
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Adjust.
* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Likewise.
---
 gcc/config/i386/i386.c |   5 +-
 gcc/config/i386/sse.md | 120 +
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c|   3 +-
 10 files changed, 14 insertions(+), 135 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aac0847..c37eb74 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41821,13 +41821,12 @@ rdseed_step:
 
   op0 = fixup_modeless_constant (op0, mode0);
 
-  if (GET_MODE (op0) == mode0
- || (GET_MODE (op0) == VOIDmode && op0 != constm1_rtx))
+  if (GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
{
  if (!insn_data[icode].operand[0].predicate (op0, mode0))
op0 = copy_to_mode_reg (mode0, op0);
}
-  else if (op0 != constm1_rtx)
+  else
{
  op0 = copy_to_reg (op0);
  op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 278dd38..b96be36 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15674,7 +15674,7 @@
 
 (define_expand "avx512pf_gatherpfsf"
   [(unspec
- [(match_operand: 0 "register_or_constm1_operand")
+ [(match_operand: 0 "register_operand")
   (mem:
(match_par_dup 5
  [(match_operand 2 "vsib_address_operand")
@@ -15716,37 +15716,10 @@
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpfsf"
-  [(unspec
- [(const_int -1)
-  (match_operator: 4 "vsib_mem_operator"
-   [(unspec:P
-  [(match_operand:P 1 "vsib_address_operand" "Tv")
-   (match_operand:VI48_512 0 "register_operand" "v")
-   (match_operand:SI 2 "const1248_operand" "n")]
-  UNSPEC_VSIBADDR)])
-  (match_operand:SI 3 "const_2_to_3_operand" "n")]
- UNSPEC_GATHER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-{
-case 3:
-  return "vgatherpf0ps\t{%4|%4}";
-case 2:
-  return "vgatherpf1ps\t{%4|%4}";
-default:
-  gcc_unreachable ();
-}
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 ;; Packed double variants
 (define_expand "avx512pf_gatherpfdf"
   [(unspec
- [(match_operand: 0 "register_or_constm1_operand")
+ [(match_operand: 0 "register_operand")
   (mem:V8DF
(match_par_dup 5
  [(match_operand 2 "vsib_address_operand")
@@ -15788,37 +15761,10 @@
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpfdf"
-  [(unspec
- [(const_int -1)
-  (match_operator:V8DF 4 "vsib_mem_operator"
-   [(unspec:P
-  [(match_operand:P 1 "vsib_address_operand" "Tv")
-   (match_operand:VI4_256_8_512 0 "register_operand" "v")
-   (match_operand:SI 2

Re: [PATCH] Be less conservative in process_{output,input}_constraints (PR target/65689)

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 12:30:20PM +, James Greenhalgh wrote:
> > 2015-04-08  Jakub Jelinek  
> > 
> > PR target/65689
> > * genpreds.c (struct constraint_data): Add maybe_allows_reg and
> > maybe_allows_mem bitfields.
> > (maybe_allows_none_start, maybe_allows_none_end,
> > maybe_allows_reg_start, maybe_allows_reg_end, maybe_allows_mem_start,
> > maybe_allows_mem_end): New variables.
> > (compute_maybe_allows): New function.
> > (add_constraint): Use it to initialize maybe_allows_reg and
> > maybe_allows_mem fields.
> > (choose_enum_order): Sort the non-is_register/is_const_int/is_memory/
> > is_address constraints such that those that allow neither mem nor
> > reg come first, then those that only allow reg but not mem, then
> > those that only allow mem but not reg, then the rest.
> > (write_allows_reg_mem_function): New function.
> > (write_tm_preds_h): Call it.
> > * stmt.c (parse_output_constraint, parse_input_constraint): Use
> > the generated insn_extra_constraint_allows_reg_mem function
> > instead of always setting *allows_reg = true; *allows_mem = true;
> > for unknown extra constraints.
> 
> Hi Jakub,
> 
> This applies clean to gcc-5-branch. I've bootstrapped and tested it on
> x86_64-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf
> with no problems.
> 
> Is this OK to commit to gcc-5-branch so I can close out PR 65689?

Ok, thanks.

Jakub

Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries


On 12/01/16 12:22, Richard Biener wrote:

Doesn't the same issue apply to


>unsigned int *p;
>
>static void __attribute__((noinline, noclone))
>foo (void)
>{
>   unsigned int z;
>
>   for (z = 0; z < N; ++z)
> ++(*p);
>}

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.



I didn't manage to trigger this scenario, though I could probably make 
it happen by modifying ftree-loop-im to work in one case (the load of 
the value of p) but not the other (the *p load and store).



I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.


Ok, I'll retest and commit this patch.

Thanks,
- Tom
Don't return NULL access_fns in dr_analyze_indices

2016-01-12  Tom de Vries  

	* tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.

---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 19 +++
 gcc/tree-data-ref.c|  3 +++
 libgomp/testsuite/libgomp.c/pr69110.c  | 26 ++
 3 files changed, 48 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 000..e236015
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */
+
+
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..6503012 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1023,6 +1023,9 @@ dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)
 		build_int_cst (reference_alias_ptr_type (ref), 0));
 }
 
+  if (access_fns == vNULL)
+access_fns.safe_push (integer_zero_node);
+
   DR_BASE_OBJECT (dr) = ref;
   DR_ACCESS_FNS (dr) = access_fns;
 }
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 000..0d9e5ca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+static void __attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+extern void abort (void);
+
+int
+main (void)
+{
+  foo ();
+  if (i != N)
+abort ();
+
+  return 0;
+}

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek

On Tue, Jan 12, 2016 at 02:02:16PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 12, 2016 at 01:52:01PM +0100, Marek Polacek wrote:
> > --- gcc/testsuite/g++.dg/warn/permissive-1.C
> > +++ gcc/testsuite/g++.dg/warn/permissive-1.C
> > @@ -0,0 +1,8 @@
> > +// PR c++/68979
> > +// { dg-do compile }
> > +// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
> > +
> > +enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
> > +enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
> > +enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
> > +enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target c++11 } }
> 
> Shouldn't this test be limited to
> // { dg-do compile { target int32 } }
> or better yet replace the 100 and 30 above with
> say __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 and __SIZEOF_INT__ * __CHAR_BIT__ - 2
> ?
> I'd guess that on say int16 targets, or int64 targets (if we have any at
> some point) or int128 targets this wouldn't do what you are expecting.
> { target int32 } is not exactly right, because it still assumes
> __CHAR_BIT__ == 8 and for other char sizes it could fail.

Oh yeah, forgot about those...  The following should be better.
Thanks,

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and return negated flag_permissive.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..dbcc242 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..bfaca76 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile { target int32 } }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 }; // { dg-warning "operand of shift expression" }
+enum D { DD = 10 << __SIZEOF_INT__ * __CHAR_BIT__ - 2 }; // { dg-warning "shift expression" "" { target c++11 } }

Marek
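
As a rough illustration of why the revised test derives its shift counts
from __SIZEOF_INT__ and __CHAR_BIT__ rather than hard-coding 100 and 30,
the sketch below recomputes the same conditions in plain C. The helper
names shift_count_out_of_range and left_shift_overflows are hypothetical;
the overflow check is modeled on the "(unsigned) x >> ((prec (lhs) - 1) - y)"
comment visible in the patch context and is not a copy of what
cxx_eval_check_shift_p does.

```c
/* Width of int in bits, spelled the way the revised test spells it.
   __SIZEOF_INT__ and __CHAR_BIT__ are predefined by GCC and Clang.  */
enum { INT_BITS = __SIZEOF_INT__ * __CHAR_BIT__ };

/* Nonzero if COUNT is negative or not smaller than the width of int,
   i.e. the "right operand" diagnostics would apply.  */
int
shift_count_out_of_range (int count)
{
  return count < 0 || count >= INT_BITS;
}

/* Nonzero if LHS << COUNT would be diagnosed: negative left operand,
   bad count, or bits shifted out past the sign bit.  Hypothetical
   helper for illustration only.  */
int
left_shift_overflows (int lhs, int count)
{
  if (lhs < 0 || shift_count_out_of_range (count))
    return 1;
  return ((unsigned int) lhs >> (INT_BITS - 1 - count)) > 1;
}
```

With these definitions, a count of __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 is
out of range for any int width, and 10 shifted by
__SIZEOF_INT__ * __CHAR_BIT__ - 2 overflows regardless of whether int is 16,
32 or 64 bits wide, which is the property the hard-coded constants lacked.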

[PATCH] [RTEMS] Add Cortex-M7 multilib for FPU support

2016-01-12 Thread Sebastian Huber

gcc/ChangeLog
2016-01-12  Sebastian Huber  

* config/arm/t-rtems: Add cortex-m7/fpv5-d16 multilib.
---
 gcc/config/arm/t-rtems | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/t-rtems b/gcc/config/arm/t-rtems
index 3b62181..02dcd65 100644
--- a/gcc/config/arm/t-rtems
+++ b/gcc/config/arm/t-rtems
@@ -1,7 +1,7 @@
 # Custom RTEMS multilibs for ARM
 
-MULTILIB_OPTIONS  = mbig-endian mthumb march=armv6-m/march=armv7-a/march=armv7-r/march=armv7-m mfpu=neon/mfpu=vfpv3-d16/mfpu=fpv4-sp-d16 mfloat-abi=hard
-MULTILIB_DIRNAMES = eb thumb armv6-m armv7-a armv7-r armv7-m neon vfpv3-d16 fpv4-sp-d16 hard
+MULTILIB_OPTIONS  = mbig-endian mthumb march=armv6-m/march=armv7-a/march=armv7-r/march=armv7-m/mcpu=cortex-m7 mfpu=neon/mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-d16 mfloat-abi=hard
+MULTILIB_DIRNAMES = eb thumb armv6-m armv7-a armv7-r armv7-m cortex-m7 neon vfpv3-d16 fpv4-sp-d16 fpv5-d16 hard
 
 # Enumeration of multilibs
 
@@ -16,5 +16,6 @@ MULTILIB_REQUIRED += mthumb/march=armv7-a
 MULTILIB_REQUIRED += mthumb/march=armv7-r/mfpu=vfpv3-d16/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/march=armv7-r
 MULTILIB_REQUIRED += mthumb/march=armv7-m/mfpu=fpv4-sp-d16/mfloat-abi=hard
+MULTILIB_REQUIRED += mthumb/mcpu=cortex-m7/mfpu=fpv5-d16/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/march=armv7-m
 MULTILIB_REQUIRED += mthumb
-- 
1.8.4.5

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak

On Tue, Jan 12, 2016 at 12:18 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
>> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  
>> wrote:
>> > On Mon, 11 Jan 2016, H.J. Lu wrote:
>> >
>> >> Here is the updated patch.  Joseph, is this OK?
>> >
>> > I have no objections to this patch.
>>
>> Thinking some more, it looks to me that we also have to return 2 when
>> SSE2 (SSE doubles) is not enabled.
>>
>> I'm testing following patch:
>
> That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
> whether sse2 is enabled or not, normal floating point operations will be
> performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
> Do you want to do this because some instructions might be vectorized and
> therefore end up in sse registers?  For -std=c99 that shouldn't happen,
> already the C FE would promote all the arithmetics to be done in long
> doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
> honor FLT_EVAL_METHOD and vectorized ones don't.

Eh, today is just not the day for science.

Hopefully, the logic in the patch below is correct:

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 6c63871..5b42e89 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
argc, const char **argv);
only SSE, rounding is correct; when using both SSE and the FPU,
the rounding precision is indeterminate, since either may be chosen
apparently at random.  */
-#define TARGET_FLT_EVAL_METHOD \
-  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
+#define TARGET_FLT_EVAL_METHOD \
+  (TARGET_MIX_SSE_I387 ? -1\
+   : TARGET_80387 && !(TARGET_SSE2 && TARGET_SSE_MATH) ? 2 : 0)

 /* Whether to allow x87 floating-point arithmetic on MODE (one of
SFmode, DFmode and XFmode) in the current excess precision

Uros.

Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Kirill Yukhin

Hello Jakub
On 08 Jan 21:20, Jakub Jelinek wrote:
> Hi!
> 
> This patch fixes
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> regressions that were introduced recently by fixing up the masked store check
> for misalignment.
> The problem is that for v2df/v4df/v4sf/v8sf masked stores
> ix86_expand_special_args_builtin failed to set aligned_mem and thus didn't
> set correct memory alignment.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Followed your discussion w/ HJ.
I think that the mentioned intrinsics should assume proper alignment and this
agrees with the SDM.

So, your patch is ok for main trunk.

--
Thanks, K


> 
> 2016-01-08  Jakub Jelinek  
> 
>   PR target/69198
>   * config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
>   aligned_mem is properly set for AVX512-VL floating point masked
>   stores.
> 
> --- gcc/config/i386/i386.c.jj 2016-01-08 07:31:11.0 +0100
> +++ gcc/config/i386/i386.c2016-01-08 18:16:21.030354042 +0100
> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>memory = 0;
>break;
>  case VOID_FTYPE_PV8DF_V8DF_UQI:
> +case VOID_FTYPE_PV4DF_V4DF_UQI:
> +case VOID_FTYPE_PV2DF_V2DF_UQI:
>  case VOID_FTYPE_PV16SF_V16SF_UHI:
> +case VOID_FTYPE_PV8SF_V8SF_UQI:
> +case VOID_FTYPE_PV4SF_V4SF_UQI:
>  case VOID_FTYPE_PV8DI_V8DI_UQI:
>  case VOID_FTYPE_PV4DI_V4DI_UQI:
>  case VOID_FTYPE_PV2DI_V2DI_UQI:
> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>  case VOID_FTYPE_PV16QI_V16QI_UHI:
>  case VOID_FTYPE_PV32QI_V32QI_USI:
>  case VOID_FTYPE_PV64QI_V64QI_UDI:
> -case VOID_FTYPE_PV4DF_V4DF_UQI:
> -case VOID_FTYPE_PV2DF_V2DF_UQI:
> -case VOID_FTYPE_PV8SF_V8SF_UQI:
> -case VOID_FTYPE_PV4SF_V4SF_UQI:
>nargs = 2;
>klass = store;
>/* Reserve memory operand for target.  */
> 
>   Jakub

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Yuri Rumyantsev

Andreas,

Is it OK with you if we exclude gcc.dg/ifcvt-5.c from ia64 testing, since
predication must be used there instead of conditional moves?

2016-01-12 13:07 GMT+03:00 Andreas Schwab :
> gcc.dg/ifcvt-5.c fails on ia64:
>
> From ifcvt-5.c.223r.ce1:
>
> == Pass 2 ==
>
>
> == no more changes
>
> 1 possible IF blocks searched.
> 1 IF blocks converted.
> 2 true changes made.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

Re: Backport: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2016-01-12 Thread Marcus Shawcroft

On 18 December 2015 at 12:13, James Greenhalgh  wrote:

> Looking back at the patch just before I hit commit, the 4.9 backport was
> a little different (as we still have a CANNOT_CHANGE_MODE_CLASS there).
> We can drop the aarch64-protos.h and aarch64.h changes, and we need to
> change the sense of the new check, such that we can return true for the
> case added by this patch, and false for the limited number of other safe
> cases in 4.9.
>
> Bootstrapped on aarch64-none-linux-gnu.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-12-14  James Greenhalgh  
>
> Backport from mainline.
> 2015-12-09  James Greenhalgh  
>
> PR rtl-optimization/67609
> * config/aarch64/aarch64.c
> (aarch64_cannot_change_mode_class): Don't permit word_mode
> subregs of full vector modes.
> * config/aarch64/aarch64.md (aarch64_movdi_low): Use
> zero_extract rather than truncate.
> (aarch64_movdi_high): Likewise.
>
> gcc/testsuite/
>
> 2015-12-14  James Greenhalgh  
>
> Backport from mainline.
> 2015-12-09  James Greenhalgh  
>
> PR rtl-optimization/67609
> * gcc.dg/torture/pr67609.c: New.
>

OK /Marcus

[PATCH] Fix PR69053

2016-01-12 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69053
* tree-vect-loop.c (get_initial_def_for_reduction): Properly
convert initial value for cond reductions.

* g++.dg/torture/pr69053.C: New testcase.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 232261)
--- gcc/tree-vect-loop.c(working copy)
*** get_initial_def_for_reduction (gimple *s
*** 4075,4084 
tree *elts;
int i;
bool nested_in_vect_loop = false;
-   tree init_value;
REAL_VALUE_TYPE real_init_val = dconst0;
int int_init_val = 0;
gimple *def_stmt = NULL;
  
gcc_assert (vectype);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
--- 4075,4084 
tree *elts;
int i;
bool nested_in_vect_loop = false;
REAL_VALUE_TYPE real_init_val = dconst0;
int int_init_val = 0;
gimple *def_stmt = NULL;
+   gimple_seq stmts = NULL;
  
gcc_assert (vectype);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
*** get_initial_def_for_reduction (gimple *s
*** 4107,4122 
return vect_create_destination_var (init_val, vectype);
  }
  
-   if (TREE_CONSTANT (init_val))
- {
-   if (SCALAR_FLOAT_TYPE_P (scalar_type))
- init_value = build_real (scalar_type, TREE_REAL_CST (init_val));
-   else
- init_value = build_int_cst (scalar_type, TREE_INT_CST_LOW (init_val));
- }
-   else
- init_value = init_val;
- 
switch (code)
  {
case WIDEN_SUM_EXPR:
--- 4107,4112 
*** get_initial_def_for_reduction (gimple *s
*** 4193,4199 
break;
  }
  }
!   init_def = build_vector_from_val (vectype, init_value);
break;
  
default:
--- 4183,4192 
break;
  }
  }
!   init_val = gimple_convert (&stmts, TREE_TYPE (vectype), init_val);
!   if (! gimple_seq_empty_p (stmts))
! gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
!   init_def = build_vector_from_val (vectype, init_val);
break;
  
default:
Index: gcc/testsuite/g++.dg/torture/pr69053.C
===
*** gcc/testsuite/g++.dg/torture/pr69053.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr69053.C  (working copy)
***
*** 0 
--- 1,17 
+ // { dg-do compile }
+ // { dg-additional-options "-march=core-avx2" { target x86_64-*-* i?86-*-* } }
+ struct A {
+ int *elem[1];
+ };
+ int a, d, e;
+ A *b;
+ int *c;
+ int main()
+ {
+   int *f = 0;
+   for (; e; e++)
+ if (b->elem[e])
+   f = c;
+   if (f)
+ a = d;
+ }

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Kyrill Tkachov



On 12/01/16 12:08, Jakub Jelinek wrote:

On Tue, Jan 12, 2016 at 12:04:22PM +, Kyrill Tkachov wrote:

2016-01-12  Kugan Vivekanandarajah  

* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PRMOTED

I'd like to just point at the ChangeLog typo - PRMOTED instead of PROMOTED.


Since we're on the subject of the ChangeLog...
It should also refer to the PR: PR middle-end/67714

Kyrill

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak

On Tue, Jan 12, 2016 at 1:12 PM, Uros Bizjak  wrote:
> On Tue, Jan 12, 2016 at 12:18 PM, Jakub Jelinek  wrote:
>> On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
>>> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  
>>> wrote:
>>> > On Mon, 11 Jan 2016, H.J. Lu wrote:
>>> >
>>> >> Here is the updated patch.  Joseph, is this OK?
>>> >
>>> > I have no objections to this patch.
>>>
>>> Thinking some more, it looks to me that we also have to return 2 when
>>> SSE2 (SSE doubles) is not enabled.
>>>
>>> I'm testing following patch:
>>
>> That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
>> whether sse2 is enabled or not, normal floating point operations will be
>> performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
>> Do you want to do this because some instructions might be vectorized and
>> therefore end up in sse registers?  For -std=c99 that shouldn't happen,
>> already the C FE would promote all the arithmetics to be done in long
>> doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
>> honor FLT_EVAL_METHOD and vectorized ones don't.
>
> Eh, today is just not the day for science.
>
> Hopefully, the logic in the patch below is correct:
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 6c63871..5b42e89 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
> argc, const char **argv);
> only SSE, rounding is correct; when using both SSE and the FPU,
> the rounding precision is indeterminate, since either may be chosen
> apparently at random.  */
> -#define TARGET_FLT_EVAL_METHOD \
> -  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
> +#define TARGET_FLT_EVAL_METHOD \
> +  (TARGET_MIX_SSE_I387 ? -1\
> +   : TARGET_80387 && !(TARGET_SSE2 && TARGET_SSE_MATH) ? 2 : 0)
>
>  /* Whether to allow x87 floating-point arithmetic on MODE (one of
> SFmode, DFmode and XFmode) in the current excess precision

Using this patch, SSE math won't be emitted for a simple testcase
using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:

float test (float a, float b)
{
  return a + b;
}

since we start with:

test (float a, float b)
{
  long double _2;
  long double _4;
  long double _5;
  float _6;

  :
  _2 = (long double) a_1(D);
  _4 = (long double) b_3(D);
  _5 = _2 + _4;
  _6 = (float) _5;
  return _6;
}

This is counter-intuitive, so I'd say we leave things as they are. The
situation where only floats are evaluated as floats and doubles are
evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.

Uros.
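
For reference, here is a small C sketch of what the macro under discussion
controls from the user's side. In C99, FLT_EVAL_METHOD (from <float.h>) is
0 when operations are evaluated to the range and precision of their type,
1 when float and double are evaluated as double, 2 when everything is
evaluated as long double, and -1 when indeterminate; float_t and double_t
(from <math.h>) follow that value. The function names below are made up
for the example.

```c
#include <float.h>   /* FLT_EVAL_METHOD */
#include <math.h>    /* float_t */
#include <stddef.h>  /* size_t */

/* Size that C99 says float_t should have for the current
   FLT_EVAL_METHOD, or 0 when the method is indeterminate or
   implementation-defined.  */
size_t
expected_float_t_size (void)
{
  switch (FLT_EVAL_METHOD)
    {
    case 0:
      return sizeof (float);
    case 1:
      return sizeof (double);
    case 2:
      return sizeof (long double);
    default:
      return 0;
    }
}

/* Size of the float_t the C library actually provides.  */
size_t
actual_float_t_size (void)
{
  return sizeof (float_t);
}
```

The "floats as floats, doubles as long doubles" situation described above
has no slot among the values 0, 1 and 2, which is why it cannot be
expressed through this macro.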

Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Richard Biener

On Tue, 12 Jan 2016, Tom de Vries wrote:

> On 12/01/16 12:22, Richard Biener wrote:
> > Doesn't the same issue apply to
> > 
> > > >unsigned int *p;
> > > >
> > > >static void __attribute__((noinline, noclone))
> > > >foo (void)
> > > >{
> > > >   unsigned int z;
> > > >
> > > >   for (z = 0; z < N; ++z)
> > > > ++(*p);
> > > >}
> > thus when we have a MEM_REF[p_1]?  SCEV will not analyze
> > its evolution to a POLYNOMIAL_CHREC and thus access_fns will
> > be NULL again.
> > 
> 
> I didn't manage to trigger this scenario, though I could probably make it
> happen by modifying ftree-loop-im to work in one case (the load of the value
> of p) but not the other (the *p load and store).
> 
> > I think avoiding a NULL access_fns is ok but it should be done
> > unconditionally, not only for the DECL_P case.
> 
> Ok, I'll retest and commit this patch.

Please add a comment as well.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek

On Tue, Jan 12, 2016 at 04:00:11PM +0300, Alexander Monakov wrote:
> Hello, Martin, Jakub, community,
> 
> This part of the patch:
> 
> On Mon, 7 Dec 2015, Martin Jambor wrote:
> > include/
> > * gomp-constants.h (GOMP_DEVICE_HSA): New macro.
> [snip]
> > (GOMP_kernel_launch_attributes): New type.
> > (GOMP_hsa_kernel_dispatch): New type.
> 
> is going to break build of NVPTX cross-compiler, because it uses uint32_t,
> uint64_t types like below, but those types will not be available when building
> nvptx libgcc.  gomp-constants.h is #include'd in libgcc via tm.h and
> offload.h.
> 
> Note how other files in include/ need to do a special dance with #ifdef
> HAVE_STDINT_H to include <stdint.h> and obtain uint64_t.
> 
> Shall I move the problematic structs into a separate file, gomp-types.h?

Or just move those into libgomp-plugin.h, those type definitions don't have
to be shared between the compiler and libgomp, the compiler has to duplicate
those definitions anyway, as it needs to create the IL of those types and
can't use the host structure type for that purpose.

> > diff --git a/include/gomp-constants.h b/include/gomp-constants.h
> > index dffd631..1dae474 100644
> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> [snip]
> > +/* Structure describing the run-time and grid properties of an HSA kernel
> > +   lauch.  */
> > +
> > +struct GOMP_kernel_launch_attributes
> > +{
> > +  /* Number of dimensions the workload has.  Maximum number is 3.  */
> > +  uint32_t ndim;
> > +  /* Size of the grid in the three respective dimensions.  */
> > +  uint32_t gdims[3];
> > +  /* Size of work-groups in the respective dimensions.  */
> > +  uint32_t wdims[3];
> > +};

Jakub

Re: [PATCH] Cleanup vect testsuite includes

2016-01-12 Thread Richard Biener

On Mon, Jan 11, 2016 at 3:01 PM, Alan Lawrence  wrote:
> This was an attempt to make more of the vect testsuite compilable with a
> stage-1 compiler, i.e. without standard header files like stdlib.h, to ease
> looking for differences in assembly output. (It is still necessary to comment
> out most of tree-vect.h to do this, but at least such temporary/local changes
> can be restricted to one file.)
>
> Inclusion of stdlib.h and signal.h is quite inconsistent, with some files
> explicitly declaring common functions like abort, and others #including the
> header, sometimes totally unnecessarily.
>
> I left files using malloc, calloc and free as is, though I guess the same
> treatment could be applied there.
>
> Tested (natively) on x86_64-none-linux-gnu and aarch64-none-linux-gnu.
>
> Is this OK for trunk?

Ok.

Richard.
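
For readers skimming the ChangeLog quoted below, here is a minimal sketch of what "Declare abort as 'extern', remove stdlib.h" amounts to in a test. The names check_sum and N are made up for illustration and are not taken from any of the listed files:

```c
/* Declare abort ourselves instead of #including <stdlib.h>, so the
   file compiles even when no standard headers are installed.  */
extern void abort (void);

#define N 16

/* Hypothetical test body: sum 0..N-1 and abort on a wrong result.  */
int
check_sum (void)
{
  int a[N];
  int i, sum = 0;

  for (i = 0; i < N; i++)
    a[i] = i;
  for (i = 0; i < N; i++)
    sum += a[i];

  if (sum != N * (N - 1) / 2)
    abort ();
  return sum;
}
```

The same idea presumably covers the "Change NULL to 0" entry: the test then needs no header just to get the NULL macro.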

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/fast-math-bb-slp-call-3.c: Declare functions as 'extern'
> rather than #including math.h & stdlib.h.
> * gcc.dg/vect/pr47001.c: Declare abort as 'extern', remove stdlib.h.
> * gcc.dg/vect/pr49771.c: Likewise.
> * gcc.dg/vect/vect-10-big-array.c: Likewise.
> * gcc.dg/vect/vect-neg-store-1.c: Likewise.
> * gcc.dg/vect/vect-neg-store-2.c: Likewise.
> * gcc.dg/vect/slp-37.c: Change NULL to 0, remove stdlib.h.
> * gcc.dg/vect/pr40254.c: Remove unnecessary include of stdlib.h.
> * gcc.dg/vect/pr44507.c: Likewise.
> * gcc.dg/vect/pr45902.c: Likewise.
> * gcc.dg/vect/slp-widen-mult-half.c: Likewise.
> * gcc.dg/vect/vect-117.c: Likewise.
> * gcc.dg/vect/vect-99.c: Likewise.
> * gcc.dg/vect/vect-aggressive-1.c: Likewise.
> * gcc.dg/vect/vect-cond-1.c: Likewise.
> * gcc.dg/vect/vect-cond-2.c: Likewise.
> * gcc.dg/vect/vect-cond-3.c: Likewise.
> * gcc.dg/vect/vect-cond-4.c: Likewise.
> * gcc.dg/vect/vect-mask-load-1.c: Likewise.
> * gcc.dg/vect/vect-mask-loadstore-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-half.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c: Remove unnecessary
> include of signal.h.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-16.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-16.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-17.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-2.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-3.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-4.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-5.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-6.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-7.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-9.c: Likewise.
> * gcc.dg/vect/vect-outer-5.c: Likewise.
> * gcc.dg/vect/vect-outer-6.c: Likewise.
> * gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: Remove unnecessary
> include of stdio.h.
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-3.c | 8 ++--
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c| 1 -
>  gcc/testsuite/gcc.dg/vect/pr40254.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr44507.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr45902.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr47001.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr49771.c | 3 ++-
>  gcc/testsuite/gcc.dg/vect/slp-37.c  | 5 ++---
>  gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c | 1 -
>  gcc/testsuite/gcc.dg/vect/vect-10-big-array.c   | 3 ++-
>  gcc/testsuite/gcc.dg/vect/vect-117.c

[PATCH] Fix PR69007

2016-01-12 Thread Richard Biener


The following fixes fallout by no longer overwriting detected patterns.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69007
* tree-vect-patterns.c (vect_vect_recog_func_ptrs): Move
widen_sum after dot_prod and sad.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c(revision 232261)
--- gcc/tree-vect-patterns.c(working copy)
*** struct vect_recog_func
*** 75,85 
vect_recog_func_ptr fn;
const char *name;
  };
  static vect_recog_func vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
-   { vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_over_widening_pattern, "over_widening" },
--- 75,89 
vect_recog_func_ptr fn;
const char *name;
  };
+ 
+ /* Note that ordering matters - the first pattern matching on a stmt
+is taken which means usually the more complex one needs to preceed
+the less comples onex (widen_sum only after dot_prod or sad for example).  
*/
  static vect_recog_func vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
+   { vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_over_widening_pattern, "over_widening" },
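
To illustrate why the ordering matters, here is a minimal sketch of a first-match-wins recognizer table. It is not the vectorizer's code: struct stmt, its two flags and first_match are hypothetical stand-ins, assuming only that a dot-product statement also looks like a plain widening sum to the simpler recognizer:

```c
#include <stddef.h>

/* Hypothetical stand-in for a statement: just the properties the
   two toy recognizers look at.  */
struct stmt
{
  int is_widening_sum;	/* accumulates a widened operand */
  int has_mult;		/* the operand comes from a multiply */
};

typedef int (*recog_fn) (const struct stmt *);

static int
recog_dot_prod (const struct stmt *s)
{
  return s->is_widening_sum && s->has_mult;
}

static int
recog_widen_sum (const struct stmt *s)
{
  return s->is_widening_sum;
}

struct recog_func
{
  recog_fn fn;
  const char *name;
};

/* The more specific pattern comes first, as in the patch above.  */
static const struct recog_func recog_funcs[] = {
  { recog_dot_prod, "dot_prod" },
  { recog_widen_sum, "widen_sum" },
};

/* Return the name of the first pattern that matches S, or NULL.  */
const char *
first_match (const struct stmt *s)
{
  size_t i;

  for (i = 0; i < sizeof recog_funcs / sizeof recog_funcs[0]; i++)
    if (recog_funcs[i].fn (s))
      return recog_funcs[i].name;
  return NULL;
}
```

With the two table entries swapped, a statement with both flags set would be reported as "widen_sum" and the dot-product form would never be seen.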

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Kyrill Tkachov


Hi Kugan,

On 12/01/16 06:22, kugan wrote:


When promote_function_mode and promote_ssa_mode change the sign differently, the
following is the cause of the problem in PR67714.

 _8 = fn1D.5055 ();
  f_13 = _8;

The function returns -15 and in _8 it is sign extended. In the second statement,
we say that the value is SUBREG_PROMOTED and that the promoted sign is unsigned,
which is wrong. Had the value in _8 come from anything other than a function
call, this would be correct (as it would be zero extended). The attached patch
checks for that and uses the correct promoted sign in this case.


The problem with this approach is that, when you have the following piece of
code, we can still fail. But I don't think it will ever happen. Any thoughts?


 _8 = fn1D.5055 ();
  _9 = _8;
  f_13 = _9;

This is similar to PR65932, where a sign change in PROMOTE_MODE causes a problem
for a parameter, but a different fix is needed there.
Regression tested on arm-none-linux-gnu with no new regressions. I also
bootstrapped and regression tested (on an earlier version of trunk) on
x86_64-none-linux-gnu with no new regressions. If this is OK, I will do full
testing again. Comments?

Thanks,
Kugan


gcc/ChangeLog:

2016-01-12  Kugan Vivekanandarajah  

* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PROMOTED
for SSA_NAME when rhs has a value returned from function call.



Thanks for working on this.
I'll leave it to others to comment on this part as I'm not overly familiar with
that area but...


gcc/testsuite/ChangeLog:

2016-01-12  Kugan Vivekanandarajah  

* gcc.target/arm/pr67714.c: New test.


This test doesn't contain any arm-specific code so can you please put it in 
gcc.c-torture/execute/

Thanks,
Kyrill
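
For reference, the difference between the two extensions of the -15 discussed above can be sketched in plain C. The helper names are made up for illustration; this only shows the arithmetic, not what expand does:

```c
/* Widen the low 8 bits of BITS as a signed value: the top bit is
   replicated into the upper bits of the result.  */
int
sign_extend8 (unsigned char bits)
{
  return (bits ^ 0x80) - 0x80;
}

/* Widen the same 8 bits as an unsigned value: the upper bits of the
   result are zero.  */
int
zero_extend8 (unsigned char bits)
{
  return bits;
}
```

The byte 0xF1 is -15 when sign extended and 241 when zero extended, so treating a sign-extended register as if it had been zero extended gives a different value.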

Re: [PATCH] Be less conservative in process_{output,input}_constraints (PR target/65689)

2016-01-12 Thread James Greenhalgh

On Wed, Apr 08, 2015 at 11:00:59PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Right now, for constraints that are neither hardcoded in stmt.c nor defined
> via define_{register,address,memory}_constraint, stmt.c just assumes the
> constraint might allow both reg and mem.  Unfortunately, for some
> constraints which clearly can't allow either of those, this leads to errors
> at -O0, because the expander doesn't try so hard to expand it as
> EXPAND_INITIALIZER.
> 
> The following patch is an attempt to handle at least the easy cases
> - define_constraint like:
> (define_constraint "S"
>   "A constraint that matches an absolute symbolic address."
>   (and (match_code "const,symbol_ref,label_ref")
>(match_test "aarch64_symbolic_address_p (op)")))
> where the match_code clearly proves that it never can match any REG/SUBREG,
> nor MEM, by teaching genpreds.c to emit an extra inline function that
> stmt.c can in process_{output,input}_constraint use for the unknown
> constraints.
> 
> On x86_64/i686 this only detects constraint G as not allowing reg nor mem
> (it is match_code const_double), and V (plus < and >, but those are
> hardcoded in stmt.c already) that it allows mem but not reg.
> On aarch64, in the first category it detects several constraints.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2015-04-08  Jakub Jelinek  
> 
>   PR target/65689
>   * genpreds.c (struct constraint_data): Add maybe_allows_reg and
>   maybe_allows_mem bitfields.
>   (maybe_allows_none_start, maybe_allows_none_end,
>   maybe_allows_reg_start, maybe_allows_reg_end, maybe_allows_mem_start,
>   maybe_allows_mem_end): New variables.
>   (compute_maybe_allows): New function.
>   (add_constraint): Use it to initialize maybe_allows_reg and
>   maybe_allows_mem fields.
>   (choose_enum_order): Sort the non-is_register/is_const_int/is_memory/
>   is_address constraints such that those that allow neither mem nor
>   reg come first, then those that only allow reg but not mem, then
>   those that only allow mem but not reg, then the rest.
>   (write_allows_reg_mem_function): New function.
>   (write_tm_preds_h): Call it.
>   * stmt.c (parse_output_constraint, parse_input_constraint): Use
>   the generated insn_extra_constraint_allows_reg_mem function
>   instead of always setting *allows_reg = true; *allows_mem = true;
>   for unknown extra constraints.

Hi Jakub,

This applies cleanly to gcc-5-branch. I've bootstrapped and tested it on
x86_64-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf
with no problems.

Is this OK to commit to gcc-5-branch so I can close out PR 65689?

Thanks,
James
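
As a rough illustration of the idea in the quoted patch description, a flat match_code list such as "const,symbol_ref,label_ref" can be scanned for codes that could be a register or memory. This is only a sketch: may_allow_reg, may_allow_mem and has_code are hypothetical names, and the real genpreds.c logic has to handle full constraint expressions, not just one comma-separated string:

```c
#include <string.h>

/* Return 1 if the comma-separated list CODES contains exactly the
   token NAME, 0 otherwise.  */
static int
has_code (const char *codes, const char *name)
{
  size_t len = strlen (name);
  const char *p = codes;

  while (*p)
    {
      const char *end = strchr (p, ',');
      size_t tok_len = end ? (size_t) (end - p) : strlen (p);

      if (tok_len == len && strncmp (p, name, len) == 0)
	return 1;
      if (!end)
	break;
      p = end + 1;
    }
  return 0;
}

/* A constraint limited to CODES can only match a register if the
   list names reg or subreg.  */
int
may_allow_reg (const char *codes)
{
  return has_code (codes, "reg") || has_code (codes, "subreg");
}

/* Likewise for memory.  */
int
may_allow_mem (const char *codes)
{
  return has_code (codes, "mem");
}
```

For the aarch64 "S" constraint quoted above both helpers return 0, which matches the claim that it can never match REG/SUBREG or MEM.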

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Alexander Monakov

Hello, Martin, Jakub, community,

This part of the patch:

On Mon, 7 Dec 2015, Martin Jambor wrote:
> include/
>   * gomp-constants.h (GOMP_DEVICE_HSA): New macro.
[snip]
>   (GOMP_kernel_launch_attributes): New type.
>   (GOMP_hsa_kernel_dispatch): New type.

is going to break build of NVPTX cross-compiler, because it uses uint32_t,
uint64_t types like below, but those types will not be available when building
nvptx libgcc.  gomp-constants.h is #include'd in libgcc via tm.h and
offload.h.

Note how other files in include/ need to do a special dance with #ifdef
HAVE_STDINT_H to include <stdint.h> and obtain uint64_t.

Shall I move the problematic structs into a separate file, gomp-types.h?

Thanks.
Alexander

> diff --git a/include/gomp-constants.h b/include/gomp-constants.h
> index dffd631..1dae474 100644
> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
[snip]
> +/* Structure describing the run-time and grid properties of an HSA kernel
> +   lauch.  */
> +
> +struct GOMP_kernel_launch_attributes
> +{
> +  /* Number of dimensions the workload has.  Maximum number is 3.  */
> +  uint32_t ndim;
> +  /* Size of the grid in the three respective dimensions.  */
> +  uint32_t gdims[3];
> +  /* Size of work-groups in the respective dimensions.  */
> +  uint32_t wdims[3];
> +};
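
For illustration, the "special dance" could look roughly like the sketch below. It is not copied from any file in include/: the type name u32_sketch, the struct name and the fallback to unsigned int are assumptions made for the example, and HAVE_STDINT_H is assumed to come from configure:

```c
/* Only pull in <stdint.h> when configure found it.  */
#ifdef HAVE_STDINT_H
#include <stdint.h>
typedef uint32_t u32_sketch;
#else
/* Fallback assumption: unsigned int is 32 bits wide on this target.  */
typedef unsigned int u32_sketch;
#endif

/* Same layout as the quoted struct, using the guarded typedef.  */
struct launch_attrs_sketch
{
  u32_sketch ndim;	/* number of dimensions, at most 3 */
  u32_sketch gdims[3];	/* grid size per dimension */
  u32_sketch wdims[3];	/* work-group size per dimension */
};

/* Check the "maximum number is 3" rule from the quoted comment.  */
int
ndim_ok (const struct launch_attrs_sketch *a)
{
  return a->ndim <= 3;
}
```

Whether such a guard belongs in gomp-constants.h at all, or the structs should simply move to another header, is the question raised in this thread.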

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak

On Tue, Jan 12, 2016 at 1:43 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 01:32:05PM +0100, Uros Bizjak wrote:
>> Using this patch, SSE math won't be emitted for a simple testcase
>> using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:
>>
>> float test (float a, float b)
>> {
>>   return a + b;
>> }
>>
>> since we start with:
>>
>> test (float a, float b)
>> {
>>   long double _2;
>>   long double _4;
>>   long double _5;
>>   float _6;
>>
>>   :
>>   _2 = (long double) a_1(D);
>>   _4 = (long double) b_3(D);
>>   _5 = _2 + _4;
>>   _6 = (float) _5;
>>   return _6;
>> }
>>
>> This is counter-intuitive, so I'd say we leave things as they are. The
>> situation where only floats are evaluated as floats and doubles are
>> evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.
>
> Well, for the -fexcess-precision=standard case (== -std=c99) FLT_EVAL_METHOD
> 2 doesn't hurt, that forces in the FE long double computation.  While if it
> is 0 with -msse -mfpmath=sse, it means that the FE leaves computations as is
> and they are computed in float precision for floats and in long double
> precision for doubles.  For -fexcess-precision=fast it is different, because
> the FE doesn't do anything, so in the end it is mixed in that case.
> So, for -msse -mfpmath=sse, I think either we need FLT_EVAL_METHOD 2 or -1
> or 2 for -fexcess-precision=standard and -1 for -fexcess-precision=fast.

I think that following definition describes -msse -mfpmath=sse
situation in the most elegant way. We can just declare that the
precision is not known in this case:

#define TARGET_FLT_EVAL_METHOD \
  (TARGET_MIX_SSE_I387 ? -1 \
   : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : TARGET_SSE2 ? 0 : -1)

Using this patch, the compiler will still generate SSE instructions
for the above test.

Joseph, what is your opinion on this approach?

Uros.
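
To make the proposed definition easier to read, here is the same conditional chain as a plain function. The parameters are hypothetical stand-ins for the target macros, passed as ints; how each command-line option maps onto them is not shown:

```c
/* Mirror of the proposed TARGET_FLT_EVAL_METHOD expression.
   MIX_SSE_I387, HAS_80387, SSE_MATH and HAS_SSE2 stand in for
   TARGET_MIX_SSE_I387, TARGET_80387, TARGET_SSE_MATH and
   TARGET_SSE2.  */
int
flt_eval_method (int mix_sse_i387, int has_80387, int sse_math,
		 int has_sse2)
{
  return (mix_sse_i387 ? -1
	  : (has_80387 && !sse_math) ? 2
	  : has_sse2 ? 0
	  : -1);
}
```

With 387 math only the result is 2, with SSE math and SSE2 it is 0, and with SSE math but no SSE2 (the -msse -mfpmath=sse case discussed above) it is -1, i.e. indeterminate.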

Re: [Patch, libstdc++/68877] Reimplement __is_[nothrow_]swappable

2016-01-12 Thread Daniel Krügler

Ping - this is a tentative reminder for this patch proposal.

2015-12-23 22:15 GMT+01:00 Daniel Krügler :
> This is a second try for a patch for libstdc++ bug 68877. See below
> for responses.
>
> 2015-12-22 22:42 GMT+01:00 Jonathan Wakely :
>> On 21/12/15 12:45 +0100, Daniel Krügler wrote:
>>>
>>> 2015-12-14 21:48 GMT+01:00 Daniel Krügler :

 This is a reimplementation of __is_swappable and
 __is_nothrow_swappable according to

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4511.html

 and also adds a missing usage of __is_nothrow_swappable in the swap
 overload for arrays. Strictly speaking the latter change differs from
 the Standard specification which requires the expression
 noexcept(swap(*a, *b)) to be used. On the other hand the Standard is
 broken in this regard, as pointed out by

 http://cplusplus.github.io/LWG/lwg-active.html#2554
>>
>> The patch doesn't apply cleanly because it repeats some of the new
>> files either twice or three times (and also has some trailing
>> whitespace that shouldn't be there).
>
> I can confirm this, albeit I don't understand why this happens. I'm
> using TortoiseSVN and when trying to create a patch file it creates
> double entries for new directories. I have now explicitly removed the
> added directories from the patch, I hope that your patch experience is
> now better.
>
>> After fixing the patch to only
>> create new files once it applies, but then I get some FAILs:
>>
>> FAIL: 20_util/is_nothrow_swappable/value.cc (test for excess errors)
>> FAIL: 20_util/is_swappable/value.cc (test for excess errors)
>>
>> I don't have time to analyse these today, so I'll wait until you're
>> able to do so.
>
> I'm sorry for these errors. I could now find a way to reproduce the
> tests and found that they were partially due to an incomplete commit
> and partially because of sleepiness on my side. I hopefully fixed
> these blatant errors and took the chance to increase the test cases
> even further.
>
> Thanks again,
>
> - Daniel



-- 



Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread H.J. Lu

On Tue, Jan 12, 2016 at 5:45 AM, Uros Bizjak  wrote:
> On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek  wrote:
>> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>>> with
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>>
>>> which supersedes:
>>>
>>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
>>>
>>> OK to backport Jakub's and my patch for GCC 5?
>>
>> I think I'd prefer just r231269 and my patch for the branch, to make the
>> changes as small as possible, leave the cleanup on the trunk only.
>> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.
>
> I agree with Jakub.
>
> Those two patches are OK for backport.
>

This is what I checked in.

Thanks.


-- 
H.J.
From e6a6fd4b2fb4bb239fed4de6f9374f9b102e9c0f Mon Sep 17 00:00:00 2001
From: ienkovich 
Date: Fri, 4 Dec 2015 14:18:58 +
Subject: [PATCH] Fix alignment check in AVX-512 masked store

	Backport from mainline
	2016-01-12  Jakub Jelinek  

	PR target/69198
	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
	aligned_mem is properly set for AVX512-VL floating point masked
	stores.

	2015-12-04  Ilya Enkovich  

	* config/i386/sse.md (_store_mask): Fix
	operand checked for alignment.
---
 gcc/ChangeLog  | 15 +++
 gcc/config/i386/i386.c |  8 
 gcc/config/i386/sse.md |  2 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d7bc6a2..be24722 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-01-12  H.J. Lu  
+
+	Backport from mainline
+	2016-01-12  Jakub Jelinek  
+
+	PR target/69198
+	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
+	aligned_mem is properly set for AVX512-VL floating point masked
+	stores.
+
+	2015-12-04  Ilya Enkovich  
+
+	* config/i386/sse.md (_store_mask): Fix
+	operand checked for alignment.
+
 2016-01-12  James Greenhalgh  
 
 	Backport from mainline r222186.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3547ba6..b0c301b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -38259,7 +38259,11 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
   memory = 0;
   break;
 case VOID_FTYPE_PV8DF_V8DF_QI:
+case VOID_FTYPE_PV4DF_V4DF_QI:
+case VOID_FTYPE_PV2DF_V2DF_QI:
 case VOID_FTYPE_PV16SF_V16SF_HI:
+case VOID_FTYPE_PV8SF_V8SF_QI:
+case VOID_FTYPE_PV4SF_V4SF_QI:
 case VOID_FTYPE_PV8DI_V8DI_QI:
 case VOID_FTYPE_PV4DI_V4DI_QI:
 case VOID_FTYPE_PV2DI_V2DI_QI:
@@ -38319,10 +38323,6 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 case VOID_FTYPE_PV16QI_V16QI_HI:
 case VOID_FTYPE_PV32QI_V32QI_SI:
 case VOID_FTYPE_PV64QI_V64QI_DI:
-case VOID_FTYPE_PV4DF_V4DF_QI:
-case VOID_FTYPE_PV2DF_V2DF_QI:
-case VOID_FTYPE_PV8SF_V8SF_QI:
-case VOID_FTYPE_PV4SF_V4SF_QI:
   nargs = 2;
   klass = store;
   /* Reserve memory operand for target.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9235753..15d7188 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1022,7 +1022,7 @@
   sse_suffix = "";
 }
 
-  if (misaligned_operand (operands[1], mode))
+  if (misaligned_operand (operands[0], mode))
 align = "u";
   else
 align = "a";
-- 
2.5.0

Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread David Edelsohn

On Tue, Jan 12, 2016 at 11:53 AM, Richard Henderson  wrote:
> The problem in this PR is that we never got around to fleshing out the vector
> support for transactions for anything but x86.  My goal here is to make this
> as generic as possible, so that it should Just Work with existing vector
> support in the backend.
>
> In addition, if I encounter other unexpected register types, I will now copy
> them to memory and use memcpy, rather than crash.
>
> The one piece of this that requires a tiny bit of extra work is enabling the
> vector entry points in libitm.
>
> For x86, we make sure to build the files with SSE or AVX support enabled.  For
> s390x, I do the same thing, enabling z13 support.  I suppose we might need to
> check for binutils support, but I'd rather do this only if necessary.
>
> For arm I'm less sure what to do, since I seem to recall that use of Neon sets
> a bit in the ELF header.  Which presumably means that the binary could no
> longer be run without neon, even though the entry points wouldn't be used.
>
> For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
> or indeed if that's the best thing to do.

VSX is an extension of Altivec (VMX) -- VSX always includes Altivec.
If VSX is enabled, Altivec will be enabled and available.

Thanks, David

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2016-01-12 Thread Richard Biener

On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:
> On 01/11/2016 03:32 AM, Richard Biener wrote:
>
>>
>> Yeah, reassoc is largely about canonicalization.
>>
>>> Plus doing it in TER is almost certainly more complex than getting it
>>> right
>>> in reassoc to begin with.
>>
>>
>> I guess canonicalizing differently is ok but you'll still create
>> ((a & b) & 1) & c then if you only change the above place.
>
> What's best for that expression would depend on factors like whether or not
> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more parallelism
> while (((a & b) & c) & 1) is not good for parallelism, but does expose the
> bit test.
>
> reassoc currently generates ((a & 1) & b) & c which is dreadful as there's
> no ILP or chance of creating a bit test.  My patch shuffles things around,
> but still doesn't expose the ILP or bit test in the 4 operand case.  Based
> on the comments in reassoc, it didn't seem like the author thought anything
> beyond the 3-operand case was worth handling. So my patch just handles the
> 3-operand case.
>
>
>
>>
>> So I'm not sure what pattern the backend is looking for?
>
> It just wants the constant last in the sequence.  That exposes bit clear,
> set, flip, test, etc idioms.

But those don't feed another bit operation, right?  Thus we'd like to see
((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
are designed to feed conditionals (aka CC consuming ops)?

Richard.

>
>
> Jeff
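
The three shapes discussed in this thread can be written out as follows. This is just an illustration in C of the associations being compared (the function names are made up); all three compute the same value, and the only difference is the order of operations that later passes get to see:

```c
/* Constant applied first: the shape reassoc is said to produce today.  */
unsigned int
and_const_first (unsigned int a, unsigned int b, unsigned int c)
{
  return ((a & 1u) & b) & c;
}

/* Constant applied last: a serial chain, but the final "& 1" is
   visible as a bit test.  */
unsigned int
and_const_last (unsigned int a, unsigned int b, unsigned int c)
{
  return ((a & b) & c) & 1u;
}

/* Balanced form: the two inner ANDs are independent of each other,
   which exposes more ILP but hides the bit test.  */
unsigned int
and_balanced (unsigned int a, unsigned int b, unsigned int c)
{
  return (a & b) & (1u & c);
}
```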

[RFC] non-unit stride loads for size power of 2.

2016-01-12 Thread Kumar, Venkataramanan

Hi 

From the code below, it looks like we always call vect_permute_load_chain to load
non-unit strides whose size is a power of 2.

(---snip---)
/* If reassociation width for vector type is 2 or greater target machine can
 execute 2 or more vector instructions in parallel.  Otherwise try to
 get chain for loads group using vect_shift_permute_load_chain.  */
  mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
  
  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
  || exact_log2 (size) != -1
  || !vect_shift_permute_load_chain (dr_chain, size, stmt,
                                     gsi, &result_chain))
    vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);

static bool
vect_shift_permute_load_chain (vec dr_chain,
   unsigned int length,
   gimple *stmt,
   gimple_stmt_iterator *gsi,
   vec *result_chain)
{
…...
…...
  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)  ⇐ This is not used.
{
  unsigned int j, log_length = exact_log2 (length);
  for (i = 0; i < nelt / 2; ++i)
sel[i] = i * 2;
  for (i = 0; i < nelt / 2; ++i)
sel[nelt / 2 + i] = i * 2 + 1; 
(---snip--)


Is there any reason to do so? 

I have not done any benchmarking, but I tried simple test cases for -mavx
targets with sizes 2 and 4 and VF > 4 (short/char types).
It looks like using vect_shift_permute_load_chain is better.

Should we change it to something like this?

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec 
dr_chain, int size,
  get chain for loads group using vect_shift_permute_load_chain.  */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-  || exact_log2 (size) != -1
-  || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-gsi, _chain))
+  || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+gsi, _chain)
+ && exact_log2 (size) != -1))
 vect_permute_load_chain (dr_chain, size, stmt, gsi, _chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();
 
regards,
Venkat.
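
For readers who want to see what the two quoted loops compute, here is a small standalone sketch. build_even_odd_sel and exact_log2_sketch are stand-in names written for this note, not the GCC functions; the second one only assumes the behaviour relied on above, namely that -1 is returned when the argument is not a power of 2:

```c
/* Fill SEL (NELT entries, NELT even) with the even element indices
   followed by the odd ones, as in the quoted loops.  */
void
build_even_odd_sel (unsigned char *sel, unsigned int nelt)
{
  unsigned int i;

  for (i = 0; i < nelt / 2; ++i)
    sel[i] = i * 2;
  for (i = 0; i < nelt / 2; ++i)
    sel[nelt / 2 + i] = i * 2 + 1;
}

/* Return log2 of X if X is a power of 2, otherwise -1.  */
int
exact_log2_sketch (unsigned int x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x >>= 1)
    n++;
  return n;
}
```

For nelt == 8 the selector is 0, 2, 4, 6, 1, 3, 5, 7, i.e. a permutation that gathers the even-indexed elements into the low half and the odd-indexed ones into the high half.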

[patch] libstdc++/69222 Prevent recursive instantiation in std::function

2016-01-12 Thread Jonathan Wakely


This fixes PR 69222 and PR 69005 for gcc-5-branch, by ensuring we
don't try to determine the result of invoking the function(Functor)
constructor argument when the type is incomplete (because that might
require instantiating the constructor again, which recurses).

Jason fixed 69005 on trunk by making the front end skip that
constructor when performing overload resolution for copy construction,
because it cannot be instantiated to make a copy, but there is still a
problem on the branch, so I'm fixing it in the library. I'm making the
same change on trunk, because it's an improvement anyway.

Tested x86_64-linux, committed to trunk and gcc-5-branch.

commit 540303f8e8f24a89ecd3698c70efe9ab753ef9d9
Author: redi 
Date:   Tue Jan 12 14:55:00 2016 +

Prevent recursive instantiation in std::function

	PR libstdc++/69005
	PR libstdc++/69222
	* include/std/functional (function::_Invoke): Remove, use result_of.
	(function::_Callable): Replace alias template with class template
	and use partial specialization instead of _NotSelf alias template.
	(function(_Functor)): Add "not self" constraint so that _Callable is
	not used while type is incomplete.
	* testsuite/20_util/function/69222.cc: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-5-branch@232274 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 139be61..717d1bf 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1977,19 +1977,14 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 {
   typedef _Res _Signature_type(_ArgTypes...);
 
-  template
-	using _Invoke = decltype(__callable_functor(std::declval<_Functor&>())
- (std::declval<_ArgTypes>()...) );
+  template::type>
+	struct _Callable : __check_func_return_type<_Res2, _Res> { };
 
   // Used so the return type convertibility checks aren't done when
   // performing overload resolution for copy construction/assignment.
   template
-	using _NotSelf = __not_>;
-
-  template
-	using _Callable
-	  = __and_<_NotSelf<_Functor>,
-		   __check_func_return_type<_Invoke<_Functor>, _Res>>;
+	struct _Callable : false_type { };
 
   template
 	using _Requires = typename enable_if<_Cond::value, _Tp>::type;
@@ -2054,6 +2049,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
*  reference_wrapper, this function will not throw.
*/
   template>, void>,
 	   typename = _Requires<_Callable<_Functor>, void>>
 	function(_Functor);
 
@@ -2246,7 +2242,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 }
 
   template
-template
+template
   function<_Res(_ArgTypes...)>::
   function(_Functor __f)
   : _Function_base()
diff --git a/libstdc++-v3/testsuite/20_util/function/69222.cc b/libstdc++-v3/testsuite/20_util/function/69222.cc
new file mode 100644
index 000..7c9dfec
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function/69222.cc
@@ -0,0 +1,30 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include 
+
+// Reduced from c++/69005
+struct Foo {
+  std::function f;
+};
+
+extern Foo exfoo;
+Foo f(exfoo);
+Foo& r = f = exfoo;

Re: [PATCH] remove mark_hook gty attribute

2016-01-12 Thread Richard Biener

On Mon, Jan 11, 2016 at 11:54 PM,   wrote:
> From: Trevor Saunders 
>
> Hi,
>
> this hardly counts as a bug fix, but going through open bugs I saw PR54809,
> and realized we don't actually need this attribute any more, so we might as
> well just remove it.
>
> bootstrapped + regtested on x86_64-linux-gnu, ok for now or gcc 7?  I don't
> mind waiting, but it would be nice to have one less thing to remember to do.

Ok.

Richard.

> Trev
>
> gcc/ChangeLog:
>
> 2016-01-11  Trevor Saunders  
>
> PR middle-end/54809
> * doc/gty.texi: Remove documentation of mark_hook.
> * gengtype.c (struct write_types_data): Remove code to support
> mark_hook attribute.
> (walk_type): Likewise.
> (write_func_for_structure): Likewise.
> ---
>  gcc/doc/gty.texi | 10 --
>  gcc/gengtype.c   | 33 +++--
>  2 files changed, 3 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
> index d3ca4e0..1a22e4b 100644
> --- a/gcc/doc/gty.texi
> +++ b/gcc/doc/gty.texi
> @@ -261,16 +261,6 @@ garbage collection runs, there's no need to mark 
> anything pointed to
>  by this variable, it can just be set to @code{NULL} instead.  This is used
>  to keep a list of free structures around for re-use.
>
> -@findex mark_hook
> -@item mark_hook ("@var{hook-routine-name}")
> -
> -If provided for a structure or union type, the given
> -@var{hook-routine-name} (between double-quotes) is the name of a
> -routine called when the garbage collector has just marked the data as
> -reachable. This routine should not change the data, or call any ggc
> -routine. Its only argument is a pointer to the just marked (const)
> -structure or union.
> -
>  @findex maybe_undef
>  @item maybe_undef
>
> diff --git a/gcc/gengtype.c b/gcc/gengtype.c
> index 966e597..be49660 100644
> --- a/gcc/gengtype.c
> +++ b/gcc/gengtype.c
> @@ -2407,7 +2407,6 @@ struct write_types_data
>const char *marker_routine;
>const char *reorder_note_routine;
>const char *comment;
> -  int skip_hooks;  /* skip hook generation if non zero */
>enum write_types_kinds kind;
>  };
>
> @@ -2677,8 +2676,6 @@ walk_type (type_p t, struct walk_type_data *d)
>maybe_undef_p = 1;
>  else if (strcmp (oo->name, "desc") == 0 && oo->kind == OPTION_STRING)
>desc = oo->info.string;
> -else if (strcmp (oo->name, "mark_hook") == 0)
> -  ;
>  else if (strcmp (oo->name, "nested_ptr") == 0
>  && oo->kind == OPTION_NESTED)
>nested_ptr_d = (const struct nested_ptr_data *) oo->info.nested;
> @@ -2918,7 +2915,6 @@ walk_type (type_p t, struct walk_type_data *d)
> const char *oldval = d->val;
> const char *oldprevval1 = d->prev_val[1];
> const char *oldprevval2 = d->prev_val[2];
> -   const char *struct_mark_hook = NULL;
> const int union_p = t->kind == TYPE_UNION;
> int seen_default_p = 0;
> options_p o;
> @@ -2942,13 +2938,6 @@ walk_type (type_p t, struct walk_type_data *d)
>   if (!desc && strcmp (o->name, "desc") == 0
>   && o->kind == OPTION_STRING)
> desc = o->info.string;
> - else if (!struct_mark_hook && strcmp (o->name, "mark_hook") == 0
> -  && o->kind == OPTION_STRING)
> -   struct_mark_hook = o->info.string;
> -
> -   if (struct_mark_hook)
> - oprintf (d->of, "%*s%s (&%s);\n",
> -  d->indent, "", struct_mark_hook, oldval);
>
> d->prev_val[2] = oldval;
> d->prev_val[1] = oldprevval2;
> @@ -3473,7 +3462,6 @@ write_func_for_structure (type_p orig_s, type_p s,
>const char *chain_next = NULL;
>const char *chain_prev = NULL;
>const char *chain_circular = NULL;
> -  const char *mark_hook_name = NULL;
>options_p opt;
>struct walk_type_data d;
>
> @@ -3509,9 +3497,6 @@ write_func_for_structure (type_p orig_s, type_p s,
>  else if (strcmp (opt->name, "chain_circular") == 0
>  && opt->kind == OPTION_STRING)
>chain_circular = opt->info.string;
> -else if (strcmp (opt->name, "mark_hook") == 0
> -&& opt->kind == OPTION_STRING)
> -  mark_hook_name = opt->info.string;
>  else if (strcmp (opt->name, "for_user") == 0)
>for_user = true;
>if (chain_prev != NULL && chain_next == NULL)
> @@ -3576,17 +3561,11 @@ write_func_for_structure (type_p orig_s, type_p s,
>oprintf (d.of, "))\n");
>if (chain_circular != NULL)
> oprintf (d.of, "return;\n  do\n");
> -  if (mark_hook_name && !wtd->skip_hooks)
> -   {
> - oprintf (d.of, "{\n");
> - oprintf (d.of, "  %s (xlimit);\n   ", mark_hook_name);
> -   }
> +
>oprintf (d.of, "   xlimit = (");
>d.prev_val[2] = "*xlimit";
>output_escaped_param (&d, chain_next, "chain_next");
>oprintf (d.of, ");\n");
> -

[PATCH][committed] libitm: Remove dead code and data.

2016-01-12 Thread Torvald Riegel

This removes code and data members that have not been used for quite a
while now.  The user-visible benefit is 8MB less space overhead if
libitm is used.

Tested on x86_64-linux and committed as r232275.


2016-01-12  Torvald Riegel  

* libitm_i.h (gtm_mask_stack): Remove.
* beginend.cc (gtm_stmlock_array, gtm_clock): Likewise.
* stmlock.h: Remove file.
* config/alpha/cacheline.h: Likewise.
* config/generic/cacheline.h: Likewise.
* config/powerpc/cacheline.h: Likewise.
* config/sparc/cacheline.h: Likewise.
* config/x86/cacheline.h: Likewise.

commit fe0abed5782347922d4f9dba13b9a917fe9d5296
Author: Torvald Riegel 
Date:   Mon Jan 11 19:30:14 2016 +0100

libitm: Remove dead code and data.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 367edc8..c801dab 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -36,9 +36,6 @@ gtm_rwlock GTM::gtm_thread::serial_lock;
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
-gtm_stmlock GTM::gtm_stmlock_array[LOCK_ARRAY_SIZE];
-atomic GTM::gtm_clock;
-
 /* ??? Move elsewhere when we figure out library initialization.  */
 uint64_t GTM::gtm_spin_count_var = 1000;
 
diff --git a/libitm/config/alpha/cacheline.h b/libitm/config/alpha/cacheline.h
deleted file mode 100644
index c8da46d..000
--- a/libitm/config/alpha/cacheline.h
+++ /dev/null
@@ -1,38 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson .
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-#ifndef LIBITM_ALPHA_CACHELINE_H
-#define LIBITM_ALPHA_CACHELINE_H 1
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for efficient manipulation of the
-// modification mask, below.
-#define CACHELINE_SIZE 64
-
-#include "config/generic/cacheline.h"
-
-#endif // LIBITM_ALPHA_CACHELINE_H
diff --git a/libitm/config/generic/cacheline.h b/libitm/config/generic/cacheline.h
deleted file mode 100644
index 8b9f927..000
--- a/libitm/config/generic/cacheline.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson .
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-#ifndef LIBITM_CACHELINE_H
-#define LIBITM_CACHELINE_H 1
-
-namespace GTM HIDDEN {
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for efficient manipulation of the
-// modification

Re: [PATCH] OpenACC documentation for libgomp

2016-01-12 Thread James Norris


Bernd,

On 01/11/2016 11:23 AM, Bernd Schmidt wrote:

On 01/05/2016 04:47 PM, James Norris wrote:

I've updated the original patch after some very helpful
comments from Sandra (thank you, thank you).

OK to commit to trunk?


I'm probably not fully qualified to review the contents either, but few people
are and it looks reasonable enough that I guess I'll just ack it. Before that,
some questions though:


+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{function acc_async_test(arg);}
+@item   @tab @code{integer(kind=acc_handle_kind) arg}
+@item   @tab @code{logical acc_async_test}
+@end multitable


I guess this is how Fortran functions and their args/return values are
documented? Do we have other examples of this somewhere?


Yes, in the earlier section that describes OpenMP. One thing
that needs changing is 'Prototype' should be changed to 'Interface'
for Fortran.


+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the “The OpenACC
+Application Programming Interface”, Version 2.0, June, 2013.}.


Non-ascii characters. I'm guessing this should probably be some kind of texinfo
@something{} block; OTOH references to C standards in standards.texi just name
them in plain text.


As Jakub pointed out in a followup, those instances should
use @uref rather than double quotes.



I wonder if things like OpenMP and OpenACC should be mentioned in
standards.texi, but that is tangential to this patch.



That's a good idea. Thanks!

Thanks for taking the time for the review.

Jim

[4.9][PR69082]Backport "[PATCH][ARM]Tighten the conditions for arm_movw, arm_movt"

2016-01-12 Thread Renlin Li


Hi all,

Here I backport r227129 to branch 4.9 to fix exactly the same issue reported in
PR69082.
It has already been committed on trunk and backported to branch 5.


I have quoted the original message for the explanation.
The patch applies to branch 4.9 without any modifications.
A test case is not added, as the one provided in the bugzilla ticket is too big
and complex.

arm-none-linux-gnueabihf regression tested without any issues.

Is it okay to backport to branch 4.9?

Renlin Li


gcc/ChangeLog

2016-01-08  Renlin Li  

PR target/69082
Backport from mainline:
2015-08-24  Renlin Li  

* config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
* config/arm/arm.c (arm_valid_symbolic_address_p): Define.
* config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
* config/arm/constraints.md ("j"): Add check for high code


On 19/08/15 15:37, Renlin Li wrote:



On 19/08/15 12:49, Renlin Li wrote:

Hi all,

This simple patch tightens the conditions for matching the movw and
arm_movt rtx patterns.
Those two patterns generate the following assembly:

movw w1, #:lower16: dummy + addend
movt w1, #:upper16: dummy + addend

The addend here is optional. However, it should be a 16-bit signed
value within the range -32768 <= A < 32768.

Imposing this restriction explicitly prevents LRA/reload code
from generating invalid high/lo_sum code for the arm target.
In process_address_1(), if the address is not legitimate, it will try to
generate a high/lo_sum pair to put the address into a register. It will
check whether the target supports those newly generated reload instructions.
With those two patterns defined this way, arm will reject them if the
conditions are not met.

Otherwise, it might generate movw/movt instructions with an addend outside
that range, which causes a GAS error. GAS produces an "offset out
of range" error message when the addend for a MOVW/MOVT REL relocation is
too large.
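
The range restriction described above can be sketched as a stand-alone check. This is only an illustration in plain C: `movw_movt_addend_ok` and `check_movw_movt_addend` are hypothetical names, not functions from GCC or from the patch, and the sketch assumes the REL-type 16-bit signed addend range -32768 <= A < 32768.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): true if ADDEND can be encoded
   as the 16-bit signed initial addend of a REL-type MOVW/MOVT
   relocation, i.e. -32768 <= ADDEND < 32768.  */
static bool
movw_movt_addend_ok (long addend)
{
  return addend >= -0x8000L && addend <= 0x7fffL;
}

/* A few boundary cases around both ends of the range.  */
static void
check_movw_movt_addend (void)
{
  assert (movw_movt_addend_ok (0));
  assert (movw_movt_addend_ok (-32768));
  assert (movw_movt_addend_ok (32767));
  assert (!movw_movt_addend_ok (32768));
  assert (!movw_movt_addend_ok (-32769));
}
```

Note that 32768 itself is rejected: the upper bound is exclusive, so the largest encodable addend is 32767.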


arm-none-eabi regression tests pass. Okay to commit to the trunk and
backport to 5.0?

Regards,
Renlin

gcc/ChangeLog:

2015-08-19  Renlin Li  

   * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
   * config/arm/arm.c (arm_valid_symbolic_address_p): Define.
   * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
   * config/arm/constraints.md ("j"): Add check for high code.


Thank you,
Renlin



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cef9eec..ff168bf 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -319,6 +319,7 @@ extern int vfp3_const_double_for_bits (rtx);
 
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 	   rtx);
+extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c2095a3..7cc4d93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28664,6 +28664,38 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx out, rtx in,
   #undef BRANCH
 }
 
+/* Returns true if the pattern is a valid symbolic address, which is either a
+   symbol_ref or (symbol_ref + addend).
+
+   According to the ARM ELF ABI, the initial addend of REL-type relocations
+   processing MOVW and MOVT instructions is formed by interpreting the 16-bit
+   literal field of the instruction as a 16-bit signed value in the range
+   -32768 <= A < 32768.  */
+
+bool
+arm_valid_symbolic_address_p (rtx addr)
+{
+  rtx xop0, xop1 = NULL_RTX;
+  rtx tmp = addr;
+
+  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
+return true;
+
+  /* (const (plus: symbol_ref const_int))  */
+  if (GET_CODE (addr) == CONST)
+tmp = XEXP (addr, 0);
+
+  if (GET_CODE (tmp) == PLUS)
+{
+  xop0 = XEXP (tmp, 0);
+  xop1 = XEXP (tmp, 1);
+
+  if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
+	  return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
+}
+
+  return false;
+}
 
 /* Returns true if a valid comparison operation and makes
the operands in a form that is valid.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eefb7fa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5774,7 +5774,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
 		   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2"
+  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 42935a4..f9e11e0 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -67,7 +67,8 @@
 (define_constraint "j"
  "A

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Yuri Rumyantsev

Hi All,

Here is a simple fix to exclude dg/ifcvt-5.c test from ia64 testing.

Is it OK for trunk?
testsuite/ChangeLog:
2016-01-12  Yuri Rumyantsev  

PR rtl-optimization/68920
gcc/testsuite/ChangeLog
* gcc.dg/ifcvt-5.c: Exclude it from ia64 testing.

2016-01-12 17:01 GMT+03:00 Andreas Schwab :
> Yuri Rumyantsev  writes:
>
>> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing
>
> Sure, go ahead.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."

patch.1
Description: Binary data

Re: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-01-12 Thread Mike Stump

On Jan 11, 2016, at 11:20 PM, Kaushik M Phatak  wrote:
> Kindly review the updated patch and let me know if it is OK.

My review comment is still outstanding.

Re: prevent "undef var" errors on gcc --help or --version

2016-01-12 Thread Olivier Hainque

Hello Bernd,

Thanks for your feedback on this :-)

> On 11 Jan 2016, at 17:09, Bernd Schmidt  wrote:
> 
> On 01/08/2016 02:23 PM, Olivier Hainque wrote:
>> +  /* Undefined variable references in specs are harmless if
>> + we're running for --help or --version alone, or together.  */
>> +  spec_undefvar_allowed =
>> +(((print_version || print_help_list)
>> +  && decoded_options_count == 2)
>> + ||
>> + ((print_version && print_help_list)
>> +  && decoded_options_count == 3));
>> +
> 
> This doesn't follow the formatting rules.

Arg, indeed. Revised version attached.

> Also, there are a couple of other options that cause gcc to just print 
> something and exit. Are these affected by missing env vars?

Some of these, for sure. For example, a common use case here is to
define a default --sysroot. We need this to be set properly for at
least --print-search-dirs and --print-prog-name, probably --print-file-name.

The print-multi family might be ok. It's heavily based on the presence
of other options on the command line, but maybe never depending on argument
values. I wasn't ready to bet though and opted for a conservative approach
first.

The attached patch is doing the same as the previous one, except more
explicitly and making it easier to adapt if deemed useful.

I could extract the decision code in a separate function if you prefer.
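
As a rough idea of what such a separate function might look like, here is a minimal sketch based only on the hunk quoted above. The name `undefvar_allowed_p` is hypothetical, the parameters stand in for the driver variables of the same names, and the comments assume that the option count includes the program-name entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone version of the quoted decision; not code
   from the patch.  Assumes DECODED_OPTIONS_COUNT counts the program
   name as its first entry.  */
static bool
undefvar_allowed_p (bool print_version, bool print_help_list,
		    unsigned int decoded_options_count)
{
  /* --help or --version alone.  */
  if ((print_version || print_help_list) && decoded_options_count == 2)
    return true;

  /* --help and --version together.  */
  if (print_version && print_help_list && decoded_options_count == 3)
    return true;

  return false;
}

/* Cases matching the quoted condition.  */
static void
check_undefvar_allowed (void)
{
  assert (undefvar_allowed_p (true, false, 2));
  assert (undefvar_allowed_p (false, true, 2));
  assert (undefvar_allowed_p (true, true, 3));
  assert (!undefvar_allowed_p (true, false, 3));
  assert (!undefvar_allowed_p (false, false, 2));
}
```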

Olivier

spec-undef.diff
Description: Binary data

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Jim Wilson

On Mon, Jan 11, 2016 at 10:22 PM, kugan
 wrote:
> When promote_function_mode and promote_ssa_mode changes the sign
> differently, following  is the cause for the problem in PR67714.

> This is similar to PR65932 where sign change in PROMOTE_MODE causes problem
> for parameter. But need a different fix there.

One of the proposed fixes for PR65932 was to make PROMOTE_MODE work
the same as promote_function_mode.  That should fix both bugs, and
avoid some of the weirdness necessary to work around the problem where
they disagree.  However, that fix is stalled, because it causes
potential performance regressions for some older ARM versions.  I've
been meaning to look at that again.  It is probably a better fix than
the one you are proposing here if we can make it work.
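
To make the symptom in the subject line concrete, the following minimal sketch contrasts the two ways of widening an 8-bit value. It is plain C, not GCC internals, and the helper names are hypothetical.

```c
#include <assert.h>

/* Sign-extend the low 8 bits of V, as expected for a signed char.  */
static int
sign_extend_qi (unsigned int v)
{
  v &= 0xff;
  return (v & 0x80) ? (int) v - 0x100 : (int) v;
}

/* Zero-extend the low 8 bits of V, as for an unsigned char.  */
static int
zero_extend_qi (unsigned int v)
{
  return (int) (v & 0xff);
}

/* The two agree for values below 0x80 and differ from there on.  */
static void
check_extensions (void)
{
  assert (sign_extend_qi (0x7f) == 127);
  assert (zero_extend_qi (0x7f) == 127);
  assert (sign_extend_qi (0x80) == -128);
  assert (zero_extend_qi (0x80) == 128);
}
```

If two places disagree about which extension a promoted value received, any value with the top bit set is read back incorrectly.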

Jim

[PATCH] Fix whitespace/typos in tree-ssa-threadupdate.c

2016-01-12 Thread Jeff Law



While working on 67755, I kept getting annoyed by the formatting goofs 
and typos.  So, without further delay, whitespace & typo fixes.


Bootstrapped on x86_64 for completeness & installed on the trunk.

Jeff
commit 5203c463e3a1c99634cb83a6ef22200ee68d0dcd
Author: Jeff Law 
Date:   Tue Jan 12 15:37:12 2016 -0700

* tree-ssa-threadupdate.c: Various whitespace and typo fixes.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8df676c..3c2cf3f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-01-12  Jeff Law  
+
+   * tree-ssa-threadupdate.c: Various whitespace and typo fixes.
+
 2016-01-12  Olivier Hainque  
 
* gcc.c (spec_undefvar_allowed): New global.
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index e118c49..1bf9ae6 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -353,7 +353,7 @@ lookup_redirection_data (edge e, enum insert_option insert)
   struct redirection_data *elt;
   vec *path = THREAD_PATH (e);
 
- /* Build a hash table element so we can see if E is already
+  /* Build a hash table element so we can see if E is already
  in the table.  */
   elt = XNEW (struct redirection_data);
   elt->path = path;
@@ -635,21 +635,21 @@ any_remaining_duplicated_blocks (vec *path,
are not part of any jump threading path, but add profile counts along
the path.
 
-   In the aboe example, after all jump threading is complete, we will
+   In the above example, after all jump threading is complete, we will
end up with the following control flow:
 
-   A B C
-   | | |
- Ea| |Eb |Ec
-   | | |
-   v v v
-  Ja JJc
-  / \  / \Eon' / \
+   A  B   C
+   |  |   |
+ Ea|  |Eb |Ec
+   |  |   |
+   v  v   v
+  Ja  J  Jc
+  / \/ \Eon' / \
  Eona/   \   ---/---\   \Eonc
-/ \ /  / \\
+/ \ /  / \   \
v   v  v   v  v
   Sona Soff  Son   Sonc
-\   /\  /
+\ /\ /
  \___/  \  _/
  \  /\/
   vv  v
@@ -793,19 +793,19 @@ compute_path_counts (struct redirection_data *rd,
 coming into the path that will contribute to the count flowing
 into the path successor.  */
   if (has_joiner && epath != elast)
-  {
-   /* Look for other incoming edges after joiner.  */
-   FOR_EACH_EDGE (ein, ei, epath->dest->preds)
- {
-   if (ein != epath
-   /* Ignore in edges from blocks we have duplicated for a
-  threading path, which have duplicated edge counts until
-  they are redirected by an invocation of this routine.  */
-   && !bitmap_bit_p (local_info->duplicate_blocks,
- ein->src->index))
- nonpath_count += ein->count;
- }
-  }
+   {
+ /* Look for other incoming edges after joiner.  */
+ FOR_EACH_EDGE (ein, ei, epath->dest->preds)
+   {
+ if (ein != epath
+ /* Ignore in edges from blocks we have duplicated for a
+threading path, which have duplicated edge counts until
+they are redirected by an invocation of this routine.  */
+ && !bitmap_bit_p (local_info->duplicate_blocks,
+   ein->src->index))
+   nonpath_count += ein->count;
+   }
+   }
   if (cur_count < path_out_count)
path_out_count = cur_count;
   if (epath->count < min_path_count)
@@ -827,14 +827,14 @@ compute_path_counts (struct redirection_data *rd,
  difference between elast->count and nonpath_count.  Otherwise the edge
  counts after threading will not be sane.  */
   if (has_joiner && path_out_count < elast->count - nonpath_count)
-  {
-path_out_count = elast->count - nonpath_count;
-/* But neither can we go above the minimum count along the path
-   we are duplicating.  This can be an issue due to profile
-   insanities coming in to this pass.  */
-if (path_out_count > min_path_count)
-  path_out_count = min_path_count;
-  }
+{
+  path_out_count = elast->count - nonpath_count;
+  /* But neither can we go above the minimum count along the path
+we are duplicating.  This can be an issue due to profile
+insanities coming in to this pass.  */
+  if (path_out_count > min_path_count)
+

[doc, 1/n] invoke.texi: name of gcc executable

2016-01-12 Thread Sandra Loosemore

I've checked in the first installment of my planned reorganization of 
the invoke.texi chapter.  Here I've deleted the section placed randomly 
in the middle of option descriptions that contained only a paragraph 
about the name of the gcc executable, and incorporated that information 
into the chapter introduction instead.  I did a little bit of editing of 
the text in the introduction as well, and rewrote the one reference to 
the deleted node so it makes sense without it.


-Sandra

2016-01-12  Sandra Loosemore 

	gcc/
	* doc/invoke.texi (Invoking GCC): Copy-edit.  Incorporate information
	about name of GCC executable.  Remove deleted node from menu.
	(Directory Options) <-B>: Remove cross-reference to deleted node.
	(Target Options): Delete section.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 232279)
+++ gcc/doc/invoke.texi	(working copy)
@@ -72,8 +72,9 @@ assembly and linking.  The ``overall opt
 process at an intermediate stage.  For example, the @option{-c} option
 says not to run the linker.  Then the output consists of object files
 output by the assembler.
+@xref{Overall Options,,Options Controlling the Kind of Output}.
 
-Other options are passed on to one stage of processing.  Some options
+Other options are passed on to one or more stages of processing.  Some options
 control the preprocessor and others the compiler itself.  Yet other
 options control the assembler and linker; most of these are not
 documented here, since you rarely need to use any of them.
@@ -85,9 +86,18 @@ for C programs; when an option is only u
 for a particular option does not mention a source language, you can use
 that option with all supported languages.
 
-@cindex C++ compilation options
-@xref{Invoking G++,,Compiling C++ Programs}, for a summary of special
-options for compiling C++ programs.
+@cindex cross compiling
+@cindex specifying machine version
+@cindex specifying compiler version and target machine
+@cindex compiler version, specifying
+@cindex target machine, specifying
+The usual way to run GCC is to run the executable called @command{gcc}, or
+@command{@var{machine}-gcc} when cross-compiling, or
+@command{@var{machine}-gcc-@var{version}} to run a specific version of GCC.
+When you compile C++ programs, you should invoke GCC as @command{g++} 
+instead.  @xref{Invoking G++,,Compiling C++ Programs}, 
+for information about the differences in behavior between @command{gcc} 
+and @code{g++} when compiling C++ programs.
 
 @cindex grouping options
 @cindex options, grouping
@@ -137,7 +147,6 @@ only one of these two forms, whichever o
 * Directory Options::   Where to find header files and libraries.
 Where to find the compiler executable files.
 * Spec Files::  How to pass switches to sub-processes.
-* Target Options::  Running a cross-compiler, or an old version of GCC.
 * Submodel Options::Specifying minor hardware or convention variations,
 such as 68010 vs 68020.
 * Code Gen Options::Specifying conventions for function calls, data layout
@@ -11733,7 +11742,8 @@ include files, and data files of the com
 The compiler driver program runs one or more of the subprograms
 @command{cpp}, @command{cc1}, @command{as} and @command{ld}.  It tries
 @var{prefix} as a prefix for each program it tries to run, both with and
-without @samp{@var{machine}/@var{version}/} (@pxref{Target Options}).
+without @samp{@var{machine}/@var{version}/} for the corresponding target
+machine and compiler version.
 
 For each subprogram to be run, the compiler driver first tries the
 @option{-B} prefix, if any.  If that name is not found, or if @option{-B}
@@ -12409,20 +12419,6 @@ proper position among the other output f
 
 @c man begin OPTIONS
 
-@node Target Options
-@section Specifying Target Machine and Compiler Version
-@cindex target options
-@cindex cross compiling
-@cindex specifying machine version
-@cindex specifying compiler version and target machine
-@cindex compiler version, specifying
-@cindex target machine, specifying
-
-The usual way to run GCC is to run the executable called @command{gcc}, or
-@command{@var{machine}-gcc} when cross-compiling, or
-@command{@var{machine}-gcc-@var{version}} to run a version other than the
-one that was installed last.
-
 @node Submodel Options
 @section Hardware Models and Configurations
 @cindex submodel options

Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Joseph Myers

On Tue, 12 Jan 2016, Uros Bizjak wrote:

> I think that following definition describes -msse -mfpmath=sse
> situation in the most elegant way. We can just declare that the
> precision is not known in this case:
> 
> #define TARGET_FLT_EVAL_METHOD\
>   (TARGET_MIX_SSE_I387 ? -1\
>: (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : TARGET_SSE2 ? 0 : -1)
> 
> Using this patch, the compiler will still generate SSE instructions
> for the above test.
> 
> Joseph, what is your opinion on this approach?

I think this is reasonable.
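
For readers who find the nested conditional hard to follow, the quoted definition can be written out as a function. This is a sketch only: `flt_eval_method` is a hypothetical name and its parameters stand in for the target macros in the quoted text. The return values follow the standard C meaning of FLT_EVAL_METHOD (-1 indeterminable, 0 evaluate in the operand type, 2 evaluate in long double).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function form of the quoted TARGET_FLT_EVAL_METHOD
   definition; the parameters mirror the target macros.  */
static int
flt_eval_method (bool mix_sse_i387, bool has_80387, bool sse_math,
		 bool sse2)
{
  if (mix_sse_i387)
    return -1;			/* Mixed SSE/387 math: indeterminable.  */
  if (has_80387 && !sse_math)
    return 2;			/* 387 math: evaluate in long double.  */
  return sse2 ? 0 : -1;		/* SSE math: 0 only when SSE2 is available.  */
}

static void
check_flt_eval_method (void)
{
  assert (flt_eval_method (true, true, true, true) == -1);
  assert (flt_eval_method (false, true, false, false) == 2);
  assert (flt_eval_method (false, true, true, true) == 0);
  /* SSE math without SSE2, as with -msse -mfpmath=sse.  */
  assert (flt_eval_method (false, true, true, false) == -1);
}
```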

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries


On 12/01/16 14:04, Richard Biener wrote:

On Tue, 12 Jan 2016, Tom de Vries wrote:


On 12/01/16 12:22, Richard Biener wrote:

Doesn't the same issue apply to


unsigned int *p;

static void __attribute__((noinline, noclone))
foo (void)
{
   unsigned int z;

   for (z = 0; z < N; ++z)
 ++(*p);
}

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.



I didn't manage to trigger this scenario, though I could probably make it
happen by modifying ftree-loop-im to work in one case (the load of the value
of p) but not the other (the *p load and store).


I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.


Ok, I'll retest and commit this patch.


Please add a comment as well.


Patch updated with comment.

During testing however, I ran into two testsuite regressions:

1.

-PASS: gfortran.dg/graphite/pr39516.f   -O  (test for excess errors)
+FAIL: gfortran.dg/graphite/pr39516.f   -O  (internal compiler error)
+FAIL: gfortran.dg/graphite/pr39516.f   -O  (test for excess errors)

AFAIU, this is a duplicate of PR68976.

Should I wait with committing the patch until PR68976 is fixed?

2.

-XFAIL: gcc.dg/graphite/scop-pr66980.c scan-tree-dump-times graphite "number of SCoPs: 1" 1
+XPASS: gcc.dg/graphite/scop-pr66980.c scan-tree-dump-times graphite "number of SCoPs: 1" 1


AFAIU, this is not a real regression, but the testcase needs to be 
updated. I'm not sure how. Sebastian, perhaps you have an idea there?


Thanks,
- Tom

From 24dfdb5a8a536203ad159bcbeaee6931be032f32 Mon Sep 17 00:00:00 2001
From: Tom de Vries 
Date: Tue, 12 Jan 2016 01:45:11 +0100
Subject: [PATCH] Don't return NULL access_fns in dr_analyze_indices

2016-01-12  Tom de Vries  

	* tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 19 +++
 gcc/tree-data-ref.c|  4 
 libgomp/testsuite/libgomp.c/pr69110.c  | 26 ++
 3 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/autopar/pr69110.c
 create mode 100644 libgomp/testsuite/libgomp.c/pr69110.c

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 000..e236015
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */
+
+
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..7ff5db7 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1023,6 +1023,10 @@ dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)
 		build_int_cst (reference_alias_ptr_type (ref), 0));
 }
 
+  /* Ensure that DR_NUM_DIMENSIONS (dr) != 0.  */
+  if (access_fns == vNULL)
+access_fns.safe_push (integer_zero_node);
+
   DR_BASE_OBJECT (dr) = ref;
   DR_ACCESS_FNS (dr) = access_fns;
 }
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 000..0d9e5ca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+static void __attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+extern void abort (void);
+
+int
+main (void)
+{
+  foo ();
+  if (i != N)
+abort ();
+
+  return 0;
+}
-- 
1.9.1

genattrtab.c generate switch

2016-01-12 Thread Jesper Broge Jørgensen


Hello

genattrtab.c can generate if statements with very deep bracket
nesting, causing clang to produce errors (when target=arm-none-eabi), as
explained at https://gcc.gnu.org/ml/gcc/2014-05/msg00032.html
At the above link it was suggested that genattrtab.c generate a switch
statement instead. I have made a patch that does just that.
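
The difference between the two output shapes can be sketched as follows. This is an illustration only, not actual genattrtab output: the enum values and function names are hypothetical, and the real generated code tests attribute values taken from the machine description.

```c
#include <assert.h>

/* Hypothetical attribute values; real ones come from the machine
   description.  */
enum attr_type { TYPE_ALU, TYPE_LOAD, TYPE_STORE, TYPE_BRANCH, TYPE_MUL };

/* Roughly the shape of an IOR chain written as nested conditions:
   every extra alternative adds another level of parentheses.  */
static int
is_mem_or_branch_if (enum attr_type t)
{
  return (((t == TYPE_LOAD) || (t == TYPE_STORE)) || (t == TYPE_BRANCH));
}

/* The same test as a switch: the nesting depth stays constant no
   matter how many alternatives there are.  */
static int
is_mem_or_branch_switch (enum attr_type t)
{
  switch (t)
    {
    case TYPE_LOAD:
    case TYPE_STORE:
    case TYPE_BRANCH:
      return 1;
    default:
      return 0;
    }
}

/* Both forms must agree for every attribute value.  */
static void
check_same_result (void)
{
  for (int t = TYPE_ALU; t <= TYPE_MUL; t++)
    assert (is_mem_or_branch_if ((enum attr_type) t)
	    == is_mem_or_branch_switch ((enum attr_type) t));
}
```

With hundreds of alternatives on one attribute, the first form can exceed a compiler's bracket nesting limit, while the second only grows in length.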




gcc/ChangeLog:

2016-01-13  Jesper Broge Jørgensen  

* genattrtab.c (check_attr_set_switch): New function.
(write_attr_set): Check if expression can be written as a switch.


diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 2caf8f6..b6de642 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -275,6 +275,8 @@ static bool attr_alt_subset_of_compl_p (rtx, rtx);
 static void clear_struct_flag  (rtx);
 static void write_attr_valueq   (FILE *, struct attr_desc *, const char *);
 static struct attr_value *find_most_used  (struct attr_desc *);
+static int check_attr_set_switch (FILE *outf, rtx exp,
+unsigned int attrs_cached, int write_cases, int indent);
 static void write_attr_set   (FILE *, struct attr_desc *, int, rtx,
 const char *, const char *, rtx,
 int, int, unsigned int);
@@ -4113,6 +4115,102 @@ eliminate_known_true (rtx known_true, rtx exp, int insn_code, int insn_index)
   return exp;
 }

+/* Check if exp contains a series of IOR conditions on the same attr_name.
+   If it does it can be turned into a switch statement and returns true.
+   If write_cases is true it will write the cases of the switch to outf.  */
+
+static int
+check_attr_set_switch (FILE *outf, rtx exp, unsigned int attrs_cached,
+                       int write_cases, int indent)
+{
+  if (GET_CODE (exp) != IOR)
+    return 0;
+  if (GET_CODE (XEXP (exp, 0)) != EQ_ATTR)
+    return 0;
+
+  rtx next = exp;
+  int ior_depth = 0;
+  int is_first = 1;
+
+  const char *attr_name_cmp = XSTR (XEXP (exp, 0), 0);
+
+  while (1)
+  {
+    rtx op1 = XEXP (next, 0);
+    rtx op2 = XEXP (next, 1);
+
+    if (GET_CODE (op1) != EQ_ATTR)
+      return 0;
+
+    const char *attr_name = XSTR (op1, 0);
+    const char *cmp_val = XSTR (op1, 1);
+
+    /* pointer compare is enough.  */
+    if (attr_name_cmp != attr_name)
+      return 0;
+
+    if (write_cases)
+    {
+      struct attr_desc *attr = find_attr (&attr_name, 0);
+      gcc_assert (attr);
+      if (is_first)
+      {
+        fprintf (outf, "(");
+        is_first = 0;
+        int i;
+        for (i = 0; i < cached_attr_count; i++)
+          if (attr->name == cached_attrs[i])
+            break;
+
+        if (i < cached_attr_count && (attrs_cached & (1U << i)) != 0)
+          fprintf (outf, "cached_%s", attr->name);
+        else if (i < cached_attr_count && (attrs_to_cache & (1U << i)) != 0)
+          fprintf (outf, "(cached_%s = get_attr_%s (insn))", attr->name,
+                   attr->name);
+        else
+          fprintf (outf, "get_attr_%s (insn)", attr->name);
+        fprintf (outf, ")\n");
+        write_indent (outf, indent);
+        fprintf (outf, "{\n");
+      }
+      write_indent (outf, indent);
+      fprintf (outf, "case ");
+      write_attr_valueq (outf, attr, cmp_val);
+      fprintf (outf, ":\n");
+    }
+
+    const int code = GET_CODE (op2);
+    if (code != IOR)
+    {
+      if (code == EQ_ATTR)
+      {
+        const char *attr_name = XSTR (op2, 0);
+        const char *cmp_val = XSTR (op2, 1);
+
+        if (attr_name == alternative_name)
+          return 0;
+
+        struct attr_desc *attr = find_attr (&attr_name, 0);
+        gcc_assert (attr);
+
+        if (attr->is_const)
+          return 0;
+        else if (write_cases)
+        {
+          write_indent (outf, indent);
+          fprintf (outf, "case ");
+          write_attr_valueq (outf, attr, cmp_val);
+          fprintf (outf, ":\n");
+        }
+      }
+      break;
+    }
+    next = op2;
+    ior_depth++;
+  }
+  return ior_depth > 2;
+}
+
 /* Write out a series of tests and assignment statements to perform tests and
sets of an attribute value.  We are passed an indentation amount and prefix
and suffix strings to write around each attribute value (e.g., "return"
@@ -4123,6 +4221,7 @@ write_attr_set (FILE *outf, struct attr_desc *attr, int indent, rtx value,
 const char *prefix, const char *suffix, rtx known_true,
 int insn_code, int insn_index, unsigned int attrs_cached)
 {
+  int n_switches = 0;
   if (GET_CODE (value) == COND)
 {
   /* Assume the default value will be the default of the COND unless we
@@ -4132,6 +4231,7 @@ write_attr_set (FILE *outf, struct attr_desc *attr, int indent, rtx value,
   rtx newexp;
   int first_if = 1;
   int i;
+  int is_switch = 0;

   if (cached_attr_count)
 {
@@ -4176,40 +4276,68 @@ write_attr_set (FILE *outf, struct attr_desc *attr, int indent, rtx value,
   if (inner_true == false_rtx)
 continue;

+

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Kugan



On 13/01/16 10:19, Jim Wilson wrote:
> On Mon, Jan 11, 2016 at 10:22 PM, kugan wrote:
>> When promote_function_mode and promote_ssa_mode changes the sign
>> differently, following  is the cause for the problem in PR67714.
> 
>> This is similar to PR65932 where sign change in PROMOTE_MODE causes problem
>> for parameter. But need a different fix there.
> 
> One of the proposed fixes for PR65932 was to make PROMOTE_MODE work
> the same as promote_function_mode.  That should fix both bugs, and
> avoid some of the weirdness necessary to work around the problem where
> they disagree.  However, that fix is stalled, because it causes
> potential performance regressions for some older ARM versions.  I've
> been meaning to look at that again.  It is probably a better fix than
> the one you are proposing here if we can make it work.

Yes, making PROMOTE_MODE work the same way as promote_function_mode on
ARM will fix this. Could you please point me to the test cases that
regress, so that I can start looking at them as well?

Thanks,
Kugan

Re: Note new TR29124 Special math functions on the web pages.

2016-01-12 Thread Jonathan Wakely


On 26/12/15 00:06 -0500, Ed Smith-Rowland wrote:

I can't get CVS to commit.

Could someone do this for me?


Done.


Index: ./htdocs/svn.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
retrieving revision 1.206
diff -r1.206 svn.html
565a566,572

  tr29124
  This branch is for development of TR29124 Special math Functions,
for the C++ runtime library
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3060.pdf.
It is maintained by Ed Smith-Rowland <3dw...@verizon.net>.

Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev3 (enable libgcc conversions)

2016-01-12 Thread David Edelsohn

On Tue, Jan 12, 2016 at 6:47 PM, Joseph Myers  wrote:
> On Tue, 12 Jan 2016, Michael Meissner wrote:
>
>> On Tue, Jan 12, 2016 at 12:18:55AM +, Joseph Myers wrote:
>> > On Mon, 11 Jan 2016, Michael Meissner wrote:
>> >
>> > > I fixed the #ifdef to use __NO_FPRS__ (thanks for the heads up on that).  I
>> > > also believe I fixed the various formatting issues.  These two patches build on
>> > > a big endian power7 host and little endian power8 host with no regressions in
>> > > the testsuite (the gcc patch is included here, but it hasn't changed since the
>> > > previous version of this patch).  Are they ok to be checked in?
>> >
>> > Are you sure you sent the right patch version?  I don't see those fixes in
>> > this one.
>>
>> You are right.  I did not update the patches from the changes I had made in the
>> branch.
>
> I have no further comments on the patch.

If Joseph is satisfied, it's okay with me.

Thanks, David

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-01-12 Thread Evandro Menezes


On 12/16/2015 03:30 PM, Evandro Menezes wrote:

> On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:
>> On 20 October 2015 at 00:40, Evandro Menezes wrote:
>>> In the existing targets, it seems that it's always faster to zero up a DF
>>> register with "movi %d0, #0" instead of "fmov %d0, xzr".
>>>
>>> This patch modifies the respective pattern.
>>
>> Hi Evandro,
>>
>> This patch changes the generic, micro-architecture-independent instruction
>> selection. The ARM ARM (C3.5.3) makes a specific recommendation about
>> the choice of instruction in this situation and the current
>> implementation in GCC follows that recommendation.  Wilco has also
>> picked up on this issue; he has the same patch internal to ARM along
>> with an ongoing discussion with ARM architecture folk regarding this
>> recommendation.  I'm reluctant to take this patch right now on the
>> basis that it runs contrary to ARM ARM recommendation pending the
>> conclusion of Wilco's discussion with ARM architecture folk.
>
> Have you had a chance to discuss this internally further?



Ping.

--
Evandro Menezes
