[gomp4] Merge trunk r230048 (2015-11-09) into gomp-4_0-branch

2015-11-09 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r230084:

commit 2b663911808921396592f62fe3ae8eb1d49923a8
Merge: 4ca3d77 2f8f4fa
Author: tschwinge 
Date:   Tue Nov 10 07:06:39 2015 +

svn merge -r 229832:230048 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@230084 
138bc75d-0d04-0410-961f-82ee72b054a4


Grüße
 Thomas




Re: [1/2] OpenACC routine support

2015-11-09 Thread Cesar Philippidis
On 11/09/2015 04:48 PM, Nathan Sidwell wrote:
> And these are the new tests.  Cesar, c-c++-common/goacc/routine-5.c will
> need adjusting with your C++ parser patch.  You'll see the two cases
> I've #if'd out.

I enabled those tests in trunk with the patch I posted here.

Cesar


Re: [1/2] OpenACC routine support

2015-11-09 Thread Cesar Philippidis
On 11/09/2015 04:31 PM, Nathan Sidwell wrote:
> On 11/03/15 10:35, Jakub Jelinek wrote:
>> On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:
>>> --- gcc/c/c-parser.c(revision 229667)
>>> +++ gcc/c/c-parser.c(working copy)
>>> @@ -1160,7 +1160,8 @@ enum c_parser_prec {
>>>   static void c_parser_external_declaration (c_parser *);
>>>   static void c_parser_asm_definition (c_parser *);
>>>   static void c_parser_declaration_or_fndef (c_parser *, bool, bool,
>>> bool,
>>> -   bool, bool, tree *, vec);
>>> +   bool, bool, tree *, vec,
>>> +   tree);
>>
>> Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of
>> the
>> c_parser_declaration_or_fndef caller changes.
>>
>> Otherwise, LGTM.
> 
> This is the patch I've just committed.  It includes c parser adjustments
> to detect the case of two function decls with a single type specifier. 
> Cesar will be applying a patch for the C++ parser for the same  case.

Here's the patch that Nathan was referring to. I ended up introducing a
boolean variable named first in the various functions which call
cp_finalize_oacc_routine. The problem with the original approach was
that the routine clauses are only applied to the first function
declarator in a declaration list. By using 'first', which is set to true
if the current declarator is the first in a sequence of declarators, I
was able to defer setting parser->oacc_routine to NULL.
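
For illustration only, the effect of the 'first' flag can be modelled
outside the compiler.  The sketch below is a toy in plain C, not the
cp/parser.c code: struct toy_parser, toy_finalize_routine and
toy_simple_declaration are hypothetical names, and counting an error for
every declarator after the first is an assumption based on the ChangeLog
wording ("determine if multiple declarators follow a routine construct").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser state; this is not cp_parser.  */
struct toy_parser
{
  const char *pending_routine; /* Clauses of a preceding routine pragma.  */
  const char *applied_to;      /* Declarator that received the clauses.  */
  int errors;                  /* Number of diagnostics issued.  */
};

/* Called once per declarator.  FIRST is true only for the first
   declarator of the declaration.  */
void
toy_finalize_routine (struct toy_parser *p, const char *decl, bool first)
{
  assert (p != NULL);
  if (p->pending_routine == NULL)
    return;
  if (first)
    p->applied_to = decl;
  else
    p->errors++;  /* Assumed: a further declarator is diagnosed.  */
}

/* Walk a declarator list such as 'void a (void), b (void);'.  */
void
toy_simple_declaration (struct toy_parser *p,
                        const char *const *decls, size_t n)
{
  bool first = true;
  for (size_t i = 0; i < n; i++)
    {
      toy_finalize_routine (p, decls[i], first);
      first = false;
    }
  /* The reset is deferred until the whole declaration has been seen.  */
  p->pending_routine = NULL;
}
```

With a pending routine and the declarator list { "a", "b" }, the toy
applies the clauses to a, counts one error for b, and only then clears
the pending state.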

Nathan already approved this patch, so I've applied it to trunk.

Cesar
2015-11-09  Cesar Philippidis  

	gcc/cp/
	* parser.c (cp_finalize_oacc_routine): New boolean first argument.
	(cp_ensure_no_oacc_routine): Update call to cp_finalize_oacc_routine.
	(cp_parser_simple_declaration): Maintain a boolean first to keep track
	of each new declarator.  Propagate it to cp_parser_init_declarator.
	(cp_parser_init_declarator): New boolean first argument.  Propagate it
	to cp_parser_save_member_function_body and cp_finalize_oacc_routine.
	(cp_parser_member_declaration): Likewise.
	(cp_parser_single_declaration): Update call to
	cp_parser_init_declarator.
	(cp_parser_save_member_function_body): New boolean first_decl argument.
	Propagate it to cp_finalize_oacc_routine.
	(cp_parser_finish_oacc_routine): New boolean first argument.  Use it to
	determine if multiple declarators follow a routine construct.
	(cp_parser_oacc_routine): Update call to cp_parser_finish_oacc_routine.

	gcc/testsuite/
	* c-c++-common/goacc/routine-5.c: Enable c++ tests.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6fc2c6a..f3b4b46 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -246,7 +246,7 @@ static bool cp_parser_omp_declare_reduction_exprs
 static tree cp_parser_cilk_simd_vectorlength 
   (cp_parser *, tree, bool);
 static void cp_finalize_oacc_routine
-  (cp_parser *, tree, bool);
+  (cp_parser *, tree, bool, bool);
 
 /* Manifest constants.  */
 #define CP_LEXER_BUFFER_SIZE ((256 * 1024) / sizeof (cp_token))
@@ -1329,7 +1329,7 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl)
 static inline void
 cp_ensure_no_oacc_routine (cp_parser *parser)
 {
-  cp_finalize_oacc_routine (parser, NULL_TREE, false);
+  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
 }
 
 /* Decl-specifiers.  */
@@ -2135,7 +2135,7 @@ static tree cp_parser_decltype
 
 static tree cp_parser_init_declarator
   (cp_parser *, cp_decl_specifier_seq *, vec *,
-   bool, bool, int, bool *, tree *, location_t *);
+   bool, bool, int, bool *, tree *, bool, location_t *);
 static cp_declarator *cp_parser_declarator
   (cp_parser *, cp_parser_declarator_kind, int *, bool *, bool, bool);
 static cp_declarator *cp_parser_direct_declarator
@@ -2445,7 +2445,7 @@ static tree cp_parser_single_declaration
 static tree cp_parser_functional_cast
   (cp_parser *, tree);
 static tree cp_parser_save_member_function_body
-  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree);
+  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree, bool);
 static tree cp_parser_save_nsdmi
   (cp_parser *);
 static tree cp_parser_enclosed_template_argument_list
@@ -11909,6 +11909,7 @@ cp_parser_simple_declaration (cp_parser* parser,
   bool saw_declarator;
   location_t comma_loc = UNKNOWN_LOCATION;
   location_t init_loc = UNKNOWN_LOCATION;
+  bool first = true;
 
   if (maybe_range_for_decl)
 *maybe_range_for_decl = NULL_TREE;
@@ -12005,7 +12006,10 @@ cp_parser_simple_declaration (cp_parser* parser,
 	declares_class_or_enum,
 	&function_definition_p,
 	maybe_range_for_decl,
+	first,
 	&init_loc);
+  first = false;
+
   /* If an error occurred while parsing tentatively, exit quickly.
 	 (That usually happens when in the body of a function; each
 	 statement is treated as a declaration-statement until proven
@@ -12104,6 +12108,9 @@ cp_parser_simple_declaration (cp_parser* parser,
 
  done:
   pop_deferring_access_checks ();
+
+  /* Reset any acc routine clauses. 

Re: [PATCH][haifa-sched] PR rtl-optimization/68236: Exit early from autoprefetcher lookahead if not in haifa sched

2015-11-09 Thread Vladimir Makarov

On 11/09/2015 11:31 AM, Kyrill Tkachov wrote:


Ok for trunk?


Yes, thanks.



2015-11-09  Kyrylo Tkachov  

PR rtl-optimization/68236
* haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Return 0
if insn_queue doesn't exist.
(haifa_sched_finish): Reset insn_queue to NULL.




Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Joseph Myers
On Mon, 9 Nov 2015, Trevor Saunders wrote:

> The add default macro definitions then wrap those with hooks, then
> target by target replace the macro by hook overrides approach seems to
> provide that you can incrementally test and find most of the issues,
> but the change a macro every where approach doesn't really.

I have this notion that once a target macro is "regular" enough - not used 
in code built for the target, not used in driver code, not used directly 
or indirectly in #if conditions except for the single default definition 
in defaults.h, target definitions only depend on the target architecture 
and not OS or other variations - it ought to be possible to do the 
conversion to a hook with some kind of automated refactoring tool 
(possibly with a little editing of its results).  And so this sort of 
regularizing of target macros is helpful because it increases the number 
of target macros that could be converted in an automated manner.
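
As a rough picture of the quoted "default macro definition, wrapped by a
hook, overridden target by target" approach, here is a minimal sketch in
C.  Every name in it (TOY_HANDLER_REG, struct toy_target_hooks, the
toy_* functions) is hypothetical and chosen for the example; it does not
reproduce defaults.h or the real target hook machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: the single default definition, used unless a target header
   defines the macro first.  */
#ifndef TOY_HANDLER_REG
#define TOY_HANDLER_REG (-1)
#endif

/* Step 2: a hook table whose default implementation wraps the macro.  */
struct toy_target_hooks
{
  int (*handler_reg) (void);
};

int
toy_default_handler_reg (void)
{
  return TOY_HANDLER_REG;
}

/* Step 3: a converted target overrides the hook and no longer needs
   the macro at all.  */
int
toy_converted_handler_reg (void)
{
  return 7;
}

/* Callers only ever go through the hook.  */
int
toy_query_handler_reg (const struct toy_target_hooks *t)
{
  assert (t != NULL && t->handler_reg != NULL);
  return t->handler_reg ();
}
```

Unconverted targets keep working through the default hook, so each
target can be switched and tested on its own.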

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-09 Thread Bin.Cheng
On Mon, Nov 9, 2015 at 11:24 PM, Bernd Schmidt  wrote:
> On 11/08/2015 10:11 AM, Richard Biener wrote:
>>
>> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng"
>>  wrote:

 +inline bool
 +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
 +  const iv_common_cand *ccand2)
 +{
 +  return ccand1->hash == ccand2->hash
 +&& operand_equal_p (ccand1->base, ccand2->base, 0)
 +&& operand_equal_p (ccand1->step, ccand2->step, 0)
 +&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
 + == TYPE_PRECISION (TREE_TYPE (ccand2->base));

>> Yes.  Patch is OK then.
>
>
> Doesn't follow the formatting rules though in the quoted piece.

Hi Bernd,
Thanks for reviewing.  I haven't committed it yet, could you please
point out which quoted piece it is so that I can update the patch?

Thanks,
bin
>
>
> Bernd
>


Re: [PATCH/RFC/RFA] Machine modes for address printing (all targets)

2015-11-09 Thread Joseph Myers
On Mon, 9 Nov 2015, Julian Brown wrote:

> Thanks! I used the attached "build-all.sh" to test all the targets
> affected by the patch with "make all-gcc": those now all succeed
> (I'm sure I reinvented a wheel here, but perhaps the target list is
> useful to someone else).

The wheel you reinvented is called contrib/config-list.mk (which (a) 
requires you to have a native bootstrapped compiler from current trunk in 
your PATH - it uses --enable-werror-always so that cross compilers fail 
for warnings that would cause a native compiler bootstrap to fail - and 
(b) is intended to build compilers for targets covering all significantly 
different variations, though I think you could use it with a custom target 
list).

-- 
Joseph S. Myers
jos...@codesourcery.com


[SPARC] Fix PR target/57845

2015-11-09 Thread Eric Botcazou
This PR is about an ICE on SPARC 32-bit with -freg-struct-return.  I didn't 
know that this option was supported on this architecture but apparently it 
was, at least in the 3.x series, so the attached patchlet fixes the ICE.

Tested on SPARC/Solaris (including compat testing), applied on all branches.


2015-11-09  Eric Botcazou  

PR target/57845
* config/sparc/sparc.c (sparc_function_value_1): In 32-bit mode, do
not promote the mode for aggregate types.


2015-11-09  Eric Botcazou  

* gcc.target/sparc/sparc-ret.c: Rename to...
* gcc.target/sparc/sparc-ret-1.c: ...this.
* gcc.target/sparc/sparc-ret-2.c: New test.

-- 
Eric Botcazou
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 230016)
+++ config/sparc/sparc.c	(working copy)
@@ -7329,9 +7329,10 @@ sparc_function_value_1 (const_tree type,
 	mode = word_mode;
 }
 
-  /* We should only have pointer and integer types at this point.  This must
- match sparc_promote_function_mode.  */
+  /* We should only have pointer and integer types at this point, except with
+ -freg-struct-return.  This must match sparc_promote_function_mode.  */
   else if (TARGET_ARCH32
+	   && !(type && AGGREGATE_TYPE_P (type))
 	   && mclass == MODE_INT
 	   && GET_MODE_SIZE (mode) < UNITS_PER_WORD)
 mode = word_mode;
/* PR target/57845 */

/* { dg-do compile } */
/* { dg-options "-freg-struct-return" } */

struct S { short int i; };

struct S foo (short int i)
{
  struct S s;
  s.i = i;
  return s;
}


Re: [1/2] OpenACC routine support

2015-11-09 Thread Nathan Sidwell
And these are the new tests.  Cesar, c-c++-common/goacc/routine-5.c will need 
adjusting with your C++ parser patch.  You'll see the two cases I've #if'd out.


nathan
2015-11-09  Nathan Sidwell  

	gcc/testsuite/
	* c-c++-common/goacc/routine-1.c: New.
	* c-c++-common/goacc/routine-2.c: New.
	* c-c++-common/goacc/routine-3.c: New.
	* c-c++-common/goacc/routine-4.c: New.
	* c-c++-common/goacc/routine-5.c: New.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: New.

Index: gcc/testsuite/c-c++-common/goacc/routine-1.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-1.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-1.c	(working copy)
@@ -0,0 +1,34 @@
+
+#pragma acc routine gang
+void gang (void)
+{
+}
+
+#pragma acc routine worker
+void worker (void)
+{
+}
+
+#pragma acc routine vector
+void vector (void)
+{
+}
+
+#pragma acc routine seq
+void seq (void)
+{
+}
+
+int main ()
+{
+
+#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  {
+gang ();
+worker ();
+vector ();
+seq ();
+  }
+
+  return 0;
+}
Index: gcc/testsuite/c-c++-common/goacc/routine-2.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-2.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-2.c	(working copy)
@@ -0,0 +1,21 @@
+#pragma acc routine gang worker /* { dg-error "multiple loop axes" } */
+void gang (void)
+{
+}
+
+#pragma acc routine worker vector /* { dg-error "multiple loop axes" } */
+void worker (void)
+{
+}
+
+#pragma acc routine vector seq /* { dg-error "multiple loop axes" } */
+void vector (void)
+{
+}
+
+#pragma acc routine seq gang /* { dg-error "multiple loop axes" } */
+void seq (void)
+{
+}
+
+#pragma acc routine (nothing) gang /* { dg-error "not been declared" } */
Index: gcc/testsuite/c-c++-common/goacc/routine-3.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-3.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c	(working copy)
@@ -0,0 +1,53 @@
+#pragma acc routine gang
+void gang (void) /* { dg-message "declared here" 3 } */
+{
+}
+
+#pragma acc routine worker
+void worker (void) /* { dg-message "declared here" 2 } */
+{
+}
+
+#pragma acc routine vector
+void vector (void) /* { dg-message "declared here" 1 } */
+{
+}
+
+#pragma acc routine seq
+void seq (void)
+{
+}
+
+int main ()
+{
+
+#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  {
+#pragma acc loop gang /* { dg-message "loop here" 1 } */
+for (int i = 0; i < 10; i++)
+  {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker ();
+	vector ();
+	seq ();
+  }
+#pragma acc loop worker /* { dg-message "loop here" 2 } */
+for (int i = 0; i < 10; i++)
+  {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker (); /*  { dg-error "routine call uses same" } */
+	vector ();
+	seq ();
+  }
+#pragma acc loop vector /* { dg-message "loop here" 3 } */
+for (int i = 0; i < 10; i++)
+  {
+	gang (); /*  { dg-error "routine call uses same" } */
+	worker (); /*  { dg-error "routine call uses same" } */
+	vector (); /*  { dg-error "routine call uses same" } */
+	seq ();
+  }
+  }
+
+  return 0;
+}
Index: gcc/testsuite/c-c++-common/goacc/routine-4.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-4.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c	(working copy)
@@ -0,0 +1,41 @@
+
+void gang (void);
+void worker (void);
+void vector (void);
+
+#pragma acc routine (gang) gang
+#pragma acc routine (worker) worker
+#pragma acc routine (vector) vector
+  
+#pragma acc routine seq
+void seq (void)
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();  /* { dg-error "routine call uses" } */
+  vector ();  /* { dg-error "routine call uses" } */
+  seq ();
+}
+
+void vector (void) /* { dg-message "declared here" 1 } */
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();  /* { dg-error "routine call uses" } */
+  vector ();
+  seq ();
+}
+
+void worker (void) /* { dg-message "declared here" 2 } */
+{
+  gang ();  /* { dg-error "routine call uses" } */
+  worker ();
+  vector ();
+  seq ();
+}
+
+void gang (void) /* { dg-message "declared here" 3 } */
+{
+  gang ();
+  worker ();
+  vector ();
+  seq ();
+}
Index: gcc/testsuite/c-c++-common/goacc/routine-5.c
===
--- gcc/testsuite/c-c++-common/goacc/routine-5.c	(revision 0)
+++ gcc/testsuite/c-c++-common/goacc/ro

Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)

2015-11-09 Thread Joseph Myers
I don't see any conversions between KFmode and TImode (in either 
direction, signed or unsigned) here - I suppose there are no instructions 
for that?

If so, I would guess (without having tested it) that it is more efficient 
to use the libgcc2 implementations of those functions (whether copied, or 
with some logic to build selected libgcc2.c functions for KFmode), which 
implement them using a few hardware operations on DImode [note that where 
libgcc2.c has e.g. __floatditf, that gets mapped to __floattitf for 64-bit 
systems], than to use the soft-fp implementations doing everything with 
integer arithmetic.  (There are IEEE exceptions issues with the libgcc2.c 
conversions from double-word integers to floating-point - see bug 59412 - 
but since that's a preexisting issue for all architectures using this 
code, it's clearly not your problem to fix.)
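
To make the "few hardware operations on DImode" idea concrete, here is a
minimal sketch in C of converting a double-word unsigned integer by
converting each 64-bit half separately.  It is an illustration under
assumptions, not the libgcc2.c source: double stands in for KFmode, two
uint64_t halves stand in for TImode, and toy_floatunti is a hypothetical
name.  The result is not correctly rounded for every input, which is in
the same area as the exceptions issue mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the 128-bit value HI * 2^64 + LO using two 64-bit
   integer-to-float conversions, one multiply and one add.  */
double
toy_floatunti (uint64_t hi, uint64_t lo)
{
  /* 2^64 is a power of two, so it is exactly representable.  */
  const double two64 = 18446744073709551616.0;
  return (double) hi * two64 + (double) lo;
}
```

For example, toy_floatunti (1, 0) gives 2^64 and toy_floatunti (0, 5)
gives 5.0; both are exact because no rounding is needed.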

Ideally, I'd think that for optimal efficiency if objects built for power8 
are linked with libgcc built for power9, or if an executable using shared 
libgcc that was built for power8 gets run with shared libgcc for power9, 
you'd want power9 libgcc to contain t-hardfp versions of all the functions 
that can be expanded inline for power9, and libgcc2 versions of those 
(such as TImode comparisons) that aren't expanded inline, but not to 
contain soft-fp versions of any of those KFmode functions.  Cf. how 
config.host ensures various 32-bit powerpc variants use the right mixture 
of hardfp and soft-fp functions.  It's a bit fiddly to make sure you get 
the preferred implementation of every function and that the ABI doesn't 
change depending on the configured processor, but not that hard.

Since none of the libgcc pieces for KFmode support are yet in, and the 
proposed changes are optimizations rather than a matter of correctness, 
none of the above should directly affect this patch in any way - it simply 
indicates desirable followup once both the libgcc soft-fp KFmode support, 
and this patch, are in.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [1/2] OpenACC routine support

2015-11-09 Thread Nathan Sidwell

On 11/03/15 10:35, Jakub Jelinek wrote:

On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:

--- gcc/c/c-parser.c(revision 229667)
+++ gcc/c/c-parser.c(working copy)
@@ -1160,7 +1160,8 @@ enum c_parser_prec {
  static void c_parser_external_declaration (c_parser *);
  static void c_parser_asm_definition (c_parser *);
  static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
-  bool, bool, tree *, vec);
+  bool, bool, tree *, vec,
+  tree);


Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of the
c_parser_declaration_or_fndef caller changes.

Otherwise, LGTM.


This is the patch I've just committed.  It includes c parser adjustments to 
detect the case of two function decls with a single type specifier.  Cesar will 
be applying a patch for the C++ parser for the same  case.


nathan

2015-11-09  Nathan Sidwell  

	* omp-low.h (replace_oacc_fn_attrib, build_oacc_routine_dims): Declare.
	* omp-low.c (build_oacc_routine_dims): New.

2015-11-09  Thomas Schwinge  
	Cesar Philippidis  
	James Norris  
	Julian Brown  
	Nathan Sidwell  

	c/
	* c-parser.c (c_parser_declaration_or_fndef): Add OpenACC
	routine arg.
	(c_parser_declaration_or_fndef): Call c_finish_oacc_routine.
	(c_parser_pragma): Parse 'acc routine'.
	(OACC_ROUTINE_CLAUSE_MASK): Define.
	(c_parser_oacc_routine, c_finish_oacc_routine): New.

2015-11-09  Thomas Schwinge  
	Cesar Philippidis  
	James Norris  
	Julian Brown  
	Nathan Sidwell  

	c-family/
	* c-pragma.c (oacc_pragmas): Add "routine".
	* c-pragma.h (pragma_kind): Add PRAGMA_OACC_ROUTINE.

2015-11-09  Thomas Schwinge  
	Cesar Philippidis  
	James Norris  
	Julian Brown  
	Nathan Sidwell  

	cp/
	* parser.h (struct cp_parser): Add oacc_routine field.
	* parser.c (cp_ensure_no_oacc_routine): New.
	(cp_parser_new): Initialize oacc_routine field.
	(cp_parser_linkage_specification): Call cp_ensure_no_oacc_routine.
	(cp_parser_namespace_definition,
	cp_parser_class_specifier_1): Likewise.
	(cp_parser_init_declarator): Call cp_finalize_oacc_routine.
	(cp_parser_function_definition,
	cp_parser_save_member_function_body): Likewise.
	(OACC_ROUTINE_CLAUSE_MASK): New.
	(cp_parser_finish_oacc_routine, cp_parser_oacc_routine,
	cp_finalize_oacc_routine): New.
	(cp_parser_pragma): Adjust omp_declare_simd checking.  Call
	cp_ensure_no_oacc_routine.
	(cp_parser_pragma): Add OpenACC routine handling.
	
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 230040)
+++ gcc/omp-low.c	(working copy)
@@ -12361,6 +12361,50 @@ set_oacc_fn_attrib (tree fn, tree clause
 }
 }
 
+/*  Process the routine's dimension clauses to generate an attribute
+value.  Issue diagnostics as appropriate.  We default to SEQ
+(OpenACC 2.5 clarifies this). All dimensions have a size of zero
+(dynamic).  TREE_PURPOSE is set to indicate whether that dimension
+can have a loop partitioned on it.  non-zero indicates
+yes, zero indicates no.  By construction once a non-zero has been
+reached, further inner dimensions must also be non-zero.  We set
+TREE_VALUE to zero for the dimensions that may be partitioned and
+1 for the other ones -- if a loop is (erroneously) spawned at
+an outer level, we don't want to try and partition it.  */
+
+tree
+build_oacc_routine_dims (tree clauses)
+{
+  /* Must match GOMP_DIM ordering.  */
+  static const omp_clause_code ids[] = 
+{OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, OMP_CLAUSE_SEQ};
+  int ix;
+  int level = -1;
+
+  for (; clauses; clauses = OMP_CLAUSE_CHAIN (clauses))
+for (ix = GOMP_DIM_MAX + 1; ix--;)
+  if (OMP_CLAUSE_CODE (clauses) == ids[ix])
+	{
+	  if (level >= 0)
+	error_at (OMP_CLAUSE_LOCATION (clauses),
+		  "multiple loop axes specified for routine");
+	  level = ix;
+	  break;
+	}
+
+  /* Default to SEQ.  */
+  if (level < 0)
+level = GOMP_DIM_MAX;
+  
+  tree dims = NULL_TREE;
+
+  for (ix = GOMP_DIM_MAX; ix--;)
+dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
+		  build_int_cst (integer_type_node, ix < level), dims);
+
+  return dims;
+}
+
 /* Retrieve the oacc function attrib and return it.  Non-oacc
functions will return NULL.  */
 
Index: gcc/omp-low.h
===
--- gcc/omp-low.h	(revision 230040)
+++ gcc/omp-low.h	(working copy)
@@ -30,6 +30,8 @@ extern tree omp_reduction_init (tree, tr
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
 extern tree omp_member_access_dummy_var (tree);
+extern void replace_oacc_fn_attrib (tree, tree);
+extern tree build_oacc_routine_dims (tree);
 extern tree get_oacc_fn_attrib (tree);
 extern int get_oacc_ifn_dim_arg (const gimple *);
 exte
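
The comment on build_oacc_routine_dims in the patch above describes a
small table: for each of the gang, worker and vector dimensions, one flag
saying whether a loop may be partitioned on it, and one size value.  A
minimal sketch of that table in plain C follows.  It assumes the ordering
gang = 0, worker = 1, vector = 2, with seq represented by the maximum
index as in the patch; struct toy_dim and the TOY_DIM_* names are
hypothetical and replace the tree list that the real function builds.

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_DIM_GANG, TOY_DIM_WORKER, TOY_DIM_VECTOR, TOY_DIM_MAX };

struct toy_dim
{
  bool partitionable; /* Plays the role of TREE_PURPOSE.  */
  int size;           /* Plays the role of TREE_VALUE.  */
};

/* LEVEL is the routine's level; TOY_DIM_MAX means seq.  Dimensions at
   or inside LEVEL may be partitioned and get size 0 (dynamic); outer
   dimensions may not and get size 1.  */
void
toy_routine_dims (int level, struct toy_dim dims[TOY_DIM_MAX])
{
  assert (level >= 0 && level <= TOY_DIM_MAX);
  for (int ix = 0; ix < TOY_DIM_MAX; ix++)
    {
      dims[ix].partitionable = ix >= level;
      dims[ix].size = ix < level;
    }
}
```

For a worker routine this gives gang = (no, 1), worker = (yes, 0) and
vector = (yes, 0); for a seq routine no dimension is partitionable and
every size is 1.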

Re: [PATCH], Add power9 support to GCC, patches #2-5 committed

2015-11-09 Thread Michael Meissner
Actually, it looks like I changed advanced fusion -> power9 fusion.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add power9 support to GCC, patches #2-5 committed

2015-11-09 Thread Michael Meissner
David said I could commit patches 2-5 after fixing the points that Segher
Boessenkool raised.  I think I addressed most of the points.  If not, let me
know.  I now recall, I have not yet fixed the 'advance fusion' vs. 'power9
fusion' wording in comments, and I will get to that shortly.

I updated the tests to have new tests for the integer power9 instructions
(modulus, count trailing 0's, extswsli) vs. the power9 vector instructions.  I
added new tests for both float128 via software emulation and via power9
instructions.

I updated CTZ_DEFINED_VALUE_AT_ZERO to be 32/64 depending on whether you are
running in 32/64-bit mode.

I removed the empty constraints from the mod define_expand.

Inside of ashdi3_extswsli_dot, if we had split the move and we need to re-issue
the instruction, it calls ashdi3_extswsli_dot instead of ashdi3_extswsli_dot2.

I'm including the patch file for the changes I checked in.

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/constraints.md (wF constraint): New constraints
for power9/toc fusion.
(wG constraint): Likewise.

* config/rs6000/predicates.md (u6bit_cint_operand): New
predicate, recognize 0..63.
(upper16_cint_operand): New predicate for power9 and toc fusion.
(fpr_reg_operand): Likewise.
(toc_fusion_or_p9_reg_operand): Likewise.
(toc_fusion_mem_raw): Likewise.
(toc_fusion_mem_wrapped): Likewise.
(fusion_gpr_addis): If power9 fusion, allow fusion for a larger
address range.
(fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
instead.
(fusion_addis_mem_combo_load): Add support for power9 fusion of
floating point loads, floating point stores, and gpr stores.
(fusion_addis_mem_combo_store): Likewise.
(fusion_offsettable_mem_operand): Likewise.

* config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
declarations.
(emit_fusion_load_store): Likewise.
(fusion_p9_p): Likewise.
(expand_fusion_p9_load): Likewise.
(expand_fusion_p9_store): Likewise.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.
(fusion_wrap_memory_address): Likewise.

* config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
elements for power9 fusion.
(rs6000_debug_print_mode): Rework debug information to print more
information about fusion.
(rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
support.
(rs6000_legitimate_address_p): Recognize toc fusion as a valid
offsettable memory address.
(rs6000_rtx_costs): Update costs for new ISA 3.0 instructions.
(emit_fusion_gpr_load): Move most of the code from
emit_fusion_gpr_load into emit_fusion_addis that handles both
power8 and power9 fusion.
(emit_fusion_addis): Likewise.
(emit_fusion_load_store): Likewise.
(fusion_wrap_memory_address): Add support for TOC fusion.
(fusion_split_address): Likewise.
(fusion_p9_p): Add support for power9 fusion.
(expand_fusion_p9_load): Likewise.
(expand_fusion_p9_store): Likewise.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.

* config/rs6000/rs6000.h (TARGET_EXTSWSLI): Macros for support for
new instructions in ISA 3.0.
(TARGET_CTZ): Likewise.
(TARGET_TOC_FUSION_INT): Macros for power9 fusion support.
(TARGET_TOC_FUSION_FP): Likewise.

* config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
fusion unspecs.
(UNSPEC_FUSION_ADDIS): Likewise.
(QHSI mode iterator): New iterator for power9 fusion.
(GPR_FUSION): Likewise.
(FPR_FUSION): Likewise.
(mod3): Add support for ISA 3.0
modulus instructions.
(umod3): Likewise.
(divmod peephole): Likewise.
(udivmod peephole): Likewise.
(ctz2): Add support for ISA 3.0 count trailing zeros scalar
instructions.
(ctz2_h): Likewise.
(ashdi3_extswsli): Add support for ISA 3.0 EXTSWSLI instruction.
(ashdi3_extswsli_dot): Likewise.
(ashdi3_extswsli_dot2): Likewise.
(power9 fusion splitter): New power9/toc fusion support.
(toc_fusionload_): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_): Update predicate function.
(power9 fusion peephole2s): New power9/toc fusion support.
(fusion_gpr___load): Likewise.
(fusion_gpr___store): Likewise.
(fusion_fpr___load): Likewise.
(fusion_fpr___store): Likewise.
(fusion_p9__constant): Likewise.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* lib/target-supports.exp (check_p8vector_hw_available): Split
long line.
(check_vsx_hw_available): Likewise.
(check_p9vector_hw_available): Add new checks for ISA 3.0 hardware
support and fo

Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread Joseph Myers
On Sun, 8 Nov 2015, David Malcolm wrote:

> I've been experimenting with using Sphinx [1] for GCC's documentation.
> 
> You can see an HTML sample of GCC docs built with Sphinx here:
> https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/gcc.html
> (it's a work-in-progress; i.e. there are bugs).

Observations:

* Could you provide the PDF version there as well?

* The option summary seems a complete mess.

* The indexes are missing.

> It doesn't *quite* do a direct .texi to .rst conversion yet: it can take
> the XML output from texinfo's "makeinfo --xml", and generate either one
> big .rst file, or a group of smaller .rst files.
> 
> My hope was that for every gcc/docs/foo.texi file we have, my tool would
> be able to generate a gcc/docs/foo.rst (maybe retaining the name, to
> allow for sane diff and hence sane patch review).
> 
> Unfortunately, "makeinfo --xml" resolves includes and conditional
> processing, so the underlying input structure of .texinfo files is lost
> at that point.
> 
> To fix that, I've been working on a frontend from texi2rst that
> re-implements the .texi to xml processing, retaining information on
> includes, and directives, so that I can translate them to
> corresponding .rst directives.  Unfortunately it's clear that I'm not
> going to finish that before stage 1 closes - but I think it's feasible
> in the stage3 timeframe.

You do of course need to convert documentation fragments in target.def 
in-place and adapt genhooks (preserving the arrangements of both tm.rst.in 
and tm.rst being checked in, or some other such arrangement that ensures 
there are always both GPL and GFDL copies of the hook documentation 
checked in, with genhooks dealing with keeping them in sync).  Other 
things to consider: preserving comments (where applicable); preserving 
@ignore contents (where the function is to comment out text); keeping 
manpage generation (which currently uses @c man comments together with 
@ignore) working; keeping --with-pkgversion and --with-bugurl working; 
keeping the principle that BASE-VER is the only checked-in file with the 
version number and everything else gets the version number at build time; 
dealing with the INTERNALS conditionals in files used in multiple manuals 
(and some others such as cppmanual conditionals).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Use combined_fn in tree-vrp.c

2015-11-09 Thread Bernd Schmidt

On 11/07/2015 01:46 PM, Richard Sandiford wrote:

@@ -3814,8 +3817,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
  break;
  /* Both __builtin_ffs* and __builtin_popcount return
 [0, prec].  */
-   CASE_INT_FN (BUILT_IN_FFS):
-   CASE_INT_FN (BUILT_IN_POPCOUNT):
+   CASE_CFN_FFS:
+   CASE_CFN_POPCOUNT:
  arg = gimple_call_arg (stmt, 0);
  prec = TYPE_PRECISION (TREE_TYPE (arg));
  mini = 0;


So let me see if I understood this. From what we discussed the purpose 
of these new internal functions is that they can have vector types. If 
so, isn't this code (here and elsewhere) which expects integers 
potentially going to be confused?
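
For reference, the [0, prec] range in the quoted comment can be checked
for the scalar case with the GCC builtins themselves.  This is only a
sketch for scalar unsigned int arguments, assuming a GCC-compatible
compiler; it says nothing about vector types, and
toy_in_ffs_popcount_range is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>

/* Return nonzero if both __builtin_ffs and __builtin_popcount of X
   fall within [0, prec], where prec is the precision of the type.  */
int
toy_in_ffs_popcount_range (unsigned int x)
{
  const int prec = (int) (sizeof (unsigned int) * CHAR_BIT);
  int f = __builtin_ffs ((int) x);
  int p = __builtin_popcount (x);
  return f >= 0 && f <= prec && p >= 0 && p <= prec;
}
```

The two ends of the range are reached for x == 0 (both builtins return
0) and for x with all bits set (popcount returns prec).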



Bernd


Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-11-09 Thread Joseph Myers
On Sat, 7 Nov 2015, Segher Boessenkool wrote:

> > The last one is certainly invalid.  The one before is arguably invalid as 
> > well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to 
> > a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] 
> > from array to pointer - an array expression gets converted to an 
> > expression "that points to the initial element of the array object", but 
> > there is no array object a5_7[5] here).
> 
> C11, 6.5.2.1/3:
> Successive subscript operators designate an element of a
> multidimensional array object. If E is an n-dimensional array (n >= 2)
> with dimensions i x j x . . . x k, then E (used as other than an lvalue)
> is converted to a pointer to an (n - 1)-dimensional array with
> dimensions j x . . . x k. If the unary * operator is applied to this
> pointer explicitly, or implicitly as a result of subscripting, the
> result is the referenced (n - 1)-dimensional array, which itself is
> converted into a pointer if used as other than an lvalue. It follows
> from this that arrays are stored in row-major order (last subscript
> varies fastest).
> 
> As far as I see, a5_7[5] here is never treated as an array, just as a
> pointer, and &a5_7[5][0] is valid.

As usual, based on taking the address, not offsetof where there's the open 
question of whether the C standard actually requires support for anything 
other than a single element name there:

a5_7[5] is an expression of array type.  The only way for it to be treated 
as a pointer is for it to be converted implicitly to pointer type.  That 
implicit conversion is what I think is problematic.

Only once the implicit conversion has taken place do the special rules 
about &A[B] meaning A + B take effect.  But since the problem I see is 
with the conversion of A to a pointer, you still have undefined behavior.
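
A small sketch in C of the distinction being drawn, assuming the
declaration int a5_7[5][7] implied by the name (the declaration itself is
not quoted in this message).  Forming &a5_7[5] only computes a5_7 + 5, a
one-past-the-end pointer to a row, and involves no array-to-pointer
conversion of the row a5_7[5].  The disputed expression &a5_7[5][0] is
left in a comment and deliberately not evaluated.  toy_one_past_offset
is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

static int a5_7[5][7];

/* Byte offset of the one-past-the-end row pointer from the start of
   the array.  */
size_t
toy_one_past_offset (void)
{
  /* &a5_7[5] is a5_7 + 5: nothing is dereferenced, and the result has
     type int (*)[7].  */
  int (*end)[7] = &a5_7[5];

  /* Disputed case, not evaluated here: &a5_7[5][0] would first
     convert the nonexistent row a5_7[5] from array to pointer.  */

  return (size_t) ((char *) end - (char *) a5_7);
}
```

The returned offset equals sizeof a5_7, i.e. 5 * 7 * sizeof (int).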

The paragraph you quote seems not to add anything to the semantics 
defined elsewhere in the standard; it's purely descriptive of some 
consequences of those semantics.

Whether we wish to be more permissive about some such cases (depending on 
-Warray-bounds=N) is a pragmatic matter depending on the extent to which 
they are used in practice.
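
To make the distinction above concrete, here is a minimal sketch (not part of the patch; the helper names are made up for illustration). It forms the address just past the last row by pointer arithmetic on the converted outer array, so the nonexistent row a5_7[5] never has to undergo array-to-pointer conversion. The struct repeats the FA5_7 type from the test case under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the FA5_7 type from the test case under discussion.  */
typedef struct FA5_7 {
  int i;
  char a5_7[5][7];
} FA5_7;

/* Hypothetical helper: address one past the last row.  p->a5_7 + 5 has
   type "pointer to array of 7 char" and is never dereferenced, so the
   row a5_7[5] itself is never converted to a pointer.  */
static const char *
end_of_rows (const FA5_7 *p)
{
  assert (p != NULL);
  return (const char *) (p->a5_7 + 5);
}

/* Hypothetical helper: the same address derived from the member size.  */
static const char *
end_of_member (const FA5_7 *p)
{
  assert (p != NULL);
  return (const char *) p->a5_7 + sizeof p->a5_7;
}
```

Both helpers yield the same address; whether spelling it as &a5_7[5][0] is equally valid is exactly the point in dispute.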

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Extend tree-call-cdce to calls whose result is used

2015-11-09 Thread Joseph Myers
On Sat, 7 Nov 2015, Richard Sandiford wrote:

> - the call to sqrt is protected by the result of the optab rather
>   than the input.  It would be better to check !(x >= 0), like
>   tree-call-cdce.c does.

Well, no, isless (x, 0) (or !isgreaterequal (x, 0), since it doesn't 
matter what result a NaN gets).  If a quiet NaN is the argument, the 
"invalid" exception should not be raised, which >= will do (modulo bugs on 
some targets where unordered comparison instructions are wrongly used for 
ordered comparisons).  Bug 68264 filed.
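
As a rough illustration of the two guards mentioned above (a sketch only; the function names are hypothetical and this is not the code in tree-call-cdce.c), both forms use the quiet comparison macros from <math.h>, and they differ only in which way a NaN argument falls:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical guard: true when sqrt (x) would be a domain error.
   isless is a quiet comparison, so a quiet NaN gives false here.  */
static bool
sqrt_guard_isless (double x)
{
  return isless (x, 0.0) != 0;
}

/* Alternative guard: a NaN gives true here, which is harmless because
   it only means the library call is kept for that input.  */
static bool
sqrt_guard_not_ge (double x)
{
  return !isgreaterequal (x, 0.0);
}

/* Usage: the guards agree everywhere except on NaN.  */
static void
sqrt_guard_examples (void)
{
  assert (sqrt_guard_isless (-1.0) && sqrt_guard_not_ge (-1.0));
  assert (!sqrt_guard_isless (4.0) && !sqrt_guard_not_ge (4.0));
  assert (!sqrt_guard_isless (NAN) && sqrt_guard_not_ge (NAN));
}
```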

> +/* Return true if CALL can produce a domain error (EDOM) but can never
> +   produce a pole or range overflow error (ERANGE).  This means that we
> +   can tell whether a function would have set errno by testing whether
> +   the result is a NaN.
> +
> +   Note that we do not consider range underflow errors, which are
> +   implementation-defined in C99.  */

I'd think this should respect the library's definition.  For glibc that is 
that underflow to 0 should set errno if the result would also underflow to 
0 in the default rounding mode, but underflow to subnormal result doesn't 
need to set errno, and neither does rounding-mode-dependent underflow to 0 
in cases where f (x) is very close to x for small x but in some rounding 
modes f of the least subnormal is 0.  (That caveat around f (x) close to x 
avoids possibly needing to set errno for the least subnormal only for lots 
of functions.)

> +CASE_FLT_FN (BUILT_IN_EXP):
> +CASE_FLT_FN (BUILT_IN_EXP10):
> +CASE_FLT_FN (BUILT_IN_EXP2):
> +CASE_FLT_FN (BUILT_IN_EXPM1):

These can overflow.

> +CASE_FLT_FN (BUILT_IN_ATAN2):

And this is the only case here where, given that caveat, underflow setting 
ERANGE is applicable with glibc, but overflow is not.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread Sandra Loosemore

On 11/08/2015 06:55 AM, David Malcolm wrote:

I've been experimenting with using Sphinx [1] for GCC's documentation.

[snip]

The primary advantages of .rst/sphinx over .texi/texinfo I see are in
the generated HTML:

* sane, stable URLs (so e.g. there is a reliable URL for the docs for,
say, "-Wall").

* a page-splitting structure that makes sense, to me, at least [3]

* much more use of markup, with restrained and well-chosen CSS
(texinfo's HTML seems to ignore much of the inline markup in
the .texinfo file)

* autogenerated internal links, so that almost everything is clickable,
and will take you somewhere sane, by default

* syntax-highlighting of code examples, with support for multiple
programming languages (note the mixture of C, C++, Fortran, etc in the
docs for the gcc options).

* looks modern and fresh (IMHO), letting casual observers see that the
project is alive and kicking.


Thoughts?


If we're going to switch documentation formats, I'd rather we used 
DocBook.  I've had to use "restructured text" before and found it really 
awkward.


But, personal preferences aside, I also think it's more important that 
we commit documentation-person resources to making the content more 
correct, readable, and better organized, than to making the HTML output 
look "modern and fresh", or worse yet, translating the docs to another 
format and having to proofread them for conversion goofs.


BTW, Mentor Graphics' toolchains ship with a custom HTML stylesheet for 
the generated manuals, to make them a little "prettier".  Maybe 
something like that would go a long way towards solving the perceived 
problems here?  Or improvements to texinfo's HTML generation.


-Sandra



Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 10:10 PM, Steve Ellcey wrote:

emit_library_call_value_1 has no way of knowing if the promotion should
be signed or unsigned because it has a mode (probably QImode or HImode)
that it knows may need to be promoted to SImode but it has no way to
know if that should be a signed or unsigned promotion because it has no
tree type information about the library call argument types.

Right now it guesses based on the return type but it may guess wrong
when converting an unsigned int to a signed fixed type or vice versa.


That's not quite how I read the code, but it doesn't matter - the lack 
of a type seems to be a real issue. Since I don't see anything better, 
please install your patch.
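
For readers following along, a small sketch of why the choice matters (illustrative only, with made-up helper names; it models the effect on an 8-bit value and is not the code in emit_library_call_value_1): the same narrow bit pattern widens to different 32-bit values depending on the signedness assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend the low 8 bits, as for an unsigned narrow argument.  */
static int32_t
promote_unsigned (uint8_t bits)
{
  return (int32_t) bits;
}

/* Sign-extend the low 8 bits, as for a signed narrow argument.  Written
   with xor/subtract so that it does not rely on implementation-defined
   narrowing conversions.  */
static int32_t
promote_signed (uint8_t bits)
{
  return (int32_t) (bits ^ 0x80) - 0x80;
}

/* Usage: the two promotions differ once the top bit is set.  */
static void
promotion_examples (void)
{
  assert (promote_unsigned (0xF0) == 240);
  assert (promote_signed (0xF0) == -16);
  assert (promote_unsigned (0x10) == promote_signed (0x10));
}
```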



Bernd


[testsuite] Small adjustment to gcc.dg/sso testcases

2015-11-09 Thread Eric Botcazou
I just noticed that all the new gcc.dg/sso testcases fail on visium-elf 
because the libc (newlib) apparently uses DOS line termination and this fools 
the dg-output directives.  Hence the attached patch, which makes them more 
robust on this front.
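
The effect can be sketched outside the testsuite as follows. This is only an approximation: DejaGnu matches dg-output with Tcl regular expressions, whereas the sketch assumes POSIX <regex.h>, and output_matches is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if PATTERN (a POSIX extended regex) matches
   somewhere in OUTPUT.  */
static bool
output_matches (const char *pattern, const char *output)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  rc = regexec (&re, output, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Usage: the strict pattern misses CR LF output, the relaxed one
   accepts both line terminations.  */
static void
line_ending_examples (void)
{
  assert (!output_matches ("My_R1: 78 56 34 12\n",
                           "My_R1: 78 56 34 12\r\n"));
  assert (output_matches ("My_R1: 78 56 34 12.*\n",
                          "My_R1: 78 56 34 12\r\n"));
  assert (output_matches ("My_R1: 78 56 34 12.*\n",
                          "My_R1: 78 56 34 12\n"));
}
```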

Tested on x86_64-suse-linux & visium-elf, applied on the mainline as obvious.


2015-11-09  Eric Botcazou  

* gcc.dg/sso/*.c: Robustify dg-output directives.

-- 
Eric Botcazou
Index: gcc.dg/sso/p1.c
===
--- gcc.dg/sso/p1.c	(revision 230016)
+++ gcc.dg/sso/p1.c	(working copy)
@@ -13,52 +13,52 @@ int main (void)
   put ("My_R1:");
   dump (&My_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "My_R1: 78 56 34 12\n" } */
+  /* { dg-output "My_R1: 78 56 34 12.*\n" } */
 
   put ("My_R2:");
   dump (&My_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "My_R2: 12 34 56 78\n" } */
+  /* { dg-output "My_R2: 12 34 56 78.*\n" } */
 
   Local_R1 = My_R1;
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : 78 56 34 12\n" } */
+  /* { dg-output "Local_R1 : 78 56 34 12.*\n" } */
 
   Local_R2 = My_R2;
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 12 34 56 78\n" } */
+  /* { dg-output "Local_R2 : 12 34 56 78.*\n" } */
 
   Local_R1.I = 0x12345678;
 
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : 78 56 34 12\n" } */
+  /* { dg-output "Local_R1 : 78 56 34 12.*\n" } */
 
   Local_R2.I = 0x12345678;
 
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 12 34 56 78\n" } */
+  /* { dg-output "Local_R2 : 12 34 56 78.*\n" } */
 
   Local_R1.I = Local_R2.I;
 
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : 78 56 34 12\n" } */
+  /* { dg-output "Local_R1 : 78 56 34 12.*\n" } */
 
   Local_R2.I = Local_R1.I;
 
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 12 34 56 78\n" } */
+  /* { dg-output "Local_R2 : 12 34 56 78.*\n" } */
 
   return 0;
 }
Index: gcc.dg/sso/p13.c
===
--- gcc.dg/sso/p13.c	(revision 230016)
+++ gcc.dg/sso/p13.c	(working copy)
@@ -13,52 +13,52 @@ int main (void)
   put ("My_R1:");
   dump (&My_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "My_R1: db 0f 49 40 db 0f 49 c0\n" } */
+  /* { dg-output "My_R1: db 0f 49 40 db 0f 49 c0.*\n" } */
 
   put ("My_R2:");
   dump (&My_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "My_R2: 40 49 0f db c0 49 0f db\n" } */
+  /* { dg-output "My_R2: 40 49 0f db c0 49 0f db.*\n" } */
 
   Local_R1 = My_R1;
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0\n" } */
+  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0.*\n" } */
 
   Local_R2 = My_R2;
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db\n" } */
+  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db.*\n" } */
 
   Local_R1.F = Pi - Pi * I;
 
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0\n" } */
+  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0.*\n" } */
 
   Local_R2.F = Pi - Pi * I;
 
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db\n" } */
+  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db.*\n" } */
 
   Local_R1.F = Local_R2.F;
 
   put ("Local_R1 :");
   dump (&Local_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0\n" } */
+  /* { dg-output "Local_R1 : db 0f 49 40 db 0f 49 c0.*\n" } */
 
   Local_R2.F = Local_R1.F;
 
   put ("Local_R2 :");
   dump (&Local_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db\n" } */
+  /* { dg-output "Local_R2 : 40 49 0f db c0 49 0f db.*\n" } */
 
   return 0;
 }
Index: gcc.dg/sso/p2.c
===
--- gcc.dg/sso/p2.c	(revision 230016)
+++ gcc.dg/sso/p2.c	(working copy)
@@ -13,24 +13,24 @@ int main (void)
   put ("My_R1:");
   dump (&My_R1, sizeof (struct R1));
   new_line ();
-  /* { dg-output "My_R1: e2 59 d1 48 b4 aa d9 bb\n" } */
+  /* { dg-output "My_R1: e2 59 d1 48 b4 aa d9 bb.*\n" } */
 
   put ("My_R2:");
   dump (&My_R2, sizeof (struct R2));
   new_line ();
-  /* { dg-output "My_R2: 84 8d 15 9e 15 5b 35 df\n" } */
+  /* { dg-output "My_R2: 84 8d 15 9e 15 5b 35 df.*\n" } */
 
   Local_R1 = My_R1;
   put ("Local_R1 :");
   dump (&Local_R1, size

Re: [OpenACC] declare directive

2015-11-09 Thread James Norris

Jakub,


The attached patch and ChangeLog reflects the updates from your
review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01002.html

All of the issues you pointed out have been addressed. I've also
added a test that uses C++ templates. A bug was also fixed in the
parsers which dealt with determining which identifier to use with
an attribute.
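
For context, a minimal sketch of how the directive might be used from C. It is not taken from the new tests; the array and function names are made up, and it assumes the usual OpenACC convention that the declare directive follows the declaration it names. Without OpenACC enabled the pragmas are ignored and the code runs on the host only.

```c
#include <assert.h>

#define N 8

/* File-scope array that gets a device copy when OpenACC is enabled.  */
float table[N];
#pragma acc declare create (table)

/* Hypothetical example: fill the array in a parallel region, then sum
   it on the host.  */
static float
fill_and_sum (void)
{
  float sum = 0.0f;

#pragma acc parallel loop
  for (int i = 0; i < N; i++)
    table[i] = (float) i;

  /* Bring the device values back before the host reads them.  */
#pragma acc update host (table)

  for (int i = 0; i < N; i++)
    sum += table[i];
  return sum;
}

/* Usage: 0 + 1 + ... + 7.  */
static void
declare_example (void)
{
  assert (fill_and_sum () == 28.0f);
}
```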

Thanks!
Jim




2015-XX-XX  James Norris  
Joseph Myers  

gcc/c-family/
* c-pragma.c (oacc_pragmas): Add entry for declare directive. 
* c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.
(enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT and
PRAGMA_OACC_CLAUSE_LINK.

gcc/c/
* c-parser.c (c_parser_pragma): Handle PRAGMA_OACC_DECLARE.
(c_parser_omp_clause_name): Handle 'device_resident' clause.
(c_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(c_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OACC_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(c_parser_oacc_declare): New function.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Handle 'device_resident'
clause.
(cp_parser_oacc_data_clause): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(cp_parser_oacc_all_clauses): Handle PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT
and PRAGMA_OMP_CLAUSE_LINK.
(OACC_DECLARE_CLAUSE_MASK): New definition.
(cp_parser_oacc_declare): New function.
(cp_parser_pragma): Handle PRAGMA_OACC_DECLARE.
* pt.c (tsubst_expr): Handle OACC_DECLARE.

gcc/
* gimple-pretty-print.c (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE. 
* gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_DECLARE.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* gimplify.c (oacc_declare_returns): New.
(gimplify_bind_expr): Prepend 'exit' stmt to cleanup.
(device_resident_p): New function.
(omp_default_clause): Handle device_resident clause.
(gimplify_oacc_declare_1, gimplify_oacc_declare): New functions.
(gimplify_expr): Handle OACC_DECLARE.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): New builtin.
* omp-low.c (expand_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_DECLARE and BUILTIN_GOACC_DECLARE.
(build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
(lower_omp_target): Handle GF_OMP_TARGET_KIND_OACC_DECLARE,
GOMP_MAP_DEVICE_RESIDENT and GOMP_MAP_LINK.
(make_gimple_omp_edges): Handle GF_OMP_TARGET_KIND_OACC_DECLARE.
* tree-pretty-print.c (dump_omp_clause): Handle GOMP_MAP_LINK and
GOMP_MAP_DEVICE_RESIDENT.

gcc/testsuite
* c-c++-common/goacc/declare-1.c: New test.
* c-c++-common/goacc/declare-2.c: Likewise.

include/
* gomp-constants.h (enum gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT
and GOMP_MAP_LINK.

libgomp/

* libgomp.map (GOACC_2.0.1): Export GOACC_declare.
* oacc-parallel.c (GOACC_declare): New function.
* testsuite/libgomp.oacc-c-c++-common/declare-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/declare-5.c: Likewise.
* testsuite/libgomp.oacc-c++/declare-1.C: Likewise.
diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c
index ac11838..cd0cc27 100644
--- a/gcc/c-family/c-pragma.c
+++ b/gcc/c-family/c-pragma.c
@@ -1207,6 +1207,7 @@ static const struct omp_pragma_def oacc_pragmas[] = {
   { "atomic", PRAGMA_OACC_ATOMIC },
   { "cache", PRAGMA_OACC_CACHE },
   { "data", PRAGMA_OACC_DATA },
+  { "declare", PRAGMA_OACC_DECLARE },
   { "enter", PRAGMA_OACC_ENTER_DATA },
   { "exit", PRAGMA_OACC_EXIT_DATA },
   { "kernels", PRAGMA_OACC_KERNELS },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 953c4e3..c6a2981 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -30,6 +30,7 @@ enum pragma_kind {
   PRAGMA_OACC_ATOMIC,
   PRAGMA_OACC_CACHE,
   PRAGMA_OACC_DATA,
+  PRAGMA_OACC_DECLARE,
   PRAGMA_OACC_ENTER_DATA,
   PRAGMA_OACC_EXIT_DATA,
   PRAGMA_OACC_KERNELS,
@@ -151,6 +152,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
+  PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
@@ -175,7 +177,8 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
-  PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION
+  PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION,
+  PRAGMA_OACC_CLAUSE_LINK = PRAGMA_OMP_CLAUSE_LINK
 };
 
 extern struct cpp_reade

Re: [PATCH][optabs][ifcvt][1/3] Define negcc, notcc optabs

2015-11-09 Thread Jeff Law

On 11/09/2015 05:23 AM, Kyrill Tkachov wrote:

Hi all,

This is a rebase of the patch I posted at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00154.html

The patch has been ok'd by Jeff, but I wanted to hold off committing it until
my fixes for the ifcvt regressions on sparc and x86_64 were in.

The rebase conflicts were due to Richard's optabs splitting patch.

I've also noticed that in my original patch I had a comparison of branch
cost with
the magic number '2'. I removed it from this version as it's not really
meaningful.
The transformation this patch enables is, at the moment, only supported
for arm
and aarch64 where it is always beneficial. If/when we have a proper
ifcvt costing
model (perhaps for GCC 7?) we'll update this accordingly if needed.
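
As a plain-C model of what the two new optabs are meant to compute (a sketch based on the pattern names; the function names are hypothetical and the exact operand order in the .md patterns is not shown here), a conditional negate or complement picks either the transformed first value or an unchanged second value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a conditional negate: -a when COND holds, otherwise b.
   Unsigned arithmetic keeps the negation well defined for all inputs.  */
static uint32_t
cond_neg (bool cond, uint32_t a, uint32_t b)
{
  return cond ? 0u - a : b;
}

/* Model of a conditional complement: ~a when COND holds, otherwise b.  */
static uint32_t
cond_not (bool cond, uint32_t a, uint32_t b)
{
  return cond ? ~a : b;
}

/* Usage: selecting between a constant and its negation or complement,
   which is presumably the kind of source shape the new ifcvt function
   is after (an assumption based on its name).  */
static void
cond_examples (void)
{
  assert (cond_neg (true, 5, 5) == (uint32_t) -5);
  assert (cond_neg (false, 5, 5) == 5);
  assert (cond_not (true, 5, 5) == ~(uint32_t) 5);
  assert (cond_not (false, 5, 5) == 5);
}
```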

Jeff, sorry for taking so long to commit this, I just wanted to fix the
other
ifcvt fallout before proceeding with more new functionality.
I have also uncovered a bug in the arm implementation of these optabs
(patch 3/3 in the series), so I'll post an updated version of that patch
as well soon.

Ok to commit this updated version instead?

Bootstrapped and tested on arm, aarch64 and x86_64.
It has been sitting in my tree for a couple of months now with no issues.

Thanks,
Kyrill

2015-11-09  Kyrylo Tkachov  

 * ifcvt.c (noce_try_inverse_constants): New function.
 (noce_process_if_block): Call it.
 * optabs.h (emit_conditional_neg_or_complement): Declare prototype.
 * optabs.def (negcc_optab, notcc_optab): Declare.
 * optabs.c (emit_conditional_neg_or_complement): New function.
 * doc/tm.texi (Standard Names): Document negcc, notcc names.
Thanks for addressing the ifcvt fallout first, then making sure this 
didn't get lost.


Yes, this is fine for the trunk.

jeff



Re: [PATCH 3/4][AArch64] Add scheduling model for Exynos M1

2015-11-09 Thread Evandro Menezes
I think that it's better if I split this patch further in two others, 
since one of the changes is best if done in aarch64.md.


Please, disregard this patch and watch for updates.

Thank you,

--
Evandro Menezes

On 11/05/2015 05:30 PM, Evandro Menezes wrote:

2015-11-05  Evandro Menezes 

   gcc/
* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched 
model.

* config/aarch64/aarch64.md: Include "exynos-m1.md".
* config/arm/arm-cores.def: Use the Exynos M1 sched model.
* config/arm/arm.md: Include "exynos-m1.md".
* config/arm/arm-tune.md: Regenerated.
* config/arm/exynos-m1.md: New file.

This patch adds the scheduling model for Exynos M1.  I split the DFA 
into ones for GP, LS and FP, resulting in many fewer states and arcs 
than before.


Please, commit if it's alright.

Thank you,





Re: [PATCH][AArch64] Replace insn to zero up DF register

2015-11-09 Thread Evandro Menezes

Hi, Marcus.

Have you an update from the architecture folks about this?

Thank you,

--
Evandro Menezes

On 10/30/2015 05:24 AM, Marcus Shawcroft wrote:

On 20 October 2015 at 00:40, Evandro Menezes  wrote:

In the existing targets, it seems that it's always faster to zero up a DF
register with "movi %d0, #0" instead of "fmov %d0, xzr".

This patch modifies the respective pattern.


Hi Evandro,

This patch changes the generic, microarchitecture-independent instruction
selection. The ARM ARM (C3.5.3) makes a specific recommendation about
the choice of instruction in this situation and the current
implementation in GCC follows that recommendation.  Wilco has also
picked up on this issue; he has the same patch internal to ARM along
with an ongoing discussion with ARM architecture folk regarding this
recommendation.  I'm reluctant to take this patch right now on the
basis that it runs contrary to ARM ARM recommendation pending the
conclusion of Wilco's discussion with ARM architecture folk.

Cheers
/Marcus





[visium] Turn a few macros into hooks

2015-11-09 Thread Eric Botcazou
On the heels of Julian's patch, this turns the 3 PRINT_OPERAND_* macros into 
hooks for the Visium port.
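
For those unfamiliar with the macro-to-hook conversion, the idea can be sketched in a few lines of standalone C. Everything here is hypothetical and much reduced: the real hook table is far larger, and the default shown is an assumption. Only the body of the override mirrors the function added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, much-reduced hook table.  */
struct target_hooks_sketch
{
  bool (*print_operand_punct_valid_p) (unsigned char code);
};

/* Assumed default for ports that do not override the hook.  */
static bool
default_punct_valid_p (unsigned char code)
{
  (void) code;
  return false;
}

/* Port-specific override with the same body as
   visium_print_operand_punct_valid_p in the patch.  */
static bool
visium_punct_valid_p (unsigned char code)
{
  return code == '#';
}

static const struct target_hooks_sketch generic_hooks
  = { default_punct_valid_p };
static const struct target_hooks_sketch visium_hooks
  = { visium_punct_valid_p };

/* Usage: callers go through the table instead of a macro.  */
static void
hook_examples (void)
{
  assert (!generic_hooks.print_operand_punct_valid_p ('#'));
  assert (visium_hooks.print_operand_punct_valid_p ('#'));
  assert (!visium_hooks.print_operand_punct_valid_p ('%'));
}
```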

Tested on visium-elf, applied on the mainline.


2015-11-09  Eric Botcazou  

* config/visium/visium.h (PRINT_OPERAND): Delete.
(PRINT_OPERAND_PUNCT_VALID_P): Likewise.
(PRINT_OPERAND_ADDRESS): Likewise.
* config/visium/visium.c (TARGET_PRINT_OPERAND_PUNCT_VALID_P): Define
to...
(visium_print_operand_punct_valid_p): ...this.  New function.
(TARGET_PRINT_OPERAND): Define to...
(print_operand): Rename to...
(visium_print_operand): ...this.
(TARGET_PRINT_OPERAND_ADDRESS): Define to...
(visium_output_address): Rename to...
(visium_print_operand_address): ...this.
(print_operand_address): Delete.

-- 
Eric Botcazou
Index: config/visium/visium.c
===
--- config/visium/visium.c	(revision 230016)
+++ config/visium/visium.c	(working copy)
@@ -99,7 +99,6 @@ int visium_indent_opcode = 0;
given how unlikely it is to have a long branch in a leaf function.  */
 static unsigned int long_branch_regnum = 31;
 
-static void visium_output_address (FILE *, enum machine_mode, rtx);
 static tree visium_handle_interrupt_attr (tree *, tree, tree, int, bool *);
 static inline bool current_function_saves_fp (void);
 static inline bool current_function_saves_lr (void);
@@ -157,6 +156,10 @@ static bool visium_legitimate_constant_p
 
 static bool visium_legitimate_address_p (enum machine_mode, rtx, bool);
 
+static bool visium_print_operand_punct_valid_p (unsigned char);
+static void visium_print_operand (FILE *, rtx, int);
+static void visium_print_operand_address (FILE *, machine_mode, rtx);
+
 static void visium_conditional_register_usage (void);
 
 static rtx visium_legitimize_address (rtx, rtx, enum machine_mode);
@@ -227,6 +230,13 @@ static unsigned int visium_reorg (void);
 #undef  TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P visium_legitimate_address_p
 
+#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P visium_print_operand_punct_valid_p
+#undef TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND visium_print_operand
+#undef TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS visium_print_operand_address
+
 #undef  TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE visium_attribute_table
 
@@ -3038,12 +3048,19 @@ output_cbranch (rtx label, enum rtx_code
   return output_branch (label, cond, insn);
 }
 
-/* Helper function for PRINT_OPERAND (STREAM, X, CODE).  Output to stdio
-   stream FILE the assembler syntax for an instruction operand OP subject
-   to the modifier LETTER.  */
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
 
-void
-print_operand (FILE *file, rtx op, int letter)
+static bool
+visium_print_operand_punct_valid_p (unsigned char code)
+{
+  return code == '#';
+}
+
+/* Implement TARGET_PRINT_OPERAND.  Output to stdio stream FILE the assembler
+   syntax for an instruction operand OP subject to the modifier LETTER.  */
+
+static void
+visium_print_operand (FILE *file, rtx op, int letter)
 {
   switch (letter)
 {
@@ -3104,7 +3121,7 @@ print_operand (FILE *file, rtx op, int l
   break;
 
 case MEM:
-  visium_output_address (file, GET_MODE (op), XEXP (op, 0));
+  visium_print_operand_address (file, GET_MODE (op), XEXP (op, 0));
   break;
 
 case CONST_INT:
@@ -3116,7 +3133,7 @@ print_operand (FILE *file, rtx op, int l
   break;
 
 case HIGH:
-  print_operand (file, XEXP (op, 1), letter);
+  visium_print_operand (file, XEXP (op, 1), letter);
   break;
 
 default:
@@ -3124,11 +3141,12 @@ print_operand (FILE *file, rtx op, int l
 }
 }
 
-/* Output to stdio stream FILE the assembler syntax for an instruction operand
-   that is a memory reference in MODE and whose address is ADDR.  */
+/* Implement TARGET_PRINT_OPERAND_ADDRESS.  Output to stdio stream FILE the
+   assembler syntax for an instruction operand that is a memory reference
+   whose address is ADDR.  */
 
 static void
-visium_output_address (FILE *file, enum machine_mode mode, rtx addr)
+visium_print_operand_address (FILE *file, enum machine_mode mode, rtx addr)
 {
   switch (GET_CODE (addr))
 {
@@ -3205,16 +3223,6 @@ visium_output_address (FILE *file, enum
 }
 }
 
-/* Helper function for PRINT_OPERAND_ADDRESS (STREAM, X).  Output to stdio
-   stream FILE the assembler syntax for an instruction operand that is a
-   memory reference whose address is ADDR.  */
-
-void
-print_operand_address (FILE *file, rtx addr)
-{
-  visium_output_address (file, QImode, addr);
-}
-
 /* The Visium stack frames look like:
 
   Before call  After call
Index: config/visium/visium.h
===
--- config/visium/visium.h	(revision 230016)
+++ config/visium/visium.h	(working copy)

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 03:07:21PM -0700, Jeff Law wrote:
> On 11/09/2015 02:30 PM, Trevor Saunders wrote:
> >
> >So in general when I've done cross target things I think I've found more
> >bugs with config-list.mk than with a regtest, but the regtest has found
> >some things I think.
> I'm finding config-list.mk fairly reliable, with the notable exception of
> the avr-rtems issue and interix.  But that may simply be function of running
> it regularly.

yeah, it's reliable, although I tend to find needing to have an installed
trunk compiler a little painful.  What I meant was that sometimes I've
made mistakes and introduced testsuite failures or the like.

> >
> >However I actually don't mind bootstrapping and regtesting that much,
> >it's more or less a few hours for the control and then another few for
> >each patch.
> I usually save my results and only go back for a control build if something
> goes wrong.  Of course I'm usually stepping forward at least once a day, so
> the number of new tests is usually manageable and allows me to compare the
> first run of the day with the last run of the prior day.

So my usual mode is to build up a branch just checking that a default
build works, and then at the end run a script that regtests all the
patches.  That can suffer from intermittent tests, but it's low human
input so I like it.

>   On the other hand config-list.mk takes on the order of 12
> >hours, and setting up a cross for a quick test isn't really that quick.
> >Which means that if you have a patch touching a number of targets you
> >end up not checking it compiles at all until you run config-list.mk, and
> >then it's a heavyweight operation.
> FWIW, If we know what ports a particular patch would hit, I'd fully support
> folks doing builds that didn't hit all of config-list.mk.

sure

> In case it's not obvious I do hope that we'll get to a point where the class
> of bugs like "X is unused on port PDQ because it defines/does not define
> FROBIT" just go away and we can get good first level coverage with a native
> and perhaps a very small number of crosses (instead of the 200+ in
> config-list.mk now).
> 
> At some point I also want to see config-list.mk extended to do things like
> "build the crosses and run test tree-ssa/ssa-dom-thread-11.c on all of
> them".  I've got hacks to do that locally, but they're strictly hacks.  I
> think this selectively deeper testing will become more important as we put
> the first level coverage behind us.

yeah, I'd actually like to see config-list.mk become part of the "real"
build system at some point and you could do something like ./configure
--target=i686-linux-gnu,ppc64-linux-gnu,alpha-dec-vms and stuff.

> >So at least for the way I work I'd really rather write series that I can
> >incrementally test on just one target and be reasonably confident they
> >won't break other targets.
> That generally works for me.
> 
> >
> >The add default macro definitions then wrap those with hooks, then
> >target by target replace the macro by hook overrides approach seems to
> >provide that you can incrementally test and find most of the issues,
> >but the change a macro everywhere approach doesn't really.
> I think Bernd and I just have different approaches, preferences and
> priorities on some stuff which results in slightly different priorities or
> approaches to certain issues.

Sure, we're all different ;)

> I've known Bernd a long time and will say he's very reasonable and his
> concerns/objections are well thought out and carry a ton of weight with me.

I don't really know him, but I don't really disagree with where he wants
to get to.  However I think we work fairly different ways, and review
things differently.  When I review patches (mostly for stuff more
directly related to Mozilla) my standards are basically that it needs to be
an improvement, and it needs to not introduce bugs.  So I find the "it might
improve things, but it doesn't also accomplish X" objection to be rather
odd, and hard to work with if I think getting directly to X might be hard.

Trev

> 
> Jeff


Re: [PATCH] c/67882 - improve -Warray-bounds for invalid offsetof

2015-11-09 Thread Martin Sebor

On 11/07/2015 04:38 PM, Segher Boessenkool wrote:

On Tue, Oct 20, 2015 at 10:10:44PM +, Joseph Myers wrote:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

 __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" }
 __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" }
 __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" }
 __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" }

Here I think the last one of these is most likely invalid (being 8 bytes past
the end of the object, rather than just one) and the others valid. Can you
confirm this? (If the &a.v[2].a example is considered invalid, then I think
the a5_7[5][0] test would be the equivalent and ought to also be considered
invalid).


The last one is certainly invalid.  The one before is arguably invalid as
well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to
a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5]
from array to pointer - an array expression gets converted to an
expression "that points to the initial element of the array object", but
there is no array object a5_7[5] here).


C11, 6.5.2.1/3:
Successive subscript operators designate an element of a
multidimensional array object. If E is an n-dimensional array (n >= 2)
with dimensions i x j x . . . x k, then E (used as other than an lvalue)
is converted to a pointer to an (n - 1)-dimensional array with
dimensions j x . . . x k. If the unary * operator is applied to this
pointer explicitly, or implicitly as a result of subscripting, the
result is the referenced (n - 1)-dimensional array, which itself is
converted into a pointer if used as other than an lvalue. It follows
from this that arrays are stored in row-major order (last subscript
varies fastest).

As far as I see, a5_7[5] here is never treated as an array, just as a
pointer, and &a5_7[5][0] is valid.


Segher and I discussed this briefly on IRC over the weekend and
I agreed to try to get a confirmation of the interpretation the
warning is based on from WG14. I'll report back what I learn
(if anything). I defer to Bernd and Joseph as to whether to make
any changes in the meantime.
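
For reference, a small sketch of the offsets involved (not part of the patch; the helper names are made up). It sticks to in-bounds subscripts, computes the end of the member from its size instead of writing a5_7[5][0], and relies on GCC accepting subscripts inside offsetof.

```c
#include <assert.h>
#include <stddef.h>

/* Same type as in the test case quoted above.  */
typedef struct FA5_7 {
  int i;
  char a5_7[5][7];
} FA5_7;

/* Offset of the last element, with both subscripts in bounds.  */
static size_t
last_element_offset (void)
{
  return offsetof (FA5_7, a5_7[4][6]);
}

/* Offset just past the member, with no out-of-bounds subscript.  */
static size_t
member_end_offset (void)
{
  return offsetof (FA5_7, a5_7) + sizeof (((FA5_7 *) 0)->a5_7);
}

/* Usage: the end offset is one byte past the last element, 35 bytes
   after the start of the member.  */
static void
offset_examples (void)
{
  assert (member_end_offset () == last_element_offset () + 1);
  assert (member_end_offset () - offsetof (FA5_7, a5_7) == 35);
}
```

The first, third and fourth expressions in the quoted test case name offsets at or beyond rows that this sketch only reaches through in-bounds subscripts or the member size.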

Martin


Re: Extend mathfn_built_in to handle combined_fn

2015-11-09 Thread Jeff Law

On 11/07/2015 05:52 AM, Richard Sandiford wrote:

This patch extends mathfn_built_in to handle combined_fn, but keeps the
old built_in_function interface around since it's a common case.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.h (mathfn_built_in): Add a variant that takes
a combined_fn.
* builtins.c: Include case-cfn-macros.h.
(CASE_MATHFN): Use CASE_CFN_*.
(CASE_MATHFN_REENT): Use CFN_ codes.
(mathfn_built_in_2, mathfn_built_in_1): Replace built_in_function
argument with a combined_fn.
(mathfn_built_in): Add a variant that takes a combined_fn.
(expand_builtin_int_roundingfn_2): Update callers accordingly.
(fold_builtin_sincos, fold_builtin_classify): Likewise.

OK.
jeff



Re: Make more use of combined_fn

2015-11-09 Thread Jeff Law

On 11/07/2015 05:44 AM, Richard Sandiford wrote:

This patch generalises fold-const.[hc] routines to use combined_fn
instead of built_in_function.  It also updates gimple-ssa-backprop.c
since the update is simple and it avoids churn on the call to
negate_mathfn_p.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard

[I've attached a -b form of the patch too since it's easier to read.]
Thanks for that.  I was thinking that would have made one of the later 
ones easier to read, but it was small enough not to really matter.






gcc/
* fold-const.h (negate_mathfn_p): Take a combined_fn rather
than a built_in_function.
(tree_call_nonnegative_warnv_p): Take a combined_fn rather than
a function decl.
(integer_valued_real_call_p): Likewise.
* fold-const.c: Include case-cfn-macros.h
(negate_mathfn_p): Take a combined_fn rather than a built_in_function.
(negate_expr_p): Update accordingly.
(tree_call_nonnegative_warnv_p): Take a combined_fn rather than
a function decl.
(integer_valued_real_call_p): Likewise.
(tree_invalid_nonnegative_warnv_p): Update accordingly.
(integer_valued_real_p): Likewise.
* gimple-fold.c (gimple_call_nonnegative_warnv_p): Update call
to tree_call_nonnegative_warnv_p.
(gimple_call_integer_valued_real_p): Likewise
integer_valued_real_call_p.
* gimple-ssa-backprop.c: Include case-cfn-macros.h.
(backprop::process_builtin_call_use): Extend to combined_fn.
(strip_sign_op_1): Likewise.
(backprop::process_use): Don't check for built-in calls here.
(backprop::execute): Likewise.
(backprop::optimize_builtin_call): Update call to negate_mathfn_p.


OK
jeff


Fix PR middle-end/68259

2015-11-09 Thread Eric Botcazou
It's another fallout of the scalar-storage-order merge in the form of a tree 
checking failure in reverse_storage_order_for_component_p with UBSan.
The tree generated by UBSan is arguably invalid (it's a COMPONENT_REF of a 
REFERENCE_TYPE) but the code already works around invalid trees generated in 
Fortran (COMPONENT_REF of a VOID_TYPE) and making it more robust is trivial.

Tested on x86_64-suse-linux, applied on the mainline as obvious.
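
To illustrate why the stricter guard is more robust, here is a minimal sketch using toy stand-ins rather than real GCC trees.  All the toy_* names are hypothetical; the checked accessor is modelled as throwing an exception where a real tree-checking build would abort.

```cpp
#include <cassert>
#include <stdexcept>

// Toy stand-ins for GCC tree type codes; illustrative names only.
enum toy_type_code
{
  TOY_VOID_TYPE,
  TOY_REFERENCE_TYPE,
  TOY_RECORD_TYPE,
  TOY_ARRAY_TYPE
};

struct toy_type
{
  toy_type_code code;
  bool reverse_storage_order;
};

// Models AGGREGATE_TYPE_P: arrays and record-like types.
inline bool toy_aggregate_type_p (const toy_type &t)
{
  return t.code == TOY_ARRAY_TYPE || t.code == TOY_RECORD_TYPE;
}

// Models a checked accessor: reading the flag from a non-aggregate
// type is a checking failure, represented here by an exception.
inline bool toy_reverse_storage_order (const toy_type &t)
{
  if (!toy_aggregate_type_p (t))
    throw std::logic_error ("tree check: expected aggregate type");
  return t.reverse_storage_order;
}

// Old guard: only rules out void, so a reference type reaches the accessor.
inline bool toy_old_guard (const toy_type &t)
{
  return t.code != TOY_VOID_TYPE && toy_reverse_storage_order (t);
}

// New guard: anything that is not an aggregate short-circuits to false.
inline bool toy_new_guard (const toy_type &t)
{
  return toy_aggregate_type_p (t) && toy_reverse_storage_order (t);
}

// True if the old guard trips the modelled checking failure for T.
inline bool toy_old_guard_fails (const toy_type &t)
{
  try
    {
      toy_old_guard (t);
      return false;
    }
  catch (const std::logic_error &)
    {
      return true;
    }
}

// Small self-check exercising both guards.
inline void toy_guard_self_check ()
{
  toy_type ref = { TOY_REFERENCE_TYPE, false };
  toy_type rec = { TOY_RECORD_TYPE, true };
  assert (toy_old_guard_fails (ref));   // old guard hits the checked accessor
  assert (!toy_new_guard (ref));        // new guard simply answers "no"
  assert (toy_new_guard (rec));         // aggregates behave as before
}
```

In this model the positive test also covers the Fortran VOID_TYPE case without a separate check, which is the point of switching predicates.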


2015-11-09  Eric Botcazou  

PR middle-end/68259
* tree.h (reverse_storage_order_for_component_p) :
Check that the type of the first operand is an aggregate type.


2015-11-09  Eric Botcazou  

* g++.dg/ubsan/pr68259.C: New test.

-- 
Eric Botcazou

Index: tree.h
===
--- tree.h	(revision 230016)
+++ tree.h	(working copy)
@@ -4387,8 +4387,9 @@ reverse_storage_order_for_component_p (t
 {
 case ARRAY_REF:
 case COMPONENT_REF:
-  /* ??? Fortran can take COMPONENT_REF of a void type.  */
-  return !VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 0)))
+  /* ??? Fortran can take COMPONENT_REF of a VOID_TYPE.  */
+  /* ??? UBSan can take COMPONENT_REF of a REFERENCE_TYPE.  */
+  return AGGREGATE_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 0)))
 	 && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (TREE_OPERAND (t, 0)));
 
 case BIT_FIELD_REF:
// PR middle-end/68259

// { dg-do compile }
// { dg-options "-fsanitize=undefined -w" }

namespace std {
  template < typename _Tp > class allocator { };
template < typename _Tp, typename _Alloc
= std::allocator < _Tp >
>class vector {
  public:
typedef _Tp value_type;
void push_back (const value_type & __x) { }
  };
}
class Foo;
class FooBar {
public:
Foo * primitive_context;
  FooBar () { }
  FooBar (const FooBar & pnhp);
};
template < class KEY, class CONTENT > class AVLTreeNode { };
template < class KEY, class CONTENT > class FooTree final
{
  FooBar insertPrimitive ();
public:
AVLTreeNode < KEY, CONTENT > *seek_no_lock (const KEY & key) { }
  void primitive_patterns ( std::vector < FooBar > &patterns);
};
template < class KEY, class CONTENT > void FooTree < KEY,
  CONTENT >::primitive_patterns ( std::vector < FooBar > &patterns)
{
patterns.push_back (insertPrimitive());
}
template < class KEY, class CONTENT >
FooBar FooTree < KEY, CONTENT >::insertPrimitive ()
{
  FooBar place;
  seek_no_lock (place.primitive_context);
  return place;
}
class ManuverResults { };
class opc_info_t
{
public:
FooTree < Foo *, ManuverResults > *primitivecache;
};
static void
do_optical_prox_corr_tsafe (opc_info_t * opc_info)
{
  std::vector < FooBar > patterns;
  opc_info->primitivecache->primitive_patterns (patterns);
}


Re: Use combined_fn in tree-vect-patterns.c

2015-11-09 Thread Jeff Law

On 11/07/2015 05:50 AM, Richard Sandiford wrote:

Another patch to extend uses of built_in_function to combined_fn,
this time in tree-vect-patterns.c.  The old code didn't handle the
long double pow variants, but I think that's because no one had a target
that would benefit rather than because the code would mishandle them.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* tree-vect-patterns.c: Include case-cfn-macros.h.
(vect_recog_pow_pattern): Use combined_fn instead of built-in codes.

OK.
jeff



Re: Use combined_fn in tree-ssa-math-opts.c

2015-11-09 Thread Jeff Law

On 11/07/2015 05:49 AM, Richard Sandiford wrote:

Another patch to extend uses of built_in_function to combined_fn, this time
in tree-ssa-math-opts.c.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* tree-ssa-math-opts.c: Include case-cfn-macros.h.
(execute_cse_sincos_1): Use combined_fn instead of built-in codes.
(pass_cse_sincos::execute): Likewise.

OK.
jeff



Re: Use combined_fn in tree-ssa-reassoc.c

2015-11-09 Thread Jeff Law

On 11/07/2015 05:48 AM, Richard Sandiford wrote:

Another patch to extend uses of built_in_function to combined_fn, this time
in tree-ssa-reassoc.c.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* tree-ssa-reassoc.c: Include case-cfn-macros.h.
(stmt_is_power_of_op): Use combined_fn instead of built-in codes.
(decrement_power, acceptable_pow_call): Likewise.
(attempt_builtin_copysign): Likewise.

OK.
jeff



Re: Use combined_fn in tree-vrp.c

2015-11-09 Thread Jeff Law

On 11/07/2015 05:46 AM, Richard Sandiford wrote:

Another patch to extend uses of built_in_function to combined_fn, this time
in tree-vrp.c.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* tree-vrp.c: Include case-cfn-macros.h.
(extract_range_basic): Switch on combined_fn rather than handling
built-in functions and internal functions separately.

OK.
jeff



Re: Add gencfn-macros.c

2015-11-09 Thread Jeff Law

On 11/07/2015 05:35 AM, Richard Sandiford wrote:

This patch automatically generates case macros such as:

 CASE_CFN_SQRT

for each {F,,L} floating-point built-in function and each {,L,LL,IMAX}
integer built-in function.  The macros match the same built-in
functions as CASE_FLT_FN and CASE_INT_FN but in addition include
the associated internal function, if any.

The idea is to make sure that users of combined_fn don't need to know
which built-in functions have internal-function equivalents.  If we add
a new function to internal-fn.def, all combined_fn users should pick it
up automatically.
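
As a rough illustration of the idea, a generated case macro might look like the sketch below.  The enum and the TOY_* names are hypothetical stand-ins, not the real combined_fn enumerators or the actual output of the generator.

```cpp
#include <cassert>

// Hypothetical stand-in for combined_fn: one enum covering both
// built-in functions and internal functions.
enum toy_combined_fn
{
  TOY_CFN_BUILT_IN_SQRTF,
  TOY_CFN_BUILT_IN_SQRT,
  TOY_CFN_BUILT_IN_SQRTL,
  TOY_CFN_SQRT,              // the associated internal function
  TOY_CFN_BUILT_IN_SIN
};

// Generated-style case macro: the {F,,L} built-in variants plus the
// internal function, so users write a single label.
#define TOY_CASE_CFN_SQRT \
  case TOY_CFN_BUILT_IN_SQRTF: \
  case TOY_CFN_BUILT_IN_SQRT: \
  case TOY_CFN_BUILT_IN_SQRTL: \
  case TOY_CFN_SQRT

// Returns true if FN is any flavour of square root.
inline bool toy_is_sqrt (toy_combined_fn fn)
{
  switch (fn)
    {
    TOY_CASE_CFN_SQRT:
      return true;
    default:
      return false;
    }
}

// Small self-check: built-in and internal forms are matched alike.
inline void toy_case_macro_self_check ()
{
  assert (toy_is_sqrt (TOY_CFN_BUILT_IN_SQRTF));
  assert (toy_is_sqrt (TOY_CFN_SQRT));
  assert (!toy_is_sqrt (TOY_CFN_BUILT_IN_SIN));
}
```

The switch never names the internal function directly, so adding one later only requires regenerating the macro.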

The generator wants to use "hash_set ",
so the patch follows hash_map in using the types given by the
traits as the key.  This is a no-op for current users of hash_set.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* Makefile.in (HASH_TABLE_H): Add GGC_H.
(MOSTLYCLEANFILES, generated_files): Add case-cfn-macros.h.
(s-case-cfn-macros, case-cfn-macros.h, build/gencfn-macros.o)
(build/gencfn-macros$(build_exeext)): New rules.
(genprogerr): Add cfn-macros.
* hash-set.h (hash_set): Use the traits value_type as the key.
* gencfn-macros.c: New file.

OK.

Jeff



Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread David Edelsohn
On Mon, Nov 9, 2015 at 2:17 PM, Michael Meissner
 wrote:
> On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
>> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
>>  wrote:
>> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> >> > > +(define_insn "*toc_fusionload_"
>> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> >> > > +  "TARGET_TOC_FUSION_INT"
>> >> >
>> >> > Do you need that "??r" alternative?  Same for the next define_insn.
>> >>
>> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> >> base register, and it can't be used for power8 gpr fusion (where you use the
>> >> value being loaded for the ADDIS instruction), but it can be used for power9
>> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> >> register being loaded).
>> >
>> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
>> > you generating this pattern from somewhere else where you put in r0?
>>
>> Mike,
>>
>> What happens if you leave out the "r" alternative?  Does other code
>> explicitly generate that pattern with r0?
>
> Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
> to redo the register allocation, and I would see a failure in building things
> like Spec 2006.  I have tried not putting the "r" in there, or using
> base_reg_operand instead of gpc_reg_operand, but I still got failures.

This seems like a bug in those other passes that should be tracked down.

Thanks, David


Re: Extend fold_const_call to combined_fn

2015-11-09 Thread Jeff Law

On 11/07/2015 05:37 AM, Richard Sandiford wrote:

This patch extends fold_const_call so that it can handle internal
as well as built-in functions.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* fold-const-call.h (fold_const_call): Replace built_in_function
arguments with combined_fn arguments.
* fold-const-call.c: Include case-cfn-macros.h.
(fold_const_call_ss, fold_const_call_cs, fold_const_call_sc)
(fold_const_call_cc, fold_const_call_sss, fold_const_call_ccc)
(fold_const_call_, fold_const_call_1, fold_const_call): Replace
built_in_function arguments with combined_fn arguments.
* builtins.c (fold_builtin_sincos, fold_builtin_1, fold_builtin_2)
(fold_builtin_3): Update calls to fold_const_call.

OK.
jeff



Re: Add internal bitcount functions

2015-11-09 Thread Jeff Law

On 11/07/2015 05:32 AM, Richard Sandiford wrote:

This patch adds internal function equivalents of all the INT_FN functions.
Unlike the math functions, these functions never set errno and the internal
functions should be exactly equivalent to the built-in ones.  The reason
for defining the internal functions is so that we can extend the
functionality to other modes, in particular vector modes.
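
For reference, the scalar built-ins that these internal functions mirror can be exercised directly.  This is only a sketch of the existing built-in behaviour, assuming a GCC-compatible compiler; the toy_* wrappers are hypothetical names and nothing here uses the new internal functions themselves.

```cpp
#include <cassert>

// Thin wrappers around GCC's documented integer built-ins.  Note that
// __builtin_ctz is undefined for a zero argument, so callers must not
// pass 0 to toy_ctz.
inline int toy_popcount (unsigned x) { return __builtin_popcount (x); }
inline int toy_parity (unsigned x) { return __builtin_parity (x); }
inline int toy_ffs (int x) { return __builtin_ffs (x); }
inline int toy_ctz (unsigned x) { return __builtin_ctz (x); }

// Small self-check on a few easy values.
inline void toy_bitcount_self_check ()
{
  assert (toy_popcount (0xF0u) == 4);  // four bits set
  assert (toy_parity (0x7u) == 1);     // odd number of bits set
  assert (toy_ffs (0x8) == 4);         // one-based index of lowest set bit
  assert (toy_ffs (0) == 0);           // ffs is defined for zero
  assert (toy_ctz (0x8u) == 3);        // three trailing zero bits
}
```

None of these calls touches errno, which is why the internal and built-in forms can be treated as interchangeable.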

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* internal-fn.def (DEF_INTERNAL_INT_FN): New macro.
(CLRSB, CLZ, CTZ, FFS, PARITY, POPCOUNT): New functions.
* builtins.c (associated_internal_fn): Handle them.

OK.
jeff



Re: Add internal math functions

2015-11-09 Thread Jeff Law

On 11/07/2015 05:30 AM, Richard Sandiford wrote:

This patch adds internal functions for simple FLT_FN built-in functions,
in cases where an associated optab already exists.  Unlike some of the
built-in functions, these internal functions never set errno.

LDEXP is the odd one out in that its second operand is an integer.
All the others operate on uniform types.

The patch also adds a function to query the internal function associated
with a built-in function (if any), and another to test whether a given
gcall could be replaced by a call to an internal function on the current
target (as long as the caller deals with errno appropriately).

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.h (associated_internal_fn): Declare.
(replacement_internal_fn): Likewise.
* builtins.c: Include internal-fn.h
(associated_internal_fn, replacement_internal_fn): New functions.
* internal-fn.def (DEF_INTERNAL_FLT_FN): New macro.
(ACOS, ASIN, ATAN, COS, EXP, EXP10, EXP2, EXPM1, LOG, LOG10, LOG1P)
(LOG2, LOGB, SIGNIFICAND, SIN, SQRT, TAN, CEIL, FLOOR, NEARBYINT)
(RINT, ROUND, TRUNC, ATAN2, COPYSIGN, FMOD, POW, REMAINDER, SCALB)
(LDEXP): New functions.
* internal-fn.c: Include recog.h.
(unary_direct, binary_direct): New macros.
(expand_direct_optab_fn): New function.
(expand_unary_optab_fn): New macro.
(expand_binary_optab_fn): Likewise.
(direct_unary_optab_supported_p): Likewise.
(direct_binary_optab_supported_p): Likewise.





diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 72536da..9f9f9cf 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "dojump.h"
  #include "expr.h"
  #include "ubsan.h"
+#include "recog.h"

recog.h?  I don't see anything that would require recognition in this 
patch.  Is there something in a later patch that needs the recog.h header?


If you don't need recog.h, then drop it.

OK for the trunk.

jeff



Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread Michael Meissner
On Mon, Nov 09, 2015 at 01:11:41PM -0800, David Edelsohn wrote:
> On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
>  wrote:
> > On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> >> > > +(define_insn "*toc_fusionload_"
> >> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> >> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> >> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> >> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> >> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> >> > > +  "TARGET_TOC_FUSION_INT"
> >> >
> >> > Do you need that "??r" alternative?  Same for the next define_insn.
> >>
> >> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> >> base register, and it can't be used for power8 gpr fusion (where you use the
> >> value being loaded for the ADDIS instruction), but it can be used for power9
> >> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> >> register being loaded).
> >
> > If you have only "b", r0 will not be chosen.  Does that help?  Or are
> > you generating this pattern from somewhere else where you put in r0?
> 
> Mike,
> 
> What happens if you leave out the "r" alternative?  Does other code
> explicitly generate that pattern with r0?

Sometimes, one of the passes after reload (usually -fgcse-after-reload) decides
to redo the register allocation, and I would see a failure in building things
like Spec 2006.  I have tried not putting the "r" in there, or using
base_reg_operand instead of gpc_reg_operand, but I still got failures.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: RFC: C++ delayed folding merge

2015-11-09 Thread Eric Botcazou
> Right, the change is just to the C++ front end 'convert'.

OK, thanks for the clarification.

-- 
Eric Botcazou


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Jeff Law

On 11/09/2015 02:30 PM, Trevor Saunders wrote:


So in general when I've done cross target things I think I've found more
bugs with config-list.mk than with a regtest, but the regtest has found
some things I think.

I'm finding config-list.mk fairly reliable, with the notable exception 
of the avr-rtems issue and interix.  But that may simply be a function of 
running it regularly.




However I actually don't mind bootstrapping and regtesting that much,
it's more or less a few hours for the control and then another few for
each patch.

I usually save my results and only go back for a control build if 
something goes wrong.  Of course I'm usually stepping forward at least 
once a day, so the number of new tests is usually manageable and allows 
me to compare the first run of the day with the last run of the prior day.




On the other hand config-list.mk takes on the order of 12
hours, and setting up a cross for a quick test isn't really that quick.
Which means that if you have a patch touching a number of targets you
end up not checking it compiles at all until you run config-list.mk, and
then it's a heavyweight operation.

FWIW, if we know what ports a particular patch would hit, I'd fully 
support folks doing builds that didn't hit all of config-list.mk.


In case it's not obvious I do hope that we'll get to a point where the 
class of bugs like "X is unused on port PDQ because it defines/does not 
define FROBIT" just go away and we can get good first level coverage 
with a native and perhaps a very small number of crosses (instead of the 
200+ in config-list.mk now).


At some point I also want to see config-list.mk extended to do things 
like "build the crosses and run test tree-ssa/ssa-dom-thread-11.c on all 
of them".  I've got hacks to do that locally, but they're strictly 
hacks.  I think this selectively deeper testing will become more 
important as we put the first level coverage behind us.





So at least for the way I work I'd really rather write series that I can
incrementally test on just one target and be reasonably confident they
won't break other targets.

That generally works for me.



The add default macro definitions then wrap those with hooks, then
target by target replace the macro by hook overrides approach seems to
provide that you can incrementally test and find most of the issues,
but the change a macro everywhere approach doesn't really.

I think Bernd and I just have different approaches, preferences and 
priorities on some stuff which results in slightly different priorities 
or approaches to certain issues.


I've known Bernd a long time and will say he's very reasonable and his 
concerns/objections are well thought out and carry a ton of weight with me.


Jeff


Remove unused openacc call

2015-11-09 Thread Nathan Sidwell
I've committed this to trunk.  It nukes the now-unused GOACC_GET_NUM_THREADS and 
GOACC_GET_THREAD_NUM calls.  Also fixed up some comment typos I noticed.



nathan

2015-11-09  Nathan Sidwell  

	* omp-low.c: Fix some OpenACC comment typos.
	(lower_reduction_clauses): Remove BUILT_IN_GOACC_GET_THREAD_NUM call.
	* omp-builtins.def (BUILT_IN_GOACC_GET_THREAD_NUM,
	BUILT_IN_GOACC_GET_NUM_THREADS): Delete.

Index: omp-low.c
===
--- omp-low.c	(revision 230038)
+++ omp-low.c	(working copy)
@@ -5559,7 +5559,7 @@ lower_reduction_clauses (tree clauses, g
 {
   gimple_seq sub_seq = NULL;
   gimple *stmt;
-  tree x, c, tid = NULL_TREE;
+  tree x, c;
   int count = 0;
 
   /* OpenACC loop reductions are handled elsewhere.  */
@@ -5589,17 +5589,6 @@ lower_reduction_clauses (tree clauses, g
   if (count == 0)
 return;
 
-  /* Initialize thread info for OpenACC.  */
-  if (is_gimple_omp_oacc (ctx->stmt))
-{
-  /* Get the current thread id.  */
-  tree call = builtin_decl_explicit (BUILT_IN_GOACC_GET_THREAD_NUM);
-  tid = create_tmp_var (TREE_TYPE (TREE_TYPE (call)));
-  gimple *stmt = gimple_build_call (call, 0);
-  gimple_call_set_lhs (stmt, tid);
-  gimple_seq_add_stmt (stmt_seqp, stmt);
-}
-
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
 {
   tree var, ref, new_var, orig_var;
@@ -12266,7 +12255,7 @@ expand_omp_atomic (struct omp_region *re
 }
 
 
-/* Encode an oacc launc argument.  This matches the GOMP_LAUNCH_PACK
+/* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
macro on gomp-constants.h.  We do not check for overflow.  */
 
 static tree
@@ -12292,7 +12281,7 @@ oacc_launch_pack (unsigned code, tree de
 
The attribute value is a TREE_LIST.  A set of dimensions is
represented as a list of INTEGER_CST.  Those that are runtime
-   expres are represented as an INTEGER_CST of zero.
+   exprs are represented as an INTEGER_CST of zero.
 
TOOO. Normally the attribute will just contain a single such list.  If
however it contains a list of lists, this will represent the use of
@@ -14311,7 +14300,7 @@ lower_omp_for (gimple_stmt_iterator *gsi
 			  gimple_omp_for_clauses (stmt),
 			  &oacc_head, &oacc_tail, ctx);
 
-  /* Add OpenACC partitioning markers just before the loop  */
+  /* Add OpenACC partitioning and reduction markers just before the loop  */
   if (oacc_head)
 gimple_seq_add_seq (&body, oacc_head);
   
@@ -19524,7 +19513,7 @@ public:
   return execute_oacc_device_lower ();
 }
 
-}; // class pass_oacc_transform
+}; // class pass_oacc_device_lower
 
 } // anon namespace
 
Index: omp-builtins.def
===
--- omp-builtins.def	(revision 230038)
+++ omp-builtins.def	(working copy)
@@ -47,10 +47,6 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_UPDATE
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, "GOACC_wait",
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_THREAD_NUM, "GOACC_get_thread_num",
-		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_NUM_THREADS, "GOACC_get_num_threads",
-		   BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST)
 
 DEF_GOACC_BUILTIN_COMPILER (BUILT_IN_ACC_ON_DEVICE, "acc_on_device",
 			BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST)


Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 08:37:19PM +0100, Bernd Schmidt wrote:
> On 11/09/2015 08:29 PM, Trevor Saunders wrote:
> >as I said in 0/12 this did go through config-list.mk, and checking again
> >this does build on alpha-dec-vms.
> 
> The question I have is - why does it build on any other target? It's the
> reference that's unconditional, not the definition. Do we have enough DCE at
> -O0 to eliminate the reference? It's still incorrect IMO (and should be
> fixed in the other patches as well).

dce would be my guess.  I guess going back to #if ing the bits that
reference it, and then incrementally removing the #ifs starting with the
ones defining the functions used in the structs, but given you seem to
be against patches that only change ifdef to #if you might not like that
:(

> >
> >I'd actually really rather review them, or really deal with them in any
> >way, the way they are.  Smaller simpler patches that only deal with one
> >thing are much better.  I think the most macros that appear on one line
> >are 2, so at most you could lower that to 1 change instead of 2, but who
> >really cares anyway?
> 
> Well, I do, because I get to see this stuff:
> 
> -#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \
> +#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>   + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO)
> \
>   + defined (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> - + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO)
> \
> +  + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>   + defined (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>+ defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
> - + defined (VMS_DEBUGGING_INFO))
> +  + (VMS_DEBUGGING_INFO))
> 
>  #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> -  + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
> +  + (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>+ (VMS_DEBUGGING_INFO))
> 
> -#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
> +#if 1 < ((DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
>+ (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
>+ (VMS_DEBUGGING_INFO))
> 
> etc.

other than reading this now I'm not sure what the context would be, but
either way personally I really don't mind reading that, and think it's
simpler to reason about the correctness of one thing at a time.
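
For anyone skimming, the end state of this kind of series can be sketched as below.  The TOY_* macros are hypothetical stand-ins for the real *_DEBUGGING_INFO macros, and the values are made up for illustration.

```cpp
#include <cassert>

// Hypothetical configuration: every macro is always defined, to 0 or 1,
// instead of being either defined or left undefined.
#define TOY_DBX_DEBUGGING_INFO 1
#define TOY_SDB_DEBUGGING_INFO 0
#define TOY_DWARF2_DEBUGGING_INFO 1

// With always-defined macros the count is plain arithmetic, with no
// need for defined ().
#if 1 < ((TOY_DBX_DEBUGGING_INFO) + (TOY_SDB_DEBUGGING_INFO) \
	 + (TOY_DWARF2_DEBUGGING_INFO))
# define TOY_MULTIPLE_DEBUG_FORMATS 1
#else
# define TOY_MULTIPLE_DEBUG_FORMATS 0
#endif

// Ordinary code can test the value too, so both arms are always
// parsed and type-checked on every target.
inline bool toy_uses_sdb ()
{
  if (TOY_SDB_DEBUGGING_INFO)
    return true;
  return false;
}

// Small self-check for the configuration above.
inline void toy_macro_self_check ()
{
  assert (TOY_MULTIPLE_DEBUG_FORMATS == 1);
  assert (!toy_uses_sdb ());
}
```

The catch Bernd raises still applies to this style: code in the disabled arm must refer only to things that are declared on every target, since the compiler still has to parse it.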

Trev

> 
> 
> Bernd


Re: [patch] New remainder of front end header reduction

2015-11-09 Thread Jeff Law

On 11/02/2015 06:58 AM, Andrew MacLeod wrote:

On 11/02/2015 01:41 AM, Jeff Law wrote:

On 10/30/2015 07:37 AM, Andrew MacLeod wrote:

OK, here's the much delayed front end reduction patch based on the
reordering already being checked in.

I discovered that my target builds were only building c/c++, so the
other languages were being reduced based only on the host
x86_64-pc-linux-gnu build.   That's *probably* ok, but I wanted to be
sure.  This is when I discovered that the other languages have varying
amounts of support amongst the targets. Simply building all the targets
to compile, say ada, doesn't actually work quite right.

So this patch covers all the languages which do have full support.. the
ones enabled by 'all' languages.

I am determining which targets build the other languages now, and will
submit separate reduction patches for those languages.


Here's the rest of the front end files.  I will temporarily hold off
checking in the other front end file due to the ENABLE_OFFLOADING and
ENABLE_FOLD_CHECKING issues brought up.  I'll rerun the tool with all
the ENABLE_* macros I can fine predefined in the tool to avoid this
issue, and adjust he front end patches if necessary.. ie, if the tool
finds a reduction that shouldnt happen.

Anyway, here's the rest of the header files which should be the final
patch  I ran the tool on the coverage components of config-list.mk that
supported each of the languages, then did a full build of all targets.

bootstraps on x86_64-pc-linux-gnu with no new regressions, and passes
all of config-list.mk

The remaining header file removals are fine.

jeff


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 12:46:33PM -0700, Jeff Law wrote:
> On 11/09/2015 12:38 PM, Bernd Schmidt wrote:
> >On 11/09/2015 07:52 PM, Trevor Saunders wrote:
> >
> >>yeah, that's more or less my thought, and this makes hookization easier
> >>since you can now mechanically add a hook for each thing in defaults.h
> >>that invokes the macro.  Then for each target you can go through and
> >>replace the macro with an override of the hooks.  That ends up with the
> >>macros replaced by hooks without writing a lot of patches that need to
> >>go through config-list.mk, and testing on multiple targets which imho is
> >>a giant pain, and rather slow.
> >
> >We might want to think about making a policy decision to try waiving
> >some of the testing requirements for target macro -> hook conversions.
> >Maybe try only a "build to cc1" requirement and see whether that causes
> >too much breakage.
> A config-list.mk build is a build to cc1*, f951, gnat1, so we're not
> requiring deep tests on the affected targets.  Not sure how much we're
> getting by forcing a bootstrap & regression test of that kind of change.

So in general when I've done cross target things I think I've found more
bugs with config-list.mk than with a regtest, but the regtest has found
some things I think.

However I actually don't mind bootstrapping and regtesting that much,
it's more or less a few hours for the control and then another few for
each patch.  On the other hand config-list.mk takes on the order of 12
hours, and setting up a cross for a quick test isn't really that quick.
Which means that if you have a patch touching a number of targets you
end up not checking it compiles at all until you run config-list.mk, and
then it's a heavyweight operation.

So at least for the way I work I'd really rather write series that I can
incrementally test on just one target and be reasonably confident they
won't break other targets.

The add default macro definitions then wrap those with hooks, then
target by target replace the macro by hook overrides approach seems to
provide that you can incrementally test and find most of the issues,
but the change a macro everywhere approach doesn't really.

Trev

The add default macros then use those in hooks, and finally add overides
> 
> I'm certainly open to this kind of relaxed testing to help this stuff move
> forward and complete before we're all retired :-)
> 
> Jeff
> 


Remove instantiations when no concept check

2015-11-09 Thread François Dumont
Hi

I just committed this trivial cleanup.

2015-11-09  François Dumont  

* include/bits/stl_algo.h
(partial_sort_copy): Instantiate std::iterator_traits only if concept
checks.
(lower_bound): Likewise.
(upper_bound): Likewise.
(equal_range): Likewise.
(binary_search): Likewise.
* include/bits/stl_heap.h (pop_heap): Likewise.

François

diff --git libstdc++-v3/include/bits/stl_algo.h libstdc++-v3/include/bits/stl_algo.h
index c90f479..6037044 100644
--- libstdc++-v3/include/bits/stl_algo.h
+++ libstdc++-v3/include/bits/stl_algo.h
@@ -1735,12 +1735,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _RandomAccessIterator __result_first,
 		  _RandomAccessIterator __result_last)
 {
+#ifdef _GLIBCXX_CONCEPT_CHECKS
   typedef typename iterator_traits<_InputIterator>::value_type
 	_InputValueType;
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
 	_OutputValueType;
-  typedef typename iterator_traits<_RandomAccessIterator>::difference_type
-	_DistanceType;
+#endif
 
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
@@ -1786,12 +1786,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  _RandomAccessIterator __result_last,
 		  _Compare __comp)
 {
+#ifdef _GLIBCXX_CONCEPT_CHECKS
   typedef typename iterator_traits<_InputIterator>::value_type
 	_InputValueType;
   typedef typename iterator_traits<_RandomAccessIterator>::value_type
 	_OutputValueType;
-  typedef typename iterator_traits<_RandomAccessIterator>::difference_type
-	_DistanceType;
+#endif
 
   // concept requirements
   __glibcxx_function_requires(_InputIteratorConcept<_InputIterator>)
@@ -2020,13 +2020,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 lower_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
   __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
-  _ValueType, _Tp>)
+	typename iterator_traits<_ForwardIterator>::value_type, _Tp>)
   __glibcxx_requires_partitioned_lower_pred(__first, __last,
 		__val, __comp);
   __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
@@ -2078,12 +2075,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 upper_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
-  __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
   __glibcxx_requires_irreflexive2(__first, __last);
 
@@ -2111,13 +2106,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 upper_bound(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
   __glibcxx_function_requires(_BinaryPredicateConcept<_Compare,
-  _Tp, _ValueType>)
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_upper_pred(__first, __last,
 		__val, __comp);
   __glibcxx_requires_irreflexive_pred2(__first, __last, __comp);
@@ -2186,13 +2178,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 equal_range(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator>)
-  __glibcxx_function_requires(_LessThanOpConcept<_ValueType, _Tp>)
-  __glibcxx_function_requires(_LessThanOpConcept<_Tp, _ValueType>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	typename iterator_traits<_ForwardIterator>::value_type, _Tp>)
+  __glibcxx_function_requires(_LessThanOpConcept<
+	_Tp, typename iterator_traits<_ForwardIterator>::value_type>)
   __glibcxx_requires_partitioned_lower(__first, __last, __val);
   __glibcxx_requires_partitioned_upper(__first, __last, __val);
   __glibcxx_requires_irreflexive2(__first, __last);
@@ -2224,15 +2215,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 equal_range(_ForwardIterator __first, _ForwardIterator __last,
 		const _Tp& __val, _Compare __comp)
 {
-  typedef typename iterator_traits<_ForwardIterator>::value_type
-	_ValueType;
-
   // concept requirements
   __glibcxx_function_requires(_Fo

Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread David Edelsohn
On Mon, Nov 9, 2015 at 11:57 AM, Segher Boessenkool
 wrote:
> On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
>> > > +(define_insn "*toc_fusionload_"
>> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
>> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
>> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
>> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
>> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
>> > > +  "TARGET_TOC_FUSION_INT"
>> >
>> > Do you need that "??r" alternative?  Same for the next define_insn.
>>
>> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
>> base register, and it can't be used for power8 gpr fusion (where you use the
>> value being loaded for the ADDIS instruction), but it can be used for power9
>> fusion (where the ADDIS must be adjacent, but it no longer has to be the
>> register being loaded).
>
> If you have only "b", r0 will not be chosen.  Does that help?  Or are
> you generating this pattern from somewhere else where you put in r0?

Mike,

What happens if you leave out the "r" alternative?  Does other code
explicitly generate that pattern with r0?

Thanks, David


Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Steve Ellcey
On Mon, 2015-11-09 at 21:47 +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:59 PM, Steve Ellcey wrote:
> > Here is a version with the code moved into a new function.  How does
> > this look?
> >
> > 2015-11-09  Steve Ellcey  
> >
> > * optabs.c (prepare_libcall_arg): New function.
> > (expand_fixed_convert): Add call to prepare_libcall_arg.
> 
> Hold on a moment - I see that emit_library_call_value_1 calls 
> promote_function_mode for arguments. Can you investigate why that 
> doesn't do what you need?
> 
> 
> Bernd


emit_library_call_value_1 has no way of knowing whether the promotion
should be signed or unsigned.  It has a mode (probably QImode or HImode)
that it knows may need to be promoted to SImode, but it has no tree type
information about the library call argument types.

Right now it guesses based on the return type but it may guess wrong
when converting an unsigned int to a signed fixed type or vice versa.

By doing the promotion in expand_fixed_convert GCC can use the uintp
argument to ensure that the signedness of the promotion is done
correctly.  We could pass that argument into emit_library_call_value_1
so it can do the correct promotion but that would require changing the
argument list for emit_library_call and emit_library_call_value_1 and
changing all the other call locations for those functions and that
seemed like overkill.
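
To illustrate why the signedness matters, here is a small Python model of
widening an 8-bit value to 32 bits.  This is not GCC code; promote and its
parameters are hypothetical names.  The same QImode bit pattern yields
different SImode values depending on the unsignedp flag, which is the
information the uintp argument supplies:

```python
def promote(value, from_bits, to_bits, unsignedp):
    """Widen the low from_bits of value to to_bits.

    Zero-extends when unsignedp is true, sign-extends otherwise.
    The result is returned as an unsigned to_bits-wide bit pattern.
    """
    raw = value & ((1 << from_bits) - 1)
    if not unsignedp and raw & (1 << (from_bits - 1)):
        # Sign bit is set: reinterpret the narrow value as negative.
        raw -= 1 << from_bits
    return raw & ((1 << to_bits) - 1)


qi_value = 0xF0                                  # an 8-bit argument
zero_extended = promote(qi_value, 8, 32, True)   # treated as unsigned 240
sign_extended = promote(qi_value, 8, 32, False)  # treated as signed -16
```

A callee that expects an unsigned argument but receives the sign-extended
pattern sees a very large value instead of 240, which is the kind of wrong
guess described above.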

Steve Ellcey



Re: [PATCH 01/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_ARG_POINTER

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 09:58 PM, Trevor Saunders wrote:

> With the exception of the emit-rtl.c hunk I think I've correctly
> convinced myself this macro is just an optimization.


I also looked at that one and initially thought that it can simply go 
away, but the earlier test for FRAME_POINTER_REGNUM also has some extra 
reload_completed etc. conditions.


So I think this:

!  if (!HARD_FRAME_POINTER_IS_FRAME_POINTER
  && regno == HARD_FRAME_POINTER_REGNUM
  && (!reload_completed || frame_pointer_needed))
return hard_frame_pointer_rtx;
-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
  && regno == ARG_POINTER_REGNUM)
return arg_pointer_rtx;
-#endif

should become

! if (HARD_FRAME_POINTER_REGNUM != FRAME_POINTER_REGNUM
  && regno == HARD_FRAME_POINTER_REGNUM
  && (!reload_completed || frame_pointer_needed))
return hard_frame_pointer_rtx;
  if (FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
+ && HARD_FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
  && regno == ARG_POINTER_REGNUM)
return arg_pointer_rtx;
#endif

and then it should be possible to eliminate the two X_IS_Y macros.
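
As a sanity check of that rewrite, here is a small Python model of the two
tests.  It is only a sketch: it assumes the X_IS_Y macros mean plain equality
of the register numbers, it folds (!reload_completed || frame_pointer_needed)
into one boolean called fp_live, and it ignores the earlier
FRAME_POINTER_REGNUM test.  The function names are made up.

```python
from itertools import product


def old_lookup(regno, fp, hfp, arg, fp_live):
    """Current code, with the X_IS_Y macros spelled out as comparisons."""
    hfp_is_fp = (hfp == fp)    # assumed HARD_FRAME_POINTER_IS_FRAME_POINTER
    hfp_is_arg = (hfp == arg)  # assumed HARD_FRAME_POINTER_IS_ARG_POINTER
    if not hfp_is_fp and regno == hfp and fp_live:
        return "hard_frame_pointer_rtx"
    if not hfp_is_arg:         # stands in for the #if ... #endif guard
        if fp != arg and regno == arg:
            return "arg_pointer_rtx"
    return None


def new_lookup(regno, fp, hfp, arg, fp_live):
    """Proposed code: the guard becomes part of the runtime condition."""
    if hfp != fp and regno == hfp and fp_live:
        return "hard_frame_pointer_rtx"
    if fp != arg and hfp != arg and regno == arg:
        return "arg_pointer_rtx"
    return None


def lookups_agree(num_regs=4):
    """Compare both versions over every small register assignment."""
    regs = range(num_regs)
    return all(
        old_lookup(r, fp, hfp, arg, live) == new_lookup(r, fp, hfp, arg, live)
        for r, fp, hfp, arg in product(regs, repeat=4)
        for live in (False, True)
    )
```

Under those assumptions lookups_agree() checks every combination of four
register numbers, including the case where the hard frame pointer and the
arg pointer coincide but the first test does not fire.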


Bernd


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Mike Stump
On Nov 9, 2015, at 11:46 AM, Jeff Law  wrote:
> On 11/09/2015 12:38 PM, Bernd Schmidt wrote:
>> We might want to think about making a policy decision to try waiving
>> some of the testing requirements for target macro -> hook conversions.
>> Maybe try only a "build to cc1" requirement and see whether that causes
>> too much breakage.
> A config-list.mk build is a build to cc1*, f951, gnat1, so we're not 
> requiring deep tests on the affected targets.  Not sure how much we're 
> getting by forcing a bootstrap & regression test of that kind of change.
> 
> I'm certainly open to this kind of relaxed testing to help this stuff move 
> forward and complete before we're all retired :-)

Testing is a cornerstone of gcc quality.  I like it.  It is useful.  That said, 
I don’t think we should always be fanatical about it.  How and when we accept 
less than a standard bootstrap and regression test run would, I’m sure, be a big 
topic, but rather than make a ton of rules, I’d rather let a small handful of 
reviewers decide when and how to accept less, and let them do what they want.  
We can give them negative feedback if it impacts too many people, too often and 
they can adjust.

The other way would be to have an integration branch that is tested and merged 
post testing on a regular basis and let people contribute less than well tested 
things on it, the idea being that it still won’t hit trunk until after a 
bootstrap and test suite run, but that we can bundle 2-100 patches into one 
test suite run.  This strikes me as more scalable, easier for developers and 
removes the requirement of test suite + bootstrap before checkin while 
retaining the useful quality of everything merged to trunk is tested.  Hardest 
part about this would be ChangeLogs, merge resolution and svn blame.  git 
handles this gracefully.  svn as I recall, a little less so.  [ quick check ] 
Ah, seems svn blame -g TARGET can handle this gracefully (in theory).

Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)

2015-11-09 Thread Michael Meissner
I evidently forgot to attach the patch.

[gcc]
2015-11-08  Michael Meissner  

* config/rs6000/constraints.md (we constraint): New constraint for
64-bit power9 vector support.
(wL constraint): New constraint for the element in a vector that
can be addressed by the MFVSRLD instruction.

* config/rs6000/rs6000.c (rs6000_debug_reg_global): Add ISA 3.0
debugging.
(rs6000_init_hard_regno_mode_ok): If ISA 3.0 and 64-bit, enable we
constraint.  Disable the VSX<->GPR direct move helpers if we have
the MFVSRLD and MTVSRDD instructions.
(rs6000_secondary_reload_simple_move): Add support for doing
vector direct moves directly without additional scratch registers
if we have ISA 3.0 instructions.
(rs6000_secondary_reload_direct_move): Update comments.
(rs6000_output_move_128bit): Add support for ISA 3.0 vector
instructions.

* config/rs6000/vsx.md (vsx_mov): Add support for ISA 3.0
direct move instructions.
(vsx_movti_64bit): Likewise.
(vsx_extract_): Likewise.

* config/rs6000/rs6000.h (VECTOR_ELEMENT_MFVSRLD_64BIT): New
macros for ISA 3.0 direct move instructions.
(TARGET_DIRECT_MOVE_128): Likewise.

* config/rs6000/rs6000.md (128-bit GPR splitters): Don't split a
128-bit move that is a direct move between GPR and vector
registers using ISA 3.0 direct move instructions.

* doc/md.texi (RS/6000 constraints): Document we, wF, wG, wL
constraints.  Update wa documentation to say not to use %x on
instructions that only take Altivec registers.

[gcc/testsuite]
2015-11-08  Michael Meissner  

* gcc.target/powerpc/direct-move-vector.c: New test for 128-bit
vector direct move instructions.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 229976)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -64,7 +64,8 @@ (define_register_constraint "wa" "rs6000
 (define_register_constraint "wd" "rs6000_constraints[RS6000_CONSTRAINT_wd]"
   "VSX vector register to hold vector double data or NO_REGS.")
 
-;; we is not currently used
+(define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
+  "VSX register if the -mpower9-vector -m64 options were used or NO_REGS.")
 
 (define_register_constraint "wf" "rs6000_constraints[RS6000_CONSTRAINT_wf]"
   "VSX vector register to hold vector float data or NO_REGS.")
@@ -147,6 +148,12 @@ (define_memory_constraint "wG"
   "Memory operand suitable for TOC fusion memory references"
   (match_operand 0 "toc_fusion_mem_wrapped"))
 
+(define_constraint "wL"
+  "Int constant that is the element number mfvsrld accesses in a vector."
+  (and (match_code "const_int")
+   (and (match_test "TARGET_DIRECT_MOVE_128")
+   (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"
+   (match_test "(ival == VECTOR_ELEMENT_MFVSRLD_64BIT)"))))
 ;; Lq/stq validates the address for load/store quad
 (define_memory_constraint "wQ"
   "Memory operand suitable for the load/store quad instructions"
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 229977)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -2575,6 +2575,10 @@ rs6000_debug_reg_global (void)
   if (TARGET_VSX)
 fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit scalar element",
 (int)VECTOR_ELEMENT_SCALAR_64BIT);
+
+  if (TARGET_DIRECT_MOVE_128)
+fprintf (stderr, DEBUG_FMT_D, "VSX easy 64-bit mfvsrld element",
+(int)VECTOR_ELEMENT_MFVSRLD_64BIT);
 }
 
 
@@ -2986,6 +2990,10 @@ rs6000_init_hard_regno_mode_ok (bool glo
rs6000_constraints[RS6000_CONSTRAINT_wp] = VSX_REGS;/* TFmode  */
 }
 
+  /* Support for new direct moves.  */
+  if (TARGET_DIRECT_MOVE_128)
+rs6000_constraints[RS6000_CONSTRAINT_we] = VSX_REGS;
+
   /* Set up the reload helper and direct move functions.  */
   if (TARGET_VSX || TARGET_ALTIVEC)
 {
@@ -3034,7 +3042,7 @@ rs6000_init_hard_regno_mode_ok (bool glo
  reg_addr[TImode].reload_load   = CODE_FOR_reload_ti_di_load;
}
 
- if (TARGET_DIRECT_MOVE)
+ if (TARGET_DIRECT_MOVE && !TARGET_DIRECT_MOVE_128)
{
  reg_addr[TImode].reload_gpr_vsx= 
CODE_FOR_reload_gpr_from_vsxti;
  reg_addr[V1TImode].reload_gpr_vsx  = 
CODE_FOR_reload_gpr_from_vsxv1ti;
@@ -18081,6 +18089,11 @@ rs6000_secondary_reload_simple_move (enu
  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
 return true;
 
+  else if (TARGET_DIRECT_MOVE_128 && size == 16
+  && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
+  || (to_type == GPR_REG_TYPE && f

Re: [PATCH 11/12] always define HAVE_AS_LEB128

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 07:42:32PM +0100, Bernd Schmidt wrote:
> >-#ifdef HAVE_AS_LEB128
> >+#if HAVE_AS_LEB128
> 
> This patch doesn't seem to actually remove any conditional compilation?

I guess, but it does make it possible to incrementally do that which
wasn't possible before, and imo is an improvement.

Trev

> 
> 
> Bernd


Re: Extend tree-call-cdce to calls whose result is used

2015-11-09 Thread Michael Matz
Hi,

On Mon, 9 Nov 2015, Richard Sandiford wrote:

> +static bool
> +can_use_internal_fn (gcall *call)
> +{
> +  /* Only replace calls that set errno.  */
> +  if (!gimple_vdef (call))
> +return false;

Oh, I managed to confuse this in my head while reading the patch.  So, 
hmm, you don't actually replace the builtin with an internal function 
(without the condition) under no-errno-math?  Does something else do that?  
Because otherwise that seems an unnecessary restriction?

> >> r229916 fixed that for the non-EH case.
> >
> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the 
> > original dominator of the EH destination was the call block it moves, 
> > otherwise it remains unchanged.
> 
> The target of the edge is easy in itself, I agree, but that isn't
> necessarily the only affected block, if the EH handler doesn't
> exit or rethrow.

You're worried the non-EH and the EH regions merge again, right?  Like so:

before change:

BB1: throwing-call
 fallthru/   \EH
BB2   BBeh
 |   /\ (stuff in EH-region)
 | /some path out of EH region
 | /--/
BB3

Here, BB3 must at least be dominated by BB1 (the throwing block), or by 
something further up (when there are other side-entries to the path 
BB2->BB3 or into the EH region).  When further up, nothing changes, when 
it's BB1, then it's afterwards dominated by the BB containing the 
condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh, 
which gets idom=Bcall.  Depending on how you split BB1, either Bcond or 
BBcall might still be BB1 and doesn't lead to changes in the dom tree.
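
The idom changes described above can be checked on a toy CFG.  The following
Python sketch is not GCC code: the shape of the graph after the split (Bcond
branching either to Bcall or straight to BB2, with Bcall keeping the EH edge)
is an assumption about the transformation, and immediate_dominators is a
plain iterative dominator computation written for this example.

```python
def immediate_dominators(cfg, entry):
    """Return {block: idom} for every non-entry block of a small CFG.

    cfg maps each block name to the list of its successors; every
    non-entry block is assumed to be reachable from entry.
    """
    nodes = list(cfg)
    preds = {n: [] for n in nodes}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds[dst].append(src)

    # Classic iterative data-flow solution for dominator sets.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True

    # Strict dominators form a chain; the deepest one is the idom.
    return {
        n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
        for n in nodes
        if n != entry
    }


before = {"BB1": ["BB2", "BBeh"], "BB2": ["BB3"], "BBeh": ["BB3"], "BB3": []}
after = {
    "Bcond": ["Bcall", "BB2"],   # condition decides whether the call runs
    "Bcall": ["BB2", "BBeh"],    # the throwing call keeps the EH edge
    "BB2": ["BB3"],
    "BBeh": ["BB3"],
    "BB3": [],
}

idom_before = immediate_dominators(before, "BB1")
idom_after = immediate_dominators(after, "Bcond")
```

On this toy graph every block that had idom BB1 ends up with idom Bcond,
except BBeh, which gets Bcall.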

> > Currently we have quite some of such passes (reassoc, forwprop, 
> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap 
> > and others), but they are all handling only special situations in one 
> > way or the other.  pass_fold_builtins is another one, but it seems 
> > most related to what you want (replacing a call with something else), 
> > so I thought that'd be the natural choice.
> 
> Well, to be pedantic, it's not really replacing the call.  Except for
> the special case of targets that support direct assignments to errno,
> it keeps the original call but ensures that it isn't usually executed.
> From that point of view it doesn't really seem like a fold.
> 
> But I suppose that's just naming again :-).  And it's easily solved with
> s/fold/rewrite/.

Exactly, in my mind pass_fold_builtin (like many of the others I 
mentioned) doesn't do folding but rewriting :)
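
For readers following along, the rewrite under discussion (keep the original
call, but make sure it is not usually executed) can be sketched in Python.
Everything here is illustrative: library_sqrt stands in for the errno-setting
library call, internal_sqrt for the internal function, and the x < 0 test for
whatever range check the pass actually emits.

```python
import math

slow_path_calls = []  # records every time the "library" call is executed


def library_sqrt(x):
    """Stand-in for the original call, which may set errno."""
    slow_path_calls.append(x)
    try:
        return math.sqrt(x)
    except ValueError:
        return float("nan")


def internal_sqrt(x):
    """Stand-in for the internal function: computes the value, no errno."""
    return math.sqrt(x) if x >= 0 else float("nan")


def conditionalized_sqrt(x):
    """The value always comes from the fast path; the original call is
    kept, but only runs for inputs that would set errno."""
    y = internal_sqrt(x)
    if x < 0:
        library_sqrt(x)
    return y


ok = conditionalized_sqrt(4.0)    # fast path only
bad = conditionalized_sqrt(-1.0)  # also runs the guarded library call
```

Nothing is removed from the program; the call is still there, just behind a
condition, which is why calling this a fold is a bit of a stretch.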

> > call_cdce is also such a pass, but I think it's simply not the 
> > appropriate one (only in so far as its source file contains the helper 
> > routines you need), and in addition I think it shouldn't exist at all 
> > (and wouldn't need to if it had been part of DCE from the start, or if 
> > you implemented the conditionalizing as part of another pass).  Hey, 
> > you could be one to remove a pass! ;-)
> 
> It still seems a bit artificial to me to say that the transformation 
> with a null lhs is "DCE enough" to go in the main DCE pass (even though 
> like I say it doesn't actually eliminate any code from the IR, it just 
> adds more code) and should be kept in a separate pass from the one that 
> does the transformation on a non-null lhs.

Oh, I agree, I might not have been clear: I'm not arguing that the normal 
DCE should now be changed to do the conditionalizing when it removes a 
call LHS; I was saying that it _would_ have been good instead of adding 
the call_cdce pass in the past, when it was for DCE purposes only.  But 
now your proposal is on the plate, namely doing the conditionalizing also 
with an LHS.  So that conditionalizing should take place in some rewriting 
pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be 
changed (it will still remove LHSs of non-removable calls, just that those 
then are sometimes under a condition, when DCE runs after the rewriting).


Ciao,
Michael.


Re: [PATCH 01/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_ARG_POINTER

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 08:01:28PM +0100, Bernd Schmidt wrote:
> On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> >+++ b/gcc/dbxout.c
> >@@ -3076,10 +3076,8 @@ dbxout_symbol_location (tree decl, tree type, const 
> >char *suffix, rtx home)
> >|| (REG_P (XEXP (home, 0))
> >&& REGNO (XEXP (home, 0)) != HARD_FRAME_POINTER_REGNUM
> >&& REGNO (XEXP (home, 0)) != STACK_POINTER_REGNUM
> >-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
> >-   && REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM
> >-#endif
> >-   )))
> >+   && (HARD_FRAME_POINTER_IS_ARG_POINTER
> >+   || REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM
> 
> This used to be
> 
> #if ARG_POINTER_REGNUM != HARD_FRAME_POINTER_REGNUM
> 
> and the whole macro seems kind of pointless - why not just make the
> ARG_POINTER_REGNUM test unconditional? I think the conditional compilation

With the exception of the emit-rtl.c hunk I think I've correctly
convinced myself this macro is just an optimization.

> was originally just a "performance optimization", avoiding unnecessary tests
> - which means the reason to have the tests goes away if we move away from
> the conditional compilation.

Well, with this patch the test should still get optimized away when
appropriate, but if you want to go to supporting multiple targets then
yeah it might as well go away.

Trev

> 
> 
> Bernd


Re: RFC: C++ delayed folding merge

2015-11-09 Thread Jason Merrill

On 11/09/2015 02:28 PM, Jason Merrill wrote:

> On 11/09/2015 04:08 AM, Richard Biener wrote:
> > On Mon, 9 Nov 2015, Jason Merrill wrote:
> >
> > > I'm planning to merge the C++ delayed folding branch this week, but I
> > > need to get approval of the back end changes (the first patch attached).
> > > Most of these are the introduction of non-folding variants of
> > > convert_to_*, but there are a few others.
> > >
> > > One question: The branch changes 'convert' to not fold its result, and
> > > it's not clear to me whether that's part of the expected behavior of a
> > > front end 'convert' function or not.
> >
> > History.  convert is purely frontend (but shared, unfortunately between
> > all frontends).  I would expect that FEs that do not do delayed folding
> > expect convert to fold.
> >
> > > Also, I'm a bit uncertain about merging this at the end of stage 1,
> > > since it's a large internal change with relatively small user impact; it
> > > just improves handling of constant expression corner cases.  I'm inclined
> > > to go ahead with it at this point, but I'm interested in contrary
> > > opinions.
> >
> > I welcome this change as it should allow cleaning up the FE-middle-end
> > interface a bit more.  It should be possible to remove all
> > NON_LVALUE_EXPR adding/removal from the middle-end folders.
> >
> > Looks like the backend patch included frontend parts but as far as I
> > skimmed it only
> >
> > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> > index 5e32901..d754a90 100644
> > --- a/gcc/fold-const.c
> > +++ b/gcc/fold-const.c
> > @@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
> > else if (TREE_CODE (arg1) == REAL_CST)
> >  return fold_convert_const_fixed_from_real (type, arg1);
> >   }
> > +  else if (TREE_CODE (type) == VECTOR_TYPE)
> > +{
> > +  if (TREE_CODE (arg1) == VECTOR_CST
> > + && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE (arg1))
> > + && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
> > +   {
> > + tree r = copy_node (arg1);
> > + TREE_TYPE (arg1) = type;
> > + return r;
> > +   }
> > +}
> >
> > looks suspicious.  The issue here is that the vector elements will
> > have the wrong type after this simple handling.
>
> I was aiming to just handle simple cv-qualifier changes; that's why the
> TYPE_MAIN_VARIANT comparison is there.
>
> > If you fix that you can as well handle all kind of element type
> > changes via recursing to fold_convert_const (that includes
> > float to int / int to float changes).
>
> But I'll try this.


Like so?

Jason

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index ae16cfc..d8c7faf 100644
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f9e5064..927e623 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2095,6 +2095,25 @@ fold_convert_const (enum tree_code code, tree type, tree arg1)
   else if (TREE_CODE (arg1) == REAL_CST)
 	return fold_convert_const_fixed_from_real (type, arg1);
 }
+  else if (TREE_CODE (type) == VECTOR_TYPE)
+{
+  if (TREE_CODE (arg1) == VECTOR_CST
+	  && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
+	{
+	  int len = TYPE_VECTOR_SUBPARTS (type);
+	  tree elttype = TREE_TYPE (type);
+	  tree *v = XALLOCAVEC (tree, len);
+	  for (int i = 0; i < len; ++i)
+	{
+	  tree elt = VECTOR_CST_ELT (arg1, i);
+	  tree cvt = fold_convert_const (code, elttype, elt);
+	  if (cvt == NULL_TREE)
+		return NULL_TREE;
+	  v[i] = cvt;
+	}
+	  return build_vector (type, v);
+	}
+}
   return NULL_TREE;
 }
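
A rough Python model of what the new hunk does, for anyone skimming: convert
each element with the scalar folder and give up on the whole vector as soon
as one element cannot be folded.  The names fold_convert_scalar and
fold_convert_vector are hypothetical, and the scalar rules are a toy stand-in
for fold_convert_const, not its real behaviour.

```python
def fold_convert_scalar(target, value):
    """Toy scalar folder: returns None when it cannot fold the constant."""
    if isinstance(value, (int, float)):
        if target == "int":
            return int(value)      # float to int truncates toward zero
        if target == "float":
            return float(value)
    return None


def fold_convert_vector(target, elts):
    """Fold a constant vector element by element.

    Mirrors the loop in the patch: a single element that fails to fold
    makes the whole conversion fail (None here, NULL_TREE in the patch).
    """
    converted = []
    for elt in elts:
        cvt = fold_convert_scalar(target, elt)
        if cvt is None:
            return None
        converted.append(cvt)
    return converted


as_ints = fold_convert_vector("int", [1.5, 2.0, -3.75])
as_floats = fold_convert_vector("float", [1, 2, 3])
unfoldable = fold_convert_vector("int", [1, "not a constant"])
```

The all-or-nothing result is the point: the caller never sees a vector whose
elements were only partly converted.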
 


Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 05:59 PM, Steve Ellcey wrote:

> Here is a version with the code moved into a new function.  How does
> this look?
>
> 2015-11-09  Steve Ellcey  
>
> * optabs.c (prepare_libcall_arg): New function.
> (expand_fixed_convert): Add call to prepare_libcall_arg.


Hold on a moment - I see that emit_library_call_value_1 calls 
promote_function_mode for arguments. Can you investigate why that 
doesn't do what you need?



Bernd


Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread Arnaud Charlet
> > > We do have also a texi2rst script which handles 90% of the work, the
> > > rest requiring manual adaptations. I can send the script we've used if
> > > this can help.
> > 
> > I'm interested in seeing your script.  Can you post/upload it somewhere?
> 
> Yes I will. Let me get the latest version we've used and get back to you.

Here it is. We've used it to convert many docs at AdaCore (including
gnat_ugn and gnat_rm.texi). It does require manual postprocessing but gives
a good headstart.

Arno
#!/usr/bin/python
# -*- coding: utf-8 -*-

"""Splits an existing .texinfo file into components suitable for
   makeinfo.py
   If "-node " is specified, only that node and its children are
   kept
"""

import re
import sys
import optparse
import os.path


def finish_section(out, section, section_node, marker, with_label):
    if section_node == '' or section_node == 'Top':
        return

    # Create a label
    if with_label:
        out.write('.. _%s:\n\n' % section_node.replace(' ', '_'))

    # Create header

    if len(marker) == 2:
        out.write(marker[0] * len(section_node) + '\n')

    out.write(section_node + '\n')

    out.write(marker[0] * len(section_node) + '\n\n')

    list_level = 0
    prev_was_blank = False
    in_example = False
    in_table = 0
    in_menu = False
    example_end = ''
    table_marker = '*'

    def word(line, index=1):
        s = line.lstrip().split()
        if len(s) >= index:
            return s[index - 1]
        else:
            return ""

    for line in section.strip().splitlines():
        if word(line, 1) in ('@itemize', '@enumerate'):
            list_level = list_level + 1
            if not prev_was_blank:
                out.write('\n')
            prev_was_blank = False

        elif line.lstrip().startswith('@end itemize') \
                or line.lstrip().startswith('@end enumerate'):
            list_level = list_level - 1
            prev_was_blank = False

        elif word(line, 1) == '@table':
            out.write("\n")
            table_marker = '*'
            in_table += 1
            prev_was_blank = True

        elif in_table > 0 and line.lstrip().startswith('@end table'):
            in_table -= 1
            prev_was_blank = False

        elif line.lstrip().startswith('@menu'):
            out.write('.. toctree::\n')
            out.write('   :numbered:\n')
            out.write('   :maxdepth: 3\n')
            out.write('\n')
            in_menu = True

        elif in_menu:
            if line.startswith('@end menu'):
                in_menu = False
            else:
                entry = re.sub('::.*', '', line)
                entry = re.sub('^\* ', '', entry.strip())
                entry = entry.replace(' ', '_').replace('/', '_')
                out.write('   ' + entry + '\n')

        elif word(line, 1) in (
                "@deffn", "@defmethod", "@deftp", "@deftypemethod",
                "@deffnx", "@defmethodx", "@deftypefn", "@defun"):

            out.write(".. index:: %s\n\n" % line.lstrip().split(' ', 1)[1])
            out.write(line.split(' ', 1)[1].strip() + '\n')

            in_table += 1
            table_marker = '`'

        elif in_table > 0 \
                and word(line, 1) in ("@end") \
                and word(line, 2) in (
                   "deffn", "defmethod", "deftp", "deftypemethod",
                   "deffnx", "defmethodx", "deftypefn", "defun"):
            in_table -= 1

        elif word(line, 1) in ('@item', '@itemx'):
            line = line.lstrip().replace('@itemx', '')
            line = line.replace('@item', '')

            if in_table > 0:
                if line.strip().startswith(table_marker):
                    # Avoid lines like  "**Bold* text*" which of course
                    # sphinx doesn't like
                    table_marker = ""

                out.write('\n%s%s%s\n' % (
                    table_marker, line.strip(), table_marker))
                prev_was_blank = True
            else:
                out.write('  ' * (list_level - 1) + '* ' + line.strip() + '\n')
                prev_was_blank = False

        elif line.strip() == '':
            if not prev_was_blank:
                out.write('\n')
            prev_was_blank = True

        else:
            if '@example' in line:
                in_example = True
                example_end = '@end example'
                out.write('  ' * list_level + '::\n\n')
                continue
            elif '@smallexample' in line:
                in_example = True
                example_end = '@end smallexample'
                out.write('\n' + '  ' * list_level + '::\n\n')
                continue
            elif '@CODESAMPLE{' in line:
                line = line.replace("@CODESAMPLE{", "")
                example_end = '}'
                in_example = True
                out.write('\n')
                out.write('  ' * list_level + ".. highlight:: ada\n\n")
                out.write('  ' * list_level + '::\n\n')

            elif '@NO

[PATCH, 16/16] Add libgomp.oacc-fortran/kernels-*.f95

2015-11-09 Thread Tom de Vries

On 09/11/15 16:35, Tom de Vries wrote:

> Hi,
>
> this patch series for stage1 trunk adds support to:
> - parallelize oacc kernels regions using parloops, and
> - map the loops onto the oacc gang dimension.
>
> The patch series contains these patches:
>
>   1  Insert new exit block only when needed in
>      transform_to_exit_first_loop_alt
>   2  Make create_parallel_loop return void
>   3  Ignore reduction clause on kernels directive
>   4  Implement -foffload-alias
>   5  Add in_oacc_kernels_region in struct loop
>   6  Add pass_oacc_kernels
>   7  Add pass_dominator_oacc_kernels
>   8  Add pass_ch_oacc_kernels
>   9  Add pass_parallelize_loops_oacc_kernels
>  10  Add pass_oacc_kernels pass group in passes.def
>  11  Update testcases after adding kernels pass group
>  12  Handle acc loop directive
>  13  Add c-c++-common/goacc/kernels-*.c
>  14  Add gfortran.dg/goacc/kernels-*.f95
>  15  Add libgomp.oacc-c-c++-common/kernels-*.c
>  16  Add libgomp.oacc-fortran/kernels-*.f95
>
> The first 9 patches are more or less independent, but patches 10-16 are
> intended to be committed at the same time.
>
> Bootstrapped and reg-tested on x86_64.
>
> Build and reg-tested with nvidia accelerator, in combination with a
> patch that enables accelerator testing (which is submitted at
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>
> I'll post the individual patches in reply to this message.


This patch adds Fortran oacc kernels execution tests.

Thanks,
- Tom

Add libgomp.oacc-fortran/kernels-*.f95

2015-11-09  Tom de Vries  

	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: New test.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95: Same.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
	Same.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95: Same.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95: Same.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: Same.
	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: Same.
	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
	Same.
---
 .../libgomp.oacc-fortran/kernels-loop-2.f95| 32 ++
 .../libgomp.oacc-fortran/kernels-loop-data-2.f95   | 38 ++
 .../kernels-loop-data-enter-exit-2.f95 | 38 ++
 .../kernels-loop-data-enter-exit.f95   | 36 
 .../kernels-loop-data-update.f95   | 36 
 .../libgomp.oacc-fortran/kernels-loop-data.f95 | 36 
 .../libgomp.oacc-fortran/kernels-loop.f95  | 28 
 .../kernels-parallel-loop-data-enter-exit.f95  | 37 +
 8 files changed, 281 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
new file mode 100644
index 000..1fb40ee
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
@@ -0,0 +1,32 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer:: i, ii
+
+  !$acc kernels copyout (a(0:n-1))
+  do i = 0, n - 1
+ a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyout (b(0:n-1))
+  do i = 0, n -1
+ b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+ c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+ if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
new file mode 100644
index 000..7b52253
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
@@ -0,0 +1,38 @@
+! { dg-do run }
+! { dg-options "-ftree-parallelize-loops=32" }
+
+program main
+  implicit none
+  integer, parameter :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer:: i, ii
+
+  !$acc data copyout (a(0:n-1))
+  !$acc ke

[PATCH, 15/16] Add libgomp.oacc-c-c++-common/kernels-*.c

2015-11-09 Thread Tom de Vries

This patch adds C/C++ oacc kernels execution tests.

Thanks,
- Tom

Add libgomp.oacc-c-c++-common/kernels-*.c

2015-11-09  Tom de Vries  

	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
	Same.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c: Same.
---
 .../libgomp.oacc-c-c++-common/kernels-loop-2.c | 47 ++
 .../libgomp.oacc-c-c++-common/kernels-loop-3.c | 34 +
 .../kernels-loop-and-seq-2.c   | 36 ++
 .../kernels-loop-and-seq-3.c   | 37 ++
 .../kernels-loop-and-seq-4.c   | 36 ++
 .../kernels-loop-and-seq-5.c   | 37 ++
 .../kernels-loop-and-seq-6.c   | 36 ++
 .../kernels-loop-and-seq.c | 37 ++
 .../kernels-loop-collapse.c| 40 
 .../kernels-loop-data-2.c  | 56 ++
 .../kernels-loop-data-enter-exit-2.c   | 54 +
 .../kernels-loop-data-enter-exit.c | 51 
 .../kernels-loop-data-update.c | 53 
 .../libgomp.oacc-c-c++-common/kernels-loop-data.c  | 50 +++
 .../libgomp.oacc-c-c++-common/kernels-loop-g.c |  5 ++
 .../kernels-loop-mod-not-zero.c| 41 
 .../libgomp.oacc-c-c++-common/kernels-loop-n.c | 47 ++
 .../libgomp.oacc-c-c++-common/kernels-loop-nest.c  | 26 ++
 .../libgomp.oacc-c-c++-common/kernels-loop.c   | 41 
 .../kernels-parallel-loop-data-enter-exit.c| 52 
 .../libgomp.oacc-c-c++-common/kernels-reduction.c  | 37 ++
 21 files changed, 853 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
 create 

[PATCH, 14/16] Add gfortran.dg/goacc/kernels-*.f95

2015-11-09 Thread Tom de Vries

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1  Insert new exit block only when needed in
     transform_to_exit_first_loop_alt
  2  Make create_parallel_loop return void
  3  Ignore reduction clause on kernels directive
  4  Implement -foffload-alias
  5  Add in_oacc_kernels_region in struct loop
  6  Add pass_oacc_kernels
  7  Add pass_dominator_oacc_kernels
  8  Add pass_ch_oacc_kernels
  9  Add pass_parallelize_loops_oacc_kernels
 10  Add pass_oacc_kernels pass group in passes.def
 11  Update testcases after adding kernels pass group
 12  Handle acc loop directive
 13  Add c-c++-common/goacc/kernels-*.c
 14  Add gfortran.dg/goacc/kernels-*.f95
 15  Add libgomp.oacc-c-c++-common/kernels-*.c
 16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch adds Fortran oacc kernels compilation tests.

Thanks,
- Tom

Add gfortran.dg/goacc/kernels-*.f95

2015-11-09  Tom de Vries  

	* gfortran.dg/goacc/kernels-loop-2.f95: New test.
	* gfortran.dg/goacc/kernels-loop-data-2.f95: New test.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: New test.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: New test.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: New test.
	* gfortran.dg/goacc/kernels-loop-data.f95: New test.
	* gfortran.dg/goacc/kernels-loop.f95: New test.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: New test.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 | 45 +++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95  | 51 ++
 .../goacc/kernels-loop-data-enter-exit-2.f95   | 51 ++
 .../goacc/kernels-loop-data-enter-exit.f95 | 49 +
 .../gfortran.dg/goacc/kernels-loop-data-update.f95 | 48 
 .../gfortran.dg/goacc/kernels-loop-data.f95| 49 +
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95   | 39 +
 .../kernels-parallel-loop-data-enter-exit.f95  | 50 +
 8 files changed, 382 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
new file mode 100644
index 000..7fd6d4e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -0,0 +1,45 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+! { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" }
+! { dg-additional-options "-fdump-tree-optimized" }
+
+program main
+  implicit none
+  integer, parameter :: n = 1024
+  integer, dimension (0:n-1) :: a, b, c
+  integer:: i, ii
+
+  !$acc kernels copyout (a(0:n-1))
+  do i = 0, n - 1
+ a(i) = i * 2
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyout (b(0:n-1))
+  do i = 0, n -1
+ b(i) = i * 4
+  end do
+  !$acc end kernels
+
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do ii = 0, n - 1
+ c(ii) = a(ii) + b(ii)
+  end do
+  !$acc end kernels
+
+  do i = 0, n - 1
+ if (c(i) .ne. a(i) + b(i)) call abort
+  end do
+
+end program main
+
+! Check that only three loops are analyzed, and that all can be parallelized.
+! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } }
+! { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } }
+
+! Check that the loop has been split off into a function.
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.0 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAIN__._omp_fn.1 " 1 "optimized" } }
+! { dg-final { scan-tree-dump-times "(?n);; Function MAI

[PATCH, 13/16] Add c-c++-common/goacc/kernels-*.c

2015-11-09 Thread Tom de Vries

This patch adds C/C++ oacc kernels compilation tests.

Thanks,
- Tom

Add c-c++-common/goacc/kernels-*.c

2015-11-09  Tom de Vries  

	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
	* c-c++-common/goacc/kernels-counter-var-redundant-load.c: New test.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test.
	* c-c++-common/goacc/kernels-double-reduction.c: New test.
	* c-c++-common/goacc/kernels-empty.c: New test.
	* c-c++-common/goacc/kernels-eternal.c: New test.
	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-2.c: New test.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-3.c: New test.
	* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-data-2.c: New test.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: New test.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: New test.
	* c-c++-common/goacc/kernels-loop-data-update.c: New test.
	* c-c++-common/goacc/kernels-loop-data.c: New test.
	* c-c++-common/goacc/kernels-loop-g.c: New test.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-n.c: New test.
	* c-c++-common/goacc/kernels-loop-nest.c: New test.
	* c-c++-common/goacc/kernels-loop.c: New test.
	* c-c++-common/goacc/kernels-noreturn.c: New test.
	* c-c++-common/goacc/kernels-one-counter-var.c: New test.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: New test.
	* c-c++-common/goacc/kernels-reduction.c: New test.
---
 .../goacc/kernels-acc-loop-reduction.c | 25 
 .../goacc/kernels-acc-loop-smaller-equal.c | 25 
 .../goacc/kernels-counter-var-redundant-load.c | 36 +++
 .../goacc/kernels-counter-vars-function-scope.c| 54 +
 .../c-c++-common/goacc/kernels-double-reduction.c  | 37 
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |  6 ++
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c | 11 
 .../c-c++-common/goacc/kernels-loop-2-acc-loop.c   | 21 +++
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  | 70 ++
 .../c-c++-common/goacc/kernels-loop-3-acc-loop.c   | 17 ++
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  | 49 +++
 .../c-c++-common/goacc/kernels-loop-acc-loop.c | 17 ++
 .../c-c++-common/goacc/kernels-loop-data-2.c   | 70 ++
 .../goacc/kernels-loop-data-enter-exit-2.c | 68 +
 .../goacc/kernels-loop-data-enter-exit.c   | 65 
 .../c-c++-common/goacc/kernels-loop-data-update.c  | 65 
 .../c-c++-common/goacc/kernels-loop-data.c | 64 
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  | 17 ++
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c | 52 
 .../c-c++-common/goacc/kernels-loop-n-acc-loop.c   | 17 ++
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  | 56 +
 .../c-c++-common/goacc/kernels-loop-nest.c | 39 
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c| 56 +
 .../c-c++-common/goacc/kernels-noreturn.c  | 12 
 .../c-c++-common/goacc/kernels-one-counter-var.c   | 54 +
 .../goacc/kernels-parallel-loop-data-enter-exit.c  | 66 
 .

[PATCH, i386]: Fix gcc.target/i386/pr66648.c FAIL

2015-11-09 Thread Uros Bizjak
2015-11-09  Uros Bizjak  

* config/i386/i386.md (*strmovqi_1): Fix insn enable condition.

Tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 230038)
+++ config/i386/i386.md (working copy)
@@ -16251,9 +16251,9 @@
(set (match_operand:P 1 "register_operand" "=S")
(plus:P (match_dup 3)
(const_int 1)))]
-  "!(fixed_regs[SI_REG] || fixed_regs[DI_REG])"
-  "%^movsb
+  "!(fixed_regs[SI_REG] || fixed_regs[DI_REG])
&& ix86_check_no_addr_space (insn)"
+  "%^movsb"
   [(set_attr "type" "str")
(set_attr "memory" "both")
(set (attr "prefix_rex")


[PATCH, 12/16] Handle acc loop directive

2015-11-09 Thread Tom de Vries

This patch deals with loops in an oacc kernels region which are 
annotated using "#pragma acc loop". It expands such a loop as a normal 
loop, which has the effect of ignoring the "#pragma acc loop".
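
As a rough illustration of the kind of source this affects, here is a minimal C sketch (not one of the testcases from this series). The names vector_add, check_vector_add and N are made up for the example. Compiled with -fopenacc, the inner "#pragma acc loop" sits inside a kernels region and, per the description above, would be expanded as a normal loop. Compiled without -fopenacc, both pragmas are simply ignored and the code runs on the host.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024

/* Hypothetical example function: store a[i] + b[i] into c[i].
   The inner "acc loop" annotation is nested inside a kernels region,
   which is the situation the patch description talks about.  */
void
vector_add (const unsigned int *a, const unsigned int *b, unsigned int *c)
{
  assert (a != NULL && b != NULL && c != NULL);

#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
#pragma acc loop
    for (unsigned int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}

/* Host-side check: returns 1 if every element of the result is the
   sum of the corresponding inputs, 0 otherwise.  */
int
check_vector_add (void)
{
  static unsigned int a[N], b[N], c[N];

  for (unsigned int i = 0; i < N; i++)
    {
      a[i] = i * 2;
      b[i] = i * 4;
    }

  vector_add (a, b, c);

  for (unsigned int i = 0; i < N; i++)
    if (c[i] != a[i] + b[i])
      return 0;
  return 1;
}
```

The result is the same with or without the annotation, since the loop has no cross-iteration dependences.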


Thanks,
- Tom

Handle acc loop directive

2015-11-09  Tom de Vries  

	* omp-low.c (struct omp_region): Add inside_kernels_p field.
	(expand_omp_for_generic): Set address taken for istart0 and iend0
	only when necessary.  Adjust to generate a 'sequential' loop
	when GOMP builtin arguments are BUILT_IN_NONE.
	(expand_omp_for): Use expand_omp_for_generic() to generate a
	non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
	(expand_omp): Mark inside_kernels_p field true for regions
	nested inside OpenACC kernels constructs.
---
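
The control flow that the ChangeLog describes for expand_omp_for_generic can be modelled outside the compiler. The following is only a toy sketch under stated assumptions: chunk_fn, run_loop, add_index and demo_sequential_sum are hypothetical names, and a NULL callback stands in for BUILT_IN_NONE. As in the hunks below, the sequential case assigns the loop bounds n1 and n2 directly to istart0 and iend0 and tests istart0 != iend0 instead of calling a runtime start function. What the patch does after the first chunk in the sequential case is not assumed here; the sketch just stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime "start"/"next" entry point that
   hands out the next chunk [*istart, *iend) and returns false when no
   work is left.  NULL plays the role of BUILT_IN_NONE.  */
typedef bool (*chunk_fn) (long *istart, long *iend);

/* Run BODY for every index in [n1, n2).  Returns the number of
   iterations executed.  */
long
run_loop (chunk_fn start_fn, chunk_fn next_fn, long n1, long n2,
	  void (*body) (long i, void *data), void *data)
{
  assert (body != NULL);

  bool seq_loop = (start_fn == NULL || next_fn == NULL);
  long istart0, iend0, iterations = 0;
  bool more;

  if (seq_loop)
    {
      /* Sequential case: the chunk is the whole iteration space, so
	 the addresses of istart0 and iend0 are never taken.  */
      istart0 = n1;
      iend0 = n2;
      more = (istart0 != iend0);
    }
  else
    /* Parallel case: the runtime fills in the first chunk.  */
    more = start_fn (&istart0, &iend0);

  while (more)
    {
      for (long i = istart0; i < iend0; i++)
	{
	  body (i, data);
	  iterations++;
	}
      /* Assumption of this sketch: a sequential loop has exactly one
	 chunk, so there is nothing to fetch afterwards.  */
      more = seq_loop ? false : next_fn (&istart0, &iend0);
    }

  return iterations;
}

/* Example body: accumulate the index into a long pointed to by DATA.  */
static void
add_index (long i, void *data)
{
  *(long *) data += i;
}

/* Sum 0..9 through the sequential path.  */
long
demo_sequential_sum (void)
{
  long sum = 0;
  long count = run_loop (NULL, NULL, 0, 10, add_index, &sum);
  assert (count == 10);
  return sum;
}
```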
 gcc/omp-low.c | 127 --
 1 file changed, 87 insertions(+), 40 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 1283cc7..859a2eb 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -136,6 +136,9 @@ struct omp_region
   /* The ordered stmt if type is GIMPLE_OMP_ORDERED and it has
  a depend clause.  */
   gomp_ordered *ord_stmt;
+
+  /* True if this is nested inside an OpenACC kernels construct.  */
+  bool inside_kernels_p;
 };
 
 /* Context structure.  Used to store information about each parallel
@@ -8238,6 +8241,7 @@ expand_omp_for_generic (struct omp_region *region,
   gassign *assign_stmt;
   bool in_combined_parallel = is_combined_parallel (region);
   bool broken_loop = region->cont == NULL;
+  bool seq_loop = (start_fn == BUILT_IN_NONE || next_fn == BUILT_IN_NONE);
   edge e, ne;
   tree *counts = NULL;
   int i;
@@ -8335,8 +8339,12 @@ expand_omp_for_generic (struct omp_region *region,
   type = TREE_TYPE (fd->loop.v);
   istart0 = create_tmp_var (fd->iter_type, ".istart0");
   iend0 = create_tmp_var (fd->iter_type, ".iend0");
-  TREE_ADDRESSABLE (istart0) = 1;
-  TREE_ADDRESSABLE (iend0) = 1;
+
+if (!seq_loop)
+{
+  TREE_ADDRESSABLE (istart0) = 1;
+  TREE_ADDRESSABLE (iend0) = 1;
+}
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
@@ -8366,7 +8374,20 @@ expand_omp_for_generic (struct omp_region *region,
   gsi_prev (&gsif);
 
   tree arr = NULL_TREE;
-  if (in_combined_parallel)
+  if (seq_loop)
+{
+  tree n1 = fold_convert (fd->iter_type, fd->loop.n1);
+  tree n2 = fold_convert (fd->iter_type, fd->loop.n2);
+
+  assign_stmt = gimple_build_assign (istart0, n1);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+
+  assign_stmt = gimple_build_assign (iend0, n2);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+
+  t = fold_build2 (NE_EXPR, boolean_type_node, istart0, iend0);
+}
+  else if (in_combined_parallel)
 {
   gcc_assert (fd->ordered == 0);
   /* In a combined parallel loop, emit a call to
@@ -8788,39 +8809,45 @@ expand_omp_for_generic (struct omp_region *region,
 	collapse_bb = extract_omp_for_update_vars (fd, cont_bb, l1_bb);
 
   /* Emit code to get the next parallel iteration in L2_BB.  */
-  gsi = gsi_start_bb (l2_bb);
+  if (!seq_loop)
+	{
+	  gsi = gsi_start_bb (l2_bb);
 
-  t = build_call_expr (builtin_decl_explicit (next_fn), 2,
-			   build_fold_addr_expr (istart0),
-			   build_fold_addr_expr (iend0));
-  t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
-false, GSI_CONTINUE_LINKING);
-  if (TREE_TYPE (t) != boolean_type_node)
-	t = fold_build2 (NE_EXPR, boolean_type_node,
-			 

[PATCH, 11/16] Update testcases after adding kernels pass group

2015-11-09 Thread Tom de Vries

This patch updates existing testcases with new pass numbers, given the 
passes that were added in the pass list in patch 10.


Thanks,
- Tom

Update testcases after adding kernels pass group

2015-11-09  Tom de Vries  

	* c-c++-common/restrict-2.c: Update after adding pass_oacc_kernels pass
	group.
	* c-c++-common/restrict-4.c: Same.
	* g++.dg/tree-ssa/copyprop-1.C: Same.
	* g++.dg/tree-ssa/pr33615.C: Same.
	* g++.dg/tree-ssa/restrict1.C: Same.
	* gcc.dg/gomp/notify-new-function-3.c: Same.
	* gcc.dg/pr23911.c: Same.
	* gcc.dg/pr41488.c: Same.
	* gcc.dg/tm/pub-safety-1.c: Same.
	* gcc.dg/tm/reg-promotion.c: Same.
	* gcc.dg/tree-ssa/20030709-2.c: Same.
	* gcc.dg/tree-ssa/20030731-2.c: Same.
	* gcc.dg/tree-ssa/20040729-1.c: Same.
	* gcc.dg/tree-ssa/20050314-1.c: Same.
	* gcc.dg/tree-ssa/cfgcleanup-1.c: Same.
	* gcc.dg/tree-ssa/loop-17.c: Same.
	* gcc.dg/tree-ssa/loop-32.c: Same.
	* gcc.dg/tree-ssa/loop-33.c: Same.
	* gcc.dg/tree-ssa/loop-34.c: Same.
	* gcc.dg/tree-ssa/loop-35.c: Same.
	* gcc.dg/tree-ssa/loop-36.c: Same.
	* gcc.dg/tree-ssa/loop-39.c: Same.
	* gcc.dg/tree-ssa/loop-7.c: Same.
	* gcc.dg/tree-ssa/pr21086.c: Same.
	* gcc.dg/tree-ssa/pr23109.c: Same.
	* gcc.dg/tree-ssa/restrict-3.c: Same.
	* gcc.dg/tree-ssa/restrict-5.c: Same.
	* gcc.dg/tree-ssa/scev-7.c: Same.
	* gcc.dg/tree-ssa/ssa-dce-1.c: Same.
	* gcc.dg/tree-ssa/ssa-dce-2.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-1.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-10.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-11.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-12.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-2.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-3.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-6.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-7.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-8.c: Same.
	* gcc.dg/tree-ssa/ssa-lim-9.c: Same.
	* gcc.dg/tree-ssa/structopt-1.c: Same.
	* gcc.dg/vect/pr26359.c: Same.
	* gfortran.dg/pr32921.f: Same.
---
 gcc/testsuite/c-c++-common/restrict-2.c   | 4 ++--
 gcc/testsuite/c-c++-common/restrict-4.c   | 4 ++--
 gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C| 4 ++--
 gcc/testsuite/g++.dg/tree-ssa/pr33615.C   | 4 ++--
 gcc/testsuite/g++.dg/tree-ssa/restrict1.C | 4 ++--
 gcc/testsuite/gcc.dg/gomp/notify-new-function-3.c | 2 +-
 gcc/testsuite/gcc.dg/pr23911.c| 6 +++---
 gcc/testsuite/gcc.dg/pr41488.c| 4 ++--
 gcc/testsuite/gcc.dg/tm/pub-safety-1.c| 4 ++--
 gcc/testsuite/gcc.dg/tm/reg-promotion.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c| 8 
 gcc/testsuite/gcc.dg/tree-ssa/20030731-2.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/20040729-1.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/cfgcleanup-1.c  | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-17.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-32.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-33.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-34.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-35.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/loop-36.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-39.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/loop-7.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/pr21086.c   | 6 +++---
 gcc/testsuite/gcc.dg/tree-ssa/pr23109.c   | 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c| 4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/scev-7.c| 4 ++--

Re: [PATCH], Add power9 support to GCC, patch #7 (direct move enhancements)

2015-11-09 Thread Segher Boessenkool
On Sun, Nov 08, 2015 at 07:48:56PM -0500, Michael Meissner wrote:
> This patch adds support for the new direct move instructions (MFVSRLD and
> MTVSRDD) that simplify moving 128-bit data between GPRs and vector registers.

You forgot to attach the patch :-)


Segher


Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier

2015-11-09 Thread Jeff Law

On 11/09/2015 08:00 AM, Michael Matz wrote:

Hi,

On Mon, 9 Nov 2015, Jeff Law wrote:

+verify_ssaname_freelists (struct function *fun)
+{
+  /* Do nothing if we are in RTL format.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+{
+  if (bb->flags & BB_RTL)
+   return;
+}

gimple_in_ssa_p (fun);

Agreed & fixed.



+  /* Then note the operands of each statement.  */
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ ssa_op_iter iter;
+ gimple *stmt = gsi_stmt (gsi);
+ FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS)
+   if (TREE_CODE (t) == SSA_NAME)
+ bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t));
+   }

t will always be an SSA_NAME here.
Likewise.  I think that test was in there from a time when I'd run the 
verifier at a different point in the pipeline and things weren't 
necessarily consistent.  I'll simplify in the obvious way.
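
For readers who have not looked at the verifier, its idea can be sketched with a toy model. This is not the GCC code: names are plain integer versions, MAX_NAMES, count_leaked_names and demo_leak_count are hypothetical, and version 0 is treated as unused to mirror the UNUSED_NAME_VERSION + 1 starting point visible in the patch. A name is counted as leaked if it is neither referenced in the IL nor on the free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAMES 64

/* USED holds the versions seen while walking the IL, FREE_LIST the
   versions currently on the free list.  Returns how many versions in
   [1, num_names) appear in neither.  */
size_t
count_leaked_names (const unsigned int *used, size_t n_used,
		    const unsigned int *free_list, size_t n_free,
		    size_t num_names)
{
  assert (num_names <= MAX_NAMES);

  bool accounted[MAX_NAMES] = { false };

  /* Note every name seen in the IL.  */
  for (size_t i = 0; i < n_used; i++)
    {
      assert (used[i] < num_names);
      accounted[used[i]] = true;
    }

  /* Names on the free list are accounted for as well.  */
  for (size_t i = 0; i < n_free; i++)
    {
      assert (free_list[i] < num_names);
      accounted[free_list[i]] = true;
    }

  /* Anything left over is neither live nor recycled.  */
  size_t leaked = 0;
  for (size_t v = 1; v < num_names; v++)
    if (!accounted[v])
      leaked++;
  return leaked;
}

/* Six versions (0..5): 1, 2 and 4 are in the IL, 3 is on the free
   list, so only version 5 is unaccounted for.  */
size_t
demo_leak_count (void)
{
  const unsigned int used[] = { 1, 2, 4 };
  const unsigned int free_list[] = { 3 };
  return count_leaked_names (used, 3, free_list, 1, 6);
}
```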


I bootstrapped x86_64-linux-gnu with the verifier enabled and
installed the patch on the trunk.


Thanks for catching these.


Jeff
commit 681292298ad97eebda56fa64f00c772c2c3c7e29
Author: law 
Date:   Mon Nov 9 19:56:57 2015 +

Re: [PATCH] Minor refactoring in tree-ssanames.c & freelists verifier

* tree-ssanames.c (verify_ssaname_freelists): Simplify check for
being in gimple/ssa form.  Remove redundant check for SSA_NAME.
Fix comment typo.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@230049 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7911804..43a8d49 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-11-09  Jeff Law  
+
+   * tree-ssanames.c (verify_ssaname_freelists): Simplify check for
+   being in gimple/ssa form.  Remove redundant check for SSA_NAME.
+   Fix comment typo.
+
 2015-11-09  Michael Meissner  
 
* config/rs6000/rs6000.opt (-mpower9-fusion): Add new switches for
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 096b75b..b599bb5 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -124,17 +124,13 @@ ssanames_print_statistics (void)
 DEBUG_FUNCTION void
 verify_ssaname_freelists (struct function *fun)
 {
-  /* Do nothing if we are in RTL format.  */
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
-{
-  if (bb->flags & BB_RTL)
-   return;
-}
+  if (!gimple_in_ssa_p (fun))
+return;
 
   bitmap names_in_il = BITMAP_ALLOC (NULL);
 
   /* Walk the entire IL noting every SSA_NAME we see.  */
+  basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
   tree t;
@@ -163,8 +159,7 @@ verify_ssaname_freelists (struct function *fun)
  ssa_op_iter iter;
  gimple *stmt = gsi_stmt (gsi);
  FOR_EACH_SSA_TREE_OPERAND (t, stmt, iter, SSA_OP_ALL_OPERANDS)
-   if (TREE_CODE (t) == SSA_NAME)
- bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t));
+   bitmap_set_bit (names_in_il, SSA_NAME_VERSION (t));
}
 }
 
@@ -218,7 +213,7 @@ verify_ssaname_freelists (struct function *fun)
  debug/non-debug compilations have the same SSA_NAMEs.  So for each
  lost SSA_NAME, see if it's likely one from that wart.  These will always
  be marked as default definitions.  So we loosely assume that anything
- marked as a default definition isn't leaked by pretening they are
+ marked as a default definition isn't leaked by pretending they are
  in the IL.  */
   for (unsigned int i = UNUSED_NAME_VERSION + 1; i < num_ssa_names; i++)
 if (ssa_name (i) && SSA_NAME_IS_DEFAULT_DEF (ssa_name (i)))


[PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def

2015-11-09 Thread Tom de Vries

This patch adds the pass_oacc_kernels pass group to the pass list in 
passes.def.


Note the repetition of pass_lim/pass_copy_prop. The first pair is for an 
inner loop in a loop nest, the second for an outer loop in a loop nest.
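
To make the inner/outer distinction concrete, here is a hand-written C sketch of the source-level effect that loop invariant motion aims for on a loop nest. It is an illustration only, not compiler output, and scale_naive, scale_hoisted and scale_versions_agree are hypothetical names. The row offset is invariant in the inner loop only, while the bound and the factor are invariant in the outer loop as well. The hoisting is valid here on the assumption that out does not alias n or factor.

```c
#include <assert.h>
#include <stddef.h>

/* Naive form: *n and *factor are reloaded in every iteration.  */
void
scale_naive (int *out, const int *in, const int *n, const int *factor)
{
  for (int i = 0; i < *n; i++)
    for (int j = 0; j < *n; j++)
      out[i * *n + j] = in[i * *n + j] * *factor;
}

/* Hand-hoisted form.  */
void
scale_hoisted (int *out, const int *in, const int *n, const int *factor)
{
  assert (out != NULL && in != NULL && n != NULL && factor != NULL);

  /* Invariant in the outer loop: moved in front of the whole nest.  */
  const int bound = *n;
  const int f = *factor;

  for (int i = 0; i < bound; i++)
    {
      /* Invariant in the inner loop only: moved in front of it.  */
      const int row = i * bound;

      for (int j = 0; j < bound; j++)
	out[row + j] = in[row + j] * f;
    }
}

/* Returns 1 if both versions produce the expected 4x4 result.  */
int
scale_versions_agree (void)
{
  enum { DIM = 4 };
  int in[DIM * DIM], a[DIM * DIM], b[DIM * DIM];
  int n = DIM, factor = 3;

  for (int k = 0; k < DIM * DIM; k++)
    in[k] = k;

  scale_naive (a, in, &n, &factor);
  scale_hoisted (b, in, &n, &factor);

  for (int k = 0; k < DIM * DIM; k++)
    if (a[k] != b[k] || a[k] != 3 * k)
      return 0;
  return 1;
}
```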


Thanks,
- Tom

Add pass_oacc_kernels pass group in passes.def

2015-11-09  Tom de Vries  

	* omp-low.c (pass_expand_omp_ssa::clone): New function.
	* tree-ssa-loop.c (pass_scev_cprop::clone, pass_tree_loop_init::clone)
	(pass_tree_loop_done::clone): New function.
	* passes.def: Add pass_oacc_kernels pass group.
---
 gcc/omp-low.c   |  1 +
 gcc/passes.def  | 21 +
 gcc/tree-ssa-loop.c |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 13fa456..1283cc7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -13360,6 +13360,7 @@ public:
   return !(fun->curr_properties & PROP_gimple_eomp);
 }
   virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
diff --git a/gcc/passes.def b/gcc/passes.def
index c0ab6b9..b7a5424 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -86,6 +86,27 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	 execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  /* Pass group that runs when there are oacc kernels in the
+	 function.  */
+	  NEXT_PASS (pass_oacc_kernels);
+	  PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
+	  NEXT_PASS (pass_dominator_oacc_kernels);
+	  NEXT_PASS (pass_ch_oacc_kernels);
+	  NEXT_PASS (pass_dominator_oacc_kernels);
+	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_copy_prop);
+	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_copy_prop);
+	  NEXT_PASS (pass_scev_cprop);
+	  NEXT_PASS (pass_tree_loop_done);
+	  NEXT_PASS (pass_dominator_oacc_kernels);
+	  NEXT_PASS (pass_dce);
+	  NEXT_PASS (pass_tree_loop_init);
+	  NEXT_PASS (pass_parallelize_loops_oacc_kernels);
+	  NEXT_PASS (pass_expand_omp_ssa);
+	  NEXT_PASS (pass_tree_loop_done);
+	  POP_INSERT_PASSES ()
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
   NEXT_PASS (pass_dse);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index b51cac2..0557f99 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -270,6 +270,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *);
+  opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); }
 
 }; // class pass_tree_loop_init
 
@@ -374,6 +375,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_scev_cprop; }
   virtual unsigned int execute (function *) { return scev_const_prop (); }
+  opt_pass * clone () { return new pass_scev_cprop (m_ctxt); }
 
 }; // class pass_scev_cprop
 
@@ -516,6 +518,7 @@ public:
 
   /* opt_pass methods: */
   virtual unsigned int execute (function *) { return tree_ssa_loop_done (); }
+  opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); }
 
 }; // class pass_tree_loop_done
 
-- 
1.9.1



Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread Segher Boessenkool
On Mon, Nov 09, 2015 at 12:34:20PM -0500, Michael Meissner wrote:
> > > +(define_insn "*toc_fusionload_"
> > > +  [(set (match_operand:QHSI 0 "int_reg_operand" "=&b,??r")
> > > + (match_operand:QHSI 1 "toc_fusion_mem_wrapped" "wG,wG"))
> > > +   (unspec [(const_int 0)] UNSPEC_FUSION_ADDIS)
> > > +   (use (match_operand:DI 2 "base_reg_operand" "r,r"))
> > > +   (clobber (match_scratch:DI 3 "=X,&b"))]
> > > +  "TARGET_TOC_FUSION_INT"
> > 
> > Do you need that "??r" alternative?  Same for the next define_insn.
> 
> Yes unfortunately.  The ??r catches the case where r0 is chosen.  R0 is not a
> base register, and it can't be used for power8 gpr fusion (where you use the
> value being loaded for the ADDIS instruction), but it can be used for power9
> fusion (where the ADDIS must be adjacent, but it no longer has to be the
> register being loaded).

If you have only "b", r0 will not be chosen.  Does that help?  Or are
you generating this pattern from somewhere else where you put in r0?


Segher


Re: [PATCH 4/6] Simplify ix86_builtin_vectorized_function

2015-11-09 Thread Uros Bizjak
On Mon, Nov 9, 2015 at 5:28 PM, Richard Sandiford
 wrote:
> After the previous patches it's no longer necessary for
> TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
> map to the vector optab of the original operation.  We'll use
> a vector form of the internal function instead.
>
>
> gcc/
> * config/i386/i386.c (ix86_builtin_vectorized_function): Remove
> entries that map directly to optabs.

OK.

Thanks,
Uros.

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a1d59a5..1003ce1 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -41746,27 +41746,6 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
>
>switch (fn)
>  {
> -CASE_CFN_SQRT:
> -  if (out_mode == DFmode && in_mode == DFmode)
> -   {
> - if (out_n == 2 && in_n == 2)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPD);
> - else if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPD256);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPD512);
> -   }
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR512);
> -   }
> -  break;
> -
>  CASE_CFN_EXP2:
>if (out_mode == SFmode && in_mode == SFmode)
> {
> @@ -41869,27 +41848,6 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
> }
>break;
>
> -CASE_CFN_COPYSIGN:
> -  if (out_mode == DFmode && in_mode == DFmode)
> -   {
> - if (out_n == 2 && in_n == 2)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD);
> - else if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD256);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD512);
> -   }
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS256);
> - else if (out_n == 16 && in_n == 16)
> -   return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS512);
> -   }
> -  break;
> -
>  CASE_CFN_FLOOR:
>/* The round insn does not trap on denormals.  */
>if (flag_trapping_math || !TARGET_ROUND)
> @@ -41974,27 +41932,6 @@ ix86_builtin_vectorized_function (unsigned int fn, 
> tree type_out,
> }
>break;
>
> -CASE_CFN_ROUND:
> -  /* The round insn does not trap on denormals.  */
> -  if (flag_trapping_math || !TARGET_ROUND)
> -   break;
> -
> -  if (out_mode == DFmode && in_mode == DFmode)
> -   {
> - if (out_n == 2 && in_n == 2)
> -   return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ);
> - else if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ256);
> -   }
> -  if (out_mode == SFmode && in_mode == SFmode)
> -   {
> - if (out_n == 4 && in_n == 4)
> -   return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ);
> - else if (out_n == 8 && in_n == 8)
> -   return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ256);
> -   }
> -  break;
> -
>  CASE_CFN_FMA:
>if (out_mode == DFmode && in_mode == DFmode)
> {
>


[PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels

2015-11-09 Thread Tom de Vries

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1Insert new exit block only when needed in
 transform_to_exit_first_loop_alt
  2Make create_parallel_loop return void
  3Ignore reduction clause on kernels directive
  4Implement -foffload-alias
  5Add in_oacc_kernels_region in struct loop
  6Add pass_oacc_kernels
  7Add pass_dominator_oacc_kernels
  8Add pass_ch_oacc_kernels
  9Add pass_parallelize_loops_oacc_kernels
 10Add pass_oacc_kernels pass group in passes.def
 11Update testcases after adding kernels pass group
 12Handle acc loop directive
 13Add c-c++-common/goacc/kernels-*.c
 14Add gfortran.dg/goacc/kernels-*.f95
 15Add libgomp.oacc-c-c++-common/kernels-*.c
 16Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Build and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


This patch adds pass_parallelize_loops_oacc_kernels.

There's a number of things we do differently in parloops for oacc kernels:
- in normal parloops, we generate code to choose between a parallel
  version of the loop, and a sequential (low iteration count) version.
  Since the code in oacc kernels region is supposed to run on the
  accelerator anyway, we skip this check, and don't add a low iteration
  count loop.
- in normal parloops, we generate an #pragma omp parallel /
  GIMPLE_OMP_RETURN pair to delimit the region which we will split off
  into a thread function. Since the oacc kernels region is already
  split off, we don't add this pair.
- we indicate the parallelization factor by setting the oacc function
  attributes
- we generate an #pragma oacc loop instead of an #pragma omp for, and
  we add the gang clause
- in normal parloops, we rewrite the variable accesses in the loop into
  accesses relative to a thread function parameter. For the
  oacc kernels region, that rewrite has already been done at omp-lower,
  so we skip this.
- we need to ensure that the entire kernels region can be run in
  parallel. The loop independence check is already present, so for oacc
  kernels we add a check between blocks outside the loop and the entire
  region.
- we guard stores in the blocks outside the loop with gang_pos == 0.
  There's no need for each gang to write to a single location, we can
  do this in just one gang. (Typically this is the write of the final
  value of the iteration variable if that one is copied back to the
  host).
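
For orientation, here is a minimal sketch of the kind of source loop the
points above are about: a kernels region holding one loop whose iterations
are independent. It is only an illustration, not one of the testcases from
this series, and the names fill_inputs and vector_add are made up for the
example. Built without -fopenacc the pragma is ignored and the loop runs
sequentially on the host, which keeps the sketch easy to check.

```c
#define N 1024

unsigned int a[N], b[N], c[N];

/* Hypothetical helper: give the inputs known values.  */
void
fill_inputs (void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = i;
      b[i] = 2 * i;
    }
}

/* Hypothetical example function.  The loop iterations do not depend on
   each other, so the loop is a candidate for being mapped onto gangs.
   The value of I after the loop is an example of a store outside the
   loop, which only a single gang would need to perform.  */
int
vector_add (void)
{
  int i;

#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }

  return i;
}
```

After fill_inputs, a call to vector_add leaves c[i] equal to 3 * i and
returns N, whether or not the region was offloaded.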

Thanks,
- Tom

Add pass_parallelize_loops_oacc_kernels

2015-11-09  Tom de Vries  

	* omp-low.c (set_oacc_fn_attrib): Make extern.
	* omp-low.c (expand_omp_atomic_fetch_op):  Release defs of update stmt.
	* omp-low.h (set_oacc_fn_attrib): Declare.
	* tree-parloops.c (struct reduction_info): Add reduc_addr field.
	(create_call_for_reduction_1): Handle case that reduc_addr is non-NULL.
	(create_parallel_loop, gen_parallel_loop, try_create_reduction_list):
	Add and handle function parameter oacc_kernels_p.
	(get_omp_data_i_param): New function.
	(ref_conflicts_with_region, oacc_entry_exit_ok_1)
	(oacc_entry_exit_single_gang, oacc_entry_exit_ok): New function.
	(parallelize_loops): Add and handle function parameter oacc_kernels_p.
	Calculate dominance info.  Skip loops that are not in a kernels region
	in oacc_kernels_p mode.  Skip inner loops of parallelized loops.
	(pass_parallelize_loops::execute): Call parallelize_loops with false
	argument.
	(pass_data_parallelize_loops_oacc_kernels): New pass_data.
	(class pass_parallelize_loops_oacc_kernels): New pass.
	(pass_parallelize_loops_oacc_kernels::execute)
	(make_pass_parallelize_loops_oacc_kernels): New function.
	* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.
---
 gcc/omp-low.c   |   8 +-
 gcc/omp-low.h   |   1 +
 gcc/tree-parloops.c | 689 +++-
 gcc/tree-pass.h |   2 +
 4 files changed, 636 insertions(+), 64 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 39c12c1..13fa456 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11967,10 +11967,14 @@ expand_omp_atomic_fetch_op (basic_block load_bb,
   gcc_assert (gimple_code (gsi_stmt (gsi)) == GIMPLE_OMP_ATOMIC_STORE);
   gsi_remove (&gsi, true);
   gsi = gsi_last_bb (store_bb);
+  stmt = gsi_stmt (gsi);
   gsi_remove (&gsi, true);
 
   if (gimple_in_ssa_p (cfun))
-update_ssa (TODO_update_ssa_no_phi);
+{
+  release_defs (stmt);
+ 

[PATCH, testsuite]: Fix g++.dg/cilk-plus/CK/pr66326.cc FAILs

2015-11-09 Thread Uros Bizjak
We don't have to use cilk.h convenience header.

2015-11-09  Uros Bizjak  

* g++.dg/cilk-plus/CK/pr66326.cc: Do not include cilk.h.
(main): Use _Cilk_spawn instead of cilk_spawn.

Tested on x86_64-linux-gnu and committed to mainline SVN.

Uros.
Index: g++.dg/cilk-plus/CK/pr66326.cc
===
--- g++.dg/cilk-plus/CK/pr66326.cc  (revision 230038)
+++ g++.dg/cilk-plus/CK/pr66326.cc  (working copy)
@@ -2,7 +2,6 @@
 /* { dg-do run { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-fcilkplus -lcilkrts" { target { i?86-*-* x86_64-*-* } } } */
 
-#include 
 #include 
 #include 
 
@@ -23,8 +22,8 @@
 
 int main() {
   std::vector v1, v2, v3;
-  cilk_spawn [&] { v1 = compute(); }();
-  cilk_spawn [&] { v2 = compute(); }();
+  _Cilk_spawn [&] { v1 = compute(); }();
+  _Cilk_spawn [&] { v2 = compute(); }();
   v3 = compute();
   do_not_optimize_away(v1.data());
   do_not_optimize_away(v2.data());


Re: [PATCH], Add power9 support to GCC, patch #4

2015-11-09 Thread Segher Boessenkool
On Mon, Nov 09, 2015 at 12:27:34PM -0500, Michael Meissner wrote:
> On Mon, Nov 09, 2015 at 10:29:10AM -0600, Segher Boessenkool wrote:
> > On Sun, Nov 08, 2015 at 07:39:14PM -0500, Michael Meissner wrote:
> > > +;; Pretend we have a memory form of extswsli until register allocation 
> > > is done
> > > +;; so that we use LWZ to load the value from memory, instead of LWA.
> > 
> > We generate sign_extend loads for many cases where zero_extend would be
> > preferable.  We should deal with that generically, and then we can lose
> > this hack.
> 
> Well it would be nice in theory.  But since we don't have that generic pass, I
> need to use the combiner to generate the instruction.

Yes, it's for a todo list.  And it doesn't have to be a separate pass,
just a bit of tuning here or there.

This is a lot of complex work to treat a special case of a more general
problem.

> > > +(define_insn_and_split "*ashdi3_extswsli_dot"
> > 
> > ...
> > 
> > > +  if (REGNO (cr) == CR0_REGNO)
> > > +{
> > > +  emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
> > > +  DONE;
> > > +}
> > 
> > s/dot2/dot/
> 
> No, it will endlessly recurse until there is a stack overflow if you use dot
> (since it will call itself, generating the same pattern over and over again).

Generating dot2 from dot does not make much sense, and dot2 calls itself
as well.  Are you sure?  Something is off here.

Cheers,


Segher
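
As a side note on the LWZ-versus-LWA comment quoted earlier in this
message, the following is a small C model, assuming the instruction does
what its name suggests (sign-extend the low word, then shift left by an
immediate). All function names are hypothetical. It shows why the kind of
load should not matter for the result: only the low 32 bits of the source
register are read.

```c
#include <stdint.h>

/* Model of the operation under the stated assumption: sign-extend the
   low 32 bits of REG, then shift left by SH (0..63).  The narrowing
   casts rely on the usual two's complement wrap-around.  */
int64_t
model_extswsli (uint64_t reg, unsigned sh)
{
  int64_t extended = (int64_t) (int32_t) (uint32_t) reg;
  return (int64_t) ((uint64_t) extended << sh);
}

/* Register contents after a zero-extending word load (LWZ-like).  */
uint64_t
model_load_zero_extend (int32_t mem)
{
  return (uint64_t) (uint32_t) mem;
}

/* Register contents after a sign-extending word load (LWA-like).  */
uint64_t
model_load_sign_extend (int32_t mem)
{
  return (uint64_t) (int64_t) mem;
}
```

Feeding either load model into model_extswsli gives the same value, so in
this model the cheaper zero-extending load is sufficient.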


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Jeff Law

On 11/09/2015 12:38 PM, Bernd Schmidt wrote:

On 11/09/2015 07:52 PM, Trevor Saunders wrote:


yeah, that's more or less my thought, and this makes hookization easier
since you can now mechanically add a hook for each thing in defaults.h
that invokes the macro.  Then for each target you can go through and
replace the macro with an override of the hooks.  That ends up with the
macros replaced by hooks without writing a lot of patches that need to
go through config-list.mk, and testing on multiple targets which imho is
a giant pain, and rather slow.


We might want to think about making a policy decision to try waiving
some of the testing requirements for target macro -> hook conversions.
Maybe try only a "build to cc1" requirement and see whether that causes
too much breakage.
A config-list.mk build is a build to cc1*, f951, gnat1, so we're not 
requiring deep tests on the affected targets.  Not sure how much we're 
getting by forcing a bootstrap & regression test of that kind of change.


I'm certainly open to this kind of relaxed testing to help this stuff 
move forward and complete before we're all retired :-)


Jeff



Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread Arnaud Charlet
> > We do have also a texi2rst script which handles 90% of the work, the
> > rest requiring manual adaptations. I can send the script we've used if
> > this can help.
> 
> I'm interested in seeing your script.  Can you post/upload it somewhere?

Yes I will. Let me get the latest version we've used and get back to you.

Arno


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 07:52 PM, Trevor Saunders wrote:


yeah, that's more or less my thought, and this makes hookization easier
since you can now mechanically add a hook for each thing in defaults.h
that invokes the macro.  Then for each target you can go through and
replace the macro with an override of the hooks.  That ends up with the
macros replaced by hooks without writing a lot of patches that need to
go through config-list.mk, and testing on multiple targets which imho is
a giant pain, and rather slow.


We might want to think about making a policy decision to try waiving 
some of the testing requirements for target macro -> hook conversions. 
Maybe try only a "build to cc1" requirement and see whether that causes 
too much breakage.



Bernd



Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 08:29 PM, Trevor Saunders wrote:

as I said in 0/12 this did go through config-list.mk, and checking again
this does build on alpha-dec-vms.


The question I have is - why does it build on any other target? It's the 
reference that's unconditional, not the definition. Do we have enough 
DCE at -O0 to eliminate the reference? It's still incorrect IMO (and 
should be fixed in the other patches as well).




I'd actually really rather review them, or really deal with them in any
way, the way they are.  Smaller simpler patches that only deal with one
thing are much better.  I think the most macros that appear on one line
are 2, so at most you could lower that to 1 change instead of 2, but who
really cares anyway?


Well, I do, because I get to see this stuff:

-#if 1 < (defined (DBX_DEBUGGING_INFO) + defined (SDB_DEBUGGING_INFO) \
+#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
  + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
  + defined (VMS_DEBUGGING_INFO))

 #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
- + defined (DWARF2_DEBUGGING_INFO) + defined (XCOFF_DEBUGGING_INFO) \
++ defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
  + defined (VMS_DEBUGGING_INFO))

 #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
 + defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
- + defined (VMS_DEBUGGING_INFO))
++ (VMS_DEBUGGING_INFO))

 #if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
-+ defined (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
++ (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
 + (VMS_DEBUGGING_INFO))

-#if 1 < (defined (DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
+#if 1 < ((DBX_DEBUGGING_INFO) + (SDB_DEBUGGING_INFO) \
 + (DWARF2_DEBUGGING_INFO) + (XCOFF_DEBUGGING_INFO) \
 + (VMS_DEBUGGING_INFO))

etc.


Bernd
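
To make the two points of this message concrete, here is a rough sketch
of where the conversion ends up once every *_DEBUGGING_INFO macro is
always defined to 0 or 1. The struct, the hook objects and select_hooks
are hypothetical stand-ins, not the real GCC declarations. Note that the
VMS hooks object is defined unconditionally here, so the reference inside
the plain if never depends on the compiler removing dead code.

```c
/* Hypothetical stand-ins for the real macros: each one is always
   defined, to either 0 or 1.  */
#ifndef DBX_DEBUGGING_INFO
#define DBX_DEBUGGING_INFO 0
#endif
#ifndef DWARF2_DEBUGGING_INFO
#define DWARF2_DEBUGGING_INFO 1
#endif
#ifndef VMS_DEBUGGING_INFO
#define VMS_DEBUGGING_INFO 0
#endif

/* With 0/1 values the "more than one format" test is a plain sum and no
   longer needs `defined'.  */
#if 1 < (DBX_DEBUGGING_INFO + DWARF2_DEBUGGING_INFO + VMS_DEBUGGING_INFO)
#define MULTIPLE_DEBUG_FORMATS 1
#else
#define MULTIPLE_DEBUG_FORMATS 0
#endif

struct debug_hooks_model
{
  const char *name;
};

const struct debug_hooks_model default_hooks = { "default" };
const struct debug_hooks_model dwarf2_hooks = { "dwarf2" };
/* Defined even when VMS_DEBUGGING_INFO is 0, so that the reference in
   select_hooks always has something to link against.  */
const struct debug_hooks_model vms_hooks = { "vms" };

enum debug_kind_model { NO_DEBUG_MODEL, DWARF2_DEBUG_MODEL, VMS_DEBUG_MODEL };

/* Pick the hooks with ordinary conditions instead of #ifdef blocks.  */
const struct debug_hooks_model *
select_hooks (enum debug_kind_model kind)
{
  if (VMS_DEBUGGING_INFO && kind == VMS_DEBUG_MODEL)
    return &vms_hooks;
  if (DWARF2_DEBUGGING_INFO && kind == DWARF2_DEBUG_MODEL)
    return &dwarf2_hooks;
  return &default_hooks;
}
```

If vms_hooks were still guarded by #if VMS_DEBUGGING_INFO, the first
condition would only link where the compiler happens to drop the dead
reference, which is the situation the question above is about.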


Re: [PATCH], Add power9 support to GCC, patch #3 (scalar count trailing zeros)

2015-11-09 Thread Segher Boessenkool
On Mon, Nov 09, 2015 at 12:17:49PM -0500, Michael Meissner wrote:
> > > +  "TARGET_CTZ"
> > > +  "cnttz %0,%1"
> > > +  [(set_attr "type" "cntlz")])
> > 
> > We should probably rename this attr value now.  "cntz" maybe?  Could be
> > later of course.
> 
> I don't see a need to add another type attribute for count trailing zeros
> unless count leading zeros has a different timing than count trailing zeros.

I didn't suggest adding a "cnttz"; I suggested renaming "cntlz".  Maybe
"ctz" is better, that's what the target flag is as well.

Cheers,


Segher


Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)

2015-11-09 Thread Segher Boessenkool
On Sun, Nov 08, 2015 at 07:44:52PM -0500, Michael Meissner wrote:
> +/* Split a conversion from __float128 to an integer type into separate insns.
> +   OPERANDS points to the destination, source, and V2DI temporary
> +   register. CODE is either FIX or UNSIGNED_FIX.  */

dot space space

> +;; ISA 2.08 IEEE 128-bit floating point support.

3.0

> +(define_code_attr fix_fixuns  [(fix   "fix")   (unsigned_fix   "fixuns")])
> +(define_code_attr float_floatuns [(float "float") (unsigned_float 
> "floatuns")])

You could instead do an "uns" attribute so you would write fix etc.

> +;; 0 says do sign-extension, 1 says zero-extension
> +(define_insn "*ieee128_mtvsrw"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v,v,v,v")
> + (unspec:V2DI [(match_operand:SI 1 "nonimmediate_operand" "r,Z,r,Z")
> +   (match_operand:SI 2 "const_0_to_1_operand" "O,O,n,n")]
> +  UNSPEC_IEEE128_MOVE))]
> +  "TARGET_FLOAT128_HW"
> +  "@
> +   mtvsrwa %x0,%1
> +   lxsiwax %x0,%y1
> +   mtvsrwz %x0,%1
> +   lxsiwzx %x0,%y1"
> +  [(set_attr "type" "mffgpr,fpload,mffgpr,fpload")])

Tricky, is there no cleaner way to do this?


Segher


Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 11:44:30AM -0700, Jeff Law wrote:
> On 11/09/2015 11:34 AM, Bernd Schmidt wrote:
> >In general I think the _DEBUGGING_INFO patches are going to be OK,
> >modulo Jeff's comment about stage 1. I think they shouldn't have been
> >split - it causes numerous unnecessary extra changes, and the
> >intermediate stages look very inconsistent.

I'd actually really rather review them, or really deal with them in any
way, the way they are.  Smaller simpler patches that only deal with one
thing are much better.  I think the most macros that appear on one line
are 2, so at most you could lower that to 1 change instead of 2, but who
really cares anyway?  I guess it's not the greatest thing for blame, but
good blame tools should make that a tiny issue since uninteresting changes
are a fact of life, and I'd much rather skip uninteresting changes than
look at a giant change doing many things and wonder why it did one of
them.

> >>-#ifdef VMS_DEBUGGING_INFO
> >>-  else if (write_symbols == VMS_DEBUG || write_symbols ==
> >>VMS_AND_DWARF2_DEBUG)
> >>+  else if (VMS_DEBUGGING_INFO
> >>+   && (write_symbols == VMS_DEBUG
> >>+   || write_symbols == VMS_AND_DWARF2_DEBUG))
> >>  debug_hooks = &vmsdbg_debug_hooks;
> >>-#endif
> >>  #ifdef DWARF2_LINENO_DEBUGGING_INFO
> >>else if (write_symbols == DWARF2_DEBUG)
> >>  debug_hooks = &dwarf2_lineno_debug_hooks;
> >>diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
> >>index d41d4b2..6dd6878 100644
> >>--- a/gcc/vmsdbgout.c
> >>+++ b/gcc/vmsdbgout.c
> >>@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "coretypes.h"
> >>  #include "tm.h"
> >>
> >>-#ifdef VMS_DEBUGGING_INFO
> >>+#if VMS_DEBUGGING_INFO
> >>  #include "alias.h"
> >>  #include "tree.h"
> >>  #include "varasm.h"
> >
> >This seems to reference vmsdbg_debug_hooks unconditionally, but as far
> >as I can tell the definition is still guarded by an #if? Does this compile?
> There's an easy way for Trevor to find out.  Build a cross for one of the
> VMS targets (there's 3 defined in config-list.mk) :-)

as I said in 0/12 this did go through config-list.mk, and checking again
this does build on alpha-dec-vms.

Trev

> 
> jeff


Re: RFC: C++ delayed folding merge

2015-11-09 Thread Jason Merrill

On 11/09/2015 04:08 AM, Richard Biener wrote:

On Mon, 9 Nov 2015, Jason Merrill wrote:


I'm planning to merge the C++ delayed folding branch this week, but I need to
get approval of the back end changes (the first patch attached).  Most of
these are the introduction of non-folding variants of convert_to_*, but there
are a few others.

One question: The branch changes 'convert' to not fold its result, and it's
not clear to me whether that's part of the expected behavior of a front end
'convert' function or not.


History.  convert is purely frontend (but shared, unfortunately between
all frontends).  I would expect that FEs that do not do delayed folding
expect convert to fold.


Also, I'm a bit uncertain about merging this at the end of stage 1, since it's
a large internal change with relatively small user impact; it just improves
handling of constant expression corner cases.  I'm inclined to go ahead with
it at this point, but I'm interested in contrary opinions.


I welcome this change as it should allow cleaning up the FE-middle-end
interface a bit more.  It should be possible to remove all
NON_LVALUE_EXPR adding/removal from the middle-end folders.

Looks like the backend patch included frontend parts but as far as I
skimmed it only

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 5e32901..d754a90 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree type,
tree arg1)
else if (TREE_CODE (arg1) == REAL_CST)
 return fold_convert_const_fixed_from_real (type, arg1);
  }
+  else if (TREE_CODE (type) == VECTOR_TYPE)
+{
+  if (TREE_CODE (arg1) == VECTOR_CST
+ && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE
(arg1))
+ && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
+   {
+ tree r = copy_node (arg1);
+ TREE_TYPE (arg1) = type;
+ return r;
+   }
+}


looks suspicious.  The issue here is that the vector elements will
have the wrong type after this simple handling.


I was aiming to just handle simple cv-qualifier changes; that's why the 
TYPE_MAIN_VARIANT comparison is there.



If you fix that you can as well handle all kind of element type
changes via recursing to fold_convert_const (that includes
float to int / int to float changes).


But I'll try this.


Not sure if we can even
write a testcase for such conversions with the GCC vector extensions
though.


Jason




Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread David Malcolm
On Sun, 2015-11-08 at 16:16 +0100, Arnaud Charlet wrote:
> We've switched the Ada doc to sphinx indeed, so can only be
> in favor of this change for the rest of GCC.
> 
> We do have also a texi2rst script which handles 90% of the work, the
> rest requiring manual adaptations. I can send the script we've used if
> this can help.

I'm interested in seeing your script.  Can you post/upload it somewhere?

Thanks
Dave



Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-09 Thread David Malcolm
On Mon, 2015-11-09 at 16:54 +, Kyrill Tkachov wrote:
> Hi David,
> 
> On 08/11/15 13:55, David Malcolm wrote:
> > I've been experimenting with using Sphinx [1] for GCC's documentation.
> >
> > You can see an HTML sample of GCC docs built with Sphinx here:
> > https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/gcc.html
> > (it's a work-in-progress; i.e. there are bugs).
> >
> > Compare with:
> >   https://gcc.gnu.org/onlinedocs/gcc/index.html
> >
> >
> > In particular, note how options get stable, clickable URLs:
> > https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/option-summary.html
> 
> FWIW, I think this all looks much better than the existing formatting.
> One weird artifact I noticed while looking at the ARM options:
> https://dmalcolm.fedorapeople.org/gcc/2015-08-31/rst-experiment/hardware-models-and-configurations.html#arm-options
> In particular for -mcpu where it gives an explanation of what 
> -mcpu=generic- is equivalent to, there seems
> to be something weird going on.
> The .texi source is:
> @option{-mcpu=generic-@var{arch}} is also permissible, and is equivalent to 
> @option{-march=@var{arch} -mtune=generic-@var{arch}}.
> 
> Whereas the output looks something like:
> -mcpu=generic-``arch`` is also permissible, and is equivalent to 
> -march=``arch` -mtune=generic-arch`
> 
> The backticks look somewhat inconsistent. But that may be due to invalid use 
> of the @var and @option
> constructs in the source. I'm not very familiar with the details.

Thanks; I've filed this for myself as:
  https://github.com/davidmalcolm/texi2rst/issues/10

[...snip...]



Re: [PATCH 00/12] misc conditional compilation work

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 10:57:10AM -0700, Jeff Law wrote:
> On 11/09/2015 09:47 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders 
> >
> >Hi,
> >
> >basically $subject, making some code unconditionally compiled, and changing
> >other things from #ifdef to #if so they can be made unconditional
> >incrementally.
> >
> >patches individually bootstrapped + regtested on x86_64-linux-gnu, and a
> >slightly earlier version of the series ran through config-list.mk.  I think
> >everything here is either preapproved, or obvious so I'll commit it later
> >today if nobody complains.
> Are these the last patches of this nature planned for GCC6?  While the
> window was left slightly open by Richi this morning, I think that's more to
> allow the queues to drain rather than to allow more new work to go into the
> tree :-)

yeah, I guess I misread, I thought the end was tonight not last night (I
could easily have sent this out a day or so earlier).  Given my
incorrect assumption about timing I was considering trying to sneak in a
little more around reg-stack.c, but I suspect that isn't going to work
out anyway (turns out even after the macros reg-stack.c uses x86
specific variables).

Trev

> 
> jeff
> 


Re: [PATCH 01/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_ARG_POINTER

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

+++ b/gcc/dbxout.c
@@ -3076,10 +3076,8 @@ dbxout_symbol_location (tree decl, tree type, const char 
*suffix, rtx home)
   || (REG_P (XEXP (home, 0))
   && REGNO (XEXP (home, 0)) != HARD_FRAME_POINTER_REGNUM
   && REGNO (XEXP (home, 0)) != STACK_POINTER_REGNUM
-#if !HARD_FRAME_POINTER_IS_ARG_POINTER
-  && REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM
-#endif
-  )))
+  && (HARD_FRAME_POINTER_IS_ARG_POINTER
+  || REGNO (XEXP (home, 0)) != ARG_POINTER_REGNUM


This used to be

#if ARG_POINTER_REGNUM != HARD_FRAME_POINTER_REGNUM

and the whole macro seems kind of pointless - why not just make the 
ARG_POINTER_REGNUM test unconditional? I think the conditional 
compilation was originally just a "performance optimization", avoiding 
unnecessary tests - which means the reason to have the tests goes away 
if we move away from the conditional compilation.



Bernd
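
The redundancy argued above can be checked with a small model. This is
only a sketch: it assumes HARD_FRAME_POINTER_IS_ARG_POINTER simply means
that the two register numbers are equal, and it passes the register
numbers as parameters (all names hypothetical) so that both
configurations can be exercised in one program.

```c
#include <stdbool.h>

/* The test as written in the patch, with the macro modelled as a
   comparison of the two register numbers.  */
bool
with_guard (int regno, int hfp, int sp, int argp)
{
  bool hfp_is_argp = (hfp == argp);
  return regno != hfp && regno != sp && (hfp_is_argp || regno != argp);
}

/* The same test with the ARG_POINTER_REGNUM comparison made
   unconditional.  */
bool
without_guard (int regno, int hfp, int sp, int argp)
{
  return regno != hfp && regno != sp && regno != argp;
}

/* Count the register numbers in [0, nregs) on which the two disagree.  */
int
count_mismatches (int nregs, int hfp, int sp, int argp)
{
  int mismatches = 0;
  for (int regno = 0; regno < nregs; regno++)
    if (with_guard (regno, hfp, sp, argp)
        != without_guard (regno, hfp, sp, argp))
      mismatches++;
  return mismatches;
}
```

Under that assumption the two forms agree for every register number, both
when the arg pointer is a separate register and when it coincides with
the hard frame pointer, so the guard only saves a comparison.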


Re: [PATCH], Add power9 support to GCC, patch #5 (ISA 3.0 fusion)

2015-11-09 Thread David Edelsohn
On Sun, Nov 8, 2015 at 4:42 PM, Michael Meissner
 wrote:
> This patch adds support for new fusion forms in ISA 3.0 (power9).  In
> particular, ISA 3.0 can fuse GPR loads of R0, FPR loads, GPR stores, FPR
> stores, and some constant generation that ISA 2.07 (power8) could not
> generate.
>
> I have built this patch with a bootstrap build on a power8 little endian
> system.  There were no regressions in the test suite.  Is this patch ok to
> install in the trunk once patch #1 has been installed.
>
> [gcc]
> 2015-11-08  Michael Meissner  
>
> * config/rs6000/constraints.md (wF constraint): New constraints
> for power9/toc fusion.
> (wG constraint): Likewise.
>
> * config/rs6000/predicates.md (upper16_cint_operand): New
> predicate for power9 and toc fusion.
> (fpr_reg_operand): Likewise.
> (toc_fusion_or_p9_reg_operand): Likewise.
> (toc_fusion_mem_raw): Likewise.
> (toc_fusion_mem_wrapped): Likewise.
> (fusion_gpr_addis): If power9 fusion, allow fusion for a larger
> address range.
> (fusion_gpr_mem_combo): Delete, use fusion_addis_mem_combo_load
> instead.
> (fusion_addis_mem_combo_load): Add support for power9 fusion of
> floating point loads, floating point stores, and gpr stores.
> (fusion_addis_mem_combo_store): Likewise.
> (fusion_offsettable_mem_operand): Likewise.
>
> * config/rs6000/rs6000-protos.h (emit_fusion_addis): Add
> declarations.
> (emit_fusion_load_store): Likewise.
> (fusion_p9_p): Likewise.
> (expand_fusion_p9_load): Likewise.
> (expand_fusion_p9_store): Likewise.
> (emit_fusion_p9_load): Likewise.
> (emit_fusion_p9_store): Likewise.
> (fusion_wrap_memory_address): Likewise.
>
> * config/rs6000/rs6000.c (struct rs6000_reg_addr): Add new
> elements for power9 fusion.
> (rs6000_debug_print_mode): Rework debug information to print more
> information about fusion.
> (rs6000_init_hard_regno_mode_ok): Setup for power9 fusion
> support.
> (rs6000_legitimate_address_p): Recognize toc fusion as a valid
> offsettable memory address.
> (emit_fusion_gpr_load): Move most of the code from
> emit_fusion_gpr_load into emit_fusion_addis that handles both
> power8 and power9 fusion.
> (emit_fusion_addis): Likewise.
> (emit_fusion_load_store): Likewise.
> (fusion_wrap_memory_address): Add support for TOC fusion.
> (fusion_split_address): Likewise.
> (fusion_p9_p): Add support for power9 fusion.
> (expand_fusion_p9_load): Likewise.
> (expand_fusion_p9_store): Likewise.
> (emit_fusion_p9_load): Likewise.
> (emit_fusion_p9_store): Likewise.
>
> * config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): New macros for
> power9 fusion support.
> (TARGET_TOC_FUSION_FP): Likewise.
>
> * config/rs6000/rs6000.md (UNSPEC_FUSION_P9): New power9/toc
> fusion unspecs.
> (UNSPEC_FUSION_ADDIS): Likewise.
> (QHSI mode iterator): New iterator for power9 fusion.
> (GPR_FUSION): Likewise.
> (FPR_FUSION): Likewise.
> (power9 fusion splitter): New power9/toc fusion support.
> (toc_fusionload_): Likewise.
> (toc_fusionload_di): Likewise.
> (fusion_gpr_load_): Update predicate function.
> (power9 fusion peephole2s): New power9/toc fusion support.
> (fusion_gpr___load): Likewise.
> (fusion_gpr___store): Likewise.
> (fusion_fpr___load): Likewise.
> (fusion_fpr___store): Likewise.
> (fusion_p9__constant): Likewise.
>
> [gcc/testsuite]
> 2015-11-08  Michael Meissner  
>
> * gcc.target/powerpc/fusion.c (fusion_vector): Move to fusion2.c
> and allow the test on PowerPC LE.
> * gcc.target/powerpc/fusion2.c (fusion_vector): Likewise.
>
> * gcc.target/powerpc/fusion3.c: New file, test power9 fusion.

Okay, with the changes that you and Segher discussed.

Thanks, David


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Trevor Saunders
On Mon, Nov 09, 2015 at 11:42:19AM -0700, Jeff Law wrote:
> On 11/09/2015 11:27 AM, Bernd Schmidt wrote:
> >On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:
> >>From: Trevor Saunders 
> >>
> >>gcc/ChangeLog:
> >>
> >>2015-11-09  Trevor Saunders  
> >>
> >>* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
> >>* df-scan.c (df_get_exit_block_use_set): Adjust.
> >>* except.c (expand_eh_return): Likewise.
> >
> >As I said for a previous patch series, if we go to the trouble of fixing
> >up stuff like this, we might as well do it properly and turn things like
> >this into a target hook.
> I agree that pushing hookization further is good as well.  I still think the
> patch in and of itself is a step forward, even if it doesn't hookize
> EH_RETURN_HANDLER_RTX.

yeah, that's more or less my thought, and this makes hookization easier
since you can now mechanically add a hook for each thing in defaults.h
that invokes the macro.  Then for each target you can go through and
replace the macro with an override of the hooks.  That ends up with the
macros replaced by hooks without writing a lot of patches that need to
go through config-list.mk, and testing on multiple targets which imho is
a giant pain, and rather slow.

Trev
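
A rough sketch of the mechanical route described above, in miniature. The
hook structure, the string stand-in for rtx and the NULL default are all
assumptions made for the example; none of this is the real target hook
machinery.

```c
#include <stddef.h>

/* Hypothetical stand-in for rtx.  */
typedef const char *rtx_model;

/* Step 1: the macro always has a definition (the default value used
   here is an assumption).  */
#ifndef EH_RETURN_HANDLER_RTX
#define EH_RETURN_HANDLER_RTX NULL
#endif

struct target_hooks_model
{
  rtx_model (*eh_return_handler_rtx) (void);
};

/* Step 2: the default hook does nothing but invoke the macro, so it can
   be added mechanically for every macro that has a default.  */
rtx_model
default_eh_return_handler_rtx (void)
{
  return EH_RETURN_HANDLER_RTX;
}

/* Step 3: a target that wants different behaviour overrides the hook
   and stops defining the macro.  */
rtx_model
example_target_eh_return_handler_rtx (void)
{
  return "(mem (reg fp))";
}

struct target_hooks_model default_target
  = { default_eh_return_handler_rtx };
struct target_hooks_model example_target
  = { example_target_eh_return_handler_rtx };
```

Callers only ever go through the hook, so a target can be switched from
macro to override one at a time without touching the callers again.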

> 
> jeff


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 07:42 PM, Jeff Law wrote:

On 11/09/2015 11:27 AM, Bernd Schmidt wrote:

On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
* df-scan.c (df_get_exit_block_use_set): Adjust.
* except.c (expand_eh_return): Likewise.


As I said for a previous patch series, if we go to the trouble of fixing
up stuff like this, we might as well do it properly and turn things like
this into a target hook.

I agree that pushing hookization further is good as well.  I still think
the patch in and of itself is a step forward, even if it doesn't hookize
EH_RETURN_HANDLER_RTX.


Well, I was hoping that, by pointing out the issue for the last patch 
set, the next set of patches would get things right. We really shouldn't 
make sideways steps when there's a simple way to go forward.



Bernd


Re: [PATCH 02/12] remove EXTENDED_SDB_BASIC_TYPES

2015-11-09 Thread Bernd Schmidt


The last target using this was i960, which was removed many years ago,
so there's no reason to keep it.

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* gsyms.h (enum sdb_type): Remove code for
EXTENDED_SDB_BASIC_TYPES.
(enum sdb_masks): Likewise.
* sdbout.c (plain_type_1): Likewise.


Ok if you also poison the macro name as usual.


Bernd
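
For readers who have not seen it, poisoning is done with the GCC
preprocessor pragma shown below. This is a sketch of a header fragment
only; the enum and the function are hypothetical and exist just to give
the fragment something to compile.

```c
/* After this directive GCC rejects any later appearance of the
   identifier, so a stale use of the removed macro fails at compile time
   instead of being silently ignored.  */
#pragma GCC poison EXTENDED_SDB_BASIC_TYPES

/* Hypothetical code that no longer has an "extended" variant.  */
enum sdb_type_model
{
  T_NULL_MODEL,
  T_INT_MODEL,
  T_LAST_MODEL
};

int
sdb_type_model_count (void)
{
  return T_LAST_MODEL;
}
```

The directive has to come after the last legitimate mention of the name,
since it applies to everything that follows it.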


Re: [ping] Fix PR debug/66728

2015-11-09 Thread Mike Stump
On Nov 6, 2015, at 5:06 AM, Richard Biener  wrote:
>> If there are no substantial reasons to not check it in now, I’d like to 
>> proceed and get it checked in.  People can refine it further in tree if they 
>> want.  Any objections?
> 
> Ok with a changelog entry and bootstrap/regtest.

Also committed to the release branch after waiting a few days to ensure no 
issue on trunk after the normal regression test and bootstrap.

Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Jeff Law

On 11/09/2015 11:34 AM, Bernd Schmidt wrote:

In general I think the _DEBUGGING_INFO patches are going to be OK,
modulo Jeff's comment about stage 1. I think they shouldn't have been
split - it causes numerous unnecessary extra changes, and the
intermediate stages look very inconsistent.


-#ifdef VMS_DEBUGGING_INFO
-  else if (write_symbols == VMS_DEBUG || write_symbols ==
VMS_AND_DWARF2_DEBUG)
+  else if (VMS_DEBUGGING_INFO
+   && (write_symbols == VMS_DEBUG
+   || write_symbols == VMS_AND_DWARF2_DEBUG))
  debug_hooks = &vmsdbg_debug_hooks;
-#endif
  #ifdef DWARF2_LINENO_DEBUGGING_INFO
else if (write_symbols == DWARF2_DEBUG)
  debug_hooks = &dwarf2_lineno_debug_hooks;
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index d41d4b2..6dd6878 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"

-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
  #include "alias.h"
  #include "tree.h"
  #include "varasm.h"


This seems to reference vmsdbg_debug_hooks unconditionally, but as far
as I can tell the definition is still guarded by an #if? Does this compile?
There's an easy way for Trevor to find out.  Build a cross for one of 
the VMS targets (there's 3 defined in config-list.mk) :-)


jeff


Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT

2015-11-09 Thread Mike Stump
On Nov 9, 2015, at 3:32 AM, Kyrill Tkachov  wrote:
> The aarch64 port does not define TARGET_SUPPORTS_WIDE_INT.

> Ok for trunk and GCC 5?

:-)  I’d endorse it, but, best left to the target folks.


Re: [PATCH 11/12] always define HAVE_AS_LEB128

2015-11-09 Thread Bernd Schmidt

-#ifdef HAVE_AS_LEB128
+#if HAVE_AS_LEB128


This patch doesn't seem to actually remove any conditional compilation?
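
The difference between the two directives in the hunk above only shows up once the macro is always defined. A minimal sketch, assuming a hypothetical macro HAVE_FEATURE_X that a defaults header sets to 0 when configure did not define it (the name is illustrative, not from the patch):

```cpp
// Hypothetical stand-in for a macro that a defaults header always defines.
#ifndef HAVE_FEATURE_X
#define HAVE_FEATURE_X 0
#endif

// #ifdef only asks whether the name is defined, so it is true even for 0.
#ifdef HAVE_FEATURE_X
constexpr bool seen_by_ifdef = true;
#else
constexpr bool seen_by_ifdef = false;
#endif

// #if tests the value, so a default of 0 disables the guarded block.
#if HAVE_FEATURE_X
constexpr bool seen_by_if = true;
#else
constexpr bool seen_by_if = false;
#endif
```

In both forms the guarded code is still selected by the preprocessor, which is the point of the question above: switching from #ifdef to #if changes what is tested, not whether conditional compilation is used.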


Bernd


Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Jeff Law

On 11/09/2015 11:27 AM, Bernd Schmidt wrote:

On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
* df-scan.c (df_get_exit_block_use_set): Adjust.
* except.c (expand_eh_return): Likewise.


As I said for a previous patch series, if we go to the trouble of fixing
up stuff like this, we might as well do it properly and turn things like
this into a target hook.
I agree that pushing hookization further is good as well.  I still think 
the patch in and of itself is a step forward, even if it doesn't hookize 
EH_RETURN_HANDLER_RTX.
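
For readers unfamiliar with the two styles being compared here, the following is a rough sketch only. All names (EXAMPLE_RETURN_HANDLER, example_target_hooks, and the functions) are hypothetical and do not reflect GCC's actual target hook machinery:

```cpp
// Hypothetical sketch; names are illustrative, not GCC's real hook API.

// Style 1: a macro with a default, as a defaults header would supply it.
#ifndef EXAMPLE_RETURN_HANDLER
#define EXAMPLE_RETURN_HANDLER 0   // 0 means "target has no handler"
#endif

// Style 2: a hook, i.e. a function pointer stored in a per-target table.
struct example_target_hooks
{
  int (*return_handler) ();
};

// Default hook implementation, used when a target does not override it.
inline int default_return_handler () { return 0; }

// A target overrides the hook by supplying its own function.
inline int custom_return_handler () { return 42; }

const example_target_hooks default_target = { default_return_handler };
const example_target_hooks custom_target = { custom_return_handler };

// Generic code queries the hook at run time instead of testing a macro.
inline bool has_return_handler (const example_target_hooks &t)
{
  return t.return_handler () != 0;
}
```

The macro form still bakes the answer in at build time; the hook form moves the decision behind a function call, which is what "hookization" refers to in this thread.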


jeff


Re: [PATCH 12/12] always define ENABLE_OFFLOADING

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

-#ifdef ENABLE_OFFLOADING
/* If the user didn't specify any, default to all configured offload
   targets.  */
if (offload_targets == NULL)
  handle_foffload_option (OFFLOAD_TARGETS);
-#endif


This one I would keep guarded with an if.

Otherwise ok modulo stage 1 end.


Bernd


Re: [PATCH 05/12] always define VMS_DEBUGGING_INFO

2015-11-09 Thread Bernd Schmidt
In general I think the _DEBUGGING_INFO patches are going to be OK, 
modulo Jeff's comment about stage 1. I think they shouldn't have been 
split - it causes numerous unnecessary extra changes, and the 
intermediate stages look very inconsistent.



-#ifdef VMS_DEBUGGING_INFO
-  else if (write_symbols == VMS_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG)
+  else if (VMS_DEBUGGING_INFO
+  && (write_symbols == VMS_DEBUG
+  || write_symbols == VMS_AND_DWARF2_DEBUG))
  debug_hooks = &vmsdbg_debug_hooks;
-#endif
  #ifdef DWARF2_LINENO_DEBUGGING_INFO
else if (write_symbols == DWARF2_DEBUG)
  debug_hooks = &dwarf2_lineno_debug_hooks;
diff --git a/gcc/vmsdbgout.c b/gcc/vmsdbgout.c
index d41d4b2..6dd6878 100644
--- a/gcc/vmsdbgout.c
+++ b/gcc/vmsdbgout.c
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "coretypes.h"
  #include "tm.h"

-#ifdef VMS_DEBUGGING_INFO
+#if VMS_DEBUGGING_INFO
  #include "alias.h"
  #include "tree.h"
  #include "varasm.h"


This seems to reference vmsdbg_debug_hooks unconditionally, but as far 
as I can tell the definition is still guarded by an #if? Does this compile?
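
The concern can be shown in isolation. This is a sketch under assumptions: DEBUG_FORMAT_X, hooks, x_hooks and select_hooks are hypothetical names, not the GCC ones. Once the guard becomes a plain `if`, the branch is parsed and type-checked even when the macro is 0, so the object it names has to be visible unconditionally:

```cpp
// Hypothetical names; not taken from the GCC sources.
#ifndef DEBUG_FORMAT_X
#define DEBUG_FORMAT_X 0
#endif

struct hooks
{
  const char *name;
};

const hooks default_hooks = { "default" };

// Defined without any #if guard: the reference in select_hooks below is
// always compiled, so the object must exist even when the format is off.
const hooks x_hooks = { "x" };

const hooks *
select_hooks (bool want_x)
{
  // With DEBUG_FORMAT_X == 0 this branch is dead, but it is still
  // compiled, unlike code inside an #ifdef block.
  if (DEBUG_FORMAT_X && want_x)
    return &x_hooks;
  return &default_hooks;
}
```

If the definition of x_hooks were left inside an #if block that evaluates to 0, the build would depend on the compiler discarding the dead reference, which is why building a cross compiler for the affected target is the practical check.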



Bernd


[PATCH, 8/16] Add pass_ch_oacc_kernels

2015-11-09 Thread Tom de Vries

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1  Insert new exit block only when needed in
     transform_to_exit_first_loop_alt
  2  Make create_parallel_loop return void
  3  Ignore reduction clause on kernels directive
  4  Implement -foffload-alias
  5  Add in_oacc_kernels_region in struct loop
  6  Add pass_oacc_kernels
  7  Add pass_dominator_oacc_kernels
  8  Add pass_ch_oacc_kernels
  9  Add pass_parallelize_loops_oacc_kernels
 10  Add pass_oacc_kernels pass group in passes.def
 11  Update testcases after adding kernels pass group
 12  Handle acc loop directive
 13  Add c-c++-common/goacc/kernels-*.c
 14  Add gfortran.dg/goacc/kernels-*.f95
 15  Add libgomp.oacc-c-c++-common/kernels-*.c
 16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds a pass pass_ch_oacc_kernels, which is like pass_ch, but 
only runs for loops with oacc_kernels_region set.


[ But... thinking about it a bit more, I think that we could use a 
regular pass_ch instead. We only use the kernels pass group for a single 
loop nest in a kernels region, and we mark all the loops in the loop 
nest with oacc_kernels_region. So I think that the oacc_kernels_region 
test in pass_ch_oacc_kernels::process_loop_p evaluates to true. ]


So, I'll try to confirm with retesting that we can drop this patch.
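
The reasoning in the bracketed note can be sketched with a toy model. Everything here is a simplified assumption (loop_info, ch_like_base and ch_like_oacc_kernels are hypothetical stand-ins, not the classes in tree-ssa-loop-ch.c): a base pass asks a virtual predicate for each loop, and the kernels variant only accepts marked loops.

```cpp
#include <vector>

// Hypothetical, heavily simplified stand-ins for GCC's loop and pass types.
struct loop_info
{
  int num;
  bool in_oacc_kernels_region;
};

class ch_like_base
{
public:
  virtual ~ch_like_base () {}

  // Return the numbers of the loops this pass would process.
  std::vector<int> run (const std::vector<loop_info> &loops)
  {
    std::vector<int> processed;
    for (const loop_info &l : loops)
      if (process_loop_p (l))
        processed.push_back (l.num);
    return processed;
  }

protected:
  // The regular pass accepts every loop in this toy model.
  virtual bool process_loop_p (const loop_info &) { return true; }
};

class ch_like_oacc_kernels : public ch_like_base
{
protected:
  // The kernels variant only accepts loops marked as kernels-region loops.
  bool process_loop_p (const loop_info &l) override
  {
    return l.in_oacc_kernels_region;
  }
};
```

If every loop the pass group sees is marked, both variants process the same set, which is the condition under which the separate pass would be redundant.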

Thanks,
- Tom

Add pass_ch_oacc_kernels

2015-11-09  Tom de Vries  

	* tree-pass.h (make_pass_ch_oacc_kernels): Declare.
	* tree-ssa-loop-ch.c (pass_ch::pass_ch (pass_data, gcc::context)): New
	constructor.
	(pass_data_ch_oacc_kernels): New pass_data.
	(class pass_ch_oacc_kernels): New pass.
	(pass_ch_oacc_kernels::process_loop_p): New function.
	(make_pass_ch_oacc_kernels): New function.
---
 gcc/tree-pass.h|  1 +
 gcc/tree-ssa-loop-ch.c | 54 +-
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 2825aea..f95a820 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -389,6 +389,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 7e618bf..8bf47fe 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "tree-ssa-scopedtables.h"
 #include "tree-ssa-threadedge.h"
+#include "omp-low.h"
 
 /* Duplicates headers of loops if they are small enough, so that the statements
in the loop body are always executed when the loop is entered.  This
@@ -124,7 +125,7 @@ do_while_loop_p (struct loop *loop)
 
 namespace {
 
-/* Common superclass for both header-copying phases.  */
+/* Common superclass for header-copying phases.  */
 class ch_base : public gimple_opt_pass
 {
   protected:
@@ -159,6 +160,10 @@ public:
 : ch_base (pass_data_ch, ctxt)
   {}
 
+  pass_ch (pass_data data, gcc::context *ctxt)
+: ch_base (data, ctxt)
+  {}
+
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_ch != 0; }
   
@@ -414,3 +419,50 @@ make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
 }
+
+namespace {
+
+const pass_data pass_data_ch_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "ch_oacc_kernels", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_ch_oacc_kernels : public pass_ch
+{
+public:
+  pass_ch_oacc_kernels (gcc::context *ctxt)
+: pass_ch (pass_data_ch_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+
+protected:
+  /* ch_base method: */
+  virtual bool pro

Re: [PATCH 10/12] always define EH_RETURN_HANDLER_RTX

2015-11-09 Thread Bernd Schmidt

On 11/09/2015 05:47 PM, tbsaunde+...@tbsaunde.org wrote:

From: Trevor Saunders 

gcc/ChangeLog:

2015-11-09  Trevor Saunders  

* defaults.h (EH_RETURN_HANDLER_RTX): New default definition.
* df-scan.c (df_get_exit_block_use_set): Adjust.
* except.c (expand_eh_return): Likewise.


As I said for a previous patch series, if we go to the trouble of fixing 
up stuff like this, we might as well do it properly and turn things like 
this into a target hook.



Bernd


[PATCH, 7/16] Add pass_dominator_oacc_kernels

2015-11-09 Thread Tom de Vries

On 09/11/15 16:35, Tom de Vries wrote:

Hi,

this patch series for stage1 trunk adds support to:
- parallelize oacc kernels regions using parloops, and
- map the loops onto the oacc gang dimension.

The patch series contains these patches:

  1  Insert new exit block only when needed in
     transform_to_exit_first_loop_alt
  2  Make create_parallel_loop return void
  3  Ignore reduction clause on kernels directive
  4  Implement -foffload-alias
  5  Add in_oacc_kernels_region in struct loop
  6  Add pass_oacc_kernels
  7  Add pass_dominator_oacc_kernels
  8  Add pass_ch_oacc_kernels
  9  Add pass_parallelize_loops_oacc_kernels
 10  Add pass_oacc_kernels pass group in passes.def
 11  Update testcases after adding kernels pass group
 12  Handle acc loop directive
 13  Add c-c++-common/goacc/kernels-*.c
 14  Add gfortran.dg/goacc/kernels-*.f95
 15  Add libgomp.oacc-c-c++-common/kernels-*.c
 16  Add libgomp.oacc-fortran/kernels-*.f95

The first 9 patches are more or less independent, but patches 10-16 are
intended to be committed at the same time.

Bootstrapped and reg-tested on x86_64.

Built and reg-tested with nvidia accelerator, in combination with a
patch that enables accelerator testing (which is submitted at
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).

I'll post the individual patches in reply to this message.


this patch adds pass_dominator_oacc_kernels (which we may as well call 
pass_dominator_no_peel_loop_headers, since it doesn't do anything 
oacc-kernels-specific), to be used in the kernels pass group.


The reason I'm adding a new pass instead of using pass_dominator is that 
pass_dominator uses first_pass_instance. So adding a pass_dominator 
instance A before a pass_dominator instance B has the unexpected 
consequence that it may change the behaviour of instance B. I've filed 
PR68247 - "Remove pass_first_instance" to note this issue.
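
The ordering effect described above can be reproduced with a toy scheduler. This is an assumption-laden sketch, not GCC's pass manager: scheduled_pass, schedule and may_peel_loop_headers are hypothetical, and the only idea borrowed from the patch is that a behaviour is keyed on being the first instance of a pass.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model (an assumption, not GCC's pass manager): each scheduled pass
// is told whether it is the first instance of its name in the pipeline.
struct scheduled_pass
{
  std::string name;
  bool first_instance;
};

std::vector<scheduled_pass>
schedule (const std::vector<std::string> &names)
{
  std::map<std::string, int> seen;
  std::vector<scheduled_pass> out;
  for (const std::string &n : names)
    out.push_back ({ n, seen[n]++ == 0 });
  return out;
}

// Mirrors the idea in the patch: a behaviour keyed on being first.
bool
may_peel_loop_headers (const scheduled_pass &p)
{
  return p.first_instance;
}
```

Scheduling {"ch", "dom"} makes the "dom" entry a first instance; scheduling {"dom", "ch", "dom"} leaves the final "dom" entry in place but it is no longer first, so its behaviour changes without any edit to that entry itself.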


Thanks,
- Tom

Add pass_dominator_oacc_kernels

2015-11-09  Tom de Vries  

	* tree-pass.h (make_pass_dominator_oacc_kernels): Declare.
	* tree-ssa-dom.c (class dominator_base): New class.  Factor out of ...
	(class pass_dominator): ... here.
	(dominator_base::may_peel_loop_headers_p)
(pass_dominator::may_peel_loop_headers_p): New function.
	(pass_dominator_oacc_kernels): New pass.
	(make_pass_dominator_oacc_kernels): New function.
	(dominator_base::execute): Use may_peel_loop_headers_p.
---
 gcc/tree-pass.h|  1 +
 gcc/tree-ssa-dom.c | 57 +-
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 4ed8da6..2825aea 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -395,6 +395,7 @@ extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_alias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_dominator_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 3887bbe1..e4ff63a 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -519,6 +519,19 @@ private:
 
 namespace {
 
+class dominator_base : public gimple_opt_pass
+{
+ protected:
+  dominator_base (pass_data data, gcc::context *ctxt)
+: gimple_opt_pass (data, ctxt)
+  {}
+
+  unsigned int execute (function *);
+
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return true; }
+}; // class dominator_base
+
 const pass_data pass_data_dominator =
 {
   GIMPLE_PASS, /* type */
@@ -532,22 +545,23 @@ const pass_data pass_data_dominator =
   ( TODO_cleanup_cfg | TODO_update_ssa ), /* todo_flags_finish */
 };
 
-class pass_dominator : public gimple_opt_pass
+class pass_dominator : public dominator_base
 {
 public:
   pass_dominator (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_dominator, ctxt)
+: dominator_base (pass_data_dominator, ctxt)
   {}
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_dominator (m_ctxt); }
   virtual bool gate (function *) { return flag_tree_dom != 0; }
-  virtual unsigned int execute (function *);
 
+ protected:
+  virtual bool may_peel_loop_headers_p (void) { return first_pass_instance; }
 }; // class pass_dominator
 
 unsigned int
-pass_dominator::execute (function *fun)
+dominator_base::execute (function *fun)
 {
   memset (&opt_stats, 0, sizeof (opt_stats));
 
@@ -619,7 +633,7 @@ pass_dominator::execute (function *fun)
   free_all_edge_infos ();
 
   /* Thread jumps, creating duplicate blocks as needed.  */
-  cfg_altered |= thread_through_all_blocks (first_pass_instance);
+  cfg_altered |= thr

Re: [PATCH], Add power9 support to GCC, patch #2 (add modulus instructions)

2015-11-09 Thread Michael Meissner
On Mon, Nov 09, 2015 at 09:48:50AM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Sun, Nov 08, 2015 at 07:36:16PM -0500, Michael Meissner wrote:
> > [gcc/testsuite]
> > * lib/target-supports.exp (check_p9vector_hw_available): Add
> > checks for power9 availability.
> > (check_effective_target_powerpc_p9vector_ok): Likewise.
> 
> It's probably better not to use this for modulo; it is confusing and if
> you'll later need to untangle it it is much more work.
> 
> > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> 
> Lose this line?  If Darwin cannot support modulo, the next line will
> catch that.
> 
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
> > +/* { dg-options "-mcpu=power9 -O3" } */
> 
> Is -O3 needed?  Why won't -O2 work?

Just habit.

> > +proc check_p9vector_hw_available { } {
> > +return [check_cached_effective_target p9vector_hw_available {
> > +   # Some simulators are known to not support VSX/power8 instructions.
> > +   # For now, disable on Darwin
> > +   if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || [istarget *-*-darwin*]} {
> 
> Long line.

Cut and paste from other tests.

> > Index: gcc/config/rs6000/rs6000.md
> > ===
> > --- gcc/config/rs6000/rs6000.md (revision 229972)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -2885,9 +2885,9 @@ (define_insn_and_split "*div3_sra_
> > (set_attr "cell_micro" "not")])
> >  
> >  (define_expand "mod3"
> > -  [(use (match_operand:GPR 0 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 1 "gpc_reg_operand" ""))
> > -   (use (match_operand:GPR 2 "reg_or_cint_operand" ""))]
> > +  [(set (match_operand:GPR 0 "gpc_reg_operand" "")
> > +   (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "")
> > +(match_operand:GPR 2 "reg_or_cint_operand" "")))]
> 
> You could delete the empty constraint strings while you're at it.
> 
> > +;; On machines with modulo support, do a combined div/mod the old fashioned
> > +;; method, since the multiply/subtract is faster than doing the mod instruction
> > +;; after a divide.
> 
> You can instead have a "divmod" insn that is split to either of div, mod,
> or div+mul+sub depending on which of the outputs is unused.  Peepholes
> do not get all cases.

Yes, though as I recall, I couldn't get it to do what I wanted, and moved on to
other targets.
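
As a plain illustration of the "div+mul+sub" form discussed above (this is ordinary C++ arithmetic, not the machine description pattern; divmod_result and divmod_via_mul_sub are hypothetical names), the remainder can be derived from the quotient when both results are wanted:

```cpp
#include <cstdint>

struct divmod_result
{
  int64_t quot;
  int64_t rem;
};

// Assumes b != 0 and that a / b does not overflow (i.e. not INT64_MIN / -1).
inline divmod_result
divmod_via_mul_sub (int64_t a, int64_t b)
{
  int64_t q = a / b;        // one divide
  int64_t r = a - q * b;    // multiply and subtract instead of a second mod
  return { q, r };
}
```

For truncating division this yields the same remainder as the % operator; whether it is faster than a dedicated modulo instruction is the claim made in the quoted patch comment, not something this sketch measures.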

> This can be a later improvement of course.

Yep.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797


