Re: update acc routines in fortran

2015-11-24 Thread Cesar Philippidis
On 11/20/2015 02:18 AM, Jakub Jelinek wrote:
> On Thu, Nov 19, 2015 at 08:26:45AM -0800, Cesar Philippidis wrote:
>>  (gfc_oacc_routine_name): New struct;
> 
> Full stop instead of semicolon.

Fixed.

>> diff --git a/gcc/tree-nested.c b/gcc/tree-nested.c
>> index 1f6311c..e321072 100644
>> --- a/gcc/tree-nested.c
>> +++ b/gcc/tree-nested.c
>> @@ -1106,6 +1106,9 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
>> walk_stmt_info *wi)
>>  case OMP_CLAUSE_NUM_TASKS:
>>  case OMP_CLAUSE_HINT:
>>  case OMP_CLAUSE__CILK_FOR_COUNT_:
>> +case OMP_CLAUSE_NUM_GANGS:
>> +case OMP_CLAUSE_NUM_WORKERS:
>> +case OMP_CLAUSE_VECTOR_LENGTH:
>>wi->val_only = true;
>>wi->is_lhs = false;
>>convert_nonlocal_reference_op (&OMP_CLAUSE_OPERAND (clause, 0),
>> @@ -1173,6 +1176,10 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
>> walk_stmt_info *wi)
>>  case OMP_CLAUSE_THREADS:
>>  case OMP_CLAUSE_SIMD:
>>  case OMP_CLAUSE_DEFAULTMAP:
>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
> 
> This looks wrong.  OMP_CLAUSE_GANG has 2 arguments, OMP_CLAUSE_WORKER and
> OMP_CLAUSE_VECTOR one argument, if you use a non-local decl or local decl
> that is referenced by a nested routine in those operands, it won't be
> handled properly.

Yeah, you're right. This didn't get updated when we added support for
the loop shape arguments. I fixed that in this patch.

>> @@ -1830,6 +1840,10 @@ convert_local_omp_clauses (tree *pclauses, struct 
>> walk_stmt_info *wi)
>>  case OMP_CLAUSE_THREADS:
>>  case OMP_CLAUSE_SIMD:
>>  case OMP_CLAUSE_DEFAULTMAP:
>> +    case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_SEQ:
> 
> Ditto.
> 
> Otherwise LGTM.

Are the tree-nested changes ok?

Cesar

2015-11-24  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* tree-nested.c (convert_nonlocal_omp_clauses): Add support for
	OMP_CLAUSE_{NUM_GANGS,NUM_VECTORS,VECTOR_LENGTH,SEQ}.
	(convert_local_omp_clauses): Likewise.

2015-11-24  Cesar Philippidis  <ce...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>

	gcc/fortran/
	* f95-lang.c (gfc_attribute_table): Add an "oacc function"
	attribute.
	* gfortran.h (symbol_attribute): Add an oacc_function bit-field.
	(gfc_oacc_routine_name): New struct;
	(gfc_get_oacc_routine_name): New macro.
	(gfc_namespace): Add oacc_routine_clauses, oacc_routine_names and
	oacc_routine fields.
	(gfc_exec_op): Add EXEC_OACC_ROUTINE.
	* openmp.c (OACC_ROUTINE_CLAUSES): New mask.
	(gfc_oacc_routine_dims): New function.
	(gfc_match_oacc_routine): Add support for named routines and the
	gang, worker vector and seq clauses.
	* parse.c (is_oacc): Add EXEC_OACC_ROUTINE.
	* resolve.c (gfc_resolve_blocks): Likewise.
	* st.c (gfc_free_statement): Likewise.
	* trans-decl.c (add_attributes_to_decl): Attach an 'oacc function'
	attribute and shape geometry for acc routine.

2015-11-24  Cesar Philippidis  <ce...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>

	gcc/testsuite/
	* gfortran.dg/goacc/routine-3.f90: New test.
	* gfortran.dg/goacc/routine-4.f90: New test.
	* gfortran.dg/goacc/routine-5.f90: New test.
	* gfortran.dg/goacc/routine-6.f90: New test.
	* gfortran.dg/goacc/subroutines: New test.

2015-11-24  James Norris  <jnor...@codesourcery.com>
	Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* libgomp.oacc-fortran/routine-5.f90: New test.
	* libgomp.oacc-fortran/routine-7.f90: New test.
	* libgomp.oacc-fortran/routine-9.f90: New test.

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 605c2ab..8556b70 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -93,6 +93,8 @@ static const struct attribute_spec gfc_attribute_table[] =
affects_type_identity } */
   { "omp declare target", 0, 0, true,  false, false,
 gfc_handle_omp_declare_target_attribute, false },
+  { "oacc function", 0, -1, true,  false, false,
+gfc_handle_omp_declare_target_attribute, false },
   { NULL,		  0, 0, false, false, false, NULL, false }
 };
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5487c93..0628e86 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -848,6 +848,9 @@ typedef struct
   unsigned oacc_declare_device_resident:1;
   unsigned oacc_declare_link:1;
 
+  /* This is an OpenACC accelerator function at level N - 1  */
+  unsigned oacc_function:3;
+
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
 
@@ -1606,6 +1609,16 @@ gfc_dt_list;
  

update acc routines in fortran

2015-11-19 Thread Cesar Philippidis
This patch extends the existing support for acc routines in fortran.
It's a little bit more invasive than what I remembered, but it's still
fairly straightforward. Basically, it adds support for the following:

 - named routines
 - gang, worker, vector and seq clauses

In addition, I've also taught tree-nested to be aware of the
aforementioned clauses. Without those tree-nested changes, a lot of the
new test cases would fail.

If you observe the changelog closely, you'll notice that I didn't
include libgomp.oacc-fortran/routine-[48].f90. The reason is, we don't
have support for the bind and nohost clauses on trunk yet. Thomas posted
a patch right before stage1 closed. So if that patch gets accepted, I'll
create a follow up patch for routines in fortran.

Is this OK for trunk?

Cesar
2015-11-19  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* tree-nested.c (convert_nonlocal_omp_clauses): Add support for
	OMP_CLAUSE_{NUM_GANGS,NUM_VECTORS,VECTOR_LENGTH,SEQ}.
	(convert_local_omp_clauses): Likewise.

2015-11-19  Cesar Philippidis  <ce...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>

	gcc/fortran/
	* f95-lang.c (gfc_attribute_table): Add an "oacc function"
	attribute.
	* gfortran.h (symbol_attribute): Add an oacc_function bit-field.
	(gfc_oacc_routine_name): New struct;
	(gfc_get_oacc_routine_name): New macro.
	(gfc_namespace): Add oacc_routine_clauses, oacc_routine_names and
	oacc_routine fields.
	(gfc_exec_op): Add EXEC_OACC_ROUTINE.
	* openmp.c (OACC_ROUTINE_CLAUSES): New mask.
	(gfc_oacc_routine_dims): New function.
	(gfc_match_oacc_routine): Add support for named routines and the
	gang, worker vector and seq clauses.
	* parse.c (is_oacc): Add EXEC_OACC_ROUTINE.
	* resolve.c (gfc_resolve_blocks): Likewise.
	* st.c (gfc_free_statement): Likewise.
	* trans-decl.c (add_attributes_to_decl): Attach an 'oacc function'
	attribute and shape geometry for acc routine.

2015-11-19  Cesar Philippidis  <ce...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>

	gcc/testsuite/
	* gfortran.dg/goacc/routine-3.f90: New test.
	* gfortran.dg/goacc/routine-4.f90: New test.
	* gfortran.dg/goacc/routine-5.f90: New test.
	* gfortran.dg/goacc/routine-6.f90: New test.

2015-11-19  James Norris  <jnor...@codesourcery.com>
	Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* libgomp.oacc-fortran/routine-5.f90: New test.
	* libgomp.oacc-fortran/routine-7.f90: New test.
	* libgomp.oacc-fortran/routine-9.f90: New test.

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 605c2ab..8556b70 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -93,6 +93,8 @@ static const struct attribute_spec gfc_attribute_table[] =
affects_type_identity } */
   { "omp declare target", 0, 0, true,  false, false,
 gfc_handle_omp_declare_target_attribute, false },
+  { "oacc function", 0, -1, true,  false, false,
+gfc_handle_omp_declare_target_attribute, false },
   { NULL,		  0, 0, false, false, false, NULL, false }
 };
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index e13b4d4..3dbcd96 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -841,6 +841,9 @@ typedef struct
   /* Mentioned in OMP DECLARE TARGET.  */
   unsigned omp_declare_target:1;
 
+  /* This is an OpenACC accelerator function at level N - 1  */
+  unsigned oacc_function:3;
+
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
 
@@ -1582,6 +1585,16 @@ gfc_dt_list;
   /* A list of all derived types.  */
   extern gfc_dt_list *gfc_derived_types;
 
+typedef struct gfc_oacc_routine_name
+{
+  struct gfc_symbol *sym;
+  struct gfc_omp_clauses *clauses;
+  struct gfc_oacc_routine_name *next;
+}
+gfc_oacc_routine_name;
+
+#define gfc_get_oacc_routine_name() XCNEW (gfc_oacc_routine_name)
+
 /* A namespace describes the contents of procedure, module, interface block
or BLOCK construct.  */
 /* ??? Anything else use these?  */
@@ -1648,6 +1661,12 @@ typedef struct gfc_namespace
   /* !$ACC DECLARE clauses.  */
   gfc_omp_clauses *oacc_declare_clauses;
 
+  /* !$ACC ROUTINE clauses.  */
+  gfc_omp_clauses *oacc_routine_clauses;
+
+  /* !$ACC ROUTINE names.  */
+  gfc_oacc_routine_name *oacc_routine_names;
+
   gfc_charlen *cl_list, *old_cl_list;
 
   gfc_dt_list *derived_types;
@@ -1693,6 +1712,9 @@ typedef struct gfc_namespace
 
   /* Set to 1 for !$OMP DECLARE REDUCTION namespaces.  */
   unsigned omp_udr_ns:1;
+
+  /* Set to 1 for !$ACC ROUTINE namespaces.  */
+  unsigned oacc_routine:1;
 }
 gfc_namespace;
 
@@ -2320,7 +2342,7 @@ enum gfc_exec_op
   EXEC_READ, EXEC_WRITE, EXEC_IOLENGTH, EXEC_TRANSFER, EXEC_DT_END,
   EXEC_BACKSPACE, EXEC_ENDFILE, EXEC_INQUIRE, EXEC_REWIND, EXEC_FLUSH,
   EXEC_LOCK, EXEC_UNLOCK,
-  EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP,
+  EXEC_OACC_KERNELS_LOOP, EXEC_O

[gomp4] teach fortran to reject the device clause in acc routines

2015-11-19 Thread Cesar Philippidis
While porting the fortran acc routine changes from gomp4 to trunk, I
noticed that device was listed in the acc routine clause mask. That is
incorrect; it should be device_type not device. I fixed that problem in
the trunk patch submission. Here's the corresponding fix and test case
that I applied to gomp4.
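
The effect of a stray bit in a clause mask can be sketched in a few lines of C. This is only an illustration under stated assumptions: the CL_* bits, ROUTINE_CLAUSES and clause_allowed below are hypothetical stand-ins, not the real OMP_CLAUSE_* values or the matcher in gcc/fortran/openmp.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause bits, one per clause kind.  The real values are
   defined in gcc/fortran/openmp.c and are not reproduced here.  */
#define CL_GANG        (UINT64_C (1) << 0)
#define CL_WORKER      (UINT64_C (1) << 1)
#define CL_VECTOR      (UINT64_C (1) << 2)
#define CL_SEQ         (UINT64_C (1) << 3)
#define CL_BIND        (UINT64_C (1) << 4)
#define CL_NOHOST      (UINT64_C (1) << 5)
#define CL_DEVICE      (UINT64_C (1) << 6)
#define CL_DEVICE_TYPE (UINT64_C (1) << 7)

/* Toy mask for the routine directive after the fix: CL_DEVICE is left
   out, CL_DEVICE_TYPE stays in.  */
#define ROUTINE_CLAUSES \
  (CL_GANG | CL_WORKER | CL_VECTOR | CL_SEQ \
   | CL_BIND | CL_NOHOST | CL_DEVICE_TYPE)

/* Return true if CLAUSE is one of the bits set in MASK.  */
bool
clause_allowed (uint64_t mask, uint64_t clause)
{
  return (mask & clause) != 0;
}
```

With this toy mask, clause_allowed (ROUTINE_CLAUSES, CL_DEVICE_TYPE) is true and clause_allowed (ROUTINE_CLAUSES, CL_DEVICE) is false, which is the behavior the new routine-6.f90 case checks for on the Fortran side.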

As time permits, I'll add more test cases with invalid clause
combination for all of the constructs.

Cesar

2015-11-19  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* openmp.c (OACC_ROUTINE_CLAUSES): Remove OMP_CLAUSE_OACC_DEVICE
	from the clause mask.

	gcc/testsuite/
	* gfortran.dg/goacc/routine-6.f90: Ensure that the device clause is
	invalid with acc routines.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index c17d071..e8e8071 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1312,8 +1312,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
   (OMP_CLAUSE_ASYNC)
 #define OACC_ROUTINE_CLAUSES \
   (OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ \
-   | OMP_CLAUSE_BIND | OMP_CLAUSE_OACC_DEVICE | OMP_CLAUSE_NOHOST   \
-   | OMP_CLAUSE_DEVICE_TYPE)
+   | OMP_CLAUSE_BIND | OMP_CLAUSE_NOHOST | OMP_CLAUSE_DEVICE_TYPE)
 
 #define OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK \
   (OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER		\
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-6.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-6.f90
index 1efe7ab..10951ee 100644
--- a/gcc/testsuite/gfortran.dg/goacc/routine-6.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-6.f90
@@ -77,3 +77,13 @@ subroutine subr4 (x)
  x = x * x - 1
   end if
 end subroutine subr4
+
+subroutine subr10 (x)
+  !$acc routine (subr10) device ! { dg-error "Unclassifiable OpenACC directive" }
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr10


teach delay folding in c++ about OACC_LOOPs

2015-11-18 Thread Cesar Philippidis
Jason,

Your recent delay folding patch broke libgomp.oacc-c++/loop-auto-1.c. It
looks like you forgot to handle OACC_LOOP in cp_fold_r. You probably
didn't notice this because Nathan committed his auto acc loop patch just
before you applied your patch. I'm not sure why only that test is
affected though.
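
The shape of the problem can be sketched with a toy dispatch in C. The enum and function names here are hypothetical and only mimic the style of the condition in cp_fold_r; they say nothing about what the real branch does with the statement once it is selected.

```c
#include <stdbool.h>

/* Toy stand-ins for a handful of tree codes.  The real codes are
   generated from GCC's .def files.  */
enum toy_code
{
  TOY_OMP_FOR,
  TOY_OMP_SIMD,
  TOY_OACC_LOOP,
  TOY_PLUS_EXPR
};

/* Return true if CODE names a loop-like construct that the walker
   treats specially.  A code missing from this list silently falls
   through to the generic path, which is the kind of omission the
   patch corrects for OACC_LOOP.  */
bool
toy_is_loop_like (enum toy_code code)
{
  return (code == TOY_OMP_FOR
	  || code == TOY_OMP_SIMD
	  || code == TOY_OACC_LOOP);
}
```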

Is this patch ok for trunk?

Cesar
2015-11-17  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* cp-gimplify.c (cp_fold_r): Add support for OACC_LOOP.

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 8fe9e13..99d0cfb 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -933,7 +933,8 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data)
 
   code = TREE_CODE (stmt);
   if (code == OMP_FOR || code == OMP_SIMD || code == OMP_DISTRIBUTE
-  || code == OMP_TASKLOOP || code == CILK_FOR || code == CILK_SIMD)
+  || code == OMP_TASKLOOP || code == CILK_FOR || code == CILK_SIMD
+  || code == OACC_LOOP)
 {
   tree x;
   int i, n;


Re: [1/2] OpenACC routine support

2015-11-18 Thread Cesar Philippidis
On 11/10/2015 12:16 AM, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 09:28:47PM -0800, Cesar Philippidis wrote:
>> Here's the patch that Nathan was referring to. I ended up introducing a
>> boolean variable named first in the various functions which call
>> finalize_oacc_routines. The problem the original approach was having was
>> that the routine clauses are only applied to the first function
>> declarator in a declaration list. By using 'first', which is set to true
>> if the current declarator is the first in a sequence of declarators, I
>> was able to defer setting parser->oacc_routine to NULL.
> 
> The #pragma omp declare simd has identical restrictions, but doesn't need
> to add any of the first parameters to the C++ parser.
> So, what are you doing differently that you need it?  Handling both
> differently is a consistency issue, and unnecessary additional complexity to
> the parser.

I reworked how acc routines are handled in this patch to be more similar
to #pragma omp declare simd. Things get kind of messy though. For
starters, I had to add a new tree clauses member to
cp_omp_declare_simd_data. This serves two purposes:

  * It allows the c++ FE to record the location of the first
#pragma acc routine, which is nice because it allows test cases to
be shared with the c FE.

  * Unlike omp declare simd, only one acc routine may be associated with
a function decl. This meant that I had to defer attaching the acc
geometry and 'omp target' attributes to cp_finalize_oacc_routine
instead of in cp_parser_late_parsing_oacc_routine like in omp. So
what happens is, cp_parser_late_parsing_oacc_routine ends up
creating a function geometry clause.

I don't really like this approach. I did try to postpone parsing the
clauses till cp_finalize_oacc_routine, but that got messy. Plus, while
I'd be able to remove the clauses field from cp_omp_declare_simd_data,
we'd still need a location_t field for cp_ensure_no_oacc_routine.

Is this OK for trunk?

Cesar
2015-11-17  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.h (struct cp_omp_declare_simd_data): Add clauses member.
	(struct cp_parser): Change the type of oacc_routine to
	cp_omp_declare_simd_data.
	* parser.c (cp_ensure_no_oacc_routine): Rework to use
	cp_omp_declare_simd_data.
	(cp_parser_simple_declaration): Remove boolean first.  Update call to
	cp_parser_init_declarator. Don't NULL out oacc_routine.
	(cp_parser_init_declarator): Remove boolean first parameter.  Update
	calls to cp_finalize_oacc_routine.
	(cp_parser_late_return_type_opt): Handle acc routines. 
	(cp_parser_member_declaration): Remove first variable.  Handle
	acc routines like omp declare simd.
	(cp_parser_function_definition_from_specifiers_and_declarator): Update
	call to cp_finalize_oacc_routine.
	(cp_parser_single_declaration): Update call to
	cp_parser_init_declarator.
	(cp_parser_save_member_function_body): Remove first_decl parameter.
	Update call to cp_finalize_oacc_routine.
	(cp_parser_finish_oacc_routine): Delete.
	(cp_parser_oacc_routine): Rework to use cp_omp_declare_simd_data.
	(cp_parser_late_parsing_oacc_routine): New function.
	(cp_finalize_oacc_routine): Remove first argument.  Add more error
	handling and set the acc routine and 'omp declare target' attributes.
	(cp_parser_pragma): Remove unnecessary call to
	cp_ensure_no_oacc_routine.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0e1116b..8de3bce 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -241,7 +241,7 @@ static bool cp_parser_omp_declare_reduction_exprs
 static tree cp_parser_cilk_simd_vectorlength 
   (cp_parser *, tree, bool);
 static void cp_finalize_oacc_routine
-  (cp_parser *, tree, bool, bool);
+  (cp_parser *, tree, bool);
 
 /* Manifest constants.  */
 #define CP_LEXER_BUFFER_SIZE ((256 * 1024) / sizeof (cp_token))
@@ -1318,13 +1318,21 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl)
 }
 }
 
-/* Diagnose if #pragma omp routine isn't followed immediately
-   by function declaration or definition.   */
+/* Diagnose if #pragma acc routine isn't followed immediately by function
+   declaration or definition.  */
 
 static inline void
 cp_ensure_no_oacc_routine (cp_parser *parser)
 {
-  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
+  if (parser->oacc_routine && !parser->oacc_routine->error_seen)
+{
+  tree clauses = parser->oacc_routine->clauses;
+  location_t loc = OMP_CLAUSE_LOCATION (TREE_PURPOSE(clauses));
+
+  error_at (loc, "%<#pragma oacc routine%> not followed by function "
+		"declaration or definition");
+  parser->oacc_routine = NULL;
+}
 }
 
 /* Decl-specifiers.  */
@@ -2130,7 +2138,7 @@ static tree cp_parser_decltype
 
 static tree cp_parser_init_declarator
   (cp_parser *, cp_decl_specifier_seq *, vec<deferred_access_check, va_gc> *,
-   bool, bool, int, bo

Re: Re: OpenACC declare directive updates

2015-11-18 Thread Cesar Philippidis
On 11/08/2015 08:53 PM, James Norris wrote:

> The attached patch and ChangeLog reflect the updates from your
> review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00714.html
> and Cesar's review:
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00885.html.
> 
> With the changes made in this patch I think I'm handling the
> situation that you pointed out here correctly:
> 
> "Also, wonder about BLOCK stmt in Fortran, that can give you variables that
> don't live through the whole function, but only a portion of it even in
> Fortran."

What block stmt? The most recent version of Fortran OpenACC 2.0a
supports is 2003. The block construct is a 2008 feature. I don't think
that's applicable to this version. Jim, maybe you should add an error
message for variables defined in blocks.

Thinking about this some more, I wonder if we should emit an error if
any acc constructs are used inside blocks? That's probably overly
pessimistic though.

Cesar


Re: Combined constructs' clause splitting

2015-11-18 Thread Cesar Philippidis
On 11/08/2015 07:45 AM, Tom de Vries wrote:
> On 07/11/15 12:45, Thomas Schwinge wrote:
>> Hi!
>>
>> On Fri, 6 Nov 2015 15:31:23 -0800, Cesar Philippidis
>> <ce...@codesourcery.com> wrote:
>>> I've applied this patch to gomp-4_0-branch which backports most of my
>>> front end changes from trunk. Note that I found a regression while
>>> testing, which is also present in trunk. It looks like
>>> kernels-acc-loop-reduction.c is failing because I'm incorrectly
>>> propagating the reduction variable to both the kernels and loop
>>> constructs for combined 'acc kernels loop'. The problem here is that
>>> kernels don't support the reduction clause. I'll fix that next week.
>>
>> Always need to consider both what the specification allows -- and thus
>> what the front ends accept/refuse -- as well as what we might do
>> differently, internally in later processing stages.  I have not analyzed
>> whether it makes sense to have the OMP_CLAUSE_REDUCTION of a combined
>> "kernels loop reduction([...])" construct be attached to the outer
>> OACC_KERNELS or inner OACC_LOOP, or duplicated for both.
>>
>> Tom, if you need a solution for that right now/want to restore the
>> previous behavior (attached to inner OACC_LOOP only), here's what you
>> should try: in gcc/c-family/c-omp.c:c_oacc_split_loop_clauses remove the
>> special handling for OMP_CLAUSE_REDUCTION, and move it to "Loop clauses"
>> section,
> 
> Committed to gomp-4_0-branch, as attached.

Can you port this patch to trunk? Originally we were attaching the
reduction clause to both the acc loop and parallel construct so that the
reduction variable would get a copy clause implicitly. However, Nathan
later interpreted

  #pragma acc parallel reduction(+:var)

as

  #pragma acc parallel reduction(+:var) private(var)

Therefore, the burden is on the user to ensure that 'var' is transferred
to the parallel region in an appropriate data clause. As a result, we
only need to associate reductions with loops now. So your patch is good
for trunk.
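
A minimal sketch of what this asks of user code, assuming a combined parallel loop and an explicit copy clause for the reduction variable (sum_to is a hypothetical helper, not taken from the testsuite):

```c
/* Sum the integers 0 .. n-1.  The reduction is attached to the loop,
   and the explicit copy(var) clause moves var to and from the parallel
   region, since under the interpretation above the reduction alone
   does not imply a data clause.  Built without -fopenacc the pragma is
   ignored and the loop simply runs on the host.  */
long
sum_to (int n)
{
  long var = 0;

#pragma acc parallel loop reduction(+:var) copy(var)
  for (int i = 0; i < n; i++)
    var += i;

  return var;
}
```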

Cesar





Re: nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Cesar Philippidis
On 11/17/2015 09:23 AM, Nathan Sidwell wrote:
> On 11/17/15 12:23, Nathan Sidwell wrote:
>> On 11/17/15 12:16, Cesar Philippidis wrote:
>>> This patch adds an empty priority_queue.c in libgomp for nvptx targets.
>>> Nvptx targets don't have sufficient support for a complete libgomp
>>> library, so we're only building a subset of it. And without that empty
>>> file, I was seeing an error message that looked like this:
>>>
>>> libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
>>>   #include "sem.h"
>>>
>>> I'm still running the entire testsuite, but it doesn't introduce any new
>>> regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
>>> something?
>>
>> Please apply to trunk.  I've just tripped over it, you've saved me  an
>> investigation ...
> 
> Actually, please put a comment in the file, rather than leave it empty

OK. I've applied this patch in r230466.

Cesar

2015-11-17  Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* config/nvptx/priority_queue.c: New file.

diff --git a/libgomp/config/nvptx/priority_queue.c b/libgomp/config/nvptx/priority_queue.c
new file mode 100644
index 000..63aecd2
--- /dev/null
+++ b/libgomp/config/nvptx/priority_queue.c
@@ -0,0 +1 @@
+/* Empty stub for omp task priority support.  */


nvptx priority queues nonsupport in libgomp

2015-11-17 Thread Cesar Philippidis
This patch adds an empty priority_queue.c in libgomp for nvptx targets.
Nvptx targets don't have sufficient support for a complete libgomp
library, so we're only building a subset of it. And without that empty
file, I was seeing an error message that looked like this:

libgomp/libgomp.h:122:17: fatal error: sem.h: No such file or directory
 #include "sem.h"

I'm still running the entire testsuite, but it doesn't introduce any new
regressions in libgomp.oacc-c. Is this OK for trunk, or am I missing
something?

Cesar
2015-11-17  Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* config/nvptx/priority_queue.c: New empty file.

diff --git a/libgomp/config/nvptx/priority_queue.c b/libgomp/config/nvptx/priority_queue.c
new file mode 100644
index 000..e69de29


Re: [gomp4] Fix some broken tests

2015-11-11 Thread Cesar Philippidis
On 11/11/2015 05:40 AM, Nathan Sidwell wrote:
> On 11/10/15 18:08, Cesar Philippidis wrote:
>> On 11/10/2015 12:35 PM, Nathan Sidwell wrote:
>>> I've committed this to  gomp4.  In preparing the reworked firstprivate
>>> patch changes for gomp4's gimplify.c I discovered these testcases were
>>> passing by accident, and lacked a data clause.
>>
>> It used to be if a reduction was on a parallel construct, the gimplifier
>> would introduce a pcopy clause for the reduction variable if it was not
>> associated with any data clause. Is that not the case anymore?
> 
> AFAICT, the std doesn't specify that behaviour.   2.6 'Data Environment'
> doesn't mention reductions as a modifier for implicitly determined data
> attributes.

I guess I was confused because the reduction section in 2.5.11 mentions
something about updating the original reduction variable after the
parallel region.

Cesar



Re: [gomp4] Fix some broken tests

2015-11-10 Thread Cesar Philippidis
On 11/10/2015 12:35 PM, Nathan Sidwell wrote:
> I've committed this to  gomp4.  In preparing the reworked firstprivate
> patch changes for gomp4's gimplify.c I discovered these testcases were
> passing by accident, and lacked a data clause.

It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?

Cesar


Re: [1/2] OpenACC routine support

2015-11-10 Thread Cesar Philippidis
On 11/10/2015 12:16 AM, Jakub Jelinek wrote:
> On Mon, Nov 09, 2015 at 09:28:47PM -0800, Cesar Philippidis wrote:
>> Here's the patch that Nathan was referring to. I ended up introducing a
>> boolean variable named first in the various functions which call
>> finalize_oacc_routines. The problem the original approach was having was
>> that the routine clauses are only applied to the first function
>> declarator in a declaration list. By using 'first', which is set to true
>> if the current declarator is the first in a sequence of declarators, I
>> was able to defer setting parser->oacc_routine to NULL.
> 
> The #pragma omp declare simd has identical restrictions, but doesn't need
> to add any of the first parameters to the C++ parser.
> So, what are you doing differently that you need it?  Handling both
> differently is a consistency issue, and unnecessary additional complexity to
> the parser.

I see that you added an omp_declare_simd->fndecl_seen field to
cp_parser. My objective was to try and make the c++ routine parsing
somewhat consistent with the c front end. I could probably add a similar
oacc_routine field, but I wonder if it would be better to share
omp_declare_simd. There was talk about the next version of openacc
adding support for -fopenacc and -fopenmp together. So maybe there needs
to be a separate oacc_routine field.

Cesar





Re: [1/2] OpenACC routine support

2015-11-09 Thread Cesar Philippidis
On 11/09/2015 04:31 PM, Nathan Sidwell wrote:
> On 11/03/15 10:35, Jakub Jelinek wrote:
>> On Mon, Nov 02, 2015 at 02:21:43PM -0500, Nathan Sidwell wrote:
>>> --- gcc/c/c-parser.c(revision 229667)
>>> +++ gcc/c/c-parser.c(working copy)
>>> @@ -1160,7 +1160,8 @@ enum c_parser_prec {
>>>   static void c_parser_external_declaration (c_parser *);
>>>   static void c_parser_asm_definition (c_parser *);
>>>   static void c_parser_declaration_or_fndef (c_parser *, bool, bool,
>>> bool,
>>> -   bool, bool, tree *, vec);
>>> +   bool, bool, tree *, vec,
>>> +   tree);
>>
>> Wonder if this shouldn't be tree = NULL_TREE, then you'd avoid most of
>> the
>> c_parser_declaration_or_fndef caller changes.
>>
>> Otherwise, LGTM.
> 
> This is the patch I've just committed.  It includes c parser adjustments
> to detect the case of two function decls with a single type specifier. 
> Cesar will be applying a patch for the C++ parser for the same  case.

Here's the patch that Nathan was referring to. I ended up introducing a
boolean variable named first in the various functions which call
finalize_oacc_routines. The problem the original approach was having was
that the routine clauses are only applied to the first function
declarator in a declaration list. By using 'first', which is set to true
if the current declarator is the first in a sequence of declarators, I
was able to defer setting parser->oacc_routine to NULL.
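
For readers unfamiliar with the declarator-list case, here is a small sketch of the form that sidesteps it: one routine directive per function. The function names are hypothetical. The multi-declarator form under discussion is shown only in a comment, and no particular diagnostic text is implied.

```c
/* The case discussed above is a single directive followed by a
   declaration with several function declarators, e.g.

     #pragma acc routine seq
     int square (int), cube (int);

   Giving each function its own directive avoids the question of which
   declarator the clauses belong to.  */

#pragma acc routine seq
int
square (int x)
{
  return x * x;
}

#pragma acc routine seq
int
cube (int x)
{
  return x * x * x;
}
```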

Nathan already approved this patch, so I've applied it to trunk.

Cesar
2015-11-09  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.c (cp_finalize_oacc_routine): New boolean first argument.
	(cp_ensure_no_oacc_routine): Update call to cp_finalize_oacc_routine.
	(cp_parser_simple_declaration): Maintain a boolean first to keep track
	of each new declarator.  Propagate it to cp_parser_init_declarator.
	(cp_parser_init_declarator): New boolean first argument.  Propagate it
	to cp_parser_save_member_function_body and cp_finalize_oacc_routine.
	(cp_parser_member_declaration): Likewise.
	(cp_parser_single_declaration): Update call to
	cp_parser_init_declarator.
	(cp_parser_save_member_function_body): New boolean first_decl argument.
	Propagate it to cp_finalize_oacc_routine.
	(cp_parser_finish_oacc_routine): New boolean first argument.  Use it to
	determine if multiple declarators follow a routine construct.
	(cp_parser_oacc_routine): Update call to cp_parser_finish_oacc_routine.

	gcc/testsuite/
	* c-c++-common/goacc/routine-5.c: Enable c++ tests.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 6fc2c6a..f3b4b46 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -246,7 +246,7 @@ static bool cp_parser_omp_declare_reduction_exprs
 static tree cp_parser_cilk_simd_vectorlength 
   (cp_parser *, tree, bool);
 static void cp_finalize_oacc_routine
-  (cp_parser *, tree, bool);
+  (cp_parser *, tree, bool, bool);
 
 /* Manifest constants.  */
 #define CP_LEXER_BUFFER_SIZE ((256 * 1024) / sizeof (cp_token))
@@ -1329,7 +1329,7 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl)
 static inline void
 cp_ensure_no_oacc_routine (cp_parser *parser)
 {
-  cp_finalize_oacc_routine (parser, NULL_TREE, false);
+  cp_finalize_oacc_routine (parser, NULL_TREE, false, true);
 }
 
 /* Decl-specifiers.  */
@@ -2135,7 +2135,7 @@ static tree cp_parser_decltype
 
 static tree cp_parser_init_declarator
   (cp_parser *, cp_decl_specifier_seq *, vec<deferred_access_check, va_gc> *,
-   bool, bool, int, bool *, tree *, location_t *);
+   bool, bool, int, bool *, tree *, bool, location_t *);
 static cp_declarator *cp_parser_declarator
   (cp_parser *, cp_parser_declarator_kind, int *, bool *, bool, bool);
 static cp_declarator *cp_parser_direct_declarator
@@ -2445,7 +2445,7 @@ static tree cp_parser_single_declaration
 static tree cp_parser_functional_cast
   (cp_parser *, tree);
 static tree cp_parser_save_member_function_body
-  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree);
+  (cp_parser *, cp_decl_specifier_seq *, cp_declarator *, tree, bool);
 static tree cp_parser_save_nsdmi
   (cp_parser *);
 static tree cp_parser_enclosed_template_argument_list
@@ -11909,6 +11909,7 @@ cp_parser_simple_declaration (cp_parser* parser,
   bool saw_declarator;
   location_t comma_loc = UNKNOWN_LOCATION;
   location_t init_loc = UNKNOWN_LOCATION;
+  bool first = true;
 
   if (maybe_range_for_decl)
 *maybe_range_for_decl = NULL_TREE;
@@ -12005,7 +12006,10 @@ cp_parser_simple_declaration (cp_parser* parser,
 	declares_class_or_enum,
 	&function_definition_p,
 	maybe_range_for_decl,
+	first,
 	&init_loc);
+  first = false;
+
   /* If an error occurred while parsing tentatively, exit quickly.
 	 (That usually happens when in the body of a function; each
 	 statement is

Re: [1/2] OpenACC routine support

2015-11-09 Thread Cesar Philippidis
On 11/09/2015 04:48 PM, Nathan Sidwell wrote:
> And these are the new tests.  Cesar, c-c++-common/goacc/routine-5.c will
> need adjusting with your C++ parser patch.  You'll see the two cases
> I've #if'd out.

I enabled those tests in trunk with the patch I posted here.

Cesar


Re: Re: OpenACC declare directive updates

2015-11-08 Thread Cesar Philippidis
On 11/08/2015 07:29 AM, James Norris wrote:

> The attached patch and ChangeLog reflect the updates from your
> review: https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00714.html.
> All of the issues pointed out have been addressed.
> 
> With the changes made in this patch I think I'm handling the
> situation that you pointed out here correctly:
> 
> On Fri, Nov 06, 2015 at 01:45:09PM -0600, James Norris wrote:
> 
> Also, wonder about BLOCK stmt in Fortran, that can give you variables that
> don't live through the whole function, but only a portion of it even in
> Fortran.
> 
> OK to commit to trunk?

I'll defer to Jakub, but here are a couple of comments.

>  void
>  gfc_resolve_oacc_declare (gfc_namespace *ns)
>  {
>int list;
>gfc_omp_namelist *n;
>locus loc;
> +  gfc_oacc_declare *oc;
>  
> -  if (ns->oacc_declare_clauses == NULL)
> +  if (ns->oacc_declare == NULL)
>  return;
>  
> -  loc = ns->oacc_declare_clauses->loc;
> +  loc = gfc_current_locus;
>  
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -   list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -  {
> - n->sym->mark = 0;
> - if (n->sym->attr.flavor == FL_PARAMETER)
> -   gfc_error ("PARAMETER object %qs is not allowed at %L", n->sym->name, 
> );
> -  }
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_DEVICE_RESIDENT;
> +list <= OMP_LIST_DEVICE_RESIDENT; list++)

Why is this loop necessary?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + n->sym->mark = 0;
> + if (n->sym->attr.flavor == FL_PARAMETER)
> +   gfc_error ("PARAMETER object %qs is not allowed at %L",
> +  n->sym->name, );
> +   }
>  
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -   list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -  {
> - if (n->sym->mark)
> -   gfc_error ("Symbol %qs present on multiple clauses at %L",
> -  n->sym->name, );
> - else
> -   n->sym->mark = 1;
> -  }
> +  for (list = OMP_LIST_DEVICE_RESIDENT;
> + list <= OMP_LIST_DEVICE_RESIDENT; list++)

And here.

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + if (n->sym->mark)
> +   gfc_error ("Symbol %qs present on multiple clauses at %L",
> +  n->sym->name, );
> + else
> +   n->sym->mark = 1;
> +   }
>  
> -  for (n = ns->oacc_declare_clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n;
> -   n = n->next)
> -check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> -}
> +  for (n = oc->clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n; n = n->next)

This is better.

> + check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> +
> +  for (n = oc->clauses->lists[OMP_LIST_MAP]; n; n = n->next)
> + {
> +   if (n->expr && n->expr->ref->type == REF_ARRAY)
> +   gfc_error ("Array sections: %qs not allowed in"
> +  " $!ACC DECLARE at %L", n->sym->name, );
> + }
> +}
> +
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   n->sym->mark = 0;
> +}
>  
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   {
> + if (n->sym->mark)
> +   gfc_error ("Symbol %qs present on multiple clauses at %L",
> +  n->sym->name, );
> + else
> +   n->sym->mark = 1;
> +   }
> +}
> +
> +  for (oc = ns->oacc_declare; oc; oc = oc->next)
> +{
> +  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)

?

> + for (n = oc->clauses->lists[list]; n; n = n->next)
> +   n->sym->mark = 0;
> +}
> +}

I only noticed these because I thought I fixed them in the patch you
asked me to revert from gomp-4_0-branch. At the very least, please try
to be consistent on iterating OMP_LIST_*.
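
For readers following along, the mark-based duplicate check that the quoted hunks repeat can be sketched in isolation. This is only a minimal model: `struct sym`, `struct namelist` and `count_duplicates` are hypothetical stand-ins, not the gfortran types or any function in the patch, and the sketch counts duplicates where the front end would call gfc_error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfc_symbol and gfc_omp_namelist; only the
   fields used by the duplicate check are modelled.  */
struct sym
{
  const char *name;
  int mark;
};

struct namelist
{
  struct sym *sym;
  struct namelist *next;
};

/* Return the number of entries in LIST whose symbol was already seen
   earlier in LIST.  Marks are cleared first so that stale state from a
   previous pass cannot cause false positives, and cleared again at the
   end so that the next caller starts clean.  */
static int
count_duplicates (struct namelist *list)
{
  struct namelist *n;
  int dups = 0;

  /* Pass 1: clear every mark.  */
  for (n = list; n; n = n->next)
    {
      assert (n->sym != NULL);
      n->sym->mark = 0;
    }

  /* Pass 2: a symbol that is already marked has been seen before.  */
  for (n = list; n; n = n->next)
    {
      if (n->sym->mark)
        dups++;
      else
        n->sym->mark = 1;
    }

  /* Pass 3: leave the marks clear again.  */
  for (n = list; n; n = n->next)
    n->sym->mark = 0;

  return dups;
}
```

Iterating one named list directly, as in the DEVICE_RESIDENT loop marked "This is better" above, keeps each pass to a single loop.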

Cesar



Re: [gomp4] backport trunk FE changes

2015-11-07 Thread Cesar Philippidis
On 11/07/2015 04:30 AM, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 6 Nov 2015 15:31:23 -0800, Cesar Philippidis <ce...@codesourcery.com> 
> wrote:
>> I've applied this patch to gomp-4_0-branch which backports most of my
>> front end changes from trunk.
> 
>> --- a/gcc/cp/pt.c
>> +++ b/gcc/cp/pt.c
>> @@ -14398,7 +14398,6 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, 
>> bool allow_fields,
>>  case OMP_CLAUSE_NUM_GANGS:
>>  case OMP_CLAUSE_NUM_WORKERS:
>>  case OMP_CLAUSE_VECTOR_LENGTH:
>> -case OMP_CLAUSE_GANG:
>>  case OMP_CLAUSE_WORKER:
>>  case OMP_CLAUSE_VECTOR:
>>  case OMP_CLAUSE_ASYNC:
>> @@ -14427,7 +14426,7 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, 
>> bool allow_fields,
>>  = tsubst_omp_clause_decl (OMP_CLAUSE_DECL (oc), args, complain,
>>in_decl);
>>break;
>> -case OMP_CLAUSE_LINEAR:
>> +case OMP_CLAUSE_GANG:
>>  case OMP_CLAUSE_ALIGNED:
>>OMP_CLAUSE_DECL (nc)
>>  = tsubst_omp_clause_decl (OMP_CLAUSE_DECL (oc), args, complain,
> 
> This -- unintentional, I suppose ;-) -- removal of OMP_CLAUSE_LINEAR
> caused a lot of regressions; committed to gomp-4_0-branch in r229928:

Thank you. I had two versions of this patch and I committed the wrong
one. That was the only change though.
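
As an illustration of why swapping one case label produced regressions, here is a small model of the two versions of the switch. The enumerators and function names are hypothetical and only mimic the shape of the quoted hunk; the real code is tsubst_omp_clauses in gcc/cp/pt.c, and what its default arm does is not modelled here.

```c
#include <assert.h>

/* Hypothetical clause codes; the real ones live in GCC's tree-core.h.  */
enum clause_code
{
  CLAUSE_LINEAR,
  CLAUSE_ALIGNED,
  CLAUSE_GANG,
  CLAUSE_NOWAIT
};

/* Model of the switch before the backport: LINEAR and ALIGNED share the
   arm that substitutes the clause's decl.  Returns 1 if that arm runs.  */
static int
substitutes_decl_before (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_LINEAR:
    case CLAUSE_ALIGNED:
      return 1;
    default:
      return 0;
    }
}

/* Model of the switch after the mistaken edit: the LINEAR label was
   replaced by GANG, so LINEAR now reaches the default arm.  */
static int
substitutes_decl_after (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_GANG:
    case CLAUSE_ALIGNED:
      return 1;
    default:
      return 0;
    }
}

/* The regression in one line each: LINEAR was handled, then it was not.  */
static void
check_regression (void)
{
  assert (substitutes_decl_before (CLAUSE_LINEAR));
  assert (!substitutes_decl_after (CLAUSE_LINEAR));
}
```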

Cesar



Re: Combined constructs' clause splitting

2015-11-07 Thread Cesar Philippidis
On 11/07/2015 03:45 AM, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 6 Nov 2015 15:31:23 -0800, Cesar Philippidis <ce...@codesourcery.com> 
> wrote:
>> I've applied this patch to gomp-4_0-branch which backports most of my
>> front end changes from trunk. Note that I found a regression while
>> testing, which is also present in trunk. It looks like
>> kernels-acc-loop-reduction.c is failing because I'm incorrectly
>> propagating the reduction variable to both the kernels and loop
>> constructs for combined 'acc kernels loop'. The problem here is that
>> kernels don't support the reduction clause. I'll fix that next week.
> 
> Always need to consider both what the specification allows -- and thus
> what the front ends accept/refuse -- as well as what we might do
> differently, internally in later processing stages.  I have not analyzed
> whether it makes sense to have the OMP_CLAUSE_REDUCTION of a combined
> "kernels loop reduction([...])" construct be attached to the outer
> OACC_KERNELS or inner OACC_LOOP, or duplicated for both.
> 
> Tom, if you need a solution for that right now/want to restore the
> previous behavior (attached to inner OACC_LOOP only), here's what you
> should try: in gcc/c-family/c-omp.c:c_oacc_split_loop_clauses remove the
> special handling for OMP_CLAUSE_REDUCTION, and move it to "Loop clauses"
> section, and in

That should work.

> gcc/fortran/trans-openmp.c:gfc_trans_oacc_combined_directive I don't see
> reduction clauses being handled, hmm, maybe the Fortran front end is
> doing that differently?

You're correct, reductions are being associated with kernels and
parallel constructs. This is one area that needed more test cases, but
things like

  'acc parallel reduction(+:var) copy(var)'

were broken because of the recent gimplifier changes, so I couldn't test
for it. I was planning on fixing both problems (reductions and variable
appearing in multiple clauses) after Nathan's firstprivate and default
gimplifier changes landed in trunk.

Cesar



[gomp4] revert fortran declare changes

2015-11-06 Thread Cesar Philippidis
This patch reverts the declare cleanups I introduced in
<https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02945.html>. Notably, I
deleted resolve_omp_duplicate_list and resolve_oacc_declare_map.

Jim, I also reverted my changes to declare-2.f95. Please make the error
messages consistent with the rest of the FE going forward. Also, note
that the outer for loop in here

+  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)
+   for (n = oc->clauses->lists[list]; n; n = n->next)
+ {
+   if (n->sym->mark)
+ gfc_error ("Symbol %qs present on multiple clauses at %L",
+n->sym->name, );
+   else
+ n->sym->mark = 1;
+ }

is unnecessary. I'm not sure if you intended to check for other
lists or what.
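
To make the point about the outer loop concrete, the sketch below compares a loop whose lower and upper bounds are the same index with a direct use of that index. The `LIST_*` enumerators and both functions are hypothetical; they stand in for OMP_LIST_* and for the body of the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list indices standing in for the OMP_LIST_* enumerators.  */
enum { LIST_MAP, LIST_LINK, LIST_NUM };

/* Visit COUNTS using a loop whose lower and upper bound are the same
   index, as in the quoted hunk.  The body runs exactly once.  */
static int
visit_with_loop (const int counts[LIST_NUM])
{
  int list, total = 0;

  assert (counts != NULL);
  for (list = LIST_LINK; list <= LIST_LINK; list++)
    total += counts[list];
  return total;
}

/* The same computation with the index used directly.  */
static int
visit_directly (const int counts[LIST_NUM])
{
  assert (counts != NULL);
  return counts[LIST_LINK];
}
```

The loop form only pays off if the range is expected to grow to several lists; otherwise the direct form says the same thing with less noise.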

I've applied this patch to gomp-4_0-branch.

Cesar
2015-11-06  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* openmp.c (gfc_match_oacc_declare): Revert error message changes.
	(resolve_omp_duplicate_list): Delete.
	(resolve_oacc_declare_map): Delete.
	(gfc_resolve_oacc_declare): Scan map clauses in place.

	gcc/testsuite/
	* gfortran.dg/goacc/declare-2.f95: Update expected errors.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 484add8..1572fdb 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1445,7 +1445,7 @@ gfc_match_oacc_declare (void)
   gfc_omp_clauses *c;
   gfc_omp_namelist *n;
   gfc_namespace *ns = gfc_current_ns;
-  gfc_oacc_declare *new_oc;
+  gfc_oacc_declare *new_oc, *oc;
   bool module_var = false;
 
   if (gfc_match_omp_clauses (, OACC_DECLARE_CLAUSES, 0, false, false, true)
@@ -1467,8 +1467,8 @@ gfc_match_oacc_declare (void)
 	  if (n->u.map_op != OMP_MAP_FORCE_ALLOC
 	  && n->u.map_op != OMP_MAP_FORCE_TO)
 	{
-	  gfc_error ("Invalid clause in module with $!ACC DECLARE at %L",
-			 >where);
+	  gfc_error ("Invalid clause in module with "
+			 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
@@ -1477,29 +1477,29 @@ gfc_match_oacc_declare (void)
 
   if (ns->proc_name->attr.oacc_function)
 	{
-	  gfc_error ("Invalid declare in routine with $!ACC DECLARE at %C");
+	  gfc_error ("Invalid declare in routine with " "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if (s->attr.in_common)
 	{
-	  gfc_error ("Variable in a common block with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: variable in a common block with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if (s->attr.use_assoc)
 	{
-	  gfc_error ("Variable is USE-associated with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: variable is USE-associated with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if ((s->attr.dimension || s->attr.codimension)
 	  && s->attr.dummy && s->as->type != AS_EXPLICIT)
 	{
-	  gfc_error ("Assumed-size dummy array with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: assumed-size dummy array with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
@@ -1527,6 +1527,37 @@ gfc_match_oacc_declare (void)
   new_oc->module_var = module_var;
   new_oc->clauses = c;
   new_oc->where = gfc_current_locus;
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 0;
+}
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	{
+	  if (n->sym->mark)
+	{
+	  gfc_error ("Symbol %qs present on multiple clauses at %C",
+			 n->sym->name);
+	  return MATCH_ERROR;
+	}
+	  else
+	n->sym->mark = 1;
+	}
+}
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 1;
+}
+
   ns->oacc_declare = new_oc;
 
   return MATCH_YES;
@@ -3151,41 +3182,6 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Check if a variable appears in multiple clauses.  */
-
-static void
-resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
-			int list)
-{
-  gfc_omp_namelist *n;
-  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
-
-  /* OpenACC reduction clauses are compatible with everything.  We only
- need to check if a reduction variable is used more than once.  */
-  if (openacc && list == OMP_LIST_REDUCTION)
-{
-  hash_set reductions;
-
-  for (n = clause_list; n; n = n->

[gomp4] backport trunk FE changes

2015-11-06 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which backports most of my
front end changes from trunk. Note that I found a regression while
testing, which is also present in trunk. It looks like
kernels-acc-loop-reduction.c is failing because I'm incorrectly
propagating the reduction variable to both the kernels and loop
constructs for combined 'acc kernels loop'. The problem here is that
kernels don't support the reduction clause. I'll fix that next week.

Cesar
2015-11-06  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/c-family/
	* c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
	AUTO, SEQ, INDEPENDENT and PRIVATE loop clauses.  Associate REDUCTION
	clauses with parallel and kernels and loops.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_default): Replace only_none with
	is_oacc argument.
	(c_parser_oacc_shape_clause): Allow pointers arguments to gang static.
	(c_parser_oacc_clause_tile): Backport cleanups from trunk.
	(c_parser_oacc_all_clauses): Likewise, update call to
	c_parser_omp_clause_default.
	(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_default.

	gcc/cp/
	* parser.c (cp_parser_oacc_shape_clause): Allow pointers arguments to
	gang static.
	(cp_parser_oacc_clause_tile): Backport cleanups from trunk.
	(cp_parser_omp_clause_default): Replace is_omp argument with is_oacc.
	(cp_parser_oacc_all_clauses): Likewise, update call to
	c_parser_omp_clause_{default,tile}.
	(cp_parser_omp_all_clauses): Update call to
	c_parser_omp_clause_default.
	(OACC_PARALLEL_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_GANG.
	* pt.c (tsubst_omp_clauses):
	* semantics.c (finish_omp_clauses):

	gcc/testsuite/
	* c-c++-common/goacc/combined-directives.c: New test.
	* c-c++-common/goacc/loop-clauses.c: New test.
	* c-c++-common/goacc/loop-shape.c: More test cases.
	* c-c++-common/goacc/loop-tile-k1.c: Update error messages.
	* c-c++-common/goacc/loop-tile-p1.c: Likewise.
	* c-c++-common/goacc/tile.c: New test.

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 67d9da0..8411814 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -694,13 +694,12 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 /* This function splits clauses for OpenACC combined loop
constructs.  OpenACC combined loop constructs are:
#pragma acc kernels loop
-   #pragma acc parallel loop
-*/
+   #pragma acc parallel loop  */
 
 tree
 c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 {
-  tree next, loop_clauses;
+  tree next, loop_clauses, t;
 
   loop_clauses = *not_loop_clauses = NULL_TREE;
   for (; clauses ; clauses = next)
@@ -709,27 +708,29 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 
   switch (OMP_CLAUSE_CODE (clauses))
 {
+	  /* Loop clauses.  */
 	case OMP_CLAUSE_COLLAPSE:
-	case OMP_CLAUSE_REDUCTION:
+	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_GANG:
 	case OMP_CLAUSE_WORKER:
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_PRIVATE:
 	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
 	  loop_clauses = clauses;
 	  break;
 
-	case OMP_CLAUSE_PRIVATE:
-	  {
-	tree nc = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
-	OMP_CLAUSE_CODE (clauses));
-	OMP_CLAUSE_DECL (nc) = OMP_CLAUSE_DECL (clauses);
-	OMP_CLAUSE_CHAIN (nc) = loop_clauses;
-	loop_clauses = nc;
-	  }
-	  /* FALLTHRU */
+	  /* Reductions belong in both constructs.  */
+	case OMP_CLAUSE_REDUCTION:
+	  t = copy_node (clauses);
+	  OMP_CLAUSE_CHAIN (t) = loop_clauses;
+	  loop_clauses = t;
+
+	  /* FIXME: device_type */
 
+	  /* Parallel/kernels clauses.  */
 	default:
 	  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
 	  *not_loop_clauses = clauses;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index fa70055..96c1bdc 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10627,11 +10627,13 @@ c_parser_omp_clause_copyprivate (c_parser *parser, tree list)
 }
 
 /* OpenMP 2.5:
-   default ( shared | none ) */
+   default ( shared | none )
+
+   OpenACC 2.0:
+   default (none) */
 
 static tree
-c_parser_omp_clause_default (c_parser *parser, tree list,
-			 bool only_none = false)
+c_parser_omp_clause_default (c_parser *parser, tree list, bool is_oacc)
 {
   enum omp_clause_default_kind kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
   location_t loc = c_parser_peek_token (parser)->location;
@@ -10652,7 +10654,7 @@ c_parser_omp_clause_default (c_parser *parser, tree list,
 	  break;
 
 	case 's':
-	  if (strcmp ("shared", p) != 0 || only_none)
+	  if (strcmp ("shared", p) != 0 || is_oacc)
 	goto invalid_kind;
 	  kind = OMP_CLAUSE_DEFAULT_SHARED;
 	  break;
@@ -10666,7 +10668,7 @@ c_parser_omp_clause_default (c_parser *parser, tree list,
   else
 {
 invalid_kind:
-  if (only_none)
+  if (is_oacc)
 	c_parser_error (parser, "expected %<none%>");
   else
 	c_parser_error (parser, "expect

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/05/2015 09:13 AM, Nathan Sidwell wrote:
> On 11/05/15 12:01, Thomas Schwinge wrote:
> 
>> On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis
>> <ce...@codesourcery.com> wrote:
>>> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:
> 
>>> Sorry, I must have mis-phrased it. The spec is unclear here. There are
>>> three possible ways to interpret 'acc parallel loop reduction':
>>>
>>>1. acc parallel reduction
>>>   acc loop
>>
>> This is what you propose in your patch, but I don't think that makes
>> sense, or does it?  I'm happy to learn otherwise, but in my current
>> understanding, a reduction clause needs to be attached (at least) to the
>> innermost construct where reductions are to be processed.  (Let's also
> 
> Correct, the  above interpretation must be wrong.
> 
>> consider multi-level gang/worker/vector loops/reductions.)  So, either:
>>
>>>2. acc parallel
>>>   acc loop reduction
>>
>> ... this, or even this:
>>
>>>3. acc parallel reduction
>>>   acc loop reduction
>>
>> ..., which I'm not sure what the execution model implementation requires.
>> (Nathan?)
> 
> interpretation #2 is sufficient, I think. However, both are lacking a
> 'copy (reduction_var)', clause as otherwise there's nothing changing the
> default data attribute of 'firstprivate' (working on that patch). 
> Perhaps 'reduction' on 'parallel'  is meant to imply that  (because
> that's what makes sense), but the std doesn't say it.
> 
> In summary it's probably safe to implement interpretation #3.  That way
> we can implement the hypothesis that reductions at the outer construct
> imply copy.

OK, #3 it is.

>> And while we're at it: the very same question also applies to the private
>> clause, which -- contrary to all other (as far as I remember) clauses --
>> also is applicable to both the parallel and loop constructs:
>>
>>  #pragma acc parallel loop private([...])
>>
>> ... is to be decomposed into which of the following:
>>
>>  #pragma acc parallel private([...])
>>  #pragma acc loop
>>
>>  #pragma acc parallel
>>  #pragma acc loop private([...])
>>
>>  #pragma acc parallel private([...])
>>  #pragma acc loop private([...])
>>
>> (There is no private clause allowed to be specified with the kernels
>> construct for what it's worth, but that doesn't mean we couldn't use it
>> internally, of course, if so required.)
> 
> I think interpretation #2 or #3 make sense, and I suspect result in the
> same emitted code.

I'll probably go #2 here to make life easier with kernels.
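
The two decisions in this reply (reduction on both constructs, private on the loop only) can be modelled with a toy clause splitter. This is a sketch under stated assumptions: `struct clause`, the `CL_*` kinds, `split_loop_clauses` and `count_kind` are all hypothetical, and the code only imitates the shape of c_oacc_split_loop_clauses rather than reproducing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical clause kinds; not GCC's omp_clause_code.  */
enum clause_kind { CL_GANG, CL_PRIVATE, CL_REDUCTION, CL_COPY };

struct clause
{
  enum clause_kind kind;
  struct clause *chain;
};

/* Split CLAUSES, taken from a combined 'parallel loop', into the list
   for the loop (returned) and the list for the enclosing construct
   (stored in *NOT_LOOP).  Reductions are copied so that they appear in
   both lists; private goes to the loop only.  Both output lists end up
   in reverse order.  The copies are never freed in this sketch.  */
static struct clause *
split_loop_clauses (struct clause *clauses, struct clause **not_loop)
{
  struct clause *next, *loop = NULL;

  *not_loop = NULL;
  for (; clauses; clauses = next)
    {
      next = clauses->chain;
      switch (clauses->kind)
        {
          /* Loop-only clauses.  */
        case CL_GANG:
        case CL_PRIVATE:
          clauses->chain = loop;
          loop = clauses;
          break;

          /* Reductions: a copy goes to the loop, and the original
             continues into the default arm for the outer construct.  */
        case CL_REDUCTION:
          {
            struct clause *copy = malloc (sizeof *copy);
            assert (copy != NULL);
            *copy = *clauses;
            copy->chain = loop;
            loop = copy;
          }
          /* FALLTHRU */

          /* Everything else belongs to the outer construct.  */
        default:
          clauses->chain = *not_loop;
          *not_loop = clauses;
          break;
        }
    }
  return loop;
}

/* Count the clauses of kind KIND in LIST.  */
static int
count_kind (const struct clause *list, enum clause_kind kind)
{
  int n = 0;

  for (; list; list = list->chain)
    if (list->kind == kind)
      n++;
  return n;
}
```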

After I make these changes (and the c++ template updates), I'll apply
them to trunk and backport them to gomp4. Thank you Jakub, Thomas and
Nathan for reviewing these patches.

Cesar


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
I've applied this patch to trunk. It also includes the fortran and
template changes. Note that there is a new regression in
gfortran.dg/goacc/combined_loop.f90. Basically, the gimplifier is
complaining about reduction variables appearing in multiple clauses.
E.g. 'acc parallel reduction(+:var) copy(var)'. Nathan's upcoming
gimplifier changes should address that.

Also, because of these reduction problems, I decided not to merge
combined_loops.f90 with combined-directives.f90 yet because the latter
relies on scanning which would fail with the errors detected during
gimplification. I'm planning on adding a couple more test cases once
acc reductions are working on trunk.

Cesar
2015-11-05  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>


	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Add support for
	OMP_CLAUSE_TILE.  Update handling of OMP_CLAUSE_INDEPENDENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Add support for OMP_CLAUSE_TILE.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_TILE.
	* tree.c (omp_clause_num_ops): Add an entry for OMP_CLAUSE_TILE.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_TILE.
	* tree.h (OMP_TILE_LIST): New macro.

	gcc/c-family/
	* c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
	AUTO, SEQ, INDEPENDENT and PRIVATE loop clauses.  Associate REDUCTION
	clauses with parallel and kernels and loops.
	* c-pragma.h (enum pragma_omp_clause): Add entries for
	PRAGMA_OACC_CLAUSE_{INDEPENDENT,TILE,DEFAULT}.
	* pt.c (tsubst_omp_clauses): Add support for OMP_CLAUSE_{NUM_GANGS,
	NUM_WORKERS,VECTOR_LENGTH,GANG,WORKER,VECTOR,ASYNC,WAIT,TILE,AUTO,
	INDEPENDENT,SEQ}. 
	(tsubst_expr): Add support for OMP_CLAUSE_{KERNELS,PARALLEL,LOOP}.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Add support for
	PRAGMA_OACC_CLAUSE_INDEPENDENT and PRAGMA_OACC_CLAUSE_TILE.
	(c_parser_omp_clause_default): Add is_oacc argument. Handle
	default(none) in OpenACC.
	(c_parser_oacc_shape_clause): Allow pointer variables as gang static
	arguments.
	(c_parser_oacc_clause_tile): New function.
	(c_parser_oacc_all_clauses): Add support for OMP_CLAUSE_DEFAULT,
	OMP_CLAUSE_INDEPENDENT and OMP_CLAUSE_TILE.
	(OACC_LOOP_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{PRIVATE,INDEPENDENT,
	TILE}.
	(OACC_KERNELS_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.
	(OACC_PARALLEL_MASK): Add PRAGMA_OACC_CLAUSE_{DEFAULT,PRIVATE,
	FIRSTPRIVATE}.
	(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_default.
	(c_parser_oacc_update): Update the error message for missing clauses.
	* c-typeck.c (c_finish_omp_clauses): Add support for OMP_CLAUSE_TILE
	and OMP_CLAUSE_INDEPENDENT.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Add support for
	PRAGMA_OACC_CLAUSE_INDEPENDENT and PRAGMA_OACC_CLAUSE_TILE.
	(cp_parser_oacc_shape_clause): Allow pointer variables as gang static
	arguments.
	(cp_parser_oacc_clause_tile): New function.
	(cp_parser_omp_clause_default): Add is_oacc argument. Handle
	default(none) in OpenACC.
	(cp_parser_oacc_all_clauses): Add support for
	PRAGMA_OACC_CLAUSE_{DEFAULT,INDEPENDENT,TILE,PRIVATE,FIRSTPRIVATE}.
	(cp_parser_omp_all_clauses): Update call to
	cp_parser_omp_clause_default.
	(OACC_LOOP_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{PRIVATE,INDEPENDENT,
	TILE}.
	(OACC_KERNELS_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.
	(OACC_PARALLEL_MASK): Add PRAGMA_OACC_CLAUSE_{DEFAULT,PRIVATE,
	FIRSTPRIVATE}.
	(cp_parser_oacc_update): Update the error message for missing clauses.
	* semantics.c (finish_omp_clauses): Add support for
	OMP_CLAUSE_INDEPENDENT and OMP_CLAUSE_TILE.

2015-11-05  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Update support for the tile
	and default clauses in OpenACC.
	(gfc_match_oacc_update): Error when data clauses are supplied.
	(oacc_compatible_clauses): Delete.
	(resolve_omp_clauses): Give special care for OpenACC reductions.
	Also update error reporting for the tile clause.
	(resolve_oacc_loop_blocks): Update error reporting for the tile clause.
	* trans-openmp.c (gfc_trans_omp_clauses): Update OMP_CLAUSE_SEQ. Add
	OMP_CLAUSE_{AUTO,TILE} and add support for the gang static argument.
	(gfc_trans_oacc_combined_directive): Update the list of clauses which
	are split to acc loops.


2015-11-05  Cesar Philippidis  <ce...@codesourcery.com>
	Tom de Vries  <t...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/combined-directives.c: New test.
	* c-c++-common/goacc/loop-clauses.c: New test.
	* c-c++-common/goacc/tile.c: New test.
	* c-c++-common/goacc/loop-shape.c: Add test for pointer variable
	as gang static arguments.
	* c-c++-common/goacc/update-1.c: Adjust expected err

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/05/2015 04:14 AM, Thomas Schwinge wrote:

> On Tue, 3 Nov 2015 14:16:59 -0800, Cesar Philippidis <ce...@codesourcery.com> 
> wrote:
>> This patch does the following to the c and c++ front ends:
> 
>>  * updates c_oacc_split_loop_clauses to filter out the loop clauses
>>from combined parallel/kernels loops
> 
>>  gcc/c-family/
>>  * c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
>>  AUTO, SEQ and independent as loop clauses.  Associate REDUCTION
>>  clauses with parallel and kernels.
> 
>> --- a/gcc/c-family/c-omp.c
>> +++ b/gcc/c-family/c-omp.c
>> @@ -709,12 +709,21 @@ c_oacc_split_loop_clauses (tree clauses, tree 
>> *not_loop_clauses)
>>  
>>switch (OMP_CLAUSE_CODE (clauses))
>>  {
>> +  /* Loop clauses.  */
>>  case OMP_CLAUSE_COLLAPSE:
>> -case OMP_CLAUSE_REDUCTION:
>> +case OMP_CLAUSE_TILE:
>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_AUTO:
>> +case OMP_CLAUSE_SEQ:
>> +case OMP_CLAUSE_INDEPENDENT:
>>OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
>>loop_clauses = clauses;
>>break;
>>  
>> +  /* Parallel/kernels clauses.  */
>> +
>>  default:
>>OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
>>*not_loop_clauses = clauses;
> 
> Contrary to your ChangeLog entry, this is not duplicating but is moving
> OMP_CLAUSE_REDUCTION handling.  Is that intentional?  (And, doesn't
> anything break in the testsuite?)

Sorry, I must have mis-phrased it. The spec is unclear here. There are
three possible ways to interpret 'acc parallel loop reduction':

  1. acc parallel reduction
 acc loop

  2. acc parallel
 acc loop reduction

  3. acc parallel reduction
 acc loop reduction


You told me to make all of the front ends consistent, and since I
started working on fortran first, I had c and c++ follow what it was doing.

I haven't observed any regressions with this in place. Then again,
maybe we don't have sufficient test coverage. I'll do more testing.

Cesar



Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-05 Thread Cesar Philippidis
On 11/04/2015 11:29 PM, Jakub Jelinek wrote:
> On Wed, Nov 04, 2015 at 08:58:32PM -0800, Cesar Philippidis wrote:
>> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
>> index e3f55a7..4424596 100644
>> --- a/gcc/cp/pt.c
>> +++ b/gcc/cp/pt.c
>> @@ -14395,6 +14395,15 @@ tsubst_omp_clauses (tree clauses, bool 
>> declare_simd, bool allow_fields,
>>  case OMP_CLAUSE_PRIORITY:
>>  case OMP_CLAUSE_ORDERED:
>>  case OMP_CLAUSE_HINT:
>> +case OMP_CLAUSE_NUM_GANGS:
>> +case OMP_CLAUSE_NUM_WORKERS:
>> +case OMP_CLAUSE_VECTOR_LENGTH:
>> +case OMP_CLAUSE_GANG:
> 
> GANG has two arguments, so you want to handle it differently, you need
> to tsubst both arguments.

Good catch. Support for the static argument was added after I added
template support in gomp4. I'll fix that.

>> +case OMP_CLAUSE_WORKER:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_ASYNC:
>> +case OMP_CLAUSE_WAIT:
>> +case OMP_CLAUSE_TILE:
> 
> I don't think tile will work well this way, if the only argument is a
> TREE_VEC, then I think you hit:
> case TREE_VEC:
>   /* A vector of template arguments.  */
>   gcc_assert (!type);
>   return tsubst_template_args (t, args, complain, in_decl);
> which does something very much different from making a copy of the TREE_VEC
> and calling tsubst_expr on each argument.
> Thus, either you need to handle it manually here, or think about different
> representation of OMP_CLAUSE_TILE?  It seems you allow at most one tile
> clause, so perhaps you could split the single source tile clause into one
> tile clause per expression in there (the only issue is that the FEs
> emit the clauses in last to first order, so you'd need to nreverse the
> tile clause list before adding it to the list of all clauses).

It shouldn't be difficult to call it manually here.
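
As a rough picture of what handling the tile operand manually means, the sketch below copies a vector of expressions and substitutes into each element in turn, instead of passing the whole vector through as one operand. Everything here is a toy: `struct expr`, `subst_expr` and `subst_tile` are hypothetical and do not correspond to tsubst_expr or to GCC's TREE_VEC API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy expression: either a literal or a reference to a template
   argument.  Hypothetical; not GCC's tree representation.  */
struct expr
{
  int is_param;   /* Nonzero if VALUE indexes into the argument array.  */
  int value;      /* Literal value, or argument index.  */
};

/* Substitute ARGS into a single expression.  */
static int
subst_expr (const struct expr *e, const int *args)
{
  return e->is_param ? args[e->value] : e->value;
}

/* Substitute ARGS into each of the LEN tile expressions in VEC and
   return a freshly allocated array of results, which the caller frees.
   The vector is copied and each element is substituted on its own.  */
static int *
subst_tile (const struct expr *vec, size_t len, const int *args)
{
  /* Allocate at least one element so that LEN == 0 is well defined.  */
  int *out = malloc ((len ? len : 1) * sizeof *out);
  size_t i;

  assert (out != NULL);
  for (i = 0; i < len; i++)
    out[i] = subst_expr (&vec[i], args);
  return out;
}
```

With the arguments {8, 16}, a toy tile list of (4, param 0, param 1) instantiates to (4, 8, 16).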

> Otherwise it looks ok, except:

How about the other patch, with the c and c++ FE changes? Is that one OK
for trunk now? Nathan's going to need it for his firstprivate changes
in the middle end.

>> diff --git a/gcc/testsuite/g++.dg/goacc/template-reduction.C 
>> b/gcc/testsuite/g++.dg/goacc/template-reduction.C
>> new file mode 100644
>> index 000..668eeb3
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/goacc/template-reduction.C
>> +++ b/gcc/testsuite/g++.dg/goacc/template.C
> 
> the testsuite coverage is orders of magnitude smaller than it should be.
> Just look at the amount of OpenMP template tests (both compile time and
> runtime):
> grep template libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep -l template 
> libgomp/testsuite/libgomp.c++/*[^4] | wc -l; grep template 
> gcc/testsuite/g++.dg/gomp/* | wc -l; grep -l template 
> gcc/testsuite/g++.dg/gomp/* | wc -l
> 629 # templates
> 45 # tests with templates
> 151 # templates
> 58 # tests with templates
> and even that is really not sufficient.  From my experience, special care is
> needed in template tests to test both non-type dependent and type-dependent
> cases (e.g. some diagnostics is emitted already when parsing the templates
> even when they won't be instantiated at all, other diagnostic is done during
> instantiation), or for e.g. references there are interesting cases where
> a reference to template parameter typename is used or where a reference to
> some type is tsubsted into a template parameter typename.
> E.g. you don't have any test coverage for the vector (num: ...)
> or gang (static: *, num: type_dep)
> or gang (static: type_dep1, type_dep2)
> (which would show you the above issue with the gang clause), or sufficient
> coverage for tile, etc.
> Of course that coverage can be added incrementally.

We'll add more tests incrementally.

Cesar


Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 10:09 AM, Jason Merrill wrote:

> A single function is better, to avoid unnecessary code duplication.

Thanks. I've applied this patch to trunk.
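
The idea of one parser parameterized by clause code, rather than three near-identical functions, can be shown with a toy parser over a string. This is a sketch only: `parse_single_int_clause`, `has_clause` and the types are hypothetical, the input is a plain C string rather than a token stream, and duplicates are silently dropped here, which is a simplification and not necessarily how the patch treats them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical clause codes for the three single-integer clauses.  */
enum clause_code { NUM_GANGS, NUM_WORKERS, VECTOR_LENGTH };

struct clause
{
  enum clause_code code;
  long operand;
  struct clause *chain;
};

/* Advance *P past any whitespace.  */
static void
skip_space (const char **p)
{
  while (isspace ((unsigned char) **p))
    (*p)++;
}

/* Return nonzero if LIST already holds a clause with CODE.  */
static int
has_clause (const struct clause *list, enum clause_code code)
{
  for (; list; list = list->chain)
    if (list->code == code)
      return 1;
  return 0;
}

/* Parse "( integer )" at *P and prepend a clause of kind CODE to LIST.
   On a syntax error or a duplicate clause, LIST is returned unchanged.  */
static struct clause *
parse_single_int_clause (const char **p, enum clause_code code,
                         struct clause *list)
{
  char *end;
  long value;
  struct clause *c;

  skip_space (p);
  if (**p != '(')
    return list;
  (*p)++;

  /* strtol skips leading whitespace; END == *P means no digits.  */
  value = strtol (*p, &end, 10);
  if (end == *p)
    return list;
  *p = end;

  skip_space (p);
  if (**p != ')')
    return list;
  (*p)++;

  if (has_clause (list, code))
    return list;

  c = malloc (sizeof *c);
  assert (c != NULL);
  c->code = code;
  c->operand = value;
  c->chain = list;
  return c;
}
```

The caller picks the clause code, so adding a fourth single-integer clause would need a new enumerator and a new call site but no new parsing function.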

Cesar

2015-11-04  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.c (cp_parser_oacc_single_int_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_single_int_clause
	for num_gangs, num_workers and vector_length.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 12452e6..4f6cd2d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29590,6 +29590,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_single_int_clause (cp_parser *parser, omp_clause_code code,
+  const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29713,45 +29746,6 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
   return list;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30130,42 +30124,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenMP 2.5:
num_threads ( expression ) */
 
@@ -30374,43 +30332,6 @@ cp_parser_omp_clause_defaultmap (cp_parser *parser, tree list,
   return list;
 }
 
-/* OpenACC:
-   num_workers ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_workers (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-   

[gomp4] assorted trunk backports

2015-11-04 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which does the following:

 * reverts some of the fortran changes that Jakub rejected for
   trunk, specifically those involving resolve_omp_duplicate_list

 * updates how num_gangs, num_workers and vector_length are parsed by
   the c++ FE

 * changes how errors are reported for acc update when no clauses are
   specified and updates the test cases for c, c++ and fortran

 * shuffles around various code to match trunk in the fortran FE

We probably should update the way the c FE handles num_gangs,
num_workers and vector_length too, but I left it as-is since it works.
The c++ FE would have had problems with template support eventually
because it was doing type checking during scanning.
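
To make the template concern concrete, here is a small hypothetical example (not taken from the patch or the testsuite; the function name and parameters are invented). The num_gangs argument has the dependent type T, so whether it is an integer is only known once the template is instantiated, and a check made while the clause is being scanned has nothing reliable to look at.

```cpp
#include <cassert>

// Illustrative only: sum_with_gangs and its parameters are made up for
// this sketch and do not appear in GCC.
template<typename T>
int
sum_with_gangs (const int *array, int n, T gangs)
{
  assert (n >= 0);
  (void) gangs;  // Silences unused-parameter warnings when the pragma is ignored.

  int s = 0;

  // `gangs' has the dependent type T here; its real type is only known
  // at instantiation time, not while the template body is parsed.
#pragma acc parallel loop num_gangs (gangs) reduction (+:s) copyin (array[0:n])
  for (int i = 0; i < n; i++)
    s += array[i];

  return s;
}
```

Built without -fopenacc the pragma is ignored and the loop runs sequentially, so instantiating this with int or long for T over the array {1, 2, 3, 4} returns 10 either way.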

Cesar
2015-11-04  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/c/
	* c-parser.c (c_parser_oacc_update): Update the error message for
	missing clauses.

	gcc/cp/
	* parser.c (cp_parser_oacc_single_int_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_single_int_clause for
	num_gangs, num_workers and vector_length.
	(cp_parser_oacc_update): Update the error message for missing
	clauses.

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Update how default is parsed.
	(gfc_match_oacc_update): Update error message.
	(resolve_omp_clauses): Don't use resolve_omp_duplicate_list for
	omp clauses.
	(oacc_code_to_statement): Merge atomic placement from trunk.

	gcc/testsuite/
	* c-c++-common/goacc/update-1.c: Update expected error.
	* gfortran.dg/goacc/update.f95: Likewise.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 45e4248..fa70055 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -14060,7 +14060,7 @@ c_parser_oacc_update (c_parser *parser)
 {
   error_at (loc,
 		"%<#pragma acc update%> must contain at least one "
-		"%<device%> or %<host/self%> clause");
+		"%<device%> or %<host%> or %<self%> clause");
   return;
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1e2afdb..ce8bc6a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29652,6 +29652,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_single_int_clause (cp_parser *parser, omp_clause_code code,
+  const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29923,45 +29956,6 @@ cp_parser_oacc_clause_tile (cp_parser *parser, tree list, location_t here)
   return c;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30345,42 +30339,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-

Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 02:24 AM, Jakub Jelinek wrote:
> Have you verified pt.c does the right thing when instantiating the
> OMP_CLAUSE_TILE clause (I mean primarily the TREE_VEC in there)?
> There really should be testcases for that.

Here's a patch which adds template support for the oacc clauses. Is it
ok for trunk?

Cesar
2015-11-04  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* pt.c (tsubst_omp_clauses): Add support for OMP_CLAUSE_{NUM_GANGS,
	NUM_WORKERS,VECTOR_LENGTH,GANG,WORKER,VECTOR,ASYNC,WAIT,TILE,AUTO,
	INDEPENDENT,SEQ}. 
	(tsubst_expr): Add support for OACC_{KERNELS,PARALLEL,LOOP,DATA,
	ENTER_DATA,EXIT_DATA,UPDATE}.

	gcc/testsuite/
	* g++.dg/goacc/template-reduction.C: New test.
	* g++.dg/goacc/template.C: New test.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e3f55a7..4424596 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14395,6 +14395,15 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
 	case OMP_CLAUSE_PRIORITY:
 	case OMP_CLAUSE_ORDERED:
 	case OMP_CLAUSE_HINT:
+	case OMP_CLAUSE_NUM_GANGS:
+	case OMP_CLAUSE_NUM_WORKERS:
+	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_ASYNC:
+	case OMP_CLAUSE_WAIT:
+	case OMP_CLAUSE_TILE:
 	  OMP_CLAUSE_OPERAND (nc, 0)
 	= tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, 
 			   in_decl, /*integral_constant_expression_p=*/false);
@@ -14449,6 +14458,9 @@ tsubst_omp_clauses (tree clauses, bool declare_simd, bool allow_fields,
 	case OMP_CLAUSE_THREADS:
 	case OMP_CLAUSE_SIMD:
 	case OMP_CLAUSE_DEFAULTMAP:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -15197,6 +15209,15 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   }
   break;
 
+case OACC_KERNELS:
+case OACC_PARALLEL:
+  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, false, args, complain,
+in_decl);
+  stmt = begin_omp_parallel ();
+  RECUR (OMP_BODY (t));
+  finish_omp_construct (TREE_CODE (t), stmt, tmp);
+  break;
+
 case OMP_PARALLEL:
   r = push_omp_privatization_clauses (OMP_PARALLEL_COMBINED (t));
   tmp = tsubst_omp_clauses (OMP_PARALLEL_CLAUSES (t), false, true,
@@ -15227,6 +15248,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 case CILK_FOR:
 case OMP_DISTRIBUTE:
 case OMP_TASKLOOP:
+case OACC_LOOP:
   {
 	tree clauses, body, pre_body;
 	tree declv = NULL_TREE, initv = NULL_TREE, condv = NULL_TREE;
@@ -15235,7 +15257,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 	int i;
 
 	r = push_omp_privatization_clauses (OMP_FOR_INIT (t) == NULL_TREE);
-	clauses = tsubst_omp_clauses (OMP_FOR_CLAUSES (t), false, true,
+	clauses = tsubst_omp_clauses (OMP_FOR_CLAUSES (t), false,
+  TREE_CODE (t) != OACC_LOOP,
   args, complain, in_decl);
 	if (OMP_FOR_INIT (t) != NULL_TREE)
 	  {
@@ -15305,9 +15328,11 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   pop_omp_privatization_clauses (r);
   break;
 
+case OACC_DATA:
 case OMP_TARGET_DATA:
 case OMP_TARGET:
-  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false, true,
+  tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false,
+TREE_CODE (t) != OACC_DATA,
 args, complain, in_decl);
   keep_next_level (true);
   stmt = begin_omp_structured_block ();
@@ -15331,6 +15356,16 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
   add_stmt (t);
   break;
 
+case OACC_ENTER_DATA:
+case OACC_EXIT_DATA:
+case OACC_UPDATE:
+  tmp = tsubst_omp_clauses (OMP_STANDALONE_CLAUSES (t), false, false,
+args, complain, in_decl);
+  t = copy_node (t);
+  OMP_STANDALONE_CLAUSES (t) = tmp;
+  add_stmt (t);
+  break;
+
 case OMP_ORDERED:
   tmp = tsubst_omp_clauses (OMP_ORDERED_CLAUSES (t), false, true,
 args, complain, in_decl);
diff --git a/gcc/testsuite/g++.dg/goacc/template-reduction.C b/gcc/testsuite/g++.dg/goacc/template-reduction.C
new file mode 100644
index 000..668eeb3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/goacc/template-reduction.C
@@ -0,0 +1,104 @@
+// This error is temporary.  Remove when support is added for these clauses
+// in the middle end.
+// { dg-prune-output "sorry, unimplemented" }
+
+extern void abort ();
+
+const int n = 100;
+
+// Check explicit template copy map
+
+template<typename T> T
+sum (T array[])
+{
+   T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template<typename T> T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+s += array[i];
+
+  return s

Re: [gomp4] fortran cleanups and c/c++ loop parsing backport

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 03:39 AM, Thomas Schwinge wrote:

> On Tue, 27 Oct 2015 11:36:10 -0700, Cesar Philippidis 
> <ce...@codesourcery.com> wrote:
>>   * Proposed fortran cleanups and enhanced error reporting changes:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html
> 
> ... has now been applied to trunk, in an altered version, so we now got
> some divergence between trunk and gomp-4_0-branch to resolve.

I've been concentrating on trunk lately. I'll backport these changes
when my other fortran changes go in
(https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00287.html).

Cesar



Re: [openacc] acc loop updates in fortran

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 09:15 AM, Thomas Schwinge wrote:

>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
> 
>> @@ -3449,16 +3478,28 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
>>sizeof (construct_clauses));
>>loop_clauses.collapse = construct_clauses.collapse;
>> [...]
>> -  construct_clauses.collapse = 0;
> 
> Again I'm confused by this, why this is being removed, as earlier in
> .

I'm not sure why, but gfc_trans_omp_do needs it. It's probably an openmp
thing. If you look at gfc_trans_omp_do, you'll see that two sets of
clauses are passed into it. code->ext.omp_clauses corresponds to the
combined construct clauses and do_clauses are the filtered out ones.

So in order to get collapse to work as expected in combined loops, I
can't zero out construct_clauses.collapse.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
> 
> I suggest you also merge the existing
> gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 into your new test case
> file (consistent naming, with the other combined-directives* files).

OK, but it depends on what type of things combined_loop.f90 is checking.
If it's scanning gimple, it may have to be a separate file.

>> @@ -0,0 +1,152 @@
>> +! Exercise combined OpenACC directives.
>> +
>> +! { dg-do compile }
>> +! { dg-options "-fopenacc -fdump-tree-gimple" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> What's still unimplemented here?  Please add a comment, or put the
> dg-prune-output directive next to the offending OpenACC directive, so
> we'll be sure to remove it later on.

I was still seeing those sorry messages. I'll put a comment on them.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
>> @@ -0,0 +1,363 @@
>> +! { dg-do compile }
>> +! { dg-additional-options "-fmax-errors=100" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> What's the intention of this?  If we're expecting this error, place
> dg-error directives where they belong?

Trunk is missing some acc loop nesting verification code in omp-low.c
that's present in gomp4. I'm not sure who's going to port that to trunk.
I'll add a comment in this test to remove it with the sorry messages
when appropriate.

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-6.f95
>> @@ -0,0 +1,80 @@
>> +! { dg-do compile }
>> +! { dg-additional-options "-fmax-errors=100" }
>> +
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> Likewise.
> 
>> --- a/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
>> +++ b/gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
>> @@ -3,6 +3,9 @@
>>  
>>  ! test for tree-dump-original and spaces-commas
>>  
>> +! { dg-prune-output "sorry, unimplemented" }
> 
> Likewise.
> 
>> +! { dg-prune-output "Error: work-sharing region" }
> 
> Likewise.
> 
>> --- a/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
>> +++ b/gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
>> @@ -37,4 +37,3 @@ end program test
>>  
>>  ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 
>> "original" } } 
>>  ! { dg-final { scan-tree-dump-times "private\\(v\\)" 1 "original" } } 
>> -! { dg-final { scan-tree-dump-times "firstprivate\\(w\\)" 1 "original" } } 
> 
> Which of your source code changes does this change relate to?

I think Nathan made this change because he found a bug in the test or
something. I just included this test because trunk should be capable of
handling it now.

Cesar



Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-04 Thread Cesar Philippidis
On 11/04/2015 02:24 AM, Jakub Jelinek wrote:
> On Tue, Nov 03, 2015 at 02:16:59PM -0800, Cesar Philippidis wrote:

>> +
>> +  do
>> +{
>> +  if (c_parser_next_token_is (parser, CPP_MULT))
>> +{
>> +  c_parser_consume_token (parser);
>> +  expr = integer_minus_one_node;
>> +}
>> +  else
> 
> Is this right?  If it is either * or (assignment) expression, then
> I'd expect to parse only CPP_MULT followed by CPP_CLOSE_PAREN
> or CPP_COMMA that way (C parser has 2 tokens look-ahead, so it should be
> fine), so that
> tile (a + b + c, *)
> is parsed as
> (a + b + c); -1
> and so is
> tile (*, a + b)
> as
> -1; (a + b)
> while
> tile (*a, *b)
> is
> *a; *b.
> 
> Guess the gang clause parsing that went into the trunk already has the
> same bug,
> gang (static: *)
> or
> gang (static: *, num: 5)
> should be special, but
> gang (static: *ptr)
> should be
> gang (static: (*ptr))

Thanks for catching that. I'll fix the tile and shape scanners in both
front ends.
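
The rule Jakub describes can be sketched with a toy scanner. This is only a model under simplifying assumptions: tokens are plain strings, the list passed in starts right after the opening parenthesis, nested parentheses and commas inside an argument are not handled, and all names are hypothetical rather than taken from either front end. A `*` is treated as the special size only when the next token is `,` or `)`; the "-1" it produces mirrors integer_minus_one_node in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True when toks[i] is a `*' that stands alone as an argument, i.e. it is
// immediately followed by `,' or `)'.  This is the one-token look-ahead.
static bool
star_is_wildcard (const std::vector<std::string> &toks, std::size_t i)
{
  assert (i < toks.size ());
  return toks[i] == "*"
	 && i + 1 < toks.size ()
	 && (toks[i + 1] == "," || toks[i + 1] == ")");
}

// Split the tokens up to the closing `)' into arguments.  A stand-alone
// `*' becomes "-1"; any other argument is kept as its space-joined
// expression text so that `*a' stays an ordinary expression.
static std::vector<std::string>
parse_size_list (const std::vector<std::string> &toks)
{
  std::vector<std::string> args;
  std::size_t i = 0;

  while (i < toks.size () && toks[i] != ")")
    {
      std::string arg;

      if (star_is_wildcard (toks, i))
	{
	  arg = "-1";
	  ++i;
	}
      else
	while (i < toks.size () && toks[i] != "," && toks[i] != ")")
	  {
	    if (!arg.empty ())
	      arg += " ";
	    arg += toks[i++];
	  }

      args.push_back (arg);

      // Step over the separator between arguments.
      if (i < toks.size () && toks[i] == ",")
	++i;
    }

  return args;
}
```

With this model, the tokens of `a + b + c, *)` yield "a + b + c" and "-1", those of `*, a + b)` yield "-1" and "a + b", and those of `*a, *b)` yield "* a" and "* b".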

>> diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
>> index c73dcd0..14d006b 100644
>> --- a/gcc/cp/semantics.c
>> +++ b/gcc/cp/semantics.c
>> @@ -6704,9 +6704,51 @@ finish_omp_clauses (tree clauses, bool allow_fields, 
>> bool declare_simd)
>>  case OMP_CLAUSE_DEFAULTMAP:
>>  case OMP_CLAUSE__CILK_FOR_COUNT_:
>>  case OMP_CLAUSE_AUTO:
>> +case OMP_CLAUSE_INDEPENDENT:
>>  case OMP_CLAUSE_SEQ:
>>break;
>>  
>> +case OMP_CLAUSE_TILE:
>> +  {
>> +tree list = OMP_CLAUSE_TILE_LIST (c);
>> +
>> +while (list)
>> +  {
>> +t = TREE_VALUE (list);
>> +
>> +if (t == error_mark_node)
>> +  remove = true;
>> +else if (!type_dependent_expression_p (t)
>> + && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>> +  {
>> +error ("%<tile%> value must be integral");
>> +remove = true;
>> +  }
>> +else
>> +  {
>> +t = mark_rvalue_use (t);
>> +if (!processing_template_decl)
>> +  {
>> +t = maybe_constant_value (t);
>> +if (TREE_CODE (t) == INTEGER_CST
>> +&& tree_int_cst_sgn (t) != 1
>> +&& t != integer_minus_one_node)
>> +  {
>> +warning_at (OMP_CLAUSE_LOCATION (c), 0,
>> +"%<tile%> value must be positive");
>> +t = integer_one_node;
>> +  }
>> +  }
>> +t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>> +  }
>> +
>> +/* Update list item.  */
>> +TREE_VALUE (list) = t;
>> +list = TREE_CHAIN (list);
>> +  }
>> +  }
>> +  break;
> 
> Have you verified pt.c does the right thing when instantiating the
> OMP_CLAUSE_TILE clause (I mean primarily the TREE_VEC in there)?
> There really should be testcases for that.

I don't think we have any support for templates in trunk yet. Should I
add it to this patch, or should I address that in a follow up patch?

By the way, template support for num_gangs, num_workers and vector_length
depends on this patch
<https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03455.html> because the
c++ front end is currently incorrectly trying to do type checking as it
scans those clauses.

>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index 03203c0..08b192d 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -6997,7 +6997,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  
>>  case OMP_CLAUSE_DEVICE_RESIDENT:
>>  case OMP_CLAUSE_USE_DEVICE:
>> -case OMP_CLAUSE_INDEPENDENT:
>>remove = true;
>>break;
>>  
>> @@ -7007,6 +7006,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  case OMP_CLAUSE_COLLAPSE:
>>  case OMP_CLAUSE_AUTO:
>>  case OMP_CLAUSE_SEQ:
>> +case OMP_CLAUSE_INDEPENDENT:
>>  case OMP_CLAUSE_MERGEABLE:
>>  case OMP_CLAUSE_PROC_BIND:
>>  case OMP_CLAUSE_SAFELEN:
>> @@ -7014,6 +7014,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
>> *pre_p,
>>  case OMP_CLAUSE_NOGROUP:
>>  ca

[openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-03 Thread Cesar Philippidis
This patch does the following to the c and c++ front ends:

 * parsing support for the tile, independent, default (none),
   private and firstprivate clauses in c and c++

 * updates c_oacc_split_loop_clauses to filter out the loop clauses
   from combined parallel/kernels loops

The c front end already had some support for private and firstprivate in
openacc. However, the c++ front end wasn't associating any of those
clauses with parallel, kernels or acc loops.
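
As a rough model of the clause splitting mentioned above, the following sketch partitions the clauses of a combined construct into loop clauses and everything else. It assumes a made-up enum in place of GCC's clause codes and uses vectors where the real c_oacc_split_loop_clauses relinks OMP_CLAUSE_CHAIN, so unlike the real function it keeps the original clause order. The set of loop clauses follows the switch in the c-omp.c hunk of this patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a handful of OMP_CLAUSE_* codes.
enum class clause_code
{
  collapse, tile, gang, worker, vector, auto_, seq, independent,
  reduction, num_gangs, copy
};

// Clauses that belong to the loop part of a combined construct.
static bool
is_loop_clause (clause_code c)
{
  switch (c)
    {
    case clause_code::collapse:
    case clause_code::tile:
    case clause_code::gang:
    case clause_code::worker:
    case clause_code::vector:
    case clause_code::auto_:
    case clause_code::seq:
    case clause_code::independent:
      return true;
    default:
      // Everything else stays with the parallel/kernels part.
      return false;
    }
}

// Returns {loop clauses, remaining clauses}.
static std::pair<std::vector<clause_code>, std::vector<clause_code>>
split_loop_clauses (const std::vector<clause_code> &clauses)
{
  std::vector<clause_code> loop, rest;

  for (clause_code c : clauses)
    (is_loop_clause (c) ? loop : rest).push_back (c);

  assert (loop.size () + rest.size () == clauses.size ());
  return {loop, rest};
}
```

For a combined directive carrying num_gangs, gang, reduction and tile, this puts gang and tile on the loop side and leaves num_gangs and reduction with the parallel or kernels region.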

For reference, here's the grammar for the tile clause from section 2.7
in version 2.0a of the openacc spec:

tile( size-expr-list )

  where size-expr is one of:

*
int-expr

That '*' symbol complicated the parsing a little, since it's no longer a
primary expression.
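
For illustration, a use of the clause that mixes both kinds of size-expr might look like the sketch below. The function and its parameters are hypothetical and not part of the patch or its tests; the point is only that `2` is an ordinary int-expr while the bare `*` is not an expression at all.

```cpp
#include <cassert>

// Hypothetical example: doubles every element of a rows x cols matrix
// stored row-major in `a' and returns the sum of the results.
int
scale_and_sum (int *a, int rows, int cols)
{
  assert (rows >= 0 && cols >= 0);
  int sum = 0;

  // One size-expr per tightly nested loop: an int-expr (2) for the outer
  // loop and `*', which leaves the tile size to the implementation.
#pragma acc parallel loop tile (2, *) copy (a[0:rows * cols])
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      a[i * cols + j] *= 2;

  // Plain host loop, kept outside the OpenACC region for simplicity.
  for (int k = 0; k < rows * cols; k++)
    sum += a[k];

  return sum;
}
```

Without -fopenacc the pragma is ignored and the loops run sequentially, which is enough to check that the surrounding code is well formed.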

I've bootstrapped and regression tested this on x86_64. Is this ok for
trunk?

Cesar

2015-11-03  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Add support for
	OMP_CLAUSE_TILE.  Update handling of OMP_CLAUSE_INDEPENDENT.
	(gimplify_adjust_omp_clauses): Likewise.
	* omp-low.c (scan_sharing_clauses): Add support for OMP_CLAUSE_TILE.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_TILE.
	* tree.c (omp_clause_num_ops): Add an entry for OMP_CLAUSE_TILE.
	(omp_clause_code_name): Likewise.
	(walk_tree_1): Handle OMP_CLAUSE_TILE.
	* tree.h (OMP_CLAUSE_TILE_LIST): New macro.

	gcc/c-family/
	* c-omp.c (c_oacc_split_loop_clauses): Treat TILE, GANG, WORKER, VECTOR,
	AUTO, SEQ and INDEPENDENT as loop clauses.  Associate REDUCTION
	clauses with parallel and kernels.
	* c-pragma.h (enum pragma_omp_clause): Add entries for
	PRAGMA_OACC_CLAUSE_{INDEPENDENT,TILE,DEFAULT}.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_name): Add support for
	PRAGMA_OACC_CLAUSE_INDEPENDENT and PRAGMA_OACC_CLAUSE_TILE.
	(c_parser_omp_clause_default): Add is_oacc argument. Handle
	default(none) in OpenACC.
	(c_parser_oacc_clause_tile): New function.
	(c_parser_oacc_all_clauses): Add support for OMP_CLAUSE_DEFAULT,
	OMP_CLAUSE_INDEPENDENT and OMP_CLAUSE_TILE.
	(OACC_LOOP_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{PRIVATE,INDEPENDENT,
	TILE}.
	(OACC_KERNELS_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.
	(OACC_PARALLEL_MASK): Add PRAGMA_OACC_CLAUSE_{DEFAULT,PRIVATE,
	FIRSTPRIVATE}.
	(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_default.
	* c-typeck.c (c_finish_omp_clauses): Add support for OMP_CLAUSE_TILE
	and OMP_CLAUSE_INDEPENDENT.

	gcc/cp/
	* parser.c (cp_parser_omp_clause_name): Add support for
	PRAGMA_OACC_CLAUSE_INDEPENDENT and PRAGMA_OACC_CLAUSE_TILE.
	(cp_parser_oacc_clause_tile): New function.
	(cp_parser_omp_clause_default): Add is_oacc argument. Handle
	default(none) in OpenACC.
	(cp_parser_oacc_all_clauses): Add support for
	PRAGMA_OACC_CLAUSE_{DEFAULT,INDEPENDENT,TILE,PRIVATE,FIRSTPRIVATE}.
	(cp_parser_omp_all_clauses): Update call to
	cp_parser_omp_clause_default.
	(OACC_LOOP_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_{PRIVATE,INDEPENDENT,
	TILE}.
	(OACC_KERNELS_MASK): Add PRAGMA_OACC_CLAUSE_DEFAULT.
	(OACC_PARALLEL_MASK): Add PRAGMA_OACC_CLAUSE_{DEFAULT,PRIVATE,
	FIRSTPRIVATE}.
	* semantics.c (finish_omp_clauses): Add support for
	OMP_CLAUSE_INDEPENDENT and OMP_CLAUSE_TILE.

	gcc/testsuite/
	* c-c++-common/goacc/combined-directives.c: New test.
	* c-c++-common/goacc/loop-clauses.c: New test.
	* c-c++-common/goacc/tile.c: New test.

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 133d079..a04caf3 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -709,12 +709,21 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 
   switch (OMP_CLAUSE_CODE (clauses))
 {
+	  /* Loop clauses.  */
 	case OMP_CLAUSE_COLLAPSE:
-	case OMP_CLAUSE_REDUCTION:
+	case OMP_CLAUSE_TILE:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_INDEPENDENT:
 	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
 	  loop_clauses = clauses;
 	  break;
 
+	  /* Parallel/kernels clauses.  */
+
 	default:
 	  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
 	  *not_loop_clauses = clauses;
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 69e7392..953c4e3 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -153,6 +153,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
+  PRAGMA_OACC_CLAUSE_INDEPENDENT,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
@@ -162,6 +163,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE,
   PRAGMA_OACC_CLAUSE_SELF,
   PRAGMA_OACC_CLAUSE_SEQ,
+  PRAGMA_OACC_CLAUSE_TILE,
   PRAGMA_OACC_CLAUSE_VECTOR,
   PRAGMA_OACC_CLAUSE_VECTOR_LENGTH,
   PRAGMA_OACC_CLAUSE_WAIT,
@@ -16

[openacc] acc loop updates in fortran

2015-11-03 Thread Cesar Philippidis
This patch updates the fortran front end so that it supports the acc
loop clauses in a similar manner to both the c and c++ front ends in
addition to addressing a couple of other loose ends. Specifically:

 * the tile clause now supports a list of arguments like c and c++

 * made the default clause parser aware that openacc only supports
  'none'

 * error when no data clauses are used in an update construct

 * corrects the way that reduction errors are detected for duplicate
   variables

 * add support for the gang static argument and the auto clause

Besides that, a number of tests have either been updated or added.

Is this OK for trunk?

Cesar


2015-11-03  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>

	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Update support for the tile
	and default clauses in OpenACC.
	(gfc_match_oacc_update): Error when no data clauses are supplied.
	(oacc_compatible_clauses): Delete.
	(resolve_omp_clauses): Give special care for OpenACC reductions.
	Also update error reporting for the tile clause.
	(resolve_oacc_loop_blocks): Update error reporting for the tile clause.
	* trans-openmp.c (gfc_trans_omp_clauses): Update OMP_CLAUSE_SEQ. Add
	OMP_CLAUSE_{AUTO,TILE} and add support for the gang static argument.
	(gfc_trans_oacc_combined_directive): Update the list of clauses which
	are split to acc loops.

2015-11-03  Cesar Philippidis  <ce...@codesourcery.com>
	Tom de Vries  <t...@codesourcery.com>
	Nathan Sidwell  <nat...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>

	gcc/testsuite/
	* gfortran.dg/goacc/combined-directives.f90: New test.
	* gfortran.dg/goacc/default.f95: New test.
	* gfortran.dg/goacc/default_none.f95: New test.
	* gfortran.dg/goacc/firstprivate-1.f95: New test.
	* gfortran.dg/goacc/gang-static.f95: New test.
	* gfortran.dg/goacc/kernels-loop-inner.f95: New test.
	* gfortran.dg/goacc/kernels-loops-adjacent.f95: New test.
	* gfortran.dg/goacc/list.f95: Update test.
	* gfortran.dg/goacc/loop-2.f95: Likewise.
	* gfortran.dg/goacc/loop-4.f95: New test.
	* gfortran.dg/goacc/loop-5.f95: New test.
	* gfortran.dg/goacc/loop-6.f95: New test.
	* gfortran.dg/goacc/loop-tree-1.f90: Update test.
	* gfortran.dg/goacc/modules.f95: New test.
	* gfortran.dg/goacc/multi-clause.f90: New test.
	* gfortran.dg/goacc/parallel-tree.f95: Update test.
	* gfortran.dg/goacc/update.f95: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 929a739..0a92541 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -703,6 +703,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    OMP_MAP_FORCE_FROM))
 	continue;
   if ((mask & OMP_CLAUSE_TILE)
+	  && !c->tile_list
 	  && match_oacc_expr_list ("tile (", &c->tile_list, true) == MATCH_YES)
 	continue;
   if ((mask & OMP_CLAUSE_SEQ) && !c->seq
@@ -856,13 +857,14 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
   if ((mask & OMP_CLAUSE_DEFAULT)
 	  && c->default_sharing == OMP_DEFAULT_UNKNOWN)
 	{
-	  if (gfc_match ("default ( shared )") == MATCH_YES)
+	  if (!openacc && gfc_match ("default ( shared )") == MATCH_YES)
 	c->default_sharing = OMP_DEFAULT_SHARED;
-	  else if (gfc_match ("default ( private )") == MATCH_YES)
+	  else if (!openacc && gfc_match ("default ( private )") == MATCH_YES)
 	c->default_sharing = OMP_DEFAULT_PRIVATE;
 	  else if (gfc_match ("default ( none )") == MATCH_YES)
 	c->default_sharing = OMP_DEFAULT_NONE;
-	  else if (gfc_match ("default ( firstprivate )") == MATCH_YES)
+	  else if (!openacc
+		   && gfc_match ("default ( firstprivate )") == MATCH_YES)
 	c->default_sharing = OMP_DEFAULT_FIRSTPRIVATE;
 	  if (c->default_sharing != OMP_DEFAULT_UNKNOWN)
 	continue;
@@ -1304,10 +1306,19 @@ match
 gfc_match_oacc_update (void)
 {
   gfc_omp_clauses *c;
+  locus here = gfc_current_locus;
+
   if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES, false, false, true)
   != MATCH_YES)
 return MATCH_ERROR;
 
+  if (!c->lists[OMP_LIST_MAP])
+{
+  gfc_error ("%<acc update%> must contain at least one "
+		 "%<device%> or %<host/self%> clause at %L", &here);
+  return MATCH_ERROR;
+}
+
   new_st.op = EXEC_OACC_UPDATE;
   new_st.ext.omp_clauses = c;
   return MATCH_YES;
@@ -2846,30 +2857,6 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Returns true if clause in list 'list' is compatible with any of
-   of the clauses in lists [0..list-1].  E.g., a reduction variable may
-   appear in both reduction and private clauses, so this function
-   will return true in this case.  */
-
-static bool
-oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
-			   gfc_symbol *sym, bool openacc)
-{
-

Re: more accurate omp in fortran

2015-10-31 Thread Cesar Philippidis
On 10/30/2015 09:29 PM, Dominique d'Humières wrote:
>> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
> 
> Revision r229609 breaks bootstrap:
> 
> ../../work/gcc/fortran/openmp.c: In function 'void 
> resolve_omp_clauses(gfc_code*, gfc_omp_clauses*, gfc_namespace*, bool)':
> ../../work/gcc/fortran/openmp.c:2925:27: error: format '%L' expects argument 
> of type 'locus*', but argument 3 has type 'locus' [-Werror=format=]
>  n->sym->name, n->where);
>^
> cc1plus: all warnings being treated as errors

Sorry about that. As I explained in PR68168, I wasn't using
--enable-bootstrap when I tested this patch because I thought it was
implied by default. I was able to reproduce this problem and fix it with
the attached patch after I explicitly configured and built gcc with
--enable-bootstrap.

I've applied this patch to trunk, since it should have been included
with the original patch in the first place.
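
For readers unfamiliar with this class of error, here is a toy variadic formatter that shows why the argument has to be a pointer. It is only loosely modeled on the idea behind %L and is not gfc_error: toy_locus, toy_format and the "line:column" output are invented for this sketch, and it uses plain %s where the real message uses %qs. The %L directive reads a `const toy_locus *` from the argument list, so passing the struct by value would make va_arg read the wrong thing, which is the mismatch -Werror=format reported above.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Hypothetical stand-in for a source location.
struct toy_locus
{
  int line;
  int column;
};

// Toy formatter: "%s" consumes a `const char *', "%L" consumes a
// `const toy_locus *'.  Any other directive is copied through verbatim.
static std::string
toy_format (const char *fmt, ...)
{
  assert (fmt != nullptr);

  std::string out;
  va_list ap;
  va_start (ap, fmt);

  for (const char *p = fmt; *p; ++p)
    {
      if (*p != '%' || p[1] == '\0')
	{
	  out += *p;
	  continue;
	}

      ++p;
      if (*p == 's')
	out += va_arg (ap, const char *);
      else if (*p == 'L')
	{
	  // The argument must be the address of a locus, not the locus.
	  const toy_locus *loc = va_arg (ap, const toy_locus *);
	  out += std::to_string (loc->line) + ":" + std::to_string (loc->column);
	}
      else
	{
	  out += '%';
	  out += *p;
	}
    }

  va_end (ap);
  return out;
}
```

A call then has to pass the address of the location, for example `toy_format ("Variable %s is not a dummy argument at %L", "x", &where)`.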

Cesar

2015-10-31  Cesar Philippidis  <ce...@codesourcery.com>

	PR Bootstrap/68168

	gcc/fortran/
	* openmp.c (resolve_omp_clauses): Pass &n->where when calling
	gfc_error.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 3fd19b8..e59139c 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2922,7 +2922,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	  {
 	if (!code && (!n->sym->attr.dummy || n->sym->ns != ns))
 	  gfc_error ("Variable %qs is not a dummy argument at %L",
-			 n->sym->name, n->where);
+			 n->sym->name, &n->where);
 	continue;
 	  }
 	if (n->sym->attr.flavor == FL_PROCEDURE


Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 10:05 AM, Jakub Jelinek wrote:
> On Fri, Oct 30, 2015 at 07:42:39AM -0700, Cesar Philippidis wrote:

>>> Another thing is what Jason as C++ maintainer wants, it is nice to get rid
>>> of some code redundancies, on the other side the fact that there is one
>>> function per non-terminal in the grammar is also quite nice property.
>>> I know I've violated this a few times too.
> 
>> That name had some legacy from the c FE in gomp-4_0-branch which the
>> function was inherited from. On one hand, it doesn't make sense to allow
>> negative integer values for those clauses, but at the same time, those
>> values aren't checked during scanning. Maybe it should be renamed
>> cp_parser_oacc_single_int_clause instead?
> 
> That is better.
> 
>> If you like, I could make a more general
>> cp_parser_omp_generic_expression that has a scan_list argument so that
>> it can be used for both general expressions and assignment-expressions.
>> That way it can be used for both omp and oacc clauses of the form 'foo (
>> expression )'.
> 
> No, that will only confuse readers of the parser.  After all, the code to
> parse an expression argument of a clause is not that large.
> So, either cp_parser_oacc_single_int_clause or just keeping the old separate
> parsing functions, just remove the cruft from those (testing the type,
> using cp_parser_condition instead of cp_parser_assignment_expression) is ok
> with me.  Please ping Jason on what he prefers from those two.

Jason, what's your preference here? Should I create a single function to
parse num_gangs, num_workers and vector_length since they all accept
the same type of argument or should I just correct the existing
functions as I did in the attached patch? Either one would be specific
to openacc.

This patch has been bootstrapped and regression tested on trunk.

Cesar
2015-10-30  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.c (cp_parser_oacc_clause_vector_length): Parse the clause
	argument as an assignment expression. Bail out early on error.
	(cp_parser_omp_clause_num_gangs): Likewise.
	(cp_parser_omp_clause_num_workers): Likewise.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c8f8b3d..a0d3f3b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29732,37 +29732,29 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
 static tree
 cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
 {
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
 return list;
 
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
 
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
 {
   cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
   return list;
 }
 
   check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
+			 loc);
 
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
+  tree c = build_omp_clause (loc, OMP_CLAUSE_VECTOR_LENGTH);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
   OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
+  return c;
 }
 
 /* OpenACC 2.0
@@ -30149,34 +30141,28 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
 static tree
 cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
 {
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
 return list;
 
-  t = cp_parser_condition (parser);
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
 
   if (t == error_mark_node
   || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
 {
-  error_at (location, "expected positive integer expression");
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 

Re: [patch] New backend header reduction

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 01:20 PM, Andrew MacLeod wrote:
> On 10/30/2015 02:09 PM, Andrew MacLeod wrote:
>> On 10/30/2015 01:56 PM, Cesar Philippidis wrote:
>>> On 10/23/2015 12:24 PM, Jeff Law wrote:
>>>> On 10/23/2015 10:53 AM, Andrew MacLeod wrote:
>>>>
>>> There's a little bit of fallout with this patch when building an
>>> offloaded compiler for openacc. It looks like cgraph.c needs to include
>>> context.h and varpool.c needs context.h and omp-low.h. There's a couple
>>> of ifdef ENABLE_OFFLOADING which may have gone undetected with your
>>> script.
>> If they are defined on the command line or some other way I couldn't
>> see with the targets I built, then that is the common case when that
>> happens.  I don't think I did any openacc builds. OR maybe I need
>> to add nvptx to my coverage builds. Perhaps that is best.
>>> I've bootstrapped the attached patch for an nvptx/x86_64-linux target.
>>> I'm still testing that toolchain. If the testing comes back clean, is
>>> this patch OK for trunk?
> Ah, I see.  there is no nvptx target in config-list.mk, so it never got
> covered.

Yeah, you need to build two separate compilers. Thomas posted some
directions here <https://gcc.gnu.org/wiki/Offloading>. You could
probably reproduce it with openmp and Intel's MIC emulation target too.

Cesar



Re: [OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 06:37 AM, Jakub Jelinek wrote:
> On Thu, Oct 29, 2015 at 04:02:11PM -0700, Cesar Philippidis wrote:
>> I noticed that num_gangs, num_workers and vector_length are parsed in
>> somewhat inconsistent ways in the c++ FE. Both vector_length and num_gangs
>> bail out as soon as they detect errors, whereas num_workers
>> does not. Besides that, they also check for integral
>> expressions as the arguments are scanned instead of deferring that check
>> to finish_omp_clauses. That check will cause ICEs with template
>> arguments once we add support for them later on.
>>
>> Rather than fix each function individually, I've consolidated them into
>> a single cp_parser_oacc_positive_int_clause function. While this
>> function could be extended to support openmp clauses which accept an
>> integer expression argument, like num_threads, I've decided to leave
>> those as-is since there are no known problems with those functions at
>> this moment.
> 
> First question is what int-expr in OpenACC actually stands for (but I'll
> have to raise similar question for OpenMP too).
> 
> Previously you were using cp_parser_condition, which is clearly undesirable
> in this case, it allows e.g.
> num_gangs (int a = 5)
> but the question is if
> num_gangs (5, 6)
> is valid and stands for (5, 6) expression, then it should use
> cp_parser_expression, or if you want to error on it, then you should use
> cp_parser_assignment_expression.

The openacc spec doesn't actually define int-expr, but we take it to
mean a single integral value. In general, the openacc spec uses the term
list to describe comma separated expressions. So we've been assuming
that expr cannot contain commas. Besides, for num_gangs, num_workers and
vector_length it doesn't make sense to accept more than one value. A
construct can accept more than one of those clauses, but they'd have to
be associated with a different device_type.
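
To make that assumption concrete, here is a rough sketch of the rule, assuming the clause argument is already available as a string. It is not GCC code, and has_top_level_comma and is_single_value_argument are made-up names. A comma nested inside a call's parentheses is part of one expression, while a comma at the top level of the clause argument is what would be rejected under the single-value reading.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not a GCC function: reports whether ARG contains a
// comma that is not nested inside parentheses.
bool
has_top_level_comma (const std::string &arg)
{
  int depth = 0;
  for (char ch : arg)
    {
      if (ch == '(')
	depth++;
      else if (ch == ')')
	depth--;
      else if (ch == ',' && depth == 0)
	return true;
    }
  return false;
}

// Under the single-value reading, an argument is acceptable only if it is
// non-empty and has no top-level comma.
bool
is_single_value_argument (const std::string &arg)
{
  return !arg.empty () && !has_top_level_comma (arg);
}

// The examples from this thread, written as checks.
void
check_examples ()
{
  assert (is_single_value_argument ("5"));
  assert (!is_single_value_argument ("5, 6"));
  assert (is_single_value_argument ("f (5, 6)"));
}
```

In the real parser the same effect falls out of the grammar: an assignment-expression stops at the first top-level comma, and the following check for the closing parenthesis then fails.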

> From quick skimming of the (now removed) C/C++ Grammar Appendix in OpenMP,
> I believe all the places where expression or scalar-expression is used
> in the grammar are meant to be cp_parser_expression cases (except
> expression-list used in UDRs which stands for normal C++ expression-list
> non-terminal), so clearly I need to fix up omp_clause_{if,final} to call
> cp_parser_expression instead of cp_parser_condition, and the various
> OpenMP clauses that use cp_parser_assignment_expression to instead use
> cp_parser_expression.  Say schedule(static, 3, 6) should be valid IMHO.
> But, in OpenMP expression or scalar-expression in the grammar is never
> followed by , or optional , while in OpenACC grammar clearly is (e.g. for
> the gang clause).
> If OpenACC wants something different, clearly you can't share the parsing
> routines between say num_tasks and num_workers.

So num_threads, num_tasks, grainsize, priority, hint, num_teams,
thread_limit should all accept comma-separated lists?

> Another thing is what Jason as C++ maintainer wants, it is nice to get rid
> of some code redundancies, on the other side the fact that there is one
> function per non-terminal in the grammar is also quite nice property.
> I know I've violated this a few times too.
> 
> Next question is, why do you call it cp_parser_oacc_positive_int_clause
> when the parsing function actually verifies neither the positive nor
> the int properties (and it should not), so perhaps it should just reflect
> in the name that it is a clause with assignment? expression.
> Or, see the previous paragraph, have a helper that does that and then
> have a separate function for each clause kind that calls those with the
> right arguments.

That name had some legacy from the c FE in gomp-4_0-branch which the
function was inherited from. On one hand, it doesn't make sense to allow
negative integer values for those clauses, but at the same time, those
values aren't checked during scanning. Maybe it should be renamed
cp_parser_oacc_single_int_clause instead?

If you like, I could make a more general
cp_parser_omp_generic_expression that has a scan_list argument so that
it can be used for both general expressions and assignment-expressions.
That way it can be used for both omp and oacc clauses of the form 'foo (
expression )'.

What's your preference?

Thanks,
Cesar


Re: more accurate omp in fortran

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 07:47 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 08:21:35AM -0700, Cesar Philippidis wrote:
>> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
>> index b2894cc..93adb7b 100644
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
>>  } u;
>>struct gfc_omp_namelist_udr *udr;
>>struct gfc_omp_namelist *next;
>> +  locus where;
>>  }
>>  gfc_omp_namelist;
>>  
>> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
>> index 3c12d8e..56a95d4 100644
>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, 
>> gfc_omp_namelist **list,
>>  }
>>tail->sym = sym;
>>tail->expr = expr;
>> +  tail->where = cur_loc;
>>goto next_item;
>>  case MATCH_NO:
>>break;
>> @@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, 
>> gfc_omp_namelist **list,
>>tail = tail->next;
>>  }
>>tail->sym = sym;
>> +  tail->where = cur_loc;
>>  }
>>  
>>  next_item:
> 
> The above is fine.

Thanks. I'll apply this change separately.

>> @@ -2832,36 +2834,47 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, 
>> gfc_namespace *ns,
>>return copy;
>>  }
>>  
>> -/* Returns true if clause in list 'list' is compatible with any of
>> -   of the clauses in lists [0..list-1].  E.g., a reduction variable may
>> -   appear in both reduction and private clauses, so this function
>> -   will return true in this case.  */
>> +/* Check if a variable appears in multiple clauses.  */
>>  
>> -static bool
>> -oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
>> -   gfc_symbol *sym, bool openacc)
>> +static void
>> +resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
>> +int list)
>>  {
>>gfc_omp_namelist *n;
>> +  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
> 
> Please don't do this, I'm afraid this breaks translations.
> Also, can you explain why all the mess with OMP_LIST_REDUCTION && openacc?
> That clearly looks misplaced to me.
> If one list item may be in at most one reduction clause, but may be in
> any other clause too, then it is the same case as e.g. OpenMP
> OMP_LIST_ALIGNED case, so you should instead just:
>   && (list != OMP_LIST_REDUCTION || !openacc)
> to the for (list = 0; list < OMP_LIST_NUM; list++) loop, and handle
> OMP_LIST_REDUCTION specially, similarly how OMP_LIST_ALIGNED is handled,
> just guarded with if (openacc).

That's a good idea, thanks. Reduction variables may appear in multiple
clauses in openacc because you can have reductions on kernels and
parallel constructs. And the same reduction variable may be associated
with a data clause.
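
Roughly, the check would then look like the sketch below. This is only an illustration with hypothetical types (clause_lists, list_kind and find_duplicates are not gfortran names, and a std::set stands in for the sym->mark bit). It assumes, as suggested above, that a variable may sit in at most one reduction clause but may also appear in other clauses when openacc is set.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical list kinds; the real OMP_LIST_* enum is much longer.
enum list_kind { LIST_PRIVATE, LIST_COPY, LIST_REDUCTION, LIST_NUM };

struct clause_lists
{
  std::vector<std::string> names[LIST_NUM];
};

// Return the symbols that appear on more clauses than allowed.
std::vector<std::string>
find_duplicates (const clause_lists &clauses, bool openacc)
{
  std::set<std::string> marked;   // plays the role of n->sym->mark
  std::vector<std::string> dups;

  for (int list = 0; list < LIST_NUM; list++)
    {
      // For OpenACC, reductions are left out of the general check.
      if (list == LIST_REDUCTION && openacc)
	continue;
      for (const std::string &name : clauses.names[list])
	if (!marked.insert (name).second)
	  dups.push_back (name);
    }

  // Handled specially: a reduction variable may only clash with another
  // entry in the reduction list itself.
  if (openacc)
    {
      std::set<std::string> in_reduction;
      for (const std::string &name : clauses.names[LIST_REDUCTION])
	if (!in_reduction.insert (name).second)
	  dups.push_back (name);
    }
  return dups;
}

// A variable on both a copy clause and a reduction clause is only a
// duplicate when the OpenACC exemption is off.
void
check_example ()
{
  clause_lists c;
  c.names[LIST_COPY].push_back ("b");
  c.names[LIST_REDUCTION].push_back ("b");
  assert (find_duplicates (c, true).empty ());
  assert (find_duplicates (c, false).size () == 1);
}
```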

Cesar


Re: more accurate omp in fortran

2015-10-30 Thread Cesar Philippidis
On 10/30/2015 09:58 AM, Jakub Jelinek wrote:

> What I meant was not just the above changes, but also all changes that
> replace where with &n->where and the like, so pretty much everything
> except for the oacc_compatible_clauses removal and addition of
> resolve_omp_duplicate_list.  That is kind of unrelated change.

Yeah, I was going to post the patch before I applied it anyway. Here's what I'm
testing now. I just ran into some fallout with Andrew MacLeod's header file
reduction patch when building offloading compilers. Seems like some
files are not including context.h anymore.

Cesar

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 90f63cf..13e730f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
+  locus where;
 }
 gfc_omp_namelist;
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 6c78c97..197b6d6 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	}
 	  tail->sym = sym;
 	  tail->expr = expr;
+	  tail->where = cur_loc;
 	  goto next_item;
 	case MATCH_NO:
 	  break;
@@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
+	  tail->where = cur_loc;
 	}
 
 next_item:
@@ -2860,9 +2862,8 @@ oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
 /* OpenMP directive resolving routines.  */
 
 static void
-resolve_omp_clauses (gfc_code *code, locus *where,
-		 gfc_omp_clauses *omp_clauses, gfc_namespace *ns,
-		 bool openacc = false)
+resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
+		 gfc_namespace *ns, bool openacc = false)
 {
   gfc_omp_namelist *n;
   gfc_expr_list *el;
@@ -2921,7 +2922,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 	if (!code && (!n->sym->attr.dummy || n->sym->ns != ns))
 	  gfc_error ("Variable %qs is not a dummy argument at %L",
-			 n->sym->name, where);
+			 n->sym->name, n->where);
 	continue;
 	  }
 	if (n->sym->attr.flavor == FL_PROCEDURE
@@ -2953,7 +2954,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  }
 	  }
 	gfc_error ("Object %qs is not a variable at %L", n->sym->name,
-		   where);
+		   &n->where);
   }
 
   for (list = 0; list < OMP_LIST_NUM; list++)
@@ -2969,7 +2970,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  if (n->sym->mark && !oacc_compatible_clauses (omp_clauses, list,
 			n->sym, openacc))
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
 	  else
 	n->sym->mark = 1;
 	}
@@ -2980,7 +2981,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
   if (n->sym->mark)
 	{
 	  gfc_error ("Symbol %qs present on multiple clauses at %L",
-		 n->sym->name, where);
+		 n->sym->name, n->where);
 	  n->sym->mark = 0;
 	}
 
@@ -2988,7 +2989,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -2999,7 +3000,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3011,7 +3012,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->sym->mark)
 	gfc_error ("Symbol %qs present on multiple clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3025,7 +3026,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 {
   if (n->expr == NULL && n->sym->mark)
 	gfc_error ("Symbol %qs present on both FROM and TO clauses at %L",
-		   n->sym->name, where);
+		   n->sym->name, &n->where);
   else
 	n->sym->mark = 1;
 }
@@ -3047,7 +3048,7 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 		if (!n->sym->attr.threadprivate)
 		  gfc_error ("Non-THREADPRIVATE object %qs in COPYIN clause"
-			 " at %L", n->sym->name, where);
+			 " at %L", n->sym->name, &n->where);
 	  }
 	break;
 	  case OMP_LIST_COPYPRIVATE:
@@ -3055,10 +3056,10 @@ resolve_omp_clauses (gfc_code *code, locus *where,
 	  {
 		if (n->sym->as && n->sym->as->type == AS_ASSUMED_SIZE)
 		  gfc_error ("Assumed size array %qs in COPYPRIVATE clause "
-			 "at %L", n->sym->name, where);
+			 "at %L", n->sym->name, &n->where);
 		if (n->sym->attr.pointer && n->sym->attr.intent == INTENT_IN)
 		  gfc_error ("INTENT(IN) POINTER %qs in COPYPRIVATE clause "
-			 "at %L", n->sym->name, where);
+			 "at %L", n->sym->name, &n->where);
 	  }
 	break;
 	  case 

Re: [patch] New backend header reduction

2015-10-30 Thread Cesar Philippidis
On 10/23/2015 12:24 PM, Jeff Law wrote:
> On 10/23/2015 10:53 AM, Andrew MacLeod wrote:
>> Just finished running...  I think the external hard drive was slowing
>> down this run :-P  It took quite a while.
>>
>> Anyway, this is the reduction patch independent of the header-ordering
>> patch... ie, that patch needs to be applied before this one.   So this
>> should be mostly just removals.   I also need to follow up and build all
>> the target and bootstrap from scratch to make sure there aren't any
>> weirdnesses with it.   But you can at least get a look at it now.
>>
>> a few interesting stats:
>>
>> Top reductions:
>>
>> passes.c: Reduction performed, 26 includes removed.
>> shrink-wrap.c: Reduction performed, 21 includes removed.
>> ipa-polymorphic-call.c: Reduction performed, 21 includes removed.
>> lto-cgraph.c: Reduction performed, 19 includes removed.
>> ddg.c: Reduction performed, 19 includes removed.
>> tree-ssa-pre.c: Reduction performed, 18 includes removed.
>> lra-remat.c: Reduction performed, 18 includes removed.
>> cgraph.c: Reduction performed, 18 includes removed.
>> cgraphclones.c: Reduction performed, 18 includes removed.
>> tsan.c: Reduction performed, 17 includes removed.
>> tree-into-ssa.c: Reduction performed, 17 includes removed.
>> lto-section-in.c: Reduction performed, 17 includes removed.
>>
>> And headers most often removed:
>>
>> alias.h: Removed 230 times.
>> flags.h: Removed 207 times.
>> internal-fn.h: Removed 143 times.
>> stmt.h: Removed 128 times.
>> dojump.h: Removed 122 times.
>> expmed.h: Removed 115 times.
>> explow.h: Removed 115 times.
>> varasm.h: Removed 114 times.
>> calls.h: Removed 114 times.
>> expr.h: Removed 81 times.
>> insn-config.h: Removed 77 times.
>> emit-rtl.h: Removed 62 times.
>> hard-reg-set.h: Removed 60 times.
>> tm_p.h: Removed 56 times.
>> fold-const.h: Removed 56 times.
>> diagnostic-core.h: Removed 53 times.
>> except.h: Removed 51 times.
> Approved.  This was the easy part :-)
> 
> A quick grep shows 2309 unnecessary #includes removed.

There's a little bit of fallout with this patch when building an
offloaded compiler for openacc. It looks like cgraph.c needs to include
context.h and varpool.c needs context.h and omp-low.h. There's a couple
of ifdef ENABLE_OFFLOADING which may have gone undetected with your script.

I've bootstrapped the attached patch for an nvptx/x86_64-linux target.
I'm still testing that toolchain. If the testing comes back clean, is
this patch OK for trunk?

Cesar

2015-10-30  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* cgraph.c: Include context.h for offloading.
	* varpool.c: Include context.h and omp-low.h.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 92b8613..7839c72 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "profile.h"
 #include "params.h"
 #include "tree-chkp.h"
+#include "context.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 3010dbb..478f365 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "varasm.h"
 #include "debug.h"
 #include "output.h"
+#include "omp-low.h"
+#include "context.h"
 
 const char * const tls_model_names[]={"none", "emulated",
   "global-dynamic", "local-dynamic",


[gomp4] revert num_gangs, num_workers, vector_length and num_threads parser changes in c/c++

2015-10-29 Thread Cesar Philippidis
In gomp-4_0-branch, we've tried to consolidate the parsing of all the
clauses of the form

  foo (int-expression)

into a single c*_parser_omp_positive_int_clause function. At the time,
such clauses included num_gangs, num_workers, vector_length and
num_threads. Looking at OpenMP 4.5, there are additional candidates for
this function, specifically num_tasks, grainsize, priority and hint.
With that in mind, parser support for all of the aforementioned clauses
is already present in trunk, so I'll revert these changes in
gomp-4_0-branch since they add no functionality. We might revisit a
similar patch if OpenACC adds new clauses of this form in the future.

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [gomp4] revert num_gangs, num_workers, vector_length and num_threads parser changes in c/c++

2015-10-29 Thread Cesar Philippidis
On 10/29/2015 07:08 AM, Cesar Philippidis wrote:
> In gomp-4_0-branch, we've tried to consolidate the parsing of all the
> clauses of the form
> 
>   foo (int-expression)
> 
> into a single c*_parser_omp_positive_int_clause function. At the time,
> such clauses included num_gangs, num_workers, vector_length and
> num_threads. Looking at OpenMP 4.5, there are additional candidates for
> this function, specifically num_tasks, grainsize, priority and hint.
> With that in mind, parser support for all of the aforementioned clauses
> is already present in trunk, so I'll revert these changes in
> gomp-4_0-branch since they add no functionality. We might revisit a
> similar patch if OpenACC adds new clauses of this form in the future.
> 
> I've applied this patch to gomp-4_0-branch.

I found some other bits that needed to be transferred from trunk, which
the attached patch does.

Note that I introduced a regression in template.C in gomp-4_0-branch in
the previous patch. The plan is to get templates working in trunk first,
then backport the fix to gomp-4_0-branch.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-10-29  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.c (cp_parser_omp_simple_clause): Rename to ...
	(cp_parser_oacc_simple_clause): ... this.
	(cp_parser_omp_clause_untied): Restore from trunk.
	(cp_parser_omp_clause_branch): Likewise.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_simple_clause for
	OACC_CLAUSE_{AUTO,INDEPENDENT,NOHOST,NUM_GANGS,SEQ}.
	(cp_parser_omp_all_clauses): Use cp_parser_omp_clause_untied for
	OMP_CLAUSE_UNTIED, and cp_parser_omp_clause_branch for
	OMP_CLAUSE_{INBRANCH,NOTINBRANCH} and CILK_CLAUSE_{MASK,NOMASK}.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 71c33c4..8c1b20d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29642,6 +29642,23 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
   return list;
 }
 
+/* OpenACC 2.0:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
+			  enum omp_clause_code code,
+			  tree list, location_t location)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_name[code], location);
+  tree c = build_omp_clause (location, code);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -30886,20 +30903,27 @@ cp_parser_omp_clause_schedule (cp_parser *parser, tree list, location_t location
 }
 
 /* OpenMP 3.0:
-   untied
+   untied */
 
-   OpenMP 4.0:
-   inbranch
-   notinbranch
+static tree
+cp_parser_omp_clause_untied (cp_parser * /*parser*/,
+			 tree list, location_t location)
+{
+  tree c;
 
-   OpenACC 2.0:
-   auto
-   independent
-   nohost
-   seq */
+  check_no_duplicate_clause (list, OMP_CLAUSE_UNTIED, "untied", location);
+
+  c = build_omp_clause (location, OMP_CLAUSE_UNTIED);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenMP 4.0:
+   inbranch
+   notinbranch */
 
 static tree
-cp_parser_omp_simple_clause (cp_parser * /*parser*/, enum omp_clause_code code,
+cp_parser_omp_clause_branch (cp_parser * /*parser*/, enum omp_clause_code code,
 			 tree list, location_t location)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code], location);
@@ -31697,7 +31721,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "async";
 	  break;
 	case PRAGMA_OACC_CLAUSE_AUTO:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_AUTO,
 		 clauses, here);
 	  c_name = "auto";
 	  break;
@@ -31762,9 +31786,9 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "if";
 	  break;
 	case PRAGMA_OACC_CLAUSE_INDEPENDENT:
-	  clauses = cp_parser_omp_simple_clause (parser,
-		 OMP_CLAUSE_INDEPENDENT,
-		 clauses, here);
+	  clauses = cp_parser_oacc_simple_clause (parser,
+		  OMP_CLAUSE_INDEPENDENT,
+		  clauses, here);
 	  c_name = "independent";
 	  break;
 	case PRAGMA_OACC_CLAUSE_GANG:
@@ -31781,8 +31805,8 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "link";
 	  break;
 	case PRAGMA_OACC_CLAUSE_NOHOST:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_NOHOST,
-		 clauses, here);
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_NOHOST,
+		  clauses, here);
 	  c_name = "nohost";
 	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_GANGS:
@@ -31823,7 +31847,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
-	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+	  clauses = cp_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
 		 clauses, here);
 	  c_name = "seq";
 	  

[OpenACC] num_gangs, num_workers and vector_length in c++

2015-10-29 Thread Cesar Philippidis
I noticed that num_gangs, num_workers and vector_length are parsed in
somewhat inconsistent ways in the c++ FE. Both vector_length and num_gangs
bail out as soon as they detect errors, whereas num_workers
does not. Besides that, they also check for integral
expressions as the arguments are scanned instead of deferring that check
to finish_omp_clauses. That check will cause ICEs with template
arguments once we add support for them later on.

Rather than fix each function individually, I've consolidated them into
a single cp_parser_oacc_positive_int_clause function. While this
function could be extended to support openmp clauses which accept an
integer expression argument, like num_threads, I've decided to leave
those as-is since there are no known problems with those functions at
this moment.
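
As a rough illustration of the control flow (not the patch itself), here is a toy version that works on a plain string instead of the C++ lexer. toy_parser, clause and parse_single_arg_clause are hypothetical names, and the diagnostics text is made up. The shape follows the new function: require the open paren, read one argument, bail out early and skip to the closing paren on error, and otherwise check for duplicates and chain the new clause onto the list. No type check is done while scanning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical clause record; GCC builds tree nodes instead.
struct clause
{
  std::string name;
  std::string arg;
};

struct toy_parser
{
  std::string text;                 // input following the clause name
  std::size_t pos = 0;              // current scan position
  std::vector<std::string> errors;  // diagnostics, collected not printed
};

// Skip to the parenthesis closing the current level and consume it.
static void
skip_to_closing_paren (toy_parser &p)
{
  int depth = 0;
  while (p.pos < p.text.size ())
    {
      char ch = p.text[p.pos++];
      if (ch == '(')
	depth++;
      else if (ch == ')')
	{
	  if (depth == 0)
	    return;
	  depth--;
	}
    }
}

// Parse "( argument )" for clause NAME and return the updated list.
std::vector<clause>
parse_single_arg_clause (toy_parser &p, const std::string &name,
			 std::vector<clause> list)
{
  assert (p.pos <= p.text.size ());
  while (p.pos < p.text.size () && p.text[p.pos] == ' ')
    p.pos++;
  if (p.pos >= p.text.size () || p.text[p.pos] != '(')
    {
      p.errors.push_back ("expected '('");
      return list;
    }
  p.pos++;

  // Read one argument; a top-level ',' or ')' ends it.
  std::size_t start = p.pos;
  int depth = 0;
  while (p.pos < p.text.size ())
    {
      char ch = p.text[p.pos];
      if (ch == '(')
	depth++;
      else if (ch == ')')
	{
	  if (depth == 0)
	    break;
	  depth--;
	}
      else if (ch == ',' && depth == 0)
	break;
      p.pos++;
    }
  std::string arg = p.text.substr (start, p.pos - start);

  bool closed = p.pos < p.text.size () && p.text[p.pos] == ')';
  if (arg.empty () || !closed)
    {
      // Bail out early: recover and leave the list untouched.
      p.errors.push_back ("expected a single argument followed by ')'");
      skip_to_closing_paren (p);
      return list;
    }
  p.pos++;  // consume ')'

  // Duplicates are diagnosed, but the clause is still added.
  for (const clause &c : list)
    if (c.name == name)
      p.errors.push_back ("too many '" + name + "' clauses");

  list.insert (list.begin (), clause{name, arg});
  return list;
}
```

Feeding it "(32)" for num_gangs adds one clause, while "(5, 6)" records an error, skips past the closing paren and returns the list unchanged.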

Is this OK for trunk? I've regression tested and bootstrapped on
x86_64-linux.

Cesar


2015-10-29  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/cp/
	* parser.c (cp_parser_oacc_positive_int_clause): New function.
	(cp_parser_oacc_clause_vector_length): Delete.
	(cp_parser_omp_clause_num_gangs): Delete.
	(cp_parser_omp_clause_num_workers): Delete.
	(cp_parser_oacc_all_clauses): Use cp_parser_oacc_positive_int_clause
	to handle OMP_CLAUSE_{NUM_GANGS,NUM_WORKERS,VECTOR_LENGTH}.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c8f8b3d..b1172e7 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -29603,6 +29603,39 @@ cp_parser_oacc_simple_clause (cp_parser * /* parser  */,
   return c;
 }
 
+ /* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )  */
+
+static tree
+cp_parser_oacc_positive_int_clause (cp_parser *parser, omp_clause_code code,
+const char *str, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+return list;
+
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
+
+  if (t == error_mark_node
+  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+{
+  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+	 /*or_comma=*/false,
+	 /*consume_paren=*/true);
+  return list;
+}
+
+  check_no_duplicate_clause (list, code, str, loc);
+
+  tree c = build_omp_clause (loc, code);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
 /* OpenACC:
 
 gang [( gang-arg-list )]
@@ -29726,45 +29759,6 @@ cp_parser_oacc_shape_clause (cp_parser *parser, omp_clause_code kind,
   return list;
 }
 
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  error = true;
-}
-
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-{
-  cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			 location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenACC 2.0
Parse wait clause or directive parameters.  */
 
@@ -30143,42 +30137,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-  || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-	   /*or_comma=*/false,
-	   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-{
-  error_at (location, "expected positive integer expression");
-  return list;
-}
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenMP 2.5:
num_threads ( expression ) */
 
@@ -30387,43 +30345,6 @@ cp_parse

Re: [gomp4] fortran cleanups and c/c++ loop parsing backport

2015-10-28 Thread Cesar Philippidis
On 10/28/2015 04:00 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Tue, 27 Oct 2015 11:36:10 -0700, Cesar Philippidis 
> <ce...@codesourcery.com> wrote:
>> This patch contains the following:
>>
>>   * C front end changes from trunk:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02528.html
>>
>>   * C++ front end changes from trunk:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02540.html
>>
>>   * Proposed fortran cleanups and enhanced error reporting changes:
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html
> 
> I suppose the latter is a prerequisite for other Fortran front end
> changes you've also committed?  Otherwise, why not get that patch into
> trunk first?  That should save me from having to deal with potentially
> more merge conflicts later on...

It wasn't necessarily a prerequisite for these changes, but I've been
trying to get that patch into trunk for a while now. Plus, part of those
cleanups touched declare, which Jim is going to work on soon.

>> In addition, I've also added a couple of more test cases and updated the
>> way that combined directives are handled in fortran. Because the
>> device_type clauses form a chain of gfc_omp_clauses, I couldn't reuse
>> gfc_split_omp_clauses for combined parallel and kernels loops. So that's
>> why I introduced a new gfc_filter_oacc_combined_clauses function.
> 
> Thanks, but...
> 
>> I'll apply this patch to gomp-4_0-branch shortly. I know that I should
>> have broken this patch down into smaller patches
> 
> Yes.
> 
>> but it was already
>> arranged as one big patch in my source tree.
> 
> You're using Git, so that's not a good excuse.  :-P

I find git to be too temperamental.

>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
>> @@ -3634,12 +3634,65 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, 
>> stmtblock_t *pblock,
>>return gfc_finish_block (&block);
>>  }
>>  
>> -/* parallel loop and kernels loop. */
>> +/* Helper function to filter combined oacc constructs.  ORIG_CLAUSES
>> +   contains the unfiltered list of clauses.  LOOP_CLAUSES corresponds to
>> +   the filter list of loop clauses corresponding to the enclosed list.
>> +   This function is called recursively to handle device_type clauses.  */
>> +
>> +static void
>> +gfc_filter_oacc_combined_clauses (gfc_omp_clauses **orig_clauses,
>> +  gfc_omp_clauses **loop_clauses)
>> +{
>> +  if (*orig_clauses == NULL)
>> +{
>> +  *loop_clauses = NULL;
>> +  return;
>> +}
>> +
>> +  *loop_clauses = gfc_get_omp_clauses ();
>> +
>> +  memset (*loop_clauses, 0, sizeof (gfc_omp_clauses));
> 
> This has already been created zero-initialized.

I was just doing what I was doing before. I removed that in the follow
up patch.

>> +  (*loop_clauses)->gang = (*orig_clauses)->gang;
>> +  (*orig_clauses)->gang = false;
>> +  (*loop_clauses)->gang_expr = (*orig_clauses)->gang_expr;
>> +  (*orig_clauses)->gang_expr = NULL;
>> +  (*loop_clauses)->gang_static = (*orig_clauses)->gang_static;
>> +  (*orig_clauses)->gang_static = false;
>> +  (*loop_clauses)->vector = (*orig_clauses)->vector;
>> +  (*orig_clauses)->vector = false;
>> +  (*loop_clauses)->vector_expr = (*orig_clauses)->vector_expr;
>> +  (*orig_clauses)->vector_expr = NULL;
>> +  (*loop_clauses)->worker = (*orig_clauses)->worker;
>> +  (*orig_clauses)->worker = false;
>> +  (*loop_clauses)->worker_expr = (*orig_clauses)->worker_expr;
>> +  (*orig_clauses)->worker_expr = NULL;
>> +  (*loop_clauses)->seq = (*orig_clauses)->seq;
>> +  (*orig_clauses)->seq = false;
>> +  (*loop_clauses)->independent = (*orig_clauses)->independent;
>> +  (*orig_clauses)->independent = false;
>> +  (*loop_clauses)->par_auto = (*orig_clauses)->par_auto;
>> +  (*orig_clauses)->par_auto = false;
>> +  (*loop_clauses)->acc_collapse = (*orig_clauses)->acc_collapse;
>> +  (*orig_clauses)->acc_collapse = false;
>> +  (*loop_clauses)->collapse = (*orig_clauses)->collapse;
>> +  /* Don't reset (*orig_clauses)->collapse.  */
> 
> Why?  (Extend source code comment?)  The original code (cited just below)
> did this differently.

Because that's what gfc_split_omp_clauses does. I'm not sure why that's
required for gfc_trans_omp_do, but it is. gfc_trans_omp_do appears to
operate on two sets of clauses for some non-obvious reason.
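
To make the splitting discussed here easier to follow, the sketch below
models it in plain C. It is only an illustration under stated assumptions:
`struct clauses`, `dtype_next` and `filter_combined` are hypothetical
stand-ins, not the gfortran types or the function from the patch. It moves
the loop-only flags out of the original set, copies `collapse` without
resetting it, and recurses down a device_type-style chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for gfc_omp_clauses.  */
struct clauses
{
  bool gang, worker, vector, seq;
  int collapse;               /* Wanted by both the outer and loop parts.  */
  struct clauses *dtype_next; /* Models the device_type chain.  */
};

/* Move the loop-only flags from ORIG into a newly allocated set and
   return it.  Returns NULL when ORIG is NULL or allocation fails.
   The caller owns (and must free) the returned chain.  */
static struct clauses *
filter_combined (struct clauses *orig)
{
  if (orig == NULL)
    return NULL;

  /* calloc zero-initializes, so no separate memset is needed.  */
  struct clauses *loop = calloc (1, sizeof *loop);
  if (loop == NULL)
    return NULL;

  loop->gang = orig->gang;
  orig->gang = false;
  loop->worker = orig->worker;
  orig->worker = false;
  loop->vector = orig->vector;
  orig->vector = false;
  loop->seq = orig->seq;
  orig->seq = false;

  /* Copied but deliberately not reset in ORIG.  */
  loop->collapse = orig->collapse;

  /* Handle the chained device_type-style clauses the same way.  */
  loop->dtype_next = filter_combined (orig->dtype_next);
  return loop;
}
```

After the call, the original set keeps only the non-loop clauses plus
`collapse`, which is the asymmetry the review comment asks about.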

>> +  (*loop_clauses)->tile = (*orig_clauses)->tile;

Re: [OpenACC] declare directive

2015-10-28 Thread Cesar Philippidis
On 10/27/2015 01:18 PM, James Norris wrote:

> This patch adds the processing of OpenACC declare directive in C
> and C++. (Note: Support in Fortran is already in trunk.)
> Commentary on the changes is included as an attachment (NOTES).

A quick diff of gomp4 and trunk reveals quite a few fortran changes that
aren't present in trunk. Can you post those changes as a separate patch?

Thanks,
Cesar



[gomp4] minor cfe backports

2015-10-28 Thread Cesar Philippidis
I've applied this patch which backports a change in the way that seq and
auto are parsed in the c front end from trunk to gomp4.

Next up, I'm preparing a patch to remove *_omp_positive_int_clause from
the c and c++ front ends in gomp4. That function is used to parse
num_threads, num_gangs, num_workers and vector_length in gomp4. But
support for those clauses is already present in trunk. I'll post more
details with the patch later.
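
For context, a positive-integer clause check can be sketched as follows.
This is an assumption-laden illustration, not the gomp4 helper: it mirrors
the behaviour visible in the C front end shape-clause patch elsewhere in
this thread (warn when a constant argument is not positive, then continue
with 1). The name `clamp_positive` and the warning counter are made up for
the example.

```c
#include <stdio.h>

/* Return the value to use for clause NAME given its constant argument
   VALUE.  Non-positive values are diagnosed and replaced by 1;
   *WARNINGS counts the diagnostics issued.  */
static long
clamp_positive (const char *name, long value, int *warnings)
{
  if (value <= 0)
    {
      fprintf (stderr, "warning: '%s' value must be positive\n", name);
      ++*warnings;
      return 1;
    }
  return value;
}
```

One helper of this shape can serve num_gangs, num_workers and
vector_length alike, since only the clause name in the message differs.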

Cesar
2015-10-28  Cesar Philippidis  <ce...@codesourcery.com>

	* gcc/c/c-parser.c (c_parser_oacc_simple_clause): New
	function.
	(c_parser_oacc_all_clauses): Use it instead of
	c_parser_omp_simple_clause.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a1465bf..e4a0aca 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11365,7 +11365,25 @@ c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 
  cleanup_error:
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
-  return list;  return c;
+  return list;
+}
+
+/* OpenACC:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+			 tree list)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
+
+  tree c = build_omp_clause (c_parser_peek_token (parser)->location, code);
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  return c;
 }
 
 /* OpenACC:
@@ -12724,7 +12742,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "async";
 	  break;
 	case PRAGMA_OACC_CLAUSE_AUTO:
-	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+	  clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_AUTO,
 		clauses);
 	  c_name = "auto";
 	  break;
@@ -12848,7 +12866,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
-	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+	  clauses = c_parser_oacc_simple_clause (parser, OMP_CLAUSE_SEQ,
 		clauses);
 	  c_name = "seq";
 	  break;
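
For readers following the diff above, here is a minimal standalone model
of what the new simple-clause helper does: diagnose a duplicate, then
build the clause and prepend it to the list anyway. The types, the
`clause_name` table and the diagnostic wording are hypothetical; only the
control flow follows the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

enum clause_code { CLAUSE_AUTO, CLAUSE_INDEPENDENT, CLAUSE_SEQ, CLAUSE_COUNT };

static const char *const clause_name[CLAUSE_COUNT]
  = { "auto", "independent", "seq" };

struct clause
{
  enum clause_code code;
  struct clause *chain;
};

static int error_count;

/* Diagnose CODE if a clause with that code is already on LIST.  */
static void
check_no_duplicate (const struct clause *list, enum clause_code code)
{
  for (; list != NULL; list = list->chain)
    if (list->code == code)
      {
        fprintf (stderr, "error: too many '%s' clauses\n", clause_name[code]);
        error_count++;
        return;
      }
}

/* Prepend a clause with CODE to LIST and return the new head.  The
   duplicate check only diagnoses; the clause is still built.  */
static struct clause *
simple_clause (enum clause_code code, struct clause *list)
{
  check_no_duplicate (list, code);

  struct clause *c = malloc (sizeof *c);
  if (c == NULL)
    return list;

  c->code = code;
  c->chain = list;
  return c;
}
```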


Re: Re: [Bulk] [OpenACC 0/7] host_data construct

2015-10-27 Thread Cesar Philippidis
On 10/26/2015 11:34 AM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 10:51:42AM -0500, James Norris wrote:
>> @@ -12942,6 +12961,7 @@ c_finish_omp_clauses (tree clauses, bool is_omp, 
>> bool declare_simd)
>>  case OMP_CLAUSE_GANG:
>>  case OMP_CLAUSE_WORKER:
>>  case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_USE_DEVICE:
>>pc = &OMP_CLAUSE_CHAIN (c);
>>continue;
>>  
> 
> Are there any restrictions on whether you can specify the same var multiple
> times in use_device clause?
> #pragma acc host_data use_device (x) use_device (x) use_device (y, y, y)
> ?
> If not, have you verified that the gimplifier doesn't ICE on it?  Generally
> it doesn't like the same var being mentioned multiple times.
> If yes, you can use e.g. the generic_head bitmap for that and in any case,
> cover that with sufficient testsuite coverage.

Generally variables cannot appear in multiple clauses. I'll add more
testing for this.
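
The bitmap idea suggested above can be sketched in isolation like this.
It is a toy model, assuming each variable has a small integer UID; the
`uid_set` type and the fixed `MAX_UID` bound are inventions for the
example and are not GCC's bitmap API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_UID = 256 };

struct uid_set
{
  uint64_t bits[MAX_UID / 64];
};

/* Mark UID as seen and report whether it had been seen before.  */
static bool
uid_set_test_and_set (struct uid_set *s, unsigned uid)
{
  uint64_t mask = UINT64_C (1) << (uid % 64);
  bool seen = (s->bits[uid / 64] & mask) != 0;
  s->bits[uid / 64] |= mask;
  return seen;
}

/* Count how many entries of UIDS repeat an earlier entry.  UIDs at or
   above MAX_UID are ignored in this sketch.  */
static size_t
count_duplicates (const unsigned *uids, size_t n)
{
  struct uid_set seen = { { 0 } };
  size_t dups = 0;

  for (size_t i = 0; i < n; i++)
    if (uids[i] < MAX_UID && uid_set_test_and_set (&seen, uids[i]))
      dups++;
  return dups;
}
```

With x and y given UIDs 1 and 2, the pragma quoted above corresponds to
the sequence 1, 1, 2, 2, 2, which contains three repeats.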

>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index ab9e540..0c32219 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -93,6 +93,8 @@ enum gimplify_omp_var_data
>>  
>>GOVD_MAP_0LEN_ARRAY = 32768,
>>  
>> +  GOVD_USE_DEVICE = 65536,
>> +
>>GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
>> | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
>> | GOVD_LOCAL)
>> @@ -116,7 +118,9 @@ enum omp_region_type
>>ORT_COMBINED_TARGET = 33,
>>/* Dummy OpenMP region, used to disable expansion of
>>   DECL_VALUE_EXPRs in taskloop pre body.  */
>> -  ORT_NONE = 64
>> +  ORT_NONE = 64,
>> +  /* An OpenACC host-data region.  */
>> +  ORT_HOST_DATA = 128
> 
> I'd prefer ORT_NONE to be the last one, can you just renumber it and put
> ORT_HOST_DATA before it?

OK.

>> +static tree
>> +gimplify_oacc_host_data_1 (tree *tp, int *walk_subtrees,
>> +   void *data ATTRIBUTE_UNUSED)
>> +{
> 
> Your use_device sounds very similar to use_device_ptr clause in OpenMP,
> which is allowed on #pragma omp target data construct and is implemented
> quite a bit differently from this; it is unclear if the OpenACC standard
> requires this kind of implementation, or you just chose to implement it this
> way.  In particular, the GOMP_target_data call puts the variables mentioned
> in the use_device_ptr clauses into the mapping structures (similarly how
> map clause appears) and the corresponding vars are privatized within the
> target data region (which is a host region, basically a fancy { } braces),
> where the private variables contain the offloading device's pointers.

Is this a new OpenMP 4.5 feature? I'll take a closer look and see if
they are similar enough. I also noticed that OpenMP 4.5 has something
similar to OpenACC's enter/exit data construct now.

>> +  splay_tree_node n = NULL;
>> +  location_t loc = EXPR_LOCATION (*tp);
>> +
>> +  switch (TREE_CODE (*tp))
>> +{
>> +case ADDR_EXPR:
>> +  {
>> +tree decl = TREE_OPERAND (*tp, 0);
>> +
>> +switch (TREE_CODE (decl))
>> +  {
>> +  case ARRAY_REF:
>> +  case ARRAY_RANGE_REF:
>> +  case COMPONENT_REF:
>> +  case VIEW_CONVERT_EXPR:
>> +  case REALPART_EXPR:
>> +  case IMAGPART_EXPR:
>> +if (TREE_CODE (TREE_OPERAND (decl, 0)) == VAR_DECL)
>> +  n = splay_tree_lookup (gimplify_omp_ctxp->variables,
>> + (splay_tree_key) TREE_OPERAND (decl, 0));
>> +break;
> 
> I must say this looks really strange, you throw away all the offsets
> embedded in the component codes (fixed or variable).
> Where does the above list come from?  What about other components (say bit field refs,
> etc.)?

I'm not sure. This is one of those things where multiple developers
worked on it, and the history got lost. I'll investigate it.

>> +case VAR_DECL:
> 
> What is so special about VAR_DECLs?  Shouldn't PARM_DECLs / RESULT_DECLs
> be treated the same way?
>> --- a/libgomp/libgomp.map
>> +++ b/libgomp/libgomp.map
>> @@ -378,6 +378,7 @@ GOACC_2.0 {
>>  GOACC_wait;
>>  GOACC_get_thread_num;
>>  GOACC_get_num_threads;
>> +GOACC_deviceptr;
>>  };
>>  
>>  GOACC_2.0.1 {
> 
> You shouldn't be adding new symbols into a symbol version that appeared in a
> compiler that shipped already (GCC 5 already had GOACC_2.0 symbols).
> So it should go into GOACC_2.0.1.

OK.

>> diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
>> index af067d6..497ab92 100644
>> --- a/libgomp/oacc-mem.c
>> +++ b/libgomp/oacc-mem.c
>> @@ -204,6 +204,38 @@ acc_deviceptr (void *h)
>>return d;
>>  }
>>  
>> +/* This function is used as a helper in generated code to implement pointer
>> +   lookup in host_data regions.  Unlike acc_deviceptr, it returns its 
>> argument
>> +   unchanged on a shared-memory system (e.g. the host).  */
>> +
>> +void *
>> +GOACC_deviceptr (void *h)
>> +{
>> +  splay_tree_key n;
>> +  void *d;
>> +  void *offset;
>> +
>> +  goacc_lazy_initialize ();
>> +
>> +  

Re: more accurate error messages omp in fortran

2015-10-27 Thread Cesar Philippidis
(was "Re: more accurate omp in fortran")

Ping.

Cesar

On 10/22/2015 08:21 AM, Cesar Philippidis wrote:
> Currently, for certain omp and oacc errors the fortran front end
> inaccurately reports where in the omp/acc construct the error occurred. E.g.
> 
>!$acc parallel copy (i) copy (i) copy (j)
>1
> Error: Symbol ‘i’ present on multiple clauses at (1)
> 
> instead of
> 
>!$acc parallel copy (i) copy (i) copy (j)
> 1
> Error: Symbol ‘i’ present on multiple clauses at (1)
> 
> The problem here is that the front end uses the locus of the construct
> rather than that of the individual clause. As a result, the diagnostic
> pointer points to the end of the construct.
> 
> This patch teaches gfc_resolve_omp_clauses how to use the locus of each
> individual clause instead of the construct when reporting errors
> involving OMP_LIST_ clauses (which are typically clauses involving
> variables). It's still not perfect, but it does improve the quality of
> the error reporting a little. In particular, in openacc, other compilers
> are somewhat lenient in allowing variables to appear in multiple
> clauses, e.g. copyin (foo) copyout (foo), but this is clearly forbidden
> by the spec. I received some bug reports complaining that gfortran's
> errors aren't accurate.
> 
> I've also split off the check for variables appearing in multiple
> clauses into a separate function. It's a little overkill for trunk right
> now, but it is used quite a bit in gomp4 for oacc declare.
> 
> I've tested these changes on x86_64. Is this ok for trunk?
> 
> Cesar
> 
> 
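
A small sketch of the per-clause locus idea, for illustration only: each
list entry carries its own column, and the duplicate check reports the
column of the repeated entry instead of one position for the whole
construct. `namelist_entry` and `first_duplicate_column` are hypothetical
names, and whether the real patch points at the first or the second
occurrence is an assumption here (the sketch picks the second).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced model of a clause variable list entry.  */
struct namelist_entry
{
  const char *sym; /* Variable name.  */
  int where;       /* Column at which this variable appears.  */
};

/* Return the column of the first entry whose symbol already appeared
   earlier in LIST, or -1 if no symbol is repeated.  */
static int
first_duplicate_column (const struct namelist_entry *list, size_t n)
{
  for (size_t i = 1; i < n; i++)
    for (size_t j = 0; j < i; j++)
      if (strcmp (list[i].sym, list[j].sym) == 0)
        return list[i].where;
  return -1;
}
```

For `!$acc parallel copy (i) copy (i) copy (j)`, counting 1-based columns
from the `!`, the three variables sit at columns 22, 31 and 40, so the
function returns 31.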



[gomp4] fortran cleanups and c/c++ loop parsing backport

2015-10-27 Thread Cesar Philippidis
This patch contains the following:

  * C front end changes from trunk:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02528.html

  * C++ front end changes from trunk:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02540.html

  * Proposed fortran cleanups and enhanced error reporting changes:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02288.html

In addition, I've also added a couple of more test cases and updated the
way that combined directives are handled in fortran. Because the
device_type clauses form a chain of gfc_omp_clauses, I couldn't reuse
gfc_split_omp_clauses for combined parallel and kernels loops. So that's
why I introduced a new gfc_filter_oacc_combined_clauses function.

I'll apply this patch to gomp-4_0-branch shortly. I know that I should
have broken this patch down into smaller patches, but it was already
arranged as one big patch in my source tree.

Cesar
2015-10-27  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): Backport from trunk.
	(c_parser_omp_simple_clause): Likewise.
	(c_parser_oacc_all_clauses): Likewise.

	gcc/cp/
	* parser.c (cp_parser_oacc_shape_clause): Backport from trunk.
	(cp_parser_oacc_all_clauses): Likewise.
	* semantics.c (finish_omp_clauses): Likewise.

	gcc/fortran/
	* gfortran.h (gfc_omp_namelist): Add locus where member.
	* openmp.c (gfc_free_omp_clauses): Recursively deallocate device_type
	clauses.
	(gfc_match_omp_variable_list): New function.
	(resolve_omp_clauses): Remove where argument and use the where
	gfc_omp_namelist member when reporting errors.  Use
	resolve_omp_duplicate_list to check for variables appearing in
	multiple clauses.
	(gfc_match_omp_clauses): Update call to resolve_omp_clauses.
	(gfc_match_oacc_declare): Likewise.
	(resolve_omp_do): Likewise.
	(resolve_oacc_loop): Likewise.
	(gfc_resolve_oacc_directive): Likewise.
	(gfc_resolve_omp_directive): Likewise.
	(gfc_resolve_omp_declare_simd): Likewise.
	(resolve_oacc_declare_map): New function.
	(gfc_resolve_oacc_declare): Use it.
	* trans-openmp.c (gfc_filter_oacc_combined_clauses): New function.
	(gfc_trans_oacc_combined_directive): Use it.

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c (int main): New test.
	* g++.dg/gomp/pr33372-1.C: Adjust expected error messages.
	* g++.dg/gomp/pr33372-3.C: Likewise.
	* gfortran.dg/goacc/combined-directives.f90: New test.
	* gfortran.dg/goacc/declare-2.f95: Adjust error message.
	* gfortran.dg/goacc/multi-clause.f90: New test.
	* gfortran.dg/gomp/intentin1.f90: Adjust error message.

	libgomp/
	* testsuite/libgomp.oacc-fortran/combdir-1.f90: Rename to ...
	* testsuite/libgomp.oacc-fortran/combined-directive-1.f90: ... this.
	Add a description of the test at the top of the file.

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 3c36fc6..a1465bf 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11226,119 +11226,146 @@ c_parser_omp_clause_is_device_ptr (c_parser *parser, tree list)
 }
 
 /* OpenACC:
-   gang [( gang_expr_list )]
-   worker [( expression )]
-   vector [( expression )] */
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
 
 static tree
-c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
 			const char *str, tree list)
 {
-  omp_clause_code kind;
   const char *id = "num";
-
-  switch (c_kind)
-{
-default:
-  gcc_unreachable ();
-case PRAGMA_OACC_CLAUSE_GANG:
-  kind = OMP_CLAUSE_GANG;
-  break;
-case PRAGMA_OACC_CLAUSE_VECTOR:
-  kind = OMP_CLAUSE_VECTOR;
-  id = "length";
-  break;
-case PRAGMA_OACC_CLAUSE_WORKER:
-  kind = OMP_CLAUSE_WORKER;
-  break;
-}
-
-  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  tree ops[2] = { NULL_TREE, NULL_TREE }, c;
   location_t loc = c_parser_peek_token (parser)->location;
 
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
   if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
 {
-  tree *op_to_parse = &op0;
   c_parser_consume_token (parser);
 
   do
 	{
-	  if (c_parser_next_token_is (parser, CPP_NAME)
-	  || c_parser_next_token_is (parser, CPP_KEYWORD))
+	  c_token *next = c_parser_peek_token (parser);
+	  int idx = 0;
+
+	  /* Gang static argument.  */
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is_keyword (parser, RID_STATIC))
 	{
-	  tree name_kind = c_parser_peek_token (parser)->value;
-	  const char *p = IDENTIFIER_POINTER (name_kind);
-	  if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
+	  c_parser_consume_token (parser);
+
+	  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		goto cleanup_error;
+
+	  

Re: [OpenACC 5/11] C++ FE changes

2015-10-26 Thread Cesar Philippidis
On 10/26/2015 03:20 AM, Jakub Jelinek wrote:
> On Sat, Oct 24, 2015 at 02:11:41PM -0700, Cesar Philippidis wrote:

>> --- a/gcc/cp/semantics.c
>> +++ b/gcc/cp/semantics.c
>> @@ -5911,6 +5911,31 @@ finish_omp_clauses (tree clauses, bool allow_fields, 
>> bool declare_simd)
>>  bitmap_set_bit (_head, DECL_UID (t));
>>goto handle_field_decl;
>>  
>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_WORKER:
>> +  /* Operand 0 is the num: or length: argument.  */
>> +  t = OMP_CLAUSE_OPERAND (c, 0);
>> +  if (t == NULL_TREE)
>> +break;
>> +
>> +  if (!processing_template_decl)
>> +t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>> +  OMP_CLAUSE_OPERAND (c, 0) = t;
>> +
>> +  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_GANG)
>> +break;
> 
> I think it would be better to do the Operand 1 stuff first for
> case OMP_CLAUSE_GANG: only, and then have /* FALLTHRU */ into
> case OMP_CLAUSE_{VECTOR,WORKER}: which would handle the first argument.
> 
> You should add testing that the operand has INTEGRAL_TYPE_P type
> (except that for processing_template_decl it can be
> type_dependent_expression_p instead of INTEGRAL_TYPE_P).
>
> Also, the if (t == NULL_TREE) stuff looks fishy, because e.g. right now
> if you have OMP_CLAUSE_GANG gang (static: expr) or similar,
> you wouldn't wrap the expr into cleanup point.
> So, instead it should be
>   if (t)
> {
>   if (t == error_mark_node)
>   remove = true;
>   else if (!type_dependent_expression_p (t)
>&& !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>   {
> error_at (OMP_CLAUSE_LOCATION (c), ...);
> remove = true;
> }
>   else
>   {
> t = mark_rvalue_use (t);
> if (!processing_template_decl)
>   t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
> OMP_CLAUSE_OPERAND (c, 0) = t;
>   }
> }
> or so.  Also, can the expressions be arbitrary integers, or just
> non-negative, or positive?  If it is INTEGER_CST, that is something that
> could be checked here too.

I ended up handling them with OMP_CLAUSE_NUM_*, since they all require
positive integer expressions. The only exception was OMP_CLAUSE_GANG
which has two optional arguments.

>>else if (!type_dependent_expression_p (t)
>> && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>>  {
>> -  error ("num_threads expression must be integral");
>> + switch (OMP_CLAUSE_CODE (c))
>> +{
>> +case OMP_CLAUSE_NUM_TASKS:
>> +  error ("%<num_tasks%> expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_TEAMS:
>> +  error ("%<num_teams%> expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_THREADS:
>> +  error ("%<num_threads%> expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_GANGS:
>> +  error ("%<num_gangs%> expression must be integral"); break;
>> +case OMP_CLAUSE_NUM_WORKERS:
>> +  error ("%<num_workers%> expression must be integral");
>> +  break;
>> +case OMP_CLAUSE_VECTOR_LENGTH:
>> +  error ("%<vector_length%> expression must be integral");
>> +  break;
> 
> When touching these, can you please use error_at (OMP_CLAUSE_LOCATION (c),
> instead of error ( ?

Done

>> +default:
>> +  error ("invalid argument");
> 
> What invalid argument?  I'd say that is clearly gcc_unreachable (); case.
> 
> But, I think it would be better to just use
>   error_at (OMP_CLAUSE_LOCATION (c), "%qs expression must be integral",
>   omp_clause_code_name[c]);

I used that generic message for all of those clauses except for _GANG,
_WORKER and _VECTOR. The gang clause, at the very least, needed it to
disambiguate the static and num arguments. If you want I can handle
_WORKER and _VECTOR with the generic message. I only included it because
those arguments are optional, whereas they are mandatory for the other
clauses.
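
As a rough illustration of the generic message, the per-clause switch
collapses into a single table lookup. The enum and `code_name` table
below are stand-ins written for this example, not GCC's
omp_clause_code_name. Note that such a table is indexed by the clause
code, so in GCC terms the index would be OMP_CLAUSE_CODE (c) rather than
the clause tree itself.

```c
#include <stdio.h>

enum clause_code
{
  NUM_TASKS, NUM_TEAMS, NUM_THREADS,
  NUM_GANGS, NUM_WORKERS, VECTOR_LENGTH,
  CODE_COUNT
};

/* Clause spellings, indexed by clause code.  */
static const char *const code_name[CODE_COUNT]
  = { "num_tasks", "num_teams", "num_threads",
      "num_gangs", "num_workers", "vector_length" };

/* Format the shared diagnostic for CODE into BUF of SIZE bytes and
   return snprintf's result.  */
static int
format_integral_error (char *buf, size_t size, enum clause_code code)
{
  return snprintf (buf, size, "'%s' expression must be integral",
                   code_name[code]);
}
```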

Is this patch OK for trunk?

Cesar

2015-10-26  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Joseph Myers  <jos...@codesourcery.com>
	Julian Brown  <jul...@codesourcery.com>
	Nathan Sidwell <nat...@codesour

Re: [OpenACC 4/11] C FE changes

2015-10-26 Thread Cesar Philippidis
On 10/26/2015 01:59 AM, Jakub Jelinek wrote:

> Ok for trunk with those changes fixed.

Here's the patch with those changes. Nathan will commit this patch with
the rest of the openacc execution model patches.

Thanks,
Cesar

2015-10-26  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Joseph Myers  <jos...@codesourcery.com>
	Julian Brown  <jul...@codesourcery.com>
	Bernd Schmidt  <bschm...@redhat.com>

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-26  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New test.
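
The keyword handling in the shape-clause parser (treat `num`, `length` or
`static` as a keyword only when a colon follows) can be tried out with
the toy parser below. It is a sketch under simplifying assumptions: it
works on a string instead of a token stream, an int-expr is just an
identifier or integer literal, and `static: *` is recorded as the text
"*" where the patch stores -1. All names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

enum shape_kind { SHAPE_GANG, SHAPE_WORKER, SHAPE_VECTOR };

struct shape_args
{
  bool have[2];     /* [0]: num/length argument, [1]: gang static.  */
  char text[2][32]; /* Argument spelling; "*" for static: *.  */
};

static const char *
skip_ws (const char *p)
{
  while (isspace ((unsigned char) *p))
    p++;
  return p;
}

/* Copy the identifier or integer literal at P into OUT and return the
   position after it, or NULL if there is none or it does not fit.  */
static const char *
read_word (const char *p, char *out, size_t size)
{
  size_t n = 0;
  while (isalnum ((unsigned char) p[n]) || p[n] == '_')
    n++;
  if (n == 0 || n >= size)
    return NULL;
  memcpy (out, p, n);
  out[n] = '\0';
  return p + n;
}

/* Parse S, the text between the parentheses of a gang, worker or
   vector clause.  Return true if it matches the grammar.  */
static bool
parse_shape_args (enum shape_kind kind, const char *s, struct shape_args *out)
{
  const char *id = kind == SHAPE_VECTOR ? "length" : "num";
  memset (out, 0, sizeof *out);

  for (;;)
    {
      char word[32];
      int idx = 0;
      const char *p = skip_ws (s);
      const char *q = read_word (p, word, sizeof word);
      const char *after = q ? skip_ws (q) : NULL;

      /* A word is a keyword only when a colon follows it.  */
      if (after != NULL && *after == ':'
          && ((kind == SHAPE_GANG && strcmp (word, "static") == 0)
              || strcmp (word, id) == 0))
        {
          idx = strcmp (word, "static") == 0 ? 1 : 0;
          p = skip_ws (after + 1);
          if (idx == 1 && *p == '*')
            {
              strcpy (word, "*");
              q = p + 1;
            }
          else
            q = read_word (p, word, sizeof word);
        }

      /* Reject a missing argument or a repeated one.  */
      if (q == NULL || out->have[idx])
        return false;
      out->have[idx] = true;
      strcpy (out->text[idx], word);

      /* Only gang takes a comma-separated list.  */
      s = skip_ws (q);
      if (kind == SHAPE_GANG && *s == ',')
        {
          s++;
          continue;
        }
      return *s == '\0';
    }
}
```

With this, `vector (length)` parses `length` as an ordinary variable,
`vector (length: 5)` parses 5 as the length argument, and
`gang (, static: *)` is rejected at the leading comma.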


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..13f09d8 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,167 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+  tree ops[2] = { NULL_TREE, NULL_TREE }, c;
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
+  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+{
+  c_parser_consume_token (parser);
+
+  do
+	{
+	  c_token *next = c_parser_peek_token (parser);
+	  int idx = 0;
+
+	  /* Gang static argument.  */
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is_keyword (parser, RID_STATIC))
+	{
+	  c_parser_consume_token (parser);
+
+	  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		goto cleanup_error;
+
+	  idx = 1;
+	  if (ops[idx] != NULL_TREE)
+		{
+		  c_parser_error (parser, "too many %<static%> arguments");
+		  goto cleanup_error;
+		}
+
+	  /* Check for the '*' argument.  */
+	  if (c_parser_next_token_is (parser, CPP_MULT))
+		{
+		  c_parser_consume_token (parser);
+		  ops[idx] = integer_minus_one_node;
+
+		  if (c_parser_next_token_is (parser, CPP_COMMA))
+		{
+		  c_parser_consume_token (parser);
+		  continue;
+		}
+		  else
+		break;
+		}
+	}
+	  /* Worker num: argument and vector length: arguments.  */
+	  else if (c_parser_next_token_is (parser, CPP_NAME)
+		   && strcmp (id, IDENTIFIER_POINTER (next->value)) == 0
+		   && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+	{
+	  c_parser_consume_token (parser);  /* id  */
+	  c_parser_consume_token (parser);  /* ':'  */
+	}
+
+	  /* Now collect the actual argument.  */
+	  if (ops[idx] != NULL_TREE)
+	{
+	  c_parser_error (parser, "unexpected argument");
+	  goto cleanup_error;
+	}
+
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
+	  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+	  if (expr == error_mark_node)
+	goto cleanup_error;
+
+	  mark_exp_read (expr);
+	  expr = c_fully_fold (expr, false, NULL);
+
+	  /* Attempt to statically determine when the number isn't a
+	 positive integer.  */
+
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)))
+	{
+	  c_parser_error (parser, "expected integer expression");
+	  return list;
+	}
+
+	  tree c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, expr,
+build_int_cst (TREE_TYPE (expr), 0));
+	  if (c == boolean_true_node)
+	{
+	  warning_at (loc, 0,
+			  "%<%s%> value must be positive", str);
+	  expr = integer_one_node;
+	}
+
+	  ops[idx] = expr;
+
+	  if (kind == OMP_CLAUSE_GANG
+	  && c_parser_next_token_is (parser, CPP_COMMA))
+	{
+	  c_parser_consume_token (parser);
+	  continue;
+	}
+	  break;
+	}
+  while (1);
+
+  if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+	goto cleanup_error;
+}
+
+  check_no_duplicate_clause (list, kind, str);
+
+  c = build_omp_clause (loc, kind);
+
+  if (ops[1])
+OMP_CLAUSE_OPERAND (c, 1) = ops[1];
+
+  OMP_CLAUSE_OPERAND (c, 0) = ops[0];
+  OMP_CLAUSE_CHAIN (c) = list;
+
+  return c;
+
+ cleanup_error:
+  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+  return list;
+}
+
+/* OpenACC:
+   auto
+   independent
+   nohost
+   seq */
+
+static tree
+c_parser_oacc_simple_clause (c_parser *parser, enum omp_clause_code code,
+			 tree list)
+{
+  check_no_duplicate_clause (list, code, omp_clause_code_n

Re: [OpenACC 5/11] C++ FE changes

2015-10-24 Thread Cesar Philippidis
On 10/23/2015 07:37 PM, Cesar Philippidis wrote:
> On 10/23/2015 01:25 PM, Cesar Philippidis wrote:
>> On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
>>> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>>>> This patch is the C++ changes matching the C ones of patch 4.  In
>>>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>>>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>>>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
>>>
>>> Indeed, lots of older OpenMP diagnostics are missing %<...%> markers around
>>> keywords.  Something to fix eventually.
>>
>> I updated omp tasks and teams in semantics.c.
>>
>>>> 2015-10-20  Cesar Philippidis  <ce...@codesourcery.com>
>>>>Thomas Schwinge  <tho...@codesourcery.com>
>>>>James Norris  <jnor...@codesourcery.com>
>>>>Joseph Myers  <jos...@codesourcery.com>
>>>>Julian Brown  <jul...@codesourcery.com>
>>>>Nathan Sidwell <nat...@codesourcery.com>
>>>>
>>>>* parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>>>vector, worker.
>>>>(cp_parser_oacc_simple_clause): New.
>>>>(cp_parser_oacc_shape_clause): New.
>>>
>>> What I've said for the C FE patch, plus:
>>>
>>>> +if (cp_lexer_next_token_is (lexer, CPP_NAME)
>>>> +|| cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>>>> +  {
>>>> +tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>>>> +const char *p = IDENTIFIER_POINTER (name_kind);
>>>> +if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
>>>
>>> As static is a keyword, wouldn't it be better to just handle that case
>>> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
>>>
>>> Also, what is the exact grammar of the shape arguments?
>>> Would be nice to describe the grammar, in the grammar you just say
>>> expression, at least for vector/worker, which is clearly not accurate.
>>>
>>> It seems the intent is that num: or length: or static: is optional, right?
>>> But if that is the case, you should treat those as parsed only if followed
>>> by :.  While static is a keyword, so you can't have a variable called like
>>> that, having vector(length) or vector(num) should not be rejected.
>>> So, I would have expected that it should test if it is RID_STATIC
>>> followed by CPP_COLON (and only in that case consume those tokens),
>>> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
>>> tokens), otherwise parse it as assignment expression.
>>
>> That function now peeks ahead to look for a colon, so now it can handle
>> variables with the name of clause keywords.
>>
>>> The C FE may have similar issue.  Plus of course there should be testsuite
>>> coverage for all the weird cases.
>>
>> I included a new test in a different patch because it's common to both c
>> and c++.
>>
>>>> +  case OMP_CLAUSE_GANG:
>>>> +  case OMP_CLAUSE_VECTOR:
>>>> +  case OMP_CLAUSE_WORKER:
>>>> +/* Operand 0 is the num: or length: argument.  */
>>>> +t = OMP_CLAUSE_OPERAND (c, 0);
>>>> +if (t == NULL_TREE)
>>>> +  break;
>>>> +
>>>> +t = maybe_convert_cond (t);
>>>
>>> Can you explain the maybe_convert_cond calls (in both cases here,
>>> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
>>> The reason why it is used for OpenMP if and final clauses is that those have
>>> a condition argument, either the condition is zero or non-zero (so
>>> effectively it is turned into a bool).
>>> But aren't the gang/vector/worker/vector_length arguments integers rather
>>> than conditions?  I'd expect that finish_omp_clauses should verify
>>> those operands are indeed integral expressions (if that is the requirement
>>> in the standard), as it is something that for C++ can't be verified during
>>> parsing, if arbitrary expressions are parsed there.
>>
>> It's probably a copy-and-paste error. This functionality was added
>> incrementally. I removed that check.
>>
>>>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>>>  break;
>>>>  
>>>&

Re: [OpenACC 4/11] C FE changes

2015-10-24 Thread Cesar Philippidis
On 10/24/2015 01:03 AM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 07:31:51PM -0700, Cesar Philippidis wrote:
> 
>> +static tree
>> +c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
>> +const char *str, tree list)
>> +{
>> +  const char *id = "num";
>> +  tree op0 = NULL_TREE, op1 = NULL_TREE, c;
>> +  location_t loc = c_parser_peek_token (parser)->location;
>> +
>> +  if (kind == OMP_CLAUSE_VECTOR)
>> +id = "length";
>> +
>> +  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
>> +{
>> +  tree *op_to_parse = &op0;
>> +  c_token *next;
>> +
>> +  c_parser_consume_token (parser);
>> +
>> +  do
>> +{
>> +  op_to_parse = &op0;
>> +
>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +{
>> +  if (op0 == NULL && op1 == NULL)
>> +{
>> +  c_parser_error (parser, "unexpected argument");
>> +  goto cleanup_error;
>> +}
>> +
>> +  c_parser_consume_token (parser);
>> +}
> 
> This means you parse
> gang (, static: *)
> vector (, 5)
> etc., even when you error on it afterwards with unexpected argument,
> it is still different diagnostics from other invalid tokens immediately
> after the opening (.

So you didn't like how the error messages are inconsistent? It was
catching those errors.

I've added those new test cases. Unfortunately, c and c++ report
different error messages, so I had to make dg-error generic for the
lines containing those types of errors.

> Also, loc and next are wrong if there is a valid comma.

Yeah, I don't think it needs to be adjusted in the loop. c_parser_error
already knows where to report the error at anyway.

> So I'm really wondering why
> gang (static: *, num: 5)
> works, because next is the CPP_COMMA token, so while
> c_parser_next_token_is (parser, CPP_NAME) matches the actual name,
> what exactly next->value contains is unclear.
> 
> I think it would be better to:
> 
>   tree ops[2] = { NULL_TREE, NULL_TREE };
> 
>   do
>   {
> // Note, declare these here
> c_token *next = c_parser_peek_token (parser);
> location_t loc = next->location;
> // Just use ops[idx] instead of *op_to_parse etc., though if you strongly
> // prefer *op_to_parse, I won't object.
> int idx = 0;
> // Note it seems generally the C parser doesn't check for CPP_KEYWORD
> // before calling c_parser_next_token_is_keyword.  And I'd just do it
> // for OMP_CLAUSE_GANG, which has it in the grammar.
> if (kind == OMP_CLAUSE_GANG
> && c_parser_next_token_is_keyword (parser, RID_STATIC))
>   {
> // ...
> // Your current code, except that for 
> if (c_parser_next_token_is (parser, CPP_MULT))
>   {
> c_parser_consume_token (parser);
> if (c_parser_next_token_is (parser, CPP_COMMA))
>   {
> c_parser_consume_token (parser);
> continue;
>   }
> break;
>   }
>   }
> else if (... num: / length: )
>   {
> // ...
>   }
> // ...
> mark_exp_read (expr);
> ops[idx] = expr;
> 
> if (kind == OMP_CLAUSE_GANG
> && c_parser_next_token_is (parser, CPP_COMMA))
>   {
> c_parser_consume_token (parser);
> continue;
>   }
> break;
>   }
>   while (1);
> 
>   if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
>   goto cleanup_error;
> 
> That way you don't parse something that is not in the grammar.

I did that. It turned out to be a little more compact than what I had
before. Is this OK for trunk?

Cesar

2015-10-24  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Joseph Myers  <jos...@codesourcery.com>
	Julian Brown  <jul...@codesourcery.com>
	Bernd Schmidt  <bschm...@redhat.com>

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-24  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New test.

diff --git a/gcc/c/c-parser.c b/gcc/

Re: [OpenACC 5/11] C++ FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:25 PM, Cesar Philippidis wrote:
> On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
>> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>>> This patch is the C++ changes matching the C ones of patch 4.  In
>>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
>>
>> Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
>> keywords.  Something to fix eventually.
> 
> I updated omp tasks and teams in semantics.c.
> 
>>> 2015-10-20  Cesar Philippidis  <ce...@codesourcery.com>
>>> Thomas Schwinge  <tho...@codesourcery.com>
>>> James Norris  <jnor...@codesourcery.com>
>>> Joseph Myers  <jos...@codesourcery.com>
>>> Julian Brown  <jul...@codesourcery.com>
>>> Nathan Sidwell <nat...@codesourcery.com>
>>>
>>> * parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>> vector, worker.
>>> (cp_parser_oacc_simple_clause): New.
>>> (cp_parser_oacc_shape_clause): New.
>>
>> What I've said for the C FE patch, plus:
>>
>>> + if (cp_lexer_next_token_is (lexer, CPP_NAME)
>>> + || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>>> +   {
>>> + tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>>> + const char *p = IDENTIFIER_POINTER (name_kind);
>>> + if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
>>
>> As static is a keyword, wouldn't it be better to just handle that case
>> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
>>
>> Also, what is the exact grammar of the shape arguments?
>> Would be nice to describe the grammar, in the grammar you just say
>> expression, at least for vector/worker, which is clearly not accurate.
>>
>> It seems the intent is that num: or length: or static: is optional, right?
>> But if that is the case, you should treat those as parsed only if followed
>> by :.  While static is a keyword, so you can't have a variable called like
>> that, having vector(length) or vector(num) should not be rejected.
>> So, I would have expected that it should test if it is RID_STATIC
>> followed by CPP_COLON (and only in that case consume those tokens),
>> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
>> tokens), otherwise parse it as assignment expression.
> 
> That function now peeks ahead to look for a colon, so now it can handle
> variables with the name of clause keywords.
> 
>> The C FE may have similar issue.  Plus of course there should be testsuite
>> coverage for all the weird cases.
> 
> I included a new test in a different patch because it's common to both c
> and c++.
> 
>>> +   case OMP_CLAUSE_GANG:
>>> +   case OMP_CLAUSE_VECTOR:
>>> +   case OMP_CLAUSE_WORKER:
>>> + /* Operand 0 is the num: or length: argument.  */
>>> + t = OMP_CLAUSE_OPERAND (c, 0);
>>> + if (t == NULL_TREE)
>>> +   break;
>>> +
>>> + t = maybe_convert_cond (t);
>>
>> Can you explain the maybe_convert_cond calls (in both cases here,
>> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
>> The reason why it is used for OpenMP if and final clauses is that those have
>> a condition argument, either the condition is zero or non-zero (so
>> effectively it is turned into a bool).
>> But aren't the gang/vector/worker/vector_length arguments integers rather
>> than conditions?  I'd expect that finish_omp_clauses should verify
>> those operands are indeed integral expressions (if that is the requirement
>> in the standard), as it is something that for C++ can't be verified during
>> parsing, if arbitrary expressions are parsed there.
> 
> It's probably a copy-and-paste error. This functionality was added
> incrementally. I removed that check.
> 
>>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>>   break;
>>>  
>>> case OMP_CLAUSE_NUM_THREADS:
>>> - t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
>>> - if (t == error_mark_node)
>>> -   remove = true;
>>> - else if (!type_dependent_expression_p (t)
>>> -  && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>>> -   {
>>> - error ("num_threads exp

Re: [OpenACC 11/11] execution tests

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:29 PM, Cesar Philippidis wrote:
> On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
>> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>>> But it is unclear from the parsing what from these is allowed:
>>>
>>> int v, w;
>>> ...
>>> gang(26)  // equivalent to gang(num:26)
>>> gang(v)   // gang(num:v)
>>> vector(length: 16)  // vector(length: 16)
>>> vector(length: v)  // vector(length: v)
>>> vector(16)  // vector(length: 16)
>>> vector(v)   // vector(length: v)
>>> worker(num: 16)  // worker(num: 16)
>>> worker(num: v)   // worker(num: v)
>>> worker(16)  // worker(num: 16)
>>> worker(v)   // worker(num: v)
>>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>>   // should be an error
>>> gang(v, w)  // likewise
>>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>>> gang(static: v, num: w)   // gang(static: v, num: w)
>>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>>> gang(num: v, static: w)   // gang(num: v, static: w)
>>>
>>> Also note that the static argument can accept '*'.
>>>
>>>> and if the length: or num: part is really optional, then
>>>> int length, num;
>>>> vector(length)
>>>> worker(num)
>>>> gang(num, static: 6)
>>>> gang(static: 5, num)
>>>> should be also accepted (or subset thereof?).
>>>
>>> Interesting question. The spec is unclear. It defines gang, worker and
>>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>>
>>>   gang [( gang-arg-list )]
>>>   worker [( [num:] int-expr )]
>>>   vector [( [length:] int-expr )]
>>>
>>> where gang-arg is one of:
>>>
>>>   [num:] int-expr
>>>   static: size-expr
>>>
>>> and gang-arg-list may have at most one num and one static argument,
>>> and where size-expr is one of:
>>>
>>>   *
>>>   int-expr
>>>
>>> So I've interpreted that as a requirement that length and num must be
>>> followed by an int-expr, whatever that is.
>>
>> My reading of the above is that
>> vector(length)
>> is equivalent to
>> vector(length: length)
>> and
>> worker(num)
>> is equivalent to
>> worker(num: num)
>> etc.  Basically, neither length nor num are reserved identifiers,
>> so you can use them for variable names, and if
>> vector(v) is equivalent to vector(length: v), then
>> vector(length) should be equivalent to vector(length:length)
>> or
>> vector(length + 1) should be equivalent to vector(length: length+1)
>> static is a keyword that can't start an integral expression, so I guess
>> it is fine if you issue an expected : diagnostics after it.
>>
>> In any case, please add a testcase (both C and C++) which covers all these
>> allowed variants (ideally one testcase) and rejected variants (another
>> testcase with dg-error).
>>
>> This is still an easy case, as even the C FE has 2 tokens lookup.
>> E.g. for OpenMP map clause where
>> map (always, tofrom: x)
>> means one thing and
>> map (always, tofrom, y)
>> another one (map (tofrom: always, tofrom, y))
>> I had to do quite ugly things to get around this.
> 
> Here are the updated test cases. Besides adding a new test to
> exercise the loop shape parsing, I also removed that assembly file
> included in the original patch that Ilya noticed.
> 
> Is this OK for trunk?

This patch is mostly the same as I posted earlier, with the exclusion of
the loop-shape parser test. That test was included with the c parser
changes.

Is this OK for trunk?

Cesar

2015-10-23  Nathan Sidwell  <nat...@codesourcery.com>

	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.s: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: New.

diff --git a/libgomp/testsuite/libgomp.c++/member-2.C b/libgomp/testsuite/libgomp.c++/member-2.C
index bb348d8..bbe2bdf4 100644
--- a/libgomp/testsuite/libgomp.c++/member-2.C
+++ b/libgomp/testsuite/libgomp.c++/member-2.C
@@ -154,7 +154,7 @@ A::m1 ()
 {
   f = false;
 #pragma omp single
-#pragma omp taskloop lastprivate (a, T::t, b, n)
+#pragma omp taskloop lastprivate (a, T::t, b, n) private (R::r)
   

Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 02:31 PM, Cesar Philippidis wrote:
> On 10/23/2015 01:31 PM, Jakub Jelinek wrote:
>> On Fri, Oct 23, 2015 at 01:17:07PM -0700, Cesar Philippidis wrote:
>>> Good idea, thanks. This patch also corrects the problems parsing weird
>>> combinations of num, static and length arguments that you mentioned
>>> elsewhere.
>>>
>>> Is this OK for trunk?
>>
>> I'd strongly prefer to see always patches accompanied by testcases.
>>
>>> + loc = c_parser_peek_token (parser)->location;
>>> + op_to_parse = &op0;
>>> +
>>> + if ((c_parser_next_token_is (parser, CPP_NAME)
>>> +  || c_parser_next_token_is (parser, CPP_KEYWORD))
>>> + && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>>> +   {
>>> + tree name_kind = c_parser_peek_token (parser)->value;
>>> + const char *p = IDENTIFIER_POINTER (name_kind);
>>
>> I think I'd prefer not to peek at this at all if it is RID_STATIC,
>> so perhaps just have (and name_kind is weird):
>>else
>>  {
>>tree val = c_parser_peek_token (parser)->value;
>>if (strcmp (id, IDENTIFIER_POINTER (val)) == 0)
>>  {
>>c_parser_consume_token (parser);  /* id  */
>>c_parser_consume_token (parser);  /* ':'  */
>>  }
>>else
>>  {
>> ...
>>  }
>>  }
>> ?
> 
> My plan over here was to try and catch any arguments with a colon. But that
> fell through because...
> 
>>> + if (kind == OMP_CLAUSE_GANG
>>> + && c_parser_next_token_is_keyword (parser, RID_STATIC))
>>> +   {
>>> + c_parser_consume_token (parser); /* static  */
>>> + c_parser_consume_token (parser); /* ':'  */
>>> +
>>> + op_to_parse = &op1;
>>> + if (c_parser_next_token_is (parser, CPP_MULT))
>>> +   {
>>> + c_parser_consume_token (parser);
>>> + *op_to_parse = integer_minus_one_node;
>>> +
>>> + /* Consume a comma if present.  */
>>> + if (c_parser_next_token_is (parser, CPP_COMMA))
>>> +   c_parser_consume_token (parser);
>>
>> Doesn't this mean that you happily parse
>> gang (static: * abc)
>> or
>> gang (static:*num:1)
>> etc.?  I'd say the comma should be non-optional (i.e. either accept
>> CPP_COMMA, or CPP_CLOSE_PAREN, but nothing else) in that case (at least,
>> when in OpenMP grammar something is *-list it is meant to be comma
>> separated).
> 
> I'm not handling commas properly. My next patch is going to handle the
> static argument separately.
> 
>>> +     /* Consume a comma if present.  */
>>> + if (c_parser_next_token_is (parser, CPP_COMMA))
>>> +   c_parser_consume_token (parser);
>>
>> Similarly this means
>> gang (num: 5 static: *)
>> is accepted.  If it is valid, then again it should have testsuite coverage.
> 
> I'll include a test case for this with the next patch.

Here's the updated patch. Hopefully I addressed everything. Thank you
for suggesting all of those test cases.

Is this OK for trunk?

Cesar

2015-10-23  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Joseph Myers  <jos...@codesourcery.com>
	Julian Brown  <jul...@codesourcery.com>
	Bernd Schmidt  <bschm...@redhat.com>

	gcc/c/
	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.

2015-10-23  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c (int main):

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..7d2baa9 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,156 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+

Re: Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 01:22 AM, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 03:16:20PM -0400, Nathan Sidwell wrote:
>> 2015-10-20  Cesar Philippidis  <ce...@codesourcery.com>
>>  Thomas Schwinge  <tho...@codesourcery.com>
>>  James Norris  <jnor...@codesourcery.com>
>>  Joseph Myers  <jos...@codesourcery.com>
>>  Julian Brown  <jul...@codesourcery.com>
>>
>>  * c-parser.c (c_parser_oacc_shape_clause): New.
>>  (c_parser_oacc_simple_clause): New.
>>  (c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
>>  (OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.
> 
> Ok, with one nit.
> 
>>  /* OpenACC:
>> +   gang [( gang_expr_list )]
>> +   worker [( expression )]
>> +   vector [( expression )] */
>> +
>> +static tree
>> +c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
>> +const char *str, tree list)
> 
> I think it would be better to remove the c_kind argument and pass to this
> function omp_clause_code kind instead.  The callers are already in a big
> switch, with a separate call for each of the clauses.
> After all, e.g. for c_parser_oacc_simple_clause you already do it that way
> too.
> 
>> +{
>> +  omp_clause_code kind;
>> +  const char *id = "num";
>> +
>> +  switch (c_kind)
>> +{
>> +default:
>> +  gcc_unreachable ();
>> +case PRAGMA_OACC_CLAUSE_GANG:
>> +  kind = OMP_CLAUSE_GANG;
>> +  break;
>> +case PRAGMA_OACC_CLAUSE_VECTOR:
>> +  kind = OMP_CLAUSE_VECTOR;
>> +  id = "length";
>> +  break;
>> +case PRAGMA_OACC_CLAUSE_WORKER:
>> +  kind = OMP_CLAUSE_WORKER;
>> +  break;
>> +}
> 
> Then you can replace this switch with just if (kind == OMP_CLAUSE_VECTOR)
> id = "length";

Good idea, thanks. This patch also corrects the problems parsing weird
combinations of num, static and length arguments that you mentioned
elsewhere.

Is this OK for trunk?

Nathan, can you try out this patch with your updated patch set? I saw
some test cases getting stuck when expanding expand_GOACC_DIM_SIZE in
the host compiler, which is wrong. I don't see that happening in
gomp-4_0-branch with this patch. Also, can you merge this patch along
with the c++ and new test case patches to trunk? I'll handle the gomp4
backport.

Cesar

2015-10-20  Cesar Philippidis  <ce...@codesourcery.com>
	Thomas Schwinge  <tho...@codesourcery.com>
	James Norris  <jnor...@codesourcery.com>
	Joseph Myers  <jos...@codesourcery.com>
	Julian Brown  <jul...@codesourcery.com>
	Bernd Schmidt  <bschm...@redhat.com>

	* c-parser.c (c_parser_oacc_shape_clause): New.
	(c_parser_oacc_simple_clause): New.
	(c_parser_oacc_all_clauses): Add auto, gang, seq, vector, worker.
	(OACC_LOOP_CLAUSE_MASK): Add gang, worker, vector, auto, seq.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c8c6a2d..1e3c333 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -11188,6 +11188,142 @@ c_parser_omp_clause_num_workers (c_parser *parser, tree list)
 }
 
 /* OpenACC:
+
+gang [( gang-arg-list )]
+worker [( [num:] int-expr )]
+vector [( [length:] int-expr )]
+
+  where gang-arg is one of:
+
+[num:] int-expr
+static: size-expr
+
+  and size-expr may be:
+
+*
+int-expr
+*/
+
+static tree
+c_parser_oacc_shape_clause (c_parser *parser, omp_clause_code kind,
+			const char *str, tree list)
+{
+  const char *id = "num";
+
+  if (kind == OMP_CLAUSE_VECTOR)
+id = "length";
+
+  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+{
+  tree *op_to_parse = &op0;
+  c_parser_consume_token (parser);
+
+  do
+	{
+	  loc = c_parser_peek_token (parser)->location;
+	  op_to_parse = &op0;
+
+	  if ((c_parser_next_token_is (parser, CPP_NAME)
+	   || c_parser_next_token_is (parser, CPP_KEYWORD))
+	  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+	{
+	  tree name_kind = c_parser_peek_token (parser)->value;
+	  const char *p = IDENTIFIER_POINTER (name_kind);
+	  if (kind == OMP_CLAUSE_GANG
+		  && c_parser_next_token_is_keyword (parser, RID_STATIC))
+		{
+		  c_parser_consume_token (parser); /* static  */
+		  c_parser_consume_token (parser); /* ':'  */
+
+		  op_to_parse = &op1;
+		  if (c_parser_next_token_is (parser, CPP_MULT))
+		{
+		  c_parser_consume_token (parser);
+		  *op_to_parse = integer_minus_one_node;
+
+		  /* Consume a comma if present.  */
+		  if (c_parser_next_token_is (parser, CPP_COMMA))

Re: [OpenACC 11/11] execution tests

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>> But it is unclear from the parsing what from these is allowed:
>>
>> int v, w;
>> ...
>> gang(26)  // equivalent to gang(num:26)
>> gang(v)   // gang(num:v)
>> vector(length: 16)  // vector(length: 16)
>> vector(length: v)  // vector(length: v)
>> vector(16)  // vector(length: 16)
>> vector(v)   // vector(length: v)
>> worker(num: 16)  // worker(num: 16)
>> worker(num: v)   // worker(num: v)
>> worker(16)  // worker(num: 16)
>> worker(v)   // worker(num: v)
>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>   // should be an error
>> gang(v, w)  // likewise
>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>> gang(static: v, num: w)   // gang(static: v, num: w)
>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>> gang(num: v, static: w)   // gang(num: v, static: w)
>>
>> Also note that the static argument can accept '*'.
>>
>>> and if the length: or num: part is really optional, then
>>> int length, num;
>>> vector(length)
>>> worker(num)
>>> gang(num, static: 6)
>>> gang(static: 5, num)
>>> should be also accepted (or subset thereof?).
>>
>> Interesting question. The spec is unclear. It defines gang, worker and
>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>
>>   gang [( gang-arg-list )]
>>   worker [( [num:] int-expr )]
>>   vector [( [length:] int-expr )]
>>
>> where gang-arg is one of:
>>
>>   [num:] int-expr
>>   static: size-expr
>>
>> and gang-arg-list may have at most one num and one static argument,
>> and where size-expr is one of:
>>
>>   *
>>   int-expr
>>
>> So I've interpreted that as a requirement that length and num must be
>> followed by an int-expr, whatever that is.
> 
> My reading of the above is that
> vector(length)
> is equivalent to
> vector(length: length)
> and
> worker(num)
> is equivalent to
> worker(num: num)
> etc.  Basically, neither length nor num are reserved identifiers,
> so you can use them for variable names, and if
> vector(v) is equivalent to vector(length: v), then
> vector(length) should be equivalent to vector(length:length)
> or
> vector(length + 1) should be equivalent to vector(length: length+1)
> static is a keyword that can't start an integral expression, so I guess
> it is fine if you issue an expected : diagnostics after it.
> 
> In any case, please add a testcase (both C and C++) which covers all these
> allowed variants (ideally one testcase) and rejected variants (another
> testcase with dg-error).
> 
> This is still an easy case, as even the C FE has 2 tokens lookup.
> E.g. for OpenMP map clause where
> map (always, tofrom: x)
> means one thing and
> map (always, tofrom, y)
> another one (map (tofrom: always, tofrom, y))
> I had to do quite ugly things to get around this.

Here are the updated test cases. Besides adding a new test to
exercise the loop shape parsing, I also removed that assembly file
included in the original patch that Ilya noticed.

Is this OK for trunk?

Cesar

2015-10-23  Nathan Sidwell  <nat...@codesourcery.com>

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: New.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: New.

2015-10-23  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/testsuite/
	* c-c++-common/goacc/loop-shape.c: New.


diff --git a/gcc/testsuite/c-c++-common/goacc/loop-shape.c b/gcc/testsuite/c-c++-common/goacc/loop-shape.c
new file mode 100644
index 000..3cb3006
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/loop-shape.c
@@ -0,0 +1,197 @@
+/* Exercise *_parser_oacc_shape_clause by checking various combinations
+   of gang, worker and vector clause arguments.  */
+
+/* { dg-compile } */
+
+int main ()
+{
+  int i;
+  int v, w;
+  int length, num;
+
+  /* Valid uses.  */
+
+  #pragma acc kernels
+  #pragma acc loop gang worker vector
+  for (i = 0; i < 10; i++)
+;
+  
+  #pragma acc kernels
+  #pragma acc loop gang(26)
+  for (i = 0; i < 10; i++)
+;
+  
+  #pragma acc kernels
+  #pragma acc loop gang(v)
+  for (i = 0; i < 10; i++)
+;
+
+  #pragma acc kernels
+  #pragma acc loop vector(length: 16)
+  for (i = 0; i < 10; i++)
+;
+
+  #pragma acc kernels
+  #pragma acc loop vector(length: v)
+  for (i = 0; i < 10; i++)
+;

Re: Re: [OpenACC 5/11] C++ FE changes

2015-10-23 Thread Cesar Philippidis
On 10/22/2015 01:52 AM, Jakub Jelinek wrote:
> On Wed, Oct 21, 2015 at 03:18:55PM -0400, Nathan Sidwell wrote:
>> This patch is the C++ changes matching the C ones of patch 4.  In
>> finish_omp_clauses, the gang, worker, & vector clauses are handled the same
>> as OpenMP's 'num_threads' clause.  One change to num_threads is the
>> augmentation of a diagnostic to add %<...%>  markers to the clause name.
> 
> Indeed, lots of older OpenMP diagnostics is missing %<...%> markers around
> keywords.  Something to fix eventually.

I updated omp tasks and teams in semantics.c.

>> 2015-10-20  Cesar Philippidis  <ce...@codesourcery.com>
>>  Thomas Schwinge  <tho...@codesourcery.com>
>>  James Norris  <jnor...@codesourcery.com>
>>  Joseph Myers  <jos...@codesourcery.com>
>>  Julian Brown  <jul...@codesourcery.com>
>>  Nathan Sidwell <nat...@codesourcery.com>
>>
>>  * parser.c (cp_parser_omp_clause_name): Add auto, gang, seq,
>>  vector, worker.
>>  (cp_parser_oacc_simple_clause): New.
>>  (cp_parser_oacc_shape_clause): New.
> 
> What I've said for the C FE patch, plus:
> 
>> +  if (cp_lexer_next_token_is (lexer, CPP_NAME)
>> +  || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
>> +{
>> +  tree name_kind = cp_lexer_peek_token (lexer)->u.value;
>> +  const char *p = IDENTIFIER_POINTER (name_kind);
>> +  if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
> 
> As static is a keyword, wouldn't it be better to just handle that case
> using cp_lexer_next_token_is_keyword (lexer, RID_STATIC)?
> 
> Also, what is the exact grammar of the shape arguments?
> Would be nice to describe the grammar, in the grammar you just say
> expression, at least for vector/worker, which is clearly not accurate.
> 
> It seems the intent is that num: or length: or static: is optional, right?
> But if that is the case, you should treat those as parsed only if followed
> by :.  While static is a keyword, so you can't have a variable called like
> that, having vector(length) or vector(num) should not be rejected.
> So, I would have expected that it should test if it is RID_STATIC
> followed by CPP_COLON (and only in that case consume those tokens),
> or CPP_NAME of id followed by CPP_COLON (and only in that case consume those
> tokens), otherwise parse it as assignment expression.

That function now peeks ahead to look for a colon, so now it can handle
variables with the name of clause keywords.
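
As a rough illustration of that peek-ahead (a standalone sketch, not the
front-end code; struct token, the TOK_* kinds and is_arg_label are
hypothetical names standing in for the lexer's CPP_NAME/CPP_COLON tokens),
a name only counts as an argument label when the next token is a colon:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical token kinds; the real lexers use CPP_NAME, CPP_COLON, etc.  */
enum tok_kind { TOK_NAME, TOK_NUMBER, TOK_COLON, TOK_OTHER, TOK_EOF };

struct token
{
  enum tok_kind kind;
  const char *text;	/* Spelling of the token.  */
};

/* Return true if TOKS starts with "ID :", i.e. ID is used as an argument
   label rather than as an ordinary variable name.  TOKS must be terminated
   by a TOK_EOF entry.  */
bool
is_arg_label (const struct token *toks, const char *id)
{
  assert (toks != NULL && id != NULL);
  if (toks[0].kind != TOK_NAME || strcmp (toks[0].text, id) != 0)
    return false;
  /* toks[0] is a name, so it is not the terminator and toks[1] exists.  */
  return toks[1].kind == TOK_COLON;
}
```

So for vector(length: 16) the label is consumed, while for vector(length)
the name falls through and is parsed as an expression.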

> The C FE may have similar issue.  Plus of course there should be testsuite
> coverage for all the weird cases.

I included a new test in a different patch because it's common to both c
and c++.

>> +case OMP_CLAUSE_GANG:
>> +case OMP_CLAUSE_VECTOR:
>> +case OMP_CLAUSE_WORKER:
>> +  /* Operand 0 is the num: or length: argument.  */
>> +  t = OMP_CLAUSE_OPERAND (c, 0);
>> +  if (t == NULL_TREE)
>> +break;
>> +
>> +  t = maybe_convert_cond (t);
> 
> Can you explain the maybe_convert_cond calls (in both cases here,
> plus the preexisting in OMP_CLAUSE_VECTOR_LENGTH)?
> The reason why it is used for OpenMP if and final clauses is that those have
> a condition argument, either the condition is zero or non-zero (so
> effectively it is turned into a bool).
> But aren't the gang/vector/worker/vector_length arguments integers rather
> than conditions?  I'd expect that finish_omp_clauses should verify
> those operands are indeed integral expressions (if that is the requirement
> in the standard), as it is something that for C++ can't be verified during
> parsing, if arbitrary expressions are parsed there.

It's probably a copy-and-paste error. This functionality was added
incrementally. I removed that check.
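
To spell out why treating these operands as conditions was wrong, here is a
plain C illustration (not the maybe_convert_cond implementation; as_condition
and as_count are hypothetical helpers). Turning the operand into a truth
value throws away the count that the clause is supposed to carry:

```c
#include <assert.h>
#include <stdbool.h>

/* Treat OPERAND as a condition, which is what OpenMP's if and final
   clauses want: only zero versus non-zero matters.  */
bool
as_condition (int operand)
{
  return operand != 0;
}

/* Treat OPERAND as a count, which is what gang, worker, vector and
   vector_length want: the integer value itself must survive.  */
int
as_count (int operand)
{
  assert (operand > 0);
  return operand;
}
```

With an argument of 32, the condition form collapses to true (1 when
converted back to int), whereas the count form keeps 32.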

>> @@ -5959,32 +5990,58 @@ finish_omp_clauses (tree clauses, bool a
>>break;
>>  
>>  case OMP_CLAUSE_NUM_THREADS:
>> -  t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
>> -  if (t == error_mark_node)
>> -remove = true;
>> -  else if (!type_dependent_expression_p (t)
>> -   && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
>> -{
>> -  error ("num_threads expression must be integral");
>> -  remove = true;
>> -}
>> -  else
>> -{
>> -  t = mark_rvalue_use (t);
>> -  if (!processing_template_decl)
>> -{
>> -  t = maybe_constant_value (t);
>> -  if (TREE_CODE (t) == INTEGER_CST
>> -  && tree_

Re: [OpenACC 4/11] C FE changes

2015-10-23 Thread Cesar Philippidis
On 10/23/2015 01:31 PM, Jakub Jelinek wrote:
> On Fri, Oct 23, 2015 at 01:17:07PM -0700, Cesar Philippidis wrote:
>> Good idea, thanks. This patch also corrects the problems parsing weird
>> combinations of num, static and length arguments that you mentioned
>> elsewhere.
>>
>> Is this OK for trunk?
> 
> I'd strongly prefer to see always patches accompanied by testcases.
> 
>> +  loc = c_parser_peek_token (parser)->location;
>> +  op_to_parse = &op0;
>> +
>> +  if ((c_parser_next_token_is (parser, CPP_NAME)
>> +   || c_parser_next_token_is (parser, CPP_KEYWORD))
>> +  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
>> +{
>> +  tree name_kind = c_parser_peek_token (parser)->value;
>> +  const char *p = IDENTIFIER_POINTER (name_kind);
> 
> I think I'd prefer not to peek at this at all if it is RID_STATIC,
> so perhaps just have (and name_kind is weird):
> else
>   {
> tree val = c_parser_peek_token (parser)->value;
> if (strcmp (id, IDENTIFIER_POINTER (val)) == 0)
>   {
> c_parser_consume_token (parser);  /* id  */
> c_parser_consume_token (parser);  /* ':'  */
>   }
> else
>   {
> ...
>   }
>   }
> ?

My plan over here was to try and catch any arguments with a colon. But that
fell through because...

>> +  if (kind == OMP_CLAUSE_GANG
>> +  && c_parser_next_token_is_keyword (parser, RID_STATIC))
>> +{
>> +  c_parser_consume_token (parser); /* static  */
>> +  c_parser_consume_token (parser); /* ':'  */
>> +
>> +  op_to_parse = &op1;
>> +  if (c_parser_next_token_is (parser, CPP_MULT))
>> +{
>> +  c_parser_consume_token (parser);
>> +  *op_to_parse = integer_minus_one_node;
>> +
>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +c_parser_consume_token (parser);
> 
> Doesn't this mean that you happily parse
> gang (static: * abc)
> or
> gang (static:*num:1)
> etc.?  I'd say the comma should be non-optional (i.e. either accept
> CPP_COMMA, or CPP_CLOSE_PAREN, but nothing else) in that case (at least,
> when in OpenMP grammar something is *-list it is meant to be comma
> separated).

I'm not handling commas properly. My next patch is going to handle the
static argument separately.

>> +  /* Consume a comma if present.  */
>> +  if (c_parser_next_token_is (parser, CPP_COMMA))
>> +c_parser_consume_token (parser);
> 
> Similarly this means
> gang (num: 5 static: *)
> is accepted.  If it is valid, then again it should have testsuite coverage.

I'll include a test case for this with the next patch.

Cesar



Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 07:23 AM, Nathan Sidwell wrote:
> On 10/22/15 10:05, Jakub Jelinek wrote:
>> On Thu, Oct 22, 2015 at 09:53:46AM -0400, Nathan Sidwell wrote:
>>> On 10/22/15 05:37, Jakub Jelinek wrote:
>>>
>>>> And, I must say I'm at least missing testcases that check parsing
>>>> but also
>>>> runtime behavior of the vector or worker clause arguments (there
>>>> is one gang (static:1) clause, but not the other clauses nor other
>>>> styles of
>>>> gang arguments.
>>>
>>> the static clause is only valid on gang.
>>
>> That is what I've figured out.
>> But it is unclear from the parsing what from these is allowed:
> 
> good questions.  As you may have guessed, I'm not the primary author of
> the parsing code.  Cesar's stepped up to address this.

I'll go into more detail later when I post the revised patch, but for
the time being, in response to your earlier question, I've
inlined how the clauses should be translated in comments below:

> But it is unclear from the parsing what from these is allowed:

int v, w;
...
gang(26)  // equivalent to gang(num:26)
gang(v)   // gang(num:v)
vector(length: 16)  // vector(length: 16)
vector(length: v)  // vector(length: v)
vector(16)  // vector(length: 16)
vector(v)   // vector(length: v)
worker(num: 16)  // worker(num: 16)
worker(num: v)   // worker(num: v)
worker(16)  // worker(num: 16)
worker(v)   // worker(num: v)
gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
  // should be an error
gang(v, w)  // likewise
gang(static: 16, num: 5)  // gang(static: 16, num: 5)
gang(static: v, num: w)   // gang(static: v, num: w)
gang(num: 5, static: 4)   // gang(num: 5, static: 4)
gang(num: v, static: w)   // gang(num: v, static: w)

Also note that the static argument can accept '*'.

> and if the length: or num: part is really optional, then
> int length, num;
> vector(length)
> worker(num)
> gang(num, static: 6)
> gang(static: 5, num)
> should be also accepted (or subset thereof?).

Interesting question. The spec is unclear. It defines gang, worker and
vector as follows in section 2.7 in the OpenACC 2.0a spec:

  gang [( gang-arg-list )]
  worker [( [num:] int-expr )]
  vector [( [length:] int-expr )]

where gang-arg is one of:

  [num:] int-expr
  static: size-expr

and gang-arg-list may have at most one num and one static argument,
and where size-expr is one of:

  *
  int-expr

So I've interpreted that as a requirement that length and num must be
followed by an int-expr, whatever that is.
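
The "at most one num and one static argument" restriction can be checked separately from parsing. A minimal sketch, assuming the parser has already classified each argument; the enum and the function name are hypothetical and not taken from the front ends:

```c
/* Hypothetical classification of an already-parsed gang argument.  */
enum gang_arg_kind { GANG_ARG_NUM, GANG_ARG_STATIC };

/* Return 1 if KINDS[0..N-1] contains at most one num and at most one
   static argument, 0 otherwise.  */
int
gang_arg_kinds_ok (const enum gang_arg_kind *kinds, int n)
{
  int seen_num = 0, seen_static = 0;

  for (int i = 0; i < n; i++)
    {
      int *seen = kinds[i] == GANG_ARG_NUM ? &seen_num : &seen_static;

      if (*seen)
        return 0;
      *seen = 1;
    }
  return 1;
}
```

Under this check gang(16, 24) is rejected, since both arguments are num arguments, while gang(static: 16, num: 5) passes.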

I've been meaning to clean up the C and C++ front ends for a while
now, but I've been bogged down by other things. This is next on my todo
list.

Cesar


more accurate omp in fortran

2015-10-22 Thread Cesar Philippidis
Currently, for certain omp and oacc errors, the Fortran front end does
not accurately report where in the omp/acc construct the error occurred. E.g.

   !$acc parallel copy (i) copy (i) copy (j)
                                            1
Error: Symbol ‘i’ present on multiple clauses at (1)

instead of

   !$acc parallel copy (i) copy (i) copy (j)
                                 1
Error: Symbol ‘i’ present on multiple clauses at (1)

The problem is that the front end uses the locus of the construct
rather than that of the individual clause. As a result, the diagnostic
pointer points to the end of the construct.

This patch teaches gfc_resolve_omp_clauses how to use the locus of each
individual clause instead of the construct when reporting errors
involving OMP_LIST_ clauses (which are typically clauses involving
variables). It's still not perfect, but it does improve the quality of
the error reporting a little. In particular, in openacc, other compilers
are somewhat lenient in allowing variables to appear in multiple
clauses, e.g. copyin (foo) copyout (foo), but this is clearly forbidden
by the spec. I received some bug reports complaining that gfortran's
errors aren't accurate.

I've also split off the check for variables appearing in multiple
clauses into a separate function. It's a little overkill for trunk right
now, but it is used quite a bit in gomp4 for oacc declare.
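
The idea can be sketched in isolation. The structs below are hypothetical, much-reduced stand-ins for gfc_symbol and gfc_omp_namelist, and find_duplicates is an illustrative name. Each list item records its own location, so a repeated symbol is reported where it reappears rather than at the end of the construct. As in the patch, the caller is assumed to have cleared the mark bits beforehand.

```c
#include <stddef.h>

/* Hypothetical stand-in for gfc_symbol.  */
struct symbol
{
  const char *name;
  int mark;             /* Set once the symbol has been seen.  */
};

/* Hypothetical stand-in for gfc_omp_namelist.  */
struct list_item
{
  struct symbol *sym;
  int column;           /* Location of this occurrence.  */
  struct list_item *next;
};

/* Walk LIST and store the column of each repeated symbol in
   DUP_COLUMNS (at most MAX entries are written).  Returns the number
   of duplicates found.  */
int
find_duplicates (struct list_item *list, int *dup_columns, int max)
{
  int count = 0;

  for (struct list_item *n = list; n != NULL; n = n->next)
    {
      if (n->sym->mark)
        {
          if (count < max)
            dup_columns[count] = n->column;
          count++;
        }
      else
        n->sym->mark = 1;
    }
  return count;
}
```

For a list holding i, i and j at three different columns, the single duplicate is reported at the column of the second i.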

I've tested these changes on x86_64. Is this ok for trunk?

Cesar


2015-10-22  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* gfortran.h (gfc_omp_namelist): Add locus where member.
	* openmp.c (gfc_match_omp_variable_list): Set where for each list
	item found.
	(resolve_omp_duplicate_list): New function.
	(oacc_compatible_clauses): Delete.
	(resolve_omp_clauses): Remove where argument and use the where
	gfc_omp_namelist member when reporting errors.  Use
	resolve_omp_duplicate_list to check for variables appearing in
	multiple clauses.
	(resolve_omp_do): Update call to resolve_omp_clauses.
	(resolve_oacc_loop): Likewise.
	(gfc_resolve_oacc_directive): Likewise.
	(gfc_resolve_omp_directive): Likewise.
	(gfc_resolve_omp_declare_simd): Likewise.

	gcc/testsuite/
	* gfortran.dg/gomp/intentin1.f90: Adjust copyprivate warning.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index b2894cc..93adb7b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1123,6 +1123,7 @@ typedef struct gfc_omp_namelist
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
+  locus where;
 }
 gfc_omp_namelist;
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 3c12d8e..56a95d4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -244,6 +244,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	}
 	  tail->sym = sym;
 	  tail->expr = expr;
+	  tail->where = cur_loc;
 	  goto next_item;
 	case MATCH_NO:
 	  break;
@@ -278,6 +279,7 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	  tail = tail->next;
 	}
 	  tail->sym = sym;
+	  tail->where = cur_loc;
 	}
 
 next_item:
@@ -2832,36 +2834,47 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Returns true if clause in list 'list' is compatible with any of
-   of the clauses in lists [0..list-1].  E.g., a reduction variable may
-   appear in both reduction and private clauses, so this function
-   will return true in this case.  */
+/* Check if a variable appears in multiple clauses.  */
 
-static bool
-oacc_compatible_clauses (gfc_omp_clauses *clauses, int list,
-			   gfc_symbol *sym, bool openacc)
+static void
+resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
+			int list)
 {
   gfc_omp_namelist *n;
+  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
 
-  if (!openacc)
-return false;
+  /* OpenACC reduction clauses are compatible with everything.  We only
+ need to check if a reduction variable is used more than once.  */
+  if (openacc && list == OMP_LIST_REDUCTION)
+{
+  hash_set<gfc_symbol *> reductions;
 
-  if (list != OMP_LIST_REDUCTION)
-return false;
+  for (n = clause_list; n; n = n->next)
+	{
+	  if (reductions.contains (n->sym))
+	gfc_error (error_msg, n->sym->name, &n->where);
+	  else
+	reductions.add (n->sym);
+	}
 
-  for (n = clauses->lists[OMP_LIST_FIRST]; n; n = n->next)
-if (n->sym == sym)
-  return true;
+  return;
+}
 
-  return false;
+  /* Ensure that variables are only used in one clause.  */
+  for (n = clause_list; n; n = n->next)
+{
+  if (n->sym->mark)
+	gfc_error (error_msg, n->sym->name, &n->where);
+  else
+	n->sym->mark = 1;
+}
 }
 
 /* OpenMP directive resolving routines.  */
 
 static void
-resolve_omp_clauses (gfc_code *code, locus *where,
-		 gfc_omp_clauses *omp_clauses, gfc_namespace *ns,
-		

Re: [OpenACC 11/11] execution tests

2015-10-22 Thread Cesar Philippidis
On 10/22/2015 08:00 AM, Jakub Jelinek wrote:
> On Thu, Oct 22, 2015 at 07:47:01AM -0700, Cesar Philippidis wrote:
>>> But it is unclear from the parsing what from these is allowed:
>>
>> int v, w;
>> ...
>> gang(26)  // equivalent to gang(num:26)
>> gang(v)   // gang(num:v)
>> vector(length: 16)  // vector(length: 16)
>> vector(length: v)  // vector(length: v)
>> vector(16)  // vector(length: 16)
>> vector(v)   // vector(length: v)
>> worker(num: 16)  // worker(num: 16)
>> worker(num: v)   // worker(num: v)
>> worker(16)  // worker(num: 16)
>> worker(v)   // worker(num: v)
>> gang(16, 24)  // technically gang(num:16, num:24) is acceptable but it
>>   // should be an error
>> gang(v, w)  // likewise
>> gang(static: 16, num: 5)  // gang(static: 16, num: 5)
>> gang(static: v, num: w)   // gang(static: v, num: w)
>> gang(num: 5, static: 4)   // gang(num: 5, static: 4)
>> gang(num: v, static: w)   // gang(num: v, static: w)
>>
>> Also note that the static argument can accept '*'.
>>
>>> and if the length: or num: part is really optional, then
>>> int length, num;
>>> vector(length)
>>> worker(num)
>>> gang(num, static: 6)
>>> gang(static: 5, num)
>>> should be also accepted (or subset thereof?).
>>
>> Interesting question. The spec is unclear. It defines gang, worker and
>> vector as follows in section 2.7 in the OpenACC 2.0a spec:
>>
>>   gang [( gang-arg-list )]
>>   worker [( [num:] int-expr )]
>>   vector [( [length:] int-expr )]
>>
>> where gang-arg is one of:
>>
>>   [num:] int-expr
>>   static: size-expr
>>
>> and gang-arg-list may have at most one num and one static argument,
>> and where size-expr is one of:
>>
>>   *
>>   int-expr
>>
>> So I've interpreted that as a requirement that length and num must be
>> followed by an int-expr, whatever that is.
> 
> My reading of the above is that
> vector(length)
> is equivalent to
> vector(length: length)
> and
> worker(num)
> is equivalent to
> worker(num: num)
> etc.  Basically, neither length nor num is a reserved identifier,
> so you can use them for variable names, and if
> vector(v) is equivalent to vector(length: v), then
> vector(length) should be equivalent to vector(length:length)
> or
> vector(length + 1) should be equivalent to vector(length: length+1)
> static is a keyword that can't start an integral expression, so I guess
> it is fine if you issue an expected : diagnostics after it.

You're correct. I overlooked that 'int length, num' declaration.

> In any case, please add a testcase (both C and C++) which covers all these
> allowed variants (ideally one testcase) and rejected variants (another
> testcase with dg-error).
> 
> This is still an easy case, as even the C FE has 2 tokens lookup.
> E.g. for OpenMP map clause where
> map (always, tofrom: x)
> means one thing and
> map (always, tofrom, y)
> another one (map (tofrom: always, tofrom, y))
> I had to do quite ugly things to get around this.

I'll add more test cases.

Thanks,
Cesar



Re: [gomp4 03/14] nvptx: expand support for address spaces

2015-10-20 Thread Cesar Philippidis
On 10/20/2015 02:13 PM, Bernd Schmidt wrote:
> On 10/20/2015 11:04 PM, Alexander Monakov wrote:
>> On Tue, 20 Oct 2015, Bernd Schmidt wrote:
>>
>>> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
>>>> This allows to emit decls in 'shared' memory from the middle-end.
>>>>
>>>>   * config/nvptx/nvptx.c (nvptx_legitimate_address_p): Adjust
>>>> prototype.
>>>>   (nvptx_section_for_decl): If type of decl has a specific
>>>> address
>>>>   space, return it.
>>>>   (nvptx_addr_space_from_address): Ditto.
>>>>   (TARGET_ADDR_SPACE_POINTER_MODE): Define.
>>>>   (TARGET_ADDR_SPACE_ADDRESS_MODE): Ditto.
>>>>   (TARGET_ADDR_SPACE_SUBSET_P): Ditto.
>>>>   (TARGET_ADDR_SPACE_CONVERT): Ditto.
>>>>   (TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P): Ditto.
>>>
>>> Not a fan of this I'm afraid. I used to have address space support in
>>> the
>>> nvptx backend, but the middle-end was too broken for it to work, so I
>>> made
>>> nvptx deal with all the address space complications internally. Is
>>> there a
>>> reason why this approach can't work for what you want to do? (Also,
>>> where are
>>> you using this?)
>>
>> It is used in patch 06/14, to copy omp_data_o to shared memory.  I
>> don't see
>> any other sane approach.
> 
> There is an alternative - decorate anything you'd like to go to shared
> memory with a special attribute, then handle that attribute in
> nvptx_addr_space_from_address and nvptx_section_for_decl. I actually
> made such a patch for Cesar a while ago, maybe he still has it?
> 
> This would avoid the pitfalls with gcc's middle-end address space
> handling, and the #ifdef ADDR_SPACE_SHARED in patch 6 which is a bit ugly.

Was it this one that you're referring to, Bernd? I think this is the
patch that introduces the "oacc ganglocal" attribute. It has bitrotted
significantly, though.

Regardless, keep in mind that we're abandoning dynamically allocated
shared memory in gcc 6.0. Right now in gomp-4_0-branch the two use cases
for shared memory are spill-and-fill for worker variable broadcasting
and worker reductions.

What are you planning on using shared memory for? It's an extremely
limited resource and it has some quirks.

Cesar
Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c	(revision 224547)
+++ gcc/cgraphunit.c	(working copy)
@@ -2171,6 +2171,23 @@ ipa_passes (void)
   execute_ipa_pass_list (passes->all_small_ipa_passes);
   if (seen_error ())
 	return;
+
+  if (g->have_offload)
+	{
+	  extern void write_offload_lto ();
+	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+	  write_offload_lto ();
+	}
+}
+  bool do_local_opts = !in_lto_p;
+#ifdef ACCEL_COMPILER
+  do_local_opts = true;
+#endif
+  if (do_local_opts)
+{
+  execute_ipa_pass_list (passes->all_local_opt_passes);
+  if (seen_error ())
+	return;
 }
 
   /* This extra symtab_remove_unreachable_nodes pass tends to catch some
@@ -2182,7 +2199,7 @@ ipa_passes (void)
   if (symtab->state < IPA_SSA)
 symtab->state = IPA_SSA;
 
-  if (!in_lto_p)
+  if (do_local_opts)
 {
   /* Generate coverage variables and constructors.  */
   coverage_finish ();
@@ -2285,6 +2302,14 @@ symbol_table::compile (void)
   if (seen_error ())
 return;
 
+#ifdef ACCEL_COMPILER
+  {
+cgraph_node *node;
+FOR_EACH_DEFINED_FUNCTION (node)
+  node->get_untransformed_body ();
+  }
+#endif
+
 #ifdef ENABLE_CHECKING
   symtab_node::verify_symtab_nodes ();
 #endif
Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 224547)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -1171,18 +1171,42 @@ nvptx_section_from_addr_space (addr_spac
 }
 }
 
-/* Determine whether DECL goes into .const or .global.  */
+/* Determine the address space DECL lives in.  */
 
-const char *
-nvptx_section_for_decl (const_tree decl)
+static addr_space_t
+nvptx_addr_space_for_decl (const_tree decl)
 {
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+return ADDR_SPACE_GENERIC;
+
+  if (lookup_attribute ("oacc ganglocal", DECL_ATTRIBUTES (decl)) != NULL_TREE)
+return ADDR_SPACE_SHARED;
+
   bool is_const = (CONSTANT_CLASS_P (decl)
 		   || TREE_CODE (decl) == CONST_DECL
 		   || TREE_READONLY (decl));
   if (is_const)
-return ".const";
+return ADDR_SPACE_CONST;
 
-  return ".global";
+  return ADDR_SPACE_GLOBAL;
+}
+
+/* Return a ptx string representing the address space for a variable DECL.  */
+
+const char *
+nvptx_section_for_decl (const_tree decl)
+{
+  switch (nvptx_addr_space_for_decl (decl))
+{
+case ADDR_SPACE_CONST:
+  return ".const";
+case ADDR_SPACE_SHARED:
+  return ".shared";
+case ADDR_SPACE_GLOBAL:
+  return ".global";
+default:
+  gcc_unreachable ();
+}
 }
 
 /* Look for a SYMBOL_REF in ADDR and return the address space to be used
@@ -1196,17 +1220,7 @@ 

[gomp4] privatize internal array variables introduced by the fortran FE

2015-10-13 Thread Cesar Philippidis
Arrays in fortran have a couple of internal variables associated with
them, e.g. stride, lbound, ubound, size, etc. Depending on how and where
the array was declared, these internal variables may be packed inside an
array descriptor represented by a struct or defined individually. The
major problem with this is that kernels and parallel regions with
default(none) will generate errors if those internal variables are
defined individually since the user has no way to add clauses to them. I
suspect this is also true for arrays inside omp target regions.

My fix for this involves two parts. First, I reinitialize those private
array variables which aren't associated with array descriptors at the
beginning of the parallel/kernels region they are used in. Second, I
added OMP_CLAUSE_PRIVATE for those internal variables.
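
To illustrate why these variables can be recomputed inside the region instead of being mapped, here is a sketch of how stride, offset and size follow from the bounds of a column-major array. The struct and function names are hypothetical; this mirrors the usual Fortran layout arithmetic and is not the code gfc_trans_array_bounds generates.

```c
#define MAX_RANK 7

/* Hypothetical bookkeeping for an array without a descriptor: RANK
   dimensions with inclusive lower and upper bounds.  */
struct array_bounds
{
  int rank;
  long lbound[MAX_RANK];
  long ubound[MAX_RANK];
  long stride[MAX_RANK];        /* Computed.  */
  long offset;                  /* Computed.  */
  long size;                    /* Computed, in elements.  */
};

/* Recompute stride, offset and size from the bounds alone.  */
void
init_array_bounds (struct array_bounds *a)
{
  long stride = 1;
  long offset = 0;

  for (int i = 0; i < a->rank; i++)
    {
      long extent = a->ubound[i] - a->lbound[i] + 1;

      if (extent < 0)
        extent = 0;
      a->stride[i] = stride;
      offset -= a->lbound[i] * stride;
      stride *= extent;
    }
  a->offset = offset;
  a->size = stride;
}

/* Linear position of element (IDX[0], ..., IDX[rank-1]).  */
long
element_index (const struct array_bounds *a, const long *idx)
{
  long pos = a->offset;

  for (int i = 0; i < a->rank; i++)
    pos += idx[i] * a->stride[i];
  return pos;
}
```

Since everything here derives from the bounds, a private copy initialized at the start of the region carries the same values as the host copy.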

I'll apply this patch to gomp-4_0-branch shortly.

Is there any reason why only certain arrays have array descriptors? The
arrays with descriptors don't have this problem. It's only the ones
without descriptors that leak new internal variables that cause errors
with default(none).

Cesar
2015-10-13  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA argument
	to control whether VLAs should be initialized.  Don't mark this
	function as static.
	(gfc_trans_auto_array_allocation): Update call to
	gfc_trans_array_bounds.
	(gfc_trans_g77_array): Likewise.
	* trans-array.h: Declare gfc_trans_array_bounds.
	* trans-openmp.c (gfc_scan_nodesc_arrays): New function.
	(gfc_privatize_nodesc_arrays_1): New function.
	(gfc_privatize_nodesc_arrays): New function.
	(gfc_init_nodesc_arrays): New function.
	(gfc_trans_oacc_construct): Initialize any internal variables for
	arrays without array descriptors inside the offloaded parallel and
	kernels region.
	(gfc_trans_oacc_combined_directive): Likewise.

	gcc/testsuite/
	* gfortran.dg/goacc/default_none.f95: New test.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a6b761b..86f983a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -5709,9 +5709,9 @@ gfc_trans_array_cobounds (tree type, stmtblock_t * pblock,
 /* Generate code to evaluate non-constant array bounds.  Sets *poffset and
returns the size (in elements) of the array.  */
 
-static tree
+tree
 gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset,
-stmtblock_t * pblock)
+stmtblock_t * pblock, bool init_vla)
 {
   gfc_array_spec *as;
   tree size;
@@ -5788,7 +5788,9 @@ gfc_trans_array_bounds (tree type, gfc_symbol * sym, tree * poffset,
 }
 
   gfc_trans_array_cobounds (type, pblock, sym);
-  gfc_trans_vla_type_sizes (sym, pblock);
+
+  if (init_vla)
+gfc_trans_vla_type_sizes (sym, pblock);
 
   *poffset = offset;
   return size;
@@ -5852,7 +5854,7 @@ gfc_trans_auto_array_allocation (tree decl, gfc_symbol * sym,
   && !INTEGER_CST_P (sym->ts.u.cl->backend_decl))
 gfc_conv_string_length (sym->ts.u.cl, NULL, &init);
 
-  size = gfc_trans_array_bounds (type, sym, &offset, &init);
+  size = gfc_trans_array_bounds (type, sym, &offset, &init, true);
 
   /* Don't actually allocate space for Cray Pointees.  */
   if (sym->attr.cray_pointee)
@@ -5947,7 +5949,7 @@ gfc_trans_g77_array (gfc_symbol * sym, gfc_wrapped_block * block)
 gfc_conv_string_length (sym->ts.u.cl, NULL, &init);
 
   /* Evaluate the bounds of the array.  */
-  gfc_trans_array_bounds (type, sym, &offset, &init);
+  gfc_trans_array_bounds (type, sym, &offset, &init, true);
 
   /* Set the offset.  */
   if (TREE_CODE (GFC_TYPE_ARRAY_OFFSET (type)) == VAR_DECL)
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index 52f1c9a..8dbafb9 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -44,6 +44,8 @@ void gfc_trans_g77_array (gfc_symbol *, gfc_wrapped_block *);
 /* Generate code to deallocate an array, if it is allocated.  */
 tree gfc_trans_dealloc_allocated (tree, bool, gfc_expr *);
 
+tree gfc_trans_array_bounds (tree, gfc_symbol *, tree *, stmtblock_t *, bool);
+
 tree gfc_full_array_size (stmtblock_t *, tree, int);
 
 tree gfc_duplicate_allocatable (tree, tree, tree, int, tree);
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 8c1e897..f2e9803 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -39,6 +39,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "arith.h"
 #include "omp-low.h"
 #include "gomp-constants.h"
+#include "hash-set.h"
+#include "tree-iterator.h"
 
 int ompws_flags;
 
@@ -2716,22 +2718,157 @@ gfc_trans_omp_code (gfc_code *code, bool force_empty)
   return stmt;
 }
 
+void gfc_debug_expr (gfc_expr *);
+
+/* Add any array that does not have an array descriptor to the hash_set
+   pointed to by DATA.  */
+
+static int
+gfc_scan_nodesc_arrays (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
+		void *data)
+{
+ 

Re: [gomp4] privatize internal array variables introduced by the fortran FE

2015-10-13 Thread Cesar Philippidis
On 10/13/2015 01:29 PM, Jakub Jelinek wrote:
> On Tue, Oct 13, 2015 at 01:12:25PM -0700, Cesar Philippidis wrote:
>> Arrays in fortran have a couple of internal variables associated with
>> them, e.g. stride, lbound, ubound, size, etc. Depending on how and where
>> the array was declared, these internal variables may be packed inside an
>> array descriptor represented by a struct or defined individually. The
>> major problem with this is that kernels and parallel regions with
>> default(none) will generate errors if those internal variables are
>> defined individually since the user has no way to add clauses to them. I
>> suspect this is also true for arrays inside omp target regions.
> 
> I believe gfc_omp_predetermined_sharing is supposed to handle this,
> returning predetermined shared for certain DECL_ARTIFICIAL decls.
> If you are not using that hook, perhaps you should have similar one tuned
> for OpenACC purposes?

We do have one for openacc. I thought its job was to mark variables as
firstprivate or pcopy as necessary. Anyway, it might be too late to call
gfc_omp_predetermined_sharing from the gimplifier from a performance
standpoint. Consider something like this:

  !$acc data copy (array)
  do i = 1,n
!$acc parallel loop
 do j = 1,n
   ...array...
 end do
  end do
  !$acc end data

The problem here is that all of those internal variables would end up
getting marked as firstprivate. And that would cause more data to be
transferred to the accelerator. This patch reinitializes those variables
on the accelerator so they don't have to be transferred at all.

Cesar


[gomp4] handle missing OMP_LIST_ clauses in fortran's parse tree debugger

2015-10-01 Thread Cesar Philippidis
While debugging gfortran with -fdump-fortran-*, I noticed that a couple
of OMP_LIST_ entries weren't being handled by show_omp_clauses, so I've
added them. I also took the opportunity to rearrange the cases in the
switch statement that handles those lists so that they match the order
of the enum in gfortran.h, because I couldn't figure out how things
were ordered before.
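
As an aside, and not what this patch does, a table indexed by the enum keeps the names and the enum in step without relying on the order of switch cases. A minimal sketch with a hypothetical, shortened enum; none of these identifiers exist in gfortran:

```c
#include <stddef.h>

/* Hypothetical subset of the list kinds, in declaration order.  */
enum omp_list
{
  LIST_PRIVATE,
  LIST_FIRSTPRIVATE,
  LIST_SHARED,
  LIST_REDUCTION,
  LIST_CACHE,
  LIST_NUM
};

/* Designated initializers tie each name to its enumerator, so the
   order of the entries in this table does not matter.  */
static const char *const list_names[LIST_NUM] =
{
  [LIST_PRIVATE] = "PRIVATE",
  [LIST_FIRSTPRIVATE] = "FIRSTPRIVATE",
  [LIST_SHARED] = "SHARED",
  [LIST_REDUCTION] = "REDUCTION",
  [LIST_CACHE] = "CACHE",
};

/* Return the printable name for LIST, or NULL if LIST is out of range
   or has no entry in the table.  */
const char *
omp_list_name (int list)
{
  if (list < 0 || list >= LIST_NUM)
    return NULL;
  return list_names[list];
}
```

An enumerator that is added without a table entry yields NULL here, which a caller can turn into an assertion failure.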

I've applied this patch to gomp-4_0-branch.

Cesar
2015-10-01  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Add missing omp list_types
	and reorder the switch cases to match the enum in gfortran.h.
	

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 48476af..3e5ac17 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1251,19 +1251,24 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	const char *type = NULL;
 	switch (list_type)
 	  {
-	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
-	  case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
-	  case OMP_LIST_CACHE: type = ""; break;
 	  case OMP_LIST_PRIVATE: type = "PRIVATE"; break;
 	  case OMP_LIST_FIRSTPRIVATE: type = "FIRSTPRIVATE"; break;
 	  case OMP_LIST_LASTPRIVATE: type = "LASTPRIVATE"; break;
-	  case OMP_LIST_SHARED: type = "SHARED"; break;
+	  case OMP_LIST_COPYPRIVATE: type = "COPYPRIVATE"; break;
+	  case OMP_LIST_SHARED: type = "SHARED"; break;
 	  case OMP_LIST_COPYIN: type = "COPYIN"; break;
 	  case OMP_LIST_UNIFORM: type = "UNIFORM"; break;
 	  case OMP_LIST_ALIGNED: type = "ALIGNED"; break;
 	  case OMP_LIST_LINEAR: type = "LINEAR"; break;
-	  case OMP_LIST_REDUCTION: type = "REDUCTION"; break;
 	  case OMP_LIST_DEPEND: type = "DEPEND"; break;
+	  case OMP_LIST_MAP: type = "MAP"; break;
+	  case OMP_LIST_TO: type = "TO"; break;
+	  case OMP_LIST_FROM: type = "FROM"; break;
+	  case OMP_LIST_REDUCTION: type = "REDUCTION"; break;
+	  case OMP_LIST_DEVICE_RESIDENT: type = "DEVICE_RESIDENT"; break;
+	  case OMP_LIST_LINK: type = "LINK"; break;
+	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
+	  case OMP_LIST_CACHE: type = "CACHE"; break;
 	  default:
 	gcc_unreachable ();
 	  }


[gomp4] tile clause asterisk argument

2015-09-30 Thread Cesar Philippidis
This patch fixes a fortran ICE when a tile clause contains an asterisk.
The problem was that the asterisk argument is represented by a NULL
expression. That caused problems when the code is translated into
gimple. The fix is to convert those NULL expressions into -1
expressions late, since that is what the c and c++ front ends do.
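
The conversion amounts to replacing a missing operand with a sentinel before later stages look at it. A minimal sketch with hypothetical names, using pointers to int in place of gfc_expr. Unlike the front end, which diagnoses non-positive sizes through resolve_oacc_positive_int_expr, this sketch simply returns 0 for them.

```c
#include <stddef.h>

/* Sentinel standing for a '*' tile argument.  */
#define TILE_ASTERISK (-1)

/* OPERANDS[i] points at the tile size for loop I, or is NULL for '*'.
   Store the lowered sizes in OUT and return 1, or return 0 if an
   explicit size is not positive.  */
int
lower_tile_operands (const int *const *operands, int n, int *out)
{
  for (int i = 0; i < n; i++)
    {
      if (operands[i] == NULL)
        out[i] = TILE_ASTERISK;
      else if (*operands[i] > 0)
        out[i] = *operands[i];
      else
        return 0;
    }
  return 1;
}
```

For tile(*, 1) this produces the sizes -1 and 1, so later stages never see a NULL operand.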

It looks like there is a lot of existing test coverage for the tile
clause. However, this ICE isn't triggered if there are parser errors.
The new test does contain some deliberate errors, but I included them to
test for invalid nesting which gets triggered in omplow.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-09-30  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Represent asterisk tile
	arguments as -1.

	gcc/testsuite/
	* gfortran.dg/goacc/loop-5.f95: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 0bdbb73..c42a2c2 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -4891,10 +4891,21 @@ resolve_oacc_loop_blocks (gfc_code *code)
 	{
 	  num++;
 	  if (el->expr == NULL)
-	continue;
-	  resolve_oacc_positive_int_expr (el->expr, "TILE");
-	  if (el->expr->expr_type != EXPR_CONSTANT)
-	gfc_error ("TILE requires constant expression at %L", &code->loc);
+	{
+	  /* NULL expressions are used to represent '*' arguments.
+		 Convert those to -1 expressions.  */
+	  el->expr = gfc_get_constant_expr (BT_INTEGER,
+		gfc_default_integer_kind,
+		&code->loc);
+	  mpz_set_si (el->expr->value.integer, -1);
+	}
+	  else
+	{
+	  resolve_oacc_positive_int_expr (el->expr, "TILE");
+	  if (el->expr->expr_type != EXPR_CONSTANT)
+		gfc_error ("TILE requires constant expression at %L",
+			   &code->loc);
+	}
 	}
   resolve_oacc_nested_loops (code, code->block->next, num, "tiled");
 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-5.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
new file mode 100644
index 000..c2db090
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -0,0 +1,429 @@
+! { dg-do compile }
+! { dg-additional-options "-fmax-errors=100" }
+
+! TODO: nested kernels are allowed in 2.0
+
+program test
+  implicit none
+  integer :: i, j
+
+  !$acc kernels
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker(5)
+DO i = 1,10
+ENDDO
+!$acc loop worker(num:5)
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop gang worker
+DO i = 1,10
+ENDDO
+
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop vector(5)
+DO i = 1,10
+ENDDO
+!$acc loop vector(length:5)
+DO i = 1,10
+ENDDO
+!$acc loop vector
+DO i = 1,10
+ENDDO
+!$acc loop gang vector
+DO i = 1,10
+ENDDO
+!$acc loop worker vector
+DO i = 1,10
+ENDDO
+
+!$acc loop auto
+DO i = 1,10
+ENDDO
+
+!$acc loop tile(1)
+DO i = 1,10
+ENDDO
+!$acc loop tile(2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6-2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(6+2)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop tile(*, 1)
+DO i = 1,10
+  DO j = 1,10
+  ENDDO
+ENDDO
+!$acc loop tile(-1) ! { dg-warning "must be positive" }
+do i = 1,10
+enddo
+!$acc loop vector tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector gang tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop vector worker tile(*)
+DO i = 1,10
+ENDDO
+!$acc loop gang worker tile(*)
+DO i = 1,10
+ENDDO
+  !$acc end kernels
+
+
+  !$acc parallel
+!$acc loop auto
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:5)
+DO i = 1,10
+ENDDO
+!$acc loop gang(static:*)
+DO i = 1,10
+ENDDO
+!$acc loop gang
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+  !$acc loop worker
+  DO j = 1,10
+  ENDDO
+ENDDO
+
+!$acc loop worker
+DO i = 1,10
+ENDDO
+!$acc loop worker
+DO i = 1,10
+  !$acc loop vector
+  DO j = 1,10
+  ENDDO
+ENDDO
+  

Re: OpenACC subarray data alignment in fortran

2015-09-29 Thread Cesar Philippidis
Ping.

In the meantime, I'll apply this patch to gomp-4_0-branch.

Cesar

On 09/22/2015 08:24 AM, Cesar Philippidis wrote:
> In both OpenACC and OpenMP, each subarray has at least two data mappings
> associated with it, one for the pointer and another for the data in
> the array section (Fortran also has a pset mapping). One problem I
> observed in Fortran is that array section data is cast to char *.
> Consequently, when lower_omp_target assigns alignment for the subarray
> data, it does so incorrectly. This is a problem on nvptx if you have a
> data clause such as
> 
>   integer foo
>   real*8 bar (100)
> 
>   !$acc data copy (foo, bar(1:100))
> 
> Here, the data associated with bar could get aligned on a 4 byte
> boundary instead of 8 byte. That causes problems on nvptx targets.
> 
> My fix for this is to prevent the fortran front end from casting the
> data pointers to char *. I only prevented casting on the code which
> handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
> also get cast to char *, but I left those as-is because I'm not that
> familiar with how non-OpenMP target regions get lowered.
> 
> Is this patch OK for trunk?
> 
> Thanks,
> Cesar
> 
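
As a generic illustration of the alignment issue described in the quoted message, and not a description of what lower_omp_target does, consider packing a 4-byte item followed by a block whose alignment is taken from its pointee type. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Round ADDR up to the next multiple of ALIGN (a power of two).  */
uintptr_t
align_up (uintptr_t addr, size_t align)
{
  return (addr + align - 1) & ~(uintptr_t) (align - 1);
}

/* Pack a 4-byte item at BASE followed by a block that needs
   ALIGN-byte alignment; return the offset at which the block starts.  */
size_t
block_offset (size_t base, size_t align)
{
  return (size_t) align_up (base + 4, align);
}
```

If the block is seen through a char * its alignment is 1, and block_offset (0, 1) places it at offset 4, which is not a multiple of 8. With the real*8 alignment of 8, block_offset (0, 8) places it at offset 8.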



Re: [gomp4] error on acc loops not associated with offloaded acc regions

2015-09-29 Thread Cesar Philippidis
On 09/29/2015 02:48 AM, Thomas Schwinge wrote:

> On Mon, 28 Sep 2015 10:08:34 -0700, Cesar Philippidis 
> <ce...@codesourcery.com> wrote:
>> I've applied this patch to gomp-4_0-branch which teaches omplower how to
>> error when it detects acc loops which aren't nested inside an acc
>> parallel or kernels region or located within a function marked as an acc
>> routine. A couple of test cases needed to be updated.
>>
>> The error message is kind of long. Let me know if it should be revised.
> 
>>  gcc/testsuite/
>>  * c-c++-common/goacc/non-routine.c: New test.
>>  * c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
>>  nesting.
>>  * c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
>>  * c-c++-common/goacc/clauses-fail.c: Likewise.
>>  * c-c++-common/goacc/sb-1.c: Likewise.
>>  * c-c++-common/goacc/sb-3.c: Likewise.
>>  * gcc.dg/goacc/sb-1.c: Likewise.
>>  * gcc.dg/goacc/sb-3.c: Likewise.
> 
> What about any Fortran test cases?

My first thought was that we didn't need one because this is generic
error handling in omplow, and there are already a lot of C test cases
exercising it. However a fortran test can't hurt, so I added one in this
new patch. Note that I had to create a new test instead of hijacking an
existing test, because the fortran front end bails out when it detects
errors before it hands anything over to omplow. And the existing tests
had a bunch of expected front end errors.

>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, 
>> omp_context *ctx)
>>  }
>>return true;
>>  }
>> +  if (is_gimple_omp_oacc (stmt) && ctx == NULL
>> +  && get_oacc_fn_attrib (current_function_decl) == NULL)
>> +{
>> +  error_at (gimple_location (stmt),
>> +"acc loops must be associated with an acc region or "
>> +"routine");
>> +  return false;
>> +}
>>/* FALLTHRU */
>>  case GIMPLE_CALL:
>>if (is_gimple_call (stmt)
> 
> I see that the error reporting doesn't really use a consistent style
> currently, but what about something like "loop directive must be
> associated with compute region" (where "compute region" is the language
> used by OpenACC 2.0a to mean the structured block associated with a
> compute construct as well as routine directive)?

That sounds reasonable, but it's not much shorter.

>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
>> @@ -20,6 +20,7 @@ f_acc_kernels (void)
>>}
>>  }
>>  
>> +#pragma acc routine
>>  void
>>  f_acc_loop (void)
>>  {
> 
> OK, but...
> 
>> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
>> @@ -361,72 +361,72 @@ f_acc_data (void)
>>  void
>>  f_acc_loop (void)
>>  {
>> -#pragma acc loop
>> +#pragma acc loop /* { dg-error "acc loops must be associated with an acc 
>> region or routine" } */
>>for (i = 0; i < 2; ++i)
>>  {
>> -#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC 
>> region" } */
>> +#pragma omp parallel
>>;
>>  }
> 
> ... here you're changing what this is meant to be testing, so please
> restore the original meaning (by adding "#pragma acc routine" to this
> function, I suppose), and then perhaps add whichever additional test
> cases you deem necessary.

I was wondering about that too. After thinking about it some more, I did
as you suggested -- reverted those changes and used a routine pragma.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/non-routine.c
>> @@ -0,0 +1,16 @@
>> +/* This program validates the behavior of acc loops which are
>> +   not associated with a parallel or kernels region or routine.  */
> 
> :-) Thanks for adding such a comment -- this is missing in too many test
> cases.

We definitely need more of them. I'm now starting to forget what I was
trying to test several months ago.

I'll apply this patch to gomp4.

Cesar

2015-09-29  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Update the error
	message for loops not affiliated with acc compute regions.

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-fail-1.c (f_omp): Revert changes and
	mark the function as an acc routine.
	* c-c++-common/goacc/clauses-fail.c: Likewise.

[gomp4] error on acc loops not associated with offloaded acc regions

2015-09-28 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which teaches omplower how to
error when it detects acc loops which aren't nested inside an acc
parallel or kernels region or located within a function marked as an acc
routine. A couple of test cases needed to be updated.
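
For anyone skimming the thread, the condition the new check enforces can be
modelled in a few lines of plain C. This is only a sketch of the logic, not
the omp-low.c code: the helper name and its two boolean parameters are
hypothetical stand-ins for "ctx == NULL" and
"get_oacc_fn_attrib (current_function_decl) == NULL".

```c
#include <stdbool.h>

/* Toy model of the new nesting check.  Returns true when an acc loop
   should be rejected.  HAS_ENCLOSING_CTX stands in for "ctx != NULL"
   (the loop sits inside an acc parallel or kernels region) and
   FN_IS_ACC_ROUTINE stands in for the function carrying the acc
   routine attribute.  Both names are hypothetical.  */
static bool
acc_loop_is_orphaned (bool has_enclosing_ctx, bool fn_is_acc_routine)
{
  return !has_enclosing_ctx && !fn_is_acc_routine;
}
```

So a bare "#pragma acc loop" in an ordinary function is the only case that
errors; wrapping it in a compute region or marking the function with
"#pragma acc routine" makes it acceptable again.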

The error message is kind of long. Let me know if it should be revised.

Cesar
2015-09-28  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (check_omp_nesting_restrictions): Check for acc loops not
	associated with acc regions or routines.

	gcc/testsuite/
	* c-c++-common/goacc/non-routine.c: New test.
	* c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
	nesting.
	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-common/goacc/sb-1.c: Likewise.
	* c-c++-common/goacc/sb-3.c: Likewise.
	* gcc.dg/goacc/sb-1.c: Likewise.
	* gcc.dg/goacc/sb-3.c: Likewise.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 99b3939..2329a71 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	}
 	  return true;
 	}
+  if (is_gimple_omp_oacc (stmt) && ctx == NULL
+	  && get_oacc_fn_attrib (current_function_decl) == NULL)
+	{
+	  error_at (gimple_location (stmt),
+		"acc loops must be associated with an acc region or "
+		"routine");
+	  return false;
+	}
   /* FALLTHRU */
 case GIMPLE_CALL:
   if (is_gimple_call (stmt)
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index b38e181..75d6a1d 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -20,6 +20,7 @@ f_acc_kernels (void)
   }
 }
 
+#pragma acc routine
 void
 f_acc_loop (void)
 {
diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 14c6aa6..6d91484 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -361,72 +361,72 @@ f_acc_data (void)
 void
 f_acc_loop (void)
 {
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp parallel
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp for
   for (i = 0; i < 3; i++)
 	;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp sections
   {
 	;
   }
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp single
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp task
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp master
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp critical
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp ordered
   ;
 }
 
-#pragma acc loop
+#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
   for (i = 0; i < 2; ++i)
 {
-#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target
   ;
-#pragma omp target data /* { dg-error "non-

Re: [gomp4] Another oacc reduction simplification

2015-09-25 Thread Cesar Philippidis
On 09/25/2015 03:57 AM, Nathan Sidwell wrote:
> On 09/24/15 16:32, Cesar Philippidis wrote:
>> On 09/22/2015 08:29 AM, Nathan Sidwell wrote:
>>
>>> 1) Don't have a fake gang reduction outside of worker & vector loops.
>>> Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
>>> null pointer for vector and worker loops.
>>
>> What happens when there is no receiver object, e.g. a reduction inside a
>> routine? Specifically, inside lower_oacc_reductions, you're doing this:
>>
>> /* This is the outermost construct with this reduction,
>>see if there's a mapping for it.  */
>> if (maybe_lookup_field (orig, outer))
>>   ref_to_res = build_receiver_ref (orig, false, outer);
>>
>> That's going to ICE inside a routine.
> 
> Is it?  the 'maybe_lookup' should protect against that.  do you have a
> testcase?

See gcc/testsuite/c-c++-common/goacc/routine-7.c.

Cesar


Re: [gomp4] Another oacc reduction simplification

2015-09-24 Thread Cesar Philippidis
On 09/22/2015 08:29 AM, Nathan Sidwell wrote:

> 1) Don't have a fake gang reduction outside of worker & vector loops. 
> Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
> null pointer for vector and worker loops.

What happens when there is no receiver object, e.g. a reduction inside a
routine? Specifically, inside lower_oacc_reductions, you're doing this:

/* This is the outermost construct with this reduction,
   see if there's a mapping for it.  */
if (maybe_lookup_field (orig, outer))
  ref_to_res = build_receiver_ref (orig, false, outer);

That's going to ICE inside a routine.

> 2) Create a local private instance for all cases of reference var
> reductions, not just those in vector & worker loops

Good. I was about to make a similar change to fix a gang reduction bug.

Cesar



[gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
Gang, worker, vector and collapse all contain optional arguments which
may be used during loop expansion. In OpenACC, those expressions could
contain variables, but those variables aren't always getting remapped
automatically. This patch remaps those variables inside lower_omp_for.

Note that I didn't need to use a tree walker for more complicated
expressions because it's not required. By the time those clauses reach
lower_omp_for, only the result of the expression is available. So the
other variables in those expressions get remapped with everything else
during omplow. Therefore, the only problematic case is when the
optional expression is just a decl, e.g. gang(static:foo).
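
To make the shape of the fix easier to picture, here is a self-contained toy
model in C. It is a sketch under assumptions, not the GCC implementation: the
enum, the structs and both helper names are hypothetical, and "remapping" is
reduced to setting a flag where the real code would substitute an outer
variable reference.

```c
#include <stddef.h>

/* Hypothetical clause kinds; they only mirror the names in the patch.  */
enum clause_code
{
  CLAUSE_GANG, CLAUSE_WORKER, CLAUSE_VECTOR, CLAUSE_COLLAPSE, CLAUSE_OTHER
};

/* A toy operand: either a bare variable (is_decl != 0) or the
   already-evaluated result of a larger expression.  */
struct operand
{
  int is_decl;
  int remapped;
};

struct clause
{
  enum clause_code code;
  struct operand *ops[2];
};

/* Number of optional operands each clause kind carries in this model.  */
static int
clause_num_args (enum clause_code code)
{
  switch (code)
    {
    case CLAUSE_GANG:
      return 2;
    case CLAUSE_WORKER:
    case CLAUSE_VECTOR:
    case CLAUSE_COLLAPSE:
      return 1;
    default:
      return 0;
    }
}

/* Remap only the operands that are bare variables.  Returns how many
   operands were touched.  */
static int
remap_clause_decls (struct clause *c)
{
  int count = 0;

  for (int i = 0; i < clause_num_args (c->code); i++)
    {
      struct operand *op = c->ops[i];
      if (op != NULL && op->is_decl)
	{
	  op->remapped = 1;
	  count++;
	}
    }

  return count;
}
```

In this model a clause like gang(static:foo) has a bare-variable operand and
gets remapped, whereas gang(static:0+foo) arrives as an evaluated temporary
and is left alone.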

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 10:42 AM, Cesar Philippidis wrote:

> I've applied this patch to gomp-4_0-branch.

This patch, that is.

Cesar

2015-09-23  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (lower_omp_for): Remap any variables present in
	OMP_CLAUSE_GANG, OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR and
	OMP_CLAUSE_COLLAPSE because they will be used later by expand_omp_for.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Test if
	static gang expressions containing variables work.
	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ec76096..3f36b7a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11325,6 +11325,35 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (oacc_tail)
 gimple_seq_add_seq (, oacc_tail);
 
+  /* Update the variables inside any clauses which may be involved in loop
+ expansion later on.  */
+  for (tree c = gimple_omp_for_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+{
+  int args;
+
+  switch (OMP_CLAUSE_CODE (c))
+	{
+	default:
+	  args = 0;
+	  break;
+	case OMP_CLAUSE_GANG:
+	  args = 2;
+	  break;
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_COLLAPSE:
+	  args = 1;
+	  break;
+	}
+
+  for (int i = 0; i < args; i++)
+	{
+	  tree expr = OMP_CLAUSE_OPERAND (c, i);
+	  if (expr && DECL_P (expr))
+	OMP_CLAUSE_OPERAND (c, i) = build_outer_var_ref (expr, ctx);
+	}
+}
+
   pop_gimplify_context (new_stmt);
 
   gimple_bind_append_vars (new_stmt, ctx->block_vars);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
index 3a9a508..20a866d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -39,7 +39,7 @@ int
 main ()
 {
   int a[N];
-  int i;
+  int i, x;
 
 #pragma acc parallel loop gang (static:*) num_gangs (10)
   for (i = 0; i < 100; i++)
@@ -78,5 +78,21 @@ main ()
 
   test_nonstatic (a, 10);
 
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
   return 0;
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
index e562535..7d56060 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -3,6 +3,7 @@
 program main
   integer, parameter :: n = 100
   integer i, a(n), b(n)
+  integer x
 
   do i = 1, n
  b(i) = i
@@ -48,6 +49,23 @@ program main
 
   call test (a, b, 20, n)
 
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+ a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
 end program main
 
 subroutine test (a, b, sarg, n)


Re: [gomp4] remap variables inside gang, worker, vector and collapse clauses

2015-09-23 Thread Cesar Philippidis
On 09/23/2015 11:26 AM, Thomas Schwinge wrote:
> On Wed, 23 Sep 2015 10:57:40 -0700, Cesar Philippidis 
> <ce...@codesourcery.com> wrote:
>> On 09/23/2015 10:42 AM, Cesar Philippidis wrote:
>> | Gang, worker, vector and collapse all contain optional arguments which
>> | may be used during loop expansion. In OpenACC, those expressions could
>> | contain variables
> 
> I'm fairly sure that at least the collapse clause needs to be a
> compile-time constant?

Thanks, you're correct. I was looking at a user application and not the
spec when I made this change. I've applied this patch to fix that.

>> | but those variables aren't always getting remapped
>> | automatically. This patch remaps those variables inside lower_omp_for.
> 
> Shouldn't that be done in lower_rec_input_clauses?  (Maybe I'm confused
> -- it's been a long time that I looked at this code.)  (Jakub?)

I thought that lower_rec_input_clauses was for omp reductions and
firstprivate initialization? Variables ultimately get remapped when
omplower eventually calls gimple_regimplify_operands. That function uses
the value-expr for remapping.

In this case, since lower_omp_for is responsible for GIMPLE_OMP_FOR
stmts, gimple_regimplify_operands doesn't get called on the clauses.

Cesar
2015-09-23  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (lower_omp_for): Don't remap OMP_CLAUSE_COLLAPSE
	because it is always a constant value.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fa6b8a5..753996b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11341,7 +11341,6 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  break;
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_WORKER:
-	case OMP_CLAUSE_COLLAPSE:
 	  args = 1;
 	  break;
 	}


[gomp4] implicit data mappings of dummy arguments

2015-09-22 Thread Cesar Philippidis
Currently, the gimplifier creates implicit firstprivate
mappings for pointer variables. That's fine except when the pointer
points to a dummy argument, in which case the gimplifier should check
the type of the value being pointed to before deciding on the type of
implicit mapping. This patch teaches the gimplifier to do that. This
corrects a bug where a dummy array gets implicitly transferred as
firstprivate instead of pcopy.
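
A rough way to picture the decision, as a standalone C toy: strip one level
of pointer or reference, then look at what is underneath. Everything here is
hypothetical (the type model, the enum names, the helper), and it assumes the
rule is simply "aggregates get pcopy, everything else gets firstprivate",
which is a simplification of what oacc_default_clause really does.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified type model.  */
enum toy_kind { TOY_SCALAR, TOY_AGGREGATE, TOY_POINTER, TOY_REFERENCE };

struct toy_type
{
  enum toy_kind kind;
  const struct toy_type *target;	/* Pointed-to type, or NULL.  */
};

enum toy_mapping { MAP_FIRSTPRIVATE, MAP_PCOPY };

/* Pick an implicit mapping: look through one level of pointer or
   reference before testing for an aggregate.  */
static enum toy_mapping
implicit_mapping (const struct toy_type *type)
{
  if ((type->kind == TOY_REFERENCE || type->kind == TOY_POINTER)
      && type->target != NULL)
    type = type->target;

  return type->kind == TOY_AGGREGATE ? MAP_PCOPY : MAP_FIRSTPRIVATE;
}
```

With that rule a pointer to a dummy array maps as pcopy, while a pointer to a
scalar still ends up firstprivate.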

This patch has been committed to gomp-4_0-branch.

Cesar
2015-09-22  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* gimplify.c (oacc_default_clause): Inspect pointer types when
	determining implicit data mappings.

	libgomp/
	* testsuite/libgomp.oacc-fortran/dummy-array.f90: New test.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90: New test.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 914570b..6dc7df7 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5948,7 +5948,8 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
 	  {
 	tree type = TREE_TYPE (decl);
 
-	if (TREE_CODE (type) == REFERENCE_TYPE)
+	if (TREE_CODE (type) == REFERENCE_TYPE
+		|| POINTER_TYPE_P (type))
 	  type = TREE_TYPE (type);
 	
 	if (AGGREGATE_TYPE_P (type))
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90 b/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90
new file mode 100644
index 000..e95563c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/dummy-array.f90
@@ -0,0 +1,28 @@
+! Ensure that dummy arrays are transferred to the accelerator
+! via an implicit pcopy.
+
+! { dg-do run } 
+
+program main
+  integer, parameter :: n = 1000
+  integer :: a(n)
+  integer :: i
+
+  a(:) = -1
+
+  call dummy_array (a, n)
+  
+  do i = 1, n
+ if (a(i) .ne. i) call abort
+  end do
+end program main
+
+subroutine dummy_array (a, n)
+  integer a(n)
+
+  !$acc parallel loop num_gangs (100) gang
+  do i = 1, n
+ a(i) = i
+  end do
+  !$acc end parallel loop
+end subroutine
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90 b/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90
new file mode 100644
index 000..a684d07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/reference-reductions.f90
@@ -0,0 +1,38 @@
+! Test reductions on dummy arguments inside modules.
+
+! { dg-do run }
+
+module prm
+  implicit none
+
+contains
+
+subroutine param_reduction(var)
+  implicit none
+  integer(kind=8) :: var
+  integer  :: j,k
+
+!$acc parallel copy(var)
+!$acc loop reduction(+ : var) gang
+ do k=1,10
+!$acc loop vector reduction(+ : var)
+do j=1,100
+ var = var + 1.0
+enddo
+ enddo
+!$acc end parallel
+end subroutine param_reduction
+
+end module prm
+
+program test
+  use prm
+  implicit none
+
+  integer(8) :: r
+
+  r=10.0
+  call param_reduction (r)
+
+  if (r .ne. 1010) call abort ()
+end program test


OpenACC subarray data alignment in fortran

2015-09-22 Thread Cesar Philippidis
In both OpenACC and OpenMP, each subarray has at least two data mappings
associated with it, one for the pointer and another for the data in
the array section (Fortran also has a pset mapping). One problem I
observed in Fortran is that array section data is cast to char *.
Consequently, when lower_omp_target assigns alignment for the subarray
data, it does so incorrectly. This is a problem on nvptx if you have a
data clause such as

  integer foo
  real*8 bar (100)

  !$acc data copy (foo, bar(1:100))

Here, the data associated with bar could get aligned on a 4 byte
boundary instead of 8 byte. That causes problems on nvptx targets.
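
The arithmetic behind this can be sketched in plain C. The assumption here
(mine, for illustration) is that mapped objects are packed back to back and
each object's offset is rounded up to the alignment derived from its type;
align_up is a hypothetical helper, not a libgomp function.

```c
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN.  ALIGN must be a
   power of two.  */
static size_t
align_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}
```

If foo occupies bytes 0..3 and the data for bar is seen as char (alignment
1), align_up (4, 1) leaves bar at offset 4. Seen as real*8 (alignment 8),
align_up (4, 8) moves it to offset 8, which is what nvptx needs.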

My fix for this is to prevent the Fortran front end from casting the
data pointers to char *. I only prevented casting in the code which
handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
also get cast to char *, but I left those as-is because I'm not that
familiar with how non-OpenMP target regions get lowered.

Is this patch OK for trunk?

Thanks,
Cesar
2015-09-22  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* fortran/trans-openmp.c (gfc_omp_finish_clause): Don't cast ptr
	into a character pointer.
	(gfc_trans_omp_clauses_1): Likewise.

	libgomp/
	* testsuite/libgomp.oacc-fortran/data-alignment.f90: New test.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index cd76f2a..8c1e897 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1065,7 +1065,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
   gfc_start_block ();
   tree type = TREE_TYPE (decl);
   tree ptr = gfc_conv_descriptor_data_get (decl);
-  ptr = fold_convert (build_pointer_type (char_type_node), ptr);
   ptr = build_fold_indirect_ref (ptr);
   OMP_CLAUSE_DECL (c) = ptr;
   c2 = build_omp_clause (input_location, OMP_CLAUSE_MAP);
@@ -1972,8 +1971,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 		{
 		  tree type = TREE_TYPE (decl);
 		  tree ptr = gfc_conv_descriptor_data_get (decl);
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-	  ptr);
 		  ptr = build_fold_indirect_ref (ptr);
 		  OMP_CLAUSE_DECL (node) = ptr;
 		  node2 = build_omp_clause (input_location,
@@ -2066,8 +2063,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
    OMP_CLAUSE_SIZE (node), elemsz);
 		}
 		  gfc_add_block_to_block (block, );
-		  ptr = fold_convert (build_pointer_type (char_type_node),
-  ptr);
 		  OMP_CLAUSE_DECL (node) = build_fold_indirect_ref (ptr);
 
 		  if (POINTER_TYPE_P (TREE_TYPE (decl))
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90 b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
new file mode 100644
index 000..3c309c0
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/data-alignment.f90
@@ -0,0 +1,35 @@
+! Test if the array data associated with c is properly aligned
+! on the accelerator.  If it is not, this program will crash.
+
+! { dg-do run }
+
+integer function routine_align()
+  implicit none
+  integer, parameter :: n = 1
+  real*8, dimension(:), allocatable :: c
+  integer :: i, idx
+
+  allocate (c(n))
+  routine_align = 0
+  c = 0.0
+
+  !$acc data copyin(idx) copy(c(1:n))
+
+  !$acc parallel vector_length(32)
+  !$acc loop vector
+  do i=1, n
+ c(i) = i
+  enddo
+  !$acc end parallel
+
+  !$acc end data
+end function routine_align
+
+
+! main driver
+program routine_align_main
+  implicit none
+  integer :: success
+  integer routine_align
+  success = routine_align()
+end program routine_align_main


Re: New post-LTO OpenACC pass

2015-09-21 Thread Cesar Philippidis
On 09/21/2015 09:30 AM, Nathan Sidwell wrote:

> +const pass_data pass_data_oacc_transform =
> +{
> +  GIMPLE_PASS, /* type */
> +  "fold_oacc_transform", /* name */

Want to rename the tree dump file to oacc_xforms like I did in the
attached patch? Regardless, I think we need to document this flag in
invoke.texi.

> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  PROP_cfg, /* properties_required */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};

Cesar
2015-09-21  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* doc/invoke.texi: Document -fdump-tree-oacc_xforms.
	* omp-low.c (pass_data_oacc_transform): Rename the tree dump for
	oacc_transform as oacc_xforms.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 92f82d7..7406941 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7158,6 +7158,11 @@ is made by appending @file{.slp} to the source file name.
 Dump each function after Value Range Propagation (VRP).  The file name
 is made by appending @file{.vrp} to the source file name.
 
+@item oacc_xforms
+@opindex fdump-tree-oacc_xforms
+Dump each function after applying target-specific OpenACC transformations.
+The file name is made by appending @file{.oacc_xforms} to the source file name.
+
 @item all
 @opindex fdump-tree-all
 Enable all the available tree dumps with the flags provided in this option.
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e3dc160..f31e6cd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -15086,7 +15086,7 @@ namespace {
 const pass_data pass_data_oacc_transform =
 {
   GIMPLE_PASS, /* type */
-  "fold_oacc_transform", /* name */
+  "oacc_xforms", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */


Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-09-18 Thread Cesar Philippidis
On 09/18/2015 01:39 AM, Thomas Schwinge wrote:

> On Tue, 1 Sep 2015 18:29:55 +0200, Tom de Vries <tom_devr...@mentor.com> 
> wrote:
>> On 27/08/15 03:37, Cesar Philippidis wrote:
>>> -  ctx->ganglocal_size_host = align_and_expand (_host, host_size, align);
>>
>> I suspect this caused a bootstrap failure (align_and_expand unused). 
>> Worked-around as attached.
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -1450,7 +1450,7 @@ omp_copy_decl (tree var, copy_body_data *cb)
>>  
>>  /* Modify the old size *POLDSZ to align it up to ALIGN, and then return
>> a value with SIZE added to it.  */
>> -static tree
>> +static tree ATTRIBUTE_UNUSED
>>  align_and_expand (tree *poldsz, tree size, unsigned int align)
>>  {
>>tree oldsz = *poldsz;
> 
> If I remember correctly, this has only ever been used in the "ganglocal"
> implementation -- which is now gone.  So, should align_and_expand also be
> elided (Cesar)?

Most likely. I probably overlooked it when I was working on that
ganglocal removal patch. Can you remove it please? I'm already juggling
a couple of patches right now.

Thanks,
Cesar





[gomp4] parallel reduction nested inside data regions

2015-09-11 Thread Cesar Philippidis
This patch corrects the way that build_outer_var_ref deals with data
mappings in acc parallel and kernels when they are nested in some other
construct (i.e. acc data). This issue can be reproduced with an acc
parallel reduction nested inside an acc data region.

I've applied this fix to gomp-4_0-branch.

Cesar
2015-09-11  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (build_outer_var_ref):

	gcc/testsuite/
	* c-c++-common/goacc/parallel-reduction.c: Enclose the parallel
	reduction inside an acc data region.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: Enclose
	one parallel reduction inside a data region.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 09adea8..ba37372 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1240,6 +1240,8 @@ build_outer_var_ref (tree var, omp_context *ctx)
   if (x == NULL_TREE)
 	x = var;
 }
+  else if (is_oacc_parallel (ctx))
+x = var;
   else if (ctx->outer)
 {
   /* OpenACC may have multiple outer contexts (one per loop).  */
@@ -1256,7 +1258,7 @@ build_outer_var_ref (tree var, omp_context *ctx)
   else
 	x = lookup_decl (var, ctx->outer);
 }
-  else if (is_reference (var) || is_oacc_parallel (ctx)
+  else if (is_reference (var)
 	   || extract_oacc_routine_gwv (current_function_decl) != 0)
 /* This can happen with orphaned constructs.  If var is reference, it is
possible it is shared and as such valid.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c b/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
index debed55..d7cc947 100644
--- a/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
@@ -2,11 +2,15 @@ int
 main ()
 {
   int sum = 0;
+  int dummy = 0;
 
-#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+#pragma acc data copy (dummy)
   {
-int v = 5;
-sum += 10 + v;
+#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+{
+  int v = 5;
+  sum += 10 + v;
+}
   }
 
   return sum;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
index 381d5b6..d328f46 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -10,10 +10,14 @@ main ()
 {
   int s1 = 0, s2 = 0;
   int i;
+  int dummy = 0;
 
-#pragma acc parallel num_gangs (N) reduction (+:s1)
+#pragma acc data copy (dummy)
   {
-s1++;
+#pragma acc parallel num_gangs (N) reduction (+:s1)
+{
+  s1++;
+}
   }
 
   if (acc_get_device_type () != acc_device_nvidia)


[gomp4] assign unused gwv clauses to auto/independent parallel acc loops

2015-09-09 Thread Cesar Philippidis
This patch assigns any available gang, worker or vector level
parallelism to auto and independent loops inside acc parallel regions.
This is done in omplower for two reasons:

  1. At the moment, it's too late to do this in oacc-xform because
 ompexpand is responsible for partitioning loops. This will likely
 get revisited later when we add support for kernels.

  2. omplower already has several tree walkers to scan for nesting
 errors and data mappings, etc. This is just another tree walk
 for acc parallel regions.

There are a couple of problems with this patch. First, I make no attempt
to determine the optimal work-sharing clause for a particular loop.
Instead, I assign the lowest (i.e. gang before worker before vector)
available parallelism to the outermost loop. At this point, that's
better than nothing. The second issue is, while adding clauses does let
ompexpand partition acc loops, we are not setting default values for
num_gangs, num_workers and vector_length yet (although we do set
vector_length to 32 when num_workers != 1).
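
The heuristic described above boils down to something like the following
standalone C sketch. It is not the code in the patch: the DIM_* names and
both helpers are hypothetical, and the only assumption is the one stated in
the text, i.e. walk the loops outermost first and hand each one the first
dimension (gang, then worker, then vector) that is still free.

```c
/* Hypothetical dimension numbering, gang before worker before vector.  */
enum { DIM_GANG = 0, DIM_WORKER = 1, DIM_VECTOR = 2, DIM_MAX = 3 };

#define DIM_MASK(d) (1u << (d))

/* Return the first dimension not present in USED, or -1 if every
   dimension is already taken.  */
static int
pick_free_dim (unsigned used)
{
  for (int d = DIM_GANG; d < DIM_MAX; d++)
    if (!(used & DIM_MASK (d)))
      return d;

  return -1;
}

/* Assign dimensions to a nest of N auto/independent loops, outermost
   first.  USED holds the parallelism already claimed by explicit
   clauses.  OUT[i] receives the chosen dimension, or -1 when the loop
   has to stay sequential.  */
static void
assign_loop_dims (unsigned used, int n, int *out)
{
  for (int i = 0; i < n; i++)
    {
      out[i] = pick_free_dim (used);
      if (out[i] >= 0)
	used |= DIM_MASK (out[i]);
    }
}
```

For a nest of four loops with nothing claimed, this yields gang, worker,
vector and then a sequential loop. If worker is already claimed by an
explicit clause, a nest of two gets gang and vector.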

It should be noted that this optimization only applies to acc loops
inside parallel regions. I probably could expand it to acc loops inside
acc routines, but technically acc routines are only supposed to have one
level of parallelism anyway. It also probably could be expanded to
handle independent loops inside kernels regions too.

Is this patch ok for gomp-4_0-branch or should I hold off until the
kernels situation gets resolved?

Cesar
2015-09-09  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (struct oacc_gwv): New struct.
	(filter_omp_clause): New function.
	(set_oacc_parallel_loop_gwv_1): New function.
	(set_oacc_parallel_loop_gwv): New function.
	(scan_omp_for): Use filter_omp_clause to remove the stale reductions.
	(scan_omp_target): Automatically assign gang, worker and vector
	clauses to auto and independent loop with any worksharing clauses
	inside parallel regions.

	gcc/testsuite/
	* gfortran.dg/goacc/dtype-1.f95: Update xfails to account for the
	automatic parallelism in acc parallel regions.
	* c-c++-common/goacc/dtype-1.c: Likewise.
	* c-c++-common/goacc/par-auto-1.c: New test.
	* c-c++-common/goacc/par-auto-2.c: New test.
	* c-c++-common/goacc/par-auto-3.c: New test.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index bfef298..2d79ad1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -237,6 +237,13 @@ struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
+/* A structure for automatically adding parallelism to OpenACC loops.  */
+
+struct oacc_gwv
+{
+  short gwv;
+  bool update;
+};
 
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
@@ -2596,6 +2603,191 @@ oacc_loop_or_target_p (gimple stmt)
 	  && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP));
 }
 
+/* Remove all clauses of type CODE from the chain of omp CLAUSES.  */
+static tree
+filter_omp_clause (omp_clause_code code, tree clauses)
+{
+  /* First filter out the clauses at the beginning of the chain.  */
+  while (clauses
+	 && OMP_CLAUSE_CODE (clauses) == code)
+{
+  clauses = OMP_CLAUSE_CHAIN (clauses);
+}
+
+  if (clauses != NULL)
+{
+  /* Filter out the remaining clauses.  */
+  for (tree c = OMP_CLAUSE_CHAIN (clauses), prev = clauses;
+	   c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  if (OMP_CLAUSE_CODE (c) == code)
+	{
+	  tree t = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (prev) = t;
+	}
+	  else
+	prev = c;
+	}
+}
+
+  return clauses;
+}
+
+/* Callback for walk_gimple_seq.  Set the appropriate level of parallelism
+   for an acc loop when possible.  Also remove a reduction clause if the
+   loop doesn't have any parallelism associated with it.  */
+
+static tree
+set_oacc_parallel_loop_gwv_1 (gimple_stmt_iterator *gsi_p,
+			  bool *handled_ops_p,
+			  struct walk_stmt_info *wi)
+{
+  struct oacc_gwv *outer = (struct oacc_gwv *) wi->info;
+  struct oacc_gwv nested = { 0, false };
+  int local_gwv = 0, dim = 0, nested_dim = GOMP_DIM_MAX;
+  gimple stmt = gsi_stmt (*gsi_p);
+  bool is_seq = false;
+  tree clauses, c;
+
+  *handled_ops_p = true;
+
+  switch (gimple_code (stmt))
+{
+WALK_SUBSTMTS;
+
+case GIMPLE_CALL:
+  {
+	tree fndecl = gimple_call_fndecl (stmt);
+	if (fndecl)
+	  {
+	int call_gwv = extract_oacc_routine_gwv (fndecl);
+	outer->gwv |= call_gwv;
+	  }
+  }
+  break;
+
+case GIMPLE_OMP_FOR:
+  clauses = gimple_omp_for_clauses (stmt);
+
+  /* First pass of the clauses: extract the gwv parallelism associated
+	 with this loop.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	switch (OMP_CLAUSE_CODE (c))
+	  {
+	  case OMP_CLAUSE_GANG:
+	local_gwv |= GOMP_DIM_MASK (GOMP_DIM_GANG);
+	break;
+	  case OMP_CLAUSE_WORKER:
+	local_gwv |= GOMP_DIM_MASK (GOMP_DIM_WORKER);
+	break;
+	  case OMP_CLAUSE_VECTOR:
+	local_gwv |= GOMP_DIM_MASK (GOMP_DIM_VECTOR);
+	br

[gomp4] force global locks for nvptx targets

2015-09-08 Thread Cesar Philippidis
This patch forces GOACC_LOCK to use locks in global memory regardless of
whether the lock is for a worker or a gang. We were using shared memory for
worker locks, but we ran into an issue that would sporadically
cause deadlocks in worker reductions. We're still investigating that
issue, but for the time being, global locks appear to work, albeit with a
lock contention penalty.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-09-08  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (force_global_locks): New global variable.
	(nvptx_expand_oacc_lock): Use it to work around a shared memory lock
	problem.
	(nvptx_xform_lock): Likewise.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 51f2893..c8f6f5c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -134,6 +134,9 @@ static const unsigned lock_level[] = {BARRIER_GLOBAL, BARRIER_SHARED};
 static GTY(()) rtx lock_syms[LOCK_MAX];
 static bool lock_used[LOCK_MAX];
 
+/* FIXME: Temporary workaround for worker locks.  */
+static bool force_global_locks = true;
+
 /* Size of buffer needed for worker reductions.  This has to be
disjoing from the worker broadcast array, as both may be live
concurrently.  */
@@ -1245,6 +1248,7 @@ nvptx_expand_oacc_lock (rtx src, int direction)
   rtx pat;
   
   kind = INTVAL (src) == GOMP_DIM_GANG ? LOCK_GLOBAL : LOCK_SHARED;
+  kind = force_global_locks ? LOCK_GLOBAL : kind;
   lock_used[kind] = true;
 
   rtx mem = gen_rtx_MEM (SImode, lock_syms[kind]);
@@ -3740,7 +3744,7 @@ nvptx_xform_lock (gimple stmt, const int *ARG_UNUSED (dims), unsigned ifn_code)
   return mode > GOMP_DIM_WORKER;
 
 case IFN_GOACC_LOCK_INIT:
-  return mode != GOMP_DIM_WORKER;
+  return force_global_locks || mode != GOMP_DIM_WORKER;
 
 default: gcc_unreachable();
 }


[gomp4] remove xfails in the libgomp reduction tests

2015-09-02 Thread Cesar Philippidis
A couple of reduction tests inside libgomp had xfails because Julian
added those tests before my reduction patches were ready. Most of them
pass unmodified, but I found a bug in loop-reduction-wv-p-3.c, where a
private variable was used without being initialized. This patch fixes
that bug and removes the xfails from the reduction test cases.

This patch has been committed to gomp-4_0-branch.

Cesar
2015-09-02  Cesar Philippidis  <ce...@codesourcery.com>

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
	Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
	Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
	Likewise.  Initialize res because it's private.


diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
index 3e5c707..ea5c151 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on loop directive (gangs, workers and vectors, non-private
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
index 44d7f0f..0056f3c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on loop directive (gangs, workers and vectors, non-private
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
index 8bc18f7..e69d0ec 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { *-*-* } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on loop directive (gangs, workers and vectors, multiple
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
index 63f3fef..15f0053 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on loop directive (vector reduction in
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
index ac96525..b5e28fb 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { *-*-* } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on loop directive (workers and vectors, private reduction
@@ -16,6 +14,9 @@ main (int argc, char *argv[])
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
 		   private(res) copyin(arr) copyout(out)
   {
+/* Private variables aren't initialized by default in openacc.  */
+res = 0;
+
 /* "res" should be available at the end of the following loop (and should
have the same value redundantly in each gang).  */
 #pragma acc loop worker vector reduction(+:res)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
index 860e56d..9b26f9b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
@@ -1,5 +1,3 @@
-/* { dg-xfail-run-if "TODO" { *-*-* } { "*" } { "" } } */
-
 #include 
 
 /* Test of reduction on both parallel and loop directives (workers and vectors
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
in

[gomp4] useless reduction locks and other bug fixes

2015-09-01 Thread Cesar Philippidis
This patch teaches lower_oacc_reductions not to generate calls to
GOACC_{UN}LOCK if there aren't any reductions. That situation can happen
when there is a fake gang reduction on a private variable.

I also found a bug where lower_rec_input_clauses expects there to be
a data mapping for the reduction variable when there isn't one, e.g. for
private/local reduction variables. And I made the nvptx backend aware of
the fact that the lhs of a call to REDUCTION_TEARDOWN may have been
optimized away for worker reductions too. I have a couple of test cases
for these bugs, but I'll include them with my upcoming auto-independent
loop patch.
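
The idea of deferring the lock calls until we know there is something to protect can be sketched outside the compiler. This is only a toy model under my own assumptions: the function name and the textual "instruction stream" are hypothetical, and a NULL entry stands in for a fake reduction:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Lower the reductions named in VARS (N entries) into a textual
   instruction stream.  A NULL entry models a fake reduction with nothing
   to write back.  The result lives in a static buffer, which keeps the
   sketch short but makes it non-reentrant.  */
const char *
lower_reductions (const char *const *vars, int n)
{
  static char out[320];
  char body[256] = "";
  int num_reductions = 0;

  assert (vars != NULL || n == 0);

  /* Collect the reduction code in a side buffer first.  */
  for (int i = 0; i < n; i++)
    {
      if (vars[i] == NULL)
        continue;
      size_t used = strlen (body);
      snprintf (body + used, sizeof body - used, "reduce(%s);", vars[i]);
      num_reductions++;
    }

  /* Only wrap the body in lock/unlock when it is non-empty.  */
  out[0] = '\0';
  if (num_reductions > 0)
    snprintf (out, sizeof out, "lock;%sunlock;", body);
  return out;
}
```

The side buffer plays the role of a separate statement sequence: nothing reaches the output until the count is known, so an empty set of reductions emits no lock traffic at all.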

This patch has been committed to gomp-4_0-branch.

Cesar



Re: [gomp4] useless reduction locks and other bug fixes

2015-09-01 Thread Cesar Philippidis
[Attaching patch this time.]

On 09/01/2015 04:21 PM, Cesar Philippidis wrote:
> This patch teaches lower_oacc_reductions not to generate calls to
> GOACC_{UN}LOCK if there aren't any reductions. That situation can happen
> when there is a fake gang reduction on a private variable.
> 
> I also found a bug where lower_rec_input_clauses expects there to be
> a data mapping for the reduction variable when there isn't one, e.g. for
> private/local reduction variables. And I made the nvptx backend aware of
> the fact that the lhs of a call to REDUCTION_TEARDOWN may have been
> optimized away for worker reductions too. I have a couple of test cases
> for these bugs, but I'll include them with my upcoming auto-independent
> loop patch.
> 
> This patch has been committed to gomp-4_0-branch.
> 
> Cesar
> 

2015-09-01  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (nvptx_goacc_reduction_teardown): Allow
	lhs to be NULL for worker reductions too.
	* omp-low.c (lower_rec_input_clauses): Bail out on OpenACC reductions.
	(lower_oacc_reductions): Use maybe_lookup_decl for private reductions.
	Don't emit locks for fake private gang reductions.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 1b85892..51f2893 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4229,14 +4229,19 @@ nvptx_goacc_reduction_teardown (gimple call)
   tree rid = gimple_call_arg (call, 5);
   gimple_seq seq = NULL;
 
+  if (v == NULL)
+{
+  gsi_remove (, true);
+  return false;
+}
+
   push_gimplify_context (true);
 
   switch (loop_dim)
 {
 case GOMP_DIM_GANG:
 case GOMP_DIM_VECTOR:
-  if (v)
-	gimplify_assign (v, local_var, &seq);
+  gimplify_assign (v, local_var, &seq);
   break;
 case GOMP_DIM_WORKER:
   {
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fdca880..bfef298 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3892,7 +3892,14 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 
 	  new_var = var = OMP_CLAUSE_DECL (c);
 	  if (c_kind != OMP_CLAUSE_COPYIN)
-	new_var = lookup_decl (var, ctx);
+	{
+	  /* Not all OpenACC reductions require new mappings.  */
+	  if (is_gimple_omp_oacc (ctx->stmt)
+		  && (new_var = maybe_lookup_decl (var, ctx)) == NULL)
+		new_var = var;
+	  else
+		new_var = lookup_decl (var, ctx);
+	}
 
 	  if (c_kind == OMP_CLAUSE_SHARED || c_kind == OMP_CLAUSE_COPYIN)
 	{
@@ -4724,6 +4731,8 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   tree c, tcode, gwv, rid, lid = build_int_cst (integer_type_node, oacc_lid);
   int oacc_rid, i;
   unsigned mask = extract_oacc_loop_mask (ctx);
+  gimple_seq red_seq = NULL;
+  int num_reductions = 0;
   enum tree_code rcode;
 
   /* Remove the outer-most level of parallelism from the loop.  */
@@ -4753,14 +4762,6 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   gimplify_and_add (call, ilist);
 }
 
-  /* Call GOACC_LOCK.  */
-  if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
-{
-  call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_LOCK,
-	   void_type_node, 2, dim, lid);
-  gimplify_and_add (call, ilist);
-}
-
   for (c = clauses, oacc_rid = 0;
c && write_back;
c = OMP_CLAUSE_CHAIN (c), oacc_rid++)
@@ -4776,7 +4777,9 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
 
   var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
   if (var == NULL_TREE)
-	var = lookup_decl (orig, ctx);
+	var = maybe_lookup_decl (orig, ctx);
+  if (var == NULL_TREE)
+	var = orig;
 
   res = build_outer_var_ref (orig, ctx);
 
@@ -4811,16 +4814,32 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   call = build_call_expr_internal_loc (UNKNOWN_LOCATION, ifn,
 	   TREE_TYPE (var), 6, ref_to_res,
 	   var, gwv, tcode, lid, rid);
-  gimplify_assign (var, call, ilist);
+  gimplify_assign (var, call, &red_seq);
+  num_reductions++;
 }
 
-  /* Call GOACC_UNLOCK.  */
-  if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
+  if (num_reductions)
 {
-  dim = build_int_cst (integer_type_node, loop_dim);
-  call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_UNLOCK,
-	   void_type_node, 2, dim, lid);
-  gimplify_and_add (call, ilist);
+  /* Call GOACC_LOCK.  */
+  if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
+	{
+	  call = build_call_expr_internal_loc (UNKNOWN_LOCATION,
+	   IFN_GOACC_LOCK, void_type_node,
+	   2, dim, lid);
+	  gimplify_and_add (call, ilist);
+	}
+
+  gimple_seq_add_seq (ilist, red_seq);
+
+  /* Call GOACC_UNLOCK.  */
+  if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
+	{
+	  dim = build_int_cst (integer_type_node, loop_dim);
+	  call = build_call_exp

[gomp4] check for compatible parallelism with acc routines

2015-08-28 Thread Cesar Philippidis
This patch teaches omplower to report any incompatible parallelism when
using routines. I also fixed a minor bug involving reductions inside
routines and removed a dead variable inside execute_oacc_transform which
caused a build warning.

There are two scenarios involving acc routines that need checking:

  1. calls to routines
  2. acc loops inside routines

For both of these cases, I'm utilizing the routine dimensions associated
with the 'oacc function' attribute.

A couple of libgomp test cases were clearly bogus. E.g., you cannot have
a gang loop inside a worker routine, nor can you call a vector routine
from a vector loop. This patch corrects those tests, too.
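
To make the two checks concrete, here is a small standalone sketch. The enum and function names are hypothetical (GCC works with GOMP_DIM_* masks instead), and the rule that a loop may use the routine's own level or anything inside it is my reading of the OpenACC routine directive rather than something taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Parallelism levels ordered from outermost to innermost.  */
enum level { LEVEL_GANG, LEVEL_WORKER, LEVEL_VECTOR, LEVEL_SEQ };

/* Scenario 1: a routine may be called from a partitioned loop only if
   the routine's level is strictly inside the loop's level.  */
bool
call_is_compatible (enum level loop, enum level routine)
{
  assert (loop != LEVEL_SEQ);	/* Only partitioned loops are checked.  */
  return routine > loop;
}

/* Scenario 2: a loop inside a routine may not use a level outside the
   routine's own level.  Seq loops are always accepted.  */
bool
loop_is_compatible (enum level routine, enum level loop)
{
  return loop >= routine;
}
```

For example, call_is_compatible (LEVEL_VECTOR, LEVEL_VECTOR) is false, which matches the vector-routine-from-vector-loop case above, and loop_is_compatible (LEVEL_WORKER, LEVEL_GANG) is false for the gang loop inside a worker routine.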

I encountered one ambiguity in the spec involving the seq loop clause.
The spec says that seq loops are supposed to be executed sequentially by
a single thread. I'm not sure whether that implies that a seq loop
cannot be embedded in a gang/worker/vector loop, or whether a gwv loop
can nest inside a seq loop. E.g.

  #pragma acc loop gang
  for (...)
  {
    #pragma acc loop seq
    for (...)
  }

and

  #pragma acc loop seq
  for (...)
  {
    #pragma acc loop gang
    for (...)
  }

Right now, gcc is permitting both of these loop nests. I.e., only the seq
loop itself is executing in a non-partitioned mode. Julian inquired
about this on the OpenACC technical list a while ago, but I don't think
he got a response.

This patch has been applied to gomp-4_0-branch.

Cesar
2015-08-28  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (extract_oacc_routine_gwv): New function.
	(build_outer_var_ref): Handle refs inside acc routines.
	(scan_omp_for): Check nested parallelism inside acc routines.
	(scan_omp_1_stmt): Check for compatible parallelism when calling
	routines.
	(execute_oacc_transform): Remove dead variable.

	gcc/testsuite/
	* c-c++-common/goacc/routine-6.c: New test.
	* c-c++-common/goacc/routine-7.c: New test.
	* gfortran.dg/goacc/routine-4.f90: New test.
	* gfortran.dg/goacc/routine-5.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Fix calls to
	acc routines.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/vector-routine.f90: Likewise.

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4312a60..e8d7513 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -415,6 +415,35 @@ is_combined_parallel (struct omp_region *region)
   return region->is_combined_parallel;
 }
 
+/* Return the gang, worker and vector attributes from associated with
+   FNDECL.  Returns a GOMP_DIM for the lowest level of parallelism beginning
+   with GOMP_DIM_GANG, or -1 if the routine is a SEQ. Otherwise, return 0 if
+   the FNDECL is not an acc routine.
+*/
+
+static int
+extract_oacc_routine_gwv (tree fndecl)
+{
+  tree attrs = get_oacc_fn_attrib (fndecl);
+  tree pos;
+  unsigned gwv = 0;
+  int i;
+  int ret = 0;
+
+  if (attrs != NULL_TREE)
+{
+  for (i = 0, pos = TREE_VALUE (attrs);
+	   gwv == 0 && i != GOMP_DIM_MAX;
+	   i++, pos = TREE_CHAIN (pos))
+	if (TREE_PURPOSE (pos) != boolean_false_node)
+	  return 1 << i;
+
+  ret = -1;
+}
+
+  return ret;
+}
+
 
 /* Extract the header elements of parallel loop FOR_STMT and store
them into *FD.  */
@@ -1227,7 +1256,8 @@ build_outer_var_ref (tree var, omp_context *ctx)
   else
  	x = lookup_decl (var, ctx->outer);
 }
-  else if (is_reference (var) || is_oacc_parallel (ctx))
+  else if (is_reference (var) || is_oacc_parallel (ctx)
+	   || extract_oacc_routine_gwv (current_function_decl) != 0)
 /* This can happen with orphaned constructs.  If var is reference, it is
possible it is shared and as such valid.  */
 x = var;
@@ -2578,9 +2608,16 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
   bool gwv_clause = false;
   bool auto_clause = false;
   bool seq_clause = false;
+  int gwv_routine = 0;
 
   if (outer_ctx)
 outer_type = gimple_code (outer_ctx->stmt);
+  else
+{
+  gwv_routine = extract_oacc_routine_gwv (current_function_decl);
+  if (gwv_routine  0)
+	gwv_routine = gwv_routine  1;
+}
 
   ctx = new_omp_context (stmt, outer_ctx);
 
@@ -2699,6 +2736,12 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 	ctx-gwv_this  ctx-gwv_below)
 	error_at (gimple_location (stmt),
 		  "gang, worker and vector must occur in this order in a loop nest");
+  else if (!outer_ctx  ctx-gwv_this != 0  gwv_routine != 0
+	((ffs (ctx-gwv_this) = gwv_routine)
+		   || gwv_routine  0))
+	error_at (gimple_location (stmt),
+		  "invalid parallelism inside acc routine");
+
   if (outer_ctx && outer_type == GIMPLE_OMP_FOR)
 	outer_ctx->gwv_below |= ctx->gwv_below;
 }
@@ -3287,6 +3330,16 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
 	  default:
 		break;
 	  }
+	  else if (ctx && is_gimple_omp_oacc (ctx->stmt)
+		   && !is_oacc_parallel (ctx))
+	{
+	  /* Is this a call to an acc routine?  */
+	  int gwv = extract_oacc_routine_gwv (fndecl

Re: [gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-08-27 Thread Cesar Philippidis
On 08/27/2015 06:13 AM, Nathan Sidwell wrote:
> On 08/26/15 21:37, Cesar Philippidis wrote:
>> This patch strips out all of the references to ganglocal memory in gcc.
>> Unfortunately, the runtime api still takes a shared memory parameter, so
>> I haven't made any changes there yet. Perhaps we could still keep the
>> shared memory argument to GOACC_parallel, but remove all of the support
>> for ganglocal mappings. Then again, maybe we still need to support
>> ganglocal mappings for legacy purposes.
>>
>> With the ganglocal mapping aside, I'm in favor of leaving the shared
>> memory argument to GOACC_parallel, just in case we find another use for
>> shared memory in the future.
>>
>> Nathan, what do you want to do here?
>
> We should remove the parameter.
>
> 1) the patch I posted earlier this week for trunk review doesn't have it

I've committed this patch to gomp-4_0-branch. Do you want to apply that
patch to gomp-4_0-branch since ganglocal memory is no longer used? Just
remember that you'll need to teach expand_omp_target not to pass the
shared memory argument to the runtime.

> 2) if it turns out to be needed in the future, it can be done by
> extending the tagging scheme we now have in that API
> 3) It's a target-specific concept and if needed I strongly suspect
> either compile time known by the target compiler (and hence emittable in
> the offload data), or deducible at runtime from other data.

That sounds reasonable.

> WRT to the patch you've posted, I think you can totally excise
> 'GOMP_MAP_FORCE_TO_GANGLOCAL' and friends from gomp-constants.h and from
> the runtime too.  (that could be a separate patch).

I'll create a follow-up patch for that later, probably after I finish
working on the auto-independent loop patch. In the meantime, I found a
bug where acc routine calls aren't being checked for compatible
parallelism. E.g.

  #pragma acc routine gang
  void foo ();

  ...

  #pragma acc parallel loop worker
  for (...)
 foo ();

The call to foo isn't being reported as an error, which it should be. I'm
testing a fix for this.

Cesar


[gomp4] teach the tracer pass to ignore more blocks for OpenACC

2015-08-26 Thread Cesar Philippidis
I hit a problem in one of my reduction test cases where the
GOACC_JOIN was getting cloned. Nvptx requires FORK and JOIN to be
single-entry, single-exit regions, or some form of thread divergence may
occur. When that happens, we cannot use the shfl instruction for
reductions or broadcasting (if the warp is divergent), and it may cause
problems with synchronization in general.

Nathan ran into a similar problem in one of the ssa passes when he added
support for predication in the nvptx backend. Part of his solution was
to add a gimple_call_internal_unique_p function to determine if internal
functions are safe to be cloned. This patch teaches the tracer to scan
each basic block for internal function calls using
gimple_call_internal_unique_p, and to mark the blocks that contain
certain OpenACC internal function calls as ignored. It is a shame that
gimple_statement_iterators do not play nicely with const_basic_block.
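
The scan itself is simple. A minimal sketch, using a hypothetical flat statement array in place of gimple and its iterators, looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds; GCC's real representation is gimple.  */
enum stmt_kind { STMT_ASSIGN, STMT_CALL, STMT_INTERNAL_CALL };

struct stmt
{
  enum stmt_kind kind;
  bool unique;		/* True if the call must not be duplicated.  */
};

/* Return true if the block holding STMTS (N entries) contains an
   internal function call that is unsafe to clone.  */
bool
block_must_be_ignored (const struct stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == STMT_INTERNAL_CALL && stmts[i].unique)
      return true;

  return false;
}
```

A single unique internal call is enough to exclude the whole block from tracing, since duplicating the block would duplicate the call.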

Is this patch ok for gomp-4_0-branch?

Cesar
2015-08-25  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* tracer.c (ignore_bb_p): Change bb argument from const_basic_block
	to basic_block.  Check for non-clonable calls to internal functions.


diff --git a/gcc/tracer.c b/gcc/tracer.c
index cad7ab1..f20c158 100644
--- a/gcc/tracer.c
+++ b/gcc/tracer.c
@@ -58,7 +58,7 @@
 #include "fibonacci_heap.h"
 
 static int count_insns (basic_block);
-static bool ignore_bb_p (const_basic_block);
+static bool ignore_bb_p (basic_block);
 static bool better_p (const_edge, const_edge);
 static edge find_best_successor (basic_block);
 static edge find_best_predecessor (basic_block);
@@ -91,8 +91,9 @@ bb_seen_p (basic_block bb)
 
 /* Return true if we should ignore the basic block for purposes of tracing.  */
 static bool
-ignore_bb_p (const_basic_block bb)
+ignore_bb_p (basic_block bb)
 {
+  gimple_stmt_iterator gsi;
   gimple g;
 
   if (bb->index < NUM_FIXED_BLOCKS)
@@ -106,6 +107,16 @@ ignore_bb_p (const_basic_block bb)
   if (g && gimple_code (g) == GIMPLE_TRANSACTION)
 return true;
 
+  /* Ignore blocks containing non-clonable function calls.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  g = gsi_stmt (gsi);
+
+  if (is_gimple_call (g) && gimple_call_internal_p (g)
+	  && gimple_call_internal_unique_p (g))
+	return true;
+}
+
   return false;
 }
 


Re: [gomp4] lowering OpenACC reductions

2015-08-26 Thread Cesar Philippidis
On 08/21/2015 02:00 PM, Cesar Philippidis wrote:

> This patch teaches omplower how to utilize the new OpenACC reduction
> framework described in Nathan's document, which was posted here
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01248.html. Here is the
> infrastructure patch
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01130.html, and here's
> the nvptx backend changes
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01334.html. The updated
> reduction tests have been posted here
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01561.html.

All of these patches have been committed to gomp-4_0-branch.

Cesar


[gomp4] initialize worker reduction locks

2015-08-26 Thread Cesar Philippidis
This patch teaches omplower how to emit function calls to
IFN_GOACC_LOCK_INIT so that the worker mutex has a proper initial value.
On nvptx targets, shared memory isn't initialized (and that's where the
lock is located for OpenACC workers), so this makes it explicit. Nathan
added the internal function used in the patch a couple of days ago.
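
To show why the explicit initialization matters, here is a host-side sketch using C11 atomics. It is not the nvptx implementation; the type and function names are hypothetical, and the garbage value only stands in for uninitialized shared memory:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A one-word spinlock: 0 means free, 1 means held.  */
typedef struct { atomic_int word; } spin_lock;

/* Give the lock a proper initial value.  */
void
lock_init (spin_lock *l)
{
  assert (l != NULL);
  atomic_store (&l->word, 0);
}

/* Try to take the lock once; return true on success.  */
bool
lock_try_acquire (spin_lock *l)
{
  int expected = 0;
  return atomic_compare_exchange_strong (&l->word, &expected, 1);
}

/* Model a lock that starts out holding garbage.  Without lock_init the
   acquire can never succeed, so a real spin loop would hang.  */
bool
acquire_from_garbage (bool init_first)
{
  spin_lock l;
  atomic_init (&l.word, 0x5a5a);	/* Stand-in for uninitialized memory.  */
  if (init_first)
    lock_init (&l);
  return lock_try_acquire (&l);
}
```

Here acquire_from_garbage (false) fails because the compare-and-swap never sees the free value, while acquire_from_garbage (true) succeeds.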

I've applied this patch to gomp-4_0-branch.

Cesar
2015-08-26  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* omp-low.c (lower_oacc_reductions): Call GOACC_LOCK_INIT
	to initialize the gang and worker mutex.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 955a098..ee92141 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4795,10 +4795,20 @@ lower_oacc_reductions (enum internal_fn ifn, int loop_dim, tree clauses,
   if (ctx->reductions == 0)
 return;
 
+  dim = build_int_cst (integer_type_node, loop_dim);
+
+  /* Call GOACC_LOCK_INIT.  */
+  if (ifn == IFN_GOACC_REDUCTION_SETUP)
+{
+  call = build_call_expr_internal_loc (UNKNOWN_LOCATION,
+	   IFN_GOACC_LOCK_INIT,
+	   void_type_node, 2, dim, lid);
+  gimplify_and_add (call, ilist);
+}
+
   /* Call GOACC_LOCK.  */
   if (ifn == IFN_GOACC_REDUCTION_FINI && write_back)
 {
-  dim = build_int_cst (integer_type_node, loop_dim);
   call = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOACC_LOCK,
 	   void_type_node, 2, dim, lid);
   gimplify_and_add (call, ilist);


[gomp4, wip] remove references to ganglocal shared memory inside gcc

2015-08-26 Thread Cesar Philippidis
This patch strips out all of the references to ganglocal memory in gcc.
Unfortunately, the runtime api still takes a shared memory parameter, so
I haven't made any changes there yet. Perhaps we could still keep the
shared memory argument to GOACC_parallel, but remove all of the support
for ganglocal mappings. Then again, maybe we still need to support
ganglocal mappings for legacy purposes.

With the ganglocal mapping aside, I'm in favor of leaving the shared
memory argument to GOACC_parallel, just in case we find another use for
shared memory in the future.

Nathan, what do you want to do here?

Cesar
2015-08-26  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* builtins.c (expand_oacc_ganglocal_ptr): Delete.
	(expand_builtin): Remove stale GOACC_GET_GANGLOCAL_PTR builtin.
	* config/nvptx/nvptx.md (ganglocal_ptr): Delete.
	* gimple.h (struct gimple_statement_omp_parallel_layout): Remove
	ganglocal_size member.
	(gimple_omp_target_ganglocal_size): Delete.
	(gimple_omp_target_set_ganglocal_size): Delete.
	* omp-builtins.def (BUILT_IN_GOACC_GET_GANGLOCAL_PTR): Delete.
	* omp-low.c (struct omp_context): Remove ganglocal_init, ganglocal_ptr,
	ganglocal_size, ganglocal_size_host, worker_var, worker_count and
	worker_sync_elt.
	(alloc_var_ganglocal): Delete.
	(install_var_ganglocal): Delete.
	(new_omp_context): Don't use ganglocal memory.
	(expand_omp_target): Likewise.
	(lower_omp_taskreg): Likewise.
	(lower_omp_target): Likewise.
	* tree-parloops.c (create_parallel_loop): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Remove support for
	GOMP_MAP_FORCE_TO_GANGLOCAL.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 7c3ead1..f465716 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5913,25 +5913,6 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   return target;
 }
 
-static rtx
-expand_oacc_ganglocal_ptr (rtx target ATTRIBUTE_UNUSED)
-{
-#ifdef HAVE_ganglocal_ptr
-  enum insn_code icode;
-  icode = CODE_FOR_ganglocal_ptr;
-  rtx tmp = target;
-  if (!REG_P (tmp) || GET_MODE (tmp) != Pmode)
-tmp = gen_reg_rtx (Pmode);
-  rtx insn = GEN_FCN (icode) (tmp);
-  if (insn != NULL_RTX)
-{
-  emit_insn (insn);
-  return tmp;
-}
-#endif
-  return NULL_RTX;
-}
-
 /* Expand an expression EXP that calls a built-in function,
with result going to TARGET if that's convenient
(and in mode MODE if that's convenient).
@@ -7074,12 +7055,6 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	return target;
   break;
 
-case BUILT_IN_GOACC_GET_GANGLOCAL_PTR:
-  target = expand_oacc_ganglocal_ptr (target);
-  if (target)
-	return target;
-  break;
-
 default:	/* just do library call, if unknown builtin */
   break;
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 3d734a8..d0d6564 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1485,23 +1485,6 @@
   
   %.\\tst.shared%u1\\t%1,%0;)
 
-(define_insn ganglocal_ptrmode
-  [(set (match_operand:P 0 nvptx_register_operand )
-	(unspec:P [(const_int 0)] UNSPEC_SHARED_DATA))]
-  
-  %.\\tcvta.shared%t0\\t%0, sdata;)
-
-(define_expand ganglocal_ptr
-  [(match_operand 0 nvptx_register_operand )]
-  
-{
-  if (Pmode == DImode)
-emit_insn (gen_ganglocal_ptrdi (operands[0]));
-  else
-emit_insn (gen_ganglocal_ptrsi (operands[0]));
-  DONE;
-})
-
 ;; Atomic insns.
 
 (define_expand atomic_compare_and_swapmode
diff --git a/gcc/gimple.h b/gcc/gimple.h
index d8d8742..278b49f 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -580,10 +580,6 @@ struct GTY((tag(GSS_OMP_PARALLEL_LAYOUT)))
   /* [ WORD 10 ]
  Shared data argument.  */
   tree data_arg;
-
-  /* [ WORD 11 ]
- Size of the gang-local memory to allocate.  */
-  tree ganglocal_size;
 };
 
 /* GIMPLE_OMP_PARALLEL or GIMPLE_TASK */
@@ -5232,25 +5228,6 @@ gimple_omp_target_set_data_arg (gomp_target *omp_target_stmt,
 }
 
 
-/* Return the size of gang-local data associated with OMP_TARGET GS.  */
-
-static inline tree
-gimple_omp_target_ganglocal_size (const gomp_target *omp_target_stmt)
-{
-  return omp_target_stmt->ganglocal_size;
-}
-
-
-/* Set SIZE to be the size of gang-local memory associated with OMP_TARGET
-   GS.  */
-
-static inline void
-gimple_omp_target_set_ganglocal_size (gomp_target *omp_target_stmt, tree size)
-{
-  omp_target_stmt->ganglocal_size = size;
-}
-
-
 /* Return the clauses associated with OMP_TEAMS GS.  */
 
 static inline tree
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 0d9f386..615c4e0 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -58,8 +58,6 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_UPDATE, GOACC_update,
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, GOACC_wait,
 		   BT_FN_VOID_INT_INT_VAR,
 		   ATTR_NOTHROW_LIST)
-DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, GOACC_get_ganglocal_ptr,
-		   BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DEVICEPTR, GOACC_deviceptr,
 		   BT_FN_PTR_PTR

[gomp4] firstprivate bug in combined acc loops

2015-08-24 Thread Cesar Philippidis
This patch addresses a bug where a firstprivate clause in a combined
parallel/kernels loop is propagated to an acc loop. I came across this
bug when I was testing the new reduction changes. There's a test case
for this in my reduction patch set, so I didn't include one here.

This patch has been committed to gomp-4_0-branch.

Cesar
2015-08-24  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/c-family/
	* c-omp.c (c_oacc_split_loop_clauses): Don't propagate
	OMP_CLAUSE_FIRSTPRIVATE to acc loops when splitting the clauses.


diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index e9df829..798174c 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -1110,7 +1110,6 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 	  loop_clauses = clauses;
 	  break;
 
-	case OMP_CLAUSE_FIRSTPRIVATE:
 	case OMP_CLAUSE_PRIVATE:
 	  c = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
 			OMP_CLAUSE_CODE (clauses));


[gomp4] bug fix for num_gangs inside fortran subroutines

2015-08-24 Thread Cesar Philippidis
This patch adds support for num_gangs, num_workers and vector_length
inside nested functions. This fixes an ICE that I hit inside a nested
fortran subroutine that was using a num_gangs clause on an acc parallel
construct.

I applied this patch to gomp-4_0-branch.

Cesar


[gomp4] nvptx reductions

2015-08-21 Thread Cesar Philippidis
This patch implements the goacc.reduction hook introduced here
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01130.html for the nvptx
target. Each new nvptx_goacc_reduction_* function is commented with a
description of how each internal function gets implemented for each
parallel dimension. Nathan goes into that in more detail in his design.

For the most part, all of the reductions use atomic operations with
spinlocks. Nvidia targets have different spinlock requirements than
pthreads on the host, so we couldn't recycle GOMP_atomic_start/end.
Well, we probably could have, but we decided it would be better to
introduce two new GOACC_LOCK and GOACC_UNLOCK functions, so that we
could inline those locks in the compiler. This has several advantages;
different loops can use different locks, and the locks for different
parallel dimensions can be optimized (e.g. worker locks are stored in
.shared memory). Placement of these locking functions happens in the
lowering code, so the nvptx backend (and really goacc.reduction in
general) expects the appropriate threads to evaluate the expanded
instructions.

The exception to the atomic reductions on nvptx targets is vectors.
Since spinlocks don't work for vectors on nvptx targets, we're forced to
implement a parallel-tree reduction using shfl.down instructions. You'll
note that nvptx_generate_vector_shuffle expands GOACC_REDUCTION_FINI into
an unrolled sequence of shfl.down instead of using a loop. A loop would be
nice here to handle the case where OACC_DIM_SIZE (GOMP_DIM_VECTOR) = 1.
However, we don't yet have the necessary infrastructure to ensure that
the branch in that loop will be unified. If the branch isn't unified,
bad things happen on the gpu when you try to synchronize all of the threads.
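
For readers unfamiliar with the shuffle-based tree reduction, here is a host-side emulation in plain C. It is only a sketch: the names are hypothetical, a warp size of 32 is assumed, and it uses a loop for brevity where the patch emits an unrolled sequence:

```c
#include <assert.h>
#include <string.h>

#define WARP_SIZE 32

/* Emulate one shfl.down step: lane I reads the value held by lane
   I + OFFSET.  Lanes that would read past the end keep their own value.  */
static void
shuffle_down (const int *in, int *out, int offset)
{
  for (int lane = 0; lane < WARP_SIZE; lane++)
    out[lane] = (lane + offset < WARP_SIZE) ? in[lane + offset] : in[lane];
}

/* Sum the WARP_SIZE values in VALS with a halving tree.  After the last
   step, lane 0 holds the total.  */
int
warp_reduce_sum (const int *vals)
{
  int cur[WARP_SIZE], shifted[WARP_SIZE];

  assert (vals != NULL);
  memcpy (cur, vals, sizeof cur);

  for (int offset = WARP_SIZE / 2; offset > 0; offset /= 2)
    {
      shuffle_down (cur, shifted, offset);
      for (int lane = 0; lane < WARP_SIZE; lane++)
        cur[lane] += shifted[lane];
    }

  return cur[0];
}

/* Convenience wrapper: reduce START, START + 1, ..., START + 31.  */
int
warp_reduce_ramp (int start)
{
  int vals[WARP_SIZE];
  for (int lane = 0; lane < WARP_SIZE; lane++)
    vals[lane] = start + lane;
  return warp_reduce_sum (vals);
}
```

Five halving steps (16, 8, 4, 2, 1) are enough for 32 lanes, and no lock is needed because every lane only combines its own value with one it received from a neighbor.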

Updating the ssa is kind of messy because nvptx_goacc_reduction_init
creates a couple of new basic blocks for the vectors. It also inserts a
call to GOACC_DIM_POS, so that's why oacc_transform may need to rescan
all of the basic blocks for internal functions.

Is this patch ok for gomp-4_0-branch after the infrastructure patch goes in?

Cesar
2015-08-20  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (enum nvptx_builtins): New enum.
	(NVPTX_BUILTIN_MAX): Delete.
	(nvptx_get_worker_red_addr_fn): New function.
	(nvptx_generate_vector_shuffle): New function.
	(nvptx_shuffle_reduction): New function.
	(nvptx_goacc_reduction_setup): New function.
	(nvptx_goacc_reduction_init): New function.
	(nvptx_goacc_reduction_fini): New function.
	(nvptx_goacc_reduction_teardown): New function.
	(nvptx_goacc_reduction): New function.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a05c767..26e28c1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -53,13 +53,27 @@
 #include "target.h"
 #include "diagnostic.h"
 #include "cfgrtl.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
 #include "stor-layout.h"
 #include "df.h"
 #include "dumpfile.h"
 #include "builtins.h"
 #include "dominance.h"
 #include "cfg.h"
+#include "tree-cfg.h"
 #include "omp-low.h"
+#include "fold-const.h"
+#include "stringpool.h"
+#include "internal-fn.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-ssa.h"
+#include "gimplify.h"
+#include "tree-phinodes.h"
+#include "ssa-iterators.h"
+#include "tree-ssanames.h"
+#include "tree-into-ssa.h"
 #include "gomp-constants.h"
 #include "gimple.h"
 
@@ -3443,6 +3457,20 @@ enum nvptx_types
 NT_MAX
   };
 
+/* Codes for all the NVPTX builtins.  */
+enum nvptx_builtins
+{
+  NVPTX_BUILTIN_SHUFFLE_DOWN,
+  NVPTX_BUILTIN_SHUFFLE_DOWNLL,
+  NVPTX_BUILTIN_SHUFFLE_DOWNF,
+  NVPTX_BUILTIN_SHUFFLE_DOWND,
+  NVPTX_BUILTIN_WORK_RED_ADDR,
+  NVPTX_BUILTIN_WORK_RED_ADDRLL,
+  NVPTX_BUILTIN_WORK_RED_ADDRF,
+  NVPTX_BUILTIN_WORK_RED_ADDRD,
+  NVPTX_BUILTIN_MAX
+};
+
 static const struct builtin_description builtins[] =
 {
   {"__builtin_nvptx_shuffle_down", NT_UINT_UINT_INT,
@@ -3463,8 +3491,6 @@ static const struct builtin_description builtins[] =
nvptx_expand_work_red_addr},
 };
 
-#define NVPTX_BUILTIN_MAX (sizeof (builtins) / sizeof (builtins[0]))
-
 static GTY(()) tree nvptx_builtin_decls[NVPTX_BUILTIN_MAX];
 
 /* Return the NVPTX builtin for CODE.  */
@@ -3642,6 +3668,549 @@ nvptx_xform_lock_unlock (gimple stmt, const int *ARG_UNUSED (dims),
   
   return TREE_INT_CST_LOW (arg)  GOMP_DIM_WORKER;
 }
+
+static tree
+nvptx_get_worker_red_addr_fn (tree var, tree rid, tree lid)
+{
+  tree vartype = TREE_TYPE (var);
+  tree fndecl, call;
+  enum nvptx_builtins fn;
+  machine_mode mode = TYPE_MODE (vartype);
+
+  switch (mode)
+    {
+    case QImode:
+    case HImode:
+    case SImode:
+      fn = NVPTX_BUILTIN_WORK_RED_ADDR;
+      break;
+    case DImode:
+      fn = NVPTX_BUILTIN_WORK_RED_ADDRLL;
+      break;
+    case DFmode:
+      fn = NVPTX_BUILTIN_WORK_RED_ADDRD;
+      break;
+    case SFmode:
+      fn = NVPTX_BUILTIN_WORK_RED_ADDRF;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  fndecl = nvptx_builtin_decl (fn, true);
+  call

[gomp4] don't use VLA's in the reduction tests

2015-08-21 Thread Cesar Philippidis
Nathan noticed that I was using VLAs in a couple of compile-time
reduction tests. That's bad because ptx doesn't support alloca. I
guess these tests used to pass because we were only running them on the
host.
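
To spell out why the macro matters in C: with 'const int n = 1000;' the
declaration 'int array[n]' is a VLA, because n is not an integer
constant expression in C. With a macro the array has a fixed size. A
small sketch, where N and fixed_array_bytes are names invented for this
illustration:

```c
#include <assert.h>
#include <stddef.h>

#define N 1000	/* a macro expands to an integer constant expression */

/* Return the size of a fixed-size local array.  _Static_assert only
   accepts constant expressions, and sizeof applied to a VLA is not one,
   so this function compiles only because ARRAY is not a VLA.  */
static size_t
fixed_array_bytes (void)
{
  int array[N];

  _Static_assert (sizeof array == N * sizeof (int),
		  "array must have a compile-time size");
  return sizeof array;
}
```

Replacing the macro with a 'const int' local would make the
_Static_assert ill-formed in C, which is a cheap way to catch an
accidental VLA.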

I'll apply this patch shortly.

Cesar
2015-08-21  Cesar Philippidis  ce...@codesourcery.com

	gcc/testsuite/
	* c-c++-common/goacc/reduction-1.c: Don't use VLA's in accelerated
	regions.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.


diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 8f7c70d..cec1dc8 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,11 +1,11 @@
 /* Integer reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-2.c b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 7ff125f..964c596 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,11 +1,11 @@
 /* float reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-3.c b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
index cd44559..edcdaf7 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,11 +1,11 @@
 /* double reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-4.c b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
index ec3a9c9..212266a 100644
--- a/gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,11 +1,11 @@
 /* complex reductions.  */
 
 #define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;


[gomp4] lowering OpenACC reductions

2015-08-21 Thread Cesar Philippidis
This patch teaches omplower how to utilize the new OpenACC reduction
framework described in Nathan's document, which was posted here
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01248.html. Here is the
infrastructure patch
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01130.html, and here's
the nvptx backend changes
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01334.html. The updated
reduction tests have been posted here
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01561.html.

The existing reduction code in gomp-4_0-branch is doing a couple of
quirky things, like creating a special ganglocal copy for the private
reduction variables. Those ganglocal variables were mapped into shared
memory for nvidia gpus and a special malloc'ed buffer for everything
else. That worked, but it was too target-specific and it didn't solve the
vector reduction problem. Part of this patch eliminates the need for
that ganglocal data, at least in the lowering code.

Looking at this patch, you might see a reference to fake gang
reductions. The idea behind that, which Nathan describes in his design
document, is that only gangs can access global data mappings, not
workers or vectors. This restriction allows us to cascade multiple
reductions with multiple levels of parallelism using a common interface.
Here's a worker reduction example taken from Nathan's design:

  //#pragma acc parallel loop worker copy(a) reduction (+:a)
  {
    // Insert dummy gang reduction at start.
    // Note this uses the same RID & LID as the inner worker loop.
    a = IFN_SETUP (ompstruct->a, a, GANG, +, 0, 0)
    a = IFN_INIT (ompstruct->a, a, GANG, +, 0, 0)
    #loop worker reduction(+:a)
    a = IFN_SETUP (NULL, a, WORKER, +, 0, 0)
    IFN_FORK (WORKER)
    a = IFN_INIT (NULL, a, WORKER, +, 0, 0)
    for (...) { ... }
    IFN_LOCK (WORKER, 0)
    a = IFN_FINI (NULL, a, WORKER, +, 0, 0)
    IFN_UNLOCK (WORKER, 0)
    IFN_JOIN (WORKER)
    a = IFN_TEARDOWN (NULL, a, WORKER, +, 0, 0)
    // Dummy gang reduction at end
    a = IFN_FINI (ompstruct->a, a, GANG, +, 0, 0)
    a = IFN_TEARDOWN (ompstruct->a, a, GANG, +, 0, 0)
  }

Note that while this loop doesn't have a gang associated with it, it
does have a fake gang reduction to update the original value. If 'a' was
private, then the gang reduction wouldn't be necessary.

Now for the reduction changes. Starting with the gimplifier, you'll note
that I introduced a function to rewrite reference-typed variables as
non-references. This was initially done to solve the problem with
fortran subroutines, but I'm also using it for reductions that are not
associated with loops (e.g. 'acc parallel reduction (+:foo) copy
(foo)'). The justification for this variable rewriting is as follows:

  * The gimplifier expands reference types to use indirection before it
reaches omplower. So if I were to wait for omplower to rewrite the
variable, I'd have to rewrite possibly three instructions instead of
just one. This solution is just a little more straightforward.

  * Non-loop reductions are kind of tricky. On one hand, we want the
global copy of the reduction variable to be mapped onto the
accelerator. On the other hand, we don't want the code inside the
parallel region to use the global copy. So that's why I introduced
a new copy of the reduction variable in the gimplifier.

The way that reductions work in acc loops is that each loop creates
a private copy of the reduction variable. Then when it comes time to
update the original global copy, the lowering code would get the
reference to the reduction variable in its parent omp_context.
There's no parent context for parallel constructs, so the private
copy of the reduction variable would be overwritten. Hence, the
gimplifier pass attaches a private variable to the omp clause itself.

If anyone has a better solution for either of these two problems,
let me know.

The next major change is that lower_omp_for is responsible for inserting
calls for GOACC_FORK and GOACC_JOIN. One thing that does concern me
about this change is that par-loops will need to become aware of that and
insert those calls as necessary. Technically, it should be ok for now
because par-loops doesn't support workers and vectors yet. But if we go
with this change, par-loops will need to be updated eventually.

Is this ok for gomp-4_0-branch?

Cesar
2015-08-21  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* gimplify.c (struct privatize_reduction): New struct.
	(localize_reductions_r): New function.
	(localize_reductions): New function.
	(gimplify_omp_for): Use it.
	(gimplify_omp_workshare): Likewise.
	* omp-low.c (struct omp_context): Remove reduction_map and
	oacc_reduction_set. Add 'int reductions'.
	(oacc_gang_reduction_init): New gimple_seq to contain initialization
	code for fake gang reductions.
	(oacc_gang_reduction_fini): Ditto, but for finalization code.
	(extract_oacc_loop_mask): New function.
	(is_oacc_reduction_private): New function

[gomp4] New reduction infrastructure for OpenACC

2015-08-19 Thread Cesar Philippidis
This patch introduces an infrastructure for reductions in OpenACC. This
infrastructure consists of four internal functions,
GOACC_REDUCTION_SETUP, GOACC_REDUCTION_INIT, GOACC_REDUCTION_FINI, and
GOACC_REDUCTION_TEARDOWN, along with a new target hook goacc.reduction.
Each internal function shares a common interface:

  var = ifn (*ref_to_res, local_var, level, op, lid, rid)

var is the intermediate and private result of the reduction. Usually,
var = local_var.

*ref_to_res is a pointer to the resulting reduction. This is only
non-NULL for gang reductions. All other reductions operate on local
variables for which var will suffice.

local_var is a local (private) copy of the reduction variable.

level is the GOMP_DIM of the reduction. Each function call may only
contain one dim. If a loop uses a combination of gang, worker and vector,
then ifn must be called once per dim.

op is the reduction operation.

lid is a unique loop ID. It's not 100% unique because it might get reset
in different TUs.

rid is the reduction ID within a loop. E.g., if a loop has two
reductions associated with it, the first could be designated zero and
the second one.

The target hook takes in one argument, the gimple statement containing
the call to the internal reduction function, and it returns true if it
introduces any calls to other target functions. This was necessary for
the nvptx backend, specifically for vector INIT because the thread ID is
necessary.

Each internal function is expanded during execute_oacc_transform using
that goacc reduction target hook. This allows us to generate
target-specific code while lowering it in a target-independent manner.

There are a couple of significant changes in this patch over the
existing OpenMP reduction implementation. The first change is that
reductions no longer rely on special ganglocal mappings. Certain
targets, such as nvptx gpus, have a distributed memory hierarchy. On
nvptx targets, all of the processors are partitioned into blocks. Each
block has a limited amount of shared memory. Because of the way the
OpenACC spec is written, we were initially mapping nvptx's shared memory into
gang-local memory. However, Nathan's worker and vector state propagator
is robust enough that we were able to eliminate the ganglocal mappings
altogether.

While this new infrastructure allows us to eliminate the ganglocal
mappings, nvptx still needs to use shared memory for worker reductions.
Consider the following example where red is private:

  #pragma acc loop worker reduction (+:red)
  for (...)
    red++;

This loop would expand to this during omp-lower:

  red = GOACC_REDUCTION_SETUP (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);
  GOACC_FORK (GOMP_DIM_WORKER);
  red = GOACC_REDUCTION_INIT (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);

  for (...)
    red++;

  red = GOACC_REDUCTION_FINI (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);
  GOACC_JOIN (GOMP_DIM_WORKER);
  red = GOACC_REDUCTION_TEARDOWN (NULL, red, GOMP_DIM_WORKER, '+', 0, 0);

For nvptx targets, SETUP and TEARDOWN are responsible for allocating and
freeing shared memory. INIT is responsible for initializing the private
reduction variable. This is necessary for vector reductions because we
want thread 0 to contain the original value of local_var, and the other
threads to be initialized to the proper value for 'op'. All of the
intermediate reduction results are combined in FINI and written back to
var or *ref_to_res, whichever is necessary, in TEARDOWN.
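
To make the division of labour concrete, here is a sequential host-side
model of the four phases for a '+' reduction. It is only a sketch under
simplifying assumptions: the red_* helpers, NUM_WORKERS and the single
static slot are invented for this illustration, the threads are run one
after another instead of in parallel, and the "thread 0 keeps the
original value" rule is applied to workers here even though the text
above motivates it with vectors.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_WORKERS 4	/* hypothetical worker count */

/* Stand-in for the .shared slot used by worker reductions.  */
static int shared_slot;
static bool slot_live;

/* SETUP: allocate the shared slot, seeded with the identity for '+'.  */
static int
red_setup (int local_var)
{
  assert (!slot_live);
  slot_live = true;
  shared_slot = 0;
  return local_var;
}

/* INIT: thread 0 keeps the original value, the other threads start
   from the identity of the operator.  */
static int
red_init (int local_var, unsigned tid)
{
  return tid == 0 ? local_var : 0;
}

/* FINI: fold one thread's private result into the shared slot.  On the
   real target this would run between the LOCK and UNLOCK calls.  */
static int
red_fini (int priv)
{
  assert (slot_live);
  shared_slot += priv;
  return priv;
}

/* TEARDOWN: read back the combined result and release the slot.  */
static int
red_teardown (void)
{
  assert (slot_live);
  slot_live = false;
  return shared_slot;
}

/* Run ITERATIONS increments of RED spread over NUM_WORKERS threads.  */
static int
simulate_worker_reduction (int red, unsigned iterations)
{
  int priv[NUM_WORKERS];

  red = red_setup (red);
  for (unsigned tid = 0; tid < NUM_WORKERS; tid++)
    priv[tid] = red_init (red, tid);

  for (unsigned i = 0; i < iterations; i++)
    priv[i % NUM_WORKERS]++;

  for (unsigned tid = 0; tid < NUM_WORKERS; tid++)
    red_fini (priv[tid]);

  return red_teardown ();
}
```

Because only thread 0 starts from the original value, the original
contribution is counted exactly once in the combined result.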

I don't want to delve too much into the use of this infrastructure right
now. We do have a design for that, and I intend to present more details
when I post the lowering patch. The next patch will likely be the nvptx
changes though.

One of the reasons why we needed to create this generic interface was to
implement vector reductions on nvptx targets. On nvptx targets, we're
mapping vectors to warps. That's fine, but warps cannot use spinlocks or
the warp will deadlock. As a consequence, we can't use the existing
OpenMP atomic reductions in OpenACC. The way I got around the spinlock
problem in 5.0 was by allocating an array of length vector_length, and
stashing all of the intermediate reductions in there. Then later on, one
thread would merge all of those reductions together.

This new reduction infrastructure provides a more elegant solution for
OpenACC reductions. And while we're still using atomic operations for
gang and worker reductions, we're no longer using a global lock for
workers. This api allows us to use a lock in shared memory for workers.
That said, this infrastructure does provide sufficient flexibility to
implement tree reductions for gangs and workers later on.

It should be noted that this is not a replacement for the existing
OpenMP reductions. Rather, OpenMP will continue to use
lower_reduction_clauses and friends, while OpenACC will use this
infrastructure. That said, OpenMP could be taught to use this infrastructure.

Is this patch OK for gomp-4_0-branch?

Thanks,
Cesar
2015-08-19  Cesar Philippidis

[gomp4] non-acc loop reductions implicit copy bugfix

2015-08-14 Thread Cesar Philippidis
This patch teaches the c and c++ front ends not to add a 'copy' clause
for each non-loop OpenACC reduction. Without this patch, this construct
would error with a duplicate mapping for 'sum':

  #pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)

While the intent behind adding a 'copy' clause for the reduction
variable was good, it is inconsistent with the behavior of the other
data clauses in the OpenACC spec. Without an explicit copy, 'sum' should
probably be transferred to the accelerator as firstprivate. Anyway, it's
probably better to correct this behavior in the gimplifier later on if
necessary.
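
The duplicate check boils down to keeping one "seen" set per clause
category, so a variable may show up once as data and once as a
reduction, but not twice in the same category. A self-contained sketch
of that idea follows; clause_sets and note_clause are hypothetical
names, and a 64-bit mask stands in for the bitmap_head keyed by
DECL_UID that the front ends actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum clause_kind
{
  CLAUSE_DATA,
  CLAUSE_REDUCTION,
  CLAUSE_KIND_MAX
};

/* One "seen" set per clause category, indexed by a small variable id.  */
struct clause_sets
{
  uint64_t seen[CLAUSE_KIND_MAX];
};

/* Record that variable UID appears in a clause of KIND.  Return false
   if it was already seen in the same category (a duplicate), true
   otherwise.  Other categories are not consulted.  */
static bool
note_clause (struct clause_sets *sets, enum clause_kind kind, unsigned uid)
{
  assert (uid < 64);
  uint64_t bit = UINT64_C (1) << uid;

  if (sets->seen[kind] & bit)
    return false;
  sets->seen[kind] |= bit;
  return true;
}
```

With this shape, 'copy (sum) reduction (+:sum)' is accepted, while
'reduction (+:sum) reduction (+:sum)' is rejected.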

I'll apply this patch to gomp-4_0-branch shortly.

Cesar
2015-08-14  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* c/c-typeck.c (c_finish_omp_clauses): Permit variables to appear
	in both OpenACC data and reduction clauses.
	* cp/semantics.c (finish_omp_clauses): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/parallel-reduction.c: New test.


diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index ed748c8..fe54799f 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12059,11 +12059,10 @@ tree
 c_finish_omp_clauses (tree clauses, bool oacc)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
-  bitmap_head aligned_head, oacc_data_head;
+  bitmap_head aligned_head, oacc_data_head, oacc_reduction_head;
   tree c, t, *pc;
   bool branch_seen = false;
   bool copyprivate_seen = false;
-  bool oacc_data = false;
   tree *nowait_clause = NULL;
 
   bitmap_obstack_initialize (NULL);
@@ -12072,12 +12071,15 @@ c_finish_omp_clauses (tree clauses, bool oacc)
   bitmap_initialize (lastprivate_head, bitmap_default_obstack);
   bitmap_initialize (aligned_head, bitmap_default_obstack);
   bitmap_initialize (oacc_data_head, bitmap_default_obstack);
+  bitmap_initialize (oacc_reduction_head, bitmap_default_obstack);
 
   for (pc = clauses, c = clauses; c ; c = *pc)
 {
   bool remove = false;
   bool need_complete = false;
   bool need_implicitly_determined = false;
+  bool oacc_data = false;
+  bool reduction = false;
 
   switch (OMP_CLAUSE_CODE (c))
 	{
@@ -12095,8 +12097,8 @@ c_finish_omp_clauses (tree clauses, bool oacc)
 	goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
-	  need_implicitly_determined = true;
-	  oacc_data = false;
+	  need_implicitly_determined = !oacc;
+	  reduction = true;
 	  t = OMP_CLAUSE_DECL (c);
 	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
 	      && (FLOAT_TYPE_P (TREE_TYPE (t))
@@ -12312,12 +12314,23 @@ c_finish_omp_clauses (tree clauses, bool oacc)
 	  else
 		bitmap_set_bit (oacc_data_head, DECL_UID (t));
 	}
+	  else if (reduction)
+	{
+	  if (oacc && bitmap_bit_p (oacc_reduction_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			"%qE appears in multiple reduction clauses", t);
+		  remove = true;
+		}
+	  else
+		bitmap_set_bit (oacc_reduction_head, DECL_UID (t));
+	}
 	  else
 	{
 	  if (bitmap_bit_p (generic_head, DECL_UID (t)))
 		{
 		  error_at (OMP_CLAUSE_LOCATION (c),
-			"%qE appears more than once in data clauses", t);
+			"%qE appears more than one non-data clause", t);
 		  remove = true;
 		}
 	  else
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 0b13ca2..cf1790c 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -5281,11 +5281,10 @@ tree
 finish_omp_clauses (tree clauses, bool oacc)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
-  bitmap_head aligned_head, oacc_data_head;
+  bitmap_head aligned_head, oacc_data_head, oacc_reduction_head;
   tree c, t, *pc;
   bool branch_seen = false;
   bool copyprivate_seen = false;
-  bool oacc_data = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (generic_head, bitmap_default_obstack);
@@ -5293,10 +5292,13 @@ finish_omp_clauses (tree clauses, bool oacc)
   bitmap_initialize (lastprivate_head, bitmap_default_obstack);
   bitmap_initialize (aligned_head, bitmap_default_obstack);
   bitmap_initialize (oacc_data_head, bitmap_default_obstack);
+  bitmap_initialize (oacc_reduction_head, bitmap_default_obstack);
 
   for (pc = clauses, c = clauses; c ; c = *pc)
 {
   bool remove = false;
+  bool oacc_data = false;
+  bool reduction = false;
 
   switch (OMP_CLAUSE_CODE (c))
 	{
@@ -5314,6 +5316,7 @@ finish_omp_clauses (tree clauses, bool oacc)
 	  if (oacc)
 	{
 	  oacc_data = false;
+	  reduction = true;
 	  goto check_dup_oacc;
 	}
 	  else
@@ -5426,6 +5429,17 @@ finish_omp_clauses (tree clauses, bool oacc)
 	  else
 		bitmap_set_bit (oacc_data_head, DECL_UID (t));
 	}
+	  else if (reduction)
+	{
+	  if (oacc && bitmap_bit_p (oacc_reduction_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			"%qE appears in multiple reduction clauses", t);
+		  remove = true;
+		}
+	  else
+		bitmap_set_bit (oacc_reduction_head, DECL_UID (t));
+	}
 	  else

Re: [gomp4] Redesign oacc_parallel launch API

2015-08-06 Thread Cesar Philippidis
On 07/28/2015 09:52 AM, Nathan Sidwell wrote:
> I've committed this patch to the gomp4 branch to redo the launch API.
> I'll post a version for trunk once the versioning patch gets approved &
> committed.
>
> This changes the API in a number of ways, allowing device-specific
> knowledge to be moved into the device compiler and out of the host
> compiler.
>
> Firstly, we attach a tuple of launch dimensions as an attribute to the
> offloaded function's 'oacc function' attribute.  These are the constant
> launch dimensions.  Dynamic dimensions get a zero for their slot in this
> list.  Further this list can be extended in the future to an alist keyed
> by device_type.
>
> Dynamic dimensions are computed on the host.  However, they are passed
> via variadic args to the GOACC_parallel function (which is renamed).  The
> variadic args are passed using key/value representation, and 3 keys are
> currently defined:
> END -- end of the variadic list
> DIM - set of runtime-computed dimensions.  Only the dynamic ones are
> passed.
> ASYNC_WAIT - an async and a set of waits (possibly zero).
>
> I have arranged for the key to have a slot that can later be filled by
> device_type, and hence support multiple device types.
>
> The constant dimensions can be used in expansion of the GOACC_nid
> function in the device compiler.  The device compiler could also process
> that list to select the device_type slot that is appropriate.
>
> For PTX the backend is augmented to emit the launch dimensions into the
> target data, from whence the ptx plugin can pick them up and overwrite
> with any dynamic ones passed in from the launch function.
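
To check my understanding of the key/value scheme, here is a toy parser
for such an argument list. Everything in it is an assumption made for
illustration: the key values, the struct, the layout of the ASYNC_WAIT
payload (async, wait count, waits) and the name parse_launch are not
taken from libgomp. The only rules borrowed from the description are
that a zero in the constant list marks a dynamic dimension, and that
only the dynamic dimensions follow a DIM key.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical key encodings; the real values are not shown here.  */
enum launch_key { KEY_END, KEY_DIM, KEY_ASYNC_WAIT };

#define NUM_DIMS 3	/* gang, worker, vector */

struct launch
{
  int dims[NUM_DIMS];
  int async;
  int num_waits;
};

/* Start from the constant dimensions in CONST_DIMS (zero means
   dynamic), then walk the key/value list up to KEY_END.  */
static void
parse_launch (struct launch *out, const int *const_dims, ...)
{
  va_list ap;

  for (int i = 0; i < NUM_DIMS; i++)
    out->dims[i] = const_dims[i];
  out->async = -1;
  out->num_waits = 0;

  va_start (ap, const_dims);
  for (;;)
    {
      int key = va_arg (ap, int);

      if (key == KEY_END)
	break;

      switch (key)
	{
	case KEY_DIM:
	  /* Only the dynamic dimensions are present in the list.  */
	  for (int i = 0; i < NUM_DIMS; i++)
	    if (const_dims[i] == 0)
	      out->dims[i] = va_arg (ap, int);
	  break;

	case KEY_ASYNC_WAIT:
	  out->async = va_arg (ap, int);
	  out->num_waits = va_arg (ap, int);
	  for (int w = 0; w < out->num_waits; w++)
	    (void) va_arg (ap, int);	/* wait ids are skipped here */
	  break;

	default:
	  assert (!"unknown launch key");
	  break;
	}
    }
  va_end (ap);
}
```

A call with constant dimensions {0, 8, 0} would then pass two values
after the DIM key, one for gang and one for vector.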

Looking at set_oacc_fn_attrib, it appears that const values are also
considered dynamic. See the attached test case for more info. Is that
the expected behavior? If not, I could take a look at this after I
finish my reduction patch.

Cesar
#include <stdio.h>

const int vl = 32;

int
main ()
{
  unsigned int red = 0;

#pragma acc parallel loop vector_length (vl) vector reduction (+:red) copy (red)
  for (int i = 0; i < 100; i++)
    red ++;

  printf ("red = %u\n", red);

  return 0;
}


Re: [gomp4] Worker reduction builtin

2015-08-06 Thread Cesar Philippidis
On 08/04/2015 04:50 AM, Nathan Sidwell wrote:

> +/* Worker reduction address expander.  */
> +static rtx
> +nvptx_expand_work_red_addr (tree exp, rtx target,
> +                            machine_mode ARG_UNUSED (mode),
> +                            int ignore)
>  {
> -  return nvptx_expand_lock_unlock (desc, exp, false);
> +  if (ignore)
> +    return target;
> +
> +  rtx loop_id = expand_expr (CALL_EXPR_ARG (exp, 0),
> +                             NULL_RTX, mode, EXPAND_NORMAL);
> +  rtx red_id = expand_expr (CALL_EXPR_ARG (exp, 1),
> +                            NULL_RTX, mode, EXPAND_NORMAL);
> +  gcc_assert (GET_CODE (loop_id) == CONST_INT
> +              && GET_CODE (red_id) == CONST_INT);
> +  gcc_assert (REG_P (target));
> +
> +  unsigned lid = (unsigned)UINTVAL (loop_id);
> +  unsigned rid = (unsigned)UINTVAL (red_id);
> +
> +  unsigned ix;
> +
> +  for (ix = 0; ix != loop_reds.length (); ix++)
> +    if (loop_reds[ix].id == lid)
> +      goto found_lid;
> +  /* Allocate a new loop.  */
> +  loop_reds.safe_push (loop_red (lid));
> + found_lid:
> +  loop_red loop = loop_reds[ix];
> +  for (ix = 0; ix != loop.vars.length (); ix++)
> +    if (loop.vars[ix].first == rid)
> +      goto found_rid;
> +
> +  /* Allocate a new var. */
> +  {
> +    tree type = TREE_TYPE (TREE_TYPE (exp));
> +    enum machine_mode mode = TYPE_MODE (type);
> +    unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
> +    unsigned off = loop.hwm;
> +
> +    if (align > worker_red_align)
> +      worker_red_align = align;
> +    off = (off + align - 1) & ~(align -1);
> +    loop.hwm = off + GET_MODE_SIZE (mode);
> +    loop.vars.safe_push (var_red_t (rid, off));
> +  }
> + found_rid:
> +
> +  /* Return offset into worker reduction array.  */
> +  unsigned offset = loop.vars[ix].second;
> +
> +  rtx addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (addr,
> +                  gen_rtx_PLUS (Pmode, worker_red_sym, GEN_INT (offset)));
> +  emit_insn (gen_rtx_SET (target,
> +                          gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
> +                                          UNSPEC_FROM_SHARED)));
> +  return target;
>  }
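
As an aside, the "allocate a new var" step above is a bump allocator
with a high-water mark: round the mark up to the variable's alignment,
hand out that offset, and advance the mark by the variable's size. A
standalone sketch of that arithmetic, assuming power-of-two alignments;
red_area, align_up and red_area_alloc are names invented for this
illustration:

```c
#include <assert.h>

/* Round OFF up to the next multiple of ALIGN, a power of two.  */
static unsigned
align_up (unsigned off, unsigned align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (off + align - 1) & ~(align - 1);
}

/* Bookkeeping for one reduction area: the high-water mark in bytes and
   the largest alignment requested so far.  */
struct red_area
{
  unsigned hwm;
  unsigned max_align;
};

/* Reserve SIZE bytes aligned to ALIGN and return their offset.  */
static unsigned
red_area_alloc (struct red_area *area, unsigned size, unsigned align)
{
  unsigned off = align_up (area->hwm, align);

  if (align > area->max_align)
    area->max_align = align;
  area->hwm = off + size;
  return off;
}
```

For example, a 4-byte variable followed by an 8-byte one lands at
offsets 0 and 8, leaving the mark at 16.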

Something is wrong over here. I'm seeing this ICE:

wred.c: In function ‘main._omp_fn.0’:
wred.c:9:9: error: unrecognizable insn:
 #pragma acc parallel loop vector_length (32) num_workers (32) worker
reduction (+:red) copy (red)
 ^
(insn 28 27 29 2 (set (reg:DI 59)
(plus:DI (symbol_ref:DI (__worker_red))
(const_int 0 [0]))) wred.c:9 -1
 (nil))

The attached patch fixes it by assigning worker_red_sym to a scratch
register. Is this OK for gomp-4_0-branch?

Cesar
2015-08-06  Cesar Philippidis  ce...@codesourcery.com

gcc/
	* config/nvptx/nvptx.c (nvptx_expand_work_red_addr): Use a
	scratch register for worker_red_sym.
	

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index e343e53..389e370 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3415,10 +3415,12 @@ nvptx_expand_work_red_addr (tree exp, rtx target,
 
   /* Return offset into worker reduction array.  */
   unsigned offset = loop.vars[ix].second;
-  
+
+  rtx base = gen_reg_rtx (Pmode);
   rtx addr = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (base, worker_red_sym));
   emit_move_insn (addr,
-		  gen_rtx_PLUS (Pmode, worker_red_sym, GEN_INT (offset)));
+		  gen_rtx_PLUS (Pmode, base, GEN_INT (offset)));
   emit_insn (gen_rtx_SET (target,
 			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
 	  UNSPEC_FROM_SHARED)));


Re: [gomp4] fix spinlock

2015-08-06 Thread Cesar Philippidis
On 08/06/2015 01:41 AM, Nathan Sidwell wrote:

> I've committed this to fix the spinlock problem Cesar fell over.  While
> there I added more checking on the worker dimension.

I hit a couple more bugs with the spinlocks. First, the address space
argument to membar wasn't being handled properly. Second,
nvptx_spinunlock should probably be using atom.exch instead of atom.cas.
Finally, ptxas complains about the period prefix to the atom
instructions. This patch addresses these problems.
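
The lock/unlock pairing I have in mind looks like the following when
written with C11 atomics on the host. This is only an analogy for the
PTX patterns, not the patterns themselves: spin_t and the spin_*
functions are made-up names, 0 and 1 are assumed to mean unlocked and
locked, and the default sequentially consistent ordering stands in for
the explicit membar.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unlocked, 1 = locked.  */
typedef atomic_uint spin_t;

/* One acquisition attempt, in the spirit of a compare-and-swap of 0
   to 1: succeeds only if the lock was free.  */
static bool
spin_try_lock (spin_t *lock)
{
  unsigned expected = 0;

  return atomic_compare_exchange_strong (lock, &expected, 1);
}

/* Spin until the lock is acquired.  */
static void
spin_lock (spin_t *lock)
{
  while (!spin_try_lock (lock))
    ;
}

/* Release with an unconditional exchange.  The holder already knows
   the lock is taken, so there is nothing to compare against; the old
   value is only used as a sanity check.  */
static void
spin_unlock (spin_t *lock)
{
  unsigned old = atomic_exchange (lock, 0);

  assert (old == 1);
}
```

The exchange in spin_unlock mirrors the switch from atom.cas to
atom.exch: the result register is a scratch value that is never
inspected on the real target.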

Is there a better way to allocate a scratch register for
nvptx_spinunlock, or is my solution ok as-is for gomp-4_0-branch?

Thanks,
Cesar



2015-08-06  Cesar Philippidis  ce...@codesourcery.com

	gcc/
	* config/nvptx/nvptx.c (nvptx_expand_lock_unlock): Pass an
	additional scratch register to gen_nvptx_spinlock.
	* config/nvptx/nvptx.md (nvptx_membar): Use %B for the address
	space operand.
	(nvptx_spinlock): Remove period prefix from atom.
	(nvptx_spinunlock): Take additional scratch register argument.
	Use atom.exch to update the lock.
	

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2013219..881aea4 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3327,7 +3327,7 @@ nvptx_expand_lock_unlock (tree exp, bool lock)
 label);
 }
   else
-pat = gen_nvptx_spinunlock (mem, space);
+pat = gen_nvptx_spinunlock (mem, space, gen_reg_rtx (SImode));
   emit_insn (pat);
   if (lock)
 emit_insn (barrier);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8cd8300..fb88c72 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1569,7 +1569,7 @@
   [(unspec_volatile [(match_operand:SI 0 const_int_operand )]
 		UNSPECV_MEMBAR)]
   
-  membar%M0;)
+  membar%B0;)
 
 ;; spinlock and unlock
 (define_insn nvptx_spinlock
@@ -1581,11 +1581,12 @@
   (match_operand:BI 3 register_operand =R)
   (label_ref (match_operand 4  ))])]

-   %4:\\t.atom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;)
+   %4:\\tatom%R1.cas.b32 %2,%0,0,1;setp.ne.u32 %3,%2,0;@%3 bra.uni %4;)
 
 (define_insn nvptx_spinunlock
[(unspec_volatile [(match_operand:SI 0 memory_operand m)
 		  (match_operand:SI 1 const_int_operand i)]
-		  UNSPECV_UNLOCK)]
+		  UNSPECV_UNLOCK)
+(match_operand:SI 2 register_operand =R)]

-   .atom%R1.cas.b32 %0,1,0;)
+   atom%R1.exch.b32 %2,%0,0;)

