Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Cesar Philippidis
On 06/29/2016 07:11 AM, Thomas Schwinge wrote:

> Cesar, I have not yet fully digested this, but do I understand right that
> you're really fixing two issues here, that are related (OpenACC routines)
> but still can be addressed independently of each other?  Do I understand
> right that the first one, the "problems with acc routines [...]
> incorrectly permitting 'acc seq' loops to call gang, worker and vector
> routines" is just a Fortran front end patch?  If yes, please split that
> one out, so as to reduce the volume of remaining changes that remain to
> be discussed.

This patch addresses the following issues:

 1. Issues warnings when a non-acc routine function is called inside an
OpenACC offloaded region.

 2. It corrects a bug what was allowing seq loops to call gang, worker
and vector routines.

 3. It adds supports for acc routines in fortran modules (which I
noticed was missing when I added 'acc routine seq' to acc_on_device
in the fortran openacc include files).

I'll split these into separate patches.

> On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis 
>  wrote:
>> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
 The second set of changes involves teaching the gimplifier to error when
 it detects a function call to an non-acc routines inside an OpenACC
 offloaded region.
> 
> As I understand, that's the same problem as has been discussed before
> (Ilya CCed), and has recently again been filed in
>  "ICE in LTO1 when attempting NVPTX
> offloading (-fopenacc)", and  "ICE in LTO1
> with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
> threads include:
> ,
> ,
> .
> 
 Actually, I relaxed non-acc routines by excluding
 calls to builtin functions, including those prefixed with _gfortran_.
 Nvptx does have a newlib c library, and it also has a subset of
 libgfortran. Still, this solution is probably not optimal.
>>>
>>> I don't really like that, hardcoding prefixes or whatever is available
>>> (you have quite some subset of libc, libm etc. available too) in the
>>> compiler looks very hackish.  What is wrong with complaining during
>>> linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

OK. I'll drop the warning changes from my patch set then unless you want
to keep it.

> A few random comments on the patch:
> 
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -303,6 +303,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
>> +  OACC_FUNCTION_SEQ,
>> +  OACC_FUNCTION_GANG,
>> +  OACC_FUNCTION_WORKER,
>> +  OACC_FUNCTION_VECTOR
>> +};
> 
> What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
> as I can tell?

It's used by the fortran module code. It controls how parallelism gets
encoded in the .mod files.

>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>>  
>>  /* Determine the loop level for a routine.   */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_SEQ;
>>  
>>if (clauses)
>>  {
>>   

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Jakub Jelinek
On Wed, Jun 29, 2016 at 04:11:31PM +0200, Thomas Schwinge wrote:
> > >> Actually, I relaxed non-acc routines by excluding
> > >> calls to builtin functions, including those prefixed with _gfortran_.
> > >> Nvptx does have a newlib c library, and it also has a subset of
> > >> libgfortran. Still, this solution is probably not optimal.
> > > 
> > > I don't really like that, hardcoding prefixes or whatever is available
> > > (you have quite some subset of libc, libm etc. available too) in the
> > > compiler looks very hackish.  What is wrong with complaining during
> > > linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

Yes.  Well, OpenMP doesn't have different kinds of target functions, just
one.  And at least the current spec doesn't require that target regions or
declare target functions only call functions declared target, I guess mainly
because that would require that all the C/C++ headers are OpenMP aware and
declare everything that has the offloading counterpart.
For user code, of course users have to declare their routines, otherwise it
just can't be offloaded, and the implementation runtime is a very fuzzy
thing outside of the standard.

Jakub


Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-29 Thread Thomas Schwinge
Hi!

Cesar, I have not yet fully digested this, but do I understand right that
you're really fixing two issues here, that are related (OpenACC routines)
but still can be addressed independently of each other?  Do I understand
right that the first one, the "problems with acc routines [...]
incorrectly permitting 'acc seq' loops to call gang, worker and vector
routines" is just a Fortran front end patch?  If yes, please split that
one out, so as to reduce the volume of remaining changes that remain to
be discussed.

On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis  
wrote:
> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> > On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> >> The second set of changes involves teaching the gimplifier to error when
> >> it detects a function call to an non-acc routines inside an OpenACC
> >> offloaded region.

As I understand, that's the same problem as has been discussed before
(Ilya CCed), and has recently again been filed in
 "ICE in LTO1 when attempting NVPTX
offloading (-fopenacc)", and  "ICE in LTO1
with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
threads include:
,
,
.

> >> Actually, I relaxed non-acc routines by excluding
> >> calls to builtin functions, including those prefixed with _gfortran_.
> >> Nvptx does have a newlib c library, and it also has a subset of
> >> libgfortran. Still, this solution is probably not optimal.
> > 
> > I don't really like that, hardcoding prefixes or whatever is available
> > (you have quite some subset of libc, libm etc. available too) in the
> > compiler looks very hackish.  What is wrong with complaining during
> > linking of the offloaded code?

ACK.  Jakub, do I understand you correctly, that you basically say that
every function declaration that is in scope inside offloaded regions (for
example, GCC builtin functions, or standard library functions declared in
target compiler's header files) is permitted to be called in offloaded
regions, and the offloading compiler will then either be able to resolve
these (nvptx back end knows about trigonometric functions, for example,
and a lot of functions are available in the nvptx libc), or otherwise
error out during the offloading compilation (during linking), gracefully
without terminating the target compilation (that "gracefully" bit is
currently missing -- that's for another day).  That is, all such
functions are implicitly callable as OpenACC "seq" functions (which means
that they don't internally use gang/worker/vector parallelism).  In
particular, all these functions do *not* need to be marked with an
explicit "#pragma acc routine seq" directive.  (Functions internally
using gang/worker/vector parallelism will need to be marked
appropriately, using a "#pragma acc routine gang/worker/vector"
directive.)  That's how I understand your comment above, and your earlier
comments on this topic, and also is what I think should be done.

> Wouldn't the error get reported multiple times then, i.e. once per
> target? Then again, maybe this error could have been restrained to the
> host compiler.

That's not something I would care about right now.  :-)

> Anyway, this patch now reduces that error to a warning. Furthermore,
> that warning is being thrown in lower_omp_1 instead of
> gimplify_call_expr because the latter is called multiple times and that
> causes duplicate warnings. The only bit of fallout I had with this
> change was with the fortran FE's usage of BUILT_IN_EXPECT in
> gfc_{un}likely. Since these are generated implicitly by the FE, I just
> added an oacc_function attribute to those calls when flag_openacc is set.
> 
> >> Next, I had to modify the openacc header files in libgomp to mark
> >> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> >> build the opeancc.mod module for gfortran with -fopenacc. But doing
> >> that, caused caused gcc to stream offloaded code to the openacc.o object
> >> file. So, I've updated the behavior of flag_generate_offload such that
> >> minus one indicates that the user specified -foffload=disable, and that
> >> will prevent gcc from streaming offloaded lto code. The alternative was
> >> to hack libtool to build libgomp with -foffload=disable.
> > 
> > This also looks wrong.  I'd say the right thing is when loading modules
> > that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> > handled this well) into CU with the corresponding flags unset (-fopenacc,
> > -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> > IMHO the module loading code should just ignore it, pretend it wasn't 

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-23 Thread Cesar Philippidis
On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>> The second set of changes involves teaching the gimplifier to error when
>> it detects a function call to an non-acc routines inside an OpenACC
>> offloaded region. Actually, I relaxed non-acc routines by excluding
>> calls to builtin functions, including those prefixed with _gfortran_.
>> Nvptx does have a newlib c library, and it also has a subset of
>> libgfortran. Still, this solution is probably not optimal.
> 
> I don't really like that, hardcoding prefixes or whatever is available
> (you have quite some subset of libc, libm etc. available too) in the
> compiler looks very hackish.  What is wrong with complaining during
> linking of the offloaded code?

Wouldn't the error get reported multiple times then, i.e. once per
target? Then again, maybe this error could have been restrained to the
host compiler.

Anyway, this patch now reduces that error to a warning. Furthermore,
that warning is being thrown in lower_omp_1 instead of
gimplify_call_expr because the latter is called multiple times and that
causes duplicate warnings. The only bit of fallout I had with this
change was with the fortran FE's usage of BUILT_IN_EXPECT in
gfc_{un}likely. Since these are generated implicitly by the FE, I just
added an oacc_function attribute to those calls when flag_openacc is set.

>> Next, I had to modify the openacc header files in libgomp to mark
>> acc_on_device as an acc routine. Unfortunately, this meant that I had to
>> build the opeancc.mod module for gfortran with -fopenacc. But doing
>> that, caused caused gcc to stream offloaded code to the openacc.o object
>> file. So, I've updated the behavior of flag_generate_offload such that
>> minus one indicates that the user specified -foffload=disable, and that
>> will prevent gcc from streaming offloaded lto code. The alternative was
>> to hack libtool to build libgomp with -foffload=disable.
> 
> This also looks wrong.  I'd say the right thing is when loading modules
> that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> handled this well) into CU with the corresponding flags unset (-fopenacc,
> -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> IMHO the module loading code should just ignore it, pretend it wasn't there.
> Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> could be in the LTO inputs.

This required two changes. First, I had to teach lto-cgraph.c how to
report an error rather then fail an assert when partitions are missing
decls. Second, I taught the lto wrapper how to stream offloaded code on
the absence of -fopen*. The only kink with this approach is that I had
to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
bootstrap failures.

By the way, Thomas, I've added

 #pragma acc routine(__builtin_acc_on_device) seq

to openacc.h. Is this OK, or should I just modify the various
libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
another option is to have the compiler add that attribute directly. I
don't think we're really expecting the end user to use
__builtin_acc_on_device directly since this is a gcc-ism.

Cesar
2016-06-23  Cesar Philippidis  

	gcc/
	* lto-cgraph.c (input_overwrite_node): Error on missing symbols.
	(input_varpool_node): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Don't stream
	offloaded images without -fopenacc, -fopenmp or -fopenmp-simd.
	(run_gcc): Set flag_openacc, flag_openmp, and flag_openmp_simd.
	* omp-low.c (lower_omp_1): Emit a warning when calling a function
	that doesn't have an oacc_function attribute from an OpenACC offloaded
	region.
	(oacc_loop_fixed_partitions): Consider SEQ loops when checking
	parallelism.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	(oacc_function_types): Declare.
	(symbol_attribute): Add oacc_function field.
	(gfc_intrinsic_sym): Likewise.
	(add_omp_offloading_attributes): Declare.
	* intrinsic.c (add_sym): Initialize oacc_fuction to zero.
	(gfc_intrinsic_sub_interface): Set attr.oacc_function as to
	OACC_FUNCTION_SEQ in the resolved symbol when appropriate.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set attr->oacc_function.
	* openmp.c (gfc_oacc_routine_dims): Change return type to oacc_function.
	(gfc_match_oacc_routine): Permit named 'acc routine' directives on
	intrinsic procedures.  Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): Define.
	* trans-decl.c (add_omp_offloading_attributes): New function.
	(add_attributes_to_decl): Use it.
	* trans.c (gfc_unlikely): Mark calls BUILT_IN_EXPECT as 'acc routines'
	with flag_openacc is set.
	(gfc_likely): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-1.c: Add warnings to calls to
	__builtin_abort.
	* c-c++-common/goacc/parallel-1.c: Likewise.
	* c-c++-common/goacc/routine-3.c: Add coverage for acc seq 

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-17 Thread Jakub Jelinek
On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> The second set of changes involves teaching the gimplifier to error when
> it detects a function call to an non-acc routines inside an OpenACC
> offloaded region. Actually, I relaxed non-acc routines by excluding
> calls to builtin functions, including those prefixed with _gfortran_.
> Nvptx does have a newlib c library, and it also has a subset of
> libgfortran. Still, this solution is probably not optimal.

I don't really like that, hardcoding prefixes or whatever is available
(you have quite some subset of libc, libm etc. available too) in the
compiler looks very hackish.  What is wrong with complaining during
linking of the offloaded code?

> Next, I had to modify the openacc header files in libgomp to mark
> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> build the opeancc.mod module for gfortran with -fopenacc. But doing
> that, caused caused gcc to stream offloaded code to the openacc.o object
> file. So, I've updated the behavior of flag_generate_offload such that
> minus one indicates that the user specified -foffload=disable, and that
> will prevent gcc from streaming offloaded lto code. The alternative was
> to hack libtool to build libgomp with -foffload=disable.

This also looks wrong.  I'd say the right thing is when loading modules
that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
handled this well) into CU with the corresponding flags unset (-fopenacc,
-fopenmp, -fopenmp-simd here, depending on which bit it is), then
IMHO the module loading code should just ignore it, pretend it wasn't there.
Similarly e.g. to how lto1 with -g0 should ignore debug statements that
could be in the LTO inputs.

Jakub


[PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-15 Thread Cesar Philippidis
This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and
   vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as
   'acc routine'

The solution to the first problem is straightforward. It only required a
small change to oacc_loop_fixed_partitions. The solution to the second
problem is more involved, since it required changes to the fortran FE,
gimplifier, the behavior of flag_generate_offload, and libgomp.

Starting with the the fortran changes, this patch updates the way that
the fortran FE handles the 'acc routine' attribute in modules. Before,
it only recorded that a function was marked as an acc routine. With this
patch, it now records the level of parallelism the routine has. This is
necessary for the middle end to validate compatible parallelism between
the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when
it detects a function call to an non-acc routines inside an OpenACC
offloaded region. Actually, I relaxed non-acc routines by excluding
calls to builtin functions, including those prefixed with _gfortran_.
Nvptx does have a newlib c library, and it also has a subset of
libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark
acc_on_device as an acc routine. Unfortunately, this meant that I had to
build the opeancc.mod module for gfortran with -fopenacc. But doing
that, caused caused gcc to stream offloaded code to the openacc.o object
file. So, I've updated the behavior of flag_generate_offload such that
minus one indicates that the user specified -foffload=disable, and that
will prevent gcc from streaming offloaded lto code. The alternative was
to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines we'll need to
address with a follow up patch. Namely, passing scalar dummy arguments
causes to subroutines trips up the nvptx worker and vector state
propagator if the actual argument is a local variable. That's because
the nvptx state propagator only forwards the pointer to the worker and
vector threads, and not the actual variable itself. Consequently, those
pointers dereference garbage. This is a problem with pass-by-reference
in general.

Cesar

2016-06-15  Cesar Philippidis  

	gcc/
	* cgraphunit.c (ipa_passes): Only stream offloaded code when
	flag_generate_offload is positive.
	(symbol_table::compile): Likewise.
	* common.opt (flag_generate_offload): Update comment on its usage.
	* gimplify.c (gimplify_call_expr): Verify that function calls inside
	OpenACC offloaded regions are 'acc routines'.
	* ipa-inline-analysis.c (inline_generate_summary): Update the usage of
	flag_generate_offload.
	* lto-streamer.c (gate_lto_out): Likewise.
	* omp-low.c (oacc_loop_fixed_partitions): Consider SEQ loop when
	validing loop parallelism restrictions.
	* opts.c (common_handle_option): Set x_flag_generate_offload to minus
	one with -foffload=disable.
	* passes.c (ipa_write_summaries): Update usage of flag_generate_offload.
	* toplev.c (compile_file): Likewise.
	* tree.c (free_lang_data):  Likewise.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	* module.c (oacc_function): New DECIO_MIO_NAME.
	(mio_symbol_attribute): Handle oacc_function attributes.
	* openmp.c (gfc_oacc_routine_dims): Use enum oacc_function to capture
	acc routine geometry.
	(gfc_match_oacc_routine): Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): New const mstring.
	* trans-decl.c (add_attributes_to_decl): Update handling of
	oacc_function.

	gcc/testsuite/
	* c-c++-common/goacc/routine-3.c: Add test coverage for seq loops.
	* c-c++-common/goacc/routine-6.c: New test.
	* gfortran.dg/goacc/routine-7.f90: New test.
	* gfortran.dg/goacc/routine-8.f90: New test.

	libgomp/
	* Makefile.am (openacc.lo): New target.
	(openacc.mod): Build with -fopenacc -foffload=disable.
	* Makefile.in: Regenerate.
	* openacc.f90 (function_on_device_h): Make 'acc routine seq'.
	* openacc.h (acc_on_device): Likewise.
	* openacc_lib.h (acc_on_device): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Filter out warning.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Update test case to
	properly utilize acc parallelism.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..5dd211c 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2292,12 +2292,12 @@ ipa_passes (void)
 }
 
   /* Some targets need to handle LTO assembler output specially.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
 targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
 {
-  if (g->have_offload)
+  if (g->have_offload && flag_generate_offload > 0)
 	{
 	  section_name_prefix =