This patch addresses the following problems with acc routines: * incorrectly permitting 'acc seq' loops to call gang, worker and vector routines
* lto-wrapper errors when a function or subroutine isn't marked as 'acc routine' The solution to the first problem is straightforward. It only required a small change to oacc_loop_fixed_partitions. The solution to the second problem is more involved, since it required changes to the fortran FE, gimplifier, the behavior of flag_generate_offload, and libgomp. Starting with the the fortran changes, this patch updates the way that the fortran FE handles the 'acc routine' attribute in modules. Before, it only recorded that a function was marked as an acc routine. With this patch, it now records the level of parallelism the routine has. This is necessary for the middle end to validate compatible parallelism between the loop calling the routine and the routine itself. The second set of changes involves teaching the gimplifier to error when it detects a function call to an non-acc routines inside an OpenACC offloaded region. Actually, I relaxed non-acc routines by excluding calls to builtin functions, including those prefixed with _gfortran_. Nvptx does have a newlib c library, and it also has a subset of libgfortran. Still, this solution is probably not optimal. Next, I had to modify the openacc header files in libgomp to mark acc_on_device as an acc routine. Unfortunately, this meant that I had to build the opeancc.mod module for gfortran with -fopenacc. But doing that, caused caused gcc to stream offloaded code to the openacc.o object file. So, I've updated the behavior of flag_generate_offload such that minus one indicates that the user specified -foffload=disable, and that will prevent gcc from streaming offloaded lto code. The alternative was to hack libtool to build libgomp with -foffload=disable. Is this patch OK for trunk? There are still a couple of other quirks with routines we'll need to address with a follow up patch. Namely, passing scalar dummy arguments causes to subroutines trips up the nvptx worker and vector state propagator if the actual argument is a local variable. That's because the nvptx state propagator only forwards the pointer to the worker and vector threads, and not the actual variable itself. Consequently, those pointers dereference garbage. This is a problem with pass-by-reference in general. Cesar
2016-06-15 Cesar Philippidis <ce...@codesourcery.com> gcc/ * cgraphunit.c (ipa_passes): Only stream offloaded code when flag_generate_offload is positive. (symbol_table::compile): Likewise. * common.opt (flag_generate_offload): Update comment on its usage. * gimplify.c (gimplify_call_expr): Verify that function calls inside OpenACC offloaded regions are 'acc routines'. * ipa-inline-analysis.c (inline_generate_summary): Update the usage of flag_generate_offload. * lto-streamer.c (gate_lto_out): Likewise. * omp-low.c (oacc_loop_fixed_partitions): Consider SEQ loop when validing loop parallelism restrictions. * opts.c (common_handle_option): Set x_flag_generate_offload to minus one with -foffload=disable. * passes.c (ipa_write_summaries): Update usage of flag_generate_offload. * toplev.c (compile_file): Likewise. * tree.c (free_lang_data): Likewise. gcc/fortran/ * gfortran.h (enum oacc_function): New enum. * module.c (oacc_function): New DECIO_MIO_NAME. (mio_symbol_attribute): Handle oacc_function attributes. * openmp.c (gfc_oacc_routine_dims): Use enum oacc_function to capture acc routine geometry. (gfc_match_oacc_routine): Update call to gfc_oacc_routine_dims. * symbol.c (oacc_function_types): New const mstring. * trans-decl.c (add_attributes_to_decl): Update handling of oacc_function. gcc/testsuite/ * c-c++-common/goacc/routine-3.c: Add test coverage for seq loops. * c-c++-common/goacc/routine-6.c: New test. * gfortran.dg/goacc/routine-7.f90: New test. * gfortran.dg/goacc/routine-8.f90: New test. libgomp/ * Makefile.am (openacc.lo): New target. (openacc.mod): Build with -fopenacc -foffload=disable. * Makefile.in: Regenerate. * openacc.f90 (function_on_device_h): Make 'acc routine seq'. * openacc.h (acc_on_device): Likewise. * openacc_lib.h (acc_on_device): Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-4.c: Filter out warning. * testsuite/libgomp.oacc-fortran/routine-7.f90: Update test case to properly utilize acc parallelism. diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 4bfcad7..5dd211c 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2292,12 +2292,12 @@ ipa_passes (void) } /* Some targets need to handle LTO assembler output specially. */ - if (flag_generate_lto || flag_generate_offload) + if (flag_generate_lto || flag_generate_offload > 0) targetm.asm_out.lto_start (); if (!in_lto_p) { - if (g->have_offload) + if (g->have_offload && flag_generate_offload > 0) { section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX; lto_stream_offload_p = true; @@ -2312,7 +2312,7 @@ ipa_passes (void) } } - if (flag_generate_lto || flag_generate_offload) + if (flag_generate_lto || flag_generate_offload > 0) targetm.asm_out.lto_end (); if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects)) @@ -2393,11 +2393,11 @@ symbol_table::compile (void) state = IPA; /* Offloading requires LTO infrastructure. */ - if (!in_lto_p && g->have_offload) + if (!in_lto_p && g->have_offload && flag_generate_offload >= 0) flag_generate_offload = 1; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_generate_lto || flag_generate_offload) + if (flag_generate_lto || flag_generate_offload > 0) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/common.opt b/gcc/common.opt index f0d7196..9560e08 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -73,7 +73,8 @@ int *param_values Variable int flag_generate_lto -; Nonzero if we should write GIMPLE bytecode for offload compilation. +; Positive if we should write GIMPLE bytecode for offload compilation. +; Negative if the user explicitly passed -foffload=disable. Variable int flag_generate_offload = 0 diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 0bb71cb..fac94ca 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -303,6 +303,15 @@ enum save_state { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT }; +/* Flags to keep track of ACC routine states. */ +enum oacc_function +{ OACC_FUNCTION_NONE = 0, + OACC_FUNCTION_SEQ, + OACC_FUNCTION_GANG, + OACC_FUNCTION_WORKER, + OACC_FUNCTION_VECTOR +}; + /* Strings for all symbol attributes. We use these for dumping the parse tree, in error messages, and also when reading and writing modules. In symbol.c. */ @@ -312,6 +321,7 @@ extern const mstring intents[]; extern const mstring access_types[]; extern const mstring ifsrc_types[]; extern const mstring save_status[]; +extern const mstring oacc_function_types[]; /* Enumeration of all the generic intrinsic functions. Used by the backend for identification of a function. */ @@ -862,7 +872,7 @@ typedef struct unsigned oacc_declare_link:1; /* This is an OpenACC acclerator function at level N - 1 */ - unsigned oacc_function:3; + ENUM_BITFIELD (oacc_function) oacc_function:3; /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES). */ unsigned ext_attr:EXT_ATTR_NUM; diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c index 6d3860e..e3ed2a0 100644 --- a/gcc/fortran/module.c +++ b/gcc/fortran/module.c @@ -2095,6 +2095,7 @@ DECL_MIO_NAME (procedure_type) DECL_MIO_NAME (ref_type) DECL_MIO_NAME (sym_flavor) DECL_MIO_NAME (sym_intent) +DECL_MIO_NAME (oacc_function) #undef DECL_MIO_NAME /* Symbol attributes are stored in list with the first three elements @@ -2116,6 +2117,8 @@ mio_symbol_attribute (symbol_attribute *attr) attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures); attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types); attr->save = MIO_NAME (save_state) (attr->save, save_status); + attr->oacc_function = MIO_NAME (oacc_function) (attr->oacc_function, + oacc_function_types); ext_attr = attr->ext_attr; mio_integer ((int *) &ext_attr); diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 2c92794..96fc2fd 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -1716,21 +1716,31 @@ gfc_match_oacc_cache (void) /* Determine the loop level for a routine. */ -static int +static oacc_function gfc_oacc_routine_dims (gfc_omp_clauses *clauses) { int level = -1; + oacc_function ret = OACC_FUNCTION_SEQ; if (clauses) { unsigned mask = 0; if (clauses->gang) - level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); + { + level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level); + ret = OACC_FUNCTION_GANG; + } if (clauses->worker) - level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); + { + level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level); + ret = OACC_FUNCTION_WORKER; + } if (clauses->vector) - level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); + { + level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level); + ret = OACC_FUNCTION_VECTOR; + } if (clauses->seq) level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level); @@ -1741,7 +1751,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses) if (level < 0) level = GOMP_DIM_MAX; - return level; + return ret; } match @@ -1834,7 +1844,7 @@ gfc_match_oacc_routine (void) &old_loc)) goto cleanup; gfc_current_ns->proc_name->attr.oacc_function - = gfc_oacc_routine_dims (c) + 1; + = gfc_oacc_routine_dims (c); } if (n) diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c index 0ee7dec..b1dd32b 100644 --- a/gcc/fortran/symbol.c +++ b/gcc/fortran/symbol.c @@ -87,6 +87,15 @@ const mstring save_status[] = minit ("IMPLICIT-SAVE", SAVE_IMPLICIT), }; +const mstring oacc_function_types[] = +{ + minit ("NONE", OACC_FUNCTION_NONE), + minit ("OACC_FUNCTION_SEQ", OACC_FUNCTION_SEQ), + minit ("OACC_FUNCTION_GANG", OACC_FUNCTION_GANG), + minit ("OACC_FUNCTION_WORKER", OACC_FUNCTION_WORKER), + minit ("OACC_FUNCTION_VECTOR", OACC_FUNCTION_VECTOR) +}; + /* This is to make sure the backend generates setup code in the correct order. */ diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index 2f5e434..0b8d638 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -1331,7 +1331,22 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list) { tree dims = NULL_TREE; int ix; - int level = sym_attr.oacc_function - 1; + int level = GOMP_DIM_MAX; + + switch (sym_attr.oacc_function) + { + case OACC_FUNCTION_GANG: + level = GOMP_DIM_GANG; + break; + case OACC_FUNCTION_WORKER: + level = GOMP_DIM_WORKER; + break; + case OACC_FUNCTION_VECTOR: + level = GOMP_DIM_VECTOR; + break; + case OACC_FUNCTION_SEQ: + default:; + } for (ix = GOMP_DIM_MAX; ix--;) dims = tree_cons (build_int_cst (boolean_type_node, ix >= level), diff --git a/gcc/gimplify.c b/gcc/gimplify.c index ae8b4fc..6a9ab3c 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -2697,6 +2697,26 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value) CALL_EXPR_FN (*expr_p) = build1 (NOP_EXPR, fnptrtype, CALL_EXPR_FN (*expr_p)); + /* Check if this function is being called from inside an OpenACC + offloaded region. If so, verify that this function has been + declared as an 'acc routine'. Defer loop parallelism geometry + checking until oacc_device_lower. */ + + const char *name = fndecl == NULL_TREE ? "" + : IDENTIFIER_POINTER (DECL_NAME (fndecl)); + gimplify_omp_ctx *ctx = gimplify_omp_ctxp; + if (ctx != NULL && (ctx->region_type == ORT_ACC + || ctx->region_type == ORT_ACC_PARALLEL + || ctx->region_type == ORT_ACC_KERNELS) + && !is_builtin_fn (fndecl) + && !(lang_GNU_Fortran () && strncmp (name, "_gfortran_", 10) == 0) + && get_oacc_fn_attrib (fndecl) == NULL_TREE) + { + error ("%qE is not an %<acc routine%>", fndecl); + CALL_EXPR_FN (*expr_p) = NULL_TREE; + return GS_OK; + } + return ret; } diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 5d67218..c37a5ab 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -4179,7 +4179,8 @@ inline_generate_summary (void) /* When not optimizing, do not bother to analyze. Inlining is still done because edge redirection needs to happen there. */ - if (!optimize && !flag_generate_lto && !flag_generate_offload && !flag_wpa) + if (!optimize && !flag_generate_lto && flag_generate_offload <= 0 + && !flag_wpa) return; if (!inline_summaries) diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c index bfde1fe..e1848be 100644 --- a/gcc/lto-streamer.c +++ b/gcc/lto-streamer.c @@ -309,7 +309,7 @@ lto_streamer_init (void) bool gate_lto_out (void) { - return ((flag_generate_lto || flag_generate_offload || in_lto_p) + return ((flag_generate_lto || flag_generate_offload > 0 || in_lto_p) /* Don't bother doing anything if the program has errors. */ && !seen_error ()); } diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 22e5909..7824048 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -19420,7 +19420,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask) { unsigned outermost = this_mask & -this_mask; - if (outermost && outermost <= outer_mask) + if ((outermost && outermost <= outer_mask) + || (this_mask && (loop->parent->flags & OLF_SEQ))) { if (noisy) { diff --git a/gcc/opts.c b/gcc/opts.c index e80331f..eec91f8 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -1930,6 +1930,7 @@ common_handle_option (struct gcc_options *opts, && (p[7] == ',' || p[7] == '\0')) { opts->x_flag_disable_hsa = true; + opts->x_flag_generate_offload = -1; break; } diff --git a/gcc/passes.c b/gcc/passes.c index 0565cfa..9c1b902 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -2531,7 +2531,7 @@ ipa_write_summaries (void) struct cgraph_node *node; struct cgraph_node **order; - if ((!flag_generate_lto && !flag_generate_offload) || seen_error ()) + if ((!flag_generate_lto && flag_generate_offload <= 0) || seen_error ()) return; select_what_to_stream (); diff --git a/gcc/testsuite/c-c++-common/goacc/routine-3.c b/gcc/testsuite/c-c++-common/goacc/routine-3.c index b322d26..fabae1f 100644 --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c @@ -49,7 +49,7 @@ main () int red = 0; #pragma acc parallel copy (red) { - /* Independent/seq loop tests. */ + /* Independent loop tests. */ #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" } for (int i = 0; i < 10; i++) red += gang (); @@ -62,6 +62,19 @@ main () for (int i = 0; i < 10; i++) red += vector (); + /* Seq loop tests. */ +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ + for (int i = 0; i < 10; i++) + red += gang (); /* { dg-error "incorrectly nested" } */ + +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ + for (int i = 0; i < 10; i++) + red += worker (); /* { dg-error "incorrectly nested" } */ + +#pragma acc loop seq reduction (+:red) /* { dg-message "containing loop" } */ + for (int i = 0; i < 10; i++) + red += vector (); /* { dg-error "incorrectly nested" } */ + /* Gang routine tests. */ #pragma acc loop gang reduction (+:red) /* { dg-message "containing loop" } */ for (int i = 0; i < 10; i++) diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c new file mode 100644 index 0000000..e95954b --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/routine-6.c @@ -0,0 +1,26 @@ +/* Test calls to non-routines. */ + +int +sum (int a, int b) +{ + return a + b; +} + +int +main () +{ + int c = 0, i; + +#pragma acc parallel loop reduction(+:c) + for (i = 0; i < 100; i++) + c += sum (i, i); /* { dg-error "'sum' is not an 'acc routine'" } */ + + /* Built-in functions are permitted. */ +#pragma acc parallel + { + if (c < 0) + __builtin_abort (); + } + + return 0; +} diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 new file mode 100644 index 0000000..27b08b5 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 @@ -0,0 +1,25 @@ +! Test calls to non-acc routines. + +program test + implicit none + integer c, i + + c = 0 + + !$acc parallel loop reduction(+:c) + do i = 0, 100 + c = c + sum (i, i) ! { dg-error "'sum' is not an 'acc routine'" } + end do + !$acc end parallel loop + + !$acc parallel + if (c .le. 0) call abort + !$acc end parallel + +contains + integer function sum(a, b) + integer a, b + sum = a + b + end function sum + +end program test diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 new file mode 100644 index 0000000..d2cb51a --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 @@ -0,0 +1,122 @@ +! Check routine calls with insufficient parallelism. + +! { dg-do compile } +! { dg-additional-options "-cpp -O0" } + +#define M 8 +#define N 32 + +program main + integer :: i + integer :: a(N) + integer :: b(M * N) + + do i = 1, N + a(i) = 0 + end do + + !$acc parallel copy (a) + !$acc loop seq + do i = 1, N + call seq (a) + end do + !$acc end parallel + + do i = 1, N + if (a(i) .ne.N) call abort + end do + + !$acc parallel copy (a) + !$acc loop seq ! { dg-message "containing loop here" } + do i = 1, N + call gang (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" } + end do + !$acc end parallel + + do i = 1, N + if (a(i) .ne. (N + (N * (-1 * i)))) call abort + end do + + do i = 1, N + b(i) = i + end do + + !$acc parallel copy (b) + !$acc loop seq ! { dg-message "containing loop here" } + do i = 1, N + call worker (b) ! { dg-error "incorrectly nested OpenACC loop parallelism" } + end do + !$acc end parallel + + do i = 1, N + if (b(i) .ne. N + i) call abort + end do + + do i = 1, N + a(i) = i + end do + + !$acc parallel copy (a) + !$acc loop seq ! { dg-message "containing loop here" } + do i = 1, N + call vector (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" } + end do + !$acc end parallel + + do i = 1, N + if (a(i) .ne. 0) call abort + end do + +contains + +subroutine vector (a) + !$acc routine vector + integer, intent (inout) :: a(N) + integer :: i + + !$acc loop vector + do i = 1, N + a(i) = a(i) - a(i) + end do + +end subroutine vector + +subroutine worker (b) + !$acc routine worker + integer, intent (inout) :: b(M*N) + integer :: i, j + + !$acc loop worker + do i = 1, N + !$acc loop vector + do j = 1, M + b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1 + end do + end do + +end subroutine worker + +subroutine gang (a) + !$acc routine gang + integer, intent (inout) :: a(N) + integer :: i + + !$acc loop gang + do i = 1, N + a(i) = a(i) - i + end do + +end subroutine gang + +subroutine seq (a) + !$acc routine seq + integer, intent (inout) :: a(M) + integer :: i + + do i = 1, N + a(i) = a(i) + 1 + end do + +end subroutine seq + +end program main diff --git a/gcc/toplev.c b/gcc/toplev.c index f51d2cb..47d8e2e 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -556,7 +556,7 @@ compile_file (void) We used to emit an undefined reference here, but this produces link errors if an object file with IL is stored into a shared library without invoking lto1. */ - if (flag_generate_lto || flag_generate_offload) + if (flag_generate_lto || flag_generate_offload > 0) { #if defined ASM_OUTPUT_ALIGNED_DECL_COMMON ASM_OUTPUT_ALIGNED_DECL_COMMON (asm_out_file, NULL_TREE, diff --git a/gcc/tree.c b/gcc/tree.c index fd0e692..e6712c2 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -6000,7 +6000,7 @@ free_lang_data (void) /* If we are the LTO frontend we have freed lang-specific data already. */ if (in_lto_p - || (!flag_generate_lto && !flag_generate_offload)) + || (!flag_generate_lto && flag_generate_offload <= 0)) return 0; /* Allocate and assign alias sets to the standard integer types diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am index a3e1c2b..085478b 100644 --- a/libgomp/Makefile.am +++ b/libgomp/Makefile.am @@ -87,8 +87,10 @@ omp_lib_kinds.mod: omp_lib.mod : openacc_kinds.mod: openacc.mod : -openacc.mod: openacc.lo - : +openacc.lo: openacc.f90 + $(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^ +openacc.mod: openacc.f90 + $(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $< %.mod: %.f90 $(FC) $(FCFLAGS) -fsyntax-only $< fortran.lo: libgomp_f.h diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in index 88c8517..baf0f8d 100644 --- a/libgomp/Makefile.in +++ b/libgomp/Makefile.in @@ -1286,8 +1286,10 @@ omp_lib_kinds.mod: omp_lib.mod : openacc_kinds.mod: openacc.mod : -openacc.mod: openacc.lo - : +openacc.lo: openacc.f90 + $(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^ +openacc.mod: openacc.f90 + $(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $< %.mod: %.f90 $(FC) $(FCFLAGS) -fsyntax-only $< fortran.lo: libgomp_f.h diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90 index 4b71489..98ba493 100644 --- a/libgomp/openacc.f90 +++ b/libgomp/openacc.f90 @@ -128,6 +128,7 @@ module openacc_internal function acc_on_device_h (d) import + !$acc routine seq integer (acc_device_kind) d logical acc_on_device_h end function @@ -719,6 +720,7 @@ end subroutine function acc_on_device_h (d) use openacc_internal, only: acc_on_device_l use openacc_kinds + !$acc routine seq integer (acc_device_kind) d logical acc_on_device_h if (acc_on_device_l (d) .eq. 1) then diff --git a/libgomp/openacc.h b/libgomp/openacc.h index 7ea8794..094db50 100644 --- a/libgomp/openacc.h +++ b/libgomp/openacc.h @@ -83,6 +83,9 @@ void acc_shutdown (acc_device_t) __GOACC_NOTHROW; #ifdef __cplusplus int acc_on_device (int __arg) __GOACC_NOTHROW; #else +#ifdef _OPENACC +#pragma acc routine seq +#endif int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW; #endif void *acc_malloc (size_t) __GOACC_NOTHROW; diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h index a3f94d7..d627857 100644 --- a/libgomp/openacc_lib.h +++ b/libgomp/openacc_lib.h @@ -142,6 +142,7 @@ interface acc_on_device function acc_on_device_h (devicetype) import acc_device_kind +!$acc routine seq logical acc_on_device_h integer (acc_device_kind) devicetype end function diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c index d6ff44d..02b1f15 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c @@ -78,7 +78,7 @@ main(int argc, char **argv) #pragma acc parallel copy (a[0:N]) { -#pragma acc loop seq +#pragma acc loop /* { dg-warning "insufficient partitioning" } */ for (i = 0; i < N; i++) gang (&a[0]); } diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 index 200188e..27cda44 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 @@ -1,121 +1,101 @@ +! Test acc routines. ! { dg-do run } -! { dg-additional-options "-cpp" } -#define M 8 -#define N 32 +module size + integer, parameter :: N = 32 +end module size program main + use size + implicit none + integer :: i integer :: a(N) - integer :: b(M * N) - do i = 1, N - a(i) = 0 - end do - - !$acc parallel copy (a) - !$acc loop seq - do i = 1, N - call seq (a) - end do + !$acc parallel + call seq (a) !$acc end parallel do i = 1, N - if (a(i) .ne.N) call abort + if (a(i) .ne. 4) call abort end do - !$acc parallel copy (a) - !$acc loop seq - do i = 1, N - call gang (a) - end do + !$acc parallel + call gang (a) !$acc end parallel do i = 1, N - if (a(i) .ne. (N + (N * (-1 * i)))) call abort + if (a(i) .ne. 3) call abort end do - do i = 1, N - b(i) = i - end do - - !$acc parallel copy (b) - !$acc loop seq - do i = 1, N - call worker (b) - end do + !$acc parallel + call worker (a) !$acc end parallel do i = 1, N - if (b(i) .ne. N + i) call abort - end do - - do i = 1, N - a(i) = i + if (a(i) .ne. 2) call abort end do - !$acc parallel copy (a) - !$acc loop seq - do i = 1, N - call vector (a) - end do + !$acc parallel + call vector (a) !$acc end parallel do i = 1, N - if (a(i) .ne. 0) call abort + if (a(i) .ne. 1) call abort end do contains subroutine vector (a) + use size + implicit none !$acc routine vector integer, intent (inout) :: a(N) integer :: i !$acc loop vector do i = 1, N - a(i) = a(i) - a(i) + a(i) = 1 end do - end subroutine vector -subroutine worker (b) +subroutine worker (a) + use size + implicit none !$acc routine worker - integer, intent (inout) :: b(M*N) - integer :: i, j + integer, intent (inout) :: a(N) + integer :: i !$acc loop worker do i = 1, N - !$acc loop vector - do j = 1, M - b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1 - end do + a(i) = 2 end do - end subroutine worker subroutine gang (a) + use size + implicit none !$acc routine gang integer, intent (inout) :: a(N) integer :: i !$acc loop gang do i = 1, N - a(i) = a(i) - i + a(i) = 3 end do - end subroutine gang subroutine seq (a) + use size + implicit none !$acc routine seq - integer, intent (inout) :: a(M) + integer, intent (inout) :: a(N) integer :: i do i = 1, N - a(i) = a(i) + 1 + a(i) = 4 end do - end subroutine seq end program main