Hi,

The test-case included in this patch contains this target region:
...
  for (int i0 = 0 ; i0 < N0 ; i0++ )
    counter_N0.i += 1;
...

When running with an nvptx accelerator, the counter variable is expected
to be N0 after the region, but instead is N0 / 32.

The problem is that rather than getting the result for all warp lanes, we
get it for just one lane.

This is caused by the implementation of SIMT being incomplete.  It handles
regular reductions, but apparently not user-defined reductions.

For now, make this explicit by erroring out for nvptx, like this:
...
target-44.c: In function 'main':
target-44.c:20:9: error: SIMT reduction not fully implemented
...

Tested libgomp on x86_64-linux with and without nvptx accelerator.

Any comments?

Thanks,
- Tom

[openmp, simt] Error out for user-defined reduction

gcc/ChangeLog:

2021-05-03  Tom de Vries  <tdevr...@suse.de>

    PR target/100321
    * omp-low.c (lower_rec_input_clauses): Error out for user-defined
    reduction for SIMT.

libgomp/ChangeLog:

2021-05-03  Tom de Vries  <tdevr...@suse.de>

    PR target/100321
    * testsuite/libgomp.c/target-44.c: New test.

---
 gcc/omp-low.c                           |  2 ++
 libgomp/testsuite/libgomp.c/target-44.c | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 7b122059c6e..0f122857a3a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6005,6 +6005,8 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
               tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
               gimple *tseq;
               tree ptype = TREE_TYPE (placeholder);
+              if (sctx.is_simt)
+                error ("SIMT reduction not fully implemented");
               if (cond)
                 {
                   x = error_mark_node;
diff --git a/libgomp/testsuite/libgomp.c/target-44.c b/libgomp/testsuite/libgomp.c/target-44.c
new file mode 100644
index 00000000000..497931cd14c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-44.c
@@ -0,0 +1,28 @@
+/* { dg-do link { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-error "SIMT reduction not fully implemented" "" { target { offload_target_nvptx } } 0 } */
+#include <stdlib.h>
+
+struct s
+{
+  int i;
+};
+
+#pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i)
+
+int
+main (void)
+{
+  const int N0 = 32768;
+
+  struct s counter_N0 = { 0 };
+#pragma omp target
+#pragma omp for simd reduction(+: counter_N0)
+  for (int i0 = 0 ; i0 < N0 ; i0++ )
+    counter_N0.i += 1;
+
+  if (counter_N0.i != N0)
+    abort ();
+
+  return 0;
+}
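Not part of the patch, just an illustration of the N0 / 32 symptom described in the cover letter. This is a minimal host-side sketch in plain C, assuming a warp of 32 lanes where each lane takes every 32nd iteration and keeps a private copy of the reduction variable. It only models the arithmetic, not the actual SIMT lowering in omp-low.c. The names WARP_SIZE, simulate_lanes, combine_all_lanes and lane0_only are hypothetical and do not exist in gcc or libgomp.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumed warp width; hypothetical constant */

struct s
{
  int i;
};

/* Model: lane L handles iterations L, L + WARP_SIZE, L + 2 * WARP_SIZE, ...
   and accumulates into its own private copy, as in the test-case loop.  */
static void
simulate_lanes (int n, struct s priv[WARP_SIZE])
{
  assert (n >= 0);
  for (int lane = 0; lane < WARP_SIZE; lane++)
    {
      priv[lane].i = 0;
      for (int i0 = lane; i0 < n; i0 += WARP_SIZE)
        priv[lane].i += 1;
    }
}

/* Expected behaviour: apply the user-defined combiner
   (omp_out.i += omp_in.i) across the private copies of all lanes.  */
static int
combine_all_lanes (const struct s priv[WARP_SIZE])
{
  struct s out = { 0 };
  for (int lane = 0; lane < WARP_SIZE; lane++)
    out.i += priv[lane].i;
  return out.i;
}

/* Symptom as described: only a single lane's private copy ends up in
   the result.  */
static int
lane0_only (const struct s priv[WARP_SIZE])
{
  return priv[0].i;
}
```

Under these assumptions, calling simulate_lanes (32768, priv) on an array of WARP_SIZE elements makes combine_all_lanes (priv) return 32768 and lane0_only (priv) return 1024, which is the N0 / 32 value seen with the nvptx accelerator.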