[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 Tom de Vries changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |12.0 Resolution|--- |FIXED --- Comment #7 from Tom de Vries --- Patch committed, marking resolved-fixed.
[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 --- Comment #6 from CVS Commits --- The master branch has been updated by Tom de Vries : https://gcc.gnu.org/g:f87990a2a8fc9e20d30462a0a4c9047582af0cd9 commit r12-395-gf87990a2a8fc9e20d30462a0a4c9047582af0cd9 Author: Tom de Vries Date: Mon May 3 11:36:14 2021 +0200 [openmp, simt] Disable SIMT for user-defined reduction The test-case included in this patch contains this target region: ... for (int i0 = 0 ; i0 < N0 ; i0++ ) counter_N0.i += 1; ... When running with nvptx accelerator, the counter variable is expected to be N0 after the region, but instead is N0 / 32. The problem is that rather than getting the result for all warp lanes, we get it for just one lane. This is caused by the implementation of SIMT being incomplete. It handles regular reductions, but appearantly not user-defined reductions. For now, handle this by disabling SIMT in this case, specifically by setting sctx->max_vf to 1. Tested libgomp on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2021-05-03 Tom de Vries PR target/100321 * omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined reduction. libgomp/ChangeLog: 2021-05-03 Tom de Vries PR target/100321 * testsuite/libgomp.c/target-44.c: New test.
[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 --- Comment #5 from Tom de Vries --- (In reply to Tom de Vries from comment #4) > So, something like this reflects the current state: > ... > diff --git a/gcc/omp-low.c b/gcc/omp-low.c > index 7b122059c6e..a0561800977 100644 > --- a/gcc/omp-low.c > +++ b/gcc/omp-low.c > @@ -6005,6 +6005,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq > *ilist, gimple_seq *dlist, > tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c); > gimple *tseq; > tree ptype = TREE_TYPE (placeholder); > + if (sctx.is_simt) > + { > + sorry ("SIMT not fully implemented"); > + abort (); > + } > if (cond) > { > x = error_mark_node; > ... Submitted patch that does something similar (but using error rather than sorry/abort) @ https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569421.html .
[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 --- Comment #4 from Tom de Vries --- During lower_rec_input_clauses in omp-low.c, the reduction clause is handled: ... case OMP_CLAUSE_REDUCTION: case OMP_CLAUSE_IN_REDUCTION: /* OpenACC reductions are initialized using the GOACC_REDUCTION internal function. */ if (is_gimple_omp_oacc (ctx->stmt)) break; if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c)) ... AFAICT, the problem is that the the SIMT handling code is added only in the !OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) case. For this test-case, the OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) path is taken instead. So, something like this reflects the current state: ... diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 7b122059c6e..a0561800977 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -6005,6 +6005,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist, tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c); gimple *tseq; tree ptype = TREE_TYPE (placeholder); + if (sctx.is_simt) + { + sorry ("SIMT not fully implemented"); + abort (); + } if (cond) { x = error_mark_node; ...
[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 --- Comment #3 from Tom de Vries --- C example: ... /* { dg-additional-options "-foffload=-latomic" } */ #include struct s { int i; }; #pragma omp declare reduction(+: struct s: omp_out.i += omp_in.i) int main (void) { const int N0 = 32768; printf ("Expected: %d\n", N0); struct s counter_N0 = { 0 }; #pragma omp target #pragma omp for simd reduction(+: counter_N0) for (int i0 = 0 ; i0 < N0 ; i0++ ) counter_N0.i += 1; printf ("Got : %d\n", counter_N0.i); return 0; } ...
[Bug target/100321] [OpenMP][nvptx, SIMT] (Con't) Reduction fails with optimization and 'loop'/'for simd' but not with 'for'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100321 Tom de Vries changed: What|Removed |Added Summary|[OpenMP][nvptx] (Con't) |[OpenMP][nvptx, SIMT] |Reduction fails with|(Con't) Reduction fails |optimization and|with optimization and |'loop'/'for simd' but not |'loop'/'for simd' but not |with 'for' |with 'for' --- Comment #2 from Tom de Vries --- FTR, example minimized to: ... // { dg-additional-options "-foffload=-latomic" } #include #include #include #include using std::complex; #pragma omp declare reduction(+: complex: omp_out += omp_in) int main (void) { const int N0 { 32768 }; const complex expected_value { N0, 0 }; complex counter_N0 { 0, 0 }; #pragma omp target #pragma omp for simd reduction(+: counter_N0) for (int i0 = 0 ; i0 < N0 ; i0++ ) counter_N0 += complex { 1, 0 }; std::cerr << "Expected: " << expected_value << std::endl; std::cerr << "Got : " << counter_N0 << std::endl; return 0; } ...