On 11/05/2015 09:13 AM, Nathan Sidwell wrote:
> On 11/05/15 12:01, Thomas Schwinge wrote:
>
>> On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis
>> <ce...@codesourcery.com> wrote:
>>> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:
>
>>> Sorry, I must have mis-phrased it. The spec is unclear here. There are
>>> three possible ways to interpret 'acc parallel loop reduction':
>>>
>>> 1. acc parallel reduction
>>>    acc loop
>>
>> This is what you propose in your patch, but I don't think that makes
>> sense, or does it? I'm happy to learn otherwise, but in my current
>> understanding, a reduction clause needs to be attached (at least) to the
>> innermost construct where reductions are to be processed. (Let's also
>
> Correct, the above interpretation must be wrong.
>
>> consider multi-level gang/worker/vector loops/reductions.) So, either:
>>
>>> 2. acc parallel
>>>    acc loop reduction
>>
>> ... this, or even this:
>>
>>> 3. acc parallel reduction
>>>    acc loop reduction
>>
>> ..., which I'm not sure what the execution model implementation requires.
>> (Nathan?)
>
> interpretation #2 is sufficient, I think. However, both are lacking a
> 'copy (reduction_var)', clause as otherwise there's nothing changing the
> default data attribute of 'firstprivate' (working on that patch).
> Perhaps 'reduction' on 'parallel' is meant to imply that (because
> that's what makes sense), but the std doesn't say it.
>
> In summary it's probably safe to implement interpretation #3. That way
> we can implement the hypothesis that reductions at the outer construct
> imply copy.
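
Written out by hand, interpretation #3 of 'acc parallel loop reduction' would look roughly like the sketch below. The function name sum_array and the copyin clause are only assumptions for illustration, not something taken from the patch. The result is only visible after the region if the reduction on the parallel construct really does imply copy, which is the hypothesis above and not something the spec states.

```c
#include <stddef.h>

/* Hypothetical helper (not from the patch): sums a[0..n-1].
   The combined form
       #pragma acc parallel loop reduction(+:sum)
   is written out the way interpretation #3 splits it, with the
   reduction clause repeated on both constructs.  */
long
sum_array (const int *a, size_t n)
{
  long sum = 0;

  /* Assumption: copyin is added here only so the sketch is complete.  */
#pragma acc parallel copyin(a[0:n]) reduction(+:sum)
  {
#pragma acc loop reduction(+:sum)
    for (size_t i = 0; i < n; i++)
      sum += a[i];
  }

  return sum;
}
```

Built without OpenACC support the pragmas are ignored and the loop simply runs serially, so the function returns the same sum either way.
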
OK, #3 it is.

>> And while we're at it: the very same question also applies to the private
>> clause, which -- contrary to all other (as far as I remember) clauses --
>> also is applicable to both the parallel and loop constructs:
>>
>>   #pragma acc parallel loop private([...])
>>
>> ... is to be decomposed into which of the following:
>>
>>   #pragma acc parallel private([...])
>>   #pragma acc loop
>>
>>   #pragma acc parallel
>>   #pragma acc loop private([...])
>>
>>   #pragma acc parallel private([...])
>>   #pragma acc loop private([...])
>>
>> (There is no private clause allowed to be specified with the kernels
>> construct for what it's worth, but that doesn't mean we couldn't use it
>> internally, of course, if so required.)
>
> I think interpretation #2 or #3 make sense, and I suspect result in the
> same emitted code.

I'll probably go #2 here to make life easier with kernels.

After I make these changes (and the c++ template updates), I'll apply
them to trunk and backport them to gomp4. Thank you Jakub, Thomas and
Nathan for reviewing these patches.

Cesar
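
For reference, interpretation #2 for the private clause would look roughly like this when written out by hand, with private attached to the loop construct only. The function square_all, the scratch variable tmp and the copyin/copyout clauses are hypothetical and only there to make the sketch complete. They are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical example (not from the patch): out[i] = in[i] * in[i],
   using a scratch variable that every iteration needs its own copy of.
   The combined form
       #pragma acc parallel loop private(tmp)
   is written out the way interpretation #2 splits it, with the
   private clause on the loop construct only.  */
void
square_all (const int *in, int *out, size_t n)
{
  int tmp;

  /* Assumption: the data clauses are added only for completeness.  */
#pragma acc parallel copyin(in[0:n]) copyout(out[0:n])
  {
#pragma acc loop private(tmp)
    for (size_t i = 0; i < n; i++)
      {
        tmp = in[i];
        out[i] = tmp * tmp;
      }
  }
}
```

Built without OpenACC support the pragmas are ignored and the loop runs serially with the same result.
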