On 11/05/2015 09:13 AM, Nathan Sidwell wrote:
> On 11/05/15 12:01, Thomas Schwinge wrote:
> 
>> On Thu, 5 Nov 2015 06:47:58 -0800, Cesar Philippidis
>> <ce...@codesourcery.com> wrote:
>>> On 11/05/2015 04:14 AM, Thomas Schwinge wrote:
> 
>>> Sorry, I must have mis-phrased it. The spec is unclear here. There are
>>> three possible ways to interpret 'acc parallel loop reduction':
>>>
>>>    1. acc parallel reduction
>>>       acc loop
>>
>> This is what you propose in your patch, but I don't think that makes
>> sense, or does it?  I'm happy to learn otherwise, but in my current
>> understanding, a reduction clause needs to be attached (at least) to the
>> innermost construct where reductions are to be processed.  (Let's also
> 
> Correct, the  above interpretation must be wrong.
> 
>> consider multi-level gang/worker/vector loops/reductions.)  So, either:
>>
>>>    2. acc parallel
>>>       acc loop reduction
>>
>> ... this, or even this:
>>
>>>    3. acc parallel reduction
>>>       acc loop reduction
>>
>> ..., which I'm not sure what the execution model implementation requires.
>> (Nathan?)
> 
> interpretation #2 is sufficient, I think. However, both are lacking a
> 'copy (reduction_var)', clause as otherwise there's nothing changing the
> default data attribute of 'firstprivate' (working on that patch). 
> Perhaps 'reduction' on 'parallel'  is meant to imply that  (because
> that's what makes sense), but the std doesn't say it.
> 
> In summary it's probably safe to implement interpretation #3.  That way
> we can implement the hypothesis that reductions at the outer construct
> imply copy.

OK, #3 it is.

>> And while we're at it: the very same question also applies to the private
>> clause, which -- contrary to all other (as far as I remember) clauses --
>> also is applicable to both the parallel and loop constructs:
>>
>>      #pragma acc parallel loop private([...])
>>
>> ... is to be decomposed into which of the following:
>>
>>      #pragma acc parallel private([...])
>>      #pragma acc loop
>>
>>      #pragma acc parallel
>>      #pragma acc loop private([...])
>>
>>      #pragma acc parallel private([...])
>>      #pragma acc loop private([...])
>>
>> (There is no private clause allowed to be specified with the kernels
>> construct for what it's worth, but that doesn't mean we couldn't use it
>> internally, of course, if so required.)
> 
> I think interpretation #2 or #3 make sense, and I suspect result in the
> same emitted code.

I'll probably go #2 here to make life easier with kernels.

After I make these changes (and the c++ template updates), I'll apply
them to trunk and backport them to gomp4. Thank you Jakub, Thomas and
Nathan for reviewing these patches.

Cesar

Reply via email to