Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels

Richard Biener Mon, 14 Dec 2015 07:33:07 -0800

On Sun, Dec 13, 2015 at 5:58 PM, Tom de Vries <tom_devr...@mentor.com> wrote:
> On 24/11/15 13:24, Tom de Vries wrote:
>>
>> On 16/11/15 12:59, Tom de Vries wrote:
>>>
>>> On 09/11/15 20:52, Tom de Vries wrote:
>>>>
>>>> On 09/11/15 16:35, Tom de Vries wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> this patch series for stage1 trunk adds support to:
>>>>> - parallelize oacc kernels regions using parloops, and
>>>>> - map the loops onto the oacc gang dimension.
>>>>>
>>>>> The patch series contains these patches:
>>>>>
>>>>>       1    Insert new exit block only when needed in
>>>>>          transform_to_exit_first_loop_alt
>>>>>       2    Make create_parallel_loop return void
>>>>>       3    Ignore reduction clause on kernels directive
>>>>>       4    Implement -foffload-alias
>>>>>       5    Add in_oacc_kernels_region in struct loop
>>>>>       6    Add pass_oacc_kernels
>>>>>       7    Add pass_dominator_oacc_kernels
>>>>>       8    Add pass_ch_oacc_kernels
>>>>>       9    Add pass_parallelize_loops_oacc_kernels
>>>>>      10    Add pass_oacc_kernels pass group in passes.def
>>>>>      11    Update testcases after adding kernels pass group
>>>>>      12    Handle acc loop directive
>>>>>      13    Add c-c++-common/goacc/kernels-*.c
>>>>>      14    Add gfortran.dg/goacc/kernels-*.f95
>>>>>      15    Add libgomp.oacc-c-c++-common/kernels-*.c
>>>>>      16    Add libgomp.oacc-fortran/kernels-*.f95
>>>>>
>>>>> The first 9 patches are more or less independent, but patches 10-16 are
>>>>> intended to be committed at the same time.
>>>>>
>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>
>>>>> Build and reg-tested with nvidia accelerator, in combination with a
>>>>> patch that enables accelerator testing (which is submitted at
>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
>>>>>
>>>>> I'll post the individual patches in reply to this message.
>>>>
>>>>
>>>> This patch adds pass_parallelize_loops_oacc_kernels.
>>>>
>>>> There's a number of things we do differently in parloops for oacc
>>>> kernels:
>>>> - in normal parloops, we generate code to choose between a parallel
>>>>    version of the loop, and a sequential (low iteration count) version.
>>>>    Since the code in oacc kernels region is supposed to run on the
>>>>    accelerator anyway, we skip this check, and don't add a low iteration
>>>>    count loop.
>>>> - in normal parloops, we generate an #pragma omp parallel /
>>>>    GIMPLE_OMP_RETURN pair to delimit the region which will we split off
>>>>    into a thread function. Since the oacc kernels region is already
>>>>    split off, we don't add this pair.
>>>> - we indicate the parallelization factor by setting the oacc function
>>>>    attributes
>>>> - we generate an #pragma oacc loop instead of an #pragma omp for, and
>>>>    we add the gang clause
>>>> - in normal parloops, we rewrite the variable accesses in the loop in
>>>>    terms into accesses relative to a thread function parameter. For the
>>>>    oacc kernels region, that rewrite has already been done at omp-lower,
>>>>    so we skip this.
>>>> - we need to ensure that the entire kernels region can be run in
>>>>    parallel. The loop independence check is already present, so for oacc
>>>>    kernels we add a check between blocks outside the loop and the entire
>>>>    region.
>>>> - we guard stores in the blocks outside the loop with gang_pos == 0.
>>>>    There's no need for each gang to write to a single location, we can
>>>>    do this in just one gang. (Typically this is the write of the final
>>>>    value of the iteration variable if that one is copied back to the
>>>>    host).
>>>>
>>>
>>> Reposting with loop optimizer init added in
>>> pass_parallelize_loops_oacc_kernels::execute.
>>>
>>
>> Reposting with loop_optimizer_finalize,scev_initialize and scev_finalize
>>   added in pass_parallelize_loops_oacc_kernels::execute.
>>
>
> Ping.
>
> Anything I can do to facilitate the review?


Document new functions, avoid if (1).

Ideally some refactoring would avoid some of the if (!oacc_kernels_p) spaghetti
but I'm considering tree-parloops.c (and its bugs) yours.

Can the pass not just use a pass parameter to switch between oacc/non-oacc?

Richard.

> Thanks,
>  Tom
>>
>>
>

Re: [PIING][PATCH, 9/16] Add pass_parallelize_loops_oacc_kernels

Reply via email to