https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90115

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Thomas Schwinge <tschwi...@gcc.gnu.org>:

https://gcc.gnu.org/g:29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff

commit r12-980-g29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff
Author: Julian Brown <jul...@codesourcery.com>
Date:   Fri Feb 26 04:34:49 2021 -0800

    openacc: Add support for gang local storage allocation in shared memory [PR90115]

    This patch implements a method to track the "private-ness" of
    OpenACC variables declared in offload regions in gang-partitioned,
    worker-partitioned or vector-partitioned modes. Both kinds of private
    variable are handled: those that are implicitly private because they
    are declared in scoped blocks, and those listed in "private" clauses
    on enclosing directives (e.g. "acc parallel"). Variables that are,
    e.g., gang-private can then be adjusted so that they reside in GPU
    shared memory.
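A small OpenACC C kernel can illustrate the two kinds of private variable mentioned above. This is a hypothetical sketch, not one of the patch's testcases: `scale` is made gang-private by the `private` clause on `acc parallel`, while `base` is implicitly gang-private because it is declared in a block inside the gang loop. The function name `fill_rows` and the sizes are assumptions for illustration. When compiled without `-fopenacc` the pragmas are ignored and the loops simply run sequentially, which produces the same values.

```c
#define N_GANGS 4
#define N_WORKERS 8

/* Hypothetical example: gang g writes (g + 1) * w into
   out[g * N_WORKERS + w] for every worker index w.  */
void fill_rows (int *out)
{
  int scale;  /* gang-private via the private clause below */

#pragma acc parallel num_gangs(N_GANGS) num_workers(N_WORKERS) \
  private(scale) copyout(out[0:N_GANGS * N_WORKERS])
  {
#pragma acc loop gang
    for (int g = 0; g < N_GANGS; g++)
      {
        int base = g * N_WORKERS;  /* block-scoped: implicitly gang-private */
        scale = g + 1;             /* written in worker-single mode */
#pragma acc loop worker
        for (int w = 0; w < N_WORKERS; w++)
          out[base + w] = scale * w;  /* read in worker-partitioned mode */
      }
  }
}
```

Here both variables are written once per gang and only read by the workers, which the existing broadcasting scheme already supports.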

    The reason for doing this is twofold: correct implementation of OpenACC
    semantics, and optimisation, since shared memory might be faster than
    the main memory on a GPU. Handling of private variables is intimately
    tied to the execution model for gangs/workers/vectors implemented by
    a particular target: for current targets, we use (or on mainline, will
    soon use) a broadcasting/neutering scheme.

    That scheme is sufficient for code that, e.g., sets a variable in
    worker-single mode and then uses the value in worker-partitioned mode.
    The difficulty (semantics-wise) arises when the user performs, for
    example, an atomic operation in worker-partitioned mode and expects a
    variable declared in worker-single mode (i.e. a gang-private variable)
    to be shared among all the workers of the gang. Forcing such variables
    into shared memory makes that work correctly.
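The problematic pattern described above can be sketched as follows. This is a hypothetical example in the spirit of the new `private-atomic-1-*` tests, not their actual contents. `count` is gang-private, and every worker of the gang atomically increments it. On a target that places gang-private variables in shared memory, as this patch arranges, all the workers update the same location, so each gang should end up with `N_WORKERS`. If each worker instead operated on its own copy, the increments would not combine. Without `-fopenacc` the loops run sequentially and give the same answer trivially.

```c
#define N_GANGS 4
#define N_WORKERS 16

/* Hypothetical example: each gang counts its worker iterations in a
   gang-private variable updated atomically from worker-partitioned code.  */
void count_per_gang (int *result)
{
#pragma acc parallel num_gangs(N_GANGS) num_workers(N_WORKERS) \
  copyout(result[0:N_GANGS])
  {
#pragma acc loop gang
    for (int g = 0; g < N_GANGS; g++)
      {
        int count = 0;  /* gang-private: must be shared by gang g's workers */
#pragma acc loop worker
        for (int w = 0; w < N_WORKERS; w++)
          {
#pragma acc atomic update
            count++;
          }
        result[g] = count;  /* expected: N_WORKERS */
      }
  }
}
```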

    In terms of implementation, the parallelism level of a given loop is
    not fixed until the oaccdevlow pass in the offload compiler, so the
    patch defers fixing the parallelism level of variables declared on or
    within such loops until that same point. This is done by adding a new
    internal UNIQUE function (OACC_PRIVATE) whose arguments list (the
    address of) each private variable, together with further arguments
    from which the correct parallelism level for the listed variables can
    be determined. This new internal function fits into the existing
    scheme for demarcating OpenACC loops, as described in comments in the
    patch.
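The "record now, decide later" idea behind the OACC_PRIVATE marker can be modelled in a few lines of plain C. This is a toy sketch only: the names `private_marker`, `marker_add` and `marker_resolve` are hypothetical and do not correspond to GCC internals. In particular, the rule used to pick a level (the innermost level the loop ends up partitioned over) is an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical toy model; not GCC's data structures.  */
enum par_level { LEVEL_UNKNOWN = -1, LEVEL_GANG = 0, LEVEL_WORKER = 1,
                 LEVEL_VECTOR = 2 };

#define MAX_PRIVATE_VARS 8

struct private_marker
{
  void *vars[MAX_PRIVATE_VARS];  /* addresses of the private variables */
  size_t n_vars;
  enum par_level level;          /* unknown until partitioning is fixed */
};

/* Early (lowering) step: start a marker with the level left open.  */
void marker_init (struct private_marker *m)
{
  m->n_vars = 0;
  m->level = LEVEL_UNKNOWN;
}

/* Record one private variable; returns 0 if the marker is full.  */
int marker_add (struct private_marker *m, void *var)
{
  if (m->n_vars == MAX_PRIVATE_VARS)
    return 0;
  m->vars[m->n_vars++] = var;
  return 1;
}

/* Late (device-lowering) step: LOOP_MASK is the loop's final partitioning,
   bit 0 = gang, bit 1 = worker, bit 2 = vector.  Assumed rule for this
   sketch: pick the innermost level that is set.  */
void marker_resolve (struct private_marker *m, unsigned loop_mask)
{
  m->level = LEVEL_UNKNOWN;
  for (int l = LEVEL_VECTOR; l >= LEVEL_GANG; l--)
    if (loop_mask & (1u << l))
      {
        m->level = (enum par_level) l;
        break;
      }
}
```

The point of the design is that the variable list is captured when it is known (during lowering), while the level is only filled in once the offload compiler has committed to a partitioning.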

    Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and
    TARGET_GOACC_EXPAND_VAR_DECL.  The first can adjust a variable declaration
    at oaccdevlow time, and the second at expand time.  A given offload target
    may implement either just the first hook or both, depending on its
    strategy for implementing private variables.
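The split between the two hooks can be illustrated with a toy hook table. This is a hypothetical model: `toy_decl`, `toy_hooks`, `toy_devlow` and `toy_expand` are invented names, and their prototypes are deliberately not GCC's real hook signatures. According to the ChangeLog, GCN defines only TARGET_GOACC_ADJUST_PRIVATE_DECL, while nvptx defines both hooks. The two tables below mirror that shape, without claiming to reproduce either backend's behaviour.

```c
#include <stddef.h>

enum { LEVEL_GANG = 0, LEVEL_WORKER = 1, LEVEL_VECTOR = 2 };

/* Hypothetical stand-in for a variable declaration.  */
struct toy_decl
{
  int level;          /* privatization level, -1 if not private */
  int in_shared;      /* set by the early hook */
  int custom_expand;  /* set when the late hook handled expansion */
};

/* Hypothetical hook table (not GCC's prototypes); NULL means "not used".  */
struct toy_hooks
{
  void (*adjust_private_decl) (struct toy_decl *, int level);
  int (*expand_var_decl) (struct toy_decl *);  /* nonzero if handled */
};

/* Early hook: mark gang-private variables for shared memory.  */
static void adjust_gang_to_shared (struct toy_decl *d, int level)
{
  d->level = level;
  if (level == LEVEL_GANG)
    d->in_shared = 1;
}

/* Late hook: take over expansion of variables placed in shared memory.  */
static int expand_shared (struct toy_decl *d)
{
  if (!d->in_shared)
    return 0;  /* let the default expansion handle it */
  d->custom_expand = 1;
  return 1;
}

/* "oaccdevlow" step of the toy pipeline.  */
void toy_devlow (const struct toy_hooks *h, struct toy_decl *d, int level)
{
  if (h->adjust_private_decl)
    h->adjust_private_decl (d, level);
}

/* "expand" step: use the hook if it is defined and handles D.  */
void toy_expand (const struct toy_hooks *h, struct toy_decl *d)
{
  if (h->expand_var_decl && h->expand_var_decl (d))
    return;
  d->custom_expand = 0;  /* default expansion */
}

const struct toy_hooks adjust_only = { adjust_gang_to_shared, NULL };
const struct toy_hooks adjust_and_expand = { adjust_gang_to_shared,
                                             expand_shared };
```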

    This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in
    the AMD GCN backend to the current name and prototype. (An earlier
    version of the hook was already present, but dormant.)

            gcc/
            PR middle-end/90115
            * doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL)
            (TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks.
            * doc/tm.texi: Regenerate.
            * expr.c (expand_expr_real_1): Expand decls using the
            expand_var_decl OpenACC hook if defined.
            * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
            * internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
            * omp-low.c (omp_context): Add oacc_privatization_candidates
            field.
            (lower_oacc_reductions): Add PRIVATE_MARKER parameter.  Insert
            before fork.
            (lower_oacc_head_tail): Add PRIVATE_MARKER parameter.  Modify
            private marker's gimple call arguments, and pass it to
            lower_oacc_reductions.
            (oacc_privatization_scan_clause_chain)
            (oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
            New functions.
            (lower_omp_for, lower_omp_target, lower_omp_1): Use these.
            * omp-offload.c (convert.h): Include.
            (oacc_loop_xform_head_tail): Treat private-variable markers like
            fork/join when transforming head/tail sequences.
            (struct var_decl_rewrite_info): Add struct.
            (oacc_rewrite_var_decl, is_sync_builtin_call): New functions.
            (execute_oacc_device_lower): Support rewriting gang-private
            variables using target hook, and fix up addr_expr and var_decl
            nodes afterwards.
            * target.def (adjust_private_decl, expand_var_decl): New hooks.
            * config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl):
            Rename to...
            (gcn_goacc_adjust_private_decl): ...this.
            * config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl):
            Rename to...
            (gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
            * config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename
            definition using gcn_goacc_adjust_gangprivate_decl...
            (TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using
            gcn_goacc_adjust_private_decl.
            * config/nvptx/nvptx.c (tree-pretty-print.h): Include.
            (gang_private_shared_size): New global variable.
            (gang_private_shared_align): Likewise.
            (gang_private_shared_sym): Likewise.
            (gang_private_shared_hmap): Likewise.
            (nvptx_option_override): Initialize these.
            (nvptx_file_end): Output gang_private_shared_sym.
            (nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_var_decl):
            New functions.
            (nvptx_set_current_function): Clear gang_private_shared_hmap.
            (TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook.
            (TARGET_GOACC_EXPAND_VAR_DECL): Likewise.
            libgomp/
            PR middle-end/90115
            * testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: New
            test.
            * testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
            Likewise.
            * testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
            Likewise.

    Co-Authored-By: Chung-Lin Tang <clt...@codesourcery.com>
    Co-Authored-By: Thomas Schwinge <tho...@codesourcery.com>
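The nvptx entries in the ChangeLog add a running size, an alignment, a symbol and a decl-to-offset map for gang-private shared memory. One plausible reading is that gang-private variables are packed into a single shared block, with each variable assigned an aligned offset. The toy allocator below sketches that idea under this assumption. It models a single function only, since the ChangeLog says the map is cleared per function, and it uses integer ids in place of decls. `shared_layout` and `layout_offset` are hypothetical names, not the nvptx backend's code.

```c
#include <stddef.h>

#define MAX_VARS 16

/* Hypothetical model of one shared-memory block holding all gang-private
   variables of a function.  */
struct shared_layout
{
  size_t size;               /* bytes used so far */
  size_t align;              /* largest alignment seen */
  int keys[MAX_VARS];        /* variable ids (stand-in for decls) */
  size_t offsets[MAX_VARS];  /* offset of each variable in the block */
  size_t n;
};

static size_t round_up (size_t x, size_t align)
{
  return (x + align - 1) / align * align;
}

/* Return VAR's offset in the block, allocating it on first use.
   ALIGN must be nonzero.  Returns (size_t) -1 if the table is full.  */
size_t layout_offset (struct shared_layout *l, int var, size_t size,
                      size_t align)
{
  for (size_t i = 0; i < l->n; i++)
    if (l->keys[i] == var)
      return l->offsets[i];  /* already placed: reuse the same slot */
  if (l->n == MAX_VARS)
    return (size_t) -1;
  size_t off = round_up (l->size, align);
  l->keys[l->n] = var;
  l->offsets[l->n] = off;
  l->n++;
  l->size = off + size;
  if (align > l->align)
    l->align = align;
  return off;
}
```

For example, placing a 1-byte, then a 4-byte (align 4), then an 8-byte (align 8) variable yields offsets 0, 4 and 8, for a 16-byte block with 8-byte alignment.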
