Wire up -mgomp multilib for OpenMP offloading, in a straightforward way. Changes in nvptx.opt and invoke.texi adding -msoft-stack and -muniform-simt are originally from the patches that introduced those.
Doc additions in invoke.texi will probably change; I've asked Sandra Loosemore to have a look. Previously posted here: [gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00115.html 2015-12-09 Alexander Monakov <amona...@ispras.ru> * config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP. * config/nvptx/nvptx.opt (mgomp): New option. * config/nvptx/t-nvptx (MULTILIB_OPTIONS): New. * doc/invoke.texi (mgomp): Document. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 2d4dad1..e9e4d06 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -184,6 +171,9 @@ nvptx_option_override (void) worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red"); SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED); worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + + if (TARGET_GOMP) + target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT; } /* Return a ptx type for MODE. If PROMOTE, then use .u32 for QImode to diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 056b9b2..1a7608b 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -32,3 +32,15 @@ Link in code for a __main kernel. moptimize Target Report Var(nvptx_optimize) Init(-1) Optimize partition neutering + +msoft-stack +Target Report Mask(SOFT_STACK) +Use custom stacks instead of local memory for automatic storage. + +muniform-simt +Target Report Mask(UNIFORM_SIMT) +Generate code that executes all threads in a warp as if one was active. + +mgomp +Target Report Mask(GOMP) +Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt. diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx index e2580c9..6c1010d 100644 --- a/gcc/config/nvptx/t-nvptx +++ b/gcc/config/nvptx/t-nvptx @@ -8,3 +8,5 @@ ALL_HOST_OBJS += mkoffload.o mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS) +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS) + +MULTILIB_OPTIONS = mgomp diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d281975..a02a852 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -19341,6 +19341,32 @@ offloading execution. Apply partitioned execution optimizations. This is the default when any level of optimization is selected. +@item -msoft-stack +@opindex msoft-stack +Do not use @code{.local} memory for automatic storage. Instead, use pointer +in shared memory array @code{char *__nvptx_stacks[]} at position @code{tid.y} +as the stack pointer. This is for placing automatic variables into storage +that can be accessed from other threads, or modified with atomic instructions. + +@item -muniform-simt +@opindex muniform-simt +Switch to code generation variant that allows to execute all threads in each +warp, while maintaining memory state and side effects as if only one thread +in each warp was active outside of OpenMP SIMD regions. All atomic operations +and calls to runtime (malloc, free, vprintf) are conditionally executed (iff +current lane index equals the master lane index), and the register being +assigned is copied via a shuffle instruction from the master lane. Outside of +SIMD regions lane 0 is the master; inside, each thread sees itself as the +master. Shared memory array @code{int __nvptx_uni[]} stores all-zeros or +all-ones bitmasks for each warp, indicating current mode (0 outside of SIMD +regions). Each thread can bitwise-and the bitmask at position @code{tid.y} +with current lane index to compute the master lane index. + +@item -mgomp +@opindex mgomp +Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and +@option{-muniform-simt} options, and selects corresponding multilib variant. + @end table @node PDP-11 Options