Hi,

At the moment, parallel-dims.c fails on a Titan V with a CUDA launch failure:
...
libgomp: cuLaunchKernel error: too many resources requested for launch
...

There is a check in the libgomp nvptx plugin that is meant to prevent the CUDA launch failure and give a more informative error message instead:
...
  /* Check if the accelerator has sufficient hardware resources to
     launch the offloaded kernel.  */
  if (dims[GOMP_DIM_WORKER] > 1)
    {
      if (reg_granularity > 0
          && dims[GOMP_DIM_WORKER] > threads_per_block)
        GOMP_PLUGIN_fatal
          ("The Nvidia accelerator has insufficient resources "
           "to launch '%s'; recompile the program with "
           "'num_workers = %d' on that offloaded region or "
           "'-fopenacc-dim=-:%d'.\n",
           targ_fn->launch->fn, threads_per_block,
           threads_per_block);
    }
...

However, the message doesn't trigger, because reg_granularity == -1.
This value comes from dev->register_allocation_granularity, which defaults to -1 because libgomp does not have a hardcoded constant for sm_70. The hardcoded constants that are present match 'Warp Allocation Granularity' in the GPU Data table of CUDA_Occupancy_calculator.xls, but AFAICT no column has been published for sm_70 yet.
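
As a rough sketch of what that lookup amounts to (the function name and the granularity values below are illustrative, not the actual libgomp code), an unrecognized compute capability simply falls through to -1:

/* Hypothetical per-architecture lookup of the 'Warp Allocation
   Granularity' constant; the values are placeholders, the point is
   the fallback for architectures without a published constant.  */
static int
register_allocation_granularity (int compute_capability)
{
  switch (compute_capability)
    {
    case 30: case 35: case 37:
    case 50: case 52: case 53:
    case 60: case 61: case 62:
      return 4;   /* Placeholder value; see CUDA_Occupancy_calculator.xls.  */
    default:
      return -1;  /* No published constant yet, e.g. sm_70 (70).  */
    }
}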

Furthermore, the comparison against threads_per_block is not correct. What we want here is the maximum number of threads per block for this kernel, but the threads_per_block variable only contains an approximation of that; the exact number is already available from the CUDA driver and stored in targ_fn->max_threads_per_block.
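
That per-kernel limit is obtained through the CUDA driver API when the kernel is loaded; a minimal sketch of the query (error handling omitted, the wrapper function name is mine):

#include <cuda.h>

/* Query the exact per-kernel thread limit; this attribute already
   takes the kernel's register and shared-memory usage into account.  */
static int
query_max_threads_per_block (CUfunction func)
{
  int max_threads = 0;
  cuFuncGetAttribute (&max_threads,
                      CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, func);
  return max_threads;
}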

Also, comparing against dims[GOMP_DIM_WORKER] alone is incorrect. That was correct before "[nvptx] Handle large vectors in libgomp", when we still did "threads_per_block /= warp_size", but now we need to compare against dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR].
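
In other words, a worker now occupies a full vector of threads rather than a single warp, so the block size that has to fit within the limit is the product of the two dimensions. A minimal sketch of the corrected condition (GOMP_DIM_* as in gomp-constants.h; the helper is illustrative):

#include "gomp-constants.h"

/* Illustrative helper: the CUDA block holds WORKER * VECTOR threads,
   so that product, not the worker count alone, must stay within the
   kernel's limit.  E.g. with a limit of 1024, num_workers = 32 and
   vector_length = 32 fits exactly, but vector_length = 64 does not.  */
static int
launch_dims_fit (const int dims[], int max_threads_per_block)
{
  return (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
          <= max_threads_per_block);
}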

Finally, the error message has not been updated to reflect that the vector length can now be larger than 32.

The patch addresses these issues.

Committed to og7.

Thanks,
- Tom
[libgomp, nvptx] Fix too-many-resources fatal error condition and message

2018-04-30  Tom de Vries  <t...@codesourcery.com>

	* plugin/plugin-nvptx.c (nvptx_exec): Fix
	insufficient-resources-to-launch fatal error condition and message.

---
 libgomp/plugin/plugin-nvptx.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9b4768f..3c00555 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -834,16 +834,15 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
   /* Check if the accelerator has sufficient hardware resources to
      launch the offloaded kernel.  */
-  if (dims[GOMP_DIM_WORKER] > 1)
-    {
-      if (reg_granularity > 0 && dims[GOMP_DIM_WORKER] > threads_per_block)
-	GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources "
-			   "to launch '%s'; recompile the program with "
-			   "'num_workers = %d' on that offloaded region or "
-			   "'-fopenacc-dim=-:%d'.\n",
-			   targ_fn->launch->fn, threads_per_block,
-			   threads_per_block);
-    }
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+      > targ_fn->max_threads_per_block)
+    GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+		       " launch '%s' with num_workers = %d and vector_length ="
+		       " %d; recompile the program with 'num_workers = x and"
+		       " vector_length = y' on that offloaded region or "
+		       "'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
+		       targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+		       dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
 
   GOMP_PLUGIN_debug (0, "  %s: kernel %s: launch"
 		     " gangs=%u, workers=%u, vectors=%u\n",
