Hi,

I've written a patch series to facilitate debugging libgomp openacc testcase failures on the nvptx accelerator.


When running an openacc testcase on an nvptx accelerator, the following happens (sketched in code right after this list):
- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel
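In terms of the cuda driver API, that sequence looks roughly like the snippet below. This is a minimal sketch for orientation only, not code taken from the plugin: error checking is omitted, and the names ptx_code/acc_kernel as well as the launch geometry are made up.

  #include <cuda.h>
  #include <stddef.h>

  /* Sketch of the jit/link/load/launch sequence; "ptx_code" and
     "acc_kernel" are placeholder names, not the plugin's symbols.  */
  static void
  launch_ptx (const char *ptx_code, size_t ptx_size)
  {
    CUlinkState linkstate;
    void *cubin;
    size_t cubin_size;
    CUmodule module;
    CUfunction function;

    /* Compile and link the ptx into a cubin image.  */
    cuLinkCreate (0, NULL, NULL, &linkstate);
    cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, (void *) ptx_code, ptx_size,
                   "ptx_code", 0, NULL, NULL);
    cuLinkComplete (linkstate, &cubin, &cubin_size);

    /* Load the resulting module and launch a kernel from it.  */
    cuModuleLoadData (&module, cubin);
    cuModuleGetFunction (&function, module, "acc_kernel");
    cuLaunchKernel (function, 1, 1, 1, 32, 1, 1, 0, NULL, NULL, NULL);

    cuLinkDestroy (linkstate);
  }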

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
  such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see a disassembly of the
  resulting module in the debug output, by writing it to a file and
  calling nvdisasm on it.
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
  compilation/linking process (see the sketch after this list),
  currently supporting:
  * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
  * -ori, mapping onto CU_JIT_NEW_SM3X_OPT
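
For illustration, passing an optimization level picked up from the environment on to the jit could look something like the sketch below. This is not the patch itself: the parsing is simplified and the function name is made up; only CU_JIT_OPTIMIZATION_LEVEL and the cuLinkCreate interface are real.

  #include <cuda.h>
  #include <stdlib.h>
  #include <stdint.h>

  /* Sketch: map GOMP_OPENACC_NVPTX_JIT=-O<n> onto
     CU_JIT_OPTIMIZATION_LEVEL when creating the link state.  */
  static void
  link_create_with_jit_opts (CUlinkState *linkstate)
  {
    CUjit_option opts[1];
    void *optvals[1];
    unsigned nopts = 0;
    const char *jit = getenv ("GOMP_OPENACC_NVPTX_JIT");

    if (jit && jit[0] == '-' && jit[1] == 'O'
        && jit[2] >= '0' && jit[2] <= '4' && jit[3] == '\0')
      {
        opts[nopts] = CU_JIT_OPTIMIZATION_LEVEL;
        optvals[nopts] = (void *) (uintptr_t) (jit[2] - '0');
        nopts++;
      }

    cuLinkCreate (nopts, opts, optvals, linkstate);
  }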


The patch series consists of these patches:

1. Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin
2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
3. Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin
4. Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin


I've tested the patch series on top of gomp-4_0-branch by running an openacc testcase from the command line with the various environment variables defined.

[ A relevant difference between gomp-4_0-branch and master is that:
- master defines and includes ./libgomp/plugin/cuda/cuda.h, so I had to
  add the CU_JIT constants there, while
- gomp-4_0-branch doesn't define that local minimal cuda.h file but
  includes cuda's own cuda.h. My setup linked against cuda 6.5, which
  defines CU_JIT_OPTIMIZATION_LEVEL but not yet CU_JIT_NEW_SM3X_OPT
  (that seems to have been introduced in cuda 8.0), so I had to
  hardcode the latter.
]
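
For reference, the hardcoding could look roughly like the snippet below. This is only a sketch, the variable name is made up, and the value 15 is my reading of the CUDA 8.0 headers rather than something taken from the patch, so verify it against the cuda.h you actually build with.

  /* CU_JIT_NEW_SM3X_OPT is an enumerator, not a macro, so it cannot be
     detected with #ifdef when building against the older cuda.h.
     Assumed value: 15, per the CUDA 8.0 headers -- verify locally.  */
  static const CUjit_option gomp_cu_jit_new_sm3x_opt = (CUjit_option) 15;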


OK for trunk if bootstrap and reg-test on x86_64 with an nvidia accelerator succeed?

Thanks,
- Tom
