Hi,
I've written a patch series to facilitate debugging libgomp openacc
testcase failures on the nvptx accelerator.
When running an openacc testcase on an nvptx accelerator, the following
happens (sketched in the code after this list):
- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel
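For reference, the sequence corresponds roughly to this cuda driver API
usage (a minimal standalone sketch, not the plugin source; the ptx string
and kernel name are placeholders):

#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void
check (CUresult r, const char *what)
{
  if (r != CUDA_SUCCESS)
    {
      fprintf (stderr, "%s failed: %d\n", what, (int) r);
      exit (EXIT_FAILURE);
    }
}

int
main (void)
{
  const char *ptx_code = "...";  /* The ptx assembly (placeholder).  */
  CUdevice dev;
  CUcontext ctx;
  CUlinkState link;
  void *image;
  size_t image_size;
  CUmodule module;
  CUfunction kernel;

  check (cuInit (0), "cuInit");
  check (cuDeviceGet (&dev, 0), "cuDeviceGet");
  check (cuCtxCreate (&ctx, 0, dev), "cuCtxCreate");

  /* Call the cuda jit to compile and link the ptx into a module image.  */
  check (cuLinkCreate (0, NULL, NULL, &link), "cuLinkCreate");
  check (cuLinkAddData (link, CU_JIT_INPUT_PTX, (void *) ptx_code,
                        strlen (ptx_code) + 1, NULL, 0, NULL, NULL),
         "cuLinkAddData");
  check (cuLinkComplete (link, &image, &image_size), "cuLinkComplete");

  /* Load the module and start a kernel (name is a placeholder).  */
  check (cuModuleLoadData (&module, image), "cuModuleLoadData");
  check (cuModuleGetFunction (&kernel, module, "kernel"),
         "cuModuleGetFunction");
  check (cuLaunchKernel (kernel, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL),
         "cuLaunchKernel");
  check (cuCtxSynchronize (), "cuCtxSynchronize");

  check (cuLinkDestroy (link), "cuLinkDestroy");
  return 0;
}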
The patch series adds these environment variables (a rough sketch of the
handling follows the list):
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module to
  a file, such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
  the debug output, by writing it to a file and calling nvdisasm on it.
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
compilation/linking process, currently supporting:
* -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
* -ori, mapping onto CU_JIT_NEW_SM3X_OPT
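The handling is along these lines (a rough sketch with hypothetical
helper and file names, not the actual patch text; CU_JIT_NEW_SM3X_OPT
requires cuda 8.0 headers or a hardcoded value, see below):

#include <cuda.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append jit options parsed from GOMP_OPENACC_NVPTX_JIT to the arrays
   that will be passed to cuLinkCreate.  */

static void
process_GOMP_OPENACC_NVPTX_JIT (unsigned *num, CUjit_option *opts,
                                void **vals)
{
  const char *s = getenv ("GOMP_OPENACC_NVPTX_JIT");
  if (s == NULL)
    return;

  /* -O[0-4] maps onto CU_JIT_OPTIMIZATION_LEVEL.  */
  const char *o = strstr (s, "-O");
  if (o != NULL && o[2] >= '0' && o[2] <= '4')
    {
      opts[*num] = CU_JIT_OPTIMIZATION_LEVEL;
      vals[*num] = (void *) (uintptr_t) (o[2] - '0');
      (*num)++;
    }

  /* -ori maps onto CU_JIT_NEW_SM3X_OPT.  */
  if (strstr (s, "-ori") != NULL)
    {
      opts[*num] = CU_JIT_NEW_SM3X_OPT;
      vals[*num] = (void *) (uintptr_t) 1;
      (*num)++;
    }
}

/* After cuLinkComplete has produced IMAGE/SIZE, optionally save and/or
   disassemble it.  The file name is made up for the sketch.  */

static void
process_GOMP_OPENACC_NVPTX_SAVE_TEMPS_DISASM (const void *image, size_t size)
{
  bool save = getenv ("GOMP_OPENACC_NVPTX_SAVE_TEMPS") != NULL;
  bool disasm = getenv ("GOMP_OPENACC_NVPTX_DISASM") != NULL;
  if (!save && !disasm)
    return;

  const char *name = "gomp-nvptx-module.o";
  FILE *f = fopen (name, "wb");
  fwrite (image, 1, size, f);
  fclose (f);

  if (disasm)
    {
      char cmd[128];
      snprintf (cmd, sizeof cmd, "nvdisasm %s", name);
      system (cmd);
    }
}

With that in place, running say GOMP_OPENACC_NVPTX_JIT=-O0 ./testcase.exe
(executable name made up) runs the testcase with jit optimization
disabled.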
The patch series consists of these patches:
1. Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin
2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
3. Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin
4. Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin
I've tested the patch series on top of gomp-4_0-branch by running an
openacc testcase from the command line and setting the various
environment variables.
[ A relevant difference between gomp-4_0-branch and master is that:
- master defines and includes its own minimal ./libgomp/plugin/cuda/cuda.h,
so I had to add the CU_JIT constants there, while
- gomp-4_0-branch doesn't define that local minimal cuda.h file but
includes cuda's cuda.h. My setup linked against cuda 6.5, which defines
CU_JIT_OPTIMIZATION_LEVEL but not yet CU_JIT_NEW_SM3X_OPT (that seems
to have been introduced in cuda 8.0), so I had to hardcode the latter
(see the snippet below).
]
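For reference, the hardcoding amounts to something like this (the numeric
value is what the cuda 8.0 headers use on my system; treat it as an
assumption to verify against your cuda.h):

/* cuda 6.5's cuda.h lacks CU_JIT_NEW_SM3X_OPT; hardcode its value
   (15, as in the cuda 8.0 headers).  */
#define CU_JIT_NEW_SM3X_OPT ((CUjit_option) 15)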
OK for trunk if bootstrap and reg-test on x86_64 with nvidia accelerator
succeed?
Thanks,
- Tom