Hello, I am trying to combine the use of PyCuda and straight C CUDA code (through Boost.Python bindings). The reason is that I would like to have the convenience of allocating memory and copying data to/from the device using the PyCuda bindings, but require very accurate timings of the code executing on the GPU. I tried using PyCuda Event objects, but the function call overhead is adding too much time. So the basic structure of my program is now:
...... PyCuda code to initialize data and copy it to the CUDA device ....
------------------- C code start -------------------------------------------
(I'm passing the addresses of the device memory allocated using PyCuda as
long integers and resolving them as pointers in the C code)
...... C call to cudaEventRecord .....
...... C CUDA kernel invocation .......
...... C call to cudaEventRecord .....
...... return timespec (defined in <time.h>) with elapsed runtime .......
------------------- C code end ---------------------------------------------
...... PyCuda code to copy results data from the CUDA device ....

I was sceptical that this would work without some further fiddling, but it seemed to work fine for a bunch of examples. However, I have found at least one case (a much larger number of items to process, and thus more thread blocks) in which I get the following error:

Traceback (most recent call last):
  File "./test_benchmark.py", line 65, in <module>
    run_test(benchmark_file, args.runs)
  File "./test_benchmark.py", line 41, in run_test
    rm = test_circuit_runtime(c, runs)
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 329, in test_circuit_runtime
    results_manager = wl_calc.calc_circuit_wirelength()
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 100, in calc_circuit_wirelength
    self.run_kernel(self.cuda_func, data_group, kl)
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 186, in run_kernel
    cuda.memcpy_dtoh(data, gpu_data_set_out[i])
pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch failed
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
LaunchError: cuCtxPopCurrent failed: launch failed
Error in sys.exitfunc:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
...... Many more lines like this .....

I'm thinking it likely has to do with the C code not being aware of the context initialized by PyCuda. Is there any way to reference the context created by PyCuda? Perhaps I can pass this context to my C code as well, to ensure both sets of code are using the same CUDA context?

Any help would be greatly appreciated :)

Thanks,
Christian

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
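For concreteness, the pointer hand-off described above could look like this on the C side (a minimal sketch: `resolve_device_ptr` and the `float` element type are illustrative names, not from the actual code; the cast itself works the same for any address passed in as an integer):

```c
#include <stdint.h>

/* The Python side passes the raw device address as a plain integer
   (e.g. int(device_allocation) for a PyCuda DeviceAllocation) through
   the Boost.Python binding; the C side casts it back to a pointer
   before handing it to the kernel.  The float element type here is an
   assumption -- use whatever the buffer actually holds. */
static float *resolve_device_ptr(unsigned long long addr)
{
    return (float *)(uintptr_t)addr;
}
```

Going through `uintptr_t` keeps the integer-to-pointer conversion well-defined on both 32- and 64-bit builds.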
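The "return timespec with elapsed runtime" step might be sketched like this, assuming the elapsed time comes from cudaEventElapsedTime between the two cudaEventRecord calls (the helper name and rounding policy are choices for illustration, not anything the original code is known to use):

```c
#include <time.h>

/* cudaEventElapsedTime() reports the time between two recorded events
   as a float in milliseconds; this helper converts that value into the
   struct timespec the wrapper returns through Boost.Python. */
static struct timespec ms_to_timespec(float ms)
{
    struct timespec ts;
    ts.tv_sec  = (time_t)(ms / 1000.0f);                                  /* whole seconds */
    ts.tv_nsec = (long)((ms - (float)ts.tv_sec * 1000.0f) * 1000000.0f);  /* remainder in ns */
    return ts;
}
```

Note that a float only carries about seven significant digits, so for multi-second runs the nanosecond field is nominal precision at best.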