Hello,

I am trying to combine PyCuda with straight C CUDA code (through
Boost.Python bindings).  The reason is that I would like the
convenience of allocating memory and copying data to/from the device
using the PyCuda bindings, but I require very accurate timings of the
code executing on the GPU.  I tried using PyCuda Event objects, but
the function call overhead added too much time.  So the basic
structure of my program is now:

...... PyCuda code to initialize data and copy it to the CUDA device ....

-------------------  C code start -------------------------------------------
(I'm passing the addresses of the device memory allocated through
PyCuda as long integers and resolving them back into pointers in the
C code)

...... C call to cudaEventRecord .....
...... C CUDA kernel invocation .......
...... C call to cudaEventRecord .....
...... return timespec (defined in <time.h>) with elapsed runtime .......

-------------------  C code end  -------------------------------------------
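
Roughly, the C side looks something like the following sketch.  The
function and kernel names, argument types, and the float data type are
placeholders, and error checking is omitted; only the overall shape
matches the structure above:

```c
#include <cuda_runtime.h>
#include <stdint.h>
#include <time.h>

/* Assumed to launch the actual kernel, e.g. kernel<<<grid, block>>>(...) */
extern void launch_my_kernel(float *d_in, float *d_out, int n);

struct timespec timed_launch(unsigned long long d_in_addr,
                             unsigned long long d_out_addr, int n)
{
    /* Resolve the addresses PyCuda allocated (passed as long integers). */
    float *d_in  = (float *)(uintptr_t)d_in_addr;
    float *d_out = (float *)(uintptr_t)d_out_addr;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    launch_my_kernel(d_in, d_out, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);   /* launches are async; wait before timing */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    /* Convert the elapsed milliseconds into a timespec for the caller. */
    struct timespec t;
    t.tv_sec  = (time_t)(ms / 1000.0f);
    t.tv_nsec = (long)((ms - (float)t.tv_sec * 1000.0f) * 1.0e6f);
    return t;
}
```

Note the cudaEventSynchronize before cudaEventElapsedTime: without it
the elapsed time would be read before the kernel has finished.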

...... PyCuda code to copy results data from the CUDA device ....



I was sceptical that this would work without some further fiddling,
but it seemed to work fine for a bunch of examples.  However, I have
found at least one case (a much larger number of items to process,
and thus more thread blocks) in which I get the following error:


Traceback (most recent call last):
  File "./test_benchmark.py", line 65, in <module>
    run_test(benchmark_file, args.runs)
  File "./test_benchmark.py", line 41, in run_test
    rm = test_circuit_runtime(c, runs)
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 329, in test_circuit_runtime
    results_manager = wl_calc.calc_circuit_wirelength()
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 100, in calc_circuit_wirelength
    self.run_kernel(self.cuda_func, data_group, kl)
  File "/home/christian/Documents/dev/starplus_constant/circuit/cuda_data_multi.py", line 186, in run_kernel
    cuda.memcpy_dtoh(data, gpu_data_set_out[i])
pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch failed
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
LaunchError: cuCtxPopCurrent failed: launch failed
Error in sys.exitfunc:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
...... Many more lines like this.....


I'm thinking it likely has to do with the C code not being aware of
the Context initialized by PyCuda.  Is there any way to reference the
Context created by PyCuda?  Perhaps I could pass this Context to my C
code as well, to ensure both sets of code are using the same CUDA
context.
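
What I have in mind is something like the sketch below: if the integer
value of the CUcontext handle could be obtained on the Python side, the
C code could push it before doing any CUDA work.  The helper name and
the way the handle would be obtained from PyCuda are hypothetical:

```c
#include <cuda.h>
#include <stdint.h>

/* Hypothetical helper exposed via Boost.Python: ctx_handle would be
 * the integer value of the CUcontext created by PyCuda. */
int run_in_pycuda_context(unsigned long long ctx_handle)
{
    CUcontext ctx = (CUcontext)(uintptr_t)ctx_handle;

    /* Make PyCuda's context current on this thread before touching
     * memory allocated under it or launching kernels. */
    if (cuCtxPushCurrent(ctx) != CUDA_SUCCESS)
        return -1;

    /* ... kernel invocation and event timing would go here ... */

    /* Pop the context again so PyCuda's own cleanup still finds it
     * where it expects it. */
    CUcontext popped;
    return (cuCtxPopCurrent(&popped) == CUDA_SUCCESS) ? 0 : -1;
}
```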

Any help would be greatly appreciated :)

Thanks,
Christian

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
