We have a persistent problem when multithreading with pycuda. I have a
thread pool with one thread per GPU; each thread initializes its own
context for its assigned device ID and waits to read jobs from a common
Queue object. The main thread processes requests and adds CUDA-related
jobs to the Queue. This works well enough and utilizes all available
GPUs, but we frequently run into a locking issue when issuing many
relatively fast CUDA calls, where one computation will hang
indefinitely. When the contexts are created with the
pycuda.driver.ctx_flags.SCHED_BLOCKING_SYNC flag and I attach to a hung
process, I find it is waiting on a semaphore in cuCtxSynchronize in
libcuda.so; when the contexts are created without the
SCHED_BLOCKING_SYNC flag, it is still stuck in cuCtxSynchronize, but in
a spin loop waiting for results.
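
For reference, the worker setup looks roughly like the following. This
is a simplified sketch: the real job objects, error handling, and
shutdown logic are omitted, and the job callables below are stand-ins
for our actual work items.

import threading
import queue

import pycuda.driver as cuda

cuda.init()

def gpu_worker(device_id, jobs):
    # Each worker owns one device and one context for its lifetime.
    ctx = cuda.Device(device_id).make_context(
        flags=cuda.ctx_flags.SCHED_BLOCKING_SYNC)
    try:
        while True:
            job = jobs.get()
            if job is None:        # sentinel: shut the worker down
                break
            job()                  # launch this job's CUDA work
            ctx.synchronize()      # hangs here under load (cuCtxSynchronize)
    finally:
        ctx.pop()
        ctx.detach()

jobs = queue.Queue()
workers = [threading.Thread(target=gpu_worker, args=(i, jobs))
           for i in range(cuda.Device.count())]
for w in workers:
    w.start()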

I have an alternative version of the same code that bypasses pycuda and
calls directly into an nvcc-compiled shared library via ctypes. It uses
cudaSetDevice and cudaDeviceSynchronize (the runtime API) rather than
the cuCtx* driver functions, and it does not experience these locking
issues.
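
The runtime-API worker looks roughly like this. The library name
libgpujobs.so and the run_job entry point are hypothetical placeholders
for our actual nvcc-compiled code; return-code checking is omitted.

import ctypes

# Hypothetical nvcc-compiled shared library exposing our kernels.
lib = ctypes.CDLL("./libgpujobs.so")
# CUDA runtime library, for cudaSetDevice / cudaDeviceSynchronize.
cudart = ctypes.CDLL("libcudart.so")

def gpu_worker(device_id, jobs):
    # Bind this thread to its device once; the runtime API manages the
    # context implicitly, so no cuCtx* calls are involved.
    cudart.cudaSetDevice(ctypes.c_int(device_id))
    while True:
        job = jobs.get()
        if job is None:                  # sentinel: shut the worker down
            break
        lib.run_job()                    # launch the CUDA work for this job
        cudart.cudaDeviceSynchronize()   # never hangs in this version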

Has anyone run into this kind of issue before? Also, is there support
in pycuda (or planned support for a future release) for using the
cudaDevice* runtime functions rather than explicit context management?

David