I'm trying to run jobs on several GPUs at the same time using multiple
threads, each with its own context. Sometimes this works flawlessly, but
~75% of the time I get a cuModuleLoadDataEx error telling me the context
has been destroyed. What's frustrating is that nothing changes between
failed and successful runs of the code. From what I can tell it's down to
luck whether or not the error comes up:

~/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py in __init__(self, source, nvcc, options, keep, no_extern_c, arch, code, cache_dir, include_dirs)
    292 
    293         from pycuda.driver import module_from_buffer
--> 294         self.module = module_from_buffer(cubin)
    295 
    296         self._bind_module()

LogicError: cuModuleLoadDataEx failed: context is destroyed -


I start by creating one context per GPU:

from pycuda import driver as cuda
cuda.init()
contexts = []
for i in range(cuda.Device.count()):
    c = cuda.Device(i).make_context()  # make_context() also pushes c
    c.pop()                            # so pop it off the main thread's stack
    contexts.append(c)

... and setting up a function to use each context, i.e.

import numpy as np
import pycuda.gpuarray as gpuarray

def do_work(ctx):
    with Acquire(ctx):
        a = gpuarray.to_gpu(np.random.rand(100, 400, 400))
        b = gpuarray.to_gpu(np.random.rand(100, 400, 400))
        for _ in range(10):
            c = (a + b) / 2
        out = c.get()
    return out

where `Acquire` is a context manager that handles pushing and popping:

class Acquire:
    def __init__(self, context):
        self.ctx = context
    def __enter__(self):
        self.ctx.push()
        return self.ctx
    def __exit__(self, type, value, traceback):
        self.ctx.pop()
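For reference, the behaviour `Acquire` relies on can be modelled without a GPU. The CUDA driver keeps one context stack per host thread, so a push in one thread does not make that context current in any other thread. The sketch below is only a mock: `MockContext` and `current_stack` are hypothetical stand-ins for `pycuda.driver.Context` and the driver's per-thread stack. It shows that `__exit__` pops even when the body raises, and that a second thread sees an empty stack while the main thread holds the context.

```python
import threading

# Hypothetical model of CUDA's per-host-thread context stack (no GPU needed).
_tls = threading.local()


def current_stack():
    """Return the calling thread's mock context stack, creating it lazily."""
    if not hasattr(_tls, "stack"):
        _tls.stack = []
    return _tls.stack


class MockContext:
    """Hypothetical stand-in for pycuda.driver.Context."""

    def __init__(self, name):
        self.name = name

    def push(self):
        current_stack().append(self)

    def pop(self):
        # Remove whatever is on top of the calling thread's stack.
        current_stack().pop()


class Acquire:
    """Same push/pop context manager as in the post."""

    def __init__(self, context):
        self.ctx = context

    def __enter__(self):
        self.ctx.push()
        return self.ctx

    def __exit__(self, exc_type, exc_value, traceback):
        self.ctx.pop()


ctx = MockContext("gpu0")

# 1. __exit__ runs even if the body raises, so the stack stays balanced.
try:
    with Acquire(ctx):
        raise RuntimeError("simulated kernel failure")
except RuntimeError:
    pass
print(current_stack())  # []

# 2. A context pushed in the main thread is not on another thread's stack.
seen_in_other_thread = []
with Acquire(ctx):
    t = threading.Thread(
        target=lambda: seen_in_other_thread.append(len(current_stack())))
    t.start()
    t.join()
    print(len(current_stack()), seen_in_other_thread)  # 1 [0]
```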

and here I run the code in parallel using a pool of threaded workers via
joblib:

from joblib import Parallel, delayed
pool = Parallel(n_jobs=len(contexts), verbose=8, prefer='threads')
with pool:
    # Pass 1
    sum(pool(delayed(do_work)(ctx) for ctx in contexts))
    # Pass 2
    sum(pool(delayed(do_work)(ctx) for ctx in contexts))
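As far as I understand joblib's threading backend, a task goes to whichever pool thread is free, so nothing guarantees that `contexts[i]` is pushed from the same host thread in pass 2 as in pass 1. A quick way to see the thread-to-task assignment is sketched below. It uses the stdlib `ThreadPoolExecutor` as a stand-in for joblib (an assumption, to keep it dependency-free), and `which_thread` / `thread_assignments` are hypothetical helpers; in the real code `do_work(contexts[i])` would run where the thread id is recorded.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def which_thread(ctx_index):
    # Hypothetical placeholder for do_work(contexts[ctx_index]);
    # only records which pool thread picked up the task.
    return ctx_index, threading.get_ident()


def thread_assignments(n_contexts, n_passes):
    """For each pass, return a dict mapping context index -> thread id."""
    passes = []
    with ThreadPoolExecutor(max_workers=n_contexts) as pool:
        for _ in range(n_passes):
            passes.append(dict(pool.map(which_thread, range(n_contexts))))
    return passes


for i, assignment in enumerate(thread_assignments(4, 2), start=1):
    print("pass", i, assignment)
```

If the thread ids for a given context index differ between passes, the same context is being pushed from different threads.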

Note that I do several "passes" of work (I'll need to do 50 or so in my
real application) with the same thread pool. It seems like the crash always
happens somewhere in the second pass, or not at all. Any ideas about how to
keep my contexts from getting destroyed?
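If the thread-to-context assignment does turn out to matter, one thing that might be worth trying is giving each context its own long-lived thread, so every pass pushes `contexts[i]` from the same host thread. This is a sketch only, not confirmed to avoid the error: it replaces joblib with one single-worker `ThreadPoolExecutor` per context, and `RecordingContext` and `run_pinned` are hypothetical names, with `RecordingContext` standing in for a real pycuda `Context` so the code runs without a GPU.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RecordingContext:
    """Hypothetical stand-in for a pycuda Context that records which host
    threads push it (no GPU needed)."""

    def __init__(self, name):
        self.name = name
        self.push_threads = set()

    def push(self):
        self.push_threads.add(threading.get_ident())

    def pop(self):
        pass


def do_work(ctx):
    # Placeholder for the real push / compute / pop sequence.
    ctx.push()
    try:
        return ctx.name
    finally:
        ctx.pop()


def run_pinned(contexts, work, n_passes):
    """Run work(ctx) for every context, n_passes times. Each context gets
    its own single-thread executor, so it is always handled by one thread."""
    executors = [ThreadPoolExecutor(max_workers=1) for _ in contexts]
    try:
        results = []
        for _ in range(n_passes):
            futures = [ex.submit(work, ctx)
                       for ex, ctx in zip(executors, contexts)]
            results.append([f.result() for f in futures])
        return results
    finally:
        for ex in executors:
            ex.shutdown(wait=True)


contexts = [RecordingContext("gpu%d" % i) for i in range(4)]
results = run_pinned(contexts, do_work, n_passes=50)
print([len(c.push_threads) for c in contexts])  # [1, 1, 1, 1]
```

The passes still run in parallel across GPUs, since the four executors work concurrently; only the mapping from context to thread is fixed.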

*System info*
Ubuntu 16.04 (Amazon Deep Learning AMI)
CUDA driver version 396.44
4x V100 GPUs
Python 3.6
pycuda version 2018.1.1
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda
