You're prescribing the GPU architecture (`arch='sm_60'` in your SourceModule call). If that doesn't match the compute capability of the GPU on the other workstation, you get exactly this "device kernel image is invalid" error. Just deleting that kwarg should be fine; PyCUDA will then compile for whatever device the current context is on.
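For what it's worth, you can check what the failing machine's GPU actually is before deciding on an arch string. A minimal sketch; the helper names `sm_string` and `current_arch` are mine, not PyCUDA API:

```python
# Sketch: figure out which arch string matches the local GPU.
# sm_string/current_arch are hypothetical helpers, not part of PyCUDA.

def sm_string(cc):
    """Format a compute-capability tuple like (6, 0) as 'sm_60'."""
    return "sm_%d%d" % cc

def current_arch():
    """Arch string for the active GPU (needs pycuda and a CUDA device)."""
    import pycuda.autoinit  # noqa: F401 -- creates a context on device 0
    import pycuda.driver as cuda
    return sm_string(cuda.Device(0).compute_capability())
```

If `current_arch()` on the failing workstation reports something other than sm_60, that's your mismatch; dropping `arch=` sidesteps the problem entirely.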
Andreas Zhangsheng Lai <dunno....@gmail.com> writes:
> I'm encountering this error as I run my code on the same docker environment
> but on different workstations.
>
> ```
> Traceback (most recent call last):
>   File "simple_peer.py", line 76, in <module>
>     tslr_gpu, lr_gpu = mp.initialise()
>   File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in initialise
>     """, arch='sm_60')
>   File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py", line 294, in __init__
>     self.module = module_from_buffer(cubin)
> pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid -
> ```
>
> I did a quick search and only found this:
> https://github.com/inducer/pycuda/issues/45 , but it doesn't seem to be
> relevant to my problem as it runs on my initial workstation. Can anyone see
> what is the issue?
>
> Below is my code that I'm trying to run:
> ```
> def initialise(self):
>     """
>     Documentation here
>     """
>
>     mod = SourceModule("""
>     #include <math.h>
>     __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,
>                             float *b_gpu, int *x_gpu, int d, float temp)
>     {
>         int tx = threadIdx.x;
>
>         // Wx stores the W_ji x_i product value
>         float Wx = 0;
>
>         // Matrix multiplication of W and x
>         for (int k=0; k<d; ++k)
>         {
>             float W_element = W_gpu[tx * d + k];
>             float x_element = x_gpu[k];
>             Wx += W_element * x_element;
>         }
>
>         // Computing the linear response, signed linear response with temp
>         lr_out[tx] = Wx + b_gpu[tx];
>         tslr_out[tx] = (0.5/temp) * (1 - 2*x_gpu[tx]) * (Wx + b_gpu[tx]);
>     }
>     """, arch='sm_60')
>
>     func = mod.get_function("initial")
>
>     # format for prepare defined at
>     # https://docs.python.org/2/library/struct.html
>     func.prepare("PPPPPif")
>
>     dsize_nparray = np.zeros((self.d,), dtype=np.float32)
>
>     lr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>     slr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>     tslr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>
>     grid = (1, 1)
>     block = (self.d, 1, 1)
>     # block = (self.d, self.d, 1)
>
>     func.prepared_call(grid, block, tslr_gpu, lr_gpu, self.W_gpu,
>                        self.b_gpu, self.x_gpu, self.d, self.temp)
>
>     return tslr_gpu, lr_gpu
> ```
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> https://lists.tiker.net/listinfo/pycuda