You're prescribing the GPU architecture (arch='sm_60'). If that doesn't
match the GPU in the workstation you moved to, the compiled kernel image
can't be loaded there, which is exactly the "device kernel image is
invalid" error you're seeing. Just deleting that kwarg should be fine;
SourceModule will then compile for whatever device the current context
is on.
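
For example, a minimal sketch of both options (the trivial noop kernel
is just a stand-in for your source, and pycuda.autoinit is assumed to
set up the context):

```
import pycuda.autoinit            # creates a context on the default GPU
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

kernel_src = """
__global__ void noop() { }
"""

# Option 1: omit arch entirely; nvcc then targets the device behind
# the current context.
mod = SourceModule(kernel_src)

# Option 2: if you really want to pin it, derive the arch string from
# the device instead of hard-coding sm_60.
major, minor = cuda.Context.get_device().compute_capability()
mod = SourceModule(kernel_src, arch="sm_%d%d" % (major, minor))
```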

Andreas


Zhangsheng Lai <dunno....@gmail.com> writes:
> I'm encountering this error when I run my code in the same Docker
> environment but on different workstations.
>
> ```
> Traceback (most recent call last):
>   File "simple_peer.py", line 76, in <module>
>     tslr_gpu, lr_gpu = mp.initialise()
>   File "/root/distributed-mpp/naive/mccullochpitts.py", line 102, in initialise
>     """, arch='sm_60')
>   File "/root/anaconda3/lib/python3.6/site-packages/pycuda/compiler.py", line 294, in __init__
>     self.module = module_from_buffer(cubin)
> pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid -
>
> ```
> I did a quick search and only found this:
> https://github.com/inducer/pycuda/issues/45 , but it doesn't seem
> relevant to my problem, since the same code runs fine on my initial
> workstation. Can anyone see what the issue is?
>
> Below is my code that I'm trying to run:
> ```
> def initialise(self):
>         """
>         Documentation here
>         """
>
>         mod = SourceModule("""
>         #include <math.h>
>         __global__ void initial(float *tslr_out, float *lr_out, float *W_gpu,\
>             float *b_gpu, int *x_gpu, int d, float temp)
>         {
>             int tx = threadIdx.x;
>
>             // Wx stores the W_ji x_i product value
>             float Wx = 0;
>
>             // Matrix multiplication of W and x
>             for (int k=0; k<d; ++k)
>             {
>                 float W_element = W_gpu[tx * d + k];
>                 float x_element = x_gpu[k];
>                 Wx += W_element * x_element;
>             }
>
>             // Computing the linear response, signed linear response with temp
>             lr_out[tx] = Wx + b_gpu[tx];
>             tslr_out[tx] = (0.5/temp) * (1 - 2*x_gpu[tx])* (Wx + b_gpu[tx]);
>         }
>         """, arch='sm_60')
>
>         func = mod.get_function("initial")
>
>         # format for prepare defined at https://docs.python.org/2/library/struct.html
>         func.prepare("PPPPPif")
>
>         dsize_nparray = np.zeros((self.d,), dtype = np.float32)
>
>         lr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>         slr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>         tslr_gpu = cuda.mem_alloc(dsize_nparray.nbytes)
>
>         grid=(1,1)
>         block=(self.d,1,1)
>         # block=(self.d,self.d,1)
>
>         func.prepared_call(grid, block, tslr_gpu, lr_gpu, self.W_gpu, \
>                 self.b_gpu, self.x_gpu, self.d, self.temp)
>
>         return tslr_gpu, lr_gpu
> ```


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda
