Hello CUDA,

I'm trying to speed up my Python program, which uses a not-so-trivial algorithm, and I need to know: what is the correct way to transfer a list of lists of floats to a (Py)CUDA kernel?

Given the following list as an example,

           listToProc = [[-1,-2,-3,-4,-5],[1,2,3,4,5,6,7,8.1,9]]

it shall be transferred to a PyCUDA kernel for further processing. My first attempt was to use the usual calls for transferring a flat list of values (not a list of lists), like this:

            listToProcAr = np.array(listToProc, dtype=np.object)
            listToProcAr_gpu = cuda.mem_alloc(listToProcAr.nbytes)
            cuda.memcpy_htod(listToProcAr_gpu, listToProcAr)

However this results in two problems:

1) listToProcAr.nbytes = 2 - i.e. too little memory is reserved. I believe this can be solved by

          listBytes = 0
          for currentList in listToProc:
              listBytes += np.array(currentList, dtype=np.float32).nbytes

and then replacing the size argument here:

          listToProcAr_gpu = cuda.mem_alloc(listBytes)
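Alternatively (a hedged sketch on my part, not tested on a device): since the per-list sizes are needed anyway, the ragged lists could be flattened into one contiguous float32 buffer plus an offsets array, which a single mem_alloc/memcpy_htod pair can then transfer:

```python
import numpy as np

listToProc = [[-1, -2, -3, -4, -5], [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# One contiguous float32 buffer holding all sub-lists back to back.
flat = np.concatenate([np.asarray(l, dtype=np.float32) for l in listToProc])

# offsets[i] = start index of sub-list i in `flat`; offsets[-1] = total length.
offsets = np.zeros(len(listToProc) + 1, dtype=np.int32)
offsets[1:] = np.cumsum([len(l) for l in listToProc])

# Element listToProc[1][8] is then flat[offsets[1] + 8].
# Device side (untested sketch):
#   flat_gpu = cuda.mem_alloc(flat.nbytes); cuda.memcpy_htod(flat_gpu, flat)
#   offs_gpu = cuda.mem_alloc(offsets.nbytes); cuda.memcpy_htod(offs_gpu, offsets)
```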

2) and the actual problem

cuda.memcpy_htod(listToProcAr_gpu, listToProcAr) still seems to hand the kernel invalid pointers: trying to access the last element of the second list (listToProc[1][8]) raises a

    PyCUDA WARNING: a clean-up operation failed (dead context maybe?)

So I'm a bit clueless at the moment.
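For context on problem 2, my understanding (illustrated by a quick host-side check, no GPU involved): an object array stores one pointer-sized slot per Python sub-list, not the float data, so memcpy_htod copies host addresses that mean nothing on the device:

```python
import numpy as np

listToProc = [[-1, -2, -3, -4, -5], [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# An object array of two ragged lists holds two pointer-sized slots,
# not the 14 floats themselves.
ar = np.empty(len(listToProc), dtype=object)
ar[:] = listToProc

print(ar.shape)   # (2,) -- one slot per sub-list
print(ar.nbytes)  # itemsize * 2, far less than the 56 bytes of float32 data
# memcpy_htod would copy those two host pointers to the GPU; dereferencing
# them in the kernel is what produces the crash.
```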

The kernel code:

__global__ void procTheListKernel(float ** listOfLists)
{
    listOfLists[0][0] = 0;
    listOfLists[1][8] = 0;
    __syncthreads();
}
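If the data were instead flattened host-side into one float buffer plus an int offsets array (a common way to pass ragged data; the parameter names are my own), the kernel would take two flat pointers rather than a float**. An untested sketch:

```cuda
// Untested sketch: `flat` holds all sub-lists back to back,
// offsets[i] is the start index of sub-list i.
__global__ void procTheListKernel(float *flat, const int *offsets)
{
    flat[offsets[0] + 0] = 0.0f;  // was listOfLists[0][0]
    flat[offsets[1] + 8] = 0.0f;  // was listOfLists[1][8]
}
```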

Can anyone help me out?

Kind Regards
Frank


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda
