Hello CUDA,
I am trying to speed up my Python program, which uses a not-so-trivial algorithm, so I need to know: what is the correct way of transferring a list of lists of floats to a (Py)CUDA kernel?
Given, as an example, the following list
listToProc = [[-1,-2,-3,-4,-5],[1,2,3,4,5,6,7,8.1,9]]
it shall be transferred to a PyCUDA kernel for further processing. I would then proceed with the common approach for transferring a flat list of values (not a list of lists), like this:
listToProcAr = np.array(listToProc, dtype=np.object)
listToProcAr_gpu = cuda.mem_alloc(listToProcAr.nbytes)
cuda.memcpy_htod(listToProcAr_gpu, listToProcAr)
However, this results in two problems:
1) listToProcAr.nbytes = 2, i.e. too little memory is reserved. I believe this can be solved by
listBytes = 0
for currentList in listToProc:
    listBytes += np.array(currentList, dtype=np.float32).nbytes
and replacing the size argument here:
listToProcAr_gpu = cuda.mem_alloc(listBytes)
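For reference, the byte-count computation above can be checked on its own with plain NumPy (no GPU required), which is how I convinced myself that this part is correct:

```python
import numpy as np

# The ragged example list from above
listToProc = [[-1, -2, -3, -4, -5],
              [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# Sum the byte sizes of the inner lists, each viewed as a float32 array
listBytes = 0
for currentList in listToProc:
    listBytes += np.array(currentList, dtype=np.float32).nbytes

# 5 floats + 9 floats = 14 elements * 4 bytes = 56 bytes
print(listBytes)
```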
2) and the actual problem:
cuda.memcpy_htod(listToProcAr_gpu, listToProcAr) still seems to create wrong pointers in the kernel, because trying to access the last element of the second list (listToProc[1][8]) raises:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
So I'm a little bit clueless at the moment.
The CUDA kernel code:
__global__ void procTheListKernel(float **listOfLists)
{
    listOfLists[0][0] = 0;
    listOfLists[1][8] = 0;
    __syncthreads();
}
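For completeness, a workaround I could fall back to is flattening everything into one contiguous float32 buffer plus an int32 offsets array, so a kernel would take `float *flat, int *offsets` instead of a `float **`. A host-side sketch of that idea (the names `flat` and `offsets` are mine, and this is not wired up to the kernel above):

```python
import numpy as np

listToProc = [[-1, -2, -3, -4, -5],
              [1, 2, 3, 4, 5, 6, 7, 8.1, 9]]

# One contiguous buffer holding all inner lists back to back
flat = np.concatenate([np.asarray(l, dtype=np.float32) for l in listToProc])

# offsets[i] is where list i starts in `flat`; the extra last entry is the
# total length, so list i occupies flat[offsets[i]:offsets[i+1]]
offsets = np.zeros(len(listToProc) + 1, dtype=np.int32)
offsets[1:] = np.cumsum([len(l) for l in listToProc])

# Element [1][8] of the original list of lists:
print(flat[offsets[1] + 8])  # 9.0
```

Both arrays are then plain contiguous NumPy arrays, which mem_alloc/memcpy_htod handle without any pointer indirection. But I would still like to know whether a direct list-of-lists transfer is possible.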
Can anyone help me out?
Kind Regards
Frank
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
https://lists.tiker.net/listinfo/pycuda