Hi,
I have a kernel that is invoked in a loop with the data calculated in
the previous kernel iteration. The kernel reads its input data through
textures. Right now, I use Memcpy2D/Memcpy3D to copy the resulting GPUArray
back into the texture's array, but unfortunately this operation is very
slow: I have only been able to achieve 3-4 GB/s, which is far below the
50-60 GB/s I get in C with cudaMemcpyToArray, which unfortunately is
part of the Runtime API. My guess is that the problem comes from the
parameters I pass to Memcpy2D/3D, but I can't find the settings that speed
up the copy. The code looks like this:
ary is a w x h array allocated with "ary = cuda.Array(descr)"
gpu_arr is a GPUArray obtained from gpuarray.to_gpu
both are float32
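import pycuda.driver as cuda

copy = cuda.Memcpy2D()
copy.set_src_device(gpu_arr.gpudata)  # source: linear device memory of the GPUArray
copy.set_dst_array(ary)               # destination: the CUDA array bound to the texture
copy.height = h                       # number of rows
copy.src_pitch = copy.width_in_bytes = w * 4  # bytes per row (float32 = 4 bytes)
copy(aligned=True)                    # perform the copy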
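To make the 3-4 GB/s vs. 50-60 GB/s comparison reproducible, one option is to time the same device-to-array copy with CUDA events and divide the bytes moved by the elapsed time. The following is a minimal sketch, not a fix: the helper names effective_bandwidth_gbps and time_device_to_array_copy are hypothetical, it assumes a w x h float32 GPUArray as above, and it counts each byte once (1 GB = 1e9 bytes). Some benchmarks count read + write for device-to-device copies, so compare like with like against the C numbers.

```python
def effective_bandwidth_gbps(width, height, elapsed_ms, itemsize=4):
    """Bytes moved by one width x height copy divided by time, in GB/s.

    Each byte is counted once (1 GB = 1e9 bytes); some benchmarks count
    read + write (2x) for device-to-device copies.
    """
    nbytes = width * height * itemsize
    return nbytes / (elapsed_ms * 1e-3) / 1e9


def time_device_to_array_copy(w, h, repeats=20):
    """Hypothetical harness: time Memcpy2D from a GPUArray into a CUDA array."""
    # Imported here so the bandwidth helper above works without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    # Source: a C-contiguous h x w float32 array, so the row pitch is w * 4.
    gpu_arr = gpuarray.to_gpu(np.random.rand(h, w).astype(np.float32))

    # Destination: a single-channel float32 CUDA array of the same shape.
    descr = cuda.ArrayDescriptor()
    descr.width = w
    descr.height = h
    descr.format = cuda.array_format.FLOAT
    descr.num_channels = 1
    ary = cuda.Array(descr)

    # Same copy parameters as in the snippet above.
    copy = cuda.Memcpy2D()
    copy.set_src_device(gpu_arr.gpudata)
    copy.set_dst_array(ary)
    copy.width_in_bytes = copy.src_pitch = w * 4
    copy.height = h

    copy(aligned=True)  # warm-up copy, excluded from the timing

    # Events are recorded on the default stream around the timed copies.
    start, end = cuda.Event(), cuda.Event()
    start.record()
    for _ in range(repeats):
        copy(aligned=True)
    end.record()
    end.synchronize()

    ms_per_copy = start.time_till(end) / repeats  # time_till returns milliseconds
    return effective_bandwidth_gbps(w, h, ms_per_copy)
```

On a machine with a GPU, calling something like time_device_to_array_copy(4096, 4096) gives a single GB/s figure that can be set directly against the cudaMemcpyToArray measurement from C.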
cheers,
J-Pascal
_______________________________________________
PyCuda mailing list
http://tiker.net/mailman/listinfo/pycuda_tiker.net