[PyCuda] Slow Device to Array copy

J-Pascal Mercier Thu, 05 Feb 2009 08:59:13 -0800

Hi,

I have a kernel that is invoked in a loop, each iteration taking the
data computed by the previous iteration as input. The kernel reads its
input through textures. Right now I use Memcpy2D/Memcpy3D to copy the
resulting GPUArray back to a texture, but unfortunately this operation
is very slow. I have only been able to achieve 3-4 GB/s, which is far
lower than the 50-60 GB/s I can achieve in C with the function
cudaMemcpyToArray, which unfortunately is part of the Runtime API. My
guess is that the problem comes from the parameters of Memcpy2D/3D, but
I can't find the right ones to speed up the process. The code looks
like this:


ary is a w * h array allocated with "ary = cuda.Array(descr)"
gpu_arr is a GPUArray obtained from gpuarray.to_gpu
both are float32

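    copy = cuda.Memcpy2D()
    copy.set_src_device(gpu_arr.gpudata)  # source: linear device memory
    copy.set_dst_array(ary)               # destination: CUDA array
    copy.height = h                       # number of rows to copy
    # float32 is 4 bytes; the source rows are tightly packed
    copy.src_pitch = copy.width_in_bytes = w * 4
    copy(aligned=True)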
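
For reference, one way figures such as the 3-4 GB/s above could be measured is sketched below. This is only an illustration under stated assumptions, not necessarily how the numbers in this post were obtained: the helper names effective_bandwidth_gb_s and time_device_to_array_copy are hypothetical, 1 GB is taken as 1e9 bytes, and each byte is counted once (some tools count a device-to-device copy as a read plus a write, which doubles the figure). The timing function needs PyCUDA and a CUDA-capable GPU, so its imports are kept inside the function.

```python
def effective_bandwidth_gb_s(width, height, itemsize, elapsed_ms, repeats=1):
    """Return copy throughput in GB/s (1 GB = 1e9 bytes, bytes counted once)."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    total_bytes = width * height * itemsize * repeats
    return total_bytes / (elapsed_ms * 1e-3) / 1e9


def time_device_to_array_copy(w, h, repeats=100):
    """Time the Memcpy2D call from the post (hypothetical helper).

    Requires PyCUDA and a CUDA-capable GPU.
    """
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a context on import)
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    # Destination: a w x h single-channel float32 CUDA array.
    descr = cuda.ArrayDescriptor()
    descr.width = w
    descr.height = h
    descr.format = cuda.array_format.FLOAT
    descr.num_channels = 1
    ary = cuda.Array(descr)

    # Source: tightly packed float32 data in linear device memory.
    gpu_arr = gpuarray.to_gpu(np.zeros((h, w), dtype=np.float32))

    copy = cuda.Memcpy2D()
    copy.set_src_device(gpu_arr.gpudata)
    copy.set_dst_array(ary)
    copy.height = h
    copy.src_pitch = copy.width_in_bytes = w * 4

    # CUDA events time the work on the GPU itself; time_till() returns ms.
    start, stop = cuda.Event(), cuda.Event()
    start.record()
    for _ in range(repeats):
        copy(aligned=True)
    stop.record()
    stop.synchronize()
    return effective_bandwidth_gb_s(w, h, 4, start.time_till(stop), repeats)
```

On a machine with a GPU, something like print(time_device_to_array_copy(2048, 2048)) would report the averaged throughput. Repeating the copy many times between the two events keeps per-call launch overhead from dominating the measurement for small arrays.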
 

cheers,

J-Pascal


_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
