Can I use a standard Julia "for" loop inside a "devices(...) do" block in CUDArt?

I tried the following example:

using CUDArt, MyCudaModule

nrow = 10
ncol = 3000

mat = ones(Float64,nrow,ncol)
out1 = zeros(Float64,nrow)
vec = Float64[1:nrow;]
out2 = zeros(Float64,nrow)

d_mat  = CudaArray(mat)
d_out1 = CudaArray(out1)
d_vec  = CudaArray(vec)
d_out2 = CudaArray(out2)
d_nrow = CudaArray(Int32[nrow;])
d_ncol = CudaArray(Int32[ncol;])

result = devices(dev->capability(dev)[1]>=2) do devlist
    MyCudaModule.init(devlist) do dev
        blocks = 1
        threads = nrow
        global result = 0
        result = for i in 1:10
            MyCudaModule.cudaSumCol(d_out1,d_mat,d_ncol,blocks,threads)

            result = to_host(d_out1)[1]
        end
    end
end

cudaSumCol is a function that simply sums a matrix's entries, reducing it to
a single column. It was wrapped just like the example in CUDArt's README.
The code above, without the loop part, works perfectly.
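A likely factor, offered as a hedged sketch rather than a confirmed diagnosis: in Julia a "for" loop is an expression whose value is always "nothing", so "result = for ... end" can never hold the last to_host value. The CPU-only sketch assumes that devices(...) and init(...) return whatever their do-block returns, as in the README pattern. with_device is a hypothetical stand-in for both, and sum(mat; dims=2)[1] is a hypothetical stand-in for cudaSumCol followed by to_host(d_out1)[1] (written in current Julia syntax, so no GPU is needed).

```julia
# Hypothetical stand-in for `devices(...) do devlist` / `init(devlist) do dev`:
# it just calls the do-block and returns whatever the block returns.
with_device(f) = f(0)

nrow, ncol = 10, 3000
mat = ones(Float64, nrow, ncol)

# A `for` loop always evaluates to `nothing`, so assigning it captures nothing useful.
loop_value = for i in 1:10
    sum(mat; dims=2)[1]
end

# Instead, update a local variable inside the loop and make it the block's last expression.
result = with_device() do dev
    r = 0.0
    for i in 1:10
        # CPU stand-in for cudaSumCol(...) followed by to_host(d_out1)[1]
        r = sum(mat; dims=2)[1]
    end
    r   # the do-block (and therefore with_device) returns this value
end

println(loop_value)  # nothing
println(result)      # 3000.0
```

Under that assumption, the "global result = 0" line is also unnecessary: the value is handed back through the do-blocks instead of through a global.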

Should I try something different, such as not using the "do devlist" block?

Thanks,
Joaquim
