Can I have a standard Julia "for loop" inside a "device do" of CUDArt?
I tried the following example:

    using CUDArt, MyCudaModule

    nrow = 10
    ncol = 3000
    mat = ones(Float64,nrow,ncol)
    out1 = zeros(Float64,nrow)
    vec = Float64[1:nrow;]
    out2 = zeros(Float64,nrow)

    d_mat = CudaArray(mat)
    d_out1 = CudaArray(out1)
    d_vec = CudaArray(vec)
    d_out2 = CudaArray(out2)
    d_nrow = CudaArray(Int32[nrow;])
    d_ncol = CudaArray(Int32[ncol;])

    result = devices(dev->capability(dev)[1]>=2) do devlist
        MyCudaModule.init(devlist) do dev
            blocks = 1
            threads = nrow
            global result = 0
            result = for i in 1:10
                MyCudaModule.cudaSumCol(d_out1,d_mat,d_ncol,blocks,threads)
                result = to_host(d_out1)[1]
            end
        end
    end

cudaSumCol is a function that simply sums a matrix's entries, converting it into a column. It was wrapped just like the example in CUDArt's README. The above code works perfectly without the loop part.

Should I try something different, like not using the do devlist?

Thanks,
Joaquim