I can tell you I've gotten CUDArt-based operations working on remote machines.
I don't really know what the issue is, but I did notice a couple of concerns:

- What's up with `CudaArray(map(elty, ones(n1)))`? That doesn't look right at all. Don't you mean `CudaArray(elty, ones(n1)...)`?
- Don't you need to make sure the kernel completes before calling `to_host`?

Best,
--Tim

On Thursday, March 03, 2016 04:31:54 AM Matthew Pearce wrote:
> Hello
>
> I've come across a baffling error. I have a custom CUDA kernel to calculate
> squared row norms of a matrix. It works fine on the host computer:
>
> julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
> CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @
> 0x0000000b037a0000),(4000,2500),0)
>
> julia> sum(cudakernels.sqrownorms(d_M))
> 5.149127f6
>
> However, when I try to run the same code on a remote machine, the variable
> `d_M` gets calculated properly, but the norm computation fails. The custom
> kernel launch code looks like:
>
> function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
>     elty = eltype(d_M)
>     n1, n2 = size(d_M)
>     d_dots = CudaArray(map(elty, ones(n1)))
>     dev = device(d_dots)
>     dotf = ptxdict[(dev, "sqrownorms", elty)]
>     numblox = Int(ceil(n1/maxBlock))
>     CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
>     dots = to_host(d_dots)
>     free(d_dots)
>     return dots
> end
>
> Running the body of this function on a remote worker causes the crash
> message below. (Calling the function itself produces an unhelpful
> "process exited" error.)
> julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
>
> julia> p2 = quote
>     elty = eltype(d_M)
>     n1, n2 = size(d_M)
>     d_dots = CudaArray(map(elty, ones(n1)))
>     dev = device(d_dots)
>     dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>     numblox = Int(ceil(n1/cudakernels.maxBlock))
>     CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, d_dots))
>     dots = to_host(d_dots)
>     free(d_dots)
>     dots
> end;
>
> julia> reap(reps[5], p2)  # this is a remote call fetch of the eval of the `p2` block in global scope
> ERROR: On worker 38:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
>  in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:396
>  in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
>  in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
>
> Any thoughts much appreciated - I'm not sure where to go with this now.
>
> Matthew
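For what it's worth, here is how the two concerns above might look in code. This is an untested sketch, not a confirmed fix: it assumes CUDArt's 6.5-era API (`CudaArray`, `to_host`, `free`, `device_synchronize`) and the poster's own `ptxdict` and `maxBlock` globals being in scope:

```julia
using CUDArt

function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
    elty = eltype(d_M)
    n1, n2 = size(d_M)
    # Allocate with the desired element type directly; map(elty, ones(n1))
    # builds a Float64 vector first and converts it element-by-element.
    d_dots = CudaArray(ones(elty, n1))
    dev = device(d_dots)
    dotf = ptxdict[(dev, "sqrownorms", elty)]
    numblox = Int(ceil(n1/maxBlock))
    CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
    # Block until the kernel has finished before copying back to the host.
    device_synchronize()
    dots = to_host(d_dots)
    free(d_dots)
    return dots
end
```

Note that `map(elty, ones(n1))` does produce a `Vector{elty}`, so that line alone may not explain the crash, but `ones(elty, n1)` says the same thing more directly, and the explicit synchronize rules out a race between the kernel and the `cudaMemcpyAsync` inside `to_host`.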
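Since the crash only appears on the remote worker, another thing worth checking (a guess, not a diagnosis) is whether that worker ever initialized its own CUDA device and compiled the PTX module for it, so that `cudakernels.ptxdict` actually has an entry keyed on the worker's device. A hypothetical per-worker setup, with the initialization step left as a placeholder since the post doesn't show how `ptxdict` is populated:

```julia
@everywhere begin
    using CUDArt
    # Bind this worker to a GPU before any allocation or kernel launch.
    device(0)
    # ...populate cudakernels.ptxdict for this device here, so that
    # ptxdict[(device, "sqrownorms", elty)] exists on every worker.
end
```

If the module was only loaded on the host, the handle looked up from `ptxdict` on the worker could be stale or belong to a different context, which would be consistent with an "illegal memory access" surfacing at the first memcpy.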