I can tell you I've gotten CUDArt-based operations working on remote machines.

I don't really know what the issue is, but I did notice a couple of concerns:

- What's up with `CudaArray(map(elty, ones(n1)))`? That doesn't look right at 
all. Don't you mean `CudaArray(elty, ones(n1)...)`?
- Don't you need to make sure the kernel completes before calling `to_host`? 
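Roughly, I'd expect the launch code to look something like this (an untested sketch of your function, not a drop-in fix; it assumes CUDArt's exported `device_synchronize` and your existing `ptxdict`/`maxBlock` globals):

```julia
function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
    n1, n2 = size(d_M)
    # allocate on the host with the right element type, then copy to device
    d_dots = CudaArray(ones(T, n1))
    dev = device(d_dots)
    dotf = ptxdict[(dev, "sqrownorms", T)]
    numblox = Int(ceil(n1/maxBlock))
    CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
    device_synchronize()  # make sure the kernel has finished before copying back
    dots = to_host(d_dots)
    free(d_dots)
    return dots
end
```

The `device_synchronize()` call is the part I'd try first: without it, `to_host`'s `cudaMemcpyAsync` can race the kernel, and an illegal access inside the kernel will only surface at the next API call, which matches your stack trace.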

Best,
--Tim

On Thursday, March 03, 2016 04:31:54 AM Matthew Pearce wrote:
> Hello
> 
> I've come across a baffling error. I have a custom CUDA kernel to calculate
> squared row norms of a matrix. It works fine on the host computer:
> 
> julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
> CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @
> 0x0000000b037a0000),(4000,2500),0)
> 
> julia> sum(cudakernels.sqrownorms(d_M))
> 5.149127f6
> 
> However, when I try to run the same code on a remote machine, the variable
> `d_M` gets calculated properly, but the squared-row-norms computation then
> crashes. The custom kernel launch code looks like:
> 
> function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
>     elty = eltype(d_M)
>     n1, n2 = size(d_M)
>     d_dots = CudaArray(map(elty, ones(n1)))
>     dev = device(d_dots)
>     dotf = ptxdict[(dev, "sqrownorms", elty)]
>     numblox = Int(ceil(n1/maxBlock))
>     CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
>     dots = to_host(d_dots)
>     free(d_dots)
>     return dots
> end
> 
> Running the body of this function on a remote worker causes the crash
> message below. (Calling the function itself just produces an unhelpful
> "process exited" error, arrgh!)
> 
> julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
> 
> julia> p2 = quote
>            elty = eltype(d_M)
>            n1, n2 = size(d_M)
>            d_dots = CudaArray(map(elty, ones(n1)))
>            dev = device(d_dots)
>            dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>            numblox = Int(ceil(n1/cudakernels.maxBlock))
>            CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2,
> d_dots))
>            dots = to_host(d_dots)
>            free(d_dots)
>            dots
>        end;
> 
> julia> reap(reps[5], p2)  # this is a remotecall_fetch of the eval of the
> `p2` block in global scope
> ERROR: On worker 38:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
>  in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/
> gen_libcudart.jl:396
>  in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
>  in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
> 
> Any thoughts much appreciated - I'm not sure where to go with this now.
> 
> Matthew