Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'
Thanks for the update.

On Thursday, March 03, 2016 10:52:26 AM Matthew Pearce wrote:
> To get it straight: if A is a matrix in main memory and the corresponding
> GPU memory object is d_A = CudaArray(A), then:
>
>     A[i, j] = d_A[(j - 1) * nrows + i]

Yes, that should be right. Please do feel free to edit the README if it's unclear (it's often very hard for the code author to spot what's unclear!): https://github.com/JuliaGPU/CUDArt.jl/blob/master/README.md, then click the pencil icon.

Best,
--Tim
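The layout relation above can be checked directly in Julia; a CudaArray keeps the same column-major order as a Julia Array. A small sketch (the matrix values are illustrative; the indexing relation is the point):

```julia
# Column-major layout:
# 1-based (Julia):   A[i, j] == A[(j - 1) * nrows + i]
# 0-based (kernel):  element (i, j) sits at linear offset j * nrows + i
A = reshape(collect(1.0:12.0), 3, 4)   # 3x4 matrix, filled column by column
nrows = size(A, 1)
for j in 1:4, i in 1:3
    @assert A[i, j] == A[(j - 1) * nrows + i]
end
# The same relation with zero-based indices, as a CUDA kernel sees them:
i0, j0 = 1, 2                          # zero-based (i, j); A[2, 3] in Julia
@assert A[i0 + 1, j0 + 1] == vec(A)[j0 * nrows + i0 + 1]
```

The zero-based form is the one to use inside a kernel, which is an easy place for the two conventions to get crossed.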
Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'
Thanks again. I think the problem may have been with my kernel and getting confused about the row-major versus column-major ordering of the array layout. I thought I'd checked it was producing the correct norms yesterday, but I must have changed something...

To get it straight: if A is a matrix in main memory and the corresponding GPU memory object is d_A = CudaArray(A), then:

    A[i, j] = d_A[(j - 1) * nrows + i]

Is that right? I guess I got confused by the discussion of transposition in the CUDArt docs.

Matthew
Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'
Oh, drat, I misread it as if you were trying to create an n-dimensional array of size one in each dimension. I forgot you could move a CPU array to the device with a construct like that. Sorry about that. More efficient for such things, however, is to use fill or fill!, since there is no memory movement involved (the array is _only_ created on the GPU).

Regarding the rest, I'm not sure. If you figure it out, adding a note to the README seems like it could be helpful. Multi-process is still a bit challenging sometimes, though package precompilation has made it a lot more pleasant than it used to be!

Best,
--Tim

On Thursday, March 03, 2016 07:27:27 AM Matthew Pearce wrote:
> Thanks Tim.
>
> For me `elty=Float32', so if I use `CudaArray(elty, ones(10))' or
> `CudaArray(elty, ones(10)...)' I get a conversion error. [I am running
> Julia 0.5.0-dev+749]
>
> The result of my CudaArray creation above looks like:
>
> julia> to_host(CudaArray(map(elty, ones(10)')))
> 1x10 Array{Float32,2}:
>  1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
>
> I tried putting a `device_synchronize()' call in the `p2' block above like
> so, which was probably needed anyway, but doesn't fix the error:
>
> julia> p2 = quote
>            elty = eltype(d_M)
>            n1, n2 = size(d_M)
>            d_dots = CudaArray(map(elty, ones(n1)))
>            dev = device(d_dots)
>            dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>            numblox = Int(ceil(n1/cudakernels.maxBlock))
>            CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, d_dots))
>            device_synchronize()
>            dots = to_host(d_dots)
>            free(d_dots)
>            dots
>        end
>
> julia> sow(reps[3], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
> RemoteRef{Channel{Any}}(51,1,40341)
>
> julia> reap(reps[3], :(string(d_M)))
> Dict{Int64,Any} with 1 entry:
>   51 => "CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @0x000b041e),(4000,2500),0)"
>
> julia> reap(reps[3], p2)
> ERROR: On worker 51:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:16
>  in device_synchronize at /home/mcp50/.julia/v0.5/CUDArt/src/device.jl:28
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
>
> One thing I have noted is that a remote process crashes if I ever attempt
> to move a `CudaArray' type/pointer from it back to the host. That
> shouldn't be happening in the above, but I wonder if, inadvertently,
> something similar is happening.
>
> If I try calling the kernel on another process on the same machine, I
> don't get the error:
>
> julia> sow(62, :d_M, :(residual_shared($Y_init,$A_init,$S_init,1,$sig)))
> RemoteRef{Channel{Any}}(62,1,40936)
>
> julia> sum(reap(62, p2)[62])
> 5.149127f6
>
> Hmm...
Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'
Thanks Tim.

For me `elty=Float32', so if I use `CudaArray(elty, ones(10))' or `CudaArray(elty, ones(10)...)' I get a conversion error. [I am running Julia 0.5.0-dev+749]

The result of my CudaArray creation above looks like:

julia> to_host(CudaArray(map(elty, ones(10)')))
1x10 Array{Float32,2}:
 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

I tried putting a `device_synchronize()' call in the `p2' block above like so, which was probably needed anyway, but doesn't fix the error:

julia> p2 = quote
           elty = eltype(d_M)
           n1, n2 = size(d_M)
           d_dots = CudaArray(map(elty, ones(n1)))
           dev = device(d_dots)
           dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
           numblox = Int(ceil(n1/cudakernels.maxBlock))
           CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, d_dots))
           device_synchronize()
           dots = to_host(d_dots)
           free(d_dots)
           dots
       end

julia> sow(reps[3], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
RemoteRef{Channel{Any}}(51,1,40341)

julia> reap(reps[3], :(string(d_M)))
Dict{Int64,Any} with 1 entry:
  51 => "CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @0x000b041e),(4000,2500),0)"

julia> reap(reps[3], p2)
ERROR: On worker 51:
"an illegal memory access was encountered"
 [inlined code] from essentials.jl:111
 in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
 [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:16
 in device_synchronize at /home/mcp50/.julia/v0.5/CUDArt/src/device.jl:28
 in anonymous at multi.jl:892
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:892
 in anonymous at task.jl:59
 in remotecall_fetch at multi.jl:731
 [inlined code] from multi.jl:368
 in remotecall_fetch at multi.jl:734
 in anonymous at task.jl:443
 in sync_end at ./task.jl:409
 [inlined code] from task.jl:418
 in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203

One thing I have noted is that a remote process crashes if I ever attempt to move a `CudaArray' type/pointer from it back to the host. That shouldn't be happening in the above, but I wonder if, inadvertently, something similar is happening.

If I try calling the kernel on another process on the same machine, I don't get the error:

julia> sow(62, :d_M, :(residual_shared($Y_init,$A_init,$S_init,1,$sig)))
RemoteRef{Channel{Any}}(62,1,40936)

julia> sum(reap(62, p2)[62])
5.149127f6

Hmm...
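As an aside on the `p2' block above: the launch grid is computed with `Int(ceil(n1/cudakernels.maxBlock))'; integer ceiling division gives the same count without the float round-trip. A small sketch (maxBlock = 256 is an assumed value for illustration; the actual value of `cudakernels.maxBlock' isn't shown in the thread):

```julia
maxBlock = 256                         # assumed thread-block size (illustrative)
n1 = 4000                              # matches the (4000,2500) array above
numblox = Int(ceil(n1 / maxBlock))     # the expression as written in `p2`
@assert numblox == cld(n1, maxBlock)   # cld: ceiling division on integers
```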
Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'
I can tell you I've gotten CUDArt-based operations working on remote machines. I don't really know what the issue is, but I did notice a couple of concerns:

- What's up with `CudaArray(map(elty, ones(n1)))`? That doesn't look right at all. Don't you mean `CudaArray(elty, ones(n1)...)`?
- Don't you need to make sure the kernel completes before calling to_host?

Best,
--Tim

On Thursday, March 03, 2016 04:31:54 AM Matthew Pearce wrote:
> Hello
>
> I've come across a baffling error. I have a custom CUDA kernel that
> calculates the squared row norms of a matrix. It works fine on the host
> computer:
>
> julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
> CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @0x000b037a),(4000,2500),0)
>
> julia> sum(cudakernels.sqrownorms(d_M))
> 5.149127f6
>
> However, when I try to run the same code on a remote machine, it fails,
> even though the variable `d_M' gets calculated properly there. The custom
> kernel launch code looks like:
>
> function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
>     elty = eltype(d_M)
>     n1, n2 = size(d_M)
>     d_dots = CudaArray(map(elty, ones(n1)))
>     dev = device(d_dots)
>     dotf = ptxdict[(dev, "sqrownorms", elty)]
>     numblox = Int(ceil(n1/maxBlock))
>     CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
>     dots = to_host(d_dots)
>     free(d_dots)
>     return dots
> end
>
> Running the inside of this on a remote causes the crash message below.
> (Running the function itself produces an unhelpful `process exited'
> error - arrgh!)
>
> julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
>
> julia> p2 = quote
>            elty = eltype(d_M)
>            n1, n2 = size(d_M)
>            d_dots = CudaArray(map(elty, ones(n1)))
>            dev = device(d_dots)
>            dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>            numblox = Int(ceil(n1/cudakernels.maxBlock))
>            CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, d_dots))
>            dots = to_host(d_dots)
>            free(d_dots)
>            dots
>        end;
>
> julia> reap(reps[5], p2)  # this is a remote call fetch of the eval of the `p2' block in global scope
> ERROR: On worker 38:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
>  in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:396
>  in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
>  in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
>
> Any thoughts much appreciated - I'm not sure where to go with this now.
>
> Matthew
[julia-users] Using CUDArt on remote machines - 'illegal memory access'
Hello

I've come across a baffling error. I have a custom CUDA kernel that calculates the squared row norms of a matrix. It works fine on the host computer:

julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @0x000b037a),(4000,2500),0)

julia> sum(cudakernels.sqrownorms(d_M))
5.149127f6

However, when I try to run the same code on a remote machine, it fails, even though the variable `d_M' gets calculated properly there. The custom kernel launch code looks like:

function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
    elty = eltype(d_M)
    n1, n2 = size(d_M)
    d_dots = CudaArray(map(elty, ones(n1)))
    dev = device(d_dots)
    dotf = ptxdict[(dev, "sqrownorms", elty)]
    numblox = Int(ceil(n1/maxBlock))
    CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
    dots = to_host(d_dots)
    free(d_dots)
    return dots
end

Running the inside of this on a remote causes the crash message below. (Running the function itself produces an unhelpful `process exited' error - arrgh!)

julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))

julia> p2 = quote
           elty = eltype(d_M)
           n1, n2 = size(d_M)
           d_dots = CudaArray(map(elty, ones(n1)))
           dev = device(d_dots)
           dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
           numblox = Int(ceil(n1/cudakernels.maxBlock))
           CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, d_dots))
           dots = to_host(d_dots)
           free(d_dots)
           dots
       end;

julia> reap(reps[5], p2)  # this is a remote call fetch of the eval of the `p2' block in global scope
ERROR: On worker 38:
"an illegal memory access was encountered"
 [inlined code] from essentials.jl:111
 in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
 [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
 in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:396
 in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
 in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
 in anonymous at multi.jl:892
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:892
 in anonymous at task.jl:59
 in remotecall_fetch at multi.jl:731
 [inlined code] from multi.jl:368
 in remotecall_fetch at multi.jl:734
 in anonymous at task.jl:443
 in sync_end at ./task.jl:409
 [inlined code] from task.jl:418
 in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203

Any thoughts much appreciated - I'm not sure where to go with this now.

Matthew
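For anyone checking the device results above: a plain-CPU sketch of what the custom `sqrownorms' kernel is meant to compute (the squared L2 norm of each row). The name `sqrownorms_cpu' is hypothetical, not part of CUDArt; it's just a reference for comparing against small inputs:

```julia
# Hypothetical CPU reference for the custom `sqrownorms` kernel:
# the squared L2 norm of each row of M.
function sqrownorms_cpu(M::AbstractMatrix)
    [sum(abs2, M[i, :]) for i in 1:size(M, 1)]  # one squared norm per row
end

M = [1.0 2.0; 3.0 4.0]
sqrownorms_cpu(M)   # -> [5.0, 25.0]  (1+4 and 9+16)
```

On an array as large as the (4000,2500) one above, `sum(sqrownorms_cpu(to_host(d_M)))' would be a slow but direct cross-check of the `5.149127f6' figure.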