Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Tim Holy
Thanks for the update.

On Thursday, March 03, 2016 10:52:26 AM Matthew Pearce wrote:
> To get it straight: if A is a matrix in main memory and the corresponding GPU
> memory object is d_A = CudaArray(A), then:
> 
> A[i, j] = d_A[ j * nrows + i]

Yes, that should be right. Please do feel free to edit the README if it's 
unclear (it's often very hard for the code author to spot what's unclear!).

https://github.com/JuliaGPU/CUDArt.jl/blob/master/README.md (click the pencil 
icon).
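
For concreteness (taking i and j in that formula as 0-based indices, as they
would be inside the kernel), a minimal runnable check of the column-major
layout; the i0/j0 names are just illustrative:

# Julia stores matrices column-major: with 1-based i, j the linear index is (j-1)*nrows + i.
A = reshape(map(Float32, 1:12), 3, 4)   # 3x4 matrix, nrows = 3
i, j = 2, 3                             # 1-based Julia indices
@assert A[i, j] == vec(A)[(j - 1) * size(A, 1) + i]
# With 0-based i0, j0 (as on the CUDA kernel side) the same element sits at flat offset j0*nrows + i0.
i0, j0 = i - 1, j - 1
@assert A[i, j] == vec(A)[j0 * size(A, 1) + i0 + 1]   # +1 converts the offset back to 1-based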

Best,
--Tim



Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Matthew Pearce
Thanks again.

I think the problem may have been with my kernel and confusion about the 
row-major versus column-major ordering of the array layout. 
I thought I'd checked that it was producing the correct norms yesterday, but I 
must have changed something...

To get it straight: if A is a matrix in main memory and the corresponding GPU 
memory object is d_A = CudaArray(A), then:

A[i, j] = d_A[ j * nrows + i]

Is that right? I guess I got confused by the discussion of transposition in 
the CUDArt docs.

Matthew


Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Tim Holy
Oh, drat, I misread it as if you were trying to create an n-dimensional array 
of size one in each dimension. I forgot you could move a CPU array to the device 
with a construct like that. Sorry about that.

More efficient, however, is to use fill or fill! for such things, since there is 
no memory movement involved (it is _only_ created on the GPU).
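
A sketch of that, assuming the devices() do-block initialization from the CUDArt
README and CUDArt's fill! method for CudaArray:

using CUDArt

devices(dev -> true) do devlist
    n1 = 10
    d_dots = CudaArray(Float32, n1)  # allocated directly on the GPU, uninitialized
    fill!(d_dots, 1f0)               # initialized on-device; no host array, no host->device copy
    println(to_host(d_dots))         # 10-element Array{Float32,1} of ones
    free(d_dots)
end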

Regarding the rest, I'm not sure. If you figure it out, adding a note to the 
README seems like it could be helpful. Multi-process is still a bit 
challenging sometimes, though package precompilation has made it a lot more 
pleasant than it used to be!

Best,
--Tim

On Thursday, March 03, 2016 07:27:27 AM Matthew Pearce wrote:
> Thanks Tim.
> 
> For me `elty=Float32', so if I use `CudaArray(elty, ones(10))' or
> `CudaArray(elty, ones(10)...)' I get a conversion error. [I am running Julia
> 0.5.0-dev+749]
> The result of my CudaArray creation above looks like:
> 
> julia> to_host(CudaArray(map(elty, ones(10)')))
> 1x10 Array{Float32,2}:
>  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
> 
> I tried putting a `device_synchronize()' call in the `p2' block above like
> so, which was probably needed anyway, but doesn't fix the error:
> 
> julia> p2 = quote
>elty = eltype(d_M)
>n1, n2 = size(d_M)
>d_dots = CudaArray(map(elty, ones(n1)))
>dev = device(d_dots)
>dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>numblox = Int(ceil(n1/cudakernels.maxBlock))
>CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2,
> d_dots))
>device_synchronize()
>dots = to_host(d_dots)
>free(d_dots)
>dots
>end
> 
> julia> sow(reps[3], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
> RemoteRef{Channel{Any}}(51,1,40341)
> 
> julia> reap(reps[3], :(string(d_M)))
> Dict{Int64,Any} with 1 entry:
>   51 => "CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32}
> @0x000b041e),(4000,2500),0)"
> 
> julia> reap(reps[3], p2)
> ERROR: On worker 51:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/
> gen_libcudart.jl:16
>  in device_synchronize at /home/mcp50/.julia/v0.5/CUDArt/src/device.jl:28
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
> 
> One thing I have noted is that a remote process crashes if I ever attempt
> to move a `CudaArray' type/pointer from it to the host.
> That shouldn't be happening in the above, but I wonder if something similar
> is inadvertently happening.
> 
> If I try calling the kernel on another process on the same machine, I don't
> get the error:
> 
> julia> sow(62, :d_M, :(residual_shared($Y_init,$A_init,$S_init,1,$sig)))
> RemoteRef{Channel{Any}}(62,1,40936)
> 
> julia> sum(reap(62, p2)[62])
> 5.149127f6
> 
> Hmm...



Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Matthew Pearce
Thanks Tim. 

For me `elty=Float32', so if I use `CudaArray(elty, ones(10))' or 
`CudaArray(elty, ones(10)...)' I get a conversion error. [I am running Julia 
0.5.0-dev+749]
The result of my CudaArray creation above looks like:

julia> to_host(CudaArray(map(elty, ones(10)')))
1x10 Array{Float32,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

I tried putting a `device_synchronize()' call in the `p2' block above like 
so, which was probably needed anyway, but doesn't fix the error:

julia> p2 = quote 
   elty = eltype(d_M)
   n1, n2 = size(d_M)
   d_dots = CudaArray(map(elty, ones(n1)))
   dev = device(d_dots)
   dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
   numblox = Int(ceil(n1/cudakernels.maxBlock))
   CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, 
d_dots))
   device_synchronize()
   dots = to_host(d_dots)
   free(d_dots)
   dots
   end

julia> sow(reps[3], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
RemoteRef{Channel{Any}}(51,1,40341)

julia> reap(reps[3], :(string(d_M)))
Dict{Int64,Any} with 1 entry:
  51 => "CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} 
@0x000b041e),(4000,2500),0)"

julia> reap(reps[3], p2)
ERROR: On worker 51:
"an illegal memory access was encountered"
 [inlined code] from essentials.jl:111
 in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
 [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/
gen_libcudart.jl:16
 in device_synchronize at /home/mcp50/.julia/v0.5/CUDArt/src/device.jl:28
 in anonymous at multi.jl:892
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:892
 in anonymous at task.jl:59
 in remotecall_fetch at multi.jl:731
 [inlined code] from multi.jl:368
 in remotecall_fetch at multi.jl:734
 in anonymous at task.jl:443
 in sync_end at ./task.jl:409
 [inlined code] from task.jl:418
 in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203

One thing I have noted is that a remote process crashes if I ever attempt 
to move a `CudaArray' type/pointer from it to the host. 
That shouldn't be happening in the above, but I wonder if something similar 
is inadvertently happening.
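
A sketch of the safe pattern with the sow/reap helpers used above (the worker id
51 and d_M are taken from the examples; adjust to taste): device pointers are
only valid inside the process that allocated them, so convert to a host Array on
the worker and fetch that instead.

reap(51, :(to_host(d_M)))[51]   # OK: a plain Array{Float32,2} comes back
# reap(51, :(d_M))              # not OK: would try to serialize a device pointer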

If I try calling the kernel on another process on the same machine, I don't 
get the error:

julia> sow(62, :d_M, :(residual_shared($Y_init,$A_init,$S_init,1,$sig)))
RemoteRef{Channel{Any}}(62,1,40936)

julia> sum(reap(62, p2)[62])
5.149127f6

Hmm...






Re: [julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Tim Holy
I can tell you I've gotten CUDArt-based operations working on remote machines.

I don't really know what the issue is, but I did notice a couple of concerns:

- what's up with `CudaArray(map(elty, ones(n1)))`? That doesn't look right at 
all. Don't you mean `CudaArray(elty, ones(n1)...)`?
- don't you need to make sure the kernel completes before calling to_host? 

Best,
--Tim

On Thursday, March 03, 2016 04:31:54 AM Matthew Pearce wrote:
> Hello
> 
> I've come across a baffling error. I have a custom CUDA kernel to calculate
> squared row norms of a matrix. It works fine on the host computer:
> 
> julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
> CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @
> 0x000b037a),(4000,2500),0)
> 
> julia> sum(cudakernels.sqrownorms(d_M))
> 5.149127f6
> 
> However, when I try to run the same code on a remote machine it fails, even
> though the variable `d_M' gets calculated properly there. The custom kernel
> launch code looks like:
> 
> function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
> elty = eltype(d_M)
> n1, n2 = size(d_M)
> d_dots = CudaArray(map(elty, ones(n1)))
> dev = device(d_dots)
> dotf = ptxdict[(dev, "sqrownorms", elty)]
> numblox = Int(ceil(n1/maxBlock))
> CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
> dots = to_host(d_dots)
> free(d_dots)
> return dots
> end
> 
> Running the body of sqrownorms on a remote worker produces the following crash
> message. (Calling the function itself just gives an unhelpful 'process exited'
> error, arrgh!)
> 
> julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
> 
> julia> p2 = quote
>elty = eltype(d_M)
>n1, n2 = size(d_M)
>d_dots = CudaArray(map(elty, ones(n1)))
>dev = device(d_dots)
>dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>numblox = Int(ceil(n1/cudakernels.maxBlock))
>CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2,
> d_dots))
>dots = to_host(d_dots)
>free(d_dots)
>dots
>end;
> 
> julia> reap(reps[5], p2)  #this is a remote call fetch of the eval of the
> `p2' block in global scope
> ERROR: On worker 38:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
>  in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/
> gen_libcudart.jl:396
>  in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
>  in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
> 
> Any thoughts much appreciated - I'm not sure where to go with this now.
> 
> Matthew



[julia-users] Using CUDArt on remote machines - 'illegal memory access'

2016-03-03 Thread Matthew Pearce
Hello

I've come across a baffling error. I have a custom CUDA kernel to calculate 
squared row norms of a matrix. It works fine on the host computer:

julia> d_M = residual_shared(Y_init,A_init,S_init,k,sig)
CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @
0x000b037a),(4000,2500),0)

julia> sum(cudakernels.sqrownorms(d_M))
5.149127f6

However, when I try to run the same code on a remote machine it fails, even 
though the variable `d_M' gets calculated properly there. The custom kernel 
launch code looks like:

function sqrownorms{T}(d_M::CUDArt.CudaArray{T,2})
elty = eltype(d_M)
n1, n2 = size(d_M)
d_dots = CudaArray(map(elty, ones(n1)))
dev = device(d_dots)
dotf = ptxdict[(dev, "sqrownorms", elty)]
numblox = Int(ceil(n1/maxBlock)) 
CUDArt.launch(dotf, numblox, maxBlock, (d_M, n1, n2, d_dots))
dots = to_host(d_dots)
free(d_dots)
return dots
end
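
As an aside, a CPU reference is handy for sanity-checking the kernel output (and
for catching row/column-major mix-ups); a sketch, where sqrownorms_cpu is a
hypothetical helper rather than part of the code above:

sqrownorms_cpu(M::AbstractMatrix) = vec(sum(M.^2, 2))   # squared norm of each row

M = rand(Float32, 4000, 2500)
cpu_dots = sqrownorms_cpu(M)
# sum(cpu_dots) should match sum(cudakernels.sqrownorms(CudaArray(M)))
# when the kernel indexes the column-major buffer correctly.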

Running the body of sqrownorms on a remote worker produces the following crash 
message. (Calling the function itself just gives an unhelpful 'process exited' 
error, arrgh!)

julia> sow(reps[5], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))

julia> p2 = quote 
   elty = eltype(d_M)
   n1, n2 = size(d_M)
   d_dots = CudaArray(map(elty, ones(n1)))
   dev = device(d_dots)
   dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
   numblox = Int(ceil(n1/cudakernels.maxBlock))
   CUDArt.launch(dotf, numblox, cudakernels.maxBlock, (d_M, n1, n2, 
d_dots))
   dots = to_host(d_dots)
   free(d_dots)
   dots
   end;

julia> reap(reps[5], p2)  #this is a remote call fetch of the eval of the 
`p2' block in global scope
ERROR: On worker 38:
"an illegal memory access was encountered"
 [inlined code] from essentials.jl:111
 in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
 [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/stream.jl:11
 in cudaMemcpyAsync at /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/
gen_libcudart.jl:396
 in copy! at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:152
 in to_host at /home/mcp50/.julia/v0.5/CUDArt/src/arrays.jl:148
 in anonymous at multi.jl:892
 in run_work_thunk at multi.jl:645
 [inlined code] from multi.jl:892
 in anonymous at task.jl:59
 in remotecall_fetch at multi.jl:731
 [inlined code] from multi.jl:368
 in remotecall_fetch at multi.jl:734
 in anonymous at task.jl:443
 in sync_end at ./task.jl:409
 [inlined code] from task.jl:418
 in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203

Any thoughts much appreciated - I'm not sure where to go with this now.

Matthew