Oh, drat, I misread it as if you were trying to create an n-dimensional array 
of size one in each dimension. I forgot you could move a CPU array to the GPU 
with a construct like that. Sorry about that.

More efficient, however, is to use fill or fill! for such things, since there 
is no memory movement involved (the array is created _only_ on the GPU).
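
Something along these lines is what I have in mind. Treat it as an untested 
sketch: it assumes CUDArt is loaded and a device is already initialized in the 
session (as in your code), and that fill! has a CudaArray method in the version 
you are running. The names elty, n1 and d_dots are taken from your snippet.

```julia
# Sketch only: assumes `using CUDArt` and an initialized device.
elty = Float32
n1 = 10

# Allocate directly on the device; the contents are uninitialized.
d_dots = CudaArray(elty, n1)

# Set every element on the GPU itself, so no host array is built and copied.
fill!(d_dots, one(elty))

# Copy back only when the values are needed on the host, then release.
dots = to_host(d_dots)
free(d_dots)
```

Compared with CudaArray(map(elty, ones(n1))), this skips allocating the 
temporary host array and the host-to-device copy.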

Regarding the rest, I'm not sure. If you figure it out, adding a note to the 
README seems like it could be helpful. Multi-process is still a bit 
challenging sometimes, though package precompilation has made it a lot more 
pleasant than it used to be!

Best,
--Tim

On Thursday, March 03, 2016 07:27:27 AM Matthew Pearce wrote:
> Thanks Tim.
> 
> For me `elty=Float32' so if I use `CudaArray(elty, ones(10))' or
> `CudaArray(elty, ones(10)...)' I get a conversion error. [I am running Julia
> 0.5.0-dev+749]
> The result of my CudaArray creation above looks like:
> 
> julia> to_host(CudaArray(map(elty, ones(10))))'
> 1x10 Array{Float32,2}:
>  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
> 
> I tried putting a `device_synchronize()' call in the `p2' block above like
> so, which was probably needed anyway, but doesn't fix the error:
> 
> julia> p2 = quote
>            elty = eltype(d_M)
>            n1, n2 = size(d_M)
>            d_dots = CudaArray(map(elty, ones(n1)))
>            dev = device(d_dots)
>            dotf = cudakernels.ptxdict[(dev, "sqrownorms", elty)]
>            numblox = Int(ceil(n1/cudakernels.maxBlock))
>            CUDArt.launch(dotf, numblox, cudakernels.maxBlock,
>                          (d_M, n1, n2, d_dots))
>            device_synchronize()
>            dots = to_host(d_dots)
>            free(d_dots)
>            dots
>        end
> 
> julia> sow(reps[3], :d_M, :(residual_shared(Y,A_init,S_init,1,sig)))
> RemoteRef{Channel{Any}}(51,1,40341)
> 
> julia> reap(reps[3], :(string(d_M)))
> Dict{Int64,Any} with 1 entry:
>   51 => "CUDArt.CudaArray{Float32,2}(CUDArt.CudaPtr{Float32}(Ptr{Float32} @0x0000000b041e0000),(4000,2500),0)"
> 
> julia> reap(reps[3], p2)
> ERROR: On worker 51:
> "an illegal memory access was encountered"
>  [inlined code] from essentials.jl:111
>  in checkerror at /home/mcp50/.julia/v0.5/CUDArt/src/libcudart-6.5.jl:16
>  [inlined code] from /home/mcp50/.julia/v0.5/CUDArt/src/../gen-6.5/gen_libcudart.jl:16
>  in device_synchronize at /home/mcp50/.julia/v0.5/CUDArt/src/device.jl:28
>  in anonymous at multi.jl:892
>  in run_work_thunk at multi.jl:645
>  [inlined code] from multi.jl:892
>  in anonymous at task.jl:59
>  in remotecall_fetch at multi.jl:731
>  [inlined code] from multi.jl:368
>  in remotecall_fetch at multi.jl:734
>  in anonymous at task.jl:443
>  in sync_end at ./task.jl:409
>  [inlined code] from task.jl:418
>  in reap at /home/mcp50/.julia/v0.5/ClusterUtils/src/ClusterUtils.jl:203
> 
> One thing I have noted is that a remote process crashes if I ever attempt
> to move a `CudaArray' type/pointer from it to the host.
> That shouldn't be happening in the above, but I wonder if, inadvertently,
> something similar is happening.
> 
> If I try calling the kernel on another process on the same machine, I don't
> get the error:
> 
> julia> sow(62, :d_M, :(residual_shared($Y_init,$A_init,$S_init,1,$sig)))
> RemoteRef{Channel{Any}}(62,1,40936)
> 
> julia> sum(reap(62, p2)[62])
> 5.149127f6
> 
> Hmm...
