Amit and Blake, thanks for all your advice. I've managed to cobble together
a shared memory version of LowRankModels.jl, using the workarounds we
devised above. In case you're interested, it's at

https://github.com/madeleineudell/LowRankModels.jl/blob/master/src/shareglrm.jl

and you can run it using, e.g.,

julia -p 3 LowRankModels/examples/sharetest.jl

There's a significant overhead, but it's faster than the serial version for
large problem sizes. Any advice for reducing the overhead would be much
appreciated.

However, in that package I'm seeing some unexpected behavior: occasionally
it seems that some of the processes have not finished their jobs at the end
of an @everywhere block, although looking at the code for @everywhere I see
it's wrapped in a @sync already. Is there something else I can use to
synchronize (i.e., wait for completion of all) the processes?
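
So far the most explicit barrier I can think of is waiting on each
remotecall myself. A minimal sketch, assuming a function do_work (a
stand-in name) is already defined on every process:

@sync begin
    for p in procs()
        # remotecall_wait blocks until the call finishes on p;
        # @async runs the calls concurrently, and @sync waits for all
        @async remotecall_wait(p, do_work)
    end
end

Is that the intended idiom, or is there something lighter-weight?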

On Tue, Dec 2, 2014 at 12:21 AM, Amit Murthy <amit.mur...@gmail.com> wrote:

> Issue - https://github.com/JuliaLang/julia/issues/9219
>
> On Tue, Dec 2, 2014 at 10:04 AM, Amit Murthy <amit.mur...@gmail.com>
> wrote:
>
>> From the documentation - "Modules in Julia are separate global variable
>> workspaces."
>>
>> So what is happening is that the anonymous function in "remotecall(i,
>> x->(global const X=x; nothing), localX)" creates X as module global.
>>
>> The following works:
>>
>> module ParallelStuff
>> export doparallelstuff
>>
>> function doparallelstuff(m = 10, n = 20)
>>     # initialize variables
>>     localX = Base.shmem_rand(m; pids=procs())
>>     localY = Base.shmem_rand(n; pids=procs())
>>     localf = [x->i+sum(x) for i=1:m]
>>     localg = [x->i+sum(x) for i=1:n]
>>
>>     # broadcast variables to all worker processes
>>     # (thanks to Amit Murthy for suggesting this syntax)
>>     @sync begin
>>         for i in procs(localX)
>>             remotecall(i, x->(global X=x; nothing), localX)
>>             remotecall(i, x->(global Y=x; nothing), localY)
>>             remotecall(i, x->(global f=x; nothing), localf)
>>             remotecall(i, x->(global g=x; nothing), localg)
>>         end
>>     end
>>
>>     # compute
>>     for iteration=1:1
>>         @everywhere begin
>>             X=ParallelStuff.X
>>             Y=ParallelStuff.Y
>>             f=ParallelStuff.f
>>             g=ParallelStuff.g
>>             for i=localindexes(X)
>>                 X[i] = f[i](Y)
>>             end
>>             for j=localindexes(Y)
>>                 Y[j] = g[j](X)
>>             end
>>         end
>>     end
>> end
>>
>> end #module
>>
>>
>> While remotecall, @everywhere, etc. run under Main, the fact that the
>> closure's variables refer to module ParallelStuff is pretty confusing...
>> I think we need a better way to handle this.
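>>
>> A quick way to see where the binding actually landed (a sketch, assuming
>> worker 2 has ParallelStuff loaded):
>>
>> remotecall_fetch(2, () -> isdefined(Main, :X))            # false
>> remotecall_fetch(2, () -> isdefined(ParallelStuff, :X))   # true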
>>
>>
>> On Tue, Dec 2, 2014 at 4:58 AM, Madeleine Udell <
>> madeleine.ud...@gmail.com> wrote:
>>
>>> Thanks to Blake and Amit for some excellent suggestions! Both strategies
>>> work fine when embedded in functions, but not when those functions are
>>> embedded in modules. For example, the following throws an error:
>>>
>>> @everywhere include("ParallelStuff.jl")
>>> @everywhere using ParallelStuff
>>> doparallelstuff()
>>>
>>> when ParallelStuff.jl contains the following code:
>>>
>>> module ParallelStuff
>>> export doparallelstuff
>>>
>>> function doparallelstuff(m = 10, n = 20)
>>>     # initialize variables
>>>     localX = Base.shmem_rand(m; pids=procs())
>>>     localY = Base.shmem_rand(n; pids=procs())
>>>     localf = [x->i+sum(x) for i=1:m]
>>>     localg = [x->i+sum(x) for i=1:n]
>>>
>>>     # broadcast variables to all worker processes
>>>     # (thanks to Amit Murthy for suggesting this syntax)
>>>     @sync begin
>>>         for i in procs(localX)
>>>             remotecall(i, x->(global const X=x; nothing), localX)
>>>             remotecall(i, x->(global const Y=x; nothing), localY)
>>>             remotecall(i, x->(global const f=x; nothing), localf)
>>>             remotecall(i, x->(global const g=x; nothing), localg)
>>>         end
>>>     end
>>>
>>>     # compute
>>>     for iteration=1:1
>>>         @everywhere for i=localindexes(X)
>>>             X[i] = f[i](Y)
>>>         end
>>>         @everywhere for j=localindexes(Y)
>>>             Y[j] = g[j](X)
>>>         end
>>>     end
>>> end
>>>
>>> end #module
>>>
>>> On 3 processes (julia -p 3), the error is as follows:
>>>
>>> exception on 1: exception on 2: exception on 3: ERROR: X not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>> ERROR: X not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in anonymous at multi.jl:848
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>> ERROR: X not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in anonymous at multi.jl:848
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>> exception on exception on 2: 1: ERROR: Y not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in anonymous at multi.jl:848
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>> ERROR: Y not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>> exception on 3: ERROR: Y not defined
>>>  in anonymous at no file
>>>  in eval at
>>> /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>>  in anonymous at multi.jl:1310
>>>  in anonymous at multi.jl:848
>>>  in run_work_thunk at multi.jl:621
>>>  in run_work_thunk at multi.jl:630
>>>  in anonymous at task.jl:6
>>>
>>> For comparison, the non-modularized version works:
>>>
>>> function doparallelstuff(m = 10, n = 20)
>>>     # initialize variables
>>>     localX = Base.shmem_rand(m; pids=procs())
>>>     localY = Base.shmem_rand(n; pids=procs())
>>>     localf = [x->i+sum(x) for i=1:m]
>>>     localg = [x->i+sum(x) for i=1:n]
>>>
>>>     # broadcast variables to all worker processes
>>>     # (thanks to Amit Murthy for suggesting this syntax)
>>>     @sync begin
>>>         for i in procs(localX)
>>>             remotecall(i, x->(global const X=x; nothing), localX)
>>>             remotecall(i, x->(global const Y=x; nothing), localY)
>>>             remotecall(i, x->(global const f=x; nothing), localf)
>>>             remotecall(i, x->(global const g=x; nothing), localg)
>>>         end
>>>     end
>>>
>>>     # compute
>>>     for iteration=1:1
>>>         @everywhere for i=localindexes(X)
>>>             X[i] = f[i](Y)
>>>         end
>>>         @everywhere for j=localindexes(Y)
>>>             Y[j] = g[j](X)
>>>         end
>>>     end
>>> end
>>>
>>> doparallelstuff()
>>>
>>> On Mon, Nov 24, 2014 at 11:24 AM, Blake Johnson <
>>> blakejohnso...@gmail.com> wrote:
>>>
>>>> I use this macro to send variables to remote processes:
>>>>
>>>> macro sendvar(proc, x)
>>>>     quote
>>>>         rr = RemoteRef()
>>>>         put!(rr, $x)
>>>>         remotecall($proc, (rr)->begin
>>>>             global $(esc(x))
>>>>             $(esc(x)) = fetch(rr)
>>>>         end, rr)
>>>>     end
>>>> end
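>>>>
>>>> For example, a hypothetical usage sketch (untested):
>>>>
>>>> A = rand(100)
>>>> # the macro returns the RemoteRef from remotecall, so we can wait on it
>>>> wait(@sendvar(2, A))
>>>> remotecall_fetch(2, () -> isdefined(Main, :A))   # should be true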
>>>>
>>>> Though the solution above looks a little simpler.
>>>>
>>>> --Blake
>>>>
>>>> On Sunday, November 23, 2014 1:30:49 AM UTC-5, Amit Murthy wrote:
>>>>>
>>>>> From the description of Base.localize_vars - 'wrap an expression in
>>>>> "let a=a,b=b,..." for each var it references'
>>>>>
>>>>> Though that does not seem to be the only(?) issue here...
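>>>>>
>>>>> Written out by hand, that wrapping looks roughly like this (a sketch;
>>>>> do_something is a stand-in):
>>>>>
>>>>> let a = a, b = b
>>>>>     # a and b are now local captures, so they are serialized along
>>>>>     # with the closure that @parallel ships to the workers
>>>>>     @parallel for k in 1:3
>>>>>         do_something(a, b, k)
>>>>>     end
>>>>> end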
>>>>>
>>>>> On Sun, Nov 23, 2014 at 11:52 AM, Madeleine Udell <
>>>>> madelei...@gmail.com> wrote:
>>>>>
>>>>>> Thanks! This is extremely helpful.
>>>>>>
>>>>>> Can you tell me more about what localize_vars does?
>>>>>>
>>>>>> On Sat, Nov 22, 2014 at 9:11 PM, Amit Murthy <amit....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This works:
>>>>>>>
>>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>>>     # initialize variables
>>>>>>>     localX = Base.shmem_rand(m; pids=procs())
>>>>>>>     localY = Base.shmem_rand(n; pids=procs())
>>>>>>>     localf = [x->i+sum(x) for i=1:m]
>>>>>>>     localg = [x->i+sum(x) for i=1:n]
>>>>>>>
>>>>>>>     # broadcast variables to all worker processes
>>>>>>>     @sync begin
>>>>>>>         for i in procs(localX)
>>>>>>>             remotecall(i, x->(global X; X=x; nothing), localX)
>>>>>>>             remotecall(i, x->(global Y; Y=x; nothing), localY)
>>>>>>>             remotecall(i, x->(global f; f=x; nothing), localf)
>>>>>>>             remotecall(i, x->(global g; g=x; nothing), localg)
>>>>>>>         end
>>>>>>>     end
>>>>>>>
>>>>>>>     # compute
>>>>>>>     for iteration=1:1
>>>>>>>         @everywhere for i=localindexes(X)
>>>>>>>             X[i] = f[i](Y)
>>>>>>>         end
>>>>>>>         @everywhere for j=localindexes(Y)
>>>>>>>             Y[j] = g[j](X)
>>>>>>>         end
>>>>>>>     end
>>>>>>> end
>>>>>>>
>>>>>>> doparallelstuff()
>>>>>>>
>>>>>>> Though I would have expected broadcast of variables to be possible
>>>>>>> with just
>>>>>>> @everywhere X=localX
>>>>>>> and so on ....
>>>>>>>
>>>>>>>
>>>>>>> Looks like @everywhere does not call localize_vars. I don't know if
>>>>>>> this is by design or just an oversight. I would have expected it to
>>>>>>> do so. Will file an issue on github.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Nov 23, 2014 at 8:24 AM, Madeleine Udell <
>>>>>>> madelei...@gmail.com> wrote:
>>>>>>>
>>>>>>>> The code block I posted before works, but throws an error when
>>>>>>>> embedded in a function: "ERROR: X not defined" (in first line of
>>>>>>>> @parallel). Why am I getting this error when I'm *assigning to* X?
>>>>>>>>
>>>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>>>>     # initialize variables
>>>>>>>>     localX = Base.shmem_rand(m)
>>>>>>>>     localY = Base.shmem_rand(n)
>>>>>>>>     localf = [x->i+sum(x) for i=1:m]
>>>>>>>>     localg = [x->i+sum(x) for i=1:n]
>>>>>>>>
>>>>>>>>     # broadcast variables to all worker processes
>>>>>>>>     @parallel for i=workers()
>>>>>>>>         global X = localX
>>>>>>>>         global Y = localY
>>>>>>>>         global f = localf
>>>>>>>>         global g = localg
>>>>>>>>     end
>>>>>>>>     # give variables same name on master
>>>>>>>>     X,Y,f,g = localX,localY,localf,localg
>>>>>>>>
>>>>>>>>     # compute
>>>>>>>>     for iteration=1:1
>>>>>>>>         @everywhere for i=localindexes(X)
>>>>>>>>             X[i] = f[i](Y)
>>>>>>>>         end
>>>>>>>>         @everywhere for j=localindexes(Y)
>>>>>>>>             Y[j] = g[j](X)
>>>>>>>>         end
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>> doparallelstuff()
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 5:13 PM, Madeleine Udell <
>>>>>>>> madelei...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> My experiments with parallelism also occur in focused blocks; I
>>>>>>>>> think that's a sign that it's not yet as user friendly as it could be.
>>>>>>>>>
>>>>>>>>> Here's a solution to the problem I posed that's simple to use:
>>>>>>>>> @parallel + global can be used to broadcast a variable, while
>>>>>>>>> @everywhere can be used to do a computation on local data (i.e.,
>>>>>>>>> without resending the data). I'm not sure how to do the variable
>>>>>>>>> renaming programmatically, though.
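>>>>>>>>>
>>>>>>>>> One untested idea for the renaming: do the assignments through
>>>>>>>>> @eval, so the names are just symbols:
>>>>>>>>>
>>>>>>>>> for (name, val) in [(:X, localX), (:Y, localY),
>>>>>>>>>                     (:f, localf), (:g, localg)]
>>>>>>>>>     @eval global $name = $val   # splices the value into the assignment
>>>>>>>>> end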
>>>>>>>>>
>>>>>>>>> # initialize variables
>>>>>>>>> m,n = 10,20
>>>>>>>>> localX = Base.shmem_rand(m)
>>>>>>>>> localY = Base.shmem_rand(n)
>>>>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>>>>
>>>>>>>>> # broadcast variables to all worker processes
>>>>>>>>> @parallel for i=workers()
>>>>>>>>>     global X = localX
>>>>>>>>>     global Y = localY
>>>>>>>>>     global f = localf
>>>>>>>>>     global g = localg
>>>>>>>>> end
>>>>>>>>> # give variables same name on master
>>>>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>>>>
>>>>>>>>> # compute
>>>>>>>>> for iteration=1:10
>>>>>>>>>     @everywhere for i=localindexes(X)
>>>>>>>>>         X[i] = f[i](Y)
>>>>>>>>>     end
>>>>>>>>>     @everywhere for j=localindexes(Y)
>>>>>>>>>         Y[j] = g[j](X)
>>>>>>>>>     end
>>>>>>>>> end
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 11:14 AM, Tim Holy <tim....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> My experiments with parallelism tend to occur in focused blocks,
>>>>>>>>>> and I haven't done it in quite a while. So I doubt I can help
>>>>>>>>>> much. But in general I suspect you're encountering these problems
>>>>>>>>>> because much of the IPC goes through thunks, and so a lot of stuff
>>>>>>>>>> gets reclaimed when execution is done.
>>>>>>>>>>
>>>>>>>>>> If I were experimenting, I'd start by trying to create
>>>>>>>>>> RemoteRef()s and put!()ing my variables into them. Then perhaps
>>>>>>>>>> you might be able to fetch them from other processes. Not sure
>>>>>>>>>> that will work, but it seems to be worth a try.
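>>>>>>>>>>
>>>>>>>>>> A sketch of that idea (untested):
>>>>>>>>>>
>>>>>>>>>> rr = RemoteRef()   # owned by the process that creates it
>>>>>>>>>> put!(rr, localX)   # store the value once
>>>>>>>>>> # any worker can then pull it on demand:
>>>>>>>>>> remotecall_fetch(2, r -> sum(fetch(r)), rr)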
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>> --Tim
>>>>>>>>>>
>>>>>>>>>> On Thursday, November 20, 2014 08:20:19 PM Madeleine Udell wrote:
>>>>>>>>>> > I'm trying to use parallelism in julia for a task with a
>>>>>>>>>> > structure that I think is quite pervasive. It looks like this:
>>>>>>>>>> >
>>>>>>>>>> > # broadcast lists of functions f and g to all processes so
>>>>>>>>>> > # they're available everywhere
>>>>>>>>>> > # create shared arrays X,Y on all processes so they're
>>>>>>>>>> > # available everywhere
>>>>>>>>>> > for iteration=1:1000
>>>>>>>>>> >     @parallel for i=1:length(X)
>>>>>>>>>> >         X[i] = f[i](Y)
>>>>>>>>>> >     end
>>>>>>>>>> >     @parallel for j=1:length(Y)
>>>>>>>>>> >         Y[j] = g[j](X)
>>>>>>>>>> >     end
>>>>>>>>>> > end
>>>>>>>>>> >
>>>>>>>>>> > I'm having trouble making this work, and I'm not sure where to
>>>>>>>>>> > dig around to find a solution. Here are the difficulties I've
>>>>>>>>>> > encountered:
>>>>>>>>>> >
>>>>>>>>>> > * @parallel doesn't allow me to create persistent variables on
>>>>>>>>>> > each process; i.e., the following results in an error.
>>>>>>>>>> >
>>>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>>>> > @parallel for i=1:nprocs() m,n = size(s) end
>>>>>>>>>> > @parallel for i=1:nprocs() println(m) end
>>>>>>>>>> >
>>>>>>>>>> > * @everywhere does allow me to create persistent variables on
>>>>>>>>>> > each process, but doesn't send any data at all, including the
>>>>>>>>>> > variables I need in order to define new variables. E.g. the
>>>>>>>>>> > following is an error: s is a shared array, but the variable
>>>>>>>>>> > (i.e., the pointer to) s is apparently not shared.
>>>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>>>> > @everywhere m,n = size(s)
>>>>>>>>>> >
>>>>>>>>>> > Here are the kinds of questions I'd like to see protocode for:
>>>>>>>>>> > * How can I broadcast a variable so that it is available and
>>>>>>>>>> > persistent on every process?
>>>>>>>>>> > * How can I create a reference to the same shared array "s" that
>>>>>>>>>> > is accessible from every process?
>>>>>>>>>> > * How can I send a command to be performed in parallel,
>>>>>>>>>> > specifying which variables should be sent to the relevant
>>>>>>>>>> > processes and which should be looked up in the local namespace?
>>>>>>>>>> >
>>>>>>>>>> > Note that everything I ask above is not specific to shared
>>>>>>>>>> > arrays; the same constructs would also be extremely useful in
>>>>>>>>>> > the distributed case.
>>>>>>>>>> >
>>>>>>>>>> > ----------------------
>>>>>>>>>> >
>>>>>>>>>> > An interesting partial solution is the following:
>>>>>>>>>> > funcs! = Function[x->x[:] = x+k for k=1:3]
>>>>>>>>>> > d = drand(3,12)
>>>>>>>>>> > let funcs! = funcs!
>>>>>>>>>> >   @sync @parallel for k in 1:3
>>>>>>>>>> >     funcs![myid()-1](localpart(d))
>>>>>>>>>> >   end
>>>>>>>>>> > end
>>>>>>>>>> >
>>>>>>>>>> > Here, I'm not sure why the let statement is necessary to send
>>>>>>>>>> > funcs!, since d is sent automatically.
>>>>>>>>>> >
>>>>>>>>>> > ---------------------
>>>>>>>>>> >
>>>>>>>>>> > Thanks!
>>>>>>>>>> > Madeleine
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Madeleine Udell
>>>>>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>>>>>> Stanford University
>>>>>>>>> www.stanford.edu/~udell
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Madeleine Udell
>>>>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>>>>> Stanford University
>>>>>>>> www.stanford.edu/~udell
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Madeleine Udell
>>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>>> Stanford University
>>>>>> www.stanford.edu/~udell
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Madeleine Udell
>>> PhD Candidate in Computational and Mathematical Engineering
>>> Stanford University
>>> www.stanford.edu/~udell
>>>
>>
>>
>


-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
