Issue - https://github.com/JuliaLang/julia/issues/9219
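The thread below turns on one point: a `global` assignment inside a closure that was *defined* in a module lands in that module's namespace, not in `Main`, even though `remotecall` and `@everywhere` evaluate code under `Main`. A minimal single-process sketch of that binding behavior (the module name `Demo` is invented for illustration; this is not code from the thread):

```julia
# A closure created inside a module assigns globals into that module,
# not into Main -- which is why the workers later have to look the
# values up as ParallelStuff.X rather than plain X.
module Demo
make_setter() = x -> (global X = x; nothing)
end

setter = Demo.make_setter()
setter(42)

# The global landed in module Demo, and Main was untouched:
Demo.X                  # 42
isdefined(Main, :X)     # false
```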
On Tue, Dec 2, 2014 at 10:04 AM, Amit Murthy <amit.mur...@gmail.com> wrote:

> From the documentation - "Modules in Julia are separate global variable
> workspaces."
>
> So what is happening is that the anonymous function in
> "remotecall(i, x->(global const X=x; nothing), localX)" creates X as a
> module global.
>
> The following works:
>
> module ParallelStuff
> export doparallelstuff
>
> function doparallelstuff(m = 10, n = 20)
>     # initialize variables
>     localX = Base.shmem_rand(m; pids=procs())
>     localY = Base.shmem_rand(n; pids=procs())
>     localf = [x->i+sum(x) for i=1:m]
>     localg = [x->i+sum(x) for i=1:n]
>
>     # broadcast variables to all worker processes (thanks to Amit Murthy
>     # for suggesting this syntax)
>     @sync begin
>         for i in procs(localX)
>             remotecall(i, x->(global X=x; nothing), localX)
>             remotecall(i, x->(global Y=x; nothing), localY)
>             remotecall(i, x->(global f=x; nothing), localf)
>             remotecall(i, x->(global g=x; nothing), localg)
>         end
>     end
>
>     # compute
>     for iteration=1:1
>         @everywhere begin
>             X=ParallelStuff.X
>             Y=ParallelStuff.Y
>             f=ParallelStuff.f
>             g=ParallelStuff.g
>             for i=localindexes(X)
>                 X[i] = f[i](Y)
>             end
>             for j=localindexes(Y)
>                 Y[j] = g[j](X)
>             end
>         end
>     end
> end
>
> end #module
>
> While remotecall, @everywhere, etc. run under Main, the fact that the
> closure variable refers to module ParallelStuff is pretty confusing.
> I think we need a better way to handle this.
>
> On Tue, Dec 2, 2014 at 4:58 AM, Madeleine Udell
> <madeleine.ud...@gmail.com> wrote:
>
>> Thanks to Blake and Amit for some excellent suggestions! Both strategies
>> work fine when embedded in functions, but not when those functions are
>> embedded in modules.
>> For example, the following throws an error:
>>
>> @everywhere include("ParallelStuff.jl")
>> @everywhere using ParallelStuff
>> doparallelstuff()
>>
>> when ParallelStuff.jl contains the following code:
>>
>> module ParallelStuff
>> export doparallelstuff
>>
>> function doparallelstuff(m = 10, n = 20)
>>     # initialize variables
>>     localX = Base.shmem_rand(m; pids=procs())
>>     localY = Base.shmem_rand(n; pids=procs())
>>     localf = [x->i+sum(x) for i=1:m]
>>     localg = [x->i+sum(x) for i=1:n]
>>
>>     # broadcast variables to all worker processes (thanks to Amit Murthy
>>     # for suggesting this syntax)
>>     @sync begin
>>         for i in procs(localX)
>>             remotecall(i, x->(global const X=x; nothing), localX)
>>             remotecall(i, x->(global const Y=x; nothing), localY)
>>             remotecall(i, x->(global const f=x; nothing), localf)
>>             remotecall(i, x->(global const g=x; nothing), localg)
>>         end
>>     end
>>
>>     # compute
>>     for iteration=1:1
>>         @everywhere for i=localindexes(X)
>>             X[i] = f[i](Y)
>>         end
>>         @everywhere for j=localindexes(Y)
>>             Y[j] = g[j](X)
>>         end
>>     end
>> end
>>
>> end #module
>>
>> On 3 processes (julia -p 3), the error is as follows; essentially the
>> same "X not defined" trace appears interleaved on every process, followed
>> by a matching set of "Y not defined" traces:
>>
>> exception on 1: ERROR: X not defined
>>  in anonymous at no file
>>  in eval at /Users/vagrant/tmp/julia-packaging/osx10.7+/julia-master/base/sysimg.jl:7
>>  in anonymous at multi.jl:1310
>>  in run_work_thunk at multi.jl:621
>>  in run_work_thunk at multi.jl:630
>>  in anonymous at task.jl:6
>>
>> For comparison, the non-modularized version works:
>>
>> function doparallelstuff(m = 10, n = 20)
>>     # initialize variables
>>     localX = Base.shmem_rand(m; pids=procs())
>>     localY = Base.shmem_rand(n; pids=procs())
>>     localf = [x->i+sum(x) for i=1:m]
>>     localg = [x->i+sum(x) for i=1:n]
>>
>>     # broadcast variables to all worker processes (thanks to Amit Murthy
>>     # for suggesting this syntax)
>>     @sync begin
>>         for i in procs(localX)
>>             remotecall(i, x->(global const X=x; nothing), localX)
>>             remotecall(i, x->(global const Y=x; nothing), localY)
>>             remotecall(i, x->(global const f=x; nothing), localf)
>>             remotecall(i, x->(global const g=x; nothing), localg)
>>         end
>>     end
>>
>>     # compute
>>     for iteration=1:1
>>         @everywhere for i=localindexes(X)
>>             X[i] = f[i](Y)
>>         end
>>         @everywhere for j=localindexes(Y)
>>             Y[j] = g[j](X)
>>         end
>>     end
>> end
>>
>> doparallelstuff()
>>
>> On Mon, Nov 24, 2014 at 11:24 AM, Blake Johnson
>> <blakejohnso...@gmail.com> wrote:
>>> I use this macro to send variables to remote processes:
>>>
>>> macro sendvar(proc, x)
>>>     quote
>>>         rr = RemoteRef()
>>>         put!(rr, $x)
>>>         remotecall($proc, (rr)->begin
>>>             global $(esc(x))
>>>             $(esc(x)) = fetch(rr)
>>>         end, rr)
>>>     end
>>> end
>>>
>>> Though the solution above looks a little simpler.
>>>
>>> --Blake
>>>
>>> On Sunday, November 23, 2014 1:30:49 AM UTC-5, Amit Murthy wrote:
>>>>
>>>> From the description of Base.localize_vars - 'wrap an expression in
>>>> "let a=a,b=b,..." for each var it references'
>>>>
>>>> Though that does not seem to be the only(?) issue here....
>>>>
>>>> On Sun, Nov 23, 2014 at 11:52 AM, Madeleine Udell
>>>> <madelei...@gmail.com> wrote:
>>>>
>>>>> Thanks! This is extremely helpful.
>>>>>
>>>>> Can you tell me more about what localize_vars does?
>>>>>
>>>>> On Sat, Nov 22, 2014 at 9:11 PM, Amit Murthy <amit....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This works:
>>>>>>
>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>>     # initialize variables
>>>>>>     localX = Base.shmem_rand(m; pids=procs())
>>>>>>     localY = Base.shmem_rand(n; pids=procs())
>>>>>>     localf = [x->i+sum(x) for i=1:m]
>>>>>>     localg = [x->i+sum(x) for i=1:n]
>>>>>>
>>>>>>     # broadcast variables to all worker processes
>>>>>>     @sync begin
>>>>>>         for i in procs(localX)
>>>>>>             remotecall(i, x->(global X; X=x; nothing), localX)
>>>>>>             remotecall(i, x->(global Y; Y=x; nothing), localY)
>>>>>>             remotecall(i, x->(global f; f=x; nothing), localf)
>>>>>>             remotecall(i, x->(global g; g=x; nothing), localg)
>>>>>>         end
>>>>>>     end
>>>>>>
>>>>>>     # compute
>>>>>>     for iteration=1:1
>>>>>>         @everywhere for i=localindexes(X)
>>>>>>             X[i] = f[i](Y)
>>>>>>         end
>>>>>>         @everywhere for j=localindexes(Y)
>>>>>>             Y[j] = g[j](X)
>>>>>>         end
>>>>>>     end
>>>>>> end
>>>>>>
>>>>>> doparallelstuff()
>>>>>>
>>>>>> Though I would have expected broadcast of variables to be possible
>>>>>> with just
>>>>>>     @everywhere X=localX
>>>>>> and so on ....
>>>>>> Looks like @everywhere does not call localize_vars. I don't know if
>>>>>> this is by design or just an oversight; I would have expected it to
>>>>>> do so. Will file an issue on github.
>>>>>>
>>>>>> On Sun, Nov 23, 2014 at 8:24 AM, Madeleine Udell
>>>>>> <madelei...@gmail.com> wrote:
>>>>>>
>>>>>>> The code block I posted before works, but throws an error when
>>>>>>> embedded in a function: "ERROR: X not defined" (in the first line of
>>>>>>> @parallel). Why am I getting this error when I'm *assigning to* X?
>>>>>>>
>>>>>>> function doparallelstuff(m = 10, n = 20)
>>>>>>>     # initialize variables
>>>>>>>     localX = Base.shmem_rand(m)
>>>>>>>     localY = Base.shmem_rand(n)
>>>>>>>     localf = [x->i+sum(x) for i=1:m]
>>>>>>>     localg = [x->i+sum(x) for i=1:n]
>>>>>>>
>>>>>>>     # broadcast variables to all worker processes
>>>>>>>     @parallel for i=workers()
>>>>>>>         global X = localX
>>>>>>>         global Y = localY
>>>>>>>         global f = localf
>>>>>>>         global g = localg
>>>>>>>     end
>>>>>>>     # give variables same name on master
>>>>>>>     X,Y,f,g = localX,localY,localf,localg
>>>>>>>
>>>>>>>     # compute
>>>>>>>     for iteration=1:1
>>>>>>>         @everywhere for i=localindexes(X)
>>>>>>>             X[i] = f[i](Y)
>>>>>>>         end
>>>>>>>         @everywhere for j=localindexes(Y)
>>>>>>>             Y[j] = g[j](X)
>>>>>>>         end
>>>>>>>     end
>>>>>>> end
>>>>>>>
>>>>>>> doparallelstuff()
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 5:13 PM, Madeleine Udell
>>>>>>> <madelei...@gmail.com> wrote:
>>>>>>>
>>>>>>>> My experiments with parallelism also occur in focused blocks; I
>>>>>>>> think that's a sign that it's not yet as user friendly as it could
>>>>>>>> be.
>>>>>>>>
>>>>>>>> Here's a solution to the problem I posed that's simple to use:
>>>>>>>> @parallel + global can be used to broadcast a variable, while
>>>>>>>> @everywhere can be used to do a computation on local data (i.e.,
>>>>>>>> without resending the data). I'm not sure how to do the variable
>>>>>>>> renaming programmatically, though.
>>>>>>>> # initialize variables
>>>>>>>> m,n = 10,20
>>>>>>>> localX = Base.shmem_rand(m)
>>>>>>>> localY = Base.shmem_rand(n)
>>>>>>>> localf = [x->i+sum(x) for i=1:m]
>>>>>>>> localg = [x->i+sum(x) for i=1:n]
>>>>>>>>
>>>>>>>> # broadcast variables to all worker processes
>>>>>>>> @parallel for i=workers()
>>>>>>>>     global X = localX
>>>>>>>>     global Y = localY
>>>>>>>>     global f = localf
>>>>>>>>     global g = localg
>>>>>>>> end
>>>>>>>> # give variables same name on master
>>>>>>>> X,Y,f,g = localX,localY,localf,localg
>>>>>>>>
>>>>>>>> # compute
>>>>>>>> for iteration=1:10
>>>>>>>>     @everywhere for i=localindexes(X)
>>>>>>>>         X[i] = f[i](Y)
>>>>>>>>     end
>>>>>>>>     @everywhere for j=localindexes(Y)
>>>>>>>>         Y[j] = g[j](X)
>>>>>>>>     end
>>>>>>>> end
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 11:14 AM, Tim Holy <tim....@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> My experiments with parallelism tend to occur in focused blocks,
>>>>>>>>> and I haven't done it in quite a while, so I doubt I can help
>>>>>>>>> much. But in general I suspect you're encountering these problems
>>>>>>>>> because much of the IPC goes through thunks, and so a lot of stuff
>>>>>>>>> gets reclaimed when execution is done.
>>>>>>>>>
>>>>>>>>> If I were experimenting, I'd start by trying to create
>>>>>>>>> RemoteRef()s and put!()ing my variables into them. Then perhaps
>>>>>>>>> you might be able to fetch them from other processes. Not sure
>>>>>>>>> that will work, but it seems to be worth a try.
>>>>>>>>>
>>>>>>>>> HTH,
>>>>>>>>> --Tim
>>>>>>>>>
>>>>>>>>> On Thursday, November 20, 2014 08:20:19 PM Madeleine Udell wrote:
>>>>>>>>> > I'm trying to use parallelism in julia for a task with a
>>>>>>>>> > structure that I think is quite pervasive.
>>>>>>>>> > It looks like this:
>>>>>>>>> >
>>>>>>>>> > # broadcast lists of functions f and g to all processes so
>>>>>>>>> > # they're available everywhere
>>>>>>>>> > # create shared arrays X,Y on all processes so they're
>>>>>>>>> > # available everywhere
>>>>>>>>> > for iteration=1:1000
>>>>>>>>> >     @parallel for i=1:size(X)
>>>>>>>>> >         X[i] = f[i](Y)
>>>>>>>>> >     end
>>>>>>>>> >     @parallel for j=1:size(Y)
>>>>>>>>> >         Y[j] = g[j](X)
>>>>>>>>> >     end
>>>>>>>>> > end
>>>>>>>>> >
>>>>>>>>> > I'm having trouble making this work, and I'm not sure where to
>>>>>>>>> > dig around to find a solution. Here are the difficulties I've
>>>>>>>>> > encountered:
>>>>>>>>> >
>>>>>>>>> > * @parallel doesn't allow me to create persistent variables on
>>>>>>>>> > each process; i.e., the following results in an error.
>>>>>>>>> >
>>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>>> > @parallel for i=1:nprocs() m,n = size(s) end
>>>>>>>>> > @parallel for i=1:nprocs() println(m) end
>>>>>>>>> >
>>>>>>>>> > * @everywhere does allow me to create persistent variables on
>>>>>>>>> > each process, but doesn't send any data at all, including the
>>>>>>>>> > variables I need in order to define new variables. E.g. the
>>>>>>>>> > following is an error: s is a shared array, but the variable
>>>>>>>>> > (i.e. pointer to) s is apparently not shared.
>>>>>>>>> >
>>>>>>>>> > s = Base.shmem_rand(12,3)
>>>>>>>>> > @everywhere m,n = size(s)
>>>>>>>>> >
>>>>>>>>> > Here are the kinds of questions I'd like to see protocode for:
>>>>>>>>> > * How can I broadcast a variable so that it is available and
>>>>>>>>> > persistent on every process?
>>>>>>>>> > * How can I create a reference to the same shared array "s"
>>>>>>>>> > that is accessible from every process?
>>>>>>>>> > * How can I send a command to be performed in parallel,
>>>>>>>>> > specifying which variables should be sent to the relevant
>>>>>>>>> > processes and which should be looked up in the local namespace?
>>>>>>>>> >
>>>>>>>>> > Note that everything I ask above is not specific to shared
>>>>>>>>> > arrays; the same constructs would also be extremely useful in
>>>>>>>>> > the distributed case.
>>>>>>>>> >
>>>>>>>>> > ----------------------
>>>>>>>>> >
>>>>>>>>> > An interesting partial solution is the following:
>>>>>>>>> >
>>>>>>>>> > funcs! = Function[x->x[:] = x+k for k=1:3]
>>>>>>>>> > d = drand(3,12)
>>>>>>>>> > let funcs! = funcs!
>>>>>>>>> >     @sync @parallel for k in 1:3
>>>>>>>>> >         funcs![myid()-1](localpart(d))
>>>>>>>>> >     end
>>>>>>>>> > end
>>>>>>>>> >
>>>>>>>>> > Here, I'm not sure why the let statement is necessary to send
>>>>>>>>> > funcs!, since d is sent automatically.
>>>>>>>>> >
>>>>>>>>> > ---------------------
>>>>>>>>> >
>>>>>>>>> > Thanks!
>>>>>>>>> > Madeleine
>>>>>>>>
>>>>>>>> --
>>>>>>>> Madeleine Udell
>>>>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>>>>> Stanford University
>>>>>>>> www.stanford.edu/~udell
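A note on the last open question in the thread (why `let funcs! = funcs!` is needed): per Amit's quote of the `Base.localize_vars` description, `@parallel` wraps its expression in `let a=a,b=b,...`, turning free variables into local captures whose *values* are serialized to the workers; a bare global reference would instead have to be resolved by name on the remote process, where it may not exist. A single-process sketch of the difference between the two kinds of capture (variable names invented for illustration):

```julia
# A closure over a global reads the global's current value at call time;
# a closure created under `let x = x` captures a fresh local binding
# whose value is snapshotted where the closure is made.
x = 1
f = () -> x            # captures the global binding x
g = let x = x          # new local binding; value fixed here
    () -> x
end
x = 2

f()   # 2 -- sees the updated global
g()   # 1 -- sees the snapshotted value
```

It is the snapshotted, self-contained form that can be shipped to another process without relying on that process having a matching global.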