I made a plot comparing the time it takes to move an array with the
one-sided methods we are using now versus MPI.jl. It is here:

https://github.com/JuliaLang/julia/issues/9167#issuecomment-64721543
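(For anyone who wants a quick local probe of the one-sided style being compared: the sketch below ships an array to a worker and times pulling it back. Names follow the current `Distributed` standard library; in the 0.4-era Julia discussed here the equivalent calls lived in Base. This measures only the pull-back leg, so it is a rough stand-in for the linked benchmark, not a reproduction of it.)

```julia
using Distributed

# One-sided style: the caller pulls data back from a worker with
# remotecall/fetch; the worker never posts a matching receive.
nworkers() < 2 && addprocs(1)        # ensure at least one worker exists

a = randn(10^6)
w = workers()[1]

fut = remotecall(identity, w, a)     # ship the array to worker w
t = @elapsed b = fetch(fut)          # pull it back and time the fetch
println("fetched $(length(b)) Float64s in $(round(t; digits = 4)) s")
```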

2015-01-07 1:54 GMT-05:00 Amuthan <apar...@gmail.com>:

> Amit: Thanks for the suggestion. I gave it a quick try, but wasn't
> successful. It appears to me that communication between the processors (to
> obtain the boundary data) would require reconstructing the DArray from the
> localparts at the end of each iteration. I guess I'll have to take a deeper
> look into the implementation of DArrays to understand how best to implement
> this.
>
> In the meantime, I got a reasonable speedup using the Julia wrapper for
> MPI (https://github.com/JuliaParallel/MPI.jl). Has anyone tried comparing
> the performance of the one-sided message passing model of DArray and the
> standard (2-sided) MPI model?
>
> Amuthan
>
> On Mon, Jan 5, 2015 at 12:53 AM, Amit Murthy <amit.mur...@gmail.com>
> wrote:
>
>> You can have only two DArrays and use localpart() to get the local parts
>> of the arrays on each worker and work off that.
>>
>> With a single iteration the network overhead will far outweigh any gains
>> from the distributed computation - though of course this depends on the
>> computation.
>>
>> Currently, DArrays work best if the distributed computation can work
>> solely off localparts. An efficient setindex! for DArrays is a TODO at
>> this time.
>>
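The "work off localparts" pattern Amit describes can be sketched serially: plain array chunks stand in for the per-worker localparts, and the reads one index past each chunk edge stand in for the halo values a worker would fetch from its neighbors. The function name and the chunking scheme below are illustrative, not part of any API; the stencil is the same one used in `laplace_1D_serial` further down the thread.

```julia
# Emulate the "work off localparts" pattern serially: split the interior
# of u into chunks (one per hypothetical worker) and update each chunk
# using only its own values plus one halo value from each neighbor.
function chunked_iterate!(u::Vector{Float64}, nchunks::Int)
    N = length(u) - 2                        # number of interior points
    bounds = round.(Int, range(0, N; length = nchunks + 1))
    u_new = similar(u)
    u_new[1], u_new[end] = u[1], u[end]      # Dirichlet boundary values
    for c in 1:nchunks
        lo, hi = bounds[c] + 1, bounds[c + 1]  # interior indices of chunk c
        for i in lo:hi
            # u[i] and u[i + 2] are the left/right neighbors of interior
            # point i (the same stencil as laplace_1D_serial); at chunk
            # edges they play the role of halo values from neighbors.
            u_new[i + 1] = 0.5 * (u[i] + u[i + 2])
        end
    end
    copyto!(u, u_new)
end
```

Iterating this to convergence recovers the exact solution of u''(x) = 0, the straight line between the two boundary values.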
>> On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <apar...@gmail.com> wrote:
>>
>>> Hi Amit: yes, the idea is to have just two DArrays, one each for the
>>> previous and current iterations. I had some trouble assigning values
>>> directly to a DArray (a setindex! error) and so had to write it like this.
>>> Do you know any means around this?
>>>
>>> Btw, the parallel code runs slower than the serial version even for just
>>> one iteration.
>>>
>>> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <amit.mur...@gmail.com>
>>> wrote:
>>>
>>>> As written, this creates 1000 DArrays. I think you intended to have
>>>> only two of them and swap their values in each iteration?
>>>>
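The two-buffer swap suggested above, sketched with plain arrays (the same idea applies per-localpart on a DArray): allocate `u` and `u_new` once, then swap the bindings each iteration instead of constructing a fresh array per step. The function name here is illustrative.

```julia
# Double buffering: one allocation up front, then swap bindings each
# iteration instead of building a new array per step.
function jacobi!(u::Vector{Float64}, niter::Int)
    u_new = copy(u)          # second buffer; endpoints already hold the BCs
    for _ in 1:niter
        for i in 2:length(u) - 1
            u_new[i] = 0.5 * (u[i - 1] + u[i + 1])
        end
        u, u_new = u_new, u  # swap bindings; no allocation
    end
    return u                 # the binding holding the latest values
end
```

Because the buffers alternate roles, the caller should use the returned array rather than the one passed in.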
>>>>
>>>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran
>>>> wrote:
>>>>>
>>>>> Hello: I recently started exploring the parallel capabilities of Julia
>>>>> and I need some help in understanding and improving the performance of a very
>>>>> elementary parallel code using DArrays (I use Julia
>>>>> version 0.4.0-dev+2431). The code pasted below (based essentially on
>>>>> plife.jl) solves u''(x) = 0, x \in [0,1] with u(0) and u(1) specified,
>>>>> using the 2nd order central difference approximation. The parallel version
>>>>> of the code runs significantly slower than the serial version. It would be
>>>>> nice if someone could point out ways to improve this and/or suggest an
>>>>> alternative efficient version.
>>>>>
>>>>> function laplace_1D_serial(u::Array{Float64})
>>>>>    N = length(u) - 2
>>>>>    u_new = zeros(N)
>>>>>
>>>>>    for i = 1:N
>>>>>       u_new[i] = 0.5(u[i] + u[i + 2])
>>>>>    end
>>>>>
>>>>>    u_new
>>>>> end
>>>>>
>>>>> function serial_iterate(u::Array{Float64})
>>>>>    u_new = laplace_1D_serial(u)
>>>>>
>>>>>    for i = 1:length(u_new)
>>>>>       u[i + 1] = u_new[i]
>>>>>    end
>>>>> end
>>>>>
>>>>> function parallel_iterate(u::DArray)
>>>>>    DArray(size(u), procs(u)) do I
>>>>>       J = I[1]
>>>>>
>>>>>       if myid() == 2
>>>>>          local_array = zeros(length(J) + 1)
>>>>>          for i = J[1] : J[end] + 1
>>>>>             local_array[i - J[1] + 1] = u[i]
>>>>>          end
>>>>>          append!([float(u[1])], laplace_1D_serial(local_array))
>>>>>
>>>>>       elseif myid() == length(procs(u)) + 1
>>>>>          local_array = zeros(length(J) + 1)
>>>>>          for i = J[1] - 1 : J[end]
>>>>>             local_array[i - J[1] + 2] = u[i]
>>>>>          end
>>>>>          append!(laplace_1D_serial(local_array), [float(u[end])])
>>>>>
>>>>>       else
>>>>>          local_array = zeros(length(J) + 2)
>>>>>          for i = J[1] - 1 : J[end] + 1
>>>>>             local_array[i - J[1] + 2] = u[i]
>>>>>          end
>>>>>          laplace_1D_serial(local_array)
>>>>>
>>>>>       end
>>>>>    end
>>>>> end
>>>>>
>>>>> A sample run on my laptop with 4 processors:
>>>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>>>
>>>>> julia> @time for i = 1:1000
>>>>>          serial_iterate(u)
>>>>>        end
>>>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>>>
>>>>> julia> @time for i = 1:1000
>>>>>          u_distributed = parallel_iterate(u_distributed)
>>>>>        end
>>>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17%
>>>>> gc time)
>>>>>
>>>>> Thanks for your help!
>>>>>
>>>>> Cheers,
>>>>> Amuthan
>>>>>
>>>>>
>>>>>
>>>
>>
>
