Amit: Thanks for the suggestion. I gave it a quick try, but wasn't
successful. It appears to me that communication between the processors (to
obtain the boundary data) would require reconstructing the DArray from the
localparts at the end of each iteration. I guess I'll have to take a deeper
look into the implementation of DArrays to understand how best to implement
this.
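
Roughly, the shape of what I tried looks like the sketch below (a rough,
untested reconstruction rather than the exact code, reusing
laplace_1D_serial from the code quoted further down this thread; the
function name is just for illustration, and on newer Julia this would also
need using DistributedArrays):

function iterate_chunks(u_old::DArray)
   n = size(u_old, 1)
   # Assumes the new DArray gets the same chunk layout as u_old (true for
   # the default distribution used here).
   DArray(size(u_old), procs(u_old)) do I
      J   = I[1]               # global indices owned by this worker
      loc = localpart(u_old)   # this worker's chunk -- no communication
      # The two ghost values must come from the neighbouring chunks; here
      # they are fetched with (slow) scalar indexing into the DArray.
      left  = J[1]   > 1 ? [u_old[J[1] - 1]]   : Float64[]
      right = J[end] < n ? [u_old[J[end] + 1]] : Float64[]
      new_chunk = laplace_1D_serial(vcat(left, loc, right))
      J[1]   == 1 && (new_chunk = vcat(loc[1], new_chunk))    # fixed u(0)
      J[end] == n && (new_chunk = vcat(new_chunk, loc[end]))  # fixed u(1)
      new_chunk
   end
end

So even though the interior update works purely off localpart(u_old), the
ghost values still trigger remote fetches, and the result has to be
assembled into a fresh DArray every iteration.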

In the meantime, I got a reasonable speedup using the Julia wrapper for MPI
(https://github.com/JuliaParallel/MPI.jl). Has anyone tried comparing the
performance of the one-sided message-passing model of DArrays and the
standard (2-sided) MPI model?
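
For reference, the 2-sided version I have in mind looks roughly like the
sketch below. This is not the exact code I benchmarked, just a stripped-down
illustration: the halo_exchange! and run_laplace_mpi names are mine,
laplace_1D_serial is the serial kernel quoted further down this thread, and
the calls use the positional MPI.jl signatures of the current release, e.g.
MPI.Isend(buf, dest, tag, comm), which may change in later releases.

using MPI

# Each rank owns n_local points of u plus one ghost cell at each end. Every
# iteration the edge values are exchanged with the neighbouring ranks and
# the interior is then updated purely locally.
function halo_exchange!(u::Vector{Float64}, comm)
   rank = MPI.Comm_rank(comm)
   last = MPI.Comm_size(comm) - 1
   send_left, send_right = [u[2]], [u[end - 1]]   # my edge values
   recv_left, recv_right = zeros(1), zeros(1)     # neighbours' edge values
   reqs = MPI.Request[]
   if rank > 0                                    # exchange with left neighbour
      push!(reqs, MPI.Isend(send_left,  rank - 1, 0, comm))
      push!(reqs, MPI.Irecv!(recv_left, rank - 1, 1, comm))
   end
   if rank < last                                 # exchange with right neighbour
      push!(reqs, MPI.Isend(send_right,  rank + 1, 1, comm))
      push!(reqs, MPI.Irecv!(recv_right, rank + 1, 0, comm))
   end
   MPI.Waitall!(reqs)
   rank > 0    && (u[1]   = recv_left[1])         # fill left ghost cell
   rank < last && (u[end] = recv_right[1])        # fill right ghost cell
   u
end

function run_laplace_mpi(n_local::Int, n_iter::Int)
   comm = MPI.COMM_WORLD
   rank = MPI.Comm_rank(comm)
   u = zeros(n_local + 2)                         # owned points + 2 ghost/boundary cells
   rank == MPI.Comm_size(comm) - 1 && (u[end] = 1.0)  # boundary condition u(1) = 1
   for iter = 1:n_iter
      halo_exchange!(u, comm)
      u[2:end - 1] = laplace_1D_serial(u)         # same serial kernel as quoted below
   end
   u
end

# Run as, e.g., mpirun -np 4 julia <this file>
MPI.Init()
run_laplace_mpi(250, 1000)
MPI.Finalize()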

Amuthan

On Mon, Jan 5, 2015 at 12:53 AM, Amit Murthy <amit.mur...@gmail.com> wrote:

> You need only two DArrays: use localpart() to get the local parts
> of the arrays on each worker and work off those.
>
> With a single iteration, the network overhead will far outweigh any
> gains from distributed computation - it depends on the computation, of
> course.
>
> Currently, DArrays work best if the distributed computation can work
> solely off localparts. An efficient setindex! for DArrays is still a TODO
> at this time.
>
> On Mon, Jan 5, 2015 at 12:34 PM, Amuthan <apar...@gmail.com> wrote:
>
>> Hi Amit: yes, the idea is to have just two DArrays, one each for the
>> previous and current iterations. I had some trouble assigning values
>> directly to a DArray (a setindex! error) and so had to write it like this.
>> Do you know of a way around this?
>>
>> Btw, the parallel code runs slower than the serial version even for just
>> one iteration.
>>
>> On Sun, Jan 4, 2015 at 10:27 PM, Amit Murthy <amit.mur...@gmail.com>
>> wrote:
>>
>>> As written, this is creating 1000 DArrays. I think you intended to
>>> have only two of them and swap values in each iteration?
>>>
>>>
>>> On Sunday, 4 January 2015 11:07:47 UTC+5:30, Amuthan A. Ramabathiran
>>> wrote:
>>>>
>>>> Hello: I recently started exploring the parallel capabilities of Julia
>>>> and I need some help in understanding and improving the performance of a
>>>> very elementary parallel code using DArrays (I use Julia
>>>> version 0.4.0-dev+2431). The code pasted below (based essentially on
>>>> plife.jl) solves u''(x) = 0, x \in [0, 1] with u(0) and u(1) specified,
>>>> using the second-order central difference approximation. The parallel
>>>> version of the code runs significantly slower than the serial version. It
>>>> would be nice if someone could point out ways to improve it and/or suggest
>>>> a more efficient alternative.
>>>>
>>>> function laplace_1D_serial(u::Array{Float64})
>>>>    N = length(u) - 2
>>>>    u_new = zeros(N)
>>>>
>>>>    for i = 1:N
>>>>       u_new[i] = 0.5(u[i] + u[i + 2])
>>>>    end
>>>>
>>>>    u_new
>>>> end
>>>>
>>>> function serial_iterate(u::Array{Float64})
>>>>    u_new = laplace_1D_serial(u)
>>>>
>>>>    for i = 1:length(u_new)
>>>>       u[i + 1] = u_new[i]
>>>>    end
>>>> end
>>>>
>>>> function parallel_iterate(u::DArray)
>>>>    DArray(size(u), procs(u)) do I
>>>>       J = I[1]
>>>>
>>>>       if myid() == 2  # first worker: owns the left end of the domain
>>>>          local_array = zeros(length(J) + 1)
>>>>          for i = J[1] : J[end] + 1
>>>>             local_array[i - J[1] + 1] = u[i]
>>>>          end
>>>>          append!([float(u[1])], laplace_1D_serial(local_array))
>>>>
>>>>       elseif myid() == length(procs(u)) + 1  # last worker: right end
>>>>          local_array = zeros(length(J) + 1)
>>>>          for i = J[1] - 1 : J[end]
>>>>             local_array[i - J[1] + 2] = u[i]
>>>>          end
>>>>          append!(laplace_1D_serial(local_array), [float(u[end])])
>>>>
>>>>       else  # interior worker
>>>>          local_array = zeros(length(J) + 2)
>>>>          for i = J[1] - 1 : J[end] + 1
>>>>             local_array[i - J[1] + 2] = u[i]
>>>>          end
>>>>          laplace_1D_serial(local_array)
>>>>
>>>>       end
>>>>    end
>>>> end
>>>>
>>>> A sample run on my laptop with 4 processors:
>>>> julia> u = zeros(1000); u[end] = 1.0; u_distributed = distribute(u);
>>>>
>>>> julia> @time for i = 1:1000
>>>>          serial_iterate(u)
>>>>        end
>>>> elapsed time: 0.011452192 seconds (8300112 bytes allocated)
>>>>
>>>> julia> @time for i = 1:1000
>>>>          u_distributed = parallel_iterate(u_distributed)
>>>>        end
>>>> elapsed time: 4.461922218 seconds (190565036 bytes allocated, 10.17% gc
>>>> time)
>>>>
>>>> Thanks for your help!
>>>>
>>>> Cheers,
>>>> Amuthan
>>>>
>>>>
>>>>
>>
>
