Re: Interesting (perhaps unique) approach to parallelising FiPy sweeps

2016-10-03 Thread Daniel Wheeler
On Fri, Sep 16, 2016 at 3:38 PM, Gopalakrishnan, Krishnakumar wrote:
>
> The questions are: 1) Has anybody tried this before? 2) Does this sound 
> problematic from a technical or practical perspective? 3) Does this sound 
> remotely useful?

To my knowledge, this hasn't been done with FiPy. Certainly, this can
be set up with FiPy using mpi4py or IPython's parallel tools, which
you seem to have done below. It could well be useful in specific
cases, but in general this approach is quite limited since it can't
scale to a different number of processors, so I wouldn't recommend
investing too much effort in it.

> I have successfully implemented a multiprocessor parallelisation for 
> examples.diffusion.coupled (shown below). The values of the simulation 
> results (for v0 & v1) after 100 time-steps are very close to those of the 
> serial sweep. Although this code is several times slower than the serial 
> code, we tried this more as a concept demonstrator before committing 
> significant effort to converting our serial code to a multiprocessor 
> approach. If you see a red flag, it would be much appreciated if you could 
> point it out.

I don't see a red flag there. Try to set it up so that the equations
are not redefined for every sweep. Also, do an in-place update of the
variables using [:] or setValue (and return the array rather than the
variable from get_res).
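
Something along these lines (a rough, untested sketch against the coupled
example below; the numerix import is from fipy.tools, and the details are
just illustrative):

from fipy.tools import numerix

# Build the equations once, before the time loop, instead of inside the workers
eq0 = TransientTerm() == DiffusionTerm(coeff=0.01) - v1.faceGrad.divergence
eq1 = TransientTerm() == v0.faceGrad.divergence + DiffusionTerm(coeff=0.01)

def get_res0(res0_queue, v0, v1):
    boundaryConditions(m, v0, v1)
    res0 = eq0.sweep(var=v0, dt=1e-5)
    # return the bare array rather than the CellVariable object
    res0_queue.put([res0, numerix.array(v0)])

# ... and in the main loop, update the existing variable in place instead of
# rebinding the name:
#     res0, arr0 = res0_queue.get()
#     v0.setValue(arr0)        # or, equivalently, v0[:] = arr0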

-- 
Daniel Wheeler



Interesting (perhaps unique) approach to parallelising FiPy sweeps

2016-09-16 Thread Gopalakrishnan, Krishnakumar

Hello,

In our present system of equations, we sweep 8 loosely coupled PDEs defined on 
5 meshes until their residuals drop below an acceptable value.

Profiling the code makes it clear that the sweep function is a major 
computational bottleneck. We tried MPI parallelism, but that didn't work out. 
To side-step this, and recognising that the sweep() calls are executed 
one-by-one in serial, we propose an interesting work-around that gives an 
alternative style of parallelism.

Instead of dividing the control volumes among multiple CPUs, we propose to 
split the 8 different sweep calls (within the sweep loop of a time-step) across 
worker cores. Each sweep can then proceed independently, instead of the Python 
interpreter waiting for one sweep to finish before beginning another. I suspect 
this would be beneficial for problems with 6-8 sweep calls in which each sweep 
is CPU-intensive, so that the cost of the communication overhead between the 
master process and the child workers is offset by the benefit gained from 
sweeping independently.

From my fairly underwhelming numerical-computing knowledge, this looks 
analogous to a Jacobi-style implementation rather than a Gauss-Seidel-style 
iteration, in which the updated cell variable from each sweep is immediately 
available to all subsequent PDEs of the coupled system within the sweep loop. 
In the multi-processor sweep, the values are instead updated using values from 
the last sweep iteration. Thus, this might be slower (but more stable) to 
converge, isn't it? There is also a penalty incurred by the communication 
overhead of spawning and killing processes.
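
To illustrate the distinction (a toy example in plain Python, nothing to do
with FiPy itself): for the 2x2 system 2x + y = 4, x + 2y = 5, whose exact
solution is x = 1, y = 2, Gauss-Seidel reuses the freshly updated value within
the same iteration, whereas Jacobi only uses values from the previous
iteration, and here converges roughly half as fast:

# Jacobi vs Gauss-Seidel on the toy system  2x + y = 4,  x + 2y = 5
x = y = 0.0        # Jacobi iterates
xs = ys = 0.0      # Gauss-Seidel iterates
for k in range(10):
    # Jacobi: both updates use only values from the previous iteration
    x, y = (4.0 - y) / 2.0, (5.0 - x) / 2.0
    # Gauss-Seidel: the second update already sees the new xs
    xs = (4.0 - ys) / 2.0
    ys = (5.0 - xs) / 2.0
    print("%2d  Jacobi err = %.2e   Gauss-Seidel err = %.2e"
          % (k, abs(x - 1.0) + abs(y - 2.0), abs(xs - 1.0) + abs(ys - 2.0)))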

The questions are: 1) Has anybody tried this before? 2) Does this sound 
problematic from a technical or practical perspective? 3) Does this sound 
remotely useful?

I have successfully implemented a multiprocessor parallelisation for 
examples.diffusion.coupled (shown below). The values of the simulation results 
(for v0 & v1) after 100 time-steps are very close to those of the serial 
sweep. Although this code is several times slower than the serial code, we 
tried this more as a concept demonstrator before committing significant effort 
to converting our serial code to a multiprocessor approach. If you see a red 
flag, it would be much appreciated if you could point it out.

from fipy import Grid1D, CellVariable, TransientTerm, DiffusionTerm, Viewer
from multiprocessing import Process, Queue, cpu_count

m = Grid1D(nx=100, Lx=1.)

v0 = CellVariable(mesh=m, hasOld=True, value=0.5)
v1 = CellVariable(mesh=m, hasOld=True, value=0.5)

def boundaryConditions(m, v0, v1):
    v0.constrain(0, m.facesLeft)
    v0.constrain(1, m.facesRight)
    v1.constrain(1, m.facesLeft)
    v1.constrain(0, m.facesRight)

## Multiprocessor implementation
#
def get_res0(res0_queue, v0, v1):
    # Worker for the first PDE: apply BCs, build the equation, do one sweep,
    # then return the residual and the swept variable through the queue.
    boundaryConditions(m, v0, v1)
    eq0 = TransientTerm() == DiffusionTerm(coeff=0.01) - v1.faceGrad.divergence
    res0 = eq0.sweep(var=v0, dt=1e-5)
    res0_queue.put([res0, v0])

def get_res1(res1_queue, v0, v1):
    # Worker for the second PDE, analogous to get_res0
    boundaryConditions(m, v0, v1)
    eq1 = TransientTerm() == v0.faceGrad.divergence + DiffusionTerm(coeff=0.01)
    res1 = eq1.sweep(var=v1, dt=1e-5)
    res1_queue.put([res1, v1])

if __name__ == '__main__':
    print "You have " + str(cpu_count()) + " CPU cores in your machine"
    print "Simulation end time is 99 seconds.\n"
    res0_queue = Queue()
    res1_queue = Queue()
    for t in range(100):
        print "Computing the solution at time, t = " + str(t) + " sec"
        v0.updateOld()
        v1.updateOld()
        res0 = res1 = 1e6
        while max(res0, res1) > 0.01:
            # Launch the two sweeps in separate worker processes
            process_a = Process(target=get_res0, args=(res0_queue, v0, v1))
            process_b = Process(target=get_res1, args=(res1_queue, v0, v1))
            process_a.start()
            process_b.start()
            process_a.join()
            process_b.join()
            # Collect the residuals and the swept variables from the workers
            while not res0_queue.empty():
                res0, v0 = res0_queue.get()
            while not res1_queue.empty():
                res1, v1 = res1_queue.get()
    print v0, v1
    print res0, res1





___
fipy mailing list
fipy@nist.gov
http://www.ctcms.nist.gov/fipy
  [ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]