Re: [FEniCS] mpi groups via petsc4py

Garth N. Wells Wed, 26 Nov 2014 02:31:40 -0800

On Wed, 26 Nov, 2014 at 10:24 AM, Johan Hake <[email protected]> wrote:

On Wed, Nov 26, 2014 at 10:52 AM, Garth N. Wells <[email protected]> wrote:
On Wed, 26 Nov, 2014 at 9:38 AM, Johan Hake <[email protected]> wrote:
On Wed, Nov 26, 2014 at 10:28 AM, Garth N. Wells <[email protected]> wrote:
On Wed, 26 Nov, 2014 at 9:09 AM, Johan Hake <[email protected]> wrote:
On Wed, Nov 26, 2014 at 9:43 AM, Garth N. Wells <[email protected]> wrote:
On Wed, 26 Nov, 2014 at 8:32 AM, Johan Hake <[email protected]> wrote:
On Wed, Nov 26, 2014 at 9:22 AM, Garth N. Wells <[email protected]> wrote:
On Wed, 26 Nov, 2014 at 7:50 AM, Johan Hake <[email protected]> wrote:
On Wed, Nov 26, 2014 at 8:34 AM, Garth N. Wells <[email protected]> wrote:
On Tue, 25 Nov, 2014 at 9:48 PM, Johan Hake <[email protected]> wrote:
Hello!
I just pushed some fixes to the JIT interface of DOLFIN. Now one can JIT on different MPI groups.
Nice.
Previously, JITing was only done on rank 1 of the mpi_comm_world. Now it is done on rank 1 of any passed group communicator.
Do you mean rank 0?
Yes, of course.
There is no demo atm showing this but a test has been added:

  test/unit/python/jit/test_jit_with_mpi_groups.py
Here an expression, a subdomain, and a form are constructed on different ranks using groups. It is somewhat tedious, as one needs to initialize PETSc with the same group; otherwise PETSc will deadlock during initialization (the moment a PETSc linear algebra object is constructed).
This is ok. It's arguably a design flaw that we don't make the user handle MPI initialisation manually.
Sure, it is just somewhat tedious. You cannot start your typical script by importing dolfin.
The procedure in Python for this is:

1) Construct MPI groups using mpi4py
2) Initialize petsc4py using the groups
3) Wrap the groups as petsc4py communicators (dolfin only supports petsc4py, not mpi4py)
4) import dolfin
5) Do group-specific stuff:
   a) Functions and forms: no change needed, as the communicator
      is passed via the mesh
   b) domain = CompiledSubDomain("...", mpi_comm=group_comm)
   c) e = Expression("...", mpi_comm=group_comm)
It's not so clear whether passing the communicator means that the Expression is only defined/available on group_comm, or if group_comm is simply used to control who does the JIT. Could you clarify this?
My knowledge of MPI is not that good. I have only tried to access (and construct) the Expression on ranks included in that group. Also, when I tried to construct one using a group communicator on a rank that is not included in the group, I got an error when calling MPI_size on it. There is probably a perfectly reasonable explanation for this.
Could you clarify what goes on behind the scenes with the communicator? Is it only used in a call to get the process rank? What do the ranks other than zero do?
Not sure what you want to know. Instead of using mpi_comm_world to construct meshes, you use the group communicator. This communicator has its own local group of ranks. JITing is still done on rank 0 of the local group, which may be (and most often is) a different process from rank 0 of mpi_comm_world.
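The rank bookkeeping described here can be illustrated with a small pure-Python model; it needs no MPI and is only a sketch under stated assumptions, not DOLFIN code. It mimics a communicator split (MPI_Comm_split, or Comm.Split in mpi4py): ranks passing the same color form one group, ordered by world rank (assuming the world rank is used as the key), and a rank that joins no group gets the analogue of MPI_COMM_NULL, so asking it for a size fails, much like the MPI_size error mentioned earlier. The names split, jit_ranks and group_size are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stand-in for MPI_UNDEFINED: a rank with this color joins no group.
UNDEFINED = None

GroupInfo = Tuple[int, int, int]  # (color, local rank, group size)


def split(world_size: int,
          color_of: Callable[[int], Optional[int]]) -> Dict[int, Optional[GroupInfo]]:
    """Model of a communicator split: ranks sharing a color form one group,
    ordered by world rank; UNDEFINED ranks get None (like MPI_COMM_NULL)."""
    members: Dict[int, List[int]] = {}
    for rank in range(world_size):
        color = color_of(rank)
        if color is not UNDEFINED:
            members.setdefault(color, []).append(rank)
    layout: Dict[int, Optional[GroupInfo]] = {}
    for rank in range(world_size):
        color = color_of(rank)
        if color is UNDEFINED:
            layout[rank] = None
        else:
            group = members[color]
            layout[rank] = (color, group.index(rank), len(group))
    return layout


def jit_ranks(layout: Dict[int, Optional[GroupInfo]]) -> Dict[int, int]:
    """For each group color, the world rank that is local rank 0 (does the JIT)."""
    return {info[0]: rank for rank, info in layout.items()
            if info is not None and info[1] == 0}


def group_size(layout: Dict[int, Optional[GroupInfo]], rank: int) -> int:
    info = layout[rank]
    if info is None:
        # In MPI, asking a null communicator for its size is also an error.
        raise ValueError(f"world rank {rank} is not a member of any group")
    return info[2]


if __name__ == "__main__":
    # Six world ranks: ranks 0-2 form group 0, ranks 3-5 form group 1.
    layout = split(6, lambda r: 0 if r < 3 else 1)
    print(jit_ranks(layout))  # {0: 0, 1: 3}
```

In this layout the second group JITs on world rank 3, not on world rank 0.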
I just want to be clear (and have in the docstring) that

   e = Expression("...", mpi_comm=group_comm)
is valid only on group_comm (if this is the case), or make clear that the communicator only determines the process that does the JIT.
I see now what you mean. I can update the docstring. As far as I understand it, the expression should only be valid on group_comm, and rank 0 of that group takes care of the JIT.
OK, could you make this clear in the docstring?
Sure.
If we required all Expressions to have a domain/mesh, as Martin advocates, things would be clearer.
Sure, but the same question applies to the mesh too. Is it available on ranks that are not in the group?
I think in this case it is clear: a mesh lives only on the processes belonging to its communicator. The ambiguity with an Expression is that it doesn't have any data that lives on processes.
Sure.
The group communicator works exactly like the world communicator, but on just a subset of the processes. There were some sharp edges, with deadlocks as a consequence, when barriers were taken on the world communicator. This is done by default when dolfin is imported and PETSc gets initialized with the world communicator, so we need to initialize PETSc using the group communicator. Other than that, there are no real differences.
That doesn't sound right. PETSc initialisation does not take a communicator. It is collective on MPI_COMM_WORLD, but each PETSc object takes a communicator at construction, which can be something other than MPI_COMM_WORLD or MPI_COMM_SELF.
Well, as far as I know, PETSc can be initialized with an MPI communicator. In petsc4py that is done by:
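  from mpi4py import MPI
  import petsc4py
  # Example group only (assumption): split MPI_COMM_WORLD by rank parity
  group_1 = MPI.COMM_WORLD.Split(color=MPI.COMM_WORLD.rank % 2)
  petsc4py.init(comm=group_1)  # must come before importing petsc4py.PETSc
  import petsc4py.PETSc as petsc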
It turned out that this was required for the Function constructor not to deadlock. The line:
    _vector = factory.create_vector();

initializes PETSc with world_comm, which obviously deadlocks.
There must be something else wrong. PetscInitialize does not take a communicator:
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html
From that web page:
Collective on MPI_COMM_WORLD or PETSC_COMM_WORLD if it has been set
So setting PETSC_COMM_WORLD initializes PETSc on a subset of processes.
It is fundamentally wrong for PETSc to be 'initialised' with a sub-communicator in DOLFIN to avoid a deadlock for the code use sketched in this thread. Communicators belong to objects. If PETSc were initialised on a sub-communicator, it would not be possible to solve a subsequent problem on more processes than are contained in the sub-communicator.
If the code is deadlocking, then the underlying problem should be fixed. It's probably a function call somewhere that erroneously does not pass the communicator (since in cases where a communicator is not passed, we assume it to be MPI_COMM_WORLD; not ideal, but it keeps things simple for most users).
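The suspected bug class (a call that silently falls back to the world communicator because the caller's communicator was not forwarded) can be sketched in plain Python. Everything below (Comm, WORLD, create_vector, make_function_buggy, make_function_fixed) is hypothetical and only mirrors the default-to-MPI_COMM_WORLD convention described above; it is not DOLFIN's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Comm:
    """Hypothetical stand-in for an MPI communicator: a name and its world ranks."""
    name: str
    ranks: Tuple[int, ...]


WORLD = Comm("world", (0, 1, 2, 3))


def create_vector(size: int, comm: Optional[Comm] = None) -> dict:
    # Convention described above: no communicator means MPI_COMM_WORLD.
    comm = comm if comm is not None else WORLD
    return {"size": size, "comm": comm}


def make_function_buggy(size: int, comm: Comm) -> dict:
    # Bug: the caller's communicator is dropped, so the vector lands on WORLD.
    return create_vector(size)


def make_function_fixed(size: int, comm: Comm) -> dict:
    # Fix: forward the communicator explicitly.
    return create_vector(size, comm=comm)


if __name__ == "__main__":
    group = Comm("group", (0, 1))
    print(make_function_buggy(10, group)["comm"].name)  # world
    print(make_function_fixed(10, group)["comm"].name)  # group
```

In a real parallel run, the buggy path would make only ranks 0 and 1 enter a collective that expects all four world ranks, which never completes.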
Found the bugger!
If the first PETScObject is created only on a group, SubSystemManager::init_petsc is called. This won't initialize PETSc if it is already initialized outside, for example with MPI_COMM_WORLD. However, SLEPc will be initialized using PETSC_COMM_WORLD. If that is set to MPI_COMM_WORLD, we have a deadlock. If, however, we create a PETScVector on all processes before we create the group one, we are fine, as SLEPc then gets initialized on all processes.
So the solution is to call:

SubsystemManager.init_petsc()

on all processes before doing anything.
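The failure mode and the fix can be modelled with threads instead of MPI processes. This is a sketch under loud assumptions: each "process" is a thread, a collective on the world communicator is a threading.Barrier sized to the world, and LazySubsystem.init() is a hypothetical stand-in for a lazy, once-per-process initialisation such as the SLEPc step described above, not DOLFIN's SubsystemManager. A real deadlock hangs forever; the timeout here only lets the model report the hang.

```python
import threading

WORLD_SIZE = 4


class LazySubsystem:
    """Initialises once per process, via a collective over the whole world."""

    def __init__(self, world_size: int, timeout: float):
        self.world_barrier = threading.Barrier(world_size)
        self.timeout = timeout
        self.local = threading.local()  # per-"process" initialised flag

    def init(self) -> None:
        if getattr(self.local, "done", False):
            return  # already initialised on this process: no collective call
        # Collective on the world: every process must arrive, or all wait.
        self.world_barrier.wait(timeout=self.timeout)
        self.local.done = True


def run(eager_init: bool, group=(0, 1), timeout: float = 0.5) -> bool:
    """Return True if every process finished, False if the collective hung."""
    sub = LazySubsystem(WORLD_SIZE, timeout)
    ok = [True] * WORLD_SIZE

    def process(rank: int) -> None:
        try:
            if eager_init:
                sub.init()  # all ranks initialise before any group work
            if rank in group:
                sub.init()  # first object is created only on the group ranks
        except threading.BrokenBarrierError:
            ok[rank] = False

    threads = [threading.Thread(target=process, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(ok)


if __name__ == "__main__":
    print("lazy init on group only:", run(eager_init=False))  # False
    print("init on all ranks first:", run(eager_init=True))   # True
```

Initialising eagerly on every rank means that the later group-only call finds the flag already set and never enters the world-wide collective.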

I don't quite follow. Could you post a simple snippet that deadlocks when SubsystemManager.init_petsc() is not called, but works when it is called?


Garth

Well, I learned a bunch of MPI-related stuff here :) I will update the tests accordingly.
Johan
Garth
Also see:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PETSC_COMM_WORLD.html
Johan
Why does petsc4py want one? It doesn't make sense to initialise it with a communicator; a communicator belongs to objects.
Garth
You might say that this could be avoided by initializing PETSc on all ranks with the world communicator before constructing a Function on a group. However, it still deadlocks during construction. Here I have just assumed it deadlocks at the same line, but I need to double-check this. And when I initialized PETSc using the group communicator, it just worked. So somewhere a collective call on mpi_comm_world is executed when constructing a PETScVector.
Johan
Garth
Johan
Garth
Please try it out and report any sharp edges. A demo would also be fun to include :)
We could run tests on different communicators to speed them up on machines with high core counts!
True!
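Running tests on different communicators, as suggested above, needs some plan for which group runs which tests. A minimal sketch, assuming consecutive world ranks are grouped together and test files are dealt out round-robin; assign_tests and color_for are hypothetical helpers, and the color is what each rank would pass to a communicator split.

```python
from typing import Dict, List, Sequence


def assign_tests(tests: Sequence[str], world_size: int,
                 group_size: int) -> Dict[int, List[str]]:
    """Deal test files round-robin over groups of `group_size` consecutive ranks."""
    if group_size < 1 or world_size < group_size or world_size % group_size:
        raise ValueError("world_size must be a positive multiple of group_size")
    num_groups = world_size // group_size
    plan: Dict[int, List[str]] = {g: [] for g in range(num_groups)}
    for i, test in enumerate(tests):
        plan[i % num_groups].append(test)
    return plan


def color_for(rank: int, group_size: int) -> int:
    """Group index of a world rank, usable as the color in a communicator split."""
    return rank // group_size
```

With four ranks in groups of two, ranks 0 and 1 would share one list of tests and ranks 2 and 3 the other, each group running on its own communicator.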

Johan
Garth
Johan


_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
