> On 26 Nov 2014, at 10:28, Garth N. Wells <[email protected]> wrote:
> 
> 
>> On Wed, 26 Nov, 2014 at 9:09 AM, Johan Hake <[email protected]> wrote:
>>> On Wed, Nov 26, 2014 at 9:43 AM, Garth N. Wells <[email protected]> wrote:
>>>> On Wed, 26 Nov, 2014 at 8:32 AM, Johan Hake <[email protected]> wrote:
>>>>> On Wed, Nov 26, 2014 at 9:22 AM, Garth N. Wells <[email protected]> wrote:
>>>>>> On Wed, 26 Nov, 2014 at 7:50 AM, Johan Hake <[email protected]> wrote:
>>>>>>> On Wed, Nov 26, 2014 at 8:34 AM, Garth N. Wells <[email protected]> wrote:
>>>>>>>> On Tue, 25 Nov, 2014 at 9:48 PM, Johan Hake <[email protected]> wrote:
>>>>>>>> Hello!
>>>>>>>> I just pushed some fixes to the JIT interface of DOLFIN. Now one can 
>>>>>>>> JIT on different MPI groups.
>>>>>>> Nice.
>>>>>>>> Previously JITing was only done on rank 1 of mpi_comm_world. Now 
>>>>>>>> it is done on rank 1 of any passed group communicator.
>>>>>>> Do you mean rank 0?
>>>>>> Yes, of course.
>>>>>>>> There is no demo atm showing this but a test has been added:
>>>>>>>>  test/unit/python/jit/test_jit_with_mpi_groups.py
>>>>>>>> Here an expression, a subdomain, and a form are constructed on 
>>>>>>>> different ranks using groups. It is somewhat tedious, as one needs to 
>>>>>>>> initialize PETSc with the same group, otherwise PETSc will deadlock 
>>>>>>>> during initialization (the moment a PETSc linear algebra object is 
>>>>>>>> constructed).
>>>>>>> This is ok. It's arguably a design flaw that we don't make the user 
>>>>>>> handle MPI initialisation manually.
>>>>>> Sure, it is just somewhat tedious. You cannot start your typical script 
>>>>>> by importing dolfin.
>>>>>>>> The procedure in Python for this is (see the sketch after the list):
>>>>>>>> 1) Construct mpi groups using mpi4py
>>>>>>>> 2) Initialize petsc4py using the groups
>>>>>>>> 3) Wrap the groups as petsc4py comms (dolfin only supports petsc4py, 
>>>>>>>> not mpi4py)
>>>>>>>> 4) import dolfin
>>>>>>>> 5) Do group specific stuff:
>>>>>>>>   a) Functions and forms: no change needed, as the communicator
>>>>>>>>      is passed via the mesh
>>>>>>>>   b) domain = CompiledSubDomain("...", mpi_comm=group_comm)
>>>>>>>>   c) e = Expression("...", mpi_comm=group_comm)
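>>>>>>>> Roughly like this (an untested sketch; the half-and-half split, the 
>>>>>>>> variable names and the PETSc.Comm wrapping call are just how I would 
>>>>>>>> write it, the real code is in the unit test above):
>>>>>>>>   # 1) Construct the group communicators with mpi4py
>>>>>>>>   from mpi4py import MPI
>>>>>>>>   world = MPI.COMM_WORLD
>>>>>>>>   color = 0 if world.rank < world.size // 2 else 1
>>>>>>>>   group_comm = world.Split(color)
>>>>>>>>   # 2) Initialize petsc4py with the group communicator
>>>>>>>>   #    (must happen before petsc4py.PETSc and dolfin are imported)
>>>>>>>>   import petsc4py
>>>>>>>>   petsc4py.init(comm=group_comm)
>>>>>>>>   # 3) Wrap the mpi4py communicator as a petsc4py communicator,
>>>>>>>>   #    since dolfin only accepts petsc4py communicators
>>>>>>>>   from petsc4py import PETSc
>>>>>>>>   petsc_comm = PETSc.Comm(group_comm)
>>>>>>>>   # 4) Only now import dolfin
>>>>>>>>   from dolfin import UnitSquareMesh, Expression, CompiledSubDomain
>>>>>>>>   # 5) Group-specific objects; the mesh carries the communicator on
>>>>>>>>   #    to the functions and forms built on it
>>>>>>>>   mesh = UnitSquareMesh(petsc_comm, 8, 8)
>>>>>>>>   e = Expression("x[0]", mpi_comm=petsc_comm)
>>>>>>>>   domain = CompiledSubDomain("on_boundary", mpi_comm=petsc_comm)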
>>>>>>> It's not so clear whether passing the communicator means that the 
>>>>>>> Expression is only defined/available on group_comm, or if group_comm is 
>>>>>>> simply to control who does the JIT. Could you clarify this?
>>>>>> My knowledge is not that good in MPI. I have only tried to access (and 
>>>>>> construct) the Expression on ranks included in that group. Also, when I 
>>>>>> tried to construct one using a group communicator on a rank that is not 
>>>>>> included in the group, I got an error when calling MPI_size on it. There 
>>>>>> is probably a perfectly reasonable explanation for this.
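>>>>>> My best guess at the explanation: ranks outside the group get 
>>>>>> MPI_COMM_NULL back when the group communicator is created, and size/rank 
>>>>>> queries on a null communicator raise an error. A small sketch (the group 
>>>>>> membership is only illustrative):
>>>>>>   from mpi4py import MPI
>>>>>>   world = MPI.COMM_WORLD
>>>>>>   group = world.Get_group().Incl([0])   # only world rank 0 is a member
>>>>>>   group_comm = world.Create(group)
>>>>>>   if group_comm == MPI.COMM_NULL:
>>>>>>       # what the excluded ranks get; group_comm.Get_size() would fail here
>>>>>>       pass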
>>>>> Could you clarify what goes on behind the scenes with the communicator? 
>>>>> Is it only used in a call to get the process rank? What do the ranks 
>>>>> other than zero do?
>>>> Not sure what you want to know. Instead of using mpi_comm_world to 
>>>> construct meshes, you use the group communicator. This communicator has 
>>>> its own local group of ranks. JITing is still done on rank 0 of the local 
>>>> group, which might be, and most often is, different from the rank 0 
>>>> process of mpi_comm_world.
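>>>> For example, with four world ranks split into two groups of two (a small 
>>>> sketch, the split is only illustrative):
>>>>   from mpi4py import MPI
>>>>   world = MPI.COMM_WORLD                      # world ranks 0, 1, 2, 3
>>>>   group_comm = world.Split(world.rank // 2)   # groups {0, 1} and {2, 3}
>>>>   # world rank:      0  1  2  3
>>>>   # group_comm rank: 0  1  0  1
>>>>   # so the JIT runs on world ranks 0 and 2, i.e. rank 0 of each group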
>>> I just want to be clear (and have in the docstring) that
>>>   e = Expression("...", mpi_comm=group_comm)
>>> is valid only on group_comm (if this is the case), or make clear that the 
>>> communicator only determines the process that does the JIT.
>> I see now what you mean. I can update the docstring. As far as I understand 
>> it, the expression should only be valid on group_comm, and rank 0 of that 
>> group takes care of the JIT.
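>> Something like this for the docstring entry (a first draft, wording to be 
>> polished):
>>   mpi_comm : petsc4py communicator, optional
>>       The Expression is only defined on the processes in this communicator, 
>>       and JIT compilation is carried out by rank 0 of it (by default the 
>>       world communicator).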
> 
> 
> OK, could you make this clear in the docstring?
> 
>>> If we required all Expressions to have a domain/mesh, as Martin advocates, 
>>> things would be clearer.
>> Sure, but the same question is there for the mesh too. Is it available on 
>> ranks that are not in the group?
> 
> 
> I think in this case it is clear - a mesh lives only on the processes 
> belonging to its communicator. The ambiguity with an Expression is that it 
> doesn't have any data that lives on processes.
> 
> 
>>>> The group communicator works exactly like the world communicator, but now 
>>>> on just a subset of the processes. There were some sharp edges, with 
>>>> deadlocks as a consequence, when barriers were taken on the world 
>>>> communicator. This happens by default when dolfin is imported and PETSc 
>>>> gets initialized with the world communicator. So we need to initialize 
>>>> PETSc using the group communicator. Other than that there are no real 
>>>> differences.
>>> That doesn't sound right. PETSc initialisation does not take a 
>>> communicator. It is collective on MPI_COMM_WORLD, but each PETSc object 
>>> takes a communicator at construction, which can be something other than 
>>> MPI_COMM_WORLD or MPI_COMM_SELF.
>> Well, for all I know PETSc can be initialized with an mpi_comm. In petsc4py 
>> that is done by:
>>  import petsc4py
>>  petsc4py.init(comm=group_1)
>>  import petsc4py.PETSc as petsc
>> It turned out that this was required for the Function constructor to not 
>> deadlock. The line:
>>    _vector = factory.create_vector();
>> initializes PETSc with world_comm, which obviously deadlocks.
> 
> There must be something else wrong. PetscInitialize does not take a 
> communicator:
> 
>   
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html
> 
> Why does petsc4py want one? It doesn't make sense to initialise it with a 
> communicator - a communicator belongs to objects.

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PETSC_COMM_WORLD.html
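As far as I understand, petsc4py.init(comm=...) simply sets PETSC_COMM_WORLD 
to the given communicator before PetscInitialize is called, so the collective 
initialisation then involves only the group's processes rather than all of 
MPI_COMM_WORLD. A quick sketch (the even/odd split and the assert are only 
illustrative):

  from mpi4py import MPI
  group_comm = MPI.COMM_WORLD.Split(MPI.COMM_WORLD.rank % 2)
  import petsc4py
  petsc4py.init(comm=group_comm)   # sets PETSC_COMM_WORLD = group_comm
  from petsc4py import PETSc
  assert PETSc.COMM_WORLD.getSize() == group_comm.Get_size()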

Simone

> 
> Garth
> 
>> You might say that this could be avoided by initializing PETSc on all ranks 
>> with the world communicator before constructing a Function on a group. 
>> However, it still deadlocks during construction. Here I have just assumed it 
>> deadlocks at the same line, but I need to double-check this. And when I 
>> initialized PETSc using the group communicator it just worked. So somewhere 
>> a collective call on mpi_comm_world is executed when constructing a 
>> PETScVector.
>> Johan
>>> Garth
>>>> Johan
>>>>> Garth
>>>>>>>> Please try it out and report any sharp edges. A demo would also be fun 
>>>>>>>> to include :)
>>>>>>> We could run tests on different communicators to speed them up on 
>>>>>>> machines with high core counts!
>>>>>> True!
>>>>>> Johan
>>>>>>> Garth
>>>>>>>> Johan
> 
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
