Ralph, I still observe these issues on current master (-npernode is not respected
either). Also note that the -display-allocation output seems to be wrong:
slots_inuse=0 even though the slots are obviously in use.
$ git show
4899c89 (HEAD -> master, origin/master, origin/HEAD) Fix a race condition when
multiple threads try to create a bml en... (Bouteiller, 6 hours ago)
$ bin/mpirun -np 12 -hostfile /opt/etc/ib10g.machinefile.ompi
-display-allocation -map-by node hostname
====================== ALLOCATED NODES ======================
dancer00: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer01: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer02: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer03: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer04: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer05: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer06: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer07: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer08: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer09: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer10: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer11: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer12: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer13: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer14: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
dancer15: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
=================================================================
dancer01
dancer00
dancer01
dancer01
dancer01
dancer00
dancer00
dancer00
dancer00
dancer00
dancer00
dancer00
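
For reference, below is the kind of minimal check I would use instead of plain
hostname (an untested sketch, not part of the tree, and Linux-only because of
sched_getaffinity()): it prints each rank's host together with the affinity
mask it inherited from the launcher, so both the mapping and the binding can
be inspected in one run.

/* check_binding.c -- hypothetical test program (the name is arbitrary).
 * Prints each rank's host and the CPU affinity mask inherited from the
 * launcher.  Linux-specific because of sched_getaffinity(). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, len, i;
    char host[MPI_MAX_PROCESSOR_NAME];
    char mask[CPU_SETSIZE + 1];
    cpu_set_t set;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    CPU_ZERO(&set);
    sched_getaffinity(0, sizeof(set), &set);   /* 0 = the calling process */
    for (i = 0; i < CPU_SETSIZE; i++)
        mask[i] = CPU_ISSET(i, &set) ? '1' : '0';
    mask[CPU_SETSIZE] = '\0';

    /* only the first 64 bits of the mask, to keep the output readable */
    printf("rank %2d on %s mask=%.64s\n", rank, host, mask);

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with the same mpirun line as above, it shows in
one shot where each rank lands and what mask it was given.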
--
Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/
> On Apr 13, 2016, at 13:38, Ralph Castain <[email protected]> wrote:
>
> The --map-by node option should now be fixed on master, and PRs are waiting
> for 1.10 and 2.0
>
> Thx!
>
>> On Apr 12, 2016, at 6:45 PM, Ralph Castain <[email protected]> wrote:
>>
>> FWIW: speaking just to the --map-by node issue, Josh Ladd reported the
>> problem on master as well yesterday. I’ll be looking into it on Wed.
>>
>>> On Apr 12, 2016, at 5:53 PM, George Bosilca <[email protected]> wrote:
>>>
>>>
>>>
>>> On Wed, Apr 13, 2016 at 1:59 AM, Gilles Gouaillardet <[email protected]> wrote:
>>> George,
>>>
>>> about the process binding part
>>>
>>> On 4/13/2016 7:32 AM, George Bosilca wrote:
>>> Also my processes, despite the fact that I asked for 1 per node, are not
>>> bound to the first core. Shouldn’t we release the process binding when we
>>> know there is a single process per node (as in the above case)?
>>> did you expect the tasks to be bound to the first *core* on each node?
>>>
>>> I would expect the tasks to be bound to the first *socket* on each node.
>>>
>>> In this particular instance, where it has been explicitly requested to have
>>> a single process per node, I would have expected the process to be unbound
>>> (we know there is only one per node). It is the responsibility of the
>>> application to bind itself or its threads if necessary. Why are we
>>> enforcing a particular binding policy?
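
(Side note: if memory serves, the binding can be released explicitly with
-bind-to none, and -report-bindings shows what mask, if any, was applied, e.g.

mpirun -np 12 -hostfile /opt/etc/ib10g.machinefile.ompi -map-by node \
    -bind-to none -report-bindings hostname

George's question is about the default policy, though.)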
>>>
>>> Since we do not know how many (OpenMP or other) threads will be used by
>>> the application, --bind-to socket is a good policy imho. In this case
>>> (one task per node), no binding at all would mean
>>> the task can migrate from one socket to the other, and/or OpenMP threads
>>> are bound across sockets.
>>> That would trigger some NUMA effects (better bandwidth if memory is locally
>>> accessed, but worse performance if memory is allocated only on one socket).
>>> So imho, --bind-to socket is still my preferred policy, even if there is
>>> only one MPI task per node.
>>>
>>> Open MPI is about MPI ranks/processes. I don't think it is our job to try
>>> to figure out how the user handles its own threads.
>>>
>>> Your justification makes sense if the application only uses a single socket.
>>> It also makes sense if one starts multiple ranks per node, and the internal
>>> threads of each MPI process inherit the MPI process binding. However, in
>>> the case where there is a single process per node, because there is a
>>> mismatch between the number of resources available (hardware threads) and
>>> the binding of the parent process, all the threads of the MPI application
>>> are [by default] bound to a single socket.
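
To make the inheritance effect concrete, a small untested illustration
(Linux-only because of sched_getcpu(); compile with mpicc -fopenmp): every
OpenMP thread starts from the mask inherited from its MPI process, so under
--bind-to socket all of them report CPUs from the same socket.

/* Hypothetical illustration, not anyone's actual code. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* every thread reports the CPU it is currently running on */
    #pragma omp parallel
    printf("rank %d thread %d on cpu %d\n",
           rank, omp_get_thread_num(), sched_getcpu());

    MPI_Finalize();
    return 0;
}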
>>>
>>> George.
>>>
>>> PS: That being said, I think I'll need to implement the binding code anyway
>>> in order to deal with the wide variety of behaviors in the different MPI
>>> implementations.
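
To illustrate what such application-level binding could look like, a rough
sketch (hypothetical helper, Linux-only): clear the inherited mask and pin the
calling thread to one logical CPU chosen by the application.

#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to a single logical CPU.
 * Returns 0 on success, an errno value otherwise. */
static int bind_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int rc = bind_self_to_cpu(0);          /* e.g. the first logical CPU */
    if (rc != 0)
        fprintf(stderr, "binding failed: %s\n", strerror(rc));
    printf("now running on cpu %d\n", sched_getcpu());
    return 0;
}

In a real application the CPU would of course be chosen per rank/thread rather
than hard-coded.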
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Gilles