I believe this has been fixed. Note that the allocation display occurs prior to 
mapping, and thus slots_inuse will be zero at that point. You'll see those 
numbers change if you do a comm_spawn, but otherwise they will always be zero.
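
For reference, a minimal comm_spawn sketch in C (my own illustration; the 
"./worker" child executable is hypothetical and would itself need to call 
MPI_Init) that exercises that path:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        int errcodes[2];

        MPI_Init(&argc, &argv);

        /* Spawning children triggers another mapping pass, which is where
         * the slots_inuse numbers would change (per the explanation above). */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, errcodes);

        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }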


> On May 5, 2016, at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Okay, I see it - will fix on Fri. This is unique to master.
> 
>> On May 5, 2016, at 1:54 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:
>> 
>> Ralph, 
>> 
>> I still observe these issues in the current master (--npernode is not 
>> respected either).
>> 
>> Also note that the -display-allocation output seems to be wrong 
>> (slots_inuse=0 when the slot is obviously in use).
>> 
>> $ git show
>> 4899c89 (HEAD -> master, origin/master, origin/HEAD) Fix a race condition 
>> when multiple threads try to create a bml en....  Bouteiller  6 hours ago
>> 
>> $ bin/mpirun -np 12 -hostfile /opt/etc/ib10g.machinefile.ompi -display-allocation -map-by node hostname
>> 
>> ======================   ALLOCATED NODES   ======================
>>      dancer00: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer01: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer02: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer03: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer04: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer05: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer06: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer07: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer08: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer09: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer10: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer11: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer12: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer13: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer14: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>>      dancer15: flags=0x13 slots=8 max_slots=0 slots_inuse=0 state=UNKNOWN
>> =================================================================
>> dancer01
>> dancer00
>> dancer01
>> dancer01
>> dancer01
>> dancer00
>> dancer00
>> dancer00
>> dancer00
>> dancer00
>> dancer00
>> dancer00
>> 
>> 
>> --
>> Aurélien Bouteiller, Ph.D. ~~ https://icl.cs.utk.edu/~bouteill/
>>> On Apr 13, 2016, at 1:38 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>> The --map-by node option should now be fixed on master, and PRs are waiting 
>>> for 1.10 and 2.0.
>>> 
>>> Thx!
>>> 
>>>> On Apr 12, 2016, at 6:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> 
>>>> FWIW: speaking just to the --map-by node issue, Josh Ladd reported the 
>>>> problem on master as well yesterday. I'll be looking into it on Wed.
>>>> 
>>>>> On Apr 12, 2016, at 5:53 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Apr 13, 2016 at 1:59 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>> George,
>>>>> 
>>>>> about the process binding part
>>>>> 
>>>>> On 4/13/2016 7:32 AM, George Bosilca wrote:
>>>>> Also my processes, despite the fact that I asked for 1 per node, are not 
>>>>> bound to the first core. Shouldn't we release the process binding when we 
>>>>> know there is a single process per node (as in the above case)?
>>>>> Did you expect the tasks to be bound to the first *core* on each node?
>>>>> 
>>>>> I would expect the tasks to be bound to the first *socket* on each node.
>>>>> 
>>>>> In this particular instance, where it has been explicitly requested to 
>>>>> have a single process per node, I would have expected the process to be 
>>>>> unbound (we know there is only one per node). It is the responsibility of 
>>>>> the application to bind itself or its threads if necessary. Why are we 
>>>>> enforcing a particular binding policy?
>>>>> 
>>>>> Since we do not know how many (OpenMP or other) threads will be used by 
>>>>> the application, --bind-to socket is a good policy imho. In this case (one 
>>>>> task per node), no binding at all would mean the task can migrate from one 
>>>>> socket to the other, and/or the OpenMP threads are bound across sockets. 
>>>>> That would trigger some NUMA effects (better bandwidth if memory is 
>>>>> accessed locally, but worse performance if memory is allocated on only one 
>>>>> socket). So imho, --bind-to socket is still my preferred policy, even if 
>>>>> there is only one MPI task per node.
>>>>> 
>>>>> Open MPI is about MPI ranks/processes. I don't think it is our job to try 
>>>>> to figure out what the user does with their own threads.
>>>>> 
>>>>> Your justification makes sense if the application only uses a single 
>>>>> socket. It also makes sense if one starts multiple ranks per node, and the 
>>>>> internal threads of each MPI process inherit the MPI process binding. 
>>>>> However, in the case where there is a single process per node, because 
>>>>> there is a mismatch between the number of resources available (hardware 
>>>>> threads) and the binding of the parent process, all the threads of the 
>>>>> MPI application are [by default] bound to a single socket.
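
As a side note, the default can also be overridden on the command line; assuming 
a current mpirun, something like the following (./a.out standing in for a 
threaded application) would leave the single process per node unbound:

    $ mpirun -np 12 -hostfile /opt/etc/ib10g.machinefile.ompi -map-by node -bind-to none ./a.out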
>>>>> 
>>>>>  George.
>>>>> 
>>>>> PS: That being said, I think I'll need to implement the binding code 
>>>>> anyway in order to deal with the wide variety of behaviors in the 
>>>>> different MPI implementations.
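
One possible shape for such application-side binding on Linux (just a sketch of 
one approach; hwloc would be the portable route, and bind_to_core is a name 
made up for illustration):

    #define _GNU_SOURCE
    #include <sched.h>

    /* Pin the calling process/thread to a single core; returns 0 on success. */
    static int bind_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return sched_setaffinity(0, sizeof(set), &set);  /* 0 == calling thread */
    }

Calling bind_to_core(0) right after MPI_Init would, for example, pin the single 
process on each node to its first core regardless of what the launcher decided.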
>>>>> 
>>>>>  
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>> 
> 
