Try adding "-mca rmaps_base_verbose 5" to your command line and see what the 
output tells us. I assume you have a debug build configured (i.e., you added 
--enable-debug to the configure line)?
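
To compare the runs launched from the login node and from the compute node, a minimal Python sketch like the one below could tally which ranks the -report-bindings output marks as bound or not bound. The function name summarize_bindings and the regular expression are assumptions made for illustration, based only on the line formats quoted in this thread; they are not part of Open MPI.

```python
import re
from collections import defaultdict

# Matches lines such as (an optional "> " quote prefix is tolerated):
#   [rhc001:197743] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
#   [miriel010:160662] MCW rank 0 not bound
# The pattern is an assumption derived from the output quoted in this thread.
BINDING_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+MCW rank (?P<rank>\d+)\s+"
    r"(?P<state>not bound|bound to)"
)


def summarize_bindings(text):
    """Return {host: {"bound": [ranks], "unbound": [ranks]}} for the given output."""
    summary = defaultdict(lambda: {"bound": [], "unbound": []})
    for match in BINDING_RE.finditer(text):
        key = "unbound" if match["state"] == "not bound" else "bound"
        summary[match["host"]][key].append(int(match["rank"]))
    return dict(summary)


if __name__ == "__main__":
    # Sample lines copied from the messages in this thread.
    sample = """\
[miriel010:160662] MCW rank 0 not bound
[rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]:
[BB/../../../../../../../../../../..][../../../../../../../../../../../..]
"""
    for host, ranks in summarize_bindings(sample).items():
        print(host, "bound:", ranks["bound"], "unbound:", ranks["unbound"])
```

Running it on the captured output of each launch would show at a glance whether any host reports unbound ranks.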


> On Apr 13, 2017, at 7:28 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
> 
> I also get that when I run this command from the compute node, but not
> when I run it from a login node (with the same machine file).
> 
> 
> Cyril.
> 
> On 13/04/2017 at 16:22, r...@open-mpi.org wrote:
>> We are asking all these questions because we cannot replicate your problem - 
>> so we are trying to help you figure out what is different or missing from 
>> your machine. When I run your cmd line on my system, I get:
>> 
>> [rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]: 
>> [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 25 bound to socket 1[core 12[hwt 0-1]]: 
>> [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 26 bound to socket 0[core 1[hwt 0-1]]: 
>> [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 27 bound to socket 1[core 13[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 28 bound to socket 0[core 2[hwt 0-1]]: 
>> [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 29 bound to socket 1[core 14[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 30 bound to socket 0[core 3[hwt 0-1]]: 
>> [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 31 bound to socket 1[core 15[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 32 bound to socket 0[core 4[hwt 0-1]]: 
>> [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 33 bound to socket 1[core 16[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 34 bound to socket 0[core 5[hwt 0-1]]: 
>> [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 35 bound to socket 1[core 17[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
>> [rhc002.cluster:55965] MCW rank 36 bound to socket 0[core 6[hwt 0-1]]: 
>> [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 37 bound to socket 1[core 18[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
>> [rhc002.cluster:55965] MCW rank 38 bound to socket 0[core 7[hwt 0-1]]: 
>> [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 39 bound to socket 1[core 19[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
>> [rhc002.cluster:55965] MCW rank 40 bound to socket 0[core 8[hwt 0-1]]: 
>> [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 41 bound to socket 1[core 20[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
>> [rhc002.cluster:55965] MCW rank 42 bound to socket 0[core 9[hwt 0-1]]: 
>> [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 43 bound to socket 1[core 21[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
>> [rhc002.cluster:55965] MCW rank 44 bound to socket 0[core 10[hwt 0-1]]: 
>> [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 45 bound to socket 1[core 22[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
>> [rhc002.cluster:55965] MCW rank 46 bound to socket 0[core 11[hwt 0-1]]: 
>> [../../../../../../../../../../../BB][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 47 bound to socket 1[core 23[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../../../BB]
>> [rhc001:197743] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: 
>> [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 1 bound to socket 1[core 12[hwt 0-1]]: 
>> [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: 
>> [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 3 bound to socket 1[core 13[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
>> [rhc001:197743] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]: 
>> [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 5 bound to socket 1[core 14[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
>> [rhc001:197743] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]: 
>> [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 7 bound to socket 1[core 15[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
>> [rhc001:197743] MCW rank 8 bound to socket 0[core 4[hwt 0-1]]: 
>> [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 9 bound to socket 1[core 16[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
>> [rhc001:197743] MCW rank 10 bound to socket 0[core 5[hwt 0-1]]: 
>> [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 11 bound to socket 1[core 17[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
>> [rhc001:197743] MCW rank 12 bound to socket 0[core 6[hwt 0-1]]: 
>> [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 13 bound to socket 1[core 18[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
>> [rhc001:197743] MCW rank 14 bound to socket 0[core 7[hwt 0-1]]: 
>> [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 15 bound to socket 1[core 19[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
>> [rhc001:197743] MCW rank 16 bound to socket 0[core 8[hwt 0-1]]: 
>> [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 17 bound to socket 1[core 20[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
>> [rhc001:197743] MCW rank 18 bound to socket 0[core 9[hwt 0-1]]: 
>> [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 19 bound to socket 1[core 21[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
>> [rhc001:197743] MCW rank 20 bound to socket 0[core 10[hwt 0-1]]: 
>> [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 21 bound to socket 1[core 22[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
>> [rhc001:197743] MCW rank 22 bound to socket 0[core 11[hwt 0-1]]: 
>> [../../../../../../../../../../../BB][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 23 bound to socket 1[core 23[hwt 0-1]]: 
>> [../../../../../../../../../../../..][../../../../../../../../../../../BB]
>> 
>> Exactly as expected. You might check that you have libnuma and libnuma-devel 
>> installed.
>> 
>> 
>>> On Apr 13, 2017, at 6:50 AM, gil...@rist.or.jp wrote:
>>> 
>>> OK, thanks.
>>> 
>>> We've had some issues in the past when Open MPI assumed that the (login) 
>>> node running mpirun has the same topology as the other (compute) nodes.
>>> I just wanted to rule out this scenario.
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> ----- Original Message -----
>>>> I am using the 6886c12 commit.
>>>> I used no particular options for the configuration.
>>>> I launch my application in the same way as I presented in my first email;
>>>> here is the exact line: mpirun -np 48 -machinefile mf -bind-to core
>>>> -report-bindings ./a.out
>>>> 
>>>> lstopo does give the same output on both types of nodes. What is the
>>>> purpose of that?
>>>> 
>>>> Thanks.
>>>> 
>>>> 
>>>> Cyril.
>>>> 
>>>> On 13/04/2017 at 15:24, gil...@rist.or.jp wrote:
>>>>> Also, can you please run
>>>>> lstopo
>>>>> on both your login and compute nodes?
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Gilles
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>>> Can you be a bit more specific?
>>>>>> 
>>>>>> - What version of Open MPI are you using?
>>>>>> - How did you configure Open MPI?
>>>>>> - How are you launching Open MPI applications?
>>>>>> 
>>>>>> 
>>>>>>> On Apr 13, 2017, at 9:08 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> This bug now also happens when I launch my mpirun command from the
>>>>>>> compute node.
>>>>>>> 
>>>>>>> 
>>>>>>> Cyril.
>>>>>>> 
>>>>>>> On 06/04/2017 at 05:38, r...@open-mpi.org wrote:
>>>>>>>> I believe this has been fixed now - please let me know
>>>>>>>> 
>>>>>>>>> On Mar 30, 2017, at 1:57 AM, Cyril Bordage <cyril.bordage@inria.fr> wrote:
>>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> I am using the git version of Open MPI with "-bind-to core -report-bindings"
>>>>>>>>> and I get this for all processes:
>>>>>>>>> [miriel010:160662] MCW rank 0 not bound
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> When I use an old version I get:
>>>>>>>>> [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>>>>>>>>> [B/././././././././././.][./././././././././././.]
>>>>>>>>> 
>>>>>>>>> From git bisect the culprit seems to be: 48fc339
>>>>>>>>> 
>>>>>>>>> This bug happens only when I launch my mpirun command from a login node
>>>>>>>>> and not from a compute node.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Cyril.
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel@lists.open-mpi.org
>>>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> 