Try adding "-mca rmaps_base_verbose 5" to your cmd line and see what that output tells us. I assume you have a debug build configured, yes (i.e., that you added --enable-debug to your configure line)?
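For example, something along these lines (simply reusing the exact command line from your earlier mail - the machinefile "mf" and the binary "./a.out" are just taken from there) should show what the mapper is doing:

    mpirun -np 48 -machinefile mf -mca rmaps_base_verbose 5 -bind-to core -report-bindings ./a.out

If the build was not configured with --enable-debug, a rough sketch of the reconfigure would be:

    ./configure --enable-debug [plus whatever options you normally use]
    make && make install

Without a debug build you may not see much of the rmaps verbose output.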
> On Apr 13, 2017, at 7:28 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>
> When I run this command from the compute node I also get that. But not
> when I run it from a login node (with the same machine file).
>
>
> Cyril.
>
> On 13/04/2017 16:22, r...@open-mpi.org wrote:
>> We are asking all these questions because we cannot replicate your problem -
>> so we are trying to help you figure out what is different or missing from
>> your machine. When I run your cmd line on my system, I get:
>>
>> [rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 25 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 26 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 27 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 28 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 29 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 30 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 31 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 32 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 33 bound to socket 1[core 16[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 34 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 35 bound to socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
>> [rhc002.cluster:55965] MCW rank 36 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 37 bound to socket 1[core 18[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
>> [rhc002.cluster:55965] MCW rank 38 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 39 bound to socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
>> [rhc002.cluster:55965] MCW rank 40 bound to socket 0[core 8[hwt 0-1]]: [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 41 bound to socket 1[core 20[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
>> [rhc002.cluster:55965] MCW rank 42 bound to socket 0[core 9[hwt 0-1]]: [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 43 bound to socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
>> [rhc002.cluster:55965] MCW rank 44 bound to socket 0[core 10[hwt 0-1]]: [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 45 bound to socket 1[core 22[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
>> [rhc002.cluster:55965] MCW rank 46 bound to socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../../BB][../../../../../../../../../../../..]
>> [rhc002.cluster:55965] MCW rank 47 bound to socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../../BB]
>> [rhc001:197743] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 1 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 3 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
>> [rhc001:197743] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 5 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
>> [rhc001:197743] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 7 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
>> [rhc001:197743] MCW rank 8 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 9 bound to socket 1[core 16[hwt 0-1]]: [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
>> [rhc001:197743] MCW rank 10 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 11 bound to socket 1[core 17[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
>> [rhc001:197743] MCW rank 12 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 13 bound to socket 1[core 18[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
>> [rhc001:197743] MCW rank 14 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 15 bound to socket 1[core 19[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
>> [rhc001:197743] MCW rank 16 bound to socket 0[core 8[hwt 0-1]]: [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 17 bound to socket 1[core 20[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
>> [rhc001:197743] MCW rank 18 bound to socket 0[core 9[hwt 0-1]]: [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 19 bound to socket 1[core 21[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
>> [rhc001:197743] MCW rank 20 bound to socket 0[core 10[hwt 0-1]]: [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 21 bound to socket 1[core 22[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
>> [rhc001:197743] MCW rank 22 bound to socket 0[core 11[hwt 0-1]]: [../../../../../../../../../../../BB][../../../../../../../../../../../..]
>> [rhc001:197743] MCW rank 23 bound to socket 1[core 23[hwt 0-1]]: [../../../../../../../../../../../..][../../../../../../../../../../../BB]
>>
>> Exactly as expected. You might check that you have libnuma and libnuma-devel
>> installed.
>>
>>
>>> On Apr 13, 2017, at 6:50 AM, gil...@rist.or.jp wrote:
>>>
>>> OK, thanks.
>>>
>>> We've had some issues in the past when Open MPI assumed that the (login)
>>> node running mpirun has the same topology as the other (compute) nodes.
>>> I just wanted to rule out this scenario.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> ----- Original Message -----
>>>> I am using the 6886c12 commit.
>>>> I have no particular option for the configuration.
>>>> I launch my application in the same way as I presented in my first
>>>> email; here is the exact line: mpirun -np 48 -machinefile mf -bind-to core
>>>> -report-bindings ./a.out
>>>>
>>>> lstopo does give the same output on both types of nodes. What is the
>>>> purpose of that?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> Cyril.
>>>>
>>>> On 13/04/2017 15:24, gil...@rist.or.jp wrote:
>>>>> Also, can you please run
>>>>> lstopo
>>>>> on both your login and compute nodes?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> Can you be a bit more specific?
>>>>>>
>>>>>> - What version of Open MPI are you using?
>>>>>> - How did you configure Open MPI?
>>>>>> - How are you launching Open MPI applications?
>>>>>>
>>>>>>
>>>>>>> On Apr 13, 2017, at 9:08 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> now this bug happens also when I launch my mpirun command from the
>>>>>>> compute node.
>>>>>>>
>>>>>>>
>>>>>>> Cyril.
>>>>>>>
>>>>>>> On 06/04/2017 05:38, r...@open-mpi.org wrote:
>>>>>>>> I believe this has been fixed now - please let me know.
>>>>>>>>
>>>>>>>>> On Mar 30, 2017, at 1:57 AM, Cyril Bordage <cyril.bordage@inria.fr> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am using the git version of Open MPI with "-bind-to core -report-bindings"
>>>>>>>>> and I get that for all processes:
>>>>>>>>> [miriel010:160662] MCW rank 0 not bound
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> When I use an old version I get:
>>>>>>>>> [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
>>>>>>>>> [B/././././././././././.][./././././././././././.]
>>>>>>>>>
>>>>>>>>> From git bisect the culprit seems to be: 48fc339
>>>>>>>>>
>>>>>>>>> This bug happens only when I launch my mpirun command from a login node
>>>>>>>>> and not from a compute node.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cyril.
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com