Is your compute node included in your machine file? If so, what happens if you invoke mpirun from a compute node that is not listed in your machine file? It would also help if you posted your machinefile.
Cheers,

Gilles

On Thursday, April 13, 2017, Cyril Bordage <cyril.bord...@inria.fr> wrote:

> When I run this command from the compute node I also get that. But not
> when I run it from a login node (with the same machine file).
>
>
> Cyril.
>
> On 13/04/2017 at 16:22, r...@open-mpi.org wrote:
> > We are asking all these questions because we cannot replicate your
> > problem - so we are trying to help you figure out what is different or
> > missing from your machine. When I run your cmd line on my system, I get:
> >
> > [rhc002.cluster:55965] MCW rank 24 bound to socket 0[core 0[hwt 0-1]]:
> > [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 25 bound to socket 1[core 12[hwt 0-1]]:
> > [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 26 bound to socket 0[core 1[hwt 0-1]]:
> > [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 27 bound to socket 1[core 13[hwt 0-1]]:
> > [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 28 bound to socket 0[core 2[hwt 0-1]]:
> > [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 29 bound to socket 1[core 14[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 30 bound to socket 0[core 3[hwt 0-1]]:
> > [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 31 bound to socket 1[core 15[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 32 bound to socket 0[core 4[hwt 0-1]]:
> > [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 33 bound to socket 1[core 16[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 34 bound to socket 0[core 5[hwt 0-1]]:
> > [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 35 bound to socket 1[core 17[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
> > [rhc002.cluster:55965] MCW rank 36 bound to socket 0[core 6[hwt 0-1]]:
> > [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 37 bound to socket 1[core 18[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
> > [rhc002.cluster:55965] MCW rank 38 bound to socket 0[core 7[hwt 0-1]]:
> > [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 39 bound to socket 1[core 19[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
> > [rhc002.cluster:55965] MCW rank 40 bound to socket 0[core 8[hwt 0-1]]:
> > [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 41 bound to socket 1[core 20[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
> > [rhc002.cluster:55965] MCW rank 42 bound to socket 0[core 9[hwt 0-1]]:
> > [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 43 bound to socket 1[core 21[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
> > [rhc002.cluster:55965] MCW rank 44 bound to socket 0[core 10[hwt 0-1]]:
> > [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 45 bound to socket 1[core 22[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
> > [rhc002.cluster:55965] MCW rank 46 bound to socket 0[core 11[hwt 0-1]]:
> > [../../../../../../../../../../../BB][../../../../../../../../../../../..]
> > [rhc002.cluster:55965] MCW rank 47 bound to socket 1[core 23[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../../../BB]
> > [rhc001:197743] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> > [BB/../../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 1 bound to socket 1[core 12[hwt 0-1]]:
> > [../../../../../../../../../../../..][BB/../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 2 bound to socket 0[core 1[hwt 0-1]]:
> > [../BB/../../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 3 bound to socket 1[core 13[hwt 0-1]]:
> > [../../../../../../../../../../../..][../BB/../../../../../../../../../..]
> > [rhc001:197743] MCW rank 4 bound to socket 0[core 2[hwt 0-1]]:
> > [../../BB/../../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 5 bound to socket 1[core 14[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../BB/../../../../../../../../..]
> > [rhc001:197743] MCW rank 6 bound to socket 0[core 3[hwt 0-1]]:
> > [../../../BB/../../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 7 bound to socket 1[core 15[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../BB/../../../../../../../..]
> > [rhc001:197743] MCW rank 8 bound to socket 0[core 4[hwt 0-1]]:
> > [../../../../BB/../../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 9 bound to socket 1[core 16[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../BB/../../../../../../..]
> > [rhc001:197743] MCW rank 10 bound to socket 0[core 5[hwt 0-1]]:
> > [../../../../../BB/../../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 11 bound to socket 1[core 17[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../BB/../../../../../..]
> > [rhc001:197743] MCW rank 12 bound to socket 0[core 6[hwt 0-1]]:
> > [../../../../../../BB/../../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 13 bound to socket 1[core 18[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../BB/../../../../..]
> > [rhc001:197743] MCW rank 14 bound to socket 0[core 7[hwt 0-1]]:
> > [../../../../../../../BB/../../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 15 bound to socket 1[core 19[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../BB/../../../..]
> > [rhc001:197743] MCW rank 16 bound to socket 0[core 8[hwt 0-1]]:
> > [../../../../../../../../BB/../../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 17 bound to socket 1[core 20[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../BB/../../..]
> > [rhc001:197743] MCW rank 18 bound to socket 0[core 9[hwt 0-1]]:
> > [../../../../../../../../../BB/../..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 19 bound to socket 1[core 21[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../BB/../..]
> > [rhc001:197743] MCW rank 20 bound to socket 0[core 10[hwt 0-1]]:
> > [../../../../../../../../../../BB/..][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 21 bound to socket 1[core 22[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../../BB/..]
> > [rhc001:197743] MCW rank 22 bound to socket 0[core 11[hwt 0-1]]:
> > [../../../../../../../../../../../BB][../../../../../../../../../../../..]
> > [rhc001:197743] MCW rank 23 bound to socket 1[core 23[hwt 0-1]]:
> > [../../../../../../../../../../../..][../../../../../../../../../../../BB]
> >
> > Exactly as expected.
> > You might check that you have libnuma and
> > libnuma-devel installed
> >
> >
> >> On Apr 13, 2017, at 6:50 AM, gil...@rist.or.jp wrote:
> >>
> >> OK thanks,
> >>
> >> we've had some issues in the past when Open MPI assumed that the (login)
> >> node running mpirun has the same topology as the other (compute) nodes.
> >> I just wanted to rule out this scenario.
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> ----- Original Message -----
> >>> I am using the 6886c12 commit.
> >>> I have no particular option for the configuration.
> >>> I launch my application in the same way as I presented in my first email;
> >>> here is the exact line: mpirun -np 48 -machinefile mf -bind-to core
> >>> -report-bindings ./a.out
> >>>
> >>> lstopo does give the same output on both types of nodes. What is the
> >>> purpose of that?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Cyril.
> >>>
> >>> On 13/04/2017 at 15:24, gil...@rist.or.jp wrote:
> >>>> Also, can you please run
> >>>> lstopo
> >>>> on both your login and compute nodes?
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Gilles
> >>>>
> >>>>
> >>>> ----- Original Message -----
> >>>>> Can you be a bit more specific?
> >>>>>
> >>>>> - What version of Open MPI are you using?
> >>>>> - How did you configure Open MPI?
> >>>>> - How are you launching Open MPI applications?
> >>>>>
> >>>>>
> >>>>>> On Apr 13, 2017, at 9:08 AM, Cyril Bordage <cyril.bord...@inria.fr> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> now this bug also happens when I launch my mpirun command from the
> >>>>>> compute node.
> >>>>>>
> >>>>>>
> >>>>>> Cyril.
> >>>>>>
> >>>>>> On 06/04/2017 at 05:38, r...@open-mpi.org wrote:
> >>>>>>> I believe this has been fixed now - please let me know
> >>>>>>>
> >>>>>>>> On Mar 30, 2017, at 1:57 AM, Cyril Bordage <cyril.bordage@inria.fr> wrote:
> >>>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I am using the git version of MPI with "-bind-to core -report-bindings"
> >>>>>>>> and I get that for all processes:
> >>>>>>>> [miriel010:160662] MCW rank 0 not bound
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> When I use an old version I get:
> >>>>>>>> [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> >>>>>>>> [B/././././././././././.][./././././././././././.]
> >>>>>>>>
> >>>>>>>> From git bisect the culprit seems to be: 48fc339
> >>>>>>>>
> >>>>>>>> This bug happens only when I launch my mpirun command from a login node
> >>>>>>>> and not from a compute node.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Cyril.
> >>>>>>>> _______________________________________________
> >>>>>>>> devel mailing list
> >>>>>>>> devel@lists.open-mpi.org
> >>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Jeff Squyres
> >>>>> jsquy...@cisco.com
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
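
For comparing a run launched from the login node with one launched from a compute node, the -report-bindings lines can be checked mechanically instead of by eye. Below is a minimal Python sketch that assumes only the two line formats quoted in this thread ("MCW rank N not bound" and "MCW rank N bound to ...:"); the names parse_bindings and unbound are made up for illustration and are not part of Open MPI.

```python
import re

# Matches the two line shapes quoted in this thread, e.g.
#   [miriel010:160662] MCW rank 0 not bound
#   [miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/./...]
# Other Open MPI versions may print other formats; this is an assumption.
LINE_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] MCW rank (?P<rank>\d+) "
    r"(?:not bound|bound to (?P<where>.*?\]\]):)"
)


def parse_bindings(text):
    """Return a list of (host, rank, location); location is None if not bound."""
    return [
        (m.group("host"), int(m.group("rank")), m.group("where"))
        for m in LINE_RE.finditer(text)
    ]


def unbound(text):
    """Return the (host, rank) pairs that were reported as not bound."""
    return [(host, rank) for host, rank, where in parse_bindings(text) if where is None]


if __name__ == "__main__":
    # Two lines copied from the first message of this thread.
    sample = (
        "[miriel010:160662] MCW rank 0 not bound\n"
        "[miriel010:44921] MCW rank 0 bound to socket 0[core 0[hwt 0]]: "
        "[B/././././././././././.][./././././././././././.]\n"
    )
    for host, rank in unbound(sample):
        print(f"rank {rank} on {host} is not bound")
```

To use it, capture the mpirun output of each run to a file (redirecting both stdout and stderr, to be safe) and pass the file contents to unbound(); an empty list means every reported rank was bound.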