George,

     I think if you get rid of the --start 1000000 argument you'll see the curves I'm getting on my local cluster and on Comet. I don't know exactly how the memory registration is handled, so it isn't clear to me why starting the tests at 1 MB would give less of a penalty for message sizes that are not multiples of 8 bytes.
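
In other words, the run I'd like to compare against is your same mpirun line with only the --start flag dropped, so NetPIPE sweeps up from small messages:

    mpirun --map-by node --mca pml ob1 --mca btl openib,self \
           --mca btl_openib_get_limit $((1024*1024)) \
           --mca btl_openib_put_limit $((1024*1024)) \
           ./NPmpi --nocache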

     The --nocache capability was added to NetPIPE to test memcpy performance without cache effects by cycling through a buffer far larger than the entire cache system (typically 200 MB). I'm using the same method to try to measure the worst-case performance for IB, where the memory is never already registered with the HCA. However, the 200 MB cyclical buffer may not be large enough to ensure that every message actually needs its memory registered. When I change the MEMSIZE parameter in netpipe.h down to 2 MB I get performance very close to the best-case scenario, as expected, since the entire 2 MB buffer ends up registered. When I increase the cyclical buffer to 20 GB I still see the same spikes for message sizes that are multiples of 8 bytes, but even greater penalties for non-multiples of 8, where the performance drops to nearly zero.

     I don't see anything in NetPIPE itself that could cause this, since messages whose sizes are not multiples of 8 still start on 8-byte-aligned addresses, and the sends/recvs all use the MPI_BYTE data type. Can you think of anything in OpenMPI that would treat a message differently when its size is not a multiple of 8? I'll poke around the OpenMPI code and try a few things on my own. Any insight you could give me into how much memory can be registered with the HCA, and how the registration is tracked (a list of pages, or pointers and message lengths, etc.) would be helpful.
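
     To make sure we're talking about the same thing, here is roughly what the --nocache test does. This is a simplified sketch I wrote for this email, not the actual NetPIPE source, and the MEMSIZE/MSG_SIZE/REPS constants are only illustrative: the send/recv buffer walks through a pool much larger than the cache so each transfer lands in cold memory, and the offset is always rounded up to 8 bytes so even odd-sized messages begin on an 8-byte-aligned address.

    /* Simplified sketch (not the actual NetPIPE source) of the --nocache idea:
     * cycle the message buffer through a pool much larger than the cache so
     * every transfer touches cold, previously unregistered memory. */
    #include <mpi.h>
    #include <stdlib.h>

    #define MEMSIZE  (200UL * 1024 * 1024)  /* cyclical pool, >> total cache size */
    #define MSG_SIZE 1000000                /* compare e.g. 1000000 vs 1000003    */
    #define REPS     100

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char  *pool   = malloc(MEMSIZE);
        size_t offset = 0;

        for (int i = 0; i < REPS; i++) {
            char *buf = pool + offset;

            if (rank == 0) {        /* ping-pong using MPI_BYTE, as NetPIPE does */
                MPI_Send(buf, MSG_SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }

            /* Advance by the message size rounded up to 8 bytes and wrap, so
             * the start of every message stays 8-byte aligned even when
             * MSG_SIZE is not a multiple of 8. */
            offset += (MSG_SIZE + 7) & ~7UL;
            if (offset + MSG_SIZE > MEMSIZE)
                offset = 0;
        }

        free(pool);
        MPI_Finalize();
        return 0;
    }

     With MEMSIZE at 2 MB the walk stays inside a region that ends up fully registered, which is why that case looks like the best-case curve; at 20 GB nearly every message should land in unregistered pages.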

Dave

On Wed, May 3, 2017 at 4:05 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> Dave,
>
> I specifically forced the OB1 PML with the OpenIB BTL. I have "--mca pml
> ob1 --mca btl openib,self" on my mpirun.
>
> Originally, I assumed that the pipeline protocol was not kicking in as
> expected, and that the large cost you are seeing was due to pinning the
> entire buffer for the communication. Thus, I tried to alter the MCA
> parameters driving the pipeline protocol, but failed to see any major
> benefit (compared with the stock version).
>
> Here is what I used:
> mpirun --map-by node --mca pml ob1 --mca btl openib,self \
>        --mca btl_openib_get_limit $((1024*1024)) \
>        --mca btl_openib_put_limit $((1024*1024)) \
>        ./NPmpi --nocache --start 1000000
>
> George.
>
> On Wed, May 3, 2017 at 4:27 PM, Dave Turner <drdavetur...@gmail.com> wrote:
>
>> George,
>>
>> Our local cluster runs Gentoo, which I think prevents us from
>> using MXM, and we do not use UCX. It's a pretty standard build
>> of 2.0.1 (ompi_info -a for Beocat is attached).
>>
>> I've also attached the ompi_info -a dump for Comet, which is
>> running 1.8.4. A grep shows nothing about MXM or UCX.
>>
>> Are you testing with MXM or UCX that would be giving you
>> the different results?
>>
>> Dave
>>
>> On Wed, May 3, 2017 at 1:00 PM, <devel-requ...@lists.open-mpi.org> wrote:
>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Tue, 2 May 2017 15:40:59 -0500
>>> From: Dave Turner <drdavetur...@gmail.com>
>>> Subject: [OMPI devel] NetPIPE performance curves
>>>
>>> I've used my NetPIPE communication benchmark (http://netpipe.cs.ksu.edu)
>>> to measure the performance of OpenMPI and other implementations on
>>> Comet at SDSC (FDR IB, graph attached, same results measured elsewhere
>>> too). The uni-directional performance is good at 50 Gbps, the
>>> bi-directional performance is nearly double that at 97 Gbps, and the
>>> aggregate bandwidth from measuring 24 bi-directional ping-pongs across
>>> the link between 2 nodes is a little lower than I'd like to see but
>>> still respectable, and similar for MVAPICH. All of these were measured
>>> by reusing the same source and destination buffers each time.
>>>
>>> When I measure using the --nocache flag, where the data comes
>>> from a new buffer in main memory each time, and is therefore also
>>> not already registered with the IB card, and likewise gets put into a
>>> new buffer in main memory, I see a loss in performance of at least
>>> 20%. Could someone please give me a short description of whether
>>> this is due to data being copied into a memory buffer that is already
>>> registered with the IB card, or whether this is the cost of registering
>>> the new memory with the IB card for its first use?
>>> I also see huge performance losses in this case when the message
>>> size is not a multiple of 8 bytes (multiples of 8 are the tops of the
>>> spikes). I've seen this in the past when there was a memory copy
>>> involved and the copy routine switched to a byte-by-byte copy for
>>> non-multiples of 8. While I don't know how many apps fall into the
>>> worst-case scenario that the --nocache measurements represent, I could
>>> certainly see large bioinformatics runs being affected, as the message
>>> lengths are not going to be multiples of 8 bytes.
>>>
>>> Dave Turner
>>>
>>> [Attachment: np.comet.openmpi.pdf]
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Tue, 2 May 2017 22:57:52 -0400
>>> From: George Bosilca <bosi...@icl.utk.edu>
>>> Subject: Re: [OMPI devel] NetPIPE performance curves
>>>
>>> David,
>>>
>>> Are you using the OB1 PML or one of our IB-enabled MTLs (UCX or MXM)? I
>>> have access to similar cards, and I can't replicate your results. I do
>>> see a performance loss, but nowhere near what you have seen (it is going
>>> down to 47 Gbps instead of 50 Gbps).
>>>
>>> George.
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Wed, 3 May 2017 09:17:45 +0200
>>> From: Justin Cinkelj <justin.cink...@xlab.si>
>>> Subject: [OMPI devel] remote spawn - have no children
>>>
>>> Important detail first: I get this message from significantly modified
>>> Open MPI code, so the problem exists solely due to my mistake.
>>>
>>> Orterun on 192.168.122.90 starts orted on the remote node 192.168.122.91,
>>> then orted figures out it has nothing to do.
>>> If I request to start the workers on the same 192.168.122.90 IP, the
>>> mpi_hello is started.
>>>
>>> Partial log:
>>> /usr/bin/mpirun -np 1 ... mpi_hello
>>> #
>>> [osv:00252] [[50738,0],0] plm:base:setup_job
>>> [osv:00252] [[50738,0],0] plm:base:setup_vm
>>> [osv:00252] [[50738,0],0] plm:base:setup_vm creating map
>>> [osv:00252] [[50738,0],0] setup:vm: working unmanaged allocation
>>> [osv:00252] [[50738,0],0] using dash_host
>>> [osv:00252] [[50738,0],0] checking node 192.168.122.91
>>> [osv:00252] [[50738,0],0] plm:base:setup_vm add new daemon [[50738,0],1]
>>> [osv:00252] [[50738,0],0] plm:base:setup_vm assigning new daemon [[50738,0],1] to node 192.168.122.91
>>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 0 num_procs 2
>>> [osv:00252] [[50738,0],0] routed:binomial 0 found child 1
>>> [osv:00252] [[50738,0],0] routed:binomial rank 0 parent 0 me 1 num_procs 2
>>> [osv:00252] [[50738,0],0] routed:binomial find children of rank 0
>>> [osv:00252] [[50738,0],0] routed:binomial find children checking peer 1
>>> [osv:00252] [[50738,0],0] routed:binomial find children computing tree
>>> [osv:00252] [[50738,0],0] routed:binomial rank 1 parent 0 me 1 num_procs 2
>>> [osv:00252] [[50738,0],0] routed:binomial find children returning found value 0
>>> [osv:00252] [[50738,0],0]: parent 0 num_children 1
>>> [osv:00252] [[50738,0],0]: child 1
>>> [osv:00252] [[50738,0],0] plm:osvrest: launching vm
>>> #
>>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn called
>>> [osv:00250] [[50738,0],1] routed:binomial rank 0 parent 0 me 1 num_procs 2
>>> [osv:00250] [[50738,0],1] routed:binomial find children of rank 0
>>> [osv:00250] [[50738,0],1] routed:binomial find children checking peer 1
>>> [osv:00250] [[50738,0],1] routed:binomial find children computing tree
>>> [osv:00250] [[50738,0],1] routed:binomial rank 1 parent 0 me 1 num_procs 2
>>> [osv:00250] [[50738,0],1] routed:binomial find children returning found value 0
>>> [osv:00250] [[50738,0],1]: parent 0 num_children 0
>>> [osv:00250] [[50738,0],1] plm:osvrest: remote spawn - have no children!
>>>
>>> In the plm mca module remote_spawn() function (my plm is based on
>>> orte/mca/plm/rsh/), the &coll.targets list has zero length. My question
>>> is, which module(s) are responsible for filling in coll.targets? Then I
>>> will turn on the correct mca xyz_base_verbose level and hopefully narrow
>>> down my problem. I have quite a problem guessing/finding out what the
>>> various xyz strings mean :)
>>>
>>> Thank you, Justin
>>>
>>> ------------------------------
>>>
>>> Message: 4
>>> Date: Wed, 3 May 2017 06:26:03 -0700
>>> From: "r...@open-mpi.org" <r...@open-mpi.org>
>>> Subject: Re: [OMPI devel] remote spawn - have no children
>>>
>>> The orte routed framework does that for you - there is an API for that
>>> purpose.
>>>
>>> ------------------------------
>>>
>>> Message: 5
>>> Date: Wed, 3 May 2017 06:29:16 -0700
>>> From: "r...@open-mpi.org" <r...@open-mpi.org>
>>> Subject: Re: [OMPI devel] remote spawn - have no children
>>>
>>> I should have looked more closely, as you already have the routed verbose
>>> output there. Everything in fact looks correct. The node with mpirun has
>>> 1 child, which is the daemon on the other node. The vpid=1 daemon on node
>>> 250 doesn't have any children as there aren't any more daemons in the
>>> system.
>>>
>>> Note that the output has nothing to do with spawning your mpi_hello - it
>>> is solely describing the startup of the daemons.
>>>
>>> ------------------------------
>>>
>>> Message: 6
>>> Date: Wed, 3 May 2017 17:15:50 +0200 (CEST)
>>> From: Justin Cinkelj <justin.cink...@xlab.si>
>>> Subject: Re: [OMPI devel] remote spawn - have no children
>>>
>>> So "remote spawn" and children refer to the orted daemons only, and I was
>>> looking into the wrong modules.
>>>
>>> Which module(s) are then responsible for sending the command to orted to
>>> start the MPI application? Which event names should I search for?
>>>
>>> Thank you,
>>> Justin
>>>
>>> ------------------------------
>>>
>>> Message: 7
>>> Date: Wed, 3 May 2017 08:54:26 -0700
>>> From: "r...@open-mpi.org" <r...@open-mpi.org>
>>> Subject: Re: [OMPI devel] remote spawn - have no children
>>>
>>> Everything operates via the state machine - events trigger moving the job
>>> from one state to the next, with each state being tied to a callback
>>> function that implements that state. If you set state_base_verbose=5,
>>> you'll see when and where each state gets executed.
>>>
>>> By default, the launch_app state goes to a function in the plm/base:
>>>
>>> https://github.com/open-mpi/ompi/blob/master/orte/mca/plm/base/plm_base_launch_support.c#L477
>>>
>>> I suspect the problem is that your plm component isn't activating the
>>> next step upon completion of launch_daemons.

-- 
Work: davetur...@ksu.edu (785) 532-7791
2219 Engineering Hall, Manhattan KS 66506
Home: drdavetur...@gmail.com
cell: (785) 770-5929

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel