Hi Nathan,

I'm not entirely sure it's related, but I'm also seeing a hang at the end of Put_all_local with -mca pml ob1 -mca btl tcp,sm,self. It seems to have finished the test but doesn't proceed to the next one. When run alone, Put_all_local finishes fine.
Also, I verified with master and I see no hang.

Yohann

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Tuesday, May 12, 2015 11:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Hang in IMB-RMA?

Thanks! I will look at osc/rdma in 1.8 and see about patching the bug. The RMA code in master and 1.8 has diverged significantly, but it shouldn't be too difficult to fix.

-Nathan

On Tue, May 12, 2015 at 06:50:41PM +0000, Friedley, Andrew wrote:
> Hi Nathan,
>
> I should have thought to do that. Yes, the issue seems to be fixed on master -- no hangs on PSM, openib, or tcp.
>
> Andrew
>
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> > Sent: Tuesday, May 12, 2015 9:44 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] Hang in IMB-RMA?
> >
> > Thanks for the report. Can you try with master and see if the issue is fixed there?
> >
> > -Nathan
> >
> > On Tue, May 12, 2015 at 04:38:01PM +0000, Friedley, Andrew wrote:
> > > Hi,
> > >
> > > I've run into a problem with the IMB-RMA exchange_get test. At this point I suspect it's an issue in Open MPI or the test itself. Could someone take a look?
> > >
> > > I'm running Open MPI 1.8.5 and IMB 4.0.2. MVAPICH2 is able to run all of IMB-RMA successfully.
> > >
> > > mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA
> > >
> > > Eventually hangs at the end of exchange_get (after 4 MB is reported) running the np=2 pass. IMB runs every np power of 2 up to and including the np given on the command line. So, with mpirun -np 4 above, IMB runs each of its tests with np=2 and then with np=4.
> > > If I run just the exchange_get test, the same thing happens:
> > >
> > > mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA exchange_get
> > >
> > > If I run either of the above commands with -np 2, IMB-RMA successfully runs to completion.
> > >
> > > I have reproduced with tcp, verbs, and PSM -- it does not appear to be transport-specific. MVAPICH2 2.0 works.
> > >
> > > Below are backtraces from two of the four ranks. The other two ranks each have a backtrace similar to these two.
> > >
> > > Thanks!
> > >
> > > Andrew
> > >
> > > #0  0x00007fca39a4c0c7 in sched_yield () from /lib64/libc.so.6
> > > #1  0x00007fca393ef2fb in opal_progress () at runtime/opal_progress.c:197
> > > #2  0x00007fca33cd21f5 in opal_condition_wait (m=0x247fc70, c=0x247fcd8)
> > >     at ../../../../opal/threads/condition.h:78
> > > #3  ompi_osc_rdma_flush_lock (module=module@entry=0x247fb50, lock=0x2481a20,
> > >     target=target@entry=3) at osc_rdma_passive_target.c:530
> > > #4  0x00007fca33cd43bd in ompi_osc_rdma_flush (target=3, win=0x2482150)
> > >     at osc_rdma_passive_target.c:578
> > > #5  0x00007fca39fe5654 in PMPI_Win_flush (rank=3, win=0x2482150)
> > >     at pwin_flush.c:58
> > > #6  0x000000000040aec5 in IMB_rma_exchange_get ()
> > > #7  0x0000000000406a35 in IMB_warm_up ()
> > > #8  0x00000000004023bd in main ()
> > >
> > > #0  0x00007f1c81890bdd in poll () from /lib64/libc.so.6
> > > #1  0x00007f1c81271c86 in poll_dispatch (base=0x1be8350, tv=0x7fff4c323480)
> > >     at poll.c:165
> > > #2  0x00007f1c81269aa4 in opal_libevent2021_event_base_loop (base=0x1be8350,
> > >     flags=2) at event.c:1633
> > > #3  0x00007f1c812232e8 in opal_progress () at runtime/opal_progress.c:169
> > > #4  0x00007f1c7b9641f5 in opal_condition_wait (m=0x1ccf4a0, c=0x1ccf508)
> > >     at ../../../../opal/threads/condition.h:78
> > > #5  ompi_osc_rdma_flush_lock (module=module@entry=0x1ccf380, lock=0x23287f0,
> > >     target=target@entry=0) at osc_rdma_passive_target.c:530
> > > #6  0x00007f1c7b9663bd in ompi_osc_rdma_flush (target=0, win=0x2317d00)
> > >     at osc_rdma_passive_target.c:578
> > > #7  0x00007f1c81e19654 in PMPI_Win_flush (rank=0, win=0x2317d00)
> > >     at pwin_flush.c:58
> > > #8  0x000000000040aec5 in IMB_rma_exchange_get ()
> > > #9  0x0000000000406a35 in IMB_warm_up ()
> > > #10 0x00000000004023bd in main ()
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > > Link to this post: http://www.open-mpi.org/community/lists/devel/2015/05/17396.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2015/05/17398.php