Hi Nathan,

Not entirely sure it is related, but I'm also seeing a hang at the end of 
Put_all_local with -mca pml ob1 -mca btl tcp,sm,self.
The test itself seems to finish, but the run doesn't proceed to the next one. 
When run alone, Put_all_local finishes fine.
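
For reference, the kind of passive-target pattern I'd expect a put-all 
benchmark like this to exercise is sketched below (purely illustrative C on 
my part, not the actual IMB code; the buffer names, sizes, and loop are made 
up):

  /* Illustrative sketch only, not the IMB source: put to every peer
   * inside a passive-target epoch, complete locally, then free the
   * window.  Buffer names and sizes are made up. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      char sendbuf[4096] = {0}, winbuf[4096];
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      MPI_Win_create(winbuf, sizeof(winbuf), 1, MPI_INFO_NULL,
                     MPI_COMM_WORLD, &win);

      MPI_Win_lock_all(0, win);
      for (int peer = 0; peer < size; peer++) {
          if (peer == rank)
              continue;
          MPI_Put(sendbuf, sizeof(sendbuf), MPI_CHAR, peer,
                  0, sizeof(sendbuf), MPI_CHAR, win);
      }
      MPI_Win_flush_local_all(win);   /* local completion only */
      MPI_Win_unlock_all(win);

      MPI_Win_free(&win);   /* collective: hangs if any rank is stuck */
      MPI_Finalize();
      return 0;
  }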

Also, I verified with master and I see no hang.

Yohann

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
Sent: Tuesday, May 12, 2015 11:59 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Hang in IMB-RMA?


Thanks! I will look at osc/rdma in 1.8 and see about patching the bug. The RMA 
code in master and 1.8 has diverged significantly, but it shouldn't be too 
difficult to fix.

-Nathan

On Tue, May 12, 2015 at 06:50:41PM +0000, Friedley, Andrew wrote:
> Hi Nathan,
> 
> I should have thought to do that.  Yes, the issue seems to be fixed on master 
> -- no hangs on PSM, openib, or tcp.
> 
> Andrew
> 
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> > Sent: Tuesday, May 12, 2015 9:44 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] Hang in IMB-RMA?
> > 
> > 
> > Thanks for the report. Can you try with master and see if the issue 
> > is fixed there?
> > 
> > -Nathan
> > 
> > On Tue, May 12, 2015 at 04:38:01PM +0000, Friedley, Andrew wrote:
> > > Hi,
> > >
> > > I've run into a problem with the IMB-RMA exchange_get test.  At this
> > > point I suspect it's an issue in Open MPI or the test itself.  Could
> > > someone take a look?
> > >
> > > I'm running Open MPI 1.8.5 and IMB 4.0.2.  MVAPICH2 is able to run
> > > all of IMB-RMA successfully.
> > >
> > >  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA
> > >
> > > It eventually hangs at the end of exchange_get (after the 4MB size is
> > > reported) while running the np=2 pass.  IMB runs every np power of 2
> > > up to and including the np given on the command line.  So, with
> > > mpirun -np 4 above, IMB runs each of its tests with np=2 and then
> > > with np=4.
> > >
> > > If I run just the exchange_get test, the same thing happens:
> > >
> > >  mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA exchange_get
> > >
> > > If I run either of the above commands with -np 2, IMB-RMA
> > > successfully runs to completion.
> > >
> > > I have reproduced with tcp, verbs, and PSM -- it does not appear to
> > > be transport specific.  MVAPICH2 2.0 works.
> > >
> > > Below are backtraces from two of the four ranks.  The other two
> > > ranks each have a backtrace similar to these two.
> > >
> > > Thanks!
> > >
> > > Andrew
> > >
> > > #0  0x00007fca39a4c0c7 in sched_yield () from /lib64/libc.so.6
> > > #1  0x00007fca393ef2fb in opal_progress () at runtime/opal_progress.c:197
> > > #2  0x00007fca33cd21f5 in opal_condition_wait (m=0x247fc70, c=0x247fcd8)
> > >     at ../../../../opal/threads/condition.h:78
> > > #3  ompi_osc_rdma_flush_lock (module=module@entry=0x247fb50, lock=0x2481a20,
> > >     target=target@entry=3) at osc_rdma_passive_target.c:530
> > > #4  0x00007fca33cd43bd in ompi_osc_rdma_flush (target=3, win=0x2482150)
> > >     at osc_rdma_passive_target.c:578
> > > #5  0x00007fca39fe5654 in PMPI_Win_flush (rank=3, win=0x2482150)
> > >     at pwin_flush.c:58
> > > #6  0x000000000040aec5 in IMB_rma_exchange_get ()
> > > #7  0x0000000000406a35 in IMB_warm_up ()
> > > #8  0x00000000004023bd in main ()
> > >
> > > #0  0x00007f1c81890bdd in poll () from /lib64/libc.so.6
> > > #1  0x00007f1c81271c86 in poll_dispatch (base=0x1be8350, tv=0x7fff4c323480)
> > >     at poll.c:165
> > > #2  0x00007f1c81269aa4 in opal_libevent2021_event_base_loop (base=0x1be8350,
> > >     flags=2) at event.c:1633
> > > #3  0x00007f1c812232e8 in opal_progress () at runtime/opal_progress.c:169
> > > #4  0x00007f1c7b9641f5 in opal_condition_wait (m=0x1ccf4a0, c=0x1ccf508)
> > >     at ../../../../opal/threads/condition.h:78
> > > #5  ompi_osc_rdma_flush_lock (module=module@entry=0x1ccf380, lock=0x23287f0,
> > >     target=target@entry=0) at osc_rdma_passive_target.c:530
> > > #6  0x00007f1c7b9663bd in ompi_osc_rdma_flush (target=0, win=0x2317d00)
> > >     at osc_rdma_passive_target.c:578
> > > #7  0x00007f1c81e19654 in PMPI_Win_flush (rank=0, win=0x2317d00)
> > >     at pwin_flush.c:58
> > > #8  0x000000000040aec5 in IMB_rma_exchange_get ()
> > > #9  0x0000000000406a35 in IMB_warm_up ()
> > > #10 0x00000000004023bd in main ()
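> > >
> > > Both stuck ranks are inside PMPI_Win_flush, so at the MPI level the
> > > pattern that hangs looks roughly like the sketch below (illustrative
> > > only, not the IMB source; the buffer names, sizes, and neighbour
> > > choice are mine):
> > >
> > >  /* Illustrative sketch only, not the IMB source: get from two
> > >   * neighbours inside a passive-target epoch and flush each target,
> > >   * which is where the backtraces show the ranks waiting. */
> > >  #include <mpi.h>
> > >
> > >  int main(int argc, char **argv)
> > >  {
> > >      int rank, size;
> > >      char getbuf[4096], winbuf[4096] = {0};
> > >      MPI_Win win;
> > >
> > >      MPI_Init(&argc, &argv);
> > >      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >      MPI_Comm_size(MPI_COMM_WORLD, &size);
> > >
> > >      MPI_Win_create(winbuf, sizeof(winbuf), 1, MPI_INFO_NULL,
> > >                     MPI_COMM_WORLD, &win);
> > >
> > >      int left  = (rank - 1 + size) % size;
> > >      int right = (rank + 1) % size;
> > >
> > >      MPI_Win_lock_all(0, win);
> > >
> > >      MPI_Get(getbuf, sizeof(getbuf), MPI_CHAR, left,
> > >              0, sizeof(getbuf), MPI_CHAR, win);
> > >      MPI_Win_flush(left, win);    /* backtraces hang in this call */
> > >
> > >      MPI_Get(getbuf, sizeof(getbuf), MPI_CHAR, right,
> > >              0, sizeof(getbuf), MPI_CHAR, win);
> > >      MPI_Win_flush(right, win);
> > >
> > >      MPI_Win_unlock_all(win);
> > >      MPI_Win_free(&win);
> > >      MPI_Finalize();
> > >      return 0;
> > >  }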
