fwiw, the onesided/c_fence_lock test from the ibm test suite hangs
(mpirun -np 2 ./c_fence_lock). I ran a git bisect, and it incriminates commit b90c83840f472de3219b87cd7e1a364eec5c5a29:

commit b90c83840f472de3219b87cd7e1a364eec5c5a29
Author: bosilca <bosi...@users.noreply.github.com>
Date:   Tue May 24 18:20:51 2016 -0500

    Refactor the request completion (#1422)

    * Remodel the request.
    Added the wait sync primitive and integrate it into the PML and MTL
    infrastructure. The multi-threaded requests are now significantly less
    heavy and less noisy (only the threads associated with completed
    requests are signaled).

    * Fix the condition to release the request.

I also noticed that a warning is emitted when running only one task
(./c_fence_lock), but I did not git bisect it, so that might not be related.

Cheers,

Gilles

On Thursday, June 2, 2016, Ralph Castain <r...@open-mpi.org> wrote:

> Yes, please! I’d like to know what mpirun thinks is happening - if you
> like, just set the --timeout N --report-state-on-timeout flags and tell me
> what comes out.
>
> On Jun 1, 2016, at 7:57 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> I don't think it matters. I was running the IBM collective and pt2pt
> tests, but each time it deadlocked, it was in a different test. If you are
> interested in some particular values, I would be happy to attach a debugger
> next time it happens.
>
> George.
>
>
> On Wed, Jun 1, 2016 at 10:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> What kind of apps are they? Or does it matter what you are running?
>>
>>
>> > On Jun 1, 2016, at 7:37 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> >
>> > I have a rarely occurring deadlock on an OS X laptop if I use more
>> > than 2 processes. It comes up about once every 200 runs.
>> >
>> > Here is what I could gather from my experiments: all the MPI processes
>> > seem to have completed correctly (I get all the expected output and the
>> > MPI processes are in a waiting state), but somehow mpirun does not
>> > detect their completion. As a result, mpirun never returns.
>> >
>> > George.
>> >
>> > _______________________________________________
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Searchable archives:
>> > http://www.open-mpi.org/community/lists/devel/2016/06/19054.php