fwiw,

the onesided/c_fence_lock test from the ibm test suite hangs

(mpirun -np 2 ./c_fence_lock)

I ran a git bisect, and it points to commit
b90c83840f472de3219b87cd7e1a364eec5c5a29

commit b90c83840f472de3219b87cd7e1a364eec5c5a29
Author: bosilca <bosi...@users.noreply.github.com>
Date:   Tue May 24 18:20:51 2016 -0500

    Refactor the request completion (#1422)

    * Remodel the request.
    Added the wait sync primitive and integrate it into the PML and MTL
    infrastructure. The multi-threaded requests are now significantly
    less heavy and less noisy (only the threads associated with completed
    requests are signaled).

    * Fix the condition to release the request.
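
For anyone who wants to repeat the bisect on a hang like this one, here is a
minimal sketch of a helper that could be passed to git bisect run. It is not
what was used above: the script name hang_check.py, the 60 second timeout, and
the choice to treat only a hang (not a non-zero exit) as "bad" are assumptions
for illustration. It also assumes the tree is rebuilt and installed at each
bisect step, which this sketch does not do.

```python
import subprocess
import sys


def completes_in_time(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it hangs."""
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising.
        return False


def bisect_exit_code(cmd, timeout_s, repeats=1):
    """Map the result to git bisect run codes: 0 = good, 1 = bad (hang)."""
    for _ in range(repeats):
        if not completes_in_time(cmd, timeout_s):
            return 1
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed timeout of 60 seconds; the command comes from the command line.
    sys.exit(bisect_exit_code(sys.argv[1:], timeout_s=60))
```

It could then be driven with something like

git bisect run python3 hang_check.py mpirun -np 2 ./c_fence_lock

Only the direct child (mpirun here) is killed on timeout, so leftover MPI
processes may need to be cleaned up by hand. For an intermittent hang, a
larger repeats value would be needed before calling a commit good.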


I also noticed that a warning is emitted when running only one task

./c_fence_lock

but I did not git bisect that one, so it might not be related

Cheers,


Gilles

On Thursday, June 2, 2016, Ralph Castain <r...@open-mpi.org> wrote:

> Yes, please! I’d like to know what mpirun thinks is happening - if you
> like, just set the --timeout N --report-state-on-timeout flags and tell me
> what comes out
>
> On Jun 1, 2016, at 7:57 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> I don't think it matters. I was running the IBM collective and pt2pt
> tests, but each time it deadlocked was in a different test. If you are
> interested in some particular values, I would be happy to attach a debugger
> next time it happens.
>
>   George.
>
>
> On Wed, Jun 1, 2016 at 10:47 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> What kind of apps are they? Or does it matter what you are running?
>>
>>
>> > On Jun 1, 2016, at 7:37 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> >
>> > I have a rarely occurring deadlock on an OS X laptop if I use more
>> than 2 processes. It comes up once every 200 runs or so.
>> >
>> > Here is what I could gather from my experiments: All the MPI processes
>> seem to have correctly completed (I get all the expected output and the MPI
>> processes are in a waiting state), but somehow the mpirun does not detect
>> their completion. As a result, mpirun never returns.
>> >
>> >   George.
>> >
>> > _______________________________________________
>> > devel mailing list
>> > de...@open-mpi.org
>> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> > Searchable archives:
>> http://www.open-mpi.org/community/lists/devel/2016/06/19054.php
>>
>
