It’s probably a race condition caused by uniting the ORTE and OPAL async 
threads, though I can’t confirm that yet.
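
For illustration only, here is a minimal sketch of the teardown ordering that would avoid the kind of race described in the quoted message below, where a progress thread keeps touching memory the main thread has already released. This is not Open MPI code: progress_state_t, progress_loop and run_and_finalize are hypothetical names, and the sketch assumes POSIX threads and C11 atomics. The idea is simply to request a stop, join the thread, and free the shared state only afterwards.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state shared by the main thread and a progress thread.
 * None of these names come from Open MPI. */
typedef struct {
    atomic_bool stop;   /* set by the main thread to request shutdown */
    atomic_long ticks;  /* stand-in for the work done by the progress thread */
} progress_state_t;

/* Progress thread body: every iteration dereferences st, so st must
 * stay valid until this function has returned. */
static void *progress_loop(void *arg)
{
    progress_state_t *st = arg;
    while (!atomic_load(&st->stop)) {
        atomic_fetch_add(&st->ticks, 1);
    }
    return NULL;
}

/* Start the progress thread, then tear it down in a safe order.
 * Returns 0 on success, -1 on failure. */
int run_and_finalize(void)
{
    progress_state_t *st = malloc(sizeof(*st));
    if (st == NULL) {
        return -1;
    }
    atomic_init(&st->stop, false);
    atomic_init(&st->ticks, 0);

    pthread_t tid;
    if (pthread_create(&tid, NULL, progress_loop, st) != 0) {
        free(st);
        return -1;
    }

    /* 1. Ask the progress thread to exit its loop. */
    atomic_store(&st->stop, true);

    /* 2. Wait until it has really returned. If this fails, st is
     *    deliberately leaked rather than freed under a live thread. */
    if (pthread_join(tid, NULL) != 0) {
        return -1;
    }

    /* 3. Only now is it safe to release memory the thread was using.
     *    Freeing before the join is the use-after-free pattern. */
    free(st);
    return 0;
}
```

Calling run_and_finalize() from main and checking for a 0 return is enough to exercise it (build with -pthread). Moving the free() above the pthread_join() would reproduce the suspected pattern, ideally under a tool such as a thread or address sanitizer rather than by waiting for a crash.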

> On Jul 17, 2015, at 3:11 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Folks,
> 
> I noticed several errors such as 
> http://mtt.open-mpi.org/index.php?do_redir=2244 
> that did not make any sense to me (at first glance).
> 
> I was able to attach to one process when the issue occurred.
> The SIGSEGV occurs in thread 2, while thread 1 is invoking ompi_rte_finalize.
> 
> All I can think of is a scenario in which the progress thread (aka thread 2)
> is still dealing with some memory that was just freed/unmapped/corrupted by
> the main thread.
> 
> Empirically, the error is more likely to occur when there are many 
> tasks on one node,
> e.g. mpirun --oversubscribe -np 32 a.out
> 
> Cheers,
> 
> Gilles
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2015/07/17652.php
