Hmmm….yeah, I know we saw this and resolved it in the trunk, but it looks like 
the fix indeed failed to come over to 1.8. I’ll take a gander (pretty sure I 
remember how I fixed it) - thanks!

> On Nov 26, 2014, at 12:03 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
> 
> Ralph,
> 
> i noted several hangs in mtt with the v1.8 branch.
> 
> a simple way to reproduce it is to use the MPI_Errhandler_fatal_f test
> from the intel_tests suite,
> invoke mpirun on one node and run the tasks on another node:
> 
> node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f
> 
> /* since this is a race condition, you might need to run this in a loop
> in order to hit the bug */
> 
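Since the hang is intermittent, a small wrapper can rerun the command until it stops returning. The following is only a sketch, not part of the original report: `run_until_hang` is a hypothetical helper name, it assumes the GNU coreutils `timeout` utility is available, and the time limit and iteration count are arbitrary. Exit codes other than a timeout are deliberately ignored, because a test that aborts is expected to exit non-zero.

```shell
# Hypothetical helper (not from the original post): rerun a command until
# one run exceeds the time limit, which is treated as a hang.
# Usage: run_until_hang <limit-seconds> <max-iterations> <command> [args...]
run_until_hang() {
    limit=$1
    max=$2
    shift 2
    i=1
    while [ "$i" -le "$max" ]; do
        # GNU timeout exits with status 124 when the command is killed
        # for running longer than the limit.
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "hang detected on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hang in $max iterations"
    return 0
}
```

With the command from this report, it could be invoked as follows (60 seconds and 100 iterations are assumed values):

    node0$ run_until_hang 60 100 mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f
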
> the attached tarball contains a patch (add debug + temporary hack) and
> some log files obtained with
> --mca errmgr_base_verbose 100 --mca odls_base_verbose 100
> 
> without the hack, i can reproduce the bug with -np 3 (log.ko.txt) , with
> the hack, i can still reproduce the hang (though it might
> be a different one) with -np 16 (log.ko.2.txt)
> 
> i remember some similar hangs were fixed on the trunk/master a few
> months ago.
> i tried to backport some commits but it did not help :-(
> 
> could you please have a look at this ?
> 
> Cheers,
> 
> Gilles
> <abort_hang.tar.gz>
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16357.php