Can we add this to the agenda tomorrow?

Begin forwarded message:

> From: Ralph Castain <r...@open-mpi.org>
> Subject: Re: [OMPI devel] Non-zero exit status
> Date: April 13, 2012 6:40:53 PM EDT
> To: Open MPI Developers <de...@open-mpi.org>
> Reply-To: Open MPI Developers <de...@open-mpi.org>
>
> Did you have the param set? I found some missing code in the orted errmgr
> that contributed to it, but unless you had set the param in your test, there
> is no way it would abort no matter how many procs exit with non-zero status.
>
> I'm guessing you have that param set in your test due to our earlier defining
> the default to "no abort". I'm content to leave it there, but wanted to
> ensure your tests ran clean.
>
> On Apr 13, 2012, at 4:32 PM, TERRY DONTJE wrote:
>
>> I could see if fewer than N processes exit with non-zero exit code that the
>> ORTE may choose not to abort the job. However, if all N processes have
>> exited or aborted I expect everything to clean up and mpirun to exit. It
>> does not do that at the moment which I think is what is causing most of the
>> hangs in the MTT trunk runs which did not occur prior to this week.
>>
>> --td
>>
>> On 4/13/2012 5:18 PM, Ralph Castain wrote:
>>> This has come up again because some of the MTT tests depend on a specific
>>> behavior when a process exits with a non-zero status - in this case, they
>>> expect ORTE to abort the job. At some point, the default had been switched
>>> to NOT abort the job if a process exited with a non-zero status.
>>>
>>> So I'll throw this out to the community: if any process exits with a
>>> non-zero status, should ORTE abort the job?
>>>
>>> I don't personally care, but we ought to decide on something. In the
>>> meantime, I will set the default so we DO abort, thus allowing the MTT runs
>>> to complete correctly.
>>>
>>> FWIW: the MCA param orte_abort_non_zero_exit can always be set to control
>>> this behavior.
>>>
>>> Ralph
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.don...@oracle.com

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/