Can we add this to the agenda tomorrow?

Begin forwarded message:

> From: Ralph Castain <r...@open-mpi.org>
> Subject: Re: [OMPI devel] Non-zero exit status
> Date: April 13, 2012 6:40:53 PM EDT
> To: Open MPI Developers <de...@open-mpi.org>
> Reply-To: Open MPI Developers <de...@open-mpi.org>
> 
> Did you have the param set? I found some missing code in the orted errmgr 
> that contributed to it, but unless you had set the param in your test, there 
> is no way it would abort no matter how many procs exit with non-zero status.
> 
> I'm guessing you have that param set in your test because we earlier defined 
> the default as "no abort". I'm content to leave it there, but wanted to 
> ensure your tests ran clean.
> 
> On Apr 13, 2012, at 4:32 PM, TERRY DONTJE wrote:
> 
>> I could see that if fewer than N processes exit with a non-zero exit code, 
>> ORTE may choose not to abort the job.  However, if all N processes have 
>> exited or aborted, I expect everything to clean up and mpirun to exit.  It 
>> does not do that at the moment, which I think is what is causing most of the 
>> hangs in the MTT trunk runs, which did not occur prior to this week.
>> 
>> --td
>> 
>> On 4/13/2012 5:18 PM, Ralph Castain wrote:
>>> This has come up again because some of the MTT tests depend on a specific 
>>> behavior when a process exits with a non-zero status - in this case, they 
>>> expect ORTE to abort the job. At some point, the default had been switched 
>>> to NOT abort the job if a process exited with a non-zero status.
>>> 
>>> So I'll throw this out to the community: if any process exits with a 
>>> non-zero status, should ORTE abort the job?
>>> 
>>> I don't personally care, but we ought to decide on something. In the 
>>> meantime, I will set the default so we DO abort, thus allowing the MTT runs 
>>> to complete correctly.
>>> 
>>> FWIW: the MCA param orte_abort_non_zero_exit can always be set to control 
>>> this behavior.
>>> 
>>> Ralph
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> 
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> -- 
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.781.442.2631
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.don...@oracle.com
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
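
For discussion, the two behaviors described in the thread can be sketched as a toy model. This is not ORTE code: job_action, its return strings, and the use of None for a still-running process are hypothetical names chosen for illustration, and the abort_on_non_zero flag merely stands in for the MCA param orte_abort_non_zero_exit. The sketch assumes two rules taken from the messages above: with the param set, any non-zero exit aborts the job; and regardless of the param, once all N processes have exited the launcher should clean up and exit rather than hang.

```python
from typing import Optional, Sequence


def job_action(exit_codes: Sequence[Optional[int]], abort_on_non_zero: bool) -> str:
    """Toy model of the launcher's decision (hypothetical, not ORTE's logic).

    exit_codes holds one entry per process: an int exit status once the
    process has finished, or None while it is still running.
    abort_on_non_zero stands in for the orte_abort_non_zero_exit param.
    """
    finished = [code for code in exit_codes if code is not None]

    # Rule 1: with the param set, any non-zero exit aborts the whole job.
    if abort_on_non_zero and any(code != 0 for code in finished):
        return "abort"

    # Rule 2: once all N processes are gone, clean up and exit,
    # whatever their exit statuses were.
    if len(finished) == len(exit_codes):
        return "exit"

    # Otherwise some processes are still running, so keep waiting.
    return "wait"


if __name__ == "__main__":
    # One of three processes failed while another is still running.
    print(job_action([0, 1, None], abort_on_non_zero=True))
    print(job_action([0, 1, None], abort_on_non_zero=False))
    # All processes failed with the param unset: the job should still end.
    print(job_action([1, 1, 1], abort_on_non_zero=False))
```

The third call is the case behind the MTT hangs as described above: every process has exited with a non-zero status, and the expected outcome is still that mpirun terminates.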


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

