Wesley,

Thanks for catching that oversight. Below are the MCA parameters that you 
should need at the moment:
#####################################
# Use the C/R Process Migration Recovery Supervisor
recos_base_enable=1
# Only use the 'rsh' launcher; other launchers will be supported later
plm=rsh
# The resilient mapper knows how to use RecoS and deal with recovering procs
rmaps=resilient
# The 'cm' routed component is the only one that can handle failures at the moment
routed=cm
#####################################
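
If it is easier for testing, the same lines can go in your per-user MCA parameter file (usually $HOME/.openmpi/mca-params.conf), or be passed to mpirun as "-mca <name> <value>" pairs. Here is a minimal sketch that turns the block above into such arguments. The helper name mca_args, the "-np 4" process count and the "./my_app" binary are placeholders of mine, not part of the branch, and the comments are shortened copies of the ones above:

```python
# Hypothetical helper (not part of Open MPI): convert an MCA parameter
# file fragment into "-mca <name> <value>" arguments for mpirun.

PARAMS = """\
# Use the C/R Process Migration Recovery Supervisor
recos_base_enable=1
# Only use the 'rsh' launcher
plm=rsh
# The resilient mapper knows how to use RecoS
rmaps=resilient
# 'cm' is the only routed component that handles failures at the moment
routed=cm
"""


def mca_args(text):
    """Return a flat list of mpirun arguments for each name=value line."""
    args = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        args += ["-mca", name.strip(), value.strip()]
    return args


if __name__ == "__main__":
    # "-np 4" and "./my_app" are placeholders; substitute your own test.
    print(" ".join(["mpirun", "-np", "4"] + mca_args(PARAMS) + ["./my_app"]))
```

The script only prints the command line; it does not launch anything, so you can check the arguments before running them against the branch.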

Let me know if you have any troubles.

-- Josh

On Mar 10, 2010, at 10:36 AM, Wesley Bland wrote:

> Josh,
> 
> You mentioned some MCA parameters that you would include in the email, but I 
> don't see those parameters anywhere.  Could you please put those in here to 
> make testing easier for people.
> 
> Wesley
> 
> On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> Yesterday evening George, Thomas and I discussed some of their concerns about 
> this RFC at the MPI Forum meeting. After the discussion, we seemed to be in 
> agreement that the RecoS framework is a good idea and the concepts and fixes 
> in this RFC should move forward with a couple of notes:
> 
>  - They wanted to test the branch a bit more over the next couple of days. 
> Some MCA parameters that you will need are at the bottom of this message.
> 
>  - Reiterate that this RFC only addresses ORTE stability, not OMPI stability. 
> The OMPI stability extension is a second step for the line of work, and 
> should/will fit in nicely with the RecoS framework being proposed in this 
> RFC. The OMPI layer stability will require a significant amount of work, but 
> the RecoS framework will provide the ORTE layer stability that is required as 
> a foundation for OMPI layer stability in the future.
> 
>  - The purpose of the ErrMgr becomes slightly unclear with the addition of 
> the RecoS framework, since both are focused on responding to faults in the 
> system (and RecoS, when enabled, overrides most/all of the ErrMgr 
> functionality). Should the RecoS framework be merged with the ErrMgr 
> framework to create a new ErrMgr interface?
> 
> We are trying to decide if we should merge these frameworks, but at this 
> point we are interested in hearing how other developers feel about merging 
> the ErrMgr and RecoS frameworks, which would change the ErrMgr API. Are there 
> any developers out there that are developing ErrMgr components, or are using 
> any particular features of the existing ErrMgr framework that they would like 
> to see preserved in the next revision? By default, the existing abort 
> behavior of the ErrMgr framework will be preserved, so the user will have to 
> 'opt-in' to any fault recovery capabilities.
> 
> So we are continuing the discussion a bit more off-list, and will return to 
> the list with an updated RFC (and possibly a new branch) soon (hopefully end 
> of the week/early next week). I would like to briefly discuss this RFC at the 
> Open MPI teleconf next Tuesday.
> 
> -- Josh
> 
> On Feb 26, 2010, at 8:06 AM, Josh Hursey wrote:
> 
> > Sounds good to me.
> >
> > For those casually following this RFC let me summarize its current state.
> >
> > Josh and George (and anyone else attending the forum who wishes to 
> > participate) will meet sometime during the next MPI Forum meeting (March 8-10). I 
> > will post any relevant notes from this meeting back to the list afterwards. 
> > So the RFC is on hold pending the outcome of that meeting. For those 
> > developers interested in this RFC that will not be able to attend, feel 
> > free to continue using this thread for discussion.
> >
> > Thanks,
> > Josh
> >
> > On Feb 26, 2010, at 6:09 AM, George Bosilca wrote:
> >
> >>
> >> On Feb 26, 2010, at 01:50 , Josh Hursey wrote:
> >>
> >>> Any of those options are fine with me. I was thinking that if you wanted 
> >>> to talk sooner, we might be able to help explain our intentions with this 
> >>> framework a bit better. I figure that the framework interface will change 
> >>> a bit as we all advance and incorporate our various techniques into it. I 
> >>> think that the current interface is a good first step, but there are 
> >>> certainly many more steps to come.
> >>>
> >>> I am fine delaying this code a bit, just not too long. Meeting at the 
> >>> forum for a while might be a good option (we could probably even arrange 
> >>> to call in others if you wanted).
> >>
> >> Sounds good, let's do this.
> >>
> >> Thanks,
> >>   george.
> >>
> >>>
> >>> Cheers,
> >>> Josh
> >>>
> >>> On Feb 25, 2010, at 6:45 PM, Ralph Castain wrote:
> >>>
> >>>> If Josh is going to be at the forum, perhaps you folks could chat there? 
> >>>> Might as well take advantage of being colocated, if possible.
> >>>>
> >>>> Otherwise, I'm available pretty much any time. I can't contribute much 
> >>>> about the MPI recovery issues, but can contribute to the RTE issues if 
> >>>> that helps.
> >>>>
> >>>>
> >>>> On Thu, Feb 25, 2010 at 7:39 PM, George Bosilca <bosi...@eecs.utk.edu> 
> >>>> wrote:
> >>>> Josh,
> >>>>
> >>>> Next week is a little too early, as we will need some time to figure out 
> >>>> how to integrate with this new framework, and to what extent our code 
> >>>> and requirements fit into it. The week after that is the MPI Forum. How 
> >>>> about Thursday, 11 March?
> >>>>
> >>>> Thanks,
> >>>> george.
> >>>>
> >>>> On Feb 25, 2010, at 12:46 , Josh Hursey wrote:
> >>>>
> >>>>> Per my previous suggestion, would it be useful to chat on the phone 
> >>>>> early next week about our various strategies?
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

