As Josh indicated, the current OMPI trunk won't do that at the moment. Josh and I are working on a side branch to integrate the OpenRCM methods with mpirun to provide an OMPI capability for those not running ORCM on their systems.
What wasn't clear is your motivation. Are you trying to develop this capability, or just use it? If the former, then the three of us should probably discuss off-line how you might contribute. Like Josh said, you might also find the ORCM work relevant. If the latter, it should become available in the next few weeks, as most of the required elements are already in the ORTE code base (what remains in ORCM and not in ORTE is the integration to reassign the task). Rewiring MPI after a task is restarted will follow.

On Jan 11, 2010, at 8:37 PM, Sai Sudheesh wrote:

> Hi Josh,
>
> First of all... thanks for your response.
> There were some typos in my mail
> making it vague in some places.
>
> Let me explain the scenarios mentioned in the
> previous mail in more detail.
> What I tried is as follows.
>
> I assigned a parallel task taking a few minutes
> (matrix multiplication of order 2048) to
> two machines connected through Ethernet.
> While the multiplication was going on,
> I pulled out the Ethernet cable.
> This resulted in mpirun waiting indefinitely.
> I needed a mechanism to find the failed
> link.
>
> So, I tried to run mpirun with the MCA parameter
> -heartbeat-rate 1.
> Now mpirun was aware of the link failure
> and aborted after printing the IP of the unreachable
> node on the terminal.
>
> At this point I have to catch this fault
> and, instead of displaying the error message on screen
> and aborting the whole job,
> reassign the task to some
> reachable node.
>
> I hope this time I expressed it clearly.
> Thanks.
>
> With Love
> sudheesh
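(As an aside for anyone trying to reproduce this: the heartbeat Sudheesh mentions is an MCA parameter; if I recall correctly, the full name is orte_heartbeat_rate, the number of seconds between state-of-health checks. The hostfile and executable names below are just placeholders.)

  mpirun -np 8 --hostfile myhosts --mca orte_heartbeat_rate 1 ./matmul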
> On 1/12/10, Josh Hursey <jjhur...@open-mpi.org> wrote:
>>
>> On Jan 6, 2010, at 9:04 AM, Sai Sudheesh wrote:
>>
>>> Hi,
>>>
>>> Just about two months ago I started experimenting with Open MPI.
>>> I found this piece of software very interesting.
>>>
>>> How can I make this software fault tolerant?
>>
>> Depends on what you mean by fault tolerant. :)
>>
>>> As of now I am running this software on two machines
>>> with quad-core processors and Fedora 10.
>>> I am using Open MPI 1.3.2.
>>>
>>> If a remote machine fails while a parallel task is running on both
>>> machines,
>>> is it possible to reassign the task assigned to it to some
>>> other available node and
>>> continue the computation instead of aborting the entire
>>> computation?
>>
>> This scenario is currently not supported by Open MPI. If an MPI
>> process fails, Open MPI will clean up the job.
>>
>> A few of us have been working on this scenario off-trunk for a while
>> now. It is progressing nicely, but not available for public
>> consumption just yet.
>>
>>> Can anybody tell me where I have to look for more information
>>> regarding this?
>>> I have tried FT-MPI but got tired of it.
>>
>> FT-MPI should be able to work in this scenario.
>>
>>> I have also heard of the CIFTS FTB; can I use it for solving this?
>>
>> The CIFTS FTB is focused on a slightly different problem, that of
>> coordination amongst software components before/during/after a
>> failure. Currently, Open MPI is able to interact with the CIFTS FTB to
>> send fault information. Soon, Open MPI will be able to respond to such
>> fault information and take appropriate actions. The first generation
>> of this work is scheduled to be brought into the Open MPI trunk soon,
>> and will support catching some basic events. Handling the scenario
>> you mentioned at the top of the message will come shortly thereafter.
>>
>>> Is it necessary to make a source code change?
>>
>> In some cases yes, in others no. It really depends on what the final
>> solution set looks like and how involved your application wants to be
>> in the recovery process. At the very least, the application will
>> likely have to specify the MPI_ERRORS_RETURN error handler for each
>> communicator to override the default MPI_ERRORS_ARE_FATAL.
>>
>>> Does anybody already have a solution?
>>
>> There are a couple of transparent fault tolerance solutions in the
>> current trunk:
>> - Checkpoint/Restart of the entire MPI job (requires full job
>>   restart on failure):
>>   http://www.osl.iu.edu/research/ft/ompi-cr/
>> - Message Logging:
>>   https://svn.open-mpi.org/trac/ompi/wiki/EventLog_CR
>>
>> For non-MPI jobs you could also check out the Open Resilient Cluster
>> Manager (ORCM) project:
>> http://www.open-mpi.org/projects/orcm/
>>
>>> If an application is killed by the OS on the remote node,
>>> mpirun aborts and reports an error.
>>> What kind of signal does the remote orted send to mpirun?
>>> How can I handle it?
>>
>> I'm not sure what you're asking here. The orted detects the local
>> process failure and notifies the mpirun process using the OOB (out-of-
>> band) communication channel. The mpirun process then initiates the
>> shutdown procedure.
>>
>> -- Josh
>>
>>> I know that I have asked a lot of questions.
>>> I will be thankful if anybody could respond with
>>> at least some suggestions.
>>>
>>> with love
>>> sudheesh
>
> --
> regards
> sai sudheesh
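P.S. To expand on Josh's note about error handlers: below is a minimal sketch (my own, not from Josh's branch) of what an application does to override MPI_ERRORS_ARE_FATAL on a communicator. All it does is make MPI errors visible to the application as return codes; actual recovery or reassignment still depends on the fault-tolerance work discussed above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, rc;

    MPI_Init(&argc, &argv);

    /* Override the default MPI_ERRORS_ARE_FATAL handler so that MPI
     * calls on this communicator return error codes instead of
     * aborting the whole job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every MPI call now returns a code the application can inspect. */
    int buf = rank;
    rc = MPI_Bcast(&buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "rank %d: MPI_Bcast failed: %s\n", rank, msg);
        /* Application-specific recovery (e.g. reassigning the work)
         * would go here, once the runtime supports it. */
    }

    MPI_Finalize();
    return 0;
}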