Just an FYI - I asked a similar question recently and got the following answer 
from Rolf:

> In my case, it was specific to openib and it required you to be running
> with two or more IB rails.
> Then, if one of them failed, we just shut it down and continued with the
> working ones.
> A failed rail could only be used again once it was fixed and a new job was
> started.
> 
> To get this to work, I created a new PML called bfo. I also had to make some
> changes in the openib BTL.
> By default, none of the code is configured in. There is a README in the PML
> bfo directory that actually does quite a good job of explaining what I did.

The bfo module is included in the 1.6 series, and in the upcoming 1.7 series. 
I can't say anything about its state of repair.
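For anyone skimming this thread, here is a toy Python model of the two ideas under discussion: striping one message over several rails, and the bfo-style behaviour Rolf describes (shut a failed rail down and keep going on the working ones). The Rail class, its weight field and the stripe/fail_rail/reassemble functions are all hypothetical. This is not Open MPI code, and the real OB1/bfo scheduling happens in C inside the PML and will differ in detail.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; none of these names exist in Open MPI.
Fragment = tuple[int, bytes]  # (byte offset within the message, payload)


@dataclass
class Rail:
    name: str            # e.g. one IB port; purely illustrative
    weight: int          # relative bandwidth share (2 = twice the traffic of 1)
    healthy: bool = True


def stripe(message: bytes, rails: list[Rail], frag_size: int) -> dict[str, list[Fragment]]:
    """Spread one message over all healthy rails with weighted round-robin."""
    live = [r for r in rails if r.healthy]
    if not live:
        raise RuntimeError("no working rails left")
    # A rail with weight w gets w slots per scheduling cycle.
    schedule = [r.name for r in live for _ in range(r.weight)]
    plan: dict[str, list[Fragment]] = {r.name: [] for r in live}
    for i, off in enumerate(range(0, len(message), frag_size)):
        plan[schedule[i % len(schedule)]].append((off, message[off:off + frag_size]))
    return plan


def fail_rail(plan: dict[str, list[Fragment]], rails: list[Rail], failed: str) -> None:
    """Shut a failed rail down and hand its pending fragments to the survivors."""
    for r in rails:
        if r.name == failed:
            r.healthy = False
    survivors = [r.name for r in rails if r.healthy]
    if not survivors:
        raise RuntimeError("no working rails left")
    for i, frag in enumerate(plan.pop(failed, [])):
        plan.setdefault(survivors[i % len(survivors)], []).append(frag)


def reassemble(plan: dict[str, list[Fragment]]) -> bytes:
    """Receiver side: order fragments by offset, whichever rail carried them."""
    frags = sorted(f for rail_frags in plan.values() for f in rail_frags)
    return b"".join(payload for _, payload in frags)


rails = [Rail("ib0", weight=2), Rail("ib1", weight=1)]
msg = bytes(range(256)) * 4                 # 1024-byte message
plan = stripe(msg, rails, frag_size=64)     # 16 fragments: 11 on ib0, 5 on ib1
fail_rail(plan, rails, "ib1")               # ib1 dies; its fragments move to ib0
assert reassemble(plan) == msg
```

The receiver reassembles by offset, so moving fragments between rails after a failure is invisible to it. How bfo actually tracks and retransmits outstanding fragments is covered in the README Rolf mentions, not in this sketch.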


On Oct 25, 2012, at 10:41 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> 
> On Oct 25, 2012, at 17:54 , Lirong Jian <lirong.m...@gmail.com> wrote:
> 
>> Hi folks,
>> 
>> Sorry to bother you guys, but I have some questions about Open MPI and
>> would really appreciate your help.
>> 
>> There are some papers (e.g., [1, 2, 3], although they are somewhat dated)
>> mentioning that Open MPI supports NIC failover and message striping over
>> multiple NICs. However, when I read the source code of openmpi-1.6.2, I
>> couldn't find any component named DR or TEG (which are mentioned in those
>> papers and are supposed to provide NIC failover and message striping). So
>> my questions are:
>> 
>> Does the 1.6.2 release of Open MPI support these two features? If so,
>> which part of the code implements them?
> 
> Lirong,
> 
> As you noticed, the papers are quite old and dusty.
> 
> Due to a lack of interest from the community, the DR PML has been retired
> from our stable releases. In other words, no stable Open MPI version supports
> network failover. The code is still available in the trunk, but there is
> no guarantee it still does what it was designed for.
> 
> TEG has been replaced with OB1, which is our current network management
> layer. It does striping over multiple NICs (identical or not) by default.
> 
>   george.
> 
>> 
>> Many thanks in advance.
>> 
>> P.S. I am new to this field, so my questions may be simple or even naive,
>> but your help would be highly appreciated.
>> 
>> Best,
>> Lirong
>> 
>> 
>> [1] Network Fault Tolerance in Open MPI.
>> [2] Open MPI: A High Performance, Flexible Implementation of MPI 
>> Point-to-Point Communications.
>> [3] TEG: A High-Performance, Scalable, Multi-network, Point-to-Point, 
>> Communications Methodology.
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
