Thanks, guys. I will check the OB1 code more carefully.

Best,
Lirong

Message: 7
> Date: Thu, 25 Oct 2012 10:55:51 -0700
> From: Ralph Castain <r...@open-mpi.org>
> Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open
> MPI.
> To: Open MPI Developers <de...@open-mpi.org>
> Message-ID: <b1a13d1b-02a2-4e67-b0cd-fa924538d...@open-mpi.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Just an FYI - I asked a similar question recently and got the following
> answer from Rolf:
>
> > In my case, it was specific to openib only and it required you to be
> > running with two or more IB rails.
> > Then, if one of them failed, we just shut it down, and continued with
> > the working ones.
> > You could only get use of the failing rail if it was fixed and a new
> > job was started.
> >
> > To get this to work, I created a new PML called bfo. I also had to
> > make some changes in the openib BTL.
> > By default, none of the code is configured in. There is a README in
> > the PML bfo directory that
> > actually does quite a good job explaining what I did.
>
> The bfo module is included in the 1.6 series, and in the upcoming 1.7
> series. Can't say anything as to its state of repair.
>
>
> On Oct 25, 2012, at 10:41 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> >
> > On Oct 25, 2012, at 17:54 , Lirong Jian <lirong.m...@gmail.com> wrote:
> >
> >> Hi folks,
> >>
> >> Sorry to bother you guys, but I have some questions about Open MPI
> >> and really want your help.
> >>
> >> There are some papers (e.g., [1, 2, 3], although they are sort of
> >> old-aged) mentioning that Open MPI is supporting NIC failover and
> >> message striping over multiple NICs. However, when I read the source
> >> code of openmpi-1.6.2, I couldn't find any component named DR or TEG
> >> (which are mentioned in those papers and are supposed to support NIC
> >> failover and message striping). So my question is:
> >>
> >> Does the 1.6.2 release of Open MPI support such two kinds of
> >> functionalities? If positive, which part of code is corresponding to
> >> these functionalities?
> >
> > Lirong,
> >
> > As you noticed the papers are quite old and dusty.
> >
> > Due to a lack of interest from the community the DR PML has been
> > retired from our stable releases. In other terms no stable Open MPI
> > version supports network failover. However, the code is still
> > available in the trunk, but there is no guarantee it still does what
> > it was designed for.
> >
> > TEG has been replaced with OB1, which is our current network
> > management layer. It does striping over multiple NICs (identical or
> > not) by default.
> >
> > george.
> >
> >>
> >> Many thanks in advance.
> >>
> >> P.S., I am a newbie of this domain. Maybe my questions are simple
> >> even naive, but your help would be highly appreciated.
> >>
> >> Best,
> >> Lirong
> >>
> >>
> >> [1] Network Fault Tolerance in Open MPI.
> >> [2] Open MPI: A High Performance, Flexible Implementation of MPI
> >> Point-to-Point Communications.
> >> [3] TEG: A High-Performance, Scalable, Multi-network, Point-to-Point,
> >> Communications Methodology.
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> End of devel Digest, Vol 2285, Issue 2