On Nov 15, 2013, at 1:59 PM, Steve Wise <sw...@opengridcomputing.com> wrote:
> On 11/14/2013 12:16 PM, Jeff Squyres (jsquyres) wrote: >> On Nov 14, 2013, at 1:03 PM, Ralph Castain <r...@open-mpi.org> wrote: >> >>>> 1) What the status of UDCM is (does it work reliably, does it support >>>> XRC, etc.) >>> Seems to be working okay on the IB systems at LANL and IU. Don't know about >>> XRC - I seem to recall the answer is "no" >> FWIW, I recall that when Cisco was testing UDCM (a long time ago -- before >> we threw away our IB gear...), we found bugs in UDCM that only showed up >> with really large numbers of MTT tests running UDCM (i.e., 10K+ tests a >> night, especially with lots of UDCM-based jobs running concurrently on the >> same cluster). These types of bugs didn't show up in casual testing. >> >> Has that happened with the new/fixed UDCM? Cisco is no longer in a position >> to test this. >> >>>> 2) What's the difference between CPCs and OFACM and what's our plans >>>> w.r.t 1.7 there? >>> Pasha created ofacm because some of the collective components now need to >>> forge connections. So he created the common/ofacm code to meet those needs, >>> with the intention of someday replacing the openib cpc's with the new >>> common code. However, this was stalled by the iWarp issue, and so it fell >>> off the table. > > Perhaps if Pasha or somebody else proficient in the OMPI code could help out, > then the iWARP CPC could be moved. W/O help from OMPI developers, its going > to take me a very long time... I believe we would all be willing to provide advice - we just have no way of testing. > >>> >>> We now have two duplicate ways of doing the same thing, but with code in >>> two different places. :-( >> FWIW, the iWARP vendors have repeatedly been warned that ofacm is going to >> take over, and unless they supply patches, iWarp will stop working in Open >> MPI. I know for a fact that they are very aware of this. >> >> So my $0.02 is that ofacm should take over -- let's get rid of CPC and have >> openib use the ofacm. 
>> The iWARP folks can play catch up if/when they want to.
>>
>> Of course, I'm not in this part of the code base any more, so it's not
>> really my call -- just my $0.02...
>>
>
> Can't we leave the openib rdma CPC code as is until we can get the rdmacm CPC
> moved into OFACM? What is the harm with that, exactly? I mean, if no iWARP
> devices support these accelerated MPI collectives, then leave the rdmacm CPC
> in the openib BTL so we can at least support iWARP via the openib BTL...

I see no reason why we can't just push the rdma CPC over to ofacm - I'd prefer
that to leaving the code in the openib btl. Forcing the openib btl to use both
cpc's from ofacm AND its own would be ugly.

> Steve.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel