On Nov 15, 2013, at 1:59 PM, Steve Wise <sw...@opengridcomputing.com> wrote:

> On 11/14/2013 12:16 PM, Jeff Squyres (jsquyres) wrote:
>> On Nov 14, 2013, at 1:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>>> 1) What the status of UDCM is (does it work reliably, does it support
>>>> XRC, etc.)
>>> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
>>> XRC - I seem to recall the answer is "no".
>> FWIW, I recall that when Cisco was testing UDCM (a long time ago -- before 
>> we threw away our IB gear...), we found bugs in UDCM that only showed up 
>> with really large numbers of MTT tests running UDCM (i.e., 10K+ tests a 
>> night, especially with lots of UDCM-based jobs running concurrently on the 
>> same cluster).  These types of bugs didn't show up in casual testing.
>> 
>> Has that happened with the new/fixed UDCM?  Cisco is no longer in a position 
>> to test this.
>> 
>>>> 2) What's the difference between CPCs and OFACM and what's our plans
>>>> w.r.t 1.7 there?
>>> Pasha created ofacm because some of the collective components now need to 
>>> forge connections. So he created the common/ofacm code to meet those needs, 
>>> with the intention of someday replacing the openib CPCs with the new 
>>> common code. However, this was stalled by the iWARP issue, and so it fell 
>>> off the table.
> 
> Perhaps if Pasha or somebody else proficient in the OMPI code could help out, 
> then the iWARP CPC could be moved.  Without help from OMPI developers, it's going 
> to take me a very long time...

I believe we would all be willing to provide advice - we just have no way of 
testing.

> 
>>> 
>>> We now have two duplicate ways of doing the same thing, but with code in 
>>> two different places. :-(
>> FWIW, the iWARP vendors have repeatedly been warned that ofacm is going to 
>> take over, and unless they supply patches, iWARP will stop working in Open 
>> MPI.  I know for a fact that they are very aware of this.
>> 
>> So my $0.02 is that ofacm should take over -- let's get rid of the CPCs and 
>> have openib use the ofacm.  The iWARP folks can play catch-up if/when they 
>> want to.
>> 
>> Of course, I'm not in this part of the code base any more, so it's not 
>> really my call -- just my $0.02...
>> 
> 
> Can't we leave the openib rdma CPC code as-is until we can get the rdmacm CPC 
> moved into OFACM?  What is the harm with that, exactly? I mean, if no iWARP 
> devices support these accelerated MPI collectives, then leave the rdmacm CPC 
> in the openib BTL so we can at least support iWARP via the openib BTL...

I see no reason why we can't just push the rdmacm CPC over to ofacm - I'd prefer 
that to leaving the code in the openib BTL. Forcing the openib BTL to use both 
CPCs from ofacm AND its own would be ugly.
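For readers following the thread: the tension here comes from how the openib BTL picks one connection pseudo-component (CPC) per device by priority, and from the fact that iWARP devices can only establish connections via the rdmacm CPC (UDCM relies on IB UD QPs, which iWARP lacks). Below is a toy sketch of that priority-based selection pattern -- the class names, priorities, and device dict are all hypothetical illustration, not the actual OMPI code:

```python
# Toy sketch of priority-based CPC selection (hypothetical names, not the
# real Open MPI API): each CPC reports whether it supports the local device,
# and the highest-priority supported CPC wins.

class CPC:
    name = "base"
    priority = 0
    def supports(self, device):
        return False

class RDMACMCPC(CPC):
    name = "rdmacm"
    priority = 30
    def supports(self, device):
        # rdmacm works over both IB and iWARP transports
        return device["transport"] in ("ib", "iwarp")

class UDCMCPC(CPC):
    name = "udcm"
    priority = 50
    def supports(self, device):
        # UDCM uses UD QPs, which only InfiniBand provides
        return device["transport"] == "ib"

def select_cpc(device, cpcs):
    """Return the highest-priority CPC that supports this device, or None."""
    usable = [c for c in cpcs if c.supports(device)]
    return max(usable, key=lambda c: c.priority, default=None)

cpcs = [RDMACMCPC(), UDCMCPC()]
print(select_cpc({"transport": "ib"}, cpcs).name)     # udcm outranks rdmacm on IB
print(select_cpc({"transport": "iwarp"}, cpcs).name)  # only rdmacm supports iWARP
```

This is why dropping the rdmacm CPC without a replacement in ofacm would leave iWARP devices with no usable connection manager at all, while IB devices would be unaffected.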

> 
> Steve.
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
