Let me try to summarize my understanding of the situation:

1. Ralph made the OOB asynchronous.

2. OOB cpcs don't work as a result of 1, and are thereby "deprecated", meaning: 
won't fix.  

3. Pasha moved openib/connect to common/ofacm but excluded rdmacm from 
that move. openib was never changed to use common/ofacm.

4. UDCM is "functional" in the trunk, still sitting in openib/connect. But no 
one is entirely sure whether it really works, which is why it was disabled in 1.7. 
Nathan - is there a design doc you can share on this beyond the comments in the 
code?

5. In order to satisfy the "grand plan":
        a. UDCM still needs to be moved to common/ofacm.
        b. OpenIB still needs to be changed to use common/ofacm.
        c. RDMACM still needs to migrate to common/ofacm.
        d. XRC support needs to be added to UDCM and put into common/ofacm.

6. The "grand plan" being:  move the BTLs into Opal - hence the need to scuttle 
the OOB cpcs thereby justifying the deprecation and not fixing cpcs after #1.

So, that's a quick roundup of how we ended up here (as I understand it.)  What 
needs to be done is:

1. Somebody needs to certify/review that what Nathan has done is sound. From 
my perspective, this is a BIG change and needs a comprehensive architecture 
review. We've been using it in the trunk, and we've been testing it under MTT 
for some time - but have not deployed or tested at large-scale out in the 
field. Would be nice to see something on paper in terms of a design doc. 

2. Somebody then needs to move UDCM into common/ofacm.

3. Somebody needs to change openib to use common/ofacm cpcs instead of 
openib/connect cpcs.

4. Somebody needs to move RDMACM into common/ofacm and make sure RoCEE works.

5. Somebody needs to add XRC support to UDCM - whatever that might mean. Given 
Nathan added UDCM back in 2011 and nobody is really sure it's ready for 
prime-time, and given Pasha's comments regarding the difference in state 
machine requirements between the two connection schemes, this doesn't seem 
like a trivial task.

Given Nathan's comments a second ago about ORNL not supporting the IB Offload 
component, it barely makes sense to keep common/ofacm. And it sounds like the 
two cpcs presently contained therein are now unusable.
 
All of this work is a result of the Grand Plan to move the BTLs into the Opal 
layer. I have no idea what the motive for that is (I was not involved with OMPI 
when this was decided or debated.) 

Basically, without these five changes OpenIB is dead in 1.7.4 and beyond for 
RC, XRC, and RoCEE. These are blockers to 1.7.4 and I don't believe that the 
onus falls squarely on Mellanox to fix these. These were community decisions 
and, as such, it must be a community effort to resolve. We are happy to lend a 
hand, but we are not fixing all of this mess.

Josh 

 

-----Original Message-----
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Shamis, Pavel
Sent: Thursday, November 14, 2013 2:08 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi r29703 - 
in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
ompi/mca/btl/openib/connect

There is some confusion in the thread. UDCM is just another CPC, like XOOB, 
OOB, and RDMACM (I think IBCM is officially dead).
XOOB and OOB don't use UDCM; they rely on ORTE out-of-band communication.

OpenIB/connect supports UDCM, XOOB, OOB, and RDMACM.
OFACM supports (at least the last time we checked) OOB and XOOB.

RDMACM was not moved to OFACM, because of iWarp's "first message" requirement 
that used to break the abstraction.
Moreover, RDMACM scalability used to be terrible; as a result, no one in the IB 
community really used it.
The situation is a bit different today, since ROCEE relies on RDMACM. It is worth 
noting that you may set up ROCEE connections with a regular OOB, with some 
restrictions (we did it for mvapich-1).

The code in ofacm and openib is similar, but NOT the same. We changed the 
API so that XRC QP management (there is a hash table that manages the QP-to-EP 
mapping) is hidden in OFACM instead of OPENIB.
This made the openib initialization code a bit cleaner. Here is my old tree with 
the openib btl changes: https://bitbucket.org/pasha/ofacm

I hope it helps,

Best,
Pasha

On Nov 14, 2013, at 1:17 PM, Joshua Ladd <josh...@mellanox.com> wrote:

> Unless someone went in and "fixed" the code in common (judging by the 
> comments, fixed seems to imply porting (x)oob to use UDCM, which hasn't been 
> done at all in the context of xoob and is incompletely patched and remains 
> unusable as a replacement for oob in 1.7.4), there is no reason to believe it 
> would work any different than the cpcs under btl/openib/connect. IIRC, it's 
> the same code - copy/pasted - just moved to a common location so Cheetah 
> collectives can do their wireup. So, if oob cpc doesn't work, ofacm oob won't 
> work either and, I guess, by extension, Cheetah IBoffload won't work. Pasha, 
> correct me if you know different. 
> 
> 
> Josh
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph 
> Castain
> Sent: Thursday, November 14, 2013 1:05 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn-full] svn:open-mpi 
> r29703 - in trunk: contrib/platform/iu/odin ompi/mca/btl/openib 
> ompi/mca/btl/openib/connect
> 
> 
> On Nov 14, 2013, at 9:33 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:
> 
>> On 11/14/13 9:51 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>> 
>>> Does XRC work with the UDCM CPC?
>>> 
>>> 
>>> On Nov 14, 2013, at 9:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>> I think the problems in udcm were fixed by Nathan quite some time 
>>>> ago, but never moved to 1.7 as everyone was told that the connect 
>>>> code in openib was already deprecated pending merge with the new 
>>>> ofacm common code. Looking over at that area, I see only oob and 
>>>> xoob - so if the users of the common ofacm code are finding that it 
>>>> works, the simple answer may just be to finally complete the switchover.
>>>> 
>>>> Meantime, perhaps someone can CMR and review a copying of the udcm 
>>>> cpc to the 1.7 branch?
>>>> 
>>>> 
>>>> On Nov 14, 2013, at 5:14 AM, Joshua Ladd <josh...@mellanox.com> wrote:
>>>> 
>>>>> Um, no. It's supposed to work with UDCM which doesn't appear to be 
>>>>> enabled in 1.7.
>>>>> 
>>>>> Per Ralph's comment to me last night:
>>>>> 
>>>>> "... you cannot use the oob connection manager. It doesn't work 
>>>>> and was deprecated. You must use udcm, which is why things are 
>>>>> supposed to be set to do so by default. Please check the openib 
>>>>> connect priorities and correct them if necessary."
>>>>> 
>>>>> However, it's never been enabled in 1.7 - don't know what "borked"
>>>>> means, and from what Devendar tells me, several UDCM commits that 
>>>>> are in the trunk have not been pushed over to 1.7:
>>>>> 
>>>>> So, as of this moment, OpenIB BTL is essentially dead-in-the-water 
>>>>> in 1.7.
>>>>> 
>>>>> 
>>>>> 
>> 
>> I'm going to start by admitting that I haven't been paying attention 
>> to IB the last couple of months, so I'm out of my league a little bit 
>> here.  I remember discussions of UDCM replacing OOB both because the 
>> OOB CPC had some issues and because it would make it easier to move 
>> the BTLs to the OPAL layer (ie, below the OOB).  But I also thought 
>> that was more future work than it clearly was.  So can someone let me know:
>> 
>> 1) What the status of UDCM is (does it work reliably, does it support 
>> XRC, etc.)
> 
> Seems to be working okay on the IB systems at LANL and IU. Don't know about 
> XRC - I seem to recall the answer is "no"
> 
>> 2) What's the difference between CPCs and OFACM and what's our plans 
>> w.r.t 1.7 there?
> 
> Pasha created ofacm because some of the collective components now need to 
> forge connections. So he created the common/ofacm code to meet those needs, 
> with the intention of someday replacing the openib cpc's with the new common 
> code. However, this was stalled by the iWarp issue, and so it fell off the 
> table.
> 
> We now have two duplicate ways of doing the same thing, but with code 
> in two different places. :-(
> 
>> 3) Someone mentioned that ofacm oob worked, but cpc oob didn't.  Can 
>> someone explain why?
> 
> I'm not sure that is actually true as there is no indication that anyone is 
> using or testing the collective components that use ofacm code.
> 
> 
>> 
>> Again, sorry for being dense; I've been spending too much time in 
>> Portals land lately.
>> 
>> Brian
>> 
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
