Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier
 > > Well, I think you are right. This kind of change seems appropriate to
 > > me for mainline, but OFED/RHEL should carry a responsibility to manage
 > > an identified incompatibility, either patch their kernel, patch their
 > > OMPI, or publish an errata. That is the role of a distribution.
 > 
 > RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
 > Thing.  They don't do a lot of testing, validating, etc.

In that case OFED plays the role of distribution.
-- 
Roland Dreier 
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Sean Hefty wrote:
>> There is still some inconsistency here.   Sean, you claimed binds to
>> 127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
>> IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...
>> 
>
> You can verify this by running ucmatose -b 127.0.0.1 and see if the test enters
> the listening state.
>   
Well ofed-1.4.1 with openmpi gets failures when binding to 127.0.0.1 on 
mthca devs.  Jeff will post the results soon.

Are you sure ucmatose is really binding to that address? :)

> Can you also try testing iwarp with the patch that I sent? 
>
>   

I will soon.  Can't do it right now.  I'll try tonight or tomorrow.




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 4:53 PM, Steve Wise wrote:

> There is still some inconsistency here.   Sean, you claimed binds to
> 127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
> IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...

FWIW, I can run Open MPI v1.4.2beta on my OFED 1.4.1 cluster over IB devices 
using RDMA CM with no problems.  

I added some debug statements in OMPI showing which rdma_cm_bind's it attempts, 
just to be sure.  Here's a run across 2 nodes, each with a single 2-port mthca 
(each port connected to a different IB subnet, not that that matters):

$ mpirun -np 2 --bynode --mca btl_openib_cpc_include rdmacm ring
[svbu-mpi025:05592] FAILED to bind to 127.0.0.1
[svbu-mpi025:05592] FAILED to bind to 172.29.218.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.30.165
[svbu-mpi025:05592] SUCCEEDED to bind to 10.10.20.165
[svbu-mpi026:05529] FAILED to bind to 127.0.0.1
[svbu-mpi026:05529] FAILED to bind to 172.29.218.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.30.166
[svbu-mpi026:05529] SUCCEEDED to bind to 10.10.20.166
...

The 172.x address is my gigE device (eth0).

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
>There is still some inconsistency here.   Sean, you claimed binds to
>127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
>IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...

You can verify this by running ucmatose -b 127.0.0.1 and see if the test enters
the listening state.

Can you also try testing iwarp with the patch that I sent? 

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Jeff Squyres wrote:
> On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:
>
>   
>> Well, I think you are right. This kind of change seems appropriate to
>> me for mainline, but OFED/RHEL should carry a responsibility to manage
>> an identified incompatibility, either patch their kernel, patch their
>> OMPI, or publish an errata. That is the role of a distribution.
>> 
>
> RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
> Thing.  They don't do a lot of testing, validating, etc.
>
>   
>> Sounds like this is taken care of for now anyhow; Sean's patch to remove
>> it for iwarp, since it doesn't work today with any iwarp drivers, does
>> obscure the problem. But it does seem like rdma_cm mode for IB
>> networks will still be broken in OMPI with the new kernels.
>> 
>
> Correct.
>
> So why not back off putting this in the kernel that's coming out now now now? 
>  Why not put it in *next* kernel?  (or even better, the one after that)
>
> Is there a rush / need to have this in *now*?
>
>   

There is still some inconsistency here.   Sean, you claimed binds to 
127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running 
IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:

> Well, I think you are right. This kind of change seems appropriate to
> me for mainline, but OFED/RHEL should carry a responsibility to manage
> an identified incompatibility, either patch their kernel, patch their
> OMPI, or publish an errata. That is the role of a distribution.

RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
Thing.  They don't do a lot of testing, validating, etc.

> Sounds like this is taken care of for now anyhow; Sean's patch to remove
> it for iwarp, since it doesn't work today with any iwarp drivers, does
> obscure the problem. But it does seem like rdma_cm mode for IB
> networks will still be broken in OMPI with the new kernels.

Correct.

So why not back off putting this in the kernel that's coming out now now now?  
Why not put it in *next* kernel?  (or even better, the one after that)

Is there a rush / need to have this in *now*?

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jason Gunthorpe
On Fri, Feb 05, 2010 at 03:08:10PM -0500, Jeff Squyres wrote:
> On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:
> 
> > > I think we should remove the feature of allowing binds to 127.0.0.1 
> > > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
> > > a sw-loopback mechanism anyway...
> > 
> > I don't agree, the kernel should be free to provide a loop back
> > service any way it likes, and if that means using one of the HW
> 
> Ok, fine.  Should we push back OFED 1.5.1 until Open MPI can get 1.4.2 out?  
> I don't know when that will be.
 
> In short: you're breaking backward compatibility with zero warning.
> There is real software out there that will break if people upgrade
> their kernel/OFED/RDMA CM/whatever (e.g., Open MPI).  Isn't this
> supposed to be the Enterprise distribution (meaning: stability)?
> (trying to keep the frustration out of my voice...)

Well, I think you are right. This kind of change seems appropriate to
me for mainline, but OFED/RHEL should carry a responsibility to manage
an identified incompatibility, either patch their kernel, patch their
OMPI, or publish an errata. That is the role of a distribution.

> How about this: back out the change for now.  Give everyone time to
> upgrade.  If nothing else, ***give those of us who are involved in
> this community*** time to upgrade.  Then put the feature back in
> after adequate time has passed.

I've seen this approach go badly too :( If it isn't actually in a
mainline kernel userspace devs tend to ignore it ..

Sounds like this is taken care of for now anyhow; Sean's patch to remove
it for iwarp, since it doesn't work today with any iwarp drivers, does
obscure the problem. But it does seem like rdma_cm mode for IB
networks will still be broken in OMPI with the new kernels.

Jason


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
>Ammasso and Chelsio T3 rnics do not support HW loopback.

It looks like the NES driver doesn't support 127.0.0.1, but does support
loopback connections (gurgle).  Here's an untested patch for 2.6.33
(not even compile tested) for consideration then.  I'll be testing
this shortly unless there's disagreement.


rdma/cm: disallow loopback address for iwarp devices

From: Sean Hefty 

The current RDMA iWarp devices cannot be used to establish
connections using the loopback address.  Prevent rdma_bind_addr
from associating the loopback address with an iWarp device.

This fixes an issue with openmpi, where it tries to identify which
IP addresses map to RDMA devices by calling rdma_bind_addr on
each address and seeing if the bind succeeds.  Prior to patch
6f8372b6 "RDMA/cm: fix loopback address support", this process
worked.  But the rdma_cm now allows rdma_bind_addr to bind to an
RDMA device using the loopback address, and attaches the rdma_cm_id
to the RDMA device as part of the bind.

Signed-off-by: Sean Hefty 
---

 drivers/infiniband/core/cma.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index cc9b594..5850411 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1739,6 +1739,9 @@ err:
 }
 EXPORT_SYMBOL(rdma_resolve_route);
 
+/*
+ * Only IB devices support loopback connections.
+ */
 static int cma_bind_loopback(struct rdma_id_private *id_priv)
 {
 	struct cma_device *cma_dev;
@@ -1753,11 +1756,16 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 		ret = -ENODEV;
 		goto out;
 	}
-	list_for_each_entry(cma_dev, &dev_list, list)
+	list_for_each_entry(cma_dev, &dev_list, list) {
+		if (rdma_node_get_transport(cma_dev->device->node_type) !=
+		    RDMA_TRANSPORT_IB)
+			continue;
+
 		for (p = 1; p <= cma_dev->device->phys_port_cnt; ++p)
 			if (!ib_query_port(cma_dev->device, p, &port_attr) &&
 			    port_attr.state == IB_PORT_ACTIVE)
 				goto port_found;
+	}
 
 	p = 1;
 	cma_dev = list_entry(dev_list.next, struct cma_device, list);
@@ -1771,9 +1779,7 @@ port_found:
 	if (ret)
 		goto out;
 
-	id_priv->id.route.addr.dev_addr.dev_type =
-		(rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB) ?
-		ARPHRD_INFINIBAND : ARPHRD_ETHER;
+	id_priv->id.route.addr.dev_addr.dev_type = ARPHRD_INFINIBAND;
 
 	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);





Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 1:56 PM, Jason Gunthorpe wrote:

> > I think we should remove the feature of allowing binds to 127.0.0.1 
> > altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
> > a sw-loopback mechanism anyway...
> 
> I don't agree, the kernel should be free to provide a loop back
> service any way it likes, and if that means using one of the HW

Ok, fine.  Should we push back OFED 1.5.1 until Open MPI can get 1.4.2 out?  I 
don't know when that will be.

In short: you're breaking backward compatibility with zero warning.  There is 
real software out there that will break if people upgrade their 
kernel/OFED/RDMA CM/whatever (e.g., Open MPI).  Isn't this supposed to be the 
Enterprise distribution (meaning: stability)?  (trying to keep the frustration 
out of my voice...)

This is a terrible, terrible idea.

How about this: back out the change for now.  Give everyone time to upgrade.  
If nothing else, ***give those of us who are involved in this community*** time 
to upgrade.  Then put the feature back in after adequate time has passed.

-- 
Jeff Squyres 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier
 > > That should be the patch in question.  I'm not sure about reaching
 > > consensus. :)
 > > If the other changes to the rdma_cm aren't closely tied to that change, we
 > > may be able to back that one patch out until we can get whatever other fix
 > > may be needed.

 > I'd like to do this approach.  Then re-submit once we come to consensus...

That makes sense to me.  Someone please send me a tested revert.
-- 
Roland Dreier 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Sean Hefty wrote:
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")?  This
>> just went in for 2.6.33, which is still at -rc6, so if we can quickly
>> reach a consensus, there is still time to get a fix in for 2.6.33.
>> 
>
> That should be the patch in question.  I'm not sure about reaching consensus. :)
> If the other changes to the rdma_cm aren't closely tied to that change, we may
> be able to back that one patch out until we can get whatever other fix may be
> needed.
>   

I'd like to do this approach.  Then re-submit once we come to consensus...

> In my view, openmpi has a bug in that it can pass a loopback address to a remote
> peer and expect it to be used to establish a connection.  Steve seems to agree
> with this.
>
> My original intent was to allow the use of the loopback address with the
> rdma_cm.  I.e. 127.0.0.1 meant 'this host', and not 'software loopback'.  I just
> had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
> 127.0.0.1, but never forms connections.  I.e. ucmatose -b 127.0.0.1 succeeds in
> listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
> (Hmm... I'm still confused about what openmpi is doing then.)
>   

But it must fail in OFED-1.4 if binding to an iwarp interface.   Maybe 
there was IB-only logic allowing 127.0.0.1 binds in OFED-1.4?   

The reason openmpi might still work on IB is that it's not typical to use 
the rdma-cm for IB setups.  It's required for iwarp though.

Jeff, what's the default CPC for IB devices?

> Even if an application were to use non-loopback IP addresses, there's no
> guarantee of forming a connection if those addresses map to an iwarp device.
> So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
> device (software or hardware - not sure why we care) capable of supporting it,
> an application would need to also deal with failures from rdma_resolve_addr.
>
> Indicating loopback through a device capability flag seems like the right
> approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
> calls.  That's probably not a trivial patch however.
>
> - Sean
>   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jason Gunthorpe
On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:

> I think we should remove the feature of allowing binds to 127.0.0.1  
> altogether based on Jeff's arguments and my assertion that 127.0.0.1 is  
> a sw-loopback mechanism anyway...

I don't agree, the kernel should be free to provide a loop back
service any way it likes, and if that means using one of the HW
adaptors to accelerate the work, then fine. Consider if we see the
RDMAoE (soft RDMA) patches then it would be reasonable for all
kernels to support RDMA on the loopback.

At a minimum, RDMA CM is an IP service, so whatever logic you use to
determine addresses for TCP must also be done after determining a list
of valid RDMA IPs. Trying to do RDMA CM bind just gives you the list
of candidate addresses, no different than netlink does for TCP.

One of those steps must be at least filtering 127.0.0.0/8. The user
should also be able to have some input into the IP filter - software
RDMAoE for instance really makes this important.
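
A minimal sketch of the 127.0.0.0/8 filtering step, assuming IPv4
dotted-quad input. The function name ipv4_is_loopback is made up for
illustration; it is not part of Open MPI or librdmacm.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Returns 1 if the dotted-quad string lies in 127.0.0.0/8, 0 if it is a
 * valid IPv4 address outside that range, and -1 if it does not parse.
 */
int ipv4_is_loopback(const char *text)
{
	struct in_addr addr;

	assert(text != NULL);
	if (inet_pton(AF_INET, text, &addr) != 1)
		return -1;

	/* s_addr is in network byte order; convert before masking. */
	return (ntohl(addr.s_addr) & 0xff000000u) == 0x7f000000u;
}
```

A caller would run this over the candidate list that the bind attempts
produce and drop every address for which it returns 1, in addition to
any user-supplied filter.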

Jason


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier
 > I think we should remove the feature of allowing binds to 127.0.0.1
 > altogether based on Jeff's arguments and my assertion that 127.0.0.1
 > is a sw-loopback mechanism anyway...

Well, someone propose a patch please.
-- 
Roland Dreier 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
>Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")?  This
>just went in for 2.6.33, which is still at -rc6, so if we can quickly
>reach a consensus, there is still time to get a fix in for 2.6.33.

That should be the patch in question.  I'm not sure about reaching consensus. :)
If the other changes to the rdma_cm aren't closely tied to that change, we may
be able to back that one patch out until we can get whatever other fix may be
needed.

In my view, openmpi has a bug in that it can pass a loopback address to a remote
peer and expect it to be used to establish a connection.  Steve seems to agree
with this.

My original intent was to allow the use of the loopback address with the
rdma_cm.  I.e. 127.0.0.1 meant 'this host', and not 'software loopback'.  I just
had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
127.0.0.1, but never forms connections.  I.e. ucmatose -b 127.0.0.1 succeeds in
listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
(Hmm... I'm still confused about what openmpi is doing then.)

Even if an application were to use non-loopback IP addresses, there's no
guarantee of forming a connection if those addresses map to an iwarp device.
So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
device (software or hardware - not sure why we care) capable of supporting it,
an application would need to also deal with failures from rdma_resolve_addr.

Indicating loopback through a device capability flag seems like the right
approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
calls.  That's probably not a trivial patch however.
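
A rough user-space sketch of the capability-flag idea, only to make the
shape of the check concrete. Everything here is hypothetical: the thread
names no such flag, and struct demo_device, DEMO_CAP_LOOPBACK and
pick_loopback_device are invented for illustration and are not kernel
APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical capability bit; no such flag is named in the thread. */
#define DEMO_CAP_LOOPBACK 0x1u

/* Hypothetical stand-in for an RDMA device entry. */
struct demo_device {
	const char *name;
	unsigned int caps;
};

/*
 * Returns the index of the first device that advertises loopback
 * support, or -ENODEV when none does, so that a bind to the loopback
 * address can fail up front instead of failing later at connect time.
 */
int pick_loopback_device(const struct demo_device *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (devs[i].caps & DEMO_CAP_LOOPBACK)
			return (int)i;
	return -ENODEV;
}

/* Demo tables: a host with only an rnic, and a host that also has an HCA. */
const struct demo_device demo_iwarp_only[] = {
	{ "rnic0", 0 },
};
const struct demo_device demo_mixed[] = {
	{ "rnic0", 0 },
	{ "hca0", DEMO_CAP_LOOPBACK },
};
```

With the demo tables, the rnic-only host yields -ENODEV and the mixed
host selects the second entry.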

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Jeff Squyres wrote:
> On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:
>
>   
>>  > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
>>  > it's busted...
>>
>> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")?  This
>> just went in for 2.6.33, which is still at -rc6, so if we can quickly
>> reach a consensus, there is still time to get a fix in for 2.6.33.
>> 
>
> Oh oh oh!  Yes, that would be fabulous...
>
> Thanks!
>
>   

I think we should remove the feature of allowing binds to 127.0.0.1 
altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
a sw-loopback mechanism anyway...

I'm not sure if that commit does more or not...

Steve.



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:

>  > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
>  > it's busted...
> 
> Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")?  This
> just went in for 2.6.33, which is still at -rc6, so if we can quickly
> reach a consensus, there is still time to get a fix in for 2.6.33.

Oh oh oh!  Yes, that would be fabulous...

Thanks!

-- 
Jeff Squyres 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 11:16 AM, Steve Wise wrote:

> > Note that it is highly unlikely that we will release open mpi 1.4.2 in
> > time for ofed 1.5.1.
> 
> Jeff, there is no way to handle high priority bug fixes in the current
> released stream?

We have 1.4.2 cooking, but it's not ready yet.  

I'll take it back to the OMPI community to see if they want to do a 
high-priority release, but I'm not excited about it (see below).

> > Also note that trying to bind rdma cm to all interface ip addresses
> > was the way that we were advised by openfabrics to figure out which
> > devices are rdma-capable.
> >
> > As such, it is highly desirable to get the fix transparently in rdmacm
> > and preserve the old semantic. More specifically, it seems undesirable
> > to change this semantic in a minor ofed point release.
> 
> I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1
> at all because it regresses OpenMPI.  Even with IB systems, if the bind
> to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that
> rdma interface and advertises this address to its peer as an address
> to which that peer can rdma connect!  This will break IB clusters too,
> not just T3/iWARP clusters.   While I think OpenMPI needs to skip
> 127.0.0.1 in its logic, I think we should probably defer allowing
> 127.0.0.1 binds until ofed-1.6.

I agree that Open MPI should not advertise 127.0.0.1 to peers.  However, the 
logic that we were advised to use was to try to RDMA CM bind to each IP 
address.  If the bind succeeds, then it's an RDMA-capable device and therefore 
it's advertisable.  The rationale was that 127.0.0.1 (really, any loopback 
address) is *not* an RDMA device and therefore the RDMA CM bind should *never* 
succeed on it.  Hence, it wasn't necessary to add a "is this a loopback 
address?" check in the logic.

I guess I don't understand why that rationale is now incorrect -- 127.0.0.1 is 
still not an RDMA-capable device, right?
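
A minimal sketch of the probe-and-advertise logic described above, with
an explicit loopback check added in front of the bind attempt. The bind
itself is abstracted behind a callback so the sketch stays
self-contained; bind_probe_fn, select_advertisable, demo_probe and the
demo arrays are hypothetical names, not Open MPI or librdmacm code. The
demo addresses are the ones from the mthca run quoted earlier in the
thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for one RDMA CM bind attempt (for example a
 * wrapper around rdma_bind_addr); returns 0 when the bind succeeds.
 */
typedef int (*bind_probe_fn)(const char *ip);

/* Returns 1 for IPv4 addresses in 127.0.0.0/8, 0 otherwise. */
static int is_ipv4_loopback(const char *ip)
{
	struct in_addr addr;

	if (inet_pton(AF_INET, ip, &addr) != 1)
		return 0;
	return (ntohl(addr.s_addr) >> 24) == 127;
}

/*
 * Copies into out[] every address that is not loopback and for which
 * the probe succeeds. Returns the number of addresses kept; out[] must
 * have room for n entries.
 */
size_t select_advertisable(const char *const *ips, size_t n,
			   bind_probe_fn probe, const char **out)
{
	size_t kept = 0;

	assert(ips != NULL && probe != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (is_ipv4_loopback(ips[i]))
			continue;	/* never advertise 127.0.0.0/8 to a peer */
		if (probe(ips[i]) == 0)
			out[kept++] = ips[i];
	}
	return kept;
}

/* Demo probe mimicking a kernel where loopback binds succeed. */
int demo_probe(const char *ip)
{
	return (strncmp(ip, "127.", 4) == 0 ||
		strncmp(ip, "10.", 3) == 0) ? 0 : -1;
}

/* Demo input and output buffers. */
const char *const demo_ips[] = {
	"127.0.0.1", "172.29.218.165", "10.10.30.165", "10.10.20.165",
};
const char *demo_out[4];
```

With the demo inputs, 127.0.0.1 is dropped even though the demo probe
reports a successful bind, and only the two 10.10.x.x addresses are
kept.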

> But Jeff, note that if someone uses the upstream kernel and OpenMPI, it's
> busted...
> 
> So I recommend:
> 
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
> 
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm
> connect address (get it in ofed-1.5.2 or ofed-1.6).

We can add this logic (because I understand that some upstream kernels now 
allow binding to loopback addresses), but I'm still confused (in principle) as 
to why it should be necessary.

Can you clarify what kernel versions allow binding LOOPBACK addresses with RDMA 
CM?

-- 
Jeff Squyres 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier
 > But Jeff, note that if someone uses the upstream kernel and OpenMPI,
 > it's busted...

Is the issue 6f8372b6 ("RDMA/cm: fix loopback address support")?  This
just went in for 2.6.33, which is still at -rc6, so if we can quickly
reach a consensus, there is still time to get a fix in for 2.6.33.

 - R.
-- 
Roland Dreier 
Cisco.com - http://www.cisco.com



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise

Sean Hefty wrote:
>> My concern is breaking an existing working OpenMPI in a point release
>> because we changed semantics of the rdma-cm in an ofed point release...
>> 
>
> OFED can call this release a point release, but in reality, the content makes it
> a major release...
>
>   
>> BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>> version?
>> 
>
> apparently
>
>   

Well as it stands now:  OpenMPI on ofed-1.5.1 is broken for IB clusters that 
use the rdma-cm for connection setup, and for all iWARP clusters, which require 
the rdma-cm connect method. 



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
>My concern is breaking an existing working OpenMPI in a point release
>because we changed semantics of the rdma-cm in an ofed point release...

OFED can call this release a point release, but in reality, the content makes it
a major release...

>BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel
>version?

apparently



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise

> I agree that we should probably not allow 127.0.0.1 binds in 
> ofed-1.5.1 at all because it regresses OpenMPI.  Even with IB systems, 
> if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is 
> bound to that rdma interface and advertises this address to its peer 
> as an address to which that peer can rdma connect!  This will break IB 
> clusters too, not just T3/iWARP clusters.   While I think OpenMPI needs 
> to skip 127.0.0.1 in its logic, I think we should probably defer 
> allowing 127.0.0.1 binds until ofed-1.6.
>
> But Jeff, note that if someone uses the upstream kernel and OpenMPI, 
> it's busted...
>
> So I recommend:
>
> 1) Don't allow 127.0.0.1 binds in ofed-1.5.1
>
> 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
> connect address (get it in ofed-1.5.2 or ofed-1.6).

Also, there is a good argument for never allowing 127.0.0.1 for rdma 
anyway.  It implies a _software_ loopback.  It should NEVER be bound to 
a real NIC interface and thus rdma binds shouldn't be allowed to it 
since there is no software rdma loopback support...

Unless someone implements software rdma loopback...  ;)




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Sean Hefty wrote:
>> Also note that trying to bind rdma cm to all interface ip addresses was the way
>> that we were advised by openfabrics to figure out which devices are
>> rdma-capable.
>>
>> As such, it is highly desirable to get the fix transparently in rdmacm and
>> preserve the old semantic. More specifically, it seems undesirable to change
>> this semantic in a minor ofed point release.
>> 
>
> I think the issue is larger than just the rdma_cm.
>
> First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
> openmpi uses shared memory for connections on the same machine, I'm not sure why
> this is a problem, unless it is passing that address to another machine to use
> for a connection.  If this is the case, then that is a bug in openmpi.
>   

Yes, OpenMPI incorrectly advertises 127.0.0.1 as a valid address 
to which the peer can connect. This needs to be fixed.


> Second, I still don't understand whether iwarp is limited to 'loopback'
> connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
> is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
> <-> 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
> bind is called.  The failure has to come during connect, which sounds like the
> behavior that's seen today with 127.0.0.1.
>   

It's not iWARP specific.  A device may or may not support hw loopback.
Now the IB spec mandates this support, but the iWARP spec doesn't.  
Ammasso and Chelsio T3 rnics do not support HW loopback.  They will fail 
if you try to connect to a local address.  The rdma-cm shouldn't allow 
binds to 127.0.0.1 for these devices since it 100% implies that the 
connection will require hw loopback for that device.

> So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
> support loopback, I'm still not sure how much of a fix this is.
>   

My concern is breaking an existing working OpenMPI in a point release 
because we changed semantics of the rdma-cm in an ofed point release...

BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel 
version?

Steve.

> - Sean
>
>   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
>Also note that trying to bind rdma cm to all interface ip addresses was the way
>that we were advised by openfabrics to figure out which devices are
>rdma-capable.
>
>As such, it is highly desirable to get the fix transparently in rdmacm and
>preserve the old semantic. More specifically, it seems undesirable to change
>this semantic in a minor ofed point release.

I think the issue is larger than just the rdma_cm.

First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
openmpi uses shared memory for connections on the same machine, I'm not sure why
this is a problem, unless it is passing that address to another machine to use
for a connection.  If this is the case, then that is a bug in openmpi.

Second, I still don't understand whether iwarp is limited to 'loopback'
connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
<-> 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
bind is called.  The failure has to come during connect, which sounds like the
behavior that's seen today with 127.0.0.1.

So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
support loopback, I'm still not sure how much of a fix this is.

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Jeff Squyres (jsquyres) wrote:
>
> Note that it is highly unlikely that we will release open mpi 1.4.2 in 
> time for ofed 1.5.1.
>

Jeff, there is no way to handle high priority bug fixes in the current 
released stream?

> Also note that trying to bind rdma cm to all interface ip addresses 
> was the way that we were advised by openfabrics to figure out which 
> devices are rdma-capable.
>
> As such, it is highly desirable to get the fix transparently in rdmacm 
> and preserve the old semantic. More specifically, it seems undesirable 
> to change this semantic in a minor ofed point release.
>

I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1 
at all because it regresses OpenMPI.  Even with IB systems, if the bind 
to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that 
rdma interface and advertises this address to its peer as an address 
to which that peer can rdma connect!  This will break IB clusters too, 
not just T3/iWARP clusters.   While I think OpenMPI needs to skip 
127.0.0.1 in its logic, I think we should probably defer allowing 
127.0.0.1 binds until ofed-1.6.

But Jeff, note that if someone uses the upstream kernel and OpenMPI, it's 
busted...

So I recommend:

1) Don't allow 127.0.0.1 binds in ofed-1.5.1

2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
connect address (get it in ofed-1.5.2 or ofed-1.6).
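
To make recommendation 2 concrete, here is a minimal sketch of the filtering step, assuming the application already holds the list of addresses on which its rdma-cm binds succeeded. The function name `advertisable_addresses` is hypothetical and is not taken from the OpenMPI source; only the standard Python `ipaddress` module is used.

```python
import ipaddress


def advertisable_addresses(bound_addrs):
    """Return the bound addresses that may be advertised to a remote peer.

    Hypothetical helper: a successful bind to a loopback address says
    nothing a remote peer can use, so such addresses are dropped.
    """
    result = []
    for addr in bound_addrs:
        if ipaddress.ip_address(addr).is_loopback:
            # 127.0.0.0/8 and ::1 are only meaningful on the local host.
            continue
        result.append(addr)
    return result


# A bind to 127.0.0.1 succeeded, but only the device address is advertised.
print(advertisable_addresses(["127.0.0.1", "192.168.0.1"]))
```

With this kind of filter in place, whether the rdma-cm accepts or rejects a 127.0.0.1 bind no longer changes what gets advertised to the peer.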



Steve.


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres (jsquyres)
Note that it is highly unlikely that we will release open mpi 1.4.2 in time for 
ofed 1.5.1. 

Also note that trying to bind rdma cm to all interface ip addresses was the way 
that we were advised by openfabrics to figure out which devices are 
rdma-capable. 

As such, it is highly desirable to get the fix transparently in rdmacm and 
preserve the old semantic. More specifically, it seems undesirable to change 
this semantic in a minor ofed point release. 
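
The probing approach described above can be sketched as follows. This is only an illustration of the logic, not the OpenMPI code: `try_bind` is a hypothetical stand-in for an rdma-cm bind attempt, and `old_bind` / `new_bind` are made-up models of the behaviour before and after the rdma-cm change discussed in this thread. The addresses are examples.

```python
def rdma_capable_addresses(interface_addrs, try_bind):
    """Treat every address whose bind attempt succeeds as RDMA-capable."""
    capable = []
    for addr in interface_addrs:
        if try_bind(addr):
            capable.append(addr)
    return capable


def old_bind(addr):
    # Assumed old semantic: only the address on the RDMA device binds.
    return addr == "192.168.0.1"


def new_bind(addr):
    # Assumed new semantic: a bind to 127.0.0.1 now succeeds as well.
    return addr in ("192.168.0.1", "127.0.0.1")


addrs = ["127.0.0.1", "10.0.0.5", "192.168.0.1"]
print(rdma_capable_addresses(addrs, old_bind))  # loopback is filtered out
print(rdma_capable_addresses(addrs, new_bind))  # loopback is now reported
```

Under the new semantic the probe reports 127.0.0.1 as RDMA-capable, which is why the same application code behaves differently after the change.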

-jms
Sent from my PDA.  No type good.

- Original Message -
From: Steve Wise 
To: Sean Hefty 
Cc: linux-rdma ; OpenFabrics EWG 
; Jeff Squyres (jsquyres); Roland Dreier (rdreier)
Sent: Thu Feb 04 18:04:23 2010
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes

Sean Hefty wrote:
>> Well then the rdma-cm needs to know which devices support hw loopback.
>> Cuz on a T3-only system, no hwloop...
>> 
>
> The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
> check may fix openmpi, but it sounds more like the app needs to know whether
> the device can actually support loopback, regardless of what addresses are
> used.  Is this correct?
>
> What would openmpi do if there were two addresses assigned to the T3 device?
>   

It would use them and might even create two connections.

> Does openmpi simply bypass RDMA for all connections on the local machine?
>
>   

OpenMPI can be run to use hw loopback if it's available.  For T3 
clusters, OMPI is run in a mode to use shared memory for intra-node 
communications.


> Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
> definitely appears that some sort of change needs to be made to the rdma_cm.
>
>   

I think the OpenMPI rdmacm code needs to skip 127.0.0.1, in this 
particular case.  Prior to ofed-1.5.1, however, the bind would fail and 
thus OpenMPI would not advertise 127.0.0.1 to its peer.  I will work to 
get that change done.

But let's also add a device attribute so the rdmacm can know if a device 
supports loopback.   Clearly, if the rdma-cm allows binds to T3, 
loopback connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback 
support?
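
A rough sketch of how such an attribute could drive the bind decision is shown below. Everything here is an assumption for illustration: the `hw_loopback` field, the `Device` class, the `allow_bind` function and the device names are hypothetical, and the real check would live in the kernel rdma-cm rather than in Python.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    hw_loopback: bool  # hypothetical attribute: device can loop back in hardware


def allow_bind(addr, devices):
    """Decide whether a bind to addr should succeed (assumed policy).

    Non-loopback addresses are not affected by this check. A loopback
    address is accepted only if at least one device reports hw loopback.
    """
    if not ipaddress.ip_address(addr).is_loopback:
        return True
    return any(dev.hw_loopback for dev in devices)


t3_only = [Device("iwarp_t3", hw_loopback=False)]
with_ib = [Device("iwarp_t3", hw_loopback=False), Device("ib_hca", hw_loopback=True)]

print(allow_bind("127.0.0.1", t3_only))  # bind rejected up front
print(allow_bind("127.0.0.1", with_ib))  # bind accepted
```

The point of the sketch is that the failure moves from connect time to bind time on a system with no loopback-capable device.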


Steve.



[ewg] ofa_1_5_kernel 20100205-0200 daily build status

2010-02-05 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
Build failed on x86_64 with linux-2.6.18-164.el5
Log:
/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1832: warning: assignment from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c: In function 'iscsi_transport_init':
/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1935: warning: passing argument 3 of 'netlink_kernel_create' from incompatible pointer type
/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.c:1949: error: implicit declaration of function 'netlink_kernel_release'
make[3]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi/scsi_transport_iscsi.o] Error 1
make[2]: *** [/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check/drivers/scsi] Error 2
make[1]: *** [_module_/home/vlad/tmp/ofa_1_5_kernel-20100205-0200_linux-2.6.18-164.el5_x86_64_check] Error 2
make[1]: Leaving directory `/home/vlad/kernel.org/x86_64/linux-2.6.18-164.el5'
make: *** [kernel] Error 2
--