Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-07 Thread Tziporet Koren
On 2/7/2010 3:22 AM, Steve Wise wrote:


 Good catch, I'll update the patch and submit for 2.6.33 on Monday.



 NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.


If this patch is accepted into kernel 2.6.33, we can take it too.

Tziporet
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-07 Thread Steve Wise
Roland Dreier wrote:
   My point, though, is that even with this patch in ofed-1.5.1, we still
   have an openmpi/IB/rdmacm regression.  The only way to avoid this
   regression without changing openmpi is to disallow _all_ rdma binds to
   127.0.0.1.

 Can you identify the source of the regression?  ie what was the change
 that broke things?

   

It is the same commit you cited earlier.  It enables binding rdma cm_ids
to 127.0.0.1.  Sean's proposed patch on top of that disables this only 
for iwarp devices.


 I'm most concerned that there is another regression in 2.6.33, and if so
 I would like to try and avoid letting that get into the final release.
   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-06 Thread Steve Wise
Sean Hefty wrote:
 There is still some inconsistency here.   Sean, you claimed binds to
 127.0.0.1 succeed in ofed-1.4 for IB devices.  If so, then folks running
 IB/openmpi/rdmacm should be seeing issues.  We need to dig a little more...
 

 You can verify this by running ucmatose -b 127.0.0.1 and see if the test
 enters the listening state.

 Can you also try testing iwarp with the patch that I sent? 

   

I backported your patch to ofed-1.5.1 and tried it, and apparently binds 
to 127.0.0.1 are still working even though the only device in the system 
is iWARP.  I'm debugging now.


Steve.


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-06 Thread Steve Wise

   
 Good catch, I'll update the patch and submit for 2.6.33 on Monday.

 

NOTE: This doesn't solve our IB/openmpi regression for ofed-1.5.1.



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres (jsquyres)
Note that it is highly unlikely that we will release open mpi 1.4.2 in time for 
ofed 1.5.1. 

Also note that trying to bind rdma cm to all interface ip addresses was the way 
that we were advised by openfabrics to figure out which devices are 
rdma-capable. 

As such, it is highly desirable to get the fix transparently in rdmacm and 
preserve the old semantic. More specifically, it seems undesirable to change 
this semantic in a minor ofed point release. 

-jms
Sent from my PDA.  No type good.

- Original Message -
From: Steve Wise sw...@opengridcomputing.com
To: Sean Hefty sean.he...@intel.com
Cc: linux-rdma linux-r...@vger.kernel.org; OpenFabrics EWG 
e...@openfabrics.org; Jeff Squyres (jsquyres); Roland Dreier (rdreier)
Sent: Thu Feb 04 18:04:23 2010
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes

Sean Hefty wrote:
 Well then the rdma-cm needs to know which devices support hw loopback.
 Cuz on a T3-only system, no hwloop...
 

 The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
 check may fix openmpi, but it sounds more like the app needs to know whether the
 device can actually support loopback, regardless of what addresses are used.  Is
 this correct?

 What would openmpi do if there were two addresses assigned to the T3 device?
   

It would use them and might even create two connections.

 Does openmpi simply bypass RDMA for all connections on the local machine?

   

OpenMPI can be run to use hw loopback if it's available.  For T3
clusters, OMPI is run in a mode to use shared memory for intra-node 
communications.


 Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
 definitely appears that some sort of change needs to be made to the rdma_cm.

   

I think the OpenMPI rdmacm code needs to skip 127.0.0.1, in this 
particular case.  Prior to ofed-1.5.1, however, the bind would fail and 
thus OpenMPI would not advertise 127.0.0.1 to its peer.  I will work to 
get that change done.

But let's also add a device attribute so the rdmacm can know if a device
supports loopback.   Clearly, if the rdma-cm allows binds to T3, 
loopback connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback 
support?


Steve.



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Jeff Squyres (jsquyres) wrote:

 Note that it is highly unlikely that we will release open mpi 1.4.2 in 
 time for ofed 1.5.1.


Jeff, there is no way to handle high priority bug fixes in the current 
released stream?

 Also note that trying to bind rdma cm to all interface ip addresses 
 was the way that we were advised by openfabrics to figure out which 
 devices are rdma-capable.

 As such, it is highly desirable to get the fix transparently in rdmacm 
 and preserve the old semantic. More specifically, it seems undesirable 
 to change this semantic in a minor ofed point release.


I agree that we should probably not allow 127.0.0.1 binds in ofed-1.5.1 
at all because it regresses OpenMPI.  Even with IB systems, if the bind 
to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is bound to that 
rdma interface and advertises this address to its peer as an address 
to-which that peer can rdma connect!  This will break IB clusters too, 
not just T3/iWARP clusters.   While I think OpenMPI needs to skip
127.0.0.1 in its logic, I think we should probably defer allowing 
127.0.0.1 binds until ofed-1.6.

But Jeff, note that if someone uses the upstream kernel and OpenMPI, it's
busted...

So I recommend:

1) Don't allow 127.0.0.1 binds in ofed-1.5.1

2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
connect address (get it in ofed-1.5.2 or ofed-1.6).
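
Recommendation 2 can be sketched as a small filter applied before any
address is sent to a peer.  This is only an illustration in Python, not
OpenMPI's actual code: the function name advertisable() and the sample
address list are assumptions made up for this example.

```python
import ipaddress


def advertisable(addresses):
    """Return the addresses that are safe to send to a remote peer.

    Loopback addresses (127.0.0.0/8 and ::1) only make sense on the local
    host, so they are dropped even if a local bind to them succeeded.
    """
    return [a for a in addresses if not ipaddress.ip_address(a).is_loopback]


# Hypothetical list of addresses on which a local bind succeeded.
bound = ["127.0.0.1", "192.168.0.1", "10.0.0.5"]
print(advertisable(bound))  # ['192.168.0.1', '10.0.0.5']
```

With a filter like this in place, a successful bind to 127.0.0.1 would no
longer change what the peer is told to connect to.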



Steve.


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Sean Hefty
Also note that trying to bind rdma cm to all interface ip addresses was the way
that we were advised by openfabrics to figure out which devices are
rdma-capable.

As such, it is highly desirable to get the fix transparently in rdmacm and
preserve the old semantic. More specifically, it seems undesirable to change
this semantic in a minor ofed point release.

I think the issue is larger than just the rdma_cm.

First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
openmpi uses shared memory for connections on the same machine, I'm not sure why
this is a problem, unless it is passing that address to another machine to use
for a connection.  If this is the case, then that is a bug in openmpi.

Second, I still don't understand whether iwarp is limited to 'loopback'
connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
-> 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
bind is called.  The failure has to come during connect, which sounds like the
behavior that's seen today with 127.0.0.1.

So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
support loopback, I'm still not sure how much of a fix this is.
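
The point that a same-host connection does not have to involve 127.0.0.1
can be shown with a short Python sketch.  It is an assumption-laden
illustration: is_local_connection() and the address lists are hypothetical,
and nothing here talks to the rdma_cm.

```python
import ipaddress


def is_local_connection(dst, local_addrs):
    """Return True if dst refers to this host.

    A destination is local if it is a loopback address or if it equals one
    of the addresses assigned to a local interface, so 192.168.0.1 ->
    192.168.0.1 needs device loopback just as 127.0.0.1 does.
    """
    ip = ipaddress.ip_address(dst)
    local = {ipaddress.ip_address(a) for a in local_addrs}
    return ip.is_loopback or ip in local


# Hypothetical host whose RDMA device is associated with 192.168.0.1.
local_addrs = ["192.168.0.1"]
print(is_local_connection("192.168.0.1", local_addrs))  # True
print(is_local_connection("192.168.0.2", local_addrs))  # False
```

A check like this would have to run at connect time, since at bind time the
destination is not known yet.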

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Sean Hefty wrote:
 Also note that trying to bind rdma cm to all interface ip addresses was the way
 that we were advised by openfabrics to figure out which devices are
 rdma-capable.

 As such, it is highly desirable to get the fix transparently in rdmacm and
 preserve the old semantic. More specifically, it seems undesirable to change
 this semantic in a minor ofed point release.
 

 I think the issue is larger than just the rdma_cm.

 First, it sounds like openmpi tries to bind to 127.0.0.1, which now works.  If
 openmpi uses shared memory for connections on the same machine, I'm not sure why
 this is a problem, unless it is passing that address to another machine to use
 for a connection.  If this is the case, then that is a bug in openmpi.
   

Yes, OpenMPI incorrectly advertises 127.0.0.1 as a valid address 
to-which the peer can connect. This needs to be fixed.


 Second, I still don't understand whether iwarp is limited to 'loopback'
 connections that are not bound to 127.0.0.1.  For instance, if the RDMA device
 is associated with 192.168.0.1, then can it handle a connection from 192.168.0.1
 -> 192.168.0.1?  If it can't, then the rdma_cm can't help in this case when
 bind is called.  The failure has to come during connect, which sounds like the
 behavior that's seen today with 127.0.0.1.
   

It's not iWARP specific.  A device may or may not support hw loopback.
Now the IB spec mandates this support, but the iWARP spec doesn't.  
Ammasso and Chelsio T3 rnics do not support HW loopback.  They will fail 
if you try to connect to a local address.  The rdma-cm shouldn't allow 
binds to 127.0.0.1 for these devices since it 100% implies that the 
connection will require hw loopback for that device.

 So, while the rdma_cm can fail binds to 127.0.0.1 if the RDMA device doesn't
 support loopback, I'm still not sure how much of a fix this is.
   

My concern is breaking an existing working OpenMPI in a point release 
because we changed semantics of the rdma-cm in an ofed point release...

BTW:  Was this change an artifact of rebasing ofed-1.5.1 on a new kernel 
version?

Steve.

 - Sean

   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise

 I agree that we should probably not allow 127.0.0.1 binds in 
 ofed-1.5.1 at all because it regresses OpenMPI.  Even with IB systems, 
 if the bind to 127.0.0.1 succeeds, then OpenMPI assumes 127.0.0.1 is 
 bound to that rdma interface and advertises this address to its peer 
 as an address to-which that peer can rdma connect!  This will break IB 
 clusters too, not just T3/iWARP cluster.   While I think OpenMPI needs 
 to skip 127.0.0.1 in its logic, I think we should probably defer 
 allowing 127.0.0.1 binds until ofed-1.6.

 But Jeff, note that if someone uses the upstream kernel and OpenMPI, 
 it's busted...

 So I recommend:

 1) Don't allow 127.0.0.1 binds in ofed-1.5.1

 2) Fix OpenMPI ASAP to never advertise 127.0.0.1 as a valid rdma-cm 
 connect address (get it in ofed-1.5.2 or ofed-1.6).

Also, there is a good argument for never allowing 127.0.0.1 for rdma 
anyway.  It implies a _software_ loopback.  It should NEVER be bound to 
a real NIC interface and thus rdma binds shouldn't be allowed to it 
since there is no software rdma loopback support...

Unless someone implements software rdma loopback...  ;)




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:

   But Jeff, note that if someone uses the upstream kernel and OpenMPI,
   it's busted...
 
 Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
 just went in for 2.6.33, which is still at -rc6, so if we can quickly
 reach a consensus, there is still time to get a fix in for 2.6.33.

Oh oh oh!  Yes, that would be fabulous...

Thanks!

-- 
Jeff Squyres jsquy...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Jeff Squyres wrote:
 On Feb 5, 2010, at 12:51 PM, Roland Dreier (rdreier) wrote:

   
   But Jeff, note that if someone uses the upstream kernel and OpenMPI,
   it's busted...

 Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
 just went in for 2.6.33, which is still at -rc6, so if we can quickly
 reach a consensus, there is still time to get a fix in for 2.6.33.
 

 Oh oh oh!  Yes, that would be fabulous...

 Thanks!

   

I think we should remove the feature of allowing binds to 127.0.0.1 
altogether based on Jeff's arguments and my assertion that 127.0.0.1 is 
a sw-loopback mechanism anyway...

I'm not sure if that commit does more or not...

Steve.



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jason Gunthorpe
On Fri, Feb 05, 2010 at 12:32:51PM -0600, Steve Wise wrote:

 I think we should remove the feature of allowing binds to 127.0.0.1  
 altogether based on Jeff's arguments and my assertion that 127.0.0.1 is  
 a sw-loopback mechanism anyway...

I don't agree, the kernel should be free to provide a loop back
service any way it likes, and if that means using one of the HW
adaptors to accelerate the work, then fine. Consider if we see the
RDMAoE (soft RDMA) patches then it would be reasonable for all
kernels to support RDMA on the loopback.

At a minimum, RDMA CM is an IP service, so whatever logic you use to
determine addresses for TCP must also be done after determining a list
of valid RDMA IPs. Trying to do RDMA CM bind just gives you the list
of candidate addresses, no different than netlink does for TCP.

One of those steps must be at least filtering 127.0.0.0/8. The user
should also be able to have some input into the IP filter - software
RDMAoE for instance really makes this important.
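
A minimal Python sketch of such a filter step, assuming the candidate
addresses have already been collected (for example from successful RDMA CM
binds).  The names filter_candidates() and DEFAULT_EXCLUDE are hypothetical,
and the user-supplied exclude list is just one way the "user input" could be
expressed.

```python
import ipaddress

# Networks that are excluded unless the user says otherwise.
DEFAULT_EXCLUDE = ("127.0.0.0/8",)


def filter_candidates(candidates, exclude=DEFAULT_EXCLUDE):
    """Drop every candidate address that falls inside an excluded network."""
    nets = [ipaddress.ip_network(n) for n in exclude]
    kept = []
    for addr in candidates:
        ip = ipaddress.ip_address(addr)
        # Only compare addresses and networks of the same IP version.
        if any(ip.version == net.version and ip in net for net in nets):
            continue
        kept.append(addr)
    return kept


candidates = ["127.0.0.1", "127.1.2.3", "192.168.0.1", "10.0.0.5"]
print(filter_candidates(candidates))
# A user who also wants to keep RDMA off the 10.0.0.0/8 network:
print(filter_candidates(candidates, exclude=("127.0.0.0/8", "10.0.0.0/8")))
```

Filtering on the whole 127.0.0.0/8 network rather than the single address
127.0.0.1 also covers hosts that use other loopback addresses.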

Jason


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Steve Wise
Sean Hefty wrote:
 Is the issue 6f8372b6 (RDMA/cm: fix loopback address support)?  This
 just went in for 2.6.33, which is still at -rc6, so if we can quickly
 reach a consensus, there is still time to get a fix in for 2.6.33.
 

 That should be the patch in question.  I'm not sure about reaching consensus. :)
 If the other changes to the rdma_cm aren't closely tied to that change, we may
 be able to back that one patch out until we can get whatever other fix may be
 needed.
   

I'd like to do this approach.  Then re-submit once we come to consensus...

 In my view, openmpi has a bug in that it can pass a loopback address to a remote
 peer and expect it to be used to establish a connection.  Steve seems to agree
 with this.

 My original intent was to allow the use of the loopback address with the
 rdma_cm.  I.e. 127.0.0.1 meant 'this host', and not 'software loopback'.  I just
 had Arlin run a quick test with OFED 1.4 over IB, and it allows binding to
 127.0.0.1, but never forms connections.  I.e. ucmatose -b 127.0.0.1 succeeds in
 listening, but ucmatose -s 127.0.0.1 fails to connect because of a route error.
 (Hmm... I'm still confused about what openmpi is doing then.)
   

But it must fail in OFED-1.4 if binding to an iwarp interface.   Maybe 
there was IB-only logic allowing 127.0.0.1 binds in OFED-1.4?   

The reason openmpi might still work on IB is that it's not typical to use
the rdma-cm for IB setups.  It's required for iwarp though.

 Jeff, what's the default CPC for IB devices?

 Even if an application were to use non-loopback IP addresses, there's no
 guarantee of forming a connection if those addresses map to an iwarp device.
 So, even if the rdma_cm fails binding to 127.0.0.1 unless there's some RDMA
 device (software or hardware - not sure why we care) capable of supporting it,
 an application would need to also deal with failures from rdma_resolve_addr.

 Indicating loopback through a device capability flag seems like the right
 approach, and the rdma_cm can use this to fail rdma_bind_addr/rdma_resolve_addr
 calls.  That's probably not a trivial patch however.

 - Sean
   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Roland Dreier
   That should be the patch in question.  I'm not sure about reaching 
   consensus. :)
   If the other changes to the rdma_cm aren't closely tied to that change, we may
   be able to back that one patch out until we can get whatever other fix may be
   needed.

  I'd like to do this approach.  Then re-submit once we come to consensus...

That makes sense to me.  Someone please send me a tested revert.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-05 Thread Jeff Squyres
On Feb 5, 2010, at 4:14 PM, Jason Gunthorpe wrote:

 Well, I think you are right. This kind of change seems appropriate to
 me for mainline, but OFED/RHEL should carry a responsibility to manage
 an identified incompatibility, either patch their kernel, patch their
 OMPI, or publish an errata. That is the role of a distribution.

RHEL has said, multiple times, that they rely on OpenFabrics to do the Right 
Thing.  They don't do a lot of testing, validating, etc.

 Sounds like this is taken care of for now anyhow, Sean's patch to remove
 it for iwarp since it doesn't work today with any iwarp drivers does
 obscure the problem.. But it does seem like rdma_cm mode for IB
 networks will still be broken in OMPI with the new kernels.

Correct.

So why not back off putting this in the kernel that's coming out now now now?  
Why not put it in *next* kernel?  (or even better, the one after that)

Is there a rush / need to have this in *now*?

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/



[ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
I just opened 1918.  The latest ofed-1.5.1 rdma-cm is allowing binds to
127.0.0.1.  This is a no-no for devices that don't support hw loopback...

OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
for which IB devices.   This logic is now broken.  Regardless of whether
OpenMPI should use another method for determining which IP addresses
belong to which interfaces, we should probably rethink whether we're
breaking rdma-cm semantics in a bad way on a point release.
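
To make the probing pattern concrete, here is a minimal Python model of it.
It does not call librdmacm: try_bind is a hypothetical stand-in for an
rdma_bind_addr() attempt, and the addresses and the two bind behaviours
(old_bind, new_bind) are assumptions chosen to mirror the description in
this message.

```python
def probe_rdma_addresses(addresses, try_bind):
    """Return the addresses for which try_bind(addr) reports success."""
    return [addr for addr in addresses if try_bind(addr)]


# Hypothetical host: one RDMA-capable interface at 192.168.0.1 and a plain
# Ethernet interface at 10.0.0.5.
RDMA_ADDRS = {"192.168.0.1"}


def old_bind(addr):
    # Stand-in for the earlier behaviour: only addresses that map to an
    # RDMA device can be bound.
    return addr in RDMA_ADDRS


def new_bind(addr):
    # Stand-in for the behaviour reported here: 127.0.0.1 binds as well.
    return addr in RDMA_ADDRS or addr == "127.0.0.1"


interfaces = ["127.0.0.1", "10.0.0.5", "192.168.0.1"]
print(probe_rdma_addresses(interfaces, old_bind))  # ['192.168.0.1']
print(probe_rdma_addresses(interfaces, new_bind))  # ['127.0.0.1', '192.168.0.1']
```

The probe itself is unchanged between the two runs; only the bind result
for 127.0.0.1 differs, which is why the caller ends up treating it as an
RDMA-capable address.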


Steve.


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
The more I think about this, the more I conclude the rdma-cm is just 
broken.  There's no way to determine an RDMA device from 127.0.0.1, so 
how can bind succeed?


Steve Wise wrote:
 I just opened 1918.  The latest ofed-1.5.1 rdma-cm is allowing binds to
 127.0.0.1.  This is a no-no for devices that don't support hw loopback...

 OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
 for which IB devices.   This logic is now broken.  Regardless of whether
 OpenMPI should use another method for determining which IP addresses
 belong to which interfaces, we should probably rethink whether we're
 breaking rdma-cm semantics in a bad way on a point release.


 Steve.
   



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
Sean Hefty wrote:
 OpenMPI uses rdma_bind_addr() to figure out which ip addresses are valid
 for which IB devices.   This logic is now broken.  Regardless of whether
 OpenMPI should use another method for determining which IP addresses
 belong to which interfaces, we should probably rethink whether we're
 breaking rdma-cm semantics in a bad way on a point release.
 

 The changes to the rdma_cm have been merged upstream.  These were fixes
 specifically to enable using the loopback address with RDMA devices.

 At first thought, we can extend enum ib_device_cap_flags to indicate if a device
 supports loopback capabilities or not.  The rdma_cm could then skip over such
 devices when dealing with a loopback address.

 - Sean
   

But how can you determine _which_ rdma device should be used if an app
binds to 127.0.0.1?  I think this is busted...




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
But how can you determine _which_ rdma device should be used if an app
binds to 127.0.0.1?  I think this is busted...

The code just picks the first rdma device available.  To me, this is preferable
to simply disallowing the loopback device from working at all.  I personally
use it all the time, so I don't have to figure out what the ip address is of the
system that I'm trying to test on.

Loopback support has always been in the rdma_cm and was intended to work; it
just didn't work very well... 
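
The "first device available" selection, and how a loopback capability
check could change it, can be modelled in a few lines of Python.  This is a
sketch under assumptions, not the rdma_cm code: the Device class, the
hw_loopback field and the device names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Device:
    name: str
    hw_loopback: bool  # hypothetical capability flag


def pick_loopback_device(devices: Sequence[Device],
                         require_loopback: bool) -> Optional[Device]:
    """Pick the device to use for a bind to a loopback address.

    With require_loopback=False this models "just pick the first device".
    With require_loopback=True, devices without hardware loopback are
    skipped, and None means the bind should fail.
    """
    for dev in devices:
        if require_loopback and not dev.hw_loopback:
            continue
        return dev
    return None


devices = [Device("t3_rnic", hw_loopback=False), Device("ib_hca", hw_loopback=True)]
print(pick_loopback_device(devices, require_loopback=False).name)  # t3_rnic
print(pick_loopback_device(devices, require_loopback=True).name)   # ib_hca
print(pick_loopback_device(devices[:1], require_loopback=True))    # None
```

On a system whose only device lacks hardware loopback, the flag-aware
variant returns None, which corresponds to failing the bind instead of
failing later at connect time.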



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
Well then the rdma-cm needs to know which devices support hw loopback.
Cuz on a T3-only system, no hwloop...

The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
check may fix openmpi, but it sounds more like the app needs to know whether the
device can actually support loopback, regardless of what addresses are used.  Is
this correct?

What would openmpi do if there were two addresses assigned to the T3 device?
Does openmpi simply bypass RDMA for all connections on the local machine?

Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
definitely appears that some sort of change needs to be made to the rdma_cm.

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Sean Hefty
This solution would work.  Will you code it up?

I can do that.  I just want to make sure that we address the full scope of the
problem.

- Sean



Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Steve Wise
Sean Hefty wrote:
 Well then the rdma-cm needs to know which devices support hw loopback.
 Cuz on a T3-only system, no hwloop...
 

 The problem sounds like it's more than just whether 127.0.0.1 is usable.  That
 check may fix openmpi, but it sounds more like the app needs to know whether the
 device can actually support loopback, regardless of what addresses are used.  Is
 this correct?

 What would openmpi do if there were two addresses assigned to the T3 device?
   

It would use them and might even create two connections.

 Does openmpi simply bypass RDMA for all connections on the local machine?

   

OpenMPI can be run to use hw loopback if it's available.  For T3
clusters, OMPI is run in a mode to use shared memory for intra-node 
communications.


 Basically, I'm not sure that this is *just* an rdma_cm issue.  Although it
 definitely appears that some sort of change needs to be made to the rdma_cm.

   

I think the OpenMPI rdmacm code needs to skip 127.0.0.1, in this 
particular case.  Prior to ofed-1.5.1, however, the bind would fail and 
thus OpenMPI would not advertise 127.0.0.1 to its peer.  I will work to 
get that change done.

But let's also add a device attribute so the rdmacm can know if a device
supports loopback.   Clearly, if the rdma-cm allows binds to T3, 
loopback connections will fail at connect time.

Hey Roland, are you ok with a device attribute to indicate hw-loopback 
support?


Steve.




Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Roland Dreier
  Is this only an iwarp issue?  IE do all IB devices support hw
  loopback?  And will all future devices support it (IE is it an IBTA
  requirement)?

I do think IBA requires loopback to work.  Can't quote chapter & verse
off the top of my head.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] bug 1918 - openmpi broken due to rdma-cm changes

2010-02-04 Thread Paul Grun
I can.  Chapter 17 verse 3.1

17.3.1 Loopback
An HCA shall be able to internally loopback a packet sent to itself. That is,
the verbs layer can specify a packet to be delivered to the same port (possibly
a different QP though). The packet shall be delivered without the packet
appearing on the port's physical link. This loopback shall be able to function
without requiring the presence of an external switch. InfiniBand does not
reserve a special LID value to indicate loopback. Instead, the DLID (and DGID if
present) of a loopback packet should be the LID (and GID) of the port on which
the packet was emitted. For loopback packets, a channel adapter implementation
may ignore other path information, such as MTU, that is not otherwise needed for
the receive buffer or for the completion queue as specified in section 11.4.2.1
Poll for Completion on page 629.
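
The addressing rule in the quoted section can be written as a small
predicate.  This Python sketch only restates the rule as quoted above; the
function name, the parameter names and the sample LID/GID values are
assumptions for illustration.

```python
def is_loopback_packet(port_lid, dlid, port_gid=None, dgid=None):
    """Return True if a packet emitted on a port is addressed to that port.

    There is no reserved loopback LID: a packet is loopback when its DLID
    equals the emitting port's LID and, if a DGID is present, the DGID
    equals the port's GID.
    """
    if dlid != port_lid:
        return False
    if dgid is not None:
        return dgid == port_gid
    return True


# Hypothetical port with LID 5 and GID fe80::1.
print(is_loopback_packet(5, 5))                                     # True
print(is_loopback_packet(5, 6))                                     # False
print(is_loopback_packet(5, 5, port_gid="fe80::1", dgid="fe80::2"))  # False
```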

-Original Message-
From: linux-rdma-ow...@vger.kernel.org
[mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Roland Dreier
Sent: Thursday, February 04, 2010 3:51 PM
To: Steve Wise
Cc: Sean Hefty; linux-rdma; OpenFabrics EWG; Jeff Squyres
Subject: Re: bug 1918 - openmpi broken due to rdma-cm changes

  Is this only an iwarp issue?  IE do all IB devices support hw
  loopback?  And will all future devices support it (IE is it an IBTA
  requirement)?

I do think IBA requires loopback to work.  Can't quote chapter & verse
off the top of my head.
-- 
Roland Dreier rola...@cisco.com
Cisco.com - http://www.cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

