RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-11 Thread Sean Hefty
>Are Jason's patches a superset of David's patches? or they need to be
>applied and only then David's work can be re-reviewed/merged, etc?

I believe that Jason's patches are a superset of David's.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-11 Thread Or Gerlitz

Sean Hefty wrote:

I'll compare my final patches against the ones submitted by David to see if 
anything got missed
  
Are Jason's patches a superset of David's patches? or they need to be 
applied and only then David's work can be re-reviewed/merged, etc?


Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread David J. Wilder

On Tue, 2009-11-10 at 13:12 -0800, Sean Hefty wrote:
> >How can we make progress? What are your thoughts?

> 
> Help with testing would be great.

Thanks Sean,
I would be glad to do some testing when patches are ready!
 
> 
> - Sean
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2009 at 01:12:17PM -0800, Sean Hefty wrote:
> >How can we make progress? What are your thoughts?
> 
> I'm actually working on this today.  I've taken Jason's patch as a
> starting point, and I'm breaking it into separate patches for
> merging and testing.  I hope to post the patches within the next
> couple of days.  I'll compare my final patches against the ones
> submitted by David to see if anything got missed.

Oh, thanks, I didn't mean for you to do all that work..

But I won't be able to get to it till after SC|09 which is like the
23rd, so maybe it is best if there is some OFED related deadline.

Aside from testing and breaking up the remaining things I wanted to
review/adjust:
- David's notes on the patch
- The rdma_resolve_ip should require src/dst have the same family
- The arp checking paths needs to be examinined as per my post
- The arp reply waiting code may be better done by holding the dst
  and not polling on route lookups.
- Review the private data handling and check that mixed family cases
  work as they should

It was my intent to capture everything David had done in that big
patch, but I may have missed something.

My intuition is that the fixups to the ARP/ND code will matter for
IPv6, particularly multi-port link local cases, but I'm not 100% sure
on that.

For full IPv6 support the 2 multicast patches you already reviewed will
need to included in OFED as well.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread Sean Hefty
>How can we make progress? What are your thoughts?

I'm actually working on this today.  I've taken Jason's patch as a starting
point, and I'm breaking it into separate patches for merging and testing.  I
hope to post the patches within the next couple of days.  I'll compare my final
patches against the ones submitted by David to see if anything got missed.

Help with testing would be great.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread Pradeep Satyanarayana
Jason Gunthorpe wrote:
> On Thu, Oct 29, 2009 at 12:56:01PM -0700, David J. Wilder wrote:
>> On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:
>>
>>> I left the network namespace stuff alone and kept with the init_net
>>> situation..
>> Another possible solution ;)
>>
>>  if (rt->idev->dev->flags & IFF_LOOPBACK){
> 
> Oh! That is probbly much better
> 
>>> -static int addr_resolve_remote(struct sockaddr *src_in,
>>> -   struct sockaddr *dst_in,
>>> -   struct rdma_dev_addr *addr)
>>> +static int addr_resolve(struct sockaddr *src_in,
>>> +   struct sockaddr *dst_in,
>>> +   struct rdma_dev_addr *addr)
>>>  {
>> A problem here, if a source address has not been specified then this
>> test is invalid.  I had to change it to use dst_in->sa_family. But as
>> you said, we should have validated sa_family before this point.
> 
> Yes, that should have been dst, dst is not optional, src is at this
> point.
> 
> The sa_family validation of src should only be to check that if it is
> specified it is is the same..

Sean,

I know Jason and David came up with different sets of patches. Is it possible 
that
we can come to some agreement, so that this can be pushed to OFED-1.5? 

I am concerned that if we miss the OFED-1.5 boat, this feature (IPv6 support 
for RDMA CM)
may be available in the various distributions only in late 2010 or maybe even 
2011.

We got side tracked with some higher priority stuff, but now should be able to 
help
with patches and testing them too. How can we make progress? What are your 
thoughts?

Pradeep


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-31 Thread Or Gerlitz

Jason Gunthorpe wrote:

On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:

A UD endpoint can communicate using multicast and to other UD endpoints.  A 
user could resolve a UD endpoint before joining a multicast group.


So the IP world analog would be:
fd = socket(AF_INET,SOCK_DGRAM);
connect(fd,'Some Unicast Address');
setsockopt(fd,IP_MULITCAST_ADD_MEMBERSHIP,'Some Multicast Address');
sendto(fd,...,'Some Multicast Address');

IP multicast senders don't call IP_ADD_MEMBERSHIP, only receivers

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-29 Thread Jason Gunthorpe
On Thu, Oct 29, 2009 at 12:56:01PM -0700, David J. Wilder wrote:
> 
> On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:
> 
> > 
> > I left the network namespace stuff alone and kept with the init_net
> > situation..
> 
> Another possible solution ;)
> 
>  if (rt->idev->dev->flags & IFF_LOOPBACK){

Oh! That is probbly much better
 
> 
> > 
> > -static int addr_resolve_remote(struct sockaddr *src_in,
> > -   struct sockaddr *dst_in,
> > -   struct rdma_dev_addr *addr)
> > +static int addr_resolve(struct sockaddr *src_in,
> > +   struct sockaddr *dst_in,
> > +   struct rdma_dev_addr *addr)
> >  {
> 
> A problem here, if a source address has not been specified then this
> test is invalid.  I had to change it to use dst_in->sa_family. But as
> you said, we should have validated sa_family before this point.

Yes, that should have been dst, dst is not optional, src is at this
point.

The sa_family validation of src should only be to check that if it is
specified it is is the same..

Thanks!
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-29 Thread David J. Wilder

On Tue, 2009-10-27 at 23:42 -0600, Jason Gunthorpe wrote:

> 
> I left the network namespace stuff alone and kept with the init_net
> situation..

Another possible solution ;)

 if (rt->idev->dev->flags & IFF_LOOPBACK){


> 
> -static int addr_resolve_remote(struct sockaddr *src_in,
> - struct sockaddr *dst_in,
> - struct rdma_dev_addr *addr)
> +static int addr_resolve(struct sockaddr *src_in,
> + struct sockaddr *dst_in,
> + struct rdma_dev_addr *addr)
>  {

A problem here, if a source address has not been specified then this
test is invalid.  I had to change it to use dst_in->sa_family. But as
you said, we should have validated sa_family before this point.

>   if (src_in->sa_family == AF_INET) {
> - return addr4_resolve_remote((struct sockaddr_in *) src_in,
> + return addr4_resolve((struct sockaddr_in *) src_in,
>   (struct sockaddr_in *) dst_in, addr);
>   } else
> - return addr6_resolve_remote((struct sockaddr_in6 *) src_in,
> + return addr6_resolve((struct sockaddr_in6 *) src_in,
>   (struct sockaddr_in6 *) dst_in, addr);
>  }


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Jason Gunthorpe
On Wed, Oct 28, 2009 at 10:58:58AM -0700, Sean Hefty wrote:

> >I guess I was hoping you'd be able to do the private data in userspace
> >and avoid those kind of special cases..
> 
> The private data would still be done in user space.  This allows
> apps that make use of ADDR_CHANGE events to continue working as
> normal, while still allowing for address resolution to be done in
> user space.  This should meet Or's request.  User space may not know
> if they can resolve a particular address before binding.

Well, looking at this, I see that ADDR_CHANGE is only generated on
netdev events NETDEV_BONDING_FAILOVER - so what are the semantics
WRT to UCM here? As I said before, defeating the ND process seriously
affects how bonding works.

I think if UCM is going to have bonding-like HA it has to do the work
itself. It has to record the bonding IP is bound to multiple GID/LIDs
and it has to attempt connecting to multiple addrs.

So, I don't see ADDR_CHANGE fitting into this. An IB centric IB
connection lost event to trigger reconnect would be more
appropriate. This would be desirable anyhow for AF_IB only apps, so
multi-protocol apps can handle both cases - the same recovery
action is probably required anyhow...

FWIW, I think HA + UCM is very doable, if UCM can construct multiple
PRs based on bonding data it collects it should pass them into the
kernel using the mechanism we talked about for APM. The kernel does
not have to implement APM but it certainly can run through the
combinations and use the first to reply to the GMP as the path to
establish a connection. For typical 2 port bonding there would be 4
combinations, 2 of which the kernel could eliminate immediately by
detecting the local port is downed.

Then again, is bonding + ucm going to be a common usage model? Without
the faster address failover provided by the unsolicted ND packet the
recovery time will be worse. On the other hand, if all the above is
done UCM would actually have enough information to setup multi-port
APM...

Anyhow, I don't suggest you try to implement HA + UCM out of the gate,
but as a direction for work in that area to take I think AF_IB only
plus IB_LINK_LOST event, plus 'APM-lite' kernel PR round robin is a
viable approach.

Keeping AF_IB uniformly as AF_IB seems like it would make things
simpler to implement and describe WTF it is doing.

> At this point, I'm guessing that the only thing that would prevent
> this from working is if an explicit check is added.

Some kinds of checks are needed, passing  AF_INET and AF_INET6 into
rdma_resolve_ip will mis-function. dst=AF_IB will have to
branch off before this.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Sean Hefty
>Hum.. Not sure how that would work? src = AF_INET, dst = AF_IB?

yes - conceptually, these both map to PF_IB, which I think is really the
limiting factor

I'm guessing that you could make the reverse work as well, but it requires more
work and probably wouldn't be used by apps anyway.

>I guess I was hoping you'd be able to do the private data in userspace
>and avoid those kind of special cases..

The private data would still be done in user space.  This allows apps that make
use of ADDR_CHANGE events to continue working as normal, while still allowing
for address resolution to be done in user space.  This should meet Or's request.
User space may not know if they can resolve a particular address before binding.

I need to write the kernel patches to see if supporting this causes undo grief.
At this point, I'm guessing that the only thing that would prevent this from
working is if an explicit check is added.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Jason Gunthorpe
On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:
> >Can you explain how rdma_resolve_addr is used in conjunction with
> >multicast? I do not understand what the dest would be. Is it just a man
> >page typo?
> 
> A UD endpoint can communicate using multicast and to other UD
> endpoints.  A user could resolve a UD endpoint before joining a
> multicast group.

So the IP world analog would be:

fd = socket(AF_INET,SOCK_DGRAM);
connect(fd,'Some Unicast Address');
setsockopt(fd,IP_MULITCAST_ADD_MEMBERSHIP,'Some Multicast Address');
sendto(fd,...,'Some Multicast Address');

?

I think that is still OK. The routines still bind the rdma cm_id to
the devices via rdma_translate_ip pretty much like they did before.

There is no support for Linux IP multicast routing though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Sean Hefty
>Can you explain how rdma_resolve_addr is used in conjunction with
>multicast? I do not understand what the dest would be. Is it just a man
>page typo?

A UD endpoint can communicate using multicast and to other UD endpoints.  A user
could resolve a UD endpoint before joining a multicast group.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Jason Gunthorpe
On Wed, Oct 28, 2009 at 01:08:33AM -0700, Sean Hefty wrote:
> >Here is how I think this should all go. I can't steal some machines to
> >test it right now (prolly can't till after SC|09), but David/Sean maybe
> >you can have a look at it?
> 
> I will try to test it some this week.

Thanks, probably won't work right away, but I feel it is closer..

Do you have any idea about the ARP things? Is whoever did this
originally still around?

> >AH, there is still another oopsie I just noticed, rdma_resolve_addr
> >does not check that dst_addr and src_addr are the same AF
> 
> I was actually considering making use of this to allow AF_INET or AF_INET6 to
> communicate with AF_IB.  There's not a protocol limitation to supporting this.

Hum.. Not sure how that would work? src = AF_INET, dst = AF_IB?

I guess I was hoping you'd be able to do the private data in userspace
and avoid those kind of special cases..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Jason Gunthorpe
On Wed, Oct 28, 2009 at 04:49:28PM +0200, Or Gerlitz wrote:
> Jason Gunthorpe wrote:
> > **COMPILE TESTED ONLY**
> 
> any reason why other people have to test for you?

Just a preliminary proposal - other people can do whatever they
want. Look at the list of things it fixes and decide your self if you
want to help or not 

> > Convert the address resolution process for outgoing connections
> > to be very similar to the way the TCP stack does the same operations.
> > This fixes many corner case bugs that can crop up.
> 
> rdma_join_multicast(3) states that "before joining a multicast
> group, the rdma_cm_id must be bound to an RDMA device by calling
> rdma_bind_addr or rdma_resolve_addr", please make sure that this
> flow isn't broken by your patch.

Can you explain how rdma_resolve_addr is used in conjunction with
multicast? I do not understand what the dest would be. Is it just a man
page typo?

I belive the rdma_bind_addr path is unaffected.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Or Gerlitz
Jason Gunthorpe wrote:
> **COMPILE TESTED ONLY**

any reason why other people have to test for you?

> Convert the address resolution process for outgoing connections
> to be very similar to the way the TCP stack does the same operations.
> This fixes many corner case bugs that can crop up.

rdma_join_multicast(3) states that "before  joining  a  multicast  group,  the  
rdma_cm_id  must  be  bound to an RDMA device by calling rdma_bind_addr or 
rdma_resolve_addr", please make sure that this flow isn't broken by your patch.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Sean Hefty
>Here is how I think this should all go. I can't steal some machines to
>test it right now (prolly can't till after SC|09), but David/Sean maybe
>you can have a look at it?

I will try to test it some this week.

>AH, there is still another oopsie I just noticed, rdma_resolve_addr
>does not check that dst_addr and src_addr are the same AF

I was actually considering making use of this to allow AF_INET or AF_INET6 to
communicate with AF_IB.  There's not a protocol limitation to supporting this.

- Sean

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-27 Thread Jason Gunthorpe
**COMPILE TESTED ONLY**

Convert the address resolution process for outgoing connections
to be very similar to the way the TCP stack does the same operations.
This fixes many corner case bugs that can crop up.

- Process link local scope IPv6 addreses and use sin6_scope_id
  to choose the binding device
- Use routing lookups to choose the device and the source address
- Use ipv6_addr_copy
- Make the interaction of rdma_bind_addr and rdma_resolve_addr smoother
  when both specify a source address
- Replace the unlocked struct net_device pointer in rdma_dev_addr with
  an ifindex value. This is easier to work with when processing
  sin6_scope_id.
- Make addr4_resolve and addr6_resolve have the same flow, for easier
  review.
- Fixup error checking in the v6 routing lookup, dst is never 0, errno
  is returned via dst.error. Corrects cases where routing would return
  ENETUNREACH, etc
- Fold addr_send_arp into addr_resolve so that it uses the correct
  dst structure.

Based on work from "David J. Wilder" 

Signed-off-by: Jason Gunthorpe 
Reported-by: "David J. Wilder" 
Reported-by: leo.tomi...@oracle.com
---
 drivers/infiniband/core/addr.c |  242 ++--
 drivers/infiniband/core/cma.c  |   51 +++--
 include/rdma/ib_addr.h |2 +-
 3 files changed, 127 insertions(+), 168 deletions(-)

So.. I suddenly had a notion how to fit this all together when I was
writing about that SO_BINDTODEVICE stuff.

Here is how I think this should all go. I can't steal some machines to
test it right now (prolly can't till after SC|09), but David/Sean maybe
you can have a look at it?

This should be split up into probably 3-4 patches, but I've included it
as one to make discussions as to the whole flow simpler..

There are still a few areas I'm not certain about. There is something
wonky about how the old code was handling neighbours, I'm not sure if 
addr_send_arp was correct to assume dst->neighbour is always non zero,
or if addrX_resolve_remote is correct to check that it could be
zero. I'm also not certain the neigh_lookup is necessary, if the
dst->neighbour is already correct and non-zero. Hmm.

I left the network namespace stuff alone and kept with the init_net
situation..

The approach to solve the niggly problem Sean pointed out WRT to
rdma_bind_addr was to treat rdma_bind_addr as the combination of
SO_BINDTODEVICE and bind() in the normal inet case, and copy that
algorithm exactly. If rdma_bind_addr() is called with non-any IP
then the cm_id is bound to that device via the new bound_dev_if
index. All other checks then refer to that, and bound_dev_if is
fed into the route lookup to set the source device, and if the source
device is set then it is copied into the route output by the route
routines and everything works the way you'd expect. For the loopback
case, if no source device is bound then the destination IP is used
as the RDMA device, if one is bound then it is used instead.

There is also now checking for IPv6 link local addresses, and like in
the real stack specifying link local is basically the same flow as
SO_BINDTODEVICE, stick the scope_id into bound_dev_if and let the
routing sort it out.

I think the change for the arps should be done, but I'd definately
split it out into a new patch. Doing the neigh_event with the dst from
the single route lookup is really necessary for correct operation in
all cases - the old code could jump to another net_device in certain
situations due to the incomplete second lookup (Leo noticed this). It
may also be a good idea to set bound_dev_if when the neigh_event_send
is called so that if the timers run through addrX_resolve again we are
guarenteed to be bound on the right device.

This also fixes rdma_bind_addr with a v6 link local address, and v6
source address selection cases when unbound.

I'm guessing there are probably more problems with the v6 support,
like how is the port space being shared between v6/v4 and do incoming
v4 connections get mapped to v6 listeners? But that has nothing to do
with the routing... :)

AH, there is still another oopsie I just noticed, rdma_resolve_addr
does not check that dst_addr and src_addr are the same AF, and does
not check if src_addr == 0 that the bind_addr value is the same
AF. Not fixed in this version, just noting it here..

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..1b6d419 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -107,7 +107,7 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct 
net_device *dev,
memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
if (dst_dev_addr)
memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
-   dev_addr->src_dev = dev;
+   dev_addr->bound_dev_if = dev->ifindex;
return 0;
 }
 EXPORT_SYMBOL(rdma_copy_addr);
@@ -117,6 +117,16 @@ int rdma_translate_ip(struct sockaddr *addr, struct 
rdma_dev_addr *dev_addr)