Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-11 Thread Or Gerlitz

Sean Hefty wrote:

I'll compare my final patches against the ones submitted by David to see if 
anything got missed
  
Are Jason's patches a superset of David's patches? or they need to be 
applied and only then David's work can be re-reviewed/merged, etc?


Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-11 Thread Sean Hefty
Are Jason's patches a superset of David's patches? or they need to be
applied and only then David's work can be re-reviewed/merged, etc?

I believe that Jason's patches are a superset of David's.

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread Sean Hefty
How can we make progress? What are your thoughts?

I'm actually working on this today.  I've taken Jason's patch as a starting
point, and I'm breaking it into separate patches for merging and testing.  I
hope to post the patches within the next couple of days.  I'll compare my final
patches against the ones submitted by David to see if anything got missed.

Help with testing would be great.

- Sean

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread Jason Gunthorpe
On Tue, Nov 10, 2009 at 01:12:17PM -0800, Sean Hefty wrote:
 How can we make progress? What are your thoughts?
 
 I'm actually working on this today.  I've taken Jason's patch as a
 starting point, and I'm breaking it into separate patches for
 merging and testing.  I hope to post the patches within the next
 couple of days.  I'll compare my final patches against the ones
 submitted by David to see if anything got missed.

Oh, thanks, I didn't mean for you to do all that work..

But I won't be able to get to it till after SC|09 which is like the
23rd, so maybe it is best if there is some OFED related deadline.

Aside from testing and breaking up the remaining things I wanted to
review/adjust:
- David's notes on the patch
- The rdma_resolve_ip should require src/dst have the same family
- The arp checking paths needs to be examinined as per my post
- The arp reply waiting code may be better done by holding the dst
  and not polling on route lookups.
- Review the private data handling and check that mixed family cases
  work as they should

It was my intent to capture everything David had done in that big
patch, but I may have missed something.

My intuition is that the fixups to the ARP/ND code will matter for
IPv6, particularly multi-port link local cases, but I'm not 100% sure
on that.

For full IPv6 support the 2 multicast patches you already reviewed will
need to included in OFED as well.

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-11-10 Thread David J. Wilder

On Tue, 2009-11-10 at 13:12 -0800, Sean Hefty wrote:
 How can we make progress? What are your thoughts?

 
 Help with testing would be great.

Thanks Sean,
I would be glad to do some testing when patches are ready!
 
 
 - Sean
 

--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Or Gerlitz
Jason Gunthorpe wrote:
 **COMPILE TESTED ONLY**

any reason why other people have to test for you?

 Convert the address resolution process for outgoing connections
 to be very similar to the way the TCP stack does the same operations.
 This fixes many corner case bugs that can crop up.

rdma_join_multicast(3) states that before  joining  a  multicast  group,  the  
rdma_cm_id  must  be  bound to an RDMA device by calling rdma_bind_addr or 
rdma_resolve_addr, please make sure that this flow isn't broken by your patch.

Or.
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-28 Thread Jason Gunthorpe
On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:
 Can you explain how rdma_resolve_addr is used in conjunction with
 multicast? I do not understand what the dest would be. Is it just a man
 page typo?
 
 A UD endpoint can communicate using multicast and to other UD
 endpoints.  A user could resolve a UD endpoint before joining a
 multicast group.

So the IP world analog would be:

fd = socket(AF_INET,SOCK_DGRAM);
connect(fd,'Some Unicast Address');
setsockopt(fd,IP_MULITCAST_ADD_MEMBERSHIP,'Some Multicast Address');
sendto(fd,...,'Some Multicast Address');

?

I think that is still OK. The routines still bind the rdma cm_id to
the devices via rdma_translate_ip pretty much like they did before.

There is no support for Linux IP multicast routing though..

Jason
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM

2009-10-27 Thread Jason Gunthorpe
**COMPILE TESTED ONLY**

Convert the address resolution process for outgoing connections
to be very similar to the way the TCP stack does the same operations.
This fixes many corner case bugs that can crop up.

- Process link local scope IPv6 addreses and use sin6_scope_id
  to choose the binding device
- Use routing lookups to choose the device and the source address
- Use ipv6_addr_copy
- Make the interaction of rdma_bind_addr and rdma_resolve_addr smoother
  when both specify a source address
- Replace the unlocked struct net_device pointer in rdma_dev_addr with
  an ifindex value. This is easier to work with when processing
  sin6_scope_id.
- Make addr4_resolve and addr6_resolve have the same flow, for easier
  review.
- Fixup error checking in the v6 routing lookup, dst is never 0, errno
  is returned via dst.error. Corrects cases where routing would return
  ENETUNREACH, etc
- Fold addr_send_arp into addr_resolve so that it uses the correct
  dst structure.

Based on work from David J. Wilder dwil...@us.ibm.com

Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com
Reported-by: David J. Wilder dwil...@us.ibm.com
Reported-by: leo.tomi...@oracle.com
---
 drivers/infiniband/core/addr.c |  242 ++--
 drivers/infiniband/core/cma.c  |   51 +++--
 include/rdma/ib_addr.h |2 +-
 3 files changed, 127 insertions(+), 168 deletions(-)

So.. I suddenly had a notion how to fit this all together when I was
writing about that SO_BINDTODEVICE stuff.

Here is how I think this should all go. I can't steal some machines to
test it right now (prolly can't till after SC|09), but David/Sean maybe
you can have a look at it?

This should be split up into probably 3-4 patches, but I've included it
as one to make discussions as to the whole flow simpler..

There are still a few areas I'm not certain about. There is something
wonky about how the old code was handling neighbours, I'm not sure if 
addr_send_arp was correct to assume dst-neighbour is always non zero,
or if addrX_resolve_remote is correct to check that it could be
zero. I'm also not certain the neigh_lookup is necessary, if the
dst-neighbour is already correct and non-zero. Hmm.

I left the network namespace stuff alone and kept with the init_net
situation..

The approach to solve the niggly problem Sean pointed out WRT to
rdma_bind_addr was to treat rdma_bind_addr as the combination of
SO_BINDTODEVICE and bind() in the normal inet case, and copy that
algorithm exactly. If rdma_bind_addr() is called with non-any IP
then the cm_id is bound to that device via the new bound_dev_if
index. All other checks then refer to that, and bound_dev_if is
fed into the route lookup to set the source device, and if the source
device is set then it is copied into the route output by the route
routines and everything works the way you'd expect. For the loopback
case, if no source device is bound then the destination IP is used
as the RDMA device, if one is bound then it is used instead.

There is also now checking for IPv6 link local addresses, and like in
the real stack specifying link local is basically the same flow as
SO_BINDTODEVICE, stick the scope_id into bound_dev_if and let the
routing sort it out.

I think the change for the arps should be done, but I'd definately
split it out into a new patch. Doing the neigh_event with the dst from
the single route lookup is really necessary for correct operation in
all cases - the old code could jump to another net_device in certain
situations due to the incomplete second lookup (Leo noticed this). It
may also be a good idea to set bound_dev_if when the neigh_event_send
is called so that if the timers run through addrX_resolve again we are
guarenteed to be bound on the right device.

This also fixes rdma_bind_addr with a v6 link local address, and v6
source address selection cases when unbound.

I'm guessing there are probably more problems with the v6 support,
like how is the port space being shared between v6/v4 and do incoming
v4 connections get mapped to v6 listeners? But that has nothing to do
with the routing... :)

AH, there is still another oopsie I just noticed, rdma_resolve_addr
does not check that dst_addr and src_addr are the same AF, and does
not check if src_addr == 0 that the bind_addr value is the same
AF. Not fixed in this version, just noting it here..

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..1b6d419 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -107,7 +107,7 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct 
net_device *dev,
memcpy(dev_addr-broadcast, dev-broadcast, MAX_ADDR_LEN);
if (dst_dev_addr)
memcpy(dev_addr-dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
-   dev_addr-src_dev = dev;
+   dev_addr-bound_dev_if = dev-ifindex;
return 0;
 }
 EXPORT_SYMBOL(rdma_copy_addr);
@@ -117,6 +117,16 @@ int