Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
Sean Hefty wrote:
> I'll compare my final patches against the ones submitted by David to
> see if anything got missed

Are Jason's patches a superset of David's patches? Or do they need to be
applied first, and only then can David's work be re-reviewed/merged, etc.?

Or.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
> Are Jason's patches a superset of David's patches? Or do they need to
> be applied first, and only then can David's work be
> re-reviewed/merged, etc.?

I believe that Jason's patches are a superset of David's.
RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
> How can we make progress? What are your thoughts?

I'm actually working on this today. I've taken Jason's patch as a
starting point, and I'm breaking it into separate patches for merging and
testing. I hope to post the patches within the next couple of days. I'll
compare my final patches against the ones submitted by David to see if
anything got missed.

Help with testing would be great.

- Sean
Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
On Tue, Nov 10, 2009 at 01:12:17PM -0800, Sean Hefty wrote:
> > How can we make progress? What are your thoughts?
>
> I'm actually working on this today. I've taken Jason's patch as a
> starting point, and I'm breaking it into separate patches for merging
> and testing. I hope to post the patches within the next couple of
> days. I'll compare my final patches against the ones submitted by
> David to see if anything got missed.

Oh, thanks, I didn't mean for you to do all that work.. But I won't be
able to get to it till after SC|09, which runs through about the 23rd, so
maybe it is best if there is some OFED-related deadline.

Aside from testing and breaking things up, the remaining items I wanted
to review/adjust:
- David's notes on the patch
- rdma_resolve_ip should require that src/dst have the same address family
- The ARP checking paths need to be examined, as per my post
- The ARP reply waiting code may be better done by holding the dst
  rather than polling on route lookups
- Review the private data handling and check that mixed-family cases
  work as they should

It was my intent to capture everything David had done in that big patch,
but I may have missed something. My intuition is that the fixups to the
ARP/ND code will matter for IPv6, particularly multi-port link-local
cases, but I'm not 100% sure on that.

For full IPv6 support, the two multicast patches you already reviewed
will need to be included in OFED as well.

Jason
RE: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
On Tue, 2009-11-10 at 13:12 -0800, Sean Hefty wrote:
> > How can we make progress? What are your thoughts?
>
> Help with testing would be great.
>
> - Sean

Thanks Sean, I would be glad to do some testing when the patches are
ready!
Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
Jason Gunthorpe wrote:
> **COMPILE TESTED ONLY**

Any reason why other people have to test for you?

> Convert the address resolution process for outgoing connections to be
> very similar to the way the TCP stack does the same operations. This
> fixes many corner case bugs that can crop up.

rdma_join_multicast(3) states that before joining a multicast group, the
rdma_cm_id must be bound to an RDMA device by calling rdma_bind_addr or
rdma_resolve_addr; please make sure that this flow isn't broken by your
patch.

Or.
Re: [PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
On Wed, Oct 28, 2009 at 10:05:19AM -0700, Sean Hefty wrote:
> Can you explain how rdma_resolve_addr is used in conjunction with
> multicast? I do not understand what the dest would be. Is it just a
> man page typo?

A UD endpoint can communicate using multicast and with other UD
endpoints. A user could resolve a UD endpoint before joining a multicast
group. So the IP-world analog would be:

	fd = socket(AF_INET, SOCK_DGRAM);
	connect(fd, 'Some Unicast Address');
	setsockopt(fd, IP_ADD_MEMBERSHIP, 'Some Multicast Address');
	sendto(fd, ..., 'Some Multicast Address');

? I think that is still OK. The routines still bind the rdma cm_id to
the device via rdma_translate_ip, pretty much like they did before.
There is no support for Linux IP multicast routing, though..

Jason
[PATCH RDMA] Fixup IPv6 support and IPv4 routing corner cases for RDMA CM
**COMPILE TESTED ONLY**

Convert the address resolution process for outgoing connections to be
very similar to the way the TCP stack does the same operations. This
fixes many corner case bugs that can crop up.

- Process link-local scope IPv6 addresses and use sin6_scope_id to
  choose the binding device
- Use routing lookups to choose the device and the source address
- Use ipv6_addr_copy
- Make the interaction of rdma_bind_addr and rdma_resolve_addr smoother
  when both specify a source address
- Replace the unlocked struct net_device pointer in rdma_dev_addr with
  an ifindex value. This is easier to work with when processing
  sin6_scope_id.
- Make addr4_resolve and addr6_resolve have the same flow, for easier
  review
- Fix up error checking in the v6 routing lookup: dst is never 0, and
  errno is returned via dst.error. Corrects cases where routing would
  return ENETUNREACH, etc.
- Fold addr_send_arp into addr_resolve so that it uses the correct dst
  structure

Based on work from David J. Wilder dwil...@us.ibm.com

Signed-off-by: Jason Gunthorpe jguntho...@obsidianresearch.com
Reported-by: David J. Wilder dwil...@us.ibm.com
Reported-by: leo.tomi...@oracle.com
---
 drivers/infiniband/core/addr.c | 242 ++--
 drivers/infiniband/core/cma.c  |  51 +++--
 include/rdma/ib_addr.h         |   2 +-
 3 files changed, 127 insertions(+), 168 deletions(-)

So.. I suddenly had a notion of how to fit this all together when I was
writing about that SO_BINDTODEVICE stuff. Here is how I think this
should all go. I can't steal some machines to test it right now (prolly
can't till after SC|09), but David/Sean, maybe you can have a look at
it?

This should be split up into probably 3-4 patches, but I've included it
as one to make discussion of the whole flow simpler.. There are still a
few areas I'm not certain about.
There is something wonky about how the old code was handling neighbours.
I'm not sure if addr_send_arp was correct to assume dst->neighbour is
always non-zero, or if addrX_resolve_remote is correct to check that it
could be zero. I'm also not certain the neigh_lookup is necessary if
dst->neighbour is already correct and non-zero. Hmm.

I left the network namespace stuff alone and kept with the init_net
situation..

The approach to solving the niggly problem Sean pointed out WRT
rdma_bind_addr was to treat rdma_bind_addr as the combination of
SO_BINDTODEVICE and bind() in the normal inet case, and copy that
algorithm exactly. If rdma_bind_addr() is called with a non-any IP then
the cm_id is bound to that device via the new bound_dev_if index. All
other checks then refer to that, and bound_dev_if is fed into the route
lookup to set the source device; if the source device is set then it is
copied into the route output by the route routines, and everything works
the way you'd expect. For the loopback case, if no source device is
bound then the destination IP is used to pick the RDMA device; if one is
bound then it is used instead.

There is also now checking for IPv6 link-local addresses, and like in
the real stack, specifying link-local is basically the same flow as
SO_BINDTODEVICE: stick the scope_id into bound_dev_if and let the
routing sort it out.

I think the change for the ARPs should be done, but I'd definitely split
it out into a new patch. Doing the neigh_event with the dst from the
single route lookup is really necessary for correct operation in all
cases - the old code could jump to another net_device in certain
situations due to the incomplete second lookup (Leo noticed this). It
may also be a good idea to set bound_dev_if when neigh_event_send is
called, so that if the timers run through addrX_resolve again we are
guaranteed to be bound on the right device.
This also fixes rdma_bind_addr with a v6 link-local address, and v6
source address selection cases when unbound. I'm guessing there are
probably more problems with the v6 support, like how the port space is
being shared between v6/v4 and whether incoming v4 connections get
mapped to v6 listeners? But that has nothing to do with the routing...
:)

Ah, there is still another oopsie I just noticed: rdma_resolve_addr does
not check that dst_addr and src_addr are the same AF, and does not
check, if src_addr == 0, that the bind_addr value is the same AF. Not
fixed in this version, just noting it here..

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index bd07803..1b6d419 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -107,7 +107,7 @@ int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev,
 	memcpy(dev_addr->broadcast, dev->broadcast, MAX_ADDR_LEN);
 	if (dst_dev_addr)
 		memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN);
-	dev_addr->src_dev = dev;
+	dev_addr->bound_dev_if = dev->ifindex;
 	return 0;
 }
 EXPORT_SYMBOL(rdma_copy_addr);
@@ -117,6 +117,16 @@ int