Re: nfsrdma broken on 2.6.34-rc1?
Sean Hefty wrote:
>> Sean, will you add this to the rdma_cm?
>
> Not immediately, because I lack the time to do it. It would be really nice to share the kernel's port space code and remove the port code in the rdma_cm.

LOL. Yes... yes, it would. There is, of course, a Dragon to be slain. Roland?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
nfsrdma broken on 2.6.34-rc1?
Hey Sean,

I'm trying NFSRDMA on net-next, and the server side fails when registering the rdma transport. I think it's due to the INET6 support added to the rdma-cm. I'm still debugging, though.

In fs/nfsd/nfsctl.c:__write_ports_addxprt(), nfsd tries to create a new svc transport for both PF_INET and PF_INET6, using the same port and the wildcard address. If the INET6 create fails with anything other than -EAFNOSUPPORT, the entire transport registration fails (i.e., no RDMA/INET support is added). When I do

    echo rdma 20049 > /proc/fs/nfsd/portlist

I see the PF_INET transport get created successfully, but the INET6 transport create fails with -EADDRNOTAVAIL.

Does the rdma-cm allow concurrent binds to (PF_INET, INADDR_ANY, port=X) and (PF_INET6, IN6ADDR_ANY_INIT, port=X)? Apparently the native stack allows this (which makes sense, seeing as they are different protocol families).

Steve.
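To make the failure mode concrete, here is a simplified C model of the registration sequence Steve describes. It is not the nfsctl.c source: `create_xprt_fn`, `add_transport`, and the two stub creators are hypothetical names, and the stubs just return the error codes from the report (-EADDRNOTAVAIL from the rdma-cm for the INET6 bind) or the -EAFNOSUPPORT a host without IPv6 would return.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>   /* PF_INET, PF_INET6 */

/* Hypothetical stand-in for "create a listening transport of this family
 * on the wildcard address and this port"; returns 0 or a negative errno. */
typedef int (*create_xprt_fn)(int family, unsigned short port);

/* Simplified model of the registration logic described above (not the
 * real nfsctl.c code): the IPv4 listener must succeed, and an IPv6
 * failure is tolerated only when it is -EAFNOSUPPORT. */
static int add_transport(create_xprt_fn create, unsigned short port)
{
    int err = create(PF_INET, port);
    if (err < 0)
        return err;

    err = create(PF_INET6, port);
    if (err < 0 && err != -EAFNOSUPPORT)
        return err;   /* per the report, the IPv4 listener is not kept either */
    return 0;
}

/* Stub mimicking the reported rdma-cm behaviour: the INET6 bind on a port
 * already used by the INET listener fails with -EADDRNOTAVAIL. */
static int rdma_like_create(int family, unsigned short port)
{
    (void)port;
    return family == PF_INET6 ? -EADDRNOTAVAIL : 0;
}

/* Stub for a host with no IPv6 support at all. */
static int no_ipv6_create(int family, unsigned short port)
{
    (void)port;
    return family == PF_INET6 ? -EAFNOSUPPORT : 0;
}
```

With the rdma-like stub, `add_transport(rdma_like_create, 20049)` surfaces -EADDRNOTAVAIL for the whole registration; only a host lacking IPv6 entirely ends up with an IPv4-only listener.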
Re: nfsrdma broken on 2.6.34-rc1?
Actually, this looks like a recent NFS change that now creates both an INET and an INET6 transport when you add one via the /proc/fs/nfsd/portlist file. Looking at 2.6.30, which works, the nfsctl.c code is very different and only creates an INET transport...
RE: nfsrdma broken on 2.6.34-rc1?
> Does the rdma-cm allow concurrent binds to (PF_INET, INADDR_ANY, port=X) and (PF_INET6, IN6ADDR_ANY_INIT, port=X)?

No, since these show up in the rdma_cm as using the same port space. The rdma_cm might be able to support this if the port space were separated based on the address family, depending on how PS_IB ends up.
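A toy model may help show what "the same port space" means here. The sketch below is not the rdma_cm's actual data structure: `struct port_space`, `ps_init`, and `ps_bind` are hypothetical, and the -EADDRNOTAVAIL return was picked only to mirror the error Steve saw. In shared mode the second bind to port X fails regardless of family; with a per-family table, as Sean suggests, it succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

#define NPORTS 65536

/* Toy port space (NOT the rdma_cm's real structures). In shared mode a
 * port taken by either family is unavailable to both; in per-family mode
 * IPv4 and IPv6 only conflict with themselves. */
struct port_space {
    bool per_family;
    bool used_v4[NPORTS];
    bool used_v6[NPORTS];
};

static void ps_init(struct port_space *ps, bool per_family)
{
    memset(ps, 0, sizeof(*ps));
    ps->per_family = per_family;
}

/* Returns 0 on success, -EADDRNOTAVAIL if the port is already taken in
 * the table(s) this bind must check, -EAFNOSUPPORT for other families. */
static int ps_bind(struct port_space *ps, int family, unsigned short port)
{
    bool *mine, *other;

    if (family == AF_INET) {
        mine = ps->used_v4;
        other = ps->used_v6;
    } else if (family == AF_INET6) {
        mine = ps->used_v6;
        other = ps->used_v4;
    } else {
        return -EAFNOSUPPORT;
    }

    if (mine[port] || (!ps->per_family && other[port]))
        return -EADDRNOTAVAIL;
    mine[port] = true;
    return 0;
}
```

Note that the per-family model deliberately ignores dual-stack listeners; that is exactly the subtlety raised later in the thread.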
Re: nfsrdma broken on 2.6.34-rc1?
Sean Hefty wrote:
> The rdma_cm might be able to support this if the port space were separated based on the address family, depending on how PS_IB ends up.

I think separate port spaces are the correct solution.

Steve.
Re: nfsrdma broken on 2.6.34-rc1?
>> The rdma_cm might be able to support this if the port space were separated based on the address family, depending on how PS_IB ends up.
>
> I think separate port spaces are the correct solution.

This gets a bit tricky. For normal IP stuff, there's the bindv6only sysctl (and the IPV6_V6ONLY socket option). Without that, you can't bind an IPv4 socket to the same port as an IPv6 socket, since the IPv6 socket will accept IPv4 connections via a v4-mapped IPv6 address. (You can look at inet_csk_bind_conflict() to see the full complexity of the checking done when binding an IPv4 socket.)

I wonder what the right way is for the RDMA CM to stay close to Linux sockets semantics without adding too much horror. (Adding Jason to the CC list, since he usually has an opinion about things like this :)

 - R.
--
Roland Dreier rola...@cisco.com || For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
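Roland's point can be checked from user space with ordinary TCP sockets (no RDMA involved). The sketch below binds 0.0.0.0:X and then tries [::]:X with IPV6_V6ONLY set explicitly. On a typical Linux host the second bind should succeed with IPV6_V6ONLY=1 and fail with EADDRINUSE with IPV6_V6ONLY=0, because a dual-stack IPv6 socket would also claim the IPv4 side of the port. The helper names `bind_wildcard` and `try_dual_bind` are illustrative; a host without IPv6 reports -EAFNOSUPPORT instead.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket of `family` to the wildcard address and `port` (host
 * byte order; 0 lets the kernel pick). For AF_INET6, IPV6_V6ONLY is set
 * to `v6only` first. Returns the fd, or -errno on failure. */
static int bind_wildcard(int family, unsigned short port, int v6only)
{
    int fd = socket(family, SOCK_STREAM, 0);
    int rc;

    if (fd < 0)
        return -errno;

    if (family == AF_INET6) {
        struct sockaddr_in6 sin6;

        memset(&sin6, 0, sizeof(sin6));
        sin6.sin6_family = AF_INET6;
        sin6.sin6_addr = in6addr_any;
        sin6.sin6_port = htons(port);
        rc = setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6only, sizeof(v6only));
        if (rc == 0)
            rc = bind(fd, (struct sockaddr *)&sin6, sizeof(sin6));
    } else {
        struct sockaddr_in sin;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        sin.sin_port = htons(port);
        rc = bind(fd, (struct sockaddr *)&sin, sizeof(sin));
    }

    if (rc < 0) {
        rc = -errno;          /* save errno before close() can change it */
        close(fd);
        return rc;
    }
    return fd;
}

/* Bind 0.0.0.0:X (X chosen by the kernel), then try [::]:X with the given
 * IPV6_V6ONLY value. Returns 0 if both binds succeed, else the -errno. */
static int try_dual_bind(int v6only)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    int fd4, fd6;

    fd4 = bind_wildcard(AF_INET, 0, 0);
    if (fd4 < 0)
        return fd4;
    if (getsockname(fd4, (struct sockaddr *)&sin, &len) < 0) {
        int rc = -errno;
        close(fd4);
        return rc;
    }

    fd6 = bind_wildcard(AF_INET6, ntohs(sin.sin_port), v6only);
    close(fd4);
    if (fd6 < 0)
        return fd6;
    close(fd6);
    return 0;
}
```

Setting the option explicitly overrides the bindv6only default, which is what lets the sockets-based NFS transports get separate IPv4 and IPv6 listeners on one port.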
Re: nfsrdma broken on 2.6.34-rc1?
On Mon, Mar 29, 2010 at 12:01:07PM -0700, Roland Dreier wrote:
> This gets a bit tricky. For normal IP stuff, there's the bindv6only sysctl (and the IPV6_V6ONLY socket option). Without that, you can't bind an IPv4 socket to the same port as an IPv6 socket, since the IPv6 socket will accept IPv4 connections via a v4-mapped IPv6 address. (You can look at inet_csk_bind_conflict() to see the full complexity of the checking done when binding an IPv4 socket.)

Yeah, exactly: it is very complex, and there is a real need for things pretending to be IP to capture all this subtlety. The details can't just be skipped over; people will notice :(

Though I'm also not entirely certain that NFS-RDMA is right to bind to both AFs. Generally speaking, on Linux a multi-protocol app only wants to bind to v6 addresses. Or is it using IPV6_V6ONLY or similar?

> I wonder what the right way is for the RDMA CM to stay close to Linux sockets semantics without adding too much horror. (Adding Jason to the CC list, since he usually has an opinion about things like this :)

Clearly the best way is to figure out some way to work with the existing routines in the kernel. This stuff is complex, and duplicating all of it in rdma_cm would be annoying. To match the semantics, each CM ID would still register to one SID, but an incoming connection request on a v4 PF SID could be matched to a v6 SID, etc. I don't think new port spaces in the API are desirable.

Jason
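Jason's suggestion (one service ID per CM ID, with an IPv4 request allowed to land on an IPv6 listener) can be sketched as a lookup rule. Everything here is hypothetical: `struct listener` and `match_listener` are illustrative names, and the `v6only` flag stands in for whatever V6ONLY-style option the RDMA CM might grow. An exact family match wins; otherwise an IPv4 request falls back to an IPv6 listener that did not ask for V6ONLY, mirroring the dual-stack socket behaviour Roland describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Hypothetical listener record: each CM ID registers under exactly one
 * (family, port) service ID, plus a V6ONLY-style flag. */
struct listener {
    int family;            /* AF_INET or AF_INET6 */
    unsigned short port;
    bool v6only;           /* only meaningful for AF_INET6 */
};

/* Pick the listener for an incoming request on (family, port). An exact
 * family match wins; an IPv4 request may otherwise be delivered to an
 * IPv6 listener that is not V6ONLY. Returns NULL if nothing matches. */
static const struct listener *
match_listener(const struct listener *tab, size_t n, int family,
               unsigned short port)
{
    const struct listener *fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].port != port)
            continue;
        if (tab[i].family == family)
            return &tab[i];
        if (family == AF_INET && tab[i].family == AF_INET6 && !tab[i].v6only)
            fallback = &tab[i];
    }
    return fallback;
}
```

With this rule, a single dual-stack v6 listener serves both families, while an application that wants NFS-style separate listeners marks its v6 one V6ONLY and registers a v4 one alongside it.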
Re: nfsrdma broken on 2.6.34-rc1?
On Mon, Mar 29, 2010 at 02:51:46PM -0500, Steve Wise wrote:
>> Though I'm also not entirely certain that NFS-RDMA is right to bind to both AFs. Generally speaking, on Linux a multi-protocol app only wants to bind to v6 addresses. Or is it using IPV6_V6ONLY or similar?
>
> This issue is really not in the NFS-RDMA code. The nfsd code is doing the binding. See commit:
>
>   commit 37498292aa97658a5d0a9bb84699ce8c1016bb74
>   Author: Chuck Lever <chuck.le...@oracle.com>
>   Date:   Tue Jan 26 14:04:22 2010 -0500
>
>       NFSD: Create PF_INET6 listener in write_ports

Sure... but it relies on the behavior of svcsock.c, which does this:

	if (family == PF_INET6)
		kernel_setsockopt(sock, SOL_IPV6, IPV6_V6ONLY,
				  (char *)&val, sizeof(val));

And NFS-RDMA has no equivalent. Having the common code explicitly rely on IPV6_V6ONLY is quite troublesome when you can't implement it :)

Jason
Re: nfsrdma broken on 2.6.34-rc1?
> This issue is really not in the NFS-RDMA code. The nfsd code is doing the binding. See commit:

I think the really relevant thing is 7d21c0f9 ("SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets") and follow-ups. NFS expects to have one IPv6-only socket and one IPv4-only socket. It seems the RDMA CM should create a similar V6ONLY option for binding (probably defaulting to the /proc/sys/net/ipv6/bindv6only sysctl value) to handle this.

--
Roland Dreier rola...@cisco.com
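Roland's idea that a V6ONLY option default to the bindv6only sysctl could look roughly like this. The /proc path is the one named above; `struct cm_id_opts`, its `v6only` field, and `cm_id_opts_init` are invented for illustration and are not an existing RDMA CM interface. When the sysctl cannot be read, the sketch falls back to 0, the usual Linux default.

```c
#include <assert.h>
#include <stdio.h>

/* Read /proc/sys/net/ipv6/bindv6only. Returns 0 or 1, or -1 if the file
 * is missing or unreadable (for example, IPv6 not available). */
static int read_bindv6only(void)
{
    FILE *f = fopen("/proc/sys/net/ipv6/bindv6only", "r");
    int val = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;
    fclose(f);
    return (val == 0 || val == 1) ? val : -1;
}

/* Hypothetical per-ID options for an RDMA CM-like API (illustration only). */
struct cm_id_opts {
    int v6only;   /* 1: an IPv6 listener accepts only IPv6 requests */
};

/* Initialise options the way a socket picks up its IPV6_V6ONLY default:
 * from the sysctl, unless the caller later overrides it explicitly. */
static void cm_id_opts_init(struct cm_id_opts *o)
{
    int sys = read_bindv6only();

    o->v6only = (sys < 0) ? 0 : sys;   /* unknown: assume dual-stack */
}
```

A caller such as nfsd would then set `v6only = 1` explicitly on its IPv6 listener, just as svcsock.c does for sockets, regardless of the system-wide default.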