Re: nfsrdma broken on 2.6.34-rc1?

2010-04-01 Thread Tom Tucker

Sean Hefty wrote:
>> Sean, will you add this to the rdma_cm?
>
> Not immediately because I lack the time to do it.
>
> It would be really nice to share the kernel's port space code and remove the
> port code in the rdma_cm.

LOL. Yes...yes it would. There is of course a Dragon to be slain. Roland?


--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  




nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Steve Wise

Hey Sean,

I'm trying NFSRDMA on net-next and the server side fails when
registering the rdma transport.  I think it's due to the INET6 support
added to the rdma-cm.  I'm still debugging though.


In fs/nfsd/nfsctl.c:__write_ports_addxprt(), it tries to create a new 
svc transport for PF_INET, and PF_INET6 using the same port and the 
wildcard address.  If the INET6 fails with anything other than 
-EAFNOSUPPORT, then the entire transport registration fails (i.e., no
RDMA/INET support is added). 
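
The failure path described above can be modeled in a few lines of userspace C. This is only a sketch of the control flow as described in this message, not the nfsctl.c source: struct xprt_ops, add_transport(), and the stub callbacks are hypothetical names invented for illustration, and the assumption that the PF_INET transport is torn down again on failure is inferred from "the entire transport registration fails".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical callbacks standing in for transport create/close. */
struct xprt_ops {
    int  (*create)(int family, unsigned short port);
    void (*close)(int family, unsigned short port);
};

/*
 * Model of the add-transport flow described in the message:
 * create PF_INET first, then PF_INET6; tolerate only -EAFNOSUPPORT
 * from the PF_INET6 attempt, otherwise undo the PF_INET transport.
 */
int add_transport(const struct xprt_ops *ops, unsigned short port)
{
    int err;

    assert(ops != NULL && ops->create != NULL && ops->close != NULL);

    err = ops->create(PF_INET, port);
    if (err < 0)
        return err;

    err = ops->create(PF_INET6, port);
    if (err < 0 && err != -EAFNOSUPPORT) {
        ops->close(PF_INET, port);   /* registration fails as a whole */
        return err;
    }
    return 0;
}

/* Counts how often the PF_INET transport was torn down. */
int inet_closed;

/* Stub: behaves like the rdma transport reported in this thread. */
int rdma_like_create(int family, unsigned short port)
{
    (void)port;
    return family == PF_INET6 ? -EADDRNOTAVAIL : 0;
}

/* Stub: behaves like a kernel built without IPv6 support. */
int no_ipv6_create(int family, unsigned short port)
{
    (void)port;
    return family == PF_INET6 ? -EAFNOSUPPORT : 0;
}

void record_close(int family, unsigned short port)
{
    (void)port;
    if (family == PF_INET)
        inet_closed++;
}
```

With the rdma-like stub, add_transport() returns -EADDRNOTAVAIL and the PF_INET transport is closed again; with the no-IPv6 stub it returns 0 and the PF_INET transport stays up.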

When I do "echo rdma 20049 > /proc/fs/nfsd/portlist", I see the PF_INET
transport get created successfully, but the INET6 transport create fails
with -EADDRNOTAVAIL.

Does the rdma-cm allow concurrent binds to PF_INET, INADDR_ANY, port=X 
and PF_INET6, IN6ADDR_ANY_INIT, port=X ?  Apparently the native stack 
allows this (which makes sense seeing as how they are different protocol 
families).



Steve.



Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Steve Wise
Actually, this looks like a recent nfs change that now creates an INET 
and INET6 transport when you add one via the /proc/fs/nfsd/portlist 
file.  Looking at 2.6.30, which works, the nfsctl.c code is very 
different and only creates an INET transport...







RE: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Sean Hefty
> Does the rdma-cm allow concurrent binds to PF_INET, INADDR_ANY, port=X
> and PF_INET6, IN6ADDR_ANY_INIT, port=X ?

No, since this shows up in the rdma_cm as using the same port space.

The rdma_cm might be able to support this if the port space were separated based
on the address family, depending on how PS IB ends up.
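
The difference between one shared port space and one port space per address family can be illustrated with a toy bind table. This is a sketch under stated assumptions, not the rdma_cm implementation: struct port_space, ps_bind(), and the per_family switch are hypothetical, and -EADDRNOTAVAIL is used for a conflict only because that is the error reported earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

#define MAX_BINDS 16

struct bind_entry {
    int family;
    unsigned short port;
};

/* Hypothetical toy port space; not an rdma_cm data structure. */
struct port_space {
    bool per_family;   /* false: one shared space; true: one per family */
    size_t count;
    struct bind_entry binds[MAX_BINDS];
};

/* Record a wildcard bind, or return a negative errno on conflict. */
int ps_bind(struct port_space *ps, int family, unsigned short port)
{
    assert(ps != NULL);

    for (size_t i = 0; i < ps->count; i++) {
        bool same_port = ps->binds[i].port == port;
        bool same_family = ps->binds[i].family == family;

        /* In a shared space the family is ignored entirely. */
        if (same_port && (!ps->per_family || same_family))
            return -EADDRNOTAVAIL;
    }
    if (ps->count == MAX_BINDS)
        return -ENOSPC;

    ps->binds[ps->count].family = family;
    ps->binds[ps->count].port = port;
    ps->count++;
    return 0;
}
```

In the shared model a second bind to port 20049 fails no matter which family asks for it; in the per-family model AF_INET and AF_INET6 can each hold the port once.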



Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Steve Wise

Sean Hefty wrote:
> The rdma_cm might be able to support this if the port space were separated based
> on the address family, depending on how PS IB ends up.


I think separate port spaces is the correct solution.

Steve.



Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Roland Dreier
>> The rdma_cm might be able to support this if the port space were separated
>> based on the address family, depending on how PS IB ends up.
>
> I think separate port spaces is the correct solution.

This gets a bit tricky -- for normal IP stuff, there's the bindv6only
sysctl (and the IPV6_V6ONLY socket option).  Without that, you can't
bind an IPv4 socket to the same port as an IPv6 socket, since the IPv6
socket will accept IPv4 connections via a v4-mapped IPv6 address.  (You
can look at inet_csk_bind_conflict() to see the full complexity of the
checking done when binding an IPv4 socket.)
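
For ordinary sockets, the effect of IPV6_V6ONLY can be tried from userspace. The following is a minimal sketch, assuming a Linux host with IPv6 enabled; bind_wildcard() is a hypothetical helper written for this illustration, and it uses only standard socket calls (socket, setsockopt, bind, getsockname).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a TCP socket to the wildcard address of `family`.
 * *port == 0 asks the kernel to pick a port; the chosen port is
 * written back. v6only is applied to AF_INET6 sockets only.
 * Returns the socket fd, or -errno on failure.
 */
int bind_wildcard(int family, unsigned short *port, int v6only)
{
    struct sockaddr_storage ss;
    socklen_t len;
    int fd, err;

    assert(port != NULL);
    assert(family == AF_INET || family == AF_INET6);

    fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    memset(&ss, 0, sizeof(ss));
    if (family == AF_INET6) {
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;

        if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                       &v6only, sizeof(v6only)) < 0)
            goto fail;
        sin6->sin6_family = AF_INET6;
        sin6->sin6_addr = in6addr_any;
        sin6->sin6_port = htons(*port);
        len = sizeof(*sin6);
    } else {
        struct sockaddr_in *sin = (struct sockaddr_in *)&ss;

        sin->sin_family = AF_INET;
        sin->sin_addr.s_addr = htonl(INADDR_ANY);
        sin->sin_port = htons(*port);
        len = sizeof(*sin);
    }

    if (bind(fd, (struct sockaddr *)&ss, len) < 0)
        goto fail;

    /* Report the port actually bound (relevant when *port was 0). */
    len = sizeof(ss);
    if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
        goto fail;
    if (family == AF_INET6)
        *port = ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
    else
        *port = ntohs(((struct sockaddr_in *)&ss)->sin_port);
    return fd;

fail:
    err = errno;
    close(fd);
    return -err;
}
```

Binding AF_INET6 with v6only set to 1 and then AF_INET to the same port is expected to succeed. With v6only set to 0, the second bind is expected to fail on Linux, which is the conflict described above.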

I wonder what the right way is for the RDMA CM to stay close to Linux
sockets semantics without adding too much horror.  (Adding Jason to
the CC list since he usually has an opinion about things like this :)

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Jason Gunthorpe
On Mon, Mar 29, 2010 at 12:01:07PM -0700, Roland Dreier wrote:
>>> The rdma_cm might be able to support this if the port space were
>>> separated based on the address family, depending on how PS IB ends up.
>>
>> I think separate port spaces is the correct solution.
>
> This gets a bit tricky -- for normal IP stuff, there's the bindv6only
> sysctl (and the IPV6_V6ONLY socket option).  Without that, you can't
> bind an IPv4 socket to the same port as an IPv6 socket, since the IPv6
> socket will accept IPv4 connections via a v4-mapped IPv6 address.  (You
> can look at inet_csk_bind_conflict() to see the full complexity of the
> checking done when binding an IPv4 socket.)

Yeah, exactly, it is very complex and there is a real need for
things pretending to be IP to capture all this subtlety. The details
can't just be skipped over, people will notice :(

Though, I'm also not entirely certain that NFS-RDMA is right to bind
to both AFs. Generally speaking, on Linux a multi-protocol app only
wants to bind to v6 addresses. Or is it using IPV6_V6ONLY or the like?

> I wonder what the right way is for the RDMA CM to stay close to Linux
> sockets semantics without adding too much horror.  (Adding Jason to
> the CC list since he usually has an opinion about things like this :)

Clearly the best way is to figure out some way to work with the
existing routines in the kernel. This stuff is complex and duplicating
all of it in rdma_cm would be annoying.

To match the semantics each CM ID would still register to one SID but
an incoming connection request on a v4 PF SID could be matched to a v6
SID, etc.

I don't think new port spaces in the API are desirable.

Jason


Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Jason Gunthorpe
On Mon, Mar 29, 2010 at 02:51:46PM -0500, Steve Wise wrote:

>> Yeah, exactly, it is very complex and there is a real need for
>> things pretending to be IP to capture all this subtlety. The details
>> can't just be skipped over, people will notice :(
>>
>> Though, I'm also not entirely certain that NFS-RDMA is right to bind
>> to both AFs. Generally speaking, on Linux a multi-protocol app only
>> wants to bind to v6 addresses. Or is it using IPV6_V6ONLY or the like?
>
> This issue is really not in the NFS-RDMA code.  The nfsd code is doing
> the binding.  See commit:
>
> 37498292aa97658a5d0a9bb84699ce8c1016bb74
> Author: Chuck Lever chuck.le...@oracle.com
> Date:   Tue Jan 26 14:04:22 2010 -0500
>
>     NFSD: Create PF_INET6 listener in write_ports

Sure, but it relies on the behavior of svcsock.c, which does this:

	if (family == PF_INET6)
		kernel_setsockopt(sock, SOL_IPV6, IPV6_V6ONLY,
					(char *)&val, sizeof(val));

And the NFS-RDMA has no equivalent. Having the common code explicitly
rely on IPV6_V6ONLY is quite troublesome when you can't implement it
:)

Jason


Re: nfsrdma broken on 2.6.34-rc1?

2010-03-29 Thread Roland Dreier
> This issue is really not in the NFS-RDMA code.  The nfsd code is doing
> the binding.  See commit:

I think the really relevant thing is 7d21c0f9 (SUNRPC: Set IPV6ONLY
flag on PF_INET6 RPC listener sockets) and followups.  NFS expects to
have one IPv6-only socket and one IPv4-only socket.

It seems RDMA CM should create a similar V6ONLY option for binding (and
probably default to the /proc/sys/net/ipv6/bindv6only sysctl value) to
handle this.
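
A V6ONLY-style flag would turn the wildcard conflict check into something like the sketch below. It is a deliberately simplified model, not rdma_cm code and not a copy of inet_csk_bind_conflict(): struct listener and wildcard_conflict() are hypothetical names, and the model ignores everything the real sockets check also considers (address reuse options, bound devices, non-wildcard addresses).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical description of one wildcard listener. */
struct listener {
    int family;            /* AF_INET or AF_INET6 */
    unsigned short port;
    bool v6only;           /* meaningful for AF_INET6 only */
};

/* Would wildcard listeners a and b conflict under sockets-like rules? */
bool wildcard_conflict(const struct listener *a, const struct listener *b)
{
    const struct listener *v6;

    assert(a != NULL && b != NULL);

    if (a->port != b->port)
        return false;
    if (a->family == b->family)
        return true;

    /* Mixed families: the AF_INET6 side also covers IPv4 unless v6only. */
    v6 = (a->family == AF_INET6) ? a : b;
    return !v6->v6only;
}
```

Under this model an AF_INET listener and a v6only AF_INET6 listener can share port 20049, which is the pairing NFS expects, while a dual-stack AF_INET6 listener would still conflict with the AF_INET one.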