Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-09 Thread Ka-Cheong Poon

On 07/07/2018 01:26 AM, Santosh Shilimkar wrote:

Hi Ka-Cheong,

On 7/6/2018 8:25 AM, Sowmini Varadhan wrote:

On (07/06/18 23:08), Ka-Cheong Poon wrote:


As mentioned in a previous mail, it is unclear why the
port number is transport specific.  Most Internet services
use the same port number running over TCP/UDP as shown
in the IANA database.  And the IANA RDS registration is
the same.  What is the rationale of having a transport
specific number in the RDS implementation?


because not every transport may need a port number.


Lets keep separate port for RDMA and TCP transport. This has been
already useful for wireshark dissector and can also help for eBPF
like external tooling. The fragment format and re-assembly is
different across transports.



But does it have anything to do with the fact that a #define
is in the rds.h file?



I do see your point and also agree that port number isn't transport
specific and in case we need to add another transport, what port
to use. But may be till then lets keep the existing behavior.
As such this port switch is not related to IPv6 support as such
so lets deal with it separately.



As I mentioned in a previous mail, I can move the #define if
it makes folks happy.  But the number cannot be changed as it
is already being used.


--
K. Poon
ka-cheong.p...@oracle.com




Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Santosh Shilimkar

Hi Ka-Cheong,

On 7/6/2018 8:25 AM, Sowmini Varadhan wrote:

On (07/06/18 23:08), Ka-Cheong Poon wrote:


As mentioned in a previous mail, it is unclear why the
port number is transport specific.  Most Internet services
use the same port number running over TCP/UDP as shown
in the IANA database.  And the IANA RDS registration is
the same.  What is the rationale of having a transport
specific number in the RDS implementation?


because not every transport may need a port number.


Lets keep separate port for RDMA and TCP transport. This has been
already useful for wireshark dissector and can also help for eBPF
like external tooling. The fragment format and re-assembly is
different across transports.

I do see your point and also agree that port number isn't transport
specific and in case we need to add another transport, what port
to use. But may be till then lets keep the existing behavior.
As such this port switch is not related to IPv6 support as such
so lets deal with it separately.

Regards,
Santosh


Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Sowmini Varadhan
On (07/06/18 23:08), Ka-Cheong Poon wrote:
> 
> As mentioned in a previous mail, it is unclear why the
> port number is transport specific.  Most Internet services
> use the same port number running over TCP/UDP as shown
> in the IANA database.  And the IANA RDS registration is
> the same.  What is the rationale of having a transport
> specific number in the RDS implementation?

because not every transport may need a port number.

e.g., "RDS over pigeon carrier" may not need a port number.

in a different design (e.g., KCM) the listen port is configurable
with sysctl

Some may need more than one, e.g., rds_rdma, evidently.

What is the problem with using the unused 18635 for the RDS_CM_PORT?






Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Ka-Cheong Poon

On 07/06/2018 06:15 PM, Sowmini Varadhan wrote:

On (07/06/18 17:08), Ka-Cheong Poon wrote:

Hmm. Why do you need to include tcp header in ib transport
code ? If there is any common function just move to core
common file and use it.


I think it can be removed as it is left over from earlier
changes when the IB IPv6 listener port was RDS_TCP_PORT.
Now all the port definitions are in rds.h.


Transport specific information such as port numbers should not
be smeared into the common rds-module.  Please fix that.

If IB is also piggybacking on port 16385 (why?), please use your
own definitions for it in IB specific header files.



As mentioned in a previous mail, it is unclear why the
port number is transport specific.  Most Internet services
use the same port number running over TCP/UDP as shown
in the IANA database.  And the IANA RDS registration is
the same.  What is the rationale of having a transport
specific number in the RDS implementation?


--
K. Poon
ka-cheong.p...@oracle.com




Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread santosh.shilim...@oracle.com

On 7/6/18 2:08 AM, Ka-Cheong Poon wrote:

On 07/06/2018 01:58 AM, Santosh Shilimkar wrote:






diff --git a/net/rds/connection.c b/net/rds/connection.c
index abef75d..ca72563 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c


@@ -142,9 +151,12 @@ static void __rds_conn_path_init(struct 
rds_connection *conn,

   * are torn down as the module is removed, if ever.
   */
  static struct rds_connection *__rds_conn_create(struct net *net,
-    __be32 laddr, __be32 faddr,
-   struct rds_transport *trans, gfp_t gfp,
-   int is_outgoing)
+    const struct in6_addr *laddr,
+    const struct in6_addr *faddr,
+    struct rds_transport *trans,
+    gfp_t gfp,
+    int is_outgoing,
+    int dev_if)

Am just wondering if we can handle local link address case differently
instead of pushing the interface index everywhere. Did you think about
any alternative. This can also avoid bunch of changes again and hence
the question. Am trying to see if we can minimize the changes to
absolute must have to support IPv6.



If link local address is supported, scoped ID must be used.
Unless we remove the support of link local address, we need
to keep scope ID.


Ok.




diff --git a/net/rds/ib.h b/net/rds/ib.h
index a6f4d7d..12f96b3 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h



+union rds_ib_conn_priv {
+    struct rds_ib_connect_private    ricp_v4;
+    struct rds6_ib_connect_private    ricp_v6;
  };

This change was invetiable. Just add a comment saying the priviate
data size for v6 vs v4 is  different. Also for IPv6 priviate data,
I suggest add some resrve fields for any future extensions so
that things can be added without breaking wire protocol. With IPv4,
we ended up in bit of mess.



Will add some comments.  Unfortunately the IB private data
exchange has only a limited space.  I don't think there is
any more space left for reserved field.  I think we should
think of a different way to handle extensions in future.


There is enough space. You can send upto 512 bytes iirc. Please
add some reserve fields and ping me if you run into issues.


diff --git a/net/rds/send.c b/net/rds/send.c
index 94c7f74..6ed2e92 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c


@@ -1081,27 +1085,59 @@ int rds_sendmsg(struct socket *sock, struct 
msghdr *msg, size_t payload_len)

  goto out;
  }
-    if (msg->msg_namelen) {
-    /* XXX fail non-unicast destination IPs? */
-    if (msg->msg_namelen < sizeof(*usin) || usin->sin_family != 
AF_INET) {

+    namelen = msg->msg_namelen;
+    if (namelen != 0) {
+    if (namelen < sizeof(*usin)) {
+    ret = -EINVAL;
+    goto out;
+    }
+    switch (namelen) {
+    case sizeof(*usin):
+    if (usin->sin_family != AF_INET ||
+    usin->sin_addr.s_addr == htonl(INADDR_ANY) ||
+    usin->sin_addr.s_addr == htonl(INADDR_BROADCAST) ||
+    IN_MULTICAST(ntohl(usin->sin_addr.s_addr))) {
+    ret = -EINVAL;

The idea was to fail non-unicast IP someday but can you not add this
change as part of v6 series. We can look at it later unless you need
this change for v6. Again the question is mainly to add only necessary
check and leave the existing behavior as is so please re-check all below
checks too.



This is to match the same IPv6 check.  In IPv6 code, it checks
if the address is a unicast address.  I can remove the IPv4
checks but if an app does send to a multicast address, the
connection will be stuck forever.


OK. Lets keep it then.




diff --git a/net/rds/transport.c b/net/rds/transport.c
index 0b188dd..c9788db 100644
--- a/net/rds/transport.c
+++ b/net/rds/transport.c



+    if (ipv6_addr_v4mapped(addr)) {


Dave already commented on ipv6_addr_v4mapped(). Apart from
some of those comments questions, rest of the patch looks
really good and nicely done. Will also look at your
subsequent two patches and try to send you comments in next
few days.



Do you mean using mapped address to create IPv4 connections?
I already have it in my work space.  Will be in v3.


yeah. Thanks !!


Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Sowmini Varadhan
On (07/06/18 17:08), Ka-Cheong Poon wrote:
> >Hmm. Why do you need to include tcp header in ib transport
> >code ? If there is any common function just move to core
> >common file and use it.
> 
> I think it can be removed as it is left over from earlier
> changes when the IB IPv6 listener port was RDS_TCP_PORT.
> Now all the port definitions are in rds.h.

Transport specific information such as port numbers should not
be smeared into the common rds-module.  Please fix that.

If IB is also piggybacking on port 16385 (why?), please use your
own definitions for it in IB specific header files. 

Santosh, David, I have to NACK this if it is not changed.

--Sowmini









Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-06 Thread Ka-Cheong Poon

On 07/06/2018 01:58 AM, Santosh Shilimkar wrote:


--- a/net/rds/bind.c
+++ b/net/rds/bind.c



@@ -42,42 +43,58 @@
  static const struct rhashtable_params ht_parms = {
  .nelem_hint = 768,
-    .key_len = sizeof(u64),
+    .key_len = RDS_BOUND_KEY_LEN,

Do we really need the scope id to be part of the key ? With link
local/global, do you see any collisions. Please educate me
on the actual usecase. This can avoid bunch of changes and hence
the question.



Yes, because link local address is not unique.  The
same address can be in two different links.  The scope
ID is used to differentiate that.




@@ -114,7 +132,7 @@ static int rds_add_bound(struct rds_sock *rs, 
__be32 addr, __be16 *port)

    rs, , (int)ntohs(*port));
  break;
  } else {
-    rs->rs_bound_addr = 0;
+    rs->rs_bound_addr = in6addr_any;

Can you elaborate why 0 is not ok ?



The type of rs_bound_addr is struct in6_addr.  So 0
cannot be used.



  int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
  {
  struct sock *sk = sock->sk;
-    struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
  struct rds_sock *rs = rds_sk_to_rs(sk);
+    struct in6_addr v6addr, *binding_addr;
  struct rds_transport *trans;
+    __u32 scope_id = 0;
  int ret = 0;
+    __be16 port;
+    /* We only allow an RDS socket to be bound to and IPv4 address. IPv6

s/'bound to and IPv4'/'bound to an IPv4'



Changed.  But this comment will be removed in patch #2
anyway.



+ * address support will be added later.
+ */
+    if (addr_len == sizeof(struct sockaddr_in)) {
+    struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
+
+    if (sin->sin_family != AF_INET ||
+    sin->sin_addr.s_addr == htonl(INADDR_ANY))
+    return -EINVAL;
+    ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, );
+    binding_addr = 
+    port = sin->sin_port;
+    } else if (addr_len == sizeof(struct sockaddr_in6)) {
+    return -EPROTONOSUPPORT;
+    } else {
+    return -EINVAL;
+    }
  lock_sock(sk);
-    if (addr_len != sizeof(struct sockaddr_in) ||
-    sin->sin_family != AF_INET ||
-    rs->rs_bound_addr ||
-    sin->sin_addr.s_addr == htonl(INADDR_ANY)) {
+    /* RDS socket does not allow re-binding. */
+    if (!ipv6_addr_any(>rs_bound_addr)) {
  ret = -EINVAL;
  goto out;
  }

Seems new behavior you are adding. The statement itself is
true but if we return silently for already bind address, its
ok. May be am missing something above.



This is the existing behavior.  The old code checks if
rs_bound_addr is non-zero or not.  If it is non-zero,
it returns an error.



diff --git a/net/rds/connection.c b/net/rds/connection.c
index abef75d..ca72563 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c


@@ -142,9 +151,12 @@ static void __rds_conn_path_init(struct 
rds_connection *conn,

   * are torn down as the module is removed, if ever.
   */
  static struct rds_connection *__rds_conn_create(struct net *net,
-    __be32 laddr, __be32 faddr,
-   struct rds_transport *trans, gfp_t gfp,
-   int is_outgoing)
+    const struct in6_addr *laddr,
+    const struct in6_addr *faddr,
+    struct rds_transport *trans,
+    gfp_t gfp,
+    int is_outgoing,
+    int dev_if)

Am just wondering if we can handle local link address case differently
instead of pushing the interface index everywhere. Did you think about
any alternative. This can also avoid bunch of changes again and hence
the question. Am trying to see if we can minimize the changes to
absolute must have to support IPv6.



If link local address is supported, scoped ID must be used.
Unless we remove the support of link local address, we need
to keep scope ID.



diff --git a/net/rds/ib.h b/net/rds/ib.h
index a6f4d7d..12f96b3 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h



+union rds_ib_conn_priv {
+    struct rds_ib_connect_private    ricp_v4;
+    struct rds6_ib_connect_private    ricp_v6;
  };

This change was invetiable. Just add a comment saying the priviate
data size for v6 vs v4 is  different. Also for IPv6 priviate data,
I suggest add some resrve fields for any future extensions so
that things can be added without breaking wire protocol. With IPv4,
we ended up in bit of mess.



Will add some comments.  Unfortunately the IB private data
exchange has only a limited space.  I don't think there is
any more space left for reserved field.  I think we should
think of a different way to handle extensions in future.



--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2006 Oracle.  All rights reserved.
+ * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights 
reserved.

   *
   * This software is available to you under a choice of one 

Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-05 Thread Santosh Shilimkar

On 6/27/2018 3:23 AM, Ka-Cheong Poon wrote:

This patch changes the internal representation of an IP address to use
struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
All the functions which take an IP address as argument are also
changed to use struct in6_addr.  But RDS socket layer is not modified
such that it still does not accept IPv6 address from an application.
And RDS layer does not accept nor initiate IPv6 connections.

v2: Fixed sparse warnings.

Signed-off-by: Ka-Cheong Poon 
---
  net/rds/af_rds.c | 138 --
  net/rds/bind.c   |  91 +-
  net/rds/cong.c   |  23 ++--
  net/rds/connection.c | 132 +
  net/rds/ib.c |  17 +--
  net/rds/ib.h |  45 +--
  net/rds/ib_cm.c  | 300 +++
  net/rds/ib_rdma.c|  15 +--
  net/rds/ib_recv.c|  18 +--
  net/rds/ib_send.c|  10 +-
  net/rds/loop.c   |   7 +-
  net/rds/rdma.c   |   6 +-
  net/rds/rdma_transport.c |  56 ++---
  net/rds/rds.h|  69 +++
  net/rds/recv.c   |  51 +---
  net/rds/send.c   |  67 ---
  net/rds/tcp.c|  32 -
  net/rds/tcp_connect.c|  34 +++---
  net/rds/tcp_listen.c |  18 +--
  net/rds/tcp_recv.c   |   9 +-
  net/rds/tcp_send.c   |   4 +-
  net/rds/threads.c|  69 +--
  net/rds/transport.c  |  15 ++-
  23 files changed, 857 insertions(+), 369 deletions(-)




diff --git a/net/rds/bind.c b/net/rds/bind.c
index 5aa3a64..3822886 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2006 Oracle.  All rights reserved.
+ * Copyright (c) 2006, 2018 Oracle and/or its affiliates. All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -33,6 +33,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -42,42 +43,58 @@
  
  static const struct rhashtable_params ht_parms = {

.nelem_hint = 768,
-   .key_len = sizeof(u64),
+   .key_len = RDS_BOUND_KEY_LEN,

Do we really need the scope id to be part of the key ? With link
local/global, do you see any collisions. Please educate me
on the actual usecase. This can avoid bunch of changes and hence
the question.


@@ -114,7 +132,7 @@ static int rds_add_bound(struct rds_sock *rs, __be32 addr, 
__be16 *port)
  rs, , (int)ntohs(*port));
break;
} else {
-   rs->rs_bound_addr = 0;
+   rs->rs_bound_addr = in6addr_any;

Can you elaborate why 0 is not ok ?


rds_sock_put(rs);
ret = -ENOMEM;
break;
@@ -127,44 +145,61 @@ static int rds_add_bound(struct rds_sock *rs, __be32 
addr, __be16 *port)
  void rds_remove_bound(struct rds_sock *rs)
  {
  
-	if (!rs->rs_bound_addr)

+   if (ipv6_addr_any(>rs_bound_addr))
return;
  
-	rdsdebug("rs %p unbinding from %pI4:%d\n",

+   rdsdebug("rs %p unbinding from %pI6c:%d\n",
 rs, >rs_bound_addr,
 ntohs(rs->rs_bound_port));
  
  	rhashtable_remove_fast(_hash_table, >rs_bound_node, ht_parms);

rds_sock_put(rs);
-   rs->rs_bound_addr = 0;
+   rs->rs_bound_addr = in6addr_any;
  }
  
  int rds_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)

  {
struct sock *sk = sock->sk;
-   struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
struct rds_sock *rs = rds_sk_to_rs(sk);
+   struct in6_addr v6addr, *binding_addr;
struct rds_transport *trans;
+   __u32 scope_id = 0;
int ret = 0;
+   __be16 port;
  
+	/* We only allow an RDS socket to be bound to and IPv4 address. IPv6

s/'bound to and IPv4'/'bound to an IPv4'

+* address support will be added later.
+*/
+   if (addr_len == sizeof(struct sockaddr_in)) {
+   struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
+
+   if (sin->sin_family != AF_INET ||
+   sin->sin_addr.s_addr == htonl(INADDR_ANY))
+   return -EINVAL;
+   ipv6_addr_set_v4mapped(sin->sin_addr.s_addr, );
+   binding_addr = 
+   port = sin->sin_port;
+   } else if (addr_len == sizeof(struct sockaddr_in6)) {
+   return -EPROTONOSUPPORT;
+   } else {
+   return -EINVAL;
+   }
lock_sock(sk);
  
-	if (addr_len != sizeof(struct sockaddr_in) ||

-   sin->sin_family != AF_INET ||
-   rs->rs_bound_addr ||
-   sin->sin_addr.s_addr == htonl(INADDR_ANY)) {
+   /* RDS socket does not allow re-binding. */
+   if (!ipv6_addr_any(>rs_bound_addr)) {
ret = -EINVAL;

Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-07-03 Thread Ka-Cheong Poon

On 06/30/2018 04:50 PM, David Miller wrote:

From: Ka-Cheong Poon 
Date: Wed, 27 Jun 2018 03:23:27 -0700


This patch changes the internal representation of an IP address to use
struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
All the functions which take an IP address as argument are also
changed to use struct in6_addr.  But RDS socket layer is not modified
such that it still does not accept IPv6 address from an application.
And RDS layer does not accept nor initiate IPv6 connections.

v2: Fixed sparse warnings.

Signed-off-by: Ka-Cheong Poon 


I really don't like this.

An ipv4 mapped ipv6 address is not the same as an ipv4 address.

You are effectively preventing the use of ipv6 connections
using ipv4 mapped addresses.



Could you please clarify what is meant by an IPv6
connections using IPv4 mapped address?  An IPv6 packet
cannot use an IPv4 mapped address as source or destination
address.  Do you mean an app uses an IPv4 mapped address
in a struct sockaddr_in6 to set up an IPv4 connection?

Please note that this patch is patch #1.  This patch
alone does not support RDS/IPv6.  Hence this patch has
checks to prevent an app to use IPv6 address.  Those
checks will be removed in patch #2.



Also, assuming the sockaddr type based upon size is wrong.
You have to check the family field, then you can decide
to interpret the rest of the sockaddr in one way or another
and also validate it's length.




--
K. Poon
ka-cheong.p...@oracle.com




Re: [PATCH v2 net-next 1/3] rds: Changing IP address internal representation to struct in6_addr

2018-06-30 Thread David Miller
From: Ka-Cheong Poon 
Date: Wed, 27 Jun 2018 03:23:27 -0700

> This patch changes the internal representation of an IP address to use
> struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
> All the functions which take an IP address as argument are also
> changed to use struct in6_addr.  But RDS socket layer is not modified
> such that it still does not accept IPv6 address from an application.
> And RDS layer does not accept nor initiate IPv6 connections.
> 
> v2: Fixed sparse warnings.
> 
> Signed-off-by: Ka-Cheong Poon 

I really don't like this.

An ipv4 mapped ipv6 address is not the same as an ipv4 address.

You are effectively preventing the use of ipv6 connections
using ipv4 mapped addresses.

Also, assuming the sockaddr type based upon size is wrong.
You have to check the family field, then you can decide
to interpret the rest of the sockaddr in one way or another
and also validate it's length.