Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Tom Herbert
On Wed, Jul 29, 2015 at 2:54 AM, Thomas Graf tg...@suug.ch wrote:
 On 07/29/15 at 11:29am, Eric Dumazet wrote:
 On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
  On 07/28/15 at 04:02pm, Tom Herbert wrote:
   This patch creates sk_set_txhash and eliminates protocol specific
   inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
   random number instead of performing flow dissection. sk_set_txhash
   is also allowed to be called multiple times for the same socket,
   we'll need this when redoing the hash for negative routing advice.
  
   Signed-off-by: Tom Herbert t...@herbertland.com
 
  Doesn't this break TX hashing with SO_REUSEPORT?


 AFAIK nothing uses sk_txhash yet.

 skb_set_hash_from_sk()
 skb_get_hash()

 Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
 to assume that different sockets may go to different queues even if
 the L4 tuple is identical.

Hi Thomas,

The salient property of both sk_txhash and skb->hash is that they
provide a uniform distribution over flows. It is incorrect to assume
that either of these is immutable during the lifetime of a flow, so yes
this means that packets of a flow may go to different receive queues
when hashes change. SO_REUSEPORT is a process in the receive path but
uses ehashfn over the ports. But even with SO_REUSEPORT we provide no
guarantee that packets of a flow will always hit the same socket;
the hashing is not consistent when new reuseport sockets are added or
removed-- this is actually a long-standing issue with SO_REUSEPORT in
the TCP case since it is possible to orphan connections in SYN-RECV. I
believe Eric was working toward fixing that, so maybe in the future we
can use skb->hash if it is a savings.
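
To make the point about queues concrete, here is a minimal userspace
sketch of how a 32-bit hash can be scaled onto a queue index. The
function name and the use of this arithmetic for queue selection are
assumptions for illustration; the multiply-and-shift itself is the
calculation the kernel's reciprocal_scale() helper performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit hash onto [0, nr_queues) without a division.
 * model_pick_queue() is a hypothetical name, not a kernel function.
 */
uint32_t model_pick_queue(uint32_t hash, uint32_t nr_queues)
{
	assert(nr_queues > 0);
	return (uint32_t)(((uint64_t)hash * nr_queues) >> 32);
}
```

With 8 queues, a flow whose hash is 0x10000000 maps to queue 0; if the
hash is later redone and becomes 0x30000000, the same flow maps to
queue 1. A uniformly distributed hash spreads flows evenly across the
queues, but nothing pins one flow to one queue for its lifetime.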

Thanks,
Tom
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Thomas Graf
On 07/29/15 at 08:58am, Tom Herbert wrote:
 The salient property of both sk_txhash and skb->hash is that they
 provide a uniform distribution over flows. It is incorrect to assume
 that either of these is immutable during the lifetime of a flow, so yes
 this means that packets of a flow may go to different receive queues
 when hashes change. SO_REUSEPORT is a process in the receive path but
 uses ehashfn over the ports. But even with SO_REUSEPORT we provide no
 guarantee that packets of a flow will always hit the same socket;
 the hashing is not consistent when new reuseport sockets are added or
 removed-- this is actually a long-standing issue with SO_REUSEPORT in
 the TCP case since it is possible to orphan connections in SYN-RECV. I
 believe Eric was working toward fixing that, so maybe in the future we
 can use skb->hash if it is a savings.

Thanks for the explanation. I have no objections to the changes then.


Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Thomas Graf
On 07/28/15 at 04:02pm, Tom Herbert wrote:
 This patch creates sk_set_txhash and eliminates protocol specific
 inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
 random number instead of performing flow dissection. sk_set_txhash
 is also allowed to be called multiple times for the same socket,
 we'll need this when redoing the hash for negative routing advice.
 
 Signed-off-by: Tom Herbert t...@herbertland.com

Doesn't this break TX hashing with SO_REUSEPORT?


Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Eric Dumazet
On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
 On 07/28/15 at 04:02pm, Tom Herbert wrote:
  This patch creates sk_set_txhash and eliminates protocol specific
  inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
  random number instead of performing flow dissection. sk_set_txhash
  is also allowed to be called multiple times for the same socket,
  we'll need this when redoing the hash for negative routing advice.
  
  Signed-off-by: Tom Herbert t...@herbertland.com
 
 Doesn't this break TX hashing with SO_REUSEPORT?


AFAIK nothing uses sk_txhash yet.




Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Thomas Graf
On 07/29/15 at 11:29am, Eric Dumazet wrote:
 On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
  On 07/28/15 at 04:02pm, Tom Herbert wrote:
   This patch creates sk_set_txhash and eliminates protocol specific
   inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
   random number instead of performing flow dissection. sk_set_txhash
   is also allowed to be called multiple times for the same socket,
   we'll need this when redoing the hash for negative routing advice.
   
   Signed-off-by: Tom Herbert t...@herbertland.com
  
  Doesn't this break TX hashing with SO_REUSEPORT?
 
 
 AFAIK nothing uses sk_txhash yet.

skb_set_hash_from_sk()
skb_get_hash()

Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
to assume that different sockets may go to different queues even if
the L4 tuple is identical.


Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Thomas Graf
On 07/29/15 at 12:06pm, Eric Dumazet wrote:
 On Wed, 2015-07-29 at 11:54 +0200, Thomas Graf wrote:
 
  skb_set_hash_from_sk()
  skb_get_hash()
  
  Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
  to assume that different sockets may go to different queues even if
  the L4 tuple is identical.
 
 skb_set_hash_from_sk() sets skb->hash in output path. Nothing uses it
 then later. bonding uses its own hash functions.
 
 SO_REUSEPORT is on input path, and uses its own hash anyway.

Yes, I'm talking about the output path only but after further reading,
the only case that could be a problem is if two SO_REUSEPORT sockets
connect to the same destination address and port which will be prevented
by connect anyway.


Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-29 Thread Eric Dumazet
On Wed, 2015-07-29 at 11:54 +0200, Thomas Graf wrote:

 skb_set_hash_from_sk()
 skb_get_hash()
 
 Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
 to assume that different sockets may go to different queues even if
 the L4 tuple is identical.

skb_set_hash_from_sk() sets skb->hash in output path. Nothing uses it
then later. bonding uses its own hash functions.

SO_REUSEPORT is on input path, and uses its own hash anyway.
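
For readers following along without a tree at hand, a minimal
userspace model of what skb_set_hash_from_sk() does on the output
path. The logic is written from recollection of kernels of this period
and should be treated as an assumption, not a quote; the struct and
function names prefixed with model_ are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real structs carry many more fields. */
struct model_tx_sock {
	uint32_t sk_txhash;
};

struct model_skb {
	uint32_t hash;
	unsigned int l4_hash : 1;
};

/*
 * Copy the socket's TX hash into the packet, but only when the socket
 * has one. A zero sk_txhash is treated as "not set" and leaves the
 * packet's hash fields untouched.
 */
void model_skb_set_hash_from_sk(struct model_skb *skb,
				const struct model_tx_sock *sk)
{
	assert(skb != NULL && sk != NULL);

	if (sk->sk_txhash) {
		skb->l4_hash = 1;
		skb->hash = sk->sk_txhash;
	}
}
```

If the model is accurate, zero doubles as the "no hash" marker, which
would explain why a setter for sk_txhash should never store 0.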






[PATCH net-next 1/2] net: Set sk_txhash from a random number

2015-07-28 Thread Tom Herbert
This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert t...@herbertland.com
---
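
A minimal userspace sketch of the behaviour described above, for
experimenting outside the kernel. Only the two steps (take a random
value, never leave the field at 0) come from the patch. The struct,
the xorshift32 generator standing in for prandom_u32(), and the
function-pointer parameter are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock: only the field under discussion. */
struct model_sock {
	uint32_t sk_txhash;
};

/* Assumed random source: xorshift32, NOT the kernel's prandom_u32(). */
static uint32_t model_rng_state = 0x9e3779b9u;

uint32_t model_random_u32(void)
{
	uint32_t x = model_rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	model_rng_state = x;
	return x;
}

/* Degenerate source that always yields 0, to exercise the fallback. */
uint32_t model_zero_u32(void)
{
	return 0;
}

/*
 * Same steps as sk_set_txhash() in this patch: store a random value and
 * replace 0 with 1. The random source is a parameter only so that the
 * zero case can be demonstrated; the kernel helper takes just the socket.
 */
void model_sk_set_txhash(struct model_sock *sk, uint32_t (*rnd)(void))
{
	assert(sk != NULL && rnd != NULL);

	sk->sk_txhash = rnd();
	if (!sk->sk_txhash)
		sk->sk_txhash = 1;
}
```

Calling model_sk_set_txhash() twice on the same socket simply
overwrites the previous value, which is the property needed when the
hash is redone. Unlike the flow-dissection variants removed below, two
sockets with an identical tuple will generally get different hashes.
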
 include/net/ip.h    | 16 ----------------
 include/net/ipv6.h  | 19 -------------------
 include/net/sock.h  |  8 ++++++++
 net/ipv4/datagram.c |  2 +-
 net/ipv4/tcp_ipv4.c |  4 ++--
 net/ipv6/datagram.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 7 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index d5fe9f2..bee5f35 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -370,22 +370,6 @@ static inline void iph_to_flow_copy_v4addrs(struct flow_keys *flow,
    flow->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
 }
 
-static inline void inet_set_txhash(struct sock *sk)
-{
-   struct inet_sock *inet = inet_sk(sk);
-   struct flow_keys keys;
-
-   memset(&keys, 0, sizeof(keys));
-
-   keys.addrs.v4addrs.src = inet->inet_saddr;
-   keys.addrs.v4addrs.dst = inet->inet_daddr;
-   keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-   keys.ports.src = inet->inet_sport;
-   keys.ports.dst = inet->inet_dport;
-
-   sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __wsum inet_gro_compute_pseudo(struct sk_buff *skb, int proto)
 {
    const struct iphdr *iph = skb_gro_network_header(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 82dbdb0..7c79798 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -707,25 +707,6 @@ static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static inline void ip6_set_txhash(struct sock *sk)
-{
-   struct inet_sock *inet = inet_sk(sk);
-   struct ipv6_pinfo *np = inet6_sk(sk);
-   struct flow_keys keys;
-
-   memset(&keys, 0, sizeof(keys));
-
-   memcpy(&keys.addrs.v6addrs.src, &np->saddr,
-          sizeof(keys.addrs.v6addrs.src));
-   memcpy(&keys.addrs.v6addrs.dst, &sk->sk_v6_daddr,
-          sizeof(keys.addrs.v6addrs.dst));
-   keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
-   keys.ports.src = inet->inet_sport;
-   keys.ports.dst = inet->inet_dport;
-
-   sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
                                        __be32 flowlabel, bool autolabel)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index 4353ef7..fe735c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1687,6 +1687,14 @@ static inline void sock_graft(struct sock *sk, struct socket *parent)
 kuid_t sock_i_uid(struct sock *sk);
 unsigned long sock_i_ino(struct sock *sk);
 
+static inline void sk_set_txhash(struct sock *sk)
+{
+   sk->sk_txhash = prandom_u32();
+
+   if (unlikely(!sk->sk_txhash))
+       sk->sk_txhash = 1;
+}
+
 static inline struct dst_entry *
 __sk_dst_get(struct sock *sk)
 {
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 574fad9..f915abf 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -74,7 +74,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
    inet->inet_daddr = fl4->daddr;
    inet->inet_dport = usin->sin_port;
    sk->sk_state = TCP_ESTABLISHED;
-   inet_set_txhash(sk);
+   sk_set_txhash(sk);
    inet->inet_id = jiffies;
 
    sk_dst_set(sk, &rt->dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 486ba96..d27eb54 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -222,7 +222,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
    if (err)
        goto failure;
 
-   inet_set_txhash(sk);
+   sk_set_txhash(sk);
 
    rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
                           inet->inet_sport, inet->inet_dport, sk);
@@ -1277,7 +1277,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
    newinet->mc_ttl       = ip_hdr(skb)->ttl;
    newinet->rcv_tos      = ip_hdr(skb)->tos;
    inet_csk(newsk)->icsk_ext_hdr_len = 0;
-   inet_set_txhash(newsk);
+   sk_set_txhash(newsk);
    if (inet_opt)
        inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
    newinet->inet_id = newtp->write_seq ^ jiffies;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2572a32..9aadd57 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -199,7 +199,7 @@ ipv4_connected:
                  NULL);
 
    sk->sk_state = TCP_ESTABLISHED;
-   ip6_set_txhash(sk);
+   sk_set_txhash(sk);
 out:
    fl6_sock_release(flowlabel);
    return err;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index