Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On Wed, Jul 29, 2015 at 2:54 AM, Thomas Graf <tg...@suug.ch> wrote:
> On 07/29/15 at 11:29am, Eric Dumazet wrote:
>> On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
>>> On 07/28/15 at 04:02pm, Tom Herbert wrote:
>>>> This patch creates sk_set_txhash and eliminates protocol specific
>>>> inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
>>>> random number instead of performing flow dissection. sk_set_txhash
>>>> is also allowed to be called multiple times for the same socket,
>>>> we'll need this when redoing the hash for negative routing advice.
>>>>
>>>> Signed-off-by: Tom Herbert <t...@herbertland.com>
>>>
>>> Doesn't this break TX hashing with SO_REUSEPORT?
>>
>> AFAIK nothing uses sk_txhash yet.
>
> skb_set_hash_from_sk() skb_get_hash()
>
> Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
> to assume that different sockets may go to different queues even if
> the L4 tuple is identical.

Hi Thomas,

The salient property of both sk_txhash and skb->hash is that they
provide a uniform distribution over flows. It is incorrect to assume
that either of these is immutable during the lifetime of a flow, so
yes, this means that packets of a flow may go to different receive
queues when hashes change.

SO_REUSEPORT operates in the receive path, but it uses ehashfn over
the ports. But even with SO_REUSEPORT we provide no guarantee that
packets of a flow will always hit the same socket, the hashing is not
consistent when new reuseport sockets are added or removed-- this is
actually a long standing issue with SO_REUSEPORT in the TCP case since
it is possible to orphan connections in SYN-RECV. I believe Eric was
working toward fixing that, so maybe in the future we can use
skb->hash if it is a savings.

Thanks,
Tom
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
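To make the "uniform but not immutable" point concrete, here is a minimal user-space sketch of what the patch's sk_set_txhash() does. It is not kernel code: struct mock_sock, mock_sk_set_txhash() and mock_pick_queue() are hypothetical names, rand() stands in for prandom_u32(), and the queue-selection step is an illustrative assumption rather than something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock; only the field discussed in
 * this thread is modelled. */
struct mock_sock {
	uint32_t sk_txhash;
};

/* User-space analogue of the patch's sk_set_txhash(): store a random
 * 32-bit value and never leave it at 0. rand() replaces prandom_u32()
 * here, which is an assumption made for the sketch. */
static void mock_sk_set_txhash(struct mock_sock *sk)
{
	/* rand() only guarantees 15 random bits, so combine three draws. */
	sk->sk_txhash = ((uint32_t)rand() << 30) ^
			((uint32_t)rand() << 15) ^
			(uint32_t)rand();

	if (sk->sk_txhash == 0)
		sk->sk_txhash = 1;
}

/* Map a hash onto one of n_queues buckets by scaling:
 * (hash * n_queues) >> 32. Illustrative only. */
static uint32_t mock_pick_queue(uint32_t hash, uint32_t n_queues)
{
	assert(n_queues > 0);
	return (uint32_t)(((uint64_t)hash * n_queues) >> 32);
}
```

Calling mock_sk_set_txhash() a second time on the same socket simply draws a new value, so the queue returned by mock_pick_queue() for that socket may change between calls. That is the behaviour described above: the hash spreads flows evenly, but nothing ties it to the flow's addresses or ports.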
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On 07/29/15 at 08:58am, Tom Herbert wrote:
> The salient property of both sk_txhash and skb->hash is that they
> provide a uniform distribution over flows. It is incorrect to assume
> that either of these is immutable during the lifetime of a flow, so
> yes, this means that packets of a flow may go to different receive
> queues when hashes change.
>
> SO_REUSEPORT operates in the receive path, but it uses ehashfn over
> the ports. But even with SO_REUSEPORT we provide no guarantee that
> packets of a flow will always hit the same socket, the hashing is not
> consistent when new reuseport sockets are added or removed-- this is
> actually a long standing issue with SO_REUSEPORT in the TCP case since
> it is possible to orphan connections in SYN-RECV. I believe Eric was
> working toward fixing that, so maybe in the future we can use
> skb->hash if it is a savings.

Thanks for the explanation. I have no objections to the changes then.
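The remark that reuseport hashing "is not consistent when new reuseport sockets are added or removed" can be illustrated with a small user-space sketch. The selection function below is an assumption made for illustration (scaling a 32-bit hash over the number of sockets); the thread only states that the receive path uses ehashfn over the ports. mock_select_socket() and mock_count_moved() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick one of n_socks sockets by scaling a 32-bit flow hash into
 * [0, n_socks): (hash * n_socks) >> 32. Assumed selection rule. */
static uint32_t mock_select_socket(uint32_t flow_hash, uint32_t n_socks)
{
	assert(n_socks > 0);
	return (uint32_t)(((uint64_t)flow_hash * n_socks) >> 32);
}

/* Count how many of the given flow hashes select a different socket
 * index once the group size changes from n_before to n_after. */
static unsigned int mock_count_moved(const uint32_t *hashes, size_t count,
				     uint32_t n_before, uint32_t n_after)
{
	unsigned int moved = 0;

	for (size_t i = 0; i < count; i++) {
		if (mock_select_socket(hashes[i], n_before) !=
		    mock_select_socket(hashes[i], n_after))
			moved++;
	}
	return moved;
}
```

Under this assumed rule, a flow whose hash is 0x60000000 selects index 0 in a group of two sockets and index 1 once a third socket joins, even though the flow's hash never changed. A connection still in SYN-RECV at that moment would see its later packets delivered to a different listener.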
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On 07/28/15 at 04:02pm, Tom Herbert wrote:
> This patch creates sk_set_txhash and eliminates protocol specific
> inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
> random number instead of performing flow dissection. sk_set_txhash
> is also allowed to be called multiple times for the same socket,
> we'll need this when redoing the hash for negative routing advice.
>
> Signed-off-by: Tom Herbert <t...@herbertland.com>

Doesn't this break TX hashing with SO_REUSEPORT?
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
> On 07/28/15 at 04:02pm, Tom Herbert wrote:
>> This patch creates sk_set_txhash and eliminates protocol specific
>> inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
>> random number instead of performing flow dissection. sk_set_txhash
>> is also allowed to be called multiple times for the same socket,
>> we'll need this when redoing the hash for negative routing advice.
>>
>> Signed-off-by: Tom Herbert <t...@herbertland.com>
>
> Doesn't this break TX hashing with SO_REUSEPORT?

AFAIK nothing uses sk_txhash yet.
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On 07/29/15 at 11:29am, Eric Dumazet wrote:
> On Wed, 2015-07-29 at 11:13 +0200, Thomas Graf wrote:
>> On 07/28/15 at 04:02pm, Tom Herbert wrote:
>>> This patch creates sk_set_txhash and eliminates protocol specific
>>> inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
>>> random number instead of performing flow dissection. sk_set_txhash
>>> is also allowed to be called multiple times for the same socket,
>>> we'll need this when redoing the hash for negative routing advice.
>>>
>>> Signed-off-by: Tom Herbert <t...@herbertland.com>
>>
>> Doesn't this break TX hashing with SO_REUSEPORT?
>
> AFAIK nothing uses sk_txhash yet.

skb_set_hash_from_sk() skb_get_hash()

Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
to assume that different sockets may go to different queues even if
the L4 tuple is identical.
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On 07/29/15 at 12:06pm, Eric Dumazet wrote:
> On Wed, 2015-07-29 at 11:54 +0200, Thomas Graf wrote:
>> skb_set_hash_from_sk() skb_get_hash()
>>
>> Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
>> to assume that different sockets may go to different queues even if
>> the L4 tuple is identical.
>
> skb_set_hash_from_sk() sets skb->hash in output path. Nothing uses it
> later. bonding uses its own hash functions.
>
> SO_REUSEPORT is on input path, and uses its own hash anyway.

Yes, I'm talking about the output path only, but after further reading,
the only case that could be a problem is if two SO_REUSEPORT sockets
connect to the same destination address and port, which will be
prevented by connect() anyway.
Re: [PATCH net-next 1/2] net: Set sk_txhash from a random number
On Wed, 2015-07-29 at 11:54 +0200, Thomas Graf wrote:
> skb_set_hash_from_sk() skb_get_hash()
>
> Am I misreading this? I'm not using SO_REUSEPORT and it might be OK
> to assume that different sockets may go to different queues even if
> the L4 tuple is identical.

skb_set_hash_from_sk() sets skb->hash in output path. Nothing uses it
later. bonding uses its own hash functions.

SO_REUSEPORT is on input path, and uses its own hash anyway.
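The output-path step described here can be modelled in user space roughly as follows. This is a sketch, not kernel code: struct mock_sock, struct mock_skb and mock_skb_set_hash_from_sk() are hypothetical stand-ins, and the assumption that a zero sk_txhash means "no hash set" is inferred from the patch forcing the value to be non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for struct sock and
 * struct sk_buff; only the fields relevant to this thread exist. */
struct mock_sock {
	uint32_t sk_txhash;	/* assumed: 0 means "no hash set" */
};

struct mock_skb {
	uint32_t hash;
	bool l4_hash;		/* marks the hash as a flow-level hash */
};

/* Output path: copy the socket's hash into the packet if one is set;
 * otherwise leave the packet untouched. */
static void mock_skb_set_hash_from_sk(struct mock_skb *skb,
				      const struct mock_sock *sk)
{
	assert(skb != NULL && sk != NULL);

	if (sk->sk_txhash) {
		skb->l4_hash = true;
		skb->hash = sk->sk_txhash;
	}
}
```

Seen this way, the packet's hash on transmit is just whatever value the socket currently carries, so replacing the flow-dissected value with a random one changes nothing for code that only needs an even spread across queues.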
[PATCH net-next 1/2] net: Set sk_txhash from a random number
This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert <t...@herbertland.com>
---
 include/net/ip.h    | 16 ----------------
 include/net/ipv6.h  | 19 -------------------
 include/net/sock.h  |  8 ++++++++
 net/ipv4/datagram.c |  2 +-
 net/ipv4/tcp_ipv4.c |  4 ++--
 net/ipv6/datagram.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 7 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index d5fe9f2..bee5f35 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -370,22 +370,6 @@ static inline void iph_to_flow_copy_v4addrs(struct flow_keys *flow,
 	flow->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
 }
 
-static inline void inet_set_txhash(struct sock *sk)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct flow_keys keys;
-
-	memset(&keys, 0, sizeof(keys));
-
-	keys.addrs.v4addrs.src = inet->inet_saddr;
-	keys.addrs.v4addrs.dst = inet->inet_daddr;
-	keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-	keys.ports.src = inet->inet_sport;
-	keys.ports.dst = inet->inet_dport;
-
-	sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __wsum inet_gro_compute_pseudo(struct sk_buff *skb, int proto)
 {
 	const struct iphdr *iph = skb_gro_network_header(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 82dbdb0..7c79798 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -707,25 +707,6 @@ static inline void iph_to_flow_copy_v6addrs(struct flow_keys *flow,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static inline void ip6_set_txhash(struct sock *sk)
-{
-	struct inet_sock *inet = inet_sk(sk);
-	struct ipv6_pinfo *np = inet6_sk(sk);
-	struct flow_keys keys;
-
-	memset(&keys, 0, sizeof(keys));
-
-	memcpy(&keys.addrs.v6addrs.src, &np->saddr,
-	       sizeof(keys.addrs.v6addrs.src));
-	memcpy(&keys.addrs.v6addrs.dst, &sk->sk_v6_daddr,
-	       sizeof(keys.addrs.v6addrs.dst));
-	keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
-	keys.ports.src = inet->inet_sport;
-	keys.ports.dst = inet->inet_dport;
-
-	sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
 					__be32 flowlabel, bool autolabel)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index 4353ef7..fe735c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1687,6 +1687,14 @@ static inline void sock_graft(struct sock *sk, struct socket *parent)
 kuid_t sock_i_uid(struct sock *sk);
 unsigned long sock_i_ino(struct sock *sk);
 
+static inline void sk_set_txhash(struct sock *sk)
+{
+	sk->sk_txhash = prandom_u32();
+
+	if (unlikely(!sk->sk_txhash))
+		sk->sk_txhash = 1;
+}
+
 static inline struct dst_entry *
 __sk_dst_get(struct sock *sk)
 {
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 574fad9..f915abf 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -74,7 +74,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
 	sk->sk_state = TCP_ESTABLISHED;
-	inet_set_txhash(sk);
+	sk_set_txhash(sk);
 	inet->inet_id = jiffies;
 
 	sk_dst_set(sk, &rt->dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 486ba96..d27eb54 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -222,7 +222,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err)
 		goto failure;
 
-	inet_set_txhash(sk);
+	sk_set_txhash(sk);
 
 	rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
 			       inet->inet_sport, inet->inet_dport, sk);
@@ -1277,7 +1277,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	newinet->mc_ttl = ip_hdr(skb)->ttl;
 	newinet->rcv_tos = ip_hdr(skb)->tos;
 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
-	inet_set_txhash(newsk);
+	sk_set_txhash(newsk);
 	if (inet_opt)
 		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
 	newinet->inet_id = newtp->write_seq ^ jiffies;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2572a32..9aadd57 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -199,7 +199,7 @@ ipv4_connected:
 		      NULL);
 
 	sk->sk_state = TCP_ESTABLISHED;
-	ip6_set_txhash(sk);
+	sk_set_txhash(sk);
 out:
 	fl6_sock_release(flowlabel);
 	return err;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index