Re: [PATCH v2] udp reuseport: fix packet of same flow hashed to different socket

2016-06-08 Thread Eric Dumazet
On Thu, 2016-06-09 at 09:43 +0800, Su Xuemin wrote:
> On Wed 2016-06-08 at 08:43 -0700, Eric Dumazet wrote:
> > I am not convinced this is the right way to fix the issue.
> > 
> > Fact that you did not change udp4_lib_lookup2() is telling me something.
> > 
> > 
> > 1) If host has 100 different IP, and 10 sockets bound to 0.0.0.0:53 
> > 2) If we receive datagrams from SRCIP:srcport send to IP{1..100}:53
> > will all be hashed on same slot.
> > 
> > See the hash used in soreuseport logic as an equivalent of a 4-tuple
> > hash really, not a 3-tuple one.
> > 
> 
> This is my understanding of __udp4_lib_lookup(), see the comments below,
> please kindly review it.
> The problem of the current code is:
>   In Path 1, we pass daddr to udp_ehashfn() inside udp4_lib_lookup2().
>   In Path 2, we pass 0.0.0.0 to udp_ehashfn() inside udp4_lib_lookup2().
>   In Path 3, we pass daddr to udp_ehashfn(), even if inet->inet_rcv_saddr
> is 0.0.0.0.

My patch always pass daddr, and never 0.0.0.0 which makes little sense.

If a socket is bound to 0.0.0.0, then daddr will match it anyway.

I wrote this code, and clearly I was wrong.

The only moment we should use 0.0.0.0 is to get the hash bucket, but
really the lookup should use all the available keys.

Your patch can not be backported to stable versions because socket is
not stable (SLAB_DESTROY_BY_RCU rules) meaning that
inet_sk(sk)->inet_rcv_saddr could contain garbage.

daddr on the other hand is stable, since it is on the caller stack or
incoming packet (IPv6) and can not change under us.

We do not want to compute a hash based on garbage.

Have you tried my patch ?





Re: [PATCH v2] udp reuseport: fix packet of same flow hashed to different socket

2016-06-08 Thread Su Xuemin
On Wed 2016-06-08 at 08:43 -0700, Eric Dumazet wrote:
> I am not convinced this is the right way to fix the issue.
> 
> Fact that you did not change udp4_lib_lookup2() is telling me something.
> 
> 
> 1) If host has 100 different IP, and 10 sockets bound to 0.0.0.0:53 
> 2) If we receive datagrams from SRCIP:srcport send to IP{1..100}:53
> will all be hashed on same slot.
> 
> See the hash used in soreuseport logic as an equivalent of a 4-tuple
> hash really, not a 3-tuple one.
> 

This is my understanding of __udp4_lib_lookup(), see the comments below,
please kindly review it.
The problem of the current code is:
  In Path 1, we pass daddr to udp_ehashfn() inside udp4_lib_lookup2().
  In Path 2, we pass 0.0.0.0 to udp_ehashfn() inside udp4_lib_lookup2().
  In Path 3, we pass daddr to udp_ehashfn(), even if inet->inet_rcv_saddr
is 0.0.0.0.
  The hash value returned by udp_ehashfn() is used as a random seed to
select socket from the sockets of a hslot. In Path 2 and Path 3, this
hash value is different, so we will select different sockets.
  Pass inet->inet_rcv_saddr to udp_ehashfn() in Path 3 will make the
logic consistent in all these three Pathes.

...
if (hslot->count > 10) {
/* Xuemin:
 * for udptable->hash2[], sockets bound to specific ip and
 * sockets bound to 0.0.0.0 may be placed in different hslot,
 * so we may have to search two hslots. */
...
/* Xuemin Path 1:
 * for udptable->hash2[], firstly try to find the socket
 * bound to the specific daddr. */
result = udp4_lib_lookup2(net, saddr, sport,
  daddr, hnum, dif,
  hslot2, slot2, skb);
if (!result) {
/* Xuemin Path 2:
 * for udptable->hash2[], if we can not find the
 * socket bound to the specific daddr, we then
 * try to find the socket bound to 0.0.0.0. */
result = udp4_lib_lookup2(net, saddr, sport,
  htonl(INADDR_ANY), hnum, dif,
  hslot2, slot2, skb);
}
return result;
}
/* Xuemin:
 * for udptable->hash[], all sockets bound to the same
 * port(no matter they are bound to specific ip or to 0.0.0.0)
 * are placed in the same hslot, so we only need to search one
 * hslot. */
begin:
result = NULL;
badness = 0;
sk_for_each_rcu(sk, >head) {
score = compute_score(sk, net, saddr, hnum, sport,
  daddr, dport, dif);
if (score > badness) {
reuseport = sk->sk_reuseport;
if (reuseport) {
/* Xuemin Path 3:
 * when inet_sk(sk)->inet_rcv_saddr is 0.0.0.0,
 * we should not pass daddr here. */
hash = udp_ehashfn(net, daddr, hnum,
   saddr, sport);
result = reuseport_select_sock(sk, hash, skb,
sizeof(struct udphdr));





Re: [PATCH v2] udp reuseport: fix packet of same flow hashed to different socket

2016-06-08 Thread Eric Dumazet
On Wed, 2016-06-08 at 11:40 +0800, Su Xuemin wrote:
> From: "Su, Xuemin" 
> 
> There is a corner case in which udp packets belonging to a same
> flow are hashed to different socket when hslot->count changes from 10
> to 11:

> It's the same case for IPv6, and this patch also fixes that.
> 
> Signed-off-by: Su, Xuemin 
> ---
> The patch v1 does not fix the code in IPv6. Thank Eric Dumazet for
> pointing that.
> And I use this tree to generate this patch, hope it's correct:
>   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 

I am not convinced this is the right way to fix the issue.

Fact that you did not change udp4_lib_lookup2() is telling me something.


1) If host has 100 different IP, and 10 sockets bound to 0.0.0.0:53 
2) If we receive datagrams from SRCIP:srcport send to IP{1..100}:53
will all be hashed on same slot.

See the hash used in soreuseport logic as an equivalent of a 4-tuple
hash really, not a 3-tuple one.

I would try instead :

(Looks like we missed this when commit 1223c67c093 ("udp: fix for
unicast RX path optimization") was reviewed/merged.

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 
d56c0559b477cb96ce78c9b9b7dacc3109594f3a..5847fe48f17d64bee32551e9dd42024a8e65fb10
 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -563,7 +563,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 
saddr,
goto begin;
 
result = udp4_lib_lookup2(net, saddr, sport,
- htonl(INADDR_ANY), hnum, dif,
+ daddr, hnum, dif,
  hslot2, slot2, skb);
}
return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 
2da1896af93496cc58762c67b4cd4b4f42924901..501b5563234d0921eff9f1e50155db76f27b3837
 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -277,7 +277,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
goto begin;
 
result = udp6_lib_lookup2(net, saddr, sport,
- _any, hnum, dif,
+ daddr, hnum, dif,
  hslot2, slot2, skb);
}
return result;