On Thu, 2015-07-02 at 01:26 +0200, Eric Dumazet wrote: > On Thu, Jul 2, 2015 at 1:18 AM, Alex Gartrell <alexgartr...@gmail.com> wrote: > > On Wednesday, July 1, 2015, Eric Dumazet <eduma...@google.com> wrote: > >> > >> On Wed, Jul 1, 2015 at 11:14 PM, David Miller <da...@davemloft.net> wrote: > >> > From: Alex Gartrell <agartr...@fb.com> > >> > Date: Wed, 1 Jul 2015 13:13:09 -0700 > >> > > >> >> If we early-demux bind a TCP_TIMEWAIT socket to an skb and then orphan > >> >> it > >> >> (as we need to do in the ipvs forwarding case), sock_wfree and > >> >> sock_rfree > >> >> are going to reach into the inet_timewait_sock and mess with fields > >> >> that > >> >> don't exist. > >> >> > >> >> Signed-off-by: Alex Gartrell <agartr...@fb.com> > >> > > >> > If we're forwarding, we should not find a local socket, period. > >> > >> A socket cannot change state to TCP_TIMEWAIT. > >> > >> A new object is allocated and old one is removed from ehash, then > >> freed (rcu rules being applied) > >> > >> Also sock_wfree() has nothing to do with early demux. It is for output > >> path skbs only. > > > > > > Alright I kind of cheated and didn't include full context here. The problem > > is that within ipvs we are getting packets that have been early demuxed and > > associated with time wait sockets which we then wish to forward immediately > > (ip_vs_xmit.c). Under normal circumstances it would never be associated > > with any sk at all, but it is because of early demux, so we want to drop the > > relationship by calling skb_orphan. This invokes the destructor which lands > > us there. > > > > So that is how we reach this illegal "treating a twsk like an sk" state. > > > > If there is a better way to drop the association than skb_orphan I will use > > it. > > I think you are mistaken Alex. > > socket early demux cannot possibly set skb->destructor to sock_rfree() > > If skb->destructor is set by early demux, it correctly points to sock_edemux() > > And this one correctly handles all socket variants.
If ipvs is the problem, could you try instead following patch ? Shoot in the dark, as you do not give a lot of details :( diff --git a/include/net/sock.h b/include/net/sock.h index 05a8c1aea251..f77fe9acc7a4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1932,6 +1932,14 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) } } +/* This helper checks if a socket is a full socket, + * ie _not_ a timewait or request socket. + */ +static inline bool sk_fullsock(const struct sock *sk) +{ + return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); +} + /* * Queue a received datagram if it will fit. Stream and sequenced * protocols can't normally use this as they need to fit buffers in @@ -1944,6 +1952,9 @@ static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) static inline void skb_set_owner_w(struct sk_buff *skb, struct sock *sk) { skb_orphan(skb); + if (unlikely(!sk_fullsock(sk)) + return; + skb->sk = sk; skb->destructor = sock_wfree; skb_set_hash_from_sk(skb, sk); @@ -2204,14 +2215,6 @@ static inline struct sock *skb_steal_sock(struct sk_buff *skb) return NULL; } -/* This helper checks if a socket is a full socket, - * ie _not_ a timewait or request socket. - */ -static inline bool sk_fullsock(const struct sock *sk) -{ - return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); -} - void sock_enable_timestamp(struct sock *sk, int flag); int sock_get_timestamp(struct sock *, struct timeval __user *); int sock_get_timestampns(struct sock *, struct timespec __user *); diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c index 5d2b806a862e..ff05ec5a9016 100644 --- a/net/netfilter/ipvs/ip_vs_core.c +++ b/net/netfilter/ipvs/ip_vs_core.c @@ -1161,9 +1161,10 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af) if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { struct sock *sk = skb->sk; - struct inet_sock *inet = inet_sk(skb->sk); - if (inet && sk->sk_family == PF_INET && inet->nodefrag) + if (sk_fullsock(sk) && + sk->sk_family == PF_INET && + inet_sk(sk)->nodefrag) return NF_ACCEPT; } @@ -1640,9 +1641,10 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af) if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT && af == AF_INET)) { struct sock *sk = skb->sk; - struct inet_sock *inet = inet_sk(skb->sk); - if (inet && sk->sk_family == PF_INET && inet->nodefrag) + if (sk_fullsock(sk) && + sk->sk_family == PF_INET && + inet_sk(sk)->nodefrag) return NF_ACCEPT; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html