Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-09 Thread David Miller
From: Jon Maxwell 
Date: Thu,  9 Mar 2017 12:15:21 +1100

> We have seen a few incidents lately where a dst_enty has been freed
> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> dst_entry. If the conditions/timings are right a crash then ensues when the
> freed dst_entry is referenced later on. A Common crashing back trace is:
 ...
> A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
> regardless of whether user space has the socket locked. This can result in a 
> race condition where the same dst_entry cached in sk->sk_dst_entry can be 
> decremented twice for the same socket via: 
> 
> do_redirect()->__sk_dst_check()-> dst_release(). 
> 
> Which leads to the dst_entry being prematurely freed with another socket 
> pointing to it via sk->sk_dst_cache and a subsequent crash.
> 
> To fix this skip do_redirect() if usespace has the socket locked. Instead let 
> the redirect take place later when user space does not have the socket 
> locked.
> 
> The dccp code is very similar in this respect, so fixing it there too. 
> 
> As Eric Garver pointed out the following commit now invalidates routes. Which
> can set the dst->obsolete flag so that ipv4_dst_check() returns null and 
> triggers the dst_release().
> 
> Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
> Cc: Eric Garver 
> Cc: Hannes Sowa 
> Signed-off-by: Jon Maxwell 

Please add the ipv6 side of this fix to this patch and repost.

Thank you.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-09 Thread David Miller
From: Jon Maxwell 
Date: Thu,  9 Mar 2017 12:15:21 +1100

> We have seen a few incidents lately where a dst_enty has been freed
> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> dst_entry. If the conditions/timings are right a crash then ensues when the
> freed dst_entry is referenced later on. A Common crashing back trace is:
 ...
> A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
> regardless of whether user space has the socket locked. This can result in a 
> race condition where the same dst_entry cached in sk->sk_dst_entry can be 
> decremented twice for the same socket via: 
> 
> do_redirect()->__sk_dst_check()-> dst_release(). 
> 
> Which leads to the dst_entry being prematurely freed with another socket 
> pointing to it via sk->sk_dst_cache and a subsequent crash.
> 
> To fix this skip do_redirect() if usespace has the socket locked. Instead let 
> the redirect take place later when user space does not have the socket 
> locked.
> 
> The dccp code is very similar in this respect, so fixing it there too. 
> 
> As Eric Garver pointed out the following commit now invalidates routes. Which
> can set the dst->obsolete flag so that ipv4_dst_check() returns null and 
> triggers the dst_release().
> 
> Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
> Cc: Eric Garver 
> Cc: Hannes Sowa 
> Signed-off-by: Jon Maxwell 

Please add the ipv6 side of this fix to this patch and repost.

Thank you.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jonathan Maxwell
On Thu, Mar 9, 2017 at 3:40 PM, Eric Dumazet  wrote:
> On Thu, 2017-03-09 at 14:42 +1100, Jonathan Maxwell wrote:
>> Sorry let me resend in plain text mode.
>>
>> On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
>> >> We have seen a few incidents lately where a dst_enty has been freed
>> >> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
>> >> dst_entry. If the conditions/timings are right a crash then ensues when 
>> >> the
>> >> freed dst_entry is referenced later on. A Common crashing back trace is:
>> >
>> > Very nice catch !
>> >
>>
>> Thanks Eric.
>>
>> > Don't we have a similar issue for IPv6 ?
>> >
>> >
>>
>> Good point.
>>
>> We checked and as far as we can tell IPv6 does not invalidate the route.
>> So it should be safer.
>
> Simply doing :
>
> __sk_dst_check(sk, np->dst_cookie);
>
> is racy, even before calling dst->ops->redirect(dst, sk, skb);
>
> (if socket is owned by user)
>
>
>

Okay, I will add a similar patch for IPv6 to also protect from that.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jonathan Maxwell
On Thu, Mar 9, 2017 at 3:40 PM, Eric Dumazet  wrote:
> On Thu, 2017-03-09 at 14:42 +1100, Jonathan Maxwell wrote:
>> Sorry let me resend in plain text mode.
>>
>> On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
>> > On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
>> >> We have seen a few incidents lately where a dst_enty has been freed
>> >> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
>> >> dst_entry. If the conditions/timings are right a crash then ensues when 
>> >> the
>> >> freed dst_entry is referenced later on. A Common crashing back trace is:
>> >
>> > Very nice catch !
>> >
>>
>> Thanks Eric.
>>
>> > Don't we have a similar issue for IPv6 ?
>> >
>> >
>>
>> Good point.
>>
>> We checked and as far as we can tell IPv6 does not invalidate the route.
>> So it should be safer.
>
> Simply doing :
>
> __sk_dst_check(sk, np->dst_cookie);
>
> is racy, even before calling dst->ops->redirect(dst, sk, skb);
>
> (if socket is owned by user)
>
>
>

Okay, I will add a similar patch for IPv6 to also protect from that.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Eric Dumazet
On Thu, 2017-03-09 at 14:42 +1100, Jonathan Maxwell wrote:
> Sorry let me resend in plain text mode.
> 
> On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
> > On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
> >> We have seen a few incidents lately where a dst_enty has been freed
> >> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> >> dst_entry. If the conditions/timings are right a crash then ensues when the
> >> freed dst_entry is referenced later on. A Common crashing back trace is:
> >
> > Very nice catch !
> >
> 
> Thanks Eric.
> 
> > Don't we have a similar issue for IPv6 ?
> >
> >
> 
> Good point.
> 
> We checked and as far as we can tell IPv6 does not invalidate the route.
> So it should be safer.

Simply doing :

__sk_dst_check(sk, np->dst_cookie);

is racy, even before calling dst->ops->redirect(dst, sk, skb);

(if socket is owned by user)





Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Eric Dumazet
On Thu, 2017-03-09 at 14:42 +1100, Jonathan Maxwell wrote:
> Sorry let me resend in plain text mode.
> 
> On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
> > On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
> >> We have seen a few incidents lately where a dst_enty has been freed
> >> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> >> dst_entry. If the conditions/timings are right a crash then ensues when the
> >> freed dst_entry is referenced later on. A Common crashing back trace is:
> >
> > Very nice catch !
> >
> 
> Thanks Eric.
> 
> > Don't we have a similar issue for IPv6 ?
> >
> >
> 
> Good point.
> 
> We checked and as far as we can tell IPv6 does not invalidate the route.
> So it should be safer.

Simply doing :

__sk_dst_check(sk, np->dst_cookie);

is racy, even before calling dst->ops->redirect(dst, sk, skb);

(if socket is owned by user)





Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jonathan Maxwell
Sorry let me resend in plain text mode.

On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
> On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
>> We have seen a few incidents lately where a dst_enty has been freed
>> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
>> dst_entry. If the conditions/timings are right a crash then ensues when the
>> freed dst_entry is referenced later on. A Common crashing back trace is:
>
> Very nice catch !
>

Thanks Eric.

> Don't we have a similar issue for IPv6 ?
>
>

Good point.

We checked and as far as we can tell IPv6 does not invalidate the route.
So it should be safer.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jonathan Maxwell
Sorry let me resend in plain text mode.

On Thu, Mar 9, 2017 at 1:10 PM, Eric Dumazet  wrote:
> On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
>> We have seen a few incidents lately where a dst_enty has been freed
>> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
>> dst_entry. If the conditions/timings are right a crash then ensues when the
>> freed dst_entry is referenced later on. A Common crashing back trace is:
>
> Very nice catch !
>

Thanks Eric.

> Don't we have a similar issue for IPv6 ?
>
>

Good point.

We checked and as far as we can tell IPv6 does not invalidate the route.
So it should be safer.


Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Eric Dumazet
On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
> We have seen a few incidents lately where a dst_enty has been freed
> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> dst_entry. If the conditions/timings are right a crash then ensues when the
> freed dst_entry is referenced later on. A Common crashing back trace is:

Very nice catch !

Don't we have a similar issue for IPv6 ?




Re: [PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Eric Dumazet
On Thu, 2017-03-09 at 12:15 +1100, Jon Maxwell wrote:
> We have seen a few incidents lately where a dst_enty has been freed
> with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
> dst_entry. If the conditions/timings are right a crash then ensues when the
> freed dst_entry is referenced later on. A Common crashing back trace is:

Very nice catch !

Don't we have a similar issue for IPv6 ?




[PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jon Maxwell
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at 8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.   
 #9 [] tcp_rcv_established at 81580b64
#10 [] tcp_v4_do_rcv at 8158b54a
#11 [] tcp_v4_rcv at 8158cd02
#12 [] ip_local_deliver_finish at 815668f4
#13 [] ip_local_deliver at 81566bd9
#14 [] ip_rcv_finish at 8156656d
#15 [] ip_rcv at 81566f06
#16 [] __netif_receive_skb_core at 8152b3a2
#17 [] __netif_receive_skb at 8152b608
#18 [] netif_receive_skb at 8152b690
#19 [] vmxnet3_rq_rx_complete at a015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at a015f32a [vmxnet3]
#21 [] net_rx_action at 8152bac2
#22 [] __do_softirq at 81084b4f
#23 [] call_softirq at 8164845c
#24 [] do_softirq at 81016fc5
#25 [] irq_exit at 81084ee5
#26 [] do_IRQ at 81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here: 

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹   const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹   const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹   return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹   ▹   (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in 
netfilter code as well. 

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a 
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a 
race condition where the same dst_entry cached in sk->sk_dst_entry can be 
decremented twice for the same socket via: 

do_redirect()->__sk_dst_check()-> dst_release(). 

Which leads to the dst_entry being prematurely freed with another socket 
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let 
the redirect take place later when user space does not have the socket 
locked.

The dccp code is very similar in this respect, so fixing it there too. 

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and 
triggers the dst_release().

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver 
Cc: Hannes Sowa 
Signed-off-by: Jon Maxwell 
---
 net/dccp/ipv4.c | 3 ++-
 net/ipv4/tcp_ipv4.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 409d0cf..b99168b 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -289,7 +289,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 
switch (type) {
case ICMP_REDIRECT:
-   dccp_do_redirect(skb, sk);
+   if (!sock_owned_by_user(sk))
+   dccp_do_redirect(skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f3ec13..575e19d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -431,7 +431,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 
switch (type) {
case ICMP_REDIRECT:
-   do_redirect(icmp_skb, sk);
+   if (!sock_owned_by_user(sk))
+   do_redirect(icmp_skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
-- 
1.8.3.1



[PATCH net] dccp/tcp: fix routing redirect race

2017-03-08 Thread Jon Maxwell
We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at 8163e648
[exception RIP: __tcp_ack_snd_check+74]
.
.   
 #9 [] tcp_rcv_established at 81580b64
#10 [] tcp_v4_do_rcv at 8158b54a
#11 [] tcp_v4_rcv at 8158cd02
#12 [] ip_local_deliver_finish at 815668f4
#13 [] ip_local_deliver at 81566bd9
#14 [] ip_rcv_finish at 8156656d
#15 [] ip_rcv at 81566f06
#16 [] __netif_receive_skb_core at 8152b3a2
#17 [] __netif_receive_skb at 8152b608
#18 [] netif_receive_skb at 8152b690
#19 [] vmxnet3_rq_rx_complete at a015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at a015f32a [vmxnet3]
#21 [] net_rx_action at 8152bac2
#22 [] __do_softirq at 81084b4f
#23 [] call_softirq at 8164845c
#24 [] do_softirq at 81016fc5
#25 [] irq_exit at 81084ee5
#26 [] do_IRQ at 81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here: 

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹   const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹   const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹   return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹   ▹   (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in 
netfilter code as well. 

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a 
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a 
race condition where the same dst_entry cached in sk->sk_dst_entry can be 
decremented twice for the same socket via: 

do_redirect()->__sk_dst_check()-> dst_release(). 

Which leads to the dst_entry being prematurely freed with another socket 
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let 
the redirect take place later when user space does not have the socket 
locked.

The dccp code is very similar in this respect, so fixing it there too. 

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and 
triggers the dst_release().

Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver 
Cc: Hannes Sowa 
Signed-off-by: Jon Maxwell 
---
 net/dccp/ipv4.c | 3 ++-
 net/ipv4/tcp_ipv4.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 409d0cf..b99168b 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -289,7 +289,8 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 
switch (type) {
case ICMP_REDIRECT:
-   dccp_do_redirect(skb, sk);
+   if (!sock_owned_by_user(sk))
+   dccp_do_redirect(skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f3ec13..575e19d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -431,7 +431,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 
switch (type) {
case ICMP_REDIRECT:
-   do_redirect(icmp_skb, sk);
+   if (!sock_owned_by_user(sk))
+   do_redirect(icmp_skb, sk);
goto out;
case ICMP_SOURCE_QUENCH:
/* Just silently ignore these. */
-- 
1.8.3.1