[PATCH 4/4] tcp: avoid noref dst leak on input path

2017-10-06 Thread Paolo Abeni
Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
processing tcp packets:

   to-be-untracked noref entity 942cb71ea300 not found in cache
   [ cut here ]
   WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 
rcu_track_noref+0xa4/0xf0
   Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal 
intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt 
ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg 
pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd 
auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe 
mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca 
dm_mirror dm_region_hash dm_log dm_mod
   CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 
4.14.0-rc1.noref_route+ #1610
   Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
   task: 940e4830 task.stack: aec406a2
   RIP: 0010:rcu_track_noref+0xa4/0xf0
   RSP: 0018:aec406a238e0 EFLAGS: 00010246
   RAX: 0040 RBX:  RCX: 0002
   RDX:  RSI:  RDI: 0292
   RBP: aec406a238e0 R08:  R09: 
   R10: 0001 R11: 0003 R12: 
   R13: 942cb511 R14: fe88 R15: 942cb1a20200
   FS:  () GS:942cbee0() 
knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 7febc072d140 CR3: 001feebd6002 CR4: 003606e0
   DR0:  DR1:  DR2: 
   DR3:  DR6: fffe0ff0 DR7: 0400
   Call Trace:
tcp_data_queue+0x82a/0xce0
tcp_rcv_established+0x283/0x570
tcp_v4_do_rcv+0x102/0x1e0
tcp_v4_rcv+0xa9e/0xc10
ip_local_deliver_finish+0x128/0x380
? ip_local_deliver_finish+0x41/0x380
ip_local_deliver+0x74/0x230
ip_rcv_finish+0x105/0x5e0
ip_rcv+0x2a7/0x540
__netif_receive_skb_core+0x3b9/0xe10
? netif_receive_skb_internal+0x40/0x390
__netif_receive_skb+0x18/0x60
netif_receive_skb_internal+0x8d/0x390
? netif_receive_skb_internal+0x40/0x390
napi_gro_complete+0x127/0x1d0
? napi_gro_complete+0x2a/0x1d0
napi_gro_flush+0x5f/0x80
napi_complete_done+0x96/0x100
ixgbe_poll+0x5f8/0x7c0 [ixgbe]
net_rx_action+0x27d/0x520
__do_softirq+0xd1/0x4f5
run_ksoftirqd+0x25/0x70
smpboot_thread_fn+0x11a/0x1f0
kthread+0x155/0x190
? sort_range+0x30/0x30
? kthread_create_on_node+0x70/0x70
ret_from_fork+0x2a/0x40
   Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 75 
d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 5d c3 
80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31

we must clear the skb dst before enqueuing the skb somewhere,
but currently the dst entry for packets delivered
via tcp_rcv_established() -> tcp_queue_rcv() is not cleared.

Fix it by adding the explicit drop in tcp_queue_rcv() and moving
the current skb_dst_drop() just before the other enqueuing
operation, do avoid unneeded double skb_dst_drop() for some
path.

The leak itself is not harmful, because the tcp recvmsg() code
should not access such info.

Signed-off-by: Paolo Abeni 
---
 net/ipv4/tcp_input.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656b..bf4e17edfe7a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct 
sk_buff *skb)
return;
}
 
+   /* drop the -possibly noref - dst before delivery the skb to ofo tree */
+   skb_dst_drop(skb);
+
/* Stash tstamp to avoid being stomped on by rbnode */
if (TCP_SKB_CB(skb)->has_rxtstamp)
TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
@@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, 
struct sk_buff *skb, int
  skb, fragstolen)) ? 1 : 0;
tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
if (!eaten) {
+   skb_dst_drop(skb);
__skb_queue_tail(&sk->sk_receive_queue, skb);
skb_set_owner_r(skb, sk);
}
@@ -4626,7 +4630,6 @@ static void tcp_data_queue(struct sock 

Re: [PATCH 4/4] tcp: avoid noref dst leak on input path

2017-10-06 Thread Eric Dumazet
On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
> Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
> processing tcp packets:
> 
>to-be-untracked noref entity 942cb71ea300 not found in cache
>[ cut here ]
>WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 
> rcu_track_noref+0xa4/0xf0
>Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal 
> intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
> crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper 
> cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich 
> ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter 
> shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs 
> libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt 
> fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci 
> pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
>CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 
> 4.14.0-rc1.noref_route+ #1610
>Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 
> 01/17/2017
>task: 940e4830 task.stack: aec406a2
>RIP: 0010:rcu_track_noref+0xa4/0xf0
>RSP: 0018:aec406a238e0 EFLAGS: 00010246
>RAX: 0040 RBX:  RCX: 0002
>RDX:  RSI:  RDI: 0292
>RBP: aec406a238e0 R08:  R09: 
>R10: 0001 R11: 0003 R12: 
>R13: 942cb511 R14: fe88 R15: 942cb1a20200
>FS:  () GS:942cbee0() 
> knlGS:
>CS:  0010 DS:  ES:  CR0: 80050033
>CR2: 7febc072d140 CR3: 001feebd6002 CR4: 003606e0
>DR0:  DR1:  DR2: 
>DR3:  DR6: fffe0ff0 DR7: 0400
>Call Trace:
> tcp_data_queue+0x82a/0xce0

That is strange, since tcp_data_queue() starts with

skb_dst_drop(skb); 

So this stack trace looks suspicious.

> tcp_rcv_established+0x283/0x570
> tcp_v4_do_rcv+0x102/0x1e0
> tcp_v4_rcv+0xa9e/0xc10
> ip_local_deliver_finish+0x128/0x380
> ? ip_local_deliver_finish+0x41/0x380
> ip_local_deliver+0x74/0x230
> ip_rcv_finish+0x105/0x5e0
> ip_rcv+0x2a7/0x540
> __netif_receive_skb_core+0x3b9/0xe10
> ? netif_receive_skb_internal+0x40/0x390
> __netif_receive_skb+0x18/0x60
> netif_receive_skb_internal+0x8d/0x390
> ? netif_receive_skb_internal+0x40/0x390
> napi_gro_complete+0x127/0x1d0
> ? napi_gro_complete+0x2a/0x1d0
> napi_gro_flush+0x5f/0x80
> napi_complete_done+0x96/0x100
> ixgbe_poll+0x5f8/0x7c0 [ixgbe]
> net_rx_action+0x27d/0x520
> __do_softirq+0xd1/0x4f5
> run_ksoftirqd+0x25/0x70
> smpboot_thread_fn+0x11a/0x1f0
> kthread+0x155/0x190
> ? sort_range+0x30/0x30
> ? kthread_create_on_node+0x70/0x70
> ret_from_fork+0x2a/0x40
>Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 
> 75 d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 
> 5d c3 80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31
> 
> we must clear the skb dst before enqueuing the skb somewhere,
> but currently the dst entry for packets delivered
> via tcp_rcv_established() -> tcp_queue_rcv() is not cleared.
> 
> Fix it by adding the explicit drop in tcp_queue_rcv() and moving
> the current skb_dst_drop() just before the other enqueuing
> operation, do avoid unneeded double skb_dst_drop() for some
> path.
> 
> The leak itself is not harmful, because the tcp recvmsg() code
> should not access such info.
> 
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv4/tcp_input.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c5d7656b..bf4e17edfe7a 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct 
> sk_buff *skb)
>   return;
>   }
>  
> + /* drop the -possibly noref - dst before delivery the skb to ofo tree */
> + skb_dst_drop(skb);
> +
>   /* Stash tstamp to avoid being stomped on by rbnode */
>   if (TCP_SKB_CB(skb)->has_rxtstamp)
>   TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
> @@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, 
> struct sk_buff *skb, 

Re: [PATCH 4/4] tcp: avoid noref dst leak on input path

2017-10-06 Thread Paolo Abeni
Hi,

On Fri, 2017-10-06 at 07:37 -0700, Eric Dumazet wrote:
> On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
> > Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
> > processing tcp packets:
> > 
> >to-be-untracked noref entity 942cb71ea300 not found in cache
> >[ cut here ]
> >WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 
> > rcu_track_noref+0xa4/0xf0
> >Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal 
> > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
> > crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper 
> > cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich 
> > ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter 
> > shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs 
> > libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt 
> > fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit 
> > libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
> >CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 
> > 4.14.0-rc1.noref_route+ #1610
> >Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 
> > 01/17/2017
> >task: 940e4830 task.stack: aec406a2
> >RIP: 0010:rcu_track_noref+0xa4/0xf0
> >RSP: 0018:aec406a238e0 EFLAGS: 00010246
> >RAX: 0040 RBX:  RCX: 0002
> >RDX:  RSI:  RDI: 0292
> >RBP: aec406a238e0 R08:  R09: 
> >R10: 0001 R11: 0003 R12: 
> >R13: 942cb511 R14: fe88 R15: 942cb1a20200
> >FS:  () GS:942cbee0() 
> > knlGS:
> >CS:  0010 DS:  ES:  CR0: 80050033
> >CR2: 7febc072d140 CR3: 001feebd6002 CR4: 003606e0
> >DR0:  DR1:  DR2: 
> >DR3:  DR6: fffe0ff0 DR7: 0400
> >Call Trace:
> > tcp_data_queue+0x82a/0xce0
> 
> That is strange, since tcp_data_queue() starts with
> 
>   skb_dst_drop(skb); 
> 
> So this stack trace looks suspicious.

Thank you for the feedback.

I most probably messed-up while extracting the info from dmsg, as this
issue gives a couple of splats almost concurrently. Please let me re-do 
the test and post a more resonable dmsg.

The problem with the current code is that in the tcp_rcv_established()
-> tcp_queue_rcv() path, the skb_dst() is not cleared.

Thanks,

Paolo


Re: [PATCH 4/4] tcp: avoid noref dst leak on input path

2017-10-06 Thread Eric Dumazet
On Fri, Oct 6, 2017 at 8:21 AM, Paolo Abeni  wrote:
> Hi,
>
> On Fri, 2017-10-06 at 07:37 -0700, Eric Dumazet wrote:
>> On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
>> > Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
>> > processing tcp packets:
>> >
>> >to-be-untracked noref entity 942cb71ea300 not found in cache
>> >[ cut here ]
>> >WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 
>> > rcu_track_noref+0xa4/0xf0
>> >Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal 
>> > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
>> > crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper 
>> > cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich 
>> > ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter 
>> > shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs 
>> > libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt 
>> > fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit 
>> > libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
>> >CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 
>> > 4.14.0-rc1.noref_route+ #1610
>> >Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 
>> > 01/17/2017
>> >task: 940e4830 task.stack: aec406a2
>> >RIP: 0010:rcu_track_noref+0xa4/0xf0
>> >RSP: 0018:aec406a238e0 EFLAGS: 00010246
>> >RAX: 0040 RBX:  RCX: 
>> > 0002
>> >RDX:  RSI:  RDI: 
>> > 0292
>> >RBP: aec406a238e0 R08:  R09: 
>> > 

> Thank you for the feedback.
>
> I most probably messed-up while extracting the info from dmsg, as this
> issue gives a couple of splats almost concurrently. Please let me re-do
> the test and post a more resonable dmsg.
>
> The problem with the current code is that in the tcp_rcv_established()
> -> tcp_queue_rcv() path, the skb_dst() is not cleared.
>

In any case, I would rather put one skb_dst_drop() right after the
last possible use of skb dst in TCP stack,
probably after sk_rx_dst_set() call.

Trying to move it in multiple places has been error prone, even if
current code is not buggy.