On Fri, Oct 6, 2017 at 8:21 AM, Paolo Abeni <pab...@redhat.com> wrote:
> Hi,
>
> On Fri, 2017-10-06 at 07:37 -0700, Eric Dumazet wrote:
>> On Fri, 2017-10-06 at 14:57 +0200, Paolo Abeni wrote:
>> > Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when
>> > processing tcp packets:
>> >
>> >            to-be-untracked noref entity ffff942cb71ea300 not found in cache
>> >            ------------[ cut here ]------------
>> >            WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 
>> > rcu_track_noref+0xa4/0xf0
>> >            Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal 
>> > intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
>> > crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper 
>> > cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich 
>> > ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter 
>> > shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs 
>> > libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt 
>> > fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit 
>> > libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod
>> >            CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 
>> > 4.14.0-rc1.noref_route+ #1610
>> >            Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 
>> > 01/17/2017
>> >            task: ffff940e48300000 task.stack: ffffaec406a20000
>> >            RIP: 0010:rcu_track_noref+0xa4/0xf0
>> >            RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246
>> >            RAX: 0000000000000040 RBX: 0000000000000000 RCX: 
>> > 0000000000000002
>> >            RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>> > 0000000000000292
>> >            RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 
>> > 0000000000000000

> Thank you for the feedback.
>
> I most probably messed-up while extracting the info from dmsg, as this
> issue gives a couple of splats almost concurrently. Please let me re-do
> the test and post a more resonable dmsg.
>
> The problem with the current code is that in the tcp_rcv_established()
> -> tcp_queue_rcv() path, the skb_dst() is not cleared.
>

In any case, I would rather put one skb_dst_drop() right after the
last possible use of skb dst in TCP stack,
probably after sk_rx_dst_set() call.

Trying to move it in multiple places has been error prone, even if
current code is not buggy.

Reply via email to