Hi Eric.

On Mon, 15 Oct 2018 at 16:42, Eric Dumazet <eduma...@google.com> wrote:
>
> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> <step...@networkplumber.org> wrote:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 14 Oct 2018 10:42:48 +0000
> > From: bugzilla-dae...@bugzilla.kernel.org
> > To: step...@networkplumber.org
> > Subject: [Bug 201423] New: eth0: hw csum failure
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201423
> >
> >             Bug ID: 201423
> >            Summary: eth0: hw csum failure
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 4.19.0-rc7
> >           Hardware: Intel
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: step...@networkplumber.org
> >           Reporter: ross...@inwind.it
> >         Regression: No
> >
> > I have a P6T DELUXE V2 motherboard and using the sky2 driver for the 
> > ethernet
> > ports. I get the following error message:
> >
> > [  433.727397] eth0: hw csum failure
> > [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  433.727406] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  433.727407] Call Trace:
> > [  433.727409]  <IRQ>
> > [  433.727415]  dump_stack+0x46/0x5b
> > [  433.727419]  __skb_checksum_complete+0xb0/0xc0
> > [  433.727423]  tcp_v4_rcv+0x528/0xb60
> > [  433.727426]  ? ipt_do_table+0x2d0/0x400
> > [  433.727429]  ip_local_deliver_finish+0x5a/0x110
> > [  433.727430]  ip_local_deliver+0xe1/0xf0
> > [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  433.727432]  ip_rcv+0xca/0xe0
> > [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
> > [  433.727438]  netif_receive_skb_internal+0x4e/0x130
> > [  433.727439]  napi_gro_receive+0x6a/0x80
> > [  433.727442]  sky2_poll+0x707/0xd20
> > [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
> > [  433.727447]  net_rx_action+0x237/0x380
> > [  433.727449]  __do_softirq+0xdc/0x1e0
> > [  433.727452]  irq_exit+0xa9/0xb0
> > [  433.727453]  do_IRQ+0x45/0xc0
> > [  433.727455]  common_interrupt+0xf/0xf
> > [  433.727456]  </IRQ>
> > [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 
> > 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 
> > 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
> > 000000650512105d
> > [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
> > 00000064fc2a8b1c
> > [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  433.727468]  ? cpuidle_enter_state+0x119/0x200
> > [  433.727471]  do_idle+0x1bf/0x200
> > [  433.727473]  cpu_startup_entry+0x6a/0x70
> > [  433.727475]  start_secondary+0x17f/0x1c0
> > [  433.727476]  secondary_startup_64+0xa4/0xb0
> > [  441.662954] eth0: hw csum failure
> > [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
> > [  441.662960] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  441.662960] Call Trace:
> > [  441.662963]  <IRQ>
> > [  441.662968]  dump_stack+0x46/0x5b
> > [  441.662972]  __skb_checksum_complete+0xb0/0xc0
> > [  441.662975]  tcp_v4_rcv+0x528/0xb60
> > [  441.662979]  ? ipt_do_table+0x2d0/0x400
> > [  441.662981]  ip_local_deliver_finish+0x5a/0x110
> > [  441.662983]  ip_local_deliver+0xe1/0xf0
> > [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  441.662986]  ip_rcv+0xca/0xe0
> > [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
> > [  441.662993]  netif_receive_skb_internal+0x4e/0x130
> > [  441.662994]  napi_gro_receive+0x6a/0x80
> > [  441.662998]  sky2_poll+0x707/0xd20
> > [  441.663000]  net_rx_action+0x237/0x380
> > [  441.663002]  __do_softirq+0xdc/0x1e0
> > [  441.663005]  irq_exit+0xa9/0xb0
> > [  441.663007]  do_IRQ+0x45/0xc0
> > [  441.663009]  common_interrupt+0xf/0xf
> > [  441.663010]  </IRQ>
> > [  441.663012] RIP: 0010:merge+0x22/0xb0
> > [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 
> > 53
> > 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 
> > 85 c9
> > 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
> > [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffffde
> > [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
> > ffff88021ab2d408
> > [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
> > 0000000000000000
> > [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
> > 0000000000008500
> > [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
> > ffffffffa021c440
> > [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
> > ffffc9000090b9e0
> > [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663065]  ? merge+0x57/0xb0
> > [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663082]  list_sort+0x8b/0x230
> > [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
> > [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
> > [  441.663113]  ? __switch_to_asm+0x34/0x70
> > [  441.663114]  ? __switch_to_asm+0x40/0x70
> > [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
> > [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
> > [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
> > [  441.663171]  do_vfs_ioctl+0x9a/0x600
> > [  441.663173]  ksys_ioctl+0x35/0x60
> > [  441.663175]  __x64_sys_ioctl+0x11/0x20
> > [  441.663177]  do_syscall_64+0x3d/0xf0
> > [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  441.663180] RIP: 0033:0x7f9377377f37
> > [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 
> > ad
> > db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 
> > 3d 01
> > f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
> > [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000010
> > [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
> > 00007f9377377f37
> > [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
> > 0000000000000010
> > [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00000000c0206466
> > [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
> > 0000564497a38120
> > [  462.833418] eth0: hw csum failure
> > [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  462.833429] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  462.833429] Call Trace:
> > [  462.833432]  <IRQ>
> > [  462.833438]  dump_stack+0x46/0x5b
> > [  462.833442]  __skb_checksum_complete+0xb0/0xc0
> > [  462.833446]  tcp_v4_rcv+0x528/0xb60
> > [  462.833449]  ? ipt_do_table+0x2d0/0x400
> > [  462.833452]  ip_local_deliver_finish+0x5a/0x110
> > [  462.833454]  ip_local_deliver+0xe1/0xf0
> > [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  462.833457]  ip_rcv+0xca/0xe0
> > [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
> > [  462.833464]  netif_receive_skb_internal+0x4e/0x130
> > [  462.833466]  napi_gro_receive+0x6a/0x80
> > [  462.833469]  sky2_poll+0x707/0xd20
> > [  462.833471]  net_rx_action+0x237/0x380
> > [  462.833474]  __do_softirq+0xdc/0x1e0
> > [  462.833477]  irq_exit+0xa9/0xb0
> > [  462.833479]  do_IRQ+0x45/0xc0
> > [  462.833481]  common_interrupt+0xf/0xf
> > [  462.833482]  </IRQ>
> > [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 
> > 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 
> > 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
> > 0000006bc3052131
> > [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  462.833498]  ? cpuidle_enter_state+0x119/0x200
> > [  462.833503]  do_idle+0x1bf/0x200
> > [  462.833506]  cpu_startup_entry+0x6a/0x70
> > [  462.833510]  start_secondary+0x17f/0x1c0
> > [  462.833513]  secondary_startup_64+0xa4/0xb0
> >
> > Something is changed between 4.17.12 and 4.18, after bisecting the problem I
> > got the following first bad commit:
> >
> > commit 88078d98d1bb085d72af8437707279e203524fa5
> > Author: Eric Dumazet <eduma...@google.com>
> > Date:   Wed Apr 18 11:43:15 2018 -0700
> >
> >     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >
> >     After working on IP defragmentation lately, I found that some large
> >     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >     zero paddings on the last (small) fragment.
> >
> >     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >     fragments had CHECKSUM_COMPLETE set.
> >
> >     We can instead compute the checksum of the part we are trimming,
> >     usually smaller than the part we keep.
> >
> >     Signed-off-by: Eric Dumazet <eduma...@google.com>
> >     Signed-off-by: David S. Miller <da...@davemloft.net>
> >
>
> Thanks for bisecting !
>
> This commit is known to expose some NIC/driver bugs.
>
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
>
> I assume SKY2_HW_NEW_LE is not set on your NIC ?

Just to say that we've also just hit this with both the LAN78xx and
SMSC9514 drivers, i.e. all Raspberry Pis with onboard Ethernet. Likewise,
that commit has been pinpointed as the cause, or at least as exposing an
underlying issue.
As the patch was backported in 4.14.71, it's hitting LTS users too.

Thanks for the pointer to the sungem fix. I'll look into what's going
on and see if we can sort it out. I have also cc'ed the maintainers
of those chips in case they are already on the case.
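
For anyone following along, here is a rough sketch of the idea in the
quoted commit message: rather than dropping to CHECKSUM_NONE when trailing
padding is trimmed, subtract the one's-complement sum of the trimmed bytes
from the checksum covering the whole buffer. This is an illustrative Python
model, not the kernel code. The function names below are my own, chosen to
loosely echo the kernel's checksum helpers, and the byte swap for an odd
trim offset is my assumption about how the alignment has to be handled.

```python
def fold(x: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def csum(data: bytes) -> int:
    """One's-complement sum of big-endian 16-bit words, zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def csum_sub(a: int, b: int) -> int:
    """One's-complement subtraction: add the bitwise complement of b."""
    return fold(a + (~b & 0xFFFF))


def csum_block_sub(full: int, tail: int, offset: int) -> int:
    """Remove the sum of a block that started at byte `offset`.

    A block starting at an odd offset contributed its bytes in the
    opposite halves of each word, so its standalone sum is byte-swapped
    before subtracting (assumption of this sketch).
    """
    if offset % 2:
        tail = ((tail & 0xFF) << 8) | (tail >> 8)
    return csum_sub(full, tail)


def csum_equal(a: int, b: int) -> bool:
    """Compare sums, treating 0x0000 and 0xFFFF as the same value."""
    return a % 0xFFFF == b % 0xFFFF
```

If the checksum the NIC reports really covers every byte of the buffer,
trimming this way agrees with recomputing over the kept bytes. If the
hardware value covers a different range than the driver claims, the
subtraction produces a wrong result, which would be consistent with the
"hw csum failure" reports above.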

Cheers.
  Dave
