I've got a machine with an onboard NIC that reproduces a hardware hang every time I do an rsync to it.

[ 488.752630] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <27>
  TDT                  <34>
  next_to_use          <34>
  next_to_clean        <23>
buffer_info[next_to_clean]:
  time_stamp           <1000048b2>
  next_to_watch        <27>
  jiffies              <1000049d8>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 490.751948] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <27>
  TDT                  <34>
  next_to_use          <34>
  next_to_clean        <23>
buffer_info[next_to_clean]:
  time_stamp           <1000048b2>
  next_to_watch        <27>
  jiffies              <100004aa0>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 492.750447] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <27>
  TDT                  <34>
  next_to_use          <34>
  next_to_clean        <23>
buffer_info[next_to_clean]:
  time_stamp           <1000048b2>
  next_to_watch        <27>
  jiffies              <100004b68>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 494.749507] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <27>
  TDT                  <34>
  next_to_use          <34>
  next_to_clean        <23>
buffer_info[next_to_clean]:
  time_stamp           <1000048b2>
  next_to_watch        <27>
  jiffies              <100004c30>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 494.758881] ------------[ cut here ]------------
[ 494.759109] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23a/0x250()
[ 494.759347] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[ 494.759585] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-backup-debug+ #1
[ 494.759841] ffffffffb0ddd622 0431bce15e8d04e9 ffff88043d803d08 ffffffffb097e15b
[ 494.760111] 0000000000000007 ffff88043d803d60 ffff88043d803d48 ffffffffb0076de5
[ 494.760392] 0000000000000000 0000000000000000 0000000000000000 ffff880427bb7d30
[ 494.760648] Call Trace:
[ 494.760896] <IRQ> [<ffffffffb097e15b>] dump_stack+0x4c/0x65
[ 494.761160] [<ffffffffb0076de5>] warn_slowpath_common+0x85/0xc0
[ 494.761423] [<ffffffffb0076ea5>] warn_slowpath_fmt+0x55/0x70
[ 494.761686] [<ffffffffb087b02a>] dev_watchdog+0x23a/0x250
[ 494.761949] [<ffffffffb087adf0>] ? qdisc_rcu_free+0x40/0x40
[ 494.762215] [<ffffffffb00e9703>] call_timer_fn+0xb3/0x420
[ 494.762483] [<ffffffffb00e9655>] ? call_timer_fn+0x5/0x420
[ 494.762753] [<ffffffffb00e9c02>] run_timer_softirq+0x192/0x3d0
[ 494.763025] [<ffffffffb007b6b5>] ? __do_softirq+0xb5/0x5d0
[ 494.763300] [<ffffffffb087adf0>] ? qdisc_rcu_free+0x40/0x40
[ 494.763570] [<ffffffffb007b6df>] __do_softirq+0xdf/0x5d0
[ 494.763838] [<ffffffffb007bd58>] ? irq_exit+0x78/0xc0
[ 494.764108] [<ffffffffb007bd98>] irq_exit+0xb8/0xc0
[ 494.764381] [<ffffffffb098bee6>] smp_apic_timer_interrupt+0x46/0x60
[ 494.764662] [<ffffffffb098a8ad>] apic_timer_interrupt+0x6d/0x80
[ 494.764943] <EOI> [<ffffffffb0815916>] ? cpuidle_enter_state+0x106/0x3a0
[ 494.765232] [<ffffffffb0815951>] ? cpuidle_enter_state+0x141/0x3a0
[ 494.765525] [<ffffffffb0815946>] ? cpuidle_enter_state+0x136/0x3a0
[ 494.765815] [<ffffffffb0815be7>] cpuidle_enter+0x17/0x20
[ 494.766105] [<ffffffffb00bca5c>] cpu_startup_entry+0x38c/0x500
[ 494.766396] [<ffffffffb0977988>] rest_init+0x138/0x140
[ 494.766692] [<ffffffffb0f91f23>] start_kernel+0x466/0x487
[ 494.766990] [<ffffffffb0f91495>] x86_64_start_reservations+0x2a/0x2c
[ 494.767292] [<ffffffffb0f91583>] x86_64_start_kernel+0xec/0xf0

Here's another instance after rebooting, with some different register states:

[ 2379.674285] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <50>
  TDT                  <5d>
  next_to_use          <5d>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <100032c2d>
  next_to_watch        <50>
  jiffies              <100032ce8>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 2381.672792] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <50>
  TDT                  <5d>
  next_to_use          <5d>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <100032c2d>
  next_to_watch        <50>
  jiffies              <100032db0>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 2383.671379] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <50>
  TDT                  <5d>
  next_to_use          <5d>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <100032c2d>
  next_to_watch        <50>
  jiffies              <100032e78>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 2385.669944] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <50>
  TDT                  <5d>
  next_to_use          <5d>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <100032c2d>
  next_to_watch        <50>
  jiffies              <100032f40>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
[ 2387.668428] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <50>
  TDT                  <5d>
  next_to_use          <5d>
  next_to_clean        <4d>
buffer_info[next_to_clean]:
  time_stamp           <100032c2d>
  next_to_watch        <50>
  jiffies              <100033008>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>

The rsync on the other side then craps itself detecting 'corrupted packets'.

The NIC in question is:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V

If this is a software problem, it's not anything new.
I tested as far back as 3.16, which had the same problem.

Is there any hw feature I can try disabling, to see if that makes a difference?

	Dave