Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Thank you, Kai-Heng Feng. I really appreciate it.

I'm currently under a lot of pressure at work, but I will try this in the next few days to see if it fixes the problem for us. My network still has the same conditions and my previous kernel versions are still breaking, so it should be easy to reproduce. I will write back as soon as I can.

Thank you again,
Paulo

On Tue, Jul 2, 2019, 03:15 Kai-Heng Feng wrote:

> Latest kernels in Xenial, Bionic, Cosmic and Disco have the following
> commit:
>
> commit 3a498606bb04af603a46ebde8296040b2de350d1
> Author: Sanjeev Bansal
> Date: Mon Jul 16 11:13:32 2018 +0530
>
>     tg3: Add higher cpu clock for 5762.
>
>     This patch has fix for TX timeout while running bi-directional
>     traffic with 100 Mbps using 5762.
>
>     Signed-off-by: Sanjeev Bansal
>     Signed-off-by: Siva Reddy Kallam
>     Reviewed-by: Michael Chan
>     Signed-off-by: David S. Miller
>
> ** Changed in: linux (Ubuntu)
>        Status: Triaged => Fix Released
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1447664
>
> Title:
>   14e4:1687 broadcom tg3 network driver disconnects under high load
>
> Status in linux package in Ubuntu:
>   Fix Released
> Status in linux package in Debian:
>   New
>
> Bug description:
>   The tg3 broadcom network driver that binds with chipset 5762 goes
>   offline and is unable to recover (even with tg3 watchdog timeout) when
>   network transmit is under high load. Call trace:
>   https://launchpadlibrarian.net/204185480/dmesg
>
>   When this happens, only a reboot would be able to fix it. Sometimes,
>   however, bringing the interface offline and online (via ifconfig)
>   would recover networking. I've also tested with the latest tg3 driver
>   (dec 2014 version) and networking is still problematic. I have also
>   disabled TSO, GSO etc... with ethtool and the bug still surfaces.
>   This bug may be related to the integrated Firmware.
>
>   Here is the procedure to replicate the issue because it is hard to
>   replicate it under moderate network load.
>
>   1. Boot up a machine with a broadcom 5762 NIC (e.g. HP EliteDesk 705)
>      using an Ubuntu/Kubuntu Live CD 14.04-15.04.
>   2. from another machine: start 5 sessions, repetitively copy (scp with
>      public key authentication) a 70 meg file back and forth to the tg3
>      machine in each session. (not sure if this is necessary)
>   3. create a 1GB file on the tg3 machine, with something like
>      dd if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
>   4. from another machine: repetitively scp copy that 1GB file from the
>      tg3 machine. This can be done with something like:
>
>      while [ 0 ]; do
>        scp -i /my/scp/private.key u...@ip.of.tg3:/my/test/file /tmp
>      done;
>
>   Networking will mostly go offline in about 10-30 minutes.
>
>   WORKAROUND: Add udev rule to make the changes permanent in
>   /etc/udev/rules.d/80-tg3-fix.rules :
>   ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
>
>   ProblemType: Bug
>   DistroRelease: Ubuntu 15.04
>   Package: linux-image-3.19.0-15-generic 3.19.0-15.15
>   ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
>   Uname: Linux 3.19.0-15-generic x86_64
>   ApportVersion: 2.17.2-0ubuntu1
>   Architecture: amd64
>   AudioDevicesInUse:
>    USER        PID ACCESS COMMAND
>    /dev/snd/controlC1:  kubuntu    3748 F pulseaudio
>    /dev/snd/controlC0:  kubuntu    3748 F pulseaudio
>   CasperVersion: 1.360
>   Date: Thu Apr 23 11:16:24 2015
>   IwConfig:
>    eth0      no wireless extensions.
>
>    lo        no wireless extensions.
>   LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
>   MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
>   ProcEnviron:
>    LANGUAGE=
>    TERM=xterm
>    PATH=(custom, no user)
>    LANG=en_US.UTF-8
>    SHELL=/bin/bash
>   ProcFB: 0 radeondrmfb
>   ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash ---
>   PulseList:
>    Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
>    No PulseAudio daemon running, or not running as session daemon.
>   RelatedPackageVersions:
>    linux-restricted-modules-3.19.0-15-generic N/A
>    linux-backports-modules-3.19.0-15-generic  N/A
>    linux-firmware                             1.143
>   RfKill:
>
>   SourcePackage: linux
>   UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   dmi.bios.date: 10/22/2014
>   dmi.bios.vendor: Hewlett-Packard
>   dmi.bios.version: L06 v02.15
>   dmi.board.asset.tag: 2UA5041TG4
>   dmi.board.name: 2215
>   dmi.board.vendor: Hewlett-Packard
>   dmi.chassis.asset.tag: 2UA5041TG4
>   dmi.chassis.type: 6
>   dmi.chassis.vendor: Hewlett-Packard
>   dmi.modalias:
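The scp loop in step 4 of the quoted bug description runs forever and does not show when the transfer stopped working. A bounded variant could look like the sketch below. It assumes a POSIX shell; `run_until_failure` is a hypothetical helper name invented for this illustration, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX times and stops at the first
# non-zero exit status, reporting which iteration failed.
run_until_failure() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
    done
    echo "completed $max iterations"
}
```

For example, `run_until_failure 200 scp -i /my/scp/private.key user@ip.of.tg3:/my/test/file /tmp` would stop at the first failed copy, with `user` standing in for the account name that the archive elided. Depending on the ssh timeout settings, scp may take a while to give up once the link is gone, so the iteration count is only a rough indication of how much data went through first.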
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Thank you.

I am still having the problem during our cloning process, although it's not so frequent. Before the patch I applied, each and every transfer would ALWAYS trigger the tg3 bug.

Here it seems related to problems with NAPI. AFAIK, this is an approach to handle interrupt bursts. NICs typically work in bursts: a long time without packets, then a very large stream of packets, then silence. This is the common scenario. Using interrupts to serve sporadic data is fine, but a burst of packets triggers a burst of interrupts, which is not as efficient as just polling the NIC during the burst.

What NAPI does (in a very, very simplified way) is this: it waits for the first interrupt from the network, then switches off interrupts and polls the NIC until there are no more network packets or the "work quota" is exhausted, whichever happens first. Then it turns interrupts back on and the cycle repeats. This quota (sorry, I don't remember the correct term) is very important to prevent the kernel from being stuck just serving packets.

What's happening, as I understand it, is that something goes wrong during this process and the tg3 driver gets stuck. A colleague told me that it's related to the broadcom driver.

Please try this workaround: remove the two drivers, then reload "broadcom" and "tg3", in this order. Maybe then your network will restart.

    sudo modprobe -r broadcom tg3
    sudo modprobe broadcom
    sudo modprobe tg3

Please tell us what happens when you try this. It won't solve the problem, but perhaps it helps.

Regards,
Paulo

On Sat, Jan 26, 2019, 10:39 Bob Lawrence <1447...@bugs.launchpad.net> wrote:

> Confirmed that this is still an issue on 18.04.1. I have an HP 705 G1
> with the Broadcom 5762. In my case it's a Plex server. Whenever I try to
> stream something the interface goes "NO-CARRIER" and the only way to
> recover is to reboot. I've tried disabling highdma, tso and gso using
> ethtool, iommu=soft kernel parameter, and forcing every combo of
> 1gbps/100mbps & half/full duplex. Nothing seems to work around the issue.
>
> System:    Host: Bobs-HTPC Kernel: 4.15.0-43-generic x86_64 bits: 64
>            Console: tty 1 Distro: Ubuntu 18.04.1 LTS
> Machine:   Device: desktop System: Hewlett-Packard product: HP EliteDesk 705 G1 DM serial: N/A
>            Mobo: Hewlett-Packard model: 225E serial: N/A BIOS: Hewlett-Packard v: L06 v02.31 date: 08/31/2018
> Battery    hidpp__0: charge: N/A condition: NA/NA Wh
> CPU:       Quad core AMD A8-7600 Radeon R7 10 Compute Cores 4C+6G (-MCP-) cache: 8192 KB
>            clock speeds: max: 3100 MHz 1: 3094 MHz 2: 3094 MHz 3: 3094 MHz 4: 3094 MHz
> Graphics:  Card: Advanced Micro Devices [AMD/ATI] Kaveri [Radeon R7 Graphics]
>            Display Server: N/A drivers: ati,radeon (unloaded: modesetting,fbdev,vesa)
>            tty size: 120x53 Advanced Data: N/A out of X
> Audio:     Card-1 Advanced Micro Devices [AMD] FCH Azalia Controller driver: snd_hda_intel
>            Card-2 Advanced Micro Devices [AMD/ATI] Kaveri HDMI/DP Audio Controller driver: snd_hda_intel
>            Sound: Advanced Linux Sound Architecture v: k4.15.0-43-generic
> Network:   Card-1: Intel Wireless 7260 driver: iwlwifi
>            IF: wlp2s0 state: up mac: cc:3d:82:a7:bf:ed
>            Card-2: Broadcom Limited NetXtreme BCM5762 Gigabit Ethernet PCIe driver: tg3
>            IF: eno1 state: up speed: 100 Mbps duplex: half mac: ec:b1:d7:4c:2d:8e
> Drives:    HDD Total Size: 9501.7GB (42.8% used)
>            ID-1: /dev/sda model: ST500LM000 size: 500.1GB
>            ID-2: USB /dev/sdb model: 5 size: 9001.6GB
> Partition: ID-1: / size: 458G used: 23G (6%) fs: ext4 dev: /dev/sda1
> RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
> Sensors:   System Temperatures: cpu: 40.8C mobo: N/A gpu: 42.0
>            Fan Speeds (in rpm): cpu: N/A
> Info:      Processes: 227 Uptime: 12:49 Memory: 1608.0/5943.7MB Init: systemd runlevel: 5
>            Client: Shell (bash) inxi: 2.3.56
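The interrupt-then-poll cycle described earlier in this message can be illustrated with a toy model. This is only a sketch in plain shell arithmetic, not kernel code: `napi_sim`, its arguments and its output are made up for illustration. The "work quota" corresponds to the `budget` argument that the kernel passes to a driver's NAPI poll function. The sketch counts how many poll rounds run after the first interrupt before interrupts would be switched back on.

```shell
#!/bin/sh
# napi_sim BACKLOG BUDGET
# Toy model of one interrupt-then-poll cycle. BACKLOG is the number of
# packets waiting when the first interrupt arrives; BUDGET (must be
# positive) is the most packets one poll round may process. Prints the
# number of poll rounds that run before interrupts are re-enabled.
napi_sim() {
    backlog=$1
    budget=$2
    rounds=0
    # First interrupt arrived: interrupts are now off and polling starts.
    while :; do
        rounds=$((rounds + 1))
        if [ "$backlog" -lt "$budget" ]; then
            done_now=$backlog
        else
            done_now=$budget
        fi
        backlog=$((backlog - done_now))
        # A round that used less than the full budget means the queue is
        # drained: re-enable interrupts and leave polling mode.
        if [ "$done_now" -lt "$budget" ]; then
            break
        fi
        # Budget exhausted: yield, then get polled again later.
    done
    echo "$rounds"
}
```

With a backlog of 150 packets and a budget of 64, `napi_sim 150 64` prints 3: two full rounds of 64 packets and a final partial round of 22. The budget is what keeps a single round bounded even when the backlog is large.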
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Hi Kai-heng,

Here are the test results we got.

Kernel 4.15.0-14-generic failed: the transmit queue timed out. The dmesg output is attached. The tg3 module crashes within a few seconds of opening the user session (less than about 10 seconds).

However, kernel 4.15.0-9-generic worked like a charm. It boots and brings up tg3, the Ethernet link works and the module seems stable. We tested it by downloading a few GB, downloading an Ubuntu image, playing videos for a few hours, and the like. Not a single crash was observed. The dmesg output for this working kernel is attached as well, since it may help you sort out what differs between the two kernels.

Would you like us to test another image, or to gather more information?

Regards,
Paulo

On Fri, Apr 13, 2018, 14:03 Paulo Guedes - IFPE - Campus Recife <paulo.gue...@recife.ifpe.edu.br> wrote:

> We tried this same version yesterday and the bug was still present.
> Actually it looked worse, because the machine crashed faster (maybe it was
> just an impression). Will collect logs to report this properly soon, in a
> few hours.
> Paulo
>
> On Fri, Apr 13, 2018, 13:55 luc <1447...@bugs.launchpad.net> wrote:
>
>> Hi Kai-heng,
>>
>> I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018
>> x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
>> I have a Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28
>> 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
>> Unfortunately, I have the same problem (TG3 still crashes, a reboot is
>> mandatory).
>>
>> [ 105.620301] tg3 :03:00.0 eno1: 0: Host status block [0001:00cc:(:002e:):(:0006)]
>> [ 105.620309] tg3 :03:00.0 eno1: 0: NAPI info [00cc:00cc:(0024:0006:01ff)::(00f7:::)]
>> [ 105.620317] tg3 :03:00.0 eno1: 1: Host status block [0001:0042:(::):(0830:)]
>> [ 105.620324] tg3 :03:00.0 eno1: 1: NAPI info [0042:0042:(::01ff):0830:(0030:0030::)]
>> [ 105.620331] tg3 :03:00.0 eno1: 2: Host status block [0001:00d2:(0fff::):(:)]
>> [ 105.620370] tg3 :03:00.0 eno1: 2: NAPI info [00d2:00d2:(::01ff):0fff:(07ff:07ff::)]
>> [ 105.755739] tg3 :03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
>> [ 105.797123] tg3 :03:00.0 eno1: Link is down
>> [ 105.889440] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffe3d640 flags=0x0020]
>> [ 105.889478] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffe3d680 flags=0x0020]
>> [ 109.932707] tg3 :03:00.0 eno1: Link is up at 1000 Mbps, full duplex
>> [ 109.932710] tg3 :03:00.0 eno1: Flow control is off for TX and off for RX
>> [ 109.932711] tg3 :03:00.0 eno1: EEE is enabled
>>
>> ** Attachment added: "Bug tg3"
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
We tried this same version yesterday and the bug was still present. Actually it looked worse, because the machine crashed faster (maybe it was just an impression). I will collect logs to report this properly soon, in a few hours.

Paulo

On Fri, Apr 13, 2018, 13:55 luc <1447...@bugs.launchpad.net> wrote:

> Hi Kai-heng,
>
> I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018
> x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
> I have a Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28
> 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
> Unfortunately, I have the same problem (TG3 still crashes, a reboot is
> mandatory).
>
> [ 105.620301] tg3 :03:00.0 eno1: 0: Host status block [0001:00cc:(:002e:):(:0006)]
> [ 105.620309] tg3 :03:00.0 eno1: 0: NAPI info [00cc:00cc:(0024:0006:01ff)::(00f7:::)]
> [ 105.620317] tg3 :03:00.0 eno1: 1: Host status block [0001:0042:(::):(0830:)]
> [ 105.620324] tg3 :03:00.0 eno1: 1: NAPI info [0042:0042:(::01ff):0830:(0030:0030::)]
> [ 105.620331] tg3 :03:00.0 eno1: 2: Host status block [0001:00d2:(0fff::):(:)]
> [ 105.620370] tg3 :03:00.0 eno1: 2: NAPI info [00d2:00d2:(::01ff):0fff:(07ff:07ff::)]
> [ 105.755739] tg3 :03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
> [ 105.797123] tg3 :03:00.0 eno1: Link is down
> [ 105.889440] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffe3d640 flags=0x0020]
> [ 105.889478] tg3 :03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0xffe3d680 flags=0x0020]
> [ 109.932707] tg3 :03:00.0 eno1: Link is up at 1000 Mbps, full duplex
> [ 109.932710] tg3 :03:00.0 eno1: Flow control is off for TX and off for RX
> [ 109.932711] tg3 :03:00.0 eno1: EEE is enabled
>
> ** Attachment added: "Bug tg3"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Ok, I'll check it out. Thank you very much!

By the way, we downloaded and tested one of the deb packages you created, and it worked quite well. I will check exactly which one it was before reporting (I'm almost sure it was the one for Xenial).

We managed to reproduce the issue easily by booting into PXE and, after the NIC was started (trying to get an IP), resetting the machine and booting into Ubuntu. There is a huge difference between doing this and doing a cold boot directly into Ubuntu.

My hypothesis is that PXE sets up the NIC in a non-default way, by changing one or more of the config bits in some register. The unpatched tg3 driver does not touch these bits. That would explain why a boot sometimes works: the tg3 kernel module does not set these values to their defaults, so they keep whatever the PXE code set, if it ran before Linux was loaded.

Anyway, after running PXE the unpatched kernel breaks very quickly, while the patched kernel you provided worked very well.

I will check your links soon and return with our results in the next few days, hopefully this weekend or next week.

Thank you,
Paulo

On Mar 20, 2018 14:16, "Kai-Heng Feng" wrote:

> Guy,
>
> Broadcom has a new patch [1] that needs testing. Here's the kernel [2] to try.
>
> [1] https://lkml.org/lkml/2018/3/20/35
> [2] https://people.canonical.com/~khfeng/lp1447664-20180320/
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Thank you, we will try it as soon as possible. I'm currently on vacation and will not be able to test it until about March 5 (2 weeks from now). As soon as I test it, I'll let you know the results.

It would be great if someone else could try it too.

Thanks,
Paulo

On Feb 12, 2018 3:25 AM, "Kai-Heng Feng" wrote:

> Kernel with patch in comment #40. Please try it out.
> http://people.canonical.com/~khfeng/lp1447664-clk/

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1447664

Title:
  14e4:1687 broadcom tg3 network driver disconnects under high load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Hello,

This thread has a patch that solved the bug (for me):
https://www.mail-archive.com/netdev@vger.kernel.org/msg189347.html

The patch is here:
https://www.mail-archive.com/netdev@vger.kernel.org/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

I tested this patch on the following kernels and situations.

1) Stable kernels 4.13.3 and 4.15 crash without the patch (as did all other versions tested). The patch is not yet merged in the main Linux branch, up to and including 4.15 (stable).

2) Stable kernels 4.13.3 and 4.15 work great with the patch: no timeouts on tg3, and fast transfers on both gigabit and 10/100 links.

3) On Jan 31 (10 days ago) I wrote to the patch author, mentioned my results and asked when it will be merged. I'm still waiting; the author is probably quite busy at the moment.

4) Many tests were performed over several weeks. The last session took about one or two weeks, working full time, on an isolated network, using the FOG open source cloning solution. Several hundred GB were transferred during the tests, cloning 100+ machines inside a few labs. Both single and multicast cloning sessions were used. I tested with a gigabit switch and also with 10/100 switches, and checked single and multicast sessions, sequential and parallel tests, with/without power failures, with/without several patches, in many configurations, with lots of kernel parameters, you name it.

5) The test scenario shows this bug is completely reproducible, 100% of the time. Without the patch, my kernels always fail. I tested about 20 different versions and none worked. With the patch above, the two versions always work correctly.

6) A minor detail: the patch applies to 4.15 with a slight offset (2 lines, probably new comments or code) but works anyway.

This work would have been impossible without all the cooperation from the FOG team. Sebastian suggested the patch, and others helped a lot. A big "thank you" to them!

I wonder when this will be merged into the main kernel. Please, can anyone help with this?

Regards,
Paulo
Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Hello,

I would like to confirm that it's useful to file a new bug for this issue. The problem I'm having is the same one we are discussing in this thread. Wouldn't it be just a duplicate? Maybe I'm missing something, because I don't know the details of the bug hunting process for Ubuntu. Can you please confirm I should open it? If so, I can add a detailed description and dmesg logs, with debug on and the timeout error message included.

Anyway, I want to report progress on this problem. I have tested a few kernels and patches in the last weeks, and have found one combination that does solve the issue. I also checked that this patch is not yet merged into the latest vanilla stable kernel, version 4.15, released three days ago. But it applies to 4.15 and works there too, which is just great (at least for me).

I will send the details later (or tomorrow), as soon as I get back to my computer.

Paulo

On Jan 29, 2018 12:54 AM, "Kai-Heng Feng" wrote:

> First please file an upstream bug at https://bugzilla.kernel.org/
> Product: Drivers
> Component: Network
>
> Also, looks like it's a Ubuntu certified hardware, let me ask around.
[Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Hello,

I am still having this bug. I'm working with several HP machines, the same model as Yngvi's. Here it is (from dmesg messages):

Hardware name: HP HP EliteDesk 705 G3 Brazil Desktop Mini/8266, BIOS P26 Ver. 02.03 12/22/2016

It is interesting to note that it always happens with a 10/100 switch, but never occurs with a gigabit one.

I've compiled and tested the 4.15.0-rc8 release candidate, which has the commit 4419bb1cedcda0272e1dc410345c5a1d1da0e367, but it does not solve the issue. I added a few printk calls and can see that the module is correctly compiled and loaded, but my machine is not a Dell. Hence, the "if" condition fails and the body is not executed.

I also tried to force the patch, by keeping the "if" body and removing the condition, just to see what happens (with another printk to prove that it runs). The code runs (limiting MRRS to 2048, I think), but it does not solve the bug.

It complains that TSC is unstable, right after tg3 breaks. Here is a dmesg snippet, maybe it helps.

<...>
[ 155.816404] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 155.816447] clocksource: 'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask:
[ 155.816490] clocksource: 'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask:
[ 155.816533] tsc: Marking TSC unstable due to clocksource watchdog
[ 155.939181] tg3 :01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 156.103998] tg3 :01:00.0 eth0: Link is down
[ 156.322988] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 156.323040] sched_clock: Marking unstable (156322980975, 5436)<-(156582881282, -259894745)
[ 156.323144] clocksource: Switched to clocksource refined-jiffies
<...>

If you want to take a deeper look, there are a few logs here. I also tried "tsc=unstable" and other boot parameters, mostly to see if any would help (feeling lucky, perhaps?). Nothing changed, the bug is still here.
They show mostly the same messages, to me.

log_01_acpi_off.txt https://pastebin.com/FGQNiLqk
log_02_maxcpus_1.txt https://pastebin.com/2eEJnA3Z
log_03_nmi_watchdog_off.txt https://pastebin.com/Su44AqiX
log_04_nmi_watchdog_off.txt https://pastebin.com/4ja0UZ0c
log_05_noapic_nolapic.txt https://pastebin.com/fZNJbME5

Well, any ideas? I can reproduce the problem 100% of the time. Would you like me to test any other patch?

Kai-Heng Feng, you mention "it's better to ask HP and Broadcom to fix the issue". I agree, but how can we do that?

Thank you,
Paulo

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1447664

Title:
14e4:1687 broadcom tg3 network driver disconnects under high load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
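
For comparing several saved logs like the ones linked in the message above, it may help to count the failure signatures instead of reading each file by eye. The following is a minimal sketch, not a diagnostic tool: `tg3_log_summary` is a hypothetical helper name, and the patterns are simply the strings that appear in the dmesg snippet from this thread.

```shell
# Hypothetical helper: summarize tg3 failure signatures in a saved dmesg log.
# Usage: tg3_log_summary /path/to/log.txt
tg3_log_summary() {
    log_file=$1
    # grep -c prints the number of matching lines; it exits non-zero when
    # nothing matches, so "|| true" keeps the helper usable under "set -e".
    timeouts=$(grep -c 'tg3_stop_block timed out' "$log_file" || true)
    link_down=$(grep -c 'tg3 .*Link is down' "$log_file" || true)
    tsc=$(grep -c "Marking clocksource 'tsc' as unstable" "$log_file" || true)
    echo "timeouts=$timeouts link_down=$link_down tsc_unstable=$tsc"
}
```

Running it over each downloaded log (for example `tg3_log_summary log_01_acpi_off.txt`) gives one line per file, which makes it easier to see whether a boot parameter changed anything.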
[Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load
Hello,

I have seen the exact same issue, with the exact same hardware you have: it's the HP EliteDesk 705 G3 Desktop Mini.

I've already tested a ton of options, including recompiling the latest kernel, booting with several parameters, and so on and so forth. Got nothing more than a big headache. I have 100+ machines to install in a month and my team is having a really hard time dealing with this issue.

I have posted my findings on the fog forums. Fog is an open-source cloning tool. Please check it out:
https://forums.fogproject.org/topic/10731/crash-due-to-timeout-in-tg3-kernel-module-tg3_stop_block-timed-out-ofs-4c00-enable_bit-2

Any ideas on this bug? It seems to be related to 10/100 switches. If both ends are gigabit, it works much more reliably. Problems still arise, but much less frequently. With my old "fast ethernet" switch, the problem always happens.

It's lurking somewhere between the binary blob (the firmware), the kernel driver, the hardware or any tricky combination of these. Perhaps it is related to the AMD platform.

I can run tests or gather more data, if it helps. The issue always happens here.

Any ideas on how to solve or work around this issue? Patches or parameters are welcome...

Regards,
Paulo
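
Since the failures in this thread seem tied to 10/100 links, it can be useful to record the negotiated speed on each affected machine alongside the logs. Below is a small sketch under a few assumptions: the helper names `link_class` and `report_link` are invented for illustration, the interface is assumed to be eth0 unless another name is passed, and the speed is read from `/sys/class/net/<iface>/speed` (in Mb/s).

```shell
# Hypothetical helper: classify a link speed given in Mb/s.
link_class() {
    case "$1" in
        10|100) echo "10/100" ;;
        1000)   echo "gigabit" ;;
        *)      echo "other" ;;
    esac
}

# Hypothetical helper: report the negotiated speed of an interface.
# Assumes the kernel exposes /sys/class/net/<iface>/speed.
report_link() {
    iface=${1:-eth0}
    speed_file="/sys/class/net/$iface/speed"
    # Reading can fail (missing interface, link down), so fall back to "unknown".
    speed=$(cat "$speed_file" 2>/dev/null || echo unknown)
    echo "$iface: $speed Mb/s ($(link_class "$speed"))"
}
```

Calling `report_link eth0` before starting a transfer test gives a one-line record of whether the machine was on a fast ethernet or a gigabit link when the problem showed up.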