[Sorry, Henrique, for replying directly to you]

> On 26 May 2015, at 15:39, Henrique de Moraes Holschuh wrote:
>
> On Tue, May 26, 2015, at 09:24, Justin Catterall wrote:
>> At irregular times, and apparently for no reason at all, networking
>> drops and cannot be restarted without reboot on a fresh install of
>> Jessie. The NIC is a Broadcom NetXtreme BCM5720.
>>
>> ifconfig thinks networking is still up because I can:
>> ifconfig eth0 down
>>
>> I find this when I try 'ifconfig eth0 up':
>> tg3_abort_hw timed out TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
>
> Hmm, it is either a kernel issue, or a hardware issue.
>
>> Any suggestions on where to look for a solution?
>
> Yes.
>
> First, disable all hardware offloading using ethtool. See if that
> helps.
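A minimal sketch of the "disable all offloading" step, assuming the interface is eth0 and that this ethtool accepts the long feature names printed by "ethtool -k" (check "man ethtool" on your version). The script name and helper functions are hypothetical. It only prints the "ethtool -K" command for features that are "on" and not "[fixed]", so you can review it before running it as root:

```bash
#!/bin/bash
# Hypothetical helper: offloads-off.sh IFACE
# Prints (does not run) an "ethtool -K" command that turns off every
# feature currently "on" and changeable, i.e. not marked "[fixed]".
# Assumption: ethtool accepts the long names shown by "ethtool -k".

# Reads "ethtool -k" output on stdin; prints one enabled, changeable feature per line.
# Lines look like "tx-checksumming: on", "\ttx-checksum-ipv4: on"
# or "rx-vlan-offload: on [fixed]" (the last kind is skipped).
enabled_features() {
    awk -F': ' '$2 == "on" { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Builds the ethtool command text that disables the given features on IFACE.
offload_off_cmd() {
    local iface="$1"
    shift
    local cmd="ethtool -K $iface"
    local f
    for f in "$@"; do
        cmd="$cmd $f off"
    done
    echo "$cmd"
}

# Only query the NIC when an interface name is given.
if [ "$#" -eq 1 ]; then
    features=$(ethtool -k "$1" | enabled_features)
    if [ -n "$features" ]; then
        # Unquoted on purpose: one argument per feature name.
        offload_off_cmd "$1" $features
    fi
fi
```

Features reported as "[fixed]" (such as rx-vlan-offload and tx-vlan-offload on this NIC) cannot be changed, which is why they are left out.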
I was able to disable all except:

rx-vlan-offload: on [fixed]
tx-vlan-offload: on [fixed]

Now, if I run "/etc/init.d/networking restart", the system doesn't report any error, but networking is still dead. However, I can rmmod tg3, ptp and libphy, then run "modprobe tg3" and "/etc/init.d/networking start", and everything works (I have done this a handful of times with no need to reboot to re-enable networking). So that's some progress.

> Also, if this NIC is in the system mainboard, make sure you are using
> the latest firmware ("BIOS update") from your motherboard vendor: it is
> usual to have the motherboard NICs use a data block in the shared system
> FLASH for vital product data and firmware. The motherboard vendor will
> bundle up updates for the NIC firmware with the BIOS updates when both
> are in the same FLASH chip.

I've read the documentation for the latest firmware and there is no mention of changes for the NIC. The only changes listed are a "power-on delay option", which allows a longer or shorter period of time to hit the key to access the BIOS, and a change to boot device detection to better detect devices with invalid boot records. Here's a link to the page:

http://h20565.www2.hp.com/hpsc/swd/public/detail?sp4ts.oid=5390291&swItemId=MTX_a21cee44c55643598fb2f52bc2&swEnvOid=4144#tab4

I don't like tinkering with firmware if I can help it. In this case they don't say there are changes to the NIC, so do you think I should still upgrade? The description says no bugs fixed, only enhancements.

> Make sure you have the latest linux firmware file for the tg3 driver as
> well. If the initramfs image has the tg3.ko module inside, it must also
> have the firmware file. A workaround for any initramfs-related tg3
> firmware loading issues is to "rmmod tg3 ; modprobe tg3" after the
> system booted (and before the NIC hardlocks).

See above: even after rmmod'ing, I can still force a network restart to fail without any error, though it is recoverable if noticed.
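The manual recovery above (unload tg3, ptp and libphy, reload tg3, start networking) can be wrapped in a watchdog that cron runs periodically: ping a couple of other LAN hosts and only reload the driver if none of them answers. This is a minimal sketch, not a tested tool; the script name, the DRY_RUN switch and the host arguments are hypothetical, and it must run as root:

```bash
#!/bin/bash
# Hypothetical watchdog: tg3-watchdog.sh HOST [HOST...]
# If none of the given LAN hosts answers a ping, reload the tg3 driver
# and restart networking. Must run as root.
# Set DRY_RUN=1 to print the recovery commands instead of running them.

set -u

# Modules that were unloaded by hand in this thread (tg3 first, then its dependencies).
MODULES_TO_UNLOAD="tg3 ptp libphy"

# Returns 0 if at least one host answers a single ping within 2 seconds.
any_host_up() {
    local host
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Runs a command, or only echoes it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The recovery sequence reported to work: unload, reload, start networking.
recover_nic() {
    run rmmod $MODULES_TO_UNLOAD   # unquoted on purpose: one argument per module
    run modprobe tg3
    run /etc/init.d/networking start
}

main() {
    if any_host_up "$@"; then
        return 0
    fi
    echo "tg3-watchdog: no LAN host answered, reloading tg3" >&2
    recover_nic
}

# Only act when hosts are given, so sourcing the file has no side effects.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

A matching entry in a hypothetical /etc/cron.d/tg3-watchdog might be "*/5 * * * * root /usr/local/sbin/tg3-watchdog.sh 192.168.1.10 192.168.1.20" (path and addresses are placeholders). Pinging more than one host avoids reloading the driver just because a single peer is down.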
> If all of the above failed, get yourself familiar with building a custom
> Debian-compatible kernel using pristine upstream kernels from
> www.kernel.org. Wait until 3.18.15 and 4.0.5 are released in
> www.kernel.org, and build custom kernels based on them. Alternatively,
> wait until a debian-packaged version of kernel 4.0.5 is available. DO
> NOT use 4.0 kernels before 4.0.5 on pain of possible data loss.

Data loss? On a "stable" kernel? WTF are they doing these days? I notice that stable/dev are no longer even/odd minor version numbers; it took me a bit of Googling to get caught up!

> If either the 3.18.15 or 4.0.5 kernel fixes the issue with your bcm5720,
> please tell us so that we can try to isolate the fix and backport it to
> the Debian kernel.

In the meantime I've made a bash script to rmmod and modprobe as appropriate. I'll set up a cron job that pings a couple of other servers on the LAN and, should the pings fail, runs the script and restarts networking.

> If that fails, you will have to engage the kernel community itself for a
> fix. Please file a bug on bugzilla.kernel.org, and good luck. There are
> several hardware hang reports open against BCM57xx + tg3.

Damn crap hardware. I remember having issues with tg3 at least six or seven years ago. I can't believe it's still being incorporated into motherboards when there are obviously problems with the chipset. Depending on the speed of progress on the kernel front, I may just stick a PCI NIC in there. I think I still have some 3c509's around somewhere...

> Alternatively, try to get yourself an Intel NIC that works with the igb
> driver (don't get an Intel NIC that needs the e1000e driver) to replace
> the hardlock-prone bcm5720 + tg3 combination.

Thanks for the pointers. I at least have a situation now where I don't need a reboot to get networking functioning after it fails. It's far from perfect, but it's much, much better.

-- 
Justin C, by the sea.
-- To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: https://lists.debian.org/3bcb9e79-8988-475e-b801-e5fccd423...@masonsmusic.co.uk