Re: 10.1-STABLE bce: Watchdog timeout occurred
On Wed, Apr 22, 2015 at 12:39:16AM -0400, Chris Ross wrote:
>
> On Apr 21, 2015, at 10:10 , Gareth Wyn Roberts wrote:
> > This may be caused by DMA alignment problems.
> > See
> > https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
> > for a recent thread about the msk driver. The msk maintainer Yonghyeon
> > Pyun has opted for super safe options of 32K alignment!
> >
> > It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or
> > BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see
> > whether it makes any difference.
>
> Well, after making that change, I was able to confirm that the problem
> doesn't seem to occur. However, in trying to verify the problem on an
> unmodified kernel, I've rebooted a GENERIC from r281672 without that change,
> and am also not seeing the problem. :-/ I'm not sure whether the gremlins
> have "fixed" something, or if I was just too critical in my initial analysis.
>
> For now I'll take that change out of my tree and run without it. If I see
> the flapping again, I'll confirm that it's repeatable, then change the
> alignments as suggested and see if I see a change.
>

I guess the alignment issue of msk(4) has nothing to do with bce(4)
watchdog timeouts.

It would be more helpful to know details of your controller (bce(4)/brgphy(4)
related dmesg output, pciconf output, etc.) and network setup. If you know a
reliable way that triggers the watchdog timeouts, please share that info too.

I would have tried to disable all hardware offloading features (TSO,
checksum, VLAN H/W tagging, etc.) and see whether that makes any difference,
as the first step to narrow down the issue.

Thanks.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
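
One way to judge whether the timeouts are repeatable is to count them per
interface in saved dmesg output, before and after each change. A minimal
Python sketch, assuming log lines in the format quoted in this thread; the
function name and the sample text are illustrative only and not part of any
FreeBSD tool:

```python
import re
from collections import Counter

# Matches the driver message quoted in this thread, e.g.
# "bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!"
WATCHDOG_RE = re.compile(r"^(bce\d+): .*Watchdog timeout occurred")


def count_watchdog_timeouts(log_text):
    """Return a Counter mapping interface name -> number of watchdog timeouts."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample built from the messages reported in this thread.
SAMPLE = """\
bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
"""

if __name__ == "__main__":
    for ifname, n in sorted(count_watchdog_timeouts(SAMPLE).items()):
        print(f"{ifname}: {n} watchdog timeout(s)")
```

Comparing the counts from a run with offloading enabled against a run with it
disabled (for example via the ifconfig(8) options -tso, -txcsum, -rxcsum and
-vlanhwtag) would give a rough indication of whether offloading is involved.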
Re: 10.1-STABLE bce: Watchdog timeout occurred
On Apr 21, 2015, at 10:10 , Gareth Wyn Roberts wrote:
> This may be caused by DMA alignment problems.
> See
> https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
> for a recent thread about the msk driver. The msk maintainer Yonghyeon Pyun
> has opted for super safe options of 32K alignment!
>
> It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or
> BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see
> whether it makes any difference.

Well, after making that change, I was able to confirm that the problem
doesn't seem to occur. However, in trying to verify the problem on an
unmodified kernel, I've rebooted a GENERIC from r281672 without that change,
and am also not seeing the problem. :-/ I'm not sure whether the gremlins
have "fixed" something, or if I was just too critical in my initial analysis.

For now I'll take that change out of my tree and run without it. If I see
the flapping again, I'll confirm that it's repeatable, then change the
alignments as suggested and see if I see a change.

Thanks all...

- Chris
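
As background on why raising the alignment is a conservative experiment: for
power-of-two alignments, any address aligned to 4096 is also aligned to every
smaller power of two, so a larger value can only tighten the constraint. A
small Python sketch of that arithmetic; the value 16 is a hypothetical
stand-in and not the actual default in if_bcereg.h:

```python
def _check_power_of_two(alignment):
    """Reject alignments that are not positive powers of two."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")


def is_aligned(addr, alignment):
    """True if addr is a multiple of alignment."""
    _check_power_of_two(alignment)
    return addr & (alignment - 1) == 0


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment."""
    _check_power_of_two(alignment)
    return (addr + alignment - 1) & ~(alignment - 1)


SMALL_ALIGN = 16    # hypothetical small alignment, for illustration only
PAGE_ALIGN = 4096   # the value suggested in this thread

if __name__ == "__main__":
    addr = align_up(0x12345, PAGE_ALIGN)
    # A 4096-aligned address also satisfies the smaller alignment.
    print(hex(addr), is_aligned(addr, PAGE_ALIGN), is_aligned(addr, SMALL_ALIGN))
    # The reverse does not hold: 16-aligned does not imply 4096-aligned.
    print(is_aligned(0x12350, SMALL_ALIGN), is_aligned(0x12350, PAGE_ALIGN))
```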
Re: 10.1-STABLE bce: Watchdog timeout occurred
This may be caused by DMA alignment problems.
See
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
for a recent thread about the msk driver. The msk maintainer Yonghyeon Pyun
has opted for super safe options of 32K alignment!

It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or
BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see
whether it makes any difference.

- Gareth.

On 21/04/2015 10:52, Alnis Morics wrote:
> On 04/21/2015 06:17 AM, Chris Ross wrote:
> > I got a new [to me] system recently, a Dell PE 1950. It has two bce
> > parts on the motherboard that identify as:
> >
> > bce#:
> >
> > The OS I installed and kernel I'm running are from a download of a
> > 10.1 STABLE ISO, r281235, April 7, 2015. I had gone on to check out a
> > newer stable from subversion, and build a custom kernel, but when I
> > booted that one I got a bce0 that didn't seem to work, and kept emitting:
> >
> > bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
> > bce0: link state changed to DOWN
> > bce0: link state changed to UP
> >
> > So, I fell back. But I've since noticed that even the original kernel
> > seems to do this after booting. I'm not yet running any notable amount
> > of traffic through the system, but intend to make it an edge router, so
> > certainly will be.
> >
> > Is there any sort of issue noted in the bce driver in recent
> > days/weeks/months? Are other folks seeing this diagnostic/error?
> >
> > I'll do a little more testing and see if I'm seeing it more or less
> > often, but I know that in at least some cases the interface has flapped
> > like this after boot for long enough that I was unable to get connected
> > remotely, and resorted to a console login to reboot.
> >
> > - Chris
>
> There are "Watchdog timeout" errors with some msk NICs.
> Both msk and bce are dependent on MII bus code (see
> /usr/src/sys/amd64/conf/GENERIC)
>
> -Alnis
Re: 10.1-STABLE bce: Watchdog timeout occurred
On 04/21/2015 06:17 AM, Chris Ross wrote:
> I got a new [to me] system recently, a Dell PE 1950. It has two bce
> parts on the motherboard that identify as:
>
> bce#:
>
> The OS I installed and kernel I'm running are from a download of a
> 10.1 STABLE ISO, r281235, April 7, 2015. I had gone on to check out a
> newer stable from subversion, and build a custom kernel, but when I
> booted that one I got a bce0 that didn't seem to work, and kept emitting:
>
> bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
> bce0: link state changed to DOWN
> bce0: link state changed to UP
>
> So, I fell back. But I've since noticed that even the original kernel
> seems to do this after booting. I'm not yet running any notable amount
> of traffic through the system, but intend to make it an edge router, so
> certainly will be.
>
> Is there any sort of issue noted in the bce driver in recent
> days/weeks/months? Are other folks seeing this diagnostic/error?
>
> I'll do a little more testing and see if I'm seeing it more or less
> often, but I know that in at least some cases the interface has flapped
> like this after boot for long enough that I was unable to get connected
> remotely, and resorted to a console login to reboot.
>
> - Chris

There are "Watchdog timeout" errors with some msk NICs.
Both msk and bce are dependent on MII bus code (see
/usr/src/sys/amd64/conf/GENERIC)

-Alnis
10.1-STABLE bce: Watchdog timeout occurred
I got a new [to me] system recently, a Dell PE 1950. It has two bce
parts on the motherboard that identify as:

bce#:

The OS I installed and kernel I'm running are from a download of a
10.1 STABLE ISO, r281235, April 7, 2015. I had gone on to check out a
newer stable from subversion, and build a custom kernel, but when I
booted that one I got a bce0 that didn't seem to work, and kept emitting:

bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP

So, I fell back. But I've since noticed that even the original kernel
seems to do this after booting. I'm not yet running any notable amount
of traffic through the system, but intend to make it an edge router, so
certainly will be.

Is there any sort of issue noted in the bce driver in recent
days/weeks/months? Are other folks seeing this diagnostic/error?

I'll do a little more testing and see if I'm seeing it more or less
often, but I know that in at least some cases the interface has flapped
like this after boot for long enough that I was unable to get connected
remotely, and resorted to a console login to reboot.

- Chris