Re: bce(4) panics, 9.2rc1 [redux]
[Added yougari@ and davidch@ to the To:/Cc: list]

I confirmed that my issue reported on -current@ is due to the bxe(4) driver (BCM57711). If it is disabled, shutdown works fine without an NMI.

Also, I received several reports about the same box that an NMI occurred even with the bge(4) (BCM5717) driver when probing during a power-cycle test. The probability was about once per 30 power-cycles. Once it occurred, an AC on/off cycle was required (resetting the system reproduced the NMI at the same point).

Sean Bruno sean_br...@yahoo.com wrote in 1375208841.1496.3.camel@localhost:

> http://svnweb.freebsd.org/base?view=revision&revision=236216
>
> Ok, confirmed after ~50 reboots.
>
> There is a timing problem in this revision that I don't fully
> understand. Adding printf's inside bce_reset() will cause the existing
> code to succeed, and sometimes the existing code in this revision will
> work (about 10% of the time).
>
> In the failure mode, the network interface, bce0, will not come up into
> service *without* a network restart, after which it works fine.
>
> I suspect that we are missing a DELAY or UDELAY somewhere in the
> restoral of the emac_status settings that needs to be implemented.
>
> Sean
>
> p.s. sorry for the late report as the commit is well over a year old.
Re: bce(4) panics, 9.2rc1 [redux]
On Wed, Jul 31, 2013 at 03:54:06PM +0900, Hiroki Sato wrote:

> [Added yougari@ and davidch@ to the To:/Cc: list]
>
> I confirmed that my issue reported on -current@ is due to the bxe(4)
> driver (BCM57711). If it is disabled, shutdown works fine without an
> NMI.
>
> Also, I received several reports about the same box that an NMI occurred
> even with the bge(4) (BCM5717) driver when probing during a power-cycle
> test. The probability was about once per 30 power-cycles. Once it
> occurred, an AC on/off cycle was required (resetting the system
> reproduced the NMI at the same point).

Hmm, Hiroki, could you add bge_reset()/bge_chipinit() after bge_stop() in bge_shutdown() and let me know whether that change makes any difference?

> Sean Bruno sean_br...@yahoo.com wrote in 1375208841.1496.3.camel@localhost:
>
>> http://svnweb.freebsd.org/base?view=revision&revision=236216
>>
>> Ok, confirmed after ~50 reboots.
>>
>> There is a timing problem in this revision that I don't fully
>> understand. Adding printf's inside bce_reset() will cause the existing
>> code to succeed, and sometimes the existing code in this revision will
>> work (about 10% of the time).
>>
>> In the failure mode, the network interface, bce0, will not come up into
>> service *without* a network restart, after which it works fine.
>>
>> I suspect that we are missing a DELAY or UDELAY somewhere in the
>> restoral of the emac_status settings that needs to be implemented.
>>
>> Sean
>>
>> p.s. sorry for the late report as the commit is well over a year old.

___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
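[Editorial sketch] The experiment requested above amounts to a change in call order inside bge_shutdown(). The following is only a userspace mock of that ordering, not driver code: mock_softc, record() and the mock_* functions are hypothetical stand-ins, and the real bge(4) routines take the driver's own softc, need the driver lock held, and may differ in signature.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical softc: it only records which steps ran, in order. */
struct mock_softc {
	char trace[64];
};

/* Append one step name to the trace, comma separated. */
static void
record(struct mock_softc *sc, const char *step)
{
	/* Room for existing text, a comma, the step and the NUL. */
	assert(strlen(sc->trace) + strlen(step) + 2 <= sizeof(sc->trace));
	if (sc->trace[0] != '\0')
		strcat(sc->trace, ",");
	strcat(sc->trace, step);
}

/* Stubs named after the driver routines mentioned in the thread. */
static void mock_bge_stop(struct mock_softc *sc)     { record(sc, "stop"); }
static void mock_bge_reset(struct mock_softc *sc)    { record(sc, "reset"); }
static void mock_bge_chipinit(struct mock_softc *sc) { record(sc, "chipinit"); }

/* Proposed experiment: stop, then reset and re-initialize the chip. */
static int
mock_bge_shutdown(struct mock_softc *sc)
{
	mock_bge_stop(sc);
	mock_bge_reset(sc);     /* added step */
	mock_bge_chipinit(sc);  /* added step */
	return (0);
}
```

Calling mock_bge_shutdown() on a zeroed mock_softc leaves the trace showing the three steps in the order proposed; whether that order avoids the NMI on real hardware is exactly what the test is meant to find out.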
Re: bce(4) panics, 9.2rc1 [redux]
Yonghyeon PYUN pyu...@gmail.com wrote in 20130731074341.gc1...@michelle.cdnetworks.com:

> On Wed, Jul 31, 2013 at 03:54:06PM +0900, Hiroki Sato wrote:
>> [Added yougari@ and davidch@ to the To:/Cc: list]
>>
>> I confirmed that my issue reported on -current@ is due to the bxe(4)
>> driver (BCM57711). If it is disabled, shutdown works fine without an
>> NMI.
>>
>> Also, I received several reports about the same box that an NMI occurred
>> even with the bge(4) (BCM5717) driver when probing during a power-cycle
>> test. The probability was about once per 30 power-cycles. Once it
>> occurred, an AC on/off cycle was required (resetting the system
>> reproduced the NMI at the same point).
>
> Hmm, Hiroki, could you add bge_reset()/bge_chipinit() after
> bge_stop() in bge_shutdown() and let me know whether that change
> makes any difference?

Thank you. I will give it a try. The test will probably take some time, though, since the NMI occurs only once in 30-50 power-cycles. On bxe(4) it is 100% reproducible, FYI.

-- Hiroki
Re: bce(4) panics, 9.2rc1 [redux]
http://svnweb.freebsd.org/base?view=revision&revision=236216

Ok, confirmed after ~50 reboots.

There is a timing problem in this revision that I don't fully understand. Adding printf's inside bce_reset() will cause the existing code to succeed, and sometimes the existing code in this revision will work (about 10% of the time).

In the failure mode, the network interface, bce0, will not come up into service *without* a network restart, after which it works fine.

I suspect that we are missing a DELAY or UDELAY somewhere in the restoral of the emac_status settings that needs to be implemented.

Sean

p.s. sorry for the late report as the commit is well over a year old.
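[Editorial sketch] The suspicion above, that a printf happens to supply a delay the code otherwise lacks, can be illustrated with a bounded poll loop. This is a userspace simulation under stated assumptions, not code from bce(4): fake_dev, fake_delay() (standing in for the kernel's DELAY(9)), fake_status_ready() and wait_for_ready() are all hypothetical, and the timing values are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device whose status only becomes valid some
 * microseconds after a reset has been requested. */
struct fake_dev {
	uint32_t elapsed_us;   /* simulated time since reset */
	uint32_t ready_at_us;  /* time at which the status settles */
};

/* Stand-in for DELAY(9): advances the simulated clock. */
static void
fake_delay(struct fake_dev *dev, uint32_t usec)
{
	dev->elapsed_us += usec;
}

/* Stand-in for a register read: true once the device has settled. */
static bool
fake_status_ready(const struct fake_dev *dev)
{
	return (dev->elapsed_us >= dev->ready_at_us);
}

/*
 * Poll until the device reports ready, waiting step_us between reads
 * and giving up after max_tries reads.
 * Returns 0 on success, -1 on timeout.
 */
static int
wait_for_ready(struct fake_dev *dev, uint32_t step_us, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (fake_status_ready(dev))
			return (0);
		fake_delay(dev, step_us);
	}
	return (-1);
}
```

With step_us set to 0 the loop reads the status back to back and times out, which mirrors the failing case; with a non-zero step the same loop succeeds once enough simulated time has passed. Anything slow placed in the loop, a printf included, would have the same effect as the explicit delay.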
Re: bce(4) panics, 9.2rc1 [redux]
On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:

> Running 9.2 in production load mail servers. We're hitting the
> watchdog message and crashing with the stable/9 version. We're
> reverting the change from 2 weeks ago and seeing if it still happens.
> We didn't see this from stable/9 from about a month ago.
>
> Sean

Not seeing any changes to core dumps or crashes after updating the bce(4) interface on these Dell R410s.

IPMI was a definite false hope. No changes noted after I modified the ipmi_attach code.

stable/7 works just fine and stable/9 fails with NMI errors on the console very badly. It fails so badly that it won't come into service at all.

I've reverted stable/9 back to August of 2012 with no changes. It sort of looks like r236216 is causing severe issues with my configuration.

The Dell R410 has a 3rd ethernet interface for the BMC only, not sure if that is meaningful in this context. The 3rd interface is *not* visible from the o/s and is dedicated to the BMC interface.

Doing more testing at this time to validate.

Sean

http://svnweb.freebsd.org/base?view=revision&revision=236216
Re: bce(4) panics, 9.2rc1 [redux]
From: Sean Bruno sean_br...@yahoo.com
To: freebsd-net@freebsd.org
Sent: Monday, July 29, 2013 8:56 PM
Subject: Re: bce(4) panics, 9.2rc1 [redux]

> On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
>> Running 9.2 in production load mail servers. We're hitting the
>> watchdog message and crashing with the stable/9 version. We're
>> reverting the change from 2 weeks ago and seeing if it still happens.
>> We didn't see this from stable/9 from about a month ago.
>>
>> Sean
>
> Not seeing any changes to core dumps or crashes after updating the
> bce(4) interface on these Dell R410s.
>
> IPMI was a definite false hope. No changes noted after I modified the
> ipmi_attach code.
>
> stable/7 works just fine and stable/9 fails with NMI errors on the
> console very badly. It fails so badly that it won't come into service
> at all.
>
> I've reverted stable/9 back to August of 2012 with no changes. It sort
> of looks like r236216 is causing severe issues with my configuration.
>
> The Dell R410 has a 3rd ethernet interface for the BMC only, not sure
> if that is meaningful in this context. The 3rd interface is *not*
> visible from the o/s and is dedicated to the BMC interface.
>
> Doing more testing at this time to validate.
>
> Sean

FWIW, I have an R210 with a BCM5716 running 9.1 RELEASE without any problems. I have customized the driver a bit. Try turning off the features and running it raw without any checksum or TSO gobbledygook.

BC
Re: bce(4) panics, 9.2rc1, IPMI related?
> bce0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xda00-0xdbff irq 36 at device 0.0 on pci1
> miibus0: MII bus on bce0
> brgphy0: BCM5709 10/100/1000baseT PHY PHY 1 on miibus0
> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> bce0: Ethernet address: d4:ae:52:8d:42:fc
> bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
> Coal (RX:6,6,18,18; TX:20,20,80,80)
> bce1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xdc00-0xddff irq 48 at device 0.1 on pci1
> miibus1: MII bus on bce1
> brgphy1: BCM5709 10/100/1000baseT PHY PHY 1 on miibus1
> brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> bce1: Ethernet address: d4:ae:52:8d:42:fd
> bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
> Coal (RX:6,6,18,18; TX:20,20,80,80)

There was no change reverting r253128. I don't think that this affects what I'm seeing. However ... see below

> These machines are using IPMI for management (Dell R410) and seem to be
> unable to attach to /dev/ipmi0:
>
> ipmi0: IPMI System Interface port 0xca8,0xcac on acpi0
> ipmi0: KCS mode found at io 0xca8 on acpi
> ipmi1: IPMI System Interface on isa0
> device_attach: ipmi1 attach returned 16
> ipmi1: IPMI System Interface on isa0
> device_attach: ipmi1 attach returned 16
> ...
> ipmi0: Timed out waiting for GET_DEVICE_ID
>
> Sean

tl;dr need a review of http://people.freebsd.org/~sbruno/ipmi_fixes.txt

I don't understand why, but the ipmi_isa.c attach isn't seeing that ipmi_acpi.c is attached at all. Moreover, it takes slightly more than 3 seconds for the BMC on a Dell R410 to respond to GET_DEVICE_ID while using the SOL console at 9600. :-(

So, I've adjusted ipmivars.h::MAX_TIMEOUT to be (6 * hz) to properly attach to ipmi0.

I've also done something gross, but I can't really see any way around it. I've added detection to ipmi_isa.c to see if the ACPI IPMI interface is enabled/disabled via acpi_disabled, which means I have to include ACPI header files in ipmi_isa.c ... amusing, but it seems to work.

I've been able to return the IPMI controller to the same behavior that it appears to have in stable/7 now with the attached patch.

---
ipmi0: IPMI System Interface port 0xca8,0xcac on acpi0
ipmi0: KCS mode found at io 0xca8 on acpi
ipmi0: IPMI device rev. 0, firmware rev. 1.90, version 2.0
ipmi0: Number of channels 5
ipmi0: Attached watchdog
---

It looks like our implementation of IPMI somehow tries to attach TWICE to the IPMI controller, once via ACPI and once via ISA. This is really confusing the hell out of the Broadcom management firmware even though the second attachment fails.

Sean
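[Editorial sketch] The timeout change described above is simple tick arithmetic: a limit of (6 * hz) ticks is six seconds whatever the tick rate. The simulation below assumes the earlier limit was about three seconds, which the message implies but does not state, and uses 3200 ms as an illustrative value for "slightly more than 3 seconds". wait_for_response() and ms_to_ticks() are hypothetical helpers, not functions from ipmi(4).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simulated tick-based wait: the caller polls once per tick and gives
 * up after timeout_ticks. respond_after_ticks models how long the BMC
 * takes to answer. Returns true if the answer arrives in time.
 */
static bool
wait_for_response(int respond_after_ticks, int timeout_ticks)
{
	assert(timeout_ticks >= 0);
	for (int tick = 0; tick <= timeout_ticks; tick++) {
		if (tick >= respond_after_ticks)
			return (true);
	}
	return (false);
}

/* Convert milliseconds to ticks for a given tick rate (hz). */
static int
ms_to_ticks(int ms, int hz)
{
	return (ms * hz / 1000);
}
```

For hz = 1000, a BMC that answers after ms_to_ticks(3200, hz) ticks misses a 3 * hz limit and is caught by a 6 * hz limit; the same holds at hz = 100, since both sides of the comparison scale with hz.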
Re: bce(4) panics, 9.2rc1
On Wed, 2013-07-24 at 14:23 -0700, Sean Bruno wrote:

> On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
>> Running 9.2 in production load mail servers. We're hitting the
>> watchdog message and crashing with the stable/9 version. We're
>> reverting the change from 2 weeks ago and seeing if it still happens.
>> We didn't see this from stable/9 from about a month ago.
>>
>> Sean
>
> ref: http://svnweb.freebsd.org/base?view=revision&revision=253128
>
> These panics are happening on:
>
> bce0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xda00-0xdbff irq 36 at device 0.0 on pci1
> miibus0: MII bus on bce0
> brgphy0: BCM5709 10/100/1000baseT PHY PHY 1 on miibus0
> brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> bce0: Ethernet address: d4:ae:52:8d:42:fc
> bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
> Coal (RX:6,6,18,18; TX:20,20,80,80)
> bce1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xdc00-0xddff irq 48 at device 0.1 on pci1
> miibus1: MII bus on bce1
> brgphy1: BCM5709 10/100/1000baseT PHY PHY 1 on miibus1
> brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> bce1: Ethernet address: d4:ae:52:8d:42:fd
> bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
> Coal (RX:6,6,18,18; TX:20,20,80,80)

These machines are using IPMI for management (Dell R410) and seem to be unable to attach to /dev/ipmi0:

ipmi0: IPMI System Interface port 0xca8,0xcac on acpi0
ipmi0: KCS mode found at io 0xca8 on acpi
ipmi1: IPMI System Interface on isa0
device_attach: ipmi1 attach returned 16
ipmi1: IPMI System Interface on isa0
device_attach: ipmi1 attach returned 16
...
ipmi0: Timed out waiting for GET_DEVICE_ID

Sean
Re: bce(4) panics, 9.2rc1
On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:

> Running 9.2 in production load mail servers. We're hitting the
> watchdog message and crashing with the stable/9 version. We're
> reverting the change from 2 weeks ago and seeing if it still happens.
> We didn't see this from stable/9 from about a month ago.
>
> Sean

ref: http://svnweb.freebsd.org/base?view=revision&revision=253128

These panics are happening on:

bce0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xda00-0xdbff irq 36 at device 0.0 on pci1
miibus0: MII bus on bce0
brgphy0: BCM5709 10/100/1000baseT PHY PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce0: Ethernet address: d4:ae:52:8d:42:fc
bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
Coal (RX:6,6,18,18; TX:20,20,80,80)
bce1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem 0xdc00-0xddff irq 48 at device 0.1 on pci1
miibus1: MII bus on bce1
brgphy1: BCM5709 10/100/1000baseT PHY PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce1: Ethernet address: d4:ae:52:8d:42:fd
bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
Coal (RX:6,6,18,18; TX:20,20,80,80)
Re: bce(4) panics, 9.2rc1
On Wed, Jul 24, 2013 at 2:23 PM, Sean Bruno sean_br...@yahoo.com wrote:

> On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
>> Running 9.2 in production load mail servers. We're hitting the
>> watchdog message and crashing with the stable/9 version. We're
>> reverting the change from 2 weeks ago and seeing if it still happens.
>> We didn't see this from stable/9 from about a month ago.

pciconf -lvb: http://people.freebsd.org/~hiren/pciconf.txt

Thanks,
Hiren
Re: bce(4) panics, 9.2rc1
----- Original Message -----
From: Sean Bruno sean_br...@yahoo.com

As a guess, it's likely the interrupt handler is triggering while the watchdog timeout handler is re-initialising the card, so you get inconsistent state resulting in the crash.

Info from /var/crash should help determine the cause and confirm / deny that.

Regards
Steve