Re: bce(4) panics, 9.2rc1 [redux]

2013-07-31 Thread Hiroki Sato
[Added yongari@ and davidch@ to the To:/Cc: list]

 I confirmed that my issue reported on -current@ is due to the bxe(4)
 driver (BCM57711).  If it is disabled, shutdown works fine without
 NMI.

 Also, I received several reports about the same box that an NMI occurred
 even with the bge(4) (BCM5717) driver when probing during a power-cycle
 test.  The probability was about once per 30 power-cycles.  Once it
 occurred, an AC on/off cycle was required (resetting the system
 reproduced the NMI at the same point).

Sean Bruno sean_br...@yahoo.com wrote
  in 1375208841.1496.3.camel@localhost:

se
se
se  http://svnweb.freebsd.org/base?view=revision&revision=236216
se 
se 
se
se
se Ok, confirmed after ~50 reboots.
se
se There is a timing problem in this revision that I don't fully
se understand.  Adding printf's inside bce_reset() will cause the existing
se code to succeed, and sometimes the existing code in this revision will
se work (about 10% of the time).
se
se In the failure mode, the network interface, bce0, will not come up into
se service *without* a network restart, after which it works fine.
se
se I suspect that we are missing a DELAY or UDELAY somewhere in the
se restoral of the emac_status settings that needs to be implemented.
se
se Sean
se
se p.s. sorry for the late report as the commit is well over a year old.




Re: bce(4) panics, 9.2rc1 [redux]

2013-07-31 Thread Yonghyeon PYUN
On Wed, Jul 31, 2013 at 03:54:06PM +0900, Hiroki Sato wrote:
 [Added yongari@ and davidch@ to the To:/Cc: list]
 
  I confirmed that my issue reported on -current@ is due to the bxe(4)
  driver (BCM57711).  If it is disabled, shutdown works fine without
  NMI.
 
  Also, I received several reports about the same box that NMI occurred
  even on bge(4) (BCM5717) driver when probing during power-cycle test.
  The probability was about once per 30 power-cycles.  Once it
  occurred, an AC on/off cycle was required (resetting a system
  reproduced the NMI in the same timing).
 

Hmm, Hiroki, could you add bge_reset()/bge_chipinit() after
bge_stop() in bge_shutdown() and let me know whether that change
makes any difference?
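
A minimal userland sketch of the call ordering being asked for. Only the
names bge_stop(), bge_reset(), bge_chipinit() and bge_shutdown() come from
the message above; the mock_ prefix, the softc layout and the return values
are assumptions made so the ordering can be checked outside the kernel, and
this is not the actual if_bge.c code.

```c
#include <string.h>

/* Hypothetical stand-in for the driver softc; it only records call order. */
struct mock_softc {
	char trace[64];
};

/* Append a call name to the comma-separated trace, never overflowing it. */
static void
trace_call(struct mock_softc *sc, const char *name)
{
	size_t room = sizeof(sc->trace) - strlen(sc->trace) - 1;

	if (sc->trace[0] != '\0' && room > 0) {
		strncat(sc->trace, ",", room);
		room--;
	}
	strncat(sc->trace, name, room);
}

/* Stubs: the real routines touch hardware, these only log the call. */
static void
mock_bge_stop(struct mock_softc *sc)
{
	trace_call(sc, "stop");
}

static int
mock_bge_reset(struct mock_softc *sc)
{
	trace_call(sc, "reset");
	return (0);
}

static int
mock_bge_chipinit(struct mock_softc *sc)
{
	trace_call(sc, "chipinit");
	return (0);
}

/* Shutdown path with the suggested reset and chipinit after the stop. */
static int
mock_bge_shutdown(struct mock_softc *sc)
{
	mock_bge_stop(sc);
	if (mock_bge_reset(sc) != 0)
		return (1);
	return (mock_bge_chipinit(sc));
}
```

Calling mock_bge_shutdown() on a zeroed mock_softc leaves the trace as
"stop,reset,chipinit", which is the order the experiment is meant to test.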

 Sean Bruno sean_br...@yahoo.com wrote
   in 1375208841.1496.3.camel@localhost:
 
 se
 se
 se  http://svnweb.freebsd.org/base?view=revision&revision=236216
 se 
 se 
 se
 se
 se Ok, confirmed after ~50 reboots.
 se
 se There is a timing problem in this revision that I don't fully
 se understand.  Adding printf's inside bce_reset() will cause the existing
 se code to succeed, and sometimes the existing code in this revision will
 se work (about 10% of the time).
 se
 se In the failure mode, the network interface, bce0, will not come up into
 se service *without* a network restart, after which it works fine.
 se
 se I suspect that we are missing a DELAY or UDELAY somewhere in the
 se restoral of the emac_status settings that needs to be implemented.
 se
 se Sean
 se
 se p.s. sorry for the late report as the commit is well over a year old.


___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: bce(4) panics, 9.2rc1 [redux]

2013-07-31 Thread Hiroki Sato
Yonghyeon PYUN pyu...@gmail.com wrote
  in 20130731074341.gc1...@michelle.cdnetworks.com:

py On Wed, Jul 31, 2013 at 03:54:06PM +0900, Hiroki Sato wrote:
py  [Added yongari@ and davidch@ to the To:/Cc: list]
py 
py   I confirmed that my issue reported on -current@ is due to the bxe(4)
py   driver (BCM57711).  If it is disabled, shutdown works fine without
py   NMI.
py 
py   Also, I received several reports about the same box that NMI occurred
py   even on bge(4) (BCM5717) driver when probing during power-cycle test.
py   The probability was about once per 30 power-cycles.  Once it
py   occurred, an AC on/off cycle was required (resetting a system
py   reproduced the NMI in the same timing).
py 
py
py Hmm, Hiroki, could you add bge_reset()/bge_chipinit() after
py bge_stop() in bge_shutdown() and let me know whether that change
py makes any difference?

 Thank you.  I will give it a try.  The test will probably take some
 time since it occurs only once in 30-50 power-cycles, though.

 On bxe(4) it is 100% reproducible, FYI.

-- Hiroki




Re: bce(4) panics, 9.2rc1 [redux]

2013-07-30 Thread Sean Bruno


 http://svnweb.freebsd.org/base?view=revision&revision=236216
 
 


Ok, confirmed after ~50 reboots.

There is a timing problem in this revision that I don't fully
understand.  Adding printf's inside bce_reset() will cause the existing
code to succeed, and sometimes the existing code in this revision will
work (about 10% of the time).

In the failure mode, the network interface, bce0, will not come up into
service *without* a network restart, after which it works fine.

I suspect that we are missing a DELAY or UDELAY somewhere in the
restoral of the emac_status settings that needs to be implemented.
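
The kind of fix suspected here is usually a bounded poll with a short delay
between register reads, so the hardware has time to settle before the
driver proceeds. A minimal userland sketch of that pattern follows. The
fake device, the FAKE_STATUS_READY bit and all function names are
hypothetical; in the kernel the delay would be DELAY(), and which register
actually needs the wait is exactly what is not yet known.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_STATUS_READY 0x1u	/* hypothetical "settled" bit */

/* Hypothetical device that reports ready only after a number of reads. */
struct fake_dev {
	int reads_until_ready;
	uint32_t delayed_us;	/* total time spent in the delay stub */
};

static uint32_t
fake_read_status(struct fake_dev *d)
{
	if (d->reads_until_ready > 0) {
		d->reads_until_ready--;
		return (0);
	}
	return (FAKE_STATUS_READY);
}

/* Stand-in for the kernel's DELAY(); it only accumulates the time. */
static void
fake_delay(struct fake_dev *d, uint32_t us)
{
	d->delayed_us += us;
}

/* Poll up to max_tries times, delaying step_us after each failed read. */
static bool
wait_for_ready(struct fake_dev *d, int max_tries, uint32_t step_us)
{
	for (int i = 0; i < max_tries; i++) {
		if (fake_read_status(d) & FAKE_STATUS_READY)
			return (true);
		fake_delay(d, step_us);
	}
	return (false);
}
```

A bounded loop like this would also explain why adding printf's hides the
problem: each printf acts as an accidental delay.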

Sean

p.s. sorry for the late report as the commit is well over a year old.




Re: bce(4) panics, 9.2rc1 [redux]

2013-07-29 Thread Sean Bruno
On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote: 
 Running 9.2 in production load mail servers.  We're hitting the
 watchdog message and crashing with the stable/9 version.  We're
 reverting the change from 2 weeks ago and seeing if it still happens.
 We didn't see this from stable/9 from about a month ago.
 
 
 Sean

Not seeing any changes to core dumps, or crashes after updating the
bce(4) interface on these Dell R410s.  IPMI was a definite false hope.
No changes noted after I modified the ipmi_attach code.

stable/7 works just fine and stable/9 fails with NMI errors on the
console very badly.  It fails so badly that it won't come into service
at all.  I've reverted stable/9 back to August of 2012 with no change in
behavior.


It sort of looks like r236216 is causing severe issues with my
configuration.  The Dell R410 has a 3rd ethernet interface for the BMC
only, not sure if that is meaningful in this context.

The 3rd interface is *not* visible from the o/s and is dedicated to the
BMC interface.

Doing more testing at this time to validate.

Sean

http://svnweb.freebsd.org/base?view=revision&revision=236216






Re: bce(4) panics, 9.2rc1 [redux]

2013-07-29 Thread Barney Cordoba





 From: Sean Bruno sean_br...@yahoo.com
To: freebsd-net@freebsd.org freebsd-net@freebsd.org 
Sent: Monday, July 29, 2013 8:56 PM
Subject: Re: bce(4) panics, 9.2rc1 [redux]
 

On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote: 
 Running 9.2 in production load mail servers.  We're hitting the
 watchdog message and crashing with the stable/9 version.  We're
 reverting the change from 2 weeks ago and seeing if it still happens.
 We didn't see this from stable/9 from about a month ago.
 
 
 Sean

Not seeing any changes to core dumps, or crashes after updating the
bce(4) interface on these Dell R410s.  IPMI was a definite false hope.
No changes noted after I modified the ipmi_attach code.

stable/7 works just fine and stable/9 fails with NMI errors on the
console very badly.  It fails so badly that it won't come into service
at all.  I've reverted stable/9 back to August of 2012 with no change in
behavior.


It sort of looks like r236216 is causing severe issues with my
configuration.  The Dell R410 has a 3rd ethernet interface for the BMC
only, not sure if that is meaningful in this context.

The 3rd interface is *not* visible from the o/s and is dedicated to the
BMC interface.

Doing more testing at this time to validate.

Sean


--

FWIW, I have an R210 with a BCM5716 running 9.1 RELEASE without
any problems. I have customized the driver a bit. Try turning off the
features and running it raw without any checksum or tso gobbledygook. 


BC


Re: bce(4) panics, 9.2rc1, IPMI related?

2013-07-26 Thread Sean Bruno

  
  bce0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem
  0xda00-0xdbff irq 36 at device 0.0 on pci1
  miibus0: MII bus on bce0
  brgphy0: BCM5709 10/100/1000baseT PHY PHY 1 on miibus0
  brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
  1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bce0: Ethernet address: d4:ae:52:8d:42:fc
  bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
  Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
  Coal (RX:6,6,18,18; TX:20,20,80,80)
  bce1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem
  0xdc00-0xddff irq 48 at device 0.1 on pci1
  miibus1: MII bus on bce1
  brgphy1: BCM5709 10/100/1000baseT PHY PHY 1 on miibus1
  brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
  1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bce1: Ethernet address: d4:ae:52:8d:42:fd
  bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
  Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
  Coal (RX:6,6,18,18; TX:20,20,80,80)
  
 
There was no change reverting r253128.  I don't think that this affects
what I'm seeing.  However ... see below

 
 These machines are using IPMI for management (Dell R410) and seem to be
 unable to attach to /dev/ipmi0:
 
 ipmi0: IPMI System Interface port 0xca8,0xcac on acpi0
 ipmi0: KCS mode found at io 0xca8 on acpi
 
 ipmi1: IPMI System Interface on isa0
 device_attach: ipmi1 attach returned 16
 ipmi1: IPMI System Interface on isa0
 device_attach: ipmi1 attach returned 16
 ...
 ipmi0: Timed out waiting for GET_DEVICE_ID
 
 
 Sean
 
tl;dr need a review of http://people.freebsd.org/~sbruno/ipmi_fixes.txt

I don't understand why, but the ipmi_isa.c attach isn't seeing that
ipmi_acpi.c is attached at all.  Moreover, it takes slightly more than 3
seconds for the BMC on a Dell R410 to respond to GET_DEVICE_ID while
using the SOL console at 9600.  :-(

So, I've adjusted ipmivars.h::MAX_TIMEOUT to be (6 * hz) to properly
attach to ipmi0.
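
The arithmetic behind that change: a BMC that answers in slightly more than
3 seconds misses a 3-second deadline at any tick rate, but fits within
(6 * hz).  A minimal sketch of the comparison follows; the function name,
the 3200 ms response time and the 3-second comparison value are
illustrative assumptions, not values taken from ipmivars.h.

```c
/*
 * Return nonzero if a response arriving after response_ms milliseconds
 * lands within a timeout of timeout_secs seconds, both expressed in
 * ticks at a tick rate of hz ticks per second.
 */
static int
response_arrives_in_time(int hz, int timeout_secs, int response_ms)
{
	long timeout_ticks = (long)timeout_secs * hz;
	/* Round the response time up to the next whole tick. */
	long response_ticks = ((long)response_ms * hz + 999) / 1000;

	return (response_ticks <= timeout_ticks);
}
```

With hz = 1000, a 3200 ms response fails against a 3-second timeout and
succeeds against a 6-second one; the same holds at hz = 100.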

I've also done something gross, but I can't really see any way around
it.  I've added detection to ipmi_isa.c to see if the ACPI IPMI
interface is enabled/disabled via acpi_disabled, which means I have to
include ACPI header files in ipmi_isa.c ... amusing, but it seems to
work.  With the attached patch I've been able to return the IPMI
controller to the same behavior that it appears to have in stable/7.
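
A rough sketch of the idea, not the patch itself (see ipmi_fixes.txt for
that): the ISA front end declines to probe unless the ACPI attachment has
been disabled.  The mock below replaces the kernel's acpi_disabled() with a
plain flag so the logic can be exercised in userland; the flag, the mock
and the sketch_ function name are assumptions.

```c
#include <errno.h>

/* Hypothetical flag standing in for "ipmi" being listed as disabled. */
static int mock_acpi_ipmi_disabled = 0;

/* Stand-in for the kernel's acpi_disabled("ipmi") check. */
static int
mock_acpi_disabled(void)
{
	return (mock_acpi_ipmi_disabled);
}

/*
 * ISA probe sketch: if the ACPI attachment is still enabled, leave the
 * controller to it and refuse to probe, so only one front end talks to
 * the BMC.
 */
static int
sketch_ipmi_isa_probe(void)
{
	if (!mock_acpi_disabled())
		return (ENXIO);
	return (0);
}
```

With the flag clear the ISA probe returns ENXIO and only the ACPI
attachment proceeds; with it set the ISA probe is allowed to continue.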

---
ipmi0: <IPMI System Interface> port 0xca8,0xcac on acpi0
ipmi0: KCS mode found at io 0xca8 on acpi
ipmi0: IPMI device rev. 0, firmware rev. 1.90, version 2.0
ipmi0: Number of channels 5
ipmi0: Attached watchdog
---

It looks like our implementation of IPMI somehow tries to attach TWICE
to the IPMI controller, once via ACPI and once via ISA.  This is really
confusing the hell out of the Broadcom management firmware even though
the second attachment fails.

Sean




Re: bce(4) panics, 9.2rc1

2013-07-25 Thread Sean Bruno
On Wed, 2013-07-24 at 14:23 -0700, Sean Bruno wrote:
 On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
  Running 9.2 in production load mail servers.  We're hitting the
  watchdog message and crashing with the stable/9 version.  We're
  reverting the change from 2 weeks ago and seeing if it still happens.
  We didn't see this from stable/9 from about a month ago.
  
  
  Sean
  
  ref: 
  http://svnweb.freebsd.org/base?view=revision&revision=253128
 
 These panics are happening on:
 
 bce0: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem
 0xda00-0xdbff irq 36 at device 0.0 on pci1
 miibus0: MII bus on bce0
 brgphy0: BCM5709 10/100/1000baseT PHY PHY 1 on miibus0
 brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
 bce0: Ethernet address: d4:ae:52:8d:42:fc
 bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
 Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
 Coal (RX:6,6,18,18; TX:20,20,80,80)
 bce1: Broadcom NetXtreme II BCM5716 1000Base-T (C0) mem
 0xdc00-0xddff irq 48 at device 0.1 on pci1
 miibus1: MII bus on bce1
 brgphy1: BCM5709 10/100/1000baseT PHY PHY 1 on miibus1
 brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
 bce1: Ethernet address: d4:ae:52:8d:42:fd
 bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
 Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
 Coal (RX:6,6,18,18; TX:20,20,80,80)
 


These machines are using IPMI for management (Dell R410) and seem to be
unable to attach to /dev/ipmi0:

ipmi0: <IPMI System Interface> port 0xca8,0xcac on acpi0
ipmi0: KCS mode found at io 0xca8 on acpi

ipmi1: <IPMI System Interface> on isa0
device_attach: ipmi1 attach returned 16
ipmi1: <IPMI System Interface> on isa0
device_attach: ipmi1 attach returned 16
...
ipmi0: Timed out waiting for GET_DEVICE_ID


Sean





Re: bce(4) panics, 9.2rc1

2013-07-24 Thread Sean Bruno
On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
 Running 9.2 in production load mail servers.  We're hitting the
 watchdog message and crashing with the stable/9 version.  We're
 reverting the change from 2 weeks ago and seeing if it still happens.
 We didn't see this from stable/9 from about a month ago.
 
 
 Sean
 
 ref: 
 http://svnweb.freebsd.org/base?view=revision&revision=253128

These panics are happening on:

bce0: <Broadcom NetXtreme II BCM5716 1000Base-T (C0)> mem
0xda00-0xdbff irq 36 at device 0.0 on pci1
miibus0: <MII bus> on bce0
brgphy0: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce0: Ethernet address: d4:ae:52:8d:42:fc
bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
Coal (RX:6,6,18,18; TX:20,20,80,80)
bce1: <Broadcom NetXtreme II BCM5716 1000Base-T (C0)> mem
0xdc00-0xddff irq 48 at device 0.1 on pci1
miibus1: <MII bus> on bce1
brgphy1: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce1: Ethernet address: d4:ae:52:8d:42:fd
bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3);
Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.11)
Coal (RX:6,6,18,18; TX:20,20,80,80)





Re: bce(4) panics, 9.2rc1

2013-07-24 Thread hiren panchasara
On Wed, Jul 24, 2013 at 2:23 PM, Sean Bruno sean_br...@yahoo.com wrote:
 On Wed, 2013-07-24 at 14:07 -0700, Sean Bruno wrote:
 Running 9.2 in production load mail servers.  We're hitting the
 watchdog message and crashing with the stable/9 version.  We're
 reverting the change from 2 weeks ago and seeing if it still happens.
 We didn't see this from stable/9 from about a month ago.


pciconf -lvb: http://people.freebsd.org/~hiren/pciconf.txt

Thanks,
Hiren


Re: bce(4) panics, 9.2rc1

2013-07-24 Thread Steven Hartland
- Original Message - 
From: Sean Bruno sean_br...@yahoo.com

As a guess, it's likely the interrupt handler is triggering
while the watchdog timeout handler is re-initialising the
card, so you get inconsistent state, resulting in the crash.

Information from /var/crash should help determine the cause and
confirm or deny that.
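
If that guess is right, the usual remedy is to have both paths take the
same lock and to have the interrupt path check a "running" flag under it.
A minimal userland sketch of that pattern follows, using a pthread mutex in
place of a kernel mutex.  The structure, field and function names are all
hypothetical and this is not the bce(4) code.

```c
#include <pthread.h>

/* Hypothetical softc: one lock guards all state shared by both paths. */
struct sketch_softc {
	pthread_mutex_t lock;
	int running;		/* nonzero while the card is initialised */
	int reinit_count;	/* number of watchdog re-initialisations */
	int handled;		/* interrupts actually serviced */
};

/* Interrupt path: service the card only if it is fully initialised. */
static int
sketch_intr(struct sketch_softc *sc)
{
	int handled = 0;

	pthread_mutex_lock(&sc->lock);
	if (sc->running) {
		sc->handled++;
		handled = 1;
	}
	pthread_mutex_unlock(&sc->lock);
	return (handled);
}

/* Watchdog path: the whole stop/re-init sequence runs under the lock. */
static void
sketch_watchdog(struct sketch_softc *sc)
{
	pthread_mutex_lock(&sc->lock);
	sc->running = 0;	/* card is being torn down */
	sc->reinit_count++;	/* stand-in for the real re-init work */
	sc->running = 1;	/* card is consistent again */
	pthread_mutex_unlock(&sc->lock);
}
```

Because the interrupt path can only observe the state before or after the
re-initialisation, never in the middle of it, it never acts on a
half-initialised card.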

   Regards
   Steve

