Re: em interrupt storm

2006-11-14 Thread Bill Moran
In response to Jack Vogel [EMAIL PROTECTED]:

 On 11/13/06, Bill Moran [EMAIL PROTECTED] wrote:
 
  Just experienced an interrupt storm on an em device that disabled a
  server until I could reboot it.
 
  My initial research turned up this thread:
  http://lists.freebsd.org/pipermail/freebsd-current/2005-November/058336.html
 
  Which seems related, even if it is a little old.  I'm aware that there
  have been problems with recent versions of the em driver but I haven't
  been following them closely enough, and there's a LOT of mail traffic
  on this topic.
 
  Note that this is a FreeBSD 5.3-RELEASE-p37 system.  An upgrade is
  possible, but this is a production system and the problem occurs
  infrequently, so I'm reluctant to schedule downtime unless I have good
  reason to believe that it will fix the problem.
 
  Anyone remember if the above problem was fixed in more recent versions,
  or knows enough about the issue to comment on whether I'm barking up
  the correct tree or not?  I'm pretty early in the diagnosis on this,
  but I'm looking for pointers to keep me from doing random upgrades or
  other time-wasting activities.
 
 There were fixes to things that MIGHT be a cause of an interrupt storm
 in the em driver, but without knowing more about your specific hardware
 and the event when it happened it's hard to pontificate :)

Hell, it's hard just to spell pontificate! ;)

 As an aside, I would think getting off 5.3 would be desirable in and of
 itself :)

It's on the TODO list, along with 1000 other things.  If I can prove
that 5.5 will fix a stability problem, that will move up the TODO
list in priority.

 Can you give a vmstat -i, a pciconf -l, and maybe messages when the
 storm occurred?

Sorry ... should have done that in the first email, but yesterday
afternoon was a bit flustering ...

There was a console message to the tune of "interrupt storm on em0" ...
I didn't bother to copy it down exactly, because I expected it would
end up in /var/log/messages, but it didn't.

vmstat -i
interrupt          total   rate
irq4: sio0             3      0
irq8: rtc        8027965    127
irq13: npx0            1      0
irq14: ata0           58      0
irq16: uhci0      286184      4
irq18: uhci2      221609      3
irq19: uhci1           3      0
irq23: atapci0        94      0
irq34: mpt0       221609      3
irq64: em0        286144      4
irq0: clk        6272009     99
Total           15315679    244
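
[Editorial sketch, not part of the original message.] The rate column in vmstat -i is an average since boot, so a listing taken after the reboot, like the one above, cannot show the storm itself. For catching one while it is happening, a small script along these lines could flag sources whose rate exceeds a cutoff. It is only a sketch: the function names, the sample subset, and the threshold of 100 are illustrative assumptions, not anything from the em driver or from FreeBSD.

```python
import re

# A few lines in the shape of `vmstat -i` output (name, total, rate),
# taken from the listing in this thread.
SAMPLE = """\
interrupt          total   rate
irq8: rtc        8027965    127
irq64: em0        286144      4
irq0: clk        6272009     99
Total           15315679    244
"""

# Source name (may contain spaces), then two integer columns.
LINE = re.compile(r"^(?P<name>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return {source: (total, rate)}, skipping the header and the Total line."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # header or blank line
        name = match.group("name").strip()
        if name == "Total":
            continue
        rows[name] = (int(match.group("total")), int(match.group("rate")))
    return rows


def over_threshold(rows, threshold):
    """Sorted names of sources whose average rate exceeds the (hypothetical) cutoff."""
    return sorted(name for name, (_, rate) in rows.items() if rate > threshold)


rows = parse_vmstat_i(SAMPLE)
print(over_threshold(rows, 100))
```

With the sample above only the rtc line exceeds 100; em0 sits at 4 per second, which is consistent with the listing having been captured after the reboot rather than during the storm.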

pciconf -l
[EMAIL PROTECTED]:0:0: class=0x06 card=0x016c1028 chip=0x35908086 rev=0x09 hdr=0x00
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0050 chip=0x35958086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:4:0: class=0x060400 card=0x0050 chip=0x35978086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:5:0: class=0x060400 card=0x0050 chip=0x35988086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:6:0: class=0x060400 card=0x0050 chip=0x35998086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:29:0: class=0x0c0300 card=0x016c1028 chip=0x24d28086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:1: class=0x0c0300 card=0x016c1028 chip=0x24d48086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:2: class=0x0c0300 card=0x016c1028 chip=0x24d78086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:7: class=0x0c0320 card=0x016c1028 chip=0x24dd8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:30:0: class=0x060400 card=0x chip=0x244e8086 rev=0xc2 hdr=0x01
[EMAIL PROTECTED]:31:0: class=0x060100 card=0x chip=0x24d08086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:31:1: class=0x01018a card=0x016c1028 chip=0x24db8086 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:0:0: class=0x060400 card=0x0044 chip=0x03298086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:0:2: class=0x060400 card=0x0044 chip=0x032a8086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:5:0: class=0x01 card=0x016c1028 chip=0x00301000 rev=0x08 hdr=0x00
[EMAIL PROTECTED]:0:0: class=0x060400 card=0x0044 chip=0x03298086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:0:2: class=0x060400 card=0x0044 chip=0x032a8086 rev=0x09 hdr=0x01
[EMAIL PROTECTED]:7:0: class=0x02 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
[EMAIL PROTECTED]:8:0: class=0x02 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
[EMAIL PROTECTED]:5:0: class=0xff card=0x00111028 chip=0x00111028 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:5:1: class=0xff card=0x00121028 chip=0x00121028 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:5:2: class=0xff card=0x00141028 chip=0x00141028 rev=0x00 hdr=0x00
[EMAIL PROTECTED]:6:0: class=0x010185 card=0x06801095 chip=0x06801095 rev=0x02 hdr=0x00
[EMAIL PROTECTED]:13:0: class=0x03 card=0x016c1028 chip=0x51591002 rev=0x00 hdr=0x00
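
[Editorial sketch, not part of the original message.] The device names in the listing above were lost to address scrubbing in the archive, but the chip= field still identifies the hardware: in this pciconf -l format the low 16 bits are the PCI vendor ID and the high 16 bits are the device ID. A minimal Python sketch of that split follows. The function names and the deliberately tiny vendor table are illustrative assumptions, and the sketch does not try to map device IDs to model names.

```python
# Small, deliberately incomplete vendor table for illustration only.
VENDORS = {
    0x8086: "Intel",
    0x1028: "Dell",
    0x1000: "LSI Logic / Symbios",
    0x1002: "ATI",
    0x1095: "Silicon Image",
}


def split_chip(chip):
    """Split a pciconf `chip=0xDDDDVVVV` value into (vendor_id, device_id)."""
    value = int(chip, 16)
    vendor_id = value & 0xFFFF          # low 16 bits: vendor
    device_id = (value >> 16) & 0xFFFF  # high 16 bits: device
    return vendor_id, device_id


def describe(chip):
    """Human-readable summary of a chip= value using the table above."""
    vendor_id, device_id = split_chip(chip)
    vendor = VENDORS.get(vendor_id, "unknown vendor")
    return f"{vendor} (vendor 0x{vendor_id:04x}), device 0x{device_id:04x}"


# The two class=0x02 entries in the listing share this chip value.
print(describe("0x10768086"))
```

For the two entries with chip=0x10768086 this yields Intel, device 0x1076, which is presumably the pair of em interfaces; the exact controller model would still need to be looked up in a PCI ID database.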


-- 
Bill Moran
Collaborative Fusion Inc.

[EMAIL PROTECTED]
Phone: 

em interrupt storm

2006-11-13 Thread Bill Moran

Just experienced an interrupt storm on an em device that disabled a
server until I could reboot it.

My initial research turned up this thread:
http://lists.freebsd.org/pipermail/freebsd-current/2005-November/058336.html

Which seems related, even if it is a little old.  I'm aware that there
have been problems with recent versions of the em driver but I haven't
been following them closely enough, and there's a LOT of mail traffic
on this topic.

Note that this is a FreeBSD 5.3-RELEASE-p37 system.  An upgrade is
possible, but this is a production system and the problem occurs
infrequently, so I'm reluctant to schedule downtime unless I have good
reason to believe that it will fix the problem.

Anyone remember if the above problem was fixed in more recent versions,
or knows enough about the issue to comment on whether I'm barking up
the correct tree or not?  I'm pretty early in the diagnosis on this,
but I'm looking for pointers to keep me from doing random upgrades or
other time-wasting activities.

-- 
Bill Moran
Collaborative Fusion Inc.

[EMAIL PROTECTED]
Phone: 412-422-3463x4023


IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: em interrupt storm

2006-11-13 Thread Jack Vogel

On 11/13/06, Bill Moran [EMAIL PROTECTED] wrote:


Just experienced an interrupt storm on an em device that disabled a
server until I could reboot it.

My initial research turned up this thread:
http://lists.freebsd.org/pipermail/freebsd-current/2005-November/058336.html

Which seems related, even if it is a little old.  I'm aware that there
have been problems with recent versions of the em driver but I haven't
been following them closely enough, and there's a LOT of mail traffic
on this topic.

Note that this is a FreeBSD 5.3-RELEASE-p37 system.  An upgrade is
possible, but this is a production system and the problem occurs
infrequently, so I'm reluctant to schedule downtime unless I have good
reason to believe that it will fix the problem.

Anyone remember if the above problem was fixed in more recent versions,
or knows enough about the issue to comment on whether I'm barking up
the correct tree or not?  I'm pretty early in the diagnosis on this,
but I'm looking for pointers to keep me from doing random upgrades or
other time-wasting activities.


There were fixes to things that MIGHT be a cause of an interrupt storm
in the em driver, but without knowing more about your specific hardware
and the event when it happened it's hard to pontificate :)

As an aside, I would think getting off 5.3 would be desirable in and of
itself :)

Can you give a vmstat -i, a pciconf -l, and maybe messages when the
storm occurred?

Thanks Bill,

Jack