Hi all,
Yesterday a Cisco 7507 started acting funny in the middle of the night
(when else?) and rebooted itself a few hours later. As this router has never
given us any problems before, we were a bit concerned. The log files and
some show commands pointed to what appeared to be a processor memory parity
error. Not knowing what to make of this, I opened a case with the TAC and,
after sending them the show tech-support output and the log file, eventually
received the reply shown below.
My question is:
If the event represents a faulty memory module, is there any way to monitor
for such a problem beginning to occur again? Something in CiscoWorks,
perhaps, or some process that can be monitored? As this problem appeared to
develop over an entire night, I was thinking we might be able to see it
coming. If it truly was a random event then we shouldn't have to worry about
it. Thx for any suggestions.
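One thing I was considering, as a rough sketch only: a small Python script
that scans saved syslog output for parity-related lines, so that a repeat
shows up early. The function name and the pattern are my own assumptions. In
particular, I am assuming the relevant log lines contain the phrase "parity
error", which may not match what the router actually writes.

```python
import re

# Assumption: parity-related log lines contain the phrase "parity error".
# Adjust the pattern to whatever your router actually logs.
PARITY_PATTERN = re.compile(r"parity\s+error", re.IGNORECASE)


def find_parity_lines(lines):
    """Return the log lines that mention a parity error (hypothetical helper)."""
    return [line.rstrip("\n") for line in lines if PARITY_PATTERN.search(line)]
```

Something like this could be run from cron against the syslog server's file
(open the file and pass the file object to find_parity_lines), mailing the
result whenever the list is not empty.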
****************************************************************************
Reply from TAC:
"The router crashed due to a processor memory parity error (PMPE). In a
router that uses parity checking, each byte of data stored in memory has an
associated parity bit.
If the sum of the bits violates the even (or odd) parity rule, the basic
input-output system halts the router with a message like "processor memory
parity error". While having the router halted is certainly undesirable and
at the very least inconvenient, it is good news relative to some of the
imaginable alternatives had the error gone undetected.
This could be due to several things:
* Faulty memory
* Transient memory error
Faulty memory would cause more than one crash, so if the router has crashed
several times with this error, faulty memory is most likely the cause of
your problem. However, if the parity crash happens no more than once per
month, then the problem is most likely an electrical transient error caused
by alpha particles, which are naturally occurring high-energy particles that
can strike parts of the silicon in DRAM and transfer energy to them.
This can cause a bit to change, hence a parity error. This is not a
hardware failure.
Because the cause is very hard to pin down, my recommendation is to monitor
the router for further issues. If the crash occurs again, or has happened
several times in the past, then the memory in the router (on the RSP) needs
to be replaced."
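To make that rule of thumb concrete for myself, here is a minimal sketch in
Python. It is only my reading of the reply, not anything TAC supplied: I am
assuming "month" means a 30-day window, and the function name and the result
labels are made up for illustration.

```python
from datetime import timedelta


def classify_parity_crashes(crash_times, window=timedelta(days=30)):
    """Apply the once-per-month rule of thumb to a list of crash datetimes.

    Assumption: two parity crashes within `window` of each other count as
    "more than once per month".
    """
    times = sorted(crash_times)
    if not times:
        return "no crashes recorded"
    # After sorting, checking neighbours is enough to find any close pair.
    for earlier, later in zip(times, times[1:]):
        if later - earlier <= window:
            return "suspect faulty memory"
    return "likely transient"
```

The crash times would have to come from somewhere like the saved logs; a
single crash on record comes back as "likely transient", which matches our
situation so far.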
Transient Parity Errors Described - PMPE from Cosmic Rays - IBM
http://www.research.ibm.com/journal/rd/ziegl/zieglert.html
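For anyone who wants to see what the parity check described in the reply
amounts to, here is a small illustration in Python. This is not how the
router implements it; it is just even parity over a single byte, with helper
names I made up.

```python
def even_parity_bit(byte):
    """Return the bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, parity_bit):
    """True if the byte still agrees with the parity bit stored alongside it."""
    return even_parity_bit(byte) == parity_bit


stored = 0b10110010                 # four 1 bits, so the parity bit is 0
parity = even_parity_bit(stored)

one_flip = stored ^ 0b00000100      # a single bit changes in memory
two_flips = stored ^ 0b00000110     # two bits change in memory

print(parity_ok(stored, parity))     # True: data unchanged
print(parity_ok(one_flip, parity))   # False: the flip is detected
print(parity_ok(two_flips, parity))  # True: an even number of flips goes unnoticed
```

A single parity bit can only detect the error, not say which bit changed,
which would explain why the router halts rather than correcting the data.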
****************************************************************************
_________________________________
FAQ, list archives, and subscription info: http://www.groupstudy.com/list/cisco.html