Cosmic rays?? [7:71402]

2003-06-25 Thread Juan Carlos Perez
We have a Cisco VIP card plugged into a 7500 router. Every once in a while
the card just stops working, and sometimes it gets stuck so hard that we
have to reload the microcode. The last time we did that, the router crashed
and had to be reset (ugly!). Well, it gets worse. After we convinced the
guys at the local Cisco office to help us with this issue, they came to our
facilities and began their analysis. To make a long story short, they told
us that these problems were caused by cosmic rays! We almost fainted! Cosmic
rays!
Has anybody around here ever heard of this problem with this combo? Let me
tell you, this router is not installed in a spaceship or anything like that;
it's just an ordinary datacenter.
Any ideas about what the real problem might be?

P. S. The router is using a recent version of IOS (newer than 12.1) and has
been patched as per the Cisco site.

Thanks a lot for any advice on this issue.




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=71402&t=71402
--
FAQ, list archives, and subscription info: http://www.groupstudy.com/list/cisco.html
Report misconduct and Nondisclosure violations to [EMAIL PROTECTED]


Re: Cosmic rays?? [7:71402]

2003-06-25 Thread annlee
The closest thing to your story that I know of is a comment from Robert and
Barbara Thompson in _PC Hardware In A Nutshell, 2e_: for a server, or any
other PC that needs a lot of RAM, they always use ECC (error-correcting
code) memory because (honest!) cosmic rays do strike.

Quote:
One common cause of "flipped bit" memory errors is, believe it or not,
cosmic rays. The more memory you have installed, the more likely it is that
a random cosmic ray will impact one of the memory cells in a chip on your
system, causing the contents of that cell to flip from binary zero to a one
or vice versa. We don't pretend to understand this issue, but we've been
told by memory experts that for systems with 512MB of RAM, using ECC versus
non-parity memory is about an even trade-off in terms of extra cost and lost
performance versus the likelihood of memory errors. For systems with 768MB+,
we use ECC memory exclusively.
End quote (pp. 201-2).

However ... that could also be a Real Convenient Excuse. Do you have any
kind of other experience with the people who said this (like, are they
naturally FUD-prone)?

Annlee




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=71409&t=71402


Re: Cosmic rays?? [7:71402]

2003-06-25 Thread Zsombor Papp
Well, cosmic rays can cause memory corruption, see for example:

http://www.eetimes.com/news/98/1012news/ibm.html

But if it always hits one particular device out of many at the same 
location, then there might be a more likely explanation, like ... I dunno, 
a bug?!? :)

In any case, replacing the memory chips can't hurt.

And btw, instead of talking to the "guys at the local Cisco office", go and 
open a case with TAC.

Thanks,

Zsombor





Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=71414&t=71402


Re: Cosmic rays?? [7:71402]

2003-06-25 Thread Carroll Kong
Cosmic rays typically cause some bits to flip, hence the big need
for parity memory (and now the more advanced, self-correcting
ECC).
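
To make the parity-versus-ECC difference concrete, here is a minimal
sketch in Python. It is a toy illustration only, not how any Cisco VIP
or real memory module implements it: it uses a textbook Hamming(7,4)
code on four data bits, and every function name below is made up for
the example. A single parity bit can tell you that one bit flipped but
not which one; the Hamming check bits locate the flipped bit so it can
be corrected.

```python
def parity_bit(bits):
    """Even-parity bit for a list of 0/1 values (detects, cannot locate)."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based position of the flipped bit or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    return c, position


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity: store the data plus one parity bit, then flip one bit.
    stored = data + [parity_bit(data)]
    stored[2] ^= 1
    print("parity mismatch detected:", parity_bit(stored) != 0)

    # Hamming: flip one bit of the codeword, then locate and repair it.
    word = hamming74_encode(data)
    word[4] ^= 1
    fixed, position = hamming74_correct(word)
    print("flipped bit found at position:", position)
    print("data recovered:", hamming74_data(fixed) == data)
```

Real ECC memory applies the same idea to much wider words and usually
adds one more check bit so that double-bit errors are at least
detected.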

That said, you are usually okay in an enclosed environment
(supposedly people get quite a few errors with modern memory if they
leave it sitting in sunlight... :) ).  Errors can still occur without
the assistance of "cosmic rays" arbitrarily flipping bits.  (Yes,
cosmic rays are a known, legitimate cause of bit flips, but I doubt
that was your case.)

Sounds more like just a flaky card.  All sorts of things could have
caused the card to start acting flaky.

Of course, these types of failures can be generically sorted into
software or hardware.  Which one is it, though?  Hmmm.  ;)

Unless the hardware has some super diagnostics (ever see some of
those high-end Sun workstations?), you might have to do it the
old-fashioned way: isolate the problem (either software or hardware)
and try to move on from there.

For example, can you get a replacement card?  It could be the card,
the interconnect on the router, or the router itself (motherboard,
memory...), though the first two are more likely.

How long did the card work before the problems started?  On what
version of code?  Was there any increase in load after moving to the
latest version?  Are there any patterns of usage that can repeat the
error?  Can they be cross-referenced to any known bugs?

If you can repeat the error with a certain combination of actions, it 
leans a bit more towards software.

Curious, so if the diagnosis was cosmic rays... what was their 
proposed solution?  I hope it was not... "oh well, crap happens, good 
luck!"

-Carroll Kong




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=71418&t=71402