[j-nsp] Broadcast storm on M7i fxp0 kills the CFEB?

2012-06-22 Thread Phil Mayers

All,

Yesterday, an error caused a loop in our OOB network. This resulted in 
one of our route reflectors failing, badly. Apparently, the broadcast 
storm caused the CFEB to die.


Both 1GE ports went link-down, which is understandable since the CFEB 
actually seems to have rebooted:


admin@ext-m7i-2> show chassis cfeb
CFEB status:
  ...
  Start time:   2012-06-21 14:46:39 BST
  Uptime:   22 hours, 24 minutes, 7 seconds

The box logged all sorts of horrible messages, which suggest the 
internal control connections (via fxp1) somehow hung or died - possibly 
the RE CPU was pegged?


To say that this is disturbing is an understatement; surely there should 
be no conceivable way for traffic on fxp0 to cause the CFEB to crash?


Has anyone else seen this? Does anyone have any ideas why it happened, 
and how I can ensure it cannot happen in future?


This is an RE 5.0 (400MHz) upgraded to 768MB of RAM, running 10.4R8.5.
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] Broadcast storm on M7i fxp0 kills the CFEB?

2012-06-22 Thread Amos Rosenboim
Hello Phil,

I have seen this happen a few times and with different platforms.
A good way to avoid this is to configure policing on the OOB switches ports 
facing the REs.

Regards

Amos



Re: [j-nsp] Broadcast storm on M7i fxp0 kills the CFEB?

2012-06-22 Thread Phil Mayers

On 22/06/12 13:29, Amos Rosenboim wrote:

Hello Phil,

I have seen this happen a few times and with different platforms.
A good way to avoid this is to configure policing on the OOB switches
ports facing the REs.


Unfortunately, our OOB network is constructed from older, repurposed 
equipment. I doubt we have the ability to do the required egress policing.


What kind of policing parameters have you successfully used?


Re: [j-nsp] Broadcast storm on M7i fxp0 kills the CFEB?

2012-06-22 Thread Clarke Morledge

Phil,

Actually, I am not surprised that this happened to you.  The fxp0 
interface is a funny animal.   It isn't really as isolated from the rest 
of the box as you would think.


Since all IP broadcast/multicast traffic on layer 3 interfaces gets sent to 
the RE by default, a loop that starts to pump out tons of broadcasts means 
all of that traffic will start to crush the RE and/or the forwarding 
path to the RE.  It does not matter whether the storm happens on regular 
interfaces or fxp0.


The only way you can mitigate this is with RE protection filters. 
For example, you can implement a policer on fxp0 that handles packet 
bursts on ingress.  But I found it just as easy to enumerate which 
protocols and/or source IPs need access to fxp0 and discard the rest using 
a firewall filter.
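
As a rough sketch of that approach (not a tested or recommended 
configuration), a filter along these lines accepts SSH and SNMP from a 
management prefix, polices what it accepts, and discards everything else 
arriving on fxp0. The prefix 192.0.2.0/24, the names mgmt-sources, 
fxp0-burst-limit, protect-fxp0 and fxp0-discards, and the policer rates 
are all made up for illustration; pick your own.

```
policy-options {
    prefix-list mgmt-sources {
        /* hypothetical management subnet */
        192.0.2.0/24;
    }
}
firewall {
    /* illustrative rates only, not tuned values */
    policer fxp0-burst-limit {
        if-exceeding {
            bandwidth-limit 1m;
            burst-size-limit 64k;
        }
        then discard;
    }
    family inet {
        filter protect-fxp0 {
            term mgmt-ssh {
                from {
                    source-prefix-list {
                        mgmt-sources;
                    }
                    protocol tcp;
                    destination-port ssh;
                }
                then {
                    policer fxp0-burst-limit;
                    accept;
                }
            }
            term mgmt-snmp {
                from {
                    source-prefix-list {
                        mgmt-sources;
                    }
                    protocol udp;
                    destination-port snmp;
                }
                then {
                    policer fxp0-burst-limit;
                    accept;
                }
            }
            /* everything else, including IP broadcast noise, is dropped */
            term discard-rest {
                then {
                    count fxp0-discards;
                    discard;
                }
            }
        }
    }
}
interfaces {
    fxp0 {
        unit 0 {
            family inet {
                filter {
                    input protect-fxp0;
                }
            }
        }
    }
}
```

Two caveats. A family inet filter only sees IP packets, so an ARP storm 
is not matched by it. And as far as I know, a filter on fxp0 is evaluated 
in software on the RE itself, so it limits what reaches the daemons but 
the RE still has to receive each frame. You would also need extra terms 
for return traffic of anything the router itself initiates over fxp0 
(NTP, RADIUS, and so on), or the final term will drop it.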


I learned the hard way :-)

You can follow this thread to find out what I went through:

http://www.gossamer-threads.com/lists/nsp/juniper/31311

My experience has been with the MX, but I am pretty sure the same applies 
to the M7i.


Clarke Morledge
College of William and Mary
Information Technology - Network Engineering
Jones Hall (Room 18)
Williamsburg VA 23187


Re: [j-nsp] Broadcast storm on M7i fxp0 kills the CFEB?

2012-06-22 Thread joel jaeggli

On 6/22/12 6:28 AM, Phil Mayers wrote:

What kind of policing parameters have you successfully used?


The ARP policer is the one that normally kicks in. There are global 
defaults which, IIRC, vary by platform. The trick is to have per-interface 
limits which are lower than the global limits, so that the policer 
renders that one interface unusable without rendering all ARP learning on 
the box dead at once.
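
A minimal sketch of a per-interface ARP policer, assuming a revenue 
interface ge-0/0/0 and a made-up policer name per-if-arp. The rates are 
placeholders; they need to sit below whatever the default ARP policer on 
your platform allows. Whether the same statement is accepted on fxp0 for 
a given platform and release is something to verify rather than assume.

```
firewall {
    /* placeholder rates; set these below the platform default ARP policer */
    policer per-if-arp {
        if-exceeding {
            bandwidth-limit 100k;
            burst-size-limit 15k;
        }
        then discard;
    }
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                policer {
                    arp per-if-arp;
                }
            }
        }
    }
}
```

With this in place, an ARP flood on that one interface should exhaust 
its own policer first, instead of eating the shared budget that every 
other interface depends on.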
