Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-22 Thread Phil Mayers

On 09/22/2010 08:53 AM, Phil Mayers wrote:

On 09/22/2010 08:31 AM, Phil Mayers wrote:

On 09/22/2010 01:00 AM, Peter Rathlev wrote:

On Tue, 2010-09-21 at 22:12 +0100, Phil Mayers wrote:

2. Use CoPP for everything else; DO NOT use the glean or cef receive
limiter


I'm confused here: Why not use the glean limiter? As I understand it you
can simplify the CoPP maps a lot. (Non-IP traffic like IS-IS still being
a special case of course.)

What am I missing? Is it because you want to catch special cases of
gleaning earlier in the CoPP and differentiate rates?



IIRC the glean limiter is faulty in some fashion on PFC-3B and earlier.
I can never remember the precise details, but I think it's been
discussed on the list in the past, and I've had knowledgeable TAC
engineers tell me "really, don't use it".


Ah yes: egress ACL craziness:

http://puck.nether.net/pipermail/cisco-nsp/2007-February/038465.html


For the curious, see also this post and surrounding ones in the most 
recent CoPP uber-thread, a cisco-nsp bi-annual event ;o)


http://www.mail-archive.com/cisco-nsp@puck.nether.net/msg29506.html


Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-22 Thread Phil Mayers

On 09/22/2010 08:31 AM, Phil Mayers wrote:

On 09/22/2010 01:00 AM, Peter Rathlev wrote:

On Tue, 2010-09-21 at 22:12 +0100, Phil Mayers wrote:

2. Use CoPP for everything else; DO NOT use the glean or cef receive
limiter


I'm confused here: Why not use the glean limiter? As I understand it you
can simplify the CoPP maps a lot. (Non-IP traffic like IS-IS still being
a special case of course.)

What am I missing? Is it because you want to catch special cases of
gleaning earlier in the CoPP and differentiate rates?



IIRC the glean limiter is faulty in some fashion on PFC-3B and earlier.
I can never remember the precise details, but I think it's been
discussed on the list in the past, and I've had knowledgeable TAC
engineers tell me "really, don't use it".


Ah yes: egress ACL craziness:

http://puck.nether.net/pipermail/cisco-nsp/2007-February/038465.html


Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-22 Thread Phil Mayers

On 09/22/2010 01:00 AM, Peter Rathlev wrote:

On Tue, 2010-09-21 at 22:12 +0100, Phil Mayers wrote:

2. Use CoPP for everything else; DO NOT use the glean or cef receive
limiter


I'm confused here: Why not use the glean limiter? As I understand it you
can simplify the CoPP maps a lot. (Non-IP traffic like IS-IS still being
a special case of course.)

What am I missing? Is it because you want to catch special cases of
gleaning earlier in the CoPP and differentiate rates?



IIRC the glean limiter is faulty in some fashion on PFC-3B and earlier. 
I can never remember the precise details, but I think it's been 
discussed on the list in the past, and I've had knowledgeable TAC 
engineers tell me "really, don't use it".



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Peter Rathlev
On Tue, 2010-09-21 at 22:12 +0100, Phil Mayers wrote:
> 2. Use CoPP for everything else; DO NOT use the glean or cef receive 
> limiter

I'm confused here: Why not use the glean limiter? As I understand it you
can simplify the CoPP maps a lot. (Non-IP traffic like IS-IS still being
a special case of course.)

What am I missing? Is it because you want to catch special cases of
gleaning earlier in the CoPP and differentiate rates?

-- 
Peter




Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Benjamin Lovell
John,

If you are a first-hop mcast router then "connected" could matter to you, but as a 
transit box the two you would care about most are fib-miss (i.e. no hardware 
entry found, so punt) and non-rpf (the packet came in on the wrong interface). Setting 
these two to a low number (hundreds of pps) should give the CPU some protection 
for the short period needed to reprogram the hardware when you change the 
replication mode.
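
Something along these lines should do it. The values here are just placeholders 
to show the shape of the command (pps first, then burst), so sanity-check the 
syntax against your code and note the existing settings before the window:

mls rate-limit multicast ipv4 fib-miss 100 10
mls rate-limit multicast ipv4 non-rpf 100 10

Put the original values back once the hardware has finished reprogramming.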

-Ben


On Sep 21, 2010, at 5:19 PM, John Neiberger wrote:

> I see the "mls rate-limit multicast ipv4 connected" command, for
> directly connected sources, but is there a command that would apply to
> traffic going through box?
> 
> We have very little unicast traffic but a whole bunch of multicast
> traffic, some of which is directly connected but much of it is simply
> passing through.
> 
> Thanks again for all your help, I appreciate your time.
> 
> On Tue, Sep 21, 2010 at 3:12 PM, Phil Mayers  wrote:
>> On 09/21/2010 09:58 PM, John Neiberger wrote:
>>> 
>>> Sorry.  I also meant to say it's a Sup 720-3BXL. Based on what I can
>>> see on CCO, that thing can forward 400 Mpps of ipv4 traffic. Does that
>>> mean that I can set a rate limit of, say, 300 Mpps and somewhat guard
>>> the CPU from meltdown for a few moments?
>> 
>> I wish! 300Mpps hitting the CPU of a sup720 will kill it stone dead. It has
>> (two) 600MHz CPUs, and they will not survive such a load.
>> 
>> There's lots of info in this in the archives, but in brief: the sup720/pfc3
>> architecture forwards most packets in hardware. Some packets are however
>> punted to CPU; these include:
>> 
>>  1. Packets which need ARP resolution ("glean")
>> 
>>  2. Multicast packets, which are trickled to the CPU so the CPU can see them
>> and build (and refresh) hardware forwarding state
>> 
>>  3. ACL and uRPF denies, which are trickled to the CPU so it can maintain
>> counters
>> 
>>  4. Various other traffic like TTL failures, needing ICMP
>> 
>>  5. Obviously, packets address to the CPU (routing traffic, layer 2 PDUs)
>> 
>> Because the CPUs on these boxes are very, very puny, you want to limit what
>> hits the CPU. There are two methods available:
>> 
>>  1. The "mls rate-limit" commands; these will place a simple numeric rate
>> cap on certain types of traffic, and is done in hardware. There's no
>> prioritisation, but for certain types of traffic (e.g. TTL failures) you can
>> and should IMHO set low-ish limits. You SHOULD NOT use the "general" CEF
>> limiter; because you should use...
>> 
>>  2. CoPP - basically QoS on traffic punted to CPU. This is superior because
>> you can write ACLs defining what is most important to you, with very
>> granular control over policy. It suffers a couple of problems -
>> broadcast/multicast and non-IP traffic are done in software, and it can't
>> distinguish between glean and receive traffic, making a default-deny policy
>> tricky.
>> 
>> 
>> In short, common advice seems to be:
>> 
>>  1. Set low limits on the mls limiters for TTL & MTU failure, and optionally
>> ACL-drop ICMP unreach:
>> 
>> mls rate-limit unicast ip icmp unreachable acl-drop 0
>> mls rate-limit all ttl-failure 100 10
>> mls rate-limit all mtu-failure 100 10
>> 
>>  2. Use CoPP for everything else; DO NOT use the glean or cef receive
>> limiter
>> 
>> Search the archives and the Cisco docs for more info; it's not something I
>> can summarise in 5 minutes I'm afraid.
>> 
>> In your case, if you are going to perform a task which will potentially punt
>> a lot of multicast traffic to the CPU, I was suggesting that there are MLS
>> limiters which will reduce this; see my earlier email; though we run with
>> the defaults, which are (quite) high PPS values!
>> 
>> sh mls rate-limit  | inc MC
>> MCAST NON RPF   Off  -   - -
>>MCAST DFLT ADJ   On  10 100  Not sharing
>>  MCAST DIRECT CON   Off  -   - -
>>  MCAST PARTIAL SC   On  10 100  Not sharing
>> 




Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
I see the "mls rate-limit multicast ipv4 connected" command, for
directly connected sources, but is there a command that would apply to
traffic going through the box?

We have very little unicast traffic but a whole bunch of multicast
traffic, some of which is directly connected but much of it is simply
passing through.

Thanks again for all your help, I appreciate your time.

On Tue, Sep 21, 2010 at 3:12 PM, Phil Mayers  wrote:
> On 09/21/2010 09:58 PM, John Neiberger wrote:
>>
>> Sorry.  I also meant to say it's a Sup 720-3BXL. Based on what I can
>> see on CCO, that thing can forward 400 Mpps of ipv4 traffic. Does that
>> mean that I can set a rate limit of, say, 300 Mpps and somewhat guard
>> the CPU from meltdown for a few moments?
>
> I wish! 300Mpps hitting the CPU of a sup720 will kill it stone dead. It has
> (two) 600MHz CPUs, and they will not survive such a load.
>
> There's lots of info in this in the archives, but in brief: the sup720/pfc3
> architecture forwards most packets in hardware. Some packets are however
> punted to CPU; these include:
>
>  1. Packets which need ARP resolution ("glean")
>
>  2. Multicast packets, which are trickled to the CPU so the CPU can see them
> and build (and refresh) hardware forwarding state
>
>  3. ACL and uRPF denies, which are trickled to the CPU so it can maintain
> counters
>
>  4. Various other traffic like TTL failures, needing ICMP
>
>  5. Obviously, packets address to the CPU (routing traffic, layer 2 PDUs)
>
> Because the CPUs on these boxes are very, very puny, you want to limit what
> hits the CPU. There are two methods available:
>
>  1. The "mls rate-limit" commands; these will place a simple numeric rate
> cap on certain types of traffic, and is done in hardware. There's no
> prioritisation, but for certain types of traffic (e.g. TTL failures) you can
> and should IMHO set low-ish limits. You SHOULD NOT use the "general" CEF
> limiter; because you should use...
>
>  2. CoPP - basically QoS on traffic punted to CPU. This is superior because
> you can write ACLs defining what is most important to you, with very
> granular control over policy. It suffers a couple of problems -
> broadcast/multicast and non-IP traffic are done in software, and it can't
> distinguish between glean and receive traffic, making a default-deny policy
> tricky.
>
>
> In short, common advice seems to be:
>
>  1. Set low limits on the mls limiters for TTL & MTU failure, and optionally
> ACL-drop ICMP unreach:
>
> mls rate-limit unicast ip icmp unreachable acl-drop 0
> mls rate-limit all ttl-failure 100 10
> mls rate-limit all mtu-failure 100 10
>
>  2. Use CoPP for everything else; DO NOT use the glean or cef receive
> limiter
>
> Search the archives and the Cisco docs for more info; it's not something I
> can summarise in 5 minutes I'm afraid.
>
> In your case, if you are going to perform a task which will potentially punt
> a lot of multicast traffic to the CPU, I was suggesting that there are MLS
> limiters which will reduce this; see my earlier email; though we run with
> the defaults, which are (quite) high PPS values!
>
> sh mls rate-limit  | inc MC
>         MCAST NON RPF   Off                  -       -     -
>        MCAST DFLT ADJ   On              10     100  Not sharing
>      MCAST DIRECT CON   Off                  -       -     -
>      MCAST PARTIAL SC   On              10     100  Not sharing
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Phil Mayers

On 09/21/2010 09:58 PM, John Neiberger wrote:

Sorry.  I also meant to say it's a Sup 720-3BXL. Based on what I can
see on CCO, that thing can forward 400 Mpps of ipv4 traffic. Does that
mean that I can set a rate limit of, say, 300 Mpps and somewhat guard
the CPU from meltdown for a few moments?


I wish! 300Mpps hitting the CPU of a sup720 will kill it stone dead. It 
has (two) 600MHz CPUs, and they will not survive such a load.


There's lots of info on this in the archives, but in brief: the 
sup720/pfc3 architecture forwards most packets in hardware. Some packets 
are, however, punted to the CPU; these include:


 1. Packets which need ARP resolution ("glean")

 2. Multicast packets, which are trickled to the CPU so the CPU can see 
them and build (and refresh) hardware forwarding state


 3. ACL and uRPF denies, which are trickled to the CPU so it can 
maintain counters


 4. Various other traffic like TTL failures, needing ICMP

 5. Obviously, packets addressed to the CPU (routing traffic, layer 2 PDUs)

Because the CPUs on these boxes are very, very puny, you want to limit 
what hits the CPU. There are two methods available:


 1. The "mls rate-limit" commands; these will place a simple numeric 
rate cap on certain types of traffic, and is done in hardware. There's 
no prioritisation, but for certain types of traffic (e.g. TTL failures) 
you can and should IMHO set low-ish limits. You SHOULD NOT use the 
"general" CEF limiter; because you should use...


 2. CoPP - basically QoS on traffic punted to CPU. This is superior 
because you can write ACLs defining what is most important to you, with 
very granular control over policy. It suffers a couple of problems - 
broadcast/multicast and non-IP traffic are done in software, and it 
can't distinguish between glean and receive traffic, making a 
default-deny policy tricky.



In short, common advice seems to be:

 1. Set low limits on the mls limiters for TTL & MTU failure, and 
optionally ACL-drop ICMP unreach:


mls rate-limit unicast ip icmp unreachable acl-drop 0
mls rate-limit all ttl-failure 100 10
mls rate-limit all mtu-failure 100 10

 2. Use CoPP for everything else; DO NOT use the glean or cef receive 
limiter
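
To give a flavour of the shape of a CoPP config, here is a bare-bones sketch. 
The class names and rates are made up for illustration, not a tested policy; a 
real one needs many more classes, and bear in mind the glean/receive caveat 
above before adding a restrictive class-default:

ip access-list extended COPP-ROUTING
 permit tcp any any eq bgp
 permit tcp any eq bgp any
 permit ospf any any
 permit pim any any
!
class-map match-all COPP-ROUTING
 match access-group name COPP-ROUTING
!
policy-map COPP
 class COPP-ROUTING
  police 1000000 conform-action transmit exceed-action drop
 class class-default
  police 500000 conform-action transmit exceed-action drop
!
control-plane
 service-policy input COPP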


Search the archives and the Cisco docs for more info; it's not something 
I can summarise in 5 minutes I'm afraid.


In your case, if you are going to perform a task which will potentially 
punt a lot of multicast traffic to the CPU, I was suggesting that there 
are MLS limiters which will reduce this (see my earlier email), though we 
run with the defaults, which are (quite) high PPS values!


sh mls rate-limit | inc MC
     MCAST NON RPF   Off        -       -    -
    MCAST DFLT ADJ   On        10     100    Not sharing
  MCAST DIRECT CON   Off        -       -    -
  MCAST PARTIAL SC   On        10     100    Not sharing


Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
Sorry.  I also meant to say it's a Sup 720-3BXL. Based on what I can
see on CCO, that thing can forward 400 Mpps of ipv4 traffic. Does that
mean that I can set a rate limit of, say, 300 Mpps and somewhat guard
the CPU from meltdown for a few moments?


On Tue, Sep 21, 2010 at 2:52 PM, John Neiberger  wrote:
> Are you all referring to "mls ip cef rate-limit"? If so, what do you
> think would be a good value to use on a Sup 720? We'd like to set it
> so that the CPU isn't overloaded so much that routing protocols drop
> and we don't lose our SSH sessions. That way we can monitor it and
> watch to see when the CPU drops back down to normal.
>
> Thanks!
> John
>
> On Tue, Sep 21, 2010 at 2:16 PM, John Neiberger  wrote:
>> Thanks. I've never used the MLS limiters before, so I'll look into how
>> they're configured in case we decide to use them. But we also have the
>> option of moving most of our production traffic away from these boxes
>> temporarily, so we may be able to just deal with the temporary chaos.
>>
>> John
>>
>> On Tue, Sep 21, 2010 at 2:04 PM, Benjamin Lovell  wrote:
>>> Excellent point and suggestion. This should prevent punts from smashing 
>>> your control plane and causing a cascading effect like the one I described.
>>>
>>> -Ben
>>>
>>>
>>> On Sep 21, 2010, at 3:47 PM, Phil Mayers wrote:
>>>
 On 09/21/2010 08:27 PM, Benjamin Lovell wrote:
> The primary thing to worry about here is the mcast packet rate not
> number of mroutes. Replication change will cause all mcast packets to
> be punted to CPU for a short period(few 100 msec or so).

 Remember you can rate-limit this with the MLS limiters. Whilst they 
 defaults are (very) high, lowering it for the duration of this change 
 could ease the problems.
 ___
 cisco-nsp mailing list  cisco-...@puck.nether.net
 https://puck.nether.net/mailman/listinfo/cisco-nsp
 archive at http://puck.nether.net/pipermail/cisco-nsp/
>>>
>>>
>>> ___
>>> cisco-nsp mailing list  cisco-...@puck.nether.net
>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>>
>>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Phil Mayers

On 09/21/2010 09:52 PM, John Neiberger wrote:

Are you all referring to "mls ip cef rate-limit"? If so, what do you


I was referring to the multicast punt rate limiters:

mls rate-limit multicast ipv4 ?
  connected   Rate limiting of multicast packets from directly connected source
  fib-miss    Rate limiting of fib-missed multicast packets
  igmp        Rate limiting of the IGMP protocol packets
  ip-options  rate limiting of multicast packets with ip options
  non-rpf     Rate limiting of non-rpf multicast packets
  partial     rate limiting of multicast packets during partial-SC state

I don't know what "mls ip cef rate-limit" does; my 6500/SXI box doesn't 
have it.



think would be a good value to use on a Sup 720? We'd like to set it
so that the CPU isn't overloaded so much that routing protocols drop
and we don't lose our SSH sessions. That way we can monitor it and
watch to see when the CPU drops back down to normal.


If you want to rate-limit unicast traffic hitting the CPU I would 
investigate CoPP. There's a lot of info on this in the list archives, 
and there are many reasons to prefer it to mls rate limiters for unicast 
traffic.



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
Are you all referring to "mls ip cef rate-limit"? If so, what do you
think would be a good value to use on a Sup 720? We'd like to set it
so that the CPU isn't overloaded so much that routing protocols drop
and we don't lose our SSH sessions. That way we can monitor it and
watch to see when the CPU drops back down to normal.

Thanks!
John

On Tue, Sep 21, 2010 at 2:16 PM, John Neiberger  wrote:
> Thanks. I've never used the MLS limiters before, so I'll look into how
> they're configured in case we decide to use them. But we also have the
> option of moving most of our production traffic away from these boxes
> temporarily, so we may be able to just deal with the temporary chaos.
>
> John
>
> On Tue, Sep 21, 2010 at 2:04 PM, Benjamin Lovell  wrote:
>> Excellent point and suggestion. This should prevent punts from smashing your 
>> control plane and causing a cascading effect like the one I described.
>>
>> -Ben
>>
>>
>> On Sep 21, 2010, at 3:47 PM, Phil Mayers wrote:
>>
>>> On 09/21/2010 08:27 PM, Benjamin Lovell wrote:
 The primary thing to worry about here is the mcast packet rate not
 number of mroutes. Replication change will cause all mcast packets to
 be punted to CPU for a short period(few 100 msec or so).
>>>
>>> Remember you can rate-limit this with the MLS limiters. Whilst they 
>>> defaults are (very) high, lowering it for the duration of this change could 
>>> ease the problems.
>>> ___
>>> cisco-nsp mailing list  cisco-...@puck.nether.net
>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>>
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
Thanks. I've never used the MLS limiters before, so I'll look into how
they're configured in case we decide to use them. But we also have the
option of moving most of our production traffic away from these boxes
temporarily, so we may be able to just deal with the temporary chaos.

John

On Tue, Sep 21, 2010 at 2:04 PM, Benjamin Lovell  wrote:
> Excellent point and suggestion. This should prevent punts from smashing your 
> control plane and causing a cascading effect like the one I described.
>
> -Ben
>
>
> On Sep 21, 2010, at 3:47 PM, Phil Mayers wrote:
>
>> On 09/21/2010 08:27 PM, Benjamin Lovell wrote:
>>> The primary thing to worry about here is the mcast packet rate not
>>> number of mroutes. Replication change will cause all mcast packets to
>>> be punted to CPU for a short period(few 100 msec or so).
>>
>> Remember you can rate-limit this with the MLS limiters. Whilst they defaults 
>> are (very) high, lowering it for the duration of this change could ease the 
>> problems.
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>
>
> ___
> cisco-nsp mailing list  cisco-...@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Benjamin Lovell
Excellent point and suggestion. This should prevent punts from smashing your 
control plane and causing a cascading effect like the one I described. 

-Ben


On Sep 21, 2010, at 3:47 PM, Phil Mayers wrote:

> On 09/21/2010 08:27 PM, Benjamin Lovell wrote:
>> The primary thing to worry about here is the mcast packet rate not
>> number of mroutes. Replication change will cause all mcast packets to
>> be punted to CPU for a short period(few 100 msec or so).
> 
> Remember you can rate-limit this with the MLS limiters. Whilst they defaults 
> are (very) high, lowering it for the duration of this change could ease the 
> problems.
> ___
> cisco-nsp mailing list  cisco-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/




Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Phil Mayers

On 09/21/2010 08:27 PM, Benjamin Lovell wrote:

The primary thing to worry about here is the mcast packet rate not
number of mroutes. Replication change will cause all mcast packets to
be punted to CPU for a short period(few 100 msec or so).


Remember you can rate-limit this with the MLS limiters. Whilst the 
defaults are (very) high, lowering them for the duration of this change 
could ease the problems.



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
That's good to know. We actually have several Gbps of multicast video
running over these boxes. Based on that, I suspect that we'll do that
change some other night unless I have plenty of time tonight to deal
with it.

Thanks for everyone's input. That helps me out a lot!
John

On Tue, Sep 21, 2010 at 1:27 PM, Benjamin Lovell  wrote:
> The primary thing to worry about here is the mcast packet rate not number of 
> mroutes. Replication change will cause all mcast packets to be punted to CPU 
> for a short period(few 100 msec or so). I have seen this go badly under the 
> following conditions.
>
> 1) High rate of mcast traffic. few Gigs per sec.
> 2) BFD will tight timers in the 750msec range.
>
> When mcast started getting punted the CPU spiked to the moon. Input queues 
> got smashed. BFD and therefor OSPF dropped. RPF interfaces changed and it 
> took almost 30min for the box to calm back down.
>
> If you have a moderate to high mcast packet rate I would not count on this 
> being a hitless change.
>
> -Ben
>
>
> On Sep 21, 2010, at 3:01 PM, John Neiberger wrote:
>
>> If I recall correctly, we have over 500 mroutes. I was just speaking
>> to a Cisco engineer that works with us about this. I think I'm going
>> to save this change until last. We have a lot of etherchannels and we
>> want to convert those to routed links with ECMP first, then we'll
>> switch over to egress replication. It sounds like we shouldn't have
>> more than a couple of seconds of impact.
>>
>> Thanks!
>> John
>>
>> On Tue, Sep 21, 2010 at 12:46 PM, Tim Stevenson  wrote:
>>> Hi John,
>>> Switching replication modes basically purges the hardware of all mroutes and
>>> those will be reprogrammed based on the current software state. It will be
>>> potentially disruptive for all mroutes, but the exact amount of traffic
>>> loss/blackholing would depend on the rate of each stream at the time, and
>>> the overall amount of time it takes to reprogram. For a few 100 mroutes, I
>>> would not expect much impact.
>>>
>>> Hope that helps,
>>> Tim
>>>
>>> At 11:30 AM 9/21/2010, John Neiberger averred:
>>>
 We're running 12.2(33)SRC2, I believe. It's actually experimental code
 and the exact version is overwritten with another code.

 On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick 
 wrote:
> John,
> Having switched back and forth a few times, I never noticed more than
> a 1-2 second outage.
> What version of code are you on?
>
> On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger 
> wrote:
>> We're going to be doing a whole bunch of maintenance tonight during a
>> maintenance window. One of the many things on our plate is to switch
>> from ingress replication mode to egress on some 7600s that have a few
>> hundred multicast routes on them. We know there is going to be at
>> least a minor blip while things settle down after making the change,
>> but I wanted to see if anyone on the list has done this and what the
>> operational impact was. I've heard there will be slight interruption
>> in traffic, but what sort of interruption are we talking about? Are we
>> speaking about a second or two?
>>
>> I'm asking because we're trying to decide if we want to split this out
>> to another night. If the disruption is minor and the risk is low then
>> we'll do it tonight. Otherwise, we might choose to do it on a separate
>> night.
>>
>> Any thoughts?
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>>
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at
>> http://puck.nether.net/pipermail/cisco-nsp/
>>
>

 ___
 cisco-nsp mailing list  cisco-...@puck.nether.net

 https://puck.nether.net/mailman/listinfo/cisco-nsp
 archive at
 http://puck.nether.net/pipermail/cisco-nsp/
>>>
>>>
>>>
>>>
>>> Tim Stevenson, tstev...@cisco.com
>>> Routing & Switching CCIE #5561
>>> Distinguished Technical Marketing Engineer, Cisco Nexus 7000
>>> Cisco - http://www.cisco.com
>>> IP Phone: 408-526-6759
>>> 
>>> The contents of this message may be *Cisco Confidential*
>>> and are intended for the specified recipients only.
>>>
>>>
>>>
>>
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Benjamin Lovell
The primary thing to worry about here is the mcast packet rate, not the number of 
mroutes. The replication change will cause all mcast packets to be punted to the CPU 
for a short period (a few hundred msec or so). I have seen this go badly under the 
following conditions:

1) A high rate of mcast traffic: a few Gbps.
2) BFD with tight timers, in the 750 msec range.

When mcast started getting punted the CPU spiked to the moon. Input queues got 
smashed. BFD, and therefore OSPF, dropped. RPF interfaces changed and it took 
almost 30 minutes for the box to calm back down.

If you have a moderate to high mcast packet rate, I would not count on this 
being a hitless change.

-Ben


On Sep 21, 2010, at 3:01 PM, John Neiberger wrote:

> If I recall correctly, we have over 500 mroutes. I was just speaking
> to a Cisco engineer that works with us about this. I think I'm going
> to save this change until last. We have a lot of etherchannels and we
> want to convert those to routed links with ECMP first, then we'll
> switch over to egress replication. It sounds like we shouldn't have
> more than a couple of seconds of impact.
> 
> Thanks!
> John
> 
> On Tue, Sep 21, 2010 at 12:46 PM, Tim Stevenson  wrote:
>> Hi John,
>> Switching replication modes basically purges the hardware of all mroutes and
>> those will be reprogrammed based on the current software state. It will be
>> potentially disruptive for all mroutes, but the exact amount of traffic
>> loss/blackholing would depend on the rate of each stream at the time, and
>> the overall amount of time it takes to reprogram. For a few 100 mroutes, I
>> would not expect much impact.
>> 
>> Hope that helps,
>> Tim
>> 
>> At 11:30 AM 9/21/2010, John Neiberger averred:
>> 
>>> We're running 12.2(33)SRC2, I believe. It's actually experimental code
>>> and the exact version is overwritten with another code.
>>> 
>>> On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick 
>>> wrote:
 John,
 Having switched back and forth a few times, I never noticed more than
 a 1-2 second outage.
 What version of code are you on?
 
 On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger 
 wrote:
> We're going to be doing a whole bunch of maintenance tonight during a
> maintenance window. One of the many things on our plate is to switch
> from ingress replication mode to egress on some 7600s that have a few
> hundred multicast routes on them. We know there is going to be at
> least a minor blip while things settle down after making the change,
> but I wanted to see if anyone on the list has done this and what the
> operational impact was. I've heard there will be slight interruption
> in traffic, but what sort of interruption are we talking about? Are we
> speaking about a second or two?
> 
> I'm asking because we're trying to decide if we want to split this out
> to another night. If the disruption is minor and the risk is low then
> we'll do it tonight. Otherwise, we might choose to do it on a separate
> night.
> 
> Any thoughts?
> ___
> cisco-nsp mailing list  cisco-nsp@puck.nether.net
> 
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at
> http://puck.nether.net/pipermail/cisco-nsp/
> 
 
>>> 
>>> ___
>>> cisco-nsp mailing list  cisco-nsp@puck.nether.net
>>> 
>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> archive at
>>> http://puck.nether.net/pipermail/cisco-nsp/
>> 
>> 
>> 
>> 
>> Tim Stevenson, tstev...@cisco.com
>> Routing & Switching CCIE #5561
>> Distinguished Technical Marketing Engineer, Cisco Nexus 7000
>> Cisco - http://www.cisco.com
>> IP Phone: 408-526-6759
>> 
>> The contents of this message may be *Cisco Confidential*
>> and are intended for the specified recipients only.
>> 
>> 
>> 
> 
> ___
> cisco-nsp mailing list  cisco-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/




Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Jeffrey Pazahanick
FWIW, I had some issues with SRC2 and egress replication. Might be
specific to the ES20+ cards.

On Tue, Sep 21, 2010 at 2:01 PM, John Neiberger  wrote:
> If I recall correctly, we have over 500 mroutes. I was just speaking
> to a Cisco engineer that works with us about this. I think I'm going
> to save this change until last. We have a lot of etherchannels and we
> want to convert those to routed links with ECMP first, then we'll
> switch over to egress replication. It sounds like we shouldn't have
> more than a couple of seconds of impact.
>
> Thanks!
> John
>
> On Tue, Sep 21, 2010 at 12:46 PM, Tim Stevenson  wrote:
>> Hi John,
>> Switching replication modes basically purges the hardware of all mroutes and
>> those will be reprogrammed based on the current software state. It will be
>> potentially disruptive for all mroutes, but the exact amount of traffic
>> loss/blackholing would depend on the rate of each stream at the time, and
>> the overall amount of time it takes to reprogram. For a few 100 mroutes, I
>> would not expect much impact.
>>
>> Hope that helps,
>> Tim
>>
>> At 11:30 AM 9/21/2010, John Neiberger averred:
>>
>>> We're running 12.2(33)SRC2, I believe. It's actually experimental code
>>> and the exact version is overwritten with another code.
>>>
>>> On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick 
>>> wrote:
>>> > John,
>>> > Having switched back and forth a few times, I never noticed more than
>>> > a 1-2 second outage.
>>> > What version of code are you on?
>>> >
>>> > On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger 
>>> > wrote:
>>> >> We're going to be doing a whole bunch of maintenance tonight during a
>>> >> maintenance window. One of the many things on our plate is to switch
>>> >> from ingress replication mode to egress on some 7600s that have a few
>>> >> hundred multicast routes on them. We know there is going to be at
>>> >> least a minor blip while things settle down after making the change,
>>> >> but I wanted to see if anyone on the list has done this and what the
>>> >> operational impact was. I've heard there will be slight interruption
>>> >> in traffic, but what sort of interruption are we talking about? Are we
>>> >> speaking about a second or two?
>>> >>
>>> >> I'm asking because we're trying to decide if we want to split this out
>>> >> to another night. If the disruption is minor and the risk is low then
>>> >> we'll do it tonight. Otherwise, we might choose to do it on a separate
>>> >> night.
>>> >>
>>> >> Any thoughts?
>>> >> ___
>>> >> cisco-nsp mailing list  cisco-...@puck.nether.net
>>> >>
>>> >> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> >> archive at
>>> >> http://puck.nether.net/pipermail/cisco-nsp/
>>> >>
>>> >
>>>
>>> ___
>>> cisco-nsp mailing list  cisco-...@puck.nether.net
>>>
>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> archive at
>>> http://puck.nether.net/pipermail/cisco-nsp/
>>
>>
>>
>>
>> Tim Stevenson, tstev...@cisco.com
>> Routing & Switching CCIE #5561
>> Distinguished Technical Marketing Engineer, Cisco Nexus 7000
>> Cisco - http://www.cisco.com
>> IP Phone: 408-526-6759
>> 
>> The contents of this message may be *Cisco Confidential*
>> and are intended for the specified recipients only.
>>
>>
>>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
If I recall correctly, we have over 500 mroutes. I was just speaking
to a Cisco engineer that works with us about this. I think I'm going
to save this change until last. We have a lot of etherchannels and we
want to convert those to routed links with ECMP first, then we'll
switch over to egress replication. It sounds like we shouldn't have
more than a couple of seconds of impact.

Thanks!
John

On Tue, Sep 21, 2010 at 12:46 PM, Tim Stevenson  wrote:
> Hi John,
> Switching replication modes basically purges the hardware of all mroutes and
> those will be reprogrammed based on the current software state. It will be
> potentially disruptive for all mroutes, but the exact amount of traffic
> loss/blackholing would depend on the rate of each stream at the time, and
> the overall amount of time it takes to reprogram. For a few 100 mroutes, I
> would not expect much impact.
>
> Hope that helps,
> Tim
>
> At 11:30 AM 9/21/2010, John Neiberger averred:
>
>> We're running 12.2(33)SRC2, I believe. It's actually experimental code
>> and the exact version is overwritten with another code.
>>
>> On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick 
>> wrote:
>> > John,
>> > Having switched back and forth a few times, I never noticed more than
>> > a 1-2 second outage.
>> > What version of code are you on?
>> >
>> > On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger 
>> > wrote:
>> >> We're going to be doing a whole bunch of maintenance tonight during a
>> >> maintenance window. One of the many things on our plate is to switch
>> >> from ingress replication mode to egress on some 7600s that have a few
>> >> hundred multicast routes on them. We know there is going to be at
>> >> least a minor blip while things settle down after making the change,
>> >> but I wanted to see if anyone on the list has done this and what the
>> >> operational impact was. I've heard there will be slight interruption
>> >> in traffic, but what sort of interruption are we talking about? Are we
>> >> speaking about a second or two?
>> >>
>> >> I'm asking because we're trying to decide if we want to split this out
>> >> to another night. If the disruption is minor and the risk is low then
>> >> we'll do it tonight. Otherwise, we might choose to do it on a separate
>> >> night.
>> >>
>> >> Any thoughts?
>> >> ___
>> >> cisco-nsp mailing list  cisco-...@puck.nether.net
>> >>
>> >> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> >> archive at
>> >> http://puck.nether.net/pipermail/cisco-nsp/
>> >>
>> >
>>
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>>
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at
>> http://puck.nether.net/pipermail/cisco-nsp/
>
>
>
>
> Tim Stevenson, tstev...@cisco.com
> Routing & Switching CCIE #5561
> Distinguished Technical Marketing Engineer, Cisco Nexus 7000
> Cisco - http://www.cisco.com
> IP Phone: 408-526-6759
> 
> The contents of this message may be *Cisco Confidential*
> and are intended for the specified recipients only.
>
>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Tim Stevenson

Hi John,
Switching replication modes basically purges the hardware of all 
mroutes and those will be reprogrammed based on the current software 
state. It will be potentially disruptive for all mroutes, but the 
exact amount of traffic loss/blackholing would depend on the rate of 
each stream at the time, and the overall amount of time it takes to 
reprogram. For a few hundred mroutes, I would not expect much impact.
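
If you want to watch it converge, comparing the hardware shortcut counts before 
and after the change should show when the reprogramming is done. Roughly (exact 
commands and output vary a little by release, so treat this as a sketch):

show ip mroute count
show mls ip multicast summary

Once the hardware entry count settles back around the pre-change figure, the 
purge/reprogram cycle is over.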


Hope that helps,
Tim

At 11:30 AM 9/21/2010, John Neiberger averred:


We're running 12.2(33)SRC2, I believe. It's actually experimental code
and the exact version is overwritten with another code.

On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick 
 wrote:

> John,
> Having switched back and forth a few times, I never noticed more than
> a 1-2 second outage.
> What version of code are you on?
>
> On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger 
 wrote:

>> We're going to be doing a whole bunch of maintenance tonight during a
>> maintenance window. One of the many things on our plate is to switch
>> from ingress replication mode to egress on some 7600s that have a few
>> hundred multicast routes on them. We know there is going to be at
>> least a minor blip while things settle down after making the change,
>> but I wanted to see if anyone on the list has done this and what the
>> operational impact was. I've heard there will be slight interruption
>> in traffic, but what sort of interruption are we talking about? Are we
>> speaking about a second or two?
>>
>> I'm asking because we're trying to decide if we want to split this out
>> to another night. If the disruption is minor and the risk is low then
>> we'll do it tonight. Otherwise, we might choose to do it on a separate
>> night.
>>
>> Any thoughts?
>> ___
>> cisco-nsp mailing list  cisco-nsp@puck.nether.net
>> 
https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at 
http://puck.nether.net/pipermail/cisco-nsp/

>>
>

___
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at 
http://puck.nether.net/pipermail/cisco-nsp/





Tim Stevenson, tstev...@cisco.com
Routing & Switching CCIE #5561
Distinguished Technical Marketing Engineer, Cisco Nexus 7000
Cisco - http://www.cisco.com
IP Phone: 408-526-6759

The contents of this message may be *Cisco Confidential*
and are intended for the specified recipients only.




Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
We're running 12.2(33)SRC2, I believe. It's actually experimental code,
and the exact version string is overwritten with another one.

On Tue, Sep 21, 2010 at 12:04 PM, Jeffrey Pazahanick  wrote:
> John,
> Having switched back and forth a few times, I never noticed more than
> a 1-2 second outage.
> What version of code are you on?
>
> On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger  wrote:
>> We're going to be doing a whole bunch of maintenance tonight during a
>> maintenance window. One of the many things on our plate is to switch
>> from ingress replication mode to egress on some 7600s that have a few
>> hundred multicast routes on them. We know there is going to be at
>> least a minor blip while things settle down after making the change,
>> but I wanted to see if anyone on the list has done this and what the
>> operational impact was. I've heard there will be slight interruption
>> in traffic, but what sort of interruption are we talking about? Are we
>> speaking about a second or two?
>>
>> I'm asking because we're trying to decide if we want to split this out
>> to another night. If the disruption is minor and the risk is low then
>> we'll do it tonight. Otherwise, we might choose to do it on a separate
>> night.
>>
>> Any thoughts?
>> ___
>> cisco-nsp mailing list  cisco-...@puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>



Re: [c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread Jeffrey Pazahanick
John,
Having switched back and forth a few times, I never noticed more than
a 1-2 second outage.
What version of code are you on?

On Tue, Sep 21, 2010 at 11:59 AM, John Neiberger  wrote:
> We're going to be doing a whole bunch of maintenance tonight during a
> maintenance window. One of the many things on our plate is to switch
> from ingress replication mode to egress on some 7600s that have a few
> hundred multicast routes on them. We know there is going to be at
> least a minor blip while things settle down after making the change,
> but I wanted to see if anyone on the list has done this and what the
> operational impact was. I've heard there will be slight interruption
> in traffic, but what sort of interruption are we talking about? Are we
> speaking about a second or two?
>
> I'm asking because we're trying to decide if we want to split this out
> to another night. If the disruption is minor and the risk is low then
> we'll do it tonight. Otherwise, we might choose to do it on a separate
> night.
>
> Any thoughts?
> ___
> cisco-nsp mailing list  cisco-...@puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
>



[c-nsp] Operational impact of switching from ingress to egress replication mode

2010-09-21 Thread John Neiberger
We're going to be doing a whole bunch of maintenance tonight during a
maintenance window. One of the many things on our plate is to switch
from ingress replication mode to egress on some 7600s that have a few
hundred multicast routes on them. We know there is going to be at
least a minor blip while things settle down after making the change,
but I wanted to see if anyone on the list has done this and what the
operational impact was. I've heard there will be a slight interruption
in traffic, but what sort of interruption are we talking about? Are we
speaking about a second or two?
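
(For reference, the knob in question is, unless I have the syntax slightly 
wrong, the global replication-mode command; it is changed in one line and 
reverted the same way:

mls ip multicast replication-mode egress
mls ip multicast replication-mode ingress

It's the hardware reprogramming that follows the change that I'm trying to 
size.)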

I'm asking because we're trying to decide if we want to split this out
to another night. If the disruption is minor and the risk is low then
we'll do it tonight. Otherwise, we might choose to do it on a separate
night.

Any thoughts?