Re: [c-nsp] Hardware limitations on SUP32 with LDP and full routing table

Marcus.Gerdon Fri, 23 Jan 2009 07:58:43 -0800

Yeah,

but "If it isn't supported in hardware, it's not supported." would mean any 
route (forwarding entry) not fitting into the TCAM would have to be ignored and 
maybe an error message issued.


Software forwarding by the MSFC is supported and actively used by all of the 
supervisors, as not all features are supported in hardware on the PFCs. So 
generally it is implemented (e.g. IPv6 is done in software on Sup2/MSFC2 and 
works).

Taken from 
http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml#cb:

-----

%CFIB-SP-7-CFIB_EXCEPTION : FIB TCAM exception, Some entries will be software 
switched 
%CFIB-SP-STBY-7-CFIB_EXCEPTION : FIB TCAM exception, Some entries will be 
software switched

This error message is received when the amount of available space in the TCAM 
is exceeded. This results in high CPU. This is a FIB TCAM limitation. Once TCAM 
is full, a flag will be set and FIB TCAM exception is received. This stops from 
adding new routes to the TCAM. Therefore, everything will be software switched. 
The removal of routes does not help resume hardware switching. Once the TCAM 
enters the exception state, the system must be reloaded to get out of that 
state. The maximum routes that can be installed in TCAM is increased by the mls 
cef maximum-routes command.

-----

If the above behaviour were in fact the case, there'd only be a problem with 
CPU load, as all traffic would get software switched. No incorrect forwarding 
would occur to break connectivity.

The reality is that new entries are created in the software FIB (ip cef) but 
not in the hardware FIB/TCAM (mls cef) if there's not enough room in the TCAM. 
As a result the two FIBs become inconsistent and 'most specific first' is 
broken. In addition, the above description is simply wrong: withdrawn routes 
trigger deletion of the forwarding entry in both FIBs, and a newly created 
forwarding entry of the same prefix length is entered into the slot freed up. 
This is reproducible, and with some investigative work and proper preparation 
I've even been able to predict the number of routes that will work and which 
route in a series of new statics will be the first to fail. (I've logged the 
test and attached it to the ticket; anybody interested in it, please contact 
me off-list.)
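
To illustrate the failure mode, here is a minimal Python sketch of two FIBs 
that drift apart. It is a toy model, not the actual PFC/MSFC logic: the Router 
class, the lookup helper, the two-slot TCAM and the next-hop names are all made 
up for illustration, and the model ignores that the real TCAM is organised by 
prefix length.

```python
import ipaddress

def lookup(fib, dst):
    """Longest-prefix match: next hop of the most specific covering prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]

class Router:
    """Toy model (assumption): software FIB is unbounded, hardware FIB is not."""

    def __init__(self, tcam_slots):
        self.tcam_slots = tcam_slots
        self.sw_fib = {}  # stands in for 'ip cef'
        self.hw_fib = {}  # stands in for 'mls cef' (TCAM)

    def add_route(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        self.sw_fib[net] = next_hop
        # Only install in hardware while a slot is free (or on an update).
        if net in self.hw_fib or len(self.hw_fib) < self.tcam_slots:
            self.hw_fib[net] = next_hop

    def del_route(self, prefix):
        net = ipaddress.ip_network(prefix)
        # A withdrawal removes the entry from both FIBs, freeing a slot.
        self.sw_fib.pop(net, None)
        self.hw_fib.pop(net, None)

r = Router(tcam_slots=2)
r.add_route("0.0.0.0/0", "upstream")
r.add_route("10.0.0.0/8", "core")
r.add_route("10.1.1.0/24", "customer")  # TCAM already full: software only

print("software FIB:", lookup(r.sw_fib, "10.1.1.5"))  # customer
print("hardware FIB:", lookup(r.hw_fib, "10.1.1.5"))  # core
```

In this model transit traffic to 10.1.1.5 follows the less specific /8 in 
hardware, while the software FIB still picks the /24. Calling del_route() and 
then add_route() shows a freed slot being reused without any reload.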

Another indication of the inconsistency is that while forwarding of traffic 
to a destination fails, the same destination is properly reachable from the 
machine itself, as locally originated traffic is never hardware switched.

Also, 'ip cef table consistency-check type scan-sw-hw' does help without 
reloading (with 'auto-repair' enabled, of course), although it takes its time. 
But the auto-repair regularly introduces a few hundred ms of additional delay 
while reordering the TCAM, causing service degradation. Even a 'clear ip 
route *' helps without a reload.


That's why I think the problem isn't that the hardware is used beyond its 
specs. There's some kind of fallback implemented, and instead of effectively 
limiting the functionality to the specs, occasional and (at first glance) 
random misbehaviour and inconsistencies are introduced. A simple fix like the 
one I already mentioned (limiting the prefix length allowed to enter the TCAM) 
or even 'mls cef maximum-routes' being supported on Sup32 and Sup720 might 
provide at least a stable workaround.

Using 'mls cef maximum-routes' I'd think about limiting the TCAM to 1024 
entries (if I remember right, one block is the minimum), which is easily 
filled up with rather static /32-/30 and adjacency entries, forcing everything 
else into software. The problematic behaviour discussed, broken 
most-specific-first, wouldn't affect any other production routes, as nearly 
all of them would be forwarded by the MSFC. In the worst case one could fill 
up the 1024 TCAM entries with dummy /32 statics to force all production 
traffic onto the MSFC. On Sup32 this might work, as maximum-routes is 
configurable there. Maybe someone might want to try that.
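
For anyone who wants to try the dummy-static idea, a small Python sketch that 
prints the filler routes. The base address (taken from the 198.18.0.0/15 
benchmarking range), the Null0 next hop and the dummy_statics helper are my 
assumptions for illustration; whether such routes really occupy TCAM slots as 
intended would have to be verified on the box.

```python
import ipaddress

def dummy_statics(base="198.18.0.1", count=1024, next_hop="Null0"):
    """Return 'ip route' lines for `count` consecutive /32 host routes.

    base and next_hop are illustrative assumptions, not tested values.
    """
    start = ipaddress.IPv4Address(base)
    return [
        f"ip route {start + i} 255.255.255.255 {next_hop}"
        for i in range(count)
    ]

if __name__ == "__main__":
    # Print a paste-ready list of static routes.
    for line in dummy_statics():
        print(line)
```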

At a maximum of 0.5 Mpps for software-switched traffic this would still be 
sufficient for a number of deployments.
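
As a rough back-of-the-envelope check, the packet rate can be converted to 
bandwidth for a few packet sizes. The sizes chosen are my own examples, the 
calculation ignores layer-2 overhead, and it assumes the 0.5 Mpps figure holds 
regardless of packet size, which may not be true in practice.

```python
def throughput_mbps(pps, packet_bytes):
    """Bandwidth in Mbit/s for a given packet rate and packet size."""
    return pps * packet_bytes * 8 / 1e6

# 0.5 Mpps at a few illustrative packet sizes (bytes).
for size in (64, 512, 1500):
    print(f"{size:>5} bytes: {throughput_mbps(500_000, size):.0f} Mbit/s")
```

So even with minimum-size packets this is roughly 256 Mbit/s, and considerably 
more with a realistic traffic mix.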


regards,

Marcus


> -----Original Message-----
> From: Brian Turnbow [mailto:[email protected]] 
> Sent: Friday, 23 January 2009 14:48
> To: Jon Lewis; Phil Mayers
> Cc: Marcus.Gerdon; [email protected]
> Subject: RE: [c-nsp] Hardware limitations on SUP32 with LDP 
> and full routing table
> 
>  
> 
> 
> As has been said before...it's unfortunate cisco decided not to do a 
> Sup32-3bxl.  It renders the Sup32 unsuitable for use in networks where a
> Sup2 doesn't cut it, but Sup720-3bxl is overkill.
> 
> 
> Especially after they said they would (at least at this roadshow)
> 
> http://www.cisco.at/partner/pdf/Tkrewedl_Roadshow_jan05_catalyst_TK.pdf
> 
> I've heard that some have tried it and it worked. This was quite a while
> ago, though; I'm sure newer IOS checks and complains if it finds a bxl.
> 
> Brian
> 
_______________________________________________
cisco-nsp mailing list  [email protected]
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
