For the record … I seem to be hit by some bugs:

CSCsl70634 Bug Details

  Headline: 67xx EC tx/rx traffic dependency resulting in low throughput
  Product: IOS          Feature: OTHERS
  Severity: 1           Status: Resolved
  First Found-in Version: 12.2(18)SXF12
  First Fixed-in Version: 8.7(0.22)BUB19, 8.7(0.22)SRC4, 12.2(18.12.5)SXF

  Release Notes:
    Symptom: Port-channel experiences overruns.
    Condition: Seen on 67xx cards.
    Trigger: When a port receives more than 6 Gbps of ip2tag traffic.
    Frequency: Found internally. No service requests.
    Root Cause: Flow control asserted by the fabric interface ASIC.
    Impact: Impacts traffic.
    Workaround: None.
    Issue verification: None.

And:

CSCeh08451 Bug Details

  Headline: Excessive Overruns and lbusDrops due to heavy flow control over fabric
  Product: IOS          Feature: OTHERS
  Severity: 1           Status: Resolved
  First Found-in Version: 12.2(17d)SXB02
  First Fixed-in Version: 12.2(18)SXE, 12.2(18)SXD05, 12.2(17d)SXB08

  Release Notes:
    Symptoms: A Sup720 system running in flow-through mode (and possibly
    not limited to this mode) may, under certain traffic profiles, get
    into a constant flow-control situation that reduces the throughput of
    the system.
    Workaround: A command has been added to reserve ASIC buffers on the
    line card, to improve the throughput of the system:

      [no] fabric buffer-reserve [high | low | medium | value]
        high   - 0x5050
        medium - 0x4040
        low    - 0x3030
        value  - any 16-bit value from 0x0 to 0x5050

From: cheddar cheese [mailto:[EMAIL PROTECTED]]
Sent: 5 January 2008 21:34
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem

probably best to open a case and upload all relevant info.

good luck,
-jay

On Jan 5, 2008 12:58 PM, Gabriel Mateiciuc <[EMAIL PROTECTED]> wrote:

Hello jay,

First of all, thanks for the patience of reading and explaining all this.
Unfortunately I was already aware of the facts that you've laid out here.
Normally I would say the same things you've explained, but … and there is
a but … there are some more empirical observations:

- We've had previous experience with a 6500 with a SUP2 (no fabric) that
  could hit 80-90% bus utilization without packet loss/drops.
- About a month ago we went from IOS 12.2(18)SXF3 to 12.2(18)SXF12. In the
  process we noticed the packet loss that occurs at peak hours, so at
  first we blamed the IOS and started digging for solutions.
- Analyzing the trends revealed that the single difference is the bus
  utilization, which rose from 50-60% to 70-80%.

Comparing to 6 months ago:

  Then: 3.4-3.5 Gbit/s on each of the 4 backbone links (port-channels of
        4 x 1 Gbit each) - fewer clients - bus at 50-60% - IOS SXF3
  Now:  congestion loss on the backbone links at 2.5-3 Gbit/s at peak
        hours - more clients connected to the classic cards - bus at
        70-80% - IOS SXF12

I've made some tests, like moving 2 of the 4 links of one port-channel
onto the supervisor - that seemed to solve the problem for that
port-channel. Putting in another fabric-enabled card and moving some of
the links there would solve the problem as well, so I'm sure the bus is
not hitting its limit. Then again, that doesn't answer why the 6724 seems
ineffective.

So, getting to the very problem: some previous experience with
port-channels, balancing algorithms, high traffic, configuration options
not recommended unless advised by TAC, undocumented IOS bugs … I think the
answer is somewhere among these.

PS: I've read the caveats for the IOS we're using now … and there seems to
be no link with the problems we're having.
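Coming back to the fabric buffer-reserve workaround from CSCeh08451 above:
a minimal sketch (my own illustration in Python, not a Cisco tool) of the
preset-to-value mapping and the documented range check, just to keep the
values straight when picking a custom one:

    # Maps the documented "fabric buffer-reserve" presets to the raw
    # 16-bit values from the CSCeh08451 release note, and range-checks
    # a custom value. Purely illustrative.

    PRESETS = {
        "high":   0x5050,
        "medium": 0x4040,
        "low":    0x3030,
    }

    def buffer_reserve_value(arg: str) -> int:
        """Return the raw 16-bit reserve value for a preset or hex string."""
        if arg in PRESETS:
            return PRESETS[arg]
        value = int(arg, 16)
        # the release note allows any 16-bit value from 0x0 to 0x5050
        if not 0x0 <= value <= 0x5050:
            raise ValueError(f"{arg} outside documented range 0x0..0x5050")
        return value

    for a in ("high", "medium", "low", "0x2020"):
        print(a, "->", hex(buffer_reserve_value(a)))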
From: cheddar cheese [mailto:[EMAIL PROTECTED]]
Sent: 5 January 2008 20:03
To: Gabriel Mateiciuc
Subject: Re: [c-nsp] 6500 - SUP720 - IOS - traffic problem

Hello Gabriel,

since you have a combo of fabric and non-fabric modules, the system
switching mode is "truncated". in this mode, non-fabric cards like the
63xx modules put entire frames on the bus, while fabric cards (like your
67xx modules) put just the headers. a non-fabric card forwards a frame via
the bus to the supervisor's PFC, and the PFC then switches it through the
fabric to the fabric-enabled card (for traffic going from non-fabric to
fabric cards).

the maximum centralized switching performance in truncated mode is 15
Mpps. it doesn't look like you're hitting this limit, but it does look
like the bus is busy. are the 63xx modules heavily utilized? is the
traffic mostly large frames?

i think replacing all or some of the 63xx modules with fabric-enabled
modules (like the 6748) should help reduce the bus utilization. also, if
you replace all of them, the system can operate in "compact" mode, which
increases the maximum centralized switching capacity to 30 Mpps. if you
add DFCs to the fabric-enabled cards (67xx), then port-to-port traffic
within those cards doesn't touch the bus, and the total switching capacity
also scales by 48 Mpps per DFC.

6500 Architecture White Paper:
http://www.cisco.com/en/US/products/hw/switches/ps708/products_white_paper0900aecd80673385.shtml

Cisco TAC might be able to help further.

-jay
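As a side note on jay's numbers: a back-of-the-envelope sketch (my own, in
Python; the Mpps figures are the ones he quotes) of how the centralized
ceiling and per-DFC scaling combine, set against the ~7.87 Mpps
forwarding-engine load visible in the show output below:

    # Aggregate switching capacity under the figures from jay's mail.
    TRUNCATED_CENTRALIZED_MPPS = 15   # mixed classic + fabric cards
    COMPACT_CENTRALIZED_MPPS   = 30   # all cards fabric-enabled
    DFC_MPPS                   = 48   # added per DFC-equipped card

    def system_capacity_mpps(all_fabric: bool, dfc_cards: int) -> int:
        """Centralized ceiling plus distributed forwarding per DFC."""
        centralized = (COMPACT_CENTRALIZED_MPPS if all_fabric
                       else TRUNCATED_CENTRALIZED_MPPS)
        return centralized + DFC_MPPS * dfc_cards

    # today: truncated mode, no DFCs -> 15 Mpps ceiling; the show output
    # below reports ~7.87 Mpps (peak 8.28), i.e. roughly half the ceiling,
    # which matches jay's reading that the pps limit is not the problem
    print(system_capacity_mpps(all_fabric=False, dfc_cards=0))   # 15
    # all-fabric chassis with DFCs on the three 67xx cards -> 30 + 3*48
    print(system_capacity_mpps(all_fabric=True, dfc_cards=3))    # 174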
On Jan 5, 2008 8:14 AM, Gabriel Mateiciuc <[EMAIL PROTECTED]> wrote:

Hello everyone,

Here's the environment I'm talking about:

#sh platform hardware capacity
System Resources
  PFC operating mode: PFC3BXL
  Supervisor redundancy mode: administratively sso, operationally sso
  Switching resources:
    Module  Part number      Series       CEF mode
      1     WS-X6348-RJ-45   classic      CEF
      2     WS-X6348-RJ-45   classic      CEF
      3     WS-X6748-GE-TX   CEF720       CEF
      4     WS-X6724-SFP     CEF720       CEF
      5     WS-SUP720-3BXL   supervisor   CEF
      6     WS-X6704-10GE    CEF720       CEF
      7     WS-X6348-RJ-45   classic      CEF
      8     WS-X6348-RJ-45   classic      CEF
      9     WS-X6348-RJ-45   classic      CEF

CPU Resources
  CPU utilization:
    Module     5 seconds   1 minute   5 minutes
      3          0% / 0%         0%          0%
      4          0% / 0%         0%          0%
      5 RP     32% / 11%        11%         11%
      5 SP      14% / 1%         9%          9%
      6          0% / 0%         0%          0%
  Processor memory:
    Module   Bytes: Total        Used   %Used
      3         219661760    94927184     43%
      4         219661760    94488840     43%
      5 RP      927935472   132545832     14%
      5 SP      912623676   218933576     24%
      6         219661760    94944424     43%
  I/O memory:
    Module   Bytes: Total        Used   %Used
      5 RP       67108864    11891816     18%
      5 SP       67108864    11891760     18%

EOBC Resources
  Module      Packets/sec   Total packets   Dropped packets
    3    Rx:            7       280576601                 3
         Tx:            1        24002677                 0
    4    Rx:            7       280574860                 3
         Tx:            3        15260689                 0
    5 RP Rx:           72       141474821              4066
         Tx:           59       109863281                 0
    5 SP Rx:           11        41664038              4697
         Tx:           20        64613234                 0
    6    Rx:            8       280576597                 2
         Tx:            2         8779278                 0

VLAN Resources
  VLANs: 4094 total, 149 VTP, 240 extended, 14 internal, 3691 free

L2 Forwarding Resources
  MAC Table usage:
    Module   Collisions   Total   Used   %Used
      5               0   65536   2604      4%
  VPN CAM usage:
    Total   Used   %Used
      512      0      0%

L3 Forwarding Resources
  FIB TCAM usage:                    Total   Used   %Used
    72 bits (IPv4, MPLS, EoM)       524288   5558      1%
    144 bits (IP mcast, IPv6)       262144      5      1%
  detail:   Protocol      Used   %Used
            IPv4          5558      1%
            MPLS             0      0%
            EoM              0      0%
            IPv6             2      1%
            IPv4 mcast       3      1%
            IPv6 mcast       0      0%
  Adjacency usage:    Total   Used   %Used
                    1048576     635      1%
  Forwarding engine load:
    Module       pps   peak-pps   peak-time
      5      7865738    8282714   22:21:27 UTC+2 Fri Jan 4 2008

CPU Rate Limiters Resources
  Rate limiters:   Total   Used   Reserved   %Used
    Layer 3            9      4          1     44%
    Layer 2            4      2          2     50%

ACL/QoS TCAM Resources
  Key: ACLent - ACL TCAM entries, ACLmsk - ACL TCAM masks, AND - ANDOR,
       QoSent - QoS TCAM entries, QoSmsk - QoS TCAM masks, OR - ORAND,
       Lbl-in - ingress label, Lbl-eg - egress label, LOUsrc - LOU source,
       LOUdst - LOU destination, ADJ - ACL adjacency
  Module ACLent ACLmsk QoSent QoSmsk Lbl-in Lbl-eg LOUsrc LOUdst AND OR ADJ
    5       1%     2%     1%     1%     1%     1%     0%     3%  0%  0%  1%

QoS Policer Resources
  Aggregate policers:
    Module   Total   Used   %Used
      5       1024      1      1%
  Microflow policer configurations:
    Module   Total   Used   %Used
      5         64      1      1%

Switch Fabric Resources
  Bus utilization: current: 71%, peak was 81% at 22:53:20 UTC+2 Fri Jan 4 2008
  Fabric utilization:      Ingress                    Egress
    Module  Chanl  Speed   rate  peak                 rate  peak
      3       0     20G     35%   48% @20:38 27Dec07   26%   36% @20:44 04Jan08
      3       1     20G     40%   48% @23:00 04Jan08   34%   43% @22:21 03Jan08
      4       0     20G     43%   55% @15:57 03Jan08   48%   63% @20:33 27Dec07
      5       0     20G     13%   18% @21:42 02Jan08    9%   17% @22:52 04Jan08
      6       0     20G      0%    1% @01:30 25Dec07    0%    2% @11:27 30Dec07
      6       1     20G     33%   48% @20:26 27Dec07   45%   54% @22:36 03Jan08
  Switching mode:
    Module   Switching mode
      3      truncated
      4      truncated
      5      flow through
      6      truncated

Interface Resources
  Interface drops:
    Module   Total drops: Tx           Rx   Highest drop port: Tx   Rx
      1                 7353         2166                       1   38
      2             24609502       144685                      14   40
      3                42130   8135613761                       7    2
      4               160468  49040038842                      17    6
      5              1354908       184496                       1    2
      6                12027       286149                       1    1
      7             29461165       218697                      33   37
      8              2033449          282                      10   10
      9             24030508       408094                      36   29
  Interface buffer sizes:
    Module   Bytes: Tx buffer   Rx buffer
      1               112640        6144
      2               112640        6144
      3              1221120      152000
      4              1221120      152000
      6             14622592     1914304
      7               112640        6144
      8               112640        6144
      9               112640        6144

And for those having enough patience to read the details, here's the
question/problem:

On the 4th line card (6724-SFP) we have links grouped in etherchannels
(4 x Gigabit backbone links), taking care to keep most of the
etherchannels with their ports grouped on the same ASIC/line card. The
load balancing used is src-dst-ip. Looking at the figures above, I guess
anyone would say there are plenty of resources left, yet our
graphs/interface summary show us that somewhere between 40% and 50% fabric
utilization, both ingress and egress, a forwarding performance problem
appears (also visible in the high IQD counters):

  Interface           IHQ         IQD  OHQ   OQD       RXBS    RXPS       TXBS    TXPS  TRTL
* GigabitEthernet4/1    0  3938121308    0    56  557290000  100095  620339000   94591     0
* GigabitEthernet4/2    0  3909192601    0   304  562387000   94364  602164000   93503     0
* GigabitEthernet4/3    0  3909817998    0  1113  561663000   94280  847735000  113865     0
* GigabitEthernet4/4    0  3939072687    0    53  557529000   95337  643992000   95015     0

Now, other (possibly) relevant information from the config:

  ip cef event-log traceback-depth 0
  ip cef table consistency-check error-message
  ip cef table consistency-check auto-repair
  ip cef load-sharing algorithm original
  mls ip cef load-sharing simple
  fabric switching-mode allow truncated
  fabric buffer-reserve queue
  fabric buffer-reserve low

The last one seemed to help a lot (over a 10% boost in performance).

Did anyone hit similar problems with low performance on fabric-enabled
line cards? Any recommended configuration/IOS version?

Cheers,

Gabriel Mateiciuc
Academia de Studii Economice
Networks Department
Infrastructure Team - [EMAIL PROTECTED]
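And on the load-balancing angle of Gabriel's question: a deliberately
simplified model (my own sketch in Python; the real PFC hash polynomial
differs) of src-dst-ip EtherChannel hashing, showing how correlated
address pairs can polarize onto a single member link:

    import ipaddress

    # Simplified model of src-dst-ip EtherChannel load balancing. The
    # point is only that the member link is a deterministic function of
    # the IP pair, so a skewed flow mix can pin itself to one link.

    def member_link(src: str, dst: str, n_links: int) -> int:
        """Pick a member link from the XOR of the two addresses."""
        s = int(ipaddress.ip_address(src))
        d = int(ipaddress.ip_address(dst))
        return (s ^ d) % n_links

    flows = [
        ("10.0.0.1", "10.0.1.1"),
        ("10.0.0.2", "10.0.1.2"),
        ("10.0.0.3", "10.0.1.3"),
        ("10.0.0.4", "10.0.1.4"),
    ]
    for src, dst in flows:
        # all four example pairs XOR to 0x100, so every flow lands on
        # link 0 of a 4-link bundle: textbook hash polarization
        print(src, "->", dst, "link", member_link(src, dst, 4))

Given that Gabriel himself lists "balancing algorithms" among the
suspects, a skewed per-member distribution like this is the kind of thing
worth ruling out with per-member-link counters before blaming the fabric.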