I have an update on this issue: DEFECT 528034 has been opened. The short of it is that if a traffic loop causes a lot of MAC moves between cores, the FastIron storm control code, once the loop has subsided, eventually updates only one of the four cores.
The good news is that as long as you're sure there will be no traffic loops, this issue won't be a problem. The bad news is that if there is a traffic loop, the fdb can become corrupted, and removing the corruption requires reloading the switch (or switches, if it's a stack), as clearing the MAC address entry/ies may not clear the fdb corruption.

It's my understanding that the corruption is the cause of issue (a), unicast flooding, and perhaps items (b) and (c).

And yes, it took four months for the case to get this far.

Frank

-----Original Message-----
From: foundry-nsp [mailto:[email protected]] On Behalf Of Frank Bulk
Sent: Friday, October 03, 2014 6:09 PM
To: [email protected]
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running 7.4.00f switch code

I'm looking to find more examples of the issue in the field. I'd appreciate it if anyone with an ICX6610 stack that has a largish MAC address table, and who would be willing to have the stack's MAC address table audited by a Perl script, would contact me off-list (you would review and run the script yourself).

Thanks in advance,

Frank

-----Original Message-----
From: foundry-nsp [mailto:[email protected]] On Behalf Of Frank Bulk (iname.com)
Sent: Wednesday, October 01, 2014 10:50 AM
To: [email protected]
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running 7.4.00f switch code

Just an update on this issue: we finally got a Brocade developer to personally interact with our ICX 6610 stack. After reviewing the issue and poking through some 'dm' output, the developer eventually identified the reason for the unicast flooding issue: not all four packet processor CPU cores share the same MAC address value for a certain hardware index.

An example follows my signature. MAC address entry 0090.5E14.6182 has a hardware index of 25864 (you can find the HW index from the "show mac" command). Checking each CPU core on each shelf using the 'dm' command (I used the rcon command to remote to the standby shelf), you can see that on the active shelf the MAC address for cores 1 thru 3 is incorrect. All the cores on the standby shelf have the correct MAC address. Apparently the standby shelf learns its MAC address entries from the active shelf.

In our case frames are predominately entering core 0 (each interface is tied to one of the four cores), but they flood out the interfaces that are in other cores on that shelf because there is no matching MAC address for that traffic. The developer is looking into why cores 1 thru 3 sometimes don't have the right values.

I wrote a script to check every single MAC address in the MAC address table (takes about 2.5 seconds) and found five such inconsistencies out of ~5,500 MAC addresses. When I ran the script again overnight I found one more inconsistency of a different nature, where one of the standby shelf's cores had a MAC address of all zeroes. I checked another ICX 6610 stack that has just ~550 MAC addresses and found no inconsistencies.
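For anyone who wants to try the same kind of per-core comparison, here is a minimal Python sketch. It is not the script referred to above, just an illustration. The function names decode_entry and mismatched_cores are made up for this example. It assumes the four "Data:" words in the 'dm' output are ordered least-significant word first, with bit 0 being the lowest bit of the first word; that assumption reproduces the VID and MacAddr values the switch prints for hardware index 25864, but it is inferred from the output, not taken from Brocade documentation.

```python
from typing import Dict, List


def decode_entry(words: List[int]) -> Dict[str, object]:
    """Decode a few fields from the four 32-bit 'Data:' words of one entry."""
    # Assumption: word 0 holds bits 0-31, word 1 holds bits 32-63, and so on.
    value = 0
    for i, word in enumerate(words):
        value |= word << (32 * i)

    def bits(lo: int, hi: int) -> int:
        # Extract the inclusive bit range [lo:hi], as labelled in the dm output.
        return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

    mac = f"{bits(17, 64):012X}"
    return {
        "valid": bool(bits(0, 0)),
        "skip": bool(bits(1, 1)),
        "age": bool(bits(2, 2)),
        "vid": bits(5, 16),
        "mac": f"{mac[0:4]}.{mac[4:8]}.{mac[8:12]}",
    }


def mismatched_cores(expected_mac: str, per_core: Dict[int, List[int]]) -> List[int]:
    """Return the core IDs whose entry does not hold the expected MAC address."""
    wanted = expected_mac.upper()
    return sorted(
        core for core, words in per_core.items()
        if decode_entry(words)["mac"] != wanted
    )


# Data words for hardware index 25864 on the active shelf, cores 0 thru 3.
active_shelf = {
    0: [0xC3044B01, 0x0120BC28, 0x0000E004, 0x00000000],
    1: [0xE1B40143, 0x38FDCA92, 0x00000004, 0x00000000],
    2: [0xE1B40143, 0x38FDCA92, 0x00000004, 0x00000000],
    3: [0xE1B40143, 0x38FDCA92, 0x00000004, 0x00000000],
}

print(mismatched_cores("0090.5E14.6182", active_shelf))  # prints [1, 2, 3]
```

Collecting the Data words from each core (for example by scraping the 'dm' output over a telnet or rconsole session) is left out here, since that part depends on how you reach the switch.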
Frank

dm pp-dev 2 chow-diags core-id 0 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: C3044B01 0120BC28 0000E004 00000000
Valid [0]= True
Skip [1]= False
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 600
MacAddr [17:64]= 0090.5E14.6182
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x007
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3

telnet@ICX6610-24 Switc dm pp-dev 2 chow-diags core-id 1 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: E1B40143 38FDCA92 00000004 00000000
Valid [0]= True
Skip [1]= True
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 10
MacAddr [17:64]= 1C7E.E549.70DA
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x000
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x0, user_defined=0, trunk=no, port=0

telnet@ICX6610-24 Switc dm pp-dev 2 chow-diags core-id 2 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: E1B40143 38FDCA92 00000004 00000000
Valid [0]= True
Skip [1]= True
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 10
MacAddr [17:64]= 1C7E.E549.70DA
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x000
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x0, user_defined=0, trunk=no, port=0

telnet@ICX6610-24 Switc dm pp-dev 2 chow-diags core-id 3 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: E1B40143 38FDCA92 00000004 00000000
Valid [0]= True
Skip [1]= True
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 10
MacAddr [17:64]= 1C7E.E549.70DA
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x000
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x0, user_defined=0, trunk=no, port=0

telnet@ICX6610-24 Switc dm pp-dev 0 chow-diags core-id 0 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: C3044B01 0120BC28 0000E004 00000000
Valid [0]= True
Skip [1]= False
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 600
MacAddr [17:64]= 0090.5E14.6182
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x007
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3

[STBY]rconsole-1@ICX6610-24 Switc dm pp-dev 0 chow-diags core-id 1 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: C3044B01 0120BC28 0000E004 00000000
Valid [0]= True
Skip [1]= False
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 600
MacAddr [17:64]= 0090.5E14.6182
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x007
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3

[STBY]rconsole-1@ICX6610-24 Switc dm pp-dev 0 chow-diags core-id 2 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: C3044B05 0120BC28 0000E004 00000000
Valid [0]= True
Skip [1]= False
age [2]= True
EntryType [3:4]= MAC addr
VID [5:16]= 600
MacAddr [17:64]= 0090.5E14.6182
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x007
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3

[STBY]rconsole-1@ICX6610-24 Switc dm pp-dev 0 chow-diags core-id 3 read-mst 25864
cli_ch5_core_based_dm_pp_read_hw_mst
cli_ch5_core_based_dm_pp_read_hw_mst_idx
Data: C3044B01 0120BC28 0000E004 00000000
Valid [0]= True
Skip [1]= False
age [2]= False
EntryType [3:4]= MAC addr
VID [5:16]= 600
MacAddr [17:64]= 0090.5E14.6182
DevId [65:69]= 2
SrcId [70:74]= 0
Bits(24:13) [77:88]= 0x007
static [89]= False
multiple [90]= False
DA-Cmd [91:93]= FORWARD
SA-Cmd [94:96]= FORWARD
DARoute [97]= False
StormPrevention [98]= False
SAQosProfile ID [99:101]= 0
DAQosProfile ID [102:104]= 0
MirrorToAnalyze [105]= False
Bits(24:13)=0x7, user_defined=0, trunk=yes, port=3

[STBY]rconsole-1@ICX6610-24 Switc

-----Original Message-----
From: foundry-nsp [mailto:[email protected]] On Behalf Of Frank Bulk
Sent: Friday, September 05, 2014 9:16 PM
To: [email protected]
Subject: Re: [f-nsp] MAC address table issues with ICX 6610 stack running 7.4.00f switch code

We replicated the issue with BTAC and continue troubleshooting. BTAC believes it's a packet processor issue -- we'll be more sure when we flip the active member of the cross-stack LAG to the other stack member.
Frank

-----Original Message-----
From: foundry-nsp [mailto:[email protected]] On Behalf Of Frank Bulk
Sent: Saturday, August 30, 2014 3:01 PM
To: [email protected]
Subject: [f-nsp] MAC address table issues with ICX 6610 stack running 7.4.00f switch code

A few weeks ago a customer alerted us to a packet loss issue that we eventually traced down to a loop in a LAG on some access gear (not Brocade gear). In the process of troubleshooting and looking at MAC address tables on the intermediate gear we connected the WAN interface of a simple consumer-grade router to ethernet 1/1/23 of ICX 6610 #1 so we had a pingable host. What I noticed, when graphing that port, is that we were seeing a lot of traffic egressing 1/1/23 -- anywhere from 200 kbps to 15 Mbps over the day! That seemed like a lot more than the usual amount of broadcast traffic on this 2,500-host VLAN.

This is an ICX 6610 stack running 7.4 switch code with almost all connections being a cross-stack LAG.

Curious as to what was going on, I packet captured the port's output and discovered a lot of unicast traffic flooding out of 1/1/23. By doing some troubleshooting I uncovered three different situations:

a) there are times the ICX 6610 lists the correct MAC address and port in its table for a host yet it still floods (some) unicast traffic for that host out of 1/1/23.

b) despite having a static MAC address entry in the switch pointing to a LAG port, the ICX6610 floods some traffic to that host out of 1/1/23 instead of out of the statically specified LAG port.

c) there are times the ICX 6610 has no MAC address table entry for a host, even though it should have learned it because it's getting traffic from that host on the LAG. Entering a static MAC address and then removing it then results in the switch learning it dynamically!

The only traffic I should be seeing out of 1/1/23 is spanning tree, ARP, broadcast traffic, and any traffic for a host that has not yet been learned by the switch.
We opened up two cases with Brocade TAC and I was able to confirm one of the two items with the tech, but since 7.4.00a had some MAC address table issues (Defect ID 437017 is one of them), rather than troubleshoot extensively we decided to start with a more current release and upgraded to 7.4.00f Thursday morning .... but I have already re-confirmed items (a) and (c).

Has anyone else seen this issue? I don't think you would really notice it unless you did some packet captures and looked for it.

Frank

_______________________________________________
foundry-nsp mailing list
[email protected]
http://puck.nether.net/mailman/listinfo/foundry-nsp
