Since this list likes to speculate with few facts on a regular basis (and I'll admit to being as guilty as anyone), I'll throw this one out for opinions:

We were seeing very odd behavior on a Cogent circuit following a software upgrade to tol01.atlas. Two traceroutes:

mark@angola-gw> traceroute 74.125.226.6
traceroute to 74.125.226.6 (74.125.226.6), 30 hops max, 40 byte packets
 1  * * gi1-1.ccr01.tol01.atlas.cogentco.com (38.104.148.5)  110.315 ms
 2  te4-2.ccr01.sbn01.atlas.cogentco.com (154.54.7.154)  139.520 ms  196.910 ms  5.728 ms
 3  * * *
 4  * te0-5-0-5.ccr21.ord03.atlas.cogentco.com (154.54.44.174)  8.310 ms te0-0-0-7.ccr21.ord03.atlas.cogentco.com (154.54.25.70)  8.752 ms
 5  te0-0-0-0.ccr22.ord03.atlas.cogentco.com (154.54.24.214)  8.983 ms te0-1-0-0.ccr22.ord03.atlas.cogentco.com (66.28.4.66)  7.948 ms *
 6  * * te-9-1.car4.Chicago1.Level3.net (4.68.127.129)  26.127 ms
 7  GOOGLE-INC.car4.Chicago1.Level3.net (4.71.100.22)  38.132 ms  25.120 ms *
 8  * * 209.85.254.122 (209.85.254.122)  24.539 ms
 9  * 72.14.237.130 (72.14.237.130)  26.134 ms 72.14.237.108 (72.14.237.108)  25.021 ms
     MPLS Label=666803 CoS=4 TTL=1 S=1
10  216.239.46.161 (216.239.46.161)  31.816 ms  35.702 ms  32.249 ms
11  72.14.233.142 (72.14.233.142)  32.897 ms * *
12  * yyz06s05-in-f6.1e100.net (74.125.226.6)  33.319 ms *

and a ping over the same path:

--- www.l.google.com ping statistics ---
675 packets transmitted, 323 packets received, 52.1% packet loss
round-trip min/avg/max/stddev = 12.834/28.831/129.743/28.987 ms

and at the same time:

mark@angola-gw> traceroute 38.100.128.10
traceroute to 38.100.128.10 (38.100.128.10), 30 hops max, 40 byte packets
 1  gi1-1.ccr01.tol01.atlas.cogentco.com (38.104.148.5)  4.445 ms  1.841 ms  1.713 ms
 2  te7-7.ccr02.cle04.atlas.cogentco.com (154.54.5.230)  5.318 ms te3-2.ccr02.cle04.atlas.cogentco.com (154.54.28.86)  4.755 ms te7-7.ccr02.cle04.atlas.cogentco.com (154.54.5.230)  4.982 ms
 3  te4-2.ccr01.pit02.atlas.cogentco.com (154.54.30.10)  7.997 ms te3-2.ccr01.pit02.atlas.cogentco.com (154.54.30.6)  7.736 ms te4-2.ccr01.pit02.atlas.cogentco.com (154.54.30.10)  8.177 ms
 4  te0-0-0-5.mpd21.dca01.atlas.cogentco.com (154.54.40.81)  17.197 ms te0-0-0-5.ccr22.dca01.atlas.cogentco.com (154.54.30.230)  16.907 ms te0-0-0-5.mpd21.dca01.atlas.cogentco.com (154.54.40.81)  17.008 ms
 5  te0-1-0-0.mpd22.dca01.atlas.cogentco.com (154.54.2.193)  17.358 ms te0-0-0-0.mpd22.dca01.atlas.cogentco.com (154.54.31.38)  17.196 ms te0-1-0-0.mpd22.dca01.atlas.cogentco.com (154.54.2.193)  18.690 ms
 6  te4-2.mpd01.iad03.atlas.cogentco.com (154.54.29.122)  17.885 ms * 18.537 ms
 7  cogentco.com (38.100.128.10)  17.836 ms !<10>  17.918 ms !<10>  17.833 ms !<10>

--- 38.100.128.10 ping statistics ---
236 packets transmitted, 236 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.717/27.942/128.011/12.236 ms
sh-3.2#
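
For anyone wanting to recheck the two ping summaries above from the raw counters, here is a minimal Python sketch. The regular expression and the function name `loss_percent` are illustrative assumptions, not part of any router tooling; the summary strings are copied from the output above.

```python
import re

# Matches the counters in a BSD/Junos-style ping summary line (assumed format).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def loss_percent(summary_line: str) -> float:
    """Recompute packet loss (in percent) from the transmitted/received counters."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    return 100.0 * (sent - received) / sent


# Summary lines copied from the two tests in this post.
google = loss_percent("675 packets transmitted, 323 packets received, 52.1% packet loss")
cogent = loss_percent("236 packets transmitted, 236 packets received, 0.0% packet loss")
print(f"{google:.1f}% vs {cogent:.1f}%")
```

The recomputed figures agree with what ping reported: roughly half the packets lost toward the Google address and none toward 38.100.128.10 over the same first hop.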

Works perfectly. There is no asymmetric routing in this scenario (only 1 BGP peer running during this test), and it is not due to traffic congestion. Initial speculation over the dropped packets in the trace to 74.125.226.6 was ICMP deprioritization, but the results are too consistent for that to make sense (I have dozens of traceroutes to the same destination - they all appear similar).
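
One way to put a number on "too consistent" is to tally unanswered probes per hop across many saved traces. The following is a rough Python sketch, assuming each hop occupies a single line that begins with its hop number; `lost_probes` and `tally` are hypothetical helper names, and the sample lines are the first three hops of the trace to 74.125.226.6.

```python
import re
from collections import Counter

# A hop line starts with the hop number; headers and MPLS label lines do not.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def lost_probes(trace_text: str) -> Counter:
    """Map hop number -> count of '*' (unanswered) probes in one trace."""
    lost = Counter()
    for line in trace_text.splitlines():
        match = HOP.match(line)
        if match is None:
            continue  # skip header, MPLS label, and blank lines
        hop, rest = int(match.group(1)), match.group(2)
        lost[hop] += rest.split().count("*")
    return lost


def tally(traces) -> Counter:
    """Sum per-hop losses over several traces (update() keeps zero counts)."""
    total = Counter()
    for trace in traces:
        total.update(lost_probes(trace))
    return total


sample = """traceroute to 74.125.226.6 (74.125.226.6), 30 hops max, 40 byte packets
 1  * * gi1-1.ccr01.tol01.atlas.cogentco.com (38.104.148.5)  110.315 ms
 2  te4-2.ccr01.sbn01.atlas.cogentco.com (154.54.7.154)  139.520 ms  196.910 ms  5.728 ms
 3  * * *
"""
lost = lost_probes(sample)
print(sorted(lost.items()))
```

If the drops were only rate-limited ICMP replies from individual routers, the lossy hops would be expected to vary from trace to trace; a tally that concentrates on the same hops every time, together with end-to-end ping loss, points elsewhere.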

I realize there is a long history of Cogent/L3 ugliness but I'm pretty sure that this issue has nothing to do with that subject.

Traceroutes and pings from the control plane of tol01.atlas sourced from 38.104.148.5 do not show any odd behavior. Inbound traffic (to us) is not affected by this. Our workaround while resolving this issue was to change local-pref on the affected prefixes to send traffic out our other providers.

The issue started after a software upgrade to tol01.atlas and resolved after a (reported) reboot of tol01.atlas.

The question is: How does a router break in this manner? It appears to unintentionally be doing something different with traffic based on the source address, not the destination address. I realize this can be done intentionally - but that is not the case here (unless somebody isn't telling me something).



--
Mark Radabaugh
Amplex

m...@amplex.net  419.837.5015

