Since this list likes to speculate with few facts on a regular basis
(and I'll admit to being as guilty as anyone), I'm throwing this one out
for opinions:
We were seeing very odd behavior on a Cogent circuit following a
software upgrade to tol01.atlas. Two traceroutes:
mark@angola-gw> traceroute 74.125.226.6
traceroute to 74.125.226.6 (74.125.226.6), 30 hops max, 40 byte packets
1 * * gi1-1.ccr01.tol01.atlas.cogentco.com (38.104.148.5) 110.315 ms
2 te4-2.ccr01.sbn01.atlas.cogentco.com (154.54.7.154) 139.520 ms 196.910 ms 5.728 ms
3 * * *
4 * te0-5-0-5.ccr21.ord03.atlas.cogentco.com (154.54.44.174) 8.310 ms te0-0-0-7.ccr21.ord03.atlas.cogentco.com (154.54.25.70) 8.752 ms
5 te0-0-0-0.ccr22.ord03.atlas.cogentco.com (154.54.24.214) 8.983 ms
te0-1-0-0.ccr22.ord03.atlas.cogentco.com (66.28.4.66) 7.948 ms *
6 * * te-9-1.car4.Chicago1.Level3.net (4.68.127.129) 26.127 ms
7 GOOGLE-INC.car4.Chicago1.Level3.net (4.71.100.22) 38.132 ms 25.120 ms *
8 * * 209.85.254.122 (209.85.254.122) 24.539 ms
9 * 72.14.237.130 (72.14.237.130) 26.134 ms 72.14.237.108 (72.14.237.108) 25.021 ms
MPLS Label=666803 CoS=4 TTL=1 S=1
10 216.239.46.161 (216.239.46.161) 31.816 ms 35.702 ms 32.249 ms
11 72.14.233.142 (72.14.233.142) 32.897 ms * *
12 * yyz06s05-in-f6.1e100.net (74.125.226.6) 33.319 ms *
and a ping over the same path:
--- www.l.google.com ping statistics ---
675 packets transmitted, 323 packets received, 52.1% packet loss
round-trip min/avg/max/stddev = 12.834/28.831/129.743/28.987 ms
and at the same time:
mark@angola-gw> traceroute 38.100.128.10
traceroute to 38.100.128.10 (38.100.128.10), 30 hops max, 40 byte packets
1 gi1-1.ccr01.tol01.atlas.cogentco.com (38.104.148.5) 4.445 ms 1.841 ms 1.713 ms
2 te7-7.ccr02.cle04.atlas.cogentco.com (154.54.5.230) 5.318 ms
te3-2.ccr02.cle04.atlas.cogentco.com (154.54.28.86) 4.755 ms
te7-7.ccr02.cle04.atlas.cogentco.com (154.54.5.230) 4.982 ms
3 te4-2.ccr01.pit02.atlas.cogentco.com (154.54.30.10) 7.997 ms
te3-2.ccr01.pit02.atlas.cogentco.com (154.54.30.6) 7.736 ms
te4-2.ccr01.pit02.atlas.cogentco.com (154.54.30.10) 8.177 ms
4 te0-0-0-5.mpd21.dca01.atlas.cogentco.com (154.54.40.81) 17.197 ms
te0-0-0-5.ccr22.dca01.atlas.cogentco.com (154.54.30.230) 16.907 ms
te0-0-0-5.mpd21.dca01.atlas.cogentco.com (154.54.40.81) 17.008 ms
5 te0-1-0-0.mpd22.dca01.atlas.cogentco.com (154.54.2.193) 17.358 ms
te0-0-0-0.mpd22.dca01.atlas.cogentco.com (154.54.31.38) 17.196 ms
te0-1-0-0.mpd22.dca01.atlas.cogentco.com (154.54.2.193) 18.690 ms
6 te4-2.mpd01.iad03.atlas.cogentco.com (154.54.29.122) 17.885 ms * 18.537 ms
7 cogentco.com (38.100.128.10) 17.836 ms !<10> 17.918 ms !<10> 17.833 ms !<10>
--- 38.100.128.10 ping statistics ---
236 packets transmitted, 236 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.717/27.942/128.011/12.236 ms
sh-3.2#
That path works perfectly. There is no asymmetric routing in this
scenario (only one BGP peer was running during this test), and the loss
is not due to traffic congestion. Initial speculation about the dropped
packets in the trace to 74.125.226.6 was ICMP deprioritization, but the
results are too consistent for that to make sense (I have dozens of
traceroutes to the same destination, and they all look similar).
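The loss figure itself is easy to sanity-check. Control-plane ICMP rate limiting normally affects only the replies the routers generate themselves (the `*` entries at intermediate hops); it would not usually account for loss on pings to the far-end host, and the ping to 38.100.128.10 over the same circuit lost nothing. The minimal Python sketch that follows just recomputes end-to-end loss from a BSD-style ping summary line. The regex assumes the exact wording shown in this post and is not taken from any ping documentation.

```python
import re

# Matches the BSD-style ping summary line seen above, e.g.
# "675 packets transmitted, 323 packets received, 52.1% packet loss"
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, ([\d.]+)% packet loss"
)


def parse_ping_summary(line: str) -> tuple[int, int, float]:
    """Return (transmitted, received, reported_loss_pct) from a ping summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def loss_pct(transmitted: int, received: int) -> float:
    """Recompute end-to-end loss, rounded to the one decimal place ping prints."""
    if transmitted == 0:
        return 0.0
    return round(100.0 * (transmitted - received) / transmitted, 1)


if __name__ == "__main__":
    for line in (
        "675 packets transmitted, 323 packets received, 52.1% packet loss",
        "236 packets transmitted, 236 packets received, 0.0% packet loss",
    ):
        tx, rx, reported = parse_ping_summary(line)
        print(f"{tx} sent, {rx} received: {loss_pct(tx, rx)}% (ping reported {reported}%)")
```

Run as a script, it prints the recomputed loss next to the value ping reported for each of the two summaries in this post (52.1% and 0.0%).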
I realize there is a long history of Cogent/L3 ugliness, but I'm pretty
sure this issue has nothing to do with that.
Traceroutes and pings from the control plane of tol01.atlas sourced from
38.104.148.5 do not show any odd behavior. Inbound traffic (to us) is
not affected by this. Our workaround while resolving this issue was to
change local-pref on the affected prefixes to send traffic out our other
providers.
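The local-pref workaround relies on BGP preferring the route with the higher LOCAL_PREF, which is compared ahead of AS-path length, MED and the other usual tie-breakers. What follows is a simplified Python sketch of just that one comparison, assuming hypothetical route objects: the prefix and the second provider's next hop are placeholders, 100 is the common default LOCAL_PREF, and the rest of best-path selection is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    peer: str
    local_pref: int = 100  # common default LOCAL_PREF on many BGP implementations


def best_by_local_pref(candidates: list[Route]) -> Route:
    """Pick the route with the highest LOCAL_PREF.

    Simplified: real BGP best-path selection has further tie-breakers
    (AS-path length, origin, MED, ...); here a tie keeps the first candidate.
    """
    return max(candidates, key=lambda r: r.local_pref)


if __name__ == "__main__":
    prefix = "74.125.226.0/24"  # hypothetical affected prefix
    via_cogent = Route(prefix, "38.104.148.5", "cogent", local_pref=90)
    via_other = Route(prefix, "192.0.2.1", "other-provider")  # placeholder peer
    print(best_by_local_pref([via_cogent, via_other]).peer)  # -> other-provider
```

Lowering LOCAL_PREF on the affected routes or raising it on the alternatives has the same effect in this sketch; which one is used in practice is a policy choice.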
The issue started after a software upgrade to tol01.atlas and resolved
after a (reported) reboot of tol01.atlas.
The question is: how does a router break in this manner? It appears to
be unintentionally treating traffic differently based on the source
address rather than the destination address. I realize this can be
done intentionally, but that is not the case here (unless somebody
isn't telling me something).
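To make the distinction concrete: ordinary forwarding is a longest-prefix match on the destination address alone, while policy-based routing (the intentional case) lets a rule on the source address override that lookup. The following is a minimal Python sketch using the standard `ipaddress` module. The tables, documentation prefixes and next-hop names are hypothetical, it handles IPv4 only, and it is not a model of what tol01.atlas was actually doing.

```python
import ipaddress

# Hypothetical forwarding table (IPv4 only): destination prefix -> next hop.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-a",
    ipaddress.ip_network("198.51.100.0/24"): "upstream-b",
}

# Hypothetical policy-based routing rules: source prefix -> next hop.
POLICY = {
    ipaddress.ip_network("203.0.113.0/24"): "upstream-c",
}


def lookup_destination(dst: str) -> str:
    """Normal forwarding: longest-prefix match on the destination address only."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]  # default route always matches IPv4
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]


def lookup_with_policy(src: str, dst: str) -> str:
    """Policy-based routing: a matching source rule overrides the destination lookup."""
    src_addr = ipaddress.ip_address(src)
    for net, next_hop in POLICY.items():
        if src_addr in net:
            return next_hop
    return lookup_destination(dst)


if __name__ == "__main__":
    # Same destination, different sources, different next hops.
    print(lookup_with_policy("192.0.2.10", "74.125.226.6"))    # -> upstream-a
    print(lookup_with_policy("203.0.113.10", "74.125.226.6"))  # -> upstream-c
```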
--
Mark Radabaugh
Amplex
m...@amplex.net 419.837.5015