Constantine,

Please mail me off-list if Init7 can be of any help in resolving the case.
--
Fredy Kuenzler
Init7 (Switzerland) Ltd.
St.-Georgen-Strasse 70
CH-8400 Winterthur
Switzerland
http://www.init7.net/

> On 30.11.2013 at 08:30, "Constantine A. Murenin" <muren...@gmail.com> wrote:
>
> Dear NANOG@,
>
> I'm not exactly sure how else I can get he.net's attention. I've been
> experiencing congestion issues between my dedicated server and Indiana for
> a couple of months now, all due to he.net's poor transit, as it turns out.
> The issue was complicated by the fact that the routes are asymmetric, so
> the traffic loss appears to be happening somewhere where there is none at
> all.
>
> I will just provide the data, and people can draw their own conclusions;
> any insights are welcome.
>
> During all of this, since late September 2013, all 4 networks involved
> have been contacted -- hetzner, init7, he.net, indiana; all except for
> he.net have responded and done troubleshooting.
>
> After I pressed he.net about the lack of any kind of response, all they did
> was ask for a customer number, and that was back in September. I have not
> heard from their NOC@ since, and my requests have gone unanswered, apart
> from the "we have received your request" autoreply.
>
> Interestingly enough, only some of their Europe-to-US routes are blatantly
> congested and have very obvious packet loss (often making ssh unusable),
> whereas others appear to be doing just fine (at least, they are not losing
> packets or showing jitter and increased latency). For example, IPv6 routes
> don't appear to be affected. IPv4 addresses in North America that are
> announced directly from AS6939 (e.g. Linode in Fremont) don't appear to be
> affected, either. But the multi-homed indiana.edu and wiscnet.net are
> affected. The single-homed ntp1.yycix.ca is affected, too. Probably other
> customers are affected as well.
>
> Where's the end to this?
>
> Or is the ongoing 0.5+% traffic loss, and the 140+ms avg latency on a 114ms
> route, with random spikes and jitter in certain hours of the day (generally
> around midnight ET), every day for several weeks or even months, an
> acceptable practice?
>
>
> From hetzner.de through he.net:
>
> Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
> Fri Nov 29 21:06:17 PST 2013
> HOST: Cns???????                                  Snt Rcv Loss%  Best Gmean   Avg  Wrst StDev
>  1.|-- static.??.???.4.46.clients.your-server.de  600 600  0.0%   0.5   1.0   1.3   4.9   1.1
>  2.|-- hos-tr1.juniper1.rz13.hetzner.de           600 600  0.0%   0.1   0.2   1.9  66.0   7.6
>  3.|-- core21.hetzner.de                          600 600  0.0%   0.2   0.2   0.2   5.8   0.4
>  4.|-- core22.hetzner.de                          600 600  0.0%   0.2   0.2   0.2  19.4   1.2
>  5.|-- core1.hetzner.de                           600 600  0.0%   4.8   4.8   4.8  13.2   0.7
>  6.|-- juniper1.ffm.hetzner.de                    600 600  0.0%   4.8   4.8   4.8  27.4   1.4
>  7.|-- 30gigabitethernet1-3.core1.ams1.he.net     600 600  0.0%  11.2  14.0  14.6  48.7   4.5
>  8.|-- 10gigabitethernet1-4.core1.lon1.he.net     600 600  0.0%  18.2  19.6  19.9  53.9   4.1
>  9.|-- 10gigabitethernet10-4.core1.nyc4.he.net    600 599  0.2%  87.0 116.1 116.7 145.7  12.4
> 10.|-- 100gigabitethernet7-2.core1.chi1.he.net    600 597  0.5% 106.6 135.4 136.1 192.0  13.3
> 11.|-- ???                                        600   0 100.0   0.0   0.0   0.0   0.0   0.0
> 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net 600 594  1.0% 113.3 139.3 139.7 166.1  11.4
> 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu       600 596  0.7% 113.2 139.8 140.3 177.3  12.0
> 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu            600 595  0.8% 114.2 140.1 140.6 183.2  11.8
> 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu           600 597  0.5% 114.3 140.3 140.8 165.0  11.5
> 16.|-- ????c????????.indiana.edu                  600 597  0.5% 114.7 140.7 141.1 161.6  11.4
> Fri Nov 29 21:08:52 PST 2013
>
> Cns# unbuffer hping --icmp-ts ????c????????.indiana.edu | \
> perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
>  0  143.5  144 =  87 + 57
>  1  125.5  126 =  69 + 57
>  2  143.6  144 =  87 + 57
>  3  157.9  158 = 102 + 56
>  4  122.0  122 =  66 + 56
>  5  141.6  142 =  85 + 57
>  6  132.2  133 =  76 + 57
>  7  146.2  146 =  89 + 57
>  8  145.1  145 =  88 + 57
>  9  119.9  119 =  63 + 56
> 10  132.7  132 =  75 + 57
> 11  140.1  140 =  83 + 57
> 12  151.0  151 =  94 + 57
> 13  152.6  152 =  96 + 56
> 14  129.1  129 =  72 + 57
> 15  128.5  128 =  71 + 57
> ^C
>
>
> Single-homed at he.net:
>
> Cns# date ; mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ntp1.yycix.ca ; date
> Fri Nov 29 21:16:14 PST 2013
> HOST: Cns???????                 Snt Rcv Loss%  Best Gmean   Avg  Wrst StDev
>  1.|-- static.??.???.4.46.client 600 600  0.0%   0.5   1.0   1.3  10.2   1.2
>  2.|-- hos-tr4.juniper2.rz13.het 600 600  0.0%   0.1   0.2   2.0 153.9   9.8
>  3.|-- core22.hetzner.de         600 600  0.0%   0.2   0.2   0.2  10.6   0.6
>  4.|-- core1.hetzner.de          600 600  0.0%   4.8   4.8   4.8  16.4   0.9
>  5.|-- juniper1.ffm.hetzner.de   600 600  0.0%   4.8   4.8   4.8  36.4   1.5
>  6.|-- 30gigabitethernet1-3.core 600 600  0.0%  11.2  13.5  14.0  36.6   4.3
>  7.|-- 10gigabitethernet1-4.core 600 600  0.0%  18.0  21.5  21.8  43.1   4.0
>  8.|-- 10gigabitethernet10-4.cor 600 597  0.5%  93.2 128.0 128.3 157.5   8.9
>  9.|-- 10gigabitethernet1-2.core 600 596  0.7% 103.1 139.4 139.6 157.5   8.2
> 10.|-- 10gigabitethernet3-1.core 600 597  0.5% 128.2 164.9 165.1 181.9   8.2
> 11.|-- 10gigabitethernet1-1.core 600 593  1.2% 138.7 175.9 176.1 192.6   7.8
> 12.|-- sebo-systems-inc.gigabite 600 597  0.5% 139.0 176.4 176.5 187.5   6.9
> 13.|-- ???                       600   0 100.0   0.0   0.0   0.0   0.0   0.0
> 14.|-- ntp1.yycix.ca             600 597  0.5% 141.0 176.9 177.0 186.9   6.9
> Fri Nov 29 21:18:32 PST 2013
> Cns# traceroute -A ntp1.yycix.ca
> traceroute to ntp1.yycix.ca (192.75.191.6), 64 hops max, 40 byte packets
>  1  static.??.???.4.46.clients.your-server.de (46.4.???.??) [AS24940]  0.664 ms  0.648 ms  0.453 ms
>  2  hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  23.985 ms
>     hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33) [AS24940]  0.234 ms
>     hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65) [AS24940]  0.238 ms
>  3  core22.hetzner.de (213.239.245.121) [AS24940]  0.238 ms
>     core21.hetzner.de (213.239.245.81) [AS24940]  0.234 ms  0.236 ms
>  4  core1.hetzner.de (213.239.245.177) [AS24940]  4.811 ms  4.809 ms
>     core22.hetzner.de (213.239.245.162) [AS24940]  0.248 ms
>  5  core1.hetzner.de (213.239.245.177) [AS24940]  4.831 ms
>     juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.842 ms  4.826 ms
>  6  juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.857 ms  4.864 ms
>     30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200]  11.233 ms
>  7  10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  19.869 ms
>     30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200]  18.420 ms  11.255 ms
>  8  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  115.845 ms  101.875 ms
>     10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  17.249 ms
>  9  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  138.302 ms
>     10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939]  120.449 ms  139.730 ms
> 10  10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939]  134.755 ms  104.661 ms
>     10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  167.282 ms
> 11  10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939]  139.310 ms
>     10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  155.983 ms  155.910 ms
> 12  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  138.703 ms  178.530 ms
>     10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939]  172.423 ms
> 13  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  158 ms * *
> 14  * * ntp1.yycix.ca (192.75.191.6) [AS53339]  181.433 ms
> Cns#
> Cns#
> Cns#
> Cns# unbuffer hping --icmp-ts ntp1.yycix.ca | perl -ne \
> 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
>  0  165.0  165 =  95 + 70
>  1  156.2  156 =  86 + 70
>  2  178.9  179 = 109 + 70
>  3  181.0  181 = 111 + 70
>  4  178.3  179 = 108 + 71
>  5  163.8  164 =  94 + 70
>  6  175.7  176 = 106 + 70
>  7  173.9  174 = 104 + 70
>  8  172.6  173 = 103 + 70
>  9  163.5  164 =  94 + 70
> 10  181.8  182 = 112 + 70
> 11  161.9  162 =  92 + 70
> 12  183.1  184 = 113 + 71
> 13  174.5  174 = 104 + 70
> 14  181.8  181 = 111 + 70
> 15  181.7  181 = 111 + 70
> ^C
> Cns#
>
>
> From indiana.edu to hetzner.de; notice that the mtr by itself gives a false
> impression of a traffic loss at init7, whereas in reality, it's the reverse
> path through he.net that's causing the loss, as hping confirms:
>
> m: {5134} date ; sudo mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ?????? ; date
> Sat Nov 30 00:36:27 EST 2013
> HOST: ????c????????.indiana.edu  Snt Rcv Loss%  Best Gmean   Avg  Wrst StDev
>  1.|-- 129.79.???.?              600 600  0.0%   0.4   0.7   0.9  24.7   1.5
>  2.|-- ae-13.0.br2.bldc.net.uits 600 600  0.0%   0.5   0.7   0.9  22.6   1.8
>  3.|-- ae-0.0.br2.ictc.net.uits. 600 600  0.0%   1.4   1.7   1.8  20.2   1.6
>  4.|-- xe-0-1-0.11.rtr.ictc.indi 600 600  0.0%   1.4   2.1   3.8  66.5   8.1
>  5.|-- 64.57.21.13               600 600  0.0%   6.0   7.2   8.4  72.9   8.0
>  6.|-- xe-2-2-0.0.ny0.tr-cps.int 600 600  0.0%  32.3  33.9  34.4  81.0   6.9
>  7.|-- paix-nyc.init7.net        600 600  0.0%  32.5  35.3  35.5  44.7   3.8
>  8.|-- r1lon1.core.init7.net     600 599  0.2% 100.1 104.7 104.9 146.5   7.5
>  9.|-- r1nue1.core.init7.net     600 599  0.2% 114.6 115.7 115.7 125.4   2.2
> 10.|-- gw-hetzner.init7.net      600 594  1.0% 112.4 141.3 142.4 241.9  18.2
> 11.|-- core12.hetzner.de         600 468 22.0% 112.2 142.7 144.0 203.4  20.3
> 12.|-- core21.hetzner.de         600 202 66.3% 114.4 143.7 145.0 204.1  20.1
> 13.|-- juniper1.rz13.hetzner.de  600 594  1.0% 114.7 141.4 142.1 212.2  14.3
> 14.|-- hos-tr2.ex3k11.rz13.hetzn 600 599  0.2% 113.8 123.9 125.5 218.2  21.8
> 15.|-- static.88-198-??-??.clien 599 592  1.2% 114.6 137.2 137.9 167.6  13.2
> 0.244u 1.766s 1:05.52 3.0% 0+0k 0+1io 0pf+0w
> Sat Nov 30 00:37:32 EST 2013
>
> m: {5137} sudo script -q /dev/null hping3 --icmp-ts 88.198.??.?? | perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\r\n"; }'
>  0  131.3  131 =  57 + 74
>  1  122.4  122 =  56 + 66
>  2  122.6  123 =  56 + 67
>  3  127.6  128 =  57 + 71
>  4  146.5  147 =  57 + 90
>  5  139.8  140 =  56 + 84
>  6  131.0  131 =  57 + 74
>  7  134.6  135 =  57 + 78
>  8  137.7  138 =  57 + 81
>  9  148.1  148 =  57 + 91
> 10  141.2  142 =  57 + 85
> 11  146.4  146 =  56 + 90
> 12  153.6  154 =  57 + 97
> 13  149.4  150 =  57 + 93
> 14  120.2  121 =  57 + 64
> 15  120.6  120 =  56 + 64
> 16  130.7  131 =  57 + 74
> 17  126.4  126 =  56 + 70
> 18  117.9  118 =  57 + 61
> 19  116.9  117 =  57 + 60
> 20  119.8  119 =  56 + 63
> 21  132.0  132 =  56 + 76
> 22  134.2  134 =  56 + 78
> 23  138.8  139 =  57 + 82
>
>
> Note the ICMP timestamp data from hping above.
> From this ICMP timestamping data, it is obvious that the congestion is
> only happening on one path -- the one over he.net, and init7 is in the
> clear.
>
> Any further insights are welcome. But finding out about the ICMP timestamp
> feature has so far been the most useful thing in troubleshooting this
> issue; I'm surprised it's such a little-known method for getting to the
> bottom of these problems.
>
> However, even after finding out the cause and the party responsible, the
> problem has yet to be resolved. Any help appreciated.
>
> Best regards,
> Constantine.
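
For anyone who wants to repeat the per-direction split without the perl one-liner, here is a minimal Python sketch of the same arithmetic. It is an illustration, not part of the original message. The regexes are copied from the one-liner; the function names (split_rtt, parse, spread) and the SAMPLE reply text are assumptions made up for this example, and the real hping output may contain more fields than shown. ICMP timestamps are milliseconds since midnight UT, so each half of the split also contains whatever clock offset exists between the two hosts; only the variation of each half over time is meaningful, not its absolute size.

```python
import re

# Same patterns as the perl one-liner in the quoted message.
SEQ_RE = re.compile(r"icmp_seq=(\d+) rtt=(\d+\.\d)")
TS_RE = re.compile(r"ate=(\d+) Receive=(\d+) Transmit=(\d+)")
TSRTT_RE = re.compile(r"tsrtt=(\d+)")


def split_rtt(originate, receive, transmit, tsrtt):
    """Split a timestamp RTT (ms) into (forward, return) components.

    forward = Receive - Originate
    return  = Originate + tsrtt - Transmit
    Both include the clock offset between the hosts, with opposite signs.
    """
    return receive - originate, originate + tsrtt - transmit


def parse(lines):
    """Yield (seq, rtt, tsrtt, forward, return) for each reply in lines."""
    seq = rtt = stamps = None
    for line in lines:
        m = SEQ_RE.search(line)
        if m:
            seq, rtt = int(m.group(1)), float(m.group(2))
        m = TS_RE.search(line)
        if m:
            stamps = tuple(int(g) for g in m.groups())
        m = TSRTT_RE.search(line)
        if m and stamps is not None:
            tsrtt = int(m.group(1))
            yield (seq, rtt, tsrtt) + split_rtt(*stamps, tsrtt)


def spread(values):
    """Difference between the largest and smallest sample, in ms."""
    return max(values) - min(values)


# Hypothetical reply text, written only to match the patterns above.
SAMPLE = [
    "len=46 ip=192.0.2.1 ttl=50 id=1 icmp_seq=0 rtt=143.5 ms",
    "ICMP timestamp: Originate=18377000 Receive=18377087 Transmit=18377087",
    "ICMP timestamp RTT tsrtt=144",
]

# Forward and return components of the 16 samples from the first hping
# run in the quoted message (hetzner.de towards indiana.edu).
FORWARD = [87, 69, 87, 102, 66, 85, 76, 89, 88, 63, 75, 83, 94, 96, 72, 71]
RETURN = [57, 57, 57, 56, 56, 57, 57, 57, 57, 56, 57, 57, 57, 56, 57, 57]

for row in parse(SAMPLE):
    print(*row, sep="\t")
print("forward spread:", spread(FORWARD), "ms")
print("return spread:", spread(RETURN), "ms")
```

On the 16 samples from the first hping run, the forward component varies by 39 ms while the return component varies by 1 ms, which is the kind of asymmetry the message relies on to attribute the jitter to one direction only.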