Re: Hurricane Electric packet loss
Hey Wolfgang,

I believe I may be seeing similar behavior, but it's hard for me to confirm. My network configuration is one that mtr doesn't support, so I can't get a report when we're having issues. I don't get my transit directly from HE, but rather through a provider who colocates out of one of their facilities, so I'm not sure I could even reach out to the Hurricane Electric NOC directly for help.

We've been seeing odd connectivity issues between HE FMT2 (Linode) and AWS US-WEST-1 and US-WEST-2. It's a mixed combination of loss and increased latency, both of which cause hiccups in some of our WAN-based clusters. There have been times where the issues we've seen were attributed to a DoS attack directed toward a Linode customer, but there have been quite a few networking events that seem to have no relation to a known attack.

Thanks for reaching out to NANOG with this issue; it may have shed some light on some of the issues we are seeing.

Cheers!
-Tim

On Tue, Jul 22, 2014 at 2:48 AM, Wolfgang Nagele (AusRegistry) wolfgang.nag...@ausregistry.com.au wrote:

> Hi,
>
> We've been customers of Hurricane Electric for a number of years now and
> have always been happy with their service. In recent months, packet loss
> on some of their major routes has become a very common (every few days)
> occurrence. Without knowledge of their network I am unsure of the cause,
> but we've seen it on the Tokyo - US routes as well as the London - US
> routes.
>
> It reminds me of the Cogent expansion, which was carried out with
> unsustainable oversubscription and eventually resulted in unusable
> service for a number of years. Having seen some of the rates that HE has
> been selling at, I can't help but wonder if they made the same mistake ...
>
> Here is an example of what's going on again atm.
> HOST: prolocation01.ring.nlnog.ne  Loss%   Snt   Last    Avg   Best   Wrst  StDev
>   1.|-- 2a00:d00:ff:136::253         0.0%    11    0.3    0.3    0.3    0.4    0.0
>   2.|-- 2a00:d00:1:12::1             0.0%    10    0.7    0.8    0.7    1.1    0.1
>   3.|-- hurricane-electric.nikhef    0.0%    10    0.7    3.1    0.7    8.3    2.9
>   4.|-- 100ge9-1.core1.lon2.he.ne    0.0%    10    9.8   12.6    8.0   19.2    4.1
>   5.|-- 100ge1-1.core1.nyc4.he.ne   10.0%    10   74.7   74.6   73.7   80.8    2.3
>   6.|-- 10ge10-3.core1.lax1.he.ne   30.0%    10  133.4  138.0  133.4  145.1    4.8
>   7.|-- 10ge1-3.core1.lax2.he.net   20.0%    10  135.7  139.1  133.4  145.1    4.5
>   8.|-- 2001:504:13::3b             40.0%    10  143.2  143.1  142.1  144.4    0.8
>   9.|-- 2402:7800:100:1::55         50.0%    10  144.4  144.1  143.8  144.4    0.2
>  10.|-- 2402:7800:0:1::f6           60.0%    10  298.7  298.4  298.2  298.7    0.2
>  11.|-- ge-0-1-4.cor02.syd03.nsw.   10.0%    10  299.3  298.9  298.3  299.5    0.5
>  12.|-- 2402:7800:0:2::18a          20.0%    10  299.7  299.4  298.9  300.1    0.4
>  13.|-- 2001:dcd:12::10             30.0%    10  299.8  299.5  298.8  300.0    0.5
>
> Is anybody else observing this as well?
>
> Cheers,
> Wolfgang
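A note on reading a report like the one above (my heuristic sketch, not part of the thread): loss at an intermediate hop that does not persist toward the final hop is usually just a router de-prioritizing ICMP generation, not forwarding-path loss, so loss figures are only credible up to the level seen at the destination. A toy filter over the per-hop loss column:

```python
# Toy illustration: given per-hop loss percentages from an mtr report, in path
# order, flag the hops consistent with genuine end-to-end forwarding loss.
# Intermediate-hop loss exceeding the final hop's is treated as ICMP
# rate-limiting noise rather than real loss.

def real_loss_suspects(hop_loss):
    """Return 0-based indices of hops whose non-zero loss is <= the final
    hop's loss, i.e. hops whose figures could reflect real packet loss."""
    if not hop_loss:
        return []
    final = hop_loss[-1]
    if final == 0.0:
        return []  # destination answers everything: intermediate loss is cosmetic
    return [i for i, loss in enumerate(hop_loss) if 0.0 < loss <= final]

# Loss figures from Wolfgang's mtr above (hops 1-13):
losses = [0.0, 0.0, 0.0, 0.0, 10.0, 30.0, 20.0, 40.0, 50.0, 60.0, 10.0, 20.0, 30.0]
print(real_loss_suspects(losses))  # -> [4, 5, 6, 10, 11, 12]
```

Applied to the report above, hops 8-10 (40-60% loss, above the destination's 30%) look like rate-limited routers, while the ~10-30% that survives to hop 13 is the part worth escalating.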
Re: Connectivity issue between Verizon and Amazon EC2
I am seeing the same issue between AWS US-WEST-2 and Hurricane Electric's Fremont 2 location (Linode). It looks to be deep within Amazon's network, based on changes in latency in a simple traceroute. I would provide an mtr, however my network configuration is something mtr doesn't support.

Cheers!
-Tim

On Jul 21, 2014 8:44 PM, Roland Dobbins rdobb...@arbor.net wrote:

> On Jul 22, 2014, at 10:31 AM, Ray Van Dolson rvandol...@esri.com wrote:
>
>> We're seeing poor performance (very slow download speeds -- 100KB/sec)
>> to certain EC2 instances via our Verizon hosted circuits.
>
> Have you tried dorking around with your MTU to see if that makes a
> difference?
>
> --
> Roland Dobbins rdobb...@arbor.net // http://www.arbornetworks.com
>
> Equo ne credite, Teucri. -- Laocoön
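For anyone following Roland's MTU suggestion: a common technique is pinging with the don't-fragment bit set and shrinking the payload until the "message too long" errors stop, and the payload size relates to the candidate MTU by fixed header overhead. A small sketch of that arithmetic (assumptions mine: IPv4 without options, standard ICMP echo):

```python
# Path-MTU probing arithmetic: a ping payload of size N travels in a packet of
# N + 20 (IPv4 header) + 8 (ICMP header) bytes, so the largest payload that
# fits unfragmented in an MTU of M bytes is M - 28.

IPV4_HEADER = 20
ICMP_HEADER = 8

def ping_payload_for_mtu(mtu):
    """Largest `ping -s` payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

# e.g. on Linux: ping -M do -s <payload> <host>
print(ping_payload_for_mtu(1500))  # standard Ethernet -> 1472
print(ping_payload_for_mtu(576))   # classic IPv4 minimum reassembly size -> 548
```

If 1472 fails with DF set but a smaller payload succeeds, something in the path (a tunnel, typically) has a sub-1500 MTU and may be mishandling fragmentation-needed signaling.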
Re: Connectivity issue between Verizon and Amazon EC2
Realized I sent the reply only to Roland. Apologies; here it is in full:

I am seeing the same issue between AWS US-WEST-2 and Hurricane Electric's Fremont 2 location (Linode). It looks to be deep within Amazon's network, based on changes in latency in a simple traceroute. I would provide an mtr, however my network configuration is something mtr doesn't support.

Cheers!
-Tim

On Jul 21, 2014 8:34 PM, Ray Van Dolson rvandol...@esri.com wrote:

> I'm short some important details on this one, but hopefully can fill in
> more shortly. We're seeing poor performance (very slow download speeds --
> 100KB/sec) to certain EC2 instances via our Verizon hosted circuits. The
> issue is reproducible on both our production Gigabit circuit as well as a
> consumer-grade Verizon FIOS line. Speeds are normal (10MB/sec plus) via
> non-Verizon circuits we've tested.
>
> Source IPs are in the 198.102.62.0/24 range and the destination on the
> EC2 side is 54.197.239.228. I'm not sure in which availability zone the
> latter IP sits, but hope to find out shortly. MTR traceroute details are
> as follows:
>
>     Host              Loss%   Snt  Drop    Avg   Best   Wrst  StDev
>  1. 198.102.62.253     0.0%   526     0    0.2    0.2    0.5    0.0
>  2. 152.179.250.141    0.0%   526     0   14.1    7.0   19.4    3.6
>  3. 140.222.225.135   37.5%   526   197    7.7    6.8   35.8    1.9
>  4. 129.250.8.85       0.0%   526     0    8.1    7.4   11.7    0.3
>  5. 129.250.2.229     10.3%   525    54   11.4    7.1   85.7    9.6
>  6. 129.250.2.169     41.5%   525   218   63.0   45.5  130.7   10.3
>  7. 129.250.2.154      0.2%   525     1   59.9   44.5   69.0    4.0
>  8. ???
>  9. 54.240.229.96      7.8%   525    41   76.6   71.3  119.9    8.6
>     54.240.229.104
>     54.240.229.106
> 10. 54.240.229.2       6.9%   525    36   74.7   71.6  109.1    4.9
>     54.240.229.4
>     54.240.229.20
>     54.240.229.8
>     54.240.229.14
>     54.240.228.254
>     54.240.229.16
>     54.240.229.10
> 11. 54.240.229.174     5.5%   525    29   76.0   71.7  109.0    7.3
>     54.240.229.162
>     54.240.229.160
>     54.240.229.170
>     54.240.229.172
>     54.240.229.168
>     54.240.229.164
> 12. 54.240.228.167    94.5%   525   495   76.4   71.7  126.0   11.6
>     54.240.228.169
>     54.240.228.165
>     54.240.228.163
> 13. 72.21.220.108      5.1%   525    27   75.2   71.3  112.6    6.8
>     205.251.244.12
>     72.21.220.8
>     205.251.244.64
>     72.21.220.96
>     205.251.244.8
>     72.21.220.6
>     205.251.244.4
> 14. 72.21.220.45       9.0%   525    47   74.0   71.6  199.5    8.5
>     72.21.220.149
>     72.21.220.29
>     72.21.220.125
>     72.21.220.37
>     72.21.220.61
>     72.21.220.2
>     72.21.220.69
> 15. 72.21.222.33      10.5%   525    55   73.4   71.5   87.1    1.5
>     205.251.245.65
>     72.21.222.149
>     72.21.222.35
>     72.21.220.29
>     72.21.222.131
>     72.21.222.147
>     72.21.220.37
> 16. 205.251.245.65    93.9%   525   492   73.1   72.2   76.2    1.2
>     72.21.222.35
>     72.21.222.131
> 17. ???
> 18. ???
> 19. 216.182.224.79    13.5%   524    71   77.9   72.4  101.2    5.4
>     216.182.224.81
>     216.182.224.95
>     216.182.224.77
> 20. 216.182.224.81    94.1%   524   492   77.9   72.8   93.0    6.3
>     216.182.224.95
>     216.182.224.77
> 21. ???
>
> The 140.222.225.135 hop shows up in the traceroutes via our Verizon
> Business FIOS line as well. Will be opening a ticket with both Verizon
> and AWS to assist, but hoping someone out there can take a look or chime
> in. Feel free to reply off list.
>
> Thanks,
> Ray
Re: Erroneous Leap Second Introduced at 2014-06-30 23:59:59 UTC
On Mon, Jun 30, 2014 at 7:27 PM, Majdi S. Abbas m...@latt.net wrote:

> On Mon, Jun 30, 2014 at 05:33:52PM -0700, Tim Heckman wrote:
>> I was just alerted to one of the systems I manage having a time skew
>> greater than 100ms from NTP sources. Upon further investigation it
>> seemed that the time was off by almost exactly 1 second. Looking back
>> over our NTP monitoring, it would appear that this system had a large
>> time adjustment at approximately 00:00 UTC:
>
> Okay. Do you have any logging configured (peerstats, etc.) for ntpd?

Our systems all have loopstats and peerstats logging enabled. I have those log files available if anyone is interested. However, when I searched the files I wasn't able to find anything that seemed to indicate which peer told the system to introduce a leap second. That said, I might just not know what to look for in the logs.

>> A few of our systems did alert early this morning, indicating they were
>> going to be receiving a leap second today. However, I was unable to
>> determine the exact cause for NTP believing a leap second should be
>> added. And after some time a few of the systems were no longer
>> indicating that a leap second would be introduced.
>
> This can happen if a server is either passing along a leap notification
> that it received, or is configured to use a leapseconds file that is
> incorrect.

Correct. I was hoping to determine which peer it was so I can reach out to them to make sure this doesn't bleed into the pool at the end of the year. I was also more-or-less curious how widespread an issue this was, but I'm starting to think I may have been the only person to catch it in the act. :)

>> This specific system is hosted in AWS US-WEST-2C and uses the
>> 0.amazon.pool.ntp.org pool.
>
> 0 is just one server in the pool (whichever you draw by rotation); is
> this the only server you have configured?

We use 0.amazon.pool.ntp.org, 1.amazon.pool.ntp.org, and 2.amazon.pool.ntp.org. As with the other widely-used pool hostnames, each of these is a round-robin DNS entry with 4 hosts and a TTL of 150s.

> --msa

Thank you for getting back to me.

Cheers!
-Tim
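For reference, the three-hostname setup Tim describes corresponds to an ntp.conf along these lines (an illustrative fragment only; exact options vary by distribution, and `iburst` is my addition, not something stated in the thread):

```
# /etc/ntp.conf (illustrative fragment)
server 0.amazon.pool.ntp.org iburst
server 1.amazon.pool.ntp.org iburst
server 2.amazon.pool.ntp.org iburst
```

Using several pool hostnames gives ntpd multiple independent sources to compare, which is what lets it outvote a single peer announcing a bogus leap second.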
Re: Erroneous Leap Second Introduced at 2014-06-30 23:59:59 UTC
On Tue, Jul 1, 2014 at 12:35 PM, Majdi S. Abbas m...@latt.net wrote:

> On Tue, Jul 01, 2014 at 12:20:12PM -0700, Tim Heckman wrote:
>> Our systems all have loopstats and peerstats logging enabled. I have
>> those log files available if anyone is interested. However, when I
>> searched the files I wasn't able to find anything that seemed to
>> indicate which peer told the system to introduce a leap second. That
>> said, I might just not know what to look for in the logs.
>
> Look at the status word in peerstats; if the high bit is set, that's
> your huckleberry. See:
>
> http://www.eecis.udel.edu/~mills/ntp/html/decode.html

I've taken a look at all of the peerstats available for this host, and surprisingly none of them are showing code 09 (leap_armed). I'm also fairly certain that I know when some of my systems armed the leap second (within a 60-120s window) based on our monitoring. Around those times everything seems normal according to peerstats.

I am running Ubuntu 10.04 on this box, which ships ntp v4.2.4p8. I'll need to look into whether the printing of this flag was added later; otherwise, it would seem some of my systems picked up a phantom leap second from an unknown source, with one of them actually executing it.

Thanks for the decoder ring. My Google-fu wasn't hitting the right keywords.

>> Correct. I was hoping to determine which peer it was so I can reach out
>> to them to make sure this doesn't bleed into the pool at the end of the
>> year. I was also more-or-less curious how widespread an issue this was,
>> but I'm starting to think I may have been the only person to catch it
>> in the act. :)
>
> You might want to upgrade to current 4.2.7 development code, wherein a
> majority rule is used to qualify the leap indicator.

We're going to be doing some system refreshes soon, so that may be something we'll look at. I didn't realize this was happening in the 4.2.7 development branch. Definitely an interesting feature, especially after this. :p

> Cheers,
> --msa

Thanks again, Majdi.

Cheers!
-Tim
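For anyone wanting to do the peerstats check Majdi describes mechanically, here is a minimal parser sketch (mine, not from the thread; it assumes ntpd's documented peerstats field order of MJD, seconds past midnight, peer address, hex status word, then offset/delay/dispersion, and it only collects the status words per peer so they can be compared against the decode page by hand):

```python
# Collect the hex peer status words seen for each peer in a peerstats file,
# for manual decoding against decode.html. Malformed lines are skipped.

def peer_status_words(lines):
    """Map each peer address to the set of hex status words seen for it."""
    seen = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed peerstats record
        peer, status = fields[2], fields[3]
        seen.setdefault(peer, set()).add(status)
    return seen

# Fabricated sample records (addresses are documentation-range, status words
# are made up for illustration):
sample = [
    "56838 1234.567 203.0.113.5 9014 -0.001605 0.023 0.001 0.0009",
    "56838 1298.101 203.0.113.5 9614 -0.000912 0.025 0.001 0.0011",
]
print(peer_status_words(sample))
```

A peer whose status word changes around the time the leap armed is the one to investigate first; the meaning of the individual bits should be taken from the decode page rather than from this sketch.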
Erroneous Leap Second Introduced at 2014-06-30 23:59:59 UTC
Hey Everyone,

I was just alerted to one of the systems I manage having a time skew greater than 100ms from NTP sources. Upon further investigation it seemed that the time was off by almost exactly 1 second. Looking back over our NTP monitoring, it would appear that this system had a large time adjustment at approximately 00:00 UTC:

- http://puu.sh/9Rs6O/a514ad7c97.png (times in these graphs are Pacific, sorry about that)

A few of our systems did alert early this morning, indicating they were going to receive a leap second today. However, I was unable to determine the exact cause for NTP believing a leap second should be added, and after some time a few of the systems were no longer indicating that a leap second would be introduced.

This specific system is hosted in AWS US-WEST-2C and uses the 0.amazon.pool.ntp.org pool.

Has anyone else seen erroneous leap seconds added to their systems?

Cheers!
-Tim Heckman
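Some background on where the leap warning Tim's systems picked up actually lives on the wire (general NTP protocol knowledge, not something established in this thread): the first byte of an NTP packet packs the leap indicator (2 bits), version (3 bits), and mode (3 bits), and LI = 01 is the "insert a second at the end of today" flag that propagates from server to client. A small decoder:

```python
# Decode the LI/VN/Mode byte at the start of an NTP packet (RFC 5905 layout).
# LI values: 0 = no warning, 1 = last minute has 61 seconds (leap insertion),
# 2 = last minute has 59 seconds, 3 = clock unsynchronized.

def decode_li_vn_mode(first_byte):
    li = (first_byte >> 6) & 0x3    # top 2 bits: leap indicator
    vn = (first_byte >> 3) & 0x7    # middle 3 bits: NTP version
    mode = first_byte & 0x7         # bottom 3 bits: association mode
    return li, vn, mode

# 0x64 = 0b01_100_100: leap insertion armed, version 4, mode 4 (server reply)
print(decode_li_vn_mode(0x64))  # -> (1, 4, 4)
```

A single upstream server setting LI = 1 is enough for older ntpd versions to arm a leap second, which is consistent with the "one bad peer poisons the client" behavior described here.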
Packets dropped passing from Qwest to Verizon
Hello,

I'm looking for a POC at Qwest (AS209) or Verizon (AS701) to help diagnose what looks like a stale bogon filter. The packets drop where Qwest (63.146.26.210) peers with Verizon (152.63.2.130).

Thanks in advance!

Regards,
Tim H.
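As a quick sanity check on the stale-bogon-filter theory (my illustration, not part of the original post): a stale filter drops space that was unallocated when the filter was written but has since been assigned, so the first step is confirming the addresses involved are ordinary global unicast that no current bogon list should match. The stdlib ipaddress module can do that much:

```python
# Verify the two peering addresses from the report are plain global unicast:
# not private, not reserved, not in any IANA special-purpose registry range.

import ipaddress

for addr in ("63.146.26.210", "152.63.2.130"):
    ip = ipaddress.ip_address(addr)
    print(addr, "global:", ip.is_global,
          "private:", ip.is_private, "reserved:", ip.is_reserved)
```

Both come back as global unicast, so any filter dropping them is working from an out-of-date notion of unallocated space rather than a legitimate bogon match.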