Hi Martin,
I am not a spokesperson for L3, but I have worked for big telcos my whole 
career, and my recommendation is that anyone on the forum who is their customer 
and is affected should raise a trouble ticket.
I don’t think NOC engineers at any of the major telcos are authorized to reply 
on forums, especially regarding outages. They will discuss an issue only when 
someone raises a trouble ticket and seeks an RCA one-on-one with them.


Utkarsh Gosain
Global Acc Director 
Tata Communications


-----Original Message-----
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of Martin Millnert
Sent: Friday, June 12, 2015 11:33 AM
To: NANOG
Subject: Open letter to Level3 concerning the global routing issues on June 12th

Dear Level3,

The Internet is a cooperative effort, and it works well only when its 
participants take constructive actions to address errors and remedy problems.
Your position as a major Internet Carrier bestows upon you a certain degree of 
responsibility for the correct operation of the Internet all across (and 
beyond) the planet. You have many customers. Customers will always occasionally 
make mistakes. You as a major Internet Carrier have a responsibility to limit, 
not amplify, your customers' mistakes.
Other major carriers implement technical measures that severely limit the 
damage from customer mistakes and keep it from having global impact.
Other major carriers also implement operational procedures in addition to 
technical measures.
In combination, these measures drastically reduce the outage-hours as a result 
of customer configuration errors.

At 08:44 UTC on Friday 12th of June, one of your transit customers, Telekom 
Malaysia (AS4788), began announcing the full Internet table back to you, which 
you accepted and propagated to your peers and customers, causing global outages 
for close to 3 hours.
[ https://twitter.com/DynResearch/status/609340592036970496 ]
During this 3-hour window, it appears (from your own service outage reports) 
that you did nothing to stop the global Internet outage, but that Telekom 
Malaysia themselves eventually resolved it. This lack of action on your end, 
and your disregard for the correct operation of the global Internet, are 
astonishing. These mistakes do not need to happen.
AS4788 under normal circumstances announces ~1900 IPv4 prefixes to the 
Internet. You accepted several hundred thousand prefixes from them - a 
max-prefix limit on the session would have severely limited the damage. We 
expect that such limits are part of your practices as well, but they failed. 
When they fail, it should not take ~3 hours to shut down the session(s).
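
To illustrate the max-prefix idea, here is a minimal sketch in Python. It is 
not router configuration and not how any particular carrier implements it: the 
BgpSession class, the limit of 2500 (chosen here only as headroom above the 
~1900 prefixes mentioned above) and the synthetic prefixes are all assumptions 
made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BgpSession:
    """Hypothetical model of one customer BGP session (not a real router API)."""
    peer_asn: int
    max_prefixes: int                      # assumed per-session limit
    accepted: set = field(default_factory=set)
    established: bool = True

    def receive(self, prefix: str) -> bool:
        """Accept an announcement; tear the session down if the limit is exceeded."""
        if not self.established:
            return False
        self.accepted.add(prefix)
        if len(self.accepted) > self.max_prefixes:
            # Limit exceeded: discard everything learned from this peer
            # and shut the session instead of propagating the routes.
            self.accepted.clear()
            self.established = False
            return False
        return True


# Synthetic, made-up prefixes standing in for a leaked table.
leak = [f"10.{i // 256}.{i % 256}.0/24" for i in range(3000)]

session = BgpSession(peer_asn=4788, max_prefixes=2500)
results = [session.receive(p) for p in leak]

print(session.established)   # False: the session was shut down
print(results.index(False))  # 2500: the first announcement over the limit
```

The point of the sketch is only that the leak is contained at the customer 
session after a bounded number of announcements, rather than hours later.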

Many operators, in despair, turned down their peering sessions with you once it 
was clear you were causing the outages and no immediate fix was in sight. This 
improved the situation for some - but not all did. Had you deployed proper 
IRR-filtering to filter the bad announcements the impact would've been far less 
critical.
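
As a rough sketch of what IRR-based filtering amounts to, the Python below 
accepts a customer announcement only if it falls within a prefix the customer 
has registered. The registered list here is a hard-coded assumption using 
documentation address ranges; in practice such a list would be derived from 
the route objects in an Internet Routing Registry, and the function names and 
the /24 length cap are illustrative choices, not anyone's actual policy.

```python
import ipaddress


def build_filter(registered):
    """Parse the registered prefixes (assumed to come from IRR route objects)."""
    return [ipaddress.ip_network(p) for p in registered]


def is_permitted(announced, allowed, max_len=24):
    """Permit an announcement only if it is covered by a registered prefix
    and is not more specific than max_len (an assumed IPv4 policy)."""
    net = ipaddress.ip_network(announced)
    if net.prefixlen > max_len:
        return False
    # Compare versions first: subnet_of() raises TypeError on mixed v4/v6.
    return any(net.version == a.version and net.subnet_of(a) for a in allowed)


# Hypothetical registered prefixes for one customer (documentation ranges).
allowed = build_filter(["192.0.2.0/24", "198.51.100.0/24"])

print(is_permitted("192.0.2.0/24", allowed))    # True: registered
print(is_permitted("203.0.113.0/24", allowed))  # False: not registered
print(is_permitted("192.0.2.0/25", allowed))    # False: too specific
```

With a filter like this on the customer session, a leaked full table would be 
dropped except for the handful of prefixes the customer is registered to 
originate.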

As a direct consequence of your ~3 hours of inaction, as a local example, 
Swedish payment terminals were experiencing problems all over the country. The 
Swedish economy was directly affected by your inaction.
There were queues when I was buying lunch! Imagine the food rage. The situation 
was probably similar at other places around the globe where people were awake.

Operators around the planet are curious:
  - Did Level3 not detect or understand that it was causing global Internet 
outages for ~3 hours?
  - If Level3 did in fact detect or understand it was causing global Internet 
outages, why did it not properly and immediately remedy the situation?
  - What is Level3 going to do to address these questions and begin work on 
restoring its credibility as a carrier?

We all understand that mistakes do happen (in applying customer interface 
templates, etc.). However, the Internet is all too pervasive in everyday life 
today for anything but swift action by carriers to remedy breakage after the 
fact. It is absolutely not sufficient to let a customer spend 3 hours detecting 
and fixing a situation like this one. It is unacceptable that no swift action 
was taken on your end to limit the global routing issues you caused.

Sincerely,
Martin Millnert
Member of Internet Community - no carrier / ISP affiliation. 
