> On Jun 24, 2019, at 11:12 AM, Max Tulyev <max...@netassist.ua> wrote:
> 
> 24.06.19 17:44, Jared Mauch wrote:
>>> 1. Why did Cloudflare not immediately announce all their address space as 
>>> /24s? This could have brought the service back up instantly for almost 
>>> all places.
>> They may not want to pollute the global routing table with these entries.  
>> It has a cost for everyone.  If we all did this, the table would be a mess.
> 
> yes, it is. But it is a working, quick and temporary fix of the problem.

Like many temporary fixes, these tend to become permanent (e.g., AT&T had 
similar issues with 12.0.0.0/8, and now there’s a bunch of /9s in the table 
that will likely never go away).
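
To put a rough number on the table cost, here is a minimal sketch using 
Python's standard ipaddress module. The deaggregate() helper and the 
10.0.0.0/20 prefix are hypothetical and for illustration only; they are not 
anyone's real announcements.

```python
import ipaddress

def deaggregate(prefix, new_len=24):
    """Split one aggregate into the more-specifics of length new_len."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen >= new_len:
        # Already at (or past) the target length: nothing to split.
        return [net]
    return list(net.subnets(new_prefix=new_len))

# One hypothetical /20 aggregate becomes 16 routes when announced as /24s.
print(len(deaggregate("10.0.0.0/20")))
# A /8 split into /9s is only 2 routes, but as /24s it would be 65536.
print(len(deaggregate("12.0.0.0/8", 9)))
print(len(deaggregate("12.0.0.0/8", 24)))
```

Every extra route here is one more entry that every network carrying a full 
table has to hold.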

>>> 2. Why almost all carriers did not filter the leak on their side, but 
>>> waited for "a better weather on Mars" for several hours?
>> There’s several major issues here
>> - Verizon accepted garbage from their customer
>> - Other networks accepted the garbage from Verizon (eg: Cogent)
>> - known best practices from over a decade ago are not applied
> 
> That's it.
> 
> We have several IXes connected, and all of them had a correct aggregated 
> route to CF. There was one upstream that distributed the leaked 
> more-specifics.
> 
> I think 30 minutes at most is enough to find a problem and filter out its 
> source on their side. Almost nobody did it. Why?

I have heard people say “we don’t look for problems”.  This is often the 
case: there is a lack of monitoring and awareness.  I had several systems 
detect the problem, and things like bgpmon also saw it.

My guess is that the people who passed this on weren’t monitoring either.  
It’s often manual procedures rather than automated scripts watching things.  
Only a small set of people tend to invest in instrumenting their network 
elements.  You tend to need some scale for it to make sense, and it also 
requires people who understand the underlying data well enough to know what 
is “odd”.

This is why I’ve had my monitoring system up for the past 12+ years.  It’s 
super simple (dumb) and catches a lot of issues.  I implemented it again for 
the RIPE RIS Live service, but haven’t cut it over to be the primary (realtime) 
monitoring method vs watching route-views.
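
A "dumb" check of this kind might look roughly like the sketch below. This is 
not the actual monitoring system described above; it is a minimal 
illustration that assumes you already have a list of (prefix, origin ASN) 
pairs from some feed. The function name, the 10.0.0.0/20 prefix and the 
private-use/documentation ASNs are all hypothetical.

```python
import ipaddress

def find_suspect_routes(expected, announcements):
    """Flag announcements that fall inside a prefix we expect to see.

    expected:      dict mapping aggregate prefix -> expected origin ASN
    announcements: iterable of (prefix, origin_asn) seen in the feed
    """
    expected_nets = [(ipaddress.ip_network(p), asn) for p, asn in expected.items()]
    suspects = []
    for prefix, origin in announcements:
        net = ipaddress.ip_network(prefix)
        for exp_net, exp_origin in expected_nets:
            if net.version != exp_net.version:
                continue  # never compare IPv4 against IPv6
            if not net.subnet_of(exp_net):
                continue  # unrelated address space, ignore it
            if net.prefixlen > exp_net.prefixlen:
                suspects.append((prefix, origin, "more specific of " + str(exp_net)))
            elif origin != exp_origin:
                suspects.append((prefix, origin, "unexpected origin for " + str(exp_net)))
    return suspects

expected = {"10.0.0.0/20": 64500}
seen = [
    ("10.0.0.0/20", 64500),   # the normal aggregate
    ("10.0.4.0/24", 64511),   # a leaked more-specific
    ("192.0.2.0/24", 64501),  # someone else's space
]
for route in find_suspect_routes(expected, seen):
    print(route)
```

It does nothing clever, which is the point: any more-specific of a prefix you 
only ever announce as an aggregate is worth an alert.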

I think it’s time to do that.

- Jared
