> Facebook stopped announcing the vast majority of their IP space to the DFZ
> during this.

This is where I would like to learn more about the outage. Direct
peering connections to FB saw about a dozen networks drop; one of those
networks covered their C and D nameservers, but the block for the A and
B nameservers remained advertised, just not responsive. I imagine the
dropped blocks could have prevented internal responses, but I am
surprised that all of these issues would stem from that, at least from
the perspective I have. (A rough way to check which routes cover each
nameserver is sketched at the bottom of this message, below the quoted
thread.)

On Tue, Oct 5, 2021 at 8:48 AM Tom Beecher <beec...@beecher.cc> wrote:

>> Maybe withdrawing those routes to their NS could have been mitigated by
>> having NS in separate entities.
>
> Assuming they had such a thing in place, it would not have helped.
>
> Facebook stopped announcing the vast majority of their IP space to the DFZ
> during this. So even if they did have an offnet DNS server that could have
> provided answers to clients, those same clients probably wouldn't have been
> able to connect to the IPs returned anyways.
>
> If you are running your own auths like they are, you likely view your
> public network reachability as almost bulletproof and that it will never
> disappear. Which is probably true most of the time. Until yesterday happens
> and the 9's in your reliability percentage change to 7's.
>
> On Tue, Oct 5, 2021 at 8:10 AM Jean St-Laurent via NANOG <nanog@nanog.org>
> wrote:
>
>> Maybe withdrawing those routes to their NS could have been mitigated by
>> having NS in separate entities.
>>
>> Let's check how these big companies are spreading their NS's.
>>
>> $ dig +short facebook.com NS
>> d.ns.facebook.com.
>> b.ns.facebook.com.
>> c.ns.facebook.com.
>> a.ns.facebook.com.
>>
>> $ dig +short google.com NS
>> ns1.google.com.
>> ns4.google.com.
>> ns2.google.com.
>> ns3.google.com.
>>
>> $ dig +short apple.com NS
>> a.ns.apple.com.
>> b.ns.apple.com.
>> c.ns.apple.com.
>> d.ns.apple.com.
>>
>> $ dig +short amazon.com NS
>> ns4.p31.dynect.net.
>> ns3.p31.dynect.net.
>> ns1.p31.dynect.net.
>> ns2.p31.dynect.net.
>> pdns6.ultradns.co.uk.
>> pdns1.ultradns.net.
>>
>> $ dig +short netflix.com NS
>> ns-1372.awsdns-43.org.
>> ns-1984.awsdns-56.co.uk.
>> ns-659.awsdns-18.net.
>> ns-81.awsdns-10.com.
>>
>> Amazon and Netflix seem to not keep their eggs in the same basket. From
>> a first look, they seem more resilient than facebook.com, google.com and
>> apple.com.
>>
>> Jean
>>
>> -----Original Message-----
>> From: NANOG <nanog-bounces+jean=ddostest...@nanog.org> On Behalf Of Jeff
>> Tantsura
>> Sent: October 5, 2021 2:18 AM
>> To: William Herrin <b...@herrin.us>
>> Cc: nanog@nanog.org
>> Subject: Re: Facebook post-mortems...
>>
>> 129.134.30.0/23, 129.134.30.0/24, 129.134.31.0/24. The specific routes
>> covering all 4 nameservers (a-d) were withdrawn from all FB peering at
>> approximately 15:40 UTC.
>>
>> Cheers,
>> Jeff
>>
>> > On Oct 4, 2021, at 22:45, William Herrin <b...@herrin.us> wrote:
>> >
>> > On Mon, Oct 4, 2021 at 6:15 PM Michael Thomas <m...@mtcc.com> wrote:
>> >> They have a monkey patch subsystem. Lol.
>> >
>> > Yes, actually, they do. They use Chef extensively to configure
>> > operating systems. Chef is written in Ruby. Ruby has something called
>> > Monkey Patches. This is where at an arbitrary location in the code you
>> > re-open an object defined elsewhere and change its methods.
>> >
>> > Chef doesn't always do the right thing. You tell Chef to remove an RPM
>> > and it does. Even if it has to remove half the operating system to
>> > satisfy the dependencies.
>> > If you want it to do something reasonable,
>> > say throw an error because you didn't actually tell it to remove half
>> > the operating system, you have a choice: spin up a fork of chef with a
>> > couple patches to the chef-rpm interaction or just monkey-patch it in
>> > one of your chef recipes.
>> >
>> > Regards,
>> > Bill Herrin
>> >
>> > --
>> > William Herrin
>> > b...@herrin.us
>> > https://bill.herrin.us/
>>
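
As promised above, here is the rough check I had in mind. It's only a
sketch using Ruby's standard library: it resolves the facebook.com NS set
and flags which nameserver addresses fall inside the three routes Jeff
listed as withdrawn. It obviously has to run from somewhere that can still
resolve the names, and the prefix list is just copied from Jeff's mail, not
anything authoritative about FB's announcements.

#!/usr/bin/env ruby
# Sketch: which facebook.com nameservers sit inside the withdrawn routes?
require 'resolv'
require 'ipaddr'

# The three routes from Jeff's mail (assumption: this list is complete
# only for the 129.134.x nameserver space).
withdrawn = %w[129.134.30.0/23 129.134.30.0/24 129.134.31.0/24].map { |p| IPAddr.new(p) }

Resolv::DNS.open do |dns|
  dns.getresources('facebook.com', Resolv::DNS::Resource::IN::NS).each do |ns|
    name = ns.name.to_s
    dns.getresources(name, Resolv::DNS::Resource::IN::A).each do |a|
      ip = a.address.to_s
      hit = withdrawn.any? { |pfx| pfx.include?(ip) }
      puts format('%-22s %-16s %s', name, ip,
                  hit ? 'inside the withdrawn routes' : 'outside the listed routes')
    end
  end
end

Comparing that against what was actually visible in the DFZ during the
window is what I'd like to see more data on.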
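And for anyone not familiar with the Ruby term Bill is using: a monkey
patch is just re-opening a class defined elsewhere and swapping out one of
its methods from your own code. The classes below are toy stand-ins I made
up, not Chef's actual internals; they only illustrate the mechanism.

# Toy example of a Ruby monkey patch -- not Chef's real code.
class PackageRemover            # stand-in for something deep inside Chef
  def remove(pkg)
    puts "removing #{pkg} and whatever dependencies come with it"
  end
end

# Later, in your own recipe/cookbook file, re-open the same class:
class PackageRemover
  alias_method :remove_without_guard, :remove

  def remove(pkg)
    # Hypothetical guard: refuse instead of silently taking half the OS out.
    raise "refusing: removing #{pkg} would drag dependencies with it" if risky?(pkg)
    remove_without_guard(pkg)
  end

  def risky?(pkg)
    true                        # placeholder for a real dependency check
  end
end

begin
  PackageRemover.new.remove('some-rpm')
rescue RuntimeError => e
  puts "blocked: #{e.message}"
end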