Jeff,

> I can fix them later, maybe even after I've had time to fully analyze the
> problem and get a software update from my vendor.

Well, that assumes you have even noticed the problem in the first place.

On the point of flapping: completely agree. But the knob, already available
in some implementations, that does not flap but keeps the session down until
manual intervention is a completely different thing, and it is a completely
safe solution from a protocol-correctness point of view.

Yes, I understand your motivations, but the problems with BGP doing things
like treat-as-withdraw by default are really not what you are describing.

Cheers,
R.

On Thu, Jan 3, 2013 at 10:19 PM, Jeff Wheeler <j...@inconcepts.biz> wrote:
> On Thu, Jan 3, 2013 at 3:18 PM, Robert Raszuk <rob...@raszuk.net> wrote:
>> How are you going to clean the NLRIs in your network (both transit or
>> stub) which were withdrawn in the messages your BGP implementation
>> declared "bad" and decided to ignore ?
>
> I can fix them later, maybe even after I've had time to fully analyze the
> problem and get a software update from my vendor. Maybe I'll try a refresh
> or a session-reset, but I won't be at the mercy of repeatedly flapping
> session and phone ringing off the hook with angry customers!
>
> A lot of folks are thinking about this problem in the context of the big
> carrier who doesn't want a hard-to-diagnose problem of 1 RIB entry being
> wrong. That's okay, it is one way to think about it.
>
> A second way to think of it is as a small/regional ISP. If one or more of
> his transits are flapping because of a bad path on the DFZ, that is going to
> cost him money and customers. If he has no way to mitigate it, he is at the
> mercy of external parties. He could just use "ignore bad messages" and at
> least stop bleeding money. He does not care if he can't reach 5 /24s at
> LANL, they are unimportant to him. What is important is if he has any
> customers left next week.
>
> A third way is the small- or medium-datacenter network.
> Imagine you are a typical small/medium shop and you have some
> Cisco/Juniper/Brocade stuff for your ASBRs and your core, but you bought a
> bunch of RainbowPoop Router Co switches for your racks, because they are
> inexpensive and they support EVPN, L3VPN, VPLS, or some other feature you
> want but Cisco/Juniper/Brocade don't put into their inexpensive product.
>
> So your network looks like this:
>
>   ISP1     ISP2
>
>   CISCO   JUNIPER
>     |   \/
>     |   /\   \   |
>     |  /  \   \  |
>   TOR1   TOR2 .... TOR99
>
> Now imagine your JUNIPER supports NewVpnThing and that's a feature you
> decided to use on the RainbowPoop TOR devices. But TOR1 sends a bad BGP
> update. JUNIPER knows about NewVpnThing and sees a bad BGP attribute (that
> it recognizes) so it does whatever the NewVpnThing spec says, and tears down
> the session to TOR1.
>
> CISCO on the other hand, does not know about NewVpnThing so this router
> doesn't even understand the update is bad. It just passes it along to TOR2
> .. TOR99. Now those boxes all tear down their session to the CISCO. Then
> they re-establish. Then they go down again. They keep on doing this and
> the network is freaking out.
>
> By the time your in-house clue notices, your symptom is that 99 identical
> TORs are flapping their BGP to your CISCO. You probably don't even notice
> the 1 TOR that is flapping to JUNIPER. Maybe JUNIPER even logs something
> helpful but you may not investigate it for a while.
>
> So your CISCO which is following the base spec is carrying a buggy update to
> your 99 other RainbowPoop TORs and they are all failing. Your JUNIPER which
> knows about the NewVpnThing is following its spec and protecting the other
> TORs from this problem, but it is probably not helpful since your network is
> in chaos from all the flapping.
>
> What do you do? Call vendor support. Probably for CISCO and RainbowPoop.
> Well, now you are expecting the TAC of Cisco and the TAC of RainbowPoop to
> cooperate, which they'll have trouble doing; and it may take ages before
> anyone identifies the root cause of the problem is really TOR1.
>
> There are going to be a lot of RainbowPoop routers in the future, and many
> of them may use BGP. We should make BGP more robust.
>
>
> --
> Jeff S Wheeler <j...@inconcepts.biz>
> Sr Network Operator / Innovative Network Concepts
>
> _______________________________________________
> Idr mailing list
> i...@ietf.org
> https://www.ietf.org/mailman/listinfo/idr
>
_______________________________________________
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow
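
For anyone who wants to poke at the TOR scenario quoted above, here is a rough Python sketch of it. It is a toy model under stated assumptions, not any vendor's behaviour: the `Update` and `Router` classes, the `simulate` helper, the example prefix and the three policy names ("reset", "hold-down", "treat-as-withdraw") are all made up for illustration. Each TOR is modelled only by its session toward CISCO, and each round stands for "sessions come back up and TOR1 sends the same bad update again".

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    prefix: str       # hypothetical example prefix
    attr: str         # name of an optional transitive attribute
    malformed: bool   # True if the attribute body is broken


@dataclass
class Router:
    name: str
    understands: set              # attribute names this router can parse
    policy: str = "reset"         # "reset", "hold-down" or "treat-as-withdraw"
    resets: int = 0               # times the receiving session was torn down
    held_down: bool = False       # session parked until manual intervention
    rib: set = field(default_factory=set)

    def receive(self, upd: Update) -> bool:
        """Process an update; return True if it would be propagated onward."""
        if self.held_down:
            return False          # session is down, nothing arrives
        if upd.attr in self.understands and upd.malformed:
            if self.policy == "reset":
                self.resets += 1              # tear down, will re-establish
            elif self.policy == "hold-down":
                self.resets += 1              # tear down once, stay down
                self.held_down = True
            else:                             # treat-as-withdraw
                self.rib.discard(upd.prefix)  # session stays up
            return False
        # An attribute the router does not understand is passed along as-is.
        self.rib.add(upd.prefix)
        return True


def simulate(tor_policy: str, rounds: int = 3, n_tors: int = 99):
    """Return (JUNIPER resets toward TOR1, total resets on TOR2..TORn)."""
    bad = Update("10.0.0.0/24", "NewVpnThing", malformed=True)
    cisco = Router("CISCO", understands=set())
    juniper = Router("JUNIPER", understands={"NewVpnThing"})
    tors = [Router(f"TOR{i}", {"NewVpnThing"}, tor_policy)
            for i in range(2, n_tors + 1)]
    for _ in range(rounds):
        juniper.receive(bad)          # JUNIPER recognizes the bad attribute
        if cisco.receive(bad):        # CISCO does not, so it forwards it
            for tor in tors:
                tor.receive(bad)
    return juniper.resets, sum(t.resets for t in tors)


if __name__ == "__main__":
    for policy in ("reset", "hold-down", "treat-as-withdraw"):
        print(policy, simulate(policy))
```

With "reset" every TOR flaps once per round, with "hold-down" each TOR goes down once and stays down, and with "treat-as-withdraw" the sessions stay up but the affected prefix drops out of the RIB. The sketch says nothing about which of those is correct for the protocol; that is the question being argued in this thread.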