The difference between ignore and treat-as-withdraw is discussed in the draft.
Anyway, the feature will be knobbed. So far, I see 2 knob options:

1. Treat-as-withdraw / ignore
2. If stream sync is lost, search ahead for the next 16*0xFF / reset the session.
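To make the "search ahead" side of knob 2 concrete, here is a minimal sketch of resynchronizing on the BGP marker. It assumes the base header layout from RFC 4271 (16 octets of 0xFF, a 2-octet length, a 1-octet type) and the 4096-octet maximum message size. The function name find_next_message and the constants are hypothetical, chosen for illustration only, and are not taken from the draft or from any implementation.

```python
from typing import Optional

MARKER = b"\xff" * 16          # BGP marker: 16 octets of 0xFF (RFC 4271)
HEADER_LEN = 19                # marker (16) + length (2) + type (1)
MAX_MSG_LEN = 4096             # maximum message size in base BGP
KNOWN_TYPES = {1, 2, 3, 4, 5}  # OPEN, UPDATE, NOTIFICATION, KEEPALIVE, ROUTE-REFRESH


def find_next_message(buf: bytes, start: int = 0) -> Optional[int]:
    """Return the offset of the next plausible BGP header at or after start.

    Returns None if no candidate is found, or if the only candidate's
    header is not yet fully buffered (hypothetical helper, sketch only).
    """
    pos = buf.find(MARKER, start)
    while pos != -1:
        if pos + HEADER_LEN > len(buf):
            return None  # header incomplete; the caller needs more bytes
        length = int.from_bytes(buf[pos + 16:pos + 18], "big")
        msg_type = buf[pos + 18]
        # Accept the candidate only if length and type look like a real header.
        if HEADER_LEN <= length <= MAX_MSG_LEN and msg_type in KNOWN_TYPES:
            return pos
        # Otherwise slide forward one octet and look for the next run of 0xFF.
        pos = buf.find(MARKER, pos + 1)
    return None
```

The length and type checks are only a plausibility filter: 16 consecutive 0xFF octets can also occur inside a message body, so this is a heuristic, and everything skipped before the returned offset is lost. That is why it would sit behind a knob, with session reset as the other setting.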

Cheers,
Jakob

On Dec 31, 2012, at 10:14 AM, "Jeff Wheeler" <j...@inconcepts.biz> wrote:

> On Mon, Dec 31, 2012 at 12:54 PM, Jakob Heitz <jakob.he...@ericsson.com>
> wrote:
>> I don't think treat-as-withdraw is trying to fix a single session reset.
>> Graceful restart can fix that. It's the rolling resets that need a human to
>> remove a buggy router or a config that triggered the bug. That takes several
>> hours. Treat-as-withdraw limits the damage during those hours.
>>
>> Could we please settle on that without trying to solve the impossible?
>
> If you read my posts to IDR on this topic, you'll see where I explain
> how it is possible to solve the "impossible."
>
> Specifically, you can ignore just about any bad update, or bad message
> of any kind, as long as you can figure out where the next message
> starts. This fixes the rolling resets.
>
> You may know that a lot of businesses suffered multi-hour outages in
> October simply because of 5 DFZ routes announced by LANL that had
> illegal attributes. This is very hard for operators to troubleshoot
> on most routers. The routers that experienced rolling resets were
> buggy but if the operators simply had a panic button, "ignore bad
> messages," their networks would have been up and they would not have
> been losing money by the minute.
>
> This is not the only time bad updates have propagated through the DFZ
> and caused big problems. It has happened repeatedly.
>
> I believe it will begin to happen more often inside datacenter
> networks, because BGP is being used for more and more things, like
> EVPN. Operators are going to need BGP to become more robust.
>
> You can make it a lot more robust just by deciding to ignore
> everything in a bad message. This is not good, but it is a lot better
> than session-reset, in most cases.
>
> Please, read my posts on this topic, and do not treat this problem as
> an unsolvable one. It can be largely solved in a way that gives a
> very useful fallback option to operators.
>
> This whole draft is about fallback options, and it is pretty stupid to
> have a large amount of complexity to solve a small set of potential
> bugs, when you could ALTERNATIVELY or IN ADDITION to that, have a very
> low-complexity option that solves more problems.
>
> --
> Jeff S Wheeler <j...@inconcepts.biz>
> Sr Network Operator / Innovative Network Concepts
_______________________________________________
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow