The difference between ignore and treat-as-withdraw is discussed in the draft.
Anyway, the feature will be configurable. So far, I see two knobs, each
with two settings:
1. Treat-as-withdraw or ignore
2. If stream sync is lost, either search ahead for the next marker
   (16 octets of 0xFF) or reset the session.
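
To make option 2 concrete, here is a rough sketch of the "search ahead"
behaviour. It is only an illustration, not what any implementation does: the
function name and constants are made up for this example, and the plausibility
check (length between 19 and 4096, known message type) is my assumption based
on the RFC 4271 header layout. A marker can also appear inside a message body,
so a hit is a candidate boundary, not a guarantee.

```python
import struct

MARKER = b"\xff" * 16           # BGP header marker: 16 octets of 0xFF
HEADER_LEN = 19                 # marker (16) + length (2) + type (1)
MAX_MSG_LEN = 4096              # maximum message length in RFC 4271
VALID_TYPES = {1, 2, 3, 4, 5}   # OPEN, UPDATE, NOTIFICATION, KEEPALIVE, ROUTE-REFRESH


def find_next_message(buf: bytes, start: int = 0) -> int:
    """Return the offset of the next plausible BGP header at or after
    `start`, or -1 if none is found in the buffered data (hypothetical helper)."""
    pos = buf.find(MARKER, start)
    while pos != -1:
        if pos + HEADER_LEN > len(buf):
            # Marker seen, but the length/type octets have not arrived yet.
            return -1
        length, msg_type = struct.unpack_from("!HB", buf, pos + 16)
        if HEADER_LEN <= length <= MAX_MSG_LEN and msg_type in VALID_TYPES:
            return pos
        # Implausible header: slide forward one octet and keep searching.
        pos = buf.find(MARKER, pos + 1)
    return -1


# Usage: junk octets followed by a KEEPALIVE (length 19, type 4).
keepalive = MARKER + struct.pack("!HB", 19, 4)
offset = find_next_message(b"\x00\x01garbage" + keepalive)
```

If the function returns -1 after some bounded amount of data, the other
setting of the knob (reset the session) is the obvious fallback.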

Cheers,
Jakob

On Dec 31, 2012, at 10:14 AM, "Jeff Wheeler" <j...@inconcepts.biz> wrote:

> On Mon, Dec 31, 2012 at 12:54 PM, Jakob Heitz <jakob.he...@ericsson.com> wrote:
>> I don't think treat-as-withdraw is trying to fix a single session reset. 
>> Graceful restart can fix that. It's the rolling resets that need a human to 
>> remove a buggy router or a config that triggered the bug. That takes several 
>> hours. Treat-as-withdraw limits the damage during those hours.
>> 
>> Could we please settle on that without trying to solve the impossible?
> 
> If you read my posts to IDR on this topic, you'll see where I explain
> how it is possible to solve the "impossible."
> 
> Specifically, you can ignore just about any bad update, or bad message
> of any kind, as long as you can figure out where the next message
> starts.  This fixes the rolling resets.
> 
> You may know that a lot of businesses suffered multi-hour outages in
> October simply because of 5 DFZ routes announced by LANL that had
> illegal attributes.  This is very hard for operators to troubleshoot
> on most routers.  The routers that experienced rolling resets were
> buggy but if the operators simply had a panic button, "ignore bad
> messages," their networks would have been up and they would not have
> been losing money by the minute.
> 
> This is not the only time bad updates have propagated through the DFZ
> and caused big problems.  It has happened repeatedly.
> 
> I believe it will begin to happen more often inside datacenter
> networks, because BGP is being used for more and more things, like
> EVPN.  Operators are going to need BGP to become more robust.
> 
> You can make it a lot more robust just by deciding to ignore
> everything in a bad message.  This is not good, but it is a lot better
> than session-reset, in most cases.
> 
> Please, read my posts on this topic, and do not treat this problem as
> an unsolvable one.  It can be largely solved in a way that gives a
> very useful fallback option to operators.
> 
> This whole draft is about fallback options, and it is pretty stupid to
> have a large amount of complexity to solve a small set of potential
> bugs, when you could ALTERNATIVELY or IN ADDITION to that, have a very
> low-complexity option that solves more problems.
> 
> -- 
> Jeff S Wheeler <j...@inconcepts.biz>
> Sr Network Operator  /  Innovative Network Concepts
_______________________________________________
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow
