Re: [rrg] RRG to hibernation

Shane Amante Sat, 10 Nov 2012 17:39:06 -0800

On Nov 10, 2012, at 10:35 AM, Danny McPherson <da...@tcb.net> wrote:
> On Nov 10, 2012, at 12:24 PM, Tony Li wrote:
[--snip--]
>> I agree that some security needs to be deployed.  I'm not convinced that it 
>> needs to be BGPSEC.  We've muddled along for many years and never found the 
>> gumption to actually deploy anything.  Must not be important to people.  I 
>> don't get it, but that's the observable behavior.  
>> 
>> In any case, this doesn't seem like a research topic.  This is pretty 
>> clearly an engineering issue.
> 
> I don't agree.  The engineering solution that SIDR is actively working 
> (RPKI-enabled BGPSEC) is pumping out standards track RFCs like there's no 
> tomorrow.  The USG has stated intentions of "expediting secure routing work 
> through the Internet standard process" and "fostering adoption through 
> government procurement vehicles".  
> 
> As an operator this scares the hell out of me, especially considering what 
> they've designed is largely a system to control "what's routed on the 
> Internet and by whom".  They can't seem to do anything in BGP(SEC) without 
> introducing the equivalent of "periodic updates", and undoing all the 
> goodness of things like update packing completely.  
> 
> Some serious thinkers working on this problem would be goodness...


Let me add that I share Danny's concerns ...

However, let me try to take a step back and share with everyone a much broader 
set of, potentially, architectural concerns that I'm not sure this RG 
considered during the last round.

BGP was originally designed for flooding of reachability information.  But, 
reachability information is the end-result /after/ the application of 
_routing_policy_, describing "intent", by operators of individual networks 
based on various contractual agreements they have with parties whom they 
directly interconnect.  Assuming you agree with this premise, this presents a 
paradox from a security PoV.  Specifically, if a downstream network does not 
have visibility into its upstream network's routing policy is it 
practical/feasible for the downstream network to understand the _intended_ 
propagation of reachability information and, ultimately, connectivity?  
Furthermore, is it feasible to carry such information within the control plane 
itself?  Or, should the control plane be relegated to carrying [strictly] 
reachability information in real-time, while offboard systems carry 
accompanying routing policy and security information in order to assist in 
making "optimal" Inter-Domain routing/forwarding decisions?
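
To make the "offboard" option concrete, here is a minimal sketch in Python. It assumes the control plane delivers only a prefix and an AS path, while a separate, hypothetical policy store records the origin AS and the upstreams through which the origin intends the prefix to propagate. The store layout, the function name check_path and the AS numbers (taken from the documentation range) are all assumptions made for illustration; nothing here describes an existing system.

```python
# Hypothetical off-board policy store: prefix -> declared intent.
# In this sketch the data is never carried inside BGP itself.
OFFBOARD_POLICY = {
    "192.0.2.0/24": {"origin": 64500, "upstreams": {64501, 64502}},
}


def check_path(prefix, as_path, policy=OFFBOARD_POLICY):
    """Compare a received AS path (origin last) with the declared intent."""
    entry = policy.get(prefix)
    if entry is None:
        return "unknown"                  # no intent published for this prefix
    if not as_path or as_path[-1] != entry["origin"]:
        return "origin-mismatch"          # announced by an unexpected origin
    if len(as_path) >= 2 and as_path[-2] not in entry["upstreams"]:
        return "unintended-propagation"   # leaked via an undeclared upstream
    return "consistent"


if __name__ == "__main__":
    print(check_path("192.0.2.0/24", [64510, 64501, 64500]))
    print(check_path("192.0.2.0/24", [64510, 64505, 64500]))
```

The point of the sketch is only that the receiver can judge intent without the upstream's policy travelling in the UPDATE itself; how such a store would be populated, secured and kept current is exactly the open question.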

A second concern is also related to the original design of BGP and what it has 
organically evolved into, today.  Specifically, BGP is /also/ now being tasked 
as a generic "message bus" and service discovery mechanism.  Not to pick on 
anyone, in particular, but the following are recent examples that come to my 
mind wrt this trend:
http://tools.ietf.org/html/draft-ietf-idr-ls-distribution-01
http://tools.ietf.org/html/draft-ietf-idr-operational-message-00
... and, there may be others.  Although, contrast those proposals with what 
should be most concerning to people in this RG, and in the IETF:
http://tools.ietf.org/html/draft-ietf-grow-ops-reqs-for-bgp-error-handling-05
In short, operators (such as myself) are _extremely_ concerned that a single 
erroneous update results in a complete reset of BGP sessions.  Due to the 
overwhelming success of BGP, it's now (and, has been for a while) a 
mission-critical protocol, thus such catastrophic session resets -- caused by a 
single malformed UPDATE -- are widely visible/impactful.  This impact is 
compounded by the 'cost to recover'.  Namely, due to the large and growing 
amount of information in the RIB (again, not just reachability, but also 
service-discovery and completely orthogonal information), it takes longer to 
exchange RIB information and, ultimately, restore services.  Is this really the 
best we, as an industry, can do?
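
A toy model may help show why the blast radius matters. The sketch below, in Python, contrasts a session that is torn down on any malformed UPDATE with one that contains the error to the affected prefix. The classes and function names are hypothetical, and the containment behaviour is an assumption made for illustration, not a description of what any IETF draft specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    prefix: str
    malformed: bool = False


@dataclass
class Session:
    rib: dict = field(default_factory=dict)  # prefix -> reachable flag
    resets: int = 0


def process_reset_on_error(session, updates):
    """Model of today's behaviour: one bad UPDATE resets the session."""
    for u in updates:
        if u.malformed:
            session.rib.clear()   # every route learned on the session is lost
            session.resets += 1   # ... and must be exchanged again afterwards
            continue
        session.rib[u.prefix] = True


def process_contain_error(session, updates):
    """Assumed alternative: drop only the prefix named in the bad UPDATE."""
    for u in updates:
        if u.malformed:
            session.rib.pop(u.prefix, None)
            continue
        session.rib[u.prefix] = True


if __name__ == "__main__":
    updates = [Update(f"10.0.{i}.0/24") for i in range(200)]
    updates.append(Update("192.0.2.0/24", malformed=True))

    a, b = Session(), Session()
    process_reset_on_error(a, updates)
    process_contain_error(b, updates)
    print(len(a.rib), a.resets)  # 0 1
    print(len(b.rib), b.resets)  # 200 0
```

In the first case the cost to recover scales with everything carried on the session; in the second it scales with the one bad prefix. The model ignores the harder question of whether a malformed message can be parsed well enough to know which prefix it names.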

While the IETF IDR WG has been looking at mechanisms for how BGP may defend 
against certain types of erroneous BGP UPDATE's for external BGP sessions:
http://tools.ietf.org/html/draft-ietf-idr-error-handling-02
... there does not appear to be any [straightforward] answer with respect to 
internal BGP sessions, given the requirement that BGP speakers internal to an 
AS must have a globally consistent RIB and FIB, otherwise packet forwarding 
loops will result.  And, in my personal operational experience it's _rarely_ 
the case that malformed UPDATE's are detected at the first ASBR (attached to an 
eBGP neighbor) in my AS, thus I am concerned that mechanisms such as 
draft-ietf-idr-error-handling-02 are not an adequate solution to the problems we 
experience.  IOW, as an operator I desire "defense in depth" where a 
heterogeneous mix of vendor equipment (HW + SW), participating as interior BGP 
speakers, have mechanisms to detect *and* automatically recover from malformed 
UPDATE's received over iBGP sessions.  This is another area that I would point 
research colleagues toward.
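
The consistency requirement can be illustrated with a small sketch in Python. It assumes a three-router AS (A, B, C) whose exit for a prefix moves from A to C; router B is assumed to have discarded the malformed UPDATE carrying the new path and so keeps forwarding toward the old exit. The router names, the FIB layout and the trace function are all hypothetical.

```python
def trace(fibs, start, prefix, max_hops=16):
    """Follow next hops for `prefix` from `start`; return (path, outcome)."""
    path = [start]
    node = start
    for _ in range(max_hops):
        nxt = fibs[node].get(prefix)
        if nxt is None:
            return path, "dropped"       # no route at this router
        if nxt == "EXIT":
            return path, "delivered"     # packet leaves the AS
        if nxt in path:
            return path + [nxt], "loop"  # revisiting a router: forwarding loop
        path.append(nxt)
        node = nxt
    return path, "hop-limit"


P = "192.0.2.0/24"

# All three routers accepted the new path (exit via C).
consistent = {"A": {P: "B"}, "B": {P: "C"}, "C": {P: "EXIT"}}

# B discarded the UPDATE and still points at the old exit (A),
# while A has already switched to forwarding via B.
inconsistent = {"A": {P: "B"}, "B": {P: "A"}, "C": {P: "EXIT"}}

if __name__ == "__main__":
    print(trace(consistent, "A", P))    # (['A', 'B', 'C'], 'delivered')
    print(trace(inconsistent, "A", P))  # (['A', 'B', 'A'], 'loop')
```

This is why simply ignoring a bad UPDATE on one iBGP speaker is not enough: whatever recovery mechanism is used has to bring every speaker in the AS back to the same view.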

So, this raises the classic conundrum: increasing complexity and increasing RIB 
(and FIB) size, coupled with a contrasting need from operators who are 
concerned about the robustness of the protocol and the requirement to NOT 
sustain any failures[1].  Something's got to give.

Ultimately, this makes me question whether it's no longer _just_ growth of RIB 
(and, FIB) size that this RG should be (primarily?) focused on.  Rather, will 
the requirements for:
a) operational robustness, in the face of critical messaging errors in an 
Inter-Domain Routing Protocol, which the IETF may be unable to address on its 
own;
b) designing security as a first-class principle of an Inter-Domain Routing 
Protocol -- either carried within or outside of control-plane reachability 
information;
c) increased scalability of RIB (and, other?) information
... lead us down a path of considering we may be approaching the 
end-of-the-road for BGPv4 and we need something new?

Does anyone on this list share similar concerns wrt operational robustness, 
time to recovery and (then) scalability of BGPv4?

-shane

[1] It is not cool to suggest that operators should just stop asking for new 
features and we wouldn't have this problem.  :)
_______________________________________________
rrg mailing list
rrg@irtf.org
http://www.irtf.org/mailman/listinfo/rrg
