+1

I might even have some ideas on where to get some ideas, if we can convince the researchers in question to come forward. We could start with a requirements doc, which I'd be willing to co-author with someone once I replace the hard drive in my computer.
Russ

On Nov 13, 2012, at 9:09 AM, "George, Wes" <wesley.geo...@twcable.com> wrote:

> Changing subject line to reflect topic.
>
> Shane has articulated a number of concerns that I think would be useful for RRG to spend some time working on, and I tend to agree with Danny that the current BGPSec solution seems to be more about "hacking at the edges" to get something that is marginally better in some ways than the [lack of] security that we have now, potentially ignoring the known scaling problems this group has discussed at length, all the while doing several things likely to exacerbate them. It makes me concerned about whether it will see significant deployment, given the large amount of required investment vs. the potential benefit. I know I have asked more than once about the scaling implications of BGPSec, since it potentially has a large impact on the footprint of the routing data that must be stored and managed, and I haven't exactly been pleased with the answers, even though some analysis has been done to show that it's not a bad thing.
>
> If I were to distill things down, today we have a growth curve for both the routing table (both RIB and FIB) and for cost-effective hardware with the horsepower necessary to manage it (CPU, ASIC, memory, etc.). SIDR is likely not the one thing that will break the routing system by causing those curves to cross, but it certainly changes the curves' pitch such that it's more likely that the cost of keeping up with the demands of the system becomes unmanageable, even if it doesn't actually reach the limits of the technology. The investment in a network for scale and growth is incremental, and SIDR's full justification is that those incremental upgrades will bring hardware that can support its needs organically.
> However, things like BGPSec or other disturbances that increase the growth curve of the routing table and related scaling vectors mean that, as an operator, I have to shorten my upgrade cycle and spend capital earlier than originally projected, possibly even to the point where I can't make it through an entire depreciation cycle (5-7 years) before needing to spend additional money on upgrades. In a network that is driven by commoditization of prices, that's not a good position to be in.
>
> Additionally, as Shane alluded to in another message, this isn't simply about DFZ scale, but also internal scale, where there are commonly a *LOT* more routes being carried by your average router inside an ISP's network. There are also other considerations, like the rate of updates due to background churn vs. during an event, other things that the control plane must manage simultaneously, etc. Taking a step even further away from where RRG has previously focused, there is a similar sort of scaling problem within the L3VPN space that is typically self-contained within the SP's network. While I think there are some engineering solutions that may help with the short-term scaling issues, there may also be some meat for research in the area of modeling and instrumentation of the routing system to give SPs better tools to use their available capacity efficiently, and possibly even changes to help the routing control plane degrade more gracefully and deterministically. The L3VPN discussion is detailed in draft-gs-vpn-scaling-01 (an -02 rev is due soon, waiting on co-author review and a few more updates), specifically in sections 6 and 6.5 for the modeling/instrumentation, and in sections 4 and 5 for ways that the control plane tends to break down at scale limits.
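Wes's depreciation argument above is essentially compound-growth arithmetic. The following is a minimal Python sketch, assuming routing state grows at a constant compound annual rate and that a router must be replaced once its state no longer fits its installed capacity. The function name, the 15% growth rate, the 4x headroom and the 2x per-route footprint multiplier are all hypothetical numbers chosen for illustration; they are not measurements of BGPSec or of any real platform.

```python
import math


def years_until_exhausted(initial_load: float, capacity: float, annual_growth: float) -> float:
    """Years until a load growing at a compound annual rate reaches capacity.

    Solves initial_load * (1 + annual_growth) ** t == capacity for t.
    """
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if initial_load >= capacity:
        return 0.0  # already out of headroom on day one
    return math.log(capacity / initial_load) / math.log(1.0 + annual_growth)


# Hypothetical router bought with 4x headroom over today's routing state.
CAPACITY = 4.0
GROWTH = 0.15           # hypothetical 15% compound annual growth in routing state
DEPRECIATION_YEARS = 5  # low end of the 5-7 year cycle mentioned above

baseline = years_until_exhausted(1.0, CAPACITY, GROWTH)
# Same growth rate, but a hypothetical 2x larger per-route footprint
# (e.g. from carrying extra security data alongside each path).
heavier = years_until_exhausted(2.0, CAPACITY, GROWTH)

print(f"baseline lifetime:       {baseline:.2f} years")
print(f"with 2x route footprint: {heavier:.2f} years")
print("outlives depreciation cycle?", heavier >= DEPRECIATION_YEARS)
```

In this toy model a constant multiplier k on the per-route footprint costs a fixed log(k)/log(1+g) years of lifetime no matter how much headroom was bought; with the assumed numbers that takes the box from roughly 9.9 years to roughly 5.0, just short of the low end of a 5-7 year depreciation cycle.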
>
> Thanks,
>
> Wes George
>
>> -----Original Message-----
>> From: rrg-boun...@irtf.org [mailto:rrg-boun...@irtf.org] On Behalf Of Shane Amante
>> Sent: Saturday, November 10, 2012 8:39 PM
>> To: rrg@irtf.org
>> Subject: Re: [rrg] RRG to hibernation
>>
>> On Nov 10, 2012, at 10:35 AM, Danny McPherson <da...@tcb.net> wrote:
>>> On Nov 10, 2012, at 12:24 PM, Tony Li wrote:
>> [--snip--]
>>>> I agree that some security needs to be deployed. I'm not convinced that it needs to be BGPSEC. We've muddled along for many years and never found the gumption to actually deploy anything. Must not be important to people. I don't get it, but that's the observable behavior.
>>>>
>>>> In any case, this doesn't seem like a research topic. This is pretty clearly an engineering issue.
>>>
>>> I don't agree. The engineering solution that SIDR is actively working on (RPKI-enabled BGPSEC) is pumping out standards-track RFCs like there's no tomorrow. The USG has stated intentions of "expediting secure routing work through the Internet standard process" and "fostering adoption through government procurement vehicles".
>>>
>>> As an operator this scares the hell out of me, especially considering that what they've designed is largely a system to control "what's routed on the Internet and by whom". They can't seem to do anything in BGP(SEC) without introducing the equivalent of "periodic updates", and undoing all the goodness of things like update packing completely.
>>>
>>> Some serious thinkers working on this problem would be goodness...
>>
>> Let me add that I share Danny's concerns ...
>>
>> However, let me try to take a step back and share with everyone a much broader set of, potentially, architectural concerns that I'm not sure this RG considered during the last round.
>>
>> BGP was originally designed for flooding of reachability information.
>> But reachability information is the end result, /after/ the application of _routing_policy_, describing "intent", by operators of individual networks based on the various contractual agreements they have with the parties with whom they directly interconnect. Assuming you agree with this premise, this presents a paradox from a security PoV. Specifically, if a downstream network does not have visibility into its upstream network's routing policy, is it practical/feasible for the downstream network to understand the _intended_ propagation of reachability information and, ultimately, connectivity? Furthermore, is it feasible to carry such information within the control plane itself? Or should the control plane be relegated to carrying [strictly] reachability information in real time, while offboard systems carry accompanying routing policy and security information in order to assist in making "optimal" Inter-Domain routing/forwarding decisions?
>>
>> A second concern is also related to the original design of BGP and what it has organically evolved into today. Specifically, BGP is /also/ now being tasked as a generic "message bus" and service discovery mechanism. Not to pick on anyone in particular, but the following are recent examples that come to my mind wrt this trend:
>> http://tools.ietf.org/html/draft-ietf-idr-ls-distribution-01
>> http://tools.ietf.org/html/draft-ietf-idr-operational-message-00
>> ... and there may be others. Contrast those proposals with what should be most concerning to people in this RG, and in the IETF:
>> http://tools.ietf.org/html/draft-ietf-grow-ops-reqs-for-bgp-error-handling-05
>> In short, operators (such as myself) are _extremely_ concerned that a single erroneous update results in a complete reset of BGP sessions.
>> Due to the overwhelming success of BGP, it's now (and has been for a while) a mission-critical protocol, so such catastrophic session resets, caused by a single malformed UPDATE, are widely visible and impactful. This impact is compounded by the 'cost to recover'. Namely, due to the large and growing amount of information in the RIB (again, not just reachability, but also service-discovery and completely orthogonal information), it takes longer to exchange RIB information and, ultimately, restore services. Is this really the best we, as an industry, can do?
>>
>> While the IETF IDR WG has been looking at mechanisms by which BGP may defend against certain types of erroneous BGP UPDATEs on external BGP sessions:
>> http://tools.ietf.org/html/draft-ietf-idr-error-handling-02
>> ... there does not appear to be any [straightforward] answer with respect to internal BGP sessions, given the requirement that BGP speakers internal to an AS must have a globally consistent RIB and FIB; otherwise, packet forwarding loops will result. And, in my personal operational experience, it's _rarely_ the case that malformed UPDATEs are detected at the first ASBR (attached to an eBGP neighbor) in my AS, so I doubt that mechanisms such as draft-ietf-idr-error-handling-02 are an adequate solution to the problems we experience. IOW, as an operator I desire "defense in depth", where a heterogeneous mix of vendor equipment (HW + SW), participating as interior BGP speakers, has mechanisms to detect *and* automatically recover from malformed UPDATEs received over iBGP sessions. This is another area that I would point research colleagues toward.
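To make the contrast Shane describes concrete, here is a toy Python model of a full session reset versus a "treat-as-withdraw" style of handling, i.e. discarding only the prefixes carried in the bad UPDATE, which is the general direction of the IDR error-handling work cited above. It is a hedged sketch, not BGP's actual finite-state machine: the `Session` class, the `process_update` and `demo` functions, and the string-valued attributes are hypothetical simplifications, and a real implementation must also decide which errors are safe to handle this way.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ErrorAction(Enum):
    SESSION_RESET = "session-reset"          # tear down the session, flush everything learned
    TREAT_AS_WITHDRAW = "treat-as-withdraw"  # drop only the prefixes in the bad UPDATE


@dataclass
class Session:
    """Hypothetical, greatly simplified view of routes learned over one BGP session."""
    rib: Dict[str, str] = field(default_factory=dict)  # prefix -> path attributes (as a string)
    resets: int = 0


def process_update(session: Session, prefixes: List[str], attrs: str,
                   malformed: bool, action: ErrorAction) -> None:
    """Apply one UPDATE to the session's RIB under the chosen error-handling policy."""
    if not malformed:
        for prefix in prefixes:
            session.rib[prefix] = attrs
        return
    if action is ErrorAction.SESSION_RESET:
        # Every route from this peer is lost and must be re-learned after the
        # session comes back up: this is the 'cost to recover'.
        session.rib.clear()
        session.resets += 1
    else:
        # Only the prefixes carried in the malformed UPDATE are withdrawn.
        for prefix in prefixes:
            session.rib.pop(prefix, None)


def demo(action: ErrorAction) -> Session:
    s = Session()
    process_update(s, ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"],
                   "AS_PATH 64500", malformed=False, action=action)
    # A single malformed UPDATE touching one prefix.
    process_update(s, ["198.51.100.0/24"], "<garbled>", malformed=True, action=action)
    return s


reset = demo(ErrorAction.SESSION_RESET)
taw = demo(ErrorAction.TREAT_AS_WITHDRAW)
print(len(reset.rib), reset.resets)  # 0 1
print(len(taw.rib), taw.resets)      # 2 0
```

The same model also hints at the iBGP concern: if one interior speaker applies treat-as-withdraw while another accepts the same UPDATE, their RIBs diverge, which is exactly the kind of inconsistency that can produce forwarding loops.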
>>
>> So, this raises the classic conundrum: increasing complexity and increasing RIB (and FIB) size, coupled with a contrasting need from operators who are concerned about the robustness of the protocol and the requirement to NOT sustain any failures[1]. Something's got to give.
>>
>> Ultimately, this makes me question whether it's no longer _just_ growth of RIB (and FIB) size that this RG should be (primarily?) focused on. Rather, will the requirements for:
>> a) operational robustness, in the face of critical messaging errors in an Inter-Domain Routing Protocol, which the IETF may be unable to address on its own;
>> b) designing security as a first-class principle of an Inter-Domain Routing Protocol, either carried within or outside of control-plane reachability information; and
>> c) increased scalability of RIB (and other?) information
>> ... lead us down a path of considering that we may be approaching the end of the road for BGPv4 and need something new?
>>
>> Does anyone on this list share similar concerns wrt operational robustness, time to recovery and (then) scalability of BGPv4?
>>
>> -shane
>>
>> [1] It is not cool to suggest that operators should just stop asking for new features and we wouldn't have this problem. :)
>> _______________________________________________
>> rrg mailing list
>> rrg@irtf.org
>> http://www.irtf.org/mailman/listinfo/rrg