Re: So -- what did happen to Panix?
On Wed, Feb 08, 2006 at 04:37:31AM +, Christopher L. Morrow wrote: I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of: 1) if more than one announcement prefer 'longer term', 'older', 'more usual' route 2) if only one route take it and run! FWIW, this sort of mechanism was discussed among the IETF RPSEC WG task group that is working on BGP security requirements. On the presumption that some database of stable routes and paths is present, you could bias your preference in your routes for more stable routes and paths. You would also need to decide what to do about more specific routes covered by stable routes. Do you ignore them? This is a harder question. -- Jeff Haas NextHop Technologies
Re: So -- what did happen to Panix?
Here is what we propose in PGBGP. If you have a more specific route and its AS path does not contain any of the less specific route's origins, then ignore it for a day and keep routing to the less specific origin. If it's legitimate, the less specific origin should forward the data on for the day. We see about 30 of these suspicious routes per day. I imagine some of you will not like this scheme. Please let me know why.

Josh

On 2/8/06, Jeffrey Haas [EMAIL PROTECTED] wrote: On Wed, Feb 08, 2006 at 04:37:31AM +, Christopher L. Morrow wrote: I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of: 1) if more than one announcement prefer 'longer term', 'older', 'more usual' route 2) if only one route take it and run! FWIW, this sort of mechanism was discussed among the IETF RPSEC WG task group that is working on BGP security requirements. On the presumption that some database of stable routes and paths is present, you could bias your preference in your routes for more stable routes and paths. You would also need to decide what to do about more specific routes covered by stable routes. Do you ignore them? This is a harder question. -- Jeff Haas NextHop Technologies
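The sub-prefix rule Josh states above can be sketched roughly as follows. This is only an illustration of the one rule as worded in this message, not the PGBGP algorithm from the paper; the `Route` class, the function names, and the documentation-range prefixes and AS numbers used to exercise it are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from ipaddress import ip_network

QUARANTINE = timedelta(days=1)  # "ignore it for a day"


@dataclass(frozen=True)
class Route:
    prefix: str      # e.g. "198.51.100.128/25" (hypothetical example prefix)
    as_path: tuple   # ASNs as received; the origin AS is the last element


def covering_origins(new, known):
    """Origins of known routes that are strictly less specific than `new`."""
    new_net = ip_network(new.prefix)
    origins = set()
    for route in known:
        net = ip_network(route.prefix)
        same_family = net.version == new_net.version
        if same_family and net != new_net and new_net.subnet_of(net):
            origins.add(route.as_path[-1])
    return origins


def is_suspicious(new, known):
    """True if `new` is a more specific route whose AS path contains
    none of the origins of the less specific routes that cover it."""
    origins = covering_origins(new, known)
    return bool(origins) and origins.isdisjoint(new.as_path)


def usable(new, known, first_seen, now):
    """A suspicious route is ignored until it has been visible for a day."""
    return not is_suspicious(new, known) or now - first_seen >= QUARANTINE
```

With a known route for 198.51.100.0/24 originated by AS64496, a /25 inside it whose path never mentions AS64496 would be held back for a day, while the same /25 arriving via a path that includes AS64496 would be accepted immediately.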
Re: So -- what did happen to Panix?
Martin Hannigan wrote: My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multi homed by definition and your providers are going to remain fairly stable (you hope) based on your own needs.

My point remains: designs based on such assumptions are not a good idea, since these assumptions are by no means fundamental and could certainly change. People get creative with how they announce prefixes, change upstreams, etc., and you can't assume that things like this would stay the way they are.

As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service?

-Nick
Re: So -- what did happen to Panix?
On Tue, 7 Feb 2006, Nick Feamster wrote: As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service?

I had thought Josh's paper (or maybe not Josh, whomever it was) said something along the lines of:

1) if more than one announcement, prefer the 'longer term', 'older', 'more usual' route
2) if only one route, take it and run!

So.. provided Connexion withdraws from 'as-germany' and announces in 'as-atlantic ocean', and so on, there would only be 1 route, and you'd fall to step 2. (yes, the paper was more detailed and there were more steps...)
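The two-step summary above could be written down as the sketch below. It assumes each candidate is just an (origin AS, first-seen time) pair and treats "older" as the only measure of "more usual"; as noted, the paper has more steps, so the function name and data shape here are purely illustrative.

```python
from datetime import datetime


def pick_route(candidates):
    """candidates: list of (origin_asn, first_seen) pairs for one prefix."""
    if not candidates:
        return None
    if len(candidates) == 1:
        # Step 2: only one route, so take it and run.
        return candidates[0]
    # Step 1: several announcements, so prefer the one seen for longest.
    return min(candidates, key=lambda c: c[1])
```

In the Connexion case, once the old origin has been withdrawn only one candidate remains, so the new announcement is taken without delay.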
Re: So -- what did happen to Panix?
At 11:27 PM 2/7/2006, Nick Feamster wrote: Martin Hannigan wrote: My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multi homed by definition and your providers are going to remain fairly stable (you hope) based on your own needs. My point remains: designs based on such assumptions are not a good idea, since these assumptions are by no means fundamental and could certainly change. People get creative with how they announce prefixes, change upstreams, etc., and you can't assume that things like this would stay the way they are. Nick: I wouldn't call them assumptions. I would call them engineering decisions in operational environments. I guess I fail to see where a commodity market with a broker adding a vig resolves a real network problem. I'm thinking tier 1? They aren't buying service from anyone on Equinix direct and move/add/drop is just another day on the Internet. I really can't see any provider doing it, but perhaps smaller ones. *shrug*. I don't know why you wouldn't make temporary arrangements via peering fabric, PNI, or transit and eliminate the middle man (point of failure). As an aside, another question occurred to me about delaying unusual announcements. Boeing Connexion offers another example of unorthodox prefix announcements. Wouldn't the tactic of delaying unusual announcements cause problems for this service? [ snip ] -M -Nick Martin Hannigan(c) 617-388-2663 Renesys Corporation(w) 617-395-8574 Member of Technical Staff Network Operations [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
Chris has it! And to be clear, we only require a slow (1 day) provider changeover in the case that you want to announce your old provider's sub-prefix at a new provider. For instance, if you are an ATT customer using a 12/8 sub-prefix and change providers but keep the prefix, the prefix will look funny coming from another originator for the first day and be delayed. All other methods of changing providers will not be interfered with. Josh I had thought Josh's paper (or maybe not josh, whomever it was) said something along the lines of: 1) if more than one announcement prefer 'longer term', 'older', 'more usual' route 2) if only one route take it and run! So.. provided Connexion withdraws from 'as-germany' and announces in 'as-atlantic ocean', and so on there would only be 1 route, and you'd fall to step 2. (yes, the paper was more detailed and there were more steps...)
Re: So -- what did happen to Panix?
If an IRR suffers from bit-rot, then I don't consider it to be well-operated and therefore it cannot be considered to be part of a well-operated network of IRRs. honestly I'm not a fan of IRR's, so don't pay attention to them, but... is the IRR 'not well operated' or is the data stale because the 'users' of the IRR are 'not well operated' ? (the IRR as near as I can tell is nothing but a web/whois server that you sign-up-for and push/pull data through, right?) Indeed it is not much more than a server with a database which is why I do not consider it to be well-operated. In order to be well-operated, somebody (or some organization) needs to take responsibility for the data in the database and make sure that this data is as accurate as can be. I'm really saying that if people want to solve this problem jointly, then the tools are already there for a membership organization to use. And such an organization could also work on a revised BGP protocol as a longer term solution. But, in the absence of such an organization we have nothing more than a disorganized chaos in which nothing much changes. --Michael Dillon
Re: So -- what did happen to Panix?
Other networks have no such incentive, since their transit providers and peers either build their filters in other ways, or don't filter at all. There is nothing wrong with building your filter in some other way, however, that does not mean that you cannot validate your filters against the IRR and take some action on mismatches. For instance you could email the prefix owners about the mismatch and ask them to update the IRR. Wherever there is a lack of incentive to keep records accurate, we can probably safely assume that they are either missing or stale. Yes. Without regular validation or auditing of data, it does not stay up to date. It's probably fair to say that if all the large, default-free carriers insisted that their customers submitted their routes to the IRR, then every route would be registered. This would not completely address the problem of stale data, though. It's a good start. Perhaps if we decouple the idea of an IRR from building filters more people will see the usefulness of a distributed repository of information against which they can validate (cryptographically or otherwise) their routing data. Right now the secure BGP protocols require a network to climb the hurdles of cryptographic certification in order to participate. A revised and renewed IRR can lower that barrier so that people can participate even before they implement cryptographic signing and certification. The IRR is a loosely-connected collection of route registries, all run by different people. Data originating in one database is frequently found to be mirrored in other databases, but not in any great systematic fashion. If the networking community can't solve the problem of managing the distributed route registries in a systematic fashion, then how can it implement one of the secure BGP proposals? --Michael Dillon
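Validating routing data against the IRR without building filters from it might look something like the sketch below. It assumes both the routing table and the registry have already been reduced to sets of (prefix, origin AS) pairs and matches prefixes exactly, which is a simplification; the function name and finding labels are invented for illustration.

```python
def audit(observed, registered):
    """Compare (prefix, origin_asn) pairs seen in BGP with registry pairs.

    Returns a list of (prefix, origin_asn, finding) tuples that a person
    or a script could follow up on, e.g. by mailing the prefix owner.
    """
    by_prefix = {}
    for prefix, asn in registered:
        by_prefix.setdefault(prefix, set()).add(asn)

    findings = []
    for prefix, asn in sorted(observed):
        if prefix not in by_prefix:
            findings.append((prefix, asn, "unregistered"))
        elif asn not in by_prefix[prefix]:
            findings.append((prefix, asn, "origin-mismatch"))

    # Registered pairs that nobody announces are candidates for cleanup.
    for prefix, asn in sorted(set(registered) - set(observed)):
        findings.append((prefix, asn, "possibly-stale"))
    return findings
```

Nothing here rejects a route; the output is only a worklist, which is the point of decoupling validation from filter building.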
Re: So -- what did happen to Panix?
At 02:05 AM 2/6/2006, Nick Feamster wrote: Martin Hannigan wrote: [ SNIP ] If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm NOT an ISP product. Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.). These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today. I got some can you elaborate comments so please forgive my second response. What I thought I read was that you thought Equinix had an interesting play in a transitioning and provisioning strategy for ISP's. My answer, in short, was to say that I see it as more of an enterprise play because it's a managed service and the hardest part of provisioning is typically the order cycle. If you are an ISP, you are theoretically multi homed by definition and your providers are going to remain fairly stable (you hope) based on your own needs. Equinix direct is a bandwidth commodity in my mind. Anyone remember Invisible Hand (still in business, btw http://www.invisiblehand.net/) Equinix handles the software interaction and is the market maker. Customers appear to providers and providers can decide if they want to sell to customers. For example, if you show up at ED and need X gigs, a provider could opt out of the market because you are a highcap customer. In the end, the market maker gets a piece of the action from the provider and sends the customer a bill since it is theoretically the provider. I think there's a question about neutrality, but there are no more pure neutral colo houses so that is somewhat irrelevant unless it's completely bogus like selling interconnect network or something vs. the ILEC. 
In an environment like Equinix or SD, you could attach to the public peering fabric and make connections, and then if you need someone specific you can hope to get them on ED (in Equinix's case) without buying dedicated transit. In short, it's easy. With that said, I believe most ISP's would be better suited to overlapped service or TE'ing vs. using commodity markets for b/w, IMHO. Thanks, -M Martin Hannigan(c) 617-388-2663 Renesys Corporation(w) 617-395-8574 Member of Technical Staff Network Operations [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
On Fri, Feb 03, 2006 at 02:15:45PM -0500, Nick Feamster wrote: [snip] This is a losing proposition. The data in the IRR, CA, or any mechanism that is updated out-of-band from the protocol itself will inherently be out-of-sync. Provisioning systems are out of sync with the protocol, but essential for many (most?) networks' connectivity. Many providers who do use the IRR have it as an adjunct/offshoot of their provisioning system. Of course, to some monolithic entities the suggestion of any alteration (or $deity-forbid, a not-invented-here *improvement*) to their system is anathema. [snip some interesting stuff] If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm See 'whois -h whois.radb.net rs-ed-ash' and similar objects; great support for IRR as externally-relevant portion of a provisioning system. Cheers, Joe -- RSUC / GweepNet / Spunk / FnB / Usenix / SAGE
Re: So -- what did happen to Panix?
[ SNIP ] If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm NOT an ISP product. -M Martin Hannigan(c) 617-388-2663 Renesys Corporation(w) 617-395-8574 Member of the Technical Staff Network Operations [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
Martin Hannigan wrote: [ SNIP ] If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm NOT an ISP product. Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.). These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today. -Nick
Re: So -- what did happen to Panix?
At 02:05 AM 2/6/2006, Nick Feamster wrote: Martin Hannigan wrote: [ SNIP ] If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm NOT an ISP product. Independent of ED, one should be cautious when designing routing protocols based on logistical and business assumptions (e.g., switching providers takes awhile, most business policies are vanilla peering, etc.). These assumptions are certainly not fundamental, and they may not always be true, regardless of what exists today This is strictly a market-maker product, IMHO, which is different from a transition or provisioning strategy. YMMV. ISP's don't switch providers, typically, enterprises do. ISP's add, move, and drop, so physical layer management is more important, believe it or not. -M Martin Hannigan(c) 617-388-2663 Renesys Corporation(w) 617-395-8574 Member of the Technical Staff Network Operations [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
On Mon, 30 Jan 2006 [EMAIL PROTECTED] wrote: Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. If an IRR suffers from bit-rot, then I don't consider it to be well-operated and therefore it cannot be considered to be part of a well-operated network of IRRs. honestly I'm not a fan of IRR's, so don't pay attention to them, but... is the IRR 'not well operated' or is the data stale because the 'users' of the IRR are 'not well operated' ? (the IRR as near as I can tell is nothing but a web/whois server that you sign-up-for and push/pull data through, right?)
Re: So -- what did happen to Panix?
On Fri, 3 Feb 2006, Josh Karlin wrote: Our primary concern is with keeping BGP stable until its replacement (e.g. sBGP) is ready for deployment. veering off course for a tick: I wonder how well sbgp/sobgp will behave in a world of 1million routes in the DFZ? 5 million? 10? 20?... Someone better be thinking about that part of the problem as well with the coming doom of ipv6 :)
Re: So -- what did happen to Panix?
On 4-Feb-2006, at 15:21, Christopher L. Morrow wrote: honestly I'm not a fan of IRR's, so don't pay attention to them, but... is the IRR 'not well operated' or is the data stale because the 'users' of the IRR are 'not well operated' ? The data ought to be maintained by the people to whom it relates. Customers (and peers) of some ISPs have great incentives to add appropriate records, since if they don't do so their ISPs' filters will not be widened to accept their routes. Other networks have no such incentive, since their transit providers and peers either build their filters in other ways, or don't filter at all. Generally, there is no incentive to remove data from the IRR, except in the case where resources are returned and reallocated to someone else who wants to make their own records. Wherever there is a lack of incentive to keep records accurate, we can probably safely assume that they are either missing or stale. Customer in this context means anybody whose routes might be filtered by someone else. Since large, default-free carriers tend not to have their routes filtered by peers, those that don't use RPSL expressions to build customer filters don't have much reason to care about the IRR. It's probably fair to say that if all the large, default-free carriers insisted that their customers submitted their routes to the IRR, then every route would be registered. This would not completely address the problem of stale data, though. (the IRR as near as I can tell is nothing but a web/whois server that you sign-up-for and push/pull data through, right?) The IRR is a loosely-connected collection of route registries, all run by different people. Data originating in one database is frequently found to be mirrored in other databases, but not in any great systematic fashion. Together these databases form a distributed repository of RPSL objects. 
Objects are generally submitted by e-mail and retrieved using whois, but some registry operators also make web interfaces available. Anybody who doesn't know what RPSL is can find out at http://www.irr.net/docs/rpsl.html. Joe
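For anyone who has not looked at one, an RPSL object as returned by whois is a block of `attribute: value` lines. The sketch below parses such text into an ordered list of pairs. The sample object is entirely hypothetical (documentation prefix, reserved AS number, made-up maintainer and source names), and the parser only handles the simple cases: continuation lines that start with whitespace or `+`, and `%` comment lines of the kind some whois servers emit.

```python
SAMPLE = """\
route:      192.0.2.0/24
descr:      Example customer route
            (hypothetical)
origin:     AS64496
mnt-by:     MAINT-EXAMPLE
source:     EXAMPLE
"""


def parse_rpsl(text):
    """Return the object's attributes as a list of (name, value) pairs."""
    attrs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("%"):
            continue  # blank line or server comment
        if line[0] in " \t+" and attrs:
            # Continuation of the previous attribute's value.
            name, value = attrs[-1]
            attrs[-1] = (name, (value + " " + line[1:].strip()).strip())
            continue
        name, sep, value = line.partition(":")
        if sep:
            attrs.append((name.strip().lower(), value.strip()))
    return attrs
```

A list of pairs is used rather than a dict because attributes such as `mnt-by` may appear more than once in a single object.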
Re: So -- what did happen to Panix?
Josh Karlin wrote: Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations. Our proposed delay does *not* affect reachability, if the only route left is suspicious, it will be chosen regardless. Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one. A flapping route would only be considered suspicious if it disappears for many consecutive days and no other known route for the prefix originates at the same AS. At which point the attacker has already won. My point was actually that an adversary could flap a correct route to damp it, to induce a router to select a suspicious one. (This threat also exists today, I believe, but the delay tactic does not solve the problem.) Ascertaining correctness is only half of the work. If you correctly classify a malicious route, but do not take some measure to prevent its spread, you have just done yourself and your customers harm. I would say that ascertaining correctness is more than half of the work. If a router can definitively say that a route is bogus, the measure to prevent its spread is pretty simple, right? i.e., just drop the route. In the case of PGBGP, there is a lot that an operator can do to verify correctness. Multiple viewpoints of anomalous routes can be collected into a single database in which operators can, once per day, check to make sure that their own address space is not being announced elsewhere. This can easily be automated for both the NOC and the collection process. Relationship information need not be revealed as only the originator of the suspicious route is needed. Analysis of multiple vantage points could definitely help in your case. The method for determining what counts as a suspicious route is not obvious, though. 
In the example you present, a router can install route filters to reject incoming announcements for its own address space (many ISPs seem to deploy these types of filters already). Much trickier is determining things like route hijacks, where even a delay won't help much without a reasonable way to ask "Is this route hijacked?" The best way I know of for doing that is to go back to the registry. If there are other ways to do this, I'd certainly be very interested to know about the state of the art. The proposal seems useful in a case where collection of measurements from multiple vantage points could run analysis to detect suspicious routes, assuming the detection algorithms could be run quickly enough and the information about suspicious routes could be propagated back out to the network...which might not always be true in an attack scenario. -Nick
Re: So -- what did happen to Panix?
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? -certified prefix ownership -certified AS path ownership -dynamic changes to the above two items It seems to me that most of the pieces needed to do this already exist. RPSL, IRR software, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers. It is true that most of the pieces do exist. The problem appears to be not a want of tools, but the fact that the tools are not coupled properly---updating records about prefix ownership is, today, performed out-of-band from the routing protocol. This is a losing proposition. The data in the IRR, CA, or any mechanism that is updated out-of-band from the protocol itself will inherently be out-of-sync. A better idea, I think, would be to tie the identifier of the route to something that is inherently bound to some cryptographic information (e.g., a public key), rather than a separate piece of information whose ownership must be certified (i.e., an IP prefix, an AS number). I can think of some great ways to do this, but they all involve varying degrees of departure from prefix-based routing. I would certainly be interested in talking offline about this with any forward-thinking types. Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations. Our proposed delay does *not* affect reachability, if the only route left is suspicious, it will be chosen regardless. Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one. 
Perhaps I am missing something, but how does imposing a delay help in ascertaining a route's correctness? Even looking at some of the suspicious routes I see by hand in the anomalies we detect, I can't personally tell what's incorrect/actionable vs. simply unusual (again, this goes back to the problem of inaccurate registries). In the case of Panix/ConEd, I can imagine that an operator would have responded to the alarms, checked the registry information and said, "these routes look reasonable; go for it!" Or, as human nature suggests, the operator might have even just ignored the alarms (particularly if origin changes are as frequent as they seem to be). What is really needed, in any case, is a better way to determine the route's veracity. This still requires some auxiliary mechanism to distinguish unusual from suspicious, and, while you're designing that auxiliary mechanism, it might as well be in-band (per the arguments above). If you are changing providers, which takes awhile anyway, That process seems to be getting quicker: http://www.equinix.com/prod_serv/network/ed.htm -Nick
Re: So -- what did happen to Panix?
Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations. Our proposed delay does *not* affect reachability, if the only route left is suspicious, it will be chosen regardless. Depending on the threat model, then, one attack would be to cause an AS to damp the non-suspicious route. This seems bad, right? A flapping, correct route seems better than a stable, suspicious one. A flapping route would only be considered suspicious if it disappears for many consecutive days and no other known route for the prefix originates at the same AS. At which point the attacker has already won. Our primary concern is with keeping BGP stable until its replacement (e.g. sBGP) is ready for deployment. Perhaps I am missing something, but how does imposing a delay help in ascertaining a route's correctness? Even looking at some of the suspicious routes I see by hand in the anomalies we detect, I can't personally tell what's incorrect/actionable vs. simply unusual (again, this goes back to the problem of inaccurate registries). In the case of Panix/ConEd, I can imagine that an operator would have responded to the alarms, checked the registry information and said, these routes look reasonable; go for it! Or, as human nature suggests, the operator might have even just ignored the alarms (particularly if origin changes are as frequent as they seem to be). Ascertaining correctness is only half of the work. If you correctly classify a malicious route, but do not take some measure to prevent its spread, you have just done yourself and your customers harm. In the case of PGBGP, there is a lot that an operator can do to verify correctness. Multiple viewpoints of anomalous routes can be collected into a single database in which operators can, once per day, check to make sure that their own address space is not being announced elsewhere. 
This can easily be automated for both the NOC and the collection process. Relationship information need not be revealed as only the originator of the suspicious route is needed. If, in the worst case, the route is not detected as malicious before it is considered normal, the next wave of routers will be introduced to the route and consider it suspicious. The first wave will then notice the problem and fix it, still protecting a significant portion of the network. Josh
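The daily check described above, where an operator looks through a shared list of anomalous routes for their own address space, could be automated along these lines. The shape of the anomaly feed (a list of prefix and origin AS pairs) is an assumption made for this sketch, as are the function and parameter names; no such feed format is specified in the thread.

```python
from ipaddress import ip_network


def foreign_announcements(my_prefixes, my_asns, anomalies):
    """Return anomalous (prefix, origin_asn) pairs that overlap our address
    space but are originated by an AS that is not one of ours."""
    mine = [ip_network(p) for p in my_prefixes]
    hits = []
    for prefix, origin_asn in anomalies:
        if origin_asn in my_asns:
            continue  # our own announcement, nothing to report
        net = ip_network(prefix)
        if any(net.version == m.version and net.overlaps(m) for m in mine):
            hits.append((prefix, origin_asn))
    return hits
```

Only the originator of each suspicious route is consulted, which matches the remark that relationship information need not be revealed. Both more specific and covering announcements are reported, since either can pull traffic away.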
Re: So -- what did happen to Panix?
On Jan 30, 2006, at 5:02 AM, Richard A Steenbergen wrote: On Mon, Jan 30, 2006 at 09:48:13AM +, [EMAIL PROTECTED] wrote: Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. If an IRR suffers from bit-rot, then I don't consider it to be well-operated and therefore it cannot be considered to be part of a well-operated network of IRRs. The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a layer 8 problem? Take it up with the people putting data into the system, not the IRR operators. Anyone who is behind an IRR-based provider (like Verio) has motivation to put data into the system (hey look I do this and now routing works), but there is no motivation to take stale data OUT of the system. It gets even more fun if you're delegating route-origination to 3rd parties. Add a mnt-routes: so they can create a route object, but then you can't remove that inetnum block whilst their route object exists (nor remove the mnt-routes). *sigh*
Re: So -- what did happen to Panix?
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. If an IRR suffers from bit-rot, then I don't consider it to be well-operated and therefore it cannot be considered to be part of a well-operated network of IRRs. The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a layer 8 problem? --Michael Dillon
Re: So -- what did happen to Panix?
Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather. We have tried, of course; see, for example, NANOG 28 (Salt Lake City). There was no more consensus at NANOG than in the IETF... One attempt almost 3 years ago, doesn't sound very serious to me. And if the discussion is only concerned with seeking consensus on implementing a new flavor of BGP protocol then it isn't much of a discussion. In fact, there was a consensus at Salt Lake City that the issues of routing security could be adequately dealt with by existing tools and protocols. Not all problems require new protocols to solve them. --Michael Dillon
Re: So -- what did happen to Panix?
On Mon, Jan 30, 2006 at 09:48:13AM +, [EMAIL PROTECTED] wrote: Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. If an IRR suffers from bit-rot, then I don't consider it to be well-operated and therefore it cannot be considered to be part of a well-operated network of IRRs. The point is that the tools exist. The failing is in how those tools are managed. In other words this is an operational problem on both the scale of a single IRR and on the scale of the IRR system. Is this what you mean by a layer 8 problem? Take it up with the people putting data into the system, not the IRR operators. Anyone who is behind an IRR-based provider (like Verio) has motivation to put data into the system (hey look I do this and now routing works), but there is no motivation to take stale data OUT of the system. I can't even begin to count the number of networks I know who theoretically use IRR who don't even know HOW to remove data, let alone make any active attempt to do so when a customer leaves or a route is returned. Combine this with the idiots who run around proxy registering routes for other people based on everything they see in the table (gee theres a good idea, define filters for what is allowed in the table based on what we see people trying to put into the table, brilliant!) and you quickly see how IRR data becomes stale and eventually worthless. I'll save the rest of my rant for the presentation on the subject in Dallas. :) -- Richard A Steenbergen [EMAIL PROTECTED] http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
Re: So -- what did happen to Panix?
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted. IOW: It didn't solve this problem. So I guess we're discussing the other 5%? You missed the words well-operated. Today there is no well-operated network of IRRs so there is bad data in the databases. In addition, there is the question of how to use the IRR data. Should you build filters from it? Should you use it to validate your own internal database with human beings chasing up the differences and fixing whichever database is wrong? --Michael Dillon
Re: So -- what did happen to Panix?
the scheme that josh karlin has been advocating in pretty good bgp involved only suppressing a doubtful announcement when you have a better, more trusted announcement. Not a doubtful announcement, a novel announcement. Not a better announcement, a more usual announcement. The trust part, like beauty, is in the eye of the beholder. Don't get me wrong - I think basing decisions on some trusted summary of historical behavior is going to be important, unless and until we get some approach that gives a more deterministic answer. But I do believe that we need to consider carefully how this will play with dynamic, particularly unplanned, changes in who is announcing what. If there turn out to be cases where dynamic, particularly unplanned, changes get rejected by this technique in favor of stale data, then there should be consideration given to how to amend the scheme to prevent that or suggest operational practices to get around it. --Sandy
Re: So -- what did happen to Panix?
sandy, On Mon, Jan 30, 2006 at 08:29:45AM -0500, [EMAIL PROTECTED] wrote: the scheme that josh karlin has been advocating in pretty good bgp involved only suppressing a doubtful announcement when you have a better, more trusted announcement. Not a doubtful announcement, a novel announcement. Not a better announcement, a more usual announcement. The trust part, like beauty, is in the eye of the beholder. i just don't think you're following along. i think we're talking about different things. read josh, stephanie forrest and jennifer rexford's paper: http://www.cs.unm.edu/~treport/tr/05-10/pgbgp.pdf Don't get me wrong - I think basing decisions on some trusted summary of historical behavior is going to be important, unless and until we get some approach that gives a more deterministic answer. But I do believe that we need to consider carefully how this will play with dynamic, particularly unplanned, changes in who is announcing what. josh's scheme only comes into play when there are two competing origination patterns. in this case the question is just which one to believe. agreed that we should be careful with anything that reduces the ability of people to change routing dynamically. but let's remember: that ability is already constrained by the fact that responsible providers use prefix filters and require some kind of out-of-band (IRR, letter, email) validation of prefix ownership. routing a new prefix with a new origination pattern is not especially dynamic now, so let's not worry about throwing out a baby that's not even in the bath. t. -- _ todd underwood chief of operations security renesys - internet intelligence [EMAIL PROTECTED] www.renesys.com
Re: So -- what did happen to Panix?
In message [EMAIL PROTECTED] .com, [EMAIL PROTECTED] writes: certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime? Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather. We have tried, of course; see, for example, NANOG 28 (Salt Lake City). There was no more consensus at NANOG than in the IETF... --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
Re: So -- what did happen to Panix?
All these explanations can only go so far as to show that ConEd and its upstreams may have had these prefixes as something that is allowed (due to previous transit relationships) to be announced. However, presumably all of these were transit arrangements with ConEd, and the IP blocks would have originated from a different ASN, whereas during the accident ConEd directly announced the prefixes as originating from its own ASN. One thing I can think of is that ConEd started doing synchronization, so all eBGP routes were redistributed into OSPF or some other IGP. This could lead to a situation where some previously configured router that redistributes summarized routes from IGP to BGP thinks the route needs to be advertised as coming from ConEd and announces it to Verio. But I think the result of all this should have been that the route would be flapping (i.e. they start announcing it, then it gets removed from what they learn from upstream, so it is no longer redistributed into the IGP and no longer announced; back to the beginning), and it wasn't flapping. -- William Leibzon Elan Networks [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote: what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters. i.e., the 'error' was intended, and followed all process. so, what i don't see is how any hacks on routing, such as delay, history, ... will prevent this while not, at the same time, have very undesired effects on those legitimately changing isps. seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. perhaps you mean certified validation of prefix origin and path. Ownership of any given prefix is a dicey concept at best. as a start, i'd want two things for authentication and integrity checks: AS P asserts it is the origin of prefix R and prefix R asserts the true origin AS is P (or Q or some list). Being able to check these assertions and being assured of the authenticity and integrity of the answers goes a long way, at least for me. path validation is something else and a worthwhile goal. --bill what am i missing here? randy
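A minimal sketch of the two-way origin assertion bill describes above (AS P asserts it originates prefix R, and prefix R asserts its true origin is P or some list). The `Assertions` class, its two dictionaries, and the documentation-range ASNs and prefix are assumptions made for illustration; the thread does not say where such assertions would be stored, and the authenticity and integrity checks bill asks for are not modeled here at all.

```python
from dataclasses import dataclass, field


@dataclass
class Assertions:
    # Hypothetical stores; nothing in the thread defines their real form.
    # AS number -> prefixes that AS claims to originate
    as_claims: dict[int, set[str]] = field(default_factory=dict)
    # prefix -> AS numbers the prefix holder accepts as origin
    prefix_claims: dict[str, set[int]] = field(default_factory=dict)

    def origin_ok(self, prefix: str, origin_as: int) -> bool:
        """True only when both sides agree on the origin."""
        as_side = prefix in self.as_claims.get(origin_as, set())
        prefix_side = origin_as in self.prefix_claims.get(prefix, set())
        return as_side and prefix_side


# Illustrative values from the documentation ranges, not real assignments.
db = Assertions()
db.as_claims[64496] = {"192.0.2.0/24"}
db.prefix_claims["192.0.2.0/24"] = {64496}
db.as_claims[64497] = {"192.0.2.0/24"}  # any AS can claim any prefix

agreed = db.origin_ok("192.0.2.0/24", 64496)      # both sides agree
one_sided = db.origin_ok("192.0.2.0/24", 64497)   # prefix does not list 64497
```

The point of requiring both halves is that a one-sided claim, such as a stale or proxy-registered route object, is not enough on its own to make an origin acceptable.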
Re: So -- what did happen to Panix?
seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? -certified prefix ownership -certified AS path ownership -dynamic changes to the above two items It seems to me that most of the pieces needed to do this already exist. RPSL, IRR softwares, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers. Seems to me that operational problem solving works better when the problem is not thrown into the laps of the protocol designers. --Michael Dillon
Re: So -- what did happen to Panix?
Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? -certified prefix ownership -certified AS path ownership -dynamic changes to the above two items It seems to me that most of the pieces needed to do this already exist. RPSL, IRR softwares, regional addressing authorities (RIRs). If there are to be certified AS paths in a central database this also opens the door to special arrangements for AS path routing that go beyond peering, i.e. agreements with the peers of your peers. Hasn't that been said for years? Wouldn't perfect IRRs be great? I couldn't agree more. But in the meanwhile, why not protect your own ISP by delaying possible misconfigurations? Our proposed delay does *not* affect reachability: if the only route left is suspicious, it will be chosen regardless. If you are changing providers, which takes a while anyway, just advertise both for a day and you have no problems. Or, if you are concerned about speed, simply withdraw one and the new one will have to be used. If you are anycasting the prefix and a new origin pops up that your view has not seen before, then you might have a temporary load balance issue, but there is absolutely no guarantee of what routers many hops away from you will see anyway. Josh
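The reachability property Josh describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PGBGP implementation: the `Route` type, the `first_seen` field, and the one-day `DELAY_SECONDS` constant are hypothetical names, and "suspicious" is reduced here to "this origin has been visible for less than a day".

```python
from dataclasses import dataclass

DELAY_SECONDS = 24 * 60 * 60  # assumed: the "day" mentioned in the message


@dataclass(frozen=True)
class Route:
    prefix: str
    origin_as: int
    first_seen: float  # epoch seconds when this origin first appeared


def is_suspicious(route: Route, now: float) -> bool:
    """Treat an origin as suspicious until it has been seen for a full delay."""
    return now - route.first_seen < DELAY_SECONDS


def usable_routes(candidates: list[Route], now: float) -> list[Route]:
    """Prefer established origins; fall back to suspicious routes only when
    nothing else exists, so the delay never removes reachability."""
    trusted = [r for r in candidates if not is_suspicious(r, now)]
    return trusted if trusted else list(candidates)


# Illustrative values only (documentation prefix and ASNs).
now = 1_000_000.0
old = Route("192.0.2.0/24", 64496, now - 30 * 86400)
new = Route("192.0.2.0/24", 64497, now - 60)

both = usable_routes([old, new], now)   # the new origin is held back
only_new = usable_routes([new], now)    # the new origin is the sole option
```

With both routes present, only the established one is offered to best-path selection; once the old route is withdrawn, the new one is used immediately, which matches the "simply withdraw one" advice above.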
Re: So -- what did happen to Panix?
randy, all, On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote: what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters. i.e., the 'error' was intended, and followed all process. yep. that's the depressing part. so, what i don't see is how any hacks on routing, such as delay, history, ... will prevent this while not, at the same time, have very undesired effects on those legitimately changing isps. you're probably right (as usual). but it seems that if you delay acceptance of announcements with novel origination patterns, you don't harm very many legitimate uses. in particular, ASes changing upstreams won't be harmed at all. people moving their prefix to a new ISP will have a fixed delay in getting their announcement propagated, sure. but they already have this delay now. they tell the new ISP: 'announce my prefix' and the new ISP says 'prove it's yours'. they do that for a couple of emails. then the new ISP asks its upstreams to accept that announcement. that takes a little while (ranging from 4 to 72 hours in my recent experience). seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime? t. -- _ todd underwood chief of operations security renesys - internet intelligence [EMAIL PROTECTED] www.renesys.com
Re: So -- what did happen to Panix?
On Fri, Jan 27, 2006 at 10:42:11AM -0500, Joe Abley wrote: On 27-Jan-2006, at 07:51, [EMAIL PROTECTED] wrote: perhaps you mean certified validation of prefix origin and path. In the absence of path validation, a method of determining the real origin of a prefix is also required, if the goal is to prevent intentional hijacking as well as unintentional origination. Simply looking at the right-most entry in the AS_PATH doesn't cut it, since anybody can set as-path prepend P. but by definition, the right-most entry is the prefix origin... the question becomes, is that the origin the prefix expects? to use an historical example: 198.32.6.0/24 thinks that AS 4555 is the correct origin. AS 4555 thinks that it should (and does) originate prefix 198.32.6.0/24. AS 4555 uses AS 226 and 701 as transit providers. AS 1239 wants to be helpful and tells its peers that it is the proper origin for prefix 198.32.0.0/16 -BUT- never tells AS 4555 about this and has no direct means to deliver packets to AS 4555. Or... we see 128.9.160.0/24 as originating from multiple ASNs. there is no requirement for single AS origin - is that theft or an engineering tradeoff? This suggests to me that either we can't separate origin validation from path validation (which sucks the former into the more difficult problems associated with the latter), or we need a better measure of origin (e.g. a PKI and an attribute which carries a signature). i was just interested in the problem of assertion of origination. it needs to be done w/o a centralized repository (imho) because that method has scalability problems. such a technique does open new chances to confuse ... e.g. what happens when the prefix is seen from the same apparent AS but w/ two or more different signatures? path validation is (again imho) a severable problem from the prefix/as origin. Joe
Re: So -- what did happen to Panix?
certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime? Perhaps people should stop trying to have these operational discussions in the IETF and take the discussions to NANOG where network operators gather. Writing RFCs is a fine way to document operational best practices, but it is not a good way to work out joint operational practices. Of course, NANOG is no magic bullet, but it seems like a more reasonable place to talk about how to make things better. A good start would be to try and get an agreed statement of what the problem is. Once you have broad agreement on the problem, then move on to solutions. --Michael Dillon
Re: So -- what did happen to Panix?
On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote: but by definition, the right-most entry is the prefix origin... Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327's uses a transit provider which builds prefix filters from the IRR, and the as9327 aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic. The rest of the world sees a prefix with an AS_PATH attribute which ends with 9327 4555. In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327? Is it possible to tell, from just the right-most entry in the AS_PATH attribute? Joe [note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)]
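Joe's hypothetical can be made concrete with a few lines. The sketch below models an AS_PATH as a plain list of AS numbers (an assumption; real paths also carry segment types) and uses a made-up helper, `apparent_origin`. AS 701 as a transit for 4555 comes from earlier in the thread, and 64500 is a documentation ASN standing in for some neighbor of 9327.

```python
def apparent_origin(as_path: list[int]) -> int:
    """Return the right-most AS in the path, which BGP treats as the origin."""
    if not as_path:
        raise ValueError("empty AS_PATH")
    return as_path[-1]


# 4555 really originates the prefix and is heard via its transit, 701.
legit_path = [701, 4555]

# Hypothetical forgery from the message above: 9327 originates the prefix
# but places 4555 at the right-hand end, so the path ends "9327 4555".
forged_path = [64500, 9327, 4555]

legit_origin = apparent_origin(legit_path)
forged_origin = apparent_origin(forged_path)
same_origin = legit_origin == forged_origin
```

Both calls yield 4555, so an origin check based only on the right-most entry cannot separate the two announcements; telling them apart needs information about the 9327 to 4555 adjacency, which is the path question.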
Re: So -- what did happen to Panix?
Thus spake [EMAIL PROTECTED] seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? We have such a database (used by Verio and others), but the Panix incident happened anyway due to bit rot. We've got to find a way to fix the layer 8 problems before we can make improvements at layer 3. S Stephen Sprunk, CCIE #3723, K5SSS. "Stupid people surround themselves with smart people. Smart people surround themselves with smart people who disagree with them." --Aaron Sorkin
Re: So -- what did happen to Panix?
On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote: seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted. IOW: It didn't solve this problem. So I guess we're discussing the other 5%? -- TTFN, patrick
Re: So -- what did happen to Panix?
On Jan 27, 2006, at 11:39 AM, Joe Abley wrote: On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote: but by definition, the right-most entry is the prefix origin... Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327's uses a transit provider which builds prefix filters from the IRR, and the as9327 aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic. The rest of the world sees a prefix with an AS_PATH attribute which ends with 9327 4555. In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327? Is it possible to tell, from just the right-most entry in the AS_PATH attribute? Suggested solutions do not have to solve every possible problem. Knowing the correct origin will stop accidental announcements, like the one under discussion in this thread. And, I suspect, most problems we see today of this sort. We are not (yet) to the point where maliciously originated prefixes are as big a problem as accidentally originated prefixes. -- TTFN, patrick
Re: So -- what did happen to Panix?
On Fri, Jan 27, 2006 at 11:39:27AM -0500, Joe Abley wrote: On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote: but by definition, the right-most entry is the prefix origin... Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555 to the AS_PATH as it does so. Suppose 9327's uses a transit provider which builds prefix filters from the IRR, and the as9327 aut-num object is modified to include policy which suggests 9327 provides transit for 4555. Suppose this is not actually the case, though, and in fact 9327 is a rogue AS which is trying to capture 4555's traffic. The rest of the world sees a prefix with an AS_PATH attribute which ends with 9327 4555. In this case, from the point of view of those trying to discern legitimacy of advertisements, what is the origin of the prefix? Is it 4555, or 9327? from BGP's perspective, you tell me. being the naive BGP listen/speaker - i think that AS 4555 is the origin. now... what does Prefix 198.32.6.0/24 say is the correct origin? Is it possible to tell, from just the right-most entry in the AS_PATH attribute? nope - but you have jumped right into the path question. (what does the as4555 aut-num object say about using 9327 as an upstream AS?) Joe [note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)] sez you :) (reminder to send Cingular the royalty check if you receive the above two characters : and ) as listed above AND you chose to infer mood or intent.) I think -all- AS are run by rouges and pirates. -- (headless) bill
Re: So -- what did happen to Panix?
On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote: On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote: seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted. Perhaps by well-operated, Michael was referring to something like the hierarchical authentication scheme used by the RIPE database, which ultimately provides access control for route objects using RIR allocation/assignment data? Joe
Re: So -- what did happen to Panix?
On Jan 27, 2006, at 12:57 PM, Joe Abley wrote: On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote: On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote: seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. Wouldn't a well-operated network of IRRs used by 95% of network operators be able to meet all three of your requirements? Maybe I missed something, but didn't Verio say the prefix was in their internal registry, and that's why it was accepted. Perhaps by well-operated, Michael was referring to something like the hierarchical authentication scheme used by the RIPE database, which ultimately provides access control for route objects using RIR allocation/assignment data? Yet it can still have stale data. That said, if there were a centralized store for such information and you were in charge of your objects, then the only person to blame when your prefix was incorrectly accepted would be you. (We're talking things like accidental origination here, not malicious attempts to go around safeguards.) Put more concretely, Panix would have no one to blame but themselves if Verio accepted a prefix because it was properly registered in the DB. This, IMHO, would be a Good Thing. Not a panacea, but a Good Thing. And would avoid some very long threads on NANOG (which is also a Good Thing :). -- TTFN, patrick
Re: So -- what did happen to Panix?
Todd Underwood wrote: you're probably right (as usual). but it seems that if you delay acceptance of announcements with novel origination patterns, you don't harm very many legitimate uses. in particular, ASes changing upstreams won't be harmed at all. people moving their prefix to a new ISP will have a fixed delay in getting their announcement propagated, sure. but they already have this delay now. they tell the new ISP: 'announce my prefix' and the new ISP says 'prove it's yours'. they do that for a couple of emails. then the new ISP asks its upstreams to accept that announcement. that takes a little while (ranging from 4 to 72 hours in my recent experience). This is great for the planned changes, but real-time changes to respond to Internet dynamics won't work well with such delays. If you are multi-homed to provide a backup, you would like for it to respond more quickly than 4-72 hours, I'll bet. So if you have PI space but not your own AS, your backup route would look like a novel origination, but you sure wouldn't want it delayed. How common are such cases? Should the solutions cover them also? Should there be special procedures to deal with special cases? Etc. --Sandy
Re: So -- what did happen to Panix?
Todd Underwood wrote: seems to me that certified validation of prefix ownership and as path are the only real way out of these problems that does not teach us the 42 reasons we use a *dynamic* protocol. certified validation of prefix ownership (and path, as has been pointed out) would be great. it's clearly a laudable goal and seemed like the right way to go. but right now, no one is doing it. the rfcs that i've found have all expired. and the conversation about it has reached the point where people seem to have stopped even disagreeing about how to do it. in short, it's as dead as dns-sec. so what are we to do in the meantime? (a) I'd hardly say dead - there's the sidr work starting up in the IETF with vendor/operator/registry participation. And there was a panel discussion at the last NANOG about government efforts to assemble the right people (vendors/operators/registries/etc) to work on routing infrastructure security - and prefix origination was one of the biggest items on everyone's list of goals/hopes/longings/dreams. (Truth in advertising: I've been one of those involved in the gov't sponsored workshops.) (b) dnssec isn't dead - there's serious work afoot to get it deployed. Sweden and RIPE have signed their zones. There are web sites that point to work going on, if you'd like to know more: www.dnssec-deployment.org www.dnssec.net (Truth in advertising: I work with people who are working on this.) (z) I think you mean internet drafts, not rfcs. I don't think there have been any rfcs (would there were - we'd be in a different situation), and rfcs don't expire. --Sandy
Re: So -- what did happen to Panix?
Michael.Dillon wrote: Writing RFCs is a fine way to document operational best practices, but it is not a good way to work out joint operational practices. Seems to me that operational problem solving works better when the problem is not thrown into the laps of the protocol designers. If the solution turns out to be joint operational practice, then operators need to be involved, natch. If the solution turns out to be protocols, then the protocol designers need to be involved along with the operators. I'm not so certain that operational practices will fix this problem - it could be argued that the fundamental vulnerabilities in the way routing info is communicated would be better fixed in the protocol. --Sandy
Re: So -- what did happen to Panix?
This is great for the planned changes, but real-time changes to respond to Internet dynamics won't work well with such delays. If you are multi-homed to provide a backup, you would like for it to respond more quickly than 4-72 hours, I'll bet. So if you have PI space but not your own AS, your backup route would look like a novel origination, but you sure wouldn't want it delayed. no. the scheme that josh karlin has been advocating in pretty good bgp involved only suppressing a doubtful announcement when you have a better, more trusted announcement. it remains to be seen how hard this would be to implement in existing systems of build filters in configs and push to routers. this only works obviously well in systems that centralize route selection and use routers only as forwarding engines. that might be a cool idea, but it's not what we have now. if you don't use the pgbgp scheme, you can still get the benefits of being no worse than what we have now. consider this just a different, more automatic, more scalable, more secure way of building and maintaining the prefix filter that we all are supposed to be maintaining already. i'll be happy to talk to interested parties at nanog in dallas about this (or almost anything else, especially if you're buying). t. -- _ todd underwood chief of operations security renesys - internet intelligence [EMAIL PROTECTED] www.renesys.com
Re: So -- what did happen to Panix?
In terms of the larger question ConEd Communications was recently acquired by RCN. I'm not sure if the transaction has formally closed. I suspect there are serious transition issues occurring. Financial Stability, Employee Churn, and Ownership are, unfortunately, tough things to factor into BGP algorithms. http://investor.rcn.com/ReleaseDetail.cfm?ReleaseID=181194 Internet access has always been a sideline for CEC - they are more of a provider of transport, and their customers have included some very well known entities in the NY metro area. Perhaps someone from RCN would care to comment? - Dan
Re: So -- what did happen to Panix?
Daniel Golding [EMAIL PROTECTED] wrote: ConEd Communications was recently acquired by RCN. I'm not sure if the transaction has formally closed. I suspect there are serious transition issues occurring. Financial Stability, Employee Churn, and Ownership are, unfortunately, tough things to factor into BGP algorithms. I have no idea if this is really related, but the issue was the same weekend that ConEd had major network maintenance going on. My ConEd service was down (NYC area) for the entire weekend (about 60 hours) during their planned maintenance window to convert their network to MPLS. I saw their maintenance notice and noticed that the window lasted multiple days. I expected the link to go down - but I never imagined they meant it would stay down for the entire maintenance window. So, I'm speculating that even if there weren't organization issues their engineers were probably very busy and distracted by the major technical changes going on.
Re: So -- what did happen to Panix?
Steven, all, On Wed, Jan 25, 2006 at 03:04:30PM -0500, Steven M. Bellovin wrote: It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was. I keep hearing that Con Ed Comm was previously an upstream of Panix ( http://www.renesys.com/blog/2006/01/coned_steals_the_net.shtml#comments ) and that this might have explained why Con Ed had Panix routes in their radb as-27506-transit object. But I checked our records of routing data going back to jan 1, 2002, and see no evidence of 27506 and 2033 being adjacent to each other in any announcement from any of our peers at any time since then. So I can't really verify that Panix was ever a Con Ed Comm customer. Can anyone else clear this up? So far, it's not making sense. The supposition was that all of the other affected ASes that are not currently customers of Con Ed Comm were also previously customers. Some appear to have been (Walrus Internet (AS7169), Advanced Digital Internet (AS23011), and NYFIX (AS20282) for sure) but I haven't been able to verify that all of them were. I know that this isn't really a root cause that Steven was asking for, though. The root cause is that filtering is imperfect and out of date frequently. This case is particularly interesting and painful because Verio is known for building good filters automatically. In this case, they did so based on out-of-date information, unfortunately. This is particularly depressing because normally in cases of leaks like this, the propagation is via some provider or peer who doesn't filter at all. In this case, one of the vectors was one of the most responsible filterers on the net. sigh. So in terms of engineering good solutions, the space is pretty crowded. One camp is of the total solution variety that involves new hardware, new protocols, and a Public Key approach where originations (or any announcements) are signed and verified. 
This is obviously a very good and complete approach to the problem but it's also obviously seeing precious little adoption. And in the mean time we have nothing. Another set of approaches has been to look at alternate methods of building filters, taking into account more information about history of routing announcements and dampening or refusing to accept novel, questionable announcements for some fixed, short amount of time. Josh Karlin's paper suggests that as does some of the stuff that Tom Scholl, Jim Deleskie and I presented at the last nanog. All of this has the disadvantage of being a partial solution, the advantage of being implementable easily and in stages without a network forklift or a protocol upgrade, but the further disadvantage of being nowhere near fully baked. Clearly more, smarter people need to keep searching for good solutions to this set of problems. Extra credit for solutions that can be implemented by individual autonomous systems without hardware upgrades or major protocol changes, but that may not be possible. t. p.s.: wrt comments made previously that imply that moving parts of routing control off of the routers is Bell-like or bell-headed: although the comments are silly and made somewhat in jest, they're obviously not true. anyone who builds prefix filters or access lists off of routers is already generating policy somewhere other than the router. using additional history or smarts to do that and uploading prefix filters more often doesn't change that existing architecture or make the network somehow bell-like. it might not work well enough to solve the problem, but that's another, interesting objection. -- _ todd underwood chief of operations security renesys - internet intelligence [EMAIL PROTECTED] http://www.renesys.com/blog
Re: So -- what did happen to Panix?
Disclaimer: I work for AS2914 On Thu, Jan 26, 2006 at 02:39:59PM -0500, Todd Underwood wrote: Another set of approaches has been to look at alternate methods of building filters, taking into account more information about history of routing announcements and dampening or refusing to accept novel, questionable announcements for some fixed, short amount of time. Josh Karlin's paper suggests that as does some of the stuff that Tom Scholl, Jim Deleskie and I presented at the last nanog. All of this has the disadvantage of being a partial solution, the advantage of being implementable easily and in stages without a network forklift or a protocol upgrade, but the further disadvantage of being nowhere near fully baked. Clearly more, smarter people need to keep searching for good solutions to this set of problems. Extra credit for solutions that can be implemented by individual autonomous systems without hardware upgrades or major protocol changes, but that may not be possible. t. p.s.: wrt comments made previously that imply that moving parts of routing control off of the routers is Bell-like or bell-headed: although the comments are silly and made somewhat in jest, they're obviously not true. anyone who builds prefix filters or access lists off of routers is already generating policy somewhere other than the router. using additional history or smarts to do that and uploading prefix filters more often doesn't change that existing architecture or make the network somehow bell-like. it might not work well enough to solve the problem, but that's another, interesting objection. This is something that (as i mentioned to you in private) some others have thought of as well. We at 2914 build the filters and such off-the-router and load them to the router with sometimes quite large configurations. 
(they have been ~8MB in the past) I'd love to see some prefix stability data (eg: 129.250/16 has been announced by origin-as 2914 for X years/seconds/whatnot) which can help score the data better. Do we need a origin-as match in our router policies? does it exist already? What about a way to dampen/delay announcements that don't match the origin-as data that exists? I think a solution like this would help out a number of networks that have these types of problems/challenges. Obviously noticing an origin change and alerting or similar on that would be nice and useful, but would the noise be too much for a NOC display? - jared ps. i'm glad our NOC/operations people were able to solve the PANIX issue quickly for them. -- Jared Mauch | pgp key available via finger from [EMAIL PROTECTED] clue++; | http://puck.nether.net/~jared/ My statements are only mine.
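One way to picture the "prefix stability data" asked for above is a per-prefix tally of how long each origin AS has been seen announcing it. The sketch below is hypothetical: `OriginHistory`, its methods, and the idea of expressing stability as a share of observed time are assumptions for illustration, not an existing router feature or data feed.

```python
from collections import defaultdict


class OriginHistory:
    """Track how long each origin AS has been observed announcing a prefix."""

    def __init__(self):
        # prefix -> origin AS -> total seconds the pair was observed
        self._seconds = defaultdict(lambda: defaultdict(int))

    def record(self, prefix, origin_as, start, end):
        """Add one observation interval [start, end) in seconds."""
        self._seconds[prefix][origin_as] += max(0, end - start)

    def stability(self, prefix, origin_as):
        """Share of the observed time for `prefix` attributed to `origin_as`."""
        per_origin = self._seconds.get(prefix, {})
        total = sum(per_origin.values())
        if total == 0:
            return 0.0  # no history at all for this prefix
        return per_origin.get(origin_as, 0) / total
```

An announcement whose origin scores low for its prefix could then be delayed or flagged rather than dropped outright; the threshold to use is left open here, as it is in the thread.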
Re: So -- what did happen to Panix?
The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption?
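To make the "alerts per day given a 3 day history window" figure concrete, origin-change alerts could be counted roughly like this. It is a sketch under assumptions: the event format `(timestamp, prefix, origin_as)`, the function name, and the rule "alert when a prefix has recent history but not from this origin" are illustrative and not necessarily how the numbers quoted above were produced.

```python
from collections import defaultdict

DAY = 86400  # seconds


def origin_change_alerts(events, window_seconds=3 * DAY):
    """Return the events whose origin AS is absent from the prefix's recent history.

    `events` is an iterable of (timestamp, prefix, origin_as), sorted by timestamp.
    """
    last_seen = defaultdict(dict)  # prefix -> {origin_as: last timestamp seen}
    alerts = []
    for ts, prefix, origin_as in events:
        history = last_seen[prefix]
        # Origins seen for this prefix within the history window.
        recent = {o for o, t in history.items() if ts - t <= window_seconds}
        if recent and origin_as not in recent:
            alerts.append((ts, prefix, origin_as))
        history[origin_as] = ts
    return alerts
```

A prefix with no announcements inside the window produces no alert here, since there is no recent origin to compare against; a stricter policy could treat that case as novel too.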
Re: So -- what did happen to Panix?
On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote: The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption? Are the origin changes for a small set of the prefixes that tend to repeat (eg: connexion as planes move), or is it a different set of prefixes day-to-day or week-to-week? I suspect there are the obvious prefixes that don't change (eg: 12/8, 18/8, 35/8, 38/8) but subparts of that may change, but for most people with allocations in the range of 12-17 bits, I suspect they won't change frequently. - jared -- Jared Mauch | pgp key available via finger from [EMAIL PROTECTED] clue++; | http://puck.nether.net/~jared/ My statements are only mine.
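The question of whether the same prefixes change origin repeatedly or a different set changes each day could be checked with a simple set overlap between consecutive days. This is a minimal sketch; the input shape (a dict of day label to set of prefixes that changed origin that day) and the use of Jaccard similarity are assumptions chosen for illustration.

```python
def daily_overlap(changes_by_day):
    """Jaccard overlap of origin-changing prefixes between consecutive days.

    `changes_by_day` maps a sortable day label to the set of prefixes whose
    origin changed that day. Returns a list of (previous_day, day, score).
    """
    days = sorted(changes_by_day)
    result = []
    for prev, cur in zip(days, days[1:]):
        a, b = changes_by_day[prev], changes_by_day[cur]
        union = a | b
        score = len(a & b) / len(union) if union else 0.0
        result.append((prev, cur, score))
    return result
```

Scores near 1.0 would suggest a small repeating set (the moving-aircraft case), while scores near 0.0 would suggest different prefixes each day.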
Re: So -- what did happen to Panix?
I unfortunately don't have answers to those questions, but you've piqued my interest so I will try to look into it within the next couple of days. Josh On 1/26/06, Jared Mauch [EMAIL PROTECTED] wrote: On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote: The noise of origin changes is fairly heavy, somewhere in the low hundreds of alerts per day given a 3 day history window. Supposing a falsely originated route was delayed, what is the chance of identifying and fixing it before the end of the delay period? Do operators commonly catch misconfigurations on their own or do they usually find out about it from other operators due to service disruption? Are the origin changes for a small set of the prefixes that tend to repeat (eg: connexion as planes move), or is it a different set of prefixes day-to-day or week-to-week? I suspect there are the obvious prefixes that don't change (eg: 12/8, 18/8, 35/8, 38/8) but subparts of that may change, but for most people with allocations in the range of 12-17 bits, I suspect they won't change frequently. - jared -- Jared Mauch | pgp key available via finger from [EMAIL PROTECTED] clue++; | http://puck.nether.net/~jared/ My statements are only mine.
Re: So -- what did happen to Panix?
jared, i may have missed the answer to my question. but, as verio was the upstream, and verio is known to use the irr to filter, could you tell us why that approach seemed not to suffice in this case? randy
Re: So -- what did happen to Panix?
On Thu, Jan 26, 2006 at 05:41:10PM -0800, Randy Bush wrote: jared, i may have missed the answer to my question. but, as verio was the upstream, and verio is known to use the irr to filter, could you tell us why that approach seemed not to suffice in this case? Sure, what I saw by going through the diffs, etc.. that I have available to me is that the prefix was registered to be announced by our customer and hence made it into our automatic IRR filters. it was no longer in there by the time that I personally looked things up in our registry, but saw diffs go through removing that prefix later in the day (night) from the acl. Someone that has a snapshot of the various IRR data from those days can likely put this together better than I can explain. - jared -- Jared Mauch | pgp key available via finger from [EMAIL PROTECTED] clue++; | http://puck.nether.net/~jared/ My statements are only mine.
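The "diffs" described above, where a registered prefix enters an automatically built filter and is later removed, amount to comparing two snapshots of the IRR-derived prefix list. The sketch below assumes the snapshots are already available as lists of prefix strings; how they are fetched from a registry is out of scope, and `filter_diff` is a hypothetical helper, not part of any existing IRR tooling.

```python
import ipaddress


def filter_diff(old_prefixes, new_prefixes):
    """Return (added, removed) prefixes between two filter snapshots."""
    # Parse into network objects so equivalent spellings compare equal.
    old = {ipaddress.ip_network(p) for p in old_prefixes}
    new = {ipaddress.ip_network(p) for p in new_prefixes}
    # Sort by string form so IPv4 and IPv6 entries can share one list.
    added = sorted((str(n) for n in new - old))
    removed = sorted((str(n) for n in old - new))
    return added, removed
```

Keeping dated snapshots and their diffs is what would let someone reconstruct, after the fact, when a given prefix was permitted by the filter.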
So -- what did happen to Panix?
It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb
Re: So -- what did happen to Panix?
On Wed, 25 Jan 2006, Steven M. Bellovin wrote: It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was. Is it really that hard to engineer this solution? We do have several of them proposed (SBGP, soBGP, etc) and new WG is likely to be formed soon within IETF to finally work it out. -- William Leibzon Elan Networks [EMAIL PROTECTED]
Re: So -- what did happen to Panix?
On Wed, 25 Jan 2006, william(at)elan.net wrote: On Wed, 25 Jan 2006, Steven M. Bellovin wrote: It's now been 2.5 business days since Panix was taken out. Do we know what the root cause was? It's hard to engineer a solution until we know what the problem was. Is it really that hard to engineer this solution? We do have several of them proposed (SBGP, soBGP, etc) and new WG is likely to be formed soon within IETF to finally work it out. It'd be darn difficult to engineer a solution that would end up being deployed in any reasonable time if we don't know the requirements first. Yes, there's a draft -- draft-ietf-rpsec-bgpsecrec-03.txt -- but it has been woefully lacking on the operator deployment requirements. More people should participate in the effort. -- Pekka Savola, Netcore Oy (Systems. Networks. Security.) "You each name yourselves king, yet the kingdom bleeds." -- George R.R. Martin: A Clash of Kings
Re: So -- what did happen to Panix?
On Thu, 26 Jan 2006 07:54:30 +0200, Pekka Savola said: It'd be darn difficult to engineer a solution that would end up being deployed in any reasonable time if we don't know the requirements first. Fortunately, when we know the requirements and engineer a solution, deployment is straightforward. RFC2827, for example, has a stellar deployment record. In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement.
Re: So -- what did happen to Panix?
On Thu, 26 Jan 2006, [EMAIL PROTECTED] wrote: In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement Exactly. If $OTHER_FOLKS don't deploy it, cases like Panix may not really be avoided. I think that's what folks proposing perfect -- but practically undeployable -- security solutions are missing. -- Pekka Savola, Netcore Oy (Systems. Networks. Security.) "You each name yourselves king, yet the kingdom bleeds." -- George R.R. Martin: A Clash of Kings
Re: So -- what did happen to Panix?
In message [EMAIL PROTECTED], Pekka Savola writes: On Thu, 26 Jan 2006, [EMAIL PROTECTED] wrote: In other words - what is the business case for deploying this proposed solution? I may be able to get things deployed at $WORK by arguing that it's The Right Thing To Do, but at most shops an ROI calculation needs to be attached to get movement Exactly. If $OTHER_FOLKS don't deploy it, cases like Panix may not really be avoided. I think that's what folks proposing perfect -- but practically undeployable -- security solutions are missing. That is, of course, why I asked the question -- I'm trying to understand the actual failure modes and feasible fixes. I agree that many of the solutions proposed thus far are hard to deploy; some colleagues and I are working on variants that we think are deployable. But we need data first. --Steven M. Bellovin, http://www.cs.columbia.edu/~smb