Re: So -- what did happen to Panix?

2006-02-08 Thread Jeffrey Haas

On Wed, Feb 08, 2006 at 04:37:31AM +, Christopher L. Morrow wrote:
 I had thought Josh's paper (or maybe not josh, whomever it was) said
 something along the lines of:
 1) if more than one announcement prefer 'longer term', 'older', 'more
 usual' route
 2) if only one route take it and run!

FWIW, this sort of mechanism was discussed among the IETF RPSEC WG
task group that is working on BGP security requirements.

On the presumption that some database of stable routes and paths
is present, you could bias your preference in your routes for
more stable routes and paths.

You would also need to decide what to do about more specific routes
covered by stable routes.  Do you ignore them?  This is a harder
question.

-- 
Jeff Haas 
NextHop Technologies




Re: So -- what did happen to Panix?

2006-02-08 Thread Josh Karlin

Here is what we propose in PGBGP.  If you have a more specific route
and its AS Path does not contain any of the less specific route's
origins, then ignore it for a day and keep routing to the less
specific origin.  If it's legitimate the less specific origin should
forward the data on for the day.
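
For illustration only, the rule above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not PGBGP's actual implementation: the function names, the `known_routes` mapping, and the AS numbers (taken from the documentation range) are hypothetical, and it assumes the origins of every covering less specific route are pooled together.

```python
import ipaddress
from datetime import datetime, timedelta

# "ignore it for a day": the one-day figure comes from the message above.
HOLD_DOWN = timedelta(days=1)


def is_suspicious_more_specific(prefix, as_path, known_routes):
    """Return True if `prefix` is a more specific of a known route and its
    AS path contains none of the covering routes' origins.

    known_routes is a hypothetical mapping: prefix string -> set of origin ASNs.
    """
    new_net = ipaddress.ip_network(prefix)
    covering_origins = set()
    for known_prefix, origins in known_routes.items():
        known_net = ipaddress.ip_network(known_prefix)
        if known_net.version != new_net.version:
            continue  # subnet_of() raises TypeError across IP versions
        if new_net != known_net and new_net.subnet_of(known_net):
            covering_origins |= set(origins)
    # No covering route at all: this rule has nothing to say.
    if not covering_origins:
        return False
    return covering_origins.isdisjoint(as_path)


def usable_after(first_seen):
    """Earliest time a suspicious more specific would be accepted."""
    return first_seen + HOLD_DOWN


known = {"12.0.0.0/8": {64500}}  # illustrative origin, not a real assignment
# Path avoids the less specific's origin: held down for a day.
print(is_suspicious_more_specific("12.34.0.0/16", [64510, 64511], known))
# Path passes through the less specific's origin: accepted immediately.
print(is_suspicious_more_specific("12.34.0.0/16", [64510, 64500, 64511], known))
```

While a more specific is held down, traffic keeps following the less specific route, which is why the legitimate case depends on the less specific origin forwarding the data on.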

We see about 30 of these suspicious routes per day.

I imagine some of you will not like this scheme.  Please let me know why.

Josh



On 2/8/06, Jeffrey Haas [EMAIL PROTECTED] wrote:

 On Wed, Feb 08, 2006 at 04:37:31AM +, Christopher L. Morrow wrote:
  I had thought Josh's paper (or maybe not josh, whomever it was) said
  something along the lines of:
  1) if more than one announcement prefer 'longer term', 'older', 'more
  usual' route
  2) if only one route take it and run!

 FWIW, this sort of mechanism was discussed among the IETF RPSEC WG
 task group that is working on BGP security requirements.

 On the presumption that some database of stable routes and paths
 is present, you could bias your preference in your routes for
 more stable routes and paths.

 You would also need to decide what to do about more specific routes
 covered by stable routes.  Do you ignore them?  This is a harder
 question.

 --
 Jeff Haas
 NextHop Technologies





Re: So -- what did happen to Panix?

2006-02-07 Thread Nick Feamster


Martin Hannigan wrote:


My answer, in short, was to say that I see it as more of an enterprise
play because it's a managed service and the hardest part of
provisioning is typically the order cycle.
If you are an ISP, you are theoretically multi homed by definition
and your providers are going to remain fairly stable (you hope)
based on your own needs.


My point remains: designs based on such assumptions are not a good idea, 
since these assumptions are by no means fundamental and could certainly 
change.  People get creative with how they announce prefixes, change 
upstreams, etc., and you can't assume that things like this would stay 
the way they are.


As an aside, another question occurred to me about delaying unusual 
announcements.  Boeing Connexion offers another example of unorthodox 
prefix announcements.  Wouldn't the tactic of delaying unusual 
announcements cause problems for this service?


-Nick


Re: So -- what did happen to Panix?

2006-02-07 Thread Christopher L. Morrow



On Tue, 7 Feb 2006, Nick Feamster wrote:

 As an aside, another question occurred to me about delaying unusual
 announcements.  Boeing Connexion offers another example of unorthodox
 prefix announcements.  Wouldn't the tactic of delaying unusual
 announcements would cause problems for this service?

I had thought Josh's paper (or maybe not josh, whomever it was) said
something along the lines of:
1) if more than one announcement prefer 'longer term', 'older', 'more
usual' route
2) if only one route take it and run!

So.. provided Connexion withdraws from 'as-germany' and announces in
'as-atlantic ocean', and so on there would only be 1 route, and you'd fall
to step 2.

(yes, the paper was more detailed and there were more steps...)
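
A rough sketch of just these two steps, in Python. It assumes that 'longer term' / 'older' / 'more usual' can be approximated by the time an announcement was first seen; the `Candidate` type and `choose_route` function are hypothetical names, and the paper's additional steps are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    origin_as: int        # AS originating the prefix
    first_seen: datetime  # when this announcement was first observed


def choose_route(candidates):
    """Pick among competing announcements for one prefix."""
    if not candidates:
        return None
    if len(candidates) == 1:
        # Step 2: only one route, take it and run.
        return candidates[0]
    # Step 1: more than one announcement, prefer the longest-lived one.
    return min(candidates, key=lambda c: c.first_seen)


old = Candidate(origin_as=64500, first_seen=datetime(2005, 1, 1))
new = Candidate(origin_as=64501, first_seen=datetime(2006, 1, 22))
print(choose_route([new, old]).origin_as)  # competing origins: older one wins
print(choose_route([new]).origin_as)       # withdrawn elsewhere: new one is used
```

The second call corresponds to the Connexion case: once the old announcement is withdrawn, the new origin is the only candidate and is used without delay.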


Re: So -- what did happen to Panix?

2006-02-07 Thread Martin Hannigan


At 11:27 PM 2/7/2006, Nick Feamster wrote:


Martin Hannigan wrote:


My answer, in short, was to say that I see it as more of an enterprise
play because it's a managed service and the hardest part of
provisioning is typically the order cycle.
If you are an ISP, you are theoretically multi homed by definition
and your providers are going to remain fairly stable (you hope)
based on your own needs.


My point remains: designs based on such assumptions are not a good 
idea, since these assumptions are by no means fundamental and could 
certainly change.  People get creative with how they announce 
prefixes, change upstreams, etc., and you can't assume that things 
like this would stay the way they are.



Nick:

I wouldn't call them assumptions. I would call them engineering
decisions in operational environments. I guess I fail to see where a
commodity market with a broker adding a vig resolves a real network
problem. I'm thinking tier 1? They aren't buying service from anyone
on Equinix direct, and move/add/drop is just another day on the
Internet. I really can't see any provider doing it, but perhaps
smaller ones. *shrug*. I don't know why you wouldn't make temporary
arrangements via peering fabric, PNI, or transit and eliminate the
middle man (point of failure).



As an aside, another question occurred to me about delaying unusual 
announcements.  Boeing Connexion offers another example of 
unorthodox prefix announcements.  Wouldn't the tactic of delaying 
unusual announcements would cause problems for this service?



[ snip ]

-M






-Nick


Martin Hannigan(c) 617-388-2663
Renesys Corporation(w) 617-395-8574
Member of Technical Staff  Network Operations
   [EMAIL PROTECTED]  



Re: So -- what did happen to Panix?

2006-02-07 Thread Josh Karlin

Chris has it!

And to be clear, we only require a slow (1 day) provider changeover in
the case that you want to announce your old provider's sub-prefix at a
new provider.  For instance, if you are an ATT customer using a 12/8
sub-prefix and change providers but keep the prefix, the prefix will
look funny coming from another originator for the first day and be
delayed.  All other methods of changing providers will not be
interfered with.

Josh



 I had thought Josh's paper (or maybe not josh, whomever it was) said
 something along the lines of:
 1) if more than one announcement prefer 'longer term', 'older', 'more
 usual' route
 2) if only one route take it and run!

 So.. provided Connexion withdraws from 'as-germany' and announces in
 'as-atlantic ocean', and so on there would only be 1 route, and you'd fall
 to step 2.

 (yes, the paper was more detailed and there were more steps...)



Re: So -- what did happen to Panix?

2006-02-06 Thread Michael . Dillon

  If an IRR suffers from bit-rot, then I don't consider
  it to be well-operated and therefore it cannot be
  considered to be part of a well-operated network of
  IRRs.
 
 honestly I'm not a fan of IRR's, so don't pay attention to them, but... is
 the IRR 'not well operated' or is the data stale because the 'users' of
 the IRR are 'not well operated' ? (the IRR as near as I can tell is
 nothing but a web/whois server that you sign-up-for and push/pull data
 through, right?)

Indeed it is not much more than a server with a database
which is why I do not consider it to be well-operated.
In order to be well-operated, somebody (or some organization)
needs to take responsibility for the data in the database
and make sure that this data is as accurate as can be.

I'm really saying that if people want to solve this
problem jointly, then the tools are already there for
a membership organization to use. And such an organization
could also work on a revised BGP protocol as a longer term
solution.

But, in the absence of such an organization we have nothing
more than a disorganized chaos in which nothing much changes.

--Michael Dillon



Re: So -- what did happen to Panix?

2006-02-06 Thread Michael . Dillon

 Other networks have no such incentive, since their transit providers 
 and peers either build their filters in other ways, or don't filter 
 at all.

There is nothing wrong with building your filter in
some other way, however, that does not mean that you
cannot validate your filters against the IRR and take
some action on mismatches. For instance you could email
the prefix owners about the mismatch and ask them to
update the IRR.
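
As a minimal sketch of that kind of validation (in Python, with hypothetical names and made-up data): assume the locally built filter and the IRR route objects have both already been reduced to sets of (prefix, origin AS) pairs. How those sets are obtained, and what is done about each mismatch, is left out.

```python
def filter_mismatches(local_filter, irr_routes):
    """Compare a locally built filter against IRR route objects.

    Both arguments are sets of (prefix, origin_as) tuples.
    """
    return {
        # Accepted locally but not registered: ask the owner to register.
        "missing_from_irr": sorted(local_filter - irr_routes),
        # Registered but not accepted locally: possibly stale IRR data.
        "not_in_filter": sorted(irr_routes - local_filter),
    }


# Documentation prefixes and AS numbers, purely illustrative.
local_filter = {("192.0.2.0/24", 64500), ("198.51.100.0/24", 64501)}
irr_routes = {("192.0.2.0/24", 64500), ("203.0.113.0/24", 64502)}

for kind, entries in filter_mismatches(local_filter, irr_routes).items():
    for prefix, origin in entries:
        print(f"{kind}: {prefix} AS{origin}")
```

Each reported entry is a candidate for the follow-up described above, such as emailing the prefix owner.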

 Wherever there is a lack of incentive to keep records accurate, we 
 can probably safely assume that they are either missing or stale.

Yes. Without regular validation or auditing of data,
it does not stay up to date.

 It's probably fair to say that if all the large, default-free 
 carriers insisted that their customers submitted their routes to the 
 IRR, then every route would be registered. This would not completely 
 address the problem of stale data, though.

It's a good start. Perhaps if we decouple the idea of an IRR
from building filters more people will see the usefulness
of a distributed repository of information against which
they can validate (cryptographically or otherwise) their
routing data.

Right now the secure BGP protocols require a network to
climb the hurdles of cryptographic certification in order
to participate. A revised and renewed IRR can lower that
barrier so that people can participate even before they
implement cryptographic signing and certification.

 The IRR is a loosely-connected collection of route registries, all 
 run by different people. Data originating in one database is 
 frequently found to be mirrored in other databases, but not in any 
 great systematic fashion.

If the networking community can't solve the problem
of managing the distributed route registries in a systematic
fashion, then how can it implement one of the secure BGP proposals?

--Michael Dillon



Re: So -- what did happen to Panix?

2006-02-06 Thread Martin Hannigan


At 02:05 AM 2/6/2006, Nick Feamster wrote:


Martin Hannigan wrote:

[ SNIP ]


 If you are changing providers, which takes
 awhile anyway,


That process seems to be getting quicker:
http://www.equinix.com/prod_serv/network/ed.htm


NOT an ISP product.


Independent of ED, one should be cautious when designing routing 
protocols based on logistical and business assumptions (e.g., 
switching providers takes awhile, most business policies are vanilla 
peering, etc.).


These assumptions are certainly not fundamental, and they may not 
always be true, regardless of what exists today.



I got some "can you elaborate" comments so please forgive my
second response.

What I thought I read was that you thought Equinix had an interesting
play in a transitioning and provisioning strategy for ISP's.

My answer, in short, was to say that I see it as more of an enterprise
play because it's a managed service and the hardest part of
provisioning is typically the order cycle.
If you are an ISP, you are theoretically multi homed by definition
and your providers are going to remain fairly stable (you hope)
based on your own needs.

Equinix direct is a bandwidth commodity in my mind. Anyone remember
Invisible Hand (still in business, btw: http://www.invisiblehand.net/)?

Equinix handles the software interaction and is the market maker. Customers
appear to providers and providers can decide if they want to sell to
customers. For example, if you show up at ED and need X gigs, a provider
could opt out of the market because you are a highcap customer. In the end,
the market maker gets a piece of the action from the provider and sends the
customer a bill since it is theoretically the provider. I think there's
a question about neutrality, but there are no more pure neutral colo
houses so that is somewhat irrelevant unless it's completely bogus like
selling interconnect network or something vs. the ILEC.

In an environment like Equinix or SD, you could attach to the public
peering fabric and make connections, and then if you need someone
specific you can hope to get them on ED (in Equinix's case) without
buying dedicated transit. In short, it's easy.

With that said, I believe most ISP's would be better suited to
overlapped service or TE'ing vs. using commodity markets for
b/w, IMHO.

Thanks,


-M




Martin Hannigan(c) 617-388-2663
Renesys Corporation(w) 617-395-8574
Member of Technical Staff  Network Operations
   [EMAIL PROTECTED]  



Re: So -- what did happen to Panix?

2006-02-05 Thread Joe Provo

On Fri, Feb 03, 2006 at 02:15:45PM -0500, Nick Feamster wrote:
[snip]
 This is a losing proposition.  The data in the IRR, CA, or any mechanism 
 that is updated out-of-band from the protocol itself will inherently be 
 out-of-sync.

Provisioning systems are out of synch with the protocol, but essential
for many (most?) networks' connectivity. Many providers who do use the 
IRR have it as an adjunct/offshoot of their provisioning system.  

Of course, to some monolithic entities the suggestion that any alteration
(or $deity-forbid, a not-invented-here *improvement*) to their system 
is anathema.

[snip some interesting stuff]

  If you are changing providers, which takes
 awhile anyway, 
 
 That process seems to be getting quicker:
 http://www.equinix.com/prod_serv/network/ed.htm

See 'whois -h whois.radb.net rs-ed-ash' and similar objects; great 
support for IRR as externally-relevant portion of a provisioning 
system.

Cheers,

Joe

-- 
 RSUC / GweepNet / Spunk / FnB / Usenix / SAGE


Re: So -- what did happen to Panix?

2006-02-05 Thread Martin Hannigan



[ SNIP ]



 If you are changing providers, which takes
 awhile anyway,


That process seems to be getting quicker:
http://www.equinix.com/prod_serv/network/ed.htm



NOT an ISP product.

-M





Martin Hannigan(c) 617-388-2663
Renesys Corporation(w) 617-395-8574
Member of the Technical Staff  Network Operations
   [EMAIL PROTECTED]  



Re: So -- what did happen to Panix?

2006-02-05 Thread Nick Feamster


Martin Hannigan wrote:


[ SNIP ]



 If you are changing providers, which takes
 awhile anyway,


That process seems to be getting quicker:
http://www.equinix.com/prod_serv/network/ed.htm



NOT an ISP product.


Independent of ED, one should be cautious when designing routing 
protocols based on logistical and business assumptions (e.g., switching 
providers takes awhile, most business policies are vanilla peering, etc.).


These assumptions are certainly not fundamental, and they may not always 
be true, regardless of what exists today.


-Nick


Re: So -- what did happen to Panix?

2006-02-05 Thread Martin Hannigan


At 02:05 AM 2/6/2006, Nick Feamster wrote:

Martin Hannigan wrote:

[ SNIP ]


 If you are changing providers, which takes
 awhile anyway,


That process seems to be getting quicker:
http://www.equinix.com/prod_serv/network/ed.htm


NOT an ISP product.


Independent of ED, one should be cautious when designing routing 
protocols based on logistical and business assumptions (e.g., 
switching providers takes awhile, most business policies are vanilla 
peering, etc.).


These assumptions are certainly not fundamental, and they may not 
always be true, regardless of what exists today



This is strictly a market-maker product, IMHO, which is different from a
transition or provisioning strategy. YMMV.

ISP's don't switch providers, typically, enterprises do. ISP's add, move,
and drop, so physical layer management is more important, believe it or not.


-M




Martin Hannigan(c) 617-388-2663
Renesys Corporation(w) 617-395-8574
Member of the Technical Staff  Network Operations
   [EMAIL PROTECTED]  



Re: So -- what did happen to Panix?

2006-02-04 Thread Christopher L. Morrow


On Mon, 30 Jan 2006 [EMAIL PROTECTED] wrote:


   Wouldn't a well-operated network of IRRs used by 95% of
   network operators be able to meet all three of your
   requirements?
 
  We have such a database (used by Verio and others), but the Panix incident
  happened anyway due to bit rot.  We've got to find a way to fix the layer 8
  problems before we can make improvements at layer 3.

 If an IRR suffers from bit-rot, then I don't consider
 it to be well-operated and therefore it cannot be
 considered to be part of a well-operated network of
 IRRs.

honestly I'm not a fan of IRR's, so don't pay attention to them, but... is
the IRR 'not well operated' or is the data stale because the 'users' of
the IRR are 'not well operated' ? (the IRR as near as I can tell is
nothing but a web/whois server that you sign-up-for and push/pull data
through, right?)


Re: So -- what did happen to Panix?

2006-02-04 Thread Christopher L. Morrow


On Fri, 3 Feb 2006, Josh Karlin wrote:

 Our primary concern is with keeping BGP stable until its replacement
 (e.g. sBGP) is ready for deployment.


veering off course for a tick: I wonder how well sbgp/sobgp will behave
in a world of 1million routes in the DFZ? 5 million? 10? 20?... 

Someone better be thinking about that part of the problem as well with the
coming doom of ipv6 :)


Re: So -- what did happen to Panix?

2006-02-04 Thread Joe Abley



On 4-Feb-2006, at 15:21, Christopher L. Morrow wrote:

honestly I'm not a fan of IRR's, so don't pay attention to them, but... is
the IRR 'not well operated' or is the data stale because the 'users' of
the IRR are 'not well operated' ?


The data ought to be maintained by the people to whom it relates.

Customers (and peers) of some ISPs have great incentives to add  
appropriate records, since if they don't do so their ISPs' filters  
will not be widened to accept their routes.


Other networks have no such incentive, since their transit providers  
and peers either build their filters in other ways, or don't filter  
at all.


Generally, there is no incentive to remove data from the IRR, except  
in the case where resources are returned and reallocated to someone  
else who wants to make their own records.


Wherever there is a lack of incentive to keep records accurate, we  
can probably safely assume that they are either missing or stale.


"Customer" in this context means anybody whose routes might be  
filtered by someone else. Since large, default-free carriers tend  
not to have their routes filtered by peers, those that don't use RPSL  
expressions to build customer filters don't have much reason to care  
about the IRR.


It's probably fair to say that if all the large, default-free  
carriers insisted that their customers submitted their routes to the  
IRR, then every route would be registered. This would not completely  
address the problem of stale data, though.



(the IRR as near as I can tell is
nothing but a web/whois server that you sign-up-for and push/pull data
through, right?)


The IRR is a loosely-connected collection of route registries, all  
run by different people. Data originating in one database is  
frequently found to be mirrored in other databases, but not in any  
great systematic fashion.


Together these databases form a distributed repository of RPSL  
objects. Objects are generally submitted by e-mail and retrieved  
using whois, but some registry operators also make web interfaces  
available. Anybody who doesn't know what RPSL is can find out at  
http://www.irr.net/docs/rpsl.html.



Joe



Re: So -- what did happen to Panix?

2006-02-04 Thread Nick Feamster


Josh Karlin wrote:

Hasn't that been said for years?  Wouldn't perfect IRRs be great?  I
couldn't agree more.  But in the meanwhile, why not protect your own
ISP by delaying possible misconfigurations.  Our proposed delay does
*not* affect reachability, if the only route left is suspicious, it
will be chosen regardless.

Depending on the threat model, then, one attack would be to cause an AS
to damp the non-suspicious route.  This seems bad, right?  A flapping,
correct route seems better than a stable, suspicious one.


A flapping route would only be considered suspicious if it disappears
for many consecutive days and no other known route for the prefix
originates at the same AS. At which point the attacker has already
won.


My point was actually that an adversary could flap a correct route to 
damp it, to induce a router to select a suspicious one.  (This threat 
also exists today, I believe, but the delay tactic does not solve the 
problem.)



Ascertaining correctness is only half of the work.  If you correctly
classify a malicious route, but do not take some measure to prevent
its spread, you have just done yourself and your customers harm.



I would say that ascertaining correctness is more than half of the work. 
 If a router can definitively say that a route is bogus, the measure 
to prevent its spread is pretty simple, right?  i.e., just drop the route.



In the case of PGBGP, there is a lot that an operator can do to verify
correctness.  Multiple viewpoints of anomalous routes can be collected
into a single database in which operators can, once per day, check to
make sure that their own address space is not being announced
elsewhere.  This can easily be automated for both the NOC and the
collection process.  Relationship information need not be revealed as
only the originator of the suspicious route is needed.


Analysis of multiple vantage points could definitely help in your case. 
The method for determining what counts as a suspicious route is not 
obvious, though.


In the example you present, a router can install route filters to reject 
incoming announcements for its own address space (many ISPs seem to 
deploy these types of filters already).  Much trickier is determining 
things like route hijacks, where even a delay won't help much without a 
reasonable way to ask "Is this route hijacked?"  The best way I know of 
for doing that is to go back to the registry.  If there are other ways 
to do this, I'd certainly be very interested to know about the state of 
the art.


The proposal seems useful in a case where collection of measurements 
from multiple vantage points could run analysis to detect suspicious 
routes, assuming the detection algorithms could be run quickly enough 
and the information about suspicious routes could be propagated back out 
to the network...which might not always be true in an attack scenario.


-Nick


Re: So -- what did happen to Panix?

2006-02-03 Thread Nick Feamster



Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?

-certified prefix ownership
-certified AS path ownership
-dynamic changes to the above two items

It seems to me that most of the pieces needed to do
this already exist. RPSL, IRR softwares, regional
addressing authorities (RIRs). If there are to be
certified AS paths in a central database this also
opens the door to special arrangements for AS path
routing that go beyond peering, i.e. agreements with
the peers of your peers.


It is true that most of the pieces do exist.  The problem appears to be 
not a want of tools, but the fact that the tools are not coupled 
properly---updating records about prefix ownership is, today, performed 
out-of-band from the routing protocol.


This is a losing proposition.  The data in the IRR, CA, or any mechanism 
that is updated out-of-band from the protocol itself will inherently be 
out-of-sync.


A better idea, I think, would be to tie the identifier of the route 
to something that is inherently bound to some cryptographic information 
(e.g., a public key), rather than a separate piece of information whose 
ownership must be certified (i.e., an IP prefix, an AS number).


I can think of some great ways to do this, but they all involve varying 
degrees of departure from prefix-based routing.  I would certainly be 
interested in talking offline about this with any forward-thinking types.



Hasn't that been said for years?  Wouldn't perfect IRRs be great?  I
couldn't agree more.  But in the meanwhile, why not protect your own
ISP by delaying possible misconfigurations.  Our proposed delay does
*not* affect reachability, if the only route left is suspicious, it
will be chosen regardless.  


Depending on the threat model, then, one attack would be to cause an AS 
to damp the non-suspicious route.  This seems bad, right?  A flapping, 
correct route seems better than a stable, suspicious one.


Perhaps I am missing something, but how does imposing a delay help in 
ascertaining a route's correctness?  Even looking at some of the 
suspicious routes I see by hand in the anomalies we detect, I can't 
personally tell what's incorrect/actionable vs. simply unusual (again, 
this goes back to the problem of inaccurate registries).  In the case of 
Panix/ConEd, I can imagine that an operator would have responded to the 
alarms, checked the registry information and said, "these routes look 
reasonable; go for it!"  Or, as human nature suggests, the operator 
might have even just ignored the alarms (particularly if origin changes 
are as frequent as they seem to be).


What is really needed, in any case, is a better way to determine the 
route's veracity.  This still requires some auxiliary mechanism to 
distinguish unusual from suspicious, and, while you're designing that 
auxiliary mechanism, it might as well be in-band (per the arguments above).


 If you are changing providers, which takes
awhile anyway, 


That process seems to be getting quicker:
http://www.equinix.com/prod_serv/network/ed.htm

-Nick


Re: So -- what did happen to Panix?

2006-02-03 Thread Josh Karlin

  Hasn't that been said for years?  Wouldn't perfect IRRs be great?  I
  couldn't agree more.  But in the meanwhile, why not protect your own
  ISP by delaying possible misconfigurations.  Our proposed delay does
  *not* affect reachability, if the only route left is suspicious, it
  will be chosen regardless.

 Depending on the threat model, then, one attack would be to cause an AS
 to damp the non-suspicious route.  This seems bad, right?  A flapping,
 correct route seems better than a stable, suspicious one.

A flapping route would only be considered suspicious if it disappears
for many consecutive days and no other known route for the prefix
originates at the same AS. At which point the attacker has already
won.

Our primary concern is with keeping BGP stable until its replacement
(e.g. sBGP) is ready for deployment.

 Perhaps I am missing something, but how does imposing a delay help in
 ascertaining a route's correctness?  Even looking at some of the
 suspicious routes I see by hand in the anomalies we detect, I can't
 personally tell what's incorrect/actionable vs. simply unusual (again,
 this goes back to the problem of inaccurate registries).  In the case of
 Panix/ConEd, I can imagine that an operator would have responded to the
 alarms, checked the registry information and said, these routes look
 reasonable; go for it!  Or, as human nature suggests, the operator
 might have even just ignored the alarms (particularly if origin changes
 are as frequent as they seem to be).

Ascertaining correctness is only half of the work.  If you correctly
classify a malicious route, but do not take some measure to prevent
its spread, you have just done yourself and your customers harm.

In the case of PGBGP, there is a lot that an operator can do to verify
correctness.  Multiple viewpoints of anomalous routes can be collected
into a single database in which operators can, once per day, check to
make sure that their own address space is not being announced
elsewhere.  This can easily be automated for both the NOC and the
collection process.  Relationship information need not be revealed as
only the originator of the suspicious route is needed.
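
To illustrate how simple the daily check could be, here is a minimal Python sketch. The shared database is assumed to be reducible to (prefix, origin AS) pairs; the function name, its arguments, and the sample data are hypothetical and do not describe any existing service.

```python
import ipaddress


def announcements_of_my_space(my_prefixes, my_asns, suspicious_routes):
    """Return suspicious routes that overlap our address space but
    originate somewhere other than our own ASes.

    suspicious_routes is an iterable of (prefix, origin_as) pairs.
    """
    mine = [ipaddress.ip_network(p) for p in my_prefixes]
    hits = []
    for prefix, origin in suspicious_routes:
        if origin in my_asns:
            continue  # our own announcement, nothing to report
        net = ipaddress.ip_network(prefix)
        if any(net.version == m.version and net.overlaps(m) for m in mine):
            hits.append((prefix, origin))
    return hits


# Illustrative data using documentation prefixes and AS numbers.
suspicious = [
    ("192.0.2.128/25", 64511),   # more specific of our block, foreign origin
    ("192.0.2.0/24", 64500),     # our own origin
    ("203.0.113.0/24", 64511),   # unrelated address space
]
for prefix, origin in announcements_of_my_space(["192.0.2.0/24"], {64500}, suspicious):
    print(f"ALERT: {prefix} originated by AS{origin}")
```

Only the prefix and the originating AS are consulted, which matches the point that relationship information need not be revealed.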

If, in the worst case, the route is not detected as malicious before
it is considered normal, the next wave of routers will be introduced
to the route and consider it suspicious.  The first wave will then
notice the problem and fix it, still protecting a significant portion
of the network.

Josh


Re: So -- what did happen to Panix?

2006-02-01 Thread John Payne



On Jan 30, 2006, at 5:02 AM, Richard A Steenbergen wrote:



On Mon, Jan 30, 2006 at 09:48:13AM +,  
[EMAIL PROTECTED] wrote:



Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?


We have such a database (used by Verio and others), but the Panix incident
happened anyway due to bit rot.  We've got to find a way to fix the layer 8
problems before we can make improvements at layer 3.


If an IRR suffers from bit-rot, then I don't consider
it to be well-operated and therefore it cannot be
considered to be part of a well-operated network of
IRRs.

The point is that the tools exist. The failing is in
how those tools are managed. In other words this is
an operational problem on both the scale of a single
IRR and on the scale of the IRR system. Is this
what you mean by a layer 8 problem?


Take it up with the people putting data into the system, not the IRR
operators. Anyone who is behind an IRR-based provider (like Verio) has
motivation to put data into the system (hey look I do this and now
routing works), but there is no motivation to take stale data OUT of the
system.


It gets even more fun if you're delegating route-origination to 3rd  
parties.
Add a mnt-routes: so they can create a route object, but then you  
can't remove that inetnum block whilst their route object exists (nor  
remove the mnt-routes).


*sigh*



Re: So -- what did happen to Panix?

2006-01-30 Thread Michael . Dillon

  Wouldn't a well-operated network of IRRs used by 95% of
  network operators be able to meet all three of your
  requirements?
 
 We have such a database (used by Verio and others), but the Panix incident
 happened anyway due to bit rot.  We've got to find a way to fix the layer 8
 problems before we can make improvements at layer 3.

If an IRR suffers from bit-rot, then I don't consider
it to be well-operated and therefore it cannot be
considered to be part of a well-operated network of
IRRs.

The point is that the tools exist. The failing is in
how those tools are managed. In other words this is
an operational problem on both the scale of a single
IRR and on the scale of the IRR system. Is this
what you mean by a layer 8 problem?

--Michael Dillon



Re: So -- what did happen to Panix?

2006-01-30 Thread Michael . Dillon

 Perhaps people should stop trying to have these
 operational discussions in the IETF and take the
 discussions to NANOG where network operators gather.
 
 We have tried, of course; see, for example, NANOG 28 (Salt Lake City).
 There was no more consensus at NANOG than in the IETF...

One attempt almost 3 years ago, doesn't sound very
serious to me. And if the discussion is only concerned
with seeking consensus on implementing a new flavor 
of BGP protocol then it isn't much of a discussion.

In fact, there was a consensus at Salt Lake City that
the issues of routing security could be adequately dealt
with by existing tools and protocols. Not all problems
require new protocols to solve them.

--Michael Dillon



Re: So -- what did happen to Panix?

2006-01-30 Thread Richard A Steenbergen

On Mon, Jan 30, 2006 at 09:48:13AM +, [EMAIL PROTECTED] wrote:
 
   Wouldn't a well-operated network of IRRs used by 95% of
   network operators be able to meet all three of your
   requirements?
  
  We have such a database (used by Verio and others), but the Panix incident
  happened anyway due to bit rot.  We've got to find a way to fix the layer 8
  problems before we can make improvements at layer 3.
 
 If an IRR suffers from bit-rot, then I don't consider
 it to be well-operated and therefore it cannot be
 considered to be part of a well-operated network of
 IRRs.

 The point is that the tools exist. The failing is in
 how those tools are managed. In other words this is
 an operational problem on both the scale of a single
 IRR and on the scale of the IRR system. Is this
 what you mean by a layer 8 problem?

Take it up with the people putting data into the system, not the IRR 
operators. Anyone who is behind an IRR-based provider (like Verio) has 
motivation to put data into the system (hey look I do this and now 
routing works), but there is no motivation to take stale data OUT of the 
system.

I can't even begin to count the number of networks I know who 
theoretically use IRR who don't even know HOW to remove data, let alone 
make any active attempt to do so when a customer leaves or a route is 
returned. Combine this with the idiots who run around proxy registering 
routes for other people based on everything they see in the table (gee 
there's a good idea, define filters for what is allowed in the table based 
on what we see people trying to put into the table, brilliant!) and you 
quickly see how IRR data becomes stale and eventually worthless.
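
A rough illustration of why stale objects matter, as a Python sketch. The route objects, AS numbers and as-set below are made-up placeholders (documentation ranges), and build_prefix_filter is a hypothetical helper, not any provider's actual tooling. It only shows that a filter built mechanically from registered data keeps permitting whatever nobody bothered to remove.

```python
# Hypothetical IRR contents: (prefix, origin AS) pairs taken from route objects.
route_objects = [
    ("192.0.2.0/24", 64500),     # current customer
    ("198.51.100.0/24", 64501),  # customer that left; object never removed
]

# Hypothetical as-set that the provider expands for this customer.
customer_as_set = {64500, 64501}


def build_prefix_filter(route_objects, as_set):
    """Allow every registered prefix whose origin AS is in the as-set."""
    return {prefix for prefix, origin in route_objects if origin in as_set}


allowed = build_prefix_filter(route_objects, customer_as_set)
# The departed customer's prefix is still permitted, because nothing in the
# process ever takes data out of the registry.
```

The filter is "correct" with respect to the registry and wrong with respect to reality, which is the situation described above.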

I'll save the rest of my rant for the presentation on the subject in 
Dallas. :)

-- 
Richard A Steenbergen [EMAIL PROTECTED]   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)


Re: So -- what did happen to Panix?

2006-01-30 Thread Michael . Dillon

  Wouldn't a well-operated network of IRRs used by 95% of
  network operators be able to meet all three of your
  requirements?
 
 Maybe I missed something, but didn't Verio say the prefix was in 
 their internal registry, and that's why it was accepted.
 
 IOW: It didn't solve this problem.  So I guess we're discussing the 
 other 5%?

You missed the words well-operated. 

Today there is no well-operated network of IRRs so
there is bad data in the databases. In addition, there
is the question of how to use the IRR data. Should you
build filters from it? Should you use it to validate your
own internal database with human beings chasing up 
the differences and fixing whichever database is wrong?

--Michael Dillon



Re: So -- what did happen to Panix?

2006-01-30 Thread sandy

the scheme that josh karlin has been advocating in pretty good bgp
involved only suppressing a doubtful announcement when you have a
better, more trusted announcement.

Not a doubtful announcement, a novel announcement.  Not a better
announcement, a more usual announcement.  The trust part, like beauty,
is in the eye of the beholder.

Don't get me wrong - I think basing decision on some trusted
summary of historical behavior is going to be important, unless and
until we get some approach that gives a more deterministic answer.
But I do believe that we need to consider carefully how this will
play with dynamic, particularly unplanned, changes in who is announcing what.

If there turn out to be cases where dynamic, particularly unplanned,
changes get rejected by this technique in favor of stale data,
then there should be consideration given to how to amend the scheme
to prevent that or suggest operational practices to get around it.

--Sandy


Re: So -- what did happen to Panix?

2006-01-30 Thread Todd Underwood

sandy,

On Mon, Jan 30, 2006 at 08:29:45AM -0500, [EMAIL PROTECTED] wrote:
 the scheme that josh karlin has been advocating in pretty good bgp
 involved only suppressing a doubtful announcement when you have a
 better, more trusted announcement.
 
 Not a doubtful announcement, a novel announcement.  Not a better
 announcement, a more usual announcement.  The trust part, like beauty,
 is in the eye of the beholder.

i just don't think you're following along.  i think we're talking
about different things.  read josh, stephanie forrest and jennifer
rexford's paper:  

http://www.cs.unm.edu/~treport/tr/05-10/pgbgp.pdf

 Don't get me wrong - I think basing decision on some trusted
 summary of historical behavior is going to be important, unless and
 until we get some approach that gives a more deterministic answer.
 But I do believe that we need to consider carefully how this will
 play with dynamic, particularly unplanned, changes in who is
 announcing what. 

josh's scheme only comes into play when there are two, competing
origination patterns.  in this case the question is just which one to
believe.  

agreed that we should be careful with anything that reduces the
ability of people to change routing dynamically.  but let's remember:
that ability is already constrained by the fact that responsible
providers use prefix filters and require some kind of out-of-band
(IRR, letter, email) validation of prefix ownership. routing a new
prefix with a new origination pattern is not especially dynamic now,
so let's not worry about throwing out a baby that's not even in the
bath.  

t.


-- 
_
todd underwood
chief of operations & security 
renesys - internet intelligence
[EMAIL PROTECTED]   www.renesys.com


Re: So -- what did happen to Panix?

2006-01-28 Thread Steven M. Bellovin

In message [EMAIL PROTECTED]
.com, [EMAIL PROTECTED] writes:

 certified validation of prefix ownership (and path, as has been
 pointed out) would be great.  it's clearly a laudable goal and seemed
 like the right way to go.  but right now, no one is doing it.  the
 rfcs that i've found have all expired.  and the conversation about
 it has reached the point where people seem to have stopped even
 disagreeing about how to do it.  in short, it's as dead as dns-sec.
 so what are we to do in the meantime?

Perhaps people should stop trying to have these
operational discussions in the IETF and take the
discussions to NANOG where network operators gather.


We have tried, of course; see, for example, NANOG 28 (Salt Lake City).
There was no more consensus at NANOG than in the IETF...

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb




Re: So -- what did happen to Panix?

2006-01-27 Thread william(at)elan.net



All these explanations can only go so far as to show that ConEd
and its upstreams may have had these prefixes as something that is
allowed (due to previous transit relationships) to be announced.
However, presumably all these were transit arrangements with ConEd,
and the IP blocks would have originated from a different ASN, whereas
during the accident ConEd actually directly announced the prefix as
originating from its own ASN.

One thing I can think of is that ConEd started doing synchronization,
so all eBGP routes were redistributed into OSPF or some other IGP.
This could lead to a situation where some previously configured
router that redistributes summarized routes from IGP to BGP could
think the route needs to be advertised as coming from ConEd and
announce it to Verio. But I think the result of all this should have been
that the route would be flapping (i.e. they start announcing and then it
gets removed from what they learn from upstream and so is no longer
redistributed to the IGP and no longer announced; back to the beginning),
and they weren't.


--
William Leibzon
Elan Networks
[EMAIL PROTECTED]


Re: So -- what did happen to Panix?

2006-01-27 Thread bmanning

On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote:
 
  what I saw by going through the diffs, etc.. that I have
  available to me is that the prefix was registered to be announced
  by our customer and hence made it into our automatic IRR filters.
 
 i.e., the 'error' was intended, and followed all process.
 
 so, what i don't see is how any hacks on routing, such as delay,
 history, ... will prevent this while not, at the same time, have
 very undesired effects on those legitimately changing isps.
 
 seems to me that certified validation of prefix ownership and as
 path are the only real way out of these problems that does not
 teach us the 42 reasons we use a *dynamic* protocol.

perhaps you mean certified validation of prefix origin
and path.  Ownership of any given prefix is a dicey concept
at best.

as a start, i'd want two things for authentication and integrity
checks:  AS P asserts it is the origin of prefix R and prefix R
asserts the true origin AS is P (or Q or some list).  Being able
to check these assertions and being assured of the authenticity
and integrity of the answers goes a long way, at least for me.
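
A minimal Python sketch of the two-way check described above. The two tables stand in for whatever authenticated source would carry the assertions; their names, the prefixes and the AS numbers are placeholders (documentation ranges), and nothing here addresses how the assertions would be signed or distributed.

```python
# Hypothetical, hand-built assertion tables. A real system would need these
# to be authenticated and integrity-protected, which this sketch does not do.

# What each AS asserts it originates ("AS P asserts it is the origin of R").
as_claims = {
    64500: {"192.0.2.0/24"},
    64501: {"198.51.100.0/24"},
}

# What each prefix asserts about its origin ("R asserts the origin is P or Q").
prefix_claims = {
    "192.0.2.0/24": {64500},
}


def origin_is_mutually_asserted(prefix, origin_as):
    """True only if the AS claims the prefix AND the prefix lists that AS."""
    as_says_yes = prefix in as_claims.get(origin_as, set())
    prefix_says_yes = origin_as in prefix_claims.get(prefix, set())
    return as_says_yes and prefix_says_yes
```

With these tables, 64500 originating 192.0.2.0/24 passes, while 64501 originating 198.51.100.0/24 fails because the prefix side never confirmed it.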

path validation is something else and a worthwhile goal.
--bill

 
 what am i missing here?
 
 randy


Re: So -- what did happen to Panix?

2006-01-27 Thread Michael . Dillon

 seems to me that certified validation of prefix ownership and as
 path are the only real way out of these problems that does not
 teach us the 42 reasons we use a *dynamic* protocol.

Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?

-certified prefix ownership
-certified AS path ownership
-dynamic changes to the above two items

It seems to me that most of the pieces needed to do
this already exist. RPSL, IRR software, regional
addressing authorities (RIRs). If there are to be
certified AS paths in a central database this also
opens the door to special arrangements for AS path
routing that go beyond peering, i.e. agreements with
the peers of your peers.

Seems to me that operational problem solving works
better when the problem is not thrown into the laps
of the protocol designers.

--Michael Dillon



Re: So -- what did happen to Panix?

2006-01-27 Thread Josh Karlin

 Wouldn't a well-operated network of IRRs used by 95% of
 network operators be able to meet all three of your
 requirements?

 -certified prefix ownership
 -certified AS path ownership
 -dynamic changes to the above two items

 It seems to me that most of the pieces needed to do
 this already exist. RPSL, IRR software, regional
 addressing authorities (RIRs). If there are to be
 certified AS paths in a central database this also
 opens the door to special arrangements for AS path
 routing that go beyond peering, i.e. agreements with
 the peers of your peers.


Hasn't that been said for years?  Wouldn't perfect IRRs be great?  I
couldn't agree more.  But in the meanwhile, why not protect your own
ISP by delaying possible misconfigurations?  Our proposed delay does
*not* affect reachability: if the only route left is suspicious, it
will be chosen regardless.  If you are changing providers, which takes
awhile anyway, just advertise both for a day and you have no problems.
 Or, if you are concerned about speed, simply withdraw one and the new
one will have to be used.  If you are anycasting the prefix and a new
origin pops up that your view has not seen before, then you might have
a temporary load balance issue, but there is absolutely no guarantee
of what routers many hops away from you will see anyway.
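
For illustration only, here is a Python sketch of the delay behaviour as described in this message. It is not the algorithm from the paper; the function name, the data layout and the one-day constant are assumptions made for the example.

```python
SUSPICION_PERIOD = 24 * 60 * 60  # one day, in seconds (assumed for the sketch)


def choose_routes(candidates, known_origins, first_seen, now):
    """Return the routes for one prefix that stay eligible for selection.

    candidates    -- list of (origin_as, as_path) tuples
    known_origins -- set of origin ASes seen for this prefix in the past
    first_seen    -- dict: origin_as -> time a novel origin first appeared
    now           -- current time, in seconds
    """
    trusted = []
    suspicious = []
    for origin_as, as_path in candidates:
        novel = origin_as not in known_origins
        age = now - first_seen.get(origin_as, now)
        if novel and age < SUSPICION_PERIOD:
            suspicious.append((origin_as, as_path))
        else:
            trusted.append((origin_as, as_path))
    # Reachability is never sacrificed: if every route is suspicious,
    # the suspicious routes are used anyway.
    return trusted if trusted else suspicious
```

Withdrawing the old announcement empties the trusted list, so the new origin is used immediately, which is the "simply withdraw one" case above.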

Josh


Re: So -- what did happen to Panix?

2006-01-27 Thread Todd Underwood

randy, all,

On Fri, Jan 27, 2006 at 04:36:28AM -0800, Randy Bush wrote:
 
  what I saw by going through the diffs, etc.. that I have
  available to me is that the prefix was registered to be announced
  by our customer and hence made it into our automatic IRR filters.
 
 i.e., the 'error' was intended, and followed all process.

yep.  that's the depressing part.  

 so, what i don't see is how any hacks on routing, such as delay,
 history, ... will prevent this while not, at the same time, have
 very undesired effects on those legitimately changing isps.

you're probably right (as usual).  but it seems that if you delay
acceptance of announcements with novel origination patterns, you don't
harm very many legitimate uses.  in particular, ASes changing
upstreams won't be harmed at all.  people moving their prefix to a new
ISP will have a fixed delay in getting their announcement propagated,
sure.  but they already have this delay now.  

they tell the new ISP:  'announce my prefix' and the new ISP says
'prove it's yours'.  they do that for a couple of emails.  then the
new ISP asks its upstreams to accept that announcement.  that takes a
little while (ranging from 4 to 72 hours in my recent experience).

 seems to me that certified validation of prefix ownership and as
 path are the only real way out of these problems that does not
 teach us the 42 reasons we use a *dynamic* protocol.

certified validation of prefix ownership (and path, as has been
pointed out) would be great.  it's clearly a laudable goal and seemed
like the right way to go.  but right now, no one is doing it.  the
rfcs that i've found have all expired.  and the conversation about
it has reached the point where people seem to have stopped even
disagreeing about how to do it.  in short, it's as dead as dns-sec.
so what are we to do in the meantime?

t.

-- 
_
todd underwood
chief of operations & security 
renesys - internet intelligence
[EMAIL PROTECTED]   www.renesys.com


Re: So -- what did happen to Panix?

2006-01-27 Thread bmanning

On Fri, Jan 27, 2006 at 10:42:11AM -0500, Joe Abley wrote:
 
 On 27-Jan-2006, at 07:51, [EMAIL PROTECTED] wrote:
 
  perhaps you mean certified validation of prefix origin
  and path.
 
 In the absence of path validation, a method of determining the real  
 origin of a prefix is also required, if the goal is to prevent  
 intentional hijacking as well as unintentional origination. Simply  
 looking at the right-most entry in the AS_PATH doesn't cut it, since  
 anybody can set as-path prepend P.

but by definition, the right-most entry is the prefix origin...
the question becomes, is that the origin the prefix expects?
to use an historical example:

198.32.6.0/24 thinks that AS 4555 is the correct origin
AS 4555 thinks that it should (and does) originate prefix 198.32.6.0/24
AS 4555 uses AS 226 and 701 as transit providers.

AS 1239 wants to be helpful and tells its peers that it is 
the proper origin for prefix 198.32.0.0/16 -BUT- never tells
AS 4555 about this and has no direct means to deliver packets
to AS 4555. 

Or... we see 128.9.160.0/24 as originating from multiple ASNs.
there is no requirement for single AS origin - is that theft
or an engineering tradeoff?

 
 This suggests to me that either we can't separate origin validation  
 from path validation (which sucks the former into the more difficult  
 problems associated with the latter), or we need a better measure of  
 origin (e.g. a PKI and an attribute which carries a signature).

i was just interested in the problem of assertion of origination.   
it needs to be done w/o a centralized repository (imho) because
that method has scalability problems.  such a technique does open
new chances to confuse ...  e.g. what happens when the prefix
is seen from the same apparent AS but w/ two or more different 
signatures?

path validation is (again imho) a problem severable from the prefix/as
origin.
 
 
 Joe


Re: So -- what did happen to Panix?

2006-01-27 Thread Michael . Dillon

 certified validation of prefix ownership (and path, as has been
 pointed out) would be great.  it's clearly a laudable goal and seemed
 like the right way to go.  but right now, no one is doing it.  the
 rfcs that i've found have all expired.  and the conversation about
 it has reached the point where people seem to have stopped even
 disagreeing about how to do it.  in short, it's as dead as dns-sec.
 so what are we to do in the meantime?

Perhaps people should stop trying to have these
operational discussions in the IETF and take the
discussions to NANOG where network operators gather.
Writing RFCs is a fine way to document operational
best practices, but it is not a good way to work out
joint operational practices. 

Of course, NANOG is no magic bullet, but it seems
like a more reasonable place to talk about how
to make things better.

A good start would be to try and get an agreed statement
of what the problem is. Once you have broad agreement on
the problem, then move on to solutions.

--Michael Dillon



Re: So -- what did happen to Panix?

2006-01-27 Thread Joe Abley



On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote:


but by definition, the right-most entry is the prefix origin...


Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555  
to the AS_PATH as it does so. Suppose 9327 uses a transit provider  
which builds prefix filters from the IRR, and the as9327 aut-num  
object is modified to include policy which suggests 9327 provides  
transit for 4555. Suppose this is not actually the case, though, and  
in fact 9327 is a rogue AS which is trying to capture 4555's traffic.


The rest of the world sees a prefix with an AS_PATH attribute which  
ends with 9327 4555.


In this case, from the point of view of those trying to discern  
legitimacy of advertisements, what is the origin of the prefix? Is it  
4555, or 9327?


Is it possible to tell, from just the right-most entry in the AS_PATH  
attribute?
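
To make the question concrete, a small Python sketch. Only 9327 and 4555 come from the hypothetical above; the other AS numbers are placeholders from the documentation range, and apparent_origin is a made-up helper name.

```python
def apparent_origin(as_path):
    """Return the right-most AS in an AS_PATH, i.e. the apparent origin."""
    return as_path[-1]


# What the rest of the world sees in the hypothetical: some upstream ASes
# (placeholders), then 9327, then the prepended 4555.
forged_path = [64496, 64497, 9327, 4555]

# A path on which 4555 really is the origin (upstream is a placeholder).
genuine_path = [64498, 4555]
```

Both paths yield 4555 as the apparent origin, so a check that looks only at the right-most entry cannot tell them apart.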



Joe

[note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)]



Re: So -- what did happen to Panix?

2006-01-27 Thread Stephen Sprunk


Thus spake [EMAIL PROTECTED]

seems to me that certified validation of prefix ownership and as
path are the only real way out of these problems that does not
teach us the 42 reasons we use a *dynamic* protocol.


Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?


We have such a database (used by Verio and others), but the Panix incident 
happened anyway due to bit rot.  We've got to find a way to fix the layer 8 
problems before we can make improvements at layer 3.


S

Stephen Sprunk    "Stupid people surround themselves with smart
CCIE #3723         people.  Smart people surround themselves with
K5SSS              smart people who disagree with them."  --Aaron Sorkin



Re: So -- what did happen to Panix?

2006-01-27 Thread Patrick W. Gilmore


On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote:


seems to me that certified validation of prefix ownership and as
path are the only real way out of these problems that does not
teach us the 42 reasons we use a *dynamic* protocol.


Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?


Maybe I missed something, but didn't Verio say the prefix was in  
their internal registry, and that's why it was accepted.


IOW: It didn't solve this problem.  So I guess we're discussing the  
other 5%?


--
TTFN,
patrick


Re: So -- what did happen to Panix?

2006-01-27 Thread Patrick W. Gilmore


On Jan 27, 2006, at 11:39 AM, Joe Abley wrote:


On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote:


but by definition, the right-most entry is the prefix origin...


Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends  
4555 to the AS_PATH as it does so. Suppose 9327 uses a transit  
provider which builds prefix filters from the IRR, and the as9327  
aut-num object is modified to include policy which suggests 9327  
provides transit for 4555. Suppose this is not actually the case,  
though, and in fact 9327 is a rogue AS which is trying to capture  
4555's traffic.


The rest of the world sees a prefix with an AS_PATH attribute which  
ends with 9327 4555.


In this case, from the point of view of those trying to discern  
legitimacy of advertisements, what is the origin of the prefix? Is  
it 4555, or 9327?


Is it possible to tell, from just the right-most entry in the  
AS_PATH attribute?


Suggested solutions do not have to solve every possible problem.

Knowing the correct origin will stop accidental announcements, like  
the one under discussion in this thread.


And, I suspect, most problems we see today of this sort.  We are not  
(yet) to the point where maliciously originated prefixes are as big a  
problem as accidentally originated prefixes.


--
TTFN,
patrick


Re: So -- what did happen to Panix?

2006-01-27 Thread bmanning

On Fri, Jan 27, 2006 at 11:39:27AM -0500, Joe Abley wrote:
 
 On 27-Jan-2006, at 11:12, [EMAIL PROTECTED] wrote:
 
  but by definition, the right-most entry is the prefix origin...
 
 Suppose AS 9327 decides to originate 198.32.6.0/24, but prepends 4555  
 to the AS_PATH as it does so. Suppose 9327 uses a transit provider  
 which builds prefix filters from the IRR, and the as9327 aut-num  
 object is modified to include policy which suggests 9327 provides  
 transit for 4555. Suppose this is not actually the case, though, and  
 in fact 9327 is a rogue AS which is trying to capture 4555's traffic.
 
 The rest of the world sees a prefix with an AS_PATH attribute which  
 ends with 9327 4555.
 
 In this case, from the point of view of those trying to discern  
 legitimacy of advertisements, what is the origin of the prefix? Is it  
 4555, or 9327?


from BGP's perspective, you tell me.  being the naive BGP
listen/speaker - i think that AS 4555 is the origin.

now... what does  Prefix 198.32.6.0/24 say is the correct
origin?  

 Is it possible to tell, from just the right-most entry in the AS_PATH  
 attribute?

nope - but you have jumped right into the path question.
(what does the as4555 aut-num object say about using 9327
as an upstream AS?)


 Joe
 
 [note: 9327 is not a rogue AS, in fact. This is just hypothetical :-)]

sez you :) (reminder to send Cingular the royalty check if you
receive the above two characters : and ) as listed above
AND you chose to infer mood or intent.)

I think -all- AS are run by rogues and pirates.

-- (headless) bill


Re: So -- what did happen to Panix?

2006-01-27 Thread Joe Abley



On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote:


On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote:


seems to me that certified validation of prefix ownership and as
path are the only real way out of these problems that does not
teach us the 42 reasons we use a *dynamic* protocol.


Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?


Maybe I missed something, but didn't Verio say the prefix was in  
their internal registry, and that's why it was accepted.


Perhaps by well-operated, Michael was referring to something like  
the hierarchical authentication scheme used by the RIPE database,  
which ultimately provides access control for route objects using RIR  
allocation/assignment data?



Joe


Re: So -- what did happen to Panix?

2006-01-27 Thread Patrick W. Gilmore


On Jan 27, 2006, at 12:57 PM, Joe Abley wrote:

On 27-Jan-2006, at 11:54, Patrick W. Gilmore wrote:

On Jan 27, 2006, at 8:29 AM, [EMAIL PROTECTED] wrote:


seems to me that certified validation of prefix ownership and as
path are the only real way out of these problems that does not
teach us the 42 reasons we use a *dynamic* protocol.


Wouldn't a well-operated network of IRRs used by 95% of
network operators be able to meet all three of your
requirements?


Maybe I missed something, but didn't Verio say the prefix was in  
their internal registry, and that's why it was accepted.


Perhaps by well-operated, Michael was referring to something like  
the hierarchical authentication scheme used by the RIPE database,  
which ultimately provides access control for route objects using  
RIR allocation/assignment data?


Yet it can still have stale data.

That said, if there were a centralized store for such information and  
you were in charge of your objects, then the only person to blame  
when your prefix was incorrectly accepted would be you.  (We're  
talking things like accidental origination here, not malicious  
attempts to go around safeguards.)


Put more concretely, Panix would have no one to blame but themselves  
if Verio accepted a prefix because it was properly registered in the DB.


This, IMHO, would be a Good Thing.

Not a panacea, but a Good Thing.  And would avoid some very long  
threads on NANOG (which is also a Good Thing :).


--
TTFN,
patrick


Re: So -- what did happen to Panix?

2006-01-27 Thread sandy

Todd Underwood wrote:

you're probably right (as usual).  but it seems that if you delay
acceptance of announcements with novel origination patterns, you don't
harm very many legitimate uses.  in particular, ASes changing
upstreams won't be harmed at all.  people moving their prefix to a new
ISP will have a fixed delay in getting their announcement propagated,
sure.  but they already have this delay now.  

they tell the new ISP:  'announce my prefix' and the new ISP says
'prove it's yours'.  they do that for a couple of emails.  then the
new ISP asks its upstreams to accept that announcement.  that takes a
little while (ranging from 4 to 72 hours in my recent experience).

This is great for the planned changes, but real-time changes to
respond to Internet dynamics won't work well with such delays.  If you
are multi-homed to provide a backup, you would like for it to respond
more quickly than 4-72 hours, I'll bet.  So if you have PI space but not
your own AS, your backup route would look like a novel origination,
but you sure wouldn't want it delayed.

How common are such cases?  Should the solutions cover them also?
Should there be special procedures to deal with special cases?
Etc.

--Sandy


Re: So -- what did happen to Panix?

2006-01-27 Thread sandy

Todd Underwood wrote:

 seems to me that certified validation of prefix ownership and as
 path are the only real way out of these problems that does not
 teach us the 42 reasons we use a *dynamic* protocol.

certified validation of prefix ownership (and path, as has been
pointed out) would be great.  it's clearly a laudable goal and seemed
like the right way to go.  but right now, no one is doing it.  the
rfcs that i've found have all expired.  and the conversation about
it has reached the point where people seem to have stopped even
disagreeing about how to do it.  in short, it's as dead as dns-sec.
so what are we to do in the meantime?

(a) I'd hardly say dead - there's the sidr work starting up in the
IETF with vendor/operator/registry participation.  And there was a
panel discussion at the last NANOG about government efforts to assemble
the right people (vendors/operators/registries/etc) to work on routing
infrastructure security - and prefix origination was one of the biggest
items on everyone's list of goals/hopes/longings/dreams.  
(Truth in advertising: I've been one of those involved in the gov't 
sponsored workshops.)

(b) dnssec isn't dead - there's serious work afoot to get it deployed.
Sweden and RIPE have signed their zones.  There are web sites
that point to work going on, if you'd like to know more:
   www.dnssec-deployment.org
   www.dnssec.net
(Truth in advertising: I work with people who are working on this.)

(z) I think you mean internet drafts, not rfcs.  I don't think
there have been any rfcs (would there were - we'd be in a different
situation), and rfcs don't expire.

--Sandy


Re: So -- what did happen to Panix?

2006-01-27 Thread sandy

Michael.Dillon wrote:

Writing RFCs is a fine way to document operational
best practices, but it is not a good way to work out
joint operational practices. 


Seems to me that operational problem solving works
better when the problem is not thrown into the laps
of the protocol designers.

If the solution turns out to be joint operational practice, then
operators need to be involved, natch.  If the solution turns out to be
protocols, then the protocol designers need to be involved along with
the operators.

I'm not so certain that operational practices will fix this problem -
it could be argued that the fundamental vulnerabilities in the way
routing info is communicated would be better fixed in the protocol.

--Sandy


Re: So -- what did happen to Panix?

2006-01-27 Thread Todd Underwood

 
 This is great for the planned changes, but real-time changes to
 respond to Internet dynamics won't work well with such delays.  If you
 are multi-homed to provide a backup, you would like for it to respond
 more quickly than 4-72 hours, I'll bet.  So if you have PI space but not
 your own AS, your backup route would look like a novel origination,
 but you sure wouldn't want it delayed.

no.

the scheme that josh karlin has been advocating in pretty good bgp
involved only suppressing a doubtful announcement when you have a
better, more trusted announcement.  it remains to be seen how hard
this would be to implement in existing systems that build filters into
configs and push them to routers.  this only works obviously well in
systems that centralize route selection and use routers only as
forwarding engines.  that might be a cool idea, but it's not what we
have now.

if you don't use the pgbgp scheme, you can still get the benefits of
being no worse than what we have now.  consider this just a different,
more automatic, more scalable, more secure way of building and
maintaining the prefix filters that we are all supposed to be maintaining
already.

i'll be happy to talk to interested parties at nanog in dallas about
this (or almost anything else, especially if you're buying).

t.

-- 
_
todd underwood
chief of operations & security 
renesys - internet intelligence
[EMAIL PROTECTED]   www.renesys.com


Re: So -- what did happen to Panix?

2006-01-26 Thread Daniel Golding


In terms of the larger question

ConEd Communications was recently acquired by RCN. I'm not sure if the
transaction has formally closed. I suspect there are serious transition
issues occurring. Financial Stability, Employee Churn, and Ownership
are, unfortunately, tough things to factor into BGP algorithms.

http://investor.rcn.com/ReleaseDetail.cfm?ReleaseID=181194

Internet access has always been a sideline for CEC - they are more of a
provider of transport, and their customers have included some very well
known entities in the NY metro area.

Perhaps someone from RCN would care to comment?

- Dan



Re: So -- what did happen to Panix?

2006-01-26 Thread Matt Buford


Daniel Golding [EMAIL PROTECTED] wrote:

ConEd Communications was recently acquired by RCN. I'm not sure if the
transaction has formally closed. I suspect there are serious transition
issues occurring. Financial Stability, Employee Churn, and Ownership
are, unfortunately, tough things to factor into BGP algorithms.


I have no idea if this is really related, but the issue was the same weekend 
that ConEd had major network maintenance going on.  My ConEd service was 
down (NYC area) for the entire weekend (about 60 hours) during their planned 
maintenance window to convert their network to MPLS.  I saw their 
maintenance notice and noticed that the window lasted multiple days.  I 
expected the link to go down - but I never imagined they meant it would stay 
down for the entire maintenance window.


So, I'm speculating that even if there weren't organization issues their 
engineers were probably very busy and distracted by the major technical 
changes going on. 



Re: So -- what did happen to Panix?

2006-01-26 Thread Todd Underwood

Steven, all,

On Wed, Jan 25, 2006 at 03:04:30PM -0500, Steven M. Bellovin wrote:
 
 It's now been 2.5 business days since Panix was taken out.  Do we know 
 what the root cause was?  It's hard to engineer a solution until we 
 know what the problem was.

I keep hearing that Con Ed Comm was previously an upstream of Panix
( http://www.renesys.com/blog/2006/01/coned_steals_the_net.shtml#comments )
and that this might have explained why Con Ed had Panix routes in
their radb as-27506-transit object.  But I checked our records
of routing data going back to jan 1, 2002, and see no evidence of
27506 and 2033 being adjacent to each other in any announcement from
any of our peers at any time since then.  So I can't really verify
that Panix was ever a Con Ed Comm customer.  Can anyone else clear
this up?  So far, it's not making sense.
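
The kind of check described above can be sketched in a few lines of Python. This is not the tooling actually used; the function names are made up, the input is assumed to be AS paths already parsed into lists of integers, and collapsing prepends is a choice made for the sketch.

```python
def ases_adjacent(as_path, a, b):
    """True if ASes a and b sit next to each other (either order) in as_path."""
    # Collapse prepending so that "X X Y" still counts X and Y as adjacent.
    collapsed = [asn for i, asn in enumerate(as_path)
                 if i == 0 or asn != as_path[i - 1]]
    return any({x, y} == {a, b} for x, y in zip(collapsed, collapsed[1:]))


def ever_adjacent(observed_paths, a, b):
    """True if any path in the archive shows a and b as neighbours."""
    return any(ases_adjacent(path, a, b) for path in observed_paths)
```

Running ever_adjacent over the archived paths with 27506 and 2033 would return False if, as reported above, the two never appear next to each other.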

The supposition was that all of the other affected ASes that are not
currently customers of Con Ed Comm were also previously customers.
Some appear to have been (Walrus Internet (AS7169), Advanced Digital
Internet (AS23011), and NYFIX (AS20282) for sure) but I haven't been
able to verify that all of them were.  

I know that this isn't really a root cause that Steven was asking
for, though.  The root cause is that filtering is imperfect and out of
date frequently. This case is particularly interesting and painful
because Verio is known for building good filters automatically.  In
this case, they did so based on out-of-date information,
unfortunately. This is particularly depressing because normally in
cases of leaks like this, the propagation is via some provider or peer
who doesn't filter at all.  In this case, one of the vectors was one
of the most responsible filterers on the net.  sigh. 

So in terms of engineering good solutions, the space is pretty
crowded.  One camp is of the total solution variety that involves
new hardware, new protocols, and a Public Key approach where
originations (or any announcements) are signed and verified.  This is
obviously a very good and complete approach to the problem but it's
also obviously seeing precious little adoption.  And in the meantime
we have nothing.

Another set of approaches has been to look at alternate methods of
building filters, taking into account more information about history
of routing announcements and dampening or refusing to accept novel,
questionable announcements for some fixed, short amount of time.  Josh
Karlin's paper suggests that as does some of the stuff that Tom
Scholl, Jim Deleskie and I presented at the last nanog. All of this
has the disadvantage of being a partial solution, the advantage of
being implementable easily and in stages without a network forklift or
a protocol upgrade, but the further disadvantage of being nowhere near
fully baked. 
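
To make the history-based idea concrete, here is a minimal Python sketch of
holding back a novel origin for a prefix for a fixed period. It is only an
illustration, not the method from Josh Karlin's paper or from the NANOG
presentation. The names OriginHistory and decide(), the one-day HOLD_SECONDS
value and the 192.0.2.0/24 documentation prefix are all assumptions; AS 2033
and AS 27506 are the two ASes discussed above.

```python
from dataclasses import dataclass, field

# Assumed hold period; the thread only says "some fixed, short amount of time".
HOLD_SECONDS = 24 * 60 * 60


@dataclass
class OriginHistory:
    """Remembers when each origin AS was first seen for each prefix."""

    first_seen: dict = field(default_factory=dict)  # prefix -> {origin_as: timestamp}

    def decide(self, prefix: str, origin_as: int, now: float) -> str:
        """Return "accept" or "hold" for an announcement observed at time `now`."""
        origins = self.first_seen.setdefault(prefix, {})
        origins.setdefault(origin_as, now)
        # An origin counts as established once it has been known for the full hold period.
        established = {asn for asn, ts in origins.items() if now - ts >= HOLD_SECONDS}
        if origin_as in established or not established:
            # Either a familiar origin, or nothing established to prefer instead.
            return "accept"
        return "hold"


history = OriginHistory()
print(history.decide("192.0.2.0/24", 2033, now=0))               # accept
print(history.decide("192.0.2.0/24", 27506, now=2 * 86400))      # hold
print(history.decide("192.0.2.0/24", 27506, now=3 * 86400 + 1))  # accept
```

The last call shows the partial nature of the approach: a bad origin that
nobody notices during the hold period is accepted once the period ends.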

Clearly more, smarter people need to keep searching for good solutions
to this set of problems.  Extra credit for solutions that can be
implemented by individual autonomous systems without hardware upgrades
or major protocol changes, but that may not be possible.

t.

p.s.:  wrt comments made previously that imply that moving parts of
routing control off of the routers is Bell-like or bell-headed:
although the comments are silly and made somewhat in jest, they're
obviously not true.  anyone who builds prefix filters or access lists
off of routers is already generating policy somewhere other than the
router.  using additional history or smarts to do that and uploading
prefix filters more often doesn't change that existing architecture or
make the network somehow bell-like.  it might not work well enough
to solve the problem, but that's another, interesting objection.


-- 
_
todd underwood
chief of operations & security
renesys - internet intelligence
[EMAIL PROTECTED]   http://www.renesys.com/blog


Re: So -- what did happen to Panix?

2006-01-26 Thread Jared Mauch

Disclaimer: I work for AS2914

On Thu, Jan 26, 2006 at 02:39:59PM -0500, Todd Underwood wrote:
 Another set of approaches has been to look at alternate methods of
 building filters, taking into account more information about history
 of routing announcements and dampening or refusing to accept novel,
 questionable announcements for some fixed, short amount of time.  Josh
 Karlin's paper suggests that as does some of the stuff that Tom
 Scholl, Jim Deleskie and I presented at the last nanog. All of this
 has the disadvantage of being a partial solution, the advantage of
 being implementable easily and in stages without a network forklift or
 a protocol upgrade, but the further disadvantage of being nowhere near
 fully baked. 
 
 Clearly more, smarter people need to keep searching for good solutions
 to this set of problems.  Extra credit for solutions that can be
 implemented by individual autonomous systems without hardware upgrades
 or major protocol changes, but that may not be possible.
 
 t.
 
 p.s.:  wrt comments made previously that imply that moving parts of
 routing control off of the routers is Bell-like or bell-headed:
 although the comments are silly and made somewhat in jest, they're
 obviously not true.  anyone who builds prefix filters or access lists
 off of routers is already generating policy somewhere other than the
 router.  using additional history or smarts to do that and uploading
 prefix filters more often doesn't change that existing architecture or
 make the network somehow bell-like.  it might not work well enough
 to solve the problem, but that's another, interesting objection.

This is something that (as i mentioned to you in private) some others
have thought of as well.  We at 2914 build the filters and such off the router
and load them to the router with sometimes quite large configurations.
(they have been ~8MB in the past)

I'd love to see some prefix stability data (eg: 129.250/16
has been announced by origin-as 2914 for X years/seconds/whatnot)
which can help score the data better.  Do we need an origin-as match
in our router policies?  does it exist already?  What about a way to
dampen/delay announcements that don't match the origin-as data
that exists?

I think a solution like this would help out a number of networks
that have these types of problems/challenges.  Obviously noticing an
origin change and alerting or similar on that would be nice and useful,
but would the noise be too much for a NOC display?
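
As a rough illustration of the "prefix stability" figure asked about above,
the Python sketch below computes how long an origin AS has been the
uninterrupted origin of a prefix. The function name origin_age() and the
input format (time-sorted tuples of timestamp, prefix, origin AS) are
assumptions made for the example, and AS 64500 is a documentation AS number
standing in for some other origin. No existing router feature is implied.

```python
def origin_age(observations, prefix, origin_as, now):
    """Seconds for which `origin_as` has been the unbroken origin of `prefix`.

    `observations` is an iterable of (timestamp, prefix, origin_as) tuples
    sorted by timestamp. Returns 0.0 if the most recent origin seen for the
    prefix is a different AS, or if the prefix was never seen.
    """
    run_start = None
    for ts, pfx, asn in observations:
        if pfx != prefix:
            continue
        if asn == origin_as:
            if run_start is None:
                run_start = ts  # start of a new unbroken run
        else:
            run_start = None    # another origin appeared; the run is broken
    return 0.0 if run_start is None else float(now - run_start)


obs = [
    (0, "129.250.0.0/16", 2914),
    (100, "129.250.0.0/16", 2914),
    (200, "129.250.0.0/16", 64500),
    (300, "129.250.0.0/16", 2914),
]
print(origin_age(obs, "129.250.0.0/16", 2914, now=1000))   # 700.0
print(origin_age(obs, "129.250.0.0/16", 64500, now=1000))  # 0.0
```

A score like this could feed either a policy decision or an alert threshold;
which of the two is more useful is exactly the open question here.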

- jared

ps. i'm glad our NOC/operations people were able to solve the PANIX
issue quickly for them.

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: So -- what did happen to Panix?

2006-01-26 Thread Josh Karlin

The noise of origin changes is fairly heavy, somewhere in the low
hundreds of alerts per day given a 3 day history window.  Supposing a
falsely originated route was delayed, what is the chance of identifying
and fixing it before the end of the delay period?  Do operators
commonly catch misconfigurations on their own or do they usually find
out about it from other operators due to service disruption?
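
One plausible reading of "alerts per day given a 3 day history window" can be
sketched in Python as follows. This is an assumption about the counting rule,
not a description of the actual measurement: an alert is counted when an
already-known prefix shows an origin AS that has not been seen for that prefix
within the window. The function name count_alerts() and the tuple format are
made up for the example.

```python
from collections import defaultdict

DAY = 86400  # seconds


def count_alerts(observations, window_days=3):
    """Count origin-change alerts per day.

    `observations` is an iterable of (timestamp, prefix, origin_as) tuples
    sorted by timestamp. Returns {day_index: alert_count}.
    """
    window = window_days * DAY
    last_seen = {}         # (prefix, origin_as) -> last timestamp seen
    known_prefixes = set()
    alerts = defaultdict(int)
    for ts, prefix, origin_as in observations:
        key = (prefix, origin_as)
        prev = last_seen.get(key)
        # The first sighting of a prefix is not an alert; an origin that is
        # new, or has aged out of the history window, is.
        if prefix in known_prefixes and (prev is None or ts - prev > window):
            alerts[int(ts // DAY)] += 1
        last_seen[key] = ts
        known_prefixes.add(prefix)
    return dict(alerts)


obs = [
    (0, "192.0.2.0/24", 64500),
    (1 * DAY, "192.0.2.0/24", 64500),
    (2 * DAY, "192.0.2.0/24", 64501),       # new origin: alert on day 2
    (2 * DAY + 10, "192.0.2.0/24", 64501),
    (7 * DAY, "192.0.2.0/24", 64500),       # aged out of the window: alert on day 7
]
print(count_alerts(obs))  # {2: 1, 7: 1}
```

A longer window forgets legitimate alternate origins more slowly, so it should
produce fewer repeat alerts for prefixes that move between a small set of
origins.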


Re: So -- what did happen to Panix?

2006-01-26 Thread Jared Mauch

On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote:
 The noise of origin changes is fairly heavy, somewhere in the low
 hundreds of alerts per day given a 3 day history window.  Supposing a
 falsely originated route was delayed, what is the chance of identifying
 and fixing it before the end of the delay period?  Do operators
 commonly catch misconfigurations on their own or do they usually find
 out about it from other operators due to service disruption?

Are the origin changes for a small set of the prefixes
that tend to repeat (eg: connexion as planes move), or is it a different
set of prefixes day-to-day or week-to-week?

I suspect there are the obvious prefixes that don't change
(eg: 12/8, 18/8, 35/8, 38/8)  but subparts of that may change, but
for most people with allocations in the range of 12-17 bits, I suspect
they won't change frequently.

- jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


Re: So -- what did happen to Panix?

2006-01-26 Thread Josh Karlin

I unfortunately don't have answers to those questions, but you've
piqued my interest so I will try to look into it within the next
couple of days.

Josh



On 1/26/06, Jared Mauch [EMAIL PROTECTED] wrote:
 On Thu, Jan 26, 2006 at 04:22:29PM -0700, Josh Karlin wrote:
  The noise of origin changes is fairly heavy, somewhere in the low
  hundreds of alerts per day given a 3 day history window.  Supposing a
  falsely originated route was delayed, what is the chance of identifying
  and fixing it before the end of the delay period?  Do operators
  commonly catch misconfigurations on their own or do they usually find
  out about it from other operators due to service disruption?

 Are the origin changes for a small set of the prefixes
 that tend to repeat (eg: connexion as planes move), or is it a different
 set of prefixes day-to-day or week-to-week?

 I suspect there are the obvious prefixes that don't change
 (eg: 12/8, 18/8, 35/8, 38/8)  but subparts of that may change, but
 for most people with allocations in the range of 12-17 bits, I suspect
 they won't change frequently.

 - jared

 --
 Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
 clue++;  | http://puck.nether.net/~jared/  My statements are only mine.



Re: So -- what did happen to Panix?

2006-01-26 Thread Randy Bush

jared,

i may have missed the answer to my question.  but, as verio was
the upstream, and verio is known to use the irr to filter, could
you tell us why that approach seemed not to suffice in this case?

randy



Re: So -- what did happen to Panix?

2006-01-26 Thread Jared Mauch

On Thu, Jan 26, 2006 at 05:41:10PM -0800, Randy Bush wrote:
 jared,
 
 i may have missed the answer to my question.  but, as verio was
 the upstream, and verio is known to use the irr to filter, could
 you tell us why that approach seemed not to suffice in this case?

Sure.  Going through the diffs and other records available to me, what
I saw is that the prefix was registered to be announced by our customer
and hence made it into our automatic IRR filters.  It was no longer
there by the time I personally looked things up in our registry, but I
saw diffs go through later in the day (night) removing that prefix
from the ACL.

Someone that has a snapshot of the various IRR data from 
those days can likely put this together better than I can explain.

- jared

-- 
Jared Mauch  | pgp key available via finger from [EMAIL PROTECTED]
clue++;  | http://puck.nether.net/~jared/  My statements are only mine.


So -- what did happen to Panix?

2006-01-25 Thread Steven M. Bellovin

It's now been 2.5 business days since Panix was taken out.  Do we know 
what the root cause was?  It's hard to engineer a solution until we 
know what the problem was.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb




Re: So -- what did happen to Panix?

2006-01-25 Thread william(at)elan.net



On Wed, 25 Jan 2006, Steven M. Bellovin wrote:


It's now been 2.5 business days since Panix was taken out.  Do we know
what the root cause was?  It's hard to engineer a solution until we
know what the problem was.


Is it really that hard to engineer this solution? We do have several of 
them proposed (SBGP, soBGP, etc) and new WG is likely to be formed soon
within IETF to finally work it out.

--
William Leibzon
Elan Networks
[EMAIL PROTECTED]


Re: So -- what did happen to Panix?

2006-01-25 Thread Pekka Savola


On Wed, 25 Jan 2006, william(at)elan.net wrote:

On Wed, 25 Jan 2006, Steven M. Bellovin wrote:

It's now been 2.5 business days since Panix was taken out.  Do we know
what the root cause was?  It's hard to engineer a solution until we
know what the problem was.


Is it really that hard to engineer this solution? We do have several of them 
proposed (SBGP, soBGP, etc) and new WG is likely to be formed soon
within IETF to finally work it out.


It'd be darn difficult to engineer a solution that would end up being 
deployed in any reasonable time if we don't know the requirements 
first.  Yes, there's a draft -- draft-ietf-rpsec-bgpsecrec-03.txt -- 
but it has been woefully lacking on the operator & deployment 
requirements.  More people should participate in the effort.


--
Pekka Savola                 You each name yourselves king, yet the
Netcore Oy                   kingdom bleeds.
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


Re: So -- what did happen to Panix?

2006-01-25 Thread Valdis . Kletnieks

On Thu, 26 Jan 2006 07:54:30 +0200, Pekka Savola said:
 It'd be darn difficult to engineer a solution that would end up being 
 deployed in any reasonable time if we don't know the requirements 
 first.

Fortunately, when we know the requirements and engineer a solution, deployment
is straightforward.  RFC2827, for example, has a stellar deployment record.

In other words - what is the business case for deploying this proposed
solution?  I may be able to get things deployed at $WORK by arguing that
it's The Right Thing To Do, but at most shops an ROI calculation needs
to be attached to get movement.




Re: So -- what did happen to Panix?

2006-01-25 Thread Pekka Savola


On Thu, 26 Jan 2006, [EMAIL PROTECTED] wrote:

In other words - what is the business case for deploying this proposed
solution?  I may be able to get things deployed at $WORK by arguing that
it's The Right Thing To Do, but at most shops an ROI calculation needs
to be attached to get movement


Exactly.  If $OTHER_FOLKS don't deploy it, cases like Panix may not 
really be avoided.


I think that's what folks proposing perfect -- but practically 
undeployable -- security solutions are missing.


--
Pekka Savola                 You each name yourselves king, yet the
Netcore Oy                   kingdom bleeds.
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


Re: So -- what did happen to Panix?

2006-01-25 Thread Steven M. Bellovin

In message [EMAIL PROTECTED], Pekka Savola writes:
On Thu, 26 Jan 2006, [EMAIL PROTECTED] wrote:
 In other words - what is the business case for deploying this proposed
 solution?  I may be able to get things deployed at $WORK by arguing that
 it's The Right Thing To Do, but at most shops an ROI calculation needs
 to be attached to get movement

Exactly.  If $OTHER_FOLKS don't deploy it, cases like Panix may not 
really be avoided.

I think that's what folks proposing perfect -- but practically 
undeployable -- security solutions are missing.


That is, of course, why I asked the question -- I'm trying to 
understand the actual failure modes and feasible fixes.  I agree that 
many of the solutions proposed thus far are hard to deploy; some 
colleagues and I are working on variants that we think are deployable.  
But we need data first.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb