Re: Hijacked IP space.

2003-11-03 Thread James Cowie


On Mon, Nov 03, 2003 at 09:17:38PM +, Andrew - Supernews wrote:
 
  chuck == chuck goolsbee [EMAIL PROTECTED] writes:
 
  chuck Of course I have no hard data, other than my client's phone
  chuck call about another phone call, so I can't query based on a
  chuck timestamp to see where this was being announced from. It
  chuck appears to have vanished, and has remained so according to my
  chuck casual glances here and there.
 
  chuck The netblock in question is:
 
  chuck 204.89.0.0/21
 
 No announcement for that block has been visible here at any time in
 the past couple of weeks (specifically, since Oct 13). We might have
 missed it if it was never announced for more than a few minutes at a
 time, but it's _much_ more likely that the block was never announced
 and was merely forged into headers of a spam.

Our system reports that neither that prefix, nor any of its 
more-specifics, has been seen in the global routing tables at 
any moment since January 1st, 2002.  [ http://www.renesys.com ] 
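
If anyone wants to reproduce that kind of check against the public route
archives rather than take our word for it, a rough Python sketch of the
idea follows (it assumes the pybgpstream bindings, an archive that actually
covers the window, and an illustrative collector name; it is not our
production tooling):

    # Sketch: scan archived RIB snapshots for 204.89.0.0/21 or any more-specific.
    # Assumes pybgpstream is installed and the collector archive covers the window.
    import pybgpstream

    stream = pybgpstream.BGPStream(
        from_time="2003-10-01 00:00:00",
        until_time="2003-11-03 00:00:00",
        collectors=["route-views2"],           # illustrative collector
        record_type="ribs",
        filter="prefix more 204.89.0.0/21",
    )

    seen = set()
    for elem in stream:
        path = elem.fields.get("as-path", "")
        origin = path.split()[-1] if path else "?"
        seen.add((elem.fields["prefix"], origin))

    if not seen:
        print("no trace of 204.89.0.0/21 or its more-specifics in this window")
    for prefix, origin in sorted(seen):
        print(prefix, "seen with origin AS", origin)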

--
James Cowie
Renesys Corporation
cowie at renesys.com



Re: BellSouth prefix deaggregation (was: as6198 aggregation event)

2003-10-13 Thread James Cowie


On Sun, Oct 12, 2003 at 01:18:59AM -0400, Terry Baranski wrote:
 More on this -
 
 Two of BellSouth's AS's (6197 & 6198) have combined to inject around
 1,000 deaggregated prefixes into the global routing tables over the last
 few weeks (in addition to their usual load of ~600+ for a total of
 ~1,600).   

Kudos to BellSouth for taking first steps to clean this up 
overnight -- about 1100 of the prefixes left the table between 
about 01:45 and 03:45 GMT.  Very nice deflation. 

Next topic: multiple origin ASNs .. 
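
For anyone who wants to hunt for those at home, the check is simple enough
to sketch in a few lines of Python; the input here is assumed to be a table
dump in 'bgpdump -m' pipe-delimited text form, and the file name is just a
placeholder:

    # Sketch: list multiple-origin-AS (MOAS) prefixes from a 'bgpdump -m' dump.
    # Assumed field layout: TYPE|time|B|peer_ip|peer_asn|prefix|as_path|...
    from collections import defaultdict

    origins = defaultdict(set)
    with open("rib.20031013.txt") as dump:       # placeholder file name
        for line in dump:
            parts = line.rstrip("\n").split("|")
            if len(parts) < 7 or not parts[6]:
                continue
            prefix, as_path = parts[5], parts[6].split()
            origin = as_path[-1]                 # last hop = origin AS
            if not origin.startswith("{"):       # skip AS sets
                origins[prefix].add(origin)

    for prefix, asns in sorted(origins.items()):
        if len(asns) > 1:
            print(prefix, "originated by", ", ".join(sorted(asns)))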

--
James Cowie
Renesys Corporation
[EMAIL PROTECTED]
603-464-5799 voice
603-464-6014 fax 



as6198 aggregation event

2003-10-05 Thread James Cowie


On Friday, we noted with some interest the appearance of more 
than six hundred deaggregated /24s in the global routing 
tables.  More unusually, they're still in there this morning.  

AS6198 (BellSouth Miami) seems to have been patiently injecting 
them over the course of several hours, between about 04:00 GMT 
and 08:00 GMT on Friday morning (3 Oct 2003).  

Here's a quick pic: 

   http://www.renesys.com/images/6198.gif

I can share more details offline.

Usually when we see deaggregations, they hit quickly and they 
disappear quickly; nice sharp vertical jumps in the table size. 
This event lasted for hours and, more importantly, the prefixes 
haven't come back out again, an unusual pattern for a single-origin
change that effectively expanded global tables by half a percent.   
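
(The detection itself is nothing exotic.  Given prefix/origin pairs pulled
from a table dump, a few lines of Python will flag /24s that sit under a
covering aggregate from the same origin; the input format and file name
below are illustrative.)

    # Sketch: count /24 more-specifics covered by a shorter prefix from the
    # same origin AS.  Input lines assumed to look like "192.0.2.0/24 6198"
    # (IPv4 only); the file name is a placeholder.
    import ipaddress
    from collections import defaultdict

    routes = defaultdict(set)                    # origin ASN -> set of networks
    with open("prefix-origin.txt") as fh:
        for line in fh:
            prefix, origin = line.split()
            routes[origin].add(ipaddress.ip_network(prefix))

    for origin, prefixes in routes.items():
        aggregates = [p for p in prefixes if p.prefixlen < 24]
        deagg = [p for p in prefixes
                 if p.prefixlen == 24
                 and any(p.subnet_of(agg) for agg in aggregates)]
        if deagg:
            print(f"AS{origin}: {len(deagg)} /24s covered by a shorter prefix")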

Can anyone share any operational insight into the likely causes of 
this event?  Has it caused any operational problems for anyone?  


Thanks! 

--
James Cowie
Renesys Corporation



Re: what happened to ARIN tonight ?

2003-09-29 Thread James Cowie


On Sun, Sep 28, 2003 at 10:26:00PM -0400, Robert Boyle wrote:
 
 At 10:07 PM 9/28/2003, you wrote:
 
 I am seeing the same. ARIN is completely off the air
 
 
 box02rsm-en01.twdx.net sh ip bgp 192.149.252.16
 % Network not in table
 
 I see them via a UUNet announcement through Veroxity and Sprint transit, 
 but I don't see it via any other peer or transit provider. Are they 
 multi-homed?

Single-homed /24 through UUNet's 7046 to 701. Withdrawals started at 01:21:38 GMT
(21:21:38 Eastern time), and ARIN flapped severely for about fifteen minutes.   

Then they spent another hour and ten minutes inconsistently reachable from half the 
world, with the picture mutating slowly.  701 seems to have been telling inconsistent 
stories about ARIN's reachability, depending on which of our peers you consulted  -- 
IGP instability?  By 03:10 GMT everyone seems to have slowly gotten a  
consistent picture again, with ARIN restored. 
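
A rough sketch of how one could pull that flap timeline out of the public
update archives (pybgpstream assumed, collector name illustrative, and
assuming the archive actually reaches back to the event):

    # Sketch: bin announcements ('A') and withdrawals ('W') for ARIN's /24
    # by minute, to see the flap window described above.
    from collections import Counter
    import time
    import pybgpstream

    stream = pybgpstream.BGPStream(
        from_time="2003-09-29 01:00:00",
        until_time="2003-09-29 03:30:00",
        collectors=["route-views2"],             # illustrative collector
        record_type="updates",
        filter="prefix exact 192.149.252.0/24",
    )

    per_minute = Counter()
    for elem in stream:
        minute = time.strftime("%H:%M", time.gmtime(elem.time))
        per_minute[(minute, elem.type)] += 1

    for (minute, kind), count in sorted(per_minute.items()):
        print(minute, kind, count)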

AS7046 advertises 320 prefixes, of which ARIN is but one.  Others (139.177.0.0/16, 
139.85.0.0/16, 192.136.136.0/24) had even worse problems in the same timeframe.

My guess would be the collision of two maintenance windows -- a simple problem on 
ARIN's end at the wrong moment, with global convergence severely prolonged by one of 
the UUNet BGP tummy-rumbles that we commonly hear in the middle of the night.  Some 
of the little UUNet ASNs with 701 upstream can be really noisy -- 816 was rocking 
all night long. 

--
James Cowie
Renesys Corporation
[EMAIL PROTECTED]


Re: On the back of other 'security' posts....

2003-08-30 Thread cowie


On Sat, Aug 30, 2003 at 08:17:39PM +1000, Matthew Sullivan wrote:
 
 Hi All,
 
 On the back of the latest round of security related posts, anyone notice 
 the 50% packet loss (as reported to me) across the USA - NZ links 
 around lunchtime (GMT+10) today?

Yep, easily .. we saw big routing problems for that /19 and a lot 
of innocent bystanders, in two waves, correlated across most sources.  

First signs of trouble at 23:12 GMT.  Then long periods of unreachability, 
from roughly 23:15-23:42 GMT and 01:26-02:10 GMT (the second one less 
well-correlated; routes were still available from some of our peers). 
Routes were fully restored globally by 02:12 GMT, presumably after the 
upstream rate limiting kicked in. 
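
The cross-source correlation is mostly bookkeeping: replay the updates in
time order and track how many peers still carry a route.  A minimal sketch
(pybgpstream assumed; the /19 below is a placeholder rather than the real
block, and the collector name is only an example):

    # Sketch: fraction of update-feed peers carrying a route to a prefix
    # over time.  Prefix, window, and collector are placeholders.
    import time
    import pybgpstream

    stream = pybgpstream.BGPStream(
        from_time="2003-08-30 23:00:00",
        until_time="2003-08-31 02:30:00",
        collectors=["route-views2"],
        record_type="updates",
        filter="prefix exact 198.18.0.0/19",     # placeholder /19
    )

    have_route, all_peers = set(), set()
    for elem in stream:
        peer = (elem.peer_asn, elem.peer_address)
        all_peers.add(peer)
        if elem.type == "A":
            have_route.add(peer)
        elif elem.type == "W":
            have_route.discard(peer)
        stamp = time.strftime("%H:%M:%S", time.gmtime(elem.time))
        print(stamp, f"{len(have_route)}/{len(all_peers)} peers have a route")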

If anyone has mrtg plots for the affected links, I'd sure like to see 'em. 


--
James Cowie
Renesys Corporation
[EMAIL PROTECTED]


Re: Optigate

2003-08-25 Thread cowie


 On Mon, Aug 25, 2003 at 09:17:49AM -0700, Mark V. A. Bown wrote:
  
  This morning at 4:30am pst the 200 Paul St. facility in San Francisco went
  dark. The power died and took our 200 Paul St. Router offline. The power
  was restored at 8:00am pst. 
 
 I think you mean PDT. Also, while DC went out at 4:30AM and was restored 
 at 8:00AM, AC power went out at around 3:00AM and wasn't restored until 
 8:30AM.

Both 65.60.0.0/18 and 64.200.195.0/24 were out from 03:09:19 PDT;
they were restored starting at 8:12:27 PDT, converged globally by  
08:14:15 PDT.   

--
James Cowie
Renesys Corporation
[EMAIL PROTECTED]


Re: BGP route tracking.

2003-08-19 Thread cowie


  Anybody watching the bgp routing table.. I see about 5,000 less routes than
  usual.   Anybody know a good pointer..
 
 Okay, here are a couple quick screenshots of what we're looking at 
 tonight.  [..]

We've collected some more plots and maps describing BGP outage 
patterns during last week's blackout.   Feedback welcome.  

http://www.renesys.com/news

--
James Cowie
Renesys Corporation


Re: BGP route tracking.

2003-08-15 Thread cowie


Some updated images of routing table size and 7-day prefix withdrawals: 

http://gradus.renesys.com/aug2003/blackout3-rtsize.gif 

http://gradus.renesys.com/aug2003/aug7-aug15-withdrawals.gif

(Blackout is event #3 on the right.)

We're about halfway back to the table sizes we started with yesterday 
before the grid tripped. If it trends the same way through the 
afternoon, it could be back to its old self sometime around midnight 
GMT; previous experience suggests that it might stabilize somewhat 
higher than before the event.  
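
(The extrapolation is strictly back-of-the-envelope; with made-up numbers
standing in for the real counts, it amounts to something like this:)

    # Sketch of the recovery extrapolation.  Every number here is a
    # hypothetical placeholder, not a measured value.
    baseline = 123_000        # table size before the grid tripped
    trough   = 118_000        # size at the bottom of the event
    now      = 120_500        # size this afternoon
    rate     = 400            # routes recovered per hour, observed so far

    recovered = (now - trough) / (baseline - trough)
    hours_left = (baseline - now) / rate
    print(f"{recovered:.0%} of the lost routes are back; "
          f"~{hours_left:.1f} hours to go at the current rate")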

At any rate, steady improvement as power returns across the East.

--
James Cowie
Renesys Corporation
[EMAIL PROTECTED]





 Okay, here are a couple quick screenshots of what we're looking at 
 tonight.
 
 First, a plot that shows the routing table size shrinkage since the 
 onset of the blackout at 16:13:07 +/- EDT, across a group 
 of routers. 
 
  http://gradus.renesys.com/aug2003/blackout1-rtsize.gif 
 
 Second, a wider-angle 3D plot of prefix withdrawal rates over the 
 last week, reported by various peers (one line per peer). 
 
  http://gradus.renesys.com/aug2003/aug7-aug14-withdrawals.gif
 
 The blackout is the big event at the right hand edge (#3). 
 
 Note the sustained high rates of route withdrawal that have been 
 the norm since the onset of MSBlast.  Unlike typical single-cause 
 events (like the one marked #1), MSBlast scanning has caused 
 prefix withdrawal rates to gently lift off into a noisier mode 
 across the board, lasting for days (so far).
 
  (im just getting bored waiting for us to run out of fuel)
 
 Ouch .. 
 
 
 --
 James Cowie
 Renesys Corporation
 [EMAIL PROTECTED]
 



Re: Level3 routing issues?

2003-01-28 Thread cowie


  Wow, for a minute I thought I was looking at one of our old
  plots, except for the fact that the x-axis says January 2003
  and not September 2001 :) :)
 
 seeing that the etiology and effects of the two events were quite
 different, perhaps eyeglasses which make them look the same are
 not as useful as we might wish?
 
 randy

If you've been watching, you might agree that the interesting thing is 
not that it looked like that in September 2001,  but that we really
haven't seen a signal that looks like that SINCE September 2001.  

The large differences between the worms are exactly what should make us  
doubly interested in fingering the common mechanism that connects very 
high-speed, high-diversity worm scanning to increased BGP activity.

So far it's been visible as an apparently accidental byproduct of an attack
with other goals.  Are you willing to bet your bifocals that the same 
mechanism can't be weaponized and used against the routing infrastructure 
directly in the future?

--
James Cowie
Renesys Corporation
http://gradus.renesys.com





Re: Level3 routing issues?

2003-01-28 Thread cowie

  So far it's been visible as an apparently accidental byproduct of an attack
  with other goals.  Are you willing to bet your bifocals that the same
  mechanism can't be weaponized and used against the routing infrastructure
  directly in the future?
 
 
 Yet the question becomes the reasoning behind it. How much is a direct
 result of the worm and how much is a result of actions based on the NE's?

Good question.  Null routing of traffic destined to a network with a BGP
interface on it will cause the session to drop. That is a BGP effect due
to engineers' actions, indirectly triggered by the worm.  

On the other hand, we also know (from private communications and from
other mailing lists.. ahem) that high-rate, high src/dst-diversity
scanning causes some network devices to fail (devices that cache flows, or
devices that suffer from CPU overload under such conditions). 

Some BGP-speaking routers (not all, by any means, but some subpopulation)
found themselves pegged at 100% CPU on Saturday.  Just one example: 

   http://noc.ilan.net.il/stats/ILAN-CPU/new-gp-cpu.html

Whether you believe anthropogenic explanations for the instability 
depends on how fast you believe NEs can look, think, and type, compared
to the speed with which the BGP announcement and withdrawal rates are 
observed to take off.  For my part, I'd bet that the long slow exponential 
decay (with superimposed spiky noise) is people at work.  But the initial 
blast is not.
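
If anyone wants to try that decomposition on their own update logs, a
minimal curve fit is enough to separate the two pieces; this sketch assumes
numpy/scipy and a per-minute withdrawal-count series in a placeholder text
file:

    # Sketch: fit 'noise floor + exponentially decaying transient' to a
    # per-minute series of BGP withdrawal counts.
    import numpy as np
    from scipy.optimize import curve_fit

    counts = np.loadtxt("withdrawals-per-minute.txt")   # placeholder input
    minutes = np.arange(len(counts), dtype=float)

    def model(t, floor, amplitude, tau):
        return floor + amplitude * np.exp(-t / tau)

    (floor, amplitude, tau), _ = curve_fit(
        model, minutes, counts, p0=(counts.min(), counts.max(), 60.0))
    print(f"noise floor ~{floor:.0f}/min, initial transient ~{amplitude:.0f}/min,"
          f" decay time constant ~{tau:.0f} minutes")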

--
James Cowie
Renesys Corporation
http://gradus.renesys.com



Re: Level3 routing issues?

2003-01-27 Thread cowie



  here's a plot showing the impact on BGP routing tables from seven ISPs 
  (plotted using route-views data): 
  http://www.research.att.com/~griffin/bgp_monitor/sql_worm.html
 
 And as an interesting counterpoint to this, this graph shows
 the number of BGP routing updates received at MIT before, during,
 and after the worm (3 day window).  Tim's plots showed that the
 number of actual routes at the routers he watched was down
 significantly - these plots show that the actual BGP traffic
 was up quite a bit.  Probably the withdrawals that were taking
 routes away from routeviews...
 
 http://nms.lcs.mit.edu/~dga/sqlworm.html
 
   -Dave

Wow, for a minute I thought I was looking at one of our old plots, 
except for the fact that the x-axis says January 2003 and not 
September 2001  :) :)  

Your plot is consistent with what we saw on Saturday as well.  Looks 
much like a little Nimda. 

Blast from the past:

http://www.renesys.com/projects/bgp_instability

--jim

--
James Cowie
Renesys Corporation
http://gradus.renesys.com




Re: MIA: oregon-ix.net

2002-11-20 Thread cowie



 As some of you have noticed, the BGP4 route containing the address for
 route-views.oregon-ix.net disappeared a while ago (mid-October?).
 Their website seems to be gone, and I swear, I couldn't resolve
 the domain for a little while just now. Has the Oregon IX been shut down?

As others have noted, they just had DNS problems.  Their routes appear to 
be live.  In fact, the stability of 198.32.162.0/24 is pretty good, by 
and large.  

They did have one global outage of about an hour and a half 
on October 1st, starting at 12:03 GMT.   Also, back on September 
13th, between 12:32 and 13:51 GMT they were (accidentally or deliberately) 
being originated by 15919 (Interhost),  creating a brief blackhole 
situation.  They're otherwise usually advertised by 3701, although you'll 
also see Verio originating them depending on where you look. 
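
That kind of origin change is easy to spot after the fact; here is a rough
sketch that walks a 'bgpdump -m' formatted update file and reports whenever
a peer's view of the origin for 198.32.162.0/24 changes (the file name and
field-layout assumption are mine):

    # Sketch: report origin-AS changes for 198.32.162.0/24 in 'bgpdump -m'
    # update output (BGP4MP|time|A|peer_ip|peer_asn|prefix|as_path|...).
    last_origin = {}
    with open("updates.20020913.txt") as fh:     # placeholder file name
        for line in fh:
            parts = line.rstrip("\n").split("|")
            if (len(parts) < 7 or parts[2] != "A"
                    or parts[5] != "198.32.162.0/24" or not parts[6]):
                continue
            ts, peer, origin = parts[1], parts[3], parts[6].split()[-1]
            if last_origin.get(peer) != origin:
                print(f"t={ts} peer {peer}: origin now AS{origin}")
                last_origin[peer] = origin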

 Their route-server was probably the best-connected one, with the most
 views, of any public route server I am aware of (please prove me wrong,
 but do not torment me with any web-based looking glasses :) .

Yeah, for real forensics, neither looking glasses nor public route 
servers are ideal solutions.   The former have single-site myopia and 
the latter have no good tools.   That's why we built our own 
infrastructure (http://gradus.renesys.com).   

 Nothing like having to poke around 10 other RS's to establish that
 rogue AS 26212 really only has 1, 6402 and 2914 as their upstreams.

Also 2516, 3257, 4513, 6730, and 6939, just in the last few weeks.  --jim 
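
p.s. The quick-and-dirty way to pull that kind of list out of a pile of
observed AS paths is just to look at the ASN adjacent to the origin; a
sketch, assuming one space-separated path per line in a placeholder file:

    # Sketch: infer apparent upstreams of an origin AS from observed AS paths.
    TARGET = "26212"

    upstreams = set()
    with open("as-paths.txt") as fh:             # placeholder path dump
        for line in fh:
            path = line.split()
            # collapse prepending so adjacency is meaningful
            dedup = [a for i, a in enumerate(path) if i == 0 or a != path[i - 1]]
            if len(dedup) >= 2 and dedup[-1] == TARGET:
                upstreams.add(dedup[-2])

    print("apparent upstreams of AS" + TARGET + ":", ", ".join(sorted(upstreams)))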




Re: Do ATM-based Exchange Points make sense anymore?

2002-08-10 Thread cowie



Mike Hughes wrote:
 With the shorter timers or fast-external-fallover, a very short
 maintenance slot at a large exchange can cause ripples in the routing
 table. It would be interesting to do some analysis of this - how far the
 ripples spread from each exchange!

We do BGP instability research, and this is something I'd like to 
examine further.  Compared to other sources of BGP noise, I don't 
think it's a primary driver for the instability we monitor each 
day, but I'd like to quantify it.  
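
The measurement itself would be straightforward: bin update counts per
collector in a window around the known maintenance slot and look for the
bump.  A sketch (pybgpstream assumed; the timestamp and collector names are
placeholders):

    # Sketch: per-collector update counts, binned by minute, around a known
    # maintenance window.  Window and collectors are placeholders.
    from collections import Counter
    import time
    import pybgpstream

    stream = pybgpstream.BGPStream(
        from_time="2002-08-10 01:30:00",
        until_time="2002-08-10 03:30:00",
        collectors=["route-views2", "rrc00"],
        record_type="updates",
    )

    bins = Counter()
    for elem in stream:
        minute = time.strftime("%H:%M", time.gmtime(elem.time))
        bins[(elem.collector, minute)] += 1

    for (collector, minute), count in sorted(bins.items()):
        print(collector, minute, count)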

If people were willing to give us a heads-up after the fact when 
there were  .. um .. maintenance events at the major exchanges, 
we could then go back and look at global propagation of the ripples 
on fine timescales.  --jim

p.s. The more eyes we have, the more we see, and we are always looking 
for more silent peers, especially small and midsize providers or their
multihomed customers.  See http://renesys.com/cgi-bin/bgpfeed to 
sign up to send us a one-way multihop EBGP feed.  It's quick, painless, 
and you will be helping unravel the mysteries of why global routing works 
so well in spite of us all. 




Re: effects of NYC power outage

2002-07-24 Thread cowie



[NANOG has been bouncing my attempts to reply to this thread 
 for several days, possibly because I quoted the word u n st a b l e 
 early on, apparently triggering the un subs cribe filter for words 
 that start with uns and contain a b..   If your posts to NANOG have 
 been silently bounced in the past, and your network's operational 
 issues lead you to start your posts with words like uns tab le or 
 uns ol vable or uns uit able or u nsp eak able,  wonder no more. ] 
 

At any rate, about two days ago Senthil wrote:  

 BGP was more [un st a ble] during code red
 propagation (http://www.renesys.com/projects/bgp_instability/).
 
 A quick peek into both the graphs will make one
 thing clear: *BGP is robust enough to withstand any
 extreme congestion.*

Anyone interested in this might also like to look at our report 
titled "Internet Routing Behavior on 9/11 and in the Following Weeks." 

 http://www.renesys.com/projects/reports/renesys-030502-NRC-911.pdf

Note in particular the minute-by-minute changes in routing table size 
around critical events on pages 9 through 11.   Fine time granularity 
is important to avoid missing all the interesting features.  
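
Reconstructing that kind of minute-by-minute curve from public data is
mostly bookkeeping: seed a per-peer table from a RIB snapshot, then replay
the updates.  A rough sketch (pybgpstream assumed; the peer address,
collector, and window are placeholders):

    # Sketch: routing-table size at one-minute granularity for a single peer,
    # seeded from a RIB snapshot and updated by replaying announcements and
    # withdrawals.  Peer, collector, and window are placeholders.
    import time
    import pybgpstream

    PEER = "192.0.2.1"                           # placeholder peer address

    def minute(ts):
        return time.strftime("%Y-%m-%d %H:%M", time.gmtime(ts))

    table = set()
    ribs = pybgpstream.BGPStream(
        from_time="2001-09-11 12:00:00", until_time="2001-09-11 14:00:00",
        collectors=["route-views2"], record_type="ribs")
    for elem in ribs:
        if elem.peer_address == PEER:
            table.add(elem.fields["prefix"])

    updates = pybgpstream.BGPStream(
        from_time="2001-09-11 12:00:00", until_time="2001-09-11 18:00:00",
        collectors=["route-views2"], record_type="updates")
    current = None
    for elem in updates:
        if elem.peer_address != PEER:
            continue
        if minute(elem.time) != current:
            current = minute(elem.time)
            print(current, len(table))
        if elem.type == "A":
            table.add(elem.fields["prefix"])
        elif elem.type == "W":
            table.discard(elem.fields["prefix"])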

--jim