Re: US .mil blocking in Japan

2011-03-16 Thread Jeff Aitken
On Tue, Mar 15, 2011 at 09:49:56PM -0600, ryanL wrote:
 should i be surprised that this hasn't been discussed much? anyone care to
 elaborate and/or expand on the real telecom damage done in japan?

What's to be surprised about?  The US military is temporarily blocking 
access to certain high-traffic web sites on its networks.  This obviously
affects only those users on DoD networks.

What damage are you referring to?


--Jeff




Re: US .mil blocking in Japan

2011-03-16 Thread Jeff Aitken
On Wed, Mar 16, 2011 at 09:14:13AM -0700, andrew.wallace wrote:
 This isn't the rhetoric of a super power, more like one of a university
 campus. [...] It strikes me straight away as amateurish to be blocking
 web sites in order to have enough bandwidth for operational purposes.

On the contrary, it's entirely plausible that US forces assisting with the
recovery are (1) using more communications resources than normal, and (2)
relying on infrastructure that's operating in a degraded state due to
fiber or power issues.  If so, it's entirely reasonable to put limits on
bandwidth-hungry but non-essential applications as a precautionary measure.

Here's an excerpt from 
http://www.nextgov.com/nextgov/ng_20110314_9111.php?oref=topnews:

Military units operating in Japan face bandwidth shortages and
network limitations that inhibit communications and command and
control, Defense sources told Nextgov. Misawa Air Base, located on the
northeast tip of Honshu, warned its personnel on a blog post Friday
that the Defense Switched Network, which handles voice calls, was in
backup mode and had only limited capacity, a fact confirmed by a
Pentagon source Monday.

The blog post added, "We have a number of connectivity issues.
Internet has been up and down due to our connections through other
places in Japan. For example, Yokota [Air Base] and several other
locations are having issues because we all have power and connectivity
issues right now."

The Pentagon also took the extraordinary step of blocking access to a
range of commercial websites to ensure that its networks have enough
bandwidth to support mission-essential communications, Nextgov
learned. This move, a military source told Nextgov, possibly indicates
one or more undersea cables used by military networks were damaged by
the earthquake.


--Jeff




Re: Router only speaks IGP in BGP network

2011-01-03 Thread Jeff Aitken
On Sat, Dec 25, 2010 at 08:52:42AM -0500, ML wrote:
 If you're only redistributing 10 prefixes into OSPF? Problem?

I know I'm a little late to this thread, but figured I'd point out one
reason why this can be very dangerous:

In IOS, you use a route-map to control redistribution between protocols.
For example, if you want to redist just those BGP prefixes tagged with a
specific community into OSPF, you will probably configure something that
looks like this:

route-map bgp-to-ospf permit 10
 match community $COMMUNITY
!
route-map bgp-to-ospf deny 20
!
router ospf $PID
 redistribute bgp $ASN subnets route-map bgp-to-ospf


Now, consider the following failure scenarios:

1. Someone typo's a BGP config elsewhere in your network and attaches
$COMMUNITY to a whole bunch more routes... say, all 350k being sent by your
upstream provider.  *oops*

2. An engineer thinks that there's something wrong with the redistribution
and decides to temporarily disable it as part of the troubleshooting
process.  He types the following:

conf t
router ospf $PID
no redistribute bgp $ASN subnets route-map bgp-to-ospf

*boom*

He just dumped all BGP routes into OSPF, due to the way IOS parses the
command: it removes the route-map but leaves the redistribution intact. 
To be fair, Cisco does provide you with tools to mitigate this risk (see
the "redistribute maximum-prefix" command) but the point is that this is
a fairly easy mistake to make.
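
One way to catch the second scenario before it bites is to audit saved configs for it. The following is only a rough sketch in Python, not anything IOS provides: it does a purely textual scan of a config for "router ospf" stanzas that redistribute BGP without a route-map, or without a "redistribute maximum-prefix" line as a backstop. The function name, the process ID 1, the ASN 65000 and the limit of 100 are made up for illustration.

```python
import re

def audit_ospf_redistribution(config):
    """Return warnings for 'router ospf' stanzas that redistribute BGP unsafely."""
    warnings = []

    def check(header, body):
        redist = [l for l in body if re.match(r"redistribute bgp\b", l)]
        if not redist:
            return
        # A redistribute line with no route-map leaks every BGP route.
        if any("route-map" not in l.split() for l in redist):
            warnings.append(f"{header}: BGP redistributed without a route-map")
        # No prefix limit means nothing caps the damage from a bad match.
        if not any(l.startswith("redistribute maximum-prefix") for l in body):
            warnings.append(f"{header}: no redistribute maximum-prefix guard")

    stanza, body = None, []
    for raw in config.splitlines():
        if raw and not raw[0].isspace():
            # An unindented line closes the current stanza and may open a new one.
            if stanza is not None:
                check(stanza, body)
            stanza = raw.strip() if raw.startswith("router ospf") else None
            body = []
        elif stanza is not None:
            body.append(raw.strip())
    if stanza is not None:
        check(stanza, body)
    return warnings

if __name__ == "__main__":
    # State of the config after the "no redistribute ..." mistake described above.
    after_mistake = "router ospf 1\n redistribute bgp 65000 subnets\n!\n"
    for warning in audit_ospf_redistribution(after_mistake):
        print(warning)
```

A check like this only sees what is in the saved config, so it complements the prefix limit rather than replacing it.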

At the end of the day, the reason that many folks advise against the
redistribution of BGP into an IGP is that it sets the stage for a seemingly
insignificant mistake to cause a not-so-insignificant outage.


--Jeff




Re: Introducing draft-denog-v6ops-addresspartnaming

2010-11-22 Thread Jeff Aitken
[ Meant to send this to the list and not directly to Richard. ]

On Fri, Nov 19, 2010 at 03:07:40AM +0100, Richard Hartmann wrote:
 If any of you have any additional suggestions, you are more than 
 welcome to share them.

I heard "hexquad" somewhere a while back and have been using it since...
looking over the other options present in your poll, I think I still
prefer it, but I could live with either "hextet" or simply "quad" as well.


--Jeff




Re: RIP Justification

2010-10-04 Thread Jeff Aitken
On Fri, Oct 01, 2010 at 04:28:30PM +, Tim Franklin wrote:
 Leaf-node BGP config is utterly trivial [...]
 
 The Enterprise guys really need to get out of the blanket BGP is scary 
 mindset

It's not just enterprise mindset.  Over the years I've seen a lot of
deployed gear that either didn't support BGP at all or for which it was a
significant extra cost.  At least in the past this applied to many
firewalls and load-balancers, and until recently, even one of the major
CMTS vendors didn't support BGP.

I agree that edge-node BGP is simple, but finding gear that supports it
isn't necessarily so.


--Jeff




Re: Time Warner/Road Runner issues in the Mid West

2009-10-09 Thread Jeff Aitken
On Fri, Oct 09, 2009 at 07:30:19AM -0700, Mike Maberry wrote:
 Is anyone else seeing connectivity issues to the internet using Time
 Warner/Road Runner in the Mid West? Kansas City and Wisconsin seem to be
 unable to access sites on the west coast...

Mike,

There is an ongoing issue that our ops folks are currently troubleshooting.
I don't have any details at this time, but if you've got a traceroute or
other details on the specific issue that you're seeing, feel free to
forward to me directly and I'll make sure it gets to the right parties
here.

Thanks,


--Jeff




Re: Data Center testing

2009-08-26 Thread Jeff Aitken
On Tue, Aug 25, 2009 at 10:45:07PM -0500, Frank Bulk - iName.com wrote:
 There's more to data integrity in a data center (well, anything powered,
 that is) than network configurations.  

Understood and agreed.  My point was that induced failure testing isn't
the right way to catch incorrect or unauthorized config changes, which is
what I understood the original poster to have said was his problem.  My
apologies if I misunderstood what he was asking.


 So while your analogy emphasizes the importance of having good processes in
 place to catch the problems up front, it doesn't eliminate throwing the
 switch.

Yup, and it's precisely why I suggested using planned maintenance events
as one way of doing at least limited failure testing.  


--Jeff




Re: Data Center testing

2009-08-25 Thread Jeff Aitken
On Mon, Aug 24, 2009 at 09:38:38AM -0400, Dan Snyder wrote:
 We have done power tests before and had no problem.  I guess I am looking
 for someone who does testing of the network equipment outside of just power
 tests.  We had an outage due to a configuration mistake that became apparent
 when a switch failed.  It didn't cause a problem however when we did a power
 test for the whole data center.

Dan,

With all due respect, if there are config changes being made to your 
devices that aren't authorized or in accordance with your standards (you
*do* have config standards, right?) then you don't have a testing problem,
you have a data integrity problem.  Periodically inducing failures to catch
them is sorta like using your smoke detector as an oven timer.

There are several tools that can help in this area; a good free one is
rancid [1], which logs in to your routers and collects copies of configs
and other info, all of which gets stored in a central repository.  By
default, you will be notified via email of any changes.  An even better
approach than scanning the hourly config diff emails is to develop scripts
that compare the *actual* state of the network with the *desired* state and
alert you if the two are not in sync.  Obviously this is more work because
you have to have some way of describing the desired state of the network in
machine-parsable format, but the benefit is that you know in pseudo-realtime
when something is wrong, as opposed to finding out the next time a device
fails.  Rancid diffs + tacacs logs will tell you who made the changes, and
with that info you can get at the root of the problem.
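
As a very rough illustration of the actual-vs-desired comparison, here is a minimal Python sketch. It assumes you already have both configs as text (for example, a template rendered from your standards and the copy most recently collected from the device); how you obtain them is up to you, and the function name is hypothetical. It uses only the standard difflib module.

```python
import difflib

def config_drift(desired, actual):
    """Return unified-diff lines showing how 'actual' departs from 'desired'."""
    def normalize(text):
        # Drop blank lines, '!' separators and trailing whitespace so that
        # purely cosmetic differences do not raise alerts.
        return [line.rstrip() for line in text.splitlines()
                if line.strip() and line.strip() != "!"]
    return list(difflib.unified_diff(normalize(desired), normalize(actual),
                                     fromfile="desired", tofile="actual",
                                     lineterm=""))

if __name__ == "__main__":
    desired = ("router ospf 1\n"
               " redistribute bgp 65000 subnets route-map bgp-to-ospf\n"
               "!\n")
    actual = ("router ospf 1\n"
              " redistribute bgp 65000 subnets\n")
    drift = config_drift(desired, actual)
    if drift:
        # In practice this is where you would page or email someone.
        print("\n".join(drift))
```

An empty result means the two are in sync; anything else is worth an alert. A line-by-line diff is the crudest possible definition of "desired state", but it is enough to show the idea.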

Having said that, every planned maintenance activity is an opportunity to
run through at least some failure cases.  If one of your providers is going
to take down a longhaul circuit, you can observe how traffic re-routes and
verify that your metrics and/or TE are doing what you expect.  Any time you
need to load new code on a device you can test that things fail over
appropriately.  Of course, you have to be willing to just shut the device
down without draining it first, but that's between you and your customers.
Link and/or device failures will generate routing events that could be used
to test convergence times across your network, etc.

The key is to be prepared.  The more instrumentation you have in place
prior to the test, the better you will be able to analyze the impact of the
failure.  An experienced operator can often tell right away when looking at
a bunch of MRTG graphs that something doesn't look right, but that doesn't
tell you *what* is wrong.  There are tools (free and commercial) that can
help here, too.  Have a central syslog server and some kind of log reduction
tool in place.  Have beacons/probes deployed, in both the control and data
planes.  If you want to record, analyze, and even replay routing system
events, you might want to take a look at the Route Explorer product from
Packet Design [2].

You said "switch failure" above, so I'm guessing that this doesn't apply
to you, but there are also good network simulation packages out there.
Cariden [3] and WANDL [4] can build models of your network based on actual
router configs and let you simulate the impact of various scenarios,
including device/link failures.  However, these tools are more appropriate
for design and planning than for catching configuration mistakes, so
they may not be what you're looking for in this case.


--Jeff


[1] http://www.shrubbery.net/rancid/
[2] http://www.packetdesign.com/products/rex.htm
[3] http://www.cariden.com/
[4] http://www.wandl.com/html/index.php




Re: ISP BGP Resources

2009-07-10 Thread Jeff Aitken
On Fri, Jul 10, 2009 at 08:17:43AM -0400, Babak Pasdar wrote:
 Are there any resources (books, web sites, mailing lists, etc..) that
 anyone can recommend? 

Richard Steenbergen did a nice preso on this subject a couple years ago:

http://www.nanog.org/meetings/nanog40/presentations/BGPcommunities.pdf


--Jeff




Re: Sprint v. Cogent, some clarity facts

2008-11-03 Thread Jeff Aitken
On Mon, Nov 03, 2008 at 04:34:16PM -0200, Nicolas Antoniello wrote:
 Sorry for my possible ignorance, but could you explain me what are you
 calling transit-free?

Transit-free means that you don't pay anyone else to reach some 3rd-party
network.  In other words, if I'm Sprint, I don't pay UUNET to get to X.
Either X connects directly with me or X pays someone else to get to me.
If I can make that claim for all values of X, then I am transit-free.
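
If it helps to see the "for all values of X" test spelled out, here is a toy Python sketch. It assumes a deliberately simplified world in which every relationship is either customer or settlement-free peer, and it ignores routing policy entirely. The network names (A through F) and function names are made up.

```python
def customer_cone(net, customers):
    """All networks reachable by following customer links downward from net."""
    seen, stack = set(), [net]
    while stack:
        for c in customers.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_transit_free(net, customers, peers, all_nets):
    """True if net reaches every other network through its own customers,
    its peers, or its peers' customers, i.e. without paying for transit."""
    reachable = customer_cone(net, customers)
    for p in peers.get(net, ()):
        reachable |= {p} | customer_cone(p, customers)
    return all_nets - {net} <= reachable

if __name__ == "__main__":
    # A and B peer with each other; C buys from A, D buys from B, E buys from D.
    # F peers with A only, so it would need transit to reach B, D and E.
    customers = {"A": {"C"}, "B": {"D"}, "D": {"E"}}
    peers = {"A": {"B", "F"}, "B": {"A"}, "F": {"A"}}
    all_nets = {"A", "B", "C", "D", "E", "F"}
    for net in sorted(all_nets):
        print(net, is_transit_free(net, customers, peers, all_nets))
```

In this toy topology B is transit-free, while A is not, because A has no way to reach E (or D) except through its peer B... wait, B's customers are reachable via peering, so the interesting failures are F, and A's view of F is fine because F peers with A directly. Run it to see which networks pass.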

Note that while I don't pay another network for access to its *peers*
(that's transit) I might pay for access to its customers.  This is
typically called "paid peering" or "settlement-based peering", but
sometimes it can just be plain transit that's modified with communities
to look like peering.  To add to the confusion, the latter case might be
described differently by both parties; the seller probably says "X is a
transit customer of mine", and the buyer says "I have peering with Y",
and in this case, neither one is lying (mostly).

If you didn't see the reference a month or so ago when Paul sent it, the
following link might be interesting to you:

http://arstechnica.com/guides/other/peering-and-transit.ars
  

--Jeff




Re: Is it time to abandon bogon prefix filters?

2008-08-18 Thread Jeff Aitken
On Mon, Aug 18, 2008 at 09:51:20AM +0100, [EMAIL PROTECTED] wrote:
 m4 is a macro processor that you probably should not bother
 learning since you can do everything that it does by using Python 

Oh, Abley is gonna have fun with this... and for the record, my money is
on Joe.  He could probably implement python *IN* m4 if you offered enough
beer!


--Jeff




Re: Traceroute and random UDP ports

2008-08-13 Thread Jeff Aitken
On Wed, Aug 13, 2008 at 07:56:53AM -0500, John Kristoff wrote:
  Also, why do we increase the UDP port number with each subsequent
  traceroute packet that is sent?
 
 I don't know definitively, but I have an of educated guess 

From /usr/src/contrib/traceroute/traceroute.c:

/*
 * Notes
 * -----
 * [...]
 * The udp port usage may appear bizarre (well, ok, it is bizarre).
 * The problem is that an icmp message only contains 8 bytes of
 * data from the original datagram.  8 bytes is the size of a udp
 * header so, if we want to associate replies with the original
 * datagram, the necessary information must be encoded into the
 * udp header (the ip id could be used but there's no way to
 * interlock with the kernel's assignment of ip id's and, anyway,
 * it would have taken a lot more kernel hacking to allow this
 * code to set the ip id).  So, to allow two or more users to
 * use traceroute simultaneously, we use this task's pid as the
 * source port (the high bit is set to move the port number out
 * of the likely range).  To keep track of which probe is being
 * replied to (so times and/or hop counts don't get confused by a
 * reply that was delayed in transit), we increment the destination
 * port number before each probe.
 * [...]
 *  -- Van Jacobson ([EMAIL PROTECTED])
 * Tue Dec 20 03:50:13 PST 1988
 */
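
To make the encoding concrete, here is a small Python sketch of the scheme the comment describes. It is an illustration, not the traceroute source: the function names are made up, and it assumes the traditional default base port of 32768 + 666 (33434). The source port is the pid with the high bit set, the destination port is the base plus the probe's sequence number, and a reply is matched by unpacking the 8 bytes of UDP header quoted in the ICMP message.

```python
import struct

BASE_PORT = 32768 + 666   # 33434, the traditional default base destination port

def ident_for(pid):
    """Source port used to tell one traceroute process from another."""
    return (pid & 0xFFFF) | 0x8000    # low 16 bits of the pid, high bit set

def probe_ports(pid, seq):
    """Source and destination UDP ports for probe number seq (starting at 1)."""
    return ident_for(pid), BASE_PORT + seq

def match_reply(quoted_udp_header, pid):
    """Given the UDP header quoted in an ICMP error, return the sequence
    number of the probe it answers, or None if the probe was not ours."""
    src, dst, _length, _checksum = struct.unpack("!HHHH", quoted_udp_header[:8])
    if src != ident_for(pid):
        return None
    return dst - BASE_PORT

if __name__ == "__main__":
    src, dst = probe_ports(pid=1234, seq=3)
    # Build the 8-byte header a router would quote back to us.
    quoted = struct.pack("!HHHH", src, dst, 20, 0)
    print(src, dst, match_reply(quoted, pid=1234), match_reply(quoted, pid=4321))
```

Because the destination port is different for every probe, a reply that shows up late can still be tied to the probe that triggered it, which is the answer to the original question.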



--Jeff