Re: enterprise change/configuration management and compliance software?

2008-04-15 Thread Matthew Petach

On Mon, Apr 14, 2008 at 9:13 PM, jamie <[EMAIL PROTECTED]> wrote:
>   Gentlemen (and Ren!):  ;-)
>
>   I'm currently investigating options w.r.t. enterprise-wide (over 250
> device, and by 'device' i mean router and/or switch) configuration
> management (and (ideally) compliance-auditing_and_assurance) software.
>
>   We currently use Voyence (now EMC) and are looking into other options for
> various reasons, support being in the top-3 ...
>
>   So, I pose:  To you operators of multi-hundred-device networks : what do
> you use for such purposes(*) ?
>   (*)see subject

We have several thousand network devices currently in play:

[EMAIL PROTECTED]:/tftp/conf/latest> ls *.conf | wc -l
7419
[EMAIL PROTECTED]:/tftp/conf/latest>

I hand read each device configuration check-in email that goes past
to see if there's errors in the configs, security violations, or other WTF-ish
elements in the config check-in, and mail back a nag notice to the
person who changed the config.
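A minimal sketch of the kind of scan a reviewer might run against a check-in before firing off a nag notice; the file path and the patterns flagged are invented for illustration:

```shell
# Hypothetical example: write a sample check-in, then flag a couple of
# classic security violations (world-readable SNMP community, cleartext
# enable password). Patterns and paths are assumptions, not our actual list.
cat > /tmp/checkin.conf <<'EOF'
hostname edge-router-1
snmp-server community public RW
enable password cisco
ntp server 10.0.0.1
EOF

# Nonzero match count here is what would trigger the nag email.
grep -nE 'community public|enable password' /tmp/checkin.conf
```

In practice the human eyeball still catches the "WTF-ish" cases no pattern list anticipates; the grep only cheapens the obvious ones.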

Currently, I receive between 1,900 and 3,000 email messages a day.

I sleep 3 hours a night.

> jamie rishaw

Hope that helps answer your question.

Matt


Re: Yahoo Mail Update

2008-04-14 Thread Matthew Petach

On Mon, Apr 14, 2008 at 6:18 AM, Rich Kulawiec <[EMAIL PROTECTED]> wrote:
>  On Sun, Apr 13, 2008 at 03:55:13PM -0500, Ross wrote:
>  > Again I disagree with the principle that this list should be used for
>  > mail operation issues but maybe I'm just in the wrong here.
>
>  I don't think you're getting what I'm saying, although perhaps I'm
>  not saying it very well.
>
>  What I'm saying is that operational staff should be *listening* to
>  relevant lists (of which this is one) and that operational staff
>  should be *talking* on lists relevant to their particular issue(s).

Completely agree.

>  Clearly, NANOG is probably not the best place for most SMTP or HTTP
>  issues, but some of the time, when those issues appear related to
>  topics appropriate for NANOG, it might be.  The rest of the time,
>  the mailop list is probably more appropriate.
>
>  While I prefer to see topics discussed in the "best place" (where
>  there is considerable debate over what that might be) I think that
>  things have gotten so bad that I'm willing to settle for, in the
>  short term, "a place", because it's easier to redirect a conversation
>  once it's underway than it seems to be to start one.
>
>  For example: the silence from Yahoo on this very thread is deafening.

I think if you check historically, you'll find that Yahoo network operations
team members are doing exactly as you indicate, and are
"*talking* on lists relevant to their particular issue(s)"
that is to say, here on NANOG, when it comes to networking issues,
deafening silence has not been the modus operandi.

The mistaken notion that a *network operations* list should have
people on it to address mail server response code complaints is
where I disagree with you.

Ask about a BGP leakage, it'll get fixed.  Enquire about how to engage
in peering with Yahoo, you'll get flooded with answers; those are items
the folks who read the list are empowered to deal with.  Questions about
topics not related to the list, which they aren't empowered to deal with,
are going to be met with silence, because you're trying to talk to the
wrong people in the wrong forum.

>  ---Rsk

Matt
--always speaking for himself--his employer is more likely to pay him
to shut up.


Re: Yahoo Mail Update

2008-04-12 Thread Matthew Petach

On 4/10/08, chuck goolsbee <[EMAIL PROTECTED]> wrote:
> >An anonymous source at Yahoo told me that they have pushed
> > a config update sometime today out to their servers to help with these
> > deferral issues.
> >
> >Please don't ask me to play proxy on this one of any
> > other issues you may have, but take a look at your queues and
> > they should be getting better.
> >
> >- Jared
>
>  Thanks for the update Jared. I can understand your request to not be used
> as a proxy, but it exposes the reason why Yahoo is thought to be clueless:
> They are completely opaque.
>
>  They cannot exist in this community without having some visibility and
> interaction on an operational level.
>
>  Yahoo should have a look at how things are done at AOL. While the feedback
> loop from the *users* at AOL is mostly a source of entertainment, dealing
> with the postmaster staff at AOL is a benchmark in how it should be done.

*heh*  Well, depending upon how the battle turns out, Yahoo is likely to
go the way of whoever its new partner will be--which will either be more
like AOL, or more like Hotmail.

Sounds like there's already some amount of preference at least among
this group as to which way they'd prefer to see the battle go.  ^_^;

Matt

>  Proxy that message over and perhaps this issue of Yahoo's perennially
> broken mail causing the rest of us headaches will go away. It seems to come
> up here on nanog and over on the mailop list every few weeks.
>
>  --chuck


Re: /24 blocking by ISPs - Re: Problems sending mail to yahoo?

2008-04-12 Thread Matthew Petach

On 4/11/08, Raymond L. Corbin <[EMAIL PROTECTED]> wrote:
>
>  It's not unusual to do /24 blocks, however Yahoo claims they do not keep any 
> logs as to what causes the /24 block. If they kept logs and were able to tell 
> us which IP address in the /24 sent abuse to their network we would then be 
> able to investigate it. Their stance of 'it's coming from your network you 
> should know' isn't really helpful in solving the problem. When an IP is 
> blocked a lot of ISP's can tell you why. I would think when they block a /24 
> they would at least be able to decipher who was sending the abuse to their 
> network to cause the block and not simply say 'We're sorry, our anti-spam 
> measures do not conform with your business practices'. Logging into every 
> server using a /24 is looking for a needle in a haystack.
>

*heh*  And yet just last year, Yahoo was loudly denounced for
keeping logs that allowed the Chinese government to imprison
political dissidents.  Talk about damned if you do, damned if you don't...

I guess logs should only be kept as long as they can only be
used for good, and not evil?

Matt

>  -Ray


Re: cooling door

2008-03-30 Thread Matthew Petach

On 3/29/08, Alex Pilosov <[EMAIL PROTECTED]> wrote:
>
> Can someone please, pretty please with sugar on top, explain the point
>  behind high power density?
>
>  Raw real estate is cheap (basically, nearly free). Increasing power
>  density per sqft will *not* decrease cost, beyond 100W/sqft, the real
>  estate costs are a tiny portion of total cost. Moving enough air to cool
>  400 (or, in your case, 2000) watts per square foot is *hard*.
>
>  I've started to recently price things as "cost per square amp". (That is,
>  1A power, conditioned, delivered to the customer rack and cooled). Space
>  is really irrelevant - to me, as colo provider, whether I have 100A going
>  into a single rack or 5 racks, is irrelevant. In fact, my *costs*
>  (including real estate) are likely to be lower when the load is spread
>  over 5 racks. Similarly, to a customer, all they care about is getting
>  their gear online, and can care less whether it needs to be in 1 rack or
>  in 5 racks.
>
>  To rephrase vijay, "what is the problem being solved"?

I have not yet found a way to split the ~10kw power/cooling
demand of a T1600 across 5 racks.  Yes, when I want to put
a pair of them into an exchange point, I can lease 10 racks,
put T1600s in two of them, and leave the other 8 empty; but
that hasn't helped either me the customer or the exchange
point provider; they've had to burn more real estate for empty
racks that can never be filled, I'm paying for floor space in my
cage that I'm probably going to end up using for storage rather
than just have it go to waste, and we still have the problem of
two very hot spots that need relatively 'point' cooling solutions.

There are very specific cases where high density power and
cooling cannot simply be spread out over more space; thus,
research into areas like this is still very valuable.
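To put a rough number on the problem: using the common HVAC rule of thumb CFM = BTU/hr / (1.08 × ΔT°F), a single ~10kW chassis needs on the order of 1,600 CFM of airflow. The 20°F temperature rise below is an assumption, not a T1600 spec:

```shell
# Napkin math for cooling ~10 kW in one rack position.
# 1 W ~= 3.412 BTU/hr; CFM = BTU/hr / (1.08 * deltaT_F).
awk 'BEGIN {
  watts = 10000                         # roughly one T1600
  btu   = watts * 3.412                 # convert watts to BTU/hr
  dt    = 20                            # assumed temp rise across the chassis
  printf "%.0f BTU/hr, roughly %.0f CFM of airflow\n", btu, btu / (1.08 * dt)
}'
```

That much air through a single rack footprint is exactly the "point cooling" problem that spreading gear over five racks can't solve when the box itself won't split.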

Matt


Re: Yahoo! clue (Slightly OT: Spiders)

2007-06-05 Thread Matthew Petach


On 3/30/07, Zach White <[EMAIL PROTECTED]> wrote:

On Thu, Mar 29, 2007 at 10:17:50AM -0400, Kradorex Xeron wrote:
> Another problem is that the Yahoo/Inktomi search robots do not stop if no site
> is present at that address. Thus, someone could register a DNS name and have
> a site set on it temporarily, just long enough for Yahoo/Inktomi's bots to
> notice it, then redirect it thereafter to any internet host's address, and the
> bots would proceed to that host and access it over and over in succession,
> wasting bandwidth on the user end (which in most cases is being
> monitored and limited, sometimes heavily, by the ISP), and wasting time
> on the bots' end that could have been used spidering other sites.

It's not limited to that. I bought this domain which had previously been
in use. I've owned the domain for over 5 years, but I still get requests
for pages that I've never had up.

<[EMAIL PROTECTED]:/var/www/logs:8>$ grep ' 404 ' access_log | grep
darkstar.frop.org | awk '/Yahoo/ { print $8 }' | wc -l
 830
<[EMAIL PROTECTED]:/var/www/logs:9>$ grep ' 404 ' access_log | grep
darkstar.frop.org | awk '/Yahoo/ { print $8 }' | sort -u | wc -l
  82

That's 82 unique URLs that have been returning a 404 for over 5 years.
That log file was last rotated 2006 Sep 26. That's averaging 138
requests per month for pages that don't exist on that one domain alone.
How many bogus requests are they sending each month, and what can
we do to stop them? (The first person to say something involving
robots.txt gets a cookie made with pickle juice.)

Sure, on my domain alone that's not a big deal. It hasn't cost me any
money that I'm aware of, and it hasn't caused any trouble. However, it
is annoying, and at some point it becomes a little ridiculous.

Can anyone that runs a large web server farm weigh in on these sorts of
requests? Has this annoyance multiplied over thousands of domains and
IPs caused you problems? Increased bandwidth costs?

-Zach



Speaking purely for myself, and not for any other organization, I would
wonder what level of response you had gotten from the abuse address
listed in the requesting netblock:

[EMAIL PROTECTED]:/home/mrtg/archive> whois -h whois.ra.net 74.6.0.0/16
route:  74.6.0.0/16
descr:  YST
origin: AS14778
remarks:Send abuse mail to [EMAIL PROTECTED]
mnt-by: MAINT-AS7280
source: RADB
[EMAIL PROTECTED]:/home/mrtg/archive>

First line of inquiry in my mind would be to use the slurp@
email, and work my way along from there.

Matt


Re: Need BGP clueful contact at Global Crossing

2006-12-14 Thread Matthew Petach


On 12/14/06, Lasher, Donn <[EMAIL PROTECTED]> wrote:

On 14 Dec 2006 09:47:46 -0500, Michael A. Patton <[EMAIL PROTECTED]> wrote:
>> If there are any BGP clueful contacts at Global Crossing listening (or
>> if someone listening wants to forward this to them :-), I would
>> appreciate your getting in touch.

>Out of curiosity, why do you think anyone here on NANOG would be
>willing to bother the clueful contacts they know at provider (X) based
>on an email like this?  It's absolutely content-free.

Having been on both sides of an issue like this one, I'd much rather see
polite requests like the original requestor's, rather than a 10-page dump
on why provider X is severely borked. Good netiquette, seems to me.


A 10-page dump is excessive; but a one- or two-line
"I'm seeing bad advertisements from AS  at the following peering
location" goes a long way toward explaining what the need and urgency
around the issue are.


Re: Need BGP clueful contact at Global Crossing

2006-12-14 Thread Matthew Petach


On 14 Dec 2006 09:47:46 -0500, Michael A. Patton <[EMAIL PROTECTED]> wrote:

If there are any BGP clueful contacts at Global Crossing listening (or
if someone listening wants to forward this to them :-), I would
appreciate your getting in touch.


Out of curiosity, why do you think anyone here on NANOG would
be willing to bother the clueful contacts they know at provider (X)
based on an email like this?  It's absolutely content-free.

Now, if you included examples of BGP announcements that were
being leaked that shouldn't be, or prefixes of yours that they were
accidentally hijacking, or traceroutes going from San Jose to Paris
and then back to Palo Alto within their network, or some other
level of operationally interesting content, then it's much more likely
the issue would be passed along either via forwarding the email,
or, if the issue was sufficiently interesting, via a more immediate
channel (cell phone/IM/IRC/smoke signal/INOC-DBA phone/etc).

But as it currently stands, my view of Global Crossing's network
doesn't show any problems worth contacting them about, so I'm
unlikely to pass along your request.  For all I know, you might
really be a terrorist out to collapse their infrastructure by sleep
depriving their backbone engineers night after night with inane
requests until their REM-deprived brains fat-finger the router
configs into oblivion.  And that just wouldn't be good.

So.  How about trying again, but with relevant content that indicates
an operational issue with their network, and then we can pass that
along to the right folks who can look into it.

Thanks!

Matt
(not now, nor ever have been affiliated with 3549, in case there's any
possibility of confusion)


Re: comcast routing issue question

2006-11-30 Thread Matthew Petach


On 11/29/06, Jim Popovitch <[EMAIL PROTECTED]> wrote:

On Thu, 2006-11-30 at 00:06 -0500, Jim Popovitch wrote:
> Question:  What could cause the first trace below to succeed, but the
> second trace to fail?
>
> $ mtr 69.61.40.35
> HOST: blue  Loss%   Snt   Last   Avg  Best  Wrst
>   1. 192.168.3.1   0.0% 14.3   4.3   4.3   4.3
>   2. 73.62.48.10.0% 1   10.6  10.6  10.6  10.6
>   3. 68.86.108.25  0.0% 1   11.4  11.4  11.4  11.4
>   4. 68.86.106.54  0.0% 19.8   9.8   9.8   9.8
>   5. 68.86.106.9   0.0% 1   20.5  20.5  20.5  20.5
>   6. 68.86.90.121  0.0% 1   11.3  11.3  11.3  11.3
>   7. 68.86.84.70   0.0% 1   27.7  27.7  27.7  27.7
>   8. 64.213.76.77  0.0% 1   24.5  24.5  24.5  24.5
>   9. 208.50.254.1500.0% 1   39.4  39.4  39.4  39.4
>  10. 208.49.83.237 0.0% 1   46.6  46.6  46.6  46.6
>  11. 208.49.83.234 0.0% 1   40.7  40.7  40.7  40.7
>  12. 69.61.40.35   0.0% 1   43.9  43.9  43.9  43.9
>
> $ mtr 69.61.40.34
> HOST: blue  Loss%   Snt   Last   Avg  Best  Wrst
>   1. 192.168.3.1   0.0% 11.1   1.1   1.1   1.1
>   2. 73.62.48.10.0% 19.9   9.9   9.9   9.9
>   3. 68.86.108.25  0.0% 19.3   9.3   9.3   9.3
>   4. 68.86.106.54  0.0% 19.6   9.6   9.6   9.6
>   5. 68.86.106.9   0.0% 19.0   9.0   9.0   9.0
>   6. 68.86.90.121  0.0% 1   18.2  18.2  18.2  18.2
>   7. 68.86.84.70   0.0% 1   23.9  23.9  23.9  23.9
>   8. ???  100.0 10.0   0.0   0.0   0.0
>
>
> Taking the 69.61.40.33/28 subnet a bit further, .36 drops at 68.86.84.70
> but .37 - .39 make it.  .40 drops at 68.86.84.70, but .41 makes it.
>
> Crazy.

Btw, the problem has now been resolved, however I'm still curious as to
what scenario could have caused that.

-Jim P.


eBGP multihop peering across a pair of 10 gigE links with
static routes pointing to the remote router loopback; one
link goes south, but the interface still shows as up/up,
and voila, depending upon the hash, your packets may
go across the good link, or they may disappear into the
black hole of oblivion.

This is why multipath is a good thing, and eBGP multihop
with static routes is a Bad Thing(tm).
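The failure mode described above can be sketched in a hypothetical IOS-style config (addresses, ASNs, and interface names invented for illustration):

```
! eBGP multihop to the neighbor's loopback, reachable over two parallel
! 10GE links via static routes:
ip route 192.0.2.2 255.255.255.255 TenGigabitEthernet1/1
ip route 192.0.2.2 255.255.255.255 TenGigabitEthernet1/2
!
router bgp 64500
 neighbor 192.0.2.2 remote-as 64501
 neighbor 192.0.2.2 ebgp-multihop 2
 neighbor 192.0.2.2 update-source Loopback0
```

If one link dies somewhere upstream while the interface stays up/up, both static routes remain valid, so the session stays established over the surviving link while any flow hashed onto the dead one blackholes--exactly the "every other destination works" pattern in the traces above.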

Matt


Re: link between Sprint and Level3 Networks is down in Chicago

2006-11-09 Thread Matthew Petach


On 11/9/06, Deepak Jain <[EMAIL PROTECTED]> wrote:

Does someone know if this is a *single* link down?? It seems bizarre to
me that there would only be a single link (geographically) between those
two.

Whatever happened to redundancy?
Deepak



From the outside, this appeared to be more like a CEF
consistency sort of thing; routes were still carrying packets
to the interconnect, but the packets were not successfully
making it across the interconnect.  I would hazard a guess
that had the link truly gone down in the classic sense, BGP
would have done the more proper thing, and found a different
path for the routes to propagate along.

Again, this is speculation from the outside, based on the
path packets were taking before dropping on the floor.

Matt




Dennis Dayman wrote:
> We received confirmation from Time Warner. The link between Sprint and
> Level3 Networks is down in Chicago. This has been an issue since 3:10 PM
> EST.  Time Warner has a ticket open to address the issue. Not sure what it
> is yet.
>
> -Dennis
>
>
>




Re: SprintLink peering issue in Chicago?

2006-11-09 Thread Matthew Petach


On 11/9/06, Olsen, Jason <[EMAIL PROTECTED]> wrote:


At around 1345 Central it was brought to my attention that we had lost
access to a number of websites out on the 'net... Two big-name examples
are Oracle, which has our development team screaming for my blood.  The
other that's come to light as well is, of course, Yahoo... which means
the rest of the userbase hates me.  Traceroutes like the two below for
Oracle generally die after one of Sprint's routers or its peer with
Level3.  I've already opened a case with SprintLink's broadband group
and the tech I've spoken to said that there have been an influx of calls
about routing/website availability problems, but nothing had been
identified inside Sprint yet.

Just curious if anybody else is seeing this sort of action.


Sprint has been made aware of the issue, as has Level3.

Matt





[EMAIL PROTECTED]  [/export/home/jolsen]
$ traceroute www.oracle.com
traceroute: Warning: Multiple interfaces found; using 10.2.2.230 @ ce0
traceroute to www.oracle.com (141.146.8.66), 30 hops max, 40 byte
packets
 1  core2-vlan1.obt.devry.edu (10.2.2.1)  0.407 ms  0.278 ms  0.265 ms
 2  obtfw-virtual.obt.devry.edu (10.2.1.10)  1.413 ms  2.380 ms  2.400
ms
 3  * * 205.240.70.2 (205.240.70.2)  5.209 ms
 4  * * sl-gw32-chi-6-0-ts3.sprintlink.net (144.232.205.237)  10.738 ms
 5  * sl-bb21-chi-4-2.sprintlink.net (144.232.26.33)  14.616 ms  32.739
ms
 6  sl-bb20-chi-14-0.sprintlink.net (144.232.26.1)  16.901 ms  33.400 ms
27.028 ms
 7  sl-st20-chi-12-0.sprintlink.net (144.232.8.219)  42.269 ms  6.190 ms
3.835 ms
 8  * 209.0.225.21 (209.0.225.21) 9.971 ms 148.152 ms
 9  * * *
10  * * *
11  * * *
12  * * *

-- and --

[EMAIL PROTECTED]  [/usr/local/sbin]
# ./tcptraceroute www.oracle.com 80
Selected device ge0, address 10.2.2.4 for outgoing packets
Tracing the path to www.oracle.com (141.146.8.66) on TCP port 80 (http),
30 hops max
 1  10.2.2.1 (10.2.2.1)  0.289 ms  0.224 ms  0.208 ms
 2  10.2.1.10 (10.2.1.10)  1.547 ms  1.502 ms  1.218 ms
 3  205.240.70.2 (205.240.70.2)  2.555 ms  5.551 ms  6.408 ms
 4  sl-gw32-chi-6-0-ts3.sprintlink.net (144.232.205.237)  4.120 ms
8.185 ms  6.024 ms
 5  sl-bb21-chi-4-2.sprintlink.net (144.232.26.33)  5.470 ms  3.884 ms
6.889 ms 6  sl-bb20-chi-14-0.sprintlink.net (144.232.26.1)  8.851 ms
7.624 ms  5.671 ms 7  sl-st20-chi-12-0.sprintlink.net (144.232.8.219)
7.913 ms  7.283 ms  7.427 ms
 8  209.0.225.21 (209.0.225.21)  4.730 ms  6.033 ms  7.925 ms
 9  * * *
10  * * *
11  * * *





Re: UUNET issues?

2006-11-04 Thread Matthew Petach


On 11/4/06, Randy Bush <[EMAIL PROTECTED]> wrote:

Chris L. Morrow wrote:
> "Could you be any less descriptive of the problem you are seeing?"

the internet is broken.  anyone know why?


Because we didn't deploy IPv6 quickly enough?   ;P

Matt


Re: Yahoo Postmaster contact, please

2006-11-03 Thread Matthew Petach


On 11/3/06, Matt Clauson <[EMAIL PROTECTED]> wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Greetings, NANOGers.  I've got a mail cluster that's been spooling about
5 messages for the past week or so (with very little drain and
traffic passing), and my mail admin reports that attempted contacts to
the Yahoo Postmaster are not getting answered.  Can someone over there
drop me a line off-list, please?

- --mec


Amusingly enough, gmail tossed this in my spam folder, so I didn't see it
until people started replying to it.  I have no idea if that's indicative of
anything with respect to Yahoo or not, but it might indicate a possible
reason for mail deferral from some sites.

If you're having network connectivity issues reaching Yahoo, NANOG would
seem like a reasonable place to raise questions--but this isn't really a list
for mail admins to hang out on.  It looks like network connectivity between
dotorg.org and Yahoo is good, so I'm not sure if there's anything people on
this list could help you with--but if you do have network connectivity issues
in the future, there's definitely people here who can address those concerns.

Matt


-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2 (MingW32)
Comment: GnuPT 2.7.2
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFS6w9vDNtj3aXDYkRAu8hAJkBl7fcSpXG1p0nU9QsWHReHfQsKwCdFj20
LrLTe2HcgNremAEoYIp983Y=
=+e8X
-END PGP SIGNATURE-




Re: WSJ: Big tech firms seeking power

2006-06-15 Thread Matthew Petach


On 6/14/06, Sean Donelan <[EMAIL PROTECTED]> wrote:


Since power consumption was a topic at the last NANOG meeting.

subscription required, or buy a copy of the Wall Street Journal from
a newstand

http://online.wsj.com/article/SB115016534015978590.html
Surge in Internet Use, Energy Costs
Has Big Tech Firms Seeking Power
By KEVIN J. DELANEY and REBECCA SMITH
Wall Street Journal
June 13, 2006; Page A1

With both Internet services and power costs soaring, big technology
companies are scouring the nation to secure enough of the cheap
electricity that is vital to their growth.

The search is being led by companies including Microsoft Corp., Yahoo Inc.
and IAC/InterActiveCorp. Big Internet firms have been adding thousands of
computer servers to data centers to handle heavy customer use of their
services, including ambitious new offerings such as online video.
[...]



And, just to be fair, Google gets their own bit of news on the power
front:

http://www.iht.com/articles/2006/06/13/business/search.php

I wonder just how much power it takes to cool 450,000 servers.
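A back-of-the-envelope guess, with every figure beyond the server count assumed for illustration:

```shell
# 450,000 servers (from the article) at an assumed ~200 W apiece, with
# cooling overhead assumed at roughly half the IT load (a PUE around 1.5).
awk 'BEGIN {
  n  = 450000                            # servers, per the article
  w  = 200                               # assumed watts per server
  it = n * w / 1e6                       # IT load in megawatts
  printf "IT load ~%.0f MW; cooling adds ~%.0f MW more\n", it, it * 0.5
}'
```

Tens of megawatts either way--which is why these companies are siting data centers next to cheap hydro power.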

Matt


Re: 2006.06.07 NANOG-NOTES TCP Anycast--don't spread the FUD!

2006-06-12 Thread Matthew Petach


On 6/12/06, Rodrick Brown <[EMAIL PROTECTED]> wrote:

Looks like this document maybe have been removed? the link appears to
be dead any mirrors?



The slide deck hadn't been put online when I sent my notes; I took a
guess at what the location might end up being, but guessed wrong.
The actual location ended up being
http://www.nanog.org/mtg-0606/pdf/levine.pdf

Matt


--
Rodrick R. Brown
Senior Systems Engineer
http://www.rodrickbrown.com
http://groups.yahoo.com/group/wallstandtech



Re: 2006.06.07 NANOG-NOTES Smart Network Data Services

2006-06-09 Thread Matthew Petach


On 6/9/06, Simon Waters <[EMAIL PROTECTED]> wrote:

On Friday 09 Jun 2006 12:22, Matthew Petach wrote:
> SNDS tomorrow
> Usability

The sign-up process is very painful.

Microsoft Passports really aren't appropriate for business accounts; my
employer doesn't have a mother's maiden name, or a first pet. At one point it
claimed the name of my first pet must have more than 5 characters in it?
(Perhaps they should aim for things likely to have more information in them;
besides, my mother's maiden name has been published in the newspapers.)

I sent a request for help, as the process fell over at the stage of
authorising the first address range I requested, with a failure to handle the
URL sent for me to click.


Interesting--it's good for me to hear what people are saying about it,
as I can't access it myself--my MSN accounts were all locked, and
part of the termination agreement stipulated that I'm forbidden from
accessing their services.  It does mean the service is limiting
its own scope by requiring Passport-based logins like that, as
I'll never be able to use it to see if any of the domains/netblocks
I'm responsible for might be originating spam.

Perhaps if Microsoft is truly interested in helping clean up the
Internet, they might lift the Passport login requirement?

Matt
[tempted to set Reply-To: to [EMAIL PROTECTED], but that
might be considered antisocial.  ^_^ ]


Re: 2006.06.06 NANOG-NOTES MPLS TE tutorial

2006-06-09 Thread Matthew Petach


On 6/8/06, Matthew Petach <[EMAIL PROTECTED]> wrote:

(still here, just been really busy at work today; will try to finish sending the
notes out tonight.  --Matt)

2006.06.06 MPLS TE tutorial
Pete Templin, Nextlink


Gyah!!  Huge apologies to Pete, who really works for Texlink.
I used to work at Nextlink, and in taking notes, my fingers
went down their old familiar path a bit too easily.

Again, Pete Templin works at Texlink, not Nextlink--apologies
for that gaffe, Pete.  ^_^;;

Matt


2006.06.07 NANOG-NOTES DNSSEC bootstrapping with DLV

2006-06-09 Thread Matthew Petach


(last notes from NANOG37, yay!  I definitely fell further behind
this time around than in Dallas.  Unfortunately, I don't think
I'll be allowed to go to St. Louis, so I probably won't be
able to provide notes for NANOG38.  --Matt)


2006.06.07 Deploying DNSSEC--bootstrap yourself
Joao Damas, ISC
[notes are at
http://www.nanog.org/mtg-0606/pdf/joao-damas.pdf

DNSSEC status
standard is complete and usable
some minor nits with regards to some privacy issues
2 implementations: NSD, BIND
at least one DNSSEC aware resolver (BIND 9.3.2 and later)

Really, you just need some data.

DNSSEC follows a hierarchical model for signatures.
sign the root zone
get root zone to delegation sign TLDs
get TLDs to delegation-sign SLDs,
etc.

Today, the root zone remains unsigned
 likely will be this way for some time
Very few TLDs have signed their zones and offer
delegation signatures
.se, .ru, .org

DNSSEC provides for local trust anchors
you can use trust-anchors clause in BIND
problem: if you have too many, it becomes a nightmare
to maintain, so it doesn't get used.
very manual process

Enter DLV, domain lookaside validation
it's an implementation feature, not a change to the
 protocol; matter of local policy
enables access to a remote, signed repository of
 trust anchors, via the DNS
implemented in BINDs resolver so far
 more to follow?

unfortunately, requires you to trust remote
repository

DLV lookup
a DLV enabled resolver will try to find a secure
entry point using regular DNSSEC; only if it fails
is DLV used, if it is configured.

[picture of DLV lookup chain]

On resolver (BIND)
add to named.conf
 in the options section
 // DNSSEC config
 dnssec-enable yes;
 dnssec-lookaside . trust-anchor dlv.isc.org.;
get the key from ISC's web: http://www.isc.org/ops/dlv

ISC is operating a DLV registry free of charge for anyone
who wants to secure their DNS
Likely some closed orgs will use their own (eg mil)
have a look, start using it!

Any questions?

Q: Mark Kosters, Verisign: Any plans to configure DLV
registries per TLD?
A: BIND code only allows for one right now.
Q: Would be good to allow it to be configured per TLD.

Q: Randy Bush, IIJ: some feeling or understanding how
IANA, root would validate keys/zones it has keys for;
don't understand how ISC proposes to validate keys
it would be storing.  He suggests they publish the
security policy.
A: In case of registrars proxying keys; they trust
registrar.  Otherwise, it's like PGP; show me your
face, show me your key.

Q: Paul Vixie, ISC, following up on Mark Kosters;
you can only have one DLV for any point in the
namespace; you can specify a different one for a
TLD than root; that allows a TLD DLV to be paranoid,
like .mil. who doesn't want to trust anyone else
with key information.
If every TLD wanted to do that, they would find
high levels of cut-and-paste fatigue, so ISC will
operate a root level DLV server as well.

Q: Rick Wesson, runs Alice's Registry, a small registrar.
he's considering doing this, he can help DNS holders
register their keys if people are interested, and
will help get them into the DLV tree.

Q: Sam Wiler?, Sparta: concerns from Randy about how
ISC will authenticate the entries.  Registrars should
consider running their own DLV servers, as they have
the relationship with the domain holder.

Code?  Apparently you don't need code...

NANOG 37, ending slides.

425 attendees, 118 first timers
lots of countries
most USA, 11 canada, scattered others.
ISP, then NSP, then other categories.

top 3 companies represented: Cisco, Juniper, Equinix

HUGE thanks to Rodney Joffe and Neustar for
puling off a miracle to make this happen at the
last minute!

Thanks to sponsors, beer, gear, other.

Susan R Harris, many thanks to her for all
the work she has put in over the years and
to make this happen!

Also huge thanks to all the other people
at Merit

And we'll see you in St. Louis, Oct 8-10th,
joint meeting with ARIN, things set in stone.

Network will go down in 30 minutes or so--pack
up and go home!  :)

I think that was the fastest closing I've seen at
a NANOG yet.  ^_^;;


2006.06.07 NANOG-NOTES TCP Anycast--don't spread the FUD!

2006-06-09 Thread Matthew Petach


(this was one of the coolest talks from the three days, actually,
and has gotten me *really* jazzed about some cool stuff we can
do internally.  Huge props to Matt, Barrett, and Todd for putting
this together!!  --Matt)


2006.06.07 TCP anycast, Matt Levine, Barrett Lyon
with thanks to Todd Underwood
TCP anycast, don't believe the FUD
Todd Underwood is in Chicago
Barrett Lyon starts off.
[slides may eventually be at:
http://www.nanog.org/mtg-0606/pdf/tcp-anycast.pdf

IPv4 anycast
from a network perspective, nothing special
just another route with multiple next-hops
services exist on each next-hop, and respond
from the anycast IP address.

It's the packets, stupid
perceived problem: TCP and anycast don't play together
for long-lived flows.
eg, high-def porn downloads
[do porn streams need to last more than 2 minutes?]
some claim it exists, and works...
yes, been in production for years now.

Anycast at CacheFly
deployed in 2002
prefix announced on 3 continents
3 POPs in US
5 common carriers (transit) + peering
 be sensible to who you peer with
Effective BGP communities from upstreams is key
 keep traffic where you want it.

Proxy Anycast
proxy traffic is easy to anycast!
move HTTP traffic through proxy servers.
customers are isolated on a VIP/virtual address, which
happens to exist in every datacenter.
Virtual address lives over common carriers allowing
even distribution of traffic
state is accomplished with custom hardware to keep
state information synchronized across proxies.

Node geography
anycast nodes that do not keep state must be
 geographically separated
Coasts and countries work really well for keeping
route instability largely isolated.
Nodes that are nearby could possibly require state
between them if local routes are unstable.

IP utilization
"Anycast is wasteful"
people use /24's as their service blocks; use 1 /32 out
of a whole /24.
Really?  How much IP space do you need to advertise
from 4 sites via unicast?

Carriers and Peering
for content players, having even peering and carriers
 is key.
 you may cause EU eyeballs to go to CA if you're not
  careful with where you peer with people.
having an EU centric transit provider in the US without
  having the same routes in EU could cause EU traffic
   to home in the US
 Use quality global providers to keep traffic balanced.

When peering...
keep in mind a peer may isolate traffic to a specific
  anycast node
Try to peer with networks where it makes sense; don't
 advertise your anycast to them where they don't have
 eyeballs!
Try to make sure your peers and transit providers know
 your communities and what you're trying to do, and
 make sure you understand their communities well!

Benefits of Anycast.
for content players
moving traffic without major impact or DNS lag
provides buffers for major failures
allows for simplistic traffic management, with a major
 (potential) performance upside.
it's BGP you don't control, though, so not much you
 can do to adjust inbound wins.
HTTP has significant cost to using DNS to try to shift
traffic around; six or more DNS lookups to acquire
content; anycast trims those DNS lookups down
significantly!
Ability to interface tools to traffic management.
No TTL issues!

Data, May 9, 2006
Renesys: monitored changes in atomic-aggregator for
a CacheFly anycast prefix
AS path changes and pop changes
Keynote: monitored availability/performance of a 30 KB file
Revision3: monitored behaviour of "long-lived" downloads
of DiggNation videocast--over 7TB transferred.

Renesys data:
130 BGP updates for May 9th; low volume day
stable prefixes
34 distinct POP changes based on atomic aggregator
property on prefixes
130 updates is considered a stable prefix.

SJC issue:
thirty-five minute window, 0700 to 0735 UTC, saw:
98 updates, 20 actual pop changes based on
 atomic aggregator changes, all from one San Jose
 provider, flapping from SJC to CHI and back to SJC
unable to correlate these shifts with any traffic
changes; most likely we don't have a big enough
sample size.
possibly just not a lot of people using those routes.

BGP seems stable--what about TCP flows?

AVG time between SJC and CHI and back again was about
20 seconds; very quick on the trigger to go back to
SJC; would break all TCP sessions happening at the
time.
For the most part, TCP seems stable.

Keynote: 30 KB download from 31 locations every 5 minutes,
or an average of 1 poll per 9.6 seconds
compared against 'Keynote Business 40'
data collected on May 9, 2006
represents short-lived TCP flows, though.

Orange line is Keynote business 40
pegged 100% availability
load time was lower than the business 40.
(0.2s vs 0.7s for business 40)

Revision3 data
monitored IPTV downloads for 24 hours (thanks, jay!)
span port; analyzed packet captures
look for new TCP sessions not beginning with SYN
compare that against global active connection table.
looked for sessions that appeared out of nowhere.
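A minimal sketch of that detection step, assuming flows have already been reduced to (flow-id, TCP-flags) tuples from the packet capture; the field layout is invented for illustration:

```python
def find_switched_sessions(packets):
    """Flag TCP sessions whose first packet seen at this node is not a SYN.

    Such sessions 'appeared out of nowhere' -- in the Revision3 analysis
    that suggests the flow started at a different anycast node and was
    shifted here by a route change.

    packets: iterable of (flow_id, flags) in arrival order, where flow_id
    identifies the 5-tuple and flags is a set like {'SYN'} or {'ACK'}.
    """
    seen = set()       # flows we already know about
    switched = []      # flows that began mid-stream
    for flow_id, flags in packets:
        if flow_id in seen:
            continue
        seen.add(flow_id)
        if 'SYN' not in flags:
            switched.append(flow_id)
    return switched

trace = [('A', {'SYN'}), ('A', {'ACK'}), ('B', {'ACK'}), ('B', {'ACK'})]
print(find_switched_sessions(trace))  # ['B']
```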

Long-lived data
683,204 TCP sessions.
anything less than 10 minutes thrown out
23,795 sessions las

2006.06.07 NANOG-NOTES Anycast benefits for k root server

2006-06-09 Thread Matthew Petach


Break ends at 11:40, PGP signing will take place,
and don't forget to fill out surveys.

ANYCAST fun for the final sessions.

Lorenzo Colitti, RIPE NCC
[slides are at:
http://www.nanog.org/mtg-0606/pdf/lorenzo-colitti.pdf

Agenda:
introduction
latency
client-side
server-side
Benefit of individual nodes
Stability
Routing issues

Why anycast?
root server anycast widely deployed
c, f, i, j, k, m at least
reasons for anycasting
provide resiliency: eg contain DOS attacks
spread server and network load
increase performance

but is it effective?

measure latency
ideally, for every given client, BGP should choose the
node with the lowest RTT.  does it?
from every client, measure RTTs to
anycast IP address
service interfaces of global nodes (not anycasted)
for every client, compare K RTT to RTT of closest global
node
a = RTTk/min(RTTi)
if 1, BGP is picking right node
if > 1, BGP picks the wrong node
if <1, seeing local node.
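The ratio above reduces to a one-liner; a minimal sketch (the node RTTs are invented for illustration):

```python
def selection_ratio(rtt_anycast_ms, rtt_global_nodes_ms):
    """a = RTT to the anycast address / min RTT to any global node.

    a == 1: BGP picked the lowest-latency node
    a > 1 : BGP picked a farther node
    a < 1 : the client is hitting a (closer) local node
    """
    return rtt_anycast_ms / min(rtt_global_nodes_ms)

# Illustrative: anycast answer in 30 ms, best global node reachable in 25 ms
print(selection_ratio(30.0, [25.0, 80.0, 190.0]))  # 1.2 -> 20% slower than optimal
```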

Latency with TTM: methodology
DNS queries from ~100 TTM test boxes
dig hostname.bind
see which host answers
extract RTT
take min of 5 queries
check paths to service interfaces;
is it the same as the prod IP?
according to RIS, mostly 'yes'

TTM probe locations, mostly in europe

Latency with TTM: results (5 nodes)
most values are close to one; generally BGP doing pretty
good job.

from 2 nodes to 5 nodes
(2 nodes, April 2005)  (5 nodes, April 2006)
mostly same results, clustered around one, whether
2 or 5 nodes.

consistency of 'a' over time
average of that over time.

TT103 is outlier
calculated over time, threw out that one outlier.

results are pretty consistent.
average is a little higher than one, mostly consistent
over time

measuring from servers
TTM latency measurements not optimal
locations biased towards europe
limited number of probes (~100)
don't reflect k client distribution

how to fix?

ping clients from servers
much larger dataset

methodology
process packet traces on k global nodes
extract list of client IP addresses
ping all addresses from all global nodes
plot distribution of 'a'
6 hours of data
246,769,005 queries
845,328 unique IP addresses

CDF of 'a' seen from servers
results not as good as seen by TTM
only 50% of clients have a = 1
about 10% are 4x slower/farther.

probably due to TTM clustering in europe

latency conclusions
5 node result vs 2 node, comparable, at least
in TTM

non-TTM results not so rosy.

How many nodes are needed--is 5 enough?
evaluate existing instances
how to measure benefit of an instance?

Assume optimal instance selection
that is, every client sees the closest instance
this is an upper bound on the benefit
useful to see if we've reached diminishing returns

for every client, see how much its performance would
suffer if the chosen node didn't exist.

B is loss factor, how much a client would suffer if an
instance were knocked out
B = RTT(with the node knocked out) / RTT(with it present)
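Under the optimal-selection assumption above, the loss factor can be sketched as follows (RTT values are invented for illustration):

```python
def loss_factor(rtts_by_node, serving_node):
    """B = best RTT with the serving node removed / best RTT overall.

    B == 1 means the client would not suffer if the node disappeared;
    larger B means a proportionally worse round-trip time.
    Assumes optimal selection: the client was on the lowest-RTT node.
    """
    best = min(rtts_by_node.values())
    remaining = {n: r for n, r in rtts_by_node.items() if n != serving_node}
    return min(remaining.values()) / best

# Illustrative RTTs (ms) from one client to three global nodes
rtts = {'LINX': 10.0, 'AMS-IX': 12.0, 'Tokyo': 240.0}
print(loss_factor(rtts, 'LINX'))   # 1.2 -> mild degradation if LINX vanished
print(loss_factor(rtts, 'Tokyo'))  # 1.0 -> no impact on this client
```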

Graph for LINX; 90% of clients wouldn't see an impact
if it went away; 10% would see a worsening.
geographic distribution pretty wide

AMS-IX
about 20% would suffer performance degradation; busiest
two nodes, see a lot of clients, important to k
deployment.

If they plot it for both LINX and AMSIX together,
about 65% wouldn't be affected, most of others would
see 4x, 10% would be 7x worse.
So taken together, the *two* nodes are important.

Tokyo; best node for few clients; but those it serves
would be BADLY served by others;
about 10% would get more than 7x worse if it went away,
those clients mostly in Asia.
Miami node at NOTA,
moderate benefit for some clients; US and South America
would be badly served by europe or Tokyo.

Delhi node is mostly ineffective, most would be
served better by other nodes.

Condense the graph into one number to get a
value for effectiveness of each node.
weighted average of B for each client.
if benefit value is 1, node doesn't provide any
benefit at all.
larger numbers show higher benefits.
Europe, when taken together, high benefit, as is
Tokyo; Miami node not so effective, and Delhi is
nearly ineffective.

Does anycast provide any value then?
knock out all except LINX; dark red curve (pre 1997)
10% wouldn't notice, 85% would get worse,
benefit value is 18.8,
so anycast does bring value.

Stability
more nodes means more routes competing in BGP
doesn't matter for single packet UDP exchanges
does matter for TCP

Look at node switches that occur.
collect packet dumps on each node.
extract all 53/UDP traffic
k nodes only NTP synchronized
if IP shows up on two nodes, log a switch.

5 nodes, april 2006, 0.06% saw switches
2830 switchers out of 845,328, 0.33% switchers
no big issue with instance switchers.
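The switch-logging step described above can be sketched like this (client IPs and node names are invented; the real analysis worked from per-node packet dumps):

```python
def count_switchers(observations):
    """Count clients seen at more than one anycast node.

    observations: iterable of (client_ip, node) pairs extracted from the
    per-node packet dumps (node clocks NTP-synchronized, per the talk).
    """
    nodes_by_client = {}
    for ip, node in observations:
        nodes_by_client.setdefault(ip, set()).add(node)
    switchers = [ip for ip, nodes in nodes_by_client.items() if len(nodes) > 1]
    return len(switchers), len(nodes_by_client)

obs = [('1.1.1.1', 'LINX'), ('1.1.1.1', 'LINX'),
       ('2.2.2.2', 'LINX'), ('2.2.2.2', 'AMS-IX')]
switched, total = count_switchers(obs)
print(f"{switched}/{total} = {switched / total:.2%} switchers")  # 1/2 = 50.00% switchers
```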

Routing issues
k-root structure
5 global nodes (prepended)
 linx, amsix, tokyo, mia, del

different prepending values
no-export causing reachability issues

TT103 has value of 200, the graph axis is cut.
tt103 is in Yokohama; Tokyo is 2ms away; but
the query goes to Delhi through Tokyo to LA.
416ms vs 2, so value is 208.

Thanks to Matsuzaki and Randy Bush,
got BGP paths from AS2497
bad interaction of different prepending lengths
need 

2006.06.07 NANOG-NOTES Lightning talk notes

2006-06-09 Thread Matthew Petach


(I think these were the toughest to take notes on, since they went
by so fast; took the most cleaning up afterwards.  But they were also
the best talks of the 3 days.  I wish we could have flipped, and taken
more time on Tuesday for them so we really could have dug in and
asked the questions we were itching to ask.  ^_^;  --Matt)


2006.06.07 Lightning talks

Marty Hannigan, Renesys:
[slides are at:
http://www.nanog.org/mtg-0606/pdf/lightning-talks/1-hannigan.pdf

Critical infrastructure, root server location
analysis
Where to stick your servers.  :)
he took some public info out there on root-servers.org
talked to some people, extrapolated from larger set of
data.
operator demographics.
in  US:
3 corp a, c, j
2 edu b and d
1 mil g
2 research e/h
3 nonprofit f, i, l
autonomica is responsible for l, but hosts "some"
instances on a CDN; CDN is a US formed entity
in EU:
1 non profit k
asia/japan:
1 nonprofit m

92% of system operated in US, 8% non-us;
5% margin of error +-.
US entity type
non-us 8%
us corp 39%
us mil 23%
us edu 15%
us nonprofit 15%

where?
in 54 countries
all religions
all methods of governance

politically:
79% are democratic governments
21% in other forms of government

global diversification for security and performance
instances spread across continents
different networks
different procedures
different software
different hardware
different weaknesses
 weaknesses become strength, since they are diverse;
 no one weakness knocks out all servers.
 little less open to insider malfeasance

Global distribution
NA 38%
EU 35%
Asia 12%
AUs 8%
east EU 3%
LA 2%
Africa 2%
ANT 0%

getting reasonable coverage in the world

situating a root server
relationship 101
who you know
 ICANN, operator, IX, and RIR relationships
 regulators
how you spin it
 national pride
 performance and security
 betterment of user experience

Threats
no different from anyone else
 direct attacks
 proxy attacks
 botnets
  easy money
  miscreants masking other activities

Not sure what motivations to attack root servers;
can't extort money from nonprofits

let's attack a root server
target $-root
 location; eu hosting facility
 multi-post cabinet config with cabling and power
  under floor
 unlocked cabinet, single factor facility entry
physical attack
  open cabinet door
  access to power
hijack attempt
 advertise a route
 return bad answers
network attack
spoof source
random host queries
packet floods

summary:
root system is less likely to be subject to insider
 attack or weakness
but can be attacked by layer 3
there is likely good research data coming across those
 interfaces
trend towards a collapsed root system, where root and
TLD share same hardware or networks should be more
closely examined.
slides will be up soon, talk to him in the hallway


NEXT, Anton Kapela
Network RTTs
[slides are at:
http://www.nanog.org/mtg-0606/pdf/lightning-talks/2-kapela.pdf

I'm pinging 10: high rate active probes
we're pinging stuff really quickly
adjusted host kern.hz to 1000; select() gets pretty
 accurate, +-1ms emission accuracy
stuff is responding
Interesting 0.001% of data relates to end-to-end queuing

what has been sampled?
some cisco 7513s
IOS 12.3 mainline
linux 2.4.20
freebsd 4.8
NT4 sp6
various end-to-end paths on u-wisc network

raw data isn't terribly interesting.
in adaptive link layer protocols, see rate shifting
manifested in RTT
wireless, HPNA/HCNA, powerline ethernet
10,30,60,90 second peaks

fourier transforms, wavelet transforms, frequency domain
1000 seconds at 10ms intervals
break into composite, aggregate graph at top,
0-50hz span on x axis, y axis is contribution
summary of entire graph.
bottom right graph is roughly 200 samples of a
range from 0-5hz, 100pps, deduce delay at half
that sampling rate.
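The pipeline described here (evenly sampled RTTs in, spectral peaks out) can be sketched with NumPy; the 10 Hz ripple below is synthetic, standing in for the OS-scheduler components the talk observed:

```python
import numpy as np

fs = 100.0                        # 100 probes/sec (10 ms interval)
t = np.arange(0, 1000, 1 / fs)    # 1000 s of samples, per the talk
rng = np.random.default_rng(0)
# Synthetic RTT: 20 ms baseline + 0.5 ms ripple at 10 Hz + measurement noise
rtt = 20.0 + 0.5 * np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)

spectrum = np.abs(np.fft.rfft(rtt - rtt.mean()))  # subtract mean: drops 0 Hz DC term
freqs = np.fft.rfftfreq(rtt.size, d=1 / fs)       # bins from 0 Hz to 50 Hz Nyquist
peak_hz = freqs[np.argmax(spectrum)]
print(f"dominant RTT component near {peak_hz:.1f} Hz")
```

On real probe data the interesting output is the whole spectrum, not just the single peak: periodic spikes at 10/20/30 Hz are the kind of per-OS fingerprint the talk describes.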

delay is not a simple boring thing; has
scheduler delays, path dynamics not visible
before to see queue depths.

shark fins showed up; congestion events do
occur, are quite measurable.
when links are hot, queues are obvious, esp. on
highly multiplexed links.

bottom left, cubic resonance, several tens of
thousands of multiplexed flows hitting odd
resonance.

pinging windows machine, composite spectral
fingerprint; 10,20,25,30 spikes
Linux fewer spikes
freebsd low and flat
IOS is 10, 20, 30 and grass of 1hz spacing
below 10hz.

win32 delay spectrum also has 1hz fuzz below
10hz.

Sampled RTT and performed signal analysis of it;
now what?
is network time continuous? is round trip time
discrete or continuous?
no changes revealed as you go down lower
is delay a "signal" anyway?
what's with the 0 Hz DC component in the FT output?

could this be used for fingerprinting?
yes, could be like the next nmap.
packet-level fingerprinting is trivial to fake; but
IP stack scheduler behaviour doesn't change so
easily.


NEXT:
Mikael Abrahamsson
Effect on traffic from the TPB bust
with Kurtis Lindqvist
[slides are at:
http://www.nanog.org/mtg-0606/pdf/lightning-talks/3-abrahamsson.pdf

Bittorrent background
p2p protocol for filesharing.

2006.06.07 NANOG-NOTES Issues with IPv6 multihoming

2006-06-09 Thread Matthew Petach


(hope the inclusion of URLs in the notes isn't
making them all end up in people's spam
folders... --Matt)


2006.06.07 Vince Fuller, from Cisco
and Jason Schiller from UUnet

[slides are at:
http://www.nanog.org/mtg-0606/pdf/vince-fuller.pdf

IPv6 issues routing and multihoming
scalability with respect to routing issues.

how we got where we are today
define "locator" "endpoint-id" and their functions

Explain why these concepts matter, why this
separation is a good thing

understand that v4 and v6 mingle these functions,
and why it matters

recognized exponential growth - late 1980s
CLNS as IP replacement dec 1990 IETF
OSI, TP4 over CLNS--edict handed down from IETF
revolt against that, IP won
ROAD group and the "three trucks" 1991-1992
running out of "class-B" network numbers
explosive growth of "default-free" routing table
eventual exhaustion of 32-bit address space
two efforts -- short-term vs long-term
More at "the long and winding ROAD"
 http://rms46.vlsm.org/1/42.html
Supernetting and CIDR 1992-1993

Two efforts to fix it; CIDR, short term effort,
long term effort became IPv6.

IETF ipng solicitation RFC 1550 Dec 1993
Direction and technical criteria for ipng choice
RFC1719 and RFC 1726, Dec 1994
proliferation of proposals
TUBA == IP over CLNS
NIMROD==how to deal with it from high level

Lots of flaming back and forth, not much good
technical work.
choice eventually made on political choices, not
 technical merit.
Things lost in shuffle...er compromise included:
variable length addresses
decoupling of transport and network-layer addresses
clear separation of endpoint-id/locator (more later)
routing aggregation/abstraction

"math is hard, let's go shopping" -- solving the
real issues was set aside, people focused on
writing packet headers instead

identity -- what's in a name
think of an "endpoint-id" as the "name" of a device
or protocol stack instance that is communicating over
a network
in the real world, this is like your "name"--who you are.
a "domain name" is a human readable analogue

endpoint-IDs:
persistent--long term binding, stays around as long as
machine is up
ease of administrative assignment
hierarchy along organization boundary (like DNS), not
 topology
portable:
stay the same no matter where in the hierarchy you
 are
Globally unique!
unlike human names.  ^_^

Locators: "where" you are in the network
think of "source" and "dest" addresses in routing and
 forwarding as locators
real-world analogy is street addresses or phone numbers.
typically some hierarchy (like address), or like
historical phone number (before portability!)

Desirable properties of locators:
hierarchical assignment according to topology (isomorphic)
dynamic, transparent renumbering without disruption
unique when fully specified, but may be abstracted to
reduce unwanted state
variable length
realworld--don't need the exact street address in Australia
 to fly there
Possibly applied to traffic without end-system knowledge
effectively like NAT, but doesn't break end-to-end

Why should I care?
v4/v6 there are only "addresses" which serve as both
endpoint-ids and locators
this means they don't have the desirable properties of
either:
assignment to organizations is painful because use as
 locator constrains it to be topological
exceptions to topology create additional global state
renumbering is hard; DHCP isn't enough, sessions
 get disrupted, source-based filtering breaks, etc.
Doesn't scale for large numbers of "provider-indep"
or multihomed sites

why should I care?
currently, v6 is only a few hundred prefixes; won't be
a problem until it really catches on, at which point
it's too late.
larger v6 space gives potentially more pain
NAT is effectively id/locator split--what happens if
 NAT goes away in v6?
scale of IP networks still very small compared to what
 it could grow to
re-creating the routing swamp with ipv6 with longer
 addresses could be disastrous; not clear if internet
 could be saved in that case.
Been ignored by IETF for 10+ years
concepts have been known since 60s.

Can v6 be fixed?  And what is GSE, anyhow?
Mike O'Dell proposed this in 1997 with 8+8/GSE
keep v6 packet format, implement id/locator split
http://ietfreport.isoc.org/idref/draft-ietf-ipngwg-gseaddr

basic idea: separate 16-byte address into 8-byte EID
and 8-byte routing goop/locator
change TCP/UDP to only care about 8-bytes.
allow routing system to muck with other 8 bytes in-flight
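The 8+8 split is easy to picture in code; a sketch (the address is illustrative, and `split_gse` is an invented helper, not part of any real stack):

```python
import ipaddress

def split_gse(addr):
    """Illustrative 8+8/GSE split of a 16-byte IPv6 address:
    high 8 bytes = routing goop (locator), low 8 bytes = endpoint ID.
    Only the EID half would feed TCP/UDP; routers may rewrite the
    RG half in flight without the ends noticing."""
    raw = ipaddress.IPv6Address(addr).packed
    return raw[:8], raw[8:]

rg, eid = split_gse('2001:db8:aaaa:bbbb:1:2:3:4')
print(rg.hex(), eid.hex())  # 20010db8aaaabbbb 0001000200030004
```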

achieves goal of EID/locator split while keeping most of
IPv6, hopefully without requiring a new database for
EID-to-locator mapping
Allows for scalable multi-homing
Renumbering can be fast and painless/transparent to hosts.

GSE issues
problems with it--incompatible changes to TCP/UDP
in 1997, no IPv6 installed base, easy to change;
now, v6 deployed, is it too late to change?
violation of end-to-end principle.
perceived security weakness of trusting "naked" EID
(steve bellovin says this is a non-issue)
mapping of EID to EID+RG may add complexity to DNS
depending o

2006.06.07 NANOG-NOTES Smart Network Data Services

2006-06-09 Thread Matthew Petach


(I'm starting to guess I'd finish sending these out faster if
I stopped falling asleep on my keyboard so often... --Matt)


2006.06.07 Welcome to Wednesday morning

http://www.nanog.org/
click on Evaluation Form
Let us know how the M-W vs S-Tu
format; next time will be S-Tu due to ARIN
joint meeting, but need more feedback!

Bill Woodcock, been on program committee

And lightning talk people need to send their
slides to Steve Feldman!!

Eliot Gillum,
ISP community, notifications to
Smart Network Data Services
[slides are at
http://www.nanog.org/mtg-0606/pdf/eliot-gillum.pdf

AGENDA
postmaster services
SNDS
problem
goal
today
tomorrow
motivation
feedback/dialog
questions/discussion

Postmaster--starting point for any issues you have
sending mail into Hotmail/MSN Live.
It's like AOL skunkfeed, you can do junk mail
 reporting.
Lets you see what bad stuff is coming from your
 domain.
SenderID

Site is at:
http://postmaster.msn.com/snds/

Problem:
bad stuff on the internet (spam, phishing, zombies,
ID theft, DDoS)
makes customers unhappy.
Solution #1 -- try to stop it before it hits customers
doesn't really *solve* the problem
Solution #2 -- take what we learn, apply it upstream,
get more bang for buck
#2: #1 is too low

ISP-centric efficiency
solution #1: each of n ISPs has n-1 problems, total is O(n^2)
solution #2: n ISPs have 1 problem (themselves), total is O(n)

reduces work of the overall system.

Crux
today people and ISPs are measured by how much BAD stuff
 they *receive*
Not judged by what they send out.
similar to healthcare industry
 no tight feedback loop to ISP behaviour
nice quotes on slides
http://www.circleid.com/posts/how_to_stop_spam

7 step program (like 12 step, but shorter)
1: recognize the problem:   SNDS
2: believe that someone can help you :  Me
3: Decide to do something : You
8: Make an inventory of those harmed :  SNDS
9: Make amends to them :Tools
10: Continue to inventory : SNDS
12: Tell others about the program  :You

What is SNDS
Website that offers free, instant access to MSN
data on activity coming from your IP space
 data that correlates with "internet evils"
 informs ISP to enable local policy decisions
Automated authorization mechanism
uses WHOIS and rDNS
users are people not companies
A force multiplier attempt.

You can do it on your own, no need to sign up
your company officially as long as you're an
rWHOIS/WHOIS contact.

SNDS goal:
provide info which allows ISPs to detect and fix any
undesired activity.
qualitative and quantitative data
"No ISP left behind"
stop problems upstream of the destination
Bring total cost of remediation to absolute minimum
keep service free
Make internet a better place.

We have data!
Windows Live Mail/MSN Hotmail is a spam and spoofing
target.
4 billion inbound mails/day
 90/10 spam/ham by filtering technologies
User reports on spam, fraud, etc.

Inbound mail system slide--ugly to read, too dark.

SNDS website slide shown.
You can see daily aggregated traffic from your network;
activity periods, IPs, commands and messages seen on
port 25, samples of exchanges.
Filter results on your mail
rate at which users press "this is junk" on your mail.
Trap counts for when IPs hit their junk filters.
comments column is catch-all for anything else they
might put in; like open proxies, when tested positive.
"export to CSV" button, so you can feed the data in
to your own systems if you want.

Today's Scenario
Illustrate magnitude and evidence of a problem.
additional resources
monitoring infrastructure

SNDS Stats
2500 users
mostly senders
67 million IPs
10-20% of inbound mail and complaints

Output drops by 57% on /24+ when monitored by SNDS

SNDS tomorrow
Usability
signup by ASN
better support for upstream providers
access transfer
Utility
programmatic access
Data
virus-infected emails
phishing
honeymonkey
sample messages
Expand the coverage, try to hit more of the problems
on the net.
Provide sample messages, compelling evidence when facing
customers
This hasn't shipped yet, it's what he's hoping to
have in a month or two.

Tomorrow's Scenarios
Lowered
barrier to entry
recurring "cost"
ISP  types
end-user
tier 1/2 monitoring, tier 2/3
directly attack more than just spam
virus emails -> infected PCs, outbound virus filters
phishing/malware hosting -> takedowns.

Is asymmetric routing a sign of people trying to
launch hidden abuses of the net?
Looking to hit more issues, like spotting virus-laden
messages; either infected, or an open relay.
Hoping that automation speeds response.

Safety Tools
Stinger: http://vil.nai.com/vil/stinger
Nessus: http://www.nessus.org/
[oy, read the list from his slide, it's long.]
green items on the list are free, others are pay-for
products.
Pay-for isn't necessarily a bad thing if you get
benefit!

Safety tool breakdown from MSN on next slide.

Motivation:
Hypothesis: everyone benefits
Customers:
infected uses get fixed
safer, cheaper, better internet experience
ISPs
solution #1 isn'

2006.06.06 Net Optics Learning Center Presents Passive Monitoring Access

2006-06-08 Thread Matthew Petach


(apologies, this really was just a marketing presentation
in very, very thin disguise.  I really want that hour of my
life back.  :(  --Matt )

2006.06.06 Net Optics Learning Center Presents
The fundamentals of Passive Monitoring Access
[slides are at:
http://www.nanog.org/mtg-0606/pdf/joy-weber.pdf

TAP technology--tools change, but some things
stay somewhat constant--need a way to collect
information.
Port contention for monitoring--how many people
are running into these issues?
How many people use SPAN ports to get access
to information?

Agenda:
Present an overview of Tap technology and how it
makes network monitoring and security devices
more effective and efficient.

tap technology overview
taps, port aggregators, and regen taps
active response, bypass switches
link aggregators and matrix switches
taps with intelligence

Add more intelligence, SNMP capability into
remote tap systems.

passive monitoring access--you should have full
access to 100% of the packet data; even errors,
etc. at layer 1 and layer 2.

passive means without affecting traffic
no latency
no IP addresses
no packets added, dropped, or manipulated
No link failure

traffic can be collected via:
hubs
optical taps

What is zero delay?
eliminates the ~10msec switchover delay
found in most taps when the tap loses power.

Zero Delay means if the tap loses power
no packets dropped/resent
no latency introduced
power loss to tap undetectable in the network

Hubs are cheap and easy, get most of the info
you need.  The more utilization, the higher
the collision rate means you're not getting all
the data you need.

Placing devices in-line; you get full visibility,
but requires impact when you need to move monitoring
tool from one place to another, or work on the
tool.
advantage: see all traffic including layer 1 and 2 errs
preserve full duplex links

SPAN ports--gain access to data, internal to a switch;
good for data internal to switch fabric.  But you lose
layer1 and layer2 errs; not so bad for security tools,
but for network debugging, horrible.
Only supports seeing data flowing through a
single switch.
fights over who gets access to the port for tools.

Test Access Ports (TAP)
designed to duplicate traffic for monitoring devices.

You put it inline once, it's inline, passive.  preserves
full duplex links, device neutral, can be installed
between any 2 devices.
remains passive
no failure point introduced

fiber taps don't even require power.
always need to fail through, no interruption.

creates a permanent access port to the data
stream.

copper and fiber handled differently;
copper has a retransmit system to replicate
the information; fiber, just splits photon
streams.

Two output ports, only transmitting data;
no way to send data back through.
No way to introduce errors.

Different types:
single tap: duplicates link traffic for a monitoring
 device
regeneration tap: duplicates link traffic for multiple
 monitoring devices
link aggregator tap: combines traffic from multiple
 links
matrix switches: offer software-control access to
 multiple links
other tap options:
 built-in media conversion--use mismatched interfaces
  without separate media converter
active response--inject responses back into the link.

converter taps serve two purposes--connect dissimilar
interfaces without media converter.  but usually
don't fail through cleanly.

Active response is generally in the security arena.
sends back to both sides.

Copper tap devices
10/100baseT
10/100/1000baseT triple speed
1000baseT normal gig tap

Need TWO monitoring NICs to see full duplex data, since
you get TWO TX links coming at you.

Try to get triple speed TAP with dip switch
speed/flow setting, rather than trying to autosense.

Fiber taps
gigabit
SX/LX/ZX,
10gig
SR/LR/ER (multimode and single mode)

still has 2 TX outputs.

topology, and split ratio
split ratio is amount of light going to each
port.

split ratio--amount of light you're willing to
tolerate giving up on the network port.
Basically, work up a Loss Power budget for the
link, figure out how much you can afford to lose
before you lose link.
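The loss-budget arithmetic works out like this (the budget and existing-loss numbers are illustrative, not from the talk):

```python
import math

def split_loss_db(fraction_to_port):
    """Insertion loss in dB for a port receiving this fraction of the light."""
    return -10.0 * math.log10(fraction_to_port)

budget_db = 6.0         # illustrative optical power budget for the link
existing_loss_db = 3.0  # illustrative loss already spent on fiber + connectors
for network_share in (0.5, 0.7, 0.9):
    loss = split_loss_db(network_share)
    fits = existing_loss_db + loss <= budget_db
    pct = int(network_share * 100)
    print(f"{pct}/{100 - pct} split: {loss:.2f} dB on network port -> "
          f"{'fits budget' if fits else 'over budget'}")
```

With these numbers a 50/50 split (about 3 dB) blows the remaining budget, while 70/30 or 90/10 leaves headroom, which is exactly why the split ratio has to come out of the link's loss budget rather than being picked arbitrarily.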

Need to make sure that there will be no impact
for either end!

Do you take distance between the monitoring device
and the tap output device?  Yes, try to keep within
the reduced power budget available off the monitor
port, usually about 10 meters should be fine.

Can you re-use optical taps for OC12 ATM
as well as gigE or 10gigE?

will be specific for multimode vs single mode,
if you stay at 50/50, generally not a problem.

Converter taps are generally powered.  the primary
path is passive, but the monitoring port has to be
active to support the media conversion.

Port aggregator taps
full duplex link being tapped, aggregating out a single
link so you don't need 2 NICs to capture the TX data.

can also make a port a full duplex, 2 way active/passive
port in newer models.

what about multiple output ports?  allow passive
access for multiple monitoring devices to a single
throug

2006.06.06 NANOG-NOTES MPLS TE tutorial

2006-06-08 Thread Matthew Petach


(still here, just been really busy at work today; will try to finish sending the
notes out tonight.  --Matt)

2006.06.06 MPLS TE tutorial
Pete Templin, Nextlink
[slides are at:
http://www.nanog.org/mtg-0606/pdf/pete-templin.pdf
http://www.nanog.org/mtg-0606/pdf/pete-templin-exercise.pdf

He works in a Cisco shop, no JunOs experience
Operator perspective, no logos

Traffic engineering before MPLS
--the "fish" problem.

two parallel paths, one entry router,
one exit router, you end up with all
traffic taking one path, not using the
other path.

IGP metric adjustments
can lead to routing loops
hard to split traffic

No redundancy left over if both paths
filled, but can be good for using 2 out
of 3 paths.

MPLS TE fundamentals

Packets are forwarded based on FIB or LFIB
FIB/LFIBS built based on RIB

TE tunnels;
TE tunnel interface is a unidirectional logical link
from one router to another.
Once the tunnel is configured, a label is assigned for
the tunnel that corresponds to the path through the
MPLS network (LSP)

TE tunnel basics
Once traffic is routed onto the tunnel, the traffic
flows through the tunnel based on the path.
Return traffic could be placed onto
a tunnel going the opposite direction,
or simply routed by IGP

Key terms for TE
Headend
router on which the tunnel is configured
Tail
destination address of tunnel
Midpoint
router(s) along the path along the tunnel LSP

Basic TE config
Global:
mpls traffic-eng tunnels
IGP: must be OSPF or IS-IS
mpls traffic-eng router-id Loopback0
mpls traffic-eng 
physical interfaces
mpls ip
mpls traffic-eng tunnels
 tells IGP to share TE info with other TE nodes

interface TunnelX
ip unnumbered loopback0
 borrow the loopback's address so we can forward traffic
   down the tunnel
tunnel mode mpls traffic-eng
tunnel destination 
 tunnel tail
tunnel mpls traffic-eng path-option 10 dynamic
 find a dynamic path through network
   best path
   with sufficient bandwidth
 will discuss path selection in a bit

Where are we at?
Tunnels go from headend to tail end through midpoint
routers over a deterministic path
we know what commands go on a router for the
global
physical interface
tunnel interface commands

TE and bandwidth
Physical interfaces can be told how much bandwidth can
 be reserved (used)
 ip rsvp bandwidth X X
TE tunnels can be configured with how much bandwidth
 they need:
 tun mpls traff bandw Y
Tunnels will reserve Y bw on outbound interfaces, and
 find a path across the network with X (unused) > Y BW.
This prevents oversubscription, or at least helps
control it.

You can allow for burst room, but for now we'll stick
with static, non-oversubscribed links.
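The control-plane check reduces to simple bookkeeping; a sketch (values are illustrative, in kbps as IOS configures them):

```python
def can_admit(link_reservable_bw, link_reserved_bw, tunnel_bw):
    """Admission check a TE head-end effectively relies on: a link can
    carry the tunnel only if unreserved bandwidth >= the tunnel's ask.
    (Control-plane bookkeeping only -- nothing polices the data plane.)"""
    return link_reservable_bw - link_reserved_bw >= tunnel_bw

# Link configured with 'ip rsvp bandwidth 900000' (kbps), 600 Mb/s reserved
print(can_admit(900_000, 600_000, 200_000))  # True  -> 300 Mb/s still free
print(can_admit(900_000, 600_000, 400_000))  # False -> tunnel must find another path
```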

TE BW
operators can adjust the tunnel bandwidth values over
time to match changes in traffic.
If tunnels are dynamically placed, the tunnels will
dynamically find a path through the network with
sufficient bandwidth, or will go down.


TE auto-bandwidth magic
Tunnels can be configured to watch their actual traffic
as in "sh int | inc rate" every five minutes,
and update their reservation to match, at periodic
intervals.
Dynamic reservations to match the live network
Bandwidth is 'reserved' using RSVP
 but not "saved" for TE
Often buys enough time to identify the surge, see
where the traffic is coming/going.

The number is only a number in control plane; no
actual impact on data plane, no shaping, no control
on real data flows.
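A toy model of the auto-bandwidth loop just described (class and method names are invented; real IOS does this internally):

```python
class AutoBwTunnel:
    """Toy model of TE auto-bandwidth: record the 5-minute output rate at
    each sample, and at every adjustment interval re-signal the tunnel's
    reservation to the highest rate seen since the last adjustment."""
    def __init__(self):
        self.samples = []
        self.reserved_bw = 0
    def sample(self, rate_kbps):   # every ~300 s: 'sh int | inc rate'
        self.samples.append(rate_kbps)
    def adjust(self):              # every 'auto-bw frequency Y' seconds
        if self.samples:
            self.reserved_bw = max(self.samples)
            self.samples = []
        return self.reserved_bw

t = AutoBwTunnel()
for rate in (120_000, 180_000, 150_000):
    t.sample(rate)
print(t.adjust())  # 180000 -> reservation follows the measured peak
```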

tunnel mpls traffic-eng auto-bw frequency Y
each auto-bw tunnel does "sh int" to capture
its rate every 300* seconds
each auto-bw tunnel updates "tunn mpls traff bandwidth X"
 every Y seconds
The config actually changes; this will impact your
RANCID tracking.

It uses highest measured rate during the interval Y

May want to tweak your load-interval, since it's a
decaying function over time; 5 minute is a fairly
smooth value.

May need to tweak config check-in system to avoid
getting flooded with bandwidth adjustments.

Covered:
TE tunnel basics
router config basics
general concepts about TE and bandwidth
In this case, the shortest path that has X BW available
for reservation
 actually, bw X at or below priority Y, but that's later.

SPF calculations
step 0: create a PATH list and a TENT list
step 1: put "self" on PATH list.
step 2:
step 3: put PATH nodes' neighbors on TENT list
step 4: if TENT list is empty, stop.
step 5:
jump back to step 2:
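The PATH/TENT procedure, with the bandwidth constraint folded in, can be sketched as follows (graph and numbers are invented; a real CSPF also applies the tiebreakers covered in these notes, while this sketch stops at lowest IGP cost):

```python
import heapq

def cspf(graph, src, dst, bw_needed):
    """Constrained SPF sketch: ignore links with insufficient unreserved
    bandwidth, then run plain shortest-path on what's left.

    graph: {node: [(neighbor, igp_cost, unreserved_bw), ...]}
    Returns (total_cost, path) or None if no feasible path exists.
    """
    tent = [(0, src, [src])]   # TENT list as a priority queue
    done = set()               # PATH list
    while tent:
        cost, node, path = heapq.heappop(tent)
        if node in done:
            continue
        if node == dst:
            return cost, path
        done.add(node)
        for nbr, link_cost, bw in graph.get(node, []):
            if bw >= bw_needed and nbr not in done:
                heapq.heappush(tent, (cost + link_cost, nbr, path + [nbr]))
    return None

g = {'A': [('B', 10, 500), ('C', 10, 100)],
     'B': [('D', 10, 500)],
     'C': [('D', 5, 100)]}
print(cspf(g, 'A', 'D', 300))  # (20, ['A', 'B', 'D']) -- cheaper C path pruned for bandwidth
```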

Example exercise -- calculate router A's best path to
router D using the handout.

CSPF notes
No load sharing is performed within a tunnel; as soon
as a path is found, it wins
CSPF tiebreakers:
lowest IGP cost
largest minimum available bandwidth
lowest hop count
top node on the PATH list
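The tiebreaker order above can be sketched as a toy CSPF (topology, costs, and bandwidths are invented; links are directed for simplicity, and a brute-force path search stands in for the real SPF):

```python
import itertools

# Toy CSPF: prune links without enough unreserved bandwidth, then pick
# the lowest-IGP-cost path, breaking ties by largest minimum available
# bandwidth and then by lowest hop count.

links = {
    # (node, node): (igp_cost, available_bw_mbps) -- made-up topology
    ('A', 'B'): (10, 500),
    ('B', 'D'): (10, 200),
    ('A', 'C'): (10, 1000),
    ('C', 'D'): (10, 900),
}

def cspf(links, src, dst, need_bw):
    # Step 1: prune links that cannot fit the reservation.
    usable = {k: v for k, v in links.items() if v[1] >= need_bw}
    nodes = {n for pair in usable for n in pair}
    best = None
    # Brute-force all simple paths (fine for a toy topology).
    for n in range(2, len(nodes) + 1):
        for perm in itertools.permutations(nodes, n):
            if perm[0] != src or perm[-1] != dst:
                continue
            hops = list(zip(perm, perm[1:]))
            if not all(h in usable for h in hops):
                continue
            cost = sum(usable[h][0] for h in hops)
            min_bw = min(usable[h][1] for h in hops)
            # Tiebreakers, in order: cost, -min_bw, hop count.
            key = (cost, -min_bw, len(hops))
            if best is None or key < best[0]:
                best = (key, perm)
    return best[1] if best else None

# A-B-D and A-C-D both cost 20; A-C-D wins on larger minimum bandwidth.
print(cspf(links, 'A', 'D', 100))   # ('A', 'C', 'D')
```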

Creating paths -- can be created dynamically,
or statically via explicit paths.

Dynamic:
tunnel mpls traff path-option X dynamic

Explicit paths
paths can be created manually by explicitly creating
a path
"ip explicit-path name "
next-address X
next-address Y
tunnel mpls traff path-option X explicit name blah

Paths can be created manually by explicitly configuring
a path

2006.06.06 NANOG-NOTES IDC power and cooling panel

2006-06-08 Thread Matthew Petach


(ok, one more set of notes and then off to sit in traffic for an hour on
the way to work... --Matt)


2006.06.06 Power and Cooling panel
Dan Golding, Tier1 research, moderator

Hot Time in the Big IDC
Cooling, Power, and the Data Center

3 IDC vendors, 4 hardware vendors
Michael Laudon, force10
Jay Park, equinix
Rob Snevely, Sun
Josh Snowhorn, terremark
David Tsiang, cisco
Brad Turner, juniper
Brian Young, S&D

The power and cooling crisis
internet datacenters are getting full
most of the slack capacity has been used up
devices are using more and more power
low power density - routers, full sized servers
medium power density - 1u servers, switches
high power density - blade servers
Many data centers are full at 70-80% floor space
utilized
North America IDC occupancy is around 50%
 most sought-after space is around 70%

full when power and cooling capacity is used up,
floor space is vacant but can't be used.

There is a relationship between power and cooling
devices are not 100% efficient
I^2R losses means that power becomes heat
 (conservation of energy)
heat must be dissipated
The ability to dissipate heat with normal cooling
technologies is hitting the wall
need new techniques

Some quick rules of thumb
a rack or cabinet is a standard unit of space
from 30-40sqft per rack
power is measured in watts
many facilities do around 80-100w/sqft; at 30sqft
 per rack, that's about 3kw/rack
high
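A quick sanity check of the rule of thumb above (the figures are the ones quoted in the talk, not measurements):

```python
# ~80-100 W/sqft at ~30 sqft per rack works out to about 3 kW per rack.
watts_per_sqft = 100
sqft_per_rack = 30
watts_per_rack = watts_per_sqft * sqft_per_rack
print(watts_per_rack)   # 3000 W, i.e. ~3 kW per rack
```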

how did we get here?
what is current situation
where are we going?
[dang,  he's flying through his slides!!]

Hardware engineers
T-series hardware engineer for Juniper
CRS-1 hardware
E-series
datacenter design issues for Sun,
there were other hardware vendors who were not
interested in showing up, these people were brave
for coming up here!

Josh snowhorn, IDC planner
Jay Park, electrical engineer for equinix
Brian Young, S&D cage design specialist

What do the IDC vendors feel the current situation
is in terms of power/cooling, how did we get here?

Josh--designed datacenters at 100w/sq/ft, more than
enough for the carriers; the server guys hit 100w/sqft
in a quarter rack.  you could cannibalize some power
and cooling, but still ran out of cooling.
Now spend hundreds of millions to make 200wsqft
datacenters, or higher.

Now, to hardware vendors--why are their boxes
using up so much electricity, putting out so
much heat?
What are economics behind increasing density
and heat load?


From the high-end router space--it's been simple, the
bandwidth demand has grown faster than the power
efficiency can keep up with.  In the past, had
the ability to improve keep up, do power spins about
every 2 years, half power; but now bandwidth is
doubling every year, but takes two years to drop
power in half.  We've been losing at this game
for a while, and running out of room on the voltage
scale; 90nm is down at 1v, can't go much lower,
since diode drop is at 0.7v; at 65nm, it's still
at 1v, there's no big hammer anymore for power
efficiency.  Need to pull some tricks out, but
may need to do clock gating, may get some 20-30%
efficiency gains, but not much more that can be
pulled out of the bag now.

Newton was right; you can do some tricks, but no
magic.  Chip multithreading is one area they're
trying to squeeze more performance out of; don't
replicate ancillary ASICs for each core.  Also
can more easily share memory, and nobody has a
100% efficient power supply, so you lose some
power there too.

More and more getting squeezed in each rack.

Also a drive on cost; amortizing costs over
space and capability.
reducing costs per port is a big driver.

And customers are pushing for more and more
density, since the cost of real-estate is getting
so high, since each square foot costs so high.
In Ginza, $120/sq ft for space.

If you go to places where real estate is cheap,
easier/cheaper to just build really big rooms,
and let power dissipate more naturally.

IDC people agree, some cities are just crazy
in real-estate costs.  But for those in suburban
areas, cost of real-estate isn't so expensive.
3kw per blade server, put a few in a rack, you
hit nearly 10kw in a rack.  Soon, will need
direct chilled water in the rack to cool them.
But chilled water mixed with other colocation
and lower density cabinets is very challenging
to build.
But need to have enclosed space to handle local
chilled water coolers in localized racks.
20 years ago at IBM, nobody wanted chilled water
in their hardware.  Now, we're running out of
options.

Disagree--other ways of handling the challenge;
how thermally efficient are the rooms in the
first place, and are there other ways of handling
heat issues?
Cable cutouts in tiles allow air to escape
in areas that don't provide cooling.

Josh notes the diversity between carriers at 40w/sq/ft
vs hosting providers at 400w/sq/ft is making engineering
decisions challenging.

It's not about power really anymore, we can get power,
it's about cooling now.

Dealing with space in wrong terms--watts/sq ft, vs
requirements of each 

2006.06.06 NANOG-NOTES CC1 ENUM LLC update

2006-06-08 Thread Matthew Petach


(sorry these are coming out delayed, I had to deal with an internal
routing challenge
for much of yesterday afternoon.  --Matt)

2006.06.06  CC1 ENUM LLC

IPv6 DAY
http://www.ipv6day.org/

6bone is being shut down today, on the grounds
that IPv6 is live and commercial, based on Jeordi's
findings.

Quotes slide, link to page you can register your
apps on...

Moderator for second session,
Vish from Netflix, member of program committee.

couple of topics to talk about; will start off
with Karen Mulberry
from Neustar talking about the US ENUM trial

This is her first NANOG, very informative,
interesting, entertaining.

CC1 ENUM LLC --what is it?
some background: north american numbering plan,
19 countries.
formed sept 2004 by industry
CC1 shared by 19 countries?  US and canada and
others.

LLC obtained the CC1 ENUM trial delegation in Feb 2006
1 exists at RIPE, points to a server in Canada,
waiting for the rest to happen.
USG "guiding principle" and canadian government
and carribean--interoperate, protect privacy,
foster innovation, promote competition.

US Trial is for End User ENUM ONLY
applied to FCC for numbering for trial, waiver
hasn't been given yet; only regional numbers,
no 800, toll free, or other non-geo numbers used
during trial
No testing in enum.arpa? of carrier enum.

CC1 ENUM trial
test service as interface within CC1, specifically
in US
CIRA will host the temporary Tier 1 registry
Each CC1 country must opt into ENUM trial, gets
their own Tier 1 registry
CIRA just handles 800 area codes for CC1 for US
Canada itself has a trial committee, they are
preparing their own corp. to handle Canada.
And Jamaica is going to do their own.

US Trial, TPAC is committee of trial participants,
will produce trial results.

Each country will do their own Tier 1B registry

Trial roles--a number identified; Tier1B is a subset
of a Tier1 registry
Tier2 provider.
Local exchange provider has to provide...
[wow, slide went fast]

Trial in 3 phases.
registry infrastructure
registry/registrar interface
application testing

phase 2 is under development; phase 3 has some
proposals.  Phase 1 is underway.

TPAC (trial committee) -- 11 members signed MOU
developed documents thus far

TPAC US trial estimated timeline
phase 1: registry infrastructure
late june/july, lasts 2 months, starts
after FCC grants waiver
phase 2: registry/registrar interface
expected to start aug, lasts 2 months
depends on when phase 1 ends, depends on FCC
 waiver
phase 3 applications
later this fall

CC1 timeline as of march 2006
[eyechart slide, good luck reading it.]

By Q4 2006, an RFP will be issued for commercial
tier1 and tier1B registries for CC1, goal to go
live mid 2007.

commercial operations
2 RFPs
tier 1A (for all CC1)
tier 1B for US
expect to see the RFPs Q3/Q4 2006,
beta late next year.

Challenges facing enum
defining the global standard for Carrier/Infrastructure
/Operator/Provider ENUM
Protecting end user security and privacy
managing opt in requirements
ensuring verification and authentication
integrating domestic/global policy mandates.

how do we integrate what happens in the US with
the rest of the world.

CC1 ENUM info resources

CC1 ENUM LLC
http://www.enumllc.com/
US ENUM Forum
http://www.enum-forum.org/
Canadian ENUM Working Group
http://www.enumorg.ca/

Q: What about bringing carrier/operator enum to IETF
forum?
A: working on it -- there was an announcement yesterday
in regards to that.

Moving on to next speaker now.


2006.06.06 NANOG-NOTES DDoS attack information collection

2006-06-07 Thread Matthew Petach


Information collection on DDoS attacks,
Anna Claiborne, Prolexic Technologies.
[slides are at:
http://www.nanog.org/mtg-0606/pdf/anna-claiborne.pdf

DDoS mitigation service.
personal experience mitigating over 150 DDoS
attacks.

Popular topic, but nobody talks about how you
can defend yourself or take legal action;
only thing you can do is collect information.

0.1% of DDoS attacks end in an arrest, that's
out of the reported number to the US Secret
Service, and that's out of the ones that fall
into their jurisdiction.

These are real losses:
A major US corp lost over $2mil in a 20 hour
outage
An offshore gambling comp. lost estimated $4m
in 3 days
Online payment processor lost $400,000 in 72 hours
online retailer lost $20K/day over 3 weeks.

These are directly reported losses; doesn't include
lost PR, etc.

Canadian retailer spent 50K on hardware mitigation,
they got kicked out of 3 datacenters due to the DDoS
attacks, spent 20K on IT and security consultants,
and another $6K on a different mitigation that also
failed.

Basic Information Collection
Get packet captures--either from machine being
attacked, or a span port, or from upstream
device,
tcpdump -n -s0 -C 5
(get full length of raw packet, limit pcap file
to 5MB or smaller)
take 3 or 4 over 15 minutes, to start, and then
repeat every hour
Determine the type of attack and duration (ex SYN
flood lasting 6 hours)
Obtain as complete a list as possible of source IP
addresses
Save bandwidth graphs, flow data, pps graphs, any and
all visual material relating to the attack
Save any contact with the attacker, email, chat
conversation, phone calls, etc.
Get loss figures from management--downtime, per hour
losses, per day losses, section 18 of some law, have
to substantiate losses over $5k before you can take
legal action against someone.
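The "complete list of source IP addresses" step above can be sketched like this (the record format and addresses are invented; real input would come from pcap or flow summaries):

```python
from collections import Counter

# Turn raw flow summaries into a list of unique attack sources,
# busiest first -- the list you hand to upstreams and law enforcement.
flows = [
    # (source_ip, packet_count) -- made-up sample records
    ("198.51.100.7", 4200),
    ("203.0.113.9", 3100),
    ("198.51.100.7", 5000),
]

per_source = Counter()
for src, pkts in flows:
    per_source[src] += pkts

for src, pkts in per_source.most_common():
    print(src, pkts)
# 198.51.100.7 first (9200 packets), then 203.0.113.9 (3100)
```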

Recommendations
have a plan!  DDoS is stressful
Put all attack information in a central location
Good monitoring doesn't have to be expensive, a simple
fiber card in a 1u box can be a mirror port for a
large volume of traffic
Don't have to have expensive hardware like arbor
 boxes.
 Limit to 100mb to prevent killing your capture box.
Graphs and flow data can be retrieved from upstream

Find the source
Use list of source addresses, find a reputable hosting
company, you may even see a friend's IP
Approach the network with the infected machine, give them
as much information as possible, it can take time
finding someone willing to help
Obtaining information is dependent on who you are dealing
with, be as helpful as possible.
Get information from the infected machine: netstat,
tcpdumps, who is logged in, web logs, access logs
Get and save the source code responsible

process can take hours to weeks--prolexic has huge
contact list, and even for them can be really
difficult
And SAVE all your information to a central location!
and back it up!

Examine the source code
scripts are best, you know exactly what's going on
compiled code, run strings on it
best case, you can get a name or identification for who
wrote it, passwords, domain names, port usage
worst case you can obtain information that doesn't make
sense...yet
(it may fit into a bigger context later)

Locate controlling server
Examine TCP connection table or source code to find
the controlling server
verify your information, scan or connect to the suspect
machine
contact abuse where the server is hosted, explain the
situation
have as much information possible to verify your
conclusion and validate your identity
Good luck, most abuse contacts are less than helpful
Raises a good question: how to improve awareness and
get legitimate requests answered.
(may be able to get FBI to provide warrants to seize
machines that are being used to control attacks against
you, but takes time and documentation)

Hunting the attacker (not for the faint of heart!)
Review all information gathered so far on the attack
contact the attacker, establish a rapport
save all information and/or conversations (important
note, if conversations aren't on a public server,
they can't be used)
Piecing the information together to form a high level
view of the exploit, attack, and attacker
A long process, most attackers are highly motivated
and skilled, you usually have to wait for them to
slip up!

Resources:
local FBI field office department of cybercrime
department of homeland security
CERT
Cymru--great guys, if they have to help you
NHTCU--EU, cyber crime divisions in local offices
Local US secret service--division of electronic crimes
DDoSDB.org -- under development at the moment.
 how to identify/recognize different types of attacks
 may be able to put their attack database open to the
  public up there.

A success story
The tracking of x3m1st/eXe
responsible for hundreds of extortion based DDoS
attacks
tracked for months
eventually lead to his arrest.

hid behind four levels of compromised servers.

eXe and his group only talked on private IRC
servers; made the mistake of connecting from
his home domain, from a m

2006.06.06 NANOG-NOTES network-level spam behaviour

2006-06-07 Thread Matthew Petach


2006.06.06 Nick Feamster, Network-level spam behaviour
[slides are at:
http://www.nanog.org/mtg-0606/pdf/nick-feamster.pdf

Spam
unsolicited commercial email
feb 2005, 90% of all email is spam
common filtering techniques are
content based
DNS blacklist queries are significant fraction
 of DNS traffic today.  (DNSbls)

Using IP address based spam black lists isn't so
useful.
How spammers evade blacklists will be discussed
as well.

Problems with content-based filters
...uh oh, some technical glitches...

Content-based properties are malleable
low cost to evasion
altering content based on scripts is too easy
customized emails are easy to generate
content based filters need fuzzy hashes over
 content, etc.
high cost to filter maintainers
as content changes, filters need to be updated.
constantly tweaking SpamAssassin rules is a pain.

false positives are always an issue.

Content-based filters are applied at the destination
too little, too late -- wasted network bandwidth,
 storage, etc.;  many users receive and store the
 same spam content.

Network level spam filtering is robust (hypothesis)
network-level properties are more fixed
hosting or upstream ISP (as number)
botnet membership
location in the network
IP address block
country?

are there common ISPs that host the spammers, for
example?
Avoid receiving mail from machines that are part
of botnets.

Challenge--which properties are most useful for
distinguishing spam traffic from legitimate email?

very little if anything is known about these
characteristics yet!

Randy gave a lightning talk last NANOG about some
of this.

Some properties listed.

Spamming techniques
mostly botnets, of course
other techniques too
we're trying to quantify this
coordination
characteristics
how we're doing this
correlations with Bobax victims
 from georgia tech botnet sinkhole
other possibilities: heuristics
distance of client IP from the MX record
coordinated, low-bandwidth sending

looked at pcaps coming in from hijacked command
and control station from bots trying to talk to
it; spamming bots, Bobax drone botnet, exclusively
used to send spam.

Collection
two domains instrumented with MailAvenger (both on
the same network)
sinkhole domain 1
 continuous spam collection since aug 2004
 no real email addresses--sink everything
 10 million + pieces of spam
sinkhole domain #2
 recently registered Nov 2005
 "clean control" domain posted at a few places
 not much spam yet--perhaps being too conservative
 contact page with random email contact, look at
  who crawls, and then who spams the unique email
  addresses

Monitoring BGP route advertisments from same network

Also capturing traceroutes, DNSBL results, passive
TCP host fingerprinting, simultaneous with spam arrival
(results in this talk focus on BGP+ spam only)

Mail Avenger, not an MTA, it forks to sendmail or
postfix, it sits in front of MTA, does things
like do DNSBL lookups, add headers, passive OS
fingerprinting, as the spam is arriving.
Also logged BGP routes from same network that got
the spam; see connectivity to the spamming machine
at the time.

Picture of collection up at MIT network.

Mail Collection: MailAvenger
X-Avenger header.
best guess at operating system, p0f, DNSBL
lookups, traceroutes back to mail relay at the
time the mail was sent (used for debugging BGP)

distribution across IP space
plot /24 prefix vs how much spam coming from it.
steeper lines mean more spam from that part
of the IP space; you can see where spam is
coming from.  bunch comes from apnic, cable
modem space, etc.
few interesting things to note; still redoing
legitimate mail characteristics.
from georgia tech mail machines, it's legit plus
spam, need to split out better.
between 90.* and 180.*, legitimate mail mainly.

Is IP-based blacklisting enough?
Probably not: more than half of spamming client IPs
appear less than twice.

Roughly 50% of the IPs showed up less than twice;
but that's a single sinkhole domain, would help
more across multiple domains.

emphasizes need to collaborate across multiple
domains to build blacklists; any one domain
won't see repeated patterns of IPs.

Distribution across ASes
40% of spam coming from the US

BGP spectrum agility
Log IP addresses of SMTP relays
Join with BGP route advertisements seen at network
where spam trap is co-located.

A small club of persistent players appears to be using
this technique
61.0.0.0/8 AS4678
66.0.0.0/8 AS21562
82.0.0.0/8 AS8717
somewhere between 1-10% of all spam (some clearly
intentional, others might be flapping)

about 10 minute announcement time of the /8 while
spam is flooded out.
Might be interesting to couple this with route
hijacking alerting to filter out if this is
really a hijacking vs a flapping legitimate route.
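The join described above can be sketched like this (timestamps, addresses, and the announcement window are invented for illustration):

```python
from ipaddress import ip_address, ip_network

# Flag spam whose source was only routed during a short-lived
# announcement of a large prefix -- the "spectrum agility" pattern.
announcements = [
    # (prefix, announce_time, withdraw_time) in seconds -- made up
    (ip_network("66.0.0.0/8"), 1000, 1600),   # up for ~10 minutes
]

spam = [
    # (source_ip, arrival_time) -- made up
    ("66.45.12.3", 1200),    # arrived while the /8 was announced
    ("66.45.12.3", 5000),    # arrived after the withdrawal
]

for src, t in spam:
    routed = any(ip_address(src) in pfx and ann <= t <= wd
                 for pfx, ann, wd in announcements)
    print(src, t, "in-window" if routed else "outside-window")
# first arrival falls inside the announcement window, second does not
```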

A slightly different pattern;
announce-spam-withdraw on a minute-by-minute basis.
really really egregious!

Why such big prefixes?
flexibility: client IPs can be scattered throughout
dark space within a large /8
 same sender usually returns with dif

2006.06.06 NANOG-NOTES DNS reflector attacks

2006-06-07 Thread Matthew Petach


(I was going to try to get all the notes from today's panels out
before going to bed, but I fell asleep on my keyboard finishing
up these notes, so I think I'm going to wait and send the batch
of Tuesday and Wednesday notes out after things wrap up on
Wednesday.  Sorry about the delay, but I need a bit more sleep
I think.  ^_^;;  --Matt)


2006.06.06 Morning welcome, and introduction
of Chris Morrow, panelist

Please fill out survey today if you're going
to be leaving!

Frank Scalzo, Verisign
Recent DNS reflector attacks.

Attacker breaks into innocent authoritative
DNS server, publishes large text record;
then does queries from zombie army
against that record, with sources spoofed
with victim IP.

5 gig attack, 2.2G made it, 3gig didn't.

E.TN.CO.ZA DNS attack, 64 byte query,
63:1 amplification, 4028 byte answer
34,668 reflectors.

Victim sees 5G of traffic, 144,142 bps
per reflector, 13.5 packets per second,
4.5 DNS answers per second.

reflectors won't see this as anomalous for
the most part; top talker only sent 8.5
answers per second.
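The arithmetic behind those figures (the attack sizes are taken from the talk; this is just a consistency check):

```python
# 64-byte query vs 4028-byte answer gives the ~63:1 amplification,
# and 5 Gb/s spread over 34,668 reflectors is only ~144 kbps each --
# which is why no single reflector looks anomalous.
query_bytes = 64
answer_bytes = 4028
print(round(answer_bytes / query_bytes))     # ~63:1 amplification

attack_bps = 5e9                             # 5 Gb/s at the victim
reflectors = 34668
print(round(attack_bps / reflectors))        # roughly 144 kbps per reflector
```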

No visibility into the attacker at all,
but best guess was 79Mb of source generated
5GB of responses.
Record was maliciously installed;
2 auth servers, 1 compromised; 65% response,
35% name error.

Answer comes in 3 fragments, larger than
normal MTU.
Attack came in 3 phases.
first port 666, then port 53 and 666, then all 53.
Port shifts are nearly instant, so fast command
and control system in place for it.

Filter out open recursive DNS servers;
you can't put ACL in for 500,000 DNS
servers.
What about limiting DNS packets to 512 bytes?
will break things.
What about blocking 53 outside of your network
hierarchy, force people to use your resolvers?

What about discarding fragments?

Challenge is getting your upstream to implement
it, unless you have hardware and pipes to handle
the flood coming at you to start with.
Some ISPs won't do it unless they see live attack
traffic, and a 24 minute attack is too short lived
for ISPs to see and react to.

data from Jan 11 - Feb 27 this year.
Attack queries/second consistent with avg reflector qps.

one reflector sent 1.9M DNS answers to 1593 victims,
605 different queries to generate answers.
180TB of attack traffic on Feb 1st.
after feb 15th, ramped down.

Assume 4KB response packet,
see attacks between 3G and 7G, the scary part is
that it only took 130Mb to generate the 7G attack,
and the 3 gig attacks are all from less than a
fastE connected compromised web server.

500,000 reflectors with 2G source could generate
a 120GB DoS attack.

Top victim got over 130Tb of attack traffic, top bunch
are all over 100Tb

65,461 ports used, Top port is less than 10% of
traffic though

top 20 domains used, mostly innocent bystanders.

Internet root . was second highest domain used;
certainly can't filter *that* out.

Fundamental challenge;
UDP lacks 3 way handshake, easy to spoof
DNS is easy target, so many unsecured DNS servers
Other UDP servers need to be evaluated as well

DNS
closing 500,000 open recursive DNS servers will
be very, very painful.
poor separation between authoritative and recursive
DNS servers.
BIND allow-query ACLs, recursive DNS servers should
not accept queries from outside.
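A sketch of what that separation looks like in a named.conf (the addresses and ACL name are examples only; authoritative answers stay open while recursion is restricted to internal clients):

```
// Example named.conf fragment: recursion only for internal networks.
acl internal { 192.0.2.0/24; 198.51.100.0/24; };
options {
    allow-recursion { internal; };
};
```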

What if it's an embedded system like a wireless
gateway?

We depend on large records for DNSSec, etc.

Beyond open recursive DNS servers
root domain "." was used
most authoritative name servers will answer with an
upward referral
doesn't include actual IPs, but it's still 438 bytes,
and pretty much every DNS server responds to it.

Source validation
IETF BCP 38
How do you manage 70,000 ACLs on 500 routers?
what about people who are multi-homed with static routes?
what about legacy stuff that works but shouldn't?
strict RPF breaks with traffic asymmetry; loose RPF
 doesn't help with this.
ISPs see the problem as long, hard, expensive to
 overcome, and they're right.
If we never start trying, we'll never fix it!

Close open recursive DNS servers
DNS servers should include filtering
SOHO router vendors should fix their DNS proxy code,
don't listen on outside interface
BCP 38
otherwise we'll be jumping from protocol to protocol.

Questions?
Q: What does verisign do to protect their DNS servers?
A: Anycast, massive peering and transit capacity

Q: Jared Mauch, NTT/America; he turned on unicast rpf
on the NANOG upstream link.  372,000 packets that
people here have sent failed the RPF check.
BCP 38 is hard
Paul Quinn asked what percentage of the traffic that
is.
Bora Akyul, Broadcom--any data on source ranges
on the packets being seen?
He could look at the 1 in 10,000 netflow sampling
to see, but the individual link is a /30, looks
like a normal customer link.
The Merit router isn't RPF'ing either.

Q: Ren Provo asks when they will peer;
A: not yet, next few months,
Miami Terremark, and other sites domestically
and internationally in next year and a half.


2006.06.05 NANOG-NOTES BGP tools BOF notes

2006-06-06 Thread Matthew Petach


(ok, last set of notes for tonight, and then it's off to bed for 90
minutes of sleep
before heading back to the convention center.  ^_^;  --MNP)


2006.06.05 Welcome to the 4th BGP Tools BOF!
[slides are at
http://www.nanog.org/mtg-0606/pdf/lixia-zhang.pdf

Nick Feamster, Georgia Tech
Dan Massey, Colorado State University
Mohit Lad and Lixia Zhang, UCLA

The Goal
sharing some tools develop from our research
efforts.
hopefully will be useful for operations community.
Also to collect input on new tools we would like
to see so they can develop them.

Routing Configuration Checker
Nick Feamster

O-BGP data organization tool
Dan Massey
[slides are at
http://www.nanog.org/mtg-0606/pdf/dan-massey.pdf

The Datapository by Nick Feamster
[I'm sorry, that just sounds *far* too much like something
you do *NOT* want your bedside nurse administering...--MNP]

Visualizing BGP dynamics using Link-Rank by
Mohit Lad

Open discussions and demos

Nick Feamster
Network Troubleshooting: rcc and beyond

rcc: router configuration checker
proactive routing configuration analysis
idea: analyze configs before deployment
many faults can be detected with static analysis.

rcc implementation.
http://nms.csail.mit.edu/rcc/

preprocessor -> parser -> relational database (mySQL),
constraints <-> verifier <-> faults

verifier is a template checker and set of constraints
your configs are checked against.

He's looking for GUI developers.
very bare-bones command line right now.

Parsing configurations--shows some output.

He shows examples of the abilene configs, which
are non anonymized.
show all routers peering with a given AS, can look
at route maps in each direction, etc.

After running rcc on it, you get a web output
which shows relationships--oh, pictures don't matter,
with some more grease could be a reasonable representation
of your network.

Q: Randy Bush asks if it could show which peering
sessions are missing?
A: Not yet, but it could be added, thank you!

Shows processing and errors;
you get a page that summarizes the things RCC thinks
are errors.

Signalling partition?  that's a missing iBGP session;
he needs some better lingo in places.

Also shows anomalous imports, could be intended for
traffic engineering; that's "inconsistent policy"
in ISP speak.

Some of the names will get fixed to make Randy Bush
happy.

Yes, but surprises happen!
link failures
node failures
traffic volumes shift
network devices "wedged"
...

two problems
detection
localization

Need to marry static config analysis with dynamic
information (route is configured but isn't in the
dynamic table)

he skips a closer look, just some jargon.

Detection: analyze routing dynamics;
drill down on interesting operational issues.
idea: routers exhibit correlated behaviour
blips across signals may be more operationally
interesting than any spike in one signalling system.
How do you spot things in the churn?

Detection three types of events
single-router bursts
correlated bursts
multi-router bursts <---common; and commonly missed
using simple thresholds
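The multi-router burst idea can be sketched like this (router names, windows, and the threshold are invented): per-router counts can stay under any single-router threshold while the number of routers bursting in the same window is the real signal.

```python
from collections import defaultdict

# Count distinct routers showing update activity per time window;
# a burst across many routers at once is the commonly-missed event.
updates = [
    # (window, router) -- made-up update observations
    (1, "r1"), (1, "r2"), (1, "r3"), (1, "r4"),
    (2, "r1"), (2, "r1"),
]

routers_per_window = defaultdict(set)
for window, router in updates:
    routers_per_window[window].add(router)

MULTI_ROUTER_THRESHOLD = 3
for window in sorted(routers_per_window):
    n = len(routers_per_window[window])
    if n >= MULTI_ROUTER_THRESHOLD:
        print(f"window {window}: correlated burst across {n} routers")
# only window 1 is flagged; window 2 is a single noisy router
```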

Localization: joint dynamic/static
which routers are "border routers" for that burst
topological properties of routers in the burst.

proactive analysis -> deployment -> dynamic ->
 reactive detection -> diagnosis/correction -> static ->

By going back to the configs, lets you see if it's
something happening inside the network, or on the edge.

Specific Focus: firewall configuration
difficult to understand and audit configs

subject to continual modifications
 roughly 1-2 touches per day

federated policy, distributed dependencies
each department has independent policies
local changes may affect global behaviour

(These are pulled from Georgia Tech; 130 firewall
configs.  Builds static connectivity matrix.)
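A toy version of such a connectivity matrix (zones and rules are invented; real input would be parsed from the firewall configs):

```python
# From per-firewall allow rules, mark which zone pairs can reach each
# other directly -- the static connectivity matrix described above.
zones = ["eng", "dorm", "datacenter"]
allow_rules = [("eng", "datacenter"), ("dorm", "eng")]

matrix = {(a, b): False for a in zones for b in zones}
for src, dst in allow_rules:
    matrix[(src, dst)] = True

print(matrix[("eng", "datacenter")])   # True: a rule allows it
print(matrix[("dorm", "datacenter")])  # False: no direct rule
```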

Reactive monitoring...use probes from subnets to
verify reachability/connectivity.

(immediate) open issues
reachability and reliability of controller
service-level probes
diagnostic tools != service-level happiness
policy conformance.

Q: can it give suggested remediation, or provide
config templates for new routers being added?
A:  Good idea!


OK, over to next presenter.  Helps with understanding
BGP data.

BGP data collection and organization (OBGP) Tool
Colorado state university/university of Arizona/UCLA

BGP data collection
takes lots of BGP data, from RIPE RIS, etc.
ISP BGP peer router -> update oreg -> rib+update ->

feeds into gigabytes of data, different formats,
potential errors enter in, and severe lack of metadata.

Other tools can use it, LinkRank, BGP-Inspect, and a
bunch of people cite it in reports and research.

OBGP motivation
Large Volume of Data
data from many sources (RIPE, RV, private data)
Long time scales and very recent (real-time?) data
Slightly different formats
RIPE/RV use different naming conventions
different dump intervals
different timezones for older data
Lack of MetaData
would like to only see desired peers and desired update
 types
Possible errors in the data
are updates missing 

2006.06.05 NANOG-NOTES Peering BOF notes

2006-06-06 Thread Matthew Petach


(This time around I opted to go to the peering BOF and take some notes.
It's the one downside to parallel tracks--wish I could be in two places at
once.  ^_^;; --MNP)


2006.06.05 Peering BOF

Bill Norton introduces the Agenda;
unfortunately, my laptop took so
long to boot, I missed the Agenda
slide.

Doug Toy?, Transit Cost Survey,
data collected at NANOG 36;
he's just here to present the collected
info, not really representing anyone.

Recap:
At NANOG 36, people indicated their cost
per Mb and commit level.
length of contract was usually 1-2 years.

42 data samples collected
avg $25/Mb
$95/$10 were the extremes.

Avg commit level 1440 Mbps

Other observations
as expected, cost per Mbs tends to decrease
as the commit level increases.
Tier1's are more expensive
Cost tends to vary more with Tier 1 providers
than with others.
between 0-500Mb commit level, prices are all
over; at higher commits, prices level out at
the bottom.

Question: Mbps, is that the cap, the usage,
inbound plus outbound?
A: That's the general 95th percentile higher of
the two inbound and outbound.  Committed amount.
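That billing method can be sketched as follows (sample data and rates are invented; the "higher of inbound/outbound, drop the top 5%, bill at least the commit" shape follows the answer above):

```python
# 95th-percentile transit billing: take 5-minute samples, use the
# higher of in/out per sample, discard the top 5%, bill max(commit, p95).

def billable_mbps(in_samples, out_samples, commit_mbps):
    peaks = sorted(max(i, o) for i, o in zip(in_samples, out_samples))
    idx = int(len(peaks) * 0.95) - 1          # simple 95th-percentile pick
    p95 = peaks[max(idx, 0)]
    return max(commit_mbps, p95)

inbound  = [100] * 95 + [900] * 5             # five burst samples
outbound = [80] * 100
print(billable_mbps(inbound, outbound, 150))  # bursts fall in the top 5%,
                                              # so the 150 Mb commit applies
```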

Graphs tend to approach a hard bottom; as
commit increases, doesn't change all that much.
Bottom is around $10/Mb, even though commit
levels increase.
Of samples collected, 2/3 were from Tier1
providers.
90% of contracts are 1-2 year in length,
so didn't cause much variance.
Tier 1 definition is based on Wikipedia
definition.

Questions from audience?
Q: Data looked pretty clean; were there samples
pulled out to make it look cleaner?
A: No, other than people who left fields blank on
the survey.

Q: was there a timestamp of when contract started?
A: much of it wasn't complete.
Mostly within the 1-2 year range for length as well
as start date, so nothing really ancient in there.

BillNorton; people had some concerns about violating
NDA or contract details when filling out the survey.
Where do we draw the line in doing these types of
surveys?
SteveGibbard; NDA is agreement between transit
provider and customer, and this was anonymized
and voluntary.
Data is interesting, both for purchasers and
sellers of transit.

Q: 42 samples graphed, there were 80-100 people in
the room at the time; so the real comments from
the rest of people weren't counted?
A: No, there were less than 50 submissions total,
of which 42 were complete.
Q: Patrick; many people put more than one transit
provider on their form; how were those other
transit providers handled?
A: no clue, he just got a spreadsheet with data.

Back to Bill Norton

Peering Lists Issue -- make available to customer
prospects?  15 mins.
Peering disclosure dilemma:
customers often asked for peering list,
sometimes peerings restricted under NDAs
Metric for determining connectedness, capacity,
 resiliency.
Is there a better metric for customers?
IX capacity in/out
Peering pipe size?

ISPs are getting commonly asked about this, based
on hands raised in the room.
How many people lose business because the customer
doesn't get an answer?
Sylvie, VSNL notes they provide the info when they're
under an RFP; they won't give capacity, they give an
aggregate, they won't go peer by peer, that would be
a violation of NDA.
BillN: are the NDAs written to allow total numbers
like that?
Sylvie: you should not disclose capacity per location
or per circuit, but they don't forbid aggregate total
numbers.

BillN: is there something else that could be given
to the customer that would satisfy their question
without revealing what
Chuck: A lot of ISPs lie about their peerings; he
runs AMSIX, people claim to have multiple gig to
the peering exchange, he knows they don't really
have that much.
Patrick: but he can look at the peering stats on
AMSIX--Chuck notes only members can.
Patrick: customers ask how many gigs they can send
to a provider; it's available headroom, so they
ask their upstreams how much available headroom
is left.  Most providers are having a lot of
trouble getting the right capacities to the
right networks.  The reason many don't
answer is they don't like the answer they have
to give.
Ted Seely, Sprint, how do you solve the problem?
There's lots of traffic that needs to be exchanged,
how do you fix it?
Patrick: how about everyone upgrade to 10gigE
in many places?  If you can't afford it, stop
selling bandwidth.
But most people can't go to all the different
providers, they have to buy from a small subset
of providers.
RAS: No technology problem with doing it; it's
the money.  Not charging enough to cover the costs
of the technology you need to install to cover
the bandwidth.
Ted notes you can't just link at one spot, you
have to connect at six places, and you need to
have links in and out of the site to support the
volume, etc.
Patrick: can you tell us how much they exchange?
40Gb times 6 providers in six locations is probably
more traffic than Sprint has in total.
Ted Seely; it's a time scale issue--yes, it can be
solved, but in what timeframe?

BillN feels it's reasonable information for

2006.06.05 NANOG-NOTES Network Neutrality panel notes

2006-06-06 Thread Matthew Petach


(since there's no slides for these online anywhere, and the slides
were going past
pretty quickly, I have to apologize for the gaps in the notes ahead of
time. --MNP)


2006.06.05 Network Neutrality Panel
[slides are not yet online]

next up is the controversial subject of
network neutrality;
Bill Woodcock will be chairing the panel,
so Randy Bush can go be a member of the
audience again.


Bill Woodcock: network neutrality has
been in front of press and legislatures
for the past several months, and has
been in the works behind the scenes for
almost a year.

Brokaw Price, peering at Yahoo!
Sean Doran, free agent, "rooting" for
Sprint back in the day
Sean Donelan, now at cisco,
Gene Lew, at neustar now, has done cable
operations before.

Sean can pretend to be Vint Cerf for this
panel.  :D

Network Neutrality--what does it mean to
operations people?

History:
Michael Powell, Feb, 2004, defined four internet
consumer rights (chairman FCC)
freedom to access content
freedom to use applications
freedom to connect personal devices
freedom to obtain service plan information

History of net neutrality concept:
feb 2005, madison river telephone company
consent decree
"Madison River shall not block ports used for
VoIP..."

August 2005, FCC policy statement
access lawful internet content
run applications and services subject to the needs
of law enforcement
connect legal devices that do no harm to the networks
competition amongst...
All of these principles are subject to reasonable
 network management.

That last bullet is what gives telcos ability to
quote QoS as a mandatory network management requirement.

March 2006, internet non-discrimination act
senate bill 2360
only 2% of americans have a choice in last mile.

shall not interfere with block, degrade, alter,
modify, impair, or change bits or content

May take measures to protect customers from
attacks
may protect their own network infrastructure

May 2006 Internet freedom and non-discrimination
modifies clayton anti-trust act.
Passed based on party lines.

Turns over to Sean to talk about his thoughts.

Sean Donelan
Doesn't represent anyone right now.
The Huck Finn approach.

And no, you can't configure this on your router.

What are we talking about?
Rep John Conyers (D-MI)
"internet as we know it is at risk"
Same guy noted in 2003 about cable operators being
smart enough to not poison their customer pool

Not really a new issue:
ANS CO+RE in 1991
unapproved networks filtered from R&E gateways
ANS and CIX, June 1992
ANS agreed to "provisionally" interconnect
CIX proposed filtering resellers (1994)
NSF NAP/NREN solicitation (1993)
required NSPs connect to all priority Network Access
 Points (NAPs)
uncertainty created opportunities for new service
providers

pizzahut.com debate--couldn't reach it from university
networks, but you could reach it from UUnet.  Two-tiered
internet even back then.

Regulations chasing change
Title I -- General FCC Authority (pre-1984)
Title II -- common carrier; voice/phone calls/later data
Title III -- spectrum licensing; broadcast TV and radio
Title VI -- Cable Television (post-1984)

FCC moved DSL (but not UNE-L and cable modems) back
into Title I again.
VoIP is still unknown.

Telecom act of 1996 was another biggie.
before 1996, enhanced services transmitted over a common
carrier.
After 1996, info services were defined by the telecom
act; they run over telecommunications provisioned by
anyone, not just common carriers.
That's why cable companies can offer telecommunications
even though they're not common carriers.  Also why
Radio and TV can put IP in subcarrier without being
common carriers.

So, we have a really odd blend, they're not mutually
exclusive.  Now you have the potential for new entrants
from any direction.
Most home services now come over cable companies;
multiple resellers of same product wasn't enough
to satisfy customers.

Customers are very fickle
Interfering with a customer's use of the Internet would
hurt the provider's business.
No-one can predict what the next "Killer App" will be.
and *everyone* can complain
Both sides need each other to succeed.
Predicting in advance what customers want and will
 consider improvement vs interference is hard to the
 point of being nearly impossible.
And what customers consider an improvement one year
may become interference, or what was once interference
will now be considered improvement.

If you start writing regulations, you imply investigative
and reporting requirements along with it.
Who would enforce these regulations?
And regulations seldom prevent people from being evil,
the government simply sets the price for being evil.
Broadcast fairness doctrine--equal time; nobody can
buy public advocacy on national networks, except
for politicians, who get the best rates.

Kingsbury Commitment--AT&T had to interconnect with
everyone--but that meant you didn't need a second
long distance company, so ended up supporting monopoly
expansion.

Universal service backfired by giving a monopoly
to thos

2006.06.05 NANOG-NOTES IPv6 deployment at Comcast

2006-06-06 Thread Matthew Petach


Randy Bush, moderator of the next section

He begged to do the introduction for a
specific reason; deployment of IPv6
that is beneficial to this company's
P&L; possibly the only one in existence
thus far.
He did a very studied and purposeful view of
using IPv6 to benefit his company!


IPv6 @ comcast
Managing 100+ million IP Addresses
[slides are at:
http://www.nanog.org/mtg-0606/pdf/alain-durand.pdf

Alain Durand
Office of the CTO
Director IPv6 Architecture
[EMAIL PROTECTED]

Agenda
Comcast needs for IPv6
Comcast plans for IPv6
Challenges

simplistic view of comcast IP problem
20 million subscribers in video
2.5 set-top boxes per subscriber
2 IPs per set-top box per DOCSIS std.
total: 100 million IP addresses needed

that's not including high speed data,
nor comcast digital voice, nor mergers/acquisition

Used to use RFC1918 for cable modems.
that space was exhausted in 2005
Comcast recently was allocated the largest part
of net 73 and has renumbered cable modems in that space.
In the control plane, all devices need to be remotely
managed, so NAT isn't going to help us
IPv6 is the clear solution for us
However, even though we are starting now, the move to IPv6
isn't going to happen overnight.

Triple play effect on the use of IP addresses
                           2005 (HSD only)   2006 (Triple play)
Cable modem                       1                  1
Home computer/router              1                  1
eMTA (voice adapter)              0                 1-2
Set-top box (STB)                 0                  2
Total num of IP addresses        1-2                8-9
(assumes 2.5 STBs per household)
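A quick arithmetic check of the figures above (all inputs are from the talk; the 1.5 used for the eMTA is just the midpoint of the stated 1-2 range):

```python
# Worked check of the Comcast address counts quoted in the talk.
SUBSCRIBERS = 20_000_000   # video subscribers
STB_PER_HOME = 2.5         # set-top boxes per subscriber
IP_PER_STB = 2             # IPs per set-top box, per DOCSIS

# "100 million IP addresses needed" for video set-top boxes alone:
video_ips = SUBSCRIBERS * STB_PER_HOME * IP_PER_STB

# Per-household totals from the triple-play table:
hsd_only_2005 = 1 + 1                                        # modem + computer/router
triple_play_2006 = 1 + 1 + 1.5 + STB_PER_HOME * IP_PER_STB   # + eMTA + STBs

print(video_ips, hsd_only_2005, triple_play_2006)
```

This lands in the middle of the stated 8-9 range for a triple-play household, and matches the 100 million figure for set-top boxes.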

IP Addresses: Natural Growth vs New Services
nice graph--based on trends, not real data

Contingency plans:
use public address space
use "dark" space (pre-RFC1918 space)
federalization (split into separate domains)
(trying to avoid that)

IPv6 strategy
start early
deployment plans started back in 2005
deploy v6 initially on the control plane
for the management and operation of the edge devices
they manage
DOCSIS CM, set top boxes, packetCable MTA (voice)
be ready to offer customers new services that use
IPv6 LATER, not now--first step is to just be able
to manage their own gear.

migration to v6 must be minimally disruptive.
deploying v6 must be in roadmap for all vendors
ops, infrastructure, systems must be ready to support
v6 devices.
over time, IPv6 will penetrate Comcast "DNA"

Deploy v6 for IP addrs of the CM and STB
architecture: dual-stack at the core, v6 only
at the edges
deployment approach: from the core to the edges
backbone->regional networks->CMTS->devices
this is an incremental deployment; existing
deployments will be untouched in the beginning
Follow same operational model as with IPv4;
lots of DHCP!

News Flash:
All routers on Comcast IP backbone are IPv6 enabled
first ping on 10GE production backbone
TTLs aren't quite working properly, still
checking on that.
[so, even mainstream vendors still don't have v6
working quite properly yet]

New CM will be v6 ready (dual-stack capable)
On an IPv4 only CMTS, CM will have v4 address only
On v6 enabled CMTS, CM will only have v6 address
No CM boxes will have both; if they could support
v4 on all, wouldn't have this issue to start with!

Provisioning, Monitoring, Back-Office
mostly software upgrade problem
not unlike the Y2K issue
fields need to be bigger in database and web scripts
Should system "X" be upgraded for v6?
does it communicate with devices that are v6 only?
payload Q: does system "X" manipulate IP data that
could be v6 (store, input, display)
Comcast inventory analysis
About 100 systems
10 need major upgrades for transport
30 need minor upgrades just for display/storage
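The "fields need to be bigger" class of upgrade can be illustrated with a small sketch (illustrative only, not Comcast's actual tooling; the column widths are hypothetical):

```python
# Sketch of a display/storage audit: will a database column sized for
# dotted-quad IPv4 text hold an IPv6 address in its canonical form?
import ipaddress

V4_COLUMN_WIDTH = 15   # "255.255.255.255"
V6_COLUMN_WIDTH = 45   # longest textual IPv6 form (v4-mapped)

def fits(addr: str, width: int) -> bool:
    """True if the canonical text form of addr fits in the column."""
    return len(str(ipaddress.ip_address(addr))) <= width

print(fits("203.0.113.7", V4_COLUMN_WIDTH))                             # True
print(fits("2001:db8:ffff:ffff:ffff:ffff:ffff:ffff", V4_COLUMN_WIDTH))  # False
```

Systems that only store or display addresses need this kind of minor widening; systems that actually open connections to v6-only devices need the transport upgrade.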

Back office management of cable modems.
network transport will still be v4
however, back office systems may need to be modified
to display/input/store v6 related data (CM v6 addr)
Payload can be v6 while transport is v4.

IPv6 certification
Basic IPv4 compliance taken for granted today
IP level component testing is limited
IPv6 is still new technology
maturity level of vendor implementations vary greatly
some have v6 for 10 years
 even those have features not fully baked
others have nothing, will rush to buy 3rd party stack.
Bar for v6 product acceptance has to be higher than what
we typically accept now for IPv4
Formal v6 requirement list before purchasing
v6 conformance testing/certification to accept product

v6 training
most engineers have heard about it, don't know much
fear factor
can expect new hires to have 2-4 years of v4, but 0 v6
initial and continuous training process is critical!

v6 vendors
CM (cable modems) (DOCSIS 3.0/2.0b)
CMTS
Router
Provisioning system
OSS
Video/Voice back-end systems
Retail Market (Consumer electronics)
 Home Gateways
 Video (eg TV with embedded cable modem)

Right now, provisioning system is most challenging.

v6 protocols
MIBS:
some OSS vendors implement RFC2465 (deprecated)
some router vendors implement partial RFC4293 (new
 combined v4+v6 MIB, but onl

2006.06.05 NANOG-NOTES interdomain routing via Wiser, Ratul Mahajan

2006-06-06 Thread Matthew Petach


2006.06.05
A simple coordination mechanism for interdomain routing
[slides are at:
http://www.nanog.org/mtg-0606/pdf/ratul-mahajan.pdf

Ratul Mahajan
David Wetherall
Tom Anderson
University of Washington now @ Microsoft Research

the nature of internet routing
within a contractual framework, ISPs select routes
that are best for themselves.
Potential downsides
higher BW provisioning
requires manual tweaking to fix egregious problems
inefficient end-to-end paths

An alternative approach: coordinated routing
ISPs have joint control
path selection is based on the preferences of both ISPs
Potential benefits
lower BW provisioning
no egregious cases that need manual tweaking
efficient end-to-end paths
 basis for interdomain QoS

Existing mechanisms cannot implement coordinated routing
route optimization boxes help (stub) ISPs pick better
routes from those available
MEDS implement receiver's preferences.
Cannot create better paths that don't already show up
in the routing table.

What makes for a good coordination mechanism?
MEDS have some nice properties
ISPs can express their own metrics
ISPs are not required to disclose sensitive info
lightweight
requires only pairwise contracts
Provides joint control and benefits all ISPs.

Our solution: Wiser
operates in a lowest-cost routing framework
downstream ISPs advertise their cost
upstream ISPs select paths based on both their
 own and received costs.

Problems with vanilla lowest-cost routing
ISP costs are incomparable
Can be easily gamed

When you bring incomparable costs together, the ISPs
that use higher costs win out.

Cost normalization
Normalize costs such that both ISPs have "equal say"
Normalize such that sum of costs is the same.
Makes the system harder to game.

Bounds on cost usage
Downstreams log cost usage of the upstream ISPs
Compute the ratio of avg. cost of paths used and
announced
Contractually stipulate a bound on the ratio.
Similar to existing ratio requirements.
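The "equal say" normalization can be sketched roughly as follows (our reading of the slides, not the authors' code):

```python
# Toy sketch of Wiser-style cost normalization: scale the downstream's
# announced path costs so both ISPs' costs sum to the same total,
# making otherwise incomparable metrics carry equal weight.

def normalize(my_costs, their_costs):
    """Scale their_costs so sum(their_costs) == sum(my_costs)."""
    factor = sum(my_costs) / sum(their_costs)
    return [c * factor for c in their_costs]

mine = [10, 20, 30]    # e.g. internal latency-based costs per path
theirs = [1, 2, 3]     # same paths, priced on a different scale
scaled = normalize(mine, theirs)
# Joint path choice: minimize the sum of the two (now comparable) costs.
best = min(range(len(mine)), key=lambda i: mine[i] + scaled[i])
print(scaled, best)
```

Because both sides' totals are forced equal, an ISP gains nothing by inflating its raw cost numbers, which is the anti-gaming property claimed in the talk.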

Wiser in action
Announce costs in routing messages.
normalization occurs between ISP pairs.

Example results
using major ISP topologies for experiments
Wiser provides better control under link failure.
Wiser produces shorter path lengths

Implementation
XORP prototype
Simple, backward-compatible extensions to BGP
embed costs in non-transitive BGP communities
border routers jointly compute normalization
factors and log cost usage
a slightly modified BGP decision process
Benefits even the first two ISPs that deploy it.

Summary
Wiser is a simple mechanism to coordinate
interdomain routing
may lower provisioning, reduce manual tweaking,
produce more efficient paths and help with interdomain
 QoS
Feedback: [EMAIL PROTECTED]
http://www.cs.washington.edu/research/networking/negotiation/

Danny McPhereson:
Q: how do you normalize across multiple ISPs?
A: routing advertisements happen on the sum of the
costs announced from me to you, and from you to
me.  He derived it from different values in
his experimentation; utilization and latency
in general.

Q: Randy Bush: Whatever metrics are, you just normalize
by summing them up.  But Danny notes if you have
multiple ISPs, how do you normalize them together?

Q: Danny: where was the 20ms of control plane savings
seen that he claimed in slide 11?
A: That was based on default ISP policy, prefer customers
over peers, etc.
So the delay was control plane plus data plane; it
wasn't control plane alone.
He based it on the old Rocketfuel equation.

Randy Bush: vendors--this is cool stuff, open your
ears.

Break time now.


2006.06.05 NANOG-NOTES Pretty Good BGP Josh Karlin

2006-06-06 Thread Matthew Petach


2006.06.05 Pretty Good BGP
Josh Karlin, Stephanie Forrest, Jennifer Rexford
slides are at:
http://www.nanog.org/mtg-0606/pdf/josh-karlin.pdf

Main idea: delay suspicious routes
lower the preference of suspicious routes for 24 hrs
Benefits:
network has a chance to stop the attack before it
 spreads
accidental short-term routes do no harm
no loss of reachability
adaptive
simple

Algorithm
Detection:
monitor BGP update messages
treat origin AS for prefix seen in past few days as normal
new origin AS treated with suspicion for 24 hours.
treat new sub-prefixes as suspicious for 24 hours.
Response:
suspicious prefixes given low localpref, not used or
 propagated
suspicious sub-prefixes are temporarily ignored
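The origin-AS rule can be sketched in a few lines (a paraphrase of the slides; the 10-day history window here is an assumption standing in for "past few days"):

```python
# Minimal sketch of the PGBGP detection rule: an origin AS not seen
# for a prefix within the recent history window is treated as
# suspicious for 24 hours (and would get a low local-pref).
import time

HISTORY_SECS = 10 * 86400    # "past few days" window (length assumed)
SUSPICION_SECS = 24 * 3600   # depref period for new origins

last_seen = {}               # (prefix, origin_as) -> last time observed

def classify(prefix, origin_as, now=None):
    """Return 'normal' or 'suspicious' for an announced (prefix, origin)."""
    now = time.time() if now is None else now
    seen = last_seen.get((prefix, origin_as))
    last_seen[(prefix, origin_as)] = now
    if seen is not None and now - seen < HISTORY_SECS:
        return "normal"
    return "suspicious"      # new origin: depref for SUSPICION_SECS

print(classify("192.0.2.0/24", 64500, now=1000.0))   # first sighting
print(classify("192.0.2.0/24", 64500, now=2000.0))   # seen recently
print(classify("192.0.2.0/24", 64666, now=2000.0))   # new origin
```

A real deployment keys this off the BGP update stream and also flags new, more-specific sub-prefixes, which are ignored outright rather than merely depreffed.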


Example prefix hijack (without PGBGP)
same specificity

Example sub-prefix hijack (without PGBGP)
two /9's cut from a /8

In these examples, AS 5 acted in its own self interest,
but it helped protect the rest of the net beyond it.

Simulations of two deployment strategies
Random, and core+random.
Random, with 0 deployed, half the network will
be affected, better solution as higher fraction
of ASes deploying it.
If core of network deploys
(core ASes have at least 15 peer-to-peer links)
only 62 out of the 20,000 ASes.
All but 2% of network protected with that.

Sub-prefix hijack suppression a bit tougher,
but still good results as core implements it.

hijacks in the wild
1997, AS 7007 sub-prefix hijacked most of the internet
for over 2 hours
Dec 2005 26-95 hijackings during month
jan 2006, panix's /16 stolen by conEd
Feb 26, 2006, sprint and verio carried TTNET
as origin AS for 4/8, 8/8, and 12/8

IAR: internet alert registry
IAR verifies hijack attempts
a near realtime database of suspicious routes
email alerts are sent to those who opt-in for
the ASes they choose to receive alerts for
 operators receive alerts only when their AS has
 caused the hijack or is the victim
Tier1 ASs receive one hijack alert per day typically
working prototype

Solutions with guarantees (and lots of overhead)
sBGP
soBGP
psBGP
Anomaly detectors
Whisper
MOAS lists
Geographic based
Good Practice
proper route filters

Route filters protect the internet from you and
your customers, not vice versa.

Why pretty good BGP?
maintains autonomy
incrementally deployable
no flag day
no change to the BGP protocol
Effective with a small deployment
only requires a software upgrade or change in config
generation.
Most important, requires minimum operator intervention

http://cs.unm.edu/~karlinjf/pgbgp/

Q: (someone)? from UCLA--if you delay the route for
24 hours, if the original AS withdraws it, what happens?
A: you'll still end up using the new route, as it just
has a lower localpref, so moves will still work.

Q: Danny McPherson -- what if origin AS is spoofed
to match the origin AS by the hijacker--does this
stop it?
A: No, that's a man-in-the-middle, or at
least it looks like it, and this can't handle
that, so it's only pretty good; that would be
a later phase.
Q: He also notes if your prefix is hijacked,
your email alert is likely to get jacked
as well.
A: True--subscribe from multiple prefixes/domains
to be safe!

Q: Phil Rosenthal, ISPrime.  What happens when a
small ISP in south america leaks the internet
to an upstream that doesn't filter them?
A: Yes, those leaks suck up a lot of memory; this
doesn't help because the origin AS is still
correct, but the intervening paths are bogus.
If the route for a sub-prefix is seen with the
origin AS along the path, not seen as a hijack.

Q: Jared Mauch, NTT america; follow-on point, you
just have a strange AS along the path, but the
rest of the origin is correct.
A: No, they don't look at the whole path yet;
maybe in the future

Q: Sandy Murphy, Sparta--thinking of statement at
the end, it handles backup routes ok.
it works best where operational changes of the
origin happen at a human-paced interval.
There are some prefixes which seem to oscillate
at a much more rapid pace.  What about studying
prefix behaviour over a longer period of time?
Is it locked into 24 hours, or can be adjusted
to match better frequency?
A: Not locked at 24 hours, could be adjusted to
different 'sensitivity' as needed.

Q: Randy Bush, IIJ: The internet is not static, those
things which rely on viewing it as static like
route flap dampening can bite us.  We need to enable
more and more dynamic behaviour, not less, and Randy
thinks this is going the wrong direction.
A: That's nice, but presenter disagrees and thinks this
is a helpful step in the right direction.


2006.06.05 NANOG-NOTES AS-PATH prepending measurements

2006-06-06 Thread Matthew Petach


2006.06.05 Active measurement of the AS path
prepending method.
[ slides are at
http://www.nanog.org/mtg-0606/pdf/samantha-lo.pdf

This is the research forum part of the meeting,
people doing real research on real networks.
Samantha Lo and Rocky KC Chang
department of computing
{cssmlo,[EMAIL PROTECTED]
Kowloon, Hong Kong

Dr. Rocky Chang is her supervisor.

Motivations:
Apply AS-path prepending on a trial-and-error
basis to control the inbound traffic.
How effective can the AS-path prepending method be?
what would happen to the routes after prepending on
a link?

The measurement setup; dual-homed stub AS.
connected to 9304 and 4528
Two upstream links, L1 and L2.
Announce a beacon prefix to both links with
prepending on L1.

graph of prepending length on the X axis.
from 0 to 5, then back down.  Wait 6 hours
between each change to stabilize.
goes from 102:29 at 0 on L1, to 14:91 at 5
on L1.
Greatest change is between prepending length
of 2 and 3.
When decreasing, see an unbalanced phenomenon.

Who was responsive to prepending?
Incoming link to beacon prefix changes,
next-hop of routes also changes in remote AS

Passive-responsive are those where the next-hop
for the route didn't change, but the subsequent
path is different.

Active-responsive, next-hop actually changes.

Non-responsive ASes, see no change.
43 ASes
no change in either incoming link or next-hop
On L1: 14 ASes
use one next-hop only

Passive-responsive ASes
26 ASes
incoming link change
no change in next-hop

Active responsive ASes
47 ASes:
both incoming link and next-hop changes
possible reasons:
apply shortest-path policy
no localpref override.

Active responsive ASes:
UUnet,
Teleglobe,
bunch of others, slide went pretty quickly

Most of them are located 4 AS-hops away from L1;
after prepending, they are 5 AS-hops away from L2.

Routes to L1 at 4, via L2 at 6 when starting.

What if both ASpaths via L1 and L2 have the
same length?
equal to or greater policy:
AS1239
located 4 AS hops via L1, and 5 AS hops via L2.
AS3662 has prepended once.
So prepending once on L1, 5 < 6, no change.
prepending twice on L1, 6 = 6, route changes to L2
even though they're equal.

AS3257, located same as 1239
when increasing prepending to 2,
L1 is 6 (4+2), L2 is 6 (5+1), but still uses L1.
When increased to 3, 7>6, it finally changes to L2.
When decreasing to 2 again, it's equal again, but
it doesn't flip back to L1 until the prepending is
down to 1, at which point 5 < 6, then it finally
shifts to L1.
This is the "greater than" policy.  Same prepending
length, uses different routes.  'sticks' to previously
used path.
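The two tie-break behaviours observed above can be sketched as follows (illustrative only, not the measured ASes' actual configs):

```python
# Sketch of the two observed tie-break policies: an "equal or greater"
# AS (like 1239 above) switches away as soon as the prepended path is
# no shorter, while a sticky "greater than" AS (like 3257) keeps its
# current path on ties.

def choose(len_l1, len_l2, current, sticky):
    """Pick 'L1' or 'L2' by AS-path length; sticky keeps ties on current."""
    if len_l1 == len_l2:
        return current if sticky else "L2"
    return "L1" if len_l1 < len_l2 else "L2"

# AS-hop distances from the talk: 4 via L1, 6 via L2 (L2 prepended once).
for prepend in (1, 2, 3):
    print(prepend,
          choose(4 + prepend, 6, "L1", sticky=True),    # AS3257-like
          choose(4 + prepend, 6, "L1", sticky=False))   # AS1239-like
```

At prepend length 2 the paths tie (6 vs 6): the non-sticky policy has already moved to L2, while the sticky one only moves at length 3 and, on the way back down, only returns at length 1, producing the observed hysteresis.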

BGP update graph.
After prepending, update messages continue for several
hours.

Conclusions and future work
Route changes are introduced by active-responsive
ASes
shortest path policies
topology -> when they will change
possible applications
predict amount of traffic shifts
discover the upstream ASes policies

Thanks to Michael Lo and Lorenzo Colitti

Q: Randy Bush, IIJ, notes that from her slides,
Tim Griffen describes her "delayed reaction"
as 'BGP wedgies'.  It comes from the BGP tie
breakers, it's not something you'll be able
to predict.
A: She notes it's a policy choice of tiebreakers.
Q: Randy insists it's not a matter of policy
choice; it's built into BGP, and not something
they have control over.

Moving on to next speaker


2006.06.05 NANOG-NOTES TCP authentication with Ron Bonica

2006-06-06 Thread Matthew Petach


2006.06.05 Ron Bonica
slides are at
http://www.nanog.org/mtg-0606/pdf/ron-bonica-joint%20presenters.pdf

Authentication for TCP-based routing and management
protocols, from Juniper.

A joint presentation, Alcatel, Cisco, Juniper.

Starts at NANOG at Washington 2 years ago,
security BOF; someone said they would MD5 auth
if they could update keys without bouncing their
sessions.

Surprisingly small number of people actually
using MD5 authentication.

Motivation
many ops don't authenticate TCP based routing
protocols
RFC 2385 doesn't meet operator needs.

Concerns:
CPU utilization
not so much of an issue for Juniper, [Cisco, Alcatel]
Juniper architecture separates forwarding and control
plane
Key management
hard to change keys
requires bouncing sessions
Weak cryptography
easy attacks against MD5

Alternative approaches
Application:
in the protocols (BGP, LDP, etc)
TLS
--too much of a headache
transport
TCP
Network
IKE/IPsec

Chosen Approach:
better TCP authentication
enhanced TCP auth option
Hitless key rollover
key chains configured on peer systems
time based key rollover
key identifier
Stronger cryptography
HMAC-SHA-1-96
CMAC-AES-128-96

Enhanced Authentication Option
Kind - Length T/K Alg ID Res Key ID
KEY

Key chain
contains a tolerance parameter and up to 64 keys
each key contains
id [0..63]
auth algorithm
shared secret
start and end time, both for transmit and receive

Sending system procedure
identify active key candidates
start-time <= system-time
end-time > system-time
if there are no candidates, log event and discard
outbound packet
If there are multiple candidates, select key with most
recent start-time for sending

Calculate MAC using active key
calculate over TCP pseudo header, TCP header, and TCP
 payload
by default, include TCP options
(if you set the T bit, ignore TCP options)
Format enhanced auth option
active key ID
...
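The key-selection and MAC steps can be sketched like this (a hedged sketch of the procedure above; the field layout is simplified, and the real MAC covers the TCP pseudo-header, header, and payload, not an opaque blob):

```python
# Sketch of the sending-side procedure: pick the active key from the
# key chain by time, preferring the most recent start-time, then
# compute a truncated HMAC over the segment.
import hashlib
import hmac

def active_key(keys, now):
    """Return the candidate key with the most recent start-time."""
    candidates = [k for k in keys if k["start"] <= now < k["end"]]
    if not candidates:
        # Per the procedure: log the event and discard the outbound packet.
        raise LookupError("no active key")
    return max(candidates, key=lambda k: k["start"])

def mac(key, segment: bytes) -> bytes:
    # HMAC-SHA-1 truncated to 96 bits, as in the proposal's HMAC-SHA-1-96.
    return hmac.new(key["secret"], segment, hashlib.sha1).digest()[:12]

chain = [
    {"id": 0, "secret": b"old-secret", "start": 0,   "end": 1000},
    {"id": 1, "secret": b"new-secret", "start": 900, "end": 2000},
]
k = active_key(chain, now=950)   # overlap window: newest start wins
print(k["id"], len(mac(k, b"tcp-header-and-payload")))
```

The overlapping validity windows are what make the rollover hitless: both ends can have the old and new keys configured, the sender picks the newer one, and the receiver (with its tolerance) still accepts either by key ID.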

Receiving system proc.
lookup key specified by TCP option
determine whether that key is eligible
start-time <= system-time - tolerance
end time > endtime + tolerance
[not sure if that shouldn't be
  end time > system time + tolerance, actually.  --MNP]
Calculate MAC
if calculated MAC matches received MAC, accept the packet

auth error procedure
discard datagram
log
do NOT send indication to originator
(doesn't adjust TCP counters)

Config example:
see examples on slide deck, they went past too
quickly.

Q: how many of us are authenticating IBGP sessions?
A: majority in the room are
Q: how many of us are interested in a better way
of handling key changes?
A: lots of people!

Q: Russ Mundy?: are you planning on taking this work
into IETF to publish through that path?
A: Yes, went to RPsec, RPsec2, and SAAG working group
mailing lists.

Q: Randy Bush, IIJ, were there any simpler proposals?
Clearly this was designed for the IVTF.
A: None that weren't already rejected by the team
themselves.

Q: Steve Bellovin, Columbia U.  No longer security
IAD. Why reject IKE and IPsec?  It does all this, plus
more (which isn't so good)
Why not tie algorithm to the key, get it out of the
header, get more bits for other uses?
A: actually, alg. could be taken from the key; that's
the type of comment they're looking for in the IETF;
one arg for putting it in option is that is a quick way
to check without calculating the MAC,
second Q: is more interesting; why not use IKE with
just auth. -- no need for confidentiality in this
case.  It was discussed, one idea was to just use IPsec
with preshared keys, but then you have same key
rollover system, and key negotiation.  Those are
all probably good ideas.  Would like to do this
as a first phase, allow for manual key rollovers,
and in a second phase, you can negotiate a key
for one-time use.
Q: but in IKE/IPsec, you can use preshared key
mode in IKE;
A: but in this case, you'd still
need a system like this to roll over the keys
since you want to be able to change keys on each
end asynchronously

Christopher Ranch: Made the right choices, thank
you!

Q: Eric ? from cisco: why is this more complicated?
A: being able to have multiple keys and roll them
over.   There are networks that have used the
same key for 10 years since they don't want to
bounce their sessions.  You just can't do that
with IKE.
This is an operations driven requirement, that
it be hitless.

Q: Jared Mauch, NTT america: how does he go in and
take his iBGP session, roll to this system without
making the NANOG mailing list?
A: [no answer provided]

Q: Bora Kilf?, Broadcom: about IKE not being able to
roll keys without a hit; if you use IKE v1, you can
lose the IKE SA, have the IPsec SA, rekey your
IKE SA, and then rekey your IPsec SA.
He agrees with Steve, looks like they're re-inventing
large parts of IKE/IPsec all over again.

Q: Gary?: wants to avoid colo meets; you want to be able
to re-set keys without having to coordinate people
in different timezones.

Encourage people to participate in SAAG to discuss
this and provide feedback.

Research forum speakers up next.


2006.06.05 NANOG welcome notes

2006-06-06 Thread Matthew Petach


(getting my notes from today's talks out, finally.  ^_^ ; --Matt)


2006.06.05 Welcome notes

Program chair, Steve Feldman
Thanks to Rodney Joffe, Neustar/Ultra services

People who were instrumental in getting connectivity
into the room here deserve a big round of applause

NANOG program committee,
Joe Abley
...wow, slide went fast.  ^_^;

Agenda changes--none so far.

Remote participation:
streaming options:
http://www.nanog.org/streaming.html
Realmedia
multicast MPEG2, 

Reminders:
network security
don't use cleartext passwords!
do use end-to-end encryption (ssh, vpn)
PGP key signing
see link off www.nanog.org for details
Beer and Gear tonight.

Interpreting badges
Blue badge: steering committee
yellow badge: program committee
red badge: mailing list committee
green = blue + yellow

Green dot == peering coordinator
Black dot == network security
red dot  == PGP signing

Lightning talks
six 10 minute slots available
on-topic for mailing list
signups http://www.nanogpc.org/lightning
deadline is 12:30pm Tuesday
talks will be wednesday morning.

Over to Rodney Joffe now.

Welcome to NANOG 37
Almost the NANOG that wasn't.
significant effort, thanks to Merit and others;
an uphill battle, both from time and location.
Encourage other people to host!
Not expensive, just takes time.

Benefits of Hosting:
choose the location (sort of): much easier to do
at home.  if you wait too long, you don't have a
choice of venues.
Tee-shirt: you get to pick designs for it!
Wonderful engineering opportunities!
NANOG Community--you get to give something back
to the community.  He's hosted through three
companies now over time.
Exposure--touched on by Randy and Bill yesterday.
Generate favourable goodwill for your company;
it's not a marketing event, but still gets name out.

Network Architecture
Plan A:
Existing SBC/ATT OC12 at demarc
cool!  we should be able to get connectivity real cheap
Nope: SJCC exclusive licensee says $10K charge to access
 it, even if gear is donated.

Plan B:
AboveNet GigE at the DEMARC, hot but unused,
and owned by AboveNet from an earlier conf.
Nope: SJCC exclusive licensee still has mandatory
$10K access fee

Plan C:
Hilton has fiber run to SJCC infrastructure
OK.  Let's use microwave to connect Hilton to
MAE-WEST 55 S. Market st.

Plan C:
Microwave from the Hilton to Joe Pucik's 13th
floor office at mae-west
fiber from 12th floor to 2nd floor meet me room
cross connect to S&D
across S&D fiber to PAIX Palo Alto to Jared Mauch/NTT

But... 55 S Market == Faraday cage.  :(  No signal.
FC quality glass.

Plan D:
Tango 54mb microwave from hilton to Dave
'Bungie' Rand's 18th floor penthouse roof
at 50 west san fernando
fiber from Bungie to Switch and Data PAIX,
to Jared Mauch at NTT.

Hotel wasn't the biggest challenge, connectivity
was.
First thing you need to do after picking a hotel
that can handle 500 people, 200 rooms.  ONLY pick
hotels that have fiber with someone you recognize
from NANOG.
Second thing is to make sure there aren't any
access or use fees that have nothing to do with
the equipment or bringing gear in.  Otherwise,
you find there are interesting charges that can
be levied against you as you progress!

Shout outs:
Jared Mauch
Christopher Queseda
Joe Pucik?
Ralph Whitmore
Dave Rand
UltraDNS and Neustar Volunteers!

$10,000 colo space--picture is evil.
Other end is the penthouse of the Knight Ridder
building.  Nice shot!


2006.06.04 NANOG Open Community Meeting Notes

2006-06-05 Thread Matthew Petach


Here's my notes from tonight's (overly!) long Open
Community Meeting.  When I said "yes" to going
long vs cutting the discussion short, I was thinking
we'd go long by 10 minutes or so...not by a whole
hour+  ^_^;;

Matt



2006.06.04 NANOG Open Community Meeting
NANOG/San Jose NANOG SC <[EMAIL PROTECTED]>

AGENDA
Steering Committee (Randy)
Program Committee (Steve)
Financial Report (Betty)
Mailing List Report (Rob)
Open Discussion (Randy)
Aside from vetted presentations, we're
speaking as individuals, and we differ!
The microphone is open throughout.

NANOG Steering Committee Report
SC Goings On
Get meeting schedule under better control
thank you, Rodney!
Charter amendments
voted on in October
ML Policy and Procedures
Copyright and IPR issues
Press, photography, ...

Future Meetings
Thank you Rodney for pulling chestnuts out of
the fire!
October in St. Louis with ARIN
since it's meeting with ARIN, it's Monday/Tues,
they have Weds-Fri
Joe organizing Toronto first week of February 2007
Josh is looking at Miami for the first week of
June 2007

Mailing List
Still working with ML Panel to document their
process
Still working with MP(ML?) Panel to develop an appeals
process (vote in October)
Statistics are published monthly on the NANOG web
site
Please volunteer for the ML Committee
a selection will happen in Oct. 3+1 so far, that's
the lowest limit right now.  Need to share the
load.

ML Panel Appointments
No terms, etc. in current charter
Straw proposal charter change
parallels SC and PC
two year terms
staggered
two sequential terms max without a vacation

ML Panel process cont.
this would give members a light at the end of the tunnel
volunteers would know what they're signing up for
Allows change without bad vibe of removal
normal organizational practice

Charter Changes
Randy and Steve F are working with Dan Golding and
Steve Gibbard, the old charter group,
to get the known charter revision proposal pieces in order
for San Jose Meeting.
(for voting in October)
(not sure they'll quite make it)

Dan Golding, there's a lot of bootstrapping
language in the current charter, changes will
remove the bootstrapping language, lists terms,
which things are staggered, etc.
Hoping to publish by the end of this meeting.
Mostly bookkeeping.

Rights in Data
NANOG trademark is held by MERIT
Presentations are copyright by the author
Right to freely distribute, but not modify,
granted to NANOG
PC is drafting this formally
copyright notices on slides are OK if small and
unobtrusive
But what about rights to Streaming and Videos?

Press
Press likely to be present in San Jose
MERIT may prominently tag their badges
MERIT will ask that no pictures be taken in the actual
meetings themselves
This ensures that members are free from having their
picture published without their consent and without
prejudice as to who is taking the pictures.

Ren has been taking pictures of individuals to
post on the Multiply site; so, if you haven't
had your picture taken in the hallway, talk to
her.  But again, not in the meeting room.

Mailing Lists
Engineering and Ops Discussion only
<[EMAIL PROTECTED]>
Discussions about NANOG itself
<[EMAIL PROTECTED]>
Steering Committee <[EMAIL PROTECTED]>
Program Committee <[EMAIL PROTECTED]>
ML <[EMAIL PROTECTED]>

Agenda
Over to Program Committee (Steve)
Steve Feldman, PC Chair
Disclaimers--all opinions, errors, are his
program is result of hard work by the whole PC

NANOG 37 program
37 submissions, up from 26
22 accepted
1 cancelled
14 rejected

No breakdown on the rejections; they were fine, but
there wasn't enough time to put them all in; some
will resubmit for next time.

Areas for improvement
Speaker solicitation--PC still needs to solicit
more; get broader representation of NANOG community
Scheduling
this meeting was harder, didn't know where it was
going to be.

Program Format
Mon-Weds format
Morning Plenaries
Afternoon BOF, Tutorials
Evening social events
 (moved program out of evenings for more social
  time)
Newbie meeting--beefed it up with more content
from Bill.

About 50/50 split in terms of how many like each
format.  After October, will use survey results
for subsequent meetings.

Lightning Talks
Criterion: on-topic for mailing list
Signups start Monday morning
instructions during plenary
deadline: 12:30 Tuesday
PC selects 6 best submissions
Take place during 1 hour of Weds morning plenary

Feedback
Talk to us!
PC members have yellow and green badges
Send mail
[EMAIL PROTECTED]
[EMAIL PROTECTED]
And fill out your surveys!!

Open Discussion
What topics would you like to hear more (or less)
 about?
How can we (PC and community) get speakers on those
 topics?
Charter revisions
some proposed PC updates, take out his name, bootstrap
details out of it; specify when the PC is selected by
the SC, things like that.  Draft will be published on
the futures list.
Transparency
more debate recently; how much of the workings
of the PC should be visible to the outside.
PC members feel the discussion has to be in

2006.06.04 NANOG new attendee orientation meeting notes

2006-06-05 Thread Matthew Petach


Here's my notes from tonight's 'NANOG new attendee orientation
meeting'.

Matt




2006.06.04 NANOG New Attendee Orientation

NOTES:
NANOG Organization
Steering Committee (blue badges)
Program Committee (yellow badges)
decide what's on the agenda
(green badges are both)
Mailing list Committee (smoke badges)
Merit Network staff

NANOG Organization (2)
Meeting hosts (Rodney Joffe)
Rescued NANOG after previous host pulled out
Sponsors
pay for beer and gear, breaks
The NANOG Community
Community meetings
Meeting surveys
Elections
[EMAIL PROTECTED]

Don't forget to fill out the surveys!!

Meeting structure
Format:
Plenary sessions (big room)
Tutorials
 sorta like panels, like Network Neutrality panel.
BOFs
 Usually tools, security, peering, at least.
 not recorded, not webcast, more informal, more
 candid.
Social events
 Beer and Gear, tomorrow evening.  8 exhibitors
  showing off latest gear
Etiquette
Mutual respect is the big thing; no personal
 attacks; can criticize ideas, but avoid ad hominem
 attacks.
Dress Code
Must wear shirt and shoes at all times.  Pants or
shorts would be nice, too.

The press and photography
Members of the press may be present
Photography is not permitted during sessions due
 to legal liability/copyright issues.
 it's also distracting!
All sessions except BOFs are recorded and webcast
 will be available for replay later off nanog.org
Reporters should identify themselves if they speak
with you.

Program selection process
17 on the program committee, one selected by Merit
call for presentations
deadline
ratings: each PC member rates each submission on
 a scale of 1-5 and adds comments
conference call to determine consensus

Usually most are good, some need suggestions on
what is needed before they can be approved.

Program process 2
second round of comments
second round conference call
anything after that: chair's discretion

Some stuff had to be turned away this time, as
schedule was full.

Now over to Betty Burke for Merit update.

NANOG 37 newcomer meeting
Betty Burke, Project Manager, Merit Network
she'll be giving overview of Merit, and why
it's involved with NANOG.
Bill Norton will go more through the relationship
through history.
She also handles the Michigan technology center,
so she covers multiple hats.

Merit and NANOG
Continuation of shared values
Commitment to R&D
National Involvement
Regional Educational Activities

Much of the background between merit and NANOG is in
the shared research and development focus.  Merit
originally focused on the Michigan area research
network, now covers research and development for
widespread activities, nationally and internationally.
Merit is a 501(c)(3) entity in Michigan, non-profit, one
of the largest network providers in Michigan; they
do R&D only, no commercial side.

NANOG hosts and sponsors
hosts: work with Merit to locate a hotel, provide
connectivity, build the hotel network, and staff the
meeting
break sponsor: an engineer from your organization
exhibits your equipment on a tabletop display.
break slots are 30 minutes
one vendor per event
beer n gear: display your equipment at a table
staffed by two engineers.

Merit is a 501(c)(3) regional network, 40 years young
Owned by Michigan Public Universities
 hosted at University of Michigan
located in the Michigan information technology
 center with Internet2
Org chart
Merit Board
Merit CEO--new, starting in July, reports to Board
Directors - R&D, Network
Managers
Staff

Hierarchy is to allow decisionmaking, but the
hierarchy isn't rigid; ideas can flow in either
direction as needed.

Mission statement: to be a respected leader in
developing and providing advanced networking services
to the research and education community.  Merit is a
trusted source for providing high-quality network
infrastructure: initiating and facilitating
collaboration; and providing knowledge and technology
transfer through outreach...


Supporting our Mission
MichNet, Merit's statewide network
as well as Internet2
Research and Development.

Bill Norton, unvetted slides.
Freshly unslept (new kids will do that to you)

NANOG History (v0.2)
William B. Norton
Co-Founder/Chief Technical Liaison Equinix, Inc.
[EMAIL PROTECTED]

He's used to dealing with lots of suits, with
translators, transcribers, etc.

Why did the Frenchman have only one egg in
his omelette?  One egg is an oeuf!

What do I know about NANOG?
Merit Staff 1987-1998
NANOG chair 1995-1998
Developed 1st Business Plan for NANOG
financially self-sustaining
Started number of NANOG traditions
NANOG T-shirts
Numbering NANOGs
Colored NANOG NameTag
Beer-n-Gear
Cookie-graph
Surveys
Etc

Q. What to expect in a typical day?
A. Current meeting structure
2.5 days; it may look like a terminal room
Sunday-Wednesday
Sunday is Newcomer's welcome, and community meeting
Monday-Main sessions...

NANOG spreads travel burdens
(6 was my first NANOG in San Diego)
Still pretty much true that it's cheaper if you stay
over Saturday night in terms of flights.

1987-1994
NSF funded
$->Merit

Re: Have Yahoo! gone pink?

2006-03-30 Thread Matthew Petach
On 3/29/06, Peter Corlett <[EMAIL PROTECTED]> wrote:
> [I'm wearing my personal hat here.]
>
> I'm getting a *flood* of spam coming in from Yahoo! mailservers, both to my
> personal and work addresses. It seems that Yahoo! don't care. Here's the
> response to me piping a sample one through Spamcop:
>
>   http://abuse.mooli.org.uk/yahoospam
>
> Yahoo claim "After investigation, we have determined that this email message
> did not originate from the Yahoo! Mail system. It appears that the sender of
> this message forged the header information to give the impression that it
> came from the Yahoo! Mail system."
>
> The spam headers claim otherwise:
>
> Received: from mrout3.yahoo.com ([216.145.54.173])
>   by relay-1.mail.uksolutions.net with esmtp (Exim 4.50)
>   id 1FJbCW-0002Ag-IV
>   for [EMAIL PROTECTED]; Wed, 15 Mar 2006 18:58:29 +
>
> As does DNS and whois:
>
> [EMAIL PROTECTED]:~$ host 216.145.54.173
> 173.54.145.216.in-addr.arpa domain name pointer mrout3.yahoo.com.
> [EMAIL PROTECTED]:~$ host mrout3.yahoo.com
> mrout3.yahoo.com has address 216.145.54.173
> [EMAIL PROTECTED]:~$ whois 216.145.54.173
> OrgName:    Yahoo! Inc.
> OrgID:      YAHOOI-2
> Address:    701 First Avenue
> City:       Sunnyvale
> StateProv:  CA
> PostalCode: 94089
> Country:    US
> [etc]
>
> Doing double-DNS lookups of the IP addresses on other spams also give
> yahoo.com hostnames, and they're typically in DNSBLs for being sources of
> spam and a useless abuse address.
>
> So, which IP blocks shall I null-route then? Or is there anybody here from
> Yahoo! with a clue? (OK, you can all stop laughing now.)

Ewww.  p4pnet.net is part of a company Yahoo acquired that is still in the
process of being integrated.  :(

Personally, I'd just null-route the blocks--I'm sure it'll decrease the load
on the Internet as a whole while Yahoo works on trying to clean up their
acquisitions.  Of course, that's me speaking for myself, and not in any
way shape or form speaking for my employer.  ^_^;;

There are spam clueful people at Yahoo from the NANAE and anti-spam
communities--when stuff like this shows up in public forums, it does get
noticed and passed along.  I agree, it would be better if it could garner
the right level of attention without being called out in public forums
like this, though.

Matt
--
PGP key ID E85DC776 - finger [EMAIL PROTECTED] for full key


Re: shim6 @ NANOG

2006-03-04 Thread Matthew Petach
On 3/4/06, Iljitsch van Beijnum <[EMAIL PROTECTED]> wrote:
> On 4-mrt-2006, at 14:07, Kevin Day wrote:
>>> We got lucky with CIDR because even though all default free
>>> routers had to be upgraded in a short time, it really wasn't that
>>> painful.
>
> [Because there was no need to renumber]
>
>> Isn't that an excellent argument against shim6 though?
>> In IPv4, something unanticipated occurred by the original developers
>> (the need for CIDR), and everyone said "Oh, thank god that all we
>> have to do is upgrade DFZ routers."
>
> You are absolutely right that having to upgrade not only all hosts in
> a multihomed site, but also all the hosts they communicate with is an
> important weakness of shim6. We looked very hard at ways to do this
> type of multihoming that would work if only the hosts in the
> multihomed site were updated, but that just wouldn't fly.

And given that any network big enough to get their own PI /32 has *zero*
incentive to install/support shim6 means that all those smaller networks
that are pushed to install shim6 are going to see *zero* benefit when they
try to reach the major sites on the internet.

What benefit does shim6 bring, if only the little guys are using it?
This dog won't hunt.  Move on to something useful.

> Yes, this is an issue. If we have to wait for a major release or even
> a service pack, that will take some time. But OS vendors have
> software update mechanisms in place so they could send out shim6 code
> in between.

And no major company supports/allows automated software update
mechanisms to run on their production machines--it adds too much
of an element of randomness to an environment that has to be as much
as possible deterministic in its behaviour.

> But again, it cuts both ways: if only two people run shim6 code,
> those two people gain shim6 benefits immediately.

Cool.  So let individuals make a choice to install it if they want.  But
that's a choice they make, and should not be part of a mandated IP
allocation policy, because otherwise we're codifying a split between
"big" companies and everyone else.  The companies that can justify /32
allocations _aren't_ going to install shim6; they already have their
multihoming options (for the most part) covered--so the little guys who
install shim6 to "multihome" are going to discover it doesn't do diddly
squat for helping them reach any major sites on the internet during an
outage of one of their providers.  You haven't preserved end-to-end
connectivity this way, you've just waved a pretty picture in front of the
smaller company's face to make them think they'll have the benefits of
multihoming when they really don't.

>> Getting systems not controlled by the networking department of an
>> organization upgraded, when it's for reasons that are not easily
>> visible to the end user, will be extraordinarily difficult to start
>> with. Adding shim6 at all to hosts will be one fight. Any upgrades
>> or changes later to add features will be another.
>
> One thing I'll take away from these discussions is that we should
> redouble our efforts to support shim6 in middleboxes as an
> alternative for doing it in individual hosts, so deployment can be
> easier.

Won't matter.  shim6 on a middle box still won't be able to re-route to the
majority of the large sites on the Internet during an outage on one of the
upstream providers given that the large content players and large network
providers aren't going to be installing shim6 on their servers and load
balancers.

>> The real "injustice" about this is that it's creating two classes
>> of organizations on the internet. One that meets the guidelines
>> to get PI space, multihomes with BGP, and can run their whole
>> network (including shim6less legacy devices) with full redundancy,
>> even when talking to shim6 unaware clients. Another (most likely
>> smaller) that can't meet the rules to get PI space, is forced to
>> spend money upgrading hardware and software to a shim6 compatible
>> solution or face a lower reliability than their bigger competitors.
>
> And that's exactly why it's so hard to come up with a good PI policy:
> you can't just impose an arbitrary limit, because that would be
> anti-competitive.

You failed to note that the smaller company, *even after spending money
upgrading hardware and software to a shim6 compatible solution* won't
achieve the same reliability as their bigger competitors.  (see above if
you missed it).

shim6 is _more_ anti-competitive than extending the existing IP allocation
policies from v4 into v6, and is therefore not going to garner the support
of the companies that actually spend money to create this thing we call
the Internet.  And without money behind it, the effort is a non-starter.

>> Someone earlier brought up that a move to shim6, or not being able
>> to get PI space was similar to the move to name based vhosting (HTTP/
>> 1.1 shared IP hosting). It is, somewhat. It was developed to allow
>> hosting companies to preserve address space, instead of assigning
>> one IP address per hostname. (Again, however, this could be done
>> slowly, without forcing end users to do any

NANOG36-NOTES 2006.02.15 talk 3 Katrina Panel

2006-02-15 Thread Matthew Petach


2006.02.15 Katrina Recover Panel
moderator: Sean Donelan, Cisco

Members: Paula Rhea, Verizon
Josh Snowhorn, Terremark
Bobby Cates, NASA

Sean Donelan was with SBC when Katrina hit,
now with Cisco.  Dave couldn't be here, but
Sean will do his Bellsouth slides.

Lessons Learned
Industry has to be able to function as a first
responder to provide critical infrastructure in
support of state/local response.
 certain sectors may need heightened support,
 including power and voice/data communications
Providing security in times of crisis may fall back 
 to the private sector
Need to understand how the Government works in a crisis
 National Response Plan, FEMA system, etc.

Bellsouth lost COs for first time in 100+ years
of business.  When you get a direct hit, you 
will be impacted, period.  More important is how
you recover!

Most national disasters are pretty quick; we know
how to deal with short term, but as the issue drags
on, security for personnel becomes more and more vital,
and is turned over to private sector, public security
is engaged on more important issues.

We need to help shape up the government to avoid
issues like Katrina from happening again.

Bellsouth, lessons learned
partnerships with other carriers, state and local
government, the power companies, and the federal
 government made the difference
 experience and trust are key in a crisis
Get involved--know how to reach the communications
 ISAC and national coordinating center in a crisis
 703-607-4950
 NCS at NCS.gov or NCC: telecom-isac at ncs.gov
operational 24/7/365
Know what programs are available to you and your customers
 GETS/TSP/WPS

Bobby Cates from NASA up next.

Supported first responders right after Katrina; they
were providing video coverage, supported voice over
IP, sat phones, etc. in the first days.  The commercial
facilities were better than gov't lines, actually.
When president came in, military took over all 
satellite frequencies, so VoIP over commercial
internet was what was left.  Phones from Bill Woodcock
from PCH, servers from some bay area folks; got gear
loaded onto a C5 that was warming up and flew it out,
the costs was less than one set of satellite phones.

TSP was interesting for Katrina; for higher bandwidth,
higher pricing, not much diff for TSP and non-TSP
restoral.  Wimax and voip pretty much saved the day,
easy to implement.

Josh Snowhorn, NOTA, didn't take Katrina too bad,
but Wilma hit him hard.

Only 3 cat 5 hurricanes at US landfall (Andrew 1992,
Camille 1969, and the 1935 Labor Day storm).  cat 4
and cat 3 hit more often, hundreds a year of the smaller ones.

Saffir-simpson hurricane scale.
cat 1; winds 74-95, wind, water
cat 2: 96-110, storm surge 6-8ft
cat 3: 111-130, surge 9-12ft much structural damage
cat 4: 131-155mph, surge 13-18ft at landfall (katrina at coast)
cat 5: >155mph, surge >18ft
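The scale above is just a threshold lookup; a minimal sketch using the numbers exactly as quoted in the notes (boundary handling between categories is a guess where the quoted ranges overlap):

```python
def saffir_simpson_category(wind_mph):
    """Map sustained wind speed (mph) to a Saffir-Simpson category,
    using the thresholds quoted in the notes; 0 means below cat 1."""
    if wind_mph > 155:
        return 5    # cat 5: >155mph, surge >18ft
    if wind_mph >= 131:
        return 4    # cat 4: 131-155mph, surge 13-18ft
    if wind_mph >= 111:
        return 3    # cat 3: 111-130mph, surge 9-12ft
    if wind_mph >= 96:
        return 2    # cat 2: 96-110mph, surge 6-8ft
    if wind_mph >= 74:
        return 1    # cat 1: 74-95mph
    return 0
```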

Wilma was cat 5 before landfall, as was katrina;
wilma was lowest barometric pressure ever recorded.

27 named storms last year, lots of warm water heading into
the gulf.
formed off the bahamas, very little warning before it
hit into south florida.  did a bunch of power line damage.

NOTA faced many issues during the storm.
2005 most storms in recorded history
2005 hurricane season had 27 named storms, representing the
first time in history that the naming scheme went into
the greek alphabet
lowest

NOTA--pre wilma, 3 happy balls
100mph winds on the curtain should be able to withstand it.
Lost one of their roof balls during wilma

NOTA lost commercial power, went on gensets for 31 hours
during katrina in July

NOTA lost commercial power for 10 hours with wilma, but
had to stay off for 30 hours it was so dirty

majority of enterprises and business in s florida
without power for 10 days

the day after wilma, had no less than 20 truckloads of
servers and infrastructure arrive at NAP loading docks
with sales people and contracts

within 2 days of the passing of wilma, we began to receive
phone calls asking for fuel truck help from undersea
cable operators and large enterprises; everyone
pitched in to help all of the other operators in the
area
12 undersea cables coming in, you cut them off, s.america
largely goes away.

only 1 carrier fully lost a CO in north miami, bringing
down their circuits that came out of the NAP; water came
in, shorted things out.

Many companies did not plan properly for power failures
and staff recovery and access to systems after the storms
 have passed.

large portion didn't have DR plans or backups

staff who lose their homes need food/water, won't go to
 work

those who want to work cannot go to devastated offices
 so they need to work from home

getting employees access to systems is the singular issue
 that IT directors face post katrina and wilma

KEEP a dialup access point; it's often the only thing
left in a disaster like that.


Sean: to NASA; for packet traffic, what traffic did you
see--lots of traffic you didn't plan for, or business
as usual.  The emergency response was for NCS, FEMA,
DoD, as well as thei

NANOG36-NOTES 2006.02.15 talks 5-end Lightning talks, closing notes

2006-02-15 Thread Matthew Petach

(they weren't kidding about lightning!!  ^_^;; )

2006.02.15 Lightning Talks:
Infrastructure (DNS and Routing) Security - 
Status and Update by Sandra Murphy

Need for Speed: What's next after 10GE?
by Mike Hughes

A Brief Look at Some DNS Query Data
by John Kristoff

The impact of fiber access to ISP backbones in .jp
by Kenjiro Cho

New Network Monitoring Interest Group
by Mike Caudill

Understanding the Network-Level Behavior of Spammers
by Nick Feamster (presented by Randy Bush)

12:20-12:30  Closing Remarks
Steve Feldman, CNET, Susan Harris, Merit

Reload your agenda for the slides!!

Fun with gnuplot, DNS query data, John Kristoff
X axis, source port of client query to DNS server;
Y axis, how many times that port was used.
Looking at recursive server for an institution
open to inside and outside on 2005.11.22
starting at 1024, lots of clients use that port,
then declining to the right; 1025 is most popular
port; wraps at 5000, windows starts over.
to the right, UNIX  boxes start with high ports.
Port 137, windows stuff, all bogus windows lookups
Port 5353, is multicast DNS, MACs use it, also bogus
Some very interesting outliers, either misconfigured
or poorly thought out OS/stacks.
Graphs are similar at different institutions, and at
large ISPs.
If you take out the external queries, points below
1024 (except 53) seem to be machines behind PAT boxes.
Port 1900 is plug and play port, so windows can't use
it, so it's a low outlier.
external queries show more outliers in low range.
looking at PTR queries internally; no elbow at 1025.
5353 standout is still there, multicast PTR queries,
all bogus.
MX queries, same thing.
 stuff, not many outliers, very clean; possibly
 bogus, though.
A windows box trying to contact IRC server (neutered
bot box); keep using same source port over again until
firewall/virus software moved it.
UNIX box used port range constantly across the range,
more normal (trojaned box)
Normal UNIX box shows more normal rows of different
ports.
looking at source ports, what other useful info and
patterns can you start to discern?  Look at TTL, dest
ports, all sorts of fun you can start to discover.
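The per-port tallies behind those gnuplot histograms are easy to
reproduce; a sketch below, where the (client_ip, source_port) record
format is invented for illustration, since the talk didn't specify one:

```python
from collections import Counter

def source_port_histogram(queries):
    """Count how often each client source port appears in DNS query
    records, as in the histograms described above.  `queries` is an
    iterable of (client_ip, source_port) pairs."""
    return Counter(port for _ip, port in queries)

# Two classic Windows resolvers (ephemeral ports start around 1024/1025
# and wrap at 5000) and one UNIX box using a high port.
queries = [("10.0.0.1", 1025), ("10.0.0.2", 1025), ("10.0.0.3", 53211)]
hist = source_port_histogram(queries)
# hist.most_common(1) -> [(1025, 2)], echoing the observation that
# 1025 is the most popular port
```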


Sandra Murphy
sandy at sparta.com sandy at tislabs.com
DNS and routing security
DNSsec is live, sweden has signed top level zone,
RIPE signing reverse zones, some reverse delegations.
http://www.dnssec-deployment.org/
open working group, dnssec deployment initiative
focused on deployment issues, active mailing list, regular
telecons.
organizes workshops at conferences, etc.
screenshot of the site; has roadmaps, working group
signups, mailing lists, operator guidelines, links
to NIST, etc., events, and actions.
DNSSEC-tools project
create tools/patches for web browsers and such.
http://www.dnssec-tools.org/
current release is v0.9 from 2/10/2006
Firefox 1.5RPM to check DNS sec records back
Shot of tools being released..
zonesigner tool is how you sign and maintain a signed
zone.
Some very detailed documents on how you sign and
maintain a signed zone, as well as mailing lists.
sourceforge link for dnssec-tools bundle
Securing the routing infrastructure:
big problem, no traction on deployable solutions
3 workshops with a wide net of interested parties.
 operators, iSP, access, content providers, vendors, security
 DHS hosted, anxious to find a solution
 http://www.hsrpacyber.com/public/
Operators' emphasis
a strong call from the operators for an authenticated
list of authorized prefix originations (accurate, complete
 secure)
respond to customer requests to route prefixes
useful in debugging routing difficulties
NEW ARIN policy suggestion
recommendation
 new field in address templates (direct and subdelegations)
 for list of permitted ASes
Benefits
 inherits self-discipline of completing form (IRR entries
 aren't always done)
 inherits scrutiny of ARIN process on creation
 ARIN is authority for who is allocated prefixes
 Any IRR would have to check prefix with RIR
Authentication and currency in IRRs
authentication IRR objects
 RIR run IRRS have internal access to authentication for
 prefix holders
 non-RIR run IRRS would have to find a way to get that
 authentication from the RIRs
  same is true for RIR IRR object referring to nonmember
   resources
Currency for IRR objects
 reclaimed resources have to result in IRR purges
 why not a TTL in IRR objects?  Handles non-RIR IRRs
This solicits requests and feedback.  Try the DNSSec tools,
try signing a zone, see how it works.  Try the client
system that does the DNSsec validation.
Participate in ARIN ppml list on routing security, etc.


Mike Hughes, what's next after 10GE
mike at linx.net
Channels geoff huston for scary graph.
curve of traffic growth.  By end of 2006,
he'll be at 150Gb; if he takes last 3
months, he'll be at 300Gb in one metro.
where is it coming from?
ADSL2, Wimax, FTTx, skype, voip, p2p, etc.
consolidation
fewer people with bigger pipes.
think back to seattle
chap from force10 came and asked what do you
 want, 40g or 100g?
we 

NANOG36-NOTES 2006.02.15 talk 4 Interdomain Routing Consistency

2006-02-15 Thread Matthew Petach

Access point movie goes whizzing past very quickly
as Bill Fenner narrates.
Lets you see where people are congregating, and
which talks are more interesting, and when people
migrate out of talks; could feed into the survey
to tell the program committee which talks are
of more interest.

netdisco, collects data from network elements,
plots them, put a front end on it;

If you opted in, by emailing him you MAC address,
it would render a map with your location on it.

has RSS feeds of your location as well.

fenner at research.att.net


2006.02.15 An Inter-domain Consistency Management Layer
Nate Kushman, MIT

Steve Feldman, welcome back, Nate Kushman is up first
to talk about routing consistency.

Transient BGP loops 
was with akamai, now at MIT
srikanth kandula, dina katabi, john wroclawski

Do loops matter?
can we do something about them?

what is a transient BGP loop?
slide showing loop forming.

How common are "transient BGP loops"

Sprint study, IMC 2003, IMW 2002
looked at packet traces from the sprint backbone
up to 90% of the observed packet loss was caused by
 routing loops
60-100% could be attributed to BGP

Is it true on internet?

Routing loop damage

20 vantage points with BGP feeds
did pings, traceroutes, watch for loops.

correlated on BGP updates, and ttl exceeded
on ping, traceroute.

In fact, all loops were within 100seconds of
 BGP updates.
10-15% of all BGP updates caused routing loops!!

Collateral damage.
loops cause congestion on the links that are part of
the loop, causing loss even for traffic to and from
networks that were not part of the reroute.

traceroute to see which links were part of the
loop, see which other traces shared a link in
common with the loop.
there is a marked increase in packet loss in
the 100second window around the BGP loop.

Prefixes sharing a loopy link see 19% packet loss
in general.

What should be done?
We need to prevent forwarding loops.

A loop occurs because:
one AS pushes a route update to the data plane, but
other ASes are not yet aware of that route change.

What about telling everyone about the change before
the change actually happens?

Suspension:
continue to route traffic
tell control system not to propagate the route
FIB stays same for now, RIB doesn't send route.

downstream networks only update forwarding tables
once upstreams have acknowledged the path change.

More generally:
we have proven:
 loops are prevented in general case
 convergence properties similar to normal BGP
http://nsm.lcs.mit.edu/~nkushman/
incrementally deployable.
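A toy model of the suspension idea above: keep forwarding on the old
path (FIB unchanged) until downstream neighbors acknowledge the new
route, with a timeout so a non-conformant device can't stall you
forever.  Class and method names, and the 5-second timeout, are all
illustrative; the talk only sketched the protocol:

```python
class SuspendingRouter:
    """Sketch of BGP 'suspension': the RIB learns a new route but the
    FIB is only updated once downstream acks arrive or a timer fires."""

    def __init__(self, neighbors, ack_timeout=5.0):
        self.neighbors = set(neighbors)
        self.ack_timeout = ack_timeout
        self.fib_route = None   # route currently used for forwarding
        self.pending = None     # (route, outstanding acks, deadline)

    def announce(self, route, now):
        # New route enters the RIB; FIB keeps the old path for now.
        self.pending = (route, set(self.neighbors), now + self.ack_timeout)

    def receive_ack(self, neighbor, now):
        if not self.pending:
            return
        route, outstanding, deadline = self.pending
        outstanding.discard(neighbor)
        if not outstanding or now >= deadline:
            self.fib_route = route   # safe (or timed out): update FIB
            self.pending = None

    def tick(self, now):
        # Timeout path: don't wait forever on a slow or broken neighbor.
        if self.pending and now >= self.pending[2]:
            self.fib_route = self.pending[0]
            self.pending = None
```

This mirrors the Q&A point raised from the floor: the timeout bounds
how long the farthest, slowest router can delay everyone's FIB update.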

feedback

Clearly:
 works well for planned maintenance.  We can delay move
 to backup path during those events, at least.
  20% of update events caused by planned maintenance
 Link up events also cause loops, no way to plan for
  them smoothly now.
What about:
 unplanned link down events
 trade-off between loss on current path and collateral damage

Are we willing to do this in general, to avoid impacting
stable prefixes from unstable prefixes.

In short: routing loops are a significant performance
 concern.

Bill Norton--hidden question: what is the time domain
 during which these traffic impacts are seen?  Will
 the propagation path take 10, 20, 30 seconds? 
A. one event causes many, many loops rippling out,
 so one update may cause packet loss for many seconds,
 up to tens of seconds total.
Q. you're talking about adding MORE state information
 into the network.  Also adding latency to update
 acknowledgements.

Jared notes that router software bugs tend to
exacerbate routing loop issues.  You can tune configs
to try to minimize the number of loops seen, as well
as upgrading to "fixed" code to get better results
without more state.

Patrick Gilmore asks Jared, does tuning help internal
sessions or external sessions?  Both, it really controls
*when* the updates are sent out (immediately vs batched,
etc.).  Jared notes the internet is being used

Someone (Bill?) asks if convergence times are similar to 
current model, as the slide claims; is that within
a few seconds? convergence in the lab is similar, yes.

Matt Petach asks about details of convergence; it
basically puts you at mercy of the slowest, farthest
away router on the network, since it has to get the
message, realize it has nobody to send to, and then
acknowledge back before anyone else can update FIB;
yes, true, so you'd want to put timers in to limit
how long you wait; basically, like "wait 5 seconds,
and either hear an ACK, or go ahead and update FIB"
type timeout, so you don't wait forever for a
non-conformant device on the other side of the world.

Riverdomain question--with suspension, you're basically
in passive mode, listening but not updating, is that
correct?  Yes, with respect to the links/prefixes in
question.





NANOG36-NOTES 2006.02.15 talk 2 Katrina--telecom infrastructure impacts

2006-02-15 Thread Matthew Petach

2006.02.15 Hurricane Katrina: Telecom Infrastructure Impacts,
Solutions, and Opportunities, Paula Rhea, Verizon

A more interactive presentation from her, in the
aisles.

Verizon Business group--combined MCI/Verizon team.

Agenda
Hurricane Katrina Recap
Telecom infrastructure impacts
telecom provider successes
business continuity planning
conclusions
references
appendices: case examples.

Many of the people in this room would be considered
part of the critical infrastructure for the nation
by the department of homeland security

After world trade center 9/11 issue, there was
a lessons learned; hopefully there will be a
similar report post Katrina.

New Orleans is still very much like a war zone
right now; it's definitely a disaster recovery
training session for many industries.
Neighborhoods are wiped out; no capital
investments, infrastructure in holding pattern.
Many with no power, 20% of houses condemned.
Neighborhoods that are entirely silent--eerie.

Aim is not to diss anyone specifically, certainly
not in this room; aimed to be an assessment in a
neutral fashion.

Critical infrastructure:
food and water supply
energy
transportation
healthcare
banking/finance
telecommunications/infrastructure

Oddly enough, much of critical infrastructure is 
privately owned, rather than government owned.

The domino model says that any one piece will
cause the rest to start to fall.

35th largest city in US
port of new orleans is #1 in US by tonnage
50% of total US grain exports shipped via gulf
10.8% of total US refining from new orleans
5th largest port
Key space shuttle facility in Michoud supported
fuel tanks for international space station

Storm recap
hurricane hit Aug 29, 2005
135 MPH winds, 20-foot storm surge sent inland
55-foot surges logged in the gulf prior to landfall

levee failures create secondary crisis
2.3M homes without power
spawned 33 reported tornadoes in NA
1090 fatalities in LA recorded to date

people dancing about cat 5 dropping to cat 4, 
thought they were spared, then levees broke;
had been predicted the year before.  :(

Still 2500 people missing/unaccounted for.

Map of eastern LA parishes
St. Bernard/Plaquemines parishes between the lake
and the gulf, hardest hit when levees broke as water
headed back towards gulf.
Lack of interoperability between parish govt systems.

New orleans telecom impact (multi-carrier)
1.75M lines down immediately following Katrina.
38 911 centers out (1/3) initially
1,000 cellular towers out
two class 4 toll switches initially out of service
no power/unable to secure extended diesel fuel

Traffic out of the LATA logjammed with toll switches out.
LECs had backup power systems, but no fuel.

Took 4 days to inspect causeway to allow emergency
crews into the city with main bridge out.  Most
nurses and doctors were in suburbs, not in city.

Central offices post katrina
New Orleans Lake CO
CLLI NWORLALK
Venice LA CO
CLLI: VENCLAMA
Buras CO
CLLI: BURSLAMA

19 COs are totally destroyed, and will have to be
rebuilt.  

These slides are public domain info, no inside info.

I2/Abilene link from Houston to Atlanta initially out,
restored on sept 8 2005

fiber optic path on Lake Pontchartrain bridge
offline following hurricane katrina

wifi, wimax and voip play key role in area communications

public internet was actually very resilient

Telecom provider successes: alphabetic
1,000 amateur radio operators helped
bellsouth
cingular
cisco
cox,
iridium added 10,000+ new phones to first responders
MCI
Nortel
Sprint/Nextel donated up to 10M
Verizon donated 8M and 200 workers

Carriers have mutual aid agreements; Verizon sent 200
people who volunteered to spend 8 weeks living in a tent
to help rebuild--had to work with armed guards.

The CO rebuilds weren't any type of upgrade, it was
bulldozing damaged/destroyed facilities, digging new
vaults, and starting over to restore just what was
in place before hand.

Bill Norton--COs underwater, can you imagine some type
of preventative design that could have been put in
place to help avoid impacts like this?
Most of the area is reclaimed land, 2 miles below
sea level (some dispute about that number).  
Bill wonders if they could be built above sea level
somehow.  Even if they were, Paula notes that they
wouldn't have power, wouldn't have 2 weeks of diesel
fuel to run them, etc.  Really, it comes back to the
levees.
Randy Bush noted that early on, community based wifi
was one of the early-on means of communication to
daisy-chain packets along.  
Roland, from Cisco; did some logistical work with relief;
Verizon donated eVDO boxes to make eVDO to wifi bridges,
did VoIP over wifi to eVDO boxes to jury-rig connectivity.
But doesn't work so well with towers down, and no power.
With the cell phone infrastructure down, that really hurt
too.

Thanks to Todd Underwood/Renesys for their graphs; did
a pre-and-post analysis routingwise.
Top red is LA;
about 170 networks totally out during the bulk period.
teal/TX not impacted,
MS also hit, in terms of percentage more s

NANOG36-NOTES 2006.02.15 talk 1 ipv6fix (and boy, does it need it)

2006-02-15 Thread Matthew Petach

Morning intro notes--don't forget to fill out
your SURVEYS

six lightning talks signed up, should be very
cool.  If you have slides, get them to Steve
Feldman to start with!

Wireless movie after break should be cool to watch.
Ren?  Steve mistakenly introduces her, she corrects
them.  Don't forget to give feedback via the Survey
forms!!

2006.02.15 v6fix: Wiping the Slate Clean for IPv6
Kenjiro Cho, WIDE/IIJ, Ruri Hiromi, WIDE/Intec NetCore

Will be talking about their efforts to deploy
IPv6, called v6fix.

v6fix is an effort to solve problems in the current
v6 deployment.
focuses on v4/v6 dual stack environments.
it's a technical analysis of real world problem
Kenjiro will talk about tools and measurements.

deployment status
majority of equipment out there is v6 available
from major vendors
still many applications and appliances just work
 with v4
v6 is starting to get into various business fields
Many people lack knowledge/experience with v6.
 when non-experts hit problems, they're clueless.

Example: illiteracy.
Hotel internet systems have instructions for guest.
 troubleshooting: if you have IPv6 enabled, please
  disable IPv6--brochure in guest room.
Cause of problem: combination.
  DNS redirection returns specific A record for AAAA queries
 client's stub-resolver accepts the A for AAAA, can't
  get out.

Wiping the slate clean for the v6
faulty behaviours are only 1% and often combinatorial, but
could be fatal to deployment.
 slow fallback to v4 after v6 errors
 misbehaving DNS resolvers
 filtering of ICMPv6
 DNS misconfigurations
 poorly configured tunnels
 lack of peering or v6 paths
 
v6fix activities (research group)
 identify/analyze/solve real-world tech problems
 in v6 deployment.
 Enemy: "after disabling v6, my problems went away"
Cooperation needed between researchers, implementers, ops.

v6fix topics
harmful effects of the on-link assumption.
misbehaving DNS servers and resolvers
slow fallback to v4 after v6 failures

Examples:
case 1: DNS loop at hotel
real story of hotel internet system--went to same room,
 investigated.
DNS is intercepted, redirected to signup page
ipv6 users can't get beyond first page
hotel instructions say to disable v6
erroneous DNS redirection system and stub-resolver
redirection system always returns specific A record
 when getting non-A queries
client's stub resolver queries AAAA for any address,
 blindly accepts A return response.

case 2: DNS server slowdown
Japanese ISP
ISP upgraded a DNS cache to BIND9, received complaints
 about slowdown.
recompiling BIND9 with --disable-ipv6, fixed problem,
 reported to JANOG
Caused by older BIND9 w/o IPv6 connectivity
 server w/o v6 connectivity always tries to talk over v6,
 ends up failing back to v4 after timeouts
 fixed in BIND9.2.5 and 9.3.1

Common factors
1 problems appear only with specific combinatorial conditions
2 implementors and operators didn't notice until reported
3 even for professionals, not easy to track down problems.

Kenjiro Cho, Tools:
v6 tools and measurement results
Goal: to understand the macro-level v6 healthiness
current methodologies
 wide area measurement of behaviours of 2nd/3rd level
 DNS servers
 dual stack issues

DNS server measurements of .jp domain
 AAAA responses: 0.13% of DNS servers can't deal with
   AAAA requests
Most are lame delegation type errors.
ignore AAAA queries
respond with RCODE 3 ("name error") NXDOMAIN

dual-stack path analysis
measurement techniques specifically designed for
dual-stack
 take measurements for v4 and v6 at same time
 compare v6 results with v4 results
 extract problems that exist in v6 only
methodology
 dual-stack node discovery
 create dual-stack node list by monitoring DNS AAAA replies.
 dual stack ping
 run ping/ping6 to target dual-stack nodes
 select a few representative nodes per site (/48) by RTT
dual-stack traceroute
 traceroute/traceroute6 to selected nodes
 visualize v6 MTU to look at issues
 visualize path issues

distribution of v6/v4 RTTs
4000 ping targets v4 on x-axis, v6 on y axis
individual nodes far above the unity line--leaf issues
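The comparison step ("extract problems that exist in v6 only") can be sketched as below. The threshold, helper name, and sample data are all illustrative assumptions, not values from the talk.

```python
# Flag dual-stack nodes whose v6 RTT sits far above the v4 unity line,
# suggesting tunneled or roundabout v6 paths rather than native ones.

def v6_outliers(rtts, factor=3.0, slack_ms=5.0):
    """rtts: {node: (rtt_v4_ms, rtt_v6_ms)} -> nodes with much worse v6."""
    return [node for node, (r4, r6) in rtts.items()
            if r6 > r4 * factor + slack_ms]

samples = {
    "host-a": (20.0, 22.0),    # native dual stack: similar RTTs
    "host-b": (15.0, 180.0),   # likely tunneled / roundabout v6 path
}
print(v6_outliers(samples))    # ['host-b']
```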

paths and PMTU visualization
from NYSERNET to ARIN sites

Many of ARIN paths via jp!  (lack of peering)

From ISC to ARIN sites--paths look much better, but
lots of blue == lots of tunnels

Abilene case: well known problem.
Abilene trying to encourage v6 adoption
  no AUP, tunnel services for v6
but ended up with horrible v6 paths, mostly with tunnels
 ISPs are reluctant to move to paid v6 connectivity
Abilene thinking about suspending its relaxed AUP for v6
tool tries to illustrate such issues, convince users to
 move to native v6

dual stack traceroute to ABILENE from WIDE (v4 upper, 
 v6 lower)
similar RTTs/hops for v4/v6; native dual-stack paths

dual-stack trace to ABILENE from IIJ
similar RTTs, but different paths: currently more common

dual-stack traceroute to ABILENE from ES
v6 RTTs much larger than v4: roundabout tunnels

Conclusion: faulty behaviours are only 1% and often
combinatorial, but can be fatal to acceptance of v6
 slow 

NANOG36-NOTES 2006.02.14 Tools BOF Notes

2006-02-14 Thread Matthew Petach

Last notes of the day...

Matt



2006.02.14 Tools BOF
Todd Underwood, panel moderator

A number of interesting tools presented earlier today;
all of them are good and interesting and solve a
particular set of problems.  None are in widespread
use.  There's a lot of possible reasons; do they
solve problems you don't have, in which case they
can move onto something new; or they solve a problem
similar to one you have, but not quite.  Or they solve
a problem you can't quite implement yet.
Discuss use cases, problems they're trying to solve,
and give feedback, as interactively as comfortably
as people can.

3 tools, OpenBGPD, IRR powertools/webtools (to get
feedback and is the IRR even useful anymore?) and
Flamingo as one of 2 netflow platforms.

Start with Henning, active in open source software
development; he'll go in more depth on openbgpd.


OpenBGPD
Henning Brauer henning at openbsd.org

3 process design

Principle of least privilege
the RDE (route decision engine) does not need any special priv
at all, so it runs as __bgpd:__bgpd: chrooted to /var/empty

SE needs to bind to TCP/179

parent needs to modify kernel routing table.

Session Engine (SE)
needs to bind to 179/tcp

we have the parent open sockets
see recvmsg(2)

parent needs to keep track of which fds the SE has open,
so it doesn't bind() again to same ip/port

the SE can drop all privs, then.

SE 2
since one process handles all bgpd, need nonblocking sockets.

on blocking, you call write(2), won't return until it's done
or get errors

on nonblocking, returns as soon as it can't proceed
immediately
So, have to handle buffer management
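An illustrative sketch (Python, not OpenBGPD's C) of why a single-process session engine needs its own buffer management: a non-blocking send returns as soon as the kernel buffer fills, so the unsent remainder must be queued by the application and flushed later.

```python
import socket

def buffered_send(sock, outbuf):
    """Try to flush outbuf; return whatever the kernel didn't accept."""
    try:
        n = sock.send(outbuf)        # partial write is the normal case
        return outbuf[n:]
    except BlockingIOError:          # kernel buffer completely full
        return outbuf

a, b = socket.socketpair()
a.setblocking(False)
data = b"x" * 10_000_000             # far more than the socket buffer holds
pending = buffered_send(a, data)
# Only part of the message was accepted; the rest stays queued in userland.
a.close(); b.close()
```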

SE 3
designed an easy to use buffer API and mesg handling
system.

Messaging
internal messaging is a core component in bgpd,
reused for OpenNTPD, OpenOSPFD, and some more.
bgpd has more than 52 message types, more than OpenSSH
bgpctl talks to bgpd using same imsg socket

tcp md5
some very old code in kernel for tcp md5, from 4.4 BSD
never worked
tcp md5 is somewhat similar to ipsec, ah, so implement
 it within IPSec maze.
Had to add pfkey interface to bgpd; committee designed
 API.
that made IPSec that much easier; extended the API so they
can request unused SPIs from kernel, don't have to be
configured manually.

tcp md5/ipsec
when you don't have tcp md5 or ipsec in place, big tcp
windows are risky

stay at 16k window unless you have tcp md5 or ipsec,
then you get 64K
so ipsec improves performance.

Joel Jaeggli asks how big a tcp window do you need for
a BGP session at all?  initial connection gets faster
with 64K, but thereafter, similar.

looking glass
just added an optional second control socket that is
restricted to the "show" operations
regular bgpctl binary can be used with it
cgi, yeah, that needs to be hacked in shape, but it's easy.

Juniper only does static IPSec setup, so requires nasty
setup.  OpenBGPD is dynamic, but interoperates with Junipers.

So back to looking glass, security
on OpenBSD, the httpd (an apache 1.3 variant)
runs in a chroot jail by default
the read-only socket can be placed inside that jail
bgpd_flags="-r /var/www/bgpd.rsock" in rc.conf.local

put a statically linked bgpctl binary in the chroot
/path/to/bgpctl -s /bgp.rsock, $

impressions from road to ipv6
most heinous checkin message yet.  The lower 2 bytes
of the scopeID overwrite part of the v6 address...ugly!

Performance
http://hasso.linux.ee/linux/openbgpd.php

it's quick openBGPD 3.6 port for linux; can't communicate
with kernel, no v6, no md5; 8 times faster than quagga.

future plans and ideas
the biggest task waits outside bgpd itself; kernel routing
 table.

we need to make use of the radix mpath capabilities
added in 2004, and add route source markers (BGP,
OSPF, etc)
 bgpd and ospfd can blindly install their routes
 kernel then knows precedence
hard to do, once it's done, routing will be easier.


Also need multiple routing tables, with pf acting as
table selector
so unholy route-to can die, and associated issues
vanish.

might be useful with bgpd as well.

ideas for quite radical changes, speed up packet
forwarding dramatically.
will have fast path where all easy cases can be handled
on specialized PCI cards
multiple 10GE at wire speed within 2 years.  hardware
exists, on way to him.

for route servers, reversing filter and best path selection
would be good.

filter generation from RIPE DB or similar
 but IRR toolset sucks hairy moose balls
 should be solvable in perl "someone" has to code it.

(maybe use IRR power tools for it instead!)
we can fail over IP addresses already, thanks to CARP

we can have synchronized state tables on multiple machines,
 gives HA firewall clusters.

Would be really cool to be able to fail over TCP sessions
and bgp sessions.
could make for BGP hitless failover
syncing BGP stuff shouldn't be too hard
lots of work, not much time.


Money has to come from somewhere, obviously.
Unfortunately, people forget about this, just go to mirrors.
Vendors don't help
Never got anything for OpenSSH yet

it com

NANOG36-NOTES 2006.02.14 talk 7 Randy IRR routing security revisited

2006-02-14 Thread Matthew Petach

Many apologies...I'm no Stan Barber, but still doing my best to keep up
with the note-taking.  ^_^;;

Matt



Slides are on Randy's site at 
http://rip.psg.com/~randy/060214.nanog-pki.pdf
 
What I want for Eid ul-Fitr
Randy Bush
randy at psg.com

Definition of Eid ul-Fitr; end of Ramadan; breaking
of the fasting period, and of all evil habits.
Roughly October 24th this year.

10 years ago Randy pleaded for people to use the IRR;
he gives up: it didn't work, it has bad data, it
doesn't work.  Let's get rid of it.

Routing security is what we need.

Routing security gap
assume router has been captured.
routing security (not router) is a major problem.

http://rip.psg.com/~randy/060119.janog-routesec.pdf

need PKI, storing and passing and signing certificates.

Public Key Infrastructure
PKI Database
RIR Certs
ISP Certs
End Site Certs
IP Addresss Attestations
ASN Attestations

IP and AS Attestations
specifies identity == public key of recipient
signed by allocator's private key
Follows allocation hierarchy
 IANA (or whomever) to RIR
 RIR to ISP
 ISP to downstream ISP or end user enterprise

IP allocation example
 IANA to RIR
 S.iana (192/8, rir)
  RIR allocates to ISP
 S.rir(192.168/16)
and so on down the chain.
Each chain uses the private key to sign the certs
to hand down the chain.
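The hierarchy rule behind this chain can be sketched as a containment check: each signed allocation must fall inside the space its signer was itself allocated. Real attestations carry signatures; this sketch (names mine) checks only the containment step.

```python
import ipaddress

def valid_delegation(parent_prefix, child_prefix):
    """True if the child allocation lies inside the parent's space."""
    parent = ipaddress.ip_network(parent_prefix)
    child = ipaddress.ip_network(child_prefix)
    return child.subnet_of(parent)

# IANA -> RIR -> ISP, following the slide's 192/8 -> 192.168/16 example:
assert valid_delegation("192.0.0.0/8", "192.168.0.0/16")      # in space: ok
assert not valid_delegation("192.168.0.0/16", "10.0.0.0/24")  # outside: reject
```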

ISP/End-site-certs
May be acquired anywhere.  Don't have to be chained to
a single master organization, and can use the same one
for multiple RIRs, orgs, etc.
RIRs can issue as a service for members who don't get
them anywhere.
They need no attestation because they are only used
 in business transactions where they are exchanged and
 managed by contract, or
 Bound to IP or ASN attestations by the RIRs or upstream
  ISPs.
Big ISPs may use an ARIN identity for an APNIC allocation
 or business transaction.

Since the keys are acquired separately, doesn't matter
where the certs come from, or where used.

RIR Identity similar.
it's their public key
can get it from 'above': RIR, NRO, IANA, or they can
even self cert.

No provision for revocation, however.

PKI Interfaces/Users
Nice slide showing the interrelationships; go see
the slides for it, I won't try to render it in ASCII
in realtime.

The certificates are directly exchanged as part of
the business transaction when goods (IPs, ASNs, etc)
are exchanged.

Goal is to have formally verifiable route 
attestations, so want replicas of data near routers
to be used to determine validity of route origination
and propagation.

Transacting with PKI
RFC2585 describes FTP and HTTP transport for PKI
no need for transport security!

Tools for RIRs
Generate and receive ISP certs
Receive ASN and IP space attestations from upstairs

Tools for ISPs
generate/get certs
register role certs
generate certs for downstreams
sign allocations to downstreams

Open Issues
Coordination of updates
one central repository not feasible
LDAPv3 RFC3377 and RFC2829 for authentication
Cert/key rollover and revocation not covered
May require a separate and secured communication
 channel

NSF via award ANI-0221435
Steve Bellovin & JI

From the microphone, are there TTLs on certs?  Yes, which is
why ISP certs are separated out.  Addresses from ARIN are
only "yours" as long as you keep paying ARIN.
Tie certs to contract terms.  But the ISP identity cert
is yours, nobody else should have control over rollover
and expiration.

APNIC is working to have web pages

Andrew Dole, Boeing; how to get funded--Randy will take
cash donations.  Andrew thinks it'll take 10 million to
get the ball rolling.
Randy doesn't think that's the problem.  The operator
community would prefer to see a rigorously correct and
verifiable solution with reasonable security infrastructure
rather than one more hack on the IRR.
Second question.  What is forum to discuss and nail down
the details?  He'll be at APNIC in 2 weeks; for this region
the ARIN meeting in Montreal, and this meeting is good
too.
Nobody seems to be sure where the right place to do this
is.  But Randy thinks the important part is to SEND the
message, that there is a valid path.

Vince Fuller.  Soliciting input from this group is a
good thing, but be more targeted.  Figure out why the
previous efforts failed, and target them.
Chris Morrow, Ted Seely...Randy targets some specific
people in the audience.

Chris Morrow notes that one challenge he faces is
being able to verify if filters are correct.  
Randy notes the ROUTER will verify the validity itself.
Chris feels doing it in OSS system is safer.

RS--how do you deal with crufty stuff?  RIRs and
community will have to deal with that, he's just
talking about giving tools to make it possible.

Sandy Murphy, Sparta--Randy, you've said there's no 
prefix lists needed for this; but this could be used 
for building filter lists, or checking updates, or for
tracking customers who call in with issues, etc.
this is a first step for a whole BUNCH of things.
So no matter what else we want to build on top of
it, this really is the first level of the fou

NANOG36-NOTES 2006.02.14 talk 4 Flooding via routing loops

2006-02-14 Thread Matthew Petach


2006.02.14 talk 4 Flooding attacks

Jianhong Xia
 
A new talk added right before lunch by
Randy Bush will push us to 12:25.

Two talks coming up about DoS attacks
against control information

Flooding Attacks by exploiting persistent
forwarding loops.

Introduction: routing determines forwarding path.

Transient forwarding loops happen all the time
during convergence; that's normal.  But this
focuses on persistent forwarding loops.

why would persistent loops exist?

Example on neglecting pull-up routes.
Router announces 18.0/16 to internet
router A has default pointing to B
router A uses 18.0.0/24 only
Any traffic to 18.0.1.0-18.0.255.255
will enter the forwarding loop between
A and B

Risk of persistent forwarding loops can
amplify based on ttl of packets injected into
the looping pair of routers.
Can create a denial of service by flooding the
upstream links between routers in front of host
they want to knock off.
any other hosts behind that link are "imperiled
addresses" 
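The amplification can be sketched with a toy model (mine, not the paper's): a packet injected into a persistent two-router loop keeps crossing the A<->B link until its TTL expires, so one attacker packet loads the looped link roughly TTL times instead of once.

```python
def per_link_load(ttl, loop_len=2):
    """Count traversals of each link in an N-router loop until TTL expires."""
    load = [0] * loop_len
    hop = 0
    while ttl > 0:
        load[hop % loop_len] += 1   # this hop's link carries the packet
        hop += 1
        ttl -= 1                    # each hop decrements TTL by one
    return load

print(per_link_load(255))  # [128, 127]: each direction of the A<->B link
```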

Measurement Design:
balancing granularity and overhead
samples 2 addresses in each /24 IP block
Address space collection
 addresses covered by RouteViews table
 de-aggregate prefixes into /24 prefixes
  fine-grained prefixes
data traces
 traceroute to 5.5 million fine-grained prefixes
 measurement lasted for 3 weeks in Sept 2005
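The measurement design above can be sketched as follows. Which two addresses per /24 were probed isn't stated, so picking the first and last host here is an illustrative assumption.

```python
import ipaddress

def probe_targets(prefix):
    """De-aggregate a routed prefix into /24s; sample 2 addresses in each."""
    net = ipaddress.ip_network(prefix)
    for sub in net.subnets(new_prefix=24):   # fine-grained /24 prefixes
        hosts = list(sub.hosts())
        yield str(hosts[0]), str(hosts[-1])  # two samples per /24

targets = list(probe_targets("192.0.2.0/23"))
print(targets)
# [('192.0.2.1', '192.0.2.254'), ('192.0.3.1', '192.0.3.254')]
```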

Almost 2.5% of routable addresses have persistent
forwarding loops
Almost .8% of routable addresses are imperiled addresses.

Validating these persistent forwarding loops
from multiple places
 from asia, europe, west and east cost of US
 90% of shadowed prefixes consistently have persistent
 forwarding loops
Validation to multiple addresses in shadowed prefixes
 sampling 50 addresses in each shadowed prefix
 68% of shadowed prefixes shows that...

Properties of the loops
How long are the loops?
 86.6% of loops are 2 hops long
 0.4% are more than 10 hops long
  some are more than 15 hops
location
 82.2% of persistent loops happen within destination
  domain
implications
 significantly amplify attacking traffic
 can be exploited from different places.

(oops.  Matt gets paged out to deal with issue, so no
 more notes for a while).



NANOG36-NOTES 2006.02.14 talk 3 Flamingo Netflow Visualization Tool

2006-02-14 Thread Matthew Petach

2006.02.14 talk 3 Flamingo netflow visualization

Manish (from BGP Inspect project from Merit)
bgpinspect.merit.edu:8080

He'll be talking later at the Tools BOF as well
apparently.

Introduction: What is Flamingo?
Visualization
The Flamingo Tool
 combining visualizations with controls
Case Studies
 traffic anomaly
 network scans
 worm traffic
 P2P traffic
 the slashdot effect.


The tool has been under development for a year;
John, in audience, and Mike (now employed) have
been working on it as undergrads.

It's just a view into netflow, no filters or
adjustment of data
it's just a visualization system.
client/server architecture

a single server can support multiple clients

Visualization methods
5 different views
extended quad tree implementation
 volume by src/dst IP prefix
 volume by src/dst AS

Basic quad tree
represent 32bit IP address into fixed space.
four quadrants representable by 2 bits.  Keep splitting
16 times, and you represent a 32-bit address in a 2D
mapping.
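A sketch of that mapping (the exact bit ordering Flamingo uses is my assumption): interleave the address bits two at a time, each pair choosing one of four quadrants, so after 16 splits every 32-bit address lands at a unique (x, y) cell.

```python
def ip_to_quadtree_xy(ip):
    """Map a dotted-quad IPv4 address to (x, y) in a 65536x65536 grid."""
    octets = [int(o) for o in ip.split(".")]
    addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    x = y = 0
    for i in range(16):                    # 16 two-bit quadrant splits
        pair = (addr >> (30 - 2 * i)) & 0b11
        x = (x << 1) | (pair & 1)          # low bit of the pair -> x
        y = (y << 1) | (pair >> 1)         # high bit of the pair -> y
    return x, y

print(ip_to_quadtree_xy("0.0.0.0"))          # (0, 0)
print(ip_to_quadtree_xy("255.255.255.255"))  # (65535, 65535)
```

Nearby prefixes map to nearby cells, which is what makes traffic volume per prefix readable on the quad-tree face.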

convert it into three dimensions, with an axis of
freedom to represent additional info.

So one side is the quad tree, the Z axis is volume
of traffic, so you can see relative volumes.

nice slide showing visualization of the traffic
flow patterns.

Can show traffic flows aggregated by src/dst IP;
now there's 2 surfaces needed on the cube, so they
use line thickness between the surfaces to show 
flow sizes between ASes.

last visualization incorporates port info as well
But since there's only one axis left,
port level info goes on the z axis:
src IP/port is X1Y1Z1; same for dest IP and port.
Once there are coordinates, the line can be drawn,
scale the width based on the volume, and now you
have the full info in one view.

Same colour used to represent traffic from the
same source ntuple.

combine 2D and 3D representation of data to help
keep yourself oriented.

They have text representations of information,
same as visual data, but in text form.
Slider bars allow thresholding of what gets
displayed, to prevent clutter; only over a certain
size, or only certain ports, etc.

Can also apply labels to help pull information out
for fast reference.

You can also restrict the address space you care about
to only look at certain subnets.

Case study: Traffic anomaly sunday Oct 16, 2005

large burst of traffic from one host at umich,
lasted 6 hours, four targets, not widely
distributed, it was UDP traffic.
Was visible in normal view.
from 12pm to 6pm.
visible on main view, zoomed in, and the 4 million
flows show as a huge block.
going to src/dest view lets you see where the traffic
is going.
adding the port info, and you see the entire port
space is being sprayed.

Another case study--worm traffic doing port 42 scans
a fan view on the graph, highly visible.

An artificial case study, a host scanning a /24 
subnet

SSH scans also show up as many many ports probing
a single port; a reverse fan.

Slashdot effect on campus Oct 31 2004; have before
and during pictures showing the huge traffic swing.

Zotob worm infection;
random destination IPs, but same port, coming from
same host, cone effect.

P2P traffic; single host with multiple connections
to different destinations, significant volume to each.

Darkspace traffic visualizations show nothing but
scans, show up really dramatically.

Conclusion
The Flamingo Visualization Tool provides users with
the ability to easily explore and extract meaningful
information regarding traffic flows in their network.

More will be discussed at the Tools BOF this afternoon.

http://flamingo.merit.edu/

Break now, come back at 10:50.  Someone left a jacket
at the Yahoo party with a digital camera; describe it
to the registration desk to get it back.



NANOG36-NOTES 2006.02.14 talk 2 Netflow Visualization Tools

2006-02-14 Thread Matthew Petach

2006.02.14 talk 2 Netflow tools

Bill Yurcik
byurcik at ncsa.uiuc.edu

NVisionIP and VisFlowConnect-IP

probably a dozen tools out there, this is just
two of them.  Consensus is there's something to
this.

They're an edge network, comes into ISP domain,
their tools are used by entities with many
subnet blocks.

Overview
Project Motivation
Netflows for Security
Two visualization tools
 NVisionIP
 VisFlowConnect-IP
Summary

Internet Security:
N-Dimensional Work Space

large--already lots of data to process
complex--combinatorics explode quickly
time dynamics--things can change quickly!
Visualizations can help!
 in near-realtime
 overview-browse-details on demand

People are wired to do near-realtime processing
of visual information, so that's a good way to
present information for humans.
HCI says use overview-browse-details paradigm.

Netflows for security
can identify connection-oriented stats to see
things like attacks, DoS, DDoS, etc.
Most people don't use the data portion of the
flow field, the first 64 bytes, they just look
at header info or aggregated flow records.

Can spot how many users are on your system at
a given time, to schedule upgrades.

Who are your top talkers?

How long do my users surf?  What are people using
the network for?

Where do users go?   Where did they come from?

Are users following the security policy?

What are the top N destination ports?
Is there traffic to vulnerable hosts?

Can you identify and block scanners/bad guys?

This doesn't replace other systems like syslog, etc.;
it integrates and works alongside them.

architecture slide for NCSA.

Can't really do sampled view for security, so probably
need distributed flow collector farm to get all the
raw data safely.

Two visualization tools:
NVisionIP, VisFlowConnect-IP

focus on quick overview of tools
security.ncsa.uiuc.edu/

3 level hierarchical tool;
galaxy view (small multiple view) ((machine view))

Galaxy is overview of the whole network.
color and shape of each dot represents a host in the network.
settable parameters for each dot.

Animated toolbar and clock show changes over time
in the galaxy.
Lets you get high-level content quickly and easily.

Domain view lets you drill in a bit more; small
multiple view looks at the traffic within the
block.
upper histogram is the lower, well-known ports; lower
histogram is ports over 1024

You can click on a given multiple view entry to
delve into one machine.
Many graphs for each machine in the most detailed
view.

well known ports first, then rest of ports (sorted)
then source and destination traffic broken out.

Designed for class Bs.

http://security.ncsa.uiuc.edu/distribution/VisFlowConnectDownload.html

3 vertical lines, comes from edge network perspective; 
middle line is edge network to manage.  You set range
of networks you care about.  Outside lines are people
sourcing or sinking traffic to you, from outside
domains.

There's a time axis, traffic only shown for the slice
of time currently under consideration.
Uses VCR-like controls to move time forward/backward

Lets you see traffic/interactivity, drill into that
domain, see host level connectivity flows.

Shows MS Blaster virus traffic as an example.

Example 2, a scan example.  Just because it looks
like one IP hitting many others doesn't mean it's
really a security incident, though; could be a
cluster getting traffic.

web crawlers hitting NCSA web servers make for
a very characteristic pattern over time.

Summary
Netflows analysis is non-trivial, 

NVisionIP
VisFlowConnect-IP

lots of references listed in very fine blue font.

http://security.ncsa.uiuc.edu/distribution/NVisionIPDownload

Avi Freedman, Akamai, Argus was mentioned a lot; it
lets you grab symmetric netflows, but also does TCP
analysis, shows some performance data as well.  not
sure if people are studying the impact of correlating
argus data with flow data.

Roland Douta? of Cisco; many people are using netflow
to track security issues.  They now have ingress and
egress flow data on many of their platforms.
In reading paper describing it, there's data conversion
that needs to happen into an internal format that
nVision can understand.  It reads log files at the
moment, takes about 5 minutes to process files.  Lets
them take different file data sources, make the tool
for visualization independent of the input format.
They can read large files, but there is a performance
hit when doing it.
Are they planning on doing further work on the tool
to collect TCP flags, for frags, drop traffic, etc?
They've looked at it, but they leave it to IDS tools
for flag activity.  Might be of interest to consider
for future versions of the tools.

Last question came up, echoed about argus.
Question about interactivity, they are working on
feedback through tools.  Question about alarming
on patterns; but once you start alarming or putting
up visual indicators, it distracts from rest of
the overall pattern, you tend to miss other information.






NANOG36-NOTES 2006.02.14 talk 1 IRR power tools

2006-02-14 Thread Matthew Petach

Apologies in advance, notes from this morning will be a bit
more scattered, as I was working on an issue in parallel
to taking notes.

Matt



2006.02.14 talk 1 IRR Power Tools


12:10 to 12:25, extra talk added, not on
printed agenda.
Thanks to those who submitted lightning
talks.

PC committee members are doing moderation,
Todd Underwood will be handling the first
session this morning.

There will be 3 talks about tools for operators
1 IRR and 2 Netflow tools.  Be thinking of 
interesting questions to ask.

Todd has to introduce RAS at 9am, 7am west
coast time which is normally his bedtime.

IRR power tools, Dec 2004 first generation
re-write.

IRR--a quick review
People have been asking him "why do we need
the IRR?"  Any time you have a protocol like
BGP that can propagate information, you need
some form of filtering in place to limit 
damage.

IRRs are databases for storing lists of
customer information.  Written to speak RPSL
some speak RPSLng.
RADB
ALTDB
VERIO, LEVEL3, SAVVIS
RIR-run databases: ARIN, RIPE, APNIC, etc.

IRRs better than manual filtering.
huge list on the slides.
Filtering is needed, and hard to keep updated by
hand.

Why doesn't everyone use IRR?
Many people do
In Europe, pretty much total support; it's
required by RIPE, providers won't deal with you if
you don't keep your entries up, and large exchanges
likewise check.

Few major networks in US use IRR too:
NTT/Verio
Level3
Savvis
Most people don't.

Why doesn't everyone use it?
In US, it's too complex for customers.
support costs go up if you have to teach customers.
Networks don't like to list their customers in a public
database that can be mined by competitors

RAS figured he could fix at least one piece
Wrote a tool to help with:
automatic retrieval of prefixes behind an IRR object
automatic filtering of bogon or other undesirable routes
Automatic aggregation of prefixes to reduce config size
Tracking and long-term recording of prefix changes
Emails the customer and ISP with prefix changes
Exports the change data to plain-text format for easy
interaction with non-IRR enabled networks
Generates router configs for easy deployments.
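The aggregation step, collapsing adjacent or overlapping prefixes to shrink configs, can be sketched with Python's standard `ipaddress` module (an illustration of the idea, not irrpt's actual code):

```python
import ipaddress

def aggregate(prefixes):
    """Collapse a list of prefix strings into the minimal covering set."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Two adjacent /24s merge into one /23; the overlapping /25 is absorbed.
print(aggregate(["192.0.2.0/24", "192.0.3.0/24", "192.0.2.0/25"]))
# -> ['192.0.2.0/23']
```

Fewer prefix-list lines means smaller router configs and faster filter updates, which is the point the talk is making.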

Doesn't do import/export policies,
doesn't do filter-sets, rtr-set, peering-set, etc.
Just focuses on essential portions.

Tool was written around IRRToolSet initially, but
the C++ code didn't compile nicely.
This isn't a complete replacement for IRRToolSet,
but provides the basic functionality

A few conf files:
IRRDB.CONF
EXCLUSIONS.CONF
NAG.CONF

./irrpt_fetch grabs the current database info

It also speaks clear english on add/remove of
prefixes for access lists; default format is
english, but you can change it to diff format.

./irrpt_pfxgen ASNUM
generates a prefix list suitable for the customer
interface.
Can use -f juniper to create juniper filters.
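The fetch-and-generate step above amounts to pulling RPSL route objects for an origin AS and rendering them as router filters. A minimal sketch in Python (the helper names and the sample RPSL text are invented for illustration; this is not irrpt's code):

```python
def routes_from_rpsl(text):
    """Pull the 'route:' attribute out of RPSL route objects."""
    return [line.split(None, 1)[1].strip()
            for line in text.splitlines()
            if line.startswith("route:")]

def ios_prefix_list(name, prefixes):
    """Render prefixes as an IOS-style prefix list, one permit per route."""
    return ["ip prefix-list %s permit %s" % (name, p) for p in prefixes]

# What a whois query against an IRR might return for one origin AS:
sample = """\
route:      192.0.2.0/24
origin:     AS64500

route:      198.51.100.0/24
origin:     AS64500
"""
print("\n".join(ios_prefix_list("CUST-AS64500", routes_from_rpsl(sample))))
```

A real tool would also aggregate the prefixes and strip bogons before rendering, as described above.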

http://irrpt.sourceforge.net/
Always looking for more feedback; it's been deployed
by a few people in the peering community; this will
be its first widescale announcement.

Future plans:
Add support for IPv6/RPSLng
 needs IPv6 aggregation tools
RADB tool uses a faster protocol, RIPE just breaks down
  one level; you have to do multiple iterations to get
  the full expansion.  Servers tend to time out before
  you can get all the answer; RIPE servers have hard
  3 minute timeout that closes the socket.
 Add SQL database support for a backend
 Convert from a script to a real application
 IRRWeb -- http://www.irrweb.com/

He'll talk about irrweb at next nanog.
Allow end users to register routes without needing to
know ANYTHING about RPSL

You can play with it, register routes, but it
doesn't publish anywhere.

That's it--happy valentine's day!
Richard A Steenbergen ras at nlayer.net

Susan notes that
RADB is developed by Merit, the two primary
developers are here today
Chris Fraiser, main cust interface now
Larry Blunk is RPSLng person, also here today.

Right now, no mirroring between IRRs, you have
to mesh with everyone else when a new IRR comes
up.  RADB at least does pick up from the others,
so right now RADB is the best spot to do your
queries against.

Todd asks about filters; does it do prefix list
only, or prefix list plus as-path?
It builds off ASes behind other ASes, which might
not be the best model; latest code is starting
to understand as-sets.  To do it properly, you
might need import/export policy support.

Randy Bush, IIJ.  Like IPv6, this meeting marks
the tenth anniversary of Randy pushing for IRR
adoption.  And like IPv6, adoption rate has not
been going well.  What's wrong?
Pretty much too complex, which is why this effort
is to make it much simpler, to try to get more 
uptake in the US.

Todd notes two things: first, tools are too difficult;
this addresses that.  Second, in the US,
allocations aren't tied to registry entry creation;
this won't solve that part at all.

For the second part, the benefits are seen mostly
the closer you are to the registration process.
Anyone can register any block; and if you don't
use 

NANOG36-NOTES 2006.02.13 talk 7 QoS in MPLS environments

2006-02-13 Thread Matthew Petach

Here's my notes from the MPLS QoS tutorial; wish I could have
been in two places at once to catch the ISPSec BOF as well.
I won't be taking notes at Eddie Deens, though, so it'll be up
to Ren's camera to capture the details for those following along
at home.  < http://nanog.multiply.com/ >

Matt



2006.02.13 
QoS in MPLS networks tutorial notes.

See notes for Agenda, outline, etc. at
http://www.nanog.org/mtg-0602/sathiamurthi.html

Traffic characterizations go beyond simple DiffServ
bit distinctions 
Understand traffic types and sources and nature
of traffic before applying QoS

Latency, 
Jitter,
Loss
three traffic parameters to be tracked that influence
choices made when applying QoS

It's all about managing finite resources
 rate control, queuing, scheduling, etc.
 congestion management, admission control
 routing control traffic protection

The QoS Triangle (no, not bermuda triangle)

Identify Traffic Type
Determine QoS parameters
Apply QoS settings

2 approaches to QoS
fine-grained approach
or
combination of flows to same traffic type, to same
 source.  Needs to have same characteristics so you
 can consider them as an aggregated flow.

Best Effort is simplest QoS
Integrated services (Hard QoS)
Differentiated Services (soft QoS)

Best Effort is simple, traditional internet

Integrated services model, RFC 1633, guarantees per
flow QoS
strict bandwidth reservations.
RSVP, RFC 2205, PATH/RESV messages
Admission controls
must be configured on every router along path
Works well on small scale.  Scaling challenge with large
 numbers of flows.
 What about aggregating flows into integrated services?

DiffServ arch; RFC 2475
scales well with large flows through aggregation
creates a means for traffic conditioning  (TC)
defines per-hop behaviour (PHB)
edge nodes perform TC
  keeps core doing forwarding
tough to predict end to end behaviour
 esp with multiple domains
 how do you handle capacity planning?

Diff services arch slide with pictures of
traffic flow.

TCA prepares core for the traffic flow that
will be coming in; allows core to do per-hops
behaviour at the core.

IETF diffserv model
redefine ToS byte in IP header to differentiated services
code point (DSCP)
uses 6 bits to define behaviour into behaviour aggregates.

Class Selector (CS0 through CS7)
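The bit layout is easy to get wrong, so here's a quick sketch of the relationship between the old ToS byte, DSCP, and the class selectors (standard DiffServ arithmetic, not taken from the slides):

```python
def dscp_from_tos(tos):
    """DSCP occupies the top six bits of the old IPv4 ToS byte."""
    return tos >> 2

def class_selector(n):
    """CSn keeps only the three old IP-precedence bits: DSCP value n * 8."""
    return n << 3

# CS5 (old precedence 5) is DSCP 40; a ToS byte of 0xB8 decodes to DSCP 46 (EF).
print(class_selector(5), dscp_from_tos(0xB8))  # 40 46
```

Because the class selectors zero the low three DSCP bits, DiffServ stays backward compatible with routers that only look at IP precedence.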

classifier; selects packets based on headers.

Classification and Marking
flows have 5 parameters; IP src, dest, precedence,
DSCP bits,

You can handle traffic metering via adjusting the
three flows.


3 parameters used by the token bucket:
committed information rate (CIR),
conformed burst size, and extended burst size

Policing vs shaping.
policing drops excess traffic; it accommodates bursts;
anything beyond that gets dropped; or, can be re-marked.

Shaping smooths traffic but increases latency.
buffers packets.

policing
uses the token bucket scheme
tokens added to the bucket at the committed rate
depth of the bucket determines the burst size
packets arriving when there's enough tokens in the bucket
are conforming
packets arriving when the bucket is out of tokens are
non-conforming; either coloured, dropping, etc.

diagram of token bucket, very nice.
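The policing scheme just described can be sketched as a simple token bucket (an illustrative model with made-up parameter names, not any vendor's implementation):

```python
class TokenBucketPolicer:
    """Single-rate policer: tokens accrue at the committed rate (CIR),
    the bucket depth bounds the burst, conforming packets consume
    tokens, and non-conforming packets are dropped (or re-marked)."""

    def __init__(self, cir_bps, burst_bytes):
        self.rate = cir_bps / 8.0     # refill rate in bytes per second
        self.depth = burst_bytes      # bucket depth = maximum burst
        self.tokens = burst_bytes     # bucket starts full
        self.last = 0.0

    def conforms(self, now, pkt_bytes):
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True               # conforming: forward
        return False                  # out of tokens: drop or re-mark

# 8 kbit/s CIR (1000 bytes/s) with a 1500-byte burst: the first 1500-byte
# packet at t=0 conforms, an immediate second one does not.
p = TokenBucketPolicer(cir_bps=8000, burst_bytes=1500)
print(p.conforms(0.0, 1500), p.conforms(0.0, 1500))  # True False
```

At this rate the bucket refills at 1000 bytes/s, so after two idle seconds a full 1500-byte packet conforms again; a shaper would instead queue the second packet until enough tokens arrived.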

shaping--use the token bucket scheme as well
smooths through buffering
queued packets transmitted as tokens are available.

1 aspect is traffic conditioning at edge
2 aspect is per hop behaviour

PHB relates to resource allocation for a flow
resource allocation is typically bandwidth
 queuing / scheduling mechanisms
 FIFO / WFQ / MWRR (weighted) / MDRR (deficit)
congestion avoidance
 RED (random early detection) / WRED (weighted random early drop)

Queuing/scheduling
needs some data mining to decide how to prioritize certain
classes of traffic.
de-queuing depends on weights assigned to different flows.

Congestion avoidance technique
 when there is congestion what should happen?
 tail drop (hit max queue length)
 drop selectively but based on IP Prec/DSCP bit
Congestion control for TCP
 adaptive
 dominant transport protocol

Slide showing problem of congestion; without technique,
have uncontrolled congestion, big performance impact
due to retransmissions.

TCP traffic and congestion
congestion vs slow-start
 sender/receiver negotiate on it.
 source throttles back traffic.
 (control leverages this behaviour)

Global synchronization happens when many flows pass through
a congested link; each flow going through starts following
the same backoff and ramp up, leads to sawtooth curves.

RED
a congestion avoidance mechanism
works with TCP
uses packet drop probability and avg queue size
avoids global synchronization of many flows.
minimizes packet delay jitter by managing queue size

RED has minimum and maximum threshold; average queue
size is used to avoid dealing with transient bursts.
WRED combines RED with IP precedence or DSCP to 
implement multiple service classes
each service class has its own min and max threshold and
 drop rate.

nice slides of lower and higher thresholds for different
traffic types.
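The RED drop curve described above (no drops below the minimum threshold, everything dropped above the maximum, and a linear ramp in between) can be sketched as follows; this is illustrative only, with WRED simply running one such profile per precedence or DSCP class:

```python
def red_drop_probability(avg_q, min_th, max_th, max_p):
    """RED drop curve: 0 below min_th, 1 at/above max_th,
    and a linear ramp from 0 to max_p in between."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    return max_p * (avg_q - min_th) / (max_th - min_th)

# Halfway between the thresholds, the drop probability is half of max_p.
print(red_drop_probability(30, 20, 40, 0.1))
```

Using the *average* queue size for `avg_q` (rather than the instantaneous depth) is what lets RED ignore transient bursts, as noted above.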

When is WRED used?  Only when TCP is the bulk of traffic.

NANOG36-NOTES 2006.02.12 talk 5 IPv6 --fear and GOSIP in Dallas

2006-02-13 Thread Matthew Petach

Apparently the video feed is of very good quality this time around--many
thanks to Brokaw for the good bandwidth to the hotel!

Last set of notes before lunch.

Matt


2006.02.12 NANOG IPv6 transition panel
panel member briefs at 
http://www.nanog.org/mtg-0602/golding.html


IPv6: time for transition, or just more GOSIP?

GOSIP was initiative to use OSI networking throughout
the government.

5 participants
Joe Houle ATT
Jared Mauch NTT America
Wes George, Sprint
Jason Schiller, UUNet/Verizon
Fred Wettling, Bechtel

Tried to get government people, since they went v6,
but they're not forthcoming with details; you know
how government people are.  :D  

Daniel Golding, The burton group

Joe Houle, ATT is up first.  Emerging service for
ATT for L2/L3, IP private networking, v6, etc.
fall under his bailiwick.  He'd count himself as
pragmatically pro; IPv6, why now?  He does believe
we're running out of IPv4 addresses.  NATs and
non-unique addresses make offering quality services
difficult.  Convergence doesn't work well over
NAT'd addresses. 
why governments?  US government doesn't want the
have-have-not split to continue; the v6'ers may be
the "have" side and we don't want to be on the
have-not side.

NTT America (AKA 2914)
Native dual-stack IPv4/IPv6 since fall 2003
Cisco 7200, 7500, "76k"
Juniper M series, T-series

Wes George, Rob Rackell hat, couldn't be here
due to weather.  Pro v6, looking at it with skepticism.
Sprint close to center of v6 world.
200pps on v6 network.
Internet doesn't use v6 for real yet
this is not the same movie as the ISO fun; this time the
 government is paying!
IPv6 is something that US carriers can make money on
 in the VPN space
It is not valuable as an internet transport yet
 spend less time marketing about how cool it is, and go
  fix the issues!!
  multihoming, micromobility, SHIMv6 is a host solution.

This time around, the government is paying.  They don't
know exactly what they want, but they know they want it.
hoping carriers will figure it out and tell them.

Jason Schiller, UUnet/Verizon.
public v6 roadmap.
(AS284 US / AS12702 EMEA / AS18061 AsPac) for v6 only
over network utilizing GRE
Phase 2
6PE solution in AS701
dual stack v4/v6 on edge
mail, DNS support
later phase 2a
 upgrade existing non-6PE capable edge routers
2007, phase 2b
 native v6 in the core (maybe)

Problem is, no money yet in v6, so can't roll out
aggressively at all.
But if no money, why put it in the core?  Well, to
be ready in case it DOES take off in the future.

Fred Wettling, Bechtel--large enterprise, also with
v6 business council.
Bechtel Telecoms (A & C for big carriers like Sprint, ATT,
 etc).
Interested in non-traditional transport of IP services.
shift in plant automation networks from proprietary
to IP; so want to be ahead of the curve on it.
Bechtel's internal test started last year, will be
deployed out to 40,000 by this year; a bit of the
chicken and egg issue, go back to 1995, IE v1 vs
today; things will progress, things will take off,
the goal is to be ahead of the curve.

Daniel Golding, host for the panel.

Question 1:
Why IPv6, why now?  Why are you implementing v6, other
than it's cool?  Is it address exhaustion, new capability,
Gov't RFP requirements, vendors pushing new hardware?

Jared notes they rolled it out in 2003 due to global
pressures; they wanted to keep a unified network model
worldwide, and as a subsidiary of a japanese company,
and the largest player in that space, combined with
government mandates, really pushed them in that space
early.
It _is_ a technical cool thing, it's good to be a
market leader.  Jared notes that they've been running
dual stack v4/v6, it just works.

ATT VoIP has been a driver, just doesn't work over NAT,
so what other solutions are there?  Really, address
exhaustion, non-unique addresses propagating throughout
space is just putting roadblock after roadblock in front
of convergence.
Dan asks why do we need NAT--we're not OUT of v4 addresses
yet; Joe notes that people are really using NAT as a
security mechanism right now, more so than really worrying
about conserving address space.  Yes, it's bogus, but
it's what people have been sold on right now, so it
gets widely used.
Jared pitches in and notes that the push to encapsulate
everything over port 80 is getting more and
more widespread.  People are attempting to use "firewalls"
and "NATS" to give themselves the notion of security, 
even though most infection rates now are coming from
other vectors (spyware, infected email, etc), rather
than outside probing.
Dan notes we don't need to do NAT, they can go to 
their upstream, to ARIN.  But ARIN frowns on using
public space for private use?

Bechtel notes they're running into more and more
problems as they try to get companies to do joint
ventures, as every company uses 10.x space, and
they have to do NAT over NAT, it's evil.  He's
also an IMOD (infrastructure modernization)
player, it's a 4 billion dollar upgrade for the
military, and it ha

NANOG36-NOTES 2006.02.13 talk4 DNS infrastructure distribution

2006-02-13 Thread Matthew Petach


2006.02.13 Steve Gibbard

DNS infrastructure Distribution
Steve Gibbard
Packet Clearing House
http://www.pch.net/
scg at pch.net

Introduction
Previous talk on importance of keeping critical
 infrastructure local
Without local infrastructure, local communications are
 subject to far away outages, costs, and performance
Critical infrastructure includes DNS
If a domain is critical, so is everything above it in the
 hierarchy
Sri Lanka a case in point.

Previous talk was in Seattle last spring, highlighted
undersea cable being cut; even local DNS queries failed
since TLD servers couldn't be reached, even though
local connectivity still worked.  The ship dragging
anchor in harbor cut only undersea path out of the
country; international calling was down, and all of
the Internet.  But unlike local telephone system,
even local networks failed to work.

Root server placement
Currently 110 root servers(?)
 Number is a moving target
Operated by 12 organizations
13 IP addresses
 at most 13 servers visible from any one place at any one 
  time
 six are anycast
 four are anycasted in large numbers
All remaining unicast roots are in the bay area, LA,
 or washington DC

Distribution by continent
34 in NA
 8 each in Bay Area/DC, 5 in LA
 Only non-coastal roots in US are Chicago and Atlanta
 Canada, Monterrey (Mexico), some others
34 in Europe
 clusters of 4 each in London and Amsterdam, Europe's
  biggest exchanges
 evenly spread throughout the rest of Europe

Distribution by continent
26 in Asia (excluding middle east)
5 in japan (4 tok, 1 kyoto)
3 in india, korea, singapore
2 in hongkong, jakarta, and beijing
south asia an area of rapid expansion
6 in australia/new zealand
 2 in brisbane
 1 each in auckland, perth, sydney, and wellington

5 in middle east
 1 each ankara, tel aviv, doha, dubai, abu dhabi
3 in africa
 2 in johannesburg
1 in nairobi, 1 more being shipped
very little intercity or intercountry connectivity
2 in SA
 sao paulo
 santiago de chile

other parts of world not really served at all.
world map with blobs showing coverage.  Huge areas
not covered.
overlaid fiber maps with dots to get ideas of
coverage (redundant); everyone else is one fiber
or satellite cut from being isolated and dark.
Pretty much follows the areas with money.

Root server expansion
4 of 12 root servers actively installing new roots
110 root servers big improvement over 13 from 3 years
 ago
two operators (autonomica, ISC) (I and F) are installing
wherever they can get funding
 funding sources typically RIRs, local governments, or
  ISP associations
 Limitations in currently unserved areas are generally due
  to lack of money

Fs and Is
In large portions of world, several closest roots are
 Is and Fs
 At most 2 root IP addresses visible; others far away
 Does this matter?
  gives poorly connected regions less ability to use
    BIND's failure and closest-server detection mechanisms
  non-BIND implementations may default to far-away roots
 Should all 13 roots be anycasted evenly?
  CAIDA study from 2003 assumed a maximum of 13 locations;
   not really relevant anymore

Big Clusters
Lots of complaints about uneven distribution
Only really a concern if resources are finite
Large numbers in some places don't prevent growth in 
 others
Bay Area and DC clusters seem a bit much, but sort of match
 topology
Western Europe's dense but relatively even distribution
 exactly right
Two per city perhaps a good goal for everywhere

TLD distribution
Like the root, locally used TLDs need to be served
  locally
Locally used TLDs: local ccTLD; any other TLDs commonly
 in use
Regions don't need ALL TLDs.

gTLD distribution: .com/.net
.com/.net
 well connected to the "internet core"
 servers in the big cities of US, Europe, Asia
 non-core location: sydney.

Map of world with .com/.net overlaid with fiber maps
shows "well-served areas" again following the money,
with even less coverage outside NA/Europe/Asia.

gTLD dist: .org/.info/.coop
share same servers
server list considered confidential; data may be incomplete
significantly fewer publically visible servers,
almost all in internet core.
only one public location in each of Asia and Europe

Even worse coverage worldwide, though they do have
south africa.

Do have some caching boxes next to caching resolvers
at the big ISPs; not sure if it increases coverage
or not.

Few other gTLDs, didn't map them.
.gov is us-centric
.edu is US, some eu, some asia
.int is california, netherlands, UK
  (not very international!!)

Where should gTLDs be?
presumably depend on their markets
if it's ok for large portions of the world to not use
  those gTLDs, then it's OK for them to not be hosted there.

ccTLD dist:
 answers to where ccTLDs should be more straightforward
  working in their own regions a must
  working in the "core" could be a plus
just over 2/3 of ccTLDs are hosted in their own
  countries
(but a lot of those aren't ...

Green map shows those countries that host their own
ccTLDs locally.  Most islands are red, in danger of
bei

NANOG36-NOTES 2006.02.13 talk 3 NTT labs AAAA query explosion worries

2006-02-13 Thread Matthew Petach

(Huge apologies in advance for any and all names I completely
mangle!  check http://nanog.multiply.com/ to see names/faces
correctly handled by Ren.  ^_^; )

Matt


2006.02.13, talk 3
NTT labs, (Steve Feldman apologizes for mangling the
pronunciation of their names).

NTT information sharing platform labs
(didn't get names/info from opening slide)

Outline
Expect increase in number of DNS queries this year
Discussion
 effect on cache server load and user response time
 how can we decrease number of unnecessary queries?

Today's topic
we focus on increase in number of queries between users
and cache servers caused by
 IPv6 support
  number of 4A (AAAA) queries same as that of A queries
 domain name completion function
  DN completion by OS
  DN completion by application

IPv6 enabled OS increases 4A queries
 Vista will be v6 enabled by default

IPv6 and OS resolver
IPv6 enabled OS sends 4A queries for every name resolution
BSD/Windows
  Sends both A and 4A queries for every name resolution
   currently no way to disable one or the other

Domain Name Completion
 when a name resolution fails, both OS and APP automatically
 try different prefix/suffix completions.

OS using these domains to complete:
 FreeBSD: specified by "search" in /etc/resolv.conf,
  distributed by DHCP
 Windows: configured in control panel, distributed by
  DHCP
 Applications:
  Mozilla: retries with www domain prefix
  IE searches domain using MSN search and then retries
   name resolutions for domains by adding .com, .org,
   .net, .edu.

Convenient for user, perhaps, hard on nameservers.

Combination in FreeBSD
completions are different depending on OS
FreeBSD
 tried domain completions for A and 4A for each case.
Windows tries all 4A records first, THEN tries all A
 records.

So IPv6 queries in Windows means even if there's an
A record in v4 space, it exhausts ALL 4A possibilities
FIRST, before going back to get A record.

Longhorn/Vista
IPv6 default enabled
 ALWAYS tries 4A queries first!

IE7 plus Vista results in 12 DNS queries per user click,
best case.
Worst case, one user click results in 40 DNS queries!!
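The query multiplication the talk describes can be sketched as a counting exercise (a toy model, assuming a dual-stack resolver that tries every search-list completion for both address families; the host and domain names are made up):

```python
def lookup_queries(name, search_domains, families=("AAAA", "A")):
    """Worst-case query list for one name lookup: every name-completion
    candidate is tried for every address family.  Families are iterated
    outermost, matching the described Windows order of all AAAA
    candidates first, then all A candidates."""
    candidates = [name] + ["%s.%s" % (name, d) for d in search_domains]
    return [(fam, c) for fam in families for c in candidates]

# A bare hostname with a three-entry search list already costs
# 2 families x 4 candidates = 8 queries, before any application retries
# (www. prefix, .com/.org/.net/.edu suffixes) multiply that further.
q = lookup_queries("intranet",
                   ["corp.example.com", "example.com", "lab.example.com"])
print(len(q))  # 8
```

Enabling IPv6 by default doubles the candidate list, and browser-side completion multiplies it again, which is how one click reaches dozens of queries.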

Slide showing projected impact based on historical
data plus projected Vista deployment.
Right now, 4A queries only about 5% of queries.
After Vista, size of increase could dwarf rest of
DNS queries.

Release of Windows Vista (IPv6 by default)
 doubles at least the number of user queries
 causes more queries in domain name completions and domain 
  search sequences

Operators
 cache servers should be prepared for those increases

 stop domain distribution to users by DHCP or PPPoE
Developers of OS
 is current search order of resolvers appropriate?
  eg should "A" record be resolved before domain completion.

Ed from Neustar, at microphone: before we consider this
a problem, consider from point of application provider;
when you need a name, you don't know what transport you
may have underneath; if you wait for NXDomain, you 
increase latency, so app developers generally send all
queries at once.
What about changing DNS to allow asking for multiple
questions at once?
Changing application behaviour isn't likely to happen,
and changing protocols isn't easy; so why not just
beef up the infrastructure to handle it?

Joel Jaeggli, U of Oregon; do you know how many of those
queries will need to fail over from UDP to TCP due to
responses being too large to fit into a single UDP
response?
Most of the responses coming back don't have data, so
they don't need to go to TCP.

Tony Bates--what happens when v6 record is returned
as valid; does the chain stop there?
Also, if you flip to return A record first, we'll
never move to v6.  We NEED to start resolving v6
records first, to help move the 'Net off IPv4.

Applause, on to next talk.



NANOG36-NOTES 2006.02.13 talk 2 Duane Wessels, DNS cache poisoning

2006-02-13 Thread Matthew Petach


2006.02.13 talk 2
DNS cache poisoners
Lazy, Stupid, or Evil
Duane Wessels

Motivation
During March/April 2005, SANS internet storm
center reported a number of DNS cache poisoning "attacks"
were occurring
Poisoned nameservers have bogus NS records for the com zone
SANS ISC theorizes it may have been a vector for spyware 
propagation
Microsoft Windows (most versions) and Symantec firewall
products are affected.

Slides are on the website, BTW.

The poisoning attack:
an auth nameserver (where queries normally go) is
configured to return bogus and out-of-bailiwick NS
auth records.

caching resolver receives and trusts those bogus referrals

future queries for names in poisoned zone go to the
bogus NS 

dig +trace longislandauction.com
will show the poisoned NS auth responses
NS auth1.ns.sargasso.net.

which has 
NS  com.  auth1.ns.sargasso.net.

so any caching resolvers may consider auth1.ns.sargasso.net
authoritative for any unknowns in com zone.

Vulnerable implementations:
Windows NT (by default, SP4, can tweak it via reg)
Windows 2K, (by default, later fixed)
Windows 2003 (not by default, but easy to unfix)

Symantec gateway firewalls
SYM04-010 and SYM05-010; search Yahoo to find more.

How to find poisoners?
start with a large list of DNS names or zones
discover set of auth servers for the zone by following
 referrals on down from root
query each auth nameserver
compare the NS RR set in each reply to the previously-learned
 referrals for parent zones
this technique only finds parent-zone poisoning.
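The survey's detection step, comparing a reply's NS sets against the referrals learned by walking down from the root, can be sketched as a set comparison (an illustrative model, not the authors' actual tool; all server names are from the example above or invented):

```python
def poisoned_zones(learned_referrals, reply_ns):
    """Flag parent zones whose NS RRset in a reply is not a subset of
    the referrals previously learned by following delegations from
    the root down."""
    bad = []
    for zone, servers in reply_ns.items():
        known = learned_referrals.get(zone, set())
        if not set(servers) <= known:
            bad.append(zone)
    return sorted(bad)

# Learned by following referrals from the root:
learned = {"com": {"a.gtld-servers.net", "b.gtld-servers.net"}}
# What the suspect authoritative server claims about the com zone:
reply = {"com": {"auth1.ns.sargasso.net"}}
print(poisoned_zones(learned, reply))  # ['com']
```

A vulnerable resolver that trusts such a reply will send all future com-zone queries to the bogus server, which is the poisoning described above.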

February 2006 Survey
input list is about 6 million names from nameservers they
have access to.
Found 284 "poisoning" nameservers that return bogus NS
entries for root or TLD.
. has 217
com 49
net 29
org 24
au 3
cc 2
cn 1
to 1
default 1

some nameservers poison more than one zone.

List of some poisoners on slide 12.
dns.internic.ca
ns1.afternic.com
ns0.directnic.com
ns1.domainsarefree.com
etc.

Never attribute to malice what can be adequately
 explained by stupidity
Many of the nameservers that return bad referrals
appear to be companies in the DNS business
 registrars
 resellers
 speculators
 typo profiteers
others appear to be legitimate companies
they should know better
many of the names leading to poisoners are either expired
or parked

Is the sky falling?
with so many poisoners out there, why don't we hear
more about the problem?

Most implementations don't allow root to be poisoned

If you were surfing the web with poisoned DNS cache,
would you know it?

let's simulate it...
for every bad referral found, we
 put the nameserver's IP address
 go to www.google.com
 go to www.microsoft.com
see what you get.
bbns01.secureserver.net, for example, happily pretends
to be google.com.

dns.domainsatcost.ca is amusing, because their ads are
from google, even as they hijack it.

a.ns.nameflux.com at least does an HTTP redirect

dns2.nai.com
doesn't return any A record, so you at least know
*something*'s wrong.

More examples follow...
ns1.frakes.net

ns.pairnic.com "smart people use pairnic for DNS"...
Duane would beg to differ.

65.75.128.178.com
returns an amusing message that is clearly wrong,
blaming the clients for the traffic the DNS server
itself is causing.

Lazy, Stupid, or Evil
Laziness:  ns1.hi2000.com
The admin is too lazy to put each domain delegated to
them into separate zone files.  Instead, they create a
com zone and list A records for each delegation.

Laziness such as this is probably the source of most of
the poison out there.

(includes guess at what their zone file looks like)

Stupidity: ns1.frakes.net
Typos, combined with laziness, create an interesting
situation.  Looks like Frakes.net is using the com zone
technique, but forgot to make the nameservers fully qualified.

Note that ns1.com etc are legitimate DNS names and have
A records different than those returned by ns1.frakes.net.

Just forgot the dot after the trailing name on the NS
record.

Evilness:
Our definition of an evil poisoning nameserver is one
where it answers queries with the wrong address, and
maybe proxies web traffic sent there so you get what
you (mostly) expect.

To help find them, give each source of poison an
evilness ranking from 1-5, with one point for each
issue below:
 Returning bad referral
 poisoning a TLD
 Answering an A query for "important names"
 Answering query incorrectly
 Answering the query such that the web browser looks
  like it *might* be correct DNS

A few fours, no fives.

Miscellany
Some of the poison sources that we find are actually
vulnerable implementations that have been previously
poisoned by someone else.
Remember: authoritative nameservers should NEVER accept
 recursive queries!!
Some NS records have non-FQDN names.  The name "ns"
is a popular example.
It's a good thing even the vulnerable implementations
 don't let the root zone become poisoned.

Bottom Line:
Several hundred misconfigured nameservers out there that
return bad referrals that can poison DNS caches
About 75% try to p

NANOG36-NOTES talk 1--steve feldman

2006-02-13 Thread Matthew Petach

Based on generally positive feedback from many people,
I'll be posting my notes from the conference.  I'll preface
the subject line with NANOG36-NOTES, so if you want
to mass-skip the thread, it should be easy to do so.


2006.02.13 NANOG36 day 1

Opening/welcome to Dallas

Steve Feldman starts off--many people are
still trapped out east unfortunately.

Steve Feldman, Program Chair, CNET networks.

Texas Compact Car == SUV

Our Host:
Brokaw
Brian
Mike
Todd Parker
Raj Patel
Brad Parker

Thanks to NANOG program committee
List of them went too quickly

Agenda Changes--Tuesday
12:10--12:25...went too fast to see.

Agenda Changes  Wednesday
9:30-10 Hurricane Katrina: Telecom infrastructure,
impacts, solutions, and opportunities...
(more on slide)

Reminders
Network Security
 don't use cleartext passwords
 do use end-to-end encryption (ssh, VPN)
PGP Key signing
 see link off .nanog.org for details
Yahoo Reception
Beer and Gear reception

Interpreting Badges.
Blue -- steering committee
Yellow -- program committee
green badge -- yellow plus blue, both committees
green dot: peering
black dot: security
red dot: PGP signer
RED badges will be for mailing list panel members

Lightning Talks
six 10-minute slots available
Criterion: on-topic for mailing list
Signups start now!
  http://www.nanogpc.org/lightning
Random acceptance of submissions made before 2pm Monday
Submission order after that (if slots remain)

That's it for Steve Feldman, next up is Brokaw
Price from Yahoo.

Welcome to Dallas, on behalf of Yahoo.
Not many hotels in Dallas that can host a group of this
size with two ballrooms (one for general session, one
for beer and gear; and the Hyatt was the only other one
with space, and they're booked since Katrina wiped out
other conference spaces in the south.).

There's a trolley one block over that will take you
to the downtown restaurant areas, it's free, feel
free to take it and explore the area, note it stops
running around 9:30pm.  It does quiet down after dark
in downtown, unfortunately.
One thing really needed for a NANOG is a really good
sized Internet link.  The hotel had a pair of T1s
to start with, and they were getting a second pair
when Yahoo began pulling the fiber in for NANOG.

Terminal room is now virtual; feel free to use the
laptops and printers should you need to print documents,
boarding passes, etc.  The laptops are cabled down, 
but if you need to borrow one, just track Brokaw down,
and he'll take care of you.

He's been feeling like he's part of the garment
industry in doing all the gear (two sets of tee
shirts, and the fleeces for people who peer with
Yahoo--if you don't already peer with Yahoo, jump
in and send us [EMAIL PROTECTED]).

There will be an awesome party tonight at Eddie
Deens, everyone should make sure to attend--it'll
be fun, Texas style, and Texas sized.  :)

Many thanks to Mike Gallagher for doing the
NANOG36 specific website with details on the
local area and local options for attendees.

Betty Burke asked him to say a few words about what
it takes to put on a conference like this; in many
ways, it's been like being a huge wedding planner,
only weddings don't need large internet connectivity. 
Start planning early!!

Brokaw thanks the Merit people for being so supportive;
they've been complete animals, biting into the details
with gusto; they're like true roadies, getting gear
packaged and shipped, audio gear, video gear, cables,
power, everything.

Many thanks to Larry, Betty, Chris, Dave, Susan, Laurie, 
Dwayne, Steve, Tony, Tom, Greg, SC, PC, everyone else, 
it's all been completely worth it!

It's definitely exciting times--we're building something
huge; traffic levels are growing at near-exponential
levels, datacenters are rolling out faster and faster.
It's great being part of this community, and we all
need to help keep it alive, to nurture it and helping
it grow.  If you haven't hosted a NANOG yet, definitely
consider it; it's an interesting process.  It starts
off with "What's NANOG?"  "What's the ROI on a NANOG?"
Dan Golding joking about having a PBS-style thermometer
graph showing how much new peering we get each day, to
demonstrate the ROI for hosting a NANOG.  
But really, it's about sharing the support for the
community--it's about stepping up to the plate, and
saying "it's our turn to pitch in."

There's 26 inches of snow in central park, which is
keeping many, many of our colleagues away; and if you
DO host, try to avoid Valentine's Day!!  If you have
any questions about hosting, feel free to call us.
The good people at ATT have been great working with
us.  The fiber coming into the hotel terminates in
the parking garage, but that was about 60 feet from
where it needed to be.   December 23rd, discovered
the shortfall, and discovered how to pull innerduct
in through a hotel on short notice.  No matter what
else happens for NANOG

Huge thanks to Yahoo crew, especially
Mike Gallagher,
Brian Lacroix,
Todd Parker,
Raj Patel,
Brad Parker,
The whole ATT crew

Any interest in notes from the talks at NANOG?

2006-02-13 Thread Matthew Petach

Since there are several attendees that are snowed in 
and won't be able to make it to Dallas for NANOG, I
was thinking of posting my notes from each presentation
to the nanog list, so those who are stranded can follow
along from home.  Would that be of interest to the list,
or would it be just so much more useless spam-like
fodder to be deleted?  So far, the response to my notes
from last night's community meeting has been
positive--but that's only 2 people, so I don't know if
I've helped 2 people, but alienated 8,036 more in the
process.  ^_^;

Let me know if you'd be interested in having me post
the notes.

Thanks!

Matt



2006.02.12 Open Committee Meeting Notes

2006-02-12 Thread Matthew Petach

I captured some notes during tonight's open mike
committee meeting, in case they may be of interest
to the list.

Apologies in advance for typos, it was hard to keep
up with the speakers.  ^_^;


Matt



Steering Committee Report ([EMAIL PROTECTED])
2006.02.12 1700 hours Central Time.

AGENDA
Steering Committee (Randy Bush)
Program Committee (Steve Feldman)
Financial Report (Betty Burke)
Mailing List Report (Chris Malayter)


Steering committee report
Trying to hear the membership
responsible for ML, PC, Logistics
But trying not to micro-manage
Establishing normal but minimal business practices
Semi-weekly minutes on web site

SC Tries to listen
Transparency: SC Minutes, ML, Stats, ...
Trying Mon-Wed Meeting (instead of Sun-Tues)
Newcomers' Session (did it work?)
No more Terminal Room (Laptops plugged to printers
 near registration)
Change badge fonts (larger company name/person name)
 (really a question of size, according to WBN)

Suggestion from a nice gentleman at the microphone
who says "why not print the badge on both sides, so
you don't need to flip the badge around all the time?"

Did NOT change
Number of meetings per year
 Many costs of support are fixed, ie not per-meeting
 Currently amortized over three meetings
 If over two meetings, fees would go up significantly

Did NOT change
Working Lunch
 Hotels have monopoly on food
 The charges for lunch are what you would expect from a 
  monopoly
 But we will try to be more sensitive to ease of getting
  lunch near or at the meeting venue
(not economical to have hotel provide food)

Rights in Data
NANOG trademark is held by Merit
Presos are copyright by the author
Right to freely distribute, but not modify, granted to NANOG
PC is drafting this formally
copyright notices on slides are OK if small and unobtrusive

What does it mean to be a member?
Attendance at meetings, and participation in the mailing
list is pretty much what defines membership.

Program Committee
 First change using new process seemed successful
Why have you not submitted a talk?
What do you want to hear?

Mailing List
Worked the process to fill vacancy left by Steve Gibbard
Still working with ML panel to document their process
Still working with ML panel to develop an appeals process
Statistics are published monthly on NANOG web site

ML Panel Appointments
No Terms, etc. in current charter
Straw proposal charter change parallels SC and PC
 two year terms
 staggered
 two sequential terms max without a vacation
Please comment, change, propose

(Bill Norton, Equinix, Use of nanog-futures to discuss
 this very type of issue...Randy will get to it)

ML panel process cont.
This would give members a light at the end of the tunnel
Volunteers would know what they're signing up for
Allows change without bad vibe of removal
Normal organizational practice

Charter Change
October is the end of the process, so start now
ML Panel straw proposal starting
Need to get Steve's name and other start-upisms removed
No other proposals received for this year

New Ideas
BLOG--no progress
Wiki, no progress
 SlashNOG, no interest
Trial of new tech gear at NANOG, nothing exciting
 Video in hallways, Do you like it?

It's back, same size, better location!!  By cafe tables
near registration area, near where food will be.

Traded terminal room for informal breakout rooms, informal
seating, allow for more mingling.

Ren Provo--http://nanog.multiply.com/, about 100 pictures 
with names and affiliations, so you can match up faces 
with names to help newcomers.

Mailing Lists
Engineering and Ops discussion only
 ([EMAIL PROTECTED])
Discussion about NANOG itself ([EMAIL PROTECTED])
Steering ([EMAIL PROTECTED])
Program ([EMAIL PROTECTED])
ML ([EMAIL PROTECTED])

Fruit supplied, yum!

Discussion?
How can we make NANOG more useful, fun, informative?

Randy gripes the mailing list has gotten boring
recently.  

Cut to Steve Feldman for Program Committee Report.
Steve Feldman, CNET, PC Chair.

All opinions, mistakes, his,
all the good stuff is thanks to the PC.

NANOG 36 program
26 submissions (down from 41!!)
22 accepted
1 cancelled
1 withdrawn
2 rejected
2 very late, both accepted

Areas for improvement
Speaker solicitation
Tool improvements
 self-service submission web interface
 Reports

Program Format
Mon-Weds format
 morning plenaries
 afternoon BOF, Tutorials
 Evening social events
Newbie meeting
 is there a better name
Tracks?
 not without more content!!

for tutorials, bofs, hopefully not too much overlap
or need to be in two places at once.
Party tomorrow courtesy of Yahoo!
Tuesday, Beer and Gear with sponsors.

For Tracks, need sufficient space as well as content.

Lightning Talks
Criterion: on-topic for mailing list
Signups start Monday morning
  instructions during plenary
Random acceptance of submissions made before 2pm Monday
Submission order after that (if slots remain)
(No personal insults!  Stay technical, keep it below 10
 minutes)

Feedback
Talk to us!
  PC members have yellow and green badges
Send mail
 [EMAIL PROTECTED]

Community Meeting Notes

2005-10-24 Thread Matthew Petach

(oops--sent this out last night, but forgot to change the
sender to the subscribed-to-nanog address first, 
my apologies, everyone)

Matt


I took some notes at the NANOG community meeting
tonight, and thought I'd share them with the list members
in the spirit of transparency--apologies for the
typos that may still exist, I'm heading to the social now.  :)

Matt



2005.10.23 Steering Committee

Randy Bush, IIJ, Tokyo
current chair of the steering committee

Steering committee progress/status Randy
Program committee report (steve)
financial report (betty)
mailing list report (chris)

Other than vetted presentations, each speaks
as individual.

Microphone is open throughout.

How do we progress forward?

Steering committee report

what was asked for?
what has been done?
what should or will be done?
what does the community want?

References:
Dan Golding's "Why Reform NANOG"
http://www.nanog.org/mtg-0501/pdf/golding.pdf
[EMAIL PROTECTED] mailing list archives
discussions within community
charter is up

Mailing list
problem: continuing problems with NANOG mailing list
 administration
solutions: you asked for even moderation,
transparency, fairness, clue

So far: we now have a mailing list admin group,
drawn from volunteers from the
community, tasked with moderating the list
Need: documented process, appeal, ... ie transparency
Philip Glass from Cisco, Bill Norton, Billo

Mailing list commitee: Rob Seastrom, Steven Gibbard,
Susan Harris, Chris Malayter,

Mailing list issues
SC process re ML committee not in charter
terms
selection
policy and process review and approval
ML policy and process: document, get
community feedback and modify
ML appeals process: to SC

Program committee
joe abley, 
bert russ, 
bill norton
pete templeton?

problem: perceived as being out of touch with
the general operator community, community powerless

solution: empower community to select content
so far: PC transition from 100% Merit-selected to
  SC-selected: 8 from existing PC, 8 from community
  nominations with fixed terms
so far: clearly identify PC members (nametags)
engage them: orange blobs on their badges!
To do: Steve will give PC report

PC selection by SC
Call for volunteers 12 re-ups and 18 new
PC/SC call to discuss new volunteers
sub-SC formed of SC members who were
not also PC members
SC and sub-SC took input from public
and PC
sub-SC met and decided the eight to keep
Full SC met to decide which eight new
 volunteers to add
Too many good candidates!

2006 PC 
returning
bill woodcock
chris morrow
dave oleary
hank kilmer
joe abley
kevin epperson
steve feldman
ted seely

new 
daniel golding
jennifer rexford
joel jaeggli
josh snowhorn
pete templin

Charter
No proposals received for this year
formalizing of ML for next year

General
Problem: no transparency in the way NANOG is run
Solution: segmenting problem and externalizing
  transparent processes
Steering committee is ultimately accountable to you
 --we own this one
so far: SC selected PC
so far: Merit/NANOG financial statements available 
 @NANOG
SC and PC have private but directly accessible mailing lists
[EMAIL PROTECTED]
and [EMAIL PROTECTED] specifically for community 
 interaction with these two groups
SC minutes are publicly archived

Still to Do
First time through this process (new PC,
SC working with Merit, etc) trial and error
Mailing list moderation still a challenge
need documentation and community discussion,
exploration of policy and procedures
BLOG /Wiki/SlashNOG
Trial of new tech gear at NANOG

Administrative mailing lists
engineering and ops discussion only
[EMAIL PROTECTED]
open meta discussion
[EMAIL PROTECTED]
steering committee: [EMAIL PROTECTED]
program committee: [EMAIL PROTECTED]
ml admins [EMAIL PROTECTED]

Discussion?
Why do people come to NANOG?  What is it that
brings us together?

Meet face to face with people we interact with
electronically, 
find out new trends, new developments, find out
where the internet is going around the world.
Find out how to do her job more effectively.
Would like longer breaks--15 minutes is really
tough for syncing up with people.

USENIX facesavers--put faces to names would
probably be a nice thing to add on.

Good to get operational information *off the
record*, not just from

Bring back the lunches!!!  box lunches would be
easy, and would let people meet and greet.
Requires less time, allows for better interaction.

Good opportunity to meet with peers, people we
want to peer with, or people we need to smack for
bad peering.  :D

What about a calendar online for while we're at
NANOG, so people can schedule time to meet.

Susan Harris talks about lunch challenges--box
lunches would be better, less of a logistics 
challenge than getting sitdown space.

Steve Wilcox suggests that maybe trying to cut
down on the evening talks to allow more time to
talk to people.

Mike Hughes points out it's hard to have a program
that fits for 400 people as well as allows for
smaller gatherings.
He's also a WG chair for RIPE, and had troubl