Re: hijack chronology: was [ YouTube IP Hijacking ]

2008-02-26 Thread Simon Leinen

Martin A Brown writes:
> Late last night, after poring through our data, I posted a detailed
> chronology of the hijack as seen from our many peering sessions.  I
> would add to this that the speed of YouTube's response to this
> subprefix hijack impressed me.

For a Sunday afternoon, yes, not bad.

Here's a graphical version of the timeline:

http://www.ris.ripe.net/cgi-bin/bgplay.cgi?prefix=208.65.153.0/24&start=2008-02-24+18:46&end=2008-02-24+21:05

> As discussed earlier in this thread, this is really the same old
> song--it simply has a new verse now.  (How many of our troubadors
> know all of the verses since AS 7007?)

Probably someone is busy working on the new NANOG song with an
ever-extending refrain ("AS7007, ..., AS17557, when will we ever learn?").
-- 
Simon.


Re: YouTube IP Hijacking

2008-02-26 Thread Simon Leinen

Iljitsch van Beijnum writes:
> Well, if they had problems like this in the past, then I wouldn't
> trust them to get it right. Which means that it's probably a good
> idea if EVERYONE starts filtering what they allow in their tables
> from PCCW. Obviously that makes it very hard for PCCW to start
> announcing new prefixes, but I can't muster up much sympathy for
> that.

> So basically, rather than generate routing registry filters for the
> entire world, generate routing registry filters for known careless
> ASes. This number should be small enough that this is somewhat
> doable. [...]

Maybe, but how much would that help?

So you suggest that we only need to filter against AS7007, AS9121, and
AS17557.  Personally, those are among the ones I least worry about -
maybe I'm naive, but I'd hope they or their upstreams have learned
their lessons.

The problem is that nobody knows which of the other 25000+ ASes will
be the next AS7007.  So I guess we have to modify your suggestion
somewhat and, in addition to filtering the "known-careless" also
filter the "unknown-maybe-careful" class.  Oops, that leaves only the
"known-careful" class, which includes... my own AS, and then whom?
-- 
Simon.


Re: YouTube IP Hijacking

2008-02-26 Thread Simon Leinen

Rick Astley writes:
> Anything more specific than a /24 would get blocked by many filters,
> so some of the "high target" sites may want to announce their
> mission critical IP space as /24 and avoid using prepends.

Good idea.  But only the "high target" sites, please.  If you're an
unimportant site that nobody cares about, then DON'T DO THIS, ok? ;-)
-- 
Simon.


Re: An Attempt at Economically Rational Pricing: Time Warner Trial

2008-01-20 Thread Simon Leinen

Stupid typo in my last message, sorry.

> While I think this is basically a sound approach, I'm skeptical that
> *slightly* lowering prices will be sufficient to convert 80% of the
> user base from flat to unmetered pricing. [...]
  "METERED pricing", of course.
-- 
Simon.


Re: An Attempt at Economically Rational Pricing: Time Warner Trial

2008-01-20 Thread Simon Leinen

Frank Bulk writes:
> Except if the cable companies want to get rid of the 5% of heavy
> users, they can't raise the prices for that 5% and recover their
> costs.  The MSOs want it win-win: they'll bring prices for metered
> access slightly lower than "unlimited" access, making it attractive
> for a large segment of the user base (say, 80%), and slowly raise
> the unlimited pricing for the 15 to 20% that want that service, such
> that at the end of the day, the costs are less AND the revenue is
> greater.

While I think this is basically a sound approach, I'm skeptical that
*slightly* lowering prices will be sufficient to convert 80% of the
user base from flat to unmetered pricing.  Don't underestimate the
value that people put on not having to think about their consumption.

So I think it is important to design the metered scheme so that it is
perceived as minimally intrusive, and users feel in control.  For
example, a simple metered rate where every Megabyte has a fixed price
is difficult, because the customer has to think about usage vs. cost
all the time.  95%ile is a little better, because the customer only
has to think about longer-term usage (42 hours of peak usage per month
are free).  A flat rate with a usage cap and a lowered rate after the
cap is exceeded is easier to swallow than a variable rate, especially
when the lowered rate is still perceived as useful.  And there are
bound to be other creative ways of charging that might be even more
acceptable.  But in any case customers tend to be willing to pay a
premium for a flat rate.
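
To make the comparison a bit more concrete, here is a toy sketch (my
own illustration - the prices, the cap and the synthetic usage pattern
are all invented) that bills one month of 5-minute samples under the
schemes discussed above:

# Toy comparison of charging schemes; all tariffs below are made up.
# 'samples' are 5-minute average rates in Mb/s over one 30-day month
# (30 days * 24 h * 12 samples/h = 8640 samples).
import random

def volume_gb(samples):
    # each sample covers 300 seconds: Mb/s * 300 s / 8 = MB, then /1000 = GB
    return sum(r * 300 / 8 for r in samples) / 1000.0

def bill_flat(samples, price=50.0):
    return price                                 # usage never enters the bill

def bill_per_gb(samples, price_per_gb=0.50):
    return volume_gb(samples) * price_per_gb     # every megabyte has a price

def bill_95th(samples, price_per_mbps=2.0):
    ranked = sorted(samples)
    p95 = ranked[int(len(ranked) * 0.95) - 1]    # top 5% of samples are "free"
    return p95 * price_per_mbps

def bill_capped_flat(samples, price=45.0, cap_gb=50):
    # flat price; above the cap the access rate is lowered instead of
    # charging more, so the invoice itself never changes
    return price

if __name__ == "__main__":
    random.seed(1)
    month = [random.uniform(0, 1) for _ in range(8640)]  # synthetic usage
    for bill in (bill_flat, bill_per_gb, bill_95th, bill_capped_flat):
        print("%-16s %8.2f" % (bill.__name__, bill(month)))
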
-- 
Simon.


Re: TransAtlantic Cable Break

2007-06-24 Thread Simon Leinen

Leo Bicknell writes:
> However, if you put 15G down your "20G" path, you have no
> redundancy.  In a cut, dropping 5G on the floor, causing 33% packet
> loss is not "up", it might as well be down.

Sorry, it doesn't work like that either.  33% packet loss is an upper
limit, but not what you'd see in practice.  The vast majority of
traffic is responsive to congestion and will back off.  It is
difficult to predict the actual drop rate; it depends a lot on your
traffic mix.  A million "web mice" are much less elastic than a dozen
bulk transfers.

It is true that on average (averaged over all bytes), *throughput*
will go down by 33%.  But this reduction will not be distributed
evenly over all connections.

In an extreme(ly benign) case, 6G of the 20G are 30 NNTP connections
normally running at 200 Mb/s each, with 50 ms RTT.  A drop rate of
just 0.01% will cause those connections to back down to 20 Mb/s each
(0.6 Gb/s total).  This alone is more than enough to handle the
capacity reduction.  All other connections will (absent other QoS
mechanisms) see the same 0.01% loss, but this won't cause serious
issues to most applications.
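
(For those who like to play with the numbers: the back-of-the-envelope
above is just the usual loss/throughput relation for standard TCP -
rate ~ MSS/(RTT*sqrt(p)), per Mathis et al.  A quick sketch; it lands
in the same ballpark as the ~20 Mb/s figure above, the exact constant
depending on the delayed-ACK and loss model.)

from math import sqrt

def tcp_rate_mbps(mss_bytes, rtt_s, loss):
    # Mathis et al. approximation: rate ~ 1.22 * MSS / (RTT * sqrt(p))
    return 1.22 * mss_bytes * 8 / (rtt_s * sqrt(loss)) / 1e6

# The NNTP example above: 1460-byte MSS, 50 ms RTT, 0.01% loss.
per_conn = tcp_rate_mbps(1460, 0.05, 1e-4)
print("%.0f Mb/s per connection, %.1f Gb/s for 30 of them"
      % (per_conn, per_conn * 30 / 1000))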

What users WILL notice is when suddenly there's a 200ms standing queue
because of the overload situation.  This is a case for using RED (or
small router buffers).

Another trick would be to preferentially drop "low-value" traffic, so
that other users wouldn't have to experience loss (or even delay,
depending on configuration) at all.  And conversely, if you have (a
bounded amount of) "high-value" traffic, you could configure protected
resources for that.

> If your redundancy solution is at Layer 3, you have to have the
> policies in place that you don't run much over 10G across your dual
> 10G links or you're back to effectively giving up all redundancy.

The recommendation has a good core, but it's not that black-and-white.

Let's say that whatever exceeds the 10G should be low-value and
extremely congestion-responsive traffic.  NNTP (server/server) and P2P
file sharing traffic are examples for this category.  Both application
types (NetNews and things like BitTorrent) even have application-level
congestion responsiveness beyond what TCP itself provides: When a
given connection has bad throughput, the application will prefer
other, hopefully less congested paths.
-- 
Simon.


Re: Bandwidth Augmentation Triggers

2007-05-01 Thread Simon Leinen

Jason Frisvold writes:
> I'm working on a system to alert when a bandwidth augmentation is
> needed.  I've looked at using both true averages and 95th percentile
> calculations.  I'm wondering what everyone else uses for this
> purpose?

We use a "secret formula", aka rules of thumb, based on perceived
quality expectations/customer access capacities, and cost/revenue
considerations.

In the bad old days of bandwidth crunch (ca. 1996), we scheduled
upgrades of our transatlantic links so that relief would come when
peak-hour average packet loss exceeded 5% (later 3%).  At that time
the general performance expectation was that Internet performance is
mostly crap anyway, if you need to transfer large files, "at 0300 AM"
is your friend; and upgrades were incredibly expensive.  With that
rule, link utilization was 100% for most of the (working) day.

Today, we start thinking about upgrading from GbE to 10GE when link
load regularly exceeds 200-300 Mb/s (even when the average load over
a week is much lower).  Since we run over dark fibre and use mid-range
routers with inexpensive ports, upgrades are relatively cheap.  And -
fortunately - performance expectations have evolved, with some users
expecting to be able to run file transfers near Gb/s speeds, >500 Mb/s
videoconferences with no packet loss, etc.

An important question is what kind of users your links aggregate.  A
"core" link shared by millions of low-bandwidth users may run at 95%
utilization without being perceived as a bottleneck.  On the other
hand, you may have a campus access link shared by users with fast
connections (I hear GbE is common these days) on both sides.  In that
case, the link may be perceived as a bottleneck even when utilization
graphs suggest there's a lot of headroom.

In general, I think utilization rates are less useful as a basis for
upgrade planning than (queueing) loss and delay measurements.  Loss
can often be measured directly at routers (drop counters in SNMP), but
queueing delay is hard to measure in this way.  You could use tools
such as SmokePing (host-based) or Cisco IP SLA or Juniper RPM
(router-based) to do this.
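
As a minimal illustration of the "drop counters in SNMP" approach (a
sketch only - it assumes the net-snmp command-line tools, SNMPv2c read
access, and a placeholder hostname/ifIndex):

# Poll IF-MIB output-drop and octet counters twice and report the delta.
import subprocess, time

def snmp_counter(host, community, oid):
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid])
    return int(out.decode().split()[-1])

def poll_drops(host, community="public", ifindex=1, interval=300):
    oids = ["IF-MIB::ifOutDiscards.%d" % ifindex,
            "IF-MIB::ifHCOutOctets.%d" % ifindex]
    before = [snmp_counter(host, community, o) for o in oids]
    time.sleep(interval)
    after = [snmp_counter(host, community, o) for o in oids]
    drops, octets = (a - b for a, b in zip(after, before))
    return drops, octets * 8 / float(interval) / 1e6   # drops, average Mb/s

if __name__ == "__main__":
    d, mbps = poll_drops("core-router.example.net")    # placeholder host
    print("%d output drops in 5 min at %.1f Mb/s average" % (d, mbps))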

(And if you manage to link your BSS and OSS, then you can measure the
rate at which customers run away for an even more relevant metric :-)

> We're talking about anything from a T1 to an OC-12 here.  My guess
> is that the calculation needs to be slightly different based on the
> transport, but I'm not 100% sure.

Probably not on the type of transport - PDH/SDH/Ethernet behave
essentially the same.  But the rules will be different for different
bandwidth ranges.  Again, it is important to look not just at link
capacities in isolation, but also at the relation to the capacities of
the access links that they aggregate.
-- 
Simon.


Re: from the academic side of the house

2007-04-26 Thread Simon Leinen

Tony Li writes:
> On Apr 25, 2007, at 2:55 PM, Simon Leinen wrote:
>> Routing table lookups(*) are what's most relevant here, [...]

> Actually, what's most relevant here is the ability to get end-hosts
> to run at rate.  Packet forwarding at line rate has been
> demonstrated for quite awhile now.

That's true (although Steve's question was about the routers).

The host bottleneck for raw 10Gb/s transfers used to be bus bandwidth.
The 10GE adapters in most older land-speed record entries used the
slower PCI-X, while this entry was done with PCI Express (x8) adapters.

Another host issue would be interrupts and CPU load for checksums, but
most modern 10GE (and also GigE!) adapters offload segmentation and
reassembly, as well as checksum computation and validation, to the
adapter if the OS/driver supports it.

The adapters used in this record (Chelsio S310E) contain a full TOE
(TCP Offload Engine) that can run the entire TCP state machine on the
adapter, although I'm not sure whether they made use of that.
Details on

http://data-reservoir.adm.s.u-tokyo.ac.jp/lsr-200612-02/
-- 
Simon.


Re: from the academic side of the house

2007-04-25 Thread Simon Leinen

Steven M Bellovin writes:
> Jim Shankland <[EMAIL PROTECTED]> wrote:

>> (2) Getting this kind of throughput seems to depend on a fast
>> physical layer, plus some link-layer help (jumbo packets), plus
>> careful TCP tuning to deal with the large bandwidth-delay product.
>> The IP layer sits between the second and third of those three items.
>> Is there something about IPv6 vs. IPv4 that specifically improves
>> perfomance on this kind of test?  If so, what is it?

> I wonder if the routers forward v6 as fast.

In the 10 Gb/s space (sufficient for these records, and I'm not
familiar with 40 Gb/s routers), many if not most of the current gear
handles IPv6 routing lookups "in hardware", just like IPv4 (and MPLS).

For example, the mid-range platform that we use in our backbone
forwards 30 Mpps per forwarding engine, whether based on IPv4
addresses, IPv6 addresses, or MPLS labels.  30 Mpps at 1500-byte
packets corresponds to 360 Gb/s.  So, no sweat.
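
(The arithmetic, in case someone wants to plug in other interface
speeds or packet sizes - my own quick sketch, counting 20 bytes of
preamble and inter-frame gap per Ethernet frame:)

# Back-of-the-envelope line-rate math for Ethernet interfaces.
# frame_bytes is the Ethernet frame size (64 minimum, 1518 for a 1500-byte
# MTU); every frame additionally occupies 20 bytes of preamble + IFG.

def line_rate_pps(link_bps, frame_bytes):
    return link_bps / ((frame_bytes + 20) * 8.0)

if __name__ == "__main__":
    tengig = 10e9
    print("10GE at   64-byte frames: %5.2f Mpps" % (line_rate_pps(tengig, 64) / 1e6))
    print("10GE at 1518-byte frames: %5.2f Mpps" % (line_rate_pps(tengig, 1518) / 1e6))
    # and the figure quoted above: 30 Mpps of 1500-byte packets
    print("30 Mpps * 1500 bytes    : %5.0f Gb/s" % (30e6 * 1500 * 8 / 1e9))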

Routing table lookups(*) are what's most relevant here, because the other
work in forwarding is identical between IPv4 and IPv6.  Again, many
platforms are able to do line-rate forwarding between 10 Gb/s ports.
-- 
Simon, AS559.
(*) ACLs (access control lists) are also important, but again, newer
hardware can do fairly complex IPv6 ACLs at line rate.


Re: Thoughts on increasing MTUs on the internet

2007-04-13 Thread Simon Leinen

Ah, large MTUs.  Like many other "academic" backbones, we implemented
large (9192 bytes) MTUs on our backbone and 9000 bytes on some hosts.
See [1] for an illustration.  Here are *my* current thoughts on
increasing the Internet MTU beyond its current value, 1500.  (On the
topic, see also [2] - a wiki page which is actually served on a
9000-byte MTU server :-)

Benefits of >1500-byte MTUs:

Several benefits of moving to larger MTUs, say in the 9000-byte range,
were cited.  I don't find them too convincing anymore.

1. Fewer packets reduce work for routers and hosts.

   Routers:
 
   Most backbones seem to size their routers to sustain (near-)
   line-rate traffic even with small (64-byte) packets.  That's a good
   thing, because if networks were dimensioned to just work at average
   packet sizes, they would be pretty easy to DoS by sending floods of
   small packets.  So I don't see how raising the MTU helps much
   unless you also raise the minimum packet size - which might be
   interesting, but I haven't heard anybody suggest that.

   This should be true for routers and middleboxes in general,
   although there are certainly many places (especially firewalls)
   where pps limitations ARE an issue.  But again, raising the MTU
   doesn't help if you're worried about the worst case.  And I would
   like to see examples where it would help significantly even in the
   normal case.  In our network it certainly doesn't - we have Mpps to
   spare.
 
   Hosts:
 
   For hosts, filling high-speed links at 1500-byte MTU has often been
   difficult at certain times (with Fast Ethernet in the nineties,
   GigE 4-5 years ago, 10GE today), due to the high rate of
   interrupts/context switches and internal bus crossings.
   Fortunately tricks like polling-instead-of-interrupts (Saku Ytti
   mentioned this), Interrupt Coalescence and Large-Send Offload have
   become commonplace these days.  These give most of the end-system
   performance benefits of large packets without requiring any support
   from the network.

2. Fewer bytes (saved header overhead) free up bandwidth.

   TCP over Ethernet with a 1500-byte MTU is "only" 94.2% efficient,
   while with a 9000-byte MTU it would be about 99% efficient (see the
   sketch after this list for the arithmetic).  While an improvement
   would certainly be nice, 94% already seems "good enough" to me.
   (I'm ignoring the byte savings due to fewer ACKs.  On the other
   hand not all packets will be able to grow sixfold - some transfers
   are small.)

3. TCP runs faster.

   This boils down to two aspects (besides the effects of (1) and (2)):

   a) TCP reaches its "cruising speed" faster.

      Especially with LFNs (Long Fat Networks, i.e. paths with a large
      bandwidth*RTT product), it can take quite a long time until TCP
      slow-start has increased the window so that the maximum
      achievable rate is reached.  Since the window increase happens
      in units of MSS (~MTU), TCPs with larger packets reach this
      point proportionally faster.

      This is significant, but there are alternative proposals to
      solve this issue of slow ramp-up, for example HighSpeed TCP [3].

   b) You get a larger share of a congested link.

      I think this is true when a TCP-with-large-packets shares a
      congested link with TCPs-with-small-packets, and the packet loss
      probability isn't proportional to the size of the packet.  In
      fact the large-packet connection can get a MUCH larger share
      (sixfold for 9K vs. 1500) if the loss probability is the same
      for everybody (which it often will be, approximately).  Some
      people consider this a fairness issue, others think it's a good
      incentive for people to upgrade their MTUs.
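
A small sketch of the arithmetic behind points 2 and 3a above (my own
figures; it assumes 38 bytes of Ethernet framing overhead per frame,
52 bytes of IP+TCP headers with timestamps, and textbook TCP window
growth - slow-start doubling and one MSS per RTT in congestion
avoidance - rather than any particular implementation):

from math import ceil, log

ETH_OVERHEAD = 38    # MAC header + CRC + preamble/SFD + inter-frame gap
IP_TCP_HDR = 52      # IPv4 + TCP with timestamp option

def efficiency(mtu):
    """Share of the bits on the wire that are TCP payload (full segments)."""
    return (mtu - IP_TCP_HDR) / float(mtu + ETH_OVERHEAD)

def slowstart_rtts(rate_bps, rtt_s, mtu):
    """RTTs of slow-start (window doubling from 1 MSS) to cover the BDP."""
    mss = mtu - IP_TCP_HDR
    return int(ceil(log(rate_bps / 8.0 * rtt_s / mss, 2)))

def recovery_rtts(rate_bps, rtt_s, mtu):
    """RTTs of additive increase (1 MSS/RTT) to regain half the window."""
    mss = mtu - IP_TCP_HDR
    return int(ceil(rate_bps / 8.0 * rtt_s / mss / 2))

if __name__ == "__main__":
    for mtu in (1500, 9000):
        print("MTU %4d: %2.0f%% efficient, %2d RTTs of slow-start and "
              "%4d RTTs after a loss to fill 1 Gb/s at 100 ms"
              % (mtu, 100 * efficiency(mtu),
                 slowstart_rtts(1e9, 0.1, mtu), recovery_rtts(1e9, 0.1, mtu)))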

About the issues:

* Current Path MTU Discovery doesn't work reliably.

  Path MTU Discovery as specified in RFC 1191/1981 relies on ICMP
  messages to discover when a smaller MTU has to be used.  When these
  ICMP messages fail to arrive (or be sent), the sender will happily
  continue to send too-large packets into the blackhole.  This problem
  is very real.  As an experiment, try configuring an MTU < 1500 on a
  backbone link which has Ethernet-connected customers behind it.
  I bet that you'll receive LOUD complaints before long.

  Some other people mention that Path MTU Discovery has been refined
  with "blackhole detection" methods in some systems.  This is widely
  implemented, but usually not enabled by default (although it probably
  could be turned on via a "Service Pack").

  Note that a new Path MTU Discovery proposal was just published as
  RFC 4821 [4].  This is also supposed to solve the problem of relying
  on ICMP messages.

  Please, let's wait for these more robust PMTUD mechanisms to be
  universally deployed before trying to increase the Internet MTU.

* IP assumes a consistent MTU within a logical subnet.

  This seems to be a pretty fundamental assumption, and Iljitsch's
  original mail suggests that we "fix" this.  Umm, ok, I hope we don't
  miss anything important tha

Re: TCP and WAN issue

2007-03-28 Thread Simon Leinen

Andre Oppermann gave the best advice so far IMHO.
I'll add a few points.

> To quickly sum up the facts and to dispell some misinformation:

>  - TCP is limited by the delay bandwidth product and the socket buffer
>sizes.

Hm... what about: The TCP socket buffer size limits the achievable
throughput-RTT product? :-)

>  - for a T3 with 70ms your socket buffer on both ends should be
>450-512KB.

Right.  (Victor Reijs' "goodput calculator" says 378kB.)
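
(The arithmetic, for reference - a quick sketch; the 378 kB figure
presumably comes from using the DS3 payload rate of roughly 44.2 Mb/s
rather than the 45 Mb/s line rate:)

def bdp_bytes(bandwidth_bps, rtt_s):
    # bandwidth*delay product = socket buffer needed to keep the pipe full
    return bandwidth_bps * rtt_s / 8.0

if __name__ == "__main__":
    for label, bps in (("45.0 Mb/s line rate", 45.0e6),
                       ("44.2 Mb/s payload  ", 44.2e6)):
        b = bdp_bytes(bps, 0.070)                 # 70 ms RTT
        print("%s -> %6.0f bytes (~%3.0f KiB)" % (label, b, b / 1024))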

>  - TCP is also limited by the round trip time (RTT).

This was stated before, wasn't it?

>  - if your application is working in a request/reply model no amount
>of bandwidth will make a difference.  The performance is then
>entirely dominated by the RTT.  The only solution would be to run
>multiple sessions in parallel to fill the available bandwidth.

Very good point.  Also, some applications have internal window
limitations.  Notably SSH, which has become quite popular as a bulk
data transfer method.  See http://kb.pert.geant2.net/PERTKB/SecureShell

>  - Jumbo Frames have definately zero impact on your case as they
>don't change any of the limiting parameters and don't make TCP go
>faster.

Right.  Jumbo frames have these potential benefits for bulk transfer:

(1) They reduce the forwarding/interrupt overhead in routers and hosts
by reducing the number of packets.  But in your situation it is quite
unlikely that the packet rate is a bottleneck.  Modern routers
typically forward even small packets at line rate, and modern
hosts/OSes/Ethernet adapters have mechanisms such as "interrupt
coalescence" and "large send offload" that make the packet size
largely irrelevant.  But even without these mechanisms and with
1500-byte packets, 45 Mb/s shouldn't be a problem for hosts built in
the last ten years, provided they aren't (very) busy with other
processing.

(2) As Perry Lorier pointed out, jumbo frames accelerate the "additive
increase" phases of TCP, so you reach full speed faster both at
startup and when recovering from congestion.  This may be noticeable
when there is competition on the path, or when you have many smaller
transfers such that ramp-up time is an issue.

(3) Large frames reduce header overhead somewhat.  But the improvement
going from 1500-byte to 9000-byte packets is only 2-3%, from ~97%
efficiency to ~99.5%.  No orders of magnitude here.

>There are certain very high-speed and LAN (<5ms) case where it
>may make a difference but not here.

Cases where jumbo frames might make a difference: When the network
path or the hosts are pps-limited (in the >Gb/s range with modern
hosts); when you compete with other traffic.  I don't see a relation
with RTTs - why do you think this is more important on <5ms LANs?

>  - Your problem is not machine or network speed, only tuning.

Probably yes, but it's not clear what is actually happening.  As it
often happens, the problem is described with very little detail, so
experts (and "experts" :-) have a lot of room to speculate.

This was the original problem description from Philip Lavine:

I have an east coast and west coast data center connected with a
DS3. I am running into issues with streaming data via TCP

In the meantime, Philip gave more information, about the throughput he
is seeing (no mention how this is measured, whether it is total load
on the DS3, throughput for an application/transaction or whatever):

This is the exact issue. I can only get between 5-7 Mbps.

And about the protocols he is using:

I have 2 data transmission scenarios:

1. Microsoft MSMQ data using TCP
2. "Streaming" market data stock quotes transmitted via a TCP
   sockets

It seems quite likely that these applications have their own
performance limits in high-RTT situations.

Philip, you could try a memory-to-memory-test first, to check whether
TCP is really the limiting factor.  You could use the TCP tests of
iperf, ttcp or netperf, or simply FTP a large-but-not-too-large file
to /dev/null multiple times (so that it is cached and you don't
measure the speed of your disks).

If you find that this, too, gives you only 5-7 Mb/s, then you should
look at tuning TCP according to Andre's excellent suggestions quoted
below, and check for duplex mismatches and other sources of
transmission errors.

If you find that the TCP memory-to-memory-test gives you close to DS3
throughput (modulo overhead), then maybe your applications limit
throughput over long-RTT paths, and you have to look for tuning
opportunities on that level.

> Change these settings on both ends and reboot once to get better throughput:

> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
> "SackOpts"=dword:0x1 (enable SACK)
> "TcpWindowSize"=dword:0x7D000 (512000 Bytes)
> "Tcp1323Opts"=dword:0x3 (enable window scaling and timestamps)
> "GlobalMaxTcpWindowSize"=dword:0x7D000 (512000 Bytes)

> http://www.microsoft.com/technet/network/deploy/depovg/tcpip2k.mspx
-- 
Simon.


Re: Network end users to pull down 2 gigabytes a day, continuously?

2007-01-10 Thread Simon Leinen

Alexander Harrowell writes:
> For example: France Telecom's consumer ISP in France (Wanadoo) is
> pushing out lots and lots of WLAN boxes to its subs, which it brands
> Liveboxes. As well as the router, they also carry their carrier-VoIP
> and IPTV STB functions. [...]

Right, and the French ADSL ecosystem mostly seems to be based on these
"boxes" - Proxad/free.fr has its Freebox, Alice ADSL (Telecom Italia)
the AliceBox, etc.  All these have SCART ("peritelevision") TV plugs
in their current incarnations, in addition to the WLAN access points
and phone jacks that previous versions already had.

Personally I don't like this kind of bundling, and I think being able
to choose telephony and video providers independently of the ISP is better.
But the business model seems to work in that market.  Note that I
don't have any insight or numbers, just noticing that non-technical
people (friends and family in France) do seem to be capable of
receiving TV over IP (although not "over the Internet") - confirming
what Simon Lockhart claimed.

Of course there are still technical issues such as how to connect two
TV sets in different parts of an apartment to a single *box.  (Some
boxes do support two simultaneous video channels depending on
available bandwidth, which is based on the level of unbundling
("degroupage") in the area.)

As far as I know, the French ISPs use IP multicast for video
distribution, although I'm pretty sure that these IP multicast
networks are not connected to each other or to the rest of the
multicast Internet.
-- 
Simon.


Re: Home media servers, AUPs, and upstream bandwidth utilization.

2006-12-25 Thread Simon Leinen

Lionel Elie Mamane writes:
> On Mon, Dec 25, 2006 at 12:44:37AM +, Jeroen Massar wrote:
>> That said ISP's should simply have a package saying "50GiB/month
>> costs XX euros, 100GiB/month costs double" etc. As that covers what
>> their transits are charging them, nothing more, nothing less.

> I thought IP transit was mostly paid by "95% percentile highest speed
> over 5 minutes" or something like that these days? Meaning that ISP's
> costs are maximised if everyone maxes our their line for the same 6%
> of the time over the month (even if they don't do anything the rest of
> the time), and minimised if the usage pattern were nicely spread out?

Yes.  With Jeroen's suggestion, there's a risk that power-users'
consumption will only be reduced for off-peak hours, and then the ISP
doesn't save much.  A possible countermeasure is to not count off-peak
traffic (or not as much).  Our charging scheme works like that, but
our customers are mostly large campus networks, and I don't know how
digestible this would be to retail ISP consumers.
-- 
Simon.


Re: The Cidr Report

2006-11-10 Thread Simon Leinen

cidr-report  writes:
> Recent Table History
> Date        Prefixes    CIDR Agg
> 03-11-06      199409      129843
[...]
> 10-11-06   134555024      129854

Growth of the "global routing table" really picked up pace this week!
(But maybe I'm just hallucinating for having heard the report from the
IAB Routing Workshop report three times in a week :-)
Or the CIDR Report software has an R200K problem?
-- 
Simon.


Re: [routing-wg]BGP Update Report

2006-09-13 Thread Simon Leinen

Vince Fuller writes:
> On Mon, Sep 11, 2006 at 12:32:57PM +0200, Oliver Bartels wrote:
>> Ceterum censeo: Nevertheless this moving-clients application shows
>> some demand for a true-location-independend IP-addresses
>> announcement feature (provider independend "roaming") in IPv6, as
>> in v4 (even thru this isn't the "standard" way, but Connexion is
>> anything but standard). Shim etc. is not sufficient ...

Ehm, well, Connexion by Boeing is maybe not such a good example for
this demand.  Leaving aside the question whether there is a business
case, I remain unconvinced that using BGP for mobility is even worth
the effort.  It is obvious that it "worked" for Boeing in IPv4, for
some value of "worked", but the touted delay improvements on the
terrestrial ISP path (ground station - user's "home" ISP) are probably
lost in the noise compared to the 300ms of geostationary.  But, hey,
it's free - just deaggregate a few /19's worth of "PA" (what's that?)
space into /24s and announce and re-announce at will.

Vince has an outline of an excellent solution that would have avoided
all the load on the global routing system with (at least) the same
performance (provided that the single network/VPN is announced to the
Internet from good locations on multiple continents):

> One might also imagine that more globally-friendly way to implement
> this would have been to build a network (VPN would be adequate)
> between the ground stations and assign each plane a prefix out of a
> block whose subnets are only dynamically advertsed within that
> network/VPN. Doing that would prevent the rest of the global
> Internet from having to track 1000+ routing changes per prefix per
> day as satellite handoffs are performed.

But that would have cost money! Probably just 1% of the marketing
budget of the project or 3% of the cost of equipping a single plane
with the "bump" for the antenna, but why bother? With IPv4 you get
away with advertising de-aggregated /24s from PA space.

At one of the Boeing presentations (NANOG or RIPE) I asked the
presenter how they coped with ISPs who filter.  Instead of responding,
he asked me back "are you from AS3303?"  From which I deduce that
there are about two ISPs left who filter such more-specifics (AS3303
and us :-).

IMHO Connexion by Boeing's BGP hack, while cool, is a good example of
an abomination that should have been avoided by having slightly
stronger incentives against polluting the global routing system.
Where's Sean Doran when you need him?
-- 
Simon (AS559).


Re: [routing-wg]BGP Update Report

2006-09-13 Thread Simon Leinen

Marshall Eubanks writes:
> In a typical flight Europe / China I believe that there would be
> order 10-15 satellite transponder / ground station changes. The
> satellite footprints count for more that the geography.

What I remember from the Connexion presentations is that they used
only four ground stations to cover more or less the entire Northern
hemisphere.  I think the places were something like Lenk
(Switzerland), Moscow, Tokyo, and somewhere in the Central U.S.

So a Europe->China flight should involve just one or two handoffs
(Switzerland->Moscow(->Tokyo?)).  Each ground station has a different
ISP, and the airplane's /24 is re-announced from a different origin AS
after the handoff.

It's possible that there are additional satellite/transponder changes,
but those wouldn't be visible in BGP.
-- 
Simon.


Re: update bogon routes

2006-07-27 Thread Simon Leinen

Miguel,

> We have had some problems of being beaten back. Our space, being
> announced by AS 16592, is 190.5.128.0/19

I only see 190.5.128.0/21, and because it is our policy to ignore
more-specifics from "PA" space (including anything more specific than
/21 from 190.0.0.0/8 and the other LACNIC ranges), we don't accept
that route.  Couldn't you just announce the entire /19?

Regards,
-- 
Simon, AS559.


Re: Best practices inquiry: tracking SSH host keys

2006-06-29 Thread Simon Leinen

Jeroen Massar writes:
> The answer to your question: RFC4255
> "Using DNS to Securely Publish Secure Shell (SSH) Key Fingerprints"
> http://www.ietf.org/rfc/rfc4255.txt

Yes, that's cool if your SSH client supports it (recent OpenSSH's do).

> You will only need to stuff the FP's into SSHFP DNS RR's and turn on
> verification for these records on the clients. Done.

How do you get the SSH host key fingerprint of a Cisco into SSHFP syntax?
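
(For hosts that speak the OpenSSH key format the conversion is
mechanical - if I remember correctly, a recent "ssh-keygen -r hostname"
will even print the records for you.  Here's a minimal sketch of what
RFC 4255 wants, assuming you can fetch the host key in OpenSSH
one-line format, e.g. with ssh-keyscan; getting a Cisco's key into
that format in the first place is the awkward part.)

# Turn OpenSSH public-key lines (e.g. ssh-keyscan output) into SSHFP records:
# RFC 4255 fingerprint type 1 = SHA-1 over the raw public key blob.
import base64, hashlib, sys

ALGO = {"ssh-rsa": 1, "ssh-dss": 2}   # RFC 4255 algorithm numbers

def sshfp(keyline):
    host, keytype, b64 = keyline.split()[:3]
    digest = hashlib.sha1(base64.b64decode(b64)).hexdigest()
    return "%s IN SSHFP %d 1 %s" % (host, ALGO[keytype], digest)

if __name__ == "__main__":
    for line in sys.stdin:    # e.g.: ssh-keyscan -t rsa router | python sshfp.py
        if line.strip() and not line.startswith("#"):
            print(sshfp(line))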

> In combo with DNSSEC this is a (afaik ;) 100% secure way to at least get
> the finger prints right.

Exactly.
-- 
Simon.


Re: How to measure network quality&performance for voip&gameservers (udp packetloss, delay, jitter,...)

2006-03-10 Thread Simon Leinen

Gunther Stammwitz writes:
> ==> Which tools (under linux) are you using in order to measure your
> own network ore on of your upstreams in terms of "gameability" or
> voip-usage?

My favorite tool for assessing delay distribution and loss over time
is Tobi Oetiker's (of MRTG fame) SmokePing (http://www.smokeping.org/).

As input, it can use various types of measurements - ping RTT/loss
measurement in the simplest case, but also Cisco SAA (now called IP
SLA) measurements, or various other types of probes such as HTTP or
DNS requests.

The nice thing is the way it presents the time distributions
graphically.  The graphs also include loss rates.  Check out the
"demo" part of Tobi's webpage.
-- 
Simon.



Re: Split flows across Domains

2006-01-25 Thread Simon Leinen

Robert E Seastrom writes:
> Yes and no.  CEF is {src, dst} hash IIRC, and "per-flow" usually
> means {src, srcport, dst, dstport, [proto, tos]} hash in my
> experience.

Correct.

The Catalyst 6500/7600 OSR with Sup2/Sup32/Sup720 can be configured to
hash based on L4 ports in addition to the IP addresses (for IPv4):

http://puck.nether.net/pipermail/cisco-nsp/2005-December/026952.html

This is handy when you have multiple "striped" TCP connections between
a single pair of hosts, and want them to be able to use multiple
equal-cost paths, but still want to avoid reordering inside each
connection (as you would inevitably get with per-packet load
sharing).
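
To illustrate why including the ports matters, a toy sketch of
per-flow path selection (purely illustrative - real linecards use
their own hash functions, not MD5):

# Per-flow ECMP: all packets of a flow hash to the same path (no reordering).
# Adding the L4 ports lets several parallel TCP connections between the SAME
# pair of hosts spread across the equal-cost paths.
import hashlib

def pick_path(n_paths, src, dst, sport=None, dport=None):
    key = "%s-%s" % (src, dst)
    if sport is not None:
        key += "-%d-%d" % (sport, dport)
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_paths

if __name__ == "__main__":
    # four "striped" NNTP-style connections between one pair of servers
    flows = [("10.0.0.1", "10.0.0.2", 5000 + i, 119) for i in range(4)]
    print("src/dst hash only:", [pick_path(2, s, d) for s, d, sp, dp in flows])
    print("with L4 ports    :", [pick_path(2, s, d, sp, dp) for s, d, sp, dp in flows])
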
-- 
Simon.



Re: Routing Table Jump caused by AS4151

2005-11-10 Thread Simon Leinen

Christopher L Morrow writes:
> On Thu, 10 Nov 2005, Fredy Kuenzler wrote:
>> I noticed a jump from some 171k to almost 175k in the last days, and
>> checked CIDR http://www.cidr-report.org/:
>> 
>> > Top 20 Net Increased Routes per Originating AS
>> >
>> > Prefixes  Change  ASnum AS Description
>> > 3263  0->3263 AS4151USDA-1 - USDA
>> 
>> so I wonder what's wrong with them.

> they are leaking just about every /24 in several /16's? :( ATT worldnet is
> passing them on too, perhaps they can smackdown usda? :)

Indeed.

   Network          Path
[...hundreds of similar routes deleted...]
*  199.156.247.0    3549 7018 4152 13979 4151 ?
*  199.156.249.0    3549 7018 4152 13979 4151 i
*  199.156.250.0    3549 7018 4152 13979 4151 i
*  199.156.252.0    3549 7018 4152 13979 4151 i
*  199.156.254.0    3549 7018 4152 13979 4151 ?
*  199.156.255.0    3549 7018 4152 13979 4151 ?
*  199.157.1.0      3549 7018 4152 13979 4151 ?
*  199.157.2.0      3549 7018 4152 13979 4151 ?
*  199.157.3.0      3549 7018 4152 13979 4151 ?
*  199.157.5.0      3549 7018 4152 13979 4151 i
*  199.157.7.0      3549 7018 4152 13979 4151 ?
*  199.157.8.0      3549 7018 4152 13979 4151 ?
[...hundreds of similar routes deleted...]

This looks more like configuration error than like anything more evil
(such as fine-grained traffic engineering).

AS4151, could you please stop this?

Thanks & regards,
-- 
Simon Leinen.
SWITCH



Re: Deploying 6to4 outbound routes at the border

2005-10-16 Thread Simon Leinen

Daniel Roesen writes:
> On Fri, Oct 14, 2005 at 10:45:33PM -0400, Todd Vierling wrote:
>> Maybe to start -- but again, what kind of 6to4 traffic level are we
>> expecting yet?

> Peak or average? Think twice before answering. :-)

> I'm told there are 6to4 relays seeing in excess of 100mbps. Not
> bursts.  Can you imagine trying to handle 100mbps "internet mix"
> traffic process switched? :-Z Not even talking about the peaks.

Note that not all Cisco routers use process switching for 6to4 tunnel
encap/decap (which is really just IPv6-in-IPv4).  Catalyst 6500/7600
OSR with PFC-3 (Sup32/Sup720) do this "in hardware".
-- 
Simon.



Re: Level 3's side of the story

2005-10-16 Thread Simon Leinen

Kevin Loch writes:
> Does anyone have reachability data for c-root during this episode?

The RIPE NCC "DNSMON" service has some:

http://dnsmon.ripe.net/dns-servmon/server/plot?server=c.root-servers.net&type=drops&tstart=1128246543&tstop=1128972253

According to BGPlay for that particular prefix from Route-Views data
(for some reason the RIPE RIS server used by BGPlay seems to be down
at the moment), the "episode" seems to be between these times (UTC):

 2005-10-05 09:49:03   Route Withdrawal ( 3356 174 2149 )
 2005-10-07 19:24:13   Route Announcement   3356 174 2149

The interval in the URL above starts 72 hours before the start of the
episode and ends 72 hours after its end.  I cannot see any particular
problems that would coincide with the episode, from that set of probes
(RIPE TTM).

Because we rely on default routes to our three transit providers, and
Level(3) is one of them, some of our customers must have had
connectivity issues to Cogent for a few hours, until we noticed
(thanks to Adam Rothschild and NANOG) and implemented a workaround.
But our RIPE TTM boxes (tt85 as well as the currently broken tt86)
aren't in those parts of our network.

> I wonder if they made separate arrangements for that or are planning
> to make arrangements for phase 2.

As someone else said, partial unreachability of a particular root
nameserver isn't that much of an issue.  But it's an interesting
question nevertheless.
-- 
Simon.



IPv6 traffic numbers [was: Re: OT - Vint Cerf joins Google]

2005-09-12 Thread Simon Leinen

[CC'ing Stanislav Shalunov, who does the Internet2 weekly reports.]

Marshall Eubanks writes, in response to Jordi's "8% IPv6" anecdote:
> These estimates seem way high and need support. Here is a counter-example.

While I'm also skeptical about the representativeness of Jordi's
estimates, this is a bad counterexample (see below about why):

> Netflow on Internet 2 for last week 

> http://netflow.internet2.edu/weekly/20050829/

> has 6.299 Gigabytes being sent by IPv6, out of a total 383.2
> Terabytes, or 0.0016% This is backbone traffic, and would not catch
> intra-Campus traffic, nor would it catch tunnel or VPN traffic,
^^^^

Wrong.  What you see here is ONLY tunnel traffic, because the number
is for IPv6-in-IPv4 (IP protocol 41) traffic.

Netflow for IPv6 isn't widely used yet.  Our own equipment doesn't
support it, and I don't think the Junipers used in Abilene do, either
(someone please correct me if I'm wrong).

> but it is suggestive.

Yes, but it's also irrelevant, because Abilene has native IPv6, so
there is little incentive for sending IPv6 tunneled in IPv4.

> According to the graph
> http://netflow.internet2.edu/weekly/longit/perc-protocols41-octets.png
> the most I2 IPv6 traffic was in  2002, when it was almost 0.6% of the total. 

I would assume that that was before IPv6 went native on Abilene.

> It is hard for me to imagine that the situation for commercial US
> traffic is much different.

I'm sure there's less.

> There may be similar statistics for Geant - I would be interested to
> see them.

I'll look up the GEANT numbers in a minute, stay tuned.
-- 
Simon.



Re: IPv6 push doesn't have much pull in U.S

2005-07-22 Thread Simon Leinen

Christopher L Morrow writes:
> On Sat, 16 Jul 2005, Iljitsch van Beijnum wrote:
>> And I'm sure Sprint and Verio (MCI/Worldcom/UUNET too? I have a

> I know verio does, Sprint I believe also does, and UUNET
> does... everyone has restrictions on the service though (native or
> tunnel'd type restrictions)

For what it's worth, we get IPv6 transit connectivity from all our
upstreams: Global Crossing, TeliaSonera and Level(3).  In each case,
IPv6 runs over a short tunnel to a router somewhere in the upstream's
backbone.  I assume that's because they either run separate backbones
for IPv4 and IPv6, or because our access routers aren't IPv6-enabled
for some reason.
-- 
Simon.



Re: Two questions [controlling broadcast storms & netflow software]; seeking offlist responses

2005-05-05 Thread Simon Leinen

Drew Weaver writes:
> Also the other question I had was are there any very good either
> open source or fairly affordable netflow analyzer software packages
> out there right now?

Making a recommendation is difficult, because there is such a wide
variety of requirements, depending on context (backbone/campus/
hosting) and application (billing/security/traffic planning).

But I try to keep a comprehensive list of Netflow-related software on

http://www.switch.ch/tf-tant/floma/software.html#netflow

Hope this helps,
-- 
Simon.



Re: Tracking spoofed routes?

2005-01-06 Thread Simon Leinen

Arife Vural writes:
[in response to Florian Frotzler <[EMAIL PROTECTED]>:]
>> To my knowledge, the myas-tool/-service from RIPE NCC is kind of
>> doing what you like to achive.

> MyASN is working on user-based. To get the alarm for unexpected
> routing patterns, you should set it up an account beforehand.

I have been using MyASN for half a year, and it is quite nice.
Setting it up required typing all our customer routes into Web forms,
which was somewhat tedious, but now I receive alerts in almost real
time as soon as someone tries to "highjack" our routes or announces
more-specifics.

For example, there was a large-scale incident on 24 December 2004 (see
e.g. http://www.merit.edu/mail.archives/nanog/msg03827.html).  It
started shortly before 09:20 UTC, and at 09:59 UTC I received an alert
from MyASN that some of our customer routes were announced from
another AS.  This is very respectable, especially since the system
must have been very heavily loaded at that time, because of the sheer
number of BGP updates and the number of potential alerts (MOST
prefixes were highjacked at some point during that day).

> I think for Kevin's situation, we have other tools. One is called,
> "Search by Prefix" and other one is BGPlay. Both tools are running
> over last 3 months routing data.

One problem is that Kevin is looking for an announcement of a *more
specific* prefix from his space.  BGPlay only supports queries on
exact prefixes I think.

The "Search by Prefix" tool seems to be ideal for Kevin's application
though.

> URL for those tools,

> http://www.ris.ripe.net/cgi-bin/risprefix.cgi
> http://www.ris.ripe.net/bgplay/
-- 
Simon.



Re: DNS Timeout Errors

2004-12-09 Thread Simon Leinen
Jay,

> Is anyone else experiencing DNS timeout errors.  I've tried using
> multiple name resolvers, and tested multiple domain names using
> different name servers, and I keep getting "name not found" errors.

> Trying the same domain name a second time, and it resolves ok.  This
> all started a few days ago.

About three weeks ago, some of our users told us that they were
experiencing many DNS resolution failures while surfing the Web.  We
analyzed this, and part of the explanation we came up with should work
for others, especially if the following conditions are met:

Are you using BIND 9 on the recursive nameserver that you normally use?
If so, does the installation of BIND 9 on your recursive nameserver
include support for DNS queries over IPv6?

BIND 9 seems to have trouble when a nameserver responds fine under
IPv4, but doesn't respond well (or at all) under IPv6 (e.g. because
IPv6 connectivity between you and the server is somehow broken): It
will continue to query the name server under its unresponsive IPv6
address in some situations.  I have seen this a lot when tracing IPv6
DNS queries from our recursive name servers(*).

This can be very noticeable, especially since A.GTLD-SERVERS.NET and
B.GTLD-SERVERS.NET now have AAAA records (IPv6 addresses).  Many
ccTLDs - including ours - have recently added IPv6-reachable name
servers, too.

I'm wondering whether many users are seeing this, but I have no idea
how to gather data on this, especially historical data.  (Except maybe
trying to correlate access times from server logs of popular Web
servers that refer to each other.)

I'm attaching a message from comp.protocols.dns.bind that refers to
this problem.
-- 
Simon.

(*) In our case, our recursive name server was using the wrong source
address for its queries, namely its anycast IPv6 address (Linux
IPv6 source address selection sucks!), so it would often not
receive a response to a query over IPv6, because the response
would end up at another anycast instance.

But I assume the more common case is that the IPv6 queries don't
reach the authoritative name server at all, because the recursive
name server doesn't have global IPv6 connectivity.  The IPv6
connectivity problem may also be at the end of an important
authoritative server, and still cause problems.

--- Begin Message ---

> Hello List --
> 
> I tried searching for this in the archives and didn't see anything
> conclusive.
> 
> We are an ISP with caching resolvers running BIND9.2.2 on Solaris 8 that
> are not behind firewalls.  Upon running scripts to test unrelated issues,
> I noticed that any time I queried any of my resolvers for domains that
> have not been cached, the recursive query response times are horrible --
> consistently over 4 seconds.  If I clear the cache and run a script that
> digs over 100 random domains, all of them come back > 4 seconds.  Nothing
> has changed on our resolvers' config in months.  Root hint file is up to
> date.  Dig +trace or debug isn't showing anything. Tcpdump/snoop shows
> nothing, other than an empty hole when the machine is waiting for a
> response back from any root server.  Queries against the boxes locally vs.
> queries from another machine make no difference.  We have tried boxes that
> have not been patched in months as well as up-to date machines.  All the
> same.
> 
> Here's the options we have:
> 
> 
> options {
> 
> directory "/var/named";
> /*
> *
> */
> max-ncache-ttl 10800;
> transfers-in 25;
> notify no;
> allow-query { CSR; DEV; localhost; };
> recursion yes;
> recursive-clients 10;
> allow-transfer { none; };
> interface-interval 0;
> cleaning-interval 30;
> blackhole { 10.0.0.0/8; 192.168.0.0/16; };
> pid-file "named.pid";
> 
> };
> 
> 
> Although I would be happy to post more info for your review, my questions
> are these:  Has anyone else noticed this lag in recursion recently?  Can
> anyone on this list try clearing their cache and then running queries for
> random domains and noting the response time?
> 
> Curiously, an old BIND8 box we have does NOT experience this lag, no
> matter what.
> 
> Any insight you may have is appreciated.
> 
> Thanks
> 
> -Erik J
 

Know issue which will be fixed in BIND 9.2.5/9.3.1.

Workarounds:
* upgrade to 9.3.0 and run "named -4".
* configure --disable-ipv6.
* get yourself IPv6 connectivity.

A.GTLD-SERVERS.NET and B.GTLD-SERVERS.NET now have AAAA addresses
and the RTT estimates are not being penalised because you don't
have IPv6 connectivity.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: [EMAIL PROTECTED]


--- End Message ---


Re: The Cidr Report

2004-11-13 Thread Simon Leinen

Daniel Roesen writes:
> Well, it boils down that if you have enough customers, you seem to
> get away with about any antisocial behaviour on the net.

You don't need to have many customers, it's just more fun if you have
a larger space that you can deaggregate.  Since everybody stopped
filtering you can deaggregate everything into /24s today and get away
with it.  So as soon as you have a /23 you can play.  And there are
just too many "valid reasons" to resist.

Around here most ISPs, when they announce new customer routes, send
mail to their peers saying "we will announce these prefixes under
these paths, please update your filters".  I often respond with a
friendly note that we will filter this or that prefix because it's a
more specific of a PA prefix (which we will actually do, although it
doesn't matter that much since we always have a fallback).  Sometimes
I offer to put in a temporary (usually a year) filter exception so
that they can renumber their new customer into aggregatable space.

It doesn't help often, but sometimes it does.
"Think globally, act locally."
-- 
Simon.



Re: IPV6 renumbering painless?

2004-11-12 Thread Simon Leinen

Daniel Roesen writes:
> On Fri, Nov 12, 2004 at 05:19:36PM +0100, Simon Leinen wrote:
>> On Solaris, you would use the "token" option (see the extract from
>> "man ifconfig" output below).  You can simply put "token ::1234:5678"
>> into /etc/hostname6.bge0.  I assume that other sane OSes have similar
>> mechanisms.

> Ah thanks. No, not seen anywhere in Linux or *BSD.

That's why I put in the qualification :-)
-- 
Simon.



Re: How to Blocking VoIP ( H.323) ?

2004-11-12 Thread Simon Leinen

Robert Mathews writes:
> On Thu, 11 Nov 2004, Alexei Roudnev wrote:
>> Hmm - just introduce some jitter into your network, and add random
>> delay to the short packets - and no VoIP in your company -:).

> Alexei:

> How exactly then would anyone implement this, without screwing-up the
> overall performance elements in the network?  :)

Yeah, no jitter for HTTP please (otherwise users will complain when
they tunnel their VoIP traffic over TCP port 80 :-).

Note that Skype already uses TCP port 80 and 443, at least for control
traffic.
-- 
Simon.



Re: IPV6 renumbering painless?

2004-11-12 Thread Simon Leinen

Daniel Roesen writes:
> On Thu, Nov 11, 2004 at 08:44:57AM -0800, Kevin Oberman wrote:
>> We have renumbered IPv6 space a couple of times when we were
>> developing our addressing plan. (We have a /32.) Renumbering was
>> pretty trivial for most systems, but servers requiring a fixed
>> address were usually configured with an explicit prefix. This
>> should not have been the case, but most people configured IPv6
>> addresses pretty much like IPv4 and specified the entire 128
>> bits. Of course, after a renumbering, this gets fixed, so those
>> systems are usually OK the next time.

> "specified the entire 128 bits"... how do you specify only part of
> it?

On Solaris, you would use the "token" option (see the extract from
"man ifconfig" output below).  You can simply put "token ::1234:5678"
into /etc/hostname6.bge0.  I assume that other sane OSes have similar
mechanisms.

     token address/prefix_length
         Set the IPv6 token of an interface to be used for
         address autoconfiguration.

         example% ifconfig hme0 inet6 token ::1/64

> What determines the rest?

The prefix advertised in prefix advertisements.

> "fixed" as in "now using stateless autoconfig"? Fun... change NIC
> and you need to change DNS. Thanks, but no thanks. Not for
> non-mobile devices which need to be reachable with sessions
> initiated from remote (basically: servers).

The above mechanism solves this problem even with stateless
autoconfiguration.  Agree?

I think it's an advantage if servers can get their prefixes from
router announcements rather than from local config files.  Sure, you
still have to update the DNS at some point(s) during renumbering, but
that can't be avoided anyway.
-- 
Simon.



Re: Internet speed report...

2004-09-06 Thread Simon Leinen

Mikael Abrahamsson writes:
> On Mon, 6 Sep 2004, Simon Leinen wrote:
>> Rather than over-dimensioning the backbone for two or three users
>> (the "Petabyte crowd"), I'd prefer making them happy with a special
>> TCP.

> Tune your max window size so it won't be able to use more than say
> 60% of the total bandwidth, that way (if the packets are paced
> evenly) you won't ever overload the 10GE link with 30% background
> "noise".

Hm, three problems:

1.) Ideally the Petabyte folks would magically get *all* of the
currently "unused bandwidth" - I don't want to limit them to 60%.
(Caveat: Unused bandwidth of a path is very hard to quantify.)
2.) When we upgrade the backbone to 100GE or whatever, I don't want to
have to tell those people they can increase their windows now.
3.) TCP as commonly implemented does NOT pace packets evenly.

If the high-speed TCP

1.) notices the onset of congestion even when it's just a *small*
increase in queue length, or maybe a tiny bit of packet drop/ECN
(someone please convince Cisco to implement ECN on the OSR :-),
2.) adapts quickly to load changes, and
3.) paces its packets nicely as you describe,

then things should be good.  Maybe modern TCPs such as FAST or BIC do
all this, I don't know.  I'm pretty sure FAST helps by avoiding to
fill up the buffers.

As I said, it would be great if it were possible to build fast
networks with modest buffers, and use end-to-end (TCP) improvements to
fill the "needs" of the Petabyte/Internet2 Land Speed Record crowd.
-- 
Simon.



Re: Internet speed report...

2004-09-06 Thread Simon Leinen

Michael Dillon writes:
> In the paper 
> http://klamath.stanford.edu/~keslassy/download/tr04_hpng_060800_sizing.pdf

That's also in the (shorter) SIGCOMM'04 version of the paper.

> they state as follows:
> -
> While we have evidence that buffers 
> can be made smaller, we haven't tested the hypothesis
> in a real operational network. It is a little difficult
> to persuade the operator of a functioning, profitable network
> to take the risk and remove 99% of their buffers. But that
> has to be the next step, and we see the results presented in
> this paper as a first step towards persuading an operator to
> try it.
> 

> So, has anyone actually tried their buffer sizing rules?

> Or do your current buffer sizing rules actually match,
> more or less, the sizes that they recommend?

The latter, more or less.  Our backbone consists of 1 Gbps and 10 Gbps
links, and because our platform is a glorified campus L3 switch (Cisco
Catalyst 6500/7600 OSR, mostly with "LAN" linecards), we have nowhere
near the buffer space that was traditionally recommended for such
networks.  (We use the low-cost/performance 4-port variant of the 10GE
linecards.)
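
(For scale, the paper's rule of thumb is buffer = RTT*C/sqrt(N)
instead of the traditional full RTT*C; a rough sketch with assumed
values for RTT and flow count, since I'm not reproducing the paper's
exact parameters here:)

from math import sqrt

def traditional_buffer(rtt_s, capacity_bps):
    return rtt_s * capacity_bps / 8.0        # bytes: full bandwidth*delay product

def small_buffer(rtt_s, capacity_bps, n_flows):
    return traditional_buffer(rtt_s, capacity_bps) / sqrt(n_flows)

if __name__ == "__main__":
    rtt, cap, flows = 0.25, 10e9, 10000      # assumed 250 ms, 10GE, 10k flows
    print("traditional RTT*C : %6.1f MB" % (traditional_buffer(rtt, cap) / 1e6))
    print("RTT*C/sqrt(N)     : %6.1f MB" % (small_buffer(rtt, cap, flows) / 1e6))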

The decision for these types of interfaces (as opposed to going the
Juniper or GSR route) was mostly driven by price, and by the
observation that we don't want to strive for >95% circuit utilization.
We tend to upgrade links at relatively low average utilization -
router interfaces are cheap (even 10 GE), and on the optical transport
side (DWDM/CWDM) these upgrades are also affordable.

What I'd be interested in:

In a lightly-used network with high-capacity links, many (1000s of)
active TCP flows, and small buffers, how well can we still support the
occasional huge-throughput TCP (Internet2 land-speed record :-)?

Or conversely: is there a TCP variant/alternative that can fill 10Gb/s
paths (with maybe 10-30% of background load from those thousands of
TCP flows) without requiring huge buffers in the backbone?

Rather than over-dimensioning the backbone for two or three users (the
"Petabyte crowd"), I'd prefer making them happy with a special TCP.
-- 
Simon.



Re: a small note for the Internet archives

2004-05-31 Thread Simon Leinen

Peter Lothberg writes:
[...]
>   Optics type: VSR2000-3R2 (2km)
>   Clock source: line (actual) line (configured)

> Optical Power Monitoring (accuracy: +/- 1dB)
>   Rx power = 1562.3280 mW, 31.9 dBm
   
Ouch!
Some amplifiers you have there...

>   Tx power = 15.4640 mW, 11.9 dBm
>   Tx laser current bias = 96.2 mA
-- 
Simon.


Re: best effort has economic problems

2004-05-31 Thread Simon Leinen

Mikael Abrahamsson writes:
> Tier 1 operators do not do "best effort" really, at least not in
> their cores (and they have the SLAs to back it up). They buy hugely
> expensive top notch gear (Cisco 12000 (and now CRS:s) and Junipers)
> to get the big packet buffers, the fast reroutes and the full
> routing table lookups for each packet to avoid the pitfalls of flow
> forwarding the cheaper platforms have.

> With the advent of 10GE WAN PHY (Force10, Foundry, Riverstone,
> Extreme Networks, Cisco 7600)

I don't think there's 10 GE WAN PHY for the Cisco 7600 yet.  It has
very cost-effective 10 GE *LAN* PHY (10.0 Gb/s, not SONET-compatible)
interfaces though, which I find even more interesting (see below).

> and full L3 lookup for each packet on their newer platforms, we'll
> see very much cheaper L2/L3 equipment being able to take advantage
> of existing OC192 infrastructure and that's where I think you'll
> start to see the real "best effort" networks operating at. At least
> the L2/L3 equipment will be much cheaper for the operators choosing
> this equipment, at approx 1/5 the initial investment of similar
> capacity 12400 and Juniper equipment.

We find that the L1 equipment is getting much cheaper too, especially
in the 10 GE LAN PHY space.  Think DWDM XENPAKs (or XFPs), which go
70-100 kms and which can be multiplexed and amplified with pretty
affordable optical equipment.  If you're not interested in
carrier-class boxes, "traditional" WDM equipment can sometimes be
replaced with active parts that mostly look like GBICs, and passive
parts that look like funny cables...

> Now, how will this translate in cost compared to DWDM equipment and
> OPEX part of the whole equation? [...]
-- 
Simon.


Re: TCP/BGP vulnerability - easier than you think

2004-04-28 Thread Simon Leinen

Priscilla,

> Questions arose while trying to explain proposed TCP fixes to my
> students. Can y'all help me with these?

> We were going over the "Transmission Control Protocol security
> considerations draft-ietf-tcpm-tcpsecure-00.txt" document here when
> the questions arose:

> http://www.ietf.org/internet-drafts/draft-ietf-tcpm-tcpsecure-00.txt

Meta-response: look at the discussion over at the IETF, in the tcpm
Working Group.  There's a nice summary as well as some interesting
discussion on possible issues with these fixes.

Unfortunately, the tcpm mailing list archive seems to be accessible
via FTP as large monthly mailbox files only, so I cannot point you to
the relevant individual messages.  The threads are called "new work
item: TCP security issue" and "draft-ietf-tcpm-tcpsecure".  (There's
also a lot of process discussion in there, about the way this issue
was initially handled by a closed group and then presented as a work
item for the working group.  This is interesting but only marginally
helpful to understand the technical content of the changes.)

Oh no, wait, there's another mail archive for tcpm (not listed on the
"official" WG page, http://www.ietf.org/html.charters/tcpm-charter.html):

The threads start in
  https://www1.ietf.org/mail-archive/working-groups/tcpm/current/msg00086.html
  https://www1.ietf.org/mail-archive/working-groups/tcpm/current/msg00095.html

A nice summary of the changes by David Borman:
  https://www1.ietf.org/mail-archive/working-groups/tcpm/current/msg00130.html

Hope this helps,
-- 
Simon.


Re: IPv6 IGP

2004-04-09 Thread Simon Leinen

We use OSPFv3 on our backbone (OSPFv2 for IPv4, separate routing
processes but largely identical metric/timeout configuration) using
mostly 12.2(17d)SXB on Catalyst 6500/7600 OSRs and various 12.3T
(pre-)releases on 7200/7500.  Works fine.
-- 
Simon.


Re: netsky issue.

2004-03-09 Thread Simon Leinen

Jamie Reid writes:
> If you have a look at 

> http://vil.nai.com/vil/content/v_101083.htm 

> There is a list of IP addresses that are nameservers which are
> hard-coded into the worm. It spreads by e-mail (currently) and thus
> it can be blocked using anti-virus filters.

> My concern is that these addrs are all for nameservers, which could
> be authoritative for other domains, and by blocking these servers
> any domains they host could be effectively put out of commission.

I think that (most of) the IP addresses in the list belong to
*recursive* DNS servers of larger Internet access providers.  There
certainly are quite a few requests from these to authoritative name
servers in our network.  So if you have authoritative name servers in
your network, blocking the IP addresses will result in some denial of
service.

The operators of these servers could probably do a useful thing or two
here: they could try to trace suspicious queries to help locate
infected machines, and/or limit access to these name servers to only
their own customers' address ranges.  (A rough sketch of the first
idea follows below.)

The latter may be operationally difficult depending on whether these
name servers are also authoritative (perhaps a good argument for
separating recursive and authoritative name servers) and how easy it
is to map the "legitimate user of recursive name service" predicate to
a range of IP addresses.
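
On the first of those ideas: even something very crude, like counting
MX queries per client on the recursive servers, should make infected
machines stand out (mass-mailing worms with their own SMTP engines do
a lot of MX lookups).  A rough sketch in Python -- the log format and
the threshold are invented for illustration, so adapt the parsing to
whatever your name server actually logs:

  import sys
  from collections import Counter

  THRESHOLD = 500          # MX queries per log window; tune to taste
  mx_queries = Counter()

  for line in sys.stdin:
      # Assumed log format: "<timestamp> <client-ip> <qtype> <qname>"
      fields = line.split()
      if len(fields) >= 4 and fields[2] == "MX":
          mx_queries[fields[1]] += 1

  for client, count in mx_queries.most_common():
      if count < THRESHOLD:
          break
      print(f"{client}: {count} MX queries -- worth a closer look")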

> I am not aware of an easy way to find out all the domains registered
> to a particular nameserver, and the trend of blocking addrs that
> appear in worm code is starting to concern me a bit.

Rightly so.

> It is not indicated how blocking these servers will have an
> appreciable effect on the worm propagation (unless it gets a second
> stage from them), and I wonder if anyone else has similar concerns,
> or an opinion on whether these IP addresses should actually be
> blocked.

I'd recommend against it, due to collateral damage and more general
end-to-end arguments.
-- 
Simon Leinen   [EMAIL PROTECTED]
SWITCH http://www.switch.ch/misc/leinen/

   Computers hate being anthropomorphized.


Re: /24s run amuck

2004-01-15 Thread Simon Leinen

Frank Louwers writes:
> On Tue, Jan 13, 2004 at 04:12:13PM -0500, Patrick W. Gilmore wrote:
> Filtering on a /20 or whatever (up to /24) is a bad thing because
> RIPE (and maybe APNIC) actually gives out /24 PI space, that comes
> out of RIPE's /8's, not your upstream's /20 or /16 or /whatever...

Yes, but those PIs are allocated from specific sub-ranges that are
documented.  So you can still filter MOST of the space by allocation
boundaries, and accept /24 only in the "PI" ranges.  We do this.

This is RIPE-specific (we aggregate most non-RIPE routes under
0.0.0.0/0), but other RIRs may have similar policies, although
probably with easier-to-find PI swamp ranges.
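
To illustrate the approach (NOT our actual filter -- the ranges and
boundaries below are placeholders; the real ones come from the RIR
allocation documents), something along these lines, written in Python
just to make the logic explicit:

  from ipaddress import ip_network

  # (covering block, longest prefix length accepted inside it)
  ALLOCATION_BOUNDARIES = [
      (ip_network("62.0.0.0/8"), 19),    # placeholder values
      (ip_network("193.0.0.0/8"), 20),   # placeholder values
  ]

  # Documented PI sub-ranges where we accept down to /24
  PI_RANGES = [
      ip_network("193.200.0.0/14"),      # placeholder value
  ]

  def accept(prefix_str):
      prefix = ip_network(prefix_str)
      # Accept anything up to /24 inside a documented PI range
      if any(prefix.subnet_of(r) for r in PI_RANGES):
          return prefix.prefixlen <= 24
      # Otherwise enforce the allocation boundary of the covering block
      for block, max_len in ALLOCATION_BOUNDARIES:
          if prefix.subnet_of(block):
              return prefix.prefixlen <= max_len
      # Not covered by any rule: fall back to the default policy
      # (we'd aggregate such routes under 0.0.0.0/0 anyway).
      return False

  print(accept("193.201.42.0/24"))   # True: /24 inside a "PI" range
  print(accept("193.0.0.0/23"))      # False: longer than the /20 boundary

In practice this is of course a prefix-list generated for the router,
not a script, but the decision is the same.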
-- 
Simon.


Re: Advice/Experience with small sized DDWM gear

2003-06-21 Thread Simon Leinen

Deepak Jain writes:
[a response with excellent pieces of advice on CWDM vs. DWDM.]

>   If you are planning more than just 1 DF run, you could buy the less
> expensive solution and just swap it out when you need something more and use
> the CWDM solution somewhere else.

Yes.  What we often do is buy a single fiber pair (which happens to
be the smallest amount you can get) and run a bi-directional optical
system (CWDM or DWDM) over one strand.  That leaves the other strand
free for a later upgrade (or some other useful purpose).

[...]
>   So its a question of how much BW you need and how much you
> want to pay for right now.

An excellent management summary of CWDM vs. DWDM :-)
-- 
Simon.


Re: High Speed IP-Sec - Summary

2003-06-11 Thread Simon Leinen

For the sake of completeness, Sun just announced a new Crypto
accelerator board with GigE interfaces that does SSL and IPSec VPNs,
and claims 800 Mb/s "bulk 3DES encryption":

http://www.sun.com/products/networking/sslaccel/suncryptoaccel4000/index.html
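
For a rough sense of what that number means: ballparking software 3DES
on a general-purpose box is a one-minute exercise, e.g. with the Python
"cryptography" package (purely a comparison sketch, nothing to do with
the Sun product; the import location of TripleDES varies a bit between
library versions):

  import os, time
  from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

  enc = Cipher(algorithms.TripleDES(os.urandom(24)),
               modes.CBC(os.urandom(8))).encryptor()
  buf = os.urandom(1 << 20)          # 1 MB of random "traffic"
  total, start = 0, time.time()
  while time.time() - start < 3.0:   # run for roughly 3 seconds
      enc.update(buf)
      total += len(buf)
  elapsed = time.time() - start
  print("%.0f Mb/s software 3DES-CBC" % (total * 8 / elapsed / 1e6))

Current general-purpose CPUs fall far short of 800 Mb/s of 3DES per
core, which is the whole point of doing it in hardware.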
-- 
Simon.


Re: Network Routing without Cisco or Juniper?

2002-09-04 Thread Simon Leinen


On Wed, 4 Sep 2002 05:30:46 -0400 (EDT), "jeffrey.arnold" <[EMAIL PROTECTED]> said:
> Foundry makes a very good, very stable bgp speaker. I've had them in
> my network alongside cisco's and juniper's for a couple of years
> now, and i've never run into any bgp implementation problems that i
> would consider major. A few annoying bugs here and there, but
> nothing significantly worse than C or J.

Come to think of it, I can confirm this, although we have only really
used IBGP (including multiprotocol IBGP, with MD5 authentication) and
OSPF on those boxes (please, no flames that one only needs one or the
other :-).

In this respect the Foundries have never been problematic, and I
noticed they learned the full routing table much faster than our (old)
C's upon startup.  The only problem we had was that in our deployment
we really needed MBGP, and that became available much later than
originally announced.  But when it came it instantly worked as
advertised, at least as far as we tested it.

> Beyond the fact that not too many people are familiar with foundry's
> gear, I tend to think that foundry has lost face in the service
> provider world for non-bgp related issues. ACL problems and CAM size
> issues have come up in really large installs (multi GBps, hundreds
> of thousands of flows, etc). Foundry is also behind cisco and
> juniper in features - GRE and netflow/sflow come to mind.

My main problem is that I find debugging protocol operation (such as
PIM-SM) much more difficult than on Cisco.  Also, you can't expect
them to have as many resources to develop new features all the time,
and the features that do get resources may not be the ones that are
interesting to ISPs.

> The ACL and CAM issues are supposedly fixed in foundry's jetcore
> chipset boxes, but i haven't seen any of those yet. Sflow is now an
> option, and from what i hear, their implementation is very very
> good. Overall, foundry still makes a good box - when you figure in
> the cost factor, it becomes a great box.

Definitely agree.  Also they start up incredibly fast, because the
software is so small.  So upgrading software on the box is relatively
painless.
-- 
Simon.



Re: Readiness for IPV6

2002-07-09 Thread Simon Leinen


On Mon, 8 Jul 2002 19:47:52 -0400, "Phil Rosenthal" <[EMAIL PROTECTED]> said:
> As far as I can tell, neither Foundry Bigiron, nor Cisco 65xx
> support IPV6 (I could be wrong).

It is rumored that Cisco has software for the 6500 that does IPv6,
albeit "in software" on the MSFC.  And I'm sure they have plans to
support IPv6 in hardware on this platform at some point.

Foundry has something like "protocol-specific VLANs", which lets you
bridge IPv6 traffic while routing IPv4 at Layer 3.

> While they probably aren't the most popular routers, they are very
> popular, and im sure plenty of cisco's smaller routers don't support
> it either.

The smaller routers are generally not a problem as long as they have
enough memory to run recent IOS releases, and I think the bloat is
mainly due to new functions other than IPv6.

An interesting question is what it would take to support IPv6 on
appliance-like routers such as IP-over-Cable or -xDSL CPE.  In the
retail space I actually see some interest in running IPv6, because it
makes it much more feasible to operate a small network at home, and I
have the impression that home users now lead enterprises in terms of
IPv6-enabled OS deployment (Windows XP and Linux in particular).

> How ready is the 'net to transit to IPV6 in the future?

Let's say that most ISPs could satisfy the current demand :-)

Even though there are relatively few high-performance implementations
out there (read: ASIC-based IPv6 forwarding such as Juniper's), a
modest amount of IPv6 traffic could be carried "natively" on most
networks.

If you need higher performance and don't have hardware forwarding for
IPv6, you can always tunnel it in IPv4 (or, shudder, MPLS) at the
edges.  You may also want to do this if you don't really need the
IPv6 performance, but would like to protect the control plane of your
production (IPv4) service from the additional CPU load (IPv6 traffic
as a DoS on your RPs :-).
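
Just to make "tunnel in IPv4" concrete: the whole IPv6 packet simply
rides as the payload of an IPv4 packet with protocol number 41.  A toy
illustration with scapy (documentation addresses only; a real
deployment configures a tunnel interface on the router rather than
crafting packets in a script):

  from scapy.all import IP, IPv6, ICMPv6EchoRequest, send

  outer = IP(src="192.0.2.1", dst="198.51.100.1", proto=41)  # 41 = 6in4
  inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()
  send(outer / inner)   # needs raw-socket (root) privileges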

> Should everyone be factoring in replacing big routers with IPV6
> being the only reason?

Sure, provided everyone has infinite amounts of money, or the
additional revenue from IPv6 justifies the investment.  Honestly I
don't think either is the case today for most of us, except where some
form of public funding exists, for example through innovation/research
subsidies or tax breaks for enterprises using IPv6.

> Just curious on others' opinions on this.
-- 
Simon.



Re: Big meetings should never be held at noon!

2002-06-27 Thread Simon Leinen


On Wed, 26 Jun 2002 13:53:38 -0400, "Pawlukiewicz Jane" <[EMAIL PROTECTED]> said:

> Is there a way to download _part_ of a BGP table from a router?

Sure, using SNMP.  The question is whether you'll get the part you
want... if you use SNMPv2/3 and get-bulk, and only ask for the columns
you are actually interested in, it may not even be too inefficient.
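
Something along these lines, against the BGP4-MIB (RFC 1657) path
attribute table -- sketched with the classic synchronous pysnmp API;
the hostname, community and choice of columns are only placeholders,
and you need the BGP4-MIB available for symbolic names (or substitute
numeric OIDs):

  from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                            ContextData, ObjectType, ObjectIdentity, bulkCmd)

  def walk_columns(host, community, *columns):
      """GETBULK just the requested columns, not the whole table."""
      for err_ind, err_stat, err_idx, var_binds in bulkCmd(
              SnmpEngine(), CommunityData(community),
              UdpTransportTarget((host, 161)), ContextData(),
              0, 25,                     # non-repeaters, max-repetitions
              *[ObjectType(ObjectIdentity('BGP4-MIB', c)) for c in columns],
              lexicographicMode=False):  # stop at the end of each column
          if err_ind or err_stat:
              break
          for var_bind in var_binds:
              print(' = '.join(x.prettyPrint() for x in var_bind))

  # E.g. just the next hop and the "best path" flag:
  walk_columns('router.example.net', 'public',
               'bgp4PathAttrNextHop', 'bgp4PathAttrBest')

Whether the agent delivers this efficiently is of course another
question entirely.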
-- 
Simon Leinen   [EMAIL PROTECTED]
SWITCH http://www.switch.ch/misc/leinen/

   Computers hate being anthropomorphized.