Re: RINA - scott whaps at the nanog hornets nest :-)

2010-12-02 Thread Simon Horman
On Sun, Nov 07, 2010 at 01:42:33AM -0700, George Bonser wrote:
  
   I guess you didn't read the links earlier.  It has nothing to do
   with stack tweaks.  The moment you lose a single packet, you are
   toast.  And
  
  TCP SACK.
 
 
 Certainly helps, but it still has limitations.  If you have too many
 packets in flight, it can take too long to locate the SACKed packet in
 some implementations; this can cause a TCP timeout and a reset of the
 window to 1.  It varies from one implementation to another; the above
 was for some implementations of Linux.  The larger the window (high
 speed, high latency paths), the worse this problem is.  In other words,
 sure, you can get great performance, but when you hit a lost packet you
 can also take a huge performance hit, depending on which packet is lost
 and on who is doing the talking or what they are talking to.
 
 Common advice on stack tuning: for very large BDP paths where the TCP
 window is > 20 MB, you are likely to hit the Linux SACK implementation
 problem. If Linux has too many packets in flight when it gets a SACK
 event, it takes too long to locate the SACKed packet, and you get a TCP
 timeout and CWND goes back to 1 packet. Restricting the TCP buffer size
 to about 12 MB seems to avoid this problem, but clearly limits your
 total throughput. Another solution is to disable SACK.  Even if you
 don't have such a system, you might be talking to one.
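The window sizes above come straight from the bandwidth-delay product. A rough sketch in Python (the path figures are illustrative assumptions, not numbers from the thread):

```python
# Bandwidth-delay product: how much data must be in flight to fill a path.
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to keep a path of this bandwidth/RTT full."""
    return bandwidth_bps * rtt_s / 8

# Assumed example path: 1 Gb/s with a 160 ms round-trip time.
window = bdp_bytes(1e9, 0.160)
print(f"required window: {window / 2**20:.1f} MiB")   # ~19 MiB, right at the
                                                      # problematic ~20 MB mark
# With ~1448-byte TCP segments, the retransmit queue that a SACK event
# may force the stack to search holds this many packets:
print(f"packets in flight: {window / 1448:.0f}")
```

At ~14,000 packets in flight, a linear scan of the retransmit queue on every SACK event is plausibly slow enough to stall the sender into a timeout, which is the failure mode described above.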

Do you know if any work is being done on resolving this problem?
It seems that work in that area might be more fruitful than banging
your head against increasing the MTU.

 But anyway, I still think 1500 is a really dumb MTU value for modern
 interfaces and unnecessarily retards performance over long distances.
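The long-distance penalty of a small MTU can be made concrete with the well-known Mathis et al. approximation, rate <= (MSS/RTT) * C/sqrt(p). A sketch (the RTT and loss figures are assumptions for illustration):

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_prob: float) -> float:
    """Mathis et al. bound on steady-state TCP throughput:
    rate <= (MSS/RTT) * C / sqrt(p), with C ~= 1.22 for Reno-style recovery."""
    return (mss_bytes * 8 / rtt_s) * 1.22 / math.sqrt(loss_prob)

rtt, p = 0.100, 1e-6                  # assumed: 100 ms path, 1e-6 loss rate
for mtu in (1500, 9000):
    mss = mtu - 40                    # minus IPv4 + TCP headers, no options
    rate = mathis_throughput_bps(mss, rtt, p)
    print(f"MTU {mtu}: ceiling ~{rate / 1e6:.0f} Mb/s")
```

Under the same loss rate and RTT, the ceiling scales linearly with MSS, so a 9000-byte MTU buys roughly a 6x higher throughput ceiling in this example.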



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Niels Bakker

* gbon...@seven.com (George Bonser) [Mon 08 Nov 2010, 17:54 CET]:
I wasn't talking about changing anything at any of the edges.  The 
idea was just to get the middle portion of the internet, the 
peering points to a place that would support frames larger than 
1500.  It is practically impossible for anyone to send such a packet 
off-net until that happens.


If you think peering points are the middle portion of the internet 
that all packets have to traverse, then this thread is beyond hope.



-- Niels.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Tony Finch
On Mon, 8 Nov 2010, Scott Weeks wrote:

 The mapping server idea that several proposals use does not appear to keep
 the smartness at the edges; rather, it seems to try to make a smarter core
 network.

Is a DNS server core or edge? ILNP aims to use the DNS as its mapping
service.

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7,
DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR
ROUGH. RAIN THEN FAIR. GOOD.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Scott Weeks


--- b...@herrin.us wrote:
really would. Maybe you can tell me the page number, 'cause I just
can't wade through the rest of it.
-


Don't read anything until around chapter 6 or 7.  Also, skip the last one.   
Thanks for the responses.

scott



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Scott Weeks


--- d...@dotat.at wrote:
From: Tony Finch d...@dotat.at
On Mon, 8 Nov 2010, Scott Weeks wrote:

 The mapping server idea that several proposals use does not appear to keep
 the smartness at the edges; rather, it seems to try to make a smarter core
 network.

Is a DNS server core or edge? ILNP aims to use the DNS as its mapping
service.
--



DNS root name servers are at the 'core'.  No?

scott



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Nathan Eisenberg
 If you think peering points are the middle portion of the internet that all
 packets have to traverse, then this thread is beyond hope.
 
 
   -- Niels.

Making sweeping generalizations at thin air is fun!

This statement could easily be true, just as it could easily be false.

Nathan




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-09 Thread Nick Hilliard
On 09/11/2010 13:46, Tony Finch wrote:
 Is a DNS server core or edge? ILNP aims to use the DNS as its mapping
 service.

This is one of several reasons that ILNP is destined to fail - imho.

Nick



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Mark Smith
On Sun, 7 Nov 2010 01:07:17 -0700
George Bonser gbon...@seven.com wrote:

  
   Yes, I really don't understand that either.  You would think that
 the
   investment in developing and deploying all that SONET infrastructure
   has been paid back by now and they can lower the prices
 dramatically.
   One would think the vendors would be practically giving it away,
   particularly if people understood the potential improvement in
   performance, though the difference between 1500 and 4000 is probably
   not all that much except on long distance (> 2000km) paths.
  
  Careful, you're rapidly working your way up to nanog kook status with
  these absurd claims based on no logic whatsoever.
 
 My apologies.  It just seemed to me that the investment in SONET,
 particularly at the lower data rates, should be pretty much paid back by
 now.  How long has OC-12 been around?  I can understand a certain amount
 of premium for something that doesn't sell as much, but the difference in
 prices can be quite amazing in some markets. Some differential might be
 justified, but why so much?
 
 An OC-12 SFP optic costs nearly $3,000 from one vendor, list.  Their
 list price for a GigE SFP optical module is about 30% of that.  What is
 it about the optic module that would cause it to be 3 times as expensive
 for an interface with half the bandwidth?  A 4-port OC-12 module is
 $37,500 list.  A 4-port 10G module is $10,000 less for 10x the bandwidth.
 
 In other words, what is the differential in the manufacturing costs of
 those?  I don't believe it is as much as the differential in the selling
 price.
 
 
 

Once the base manufacturing cost is covered, supply and demand dictate
the price, a.k.a. you charge what the market will bear. While at least
one person/organisation continues to pay sonet/sdh pricing, that's what
will be charged.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Mark Smith
On Sun, 7 Nov 2010 01:49:20 -0600
Richard A Steenbergen r...@e-gerbil.net wrote:

 On Sun, Nov 07, 2010 at 08:02:28AM +0100, Mans Nilsson wrote:
  
  The only reason to use (10)GE for transmission in WAN is the 
  completely baroque price difference in interface pricing. With todays 
  line rates, the components and complexity of a line card are pretty 
  much equal between SDH and GE. There is no reason to overcharge for 
  the better interface except because they (all vendors do this) can.
 
 To be fair, there are SOME legitimate reasons for a cost difference. For 
 example, ethernet has very high overhead on small packets and tops out 
 at 14.8Mpps over 10GE, whereas SONET can do 7 bytes of overhead for your 
 PPP/HDLC and FCS etc and easily end up doing well over 40Mpps of IP 
 packets. The cost of the lookup ASIC that only has to support the 
 Ethernet link is going to be a lot cheaper, or let you handle a lot more 
 links on the same chip.
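The 14.8 Mpps figure follows directly from minimum-size frames plus Ethernet's fixed per-frame wire overhead. A quick check (the POS arithmetic is my own rough assumption; the thread's "well over 40Mpps" depends on which overheads you count):

```python
# Minimum Ethernet frame on the wire: 64-byte frame + 8-byte preamble/SFD
# + 12-byte inter-frame gap = 84 byte times per packet slot.
ETH_WIRE_MIN = 64 + 8 + 12

mpps_10ge = 10e9 / (ETH_WIRE_MIN * 8) / 1e6
print(f"10GE worst case: {mpps_10ge:.2f} Mpps")      # 14.88 Mpps

# POS wraps a 40-byte minimum IP packet in only ~7 bytes of PPP/HDLC + FCS
# overhead, so a similar payload rate carries far more packets per second.
mpps_pos = 9.6e9 / ((40 + 7) * 8) / 1e6              # assumed ~9.6 Gb/s payload
print(f"OC-192 POS worst case: ~{mpps_pos:.0f} Mpps")
```

Either way the conclusion stands: a POS lookup engine must be sized for substantially more packets per second than a 10GE one at the same nominal rate.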
 
 At this point it's only half price gouging of the silly telco customers 
 with money to blow. There really are significant cost savings for the 
 vendors in using the more popular and commoditized technology, even 
 though it may be technically inferior. Think of it like the old IDE vs 
 SCSI wars, when enough people get onboard with the cheaper inferior 
 technology, eventually they start shoehorning on all the features and 
 functionality that you wanted from the other one in the first place. :)
 

That sounds a lot like the Worse is Better argument

The Rise of ``Worse is Better''
http://www.jwz.org/doc/worse-is-better.html

This quote would be quite applicable to Ethernet -

The lesson to be learned from this is that it is often undesirable to
go for the right thing first. It is better to get half of the right
thing available so that it spreads like a virus. Once people are hooked
on it, take the time to improve it to 90% of the right thing.

I think ethernet gaining OAM would be an example of improving to
90% of the right thing (15 or so years after being invented and
deployed), while those technologies that tried to be right from the
outset (token ring, ATM etc.) have disappeared or are disappearing.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Mans Nilsson
Subject: RE: RINA - scott whaps at the nanog hornets nest :-) Date: Sun, Nov 
07, 2010 at 12:34:56AM -0700 Quoting George Bonser (gbon...@seven.com):
 
 Yes, I really don't understand that either.  You would think that the
 investment in developing and deploying all that SONET infrastructure
 has been paid back by now and they can lower the prices dramatically.
 One would think the vendors would be practically giving it away,
 particularly if people understood the potential improvement in
 performance, though the difference between 1500 and 4000 is probably
 not all that much except on long distance (> 2000km) paths.

Even if larger MTUen are interesting (but most of the time not worth
the work) the sole reason I like SDH  as my WAN technology is the
presence of signalling -- so that both ends of a link are aware of its
status near-instantly (via protocol parts like RDI etc). In GE it is
legal to not receive any packets, which means that oblivious is a
possible state for such a connection. With associated routing
implications.

-- 
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE +46 705 989668
Is this the line for the latest whimsical YUGOSLAVIAN drama which also
makes you want to CRY and reconsider the VIETNAM WAR?


pgpKowh41ld3j.pgp
Description: PGP signature


Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Tony Finch
On Sun, 7 Nov 2010, William Herrin wrote:

  http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

 The last time this was discussed in the Routing Research Group, none
 of the proponents were able to adequately describe how to build a
 translation/forwarding table in the routers or whatever passes for
 routers in this design.

I note that he doesn't actually describe how to implement a large-scale
addressing and routing architecture. It's all handwaving.

And he seems to think that core routers can cope with per-flow state.

The only bits he's at all concrete about are the transport protocol, which
isn't really where the unsolved problems are.

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7,
DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR
ROUGH. RAIN THEN FAIR. GOOD.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Eugen Leitl
On Mon, Nov 08, 2010 at 03:56:17PM +, Tony Finch wrote:

 I note that he doesn't actually describe how to implement a large-scale
 addressing and routing architecture. It's all handwaving.

I'm probably vying for nanog-kook status as well, but in high-dimensional
spaces blocking is arbitrarily improbable. Think of higher-dimensional
analogs of 3d-Bresenham (which is local-knowledge only), then blow away
most of the links. It still works. You have to wire the network appropriately
to loosely follow geography (which is of course currently a show-stopper) 
and label the nodes appropriately -- as a bonus, you can derive node
ID by mutual iterative refinement, pretty much like relativistic time
of flight mutual triangulation.

Another issue is purely photonic cut-through at very high data rates:
there's not much time to make a routing decision even if your packet is
stuck in molasses as slow light or circulating in a fiber-loop FIFO. 
So not only are photonic gates expensive (and conversion to electronics
and back is right out), you also can't stack too many individual gate 
delays on top of each other.

Networks are still much too smart; what you need is the barest decoration
upon the raw physics of this universe.
 
 And he seems to think that core routers can cope with per-flow state.
 
 The only bits he's at all concrete about are the transport protocol, which
 isn't really where the unsolved problems are.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Jack Bates

On 11/8/2010 9:56 AM, Tony Finch wrote:


I note that he doesn't actually describe how to implement a large-scale
addressing and routing architecture. It's all handwaving.



That's an extremely hard problem to address. While there are many 
proposals, they usually do away with features which we utilize. I'm 
looking at a graph on the NOC screen right now which shows how grotesque 
natural load balancing can be between 3 AS interconnects. I have enough 
free overhead to allow this, but eventually I will have to start 
applying policies to balance better. This implies that I'll eventually 
have to advertise sub-aggregate v6 prefixes to balance as well (perhaps 
some /31 or /32 announcements overlaying the /27).


The problem with most of the other methods is that they ignore policies and 
the desired route to reach a network, and instead rely on any way to get 
there. But let's be honest: the current problems tend to be memory 
problems, not performance problems. It annoys me that vendors made this 
last increment at such a small scale, guaranteeing we'll be buying new 
hardware again soon.


Jack



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread George Bonser
 
 Even if larger MTUen are interesting (but most of the time not worth
 the work) the sole reason I like SDH  as my WAN technology is the
 presence of signalling -- so that both ends of a link are aware of its
 status near-instantly (via protocol parts like RDI etc). In GE it is
 legal to not receive any packets, which means that oblivious is a
 possible state for such a connection. With associated routing
 implications.

I wasn't talking about changing anything at any of the edges.  The idea was 
just to get the middle portion of the internet, the peering points, to a place 
that would support frames larger than 1500.  It is practically impossible for 
anyone to send such a packet off-net until that happens.

There was nothing that said everyone should change to a higher MTU.  I was 
saying that there are cases where it can be useful for certain types of 
transfers, but the state of today's internet is that you can't do it even if 
you want to, except by special arrangement.  Considering the state of today's 
hardware, there isn't a technical reason why those points can't be set to 
handle larger packets should one come along.  That's all.  I wasn't suggesting 
everyone set their home system for a larger MTU; I was suggesting that the 
peering points be able to handle them should one pass through.

Now I agree, on an existing exchange having a flag day for everyone to change 
might not be worthwhile but on a new exchange where you have a green field, 
there is no reason to limit the MTU at that point to 1500.  Having a larger MTU 
in the middle of the path does not introduce PMTUD issues. PMTUD issues are 
introduced by having a smaller MTU somewhere in the middle of the path.  The 
conversation was quickly dragged into areas other than what the suggestion was 
about.
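The claim that only a smaller MTU in the middle causes trouble can be stated in one line: the effective path MTU is the minimum over all links, so raising a middle link can never shrink it. A trivial sketch (hypothetical link lists, just to illustrate the point):

```python
def path_mtu(link_mtus):
    """Effective path MTU is the smallest link MTU along the path."""
    return min(link_mtus)

# Jumbo-capable middle with 1500-byte edges: still 1500, nothing breaks.
print(path_mtu([1500, 9000, 9000, 1500]))   # 1500

# A *smaller* middle link is what forces PMTUD (or fragmentation) to act.
print(path_mtu([1500, 1400, 1500]))         # 1400
```

Raising the peering-point MTU only changes the result for flows whose edges already exceed 1500; everyone else sees exactly the same minimum as before.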

What was interesting was the email I got from people who need to move a lot of 
science and engineering data on a daily basis who said their networking people 
didn't get it either and it is causing them problems.  Not everyone is going 
to need to use large frames.  But people who do need them can't use them and 
there really isn't a technical reason for that.  That specific portion of the 
Internet, the peering points between networks, carries traffic from all sorts 
of users, not just people at home with their twitter app open.  Enabling the 
passage of larger packets doesn't mean advocating that everyone use them or 
changing anyone's customer edge configuration.

It wouldn't change anyone's routing, wouldn't impact anyone's PMTUD problems.  
I don't believe that is kooky. A lot of other people have been calling for 
the same thing for quite some time. But making a network jumbo clean doesn't 
do a lot of good if the peering points are the bottleneck. That's all.  
Removing that bottleneck is all that the suggestion was about.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Mans Nilsson
Subject: RE: RINA - scott whaps at the nanog hornets nest :-) Date: Mon, Nov 
08, 2010 at 08:53:47AM -0800 Quoting George Bonser (gbon...@seven.com):
  
  Even if larger MTUen are interesting (but most of the time not worth
  the work) the sole reason I like SDH  as my WAN technology is the
  presence of signalling -- so that both ends of a link are aware of its
  status near-instantly (via protocol parts like RDI etc). In GE it is
  legal to not receive any packets, which means that oblivious is a
  possible state for such a connection. With associated routing
  implications.
 
 I wasn't talking about changing anything at any of the edges.  The idea was 
 just to get the middle portion of the internet, the peering points to a 
 place that would support frames larger than 1500.  It is practically 
 impossible for anyone to send such a packet off-net until that happens.

Know what? We have not one, but five or so Internet Exchange points in
Sweden, where there are 802.1q VLANS setup for higher MTU (4470 for
hysterical raisins) . My impression is that people use them, but I'm
also being informed by statistics that there is a _very_ steep drop in
packet count vs size once 1500 is reached. It is set up, but the edge is
where packets are made, not the core. Thus, no one can send large
packets. Anyway. 

I'd concur that links where routers exchange very large routing tables
benefit from PMTUD (most) and larger MTU (to some degree), but I'd
argue that most IXPen see few prefixes per peering, up to a few
thousand max. The large tables run via PNI and paid transit, as well as
iBGP. There, I've seen drastic improvements in convergence time once
PMTUD was introduced and arcane MSS defaults dealt with. MTU mattered
not much.

Given this empirical data, clearly pointing to the fact that It Does
Not Matter, I think we can stop this nonsense now.

-- 
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE +46 705 989668
I was making donuts and now I'm on a bus!


pgp3C76fClvkv.pgp
Description: PGP signature


Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Jack Bates

On 11/8/2010 12:36 PM, Mans Nilsson wrote:

I'd concur that links where routers exchange very large routing tables
benefit from PMTUD (most) and larger MTU (to some degree), but I'd
argue that most IXPen see few prefixes per peering, up to a few
thousand max. The large tables run via PNI and paid transit, as well as
iBGP. There, I've seen drastic improvements in convergence time once
PMTUD was introduced and arcane MSS defaults dealt with. MTU mattered
not much.

Given this empirical data, clearly pointing to the fact that It Does
Not Matter, I think we can stop this nonsense now.



His point wasn't to benefit the BGP routers at the IX, but to support 
those who need to transmit > 1500 byte packets and have the ability to 
create them on the edge. In particular, the impact of running long 
distances (high latency) with higher packet drop probability. In such a 
scenario, it does matter.


Even if you don't see that many > 1500 byte packets, that doesn't imply that 
it doesn't matter. I have v6 peerings and see very little traffic on 
them compared to v4. Should I then state that v6 doesn't matter? If 
people have an expectation of not making it through core networks at 
> 1500, they won't bother trying to send > 1500. If the IX doesn't 
support > 1500, why would people connecting to the IX care if their 
backbones support > 1500?



Jack



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Valdis . Kletnieks
On Mon, 08 Nov 2010 19:36:49 +0100, Mans Nilsson said:

 Given this empirical data, clearly pointing to the fact that It Does
 Not Matter, I think we can stop this nonsense now.

That's right up there with the sites that blackhole their abuse@
address, and then claim they never actually see any complaints.
Or forcing NAT at the edge, and saying "The fact we get no complaints
means It Does Not Matter", ignoring SCTP and similar use cases where
it *does* matter.

If in fact It Does Not Matter, why did the Internet2 folks make any
effort to support 9000-byte MTUs end-to-end?

http://proj.sunet.se/LSR2/index.html says they used an MTU of 4470.. and then
add and we used only about half the MTU size (which generates heavier CPU-load
on the end-hosts), which pretty much implies the previous record was at 9000 or
so.

So there's empirical data that It Does Indeed Matter (at least to some
people).  



pgp77N4B5YvB0.pgp
Description: PGP signature


Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Nick Hilliard
On 08/11/2010 21:51, valdis.kletni...@vt.edu wrote:
 So there's empirical data that It Does Indeed Matter (at least to some
 people).  

It certainly does.  However, there is lots more empirical data to suggest
that It Does Not Matter to most service providers.  We tried introducing it
at INEX several years ago.  Of 40-something connected parties, only one was
really interested enough to do something about it.  Another indicated
interest but then pulled back when they realised a) the amount of work it
would take to support it across their network, b) the scope for painful
breakage if they accidentally got something wrong somewhere, and c) how
little benefit they would get from it.

Probably the most interesting aspect was the cost / benefit analysis.  On
the one hand, there was little to no benefit for end-users and hosted
services on the commercial ISP that showed interest.  However, the NREN
which was interested could have made real use of it, in terms of handling
very high-speed single-stream data transfers.

Anyway, all of the arguments for it, both pro and con, have been rehashed
on this thread.  The bottom line is that for most companies, it simply
isn't worth the effort, but that for some NRENs, it is.

Let's move on now.

Nick



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Chris Adams
Once upon a time, valdis.kletni...@vt.edu valdis.kletni...@vt.edu said:
 That's right up there with the sites that blackhole their abuse@
 address, and then claim they never actually see any complaints.

What about telcos that disable error counters and then say we don't see
any errors?
-- 
Chris Adams cmad...@hiwaay.net
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Scott Weeks


Been unexpectedly gone for the weekend, apologies for the delay.  Wow, can 
subjects get hijacked quickly here.  I think it happened within one or two 
emails.  It was just for weekend fun anyway...


--- b...@herrin.us wrote:
From: William Herrin b...@herrin.us

 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.

: Anyone who has bridged an ethernet via a TCP based
: IPSec tunnel understands that layers are recursive.

WRT the paper, I'm having trouble correlating what you say with their notion 
of recursive-layer network communications.  It seems apples and oranges, but 
maybe I have Monday-itis.  It's only a little after noon here.




 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

: John Day has been chasing this notion long enough to write three
: network stacks. If it works and isn't obviously inferior in its
: operational resource consumption, where's the proof-of-concept code?

Not having read the following enough, and being in operations rather than in 
the research areas as much as others on this list, I don my flameproof 
underpants and post this:

pouzinsociety.org gives: 
-
The TSSG-developed CBA prototype, which consists of a fully functional 
componentised network stack and the ancillary supporting infrastructure, has 
been contributed to the Pouzin Society as the TINOS project.

TINOS will provide the underlying platform and execution environment upon which 
a RINA prototype can be developed.

The TSSG and i2CAT will be joining forces with the Pouzin Society to contribute 
to the development of a RINA prototype based on the TINOS platform.

The TINOS code is freely available under the LGPL license.
-


the CBA prototype link being: 
http://www.tssg.org/4WARD/2010/07/component_based_architecture_n.html

Seemingly unfortunate (to me) is: ...an open-source project to create a Java 
platform operating system.




: The last time this was discussed in the Routing Research Group, none
: of the proponents were able to adequately describe how to build a
: translation/forwarding table in the routers or whatever passes for
: routers in this design.

When I asked on RRG I was told by the chairs, privately, that no open-slate 
designs would be considered.  No RINA proponents are participating in the 
list, either.

WRT RRG I had assumed various proposals would be considered with equal respect 
and dignity, the basic components described, a 'winner' selected and then the 
engineering details designed.  Watching the list has been an experience in 
reality (it's not all peace, love and happiness out there :-) and I now more 
clearly understand the comments made by others on this list about the process.  
Since it wasn't allowed on RRG, I hoped to spur discussion here among those 
who spend more cycles in research, and to learn from that discussion.  It 
hasn't happened yet...  ;-)

scott

ps.  Thanks for the response.  I am really curious about the approach.  It 
would seem to weed out a lot of redundant things that various layers repeat.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Jack Bates

On 11/8/2010 4:08 PM, Nick Hilliard wrote:

Anyway, all of the arguments for it, both pro and con, have been rehashed
on this thread.  The bottom line is that for most companies, it simply
isn't worth the effort, but that for some NRENs, it is.



I think a lot of that is misinformation and confusion. A company looks 
at it and thinks of the issues of deploying it to end users, and misses the 
benefits of deploying it at the core, only handling special requests. 
This is especially true for hosting companies, where the majority of 
connections to servers need to stay at low MTU to keep things 
streamlined, but which for specific cases could increase the MTU for things 
such as cross-country backups. Many servers can handle these dual-MTU setups.


Larger MTU is beneficial when someone controls the 2 endpoints and has 
use for it. They can request the larger MTU connection from their 
providers/datacenters, but if the core systems aren't supporting it, 
they'll die miserably.



Jack



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Nathan Eisenberg
 Been unexpectedly gone for the weekend, apologies for the delay.  Wow,
 can subjects get hijacked quickly here.  I think it happened within one or two
 emails.  It was just for weekend fun anyway...

So... You tossed a cow into a pool (that you knew was) filled with piranhas, 
waited a few days, and now you want to know where the cow went?

-Nathan


Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Scott Weeks


--- d...@dotat.at wrote:
From: Tony Finch d...@dotat.at

: I note that he doesn't actually describe how to implement 
: a large-scale addressing and routing architecture. It's all 
: handwaving.

There is more discussed in the book.  The paper was written by another person 
and had to hit only the highlights, or it'd be too long for folks to want to 
read.  I'd imagine you can get a copy of the book in a university library.



:And he seems to think that core routers can cope with per-flow state.

Can you elaborate for me?



: The only bits he's at all concrete about are the transport
: protocol, which isn't really where the unsolved problems are.

It wasn't about just solving problems.  It seems to me to be about what you 
would do if you could design from a clean slate.  AFAICT the RRG folks are 
specifically focused on fixing problems: map-n-encap and tunneling being the 
most liked solutions.

One thing in common with other proposals on that list, though, that has me 
wondering is the use of a 'server' in the middle to keep track of everything.


scott



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Mans Nilsson
Subject: Re: RINA - scott whaps at the nanog hornets nest :-) Date: Mon, Nov 
08, 2010 at 10:08:53PM + Quoting Nick Hilliard (n...@foobar.org):
 On 08/11/2010 21:51, valdis.kletni...@vt.edu wrote:
  So there's empirical data that It Does Indeed Matter (at least to some
  people).  
 
 Anyway, all of the arguments for it, both pro and con, have been rehashed
 on this thread.  The bottom line is that for most companies, it simply
 isn't worth the effort, but that for some NRENs, it is.

And NREN-NREN traffic typically does not traverse commercial IXen.
(Even though ISTR Sunet and Nordunet having peerings configured on
Netnod.)  Instead, for empire-building reasons, or job security, or "No
research project is complete unless the professor gets a new laptop and
a wavelength to CERN (or in the USA, pick a DoE site) from the project
money", NRENs build their own...  I am convinced that some applications
actually benefit from this, though.
 
 Let's move on now.

Indeed. 

-- 
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE +46 705 989668
Yow!  I want my nose in lights!


pgpD2gcXRIWrL.pgp
Description: PGP signature


Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Scott Weeks


--- eu...@leitl.org wrote:
From: Eugen Leitl eu...@leitl.org

Networks are much too smart still, what you need is the barest decoration
upon the raw physics of this universe.
--

Yes, that's one thing I note.  The mapping server idea that several proposals 
use does not appear to keep the smartness at the edges; rather, it seems to 
try to make a smarter core network.

scott




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Tony Finch
On Mon, 8 Nov 2010, Scott Weeks wrote:
 From: Tony Finch d...@dotat.at

 : I note that he doesn't actually describe how to implement
 : a large-scale addressing and routing architecture. It's all
 : handwaving.

 There is more discussed in the book.

I have bought and read the book. It's an interesting and entertaining rant
about the protocol wars, but still far too vague about proposing solutions
for our current pain points. Argiung about TCP vs. Delta-T is a very long
way from the problems that need solving.

My comment stands.

 :And he seems to think that core routers can cope with per-flow state.

 Can you elaborate for me?

Perhaps I don't understand how connection-oriented networks work. How do
you reserve bandwidth for a connection with guaranteed quality of service
without establishing state on every router in the path? How do you do it
in a network that spans multiple organizations? What connection-oriented
inter-domain protocols have had widespread deployment?

 It wasn't about just solving problems.  It seems to me to be about if
 you could clean-slate design, what would you do?

If your lovely clean architecture can't solve problems why should anyone
pay attention to it? A clean slate architecture needs to synthesize what
we have learned from practical experience and add a dose of cleverness so
that problems can be solved much more easily.

A simple mostly-unrelated example: in the 1980s hypertext systems had
links that were bidirectional and they made an effort to keep them
consistently maintained. This made it impossible to have an inter-domain
hypertext system. The WWW discarded the requirement for consistent
bidirectional links, so it was not proper hypertext. Even so, because it
does not require co-operation between the ends of the link, it rapidly
outgrew any previous hypertext system.

The point of a clean slate design is to rethink the foundations of your
architecture, and get rid of constraints that set you up to fail.

Tony.
-- 
f.anthony.n.finch  d...@dotat.at  http://dotat.at/
HUMBER THAMES DOVER WIGHT PORTLAND: NORTH BACKING WEST OR NORTHWEST, 5 TO 7,
DECREASING 4 OR 5, OCCASIONALLY 6 LATER IN HUMBER AND THAMES. MODERATE OR
ROUGH. RAIN THEN FAIR. GOOD.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread Scott Weeks


--- d...@dotat.at wrote:
The point of a clean slate design is to rethink the foundations of your
architecture, and get rid of constraints that set you up to fail.
--


Yes, and I thought this idea could be the beginning of one way to do that and 
became interested in what others thought.  However, there aren't many 
avenues for soliciting competent responses on things like this.  Thanks for the 
responses.  

scott

ps. The NAT is your friend part is what I thought would whap at the nest for 
weekend fun...  :-)



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-08 Thread William Herrin
On Mon, Nov 8, 2010 at 6:02 PM, Scott Weeks sur...@mauigateway.com wrote:
 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.

 : Anyone who has bridged an ethernet via a TCP based
 : IPSec tunnel understands that layers are recursive.

 WRT the paper I'm having trouble correlating what you say with their
 notion of recursive layer network communications.
 It seems like apples and oranges

Hi Scott,

Having skimmed the article and some of its predecessors, I find it
hard to determine whether there's any correlation. REALLY hard.


 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

 : John Day has been chasing this notion long enough to write three
 : network stacks. If it works and isn't obviously inferior in its
 : operational resource consumption, where's the proof-of-concept code?

 TINOS will provide the underlying platform and execution
 environment upon which a RINA prototype can be developed.

will provide
can be developed


 the CBA prototype link being:
 http://www.tssg.org/4WARD/2010/07/component_based_architecture_n.html

Described in the videos as a clever modeling tool forked off of JNode
which had a plain old TCP/IP stack written in Java.

But what I'm still missing is use of that modeling system to
demonstrate any concepts in Day's plan.


On Mon, Nov 8, 2010 at 6:14 PM, Scott Weeks sur...@mauigateway.com wrote:
 From: Tony Finch d...@dotat.at
 : I note that he doesn't actually describe how to implement
 : a large-scale addressing and routing architecture. It's all
 : handwaving.

 There is more discussed in the book.

A colleague thoughtfully lent me a copy of the book. I found it more
incondite than recondite.

I'd like there to be some abstruse nugget of insight in there. I
really would. Maybe you can tell me the page number, 'cause I just
can't wade through the rest of it.

Regards,
Bill Herrin


-- 
William D. Herrin  her...@dirtside.com  b...@herrin.us
3005 Crane Dr. .. Web: http://bill.herrin.us/
Falls Church, VA 22042-3004



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Mikael Abrahamsson

On Sat, 6 Nov 2010, George Bonser wrote:

And by that I mean using 1500 MTU is what degrades the performance, not 
the ethernet physical transport.  Using MTU 9000 would give you better 
performance than SONET.  That is why Internet2 pushes so hard for people 
to use the largest possible MTU and the suggested MINIMUM is 9000.


I tried to get IEEE to go for a higher MTU on 100GE. Judging by the 
responses, this is never going to change.


Also, if we're going to go for bigger MTUs, going from 1500 to 9000 is 
basically worthless, if we really want to do something, we should go for 
64k or even bigger.


About 1500 MTU degrading performance, that's a TCP implementation issue, 
not really a network issue. Interrupt performance in end systems for 
high-speed transfers isn't really a general problem, and not until you 
reach speeds of several gigabit/s. Routers handle PPS just fine, this was 
solved long ago after we stopped using regular CPUs in them.


Increasing MTU on the Internet is not something driven by the end-users, 
so it's not going to happen in the near future. They are just fine with 
1500 MTU. Higher MTU is a nice to have, not something that is seriously 
hindering performance on the Internet as it is today or in the next few 
tens of years.


--
Mikael Abrahamsson        email: swm...@swm.pp.se



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Mans Nilsson
Subject: RE: RINA - scott whaps at the nanog hornets nest :-) Date: Sat, Nov 
06, 2010 at 08:38:33PM -0700 Quoting George Bonser (gbon...@seven.com):

 No wonder there is still so much transport
 using SONET.  Using Ethernet reduces your effective performance over
 long distance paths.

The only reason to use (10)GE for transmission in WAN is the completely
baroque price difference in interface pricing. With today's line rates,
the components and complexity of a line card are pretty much equal
between SDH and GE. There is no reason to overcharge for the better
interface except because they (all vendors do this) can.

We've just ordered a new WAN to be built, and we're going with GE over
(mostly) WDM because the interface prices are like six times higher per
megabit for SDH. (which would have cost roughly equally per line given
that it is quite OK to run SDH without the SDH equipment, just using
WDM)

Oh, s/SDH/SONET/ on above, but I'm in Europe, so..
-- 
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE +46 705 989668
If I am elected, the concrete barriers around the WHITE HOUSE will be
replaced by tasteful foam replicas of ANN MARGARET!




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
 Also, if we're going to go for bigger MTUs, going from 1500 to 9000 is
 basically worthless, if we really want to do something, we should go
 for
 64k or even bigger.

I agree but we need to work with what we have.  Practically everything
currently appearing at a peering point will support 9000.  Getting
equipment that would support 64000 would be more difficult.

 
 About 1500 MTU degrading performance, that's a TCP implementation
 issue,
 not really a network issue. 

True, but TCP is what we are stuck with for right now.  Different
protocols could be developed to handle the small packets better.

 Interrupt performance in end systems for
 high-speed transfers isn't really a general problem, and not until you
 reach speeds of several gigabit/s. Routers handle PPS just fine, this
 was
 solved long ago after we stopped using regular CPUs in them.

We are starting to move to 10Gig+ peering connections.  I have two 10G
peering ports currently on order.  Several gigabits/sec is here today.

 
 Increasing MTU on the Internet is not something driven by the end-
 users,
 so it's not going to happen in the near future. 

It depends on what those end users are doing.  If they are loading a web
page, you are probably correct.  If they are an enterprise user
transferring log files from Hong Kong to New York, it makes a huge
difference, particularly the moment a packet gets lost somewhere.  At
some point it becomes faster to put data on disks and fly them across
the ocean than to try to transmit it by TCP with 1500 byte MTU.  Trying
to explain to someone that they are not going to get any better
performance on that daily data transfer from the far East by upgrading
from 100 Mb to GigE is hard for them to understand as it is a bit
counter-intuitive.  They believe 1G is faster than 100Meg, when it
isn't.  If you tell them they could get a faster file transfer rate by
using an OC-3 with MTU 4000 than they would get by upgrading from
100Mb ethernet to GigE with MTU 1500, they just don't get it.  In fact,
telling them that they won't get one iota of improvement going from
100Mb to GigE doesn't make sense to them because they believe GigE is
faster.  It isn't faster, it is fatter.   And yes, TCP is the limiting
factor but we are stuck with it for now.  I can't change what protocol
is being used but I can change the MTU of the existing protocol.  I
believe the demand for such high-bandwidth streams is going to greatly
increase.  This is particularly true as people move out of academic
environments where they are used to working on Abilene (Inet2) and move
into industry and the programs they built won't work because it takes
two days to send one day's worth of data. 

There are end users and there are end users.  It depends on the sort of
end user you are talking about and what they are doing.  If they are
watching TV, they might want a higher MTU.  If they are on Twitter, they
don't care.  Industry end users will have different requirements from
residential end users.


 They are just fine with
 1500 MTU. Higher MTU is a nice to have, not something that is
seriously
 hindering performance on the Internet as it is today or in the next
few
 tens of years.

I disagree with that statement because I believe that the next few years
will see an increased demand for high-bandwidth traffic that needs to be
delivered quickly (HDTV from Tokyo to London, for example). 

One of the reasons people aren't interested is because they don't know.
They are ignorant in most cases.  They just know bandwidth.  They
believe that if they get a fatter pipe, it will improve their viewing of
that Australian porn.  Then they pay for the upgrade and it doesn't
change a thing.  It doesn't change the data transfer rate at all.  They
go from a 10Meg to a 100Meg pipe and that file *still* transfers at
3Meg/sec.  If they could increase the MTU to 9000, they might get
15Meg/sec.

But you are correct, going to an even higher MTU is what is really
needed but going for what is attainable is the first step.  Everyone can
physically do 9000 at the peering points (or at least as far as I am
aware they can) and the only thing that is preventing that is just not
wanting to because they don't fully appreciate the benefit and believe
it might break something.  Increasing MTU never breaks PMTUD.  PMTUD is
only needed because something in the path has a *smaller* MTU than the
end points.  The end points don't care if the path in between has a
larger MTU.
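The claim that a fatter pipe doesn't speed up the transfer, while a larger
MTU does, can be illustrated with the Mathis et al. steady-state TCP model
(rate ~ MSS/RTT * 1/sqrt(p)); the RTT and loss figures below are
hypothetical, chosen to resemble a long trans-Pacific path:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate steady-state TCP throughput per the Mathis model:
    rate ~ (MSS * 8 / RTT) * (1 / sqrt(p)).  A rough upper bound only."""
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)

# Hypothetical long-haul path: 200 ms RTT, 1-in-100,000 packet loss
for mtu in (1500, 9000):
    mss = mtu - 40  # subtract IPv4 + TCP headers, no options
    mbps = mathis_throughput_bps(mss, 0.200, 1e-5) / 1e6
    print(f"MTU {mtu}: ~{mbps:.0f} Mbit/s")
```

Under this model throughput scales linearly with MSS and not at all with
link speed, which is the mechanism behind the "1G is fatter, not faster"
point above.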






RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
 The only reason to use (10)GE for transmission in WAN is the completely
 baroque price difference in interface pricing. With todays line rates,
 the components and complexity of a line card are pretty much equal
 between SDH and GE. There is no reason to overcharge for the better
 interface except because they (all vendors do this) can.

Yes, I really don't understand that either.  You would think that the 
investment in developing and deploying all that SONET infrastructure has been 
paid back by now and they can lower the prices dramatically.  One would think 
the vendors would be practically giving it away, particularly if people 
understood the potential improvement in performance, though the difference 
between 1500 and 4000 is probably not all that much except on long-distance 
(>2000 km) paths.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Brielle Bruns
So, question I don't want to forget between now and when I wake up (since its 
late in my neck of the woods)...

Has any work been done with >1500 MTU on 802.11 links?

Is it feasible, or even possible?

I'm in the middle of rolling out a wisp in an area, and it dawned on me I never 
even considered this aspect of the mtu issue.


-- 
Brielle Bruns
http://www.sosdg.org  /  http://www.ahbl.org


RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Mikael Abrahamsson

On Sun, 7 Nov 2010, George Bonser wrote:

True, but TCP is what we are stuck with for right now.  Different 
protocols could be developed to handle the small packets better.


We're not stuck with TCP; TCP is being developed all the time.

http://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm

We are starting to move to 10Gig + peering connections.  I have two 10G 
peering ports currently on order.  several gigabits/sec is here today.


I was talking about end users, not network.


It depends on what those end users are doing.  If they are loading a web
page, you are probably correct.  If they are an enterprise user
transferring log files from Hong Kong to New York, it makes a huge
difference, particularly the moment a packet gets lost somewhere.  At
some point it becomes faster to put data on disks and fly them across
the ocean than to try to transmit it by TCP with 1500 byte MTU.  Trying


Oh, come on. Get real. The world TCP speed record is 10GE right now, it'll 
go higher as soon as there are higher interface speeds to be had.


I can easily get 100 megabit/s long-distance between two linux boxes 
without tweaking the settings much.



I disagree with that statement because I believe that the next few years
will see an increased demand for high-bandwidth traffic that needs to be
delivered quickly (HDTV from Tokyo to London, for example).


MTU and quickly have very little to do with each other.

3Meg/sec.  If they could increase the MTU to 9000, they might get 
15Meg/sec.


Or they might tweak some other TCP settings and get 30 meg/s with existing 
1500 MTU. It's WAY easier to tweak existing TCP than trying to get the 
whole network to go to a higher MTU. We do 4470 internally and on peering 
links where the other end agrees, but getting it to work all the way to 
the end customer isn't really easy.


As with IPv6, doing the core is easy, doing the access is much harder.

it might break something.  Increasing MTU never breaks PMTUD.  PMTUD is 
only needed because something in the path has a *smaller* MTU than the 
end points.  The end points don't care if the path in between has a 
larger MTU.


But in a transition some end systems will have 9000 MTU and some parts of 
the network will have smaller, so then you get problems.
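Both sides of this exchange reduce to the same invariant: the effective MTU
of a path is the minimum link MTU along it. A trivial sketch (the hop
values are illustrative only):

```python
def path_mtu(link_mtus):
    """The effective MTU of a path is the smallest link MTU along it."""
    return min(link_mtus)

# Raising a core link above the endpoints' MTU changes nothing:
print(path_mtu([1500, 9000, 1500]))   # still 1500
# ...but during a transition, one small hop caps 9000-capable endpoints,
# which is exactly where PMTUD has to work:
print(path_mtu([9000, 4470, 9000]))   # 4470
```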


--
Mikael Abrahamsson        email: swm...@swm.pp.se



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Richard A Steenbergen
On Sun, Nov 07, 2010 at 08:02:28AM +0100, Mans Nilsson wrote:
 
 The only reason to use (10)GE for transmission in WAN is the 
 completely baroque price difference in interface pricing. With todays 
 line rates, the components and complexity of a line card are pretty 
 much equal between SDH and GE. There is no reason to overcharge for 
 the better interface except because they (all vendors do this) can.

To be fair, there are SOME legitimate reasons for a cost difference. For 
example, ethernet has very high overhead on small packets and tops out 
at 14.8Mpps over 10GE, whereas SONET can do 7 bytes of overhead for your 
PPP/HDLC and FCS etc and easily end up doing well over 40Mpps of IP 
packets. The cost of the lookup ASIC that only has to support the 
Ethernet link is going to be a lot cheaper, or let you handle a lot more 
links on the same chip.
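The 14.8 Mpps figure falls straight out of the framing arithmetic, since
Ethernet adds roughly 20 bytes of preamble plus inter-frame gap per frame;
a quick check:

```python
def max_frames_per_sec(link_bps, frame_bytes, overhead_bytes=20):
    """Line-rate frame cap: preamble + inter-frame gap add ~20 bytes/frame."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

print(round(max_frames_per_sec(10e9, 64) / 1e6, 2), "Mpps at 64-byte frames")
print(round(max_frames_per_sec(10e9, 1500) / 1e6, 2), "Mpps at 1500-byte frames")
```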

At this point it's only half price gouging of the silly telco customers 
with money to blow. There really are significant cost savings for the 
vendors in using the more popular and commoditized technology, even 
though it may be technically inferior. Think of it like the old IDE vs 
SCSI wars, when enough people get onboard with the cheaper interior 
technology, eventually they start shoehorning on all the features and 
functionality that you wanted from the other one in the first place. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Richard A Steenbergen
On Sun, Nov 07, 2010 at 12:34:56AM -0700, George Bonser wrote:
 
 Yes, I really don't understand that either.  You would think that the 
 investment in developing and deploying all that SONET infrastructure 
 has been paid back by now and they can lower the prices dramatically.  
 One would think the vendors would be practically giving it away, 
 particularly if people understood the potential improvement in 
 performance, though the difference between 1500 and 4000 is probably 
 not all that much except on long-distance (>2000 km) paths.

Careful, you're rapidly working your way up to nanog kook status with 
these absurd claims based on no logic whatsoever.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser


 
 Oh, come on. Get real. The world TCP speed record is 10GE right now,
 it'll
 go higher as soon as there are higher interface speeds to be had.

You can buy 100G right now.  I also believe there are some 40G
available, too.

Also, check this:

http://media.caltech.edu/press_releases/13216

That was in 2008.  

 
 I can easily get 100 megabit/s long-distance between two linux boxes
 without tweaking the settings much.

Until you drop a packet.  I can get 100 Megabits/sec with UDP without
tweaking it at all.  Getting 100Meg/sec San Francisco to London is a
challenge over a typical Internet path (i.e. not a dedicated leased
path).

 Or they might tweak some other TCP settings and get 30 meg/s with
 existing
 1500 MTU. It's WAY easier to tweak existing TCP than trying to get the
 whole network to go to a higher MTU. We do 4470 internally and on
 peering
 links where the other end agrees, but getting it to work all the way
to
 the end customer isn't really easy.

I guess you didn't read the links earlier.  It has nothing to do with
stack tweaks.  The moment you lose a single packet, you are toast.  And
there is a limit to how much you can buffer because at some point it
becomes difficult to locate a packet to resend.  *If* you have a perfect
path, sure, but that is generally not available, particularly to APAC.

 But in a transition some end systems will have 9000 MTU and some parts
 of
 the network will have smaller, so then you get problems.

Which is no different than end systems that have 9000 today.  A lot of
networks run jumbo frames internally now. Maybe a lot more than you
realize.  When you are using NFS and iSCSI and other things like
database queries that return large output, large MTUs save you a lot of
packets. NFS reads in 8K chunks, which easily fit in a 9000-byte
packet.  It is more common in enterprise and academic networks than you
might be aware.
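The NFS point is simple ceiling division (assuming 40 bytes of TCP/IPv4
header and no options):

```python
def frames_needed(payload_bytes, mtu, headers=40):
    """Packets required to carry one payload, given MSS = MTU - headers."""
    mss = mtu - headers
    return -(-payload_bytes // mss)  # ceiling division

print(frames_needed(8192, 1500))  # packets for one 8K NFS read at MTU 1500
print(frames_needed(8192, 9000))  # the same read in jumbo frames
```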




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Mikael Abrahamsson

On Sun, 7 Nov 2010, George Bonser wrote:


I guess you didn't read the links earlier.  It has nothing to do with
stack tweaks.  The moment you lose a single packet, you are toast.  And


TCP SACK.

I'm too tired to correct your other statements that lack basis in reality 
(or at least in my reality).


--
Mikael Abrahamsson        email: swm...@swm.pp.se



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
  Yes, I really don't understand that either.  You would think that
the
  investment in developing and deploying all that SONET infrastructure
  has been paid back by now and they can lower the prices
dramatically.
  One would think the vendors would be practically giving it away,
  particularly if people understood the potential improvement in
  performance, though the difference between 1500 and 4000 is probably
  not all that much except on long distance ( 2000km ) paths.
 
 Careful, you're rapidly working your way up to nanog kook status with
 these absurd claims based on no logic whatsoever.

My apologies.  It just seemed to me that the investment in SONET,
particularly the lower data rates, should be pretty much paid back by
now.  How long has OC-12 been around?  I can understand a certain amount
of premium for something that doesn't sell as much but the difference in
prices can be quite amazing in some markets. Some differential might be
justified but why so much?

An OC-12 SFP optic costs nearly $3,000 from one vendor, list.  Their
list price for a GigE SFP optical module is about 30% of that.  What is
it about the optic module that would cause it to be 3 times as expensive
for an interface with half the bandwidth?  A 4-port OC-12 module is
$37,500 list.  A 4-port 10G module is $10,000 less for 10x the bandwidth.

In other words, what is the differential in the manufacturing costs of
those?  I don't believe it is as much as the differential in the selling
price.





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
 On Sun, 7 Nov 2010, George Bonser wrote:
 
  I guess you didn't read the links earlier.  It has nothing to do
with
  stack tweaks.  The moment you lose a single packet, you are toast.
 And
 
 TCP SACK.
 
 I'm too tired to correct your other statements that lack basis in
 reality
 (or at least in my reality).

But the point being, why should everyone have to be forced into making
multiple tweaks to their stacks to accommodate a worst case when a
single change (and possibly a change in total buffer size) is all that
is needed to get improved performance globally?  With modern PMTUD,
which is nearly globally supported at this point, it just isn't as big
an issue as it was, say, 5 years ago.

It isn't that big of an issue but it does seem to be a very inexpensive
change that offers a large benefit.

It will happen on its own as more and more networks configure internally
for larger frames and as more people migrate out of academia where 9000
is the norm these days into industry.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Will Hargrave

On 6 Nov 2010, at 20:29, Matthew Petach wrote:

 There is no reason why we are still using 1500 byte MTUs at exchange points.
 Completely agree with you on that point.  I'd love to see Equinix, AMSIX, 
 LINX,
 DECIX, and the rest of the large exchange points put out statements indicating
 their ability to transparently support jumbo frames through their fabrics, or 
 at 
 least indicate a roadmap and a timeline to when they think they'll be able to
 support jumbo frames throughout the switch fabrics.


At LONAP we've been able to support jumbo frames (at 9000+ depending on how you 
count it) for some years. We have been running large MTU p2p vlans for members 
for some time - L2TP handoff and so on. What we don't do is support >1500 byte 
MTU on the shared peering vlan, and I don't see this changing anytime soon. 
There isn't demand; multiple vlans split your critical mass even if you are 
able to decide on a lowest common denominator above 1500.

I imagine the situation is similar for other exchanges (apart from Netnod as 
already mentioned).

I won't bother to further reiterate the contents of 
20101106203616.gh1...@gerbil.cluepon.net; others can just read Ras's post for 
a concise description. :-)

-- 
Will Hargrave
Technical Director
LONAP Ltd






RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
  I guess you didn't read the links earlier.  It has nothing to do
with
  stack tweaks.  The moment you lose a single packet, you are toast.
 And
 
 TCP SACK.


Certainly helps but still has limitations.  If you have too many packets
in flight, it can take too long to locate the SACKed packet in some
implementations, which can cause a TCP timeout and a reset of the window
to 1.  It varies from one implementation to another.  The above was for
some implementations of Linux.  The larger the window (high speed, high
latency paths) the worse this problem is.  In other words, sure, you can
get great performance but when you hit a lost packet, depending on which
packet is lost, you can also take a huge performance hit depending on
who is doing the talking or what they are talking to.

Common advice on stack tuning: for very large BDP paths where the TCP
window is > 20 MB, you are likely to hit the Linux SACK implementation
problem. If Linux has too many packets in flight when it gets a SACK
event, it takes too long to locate the SACKed packet, and you get a TCP
timeout and CWND goes back to 1 packet. Restricting the TCP buffer size
to about 12 MB seems to avoid this problem, but clearly limits your
total throughput. Another solution is to disable SACK.  Even if you
don't have such a system, you might be talking to one.  
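To put numbers on "too many packets in flight": the bandwidth-delay
product tells you how many segments the SACK code has to walk. A
back-of-envelope sketch (the link speed and RTT are hypothetical):

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_s / 8

# Hypothetical 1 Gbit/s path at 200 ms RTT
bdp = bdp_bytes(1e9, 0.200)    # 25 MB -- already past the ~20 MB danger zone
print(int(bdp / 1460), "segments in flight at MSS 1460 (1500 MTU)")
print(int(bdp / 8960), "segments in flight at MSS 8960 (9000 MTU)")
```

Each SACK event may walk a retransmit queue this long, so shrinking the
segment count (larger MTU) or capping the buffer (the 12 MB workaround)
both cut the work per event.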

But anyway, I still think 1500 is a really dumb MTU value for modern
interfaces and unnecessarily retards performance over long distances.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Will Hargrave

On 7 Nov 2010, at 08:24, George Bonser wrote:

 It will happen on its own as more and more networks configure internally
 for larger frames and as more people migrate out of academia where 9000
 is the norm these days into industry.

I used to run a large academic network; there was a vanishingly small incidence 
of edge ports supporting >1500 byte MTU. It's possibly even more tricky than the 
IX situation to support in an environment where you commonly have mixed devices 
at different speeds (most 100mbit devices will not support >1500) on a single 
L2, often under different administrative control.


RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread George Bonser
 
 I used to run a large academic network; there was a vanishingly small
 incidence of edge ports supporting >1500 byte MTU. It's possibly even
 more tricky than the IX situation to support in an environment where
 you commonly have mixed devices at different speeds (most 100mbit
 devices will not support >1500) on a single L2, often under different
 administrative control.

At the edge, sure.  There are all sorts of problems there.  The major
two being 1: much of America still uses dialup or some form of PPPoE that
has 1500 MTU anyway.  The problem at the other end is that the large
content providers are generally behind load balancers that often don't
support jumbo frames.  So if you are talking to an ISP that serves
residential customers or eyeballs that are viewing content from the
major portals, it makes no sense.  But if you are talking about data
between corporate data centers or from one company for another where
they are ethernet end to end, the picture changes.  Dykstra's note of
that study in 1998 showed that while the majority of the *packets* were
<1500, the majority of the *data bytes* were in packets >=1500.

So fewer than 50% of the packets were moving more than 50% of the
data.

But the networks that are now running >1500 internally can't talk to
each other with those packet sizes across the general Internet until the
longer haul path supports it and again you are talking about a small
number of end points sending large amounts of data.  It will work itself
out but it will probably be consumer demand for higher performance data
streams that finally does it.  (only awake to make sure nothing goes
bonkers during the time change).





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Jeff Kell
On 11/7/2010 3:45 AM, Will Hargrave wrote:
 I used to run a large academic network; there was a vanishingly small 
 incidence of edge ports supporting >1500-byte MTU. 

I run a moderately sized academic network, and know some details of our
other campus infrastructure (some larger, some smaller).  We have two
chassis that could do L3 >1500, perhaps 10 with some upgrades.  And
perhaps a quarter of our switches could do L2 >1500 (we have a lot of
older cheap gear at the access layer).

The only demand for >1500 is iSCSI or FCoE; I can see a need for
backup traffic off the server farms >1500.  We have >1500 enabled in
those areas but it's rather localized and not on the consumer side of
the network.  There's also the computing clusters in the mix, but again
that is localized.

There are enough headaches getting the marginal >1500s over the various
encapsulations, tagging, tunneling, VPNs, etc.

I would have to agree on the small edge population even capable of >1500.

Jeff



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread William Herrin
On Fri, Nov 5, 2010 at 6:32 PM, Scott Weeks sur...@mauigateway.com wrote:
 It's really quiet in here.  So, for some Friday fun let
 me whap at the hornets nest and see what happens...  ;-)

 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.

Hi Scott,

Anyone who has bridged an ethernet via a TCP based IPSec tunnel
understands that layers are recursive.


 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

John Day has been chasing this notion long enough to write three
network stacks. If it works and isn't obviously inferior in its
operational resource consumption, where's the proof-of-concept code?

The last time this was discussed in the Routing Research Group, none
of the proponents were able to adequately describe how to build a
translation/forwarding table in the routers or whatever passes for
routers in this design.

Regards,
Bill Herrin


-- 
William D. Herrin  her...@dirtside.com  b...@herrin.us
3005 Crane Dr. .. Web: http://bill.herrin.us/
Falls Church, VA 22042-3004



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-07 Thread Scott Brim
On 11/08/2010 07:57 GMT+08:00, William Herrin wrote:
 On Fri, Nov 5, 2010 at 6:32 PM, Scott Weeks sur...@mauigateway.com wrote:
 It's really quiet in here.  So, for some Friday fun let
 me whap at the hornets nest and see what happens...  ;-)

 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.
 
 Hi Scott,
 
 Anyone who has bridged an ethernet via a TCP based IPSec tunnel
 understands that layers are recursive.

See also G.805 et seq.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Mans Nilsson
Subject: RINA - scott whaps at the nanog hornets nest :-) Date: Fri, Nov 05, 
2010 at 03:32:30PM -0700 Quoting Scott Weeks (sur...@mauigateway.com):
 
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)
 
 
 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

This tired bumblebee concludes that another instance of "Two bypassed
computer scientists who are angry that ISO OSI didn't catch on gripe
about this, and call IP, esp. IPv6, names in an effort to taint it" isn't
enough to warrant anything but a yawn.

More troubling might be http://www.iec62379.org/ and what they (I think
they are ATM advocates of the most bellheaded form) are trying to push
into an ISO standard.  Including gems like "Research during the decade
leading up to 2010 shows that the connectionless packet switching
paradigm that is inherent in Internet Protocol is unsuitable for an
increasing proportion of the traffic on the Internet."  Sic!

Now that is something to bite into. 
-- 
Måns Nilsson primary/secondary/besserwisser/machina
MN-1334-RIPE +46 705 989668
Do I have a lifestyle yet?




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Mark Smith
On Fri, 5 Nov 2010 21:40:30 -0400
Marshall Eubanks t...@americafree.tv wrote:

 
 On Nov 5, 2010, at 7:26 PM, Mark Smith wrote:
 
  On Fri, 5 Nov 2010 15:32:30 -0700
  Scott Weeks sur...@mauigateway.com wrote:
  
  
  
  It's really quiet in here.  So, for some Friday fun let me whap at the 
  hornets nest and see what happens...  ;-)
  
  
  http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
  
  
  Whoever wrote that doesn't know what they're talking about. LISP is
  not the IETF's proposed solution (the IETF don't have one, the IRTF do),
 
 Um, I would not agree. The IRTF RRG considered and is documenting a lot of 
 things, but did not
 come to any consensus as to which one should be a proposed solution.
 

I probably got a bit keen, I've been reading through the IRTF RRG
Recommendation for a Routing Architecture draft which, IIRC, makes a
recommendation to pursue Identifier/Locator Network Protocol rather
than LISP.

Regards,
Mark.


 Regards
 Marshall
 
 
  and streaming media was seen to be one of the early applications of the
  Internet - these types of applications are why TCP was split out of
  IP, why UDP was invented, and why UDP has a significantly
  different protocol number to TCP.
  
  --
  NAT is your friend
  
  IP doesn’t handle addressing or multi-homing well at all
  
  The IETF’s proposed solution to the multihoming problem is 
  called LISP, for Locator/Identifier Separation Protocol. This
  is already running into scaling problems, and even when it works,
  it has a failover time on the order of thirty seconds.
  
  TCP and IP were split the wrong way
  
  IP lacks an addressing architecture
  
  Packet switching was designed to complement, not replace, the telephone 
  network. IP was not optimized to support streaming media, such as voice, 
  audio broadcasting, and video; it was designed to not be the telephone 
  network.
  --
  
  
  And so, ...the first principle of our proposed new network architecture: 
  Layers are recursive.
  
  I can hear the angry hornets buzzing already.  :-)
  
  scott
  
  
 



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/5/2010 5:32 PM, Scott Weeks wrote:


It's really quiet in here.  So, for some Friday fun let me whap at the hornets 
nest and see what happens...;-)


http://www.ionary.com/PSOC-MovingBeyondTCP.pdf



SCTP is a great protocol. It has already been implemented in a number of 
stacks. With these benefits over that theory, it still hasn't become 
mainstream yet. People are against change. They don't want to leave v4. 
They don't want to leave tcp/udp. Technology advances, but people will 
only change when they have to.



Jack (lost brain cells actually reading that pdf)



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 Sent: Saturday, November 06, 2010 9:45 AM
 To: nanog@nanog.org
 Subject: Re: RINA - scott whaps at the nanog hornets nest :-)
 
 On 11/5/2010 5:32 PM, Scott Weeks wrote:
 
  It's really quiet in here.  So, for some Friday fun let me whap at
 the hornets nest and see what happens...;-)
 
 
  http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
 
 
 SCTP is a great protocol. It has already been implemented in a number
 of
 stacks. With these benefits over that theory, it still hasn't become
 mainstream yet. People are against change. They don't want to leave v4.
 They don't want to leave tcp/udp. Technology advances, but people will
 only change when they have to.
 
 
 Jack (lost brain cells actually reading that pdf)

I believe SCTP will become more widely used in the mobile device world.  You 
can have several different streams so you can still get an IM, for example, 
while you are streaming a movie.  Eliminating the head of line blockage on 
thin connections is really valuable. 

It would be particularly useful where you have different types of traffic from 
a single destination.  File transfer, for example, might be a good application 
where one might wish to issue interactive commands to move around the directory 
structure while a large file transfer is taking place.

If you really want to shake a hornet's nest, try getting people to get rid of 
this idiotic 1500 byte MTU in the middle of the internet and try to get 
everyone to adopt 9000 byte frames as the standard.  That change right there 
would provide a huge performance increase, load reduction on networks and 
servers, and with a greater number of native ethernet end to end connections, 
there is no reason to use 1500 byte MTUs.  This is particularly true with 
modern PMTU methods (such as with modern Linux kernels ... 
/proc/sys/net/ipv4/tcp_mtu_probing set to either 1 or 2).

While the end points should just be what they are, there is no reason for the 
middle portion, the long haul transport part, to be MTU 1500.

http://staff.psc.edu/mathis/MTU/




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Michael Hallgren
Le samedi 06 novembre 2010 à 12:15 -0700, George Bonser a écrit :
  Sent: Saturday, November 06, 2010 9:45 AM
  To: nanog@nanog.org
  Subject: Re: RINA - scott whaps at the nanog hornets nest :-)
  
  On 11/5/2010 5:32 PM, Scott Weeks wrote:
  
   It's really quiet in here.  So, for some Friday fun let me whap at
  the hornets nest and see what happens...;-)
  
  
   http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
  
  
  SCTP is a great protocol. It has already been implemented in a number
  of
  stacks. With these benefits over that theory, it still hasn't become
  mainstream yet. People are against change. They don't want to leave v4.
  They don't want to leave tcp/udp. Technology advances, but people will
  only change when they have to.
  
  
  Jack (lost brain cells actually reading that pdf)
 
 I believe SCTP will become more widely used in the mobile device world.  You 
 can have several different streams so you can still get an IM, for example, 
 while you are streaming a movie.  Eliminating the head of line blockage on 
 thin connections is really valuable. 
 
 It would be particularly useful where you have different types of traffic 
 from a single destination.  File transfer, for example, might be a good 
 application where one might wish to issue interactive commands to move around 
 the directory structure while a large file transfer is taking place.
 
 If you really want to shake a hornet's nest, try getting people to get rid of 
 this idiotic 1500 byte MTU in the middle of the internet


I doubt that >1500 is (still) widely used in our Internet... Might be,
though, that most of us don't go all the way to 9k.

mh


  and try to get everyone to adopt 9000 byte frames as the standard.  That 
 change right there would provide a huge performance increase, load reduction 
 on networks and servers, and with a greater number of native ethernet end to 
 end connections, there is no reason to use 1500 byte MTUs.  This is 
 particularly true with modern PMTU methods (such as with modern Linux kernels 
 ... /proc/sys/net/ipv4/tcp_mtu_probing set to either 1 or 2).
 
 While the end points should just be what they are, there is no reason for the 
 middle portion, the long haul transport part, to be MTU 1500.
 
 http://staff.psc.edu/mathis/MTU/
 
 





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 I doubt that >1500 is (still) widely used in our Internet... Might be,
 though, that most of us don't go all the way to 9k.
 
 mh

Last week I asked the operator of fairly major public peering points if they 
supported anything larger than 1500 MTU.  The answer was no. 




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Matthew Petach
On Sat, Nov 6, 2010 at 12:32 PM, George Bonser gbon...@seven.com wrote:
  I doubt that >1500 is (still) widely used in our Internet... Might be,
 though, that most of us don't go all the way to 9k.

 mh

 Last week I asked the operator of fairly major public peering points if they 
 supported anything larger than 1500 MTU.  The answer was no.


There's still a metric buttload of SONET interfaces in the core that
won't go above 4470.

So, you might conceivably get 4k MTU at some point in the future, but
it's really, *really* unlikely you'll get to 9k MTU any time in the next
decade.

Matt



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 There's still a metric buttload of SONET interfaces in the core that
 won't go above 4470.
 
 So, you might conceivably get 4k MTU at some point in the future, but
 it's really, *really* unlikely you'll get to 9k MTU any time in the
 next
 decade.
 
 Matt

Agreed.  But even 4470 is better than 1500.  1500 was fine for 10G
ethernet, it is actually pretty silly for GigE and better.

This survey that Dykstra did back in 1999 points out exactly what you
mentioned:

http://sd.wareonearth.com/~phil/jumbo.html

And that was over a decade ago.

There is no reason, in my opinion, for the various peering points to be
a 1500 byte bottleneck in a path that might otherwise be larger.
Increasing that from 1500 to even 3000 or 4500 gives a measurable
performance boost over high latency connections such as from Europe to
APAC or Western US.  This is not to mention a reduction in the number of
ACK packets flying back and forth across the Internet and a general
reduction in the number of packets that must be processed for a given
transaction.
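
The arithmetic behind that claim is easy to sketch. A back-of-the-envelope
estimate (my own illustration, not from the thread; it assumes IPv4 + TCP
headers with no options and ignores Ethernet framing and ACK traffic):

```python
# Packets and L3/L4 header overhead needed to move 1 GB of TCP payload
# at various MTUs.  IPv4 (20 bytes) + TCP (20 bytes) = 40 bytes of
# header per packet; Ethernet framing is ignored for simplicity.
DATA = 10**9   # bytes of payload to move
HEADERS = 40   # IPv4 + TCP, no options

for mtu in (1500, 3000, 4470, 9000):
    mss = mtu - HEADERS
    packets = -(-DATA // mss)  # ceiling division
    overhead = packets * HEADERS / DATA
    print(f"MTU {mtu:5}: {packets:8} packets, {overhead:.2%} header overhead")
```

Going from 1500 to 9000 cuts the packet count, and hence the per-packet
processing on every router and host in the path, by roughly a factor of six.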





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 1500 was fine for 10G

I meant, of course, 10M ethernet.





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
  Last week I asked the operator of fairly major public peering points
 if they supported anything larger than 1500 MTU.  The answer was no.
 
 
 There's still a metric buttload of SONET interfaces in the core that
 won't go above 4470.
 
 So, you might conceivably get 4k MTU at some point in the future, but
 it's really, *really* unlikely you'll get to 9k MTU any time in the
 next
 decade.
 
 Matt

There is no reason why we are still using 1500 byte MTUs at exchange points. 

From Dykstra's paper (note that this was written in 1999 before wide 
deployment of GigE):

(quote)

Does GigE have a place in a NAP?

Not if it reduces the available MTU! Network Access Points (NAPs) are at the 
very core of the internet. They are where multiple wide area networks come 
together. A great deal of internet paths traverse at least one NAP. If NAPs put 
a limitation on MTU, then all WANs, LANs, and end systems that traverse that 
NAP are subject to that limitation. There is nothing the end systems could do 
to lift the performance limit imposed by the NAP's MTU. Because of their 
critically important place in the internet, NAPs should be doing everything 
they can to remove performance bottlenecks. They should be among the most 
permissive nodes in the network as far as the parameter space they make 
available to network applications.

The economic and bandwidth arguments for GigE NAPs however are compelling. 
Several NAPs today are based on switched FDDI (100 Mbps, 4 KB MTU) and are 
running out of steam. An upgrade to OC3 ATM (155 Mbps, 9 KB MTU) is hard to 
justify since it only provides a 50% increase in bandwidth. And trying to 
install a switch that could support 50+ ports of OC12 ATM is prohibitively 
expensive! A 64 port GigE switch however can be had for about $100k and 
delivers 50% more bandwidth per port at about 1/3 the cost of OC12 ATM. The 
problem however is 1500 byte frames, but GigE with jumbo frames would permit 
full FDDI MTU's and only slightly reduce a full Classical IP over ATM MTU (9180 
bytes).

A recent example comes from the Pacific Northwest Gigapop in Seattle which is 
based on a collection of Foundry gigabit ethernet switches. At Supercomputing 
'99, Microsoft and NCSA demonstrated HDTV over TCP at over 1.2 Gbps from 
Redmond to Portland. In order to achieve that performance they used 9000 byte 
packets and thus had to bypass the switches at the NAP! Let's hope that in the 
future NAPs don't place 1500 byte packet limitations on applications.

(end quote)

Having the exchange point of ethernet connections at >1500 MTU will not in any 
way adversely impact the traffic on the path.  If the end points are already at 
1500, this change is completely transparent to them.  If the end points are 
capable of >1500 already, then it would allow the flow to increase its packet 
sizes and reduce the number of packets flowing through the network and give a 
huge gain in performance, even in the face of packet loss.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen
On Sat, Nov 06, 2010 at 12:32:55PM -0700, George Bonser wrote:
  I doubt that >1500 is (still) widely used in our Internet... Might be,
  though, that most of us don't go all the way to 9k.
 
 Last week I asked the operator of fairly major public peering points 
 if they supported anything larger than 1500 MTU.  The answer was no.

It would be absolutely trivial for them to enable jumbo frames, there is 
just no demand for them to do so, as supporting Internet wide jumbo 
frames (particularly over exchange points) is highly non-scalable in 
practice.

It's perfectly safe to have the L2 networks in the middle support the 
largest MTU values possible (other than maybe triggering an obscure 
Force10 bug or something :P), so they could roll that out today and you 
probably wouldn't notice. The real issue is with the L3 networks on 
either end of the exchange, since if the L3 routers that are trying to 
talk to each other don't agree about their MTU values precisely, packets 
are blackholed. There are no real standards for jumbo frames out there, 
every vendor (and in many cases particular type/revision of hardware 
made by that vendor) supports a slightly different size. There is also 
no negotiation protocol of any kind, so the only way to make these two 
numbers match precisely is to have the humans on both sides talk to each 
other and come up with a commonly supported value.

There are two things that make this practically impossible to support at 
scale, even ignoring all of the grief that comes from trying to find a 
clueful human to talk to on the other end of your connection to a third 
party (which is a huge problem in and of itself):

#1. There is currently no mechanism on any major router to set multiple 
MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo frames 
over an exchange you would have to pick a single common value that 
EVERYONE can support. This also means you can't mix and match jumbo and 
non-jumbo participants over the same exchange, you essentially have to 
set up an entirely new exchange point (or vlan within the same exchange) 
dedicated to the jumbo frame support, and you still have to get a common 
value that everyone can support. Ironically many routers (many kinds of 
Cisco and Juniper routers at any rate) actually DO support per-nexthop 
MTUs in hardware, there is just no mechanism exposed to the end user to 
configure those values, let alone auto-negotiate them.

#2. The major vendors can't even agree on how they represent MTU sizes,
so entering the same # into routers from two different vendors can 
easily result in incompatible MTUs. For example, on Juniper when you 
type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the 
opposite is true. So to make a Cisco talk to a Juniper that is 
configured 9192, you would have to configure mtu 9178. Except it's not 
even that simple, because now if you start adding vlan tagging the L2 
header size is growing. If you now configure vlan tagging on the 
interface, you've got to make the Cisco side 9174 to match the Juniper's 
9192. And if you configure flexible-vlan-tagging so you can support 
q-in-q, you've now got to configure the Cisco side for 9170.
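
The accounting above can be written down in a few lines (a sketch of the
arithmetic as described, assuming the usual 14-byte Ethernet header and
4 bytes per 802.1Q tag):

```python
# Juniper interface MTUs are inclusive of the L2 header; Cisco IP MTUs
# are not.  Assumed sizes: Ethernet header = 14 bytes, 802.1Q tag = 4.
ETH_HDR = 14
VLAN_TAG = 4

def cisco_mtu_for_juniper(juniper_mtu, tags=0):
    """IP MTU to configure on a Cisco so it matches a Juniper interface
    MTU, given the number of VLAN tags on the link."""
    return juniper_mtu - ETH_HDR - tags * VLAN_TAG

print(cisco_mtu_for_juniper(9192))          # untagged       -> 9178
print(cisco_mtu_for_juniper(9192, tags=1))  # 802.1Q tagged  -> 9174
print(cisco_mtu_for_juniper(9192, tags=2))  # q-in-q         -> 9170
```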

As an operator who DOES fully support 9k+ jumbos on every internal link 
in my network, and as many external links as I can find clueful people 
to talk to on the other end to negotiate the correct values, let me just 
tell you this is a GIANT PAIN IN THE ASS. And we're not even talking 
about making sure things actually work right for the end user. Your IGP 
may not come up at all if the MTUs are misconfigured, but EBGP certainly 
will, even if the two sides are actually off by a few bytes. The maximum 
size of a BGP message is 4096 octets, and there is no mechanism to pad a 
message and try to detect MTU incompatibility, so what will actually 
happen in real life is the end user will try to send a big jumbo frame 
through and find that some of their packets are randomly and silently 
blackholed. This would be an utter nightmare to support and diagnose.

Realistically I don't think you'll ever see even a serious attempt at 
jumbo frame support implemented in any kind of scale until there is a 
negotiation protocol and some real standards for the mtu size that must 
be supported, which is something that no standards body (IEEE, IETF, 
etc) has seemed inclined to deal with so far. Of course all of this is 
based on the assumption that path mtu discovery will work correctly once 
the MTU values ARE correctly configured on the L3 routers, which is a 
pretty huge assumption, given all the people who stupidly filter ICMP. 
Oh and even if you solved all of those problems, I could trivially DoS 
your router with some packets that would overload your ability to 
generate ICMP Unreach Needfrag messages for PMTUD, and then all your 
jumbo frame end users going through that router would be blackholed as 
well.

Great idea in theory, epic disaster in practice, at least given 

RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 Completely agree with you on that point.  I'd love to see Equinix,
 AMSIX, LINX,
 DECIX, and the rest of the large exchange points put out statements
 indicating
 their ability to transparently support jumbo frames through their
 fabrics, or at
 least indicate a roadmap and a timeline to when they think they'll be
 able to
 support jumbo frames throughout the switch fabrics.
 
 Matt

Yes, in moving from SONET to Ethernet exchange points, we have actually
reduced the potential performance of applications across the network for
no good reason, in many cases.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/6/2010 3:36 PM, Richard A Steenbergen wrote:


#2. The major vendors can't even agree on how they represent MTU sizes,
so entering the same # into routers from two different vendors can
easily result in incompatible MTUs. For example, on Juniper when you
type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the
opposite is true. So to make a Cisco talk to a Juniper that is
configured 9192, you would have to configure mtu 9178. Except it's not
even that simple, because now if you start adding vlan tagging the L2
header size is growing. If you now configure vlan tagging on the
interface, you've got to make the Cisco side 9174 to match the Juniper's
9192. And if you configure flexible-vlan-tagging so you can support
q-in-q, you've now got to configure the Cisco side for 9170.


I agree with the rest, but actually, I've found that Juniper has a 
manual physical MTU with a separate logical MTU available, while Cisco 
sets a logical MTU and autocalculates the physical MTU (or perhaps the 
physical is just hard set to maximum). It depends on the equipment in 
Cisco's case, though. L3 and L2 interfaces treat MTU differently, 
especially noticeable when doing q-in-q on default switches without 
adjusting the MTU. Also noticeable in the MTU setting methods on a 
c7600 (L2 vs L3 methods).


In practice, I think you can actually pop the physical MTU on the 
Juniper much higher than necessary, so long as you set the family-based 
logical MTUs at the appropriate value.



Jack



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Michael Hallgren
Le samedi 06 novembre 2010 à 13:01 -0700, Matthew Petach a écrit :
 On Sat, Nov 6, 2010 at 12:32 PM, George Bonser gbon...@seven.com wrote:
   I doubt that >1500 is (still) widely used in our Internet... Might be,
  though, that most of us don't go all the way to 9k.
 
  mh
 
  Last week I asked the operator of fairly major public peering points if 
  they supported anything larger than 1500 MTU.  The answer was no.
 
 
 There's still a metric buttload of SONET interfaces in the core that
 won't go above 4470.
 
 So, you might conceivably get 4k MTU at some point in the future, but
 it's really, *really* unlikely you'll get to 9k MTU any time in the next
 decade.

Right, though I'm unsure of decade since we're moving off SDH/Sonet
quite aggressively.

mh

 
 Matt





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 It's perfectly safe to have the L2 networks in the middle support the
 largest MTU values possible (other than maybe triggering an obscure
 Force10 bug or something :P), so they could roll that out today and
you
 probably wouldn't notice. The real issue is with the L3 networks on
 either end of the exchange, since if the L3 routers that are trying to
 talk to each other don't agree about their MTU values precisely,
packets
 are blackholed. There are no real standards for jumbo frames out
there,
 every vendor (and in many cases particular type/revision of hardware
 made by that vendor) supports a slightly different size. There is also
 no negotiation protocol of any kind, so the only way to make these two
 numbers match precisely is to have the humans on both sides talk to
 each
 other and come up with a commonly supported value.

That is not a new problem.  That is also true today with last-mile
links (e.g. dialup) that support <1500 byte MTU.  What is different
today is RFC 4821 PMTU discovery which deals with the black holes.

RFC 4821 PMTUD is that negotiation that is lacking.  It is there.
It is deployed.  It actually works.  No more relying on someone sending
the ICMP packets through in order for PMTUD to work!
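
A toy illustration of the RFC 4821 idea (not the actual Linux
implementation; just a binary search in which the only feedback is
whether a probe of a given size was delivered end to end, with no ICMP
involved):

```python
def probe_path_mtu(delivers, low=1280, high=9000):
    """Find the largest packet size for which delivers(size) is True,
    using only end-to-end delivery feedback, in the spirit of RFC 4821
    packetization-layer path MTU discovery."""
    while low < high:
        mid = (low + high + 1) // 2
        if delivers(mid):
            low = mid       # probe was acked: raise the floor
        else:
            high = mid - 1  # probe was lost: lower the ceiling
    return low

# Hypothetical path with an un-signalled 4470-byte bottleneck:
print(probe_path_mtu(lambda size: size <= 4470))  # -> 4470
```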

 There are two things that make this practically impossible to support
 at
 scale, even ignoring all of the grief that comes from trying to find a
 clueful human to talk to on the other end of your connection to a
third
 party (which is a huge problem in and of itself):
 
 #1. There is currently no mechanism on any major router to set
multiple
 MTU values PER NEXTHOP on a multi-point exchange, so to do jumbo
frames
 over an exchange you would have to pick a single common value that
 EVERYONE can support. This also means you can't mix and match jumbo
and
 non-jumbo participants over the same exchange, you essentially have to
 set up an entirely new exchange point (or vlan within the same
 exchange)
 dedicated to the jumbo frame support, and you still have to get a
 common
 value that everyone can support. Ironically many routers (many kinds
of
 Cisco and Juniper routers at any rate) actually DO support per-nexthop
 MTUs in hardware, there is just no mechanism exposed to the end user
to
 configure those values, let alone auto-negotiate them.

Is there any gear connected to a major IX that does NOT support large
frames?  I am not aware of any manufactured today.  Even cheap D-Link
gear supports them.  I believe you would be hard-pressed to locate gear
that doesn't support it at any major IX.  Granted, it might require the
change of a global config value and a reboot for it to take effect in
some vendors.

http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html



 #2. The major vendors can't even agree on how they represent MTU
sizes,
 so entering the same # into routers from two different vendors can
 easily result in incompatible MTUs. For example, on Juniper when you
 type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the
 opposite is true. So to make a Cisco talk to a Juniper that is
 configured 9192, you would have to configure mtu 9178. Except it's not
 even that simple, because now if you start adding vlan tagging the L2
 header size is growing. If you now configure vlan tagging on the
 interface, you've got to make the Cisco side 9174 to match the
 Juniper's
 9192. And if you configure flexible-vlan-tagging so you can support
 q-in-q, you've now got to configure the Cisco side for 9170.

Again, the size of the MTU on the IX port doesn't change the size of the
packets flowing through that gear.  A packet sent from an end point with
an MTU of 1500 will be unchanged by the router change.  A flow to an end
point with 1500 MTU will also be adjusted down by PMTU Discovery just
as it is now when communicating with a dialup end point that might have
a <600 MTU.  The only thing that is going to change from the perspective
of the routers is the communications originated by the router which will
basically just be the BGP session.  When the TCP session is established
for BGP, the side with the smaller of the two MTUs will report an MSS
value which is
the largest packet size it can support.  The other unit will not send a
packet larger than this even if it has a larger MTU.  Just because the
MTU is 9000 doesn't mean it is going to aggregate 1500 byte packets
flowing through it into 9000 byte packets; it is going to pass them
through unchanged.  
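
A minimal sketch of that MSS clamp (assuming IPv4 and TCP with no
options, so each side advertises MSS = MTU - 40 in its SYN):

```python
def mss(mtu):
    """MSS a host advertises for a given MTU (IPv4 + TCP, no options)."""
    return mtu - 40

def effective_segment(mtu_a, mtu_b):
    """Payload per segment actually used on the connection: the smaller
    of the two advertised MSS values."""
    return min(mss(mtu_a), mss(mtu_b))

# A 9000-MTU router peering with a 1500-MTU router never sends TCP
# segments with more than 1460 bytes of payload:
print(effective_segment(9000, 1500))  # -> 1460
```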

As for the configuration differences between units, how does that change
from the way things are now?  A person configuring a Juniper for 1500
byte packets already must know the difference as that quirk of including
the headers is just as true at 1500 bytes as it is at 9000 bytes.  Does
the operator suddenly become less competent with their gear when they
use a different value?  Also, a 9000 byte MTU would be a happy value
that practically everyone supports these days, including ethernet
adaptors on host machines.

 As an operator who DOES fully support 9k+ 

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Michael Hallgren
Le samedi 06 novembre 2010 à 13:29 -0700, Matthew Petach a écrit :
 On Sat, Nov 6, 2010 at 1:22 PM, George Bonser gbon...@seven.com wrote:
  
   Last week I asked the operator of fairly major public peering points
  if they supported anything larger than 1500 MTU.  The answer was no.
  
 
  There's still a metric buttload of SONET interfaces in the core that
  won't go above 4470.
 
  So, you might conceivably get 4k MTU at some point in the future, but
  it's really, *really* unlikely you'll get to 9k MTU any time in the
  next
  decade.
 
  Matt
 
  There is no reason why we are still using 1500 byte MTUs at exchange points.
 
 
 Completely agree with you on that point.  I'd love to see Equinix, AMSIX, 
 LINX,
 DECIX, and the rest of the large exchange points put out statements indicating
 their ability to transparently support jumbo frames through their
 fabrics, or at
 least indicate a roadmap and a timeline to when they think they'll be able to
 support jumbo frames throughout the switch fabrics.

Agree. Some people do: Netnod. ;) (1500 in one option, 4470 in another,
part of a single interconnection deal -- unless I'm mistaken about the
contractual side of things).

mh

 
 Matt





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Matthew Petach
On Sat, Nov 6, 2010 at 2:21 PM, George Bonser gbon...@seven.com wrote:

...
 As for the configuration differences between units, how does that change
 from the way things are now?  A person configuring a Juniper for 1500
 byte packets already must know the difference as that quirk of including
 the headers is just as true at 1500 bytes as it is at 9000 bytes.  Does
 the operator suddenly become less competent with their gear when they
 use a different value?  Also, a 9000 byte MTU would be a happy value
 that practically everyone supports these days, including ethernet
 adaptors on host machines.

While I think 9k for exchange points is an excellent target, I'll reiterate
that there's a *lot* of SONET interfaces out there that won't be going
away any time soon, so practically speaking, you won't really get more
than 4400 end-to-end, even if you set your hosts to 9k as well.

And yes, I agree with ras; having routers able to adjust on a per-session
basis would be crucial; otherwise, we'd have to ask the peeringdb folks to
add a field that lists each participant's interface MTU at each exchange,
and part of peermaker would be a check that could warn you,
sorry, you can't peer with network X, your MTU is too small.  ;-P

(though that would make for an interesting deepering notice...sorry, we
will be unable to peer with networks who cannot support large MTUs
at exchange point X after this date.)

Matt



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread sthaug
 Completely agree with you on that point.  I'd love to see Equinix, AMSIX, 
 LINX,
 DECIX, and the rest of the large exchange points put out statements indicating
 their ability to transparently support jumbo frames through their
 fabrics, or at
 least indicate a roadmap and a timeline to when they think they'll be able to
 support jumbo frames throughout the switch fabrics.

The Netnod IX in Sweden has offered 4470 MTU for many years. From

http://www.netnod.se/technical_information.shtml

One VLAN handles standard sized Ethernet frames (MTU 1500 bytes) and
one handles Ethernet Jumbo frames with MTU-size 4470 bytes.

Steinar Haug, Nethelp consulting, sth...@nethelp.no



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread sthaug
 RFC 4821 PMTUD is that negotiation that is lacking.  It is there.
 It is deployed.  It actually works.  No more relying on someone sending
 the ICMP packets through in order for PMTUD to work!

For some value of works. There are way too many places filtering
ICMP for PMTUD to work consistently. PMTUD is *not* the solution,
unfortunately.

Steinar Haug, Nethelp consulting, sth...@nethelp.no



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 -Original Message-
 From: sth...@nethelp.no [mailto:sth...@nethelp.no]
 Sent: Saturday, November 06, 2010 2:40 PM
 To: George Bonser
 Cc: r...@e-gerbil.net; nanog@nanog.org
 Subject: Re: RINA - scott whaps at the nanog hornets nest :-)
 
  RFC 4821 PMTUD is that negotiation that is lacking.  It is there.
  It is deployed.  It actually works.  No more relying on someone
  sending the ICMP packets through in order for PMTUD to work!
 
 For some value of works. There are way too many places filtering
 ICMP for PMTUD to work consistently. PMTUD is *not* the solution,
 unfortunately.
 
 Steinar Haug, Nethelp consulting, sth...@nethelp.no

I guess you missed the part about 4821 PMTUD does not rely on ICMP.

Modern PMTUD does not rely on ICMP and works even where it is filtered.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/6/2010 4:40 PM, sth...@nethelp.no wrote:

 For some value of works. There are way too many places filtering
 ICMP for PMTUD to work consistently. PMTUD is *not* the solution,
 unfortunately.


He was referring to the updated RFC 4821.

 In the absence of ICMP messages, the proper MTU is determined by starting
   with small packets and probing with successively larger packets.  The
   bulk of the algorithm is implemented above IP, in the transport layer
   (e.g., TCP) or other Packetization Protocol that is responsible for
   determining packet boundaries.

It is designed to support working without ICMP. Its drawback is the 
ramp time, which makes it useless for small transactions, but it can be 
argued that small transactions don't need larger MTUs.
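To make the ramp-time trade-off concrete, here is a toy, self-contained sketch of the probing idea. Real implementations live inside the TCP stack; `path_mtu` here simulates the unknown real path, and all sizes and step values are invented for illustration:

```python
# Toy sketch of RFC 4821-style probing: grow probe sizes until a probe is
# lost (black hole), then keep the largest size that got through.
def probe_path_mtu(path_mtu, start=1024, step=512, ceiling=9000):
    """Return the largest probed size that was 'delivered'."""
    good = start
    size = start + step
    while size <= ceiling:
        if size > path_mtu:   # probe silently dropped: stop growing
            break
        good = size           # probe delivered: remember it
        size += step
    return good

assert probe_path_mtu(4470) == 4096   # best 512-byte step under 4470
assert probe_path_mtu(1500) == 1024   # first probe (1536) already lost
```

The several round trips spent walking up in steps are exactly the ramp time that makes this unattractive for short-lived connections.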



Jack



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser

 
 While I think 9k for exchange points is an excellent target, I'll
 reiterate
 that there's a *lot* of SONET interfaces out there that won't be going
 away any time soon, so practically speaking, you won't really get more
 than 4400 end-to-end, even if you set your hosts to 9k as well.

Agreed.  But in the meantime, removing the 1500 bottlenecks at the
ethernet peering ports would at least provide the potential for the
connection to scale up to the 4400 available by the SONET links.  Right
now, nothing is possible above 1500 for most flows that traverse an
ethernet peering point.

My point is that 1500 is a relic.  Put another way, how come PoS at 4400
in the path doesn't break anything currently between endpoints while any
suggestion that ethernet be made larger than 1500 in the path causes all
this reaction?  We already HAVE MTUs larger than 1500 in the middle
part of the path.  This really doesn't change much of anything from that
perspective.

For example, simply taking Ethernet to 3000 would still be smaller than
SONET and even that would provide measurable benefit.

There is a certain "but that is the way it has always been done" inertia
that I believe needs to be overcome.  Increasing the path MTU has the
potential to greatly improve performance at practically no cost to
anyone involved.  We are throttling performance of the Internet for no
sound technical reason, in my opinion.
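For a rough sense of the numbers behind this claim, here is a back-of-envelope efficiency calculation. The framing overheads assumed are the usual ones (38 bytes of Ethernet preamble + header + FCS + interframe gap, plus 40 bytes of IPv4/TCP headers per packet):

```python
# TCP payload bytes delivered per byte on the wire, as a function of MTU.
ETH_OVERHEAD = 38   # preamble 8 + header 14 + FCS 4 + interframe gap 12
IP_TCP = 40         # IPv4 header 20 + TCP header 20 (no options)

def efficiency(mtu):
    return (mtu - IP_TCP) / (mtu + ETH_OVERHEAD)

assert round(efficiency(1500), 3) == 0.949   # standard Ethernet MTU
assert round(efficiency(9000), 3) == 0.991   # jumbo frames
```

The raw header-overhead win is modest (~4%); the larger gains the thread argues for come from fewer packets per flow and better loss-recovery behavior on high-BDP paths.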

Now I could see where someone selling jumbo paths at a premium might
be reluctant to see the Internet generally go that path as it would
decrease their value add, but that is a different story.




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 He was referring to the updated RFC 4821.
 
  In the absence of ICMP messages, the proper MTU is determined by
  starting with small packets and probing with successively larger
  packets.  The bulk of the algorithm is implemented above IP, in the
  transport layer (e.g., TCP) or other Packetization Protocol that is
  responsible for determining packet boundaries.
 
 It is designed to support working without ICMP. Its drawback is the
 ramp time, which makes it useless for small transactions, but it can
 be argued that small transactions don't need larger MTUs.
 
 
 Jack

That is also somewhat mitigated in that it operates in two modes.  The
first mode is what I would call passive mode and only comes into play
once a black hole is detected.  It does not change the operation of TCP
until a packet disappears.  The second method is the active mode where
it actively probes with increasing packet sizes until it hits a black
hole or gets an ICMP response.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread sthaug
   RFC 4821 PMTUD is that negotiation that is lacking.  It is there.
   It is deployed.  It actually works.  No more relying on someone sending
   the ICMP packets through in order for PMTUD to work!
  
  For some value of works. There are way too many places filtering
  ICMP for PMTUD to work consistently. PMTUD is *not* the solution,
  unfortunately.
 
 I guess you missed the part about 4821 PMTUD does not rely on ICMP.
 
 Modern PMTUD does not rely on ICMP and works even where it is filtered.

As long as the implementations are few and far between:

https://www.psc.edu/~mathis/MTU/
http://www.ietf.org/mail-archive/web/rrg/current/msg05816.html

the traditional ICMP-based PMTUD is what most of us face today.

Steinar Haug, Nethelp consulting, sth...@nethelp.no



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/6/2010 4:52 PM, George Bonser wrote:


That is also somewhat mitigated in that it operates in two modes.  The
first mode is what I would call passive mode and only comes into play
once a black hole is detected.  It does not change the operation of TCP
until a packet disappears.  The second method is the active mode where
it actively probes with increasing packet sizes until it hits a black
hole or gets an ICMP response.


While it reads well, what implementations are actually in use? As with 
most protocols, it is useless if it doesn't have a high penetration.


Jack



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Dan White

On 06/11/10 15:56 -0500, Jack Bates wrote:

On 11/6/2010 3:36 PM, Richard A Steenbergen wrote:


#2. The major vendors can't even agree on how they represent MTU sizes,
so entering the same # into routers from two different vendors can
easily result in incompatible MTUs. For example, on Juniper when you
type mtu 9192, this is INCLUSIVE of the L2 header, but on Cisco the
opposite is true. So to make a Cisco talk to a Juniper that is
configured 9192, you would have to configure mtu 9178. Except it's not
even that simple, because now if you start adding vlan tagging the L2
header size is growing. If you now configure vlan tagging on the
interface, you've got to make the Cisco side 9174 to match the Juniper's
9192. And if you configure flexible-vlan-tagging so you can support
q-in-q, you've now got to configure to Cisco side for 9170.


I agree with the rest, but actually, I've found that juniper has a 
manual physical mtu with a separate logical mtu available, while 
cisco sets a logical mtu and autocalculates the physical mtu (or 
perhaps the physical is just hard set to maximum). It depends on the 
equipment in cisco, though. L3 and L2 interfaces treat mtu 
differently, especially noticeable when doing q-in-q on default 
switches without adjusting the mtu. Also noticeable in mtu setting 
methods on a c7600(l2 vs l3 methods)


In practice, I think you can actually pop the physical mtu on the 
juniper much higher than necessary, so long as you set the family 
based logical mtu's at the appropriate value.


Cisco calls this 'routing mtu' and 'jumbo mtu' on the platform we have to
distinguish between layer 3 mtu (where packets which exceed that size get
fragmented) and layer 2 mtu (where frames that exceed that size get dropped
on the floor as 'giants').

We always set layer 2 mtu as high as we can on our switches (9000+), and
strictly leave everything else (layer 3) at 1500 bytes. In my experience,
setting two hosts to differing layer 3 MTUs will lead to fragmentation at
some point along the routing path or within one of the hosts.

With Path MTU Discovery moved to the end hosts in v6, the concept of a
standardized MTU should go away, and open up much larger MTUs. However,
that may not happen until dual stacked v4/v6 goes away.

--
Dan White



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 As long as the implementations are few and far between:
 
 https://www.psc.edu/~mathis/MTU/
 http://www.ietf.org/mail-archive/web/rrg/current/msg05816.html
 
 the traditional ICMP-based PMTUD is what most of use face today.
 
 Steinar Haug, Nethelp consulting, sth...@nethelp.no

It is already the standard with currently shipping Solaris and on by
default.  It ships in Linux 2.6.32 but is off by default (sysctl I
referred to earlier).  It ships with Microsoft Windows as Blackhole
Router Detection and is on by default since Windows 2003 SP2.  The
notion that it isn't widely deployed is mistaken.  It is much more
widely deployed now than it was 12 months ago. 

And again, deploying 9000 byte MTU in the MIDDLE of the network is not
going to change PMTUD one iota unless the rest of the path between both
end points is 9000 bytes, since the end points are probably 1500
anyway.

Changing the MTU on a router in the path is not going to cause the
packets flowing through it to change in size.

It will not introduce any additional PMTU issues as those are end-to-end
problems anyway, if anything it should REDUCE them by making the path
9000 byte clean in the middle, there shouldn't BE any PMTU problems in
the middle of the network and things like reduced effective MTU from
tunnels in the middle of networks disappears.

For example, if some network is using MTU 1500 and tunnels something
over GRE and doesn't enlarge the MTU of the interfaces handling that
tunnel, and if they block ICMP from inside their net, then they have
introduced a PMTU issue by reducing the effective MTU of the
encapsulated packets.  I deal with that very problem all the time.
Increasing the MTU on those paths to 9000 would enable 1500 byte packets
to travel unmolested and eliminate that PMTU problem.  In fact, many
networks already get around that problem by increasing the MTU on
tunnels just so they can avoid fragmenting the encapsulated packet.
Increasing to 9000 would REDUCE problems across the network for end
points using an MTU smaller than 9000.





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 While it reads well, what implementations are actually in use? As with
 most protocols, it is useless if it doesn't have a high penetration.
 
 Jack

Solaris 10, in use and on by default.  Available on Windows for a very
long time as blackhole router detection; it was off by default
originally, but has been on by default since Windows 2003 SP2 and
Windows XP SP3, and remains on in Windows 7.

It is available on Linux but not yet on by default.  I expect that will
change once it gets enough use.

I am not sure of the default deployment in MacOS and BSD but know it is
available.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen
On Sat, Nov 06, 2010 at 02:21:51PM -0700, George Bonser wrote:
 
 That is not a new problem.  That is also true today with last 
 mile links (e.g. dialup) that support < 1500 byte MTU.  What is 
 different today is RFC 4821 PMTU discovery which deals with the black 
 holes.
 
 RFC 4821 PMTUD is that negotiation that is lacking.  It is there. 
 It is deployed.  It actually works.  No more relying on someone 
 sending the ICMP packets through in order for PMTUD to work!

The only thing this adds is a trial-and-error probing mechanism per flow, 
to try and recover from the infinite blackholing that would occur if 
your ICMP is blocked in classic PMTUD. If this actually happened at any 
scale, it would create a performance and overhead penalty that is far 
worse than the original problem you're trying to solve.

Say you have two routers talking to each other over a L2 switched 
infrastructure (i.e. an exchange point). In order for PMTUD to function 
quickly and effectively, the two routers on each end MUST agree on the 
MTU value of the link between them. If router A thinks it is 9000, and 
router B thinks it is 8000, when router A comes along and tries to send 
a 8001 byte packet it will be silently discarded, and the only way to 
recover from this is with trial-and-error probing by the endpoints after 
they detect what they believe to be MTU blackholing. This is little more 
than a desperate ghetto hack designed to save the connection from 
complete disaster.

The point where a protocol is needed is between router A and router B, 
so they can determine the MTU of the link, without needing to involve 
the humans in a manual negotiation process. Ideally this would support 
multi-point LANs over ethernet as well, so .1 could have an MTU of 9000, 
.2 could have an MTU of 8000, etc. And of course you have to make sure 
that you can actually PASS the MTU across the wire (if the switch in the 
middle can't handle it, the packet will also be silently dropped), so 
you can't just rely on the other side to tell you what size it THINKS it 
can support. You don't have a shot in hell of having MTUs negotiated 
correctly or PMTUD work well until this is done.

 Is there any gear connected to a major IX that does NOT support large 
 frames?  I am not aware of any manufactured today.  Even cheap D-Link 
 gear supports them.  I believe you would be hard-pressed to locate 
 gear that doesn't support it at any major IX.  Granted, it might 
 require the change of a global config value and a reboot for it to 
 take effect in some vendors.
 
 http://darkwing.uoregon.edu/~joe/jumbo-clean-gear.html

If that doesn't prove my point about every vendor having their own 
definition of what # is and isn't supported, I don't know what does. 
Also, I don't know what exchanges YOU connect to, but I very clearly see 
a giant pile of gear on that list that is still in use today. :)

 As for the configuration differences between units, how does that 
 change from the way things are now?  A person configuring a Juniper 
 for 1500 byte packets already must know the difference as that quirk 
 of including the headers is just as true at 1500 bytes as it is at 
 9000 bytes.  Does the operator suddenly become less competent with 
 their gear when they use a different value?  Also, a 9000 byte MTU 
 would be a happy value that practically everyone supports these days, 
 including ethernet adaptors on host machines.

Everything defaults to 1500 today, so nobody has to do anything. Again, 
I'm actually doing this with people today on a very large network with 
lots of peers all over the world, so I have a little bit of experience 
with exactly what goes wrong. Nearly everyone who tries to figure out 
the correct MTU between vendors and with a third party network gets it 
wrong, at least some significant percentage of the time.

And honestly I can't even find an interesting number of people willing 
to turn on BFD, something with VERY clear benefits for improving failure 
detection time over an IX (for the next time Equinix decides to do one 
of their 10PM maintenances that causes hours of unreachability until 
hold timers expire :P). If the IX operators saw any significant demand 
they would have already turned it on already.

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Doug Barton

On 11/6/2010 3:14 PM, George Bonser wrote:

It ships with Microsoft Windows as Blackhole
Router Detection and is on by default since Windows 2003 SP2.


The first item returned on a blekko search is the following article 
which indicates that it is on by default in Windows 
2008/Vista/2003/XP/2000. The article seems to predate Win7.



hth,

Doug

--

Nothin' ever doesn't change, but nothin' changes much.
-- OK Go

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
  The only thing this adds is a trial-and-error probing mechanism per
  flow, to try and recover from the infinite blackholing that would
  occur if your ICMP is blocked in classic PMTUD. If this actually
  happened at any scale, it would create a performance and overhead
  penalty that is far worse than the original problem you're trying to
  solve.

I ran into this very problem not long ago when attempting to reach a
server for a very large network.  Our Solaris hosts had no problem
transacting with the server.  Our linux machines did have a problem and
the behavior looked like a typical PMTU black hole.  It turned out that
very large network tunneled the connection inside their network
reducing the effective MTU of the encapsulated packets and blocked ICMP
from inside their net to the outside.  Changing the advertised MSS of
the connection to that server to 1380 allowed it to work 

( ip route add <ip address> via <gateway> dev <device> advmss 1380 )

and that verified that the problem was an MTU black hole.  A little
reading revealed why Solaris wasn't having the problem but Linux did.
Setting the Linux ip_no_pmtu_disc sysctl to 1 resulted in the Linux
behavior matching the Solaris behavior.
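The 1380 figure works out roughly as follows, assuming the tunnel-reduced path MTU was around 1420 (a guess, since the exact encapsulation overhead inside that network was unknown) minus the fixed IPv4/TCP headers:

```python
# MSS to advertise so segments fit the smallest effective MTU on the path.
IP_TCP = 40   # IPv4 header 20 + TCP header 20

def safe_advmss(effective_path_mtu):
    return effective_path_mtu - IP_TCP

assert safe_advmss(1420) == 1380   # the workaround value used above
assert safe_advmss(1500) == 1460   # normal untunneled Ethernet path
```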

  Say you have two routers talking to each other over a L2 switched
  infrastructure (i.e. an exchange point). In order for PMTUD to
  function quickly and effectively, the two routers on each end MUST
  agree on the MTU value of the link between them. If router A thinks it
  is 9000, and router B thinks it is 8000, when router A comes along and
  tries to send a 8001 byte packet it will be silently discarded, and
  the only way to recover from this is with trial-and-error probing by
  the endpoints after they detect what they believe to be MTU
  blackholing. This is little more than a desperate ghetto hack designed
  to save the connection from complete disaster.

Correct. Devices on the same vlan will need to use the same MTU.  And
why is that a problem?  That is just as true then as it is today.
Nothing changes.  All you are doing is changing from everyone using 1500
to everyone using 9000 on that vlan.  Nothing else changes.  Why is that
any kind of issue?

 The point where a protocol is needed is between router A and router B,
 so they can determine the MTU of the link, without needing to involve
 the humans in a manual negotiation process. 

When the TCP/IP connection is opened between the routers for a routing
session, they should each send the other an MSS value that says how
large a packet they can accept.  You already have that information
available. TCP provides that negotiation for directly connected
machines.

Again, nothing changes from the current method of operating. If I showed
up at a peering switch and wanted to use 1000 byte MTU, I would probably
have some problems.  The point I am making is that 1500 is a relic value
that hamstrings Internet performance and there is no good reason not to
use 9000 byte MTU at peering points (by all participants) since it A:
introduces no new problems and B: I can't find a vendor of modern gear
at a peering point that doesn't support it though there may be some
ancient gear at some peering points in use by some of the peers.

I can not think of a problem changing from 1500 to 9000 as the standard
at peering points introduces.  It would also speed up the loading of the
BGP routes between routers at the peering points.  If Joe Blow at home
with a dialup connection with an MTU of 576 is talking to a server at Y!
with an MTU of 10 billion, changing a peering path from 1500 to 9000
bytes somewhere in the path is not going to change that PMTU discovery
one iota.  It introduces no problem whatsoever. It changes nothing.


 
  If that doesn't prove my point about every vendor having their own
  definition of what # is and isn't supported, I don't know what does.
  Also, I don't know what exchanges YOU connect to, but I very clearly
  see a giant pile of gear on that list that is still in use today. :)

That is a list of 9000 byte clean gear.  The very bottom is the stuff
that doesn't support it.  Of the stuff that doesn't support it, how much
is connected directly to a peering point?  THAT is the bottleneck I am
talking about right now.  One step at a time.  Removing the bottleneck
at the peering points is all I am talking about.  That will not change
PMTU issues elsewhere and those will stand just exactly as they are
today without any change.  In fact it will ensure that there are *fewer*
PMTU discovery issues by being able to support a larger range of packets
without having to fragment them.

We *already* have SONET MTU of 4000 and this hasn't broken anything
since the invention of SONET.





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser

 
 and that verified that the problem was an MTU black hole.  A little
 reading revealed why Solaris wasn't having the problem but Linux did.
 Setting the Linux ip_no_pmtu_disc sysctl to 1 resulted in the Linux
 behavior matching the Solaris behavior.

Oops, meant tcp_mtu_probing





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
Re: large MTU

One place where this has the potential to greatly improve performance is
in transfers of large amounts of data such as vendors supporting the
downloading of movies, cloud storage vendors, and movement of other
large content and streaming. The *first* step in being able to realize
those gains is in removing the low hanging fruit of bottlenecks in
that path.  The lowest hanging fruit is the peering points.  Changing
those should introduce no new problems as the peering points aren't
currently the source of MTU path discovery problems and increasing the
MTU removes a discovery issue point, only reducing the MTU would create
one.

In transitioning from SONET to Ethernet, we are actually reducing
potential performance by reducing the effective MTU from roughly 4470 to 1500.
So even increasing bandwidth is of no use if you are potentially
reducing performance end to end by reducing the effective maximum MTU of
the path.

In that diagram on Phil Dykstra's page linked to earlier, even though
the packets on that OC3 backbone were mostly (by a large margin)
<= 1500 bytes, the majority of the TRAFFIC was carried by packets
> 1500 bytes.

http://sd.wareonearth.com/~phil/pktsize_hist.gif

 The above graph is from a study[1] of traffic on the InternetMCI
backbone in 1998. It shows the distribution of packet sizes flowing over
a particular backbone OC3 link. There is clearly a wall at 1500 bytes
(the ethernet limit), but there is also traffic up to the 4000 byte FDDI
MTU. But here is a more surprising fact: while the number of packets
larger than 1500 bytes appears small, more than 50% of the bytes were
carried by such packets because of their larger size.

[1] the nature of the beast: recent traffic measurements from an
Internet backbone
http://www.caida.org/outreach/papers/1998/Inet98/
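The effect is easy to reproduce with a toy histogram. The packet counts below are invented for illustration and are NOT CAIDA's data:

```python
# Large packets can be a small share of the packet count yet carry most
# of the bytes.  Map of packet size -> packet count (made-up numbers).
hist = {64: 500_000, 576: 200_000, 1500: 250_000, 4000: 150_000}

total_pkts = sum(hist.values())
total_bytes = sum(size * n for size, n in hist.items())
big_pkts = sum(n for size, n in hist.items() if size > 1500)
big_bytes = sum(size * n for size, n in hist.items() if size > 1500)

assert big_pkts / total_pkts < 0.15    # ~14% of the packets...
assert big_bytes / total_bytes > 0.5   # ...carry over half the bytes
```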





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Richard A Steenbergen
On Sat, Nov 06, 2010 at 03:49:19PM -0700, George Bonser wrote:
 
 When the TCP/IP connection is opened between the routers for a routing 
 session, they should each send the other an MSS value that says how 
 large a packet they can accept.  You already have that information 
 available. TCP provides that negotiation for directly connected 
 machines.

You're proposing that routers should dynamically alter the interface MTU 
based on the TCP MSS value they receive from an EBGP neighbor? I barely 
know where to begin, but first off MSS is not MTU, it is only loosely 
related to MTU. MSS is affected by TCP options (window scale, sack, MD5 
authentication, etc), and MSS between routers can be set to any value a 
user chooses. There is absolutely no guarantee that MSS is going to lead 
to a correct guess at the MTU. Also, many routers still default to 
having PMTUD turned off, would you suggest that they should set the 
physical interface MTU to 576 based on that? :) And alas, it's one hell 
of a layer violation too.
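A sketch of why MSS only loosely tracks MTU, per the point above: the advertised MSS is derived from MTU minus the fixed IPv4/TCP headers, and as the thread notes, some stacks shrink it further to reserve room for per-segment options such as the 18-byte TCP MD5 option. The values below are the common ones, used purely for illustration:

```python
# MSS derived from MTU minus fixed headers and any per-segment options.
IP_HEADER = 20
TCP_HEADER = 20

def advertised_mss(mtu, option_bytes=0):
    return mtu - IP_HEADER - TCP_HEADER - option_bytes

assert advertised_mss(1500) == 1460                    # plain Ethernet
assert advertised_mss(9000, option_bytes=18) == 8942   # jumbo + TCP MD5
```

Inverting this to guess the neighbor's interface MTU fails because the option bytes (and any operator-configured MSS clamp) are invisible to the receiver, which is RAS's objection.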

A negotiation protocol is needed, but you could argue about where it 
should be for days. Maybe at the physical layer as part of 
auto-negotiation, maybe at the L3-L2 layer (i.e. negotiate it per IP 
as part of arp or neighbor discovery), hell maybe even in BGP, but keyed 
off MSS is way over the top. :)

 Again, nothing changes from the current method of operating. If I 
 showed up at a peering switch and wanted to use 1000 byte MTU, I would 
 probably have some problems.  The point I am making is that 1500 is a 
 relic value that hamstrings Internet performance and there is no good 
 reason not to use 9000 byte MTU at peering points (by all 
 participants) since it A: introduces no new problems and B: I can't 
 find a vendor of modern gear at a peering point that doesn't support 
 it though there may be some ancient gear at some peering points in use 
 by some of the peers.

Have you ever tried showing up to the Internet with a 1000 byte MTU? The 
only time that works correctly today is when you're rewriting TCP MSS 
values as the packet goes through the constrained link, which may be 
fine for the GRE tunnel to a Linux box at your house, but clearly can't 
work on the real Internet.

 I can not think of a problem changing from 1500 to 9000 as the 
 standard at peering points introduces.  It would also speed up the 

This suggests a serious lack of imagination on your part. :)

 loading of the BGP routes between routers at the peering points. If 

It's a very very modest increase at best.

 Joe Blow at home with a dialup connection with an MTU of 576 is 
 talking to a server at Y! with an MTU of 10 billion, changing a 
 peering path from 1500 to 9000 bytes somewhere in the path is not 
 going to change that PMTU discovery one iota.  It introduces no 
 problem whatsoever. It changes nothing.

You know one very good reason for the people on a dialup connection to 
have low MTUs is serialization delay. As link speeds have gotten faster 
but MTUs have stayed the same, one tangible benefit is the lack of a 
need for fair queueing to keep big packets from significantly increasing 
the latency of small packets.
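The serialization-delay point is easy to quantify: the time to clock one packet onto the wire is simply packet bits divided by link rate.

```python
# Serialization delay in milliseconds for one packet on a given link.
def serialization_ms(packet_bytes, link_bps):
    return packet_bytes * 8 / link_bps * 1000

assert round(serialization_ms(1500, 56_000)) == 214    # 1500B on dialup
assert round(serialization_ms(9000, 56_000)) == 1286   # jumbo on dialup
assert serialization_ms(9000, 10**10) < 0.01           # 10 Gbps: ~7.2 us
```

A jumbo frame on a 56k modem monopolizes the line for well over a second, which is why small MTUs made sense on slow last-mile links even though they are irrelevant at modern link speeds.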

Overall I agree with the theory of larger MTUs... Improved efficiency, 
being able to do page-flipping with your payload, not having to worry 
about screwing things up if you DO need to use a tunnel or turn on 
IPsec, it's all well and good... But from a practical standpoint there 
are still a lot of very serious issues that have not been addressed, and 
anyone who actually tries to do this at scale is in for a world of hurt. 

I for one would love to see the situation improved, but trying to gloss 
over it and pretend the problems don't exist just delays the day when it 
actually CAN be supported.

 That is a list of 9000 byte clean gear.  The very bottom is the stuff 
 that doesn't support it.  Of the stuff that doesn't support it, how 
 much is connected directly to a peering point?  THAT is the bottleneck

This argument is completely destroyed at the line that says 7206VXR 
w/PA-GE, you don't need to read any further.

 I am talking about right now.  One step at a time.  Removing the 
 bottleneck at the peering points is all I am talking about.  That will 
 not change PMTU issues elsewhere and those will stand just exactly as 
 they are today without any change.  In fact it will ensure that there 
 are *fewer* PMTU discovery issues by being able to support a larger 
 range of packets without having to fragment them.

The issues I listed are precisely why it doesn't work at peering points. 
I know this because I do a lot of peering, and I spend a lot of time 
dealing with getting people to peer at larger MTU values (correctly). If 
it was easier to do without breaking stuff, I'd be a lot more successful 
at it. :)

 We *already* have SONET MTU of 4000 and this hasn't broken anything 
 since the invention of SONET.

SONET MTU works because it's on by default, it's the same size 
everywhere, 

Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Marshall Eubanks

On Nov 6, 2010, at 10:38 AM, Mark Smith wrote:

 On Fri, 5 Nov 2010 21:40:30 -0400
 Marshall Eubanks t...@americafree.tv wrote:
 
 
 On Nov 5, 2010, at 7:26 PM, Mark Smith wrote:
 
 On Fri, 5 Nov 2010 15:32:30 -0700
 Scott Weeks sur...@mauigateway.com wrote:
 
 
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)
 
 
 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
 
 
 Whoever wrote that doesn't know what they're talking about. LISP is
 not the IETF's proposed solution (the IETF don't have one, the IRTF do),
 
 Um, I would not agree. The IRTF RRG considered and is documenting a lot of 
 things, but did not
 come to any consensus as to which one should be a proposed solution.
 
 
 I probably got a bit keen, I've been reading through the IRTF RRG
 Recommendation for a Routing Architecture draft which, IIRC, makes a
 recommendation to pursue Identifier/Locator Network Protocol rather
 than LISP.
 

That is not a consensus document - as it says

   To this end, this
   document surveys many of the proposals that were brought forward for
   discussion in this activity, as well as some of the subsequent
   analysis and the architectural recommendation of the chairs.

and (Section 17)

   Unfortunately, the group
   did not reach rough consensus on a single best approach.

The Chairs suggested that work continue on ILNP, but it is a stretch to 
characterize that as the RRG's solution, much less the IRTF's.

(LISP is an IETF WG now, but with an experimental focus on its charter - 
The LISP WG is NOT chartered to develop the final
or standard solution for solving the routing scalability problem.)

Regards
Marshall


 Regards,
 Mark.
 
 
 Regards
 Marshall
 
 
 and streaming media was seen to be one of the early applications of the
 Internet - these types of applications are why TCP was split out of
 IP, why UDP was invented, and why UDP has a significantly
 different protocol number to TCP.
 
 --
 NAT is your friend
 
 IP doesn’t handle addressing or multi-homing well at all
 
 The IETF’s proposed solution to the multihoming problem is 
 called LISP, for Locator/Identifier Separation Protocol. This
 is already running into scaling problems, and even when it works,
 it has a failover time on the order of thirty seconds.
 
 TCP and IP were split the wrong way
 
 IP lacks an addressing architecture
 
 Packet switching was designed to complement, not replace, the telephone 
 network. IP was not optimized to support streaming media, such as voice, 
 audio broadcasting, and video; it was designed to not be the telephone 
 network.
 --
 
 
 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.
 
 I can hear the angry hornets buzzing already.  :-)
 
 scott
 
 
 
 




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Niels Bakker

* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 00:30 CET]:

Re: large MTU

One place where this has the potential to greatly improve 
performance is in transfers of large amounts of data such as vendors 
supporting the downloading of movies, cloud storage vendors, and 
movement of other large content and streaming. The *first* step in 
being able to realize those gains is in removing the low hanging 
fruit of bottlenecks in that path.  The lowest hanging fruit is the 
peering points.  Changing those should introduce no new problems 
as the peering points aren't currently the source of MTU path 
discovery problems and increasing the MTU removes a discovery issue 
point, only reducing the MTU would create one.


On the contrary.  You're proposing to fuck around with the one place 
on the whole Internet that has pretty clear and well adhered-to rules 
and expectations about MTU size supported by participants, and 
basically re-live the problems from MAE-East and other shared 
Ethernet/FDDI platforms with mismatching MTU sizes brought us during 
their existence.




In transitioning from SONET to Ethernet, we are actually reducing
potential performance by reducing the effective MTU from 4000 to 2000.
So even increasing bandwidth is of no use if you are potentially
reducing performance end to end by reducing the effective maximum MTU of
the path.


These performance gains are minimal at best, and probably completely 
offset by the delays introduced by the packet loss that the probing 
will cause for any connection that doesn't live close to forever.


I'm not even going to bother commenting on your research link from 
production traffic in *1998*.



-- Niels.

--
It's amazing what people will do to get their name on the internet, 
 which is odd, because all you really need is a Blogspot account.

-- roy edroso, alicublog.blogspot.com



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 On the contrary.  You're proposing to fuck around with the one place
 on the whole Internet that has pretty clear and well adhered-to rules
 and expectations about MTU size supported by participants, and
 basically re-live the problems from MAE-East and other shared
 Ethernet/FDDI platforms with mismatching MTU sizes brought us during
 their existence.

Ok, there is another alternative.  Peering points could offer a 1500
byte VLAN and a 9000 byte VLAN on existing peering points, and all new
ones could be 9000 from the start.  Then there is no fucking around with
anything.  You show up to the new peering point, your MTU is 9000, you
are done.  No messing with anything.

Only SHORTENING MTUs in the middle causes PMTU problems.  Increasing
them does not.  And someone attempting to send frames larger than 1500
right now would see only a decrease in PMTU issues from such an increase
in MTU at the peering points, not an increase of issues.



 
 These performance gains are minimal at best, and probably completely
 offset by the delays introduced by the packet loss that the probing
 will cause for any connection that doesn't live close to forever.

Huh?  You don't need to do probing.  You can simply operate in passive
mode.  Also, even if using active probing mode, the probing stops once
the MTU is discovered.  In passive mode there is no probing at all
unless you hit a black hole.

And the performance improvements are minimal, I suppose, if you
consider going from a maximum of 6.5 Mbps for a transfer from LA to NY
to 40 Mbps for the same transfer to be minimal.

From one of the earlier linked documents:

(quote)
Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is
about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of
1500 bytes (MSS of 1460), TCP throughput will have an upper bound of
about 6.5 Mbps! And no, that is not a window size limitation, but rather
one based on TCP's ability to detect and recover from congestion (loss).
With 9000 byte frames, TCP throughput could reach about 40 Mbps.

Or let's look at that example in terms of packet loss rates. Same round
trip time, but let's say we want to achieve a throughput of 500 Mbps
(half a gigabit). To do that with 9000 byte frames, we would need a
packet loss rate of no more than 1x10^-5. With 1500 byte frames, the
required packet loss rate is down to 2.8x10^-7! While the jumbo frame is
only 6 times larger, it allows us the same throughput in the face of 36
times more packet loss.
(end quote)

So if you consider a 5x performance boost to be minimal, yeah, I guess.
Or being able to operate at today's transfer rates in the face of 36x
more packet loss to be a minimal improvement, I suppose.
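The quoted NY-LA figures can be sanity-checked against the Mathis et al. approximation cited elsewhere in this thread (throughput <= ~0.7 * MSS / (rtt * sqrt(loss))). A minimal sketch, assuming MSS = MTU minus 40 bytes of TCP/IP header:

```python
# Sanity check of the quoted numbers using the Mathis approximation.
# Assumption: MSS = MTU - 40 bytes of TCP/IP headers; rtt in seconds.
from math import sqrt

def mathis_throughput_bps(mtu_bytes, rtt_s, loss):
    mss_bits = (mtu_bytes - 40) * 8
    return 0.7 * mss_bits / (rtt_s * sqrt(loss))

for mtu in (1500, 9000):
    bps = mathis_throughput_bps(mtu, 0.040, 0.001)
    print(f"MTU {mtu}: ~{bps / 1e6:.1f} Mbps")
# MTU 1500: ~6.5 Mbps
# MTU 9000: ~39.7 Mbps
```

The results match the quote's ~6.5 Mbps and ~40 Mbps bounds for a 40 ms RTT at 0.1% loss.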





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 So if you consider a 5x performance boost to be minimal, yeah, I
guess.
 Or being able to operate at today's transfer rates in the face of 36x
 more packet loss to be a minimal improvement, I suppose.


And those improvements in performance get larger the longer the latency
of the connection.  For transit from US to APAC or Europe, the
improvement would be even greater.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Mark Smith
On Sat, 06 Nov 2010 11:45:01 -0500
Jack Bates jba...@brightok.net wrote:

 On 11/5/2010 5:32 PM, Scott Weeks wrote:
 
  It's really quiet in here.  So, for some Friday fun let me whap at the 
  hornets nest and see what happens...;-)
 
 
  http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
 
 
 SCTP is a great protocol. It has already been implemented in a number of 
 stacks. Even with those benefits over that theory, it still hasn't become 
 mainstream yet. People are against change. They don't want to leave v4. 
 They don't want to leave tcp/udp. Technology advances, but people will 
 only change when they have to.
 

Lack of SCTP uptake has nothing to do with people's avoidance of
change - it's likely that Linux kernels deployed in the last 3 to 5
years already have it compiled in. IPv4 NAT is what has prevented it
from being deployed, because NATs don't understand it and therefore
can't NAT addresses carried within it.

This is one of the reasons why NAT is bad for the Internet - it has
prevented deployment and/or utilisation of new transport protocols, such
as SCTP or DCCP, that provide benefits over UDP or TCP.
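The point about NAT can be made concrete: a NAT rewrites addresses and ports based on the IP protocol number and each transport's own header layout, so a box that only understands TCP (protocol 6) and UDP (protocol 17) has no idea what to do with an SCTP (132) or DCCP (33) packet. A small illustrative sketch using Python's socket constants (the SCTP constant may be absent on platforms without SCTP support, hence the `getattr` fallback):

```python
# Why a TCP/UDP-only NAT drops SCTP: the transports are distinguished
# by IP protocol number, and a NAT must parse each transport's header
# to rewrite ports. Unknown protocol number => no translation state.
import socket

for name in ("IPPROTO_TCP", "IPPROTO_UDP", "IPPROTO_SCTP"):
    proto = getattr(socket, name, None)  # SCTP constant may be missing on some platforms
    print(name, proto)
```

On a typical Linux host this prints 6, 17, and 132 respectively.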

 
 Jack (lost brain cells actually reading that pdf)
 

Glad I haven't then, just the quotes from it hurt.

Regards,
Mark.




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/6/2010 7:21 PM, George Bonser wrote:


(quote)
Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is
about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of
1500 bytes (MSS of 1460), TCP throughput will have an upper bound of
about 6.5 Mbps! And no, that is not a window size limitation, but rather
one based on TCP's ability to detect and recover from congestion (loss).
With 9000 byte frames, TCP throughput could reach about 40 Mbps.


I prefer much less packet loss in a majority of my transmissions, which 
in turn brings those numbers closer together.



Jack



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Matthew Petach
On Sat, Nov 6, 2010 at 5:21 PM, George Bonser gbon...@seven.com wrote:
...
 (quote)
 Let's take an example: New York to Los Angeles. Round Trip Time (rtt) is
 about 40 msec, and let's say packet loss is 0.1% (0.001). With an MTU of
 1500 bytes (MSS of 1460), TCP throughput will have an upper bound of
 about 6.5 Mbps! And no, that is not a window size limitation, but rather
 one based on TCP's ability to detect and recover from congestion (loss).
 With 9000 byte frames, TCP throughput could reach about 40 Mbps.

I'd like to order a dozen of those 40ms RTT LA to NYC wavelengths, please.

If you could just arrange a suitable demonstration of packet-level delivery
time of 40ms from Los Angeles to New York and back, I'm sure there would
be a *long* line of people behind me, checks in hand.^_^

Matt



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 
 I prefer much less packet loss in a majority of my transmissions,
which
 in turn brings those numbers closer together.
 
 
 Jack

True, though the fact that it greatly reduces the number of packets in
flight for a given amount of data is a large benefit in itself,
particularly over high latency connections.  Considering throughput =
~0.7 * MSS / (rtt * sqrt(packet_loss)) (from
http://sd.wareonearth.com/~phil/jumbo.html) and that packet loss to
places such as China is often greater than zero, the benefits of an
increased PMTU become obvious.  Increase that latency from 20ms to
200ms and the benefits of a larger MSS become even clearer.
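A rough sketch of how that bound scales with latency, using the same approximation and assuming a fixed 0.1% loss rate; the MSS values (1460 and 8960, i.e. MTU minus 40 bytes of headers) are illustrative:

```python
# How the Mathis bound, ~0.7 * MSS / (rtt * sqrt(loss)), scales with
# latency for standard vs jumbo MSS at a fixed 0.1% loss rate.
from math import sqrt

def bound_mbps(mss_bytes, rtt_s, loss):
    return 0.7 * mss_bytes * 8 / (rtt_s * sqrt(loss)) / 1e6

for rtt_ms in (20, 200):
    for mss in (1460, 8960):
        print(f"rtt {rtt_ms} ms, MSS {mss}: ~{bound_mbps(mss, rtt_ms / 1000, 0.001):.1f} Mbps")
# rtt 20 ms, MSS 1460: ~12.9 Mbps
# rtt 20 ms, MSS 8960: ~79.3 Mbps
# rtt 200 ms, MSS 1460: ~1.3 Mbps
# rtt 200 ms, MSS 8960: ~7.9 Mbps
```

The relative gain from the larger MSS is constant, but the absolute throughput lost to a small MSS grows as the path gets longer, which is the point being made about trans-Pacific and trans-Atlantic transit.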

The only real argument here against changing existing peering points is
that all peers must have the same MTU.  So far I haven't heard any real
argument against it for a new peering point which is starting from a
green field. It isn't going to change how anyone's network behaves
internally and increasing MTU doesn't produce PMTU issues for transit
traffic.  

It just seems a shame that two servers with FDDI interfaces using SONET
long haul are going to perform much better on a coast-to-coast transfer
than a pair with GigE using Ethernet long haul, simply because of the
MTU issue.  Increasing the bandwidth of a path to GigE shouldn't result
in reduced performance, but in this case it would.

At least one peering point provider has offered to create a jumbo VLAN
for experimentation.




RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser
 I'd like to order a dozen of those 40ms RTT LA to NYC wavelengths,
 please.
 
 If you could just arrange a suitable demonstration of packet-level
 delivery
 time of 40ms from Los Angeles to New York and back, I'm sure there
 would
 be a *long* line of people behind me, checks in hand.^_^
 
 Matt

Yeah, he must have goofed on that.  The 40ms must be the one-way time,
not the RTT.  I get a pretty consistent 80ms to NY from California.
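A back-of-envelope propagation check supports the ~80 ms figure. Assuming light travels at roughly two-thirds of c in fiber (~200,000 km/s) and an LA-NY great-circle distance of about 3,950 km (both assumptions, not measured values), 40 ms is essentially the round-trip physical floor; real fiber routes are considerably longer, and equipment adds delay:

```python
# Round-trip propagation floor: 2 * path length / signal speed in fiber.
# Assumes ~200,000 km/s (about 2/3 c) in fiber; distances are estimates.
def min_rtt_ms(path_km, fiber_km_per_s=200_000):
    return 2 * path_km / fiber_km_per_s * 1000

print(round(min_rtt_ms(3950), 1))  # great-circle lower bound: 39.5 ms
print(round(min_rtt_ms(6000), 1))  # a more realistic fiber path: 60.0 ms
```

So a consistent 80 ms RTT from California to NY is entirely plausible, while 40 ms would require a straight-line vacuum-adjacent path.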




Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Niels Bakker

* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]:

It just seems a shame that two servers with FDDI interfaces using SONET


Earth to George Bonser: IT IS NOT 1998 ANYMORE.


-- Niels.



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread Jack Bates

On 11/6/2010 10:31 PM, Niels Bakker wrote:

* gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]:

It just seems a shame that two servers with FDDI interfaces using SONET


Earth to George Bonser: IT IS NOT 1998 ANYMORE.



We don't fly SR-71s or use bigger MTU interfaces. Get with the times! :)


Jack



RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser


 -Original Message-
 From: Niels Bakker [mailto:niels=na...@bakker.net]
 Sent: Saturday, November 06, 2010 8:32 PM
 To: nanog@nanog.org
 Subject: Re: RINA - scott whaps at the nanog hornets nest :-)
 
 * gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]:
 It just seems a shame that two servers with FDDI interfaces using
 SONET
 
 Earth to George Bonser: IT IS NOT 1998 ANYMORE.

Exactly my point.  Why should we adopt newer technology while using
configuration parameters that degrade performance?

1500 was designed for thick net.  It is absolutely stupid to use it for
GigE or higher speeds and I do mean absolutely idiotic.  It is going
backwards in performance.  No wonder there is still so much transport
using SONET.  Using Ethernet reduces your effective performance over
long distance paths.





RE: RINA - scott whaps at the nanog hornets nest :-)

2010-11-06 Thread George Bonser


  * gbon...@seven.com (George Bonser) [Sun 07 Nov 2010, 04:27 CET]:
  It just seems a shame that two servers with FDDI interfaces using
  SONET
 
  Earth to George Bonser: IT IS NOT 1998 ANYMORE.
 
 Exactly my point.  Why should we adopt newer technology while using
 configuration parameters that degrade performance?
 
 1500 was designed for thick net.  It is absolutely stupid to use it
for
 GigE or higher speeds and I do mean absolutely idiotic.  It is going
 backwards in performance.  No wonder there is still so much transport
 using SONET.  Using Ethernet reduces your effective performance over
 long distance paths.
 
 

And by that I mean using 1500 MTU is what degrades the performance, not
the ethernet physical transport.  Using MTU 9000 would give you better
performance than SONET.  That is why Internet2 pushes so hard for people
to use the largest possible MTU and the suggested MINIMUM is 9000.





Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Mark Smith
On Fri, 5 Nov 2010 15:32:30 -0700
Scott Weeks sur...@mauigateway.com wrote:

 
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)
 
 
 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
 

Whoever wrote that doesn't know what they're talking about. LISP is
not the IETF's proposed solution (the IETF don't have one, the IRTF do),
and streaming media was seen to be one of the early applications of the
Internet - these types of applications are why TCP was split out of
IP, why UDP was invented, and why UDP has a significantly
different protocol number to TCP.

 --
 NAT is your friend
 
 IP doesn’t handle addressing or multi-homing well at all
 
 The IETF’s proposed solution to the multihoming problem is 
 called LISP, for Locator/Identifier Separation Protocol. This
 is already running into scaling problems, and even when it works,
 it has a failover time on the order of thirty seconds.
 
 TCP and IP were split the wrong way
 
 IP lacks an addressing architecture
 
 Packet switching was designed to complement, not replace, the telephone 
 network. IP was not optimized to support streaming media, such as voice, 
 audio broadcasting, and video; it was designed to not be the telephone 
 network.
 --
 
 
 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.
 
 I can hear the angry hornets buzzing already.  :-)
 
 scott



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Richard A Steenbergen
On Fri, Nov 05, 2010 at 03:32:30PM -0700, Scott Weeks wrote:
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)

Arguments about locator/identifier splits aside (which I happen to agree 
with), this thing goes off the deep end on page 7 when it starts talking 
about peering infrastructure. In fact pretty much every sentence on that 
page is blatantly wrong. :)

-- 
Richard A Steenbergen r...@e-gerbil.net   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Scott Weeks


--- na...@85d5b20a518b8f6864949bd940457dc124746ddc.nosense.org wrote:
From: Mark Smith na...@85d5b20a518b8f6864949bd940457dc124746ddc.nosense.org

 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf

Whoever wrote that doesn't know what they're talking about. LISP is
not the IETF's proposed solution (the IETF don't have one, the IRTF do),
and streaming media was seen to be one of the early applications of the
Internet - these types of applications are why TCP was split out of
IP, why UDP was invented, and why UDP has a significantly
different protocol number to TCP.
--


That's interesting, I wasn't aware of that.  I will look into that bit of 
history just for fun.

Getting over misstated things like you've pointed out, what do you think of the 
concept?

scott



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Scott Weeks


--- r...@e-gerbil.net wrote:
From: Richard A Steenbergen r...@e-gerbil.net
On Fri, Nov 05, 2010 at 03:32:30PM -0700, Scott Weeks wrote:

 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)

Arguments about locator/identifier splits aside (which I happen to agree 
with), this thing goes off the deep end on page 7 when it starts talking 
about peering infrastructure. In fact pretty much every sentence on that 
page is blatantly wrong. :)



On re-reading it, I understand what you're saying, but the concept seems to 
have merit.  Were you able to get past the mis-statements and get to the meat 
of the paper?  It's a concept, not running code, but very interesting.

scott



Re: RINA - scott whaps at the nanog hornets nest :-)

2010-11-05 Thread Marshall Eubanks

On Nov 5, 2010, at 7:26 PM, Mark Smith wrote:

 On Fri, 5 Nov 2010 15:32:30 -0700
 Scott Weeks sur...@mauigateway.com wrote:
 
 
 
 It's really quiet in here.  So, for some Friday fun let me whap at the 
 hornets nest and see what happens...  ;-)
 
 
 http://www.ionary.com/PSOC-MovingBeyondTCP.pdf
 
 
 Whoever wrote that doesn't know what they're talking about. LISP is
 not the IETF's proposed solution (the IETF don't have one, the IRTF do),

Um, I would not agree. The IRTF RRG considered and is documenting a lot of 
things, but did not
come to any consensus as to which one should be a proposed solution.

Regards
Marshall


 and streaming media was seen to be one of the early applications of the
 Internet - these types of applications are why TCP was split out of
 IP, why UDP was invented, and why UDP has a significantly
 different protocol number to TCP.
 
 --
 NAT is your friend
 
 IP doesn’t handle addressing or multi-homing well at all
 
 The IETF’s proposed solution to the multihoming problem is 
 called LISP, for Locator/Identifier Separation Protocol. This
 is already running into scaling problems, and even when it works,
 it has a failover time on the order of thirty seconds.
 
 TCP and IP were split the wrong way
 
 IP lacks an addressing architecture
 
 Packet switching was designed to complement, not replace, the telephone 
 network. IP was not optimized to support streaming media, such as voice, 
 audio broadcasting, and video; it was designed to not be the telephone 
 network.
 --
 
 
 And so, ...the first principle of our proposed new network architecture: 
 Layers are recursive.
 
 I can hear the angry hornets buzzing already.  :-)
 
 scott