Re: 923Mbits/s across the ocean

2003-03-15 Thread William Allen Simpson

[just discovered in my unsent messages queue from offline composition, 
probably not timely, but...]

Iljitsch van Beijnum wrote:
 
 We can't replace path MTU discovery (but hopefully people will start to
 realize ICMP messages were invented for another reason than job security
 for firewalls). But what we need is a way for 10/100 Mbps 1500 byte
 hosts to live with 1000 Mbps 9000 byte hosts on the same subnet. I
 thought IPv6 neighbor discovery supported this because ND can
 communicate the MTU between hosts on the same subnet, but unfortunately
 this is a subnet-wide MTU and not a per-host MTU, which is what we
 really need.
 
A decade ago, when I designed SIPP Neighbor Discovery, it saved the per-
destination maximum unfragmented datagram size in the route cache, 
and each I-Am-Here message heard specified a Maximum Receive Unit (MRU) 
per host.  Thus, once upon a time, IPv6 had what you need.  
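
A rough sketch of that per-destination idea (hypothetical names in Python, 
purely illustrative; this is not SIPP or any shipping ND code):

    # Hypothetical sketch: a route cache keyed by destination that remembers the
    # MRU each neighbor announced, so datagrams are sized per host, not per subnet.
    LINK_MTU = 9000          # what our own interface can transmit
    DEFAULT_MRU = 1500       # assume the worst until a neighbor announces more

    route_cache = {}         # destination address -> advertised MRU

    def on_i_am_here(src_addr, advertised_mru):
        # called when a neighbor announcement carrying an MRU is heard
        route_cache[src_addr] = advertised_mru

    def max_datagram_size(dst_addr):
        # largest unfragmented datagram we should send to this destination
        return min(LINK_MTU, route_cache.get(dst_addr, DEFAULT_MRU))

    on_i_am_here("fe80::1", 9000)
    print(max_datagram_size("fe80::1"))   # 9000: jumbo-capable neighbor
    print(max_datagram_size("fe80::2"))   # 1500: unknown neighbor, be conservative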

Unfortunately, the IPv6 group stripped out such innovative features.  
I stopped paying attention after the new editor stated something like 
"it worked for ethernet, we really don't need any more than that."

Well, we used IPv4 from '83, and designed SIPP (cum IPv6) in '93.  

IPv6 is a failure -- maybe it's time for this decade's design?  

Or maybe even some of the features some of us thought we needed a 
decade ago?
-- 
William Allen Simpson
Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32


Re: 923Mbits/s across the ocean

2003-03-11 Thread Iljitsch van Beijnum

On Mon, 10 Mar 2003, Richard A Steenbergen wrote:

   On the receive side, the socket buffers must be large enough to
   accommodate all the data received between application read()'s,

  That's not true. It's perfectly acceptable for TCP to stall when the
  receiving application fails to read the data fast enough.

 Ok, I think I was unclear. You don't NEED to have buffers large enough to
 accommodate all that data received between application read()'s, unless
 you are trying to achieve maximum performance. I thought that was the
 general framework we were all working under. :)

You got me there.  :-)

It seemed that you were talking about more general requirements at this
point, though, what with the upper and lower limits for kernel buffer space
and all.

  Hm, I don't see this happening to a usable degree as TCP has no concept
  of records. You really want to use fixed size chunks of information here
  rather than pretending everything's a stream.

 We're talking optimizations for high performance transfers... It can't
 always be a stream.

Right. But TCP is a stream protocol. This has many advantages, nearly
all of which are irrelevant for high volume high bandwidth bulk data
transfer.

I can imagine a system that only works in one direction and where the
data is split into fixed size records (which would ideally fit into a
single packet) where each record is acknowledged independently (but
certainly not for each individual packet). I would also want to take
advantage of traffic classification mechanisms: first the data is
flooded at the maximum speed at the lowest possible traffic class.
Everything that doesn't make it to the other end is then resent slower
with a higher traffic class. If the network supports priority queuing
then this would effectively sponge up all free bandwidth without
impacting regular interactive traffic. If after a few retries some data
still didn't make it: simply skip this for now (but keep a record of the
missing bits) and keep going. Many applications can live with some lost
data and for others it's probably more efficient to keep running at high
speed and repair the gaps afterwards.
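
Something like this, as a very rough sketch (made-up names, Python used only 
for illustration; the send/ack primitives are assumptions, not real APIs):

    # Flood fixed-size records at the lowest traffic class, collect per-record
    # acks, resend survivors at a higher class, and give up after a few rounds
    # while remembering the gaps for later repair.
    MAX_RETRIES = 3

    def send_dataset(records, link):
        missing = set(range(len(records)))
        traffic_class = 0                    # lowest priority first
        for _ in range(MAX_RETRIES + 1):
            for rid in sorted(missing):
                link.send(rid, records[rid], traffic_class)
            missing -= link.collect_acks()   # acks are per record, not per packet
            if not missing:
                return set()
            traffic_class += 1               # resend slower, but better protected
        return missing                       # caller repairs these gaps afterwards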

   IMHO the 1500 byte MTU of ethernet
   will still continue to prevent good end to end performance like this for a
   long time to come. But alas, I digress...

  Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
  to support a per-neighbor MTU? This should make backward-compatible
  adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while
  we're at it.)

 Not necessarily sure that's the right thing to do, but SOMETHING has got to
 be better than what passes for path MTU discovery now. :)

We can't replace path MTU discovery (but hopefully people will start to
realize ICMP messages were invented for another reason than job security
for firewalls). But what we need is a way for 10/100 Mbps 1500 byte
hosts to live with 1000 Mbps 9000 byte hosts on the same subnet. I
thought IPv6 neighbor discovery supported this because ND can
communicate the MTU between hosts on the same subnet, but unfortunately
this is a subnet-wide MTU and not a per-host MTU, which is what we
really need.

Iljitsch



Re: 923Mbits/s across the ocean

2003-03-11 Thread Stephen Sprunk

Thus spake Iljitsch van Beijnum [EMAIL PROTECTED]
 This is the part about TCP that I've never understood: why does it
 send large numbers of packets back-to-back? This is almost never a
 good idea.

Because until you congest the network to the point of dropping packets, a
host has no idea how much bw is actually available.  Exponential rate
growth finds this value very quickly.
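
A toy illustration of how quickly the doubling gets there (numbers are 
illustrative only):

    # Slow start: the congestion window roughly doubles every RTT until loss
    # reveals the available bandwidth.
    mss = 1500 * 8            # bits per 1500-byte segment
    rtt = 0.150               # ~150 ms transatlantic path
    bottleneck = 1e9          # 1 Gbit/s

    cwnd, rtts = 2, 0         # segments, round trips
    while cwnd * mss / rtt < bottleneck:
        cwnd *= 2
        rtts += 1
    print("%d RTTs (~%.1f s) to reach 1 Gbit/s" % (rtts, rtts * rtt))   # 13 RTTs, ~2 s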

 Hm, I don't see this happening to a usable degree as TCP has no
 concept of records. You really want to use fixed size chunks of
 information here rather than pretending everything's a stream.

A record-oriented, reliable transport would make many protocols much easier
to implement.  Too bad SCTP hasn't seen wider use.

S

Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723             "God is an inveterate gambler, and He throws the
K5SSS                   dice at every possible opportunity." --Stephen Hawking



Re: 923Mbits/s across the ocean

2003-03-10 Thread Douglas F. Calvert

On Sat, 2003-03-08 at 15:58, [EMAIL PROTECTED] wrote:
 That's the argument that the Pentagon used to justify buying $40 lightbulbs. 
 Does not work, sorry.

That is not the argument used to justify buying $40 lightbulbs. They do
not actually purchase $40 lightbulbs; the prices that you see in rag
magazine reports have to do with how the budgets are handled. If you can
budget a multi-billion dollar organization and put in reasonable price
and performance controls, there are many schools that would hire you
after you revolutionized public administration and the DoD...


-- 
Douglas F. Calvert [EMAIL PROTECTED]


Re: 923Mbits/s across the ocean

2003-03-10 Thread Iljitsch van Beijnum

On Sun, 9 Mar 2003, Richard A Steenbergen wrote:

 On the send side, the application transmitting is guaranteed to utilize
 the buffers immediately (ever seen a huge jump in speed at the beginning
 of a transfer? That is the local buffer being filled, and the application
 has no way to know whether this data is going out to the wire or just to the
 kernel). Then the network must drain the packets onto the wire, sometimes
 very slowly (think about a dialup user downloading from your GigE server).

Actually this is often way too fast as the congestion window doubles
with each ACK. This means that with a large buffer = large window and a
bottleneck somewhere along the way, you are almost guaranteed to have
some serious congestion in the early stages of the session and lower
levels of congestion periodically later on whenever TCP tries to figure
out how large the congestion window can get without losing packets.

This is the part about TCP that I've never understood: why does it send
large numbers of packets back-to-back? This is almost never a good idea.

 On the receive side, the socket buffers must be large enough to
 accommodate all the data received between application read()'s,

That's not true. It's perfectly acceptable for TCP to stall when the
receiving application fails to read the data fast enough. (TCP then
simply announces a window of 0 to the other side so the communication
effectively stops until the application reads some data and a non-zero
window is announced.) If not, the kernel would be required to buffer unlimited
amounts of data in the event an application fails to read it from the
buffer for some time (which is a very common situation).
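
To see that stall locally, a quick Python sketch (illustrative only; the port 
number and buffer sizes are arbitrary, and the timeout just makes the demo end 
instead of blocking forever):

    import socket, threading, time

    def lazy_receiver(port):
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
        srv.bind(("127.0.0.1", port)); srv.listen(1)
        conn, _ = srv.accept()
        time.sleep(30)                    # accept the connection but never read()

    threading.Thread(target=lazy_receiver, args=(5001,), daemon=True).start()
    time.sleep(0.2)

    c = socket.socket()
    c.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
    c.connect(("127.0.0.1", 5001))
    c.settimeout(2)
    sent = 0
    try:
        while True:
            sent += c.send(b"x" * 1024)   # fills the peer's window, then our buffer
    except socket.timeout:
        print("send() stalled after", sent, "bytes: the peer is advertising window 0")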

 locally. Jumbo frames help too, but their real benefit is not the
 simplistic "hey look, there's 1/3rd the number of frames/sec" view that many
 people see. The good stuff comes from techniques like page flipping, where
 the NIC DMAs data into a memory page which can be flipped through the
 system straight to the application, without copying it throughout. Some
 day TCP may just be implemented on the NIC itself, with ALL work
 offloaded, and the system doing nothing but receiving nice page-sized
 chunks of data at high rates of speed.

Hm, I don't see this happening to a usable degree as TCP has no concept
of records. You really want to use fixed size chunks of information here
rather than pretending everything's a stream.

 IMHO the 1500 byte MTU of ethernet
 will still continue to prevent good end to end performance like this for a
 long time to come. But alas, I digress...

Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
to support a per-neighbor MTU? This should make backward-compatible
adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while
we're at it.)

Iljitsch van Beijnum



Re: 923Mbits/s across the ocean

2003-03-10 Thread Richard A Steenbergen

On Tue, Mar 11, 2003 at 12:41:15AM +0100, Iljitsch van Beijnum wrote:
  On the receive side, the socket buffers must be large enough to
  accommodate all the data received between application read()'s,
 
 That's not true. It's perfectly acceptable for TCP to stall when the
 receiving application fails to read the data fast enough. (TCP then
 simply announces a window of 0 to the other side so the communication
 effectively stops until the application reads some data and a non-zero
 window is announced.) If not, the kernel would be required to buffer unlimited
 amounts of data in the event an application fails to read it from the
 buffer for some time (which is a very common situation).

Ok, I think I was unclear. You don't NEED to have buffers large enough to
accommodate all that data received between application read()'s, unless
you are trying to achieve maximum performance. I thought that was the
general framework we were all working under. :)

  locally. Jumbo frames help too, but their real benefit is not the
  simplistic "hey look, there's 1/3rd the number of frames/sec" view that many
  people see. The good stuff comes from techniques like page flipping, where
  the NIC DMAs data into a memory page which can be flipped through the
  system straight to the application, without copying it throughout. Some
  day TCP may just be implemented on the NIC itself, with ALL work
  offloaded, and the system doing nothing but receiving nice page-sized
  chunks of data at high rates of speed.
 
 Hm, I don't see this happening to a usable degree as TCP has no concept
 of records. You really want to use fixed size chunks of information here
 rather than pretending everything's a stream.

We're talking optimizations for high performance transfers... It can't 
always be a stream.

  IMHO the 1500 byte MTU of ethernet
  will still continue to prevent good end to end performance like this for a
  long time to come. But alas, I digress...
 
 Don't we all? I'm afraid you're right. Anyone up for modifying IPv6 ND
 to support a per-neighbor MTU? This should make backward-compatible
 adoption of jumboframes a possibility. (Maybe retrofit ND into v4 while
 we're at it.)

Not necessarily sure that's the right thing to do, but SOMETHING has got to 
be better than what passes for path MTU discovery now. :)

-- 
Richard A Steenbergen [EMAIL PROTECTED]   http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)


Re: 923Mbits/s across the ocean

2003-03-09 Thread Iljitsch van Beijnum

On Sat, 8 Mar 2003, Joe St Sauver wrote:

 you will see that for bulk TCP flows, the median throughput is still only
 2.3Mbps. 95th%-ile is only ~9Mbps. That's really not all that great,
 throughput wise, IMHO.

Strange. Why is that? RFC 1323 is widely implemented, although not
widely enabled (and for good reason: the timestamp option kills header
compression so it's bad for lower-bandwidth connections). My guess is
that the OS can't afford to throw around MB+ size buffers for every TCP
session so the default buffers (which limit the windows that can be
used) are relatively small and application programmers don't override
the default.
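
For what it's worth, overriding the default is a couple of lines in the 
application (Python used here purely as illustration; the kernel may still 
clamp the request to its own per-system maximum, which is the knob only a 
sysadmin can raise):

    import socket

    WINDOW = 4 * 1024 * 1024     # ask for ~4 MB of socket buffer
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # must be set before connect()/listen() so the RFC 1323 window scale
    # factor negotiated on the SYN is large enough
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
    s.connect(("example.net", 5001))    # placeholder peer
    print("effective rcvbuf:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))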



RE: 923Mbits/s across the ocean

2003-03-09 Thread Cottrell, Les

Also as the OS's are shipped they come with small default maximum window sizes (I 
think Linux is typically 64KB and Solaris is 8K), and so one has to get the sysadmin 
with root privs to change this. 

-Original Message-
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED] 
Sent: Sunday, March 09, 2003 5:25 AM
To: Joe St Sauver
Cc: [EMAIL PROTECTED]
Subject: Re: 923Mbits/s across the ocean



On Sat, 8 Mar 2003, Joe St Sauver wrote:

 you will see that for bulk TCP flows, the median throughput is still 
 only 2.3Mbps. 95th%-ile is only ~9Mbps. That's really not all that 
 great, throughput wise, IMHO.

Strange. Why is that? RFC 1323 is widely implemented, although not widely enabled (and 
for good reason: the timestamp option kills header compression so it's bad for 
lower-bandwidth connections). My guess is that the OS can't afford to throw around MB+ 
size buffers for every TCP session so the default buffers (which limit the windows 
that can be
used) are relatively small and application programmers don't override the default.


Re: 923Mbits/s across the ocean

2003-03-09 Thread David G. Andersen

On Sun, Mar 09, 2003 at 02:25:25PM +0100, Iljitsch van Beijnum quacked:
 
 On Sat, 8 Mar 2003, Joe St Sauver wrote:
 
  you will see that for bulk TCP flows, the median throughput is still only
  2.3Mbps. 95th%-ile is only ~9Mbps. That's really not all that great,
  throughput wise, IMHO.
 
 Strange. Why is that? RFC 1323 is widely implemented, although not
 widely enabled (and for good reason: the timestamp option kills header
 compression so it's bad for lower-bandwidth connections). My guess is
 that the OS can't afford to throw around MB+ size buffers for every TCP
 session so the default buffers (which limit the windows that can be
 used) are relatively small and application programmers don't override
 the default.

  Which makes it doubly a shame that the adaptive buffer tuning
tricks haven't made it into production systems yet.  It was
a beautiful, simple idea that worked very well for adapting to
long fat networks:

  http://www.acm.org/sigcomm/sigcomm98/tp/abs_26.html

  -dave

-- 
work: [EMAIL PROTECTED]  me:  [EMAIL PROTECTED]
  MIT Laboratory for Computer Science   http://www.angio.net/
  I do not accept unsolicited commercial email.  Do not spam me.


Re: 923Mbits/s across the ocean

2003-03-09 Thread Richard A Steenbergen

On Sun, Mar 09, 2003 at 08:29:16AM -0800, Cottrell, Les wrote:
 
  Strange. Why is that? RFC 1323 is widely implemented, although not
  widely enabled (and for good reason: the timestamp option kills header
  compression so it's bad for lower-bandwidth connections). My guess is
  that the OS can't afford to throw around MB+ size buffers for every TCP
  session so the default buffers (which limit the windows that can be
  used) are relatively small and application programmers don't override
  the default.

 Also as the OS's are shipped they come with small default maximum window
 sizes (I think Linux is typically 64KB and Solaris is 8K), and so one
 has to get the sysadmin with root privs to change this.

This is related to how the kernel/user model works in relation to TCP.  
TCP itself happens in the kernel, but the data comes from userland through 
the socket interface, so there is a socket buffer in the kernel which 
holds data coming from and going to the application. TCP cannot release 
data from it's buffer until it has been acknowledged by the other side, 
incase it needs to retransmit. This means TCP performance is limited by 
the smaller of either the congestion window (determined by measuring 
conditions along the path), or the send/recv window (determined by local 
system resources).
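
Back of the envelope, that limit is roughly min(cwnd, send window, receive 
window) / RTT (illustrative numbers, assuming a ~150 ms RTT):

    def tcp_throughput_mbps(window_bytes, rtt_s):
        return window_bytes * 8 / rtt_s / 1e6

    rtt = 0.150
    for win in (64 * 1024, 1024 * 1024, 40 * 1024 * 1024):
        print("%9d byte window -> %7.1f Mbit/s" % (win, tcp_throughput_mbps(win, rtt)))
    #    65536 byte window ->     3.5 Mbit/s   (the classic no-window-scaling ceiling)
    #  1048576 byte window ->    55.9 Mbit/s
    # 41943040 byte window ->  2237.0 Mbit/s   (headroom for ~1 Gbit/s)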

However, you can't just blindly turn up your socket buffers to large 
values and expect good results.

On the send side, the application transmitting is guaranteed to utilize 
the buffers immediately (ever seen a huge jump in speed at the beginning 
of a transfer? That is the local buffer being filled, and the application 
has no way to know whether this data is going out to the wire or just to the 
kernel). Then the network must drain the packets onto the wire, sometimes 
very slowly (think about a dialup user downloading from your GigE server). 
Setting the socket buffers too high can potentially result in an 
incredible waste of resources, and can severely limit the number of 
simultaneous connections your server can support. This is precisely why 
OS's cannot ship with huge default values, because what may be appropriate 
for your one-user GigE connected box might not be appropriate for someone 
else's 100BASE-TX web server (and guess which setup has more users :P).
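
The arithmetic behind that tradeoff (purely illustrative numbers):

    # Socket buffer memory is per connection and not swappable, so a "modest"
    # 1 MB send buffer gets expensive fast.
    buf = 1024 * 1024
    for conns in (100, 10000):
        print("%6d connections x 1 MB = %6.1f GB of kernel memory"
              % (conns, conns * buf / (1024.0 ** 3)))
    #    100 connections x 1 MB =    0.1 GB
    #  10000 connections x 1 MB =    9.8 GB  -- far beyond a typical server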

On the receive side, the socket buffers must be large enough to
accommodate all the data received between application read()'s, as well 
as making sure they have enough available space to hold future data in the 
event of a gap due to loss and the need for retransmission. However, if 
the application fails to read() the data from the socket buffer, it will 
sit there forever. Large socket buffers also open the server up to 
malicious attacks causing non-swappable kernel memory to consume all 
available resources, either locally (by someone dumping data over lots of 
connections, or running an application which intentionally fails to read 
data from the socket buffer), or remotely (think someone opening a bunch 
of rate limited connections from your high speed server). It can even be 
unintentional, but just as bad (think a million confused dialup users 
accidentally clicking on your high speed video stream).

Some of this can be worked around by implementing what is called
auto-tuning socket buffers. In this case, the kernel would limit the
amount of data allowed into the buffer, by looking at the tcp session's 
observed congestion window. This allows you to define large send buffers 
without applications connected to slow receivers sucking up unnecessary 
resources. PSC has had example implementations for quite a while, and 
recently FreeBSD even added this (sysctl net.inet.tcp.inflight_enable=1 as 
of 4.7). Unfortunately, there isn't much you can do to prevent malicious 
receive-side buffer attacks, short of limiting the overall max buffer 
(FreeBSD implements this as an rlimit sbsize).
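
The core of the auto-tuning idea, as a sketch (illustrative only; this is not 
the PSC or FreeBSD code, just the shape of the policy):

    def admit_to_send_buffer(queued_bytes, cwnd_bytes, max_buf_bytes):
        # let the application queue roughly two congestion windows' worth of
        # unsent data, never more than the administrator's hard cap
        target = min(2 * cwnd_bytes, max_buf_bytes)
        return max(0, target - queued_bytes)

    # A socket feeding a dialup receiver (tiny cwnd) is held to a few KB even
    # though the configured maximum is 4 MB:
    print(admit_to_send_buffer(queued_bytes=4096, cwnd_bytes=8192,
                               max_buf_bytes=4 * 1024 * 1024))   # -> 12288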

Of course, you need a few other things before you can start getting into
end to end gigabit speeds. If you're transferring a file, you probably
don't want to be reading it from disk via the kernel just to send it back
to the kernel again for transmission, so various things like sendfile()  
and zero copy implementations help get you the performance you need
locally. Jumbo frames help too, but their real benefit is not the
simplistic "hey look, there's 1/3rd the number of frames/sec" view that many
people see. The good stuff comes from techniques like page flipping, where
the NIC DMAs data into a memory page which can be flipped through the
system straight to the application, without copying it throughout. Some
day TCP may just be implemented on the NIC itself, with ALL work
offloaded, and the system doing nothing but receiving nice page-sized
chunks of data at high rates of speed. IMHO the 1500 byte MTU of ethernet 
will still continue to prevent good end to end performance like this for a 
long time to come. But alas, I digress...
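
The sendfile() path mentioned above looks roughly like this (Python's 
os.sendfile used purely for illustration; the names are placeholders):

    import os

    def serve_file(path, conn):
        # the kernel copies straight from the file's page cache to the socket,
        # skipping the usual read()-into-userland-then-write() round trip
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            offset = 0
            while offset < size:
                offset += os.sendfile(conn.fileno(), f.fileno(), offset, size - offset)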

-- 
Richard A Steenbergen [EMAIL 

Re: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

I am not normally on this list but someone kindly gave me copies of some of the email 
concerning the Internet2 Land Speed record. So I have joined the list.

As one of the PIs of the record, I thought it might be useful to comment on a few 
interesting items I have seen, and no I am not trying to flame anybody:

"Give 'em a million dollars, plus fiber from here to anywhere and let me muck with the 
TCP algorithm, and I can move a GigE worth of traffic too" - Dave

You are modest in your budgetary request. Just the Cisco router (GSR 12406) we had on 
free loan listed at close to a million dollars, and the OC192 links just from 
Sunnyvale to Chicago would have cost what was left of the million per month.

We used a stock TCP (Linux kernel TCP).  We did, however, use jumbo frames (9000 byte 
MTUs).

In response to Richard A Steenbergen: we are not now living in "a tropical foreign 
country, with lots and lots of drugs and women", but then the weather in California is 
great today.

"What am I missing here, there's OC48=2.4Gb, OC192=10Gb ..."

We were running host to host (end-to-end) with a single stream with common off the 
shelf equipment; there are not too many (I think none) >1GE host NICs available today 
that are in production (e.g. without signing a non-disclosure agreement).

"Production commercial networks ... blow away these speeds on a regular basis." 
See the above remark about end-to-end application to application, single stream.

"So, you turn down/off all the parts of TCP that allow you to share bandwidth ..." 
We did not mess with the TCP stack; it was stock off the shelf.

"... Mention that Internet speed records are measured in terabit-meters/sec." 
You are correct, this is important, but reporters want a sound bite and typically only 
focus on one thing at a time. I will make sure next time I talk to a reporter to 
emphasize this. Maybe we can get some mileage out of Petabmps (Peta bit metres per 
second)  sounds

"What kind of production environment needs a single TCP stream of data at 1Gbits/s 
over a 150ms latency link?" 
Today High Energy Particle Physics needs hundreds of Megabits/s between California and 
Europe (Lyon, Padova and Oxford) to deliver data on a timely basis from an experiment 
site at SLAC to regional computer sites in Europe. Today on production academic 
networks (with sustainable rates of 100 to a few hundred Mbits/s) it takes about a day 
to transmit just over a Tbyte of data, which just about keeps up with the data rates. 
The data generation rates are doubling per year, so within 1-3 years we will be needing 
speeds like in the record on a production basis. We needed to ensure we can achieve 
the needed rates, and whether we can do it with off the shelf hardware, how the hosts 
and OS' need configuring, how to tune the TCP stack or how newer stacks perform, what 
are the requirements for jumbo frames etc. Besides High Energy Physics, other sciences 
are beginning to grapple with how to replicate large databases across the globe; such 
sciences include radio-astronomy, human genome, global 
weather, seismic ...
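
As a sanity check on the Tbyte-a-day figure above (illustrative arithmetic only):

    # one TByte per day sustained, expressed in Mbit/s
    print("1 TByte/day = %.0f Mbit/s" % (8e12 / 86400 / 1e6))   # ~93 Mbit/s
    # so ~100 Mbit/s sustained just keeps up today; with volumes doubling each
    # year, the ~1 Gbit/s of the record is needed within a few years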

The spud gun is interesting; given the distances, probably a 747 freighter packed 
with DST tapes or disks is a better idea.  Assuming we fill the 747 with, say, 50 GByte 
tapes (disks would probably be better), then if it takes 10 hours to fly from San 
Francisco (BTW Sunnyvale is near San Francisco, not near LA as one person talking about 
retiring to better weather might lead one to believe) the bandwidth is about 2-4 
Tbits/s. However, this ignores the reality of labelling, writing the tapes, removing 
from the silo robot, packing, getting to the airport, loading, unloading, getting through 
customs etc. In reality the latency is really closer to 2 weeks. Even worse, if there 
is an error (heads not aligned etc.) then the retry latency is long and the effort 
involved considerable.  Also the network solution lends itself much better to 
automation; in our case we saved a couple of full time equivalent people at the 
sending site to distribute the data on a regular basis to our collaborator sites 
in France, UK and Italy.

The remarks about window size and buffer are interesting also.  It is true large 
windows are needed. To approach 1Gbits/s we require 40MByte windows.  If this is going 
to be a problem, then we need to raise questions like this soon and figure out how to 
address (add more memory, use other protocols etc.). In practice to approach 
2.5Gbits/s requires 120MByte windows.

I am quite happy to concede that this does not need to be about some jocks beating a 
record. I do think it is important to catch the public's attention to why high speeds 
are important, that they are achievable today application to application (it would 
also be useful to estimate when such speeds are available to universities, large 
companies, small companies, the home etc.), and for techies it is important to start 
to understand the challenges the high speeds raise, e.g. cpu and router memories, bugs in 
TCP, OS, application etc., new TCP stacks, new (possibly UDP based) protocols such as 
tsunami, need for 64 bit counters in monitoring, effects of the NIC card, jumbo 
requirements etc., and what is needed to address them. Also to try and put it in 
meaningful terms (such as 2 full length DVD movies in a minute, that could also 
increase the cease and desist legal messages shipped ;-)) is important.

Re: 923Mbits/s across the ocean

2003-03-08 Thread E.B. Dreger

LC Date: Sat, 08 Mar 2003 10:04:20 -0800
LC From: Cottrell, Les


LC The remarks about window size and buffer are interesting
LC also.  It is true large windows are needed. To approach
LC 1Gbits/s we require 40MByte windows.  If this is going to be
LC a problem, then we need to raise questions like this soon and
LC figure out how to address (add more memory, use other
LC protocols etc.). In practice to approach 2.5Gbits/s requires
LC 120MByte windows.

Yup.  About 2x to 2.5x the bandwidth*delay product.
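
The arithmetic behind that ratio, assuming the ~150 ms transatlantic RTT 
discussed in this thread (illustrative only):

    def bdp_mbytes(bw_bps, rtt_s):
        # bandwidth*delay product in megabytes
        return bw_bps * rtt_s / 8 / 1e6

    print("1 Gbit/s   x 150 ms = %5.1f MB" % bdp_mbytes(1e9, 0.150))    # ~18.8 MB
    print("2.5 Gbit/s x 150 ms = %5.1f MB" % bdp_mbytes(2.5e9, 0.150))  # ~46.9 MB
    # the quoted 40 MB and 120 MB windows are roughly 2x-2.5x these figures,
    # leaving room to keep the pipe full while recovering from loss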

I'm still curious about insane SACK or maybe NACK.  Spray TCP
packets hoping they arrive (good odds), and wait to hear what
made or didn't make it.  Let the receiving end have the large
buffers... sending machines generally must handle a greater
number of sessions.  ECN also would be a nice way of telling a
sender to back off, [hopefully] proactively avoiding packet loss.

It certainly seems a shame to require big sending buffers and
slow down entire streams just in case a small bit gets lost.


Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita

~
Date: Mon, 21 May 2001 11:23:58 + (GMT)
From: A Trap [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots.
Do NOT send mail to [EMAIL PROTECTED], or you are likely to
be blocked.



Re: 923Mbits/s across the ocean

2003-03-08 Thread alex

 You are modest in your budgetary request. Just the Cisco router (GSR
 12406) we had on free loan listed at close to a million dollars, and the
 OC192 links just from Sunnyvale to Chicago would have cost what was left
 of the million/per month.

No, your budget folks have no clue, which they clearly demonstrate.  Anyone
here who buys Cisco at the list prices works for companies that for some
reason want to waste money. We pay about 10c on a dollar.

Anyone leasing OC-192 at that price as opposed to lighting it up is
smoking.

 What am I missing here, theres OC48=2.4Gb, OC192=10Gb ...
 
 We were running host to host (end-to-end) with a single stream with common
 off the shelf equipment, there are not too many (I think none)  1GE host
 NICs available today that are in production (e.g. without signing a
 non-disclosure agreement).

Again, if this is all available today, what is so new that you guys have
done, apart from blowing tons of money?

 The remarks about window size and buffer are interesting also.  It is true
 large windows are needed. To approach 1Gbits/s we require 40MByte windows. 
 If this is going to be a problem, then we need to raise questions like this
 soon and figure out how to address (add more memory, use other protocols
 etc.). In practice to approach 2.5Gbits/s requires 120MByte windows.
 
 I am quite happy to concede that this does not need to be about some jocks
 beating a record. I do think it is important to catch the public's
 attention to why high speeds are important, that they are achievable today
 application to application (it would also be useful to estimate when such
 speeds are available to universities, large companies, small companies,
 the home etc.), and for techies it is important to start to understand the
 challenges the high speeds raise, e.g. cpu and router memories, bugs in
 TCP, OS, application etc., new TCP stacks, new (possibly UDP based)
 protocols such as tsunami, need for 64 bit counters in monitoring, effects
 of the NIC card, jumbo requirements etc., and what is needed to address
 them. Also to try and put it in meaningful terms (such as 2 full length
 DVD movies in a minute, that could also increase the cease and desist
 legal messages shipped ;-)) is important.

High speeds are not important. High speeds at a *reasonable* cost are
important. What you are describing is a high speed at an *unreasonable*
cost.

Alex



Re: 923Mbits/s across the ocean

2003-03-08 Thread David G. Andersen

On Sat, Mar 08, 2003 at 03:29:56PM -0500, [EMAIL PROTECTED] quacked:
 
 High speeds are not important. High speeds at a *reasonable* cost are
 important. What you are describing is a high speed at an *unreasonable*
 cost.

To paraphrase many a California surfer: dude, chill out.

The bleeding edge of performance in computers and networks is always
stupidly expensive.  But once you've achieved it, the things you
did to get there start to percolate back into the consumer stream,
and within a few years, the previous bleeding edge is available
in the current O(cheap) hardware.

A cisco 7000 used to provide the latest and greatest performance
in its day, for a rather considerable cost.  Today, you can get a
box from Juniper for the same price you paid for your 7000 that
provides a few orders of magnitude more performance.

But to get there, you have to be willing to see what happens when
you push the envelope.  That's the point of the LSR, and a lot of
other research efforts.

  -Dave

-- 
work: [EMAIL PROTECTED]  me:  [EMAIL PROTECTED]
  MIT Laboratory for Computer Science   http://www.angio.net/
  I do not accept unsolicited commercial email.  Do not spam me.


Re: 923Mbits/s across the ocean

2003-03-08 Thread alex

 To paraphrase many a california sufer, dude, chill out.

When none of my taxes goes to such silly projects, I will chill out.

It had been stated by the people that participated in this research that

(a) they bought hardware at prices that help Cisco make its quarters;
(b) they spent millions of dollars on OC-192 links when they did not
need them;
(c) they did not come up with anything new apart from a proof that they
achieved that speed.

 The bleeding edge of performance in computers and networks is always
 stupidly expensive.  But once you've achieved it, the things you
 did to get there start to percolate back into the consumer stream,
 and within a few years, the previous bleeding edge is available
 in the current O(cheap) hardware.

That is all great if they *actually* *developed* something. However, they
did not. They bought off the shelf products at list prices, plugged them in,
ran slightly tweaked kernels, helped Qwest/Globalcrossing etc. prop up their
quarters, and announced "we did it."

 A cisco 7000 used to provide the latest and greatest performance
 in its day, for a rather considerable cost.  Today, you can get a
 box from Juniper for the same price you paid for your 7000 that
 provides a few orders of magnitude more performance.
 
 But to get there, you have to be willing to see what happens when
 you push the envelope.  That's the point of the LSR, and a lot of
 other research efforts.

That's the argument that the Pentagon used to justify buying $40 lightbulbs. 
Does not work, sorry.

Alex



RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

With the glossing over of details that goes with press releases, there appears to be a 
misunderstanding here.  I never said we paid list prices. I am well aware that one can 
get large discounts from vendors. However, I think it is important to quote a well 
known price (in this case list), which people can relate to based on how well they think 
they can negotiate (otherwise it just becomes a bragging point of who can get the largest 
discount, and gets away from the point of giving people an idea of what it might 
cost).  In our case we got 100% (free) discounts from Level(3) and Cisco for the 
Sunnyvale to Chicago link and the GSR.

The link from StarLight to Amsterdam was put in place for a European funded 
demonstration (since turned into a production link), the equipment was mainly funded 
by another European research project.

At the same time, getting it for free has its costs: one has much less leverage with 
the vendors as to delivery (and retrieval) dates, reliability etc., as well as the 
headaches of getting everything (PCs, loaned NIC cards, routers, links) to come 
together, keeping the vendors' interest, extending the loan etc. 

High speed at reasonable cost is the end goal. However, it is important to be able 
to plan for when one will need such links, to know what one will be able to achieve, 
and for regular users to be ready to use them when they are commonly available. This takes 
some effort up front to achieve and demonstrate.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 08, 2003 12:30 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean


 You are modest in your budgetary request. Just the Cisco router (GSR
 12406) we had on free loan listed at close to a million dollars, and 
 the OC192 links just from Sunnyvale to Chicago would have cost what 
 was left of the million/per month.

No, your budget folks have no clue, which they clearly demonstrate.  Anyone here who 
buys Cisco at the list prices works for companies that for some reason want to waste 
money. We pay about 10c on a dollar.

Anyone leasing OC-192 at that price as opposed to lighting it up is smoking.

 What am I missing here, theres OC48=2.4Gb, OC192=10Gb ...
 
 We were running host to host (end-to-end) with a single stream with 
 common off the shelf equipment, there are not too many (I think none) 
  1GE host NICs available today that are in production (e.g. without 
 signing a non-disclosure agreement).

Again, if this is all available today, what is so new that you guys have done, apart 
from blowing tons of money?

 The remarks about window size and buffer are interesting also.  It is 
 true large windows are needed. To approach 1Gbits/s we require 40MByte 
 windows. If this is going to be a problem, then we need to raise 
 questions like this soon and figure out how to address (add more 
 memory, use other protocols etc.). In practice to approach 2.5Gbits/s 
 requires 120MByte windows.
 
 I am quite happy to concede that this does not need to be about some 
 jocks beating a record. I do think it is important to catch the 
 public's attention to why high speeds are important, that they are 
 achievable today application to application (it would also be useful 
 to estimate when such speeds are available to universities, large 
 companies, small companies, the home etc.), and for techies it is 
 important to start to understand the challenges the high speeds raise, 
 e.g. cpu and router memories, bugs in TCP, OS, application etc., new 
 TCP stacks, new (possibly UDP based) protocols such as tsunami, need 
 for 64 bit counters in monitoring, effects of the NIC card, jumbo 
 requirements etc., and what is needed to address them. Also to try and 
 put it in meaningful terms (such as 2 full length DVD movies in a 
 minute, that could also increase the cease and desist legal messages 
 shipped ;-)) is important.

High speeds are not important. High speeds at a *reasonable* cost are important. What 
you are describing is a high speed at an *unreasonable* cost.

Alex


RE: 923Mbits/s across the ocean

2003-03-08 Thread alex

 With the glossing over of details that goes with press releases there
 appears to be a misunderstanding here.  I never said we paid list prices.
 I am well aware that one can get large discounts from vendors. However, I
 think it is important to quote a well known price (in this case list),
 which people can relate to how well they think they can negotiate
 (otherwise it just becomes a bragging point of who can get the largest
 discount), and gets away from the point of giving people an idea of what
 it might cost.  In our case we got 100% (free) discounts from Level(3) and
 Cisco for the Sunnyvale to Chicago link and the GSR.

Ok, after such explanation, I am more than willing to accept that it could
be a good use of the money, including the money that was paid to people to
sit and tweak parameters of gear, kernels, NIC cards to achieve
improvements in speed (since no one in the production world can justify having
people on the clock doing just that to document the smallest possible
improvements).

 High speed at reasonable cost is the end goal. However, it is important
 to be able to plan for when one will need such links, to know what one
 will be able to achieve, and for regular users to be ready to use them
 when they are commonly available. This takes some effort up front to achieve
 and demonstrate.

True; however, as was mentioned before, why not do the same type of
testing in a lab environment between a couple of boxes, having the TCP stack
insert appropriate delays? When we were getting simplex IP links over
satellites up in 1995, that is how we did the testing before bringing them up
on the birds.

Alex



RE: 923Mbits/s across the ocean

2003-03-08 Thread E.B. Dreger

LC Date: Sat, 08 Mar 2003 13:13:53 -0800
LC From: Cottrell, Les


LC The link from StarLight to Amsterdam was put in place for a

man 4 dummynet


LC High speed at reasonable costs are the end-goal. However, it
LC is important to be able to plan for when one will need such
LC links, to know what one will be able to achieve, and for
LC regular users to be ready to use them when the commonly
LC available. This takes some effort up front to achieve and
LC demonstrate.

The thing is we already know that large buffers help greatly.
Seeing how fast one can push a box with big buffers might be
cool, but is it accomplishing anything?  As you demonstrated,
anyone who needs that speed here and now can get a private line
and use a stock *ix install.  Done/done.

How about other models?  Limited server buffers (it's nice to
handle more than 25 simultaneous streams), random-bandwidth
clients, congestion, jitter... how were those treated?  Have
these been explored?

If there's going to be research, let's see some TCP stack tuning
and the results.  Investigating other protocols would be nice;
perhaps the scope of the contest should be changed.  The level of
research in unleashing bone-stock equipment is more appropriate
for an undergrad paper than a news release.


Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita

~
Date: Mon, 21 May 2001 11:23:58 + (GMT)
From: A Trap [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots.
Do NOT send mail to [EMAIL PROTECTED], or you are likely to
be blocked.



Re: 923Mbits/s across the ocean

2003-03-08 Thread Iljitsch van Beijnum

On Sat, 8 Mar 2003, Cottrell, Les wrote:

 We used a stock TCP (Linux kernel TCP).  We did however, use jumbo
 frames (9000Byte MTUs).

What kind of difference did you see as opposed to standard 1500 byte
packets? I did some testing once and things actually ran slightly faster
with 1500 byte packets, completely contrary to my expectations... (This
was UDP and just 0.003 km rather than 10,000, though.)

 The remarks about window size and buffer are interesting also.  It
 is true large windows are needed. To approach 1Gbits/s we require
 40MByte windows.  If this is going to be a problem, then we need to
 raise questions like this soon and figure out how to address (add
 more memory, use other protocols etc.). In practice to approach
 2.5Gbits/s requires 120MByte windows.

So how much packet loss did you see? Even with a few packets in a
million lost this would bring your transfer way down and/or you'd need
even bigger windows.

However, bigger windows mean more congestion. When two of those boxes
start pushing traffic at 1 Gbps with a 40 MB window, you'll see 20 MB
worth of lost packets due to congestion in a single RTT.

A test where the high-bandwidth session or several high-bandwidth
sessions have to live side by side with other traffic would be very
interesting. If this works well it opens up possibilities of doing this
type of application over real networks rather than (virtual)
point-to-point links where congestion management isn't an issue.



RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

 High speed at reasonable cost is the end goal. However, it is 
 important to be able to plan for when one will need such links, to 
 know what one will be able to achieve, and for regular users to be 
 ready to use them when they are commonly available. This takes some effort 
 up front to achieve and demonstrate.

True; however, as was mentioned before, why not do the same type of testing in a 
lab environment between a couple of boxes, having the TCP stack insert appropriate 
delays? When we were getting simplex IP links over satellites up in 1995, that is 
how we did the testing before bringing them up on the birds.

Following up on and driven by the work leading up to and following the Land Speed 
Record, some of the Caltech people collaborating on this record, together with 
collaborators from SLAC and elsewhere, are proposing a "WAN in the Lab" that can be 
used for just such testing. This saves on leasing fibers, but there are still 
considerable expenses to run at 10Gbit/s rates (cpus, NICs, optical multiplexing 
equipment etc.). It is also a much more controlled environment, which simplifies 
things. On the other hand it misses out on the real world experience, and so 
eventually has to be tested first on real world lightly used testbeds, then on 
advanced research networks, and finally on production networks, to understand how 
issues such as fairness, congestion avoidance, robustness to poor implementations 
or configurations etc. really work. 




RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

The jumbo frames effectively increase the additive increase of the congestion 
avoidance phase of TCP by a factor of 6.  Thus after a congestion event, which reduces 
the window by a factor of 2, one can recover 6 times as fast. This is very important 
on large RTT fast links, where the recovery rate (for TCP Reno) goes as MTU/RTT^2. 
This can be seen in some of the graphs at:

http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/stacks.png  or more fully at:
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/ 
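
Rough numbers behind the recovery claim (illustrative; assumes Reno-style 
additive increase of one MSS per RTT and a 40 MB window at ~150 ms RTT):

    def recovery_seconds(window_bytes, mss_bytes, rtt_s):
        # after a loss halves the window, climbing back takes about
        # (window/2)/MSS round trips at one MSS per RTT
        return (window_bytes / 2.0 / mss_bytes) * rtt_s

    win, rtt = 40 * 1024 * 1024, 0.150
    for mss in (1460, 8960):          # payload per packet at 1500 vs 9000 byte MTU
        print("MSS %4d: ~%4.0f s to refill the window" % (mss, recovery_seconds(win, mss, rtt)))
    # ~2150 s with standard frames vs ~350 s with jumbos: the factor-of-6
    # difference in additive increase described above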

We saw little congestion related packet loss on the testbed. With big windows, SACK 
becomes increasingly important so one does not have to recover a large fraction of the 
window for a single lost packet.

Once one gets onto networks where one is really sharing the bandwidth with others, 
performance drops off rapidly (see for example the measurements at 
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#Measurements%20from%20Sunnyvale%20to%20Amsterdam
 and compare them with those at 
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#TCP%20Stack%20Comparisons%20with%20Single%20Streams).

One of the next things we want to look at is how the various new TCP stacks work 
on production Academic & Research Networks (e.g. from Internet2, ESnet, GEANT, ...) 
with lots of other competing traffic. 
-Original Message-
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 08, 2003 1:49 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean


On Sat, 8 Mar 2003, Cottrell, Les wrote:

 We used a stock TCP (Linux kernel TCP).  We did however, use jumbo 
 frames (9000Byte MTUs).

What kind of difference did you see as opposed to standard 1500 byte packets? I did 
some testing once and things actually ran slightly faster with 1500 byte packets, 
completely contrary to my expectations... (This was UDP and just 0.003 km rather than 
10,000, though.)

 The remarks about window size and buffer are interesting also.  It is 
 true large windows are needed. To approach 1Gbits/s we require 40MByte 
 windows.  If this is going to be a problem, then we need to raise 
 questions like this soon and figure out how to address (add more 
 memory, use other protocols etc.). In practice to approach 2.5Gbits/s 
 requires 120MByte windows.

So how much packet loss did you see? Even with a few packets in a million lost this 
would bring your transfer way down and/or you'd need even bigger windows.

However, bigger windows mean more congestion. When two of those boxes start pushing 
traffic at 1 Gbps with a 40 MB window, you'll see 20 MB worth of lost packets due to 
congestion in a single RTT.

A test where the high-bandwidth session or several high-bandwidth sessions have to 
live side by side with other traffic would be very interesting. If this works well it 
opens up possibilities of doing this type of application over real networks rather 
than (virtual) point-to-point links where congestion management isn't an issue.