RE: 923Mbits/s across the ocean

2003-03-09 Thread Cottrell, Les

Also, as shipped, the OSes come with small default maximum window sizes (I 
think Linux is typically 64KB and Solaris is 8KB), so one has to get a sysadmin 
with root privileges to change this. 
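
As a rough illustration, raising those ceilings on Linux amounts to writing larger limits into the usual /proc/sys knobs; a minimal Python sketch (the 64 MB ceiling is just an example value, and the exact defaults vary by kernel and vendor):

#!/usr/bin/env python
# Sketch: raise the system-wide socket buffer ceilings so applications
# can actually use large TCP windows. Must be run as root.
# Uses the standard Linux /proc/sys/net knobs; values are examples only.

LIMIT = 64 * 1024 * 1024  # 64 MB ceiling, enough headroom for a ~40 MB window

settings = {
    "/proc/sys/net/core/rmem_max": str(LIMIT),
    "/proc/sys/net/core/wmem_max": str(LIMIT),
    # min / default / max for TCP receive and send buffers
    "/proc/sys/net/ipv4/tcp_rmem": "4096 87380 %d" % LIMIT,
    "/proc/sys/net/ipv4/tcp_wmem": "4096 65536 %d" % LIMIT,
}

for path, value in settings.items():
    with open(path, "w") as f:
        f.write(value)
    print("%s = %s" % (path, value))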

-Original Message-
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED] 
Sent: Sunday, March 09, 2003 5:25 AM
To: Joe St Sauver
Cc: [EMAIL PROTECTED]
Subject: Re: 923Mbits/s across the ocean



On Sat, 8 Mar 2003, Joe St Sauver wrote:

 you will see that for bulk TCP flows, the median throughput is still 
 only 2.3Mbps. 95th%-ile is only ~9Mbps. That's really not all that 
 great, throughput wise, IMHO.

Strange. Why is that? RFC 1323 is widely implemented, although not widely enabled (and 
for good reason: the timestamp option kills header compression, so it's bad for 
lower-bandwidth connections). My guess is that the OS can't afford to throw around MB+ 
size buffers for every TCP session, so the default buffers (which limit the windows 
that can be used) are relatively small, and application programmers don't override the default.
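
For reference, the per-application override is just a pair of setsockopt() calls made before connecting; a minimal Python sketch (the 40 MB request, host and port are placeholders, and the kernel still clamps the request to the system-wide maxima):

import socket

WINDOW = 40 * 1024 * 1024          # desired window; placeholder value
HOST, PORT = "example.org", 5001   # placeholder endpoint

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask for large per-socket buffers *before* connecting, so the RFC 1323
# window scale option can be negotiated appropriately on the SYN.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WINDOW)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WINDOW)
s.connect((HOST, PORT))

# The kernel may silently clamp the request to the system-wide maxima:
print("effective SO_SNDBUF: %d" % s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("effective SO_RCVBUF: %d" % s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))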


Re: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

I am not normally on this list but someone kindly gave me copies of some of the email 
concerning the Internet2 Land Speed record. So I have joined the list.

As one of the PIs of the record, I thought it might be useful to comment on a few 
interesting items I have seen, and no I am not trying to flame anybody:

"Give 'em a million dollars, plus fiber from here to anywhere and let me muck with the 
TCP algorithm, and I can move a GigE worth of traffic too" - Dave

You are modest in your budgetary request. Just the Cisco router (GSR 12406) we had on 
free loan listed at close to a million dollars, and the OC192 links just from 
Sunnyvale to Chicago would have cost what was left of the million per month.

We used a stock TCP (Linux kernel TCP). We did, however, use jumbo frames (9000 byte 
MTUs).

In response to Richard A Steenbergen: we are not now living in "a tropical foreign 
country, with lots and lots of drugs and women", but then the weather in California is 
great today.

"What am I missing here, there's OC48=2.4Gb, OC192=10Gb ..."

We were running host to host (end-to-end) with a single stream with common off-the-shelf 
equipment; there are not too many (I think none) faster-than-1GE host NICs available today 
that are in production (e.g. without signing a non-disclosure agreement).

"Production commercial networks ... blow away these speeds on a regular basis."
See the above remark about end-to-end, application-to-application, single stream.

"So, you turn down/off all the parts of TCP that allow you to share bandwidth ..."
We did not mess with the TCP stack; it was stock, off the shelf.

"... Mention that Internet speed records are measured in terabit-meters/sec."
You are correct, this is important, but reporters want a sound bite and typically only 
focus on one thing at a time. I will make sure next time I talk to a reporter to 
emphasize this. Maybe we can get some mileage out of Petabmps (petabit metres per 
second), which sounds ...

"What kind of production environment needs a single TCP stream of data at 1Gbits/s 
over a 150ms latency link?"
Today High Energy Particle Physics needs hundreds of Megabits/s between California and 
Europe (Lyon, Padova and Oxford) to deliver data on a timely basis from an experiment 
site at SLAC to regional computer sites in Europe. Today on production academic 
networks (with sustainable rates of 100 to a few hundred Mbits/s) it takes about a day 
to transmit just over a TByte of data, which just about keeps up with the data rates. 
The data generation rates are doubling per year, so within 1-3 years we will need 
speeds like those in the record on a production basis. We needed to ensure we can achieve 
the needed rates, whether we can do it with off-the-shelf hardware, how the hosts 
and OSes need configuring, how to tune the TCP stack or how newer stacks perform, and what 
the requirements for jumbo frames are, etc. Besides High Energy Physics, other sciences 
are beginning to grapple with how to replicate large databases across the globe; such 
sciences include radio astronomy, the human genome, global weather, seismic ...

The spud gun is interesting; given the distances, probably a 747 freighter packed 
with DST tapes or disks is a better idea. Assuming we fill the 747 with, say, 50 GByte 
tapes (disks would probably be better), then if it takes 10 hours to fly from San 
Francisco (BTW Sunnyvale is near San Francisco, not near LA as one person talking about 
retiring to better weather might lead one to believe), the bandwidth is about 2-4 
Tbits/s. However, this ignores the reality of labelling, writing the tapes, removing them 
from the silo robot, packing, getting to the airport, loading, unloading, getting through 
customs etc. In reality the latency is closer to 2 weeks. Even worse, if there 
is an error (heads not aligned etc.) then the retry latency is long and the effort 
involved considerable. Also, the network solution lends itself much better to 
automation; in our case we saved a couple of full-time-equivalent people at the 
sending site to distribute the data on a regular basis to our collaborator sites
in France, UK and Italy.
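
Making that back-of-the-envelope explicit (using only the 10 hour, 50 GByte and 2-4 Tbits/s figures above):

# Back-of-the-envelope for the "747 full of tapes" comparison.
flight_hours = 10
tape_bytes = 50e9                    # 50 GByte per cartridge

for target_tbps in (2, 4):           # the 2-4 Tbits/s quoted above
    total_bits = target_tbps * 1e12 * flight_hours * 3600
    tapes = total_bits / 8 / tape_bytes
    print("%d Tbit/s for %d h  ->  %.1f PB  ->  ~%d tapes"
          % (target_tbps, flight_hours, total_bits / 8 / 1e15, tapes))

# And the latency point: even at terabits per second of raw "bandwidth",
# a ~2 week write/ship/read turnaround dwarfs the 10 hour flight time.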

The remarks about window size and buffers are interesting also. It is true large 
windows are needed: to approach 1Gbits/s we require 40MByte windows. If this is going 
to be a problem, then we need to raise questions like this soon and figure out how to 
address them (add more memory, use other protocols etc.). In practice, to approach 
2.5Gbits/s requires 120MByte windows.
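
As a rough cross-check, those window sizes are in line with about twice the bandwidth-delay product at the RTTs involved; a small sketch, where the 170 ms RTT and the factor-of-2 headroom are assumptions for illustration, not values taken from the measurements:

# Rough window-size check: window >= bandwidth * RTT (the BDP), and in
# practice something like 2x that so a Reno flow can ride out a halving
# of cwnd without draining the pipe. The 170 ms RTT is an assumed round
# figure for a California-Amsterdam path.
rtt = 0.170  # seconds

for gbps in (1.0, 2.5):
    bdp_mbytes = gbps * 1e9 * rtt / 8 / 1e6
    print("%.1f Gbit/s: BDP ~ %.0f MByte, 2 x BDP ~ %.0f MByte"
          % (gbps, bdp_mbytes, 2 * bdp_mbytes))
# Output lands in the same ballpark as the 40 MByte and 120 MByte figures above.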

I am quite happy to concede that this does not need to be about some jocks beating a 
record. I do think it is important to catch the public's attention to why high speeds 
are important, that they are achievable today application to application (it would 
also be useful to estimate when such speeds are available to universities, large 
companies, small companies, the home etc.), and for techies it is important to start 
to understand the challenges the high speeds raise, e.g. cpu and router memories, bugs 
in TCP, OS, application etc., new TCP stacks, new (possibly UDP based) protocols such 
as tsunami, need for 64 bit counters in monitoring, effects of the NIC card, jumbo 
requirements etc., and what is needed to address them. Also to try and put it in 
meaningful terms (such as 2 full length DVD movies in a minute, that could also 
increase the cease and desist legal messages shipped ;-)) is important.

RE: 923 Mbps across the Ocean ...

2003-03-08 Thread Cottrell, Les

We have been talking to the radio astronomy people. We are aware they have such 
needs; however, I am unclear whether they have succeeded in transmitting single stream 
TCP application-to-application throughput of 900Mbits/s over 10,000km on a regular 
basis. Perhaps you could point me to whom to talk to. I am aware of the work of 
Richard Hughes-Jones of Manchester University and others on Radio Astronomy VLBI 
Data Transmission (see for example http://www.hep.man.ac.uk/~rich/VLBI_web/), since we 
have shared notes and talked together a lot on the high performance issues. My 
understanding is that today they use special high performance tapes to ship the 
data around, and are actively looking at using the network.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 08, 2003 12:23 PM
To: Jason Slagle
Cc: Richard A Steenbergen; fingers; [EMAIL PROTECTED]
Subject: Re: 923 Mbps across the Ocean ...



 On Sat, 8 Mar 2003, Richard A Steenbergen wrote:
 
  A) The amount of arrogance it takes to declare a land speed record when
 there are people out there doing way more than this on a regular 
  basis.
 
 Single stream at 900mbs over that distance?  Where?

Talk to folks that deal with radio telescopes.

Alex


RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

With the glossing over of details that goes with press releases, there appears to be a 
misunderstanding here. I never said we paid list prices. I am well aware that one can 
get large discounts from vendors. However, I think it is important to quote a well 
known price (in this case list), which people can relate to how well they think they 
can negotiate (otherwise it just becomes a bragging point of who can get the largest 
discount, and gets away from the point of giving people an idea of what it might 
cost). In our case we got 100% (free) discounts from Level(3) and Cisco for the 
Sunnyvale to Chicago link and the GSR.

The link from StarLight to Amsterdam was put in place for a European funded 
demonstration (since turned into a production link), the equipment was mainly funded 
by another European research project.

At the same time, getting it for free has its costs: one has much less leverage with 
the vendors as to delivery (and retrieval) dates, reliability etc., as well as the 
headaches of getting everything (PCs, loaned NIC cards, routers, links) to come 
together, keeping the vendors' interest, extending the loan etc. 

High speed at reasonable cost is the end goal. However, it is important to be able 
to plan for when one will need such links, to know what one will be able to achieve, 
and for regular users to be ready to use them when they are commonly available. This 
takes some effort up front to achieve and demonstrate.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 08, 2003 12:30 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean


 You are modest in your budgetary request. Just the Cisco router (GSR
 12406) we had on free loan listed at close to a million dollars, and 
 the OC192 links just from Sunnyvale to Chicago would have cost what 
 was left of the million per month.

No, your budget folks have no clue, which they clearly demonstrate. Anyone here who 
buys Cisco at list prices works for a company that for some reason wants to waste 
money. We pay about 10c on a dollar.

Anyone leasing OC-192 at that price as opposed to lighting it up is smoking.

 "What am I missing here, there's OC48=2.4Gb, OC192=10Gb ..."
 
 We were running host to host (end-to-end) with a single stream with 
 common off-the-shelf equipment; there are not too many (I think none) 
 faster-than-1GE host NICs available today that are in production (e.g. 
 without signing a non-disclosure agreement).

Again, if this is all available today, what is so new that you guys have done, apart 
from blowing tons of money?

 The remarks about window size and buffer are interesting also.  It is 
 true large windows are needed. To approach 1Gbits/s we require 40MByte 
 windows. If this is going to be a problem, then we need to raise 
 question like this soon and figure out how to address (add more 
 memory, use other protocols etc.). In practice to approach 2.5Gbits/s 
 requires 120MByte windows.
 
 I am quite happy to concede that this does not need to be about some 
 jocks beating a record. I do think it is important to catch the 
 public's attention to why high speeds are important, that they are 
 achievable today application to application (it would also be useful 
 to estimate when such speeds are available to universities, large 
 companies, small companies, the home etc.), and for techies it is 
 important to start to understand the challenges the high speeds raise, 
 e.g. cpu and router memories, bugs in TCP, OS, application etc., new 
 TCP stacks, new (possibly UDP based) protocols such as tsunami, need 
 for 64 bit counters in monitoring, effects of the NIC card, jumbo 
 requirements etc., and what is needed to address them. Also to try and 
 put it in meaningful terms (such as 2 full length DVD movies in a 
 minute, that could also increase the cease and desist legal messages 
 shipped ;-)) is important.

High speeds are not important. High speeds at a *reasonable* cost are important. What 
you are describing is a high speed at an *unreasonable* cost.

Alex


RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

 High speed at reasonable cost is the end goal. However, it is 
 important to be able to plan for when one will need such links, to 
 know what one will be able to achieve, and for regular users to be 
 ready to use them when they are commonly available. This takes some 
 effort up front to achieve and demonstrate.

True, however as it was mentioned before, why not do the same type of testing in a lab 
environment between a couple of boxes, having the TCP stack insert appropriate delays? 
When in 1995 we were getting simplex IP links over satellites up, that is how we did 
the testing before bringing them up on the birds.

Following up on, and driven by, the work leading up to and following the Land Speed 
Record, some of the Caltech people collaborating on this record, together with 
collaborators from SLAC and elsewhere, are proposing a WAN in the Lab that can be used 
for just such testing. This saves on leasing fibers, but there are still considerable 
expenses to run at 10Gbit/s rates (cpus, NICs, optical multiplexing equipment etc.). 
It is also a much more controlled environment, which simplifies things. On the other 
hand it misses out on the real world experience, and so things eventually have to be 
tested first on real world lightly used testbeds, then on advanced research networks 
and finally on production networks, to understand how issues such as fairness, 
congestion avoidance, and robustness to poor implementations or configurations really 
work. 
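
As an aside, on a reasonably recent Linux box the delay-insertion part of such a lab setup can be emulated with netem; a hedged sketch (the interface name, delay, jitter and loss figures below are placeholders, and this needs root and a kernel with the sch_netem module):

import subprocess

# Sketch: emulate a long, slightly lossy WAN path between two lab hosts
# using Linux netem. The interface name and the numbers below are
# placeholders, not values from the tests described above.
IFACE = "eth0"
DELAY = "85ms"      # one-way; run on both hosts for a ~170 ms RTT
JITTER = "5ms"
LOSS = "0.001%"

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.run(cmd, check=True)

# Install (or replace) the netem delay/loss model as the root qdisc.
run(["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
     "delay", DELAY, JITTER, "loss", LOSS])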




RE: 923Mbits/s across the ocean

2003-03-08 Thread Cottrell, Les

The jumbo frames effectively increase the additive increase of the congestion avoidance 
phase of TCP by a factor of 6. Thus after a congestion event, which reduces the window 
by a factor of 2, one can recover 6 times as fast. This is very important on large-RTT 
fast links, where the recovery rate (for TCP/Reno) goes as MTU/RTT^2. This can be seen 
in some of the graphs at:

http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/stacks.png  or more fully at:
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/ 
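
Putting rough numbers on that factor of 6 (a sketch only; the 1 Gbit/s rate and 170 ms RTT are assumed round figures, not values from the plots above):

# How long standard Reno takes to climb back to full rate after one
# halving of cwnd, at 1 additional MSS per RTT (congestion avoidance).
# Recovery time ~ rate * RTT^2 / (2 * MSS), hence the MTU/RTT^2 point.
rate_bps = 1e9
rtt = 0.170

for mtu, mss in (("1500B", 1460), ("9000B jumbo", 8960)):
    window_bytes = rate_bps * rtt / 8
    rtts_to_recover = (window_bytes / 2) / mss
    print("%-12s recovery after one loss: ~%4.0f s (%.0f RTTs)"
          % (mtu, rtts_to_recover * rtt, rtts_to_recover))
# The jumbo case recovers about 6x faster (8960/1460 ~ 6.1).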

We saw little congestion-related packet loss on the testbed. With big windows, SACK 
becomes increasingly important so one does not have to recover a large fraction of the 
window for a single lost packet.

Once one gets onto networks where one is really sharing the bandwidth with others, 
performance drops off rapidly (see for example the measurements at 
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#Measurements%20from%20Sunnyvale%20to%20Amsterdam
and compare them with those at 
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#TCP%20Stack%20Comparisons%20with%20Single%20Streams).

One of the next things we want to look at is how the various new TCP stacks work 
on production Academic & Research Networks (e.g. Internet2, ESnet, GEANT, ...) 
with lots of other competing traffic. 

-Original Message-
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED] 
Sent: Saturday, March 08, 2003 1:49 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean


On Sat, 8 Mar 2003, Cottrell, Les wrote:

 We used a stock TCP (Linux kernel TCP).  We did however, use jumbo 
 frames (9000Byte MTUs).

What kind of difference did you see as opposed to standard 1500 byte packets? I did 
some testing once and things actually ran slightly faster with 1500 byte packets, 
completely contrary to my expectations... (This was UDP and just 0.003 km rather than 
10,000, though.)

 The remarks about window size and buffer are interesting also.  It is 
 true large windows are needed. To approach 1Gbits/s we require 40MByte 
 windows.  If this is going to be a problem, then we need to raise 
 question like this soon and figure out how to address (add more 
 memory, use other protocols etc.). In practice to approcah 2.5Gbits/s 
 requires 120MByte windows.

So how much packet loss did you see? Even with a few packets in a million lost this 
would bring your transfer way down and/or you'd need even bigger windows.
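
For a rough feel of how loss rate caps a single Reno stream, the usual back-of-the-envelope is the Mathis et al. approximation, rate ~ (MSS/RTT) * sqrt(3/2) / sqrt(p); a sketch with assumed round numbers (170 ms RTT, standard and jumbo MSS), not measured loss rates from the record runs:

from math import sqrt

# Mathis et al. approximation for a single Reno flow:
#   rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(p)
rtt = 0.170
C = sqrt(1.5)

for mss in (1460, 8960):
    for p in (1e-6, 1e-7, 1e-8):
        rate_mbps = C * mss * 8 / (rtt * sqrt(p)) / 1e6
        print("MSS %4d, loss %.0e: ~%6.0f Mbit/s" % (mss, p, rate_mbps))

At a loss rate of a few packets per million, the standard-frame figure indeed comes out well under 100 Mbit/s.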

However, bigger windows mean more congestion. When two of those boxes start pushing 
traffic at 1 Gbps with a 40 MB window, you'll see 20 MB worth of lost packets due to 
congestion in a single RTT.

A test where the high-bandwidth session or several high-bandwidth sessions have to 
live side by side with other traffic would be very interesting. If this works well it 
opens up possibilities of doing this type of application over real networks rather 
than (virtual) point-to-point links where congestion management isn't an issue.