RE: 923Mbits/s across the ocean
Also, as the OSes are shipped they come with small default maximum window sizes (I think Linux is typically 64KB and Solaris is 8K), so one has to get a sysadmin with root privileges to change this.

-----Original Message-----
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED]
Sent: Sunday, March 09, 2003 5:25 AM
To: Joe St Sauver
Cc: [EMAIL PROTECTED]
Subject: Re: 923Mbits/s across the ocean

On Sat, 8 Mar 2003, Joe St Sauver wrote:

> you will see that for bulk TCP flows, the median throughput is still
> only 2.3Mbps. 95th%-ile is only ~9Mbps. That's really not all that
> great, throughput wise, IMHO.

Strange. Why is that? RFC 1323 is widely implemented, although not widely enabled (and for good reason: the timestamp option kills header compression so it's bad for lower-bandwidth connections). My guess is that the OS can't afford to throw around MB+ size buffers for every TCP session, so the default buffers (which limit the windows that can be used) are relatively small and application programmers don't override the default.
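For readers wondering what "overriding the default" looks like from the application side, here is a minimal sketch using the standard socket options SO_SNDBUF and SO_RCVBUF. The helper name request_tcp_buffers and the 4 MByte request are illustrative assumptions, not anything used in the record run. The kernel is free to grant less than requested (on Linux the ceiling is a sysctl that only root can raise, which is the sysadmin step mentioned above), and the request generally needs to be made before connect() or listen() for window scaling to be negotiated accordingly.

```python
import socket


def request_tcp_buffers(sock, nbytes):
    """Ask the kernel for nbytes of send and receive buffer on sock.

    Returns the (send, receive) sizes the kernel reports afterwards,
    which may differ from the request because of system-wide limits.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return granted_snd, granted_rcv


if __name__ == "__main__":
    # Hypothetical request size; set it before connect() so the
    # window scale option can reflect the larger buffer.
    wanted = 4 * 1024 * 1024
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd, rcv = request_tcp_buffers(s, wanted)
        print(f"asked for {wanted} bytes, got send={snd} receive={rcv}")
```

Comparing the granted sizes with the request is a quick way to see whether the system-wide maximum, rather than the application, is the limiting factor on a given host.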
RE: 923Mbits/s across the ocean
Jumbo frames effectively increase the additive increase of TCP's congestion avoidance phase by a factor of 6. Thus after a congestion event that reduces the window by a factor of 2, one can recover 6 times as fast. This is very important on fast links with large RTTs, where the recovery rate (for TCP/Reno) goes as MTU/RTT^2. This can be seen in some of the graphs at:
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/stacks.png
or more fully at:
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/

We saw little congestion related packet loss on the testbed. With big windows SACK becomes increasingly important, so one does not have to recover a large fraction of the window for a single lost packet.

Once one gets onto networks where one is really sharing the bandwidth with others, performance drops off rapidly (see for example the measurements at
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#Measurements%20from%20Sunnyvale%20to%20Amsterdam
and compare them with those at
http://www-iepm.slac.stanford.edu/monitoring/bulk/fast/#TCP%20Stack%20Comparisons%20with%20Single%20Streams).

One of the next things we want to look at is how the various new TCP stacks work on production Academic & Research Networks (e.g. from Internet2, ESnet, GEANT, ...) with lots of other competing traffic.

-----Original Message-----
From: Iljitsch van Beijnum [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 08, 2003 1:49 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean

On Sat, 8 Mar 2003, Cottrell, Les wrote:

> We used a stock TCP (Linux kernel TCP). We did however, use jumbo
> frames (9000Byte MTUs).

What kind of difference did you see as opposed to standard 1500 byte packets? I did some testing once and things actually ran slightly faster with 1500 byte packets, completely contrary to my expectations... (This was UDP and just 0.003 km rather than 10,000, though.)

> The remarks about window size and buffer are interesting also. It is
> true large windows are needed. To approach 1Gbits/s we require 40MByte
> windows. If this is going to be a problem, then we need to raise
> questions like this soon and figure out how to address them (add more
> memory, use other protocols etc.). In practice to approach 2.5Gbits/s
> requires 120MByte windows.

So how much packet loss did you see? Even with a few packets in a million lost this would bring your transfer way down and/or you'd need even bigger windows.

However, bigger windows mean more congestion. When two of those boxes start pushing traffic at 1 Gbps with a 40 MB window, you'll see 20 MB worth of lost packets due to congestion in a single RTT.

A test where the high-bandwidth session or several high-bandwidth sessions have to live side by side with other traffic would be very interesting. If this works well it opens up possibilities of doing this type of application over real networks rather than (virtual) point-to-point links where congestion management isn't an issue.
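The window, MTU and loss arguments in this exchange can be put into rough numbers. The sketch below is a back-of-the-envelope model only, under assumptions that are mine rather than the posters': an RTT of 0.17 s, 40 bytes of IP and TCP headers per packet, Reno-style congestion avoidance that adds one segment per RTT after halving a window equal to the bandwidth-delay product, and the commonly quoted Mathis approximation (rate is at most about 1.22 * MSS / (RTT * sqrt(p))) for throughput under random loss. The function names are hypothetical.

```python
import math

HEADER_BYTES = 40  # assumed IPv4 + TCP headers, no options


def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


def reno_recovery_s(rate_bps, rtt_s, mtu_bytes):
    """Seconds to grow back from half a full window, one MSS per RTT."""
    mss = mtu_bytes - HEADER_BYTES
    rtts_needed = (bdp_bytes(rate_bps, rtt_s) / 2) / mss
    return rtts_needed * rtt_s


def mathis_limit_bps(rtt_s, mtu_bytes, loss_rate, c=1.22):
    """Approximate Reno throughput ceiling at a given random loss rate."""
    mss = mtu_bytes - HEADER_BYTES
    return (mss * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    RATE = 1e9   # 1 Gbit/s target rate
    RTT = 0.17   # assumed round-trip time in seconds
    print(f"BDP: {bdp_bytes(RATE, RTT) / 1e6:.1f} MByte")
    for mtu in (1500, 9000):
        rec = reno_recovery_s(RATE, RTT, mtu)
        lim = mathis_limit_bps(RTT, mtu, 1e-6)
        print(f"MTU {mtu}: recovery {rec:.0f} s, "
              f"limit at 1e-6 loss {lim / 1e6:.0f} Mbit/s")
```

Under these assumptions the recovery time scales as rate * RTT^2 / MSS, which is the MTU/RTT^2 dependence of the recovery rate described above, and the ratio between the 1500 byte and 9000 byte cases comes out close to the factor of 6. The absolute figures depend heavily on the assumed RTT and should not be read as measurements from the testbed.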
RE: 923Mbits/s across the ocean
>> High speed at reasonable cost is the end goal. However, it is
>> important to be able to plan for when one will need such links, to
>> know what one will be able to achieve, and for regular users to be
>> ready to use them when they are commonly available. This takes some
>> effort up front to achieve and demonstrate.

> True, however as it was mentioned before, why not do the same type of
> testing in a lab environment between a couple of boxes having the TCP
> stack insert appropriate delays? When in 1995 we were getting simplex
> IP links over satellites up, that is how we did the testing before
> bringing them up on the birds.

Following up on and driven by the work leading up to and following the Land Speed Record, some of the Caltech people collaborating on this record, together with collaborators from SLAC and elsewhere, are proposing a "WAN in the Lab" that can be used for just such testing. This saves on leasing fibers, but there are still considerable expenses to run at 10Gbit/s rates (CPUs, NICs, optical multiplexing equipment etc.). It is also a much more controlled environment, which simplifies things. On the other hand it misses out on real world experience, so eventually things have to be tested first on lightly used real world testbeds, then on advanced research networks and finally on production networks, to understand how issues such as fairness, congestion avoidance, robustness to poor implementations or configurations etc. really work.
RE: 923Mbits/s across the ocean
With the glossing over of details that goes with press releases there appears to be a misunderstanding here. I never said we paid list prices. I am well aware that one can get large discounts from vendors. However, I think it is important to quote a well known price (in this case list), which people can then adjust by how well they think they can negotiate. Otherwise it just becomes a bragging point about who can get the largest discount, and gets away from the point of giving people an idea of what it might cost.

In our case we got 100% (free) discounts from Level(3) and Cisco for the Sunnyvale to Chicago link and the GSR. The link from StarLight to Amsterdam was put in place for a European funded demonstration (since turned into a production link), and the equipment was mainly funded by another European research project.

At the same time, getting it for free has its costs: one has much less leverage with the vendors as to delivery (and retrieval) dates, reliability etc., as well as the headaches of getting everything (PCs, loaned NIC cards, routers, links) to come together, keeping the vendors' interest, extending the loan etc.

High speed at reasonable cost is the end goal. However, it is important to be able to plan for when one will need such links, to know what one will be able to achieve, and for regular users to be ready to use them when they are commonly available. This takes some effort up front to achieve and demonstrate.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 08, 2003 12:30 PM
To: Cottrell, Les
Cc: '[EMAIL PROTECTED]'
Subject: Re: 923Mbits/s across the ocean

> You are modest in your budgetary request. Just the Cisco router (GSR
> 12406) we had on free loan listed at close to a million dollars, and
> the OC192 links just from Sunnyvale to Chicago would have cost what
> was left of the million/per month.

No, your budget folks have no clue, which they clearly demonstrate. Anyone here who buys Cisco at the list prices works for companies that for some reason want to waste money. We pay about 10c on a dollar. Anyone leasing OC-192 at that price as opposed to lighting it up is smoking.

> "What am I missing here, theres OC48=2.4Gb, OC192=10Gb ..."
>
> We were running host to host (end-to-end) with a single stream with
> common off the shelf equipment, there are not too many (I think
> none) > 1GE host NICs available today that are in production (e.g.
> without signing a non-disclosure agreement).

Again, if this is all available today, what is so new that you guys have done, apart from blowing tons of money?

> The remarks about window size and buffer are interesting also. It is
> true large windows are needed. To approach 1Gbits/s we require 40MByte
> windows. If this is going to be a problem, then we need to raise
> questions like this soon and figure out how to address them (add more
> memory, use other protocols etc.). In practice to approach 2.5Gbits/s
> requires 120MByte windows.
>
> I am quite happy to concede that this does not need to be about some
> jocks beating a record. I do think it is important to catch the
> public's attention to why high speeds are important, that they are
> achievable today application to application (it would also be useful
> to estimate when such speeds are available to universities, large
> companies, small companies, the home etc.), and for techies it is
> important to start to understand the challenges the high speeds raise,
> e.g. cpu and router memories, bugs in TCP, OS, application etc., new
> TCP stacks, new (possibly UDP based) protocols such as tsunami, need
> for 64 bit counters in monitoring, effects of the NIC card, jumbo
> requirements etc., and what is needed to address them. Also to try and
> put it in meaningful terms (such as 2 full length DVD movies in a
> minute, that could also increase the "cease and desist" legal messages
> shipped ;-)) is important.

High speeds are not important. High speeds at a *reasonable* cost are important. What you are describing is a high speed at an *unreasonable* cost.

Alex
RE: 923 Mbps across the Ocean ...
We have been talking to the radio astronomy people. We are aware they have such needs; however, I am unclear whether they have succeeded in transmitting single stream TCP application to application throughput of 900Mbits/s over 10,000km on a regular basis. Perhaps you could point me to whom to talk to.

I am aware of the work of Richard Hughes-Jones of Manchester University and others and the Radio Astronomy VLBI Data Transmission (see for example http://www.hep.man.ac.uk/~rich/VLBI_web/) since we have shared notes and talked together a lot on the high performance issues. My understanding is that for today they use special high performance tapes to ship the data around, and are actively looking at using the network.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 08, 2003 12:23 PM
To: Jason Slagle
Cc: Richard A Steenbergen; fingers; [EMAIL PROTECTED]
Subject: Re: 923 Mbps across the Ocean ...

> On Sat, 8 Mar 2003, Richard A Steenbergen wrote:
>
> > A) The amount of arrogance it takes to declare a land speed "record" when
> > there are people out there doing way more than this on a regular
> > basis.
>
> Single stream at 900mbs over that distance? Where?

Talk to folks that deal with radio telescopes.

Alex
Re: 923Mbits/s across the ocean
I am not normally on this list but someone kindly gave me copies of some of the email concerning the Internet2 Land Speed Record, so I have joined the list. As one of the PIs of the record, I thought it might be useful to comment on a few interesting items I have seen, and no, I am not trying to flame anybody:

"Give em a million dollars, plus fiber from here to anywhere and let me muck with the TCP algorith, and I can move a GigE worth of traffic too - Dave"

You are modest in your budgetary request. Just the Cisco router (GSR 12406) we had on free loan listed at close to a million dollars, and the OC192 links just from Sunnyvale to Chicago would have cost what was left of the million/per month.

We used a stock TCP (Linux kernel TCP). We did however, use jumbo frames (9000Byte MTUs).

In response to Richard A Steenbergen, we are not "now living in a tropical foreign country, with lots and lots of drugs and women", but then the weather in California is great today.

"What am I missing here, theres OC48=2.4Gb, OC192=10Gb ..."

We were running host to host (end-to-end) with a single stream with common off the shelf equipment, there are not too many (I think none) > 1GE host NICs available today that are in production (e.g. without signing a non-disclosure agreement).

"Production commercial networks ... Blow away these speeds on a regular basis".

See the above remark about end-to-end application to application, single stream.

"So, you turn down/off all the parts of TCP that allow you to share bandwidth ..."

We did not mess with the TCP stack, it was stock off the shelf.

"... Mention that "Internet speed records" are measured in terabit-meters/sec."

You are correct, this is important, but reporters want a sound bite and typically only focus on one thing at a time. I will make sure next time I talk to a reporter to emphasize this.
Maybe we can get some mileage out of Petabmps (Peta bit metres per second) sounds

"What kind of production environment needs a single TCP stream of data at 1Gbits/s over a 150ms latency link?"

Today High Energy Particle Physics needs hundreds of Megabits/s between California and Europe (Lyon, Padova and Oxford) to deliver data on a timely basis from an experiment site at SLAC to regional computer sites in Europe. Today on production academic networks (with sustainable rates of 100 to a few hundred Mbits/s) it takes about a day to transmit just over a Tbyte of data, which just about keeps up with the data rates. The data generation rates are doubling every year, so within 1-3 years we will be needing speeds like those in the record on a production basis. We needed to establish whether we can achieve the needed rates, whether we can do it with off the shelf hardware, how the hosts and OSes need configuring, how to tune the TCP stack or how newer stacks perform, what the requirements for jumbo frames are etc. Besides High Energy Physics, other sciences are beginning to grapple with how to replicate large databases across the globe; such sciences include radio-astronomy, human genome, global weather, seismic ...

The spud gun is interesting; given the distances, probably a 747 freightliner packed with DST tapes or disks is a better idea. Assuming we fill the 747 with say 50 Gbps tapes (disks would probably be better), then if it takes 10 hours to fly from San Francisco (BTW Sunnyvale is near San Francisco, not near LA as one person talking about retiring to better weather might lead one to believe) the bandwidth is about 2-4 Tbits/s. However, this ignores the reality of labelling, writing the tapes, removing them from the silo robot, packing, getting to the airport, loading, unloading, getting through customs etc. In reality the latency is really closer to 2 weeks. Even worse, if there is an error (heads not aligned etc.) then the retry latency is long and the effort involved considerable.
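The rate figures in this message are simple volume-over-time arithmetic, which the following sketch makes explicit. The helper names are hypothetical. The aircraft payload is not stated in the message, so it is back-solved here from the lower 2 Tbits/s figure and the 10 hour flight; that payload, and the reading of a Tbyte as 10^12 bytes, are assumptions made for illustration only.

```python
HOUR_S = 3_600
DAY_S = 86_400


def effective_bps(payload_bytes, elapsed_s):
    """Average bit rate achieved by moving payload_bytes in elapsed_s."""
    return payload_bytes * 8 / elapsed_s


def payload_for_rate(rate_bps, elapsed_s):
    """Bytes that must be moved in elapsed_s to average rate_bps."""
    return rate_bps * elapsed_s / 8


if __name__ == "__main__":
    # One Tbyte per day over the network, taking 1 Tbyte as 1e12 bytes.
    net = effective_bps(1e12, DAY_S)
    print(f"1 Tbyte/day is about {net / 1e6:.0f} Mbit/s")

    # Payload implied by 2 Tbit/s sustained over a 10 hour flight
    # (an inferred figure, not one given in the message).
    payload = payload_for_rate(2e12, 10 * HOUR_S)
    print(f"implied payload: {payload / 1e15:.0f} PByte")

    # The same payload once the end-to-end latency is two weeks.
    slow = effective_bps(payload, 14 * DAY_S)
    print(f"over two weeks: about {slow / 1e9:.0f} Gbit/s")
```

On these assumptions one Tbyte per day sits just under 100 Mbit/s, consistent with the sustainable rates quoted for production academic networks, while stretching the aircraft's door-to-door time from 10 hours to two weeks cuts its effective rate by a factor of about 34.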
Also the network solution lends itself much better to automation; in our case we saved a couple of full time equivalent people at the sending site to distribute the data on a regular basis to our collaborator sites in France, UK and Italy.

The remarks about window size and buffer are interesting also. It is true large windows are needed. To approach 1Gbits/s we require 40MByte windows. If this is going to be a problem, then we need to raise questions like this soon and figure out how to address them (add more memory, use other protocols etc.). In practice to approach 2.5Gbits/s requires 120MByte windows.

I am quite happy to concede that this does not need to be about some jocks beating a record. I do think it is important to catch the public's attention to why high speeds are important, that they are achievable today application to application (it would also be useful to estimate when such speeds are available to universities, large companies, small companies, the home etc.), and for techies it is important to start to understand the challenges the high speeds raise, e.g. cpu and