Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-26 Thread Ángel González
On 10/11/11 03:24, Andrew Daviel wrote:
>
> When downloading a large file over a high-latency (e.g. long physical
> distance) high-bandwidth link, the download time is dominated by the
> round-trip time for TCP handshakes.
>
> In the past tools such as bbftp have mitigated this effect by using
> multiple streams, but required both a special server and client.
>
> Using the "range" header in HTTP/1.1, it is possible to start multiple
> simultaneous requests for different portions of a file using a
> standard Apache server, and achieve a significant speedup.
> I have a proof-of-principle Perl script using threads which was able
> to download a medium-sized file from Europe to Vancouver in half the
> normal time.
>
> I wondered if this was of interest as an enhancement for wget.
>
> regards

I think setting a big SO_RCVBUF should also fix your issue, by using big
window sizes, and it's cleaner.
OTOH, you need support from the TCP stack, and it won't get around
per-connection rate limits that may be limiting you in the
single-connection case.
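
Concretely, the SO_RCVBUF suggestion amounts to asking for a big receive
buffer before connecting, so the stack can advertise a correspondingly large
(scaled) window. A minimal sketch, assuming a Linux-like stack; the 4 MiB
figure and the commented-out host are only example values:

# Request a large receive buffer *before* connect(), so the stack can pick a
# window scale factor big enough to advertise it; figures are examples only.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)

# Linux reports back roughly double the requested size (bookkeeping overhead),
# and setting SO_RCVBUF explicitly turns off its receive-buffer auto-tuning.
print("effective receive buffer:",
      sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# sock.connect(("download.example.org", 80))  # hypothetical host, connect last
sock.close()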




Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-29 Thread Andrew Daviel

On Sat, 26 Nov 2011, Ángel González wrote:


> On 10/11/11 03:24, Andrew Daviel wrote:
>> When downloading a large file over a high-latency (e.g. long physical
>> distance) high-bandwidth link, the download time is dominated by the
>> round-trip time for TCP handshakes.
>>
>> Using the "range" header in HTTP/1.1, it is possible to start multiple
>> simultaneous requests for different portions of a file using a
>> standard Apache server, and achieve a significant speedup.
>>
>> I wondered if this was of interest as an enhancement for wget.
>
> I think setting a big SO_RCVBUF should also fix your issue, by using big
> window sizes, and it's cleaner.
> OTOH, you need support from the TCP stack, and it won't get around
> per-connection rate limits that may be limiting you in the
> single-connection case.


Yes, jumbo frames work well over a private link like a lightpath. I'd 
been thinking of something that would work on the unimproved public 
internet.


I had been thinking of speeding up transfers to e.g. a WebDAV repository 
on another continent, but I recently became aware of "download 
accelerators" designed primarily to thwart bandwidth 
allocation/throttling. Interestingly Wget is listed on the Wikipedia page 
as a "download manager", implying it can already do this.


http://en.wikipedia.org/wiki/Download_acceleration
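
The mechanics behind such accelerators are simple enough to sketch: ask the
server for the file size, then fetch fixed byte ranges in parallel and stitch
the pieces back together. A minimal Python illustration using only the
standard library (this is not Andrew's Perl proof-of-principle, and the URL,
part count and output filename are made-up example values):

# Minimal sketch of parallel HTTP/1.1 Range requests; error handling omitted.
import threading
import urllib.request

URL = "http://example.org/big-file.iso"    # hypothetical file
PARTS = 4

def fetch_range(start, end, results, index):
    req = urllib.request.Request(URL,
                                 headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        # A Range-capable server answers 206 Partial Content; one that ignores
        # Range would send 200 and the whole body instead.
        results[index] = resp.read()

def main():
    head = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    chunk = size // PARTS
    results = [None] * PARTS
    threads = []
    for i in range(PARTS):
        start = i * chunk
        end = size - 1 if i == PARTS - 1 else (i + 1) * chunk - 1
        t = threading.Thread(target=fetch_range, args=(start, end, results, i))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

    with open("big-file.iso", "wb") as out:
        for part in results:
            out.write(part)

if __name__ == "__main__":
    main()

A real tool would also check that each response really came back as 206
Partial Content before reassembling, since a server that ignores Range will
happily send the whole file several times over.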


--
Andrew Daviel, TRIUMF, Canada

Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-30 Thread Paul Wratt
Another command-line option, possibly?

Paul

2011/11/30 Andrew Daviel :
> On Sat, 26 Nov 2011, Ángel González wrote:
>
>> On 10/11/11 03:24, Andrew Daviel wrote:
>>>
>>>
>>> When downloading a large file over a high-latency (e.g. long physical
>>> distance) high-bandwidth link, the download time is dominated by the
>>> round-trip time for TCP handshakes.
>>>
>>> Using the "range" header in HTTP/1.1, it is possible to start multiple
>>> simultaneous requests for different portions of a file using a
>>> standard Apache server, and achieve a significant speedup.
>>>
>>> I wondered if this was of interest as an enhancement for wget.
>>
>>
>> I think setting a big SO_RCVBUF should also fix your issue, by using big
>> window sizes, and it's cleaner.
>> OTOH, you need support from the TCP stack, and it won't get around
>> per-connection rate limits that may be limiting you in the
>> single-connection case.
>
>
> Yes, jumbo frames work well over a private link like a lightpath. I'd been
> thinking of something that would work on the unimproved public internet.
>
> I had been thinking of speeding up transfers to e.g. a WebDAV repository on
> another continent, but I recently became aware of "download accelerators"
> designed primarily to thwart bandwidth allocation/throttling. Interestingly
> Wget is listed on the Wikipedia page as a "download manager", implying it
> can already do this.
>
> http://en.wikipedia.org/wiki/Download_acceleration
>
>
> --
> Andrew Daviel, TRIUMF, Canada



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-30 Thread Fernando Cassia
2011/11/29 Andrew Daviel :
> but I recently became aware of "download accelerators" designed primarily to
> thwart bandwidth allocation/throttling. Interestingly Wget is listed on the
> Wikipedia page as a "download manager", implying it can already do this.
>
> http://en.wikipedia.org/wiki/Download_acceleration

The 'Axel' command-line app:

http://www.theinquirer.net/inquirer/news/1037769/don-download-accelerator

FC

-- 
"The purpose of computing is insight, not numbers."
Richard Hamming - http://en.wikipedia.org/wiki/Hamming_code



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-30 Thread Fernando Cassia
On Wed, Nov 9, 2011 at 23:24, Andrew Daviel  wrote:
> When downloading a large file over a high-latency (e.g. long physical
> distance) high-bandwidth link, the download time is dominated by the
> round-trip time for TCP handshakes.

Which is why large files should be stored on FTP servers, not HTTP.

FTP was designed for a reason: to transfer large binary files reliably.

HTTP was designed primarily to serve web pages, not large file downloads.

But try telling that to today's sysadmins and webmasters educated after Win95...

FC

-- 
"The purpose of computing is insight, not numbers."
Richard Hamming - http://en.wikipedia.org/wiki/Hamming_code



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-30 Thread Paul Wratt
Unfortunately there are now a lot of services offered where FTP access
is not provided, or not available, or even blocked. About 90% of the
servers I mirror fit into this category.

On Thu, Dec 1, 2011 at 6:43 AM, Fernando Cassia  wrote:
> On Wed, Nov 9, 2011 at 23:24, Andrew Daviel  wrote:
>> When downloading a large file over a high-latency (e.g. long physical
>> distance) high-bandwidth link, the download time is dominated by the
>> round-trip time for TCP handshakes.
>
> Which is why large files should be stored on FTP servers, not HTTP.
>
> FTP was designed for a reason: to transfer large binary files reliably.
>
> HTTP was designed primarily to serve web pages, not large file downloads.
>
> But try telling that to today's sysadmins and webmasters educated after
> Win95...
>
> FC
>
> --
> "The purpose of computing is insight, not numbers."
> Richard Hamming - http://en.wikipedia.org/wiki/Hamming_code
>



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-11-30 Thread Daniel Stenberg

On Wed, 30 Nov 2011, Fernando Cassia wrote:

>> When downloading a large file over a high-latency (e.g. long physical
>> distance) high-bandwidth link, the download time is dominated by the
>> round-trip time for TCP handshakes.


First off, this initial conclusion is incorrect. RTT has basically no impact
on an ongoing TCP transfer these days, since large windows were introduced
about a decade ago.
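
To put numbers on that: a single connection can move at most about one receive
window of data per round trip, so throughput is capped at roughly window/RTT.
A quick back-of-the-envelope calculation with example figures (a 150 ms round
trip, the unscaled 64 KiB maximum versus a 4 MiB scaled window):

# Bandwidth-delay-product arithmetic; RTT and window sizes are example values.
rtt = 0.150                       # seconds, e.g. a Europe <-> Vancouver round trip
for label, window in [("64 KiB window", 64 * 1024),
                      ("4 MiB window", 4 * 1024 * 1024)]:
    max_bits_per_second = window * 8 / rtt
    print("%-15s ~ %.0f Mbit/s ceiling" % (label, max_bits_per_second / 1e6))
# Prints roughly 3 Mbit/s for the unscaled window and 224 Mbit/s for the 4 MiB one.

The gap between those two ceilings is what window scaling closes, which is the
point being made above.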



> Which is why large files should be stored on FTP servers, not HTTP.


This is a myth repeated by people all over the net and throughout history. It
is not true.


I've tried to spell out some FTP vs HTTP facts here: 
http://daniel.haxx.se/docs/ftp-vs-http.html


--

 / daniel.haxx.se



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-12-01 Thread Andrew Daviel

On Wed, 30 Nov 2011, Daniel Stenberg wrote:


> On Wed, 30 Nov 2011, Fernando Cassia wrote:
>>> When downloading a large file over a high-latency (e.g. long physical
>>> distance) high-bandwidth link, the download time is dominated by the
>>> round-trip time for TCP handshakes.
>
> First off, this initial conclusion is incorrect. RTT has basically no impact
> on an ongoing TCP transfer these days, since large windows were introduced
> about a decade ago.


I may be wrong, but I thought that to get significant benefit large 
windows had to be enabled on every router between the source and 
destination, which I did not think was the case on the public Internet.



>> Which is why large files should be stored on FTP servers, not HTTP.


I recall that BABARftp and GridFTP have support for multiple threaded 
downloads, but that regular FTP does not. I'd agree with Daniel - FTP 
offers no advantage (for serving files) over HTTP and did not support SSL 
for so long that some orgs deprecated it as insecure.


Re. Axel - thanks for the link, FC. I hadn't heard of it. It seems to start
an initial full-length transfer, then kill it with TCP resets if the server
supports ranges.
I find wget slightly faster sending a photo from my work to home (in the
same city), but axel faster getting a large file from the other side of the
world, as expected. No DAV/upload ability, though.


(I'm not that interested in bypassing download throttling; this was more 
a thought experiment prompted by a discussion of using WebDAV between 
Europe and North America)


--
Andrew Daviel, TRIUMF, Canada



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-12-02 Thread Daniel Stenberg

On Thu, 1 Dec 2011, Andrew Daviel wrote:

>> First off, this initial conclusion is incorrect. RTT has basically no impact
>> on an ongoing TCP transfer these days, since large windows were introduced
>> about a decade ago.
>
> I may be wrong, but I thought that to get significant benefit large windows
> had to be enabled on every router between the source and destination, which
> I did not think was the case on the public Internet.


Not exactly, but it does need to be supported by the firewalls, NATs or
whatever you have in between. It is a TCP option, not an IP one:
http://en.wikipedia.org/wiki/TCP_window_scale_option


I don't believe this specific option is a common problem these days as current 
network speeds over a single TCP connection wouldn't be achievable without it.
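
For what it's worth, on Linux the option can be checked directly (it has
defaulted to on for a long time); a trivial, Linux-only snippet:

# 1 means TCP window scaling is enabled (the default on Linux).
with open("/proc/sys/net/ipv4/tcp_window_scaling") as f:
    print("tcp_window_scaling =", f.read().strip())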


--

 / daniel.haxx.se



Re: [Bug-wget] Support for long-haul high-bandwidth links

2011-12-06 Thread Andrew Daviel

On Fri, 2 Dec 2011, Daniel Stenberg wrote:

>> I may be wrong, but I thought that to get significant benefit large windows
>> had to be enabled on every router between the source and destination, which
>> I did not think was the case on the public Internet.
>
> Not exactly, but it does need to be supported by the firewalls, NATs or
> whatever you have in between. It is a TCP option, not an IP one:
> http://en.wikipedia.org/wiki/TCP_window_scale_option


I'm probably confusing that with jumbo frames, which we had
experimentally enabled on a long-distance "lightpath".

http://en.wikipedia.org/wiki/Jumbo_frames
http://www.bigbangwidth.com/pdf/ADS.pdf

In studies done a while ago now (2001), people had tuned TCP parameters,
including the buffer size, but still got better performance from
multi-threading:

http://hepwww.rl.ac.uk/Adye/talks/010402-ftp/html/index.htm


ES.net is still recommending multi-threaded transfers in pages updated this
year, though, as you say, auto-tuning the congestion window is now
mainstream:

http://fasterdata.es.net/fasterdata/host-tuning/
http://fasterdata.es.net/fasterdata/data-transfer-tools/

--
Andrew Daviel, TRIUMF, Canada