Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
>
> On 21.12.2013 at 10:24, Yousong Zhou wrote:
> > In my situation, wget was triggered on the remote machine like the
> > following:
> >
> >     wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> >
> > Then on the local machine, I would download with:
> >
> >     nc localhost 7193
> >
> > Before these, a local forwarding tunnel has been set up with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggered and `--continue' will not work.  I am sure
> > there are other cases where `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
>
> When I just look at your problem it seems to be easier to set up the tunnel
> slightly differently and pull with standard wget.  If the URL looks like
>     http://:/
> and you set up the tunnel with
>     ssh -L 7193::
> and then just
>     wget http://localhost:7193/
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper
> values for -t / -T automatically).

I just tried this approach.  It did not work as expected because the HTTP
server responded with several levels of redirection, so the host part
changed on the fly.

Anyway, thank you for your time.

yousong
Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
>
> On 21.12.2013 at 10:24, Yousong Zhou wrote:
> > In my situation, wget was triggered on the remote machine like the
> > following:
> >
> >     wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> >
> > Then on the local machine, I would download with:
> >
> >     nc localhost 7193
> >
> > Before these, a local forwarding tunnel has been set up with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggered and `--continue' will not work.  I am sure
> > there are other cases where `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
>
> When I just look at your problem it seems to be easier to set up the tunnel
> slightly differently and pull with standard wget.  If the URL looks like
>     http://:/
> and you set up the tunnel with
>     ssh -L 7193::
> and then just
>     wget http://localhost:7193/
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper
> values for -t / -T automatically).

Yes, this is a more sensible way of doing the download once the tunnel is
up.  Thank you for pointing this out.  Really, one thing that concerned me
when redirecting netcat's stdout to a file is that hours of continuous
writing may not be good for my hard disk.  I may count on wget's behavior
for this ;)

In other cases, `--start-pos' may still come in handy, for example, when
trying to peek at just parts of each file on a server without fully
downloading them, or when doing parallel downloads with some simple
scripting.  Just to name a few I came up with.

yousong

> Best regards
>
>   -- Dago
>
> --
> "You don't become great by trying to be great, you become great by wanting
> to do something, and then doing it so hard that you become great in the
> process." - xkcd #896
Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
Hi,

On 21.12.2013 at 10:24, Yousong Zhou wrote:
> In my situation, wget was triggered on the remote machine like the
> following:
>
>     wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
>
> Then on the local machine, I would download with:
>
>     nc localhost 7193
>
> Before these, a local forwarding tunnel has been set up with ssh to make
> this possible.  So in this case, there was no local file on the machine
> where wget was triggered and `--continue' will not work.  I am sure
> there are other cases where `--start-pos' would be useful and that
> `--start-pos' would make wget more complete.

When I just look at your problem it seems to be easier to set up the tunnel
slightly differently and pull with standard wget.  If the URL looks like
    http://:/
and you set up the tunnel with
    ssh -L 7193::
and then just
    wget http://localhost:7193/
the range requests would be sent by wget just fine to the initial server and
you could also safely use -c on further wget invocations (or with proper
values for -t / -T automatically).

Best regards

  -- Dago

--
"You don't become great by trying to be great, you become great by wanting
to do something, and then doing it so hard that you become great in the
process." - xkcd #896
Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
On Sat, Dec 21, 2013 at 01:51:04PM +0530, Darshit Shah wrote:
> I have a few comments on the patch. Commenting inline.

Thank you.

> On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou wrote:
>
> > This patch adds an option `--start-pos' for specifying starting position
> > of a download, both for HTTP and FTP.  When specified, the newly added
> > option would override `--continue'.  Apart from that, no existing code
> > should be affected.
> >
> > Signed-off-by: Yousong Zhou
> > ---
> > Hi,
> >
> > I found myself needing this feature when I was trying to tunnel the
> > download of a big file (several gigabytes) from a remote machine back to
> > local through a somewhat flaky connection.  It's a pain both for the
> > server and local network users if we have to repeat the previously
> > already downloaded part in case that the connection hangs or breaks.
> > Specifying 'Range: ' header is not an option for wget (integrity check
> > in the code would fail), and curl is not fast enough.  So I decided to
> > make this patch in hope that this can also be useful to someone else.
>
> What integrity check would fail on using the Range Header? And if you
> already have a partially downloaded file why is using the --continue switch
> not an option?

`--continue' only works if there is already a partially downloaded file on
disk.  Otherwise, specifying `-c' will only tell wget to start from scratch.

By 'Range: ' header I mean headers specified by `--header'.  If the server
sends back a 'Content-Range: ' header in the response, wget would think that
it's unexpected or not matching what's already on the disk (would be zero if
there is no file on disk).  If I get the code right, the check is in
`http.c:gethttp()', at lines 2744-2753:

      if ((contrange != 0 && contrange != hs->restval)
          || (H_PARTIAL (statcode) && !contrange))
        {
          /* The Range request was somehow misunderstood by the server.
             Bail out.  */
          xfree_null (type);
          CLOSE_INVALIDATE (sock);
          xfree (head);
          return RANGEERR;
        }

In my situation, wget was triggered on the remote machine like the
following:

    wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193

Then on the local machine, I would download with:

    nc localhost 7193

Before these, a local forwarding tunnel has been set up with ssh to make
this possible.  So in this case, there was no local file on the machine
where wget was triggered and `--continue' will not work.  I am sure
there are other cases where `--start-pos' would be useful and that
`--start-pos' would make wget more complete.

> > yousong
> >
> >  doc/ChangeLog |    4 ++++
> >  doc/wget.texi |   14 ++++++++++++++
> >  src/ChangeLog |    9 +++++++++
> >  src/ftp.c     |    2 ++
> >  src/http.c    |    2 ++
> >  src/init.c    |    1 +
> >  src/main.c    |    1 +
> >  src/options.h |    1 +
> >  8 files changed, 34 insertions(+), 0 deletions(-)
> >
> > diff --git a/doc/ChangeLog b/doc/ChangeLog
> > index 3b05756..df103c8 100644
> > --- a/doc/ChangeLog
> > +++ b/doc/ChangeLog
> > @@ -1,3 +1,7 @@
> > +2013-12-21  Yousong Zhou
> > +
> > +        * wget.texi: Add documentation for --start-pos.
> > +
> >  2013-10-06  Tim Ruehsen
> >
> >          * wget.texi: add/explain quoting of wildcard patterns
> > diff --git a/doc/wget.texi b/doc/wget.texi
> > index 4a1f7f1..166ea08 100644
> > --- a/doc/wget.texi
> > +++ b/doc/wget.texi
> > @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
> >  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
> >  servers that support the @code{Range} header.
> >
> > +@cindex offset
> > +@cindex continue retrieval
> > +@cindex incomplete downloads
> > +@cindex resume download
> > +@cindex start position
> > +@item --start-pos=@var{OFFSET}
> > +Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
> > +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> > +
> > +When specified, it would override the behavior of @samp{--continue}.  When
> > +using this option, you may also want to explicitly specify an output filename
> > +with @samp{-O FILE} in order to not overwrite an existing partially downloaded
> > +file.
> > +
> >  @cindex progress indicator
> >  @cindex dot style
> >  @item --progress=@var{type}
> > diff --git a/src/ChangeLog b/src/ChangeLog
> > index 42ce3e4..ab8a496 100644
> > --- a/src/ChangeLog
> > +++ b/src/ChangeLog
> > @@ -1,3 +1,12 @@
> > +2013-12-21  Yousong Zhou
> > +
> > +        * options.h: Add option --start-pos to specify start position of
> > +        a download.
> > +        * main.c: Same purpose as above.
> > +        * init.c: Same purpose as above.
> > +        * http.c: Utilize opt.start_pos for HTTP download.
> > +        * ftp.c: Utilize opt.start_pos for FTP retrieval.
> > +
> >  2013-11-02  Giusepp
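The consistency check quoted in the message above can be restated as a small standalone predicate, which makes it easier to see why a hand-written `--header 'Range: ...'` trips it. This is only a sketch, not wget's code: the function name and parameters are hypothetical, and it assumes that `H_PARTIAL (statcode)` is true for a 206 status and that `contrange` is 0 when the response carries no usable Content-Range header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical restatement of the quoted condition; not wget's code.
   restval:    offset wget itself decided to request (0 without a local file)
   contrange:  first-byte position taken from Content-Range, 0 if absent
   is_partial: true when the response status is 206 (assumption)  */
static bool
range_reply_is_consistent (int64_t restval, int64_t contrange, bool is_partial)
{
  if (contrange != 0 && contrange != restval)
    return false;   /* server started at an offset wget did not track */
  if (is_partial && contrange == 0)
    return false;   /* partial reply without a usable Content-Range */
  return true;      /* anything else passes this particular check */
}
```

With a manual Range header and no local file, `restval` stays 0 while the server answers with a non-zero start offset, so the first branch rejects the reply. When `restval` itself carries the offset, as the patch arranges, the two values agree.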
Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
I have a few comments on the patch. Commenting inline.

On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou wrote:

> This patch adds an option `--start-pos' for specifying starting position
> of a download, both for HTTP and FTP.  When specified, the newly added
> option would override `--continue'.  Apart from that, no existing code
> should be affected.
>
> Signed-off-by: Yousong Zhou
> ---
> Hi,
>
> I found myself needing this feature when I was trying to tunnel the
> download of a big file (several gigabytes) from a remote machine back to
> local through a somewhat flaky connection.  It's a pain both for the
> server and local network users if we have to repeat the previously
> already downloaded part in case that the connection hangs or breaks.
> Specifying 'Range: ' header is not an option for wget (integrity check
> in the code would fail), and curl is not fast enough.  So I decided to
> make this patch in hope that this can also be useful to someone else.

What integrity check would fail on using the Range Header? And if you
already have a partially downloaded file why is using the --continue switch
not an option?

> yousong
>
>  doc/ChangeLog |    4 ++++
>  doc/wget.texi |   14 ++++++++++++++
>  src/ChangeLog |    9 +++++++++
>  src/ftp.c     |    2 ++
>  src/http.c    |    2 ++
>  src/init.c    |    1 +
>  src/main.c    |    1 +
>  src/options.h |    1 +
>  8 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/doc/ChangeLog b/doc/ChangeLog
> index 3b05756..df103c8 100644
> --- a/doc/ChangeLog
> +++ b/doc/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-12-21  Yousong Zhou
> +
> +        * wget.texi: Add documentation for --start-pos.
> +
>  2013-10-06  Tim Ruehsen
>
>          * wget.texi: add/explain quoting of wildcard patterns
> diff --git a/doc/wget.texi b/doc/wget.texi
> index 4a1f7f1..166ea08 100644
> --- a/doc/wget.texi
> +++ b/doc/wget.texi
> @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
>  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
>  servers that support the @code{Range} header.
>
> +@cindex offset
> +@cindex continue retrieval
> +@cindex incomplete downloads
> +@cindex resume download
> +@cindex start position
> +@item --start-pos=@var{OFFSET}
> +Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
> +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> +
> +When specified, it would override the behavior of @samp{--continue}.  When
> +using this option, you may also want to explicitly specify an output filename
> +with @samp{-O FILE} in order to not overwrite an existing partially downloaded
> +file.
> +
>  @cindex progress indicator
>  @cindex dot style
>  @item --progress=@var{type}
> diff --git a/src/ChangeLog b/src/ChangeLog
> index 42ce3e4..ab8a496 100644
> --- a/src/ChangeLog
> +++ b/src/ChangeLog
> @@ -1,3 +1,12 @@
> +2013-12-21  Yousong Zhou
> +
> +        * options.h: Add option --start-pos to specify start position of
> +        a download.
> +        * main.c: Same purpose as above.
> +        * init.c: Same purpose as above.
> +        * http.c: Utilize opt.start_pos for HTTP download.
> +        * ftp.c: Utilize opt.start_pos for FTP retrieval.
> +
>  2013-11-02  Giuseppe Scrivano
>
>          * http.c (gethttp): Increase max header value length to 512.
> diff --git a/src/ftp.c b/src/ftp.c
> index c2522ca..c7ab6ef 100644
> --- a/src/ftp.c
> +++ b/src/ftp.c
> @@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
>        /* Decide whether or not to restart.  */
>        if (con->cmd & DO_LIST)
>          restval = 0;
> +      else if (opt.start_pos)
> +        restval = opt.start_pos;
>        else if (opt.always_rest
>            && stat (locf, &st) == 0
>            && S_ISREG (st.st_mode))
> diff --git a/src/http.c b/src/http.c
> index 754b7ec..a354c6b 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
>        /* Decide whether or not to restart.  */
>        if (force_full_retrieve)
>          hstat.restval = hstat.len;
> +      else if (opt.start_pos)
> +        hstat.restval = opt.start_pos;
>        else if (opt.always_rest
>            && got_name
>            && stat (hstat.local_file, &st) == 0
> diff --git a/src/init.c b/src/init.c
> index 84ae654..7f7a34e 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -271,6 +271,7 @@ static const struct {
>    { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
>    { "spanhosts",        &opt.spanhost,          cmd_boolean },
>    { "spider",           &opt.spider,            cmd_boolean },
> +  { "startpos",         &opt.start_pos,         cmd_bytes },
>    { "strictcomments",   &opt.strict_comments,   cmd_boolean },
>    { "timeout",          NULL,                   cmd_spec_timeout },
>    { "timestamping",     &opt.timestamping,      cmd_boolean },
>
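The documentation hunk quoted above says the offset may carry a `k' or `m' suffix, and the init.c hunk registers the option with `cmd_bytes`. As a rough illustration of what such parsing involves, here is a sketch of a suffix-aware parser. It is not wget's parser: `parse_offset` is a hypothetical helper, it handles only the two suffixes the patch documents, and it assumes 1 k = 1024 bytes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not wget's cmd_bytes: accepts "123", "10k", "2m".
   Assumes binary multiples (1k = 1024 bytes).  Returns false on bad input
   and leaves *out untouched in that case.  */
static bool
parse_offset (const char *s, int64_t *out)
{
  char *end;
  long long n;
  int64_t mult = 1;

  errno = 0;
  n = strtoll (s, &end, 10);
  if (end == s || errno == ERANGE || n < 0)
    return false;               /* no digits, overflow, or negative */

  if (*end == 'k' || *end == 'K')
    {
      mult = 1024;
      end++;
    }
  else if (*end == 'm' || *end == 'M')
    {
      mult = 1024 * 1024;
      end++;
    }

  if (*end != '\0' || n > INT64_MAX / mult)
    return false;               /* trailing junk or product too large */

  *out = (int64_t) n * mult;
  return true;
}
```

Rejecting trailing characters and checking the multiplication for overflow keeps a mistyped offset from silently turning into a different starting position.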
[Bug-wget] [PATCH] Make wget capable of starting download from a specified position.
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou
---
Hi,

I found myself needing this feature when I was trying to tunnel the download
of a big file (several gigabytes) from a remote machine back to local through
a somewhat flaky connection.  It's a pain both for the server and local
network users if we have to repeat the previously already downloaded part in
case that the connection hangs or breaks.  Specifying 'Range: ' header is not
an option for wget (integrity check in the code would fail), and curl is not
fast enough.  So I decided to make this patch in hope that this can also be
useful to someone else.

yousong

 doc/ChangeLog |    4 ++++
 doc/wget.texi |   14 ++++++++++++++
 src/ChangeLog |    9 +++++++++
 src/ftp.c     |    2 ++
 src/http.c    |    2 ++
 src/init.c    |    1 +
 src/main.c    |    1 +
 src/options.h |    1 +
 8 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou
+
+        * wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen

         * wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..166ea08 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.

+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou
+
+        * options.h: Add option --start-pos to specify start position of
+        a download.
+        * main.c: Same purpose as above.
+        * init.c: Same purpose as above.
+        * http.c: Utilize opt.start_pos for HTTP download.
+        * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano

         * http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
       /* Decide whether or not to restart.  */
       if (con->cmd & DO_LIST)
         restval = 0;
+      else if (opt.start_pos)
+        restval = opt.start_pos;
       else if (opt.always_rest
           && stat (locf, &st) == 0
           && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
       /* Decide whether or not to restart.  */
       if (force_full_retrieve)
         hstat.restval = hstat.len;
+      else if (opt.start_pos)
+        hstat.restval = opt.start_pos;
       else if (opt.always_rest
           && got_name
           && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",        &opt.spanhost,          cmd_boolean },
   { "spider",           &opt.spider,            cmd_boolean },
+  { "startpos",         &opt.start_pos,         cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",          NULL,                   cmd_spec_timeout },
   { "timestamping",     &opt.timestamping,      cmd_boolean },
diff --git a/src/main.c b/src/main.c
index 19d7253..4fbfaee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
     { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
     { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
     { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+    { "start-pos", 0, OPT_VALUE, "startpos", -1 },
     { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
     { "timeout", 'T', OPT_VALUE, "timeout", -1 },
     { "timestamp
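The precedence that the http.c hunk above sets up for the restart offset can be written out as a pure function. This is a simplified sketch and not the patched wget code: the struct and function names are hypothetical, the `stat` and `S_ISREG` tests are folded into one boolean, and every branch after the `--continue` case (not visible in the hunk) is collapsed to 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs mirroring the names that appear in the hunk.  */
struct restart_inputs
{
  bool force_full_retrieve;   /* flag tested first in the hunk */
  int64_t len;                /* stands in for hstat.len */
  int64_t start_pos;          /* --start-pos value, 0 when not given */
  bool always_rest;           /* --continue was given */
  bool have_local_file;       /* a regular local file exists */
  int64_t local_size;         /* size of that file in bytes */
};

/* Return the offset to request, following the branch order of the hunk.  */
static int64_t
choose_restval (const struct restart_inputs *in)
{
  if (in->force_full_retrieve)
    return in->len;
  if (in->start_pos)
    return in->start_pos;     /* new branch: takes priority over --continue */
  if (in->always_rest && in->have_local_file)
    return in->local_size;
  return 0;                   /* simplification: later branches omitted */
}
```

One consequence visible in the hunk as posted: because the test is `if (opt.start_pos)`, a value of 0 is treated the same as the option being absent, so `--continue` still applies in that case.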