On Sat, Dec 21, 2013 at 01:51:04PM +0530, Darshit Shah wrote: > I have a few comments on the patch. Commenting inline.
Thank you. > > On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou <yszhou4t...@gmail.com>wrote: > > > This patch adds an option `--start-pos' for specifying starting position > > of a download, both for HTTP and FTP. When specified, the newly added > > option would override `--continue'. Apart from that, no existing code > > should be affected. > > > > Signed-off-by: Yousong Zhou <yszhou4t...@gmail.com> > > --- > > Hi, > > > > I found myself needed this feature when I was trying to tunnel the > > download of > > big file (several gigabytes) from a remote machine back to local through a > > somewhat flaky connection. It's a pain both for the server and local > > network > > users if we have to repeat the previously already downloaded part in case > > that > > the connection hangs or breaks. Specifying 'Range: ' header is not an > > option > > for wget (integrity check in the code would fail), and curl is not fast > > enough. > > So I decided to make this patch in hope that this can also be useful to > > someone > > else. > > > > What integrity check would fail on using the Range Header? And if you > already have a partially downloaded file why is using the --continue switch > on an option? `--continue` only works if there is already a partially downloaded file on disk. Otherwise, specifying `-c' will only tell wget to start from scratch. By 'Range: ' header I mean headers specified by `--header'. If the server sends back a 'Content-Range: ' header in the response, wget would think that it's unexpected or not matching what's already on the disk (would be zero if there is no file on disk). If I get the code right, the check is at `http.c:gethttp()': 2744 if ((contrange != 0 && contrange != hs->restval) 2745 || (H_PARTIAL (statcode) && !contrange)) 2746 { 2747 /* The Range request was somehow misunderstood by the server. 2748 Bail out. */ 2749 xfree_null (type); 2750 CLOSE_INVALIDATE (sock); 2751 xfree (head); 2752 return RANGEERR; 2753 } In my situation, wget was trigger on the remote machine like the following: wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193 Then on local machine, I would download with: nc localhost 7193 Before these, a local forwarding tunnel has been setup with ssh to make this possible. So in this case, there was no local file on the machine where wget was triggerred and `--continue' will not work. I am sure there are other cases `--start-pos' would be useful and that `--start-pos' would make wget more complete. > > yousong > > > > doc/ChangeLog | 4 ++++ > > doc/wget.texi | 14 ++++++++++++++ > > src/ChangeLog | 9 +++++++++ > > src/ftp.c | 2 ++ > > src/http.c | 2 ++ > > src/init.c | 1 + > > src/main.c | 1 + > > src/options.h | 1 + > > 8 files changed, 34 insertions(+), 0 deletions(-) > > > > diff --git a/doc/ChangeLog b/doc/ChangeLog > > index 3b05756..df103c8 100644 > > --- a/doc/ChangeLog > > +++ b/doc/ChangeLog > > @@ -1,3 +1,7 @@ > > +2013-12-21 Yousong Zhou <yszhou4t...@gmail.com> > > + > > + * wget.texi: Add documentation for --start-pos. > > + > > 2013-10-06 Tim Ruehsen <tim.rueh...@gmx.de> > > > > * wget.texi: add/explain quoting of wildcard patterns > > diff --git a/doc/wget.texi b/doc/wget.texi > > index 4a1f7f1..166ea08 100644 > > --- a/doc/wget.texi > > +++ b/doc/wget.texi > > @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if > > you try to use > > Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http} > > servers that support the @code{Range} header. > > > > +@cindex offset > > +@cindex continue retrieval > > +@cindex incomplete downloads > > +@cindex resume download > > +@cindex start position > > +@item --start-pos=@var{OFFSET} > > +Start the download at position @var{OFFSET}. Offset may be expressed in > > bytes, > > +kilobytes with the `k' suffix, or megabytes with the `m' suffix. > > + > > +When specified, it would override the behavior of @samp{--continue}. When > > +using this option, you may also want to explicitly specify an output > > filename > > +with @samp{-O FILE} in order to not overwrite an existing partially > > downloaded > > +file. > > + > > @cindex progress indicator > > @cindex dot style > > @item --progress=@var{type} > > diff --git a/src/ChangeLog b/src/ChangeLog > > index 42ce3e4..ab8a496 100644 > > --- a/src/ChangeLog > > +++ b/src/ChangeLog > > @@ -1,3 +1,12 @@ > > +2013-12-21 Yousong Zhou <yszhou4t...@gmail.com> > > + > > + * options.h: Add option --start-pos to specify start position of > > + a download. > > + * main.c: Same purpose as above. > > + * init.c: Same purpose as above. > > + * http.c: Utilize opt.start_pos for HTTP download. > > + * ftp.c: Utilize opt.start_pos for FTP retrieval. > > + > > 2013-11-02 Giuseppe Scrivano <gscri...@redhat.com> > > > > * http.c (gethttp): Increase max header value length to 512. > > diff --git a/src/ftp.c b/src/ftp.c > > index c2522ca..c7ab6ef 100644 > > --- a/src/ftp.c > > +++ b/src/ftp.c > > @@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo > > *f, ccon *con, char **local_fi > > /* Decide whether or not to restart. */ > > if (con->cmd & DO_LIST) > > restval = 0; > > + else if (opt.start_pos) > > + restval = opt.start_pos; > > else if (opt.always_rest > > && stat (locf, &st) == 0 > > && S_ISREG (st.st_mode)) > > diff --git a/src/http.c b/src/http.c > > index 754b7ec..a354c6b 100644 > > --- a/src/http.c > > +++ b/src/http.c > > @@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file > > exists.\n")); > > /* Decide whether or not to restart. */ > > if (force_full_retrieve) > > hstat.restval = hstat.len; > > + else if (opt.start_pos) > > + hstat.restval = opt.start_pos; > > else if (opt.always_rest > > && got_name > > && stat (hstat.local_file, &st) == 0 > > diff --git a/src/init.c b/src/init.c > > index 84ae654..7f7a34e 100644 > > --- a/src/init.c > > +++ b/src/init.c > > @@ -271,6 +271,7 @@ static const struct { > > { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean }, > > { "spanhosts", &opt.spanhost, cmd_boolean }, > > { "spider", &opt.spider, cmd_boolean }, > > + { "startpos", &opt.start_pos, cmd_bytes }, > > { "strictcomments", &opt.strict_comments, cmd_boolean }, > > { "timeout", NULL, cmd_spec_timeout }, > > { "timestamping", &opt.timestamping, cmd_boolean }, > > diff --git a/src/main.c b/src/main.c > > index 19d7253..4fbfaee 100644 > > --- a/src/main.c > > +++ b/src/main.c > > @@ -281,6 +281,7 @@ static struct cmdline_option option_data[] = > > { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 }, > > { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 }, > > { "spider", 0, OPT_BOOLEAN, "spider", -1 }, > > + { "start-pos", 0, OPT_VALUE, "startpos", -1 }, > > { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 }, > > { "timeout", 'T', OPT_VALUE, "timeout", -1 }, > > { "timestamping", 'N', OPT_BOOLEAN, "timestamping", -1 }, > > diff --git a/src/options.h b/src/options.h > > index ad89627..9ba1760 100644 > > --- a/src/options.h > > +++ b/src/options.h > > @@ -115,6 +115,7 @@ struct options > > bool ask_passwd; /* Ask for password? */ > > > > bool always_rest; /* Always use REST. */ > > + wgint start_pos; /* Start position of a download. */ > > char *ftp_user; /* FTP username */ > > char *ftp_passwd; /* FTP password */ > > bool netrc; /* Whether to read .netrc. */ > > > Simply setting the startpos and restval values in Wget will not help you if > the server will keep sending the file from scratch on every request. The > only way to partially download a file is by using the Range header. Because > then, the server will send only the requested parts of the file. If a Setting the value will let wget send an appropriate 'Range: ' header or 'REST' command to the server. Below is two demo for this patch. yousong@jumper:~/wget/src$ ./wget -O - --start-pos=1213055 ftp://ftp.gnu.org/gnu/wget/wget-1.10.2.tar.gz | wc -c --2013-12-21 17:15:05-- ftp://ftp.gnu.org/gnu/wget/wget-1.10.2.tar.gz => '-' Resolving ftp.gnu.org... 2001:4830:134:3::b, 208.118.235.20 Connecting to ftp.gnu.org|2001:4830:134:3::b|:21... connected. Logging in as anonymous ... Logged in! ==> SYST ... done. ==> PWD ... done. ==> TYPE I ... done. ==> CWD (1) /gnu/wget ... done. ==> SIZE wget-1.10.2.tar.gz ... 1213056 ==> EPSV ... done. ==> REST 1213055 ... done. ==> RETR wget-1.10.2.tar.gz ... done. Length: 1213056 (1.2M), 1 remaining (unauthoritative) 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 1,213,056 --.-K/s in 0s 2013-12-21 17:15:10 (136 KB/s) - written to stdout [1213056] 1 yousong@jumper:~/wget/src$ ./wget -O - --start-pos=1213055 http://ftp.gnu.org/gnu/wget/wget-1.10.2.tar.gz | wc -c --2013-12-21 17:18:45-- http://ftp.gnu.org/gnu/wget/wget-1.10.2.tar.gz Resolving ftp.gnu.org... 2001:4830:134:3::b, 208.118.235.20 Connecting to ftp.gnu.org|2001:4830:134:3::b|:80... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 1213056 (1.2M), 1 remaining [application/x-gzip] Saving to: 'STDOUT' 100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 1,213,056 --.-K/s in 0s 2013-12-21 17:18:47 (112 KB/s) - written to stdout [1213056/1213056] 1 yousong@jumper:~/wget/src$ > server does not support the range header, you really have no other option > but need to resume downloading the file from scratch. This is a limitation > of the HTTP protocol and no client can help you get around it. Yes. I will add this to the doc. > > > Also, please send patches as git format-patch attachments. Email clients > often mangle with spaces and newlines and cause problems when applying them. The patch was formatted with git-format-patch then sent with git-send-email. I will send them as attachments if they are preferred here. Thank you for reviewing this. More suggestions? yousong > > -- > Thanking You, > Darshit Shah