Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
> 
> Am 21.12.2013 um 10:24 schrieb Yousong Zhou :
> > In my situation, wget was trigger on the remote machine like the
> > following:
> > 
> >wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> > 
> > Then on local machine, I would download with:
> > 
> >nc localhost 7193
> > 
> > Before these, a local forwarding tunnel has been setup with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggerred and `--continue' will not work.  I am sure
> > there are other cases `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
> 
> 
> When I just look at your problem it seems to be easier to set up the tunnel
> slightly different and pull with standard wget. If the URL looks like
>   http://:/
> and you set up the tunnel with
>   ssh -L 7193:: 
> and then just
>   wget http://localhost:7193/
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper 
> values
> for -t / -T automatically).

Just tried this approach.  It did not work out as expected because HTTP
server responded with multiple levels of redirection thus host part were
changed on the fly.  Anyway, thank you for you time.


yousong




Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
> 
> Am 21.12.2013 um 10:24 schrieb Yousong Zhou :
> > In my situation, wget was trigger on the remote machine like the
> > following:
> > 
> >wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> > 
> > Then on local machine, I would download with:
> > 
> >nc localhost 7193
> > 
> > Before these, a local forwarding tunnel has been setup with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggerred and `--continue' will not work.  I am sure
> > there are other cases `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
> 
> 
> When I just look at your problem it seems to be easier to set up the tunnel
> slightly different and pull with standard wget. If the URL looks like
>   http://:/
> and you set up the tunnel with
>   ssh -L 7193:: 
> and then just
>   wget http://localhost:7193/
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper 
> values
> for -t / -T automatically).

Yes, this is more sensible a way of doing the download once the tunnel
is up.  Thank you for pointing this out.  Really, one thing that has
caught much of my concern when redirecting netcat stdout to a file is
that it may not good for my hard disk with hours of continuous disk
writing.  I may count on wget's behavior on this ;)

In other cases, `--start-pos' may still come in handy, for example, when
trying to peek just parts of each file on server without fully
downloading them, or doing parallel downloads with some simple
scripting.  Just name a few I come up with.


yousong

> 
> 
> Best regards
> 
>   -- Dago
> 
> -- 
> "You don't become great by trying to be great, you become great by wanting to 
> do something,
> and then doing it so hard that you become great in the process." - xkcd #896
> 





Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Dagobert Michelsen
Hi,

Am 21.12.2013 um 10:24 schrieb Yousong Zhou :
> In my situation, wget was trigger on the remote machine like the
> following:
> 
>wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> 
> Then on local machine, I would download with:
> 
>nc localhost 7193
> 
> Before these, a local forwarding tunnel has been setup with ssh to make
> this possible.  So in this case, there was no local file on the machine
> where wget was triggerred and `--continue' will not work.  I am sure
> there are other cases `--start-pos' would be useful and that
> `--start-pos' would make wget more complete.


When I just look at your problem it seems to be easier to set up the tunnel
slightly different and pull with standard wget. If the URL looks like
  http://:/
and you set up the tunnel with
  ssh -L 7193:: 
and then just
  wget http://localhost:7193/
the range requests would be sent by wget just fine to the initial server and
you could also safely use -c on further wget invocations (or with proper values
for -t / -T automatically).


Best regards

  -- Dago

-- 
"You don't become great by trying to be great, you become great by wanting to 
do something,
and then doing it so hard that you become great in the process." - xkcd #896



smime.p7s
Description: S/MIME cryptographic signature


Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 01:51:04PM +0530, Darshit Shah wrote:
> I have a few comments on the patch. Commenting inline.

Thank you.

> 
> On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou wrote:
> 
> > This patch adds an option `--start-pos' for specifying starting position
> > of a download, both for HTTP and FTP.  When specified, the newly added
> > option would override `--continue'.  Apart from that, no existing code
> > should be affected.
> >
> > Signed-off-by: Yousong Zhou 
> > ---
> > Hi,
> >
> > I found myself needed this feature when I was trying to tunnel the
> > download of
> > big file (several gigabytes) from a remote machine back to local through a
> > somewhat flaky connection.  It's a pain both for the server and local
> > network
> > users if we have to repeat the previously already downloaded part in case
> > that
> > the connection hangs or breaks.  Specifying 'Range: ' header is not an
> > option
> > for wget (integrity check in the code would fail), and curl is not fast
> > enough.
> > So I decided to make this patch in hope that this can also be useful to
> > someone
> > else.
> >
> > What integrity check would fail on using the Range Header? And if you
> already have a partially downloaded file why is using the --continue switch
> on an option?

`--continue` only works if there is already a partially downloaded file
on disk.  Otherwise, specifying `-c' will only tell wget to start from
scratch.

By 'Range: ' header I mean headers specified by `--header'.  If the
server sends back a 'Content-Range: ' header in the response, wget would
think that it's unexpected or not matching what's already on the disk
(would be zero if there is no file on disk).  If I get the code right,
the check is at `http.c:gethttp()':

2744   if ((contrange != 0 && contrange != hs->restval)
2745   || (H_PARTIAL (statcode) && !contrange))
2746 {
2747   /* The Range request was somehow misunderstood by the server.
2748  Bail out.  */
2749   xfree_null (type);
2750   CLOSE_INVALIDATE (sock);
2751   xfree (head);
2752   return RANGEERR;
2753 }

In my situation, wget was trigger on the remote machine like the
following:

wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193

Then on local machine, I would download with:

nc localhost 7193

Before these, a local forwarding tunnel has been setup with ssh to make
this possible.  So in this case, there was no local file on the machine
where wget was triggerred and `--continue' will not work.  I am sure
there are other cases `--start-pos' would be useful and that
`--start-pos' would make wget more complete.

> 
> yousong
> >
> >  doc/ChangeLog |4 
> >  doc/wget.texi |   14 ++
> >  src/ChangeLog |9 +
> >  src/ftp.c |2 ++
> >  src/http.c|2 ++
> >  src/init.c|1 +
> >  src/main.c|1 +
> >  src/options.h |1 +
> >  8 files changed, 34 insertions(+), 0 deletions(-)
> >
> > diff --git a/doc/ChangeLog b/doc/ChangeLog
> > index 3b05756..df103c8 100644
> > --- a/doc/ChangeLog
> > +++ b/doc/ChangeLog
> > @@ -1,3 +1,7 @@
> > +2013-12-21  Yousong Zhou  
> > +
> > +   * wget.texi: Add documentation for --start-pos.
> > +
> >  2013-10-06  Tim Ruehsen  
> >
> > * wget.texi: add/explain quoting of wildcard patterns
> > diff --git a/doc/wget.texi b/doc/wget.texi
> > index 4a1f7f1..166ea08 100644
> > --- a/doc/wget.texi
> > +++ b/doc/wget.texi
> > @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if
> > you try to use
> >  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
> >  servers that support the @code{Range} header.
> >
> > +@cindex offset
> > +@cindex continue retrieval
> > +@cindex incomplete downloads
> > +@cindex resume download
> > +@cindex start position
> > +@item --start-pos=@var{OFFSET}
> > +Start the download at position @var{OFFSET}.  Offset may be expressed in
> > bytes,
> > +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> > +
> > +When specified, it would override the behavior of @samp{--continue}.  When
> > +using this option, you may also want to explicitly specify an output
> > filename
> > +with @samp{-O FILE} in order to not overwrite an existing partially
> > downloaded
> > +file.
> > +
> >  @cindex progress indicator
> >  @cindex dot style
> >  @item --progress=@var{type}
> > diff --git a/src/ChangeLog b/src/ChangeLog
> > index 42ce3e4..ab8a496 100644
> > --- a/src/ChangeLog
> > +++ b/src/ChangeLog
> > @@ -1,3 +1,12 @@
> > +2013-12-21  Yousong Zhou  
> > +
> > +   * options.h: Add option --start-pos to specify start position of
> > + a download.
> > +   * main.c: Same purpose as above.
> > +   * init.c: Same purpose as above.
> > +   * http.c: Utilize opt.start_pos for HTTP download.
> > +   * ftp.c: Utilize opt.start_pos for FTP retrieval.
> > +
> >  2013-11-02  Giusepp

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Darshit Shah
I have a few comments on the patch. Commenting inline.

On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou wrote:

> This patch adds an option `--start-pos' for specifying starting position
> of a download, both for HTTP and FTP.  When specified, the newly added
> option would override `--continue'.  Apart from that, no existing code
> should be affected.
>
> Signed-off-by: Yousong Zhou 
> ---
> Hi,
>
> I found myself needed this feature when I was trying to tunnel the
> download of
> big file (several gigabytes) from a remote machine back to local through a
> somewhat flaky connection.  It's a pain both for the server and local
> network
> users if we have to repeat the previously already downloaded part in case
> that
> the connection hangs or breaks.  Specifying 'Range: ' header is not an
> option
> for wget (integrity check in the code would fail), and curl is not fast
> enough.
> So I decided to make this patch in hope that this can also be useful to
> someone
> else.
>
> What integrity check would fail on using the Range Header? And if you
already have a partially downloaded file why is using the --continue switch
on an option?

yousong
>
>  doc/ChangeLog |4 
>  doc/wget.texi |   14 ++
>  src/ChangeLog |9 +
>  src/ftp.c |2 ++
>  src/http.c|2 ++
>  src/init.c|1 +
>  src/main.c|1 +
>  src/options.h |1 +
>  8 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/doc/ChangeLog b/doc/ChangeLog
> index 3b05756..df103c8 100644
> --- a/doc/ChangeLog
> +++ b/doc/ChangeLog
> @@ -1,3 +1,7 @@
> +2013-12-21  Yousong Zhou  
> +
> +   * wget.texi: Add documentation for --start-pos.
> +
>  2013-10-06  Tim Ruehsen  
>
> * wget.texi: add/explain quoting of wildcard patterns
> diff --git a/doc/wget.texi b/doc/wget.texi
> index 4a1f7f1..166ea08 100644
> --- a/doc/wget.texi
> +++ b/doc/wget.texi
> @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if
> you try to use
>  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
>  servers that support the @code{Range} header.
>
> +@cindex offset
> +@cindex continue retrieval
> +@cindex incomplete downloads
> +@cindex resume download
> +@cindex start position
> +@item --start-pos=@var{OFFSET}
> +Start the download at position @var{OFFSET}.  Offset may be expressed in
> bytes,
> +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> +
> +When specified, it would override the behavior of @samp{--continue}.  When
> +using this option, you may also want to explicitly specify an output
> filename
> +with @samp{-O FILE} in order to not overwrite an existing partially
> downloaded
> +file.
> +
>  @cindex progress indicator
>  @cindex dot style
>  @item --progress=@var{type}
> diff --git a/src/ChangeLog b/src/ChangeLog
> index 42ce3e4..ab8a496 100644
> --- a/src/ChangeLog
> +++ b/src/ChangeLog
> @@ -1,3 +1,12 @@
> +2013-12-21  Yousong Zhou  
> +
> +   * options.h: Add option --start-pos to specify start position of
> + a download.
> +   * main.c: Same purpose as above.
> +   * init.c: Same purpose as above.
> +   * http.c: Utilize opt.start_pos for HTTP download.
> +   * ftp.c: Utilize opt.start_pos for FTP retrieval.
> +
>  2013-11-02  Giuseppe Scrivano  
>
> * http.c (gethttp): Increase max header value length to 512.
> diff --git a/src/ftp.c b/src/ftp.c
> index c2522ca..c7ab6ef 100644
> --- a/src/ftp.c
> +++ b/src/ftp.c
> @@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo
> *f, ccon *con, char **local_fi
>/* Decide whether or not to restart.  */
>if (con->cmd & DO_LIST)
>  restval = 0;
> +  else if (opt.start_pos)
> +restval = opt.start_pos;
>else if (opt.always_rest
>&& stat (locf, &st) == 0
>&& S_ISREG (st.st_mode))
> diff --git a/src/http.c b/src/http.c
> index 754b7ec..a354c6b 100644
> --- a/src/http.c
> +++ b/src/http.c
> @@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file
> exists.\n"));
>/* Decide whether or not to restart.  */
>if (force_full_retrieve)
>  hstat.restval = hstat.len;
> +  else if (opt.start_pos)
> +hstat.restval = opt.start_pos;
>else if (opt.always_rest
>&& got_name
>&& stat (hstat.local_file, &st) == 0
> diff --git a/src/init.c b/src/init.c
> index 84ae654..7f7a34e 100644
> --- a/src/init.c
> +++ b/src/init.c
> @@ -271,6 +271,7 @@ static const struct {
>{ "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
>{ "spanhosts",&opt.spanhost,  cmd_boolean },
>{ "spider",   &opt.spider,cmd_boolean },
> +  { "startpos", &opt.start_pos, cmd_bytes },
>{ "strictcomments",   &opt.strict_comments,   cmd_boolean },
>{ "timeout",  NULL,   cmd_spec_timeout },
>{ "timestamping", &opt.timestamping,  cmd_boolean },
>

[Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-20 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
Hi, 

I found myself needed this feature when I was trying to tunnel the download of
big file (several gigabytes) from a remote machine back to local through a
somewhat flaky connection.  It's a pain both for the server and local network
users if we have to repeat the previously already downloaded part in case that
the connection hangs or breaks.  Specifying 'Range: ' header is not an option
for wget (integrity check in the code would fail), and curl is not fast enough.
So I decided to make this patch in hope that this can also be useful to someone
else.

yousong

 doc/ChangeLog |4 
 doc/wget.texi |   14 ++
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  
 
* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..166ea08 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you 
try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  
+
+   * options.h: Add option --start-pos to specify start position of
+ a download.
+   * main.c: Same purpose as above.
+   * init.c: Same purpose as above.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  
 
* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, 
ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",&opt.spanhost,  cmd_boolean },
   { "spider",   &opt.spider,cmd_boolean },
+  { "startpos", &opt.start_pos, cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",  NULL,   cmd_spec_timeout },
   { "timestamping", &opt.timestamping,  cmd_boolean },
diff --git a/src/main.c b/src/main.c
index 19d7253..4fbfaee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
 { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
 { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
 { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+{ "start-pos", 0, OPT_VALUE, "startpos", -1 },
 { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
 { "timeout", 'T', OPT_VALUE, "timeout", -1 },
 { "timestamp