[Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-20 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
Hi, 

I found myself needing this feature when I was trying to tunnel the download
of a big file (several gigabytes) from a remote machine back to a local one
through a somewhat flaky connection.  It's a pain both for the server and for
local network users if we have to re-download the already-downloaded part
whenever the connection hangs or breaks.  Specifying a 'Range: ' header is not
an option for wget (an integrity check in the code would fail), and curl is
not fast enough.  So I decided to make this patch in the hope that it can also
be useful to someone else.

yousong

 doc/ChangeLog |4 
 doc/wget.texi |   14 ++
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  
 
* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..166ea08 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  
+
+   * options.h: Add option --start-pos to specify start position of
+ a download.
+   * main.c: Same purpose as above.
+   * init.c: Same purpose as above.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  
 
* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",&opt.spanhost,  cmd_boolean },
   { "spider",   &opt.spider,cmd_boolean },
+  { "startpos", &opt.start_pos, cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",  NULL,   cmd_spec_timeout },
   { "timestamping", &opt.timestamping,  cmd_boolean },
diff --git a/src/main.c b/src/main.c
index 19d7253..4fbfaee 100644
--- a/src/main.c
+++ b/src/main.c
@@ -281,6 +281,7 @@ static struct cmdline_option option_data[] =
 { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
 { "span-hosts", 'H', OPT_BOOLEA

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 01:51:04PM +0530, Darshit Shah wrote:
> I have a few comments on the patch. Commenting inline.

Thank you.

> 
> On Sat, Dec 21, 2013 at 12:32 PM, Yousong Zhou wrote:
> 
> > This patch adds an option `--start-pos' for specifying starting position
> > of a download, both for HTTP and FTP.  When specified, the newly added
> > option would override `--continue'.  Apart from that, no existing code
> > should be affected.
> >
> > Signed-off-by: Yousong Zhou 
> > ---
> > Hi,
> >
> > I found myself needed this feature when I was trying to tunnel the
> > download of
> > big file (several gigabytes) from a remote machine back to local through a
> > somewhat flaky connection.  It's a pain both for the server and local
> > network
> > users if we have to repeat the previously already downloaded part in case
> > that
> > the connection hangs or breaks.  Specifying 'Range: ' header is not an
> > option
> > for wget (integrity check in the code would fail), and curl is not fast
> > enough.
> > So I decided to make this patch in hope that this can also be useful to
> > someone
> > else.
> >
> What integrity check would fail on using the Range header?  And if you
> already have a partially downloaded file, why isn't using the --continue
> switch an option?

`--continue' only works if there is already a partially downloaded file
on disk.  Otherwise, specifying `-c' only tells wget to start from
scratch.

By 'Range: ' header I mean headers specified with `--header'.  If the
server sends back a 'Content-Range: ' header in the response, wget would
think that it is unexpected or does not match what is already on disk
(which would be zero bytes if there is no file on disk).  If I read the
code right, the check is in `http.c:gethttp()':

2744   if ((contrange != 0 && contrange != hs->restval)
2745   || (H_PARTIAL (statcode) && !contrange))
2746 {
2747   /* The Range request was somehow misunderstood by the server.
2748  Bail out.  */
2749   xfree_null (type);
2750   CLOSE_INVALIDATE (sock);
2751   xfree (head);
2752   return RANGEERR;
2753 }

In my situation, wget was invoked on the remote machine like the
following:

wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193

Then on local machine, I would download with:

nc localhost 7193

Before that, a local forwarding tunnel had been set up with ssh to make
this possible.  So in this case, there was no local file on the machine
where wget was invoked, and `--continue' would not work.  I am sure
there are other cases where `--start-pos' would be useful and that it
would make wget more complete.
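As a concrete sketch of the workflow above (the file name, port, and URL here are illustrative, not part of the original setup), the local side can measure how much it already has, and the remote side can serve the remainder with `--start-pos':

```shell
#!/bin/sh
# Sketch of the resume workflow described above.  The file name, port,
# and URL are hypothetical.

PARTIAL=partial.bin     # hypothetical partially downloaded file

# Bytes already present locally; 0 if the file does not exist yet.
if [ -f "$PARTIAL" ]; then
    OFFSET=$(wc -c < "$PARTIAL")
    OFFSET=$((OFFSET + 0))  # strip any padding wc may print
else
    OFFSET=0
fi

echo "would resume at byte $OFFSET"

# On the remote machine (reachable through an ssh tunnel), the rest of
# the file would then be served with:
#   wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
# and appended on the local machine with:
#   nc localhost 7193 >> "$PARTIAL"
```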

> 
> yousong
> >
> >  doc/ChangeLog |4 
> >  doc/wget.texi |   14 ++
> >  src/ChangeLog |9 +
> >  src/ftp.c |2 ++
> >  src/http.c    |2 ++
> >  src/init.c|1 +
> >  src/main.c|1 +
> >  src/options.h |1 +
> >  8 files changed, 34 insertions(+), 0 deletions(-)
> >
> > diff --git a/doc/ChangeLog b/doc/ChangeLog
> > index 3b05756..df103c8 100644
> > --- a/doc/ChangeLog
> > +++ b/doc/ChangeLog
> > @@ -1,3 +1,7 @@
> > +2013-12-21  Yousong Zhou  
> > +
> > +   * wget.texi: Add documentation for --start-pos.
> > +
> >  2013-10-06  Tim Ruehsen  
> >
> > * wget.texi: add/explain quoting of wildcard patterns
> > diff --git a/doc/wget.texi b/doc/wget.texi
> > index 4a1f7f1..166ea08 100644
> > --- a/doc/wget.texi
> > +++ b/doc/wget.texi
> > @@ -701,6 +701,20 @@ Another instance where you'll get a garbled file if
> > you try to use
> >  Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
> >  servers that support the @code{Range} header.
> >
> > +@cindex offset
> > +@cindex continue retrieval
> > +@cindex incomplete downloads
> > +@cindex resume download
> > +@cindex start position
> > +@item --start-pos=@var{OFFSET}
> > +Start the download at position @var{OFFSET}.  Offset may be expressed in
> > bytes,
> > +kilobytes with the `k' suffix, or megabytes with the `m' suffix.
> > +
> > +When specified, it would override the behavior of @samp{--continue}.  When
> > +using this option, you may also want to explicitly specify an output
> > filename
> > +with @samp{-O FILE} in order to not overwrite an existing partially
> > downloaded
> > +file.
> > +
> >

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
> 
> Am 21.12.2013 um 10:24 schrieb Yousong Zhou :
> > In my situation, wget was trigger on the remote machine like the
> > following:
> > 
> >wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> > 
> > Then on local machine, I would download with:
> > 
> >nc localhost 7193
> > 
> > Before these, a local forwarding tunnel has been setup with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggerred and `--continue' will not work.  I am sure
> > there are other cases `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
> 
> 
> When I just look at your problem, it seems easier to set up the tunnel
> slightly differently and pull with standard wget. If the URL looks like
>   http://HOST:PORT/PATH
> and you set up the tunnel with
>   ssh -L 7193:HOST:PORT GATEWAY
> and then just
>   wget http://localhost:7193/PATH
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper
> values for -t / -T automatically).

Yes, this is a more sensible way of doing the download once the tunnel
is up.  Thank you for pointing this out.  Actually, one concern I had
when redirecting netcat's stdout to a file was that hours of continuous
disk writing may not be good for my hard disk.  I may count on wget's
behavior for this ;)

In other cases, `--start-pos' may still come in handy, for example when
peeking at just parts of each file on a server without fully downloading
them, or when doing parallel downloads with some simple scripting.  Just
to name a few that I can come up with.
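For example, the parallel-download idea could be scripted roughly like this (the URL, total size, and part count are hypothetical, and the wget commands are only echoed here rather than executed):

```shell
#!/bin/sh
# Sketch: fetch a file of known total size in fixed-size parts, each
# starting at its own zero-based offset via --start-pos.  The URL and
# sizes are hypothetical; the commands are echoed, not run.

URL="http://example.com/big.iso"   # hypothetical
TOTAL=1000000                      # total size in bytes (e.g. from a HEAD request)
PARTS=4

CHUNK=$(( (TOTAL + PARTS - 1) / PARTS ))   # ceiling division
i=0
while [ "$i" -lt "$PARTS" ]; do
    OFFSET=$((i * CHUNK))
    # Each part would then be trimmed to CHUNK bytes, e.g. with dd.
    echo "wget -O part$i --start-pos $OFFSET '$URL' &"
    i=$((i + 1))
done
```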


yousong

> 
> 
> Best regards
> 
>   -- Dago
> 
> -- 
> "You don't become great by trying to be great, you become great by wanting to 
> do something,
> and then doing it so hard that you become great in the process." - xkcd #896
> 





[Bug-wget] [PATCH v2] Make wget capable of starting download from a specified position.

2013-12-21 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 37 insertions(+), 0 deletions(-)
From 93152cb081f529762a364eea67115f654cd6fda4 Mon Sep 17 00:00:00 2001
From: Yousong Zhou 
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v2] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v1 -> v2

	It was kindly pointed out by Darshit Shah  that
	server support for resuming download is required, so adding this into
	doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|1 +
 src/options.h |1 +
 8 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..9151d28 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Serer support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",&opt.spanhost,  cmd_boolean },
   { "spider",   &opt.spider,cmd_boolean },
+  { "startpos", &opt.start_pos, cmd_bytes },

[Bug-wget] [PATCH v3] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah  for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)
From f7266cc18fbea1d07b25c1bd25662a5a71920520 Mon Sep 17 00:00:00 2001
From: Yousong Zhou 
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v3] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v2 -> v3

	Fix a typo and add description text for the new option into the usage
	output.  Thank Darshit Shah  for the suggestions.

v1 -> v2

	It was kindly pointed out by Darshit Shah  that
	server support for resuming download is required, so adding this into
	doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..87fef7c 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start the download at position @var{OFFSET}.  Offset may be expressed in bytes,
+kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Server support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 84ae654..7f7a34e 100644
--- a/src/init.c
+++ b/src/init.c
@@ -271,6 +271,7 @@ static const struct {
   { "showalldnsentri

Re: [Bug-wget] [PATCH] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
On Sat, Dec 21, 2013 at 11:05:01AM +0100, Dagobert Michelsen wrote:
> Hi,
> 
> Am 21.12.2013 um 10:24 schrieb Yousong Zhou :
> > In my situation, wget was trigger on the remote machine like the
> > following:
> > 
> >wget -O - --start-pos "$OFFSET" "$URL" | nc -lp 7193
> > 
> > Then on local machine, I would download with:
> > 
> >nc localhost 7193
> > 
> > Before these, a local forwarding tunnel has been setup with ssh to make
> > this possible.  So in this case, there was no local file on the machine
> > where wget was triggerred and `--continue' will not work.  I am sure
> > there are other cases `--start-pos' would be useful and that
> > `--start-pos' would make wget more complete.
> 
> 
> When I just look at your problem, it seems easier to set up the tunnel
> slightly differently and pull with standard wget. If the URL looks like
>   http://HOST:PORT/PATH
> and you set up the tunnel with
>   ssh -L 7193:HOST:PORT GATEWAY
> and then just
>   wget http://localhost:7193/PATH
> the range requests would be sent by wget just fine to the initial server and
> you could also safely use -c on further wget invocations (or with proper
> values for -t / -T automatically).

Just tried this approach.  It did not work out as expected because the HTTP
server responded with multiple levels of redirection, so the host part changed
on the fly.  Anyway, thank you for your time.


yousong




[Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2013-12-22 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v3 -> v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah  for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

From d8fd955d161bd8ba17ac97cbcf7a3ed316e00630 Mon Sep 17 00:00:00 2001
From: Yousong Zhou 
Date: Fri, 20 Dec 2013 23:17:43 +0800
Subject: [PATCH v4] Make wget capable of starting download from a specified position.

This patch adds an option `--start-pos' for specifying starting position
of a download, both for HTTP and FTP.  When specified, the newly added
option would override `--continue'.  Apart from that, no existing code
should be affected.

Signed-off-by: Yousong Zhou 
---
v3 -> v4

	In doc/wget.texi and wget usage output, explicitly note that
	--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah  for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming download is required, so adding this into
doc/wget.texi.

 doc/ChangeLog |4 
 doc/wget.texi |   17 +
 src/ChangeLog |9 +
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|1 +
 src/main.c|3 +++
 src/options.h |1 +
 8 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 3b05756..df103c8 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2013-12-21  Yousong Zhou  
+
+	* wget.texi: Add documentation for --start-pos.
+
 2013-10-06  Tim Ruehsen  
 
 	* wget.texi: add/explain quoting of wildcard patterns
diff --git a/doc/wget.texi b/doc/wget.texi
index 4a1f7f1..5094c26 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,23 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix.
+
+When specified, it would override the behavior of @samp{--continue}.  When
+using this option, you may also want to explicitly specify an output filename
+with @samp{-O FILE} in order to not overwrite an existing partially downloaded
+file.
+
+Server support for resuming download is needed, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index 42ce3e4..ab8a496 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,12 @@
+2013-12-21  Yousong Zhou  
+
+	* options.h: Add option --start-pos to specify start position of
+	  a download.
+	* main.c: Same purpose as above.
+	* init.c: Same purpose as above.
+	* http.c: Utilize opt.start_pos for HTTP download.
+	* ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2013-11-02  Giuseppe Scrivano  
 
 	* http.c (gethttp): Increase max header value length to 512.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..c7ab6ef 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 754b7ec..a354c6b 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3098,6 +3098,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos)
+hstat.restval = opt.start_pos;
   else if (opt.always_

Re: [Bug-wget] Bug report: --content-disposition option disables --continue

2014-01-01 Thread Yousong Zhou
On 2 January 2014 07:08, Eternal Sorrow  wrote:
> When I set the option "content-disposition" either on the command line or in
> wgetrc, wget refuses to resume the download of a partially-downloaded file
> with the --continue command line option and starts the download from the
> beginning.

I tried the following command and it worked.

wget -d -c --quota=1 --content-disposition http://greenbytes.de/tech/tc2231/attfnboth.asis

--content-disposition support in wget is experimental. I think many
cases are not covered.

It would help if the following information could be provided:
 - Content-Disposition header from the server response.
 - The filename of the partially downloaded file.
 - The filename wget tried to write into.
 - If possible, the minimal command to reproduce.


   yousong



Re: [Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2014-02-04 Thread Yousong Zhou
Hi, can this feature be picked up?  Months have passed and I think a ping
would be good.  :)

yousong

On Monday, December 23, 2013, Yousong Zhou  wrote:

> This patch adds an option `--start-pos' for specifying starting position
> of a download, both for HTTP and FTP.  When specified, the newly added
> option would override `--continue'.  Apart from that, no existing code
> should be affected.
>
> Signed-off-by: Yousong Zhou
> ---
> v3 -> v4
>
> In doc/wget.texi and wget usage output, explicitly note that
> --start-pos is zero-based.
>
> v2 -> v3
>
> Fix a typo and add description text for the new option into the
> usage
> output.  Thank Darshit Shah > for
> the suggestions.
>
> v1 -> v2
>
> It was kindly pointed out by Darshit Shah 
> >
> that
> server support for resuming download is required, so adding this
> into
> doc/wget.texi.
>
>  doc/ChangeLog |4 
>  doc/wget.texi |   17 +
>  src/ChangeLog |9 +
>  src/ftp.c |2 ++
>  src/http.c|2 ++
>  src/init.c|1 +
>  src/main.c|3 +++
>  src/options.h |1 +
>  8 files changed, 39 insertions(+), 0 deletions(-)
>
>


[Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2014-02-06 Thread Yousong Zhou
On Thursday, February 6, 2014, Tim Ruehsen  wrote:

> Hi Yousong,
>
> please don't forget to send your posts to the mailing list.


Sorry for that.


>
> On Thursday 06 February 2014 10:27:37 Yousong Zhou wrote:
> > On Wednesday, February 5, 2014, Tim Ruehsen  wrote:
> > > First of all, thanks for your contribution.
> > >
> > > I have some little remarks / questions:
> > >
> > > - The documentation is not quite right: when using --start-pos and the
> > > file
> > > already exists, wget creates as expected a file.1.
> > > But your docs say, --start-pos would overwrite an existing file !?
> > > Could you make this point clear ?
> >
> > Yes, 'overwrite' is wrong.
> >
> > > - The combination with --continue works for me as expected. It would
> > > simply
> > > append the downloaded bytes to the existing file. Maybe you should
> > > document
> > > that as well. At least your sentence "... it would override the
> behavior
> > > of --
> > > continue" seems not to be correct.
> >
> > Sorry for the confusion.  --continue will detect the size of the existing
> > file, then continue as if an equivalent --start-pos was specified.  By
> > 'override' I mean the new option takes precedence over --continue.  Other
> > than that, all existing behaviors of wget are supposed to remain unchanged.
> >
> > > - What about extending the option to something like
> > > --range=STARTPOS[-ENDPOS]
> > > ?
> >
> > You mean change the option name to 'range'?  IIRC, that's how curl does
> > it.  I am okay with --start-pos. ;)
>
> I just wanted to mention a possible 'ENDPOS'. In that case --start-pos isn't
> appropriate any more and --range seems natural to me.


I thought ENDPOS was in the patch, and once I took a quick look at it I
remembered why I decided it should be --start-pos, without LEN or ENDPOS.

 - I thought the current implementation is simple, neat and easy.  A few
lines of code really helped at the time.
 - The code of wget is old and mature enough; mimicking curl's --range is
very likely to pose many compatibility and maintenance issues.  This is
not what we want.
 - I actually thought about LENGTH, but not ENDPOS, and the main reason I
gave myself for not implementing it is that we can achieve a length limit
with other utilities, e.g. dd.

So I do not intend to implement a curl-like --range option in wget.
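The dd idea above can be sketched like this: `--start-pos' picks the starting byte, and dd trims the stream to the desired length.  A local printf stands in for the network download here; the wget form is shown in a comment:

```shell
#!/bin/sh
# Emulate a byte range without an ENDPOS option: the producer starts at
# some offset, and dd limits the length of the stream.  In real use the
# producer would be something like:
#   wget -O - --start-pos "$OFFSET" "$URL" | dd bs=1 count="$LEN" of=part.bin
# Here a local byte stream stands in for the download.

LEN=4
RANGE=$(printf 'abcdefghij' | dd bs=1 count="$LEN" 2>/dev/null)
echo "$RANGE"    # prints: abcd
```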


>
> I just took a look at curl's man page. The curl people did it the right
> way.
> Especially their hint about multipart responses is of value (i didn't know
> that). Such cases would likely need special handling in Wget.
>
>
I didn't know that either.


> >
> > > - If you want to brush up your patch, add a test-case for it for the new
> > > Python-based test suite.  I guess Darshit can give you a helping hand,
> > > if you request it.
> >
> > Will do once I get the time.
> >
> > > Tim
> >
> > Thank you for looking at this.
> >
> > yousong
> >


Re: [Bug-wget] [PATCH v4] Make wget capable of starting download from a specified position.

2014-02-12 Thread Yousong Zhou
On 6 February 2014 22:50, Tim Ruehsen  wrote:
>> I thought about ENDPOS when writing the patch, and after a quick look at it
>> I remember why I decided it should be --start-pos, without LEN or ENDPOS.
>>
>>  - I thought the current implementation was simple, neat, and easy.  A few
>> lines of code were all it took at the time.
>>  - The wget code base is old and mature; mimicking curl's --range is very
>> likely to pose many compatibility and maintenance issues.  This is not
>> what we want.
>>  - I actually considered LENGTH, but not ENDPOS, and the main reason I
>> gave myself for not implementing it is that we can achieve a length limit
>> with other utilities, e.g. dd.
>>
>> So I do not intend to implement a curl-like --range option in wget.
>
> I understand that and agreed with you.
> For me, there is just this little documentation issue that I mentioned (here
> is my imperfect suggestion)
>
> Start downloading at zero-based position @var{OFFSET}.  Offset may be
> expressed in bytes, kilobytes with the `k' suffix, or megabytes with the `m'
> suffix.
>
> If combined with @samp{--continue} and the destination file exists, the
> downloaded data will be appended. To avoid appending, you may explicitly
> specify an output filename with @samp{-O FILE}.
>
> Server support for the 'Range' header is needed, otherwise @samp{--start-pos}
> cannot help.  See @samp{-c} for details.
>

Thank you, Tim.  I appreciate your help and suggestion.  Just got my
internet connection back, sorry for the delayed response.

I will try to review and enhance my implementation then send a new patch.

 - Forbid specifying both `--start-pos` and `--continue`, which
makes little sense.
 - A zero-based offset is fine at first glance, but not
appropriate when used as a boolean value, i.e. `if (opt.start_pos) ...`.
I will try to polish the current zero-based solution, but may change
it to 1-based if there is no acceptable workaround.
 - Add test cases for the new feature.


Best regards.


   yousong



[Bug-wget] [PATCH v5 0/4] Make wget capable of starting downloads from a specified position.

2014-02-13 Thread Yousong Zhou
This series tries to add an option `--start-pos' for specifying the starting
position of an HTTP or FTP download.  Also included are 2 fixes for the test
infrastructure and 3 test cases for the new option.

With the new option, a user-specified zero-based offset can be given instead
of deriving it from the existing file, which is what --continue currently
does.  When this option and --continue are both specified, which does not
make much sense, wget will warn and proceed as if --continue were not there.
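The decision order described above can be sketched in a few lines (illustrative Python mirroring, not reproducing, the patch's C logic in http.c/ftp.c): a negative sentinel marks "--start-pos absent", so 0 stays a valid explicit offset that overrides the --continue-derived restart value.

```python
def choose_restval(start_pos, always_rest, local_len):
    # start_pos < 0 means --start-pos was not given (the patch's sentinel);
    # 0 is a valid zero-based offset and must not be treated as "absent".
    if start_pos >= 0:
        return start_pos          # explicit --start-pos wins
    if always_rest:
        return local_len          # --continue: resume after the local file
    return 0                      # fresh download

print(choose_restval(-1, True, 4096))  # 4096
print(choose_restval(0, True, 4096))   # 0
print(choose_restval(100, False, 0))   # 100
```

This is why the patch initializes `opt.start_pos = -1` rather than 0: a zero default would be indistinguishable from an explicit `--start-pos=0`.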

Signed-off-by: Yousong Zhou 
---
v4 -> v5

- Reworked the description in doc with kind suggestions from Tim
  Ruehsen.
- Disable --start-pos when WARC options are used.
- When --start-pos and --continue are both specified, emit a warning,
  use --start-pos and disable --continue, then proceed.
- Add 2 fixes for the test infrastructure.
- Add 3 test cases for the new option.

v3 -> v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option into the usage
output.  Thank Darshit Shah  for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming download is required, so adding this into
    doc/wget.texi.

Yousong Zhou (4):
  Make wget capable of starting downloads from a specified position.
  Tests: fix TYPE and RETR command handling.
  Tests: exclude existing files from the check of unexpected downloads.
  Tests: Add test cases for option --start-pos.

 doc/ChangeLog  |4 ++
 doc/wget.texi  |   16 ++
 src/ChangeLog  |7 
 src/ftp.c  |2 +
 src/http.c |2 +
 src/init.c |4 ++
 src/main.c |   18 +--
 src/options.h  |1 +
 tests/ChangeLog|   16 ++
 tests/FTPServer.pm |   12 ---
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/WgetTest.pm.in   |5 ++-
 tests/run-px   |3 ++
 15 files changed, 226 insertions(+), 9 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

-- 
1.7.2.5




[Bug-wget] [PATCH v5 1/4] Make wget capable of starting downloads from a specified position.

2014-02-13 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying the starting position
of an HTTP or FTP download.

Signed-off-by: Yousong Zhou 
---
 doc/ChangeLog |4 
 doc/wget.texi |   16 
 src/ChangeLog |7 +++
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|4 
 src/main.c|   18 +++---
 src/options.h |1 +
 8 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 58d1439..68629c6 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2014-02-10  Yousong Zhou  
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-12-29  Giuseppe Scrivano  
 
* wget.texi: Update to GFDL 1.3.
diff --git a/doc/wget.texi b/doc/wget.texi
index 6a8c6a3..0b23bda 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,22 @@ Another instance where you'll get a garbled file if you try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc.
+
+@samp{--start-pos} takes precedence over @samp{--continue}.  When
+@samp{--start-pos} and @samp{--continue} are both specified, wget will emit a
+warning, then proceed as if @samp{--continue} were absent.
+
+Server support for continued download is required, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index b7b6753..6615ad7 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,10 @@
+2014-02-10  Yousong Zhou  
+
+   * init.c, main.c, options.h: Add option --start-pos for specifying
+   start position of a download.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2014-02-06  Giuseppe Scrivano  
 
* main.c (print_version): Move copyright year out of the localized
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..5282588 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos >= 0)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index 5715df6..0bede9d 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3101,6 +3101,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos >= 0)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 56fef50..9ed72b2 100644
--- a/src/init.c
+++ b/src/init.c
@@ -270,6 +270,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",&opt.spanhost,  cmd_boolean },
   { "spider",   &opt.spider,cmd_boolean },
+  { "startpos", &opt.start_pos, cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",  NULL,   cmd_spec_timeout },
   { "timestamping", &opt.timestamping,  cmd_boolean },
@@ -406,6 +407,9 @@ defaults (void)
   opt.warc_cdx_dedup_filename = NULL;
   opt.warc_tempdir = NULL;
   opt.warc_keep_log = true;
+
+  /* Use a negative value to mark the absence of --start-pos option */
+  opt.start_pos = -1;
 }
 
 /* Return the user's home directory (strdup-ed), or NULL if none is
diff --git a/src/main.c b/src/main.c
index 3ce7583..39fcff4 100644
--- a/src/main.c
+++ b/src/main.c
@@ -276,6 +276,7 @@ static struct cmdline_option option_data[] =
 { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
 { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
 { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+{ "start-pos", 0, OPT_VALUE, "startpos", -1 },
 { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
 { "timeout", 'T', OPT_VALUE, "timeout", -1 },
 { "timestamping", &
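The `startpos` entry above is wired to `cmd_bytes`, which accepts size suffixes as the texi text says.  A rough sketch of that parsing (illustrative Python, not wget's actual parser; the exact suffix set is an assumption):

```python
SUFFIX = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3, 't': 1024 ** 4}

def parse_bytes(s):
    # "1k" -> 1024, "2m" -> 2097152; plain digits pass through unchanged.
    s = s.strip().lower()
    if s and s[-1] in SUFFIX:
        return int(float(s[:-1]) * SUFFIX[s[-1]])
    return int(s)

print(parse_bytes("1k"))   # 1024
print(parse_bytes("2m"))   # 2097152
print(parse_bytes("512"))  # 512
```

So `--start-pos=1m` would resume at byte offset 1048576.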

[Bug-wget] [PATCH v5 3/4] Tests: exclude existing files from the check of unexpected downloads.

2014-02-13 Thread Yousong Zhou

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog  |5 +
 tests/WgetTest.pm.in |5 -
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index a7db249..d23e76e 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,10 @@
 2014-02-13  Yousong Zhou  
 
+   * WgetTest.pm.in: Exclude existing files from the check of unexpected
+   downloads.
+
+2014-02-13  Yousong Zhou  
+
* FTPServer.pm: Fix the handling of TYPE command and avoid endless
loop when doing binary mode RETR.
 
diff --git a/tests/WgetTest.pm.in b/tests/WgetTest.pm.in
index 58ad140..092777e 100644
--- a/tests/WgetTest.pm.in
+++ b/tests/WgetTest.pm.in
@@ -256,7 +256,10 @@ sub _verify_download {
 # make sure no unexpected files were downloaded
 chdir ("$self->{_workdir}/$self->{_name}/output");
 
-__dir_walk('.', sub { push @unexpected_downloads, $_[0] unless (exists $self->{_output}{$_[0]}) }, sub { shift; return @_ } );
+__dir_walk('.',
+   sub { push @unexpected_downloads,
+  $_[0] unless (exists $self->{_output}{$_[0]} || $self->{_existing}{$_[0]}) },
+   sub { shift; return @_ } );
 if (@unexpected_downloads) {
 return "Test failed: unexpected downloaded files [" . join(', ', @unexpected_downloads) . "]\n";
 }
-- 
1.7.2.5




[Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command would ignore binary mode
   transfer requests.
 - The FTP server would run into an endless loop, sending the same content
   forever.
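The endless-loop bug can be seen in miniature (Python sketch, not the Perl server code): slicing from offset 0 on every iteration resends the first chunk forever, while advancing a running offset terminates the loop.

```python
def chunks(content, bufsize=65536):
    # Fixed behaviour: slice at the running offset.  The buggy server used
    # content[0:bufsize] on every iteration, resending the first chunk forever.
    sent = 0
    while sent < len(content):
        buf = content[sent:sent + bufsize]
        yield buf
        sent += len(buf)

data = b"x" * 150000
print(sum(len(c) for c in chunks(data)) == len(data))  # True
```

The Perl fix below is the same idea: introduce `$sent`, slice with `substr($content, $sent, 65536)`, and bump `$sent` after each write.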

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler   (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..7e9e18d 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
 # What mode are we sending this file in?
 unless ($conn->{type} eq 'A') # Binary type.
 {
-my ($r, $buffer, $n, $w);
-
+my ($r, $buffer, $n, $w, $sent);
 
 # Copy data.
-while ($buffer = substr($content, 0, 65536))
+$sent = 0;
+while ($sent < length($content))
 {
+$buffer = substr($content, 0, 65536);
 $r = length $buffer;
 
 # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
 print {$conn->{socket}} "426 Transfer aborted. Data connection closed.\r\n";
 return;
 }
+$sent += $r;
 }
 
 # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command
 
 # See RFC 959 section 5.3.2.
 if ($type =~ /^([AI])$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^([AI])\sN$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^L\s8$/i) {
 $conn->{type} = 'L8';
 } else {
-- 
1.7.2.5
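The TYPE fix above is easiest to see outside the server code.  This Python sketch (not the Perl implementation) shows the corrected dispatch: the buggy version forced type 'A' even when the client sent "TYPE I", whereas capturing the matched letter honors the request, per RFC 959 section 5.3.2.

```python
import re

def parse_type(arg):
    # Honor the requested representation type (cf. RFC 959 section 5.3.2).
    # The buggy server answered 'A' even for "TYPE I" (binary).
    m = re.match(r'^([AI])$', arg, re.I) or re.match(r'^([AI])\sN$', arg, re.I)
    if m:
        return m.group(1).upper()
    if re.match(r'^L\s8$', arg, re.I):
        return 'L8'
    return None  # caller should answer 504 for unsupported types

print(parse_type("I"))    # I
print(parse_type("A N"))  # A
print(parse_type("L 8"))  # L8
```

With the old behaviour, binary RETR tests silently ran in ASCII mode, which is how the endless-loop bug went unnoticed.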




[Bug-wget] [PATCH v5 4/4] Tests: Add test cases for option --start-pos.

2014-02-13 Thread Yousong Zhou

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|7 
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/run-px   |3 ++
 5 files changed, 155 insertions(+), 0 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

diff --git a/tests/ChangeLog b/tests/ChangeLog
index d23e76e..f2e80e5 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,12 @@
 2014-02-13  Yousong Zhou  
 
+   * Test--start-pos.px: Test --start-pos for HTTP downloads.
+   * Test-ftp--start-pos.px: Test --start-pos for FTP downloads.
+   * Test--start-pos--continue.px: Test the case when --start-pos and
+ --continue were both specified.
+
+2014-02-13  Yousong Zhou  
+
 * WgetTest.pm.in: Exclude existing files from the check of unexpected
   downloads.
 
diff --git a/tests/Test--start-pos--continue.px b/tests/Test--start-pos--continue.px
new file mode 100755
index 000..09b8ced
--- /dev/null
+++ b/tests/Test--start-pos--continue.px
@@ -0,0 +1,57 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $existingfile = <<EOF;
+...
+EOF
+
+my $wholefile = <<EOF;
+...
+EOF
+
+# code, msg, headers, content
+my %urls = (
+    '/somefile.txt' => {
+        code => "206",
+        msg => "Dontcare",
+        headers => {
+            "Content-type" => "text/plain",
+        },
+        content => $wholefile,
+    },
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 --continue --debug http://localhost:{{port}}/somefile.txt";
+
+my $expected_error_code = 0;
+
+my %existing_files = (
+'somefile.txt' => {
+content => $existingfile,
+},
+);
+
+my %expected_downloaded_files = (
+'somefile.txt.1' => {
+content => substr($wholefile, 1),
+},
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos--continue",
+  input => \%urls,
+  cmdline => $cmdline,
+  errcode => $expected_error_code,
+  existing => \%existing_files,
+  output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
diff --git a/tests/Test--start-pos.px b/tests/Test--start-pos.px
new file mode 100755
index 000..4962c82
--- /dev/null
+++ b/tests/Test--start-pos.px
@@ -0,0 +1,46 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+'/dummy.txt' => {
+code => "206",
+msg => "Dontcare",
+headers => {
+"Content-Type" => "text/plain",
+},
+content => $dummyfile
+},
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 http://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+'dummy.txt' => {
+content => substr($dummyfile, 1),
+}
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos",
+  input => \%urls,
+  cmdline => $cmdline,
+  errcode => $expected_error_code,
+  output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
diff --git a/tests/Test-ftp--start-pos.px b/tests/Test-ftp--start-pos.px
new file mode 100755
index 000..5062377
--- /dev/null
+++ b/tests/Test-ftp--start-pos.px
@@ -0,0 +1,42 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use FTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+'/dummy.txt' => {
+content => $dummyfile
+},
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 ftp://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+'dummy.txt' => {
+content => substr($dummyfile, 1),
+}
+);
+
+###
+
+my $the_test = FTPTest->new (name => "Test-ftp--start-pos",
+

[Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command would ignore binary mode
   transfer requests.
 - The FTP server would run into an endless loop, sending the same content
   forever.

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler   (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..1603caa 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
 # What mode are we sending this file in?
 unless ($conn->{type} eq 'A') # Binary type.
 {
-my ($r, $buffer, $n, $w);
-
+my ($r, $buffer, $n, $w, $sent);
 
 # Copy data.
-while ($buffer = substr($content, 0, 65536))
+$sent = 0;
+while ($sent < length($content))
 {
+$buffer = substr($content, $sent, 65536);
 $r = length $buffer;
 
 # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
 print {$conn->{socket}} "426 Transfer aborted. Data connection closed.\r\n";
 return;
 }
+$sent += $r;
 }
 
 # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command
 
 # See RFC 959 section 5.3.2.
 if ($type =~ /^([AI])$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^([AI])\sN$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^L\s8$/i) {
 $conn->{type} = 'L8';
 } else {
-- 
1.7.2.5




Re: [Bug-wget] [PATCH v5 2/4] Tests: fix TYPE and RETR command handling.

2014-02-13 Thread Yousong Zhou
Please use this newly sent version of the patch.  The old one is incorrect.

On 14 February 2014 10:27, Yousong Zhou  wrote:
> diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
> index 2ac72e3..1603caa 100644
> --- a/tests/FTPServer.pm
> +++ b/tests/FTPServer.pm
> @@ -298,12 +298,13 @@ sub _RETR_command
>  # What mode are we sending this file in?
>  unless ($conn->{type} eq 'A') # Binary type.
>  {
> -my ($r, $buffer, $n, $w);
> -
> +my ($r, $buffer, $n, $w, $sent);
>
>  # Copy data.
> -while ($buffer = substr($content, 0, 65536))
> +$sent = 0;
> +while ($sent < length($content))
>  {
> +$buffer = substr($content, $sent, 65536);

It was:

 $buffer = substr($content, 0, 65536);

>
>  $r = length $buffer;
>
>  # Restart alarm clock timer.


   yousong



Re: [Bug-wget] wget confused by URL

2014-02-20 Thread Yousong Zhou
Hi,

On Thu, 20 Feb 2014, James Macomber wrote:

> Hi,
> 
> May be my n00bness, but I can't seem to get the syntax right for this
> command or the command is getting confused by my values.
> 
> I am using the win86_64 version 1.11.4.
> 
> I am calling wget -r -i C:\Users\macombej\Desktop\wgeturl.txt -S -o
> C:\Users\macombej\Desktop\wgetresponse.txt
> 
> wgeturl.txt looks like this:
> 
> http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
> 
> I have tried it with username/password in the proper syntax for the above
> URL, but this doesn't seem to matter either.
> 
> and wgetresponse.txt shows this:
> 
> --2014-02-20 22:16:23--
> http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
> Resolving u.eq2wire.com... 67.23.252.182
> Connecting to u.eq2wire.com|67.23.252.182|:80... connected.
> HTTP request sent, awaiting response...
>   HTTP/1.1 200 OK
>   Date: Fri, 21 Feb 2014 03:16:45 GMT
>   Server: Apache
>   X-Powered-By: PHP/5.4.23
>   Refresh: 0;url=http://u.eq2wire.com/soe/item_search_results

Looks like wget didn't understand this header very well?


yousong

>   Set-Cookie:
> ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2fdb4ad7da33521f95643c3980fe9922;
> expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
>   Set-Cookie:
> ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2fdb4ad7da33521f95643c3980fe9922;
> expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
>   Vary: Accept-Encoding
>   Content-Length: 0
>   Connection: close
>   Content-Type: text/html
> Length: 0 [text/html]
> Saving to: `
> u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
> '
> 
>  0K0.00 =0s
> 
> 2014-02-20 22:16:23 (0.00 B/s) - `
> u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1'
> saved [0/0]
> 
> I have compared this to wireshark captures and these are the first two
> cookies that get pulled, but all the rest of the html code values are not
> getting pulled.
> 
> Any idea what I am missing or why this may not pull the page values I get
> with the same URL in a browser?
> 



Re: [Bug-wget] wget confused by URL

2014-02-21 Thread Yousong Zhou
Hi

`bug-wget@gnu.org' should be CC-ed so the list can see this conversation.

On Fri, 21 Feb 2014, James Macomber wrote:

> It looks like the commands are processing the URL in the input file as the 
> output target based on the "Saving to:" line.  I am wondering
> if the "-1" values are throwing the command processor off.  I am not sure how 
> or if that is even possible though.
> 

It's expected since you have specified the `-r' option and wget tried to 
construct a hierarchy of directories as the URL indicates.  Without it, 
wget will try to save the file with the name `-1'.  You can also disable 
this behaviour with `--no-directories'.

If you are talking about the size of the downloaded file being zero, that's 
because the response body of that URL is actually empty.  Your browser 
redirected to another URL as indicated by the `Refresh' header, but wget 
currently seems not to be aware of this header.
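Following the `Refresh' header would amount to parsing out the delay and target URL and re-requesting (hypothetical sketch; wget has no such handling in this thread's timeframe):

```python
def parse_refresh(value):
    # "0;url=http://example.com/next" -> (0, "http://example.com/next")
    delay, _, rest = value.partition(';')
    rest = rest.strip()
    if rest.lower().startswith('url='):
        return int(delay.strip()), rest[4:]
    return int(delay.strip()), None

print(parse_refresh("0;url=http://u.eq2wire.com/soe/item_search_results"))
```

A client honoring the header would wait `delay` seconds and then fetch the returned URL, which is what the browser did here.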


    yousong

> 
> On Fri, Feb 21, 2014 at 2:21 AM, Yousong Zhou  wrote:
>   Hi,
> 
>   On Thu, 20 Feb 2014, James Macomber wrote:
> 
>   > Hi,
>   >
>   > May be my n00bness, but I can't seem to get the syntax right for this
>   > command or the command is getting confused by my values.
>   >
>   > I am using the win86_64 version 1.11.4.
>   >
>   > I am calling wget -r -i C:\Users\macombej\Desktop\wgeturl.txt -S -o
>   > C:\Users\macombej\Desktop\wgetresponse.txt
>   >
>   > wgeturl.txt looks like this:
>   >
>   > 
> http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
>   >
>   > I have tried it with username/password in the proper syntax for the 
> above
>   > URL, but this doesn't seem to matter either.
>   >
>   > and wgetresponse.txt shows this:
>   >
>   > --2014-02-20 22:16:23--
>   > 
> http://u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
>   > Resolving u.eq2wire.com... 67.23.252.182
>   > Connecting to u.eq2wire.com|67.23.252.182|:80... connected.
>   > HTTP request sent, awaiting response...
>   >   HTTP/1.1 200 OK
>   >   Date: Fri, 21 Feb 2014 03:16:45 GMT
>   >   Server: Apache
>   >   X-Powered-By: PHP/5.4.23
>   >   Refresh: 0;url=http://u.eq2wire.com/soe/item_search_results
> 
>   Looks like wget didn't understand this header very well?
> 
> 
>                   yousong
> 
>   >   Set-Cookie:
>   
> >ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3
> A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2f
>   db4ad7da33521f95643c3980fe9922;
>   > expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
>   >   Set-Cookie:
>   
> >ci_session=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%5c5f724a6c93947f470361ed6c37e8%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3
> A%22108.48.199.124%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A11%3A%22Wget%2F1.11.4%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1392952605%3B%7D2f
>   db4ad7da33521f95643c3980fe9922;
>   > expires=Sat, 22-Feb-2014 03:16:45 GMT; path=/
>   >   Vary: Accept-Encoding
>   >   Content-Length: 0
>   >   Connection: close
>   >   Content-Type: text/html
>   > Length: 0 [text/html]
>   > Saving to: `
>   > 
> u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1
>   > '
>   >
>   >      0K                                                        0.00 
> =0s
>   >
>   > 2014-02-20 22:16:23 (0.00 B/s) - `
>   > 
> u.eq2wire.com/soe/item_search_link/Arcane/95/-1/-1/-1/-1/-1/-1/Armor/Fury/-1/-1/-1/-1/-1/-1/-1/-1/-1'
>   > saved [0/0]
>   >
>   > I have compared this to wireshark captures and these are the first two
>   > cookies that get pulled, but all the rest of the html code values are 
> not
>   > getting pulled.
>   >
>   > Any idea what I am missing or why this may not pull the page values I 
> get
>   > with the same URL in a browser?
>   >
> 
> 
> 
> 



Re: [Bug-wget] [bug-wget] Unable to execute the Test Suite

2014-02-21 Thread Yousong Zhou

Hi,

On Fri, 21 Feb 2014, Darshit Shah wrote:


Hi all,


I was trying to run the test suite on Wget, but it keeps failing due to the
new submodule. At first I thought the issue was probably with the
parallel-wget branch, so I switched to master. Yet the same problem. Just
as a control test, I created a new clone of the repository and I am still
facing the same problem.

The error output is:

echo 1.15.6-d682 > .version-t && mv .version-t .version
if test -d ./.git   \
   && git --version >/dev/null 2>&1; then  \
 cd . &&   \
 git submodule --quiet foreach \
 test '"$(git rev-parse "$sha1")"' \
 = '"$(git merge-base origin "$sha1")"'\
   || { echo 'maint.mk: found non-public submodule commit' >&2;\
exit 1; }; \
else\
 : ;   \
fi
Stopping at 'gnulib'; script returned non-zero status.
maint.mk: found non-public submodule commit
maint.mk:1394: recipe for target 'public-submodule-commit' failed
make: *** [public-submodule-commit] Error 1


This happens only when running `make check` and not when trying to
otherwise compile from source.


Mine worked fine after doing `git clean -f -d'.  Have you tried running the 
command manually to see the actual output of each element?


  git submodule foreach \
  test '"$(git rev-parse "$sha1")"' \
  = '"$(git merge-base origin "$sha1")"'\

Or something like

  git submodule foreach \
  echo '$name, $path, $sha1'

which produces

  yousong@jumper:~/wget$ git submodule foreach  \
  >   echo '$name, $path, $sha1'
  Entering 'gnulib'
  gnulib, gnulib, 0ac90c5a98030c998f3e1db3a0d7f19d4630b6b6

on my machine.



yousong



Anyone know the reasons for this?

--
Thanking You,
Darshit Shah





Re: [Bug-wget] Using GNU Wget 1.13.4 on an https page...

2014-03-05 Thread Yousong Zhou
On 5 March 2014 22:33, Pauline_FTP@dmin  wrote:
>
> Hi,
>
> Upon reading the manual regarding the topic above, I realized that I am at
> a loss of how to begin.
>
> The information is so overwhelming that I feel I would have to learn an
> entirely new language.
> Obviously, I am a novice at this.
>
> What I want to do is try GNU Wget 1.13.4 on an https page that is timing
> out before I can view and use it.
>

- Does it have to be version 1.13.4?
- Hmm, so you just want to download the page source?

> Are there any shortcuts or tips you can advise for me to get the Wget set
> up easily to use it to so that I can gain access to this https page?
>
> My system is Windows 7 and I use the Firefox browser. (I won't use the
> Chrome due to security issues that are too cumbersome to fix after they
> happen.)

Normally wget will work out of the box without extra configuration.  You
can get wget for Windows from the GNUWin32 project.  Does the page need
authentication or other parameters to gain access?

FYI, you should be able to view the page source and the whole network
activity with Firebug or other extensions in Firefox.

I am curious about the cumbersome security issues that will happen
after accessing an https page in Chrome.


 yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-08 Thread Yousong Zhou
Hi, Zihang and Darshit, and all.

On 9 March 2014 09:39, Darshit Shah  wrote:
> Hi Zihang,
>
>
> I just had a brief glance through the whole commit. That's a very large
> change! It's essentially the same code with lots of moving around and
> cosmetic changes.
>
> However, I do have a couple of issues with it:
> 1. I found it really difficult to follow the code. You should edit the
> README file to reflect the current scenario and how should a developer
> follow it.
> 2. It seems like you've created some really nice abstractions, it would
> very nice to explain them so the developers for Wget know what to look at
> and what to edit.

Hi, Zihang.  The patch is really big as a single commit.  You'd better
split it into multiple small ones, each with a single purpose, without
breaking the code at any commit if possible.  That way we can refer
to and comment on the code more easily.

> 3. While the code surely is more pythonic, it creates a slight problem.
> It's *more* pythonic. Most people who have to deal with this code are not
> users who use Python everyday. I think, a little lesser of strict Python
> syntax and a little more of simpler syntax will allow non-Python developers
> to more easily follow the code. The point of using Python to rewrite the
> old test suite was that Perl was a bit too cryptic and people had to spend
> too much time understanding the code first before they could edit it. I
> don't want to repeat that with having truly pythonic code which takes more
> time to follow for a C developer.

To be honest, I am fine with the current Perl implementation.  My last
several patches for the Perl-based test framework were my first try
with Perl.  It did not take me much time to understand the design and
modify a few lines of code.  I think documentation or self-explaining
code is the solution.

>
> Others, please chime in on this. I like the overall restructuring though.
> And if the abstractions do work the way I think they do, I believe this
> could be a good idea. I'll look at it in much more detail when I get the
> time.
>
>
> On Sun, Mar 9, 2014 at 2:22 AM, Darshit Shah  wrote:
>
>> Hi,
>>
>> Thanks for the refactoring. However, you've included makefile and
>> makefile.in which are autogenerated files and should not be commited.
>>
>> Also, your patch has trailing whitespace errors. And I don't think you've
>> added an entry to ChangeLog either. Please look into these. I haven't seen
>> your patch yet. The Makefile errors mean I can't apply it without a lot of
>> extra work.
>>

Hi Zihang, it looks like your development environment has to be configured
for this.  The newline character in your code is now '\r\n',
which should be '\n'.  There are mode changes from 100755 to 100644 in
the git commit, which is not right.  Those .py files should retain
their executable attributes.


   yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-09 Thread Yousong Zhou
On 9 March 2014 11:38, 陈子杭 (Zihang Chen)  wrote:
> Yes, you're right. I mostly work under Windows. Didn't notice that. My bad.
> BTW is there anything similar to a guideline for setting up the development
> environment? Please let me know if there is one.

You can find many with keywords like "git, line ending", "git rebase,
interactive", "git format-patch", "git send-email".  But I suggest
using a Linux system for this task.  I used to install a Debian server
as a virtual machine which needed only 64MB of host memory, then I
sshed into it.  Almost all tools like vim, git, etc. should just work
out of box.


   yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-10 Thread Yousong Zhou
Hi, Zihang,

On 10 March 2014 13:05, 陈子杭 (Zihang Chen)  wrote:
> Hi, Darshit.
> I fixed the line ending using git config --global autocrlf input. Line
> endings should be lf now. I also added some documentation. File modes for
> Test-*.py are 755 now.
>

I just did a quick check on the patch and the line endings are still
wrong, e.g. testenv/test/http_test.py

Also, .pyc files should not be included, right?

I do not have much experience with parallel-wget, but you can improve
how your commits are organized by following the style of the existing
commits in the repository.


   yousong



Re: [Bug-wget] [GSoC] Refactoring the Test Suite

2014-03-14 Thread Yousong Zhou
 Darshit Shah :
>> >
>> >>
>> >>
>> >>
>> >> On Mon, Mar 10, 2014 at 10:25 AM, 陈子杭 (Zihang Chen) > >
>> >> wrote:
>> >>>
>> >>> I applied dos2unix to all the files under testenv, checked with file
>> >>> command, deleted all pyc files, line wrap to 80 characters and format
>> a new
>> >>> patch. (I swear this will be the last huge patch I'll ever make.)
>> >>>
>> >>> I also git am this patch to a clean clone locally, and got two warning:
>> >>> warning: squelched 16 whitespace errors
>> >>> warning: 21 lines add whitespace errors.
>> >>> Is this ok?
>> >>>
>> >> I haven't checked the patch yet, but just a few suggestions:
>> >>
>> >> 1. You don't need to delete the pyc files locally. Simply don't add them
>> >> to the git commit. Use a local .gitignore file to handle it
>> >> 2. You can and should split this patch. I'm assuming it's the same stuff
>> >> as before, and that can be split. Use your imagination
>> >> 3. The whitespace errors imply trailing whitespace. This happens when you
>> >> have extra whitespace characters at the end of a
>> >> line. Usually not a good idea since these are characters that cannot be
>> >> seen. You should eliminate them. My ViM editor
>> >> simply highlights all trailing whitespaces so I always know if they are
>> >> there. Also, you can configure your git to explicitly
>> >> highlight trailing whitespaces in its diff output (Assuming you're
>> using a
>> >> git shell, not a GUI, in which case I have no idea.)
>> >>
>> >>> Nervously, Chen
>> >>
>> >> Don't worry. Everyone faces problems with these items in the beginning.
>> >> It's not something you are used to.
>> >>
>> >>>
>> >>>
>> >>>
>> >>> 2014-03-10 16:34 GMT+08:00 陈子杭 (Zihang Chen) :
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> 2014-03-10 16:17 GMT+08:00 Darshit Shah :
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Mar 10, 2014 at 8:46 AM, 陈子杭 (Zihang Chen) <
>> chsc4...@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Hi Yousong,
>> >>>>>>
>> >>>>>> So sorry about the line endings, I'll have to do a thorough check.
>> >>>>>
>> >>>>> I'm not sure about the line endings since my git and vim
>> configuration
>> >>>>> simply do the magic
>> >>>>> of conversions for me. But if Yousong says so, do look into it.
>> >>>>>
>> >>>>> However, you seem to have added a huge amount of those especially in
>> >>>>> your 2nd patch.
>> >>>>>
>> >>>>> I do however, very strongly suggest that you get access to some sort
>> of
>> >>>>> a linux system. It will
>> >>>>> make your life so much easier. Autoconf takes ages to run on Windows
>> in
>> >>>>> a cygwin shell.
>> >>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> BTW, the pyc files in 0001.patch was deleted in the second commit.
>> >>>>>
>> >>>>>
>> >>>>> It would be better if you just did not have them there. It would
>> >>>>> clutter *everyone's* git repos
>> >>>>> if the .pyc files were there and later deleted. Because git will
>> leave
>> >>>>> a snapshot of each
>> >>>>> commit in the history. Keep a .gitignore file handy. Those are very
>> >>>>> important. You'll get
>> >>>>> good ones to start from in GitHub's own gitignore repository.
>> >>>>
>> >>>> Got it. But I wonder where to put the .gitignore file. Should I use
>> the
>> >>>> one in the `wget` directory or
>> >>>> get a new one under `testenv`?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Also, we usually expect a Chang

Re: [Bug-wget] [GSoC PATCH 11/11] in conf, rename register to rule and hook

2014-03-14 Thread Yousong Zhou
On 14 March 2014 21:28, 陈子杭 (Zihang Chen)  wrote:
> So sorry I flooded the mailing list. I thought --chain-reply-to is turned
> on by default :(

I think --no-chain-reply-to is okay.


   yousong



Re: [Bug-wget] wget with sms api help

2014-03-18 Thread Yousong Zhou
Hi,

On Wednesday, March 19, 2014,  wrote:

> I have an account with smsglobal, they have sms http api as so:
>
> http://www.smsglobal.com/http-api/
>
> If I use a browser like so:
>
>
> http://www.smsglobal.com/http-api.php?action=sendsms&user=myname&password=mypassword&&from=myself&to=targetcellphone&text=Hello%20world
>
> browser says:
>
> OK: 0; Sent queued message ID: e506.28 SMSGlobalMsgID:64.80337
>
> and, I receive an sms OK on cellphone
>
> if I try wget with same url, I get[1]:
>
> do I need to escape strings...? how ?
>
> downloaded file has like:
>
> # cat http-api.php?action=sendsms
> ERROR: Missing parameter: user
> ERROR: Missing parameter: password
> ERROR: Missing parameter: from
> ERROR: Missing parameter: to
>
> [1]
> # wget
>
> http://www.smsglobal.com/http-api.php?action=sendsms&user=myname&password=mypassword&&from=mysekf&to=targetcellphone&text=Hello%20world
>
> [1] 18942
> [2] 18943
> [3] 18944
> [4] 18945
> [2]   Doneuser=myname
> [3]-  Donepassword=mypassworde && from=myself


The URL needs to be quoted (surrounded by single or double quotes),
because the shell otherwise interprets each unquoted "&" as the
background-process operator and splits your command at it.
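If you drive wget from a script, you can sidestep shell quoting entirely by passing the URL as a single argv element; a sketch in Python (the URL is the example from this thread, with placeholder credentials):

```python
import shlex
import subprocess

url = ("http://www.smsglobal.com/http-api.php"
       "?action=sendsms&user=myname&password=mypassword"
       "&from=myself&to=targetcellphone&text=Hello%20world")

# With a list argv there is no shell involved, so '&' is passed
# through to wget literally instead of forking background jobs:
cmd = ["wget", url]
# subprocess.call(cmd)  # uncomment to actually run wget

# For an interactive shell, quoting achieves the same effect:
quoted = shlex.quote(url)
```

`quoted` here is exactly the single-quoted string you would type at the prompt.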


> # --2014-03-19 10:56:39--
> http://www.smsglobal.com/http-api.php?action=sendsms
> Resolving www.smsglobal.com... 203.89.193.162
> Connecting to www.smsglobal.com|203.89.193.162|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 130 [text/html]
> Saving to: `http-api.php?action=sendsms.1'
>
> 100%[==>] 130 --.-K/s   in 0s
>
> 2014-03-19 10:56:39 (8.76 MB/s) - `http-api.php?action=sendsms.1' saved
> [130/130]
>
> (then I hit 'cr' ot propmt doesn't come back)
>
> [1]-  Donewget
> http://www.smsglobal.com/http-api.php?action=sendsms
> [4]+  Doneto=mycellphone
>
>
>
>


[Bug-wget] [PATCH v6 0/5] Make wget capable of starting downloads from a specified position.

2014-03-19 Thread Yousong Zhou
This series adds an option `--start-pos' for specifying the starting
position of an HTTP or FTP download.  Also included are 3 fixes for the test
infrastructure and 3 test cases for the new option.

With the new option, the user specifies a zero-based offset directly,
instead of having wget derive it from an existing file, which is what
--continue currently does.  When this option and --continue are both
specified, which does not make much sense, wget will warn and proceed as
if --continue were absent.

Signed-off-by: Yousong Zhou 
---
v5 -> v6

- Fix a typo in version 5 of the patch for fixing TYPE and RETR
  commands handling in FTP test server.
- Fix test for --https-only option by adding feature constraint on
  HTTPS support.

v4 -> v5

- Reworked the description in doc with kind suggestions from Tim
  Ruehsen.
- Disable --start-pos when WARC options are used.
- When --start-pos and --continue are both specified, emit a warning,
  use --start-pos and disable --continue, then proceed.
- Add 2 fixes for the test infrastructure.
- Add 3 test cases for the new option.

v3 -> v4

In doc/wget.texi and wget usage output, explicitly note that
--start-pos is zero-based.

v2 -> v3

Fix a typo and add description text for the new option to the usage
output.  Thanks to Darshit Shah  for the suggestions.

v1 -> v2

It was kindly pointed out by Darshit Shah  that
server support for resuming downloads is required, so this is noted in
doc/wget.texi.

Yousong Zhou (5):
  Make wget capable of starting downloads from a specified position.
  Tests: fix TYPE and RETR command handling.
  Tests: exclude existing files from the check of unexpected downloads.
  Tests: Add test cases for option --start-pos.
  Tests: Add constraint on https for --https-only test.

 doc/ChangeLog  |4 ++
 doc/wget.texi  |   16 ++
 src/ChangeLog  |7 
 src/ftp.c  |2 +
 src/http.c |2 +
 src/init.c |4 ++
 src/main.c |   18 +--
 src/options.h  |1 +
 tests/ChangeLog|   21 +
 tests/FTPServer.pm |   12 ---
 tests/Test--httpsonly-r.px |2 +
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/WgetTest.pm.in   |5 ++-
 tests/run-px   |3 ++
 16 files changed, 233 insertions(+), 9 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

-- 
1.7.2.5




[Bug-wget] [PATCH v6 4/5] Tests: Add test cases for option --start-pos.

2014-03-19 Thread Yousong Zhou

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|7 
 tests/Test--start-pos--continue.px |   57 
 tests/Test--start-pos.px   |   46 +
 tests/Test-ftp--start-pos.px   |   42 ++
 tests/run-px   |3 ++
 5 files changed, 155 insertions(+), 0 deletions(-)
 create mode 100755 tests/Test--start-pos--continue.px
 create mode 100755 tests/Test--start-pos.px
 create mode 100755 tests/Test-ftp--start-pos.px

diff --git a/tests/ChangeLog b/tests/ChangeLog
index d23e76e..f2e80e5 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,12 @@
 2014-02-13  Yousong Zhou  
 
+   * Test--start-pos.px: Test --start-pos for HTTP downloads.
+   * Test-ftp--start-pos.px: Test --start-pos for FTP downloads.
+   * Test--start-pos--continue.px: Test the case when --start-pos and
+ --continue were both specified.
+
+2014-02-13  Yousong Zhou  
+
* Wget.pm.in: Exclude existing files from the check of unexpected
  downloads.
 
diff --git a/tests/Test--start-pos--continue.px 
b/tests/Test--start-pos--continue.px
new file mode 100755
index 000..09b8ced
--- /dev/null
+++ b/tests/Test--start-pos--continue.px
@@ -0,0 +1,57 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $existingfile = < {
+code => "206",
+msg => "Dontcare",
+headers => {
+"Content-type" => "text/plain",
+},
+content => $wholefile,
+},
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 --continue --debug http://localhost:{{port}}/somefile.txt";
+
+my $expected_error_code = 0;
+
+my %existing_files = (
+'somefile.txt' => {
+content => $existingfile,
+},
+);
+
+my %expected_downloaded_files = (
+'somefile.txt.1' => {
+content => substr($wholefile, 1),
+},
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos--continue",
+  input => \%urls,
+  cmdline => $cmdline,
+  errcode => $expected_error_code,
+  existing => \%existing_files,
+  output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
diff --git a/tests/Test--start-pos.px b/tests/Test--start-pos.px
new file mode 100755
index 000..4962c82
--- /dev/null
+++ b/tests/Test--start-pos.px
@@ -0,0 +1,46 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use HTTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+'/dummy.txt' => {
+code => "206",
+msg => "Dontcare",
+headers => {
+"Content-Type" => "text/plain",
+},
+content => $dummyfile
+},
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 http://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+'dummy.txt' => {
+content => substr($dummyfile, 1),
+}
+);
+
+###
+
+my $the_test = HTTPTest->new (name => "Test--start-pos",
+  input => \%urls,
+  cmdline => $cmdline,
+  errcode => $expected_error_code,
+  output => \%expected_downloaded_files);
+exit $the_test->run();
+
+# vim: et ts=4 sw=4
+
+
diff --git a/tests/Test-ftp--start-pos.px b/tests/Test-ftp--start-pos.px
new file mode 100755
index 000..5062377
--- /dev/null
+++ b/tests/Test-ftp--start-pos.px
@@ -0,0 +1,42 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use FTPTest;
+
+
+###
+
+my $dummyfile = "1234";
+
+# code, msg, headers, content
+my %urls = (
+'/dummy.txt' => {
+content => $dummyfile
+},
+);
+
+my $cmdline = $WgetTest::WGETPATH . " --start-pos=1 ftp://localhost:{{port}}/dummy.txt";
+
+my $expected_error_code = 0;
+
+my %expected_downloaded_files = (
+'dummy.txt' => {
+content => substr($dummyfile, 1),
+}
+);
+
+###
+
+my $the_test = FTPTest->new (name => "Test-ftp--start-pos",
+

[Bug-wget] [PATCH v6 3/5] Tests: exclude existing files from the check of unexpected downloads.

2014-03-19 Thread Yousong Zhou

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog  |5 +
 tests/WgetTest.pm.in |5 -
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index a7db249..d23e76e 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,5 +1,10 @@
 2014-02-13  Yousong Zhou  
 
+   * Wget.pm.in: Exclude existing files from the check of unexpected
+ downloads.
+
+2014-02-13  Yousong Zhou  
+
* FTPServer.pm: Fix the handling of TYPE command and avoid endless
loop when doing binary mode RETR.
 
diff --git a/tests/WgetTest.pm.in b/tests/WgetTest.pm.in
index 58ad140..092777e 100644
--- a/tests/WgetTest.pm.in
+++ b/tests/WgetTest.pm.in
@@ -256,7 +256,10 @@ sub _verify_download {
 # make sure no unexpected files were downloaded
 chdir ("$self->{_workdir}/$self->{_name}/output");
 
-__dir_walk('.', sub { push @unexpected_downloads, $_[0] unless (exists 
$self->{_output}{$_[0]}) }, sub { shift; return @_ } );
+__dir_walk('.',
+   sub { push @unexpected_downloads,
+  $_[0] unless (exists $self->{_output}{$_[0]} || 
$self->{_existing}{$_[0]}) },
+   sub { shift; return @_ } );
 if (@unexpected_downloads) {
 return "Test failed: unexpected downloaded files [" . join(', ', 
@unexpected_downloads) . "]\n";
 }
-- 
1.7.2.5




[Bug-wget] [PATCH v6 2/5] Tests: fix TYPE and RETR command handling.

2014-03-19 Thread Yousong Zhou
 - FTPServer.pm's handling of the TYPE command ignored binary mode
   transfer requests.
 - The FTP server could run into an endless loop, sending the same
   content forever.

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|5 +
 tests/FTPServer.pm |   12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index 6730169..a7db249 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,8 @@
+2014-02-13  Yousong Zhou  
+
+   * FTPServer.pm: Fix the handling of TYPE command and avoid endless
+   loop when doing binary mode RETR.
+
 2014-01-23  Lars Wendler   (tiny change)
 
* Test--post-file.px: Do not fail when wget has no debug support.
diff --git a/tests/FTPServer.pm b/tests/FTPServer.pm
index 2ac72e3..1603caa 100644
--- a/tests/FTPServer.pm
+++ b/tests/FTPServer.pm
@@ -298,12 +298,13 @@ sub _RETR_command
 # What mode are we sending this file in?
 unless ($conn->{type} eq 'A') # Binary type.
 {
-my ($r, $buffer, $n, $w);
-
+my ($r, $buffer, $n, $w, $sent);
 
 # Copy data.
-while ($buffer = substr($content, 0, 65536))
+$sent = 0;
+while ($sent < length($content))
 {
+$buffer = substr($content, $sent, 65536);
 $r = length $buffer;
 
 # Restart alarm clock timer.
@@ -330,6 +331,7 @@ sub _RETR_command
 print {$conn->{socket}} "426 Transfer aborted. Data connection 
closed.\r\n";
 return;
 }
+$sent += $r;
 }
 
 # Cleanup and exit if there was an error.
@@ -410,9 +412,9 @@ sub _TYPE_command
 
 # See RFC 959 section 5.3.2.
 if ($type =~ /^([AI])$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^([AI])\sN$/i) {
-$conn->{type} = 'A';
+$conn->{type} = $1;
 } elsif ($type =~ /^L\s8$/i) {
 $conn->{type} = 'L8';
 } else {
-- 
1.7.2.5




[Bug-wget] [PATCH v6 5/5] Tests: Add constraint on https for --https-only test.

2014-03-19 Thread Yousong Zhou

Signed-off-by: Yousong Zhou 
---
 tests/ChangeLog|4 
 tests/Test--httpsonly-r.px |2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/tests/ChangeLog b/tests/ChangeLog
index f2e80e5..c3baac3 100644
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
@@ -1,3 +1,7 @@
+2014-02-24  Yousong Zhou   (tiny change)
+
+   * tests/Test--httpsonly-r.px: Add feature constraint on https.
+
 2014-02-13  Yousong Zhou  
 
* Test--start-pos.px: Test --start-pos for HTTP downloads.
diff --git a/tests/Test--httpsonly-r.px b/tests/Test--httpsonly-r.px
index 019df1a..66d156f 100755
--- a/tests/Test--httpsonly-r.px
+++ b/tests/Test--httpsonly-r.px
@@ -3,6 +3,8 @@
 use strict;
 use warnings;
 
+use WgetFeature qw(https);
+
 use HTTPTest;
 
 
-- 
1.7.2.5




[Bug-wget] [PATCH v6 1/5] Make wget capable of starting downloads from a specified position.

2014-03-19 Thread Yousong Zhou
This patch adds an option `--start-pos' for specifying starting position
of a HTTP or FTP download.

Signed-off-by: Yousong Zhou 
---
 doc/ChangeLog |4 
 doc/wget.texi |   16 
 src/ChangeLog |7 +++
 src/ftp.c |2 ++
 src/http.c|2 ++
 src/init.c|4 
 src/main.c|   18 +++---
 src/options.h |1 +
 8 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 58d1439..68629c6 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,7 @@
+2014-02-10  Yousong Zhou  
+
+   * wget.texi: Add documentation for --start-pos.
+
 2013-12-29  Giuseppe Scrivano  
 
* wget.texi: Update to GFDL 1.3.
diff --git a/doc/wget.texi b/doc/wget.texi
index 6a8c6a3..0b23bda 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -701,6 +701,22 @@ Another instance where you'll get a garbled file if you 
try to use
 Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
 servers that support the @code{Range} header.
 
+@cindex offset
+@cindex continue retrieval
+@cindex incomplete downloads
+@cindex resume download
+@cindex start position
+@item --start-pos=@var{OFFSET}
+Start downloading at zero-based position @var{OFFSET}.  Offset may be expressed
+in bytes, kilobytes with the `k' suffix, or megabytes with the `m' suffix, etc.
+
+@samp{--start-pos} has higher precedence over @samp{--continue}. When
+@samp{--start-pos} and @samp{--continue} are both specified, wget will emit a
+warning then proceed as if @samp{--continue} was absent.
+
+Server support for continued download is required, otherwise @samp{--start-pos}
+cannot help.  See @samp{-c} for details.
+
 @cindex progress indicator
 @cindex dot style
 @item --progress=@var{type}
diff --git a/src/ChangeLog b/src/ChangeLog
index d3ac754..9b10ee8 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,10 @@
+2014-03-19  Yousong Zhou  
+
+   * init.c, main.c, options.h: Add option --start-pos for specifying
+   start position of a download.
+   * http.c: Utilize opt.start_pos for HTTP download.
+   * ftp.c: Utilize opt.start_pos for FTP retrieval.
+
 2014-03-04  Giuseppe Scrivano  
 
* http.c (modify_param_value, extract_param): Aesthetic change.
diff --git a/src/ftp.c b/src/ftp.c
index c2522ca..5282588 100644
--- a/src/ftp.c
+++ b/src/ftp.c
@@ -1632,6 +1632,8 @@ ftp_loop_internal (struct url *u, struct fileinfo *f, 
ccon *con, char **local_fi
   /* Decide whether or not to restart.  */
   if (con->cmd & DO_LIST)
 restval = 0;
+  else if (opt.start_pos >= 0)
+restval = opt.start_pos;
   else if (opt.always_rest
   && stat (locf, &st) == 0
   && S_ISREG (st.st_mode))
diff --git a/src/http.c b/src/http.c
index cd2bd15..8bba70d 100644
--- a/src/http.c
+++ b/src/http.c
@@ -3121,6 +3121,8 @@ Spider mode enabled. Check if remote file exists.\n"));
   /* Decide whether or not to restart.  */
   if (force_full_retrieve)
 hstat.restval = hstat.len;
+  else if (opt.start_pos >= 0)
+hstat.restval = opt.start_pos;
   else if (opt.always_rest
   && got_name
   && stat (hstat.local_file, &st) == 0
diff --git a/src/init.c b/src/init.c
index 56fef50..9ed72b2 100644
--- a/src/init.c
+++ b/src/init.c
@@ -270,6 +270,7 @@ static const struct {
   { "showalldnsentries", &opt.show_all_dns_entries, cmd_boolean },
   { "spanhosts",&opt.spanhost,  cmd_boolean },
   { "spider",   &opt.spider,cmd_boolean },
+  { "startpos", &opt.start_pos, cmd_bytes },
   { "strictcomments",   &opt.strict_comments,   cmd_boolean },
   { "timeout",  NULL,   cmd_spec_timeout },
   { "timestamping", &opt.timestamping,  cmd_boolean },
@@ -406,6 +407,9 @@ defaults (void)
   opt.warc_cdx_dedup_filename = NULL;
   opt.warc_tempdir = NULL;
   opt.warc_keep_log = true;
+
+  /* Use a negative value to mark the absence of --start-pos option */
+  opt.start_pos = -1;
 }
 
 /* Return the user's home directory (strdup-ed), or NULL if none is
diff --git a/src/main.c b/src/main.c
index 3ce7583..39fcff4 100644
--- a/src/main.c
+++ b/src/main.c
@@ -276,6 +276,7 @@ static struct cmdline_option option_data[] =
 { "server-response", 'S', OPT_BOOLEAN, "serverresponse", -1 },
 { "span-hosts", 'H', OPT_BOOLEAN, "spanhosts", -1 },
 { "spider", 0, OPT_BOOLEAN, "spider", -1 },
+{ "start-pos", 0, OPT_VALUE, "startpos", -1 },
 { "strict-comments", 0, OPT_BOOLEAN, "strictcomments", -1 },
 { "timeout", 'T', OPT_VALUE, "timeout", -1 },
 { "timestamping", '

Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-21 Thread Yousong Zhou
Hi, Jure.

On 21 March 2014 03:23, Jure Grabnar  wrote:
> Thank you for you feedback Darshit. I changed my proposal according to your
> advices. Hopefully a new version is better.
>
> I'm also sending corrected patches, again thanks to your review, Darshit.
> First patch allows Metalink to have optional argument "type" in 
> field. Where type is not present, it extracts protocol type from URL string.
>

On the 1st patch: a static "char *" value should not be assigned to
resource->type, which will later be free()'ed.


   yousong



Re: [Bug-wget] [PATCH v6 0/5] Make wget capable of starting downloads from a specified position.

2014-03-21 Thread Yousong Zhou
On 21 March 2014 19:34, Giuseppe Scrivano  wrote:
> I've done some more tests and now pushed!

Finally.  Thank you, Tim, Darshit, Giuseppe, for your time and
attention on this.


   yousong



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-22 Thread Yousong Zhou
Hi, Jure.

On 22 March 2014 18:02, Jure Grabnar  wrote:
> Hi,
>
> thank you for your feedback, Darshit, Yousong!
>
> I reverted magic number back to its original state ('tmp2'), because it
> should
> be there (I overlooked that 'tmp' variable is changed in the very next
> statement).
>
> Duplicated line is removed.
>
> I also changed resource->type to point at dynamic memory.

+  if (type)
+{
+  resource->type = malloc (strlen (type));
+  sprintf(resource->type, type);
+}

xstrdup() is better, because that is how the existing code does it.  And
you may want to know that using a variable as the format string is not
good practice for secure code.
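Incidentally, the quoted malloc() is also one byte short: a C string copy needs strlen(s) + 1 bytes to hold the terminating NUL, which is something xstrdup() gets right automatically. The arithmetic, paraphrased in Python purely for illustration:

```python
def c_copy_size(s):
    """Bytes a C duplicate of s must allocate: the bytes of the
    string itself plus one for the terminating NUL.  The quoted
    snippet allocates only strlen(type) bytes."""
    return len(s.encode("utf-8")) + 1
```

So for type = "ftp" the snippet allocates 3 bytes while sprintf() writes 4.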

yousong

>
> They say third's time's the charm. :) I hope it's ok now.
>
> Regards,
>
>
> Jure Grabnar
>
>



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-30 Thread Yousong Zhou
Hi,

On 28 March 2014 20:33, Jure Grabnar  wrote:
> Hi,
>
> Thank you Yousong. I've listened to your advice and changed type of
> resource->type to
> enum url_scheme. Now it looks much cleaner.

Using enum is a step forward.

> @@ -134,7 +135,20 @@ parse_metalink(char *input_file)
>++(file->num_of_res);
>
>resource->url = xstrdup ((*resources)->url);
> -  resource->type = ((*resources)->type ? xstrdup 
> ((*resources)->type) : NULL);
> +
> +  if ((*resources)->type)
> +{
> +  /* Append "://" to resource type so url_scheme() recognizes 
> type */
> +  char *temp_url = malloc ( strlen ( (*resources)->type) + 4);
> +  sprintf (temp_url, "%s://", (*resources)->type);
> +
> +  resource->type = url_scheme (temp_url);
> +
> +  free (temp_url);
> +}

This is a little hacky.  Adding a utility function like
url_scheme_str_to_enum() would be better.
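A sketch of what such a helper might look like, transliterated to Python for brevity (all names hypothetical; the real code would map onto wget's enum url_scheme in C):

```python
from enum import Enum

class UrlScheme(Enum):
    # Stand-in for wget's enum url_scheme.
    HTTP = "http"
    HTTPS = "https"
    FTP = "ftp"
    INVALID = "invalid"

def url_scheme_str_to_enum(type_str):
    """Map a Metalink 'type' string directly to a scheme, with no
    temporary "type://" URL construction."""
    try:
        return UrlScheme(type_str.lower())
    except ValueError:
        return UrlScheme.INVALID
```

The caller then never allocates or frees a scratch URL just to reuse url_scheme().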

> +  else
> +resource->type = url_scheme (resource->url);
> +
>resource->location = ((*resources)->location ? xstrdup 
> ((*resources)->location) : NULL);
>resource->preference = (*resources)->preference;
>resource->maxconnections = (*resources)->maxconnections;
> @@ -143,7 +157,7 @@ parse_metalink(char *input_file)
>(file->resources) = resource;
>  }
>
> -  for (checksums = (*files)->checksums; *checksums; ++checksums)
> +  for (checksums = (*files)->checksums; checksums && *checksums; 
> ++checksums)

Good catch.  Should do the same NULL check for (*files)->resources.

>  {
>mlink_checksum *checksum = malloc (sizeof(mlink_checksum));
>
>

<...>

> @@ -215,19 +229,25 @@ elect_resources (mlink *mlink)
>
>while (res_next = res->next)
>  {
> -  if (strcmp(res_next->type, "ftp") && strcmp(res_next->type, 
> "http"))
> +  if (schemes_are_similar_p (res_next->type, SCHEME_INVALID))
>  {
>res->next = res_next->next;
>free(res_next);
> +
> +  --(file->num_of_res);
>  }
>else
>  res = res_next;
>  }
>res = file->resources;
> -  if (strcmp(res->type, "ftp") && strcmp(res->type, "http"))
> +  if (schemes_are_similar_p (res->type, SCHEME_INVALID))
>  {
>file->resources = res->next;

If I am right, this will set it to NULL if file->num_of_res is 1.

> -  free(res);
> +  free (res);
> +
> +  --(file->num_of_res);
> +  if (!file->num_of_res)
> +file->resources = NULL;

So explicitly setting it to NULL is not needed.
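The head-removal case can be mirrored in a few lines to show why the separate NULL assignment is redundant (a hypothetical miniature of elect_resources(), not the actual wget code):

```python
class Res:
    """Minimal singly linked resource node."""
    def __init__(self, scheme, nxt=None):
        self.scheme = scheme
        self.next = nxt

def drop_invalid_head(head):
    """Drop the head node when its scheme is unsupported; the
    returned head is already None when the removed node was the
    last one, so no extra NULL assignment is needed."""
    if head is not None and head.scheme == "invalid":
        return head.next   # None for a single-node list
    return head
```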

>  }
>  }
>  }
>

>
> I also added check for whenever there's no resources available to download a
> file.
>
> Second patch remains unchanged.
>
> Regards,
>
>
> Jure Grabnar



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-03-31 Thread Yousong Zhou
Hi, Jure.

On 1 April 2014 03:46, Jure Grabnar  wrote:
> Hello,
>
> thanks for your feedback! I corrected the first patch.

Then the 1st one is fine with me.

I am not fluent in Metalink or in how libmetalink handles the type
attribute.  In version 4.0 of the standard there is no type attribute
for the metalink:url element; only metalink:metaurl has it.  Though
version 3.0 of the spec does not explicitly use a word like "must",
the type attribute looks like a required one there (see section
4.1.2.4 of the 3.0 spec).  If that is the case, then a metalink file
missing that attribute is not standard-compliant.  Maybe later people
will want a way to ignore those non-compliant metalink:url elements,
but let that be another story for when the need actually comes up.  :)


   yousong



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-01 Thread Yousong Zhou
On 1 April 2014 15:48, Jure Grabnar  wrote:
> Hi,
>
> I debugged code before writing the 1st patch and found out that if "type"
> attribute is not present in v3.0, libmetalink completely ignores it (URL is
> not present in resources!).
> If you write "type" attribute in v4.0, libmetalink ignores it (only "type",
> URL is still present in resources!). So you have to find out protocol type
> from URL in v4.0.

But the type attribute is currently not used by wget.  I cannot find
any reference to it outside metalink.c.  Anyway, IIUC, types like
torrent, ed2k, etc. are not in the realm of wget.

yousong

> This was the main purpose of the 1st patch.
>
>
> On 1 April 2014 03:20, Yousong Zhou  wrote:
>>
>> Hi, Jure.
>>
>> On 1 April 2014 03:46, Jure Grabnar  wrote:
>> > Hello,
>> >
>> > thanks for your feedback! I corrected the first patch.
>>
>> Then the 1st one is fine with me.
>>
>> I am not fluent with Metalink and libmetalink on how it handles the
>> type attribute.  In version 4.0 of the standard, there is no type
>> attribute for metalink:url element, only metalink:metaurl has it.
>> Though not explicitly using a word like "must" in version 3.0 of the
>> spec, looks like type attribute is a required one there (See 4.1.2.4
>> of the 3.0 spec).
>
>
> I thought so too, but if you take a look at 4.1.2.5 section of the v3.0
> spec, the last example shows that "type" attribute can be omitted.
>
>>
>> If that is the case, then the metalink file is not a
>> standard-compliant one if that attribute is missing.  Maybe later
>> people want a way to ignore those non-compliant metalink:url element.
>> But let that be another story when the need actually came up.  :)
>
>
> Then libmetalink should be tweaked a bit, to allow non-compliant 
> elements, because currently it just ignores them (v3.0).
> Although, to be honest, they could just switch to v4.0, where "type" is
> optional and properly parsed by libmetalink. :)
>
> Regards,
>
> Jure Grabnar
>
>



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-01 Thread Yousong Zhou
Hi,

On 1 April 2014 23:02, Jure Grabnar  wrote:
>
> On 1 April 2014 10:39, Yousong Zhou  wrote:
>>
>> On 1 April 2014 15:48, Jure Grabnar  wrote:
>> > Hi,
>> >
>> > I debugged code before writing the 1st patch and found out that if
>> > "type"
>> > attribute is not present in v3.0, libmetalink completely ignores it (URL
>> > is
>> > not present in resources!).
>> > If you write "type" attribute in v4.0, libmetalink ignores it (only
>> > "type",
>> > URL is still present in resources!). So you have to find out protocol
>> > type
>> > from URL in v4.0.
>>
>> But the type attribute is currently not used by wget.  I cannot find
>> any reference to it outside metalink.c.  Anyway, IIUC, types like
>> torrent, ed2k, etc. are not in the realm of wget.
>

I just checked section 4.1.2.5 of the Metalink 3.0 spec.  It says that
when the "type" attribute is missing, users can derive whether a URL
is for BitTorrent by examining its suffix.  That's bad: a URL is a
Uniform Resource Locator; it need not end with a specific name
indicating its type.  I would say libmetalink does the right thing by
ignoring those metalink:url elements.

>
> That's true. "type" is currently only used to filter out types which Wget
> doesn't support.
> Do you think parsing it ("type") is irrelevant?

IMHO, if it will not be used in the near future, it would be better to
document that or remove it.

>
> Regards,
>
> Jure Grabnar
>



Re: [Bug-wget] Fwd: [GSoC] Extend concurrency support in Wget

2014-04-05 Thread Yousong Zhou
On 5 April 2014 17:28, Jure Grabnar  wrote:
> Hi,
>
>>
>> >
>> > That's true. "type" is currently only used to filter out types which
>> > Wget
>> > doesn't support.
>> > Do you think parsing it ("type") is irrelevant?
>>
>> IMHO, if it will not be used in the near future, then better document
>> or remove it.
>
>
> I tried removing elect_resources() (essentially removing "type" attribute)
> and it mostly works.
> It fails when "bittorrent" url resource has top priority. In this case it
> HTTP downloads what looks to me like a tracker info.
> Since checksum differs from original file (extracted from metalink file)
> download fails.
>
> I also merged two metalink files (header from the first file and resources
> from the second file) and Wget crashes. I found out there are some issues
> with temporary files.
>
> I do believe checking types is more fail-safe since these issues do not
> occur there. At least "bittorrent" resources have to be eliminated
> beforehand or make Wget somehow aware of them.

Yes, I agree that the parsing is needed for filtering out schemes like
ed2k and bittorrent.

yousong



Re: [Bug-wget] [PATCH] wget hangs on HTTP 204

2014-04-22 Thread Yousong Zhou
On 22 April 2014 21:02, Tim Ruehsen  wrote:
> Attached is a patch including a new test case.
>
> Guiseppe, I made it for a clone of Darshit's clone of Wget. Not sure if it
> fits into master.

Hi, Giuseppe.  Just noticed that my previous test cases for
--start-pos were not added to the tests/Makefile.am file.  Can you
kindly pick them up there?


   yousong



Re: [Bug-wget] grab complete download link

2014-07-20 Thread Yousong Zhou
Hi,

On 21 July 2014 09:38, bas smit  wrote:
> Dear Darshit Shah
> Thanks for your response.
>
> I tried with the following command:
> subprocess.call([wget,'--user',user,'--password',passw,'-P',download_dir,'--page-requisites',url,'-o',logfile,\
> '--no-check-certificate'])
>

The URL you provided requires a login to access.  But I guess
recursive download is what you want.  Try the options
`--recursive --level=1`, or `-r -l 1` for short.

> However, still unsuccessful to download the required file.
>
> I also obtained the following in the log file:
>
> WARNING: Certificate verification error: unable to get local issuer
> certificate
>
>
> I hope you can help me.
>
> Bas
>
>
> WARNING: Certificate verification error: unable to get local issuer
> certificate
>
>
> On Thu, Jul 17, 2014 at 9:34 PM, Darshit Shah  wrote:
>
>> You want to use the --page-requisites option
>>
>> On Thu, Jul 17, 2014 at 2:22 PM, bas smit  wrote:
>> > I am looking for command line option to use the same functionality as the
>> > "Download All with Free Download Manager" does. It grabs the complete
>> > download links though only partial links are shown in the source html.  I
>> > tried the following code, but but could not figure out which particular
>> > parameter is necessary for that. The url provided below is the only known
>> > one.
>> >
>> > import subprocess
>> >
>> > user, passw = 'user', 'passw'
>> >
>> > url = '
>> http://earthexplorer.usgs.gov/download/3120/LM10300301974324GDS05/STANDARD/BulkDownload
>> '
>> >
>> > wget = "C:\\Users\\bas\\Downloads\\wget-1.10.2.exe"
>> > subprocess.call([wget, '--user', user, '--password', passw, url])
>>
>>
>>
>> --
>> Thanking You,
>> Darshit Shah
>>



Re: [Bug-wget] Wget export URL list

2014-09-03 Thread Yousong Zhou
On 3 September 2014 22:26, Adrian - adrianTNT.com
 wrote:
> Hello.
> Can anyone tell me how to do this with wget ?
> I want it to spider a given website and return the list of full urls in
> that website.
> Any ideas?

This can be done by:

 - grepping through the stderr output of wget, or
 - patching wget for your specific need, which should be easy.

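A minimal sketch of the grep approach (the spider command is illustrative
and commented out; the pipeline is exercised below against a small sample
of wget's verbose log format, whose request lines begin with
`--<timestamp>--`):

```shell
# Illustrative crawl whose stderr carries one '--<timestamp>--  URL' line
# per visited page (URL and depth are placeholders):
#   wget --spider -r -l 2 'http://example.com/' 2>&1 \
#     | grep '^--' | awk '{ print $3 }' | sort -u

# The same filter applied to a sample of the log format; it prints the
# visited URLs, one per line, deduplicated:
log='--2014-09-03 22:26:01--  http://example.com/
Resolving example.com... done
--2014-09-03 22:26:02--  http://example.com/about.html'
printf '%s\n' "$log" | grep '^--' | awk '{ print $3 }' | sort -u
```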
   yousong



Re: [Bug-wget] wget-bug

2014-09-13 Thread Yousong Zhou
On Sep 13, 2014 9:39 PM, "Nyilas MISY"  wrote:
>
> hello :-)
>
> shortly (this is just an example!!) ::
>
> [user@host ~]$ wget -r -c -P ~/Downloads/
>
http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe
>
> when the filename(s) contains ( ), then the wget doesn't downloads
> it/them how can fix this bug??
>

Did the shell complain that it couldn't find the command "4.5.1.1769"?

How about surrounding the URL with quotes?  In this case, double or
single quotes should both work.

yousong

> I'm on Fedora 20, 32bit, MATE desktop environment..
>
> have a nice day and week :-)
>
> Nyilas MISY
>


Re: [Bug-wget] wget-bug

2014-09-13 Thread Yousong Zhou
On Sep 13, 2014 10:15 PM, "Nyilas MISY"  wrote:
>
> [user@host ~]$ wget -r -c -P /home/user/Downloads/
>
http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe
> bash: syntax error "(" near unexpected token

Bash emitted the error message, not wget.  Quote the URL part and it
should work.

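To illustrate (the URL is the one from the report; only the quoting is
demonstrated here, no download is attempted):

```shell
# '(' and ')' are shell metacharacters: an unquoted URL containing them
# makes bash report a syntax error before wget is even started.
# Single quotes pass the URL through verbatim:
url='http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe'
echo "$url"

# The download command then becomes:
#   wget -r -c -P ~/Downloads/ "$url"
```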
> [user@host ~]$
>
> 2014-09-13 16:04 GMT+02:00 Yousong Zhou :
> >
> > On Sep 13, 2014 9:39 PM, "Nyilas MISY"  wrote:
> >>
> >> hello :-)
> >>
> >> shortly (this is just an example!!) ::
> >>
> >> [user@host ~]$ wget -r -c -P ~/Downloads/
> >>
> >>
http://multicommander.com/files/updates/MultiCommander_win32_(4.5.1.1769).exe
> >>
> >> when the filename(s) contains ( ), then the wget doesn't downloads
> >> it/them how can fix this bug??
> >>
> >
> > did the shell complained that it couldn't find the command "4.5.1.1769"?
> >
> > how about try surrounding the URL with quotes.  in this case, double or
> > single quotes should both work.
> >
> > yousong
> >
> >> I'm on Fedora 20, 32bit, MATE desktop environment..
> >>
> >> have a nice day and week :-)
> >>
> >> Nyilas MISY
> >>


Re: [Bug-wget] Issue with --content-on-error and --convert-links

2014-10-16 Thread Yousong Zhou
On 13 October 2014 10:25, Joe Hoyle  wrote:
> Hi All,
>
>
> I’m having issues using "--convert-links” in conjunction with 
> "--content-on-error”. Though "--content-on-error” is forcing wget to download 
> the pages, the links to that “errored” page is not update in other pages that 
> link to it.
>
>
> This seems to be hinted at in the man page:
>
>
> "Because of this, local browsing works reliably: if a linked file was 
> downloaded, the link will refer to its local name; if it was not downloaded, 
> the link will refer to its full Internet address rather than presenting a 
> broken link. The fact that the former links are converted to relative links 
> ensures that you can move the downloaded hierarchy to another directory.”
>
>
> However, it would seem in the case of using —content-on-error it should 
> ignore this rule and do all the link substation anyhow.
>
>
> If anyone knows if this *should* work then I’d be eager to hear it, or any 
> other way I can get any 404 pages downloaded and also linked to in the wget 
> mirror.
>

Currently, wget treats pages with a 404 status code as not RETROKF
(retrieval was OK), even though the 404 page itself was actually
downloaded successfully when the `--content-on-error` option is enabled.
This behaviour is mostly acceptable, I guess, but you can try the
attached patch for the moment.  The other option would be to serve the
404 page by setting it up manually with your web server.

Regards.

   yousong


0001-Let-convert-links-work-with-content-on-error.patch
Description: Binary data


Re: [Bug-wget] wget

2014-10-18 Thread Yousong Zhou
Hi, Bryan

Am 18.10.2014 21:43 schrieb "Bryan Baas" :
>
> Hi,
>
> I was wondering about the command output of wget.  I used a Java Runtime
> exec and, although the wget process ended with a 0 completion code, the
> results appeared in the error stream and not the output stream.
>
> As a further test, I executed the same command at the command line and
> redirected output to a file using the > operator.  Upon completion the
> file was empty, but the results scrolled down the screen.  This had me
> thinking that the wget command itself is directing its regular output to
> sderr instead of stdout.

Yes, that is expected.  It is possible to set the output file to stdout
with "-O -", in which case you would not want wget's own messages and the
file content mangled together on the same stream.

>
> The results of the wget command, from what I could tell, weren't error
> conditions but regular output from a successful execution.
>

I think it is a convention that the debug, informational, error, and
verbose output of unix programs be written to stderr.  However, users can
always redirect stderr to whatever file descriptor they prefer.

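The split can be seen with a small stand-in function that mimics wget's
stream layout (the `fetch` function is only a mock; with the real tool
the same redirections apply):

```shell
# Mock of wget with "-O -": file content on stdout, log messages on stderr.
fetch() { printf 'file content\n'; printf 'saved to -\n' >&2; }

# Redirecting stdout captures the payload; the log still goes to the terminal:
fetch > payload.txt 2> /dev/null
# Redirect stderr as well to capture the log in its own file:
fetch > payload.txt 2> fetch.log

# Real-wget equivalent (illustrative):
#   wget -O - 'http://example.com/file' > payload.txt 2> wget.log
```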
regards.

yousong

> Your feedback would be appreciated.
>
> regards,
>
>
> --
> Bryan Baas
> Weyco IT
> x1808
> 414 241 0499 (cell)
>


Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär.  I got a few comments inline.

On 21 October 2014 05:47, Pär Karlsson  wrote:
> Whoops, I realised I failed on the GNU coding standards, please disregard
> the last one; the patch below should be better.
>
> My apologies :-/
>
> /Pär
>
> diff --git a/src/ChangeLog b/src/ChangeLog
> index d5aeca0..87abd85 100644
> --- a/src/ChangeLog
> +++ b/src/ChangeLog
> @@ -1,3 +1,8 @@
> +2014-10-20 Pär Karlsson  
> +
> +   * utils.c (concat_strings): got rid of double loop, cleaned up
> potential
> +   memory corruption if concat_strings was called with more than five
> args
> +
>  2014-10-16  Tim Ruehsen  
>
> * url.c (url_parse): little code cleanup
> diff --git a/src/utils.c b/src/utils.c
> index 78c282e..5f359e0 100644
> --- a/src/utils.c
> +++ b/src/utils.c
> @@ -356,42 +356,36 @@ char *
>  concat_strings (const char *str0, ...)
>  {
>va_list args;
> -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
>char *ret, *p;
>
>const char *next_str;
> -  int total_length = 0;
> -  size_t argcount;
> +  size_t len;
> +  size_t total_length = 0;
> +  size_t charsize = sizeof (char);

I am not sure here.  Do we always assume sizeof(char) to be 1 for
platforms supported by wget?

> +  size_t chunksize = 64;
> +  size_t bufsize = 64;
> +
> +  p = ret = xmalloc (charsize * bufsize);
>
>/* Calculate the length of and allocate the resulting string. */
>
> -  argcount = 0;
>va_start (args, str0);
>for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
>  {
> -  int len = strlen (next_str);
> -  if (argcount < countof (saved_lengths))
> -saved_lengths[argcount++] = len;
> +  len = strlen (next_str);
> +  if (len == 0)
> +continue;
>total_length += len;
> -}
> -  va_end (args);
> -  p = ret = xmalloc (total_length + 1);
> -
> -  /* Copy the strings into the allocated space. */
> -
> -  argcount = 0;
> -  va_start (args, str0);
> -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
> -{
> -  int len;
> -  if (argcount < countof (saved_lengths))
> -len = saved_lengths[argcount++];
> -  else
> -len = strlen (next_str);
> +  if (total_length > bufsize)
> +  {
> +bufsize += chunksize;

Should be `bufsize = total_length` ?

> +ret = xrealloc (ret, charsize * bufsize);
> +  }
>memcpy (p, next_str, len);

Xrealloc may return a new block different from p, so memcpy(p, ...)
may not be what you want.

>p += len;
>  }
>va_end (args);
> +  ret = xrealloc (ret, charsize * total_length + 1);
>*p = '\0';

Malloc takes time.  How about counting total_length in one loop and
doing the copy in another?

Regards.

yousong

>
>return ret;
>



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
On 21 October 2014 10:02, Yousong Zhou  wrote:
> Hi, Pär.  I got a few comments inline.
>
> On 21 October 2014 05:47, Pär Karlsson  wrote:
>> Whoops, I realised I failed on the GNU coding standards, please disregard
>> the last one; the patch below should be better.
>>
>> My apologies :-/
>>
>> /Pär
>>
>> diff --git a/src/ChangeLog b/src/ChangeLog
>> index d5aeca0..87abd85 100644
>> --- a/src/ChangeLog
>> +++ b/src/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2014-10-20 Pär Karlsson  
>> +
>> +   * utils.c (concat_strings): got rid of double loop, cleaned up
>> potential
>> +   memory corruption if concat_strings was called with more than five
>> args
>> +
>>  2014-10-16  Tim Ruehsen  
>>
>> * url.c (url_parse): little code cleanup
>> diff --git a/src/utils.c b/src/utils.c
>> index 78c282e..5f359e0 100644
>> --- a/src/utils.c
>> +++ b/src/utils.c
>> @@ -356,42 +356,36 @@ char *
>>  concat_strings (const char *str0, ...)
>>  {
>>va_list args;
>> -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
>>char *ret, *p;
>>
>>const char *next_str;
>> -  int total_length = 0;
>> -  size_t argcount;
>> +  size_t len;
>> +  size_t total_length = 0;
>> +  size_t charsize = sizeof (char);
>
> I am not sure here.  Do we always assume sizeof(char) to be 1 for
> platforms supported by wget?
>
>> +  size_t chunksize = 64;
>> +  size_t bufsize = 64;
>> +
>> +  p = ret = xmalloc (charsize * bufsize);
>>
>>/* Calculate the length of and allocate the resulting string. */
>>
>> -  argcount = 0;
>>va_start (args, str0);
>>for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
>>  {
>> -  int len = strlen (next_str);
>> -  if (argcount < countof (saved_lengths))
>> -saved_lengths[argcount++] = len;
>> +  len = strlen (next_str);
>> +  if (len == 0)
>> +continue;
>>total_length += len;
>> -}
>> -  va_end (args);
>> -  p = ret = xmalloc (total_length + 1);
>> -
>> -  /* Copy the strings into the allocated space. */
>> -
>> -  argcount = 0;
>> -  va_start (args, str0);
>> -  for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
>> -{
>> -  int len;
>> -  if (argcount < countof (saved_lengths))
>> -len = saved_lengths[argcount++];
>> -  else
>> -len = strlen (next_str);
>> +  if (total_length > bufsize)
>> +  {
>> +bufsize += chunksize;
>
> Should be `bufsize = total_length` ?
>
>> +ret = xrealloc (ret, charsize * bufsize);
>> +  }
>>memcpy (p, next_str, len);
>
> Xrealloc may return a new block different from p, so memcpy(p, ...)
> may not be what you want.
>
>>p += len;
>>  }
>>va_end (args);
>> +  ret = xrealloc (ret, charsize * total_length + 1);
>>*p = '\0';
>
> Malloc takes time.  How about counting total_length in one loop and
> doing the copy in another?

I mean, we can skip the strlen part and just do strcpy in the second
loop as we already know we have enough space in the dest buffer for
all those null-terminated arguments.

 yousong



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-20 Thread Yousong Zhou
Hi, Pär

On 17 October 2014 03:50, Pär Karlsson  wrote:
> Hi, I fould a potential gotcha when playing with clang's code analysis tool.
>
> The concat_strings function silently stopped counting string lengths when
> given more than 5 arguments. clang warned about potential garbage values in
> the saved_lengths array, so I redid it with this approach.

After taking a closer look, I guess the old implementation is fine.
saved_lengths[] is used as a buffer for the lengths of the first 5
arguments and there is a bounds check against its length.  Maybe it's a
false positive from the clang tool?

Sorry for the noise...

Regards.

yousong

>
> All tests working ok with this patch.
>
> This is my first patch to this list, by the way. I'd be happy to help out
> more in the future.
>
> Best regards,
>
> /Pär Karlsson, Sweden
>
> 
>
> commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef
> Author: Pär Karlsson 
> Date:   Thu Oct 16 21:41:36 2014 +0200
>
> Updated ChangeLog
>
> diff --git a/src/ChangeLog b/src/ChangeLog
> index 1c4e2d5..1e39475 100644
> --- a/src/ChangeLog
> +++ b/src/ChangeLog
> @@ -1,3 +1,8 @@
> +2014-10-16  Pär Karlsson  
> +
> +   * utils.c (concat_strings): fixed arbitrary limit of 5 arguments to
> +   function
> +
>  2014-05-03  Tim Ruehsen  
>
> * retr.c (retrieve_url): fixed memory leak
>
> commit 1fa9ff274dcb6e5a2dbbbc7d3fe2f139059c47f1
> Author: Pär Karlsson 
> Date:   Wed Oct 15 00:00:31 2014 +0200
>
> Generalized concat_strings argument length
>
>   The concat_strings function seemed arbitrary to only accept a maximum
>   of 5 arguments (the others were silently ignored).
>
>   Also it had a potential garbage read of the values in the array.
>   Updated with xmalloc/xrealloc/free
>
> diff --git a/src/utils.c b/src/utils.c
> index 78c282e..93c9ddc 100644
> --- a/src/utils.c
> +++ b/src/utils.c
> @@ -356,7 +356,8 @@ char *
>  concat_strings (const char *str0, ...)
>  {
>va_list args;
> -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
> +  size_t psize = sizeof(int);
> +  int *saved_lengths = xmalloc (psize);
>char *ret, *p;
>
>const char *next_str;
> @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...)
>for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *))
>  {
>int len = strlen (next_str);
> -  if (argcount < countof (saved_lengths))
> -saved_lengths[argcount++] = len;
> +  saved_lengths[argcount++] = len;
> +  xrealloc(saved_lengths, psize * argcount);
>total_length += len;
>  }
>va_end (args);
> @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...)
>  }
>va_end (args);
>*p = '\0';
> -
> +  free(saved_lengths);
>return ret;
>  }
>  ^L



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Yousong Zhou
On 21 October 2014 16:17, Pär Karlsson  wrote:
> Yes, you are right, of course. Looking through the original implementation
> again, it seems water tight. clang probably complains about the
> uninitialized values above argcount in saved_lengths[], that are never
> reached.
>
> The precalculated strlen:s saved is likely only an optimization(?) attempt,
> I suppose.

Yes. Grepping through the code shows that currently there is no
invocation of concat_strings() having more than 5 arguments.

>
> Still, it seems wasteful to set up two complete loops with va_arg, and
> considering what this function actually does, I wonder if not s(n)printf
> should be used instead of this function? :-)

I think concat_strings() is more tight and readable than multiple
strlen() + malloc() + snprintf().

Regards.

   yousong



Re: [Bug-wget] broken progressbar in 1.16

2014-10-28 Thread Yousong Zhou
Hi

On 28 October 2014 19:38, Michael Shigorin  wrote:
> On Tue, Oct 28, 2014 at 03:16:00PM +0800, Darshit Shah wrote:
>> While we try our best to regression test every new feature,
>> some issues often seep through the cracks.
>
> I do understand that, hence  RC proposition (would be great to
> have distro maintainers and other interested grumblers join some
> announce list to have a chance to catch bugs before actual
> release happens).
>
>> a new option, --no-scroll has also been added for people who
>> would prefer the progress bar not have a scrolling filename.
>> You could set it in your wgetrc file to add the option globally.
>
> Looks like it's missing from git://git.savannah.gnu.org/wget.git,
> could you point me to the code location please?

I guess it's `--progress=bar:noscroll' of commit
4eeabffee6e5b348d36c4f3ba0579ed086226603

Regards

yousong

>
>> I'm currently traveling, but will take a look into what causes this issue
>> as soon as I can. Having a stable release of Wget is our main priority.
>
> TIA :-)
>
>> However, we were forced to make an urgent release this time
>> around due to a security issue coming up that had to be fixed
>> immediately.
>
> Yup, a CVE coming late in release cycle leaves few options...
> Good luck colleagues, still don't rely on it!
>
> --
>   WBR, Michael Shigorin / http://altlinux.org
>   -- http://opennet.ru / http://anna-news.info
>



Re: [Bug-wget] Need wget feature defending against evil ISP's HPPT 302 HIJACK

2014-12-25 Thread Yousong Zhou
On 24 December 2014 at 16:48, Dawei Tong  wrote:
> Hello wget developers: I live in China and have a China TieTong
> Telecommunications DSL connection.  This ISP's servers continuously send
> HTTP 302 redirects with junk/AD links that corrupt my downloaded files.
> I found this by analyzing the corrupted files: I compared 2 corrupted
> files from the same source and found that junk data had been inserted
> into the normal files.  The test file is a World of Tanks game
> installer; I downloaded it twice, and both copies are corrupted.

There is not much wget can do in this situation.  Redirected or not, as
long as it is a valid HTTP response, wget will fetch it.  Wget cannot
sense that the payload is AD junk and then tell the ISP to stop hampering
the stream.

> Here is my test result:
> cmp -b -l b1_WoT.0.9.4_cn_setup.944980-2.bin b2_WoT.0.9.4_cn_setup.944980-2.bin
>  456582373 261 M-1  110 H
>  456582374  44 $124 T

It's binary data.  I suspect your ISP may have inserted some HTML
elements into it.  Could it be that this diff was caused by the website
serving different content for your requests?

Regards


   yousong



Re: [Bug-wget] Need wget feature defending against evil ISP's HPPT 302 HIJACK

2014-12-25 Thread Yousong Zhou
On 26 December 2014 at 11:40, Yousong Zhou  wrote:
>> Here is my test result:cmp -b -l b1_WoT.0.9.4_cn_setup.944980-2.bin 
>> b2_WoT.0.9.4_cn_setup.944980-2.bin
>>  456582373 261 M-1  110 H
>>  456582374  44 $124 T
>
> It's binary data.  I suspect your ISP has inserted any HTML elements
> in it.  Was it possible that this diff was caused by the website
> serving different content for your requests?

Looks like the ISP was trying to do some caching work for you and an HTTP
header has slipped into the downloaded file.  I guess you may have to
call your ISP for support...

HTTP/1.1 302 Found
Location:
http://122.72.5.170:9090/data3/1/5/8e/8/ab687c973b490c7e8f4cd285e2188e51/static.flv.uuzuonline.com/20141125141149_57133flv

HTTP/1.1 302 Found
Location:
http://122.72.5.162:909/data5/3/f/85/4/aac8eb450df593348c5adc6da4b485f3/xmp.down.sandai.net/XMPCore_4.9.16.2258.cab
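
For anyone wanting to confirm such injection themselves, plaintext HTTP
headers inside an otherwise binary file can be located with grep (a
synthetic file stands in for the corrupted download here):

```shell
# Build a small file with a header injected at byte offset 5, then locate it;
# -a treats binary as text, -b prints the byte offset, -o prints the match:
printf 'AAAA HTTP/1.1 302 Found BBBB' > corrupted.bin
grep -a -b -o 'HTTP/1.1 302 Found' corrupted.bin
```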



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-29 Thread Yousong Zhou
Hi Tim

On 27 January 2015 at 17:48, Tim Ruehsen  wrote:
> Hi Yousong,
>
> this patch seems to be incomplete. Do you have a complete patch (e.g. +new
> option, + docs) or are you going to work on it ?
>

That patch was only intended as an ephemeral one to see if it could solve
the issue reported by Joe at the time.  But checking it again, I now
think the patch actually does the right thing.  The reason is that since
those --content-on-error pages are downloaded, links within those pages
should be converted as specified by --convert-links.  There is no need
for a new option for this and the current doc is just fine.  But I will
try adding a test case for this.

Regards

yousong


> Tim
>
> On Thursday 16 October 2014 15:24:48 Yousong Zhou wrote:
>> On 13 October 2014 10:25, Joe Hoyle  wrote:
>> > Hi All,
>> >
>> >
>> > I’m having issues using "--convert-links” in conjunction with
>> > "--content-on-error”. Though "--content-on-error” is forcing wget to
>> > download the pages, the links to that “errored” page is not update in
>> > other pages that link to it.
>> >
>> >
>> > This seems to be hinted at in the man page:
>> >
>> >
>> > "Because of this, local browsing works reliably: if a linked file was
>> > downloaded, the link will refer to its local name; if it was not
>> > downloaded, the link will refer to its full Internet address rather than
>> > presenting a broken link. The fact that the former links are converted to
>> > relative links ensures that you can move the downloaded hierarchy to
>> > another directory.”
>> >
>> >
>> > However, it would seem in the case of using —content-on-error it should
>> > ignore this rule and do all the link substation anyhow.
>> >
>> >
>> > If anyone knows if this *should* work then I’d be eager to hear it, or any
>> > other way I can get any 404 pages downloaded and also linked to in the
>> > wget mirror.
>> Currently, wget thought pages with 404 status code were not RETROKF
>> (retrieval was OK) though the 404 page itself was actually downloaded
>> successfully with `--content-on-error` option enabled.  This behaviour
>> is mostly acceptable I guess.  But you can try the attached the patch
>> for the moment.  The other option would be serving the 404 page by
>> manually setting it up with your web server.
>>
>> Regards.
>>
>>yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-30 Thread Yousong Zhou
On 29 January 2015 at 21:26, Tim Ruehsen  wrote:
> Hi Yousong,
>
>> > this patch seems to be incomplete. Do you have a complete patch (e.g. +new
>> > option, + docs) or are you going to work on it ?
>>
>> That patch was only intended as a ephemeral one to see if it can solve
>> the issue reported by Joe at the time.  But checking it again, I now
>> think the patch actually does the right thing.  The reason is that
>> since those --content-on-error pages are downloaded, then links within
>> those pages should be converted as specified by --convert-links.
>> There is no need for a new option for this and the current doc is just
>> fine.  But I will try adding an test cases for this.
>
> Ah sorry, my fault / misunderstanding.
> Since the patch changes Wget behaviour I would apply it after the next bugfix
> release.
> A test case would be perfect. Please consider creating a python test case (see
> directory testenv). We will move all test cases from perl to python by the
> time.
>

Well, there they are, with a few fixes for other issues I encountered
when preparing for this.

> Tim


0001-testenv-typo-and-style-fix.patch
Description: Binary data


0002-testenv-allow-color-printer-for-Darwin-platform.patch
Description: Binary data


0003-testenv-fix-http_server.py-with-Response-and-Authent.patch
Description: Binary data


0004-testenv-add-test-case-Test-convert-links-content-on-.patch
Description: Binary data


0005-Fix-content-on-error-option-handling.patch
Description: Binary data


Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-01-30 Thread Yousong Zhou
On 31 January 2015 at 07:53, Giuseppe Scrivano  wrote:
> Yousong Zhou  writes:
>
>> On 29 January 2015 at 21:26, Tim Ruehsen  wrote:
>>> Hi Yousong,
>>>
>>>> > this patch seems to be incomplete. Do you have a complete patch (e.g. 
>>>> > +new
>>>> > option, + docs) or are you going to work on it ?
>>>>
>>>> That patch was only intended as a ephemeral one to see if it can solve
>>>> the issue reported by Joe at the time.  But checking it again, I now
>>>> think the patch actually does the right thing.  The reason is that
>>>> since those --content-on-error pages are downloaded, then links within
>>>> those pages should be converted as specified by --convert-links.
>>>> There is no need for a new option for this and the current doc is just
>>>> fine.  But I will try adding an test cases for this.
>>>
>>> Ah sorry, my fault / misunderstanding.
>>> Since the patch changes Wget behaviour I would apply it after the next 
>>> bugfix
>>> release.
>>> A test case would be perfect. Please consider creating a python test case 
>>> (see
>>> directory testenv). We will move all test cases from perl to python by the
>>> time.
>>>
>>
>> Well, there they are, with a few fixes for other issues I encountered
>> when preparing for this.
>
> patches look fine to me, could you please ensure to write the commit
> message using the ChangeLog format?
>
> When it is just one line log, you can just use the format:
>
> * blah/file (function): Describe what changed.
>
> Otherwise use the format:
>
> one short line to describe the change
>
> * blah/file1 (foo): Describe what changed here.
> * blah/file2 (bar): And here.
>
> More about the ChangeLog style here:
>
> https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs
>

Changes made:

 - Follow the GNU ChangeLog style for commit messages.
 - Update the 2nd patch so that no color codes are printed when
   stdout is not a tty-like device.


yousong

> Thanks,
> Giuseppe


0001-testenv-typo-and-style-fix.patch
Description: Binary data


0002-testenv-improve-color-output-a-bit.patch
Description: Binary data


0003-testenv-fix-http_server.py-with-Response-and-Authent.patch
Description: Binary data


0004-testenv-add-test-case-Test-convert-links-content-on-.patch
Description: Binary data


0005-Fix-content-on-error-option-handling.patch
Description: Binary data


Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-03-08 Thread Yousong Zhou
Hi,

On 31 January 2015 at 10:13, Yousong Zhou  wrote:
>>>> Ah sorry, my fault / misunderstanding.
>>>> Since the patch changes Wget behaviour I would apply it after the next 
>>>> bugfix
>>>> release.
>>>> A test case would be perfect. Please consider creating a python test case 
>>>> (see
>>>> directory testenv). We will move all test cases from perl to python by the
>>>> time.
>>>>
>>>
>>> Well, there they are, with a few fixes for other issues I encountered
>>> when preparing for this.
>>
>> patches look fine to me, could you please ensure to write the commit
>> message using the ChangeLog format?
>>
>> When it is just one line log, you can just use the format:
>>
>> * blah/file (function): Describe what changed.
>>
>> Otherwise use the format:
>>
>> one short line to describe the change
>>
>> * blah/file1 (foo): Describe what changed here.
>> * blah/file2 (bar): And here.
>>
>> More about the ChangeLog style here:
>>
>> https://www.gnu.org/prep/standards/html_node/Style-of-Change-Logs.html#Style-of-Change-Logs
>>
>
> Changes are made to
>
>  - Follow GNU Changelog style commit message.
>  - Update the 2nd patch so that no color codes will be printed when
> the stdout is not a tty-like device.
>

I noticed that wget v1.16.2 has been released.  Would it be okay for this
series to be reviewed again and applied?  FYI, the attachments can be
found at link [1].

 [1] Re: [Bug-wget] Issue with --content-on-error and --convert-links,
https://lists.gnu.org/archive/html/bug-wget/2015-01/msg00073.html

Regards.

   yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-03-09 Thread Yousong Zhou
On 9 March 2015 at 18:50, Giuseppe Scrivano  wrote:
> Yousong Zhou  writes:
>
>> I noticed that wget v1.16.2 has been released.  Is it okay that this
>> series can be reviewed again and applied?  FYI, attachments can be
>> found at link [1]
>>
>>  [1] Re: [Bug-wget] Issue with --content-on-error and --convert-links,
>> https://lists.gnu.org/archive/html/bug-wget/2015-01/msg00073.html
>
> and now 1.16.3 :)  I've made some changes in the commit message and
> going to push your series in a bit.  Thanks to have worked on it!
>
> Please look at the commit messages style for future reference.

Thanks, they do look better now.

Regards.

   yousong



Re: [Bug-wget] Issue with --content-on-error and --convert-links

2015-04-04 Thread Yousong Zhou
Hi Alexander,


On Apr 5, 2015 3:57 AM, "Alexander Kurakin"  wrote:
>
>  Good day!
>
> So when the patch can be applied?

It has been in the git repository since shortly after the release of
v1.16.3 (it is not yet in a released version).

cheers,
   yousong



Re: [Bug-wget] Multi segment download

2015-09-08 Thread Yousong Zhou
On 29 August 2015 at 04:04, Abhilash Mhaisne  wrote:
> Hey all. I am new to this mailing list.
> As far as I've used wget, it downloads a specified file as a single segment.
> Can we modify this such that wget will download a file by dividing it into
> multiple
> segments and then combining all at reciever host? Just like some
> proprietary download
> managers do? If work on such a feature is going on, I'd like to be a part
> of it.
>

I guess this can be scripted with the --start-pos option of wget?

yousong



Re: [Bug-wget] Multi segment download

2015-09-09 Thread Yousong Zhou
On 9 September 2015 at 11:20, Hubert Tarasiuk  wrote:
> On Sat, Aug 29, 2015 at 12:50 AM, Darshit Shah  wrote:
>> Thanking You,
>> Darshit Shah
>> Sent from mobile device. Please excuse my brevity
>> On 29-Aug-2015 1:13 pm, "Tim Rühsen"  wrote:
>>>
>>> Hi,
>>>
>>> normally it makes much more sense when having several download mirrors and
>>> checksums for each chunk. The perfect technique for such is called
>> 'Metalink'
>>> (more on www.metalinker.org).
>>> Wget has it in branch 'master'. A GSOC project of Hubert Tarasiuk.
>>>
>> Sometimes the evil ISPs enforce a per connection bandwidth limit. In such a
>> case, multi segment downloads from a single server do make sense.
>>
>> Since metalink already has support for downloading a file over multiple
>> connections, it should not be too difficult to reuse the code for use
>> outside of metalink.
> The current Metalink impl in Wget will not download from multiple
> mirrors simultaneously since Wget itself is single-threaded.
> Adding optional (POSIX) threads support to Wget (especially for the
> Metalinks) could be perhaps worth discussion.
> For now the solution might be to start multiple Wget instances using
> the --start-pos option and somehow limit the length of download (I am
> not sure if Wget currently has an option to do that).
>

As said in the discussion when we were about to introduce the --start-pos
option, we can limit the length of a download with other utilities such
as dd.  This keeps the implementation complexity down.

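A sketch of the dd trimming (a locally generated file stands in for the
remote one; the streaming wget form is shown only as a comment, with an
illustrative URL variable):

```shell
# 2000-byte stand-in for the remote file:
head -c 2000 /dev/zero | tr '\0' 'a' > whole.bin
# Cut the 500-byte segment that starts at offset 1000:
dd if=whole.bin bs=1 skip=1000 count=500 of=part.bin 2> /dev/null

# Streaming equivalent against a real server (illustrative):
#   wget -q -O - --start-pos=1000 "$url" | head -c 500 > part.seg
```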
Well, I just made a proof-of-concept shell script for starting
multiple wget processes to download HTTP files [1].

[1] Concurrent WGET with --start-pos option.
https://gist.github.com/yousong/48266375afb68f9fb85f

Cheers,

yousong



[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
URL:
  

 Summary: Link failed caused by bad linker option -R
 Project: GNU Wget
Submitted by: yousong
Submitted on: Thu 09 Feb 2017 03:14:16 AM UTC
Category: Build/Install
Severity: 3 - Normal
Priority: 5 - Normal
  Status: None
 Privacy: Public
 Assigned to: None
 Originator Name: 
Originator Email: 
 Open/Closed: Open
 Discussion Lock: Any
 Release: 1.19
Operating System: GNU/Linux
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
  Regression: Yes
   Work Required: None
  Patch Included: None

___

Details:

With 1.19


gcc  -I/home/yousong/.usr/include   -I/home/yousong/.usr/include  
-DHAVE_LIBSSL -I/home/yousong/.usr/include   -DNDEBUG -isystem
/home/yousong/.usr/include  -L/home/yousong/.usr/lib
-Wl,-rpath,/home/yousong/.usr/lib -L/home/yousong/.usr/lib64
-Wl,-rpath,/home/yousong/.usr/lib64 -o wget connect.o convert.o cookies.o
ftp.o css_.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o hsts.o html-parse.o
html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o
res.o retr.o spider.o url.o warc.o xattr.o utils.o exits.o build_info.o  
version.o ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a -lrt
-L/home/yousong/.usr/lib -liconv -R/home/yousong/.usr/lib
-L/home/yousong/.usr/lib -lpcre   -luuid -L/home/yousong/.usr/lib -lssl
-lcrypto   -L/home/yousong/.usr/lib -lz
gcc: error: unrecognized option '-R'


With 1.18


gcc  -I/home/yousong/.usr/include   -I/home/yousong/.usr/include  
-DHAVE_LIBSSL -I/home/yousong/.usr/include   -DNDEBUG -isystem
/home/yousong/.usr/include  -L/home/yousong/.usr/lib
-Wl,-rpath,/home/yousong/.usr/lib -L/home/yousong/.usr/lib64
-Wl,-rpath,/home/yousong/.usr/lib64 -o wget connect.o convert.o cookies.o
ftp.o css_.o css-url.o ftp-basic.o ftp-ls.o hash.o host.o hsts.o html-parse.o
html-url.o http.o init.o log.o main.o netrc.o progress.o ptimer.o recur.o
res.o retr.o spider.o url.o warc.o utils.o exits.o build_info.o   version.o
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
/home/yousong/.usr/lib/libiconv.so -Wl,-rpath -Wl,/home/yousong/.usr/lib 
-L/home/yousong/.usr/lib -lpcre   -luuid -L/home/yousong/.usr/lib -lssl
-lcrypto   -L/home/yousong/.usr/lib -lz-lrt


The problem is that I build my own copies of wget and its dependencies and
install them into a non-standard location.  The build system of wget
incorrectly assumed that libtool was being used and passed the library
location with -R, which gcc does not understand, so the link failed.

The script I am using is available at
https://github.com/yousong/build-scripts/blob/master/build-wget.sh




___

Reply to this item at:

  

___
  Message sent via/by Savannah
  http://savannah.gnu.org/




[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
Follow-up Comment #1, bug #50260 (project wget):

Sorry, 1.17.1 was intended as the comparison, not 1.18.





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-08 Thread Yousong Zhou
Follow-up Comment #2, bug #50260 (project wget):

Patching configure.ac to use libtool via AC_PROG_LIBTOOL fixed the link
issue for me, though I also had to patch m4/po.m4 to work around the
gettext version check.

The patch hunks are available at
https://github.com/yousong/build-scripts/blob/683ff79878665fd1a422c4f9a8fddf2fd30be718/build-wget.sh.
 Feel free to pick them if they suit your needs ;)





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-09 Thread Yousong Zhou
Follow-up Comment #4, bug #50260 (project wget):

The error happened with wget-1.19 vanilla tarball.

The -R option probably came from m4/lib-link.m4 (search for "-R$found_dir"
in that file); according to the comment there, it is intended for libtool.

That macro in m4/lib-link.m4 is pulled in by the AM_ICONV macro in
configure.ac.

Below is the related snippet from the generated src/Makefile after a
configure run:


LTLIBICONV='-L/home/yousong/.usr/lib -liconv -R/home/yousong/.usr/lib'


Note that I have the wget dependency libs, including libiconv-1.14, in a
non-standard location, ~/.usr/.

This should be reproducible with the following steps:

 - clone the build-scripts git repo
 - remove the do_patch func in build-wget.sh
 - remove the PKG_AUTOCONF_FIXUP=1 line in build-wget.sh
 - run "make wget/install/test"; it will do the download, compile, and
   install into test_dir/ without polluting system-level settings





[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-09 Thread Yousong Zhou
Follow-up Comment #5, bug #50260 (project wget):

Digging further, this was probably caused by the commit that added
LTLIBICONV to LDADD for wget:


commit d4f97dc9afd149afe1f7b16a84eebb4bab1f044a
Author: Tim Rühsen 
Date:   Sat Jun 11 22:38:42 2016 +0200

Add libraries to LDADD for wget

* src/Makefile.am: Add $(GETADDRINFO_LIB) $(HOSTENT_LIB) $(INET_NTOP_LIB)
 $(LIBSOCKET) $(LIB_CLOCK_GETTIME) $(LIB_CRYPTO) $(LIB_SELECT)
 $(LTLIBICONV) $(LTLIBINTL) $(LTLIBTHREAD) $(SERVENT_LIB) to LDADD






[Bug-wget] [bug #50260] Link failed caused by bad linker option -R

2017-02-10 Thread Yousong Zhou
Follow-up Comment #7, bug #50260 (project wget):

Yes, using the LIB variables instead of the LTLIB ones worked.  And
indeed, according to the gnulib manual, the LTLIB variables are intended
for linking with libtool.

https://www.gnu.org/software/gnulib/manual/html_node/Searching-for-Libraries.html
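
A sketch of the change described above (a hypothetical hunk; the exact
variable list in src/Makefile.am may differ): switch LDADD from the
libtool-oriented LTLIB* variables to their plain LIB* counterparts, since
wget links through the gcc driver rather than libtool.

```diff
 # src/Makefile.am (sketch)
-LDADD = ... $(LTLIBICONV) $(LTLIBINTL) $(LTLIBTHREAD) ...
+LDADD = ... $(LIBICONV) $(LIBINTL) $(LIBTHREAD) ...
```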





Re: [Bug-wget] limit download size -- 201901233

2019-01-22 Thread Yousong Zhou
On Wed, 23 Jan 2019 at 12:06,  wrote:
>
> Hi,
>   acording to
> $wget --help
>   i should send reports and suggestions to this address, so i hope i'm doing 
> right here.
>
>the version of my distribution, given by the above command, is "GNU Wget 
> 1.18"
>
>and i don't seem to see an option to limit the retrieval to a certain 
> amount of data or a range.
>is it possible?
>
> thanks in advance and happy new year,
>
> Zui
> 201901233
>

Wget has an option "--start-pos OFFSET", where OFFSET is a zero-based
byte offset.  I use "head -c N" to limit the download size [1].

 [1] mget for downloading pieces of remote file in parallel,
https://github.com/yousong/dconf/blob/master/data/_usr.env/bin/mget#L43
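
A minimal sketch of that combination, with a local byte stream standing in
for `wget --start-pos=... -O - "$url"` so it runs without network access:

```shell
# head -c caps the piece at a fixed number of bytes; in mget the
# producer would be the wget pipeline instead of printf.
printf 'abcdefghij' | head -c 4 > piece
bytes=$(wc -c < piece)
```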

yousong



Re: [Bug-wget] limit download size -- 201901233

2019-01-23 Thread Yousong Zhou
On Thu, 24 Jan 2019 at 02:32, Tim Rühsen  wrote:
>
> On 23.01.19 03:47, c...@free.fr wrote:
> > Hi,
> >   acording to
> > $wget --help
> >   i should send reports and suggestions to this address, so i hope i'm 
> > doing right here.
> >
> >the version of my distribution, given by the above command, is "GNU Wget 
> > 1.18"
> >
> >and i don't seem to see an option to limit the retrieval to a certain 
> > amount of data or a range.
> >is it possible?
> >
> > thanks in advance and happy new year,
> >
> > Zui
> > 201901233
> >
>
> You could set the Range HTTP header - many servers support it.
>
> Like
>
> wget --header "Range: bytes=0-1" https://www.example.com/filename
>
> Regards, Tim
>

At least with wget 1.19.1, a 206 "Partial Content" response is not
accepted unless wget is made to believe it is continuing a previous
partial download.  Specifying the Range header manually is therefore not
a reliable option in this regard:

echo -n aaa > b
wget -c -O b --header "Range: bytes=3-1000" URL

yousong



Re: [Bug-wget] limit download size -- 201901233

2019-01-23 Thread Yousong Zhou
On Thu, 24 Jan 2019 at 12:11,  wrote:
>
> - Yousong Zhou  wrote :
> > On Thu, 24 Jan 2019 at 02:32, Tim Rühsen  wrote:
> > >
> > > On 23.01.19 03:47, c...@free.fr wrote:
> > > > Hi,
> > > >   acording to
> > > > $wget --help
> > > >   i should send reports and suggestions to this address, so i hope i'm 
> > > > doing right here.
> > > >
> > > >the version of my distribution, given by the above command, is "GNU 
> > > > Wget 1.18"
> > > >
> > > >and i don't seem to see an option to limit the retrieval to a 
> > > > certain amount of data or a range.
> > > >is it possible?
> > > >
> > > > thanks in advance and happy new year,
> > > >
> > > > Zui
> > > > 201901233
> > > >
> > >
> > > You could set the Range HTTP header - many servers support it.
> > >
> > > Like
> > >
> > > wget --header "Range: bytes=0-1" https://www.example.com/filename
> > >
> > > Regards, Tim
> > >
> >
> > At least for wget 1.19.1, it will ignore 206 "Partial Content", unless
> > we need to make it think it's continuing previous partial download.
> > Specifying Range header is not an reliable option in this regard
> >
> > echo -n aaa >b
> > wget -c -O b --header "Range: 3-1000" URL
> >
> > yousong
> Thank you both for your input...
>   and, as yousong wrote the Range header is not handled correctly by wget 
> (removing boring parts) :
> $ wget --header "Range: bytes=500-1000" https://free.fr
>   --2019-01-24 02:22:25--  https://server.dom/
>   Resolving server.dom (server.dom)... 
>   Connecting to server.dom (server.dom)...   connected.
>   HTTP request sent, awaiting response... 206 Partial Content
>   Retrying.
>
>   --2019-01-24 02:22:26--  (try: 2)  https://server.dom/
>   Connecting to server.dom (server.dom)...   connected.
>   HTTP request sent, awaiting response... 206 Partial Content
>   Retrying.
>
>   <...loop af retries...>
>
>   but curl is not exempt of problems as in (both cases bring the whole thing):
>   $ curl   https://ddg.gg > a
> % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>   Dload  Upload   Total   SpentLeft  
> Speed
>   100   178  100   1780 0310  0 --:--:-- --:--:-- 
> --:--:--   370
>   $ curl --header "Range: bytes=10-40"  https://ddg.gg > a
> % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>   Dload  Upload   Total   SpentLeft  
> Speed
>   100   178  100   1780 0314  0 --:--:-- --:--:-- 
> --:--:--   376
>

curl has --range specifically for this.

>   as for using "| head -c (end-start)" as you apply in mget, doesn't it
> actually generate more traffic than the expected (end-start) number of
> bytes?  (i mean, since the download goes systematically till the end, if
> i am correct)
>
> zui
> 201901244

When head quits, wget (which is writing to stdout) receives SIGPIPE and is
expected to quit as well.  Buffering in wget may cause some excess traffic
to be transferred on the wire, but I think the amount should be negligible.
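
A quick illustration of that pipe behavior, with the endless `yes`
standing in for `wget -O -`:

```shell
# Once head has read its quota and exits, the producer's next write
# fails with SIGPIPE, so `yes` terminates instead of streaming forever.
n=$(yes | head -c 1000 | wc -c)
```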

yousong