Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Pär Karlsson
Yes, you are right, of course. Looking through the original implementation
again, it seems water tight. clang probably complains about the
uninitialized values above argcount in saved_lengths[], that are never
reached.

The precalculated strlen:s saved is likely only an optimization(?) attempt,
I suppose.

Still, it seems wasteful to set up two complete loops with va_arg, and
considering what this function actually does, I wonder if not s(n)printf
should be used instead of this function? :-)

Best regards,

/Pär

2014-10-21 4:22 GMT+02:00 Yousong Zhou yszhou4t...@gmail.com:

 Hi, Pär

 On 17 October 2014 03:50, Pär Karlsson feino...@gmail.com wrote:
  Hi, I fould a potential gotcha when playing with clang's code analysis
 tool.
 
  The concat_strings function silently stopped counting string lengths when
  given more than 5 arguments. clang warned about potential garbage values
 in
  the saved_lengths array, so I redid it with this approach.

 After taking a closer look, I guess the old implementation is fine.
 saved_length[] is used as a buffer for lengths of the first 5
 arguments and there is a bound check with its length.  Maybe it's a
 false-positive from clang tool?

 Sorry for the noise...

 Regards.

 yousong

 
  All tests working ok with this patch.
 
  This is my first patch to this list, by the way. I'd be happy to help out
  more in the future.
 
  Best regards,
 
  /Pär Karlsson, Sweden
 
  
 
  commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef
  Author: Pär Karlsson feino...@gmail.com
  Date:   Thu Oct 16 21:41:36 2014 +0200
 
  Updated ChangeLog
 
  diff --git a/src/ChangeLog b/src/ChangeLog
  index 1c4e2d5..1e39475 100644
  --- a/src/ChangeLog
  +++ b/src/ChangeLog
  @@ -1,3 +1,8 @@
  +2014-10-16  Pär Karlsson  feino...@gmail.com
  +
  +   * utils.c (concat_strings): fixed arbitrary limit of 5 arguments
 to
  +   function
  +
   2014-05-03  Tim Ruehsen  tim.rueh...@gmx.de
 
  * retr.c (retrieve_url): fixed memory leak
 
  commit 1fa9ff274dcb6e5a2dbbbc7d3fe2f139059c47f1
  Author: Pär Karlsson feino...@gmail.com
  Date:   Wed Oct 15 00:00:31 2014 +0200
 
  Generalized concat_strings argument length
 
The concat_strings function seemed arbitrary to only accept a
 maximum
of 5 arguments (the others were silently ignored).
 
Also it had a potential garbage read of the values in the array.
Updated with xmalloc/xrealloc/free
 
  diff --git a/src/utils.c b/src/utils.c
  index 78c282e..93c9ddc 100644
  --- a/src/utils.c
  +++ b/src/utils.c
  @@ -356,7 +356,8 @@ char *
   concat_strings (const char *str0, ...)
   {
 va_list args;
  -  int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */
  +  size_t psize = sizeof(int);
  +  int *saved_lengths = xmalloc (psize);
 char *ret, *p;
 
 const char *next_str;
  @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...)
 for (next_str = str0; next_str != NULL; next_str = va_arg (args, char
 *))
   {
 int len = strlen (next_str);
  -  if (argcount  countof (saved_lengths))
  -saved_lengths[argcount++] = len;
  +  saved_lengths[argcount++] = len;
  +  xrealloc(saved_lengths, psize * argcount);
 total_length += len;
   }
 va_end (args);
  @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...)
   }
 va_end (args);
 *p = '\0';
  -
  +  free(saved_lengths);
 return ret;
   }
   ^L



Re: [Bug-wget] Send Content-Length with POST 0 length body

2014-10-21 Thread Matthew Atkinson
On 20/10/14 21:16, Tim Rühsen wrote:
 
 Since Wget does not explicitly support PUT, we may just care for the POST 
 request. Since servers may reject a POST without Content-Length, we are 
 better 
 off with supplying one. Since PUT (and also PATCH) request 'anticipate' a 
 body, 
 we could also care for these.
 
 Matthew, could you also check for 'put and 'patch' request and send an 
 amended 
 version of the patch. FYI, HTTP PATCH request is rfc5789.

Attached
From 0bc85a619677dc8a0313a5596497d42190fb3452 Mon Sep 17 00:00:00 2001
From: Matthew Atkinson mutley...@ntlworld.com
Date: Tue, 21 Oct 2014 09:28:45 +0100
Subject: [PATCH] Always send Content-Length with POST, PUT, PATCH

---
 src/ChangeLog | 6 ++
 src/http.c| 4 
 2 files changed, 10 insertions(+)

diff --git a/src/ChangeLog b/src/ChangeLog
index d5aeca0..0f92db5 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,9 @@
+2014-10-21  Matthew Atkinson  mutley...@ntlworld.com
+
+	* http.c (gethttp): Always send Content-Length header when method is POST,
+	PUT, or PATCH even with no post body, as some servers will reject the
+	request otherwise
+
 2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de
 
 	* url.c (url_parse): little code cleanup
diff --git a/src/http.c b/src/http.c
index 4b99c17..2469852 100644
--- a/src/http.c
+++ b/src/http.c
@@ -1875,6 +1875,10 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   xstrdup (number_to_static_string (body_data_size)),
   rel_value);
 }
+  else if (strcasecmp (opt.method, post) == 0
+   || strcasecmp (opt.method, put) == 0
+   || strcasecmp (opt.method, patch) == 0)
+request_set_header (req, Content-Length, 0, rel_none);
 }
 
  retry_with_auth:
-- 
1.9.1



Re: [Bug-wget] Send Content-Length with POST 0 length body

2014-10-21 Thread Giuseppe Scrivano
Matthew Atkinson mutley...@ntlworld.com writes:

 On 20/10/14 21:16, Tim Rühsen wrote:
  
 Since Wget does not explicitly support PUT, we may just care for the POST 
 request. Since servers may reject a POST without Content-Length, we are 
 better 
 off with supplying one. Since PUT (and also PATCH) request 'anticipate' a 
 body, 
 we could also care for these.
 
 Matthew, could you also check for 'put and 'patch' request and send an 
 amended 
 version of the patch. FYI, HTTP PATCH request is rfc5789.

 Attached

thanks pushed now!  Congratulations on your first contribution to wget.

Regards,
Giuseppe



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Yousong Zhou
On 21 October 2014 16:17, Pär Karlsson feino...@gmail.com wrote:
 Yes, you are right, of course. Looking through the original implementation
 again, it seems water tight. clang probably complains about the
 uninitialized values above argcount in saved_lengths[], that are never
 reached.

 The precalculated strlen:s saved is likely only an optimization(?) attempt,
 I suppose.

Yes. Grepping through the code shows that currently there is no
invocation of concat_strings() having more than 5 arguments.


 Still, it seems wasteful to set up two complete loops with va_arg, and
 considering what this function actually does, I wonder if not s(n)printf
 should be used instead of this function? :-)

I think concat_strings() is more tight and readable than multiple
strlen() + malloc() + snprintf().

Regards.

   yousong



Re: [Bug-wget] [Win32 Patch] console-close events

2014-10-21 Thread Gisle Vanem

Ángel González keis...@gmail.com wrote:


I would expect someone running a console application to have a console open. I 
understand the rationale when it's a beginner
learning to program and is creating a console application, but don't really see 
a usecase for wget. How are you running wget that
you get an autoclosing console?


Easily. When e.g. Windows needs to restart (because of a WinUpdate etc.). It
sends a WM_QUERYENDSESSION to all (?) top-level windows. The
Console handler translates that to a CTRL_SHUTDOWN_EVENT for the
program in that console. Details here:
 
http://blogs.msdn.com/b/ntdebugging/archive/2007/06/09/how-windows-shuts-down.aspx

Darit, about the git format-patch. I don't know how.

The src/Changelog entry could simply be:

2014-10-21  Gisle Vanem gva...@yahoo.no

* mswindows.c (ws_handler): Added handling of
  CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT
  to cleanup before Wget exits. Added function ws_event_name() to retrieve
  the event name.

The diff for mswindows.c is as in my original message.

--gv




Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Micah Cowan
On Mon, Oct 20, 2014 at 7:02 PM, Yousong Zhou yszhou4t...@gmail.com wrote:
 I am not sure here.  Do we always assume sizeof(char) to be 1 for
 platforms supported by wget?

FWIW, sizeof(char) is always 1 by definition; the C standard
guarantees it. Even on systems with no addressable values smaller than
16 bits, because on such systems, C defines a byte to be 16 bits
(recall that, originally at least, a byte isn't necessarily an octet,
and that's the meaning the C standards use).

If a platform doesn't have sizeof(char) == 1, it's not the C language. :)

-mjc



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Pär Karlsson
Thanks!

It didn't even occur to me to check this out. My only excuse is gratuitous
consistency and lack of pure C experience; a malloc() without a
corresponding sizeof() seemed a little arbitrary to me, but it makes sense
now :-)

/Pär

2014-10-21 17:46 GMT+02:00 Micah Cowan mi...@addictivecode.org:

 On Mon, Oct 20, 2014 at 7:02 PM, Yousong Zhou yszhou4t...@gmail.com
 wrote:
  I am not sure here.  Do we always assume sizeof(char) to be 1 for
  platforms supported by wget?

 FWIW, sizeof(char) is always 1 by definition; the C standard
 guarantees it. Even on systems with no addressable values smaller than
 16 bits, because on such systems, C defines a byte to be 16 bits
 (recall that, originally at least, a byte isn't necessarily an octet,
 and that's the meaning the C standards use).

 If a platform doesn't have sizeof(char) == 1, it's not the C language. :)

 -mjc



Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings

2014-10-21 Thread Micah Cowan
On Tue, Oct 21, 2014 at 8:55 AM, Pär Karlsson feino...@gmail.com wrote:
 Thanks!

 It didn't even occur to me to check this out. My only excuse is gratuitous
 consistency and lack of pure C experience; a malloc() without a
 corresponding sizeof() seemed a little arbitrary to me, but it makes sense
 now :-)

Well, I'm always happy to see someone being more careful than
necessary, than the other direction. ;)

-mjc



[Bug-wget] [PATCH] Add option for absolute extension-adjusted url conversion

2014-10-21 Thread Gabriel L. Somlo
Add option --convert-transparent-proxy which converts links similarly to
-k a.k.a. --convert-links by applying target file name changes made by
the --adjust-extension option, but instead of generating relative links
for offline, file:// based browsing, generate absolute adjusted URLs
which will work if the content is to be hosted from a transparent proxy
server, e.g. in conjunction with --span-hosts.

Signed-off-by: Gabriel Somlo so...@cmu.edu
---

I'm not sure I like --convert-transparent-proxy for the option name, but
don't strongly dislike it either, so any better ideas much appreciated.

Also, if there's an easy way to compare the current document host name and
protocol against the link target URL's, we could generate relative links
within the same host and only use absolute links when going across hosts,
but I couldn't immediately identify a simple and clean way of doing that.

Any other suggestions for improvement much appreciated.

Thanks much,
   Gabriel

 doc/wget.texi | 18 
 src/convert.c | 91 +++
 src/convert.h |  3 +-
 src/init.c|  1 +
 src/main.c| 26 +++--
 src/options.h |  2 ++
 6 files changed, 120 insertions(+), 21 deletions(-)

diff --git a/doc/wget.texi b/doc/wget.texi
index 1e1dd36..feb9abd 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -1979,6 +1979,24 @@ Note that only at the end of the download can Wget know 
which links have
 been downloaded.  Because of that, the work done by @samp{-k} will be
 performed at the end of all the downloads.
 
+@cindex conversion of links
+@cindex link conversion
+@item --convert-transparent-proxy
+@itemx --convert-transparent-proxy
+Similar to @samp{--convert-links}, but convert downloaded links to
+absolute URLs adusted to match any changes made by @samp{--adjust-extension}.
+This behavior is useful when the downloaded content is intended to
+be served from a transparent proxy instead of being viewed locally.
+Behavior w.r.t.  links to files that have not been downloaded is
+identical to @samp{--convert-links}.
+
+Example: if the document @file{http://@var{hostname-1}/foo/doc.html} links
+to a script @file{http://@var{hostname-2}/bar/abc.cgi?xyz_arg}, which gets
+downloaded as @file{/download/hostname-2/bar/abc.cgi?xyz_arg.css}, then the
+link in @file{/download/hostname-1/foo/doc.html} will be modified to point
+to @file{http://@var{hostname-2}/bar/abc.cgi%3Fxyz_arg.css}, which can then
+be served from the transparent proxy when a client loads @file{doc.html}.
+
 @cindex backing up converted files
 @item -K
 @itemx --backup-converted
diff --git a/src/convert.c b/src/convert.c
index c147686..df708f5 100644
--- a/src/convert.c
+++ b/src/convert.c
@@ -61,6 +61,7 @@ static void convert_links (const char *, struct urlpos *);
 static void
 convert_links_in_hashtable (struct hash_table *downloaded_set,
 int is_css,
+bool is_tpu,
 int *file_count)
 {
   int i;
@@ -132,12 +133,20 @@ convert_links_in_hashtable (struct hash_table 
*downloaded_set,
   /* Decide on the conversion type.  */
   if (local_name)
 {
-  /* We've downloaded this URL.  Convert it to relative
- form.  We do this even if the URL already is in
- relative form, because our directory structure may
- not be identical to that on the server (think `-nd',
+  /* We've downloaded this URL.  Convert it to reflect
+ any extension adjustments.  Conversion may be to either
+ relative or absolute form, depending on whether local
+ viewing or transparent proxying are required.
+ We do this even if the URL already is in relative
+ form, because our directory structure may not be
+ identical to that on the server (think `-nd',
  `--cut-dirs', etc.)  */
-  cur_url-convert = CO_CONVERT_TO_RELATIVE;
+  cur_url-convert = is_tpu ? CO_CONVERT_TO_TPU_ABSOLUTE
+: CO_CONVERT_TO_RELATIVE;
+  /* FIXME: if we could distinguish between links that stay
+   *within the same host and links going to a different
+   *host, we could limit transparent-proxy/absolute link
+   *conversion to cross-host links only */
   cur_url-local_name = xstrdup (local_name);
   DEBUGP ((will convert url %s to local %s\n, u-url, 
local_name));
 }
@@ -173,24 +182,25 @@ convert_links_in_hashtable (struct hash_table 
*downloaded_set,
direction to convert to.
 
The direction means that the URLs to the files that have been
-   downloaded get converted to the relative URL which will point to
-   that file.  And the other URLs get converted to the remote URL on
-   the server.
+   downloaded 

Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)

2014-10-21 Thread Ángel González

On 21/10/14 09:50, Giuseppe Scrivano wrote:

Ángel,
thanks for the debugging effort!  Would you mind to send a separate
patch that doesn't use strncpy (so to make make dist-check happy),
probably we can just use memcpy here.

Regards,
Giuseppe

The revert to the old behavior was only intended for checking it was the
right place. We have xstrndup() to do just what we want. Please test.

Regards



From 66ae9d897a4a2b7458a5d61230dc316befddd731 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81ngel=20Gonz=C3=A1lez?= keis...@gmail.com
Date: Wed, 22 Oct 2014 01:10:21 +0200
Subject: [PATCH] css-url.c (get_uri_string): Fix regression from 8e6de1fb5

Solves the issue discovered by Gabriel Somlo and reported in the ml thread
Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)
---
 src/ChangeLog | 4 
 src/css-url.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/ChangeLog b/src/ChangeLog
index d5aeca0..6004c34 100644
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,7 @@
+2013-10-22  Ángel González keis...@gmail.com
+
+	* css-url.c (get_uri_string): Fix regression from 8e6de1fb5.
+
 2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de
 
 	* url.c (url_parse): little code cleanup
diff --git a/src/css-url.c b/src/css-url.c
index c605798..2ac1cef 100644
--- a/src/css-url.c
+++ b/src/css-url.c
@@ -45,6 +45,7 @@ as that of the covered work.  */
 #include stdlib.h
 #include ctype.h
 #include errno.h
+#include xstrndup.h
 
 #include wget.h
 #include utils.h
@@ -97,7 +98,7 @@ get_uri_string (const char *at, int *pos, int *length)
   *length -= 2;
 }
 
-  return xstrdup (at + *pos);
+  return xstrndup (uri, at + *pos, *length);;
 }
 
 void
-- 
2.1.2



Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)

2014-10-21 Thread Gabriel Somlo
On Wed, Oct 22, 2014 at 01:20:17AM +0200, ??ngel Gonz??lez wrote:
 On 21/10/14 09:50, Giuseppe Scrivano wrote:
 ??ngel,
 thanks for the debugging effort!  Would you mind to send a separate
 patch that doesn't use strncpy (so to make make dist-check happy),
 probably we can just use memcpy here.
 
 Regards,
 Giuseppe
 The revert to the old behavior was only intended for checking it was the
 right place. We have xstrndup() to do just what we want. Please test.

Build fails with:

make[3]: Entering directory `/home/somlo/PROJECTS/CERT/WGETHACK/wget/src'
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\/usr/local/etc/wgetrc\ 
-DLOCALEDIR=\/usr/local/share/locale\ -I.  -I../lib -I../lib   -g -O2 -MT 
css-url.o -MD -MP -MF .deps/css-url.Tpo -c -o css-url.o css-url.c
css-url.c:48:22: fatal error: xstrndup.h: No such file or directory
 #include xstrndup.h
  ^
compilation terminated.
make[3]: *** [css-url.o] Error 1


There seems to be no -I../gnulib/lib on the gcc command line, which
is probably why it can't find it.

Thanks,
--Gabriel

 From 66ae9d897a4a2b7458a5d61230dc316befddd731 Mon Sep 17 00:00:00 2001
 From: =?UTF-8?q?=C3=81ngel=20Gonz=C3=A1lez?= keis...@gmail.com
 Date: Wed, 22 Oct 2014 01:10:21 +0200
 Subject: [PATCH] css-url.c (get_uri_string): Fix regression from 8e6de1fb5
 
 Solves the issue discovered by Gabriel Somlo and reported in the ml thread
 Regression in git master branch (commit 
 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)
 ---
  src/ChangeLog | 4 
  src/css-url.c | 3 ++-
  2 files changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/src/ChangeLog b/src/ChangeLog
 index d5aeca0..6004c34 100644
 --- a/src/ChangeLog
 +++ b/src/ChangeLog
 @@ -1,3 +1,7 @@
 +2013-10-22  ??ngel Gonz??lez keis...@gmail.com
 +
 + * css-url.c (get_uri_string): Fix regression from 8e6de1fb5.
 +
  2014-10-16  Tim Ruehsen  tim.rueh...@gmx.de
  
   * url.c (url_parse): little code cleanup
 diff --git a/src/css-url.c b/src/css-url.c
 index c605798..2ac1cef 100644
 --- a/src/css-url.c
 +++ b/src/css-url.c
 @@ -45,6 +45,7 @@ as that of the covered work.  */
  #include stdlib.h
  #include ctype.h
  #include errno.h
 +#include xstrndup.h
  
  #include wget.h
  #include utils.h
 @@ -97,7 +98,7 @@ get_uri_string (const char *at, int *pos, int *length)
*length -= 2;
  }
  
 -  return xstrdup (at + *pos);
 +  return xstrndup (uri, at + *pos, *length);;
  }
  
  void
 -- 
 2.1.2