Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings
Yes, you are right, of course. Looking through the original implementation again, it seems water tight. clang probably complains about the uninitialized values above argcount in saved_lengths[], that are never reached. The precalculated strlen:s saved is likely only an optimization(?) attempt, I suppose. Still, it seems wasteful to set up two complete loops with va_arg, and considering what this function actually does, I wonder if not s(n)printf should be used instead of this function? :-) Best regards, /Pär 2014-10-21 4:22 GMT+02:00 Yousong Zhou yszhou4t...@gmail.com: Hi, Pär On 17 October 2014 03:50, Pär Karlsson feino...@gmail.com wrote: Hi, I fould a potential gotcha when playing with clang's code analysis tool. The concat_strings function silently stopped counting string lengths when given more than 5 arguments. clang warned about potential garbage values in the saved_lengths array, so I redid it with this approach. After taking a closer look, I guess the old implementation is fine. saved_length[] is used as a buffer for lengths of the first 5 arguments and there is a bound check with its length. Maybe it's a false-positive from clang tool? Sorry for the noise... Regards. yousong All tests working ok with this patch. This is my first patch to this list, by the way. I'd be happy to help out more in the future. Best regards, /Pär Karlsson, Sweden commit 2d855670e0e1fbe578506b376cdd40b0e465d3ef Author: Pär Karlsson feino...@gmail.com Date: Thu Oct 16 21:41:36 2014 +0200 Updated ChangeLog diff --git a/src/ChangeLog b/src/ChangeLog index 1c4e2d5..1e39475 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,8 @@ +2014-10-16 Pär Karlsson feino...@gmail.com + + * utils.c (concat_strings): fixed arbitrary limit of 5 arguments to + function + 2014-05-03 Tim Ruehsen tim.rueh...@gmx.de * retr.c (retrieve_url): fixed memory leak commit 1fa9ff274dcb6e5a2dbbbc7d3fe2f139059c47f1 Author: Pär Karlsson feino...@gmail.com Date: Wed Oct 15 00:00:31 2014 +0200 Generalized concat_strings argument length The concat_strings function seemed arbitrary to only accept a maximum of 5 arguments (the others were silently ignored). Also it had a potential garbage read of the values in the array. Updated with xmalloc/xrealloc/free diff --git a/src/utils.c b/src/utils.c index 78c282e..93c9ddc 100644 --- a/src/utils.c +++ b/src/utils.c @@ -356,7 +356,8 @@ char * concat_strings (const char *str0, ...) { va_list args; - int saved_lengths[5]; /* inspired by Apache's apr_pstrcat */ + size_t psize = sizeof(int); + int *saved_lengths = xmalloc (psize); char *ret, *p; const char *next_str; @@ -370,8 +371,8 @@ concat_strings (const char *str0, ...) for (next_str = str0; next_str != NULL; next_str = va_arg (args, char *)) { int len = strlen (next_str); - if (argcount countof (saved_lengths)) -saved_lengths[argcount++] = len; + saved_lengths[argcount++] = len; + xrealloc(saved_lengths, psize * argcount); total_length += len; } va_end (args); @@ -393,7 +394,7 @@ concat_strings (const char *str0, ...) } va_end (args); *p = '\0'; - + free(saved_lengths); return ret; } ^L
Re: [Bug-wget] Send Content-Length with POST 0 length body
On 20/10/14 21:16, Tim Rühsen wrote: Since Wget does not explicitly support PUT, we may just care for the POST request. Since servers may reject a POST without Content-Length, we are better off with supplying one. Since PUT (and also PATCH) request 'anticipate' a body, we could also care for these. Matthew, could you also check for 'put and 'patch' request and send an amended version of the patch. FYI, HTTP PATCH request is rfc5789. Attached From 0bc85a619677dc8a0313a5596497d42190fb3452 Mon Sep 17 00:00:00 2001 From: Matthew Atkinson mutley...@ntlworld.com Date: Tue, 21 Oct 2014 09:28:45 +0100 Subject: [PATCH] Always send Content-Length with POST, PUT, PATCH --- src/ChangeLog | 6 ++ src/http.c| 4 2 files changed, 10 insertions(+) diff --git a/src/ChangeLog b/src/ChangeLog index d5aeca0..0f92db5 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,9 @@ +2014-10-21 Matthew Atkinson mutley...@ntlworld.com + + * http.c (gethttp): Always send Content-Length header when method is POST, + PUT, or PATCH even with no post body, as some servers will reject the + request otherwise + 2014-10-16 Tim Ruehsen tim.rueh...@gmx.de * url.c (url_parse): little code cleanup diff --git a/src/http.c b/src/http.c index 4b99c17..2469852 100644 --- a/src/http.c +++ b/src/http.c @@ -1875,6 +1875,10 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, xstrdup (number_to_static_string (body_data_size)), rel_value); } + else if (strcasecmp (opt.method, post) == 0 + || strcasecmp (opt.method, put) == 0 + || strcasecmp (opt.method, patch) == 0) +request_set_header (req, Content-Length, 0, rel_none); } retry_with_auth: -- 1.9.1
Re: [Bug-wget] Send Content-Length with POST 0 length body
Matthew Atkinson mutley...@ntlworld.com writes: On 20/10/14 21:16, Tim Rühsen wrote: Since Wget does not explicitly support PUT, we may just care for the POST request. Since servers may reject a POST without Content-Length, we are better off with supplying one. Since PUT (and also PATCH) request 'anticipate' a body, we could also care for these. Matthew, could you also check for 'put and 'patch' request and send an amended version of the patch. FYI, HTTP PATCH request is rfc5789. Attached thanks pushed now! Congratulations on your first contribution to wget. Regards, Giuseppe
Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings
On 21 October 2014 16:17, Pär Karlsson feino...@gmail.com wrote: Yes, you are right, of course. Looking through the original implementation again, it seems water tight. clang probably complains about the uninitialized values above argcount in saved_lengths[], that are never reached. The precalculated strlen:s saved is likely only an optimization(?) attempt, I suppose. Yes. Grepping through the code shows that currently there is no invocation of concat_strings() having more than 5 arguments. Still, it seems wasteful to set up two complete loops with va_arg, and considering what this function actually does, I wonder if not s(n)printf should be used instead of this function? :-) I think concat_strings() is more tight and readable than multiple strlen() + malloc() + snprintf(). Regards. yousong
Re: [Bug-wget] [Win32 Patch] console-close events
Ángel González keis...@gmail.com wrote: I would expect someone running a console application to have a console open. I understand the rationale when it's a beginner learning to program and is creating a console application, but don't really see a usecase for wget. How are you running wget that you get an autoclosing console? Easily. When e.g. Windows needs to restart (because of a WinUpdate etc.). It sends a WM_QUERYENDSESSION to all (?) top-level windows. The Console handler translates that to a CTRL_SHUTDOWN_EVENT for the program in that console. Details here: http://blogs.msdn.com/b/ntdebugging/archive/2007/06/09/how-windows-shuts-down.aspx Darit, about the git format-patch. I don't know how. The src/Changelog entry could simply be: 2014-10-21 Gisle Vanem gva...@yahoo.no * mswindows.c (ws_handler): Added handling of CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT to cleanup before Wget exits. Added function ws_event_name() to retrieve the event name. The diff for mswindows.c is as in my original message. --gv
Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings
On Mon, Oct 20, 2014 at 7:02 PM, Yousong Zhou yszhou4t...@gmail.com wrote: I am not sure here. Do we always assume sizeof(char) to be 1 for platforms supported by wget? FWIW, sizeof(char) is always 1 by definition; the C standard guarantees it. Even on systems with no addressable values smaller than 16 bits, because on such systems, C defines a byte to be 16 bits (recall that, originally at least, a byte isn't necessarily an octet, and that's the meaning the C standards use). If a platform doesn't have sizeof(char) == 1, it's not the C language. :) -mjc
Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings
Thanks! It didn't even occur to me to check this out. My only excuse is gratuitous consistency and lack of pure C experience; a malloc() without a corresponding sizeof() seemed a little arbitrary to me, but it makes sense now :-) /Pär 2014-10-21 17:46 GMT+02:00 Micah Cowan mi...@addictivecode.org: On Mon, Oct 20, 2014 at 7:02 PM, Yousong Zhou yszhou4t...@gmail.com wrote: I am not sure here. Do we always assume sizeof(char) to be 1 for platforms supported by wget? FWIW, sizeof(char) is always 1 by definition; the C standard guarantees it. Even on systems with no addressable values smaller than 16 bits, because on such systems, C defines a byte to be 16 bits (recall that, originally at least, a byte isn't necessarily an octet, and that's the meaning the C standards use). If a platform doesn't have sizeof(char) == 1, it's not the C language. :) -mjc
Re: [Bug-wget] [PATCH] Small fix for limited number of strings (and potential garbage value) in arguments to concat_strings
On Tue, Oct 21, 2014 at 8:55 AM, Pär Karlsson feino...@gmail.com wrote: Thanks! It didn't even occur to me to check this out. My only excuse is gratuitous consistency and lack of pure C experience; a malloc() without a corresponding sizeof() seemed a little arbitrary to me, but it makes sense now :-) Well, I'm always happy to see someone being more careful than necessary, than the other direction. ;) -mjc
[Bug-wget] [PATCH] Add option for absolute extension-adjusted url conversion
Add option --convert-transparent-proxy which converts links similarly to -k a.k.a. --convert-links by applying target file name changes made by the --adjust-extension option, but instead of generating relative links for offline, file:// based browsing, generate absolute adjusted URLs which will work if the content is to be hosted from a transparent proxy server, e.g. in conjunction with --span-hosts. Signed-off-by: Gabriel Somlo so...@cmu.edu --- I'm not sure I like --convert-transparent-proxy for the option name, but don't strongly dislike it either, so any better ideas much appreciated. Also, if there's an easy way to compare the current document host name and protocol against the link target URL's, we could generate relative links within the same host and only use absolute links when going across hosts, but I couldn't immediately identify a simple and clean way of doing that. Any other suggestions for improvement much appreciated. Thanks much, Gabriel doc/wget.texi | 18 src/convert.c | 91 +++ src/convert.h | 3 +- src/init.c| 1 + src/main.c| 26 +++-- src/options.h | 2 ++ 6 files changed, 120 insertions(+), 21 deletions(-) diff --git a/doc/wget.texi b/doc/wget.texi index 1e1dd36..feb9abd 100644 --- a/doc/wget.texi +++ b/doc/wget.texi @@ -1979,6 +1979,24 @@ Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by @samp{-k} will be performed at the end of all the downloads. +@cindex conversion of links +@cindex link conversion +@item --convert-transparent-proxy +@itemx --convert-transparent-proxy +Similar to @samp{--convert-links}, but convert downloaded links to +absolute URLs adusted to match any changes made by @samp{--adjust-extension}. +This behavior is useful when the downloaded content is intended to +be served from a transparent proxy instead of being viewed locally. +Behavior w.r.t. links to files that have not been downloaded is +identical to @samp{--convert-links}. + +Example: if the document @file{http://@var{hostname-1}/foo/doc.html} links +to a script @file{http://@var{hostname-2}/bar/abc.cgi?xyz_arg}, which gets +downloaded as @file{/download/hostname-2/bar/abc.cgi?xyz_arg.css}, then the +link in @file{/download/hostname-1/foo/doc.html} will be modified to point +to @file{http://@var{hostname-2}/bar/abc.cgi%3Fxyz_arg.css}, which can then +be served from the transparent proxy when a client loads @file{doc.html}. + @cindex backing up converted files @item -K @itemx --backup-converted diff --git a/src/convert.c b/src/convert.c index c147686..df708f5 100644 --- a/src/convert.c +++ b/src/convert.c @@ -61,6 +61,7 @@ static void convert_links (const char *, struct urlpos *); static void convert_links_in_hashtable (struct hash_table *downloaded_set, int is_css, +bool is_tpu, int *file_count) { int i; @@ -132,12 +133,20 @@ convert_links_in_hashtable (struct hash_table *downloaded_set, /* Decide on the conversion type. */ if (local_name) { - /* We've downloaded this URL. Convert it to relative - form. We do this even if the URL already is in - relative form, because our directory structure may - not be identical to that on the server (think `-nd', + /* We've downloaded this URL. Convert it to reflect + any extension adjustments. Conversion may be to either + relative or absolute form, depending on whether local + viewing or transparent proxying are required. + We do this even if the URL already is in relative + form, because our directory structure may not be + identical to that on the server (think `-nd', `--cut-dirs', etc.) */ - cur_url-convert = CO_CONVERT_TO_RELATIVE; + cur_url-convert = is_tpu ? CO_CONVERT_TO_TPU_ABSOLUTE +: CO_CONVERT_TO_RELATIVE; + /* FIXME: if we could distinguish between links that stay + *within the same host and links going to a different + *host, we could limit transparent-proxy/absolute link + *conversion to cross-host links only */ cur_url-local_name = xstrdup (local_name); DEBUGP ((will convert url %s to local %s\n, u-url, local_name)); } @@ -173,24 +182,25 @@ convert_links_in_hashtable (struct hash_table *downloaded_set, direction to convert to. The direction means that the URLs to the files that have been - downloaded get converted to the relative URL which will point to - that file. And the other URLs get converted to the remote URL on - the server. + downloaded
Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)
On 21/10/14 09:50, Giuseppe Scrivano wrote: Ángel, thanks for the debugging effort! Would you mind to send a separate patch that doesn't use strncpy (so to make make dist-check happy), probably we can just use memcpy here. Regards, Giuseppe The revert to the old behavior was only intended for checking it was the right place. We have xstrndup() to do just what we want. Please test. Regards From 66ae9d897a4a2b7458a5d61230dc316befddd731 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81ngel=20Gonz=C3=A1lez?= keis...@gmail.com Date: Wed, 22 Oct 2014 01:10:21 +0200 Subject: [PATCH] css-url.c (get_uri_string): Fix regression from 8e6de1fb5 Solves the issue discovered by Gabriel Somlo and reported in the ml thread Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2) --- src/ChangeLog | 4 src/css-url.c | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/src/ChangeLog b/src/ChangeLog index d5aeca0..6004c34 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,7 @@ +2013-10-22 Ãngel González keis...@gmail.com + + * css-url.c (get_uri_string): Fix regression from 8e6de1fb5. + 2014-10-16 Tim Ruehsen tim.rueh...@gmx.de * url.c (url_parse): little code cleanup diff --git a/src/css-url.c b/src/css-url.c index c605798..2ac1cef 100644 --- a/src/css-url.c +++ b/src/css-url.c @@ -45,6 +45,7 @@ as that of the covered work. */ #include stdlib.h #include ctype.h #include errno.h +#include xstrndup.h #include wget.h #include utils.h @@ -97,7 +98,7 @@ get_uri_string (const char *at, int *pos, int *length) *length -= 2; } - return xstrdup (at + *pos); + return xstrndup (uri, at + *pos, *length);; } void -- 2.1.2
Re: [Bug-wget] Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2)
On Wed, Oct 22, 2014 at 01:20:17AM +0200, ??ngel Gonz??lez wrote: On 21/10/14 09:50, Giuseppe Scrivano wrote: ??ngel, thanks for the debugging effort! Would you mind to send a separate patch that doesn't use strncpy (so to make make dist-check happy), probably we can just use memcpy here. Regards, Giuseppe The revert to the old behavior was only intended for checking it was the right place. We have xstrndup() to do just what we want. Please test. Build fails with: make[3]: Entering directory `/home/somlo/PROJECTS/CERT/WGETHACK/wget/src' gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC=\/usr/local/etc/wgetrc\ -DLOCALEDIR=\/usr/local/share/locale\ -I. -I../lib -I../lib -g -O2 -MT css-url.o -MD -MP -MF .deps/css-url.Tpo -c -o css-url.o css-url.c css-url.c:48:22: fatal error: xstrndup.h: No such file or directory #include xstrndup.h ^ compilation terminated. make[3]: *** [css-url.o] Error 1 There seems to be no -I../gnulib/lib on the gcc command line, which is probably why it can't find it. Thanks, --Gabriel From 66ae9d897a4a2b7458a5d61230dc316befddd731 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81ngel=20Gonz=C3=A1lez?= keis...@gmail.com Date: Wed, 22 Oct 2014 01:10:21 +0200 Subject: [PATCH] css-url.c (get_uri_string): Fix regression from 8e6de1fb5 Solves the issue discovered by Gabriel Somlo and reported in the ml thread Regression in git master branch (commit 8e6de1fb5ff0ca0c749da7db634a1b1e3a1215a2) --- src/ChangeLog | 4 src/css-url.c | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/src/ChangeLog b/src/ChangeLog index d5aeca0..6004c34 100644 --- a/src/ChangeLog +++ b/src/ChangeLog @@ -1,3 +1,7 @@ +2013-10-22 ??ngel Gonz??lez keis...@gmail.com + + * css-url.c (get_uri_string): Fix regression from 8e6de1fb5. + 2014-10-16 Tim Ruehsen tim.rueh...@gmx.de * url.c (url_parse): little code cleanup diff --git a/src/css-url.c b/src/css-url.c index c605798..2ac1cef 100644 --- a/src/css-url.c +++ b/src/css-url.c @@ -45,6 +45,7 @@ as that of the covered work. */ #include stdlib.h #include ctype.h #include errno.h +#include xstrndup.h #include wget.h #include utils.h @@ -97,7 +98,7 @@ get_uri_string (const char *at, int *pos, int *length) *length -= 2; } - return xstrdup (at + *pos); + return xstrndup (uri, at + *pos, *length);; } void -- 2.1.2