Re: [Bug-wget] http.c code cleaning
Giuseppe Scrivano writes: > Now you can assume opt.method is already uppercase as this is already > enforced in cmd_string_uppercase (init.c). > > Please take a look at the commit 550457. My bad. Here is fixed version. >From dfb31c06db0ea98dfc8f08c20c8ec1de1f7ed0ca Mon Sep 17 00:00:00 2001 From: Dmitry Bogatov Date: Wed, 8 May 2013 18:10:55 +0400 Subject: [PATCH] Non-functionality improvement in src/http.c. Pulled `request_set_method` functionality into `request_new` to ensure these functions always called in right order. --- src/http.c | 30 +- 1 file changed, 9 insertions(+), 21 deletions(-) diff --git a/src/http.c b/src/http.c index 3a33840..6e5c481 100644 --- a/src/http.c +++ b/src/http.c @@ -147,27 +147,20 @@ struct request { extern int numurls; -/* Create a new, empty request. At least request_set_method must be - called before the request can be used. */ +/* Create a new, empty request. Set the request's method and its + arguments. METHOD should be a literal string (or it should outlive + the request) because it will not be freed. ARG will be freed by + request_free. */ static struct request * -request_new (void) +request_new (const char *method, char *arg) { struct request *req = xnew0 (struct request); req->hcapacity = 8; req->headers = xnew_array (struct request_header, req->hcapacity); - return req; -} - -/* Set the request's method and its arguments. METH should be a - literal string (or it should outlive the request) because it will - not be freed. ARG will be freed by request_free. */ - -static void -request_set_method (struct request *req, const char *meth, char *arg) -{ - req->method = meth; + req->method = method; req->arg = arg; + return req; } /* Return the method string passed with the last call to @@ -1758,15 +1751,11 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, conn = u; /* Prepare the request to send. */ - - req = request_new (); { char *meth_arg; const char *meth = "GET"; if (head_only) meth = "HEAD"; -else if (opt.method) - meth = opt.method; /* Use the full path, i.e. one that includes the leading slash and the query string. E.g. if u->path is "foo/bar" and u->query is "param=value", full_path will be "/foo/bar?param=value". */ @@ -1781,7 +1770,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, meth_arg = xstrdup (u->url); else meth_arg = url_full_path (u); -request_set_method (req, meth, meth_arg); +req = request_new(meth, meth_arg); } request_set_header (req, "Referer", (char *) hs->referer, rel_none); @@ -2014,8 +2003,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, { /* When requesting SSL URLs through proxies, use the CONNECT method to request passthrough. */ - struct request *connreq = request_new (); - request_set_method (connreq, "CONNECT", + struct request *connreq = request_new ("CONNECT", aprintf ("%s:%d", u->host, u->port)); SET_USER_AGENT (connreq); if (proxyauth) -- 1.8.2.2 -- Best regards, Dmitry Bogatov , Free Software supporter and netiquette guardian. git clone git://gitorious.org/illusionoflife-read-only/rc-files.git --depth 1 GPG: 54B7F00D Html mail and proprietary format attachments are forwarded to /dev/null.
Re: [Bug-wget] Segmentation fault with current development version of wget
> > We currently suspend any method if the response code is a non 307 > > redirect. That I believe is wrong. We should only be suspending the > > POST method and not others. > > It sounds good, but we must be careful also with 303. > > Do you agree that the matrix for the method to use on the next request > should look this way? > > original > method -> POST FOO HEAD > response > status > | > \/ > 301 GETFOO(terminate) > 302 GETFOO(terminate) > 303 GETGET(terminate) > 307 POST FOO(terminate) > > Looks good to me. This will require some work so that the code remains maintainable later. I implemented setting opt.spider when invoked with --method head. But spider mode currently follows all redirects till it receives a 200 OK. This does not match with the intended behaviour. > > Well yeah. It has opened a whole lot of possibilities that we must > > keep considering. I will handle it separately for HEAD and send in a > > patch soon. However, I was thinking, that since the new --method > > command needs a lot of work specific to itself and that it may only > > increase in the future, we should handle that code separately? Maybe a > > set_method_opts() function? > > we can refactor if/when we really need it, no need to overengineer the > code now. Let's keep it for later then. Do you already have any of such "opts" in mind? > Just the code that handles post-body/data and now for HEAD. -- Thanking You, Darshit Shah
Re: [Bug-wget] http.c code cleaning
Dmitry Bogatov writes: > - meth = opt.method; > + { > +char *q; > +for (q = opt.method; *q; ++q) > + *q = c_toupper (*q); > +meth = opt.method; > + } this code should go. Now you can assume opt.method is already uppercase as this is already enforced in cmd_string_uppercase (init.c). Please take a look at the commit 550457. -- Giuseppe
Re: [Bug-wget] http.c code cleaning
>> Giuseppe pushed a patch with that change a few days ago.. Please rebase it >> onto the master branch. > Good idea, but as Darshit said, please rebase it onto the master > branch. I will be offline for the next few days, I will review it > when I am back. Here it is: >From 6b35e408194bb1970537b54d888374e77d26d6b4 Mon Sep 17 00:00:00 2001 From: Dmitry Bogatov Date: Wed, 8 May 2013 18:10:55 +0400 Subject: [PATCH] Non-functionality improvement in src/http.c. Pulled `request_set_method` functionality into `request_new` to ensure these functions always called in right order. --- src/http.c | 35 +++ 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/src/http.c b/src/http.c index 3a33840..4d3b753 100644 --- a/src/http.c +++ b/src/http.c @@ -147,27 +147,20 @@ struct request { extern int numurls; -/* Create a new, empty request. At least request_set_method must be - called before the request can be used. */ +/* Create a new, empty request. Set the request's method and its + arguments. METHOD should be a literal string (or it should outlive + the request) because it will not be freed. ARG will be freed by + request_free. */ static struct request * -request_new (void) +request_new (const char *method, char *arg) { struct request *req = xnew0 (struct request); req->hcapacity = 8; req->headers = xnew_array (struct request_header, req->hcapacity); - return req; -} - -/* Set the request's method and its arguments. METH should be a - literal string (or it should outlive the request) because it will - not be freed. ARG will be freed by request_free. */ - -static void -request_set_method (struct request *req, const char *meth, char *arg) -{ - req->method = meth; + req->method = method; req->arg = arg; + return req; } /* Return the method string passed with the last call to @@ -1758,15 +1751,18 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, conn = u; /* Prepare the request to send. */ - - req = request_new (); { char *meth_arg; const char *meth = "GET"; if (head_only) meth = "HEAD"; else if (opt.method) - meth = opt.method; + { +char *q; +for (q = opt.method; *q; ++q) + *q = c_toupper (*q); +meth = opt.method; + } /* Use the full path, i.e. one that includes the leading slash and the query string. E.g. if u->path is "foo/bar" and u->query is "param=value", full_path will be "/foo/bar?param=value". */ @@ -1781,7 +1777,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, meth_arg = xstrdup (u->url); else meth_arg = url_full_path (u); -request_set_method (req, meth, meth_arg); +req = request_new(meth, meth_arg); } request_set_header (req, "Referer", (char *) hs->referer, rel_none); @@ -2014,8 +2010,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, { /* When requesting SSL URLs through proxies, use the CONNECT method to request passthrough. */ - struct request *connreq = request_new (); - request_set_method (connreq, "CONNECT", + struct request *connreq = request_new ("CONNECT", aprintf ("%s:%d", u->host, u->port)); SET_USER_AGENT (connreq); if (proxyauth) -- 1.8.2.2 -- Best regards, Dmitry Bogatov , Free Software supporter and netiquette guardian. git clone git://gitorious.org/illusionoflife-read-only/rc-files.git --depth 1 GPG: 54B7F00D Html mail and proprietary format attachments are forwarded to /dev/null.
Re: [Bug-wget] How to tell wget to strip part following question mark in local filenames?
Hi, Am Mittwoch, 8. Mai 2013 schrieb Mark: > Hi, > > I noticed some problems relating to URLs like > http://www.example.com/path/to/filename.zip?arg1=somestring&arg2=anotherstring&;... > > Wget doesn't strip the ? and following characters from the filename when > creating local files. As far as I can tell it doesn't have an option to do > that. This can cause several problems: > > - Local filenames have "garbage" following the actual extension which the > user has to manually remove. In many (most?) cases this is not garbage. It is common, that different argument values returns different content. To change the output file name for single downloads, use -O / --output- document. > - Depending on the web server, each download session may result in unique > arguments in the URL (e.g. some kind of session ID), making it impossible > to easily resume downloading partially-downloaded files. Wget would > instead re-download the whole file, saving it under a different name. When to resume a download, you are not in --recursive mode. Again, -O should do it. > - The worst problem is that when the arguments following the actual > filename in the URL are very long, wget is unable to create the file at > all, reporting > File name too long Again, this is only a problem when in recursive mode. Here, a hash string (e.g. sha1 or md5) instead of the query part (and / or the filename part) could be helpful. If needed, Wget could create a flat text file that maps hash codes to real filenames / urls in this cases. Anyone with other ideas ? > So this message is to suggest adding an option to tell wget to strip a > question mark and everything after that from the filename part of URLs to > get the local file name. Thanks for your suggestion. Regards, Tim
Re: [Bug-wget] http.c code cleaning
Darshit Shah writes: >> >> In comment to `request_new` was that it *always* must be followed by >> `request_set_method`. I changed code to combine these functionality into >> single function. Please, review. > > Looks good to me. > Just one thing though, the method string is now converted to uppercase > through cmd_string_uppercase. > Giuseppe pushed a patch with that change a few days ago.. Please rebase it > onto the master branch. Good idea, but as Darshit said, please rebase it onto the master branch. I will be offline for the next few days, I will review it when I am back. -- Giuseppe
Re: [Bug-wget] Segmentation fault with current development version of wget
Darshit Shah writes: > > We suspend the post data only when we receive a 307, or do you > mean we > shouldn't suspend in this case too? > > It's the other way round. A 307 response code is used when the server > wishes to explicitly ask the client to not suspend. And that is also > the current behaviour. true, I made some confusion with the terms :-) > We currently suspend any method if the response code is a non 307 > redirect. That I believe is wrong. We should only be suspending the > POST method and not others. It sounds good, but we must be careful also with 303. Do you agree that the matrix for the method to use on the next request should look this way? original method -> POST FOO HEAD response status | \/ 301 GETFOO(terminate) 302 GETFOO(terminate) 303 GETGET(terminate) 307 POST FOO(terminate) > Well yeah. It has opened a whole lot of possibilities that we must > keep considering. I will handle it separately for HEAD and send in a > patch soon. However, I was thinking, that since the new --method > command needs a lot of work specific to itself and that it may only > increase in the future, we should handle that code separately? Maybe a > set_method_opts() function? we can refactor if/when we really need it, no need to overengineer the code now. Do you already have any of such "opts" in mind? -- Giuseppe
Re: [Bug-wget] http.c code cleaning
> > In comment to `request_new` was that it *always* must be followed by > `request_set_method`. I changed code to combine these functionality into > single function. Please, review. > Looks good to me. Just one thing though, the method string is now converted to uppercase through cmd_string_uppercase. Giuseppe pushed a patch with that change a few days ago. -- Thanking You, Darshit Shah
[Bug-wget] http.c code cleaning
In comment to `request_new` was that it *always* must be followed by `request_set_method`. I changed code to combine these functionality into single function. Please, review. >From 4e2a98f3746400846486de0e57b4d68e654622f3 Mon Sep 17 00:00:00 2001 From: Dmitry Bogatov Date: Wed, 8 May 2013 18:10:55 +0400 Subject: [PATCH 2/2] Non-functionality improvement in src/http.c. Pulled `request_set_method` functionality into `request_new` to ensure these functions always called in right order. --- src/http.c | 35 +++ 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/src/http.c b/src/http.c index 486df5a..3da83af 100644 --- a/src/http.c +++ b/src/http.c @@ -147,27 +147,20 @@ struct request { extern int numurls; -/* Create a new, empty request. At least request_set_method must be - called before the request can be used. */ +/* Create a new, empty request. Set the request's method and its + arguments. METHOD should be a literal string (or it should outlive + the request) because it will not be freed. ARG will be freed by + request_free. */ static struct request * -request_new (void) +request_new (const char *method, char *arg) { struct request *req = xnew0 (struct request); req->hcapacity = 8; req->headers = xnew_array (struct request_header, req->hcapacity); - return req; -} - -/* Set the request's method and its arguments. METH should be a - literal string (or it should outlive the request) because it will - not be freed. ARG will be freed by request_free. */ - -static void -request_set_method (struct request *req, const char *meth, char *arg) -{ - req->method = meth; + req->method = method; req->arg = arg; + return req; } /* Return the method string passed with the last call to @@ -1758,15 +1751,18 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, conn = u; /* Prepare the request to send. */ - - req = request_new (); { char *meth_arg; const char *meth = "GET"; if (head_only) meth = "HEAD"; else if (opt.method) - meth = opt.method; + { +char *q; +for (q = opt.method; *q; ++q) + *q = c_toupper (*q); +meth = opt.method; + } /* Use the full path, i.e. one that includes the leading slash and the query string. E.g. if u->path is "foo/bar" and u->query is "param=value", full_path will be "/foo/bar?param=value". */ @@ -1781,7 +1777,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, meth_arg = xstrdup (u->url); else meth_arg = url_full_path (u); -request_set_method (req, meth, meth_arg); +req = request_new(meth, meth_arg); } request_set_header (req, "Referer", (char *) hs->referer, rel_none); @@ -2014,8 +2010,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy, { /* When requesting SSL URLs through proxies, use the CONNECT method to request passthrough. */ - struct request *connreq = request_new (); - request_set_method (connreq, "CONNECT", + struct request *connreq = request_new ("CONNECT", aprintf ("%s:%d", u->host, u->port)); SET_USER_AGENT (connreq); if (proxyauth) -- 1.8.2.2 -- Best regards, Dmitry Bogatov , Free Software supporter and netiquette guardian. git clone git://gitorious.org/illusionoflife-read-only/rc-files.git --depth 1 GPG: 54B7F00D Html mail and proprietary format attachments are forwarded to /dev/null.
[Bug-wget] How to tell wget to strip part following question mark in local filenames?
Hi, I noticed some problems relating to URLs like http://www.example.com/path/to/filename.zip?arg1=somestring&arg2=anotherstring&;... Wget doesn't strip the ? and following characters from the filename when creating local files. As far as I can tell it doesn't have an option to do that. This can cause several problems: - Local filenames have "garbage" following the actual extension which the user has to manually remove. - Depending on the web server, each download session may result in unique arguments in the URL (e.g. some kind of session ID), making it impossible to easily resume downloading partially-downloaded files. Wget would instead re-download the whole file, saving it under a different name. - The worst problem is that when the arguments following the actual filename in the URL are very long, wget is unable to create the file at all, reporting File name too long So this message is to suggest adding an option to tell wget to strip a question mark and everything after that from the filename part of URLs to get the local file name. -- Mark