Re: [Bug-wget] http.c code cleaning

2013-05-08 Thread Dmitry Bogatov

Giuseppe Scrivano  writes:
>  Now you can assume opt.method is already uppercase as this is already
> enforced in cmd_string_uppercase (init.c).
>
> Please take a look at the commit 550457.
My bad. Here is fixed version.

>From dfb31c06db0ea98dfc8f08c20c8ec1de1f7ed0ca Mon Sep 17 00:00:00 2001
From: Dmitry Bogatov 
Date: Wed, 8 May 2013 18:10:55 +0400
Subject: [PATCH] Non-functionality improvement in src/http.c.

Pulled `request_set_method` functionality into `request_new`
to ensure these functions always called in right order.
---
 src/http.c | 30 +-
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/src/http.c b/src/http.c
index 3a33840..6e5c481 100644
--- a/src/http.c
+++ b/src/http.c
@@ -147,27 +147,20 @@ struct request {
 
 extern int numurls;
 
-/* Create a new, empty request.  At least request_set_method must be
-   called before the request can be used.  */
+/* Create a new, empty request. Set the request's method and its
+   arguments.  METHOD should be a literal string (or it should outlive
+   the request) because it will not be freed.  ARG will be freed by
+   request_free.  */
 
 static struct request *
-request_new (void)
+request_new (const char *method, char *arg)
 {
   struct request *req = xnew0 (struct request);
   req->hcapacity = 8;
   req->headers = xnew_array (struct request_header, req->hcapacity);
-  return req;
-}
-
-/* Set the request's method and its arguments.  METH should be a
-   literal string (or it should outlive the request) because it will
-   not be freed.  ARG will be freed by request_free.  */
-
-static void
-request_set_method (struct request *req, const char *meth, char *arg)
-{
-  req->method = meth;
+  req->method = method;
   req->arg = arg;
+  return req;
 }
 
 /* Return the method string passed with the last call to
@@ -1758,15 +1751,11 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   conn = u;
 
   /* Prepare the request to send. */
-
-  req = request_new ();
   {
 char *meth_arg;
 const char *meth = "GET";
 if (head_only)
   meth = "HEAD";
-else if (opt.method)
-  meth = opt.method;
 /* Use the full path, i.e. one that includes the leading slash and
the query string.  E.g. if u->path is "foo/bar" and u->query is
"param=value", full_path will be "/foo/bar?param=value".  */
@@ -1781,7 +1770,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   meth_arg = xstrdup (u->url);
 else
   meth_arg = url_full_path (u);
-request_set_method (req, meth, meth_arg);
+req = request_new(meth, meth_arg);
   }
 
   request_set_header (req, "Referer", (char *) hs->referer, rel_none);
@@ -2014,8 +2003,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
 {
   /* When requesting SSL URLs through proxies, use the
  CONNECT method to request passthrough.  */
-  struct request *connreq = request_new ();
-  request_set_method (connreq, "CONNECT",
+  struct request *connreq = request_new ("CONNECT",
   aprintf ("%s:%d", u->host, u->port));
   SET_USER_AGENT (connreq);
   if (proxyauth)
-- 
1.8.2.2


--
Best regards, Dmitry Bogatov ,
Free Software supporter and netiquette guardian.
git clone git://gitorious.org/illusionoflife-read-only/rc-files.git 
--depth 1
GPG: 54B7F00D
Html mail and proprietary format attachments are forwarded to /dev/null.


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-08 Thread Darshit Shah
> > We currently suspend any method if the response code is a non 307
> > redirect. That I believe is wrong. We should only be suspending the
> > POST method and not others.
>
> It sounds good, but we must be careful also with 303.
>
> Do you agree that the matrix for the method to use on the next request
> should look this way?
>
>  original
>   method ->  POST   FOO  HEAD
> response
> status
>  |
>  \/
> 301  GETFOO(terminate)
> 302  GETFOO(terminate)
> 303  GETGET(terminate)
> 307  POST   FOO(terminate)
>
> Looks good to me. This will require some work so that the code remains
maintainable later.
I implemented setting opt.spider when invoked with --method head. But
spider mode currently follows all redirects till it receives a 200 OK. This
does not match with the intended behaviour.


> > Well yeah. It has opened a whole lot of possibilities that we must
> > keep considering. I will handle it separately for HEAD and send in a
> > patch soon. However, I was thinking, that since the new --method
> > command needs a lot of work specific to itself and that it may only
> > increase in the future, we should handle that code separately? Maybe a
> > set_method_opts() function?
>
> we can refactor if/when we really need it, no need to overengineer the
> code now.

Let's keep it for later then.

 Do you already have any of such "opts" in mind?
>
Just the code that handles post-body/data and now for HEAD.



-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] http.c code cleaning

2013-05-08 Thread Giuseppe Scrivano
Dmitry Bogatov  writes:

> -  meth = opt.method;
> +  {
> +char *q;
> +for (q = opt.method; *q; ++q)
> +  *q = c_toupper (*q);
> +meth = opt.method;
> +  }

this code should go.  Now you can assume opt.method is already uppercase
as this is already enforced in cmd_string_uppercase (init.c).

Please take a look at the commit 550457.

-- 
Giuseppe



Re: [Bug-wget] http.c code cleaning

2013-05-08 Thread Dmitry Bogatov

>> Giuseppe pushed a patch with that change a few days ago..  Please rebase it 
>> onto the master branch.
> Good idea, but as Darshit said, please rebase it onto the master
> branch.  I will be offline for the next few days, I will review it
> when I am back.
Here it is: >From 6b35e408194bb1970537b54d888374e77d26d6b4 Mon Sep 17 00:00:00 2001
From: Dmitry Bogatov 
Date: Wed, 8 May 2013 18:10:55 +0400
Subject: [PATCH] Non-functionality improvement in src/http.c.

Pulled `request_set_method` functionality into `request_new`
to ensure these functions always called in right order.
---
 src/http.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/src/http.c b/src/http.c
index 3a33840..4d3b753 100644
--- a/src/http.c
+++ b/src/http.c
@@ -147,27 +147,20 @@ struct request {
 
 extern int numurls;
 
-/* Create a new, empty request.  At least request_set_method must be
-   called before the request can be used.  */
+/* Create a new, empty request. Set the request's method and its
+   arguments.  METHOD should be a literal string (or it should outlive
+   the request) because it will not be freed.  ARG will be freed by
+   request_free.  */
 
 static struct request *
-request_new (void)
+request_new (const char *method, char *arg)
 {
   struct request *req = xnew0 (struct request);
   req->hcapacity = 8;
   req->headers = xnew_array (struct request_header, req->hcapacity);
-  return req;
-}
-
-/* Set the request's method and its arguments.  METH should be a
-   literal string (or it should outlive the request) because it will
-   not be freed.  ARG will be freed by request_free.  */
-
-static void
-request_set_method (struct request *req, const char *meth, char *arg)
-{
-  req->method = meth;
+  req->method = method;
   req->arg = arg;
+  return req;
 }
 
 /* Return the method string passed with the last call to
@@ -1758,15 +1751,18 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   conn = u;
 
   /* Prepare the request to send. */
-
-  req = request_new ();
   {
 char *meth_arg;
 const char *meth = "GET";
 if (head_only)
   meth = "HEAD";
 else if (opt.method)
-  meth = opt.method;
+  {
+char *q;
+for (q = opt.method; *q; ++q)
+  *q = c_toupper (*q);
+meth = opt.method;
+  }
 /* Use the full path, i.e. one that includes the leading slash and
the query string.  E.g. if u->path is "foo/bar" and u->query is
"param=value", full_path will be "/foo/bar?param=value".  */
@@ -1781,7 +1777,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   meth_arg = xstrdup (u->url);
 else
   meth_arg = url_full_path (u);
-request_set_method (req, meth, meth_arg);
+req = request_new(meth, meth_arg);
   }
 
   request_set_header (req, "Referer", (char *) hs->referer, rel_none);
@@ -2014,8 +2010,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
 {
   /* When requesting SSL URLs through proxies, use the
  CONNECT method to request passthrough.  */
-  struct request *connreq = request_new ();
-  request_set_method (connreq, "CONNECT",
+  struct request *connreq = request_new ("CONNECT",
   aprintf ("%s:%d", u->host, u->port));
   SET_USER_AGENT (connreq);
   if (proxyauth)
-- 
1.8.2.2


--
Best regards, Dmitry Bogatov ,
Free Software supporter and netiquette guardian.
git clone git://gitorious.org/illusionoflife-read-only/rc-files.git 
--depth 1
GPG: 54B7F00D
Html mail and proprietary format attachments are forwarded to /dev/null.


Re: [Bug-wget] How to tell wget to strip part following question mark in local filenames?

2013-05-08 Thread Tim Rühsen
Hi,

Am Mittwoch, 8. Mai 2013 schrieb Mark:
> Hi,
> 
> I noticed some problems relating to URLs like
>   
http://www.example.com/path/to/filename.zip?arg1=somestring&arg2=anotherstring&;...
> 
> Wget doesn't strip the ? and following characters from the filename when
> creating local files. As far as I can tell it doesn't have an option to do
> that. This can cause several problems:
> 
>  - Local filenames have "garbage" following the actual extension which the
> user has to manually remove.

In many (most?) cases this is not garbage.
It is common, that different argument values returns different content.
To change the output file name for single downloads, use -O / --output-
document.

>  - Depending on the web server, each download session may result in unique
> arguments in the URL (e.g. some kind of session ID), making it impossible
> to easily resume downloading partially-downloaded files. Wget would
> instead re-download the whole file, saving it under a different name.

When to resume a download, you are not in --recursive mode.
Again, -O should do it.

>  - The worst problem is that when the arguments following the actual
> filename in the URL are very long, wget is unable to create the file at
> all, reporting
>   File name too long

Again, this is only a problem when in recursive mode.
Here, a hash string (e.g. sha1 or md5) instead of the query part (and / or the 
filename part) could be helpful.
If needed, Wget could create a flat text file that maps hash codes to real 
filenames / urls in this cases.

Anyone with other ideas ?

> So this message is to suggest adding an option to tell wget to strip a
> question mark and everything after that from the filename part of URLs to
> get the local file name.

Thanks for your suggestion.

Regards, Tim



Re: [Bug-wget] http.c code cleaning

2013-05-08 Thread Giuseppe Scrivano
Darshit Shah  writes:

>>
>> In comment to `request_new` was that it *always* must be followed by
>> `request_set_method`. I changed code to combine these functionality into
>> single function. Please, review.

>
> Looks good to me.
> Just one thing though, the method string is now converted to uppercase
> through cmd_string_uppercase.
> Giuseppe pushed a patch with that change a few days ago..  Please rebase it 
> onto the master branch.

Good idea, but as Darshit said, please rebase it onto the master branch.

I will be offline for the next few days, I will review it when I am back.

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-08 Thread Giuseppe Scrivano
Darshit Shah  writes:

> 
> We suspend the post data only when we receive a 307, or do you
> mean we
> shouldn't suspend in this case too?
> 
> It's the other way round. A 307 response code is used when the server
> wishes to explicitly ask the client to not suspend. And that is also
> the current behaviour. 

true, I made some confusion with the terms :-)



> We currently suspend any method if the response code is a non 307
> redirect. That I believe is wrong. We should only be suspending the
> POST method and not others. 

It sounds good, but we must be careful also with 303.

Do you agree that the matrix for the method to use on the next request
should look this way?

 original
  method ->  POST   FOO  HEAD
response
status
 |
 \/
301  GETFOO(terminate)
302  GETFOO(terminate)
303  GETGET(terminate)
307  POST   FOO(terminate)


> Well yeah. It has opened a whole lot of possibilities that we must
> keep considering. I will handle it separately for HEAD and send in a
> patch soon. However, I was thinking, that since the new --method
> command needs a lot of work specific to itself and that it may only
> increase in the future, we should handle that code separately? Maybe a
> set_method_opts() function?

we can refactor if/when we really need it, no need to overengineer the
code now.  Do you already have any of such "opts" in mind?

-- 
Giuseppe



Re: [Bug-wget] http.c code cleaning

2013-05-08 Thread Darshit Shah
>
> In comment to `request_new` was that it *always* must be followed by
> `request_set_method`. I changed code to combine these functionality into
> single function. Please, review.
>

Looks good to me.
Just one thing though, the method string is now converted to uppercase
through cmd_string_uppercase.
Giuseppe pushed a patch with that change a few days ago.

--
Thanking You,
Darshit Shah


[Bug-wget] http.c code cleaning

2013-05-08 Thread Dmitry Bogatov

In comment to `request_new` was that it *always* must be followed by
`request_set_method`. I changed code to combine these functionality into
single function. Please, review.
>From 4e2a98f3746400846486de0e57b4d68e654622f3 Mon Sep 17 00:00:00 2001
From: Dmitry Bogatov 
Date: Wed, 8 May 2013 18:10:55 +0400
Subject: [PATCH 2/2] Non-functionality improvement in src/http.c.

Pulled `request_set_method` functionality into `request_new`
to ensure these functions always called in right order.
---
 src/http.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/src/http.c b/src/http.c
index 486df5a..3da83af 100644
--- a/src/http.c
+++ b/src/http.c
@@ -147,27 +147,20 @@ struct request {
 
 extern int numurls;
 
-/* Create a new, empty request.  At least request_set_method must be
-   called before the request can be used.  */
+/* Create a new, empty request. Set the request's method and its
+   arguments.  METHOD should be a literal string (or it should outlive
+   the request) because it will not be freed.  ARG will be freed by
+   request_free.  */
 
 static struct request *
-request_new (void)
+request_new (const char *method, char *arg)
 {
   struct request *req = xnew0 (struct request);
   req->hcapacity = 8;
   req->headers = xnew_array (struct request_header, req->hcapacity);
-  return req;
-}
-
-/* Set the request's method and its arguments.  METH should be a
-   literal string (or it should outlive the request) because it will
-   not be freed.  ARG will be freed by request_free.  */
-
-static void
-request_set_method (struct request *req, const char *meth, char *arg)
-{
-  req->method = meth;
+  req->method = method;
   req->arg = arg;
+  return req;
 }
 
 /* Return the method string passed with the last call to
@@ -1758,15 +1751,18 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   conn = u;
 
   /* Prepare the request to send. */
-
-  req = request_new ();
   {
 char *meth_arg;
 const char *meth = "GET";
 if (head_only)
   meth = "HEAD";
 else if (opt.method)
-  meth = opt.method;
+  {
+char *q;
+for (q = opt.method; *q; ++q)
+  *q = c_toupper (*q);
+meth = opt.method;
+  }
 /* Use the full path, i.e. one that includes the leading slash and
the query string.  E.g. if u->path is "foo/bar" and u->query is
"param=value", full_path will be "/foo/bar?param=value".  */
@@ -1781,7 +1777,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
   meth_arg = xstrdup (u->url);
 else
   meth_arg = url_full_path (u);
-request_set_method (req, meth, meth_arg);
+req = request_new(meth, meth_arg);
   }
 
   request_set_header (req, "Referer", (char *) hs->referer, rel_none);
@@ -2014,8 +2010,7 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, struct url *proxy,
 {
   /* When requesting SSL URLs through proxies, use the
  CONNECT method to request passthrough.  */
-  struct request *connreq = request_new ();
-  request_set_method (connreq, "CONNECT",
+  struct request *connreq = request_new ("CONNECT",
   aprintf ("%s:%d", u->host, u->port));
   SET_USER_AGENT (connreq);
   if (proxyauth)
-- 
1.8.2.2


--
Best regards, Dmitry Bogatov ,
Free Software supporter and netiquette guardian.
git clone git://gitorious.org/illusionoflife-read-only/rc-files.git 
--depth 1
GPG: 54B7F00D
Html mail and proprietary format attachments are forwarded to /dev/null.


[Bug-wget] How to tell wget to strip part following question mark in local filenames?

2013-05-08 Thread Mark
Hi,

I noticed some problems relating to URLs like
  
http://www.example.com/path/to/filename.zip?arg1=somestring&arg2=anotherstring&;...

Wget doesn't strip the ? and following characters from the filename when
creating local files. As far as I can tell it doesn't have an option to do
that. This can cause several problems:

 - Local filenames have "garbage" following the actual extension which the
user has to manually remove.

 - Depending on the web server, each download session may result in unique
arguments in the URL (e.g. some kind of session ID), making it impossible
to easily resume downloading partially-downloaded files. Wget would
instead re-download the whole file, saving it under a different name.

 - The worst problem is that when the arguments following the actual
filename in the URL are very long, wget is unable to create the file at
all, reporting
  File name too long

So this message is to suggest adding an option to tell wget to strip a
question mark and everything after that from the filename part of URLs to
get the local file name.


-- Mark