[Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Stefano Lattarini
Here is the reproducer:

  $ wget --passive-ftp -nv --recursive --level=1 --no-directories --no-parent \
 --no-check-certificate -A '*.po' 
http://translationproject.org/latest/make
  2013-04-30 16:41:25 URL:https://translationproject.org/latest/make/ 
[5489/5489] -> "make" [1]
  Segmentation fault

Sorry, I don't have to look into this more deeply ATM, not to provide
further feedback.  Hope you can reproduce this yourself; otherwise, I'll
try to get back to you in the next days.

Regards, and HTH,
  Stefano



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Tim Ruehsen
Hi,

you can even reproduce it with a simple
wget -r http://translationproject.org/latest/make

Darshit, maybe you can have a look at it. It has something to do with 
opt.method (set to read-only "POST" in http.c, line 1772).

Even with --method=GET opt.method points to "POST". And the code tries to 
uppercase it inline.

Regards, Tim

Am Tuesday 30 April 2013 schrieb Stefano Lattarini:
> Here is the reproducer:
> 
>   $ wget --passive-ftp -nv --recursive --level=1 --no-directories
> --no-parent \ --no-check-certificate -A '*.po'
> http://translationproject.org/latest/make 2013-04-30 16:41:25
> URL:https://translationproject.org/latest/make/ [5489/5489] -> "make" [1]
> Segmentation fault
> 
> Sorry, I don't have to look into this more deeply ATM, not to provide
> further feedback.  Hope you can reproduce this yourself; otherwise, I'll
> try to get back to you in the next days.
> 
> Regards, and HTH,
>   Stefano

Mit freundlichem Gruß

 Tim Rühsen



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Darshit Shah
I'm in the middle of University exams at the moment. I'll still have a look
at it tomorrow when I get a breather.

However, it looks like wget is converting any method to a POST request
which is weird since that should have caused it to fail most of the tests.

I'll have to look into it and check what is causing this issue.

On Tue, Apr 30, 2013 at 8:46 PM, Tim Ruehsen  wrote:

> Hi,
>
> you can even reproduce it with a simple
> wget -r http://translationproject.org/latest/make
>
> Darshit, maybe you can have a look at it. It has something to do with
> opt.method (set to read-only "POST" in http.c, line 1772).
>
> Even with --method=GET opt.method points to "POST". And the code tries to
> uppercase it inline.
>
> Regards, Tim
>
> Am Tuesday 30 April 2013 schrieb Stefano Lattarini:
> > Here is the reproducer:
> >
> >   $ wget --passive-ftp -nv --recursive --level=1 --no-directories
> > --no-parent \ --no-check-certificate -A '*.po'
> > http://translationproject.org/latest/make 2013-04-30 16:41:25
> > URL:https://translationproject.org/latest/make/ [5489/5489] -> "make"
> [1]
> > Segmentation fault
> >
> > Sorry, I don't have to look into this more deeply ATM, not to provide
> > further feedback.  Hope you can reproduce this yourself; otherwise, I'll
> > try to get back to you in the next days.
> >
> > Regards, and HTH,
> >   Stefano
>
> Mit freundlichem Gruß
>
>  Tim Rühsen
>



-- 
Thanking You,
Darshit Shah
Research Lead, Code Innovation
Kill Code Phobia.
B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Darshit Shah
Okay, I could not prevent myself from looking into it.

It seems as if the SUSPEND_POST_DATA macro was being called during a
recursive download attempt.

Attaching a hack around the situation. Will look more deeply when I have
more time to identify what caused the regression.

On Tue, Apr 30, 2013 at 9:09 PM, Darshit Shah  wrote:

> I'm in the middle of University exams at the moment. I'll still have a
> look at it tomorrow when I get a breather.
>
> However, it looks like wget is converting any method to a POST request
> which is weird since that should have caused it to fail most of the tests.
>
> I'll have to look into it and check what is causing this issue.
>
> On Tue, Apr 30, 2013 at 8:46 PM, Tim Ruehsen  wrote:
>
>> Hi,
>>
>> you can even reproduce it with a simple
>> wget -r http://translationproject.org/latest/make
>>
>> Darshit, maybe you can have a look at it. It has something to do with
>> opt.method (set to read-only "POST" in http.c, line 1772).
>>
>> Even with --method=GET opt.method points to "POST". And the code tries to
>> uppercase it inline.
>>
>> Regards, Tim
>>
>> Am Tuesday 30 April 2013 schrieb Stefano Lattarini:
>> > Here is the reproducer:
>> >
>> >   $ wget --passive-ftp -nv --recursive --level=1 --no-directories
>> > --no-parent \ --no-check-certificate -A '*.po'
>> > http://translationproject.org/latest/make 2013-04-30 16:41:25
>> > URL:https://translationproject.org/latest/make/ [5489/5489] -> "make"
>> [1]
>> > Segmentation fault
>> >
>> > Sorry, I don't have to look into this more deeply ATM, not to provide
>> > further feedback.  Hope you can reproduce this yourself; otherwise, I'll
>> > try to get back to you in the next days.
>> >
>> > Regards, and HTH,
>> >   Stefano
>>
>> Mit freundlichem Gruß
>>
>>  Tim Rühsen
>>
>
>
>
> --
> Thanking You,
> Darshit Shah
> Research Lead, Code Innovation
> Kill Code Phobia.
> B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani
>



-- 
Thanking You,
Darshit Shah
Research Lead, Code Innovation
Kill Code Phobia.
B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani


0001-Fix-segfault-in-recursive-calls.patch
Description: Binary data


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Stefano Lattarini
On 04/30/2013 07:01 PM, Darshit Shah wrote:
> Okay, I could not prevent myself from looking into it.
> 
> It seems as if the SUSPEND_POST_DATA macro was being called during a
> recursive download attempt.
> 
> Attaching a hack around the situation. Will look more deeply when I have
> more time to identify what caused the regression.
> 
I can confirm this temporary fix works for me.

Thanks,
  Stefano



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-04-30 Thread Tim Rühsen
Hi Darshit,

I understand that your patch was just a quick hack.

But even than you should avoid doing
opt.method == "POST"
for string comparisons.

This definitely not portable.
Not every compiler/linker aggregates two occurrences of the same static string 
into one address in memory.

You should use strcmp or strcasecmp to check if opt.method points to "POST".

Regards, Tim

Am Dienstag, 30. April 2013 schrieb Darshit Shah:
> Okay, I could not prevent myself from looking into it.
> 
> It seems as if the SUSPEND_POST_DATA macro was being called during a
> recursive download attempt.
> 
> Attaching a hack around the situation. Will look more deeply when I have
> more time to identify what caused the regression.
> 
> On Tue, Apr 30, 2013 at 9:09 PM, Darshit Shah  wrote:
> 
> > I'm in the middle of University exams at the moment. I'll still have a
> > look at it tomorrow when I get a breather.
> >
> > However, it looks like wget is converting any method to a POST request
> > which is weird since that should have caused it to fail most of the tests.
> >
> > I'll have to look into it and check what is causing this issue.
> >
> > On Tue, Apr 30, 2013 at 8:46 PM, Tim Ruehsen  wrote:
> >
> >> Hi,
> >>
> >> you can even reproduce it with a simple
> >> wget -r http://translationproject.org/latest/make
> >>
> >> Darshit, maybe you can have a look at it. It has something to do with
> >> opt.method (set to read-only "POST" in http.c, line 1772).
> >>
> >> Even with --method=GET opt.method points to "POST". And the code tries to
> >> uppercase it inline.
> >>
> >> Regards, Tim
> >>
> >> Am Tuesday 30 April 2013 schrieb Stefano Lattarini:
> >> > Here is the reproducer:
> >> >
> >> >   $ wget --passive-ftp -nv --recursive --level=1 --no-directories
> >> > --no-parent \ --no-check-certificate -A '*.po'
> >> > http://translationproject.org/latest/make 2013-04-30 16:41:25
> >> > URL:https://translationproject.org/latest/make/ [5489/5489] -> "make"
> >> [1]
> >> > Segmentation fault
> >> >
> >> > Sorry, I don't have to look into this more deeply ATM, not to provide
> >> > further feedback.  Hope you can reproduce this yourself; otherwise, 
I'll
> >> > try to get back to you in the next days.
> >> >
> >> > Regards, and HTH,
> >> >   Stefano
> >>
> >> Mit freundlichem Gruß
> >>
> >>  Tim Rühsen
> >>
> >
> >
> >
> > --
> > Thanking You,
> > Darshit Shah
> > Research Lead, Code Innovation
> > Kill Code Phobia.
> > B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani
> >
> 
> 
> 
> -- 
> Thanking You,
> Darshit Shah
> Research Lead, Code Innovation
> Kill Code Phobia.
> B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani
> 




Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Darshit Shah
Couple of things.
First, sorry for the quick and dirty hack which was the perfect example of
how NOT to do things.

Secondly, it lies upon me that this feature wasn't tested before submitting
the patch. I had however relied on the Test Environment and since it passed
everything there, I thought it was working correctly. Guess, we should add
a test for this soon. --recursive is a commonly used switch with Wget and
not having a test to prevent regressions on it is very bad.

Lastly, I have spent some time on this issue today and figured out the
issue.
The SUSPEND_POST_DATA logic is so written that the macro is called on every
301/302 redirect. This did not pose any issues till now since when NOT
using --post-file or --post-data, the respective variables would be holding
a NULL value which is backed up and then restored without any problems.
However, with the addition of the --method option, I am explicitly setting
opt.method to POST on RESUME_POST_DATA since the macro is supposed to be
called only during POST requests. This is the source of the error.

The actual Segmentation Fault was being caused since I, in all my
foolishness did not strdup, but instead directly assigned the string to a
pointer.

I am fixing this issue, but it is a terribly ugly hack. If someone could
help improve it I'd be most truly grateful.
I have a couple of ideas, but I will need to work them out and implement
them when I have the time.

The reason it has to be so ugly is that, we cannot use strcmp or strcasecmp
on a NULL String, and we cannot initialize opt.method since that would
break some sanity checks which are in place so that --post-* and --body-*
commands don't conflict with each other.


0001-Fix-bug-in-program-logic-and-Seg-Fault.patch
Description: Binary data


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Giuseppe Scrivano
Darshit Shah  writes:

> I am fixing this issue, but it is a terribly ugly hack. If someone could
> help improve it I'd be most truly grateful.
> I have a couple of ideas, but I will need to work them out and implement
> them when I have the time.
>
> The reason it has to be so ugly is that, we cannot use strcmp or strcasecmp
> on a NULL String, and we cannot initialize opt.method since that would
> break some sanity checks which are in place so that --post-* and --body-*
> commands don't conflict with each other.

what about this patch?  Any comment?

diff --git a/src/http.c b/src/http.c
index 25ad474..6b23382 100644
--- a/src/http.c
+++ b/src/http.c
@@ -1766,12 +1766,8 @@ gethttp (struct url *u, struct http_stat *hs, int *dt, 
struct url *proxy,
 if (head_only)
   meth = "HEAD";
 else if (opt.method)
-  {
-char *q;
-for (q = opt.method; *q; ++q)
-  *q = c_toupper (*q);
-meth = opt.method;
-  }
+  meth = opt.method;
 /* Use the full path, i.e. one that includes the leading slash and
the query string.  E.g. if u->path is "foo/bar" and u->query is
"param=value", full_path will be "/foo/bar?param=value".  */
diff --git a/src/init.c b/src/init.c
index 813781f..370506f 100644
--- a/src/init.c
+++ b/src/init.c
@@ -87,6 +87,7 @@ CMD_DECLARE (cmd_directory_vector);
 CMD_DECLARE (cmd_number);
 CMD_DECLARE (cmd_number_inf);
 CMD_DECLARE (cmd_string);
+CMD_DECLARE (cmd_string_uppercase);
 CMD_DECLARE (cmd_file);
 CMD_DECLARE (cmd_directory);
 CMD_DECLARE (cmd_time);
@@ -212,7 +213,7 @@ static const struct {
   { "logfile",  &opt.lfilename, cmd_file },
   { "login",&opt.ftp_user,  cmd_string },/* deprecated*/
   { "maxredirect",  &opt.max_redirect,  cmd_number },
-  { "method",   &opt.method,cmd_string },
+  { "method",   &opt.method,cmd_string_uppercase },
   { "mirror",   NULL,   cmd_spec_mirror },
   { "netrc",&opt.netrc, cmd_boolean },
   { "noclobber",&opt.noclobber, cmd_boolean },
@@ -959,8 +960,24 @@ cmd_string (const char *com, const char *val, void *place)
   return true;
 }
 
+/* Like cmd_string but ensure the string is upper case.  */
+static bool
+cmd_string_uppercase (const char *com, const char *val, void *place)
+{
+  char *q;
+  bool ret = cmd_string (com, val, place);
+  q = *((char **) place);
+  if (!ret || q == NULL)
+return false;
+
+  while (*q)
+*q++ = c_toupper (*q);
+
+  return true;
+}
+
 
-/* Like the above, but handles tilde-expansion when reading a user's
+/* Like cmd_string, but handles tilde-expansion when reading a user's
`.wgetrc'.  In that case, and if VAL begins with `~', the tilde
gets expanded to the user's home directory.  */
 static bool


-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Tim Rühsen
Am Mittwoch, 1. Mai 2013 schrieb Darshit Shah:
> First, sorry for the quick and dirty hack which was the perfect example of
> how NOT to do things.

Than it was a good example ;-)

> Secondly, it lies upon me that this feature wasn't tested before submitting
> the patch. I had however relied on the Test Environment and since it passed
> everything there, I thought it was working correctly. Guess, we should add
> a test for this soon. --recursive is a commonly used switch with Wget and
> not having a test to prevent regressions on it is very bad.

There are several tests using -r.
The question is why the problem doesn't come out.

> I am fixing this issue, but it is a terribly ugly hack. If someone could
> help improve it I'd be most truly grateful.
> I have a couple of ideas, but I will need to work them out and implement
> them when I have the time.
> 
> The reason it has to be so ugly is that, we cannot use strcmp or strcasecmp
> on a NULL String, and we cannot initialize opt.method since that would
> break some sanity checks which are in place so that --post-* and --body-*
> commands don't conflict with each other.

Your test isn't really ugly.
I (and most C programmers) favor
opt.method && strcasecmp(opt.method,"POST") == 0
instead of
opt.method ? strcasecmp(opt.method,"POST") == 0 : false
But that is not really important.

It is pretty common, that one or both args to strcmp may be NULL. The really 
ugly thing is, that there are no string function alternatives that handle NULL 
pointers. I regularly use my own versions like strcmp_null() to avoid extra 
checks.

Regards, Tim



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Gijs van Tulder

Giuseppe Scrivano wrote:
> what about this patch?  Any comment?

Another suggestion: why not save the original opt.method, set opt.method 
to NULL and put the original opt.method back later?


Gijs


diff --git a/src/retr.c b/src/retr.c
index d51b7e7..2aee578 100644
--- a/src/retr.c
+++ b/src/retr.c
@@ -679,22 +679,23 @@ calc_rate (wgint bytes, double secs, int *units)

 #define SUSPEND_POST_DATA do {  \
   post_data_suspended = true;   \
   saved_post_data = opt.body_data;  \
   saved_post_file_name = opt.body_file; \
+  saved_method = opt.method;\
   opt.body_data = NULL; \
   opt.body_file = NULL; \
   opt.method = NULL;\
 } while (0)

 #define RESTORE_POST_DATA do {  \
   if (post_data_suspended)  \
 {   \
   opt.body_data = saved_post_data;  \
   opt.body_file = saved_post_file_name; \
+  opt.method = saved_method;\
   post_data_suspended = false;  \
-  opt.method = "POST";  \
 }   \
 } while (0)

 static char *getproxy (struct url *);

@@ -721,10 +722,11 @@ retrieve_url (struct url * orig_parsed, const char 
*origurl, char **file,

   int redirection_count = 0;

   bool post_data_suspended = false;
   char *saved_post_data = NULL;
   char *saved_post_file_name = NULL;
+  char *saved_method = NULL;

   /* If dt is NULL, use local storage.  */
   if (!dt)
 {
   dt = &dummy;




Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Giuseppe Scrivano
hi Gijs,

Gijs van Tulder  writes:

> Giuseppe Scrivano wrote:
>> what about this patch?  Any comment?
>
> Another suggestion: why not save the original opt.method, set
> opt.method to NULL and put the original opt.method back later?

thanks for your suggestion but I think we should drop the code that
modifies opt.method, since we have to sanitize it only when it is
specified as argument.  Objections?

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Darshit Shah
On Thu, May 2, 2013 at 12:56 AM, Gijs van Tulder  wrote:

> Giuseppe Scrivano wrote:
> > what about this patch?  Any comment?
>
> Another suggestion: why not save the original opt.method, set opt.method
> to NULL and put the original opt.method back later?
>
> Gijs
>

Unless we explicitly check for opt.method = POST, this will cause a lot of
issues. Since the macro is being called on every redirect, even
HEAD/PUT/OPTIONS headers will get suspended to give way to GET. That is not
the behaviour we want.
And if we are explicitly writing the above test, then why not simply allow
the xstrdup to copy the string? (Although, I believe using a const *string
may be more efficient. )

@Tim: The various tests using -r are all in spider mode. This part of the
code is not reached when in spider mode.
opt.method && strcasecmp(opt.method,"POST") == 0 provides for more readable
code. Maybe we should use that.

@Giuseppe: Thanks for that patch! I didn't think along those lines, adding
a new command. It also cleans up the code from http.c. It shouldn't have
been there in the first place.


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Gijs van Tulder

Hi Giuseppe,

Dropping the bit that sanitizes the opt.method is probably a good idea. 
(Perhaps I shouldn't have replied to your patch directly.)


Still, even if the sanitization is removed: I think it would be better 
if RESTORE_POST_DATA restores the previous value of opt.method, instead 
of overwriting it with a hardcoded "POST". Isn't it?


A related question: how is a redirect response to a PUT request handled? 
How should it be handled?


I haven't tried it, but it looks like in that case the SUSPEND_POST_DATA 
macro is called (by retrieve_url in retr.c). If that's true, then later 
on the opt.method would be 'restored' to "POST" by RESTORE_POST_DATA.


Regards,

Gijs


Op 01-05-13 22:16 schreef Giuseppe Scrivano:

hi Gijs,

Gijs van Tulder  writes:


Giuseppe Scrivano wrote:

what about this patch?  Any comment?


Another suggestion: why not save the original opt.method, set
opt.method to NULL and put the original opt.method back later?


thanks for your suggestion but I think we should drop the code that
modifies opt.method, since we have to sanitize it only when it is
specified as argument.  Objections?






Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Darshit Shah
>
> Still, even if the sanitization is removed: I think it would be better if
> RESTORE_POST_DATA restores the previous value of opt.method, instead of
> overwriting it with a hardcoded "POST". Isn't it?
>
> As double safety yes. Maybe we should do that.


> A related question: how is a redirect response to a PUT request handled?
> How should it be handled?
>

I'd rather that Giuseppe or someone else answers this in more detail.

I haven't tried it, but it looks like in that case the SUSPEND_POST_DATA
> macro is called (by retrieve_url in retr.c). If that's true, then later on
> the opt.method would be 'restored' to "POST" by RESTORE_POST_DATA.
>
> In case, we wish to use the SUSPEND_POST_DATA macro for other commands
too, then we must make it store method in a backup string and retrieve it.
In any case, I think storing and retrieving opt.method is probably a good,
safe option that we must implement.


> Regards,
>
> Gijs
>
>


-- 
Thanking You,
Darshit Shah
Research Lead, Code Innovation
Kill Code Phobia.
B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Gijs van Tulder

Hi Darshit,

Darshit Shah wrote:
> Unless we explicitly check for opt.method = POST, this will cause a lot
> of issues. Since the macro is being called on every redirect, even
> HEAD/PUT/OPTIONS headers will get suspended to give way to GET. That is
> not the behaviour we want.

Ah, I see it's more complicated. If I understand RFC 2616 correctly 
(section 10.3), suspending a PUT request in response to a redirect is 
incorrect. But so is changing a POST request to a GET in response to 
anything but a 303. But that's expected (and the current behaviour of Wget):


  Note: When automatically redirecting a POST request after
  receiving a 301 status code, some existing HTTP/1.0 user agents
  will erroneously change it into a GET request.

I give up.

Regards,

Gijs



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Giuseppe Scrivano
Gijs van Tulder  writes:

> Hi Darshit,
>
> Darshit Shah wrote:
>> Unless we explicitly check for opt.method = POST, this will cause a lot
>> of issues. Since the macro is being called on every redirect, even
>> HEAD/PUT/OPTIONS headers will get suspended to give way to GET. That is
>> not the behaviour we want.
>
> Ah, I see it's more complicated. If I understand RFC 2616 correctly
> (section 10.3), suspending a PUT request in response to a redirect is
> incorrect. But so is changing a POST request to a GET in response to
> anything but a 303. But that's expected (and the current behaviour of
> Wget):

Thanks you both for the feedback.  Indeed, my first patch doesn't
consider this case.

RFC 2606 doesn't seem very clear about it, and I can't find anywhere
that PUT/OPTIONS/ANYTHING should be handled differently than POST wrt
redirections.  I don't see why suspending a PUT request would be
incorrect.  Darshit, do you have any pointer?

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-01 Thread Daniel Stenberg

On Thu, 2 May 2013, Giuseppe Scrivano wrote:

RFC 2606 doesn't seem very clear about it, and I can't find anywhere that 
PUT/OPTIONS/ANYTHING should be handled differently than POST wrt 
redirections.  I don't see why suspending a PUT request would be incorrect. 
Darshit, do you have any pointer?


Please use the httpbis documents instead of solely relying on RFC2616: 
http://datatracker.ietf.org/doc/draft-ietf-httpbis-p2-semantics/?include_text=1
they're much updated and leaves less to interpret and is now based on many 
years of real-world HTTP use.


While on the subject of redirects, you may also like this: 
http://tools.ietf.org/html/draft-reschke-http-status-308-07


--

 / daniel.haxx.se



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-02 Thread Giuseppe Scrivano
Daniel Stenberg  writes:

> On Thu, 2 May 2013, Giuseppe Scrivano wrote:
>
>> RFC 2606 doesn't seem very clear about it, and I can't find anywhere
>> that PUT/OPTIONS/ANYTHING should be handled differently than POST
>> wrt redirections.  I don't see why suspending a PUT request would be
>> incorrect. Darshit, do you have any pointer?
>
> Please use the httpbis documents instead of solely relying on RFC2616:
> http://datatracker.ietf.org/doc/draft-ietf-httpbis-p2-semantics/?include_text=1
> they're much updated and leaves less to interpret and is now based on
> many years of real-world HTTP use.

thanks for the additional info.

This paragraph:

"The 307 (Temporary Redirect) status code indicates that the target
 resource resides temporarily under a different URI and the user agent
 MUST NOT change the request method if it performs an automatic
 redirection to that URI.  Since the redirection can change over time,
 the client ought to continue using the original effective request URI
 for future requests."

seems to confirm that the patch from Gijs, that keeps the original
method, is correct.

I am going to clean it up and if nobody complains in a few hours, I will
push it.

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-04 Thread Darshit Shah
Hi Giuseppe,

The patch supplied by you adding cmd_uppercase_String seems to break the
header generation since (*q++) causes the pointer to increment before the
c_toupper() method returns.

We should instead increment it only after the statement successfully
returns.
I have also converted the loop to a for loop to reduce number of lines.

One issue I see with the latest commit that suspends the method on
redirection is when using HEAD.
When I use --method=HEAD, I have no intentions of it getting converted to
GET upon redirection. In fact, I don't think wget should even follow the
redirection when opt.method is "HEAD".

On Fri, May 3, 2013 at 12:04 AM, Giuseppe Scrivano wrote:

> Daniel Stenberg  writes:
>
> > On Thu, 2 May 2013, Giuseppe Scrivano wrote:
> >
> >> RFC 2606 doesn't seem very clear about it, and I can't find anywhere
> >> that PUT/OPTIONS/ANYTHING should be handled differently than POST
> >> wrt redirections.  I don't see why suspending a PUT request would be
> >> incorrect. Darshit, do you have any pointer?
> >
> > Please use the httpbis documents instead of solely relying on RFC2616:
> >
> http://datatracker.ietf.org/doc/draft-ietf-httpbis-p2-semantics/?include_text=1
> > they're much updated and leaves less to interpret and is now based on
> > many years of real-world HTTP use.
>
> thanks for the additional info.
>
> This paragraph:
>
> "The 307 (Temporary Redirect) status code indicates that the target
>  resource resides temporarily under a different URI and the user agent
>  MUST NOT change the request method if it performs an automatic
>  redirection to that URI.  Since the redirection can change over time,
>  the client ought to continue using the original effective request URI
>  for future requests."
>
> seems to confirm that the patch from Gijs, that keeps the original
> method, is correct.
>
> I am going to clean it up and if nobody complains in a few hours, I will
> push it.
>
> --
> Giuseppe
>
>


-- 
Thanking You,
Darshit Shah
Research Lead, Code Innovation
Kill Code Phobia.
B.E.(Hons.) Mechanical Engineering, '14. BITS-Pilani


0001-Fix-issue-when-converting-string-to-uppercase.patch
Description: Binary data


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-04 Thread Giuseppe Scrivano
Hi Darshit,


Darshit Shah  writes:

> The patch supplied by you adding cmd_uppercase_String seems to break
> the header generation since (*q++) causes the pointer to increment
> before the c_toupper() method returns. 
>
> We should instead increment it only after the statement successfully
> returns.
> I have also converted the loop to a for loop to reduce number of
> lines. 

while I agree that your patch makes the code more readable and it is
good to apply it, I don't see how the previous version was broken.  What
compiler have you used?


> One issue I see with the latest commit that suspends the method on
> redirection is when using HEAD. 
> When I use --method=HEAD, I have no intentions of it getting converted
> to GET upon redirection. In fact, I don't think wget should even
> follow the redirection when opt.method is "HEAD".

Good catch!  Maybe we should handle --method=HEAD as opt.spider?

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-04 Thread Darshit Shah
Hi Giuseppe,

while I agree that your patch makes the code more readable and it is
> good to apply it, I don't see how the previous version was broken.  What
> compiler have you used?
>

This occurs when compiling on gcc 4.8.0. The following output ensues:

Setting --method (method) to head
DEBUG output created by Wget 1.14.43-cf74-dirty on linux-gnu.

URI encoding = ‘UTF-8’
--2013-05-05 07:56:19--  http://www.example.com/
Resolving www.example.com (www.example.com)... 192.0.43.10,
2001:500:88:200::10
Caching www.example.com => 192.0.43.10 2001:500:88:200::10
Connecting to www.example.com (www.example.com)|192.0.43.10|:80...
connected.
Created socket 3.
Releasing 0x01271410 (new refcount 1).

---request begin---
EAD / HTTP/1.1
User-Agent: Wget/1.14.43-cf74-dirty (linux-gnu)
Accept: */*
Host: www.example.com
Connection: Keep-Alive

See how the request has the first character missing? And since --post-data
is also handled internally through this, it breaks all scripts that use
wget to send POST headers.

While on the topic of compilation issues, the linker flag, D_FORTIFY_SOURCE
throws a couple of warnings. Would we be interested in looking into those
and sorting them out?


>
>
Good catch!  Maybe we should handle --method=HEAD as opt.spider?
>
That makes sense.

I also think some more explicit sanity checks are needed in the code. Like
sending data with a HEAD request makes absolutely no sense (does it?).


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-05 Thread Giuseppe Scrivano
Darshit Shah  writes:

> I also think some more explicit sanity checks are needed in the code.
> Like sending data with a HEAD request makes absolutely no sense (does
> it?). 

no, it doesn't, same for GET.

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-05 Thread Darshit Shah
>
> while I agree that your patch makes the code more readable and it is
> good to apply it, I don't see how the previous version was broken.  What
> compiler have you used?
>
> I see you pushed the patch. But have we gotten to the bottom of the cause?
If the result changes with the GCC version, we should probably report it
upstream.

no, it doesn't, same for GET.

True. No sense suspending and setting the same method again. But adding
code to test for the same may add enough overhead to negate most gains
obtained by preventing it.
I feel we should instead ignore processing the --method option if it's
argument is "GET".
Also, I am not aware, but is there any use case where a user may wish to
send body-data with a GET request?


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-05 Thread Giuseppe Scrivano
Darshit Shah  writes:

>while I agree that your patch makes the code more readable and it is
>
>  good to apply it, I don't see how the previous version was broken.
>  What
>  compiler have you used?
>
>I see you pushed the patch. But have we gotten to the bottom of the
>cause? If the result changes with the GCC version, we should probably
>report it upstream.

no, my mistake.  I remembered that it was correct to change the lvalue
on the left side of the assignment but it seems not the case, and that
is an undefined behaviour; good that you hit this problem and fixed it.


>  no, it doesn't, same for GET.
>
>True. No sense suspending and setting the same method again. But adding
>code to test for the same may add enough overhead to negate most gains
>obtained by preventing it.
>I feel we should instead ignore processing the --method option if it's
>argument is "GET".
>Also, I am not aware, but is there any use case where a user may wish
>to send body-data with a GET request?

maybe testing the web server :-)  but let's keep in mind the normal
usage and it seems correct to use "--method GET" or "--method HEAD",
in this case we should raise a warning if the user also specifies a
body-data saying that the body-data is ignored.

Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-06 Thread Darshit Shah
> This paragraph:
>
> "The 307 (Temporary Redirect) status code indicates that the target
>  resource resides temporarily under a different URI and the user agent
>  MUST NOT change the request method if it performs an automatic
>  redirection to that URI.  Since the redirection can change over time,
>  the client ought to continue using the original effective request URI
>  for future requests."
>
> seems to confirm that the patch from Gijs, that keeps the original
> method, is correct.
>
> I am going to clean it up and if nobody complains in a few hours, I will
> push it.
>

I spent some time reading the HTTPbis draft for the update to RFC 2616.
According to HTTPbis, ONLY the POST request should be suspended (which
again is in violation of RFC 2616 but allowed due to popular
implementation).
The PUT method should almost never be redirected, and if it is redirected
the new URI should point to the location where the resource is to be
created/updated.
As has been discussed earlier, HEAD should not even be following the
redirects.
OPTIONS/CONNECT/TRACE should not have a 3xx Response.
While DELETE *may* be redirected, the document did not mention anything
explicitly about it. From what I gauge, we should not suspend a DELETE
command upon a redirect.

Also, this reminds me, when a user specifies --method HEAD, that implies
s/he wishes to see the Response provided by the server and hence in this
case the headers should be printed in ALL outputs (quiet, verbose and
debug). Same with TRACE/OPTIONS commands since these are primarily used
when the user wishes to view the Response generated.


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-06 Thread Giuseppe Scrivano
Darshit Shah  writes:

> While DELETE *may* be redirected, the document did not mention
> anything explicitly about it. From what I gauge, we should not suspend
> a DELETE command upon a redirect.

We suspend the post data only when we receive a 307, or do you mean we
shouldn't suspend in this case too?


> Also, this reminds me, when a user specifies --method HEAD, that
> implies s/he wishes to see the Response provided by the server and
> hence in this case the headers should be printed in ALL outputs
> (quiet, verbose and debug). Same with TRACE/OPTIONS commands since
> these are primarily used when the user wishes to view the Response
> generated. 

The --method patch has opened a can of worms and now we have to consider
literally any HTTP method and how it make more sense for the users.
I think that very few people are going to abuse --method with
TRACE/OPTIONS and we can consider these cases later, if they really hurt
someone.  On the other hand, it will be nice to handle separately HEAD.

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-06 Thread Darshit Shah
>
> We suspend the post data only when we receive a 307, or do you mean we
> shouldn't suspend in this case too?
>
> It's the other way round. A 307 response code is used when the server
wishes to explicitly ask the client to not suspend. And that is also the
current behaviour.

We currently suspend any method if the response code is a non 307 redirect.
That I believe is wrong. We should only be suspending the POST method and
not others.


> The --method patch has opened a can of worms and now we have to consider
> literally any HTTP method and how it make more sense for the users.
> I think that very few people are going to abuse --method with
> TRACE/OPTIONS and we can consider these cases later, if they really hurt
> someone.  On the other hand, it will be nice to handle separately HEAD.
>
>
Well yeah. It has opened a whole lot of possibilities that we must keep
considering. I will handle it separately for HEAD and send in a patch soon.
However, I was thinking, that since the new --method command needs a lot of
work specific to itself and that it may only increase in the future, we
should handle that code separately? Maybe a set_method_opts() function?

-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-08 Thread Giuseppe Scrivano
Darshit Shah  writes:

> 
> We suspend the post data only when we receive a 307, or do you
> mean we
> shouldn't suspend in this case too?
> 
> It's the other way round. A 307 response code is used when the server
> wishes to explicitly ask the client to not suspend. And that is also
> the current behaviour. 

true, I made some confusion with the terms :-)



> We currently suspend any method if the response code is a non 307
> redirect. That I believe is wrong. We should only be suspending the
> POST method and not others. 

It sounds good, but we must be careful also with 303.

Do you agree that the matrix for the method to use on the next request
should look this way?

 original
  method ->  POST   FOO  HEAD
response
status
 |
 \/
301  GETFOO(terminate)
302  GETFOO(terminate)
303  GETGET(terminate)
307  POST   FOO(terminate)


> Well yeah. It has opened a whole lot of possibilities that we must
> keep considering. I will handle it separately for HEAD and send in a
> patch soon. However, I was thinking, that since the new --method
> command needs a lot of work specific to itself and that it may only
> increase in the future, we should handle that code separately? Maybe a
> set_method_opts() function?

we can refactor if/when we really need it, no need to overengineer the
code now.  Do you already have any of such "opts" in mind?

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-05-08 Thread Darshit Shah
> > We currently suspend any method if the response code is a non 307
> > redirect. That I believe is wrong. We should only be suspending the
> > POST method and not others.
>
> It sounds good, but we must be careful also with 303.
>
> Do you agree that the matrix for the method to use on the next request
> should look this way?
>
>  original
>   method ->  POST   FOO  HEAD
> response
> status
>  |
>  \/
> 301  GETFOO(terminate)
> 302  GETFOO(terminate)
> 303  GETGET(terminate)
> 307  POST   FOO(terminate)
>
> Looks good to me. This will require some work so that the code remains
maintainable later.
I implemented setting opt.spider when invoked with --method head. But
spider mode currently follows all redirects till it receives a 200 OK. This
does not match with the intended behaviour.


> > Well yeah. It has opened a whole lot of possibilities that we must
> > keep considering. I will handle it separately for HEAD and send in a
> > patch soon. However, I was thinking, that since the new --method
> > command needs a lot of work specific to itself and that it may only
> > increase in the future, we should handle that code separately? Maybe a
> > set_method_opts() function?
>
> we can refactor if/when we really need it, no need to overengineer the
> code now.

Let's keep it for later then.

 Do you already have any of such "opts" in mind?
>
Just the code that handles post-body/data and now for HEAD.



-- 
Thanking You,
Darshit Shah


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-06-14 Thread Darshit Shah
Attaching a patch that should (hopefully) fix all the problems listed here.
I have tested it and it looks good to me.

On Thu, May 9, 2013 at 9:55 AM, Darshit Shah  wrote:

>
>  > We currently suspend any method if the response code is a non 307
>> > redirect. That I believe is wrong. We should only be suspending the
>> > POST method and not others.
>>
>> It sounds good, but we must be careful also with 303.
>>
>> Do you agree that the matrix for the method to use on the next request
>> should look this way?
>>
>>  original
>>   method ->  POST   FOO  HEAD
>> response
>> status
>>  |
>>  \/
>> 301  GETFOO(terminate)
>> 302  GETFOO(terminate)
>> 303  GETGET(terminate)
>> 307  POST   FOO(terminate)
>>
>> Looks good to me. This will require some work so that the code remains
> maintainable later.
> I implemented setting opt.spider when invoked with --method head. But
> spider mode currently follows all redirects till it receives a 200 OK. This
> does not match with the intended behaviour.
>
>
>> > Well yeah. It has opened a whole lot of possibilities that we must
>> > keep considering. I will handle it separately for HEAD and send in a
>> > patch soon. However, I was thinking, that since the new --method
>> > command needs a lot of work specific to itself and that it may only
>> > increase in the future, we should handle that code separately? Maybe a
>> > set_method_opts() function?
>>
>> we can refactor if/when we really need it, no need to overengineer the
>> code now.
>
> Let's keep it for later then.
>
>  Do you already have any of such "opts" in mind?
>>
> Just the code that handles post-body/data and now for HEAD.
>
>
>
> --
> Thanking You,
> Darshit Shah
>
>


-- 
Thanking You,
Darshit Shah


0001-Follow-RFC-2616-and-httpbis-specifications-when-hand.patch
Description: Binary data


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-06-16 Thread Giuseppe Scrivano
thanks for your contribution, some comments inline:

Darshit Shah  writes:

> -  if (statcode == HTTP_STATUS_TEMPORARY_REDIRECT)
> -return NEWLOCATION_KEEP_POST;
> +  switch (statcode)
> +  {
> +case HTTP_STATUS_TEMPORARY_REDIRECT:
> +  return NEWLOCATION_KEEP_POST;
> +  break;
> +case HTTP_STATUS_MOVED_PERMANENTLY:
> +  if(opt.method && strcasecmp(opt.method,"post")!=0)
> +return NEWLOCATION_KEEP_POST;
> +  break;
> +case HTTP_STATUS_MOVED_TEMPORARILY:
> +  if(opt.method && strcasecmp(opt.method,"post")!=0)
> +return NEWLOCATION_KEEP_POST;
> +  break;

please follow the GNU Coding standards here.  Empty space between
function name and '(' and also leave spaces around the operator !=.

> +  if (opt.method && strcasecmp(opt.method,"HEAD")==0)
> +setoptval("spider","1","spider");

same here.  Also leave a space after the commas.

I have seen these errors in other places too, please double check the
patch.

Could you please also add some comments in the documentation?

We can skip tests for now, since you are going to rewrite the unit tests
for your Summer of Code, but we will probably want something to stress
these features too.

-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-06-16 Thread Darshit Shah
> please follow the GNU Coding standards here.  Empty space between
> function name and '(' and also leave spaces around the operator !=.
>
> Sorry for missing these. Fixed in all the relevant locations in the patch.


> Could you please also add some comments in the documentation?
>
Added the changes in wget.texi. Also fixed formatting where two whitespaces
were used between sentences and a couple of typos.


> We can skip tests for now, since you are going to rewrite the unit tests
> for your Summer of Code, but we will probably want something to stress
> these features too.
>
 Yes, I intend to write rigorous tests for these features.

-- 
Thanking You,
Darshit Shah


0001-Follow-RFC-2616-and-httpbis-specifications-when-hand.patch
Description: Binary data


Re: [Bug-wget] Segmentation fault with current development version of wget

2013-06-16 Thread Giuseppe Scrivano
Thanks, the code seems correct but since you are going to write a lot of
code during your Summer of Code project, I will be very pedantic.  This
time I have just amended the changes before push your patch.


Darshit Shah  writes:

>  Please be aware that Wget needs to know the size of the POST data in
> -advance.  Therefore the argument to @code{--post-file} must be a regular

we put two empty spaces after a sentence.  I have fixed this change in
other places too.


> +  switch (statcode)
> +  {

I have indented the '{' correctly.  Two empty spaces.  Same for the
closing curly bracket.


> +  if ((!(*dt & RETROKF) && !opt.content_on_error) || head_only || 
> (opt.method && strcasecmp (opt.method, "GET") != 0))

long line, I have splitted it.


-- 
Giuseppe



Re: [Bug-wget] Segmentation fault with current development version of wget

2013-06-17 Thread Darshit Shah
On Mon, Jun 17, 2013 at 2:05 AM, Giuseppe Scrivano wrote:

> Thanks, the code seems correct but since you are going to write a lot of
> code during your Summer of Code project, I will be very pedantic.  This
> time I have just amended the changes before push your patch.
>
>
> Darshit Shah  writes:
>
> >  Please be aware that Wget needs to know the size of the POST data in
> > -advance.  Therefore the argument to @code{--post-file} must be a regular
>
> we put two empty spaces after a sentence.  I have fixed this change in
> other places too.
>
>
> > +  switch (statcode)
> > +  {
>
> I have indented the '{' correctly.  Two empty spaces.  Same for the
> closing curly bracket.
>
>
> > +  if ((!(*dt & RETROKF) && !opt.content_on_error) || head_only ||
> (opt.method && strcasecmp (opt.method, "GET") != 0))
>
> long line, I have splitted it.
>

Thanks! And sorry for the goof ups in the formatting. I'm getting rusty.
Although, I must admit I wasn't aware of the convention to use two spaces
after each sentence and thought it was a mistake by someone.


-- 
Thanking You,
Darshit Shah