Re: Bug in LWP::UserAgent?

2001-07-28 Thread Marcus Brito

Gisle Aas wrote:

 What do you think it should do?  We could have the Content-type in the
 head always override the content-type in the response headers, but
 that might throw out information and I don't like that.  We could have
 LWP not override the header, but then you often lose the extra
 charset parameter that is often what is added in this head version
 of the header.  The current way might give you surprises, but it does
 not throw away information.

That's true, but it sure breaks the standard. I think the most sensible 
thing to do is to overwrite the headers with content from the head 
section, as this is just what the head section is intended for.

For security, an overwrite_headers option could be added, so worried 
users can disable this behavior.

--
Pazu




Re: Bug in LWP::UserAgent?

2001-07-27 Thread Bjoern Hoehrmann

* Gisle Aas wrote:
 I believe that this behavior is due to the UserAgent because using telnet I
 do not get multiple 'Content-Type' definitions in the response from the
 server.  The link at the top of this message has more information on the
 matter.  I am working out a workaround in the proxy server, but I wonder if 
 this is not something that should be addressed in the libwww-perl codebase.

What do you think it should do?

HTTP::Headers should have some method to determine whether the body was
parsed or not. Not only usable in this case.
-- 
Björn Höhrmann { mailto:[EMAIL PROTECTED] } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/



Re: Bug in LWP::UserAgent?

2001-07-26 Thread Joshua Vickery

 This is very likely to be what is happening.  You have two options for
 dealing with that:
 
   1) tell LWP not to add headers from the head of the HTML by turning
  off the 'parse_head' attribute:
 
 $ua->parse_head(0);
 
   2) post-process the response to remove the extra header with something like:
 
 $response->content_type(($response->header('Content-Type'))[0]);

These are fine ideas.  I'm sorry I didn't find a reference to them in the 
documentation before I posted here.  I did some post-processing to work around
this, but the first option fits my needs much better.

 What do you think it should do?  We could have the Content-type in the
 head always override the content-type in the response headers, but
 that might throw out information and I don't like that.  We could have
 LWP not override the header, but then you often lose the extra
 charset parameter that is often what is added in this head version
 of the header.  The current way might give you surprises, but it does
 not throw away information.

Well, given that LWP::UserAgent is described as "a class implementing a 
simple World-Wide Web user agent in Perl", I think it might be appropriate
to set parse_head to 0 by default, since standards-compliant HTTP agents
do not parse the head of an HTML document for additional HTTP headers.  
However, perhaps LWP::UserAgent gets used mostly by people building
web clients rather than servers, in which case I can see the value of the
extra processing.  Thank you for such a prompt and thorough reply; I really
appreciate it.  This has been my first real interaction with the free
software community from a position as an active developer, and I must say, 
between this list and bugzilla.mozilla.org I am very, very impressed.

josh


-- 
Joshua Vickery
Grinnell College
14-21
Grinnell IA, 50112
[EMAIL PROTECTED]




Re: Bug in LWP::UserAgent?

2001-07-26 Thread Gisle Aas

Joshua Vickery [EMAIL PROTECTED] writes:

 Well, given that LWP::UserAgent is described as "a class implementing a 
 simple World-Wide Web user agent in Perl", I think it might be appropriate
 to set parse_head to 0 by default, since standards-compliant HTTP agents
 do not parse the head of an HTML document for additional HTTP headers.

The reason 'parse_head' is on by default is that I wanted the
$response->base method to do the right thing by default.  Anyway, I
don't think we can change that at this point.  Too much code would
break.
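
A rough illustration of what 'parse_head' picks up: the sketch below feeds a made-up HTML document to HTML::HeadParser, the module LWP uses for head parsing as far as its documentation describes. The sample HTML and the example.org URL are assumptions, invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::HeadParser;

# Hypothetical document; the URL and charset are invented for this sketch.
my $html = <<'HTML';
<html>
<head>
<title>Example</title>
<base href="http://www.example.org/docs/">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body><p>Hello</p></body>
</html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# <base href> is exposed as a Content-Base header, one of the headers
# that $response->base consults when working out the base URI.
my $base = $parser->header('Content-Base');

# <meta http-equiv> is exposed as a header of the same name. With
# parse_head on, this value ends up next to the server's own Content-Type.
my $content_type = $parser->header('Content-Type');

print "Content-Base: $base\n";
print "Content-Type: $content_type\n";
```

The same mechanism that supplies the base URI is therefore also what adds the second Content-Type value discussed in this thread.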

Regards,
Gisle



Bug in LWP::UserAgent?

2001-07-25 Thread Joshua Vickery

While working with a perl proxy server built in house I found that I was
getting strange behavior from recent builds of Mozilla.   A discussion of
the bug is available here:

http://bugzilla.mozilla.org/show_bug.cgi?id=92140

After digging a little deeper I found that it seems that
LWP::UserAgent->request and LWP::UserAgent->simple_request return invalid
HTTP headers for some web pages.  Here is a simple test script:

==
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

# Fetch the page and dump the whole response, headers included.
my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => 'http://www.math.grin.edu/');

my $response = $ua->request($req);

print "Response is:" . $response->as_string() . "\n";
==

In this case, the response returns two 'Content-Type' fields with two
different values, and according to the folks at Mozilla the BNF
definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple 
values for Content-Type. I suspect that what is happening here is that Perl 
is parsing the HTML and extracting a second Content-Type declaration from one 
of the meta tags in the HTML document, and then storing that as a header.  I 
believe that this behavior is due to the UserAgent because using telnet I
do not get multiple 'Content-Type' definitions in the response from the
server.  The link at the top of this message has more information on the
matter.  I am working out a workaround in the proxy server, but I wonder if 
this is not something that should be addressed in the libwww-perl codebase.

josh



-- 
Joshua Vickery
Grinnell College
14-21
Grinnell IA, 50112
[EMAIL PROTECTED]




Re: Bug in LWP::UserAgent?

2001-07-25 Thread Gisle Aas

Joshua Vickery [EMAIL PROTECTED] writes:

 While working with a perl proxy server built in house I found that I was
 getting strange behavior from recent builds of Mozilla.   A discussion of
 the bug is available here:
 
 http://bugzilla.mozilla.org/show_bug.cgi?id=92140
 
 After digging a little deeper I found that it seems that
 LWP::UserAgent->request and LWP::UserAgent->simple_request return invalid
 HTTP headers for some web pages.  Here is a simple test script:
 
 ==
 #!/usr/bin/perl
 use strict;
 use warnings;
 
 use LWP::UserAgent;
 use HTTP::Request;
 use HTTP::Response;
 
 # Fetch the page and dump the whole response, headers included.
 my $ua  = LWP::UserAgent->new;
 my $req = HTTP::Request->new(GET => 'http://www.math.grin.edu/');
 
 my $response = $ua->request($req);
 
 print "Response is:" . $response->as_string() . "\n";
 ==
 
 In this case, the response returns two 'Content-Type' fields with two
 different values, and according to the folks at Mozilla the BNF
 definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple 
 values for Content-Type. I suspect that what is happening here is that Perl 
 is parsing the HTML and extracting a second Content-Type declaration from one 
 of the meta tags in the HTML document, and then storing that as a header.

This is very likely to be what is happening.  You have two options for
dealing with that:

  1) tell LWP not to add headers from the head of the HTML by turning
 off the 'parse_head' attribute:

$ua->parse_head(0);

  2) post-process the response to remove the extra header with something like:

$response->content_type(($response->header('Content-Type'))[0]);
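
The second option can be sketched without any network access. The example below builds a response by hand with two Content-Type values and keeps only the first. The helper name keep_first_content_type is hypothetical, and the sketch assumes the first value is the one the server sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Response;

# Hypothetical helper: collapse repeated Content-Type values to the first.
# Assumption: the first value is the server's, with head-derived values
# appended after it.
sub keep_first_content_type {
    my ($res) = @_;
    my @types = $res->header('Content-Type');
    $res->header('Content-Type' => $types[0]) if @types > 1;
    return $res;
}

# Hand-built response standing in for what $ua->request would return.
my $res = HTTP::Response->new(200, 'OK');
$res->push_header('Content-Type' => 'text/html');
$res->push_header('Content-Type' => 'text/html; charset=iso-8859-1');

keep_first_content_type($res);

my @remaining = $res->header('Content-Type');
print "Content-Type: $_\n" for @remaining;
```

For a proxy that must forward exactly what the server sent, the first option, calling $ua->parse_head(0) before making requests, avoids the extra header altogether.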

 I believe that this behavior is due to the UserAgent because using telnet I
 do not get multiple 'Content-Type' definitions in the response from the
 server.  The link at the top of this message has more information on the
 matter.  I am working out a workaround in the proxy server, but I wonder if 
 this is not something that should be addressed in the libwww-perl codebase.

What do you think it should do?  We could have the Content-type in the
head always override the content-type in the response headers, but
that might throw out information and I don't like that.  We could have
LWP not override the header, but then you often lose the extra
charset parameter that is often what is added in this head version
of the header.  The current way might give you surprises, but it does
not throw away information.

Regards,
Gisle



Re: BUG in LWP::UserAgent

2001-05-19 Thread Gisle Aas

Steven Kordik [EMAIL PROTECTED] writes:

 The bug is this:
 
 All queries in response to a server redirect should be sent via the GET
 method, not whatever method the original request was.

This is not really true.  This change of request method should only
happen for 303 redirects, but it is probably an improvement to change
LWP to also do this for 302.  This is what the RFC 2616 notes for 302
and 303 say:

  Note: RFC 1945 and RFC 2068 specify that the client is not allowed
  to change the method on the redirected request.  However, most
  existing user agent implementations treat 302 as if it were a 303
  response, performing a GET on the Location field-value regardless
  of the original request method. The status codes 303 and 307 have
  been added for servers that wish to make unambiguously clear which
  kind of reaction is expected of the client.

  Note: Many pre-HTTP/1.1 user agents do not understand the 303
  status. When interoperability with such clients is a concern, the
  302 status code may be used instead, since most user agents react
  to a 302 response as described here for 303. 
  
 Currently, the original request's method is used to request the redirected
 URL.  This is invalid, but rarely an issue since by default LWP::UserAgent
 does not allow redirects for POST method requests.
 
 All modern HTTP clients do quietly, and without informing the user, follow
 server redirects in response to a POST method.  However, they always change
 the request to use the GET method.
 
 To fix this bug in LWP::UserAgent, simply add the following line after line
 275:
 
 $referral->method('GET');
 
 Then, if a user of LWP::UserAgent overrides the POST redirect aversion,
 LWP::UserAgent will behave the way Netscape and IE do, and transform POST
 redirects to GET requests.
 
 As an aside, this will fix most of the complaints that I have seen on this
 list by users unable to construct scripts that automatically log in to places
 like Yahoo and Mail.com... A large percentage of portal sites out there
 redirect POSTs and expect a GET...  I do a lot of scripted web site logins,
 and I encounter this every day...

I agree that it would be a good idea to fix this.  Do you want to try
to make a patch that makes LWP do the right thing for each one of 301,
302, 303 and 307?
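
One possible shape for the per-status decision, as a minimal sketch rather than LWP's actual code: redirect_method is a hypothetical helper, and keeping the original method for 301 and 307 is an assumption of this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of LWP: pick the method to use when
# following a redirect, given the status code and the original method.
sub redirect_method {
    my ($status, $method) = @_;
    $method = uc $method;

    # 303 asks the client to fetch the Location with GET. 302 is treated
    # the same way because that is what most browsers do in practice.
    # HEAD is left alone so a header-only request stays header-only.
    if ($status == 302 || $status == 303) {
        return $method eq 'HEAD' ? 'HEAD' : 'GET';
    }

    # 301 and 307: keep the original method (an assumption of this sketch).
    return $method;
}

for my $status (301, 302, 303, 307) {
    printf "%d: POST -> %s\n", $status, redirect_method($status, 'POST');
}
```

Whether a POST should be followed at all without asking the user is a separate policy question that this sketch does not address.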

Regards,
Gisle



BUG in LWP::UserAgent

2001-05-18 Thread Steven Kordik

I submitted this bug several months ago to this list, but never saw a
response...

The bug is this:

All queries in response to a server redirect should be sent via the GET
method, not whatever method the original request was.

Currently, the original request's method is used to request the redirected
URL.  This is invalid, but rarely an issue since by default LWP::UserAgent
does not allow redirects for POST method requests.

All modern HTTP clients do quietly, and without informing the user, follow
server redirects in response to a POST method.  However, they always change
the request to use the GET method.

To fix this bug in LWP::UserAgent, simply add the following line after line
275:

$referral->method('GET');

Then, if a user of LWP::UserAgent overrides the POST redirect aversion,
LWP::UserAgent will behave the way Netscape and IE do, and transform POST
redirects to GET requests.

As an aside, this will fix most of the complaints that I have seen on this
list by users unable to construct scripts that automatically log in to places
like Yahoo and Mail.com... A large percentage of portal sites out there
redirect POSTs and expect a GET...  I do a lot of scripted web site logins,
and I encounter this every day...

-Steven Kordik