Re: Bug in LWP::UserAgent?
Gisle Aas wrote:

> What do you think it should do?
>
> We could have the Content-type in the head always override the content-type in the response headers, but that might throw out information and I don't like that. We could have LWP not override the header, but then you often lose the extra charset parameter that is often what is added in this head version of the header. The current way might give you surprises, but it does not throw away information.

That's true, but it sure breaks the standard. I think the most sensible thing to do is to overwrite the headers with content from the head section, as this is just what the head section is intended for. For security, an overwrite_headers option could be added, so worried users can disable this behavior.

-- Pazu
Re: Bug in LWP::UserAgent?
* Gisle Aas wrote:

>> I believe that this behavior is due to the UserAgent because using telnet I do not get multiple 'Content-Type' definitions in the response from the server. The link at the top of this message has more information on the matter. I am working out a workaround in the proxy server, but I wonder if this is not something that should be addressed in the libwww-perl codebase.
>
> What do you think it should do?

HTTP::Headers should have some method to determine whether the body was parsed or not. That would be useful in more than just this case.

--
Björn Höhrmann { mailto:[EMAIL PROTECTED] } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Re: Bug in LWP::UserAgent?
> This is very likely to be what is happening. You have two options for dealing with that:
>
> 1) tell LWP not to add headers from the head of the HTML by turning off the 'parse_head' attribute:
>
>     $ua->parse_head(0);
>
> 2) post-process the response to remove the extra header with something like:
>
>     $response->content_type(($response->header("Content-Type"))[0]);

These are fine ideas; I'm sorry I didn't find a reference to them in the documentation before I posted here. I did some post-processing to work around this, but the first option fits my needs much better.

> What do you think it should do?
>
> We could have the Content-type in the head always override the content-type in the response headers, but that might throw out information and I don't like that. We could have LWP not override the header, but then you often lose the extra charset parameter that is often what is added in this head version of the header. The current way might give you surprises, but it does not throw away information.

Well, given that LWP::UserAgent is described as "a class implementing a simple World-Wide Web user agent in Perl", I think it might be appropriate to set parse_head to 0 by default, since standards-compliant HTTP agents do not parse the head of an HTML document for additional HTTP headers. However, perhaps LWP::UserAgent gets used mostly by people building web clients rather than servers, in which case I can see the value of the extra processing.

Thank you for such a prompt and thorough reply; I really appreciate it. This has been my first real interaction with the free software community from a position as an active developer, and I must say, between this list and bugzilla.mozilla.org, I am very, very impressed.

josh

--
Joshua Vickery
Grinnell College 14-21
Grinnell IA, 50112
[EMAIL PROTECTED]
Re: Bug in LWP::UserAgent?
Joshua Vickery [EMAIL PROTECTED] writes:

> Well, given that LWP::UserAgent is described as "a class implementing a simple World-Wide Web user agent in Perl", I think it might be appropriate to set parse_head to 0 by default, since standards-compliant HTTP agents do not parse the head of an HTML document for additional HTTP headers.

The reason 'parse_head' is on by default is that I wanted the $response->base method to do the right thing by default. Anyway, I don't think we can change that at this point. Too much code would break.

Regards,
Gisle
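The link between head parsing and $response->base can be sketched offline. The snippet below is only an illustration, assuming that a <base href> found in the document head ends up as a Content-Base header (which is what HTML::HeadParser documents) and that base() prefers that header over the request URI. The URLs and the hand-built response are placeholders, not anything from the thread.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use HTML::HeadParser;

# Hand-built response; the URLs are placeholders for illustration only.
my $response = HTTP::Response->new(200, 'OK');
$response->request(HTTP::Request->new(GET => 'http://www.example.com/a/page.html'));

# With no base information in the headers, base() falls back to the request URI.
my $before = $response->base->as_string;

# Simulate head parsing: let HTML::HeadParser write into the response's headers.
my $parser = HTML::HeadParser->new($response->headers);
$parser->parse('<html><head><base href="http://www.example.com/docs/"></head><body></body></html>');
$parser->eof;

# The <base href> is now visible to base() through the Content-Base header.
my $after = $response->base->as_string;

print "base before head parsing: $before\n";
print "base after head parsing:  $after\n";
```

With parse_head turned off, only the first of these two results would be available, so relative links in a page that declares its own base would be resolved against the wrong URL.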
Bug in LWP::UserAgent?
While working with a Perl proxy server built in house, I found that I was getting strange behavior from recent builds of Mozilla. A discussion of the bug is available here:

http://bugzilla.mozilla.org/show_bug.cgi?id=92140

After digging a little deeper, I found that LWP::UserAgent->request and LWP::UserAgent->simple_request seem to return invalid HTTP headers for some web pages. Here is a simple test script:

==
#!/usr/bin/perl
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

$ua = new LWP::UserAgent;
$req = HTTP::Request->new(GET => 'http://www.math.grin.edu/');
$response = $ua->request($req);
# Dump the status line, headers and body as LWP sees them
print "Response is:" . $response->as_string() . "\n";
==

In this case, the response contains two 'Content-Type' fields with two different values, and according to the folks at Mozilla, the BNF definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple values for Content-Type.

I suspect that what is happening here is that Perl is parsing the HTML, extracting a second Content-Type declaration from one of the meta tags in the HTML document, and then storing that as a header.

I believe that this behavior is due to the UserAgent because using telnet I do not get multiple 'Content-Type' definitions in the response from the server. The link at the top of this message has more information on the matter. I am working out a workaround in the proxy server, but I wonder if this is not something that should be addressed in the libwww-perl codebase.

josh

--
Joshua Vickery
Grinnell College 14-21
Grinnell IA, 50112
[EMAIL PROTECTED]
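The suspected mechanism can be reproduced without touching the network. The sketch below assumes that the head parsing is done by HTML::HeadParser (the thread does not name the module) and uses a made-up page and made-up header values. It shows a meta http-equiv declaration being appended to a header set that already has a Content-Type.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTML::HeadParser;

# Headers as a server might send them (made-up value for illustration).
my $headers = HTTP::Headers->new('Content-Type' => 'text/html');

# Hypothetical page whose <head> carries its own Content-Type declaration.
my $html = <<'HTML';
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Example</title>
</head><body>Hello</body></html>
HTML

# HTML::HeadParser adds what it finds in <head> to the header object it
# was given; it does not remove values that are already present.
my $parser = HTML::HeadParser->new($headers);
$parser->parse($html);
$parser->eof;

# In list context, header() returns every value stored for the field.
my @content_types = $headers->header('Content-Type');
print scalar(@content_types), " Content-Type value(s):\n";
print "  $_\n" for @content_types;
```

Serialising such a header object, as the test script does with as_string(), would then emit one Content-Type line per stored value.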
Re: Bug in LWP::UserAgent?
Joshua Vickery [EMAIL PROTECTED] writes:

> While working with a Perl proxy server built in house, I found that I was getting strange behavior from recent builds of Mozilla. A discussion of the bug is available here:
>
> http://bugzilla.mozilla.org/show_bug.cgi?id=92140
>
> After digging a little deeper, I found that LWP::UserAgent->request and LWP::UserAgent->simple_request seem to return invalid HTTP headers for some web pages. Here is a simple test script:
>
> ==
> #!/usr/bin/perl
> use LWP::UserAgent;
> use HTTP::Request;
> use HTTP::Response;
>
> $ua = new LWP::UserAgent;
> $req = HTTP::Request->new(GET => 'http://www.math.grin.edu/');
> $response = $ua->request($req);
> print "Response is:" . $response->as_string() . "\n";
> ==
>
> In this case, the response contains two 'Content-Type' fields with two different values, and according to the folks at Mozilla, the BNF definition of Content-Type in RFC 2616, Section 14.17 does not allow multiple values for Content-Type.
>
> I suspect that what is happening here is that Perl is parsing the HTML, extracting a second Content-Type declaration from one of the meta tags in the HTML document, and then storing that as a header.

This is very likely to be what is happening. You have two options for dealing with that:

1) tell LWP not to add headers from the head of the HTML by turning off the 'parse_head' attribute:

    $ua->parse_head(0);

2) post-process the response to remove the extra header with something like:

    $response->content_type(($response->header("Content-Type"))[0]);

> I believe that this behavior is due to the UserAgent because using telnet I do not get multiple 'Content-Type' definitions in the response from the server. The link at the top of this message has more information on the matter. I am working out a workaround in the proxy server, but I wonder if this is not something that should be addressed in the libwww-perl codebase.

What do you think it should do?

We could have the Content-type in the head always override the content-type in the response headers, but that might throw out information and I don't like that. We could have LWP not override the header, but then you often lose the extra charset parameter that is often what is added in this head version of the header. The current way might give you surprises, but it does not throw away information.

Regards,
Gisle
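Both workarounds can be tried offline. This is a minimal sketch, not code from libwww-perl: keep_first_content_type is a hypothetical helper name, and the response is built by hand to stand in for what $ua->request($req) might return for a page that declares its own Content-Type.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Option 1: stop LWP from merging <head> content into the response headers.
my $ua = LWP::UserAgent->new;
$ua->parse_head(0);

# Option 2: leave parse_head on and collapse Content-Type afterwards.
# keep_first_content_type is a hypothetical helper, not part of LWP.
sub keep_first_content_type {
    my ($response) = @_;
    my @values = $response->header('Content-Type');
    # Setting the header replaces every existing value with the one given.
    $response->header('Content-Type' => $values[0]) if @values > 1;
    return $response;
}

# Hand-built stand-in for a response with a duplicated header.
my $response = HTTP::Response->new(200, 'OK');
$response->push_header('Content-Type' => 'text/html');
$response->push_header('Content-Type' => 'text/html; charset=iso-8859-1');

keep_first_content_type($response);
my @remaining = $response->header('Content-Type');
print "Content-Type after cleanup: @remaining\n";
```

Keeping only the first value discards the charset parameter that came from the document head, which is exactly the loss of information described above. A proxy that needs the charset could instead keep the last value or merge the parameter into the first.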
Re: BUG in LWP::UserAgent
Steven Kordik [EMAIL PROTECTED] writes:

> The bug is this: All queries in response to a server redirect should be sent via the GET method, not whatever method the original request was.

This is not really true. This change of request method should only happen for 303 redirects, but it is probably an improvement to change LWP to also do this for 302. This is what the RFC 2616 notes for 302 and 303 say:

    Note: RFC 1945 and RFC 2068 specify that the client is not allowed to change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method. The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.

    Note: Many pre-HTTP/1.1 user agents do not understand the 303 status. When interoperability with such clients is a concern, the 302 status code may be used instead, since most user agents react to a 302 response as described here for 303.

> Currently, the original request's method is used to request the redirected URL. This is invalid, but rarely an issue since by default LWP::UserAgent does not allow redirects for POST method requests. All modern HTTP clients do quietly, and without informing the user, follow server redirects in response to a POST method. However, they always change the request to use the GET method.
>
> To fix this bug in LWP::UserAgent, simply add the following line after line 275:
>
>     $referral->method("GET");
>
> Then, if a user of LWP::UserAgent overrides the POST redirect aversion, LWP::UserAgent will behave the way Netscape and IE do, and transform POST redirects to GET requests.
>
> As an aside, this will fix most of the complaints that I have seen on this list by users unable to construct scripts that automatically log in to places like Yahoo and Mail.com... A large percentage of portal sites out there redirect POSTs and expect a GET... I do a lot of scripted web site logins, and I encounter this every day...

I agree that it would be a good idea to fix this. Do you want to try to make a patch that makes LWP do the right thing for each one of 301, 302, 303 and 307?

Regards,
Gisle
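One way to read "the right thing" for each status code is sketched below. This is not the patch that was asked for and not LWP's actual code: redirect_method is a hypothetical helper, and the policy it encodes (treat 302 like 303 as the RFC note says most user agents do, leave 301 and 307 on the original method, never rewrite GET or HEAD) is an assumption drawn from the discussion above.

```perl
use strict;
use warnings;

# Decide which method to use for the request that follows a redirect.
# Assumed policy, not taken from LWP:
#   303, and 302 by common practice -> switch to GET
#   301 and 307                     -> keep the original method
#   GET and HEAD are never rewritten.
sub redirect_method {
    my ($status, $method) = @_;
    $method = uc $method;
    return $method if $method eq 'GET' || $method eq 'HEAD';
    return 'GET'   if $status == 302 || $status == 303;
    return $method;
}

for my $status (301, 302, 303, 307) {
    printf "%d after POST -> %s\n", $status, redirect_method($status, 'POST');
}
```

A real patch would also have to decide what happens to the request body and its content headers when a POST becomes a GET, and whether a 301 or 307 for a POST should be followed at all without confirmation.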
BUG in LWP::UserAgent
I submitted this bug several months ago to this list, but never saw a response...

The bug is this: All queries in response to a server redirect should be sent via the GET method, not whatever method the original request was.

Currently, the original request's method is used to request the redirected URL. This is invalid, but rarely an issue since by default LWP::UserAgent does not allow redirects for POST method requests. All modern HTTP clients do quietly, and without informing the user, follow server redirects in response to a POST method. However, they always change the request to use the GET method.

To fix this bug in LWP::UserAgent, simply add the following line after line 275:

    $referral->method("GET");

Then, if a user of LWP::UserAgent overrides the POST redirect aversion, LWP::UserAgent will behave the way Netscape and IE do, and transform POST redirects to GET requests.

As an aside, this will fix most of the complaints that I have seen on this list by users unable to construct scripts that automatically log in to places like Yahoo and Mail.com... A large percentage of portal sites out there redirect POSTs and expect a GET... I do a lot of scripted web site logins, and I encounter this every day...

-Steven Kordik