Hello,

I have a problem with "$ua->max_size()", which can really choke you in some cases. When LWP makes a request with max_size set, it appears to send a Range request. Some servers are very slow to respond to this type of request, and they often return a 206 Partial Content response. Sometimes that response comes back with "Content-Type: multipart/mixed" and a boundary="--bla,bla,bla". That makes it hard to tell what the content actually is (i.e., text/html, image/gif and so on), so a lot more processing is needed to work out the type and whether or not it is acceptable. For example:

<--snip-->
my $url = 'http://search.cpan.org/';
my $max_content = 500;

require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->max_size($max_content);

my $response = $ua->get($url);

<--snip-->
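
To check whether a given server is reacting this way, a small helper along these lines can summarize the response. This is only a sketch: summarize_response is a hypothetical name, and it assumes LWP adds a "Client-Aborted: max_size" header when it truncates a body, as the LWP::UserAgent documentation for max_size describes.

```perl
use strict;
use warnings;

# Hypothetical helper: works on any object offering code() and header(),
# such as the HTTP::Response object returned by $ua->get.
sub summarize_response {
    my ($response) = @_;
    my $type    = $response->header('Content-Type')   // '';
    my $aborted = $response->header('Client-Aborted') // '';
    return {
        partial   => ($response->code == 206      ? 1 : 0),  # 206 Partial Content
        multipart => ($type =~ m{^multipart/}i    ? 1 : 0),  # real type is hidden in the parts
        truncated => ($aborted eq 'max_size'      ? 1 : 0),  # LWP cut the body short itself
        type      => $type,
    };
}
```

With the request above, passing $response to summarize_response would show whether the 206 and multipart behaviour occurred for that particular URL.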

I'm sorry I have not included a URL where this trouble shows up; I stopped using $ua->max_size() some time ago, but now I have a need for it again. The problem is that some servers take forever to respond to this kind of request and often cause the problems described above. My solution was to use a callback instead:

<--snip-->
my $result = '';
my $url = 'http://search.cpan.org/';
my $max_content = 500;

require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);

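# ':read_size_hint' suggests the chunk size passed to the callback
my $response = $ua->get($url,
  ':content_cb'     => \&http_callback,
  ':read_size_hint' => $max_content + 1,
);

sub http_callback {
  my ($data, $response, $protocol) = @_;
  $result .= $data;
  # Abort the download once more than $max_content bytes have arrived
  die "max content exceeded\n" if length($result) > $max_content;
  return;
}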
<--snip-->
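
The same idea can be written without a global $result by wrapping the buffer in a closure. The following is a minimal sketch, not a drop-in replacement: make_limited_collector is a hypothetical name, and the chunks are fed in by hand so the abort logic can be tried without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback plus a reference to the buffer it fills.
sub make_limited_collector {
    my ($max_bytes) = @_;
    my $buffer = '';
    my $callback = sub {
        my ($chunk) = @_;    # LWP also passes the response and protocol objects
        $buffer .= $chunk;
        if (length($buffer) > $max_bytes) {
            # Keep only the first $max_bytes bytes, then abort the transfer.
            $buffer = substr($buffer, 0, $max_bytes);
            die "max content exceeded\n";
        }
        return;
    };
    return ($callback, \$buffer);
}

# Simulate chunked delivery by hand instead of fetching a URL.
my ($cb, $buf_ref) = make_limited_collector(10);
my $ok = eval { $cb->($_) for ('abcdef', 'ghijkl', 'mnopqr'); 1 };
print "aborted: ", ($ok ? 'no' : 'yes'), ", kept: $$buf_ref\n";
```

In real use the callback would be handed to LWP as the ':content_cb' argument of $ua->get. According to the LWP::UserAgent documentation, the message passed to die then shows up in the "X-Died" header of the returned response, which is one way to tell a deliberate abort from a network failure.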

While this is not perfect, it solved all of the above issues. Servers respond very quickly and the Content-Type headers are untouched (i.e., text/html).

My request to you is to change the way "$ua->max_size($max_content);" works. It would benefit me, and I'm sure many others, if it worked more like the callback shown above (just stop the download at x bytes). It would then behave more like a browser does when you click the Stop button. Requests would be fast and the server would reply with all header information as expected. It would also allow us to use "LWP::Parallel::RobotUA", which my callback example above does not.

So why is "$ua->max_size($max_content);" so useful? Well, some people like to create terabyte-sized files and feed them to the robot just to see if they can crash the server. Using the current "$ua->max_size($max_content);" slows everything way down and comes with the extra baggage of a 206 response and a multipart/mixed content type. The callback solves all these issues, but will not work with "LWP::Parallel".

Thanks for listening,
John

