LWP::RobotUA won't parse a robots.txt file if the file does not contain "Disallow". The check for "Disallow" is case sensitive, but according to the robot exclusion standard, field names are case insensitive. This causes LWP::RobotUA to ignore some robots.txt files that it should parse.
Attached is a patch that makes the check for "Disallow" case insensitive. The patch is against libwww-perl 5.76 (RobotUA.pm 1.23). -- Liam Quinn
--- LWP/RobotUA.pm.orig	2003-10-24 07:13:03.000000000 -0400
+++ LWP/RobotUA.pm	2004-04-03 17:59:04.000000000 -0500
@@ -126,7 +126,7 @@
     my $fresh_until = $robot_res->fresh_until;
     if ($robot_res->is_success) {
 	my $c = $robot_res->content;
-	if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/) {
+	if ($robot_res->content_type =~ m,^text/, && $c =~ /Disallow/i) {
 	    LWP::Debug::debug("Parsing robot rules");
 	    $self->{'rules'}->parse($robot_url, $c, $fresh_until);
 	}
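For comparison, here is a small sketch of the behavior the standard calls for, using Python's standard-library urllib.robotparser, which lowercases field names before matching them (the example.com URLs are placeholders):

```python
import urllib.robotparser

# Field names with nonstandard capitalization -- legal under the
# robot exclusion standard, since field names are case insensitive.
robots_txt = [
    "USER-AGENT: *",
    "DISALLOW: /private",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt)

# The rules are honored despite the uppercase field names.
print(rp.can_fetch("*", "http://example.com/private/page"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))    # True
```

A parser that required the exact string "Disallow" would miss the rule above entirely, which is the failure mode the patch fixes.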