Let's say you are running a robot over a site http://www.megacorp.com that has this robots.txt file:

> User-Agent: *
> Disallow: http://www.gigacorp.com
> Disallow: /cgi-bin/

WWW::RobotRules interprets this as forbidding all access to http://www.megacorp.com.

In WWW::RobotRules::parse, the relative URL from the Disallow field is converted to an absolute URL with this line:

    $disallow = URI->new($disallow, $url)->path_query;

However, URI->new("http://www.gigacorp.com", "http://www.megacorp.com") returns "http://www.gigacorp.com", whose path component is "/". So $disallow above gets assigned "/"; in other words, the entire hierarchy is forbidden, which is certainly not right.

I've patched my copy of RobotRules.pm by adding:

    next if $disallow =~ /^http/;

I believe this conforms to the robots.txt specification and is a good idea.

Tony
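
To make the problem easy to reproduce, here is a small self-contained script using the standard WWW::RobotRules interface (parse() and allowed()); the robot name and URLs are just placeholders taken from the example above:

    #!/usr/bin/perl
    use strict;
    use WWW::RobotRules;

    # Parse the example robots.txt as if it had been fetched from megacorp.
    my $rules = WWW::RobotRules->new('MyRobot/1.0');

    my $robots_txt = <<'EOT';
    User-Agent: *
    Disallow: http://www.gigacorp.com
    Disallow: /cgi-bin/
    EOT

    $rules->parse('http://www.megacorp.com/robots.txt', $robots_txt);

    # Only /cgi-bin/ should be off limits; with the behaviour described
    # above, both URLs come out forbidden.
    for my $url ('http://www.megacorp.com/index.html',
                 'http://www.megacorp.com/cgi-bin/foo') {
        print "$url: ", ($rules->allowed($url) ? "allowed" : "FORBIDDEN"), "\n";
    }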
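
For what it's worth, a slightly stricter variant of the same idea would be to keep a full-URL Disallow line when it points back at the same host and only drop it when the host differs. This is only a sketch of that alternative, not code from RobotRules.pm, and normalize_disallow is a made-up helper name:

    use strict;
    use URI;

    # Sketch only: turn a Disallow value into a local path, or return undef
    # if the rule cannot apply to the site the robots.txt came from.
    sub normalize_disallow {
        my ($disallow, $robots_url) = @_;
        if ($disallow =~ m{^[a-z][a-z0-9+.\-]*://}i) {   # looks like a full URL
            my $u    = URI->new($disallow);
            my $base = URI->new($robots_url);
            # Foreign host: drop the rule, like the one-line patch above,
            # except that a full URL on the *same* host is still honoured.
            return undef unless $u->can('host') && lc($u->host) eq lc($base->host);
            my $path = $u->path_query;
            return length $path ? $path : "/";
        }
        return $disallow;    # already a path; use it as-is
    }

    # Example: the gigacorp rule is dropped, the cgi-bin rule is kept.
    # normalize_disallow('http://www.gigacorp.com', 'http://www.megacorp.com/robots.txt')  => undef
    # normalize_disallow('/cgi-bin/',               'http://www.megacorp.com/robots.txt')  => '/cgi-bin/'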
