WWW::RobotRules attempts to trim the robot's User-Agent before comparing
it with the User-agent field of a robots.txt file:
    # Strip it so that it's just the short name.
    # I.e., "FooBot"                                      => "FooBot"
    #       "FooBot/1.2"                                  => "FooBot"
    #       "FooBot/1.2 [http://foobot.int; [EMAIL PROTECTED]" => "FooBot"
    delete $self->{'loc'};            # all old info is now stale
    $name = $1 if $name =~ m/(\S+)/;  # get first word
    $name =~ s!/?\s*\d+.\d+\s*$!!;    # loose version
My robot's name is "WDG_SiteValidator/1.5.5". The code above trims the
name to "WDG_SiteValidator/1.": the anchored regex only strips a trailing
two-part "x.y", so with a three-part version it removes the final "5.5"
and leaves "/1." behind. The result no longer matches a robots.txt
User-agent field of "WDG_SiteValidator".
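The failure is easy to reproduce outside the module with the same two
lines of code (a standalone sketch; the variable setup is mine, the
regexes are copied from RobotRules 1.23):

```perl
# Reproduce the version-stripping bug in isolation.
my $name = "WDG_SiteValidator/1.5.5";
$name = $1 if $name =~ m/(\S+)/;  # get first word
$name =~ s!/?\s*\d+.\d+\s*$!!;    # only matches the trailing "5.5"
print "$name\n";                  # prints "WDG_SiteValidator/1."
```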
I've attached a patch against WWW::RobotRules 1.23 that replaces the last
line above with
$name =~ s!/.*!!; # loose version
which seems to cover the various cases correctly.
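A quick standalone check of the patched substitution against the cases
listed in the module's comment plus my robot's name (the bracketed
comment string is shortened to avoid the redacted address):

```perl
# Verify the proposed replacement regex on each User-Agent form.
for my $ua ("FooBot", "FooBot/1.2",
            "FooBot/1.2 [http://foobot.int]",
            "WDG_SiteValidator/1.5.5") {
    my $name = $ua;
    $name = $1 if $name =~ m/(\S+)/;  # get first word
    $name =~ s!/.*!!;                 # drop "/" and everything after it
    print "$ua => $name\n";           # every case yields the short name
}
```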
--
Liam Quinn
--- WWW/RobotRules.pm.orig Sat Aug 17 23:32:07 2002
+++ WWW/RobotRules.pm Thu Sep 11 20:55:39 2003
@@ -254,7 +254,7 @@
delete $self->{'loc'}; # all old info is now stale
$name = $1 if $name =~ m/(\S+)/; # get first word
- $name =~ s!/?\s*\d+.\d+\s*$!!; # loose version
+ $name =~ s!/.*!!; # loose version
$self->{'ua'}=$name;
}
$old;