Certainly LWP is widely used, but I think it's an open question as to how
many LWP users use the robots.txt capabilities.  I have used LWP
extensively, but have never bothered with the latter.  My robots target a
handful of sites and really don't recurse, as such, so I just keep an eye on
those sites' policies.  And they tend to be very large, busy sites, so I'm a
mere blip in their stats, I assume... which is not to say that I would
lightly ignore anyone's wishes regarding robots.  But I'm not really doing
the usual search engine robot thing of sucking down every page.  I'm heavily
focused on tools that figure out which pages are most significant, so my
robots behave more like people would... which I hope leaves me a bit more
free.

Going back to the original question... I can't quite see why anyone would
give a robot a name like "Banjo/1.1 [http://nowhere.int/banjo.html
[EMAIL PROTECTED]]".  But if that's the name, then that's what robots.txt
should reference.  A robots.txt that contains a directive for a robot named
"Banjo" is either referring to another robot or has the wrong name.

I think the original poster has confused (conflated, actually) the HTTP
"User-Agent" and "From" headers.

> $ua = LWP::RobotUA->new($agent_name, $from, [$rules])
>
> Your robot's name and the mail address of the human responsible for the
> robot (i.e. you) is required by the constructor.

Create a user-agent object thus:

$ua = LWP::RobotUA->new('Banjo/1.1', 'http://nowhere.int/banjo.html [EMAIL PROTECTED]');

The string that gets compared with robots.txt is "Banjo/1.1".  That's the
HTTP "User-Agent" header.  The second parameter is the HTTP "From" header,
which allows the target site's administrator to find you (easily) if your
robot misbehaves.  The header isn't special to robots: any HTTP client can
send "From" (some clients sent it by default, which caused much controversy
years ago).
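
To make the split concrete, here is a minimal sketch, assuming LWP::RobotUA is installed.  The contact address (banjo-admin@example.org) is made up for illustration and is not from the original post.  It only builds the object and prints the two header values, without making any request:

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Hypothetical contact address (an assumption); use the real maintainer's mailbox.
my $from = 'banjo-admin@example.org';

# First argument: the robot's name, sent as the HTTP "User-Agent" header.
# Second argument: the responsible human's address, sent as the "From" header.
my $ua = LWP::RobotUA->new('Banjo/1.1', $from);

# Nothing is fetched here; we only inspect what the object would send.
printf "User-Agent: %s\n", $ua->agent;
printf "From: %s\n",       $ua->from;
```

Keeping the contact details out of the first argument means the robot's name stays short, which is what a site owner would reasonably write in a robots.txt record.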

From the LWP docs: "The from attribute can be set to the e-mail address of
the person responsible for running the application. If this is set, then the
address will be sent to the servers with every request."

Hope that's reasonably clear.

Nick

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Otis Gospodnetic
> Sent: Thursday, March 14, 2002 8:57 AM
> To: [EMAIL PROTECTED]
> Subject: [Robots] Re: SV: matching and "UserAgent:" in robots.txt
>
>
>
> LWP?  Very popular in a big Perl community.

