You can also go to www.webcrawler.com and search for "robots.txt".  A Webcrawler
employee wrote a primer on robots.
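
If you want to see how a crawler actually uses the file, here is a minimal
sketch using Python's urllib.robotparser; the bot name and URLs are just
placeholders:

    import urllib.robotparser

    # Fetch and parse the site's robots.txt.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    # A well-behaved crawler asks before fetching each URL.
    print(rp.can_fetch("ExampleBot", "http://www.example.com/private/"))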

nagendra mishr wrote:

> robots.txt is a list of files that robots should not look at.  You create that
> file at the top of your web tree and list in it the directories and other files
> you want to keep robots away from.
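>
> For instance, a simple robots.txt might look like this (the paths here are
> only examples):
>
>     User-agent: *
>     Disallow: /cgi-bin/
>     Disallow: /private/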
>
> I think the most common tools that use it are the WWW Perl libraries
> (libwww-perl).
>
> If you want to experiment, you can try w3mir.  I know it looks at robots.txt
> before mirroring.
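>
> Something like this should kick off a recursive mirror, though I am going
> from memory on the flag, so check the man page (the URL is just a
> placeholder):
>
>     w3mir -r http://www.example.com/somedir/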
>
> CyberPsychotic wrote:
>
> > I just looked over my weblogs and found several requests like this:
> > crawl4.atext.com - - [22/Nov/1998:09:30:52 -0600] "GET /robots.txt HTTP/1.0"
> > 404 -
> >
> > This seems to be a web crawler, but any ideas what it looks for in that
> > file?
