> That's not what I'm finding. If you have a robots.txt file
> that says:
>
> Disallow: /search.cfm
>
> It will not index the search.cfm file from the root of the
> server. But I cannot find anywhere where you can put in
> something like this:
>
> Disallow: http://www.someothersite.com
>
>
> You see what I mean? The robots.txt file allows you to exclude
> pages on THIS site that you don't want indexed.
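For reference, a complete robots.txt needs a User-agent line before the Disallow rules, and each directive takes a colon. Something like this at the root of your own site is the standard form:

```
User-agent: *
Disallow: /search.cfm
```

The Disallow value is always a path on the site serving the file, which is why there's no way to name another host there.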
No, you can't use robots.txt to disallow spidering of other sites. However,
if your site has a link to www.someothersite.com, a well-written spider
should request http://www.someothersite.com/robots.txt before requesting
that URL.
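To illustrate what that well-written spider does with the rules once it has fetched them, here's a minimal sketch using Python's standard-library urllib.robotparser (the robots.txt content below is hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules the spider would have fetched from
# http://www.someothersite.com/robots.txt before crawling that site.
robots_txt = """\
User-agent: *
Disallow: /search.cfm
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Before requesting any URL on the site, the spider checks it
# against the parsed rules.
print(rp.can_fetch("*", "http://www.someothersite.com/search.cfm"))  # False
print(rp.can_fetch("*", "http://www.someothersite.com/index.cfm"))   # True
```

A spider that skips this check still works, of course; honoring robots.txt is a convention, not something the server can enforce.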
Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
phone: 202-797-5496
fax: 202-797-5444