Dave,

That's not what I'm finding.  If you have a robots.txt file that says:

Disallow: /search.cfm

A spider that honors it will not index the search.cfm file at the root of the server. But I cannot find any way to
put in something like this:

Disallow: http://www.someothersite.com

You see what I mean? The robots.txt file only lets you exclude pages on THIS site that you don't want indexed.
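
For reference, the usual form I'm working from is something like this (blocking every spider from that one template):

# applies to all robots
User-agent: *
Disallow: /search.cfm

The path is always relative to the root of the host serving the robots.txt, which is why I don't see how it could point at a URL on another site.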

-Mark
  -----Original Message-----
  From: Dave Watts [mailto:[EMAIL PROTECTED]]
  Sent: Sunday, April 04, 2004 5:23 PM
  To: CF-Talk
  Subject: RE: user agent checking and spidering...

  > SequeLink (the Access connectivity service for JRun, I think) locks up
  > quickly trying to service hundreds of requests at once to
  > the same Access file.

  As a short-term fix, have you considered a more aggressive caching strategy?
  That might be pretty easy to implement.
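
  For example (just a sketch - I'm assuming the spidered pages run a plain
  cfquery against the Access datasource, and the datasource, query and column
  names here are made up), cfquery's cachedwithin attribute would keep
  repeated requests from hitting SequeLink at all for a while:

  <!--- cache the result set in memory for 30 minutes, so repeated
        spider hits reuse it instead of opening a SequeLink connection --->
  <cfquery name="getLinks" datasource="myAccessDSN"
           cachedwithin="#CreateTimeSpan(0, 0, 30, 0)#">
    SELECT linkID, linkURL, linkTitle
    FROM links
  </cfquery>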

  > Each site has a pretty well thought out robots.txt file, but
  > it doesn't help because the links in question are to external
  > sites - not pages on THIS site (even though these external
  > sites are virtuals on the same server).

  I don't think I understand this. It shouldn't matter whether the links are
  internal or external - before a well-written spider requests the link, it
  should check that server's robots.txt file first.
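
  Since those other sites are virtual hosts on the same box, each of them can
  have its own robots.txt in its own document root. If you don't want them
  crawled at all, something as simple as this should do it (a sketch - adjust
  it to whatever you actually do want indexed):

  # robots.txt in the document root of each external virtual host
  User-agent: *
  Disallow: /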

  Dave Watts, CTO, Fig Leaf Software
  http://www.figleaf.com/
  phone: 202-797-5496
  fax: 202-797-5444