CF-Talkers,

I have a client with many, many similar sites on a single server running CFMX. Each site is part of a "network" of
sites that all link to one another, about 150 to 200 sites in all, and each home page has links to other sites in the
network. Periodically, it appears that Google or a similar search engine hits a home page and spiders the links, which
of course leads it to other sites on the server and to still more links. This generates (again, this is my hypothesis
from examining the logs and the server's behaviour) concurrent requests for similar pages that all hit the same "news"
database (in Access). SequeLink (the Access service for JRun, I think) quickly locks up trying to service hundreds of
simultaneous requests against the same Access file. The request queue then climbs into the thousands, and the CFMX
services have to be restarted.

To fix this issue I am migrating the databases to SQL Server, which will help greatly with stability, but this will
take a little time, and there is still the problem of keeping a spider from hitting this single server with so many
requests at once. Each site has a well-thought-out robots.txt file, but it doesn't help because the links in question
point to external sites, not to pages on THIS site (even though those external sites are virtual hosts on the same
server).

I'm considering suggesting that a "mask" be installed for spider user agents that strips out the absolute links and
exposes only the "internal" links, which are controlled by the robots.txt. I have three questions:

A) In anyone's experience, is my hypothesis likely to be correct?

B) Is there anything I should watch out for in masking these links?

C) Does anyone know of a link that lists the string values of the various user agents I'm trying to look for?
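A minimal sketch of the masking idea, written in Python rather than CFML purely for illustration: it checks the
User-Agent header for a few crawler tokens and, for spiders only, strips anchors whose href points at another host,
leaving relative and same-host links alone. The token list (SPIDER_TOKENS), the function names, and the host names in
the usage example are hypothetical and not exhaustive, and the regex naively assumes simple, well-formed anchors. In
the real sites the check would presumably be done in CFML against CGI.HTTP_USER_AGENT, with a maintained list of
user-agent strings.

```python
import re
from urllib.parse import urlparse

# Hypothetical, NOT exhaustive: lowercase substrings found in some crawler User-Agent headers.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot", "bingbot", "spider", "crawler")

# Naive pattern for <a ... href="...">text</a>; assumes well-formed, non-nested anchors.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def is_spider(user_agent):
    """Return True if the User-Agent string contains one of the crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def mask_external_links(html, own_host):
    """Replace anchors that point at other hosts with their plain link text."""
    own_host = own_host.lower()

    def replace(match):
        href, text = match.group(1), match.group(2)
        host = urlparse(href).hostname or ""  # hostname is None for relative links
        if host and host != own_host:
            return text  # absolute link to another site: drop the <a> wrapper
        return match.group(0)  # relative or same-host link: keep unchanged

    return ANCHOR_RE.sub(replace, html)


def render_for(user_agent, html, own_host):
    """Serve masked HTML to spiders and the unchanged page to everyone else."""
    return mask_external_links(html, own_host) if is_spider(user_agent) else html
```

For example, with own_host set to "www.example.com", a Googlebot request for a page containing
`<a href="/news.cfm">News</a> <a href="http://www.sister-site.com/">Sister</a>` would receive
`<a href="/news.cfm">News</a> Sister`, while an ordinary browser would get the page unchanged.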

Any help will be appreciated - thanks!

-Mark

Mark A. Kruger, MCSE, CFG
www.cfwebtools.com
www.necfug.com
http://blog.mxconsulting.com
...what the web can be!