Cf talkers,
I have a client with a large number of similar sites on a single server running CFMX. Each site is part of a "network" of
sites that all link together, about 150 to 200 sites in all. Each home page has links to other sites in the network.
Periodically, it appears that Google or a similar search engine hits a home page and spiders the links, which of
course leads it to other sites on the server and still more links. This generates (again, this is my hypothesis from
examining the logs and behaviour) concurrent requests for similar pages that all hit the same "news" database (in
Access). SequeLink (the Access service for JRun, I think) quickly locks up trying to service hundreds of requests at once
to the same Access file. The result is a request queue that climbs into the thousands and requires a restart
of the CFMX services.
To fix this issue I am migrating the databases over to SQL Server, which will help greatly with stability, but this will
take a little time, and there is still the problem of keeping a spider from hitting this single server with so
many requests at once. Each site has a pretty well thought out robots.txt file, but it doesn't help because the links
in question point to external sites, not pages on THIS site (even though these external sites are virtuals on the same
server).
I'm considering suggesting that a "mask" be installed for spider agents that removes the absolute links and exposes only
the "internal" links, which are controlled by the robots.txt. I'd like to know:
A) whether, in anyone's experience, my hypothesis may be correct,
B) whether there is anything I should watch out for in masking these links, and
C) whether anyone knows of a link that lists the string values of the various user-agents I'm trying to look for.
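To make the "mask" idea concrete, here is a minimal sketch of what I have in mind. It is written in Python purely for illustration and is not the CFMX implementation. The names `SPIDER_TOKENS`, `is_spider` and `mask_links` are hypothetical, and the token list is an incomplete placeholder (filling it in properly is exactly question C). It assumes the links to other network sites are plain absolute `http://` or `https://` anchors and that internal links are relative.

```python
import re

# Hypothetical, incomplete list of lowercase substrings that show up in
# crawler User-Agent headers. A real list would need to be much longer.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot")

# Matches an <a> element whose href is absolute (http:// or https://)
# and captures the link text so it can be kept as plain text.
ABSOLUTE_ANCHOR = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']https?://[^"\']*["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def is_spider(user_agent):
    """Return True if the User-Agent contains any known spider token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def mask_links(html, user_agent):
    """For spiders, replace absolute anchors with their link text only.

    Relative (internal) links are left alone, so robots.txt still
    governs what the spider may follow on this site.
    """
    if not is_spider(user_agent):
        return html
    return ABSOLUTE_ANCHOR.sub(r"\1", html)
```

In CFMX the same check would presumably run against `CGI.HTTP_USER_AGENT` before the page's link list is output. A regular expression is a blunt tool for HTML, so this sketch would miss links built by script or written with unquoted attributes.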
Any help will be appreciated - thanks!
-Mark
Mark A. Kruger, MCSE, CFG
www.cfwebtools.com
www.necfug.com
http://blog.mxconsulting.com
...what the web can be!