On Tue, 2011-01-18 at 22:08 -0500, Chad Bailey wrote:
> I'd disagree that blocking IP's is easier...

Well, when the goal at the time was to stop the immediate load, blocking
the offending IPs would have been much quicker, and thus easier. The
robots.txt file, by contrast, is a broader solution.

> as there is no feasible
> way to know all IP addresses of current and future robots that crawl
> the web, especially with regard to Google not being the only one out
> there.

That is correct, but I was not bothered by any other crawlers, only
Google's. As for the IP addresses of Google's crawlers changing, yes, that
would be a problem, and I would have to build up a list of IPs for Google's
crawlers, which I don't think would be unreasonable to do. I could start by
grepping the logs to get an idea of the IPs used in the past, then look at
the IP blocks and scan for others, since Google sets PTR records on those
IPs that identify them as crawlers ;)

Keep in mind I only wanted to stop the unwanted load from Google's
crawlers. The robots.txt file now affects every crawler and every search
engine, which is really not ideal for a variety of reasons. At some point I
and/or the group will need to revisit the robots.txt file and dial it in so
we can allow some content out to search engines. It just sucks that since
Google's crawler is not wiki friendly, other crawlers must suffer and pay
the price as well.

The IP-blocking approach would have been specific to Google. Then we would
not have to worry about allowing certain pages/areas of the wiki, denying
others, etc. All of this is because of Google, but I can't easily come up
with a Google-specific solution using the robots.txt file.

Not to mention it took several hours for Google's crawlers to discover the
robots.txt file. Though I did find out it had been looked for previously,
about 8 hours before I created the one that exists now.

But I still do not see the current solution as simple, elegant, or friendly
to all other crawlers/search engines. It is more of a band-aid, a temporary
fix, than any sort of real solution.

--
William L. Thomson Jr.
Systems Administrator
Jacksonville Linux Users Group
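
For anyone who wants to try the log-grepping and PTR idea described above, here is a minimal sketch, not a tested production tool. It assumes an Apache/Nginx "combined" access log where the client IP is the first field and the user agent is the last quoted field. It also assumes that genuine Google crawler PTR names end in `.googlebot.com` or `.google.com`. The function names `candidate_ips` and `is_google_crawler` are made up for illustration. The resolvers are passed in as parameters so the logic can be exercised without touching DNS.

```python
import re
import socket
from collections import Counter

# Assumption: "combined" log format, i.e. client IP first, user agent as the
# last double-quoted field on the line.
LOG_LINE = re.compile(r'^(\S+) .* "([^"]*)"$')


def candidate_ips(lines, needle="Googlebot"):
    """Count requests per client IP whose user agent string mentions `needle`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and needle in match.group(2):
            counts[match.group(1)] += 1
    return counts


def is_google_crawler(ip,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex,
                      suffixes=(".googlebot.com", ".google.com")):
    """Check the PTR name for `ip`, then confirm the name resolves back to `ip`.

    The suffix list is an assumption; adjust it to whatever the crawler
    operator documents. The default forward resolver handles IPv4 only.
    """
    try:
        hostname = reverse(ip)[0]            # PTR lookup
        if not hostname.endswith(suffixes):  # user agents can be spoofed, PTR names less so
            return False
        return ip in forward(hostname)[2]    # forward-confirm the PTR name
    except OSError:                          # covers socket.herror and socket.gaierror
        return False
```

Feeding the open log file to `candidate_ips` and then running each resulting IP through `is_google_crawler` would give a starting list for a firewall rule. The forward-confirmation step matters because anyone can claim to be Googlebot in the user agent header, and a PTR record alone is controlled by whoever owns the IP block.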

