This is not about dirvish, but about the website. Perhaps some of you sysadmins can help.
You may occasionally see the dirvish.org website stop responding to web requests. dirvish.org runs on my virtual machine at RimuHosting in Dallas, along with half a dozen other low-traffic sites. Some of those other sites host lectures and videos, about 5GB of content in total.

Baidu, the Chinese search engine, spiders the net every 15 minutes looking for changes, which means it attempts to download 20GB an hour from my server. Sometimes it does not complete the requests in time, and they pile up. During the last slowdown, netstat reported 140 open connections to Baiduspider, many of them for big files. Apache stopped accepting most new requests, and browsers timed out.

As a temporary measure, I've disallowed Baiduspider in robots.txt for all my sites. Over time I will move the videos and large files to free file-hosting services. But I want to keep serving China's 20% of the world's population with reasonably up-to-date search results.

So, the question: is there any way to tell the search spiders to visit once a day or once a week, rather than four times per hour? Or to send them "recent changes" lists instead of having them repeatedly download the same files? Any other ideas for calming down the web crawlers?

Keith

-- 
Keith Lofstrom
Voice (503)-520-1993
KLIC - Keith Lofstrom Integrated Circuits - "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs
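For reference, here is a back-of-the-envelope check of the load figures in the message, as a small Python sketch. It assumes decimal units (1 GB = 10^9 bytes) and that every 15-minute crawl pass re-fetches the full 5GB; the post does not state either precisely. It also computes the average bandwidth at the once-a-day rate asked about, for comparison.

```python
# Back-of-the-envelope check of the crawl load described in the message.
# Assumptions (not stated exactly in the post): decimal units
# (1 GB = 10**9 bytes) and every crawl pass re-downloads all 5 GB.

CONTENT_GB = 5           # total content across the sites, per the post
CRAWL_INTERVAL_MIN = 15  # Baidu revisit interval, per the post


def hourly_volume_gb(content_gb: float, interval_min: float) -> float:
    """Data volume per hour if the whole site is fetched once per interval."""
    passes_per_hour = 60 / interval_min
    return content_gb * passes_per_hour


def sustained_mbit_per_s(gb_per_hour: float) -> float:
    """Average bandwidth in megabits per second needed to serve gb_per_hour."""
    bits_per_hour = gb_per_hour * 10**9 * 8
    return bits_per_hour / 3600 / 10**6


if __name__ == "__main__":
    per_hour = hourly_volume_gb(CONTENT_GB, CRAWL_INTERVAL_MIN)
    print(f"{per_hour:.0f} GB/hour")
    print(f"{sustained_mbit_per_s(per_hour):.1f} Mbit/s sustained")

    # Same content, but fetched once per day (24 * 60 minutes) instead.
    daily = hourly_volume_gb(CONTENT_GB, 24 * 60)
    print(f"{sustained_mbit_per_s(daily):.3f} Mbit/s at one visit per day")
```

At four passes an hour the crawler alone needs roughly 44 Mbit/s of sustained outbound bandwidth, versus under 0.5 Mbit/s at one visit per day.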
