On Wed, 14 Jan 2009, Shane Beers wrote:

> We had an issue with our local Google instance crawling our DSpace
> installation and causing huge issues. I re-wrote the robots.txt to disallow
> anything besides the item pages themselves - no browsing pages or search
> pages and whatnot. Here is a copy of ours:
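(The copy itself didn't come through here, but a robots.txt along the lines Shane describes might look like the sketch below. The exact paths are illustrative assumptions - DSpace's browse and search URL patterns vary by version and UI - and note that `Allow` is honoured by Googlebot but not by every crawler.)

```
User-agent: *
# Block the expensive, effectively infinite browse/search URL spaces
Disallow: /browse
Disallow: /browse-title
Disallow: /browse-author
Disallow: /browse-date
Disallow: /simple-search
Disallow: /advanced-search
# Item pages and bitstreams stay crawlable
Allow: /handle/
Allow: /bitstream/
```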
We've had to do that for years; without it, DSpace just crumbles under the load. I've got a small Perl script which generates a flat HTML file with links to all our item pages, and we put a link to that in the footer. That way we can block all the browse pages, but not items or bitstreams, and still get indexed. DSpace 1.x has major scalability issues, alas, no matter how much hardware you throw at it.

Best,

--
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 14/01/2009 : The Moon is Waning Gibbous (83% of Full)
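(For anyone wanting to replicate the flat-file approach: the idea is just to emit one static HTML page listing every item's handle URL, and link it from the site footer so crawlers find it. Tom's actual script is Perl and isn't shown here; this is a hypothetical Python equivalent, with the function name, base URL, and input shape all invented for illustration.)

```python
# Hypothetical sketch of the "flat HTML sitemap" trick described above:
# given (handle, title) pairs, produce one static page of item links that
# crawlers can follow even when browse/search URLs are blocked in robots.txt.
# The base URL and data shape are assumptions, not the original Perl script.
import html


def build_item_index(items, base_url="https://repository.example.edu"):
    """items: iterable of (handle, title) pairs, e.g. ("123456789/42", "Some thesis")."""
    lines = ["<html><head><title>All items</title></head><body><ul>"]
    for handle, title in items:
        lines.append(
            '<li><a href="%s/handle/%s">%s</a></li>'
            % (base_url, handle, html.escape(title))
        )
    lines.append("</ul></body></html>")
    return "\n".join(lines)
```

In practice you would regenerate this from the database on a cron schedule and serve it as a plain static file, so the page itself adds no load when crawlers fetch it.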