On Sun, Feb 13, 2011 at 02:45, Jeffrey Trimble <jatrim...@ysu.edu> wrote:
> We've had to block several sites (certain web crawlers causing us headaches,
> and not the legitimate ones) using IPSec.  Of course
> it blocks them from everything.
> That's one option, though a little severe, IMHO.

Right, blocking a web spider is one case where blocking a single IP
from DSpace can make sense. The problem with blocking a single IP is
that the attacker's IP may change over time.
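If you do want to block a single IP without resorting to IPSec, the web server itself can do it. A minimal sketch in Apache 2.2-style syntax (the address and location are made up, adjust to your setup):

```apache
# Deny one address at the server level instead of IPSec.
<Location />
    Order allow,deny
    Allow from all
    Deny from 203.0.113.42
</Location>
```

This keeps the block scoped to the web application rather than the whole machine, but it has the same weakness: the offender can simply move to another IP.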

If the spider is well behaved, it should respect robots.txt. You can
find the spider's name (its User-agent string) in your Apache access
logs, and then block just that one robot. In DSpace, robots.txt should
be placed in these locations:
[dspace]/webapps/jspui/robots.txt
[dspace]/webapps/xmlui/static/robots.txt
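To find the offending spider's User-agent in the first place, something like this can help. A sketch assuming the Apache "combined" log format; the sample log, the "BadBot" name, and the addresses below are made up, so point the command at your real access log instead:

```shell
# Count requests per User-Agent in an Apache "combined" format access log.
# The sample log written here stands in for your real /var/log/... file.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1.2.3.4 - - [13/Feb/2011:02:45:00 +0000] "GET /xmlui/ HTTP/1.1" 200 512 "-" "BadBot/1.0"
1.2.3.4 - - [13/Feb/2011:02:45:01 +0000] "GET /xmlui/handle/1/2 HTTP/1.1" 200 512 "-" "BadBot/1.0"
5.6.7.8 - - [13/Feb/2011:02:45:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
# In the combined format, the User-Agent is the 6th double-quote-delimited field.
awk -F'"' '{print $6}' "$LOG" | sort | uniq -c | sort -rn
# Keep the top entry for inspection: here "      2 BadBot/1.0".
TOP=$(awk -F'"' '{print $6}' "$LOG" | sort | uniq -c | sort -rn | head -n 1)
```

The most frequent User-agent that is clearly a robot is the name to put in the robots.txt rule below.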

The contents would look like this (with the name of the spider from your logs):
User-agent: BadBot
Disallow: /

Details here:
http://www.robotstxt.org/
http://www.robotstxt.org/faq/blockjustbad.html

Regards,
~~helix84

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
