SOLR Statistics: Better detection & avoidance of abusive traffic (including a
bot trap)
----------------------------------------------------------------------------------------
Key: DS-919
URL: https://jira.duraspace.org/browse/DS-919
Project: DSpace
Issue Type: New Feature
Components: Solr
Reporter: Bram Luyten (@mire)
The current implementation of bot traffic filtering relies on IP lists. Even
though using hostnames (as suggested here:
https://jira.duraspace.org/browse/DS-790 ) could improve the situation, there
are still forms of abusive traffic that we might want to detect and exclude
from the statistics.
The most obvious example is repeated hits or downloads coming from the same
source. Another is traffic from spiders that aren't included in the lists. One
way to detect these would be a bot trap: a link hidden behind a single pixel,
which a human user would never click but which bots might follow. Any agent
that requests the resource behind this link could be recorded and its traffic
dynamically excluded from the hit/download counts.
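A minimal sketch of the registry side of such a trap, assuming an in-memory
set of flagged client IPs (the class name BotTrapRegistry, the TRAP_PATH
constant, and the method names are illustrative only, not existing DSpace
code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a registry of clients that followed the hidden
// trap link. A real implementation would likely persist this list and
// expire entries, and would be consulted by the Solr statistics logger.
public class BotTrapRegistry {

    // Path served behind the invisible one-pixel link; any client
    // requesting it is assumed to be a bot. (Illustrative path.)
    public static final String TRAP_PATH = "/bot-trap";

    // Thread-safe set of flagged client IPs.
    private final Set<String> flaggedIps = ConcurrentHashMap.newKeySet();

    // Called by the handler that serves TRAP_PATH.
    public void recordHit(String clientIp) {
        flaggedIps.add(clientIp);
    }

    // Called before logging a hit/download to the statistics core;
    // flagged clients are skipped.
    public boolean isFlagged(String clientIp) {
        return flaggedIps.contains(clientIp);
    }

    public static void main(String[] args) {
        BotTrapRegistry registry = new BotTrapRegistry();
        // Simulate a crawler following the hidden link...
        registry.recordHit("203.0.113.7");
        // ...so its later hits would be excluded from statistics.
        System.out.println(registry.isFlagged("203.0.113.7"));
        System.out.println(registry.isFlagged("198.51.100.4"));
    }
}
```

The hidden link itself could simply be an anchor wrapping a 1x1 transparent
image, excluded via robots.txt so that well-behaved crawlers never reach it
and only misbehaving agents get flagged.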
Some related links:
http://www.affiliatebeginnersguide.com/sitelogs/bots_hunt.html
http://www.elxsy.com/2009/06/how-to-identify-and-ban-bots-spiders-crawlers/
--
This message is automatically generated by JIRA.
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel