On Thu, Apr 23, 2015 at 05:39:01PM +0000, Monika C. Mevenkamp wrote:
> I found a couple of really suspicious numbers in my solr stats, aka lots of
> entries were marked as isBot=false although they probably should have been
> isBot=true.
>
> In the config file I use
>
> spiderips.urls = http://iplists.com/google.txt, \
>                  http://iplists.com/inktomi.txt, \
>                  http://iplists.com/lycos.txt, \
>                  http://iplists.com/infoseek.txt, \
>                  http://iplists.com/altavista.txt, \
>                  http://iplists.com/excite.txt, \
>                  http://iplists.com/northernlight.txt, \
>                  http://iplists.com/misc.txt, \
>                  http://iplists.com/non_engines.txt
>
> I could not find downloadable lists for Bing, Baidu, Yahoo.
> The best I saw was:
> http://myip.ms/info/bots/Google_Bing_Yahoo_Facebook_etc_Bot_IP_Addresses.html
> Is that reliable ?
>
> Does anybody out there have lists / sources that they can share ?
What version of DSpace are you running?  Recent versions can also
recognize spiders by regular expression matching of the domain name or
User-Agent: string.  (However, that only works for new entries.  I've
recently found that some of the tools for loading and grooming the
statistics core don't use SpiderDetector and are oblivious to these
newer patterns.)

> Also: does the dspace code gracefully deal with IP address patterns ?

That depends on what is considered graceful.  The code (in
org.dspace.statistics.util.IPTable) accepts patterns in three forms:

    11.22.33.44-11.22.33.55
    11.22.33.44
    11.22.33

Addresses in the first form may be suffixed with a CIDR mask-length,
but it is currently ignored.

If I've understood the code, a range (the first form) is assumed to
differ only in the fourth octet.  It will match all addresses between
"44" and "55" within the /24 containing the start of the range.

The second form is an exact match of a single address.

The third form is a match of the first 24 bits -- an entire /24
("Class C") subnet.

There is no provision for IPv6.

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
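To make the three pattern forms above concrete, here is a minimal Python
sketch of the matching rules as they are described in this message.  It is
not the IPTable code (which is Java) and has not been checked against it;
the function names parse_pattern and matches are made up for illustration,
and the handling of a trailing mask-length is an assumption based on the
remark that it is ignored.

```python
def parse_pattern(pattern):
    """Split one pattern into (prefix, low, high).

    prefix is the first three octets as a string; low and high bound the
    fourth octet.  Hypothetical helper, modelled on the description only.
    """
    start, _, end = pattern.strip().partition("-")
    # A CIDR mask-length such as "/24" is dropped, since it is described
    # as being ignored.
    start = start.split("/")[0]
    end = end.split("/")[0]

    octets = start.split(".")
    prefix = ".".join(octets[:3])
    if len(octets) == 3:
        # Third form: the whole /24 matches.
        return prefix, 0, 255
    if len(octets) != 4:
        raise ValueError("unsupported pattern: %r" % pattern)

    low = int(octets[3])
    # First form: only the fourth octet of the range end is used.
    # Second form: no range end, so low == high (exact match).
    high = int(end.split(".")[3]) if end else low
    return prefix, low, high


def matches(pattern, address):
    """Return True if a dotted-quad IPv4 address falls under the pattern."""
    prefix, low, high = parse_pattern(pattern)
    octets = address.split(".")
    # Anything that is not a plain dotted quad (IPv6, for example) never matches.
    if len(octets) != 4 or not all(o.isdigit() for o in octets):
        return False
    return ".".join(octets[:3]) == prefix and low <= int(octets[3]) <= high
```

With this sketch, matches("11.22.33.44-11.22.33.55", "11.22.33.50") is
true, while the same range does not match 11.22.34.50 because the first
three octets are taken from the start of the range only.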
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette