[ https://jira.duraspace.org/browse/DS-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Dietz reassigned DS-1008:
-------------------------------

    Assignee: Peter Dietz
    
> Solr Statistics markRobotsByIP can mark too many IP addresses, including IP's not on the IP list
> ------------------------------------------------------------------------------------------------
>
>                 Key: DS-1008
>                 URL: https://jira.duraspace.org/browse/DS-1008
>             Project: DSpace
>          Issue Type: Bug
>          Components: Solr
>    Affects Versions: 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2
>            Reporter: Peter Dietz
>            Assignee: Peter Dietz
>         Attachments: DS-1008-fix-robot-overcounting.patch
>
>
> The function markRobotsByIP flags too many IP addresses as bots, potentially
> by a factor of 9.
> https://github.com/DSpace/DSpace/blob/5366d237afa07005ec485831c9bca1f1c992f01d/dspace-stats/src/main/java/org/dspace/statistics/SolrLogger.java#L473
> /* query for ip, exclude results previously set as bots. */
> processor.execute("ip:"+ip+ "* AND -isBot:true");
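> The trailing "*" turns this into a prefix match on the raw IP string, so it
> does not stop at an octet boundary. For example:
> 10.10.10*    matches 10.10.[10, 100-109].*
> 10.10.10.10* matches 10.10.10.[10, 100-109]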
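The over-matching above can be seen by listing every valid octet value (0-255) whose decimal string starts with a given prefix. A string prefix query like ip:10.10.10* applies exactly this test to the third octet. The following is a minimal sketch that assumes IPs are stored as plain dotted-decimal strings. The class and method names are hypothetical and are not part of DSpace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of DSpace.
final class PrefixOverlapDemo {

    // Returns every valid IPv4 octet (0-255) whose decimal form starts with
    // the given prefix, i.e. the values a plain string-prefix match accepts.
    static List<Integer> octetsMatchingPrefix(String prefix) {
        List<Integer> matches = new ArrayList<>();
        for (int octet = 0; octet <= 255; octet++) {
            if (String.valueOf(octet).startsWith(prefix)) {
                matches.add(octet);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "10.10.10*" is meant to cover only 10.10.10.x, but the prefix "10"
        // on the third octet also admits 100-109.
        System.out.println(octetsMatchingPrefix("10"));
        // prints [10, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
    }
}
```

How much the match grows depends on the digits. A prefix of "26" admits only 26, because 260 and above are not valid octets.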
> My co-worker Brian Stamper suggested:
> if (ip.matches("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+")) {
>     // Full 4-octet address: run as-is, with no wildcard.
>     processor.execute("ip:" + ip + " AND -isBot:true");
> } else if (ip.endsWith(".")) {
>     // Not a full address but ends in a period, so assume something like
>     // #.#.#. or #.#. (not expected in the "stock" input from ip-list.com).
>     processor.execute("ip:" + ip + "* AND -isBot:true");
> } else if (ip.matches(".*[0-9]")) {
>     // Ends with a digit but is not a full 4-octet address (first branch),
>     // so append ".*" to start the wildcard at an octet boundary.
>     processor.execute("ip:" + ip + ".* AND -isBot:true");
> } else {
>     log.error("Unexpected IP value: " + ip);
> }
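The suggested branching is easier to unit-test when the query string is built separately from the call to processor.execute. The following is a minimal sketch that assumes the same query syntax as the snippet above. The class RobotQueryBuilder and its method buildQuery are hypothetical names. The sketch returns null where the original logs "Unexpected IP value".

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of DSpace: builds the Solr query for one IP-list entry.
final class RobotQueryBuilder {

    // A complete dotted quad such as 10.10.10.10 (matches() anchors the whole string).
    private static final Pattern FULL_IP =
            Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");
    // A partial address ending in a period, e.g. "10.10." -> "10.10.*".
    private static final Pattern ENDS_WITH_DOT =
            Pattern.compile("[0-9]+(\\.[0-9]+)*\\.");
    // A partial address ending in a digit, e.g. "10.10.10" -> "10.10.10.*".
    private static final Pattern ENDS_WITH_DIGIT =
            Pattern.compile("[0-9]+(\\.[0-9]+)*");

    // Returns the query string, or null if the entry is in no recognised form.
    static String buildQuery(String ip) {
        if (FULL_IP.matcher(ip).matches()) {
            // Exact term, no wildcard: 10.10.10.10 can no longer match 10.10.10.100.
            return "ip:" + ip + " AND -isBot:true";
        } else if (ENDS_WITH_DOT.matcher(ip).matches()) {
            // Already ends at an octet boundary, so a bare "*" is safe.
            return "ip:" + ip + "* AND -isBot:true";
        } else if (ENDS_WITH_DIGIT.matcher(ip).matches()) {
            // Insert the period before the wildcard so it starts at an octet boundary.
            return "ip:" + ip + ".* AND -isBot:true";
        }
        return null; // caller would log "Unexpected IP value: " + ip
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("10.10.10.10")); // ip:10.10.10.10 AND -isBot:true
        System.out.println(buildQuery("10.10."));      // ip:10.10.* AND -isBot:true
        System.out.println(buildQuery("10.10.10"));    // ip:10.10.10.* AND -isBot:true
        System.out.println(buildQuery("bogus"));       // null
    }
}
```

Keeping the query construction pure lets each branch be checked without a running Solr instance. The caller then passes the result to processor.execute, or logs an error when the result is null.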

--
This message is automatically generated by JIRA.

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
