[ 
http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11126#action_11126
 ] 

Tim Donohue commented on DS-440:
--------------------------------

Stuart,

Your patch works fine for me.  It pulls down the latest known spider IP 
addresses from iplists.com and parses them into spiders.txt.  

A few concerns:

(1) I'm hoping the fact that it generates a 1.6MB spiders.txt file won't hamper 
things in Solr in any way.  I don't have much data to work with locally, so 
could someone with more data test whether such a large spiders.txt file slows 
Solr down?

(2) I noticed that the spiders.txt file is currently overwritten whenever you 
run 'dspace update-spider-ips'.  So, if an institution has added its own IPs 
to that file, those entries would be lost on the next update.

Longer term, we may need to figure out a better way to manage this spiders.txt 
file, or have Solr accept multiple spiders.txt files so that institutions can 
add their own IPs to a separate file.
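
To illustrate the "multiple files" idea, here is a minimal sketch in Python 
(not DSpace's actual Java code).  It assumes one IP address per line, treats 
lines starting with '#' as comments, and uses a hypothetical file name, 
spiders-local.txt, for the institution's own entries.  None of these details 
come from the patch.

```python
from pathlib import Path
from typing import Iterable, Set


def load_spider_ips(paths: Iterable[Path]) -> Set[str]:
    """Merge IP addresses from several spider files into one set."""
    ips: Set[str] = set()
    for path in paths:
        if not path.is_file():
            continue  # a missing file (e.g. no local list yet) is skipped
        for line in path.read_text().splitlines():
            entry = line.strip()
            # Skip blank lines and '#' comments (comment syntax is an assumption).
            if entry and not entry.startswith("#"):
                ips.add(entry)
    return ips


if __name__ == "__main__":
    # Hypothetical layout: the generated list plus a locally maintained one.
    merged = load_spider_ips([Path("spiders.txt"), Path("spiders-local.txt")])
    print(f"{len(merged)} spider IPs loaded")
```

With a layout like this, 'dspace update-spider-ips' could keep overwriting 
spiders.txt while the local file is left untouched, and duplicates between the 
two files collapse automatically because the entries are held in a set.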

> spiders.txt empty
> -----------------
>
>                 Key: DS-440
>                 URL: http://jira.dspace.org/jira/browse/DS-440
>             Project: DSpace 1.x
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Stuart Lewis
>            Assignee: Mark Diggory
>             Fix For: 1.6.0
>
>         Attachments: [DS-440]_spiders_txt_is_empty.patch.txt
>
>
> spiders.txt is currently empty, so search engine robots are not being 
> excluded from solr stats.
