[
http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11126#action_11126
]
Tim Donohue commented on DS-440:
--------------------------------
Stuart,
Your patch works fine for me. It pulls down the latest known spider IP
addresses from iplists.com and parses them into spiders.txt.
A few concerns:
(1) I hope the 1.6MB spiders.txt file it generates won't slow Solr down. I
don't have much data to work with locally, so could someone with a larger
dataset test whether a spiders.txt file this large affects Solr's speed?
(2) The spiders.txt file is currently overwritten whenever you run
'dspace update-spider-ips'. So, if an institution had added its own IPs to
that file, those would be lost.
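One way to avoid losing local additions would be to merge instead of
overwrite. A minimal Python sketch of that idea, assuming spiders.txt holds
one IP address per line and that lines starting with '#' are comments;
parse_ips and merge_spider_ips are hypothetical helpers for illustration, not
part of the patch or of DSpace:

```python
def parse_ips(lines):
    """Return the set of non-empty, non-comment entries, whitespace stripped."""
    ips = set()
    for line in lines:
        entry = line.strip()
        # Assumption: '#' marks a comment line in spiders.txt.
        if entry and not entry.startswith("#"):
            ips.add(entry)
    return ips


def merge_spider_ips(existing_lines, downloaded_lines):
    """Union the entries already on disk with the freshly downloaded ones."""
    merged = parse_ips(existing_lines) | parse_ips(downloaded_lines)
    # Sorted as plain strings, only to keep the output stable between runs.
    return sorted(merged)


# Example with documentation-range addresses (not real spider IPs).
existing = ["192.0.2.10", "# added locally", "198.51.100.7"]
downloaded = ["192.0.2.10", "203.0.113.5"]
merged = merge_spider_ips(existing, downloaded)
```

The trade-off is that a plain union never drops an address, so IPs removed
from the upstream list would stay in the file indefinitely.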
Longer term, we may need to figure out a better way to manage this spiders.txt
file, or have Solr accept multiple spiders.txt files so that institutions can
add their own IPs to a separate file.
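The separate-file idea could look roughly like the following Python sketch.
The directory layout, the '*.txt' pattern, the one-IP-per-line format, and
the name load_spider_ips are all assumptions made for illustration; it says
nothing about how DSpace or Solr actually read the file today.

```python
from pathlib import Path


def load_spider_ips(directory):
    """Collect IPs from every *.txt file in `directory` into a single set."""
    ips = set()
    # Assumption: the downloaded list and any local lists sit side by side,
    # e.g. spiders.txt (regenerated) and local.txt (maintained by hand).
    for path in sorted(Path(directory).glob("*.txt")):
        for line in path.read_text().splitlines():
            entry = line.strip()
            # Skip blank lines and '#' comment lines.
            if entry and not entry.startswith("#"):
                ips.add(entry)
    return ips
```

With this layout, regenerating the downloaded file would leave a locally
maintained file untouched. Because the entries end up in a set, each lookup
in this sketch is a hash check, so the file size would mainly affect load
time and memory rather than per-request cost.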
> spiders.txt empty
> -----------------
>
> Key: DS-440
> URL: http://jira.dspace.org/jira/browse/DS-440
> Project: DSpace 1.x
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Stuart Lewis
> Assignee: Mark Diggory
> Fix For: 1.6.0
>
> Attachments: [DS-440]_spiders_txt_is_empty.patch.txt
>
>
> spiders.txt is currently empty, so search engine robots are not being
> excluded from solr stats.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel