[ http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11019#action_11019 ]
Mark Diggory commented on DS-440:
---------------------------------

[15:07] <stuartlewis> http://jira.dspace.org/jira/browse/DS-440 spiders.txt empty
[15:08] <stuartlewis> Need input from mdiggory here
[15:08] <stuartlewis> My guess would be to ship a preconfigured list with 1.6, and look at an update process for post 1.6
[15:08] <lcs> there's also a list of spider user-agent keywords in one of the sitemaps to identify spiders..
[15:08] <mhwood> Anything we ship will be outdated. We need to document that in big red letters.
[15:08] <lcs> would be good to merge that logic
[15:09] <tdonohue> +1 to shipping with some sort of list (or at least documentation on how to format that spiders.txt file)
[15:09] <mhwood> +1 some list is better than no list
[15:09] <stuartlewis> Yes - 1.6.1 would need an update mechanism, a preconfigured list should catch 90% until then
[15:09] <lcs> how about adding references to recommended websites to obtain current lists of spider names?
[15:10] <stuartlewis> IIRC spiders.txt works on IP addresses. Could be upgraded to include user-agent strings too.
[15:11] <tdonohue> So, should we leave assigned to mdiggory and come up with some sort of list (even if it's just an example)?
[15:11] <stuartlewis> Yes - sounds sensible in the short timeframe we have
[15:11] <tdonohue> DS-440 Summary: Talk to mdiggory. Need to have some sort of list and/or recommendations on how to get a current list.
[15:11] <stuartlewis> (any spider filtering is better than the current situation of no spider filtering)
[15:11] <stuartlewis> http://jira.dspace.org/jira/browse/DS-441 - resolved
[15:11] <richardrodgers> +1 provided we make clear it needs maintenance..
[15:11] <lcs> see dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/sitemap.xmap for some detection logic

> spiders.txt empty
> -----------------
>
>                 Key: DS-440
>                 URL: http://jira.dspace.org/jira/browse/DS-440
>             Project: DSpace 1.x
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Stuart Lewis
>            Assignee: Mark Diggory
>             Fix For: 1.6.0
>
>
> spiders.txt is currently empty, so search engine robots are not being
> excluded from solr stats.

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
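
For illustration only, here is a rough sketch of the combined check floated in the chat: match a request against an IP list first, then fall back to user-agent keywords. It is not DSpace code. The one-entry-per-line file format with '#' comments, the function names, the sample addresses (documentation-only ranges) and the keyword list are all assumptions made for this example; the real spiders.txt format and the sitemap.xmap logic may differ.

```python
import ipaddress


def parse_spider_list(text):
    """Parse a spiders.txt-style list: one IP address or CIDR range per line.

    Assumed format (hypothetical): blank lines are skipped and anything
    after '#' is treated as a comment.
    """
    networks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # strict=False lets a bare address act as a single-host network
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_spider(ip, user_agent, networks, agent_keywords):
    """Return True if the IP is listed or the user agent contains a keyword."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in networks):
        return True
    ua = (user_agent or "").lower()
    return any(keyword.lower() in ua for keyword in agent_keywords)


# Hypothetical sample data, not a real spider list.
SAMPLE_LIST = """
# documentation-only address ranges
192.0.2.10
198.51.100.0/24
"""
AGENT_KEYWORDS = ["bot", "crawler", "spider"]

if __name__ == "__main__":
    nets = parse_spider_list(SAMPLE_LIST)
    print(is_spider("198.51.100.7", "Mozilla/5.0", nets, AGENT_KEYWORDS))
    print(is_spider("203.0.113.5", "ExampleBot/1.0", nets, AGENT_KEYWORDS))
```

Checking the IP list before the user agent keeps the existing IP-based behaviour as the primary test, with keyword matching as an addition. As noted in the chat, any shipped list of either kind would need ongoing maintenance.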