[ 
http://jira.dspace.org/jira/browse/DS-440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11019#action_11019
 ] 

Mark Diggory commented on DS-440:
---------------------------------

[15:07] <stuartlewis> http://jira.dspace.org/jira/browse/DS-440 spiders.txt 
empty
[15:08] <stuartlewis> Need input from mdiggory here
[15:08] <stuartlewis> My guess would be to ship a preconfigured list with 1.6, 
and look at an update process for post 1.6
[15:08] <lcs> there's also a list of spider user-agent keywords in one of the 
sitemaps to identify spiders..
[15:08] <mhwood> Anything we ship will be outdated. We need to document that in 
big red letters.
[15:08] <lcs> would be good to merge that logic
[15:09] <tdonohue> +1 to shipping with some sort of list (or at least 
documentation on how to format that spiders.txt file)
[15:09] <mhwood> +1 some list is better than no list
[15:09] <stuartlewis> Yes - 1.6.1 would need an update mechanism, a 
preconfigured list should catch 90% until then
[15:09] <lcs> how about adding references to recommended websites to obtain 
current lists of spider names?
[15:10] <stuartlewis> IIRC spiders.txt works on IP addresses. Could be upgraded 
to include user-agent strings too.
[15:11] <tdonohue> So, should we leave assigned to mdiggory and come up with 
some sort of list (even if it's just an example)?
[15:11] <stuartlewis> Yes - sounds sensible in the short timeframe we have
[15:11] <tdonohue> DS-440 Summary: Talk to mdiggory. Need to have some sort of 
list and/or recommendations on how to get a current list.
[15:11] <stuartlewis> (any spider filtering is better than the current 
situation of no spider filtering)
[15:11] <stuartlewis> http://jira.dspace.org/jira/browse/DS-441 - resolved
[15:11] <richardrodgers> +1 provided we make clear it needs maintenance..
[15:11] <lcs> see dspace-xmlui/dspace-xmlui-webapp/src/main/webapp/sitemap.xmap 
for some detection logic
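
To illustrate the idea raised above (an IP-based spiders.txt extended with
user-agent matching), here is a minimal sketch in Python. It is not DSpace
code. The file format (one IP address per line, '#' comments), the keyword
list, and the names load_spider_ips and is_spider are all assumptions made
for illustration; the actual detection logic in sitemap.xmap may differ.

```python
import ipaddress

# Hypothetical keyword list, for illustration only. It is NOT the list
# used in sitemap.xmap, and any real list would need regular maintenance.
AGENT_KEYWORDS = ("bot", "crawler", "spider")


def load_spider_ips(lines):
    """Parse lines of a spiders.txt-style file into a set of IP addresses.

    Assumed format (not confirmed by the thread): one IPv4 or IPv6 address
    per line; blank lines and lines starting with '#' are ignored.
    """
    ips = set()
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        ips.add(ipaddress.ip_address(entry))
    return ips


def is_spider(remote_ip, user_agent, spider_ips, keywords=AGENT_KEYWORDS):
    """Return True if the request looks like it came from a spider.

    A request matches if its IP is in the known list, or if its user-agent
    string contains one of the keywords (case-insensitive).
    """
    if ipaddress.ip_address(remote_ip) in spider_ips:
        return True
    agent = (user_agent or "").lower()
    return any(keyword in agent for keyword in keywords)
```

A statistics logger could call is_spider() before recording a hit and skip
the event when it returns True. Checking both signals means a stale IP list
still catches crawlers that identify themselves in the user-agent header.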

> spiders.txt empty
> -----------------
>
>                 Key: DS-440
>                 URL: http://jira.dspace.org/jira/browse/DS-440
>             Project: DSpace 1.x
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Stuart Lewis
>            Assignee: Mark Diggory
>             Fix For: 1.6.0
>
>
> spiders.txt is currently empty, so search engine robots are not being 
> excluded from solr stats.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://jira.dspace.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
