[ 
https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448596#comment-13448596
 ] 

Lewis John McGibbney commented on NUTCH-1465:
---------------------------------------------

Hi Ken,
{bq} I could start a thread, but I also don't want to flog a dead horse {bq}

I thought there had been renewed interest over at CC, but it looks like this 
is not the case. So I guess we can progress with moving the sitemap-parser 
into Nutch. People from the community have asked for it, so I see no reason 
not to. The canonical tag topic also came up again in the thread I cited 
above (and there are already issues logged on our Jira for it), so it will be 
interesting to see what the code contains.
                
> Support sitemaps in Nutch
> -------------------------
>
>                 Key: NUTCH-1465
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1465
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>             Fix For: 1.6, 2.1
>
>
> I recently came across this rather stagnant codebase[0] which is ASL v2.0 
> licensed and appears to have been used successfully to parse sitemaps as per 
> the discussion here[1].
> [0] http://sourceforge.net/projects/sitemap-parser/
> [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
