[ https://issues.apache.org/jira/browse/NUTCH-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-2491:
----------------------------------------
    Fix Version/s: 1.15

> Integrate sitemap processing and HostDB into crawl script
> ---------------------------------------------------------
>
>                 Key: NUTCH-2491
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2491
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Moreno Feltscher
>            Assignee: Moreno Feltscher
>            Priority: Minor
>             Fix For: 1.15
>
>
> Add three new steps to the crawl bash script:
> 1. Generate HostDB from CrawlDB
> 2. Inject URLs from sitemap URLs found for hosts in HostDB
> 3. If given, inject sitemap URLs specified in one or more
> configuration files

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
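
The three steps listed in the issue could look roughly like the following in bash. This is only a sketch, not the patch attached to the issue: the function names, the `DRY_RUN` switch, and the directory layout (`crawldb` and `hostdb` under one crawl directory) are assumptions made for illustration. The `updatehostdb` and `sitemap` subcommands and their options should be checked against the usage output of the installed Nutch release.

```shell
#!/usr/bin/env bash
# Sketch of the three proposed crawl-script steps.
# All paths, variable names, and function names here are illustrative assumptions.

NUTCH="${NUTCH:-bin/nutch}"   # assumed location of the Nutch launcher script
DRY_RUN="${DRY_RUN:-0}"       # hypothetical switch: 1 = print commands instead of running them

# Run a Nutch command, or only print it when DRY_RUN=1.
run_nutch() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$NUTCH $*"
  else
    "$NUTCH" "$@"
  fi
}

# $1 = crawl directory, $2 = optional directory holding sitemap URL lists
sitemap_hostdb_steps() {
  local crawl_path="$1"
  local sitemap_dir="${2:-}"

  # 1. Generate (or update) the HostDB from the CrawlDB.
  run_nutch updatehostdb -crawldb "$crawl_path/crawldb" \
    -hostdb "$crawl_path/hostdb" || return 1

  # 2. Inject URLs from the sitemaps of hosts recorded in the HostDB.
  run_nutch sitemap "$crawl_path/crawldb" \
    -hostdb "$crawl_path/hostdb" || return 1

  # 3. Only if a sitemap directory was given, inject URLs from those sitemaps.
  if [ -n "$sitemap_dir" ]; then
    run_nutch sitemap "$crawl_path/crawldb" \
      -sitemapUrls "$sitemap_dir" || return 1
  fi
}
```

Calling `DRY_RUN=1 sitemap_hostdb_steps crawl urls/sitemaps` prints the three commands without touching any data, which makes it easy to compare them with the real crawl script before running anything.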