Moreno Feltscher created NUTCH-2491:
---------------------------------------

             Summary: Integrate sitemap processing and HostDB into crawl script
                 Key: NUTCH-2491
                 URL: https://issues.apache.org/jira/browse/NUTCH-2491
             Project: Nutch
          Issue Type: Improvement
            Reporter: Moreno Feltscher
            Assignee: Moreno Feltscher
            Priority: Minor


Add three new steps to the crawl bash script:

1. Generate the HostDB from the CrawlDB
2. Inject URLs from the sitemaps found for the hosts in the HostDB
3. If given, inject the sitemap URLs specified in one or more configuration files
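
The three steps could be sketched in bash roughly as follows. This is only an illustration under stated assumptions, not the patch attached to the issue: the function name, the variables (NUTCH_BIN, CRAWL_PATH, SITEMAP_DIR, DRY_RUN) and the directory layout are hypothetical, and the options passed to `updatehostdb` and `sitemap` should be checked against the usage output of the Nutch version in use. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three proposed crawl-script steps.
# All variable names and paths below are assumptions for illustration.

NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"   # assumed location of the nutch launcher
CRAWL_PATH="${CRAWL_PATH:-crawl}"     # assumed crawl directory
SITEMAP_DIR="${SITEMAP_DIR:-}"        # optional: directory of files listing sitemap URLs
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, anything else = execute

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

sitemap_steps() {
  # Step 1: generate (or update) the HostDB from the CrawlDB.
  run "$NUTCH_BIN" updatehostdb -crawldb "$CRAWL_PATH/crawldb" \
      -hostdb "$CRAWL_PATH/hostdb" || return 1

  # Step 2: inject URLs from sitemaps discovered for hosts in the HostDB.
  run "$NUTCH_BIN" sitemap "$CRAWL_PATH/crawldb" \
      -hostdb "$CRAWL_PATH/hostdb" || return 1

  # Step 3: only if a sitemap URL directory was given, inject those as well.
  if [ -n "$SITEMAP_DIR" ]; then
    run "$NUTCH_BIN" sitemap "$CRAWL_PATH/crawldb" \
        -sitemapUrls "$SITEMAP_DIR" || return 1
  fi
}

sitemap_steps
```

Setting DRY_RUN=0 would make the sketch execute the commands instead of printing them. Leaving SITEMAP_DIR empty skips the third step, which mirrors the "if given" condition in the description.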