Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/crawl" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/bin/crawl?action=diff&rev1=1&rev2=2

- The bin/crawl script gives more control during a crawl. Instead of using the org.apache.nutch.crawl.Crawl class, it uses individual steps (inject->generate->fetch->parse->updatedb) during a crawl. It is recommended to use this instead of using the [[bin/nutch crawl]] command.
+ = Description =
+ The bin/crawl script gives more control during a crawl. It uses individual steps (inject->generate->fetch->parse->updatedb) during a crawl.
+ = Usage =
+ == Nutch 1.X ==
+ {{{
+ Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
+     -i|--index    Indexes crawl results into a configured indexer
+     -D            A Java property to pass to Nutch calls
+     Seed Dir      Directory in which to look for a seeds file
+     Crawl Dir     Directory where the crawl/link/segments dirs are saved
+     Num Rounds    The number of rounds to run this crawl for
+ Example: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 2
+ }}}
+
+ == Nutch 2.x ==
+
+ = Need Assistance ? =
  Please message us in the [[http://nutch.apache.org/mailing_lists.html|user-mailing list]] if you find any issues
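
The individual steps named in the description (inject->generate->fetch->parse->updatedb) can be sketched as a dry run that only prints the per-round commands. This is a rough illustration, not the actual bin/crawl script: the function name plan_crawl, the crawldb/segments layout under the crawl directory, and the <segment-N> placeholder (real segment names are created by the generate step) are assumptions made for the example, and any further steps the real script performs are left out.

```shell
#!/bin/sh
# Dry-run sketch: prints the Nutch 1.x commands for each crawl round
# instead of executing them. Nothing here touches a real crawl.

plan_crawl() {
  seed_dir=$1
  crawl_dir=${2%/}        # drop a trailing slash, e.g. TestCrawl/ -> TestCrawl
  rounds=$3

  # Assumed layout: crawldb and segments live under the crawl directory.
  crawldb="$crawl_dir/crawldb"
  segments="$crawl_dir/segments"

  # Seed URLs are injected once, before the first round.
  echo "bin/nutch inject $crawldb $seed_dir"

  round=1
  while [ "$round" -le "$rounds" ]; do
    # Hypothetical placeholder; generate creates the real segment name.
    segment="$segments/<segment-$round>"
    echo "bin/nutch generate $crawldb $segments"
    echo "bin/nutch fetch $segment"
    echo "bin/nutch parse $segment"
    echo "bin/nutch updatedb $crawldb $segment"
    round=$((round + 1))
  done
}

# Same arguments as the usage example: seed dir, crawl dir, number of rounds.
plan_crawl urls/ TestCrawl/ 2
```

Printing the commands rather than running them keeps the sketch safe to try without a Nutch installation; with two rounds it lists one inject followed by two generate/fetch/parse/updatedb cycles.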