Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "NutchTutorial" page has been changed by kiranchitturi:
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=61&rev2=62

  This will include any URL in the domain `nutch.apache.org`.
  === 3.1 Using the Crawl Command ===
+ 
+ {{{#!wiki caution
+ The crawl command is deprecated. Please see section [[#A3.3._Using_the_crawl_script|3.3]] for how to use the crawl script, which is intended to replace the crawl command.
+ }}}
+ 
  Now we are ready to initiate a crawl. Use the following parameters:
   * '''-dir''' ''dir'' names the directory to put the crawl in.
@@ -220, +225 @@
  }}}
  We are now ready to search with Apache Solr.
+ === 3.3. Using the crawl script ===
+ 
+ If you have followed section 3.2 above on how crawling can be done step by step, you might be wondering how a bash script could automate the whole process described there.
+ 
+ Nutch developers have already written one for you :), and it is available at [[bin/crawl]].
+ 
+ {{{
+ Usage: bin/crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
+ Example: bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
+ }}}
+ 
+ The crawl script sets a number of parameters, which you can modify to suit your needs. It is a good idea to understand these parameters before setting up large crawls.
+ 
  == 4. Setup Solr for search ==
   * download the binary file from [[http://www.apache.org/dyn/closer.cgi/lucene/solr/|here]]
   * unzip it to `$HOME/apache-solr-3.X`; we will refer to this directory as `${APACHE_SOLR_HOME}`
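The new section 3.3 asks how a bash script could automate the step-by-step crawl. As a rough illustration only, here is a hypothetical dry-run sketch; it is not the contents of the real `bin/crawl`. The function name `plan_crawl`, its messages, and the step list (inject once, then generate, fetch, parse and updatedb in each round, then a single index step) are assumptions modelled on the Nutch 1.x individual commands. It prints the plan instead of calling `bin/nutch`, so it can be run anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch, NOT the real bin/crawl: it only prints the order
# in which a step-by-step crawl would be repeated for <numberOfRounds> rounds.
# The step names are assumed from the Nutch 1.x individual commands; real
# invocations also need paths and options that are omitted here.
plan_crawl() {
  if [ "$#" -ne 4 ]; then
    echo "Usage: plan_crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>" >&2
    return 1
  fi
  local seed_dir="$1" crawl_id="$2" solr_url="$3" rounds="$4"

  # Reject anything that is not made of digits only (e.g. "" or "x").
  case "$rounds" in
    '' | *[!0-9]*)
      echo "numberOfRounds must be a positive integer" >&2
      return 1
      ;;
  esac
  # Force base 10 so values such as "08" are not read as octal, then reject 0.
  rounds=$((10#$rounds))
  if [ "$rounds" -lt 1 ]; then
    echo "numberOfRounds must be a positive integer" >&2
    return 1
  fi

  # The seed URLs are injected once, before the first round.
  echo "inject $seed_dir into $crawl_id"

  # Each round selects, fetches and parses a batch, then updates the crawl db.
  local i
  for ((i = 1; i <= rounds; i++)); do
    echo "round $i: generate fetch parse updatedb"
  done

  # In this sketch indexing happens once, after the last round.
  echo "index $crawl_id into $solr_url"
}

# Same arguments as the usage example for bin/crawl.
plan_crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
```

With these arguments it prints one inject line, two round lines and one index line. A dry run was chosen so the loop structure can be checked without Hadoop or Solr; the real script passes paths, options and the Solr URL to each command and may order the steps differently (for example, indexing inside each round).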