Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by mozdevil:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial

------------------------------------------------------------------------------
  hostname
  }}}
- == Start Hadoop ==
+ == Crawling ==
+ To crawl using dfs it is first necessary to format the namenode of Hadoop and then start it as well as all the datanode services.
+ Format the namenode
  {{{
  bin/hadoop namenode -format
@@ -261, +263 @@
  bin/start-all.sh
  }}}
- To stop all of the servers you would use the following command:
+ To stop all of the servers use the following command, do not do this now:
  {{{
  bin/stop-all.sh
  }}}
- == Crawling ==
  To start crawling from a few urls as seeds an url directory is made in which a seed file is put with some seed urls. This file is put into the hdfs, to check if hdfs has stored the directory use the dfs -ls option of hadoop.
  {{{
  mkdir urls
@@ -275, +276 @@
  bin/hadoop dfs -ls urls
  }}}
- Start to crawl
+ Start an initial crawl
  {{{
  bin/nutch crawl urls -dir crawled -depth 3
  }}}

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs