Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by mozdevil:
http://wiki.apache.org/nutch/Nutch0%2e9-Hadoop0%2e10-Tutorial

------------------------------------------------------------------------------
  hostname
  }}}
  
- == Start Hadoop ==
+ == Crawling ==
+ To crawl using DFS, first format the Hadoop namenode, then start it along with all the datanode services.
+ 
  Format the namenode
  {{{
  bin/hadoop namenode -format
@@ -261, +263 @@

  bin/start-all.sh
  }}}
  
- To stop all of the servers you would use the following command:
+ To stop all of the servers, use the following command (do not run it now):
  {{{
  bin/stop-all.sh
  }}}
  
- == Crawling ==
  To start crawling from a few seed URLs, create a urls directory and put a seed file containing those URLs in it. This file is put into HDFS; to check that HDFS has stored the directory, use the dfs -ls option of hadoop.
  {{{
  mkdir urls
@@ -275, +276 @@

  bin/hadoop dfs -ls urls
  }}}
  
- Start to crawl
+ Start an initial crawl
  {{{
  bin/nutch crawl urls -dir crawled -depth 3
  }}}
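
The commands visible in this diff can be chained into one script. The following is only a sketch, assuming it is run from the Nutch installation directory; the `run` and `crawl_steps` helpers and the `DRY_RUN` variable are hypothetical names introduced here, not part of Nutch or Hadoop. The step that copies the seed directory into HDFS falls inside an elided part of the diff and is deliberately left as a comment.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is the Nutch install directory.
# DRY_RUN (hypothetical) defaults to 1, so commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Run the steps from the tutorial in order, stopping at the first failure.
crawl_steps() {
  run bin/hadoop namenode -format &&
  run bin/start-all.sh &&
  run mkdir urls &&
  # The seed file creation and upload into HDFS are elided in the diff;
  # add those commands here before checking the directory.
  run bin/hadoop dfs -ls urls &&
  run bin/nutch crawl urls -dir crawled -depth 3
}

crawl_steps
```

With the default `DRY_RUN=1` the script only lists what it would do; setting `DRY_RUN=0` executes the commands for real, which includes formatting the namenode.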

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
