Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by SteveSeverance:
http://wiki.apache.org/nutch/Getting_Started

New page:
This page is a collection of information that is useful for new developers. 
Some of this is going to need to be moved to the Hadoop Wiki but I am putting 
it here first as I assemble this. Please feel free to add, comment and make 
corrections.

Steve

To new developers: If you want to start developing on Nutch, begin by reading 
the Hadoop source code. Hadoop is the platform that Nutch is implemented on, 
so you need to understand Hadoop in order to understand how Nutch works.

=== What are the Hadoop primitives and how do I use them? Why are they there 
(what functionality do they add over regular primitives)? ===

These primitives implement the Hadoop Writable interface (or 
WritableComparable). This gives Hadoop control over the serialization of these 
objects. If you look at the higher-level file-based data structures in Hadoop, 
such as ArrayFile, you will see that they use the same interfaces for 
serialization. Using these primitive types allows their serialization to be 
done in the same way as for higher-order data structures such as MapFile.
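
As a rough illustration of what implementing Writable involves, here is a 
minimal sketch. It assumes nothing beyond the JDK: the Writable interface below 
is a local stand-in that mirrors the two methods of Hadoop's interface so the 
example compiles without Hadoop on the classpath, and LinkCount, 
WritableSketch, and the example URL are hypothetical names made up for this 
page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the shape of Hadoop's Writable interface
// (assumption: used here only so the sketch runs without Hadoop).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type: a page URL and its number of outbound links.
class LinkCount implements Writable {
    String url = "";
    int outlinks;

    // No-argument constructor so an empty instance can be created first
    // and then filled in by readFields.
    LinkCount() {}

    LinkCount(String url, int outlinks) {
        this.url = url;
        this.outlinks = outlinks;
    }

    // The object decides how its own fields are written...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeInt(outlinks);
    }

    // ...and reads them back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        outlinks = in.readInt();
    }
}

class WritableSketch {
    // Serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Rebuild a LinkCount from bytes produced by toBytes.
    static LinkCount fromBytes(byte[] bytes) {
        LinkCount copy = new LinkCount();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            copy.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copy;
    }

    public static void main(String[] args) {
        LinkCount original = new LinkCount("http://example.org/", 3);
        LinkCount copy = fromBytes(toBytes(original));
        System.out.println(copy.url + " " + copy.outlinks);
    }
}
```

Because the caller only ever talks to write and readFields, the same code path 
can serialize a primitive wrapper or a larger structure without knowing which 
one it has been handed.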

=== How does the Hadoop implementation of MapReduce work? ===

 1. First you need a JobConf. This class contains all the relevant information 
for the job. The information that you need to make sure the JobConf includes:
 2. Then you need to submit your job to Hadoop to be run. This is done by 
calling JobClient.runJob. JobClient.runJob submits the job and handles 
receiving status updates back from it. It starts by creating an instance of 
JobClient, then pushes the job toward execution by calling 
JobClient.submitJob.
 3. JobClient.submitJob handles splitting the input files and generating the 
MapReduce task.
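
To make the flow above more concrete, here is a toy in-memory model of it in 
plain Java. It does not use any Hadoop classes and is not how JobClient is 
implemented; it only sketches the idea of splitting the input, running a map 
step over each split, grouping by key, and reducing. The class MiniMapReduce, 
its method names, the "source target" line format, and the page names are all 
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MiniMapReduce {
    // Cut the input into fixed-size splits, one per (imaginary) map task.
    static List<List<String>> split(List<String> lines, int splitSize) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += splitSize) {
            int end = Math.min(i + splitSize, lines.size());
            splits.add(lines.subList(i, end));
        }
        return splits;
    }

    // Map step: for a "source target" line, emit (source, 1) into the
    // grouped intermediate output. Malformed lines are skipped.
    static void map(String line, Map<String, List<Integer>> grouped) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return;
        }
        grouped.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(1);
    }

    // Reduce step: sum all the values emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drive the whole toy job: split, map every line, then reduce each key.
    static Map<String, Integer> runJob(List<String> lines, int splitSize) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> oneSplit : split(lines, splitSize)) {
            for (String line : oneSplit) {
                map(line, grouped);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList(
            "a.html b.html",
            "a.html c.html",
            "b.html a.html");
        // Prints {a.html=2, b.html=1}
        System.out.println(runJob(links, 2));
    }
}
```

In a real Hadoop job the splits are processed by separate tasks and the 
grouping happens between the map and reduce phases; here everything runs in 
one loop purely to show the order of the steps.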

== Tutorials ==
 * CountLinks Counting outbound links with MapReduce
