Hi,

Currently, Nutch is a fairly complex application that *uses* Hadoop as a base for distributed computing and storage. In this regard, no part of Nutch "extends" Hadoop. The core of MapReduce does indeed work with <key,value> pairs, and Nutch uses its own specific <key,value> pairs, such as <url, CrawlDatum>.

So, long story short, it depends on what you want to build. If you are working on something that is not related to Nutch, you do not need it. Feel free to give more details about your project if you want more specific help.
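The quoted question below asks about one MapReduce job writing two files, each holding different <key,value> pairs. The idea can be illustrated with a plain-Java sketch that does not use Hadoop at all. Every class, field and method name in it is hypothetical. Within Hadoop itself, later releases provide a `MultipleOutputs` class for this purpose, but check the API of the Hadoop version you actually use.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only (no Hadoop): one "reduce" pass that routes its
// results into two separate outputs holding different <key,value> types.
// All names here are hypothetical, not part of the Hadoop or Nutch APIs.
public class TwoOutputsSketch {

    // Stand-in for output file 1: <String url, Integer total> pairs.
    static final Map<String, Integer> countsOutput = new TreeMap<>();

    // Stand-in for output file 2: <String host, String url> pairs.
    static final List<Map.Entry<String, String>> hostsOutput = new ArrayList<>();

    // Hypothetical reduce step: receives one key (a url) and all values grouped under it.
    static void reduce(String url, List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        // First kind of pair goes to the first output.
        countsOutput.put(url, total);
        // Second kind of pair, with different key/value types, goes to the second output.
        String host = URI.create(url).getHost();
        hostsOutput.add(Map.entry(host, url));
    }

    public static void main(String[] args) {
        reduce("http://example.com/a", List.of(1, 2));
        reduce("http://example.org/b", List.of(5));
        System.out.println(countsOutput);
        System.out.println(hostsOutput);
    }
}
```

In a real job, each of the two in-memory collections would correspond to a separate output file written by the framework.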

best wishes.
Enis

Naama Kraus wrote:
Hi,

I've seen in
http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf (slide
12) that Nutch has extensions to MapReduce. I wanted to ask whether
these are part of the Hadoop API or exist inside Nutch only.

More specifically, I saw in
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf (slide
15) that a MapReduce job outputs two files, each holding different <key,value>
pairs. I'd be curious to know whether I can achieve that using the standard API.

Thanks, Naama
