Re: Nutch Extensions to MapReduce

Enis Soztutar Thu, 06 Mar 2008 05:14:26 -0800

Hi,

Currently Nutch is a fairly complex application that *uses* Hadoop as a base for distributed computing and storage. In this regard there is no part in Nutch that "extends" Hadoop. The core of MapReduce indeed does work with <key,value> pairs, and Nutch uses specific <key,value> pairs such as <url, CrawlDatum>, etc.
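To make the <key,value> point concrete, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop or Nutch APIs: PageStatus is a hypothetical stand-in for a value type like CrawlDatum, and the tab-separated record format, the map/reduce/run method names, and the "keep the latest fetch" rule are all assumptions made only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyValueSketch {

    // Hypothetical stand-in for a value type such as Nutch's CrawlDatum.
    static final class PageStatus {
        final String status;
        final long fetchTime;

        PageStatus(String status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    // Map step: turn one "url<TAB>status<TAB>fetchTime" record (assumed format)
    // into a <url, PageStatus> pair.
    static Map.Entry<String, PageStatus> map(String line) {
        String[] fields = line.split("\t");
        PageStatus value = new PageStatus(fields[1], Long.parseLong(fields[2]));
        return new AbstractMap.SimpleEntry<>(fields[0], value);
    }

    // Reduce step: of all values collected for one url, keep the latest fetch.
    static PageStatus reduce(String url, List<PageStatus> values) {
        PageStatus latest = values.get(0);
        for (PageStatus v : values) {
            if (v.fetchTime > latest.fetchTime) {
                latest = v;
            }
        }
        return latest;
    }

    // Driver: map every record, group the values by key, then reduce each group.
    // In a real framework the grouping is done for you between map and reduce.
    static Map<String, PageStatus> run(List<String> lines) {
        Map<String, List<PageStatus>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, PageStatus> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, PageStatus> out = new TreeMap<>();
        for (Map.Entry<String, List<PageStatus>> e : grouped.entrySet()) {
            out.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "http://a.example/\tunfetched\t0",
            "http://a.example/\tfetched\t100",
            "http://b.example/\tunfetched\t5");
        for (Map.Entry<String, PageStatus> e : run(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().status);
        }
    }
}
```

The only application-specific parts are the key and value types and the two small functions. That is the sense in which an application uses the <key,value> model rather than extending the framework.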

So, long story short, it depends on what you want to build. If you are working on something that is not related to Nutch, you do not need it. You can give further info about your project if you want extended help.


best wishes.
Enis

Naama Kraus wrote:

Hi,

I've seen in
http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf (slide
12) that Nutch has extensions to MapReduce. I wanted to ask whether
these are part of the Hadoop API or inside Nutch only.

More specifically, I saw in
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf (slide
15) that MapReduce outputs two files, each holding different <key,value>
pairs. I'd be curious to know if I can achieve that using the standard API.

Thanks, Naama
