Re: Nutch Extensions to MapReduce

Naama Kraus Sat, 08 Mar 2008 13:06:43 -0800

So the configure() method is called once when the reduce task starts, before
any call to reduce() takes place? Is the same true for map?


Thanks, Naama

On Thu, Mar 6, 2008 at 6:02 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
>
> This is not difficult to do.  Simply open an extra file in the reducer's
> configure method and close it in the close method.  Make sure its path is
> relative to the MapReduce output directory so that you can take advantage
> of all of the machinery that handles lost jobs and such.
>
> Search the mailing list archives for more details.
>
>
> On 3/6/08 5:22 AM, "Naama Kraus" <[EMAIL PROTECTED]> wrote:
>
> > Well, I was not actually thinking of using Nutch.
> > To be concrete, I wanted to know whether a MapReduce job could output
> > multiple files, each holding different <key,value> pairs. I got the
> > impression this is done in Nutch from slide 15 of
> > http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> > but maybe I misunderstood.
> > Is it Nutch-specific or achievable using the Hadoop API? Would multiple
> > different reducers do the trick?
> >
> > Thanks for offering to help. I might have more concrete details of what
> > I am trying to implement later on; for now I am basically learning.
> >
> > Naama
> >
> > On Thu, Mar 6, 2008 at 3:13 PM, Enis Soztutar <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi,
> >>
> >> Currently Nutch is a fairly complex application that *uses* Hadoop as a
> >> base for distributed computing and storage. In this regard there is no
> >> part of Nutch that "extends" Hadoop. The core of MapReduce does indeed
> >> work with <key,value> pairs, and Nutch uses specific <key,value>
> >> pairs such as <url, CrawlDatum>, etc.
> >>
> >> So, long story short, it depends on what you want to build. If you are
> >> working on something that is not related to Nutch, you do not need it.
> >> You can give further info about your project if you want extended help.
> >>
> >> best wishes.
> >> Enis
> >>
> >> Naama Kraus wrote:
> >>> Hi,
> >>>
> >>> I've seen in
> >>> http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
> >>> (slide 12) that Nutch has extensions to MapReduce. I wanted to ask whether
> >>> these are part of the Hadoop API or inside Nutch only.
> >>>
> >>> More specifically, I saw in
> >>> http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> >>> (slide 15) that MapReduce outputs two files, each holding different
> >>> <key,value> pairs. I'd be curious to know if I can achieve that using the
> >>> standard API.
> >>>
> >>> Thanks, Naama
> >>>
> >>>
> >>
> >
> >
>
>
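
Ted's suggestion above (open an extra file in configure(), write to it during
reduce, close it in close()) can be sketched in plain Java. The sketch uses no
Hadoop classes: SideFileReducer, its method signatures, and the side-NNNNN file
name are hypothetical and only mimic the configure/reduce/close lifecycle of a
reducer. In the old org.apache.hadoop.mapred API, configure(JobConf) is invoked
once when the task's mapper or reducer object is created, before any map() or
reduce() call, and close() is invoked once after the last call. In a real job
the output directory would come from the job configuration rather than from a
temporary directory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

/**
 * Hypothetical stand-in for a reducer; NOT a Hadoop class. It only mimics the
 * configure() -> reduce()* -> close() lifecycle discussed in the thread.
 */
class SideFileReducer {
    private BufferedWriter side; // the extra output file, one per task

    /** Called once per task, before any reduce() call. */
    void configure(Path outputDir, int partition) throws IOException {
        // Keep the side file under the job output directory, named per task
        // so that parallel reducers do not overwrite each other.
        Path file = outputDir.resolve(String.format("side-%05d", partition));
        side = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /**
     * Called once per key. Returns the line for the main output (key, sum)
     * and writes a different pair (key, count) to the side file.
     */
    String reduce(String key, Iterator<Integer> values) throws IOException {
        int sum = 0;
        int count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        side.write(key + "\t" + count);
        side.newLine();
        return key + "\t" + sum;
    }

    /** Called once per task, after the last reduce() call. */
    void close() throws IOException {
        if (side != null) {
            side.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory plays the role of the job output dir.
        Path out = Files.createTempDirectory("job-output");
        SideFileReducer r = new SideFileReducer();
        r.configure(out, 0);
        System.out.println(r.reduce("a", Arrays.asList(1, 2, 3).iterator()));
        System.out.println(r.reduce("b", Arrays.asList(10).iterator()));
        r.close();
        System.out.println(
            Files.readAllLines(out.resolve("side-00000"), StandardCharsets.UTF_8));
    }
}
```

The main output and the side file end up holding different <key,value> pairs
for the same keys, which is the effect asked about at the start of the thread.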


-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
