So the configure() method is called when the Reduce task starts, before the actual reduce takes place? Is that so? Is it the same for map?
Thanks, Naama

On Thu, Mar 6, 2008 at 6:02 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> This is not difficult to do. Simply open an extra file in the reducer's
> configure method and close it in the close method. Make sure you make it
> relative to the MapReduce output directory so that you can take advantage
> of all of the machinery that handles lost jobs and such.
>
> Search the mailing list archives for more details.
>
> On 3/6/08 5:22 AM, "Naama Kraus" <[EMAIL PROTECTED]> wrote:
>
> > Well, I was not actually thinking of using Nutch.
> > To be concrete, I was wondering whether a MapReduce job could output
> > multiple files, each holding different <key,value> pairs. I got the
> > impression this is done in Nutch from slide 15 of
> > http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> > but maybe I was misunderstanding.
> > Is it Nutch-specific or achievable using the Hadoop API? Would multiple
> > different reducers do the trick?
> >
> > Thanks for offering to help. I might have more concrete details of what
> > I am trying to implement later on; for now I am basically learning.
> >
> > Naama
> >
> > On Thu, Mar 6, 2008 at 3:13 PM, Enis Soztutar <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> Currently Nutch is a fairly complex application that *uses* Hadoop as a
> >> base for distributed computing and storage. In this regard there is no
> >> part of Nutch that "extends" Hadoop. The core of MapReduce indeed
> >> works with <key,value> pairs, and Nutch uses specific <key,value>
> >> pairs such as <url, CrawlDatum>, etc.
> >>
> >> So, long story short, it depends on what you want to build. If you are
> >> working on something that is not related to Nutch, you do not need it.
> >> You can give further info about your project if you want extended help.
> >>
> >> Best wishes,
> >> Enis
> >>
> >> Naama Kraus wrote:
> >>> Hi,
> >>>
> >>> I've seen in
> >>> http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
> >>> (slide 12) that Nutch has extensions to MapReduce. I wanted to ask whether
> >>> these are part of the Hadoop API or inside Nutch only.
> >>>
> >>> More specifically, I saw in
> >>> http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> >>> (slide 15) that MapReduce outputs two files, each holding different
> >>> <key,value> pairs. I'd be curious to know if I can achieve that using the
> >>> standard API.
> >>>
> >>> Thanks, Naama
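The pattern Ted describes (open the extra file in configure(), write to it from reduce(), close it in close()) can be illustrated without a cluster. The Java below is a framework-free simulation, not Hadoop code: SideFileReducer, runTask, the task id "r-00000", the "side-" file name and the "sum over 10" routing rule are all hypothetical. It assumes, as in the classic org.apache.hadoop.mapred API, that the framework calls configure() once per task before the first reduce() call and close() once after the last one; a Mapper gets the same configure()/map()/close() sequence. The workDir argument stands in for the task's temporary output directory. In later releases of that API, FileOutputFormat.getWorkOutputPath(JobConf) returns it, and files written there are promoted to the job output directory only when the task attempt commits, which is the "machinery that handles lost jobs" Ted mentions. Check the release you actually run.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Framework-free simulation of the "side file" pattern; none of these names are Hadoop APIs.
public class SideFileReducerSketch {

    // Hypothetical reducer mirroring the configure/reduce/close lifecycle of the classic API.
    static class SideFileReducer {
        private final List<String> events = new ArrayList<>();
        private BufferedWriter sideFile;

        // Called once per task, before the first reduce(): open the extra file here.
        void configure(Path taskWorkDir, String taskId) throws IOException {
            events.add("configure");
            Files.createDirectories(taskWorkDir);
            // One file per task, named after the task so parallel tasks never collide.
            sideFile = Files.newBufferedWriter(taskWorkDir.resolve("side-" + taskId), StandardCharsets.UTF_8);
        }

        // Called once per key: the sum goes to the main output and
        // (hypothetical rule) sums over 10 are also copied to the side file.
        void reduce(String key, List<Integer> values, Map<String, Integer> out) throws IOException {
            events.add("reduce:" + key);
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.put(key, sum);
            if (sum > 10) {
                sideFile.write(key + "\t" + sum);
                sideFile.newLine();
            }
        }

        // Called once per task, after the last reduce(): flush and close the extra file here.
        void close() throws IOException {
            events.add("close");
            if (sideFile != null) {
                sideFile.close();
            }
        }

        List<String> events() {
            return events;
        }
    }

    // Drives the reducer the way the framework would: configure, reduce each key in order, close.
    static SideFileReducer runTask(Map<String, List<Integer>> grouped, Path workDir,
                                   Map<String, Integer> out) throws IOException {
        SideFileReducer reducer = new SideFileReducer();
        reducer.configure(workDir, "r-00000");
        try {
            for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                reducer.reduce(e.getKey(), e.getValue(), out);
            }
        } finally {
            reducer.close();
        }
        return reducer;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<Integer>> grouped = new TreeMap<>();  // sorted keys, as a reducer sees them
        grouped.put("a", Arrays.asList(1, 2));
        grouped.put("b", Arrays.asList(7, 8));
        Path workDir = Files.createTempDirectory("task-work");  // stands in for the task's work output dir
        Map<String, Integer> out = new TreeMap<>();

        SideFileReducer reducer = runTask(grouped, workDir, out);
        System.out.println(reducer.events());  // [configure, reduce:a, reduce:b, close]
        System.out.println(out);               // {a=3, b=15}
        System.out.println(Files.readAllLines(workDir.resolve("side-r-00000"), StandardCharsets.UTF_8));
    }
}
```

Running main shows configure() happening once before any reduce() call and close() once after the last one. The main output holds both keys, while the side file holds only the line for key b. Naming the side file after the task id is what keeps concurrent reduce tasks from overwriting each other's extra output.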