Hi Liu,

Each reducer task runs in its own JVM, so if you really want to access the output in memory, you have to put your modules into the reducer task itself.
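A minimal sketch of the message/listener decoupling in plain Java (all names here — `Message`, `OutputListener`, `MyModule` — are hypothetical, not Hadoop API): a small result travels inside the message itself, while a large one stays in HDFS and only its path is sent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names -- a plain-Java sketch of the pattern, not Hadoop API.
class ListenerSketch {

    // A message either carries the (small) output inline,
    // or just the HDFS path of the (large) output.
    static class Message {
        final List<String> inlineOutput;  // null when the output is large
        final String hdfsLocation;        // null when the output is inline

        Message(List<String> inlineOutput, String hdfsLocation) {
            this.inlineOutput = inlineOutput;
            this.hdfsLocation = hdfsLocation;
        }
    }

    // Your complicated modules hide behind this interface,
    // separated from the map-reduce job itself.
    interface OutputListener {
        void onJobOutput(Message m);
    }

    static class MyModule implements OutputListener {
        final List<String> processed = new ArrayList<>();

        public void onJobOutput(Message m) {
            if (m.inlineOutput != null) {
                processed.addAll(m.inlineOutput);             // small: process directly
            } else {
                processed.add("read-from:" + m.hdfsLocation); // large: fetch from HDFS
            }
        }
    }
}
```

The point of the indirection is that the job only needs to build and send a `Message`; the modules can change independently behind the listener interface.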
I am not sure how large your output is. If it is not large, I suggest putting it in a message, wrapping your modules in a listener, and sending the message to that listener for further processing. If the output is large, I suggest storing it in HDFS and sending the listener a message containing only its location. Since you said your modules are complicated, I suggest separating them from the map-reduce jobs as described above; it will improve the maintainability and extensibility of your system.

Jeff Zhang

On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong <[email protected]> wrote:

> Hi, Jeff. Thanks for your reply. Actually, I will do further processing of
> the map-reduce output. If I cannot store it in memory, the other modules
> cannot process it. So if these modules are integrated into map-reduce, they
> will finish the processing inside the map-reduce jobs. The problem is that
> these modules are complicated. The easy way is to store the output of the
> jobs in memory. What do you think? Do you have such experience?
>
> --------------------------------------------------
> From: "Jeff Zhang" <[email protected]>
> Sent: Friday, November 27, 2009 10:46 PM
> To: <[email protected]>
> Subject: Re: Store mapreduce output into my own data structures
>
>> So how do you plan to integrate your other modules with hadoop?
>> Put them in the reduce phase?
>>
>> Jeff Zhang
>>
>> On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:
>>
>>> Actually I want the output to be usable by other modules. So do they
>>> have to read the output from HDFS files? Or should these modules be
>>> integrated into map-reduce? Are there other ways?
>>>
>>> --------------------------------------------------
>>> From: "Jeff Zhang" <[email protected]>
>>> Sent: Friday, November 27, 2009 10:00 PM
>>> To: <[email protected]>
>>> Subject: Re: Store mapreduce output into my own data structures
>>>
>>>> Hi Liu,
>>>>
>>>> Why do you want to store the output in memory?
>>>> You cannot use the output outside of the reducer.
>>>> Actually, at the beginning the output of the reducer is in memory, and
>>>> then the OutputFormat writes this data to the file system or another
>>>> data store.
>>>>
>>>> Jeff Zhang
>>>>
>>>> 2009/11/27 Liu Xianglong <[email protected]>
>>>>
>>>>> Hi, everyone. Is there someone who uses map-reduce to store the reduce
>>>>> output in memory? I mean, currently the output path of the job is set
>>>>> and the reduce outputs are stored in files under this path (see the
>>>>> comments along with the following code):
>>>>>
>>>>> job.setOutputFormatClass(MyOutputFormat.class);
>>>>> // can I implement my OutputFormat to store these output key-value
>>>>> // pairs in my data structures, or are there other ways to do it?
>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>> job.setOutputValueClass(Result.class);
>>>>> FileOutputFormat.setOutputPath(job, outputDir);
>>>>>
>>>>> Is there any way to store them in some variables or data structures?
>>>>> Then how can I implement my OutputFormat? Any suggestions and code are
>>>>> welcome.
>>>>>
>>>>> Another question: is there some way to set the number of map tasks? It
>>>>> seems there is no API for this in Hadoop's new job API. I am not sure
>>>>> how to set this number.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Best Wishes!
>>>>> _____________________________________________________________
>>>>>
>>>>> 刘祥龙 Liu Xianglong
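On Liu's original question in the quoted thread: in the real API you would subclass org.apache.hadoop.mapreduce.OutputFormat and have its getRecordWriter() return a RecordWriter whose write() appends to a shared structure instead of a file — which, as discussed above, only helps when the consumer runs in the same JVM as the reduce task. Since Hadoop itself is not on hand here, the following is a simplified plain-Java stand-in that mirrors the RecordWriter shape; the interface and class names are illustrative, not Hadoop's.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified stand-in for Hadoop's RecordWriter<K,V>; the real class lives
// in org.apache.hadoop.mapreduce and is returned by your OutputFormat's
// getRecordWriter(). All names here are illustrative.
class InMemorySketch {

    interface RecordWriter<K, V> {
        void write(K key, V value);
        void close();
    }

    // Shared, thread-safe store. This only works when the code that reads
    // RESULTS runs in the same JVM as the reduce task -- exactly the
    // limitation discussed in the thread.
    static final Queue<Map.Entry<String, String>> RESULTS =
            new ConcurrentLinkedQueue<>();

    static class InMemoryRecordWriter implements RecordWriter<String, String> {
        public void write(String key, String value) {
            RESULTS.add(new SimpleEntry<>(key, value));  // keep the pair in memory
        }
        public void close() { /* nothing to flush */ }
    }
}
```

In a real job, each reduce task would get its own JVM and therefore its own copy of `RESULTS`, which is why the message/listener or HDFS-location approaches above scale better.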

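On the second question (the number of map tasks): in the new API there is no direct setter; the number of map tasks equals the number of input splits the InputFormat produces. For FileInputFormat you steer it with setMinInputSplitSize / setMaxInputSplitSize, and the split size is derived from the min size, max size, and HDFS block size. Below is a plain-Java reimplementation of that formula for illustration (the helper names are mine, not Hadoop's):

```java
// Plain-Java illustration of how FileInputFormat derives the split size
// (and hence the number of map tasks) from the configured min/max split
// sizes and the HDFS block size.
class SplitMath {

    // Mirrors FileInputFormat.computeSplitSize():
    // clamp the block size between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Roughly one map task per split: number of splits for a file
    // of fileSize bytes (ceiling division).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }
}
```

So with a 64 MB block size, lowering the max split size to 16 MB turns one 64 MB file into four splits (four map tasks), and raising the min split size merges blocks into fewer, larger splits; you cannot force fewer maps than there are unsplittable input files.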