Re: Store mapreduce output into my own data structures

Liu Xianglong Fri, 27 Nov 2009 21:46:02 -0800

Hi, Jeff. Thanks for your reply. Actually, I need to do further processing of the map-reduce output. If I cannot store it in memory, my other modules cannot process it. If these modules were integrated into map-reduce, they would finish their processing inside the mapreduce jobs. The problem is that these modules are complicated. The easy way is to store the output of the jobs in memory. What do you think? Do you have any experience with this?


--------------------------------------------------
From: "Jeff Zhang" <[email protected]>
Sent: Friday, November 27, 2009 10:46 PM
To: <[email protected]>
Subject: Re: Store mapreduce output into my own data structures

So how do you plan to integrate your other modules with Hadoop?

Would you put them in the reduce phase?


Jeff Zhang



On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:

Actually, I want the output to be usable by other modules. So do they have to read the output from HDFS files? Or should I integrate these modules into map-reduce? Are there other ways?

--------------------------------------------------
From: "Jeff Zhang" <[email protected]>
Sent: Friday, November 27, 2009 10:00 PM
To: <[email protected]>
Subject: Re: Store mapreduce output into my own data structures


 Hi Liu,


Why do you want to store the output in memory? You cannot use the output
outside of the reducer.
Actually, the output of the reducer is in memory at first, and the
OutputFormat then writes this data to the file system or another data store.
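
Reduce tasks usually run in separate JVMs, often on other machines, so a data structure filled inside an OutputFormat is not visible to the program that submitted the job. One way to get the results into memory anyway is to let the job write its files as usual and load them once the job has finished. The following is only a minimal sketch under several assumptions: the job wrote text output as tab-separated key/value lines into part-* files, those files are reachable as a local directory, and the class name OutputLoader is hypothetical. It uses plain Java and no Hadoop API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: loads finished job output into memory.
class OutputLoader {

    // Reads every part-* file in outputDir and returns key -> value.
    // Assumes one "key<TAB>value" record per line; a later duplicate key
    // overwrites an earlier one.
    static Map<String, String> load(Path outputDir) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // skip lines without a key/value separator
                    }
                    records.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Fake a job output directory so the sketch can be run on its own.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-r-00000"),
                "apple\t3\nbanana\t5\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(dir));
    }
}
```

Binary output such as ImmutableBytesWritable/Result pairs would need a matching reader instead of the line parsing shown here, and output that is too large for one machine's memory should stay in files.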


Jeff Zhang



2009/11/27 Liu Xianglong <[email protected]>

 Hi, everyone. Has anyone used map-reduce to keep the reduce output in
memory? At the moment the output path of the job is set and the reduce
outputs are stored in files under this path (see the comment in the
following code).
   job.setOutputFormatClass(MyOutputFormat.class);
   // Can I implement my OutputFormat to store these output key-value pairs
   // in my own data structures, or are there other ways to do it?
   job.setOutputKeyClass(ImmutableBytesWritable.class);
   job.setOutputValueClass(Result.class);
   FileOutputFormat.setOutputPath(job, outputDir);

 Is there any way to store them in variables or data structures? If so,
how can I implement my OutputFormat? Any suggestions and code are
welcome.

Another question: is there some way to set the number of map tasks? It
seems there is no API to do this in Hadoop's new job APIs. I am not sure
how to set this number.
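
On the map task question: with file-based input, the number of map tasks follows from the number of input splits rather than from a direct setting, so it is usually influenced through the split size. The sketch below is a rough estimate only. It assumes a single splittable file and uses the formula max(minSize, min(maxSize, blockSize)), which is, as far as I know, how the new-API FileInputFormat picks a split size; the class and method names are hypothetical and no Hadoop classes are used.

```java
// Hypothetical helper, not part of Hadoop: estimates how many map tasks
// a single splittable input file would produce.
class SplitEstimate {

    // Split size as max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate map task count: the file size divided by the split size,
    // rounded up. Real jobs also depend on per-file and per-block details.
    static long estimateMapTasks(long fileBytes, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        return (fileBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 1 GB file, 64 MB blocks, no explicit limits: one split per block.
        System.out.println(estimateMapTasks(1024 * mb, 64 * mb, 1, Long.MAX_VALUE)); // prints 16
        // Capping the split size at 16 MB yields more, smaller map tasks.
        System.out.println(estimateMapTasks(1024 * mb, 64 * mb, 1, 16 * mb)); // prints 64
    }
}
```

In an actual job the limits would be set on the job itself; I believe the new API offers FileInputFormat.setMinInputSplitSize(job, size) and FileInputFormat.setMaxInputSplitSize(job, size) in org.apache.hadoop.mapreduce.lib.input for this, but please check the version you are running.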

Thanks!

Best Wishes!
_____________________________________________________________

刘祥龙  Liu Xianglong
