Hi Jeff, thanks for your reply. Actually, I need to process the map-reduce
output further. If I cannot keep it in memory, my other modules cannot
process it directly. If these modules were integrated into map-reduce, they
could finish their processing inside the mapreduce jobs. The problem is that
these modules are complicated, so the easy way is to store the job output in
memory. What do you think? Do you have any experience with this?
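One way to hand the job output to other modules without rewriting them as
map-reduce tasks is to let the job write its files as usual and then load
those files into a data structure in the driver program. A minimal sketch,
assuming the output was written as text lines of the form key<TAB>value and
has been copied to a local directory; the class name OutputLoader and the
method loadParts are hypothetical, and output written in a binary format
(for example Result values) would need a matching reader instead:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: loads "key<TAB>value" lines from part-* files into a map.
class OutputLoader {

    static Map<String, String> loadParts(Path outputDir) throws IOException {
        Map<String, String> result = new TreeMap<>();
        // Each reducer writes its own part-* file; marker files such as
        // _SUCCESS do not match the glob and are skipped.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // not a key/value line
                    }
                    // Split on the first tab only; the value may contain tabs.
                    result.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny fake output directory so the sketch runs on its own.
        Path dir = Files.createTempDirectory("job-output");
        Files.write(dir.resolve("part-r-00000"), Arrays.asList("apple\t3", "pear\t1"));
        Files.write(dir.resolve("part-r-00001"), Arrays.asList("zebra\t7"));
        Files.createFile(dir.resolve("_SUCCESS"));

        Map<String, String> pairs = loadParts(dir);
        System.out.println(pairs); // prints {apple=3, pear=1, zebra=7}
    }
}
```

The other modules can then work on the returned map as an ordinary in-memory
structure, provided the whole output fits in the memory of one machine.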
--------------------------------------------------
From: "Jeff Zhang" <[email protected]>
Sent: Friday, November 27, 2009 10:46 PM
To: <[email protected]>
Subject: Re: Store mapreduce output into my own data structures
So how do you plan to integrate your other modules with Hadoop?
Put them in the reduce phase?
Jeff Zhang
On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:
Actually, I want the output to be usable by other modules. So do they have to
read the output from HDFS files? Or should these modules be integrated into
map-reduce? Are there other ways?
--------------------------------------------------
From: "Jeff Zhang" <[email protected]>
Sent: Friday, November 27, 2009 10:00 PM
To: <[email protected]>
Subject: Re: Store mapreduce output into my own data structures
Hi Liu,
Why do you want to store the output in memory? You cannot use the output
outside of the reducer.
Actually, at the beginning the output of the reducer is in memory, and the
OutputFormat writes this data to the file system or another data store.
Jeff Zhang
2009/11/27 Liu Xianglong <[email protected]>
Hi, everyone. Is there anyone who uses map-reduce and stores the reduce
output in memory? I mean, currently the output path of the job is set and the
reduce outputs are stored in files under this path (see the comments in the
following code):
job.setOutputFormatClass(MyOutputFormat.class);
// Can I implement my OutputFormat to store these output key-value pairs
// in my data structures, or are there other ways to do it?
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);
FileOutputFormat.setOutputPath(job, outputDir);
Is there any way to store them in some variables or data structures? If so,
how can I implement my OutputFormat? Any suggestions and code are welcome.
Another question: is there some way to set the number of map tasks? It seems
there is no API for this in the new Hadoop job API. I am not sure how to set
this number.
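On the number of map tasks: with file-based input the count is not set
directly but follows from how the input is divided into splits, with one map
task per split. A rough sketch of that arithmetic, assuming the split size is
chosen as max(minSize, min(maxSize, blockSize)) as FileInputFormat in the new
API does; the class SplitEstimate and its method names are hypothetical, and
the estimate covers a single file and ignores rounding details (Hadoop lets
the last split of a file run slightly over the split size):

```java
// Hypothetical helper: estimates how many map tasks one input file produces.
class SplitEstimate {

    // Split size as bounded by the configured minimum and maximum.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // One map task per split; round up so a partial last split is counted.
    static long estimateMaps(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb; // assumed 1 GB input file
        long blockSize = 64 * mb;  // assumed HDFS block size

        // No bounds configured: one split per block.
        long byBlock = splitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(estimateMaps(fileSize, byBlock)); // prints 16

        // Lower the maximum split size to get more map tasks.
        long smaller = splitSize(blockSize, 1L, 16 * mb);
        System.out.println(estimateMaps(fileSize, smaller)); // prints 64

        // Raise the minimum split size to get fewer map tasks.
        long larger = splitSize(blockSize, 256 * mb, Long.MAX_VALUE);
        System.out.println(estimateMaps(fileSize, larger)); // prints 4
    }
}
```

In a real job the two bounds would be set on the job itself, for example with
FileInputFormat.setMinInputSplitSize(job, size) and
FileInputFormat.setMaxInputSplitSize(job, size); whether these are available
depends on the Hadoop version in use.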
Thanks!
Best Wishes!
_____________________________________________________________
刘祥龙 Liu Xianglong