Hi Liu,

Each reducer task runs in its own JVM, so if you really want to access the
output in memory, you have to put your modules into the reducer task itself.
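
For example, a rough sketch of that approach could look like the following;
MyModule and its process() method are only placeholders for your own code:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Placeholder for your own processing module.
    class MyModule {
      void process(ImmutableBytesWritable key, Result value) {
        // ... your in-memory processing here ...
      }
    }

    // Sketch of a reducer that hands each record to your own module, so the
    // data is processed in memory inside the reducer task's JVM.
    public class MyReducer
        extends Reducer<ImmutableBytesWritable, Result, ImmutableBytesWritable, Result> {

      private MyModule module;  // your processing module

      @Override
      protected void setup(Context context) {
        module = new MyModule();  // initialize once per reducer task
      }

      @Override
      protected void reduce(ImmutableBytesWritable key, Iterable<Result> values,
                            Context context) throws IOException, InterruptedException {
        for (Result value : values) {
          module.process(key, value);  // in-memory processing
          context.write(key, value);   // optionally still emit to the OutputFormat
        }
      }
    }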

I am not sure how large your output is. If it is not large, I suggest you put
it in a message, wrap your modules in a listener, and send the message to that
listener for further processing.

If the output is large, I suggest you store it in HDFS, put its location in a
message, and send that message to the listener (a rough sketch follows below).
Since you said your modules are complicated, I suggest you keep them separate
from the map-reduce jobs as described above; it will improve the
maintainability and extensibility of your system.
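
A minimal sketch of that second option might look like this; the ResultListener
interface here is only an assumption, standing in for whatever messaging
mechanism you actually use (a queue, JMS, a plain callback, etc.):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Hypothetical listener interface; for a small output you could pass the
    // data itself in the message instead of just the HDFS location.
    interface ResultListener {
      void onJobFinished(Path outputDir);
    }

    public class JobDriver {
      public static void runAndNotify(Job job, Path outputDir, ResultListener listener)
          throws Exception {
        FileOutputFormat.setOutputPath(job, outputDir);
        if (job.waitForCompletion(true)) {
          // The job's output stays in HDFS; the message only carries its
          // location, and the listener reads the files and does the further
          // processing in your other modules.
          listener.onJobFinished(outputDir);
        }
      }
    }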


Jeff Zhang



On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong <[email protected]> wrote:

> Hi, Jeff. Thanks for your reply. Actually, I will do further processing of the
> map-reduce output. If I cannot store it in memory, the other modules cannot
> process it. If these modules were integrated into map-reduce, they would
> finish the processing inside the map-reduce jobs, but the problem is that
> these modules are complicated. The easy way is to store the job output in
> memory. What do you think? Do you have any such experience?
>
>
> --------------------------------------------------
> From: "Jeff Zhang" <[email protected]>
> Sent: Friday, November 27, 2009 10:46 PM
>
> To: <[email protected]>
> Subject: Re: Store mapreduce output into my own data structures
>
>> So how do you plan to integrate your other modules with Hadoop?
>>
>> Put them in the reduce phase?
>>
>>
>> Jeff Zhang
>>
>>
>>
>> On Fri, Nov 27, 2009 at 6:37 AM, <[email protected]> wrote:
>>
>>> Actually, I want the output to be usable by other modules. So do they have to
>>> read the output from HDFS files? Or should these modules be integrated into
>>> map-reduce? Are there other ways?
>>>
>>> --------------------------------------------------
>>> From: "Jeff Zhang" <[email protected]>
>>> Sent: Friday, November 27, 2009 10:00 PM
>>> To: <[email protected]>
>>> Subject: Re: Store mapreduce output into my own data structures
>>>
>>>
>>>> Hi Liu,
>>>>
>>>> Why do you want to store the output in memory? You cannot use the output
>>>> outside of the reducer.
>>>> Actually, the reducer output starts out in memory, and the OutputFormat
>>>> then writes it to the file system or another data store.
>>>>
>>>>
>>>> Jeff Zhang
>>>>
>>>>
>>>>
>>>> 2009/11/27 Liu Xianglong <[email protected]>
>>>>
>>>>> Hi, everyone. Does anyone use map-reduce and store the reduce output in
>>>>> memory? I mean, right now the job's output path is set and the reduce
>>>>> outputs are stored in files under that path (see the comment in the
>>>>> following code):
>>>>>   job.setOutputFormatClass(MyOutputFormat.class);
>>>>>   // can I implement my own OutputFormat to store these output key-value
>>>>>   // pairs in my data structures, or are there other ways to do it?
>>>>>   job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>>   job.setOutputValueClass(Result.class);
>>>>>   FileOutputFormat.setOutputPath(job, outputDir);
>>>>>
>>>>> Is there any way to store them in some variables or data structures? If so,
>>>>> how can I implement my OutputFormat? Any suggestions and code are welcome.
>>>>>
>>>>> Another question: is there some way to set the number of map tasks? There
>>>>> seems to be no API for this in the new Hadoop Job APIs, so I am not sure
>>>>> how to set this number.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Best Wishes!
>>>>> _____________________________________________________________
>>>>>
>>>>> 刘祥龙  Liu Xianglong
>>>>>
>>>>>
>>>>>
