Hello,

On Fri, Mar 11, 2011 at 1:18 PM, exception <except...@taomee.com> wrote:
> What I am trying to do is storing the values from the map side into a list
> and doing some computation.

Before you attempt this, work out how many values a single grouped key
can receive in your reducer. Storing a few values is alright, but
buffering every value that arrives in a reduce call is not sane in most
cases - you will easily run out of memory.

>     mList.add(new TimeFreq(time,1));

Ensure you clone the 'time' object. Hadoop's Reducer reuses the key and
value objects across iterations, so if you store the references
directly you can run into weird issues where every entry in your list
ends up holding the last value of the last key that came in.
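
Something along these lines (a rough sketch - I'm assuming your value
type is Text and TimeFreq is your own class; swap in whatever Writable
you actually use):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TimeFreqReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        List<TimeFreq> mList = new ArrayList<TimeFreq>();
        for (Text time : values) {
          // new Text(time) copies the bytes; adding 'time' itself would
          // leave every list entry pointing at the same reused object.
          mList.add(new TimeFreq(new Text(time), 1));
        }
        // ... your computation over mList ...
      }
    }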

> This job has 7 reducers. Two of them can finish successfully, but the others
> stop at 66%-70%. These reducers already finish copying and sorting. Looks
> like they are blocked for some reason.

If the same two always pass, it is probably because the key-[value]
groups they receive happen to stay within memory limits (a result of
how the partitioner spreads the keys and of the data used as the key).
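
For reference, the default HashPartitioner picks the target reducer
roughly like this (Text key/value here are just placeholders), so a few
hot keys can pile all their values onto the same reducers while the
others stay small:

    // All values of a given key go to the same reducer.
    public int getPartition(Text key, Text value, int numReduceTasks) {
      return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }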

> I check the memory usage of the blocked java process and find there may be
> memory leak in my code.

You are clearing your list at every reduce call, which is correct and
should avoid a leak across calls. But you still need to check just how
many values get added to the list within _each_ reduce call.
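
A quick way to measure that is a counter incremented inside the same
reduce() as above (the counter group and name here are just made up);
it shows up with the rest of the job counters in the web UI:

    long count = 0;
    for (Text time : values) {
      mList.add(new TimeFreq(new Text(time), 1));
      count++;
    }
    // Lets you spot reduce calls that receive millions of values.
    context.getCounter("Debug", "VALUES_PER_REDUCE_CALL").increment(count);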


-- 
Harsh J
www.harshj.com
