Re: lazy-loading of Reduce's input

2011-10-03 Thread Sami Dalouche
Just to make sure I was clear enough:
- Is there a parameter that allows setting the size of the batch of elements
that is fetched into memory while the reduce task iterates over the input
values?

Thanks,
Sami Dalouche
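
As far as I know there is no parameter for the batch size of the values
iterator itself. The closest standard knob is
mapreduce.reduce.input.buffer.percent, which controls how much of the reduce
task's heap may keep shuffled map outputs in memory, not how values are
batched during iteration. A minimal sketch of setting it, assuming Hadoop 2.x
property names (older releases call it mapred.job.reduce.input.buffer.percent):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReduceBufferSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Fraction of the reduce task's heap allowed to retain shuffled
            // map outputs after the shuffle finishes; 0.0f (the default)
            // spills everything to disk before reduce() starts.
            conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.0f);

            Job job = Job.getInstance(conf, "reduce-side-streaming");
            // ... configure mapper, reducer, input and output paths as usual ...
        }
    }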

On Mon, Oct 3, 2011 at 1:42 PM, Sami Dalouche wrote:

> Hi,
>
> My understanding is that when the reduce() method is called, the values
> (the Iterable<VALUEIN> values argument) are stored in memory.
>
> 1/ Is that actually true?
> 2/ If this is true, is there a way to lazy-load the inputs to use less
> memory? (e.g. load the items in batches of 20 and discard the
> previously fetched ones)
> The only related option I could find is mapreduce.reduce.input.limit,
> but it doesn't do what I need.
>
> The problem I am trying to solve is that my input values are huge objects
> (serialized Lucene indices using a custom Writable implementation), and
> loading them all at once seems to require way too much memory.
>
> Thank you,
> Sami Dalouche
>


lazy-loading of Reduce's input

2011-10-03 Thread Sami Dalouche
Hi,

My understanding is that when the reduce() method is called, the values
(the Iterable<VALUEIN> values argument) are stored in memory.

1/ Is that actually true?
2/ If this is true, is there a way to lazy-load the inputs to use less
memory? (e.g. load the items in batches of 20 and discard the
previously fetched ones)
The only related option I could find is mapreduce.reduce.input.limit,
but it doesn't do what I need.

The problem I am trying to solve is that my input values are huge objects
(serialized Lucene indices using a custom Writable implementation), and
loading them all at once seems to require way too much memory.

Thank you,
Sami Dalouche
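
For what it's worth, in the standard Hadoop Reducer the values iterator is
already lazy: the framework deserializes one value at a time from the merged,
sorted map outputs and reuses a single Writable instance, so only the current
value has to fit in memory. A minimal sketch of a reducer that relies on this,
using BytesWritable as a stand-in for the custom index Writable described
above:

    import java.io.IOException;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class StreamingReducer
            extends Reducer<Text, BytesWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<BytesWritable> values,
                Context context) throws IOException, InterruptedException {
            int count = 0;
            for (BytesWritable value : values) {
                // Each iteration deserializes the next value on demand from
                // the merged spill segments. Hadoop reuses the same Writable
                // instance across iterations, so never keep a reference to
                // 'value' past the current iteration; copy it first if it
                // must outlive the loop.
                count++; // process 'value' here instead of just counting
            }
            context.write(key, new IntWritable(count));
        }
    }

The instance-reuse caveat matters here: collecting the values into a list
without copying them would silently yield a list of references to one
repeatedly overwritten object.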