Just to make sure I was clear enough:
- Is there a parameter that sets the size of the batch of elements that is
retrieved into memory while the reduce task iterates over the input
values?
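To illustrate, the batched consumption I have in mind looks roughly like this.
This is a plain-Java sketch, independent of the Hadoop Reducer API; the
`consumeInBatches` helper and the batch size of 20 are just illustrative, not
an existing Hadoop feature:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchedIteration {
    // Consume an iterator in fixed-size batches, discarding each batch
    // before fetching the next, so at most `batchSize` elements are
    // held in memory at any one time.
    static <T> int consumeInBatches(Iterator<T> it, int batchSize) {
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (it.hasNext()) {
            batch.add(it.next());
            if (batch.size() == batchSize || !it.hasNext()) {
                process(batch);  // e.g. merge this batch of values
                batch.clear();   // free the previous batch
                batches++;
            }
        }
        return batches;
    }

    static <T> void process(List<T> batch) {
        // placeholder for the real per-batch work
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 45; i++) values.add(i);
        // 45 values in batches of 20 -> 3 batches
        System.out.println(consumeInBatches(values.iterator(), 20));
    }
}
```

The question is whether the framework can be told to feed the reducer's
Iterable in a bounded way like this, rather than my doing it by hand.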
Thanks,
Sami Dalouche
On Mon, Oct 3, 2011 at 1:42 PM, Sami Dalouche wrote:
> Hi,
>
> My understanding is that when the reduce() method is called, the values
> (Iterable values) are stored in memory.
>
> 1/ Is that actually true?
> 2/ If this is true, is there a way to lazy-load the inputs so they use less
> memory? (e.g. load the items in batches of 20, discarding each
> previously fetched batch)
> The only related option that I could find is mapreduce.reduce.input.limit,
> but it doesn't do what I need.
>
> The problem I am trying to solve is that my input values are huge objects
> (serialized Lucene indices using a custom Writable implementation), and
> loading them all at once seems to require way too much memory.
>
> Thank You,
> Sami Dalouche
>