Re: Limit number of records or total size in combiner input using jobconf?

Chris Douglas Fri, 20 Feb 2009 14:35:47 -0800

So here are my questions:
(1) Is there a jobconf hint to limit the number of records in kviter?
I can (and have) made a fix to my code that processes the values in a
combiner step in batches (i.e., takes N at a go, processes them, and
repeats), but was wondering if I could just set an option.

Approximately and indirectly, yes. You can limit the amount of memory allocated to storing serialized records in memory (io.sort.mb) and the percentage of that space reserved for storing record metadata (io.sort.record.percent, IIRC). That can be used to limit the number of records in each spill, though you may also need to disable the combiner during the merge, where you may run into the same problem.
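A rough sketch of the arithmetic behind those two settings, to show how they bound the record count per spill. The class name, the method name, and the 16-byte per-record metadata cost are all assumptions made for illustration; check the MapTask source for your Hadoop version before relying on the constant.

```java
// Rough estimate only. Nothing here calls Hadoop; it is plain arithmetic.
class SpillEstimate {
    // ASSUMPTION: bytes of accounting metadata kept per collected record.
    static final int BYTES_PER_RECORD = 16;

    /**
     * Approximate number of records whose metadata fits in the share of
     * io.sort.mb set aside by io.sort.record.percent.
     */
    static long maxRecordsPerSpill(int ioSortMb, double ioSortRecordPercent) {
        long bufferBytes = (long) ioSortMb * 1024 * 1024;
        long metadataBytes = (long) (bufferBytes * ioSortRecordPercent);
        return metadataBytes / BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration.
        System.out.println(maxRecordsPerSpill(100, 0.05));
        System.out.println(maxRecordsPerSpill(10, 0.01));
    }
}
```

Under that assumption, io.sort.mb=100 with io.sort.record.percent=0.05 allows roughly 327,680 records. Treat it as an upper bound: a spill can start earlier, for example when the space for serialized data fills before the metadata space does.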

You're almost certainly better off designing your combiner to scale well (as you have), since you'll hit this in the reduce, too.
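For reference, the batching approach described in the question might look something like the sketch below. It is not tied to the Hadoop API: BatchingSketch and forEachBatch are hypothetical names, and the value iterator stands in for whatever the combiner is handed. The point is only that memory use is bounded by the batch size rather than by the number of values for a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class BatchingSketch {
    /** Hands values to `process` in lists of at most batchSize elements. */
    static <T> void forEachBatch(Iterator<T> values, int batchSize,
                                 Consumer<List<T>> process) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (values.hasNext()) {
            batch.add(values.next());
            if (batch.size() == batchSize) {
                process.accept(batch);
                batch = new ArrayList<>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {
            process.accept(batch); // trailing partial batch
        }
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        forEachBatch(values.iterator(), 3, batch -> System.out.println(batch));
    }
}
```

If the combine operation is a simple fold (a sum or a max, say), keeping a running aggregate instead of a batch is simpler still and needs constant memory.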

Since this occurred in the MapContext, changing the number of reducers
won't help.
(2) How does changing the number of reducers help at all? I have 7
machines, so I feel 11 (a prime close to 7, why a prime?) is good
enough (some machines have 16GB, others 32GB).

Your combiner will look at all the records for a partition and only those records in a partition. If your partitioner distributes your records evenly in a particular spill, then increasing the total number of partitions will decrease the number of records your combiner considers in each call. For most partitioners, whether the number of reducers is prime should be irrelevant. -C
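A small illustration of the partitioning point, assuming a hash-modulo partitioner (non-negative hash code modulo the partition count) and evenly spread integer keys. PartitionSpread and its methods are made-up names, and the key range is chosen only so the counts divide cleanly.

```java
class PartitionSpread {
    // Hash-modulo assignment: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Largest number of records in any one partition for keys 0..numKeys-1. */
    static int maxRecordsInAnyPartition(int numKeys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int key = 0; key < numKeys; key++) {
            counts[partitionFor(key, numPartitions)]++;
        }
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        // 9240 is divisible by 7, 11 and 12, so every partition gets an equal share.
        for (int partitions : new int[] {7, 11, 12}) {
            System.out.println(partitions + " partitions -> "
                    + maxRecordsInAnyPartition(9240, partitions) + " records each");
        }
    }
}
```

With an even spread, 12 partitions leave fewer records per partition than 11 do, so the count matters and primality does not. A skewed key distribution would change the picture regardless of the number chosen.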
