There are no per-key metrics provided by MapReduce, but you should be able to run your job with an identity reducer to see what the bucket sizes were.
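As a rough sketch of the post-processing step, assume you have already pulled the keys out of the identity reducer's output. With the default TextOutputFormat, that is the first tab-separated field of each line in the part-r-* files. Counting records per key and flagging oversized ones could then look like the code below. The class name KeyCounts and the maxRecords threshold are hypothetical, and nothing here is a Hadoop API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical offline helper: counts records per key from keys read back
 * out of an identity-reducer run, and reports keys above a chosen threshold.
 */
public class KeyCounts {

    /** Counts how many records share each key (sorted by key). */
    public static Map<String, Long> countPerKey(List<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum);
        }
        return counts;
    }

    /** Returns only the keys whose record count is strictly greater than maxRecords. */
    public static Map<String, Long> oversized(Map<String, Long> counts, long maxRecords) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > maxRecords) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a", "b");
        Map<String, Long> counts = countPerKey(keys);
        System.out.println(counts);               // {a=3, b=2, c=1}
        System.out.println(oversized(counts, 2)); // {a=3}
    }
}
```

In a real job you would more likely do the counting in the reducer itself, emitting (key, count) instead of every record, so the output stays small.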

If you're talking about doing it on the fly, there's no way to do that today.  The job is submitted with a fixed number of reducers, which also fixes the number of buckets.  YARN supports adding resources to an existing job, e.g. adding more reducers, but MapReduce doesn't make use of those capabilities.
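To illustrate why the reducer count fixes the buckets: Hadoop's default HashPartitioner assigns each record to partition (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The standalone sketch below reproduces only that arithmetic, on plain Java Strings. The class and method names are hypothetical. Hadoop's Text keys hash differently from String, so the bucket a particular key lands in will not match a real job. The point it shows is that every record with the same key goes to the same bucket, and the number of buckets is the reducer count chosen at submission time.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of default hash partitioning (not Hadoop code). */
public class PartitionSketch {

    /** Same arithmetic as Hadoop's HashPartitioner, applied to a plain Java object. */
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Number of records that would land in each of the numReduceTasks buckets. */
    public static long[] bucketSizes(List<String> keys, int numReduceTasks) {
        long[] sizes = new long[numReduceTasks];
        for (String k : keys) {
            sizes[partitionFor(k, numReduceTasks)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a", "b");
        // "a" (hash 97) and "c" (hash 99) map to bucket 1, "b" (hash 98) to bucket 0.
        System.out.println(Arrays.toString(bucketSizes(keys, 2))); // [2, 4]
    }
}
```

Because all records sharing a key always land in one bucket, adding reducers cannot by itself break up a single hot key. Splitting it requires changing the key (or the partitioner) before the job is submitted.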

Daniel

On 11/26/18 9:10 PM, Tianxiang Li wrote:
Dear Hadoop community,

I'm new to the Hadoop MapReduce code, and I'd like to know how I can get the
number of records under a specific key after the map phase. I'd like to
detect oversized buckets and split their keys further to divide the records.

Thanks,
Peter


