MapReduce doesn't provide per-key metrics, but you should be able to run
your job with an identity reducer and count the records per key in its
output to see what the bucket sizes were.
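Once an identity-reducer run has written the map output back out, finding the bucket sizes is a group-and-count over the keys. The sketch below is a plain-Java illustration of that counting over an in-memory list, so it runs without Hadoop on the classpath. The class name `BucketSizes`, the helper methods, the threshold, and the sample records are all hypothetical. In a real job you would read the reducer's output files instead.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class BucketSizes {
    // Counts how many records share each key, i.e. the size of the group
    // a reducer would receive for that key. Keys/values are hypothetical.
    static Map<String, Integer> countPerKey(List<Map.Entry<String, String>> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> r : records) {
            counts.merge(r.getKey(), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the keys whose group holds more than `threshold` records.
    static List<String> oversized(Map<String, Integer> counts, int threshold) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for records read back from an identity-reducer run.
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("a", "1"), Map.entry("b", "2"), Map.entry("a", "3"),
                Map.entry("a", "4"), Map.entry("c", "5"));
        Map<String, Integer> counts = countPerKey(records);
        System.out.println(counts);               // {a=3, b=1, c=1}
        System.out.println(oversized(counts, 2)); // [a]
    }
}
```

The threshold is a judgment call for your data; the sketch only shows that the counts are available after the fact, not during the running job.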
If you're talking about doing it on the fly, there's no way to do that
today. The job is submitted with a fixed number of reducers, which also
fixes the number of buckets. YARN supports adding resources to an
existing job, e.g. adding more reducers, but MapReduce doesn't make use
of those capabilities.
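To see why the reducer count fixes the buckets: Hadoop's default `HashPartitioner` assigns each key to a partition with `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every record for a given key lands in the same one of `numReduceTasks` partitions. The sketch below reproduces only that arithmetic in plain Java. It uses `String` keys, whose `hashCode()` differs from Hadoop's `Text.hashCode()`, so the partition numbers will not match a real job; `PartitionSketch` and `partitionFor` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Same formula as HashPartitioner.getPartition(): clear the sign bit,
    // then take the remainder by the (fixed) number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // set when the job is submitted
        String[] keys = {"a", "b", "a", "a", "c", "d"};
        Map<Integer, Integer> recordsPerPartition = new TreeMap<>();
        for (String k : keys) {
            recordsPerPartition.merge(partitionFor(k, numReduceTasks), 1, Integer::sum);
        }
        // Every "a" record shares one partition; no partition count change
        // can happen once the job is running.
        System.out.println(recordsPerPartition); // {0=1, 1=4, 2=1}
    }
}
```

Because the partition depends only on the key and `numReduceTasks`, a hot key stays in one bucket unless the key itself is changed before the job runs.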
Daniel
On 11/26/18 9:10 PM, Tianxiang Li wrote:
> Dear Hadoop community,
> I'm new to the Hadoop MapReduce code, and I'd like to know how I can get the
> number of records under a specific key value after the map process. I'd like to
> detect oversized buckets and perform further key division to split the records.
> Thanks,
> Peter