Re: detecting oversized bucket in mapreduce

Daniel Templeton Tue, 27 Nov 2018 09:57:01 -0800

There are no per-key metrics provided by MapReduce, but you should be able to run your job with an identity reducer to see what the bucket sizes were.

If you're talking about doing it on the fly, there's no way to do that today. The job is submitted with a fixed number of reducers, which also fixes the number of buckets. YARN supports adding resources to an existing job, e.g. adding more reducers, but MapReduce doesn't make use of those capabilities.
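
To make the bucket idea concrete, here is a rough standalone sketch in plain Java, with no Hadoop dependency, of how a fixed reducer count determines which bucket each key lands in and how large each bucket gets. The class and method names (BucketSizes, bucketFor, recordsPerKey, recordsPerBucket) are made up for illustration. The bucket arithmetic mirrors the default HashPartitioner, but it is applied here to String.hashCode(); a real job would hash its Writable key type (e.g. Text), so the actual assignments will differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketSizes {

    // Same arithmetic as the default hash partitioner: mask off the sign
    // bit, then take the remainder by the (fixed) number of reducers.
    // Assumption: keys are plain Strings, not Hadoop Writables.
    static int bucketFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Number of records seen for each distinct key.
    static Map<String, Long> recordsPerKey(List<String> keys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Number of records each reducer (bucket) would receive.
    static long[] recordsPerBucket(List<String> keys, int numReducers) {
        long[] sizes = new long[numReducers];
        for (String key : keys) {
            sizes[bucketFor(key, numReducers)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical map output keys, one entry per record.
        List<String> keys = List.of("a", "b", "a", "c", "a", "b");
        System.out.println(recordsPerKey(keys));
        System.out.println(Arrays.toString(recordsPerBucket(keys, 2)));
    }
}
```

In a real job the equivalent numbers would come from a first pass over the data, since the reducer count cannot change once the job has been submitted.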


Daniel

On 11/26/18 9:10 PM, Tianxiang Li wrote:

Dear Hadoop community,

I'm new to the Hadoop MapReduce code, and I'd like to know how I can get the 
number of records under a specific key value after the map process. I'd like to 
detect oversized buckets and perform further key division to split the records.

Thanks,
Peter



