On 29/03/11 16:30, Michael Segel wrote:
"Grid Pattern: Applications should not use more than 10, 15 or 25 custom counters." I have to question the limitation. It seems arbitrary. I agree that counters add additional overhead, but suppose I wanted to run the word count m/r as a map-only job and use counters as a way to capture a count per word? At what point does the cost of the counter(s) exceed the cost of the reduce job?
It's not a performance issue, it's total JobTracker (JT) memory. With too many counters, your JT goes OOM; then it's cluster restart time, all outstanding jobs have to restart, etc.
The cost of a large cluster outage is greater than the cost of the reduce job.
On a small (not Yahoo!-sized) cluster, if your JT process has enough memory, you can have more counters, as there is less work to lose and more memory to spare in the JT.
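
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes the JT keeps one copy of every counter per task for each job it retains in memory; the class name, the method, and all the figures (job count, task count, bytes per counter) are hypothetical placeholders for illustration, not measurements from any real cluster.

```java
// Rough estimate of JobTracker heap consumed by counters.
// Assumption (not measured): the JT holds one copy of each counter
// per task, for every job it currently retains in memory.
class CounterMemoryEstimate {

    // All parameters are illustrative inputs; bytesPerCounter in
    // particular is a guess covering the name string plus the value.
    static long estimateBytes(long retainedJobs, long tasksPerJob,
                              long countersPerTask, long bytesPerCounter) {
        // multiplyExact throws ArithmeticException on overflow
        // instead of silently wrapping around.
        long tasks = Math.multiplyExact(retainedJobs, tasksPerJob);
        long counters = Math.multiplyExact(tasks, countersPerTask);
        return Math.multiplyExact(counters, bytesPerCounter);
    }

    public static void main(String[] args) {
        // A job that stays within the suggested limit of 25 counters.
        long modest = estimateBytes(100, 1000, 25, 100);
        // A single map-only word count with one counter per distinct
        // word, assuming a vocabulary of 100,000 words.
        long perWord = estimateBytes(1, 1000, 100_000, 100);
        System.out.println("25 counters, 100 jobs: " + modest + " bytes");
        System.out.println("per-word counters, 1 job: " + perWord + " bytes");
    }
}
```

Under these assumed figures, a single per-word-counter job costs far more JT memory than a hundred jobs that stay within the limit, which is the trade-off against simply running the reduce.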
