Mathijs Homminga wrote:
Is there a way to easily determine the efficiency of my cluster?
Example:
- there are 5 slaves, each of which can handle 1 task at a time
- there is one job, split into 5 subtasks (5 maps and 5 reduces)
- 4 slaves finish their tasks in 1 minute
- 1 slave finishes its tasks in 2 minutes (so the other 4 slaves wait
for 1 minute)
... then one could say that the cluster utilization is 60% (6 working
minutes out of 10 available slave-minutes, 4 waiting minutes)
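The 60% figure is total busy time divided by (number of slaves x time
until the slowest slave finishes). A minimal sketch of that arithmetic,
assuming each slave's busy time is known; the function name
cluster_utilization is hypothetical, not part of Hadoop:

```python
def cluster_utilization(busy_minutes):
    """Fraction of available slave-time spent working.

    busy_minutes: busy time per slave. The job ends when the slowest
    slave finishes, so every slave is "available" for that long.
    """
    makespan = max(busy_minutes)                # 2 minutes in the example
    capacity = makespan * len(busy_minutes)     # 5 slaves x 2 min = 10
    return sum(busy_minutes) / capacity         # 6 / 10

print(cluster_utilization([1, 1, 1, 1, 2]))  # 0.6
```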
A standard way to improve this is to increase the number of tasks. If
you instead have 10 tasks per node, then a node that runs at half speed
shouldn't affect the overall time nearly as much, since the faster nodes
keep picking up the remaining tasks while the slow one works.
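To see the effect, here is a toy simulation under simplifying
assumptions (not how Hadoop's scheduler actually works): the same
300 seconds of work is split into equal tasks, map vs. reduce is
ignored, any free node immediately pulls the next task, and one of the
5 nodes runs at half speed. The names simulate and slowdown are
hypothetical, made up for this sketch:

```python
import heapq

def simulate(num_tasks, total_work=300, slowdown=(1, 1, 1, 1, 2)):
    """Toy greedy model: whenever a node is free it takes the next task.

    total_work: seconds of work on a normal-speed node (300 s = the
    5 task-minutes of the example). slowdown[i] multiplies each task's
    duration on node i (2 = half speed). Returns (makespan, utilization).
    """
    if total_work % num_tasks:
        raise ValueError("assume total_work splits evenly into num_tasks")
    task_time = total_work // num_tasks
    busy = [0] * len(slowdown)
    # Heap of (time the node becomes free, node index).
    free_at = [(0, node) for node in range(len(slowdown))]
    heapq.heapify(free_at)
    for _ in range(num_tasks):
        t, node = heapq.heappop(free_at)        # earliest-free node
        duration = task_time * slowdown[node]
        busy[node] += duration
        heapq.heappush(free_at, (t + duration, node))
    makespan = max(t for t, _ in free_at)       # slowest node's finish time
    return makespan, sum(busy) / (makespan * len(slowdown))

for tasks in (5, 50):  # 1 task/node vs. 10 tasks/node
    makespan, util = simulate(tasks)
    print(f"{tasks} tasks: makespan {makespan} s, utilization {util:.0%}")
# 5 tasks: makespan 120 s, utilization 60%
# 50 tasks: makespan 72 s, utilization 93%
```

With one task per node the slow node alone sets the finish time; with
10 per node it simply completes fewer tasks while the others absorb
the rest.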
Doug