Does Hadoop distribute blocks according to how many blocks a node currently holds, or according to how much disk space the node currently has remaining? Suppose I have many machines with identical CPUs but different disk sizes. If blocks are distributed according to remaining disk space, then the nodes with larger disks would end up storing more data. Would this cause performance problems during the map phase?

Thanks, moonwatcher
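To make the concern concrete, here is a toy simulation (not Hadoop's actual placement policy; the node names and free-space figures are made up) contrasting uniform placement with placement weighted by remaining free space. Under the weighted policy, a node with 4x the free disk ends up holding roughly 4x the blocks, and since map tasks are scheduled for data locality, it would also attract proportionally more map work:

```python
import random

def place_blocks(num_blocks, free_gb, weighted, seed=0):
    """Toy block placement -- NOT Hadoop's real policy.

    weighted=False: each node is equally likely to receive a block.
    weighted=True:  probability proportional to remaining free space.
    """
    rng = random.Random(seed)
    nodes = list(free_gb)
    counts = {node: 0 for node in nodes}
    for _ in range(num_blocks):
        if weighted:
            node = rng.choices(nodes, weights=[free_gb[n] for n in nodes])[0]
        else:
            node = rng.choice(nodes)
        counts[node] += 1
    return counts

# Hypothetical cluster: identical CPUs, very different free disk (GB).
cluster = {"node-a": 100, "node-b": 100, "node-c": 400}
print(place_blocks(600, cluster, weighted=True))
print(place_blocks(600, cluster, weighted=False))
```

If maps run where the blocks live, the weighted run concentrates roughly two thirds of the map tasks on node-c, which is the skew the question is asking about.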
- block distribution with varying disk sizes