Does Hadoop distribute blocks according to how many blocks a node currently
holds, or according to how much disk space the node currently has remaining?
Suppose I have many machines with identical CPUs but different disk sizes.
If blocks get distributed according to remaining disk space, then the
larger-disk nodes would end up storing more data... would this cause
performance problems during the map phase? (See the quick sketch below for
what I mean.)
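
To make the concern concrete, here's a back-of-the-envelope sketch in plain
Python (not Hadoop code; the node names, disk sizes, and block count are all
made up) of what I'm worried about: if placement were proportional to free
space, the bigger-disk nodes would hold, and with data-local scheduling also
process, proportionally more blocks:

    # Hypothetical cluster: identical CPUs, different free disk space (GB).
    free_space_gb = {"node1": 250, "node2": 250, "node3": 1000}

    # e.g. a 500 GB input at 64 MB per block -> 8000 blocks
    total_blocks = 8000

    total_free = sum(free_space_gb.values())
    for node, free in free_space_gb.items():
        # Assume placement proportional to remaining disk space.
        blocks = round(total_blocks * free / total_free)
        # With data-local scheduling, map tasks roughly follow the blocks,
        # so node3 would get ~4x the map work of node1 on identical CPUs.
        print(f"{node}: ~{blocks} blocks -> ~{blocks} data-local map tasks")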
Thanks,
moonwatcher