Hey all I recently setup a three node hadoop cluster and ran an examples on it. It was pretty fast, and all the three nodes were being used (I checked the log files to make sure that the slaves are utilized).
Now I ve setup another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has a lower, per node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the Master node. Does 15 nodes in a cluster increase the network cost? What can I do to setup the cluster to function more efficiently? Thanks! Mithila Nagendra Arizona State University