Hi everyone,

We are a start-up company that has been using Hadoop (version 0.20.2) on Amazon EC2. We tried to set up a cluster in two different configurations:

Cluster 1: 1 master (namenode) + 5 datanodes; all machines are small EC2 instances (1.6 GB RAM).
Cluster 2: 1 master (namenode) + 2 datanodes; the master is a small EC2 instance and the two datanodes are large EC2 instances (7.5 GB RAM).

We made changes to the configuration files (core-site.xml, hdfs-site.xml and mapred-site.xml) and expected a significant performance improvement on Cluster 2. Unfortunately, this has not happened yet.
Are there any specific parameters in the configuration files that we need to change to adapt Hadoop to larger hardware? Are there any best practices you would recommend?

Thanks in advance,
Avi