Hello experts, I'm new to HDFS/Hadoop. After reading the HDFS documentation, I'm confused about the difference between a namenode and a master server. My understanding is that the namenode is responsible for managing metadata, while the master-replica group (which is made up of a number of datanodes) stores the actual data blocks. Within the master-replica group, the master server accepts read/write requests and load-balances (or routes) read requests to the appropriate replica. So in a production environment, we should configure the namenode and the master server on two different physical machines, right? Is this assumption correct?
One other question, about HDFS cluster setup. The requirements are one namenode and a replication factor of 3, in a production environment. What would the topology look like? Can I configure it as follows?

  in conf/core-site.xml:
    fs.default.name = hdfs://machineAAA:54321/
  in conf/masters:
    machineBBB
  in conf/slaves:
    machineCCC
    machineDDD

Can someone please confirm and/or comment? Sorry for my newbie questions.

Thanks for the help.
Chao
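
P.S. To make the setup above concrete, this is roughly how I would write the XML. It is only a sketch of what I have in mind, not something I have tested. The fs.default.name value is the one from my question; putting the replication factor in conf/hdfs-site.xml under dfs.replication is my assumption from reading the docs, so please correct me if that is the wrong place.

```xml
<!-- conf/core-site.xml: points clients and datanodes at the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://machineAAA:54321/</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml (assumed location for this setting): replication factor of 3 -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

These are two separate files shown together; conf/masters and conf/slaves would stay plain text with one hostname per line, as listed above.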