Hi Ravi,

Thanks for your reply. It's very helpful.

No, I was talking about this:
http://hadoop.apache.org/common/docs/r1.0.3/hdfs_design.html (Hadoop
version r1.0.3). It says:

HDFS has a master/slave architecture. An HDFS cluster consists of a single
NameNode, a master server that manages the file system namespace and
regulates access to files by clients. In addition, there are a number of
DataNodes, usually one per node in the cluster, which manage storage
attached to the nodes that they run on.

This part confused me and made me think that the NameNode and the master
server should run on two separate nodes. Based on your explanation, I now
understand that "master" refers to the NameNode, while "slaves" refers to
the DataNodes. I hope this is correct now.  :)

Best regards,
Chao


On Thu, May 24, 2012 at 7:58 PM, Ravi Prakash <ravihad...@gmail.com> wrote:

> Hi Chao,
>
> What documentation are you reading? This one is pretty accurate:
> http://hadoop.apache.org/common/docs/r0.20.203.0/hdfs_design.html
>
> The NameNode is indeed responsible for the metadata, and all the DataNodes
> report to the NameNode (so they are all slaves). You are right, the data
> blocks are stored on the DataNodes. Perhaps I am lacking knowledge of the
> history, but as of now there's no separate "master server". All read/write
> requests on files are directed at the NameNode, from where they get
> redirected to the appropriate DataNode holding the block.
>
> So your configuration for replication factor 3 would look like:
>
> in conf/core-site.xml:
>     fs.default.name = hdfs://machineAAA:54321/
>
> in conf/slaves:
>     machineBBB
>     machineCCC
>     machineDDD
>     machineEEE
>     ....possibly a lot more
>
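> The property line above is shorthand. A minimal sketch of how it would
> look as actual XML, assuming the stock Hadoop 1.x file layout; the
> dfs.replication entry is an illustrative addition, and since 3 is already
> the default it can be left out:
>
>     <!-- conf/core-site.xml: tells clients and DataNodes where the NameNode is -->
>     <configuration>
>       <property>
>         <name>fs.default.name</name>
>         <value>hdfs://machineAAA:54321/</value>
>       </property>
>     </configuration>
>
>     <!-- conf/hdfs-site.xml: optional, shown only for illustration -->
>     <configuration>
>       <property>
>         <name>dfs.replication</name>
>         <value>3</value>
>       </property>
>     </configuration>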
>
>
>
> Hope this helps
> Ravi
>
>
> On Thu, May 24, 2012 at 6:30 AM, Chao Huang <chaomhu...@gmail.com> wrote:
>
>> Hello experts,
>>
>> I'm new to HDFS/Hadoop.  After reading the HDFS documents, I'm
>> confused by the difference between a NameNode and a master server.  It's
>> my understanding that the NameNode is responsible for managing metadata,
>> while the master-replica group (which is made up of a number of
>> DataNodes) stores the actual data blocks.  In the master-replica group, the
>> master server accepts read/write requests and load-balances (or routes)
>> read requests to the appropriate replica. In other words, we should
>> configure the NameNode and master server on two different physical machines
>> in a production environment, right?  Is this a correct assumption?
>>
>> One other question about HDFS cluster setup:
>>
>> - Requirements: one NameNode, replication factor = 3, in a production
>> environment.
>>
>> What would the topology look like?  Can I configure it as follows?
>>
>>
>> in conf/core-site.xml:
>>     fs.default.name = hdfs://machineAAA:54321/
>>
>> in conf/masters:
>>     machineBBB
>>
>> in conf/slaves:
>>     machineCCC
>>     machineDDD
>>
>>
>> Can someone please confirm and/or comment?
>>
>> Sorry for my newbie questions. Thanks for the help.
>>
>> Chao
>>
>
>
