In this context, I would like to ask: can we actually place the data where we wish, instead of letting Hadoop's intelligence take care of this?
On Tue, Apr 19, 2011 at 10:52 AM, Kai Voigt <k...@123.org> wrote:

> Hi,
>
> I found that
> http://hadoopblog.blogspot.com/2009/09/hdfs-block-replica-placement-in-your.html
> explains the process nicely.
>
> The first replica of each block will be stored on the client machine, if
> it's a datanode itself. Makes sense, as it doesn't require a network
> transfer. Otherwise, a random datanode will be picked for the first replica.
>
> The second replica will be written to a random datanode on a random rack
> other than the rack where the first replica is stored. Here, HDFS's rack
> awareness will be utilized. So HDFS would survive a rack failure.
>
> The third replica will be written to the same rack as the second replica,
> but another random datanode in that rack. That will make the pipeline
> between second and third replica quick.
>
> Does that make sense to you? However, this is the current hard-coded
> policy; there are ideas to make that policy customizable
> (https://issues.apache.org/jira/browse/HDFS-385).
>
> Kai
>
> On 18.04.2011 at 15:46, Nan Zhu wrote:
>
> > Hi, all
> >
> > I'm confused by the question of how HDFS decides where to put the
> > data blocks.
> >
> > I mean, when the user invokes a command like "./hadoop put ***" and we
> > assume that the file consists of 3 blocks, how does HDFS decide where
> > to put these 3 blocks?
> >
> > Most of the materials don't cover this issue; they only introduce data
> > replicas when talking about blocks in HDFS.
> >
> > Can anyone give me some guidance?
> >
> > Thanks
> >
> > Nan
> >
> > --
> > Nan Zhu
> > School of Software,5501
> > Shanghai Jiao Tong University
> > 800,Dongchuan Road,Shanghai,China
> > E-Mail: zhunans...@gmail.com
>
> --
> Kai Voigt
> k...@123.org

--
Regards,
R.V.
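
For reference, the three placement rules Kai describes in the quoted message can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: `place_replicas` and the `topology` dictionary are hypothetical names, and the sketch assumes at least two racks, with at least two datanodes on the rack chosen for the second replica. The fallbacks real HDFS applies (a single rack, full or dead nodes) are not modelled.

```python
import random


def place_replicas(topology, client=None, rng=random):
    """Pick three datanodes for one block, following the quoted policy.

    topology: dict mapping rack name -> list of datanode names
              (hypothetical layout, not an HDFS data structure).
    client:   name of the writing machine, or None if it is not a datanode.
    rng:      object with a choice() method, e.g. random or random.Random(seed).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # First replica: the client itself if it is a datanode,
    # otherwise a random datanode.
    first = client if client in rack_of else rng.choice(sorted(rack_of))

    # Second replica: a random datanode on a random rack other than
    # the rack holding the first replica.
    other_racks = sorted(r for r in topology if r != rack_of[first] and topology[r])
    second_rack = rng.choice(other_racks)
    second = rng.choice(topology[second_rack])

    # Third replica: another random datanode on the same rack as the second.
    # Assumes that rack has at least two datanodes.
    third = rng.choice([n for n in topology[second_rack] if n != second])

    return [first, second, third]


# Example layout: three racks with two datanodes each (made-up names).
topology = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
    "rack3": ["dn5", "dn6"],
}
print(place_replicas(topology, client="dn1"))
```

With `client="dn1"`, the first replica is always `dn1`; the other two land together on either `rack2` or `rack3`, which is why losing a single rack never loses all copies of the block.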