Hi Andrew,

Thank you for the quick response. I changed the bandwidth using the "hadoop dfsadmin -setBalancerBandwidth" command and it works like a charm! The time to transfer data is now proportional to the bandwidth I set.
Thanks again!

Best,
Karthiek

On Wed, Dec 18, 2013 at 6:23 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:

> Hi Karthiek,
>
> I haven't checked 1.0.4, but in 2.2.0 and onwards, there's this setting you
> can tweak up:
>
> dfs.datanode.balance.bandwidthPerSec
>
> By default, it's set to just 1 MB/s, which is pretty slow. Again, at least in
> 2.2.0, there's also `hdfs dfsadmin -setBalancerBandwidth`, which can be used
> to adjust this config property at runtime.
>
> Best,
> Andrew
>
>
> On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <karthi...@gmail.com> wrote:
>
> > Hi all,
> >
> > I am working on a research project where we are looking at algorithms to
> > "optimally" distribute data blocks across HDFS nodes. The definition of
> > what is optimal is omitted for brevity.
> >
> > I want to move specific blocks of a file that is *already* in HDFS. I am
> > able to achieve this using the data transfer protocol (I took cues from
> > the "Balancer" module), but the operation turns out to be very
> > time-consuming. In my cluster setup, moving one block of data
> > (approximately 60 MB) from data-node-1 to data-node-2 takes nearly 60
> > seconds. A "dfs -put" operation that copies the same file from
> > data-node-1's local file system to data-node-2 takes just 1.4 seconds.
> >
> > Any suggestions on how to speed up the movement of specific blocks?
> > Bringing down the running time is very important for us because this
> > operation may happen while a job is executing.
> >
> > I am using hadoop-1.0.4.
> >
> > Thanks in advance!
> >
> > Best,
> > Karthiek
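For anyone else hitting the same limit, here is a minimal sketch of the two approaches discussed above. The property name and command come from the thread; the 10 MB/s value is just an illustrative choice, and a live cluster is required to run this:

```shell
# Persistent approach: raise the datanode balancer bandwidth in hdfs-site.xml.
# The value is in bytes per second (10485760 = 10 MB/s, chosen only as an
# example; the default is 1 MB/s). Datanodes must be restarted to pick up
# a change made in hdfs-site.xml:
#
#   <property>
#     <name>dfs.datanode.balance.bandwidthPerSec</name>
#     <value>10485760</value>
#   </property>

# Runtime approach (Hadoop 2.2.0+): push a new bandwidth, in bytes per second,
# to all live datanodes without a restart. The setting is not persisted and
# lasts only until a datanode restarts.
hdfs dfsadmin -setBalancerBandwidth 10485760
```

Note that the runtime command overrides, but does not modify, the value in hdfs-site.xml, so a restarted datanode falls back to the configured (or default) bandwidth.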