Sorry for the mistake in my previous mail. I meant that I ran the balancer with the default threshold.
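For anyone reading along, here is a rough sketch of what the threshold means, as I understand it from the user guide: a data node counts as balanced when its utilization (used / capacity) differs from the overall cluster utilization by no more than the threshold, which defaults to 10 percent. The node names and per-node usage figures below are made up for illustration; only the disk sizes resemble our setup.

```python
# Sketch of the balancer's "balanced" criterion, not the actual Hadoop code.
# All node names and usage numbers are hypothetical.

DEFAULT_THRESHOLD = 10.0  # percent; the documented default for -threshold


def utilization(used_gb, capacity_gb):
    """Return used space as a percentage of capacity."""
    return 100.0 * used_gb / capacity_gb


def classify(nodes, threshold=DEFAULT_THRESHOLD):
    """Compare each node's utilization with the cluster-wide utilization.

    nodes maps a node name to a (used_gb, capacity_gb) tuple.
    Returns (cluster_utilization_percent, {name: status}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster = utilization(total_used, total_capacity)

    status = {}
    for name, (used, cap) in nodes.items():
        diff = utilization(used, cap) - cluster
        if diff > threshold:
            status[name] = "over-utilized"
        elif diff < -threshold:
            status[name] = "under-utilized"
        else:
            status[name] = "within threshold"
    return cluster, status


if __name__ == "__main__":
    # Hypothetical three-node example: one nearly full small disk,
    # one moderately used large disk, one freshly added empty node.
    example = {
        "old-small": (200, 220),
        "old-large": (150, 450),
        "new-node": (0, 450),
    }
    cluster, status = classify(example)
    print("cluster utilization: %.2f%%" % cluster)
    for name in sorted(status):
        print(name, "->", status[name])
```

With a smaller threshold the balancer has to move more blocks before every node qualifies as balanced, so a run would be expected to take longer.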
On Sat, Aug 8, 2009 at 10:40 AM, prashant ullegaddi <[email protected]> wrote:

> Thank you Ravi and Ted.
>
> I ran hadoop balancer without default threshold. It's been running for last
> 8 hours!
> How long does it take given the following DFS stats:
>
> 3140 files and directories, 10295 blocks = 13435 total.
> Heap Size is 17.88 MB / 963 MB (1%)
> Capacity      : 3.93 TB
> DFS Remaining : 2.11 TB
> DFS Used      : 1.31 TB
> DFS Used%     : 33.44 %
> Live Nodes <http://megh01:50070/dfshealth.jsp#LiveNodes> : 10
> Dead Nodes <http://megh01:50070/dfshealth.jsp#DeadNodes> : 0
>
> If I interrupt it now, what will happen? I've to run a job now. I think
> balancing and running a job may not happen together as one will slow
> down the other.
>
> Thanks,
> Prashant.
>
> On Fri, Aug 7, 2009 at 11:28 PM, Ted Dunning <[email protected]> wrote:
>
>> Make sure you rebalance soon after adding the new node. Otherwise, you will
>> have an age bias in file distribution. This can, in some applications, lead
>> to some strange effects. For example, if you have log files that you delete
>> when they get too old, disk space will be freed non-uniformly. This
>> shouldn't much affect performance, but it can lead to a need to rebalance
>> again (and again) later. Normal file churn combined with occasional
>> rebalancing should eventually fix this, but it is nicer not to.
>>
>> On Fri, Aug 7, 2009 at 10:48 AM, Ravi Phulari <[email protected]> wrote:
>>
>> > Use Rebalancer
>> >
>> > http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer
>> > -
>> > Ravi
>> >
>> > On 8/7/09 10:38 AM, "prashant ullegaddi" <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > We had a cluster of 9 machines with one name node, and 8 data nodes (2 had
>> > > 220GB hard disk space, rest had 450GB).
>> > > Most of the space on first machines with 250GB disk space was consumed.
>> > > Now we added two new machines each with 450GB hard disk space as data nodes.
>> > >
>> > > Is there any way to redistribute files on HDFS so that there will
>> > > considerable free space left on first two machines without
>> > > downloading the files to one local machine and then uploading it back on
>> > > HDFS?
>> > >
>> > > ~
>> > > Prashant,
>> > > SIEL,
>> > > IIIT-Hyderabad.
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
