-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, June 09, 2011 12:14 PM
To: common-user@hadoop.apache.org
Subject: Re: Question about DFS Reserved Space
>Landy,
>
>On Thu, Jun 9, 2011 at 10:05 PM, Bible, Landy <landy-bi...@utulsa.edu> wrote:
>> Hi all,
>>
>> I'm planning a rather non-standard HDFS cluster. The machines will be
>> doing more than just DFS, and each machine will have varying local
>> storage utilization outside of DFS. If I use the
>> "dfs.datanode.du.reserved" property and reserve 10 GB, does that mean
>> DFS will use (total disk size - 10 GB), or that it will always leave
>> 10 GB free? Basically, is the disk usage outside DFS (OS + other data)
>> taken into account?
>
>The latter (it will always leave 10 GB free). The whole disk is taken
>into account when free space is computed, so yes, even external data
>may influence it.

>> As usage outside of DFS grows, I'd like DFS to back off the disk and
>> migrate blocks to other nodes. If this isn't the current behavior, I
>> could create a script to look at disk usage every few hours and modify
>> the reserved property dynamically. If the property is changed on a
>> single datanode and it is restarted, will the datanode then start
>> moving blocks away?
>
>Why would you need to modify the reserve value once it is set to a
>comfortable level? The DN monitors the disk space by itself, so you
>don't have to.

Great! Problem solved. I assumed that the datanode was smart enough, but I wanted to be sure.

>The DN will also not move blocks away if the reserved limit is violated
>(due to you increasing it, say). However, it will begin to refuse any
>writes happening to it. You may need to run the Balancer in order to
>move blocks around and balance the DNs, though.

Running the balancer from time to time is easy enough. I'm guessing that if the limit is violated, the balancer would take care of moving the offending blocks off the datanode.

>> My other option is to just set the reserved amount very high on every
>> node, but that would lead to a lot of wasted space, as many nodes won't
>> have a very large storage demand outside of DFS.
>
>How about keeping one disk dedicated to everything outside of the DFS's
>grasp?

Normally I would, but as I mentioned, this isn't a normal cluster. I'm actually running the datanodes on Windows 7 desktops, which of course only have a single disk. I'm planning to use HDFS to store backups of user data from the desktops (encrypted before uploading to the cluster, of course). The idea is to use the vast amount of wasted disk space on our desktops as archival storage. We won't be running any MR jobs, just storing data.

-Landy
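
P.S. For anyone searching the archives later: the reserve is set per
datanode in hdfs-site.xml. A minimal sketch of what I'm using, assuming
the value is given in bytes per volume (the 10 GB figure is just my
example, not a recommendation):

    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes to leave free on each volume; 10 GB = 10 * 1024^3 -->
      <value>10737418240</value>
    </property>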
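
And the periodic rebalancing I mentioned would just be a cron'd run of
the balancer, roughly like the following. The threshold is the allowed
deviation in disk usage between nodes, in percent; 10 is the default,
and the right value for our setup is still a guess on my part:

    hadoop balancer -threshold 10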