In our environment we have HDFS nodes that are also used as compute nodes.

Our disk environment is heterogeneous: a couple of machines have much smaller disk capacity than the others. A further complication is that our IT staff sets up a single filesystem backed by a hardware RAID across all of the physical disks in each machine.

We have been experimenting with dfs.datanode.du.reserved and dfs.datanode.du.pct, but we are still filling up our small machines.
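
For reference, here is the shape of what we have been trying in our hadoop-site.xml (or hdfs-site.xml, depending on version); the values below are illustrative, not our exact numbers:

    <property>
      <name>dfs.datanode.du.reserved</name>
      <!-- bytes on each volume to keep free for non-HDFS use -->
      <value>10737418240</value>
    </property>
    <property>
      <name>dfs.datanode.du.pct</name>
      <!-- fraction of the remaining space the datanode reports as usable -->
      <value>0.98</value>
    </property>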

On reading through the code, it appears to me that these two values are only examined when choosing the host for a replica block.
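
To make that concrete, here is a simplified sketch of the kind of check I am describing. This is not the actual Hadoop source; the class and member names are invented for illustration:

    // Hypothetical free-space test applied when a replica target is chosen.
    public class VolumeCheck {
        long capacity;     // raw bytes on the volume
        long dfsUsed;      // bytes already consumed by HDFS blocks
        long duReserved;   // dfs.datanode.du.reserved
        double duPct;      // dfs.datanode.du.pct

        // Space the datanode reports as available for new blocks.
        long available() {
            long remaining = capacity - dfsUsed - duReserved;
            return (long) (remaining * duPct);
        }

        // Applied once, at target-selection time; apparently not
        // rechecked when the block is actually written.
        boolean isGoodTarget(long blockSize) {
            return available() >= blockSize;
        }

        public static void main(String[] args) {
            VolumeCheck v = new VolumeCheck();
            v.capacity   = 100L << 30;  // 100 GB volume (assumed)
            v.dfsUsed    =  90L << 30;  // 90 GB already used by HDFS
            v.duReserved =   5L << 30;  // dfs.datanode.du.reserved = 5 GB
            v.duPct      = 0.98;        // dfs.datanode.du.pct
            System.out.println("available = " + v.available() + " bytes;"
                + " good for a 64 MB block: " + v.isGoodTarget(64L << 20));
        }
    }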

Questions:

  1. Is the 'first block' /always/ written to the local host when the
     client is also an HDFS node for the filesystem, ignoring any
     dfs.datanode.du limits?
  2. Is there any attempt to ensure that multiple blocks are not
     allocated such that the dfs.datanode.du limits may be exceeded?
     (See the toy demonstration after this list.)
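
To illustrate what I mean in question 2: if several blocks are allocated before any of the writes complete, each allocation can see the same stale remaining-space figure and pass the check individually, even though together they exceed it. A toy demonstration with assumed numbers, not Hadoop code:

    // Three 1 GB blocks are allocated against a node with 2 GB free above
    // the reserve; 'remaining' is never decremented until a write finishes,
    // so every allocation passes and the node is over-committed by 1 GB.
    public class OverAllocationDemo {
        public static void main(String[] args) {
            long remaining = 2L << 30;   // 2 GB free above the reserve (assumed)
            long blockSize = 1L << 30;   // 1 GB block size
            int pending = 0;
            for (int i = 0; i < 3; i++) {
                if (remaining >= blockSize) {  // each check sees stale 'remaining'
                    pending++;
                }
            }
            System.out.println("blocks accepted: " + pending
                + ", bytes committed: " + pending * blockSize
                + ", bytes actually free: " + remaining);
        }
    }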

Thanks all.
--
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards; contact us if interested