In our environment we have HDFS DataNodes that double as compute nodes.
Our disk environment is heterogeneous: a couple of machines have much
smaller disk capacity than the others. A further complication is that our
IT staff sets up a single filesystem backed by a hardware RAID spanning
all of the physical disks in each machine.
We have been experimenting with dfs.datanode.du.reserved and
dfs.datanode.du.pct, but our small machines are still filling up.
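For context, this is a sketch of how we set those two properties in our
hadoop-site.xml; the values shown here are illustrative, not our actual
settings:

```xml
<!-- Reserve space per volume for non-DFS use (bytes); 10 GB here is an example value. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
<!-- Maximum fraction of each volume's capacity DFS may use; 0.90 is an example value. -->
<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.90</value>
</property>
```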
On reading through the code, it appears to me that these two values are
only consulted when choosing which host receives a replica of a block.
Questions:
1. Is the first replica of a block /always/ written to the local host
   when that host is also an HDFS DataNode for the filesystem, ignoring
   any dfs.datanode.du limits?
2. Is there any attempt to ensure that multiple blocks are not
   allocated concurrently in a way that exceeds the dfs.datanode.du
   limits?
Thanks all.
--
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if
interested