-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36122/#review90231
-----------------------------------------------------------

Ship it!


Ship It!

- Jonathan Hurley


On July 2, 2015, 1:07 a.m., Alejandro Fernandez wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36122/
> -----------------------------------------------------------
>
> (Updated July 2, 2015, 1:07 a.m.)
>
>
> Review request for Ambari, Jonathan Hurley, Nate Cole, Sumit Mohanty,
> Srimanth Gunturi, Sid Wagle, and Yusaku Sako.
>
>
> Bugs: AMBARI-12252
>     https://issues.apache.org/jira/browse/AMBARI-12252
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Ambari keeps track of a file, /etc/hadoop/conf/dfs_data_dir_mount.hist,
> that contains a mapping of HDFS data dirs to the last known mount point.
>
> This is used to detect when a data dir becomes unmounted, in order to prevent
> HDFS from writing to the root partition.
>
> Consider the example of a data node configured with these volumes:
>
> /dev/sda -> /
> /dev/sdb -> /grid/0
> /dev/sdc -> /grid/1
> /dev/sdd -> /grid/2
>
> Typically, each /grid/#/ directory contains a data folder.
> Today, if a data directory becomes unmounted, then the directory will not
> exist and Ambari will not create it automatically. Ambari will simply log a
> warning and update its cache with the new mount point, which is /; that is
> the underlying bug.
>
> If hdfs-site contains dfs.datanode.failed.volumes.tolerated with a value > 0,
> then DataNode will tolerate the failure; otherwise, the DataNode will die.
>
> Because Ambari will already have "/" in its cache file, the fact that the
> data dir used to be mounted on a non-root drive is lost, so the next time
> DataNode is restarted, Ambari will create the data dir, which is now on the
> root partition; this is really bad because HDFS will now fill up the root
> drive.
>
> The admin can still remount the partition, but then needs to restart DataNode
> so Ambari can update its cache.
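
For illustration only, the check described above (compare each data dir's
current mount point with its last known one, and refuse to treat "/" as the
new normal) might be sketched as follows. This is not the code in
dfs_datanode_helper.py; the function names, arguments, and return values are
hypothetical, and the mount table is passed in as a plain list so the sketch
runs without touching a real filesystem.

```python
import posixpath


def get_mount_point(path, mount_points):
    """Return the longest mount point that contains path (hypothetical helper)."""
    path = posixpath.normpath(path)
    best = "/"
    for mount in mount_points:
        mount = posixpath.normpath(mount)
        prefix = mount if mount.endswith("/") else mount + "/"
        # Match on a full path component so /grid/0 does not claim /grid/01.
        if (path == mount or path.startswith(prefix)) and len(mount) > len(best):
            best = mount
    return best


def check_data_dirs(data_dirs, last_known, mount_points):
    """Classify data dirs and compute the mount history to persist.

    A dir whose last known mount was a non-root partition but which now
    resolves to "/" is reported as unmounted, and its history entry is left
    unchanged so the non-root mount is not forgotten.
    """
    safe, unmounted, new_history = [], [], {}
    for data_dir in data_dirs:
        current = get_mount_point(data_dir, mount_points)
        previous = last_known.get(data_dir)
        if previous is not None and previous != "/" and current == "/":
            unmounted.append(data_dir)          # do not create this dir
            new_history[data_dir] = previous    # keep the old mount on record
        else:
            safe.append(data_dir)
            new_history[data_dir] = current
    return safe, unmounted, new_history
```

With the volumes from the example and /grid/2 unmounted, /grid/2/data would be
reported as unmounted and its history entry would stay at /grid/2 instead of
being overwritten with /.
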
>
> The ideal way to fix this in Ambari 2.2 is as follows:
> - Track which data dirs the admin wants mounted on a non-root partition. If
> the admin wishes all data dirs to be on non-root mounts, but the initial
> install is incorrect, then this should be reported as a problem.
> - Keep the history of the mount points in the database. Today, if the cache
> file is deleted or the host reimaged, then this information is lost.
> - Introduce a new state between FAILED and COMPLETED, such as
> COMPLETED_WITH_ERRORS, that will allow tasks to look different in the UI,
> so the user can clearly detect when a critical but non-fatal error happened.
> - Plug into the Alert Framework.
>
>
> Diffs
> -----
>
>   ambari-agent/src/test/python/resource_management/TestDatanodeHelper.py PRE-CREATION
>   ambari-agent/src/test/python/resource_management/TestFileSystem.py 91fd71d
>   ambari-agent/src/test/python/unitTests.py b6f8411
>   ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py ad5a984
>
> Diff: https://reviews.apache.org/r/36122/diff/
>
>
> Testing
> -------
>
> Ran python unit tests in ambari-agent
> ----------------------------------------------------------------------
> Ran 403 tests in 16.819s
>
>
> Verified that this worked on a host with multiple mounted volumes.
>
>
> Thanks,
>
> Alejandro Fernandez
>
>