-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36122/#review90253
-----------------------------------------------------------



ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py
 (line 106)
<https://reviews.apache.org/r/36122/#comment143265>

    Why is this needed? Do we allow revert to "/" if it was originally mounted 
to "/"?



ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py
 (line 107)
<https://reviews.apache.org/r/36122/#comment143268>

    Do we ever need to remove something from prev_data_dir_to_mount_point?


- Sumit Mohanty


On July 2, 2015, 5:07 a.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36122/
> -----------------------------------------------------------
> 
> (Updated July 2, 2015, 5:07 a.m.)
> 
> 
> Review request for Ambari, Jonathan Hurley, Nate Cole, Sumit Mohanty, 
> Srimanth Gunturi, Sid Wagle, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-12252
>     https://issues.apache.org/jira/browse/AMBARI-12252
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> Ambari keeps track of a file, /etc/hadoop/conf/dfs_data_dir_mount.hist, 
> which contains a mapping of HDFS data dirs to their last known mount points.
> 
> This is used to detect when a data dir becomes unmounted, in order to prevent 
> HDFS from writing to the root partition.
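[The on-disk layout of dfs_data_dir_mount.hist is not shown in this review. As an illustration only, here is a minimal sketch of reading and writing such a mapping, assuming a simple comment-plus-`data_dir,mount_point` line format; the real helper may use a different layout.]

```python
def parse_history(text):
    """Parse 'data_dir,mount_point' lines into a dict, skipping blanks/comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        data_dir, _, mount_point = line.partition(",")
        if mount_point:
            mapping[data_dir] = mount_point
    return mapping

def render_history(mapping):
    """Serialize the mapping back to the same simple line format."""
    return "\n".join("%s,%s" % (d, m) for d, m in sorted(mapping.items()))
```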
> 
> Consider the example of a data node configured with these volumes: 
> 
> /dev/sda -> / 
> /dev/sdb -> /grid/0
> /dev/sdc -> /grid/1
> /dev/sdd -> /grid/2
> 
> Typically, each /grid/#/ directory contains a data folder.
> Today, if a data directory becomes unmounted, the directory will no longer 
> exist and Ambari will not create it automatically. Ambari will simply log a 
> warning and update its cache with the new mount point, which is now /; that 
> is the underlying bug.
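[The fix under review aims to stop treating "/" as a legitimate new mount point for a dir that was previously on a non-root mount. A rough sketch of that check; `get_mount_point_for_dir` and `is_unexpectedly_on_root` are illustrative names, not the patch's actual API.]

```python
import os

def get_mount_point_for_dir(path):
    """Walk up from path until an actual mount point is found."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path

def is_unexpectedly_on_root(data_dir, cached_mounts):
    """True if data_dir was last seen on a non-root mount but now resolves to '/'.

    In that case the dir should NOT be created and the cache should not be
    overwritten, or HDFS would start filling the root partition.
    """
    last_mount = cached_mounts.get(data_dir)
    current_mount = get_mount_point_for_dir(data_dir)
    return last_mount not in (None, "/") and current_mount == "/"
```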
> 
> If hdfs-site sets dfs.datanode.failed.volumes.tolerated to a value > 0, 
> then the DataNode will tolerate the failure; otherwise, the DataNode will die.
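[For reference, tolerating failed volumes is configured in hdfs-site like this; the value shown is just an example.]

```xml
<!-- hdfs-site.xml: let the DataNode keep running with up to one failed volume -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```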
> 
> Because Ambari will already have "/" in its cache file, the fact that the 
> dir used to be mounted on a non-root drive is lost, so the next time the 
> DataNode is restarted, Ambari will create the data dir, which is now on the 
> root partition; this is really bad because HDFS will then fill up the root 
> drive.
> 
> The admin can still remount the partition, but then needs to restart DataNode 
> so Ambari can update its cache.
> 
> The ideal way to fix this in Ambari 2.2 is as follows:
> - Track which data dirs the admin wants mounted on a non-root partition. If 
> the admin wishes all data dirs to be on non-root mounts, but the initial 
> install is incorrect, then this should be reported as a problem. 
> - Keep the history of the mount points in the database. Today, if the cache 
> file is deleted or the host reimaged, then this information is lost.
> - Introduce a new state between FAILED and COMPLETED, such as 
> COMPLETED_WITH_ERRORS, that will allow tasks to look different in the UI, 
> so the user can clearly detect when a critical but non-fatal error happened.
> - Integrate with the Alert Framework
> 
> 
> Diffs
> -----
> 
>   ambari-agent/src/test/python/resource_management/TestDatanodeHelper.py 
> PRE-CREATION 
>   ambari-agent/src/test/python/resource_management/TestFileSystem.py 91fd71d 
>   ambari-agent/src/test/python/unitTests.py b6f8411 
>   
> ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py
>  ad5a984 
> 
> Diff: https://reviews.apache.org/r/36122/diff/
> 
> 
> Testing
> -------
> 
> Ran python unit tests in ambari-agent
> ----------------------------------------------------------------------
> Ran 403 tests in 16.819s
> 
> 
> Verified that this worked on a host with multiple mounted volumes.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
