-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26065/
-----------------------------------------------------------

(Updated Sept. 30, 2014, 11:57 p.m.)


Review request for Ambari, Florian Barca, Jonathan Hurley, Mahadev Konar, Sid 
Wagle, and Tom Beerbower.


Changes
-------

Unit and system testing are complete.


Bugs: AMBARI-7506
    https://issues.apache.org/jira/browse/AMBARI-7506


Repository: ambari


Description
-------

When a drive fails and is unmounted for service, and the DataNode process is 
then stopped/started using Ambari, the dfs.data.dir path that was housed on 
that drive is re-created, but this time on the / partition, leading to 
out-of-disk-space issues and data being written to the wrong volume.
In this case we only want the Ambari Agent to create dfs.data.dir directories 
during installation, and not afterwards, since re-creating them later makes 
drive replacements difficult.
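
For context, here is a minimal sketch of the guard described above, assuming 
the data_dir,mount_point history file shown in the Testing section. The 
function names and structure below are illustrative only, not the actual 
contents of dfs_datanode_helper.py:

import os


def get_mount_point(path):
    # Walk up from the path until we reach the mount point that contains it.
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def read_history(hist_file):
    # History file format (see the Testing section):
    #   # data_dir,mount_point
    #   /grid/0/hadoop/hdfs/data,/grid/0
    history = {}
    if os.path.exists(hist_file):
        with open(hist_file) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                data_dir, mount_point = line.split(",", 1)
                history[data_dir] = mount_point
    return history


def should_create_data_dir(data_dir, hist_file):
    # First run: no history yet, so it is safe to create the dir.
    # Later runs: refuse to re-create the dir if its mount point changed,
    # e.g. the drive was unmounted and the dir would now land on "/".
    history = read_history(hist_file)
    if data_dir not in history:
        return True
    return get_mount_point(data_dir) == history[data_dir]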


Diffs (updated)
-----

  ambari-agent/src/test/python/resource_management/TestFileSystem.py PRE-CREATION 
  ambari-common/src/main/python/resource_management/core/logger.py e395bd7 
  ambari-common/src/main/python/resource_management/core/providers/mount.py dc6d7d9 
  ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py PRE-CREATION 
  ambari-common/src/main/python/resource_management/libraries/functions/file_system.py PRE-CREATION 
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/configuration/hadoop-env.xml 5da6484 
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/hdfs_datanode.py 2482f97 
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/params.py 245ad92 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/configuration/hadoop-env.xml b3935d7 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_datanode.py e38d9af 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/params.py 27cef20 
  ambari-server/src/test/python/stacks/1.3.2/configs/default.json c80723c 
  ambari-server/src/test/python/stacks/1.3.2/configs/secured.json 99e88b8 
  ambari-server/src/test/python/stacks/2.0.6/configs/default.json 4e00086 
  ambari-server/src/test/python/stacks/2.0.6/configs/secured.json d03be7a 
  ambari-web/app/data/HDP2/site_properties.js 9886d56 
  ambari-web/app/data/site_properties.js 0e6aa8e 

Diff: https://reviews.apache.org/r/26065/diff/


Testing
-------

Created unit tests and a simple end-to-end test on a sandbox VM.

Ran end-to-end tests on Google Compute Cloud with VMs that had an external 
drive mounted.
1. Created a cluster with 2 VMs and copied the changed Python files.
2. To avoid having to copy the changed web files, saved the new property 
directly by running
/var/lib/ambari-server/resources/scripts/configs.sh set localhost dev 
hadoop-env dfs.datanode.data.dir.mount.file 
"/etc/hadoop/conf/dfs_data_dir_mount.hist"
and verified that the property appears in the API, e.g., 
http://162.216.150.229:8080/api/v1/clusters/dev/configurations?type=hadoop-env&tag=version1412115461978734672
3. Restarted HDFS on all agents.
4. Ran cat /etc/hadoop/conf/dfs_data_dir_mount.hist, which correctly showed 
the HDFS data dir and its mount point (the usage sketch after these steps 
reads this same format):
# data_dir,mount_point
/grid/0/hadoop/hdfs/data,/grid/0

5. Then changed the HDFS data dir property from /grid/0/hadoop/hdfs/data to 
/grid/1/hadoop/hdfs/data; the new dir was correctly recorded as mounted on 
root (/), and the /grid/1/hadoop/hdfs/data directory was created.

6. Next, unmounted the drive by first stopping HDFS and ZooKeeper, then running:
cd /root
fuser -c /grid/0
lsof /grid/0
umount /grid/0

7. Restarted the HDFS services, which failed with an error as expected:
Fail: Execution of 'ulimit -c unlimited;  su - hdfs -c 'export 
HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && 
/usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start 
datanode'' returned 1. starting datanode, logging to 
/var/log/hadoop/hdfs/hadoop-hdfs-datanode-alejandro-1.out

8. Next, incremented the "DataNode volumes failure toleration" property from 0 
to 1 and restarted all of the DataNodes, which did not result in an error this 
time.
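
To tie this back to steps 5-8: once /grid/0 is unmounted, the live mount point 
of /grid/0/hadoop/hdfs/data becomes / and no longer matches what was recorded 
in dfs_data_dir_mount.hist, so the directory is not silently re-created on the 
root partition. A hypothetical usage of the sketch from the Description, with 
the paths from this test run (again illustrative, not the patch's actual API):

# Assumes should_create_data_dir() from the sketch in the Description.
HIST = "/etc/hadoop/conf/dfs_data_dir_mount.hist"

for data_dir in ["/grid/0/hadoop/hdfs/data", "/grid/1/hadoop/hdfs/data"]:
    if should_create_data_dir(data_dir, HIST):
        print("would (re)create %s" % data_dir)
    else:
        print("skipping %s -- its mount point changed since it was recorded" % data_dir)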


Thanks,

Alejandro Fernandez
