-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26065/
-----------------------------------------------------------
(Updated Sept. 30, 2014, 11:57 p.m.)


Review request for Ambari, Florian Barca, Jonathan Hurley, Mahadev Konar, Sid Wagle, and Tom Beerbower.


Changes
-------

Unit and system testing are complete.


Bugs: AMBARI-7506
    https://issues.apache.org/jira/browse/AMBARI-7506


Repository: ambari


Description
-------

When a drive fails and is unmounted for service, stopping and starting the DataNode process through Ambari re-creates the dfs.data.dir path that was housed on that drive, but this time on the / partition. This leads to out-of-disk-space issues and data being written to the wrong volume. The Ambari Agent should only create dfs.data.dir directories during installation, and not afterward, since re-creating them makes drive replacements difficult.


Diffs (updated)
-----

  ambari-agent/src/test/python/resource_management/TestFileSystem.py PRE-CREATION
  ambari-common/src/main/python/resource_management/core/logger.py e395bd7
  ambari-common/src/main/python/resource_management/core/providers/mount.py dc6d7d9
  ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py PRE-CREATION
  ambari-common/src/main/python/resource_management/libraries/functions/file_system.py PRE-CREATION
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/configuration/hadoop-env.xml 5da6484
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/hdfs_datanode.py 2482f97
  ambari-server/src/main/resources/stacks/HDP/1.3.2/services/HDFS/package/scripts/params.py 245ad92
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/configuration/hadoop-env.xml b3935d7
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/hdfs_datanode.py e38d9af
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/scripts/params.py 27cef20
  ambari-server/src/test/python/stacks/1.3.2/configs/default.json c80723c
  ambari-server/src/test/python/stacks/1.3.2/configs/secured.json 99e88b8
  ambari-server/src/test/python/stacks/2.0.6/configs/default.json 4e00086
  ambari-server/src/test/python/stacks/2.0.6/configs/secured.json d03be7a
  ambari-web/app/data/HDP2/site_properties.js 9886d56
  ambari-web/app/data/site_properties.js 0e6aa8e

Diff: https://reviews.apache.org/r/26065/diff/


Testing
-------

Created unit tests and a simple end-to-end test on a sandbox VM. Ran end-to-end tests on Google Compute Cloud with VMs that had an external drive mounted.

1. Created a cluster with 2 VMs and copied over the changed Python files.
2. To avoid having to copy the changed web files, saved the new property instead by running

   /var/lib/ambari-server/resources/scripts/configs.sh set localhost dev hadoop-env dfs.datanode.data.dir.mount.file "/etc/hadoop/conf/dfs_data_dir_mount.hist"

   and verified that the property appears in the API, e.g.,
   http://162.216.150.229:8080/api/v1/clusters/dev/configurations?type=hadoop-env&tag=version1412115461978734672
3. Restarted HDFS on all agents.
4. cat /etc/hadoop/conf/dfs_data_dir_mount.hist correctly showed the HDFS data dir and its mount point:

   # data_dir,mount_point
   /grid/0/hadoop/hdfs/data,/grid/0

5. Changed the HDFS data dir property from /grid/0/hadoop/hdfs/data to /grid/1/hadoop/hdfs/data; the history file correctly showed the new dir as mounted on root, and the /grid/1/hadoop/hdfs/data directory was created.
6. Unmounted the drive, after first stopping HDFS and ZooKeeper, by running

   cd /root
   fuser -c /grid/0
   lsof /grid/0
   umount /grid/0

7. Restarted the HDFS services, which resulted in an error, as expected:

   Fail: Execution of 'ulimit -c unlimited; su - hdfs -c 'export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode'' returned 1. starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-alejandro-1.out

8. Incremented the "DataNode volumes failure toleration" property from 0 to 1 and restarted all of the DataNodes, which did not result in an error this time.
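The fix exercised above hinges on remembering each data dir's original mount point in the history file and declining to re-create a dir whose backing drive is no longer mounted. The following is a minimal sketch of that idea; the helper names (get_mount_point_for_dir, should_create_data_dir) and the exact history-file handling are assumptions for illustration, loosely mirroring the new file_system.py / dfs_datanode_helper.py modules, not the patch's actual API.

```python
import os


def get_mount_point_for_dir(path):
    # Walk up from path until we hit a mount point; "/" is always one,
    # so the loop terminates even if path does not yet exist.
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def read_data_dir_mounts(history_file):
    # Parse the "data_dir,mount_point" history file (as shown in
    # testing step 4) into a dict, skipping blanks and "#" comments.
    mounts = {}
    if os.path.exists(history_file):
        with open(history_file) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                data_dir, _, mount_point = line.partition(",")
                mounts[data_dir] = mount_point
    return mounts


def should_create_data_dir(data_dir, history_file):
    # Hypothetical decision helper: create the dir on first install
    # (no history entry), or when it is still backed by the mount it
    # was originally created on; otherwise the drive has vanished and
    # creating the dir would silently land on the "/" partition.
    recorded = read_data_dir_mounts(history_file).get(data_dir)
    if recorded is None:
        return True
    return get_mount_point_for_dir(data_dir) == recorded
```

With a history entry of "/grid/0/hadoop/hdfs/data,/grid/0", unmounting /grid/0 makes the dir resolve to the "/" mount, the check fails, and the DataNode start errors out instead of writing to the root volume, matching the behavior observed in testing step 7.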
Thanks,

Alejandro Fernandez