-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46434/
-----------------------------------------------------------

(Updated April 22, 2016, 12:42 p.m.)


Review request for Ambari, Alejandro Fernandez, Miklos Gergely, Oliver Szabo, Sandor Magyari, and Sebastian Toader.


Changes
-------

Review fixes


Bugs: AMBARI-15991
    https://issues.apache.org/jira/browse/AMBARI-15991


Repository: ambari


Description
-------

If the upgrade process takes longer than expected, DataNode and RegionServer are reported as failed, because they need more time to finish the update.

The fix for RegionServer checks whether the process is running; if it is, the result is not considered a failure.

For DataNode the process is also checked, and if it is running the check is repeated 2 times with a 5-minute wait between attempts. There is a limitation here: Python scripts are allowed to run for 20 minutes by default, and this checking takes 16 minutes (2 minutes initial check, 5 minutes sleep if there is a failure, 2 minutes regular check, 5 minutes sleep, 2 minutes final check). If more time is needed, the default value of *server.task.timeout* and the number of repetitions of the 5-minute check should both be increased.


Diffs (updated)
-----

  ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/upgrade.py 01a8156
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py 8f36001
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py 7ad9f39
  ambari-server/src/test/python/stacks/2.0.6/HBASE/test_hbase_regionserver.py 8d187ec
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_datanode.py 78b8171

Diff: https://reviews.apache.org/r/46434/diff/


Testing
-------

I did manual testing on this:
For RegionServer, the process check was tested.
For DataNodes, I raised an intentional exception to see whether it keeps waiting. (This is how I ran into the 20-minute server task timeout.)

----------------------------------------------------------------------
Total run:970
Total errors:0
Total failures:0
OK


Thanks,

Daniel Gergely
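
For reference, the DataNode timing from the description can be sketched as below. This is only an illustration under stated assumptions, not the code in datanode_upgrade.py: the function names (`worst_case_minutes`, `wait_for_datanode`), the constants, and the injected callbacks are hypothetical. Only the 2-minute check, the 5-minute sleep, the 2 repetitions, and the 20-minute default come from the description.

```python
import time

# Illustrative values taken from the description; the names are hypothetical.
CHECK_TIMEOUT_MIN = 2   # each check attempt may take up to 2 minutes
RETRY_SLEEP_MIN = 5     # wait between attempts while the process is alive
MAX_RETRIES = 2         # repetitions after the initial check
TASK_TIMEOUT_MIN = 20   # default limit for Python scripts (server.task.timeout)


def worst_case_minutes(retries=MAX_RETRIES):
    """Upper bound: one initial check plus `retries` rounds of (sleep + check)."""
    return CHECK_TIMEOUT_MIN + retries * (RETRY_SLEEP_MIN + CHECK_TIMEOUT_MIN)


def wait_for_datanode(check_upgraded, process_running,
                      retries=MAX_RETRIES, sleep=time.sleep):
    """Return True once check_upgraded() succeeds.

    A failed check is retried only while process_running() is True,
    and at most `retries` times after the initial attempt.
    """
    for attempt in range(retries + 1):
        if check_upgraded():
            return True
        if attempt == retries or not process_running():
            return False
        sleep(RETRY_SLEEP_MIN * 60)
    return False


if __name__ == "__main__":
    # 2 + 2 * (5 + 2) = 16 minutes, which fits within the 20-minute default.
    print(worst_case_minutes(), "<=", TASK_TIMEOUT_MIN)
    # A third repetition would need 23 minutes, so the timeout must be raised too.
    print(worst_case_minutes(3), ">", TASK_TIMEOUT_MIN)
```

Passing `sleep` as a parameter keeps the sketch testable without waiting; it also shows why raising the repetition count alone is not enough, since three repetitions already exceed the 20-minute default.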