> On Feb. 24, 2016, 7:56 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py, line 123
> > <https://reviews.apache.org/r/43948/diff/1/?file=1267791#file1267791line123>
> >
> > If this happens during cluster install, why don't we put a dependency in role_command_order.json that RM must start after ATS?
> >
> > If ATS is on host1 and RM on host2, and during fresh cluster install we fail to install ATS, then RM will keep waiting.

role_command_order.json won't work with Blueprints, because a Blueprint deployment has no cluster-wide ordering. RM will keep waiting only until it exhausts the retries (8 * 20 secs).

- Sebastian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43948/#review120537
-----------------------------------------------------------


On Feb. 24, 2016, 5:39 p.m., Sebastian Toader wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43948/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2016, 5:39 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Andrew Onischuk, Sumit Mohanty, and Sid Wagle.
>
>
> Bugs: AMBARI-15158
>     https://issues.apache.org/jira/browse/AMBARI-15158
>
>
> Repository: ambari
>
>
> Description
> -------
>
> If ATS is installed, then Resource Manager, after starting, checks whether the directories where ATS stores timeline data for active and completed applications exist in DFS. There might be cases when RM comes up well before ATS has created these directories. In these situations RM stops with an "IOException: /ats/active does not exist" error message.
>
> To avoid this situation, the Python script responsible for starting the RM component has been modified to check for the existence of these directories before the RM process is started. This check is performed only if ATS is installed and has either yarn.timeline-service.entity-group-fs-store.active-dir or yarn.timeline-service.entity-group-fs-store.done-dir set.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 2ef404d
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py ec7799e
>
> Diff: https://reviews.apache.org/r/43948/diff/
>
>
> Testing
> -------
>
> Manual testing:
> 1. Created secure/non-secure clusters with Blueprint where NN, RM and ATS were deployed to different nodes. This was tested with HDFS webhdfs both enabled and disabled.
> 2. Created a cluster using the UI where NN, RM and ATS were deployed to different nodes. The cluster was then kerberized and tested with HDFS webhdfs both enabled and disabled.
>
> Python tests results:
> ----------------------------------------------------------------------
> Total run:902
> Total errors:0
> Total failures:0
> OK
>
>
> Thanks,
>
> Sebastian Toader
>
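For readers following the thread, the bounded wait discussed above (up to 8 checks, 20 seconds apart) can be sketched roughly as follows. This is only an illustration of the idea, not the code in the patch: `wait_for_dirs`, `DirectoriesMissingError` and the `dir_exists` callable are hypothetical names, and how the existence check actually talks to DFS is left to the caller. The sketch sleeps only between checks, so its worst-case wait is slightly under 8 * 20 seconds.

```python
import time


class DirectoriesMissingError(Exception):
    """Raised when some directories are still missing after all checks."""


def wait_for_dirs(dirs, dir_exists, tries=8, try_sleep=20, sleep=time.sleep):
    """Wait until every path in `dirs` exists, giving up after `tries` checks.

    `dir_exists` is a caller-supplied callable (hypothetical here) that
    returns True when a path exists in DFS. `sleep` is injectable so the
    loop can be exercised without real delays.
    Returns the number of checks that were needed.
    """
    missing = list(dirs)
    for attempt in range(1, tries + 1):
        # Re-check only the paths that were missing on the previous pass.
        missing = [d for d in missing if not dir_exists(d)]
        if not missing:
            return attempt
        if attempt < tries:
            sleep(try_sleep)
    # Retries exhausted: fail instead of waiting forever.
    raise DirectoriesMissingError(
        "%s does not exist after %d checks" % (", ".join(missing), tries))
```

Because the loop is bounded, a failed ATS install on another host makes the RM start fail with a clear error after the retries run out rather than hang indefinitely.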
