> On Feb. 24, 2016, 6:56 p.m., Alejandro Fernandez wrote:
> > ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py, line 123
> > <https://reviews.apache.org/r/43948/diff/1/?file=1267791#file1267791line123>
> >
> > If this happens during cluster install, why don't we put a dependency in role_command_order.json so that RM must start after ATS?
> >
> > If ATS is on host1 and RM on host2, and during a fresh cluster install we fail to install ATS, then RM will keep waiting.
>
> Sebastian Toader wrote:
>     role_command_order.json won't work with Blueprints, because with Blueprints there is no cluster-wide ordering.
>
>     RM will keep waiting only until it exhausts the retries (8 * 20 secs).
>
> Alejandro Fernandez wrote:
>     Can we make Blueprints respect role_command_order? Please include Robert Nettleton in the code review.
Alejandro, we made Blueprints (BP) stop respecting role_command_order (RCO) to speed up deployments for one of our users, for whom deployment time is critical. If we revert that change, we will run into that problem for him again.

- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43948/#review120537
-----------------------------------------------------------


On Feb. 24, 2016, 4:39 p.m., Sebastian Toader wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/43948/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2016, 4:39 p.m.)
>
>
> Review request for Ambari, Alejandro Fernandez, Andrew Onischuk, Sumit Mohanty, and Sid Wagle.
>
>
> Bugs: AMBARI-15158
>     https://issues.apache.org/jira/browse/AMBARI-15158
>
>
> Repository: ambari
>
>
> Description
> -------
>
> If ATS is installed, then after starting, the Resource Manager checks whether the directories where ATS stores timeline data for active and completed applications exist in DFS. There may be cases when RM comes up well before ATS has created these directories. In these situations RM stops with the error message "IOException: /ats/active does not exist".
>
> To avoid this situation, the Python script responsible for starting the RM component has been modified to check up front, before the RM process is started, that these directories exist. This check is performed only if ATS is installed and has either yarn.timeline-service.entity-group-fs-store.active-dir or yarn.timeline-service.entity-group-fs-store.done-dir set.
>
>
> Diffs
> -----
>
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 2ef404d
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py ec7799e
>
> Diff: https://reviews.apache.org/r/43948/diff/
>
>
> Testing
> -------
>
> Manual testing:
> 1. Created secure and non-secure clusters with Blueprints where NN, RM and ATS were deployed to different nodes. Each was tested both with webhdfs enabled and with webhdfs disabled in HDFS.
> 2. Created a cluster using the UI where NN, RM and ATS were deployed to different nodes. After the cluster was kerberized, it was tested both with webhdfs enabled and with webhdfs disabled in HDFS.
>
> Python tests results:
> ----------------------------------------------------------------------
> Total run:902
> Total errors:0
> Total failures:0
> OK
>
>
> Thanks,
>
> Sebastian Toader
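The pre-start check described in the review request, together with the bounded retry Sebastian mentions ("8 * 20 secs"), can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in the actual diff. The function names `ats_dirs_to_wait_for`, `wait_for_dfs_dirs` and `hdfs_dir_exists` are hypothetical. The ATS host list and the yarn-site properties are assumed to be passed in as plain Python values rather than read from Ambari's params module. The defaults of 8 attempts and 20 seconds are taken from the thread. Kerberos login and the WebHDFS code path exercised in the testing notes are left out.

```python
import subprocess
import time
from typing import Callable, List, Mapping, Sequence

# yarn-site property names quoted in the review description.
ACTIVE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.active-dir"
DONE_DIR_KEY = "yarn.timeline-service.entity-group-fs-store.done-dir"


def ats_dirs_to_wait_for(ats_hosts: Sequence[str],
                         yarn_site: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: directories RM should wait for before starting.

    Returns an empty list (i.e. no check) when ATS is not installed or when
    neither entity-group-fs-store directory property is set.
    """
    if not ats_hosts:
        return []
    return [yarn_site[key] for key in (ACTIVE_DIR_KEY, DONE_DIR_KEY)
            if yarn_site.get(key)]


def hdfs_dir_exists(path: str) -> bool:
    """Hypothetical helper: `hdfs dfs -test -d` exits with 0 if path is a directory.

    Assumes the hdfs CLI is on PATH; Kerberos login (kinit) is not handled here.
    """
    return subprocess.run(["hdfs", "dfs", "-test", "-d", path]).returncode == 0


def wait_for_dfs_dirs(dirs: Sequence[str],
                      dir_exists: Callable[[str], bool] = hdfs_dir_exists,
                      times: int = 8,
                      sleep_seconds: float = 20,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: poll until every directory exists or retries run out.

    Checks up to `times` times, sleeping `sleep_seconds` between checks.
    Returns True once all directories exist, False if they never appear.
    """
    pending = list(dirs)
    for attempt in range(times):
        # Only re-check directories that have not been seen yet.
        pending = [d for d in pending if not dir_exists(d)]
        if not pending:
            return True
        if attempt < times - 1:
            sleep(sleep_seconds)
    return False


if __name__ == "__main__":
    # Simulated DFS: only the done-dir exists, so the wait gives up.
    existing = {"/ats/done"}
    dirs = ats_dirs_to_wait_for(
        ["ats-host"], {ACTIVE_DIR_KEY: "/ats/active", DONE_DIR_KEY: "/ats/done"})
    print(dirs)
    print(wait_for_dfs_dirs(dirs, lambda d: d in existing, sleep=lambda s: None))
```

Injecting `dir_exists` and `sleep` keeps the retry logic testable without a live cluster. A real deployment could back `dir_exists` with `hdfs dfs -test -d`, as the hypothetical helper does, or with a WebHDFS `GETFILESTATUS` request. Because the loop is bounded, RM is delayed by at most about `(times - 1) * sleep_seconds` seconds of sleeping (140 s with these defaults, since this sketch sleeps only between checks) when ATS never creates the directories. That bound is the point of Sebastian's reply to the "RM will keep waiting" concern.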
