Andrew Onischuk created AMBARI-24223: ----------------------------------------
Summary: Component with recovery Enabled are not coming up when autostart is enabled Key: AMBARI-24223 URL: https://issues.apache.org/jira/browse/AMBARI-24223 Project: Ambari Issue Type: Bug Reporter: Andrew Onischuk Assignee: Andrew Onischuk Fix For: 2.7.0 Attachments: AMBARI-24223.patch STR:- 1\. test enable autostart for all component 2\. test restart all the host at once. 3\. test expect within 45 min. each and every component should be up. 4\. test failed with the following exception ||Host:- nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal|| |ComponentName : RANGER_TAGSYNC Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : DRUID_ROUTER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| ComponentName : HIVE_SERVER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : SPARK2_THRIFTSERVER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : HISTORYSERVER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : DRUID_BROKER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : HBASE_MASTER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : HIVE_METASTORE Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : ZEPPELIN_MASTER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : ACTIVITY_ANALYZER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : DRUID_OVERLORD Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : TIMELINE_READER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED |ComponentName : RANGER_ADMIN Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : OOZIE_SERVER Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| ComponentName : METRICS_COLLECTOR Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| |ComponentName : METRICS_GRAFANA Expected Status: Started on host: nat-yc-r7-irrs-ambari-autostart-1-re-re-3.openstacklocal Found: INSTALLED| That means these components are not up untill the waiting time completes. I checked the agent logs:- attching the ambari-agent logs generated from cat /var/log/ambari-agent/ambari-agent.log |grep -i RecoveryManager.py [agent- autostart.log![](/images/icons/link_attachment_7.gif)](/secure/attachment/159734 /159734_agent-autostart.log "agent-autostart.log attached to BUG-106407") the relevent part if cat /var/log/ambari-agent/ambari-agent.log |grep -i RecoveryManager.py INFO 2018-06-28 11:14:31,282 RecoveryManager.py:454 - RecoverConfig = {u'components': ()} INFO 2018-06-28 11:14:34,208 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'METRICS_COLLECTOR', INFO 2018-06-28 11:14:34,208 RecoveryManager.py:178 - New status, desired status is set to INIT for METRICS_COLLECTOR INFO 2018-06-28 11:14:50,695 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HBASE_REGIONSERVER INFO 2018-06-28 11:14:51,404 RecoveryManager.py:157 - New status, current status is set to INSTALLED for SUPERVISOR INFO 2018-06-28 11:14:51,954 RecoveryManager.py:157 - New status, current status is set to INSTALLED for TEZ_CLIENT INFO 2018-06-28 11:14:52,672 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HIVE_CLIENT INFO 2018-06-28 11:14:53,467 RecoveryManager.py:157 - New status, current status is set to INSTALLED for PIG INFO 2018-06-28 11:14:54,241 RecoveryManager.py:157 - New status, current status is set to INSTALLED for KERBEROS_CLIENT INFO 2018-06-28 11:14:54,683 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HST_AGENT INFO 2018-06-28 11:14:55,367 RecoveryManager.py:157 - New status, current status is set to INSTALLED for METRICS_MONITOR INFO 2018-06-28 11:14:55,920 RecoveryManager.py:157 - New status, current status is set to INSTALLED for ZOOKEEPER_CLIENT INFO 2018-06-28 11:14:56,458 RecoveryManager.py:157 - New status, current status is set to INSTALLED for LOGSEARCH_LOGFEEDER INFO 2018-06-28 11:14:57,017 RecoveryManager.py:157 - New status, current status is set to INSTALLED for INFRA_SOLR_CLIENT INFO 2018-06-28 11:14:57,823 RecoveryManager.py:157 - New status, current status is set to INSTALLED for DATANODE INFO 2018-06-28 11:14:58,443 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HST_SERVER INFO 2018-06-28 11:14:59,205 RecoveryManager.py:157 - New status, current status is set to INSTALLED for TIMELINE_READER INFO 2018-06-28 11:15:00,015 RecoveryManager.py:157 - New status, current status is set to INSTALLED for DRUID_ROUTER INFO 2018-06-28 11:15:00,617 RecoveryManager.py:157 - New status, current status is set to INSTALLED for NAMENODE INFO 2018-06-28 11:15:01,151 RecoveryManager.py:157 - New status, current status is set to INSTALLED for YARN_REGISTRY_DNS INFO 2018-06-28 11:15:01,745 RecoveryManager.py:157 - New status, current status is set to INSTALLED for DRPC_SERVER INFO 2018-06-28 11:15:02,250 RecoveryManager.py:157 - New status, current status is set to INSTALLED for STORM_UI_SERVER INFO 2018-06-28 11:15:02,812 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HBASE_MASTER INFO 2018-06-28 11:15:03,353 RecoveryManager.py:157 - New status, current status is set to INSTALLED for RANGER_USERSYNC INFO 2018-06-28 11:15:03,825 RecoveryManager.py:157 - New status, current status is set to INSTALLED for NIMBUS INFO 2018-06-28 11:15:04,413 RecoveryManager.py:157 - New status, current status is set to INSTALLED for SUPERSET INFO 2018-06-28 11:15:04,996 RecoveryManager.py:157 - New status, current status is set to INSTALLED for APP_TIMELINE_SERVER INFO 2018-06-28 11:15:05,552 RecoveryManager.py:157 - New status, current status is set to INSTALLED for ACTIVITY_EXPLORER INFO 2018-06-28 11:15:06,202 RecoveryManager.py:157 - New status, current status is set to INSTALLED for LOGSEARCH_SERVER INFO 2018-06-28 11:15:07,436 RecoveryManager.py:157 - New status, current status is set to INSTALLED for SPARK2_JOBHISTORYSERVER INFO 2018-06-28 11:15:07,962 RecoveryManager.py:157 - New status, current status is set to INSTALLED for INFRA_SOLR INFO 2018-06-28 11:15:08,494 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HIVE_METASTORE INFO 2018-06-28 11:15:09,508 RecoveryManager.py:157 - New status, current status is set to INSTALLED for RESOURCEMANAGER INFO 2018-06-28 11:15:10,197 RecoveryManager.py:157 - New status, current status is set to INSTALLED for OOZIE_SERVER INFO 2018-06-28 11:15:10,840 RecoveryManager.py:157 - New status, current status is set to INSTALLED for RANGER_ADMIN INFO 2018-06-28 11:15:11,517 RecoveryManager.py:157 - New status, current status is set to INSTALLED for SPARK2_THRIFTSERVER INFO 2018-06-28 11:15:12,028 RecoveryManager.py:157 - New status, current status is set to INSTALLED for DRUID_OVERLORD INFO 2018-06-28 11:15:12,998 RecoveryManager.py:157 - New status, current status is set to INSTALLED for DRUID_BROKER INFO 2018-06-28 11:15:13,650 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HIVE_SERVER INFO 2018-06-28 11:15:14,321 RecoveryManager.py:157 - New status, current status is set to INSTALLED for RANGER_TAGSYNC INFO 2018-06-28 11:15:14,852 RecoveryManager.py:157 - New status, current status is set to INSTALLED for ACTIVITY_ANALYZER INFO 2018-06-28 11:15:14,956 RecoveryManager.py:163 - current status is set to INSTALLED for METRICS_COLLECTOR INFO 2018-06-28 11:15:15,655 RecoveryManager.py:157 - New status, current status is set to INSTALLED for ZEPPELIN_MASTER INFO 2018-06-28 11:15:16,199 RecoveryManager.py:157 - New status, current status is set to INSTALLED for METRICS_GRAFANA INFO 2018-06-28 11:15:16,859 RecoveryManager.py:157 - New status, current status is set to INSTALLED for HISTORYSERVER INFO 2018-06-28 11:17:36,038 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'METRICS_COLLECTOR', INFO 2018-06-28 11:17:36,042 RecoveryManager.py:183 - desired status is set to STARTED for METRICS_COLLECTOR INFO 2018-06-28 11:27:33,451 RecoveryManager.py:183 - desired status is set to INSTALLED for METRICS_COLLECTOR INFO 2018-06-28 11:27:33,451 RecoveryManager.py:620 - Received EXECUTION_COMMAND (STOP/INSTALL), desired state of METRICS_COLLECTOR to INSTALLED INFO 2018-06-28 11:29:10,328 RecoveryManager.py:588 - After EXECUTION_COMMAND (STOP/INSTALL), with taskId=63, current state of METRICS_COLLECTOR to INSTALLED INFO 2018-06-28 11:45:03,661 RecoveryManager.py:163 - current status is set to STARTED for INFRA_SOLR INFO 2018-06-28 11:45:12,168 RecoveryManager.py:163 - current status is set to STARTED for METRICS_MONITOR INFO 2018-06-28 11:45:20,469 RecoveryManager.py:163 - current status is set to STARTED for HST_SERVER INFO 2018-06-28 11:45:26,682 RecoveryManager.py:163 - current status is set to STARTED for RANGER_TAGSYNC INFO 2018-06-28 11:45:46,474 RecoveryManager.py:163 - current status is set to STARTED for YARN_REGISTRY_DNS INFO 2018-06-28 11:45:54,622 RecoveryManager.py:163 - current status is set to STARTED for HST_AGENT INFO 2018-06-28 11:46:27,459 RecoveryManager.py:163 - current status is set to STARTED for LOGSEARCH_SERVER INFO 2018-06-28 11:48:53,946 RecoveryManager.py:163 - current status is set to STARTED for RANGER_ADMIN INFO 2018-06-28 14:34:17,142 RecoveryManager.py:583 - After EXECUTION_COMMAND (START), with taskId=1530194603, current state of ACTIVITY_EXPLORER to STARTED INFO 2018-06-28 14:34:18,039 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,105 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,170 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,259 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,303 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,409 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,478 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,570 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,616 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,644 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,674 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,730 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,764 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,780 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,831 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,875 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,888 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,960 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:18,982 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,015 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,051 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,090 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,108 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,125 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,220 RecoveryManager.py:454 - RecoverConfig = {u'components': ({u'component_name': u'SUPERSET', INFO 2018-06-28 14:34:19,290 RecoveryManager.py:454 - RecoverConfig = {u'components': ()} INFO 2018-06-28 14:35:36,027 RecoveryManager.py:163 - current status is set to INSTALLED for SUPERVISOR INFO 2018-06-28 14:36:05,838 RecoveryManager.py:163 - current status is set to STARTED for SUPERSET INFO 2018-06-28 14:36:45,585 RecoveryManager.py:163 - current status is set to STARTED for YARN_REGISTRY_DNS INFO 2018-06-28 14:37:18,681 RecoveryManager.py:163 - current status is set to STARTED for INFRA_SOLR INFO 2018-06-28 14:37:55,603 RecoveryManager.py:163 - current status is set to STARTED for METRICS_MONITOR INFO 2018-06-28 14:38:14,872 RecoveryManager.py:163 - current status is set to STARTED for RANGER_TAGSYNC INFO 2018-06-28 14:38:29,806 RecoveryManager.py:163 - current status is set to STARTED for DATANODE INFO 2018-06-28 14:38:43,789 RecoveryManager.py:163 - current status is set to STARTED for RANGER_USERSYNC INFO 2018-06-28 14:38:54,727 RecoveryManager.py:163 - current status is set to INSTALLED for METRICS_MONITOR INFO 2018-06-28 14:39:40,572 RecoveryManager.py:163 - current status is set to STARTED for STORM_UI_SERVER INFO 2018-06-28 14:39:42,898 RecoveryManager.py:163 - current status is set to STARTED for METRICS_MONITOR INFO 2018-06-28 14:40:49,742 RecoveryManager.py:163 - current status is set to STARTED for RANGER_ADMIN INFO 2018-06-28 14:41:13,508 RecoveryManager.py:163 - current status is set to STARTED for HBASE_REGIONSERVER INFO 2018-06-28 14:44:02,873 RecoveryManager.py:163 - current status is set to STARTED for NIMBUS INFO 2018-06-28 14:45:17,040 RecoveryManager.py:163 - current status is set to STARTED for METRICS_COLLECTOR INFO 2018-06-28 14:45:20,928 RecoveryManager.py:163 - current status is set to STARTED for LOGSEARCH_LOGFEEDER INFO 2018-06-28 14:46:24,130 RecoveryManager.py:163 - current status is set to INSTALLED for INFRA_SOLR INFO 2018-06-28 14:46:24,598 RecoveryManager.py:163 - current status is set to STARTED for ACTIVITY_ANALYZER INFO 2018-06-28 14:47:14,171 RecoveryManager.py:163 - current status is set to STARTED for INFRA_SOLR INFO 2018-06-28 14:49:22,384 RecoveryManager.py:163 - current status is set to STARTED for NAMENODE INFO 2018-06-28 14:49:31,552 RecoveryManager.py:163 - current status is set to STARTED for DRPC_SERVER INFO 2018-06-28 15:10:55,994 RecoveryManager.py:163 - current status is set to STARTED for LOGSEARCH_SERVER INFO 2018-06-28 15:11:01,316 RecoveryManager.py:163 - current status is set to STARTED for SPARK2_JOBHISTORYSERVER INFO 2018-06-28 15:11:11,251 RecoveryManager.py:163 - current status is set to STARTED for METRICS_GRAFANA INFO 2018-06-28 15:11:37,007 RecoveryManager.py:163 - current status is set to STARTED for HST_SERVER INFO 2018-06-28 15:11:57,521 RecoveryManager.py:163 - current status is set to STARTED for HST_AGENT INFO 2018-06-28 15:12:03,321 RecoveryManager.py:163 - current status is set to STARTED for OOZIE_SERVER INFO 2018-06-28 15:13:24,798 RecoveryManager.py:163 - current status is set to INSTALLED for RANGER_ADMIN INFO 2018-06-28 15:14:13,744 RecoveryManager.py:163 - current status is set to STARTED for RANGER_ADMIN INFO 2018-06-28 15:15:03,889 RecoveryManager.py:163 - current status is set to INSTALLED for RANGER_ADMIN INFO 2018-06-28 15:15:50,119 RecoveryManager.py:163 - current status is set to INSTALLED for OOZIE_SERVER INFO 2018-06-28 15:15:50,451 RecoveryManager.py:163 - current status is set to INSTALLED for NAMENODE INFO 2018-06-28 15:15:51,459 RecoveryManager.py:163 - current status is set to STARTED for RANGER_ADMIN INFO 2018-06-28 15:15:57,678 RecoveryManager.py:163 - current status is set to STARTED for NAMENODE INFO 2018-06-28 15:16:22,782 RecoveryManager.py:163 - current status is set to STARTED for HBASE_MASTER INFO 2018-06-28 15:16:40,512 RecoveryManager.py:163 - current status is set to STARTED for APP_TIMELINE_SERVER INFO 2018-06-28 15:16:43,582 RecoveryManager.py:163 - current status is set to STARTED for OOZIE_SERVER INFO 2018-06-28 15:17:45,952 RecoveryManager.py:163 - current status is set to STARTED for HISTORYSERVER INFO 2018-06-28 15:37:57,495 RecoveryManager.py:163 - current status is set to INSTALLED for HST_AGENT INFO 2018-06-28 15:38:47,195 RecoveryManager.py:163 - current status is set to STARTED for HST_AGENT I checked is this because of memory issue. but [root@nat-yc-r7-irrs-ambari-autostart-1-re-re-3 data]# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/rhel-root 17G 11G 6.0G 65% / devtmpfs 7.8G 0 7.8G 0% /dev tmpfs 7.8G 0 7.8G 0% /dev/shm tmpfs 7.8G 361M 7.5G 5% /run tmpfs 7.8G 0 7.8G 0% /sys/fs/cgroup /dev/vda1 1014M 172M 843M 17% /boot /dev/vdb 246G 19G 216G 8% /grid/0 tmpfs 1.6G 0 1.6G 0% /run/user/1037 tmpfs 1.6G 0 1.6G 0% /run/user/0 I checked some of auto_errors cat auto_errors-1530190498.txt Traceback (most recent call last): File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/historyserver.py", line 132, in <module> HistoryServer().execute() File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute method(env) File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/historyserver.py", line 100, in start skip=params.sysprep_skip_copy_tarballs_hdfs) or resource_created File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/copy_tarball.py", line 497, in copy_to_hdfs source_file = prepare_function() File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/copy_tarball.py", line 97, in _prepare_tez_tarball hadoop_lib_native_lzo_dir = os.path.join(stack_root, service_version, "hadoop", "lib", "native") File "/usr/lib64/python2.7/posixpath.py", line 75, in join if b.startswith('/'): AttributeError: 'NoneType' object has no attribute 'startswith' attching the ambari-agent logs generated from cat /var/log/ambari-agent/ambari-agent.log |grep -i RecoveryManager.py [agent- autostart.log![](/images/icons/link_attachment_7.gif)](/secure/attachment/159734 /159734_agent-autostart.log "agent-autostart.log attached to BUG-106407") Repro cluster:- <http://linux-jenkins.qe.hortonworks.com:8080/job/Nightly- Start-EC2-Run-HDP/985907/> lifetime extended to 72 hours <http://linux- jenkins.qe.hortonworks.com:8080/job/update-openstack-lifetime/12085/> -- This message was sent by Atlassian JIRA (v7.6.3#76005)