[ https://issues.apache.org/jira/browse/AMBARI-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yusaku Sako updated AMBARI-18191:
---------------------------------
    Fix Version/s:     (was: 2.5.0)
                       trunk

> "Restart all required" services operation failed at Metrics Collector since HDFS was not yet up
> -----------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-18191
>                 URL: https://issues.apache.org/jira/browse/AMBARI-18191
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.4.0
>            Reporter: Aravindan Vijayan
>            Assignee: Siddharth Wagle
>            Priority: Blocker
>             Fix For: trunk
>
>         Attachments: AMBARI-18191.patch
>
>
> ambari-server --hash
> 4017036da951a10f519a578de934308cf866ba50
>
> *Steps*
> # Deploy an HDP-2.3.6 cluster with Ambari 2.2.2.0 (AMS configured in distributed mode)
> # Upgrade Ambari to 2.4.0.0 and let the upgrade complete
> # Open the Ambari web UI and hit "Restart all required" under the Actions menu
>
> *Result*
> The operation fails while trying to restart Metrics Collector, because it makes a WebHDFS call while HDFS is not yet started:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 148, in <module>
>     AmsCollector().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 725, in restart
>     self.start(env)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 46, in start
>     self.configure(env, action = 'start') # for security
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 41, in configure
>     hbase('master', action)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 213, in hbase
>     dfs_type=params.dfs_type
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
>     self.action_delayed("create")
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
>     self.get_hdfs_resource_executor().action_delayed(action_name, self)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in action_delayed
>     self._set_mode(self.target_status)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 363, in _set_mode
>     self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command
>     _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --negotiate -u : 'http://vsharma-eu-mt-5.openstacklocal:50070/webhdfs/v1/user/ams/hbase?op=SETPERMISSION&user.name=hdfs&permission=775' 1>/tmp/tmp8twcZt 2>/tmp/tmpLPih9a' returned 7. curl: (7) couldn't connect to host
> 401
> {code}
> Afterwards, HDFS was restarted individually first, and then "Restart all Required" was hit again; the operation was successful.
> The issue appears to be that the restart order is incorrect across the hosts, so the dependent services are not up before the services that depend on them are restarted.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
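For reference, the failure mode here is curl exit code 7 (connection refused) against the NameNode's WebHDFS endpoint, i.e. the call was made before HDFS was up. A minimal sketch of one way a startup script could guard against this is to poll WebHDFS with a read-only request until the NameNode answers. This is not the attached patch; the helper names, host, and defaults below are illustrative assumptions (port 50070 is taken from the failing curl URL above):

```python
import time

try:  # urllib2 on Python 2.6 (as in the stack trace), urllib.request on Python 3
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen


def webhdfs_status_url(namenode_host, path, user, port=50070):
    # Build a read-only WebHDFS GETFILESTATUS URL; unlike SETPERMISSION,
    # this probe does not modify anything on HDFS.
    return ("http://%s:%d/webhdfs/v1%s?op=GETFILESTATUS&user.name=%s"
            % (namenode_host, port, path, user))


def wait_for_hdfs(namenode_host, path="/", user="hdfs", timeout=300, interval=10):
    # Poll the NameNode until WebHDFS answers, or raise after `timeout` seconds.
    # While HDFS is still down the connection is refused, which is exactly
    # the "curl: (7) couldn't connect to host" condition seen in the traceback.
    url = webhdfs_status_url(namenode_host, path, user)
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            urlopen(url, timeout=interval)
            return True
        except Exception:  # connection refused / NameNode not yet listening
            time.sleep(interval)
    raise RuntimeError("HDFS did not come up within %ds" % timeout)
```

A guard like this only papers over the symptom on one host; the actual fix tracked by this issue is to order the cross-host restarts so HDFS comes up before services that issue WebHDFS calls.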