[ https://issues.apache.org/jira/browse/METRON-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Leet updated METRON-1326:
--------------------------------
Fix Version/s: 0.5.0

Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop
----------------------------------------------------------------------

Key: METRON-1326
URL: https://issues.apache.org/jira/browse/METRON-1326
Project: Metron
Issue Type: Bug
Environment: 12 node VM cluster running CentOS 7
Reporter: Anand Subramanian
Assignee: Michael Miklavcic
Priority: Major
Fix For: 0.5.0

I am seeing the Metron deployment fail when enabling Kerberos on a 12-node VM cluster managed by Ambari 2.5.2.
The error occurs in the "Stop Services" step of kerberization, for the Elasticsearch Master and Elasticsearch Data Node services.
I confirmed that the same deployment succeeds on Ambari 2.4.2, where I am able to set up the Kerberized cluster without problems.

With Ambari 2.4, the "Elasticsearch Data Node Stop" step stops the slave and does not check the status of the service after the 'service stop' command is issued. With Ambari 2.5, the agent checks the status after the stop command is issued.

*In Ambari 2.4*
{code}
stdout:
Stop the Slave
2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}
Command completed successfully!
{code}

*In Ambari 2.5*
{code}
Stop the Slave
2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:48,599 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}
Command failed after 1 tries
{code}

The status command appears to return exit code 3. The Ambari agent treats this as an error and marks the step as failed. Under the LSB init-script conventions, exit code 3 from a 'status' action means "program is not running", so it is the expected result immediately after a successful stop.

I am not entirely sure whether this should be handled by Metron or by Ambari. Please feel free to close this defect if it is deemed out of scope for Metron.
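Here is a minimal sketch of why exit code 3 breaks the Ambari 2.5 stop step. It rests on one assumption, suggested by the traceback (after_stop → status_method → Execute(status_cmd)): after a stop, the agent polls the component's status method and expects "stopped" to be signalled by an exception, not by a failed command. The names ComponentIsNotRunning, status() and wait_for_stop() below are hypothetical stand-ins written for illustration. They are not Ambari's or Metron's actual code. The status check maps the LSB "status" exit codes 1 to 3 (process dead or not running) to the "not running" signal and keeps any other non-zero code as a real failure.

```python
import subprocess
import time


class ComponentIsNotRunning(Exception):
    """Hypothetical stand-in for the 'component is stopped' signal."""


# LSB init-script exit codes for the "status" action:
# 0 = running; 1 = dead, pid file exists; 2 = dead, lock file exists;
# 3 = not running. Codes 1-3 all mean the process is not running.
LSB_RUNNING = 0
LSB_NOT_RUNNING_CODES = {1, 2, 3}


def status(status_cmd):
    """Return normally if the service is running; raise
    ComponentIsNotRunning if the LSB exit code says it is stopped.
    Any other non-zero exit code is treated as a real failure."""
    code = subprocess.run(
        status_cmd,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode
    if code == LSB_RUNNING:
        return None
    if code in LSB_NOT_RUNNING_CODES:
        raise ComponentIsNotRunning("%r returned %d" % (status_cmd, code))
    raise RuntimeError("%r returned %d" % (status_cmd, code))


def wait_for_stop(status_cmd, timeout=5.0, poll_interval=0.1):
    """Poll status() after a stop, like the 'Waiting for actual component
    stop' step in the Ambari 2.5 log. Return True once the service reports
    not running, or False if it still runs when the timeout expires.
    A RuntimeError from status() (unexpected exit code) propagates."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            status(status_cmd)
        except ComponentIsNotRunning:
            return True
        time.sleep(poll_interval)
    return False
```

On a node where Elasticsearch has just been stopped, wait_for_stop("service elasticsearch status") would see exit code 3 (systemd reports the unit as "inactive (dead)") and return True instead of failing. The timeout is a choice made for this sketch so it cannot loop forever; Ambari's own wait loop may behave differently. In the stack script itself, one possible fix along the same lines would be to wrap Execute(status_cmd) in elastic_slave.py's status method in a try/except that catches ExecutionFailed and raises Ambari's ComponentIsNotRunning from resource_management.core.exceptions. This is an assumption, not necessarily the change committed for this issue.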
Here is the full error log from the UI:
{code}
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
    Elasticsearch().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
    self.execute_prefix_function(self.command_name, 'after', env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
    status_method(env)
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
    Execute(status_cmd)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: http://www.elastic.co

Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,340][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},}, reason: zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,466][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},}, reason: zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,548][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},}, reason: zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,713][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},}, reason: zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,417][INFO ][node ] [metron-12.openstacklocal] stopping ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node ] [metron-12.openstacklocal] stopped
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node ] [metron-12.openstacklocal] closing ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,491][INFO ][node ] [metron-12.openstacklocal] closed
Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.

stdout:
Stop the Slave
2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:49,089 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}
Command failed after 1 tries
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)