Anand Subramanian created METRON-1326: -----------------------------------------
Summary: Metron deploy with Kerberos fails on Ambari 2.5 during ES service stop
Key: METRON-1326
URL: https://issues.apache.org/jira/browse/METRON-1326
Project: Metron
Issue Type: Bug
Environment: 12 node VM cluster running CentOS 7
Reporter: Anand Subramanian

Metron deployment fails when enabling Kerberos on a 12-node VM cluster managed by Ambari 2.5.2. The error occurs during the "Stop Services" step of kerberization, for the Elasticsearch Master and Elasticsearch Data Node services. The same deployment succeeds on Ambari 2.4.2, where I am able to set up the Kerberized cluster without problems.

On Ambari 2.4, the "Elasticsearch Data Node Stop" step stops the slave and does not check the status of the service after the 'service stop' command is issued. On Ambari 2.5, the agent also runs the status command after the stop command is issued.

*In Ambari 2.4*
{code}
stdout:
Stop the Slave
2017-11-07 10:21:27,755 - Execute['service elasticsearch stop'] {}
Command completed successfully!
{code}

*In Ambari 2.5*
{code}
Stop the Slave
2017-11-07 10:12:48,481 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:48,599 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:48,600 - Execute['service elasticsearch status'] {}
Command failed after 1 tries
{code}

The status command returns exit code 3, which the Ambari agent treats as an error, so it marks the step as failed. Exit code 3 is the LSB init-script status code for "program is not running", which is exactly the expected state right after a stop. I am not entirely sure whether this should be handled by Metron or by Ambari. Please feel free to close this defect if it is deemed out of scope for Metron.
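The difference between the two Ambari versions can be modeled in a few lines. This is a simplified sketch, not Ambari's actual code: it assumes, based on the "Waiting for actual component stop" log line and the `after_stop` frame in the traceback below, that Ambari 2.5 keeps calling the component's `status` method until it raises a "not running" exception, and that `Execute` raises on any non-zero exit code. `ExecutionFailed` and `ComponentIsNotRunning` are local stand-ins for the `resource_management` exceptions; `naive_status` and `after_stop_wait` are hypothetical names.

```python
class ExecutionFailed(Exception):
    """Local stand-in for resource_management's ExecutionFailed."""

    def __init__(self, code):
        super().__init__(
            "Execution of 'service elasticsearch status' returned %d." % code)
        self.code = code


class ComponentIsNotRunning(Exception):
    """Local stand-in for Ambari's 'component is not running' signal."""


def naive_status(exit_code):
    """Behaves like Execute(status_cmd): any non-zero exit raises ExecutionFailed."""
    if exit_code != 0:
        raise ExecutionFailed(exit_code)


def after_stop_wait(status, max_polls=10):
    """Hypothetical model of the Ambari 2.5 post-stop wait.

    Polls status() until it raises ComponentIsNotRunning. Any other
    exception escapes and fails the stop step, as in the traceback.
    """
    for _ in range(max_polls):
        try:
            status()
        except ComponentIsNotRunning:
            return "stopped"
    return "still running"


# Ambari 2.4 never calls status after the stop, so the step succeeds.
# Ambari 2.5 does; the stopped service exits with 3 and the step fails.
try:
    after_stop_wait(lambda: naive_status(3))
except ExecutionFailed as err:
    print("Stop step failed:", err)
```

The point of the model is that the wait loop only recognizes one specific exception as "stopped"; a generic non-zero-exit error from the status command is indistinguishable from a real failure.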
Here is the full error log from the UI:
{code}
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 71, in <module>
    Elasticsearch().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 332, in execute
    self.execute_prefix_function(self.command_name, 'after', env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 350, in execute_prefix_function
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 398, in after_stop
    status_method(env)
  File "/var/lib/ambari-agent/cache/common-services/ELASTICSEARCH/2.3.3/package/scripts/elastic_slave.py", line 59, in status
    Execute(status_cmd)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'service elasticsearch status' returned 3. ● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
     Docs: http://www.elastic.co

Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,340][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false},}, reason: zen-disco-node_left({metron-9.openstacklocal}{lTJDzEA6Sp6_6ryTY8XSJQ}{172.22.97.19}{172.22.97.19:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,466][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false},}, reason: zen-disco-node_left({metron-8.openstacklocal}{Q7pgb5LLSj-oHMxld-DFfw}{172.22.97.188}{172.22.97.188:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,548][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false},}, reason: zen-disco-node_left({metron-2.openstacklocal}{8JdEI93MQPeDxD63tMKrRQ}{172.22.96.83}{172.22.96.83:9300}{master=false})
Nov 07 10:12:47 metron-12 elasticsearch[25937]: [2017-11-07 10:12:47,713][INFO ][cluster.service ] [metron-12.openstacklocal] removed {{metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false},}, reason: zen-disco-node_left({metron-5.openstacklocal}{643SMG8xSLOuFEZpuMNeQg}{172.22.97.119}{172.22.97.119:9300}{master=false})
Nov 07 10:12:48 metron-12 systemd[1]: Stopping Elasticsearch...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,417][INFO ][node ] [metron-12.openstacklocal] stopping ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node ] [metron-12.openstacklocal] stopped
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,456][INFO ][node ] [metron-12.openstacklocal] closing ...
Nov 07 10:12:48 metron-12 elasticsearch[25937]: [2017-11-07 10:12:48,491][INFO ][node ] [metron-12.openstacklocal] closed
Nov 07 10:12:48 metron-12 systemd[1]: Stopped Elasticsearch.
stdout:
Stop the Slave
2017-11-07 10:12:49,025 - Execute['service elasticsearch stop'] {}
2017-11-07 10:12:49,089 - Waiting for actual component stop
Status of the Slave
2017-11-07 10:12:49,090 - Execute['service elasticsearch status'] {}
Command failed after 1 tries
{code}