Alejandro Fernandez created AMBARI-9717:
-------------------------------------------
Summary: Kafka & Spark service checks fail intermittently on
kerberized cluster
Key: AMBARI-9717
URL: https://issues.apache.org/jira/browse/AMBARI-9717
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.0.0
Reporter: Alejandro Fernandez
Assignee: Alejandro Fernandez
Fix For: 2.0.0
Impact: Prevents RU from completing successfully
Frequency: reproduces often
I ran into this while performing an RU, in the following sequence:
* Installed a 3-node cluster with ambari build #427
* Installed HDP 2.2.2.0-2398 on centos 6
* Added HDFS and ZK
* Added Namenode HA
* Added all services (including Spark and Ranger)
* Kerberized the cluster (failed to start due to AMS service check)
* Registered repo HDP 2.2.2.0-2399
* Performed an RU
stdout:
{code}
Running kafka create topic command
2015-02-18 03:29:51,851 - u'Execute[\'source /etc/kafka/conf/kafka-env.sh ;
/usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
--create --topic ambari_kafka_service_check --partitions 1
--replication-factor 1 | grep \'Created topic
"ambari_kafka_service_check".\\|Topic "ambari_kafka_service_check" already
exists.\'\']' {'logoutput': True}
2015-02-18 03:29:54,183 - Error while executing command 'service_check':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 208, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1.2.2/package/scripts/service_check.py", line 37, in service_check
    logoutput=True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 276, in action_run
    raise ex
Fail: Execution of 'source /etc/kafka/conf/kafka-env.sh ;
/usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
--create --topic ambari_kafka_service_check --partitions 1
--replication-factor 1 | grep 'Created topic
"ambari_kafka_service_check".\|Topic "ambari_kafka_service_check" already
exists.'' returned 1.
{code}
It turns out that the Kafka topic command can return a nonzero exit code even
when the result is valid, so the service check should validate the command's
output against a regular expression instead of relying on the exit code.
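A minimal sketch of validating the output rather than the exit code, which may help illustrate the idea. It is not the actual patch: the names `TOPIC_OK`, `topic_check_passed` and `run_topic_check` are hypothetical, and the pattern is adapted from the grep expression in the log above (with the trailing dots escaped).

```python
import re
import subprocess

# Pattern adapted from the grep expression in the service check log.
# Either message means the topic is usable.
TOPIC_OK = re.compile(
    r'Created topic "ambari_kafka_service_check"\.'
    r'|Topic "ambari_kafka_service_check" already exists\.'
)


def topic_check_passed(output):
    """Return True if the kafka-topics.sh output reports a usable topic."""
    return TOPIC_OK.search(output) is not None


def run_topic_check(command):
    """Hypothetical helper: run the command and judge it by its output only.

    The exit code is deliberately ignored, since the command can return
    nonzero even when the topic already exists.
    """
    proc = subprocess.Popen(
        command, shell=True,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    out, _ = proc.communicate()
    return topic_check_passed(out.decode("utf-8", "replace"))
```

With this approach the check passes for both the "Created topic" and "already exists" messages, and fails for any other output regardless of the exit code.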
For Spark, the service check did not run kinit before executing.
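As a rough sketch of the Spark side, the check command could be prefixed with a kinit call when security is enabled. The function name `with_kinit`, its parameters, and the keytab and principal values in the usage comment are all hypothetical and are not taken from the Ambari scripts; the sketch also assumes the values contain no shell metacharacters.

```python
def with_kinit(command, security_enabled, keytab, principal, kinit_path="kinit"):
    """Prefix a shell command with kinit when the cluster is kerberized.

    All parameter names here are illustrative assumptions. `kinit -kt`
    obtains a ticket from a keytab file for the given principal.
    """
    if not security_enabled:
        return command
    # '&&' ensures the check does not run if obtaining the ticket fails.
    return "{0} -kt {1} {2} && {3}".format(kinit_path, keytab, principal, command)


# Example usage with placeholder values:
# with_kinit("run-spark-check", True, "/path/to/user.keytab", "user@EXAMPLE.COM")
```

Chaining with `&&` is a design choice made for this sketch: a kinit failure then surfaces directly instead of as a later authentication error in the check itself.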