[
https://issues.apache.org/jira/browse/AMBARI-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alejandro Fernandez updated AMBARI-9717:
----------------------------------------
Attachment: AMBARI-9717.patch
> Kafka & Spark service checks fail intermittently on kerberized cluster
> ----------------------------------------------------------------------
>
> Key: AMBARI-9717
> URL: https://issues.apache.org/jira/browse/AMBARI-9717
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.0.0
> Reporter: Alejandro Fernandez
> Assignee: Alejandro Fernandez
> Fix For: 2.0.0
>
> Attachments: AMBARI-9717.patch
>
>
> Impact: Prevents RU from completing successfully
> Frequency: reproduces often
> I ran into this while performing an RU after the following steps:
> * Installed a 3-node cluster with ambari build #427
> * Installed HDP 2.2.2.0-2398 on centos 6
> * Added HDFS and ZK
> * Added Namenode HA
> * Added all services (including Spark and Ranger)
> * Kerberized the cluster (failed to start due to AMS service check)
> * Registered repo HDP 2.2.2.0-2399
> * Performed an RU
> stdout:
> {code}
> Running kafka create topic command
> 2015-02-18 03:29:51,851 - u'Execute[\'source /etc/kafka/conf/kafka-env.sh ;
> /usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
> c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
> --create --topic ambari_kafka_service_check --partitions 1
> --replication-factor 1 | grep \'Created topic
> "ambari_kafka_service_check".\\|Topic "ambari_kafka_service_check" already
> exists.\'\']' {'logoutput': True}
> 2015-02-18 03:29:54,183 - Error while executing command 'service_check':
> Traceback (most recent call last):
> File
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
> line 208, in execute
> method(env)
> File
> "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1.2.2/package/scripts/service_check.py",
> line 37, in service_check
> logoutput=True,
> File "/usr/lib/python2.6/site-packages/resource_management/core/base.py",
> line 148, in __init__
> self.env.run()
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 152, in run
> self.run_action(resource, action)
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 118, in run_action
> provider_action()
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
> line 276, in action_run
> raise ex
> Fail: Execution of 'source /etc/kafka/conf/kafka-env.sh ;
> /usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
> c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
> --create --topic ambari_kafka_service_check --partitions 1
> --replication-factor 1 | grep 'Created topic
> "ambari_kafka_service_check".\|Topic "ambari_kafka_service_check" already
> exists.'' returned 1.
> {code}
> It turns out that the Kafka topic command can return a nonzero exit code
> even when the result is valid, so the service check only needs to validate
> the output against a regular expression.
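
As an illustration of validating the output rather than the exit code, here is a minimal sketch. It is not the contents of AMBARI-9717.patch; the two message strings are copied from the grep expression in the log above, while the names SUCCESS_PATTERN, output_indicates_success and run_and_validate are hypothetical.

```python
import re
import subprocess

# Messages taken from the grep expression in the log above.
# The names below are hypothetical, chosen for this sketch only.
SUCCESS_PATTERN = re.compile(
    r'Created topic "ambari_kafka_service_check"\.'
    r'|Topic "ambari_kafka_service_check" already exists\.'
)


def output_indicates_success(output):
    """Return True if the command output contains either accepted message."""
    return SUCCESS_PATTERN.search(output) is not None


def run_and_validate(command):
    """Run a shell command and judge it by its output, ignoring the exit code."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    # proc.returncode is deliberately not consulted here.
    return output_indicates_success(output)
```

With this shape, a run that reports the topic already exists would pass whatever status the command returned, and a run that prints neither message would fail.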
> For Spark, the service check was not running kinit before it executed.
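
The Spark side could be sketched as prefixing the check command with a keytab-based kinit when security is enabled. This is only an assumption about the shape of such a fix, not the actual patch; the function name with_kinit, its parameters, and the keytab path, principal and check command in the usage lines are all hypothetical placeholders.

```python
def with_kinit(check_command, security_enabled, keytab_path, principal,
               kinit_path="kinit"):
    """Prefix a service check command with kinit on a kerberized cluster.

    All parameter names are hypothetical; a real script would read the
    keytab and principal from the cluster configuration.
    """
    if not security_enabled:
        return check_command
    # "kinit -kt <keytab> <principal>" obtains a ticket from a keytab.
    kinit_command = "{0} -kt {1} {2}".format(kinit_path, keytab_path, principal)
    # "&&" keeps the check from running if kinit itself fails.
    return "{0} && {1}".format(kinit_command, check_command)


# Hypothetical values, for illustration only.
command = with_kinit(
    "run-spark-check",
    True,
    "/path/to/service.keytab",
    "user@EXAMPLE.COM",
)
print(command)
```

Joining with "&&" rather than ";" is a design choice in this sketch, so that a failed kinit surfaces directly instead of as a later authentication error.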