[
https://issues.apache.org/jira/browse/AMBARI-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329907#comment-14329907
]
Hudson commented on AMBARI-9717:
--------------------------------
FAILURE: Integrated in Ambari-trunk-Commit #1828 (See
[https://builds.apache.org/job/Ambari-trunk-Commit/1828/])
AMBARI-9717. Kafka & Spark service checks fail intermittently on kerberized
cluster (alejandro) (afernandez:
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=6e23af6a443de04f148cfff0e7da572497ee9d9e)
* ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py
* ambari-common/src/main/python/resource_management/libraries/functions/validate.py
* ambari-server/src/main/resources/common-services/SPARK/1.2.0.2.2/package/scripts/service_check.py
* ambari-server/src/main/resources/common-services/KAFKA/0.8.1.2.2/package/scripts/service_check.py
* ambari-server/src/main/resources/common-services/KAFKA/0.8.1.2.2/package/scripts/params.py
> Kafka & Spark service checks fail intermittently on kerberized cluster
> ----------------------------------------------------------------------
>
> Key: AMBARI-9717
> URL: https://issues.apache.org/jira/browse/AMBARI-9717
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.0.0
> Reporter: Alejandro Fernandez
> Assignee: Alejandro Fernandez
> Fix For: 2.0.0
>
> Attachments: AMBARI-9717.patch
>
>
> Impact: Prevents RU from completing successfully
> Frequency: reproduces often
> I ran into this while performing an RU after the following steps:
> * Installed a 3-node cluster with ambari build #427
> * Installed HDP 2.2.2.0-2398 on centos 6
> * Added HDFS and ZK
> * Added Namenode HA
> * Added all services (including Spark and Ranger)
> * Kerberized the cluster (failed to start due to AMS service check)
> * Registered repo HDP 2.2.2.0-2399
> * Performed an RU
> stdout:
> {code}
> Running kafka create topic command
> 2015-02-18 03:29:51,851 - u'Execute[\'source /etc/kafka/conf/kafka-env.sh ;
> /usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
> c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
> --create --topic ambari_kafka_service_check --partitions 1
> --replication-factor 1 | grep \'Created topic
> "ambari_kafka_service_check".\\|Topic "ambari_kafka_service_check" already
> exists.\'\']' {'logoutput': True}
> 2015-02-18 03:29:54,183 - Error while executing command 'service_check':
> Traceback (most recent call last):
> File
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
> line 208, in execute
> method(env)
> File
> "/var/lib/ambari-agent/cache/common-services/KAFKA/0.8.1.2.2/package/scripts/service_check.py",
> line 37, in service_check
> logoutput=True,
> File "/usr/lib/python2.6/site-packages/resource_management/core/base.py",
> line 148, in __init__
> self.env.run()
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 152, in run
> self.run_action(resource, action)
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/environment.py",
> line 118, in run_action
> provider_action()
> File
> "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
> line 276, in action_run
> raise ex
> Fail: Execution of 'source /etc/kafka/conf/kafka-env.sh ;
> /usr/hdp/current/kafka-broker//bin/kafka-topics.sh --zookeeper
> c6403.ambari.apache.org:2181,c6401.ambari.apache.org:2181,c6402.ambari.apache.org:2181
> --create --topic ambari_kafka_service_check --partitions 1
> --replication-factor 1 | grep 'Created topic
> "ambari_kafka_service_check".\|Topic "ambari_kafka_service_check" already
> exists.'' returned 1.
> {code}
> It turns out that the Kafka topic command can return a nonzero exit code
> in cases that are still valid, so the output just needs to be validated
> against a regular expression.
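
To illustrate the idea of validating the output rather than the exit code, here is a minimal Python sketch. It is not the code from the attached patch: the helper names topic_output_is_valid and check_topic_command are hypothetical, and the only details taken from the log above are the two messages being matched.

```python
import re
import subprocess


def topic_output_is_valid(output, topic):
    """Return True if the output says the topic was created or already exists."""
    name = re.escape(topic)
    # The two accepted messages are the ones the service check greps for above.
    pattern = r'Created topic "%s"\.|Topic "%s" already exists\.' % (name, name)
    return re.search(pattern, output) is not None


def check_topic_command(argv, topic):
    """Run argv, ignore its exit code, and validate stdout and stderr together.

    Hypothetical helper: argv would be the kafka-topics.sh invocation as a list.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.communicate()[0].decode("utf-8", "replace")
    # Deliberately no check of proc.returncode: only the text decides the result.
    return topic_output_is_valid(output, topic)
```

With this shape, a run that exits nonzero but prints the "already exists" message is still treated as a passing check, while output containing neither message is treated as a failure.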
> For Spark, the service check fails with
> {code}
> 2015-02-20 01:25:28,782 - call['hdp-select status hadoop-client'] {'timeout':
> 20}
> 2015-02-20 01:26:19,441 - Spark Job History Server not running.
> 2015-02-20 01:26:19,442 - Error while executing command 'service_check':
> Traceback (most recent call last):
> File
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
> line 208, in execute
> method(env)
> File
> "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.0.2.2/package/scripts/service_check.py",
> line 61, in service_check
> raise ComponentIsNotRunning()
> ComponentIsNotRunning
> {code}
> This happens while running the following command several times, because it has not kinit'ed:
> {code}
> curl -s -o /dev/null -w'%{http_code}' --negotiate -u: -k
> http://c6407.ambari.apache.org:18080
> {code}
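
As a rough sketch of what obtaining a ticket before the curl probe could look like in Python: this is not the committed fix. The function names, the keytab and principal arguments, and the assumption that a 200 response means the history server is up are all hypothetical. The curl flags are the ones from the command quoted above, and "kinit -kt <keytab> <principal>" is the usual MIT Kerberos invocation.

```python
import subprocess


def build_kerberized_check(url, keytab, principal, kinit_path="kinit"):
    """Return (kinit_argv, curl_argv) for a SPNEGO-authenticated HTTP probe."""
    # Obtain a ticket from the keytab first; without it --negotiate cannot authenticate.
    kinit_argv = [kinit_path, "-kt", keytab, principal]
    # Same curl flags as the command quoted above, split into an argument list.
    curl_argv = ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}",
                 "--negotiate", "-u:", "-k", url]
    return kinit_argv, curl_argv


def history_server_is_up(url, keytab, principal):
    """Run kinit, then curl, and report whether the HTTP status was 200.

    Assumption: 200 is the status that indicates a healthy server.
    """
    kinit_argv, curl_argv = build_kerberized_check(url, keytab, principal)
    subprocess.check_call(kinit_argv)  # raises CalledProcessError if kinit fails
    proc = subprocess.Popen(curl_argv, stdout=subprocess.PIPE)
    status = proc.communicate()[0].decode("ascii", "replace").strip()
    return status == "200"
```

Keeping the command construction separate from execution makes it possible to inspect or log exactly what would be run on the agent host.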
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)