Jonathan Hurley created AMBARI-11941:
----------------------------------------
Summary: RU: Pre Upgrade HDFS Fails Due To Kerberos Security Exception
Key: AMBARI-11941
URL: https://issues.apache.org/jira/browse/AMBARI-11941
Project: Ambari
Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Priority: Blocker
Fix For: 2.1.0
During a rolling upgrade (RU) from HDP 2.2 to HDP 2.3, the NameNode
pre-upgrade step fails:
{code:title=cat ip-172-31-41-15.ec2.internal/var/lib/ambari-agent/data/output-3113.txt}
2015-06-13 14:50:17,703 - call['hdfs dfsadmin -safemode get'] {'user': 'hdfs'}
2015-06-13 14:50:22,142 - call returned (255, '15/06/13 14:50:21 WARN
ipc.Client: Exception encountered while connecting to the server :
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)]\nsafemode: Failed on local exception: java.io.IOException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)]; Host Details : local host is:
"ip-172-31-41-15.ec2.internal/172.31.41.15"; destination host is:
"ip-172-31-41-15.ec2.internal":8020; ')
2015-06-13 14:50:22,143 - Command: hdfs dfsadmin -safemode get
Code: 255.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 311, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 110, in prepare_rolling_upgrade
    namenode_upgrade.prepare_rolling_upgrade()
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode_upgrade.py", line 100, in prepare_rolling_upgrade
    raise Fail("Could not transition to safemode state %s. Please check logs to make sure namenode is up." % str(SafeMode.OFF))
resource_management.core.exceptions.Fail: Could not transition to safemode state OFF. Please check logs to make sure namenode is up.
{code}
The underlying failure is a Kerberos authentication error:
{code}
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)]\nsafemode: Failed on local exception: java.io.IOException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)];
{code}
The log shows that kinit was called immediately before the failing command:
{code}
2015-06-13 14:50:17,473 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hdfs.headless.keytab [email protected]'] {}
2015-06-13 14:50:17,702 - Prepare to transition into safemode state OFF
2015-06-13 14:50:17,703 - call['hdfs dfsadmin -safemode get'] {'user': 'hdfs'}
2015-06-13 14:50:22,142 - call returned (255, '15/06/13 14:50:21 WARN
ipc.Client: Exception encountered while connecting to the server :
javax.security.sasl.SaslException:...
{code}
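One detail in the log above may be worth checking: the kinit Execute entry is logged with an empty argument dict ({}), while the dfsadmin call is logged with {'user': 'hdfs'}. If the empty dict means kinit ran as the agent's own user, the ticket could land in a different credential cache than the one the hdfs user reads. That is an assumption, not a confirmed diagnosis. Below is a minimal Python sketch for flagging this kind of mismatch in agent output. The run_as_user helper and its regex are hypothetical and not part of Ambari, and the principal is replaced with a placeholder.

```python
import ast
import re

# Matches agent log entries of the form:
#   <timestamp> - Execute['<command>'] {<kwargs>}
#   <timestamp> - call['<command>'] {<kwargs>}
LINE_RE = re.compile(
    r"- (?P<kind>\w+)\[(?P<cmd>'.*?')\]\s*(?P<kwargs>\{.*\})\s*$"
)


def run_as_user(log_line):
    """Return the 'user' kwarg logged for a command, or None if absent."""
    match = LINE_RE.search(log_line)
    if match is None:
        raise ValueError("unrecognized log line: %r" % log_line)
    # The kwargs are logged as a Python dict literal, so parse them safely.
    kwargs = ast.literal_eval(match.group("kwargs"))
    return kwargs.get("user")


# Sample lines modeled on the log above; <principal> is a placeholder.
kinit_line = (
    "2015-06-13 14:50:17,473 - Execute['/usr/bin/kinit -kt "
    "/etc/security/keytabs/hdfs.headless.keytab <principal>'] {}"
)
call_line = (
    "2015-06-13 14:50:17,703 - call['hdfs dfsadmin -safemode get'] "
    "{'user': 'hdfs'}"
)

if run_as_user(kinit_line) != run_as_user(call_line):
    print("kinit and the HDFS command were logged with different users")
```

If the two entries do report different users, comparing the output of klist for each user on the affected host would help confirm or rule out this explanation.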