[ https://issues.apache.org/jira/browse/AMBARI-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hurley updated AMBARI-13120:
-------------------------------------
    Attachment: AMBARI-13120.patch

> Restart Of NodeManager During Rolling Upgrade Runs Command As the Wrong User
> ----------------------------------------------------------------------------
>
>                 Key: AMBARI-13120
>                 URL: https://issues.apache.org/jira/browse/AMBARI-13120
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.1.2
>
>         Attachments: AMBARI-13120.patch
>
>
> During the core slaves step, one of the NodeManagers failed to restart:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 153, in <module>
>     Nodemanager().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 476, in restart
>     self.post_rolling_restart(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 84, in post_rolling_restart
>     nodemanager_upgrade.post_upgrade_check()
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager_upgrade.py", line 41, in post_upgrade_check
>     _check_nodemanager_startup()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 54, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager_upgrade.py", line 74, in _check_nodemanager_startup
>     raise Fail('Unable to determine if the NodeManager has started after upgrade (result code {0})'.format(str(return_code)))
> resource_management.core.exceptions.Fail: Unable to determine if the NodeManager has started after upgrade (result code 1)
> {code}
> It looks like an expired Kerberos ticket caused this:
> {code}
> 15/09/15 15:28:39 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/15 15:28:40 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> 15/09/15 15:28:40 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "os-r7-hpjtks-rudtodalsec-19/172.22.112.64"; destination host is: "os-r7-hpjtks-rudtodalsec-15.novalocal":8050;
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy17.getClusterNodes(Unknown Source)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:266)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy18.getClusterNodes(Unknown Source)
>     at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:520)
>     at org.apache.hadoop.yarn.client.cli.NodeCLI.listClusterNodes(NodeCLI.java:153)
>     at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:122)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:62)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
>     at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1397)
>     ... 17 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>     at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
>     at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
>     at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
>     at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
>     ... 20 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
>     at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
>     at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
>     at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
>     at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
>     at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
>     at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
>     ... 29 more
> {code}
> {code}
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'kdestroy'
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'yarn node -list -states=RUNNING'
> 15/09/16 21:08:21 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/16 21:08:21 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> 15/09/16 21:08:21 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> ...
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'kinit -kt /etc/security/keytabs/nm.service.keytab nm/os-r7-hpjtks-rudtodalsec-19.novalo...@example.com'
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'yarn node -list -states=RUNNING'
> 15/09/16 21:08:59 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/16 21:08:59 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> Total Nodes:20
>          Node-Id                                Node-State  Node-Http-Address                            Number-of-Running-Containers
> os-r7-hpjtks-rudtodalsec-6.novalocal:25454      RUNNING     os-r7-hpjtks-rudtodalsec-6.novalocal:8042    0
> os-r7-hpjtks-rudtodalsec-16.novalocal:25454     RUNNING     os-r7-hpjtks-rudtodalsec-16.novalocal:8042   0
> os-r7-hpjtks-rudtodalsec-13.novalocal:25454     RUNNING     os-r7-hpjtks-rudtodalsec-13.novalocal:8042   0
> ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
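
For readers following the reproduction above: the shell session suggests that `yarn node -list -states=RUNNING` only succeeds once the calling user holds a Kerberos ticket obtained from the NodeManager keytab. The sketch below illustrates that ordering in Python. It is not the attached patch and does not use Ambari's resource_management library; the function names, the `run` parameter, and the assumption that the process already runs as the yarn user with the host's FQDN are all hypothetical and chosen only for illustration.

```python
import subprocess


def build_kinit_command(keytab, principal, kinit_path="kinit"):
    # Mirrors the manual step from the report: kinit -kt <keytab> <principal>
    return [kinit_path, "-kt", keytab, principal]


def build_node_list_command():
    # Same command that the report runs by hand after kinit.
    return ["yarn", "node", "-list", "-states=RUNNING"]


def hostname_in_node_list(output, hostname):
    # A node row starts with "<host>:<port>", so compare the first field.
    prefix = hostname.lower() + ":"
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].lower().startswith(prefix):
            return True
    return False


def check_nodemanager_running(hostname, keytab, principal, secure=True,
                              run=subprocess.check_output):
    # Hypothetical helper: obtain a ticket first on a secure cluster,
    # then ask the ResourceManager for the list of RUNNING nodes.
    if secure:
        run(build_kinit_command(keytab, principal))
    output = run(build_node_list_command())
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return hostname_in_node_list(output, hostname)
```

The `run` parameter exists only so the ordering can be exercised without a cluster; a real check would also need to run as the correct service user, which is the problem this issue describes.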