[ 
https://issues.apache.org/jira/browse/IGNITE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Puviarasu updated IGNITE-8890:
------------------------------
    Labels: Ignite kerberos yarn  (was: )

> Ignite YARN Kerberos - Delegation Token renewal
> -----------------------------------------------
>
>                 Key: IGNITE-8890
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8890
>             Project: Ignite
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.3
>         Environment: Kerberos cluster
> Ignite Version : 2.3.0
> Module : Ignite-YARN
> Class : ApplicationMaster
>  
>            Reporter: Puviarasu
>            Priority: Blocker
>              Labels: Ignite, kerberos, yarn
>
> Because Ignite-YARN is a long-running application in a YARN environment, it 
> should have a mechanism to renew the delegation token.
> In Ignite-YARN, when the ApplicationMaster is started, it acquires delegation 
> tokens and stores them in a ByteBuffer [Class: ApplicationMaster, Method: init()].
>  This ByteBuffer with the token information is given to all the containers 
> received from the ResourceManager [Class: ApplicationMaster, Method: 
> onContainersAllocated()]. 
>  Everything works fine for the lifetime of the delegation token. 
> Once the delegation token expires, the ApplicationMaster is not able to start 
> Ignite inside the containers it receives, and the exception below occurs:
> *WARNING: Error launching container* 
>  
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager*$InvalidToken*):
>  at org.apache.hadoop.ipc.Client.call(Client.java:1504)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>  at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2123)
>  at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1253)
>  at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
>  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
>  at org.apache.ignite.yarn.utils.IgniteYarnUtils.setupFile(IgniteYarnUtils.java:65)
>  at org.apache.ignite.yarn.ApplicationMaster.onContainersAllocated(ApplicationMaster.java:131)
>  at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:292)
> The ApplicationMaster keeps asking for more and more containers [Class: 
> ApplicationMaster, Method: run()] but is not able to start Ignite inside any of 
> them because of the expired/missing delegation token. The failed 
> containers are not released when the exception occurs.
>  *This repeats until all the resources in the cluster are allocated to 
> Ignite. As a result, Ignite uses all the resources in the cluster and 
> no other jobs are able to run.*  
> Kindly help in resolving the issue.
> Thanks in Advance!!!
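For illustration only, here is a minimal sketch of one way the one-time token capture in init() could be replaced by a holder that is refreshed periodically, so that each container launch reads current credentials. Everything in it is an assumption and not code from Ignite-YARN: the class name `RefreshingTokenHolder`, the `tokenSource` supplier, and the refresh period are hypothetical. In a real ApplicationMaster the supplier would have to re-obtain and serialize delegation tokens (for example after a keytab login); that part is deliberately left out here.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder that keeps serialized credentials fresh for newly allocated containers. */
class RefreshingTokenHolder implements AutoCloseable {
    /** Assumed callback that fetches and serializes fresh tokens. */
    private final Supplier<byte[]> tokenSource;

    /** Most recently fetched token bytes. */
    private final AtomicReference<byte[]> current = new AtomicReference<>();

    private final ScheduledExecutorService scheduler;

    RefreshingTokenHolder(Supplier<byte[]> tokenSource, long period, TimeUnit unit) {
        this.tokenSource = tokenSource;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "token-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });
        refresh(); // initial fetch, equivalent to what happens once at startup today
        scheduler.scheduleAtFixedRate(this::refreshQuietly, period, period, unit);
    }

    /** Fetches new token bytes and publishes them atomically. */
    void refresh() {
        current.set(tokenSource.get());
    }

    /** Scheduled variant: a failed refresh keeps the previous tokens instead of stopping the schedule. */
    private void refreshQuietly() {
        try {
            refresh();
        } catch (RuntimeException e) {
            System.err.println("Token refresh failed, keeping previous tokens: " + e);
        }
    }

    /** Returns an independent buffer over a copy of the latest tokens, one per container launch. */
    ByteBuffer tokens() {
        return ByteBuffer.wrap(current.get().clone());
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With such a holder, onContainersAllocated() would call `tokens()` for every launch instead of reusing the buffer created at startup. The refresh period would need to be shorter than the token lifetime configured on the cluster.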
>  
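The resource exhaustion described above could also be limited independently of token renewal. The following is a minimal sketch, under stated assumptions, of a guard that releases a container when its launch fails and stops requesting more after a number of consecutive failures. The class `LaunchGuard`, its `Launcher` interface, and the failure threshold are hypothetical names for illustration, not part of Ignite-YARN.

```java
import java.util.function.Consumer;

/** Hypothetical guard: releases containers whose launch failed and signals when to stop requesting. */
class LaunchGuard<C> {
    /** Assumed launch callback; may fail, e.g. with an InvalidToken error. */
    interface Launcher<T> {
        void launch(T container) throws Exception;
    }

    private final Launcher<C> launcher;
    private final Consumer<C> releaser;
    private final int maxConsecutiveFailures;
    private int consecutiveFailures;

    LaunchGuard(Launcher<C> launcher, Consumer<C> releaser, int maxConsecutiveFailures) {
        this.launcher = launcher;
        this.releaser = releaser;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Tries to launch; on failure the container is handed back through the releaser. */
    synchronized boolean tryLaunch(C container) {
        try {
            launcher.launch(container);
            consecutiveFailures = 0; // a success resets the failure streak
            return true;
        } catch (Exception e) {
            consecutiveFailures++;
            releaser.accept(container); // do not hold resources for a failed launch
            return false;
        }
    }

    /** False once the failure streak reaches the threshold, so the request loop can back off. */
    synchronized boolean shouldRequestMore() {
        return consecutiveFailures < maxConsecutiveFailures;
    }
}
```

In an ApplicationMaster built on Hadoop's `AMRMClientAsync`, the releaser could plausibly delegate to `releaseAssignedContainer(container.getId())`, and the loop in run() could check `shouldRequestMore()` before adding further container requests. Whether that fits the Ignite-YARN code base is an assumption that would need to be verified against the actual sources.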



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
