[ 
https://issues.apache.org/jira/browse/TWILL-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201189#comment-14201189
 ] 

Alvin Wang commented on TWILL-106:
----------------------------------

Created a new CDAP cluster with default delegation token configuration, 24 hr 
kerberos token lifetime, and 0 kerberos token renew lifetime here: 
112.72.148.146.bc.googleusercontent.com

Transaction service fails after 24 hours of running, due to "token is expired" 
(sometimes creating a new cluster like this gives the "token is not found in 
cache" error message.:
{code}
Launching main: public static void 
org.apache.twill.internal.container.TwillContainerMain.main(java.lang.String[]) 
throws java.lang.Exception []
LOGBACK: No context given for 
co.cask.cdap.logging.appender.kafka.KafkaLogAppender[KafkaLogAppender]
2014-11-05 20:29:30,513 - INFO  
[ServiceDelegate:c.c.c.l.a.LogAppenderInitializer@65] - Initializing log 
appender KafkaLogAppender
2014-11-05 20:29:30,545 - INFO  
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@86] - Initializing 
runnable transaction
2014-11-05 20:29:30,649 - INFO  
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@88] - transaction 
Setting host name to 112.72.148.146.bc.googleusercontent.com
2014-11-05 20:29:31,058 - INFO  
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@101] - Runnable 
initialized transaction
2014-11-05 20:29:31,082 - INFO  
[ServiceDelegate:c.c.c.c.t.AbstractMasterTwillRunnable@109] - Starting runnable 
transaction
2014-11-05 20:29:31,106 - INFO  
[ServiceDelegate:c.c.c.c.t.AbstractMasterTwillRunnable@118] - Runnable started 
transaction
2014-11-06 20:29:54,985 - ERROR [tx-snapshot:c.c.t.TransactionManager@594] - 
Aborting transaction manager due to: Snapshot (timestamp 1415305794971) failed 
due to: token (HDFS_DELEGATION_TOKEN token 2 for yarn) is expired
org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 2 for 
yarn) is expired
        at org.apache.hadoop.ipc.Client.call(Client.java:1347) 
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.ipc.Client.call(Client.java:1300) 
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 ~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at com.sun.proxy.$Proxy9.create(Unknown Source) ~[na:na]
        at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na]
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 ~[na:1.6.0_45]
        at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_45]
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
 ~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 ~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at com.sun.proxy.$Proxy9.create(Unknown Source) ~[na:na]
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:227)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1392)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1382) 
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1307) 
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:384)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:380)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:380)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:324)
 ~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905) 
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886) 
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:848) 
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
        at 
co.cask.tephra.persist.HDFSTransactionStateStorage.writeSnapshot(HDFSTransactionStateStorage.java:120)
 ~[co.cask.tephra.tephra-core-0.3.0.jar:na]
        at 
co.cask.tephra.TransactionManager.doSnapshot(TransactionManager.java:420) 
[co.cask.tephra.tephra-core-0.3.0.jar:na]
        at 
co.cask.tephra.TransactionManager.access$400(TransactionManager.java:101) 
[co.cask.tephra.tephra-core-0.3.0.jar:na]
        at 
co.cask.tephra.TransactionManager$2.doRun(TransactionManager.java:257) 
[co.cask.tephra.tephra-core-0.3.0.jar:na]
        at 
co.cask.tephra.TransactionManager$DaemonThreadExecutor.run(TransactionManager.java:1014)
 [co.cask.tephra.tephra-core-0.3.0.jar:na]
2014-11-06 20:29:54,994 - ERROR [tx-snapshot:c.c.t.d.TransactionService$1$1@75] 
- Transaction manager aborted, stopping transaction service
{code}

namenode logs indicate that token 2 (the one in the error message) is actually 
an older token. Transaction service properly obtained a new delegation token 
just before the failure:
{code}
2014-11-05 20:23:32,798 INFO  [main] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating 
the current master key for generating delegation tokens
2014-11-05 20:23:32,801 INFO  [Thread[Thread-13,5,main]] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:run(536)) - Starting expired 
delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2014-11-05 20:23:32,801 INFO  [Thread[Thread-13,5,main]] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating 
the current master key for generating delegation tokens
2014-11-05 20:28:59,648 INFO  [IPC Server handler 5 on 8020] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating 
password for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-05 20:28:59,701 INFO  [IPC Server handler 0 on 8020] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating 
password for identifier: HDFS_DELEGATION_TOKEN token 2 for yarn
2014-11-05 20:29:16,217 INFO  [IPC Server handler 0 on 8020] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:renewToken(363)) - Token renewal 
requested for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-06 18:05:16,233 INFO  [IPC Server handler 5 on 8020] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:renewToken(363)) - Token renewal 
requested for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-06 20:23:34,656 INFO  [Thread[Thread-13,5,main]] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating 
the current master key for generating delegation tokens
2014-11-06 20:23:57,326 INFO  [IPC Server handler 5 on 8020] 
delegation.AbstractDelegationTokenSecretManager 
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating 
password for identifier: HDFS_DELEGATION_TOKEN token 3 for yarn
2014-11-06 20:29:54,975 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:57,784 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:58,792 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:59,796 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:00,004 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:00,800 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:01,804 INFO  [Socket Reader #1 for port 8020] ipc.Server 
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from 
client 10.240.95.189 threw exception 
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token 
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
{code}

> HDFS delegation token is not being refreshed properly
> -----------------------------------------------------
>
>                 Key: TWILL-106
>                 URL: https://issues.apache.org/jira/browse/TWILL-106
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.4.0-incubating
>            Reporter: Poorna Chandra
>
> We have a Twill app that runs in a secure Hadoop cluster. The app starts up 
> fine, and runs for a day. I can see in logs that say secure store was updated 
> regularly. However, after a day I see exceptions that say "token 
> (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache". 
> Exception:
> -------------
> 2014-10-23T04:12:42,101Z ERROR c.c.t.TransactionManager 
> [cdap-secure120-1000.dev.continuuity.net] [tx-snapshot] 
> TransactionManager:abortService(TransactionManager.java:594) - Aborting 
> transaction manager due to: Snapshot (timestamp 1414037562088) failed due to: 
> token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 
> 4287 for yarn) can't be found in cache
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to