[
https://issues.apache.org/jira/browse/TWILL-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alvin Wang updated TWILL-106:
-----------------------------
Comment: was deleted
(was: Found that the HDFS delegation token is updated properly, according to
UserGroupInformation.getCurrentUser().getTokens(). However, 1 day after the
Twill app is started, we get the following error:
{code}
23:55:06.646 [TwillContainerService] ERROR examples.HelloWorld - Error
org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 7256
for yarn) is expired
{code}
In this case, token 7256 is the latest token, which was created only
10 minutes before this error message was logged.
--
Also saw an error message in the app master log mentioning that
UserGroupInformation aborted the renew thread:
{code}
242 18:59:22.779 [TGT Renewer for
yarn/[email protected]] WARN
o.a.h.security.UserGroupInformation - Exception encountered while running the
renewal command. Aborting renew thread.
org.apache.hadoop.util.Shell$ExitCodeException: kinit: Ticket expired while
renewing credentials
{code}
This is likely because "Maximum renewable life" is set to "0 days 00:00:00" in
the KDC for this principal. Also, "Maximum ticket life" is set to
"1 day 00:00:00", which may be what causes the Twill app to fail after 1 day:
I also tested a cluster with delegation token expiration settings at the
<10 min level in hdfs-site.xml, yet the Twill apps didn't fail within 10
minutes.
I have set the maximum renewable life to 7 days and will see whether the Twill
app still fails after 1 day.)
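The reasoning in the comment above can be made concrete with a small sketch.
It is a simplified model, not Hadoop or Twill code: the class and method names
are hypothetical, and it assumes that a ticket is renewed whenever possible,
that renewal can extend it only up to the maximum renewable life, and that no
fresh login from a keytab happens in between.

```java
import java.time.Duration;

/** Hypothetical helper for illustration; not part of Hadoop or Twill. */
final class TicketLifetimeSketch {

    /**
     * Longest time a Kerberos ticket can stay usable after the initial login,
     * under the simplified model described above. If the renewable life is
     * not longer than the ticket life, renewal cannot extend the ticket, so
     * it simply expires after the ticket life.
     */
    static Duration maxUsable(Duration ticketLife, Duration renewableLife) {
        return ticketLife.compareTo(renewableLife) >= 0 ? ticketLife : renewableLife;
    }

    public static void main(String[] args) {
        // Settings reported in the comment: ticket life 1 day, renewable life 0.
        Duration reported = maxUsable(Duration.ofDays(1), Duration.ZERO);
        // Settings after the change described in the comment: renewable life 7 days.
        Duration changed = maxUsable(Duration.ofDays(1), Duration.ofDays(7));

        System.out.println("Reported settings, usable hours: " + reported.toHours());
        System.out.println("Changed settings, usable hours: " + changed.toHours());
    }
}
```

Under this model the reported settings leave the ticket usable for only 1 day,
which matches the observed failure time. Whether raising the renewable life
actually removes the failure is what the test described above would show.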
> HDFS delegation token is not being refreshed properly
> -----------------------------------------------------
>
> Key: TWILL-106
> URL: https://issues.apache.org/jira/browse/TWILL-106
> Project: Apache Twill
> Issue Type: Bug
> Components: core
> Affects Versions: 0.4.0-incubating
> Reporter: Poorna Chandra
>
> We have a Twill app that runs in a secure Hadoop cluster. The app starts up
> fine and runs for a day. The logs show that the secure store was updated
> regularly. However, after a day I see exceptions that say "token
> (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache".
> Exception:
> -------------
> 2014-10-23T04:12:42,101Z ERROR c.c.t.TransactionManager
> [cdap-secure120-1000.dev.continuuity.net] [tx-snapshot]
> TransactionManager:abortService(TransactionManager.java:594) - Aborting
> transaction manager due to: Snapshot (timestamp 1414037562088) failed due to:
> token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token
> 4287 for yarn) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> ...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)