[ 
https://issues.apache.org/jira/browse/TWILL-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192310#comment-14192310
 ] 

Alvin Wang commented on TWILL-106:
----------------------------------

Found that the HDFS delegation token is properly updated, according to 
UserGroupInformation.getCurrentUser().getTokens(). However, 1 day after the 
Twill app is started, we get the following error:

{code}
23:55:06.646 [TwillContainerService] ERROR examples.HelloWorld - Error
org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 7256 
for yarn) is expired
{code}

In this case, token 7256 is the latest token, which was created only 
10 minutes before this error message was logged.

--

Also saw an error message in the app master log mentioning that 
UserGroupInformation aborted the renew thread:

{code}
18:59:22.779 [TGT Renewer for 
yarn/[email protected]] WARN  
o.a.h.security.UserGroupInformation - Exception encountered while running the 
renewal command. Aborting renew thread. 
org.apache.hadoop.util.Shell$ExitCodeException: kinit: Ticket expired while 
renewing credentials
{code}

This is likely because "Maximum renewable life" is set to "0 days 00:00:00" 
in the KDC for this principal. In addition, "Maximum ticket life" is set to 
"1 day 00:00:00", which may be what causes the Twill app to fail after 1 day. 
As supporting evidence, I also tested a cluster with the delegation token 
expiration settings in hdfs-site.xml at the <10 min level, yet the Twill apps 
didn't fail within 10 minutes.
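
To make the reasoning above concrete, here is a rough sketch of how the two 
KDC settings bound the lifetime of a Kerberos ticket. It is a simplified model 
and not Hadoop or Twill code: the function name last_usable_time and the issue 
time are made up for illustration, and it assumes the ticket is always renewed 
in time and never re-acquired from a keytab.

```python
from datetime import datetime, timedelta


def last_usable_time(issued_at, max_ticket_life, max_renewable_life):
    """Latest instant at which a ticket obtained at `issued_at` can still be
    valid, assuming timely renewals and no fresh login (simplified model)."""
    # Without any renewal the ticket expires after the maximum ticket life.
    expires_at = issued_at + max_ticket_life
    # Renewals can extend the ticket, but never past the renewable life.
    renew_till = issued_at + max_renewable_life
    # A renewable life no longer than the ticket life leaves nothing to renew,
    # so the ticket simply ends at its original expiry.
    return max(expires_at, renew_till)


issued = datetime(2014, 10, 30, 19, 0, 0)  # hypothetical issue time
one_day = timedelta(days=1)

# Settings found in the KDC: ticket life 1 day, renewable life 0.
print(last_usable_time(issued, one_day, timedelta(0)))

# Changed setting: renewable life raised to 7 days.
print(last_usable_time(issued, one_day, timedelta(days=7)))
```

Under these assumptions the first case ends exactly one day after the ticket 
is issued, which would match the 1-day failure, and the second case ends seven 
days after issue.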

I have set the maximum renewable life to 7 days and will see whether the 
Twill app still fails after 1 day.

> HDFS delegation token is not being refreshed properly
> -----------------------------------------------------
>
>                 Key: TWILL-106
>                 URL: https://issues.apache.org/jira/browse/TWILL-106
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.4.0-incubating
>            Reporter: Poorna Chandra
>
> We have a Twill app that runs in a secure Hadoop cluster. The app starts up 
> fine and runs for a day. The logs say that the secure store was updated 
> regularly. However, after a day I see exceptions that say "token 
> (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache". 
> Exception:
> -------------
> 2014-10-23T04:12:42,101Z ERROR c.c.t.TransactionManager 
> [cdap-secure120-1000.dev.continuuity.net] [tx-snapshot] 
> TransactionManager:abortService(TransactionManager.java:594) - Aborting 
> transaction manager due to: Snapshot (timestamp 1414037562088) failed due to: 
> token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 
> 4287 for yarn) can't be found in cache
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
