[ 
https://issues.apache.org/jira/browse/FALCON-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997368#comment-14997368
 ] 

Sowmya Ramesh commented on FALCON-1595:
---------------------------------------

[~bvellanki]: What is the root cause of this issue? Why doesn't the relogin done
in AuthenticationInitializationService handle this case? I am trying to
understand whether this is a one-off case where the ticket is just about to expire
and we hand out a FileSystem just before the relogin. In that case, similar to
checkTGTAndReloginFromKeytab, shouldn't we relogin when the ticket is close to
expiry rather than waiting until it has expired, as the current implementation does?
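To make the question above concrete, here is a minimal, hypothetical Java sketch of relogging when the ticket is *close to* expiry rather than after it has expired. `ProactiveRelogin`, `KerberosLogin`, and `withFreshTicket` are illustrative names, not Falcon or Hadoop APIs, and the 0.80 renew window is an assumption loosely modeled on the renew-window idea in Hadoop's `UserGroupInformation`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Falcon or Hadoop code: relogin when the TGT is
 * close to expiry instead of waiting for a GSSException after it expires.
 */
public final class ProactiveRelogin {

    /** Fraction of the ticket lifetime after which we relogin (assumed value). */
    static final double RENEW_WINDOW = 0.80;

    /** Hypothetical view of the current Kerberos login. */
    interface KerberosLogin {
        long ticketStartMillis();
        long ticketEndMillis();
        void relogin(); // a real implementation would relogin from the keytab
    }

    private ProactiveRelogin() {}

    /** Epoch millis at which the ticket should be refreshed. */
    static long refreshTime(long startMillis, long endMillis) {
        return startMillis + (long) ((endMillis - startMillis) * RENEW_WINDOW);
    }

    /** True once the refresh point has passed, i.e. the TGT is close to (or past) expiry. */
    static boolean shouldRelogin(long startMillis, long endMillis, long nowMillis) {
        return nowMillis >= refreshTime(startMillis, endMillis);
    }

    /**
     * Hypothetical guard: check the ticket before handing out a client
     * (for example a FileSystem), so the client is not created with a
     * ticket that is about to expire.
     */
    static <T> T withFreshTicket(KerberosLogin login, LongSupplier clock, Supplier<T> factory) {
        if (shouldRelogin(login.ticketStartMillis(), login.ticketEndMillis(), clock.getAsLong())) {
            login.relogin();
        }
        return factory.get();
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1); // one-day ticket, as in the report
        // Refresh point is 0.8 * 24h = 19.2h after the ticket was issued.
        System.out.println(shouldRelogin(0L, day, TimeUnit.HOURS.toMillis(18))); // false
        System.out.println(shouldRelogin(0L, day, TimeUnit.HOURS.toMillis(20))); // true

        int[] relogins = {0};
        KerberosLogin fake = new KerberosLogin() {
            public long ticketStartMillis() { return 0L; }
            public long ticketEndMillis() { return day; }
            public void relogin() { relogins[0]++; }
        };
        String fs = withFreshTicket(fake, () -> TimeUnit.HOURS.toMillis(20), () -> "fs-handle");
        System.out.println(fs + " relogins=" + relogins[0]); // fs-handle relogins=1
    }
}
```

In a real server, `relogin()` would presumably delegate to the login user's keytab relogin and the factory would be whatever creates the `FileSystem`; both are left abstract here so the check-before-use idea can be seen on its own.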

> Falcon server loses ability to communicate with HDFS over time
> --------------------------------------------------------------
>
>                 Key: FALCON-1595
>                 URL: https://issues.apache.org/jira/browse/FALCON-1595
>             Project: Falcon
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Balu Vellanki
>            Assignee: Balu Vellanki
>         Attachments: FALCON-1595.patch
>
>
> In a Kerberos-secured cluster where the Kerberos ticket validity is one day,
> the Falcon server eventually lost the ability to read from and write to HDFS.
> In the logs we saw typical Kerberos-related errors such as "GSSException: No
> valid credentials provided (Mechanism level: Failed to find any Kerberos
> tgt)".
> {code}
> 2015-10-28 00:04:59,517 INFO  - [LaterunHandler:] ~ Creating FS impersonating user testUser (HadoopClientFactory:197)
> 2015-10-28 00:04:59,519 WARN  - [LaterunHandler:] ~ Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] (Client:680)
> 2015-10-28 00:04:59,520 WARN  - [LaterunHandler:] ~ Late Re-run failed for instance sample-process:2015-10-28T03:58Z after 420000 (AbstractRerunConsumer:84)
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "sample.host.com/127.0.0.1"; destination host is: "sample.host.com":8020; 
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1431)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy22.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>       at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
>       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
>       at org.apache.falcon.rerun.handler.LateRerunConsumer.detectLate(LateRerunConsumer.java:108)
>       at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:67)
>       at org.apache.falcon.rerun.handler.LateRerunConsumer.handleRerun(LateRerunConsumer.java:47)
>       at org.apache.falcon.rerun.handler.AbstractRerunConsumer.run(AbstractRerunConsumer.java:73)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>       at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
>       at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
>       at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
