[ 
https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031461#comment-14031461
 ] 

Yongjun Zhang commented on HDFS-6475:
-------------------------------------

Hi ATM,

Thanks a lot for the review! Sorry I didn't make it clear earlier. The change 
in ExceptionHandler does happen on the server side. Basically, the 
ExceptionHandler class processes the original exception thrown on the server 
side and passes a possibly revised exception to the client. The original 
exception thrown on the server side is a SecurityException (from the 
UserProvider class), whose cause is InvalidToken, which in turn has 
StandbyException as its cause. The ExceptionHandler processes this chain and 
passes the StandbyException to the client.
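
To illustrate the unwrapping described above, here is a minimal, self-contained sketch. It is not the code from the patch: the class CauseChainSketch, the method unwrap, and the nested InvalidToken and StandbyException classes are hypothetical stand-ins so that the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;

final class CauseChainSketch {
    // Hypothetical stand-in for org.apache.hadoop.security.token.SecretManager.InvalidToken.
    static class InvalidToken extends IOException {
        InvalidToken(String msg) { super(msg); }
    }

    // Hypothetical stand-in for Hadoop's StandbyException.
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    // If e is SecurityException -> InvalidToken -> StandbyException,
    // return the innermost StandbyException; otherwise return e unchanged.
    static Throwable unwrap(Throwable e) {
        if (e instanceof SecurityException) {
            Throwable cause = e.getCause();
            if (cause instanceof InvalidToken
                    && cause.getCause() instanceof StandbyException) {
                return cause.getCause();
            }
        }
        return e;
    }

    public static void main(String[] args) {
        Throwable standby = new StandbyException("operation not supported in standby state");
        Throwable token = new InvalidToken("StandbyException").initCause(standby);
        Throwable thrown =
            new SecurityException("Failed to obtain user group information", token);

        // The handler would report the unwrapped exception to the client.
        System.out.println(unwrap(thrown).getClass().getSimpleName());
    }
}
```

With this kind of unwrapping, the client sees a StandbyException instead of a SecurityException and can treat the failure as a signal to try another NN.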

I'm uploading a revised version to address both of your comments. Hopefully the 
revised comments make it clearer. Thanks in advance for reviewing the new 
revision!


> WebHdfs clients fail without retry because incorrect handling of 
> StandbyException
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, 
> HDFS-6475.003.patch, HDFS-6475.003.patch
>
>
> With WebHdfs clients connected to an HA HDFS service, the delegation token is 
> previously initialized with the active NN.
> When a client issues a request, the NNs it can contact are stored in a map 
> returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts 
> the NNs in that order, so the first one it runs into is likely the StandbyNN. 
> If the StandbyNN doesn't have the updated client credential, it throws a 
> SecurityException that wraps a StandbyException.
> The client is expected to retry with another NN, but due to the insufficient 
> handling of the SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> StandbyException, javaClassName=java.lang.SecurityException, 
> exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to 
> obtain user group information: 
> org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at 
> org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
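
The retry behavior the description expects from the client can be sketched roughly as follows. This is only an illustration under stated assumptions, not how WebHdfsFileSystem is implemented: FailoverSketch, callWithFailover, and the unchecked StandbyException stand-in are hypothetical names, and the NN addresses are plain strings.

```java
import java.util.List;
import java.util.function.Function;

final class FailoverSketch {
    // Hypothetical unchecked stand-in for Hadoop's StandbyException.
    static class StandbyException extends RuntimeException {
        StandbyException(String msg) { super(msg); }
    }

    // Try each NN address in order; move on to the next one only when the
    // current NN reports that it is in standby state.
    static <T> T callWithFailover(List<String> nameNodes, Function<String, T> request) {
        StandbyException last = null;
        for (String nn : nameNodes) {
            try {
                return request.apply(nn);
            } catch (StandbyException e) {
                last = e; // remember the failure and try the next NN
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no NameNode addresses given");
        }
        throw last; // every NN reported standby
    }

    public static void main(String[] args) {
        List<String> nameNodes = List.of("nn1.example.com", "nn2.example.com");
        String result = callWithFailover(nameNodes, nn -> {
            if (nn.startsWith("nn1")) {
                throw new StandbyException(nn + " is in standby state");
            }
            return "status from " + nn;
        });
        System.out.println(result);
    }
}
```

The sketch only works if the client can recognize the standby condition, which is why a SecurityException that hides the StandbyException defeats the retry.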



--
This message was sent by Atlassian JIRA
(v6.2#6252)
