[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031415#comment-14031415 ]
Aaron T. Myers commented on HDFS-6475:
--------------------------------------

The latest patch looks pretty good to me. Two very small comments:

# I think that the method comment for {{testDelegationTokenStandbyNNAppearFirst}} is a bit misleading. Seems like it's implying that the Standby NN is now throwing a different exception, when in fact I believe that the exception that's thrown is not changed by this patch, but rather that the client-side handling of the unwrapping of the exception is changed.
# There should be no need to restore the state of the standby/active NNs at the end of the test, since the cluster is always shut down at the end of every test in this class.

+1 from me once the above are addressed.

[~daryn] and [~jingzhao] - does the latest patch look OK to you?

> WebHdfs clients fail without retry because incorrect handling of StandbyException
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-6475
>                 URL: https://issues.apache.org/jira/browse/HDFS-6475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, webhdfs
>    Affects Versions: 2.4.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch
>
>
> With WebHdfs clients connected to a HA HDFS service, the delegation token is previously initialized with the active NN.
> When clients try to issue a request, the NNs they can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts the NNs in the order of that map, so the first one it runs into is likely the StandbyNN.
> If the StandbyNN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException.
> The client is expected to retry with another NN, but due to the insufficient handling of the SecurityException mentioned above, it fails.
> Example message:
> {code}
> {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}}
> org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException
>         at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685)
>         at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696)
>         at kclient1.kclient$1.run(kclient.java:64)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:356)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>         at kclient1.kclient.main(kclient.java:58)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
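
For readers following the thread, the client-side behaviour under discussion (recognising that a SecurityException reported by a NameNode really carries a StandbyException, and then trying the next NameNode) might be sketched roughly as below. This is not the code from any of the attached patches. The class {{StandbyFailoverSketch}}, the {{RemoteError}} type, and the string-matching check are all hypothetical stand-ins that use no Hadoop classes; the real client works with org.apache.hadoop.ipc.RemoteException and may detect the condition differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class StandbyFailoverSketch {

    /** Hypothetical stand-in for the javaClassName/message fields of the JSON RemoteException. */
    static final class RemoteError extends RuntimeException {
        final String javaClassName;

        RemoteError(String javaClassName, String message) {
            super(message);
            this.javaClassName = javaClassName;
        }
    }

    /**
     * Assumption: a wrapped StandbyException is recognised by inspecting the
     * message text of a SecurityException, as in the example message above.
     */
    static boolean indicatesStandby(RemoteError e) {
        String msg = e.getMessage();
        return "java.lang.SecurityException".equals(e.javaClassName)
                && msg != null
                && msg.contains("InvalidToken")
                && msg.contains("StandbyException");
    }

    /** Tries each NameNode in order; moves on only when the error indicates a standby NN. */
    static <T> T callWithFailover(List<String> nnAddresses, Function<String, T> request) {
        RemoteError last = null;
        for (String nn : nnAddresses) {
            try {
                return request.apply(nn);
            } catch (RemoteError e) {
                if (!indicatesStandby(e)) {
                    throw e; // unrelated failure: do not mask it by retrying
                }
                last = e; // standby NN: remember the error and try the next address
            }
        }
        if (last != null) {
            throw last; // every NN reported standby
        }
        throw new IllegalArgumentException("no NameNode addresses given");
    }

    public static void main(String[] args) {
        RemoteError standby = new RemoteError(
                "java.lang.SecurityException",
                "Failed to obtain user group information: "
                        + "org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException");

        // "nn1" plays the standby NN that happens to come first; "nn2" is the active NN.
        String result = callWithFailover(Arrays.asList("nn1", "nn2"), nn -> {
            if (nn.equals("nn1")) {
                throw standby;
            }
            return "status from " + nn;
        });
        System.out.println(result); // prints "status from nn2"
    }
}
```

The sketch rethrows any error that does not look like a wrapped StandbyException, so a genuine authentication failure would still surface to the caller instead of being retried.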