[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yongjun Zhang updated HDFS-6475: -------------------------------- Attachment: HDFS-6475.001.patch Upload a simple patch, hope I can get advice for a better approach. Thanks. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > --------------------------------------------------------------------------------- > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs > Affects Versions: 2.4.0 > Reporter: Yongjun Zhang > Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)