[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043677#comment-14043677 ] Hudson commented on HDFS-6475: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1812 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1812/]) HDFS-6475. WebHdfs clients fail without retry because incorrect handling of StandbyException. Contributed by Yongjun Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605217) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.5.0 > > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043493#comment-14043493 ] Hudson commented on HDFS-6475: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1785 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1785/]) HDFS-6475. WebHdfs clients fail without retry because incorrect handling of StandbyException. Contributed by Yongjun Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605217) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.5.0 > > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043325#comment-14043325 ] Hudson commented on HDFS-6475: -- FAILURE: Integrated in Hadoop-Yarn-trunk #594 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/594/]) HDFS-6475. WebHdfs clients fail without retry because incorrect handling of StandbyException. Contributed by Yongjun Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605217) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.5.0 > > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043082#comment-14043082 ] Yongjun Zhang commented on HDFS-6475: - Thanks a lot [~atm]! Many thanks to [~daryn] and [~jingzhao] for the review and comments. I will follow up with the getTrueCause issue in HDFS-6588. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.5.0 > > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042829#comment-14042829 ] Hudson commented on HDFS-6475: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5774 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5774/]) HDFS-6475. WebHdfs clients fail without retry because incorrect handling of StandbyException. Contributed by Yongjun Zhang. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1605217) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/ExceptionHandler.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDelegationTokensWithHA.java > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Fix For: 2.5.0 > > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042801#comment-14042801 ] Aaron T. Myers commented on HDFS-6475: -- +1, the latest patch looks good to me. I'm going to commit this momentarily. We can do the follow-up to clean up the code as Daryn suggests in the other JIRA that Yongjun has filed. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041771#comment-14041771 ] Aaron T. Myers commented on HDFS-6475: -- Hey Daryn, given Yongjun's investigation, seems like this is a bit more complex than originally suggested. Unless you object, I'm going to go ahead and check this in tomorrow so we can get this fixed and move on. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041171#comment-14041171 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], Would like to check with you, you mentioned {quote} If it turns out to be a lot more complicated, then perhaps a followup jira is ok {quote} Based on the information we have so far, the work involved is to remove getTrueCause, and replace the retrievePassword with retriableRetrievePassword, changing the interface spec of relevant methods because retriableRetrievePassword throws more exceptions, removing getTrueCause method, fixing the test failures reported in HDFS-6588. I hope you'd agree that it's appropriate to dedicate HDFS-6588 for the above mentioned work, and use the lastest patch I posted for HDFS-6475 to handle the SecurityException that UserProvider throws. Would you please comment again? Thanks. BTW, thanks [~jingzhao] for clarifying things in HDFS-6588 (sorry I had a typo in last update as 6589). > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040984#comment-14040984 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], Thanks a lot for the comments. Calling {{DelegationTokenSecretManager#retrievePassword}} is the sole place I have seen. And, the following method in AbstractDelegationTokenSecretManager is where retrievePassword is called, {code} public synchronized void verifyToken(TokenIdent identifier, byte[] password) throws InvalidToken { byte[] storedPassword = retrievePassword(identifier); if (!Arrays.equals(password, storedPassword)) { throw new InvalidToken("token (" + identifier + ") is invalid, password doesn't match"); } } {code} I wonder whether we can just replace the above retrievePassword call with retriableRetrievePassword here. I will give it a try. The failed tests are reported in HDFS-6589, related to HDFS-5322. Hi [~jingzhao], I put a question in HDFS-6589. I wonder if the failed tests are designed to cover real user scenarios? Thanks for clarifying. Best regards. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040790#comment-14040790 ] Daryn Sharp commented on HDFS-6475: --- bq. Your earlier suggestion indicated that we should use SecretManager#retriableRetrievePassword instead of SecretManager#retrievePassword, does that mean client code has to be modified? If I understand the question: The methods are only used server-side so no client-side changes should be required, so no incompatibility concerns. Did you happen to trace how/where the {{StandbyException}} is wrapped in an {{InvalidToken}}? It looks like {{DelegationTokenSecretManager#retrievePassword}} is the only place it occurs, but {{DelegationTokenSecretManager#retriableRetrievePassword}} does not wrap exceptions in {{InvalidToken}}. Is this maybe just a test case issue? Which testcase is failing? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040344#comment-14040344 ] Yongjun Zhang commented on HDFS-6475: - Filed HDFS-6589 for the flaky test TestDistributedFileSystem.testAllWithNoXmlDefaults. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040329#comment-14040329 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651893/HDFS-6475.009.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDistributedFileSystem {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7208//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7208//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This me
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040313#comment-14040313 ] Hadoop QA commented on HDFS-6475: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651890/HDFS-6475.008.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7207//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7207//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch, HDFS-6475.009.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040264#comment-14040264 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], [~jingzhao] and [~atm], Many thanks to you guys for earlier review and comments. I just uploaded a new revision (008) to address the comments and testing errors. In summary, - Per Daryn's suggestion, I attempted to remove getTrueCause() method from Server.java as a whole, ran into test failures. After spending quite some time to look into, I personally really removing the existing getTrueCauseMethod really deserves a new JIRA, I filed HDFS-6588 with details and questions. I hope you'd agree based on the information I provided there, but I'm certainly open for further discussion. - The new patch I just uploaded for HDFS-6475 is limited to handle the case reported in this JIRA. It is only a few lines in ExceptionHandler.java, plus the testcase I added. It no longer calls getTrueCause() method defined in Server.java. Thanks a lot for follow-up. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch, > HDFS-6475.008.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039749#comment-14039749 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651811/HDFS-6475.007.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ipc.TestSaslRPC org.apache.hadoop.hdfs.web.TestWebHdfsTokens {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7198//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7198//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039718#comment-14039718 ] Yongjun Zhang commented on HDFS-6475: - Hi, I uploaded a new patch per [~daryn]'s suggestion. On top of what Daryn suggested, and because of the exceptions I described in previous update (with stack information), I added the logic for InvalidToken in ExceptionHandler: {code} if (e instanceof SecurityException) { e = toCause(e); } if (e instanceof InvalidToken) { e = toCause(e); } {code} This logic is essentially what I wanted to share the original getTrueCause method for. Hi Daryn, would you please help review again? BTW, refer to your comment: {quote} In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a "InvalidToken wrapping another exception" anymore. {quote} I did what you suggested, but I'm still getting InvalidToken exception (see the stack described above). So it seems that the exception that saslProcess tries to handle comes from different source than what I'm running into. Thanks a lot. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch, HDFS-6475.007.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039675#comment-14039675 ] Yongjun Zhang commented on HDFS-6475: - Hello Guys, Thank you all for the review and discussion. As a follow-up, the first thing I did was to check that retriableRetrievePassword does give StandbyException. Then I tried what [~daryn] suggested how we could fix this jira, and still run into the same symptom (client side),: {code} org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:348) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:108) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:580) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:546) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient1$2.run(kclient1.java:80) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient1.main(kclient1.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:108) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) {code} I traced the server side and found the following: UserProvider.getValue caught {{org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException}} with the following stack {code} UserProvider.getValue(HttpContext) line: 56 UserProvider.getValue(HttpContext) line: 41 InjectableValuesProvider.getInjectableValues(HttpContext) line: 46 AbstractResourceMethodDispatchProvider$ResponseOutInvoker(AbstractResourceMethodDispatchProvider$EntityParamInInvoker).getParams(HttpContext) line: 153 AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(Object, HttpContext) line: 203 AbstractResourceMethodDispatchProvider$ResponseOutInvoker(ResourceJavaMethodDispatcher).dispatch(Object, HttpContext) line: 75 HttpMethodRule.accept(CharSequence, Object, UriRuleContext) line: 288 ResourceClassRule.accept(CharSequence, Object, UriRuleContext) line: 108 RightHandPathRule.accept(CharSequence, Object, UriRuleContext) line: 147 RootResourceClassesRule.accept(CharSequence, Object, UriRuleContext) line: 84 WebApplicationImpl._handleRequest(WebApplicationContext, ContainerRequest) line: 1469 WebApplicationImpl._handleRequest(WebApplicationContext, ContainerRequest, ContainerResponse) line: 1400 WebApplicationImpl.handleRequest(ContainerRequest, ContainerResponse) line: 1349 WebApplicationImpl.handleRequest(ContainerRequest, ContainerResponseWriter) line: 1339 ServletContainer$InternalWebComponent(WebComponent).service(URI, URI, HttpServletRequest, HttpServletResponse) line: 416 ServletContainer.service(URI, URI, HttpServletRequest, HttpServletResponse) line: 537 ServletContainer.service(HttpServletRequest, HttpServletResponse) line: 699 ServletContainer(HttpServlet).service(ServletRequest, ServletResponse) line:
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039425#comment-14039425 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], While I'm working on a revised version, I have a question. Your earlier suggestion indicated that we should use {{SecretManager#retriableRetrievePassword}} instead of {{SecretManager#retrievePassword}}, does that mean client code has to be modified? If so, is it a compatibility issue? thanks a lot. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039361#comment-14039361 ] Yongjun Zhang commented on HDFS-6475: - Many thanks [~daryn], I will work on a new revision per your suggestion. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039330#comment-14039330 ] Daryn Sharp commented on HDFS-6475: --- I'm +0 and defer to atm if the fix is harder than what I just suggested. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039327#comment-14039327 ] Daryn Sharp commented on HDFS-6475: --- What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: # Delete {{getTrueCause}} entirely instead of moving it elsewhere # In {{saslProcess}}, just throw the exception instead of running it through {{getTrueCause}} since it's not a "InvalidToken wrapping another exception" anymore. # Keep your 3-line change to unwrap {{SecurityException}} in {{toResponse}} If it turns out to be a lot more complicated, then perhaps a followup jira is ok, but since what should be the cleanest fix is 4 lines of change it would be nice to try that first. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039156#comment-14039156 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], Thanks a lot for the review and the in-depth comments. Since the {{getTrueCause}} method is already there, and the patch I posted reused it to solve the problem. What you suggested would be an improvement of the solution, would you please consider that we commit this patch first, and file a follow-up jira to improve the solution? Really appreciate your help. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039128#comment-14039128 ] Daryn Sharp commented on HDFS-6475: --- I don't think {{getTrueCause}} is necessary anymore and can be removed. It was born when {{SaslDigestCallbackHandler}} used to call {{SecretManager#retrievePassword}} which could only throw {{InvalidToken}}. A standby/retriable/etc exception had to be tunneled out via an {{InvalidToken}} cause. Now there's apparently a {{SecretManager#retriableRetrievePassword}} that can throw those exceptions directly. The problem comes down to an abuse of the injectable {{UserProvider}} to do authentication instead of via an authentication filter... Perhaps {{SecurityException}} is the only exception for which the cause needs to be extracted. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039049#comment-14039049 ] Daryn Sharp commented on HDFS-6475: --- Taking a look now. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038040#comment-14038040 ] Aaron T. Myers commented on HDFS-6475: -- +1, the latest patch looks good to me. I'll commit this tomorrow unless anyone else has comments in the meantime. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035626#comment-14035626 ] Hadoop QA commented on HDFS-6475: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12651015/HDFS-6475.006.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7160//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7160//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch, HDFS-6475.006.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033990#comment-14033990 ] Daryn Sharp commented on HDFS-6475: --- Is a static method in InvalidToken really the right place to check and handle other non-InvalidToken exceptions? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033406#comment-14033406 ] Jing Zhao commented on HDFS-6475: - Yeah, +1 > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033372#comment-14033372 ] Aaron T. Myers commented on HDFS-6475: -- Latest patch looks pretty good to me. [~jingzhao]? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033099#comment-14033099 ] Yongjun Zhang commented on HDFS-6475: - BTW, the testLargeFile failure is intermittent and I updated HDFS_6543 about it. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032627#comment-14032627 ] Yongjun Zhang commented on HDFS-6475: - Running the two failed test locally, I'm seeing one passed and the other failed with today's trunk without my change. Filed HDFS-6543. Specifically: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer.testEncryptedBalancer2 passed, and org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed with and without the fix, I filed HDFS-6543. Thanks. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032178#comment-14032178 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650507/HDFS-6475.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7126//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7126//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.j
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032096#comment-14032096 ] Yongjun Zhang commented on HDFS-6475: - Hi [~jingzhao], Thanks a lot for the review and the suggestion! I agree InvalidToken is indeed a good place to hold this util method. I made this change and uploaded patch 005. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, > HDFS-6475.005.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032067#comment-14032067 ] Jing Zhao commented on HDFS-6475: - Thanks for the fix, [~yzhangal]! The patch looks good to me. My only concern is that if we can put the getTrueCause method into some Util class so that we do not need to call a method defined in ipc.Server from a class only used by webhdfs. I guess we can even move it into the InvalidToken class? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031507#comment-14031507 ] Hadoop QA commented on HDFS-6475: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650438/HDFS-6475.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7121//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7121//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031461#comment-14031461 ] Yongjun Zhang commented on HDFS-6475: - Hi ATM, Thanks a lot for the review! Sorry I didn't make it clear earlier. The change in ExceptionHandler does happen on the server side. Basically the ExceptionHandler class processes the original exception thrown at server side and pass a possibly revised exception to client. The original exception thrown at the server side is SecurityException (from UserProvider class) which has cause InvalidToken which in turn has cause StandbyException, The ExceptionHandler processes and and pass StandbyException to client. I'm uploading a revised version to address both of your comments. Hopefully the revised comments made it more clear. Thanks in advance for reviewing the new revision! > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031415#comment-14031415 ] Aaron T. Myers commented on HDFS-6475: -- The latest patch looks pretty good to me. Two very small comments: # I think that the method comment for {{testDelegationTokenStandbyNNAppearFirst}} is a bit misleading. Seems like it's implying that the Standby NN is now throwing a different exception, when in fact I believe that the exception that's thrown is not changed by this patch, but rather that the client-side handling of the unwrapping of the exception is changed. # There should be no need to restore the state of the standby/active NNs at the end of the test, since the cluster is always shut down at the end of every test in this class. +1 from me once the above are addressed. [~daryn] and [~jingzhao] - does the latest patch look OK to you? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028154#comment-14028154 ] Yongjun Zhang commented on HDFS-6475: - Filed HDFS-6518 (TestCacheDirectives#testExceedsCapacity fails intermittently). > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028087#comment-14028087 ] Hadoop QA commented on HDFS-6475: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649805/HDFS-6475.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7082//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch, HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027605#comment-14027605 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649756/HDFS-6475.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7080//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027515#comment-14027515 ] Yongjun Zhang commented on HDFS-6475: - Hi Guys, I managed to create a testcase and uploaded patch version 003. Would you please help review it again? Hi [~daryn], I found that ExceptionHandler already did the following import from ipc area: {code} import org.apache.hadoop.ipc.RemoteException; {code} which means the webhdfs servlet already has dependency on ipc package. So I didn't duplicate the code. would you please let me know whether that make sense? Thanks a lot. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, > HDFS-6475.003.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026642#comment-14026642 ] Yongjun Zhang commented on HDFS-6475: - HI [~daryn], Thanks a lot for the review and comments. I attempted to let UserProvider class throw different exception earlier, and found it inherits from classes of jersey package, which we won't be able to change the interface spec. I could just make a duplicated copy of the IPC server code to the ExceptionHandler area so they are not tied to each other. I'm open for that. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026440#comment-14026440 ] Daryn Sharp commented on HDFS-6475: --- I had to hack around this problem in HDFS-6222... I'm a bit uneasy about tying the webhdfs servlet to the IPC server. I'd rather see the logic contained within webhdfs. I think {{UserProvider}} should throw a different exception that its {{ExceptionHandler}} specially knows to unwrap. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026288#comment-14026288 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12649538/HDFS-6475.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7071//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7071//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026198#comment-14026198 ] Yongjun Zhang commented on HDFS-6475: - Hello [~jingzhao], Thanks for your earlier suggestion. Sorry for a bit delay to take care of it. I just uploaded a patch. I was able to verify it works with a real cluster where I saw the problem and to see the patch fixed the issue. However, I was not successful creating a testcase for it. Since this new patch reused the method getTrueCause() in Server.java, the remaining thing to be checked by a unit test would be the change I made in ExceptionHandler. The change in ExceptionHandler is, for ContainerException and SecurityException, call getTrueCause() to find the real exception based on the "cause" chain of the ContainerException/SecurityException. The original code in ExceptionHandler only does one level of cause-seeking for ContainerException. Would you please help take a look at the patch to see if this patch can be committed without a unit testcase? or if you have any other advice? Thanks a lot. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016289#comment-14016289 ] Yongjun Zhang commented on HDFS-6475: - Hello Jing, many thanks for the very helpful info, I'm behind with other schedule, will look at this a bit later. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016068#comment-14016068 ] Jing Zhao commented on HDFS-6475: - Hi [~yzhangal], thanks for the response and figuring out the issue. Maybe we can put the true cause logic into ExceptionHandler#toResponse? > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016002#comment-14016002 ] Yongjun Zhang commented on HDFS-6475: - Hi Jing, Thanks a lot for the info! I took a quick look, the issue is similar but seems there is an important difference here. That is, In HDFS-5322 fix, the method (and all caller hierarchy) {code} private void saslProcess(RpcSaslProto saslMessage) throws WrappedRpcServerException, IOException, InterruptedException { {code} is allowed to throw IOException, so your HDFS-5322 solution work well. For HDFS-6475, the involved class UserProvider is not allowed to throw IOException. In fact, UserProvider is only throwing unchecked exception, e.g., SecurityException here to include the StandbyException info in the message and cause: {code} /** Inject user information to http operations. */ @Provider public class UserProvider extends AbstractHttpContextInjectable implements InjectableProvider { @Context HttpServletRequest request; @Context ServletContext servletcontext; .. @Override public UserGroupInformation getValue(final HttpContext context) { final Configuration conf = (Configuration) servletcontext .getAttribute(JspHelper.CURRENT_CONF); try { return JspHelper.getUGI(servletcontext, request, conf, AuthenticationMethod.KERBEROS, false); } catch (IOException e) { throw new SecurityException( "Failed to obtain user group information: " + e, e); } } {code} This means we can't throw StandbyException (inherited from IOException) from here. So my uploaded patch tried to parse the message string of the SecurityException thrown here. UserProvider class inherits from classes of jersey package, which we won't be able to change the interface spec. We might be able to change the client/server interface: we detect this kind of case at the interface, then instead of throwing the RemoteException that wraps SecurityException, we throw RemoteException that wraps the cause of StandbyException. I'n not sure whether we should go this route though. Would you please comment again? thanks. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclien
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015634#comment-14015634 ] Jing Zhao commented on HDFS-6475: - Looks like the similar issue on the RPC side has been fixed in HDFS-5322. Maybe we can follow the same pattern to fix this (see Server#getTrueCause). > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015186#comment-14015186 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647850/HDFS-6475.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7026//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7026//console This message is automatically generated. > WebHdfs clients fail without retry because incorrect handling of > StandbyException > - > > Key: HDFS-6475 > URL: https://issues.apache.org/jira/browse/HDFS-6475 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, webhdfs >Affects Versions: 2.4.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > Attachments: HDFS-6475.001.patch > > > With WebHdfs clients connected to a HA HDFS service, the delegation token is > previously initialized with the active NN. > When clients try to issue request, the NN it contacts is stored in a map > returned by DFSUtil.getNNServiceRpcAddresses(conf). And the client contact > the NN based on the order, so likely the first one it runs into is StandbyNN. > If the StandbyNN doesn't have the updated client crediential, it will throw a > s SecurityException that wraps StandbyException. > The client is expected to retry another NN, but due to the insufficient > handling of SecurityException mentioned above, it failed. > Example message: > {code} > {RemoteException={message=Failed to obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: > StandbyException, javaCl > assName=java.lang.SecurityException, exception=SecurityException}} > org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to > obtain user group information: > org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException > at > org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) > at kclient1.kclient$1.run(kclient.java:64) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) > at kclient1.kclient.main(kclient.java:58) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.main(RunJar.java:212) > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)