[jira] [Updated] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6492: -- Attachment: HDFS-6492.001.patch Patch attached. I included tests for multiset and multiremove. I didn't add internal functions or tests for create/mkdir-time xattrs; we're going to make use of this in the encryption branch, so we'll have tests soon enough. I envision something like tacking on an XAttrFeature in FSDirectory#addFile (a rough sketch follows this message). TestOEV isn't going to work since I tweaked the protobuf definition to be {{repeated}} rather than {{optional}}; it worked locally once I updated the test resources. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
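A rough sketch of the create-time shape described above. All class and method names here (XAttr, XAttrFeature, INodeFile, addFile) are simplified stand-ins for illustration, not the real HDFS internals:

{code}
import java.util.List;

public class CreateTimeXAttrSketch {
  // Stand-ins for the real INode/XAttrFeature machinery.
  static class XAttr {
    final String name; final byte[] value;
    XAttr(String name, byte[] value) { this.name = name; this.value = value; }
  }
  static class XAttrFeature {
    final List<XAttr> xattrs;
    XAttrFeature(List<XAttr> xattrs) { this.xattrs = xattrs; }
  }
  static class INodeFile {
    XAttrFeature feature;
    void addXAttrFeature(XAttrFeature f) { this.feature = f; }
  }

  // The shape being described: attach all system xattrs to the new inode
  // in one step at create/mkdir time, rather than logging separate
  // setXAttr edits afterwards.
  static void addFile(INodeFile newFile, List<XAttr> createTimeXAttrs) {
    if (createTimeXAttrs != null && !createTimeXAttrs.isEmpty()) {
      newFile.addXAttrFeature(new XAttrFeature(createTimeXAttrs));
    }
  }
}
{code}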
[jira] [Updated] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6492: -- Target Version/s: 2.5.0 (was: 3.0.0) Affects Version/s: (was: 3.0.0) 2.4.0 Status: Patch Available (was: Open) Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
Sangjin Lee created HDFS-6540: - Summary: TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Attachment: HDFS-6540.patch The regex pattern that matches the username and the group is changed to include '-', which more accurately matches valid user and group names (a sketch follows this message). TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
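A minimal sketch of the dash-tolerant character class being described; the pattern and sample names below are invented for illustration, not taken from the patch:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserGroupRegexSketch {
  // Hypothetical owner/group fields of an ls-style line: adding \- to the
  // class lets names like "build-user" match, where [a-zA-Z_0-9]+ fails.
  private static final Pattern OWNER_GROUP =
      Pattern.compile("([a-zA-Z_0-9\\-]+)\\s+([a-zA-Z_0-9\\-]+)");

  public static void main(String[] args) {
    Matcher m = OWNER_GROUP.matcher("jenkins-slave hadoop-users");
    System.out.println(m.matches());  // prints true once '-' is allowed
  }
}
{code}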
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Target Version/s: 2.5.0 Status: Patch Available (was: Open) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Status: Patch Available (was: Open) Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v1.patch Changes: 1. fix a bug in memset(hdfsFileInfo...) 2. use PRId64 instead of %ld to prevent compile warnings 3. emulate clock_gettime/sem_init/sem_destroy on macosx 4. remove -lrt on macosx in CMakeLists.txt Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032152#comment-14032152 ] Binglin Chang commented on HDFS-6539: - The failed test is not related; created HDFS-6541 to track it. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6541) TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout
Binglin Chang created HDFS-6541: --- Summary: TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout Key: HDFS-6541 URL: https://issues.apache.org/jira/browse/HDFS-6541 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang https://builds.apache.org/job/PreCommit-HDFS-Build/7124/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsWithMultipleNameNodes/testRedirect/ Error Message Read timed out Stacktrace java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:695) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:472) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:539) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:410) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:438) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:434) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.create(WebHdfsFileSystem.java:1049) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.testRedirect(TestWebHdfsWithMultipleNameNodes.java:130) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032177#comment-14032177 ] Andrew Wang commented on HDFS-6386: --- Thanks for doing the split, this was a lot easier to review :) A more thorough review:
* We need to rebase the fs-encryption branch (and this patch) on trunk. The xattr code has changed slightly; one example is where we log the edit (FSN now, not FSDir).
FSNamesystem:
* listEZ needs to return only EZs where the user has permission to know about the EZ path; otherwise we're exposing the existence of the path.
* In createEncryptionZone, we need to catch the KP exception such that it's logged in the retry cache.
* Using FSDirectory#getPathComponentsForReservedPaths doesn't look right; can you check that it's not returning null? Doing some more tests with multiple EZs would be good. I noticed your listEZ test doesn't check the size of the returned listing, which might be masking an error here.
* KeyProvider should be a single word in javadoc.
FSDirectory:
* I think the exception thrown from unprotectedSetXAttr contains the system.xxx xattr name. Maybe we should throw a fresh new exception rather than showing this to the user. We could also test for this explicitly rather than rethrowing an exception, since that's more expensive.
* Do we care about repeating IVs? I'm not a cryptographer, but a Google search turns up concerns about initialization vector reuse in stream ciphers (the birthday paradox); see the sketch after this message.
KeyAndIv:
* Needs interface annotations.
HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)
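On the IV question above, the usual mitigation is to draw each initialization vector from a cryptographically strong RNG so that accidental reuse is overwhelmingly unlikely. A minimal sketch of that idea (not the fs-encryption branch's actual code; the 16-byte length assumes an AES block):

{code}
import java.security.SecureRandom;

public class IvSketch {
  private static final SecureRandom RNG = new SecureRandom();

  // Generate a fresh random 16-byte IV per encrypted file, so two files
  // under the same key essentially never share an IV.
  static byte[] newIv() {
    byte[] iv = new byte[16];
    RNG.nextBytes(iv);
    return iv;
  }
}
{code}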
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because of incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032178#comment-14032178 ] Hadoop QA commented on HDFS-6475: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650507/HDFS-6475.005.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7126//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7126//console This message is automatically generated. WebHdfs clients fail without retry because of incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initially obtained from the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts them in order, so the first one it runs into is likely the standby NN. If the standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. The client is expected to retry another NN, but due to the insufficient handling of the SecurityException mentioned above, it fails (see the sketch after this message).
Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA
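A hypothetical sketch of the detection shape the description calls for, not the HDFS-6475 patch itself. Since the remote class is java.lang.SecurityException, class-based unwrapping alone misses the StandbyException buried in the message, so this falls back to the message text:

{code}
import java.io.IOException;

import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.StandbyException;

public class StandbyDetectSketch {
  // Returns true when the remote failure should trigger failover to the
  // other NN instead of failing the client outright.
  static boolean shouldFailover(RemoteException re) {
    IOException unwrapped = re.unwrapRemoteException(StandbyException.class);
    if (unwrapped instanceof StandbyException) {
      return true;  // the standby reported itself directly
    }
    String msg = re.getLocalizedMessage();
    return msg != null && msg.contains(StandbyException.class.getSimpleName());
  }
}
{code}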
[jira] [Commented] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032202#comment-14032202 ] Gera Shegalov commented on HDFS-6540: - This issue seems to apply only to branch-2.4. We should change the target version to 2.4.1. It's cumbersome to define a regex for usernames. For example, usernames must not start with '-' but may contain a '.'. In order to avoid dealing with this, you can paste the value of {{System.getProperty("user.name")}} for this component of the regex (a usage sketch follows this message). {code} "([d\\-])([rwx\\-]{9})\\s*(-|\\d+)\\s*" + System.getProperty("user.name") + "\\s*([a-zA-Z_0-9\\-]+)\\s*(\\d+)\\s*(\\d+)\\s*([\b/]+)" {code} TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
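A usage sketch of the suggestion above, with the string quoting restored; Pattern.quote is a defensive addition here (not in the original comment) in case the pasted username contains regex metacharacters:

{code}
import java.util.regex.Pattern;

public class LsLineRegexSketch {
  public static void main(String[] args) {
    // Splice the running test user's name into the expected-output regex.
    String regex = "([d\\-])([rwx\\-]{9})\\s*(-|\\d+)\\s*"
        + Pattern.quote(System.getProperty("user.name"))
        + "\\s*([a-zA-Z_0-9\\-]+)\\s*(\\d+)\\s*(\\d+)\\s*([\\b/]+)";
    System.out.println(Pattern.compile(regex).pattern());
  }
}
{code}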
[jira] [Commented] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032203#comment-14032203 ] Hadoop QA commented on HDFS-6534: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650517/HDFS-6534.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7129//console This message is automatically generated. Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032206#comment-14032206 ] Hadoop QA commented on HDFS-6540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650516/HDFS-6540.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7130//console This message is automatically generated. TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6494) In some cases, the hedged read will lead to infinite client wait.
[ https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-6494: Attachment: HDFS-6494.txt Sorry for the late reply. I tried to repro with the attached HDFS-6494.txt test case, which is a trunk patch ported from LiuLei's hedged-read-test-case.patch file. Looped 10 times and all passed. To me, your original failure was caused by the CountDownLatch not being protected by a finally block, which was fixed in HDFS-6231 (available since 2.4.1) by [~cnauroth]; a sketch of that fix shape follows this message. I'd like to close the current jira as a duplicate. [~liulei.cn], if you can still repro it after 2.4.1, please feel free to reopen it; thanks all the same! In some cases, the hedged read will lead to infinite client wait. -- Key: HDFS-6494 URL: https://issues.apache.org/jira/browse/HDFS-6494 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.4.0 Reporter: LiuLei Assignee: Liang Xie Attachments: HDFS-6494.txt, hedged-read-bug.patch, hedged-read-test-case.patch When I use hedged read, if there is only one live datanode and the read from that datanode throws a TimeoutException or ChecksumException, the client will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
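A minimal sketch of the fix shape described above (a stand-in method, not the HDFS-6231 patch itself): counting the latch down in a finally block guarantees the waiting reader is released even when the read throws:

{code}
import java.util.concurrent.CountDownLatch;

public class LatchSketch {
  // If read.run() throws (e.g. on a timeout or checksum failure), the
  // finally block still releases the latch, so a caller blocked on
  // latch.await() can never hang forever.
  static void readWithLatch(CountDownLatch latch, Runnable read) {
    try {
      read.run();
    } finally {
      latch.countDown();
    }
  }
}
{code}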
[jira] [Resolved] (HDFS-6494) In some cases, the hedged read will lead to infinite client wait.
[ https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie resolved HDFS-6494. - Resolution: Duplicate In some cases, the hedged read will lead to infinite client wait. -- Key: HDFS-6494 URL: https://issues.apache.org/jira/browse/HDFS-6494 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.4.0 Reporter: LiuLei Assignee: Liang Xie Attachments: HDFS-6494.txt, hedged-read-bug.patch, hedged-read-test-case.patch When I use hedged read, if there is only one live datanode and the read from that datanode throws a TimeoutException or ChecksumException, the client will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032230#comment-14032230 ] Hadoop QA commented on HDFS-5574: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619705/HDFS-5574.v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.ha.TestZKFailoverControllerStress {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7127//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7127//console This message is automatically generated. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip uses a temp buffer to read data to this buffer, it is not necessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032236#comment-14032236 ] Hadoop QA commented on HDFS-6492: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650512/HDFS-6492.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestEditLogFileInputStream org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.TestPersistBlocks org.apache.hadoop.hdfs.TestFileAppendRestart org.apache.hadoop.hdfs.server.namenode.TestEditLog {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7128//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7128//console This message is automatically generated. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032261#comment-14032261 ] Yi Liu commented on HDFS-6492: -- Thanks [~andrew.wang], nice work. I just have two comments for small improvements. *1.*
{code}
List<XAttr> setINodeXAttrs(List<XAttr> existingXAttrs, List<XAttr> toSet,
    EnumSet<XAttrSetFlag> flag) throws IOException {
  // Check for duplicate XAttrs in toSet
  // We need to use a custom comparator, so using a HashSet is not suitable
  for (int i = 0; i < toSet.size(); i++) {
    for (int j = 0; j < toSet.size(); j++) {
      if (i == j) {
        continue;
      }
      if (toSet.get(i).equalsIgnoreValue(toSet.get(j))) {
        throw new IOException("Cannot specify the same XAttr to be set " +
            "more than once");
      }
    }
  }
  ..
{code}
The two {{for}} loops can be improved as follows, since we don't need to compare each pair of elements twice (a self-contained illustration follows this message):
{code}
for (int i = 0; i < toSet.size(); i++) {
  for (int j = i + 1; j < toSet.size(); j++) {
    if (toSet.get(i).equalsIgnoreValue(toSet.get(j))) {
      throw new IOException("Cannot specify the same XAttr to be set " +
          "more than once");
    }
  }
}
{code}
*2.* In {{FSDirectory#setINodeXAttrs}}, the following change could be a bit more efficient (it saves one iteration). *1).* Remove this snippet of code:
{code}
if (existingXAttrs != null) {
  for (XAttr a: existingXAttrs) {
    if (isUserVisible(a)) {
      userVisibleXAttrsNum++;
    }
  }
}
{code}
*2).* Change
{code}
XAttrSetFlag.validate(xAttr.getName(), exist, flag);
// add the new XAttr since it passed validation
xAttrs.add(xAttr);
if (isUserVisible(xAttr) && !exist) {
  userVisibleXAttrsNum++;
}
{code}
to:
{code}
XAttrSetFlag.validate(xAttr.getName(), exist, flag);
// add the new XAttr since it passed validation
xAttrs.add(xAttr);
if (isUserVisible(xAttr)) {
  userVisibleXAttrsNum++;
}
{code}
*3).* Change
{code}
if (!alreadySet) {
  xAttrs.add(existing);
}
{code}
to:
{code}
if (!alreadySet) {
  xAttrs.add(existing);
  if (isUserVisible(existing)) {
    userVisibleXAttrsNum++;
  }
}
{code}
Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
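For reference, a self-contained illustration (using plain strings in place of XAttr and equality in place of {{equalsIgnoreValue}}, both stand-ins) of why starting the inner loop at {{i + 1}} visits each unordered pair exactly once while still catching every duplicate:

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class DedupSketch {
  static void checkNoDuplicates(List<String> names) throws IOException {
    for (int i = 0; i < names.size(); i++) {
      for (int j = i + 1; j < names.size(); j++) {  // each pair once
        if (names.get(i).equals(names.get(j))) {
          throw new IOException(
              "Cannot specify the same XAttr to be set more than once");
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    checkNoDuplicates(Arrays.asList("user.a", "user.b"));  // passes
    checkNoDuplicates(Arrays.asList("user.a", "user.a"));  // throws
  }
}
{code}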
[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032272#comment-14032272 ] Binglin Chang commented on HDFS-4667: - Thanks for the updates [~jingzhao], I will have a look, it may take 1 or 2. Capture renamed files/directories in snapshot diff report - Key: HDFS-4667 URL: https://issues.apache.org/jira/browse/HDFS-4667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Binglin Chang Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, HDFS-4667.003.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, getfullname-snapshot-support.patch Currently in the diff report we only show file/dir creation, deletion and modification. After rename with snapshots is supported, renamed file/dir should also be captured in the diff report. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6542) WebHDFSFileSystem doesn't transmit desired checksum type
Andrey Stepachev created HDFS-6542: -- Summary: WebHDFSFileSystem doesn't transmit desired checksum type Key: HDFS-6542 URL: https://issues.apache.org/jira/browse/HDFS-6542 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Andrey Stepachev Priority: Minor Currently DFSClient has the ability to specify the desired checksum type. This behaviour is controlled by the dfs.checksum.type parameter, settable by the client. It works with the hdfs:// filesystem, but doesn't work with webhdfs: webhdfs uses the default checksum type initialised by the server's instance of DFSClient (a usage sketch follows this message). As an example, https://issues.apache.org/jira/browse/HADOOP-8240 doesn't work with webhdfs. -- This message was sent by Atlassian JIRA (v6.2#6252)
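For context, this is how a client selects the checksum type; per this report it takes effect over hdfs:// but is silently ignored over webhdfs://. A minimal sketch (the path is a placeholder):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumTypeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Honored when writing through hdfs://; per this report, a webhdfs://
    // client ends up with the server-side DFSClient's default instead.
    conf.set("dfs.checksum.type", "CRC32C");
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.getFileChecksum(new Path("/tmp/example")));
  }
}
{code}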
[jira] [Commented] (HDFS-6375) Listing extended attributes with the search permission
[ https://issues.apache.org/jira/browse/HDFS-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032354#comment-14032354 ] Hudson commented on HDFS-6375: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #585 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/585/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Listing extended attributes with the search permission -- Key: HDFS-6375 URL: https://issues.apache.org/jira/browse/HDFS-6375 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6375.1.patch, HDFS-6375.10.patch, HDFS-6375.11.patch, HDFS-6375.13.patch, HDFS-6375.2.patch, HDFS-6375.3.patch, HDFS-6375.4.patch, HDFS-6375.5.patch, HDFS-6375.6.patch, HDFS-6375.7.patch, HDFS-6375.8.patch, HDFS-6375.9.patch From the attr(5) manpage: {noformat} Users with search access to a file or directory may retrieve a list of attribute names defined for that file or directory. {noformat} This is like doing {{getfattr}} without the {{-d}} flag, which we currently don't support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032389#comment-14032389 ] Hadoop QA commented on HDFS-5574: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12619705/HDFS-5574.v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7132//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7132//console This message is automatically generated. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip uses a temp buffer to read data to this buffer, it is not necessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032447#comment-14032447 ] Hudson commented on HDFS-2006: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1776 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1776/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt ability to support storing extended attributes per file --- Key: HDFS-2006 URL: https://issues.apache.org/jira/browse/HDFS-2006 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: dhruba borthakur Assignee: Yi Liu Fix For: 3.0.0, 2.5.0 Attachments: ExtendedAttributes.html, HDFS-2006-Branch-2-Merge.patch, HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch It would be nice if HDFS provides a feature to store extended attributes for files, similar to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. The challenge is that it has to be done in such a way that a site not using this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6375) Listing extended attributes with the search permission
[ https://issues.apache.org/jira/browse/HDFS-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032450#comment-14032450 ] Hudson commented on HDFS-6375: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1776 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1776/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Listing extended attributes with the search permission -- Key: HDFS-6375 URL: https://issues.apache.org/jira/browse/HDFS-6375 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6375.1.patch, HDFS-6375.10.patch, HDFS-6375.11.patch, HDFS-6375.13.patch, HDFS-6375.2.patch, HDFS-6375.3.patch, HDFS-6375.4.patch, HDFS-6375.5.patch, HDFS-6375.6.patch, HDFS-6375.7.patch, HDFS-6375.8.patch, HDFS-6375.9.patch From the attr(5) manpage: {noformat} Users with search access to a file or directory may retrieve a list of attribute names defined for that file or directory. {noformat} This is like doing {{getfattr}} without the {{-d}} flag, which we currently don't support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6464) Support multiple xattr.name parameters for WebHDFS getXAttrs.
[ https://issues.apache.org/jira/browse/HDFS-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032449#comment-14032449 ] Hudson commented on HDFS-6464: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1776 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1776/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Support multiple xattr.name parameters for WebHDFS getXAttrs. - Key: HDFS-6464 URL: https://issues.apache.org/jira/browse/HDFS-6464 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6464.1.patch, HDFS-6464.patch For WebHDFS getXAttrs through names, right now the entire list is passed to the client side and then filtered, which is not the best choice since it's inefficient and precludes us from doing server-side smarts on par with the Java APIs. Furthermore, if some xattrs doesn't exist, server side should return error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6375) Listing extended attributes with the search permission
[ https://issues.apache.org/jira/browse/HDFS-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032534#comment-14032534 ] Hudson commented on HDFS-6375: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1803 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1803/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Listing extended attributes with the search permission -- Key: HDFS-6375 URL: https://issues.apache.org/jira/browse/HDFS-6375 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6375.1.patch, HDFS-6375.10.patch, HDFS-6375.11.patch, HDFS-6375.13.patch, HDFS-6375.2.patch, HDFS-6375.3.patch, HDFS-6375.4.patch, HDFS-6375.5.patch, HDFS-6375.6.patch, HDFS-6375.7.patch, HDFS-6375.8.patch, HDFS-6375.9.patch From the attr(5) manpage: {noformat} Users with search access to a file or directory may retrieve a list of attribute names defined for that file or directory. {noformat} This is like doing {{getfattr}} without the {{-d}} flag, which we currently don't support. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6464) Support multiple xattr.name parameters for WebHDFS getXAttrs.
[ https://issues.apache.org/jira/browse/HDFS-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032533#comment-14032533 ] Hudson commented on HDFS-6464: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1803 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1803/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Support multiple xattr.name parameters for WebHDFS getXAttrs. - Key: HDFS-6464 URL: https://issues.apache.org/jira/browse/HDFS-6464 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0 Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6464.1.patch, HDFS-6464.patch For WebHDFS getXAttrs through names, right now the entire list is passed to the client side and then filtered, which is not the best choice since it's inefficient and precludes us from doing server-side smarts on par with the Java APIs. Furthermore, if some xattrs doesn't exist, server side should return error. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2006) ability to support storing extended attributes per file
[ https://issues.apache.org/jira/browse/HDFS-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032531#comment-14032531 ] Hudson commented on HDFS-2006: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1803 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1803/]) Moved CHANGES.txt entries of MAPREDUCE-5898, MAPREDUCE-5920, HDFS-6464, HDFS-6375 from trunk to 2.5 section on merging HDFS-2006 to branch-2 (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602699) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt ability to support storing extended attributes per file --- Key: HDFS-2006 URL: https://issues.apache.org/jira/browse/HDFS-2006 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: HDFS XAttrs (HDFS-2006) Reporter: dhruba borthakur Assignee: Yi Liu Fix For: 3.0.0, 2.5.0 Attachments: ExtendedAttributes.html, HDFS-2006-Branch-2-Merge.patch, HDFS-2006-Merge-1.patch, HDFS-2006-Merge-2.patch, HDFS-XAttrs-Design-1.pdf, HDFS-XAttrs-Design-2.pdf, HDFS-XAttrs-Design-3.pdf, Test-Plan-for-Extended-Attributes-1.pdf, xattrs.1.patch, xattrs.patch It would be nice if HDFS provides a feature to store extended attributes for files, similar to the one described here: http://en.wikipedia.org/wiki/Extended_file_attributes. The challenge is that it has to be done in such a way that a site not using this feature does not waste precious memory resources in the namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6527: - Attachment: HDFS-6527.v4.patch The v4 patch does what you suggested. Regarding the test code in {{FSNamesystem}}: {{delete()}} also needs a delay. We already have fault injection in various critical parts of the system. Edit log corruption due to deferred INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because the deferred inode removal happens outside the FSN write lock, getAdditionalBlock() can still get the deleted inode from the inode map with the FSN write lock held. This allows addition of a block to a deleted file (a sketch of the guard follows this message). As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
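A hypothetical sketch of the kind of guard this implies (inodeMap, fsnLock, and addBlockGuarded are stand-ins, not the patch's code): after taking the write lock, re-resolve the inode before appending the block, so a concurrent delete cannot slip an OP_ADD_BLOCK in after an OP_DELETE:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DeferredRemovalSketch {
  // Stand-ins for FSNamesystem state: the inode map and the FSN lock.
  private final Map<Long, Object> inodeMap = new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

  void addBlockGuarded(long inodeId) throws IOException {
    fsnLock.writeLock().lock();
    try {
      // Re-check under the write lock: a delete may have run between the
      // earlier read-locked checks and this point.
      if (!inodeMap.containsKey(inodeId)) {
        throw new FileNotFoundException("File does not exist: id=" + inodeId);
      }
      // ... safe to allocate the block and log OP_ADD_BLOCK here ...
    } finally {
      fsnLock.writeLock().unlock();
    }
  }
}
{code}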
[jira] [Commented] (HDFS-6376) Distcp data between two HA clusters requires another configuration
[ https://issues.apache.org/jira/browse/HDFS-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032610#comment-14032610 ] Dave Marion commented on HDFS-6376: --- ready for review. Distcp data between two HA clusters requires another configuration -- Key: HDFS-6376 URL: https://issues.apache.org/jira/browse/HDFS-6376 Project: Hadoop HDFS Issue Type: Bug Components: datanode, federation, hdfs-client Affects Versions: 2.3.0, 2.4.0 Environment: Hadoop 2.3.0 Reporter: Dave Marion Fix For: 3.0.0 Attachments: HDFS-6376-2.patch, HDFS-6376-3-branch-2.4.patch, HDFS-6376-4-branch-2.4.patch, HDFS-6376-5-trunk.patch, HDFS-6376-6-trunk.patch, HDFS-6376-branch-2.4.patch, HDFS-6376-patch-1.patch User has to create a third set of configuration files for distcp when transferring data between two HA clusters. Consider the scenario in [1]. You cannot put all of the required properties in core-site.xml and hdfs-site.xml for the client to resolve the location of both active namenodes. If you do, then the datanodes from cluster A may join cluster B. I can not find a configuration option that tells the datanodes to federate blocks for only one of the clusters in the configuration. [1] http://mail-archives.apache.org/mod_mbox/hadoop-user/201404.mbox/%3CBAY172-W2133964E0C283968C161DD1520%40phx.gbl%3E -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6543) org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed
Yongjun Zhang created HDFS-6543: --- Summary: org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed Key: HDFS-6543 URL: https://issues.apache.org/jira/browse/HDFS-6543 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Running latest trunk locally, I'm seeing this failure: {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestWebHDFS Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 119.42 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS testLargeFile(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 26.415 sec ERROR! java.io.IOException: File /test/largeFile/file could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1468) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2725) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:611) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:163) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:312) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:86) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:708) at org.apache.hadoop.hdfs.web.TestWebHDFS.largeFileTest(TestWebHDFS.java:134) at org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile(TestWebHDFS.java:97) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6494) In some cases, the hedged read will lead to infinite client wait.
[ https://issues.apache.org/jira/browse/HDFS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032624#comment-14032624 ] Chris Nauroth commented on HDFS-6494: - [~xieliang007], thank you for taking another look at this. In some cases, the hedged read will lead to infinite client wait. -- Key: HDFS-6494 URL: https://issues.apache.org/jira/browse/HDFS-6494 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.4.0 Reporter: LiuLei Assignee: Liang Xie Attachments: HDFS-6494.txt, hedged-read-bug.patch, hedged-read-test-case.patch When I use hedged read, if there is only one live datanode and the read from that datanode throws a TimeoutException or ChecksumException, the client will wait forever. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because of incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032627#comment-14032627 ] Yongjun Zhang commented on HDFS-6475: - Running the two failed tests locally against today's trunk without my change, one passed and the other failed. Specifically, org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer.testEncryptedBalancer2 passed, and org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed both with and without the fix; I filed HDFS-6543 to track it. Thanks. WebHdfs clients fail without retry because of incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initially obtained from the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts them in order, so the first one it runs into is likely the standby NN. If the standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. The client is expected to retry another NN, but due to the insufficient handling of the SecurityException mentioned above, it fails. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Priority: Minor (was: Major) Target Version/s: 2.4.1 (was: 2.5.0) Thanks Gera. It appears this offending method has been removed as part of HDFS-6164. I changed the target version to 2.4.1. However, since this concerns functionality that was removed in 2.5.0 and pertains purely to a unit test failure, it is not critical to fix; I don't think this should hold up 2.4.1. I'll still upload an updated patch for 2.4.0 shortly. TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Status: Open (was: Patch Available) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Status: Patch Available (was: Open) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HDFS-6540-branch-2.4.patch, HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated HDFS-6540: -- Attachment: HDFS-6540-branch-2.4.patch TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HDFS-6540-branch-2.4.patch, HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6540) TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames
[ https://issues.apache.org/jira/browse/HDFS-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032660#comment-14032660 ] Hadoop QA commented on HDFS-6540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650599/HDFS-6540-branch-2.4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7134//console This message is automatically generated. TestOfflineImageViewer.outputOfLSVisitor fails for certain usernames Key: HDFS-6540 URL: https://issues.apache.org/jira/browse/HDFS-6540 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Attachments: HDFS-6540-branch-2.4.patch, HDFS-6540.patch TestOfflineImageViewer.outputOfLSVisitor() fails if the username contains - (dash). A dash is a valid character in a username. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-3848: - Priority: Major (was: Minor) Target Version/s: 2.5.0 A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in the logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null && leaseFile.equals(lease)) || lease.getHolder().equals(holder) As can be seen, there are two operands (conditions) connected with an or operator. The first operand is straightforward and will be true only if the holder of the file is the new holder. But the problem is the second operand, which will always be true, since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this, I think the if statement should only contain the following condition: (leaseFile != null && leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
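To make the report concrete, here is the condition before and after the proposed change, paraphrased as a schematic (the surrounding method body is elided):
{code}
// Before (as reported): the second operand compares lease.getHolder() to the
// same holder that was used to look the lease up, so it is trivially true
// whenever the holder owns any lease at all.
if ((leaseFile != null && leaseFile.equals(lease))
    || lease.getHolder().equals(holder)) {
  // ... treat as a re-create by the same holder (body elided) ...
}

// After (as proposed): only the lease actually attached to the file matters.
if (leaseFile != null && leaseFile.getHolder().equals(holder)) {
  // ... treat as a re-create by the same holder (body elided) ...
}
{code}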
[jira] [Updated] (HDFS-6536) FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client
[ https://issues.apache.org/jira/browse/HDFS-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6536: Priority: Minor (was: Major) FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client -- Key: HDFS-6536 URL: https://issues.apache.org/jira/browse/HDFS-6536 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Priority: Minor With the small client program below, when running as user root, which doesn't have a kerberos credential, an exception is thrown at the end of the client run. The config is HA with security enabled, with client config setting
{code}
<property>
  <name>fs.defaultFS</name>
  <value>webhdfs://ns1</value>
</property>
{code}
The client program:
{code}
public class kclient1 {
  public static void main(String[] args) throws IOException {
    final Configuration conf = new Configuration();
    // a non-root user
    final UserGroupInformation ugi =
        UserGroupInformation.getUGIFromTicketCache("/tmp/krb5cc_496", "h...@xyz.com");
    System.out.println("Starting");
    ugi.doAs(new PrivilegedAction<Object>() {
      @Override
      public Object run() {
        try {
          FileSystem fs = FileSystem.get(conf);
          String renewer = "abcdefg";
          fs.addDelegationTokens(renewer, ugi.getCredentials());
          // Just to prove that we connected with right credentials.
          fs.getFileStatus(new Path("/"));
          return fs.getDelegationToken(renewer);
        } catch (Exception e) {
          e.printStackTrace();
          return null;
        }
      }
    });
    System.out.println("THE END");
  }
}
{code}
Output: {code} [root@yjzc5w-1 tmp2]# hadoop --config /tmp2/conf jar kclient1.jar kclient1.kclient1 Starting 14/06/14 20:38:51 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:52 INFO web.WebHdfsFileSystem: Retrying connect to namenode: yjzc5w-2.xyz.com/172.26.3.87:20101. Already tried 0 time(s); retry policy is org.apache.hadoop.io.retry.RetryPolicies$FailoverOnNetworkExceptionRetry@1a92210, delay 0ms. To prove that connection with right credentials to get file status updated updated 7 THE END 14/06/14 20:38:53 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:53 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 14/06/14 20:38:53 INFO fs.FileSystem: FileSystem.Cache.closeAll() threw an exception: java.io.IOException: Authentication failed, url=http://yjzc5w-2.xyz.com:20101/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&user.name=root&token=HAAEaGRmcwRoZGZzAIoBRp2bNByKAUbBp7gcbBQUD6vWmRYJRv03XZj7Jajf8PU8CB8SV0VCSERGUyBkZWxlZ2F0aW9uC2hhLWhkZnM6bnMx [root@yjzc5w-1 tmp2]# {code} We can see that the exception is thrown at the end of the client run. I found that the problem is that at the end of the client run, the FileSystem$Cache$ClientFinalizer is run, during which the tokens stored in the filesystem cache get cancelled with the following call:
{code}
final class TokenAspect<T extends FileSystem & Renewable> {
  @InterfaceAudience.Private
  public static class TokenManager extends TokenRenewer {
    @Override
    public void cancel(Token<?> token, Configuration conf) throws IOException {
      getInstance(token, conf).cancelDelegationToken(token);  // <==
    }
{code}
where getInstance(token, conf) creates a FileSystem as user root, then calls cancelDelegationToken against the server side. However, the server doesn't have the root kerberos credential, so it throws this exception. When I run the same program as user hdfs, which has the kerberos credential, it's fine. In this case, the client program doesn't own the delegation token, so it should not try to cancel the token. However, the token does not get cancelled because the exception I described is thrown, which is good. The remaining question is, do we need to check token ownership before trying to cancel the token? This is a minor issue, so I'm dropping the severity. Hi [~daryn], I wonder if you could give a quick comment, really
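On that ownership question, one possible shape of such a guard inside TokenManager#cancel, as a hedged sketch only (the cast and the ownership comparison are assumptions, not the actual TokenAspect code):
{code}
// Hedged sketch: skip cancellation when the current user does not own the
// token, instead of letting the server reject the CANCELDELEGATIONTOKEN call.
AbstractDelegationTokenIdentifier id =
    (AbstractDelegationTokenIdentifier) token.decodeIdentifier();
String owner = id.getUser().getShortUserName();
String current = UserGroupInformation.getCurrentUser().getShortUserName();
if (!current.equals(owner)) {
  return; // this process did not obtain the token; leave it to its owner
}
getInstance(token, conf).cancelDelegationToken(token);
{code}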
[jira] [Updated] (HDFS-6536) FileSystem.Cache.closeAll() throws authentication exception at the end of a webhdfs client
[ https://issues.apache.org/jira/browse/HDFS-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6536: Description: With the small client program below, when running as user root, which doesn't have a kerberos credential, an exception is thrown at the end of the client run. The config is HA with security enabled, with client config setting
{code}
<property>
  <name>fs.defaultFS</name>
  <value>webhdfs://ns1</value>
</property>
{code}
The client program:
{code}
public class kclient1 {
  public static void main(String[] args) throws IOException {
    final Configuration conf = new Configuration();
    // a non-root user
    final UserGroupInformation ugi =
        UserGroupInformation.getUGIFromTicketCache("/tmp/krb5cc_496", "h...@xyz.com");
    System.out.println("Starting");
    ugi.doAs(new PrivilegedAction<Object>() {
      @Override
      public Object run() {
        try {
          FileSystem fs = FileSystem.get(conf);
          String renewer = "abcdefg";
          fs.addDelegationTokens(renewer, ugi.getCredentials());
          // Just to prove that we connected with right credentials.
          fs.getFileStatus(new Path("/"));
          return fs.getDelegationToken(renewer);
        } catch (Exception e) {
          e.printStackTrace();
          return null;
        }
      }
    });
    System.out.println("THE END");
  }
}
{code}
Output: {code} [root@yjzc5w-1 tmp2]# hadoop --config /tmp2/conf jar kclient1.jar kclient1.kclient1 Starting 14/06/14 20:38:51 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:52 INFO web.WebHdfsFileSystem: Retrying connect to namenode: yjzc5w-2.xyz.com/172.26.3.87:20101. Already tried 0 time(s); retry policy is org.apache.hadoop.io.retry.RetryPolicies$FailoverOnNetworkExceptionRetry@1a92210, delay 0ms. To prove that connection with right credentials to get file status updated updated 7 THE END 14/06/14 20:38:53 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/06/14 20:38:53 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) 14/06/14 20:38:53 INFO fs.FileSystem: FileSystem.Cache.closeAll() threw an exception: java.io.IOException: Authentication failed, url=http://yjzc5w-2.xyz.com:20101/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&user.name=root&token=HAAEaGRmcwRoZGZzAIoBRp2bNByKAUbBp7gcbBQUD6vWmRYJRv03XZj7Jajf8PU8CB8SV0VCSERGUyBkZWxlZ2F0aW9uC2hhLWhkZnM6bnMx [root@yjzc5w-1 tmp2]# {code} We can see that the exception is thrown at the end of the client run. I found that the problem is that at the end of the client run, the FileSystem$Cache$ClientFinalizer is run, during which the tokens stored in the filesystem cache get cancelled with the following call:
{code}
final class TokenAspect<T extends FileSystem & Renewable> {
  @InterfaceAudience.Private
  public static class TokenManager extends TokenRenewer {
    @Override
    public void cancel(Token<?> token, Configuration conf) throws IOException {
      getInstance(token, conf).cancelDelegationToken(token);  // <==
    }
{code}
where getInstance(token, conf) creates a FileSystem as user root, then calls cancelDelegationToken against the server side. However, the server doesn't have the root kerberos credential, so it throws this exception. When I run the same program as user hdfs, which has the kerberos credential, it's fine. In this case, the client program doesn't own the delegation token, so it should not try to cancel the token. However, the token does not get cancelled because the exception I described is thrown, which is good. The remaining question is, do we need to check token ownership before trying to cancel the token? This is a minor issue, so I'm dropping the severity. Hi [~daryn], I wonder if you could give a quick comment, really appreciate it! was: With the small client program below, when running as user root, an exception is thrown at the end of the client run. The config is HA with security enabled, with client config setting
{code}
<property>
  <name>fs.defaultFS</name>
  <value>webhdfs://ns1</value>
</property>
{code}
The client program: {code} public class kclient1 { public static void main(String[] args) throws IOException { final Configuration conf = new Configuration(); // a non-root user final UserGroupInformation ugi =
[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032747#comment-14032747 ] Hadoop QA commented on HDFS-6527: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650582/HDFS-6527.v4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7133//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7133//console This message is automatically generated. Edit log corruption due to deferred INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because of deferred inode removal outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
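For readers following along, the shape of the fix is roughly the following (a sketch only, not the attached patch; the method names are from FSNamesystem/FSDirectory of that era):
{code}
// Sketch of the re-check idea in getAdditionalBlock(): after reacquiring the
// FSN write lock, resolve the path again instead of trusting an inode-map
// lookup, so a racing delete surfaces as FileNotFoundException here rather
// than as an OP_ADD_BLOCK logged against a deleted file.
writeLock();
try {
  final INodeFile file = INodeFile.valueOf(dir.getINode(src), src);
  // ... allocate the new block and log it against 'file' (elided) ...
} finally {
  writeUnlock();
}
{code}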
[jira] [Created] (HDFS-6544) Broken Link for GFS in package.html
Suraj Nayak M created HDFS-6544: --- Summary: Broken Link for GFS in package.html Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Priority: Minor The link to GFS currently points to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html, which has the abstract of the GFS paper along with a link to the PDF version. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suraj Nayak M updated HDFS-6544: Attachment: HDFS-6544.patch Added a patch to change the link. Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS currently points to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html, which has the abstract of the GFS paper along with a link to the PDF version. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6312) WebHdfs HA failover is broken on secure clusters
[ https://issues.apache.org/jira/browse/HDFS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6312: Attachment: HDFS-6312.attempted.patch Hi Daryn, Attached is a quick fix for this issue. I tried to write a testcase but was not successful. However, I tested the fix on a real cluster: I can reproduce the problem without the patch, and the patch resolves it. Would you please take a look to see if we can have this issue fixed first? Thanks a lot! --Yongjun WebHdfs HA failover is broken on secure clusters Key: HDFS-6312 URL: https://issues.apache.org/jira/browse/HDFS-6312 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-6312.attempted.patch When webhdfs does a failover, it blanks out the delegation token. This will cause subsequent operations against the other NN to acquire a new token. Tasks cannot acquire a token (no kerberos credentials) so jobs will fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032869#comment-14032869 ] Colin Patrick McCabe commented on HDFS-6518: Yeah, I agree that we need to take the FSN read lock there. +1. Thanks Yongjun. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned a possible race; maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
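Assuming the patch follows the usual FSNamesystem locking idiom, the test check becomes something like this sketch (identifiers approximate, not the attached patch):
{code}
// Sketch of checkPendingCachedEmpty() holding the FSN read lock while the
// datanodes' pendingCached lists are inspected.
cluster.getNamesystem().readLock();
try {
  for (DataNode dn : cluster.getDataNodes()) {
    DatanodeDescriptor descriptor =
        datanodeManager.getDatanode(dn.getDatanodeId());
    Assert.assertTrue("Pending cached list of " + descriptor + " is not empty",
        descriptor.getPendingCached().isEmpty());
  }
} finally {
  cluster.getNamesystem().readUnlock();
}
{code}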
[jira] [Updated] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6518: --- Summary: TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list (was: TestCacheDirectives#testExceedsCapacity fails intermittently) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code is successful, https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally is also successful. HDFS-6257 mentioned a possible race; maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032895#comment-14032895 ] Colin Patrick McCabe commented on HDFS-5546: I think we need a unit test to go with this that exercises a few threads doing {{mkdir}} and {{remove}} while another thread is using {{Ls}}. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
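A rough JUnit-style sketch of what such a test could look like (structure assumed; the actual patch may drive the shell differently):
{code}
// Sketch: churn the tree from one thread while another runs "ls -R";
// before the fix the shell could die with FileNotFoundException.
final FileSystem fs = cluster.getFileSystem();
final AtomicBoolean done = new AtomicBoolean(false);
Thread churn = new Thread(new Runnable() {
  @Override
  public void run() {
    try {
      while (!done.get()) {
        Path d = new Path("/test/dir" + (System.nanoTime() % 10));
        fs.mkdirs(d);
        fs.delete(d, true);
      }
    } catch (IOException ignored) {
    }
  }
});
churn.start();
FsShell shell = new FsShell(fs.getConf());
for (int i = 0; i < 100; i++) {
  // A nonzero exit code is acceptable; an uncaught FNF crash is not.
  shell.run(new String[] { "-ls", "-R", "/test" });
}
done.set(true);
churn.join();
{code}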
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032902#comment-14032902 ] Colin Patrick McCabe commented on HDFS-6539: +1. Thanks, Binglin. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6312) WebHdfs HA failover is broken on secure clusters
[ https://issues.apache.org/jira/browse/HDFS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032912#comment-14032912 ] Alejandro Abdelnur commented on HDFS-6312: -- Patch LGTM. [~daryn], this is fixing a problem on its own merit. Do you see any issue with committing this JIRA separately from your larger fix? (BTW, is there a JIRA for it?) WebHdfs HA failover is broken on secure clusters Key: HDFS-6312 URL: https://issues.apache.org/jira/browse/HDFS-6312 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 2.4.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-6312.attempted.patch When webhdfs does a failover, it blanks out the delegation token. This will cause subsequent operations against the other NN to acquire a new token. Tasks cannot acquire a token (no kerberos credentials) so jobs will fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032917#comment-14032917 ] Hudson commented on HDFS-6539: -- FAILURE: Integrated in Hadoop-trunk-Commit #5711 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5711/]) HDFS-6539. test_native_mini_dfs is skipped in hadoop-hdfs pom.xml (decstery via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602998) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6480: - Attachment: HDFS-6480.002.patch Move waitForReady() from FSDirectory to FSNamesystem Key: HDFS-6480 URL: https://issues.apache.org/jira/browse/HDFS-6480 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch, HDFS-6480.002.patch Currently FSDirectory implements a barrier in {{waitForReady()}} / {{setReady()}} so that it only serve requests once the FSImage is fully loaded. As a part of the effort to evolve {{FSDirectory}} to a class which focuses on implementing the data structure of the namespace, this jira proposes to move the barrier one level higher to {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6492: -- Attachment: HDFS-6492.002.patch Thanks for reviewing, Liu Yi! I made your suggested improvements, good ideas. I also fixed the broken unit tests; I had forgotten to make reading xattrs in AddCloseOp and MkdirOp conditional on the edit log supporting xattrs. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch, HDFS-6492.002.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
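Presumably the fix gates the read on the layout version, along these lines (a sketch only; the feature constant and helper name are assumptions):
{code}
// Sketch, e.g. inside AddCloseOp#readFields: only consume the xattr section
// when the edit log segment's layout version actually includes xattrs.
if (NameNodeLayoutVersion.supports(
    NameNodeLayoutVersion.Feature.XATTRS, logVersion)) {
  this.xAttrs = readXAttrsFromEditLog(in); // assumed helper
}
{code}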
[jira] [Created] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
Kihwal Lee created HDFS-6545: Summary: Finalizing rolling upgrade can make NN unavailable for a long duration Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6244) Make Trash Interval configurable for each of the namespaces
[ https://issues.apache.org/jira/browse/HDFS-6244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-6244: -- Attachment: HDFS-6244.v3.patch Make Trash Interval configurable for each of the namespaces --- Key: HDFS-6244 URL: https://issues.apache.org/jira/browse/HDFS-6244 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-6244.v1.patch, HDFS-6244.v2.patch, HDFS-6244.v3.patch Somehow we need to avoid the cluster filling up. One solution is to have a different trash policy per namespace. However, if we can simply make the property configurable per namespace, then the same config can be rolled everywhere and we'd be done. This seems simple enough. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032955#comment-14032955 ] Hadoop QA commented on HDFS-3848: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545609/HDFS-3848-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7136//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7136//console This message is automatically generated. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in the logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null && leaseFile.equals(lease)) || lease.getHolder().equals(holder) As can be seen, there are two operands (conditions) connected with an or operator. The first operand is straightforward and will be true only if the holder of the file is the new holder. But the problem is the second operand, which will always be true, since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this, I think the if statement should only contain the following condition: (leaseFile != null && leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3848) A Bug in recoverLeaseInternal method of FSNameSystem class
[ https://issues.apache.org/jira/browse/HDFS-3848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032961#comment-14032961 ] Hadoop QA commented on HDFS-3848: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12545609/HDFS-3848-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7135//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7135//console This message is automatically generated. A Bug in recoverLeaseInternal method of FSNameSystem class -- Key: HDFS-3848 URL: https://issues.apache.org/jira/browse/HDFS-3848 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.1 Reporter: Hooman Peiro Sajjad Labels: patch Attachments: HDFS-3848-1.patch Original Estimate: 1h Remaining Estimate: 1h This is a bug in the logic of the method recoverLeaseInternal. In line 1322 it checks if the owner of the file is trying to recreate the file. The condition of the if statement is (leaseFile != null && leaseFile.equals(lease)) || lease.getHolder().equals(holder) As can be seen, there are two operands (conditions) connected with an or operator. The first operand is straightforward and will be true only if the holder of the file is the new holder. But the problem is the second operand, which will always be true, since the lease object is the one found by the holder by calling Lease lease = leaseManager.getLease(holder); in line 1315. To fix this, I think the if statement should only contain the following condition: (leaseFile != null && leaseFile.getHolder().equals(holder)) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032962#comment-14032962 ] Kihwal Lee commented on HDFS-6545: -- I think we can skip {{saveNamespace()}}. The presence of the finalize op in a consumable edit segment should not cause any trouble, since any subsequent rolling upgrade will create a rollback image with a txid past the op. The only potential inconvenience is that if the ANN or SBN needs to be restarted before the new version saves a checkpoint for some reason, it may need the -rollingUpgrade started start-up option if there was a layout version change. But this can also happen on the SBN with the current code. If the SBN was not up when the ANN finalizes an upgrade, the SBN may need to be started with -rollingUpgrade started in order for it to read the rollback image saved by the old version. Or it will need to be bootstrapped fresh in order to download the new fsimage created only on the ANN through {{saveNamespace()}}. In summary, removing {{saveNamespace()}} from {{finalizeRollingUpgrade()}} seems reasonable. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
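In code terms, the proposal amounts to something like the sketch below (method names assumed, not taken from the attached patch; the real method also has to update the rolling-upgrade state):
{code}
// Sketch of finalizeRollingUpgrade() without the checkpoint: record the
// finalize op in the edit log instead of blocking the NN on saveNamespace().
getEditLog().logFinalizeRollingUpgrade(now()); // assumed existing log op
if (haEnabled) {
  getFSImage().rollEditLog(); // rolls and syncs; no separate logSync() needed
} else {
  getEditLog().logSync();
}
{code}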
[jira] [Updated] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6539: --- Resolution: Fixed Fix Version/s: 2.5.0 Target Version/s: 2.5.0 Status: Resolved (was: Patch Available) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.5.0 Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032964#comment-14032964 ] Colin Patrick McCabe commented on HDFS-6382: bq. Plus, in the places that need this the most, one has to deal with getting what essentially becomes a critical part of uptime getting scheduled, competing with all of the other things running and, to remind you, to just delete files. It's sort of ridiculous to require YARN running for what is fundamentally a file system problem. It simply doesn't work in the real world. In the examples you give, you're already using YARN for Hive and Pig, so it's already a critical part of the infrastructure. Anyway, you should be able to put the cleanup job in a different queue. It's not like YARN is strictly FIFO. bq. One eventually gets to the point that the auto cleaner job is now running hourly just so /tmp doesn't overrun the rest of HDFS. Because these run outside of HDFS, they are slow and tedious and generally fall in the lap of teams that don't do Java so end up doing all sorts of squirrely things to make these jobs work. This also sucks. Well, presumably the implementation in this JIRA won't be done by a team that doesn't do Java, so we should skip that problem, right? The comments about /tmp are, I think, another example of how this needs to be highly configurable. Rather than modifying Hive or Pig to set TTLs on things, we probably want to be able to configure the scanner to look at everything under /tmp. Perhaps the scanner should attach a TTL to things in /tmp that don't already have one. Running this under YARN has an intuitive appeal to the upstream developers, since YARN is a scheduler. If we write our own scheduler for this inside HDFS, we're kind of duplicating some of that work, including the monitoring, logging, etc. features. I think Steve's comments (and a lot of the earlier comments) reflect that. Of course, to users not already using YARN, a standalone daemon might seem more appealing. The proposal to put this in the balancer seems like a reasonable compromise. We can reuse some of the balancer code, and that way, we're not adding another daemon to manage. I wonder if we could have YARN run the balancer periodically? That might be interesting. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf In production environments, we often have a scenario like this: we want to back up files on HDFS for some time and then delete them automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs which are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash or not 6. A global configuration is needed to control whether a directory with a TTL should be deleted when it is emptied by the TTL mechanism. -- This message was sent by Atlassian JIRA (v6.2#6252)
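As a concrete illustration of item 2 in the proposal above, an expiry check in such a scanner might look like this (the attribute name and scanner structure are assumptions for illustration, not part of the design doc):
{code}
// Hypothetical TTL sweep: read a "user.ttl" xattr (milliseconds) and delete
// expired files. Trash handling would follow the global config in item 5.
static final String TTL_XATTR = "user.ttl"; // assumed attribute name

void maybeExpire(FileSystem fs, FileStatus stat) throws IOException {
  byte[] raw = fs.getXAttrs(stat.getPath()).get(TTL_XATTR);
  if (raw == null) {
    return; // no TTL set on this file
  }
  long ttlMs = Long.parseLong(new String(raw, StandardCharsets.UTF_8));
  if (System.currentTimeMillis() - stat.getModificationTime() > ttlMs) {
    fs.delete(stat.getPath(), false); // or move to trash, per config
  }
}
{code}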
[jira] [Comment Edited] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032962#comment-14032962 ] Kihwal Lee edited comment on HDFS-6545 at 6/16/14 9:25 PM: --- I think we can skip {{saveNamespace()}}. The presence of the finalize op in a consumable edit segment should not cause any trouble, since any subsequent rolling upgrade will create a rollback image with a txid past the op. The only potential inconvenience is that if the ANN or SBN needs to be restarted before the new version saves a checkpoint for some reason, it may need the -rollingUpgrade started start-up option if there was a layout version change. But this can also happen on the SBN with the current code. In summary, removing {{saveNamespace()}} from {{finalizeRollingUpgrade()}} seems reasonable. was (Author: kihwal): I think we can skip {{saveNamespace()}}. The presence of the finalize op in a consumable edit segment should not cause any trouble, since any subsequent rolling upgrade will create a rollback image with a txid past the op. The only potential inconvenience is that if the ANN or SBN needs to be restarted before the new version saves a checkpoint for some reason, it may need the -rollingUpgrade started start-up option if there was a layout version change. But this can also happen on the SBN with the current code. If the SBN was not up when the ANN finalizes an upgrade, the SBN may need to be started with -rollingUpgrade started in order for it to read the rollback image saved by the old version. Or it will need to be bootstrapped fresh in order to download the new fsimage created only on the ANN through {{saveNamespace()}}. In summary, removing {{saveNamespace()}} from {{finalizeRollingUpgrade()}} seems reasonable. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Attachment: HDFS-6545.patch Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6545: - Status: Patch Available (was: Open) Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is directly called. For name nodes with a big name space, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6546) Add non-superuser capability to get the encryption zone for a specific path
Charles Lamb created HDFS-6546: -- Summary: Add non-superuser capability to get the encryption zone for a specific path Key: HDFS-6546 URL: https://issues.apache.org/jira/browse/HDFS-6546 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Charles Lamb Assignee: Charles Lamb Need to add a protocol, API, and CLI that allow a non-superuser to ask whether a path is part of an EZ, and if so, which one. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6543) org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-6543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6543: Summary: org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed intermittently (was: org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed ) org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed intermittently --- Key: HDFS-6543 URL: https://issues.apache.org/jira/browse/HDFS-6543 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Running latest trunk locally, I'm seeing this failure: {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestWebHDFS Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 119.42 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS testLargeFile(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 26.415 sec ERROR! java.io.IOException: File /test/largeFile/file could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1468) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2725) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:611) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:163) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:312) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:86) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:708) at org.apache.hadoop.hdfs.web.TestWebHDFS.largeFileTest(TestWebHDFS.java:134) at org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile(TestWebHDFS.java:97) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6543) org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile failed intermittently
[ https://issues.apache.org/jira/browse/HDFS-6543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6543: Description: Running latest trunk locally, I'm seeing the following failure. Running it several times, some fail, most are successful. {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestWebHDFS Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 119.42 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS testLargeFile(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 26.415 sec ERROR! java.io.IOException: File /test/largeFile/file could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation. at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1468) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2725) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:611) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:163) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:312) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:86) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:708) at org.apache.hadoop.hdfs.web.TestWebHDFS.largeFileTest(TestWebHDFS.java:134) at org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeFile(TestWebHDFS.java:97) {code} was: Running latest trunk locally, I'm seeing this failure: {code} --- T E S T S --- Running org.apache.hadoop.hdfs.web.TestWebHDFS Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 119.42 sec FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS testLargeFile(org.apache.hadoop.hdfs.web.TestWebHDFS) Time elapsed: 26.415 sec ERROR! java.io.IOException: File /test/largeFile/file could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation. 
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1468) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2725) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:611) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2099) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2095) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2093) at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:163) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:312) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:86) at
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033079#comment-14033079 ] Steve Loughran commented on HDFS-6382: -- {quote} It's sort of ridiculous to require YARN running for what is fundamentally a file system problem. It simply doesn't work in the real world. {quote} Allen, that's like saying it's ridiculous to require bash scripts to perform what is fundamentally a unix filesystem problem. One is data, the other is the mechanism to run code near the data. I don't try and hide any local /tmp cleanup init.d scripts inside an ext3 plugin, after all. YARN # handles security by having you include kerberos tickets in the launch. # stops you having to choose a specific server to run this thing (hence no single point of failure). # lets you scale up when needed. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf In production environments, we often have a scenario like this: we want to back up files on HDFS for some time and then delete them automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs which are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL expires 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to control whether deleted files/directories go to the trash or not 6. A global configuration is needed to control whether a directory with a TTL should be deleted when it is emptied by the TTL mechanism. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6527: Attachment: HDFS-6527.v5.patch Thanks Kihwal! The v4 patch looks good to me. But I guess the unit test now cannot cover the non-snapshot case, since the inode will not be removed from the inodemap if it is still contained in a snapshot. So based on your v4 patch I added a new unit test to cover both scenarios. Also, I use a customized block placement policy and Whitebox to add the deleted inode back to the inodemap, so as to remove the dependency on the fault injection code. Edit log corruption due to deferred INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen a SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because of deferred inode removal outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows addition of a block to a deleted file. As a result, the edit log will contain OP_ADD, OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because of incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033099#comment-14033099 ] Yongjun Zhang commented on HDFS-6475: - BTW, the testLargeFile failure is intermittent; I updated HDFS-6543 about it. WebHdfs clients fail without retry because of incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is previously initialized with the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf). The client contacts the NNs in that order, so the first one it runs into is likely the standby NN. If the standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. The client is expected to retry another NN, but due to the insufficient handling of the SecurityException mentioned above, it fails. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033128#comment-14033128 ] Jing Zhao commented on HDFS-6545: - Yeah, it looks like we do not need to do an extra checkpoint while finalizing a rolling upgrade. The current patch looks good to me. One minor comment: for an HA setup we do not need to call logSync, since rollEditLog already covers the sync. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is called directly. For namenodes with a big namespace, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
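A minimal sketch of the shape Jing's comment suggests is below. The method and edit-log call names follow FSNamesystem/FSEditLog conventions but are illustrative, not the committed patch.
{code}
// Sketch, assuming an haEnabled flag and FSEditLog-style helpers: record
// the finalization in the edit log instead of forcing a full fsimage save.
void finalizeRollingUpgradeInternal() throws IOException {
  // Logging the finalization op is cheap; saving a multi-GB fsimage while
  // holding the write lock is what makes the NN unavailable for minutes.
  getEditLog().logFinalizeRollingUpgrade(Time.now()); // org.apache.hadoop.util.Time
  if (haEnabled) {
    // rollEditLog() already syncs the edit log, so no explicit logSync().
    getEditLog().rollEditLog();
  } else {
    getEditLog().logSync();
  }
}
{code}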
[jira] [Created] (HDFS-6547) IVs need to be created with a counter, not a SRNG
Charles Lamb created HDFS-6547: -- Summary: IVs need to be created with a counter, not a SRNG Key: HDFS-6547 URL: https://issues.apache.org/jira/browse/HDFS-6547 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Charles Lamb IVs should be created using a persistent counter, not a SRNG. -- This message was sent by Atlassian JIRA (v6.2#6252)
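To illustrate the idea behind this sub-task: a counter-derived IV can never repeat as long as the counter value is persisted across restarts, unlike an IV drawn from a SecureRandom, which merely makes repeats unlikely. A self-contained sketch (class and field names are hypothetical, not from any patch):
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: derive each 16-byte AES-CTR IV from a monotonically
// increasing counter rather than a SecureRandom.
final class CounterIvGenerator {
  private final AtomicLong counter; // must be persisted, e.g. in NN metadata

  CounterIvGenerator(long lastPersistedValue) {
    this.counter = new AtomicLong(lastPersistedValue);
  }

  byte[] nextIv() {
    ByteBuffer iv = ByteBuffer.allocate(16);
    iv.putLong(0L);                        // high 8 bytes reserved
    iv.putLong(counter.incrementAndGet()); // low 8 bytes: unique counter value
    return iv.array();
  }
}
{code}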
[jira] [Updated] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6518: -- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks for reviewing Colin! TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code was successful: https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally was also successful. HDFS-6257 mentioned a possible race; maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
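A hedged sketch of the test-side fix named in the title (the committed patch may differ in detail): wrap the pendingCached inspection in the namesystem read lock so the assertion cannot race with the CacheReplicationMonitor mutating the list.
{code}
// Inside TestCacheDirectives (illustrative).
private void checkPendingCachedEmpty(MiniDFSCluster cluster) {
  cluster.getNamesystem().readLock();
  try {
    for (DataNode dn : cluster.getDataNodes()) {
      DatanodeDescriptor descriptor = cluster.getNamesystem().getBlockManager()
          .getDatanodeManager().getDatanode(dn.getDatanodeId());
      Assert.assertTrue("Pending cached list of " + descriptor + " is not empty",
          descriptor.getPendingCached().isEmpty());
    }
  } finally {
    cluster.getNamesystem().readUnlock();
  }
}
{code}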
[jira] [Updated] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6386: --- Attachment: HDFS-6386.012.patch Thanks for the review. I've attached .012 which (I think) addresses all of your previous comments. bq. We need to rebase the fs-encryption branch (and this patch) on trunk. The xattr code has changed slightly, one example being where we log the edit (FSN now, not FSDir). Done. FSNamesystem: bq. listEZ needs to only return EZs where the user has permission to know about the EZ path, else we're exposing the existence of the path In an offline conversation, we agreed that listEZ would become an su-only operation. A new JIRA (HDFS-6546) will create a new method/CLI command that will allow a non-SU to ask whether a path is part of an EZ and, if so, which one. This reminded me to add tests for the createEZ and deleteEZ ops under a non-superuser, which I have done in the .012 patch. bq. In createEncryptionZone, we need to catch the KP exception such that it's logged in the retry cache. Fixed. bq. Using FSDirectory#getPathComponentsForReservedPaths doesn't look right, can you check that it's not returning null? Doing some more tests with multiple EZs would be good; I noticed your listEZ test doesn't check the size of the returned listing, which might be masking an error here. We agreed that it's ok to call getPathComponentsForReservedPaths. I've fixed the tests. bq. KeyProvider should be a single word in javadoc ok FSDirectory: bq. I think the exception thrown from unprotectedSetXAttr contains the system.xxx xattr name. Maybe we should throw a fresh new exception rather than showing this to the user. Could also test for this explicitly rather than rethrowing an exception, since that's more expensive. This check was being made further up anyway, so I removed all of this catch/rethrow code. bq. Do we care about repeating IVs? I'm not a cryptographer, but a Google search turns up concerns about the stream-cipher initialization-vector birthday paradox. A new JIRA (HDFS-6547) specifies that we will create a persistent counter and build new IVs off of that. bq. KeyAndIv needs interface annotations Done. HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.012.patch, HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6547) IVs need to be created with a counter, not a SRNG
[ https://issues.apache.org/jira/browse/HDFS-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb reassigned HDFS-6547: -- Assignee: Charles Lamb IVs need to be created with a counter, not a SRNG - Key: HDFS-6547 URL: https://issues.apache.org/jira/browse/HDFS-6547 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Charles Lamb Assignee: Charles Lamb IVs should be created using a persistent counter, not a SRNG. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033146#comment-14033146 ] Andrew Wang commented on HDFS-6528: --- +1, will commit shortly Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6528: -- Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-2006) Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033150#comment-14033150 ] Lei (Eddy) Xu commented on HDFS-5546: - Thanks [~cmccabe]. Would it be nondeterministic to use multiple threads to test these race conditions? race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
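A hedged sketch of the defensive pattern under discussion (method names follow the fs shell Command/PathData classes, but this is illustrative, not necessarily the committed fix): tolerate a directory vanishing between step 1 and step 3 instead of letting the whole recursive walk die.
{code}
// Inside a recursive shell command such as Ls (illustrative).
protected void recursePath(PathData item) throws IOException {
  try {
    for (PathData child : item.getDirectoryContents()) {
      processPath(child);
    }
  } catch (FileNotFoundException e) {
    // The directory was moved or deleted after we stat'ed it; warn and
    // continue the walk instead of terminating the whole ls -R.
    displayWarning("Could not list " + item + ": " + e.getMessage());
  }
}
{code}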
[jira] [Updated] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6528: -- Resolution: Fixed Fix Version/s: (was: 3.0.0) Status: Resolved (was: Patch Available) Committed to trunk and branch-2. I also converted this into its own JIRA, since HDFS-2006 was already merged to trunk and branch-2. Thanks Stephen for the patch, Akira for reviewing! Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033176#comment-14033176 ] Stephen Chu commented on HDFS-6528: --- Thank you, Andrew! Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6480) Move waitForReady() from FSDirectory to FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033191#comment-14033191 ] Hadoop QA commented on HDFS-6480: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650642/HDFS-6480.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7137//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7137//console This message is automatically generated. Move waitForReady() from FSDirectory to FSNamesystem Key: HDFS-6480 URL: https://issues.apache.org/jira/browse/HDFS-6480 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6480.000.patch, HDFS-6480.001.patch, HDFS-6480.002.patch Currently FSDirectory implements a barrier in {{waitForReady()}} / {{setReady()}} so that it only serve requests once the FSImage is fully loaded. As a part of the effort to evolve {{FSDirectory}} to a class which focuses on implementing the data structure of the namespace, this jira proposes to move the barrier one level higher to {{FSNamesystem}}. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6492) Support create-time xattrs and atomically setting multiple xattrs
[ https://issues.apache.org/jira/browse/HDFS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033193#comment-14033193 ] Hadoop QA commented on HDFS-6492: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650648/HDFS-6492.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7138//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7138//console This message is automatically generated. Support create-time xattrs and atomically setting multiple xattrs - Key: HDFS-6492 URL: https://issues.apache.org/jira/browse/HDFS-6492 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: HDFS-6492.001.patch, HDFS-6492.002.patch Ongoing work in HDFS-6134 requires being able to set system namespace extended attributes at create and mkdir time, as well as being able to atomically set multiple xattrs at once. There's currently no need to expose this functionality in the client API, so let's not unless we have to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6528) Add XAttrs to TestOfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033207#comment-14033207 ] Hudson commented on HDFS-6528: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5713 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5713/]) HDFS-6528. Add XAttrs to TestOfflineImageViewer. Contributed by Stephen Chu. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1603020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java Add XAttrs to TestOfflineImageViewer Key: HDFS-6528 URL: https://issues.apache.org/jira/browse/HDFS-6528 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Fix For: 2.5.0 Attachments: HDFS-6528.001.patch, HDFS-6528.002.patch, HDFS-6528.003.patch We should test that the OfflineImageViewer can run successfully against an fsimage with the new XAttr ops. In this patch, we set and remove XAttrs when preparing the fsimage in TestOfflineImageViewer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033208#comment-14033208 ] Hudson commented on HDFS-6518: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5713 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5713/]) HDFS-6518. TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code was successful: https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally was also successful. HDFS-6257 mentioned a possible race; maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6545) Finalizing rolling upgrade can make NN unavailable for a long duration
[ https://issues.apache.org/jira/browse/HDFS-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033225#comment-14033225 ] Hadoop QA commented on HDFS-6545: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650655/HDFS-6545.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7140//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7140//console This message is automatically generated. Finalizing rolling upgrade can make NN unavailable for a long duration -- Key: HDFS-6545 URL: https://issues.apache.org/jira/browse/HDFS-6545 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Critical Attachments: HDFS-6545.patch In {{FSNamesystem#finalizeRollingUpgrade()}}, {{saveNamespace()}} is called directly. For namenodes with a big namespace, it can take minutes to save a new fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5546) race condition crashes hadoop ls -R when directories are moved/removed
[ https://issues.apache.org/jira/browse/HDFS-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033240#comment-14033240 ] Lei (Eddy) Xu commented on HDFS-5546: - I will try to use MockFileSystem to test this issue. race condition crashes hadoop ls -R when directories are moved/removed Key: HDFS-5546 URL: https://issues.apache.org/jira/browse/HDFS-5546 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Colin Patrick McCabe Assignee: Kousuke Saruta Priority: Minor Attachments: HDFS-5546.1.patch, HDFS-5546.2.000.patch This seems to be a rare race condition where we have a sequence of events like this: 1. org.apache.hadoop.shell.Ls calls DFS#getFileStatus on directory D. 2. someone deletes or moves directory D 3. org.apache.hadoop.shell.Ls calls PathData#getDirectoryContents(D), which calls DFS#listStatus(D). This throws FileNotFoundException. 4. ls command terminates with FNF -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033244#comment-14033244 ] Andrew Wang commented on HDFS-6386: --- Hi Charles, thanks for the rev; I did a properly thorough review this time: * Unused logRetryCache parameter being passed to FSDir * Some things that aren't being used in this patch: setFileKeyMaterial, getFileEncryptionInfo, setFileEncryptionAttributes, parentEncryptionKeyId, changes in FSN#getFileInfo and FSN#startFileInternal, pc in createEZInt and deleteEZInt, isSuperUser in listEncryptionZones. * createEZInt: if it's already in an EZ, we could include the EZ in the exception text. * listEZs: can we just use {{Lists.newArrayList(encryptionZones.values())}}? * listEZ: we can move logAuditEvent outside of the critical section HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.012.patch, HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)
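Pulling the last two review bullets together, a hedged sketch of the listEncryptionZones shape they point at (field and method names are illustrative, patterned on FSNamesystem): copy the zone list under the read lock, audit-log outside the critical section.
{code}
// Sketch only, inside an FSNamesystem-style class.
List<EncryptionZone> listEncryptionZones() throws IOException {
  checkSuperuserPrivilege();
  final List<EncryptionZone> zones;
  readLock();
  try {
    // Guava's Lists.newArrayList snapshots the map values in one call.
    zones = Lists.newArrayList(encryptionZones.values());
  } finally {
    readUnlock();
  }
  // Audit logging can be slow; keep it outside the lock.
  logAuditEvent(true, "listEncryptionZones", null);
  return zones;
}
{code}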
[jira] [Commented] (HDFS-6418) Regression: DFS_NAMENODE_USER_NAME_KEY missing in trunk
[ https://issues.apache.org/jira/browse/HDFS-6418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033254#comment-14033254 ] Steve Loughran commented on HDFS-6418: -- How about # {{DFSConfigKeys}} becomes the public keyset (for compatibility) # a subclass of this, {{DFSPrivateConfigKeys}}, becomes where private keys go # We add the deleted keys back but deprecate them. # Stuff in trunk that is new and private gets pushed into the private keys, promoted as and when it's felt appropriate to make it public. I can do the creation of the private keys file and revert the deleted keys (enough to help me build/link my code), leaving the choice of which new stuff to keep private to others. Regression: DFS_NAMENODE_USER_NAME_KEY missing in trunk --- Key: HDFS-6418 URL: https://issues.apache.org/jira/browse/HDFS-6418 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.0, 2.5.0 Reporter: Steve Loughran Code I have that compiles against Hadoop 2.4 doesn't build against trunk, as someone took away {{DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY}}, apparently in HDFS-6181. I know the name was obsolete, but anyone who has compiled code using that reference, rather than cutting and pasting in the string, is going to find their code doesn't work. More subtly: that will lead to a link exception when trying to run that code on a 2.5+ cluster. This is a regression: the old names need to go back in, even if they refer to the new names and are marked as deprecated. -- This message was sent by Atlassian JIRA (v6.2#6252)
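A minimal sketch of points 1 and 3 above. The new key name matches what HDFS-6181 introduced as the rename target; treat the exact wiring as illustrative rather than the eventual patch.
{code}
// Sketch only: keep the old constant for source/binary compatibility,
// deprecated and aliased to the new name.
public class DFSConfigKeys extends CommonConfigurationKeys {
  public static final String DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY =
      "dfs.namenode.kerberos.principal";

  /** @deprecated use {@link #DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY} instead. */
  @Deprecated
  public static final String DFS_NAMENODE_USER_NAME_KEY =
      DFS_NAMENODE_KERBEROS_PRINCIPAL_KEY;
}
{code}
Because the old constant aliases the new string, code compiled against either name reads the same configuration value, and the @Deprecated annotation gives downstream builds a compile-time nudge instead of a link-time failure.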
[jira] [Created] (HDFS-6548) AuthenticationToken will be ignored if the cookie value contains '@'
Juan Yu created HDFS-6548: - Summary: AuthenticationToken will be ignored if the cookie value contains '@' Key: HDFS-6548 URL: https://issues.apache.org/jira/browse/HDFS-6548 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Assignee: Juan Yu If the cookie value is something like email=x...@abc.com, HDFS will ignore the AuthenticationToken and reject the request. 2014-06-05 19:12:40,654 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: org.apache.hadoop.security.authentication.util.SignerException: Invalid signed text: u This is caused by the fix for HADOOP-10379 (Protect authentication cookies with the HttpOnly and Secure flags): it constructs the cookie header manually instead of using the Cookie class, so the value is not double-quoted. -- This message was sent by Atlassian JIRA (v6.2#6252)
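A hedged illustration of the bug, not the eventual patch: when the Set-Cookie header is assembled by hand, the value must be double-quoted the way the servlet Cookie class quotes values containing separators like '@'; otherwise the signed token is truncated on the way back in and the signature check fails. The method name below is hypothetical.
{code}
// Sketch only: quote the value so separators such as '@' survive the
// round trip through the browser and back to the AuthenticationFilter.
static String createAuthCookieHeader(String name, String value) {
  StringBuilder sb = new StringBuilder(name).append('=');
  sb.append('"').append(value).append('"'); // the quoting the Cookie class used to add
  sb.append("; HttpOnly; Secure");
  return sb.toString();
}
{code}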
[jira] [Commented] (HDFS-6544) Broken Link for GFS in package.html
[ https://issues.apache.org/jira/browse/HDFS-6544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033273#comment-14033273 ] Akira AJISAKA commented on HDFS-6544: - LGTM, +1 (non-binding). Broken Link for GFS in package.html --- Key: HDFS-6544 URL: https://issues.apache.org/jira/browse/HDFS-6544 Project: Hadoop HDFS Issue Type: Bug Reporter: Suraj Nayak M Priority: Minor Attachments: HDFS-6544.patch The link to GFS is currently pointing to http://labs.google.com/papers/gfs.html, which is broken. Change it to http://research.google.com/archive/gfs.html which has Abstract of the GFS paper along with link to the PDF version of the GFS Paper. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers moved HADOOP-10712 to HDFS-6549: --- Component/s: (was: nfs) nfs Target Version/s: 2.5.0 (was: 2.5.0) Affects Version/s: (was: 2.4.0) 2.4.0 Key: HDFS-6549 (was: HADOOP-10712) Project: Hadoop HDFS (was: Hadoop Common) Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6518) TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list
[ https://issues.apache.org/jira/browse/HDFS-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033278#comment-14033278 ] Yongjun Zhang commented on HDFS-6518: - Thanks Andrew for the quick fix and Colin for the review! TestCacheDirectives#testExceedsCapacity should take FSN read lock when accessing pendingCached list --- Key: HDFS-6518 URL: https://issues.apache.org/jira/browse/HDFS-6518 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Andrew Wang Fix For: 2.5.0 Attachments: HDFS-6518.001.patch Observed from https://builds.apache.org/job/PreCommit-HDFS-Build/7080//testReport/ Test org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity fails intermittently {code} Failing for the past 1 build (Since Failed#7080 ) Took 7.3 sec. Stacktrace java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1416) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1437) {code} A second run with the same code was successful: https://builds.apache.org/job/PreCommit-HDFS-Build/7082//testReport/ Running it locally was also successful. HDFS-6257 mentioned a possible race; maybe the issue is still there. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6549: - Status: Patch Available (was: Open) Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6386) HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6386: --- Attachment: HDFS-6386.013.patch Per our conversation, I think this makes the appropriate split between our two patches and addresses your previous comments. Thanks for dealing with this. HDFS Encryption Zones - Key: HDFS-6386 URL: https://issues.apache.org/jira/browse/HDFS-6386 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Reporter: Alejandro Abdelnur Assignee: Charles Lamb Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HDFS-6386.012.patch, HDFS-6386.013.patch, HDFS-6386.4.patch, HDFS-6386.5.patch, HDFS-6386.6.patch, HDFS-6386.8.patch Define the required security xAttributes for directories and files within an encryption zone and how they propagate to children. Implement the logic to create/delete encryption zones. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6549) Add support for accessing the NFS gateway from the AIX NFS client
[ https://issues.apache.org/jira/browse/HDFS-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6549: - Attachment: HDFS-6549.patch Here's a patch which addresses the issue by introducing an AIX compatibility mode configuration setting. When this is enabled, very slight behavior changes are introduced in the case of COMMITs and READDIR/READDIRPLUS calls as described above. No other behavior changes are introduced as part of this change. In addition to the provided test case, I've also tested this manually from an AIX client machine. It works as expected. Please review. Add support for accessing the NFS gateway from the AIX NFS client - Key: HDFS-6549 URL: https://issues.apache.org/jira/browse/HDFS-6549 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-6549.patch We've identified two issues when trying to access the HDFS NFS Gateway from an AIX NFS client: # In the case of COMMITs, the AIX NFS client will always send 4096, or a multiple of the page size, for the offset to be committed, even if fewer bytes than this have ever, or will ever, be written to the file. This will cause a write to a file from the AIX NFS client to hang on close unless the size of that file is a multiple of 4096. # In the case of READDIR and READDIRPLUS, the AIX NFS client will send the same cookie verifier for a given directory seemingly forever after that directory is first accessed over NFS, instead of getting a new cookie verifier for every set of incremental readdir calls. This means that if a directory's mtime ever changes, the FS must be unmounted/remounted before readdir calls on that dir from AIX will ever succeed again. From my interpretation of RFC-1813, the NFS Gateway is in fact doing the correct thing in both cases, but we can introduce simple changes on the NFS Gateway side to be able to optionally work around these incompatibilities. -- This message was sent by Atlassian JIRA (v6.2#6252)
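A minimal sketch of workaround #1 follows. The aixCompatMode flag name comes from the patch description above; the method and its surrounding logic are illustrative, not the attached patch.
{code}
// Sketch only: when AIX sends a COMMIT offset rounded up to a page-size
// multiple, commit the data we actually have instead of waiting forever
// for bytes that will never arrive.
private long effectiveCommitOffset(long requestedOffset, long flushedBytes,
    boolean aixCompatMode) {
  if (aixCompatMode && requestedOffset > flushedBytes) {
    return flushedBytes;
  }
  return requestedOffset;
}
{code}
Keeping the workaround behind a config flag preserves the strict RFC-1813 behavior by default, which matters because other clients rely on commit-past-EOF being an error.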
[jira] [Commented] (HDFS-6527) Edit log corruption due to deferred INode removal
[ https://issues.apache.org/jira/browse/HDFS-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033322#comment-14033322 ] Hadoop QA commented on HDFS-6527: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650667/HDFS-6527.v5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7141//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7141//console This message is automatically generated. Edit log corruption due to deferred INode removal Key: HDFS-6527 URL: https://issues.apache.org/jira/browse/HDFS-6527 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6527.branch-2.4.patch, HDFS-6527.trunk.patch, HDFS-6527.v2.patch, HDFS-6527.v3.patch, HDFS-6527.v4.patch, HDFS-6527.v5.patch We have seen an SBN crashing with the following error: {panel} \[Edit log tailer\] ERROR namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/xxx, penultimateBlock=NULL, lastBlock=blk_111_111, RpcClientId=, RpcCallId=-2] java.io.FileNotFoundException: File does not exist: /xxx {panel} This was caused by the deferred removal of deleted inodes from the inode map. Since getAdditionalBlock() acquires the FSN read lock and then the write lock, a deletion can happen in between. Because the deferred inode removal happens outside the FSN write lock, getAdditionalBlock() can get the deleted inode from the inode map with the FSN write lock held. This allows the addition of a block to a deleted file. As a result, the edit log will contain OP_ADD and OP_DELETE, followed by OP_ADD_BLOCK. This cannot be replayed by the NN, so the NN doesn't start up or the SBN crashes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6439) NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not
[ https://issues.apache.org/jira/browse/HDFS-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033329#comment-14033329 ] Aaron T. Myers commented on HDFS-6439: -- Latest patch looks pretty good to me, and I agree that the test failure is not due to this patch - it's due to a quirk of the way test-patch chooses whether to build the native libs. Two small comments: # It's fine to change the name of the config setting, but please add a deprecation delta for the old one so that this change will be backward compatible in that respect. # The documentation addition is using the wrong name for the config setting. You need to remove the leading dfs. +1 once these are addressed. Thanks, Brandon. NFS should not reject NFS requests to the NULL procedure whether port monitoring is enabled or not -- Key: HDFS-6439 URL: https://issues.apache.org/jira/browse/HDFS-6439 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Brandon Li Assignee: Aaron T. Myers Attachments: HDFS-6439.003.patch, HDFS-6439.004.patch, HDFS-6439.patch, HDFS-6439.patch, linux-nfs-disallow-request-from-nonsecure-port.pcapng, mount-nfs-requests.pcapng As discussed in HDFS-6406, this JIRA is to track the following updates: 1. Port monitoring is the feature name used by traditional NFS servers, and we may want to name the config property (along with the related variable allowInsecurePorts) something like dfs.nfs.port.monitoring. 2. According to RFC2623 (http://www.rfc-editor.org/rfc/rfc2623.txt): {quote}Whether port monitoring is enabled or not, NFS servers SHOULD NOT reject NFS requests to the NULL procedure (procedure number 0). See subsection 2.3.1, NULL procedure for a complete explanation. {quote} I do notice that NFS clients (most of the time) send the mount NULL and nfs NULL from a non-privileged port. If we deny the NULL call in mountd or the nfs server, the client can't mount the export even as user root. 3. It would be nice to have the user guide updated for the port monitoring feature. -- This message was sent by Atlassian JIRA (v6.2#6252)
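A hedged sketch of the deprecation delta asked for in comment #1. Both key strings below are hypothetical placeholders, not the names from the patch; only the Configuration.addDeprecation mechanism itself is a real Hadoop API.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: register a deprecation delta so configurations that still
// set the old key keep working and emit a deprecation warning.
public class NfsConfigKeys {
  static {
    Configuration.addDeprecation(
        "dfs.nfs.allow.insecure.ports", // hypothetical old key
        "dfs.nfs.port.monitoring");     // hypothetical new key, per item 1
  }
}
{code}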
[jira] [Created] (HDFS-6550) Document MapReduce metrics
Akira AJISAKA created HDFS-6550: --- Summary: Document MapReduce metrics Key: HDFS-6550 URL: https://issues.apache.org/jira/browse/HDFS-6550 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA MapReduce-side of HADOOP-6350. Add MapReduce metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6403) Add metrics for log warnings reported by JVM pauses
[ https://issues.apache.org/jira/browse/HDFS-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033365#comment-14033365 ] Aaron T. Myers commented on HDFS-6403: -- Hey Yongjun, the latest patch looks pretty good to me, but I'm a little uneasy about moving the JvmPauseMonitor into the JvmMetrics class. Even though I think that will work, it seems like inappropriate encapsulation to me. The JvmMetrics class should really just be setting up metrics, gauges, etc. and storing a handful of counters. Seems inappropriate to me to have the JvmMetrics class now also be creating a thread and doing its own monitoring of the process. Would you be alright with changing this patch around to instead keep the instantiation of the JvmPauseMonitor like it was, and just pass in a reference to that to the JvmMetrics class? Should be pretty straightforward, and I think would keep the object hierarchy a bit more sane. Add metrics for log warnings reported by JVM pauses --- Key: HDFS-6403 URL: https://issues.apache.org/jira/browse/HDFS-6403 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6403.001.patch, HDFS-6403.002.patch HADOOP-9618 logs warnings when there are long GC pauses. If this is exposed as a metric, then they can be monitored. -- This message was sent by Atlassian JIRA (v6.2#6252)
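A minimal sketch of the wiring ATM suggests. The setPauseMonitor setter is a hypothetical addition here; the JvmPauseMonitor and JvmMetrics.create calls follow the existing APIs, but treat the whole snippet as illustrative.
{code}
// Sketch only: the daemon keeps owning the monitor thread; JvmMetrics is
// handed a reference and only reads pause counts from it.
JvmPauseMonitor pauseMonitor = new JvmPauseMonitor(conf);
pauseMonitor.start();

JvmMetrics jvmMetrics = JvmMetrics.create(processName, sessionId,
    DefaultMetricsSystem.instance());
// Pass the reference instead of letting JvmMetrics spawn its own thread.
jvmMetrics.setPauseMonitor(pauseMonitor);
{code}
This keeps JvmMetrics a pure metrics source (gauges and counters) while the process lifecycle code remains responsible for starting and stopping the monitoring thread.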
[jira] [Commented] (HDFS-6475) WebHdfs clients fail without retry because of incorrect handling of StandbyException
[ https://issues.apache.org/jira/browse/HDFS-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033372#comment-14033372 ] Aaron T. Myers commented on HDFS-6475: -- Latest patch looks pretty good to me. [~jingzhao]? WebHdfs clients fail without retry because of incorrect handling of StandbyException - Key: HDFS-6475 URL: https://issues.apache.org/jira/browse/HDFS-6475 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.4.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6475.001.patch, HDFS-6475.002.patch, HDFS-6475.003.patch, HDFS-6475.003.patch, HDFS-6475.004.patch, HDFS-6475.005.patch With WebHdfs clients connected to an HA HDFS service, the delegation token is initially obtained from the active NN. When a client tries to issue a request, the NNs it can contact are stored in a map returned by DFSUtil.getNNServiceRpcAddresses(conf), and the client contacts the NNs in that order, so the first one it runs into is likely the standby NN. If the standby NN doesn't have the updated client credential, it will throw a SecurityException that wraps a StandbyException. The client is expected to retry the other NN, but due to the insufficient handling of the SecurityException mentioned above, it fails. Example message: {code} {RemoteException={message=Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException, javaClassName=java.lang.SecurityException, exception=SecurityException}} org.apache.hadoop.ipc.RemoteException(java.lang.SecurityException): Failed to obtain user group information: org.apache.hadoop.security.token.SecretManager$InvalidToken: StandbyException at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:159) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:325) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$700(WebHdfsFileSystem.java:107) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.getResponse(WebHdfsFileSystem.java:635) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:542) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:431) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:685) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:696) at kclient1.kclient$1.run(kclient.java:64) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at kclient1.kclient.main(kclient.java:58) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)