[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179619#comment-14179619 ] Yi Liu commented on HDFS-6824: -- +1 Thanks Andrew for the contribution. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional aspects of HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), and other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6877: Attachment: HDFS-6877.007.patch [~cmccabe] Thanks for your great advice. I changed the patch to enforce the logic that a ReplicaNotFoundException must be thrown when the volume for the block has been removed. Would you please take another look? Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop the active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179670#comment-14179670 ] Sanghyun Yun commented on HDFS-6291: Please review my patch, [~vinayrpet] and [~tedyu]. :) And may I assign this issue to myself? FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Attachments: HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, the image is never closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
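One conventional way to avoid the leak described in HDFS-6291 is to guard the early return with try/finally. The sketch below is illustrative only: Image is a minimal stand-in for FSImage, and the error-code value is invented, so treat this as a pattern rather than the committed Hadoop fix.

```java
// Hedged sketch: close an FSImage-like resource on every exit path of doRun(),
// including the early return flagged in HDFS-6291. "Image" and the error code
// below are simplified stand-ins, not the actual Hadoop classes.
import java.io.Closeable;

public class BootstrapSketch {
    static final int ERR_CODE_LOGS_UNAVAILABLE = 6; // illustrative value

    // Minimal stand-in for FSImage so the sketch is self-contained.
    static class Image implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    static int doRun(Image image, boolean logsAvailableForRead) {
        try {
            if (!logsAvailableForRead) {
                // Early return no longer leaks: the finally block still runs.
                return ERR_CODE_LOGS_UNAVAILABLE;
            }
            return 0;
        } finally {
            image.close(); // executes on every return path
        }
    }
}
```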
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179671#comment-14179671 ] Hadoop QA commented on HDFS-7226: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676258/HDFS-7226.003.patch against trunk revision c0e0343. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8478//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8478//console This message is automatically generated. 
TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using the tool from HADOOP-11045, I got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:<18> but was:<12> Stacktrace java.lang.AssertionError: expected:<18> but was:<12> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7246) Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660
[ https://issues.apache.org/jira/browse/HDFS-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amir Langer updated HDFS-7246: -- Summary: Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660 (was: Use ids for DatanodeStorageInfo in the BlockInfo triplets) Use ids for DatanodeStorageInfo in the BlockInfo triplets - HDFS 6660 - Key: HDFS-7246 URL: https://issues.apache.org/jira/browse/HDFS-7246 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Identical to HDFS-6660 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7014) Implement input and output streams to DataNode for native client
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7014: --- Attachment: HDFS-7014-pnative.004.patch Implement input and output streams to DataNode for native client Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7014) Implement input streams and file system functionality
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7014: --- Summary: Implement input streams and file system functionality (was: Implement input and output streams to DataNode for native client) Implement input streams and file system functionality - Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7014) Implement input streams and file system functionality
[ https://issues.apache.org/jira/browse/HDFS-7014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179755#comment-14179755 ] Zhanwei Wang commented on HDFS-7014: HDFS-7014-pnative.003.patch was created incorrectly. I created a new patch, HDFS-7014-pnative.004.patch, that implements the features I mentioned above and separates out the code related to OutputStream. If this patch is OK, I think it is time to commit it and move on to HDFS-7017. Implement input streams and file system functionality - Key: HDFS-7014 URL: https://issues.apache.org/jira/browse/HDFS-7014 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: 0001-HDFS-7014-001.patch, HDFS-7014-pnative.002.patch, HDFS-7014-pnative.003.patch, HDFS-7014-pnative.004.patch, HDFS-7014.patch Implement Client - Namenode RPC protocol and support Namenode HA. Implement Client - Datanode RPC protocol. Implement some basic server-side classes such as ExtendedBlock and LocatedBlock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7017 started by Zhanwei Wang. -- Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179782#comment-14179782 ] Hadoop QA commented on HDFS-6877: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676278/HDFS-6877.007.patch against trunk revision 7e3b5e6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8479//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8479//console This message is automatically generated. Avoid calling checkDisk when an HDFS volume is removed during a write. 
-- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
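The invariant discussed in the HDFS-6877 comments can be stated compactly: looking up a replica whose volume has been removed should surface a ReplicaNotFoundException instead of triggering a disk check. The sketch below is a hypothetical, self-contained illustration of that rule; none of the class or method names are the real DataNode APIs.

```java
// Hypothetical sketch of the HDFS-6877 invariant: once a volume is removed,
// a replica lookup on it throws rather than kicking off a checkDisk pass.
// All names here are illustrative stand-ins for the DataNode code.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VolumeMapSketch {
    static class ReplicaNotFoundException extends Exception {
        ReplicaNotFoundException(String msg) { super(msg); }
    }

    private final Map<Long, String> blockToVolume = new HashMap<>(); // blockId -> volume path
    private final Set<String> removedVolumes = new HashSet<>();

    void addReplica(long blockId, String volume) { blockToVolume.put(blockId, volume); }

    void removeVolume(String volume) { removedVolumes.add(volume); }

    // Enforce: a replica on a removed volume is simply "not found" -- the
    // caller gets an exception instead of a disk-check being scheduled.
    String getVolume(long blockId) throws ReplicaNotFoundException {
        String vol = blockToVolume.get(blockId);
        if (vol == null || removedVolumes.contains(vol)) {
            throw new ReplicaNotFoundException("block " + blockId);
        }
        return vol;
    }
}
```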
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179833#comment-14179833 ] Hudson commented on HDFS-7259: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause problems in two patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, though other NFS clients can still access the NFS gateway. 
Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
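The deferral mechanism the HDFS-7259 description refers to can be sketched in simplified form: a COMMIT whose offset is beyond what has been flushed is queued and only answered once enough data arrives, which is exactly why pending COMMITs can pile up and exhaust the client's request slots. This is a hypothetical illustration; the class and method names are not the hdfs-nfs implementation.

```java
// Simplified, hypothetical sketch of deferred COMMIT handling: commits past
// the flushed length are queued and answered only when data catches up.
// Illustrative names only, not the org.apache.hadoop.hdfs.nfs code.
import java.util.ArrayList;
import java.util.List;

public class CommitQueueSketch {
    static class PendingCommit {
        final long offset;
        boolean answered = false;
        PendingCommit(long offset) { this.offset = offset; }
    }

    private long flushedBytes = 0;
    private final List<PendingCommit> pending = new ArrayList<>();

    // COMMIT handler: answer immediately if the data is already durable,
    // otherwise defer. Each deferred commit keeps a client request slot
    // occupied -- the pile-up behind error pattern (2) above.
    boolean handleCommit(PendingCommit c) {
        if (c.offset <= flushedBytes) {
            c.answered = true;
            return true;
        }
        pending.add(c);
        return false; // response deferred
    }

    // Called as writes are flushed; releases any commits now satisfied.
    void onFlush(long newFlushedBytes) {
        flushedBytes = newFlushedBytes;
        pending.removeIf(c -> {
            if (c.offset <= flushedBytes) { c.answered = true; return true; }
            return false;
        });
    }
}
```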
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179827#comment-14179827 ] Hudson commented on HDFS-6581: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179834#comment-14179834 ] Hudson commented on HDFS-7221: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179828#comment-14179828 ] Hudson commented on HDFS-7204: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179826#comment-14179826 ] Hudson commented on HDFS-7215: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #720 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/720/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179923#comment-14179923 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179915#comment-14179915 ] Hudson commented on HDFS-6581: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179922#comment-14179922 ] Hudson commented on HDFS-7259: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause problems in two patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, though other NFS clients can still access the NFS gateway. 
Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179914#comment-14179914 ] Hudson commented on HDFS-7215: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179916#comment-14179916 ] Hudson commented on HDFS-7204: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1909 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1909/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180008#comment-14180008 ] Hudson commented on HDFS-7204: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7204. balancer doesn't run as a daemon (aw) (aw: rev 4baca311ffb5489fbbe08288502db68875834920) * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/stop-balancer.sh * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs * hadoop-hdfs-project/hadoop-hdfs/src/main/bin/start-balancer.sh balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Fix For: 3.0.0 Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180014#comment-14180014 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7221. TestDNFencingWithReplication fails consistently. Contributed by Charles Lamb. (wang: rev ac56b0637e55465d3b7f7719c8689bff2a572dc0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/HAStressTestHarness.java TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180007#comment-14180007 ] Hudson commented on HDFS-6581: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) Updated CHANGES.txt for HDFS-6581 merge into branch-2.6. (jitendra: rev b85919feef64ed8b05b84ab8c372844a815cc139) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180006#comment-14180006 ] Hudson commented on HDFS-7215: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7215.Add JvmPauseMonitor to NFS gateway. Contributed by Brandon Li (brandonli: rev 4e134a02a4b6f30704b99dfb166dc361daf426ea) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7259) Unresponseive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180013#comment-14180013 ] Hudson commented on HDFS-7259: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1934 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1934/]) HDFS-7259. Unresponseive NFS mount point due to deferred COMMIT response. Contributed by Brandon Li (brandonli: rev b6f9d5538cf2b425652687e99503f3d566b2056a) * hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/IdUserGroup.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/conf/NfsConfigKeys.java Unresponseive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-7259.001.patch, HDFS-7259.002.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (failure in the latter case). This can cause problems in two patterns: (1) file-upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. 
Error pattern (2) arises because too many COMMIT requests are pending: the NFS client reaches its pending-request limit and cannot send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
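The deferred-COMMIT behavior described above can be sketched roughly as follows. All names here (`DeferredCommitSketch`, `receiveCommit`, `onFlush`) are invented for illustration and do not correspond to the actual OpenFileCtx API; the point is only that queued COMMITs are answered once the flushed offset catches up, and until then they count against the client's pending-request limit.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DeferredCommitSketch {
    // A COMMIT request asking that all data up to `offset` be durable.
    static class Commit {
        final long offset;
        Commit(long offset) { this.offset = offset; }
    }

    private final Queue<Commit> pending = new ArrayDeque<>();
    private long flushedOffset = 0;

    // Queue the COMMIT instead of answering immediately: the gateway
    // cannot commit random writes, so it defers the response.
    void receiveCommit(long offset) {
        pending.add(new Commit(offset));
    }

    // Called as data reaches HDFS; answers every COMMIT now satisfiable
    // and returns how many responses were sent.
    int onFlush(long newFlushedOffset) {
        flushedOffset = newFlushedOffset;
        int answered = 0;
        while (!pending.isEmpty() && pending.peek().offset <= flushedOffset) {
            pending.remove();   // in the real gateway: send the COMMIT reply
            answered++;
        }
        return answered;
    }

    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        DeferredCommitSketch s = new DeferredCommitSketch();
        s.receiveCommit(100);
        s.receiveCommit(200);
        System.out.println(s.onFlush(150));     // 1: only the first COMMIT is satisfiable
        System.out.println(s.pendingCount());   // 1: the second still holds a client slot
    }
}
```

In this model, a client whose queued COMMITs all sit past the flushed offset cannot issue further requests, which matches the `ls` starvation described in pattern (2).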
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180209#comment-14180209 ] Ted Yu commented on HDFS-6291: -- With image.close() in finally block, the catch block doesn't need to call it, right ? FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Attachments: HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, image is not closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
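Ted Yu's suggestion — close the image in a `finally` block so that neither the early `return` nor a `catch` path needs its own `close()` call — can be sketched with a stand-in class. The real `FSImage` lives in `org.apache.hadoop.hdfs.server.namenode`; `MockImage` and the error-code constant below are illustrative only.

```java
public class FinallyCloseSketch {
    // Minimal stand-in for FSImage: records whether close() was called.
    static class MockImage {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Illustrative stand-in for BootstrapStandby's error code.
    static final int ERR_CODE_LOGS_UNAVAILABLE = 6;

    // Sketch of BootstrapStandby#doRun(): every exit path, including the
    // early return, goes through the finally block, so the image is
    // always closed exactly once.
    static int doRun(MockImage image, boolean logsAvailable) {
        try {
            if (!logsAvailable) {
                return ERR_CODE_LOGS_UNAVAILABLE; // early return: still closed
            }
            return 0;
        } finally {
            image.close(); // runs on return and on exception alike
        }
    }

    public static void main(String[] args) {
        MockImage img = new MockImage();
        int rc = doRun(img, false);
        System.out.println(rc + " " + img.closed); // 6 true
    }
}
```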
[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7226: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the fix, Yongjun! I've committed this to trunk and branch-2. TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.7.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:&lt;18&gt; but was:&lt;12&gt; Stacktrace java.lang.AssertionError: expected:&lt;18&gt; but was:&lt;12&gt; at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7228) Add an SSD policy into the default BlockStoragePolicySuite
[ https://issues.apache.org/jira/browse/HDFS-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180239#comment-14180239 ] Hudson commented on HDFS-7228: -- FAILURE: Integrated in Hadoop-trunk-Commit #6311 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6311/]) HDFS-7228. Fix TestDNFencing.testQueueingWithAppend. Contributed by Yongjun Zhang. (jing9: rev 1c8d191117de3d2e035bd728bccfde0f4b81296f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add an SSD policy into the default BlockStoragePolicySuite -- Key: HDFS-7228 URL: https://issues.apache.org/jira/browse/HDFS-7228 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7228.000.patch, HDFS-7228.001.patch, HDFS-7228.002.patch, HDFS-7228.003.patch, HDFS-7228.003.patch Currently in the default BlockStoragePolicySuite, we've defined 4 storage policies: LAZY_PERSIST, HOT, WARM, and COLD. Since we have already defined the SSD storage type, it will be useful to also include a SSD related storage policy in the default suite. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180290#comment-14180290 ] Yongjun Zhang commented on HDFS-7226: - Thanks a lot [~jingzhao]! Hopefully the next hdfs build will be clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7180: - Attachment: HDFS-7180.002.patch NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find that the NFSv3 daemon frequently gets stuck while HDFS itself keeps working well (hdfs dfs -ls etc. work fine). The latest hang happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS share is mounted on that same node. On the node where the NFS share is mounted, dmesg shows: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be `ls`ed, and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 from datanode 10.0.3.176:50010 at
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180305#comment-14180305 ] Brandon Li commented on HDFS-7180: -- Nice catch, Jing. I've uploaded a new patch. It lets the dumper notify waiting threads even when an error happens. I also did some code cleanup.
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180307#comment-14180307 ] Brandon Li commented on HDFS-7180: -- The unit test seems tricky to add. I did some file-uploading tests to verify that the pending non-sequential writes were under control.
[jira] [Commented] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180312#comment-14180312 ] Suresh Srinivas commented on HDFS-7231: --- Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This is as you know is a recipe for disaster. Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177518#comment-14177518 ] Colin Patrick McCabe edited comment on HDFS-7235 at 10/22/14 7:15 PM: -- {code} ReplicaInfo replicaInfo = null; synchronized(data) { replicaInfo = (ReplicaInfo) data.getReplica(block.getBlockPoolId(), block.getBlockId()); } if (replicaInfo != null && replicaInfo.getState() == ReplicaState.FINALIZED && !replicaInfo.getBlockFile().exists()) { {code} You can't release the lock this way. Once you release the lock, replicaInfo could be mutated at any time. So you need to do all the checks under the lock. {code} // // Report back to NN bad block caused by non-existent block file. // WATCH-OUT: be sure the conditions checked above matches the following // method in FsDatasetImpl.java: // boolean isValidBlock(ExtendedBlock b) // all other conditions need to be true except that // replicaInfo.getBlockFile().exists() returns false. // {code} I don't think we need the WATCH-OUT part. We shouldn't be calling {{isValidBlock}}, so why do we care if the check is the same as that check? I generally agree with this approach and I think we can get this in if that's fixed. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data from the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java: {code} /** Does the block exist and have the given state? */ private boolean isValid(final ExtendedBlock b, final ReplicaState state) { final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock()); return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists(); } {code} The reason this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. 
The key issue we found is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
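Colin's review point above (perform the lookup and every condition check while still holding the lock) can be illustrated with minimal stand-ins. `ReplicaInfo`, `ReplicaState`, and the `data` monitor below are simplified sketches, not the real FsDataset classes.

```java
import java.io.File;

public class CheckUnderLockSketch {
    enum ReplicaState { FINALIZED, RBW }

    // Simplified stand-in for the DN's replica metadata.
    static class ReplicaInfo {
        private final ReplicaState state;
        private final File blockFile;
        ReplicaInfo(ReplicaState state, File blockFile) {
            this.state = state;
            this.blockFile = blockFile;
        }
        ReplicaState getState() { return state; }
        File getBlockFile() { return blockFile; }
    }

    private final Object data = new Object(); // stands in for the dataset lock
    ReplicaInfo replica;                      // guarded by `data`

    // All three conditions are evaluated inside one synchronized region,
    // so the replica cannot be mutated between the lookup and the checks —
    // the hazard Colin pointed out with the lock released early.
    boolean isFinalizedButFileMissing() {
        synchronized (data) {
            ReplicaInfo r = replica;
            return r != null
                && r.getState() == ReplicaState.FINALIZED
                && !r.getBlockFile().exists();
        }
    }

    public static void main(String[] args) {
        CheckUnderLockSketch s = new CheckUnderLockSketch();
        s.replica = new ReplicaInfo(ReplicaState.FINALIZED,
                new File("/nonexistent/blk_123"));
        System.out.println(s.isFinalizedButFileMissing()); // true
    }
}
```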
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180382#comment-14180382 ] Hadoop QA commented on HDFS-7180: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676390/HDFS-7180.002.patch against trunk revision d67214f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//console This message is automatically generated. 
NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to frequently get stuck while HDFS itself works well (hdfs dfs -ls etc. work just fine). The last hang we found was after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node and the NFS share is mounted on the same node. From the node where the NFS is mounted, dmesg shows lines like this: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be `ls`-ed and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180466#comment-14180466 ] Colin Patrick McCabe commented on HDFS-7235: Hi Yongjun, Thanks for your patience here. I don't think the current patch is quite ready. I could point to a few things, like this: {{ReplicaInfo replicaInfo = (ReplicaInfo) data.getReplica(}} We shouldn't be downcasting here. I think the bigger issue is that the interface in FsDatasetSpi is just not very suitable for what we're trying to do. Rather than trying to hack it, I think we should come up with a better interface. I think we should replace {{FsDatasetSpi#isValid}} with this function:
{code}
/**
 * Check if a block is valid.
 *
 * @param b          The block to check.
 * @param minLength  The minimum length that the block must have.  May be 0.
 * @param state      If this is null, it is ignored.  If it is non-null, we
 *                   will check that the replica has this state.
 *
 * @throws FileNotFoundException           If the replica is not found or there
 *                                         was an error locating it.
 * @throws EOFException                    If the replica length is too short.
 * @throws UnexpectedReplicaStateException If the replica is not in the
 *                                         expected state.
 */
public void checkBlock(ExtendedBlock b, long minLength, ReplicaState state);
{code}
Since this function will throw a clearly marked exception detailing which case we're in, we won't have to call multiple functions. This will be better for performance, since we're only taking the lock once. This will also be better for clarity, since the current APIs lead to some rather complex code. We could also get rid of {{FsDatasetSpi#isValidRbw}}, since this function can do everything that {{isValidRbw}} can. 
Also, UnexpectedReplicaStateException could be a new exception under hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/UnexpectedReplicaStateException.java. I think it's fine to change FsDatasetSpi for this (we did it when adding caching stuff, and again when adding trash). Let me know what you think. I think it would make things a lot clearer. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing this to-be-decommissioned DN as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN would detect the source block to be transferred as an invalid block with the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),
      b.getLocalBlock());
  return replicaInfo != null
      && replicaInfo.getState() == state
      && replicaInfo.getBlockFile().exists();
}
{code}
The reason that this method returns false (detecting an invalid block) is that the block file doesn't exist, due to the bad disk in this case. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. 
Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
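To make the proposed single-call contract concrete, here is a minimal, self-contained sketch of how a caller might consume such an exception-based check. The class and helper names here are hypothetical, and the replica's condition is passed in directly rather than looked up under the dataset lock; only the one-call-per-check idea and the exception types come from the proposal above.

```java
import java.io.EOFException;
import java.io.FileNotFoundException;

// Minimal sketch of the proposed exception-based replica check. A real
// FsDatasetSpi implementation would look the replica up under the dataset
// lock; here the replica's condition is passed in for illustration only.
public class ReplicaCheckDemo {
    enum ReplicaState { FINALIZED, RBW }

    static class UnexpectedReplicaStateException extends Exception {
        UnexpectedReplicaStateException(String msg) { super(msg); }
    }

    // Stand-in for the proposed FsDatasetSpi#checkBlock: one call, one
    // failure mode per exception type, so callers no longer need several
    // isValid* probes (and only take the lock once).
    static void checkBlock(boolean exists, long length, ReplicaState actual,
                           long minLength, ReplicaState expected)
            throws FileNotFoundException, EOFException,
                   UnexpectedReplicaStateException {
        if (!exists) {
            throw new FileNotFoundException("replica not found");
        }
        if (length < minLength) {
            throw new EOFException("replica length " + length
                    + " < required " + minLength);
        }
        if (expected != null && actual != expected) {
            throw new UnexpectedReplicaStateException("replica state " + actual);
        }
    }

    // Caller-side pattern: map each exception to a distinct recovery action.
    static String classify(boolean exists, long length, ReplicaState actual,
                           long minLength, ReplicaState expected) {
        try {
            checkBlock(exists, length, actual, minLength, expected);
            return "valid";
        } catch (FileNotFoundException e) {
            return "not found";   // e.g. the case where a corrupt-block
                                  // report to the NN would be warranted
        } catch (EOFException e) {
            return "too short";
        } catch (UnexpectedReplicaStateException e) {
            return "wrong state";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(false, 0, null, 0, ReplicaState.FINALIZED));
        System.out.println(classify(true, 10, ReplicaState.RBW, 0,
                ReplicaState.FINALIZED));
    }
}
```

The point of the design is that the caller distinguishes the failure cases by exception type instead of calling isValid, isValidRbw, and a length check separately.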
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180474#comment-14180474 ] Siqi Li commented on HDFS-5928: --- [~wheat9] I have added the check for both namespace and namenodeID show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7254) Add documents for hot swap drive
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180476#comment-14180476 ] Colin Patrick McCabe commented on HDFS-7254: +1. Thanks, Eddy. Test failure is not related because this is only a docs change. Add documents for hot swap drive Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7254: --- Summary: Add documentation for hot swaping DataNode drives (was: Add documents for hot swap drive) Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7254: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 (was: 2.6.0) Status: Resolved (was: Patch Available) Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7254) Add documentation for hot swaping DataNode drives
[ https://issues.apache.org/jira/browse/HDFS-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180486#comment-14180486 ] Hudson commented on HDFS-7254: -- FAILURE: Integrated in Hadoop-trunk-Commit #6314 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6314/]) HDFS-7254. Add documentation for hot swaping DataNode drives (Lei Xu via Colin P. McCabe) (cmccabe: rev 66e8187ea1dbc6230ab2c633e4f609a7068b75db) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSCommands.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add documentation for hot swaping DataNode drives - Key: HDFS-7254 URL: https://issues.apache.org/jira/browse/HDFS-7254 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7254.000.patch, HDFS-7254.001.patch, HDFS-7254.002.patch, HDFS-7254.003.patch Add documents for the hot swap drive functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180490#comment-14180490 ] Haohui Mai commented on HDFS-5928: -- The code can be simplified by putting the relevant information in an object. For example:
{code}
{#HAInfo}
{namespace}-{nnid}
{/HAInfo}
{code}
On the JavaScript side:
{code}
var namespace = null, nnid = null;
// parse the XML and set namespace and nnid
if (namespace && nnid) {
  HAInfo = {namespace: namespace, nnid: nnid};
}
{code}
show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180497#comment-14180497 ] Colin Patrick McCabe commented on HDFS-6877: +1. Thanks, Eddy. TestDNFencing failure is HDFS-7226, not related. Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6877: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 (was: 3.0.0) Status: Resolved (was: Patch Available) Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7257) Add the time of last HA state transition to NN's /jmx page
[ https://issues.apache.org/jira/browse/HDFS-7257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180499#comment-14180499 ] Andrew Wang commented on HDFS-7257: --- I don't think there are any timezone concerns, considering that the timezone is shown as part of the string. However, if you'd prefer that it's not included, I'm okay with that. I agree that it can just be converted for usage on the webUI. A final note, it'd also be better to use a standardized date format like ISO 8601 rather than creating a new one: http://en.wikipedia.org/wiki/ISO_8601 Add the time of last HA state transition to NN's /jmx page -- Key: HDFS-7257 URL: https://issues.apache.org/jira/browse/HDFS-7257 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7257.001.patch, HDFS-7257.002.patch, HDFS-7257.003.patch It would be useful to some monitoring apps to expose the last HA transition time in the NN's /jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
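The ISO 8601 formatting suggested above is available directly from java.time, so no new format needs to be invented. A minimal sketch (the class and variable names are illustrative, not from the patch):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Minimal sketch: exposing a transition time as an ISO 8601 string rather
// than an ad-hoc date format. ISO_INSTANT always renders in UTC ("Z"), so
// there is no timezone ambiguity in the emitted string.
public class IsoTimestampDemo {
    static String toIso8601(long epochMillis) {
        return DateTimeFormatter.ISO_INSTANT
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long lastTransitionMillis = 1413930000000L; // illustrative epoch millis
        // prints 2014-10-21T22:20:00Z
        System.out.println(toIso8601(lastTransitionMillis));
    }
}
```

A web UI can still parse this string and re-render it in the browser's local timezone, which matches the suggestion that conversion happen on the UI side.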
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180505#comment-14180505 ] Hudson commented on HDFS-6877: -- FAILURE: Integrated in Hadoop-trunk-Commit #6315 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6315/]) HDFS-6877. Avoid calling checkDisk when an HDFS volume is removed during a write. (Lei Xu via Colin P. McCabe) (cmccabe: rev 7b0f9bb2583cd9b7274f1e31c173c1c6a7ce467b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180524#comment-14180524 ] Siqi Li commented on HDFS-5928: --- I don't think this is going to work if the cluster doesn't have HA or federation. Also, it's good to let people know what the namespace is and what the namenode ID is. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6663: -- Attachment: HDFS-6663-5.patch The decommission status of a block now contains more details: it will show whether a block is decommissioning or decommissioned. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6877) Avoid calling checkDisk when an HDFS volume is removed during a write.
[ https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180536#comment-14180536 ] Lei (Eddy) Xu commented on HDFS-6877: - Thank you for checking in this! [~cmccabe] Avoid calling checkDisk when an HDFS volume is removed during a write. -- Key: HDFS-6877 URL: https://issues.apache.org/jira/browse/HDFS-6877 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-6877.000.consolidate.txt, HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, HDFS-6877.004.patch, HDFS-6877.005.patch, HDFS-6877.006.patch, HDFS-6877.007.patch Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS volume is removed during a write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180541#comment-14180541 ] Haohui Mai commented on HDFS-5928: -- The key idea is to ensure {{HAInfo}} is null in non-HA clusters. You might need some slight tweaks to make it work in all cases, but I think you get the idea. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms
[ https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180563#comment-14180563 ] Chen He commented on HDFS-6694: --- the one I got is: java.lang.RuntimeException: Deferred at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:130) at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.waitFor(MultithreadedTestUtil.java:121) at org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testPipelineRecoveryStress(TestPipelinesFailover.java:485) Caused by: java.lang.AssertionError: expected:100 but was:0 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.AppendTestUtil.check(AppendTestUtil.java:123) at org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover$PipelineTestThread.doAnAction(TestPipelinesFailover.java:522) at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222) at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189) Results : Tests in error: TestPipelinesFailover.testPipelineRecoveryStress:485 » Runtime Deferred TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms Key: HDFS-6694 URL: https://issues.apache.org/jira/browse/HDFS-6694 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Critical Fix For: 2.6.0 Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt 
TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms. Typical failures are described in first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-5928: -- Attachment: HDFS-5928.v5.patch show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-2486: -- Fix Version/s: (was: 3.0.0) 2.7.0 I merged this down to branch-2 to make a cherry-pick cleaner. Review issues with UnderReplicatedBlocks Key: HDFS-2486 URL: https://issues.apache.org/jira/browse/HDFS-2486 Project: Hadoop HDFS Issue Type: Task Components: namenode Affects Versions: 0.23.0 Reporter: Steve Loughran Assignee: Uma Maheswara Rao G Priority: Minor Fix For: 2.7.0 Attachments: HDFS-2486.patch Here are some things I've noted in the UnderReplicatedBlocks class that someone else should review and consider if the code is correct. If not, they are easy to fix. remove(Block block, int priLevel) is not synchronized, and as the inner classes are not, there is a risk of race conditions there. some of the code assumes that getPriority can return the value LEVEL, and if so does not attempt to queue the blocks. As this return value is not currently possible, those checks can be removed. The queue gives priority to blocks whose replication count is less than a third of its expected count over those that are normally under replicated. While this is good for ensuring that files scheduled for large replication are replicated fast, it may not be the best strategy for maintaining data integrity. For that it may be better to give whichever blocks have only two replicas priority over blocks that may, for example, already have 3 out of 10 copies in the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
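The prioritization rule questioned in the last point above can be sketched as follows. This is a simplified illustration of the described behavior only, not the actual UnderReplicatedBlocks code; the method name and level numbering are hypothetical.

```java
// Simplified sketch of the priority rule discussed above: a block whose live
// replica count is under a third of its expected count jumps ahead of
// "normally" under-replicated blocks, regardless of how many physical
// copies actually survive.
public class ReplicationPriorityDemo {
    static int priority(int curReplicas, int expectedReplicas) {
        if (curReplicas == 0) {
            return 0; // no live replicas at all: most urgent
        } else if (curReplicas * 3 < expectedReplicas) {
            return 1; // far below target, e.g. 3 of 10
        } else {
            return 2; // normally under-replicated, e.g. 2 of 3
        }
    }

    public static void main(String[] args) {
        // Illustrates the data-integrity concern raised above: a block with
        // 3 of 10 copies (3 surviving replicas) outranks a block with
        // 2 of 3 copies (only 2 surviving replicas) under this rule.
        System.out.println(priority(3, 10)); // higher priority (level 1)
        System.out.println(priority(2, 3));  // lower priority (level 2)
    }
}
```

Ordering instead by the absolute number of surviving replicas would prioritize the 2-of-3 block, which is the alternative strategy the issue text suggests.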
[jira] [Commented] (HDFS-6694) TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms
[ https://issues.apache.org/jira/browse/HDFS-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180574#comment-14180574 ] Yongjun Zhang commented on HDFS-6694: - Hi [~airbots], Thanks for reporting the issue you ran into. Would you please look into your log to see if there are "Too many open files" kinds of messages? TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms Key: HDFS-6694 URL: https://issues.apache.org/jira/browse/HDFS-6694 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Critical Fix For: 2.6.0 Attachments: HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.001.dbg.patch, HDFS-6694.002.dbg.patch, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover-output.txt, org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.txt TestPipelinesFailover.testPipelineRecoveryStress tests fail intermittently with various symptoms. Typical failures are described in the first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6888) Remove audit logging of getFIleInfo()
[ https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated HDFS-6888: -- Attachment: HDFS-6888-6.patch Updated the patch against trunk. TestBalancer and TestFailureToReadEdits work fine on my machine. The TestPipelinesFailover failure is because of HDFS-6694. Remove audit logging of getFIleInfo() - Key: HDFS-6888 URL: https://issues.apache.org/jira/browse/HDFS-6888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Labels: log Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch The audit logging of getFileInfo() was added in HDFS-3733. Since this is one of the most frequently called methods, users have noticed that the audit log is now filled with it. Since we now have HTTP request logging, this seems unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6824: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Thanks Yi, I committed this to branch-2 and trunk. Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.7.0 Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2486) Review issues with UnderReplicatedBlocks
[ https://issues.apache.org/jira/browse/HDFS-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180587#comment-14180587 ] Hudson commented on HDFS-2486: -- FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6317/]) Move HDFS-2486 down to 2.7.0 in CHANGES.txt (wang: rev 08457e9e57e4fa3c83217fd0a092e926ba7eb135) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Review issues with UnderReplicatedBlocks Key: HDFS-2486 URL: https://issues.apache.org/jira/browse/HDFS-2486 Project: Hadoop HDFS Issue Type: Task Components: namenode Affects Versions: 0.23.0 Reporter: Steve Loughran Assignee: Uma Maheswara Rao G Priority: Minor Fix For: 2.7.0 Attachments: HDFS-2486.patch Here are some things I've noted in the UnderReplicatedBlocks class that someone else should review and consider if the code is correct. If not, they are easy to fix. remove(Block block, int priLevel) is not synchronized, and as the inner classes are not, there is a risk of race conditions there. some of the code assumes that getPriority can return the value LEVEL, and if so does not attempt to queue the blocks. As this return value is not currently possible, those checks can be removed. The queue gives priority to blocks whose replication count is less than a third of its expected count over those that are normally under replicated. While this is good for ensuring that files scheduled for large replication are replicated fast, it may not be the best strategy for maintaining data integrity. For that it may be better to give whichever blocks have only two replicas priority over blocks that may, for example, already have 3 out of 10 copies in the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6824) Additional user documentation for HDFS encryption.
[ https://issues.apache.org/jira/browse/HDFS-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180588#comment-14180588 ] Hudson commented on HDFS-6824: -- FAILURE: Integrated in Hadoop-trunk-Commit #6317 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6317/]) HDFS-6824. Additional user documentation for HDFS encryption. (wang: rev a36399e09c8c92911df08f78a4b88528b6dd513f) * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/TransparentEncryption.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Additional user documentation for HDFS encryption. -- Key: HDFS-6824 URL: https://issues.apache.org/jira/browse/HDFS-6824 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 2.6.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 2.7.0 Attachments: TransparentEncryption.html, hdfs-6824.001.patch, hdfs-6824.002.patch We'd like to better document additional things about HDFS encryption: setup and configuration, using alternate access methods (namely WebHDFS and HttpFS), other misc improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
Haohui Mai created HDFS-7277: Summary: Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Attachment: HDFS-7277.000.patch Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Status: Patch Available (was: Open) Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7235) Can not decommission DN which has invalid block due to bad disk
[ https://issues.apache.org/jira/browse/HDFS-7235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180626#comment-14180626 ] Yongjun Zhang commented on HDFS-7235: - Hi [~cmccabe], Thanks a lot for the side discussion and comment. I will look into it. Can not decommission DN which has invalid block due to bad disk --- Key: HDFS-7235 URL: https://issues.apache.org/jira/browse/HDFS-7235 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7235.001.patch, HDFS-7235.002.patch, HDFS-7235.003.patch When decommissioning a DN, the process hangs. What happens is, when the NN chooses a replica as a source to replicate data on the to-be-decommissioned DN to other DNs, it favors choosing the to-be-decommissioned DN itself as the source of the transfer (see BlockManager.java). However, because of the bad disk, the DN detects the source block to be transferred as an invalid block, via the following logic in FsDatasetImpl.java:
{code}
/** Does the block exist and have the given state? */
private boolean isValid(final ExtendedBlock b, final ReplicaState state) {
  final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
  return replicaInfo != null && replicaInfo.getState() == state && replicaInfo.getBlockFile().exists();
}
{code}
This method returns false (detecting an invalid block) because, in this case, the block file doesn't exist due to the bad disk. The key issue we found here is that after the DN detects an invalid block for the above reason, it doesn't report the invalid block back to the NN, so the NN doesn't know that the block is corrupted and keeps sending the data transfer request to the same to-be-decommissioned DN, again and again. This causes an infinite loop, so the decommission process hangs. Thanks [~qwertymaniac] for reporting the issue and initial analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
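The isValid check quoted above short-circuits across three conditions, and whichever one fails the caller just sees false — it cannot distinguish "unknown block" from "known block whose file is gone", which is exactly why no corrupt-block report goes back to the NN. A minimal stand-alone model (hypothetical stand-in types; the real code lives in FsDatasetImpl):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone model of the FsDatasetImpl#isValid logic; ReplicaInfo here
// is a hypothetical stand-in for the real HDFS class.
public class ReplicaCheck {
    enum State { FINALIZED, RBW }

    static class ReplicaInfo {
        final State state;
        final boolean fileExists; // models getBlockFile().exists()
        ReplicaInfo(State state, boolean fileExists) {
            this.state = state;
            this.fileExists = fileExists;
        }
    }

    final Map<Long, ReplicaInfo> volumeMap = new HashMap<>();

    // Returns false for a missing map entry, a wrong state, OR a missing
    // block file -- the three cases are indistinguishable to the caller.
    boolean isValid(long blockId, State expected) {
        ReplicaInfo info = volumeMap.get(blockId);
        return info != null && info.state == expected && info.fileExists;
    }
}
```

With a bad disk, the replica can still be present in volumeMap with the right state, yet fileExists is false — isValid returns false and, as the report notes, nothing tells the NN the replica is actually corrupt.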
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180667#comment-14180667 ] Hadoop QA commented on HDFS-7277: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676439/HDFS-7277.000.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8485//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8485//console This message is automatically generated. Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180670#comment-14180670 ] Hadoop QA commented on HDFS-7232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675587/HDFS-7232.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8484//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8484//console This message is automatically generated. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. 
Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180682#comment-14180682 ] Konstantin Shvachko commented on HDFS-6658: --- I agree, usually people remove data in order to have space to put more. And the freed space usually fills up again in a couple of weeks or months. I don't know if this answer is good enough. It is for me, but in the end you get a bigger cluster. It would be nice to find a way to detect fully empty arrays of the BlockList and release them once the last reference is removed. That should be good enough to avoid a stand-alone thread for garbage collecting, or compacting in your terms. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. See the attached design doc for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
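The proposal's core idea — replace per-replica object references (the "triplets") with primitive int indexes into a per-DatanodeStorageInfo array — can be sketched as follows. Names and layout here are illustrative, not the HDFS-6658 patch's actual design.

```java
// Illustrative contrast between reference-based and index-based replica
// lists. Names are hypothetical, not the HDFS-6658 patch itself.
public class BlockListSketch {

    // Reference-based: each replica costs a full node object with an
    // object header and a pointer (like the triplets off BlockInfo).
    static class RefNode {
        int blockId;
        RefNode next;
    }

    // Index-based: one int[] per storage; each slot holds the index of
    // the following block, so per-replica overhead is a single primitive.
    static class IndexList {
        final int[] next;   // next[i] = index of the following block, -1 = end
        int head = -1;
        IndexList(int capacity) {
            next = new int[capacity];
        }
        void push(int blockIndex) {
            next[blockIndex] = head;
            head = blockIndex;
        }
        int count() {
            int n = 0;
            for (int i = head; i != -1; i = next[i]) n++;
            return n;
        }
    }

    public static void main(String[] args) {
        IndexList list = new IndexList(8);
        list.push(3);
        list.push(5);
        System.out.println(list.count()); // 2
    }
}
```

The point of the design is visible even in this toy: the index-based list stores one int per replica in a shared array owned by the storage, whereas the reference-based list pays an object header plus pointers for every single replica.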
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180703#comment-14180703 ] Jing Zhao commented on HDFS-7277: - +1 Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6988: - Attachment: HDFS-6988.03.patch Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: HDFS-6581 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-6988: - Fix Version/s: (was: HDFS-6581) 3.0.0 Affects Version/s: (was: HDFS-6581) 2.6.0 Status: Patch Available (was: In Progress) Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180709#comment-14180709 ] Xiaoyu Yao commented on HDFS-6988: -- Thanks [~cmccabe] for the confirmation. I just submitted a patch for it. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level
[ https://issues.apache.org/jira/browse/HDFS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-7258: Assignee: Xiaoyu Yao CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level Key: HDFS-7258 URL: https://issues.apache.org/jira/browse/HDFS-7258 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to the HDFS NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log.
2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:54:00,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:54:00,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:54:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:54:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:55:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:55:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:56:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:56:00,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:56:30,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-10-17 07:56:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-10-17 07:57:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:57:00,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:57:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:57:30,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-10-17 07:58:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
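The fix this issue asks for is essentially a one-line demotion of log level. Sketched below with java.util.logging from the JDK — the real CacheReplicationMonitor uses a different logging API, so this only illustrates the mechanism: messages below the logger's threshold are dropped, so a per-rescan message at FINE (debug) no longer floods an INFO-level log.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of demoting a periodic rescan message from INFO to debug level,
// using java.util.logging. The real CacheReplicationMonitor uses a
// different logging API; this only illustrates the idea.
public class RescanLogDemo {
    static final Logger LOG = Logger.getLogger("CacheReplicationMonitorDemo");

    static void logRescan(long waitedMs, Level level) {
        // Before the fix: level == Level.INFO, emitted on every rescan.
        // After the fix: level == Level.FINE, dropped unless debugging.
        LOG.log(level, "Rescanning after " + waitedMs + " milliseconds");
    }

    public static void main(String[] args) {
        LOG.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.INFO); // production-style threshold
        LOG.addHandler(handler);
        LOG.setLevel(Level.INFO);

        logRescan(30001, Level.INFO); // emitted
        logRescan(30001, Level.FINE); // silently dropped at INFO threshold
    }
}
```

An operator who does want the rescan chatter can still get it by setting the logger (and handler) to a debug-level threshold, which is exactly the trade-off a DEBUG-level message is meant to offer.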
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180725#comment-14180725 ] Hadoop QA commented on HDFS-6663: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676423/HDFS-6663-5.patch against trunk revision 7b0f9bb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8481//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8481//console This message is automatically generated. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. 
It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6742) Support sorting datanode list on the new NN webUI
[ https://issues.apache.org/jira/browse/HDFS-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180743#comment-14180743 ] Siqi Li commented on HDFS-6742: --- [~airbots] Hi Chen, any updates on this jira? It would be extremely helpful when dealing with clusters with thousands of nodes. Support sorting datanode list on the new NN webUI - Key: HDFS-6742 URL: https://issues.apache.org/jira/browse/HDFS-6742 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ming Ma Assignee: Chen He The legacy webUI allows sorting the datanode list based on a specific column such as hostname. It is handy so admins can find patterns more quickly, especially for big clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7180: - Attachment: HDFS-7180.003.patch Uploaded a new patch to fix the findbugs warning. NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to get stuck frequently while HDFS itself works well (hdfs dfs -ls etc. work just fine). The latest hang we found happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS is mounted on the same node.
From the node where the NFS is mounted, dmesg shows entries like this:
[1859245.368108] nfs: server localhost not responding, still trying
[1859245.368111] nfs: server localhost not responding, still trying
[1859245.368115] nfs: server localhost not responding, still trying
[1859245.368119] nfs: server localhost not responding, still trying
[1859245.368123] nfs: server localhost not responding, still trying
[1859245.368127] nfs: server localhost not responding, still trying
[1859245.368131] nfs: server localhost not responding, still trying
[1859245.368135] nfs: server localhost not responding, still trying
[1859245.368138] nfs: server localhost not responding, still trying
[1859245.368142] nfs: server localhost not responding, still trying
[1859245.368146] nfs: server localhost not responding, still trying
[1859245.368150] nfs: server localhost not responding, still trying
[1859245.368153] nfs: server localhost not responding, still trying
The mounted directory cannot be `ls`-ed, and `df -hT` gets stuck too.
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180759#comment-14180759 ] Jing Zhao commented on HDFS-7180: - +1 pending Jenkins NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon seems to get stuck frequently while HDFS itself works well (hdfs dfs -ls etc. work just fine). The latest hang we found happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS is mounted on the same node.
From the node where the NFS is mounted, dmesg shows entries like this:
[1859245.368108] nfs: server localhost not responding, still trying
[1859245.368111] nfs: server localhost not responding, still trying
[1859245.368115] nfs: server localhost not responding, still trying
[1859245.368119] nfs: server localhost not responding, still trying
[1859245.368123] nfs: server localhost not responding, still trying
[1859245.368127] nfs: server localhost not responding, still trying
[1859245.368131] nfs: server localhost not responding, still trying
[1859245.368135] nfs: server localhost not responding, still trying
[1859245.368138] nfs: server localhost not responding, still trying
[1859245.368142] nfs: server localhost not responding, still trying
[1859245.368146] nfs: server localhost not responding, still trying
[1859245.368150] nfs: server localhost not responding, still trying
[1859245.368153] nfs: server localhost not responding, still trying
The mounted directory cannot be `ls`-ed, and `df -hT` gets stuck too.
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException: Bad response ERROR for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
[jira] [Created] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
Colin Patrick McCabe created HDFS-7278: -- Summary: Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7278: --- Attachment: HDFS-7278.002.patch Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7278: --- Status: Patch Available (was: Open) Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180770#comment-14180770 ] Suresh Srinivas commented on HDFS-7278: --- [~cmccabe], can you describe why this is needed so that others have context? Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
Haohui Mai created HDFS-7279: Summary: Use netty to implement DatanodeWebHdfsMethods Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180775#comment-14180775 ] Aaron T. Myers commented on HDFS-7278: -- I think it's a good tool to have in our toolbox to work around possible bugs in NN replica accounting. If an operator suspects such an issue, they might be tempted to restart a DN, or all of the DNs in a cluster, in order to trigger full block reports. It'd be much lighter weight if the operator could instead just manually trigger a full BR, rather than having to restart the DN and therefore rescan all the DN data dirs, etc. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180784#comment-14180784 ] Haohui Mai commented on HDFS-7279: -- An alternative option is to upgrade jetty and the servlet API. The new APIs from both, such as asynchronous servlets, can address some of the issues. Webhdfs on the DN side, however, is data-intensive, which does not fit the servlet API very well. The servlet / jetty APIs do not give the fine-grained control over resources that netty is able to provide. These controls are critical if webhdfs needs to survive heavy workloads. The strategy is proven by the mapreduce client, which already uses netty to implement the shuffle functionality. For other URLs on the DNs, I plan to keep jetty listening on a local address, but to have a reverse proxy in netty continue serving these URLs. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
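To make the "fine-grained control" in the comment above concrete, here is a minimal, hypothetical sketch of a netty 4 server bootstrap with explicit buffer-pooling and backpressure settings. It is illustrative only, not the HDFS-7279 patch: the port, watermark values, and class name are made up, and the actual data-transfer handlers are elided.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.HttpServerCodec;

// Sketch only: shows the knobs netty exposes that the servlet API does not.
public class WebHdfsNettySketch {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                // Pooled, reference-counted buffers instead of per-request allocation.
                .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
                // Backpressure: the channel reports itself unwritable past the high
                // watermark, so a slow client cannot force unbounded buffering.
                .childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 64 * 1024)
                .childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 32 * 1024)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline().addLast(new HttpServerCodec());
                        // webhdfs data-transfer handlers would be added here
                    }
                });
            b.bind(50075).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```

A handler would consult `channel.isWritable()` before streaming more block data, which is the kind of per-connection flow control the comment says jetty 6 cannot offer.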
[jira] [Updated] (HDFS-7223) Tracing span description of IPC client is too long
[ https://issues.apache.org/jira/browse/HDFS-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-7223: --- Attachment: HDFS-7223-1.patch Thanks for the comment [~cmccabe]! I updated the patch based on your suggestion. Tracing span description of IPC client is too long -- Key: HDFS-7223 URL: https://issues.apache.org/jira/browse/HDFS-7223 Project: Hadoop HDFS Issue Type: Improvement Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-7223-0.patch, HDFS-7223-1.patch The current span description for IPC calls is too long. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180814#comment-14180814 ] Hadoop QA commented on HDFS-5928: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676430/HDFS-5928.v5.patch against trunk revision 70719e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFSAcl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8482//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8482//console This message is automatically generated. show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7277: - Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~jingzhao] for the reviews. Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6888) Remove audit logging of getFIleInfo()
[ https://issues.apache.org/jira/browse/HDFS-6888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180823#comment-14180823 ] Hadoop QA commented on HDFS-6888: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676436/HDFS-6888-6.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8483//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8483//console This message is automatically generated. Remove audit logging of getFIleInfo() - Key: HDFS-6888 URL: https://issues.apache.org/jira/browse/HDFS-6888 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Labels: log Attachments: HDFS-6888-2.patch, HDFS-6888-3.patch, HDFS-6888-4.patch, HDFS-6888-5.patch, HDFS-6888-6.patch, HDFS-6888.patch The audit logging of getFileInfo() was added in HDFS-3733. Since this is one of the most frequently called methods, users have noticed that the audit log is now filled with these entries. 
Since we now have HTTP request logging, this seems unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7277) Remove explicit dependency on netty 3.2 in BKJournal
[ https://issues.apache.org/jira/browse/HDFS-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180824#comment-14180824 ] Hudson commented on HDFS-7277: -- FAILURE: Integrated in Hadoop-trunk-Commit #6319 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6319/]) HDFS-7277. Remove explicit dependency on netty 3.2 in BKJournal. Contributed by Haohui Mai. (wheat9: rev f729ecf9d2b858e9ee97419e788f1a2ac38b15bb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml Remove explicit dependency on netty 3.2 in BKJournal Key: HDFS-7277 URL: https://issues.apache.org/jira/browse/HDFS-7277 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7277.000.patch The HDFS BKJournal states a direct dependency on netty 3.2.4 in pom but the code does not use it. It should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7279: -- Component/s: webhdfs datanode Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Currently the DN implements all related webhdfs functionality using jetty. Since the jetty version the DN uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOMs when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5928) show namespace and namenode ID on NN dfshealth page
[ https://issues.apache.org/jira/browse/HDFS-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180836#comment-14180836 ] Haohui Mai commented on HDFS-5928: -- The patch looks good. Tested on a non-HA cluster and it works well.
{code}
+{#HAInfo}
+<h3>{Namespace} {NamenodeID}</h3>
+{/HAInfo}
{code}
Can you move the information into the table below? For example:
{code}
{#HAInfo}
<tr><th>Namespace:</th><td>{Namespace}</td></tr>
<tr><th>Namenode ID:</th><td>{NamenodeID}</td></tr>
{/HAInfo}
{code}
Can you post a screenshot on an HA cluster setup as well? show namespace and namenode ID on NN dfshealth page --- Key: HDFS-5928 URL: https://issues.apache.org/jira/browse/HDFS-5928 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Siqi Li Assignee: Siqi Li Attachments: HDFS-5928.v2.patch, HDFS-5928.v3.patch, HDFS-5928.v4.patch, HDFS-5928.v5.patch, HDFS-5928.v1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7280) Use netty 4 in WebImageViewer
Haohui Mai created HDFS-7280: Summary: Use netty 4 in WebImageViewer Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180853#comment-14180853 ] Hadoop QA commented on HDFS-7180: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676474/HDFS-7180.003.patch against trunk revision 3b12fd6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1269 javac compiler warnings (more than the trunk's current 1266 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8487//console This message is automatically generated. 
NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch, HDFS-7180.003.patch We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon frequently seems to get stuck while HDFS itself keeps working well (hdfs dfs -ls etc. work just fine). The last hang we found occurred after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node and on the same node the NFS is mounted. From the node where the NFS is mounted, dmesg shows lines like this: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO
[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7280: - Status: Patch Available (was: Open) Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7280: - Attachment: HDFS-7280.000.patch Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180869#comment-14180869 ] Hadoop QA commented on HDFS-7280: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676498/HDFS-7280.000.patch against trunk revision f729ecf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8490//console This message is automatically generated. Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7280.000.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7281) Missing block is marked as corrupted block
Ming Ma created HDFS-7281: - Summary: Missing block is marked as corrupted block Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code in BlockManager: {noformat} final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify whether the intent is to mark a missing block as corrupted, or whether this is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7281) Missing block is marked as corrupted block
[ https://issues.apache.org/jira/browse/HDFS-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180898#comment-14180898 ] Yongjun Zhang commented on HDFS-7281: - Thanks for reporting this issue, [~mingma]. I happened to notice the same thing in a fsck report today. It's indeed confusing. Missing block is marked as corrupted block -- Key: HDFS-7281 URL: https://issues.apache.org/jira/browse/HDFS-7281 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma In the situation where a block has lost all its replicas, fsck shows the block as missing as well as corrupted. Perhaps it is better not to mark the block corrupted in this case. The reason it is marked as corrupted is that numCorruptNodes == numNodes == 0 in the following code in BlockManager: {noformat} final boolean isCorrupt = numCorruptNodes == numNodes; {noformat} Would like to clarify whether the intent is to mark a missing block as corrupted, or whether this is just a bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
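The report above turns on one boolean expression, so the behavior is easy to demonstrate in isolation. The sketch below is hypothetical: only the `numCorruptNodes == numNodes` expression comes from BlockManager, and the guarded variant is just one possible fix, not what the project decided to do.

```java
// Demonstrates why a block with zero replicas is flagged corrupt today,
// and a hypothetical guard that would report it as missing-only.
public class CorruptVsMissing {
    // Current BlockManager logic: 0 == 0 is true, so a missing block
    // (numNodes == 0) is reported as corrupt as well as missing.
    static boolean isCorrupt(int numCorruptNodes, int numNodes) {
        return numCorruptNodes == numNodes;
    }

    // Hypothetical fix: require at least one known replica before
    // calling the block corrupt.
    static boolean isCorruptGuarded(int numCorruptNodes, int numNodes) {
        return numNodes > 0 && numCorruptNodes == numNodes;
    }

    public static void main(String[] args) {
        System.out.println(isCorrupt(0, 0));        // true: missing block marked corrupt
        System.out.println(isCorruptGuarded(0, 0)); // false: missing only
        System.out.println(isCorruptGuarded(3, 3)); // true: all replicas genuinely corrupt
    }
}
```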
[jira] [Commented] (HDFS-6988) Add configurable limit for percentage-based eviction threshold
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180908#comment-14180908 ] Hadoop QA commented on HDFS-6988: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676462/HDFS-6988.03.patch against trunk revision a36399e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8486//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8486//console This message is automatically generated. Add configurable limit for percentage-based eviction threshold -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao Fix For: 3.0.0 Attachments: HDFS-6988.01.patch, HDFS-6988.02.patch, HDFS-6988.03.patch Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180958#comment-14180958 ] Hadoop QA commented on HDFS-7278: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676481/HDFS-7278.002.patch against trunk revision 3b12fd6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8488//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8488//console This message is automatically generated. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7278) Add a command that allows sysadmins to manually trigger full block reports from a DN
[ https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180962#comment-14180962 ] Suresh Srinivas commented on HDFS-7278: --- bq. I think it's a good tool to have in our toolbox to work around possible bugs in NN replica accounting. Very interesting. I have not encountered such an issue; if you have details it would be good to share them. This command should be fine to add. Add a command that allows sysadmins to manually trigger full block reports from a DN Key: HDFS-7278 URL: https://issues.apache.org/jira/browse/HDFS-7278 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.6.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7278.002.patch We should add a command that allows sysadmins to manually trigger full block reports from a DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180312#comment-14180312 ] Suresh Srinivas edited comment on HDFS-7231 at 10/23/14 3:29 AM: - Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This as you know is a recipe for disaster :) Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? was (Author: sureshms): Allen, I just rewrote the steps with additional details to clarify: # Upgrade 2.0.5 cluster to 2.2 # Do not -finalizeUpgrade # Install 2.4.1 binaries on the cluster machines. Start the datanodes on 2.4.1. # Start namenode -upgrade option. # Namenode start fails because 2.0.5 to 2.2 upgrade is still in progress # Leave 2.4.1 DNs running # Install binaries on NN to 2.2 # Start NN on 2.2 with no upgrade related options So far things are clear. Then you go on to say, the following: bq. DNs now do a partial roll-forward, rendering them unable to continue What do you mean by this? bq. admins manually repair version files on those broken directories This is as you know is a recipe for disaster. Let me ask you a question. Before you go on to 2.4.1, if you do finalize of upgrade what happens? 
rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6663) Admin command to track file and locations from block id
[ https://issues.apache.org/jira/browse/HDFS-6663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14180971#comment-14180971 ] Chen He commented on HDFS-6663: --- TestDNFencingWithReplication, TestNameEditsConfigs, and TestStandbyCheckpoints passed on my machine. The latest QA run does not show any test failures. Not sure why it gives me a -1. Admin command to track file and locations from block id --- Key: HDFS-6663 URL: https://issues.apache.org/jira/browse/HDFS-6663 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6663-2.patch, HDFS-6663-3.patch, HDFS-6663-3.patch, HDFS-6663-4.patch, HDFS-6663-5.patch, HDFS-6663-WIP.patch, HDFS-6663.patch A dfsadmin command that allows finding out the file and the locations given a block number will be very useful in debugging production issues. It may be possible to add this feature to Fsck, instead of creating a new command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)