[jira] [Commented] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438489#comment-15438489 ]

SammiChen commented on HDFS-8901:
---------------------------------

Hi Zhe,

Thanks for your great effort in reviewing the patch. Very good suggestions. Here are my thoughts.

1. That piece of code will be moved to BlockReaderUtil, with a new readAll function created for it.
2. Will be handled.
3. Yes, we are planning to add a ByteBuffer version of the read API.
4. Good question. I'm double-checking the code and will share my findings later.
5. Will be handled.
6. Every StripingChunk will have either a ChunkByteBuffer or a ByteBuffer, to track the different buffer sources. ChunkByteBuffer will be used when reusing the user input buffer as the striping read buffer during positional read, and ByteBuffer will be used when reading with the stateful stripe reader.
7. The memory copy is avoided when chunk.useChunkBuffer is not true, which is an improvement.

> Use ByteBuffer in striping positional read
> ------------------------------------------
>
>                 Key: HDFS-8901
>                 URL: https://issues.apache.org/jira/browse/HDFS-8901
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: SammiChen
>         Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch,
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch,
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch,
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch,
> HDFS-8901.v13.patch, HDFS-8901.v14.patch, initial-poc.patch
>
>
> The native erasure coder prefers direct ByteBuffers for performance
> reasons. To prepare for it, this change uses ByteBuffer throughout the
> code that implements striping positional read. It also avoids
> unnecessary data copying between striping read chunk buffers and decode
> input buffers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
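Editor's note: item 1 of the comment above mentions a readAll helper that works on a ByteBuffer. The patch itself is not shown in this thread, so the following is only a minimal sketch of what such a helper could look like. The class name ReadAllSketch, the method signature, and the use of a ReadableByteChannel as the data source are all assumptions made for illustration, not the HDFS-8901 code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Hypothetical illustration only; not taken from the HDFS-8901 patch.
final class ReadAllSketch {

  /**
   * Keeps reading from a blocking channel until the buffer is full or the
   * channel reaches end-of-stream, and returns the number of bytes read.
   * The data lands directly in the caller's buffer, with no intermediate
   * byte[] owned by this helper.
   */
  public static int readAll(ReadableByteChannel in, ByteBuffer buf) {
    int total = 0;
    try {
      while (buf.hasRemaining()) {
        int n = in.read(buf);
        if (n < 0) {
          break; // end of stream before the buffer filled up
        }
        total += n;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3, 4, 5, 6};
    ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));
    // A direct buffer, as preferred by the native coder per the issue description.
    ByteBuffer buf = ByteBuffer.allocateDirect(4);
    int n = readAll(channel, buf);
    buf.flip();
    System.out.println("read " + n + " bytes, remaining after flip: " + buf.remaining());
  }
}
```

A single read call may return fewer bytes than requested, which is why the helper loops; that loop is the whole point of a readAll-style utility.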
[jira] [Commented] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
[ https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438482#comment-15438482 ]

Hadoop QA commented on HDFS-10803:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 26s | trunk passed |
| +1 | compile | 0m 52s | trunk passed |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 0m 57s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 42s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
| +1 | mvninstall | 0m 49s | the patch passed |
| +1 | compile | 0m 44s | the patch passed |
| +1 | javac | 0m 44s | the patch passed |
| +1 | checkstyle | 0m 23s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 46s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed |
| -1 | unit | 89m 25s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 109m 37s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10803 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825596/HDFS-10803.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 688ba967f82a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 27c3b86 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16546/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16546/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16546/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails
> intermittently due to no free space available
>
>
>                 Key: HDFS-10803
>
[jira] [Updated] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
[ https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10803:
-----------------------------
    Attachment: HDFS-10803.001.patch

> TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails
> intermittently due to no free space available
>
>
>                 Key: HDFS-10803
>                 URL: https://issues.apache.org/jira/browse/HDFS-10803
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10803.001.patch
>
>
> The test {{TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools}}
> fails intermittently. The stack trace
> (https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/):
> {code}
> java.io.IOException: Creating block, no free space available
>   at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.<init>(SimulatedFSDataset.java:151)
>   at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
> {code}
> The error message means that the datanode's capacity has been used up and
> there is no space left to create a new block.
> Looking into the code, the main cause seems to be that the {{capacities}}
> for the cluster are not constructed correctly at the second cluster
> startup, before the test redistributes the blocks.
> The related code:
> {code}
> // Here we do redistribute blocks nNameNodes times for each node,
> // we need to adjust the capacities. Otherwise it will cause the no
> // free space errors sometimes.
> final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
>     .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
>     .numDataNodes(nDataNodes)
>     .racks(racks)
>     .simulatedCapacities(newCapacities)
>     .format(false)
>     .build();
> LOG.info("UNEVEN 11");
> ...
> for(int n = 0; n < nNameNodes; n++) {
>   // redistribute blocks
>   final Block[][] blocksDN = TestBalancer.distributeBlocks(
>       blocks[n], s.replication, distributionPerNN);
>
>   for(int d = 0; d < blocksDN.length; d++)
>     cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));
>   LOG.info("UNEVEN 13: n=" + n);
> }
> {code}
> Because blocks are injected once per namenode, the totalUsed value grows to
> {{nNameNodes*usedSpacePerNN}} rather than {{usedSpacePerNN}}.
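Editor's note: the arithmetic behind the description above can be sketched as follows. This is a hedged illustration under stated assumptions, not the code from HDFS-10803.001.patch. The class CapacitySketch and its helper methods are hypothetical, and scaling each capacity by nNameNodes is just one plausible way to obtain something like the {{newCapacities}} array mentioned in the snippet.

```java
// Hypothetical illustration only; not taken from HDFS-10803.001.patch.
final class CapacitySketch {

  /** Space consumed once blocks have been injected for every namenode. */
  public static long totalUsed(long usedSpacePerNN, int nNameNodes) {
    return Math.multiplyExact(usedSpacePerNN, (long) nNameNodes);
  }

  /** Scales each simulated datanode capacity by the number of namenodes. */
  public static long[] scaleCapacities(long[] capacities, int nNameNodes) {
    long[] scaled = new long[capacities.length];
    for (int i = 0; i < capacities.length; i++) {
      scaled[i] = Math.multiplyExact(capacities[i], (long) nNameNodes);
    }
    return scaled;
  }

  /** Sum of all capacities, with overflow checking. */
  public static long sum(long[] values) {
    long total = 0;
    for (long v : values) {
      total = Math.addExact(total, v);
    }
    return total;
  }

  public static void main(String[] args) {
    long[] capacities = {100L, 100L}; // made-up numbers for illustration
    long usedSpacePerNN = 80L;
    int nNameNodes = 3;

    long used = totalUsed(usedSpacePerNN, nNameNodes);
    System.out.println("used=" + used
        + " original capacity=" + sum(capacities)
        + " scaled capacity=" + sum(scaleCapacities(capacities, nNameNodes)));
  }
}
```

With these made-up numbers the injected data (240) exceeds the original total capacity (200) but fits once the capacities are scaled (600), which matches the failure mode the issue describes.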
[jira] [Updated] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
[ https://issues.apache.org/jira/browse/HDFS-10803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10803:
-----------------------------
    Status: Patch Available  (was: Open)

Attach a patch for fixing this.
[jira] [Created] (HDFS-10803) TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
Yiqun Lin created HDFS-10803:
--------------------------------

             Summary: TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools fails intermittently due to no free space available
                 Key: HDFS-10803
                 URL: https://issues.apache.org/jira/browse/HDFS-10803
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin

The test {{TestBalancerWithMultipleNameNodes#testBalancing2OutOf3Blockpools}} fails intermittently. The stack trace (https://builds.apache.org/job/PreCommit-HDFS-Build/16534/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithMultipleNameNodes/testBalancing2OutOf3Blockpools/):
{code}
java.io.IOException: Creating block, no free space available
  at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset$BInfo.<init>(SimulatedFSDataset.java:151)
  at org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset.injectBlocks(SimulatedFSDataset.java:580)
  at org.apache.hadoop.hdfs.MiniDFSCluster.injectBlocks(MiniDFSCluster.java:2679)
  at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.unevenDistribution(TestBalancerWithMultipleNameNodes.java:405)
  at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancing2OutOf3Blockpools(TestBalancerWithMultipleNameNodes.java:516)
{code}
The error message means that the datanode's capacity has been used up and there is no space left to create a new block.
Looking into the code, the main cause seems to be that the {{capacities}} for the cluster are not constructed correctly at the second cluster startup, before the test redistributes the blocks.
The related code:
{code}
// Here we do redistribute blocks nNameNodes times for each node,
// we need to adjust the capacities. Otherwise it will cause the no
// free space errors sometimes.
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .nnTopology(MiniDFSNNTopology.simpleFederatedTopology(nNameNodes))
    .numDataNodes(nDataNodes)
    .racks(racks)
    .simulatedCapacities(newCapacities)
    .format(false)
    .build();
LOG.info("UNEVEN 11");
...
for(int n = 0; n < nNameNodes; n++) {
  // redistribute blocks
  final Block[][] blocksDN = TestBalancer.distributeBlocks(
      blocks[n], s.replication, distributionPerNN);

  for(int d = 0; d < blocksDN.length; d++)
    cluster.injectBlocks(n, d, Arrays.asList(blocksDN[d]));
  LOG.info("UNEVEN 13: n=" + n);
}
{code}
Because blocks are injected once per namenode, the totalUsed value grows to {{nNameNodes*usedSpacePerNN}} rather than {{usedSpacePerNN}}.
[jira] [Commented] (HDFS-10795) Fix an error in ReaderStrategy#ByteBufferStrategy
[ https://issues.apache.org/jira/browse/HDFS-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438397#comment-15438397 ]

Hudson commented on HDFS-10795:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10350 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10350/])
HDFS-10795. Fix an error in ReaderStrategy#ByteBufferStrategy. (kai.zheng: rev f4a21d3abaa7c5a9f0a0d8417e81f7eaf3d1b29a)
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ReaderStrategy.java

> Fix an error in ReaderStrategy#ByteBufferStrategy
> -------------------------------------------------
>
>                 Key: HDFS-10795
>                 URL: https://issues.apache.org/jira/browse/HDFS-10795
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: SammiChen
>            Assignee: SammiChen
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10795-v1.patch
>
>
> The {{readFromBlock}} function of ReaderStrategy#ByteBufferStrategy
> allocates a temporary ByteBuffer but never uses it. Refactor it to make
> the code more reasonable.
[jira] [Updated] (HDFS-10795) Fix an error in ReaderStrategy#ByteBufferStrategy
[ https://issues.apache.org/jira/browse/HDFS-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Zheng updated HDFS-10795:
-----------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0-alpha1
           Status: Resolved  (was: Patch Available)

Committed to 3.0.0-alpha1 and trunk branches. Thanks [~Sammi] for the contribution.
[jira] [Updated] (HDFS-10795) Fix an error in ReaderStrategy#ByteBufferStrategy
[ https://issues.apache.org/jira/browse/HDFS-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Zheng updated HDFS-10795:
-----------------------------
    Hadoop Flags: Reviewed
[jira] [Commented] (HDFS-10795) Fix an error in ReaderStrategy#ByteBufferStrategy
[ https://issues.apache.org/jira/browse/HDFS-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438368#comment-15438368 ]

Kai Zheng commented on HDFS-10795:
----------------------------------

The patch LGTM and +1.
[jira] [Updated] (HDFS-10795) Fix an error in ReaderStrategy#ByteBufferStrategy
[ https://issues.apache.org/jira/browse/HDFS-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Zheng updated HDFS-10795:
-----------------------------
    Summary: Fix an error in ReaderStrategy#ByteBufferStrategy  (was: Refactor ReaderStrategy#ByteBufferStrategy)
[jira] [Commented] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438345#comment-15438345 ]

Hadoop QA commented on HDFS-10798:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 57s | trunk passed |
| +1 | compile | 0m 49s | trunk passed |
| +1 | checkstyle | 0m 37s | trunk passed |
| +1 | mvnsite | 0m 57s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 46s | trunk passed |
| +1 | javadoc | 0m 58s | trunk passed |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 47s | the patch passed |
| +1 | javac | 0m 47s | the patch passed |
| -0 | checkstyle | 0m 32s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 581 unchanged - 1 fixed = 585 total (was 582) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | findbugs | 1m 54s | the patch passed |
| +1 | javadoc | 0m 54s | the patch passed |
| -1 | unit | 77m 32s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 97m 50s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.qjournal.client.TestQuorumJournalManager |
| | hadoop.hdfs.TestEncryptionZones |
| | hadoop.hdfs.server.namenode.TestCacheDirectives |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10798 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825572/HDFS-10789.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml |
| uname | Linux 68ea9baea93d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 81485db |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16545/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16545/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16545/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438333#comment-15438333 ]

James Clampffer commented on HDFS-10754:
----------------------------------------

Looks good to me, +1. Thanks for taking a couple more passes. Having these tools is great, and as a bonus they give a good set of things to run in a regression test suite.

{code}
//SetOwner DOES NOT guarantee that the handler will only be called once at a time, so we DO need locking in handlerSetOwner.
{code}
Thanks for making note of this. The default of 1 worker thread has made it very easy for race conditions to slip in.

There are some really nitpicky C++-isms that aren't worth holding this up over, but that might be worth using in the future.

{code}
std::string port = (uri.get_port()) ? std::to_string(uri.get_port().value()) : "";
{code}
This is a good candidate for optional::value_or to simplify things. uri.get_port().value() is already returning a string temporary, so you don't need to construct another rvalue; if you're doing it for clarity about the return type, then I can't complain. I haven't disassembled this specific example, but IIRC an optimized build will get rid of the second constructor anyway (and it's not in the critical path either).
{code}
std::string port = uri.get_port().value_or("");
{code}

libhdfspp/tools/tools_common.cpp should really be libhdfspp/tools/tools_common.cc. That'll get fixed soon enough by any other patches that deal with tools.

{code}
SetOwnerState(const std::string & username_, const std::string & groupname_,
              const std::function& handler_,
              uint64_t request_counter_, bool find_is_done_)
    : username(username_),
      // cutting some stuff out
      status(),
      lock() {
{code}
You don't need status and lock in the initializer list; the default constructor for member variables is called implicitly. Again, if that's more of a preference thing, I'm fine with it (or maybe it's needed and I'm missing something).

> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp,
> hdfs_chown, hdfs_chmod and hdfs_find.
> ---------------------------------------------------------------------
>
>                 Key: HDFS-10754
>                 URL: https://issues.apache.org/jira/browse/HDFS-10754
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10754.HDFS-8707.000.patch,
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch,
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch,
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch,
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch,
> HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch
>
>
[jira] [Updated] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-10798:
-----------------------------
    Description: Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. In a busy system a lower overhead might be desired. In other scenarios, more aggressive reporting might be desired. We should make the threshold configurable.  (was: Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. In a busy system this might add too much overhead. We should make the threshold configurable.)

> Make the threshold of reporting FSNamesystem lock contention configurable
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10798
>                 URL: https://issues.apache.org/jira/browse/HDFS-10798
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: logging, namenode
>            Reporter: Zhe Zhang
>            Assignee: Erik Krogen
>              Labels: newbie
>         Attachments: HDFS-10789.001.patch
>
>
> Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second.
> In a busy system a lower overhead might be desired. In other scenarios, more
> aggressive reporting might be desired. We should make the threshold
> configurable.
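Editor's note: the change requested above amounts to replacing a hard-coded constant with a value read from configuration, falling back to the current 1 second default. The sketch below illustrates the idea with java.util.Properties so that it stays self-contained; in Hadoop itself the value would more likely be read through Configuration#getLong. The class name, the key name, and the "report when the hold time reaches the threshold" comparison are all assumptions for illustration, not the contents of the attached patch.

```java
import java.util.Properties;

// Hypothetical illustration only; not taken from the HDFS-10798 patch.
final class LockThresholdSketch {

  // Assumed key name, chosen for this example.
  static final String THRESHOLD_KEY = "example.write-lock-reporting-threshold-ms";

  // Matches the 1 second value mentioned in the issue description.
  static final long THRESHOLD_DEFAULT_MS = 1000L;

  /** Reads the threshold from the configuration, or returns the default. */
  public static long readThresholdMs(Properties conf) {
    String raw = conf.getProperty(THRESHOLD_KEY);
    return raw == null ? THRESHOLD_DEFAULT_MS : Long.parseLong(raw.trim());
  }

  /** Decides whether a lock hold of the given duration should be reported. */
  public static boolean shouldReport(long heldMs, long thresholdMs) {
    return heldMs >= thresholdMs;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    long defaultThreshold = readThresholdMs(conf);

    // A busy cluster raises the threshold to cut reporting overhead.
    conf.setProperty(THRESHOLD_KEY, "5000");
    long relaxedThreshold = readThresholdMs(conf);

    long heldMs = 1500L;
    System.out.println("report with default: " + shouldReport(heldMs, defaultThreshold));
    System.out.println("report with relaxed: " + shouldReport(heldMs, relaxedThreshold));
  }
}
```

Raising the value reduces reporting overhead on a busy system, while lowering it gives the more aggressive reporting mentioned in the updated description.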
[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438289#comment-15438289 ]

Vinayakumar B commented on HDFS-10652:
--------------------------------------

Thanks [~yzhangal] for finding the problem in the test.
The updated test looks good to me.

I think it would be better to change {{System.out.println(..)}} to {{LOG.info()}} in the test.
The rest all looks fine.

+1 once addressed.

Sorry, I am currently traveling and not able to update the patch myself. Thanks for taking care of this.

> Add a unit test for HDFS-4660
> -----------------------------
>
>                 Key: HDFS-10652
>                 URL: https://issues.apache.org/jira/browse/HDFS-10652
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Yongjun Zhang
>            Assignee: Vinayakumar B
>         Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch,
> HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch,
> HDFS-10652.006.patch, HDFS-10652.007.patch
>
>
[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438271#comment-15438271 ]

Hadoop QA commented on HDFS-10652:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 21s | trunk passed |
| +1 | compile | 0m 43s | trunk passed |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 41s | trunk passed |
| +1 | javadoc | 0m 54s | trunk passed |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| +1 | checkstyle | 0m 27s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 46s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 57m 2s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 76m 28s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSShell |
| | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10652 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825562/HDFS-10652.007.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 58bdef09db21 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 81485db |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16544/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16544/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16544/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Add a unit test for HDFS-4660
> -----------------------------
>
>                 Key: HDFS-10652
>                 URL: https://issues.apache.org/jira/browse/HDFS-10652
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Yongjun Zhang
>            Assignee: Vinayakumar B
>         Attachments:
[jira] [Commented] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
[ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438241#comment-15438241 ] Yiqun Lin commented on HDFS-10748: -- Thanks [~xyao] for the review and commit! > TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout > > > Key: HDFS-10748 > URL: https://issues.apache.org/jira/browse/HDFS-10748 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin > Fix For: 2.8.0 > > Attachments: HDFS-10748.001.patch, HDFS-10748.002.patch > > > This was fixed by HDFS-7886. But some recent [Jenkins > Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > started seeing this again: > {code} > Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate > testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) > Time elapsed: 43.861 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for > /test/testTruncateWithDataNodesRestart to reach 3 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-10798: --- Attachment: HDFS-10789.001.patch > Make the threshold of reporting FSNamesystem lock contention configurable > - > > Key: HDFS-10798 > URL: https://issues.apache.org/jira/browse/HDFS-10798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Zhe Zhang >Assignee: Erik Krogen > Labels: newbie > Attachments: HDFS-10789.001.patch > > > Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. > In a busy system this might add too much overhead. We should make the > threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-10798: --- Status: Patch Available (was: In Progress) Add configuration 'dfs.namenode.write-lock.reporting.threshold.ms' to configure this value. > Make the threshold of reporting FSNamesystem lock contention configurable > - > > Key: HDFS-10798 > URL: https://issues.apache.org/jira/browse/HDFS-10798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Zhe Zhang >Assignee: Erik Krogen > Labels: newbie > > Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. > In a busy system this might add too much overhead. We should make the > threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
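A minimal sketch of what a configurable threshold could look like, assuming plain java.util.Properties in place of Hadoop's Configuration class. The key name comes from the comment above and the 1-second default from the issue description; the class and method names, and the choice of >= for the comparison, are hypothetical and not taken from the attached patch.

```java
import java.util.Properties;

/** Hypothetical helper; the real patch would presumably read Hadoop's Configuration. */
final class WriteLockThreshold {
  // Key quoted in the comment above.
  static final String KEY = "dfs.namenode.write-lock.reporting.threshold.ms";
  // The issue description says the threshold is currently fixed at 1 second.
  static final long DEFAULT_MS = 1000L;

  /** Returns the configured threshold in milliseconds, or the default if the key is unset. */
  static long thresholdMs(Properties conf) {
    String raw = conf.getProperty(KEY);
    return raw == null ? DEFAULT_MS : Long.parseLong(raw.trim());
  }

  /** True if a write lock held for heldMs should be reported (assumed comparison). */
  static boolean shouldReport(long heldMs, long thresholdMs) {
    return heldMs >= thresholdMs;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "5000"); // raise the threshold to 5 seconds
    long threshold = thresholdMs(conf);
    // A 1.2 s hold is below the 5 s threshold, so nothing would be logged.
    System.out.println("report 1200 ms hold: " + shouldReport(1200L, threshold));
  }
}
```

Raising the value on a busy NameNode would reduce how often the report is logged, which is the overhead the description is concerned about.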
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438210#comment-15438210 ] Manoj Govindassamy commented on HDFS-10793: --- -- The TestNameNodeMetadataConsistency failure is not related to this patch -- Manually verified the patch as mentioned in comments 1 and 2 -- All checkstyle issues are related to the number of arguments in methods exceeding the recommended count of 7 > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch, HDFS-10793.002.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
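The compatibility problem in the description can be illustrated with a simplified sketch. All class names, method names and parameters below are hypothetical stand-ins, not the real HdfsAuditLogger signatures, and this is one common way to restore compatibility rather than necessarily what the attached patches do: keep the original abstract method untouched and add the wider signature as a non-abstract overload that delegates to it.

```java
final class CompatDemo {
  public static void main(String[] args) {
    AuditLoggerBase logger = new LegacyLogger();
    // A caller using the newer, wider signature still reaches the legacy subclass.
    System.out.println(logger.logAuditEvent(true, "alice", "open", "extra-info"));
  }
}

/** Hypothetical, simplified stand-in for a Public/Evolving base class. */
abstract class AuditLoggerBase {
  /** Original signature: existing subclasses were compiled against this one. */
  abstract String logAuditEvent(boolean succeeded, String user, String cmd);

  /**
   * Newer overload with one extra parameter. Because it is not abstract,
   * subclasses compiled before it existed still load and link; by default
   * the extra argument is simply dropped.
   */
  String logAuditEvent(boolean succeeded, String user, String cmd, String extra) {
    return logAuditEvent(succeeded, user, cmd);
  }
}

/** A subclass written before the extra parameter existed. */
final class LegacyLogger extends AuditLoggerBase {
  @Override
  String logAuditEvent(boolean succeeded, String user, String cmd) {
    return user + " " + cmd + " " + (succeeded ? "ok" : "denied");
  }
}
```

Had the extra parameter been added directly to the abstract method instead, LegacyLogger would no longer implement it, and a call through the new signature would fail at run time with AbstractMethodError unless the subclass were rewritten and recompiled.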
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438204#comment-15438204 ] Hadoop QA commented on HDFS-10793: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 26s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 182 unchanged - 2 fixed = 185 total (was 184) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 50s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 98m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | HDFS-10793 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825549/HDFS-10793.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 433ded302311 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 81485db | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16543/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16543/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16543/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16543/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Fix HdfsAuditLogger
[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438191#comment-15438191 ] Yongjun Zhang commented on HDFS-10652: -- Added a comment instead to address the second comment from Wei-Chiu, and uploaded rev 007. Thanks for taking a further look, [~jojochuang] and [~vinayrpet]. > Add a unit test for HDFS-4660 > - > > Key: HDFS-10652 > URL: https://issues.apache.org/jira/browse/HDFS-10652 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Vinayakumar B > Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, > HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, > HDFS-10652.006.patch, HDFS-10652.007.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10652: - Attachment: HDFS-10652.007.patch > Add a unit test for HDFS-4660 > - > > Key: HDFS-10652 > URL: https://issues.apache.org/jira/browse/HDFS-10652 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Vinayakumar B > Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, > HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, > HDFS-10652.006.patch, HDFS-10652.007.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-10794: --- Summary: [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work (was: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work) > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10794-00.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. The Datanode receives > {{blocktomove}} details via the heartbeat response from the NN. More specifically, > it's a datanode-side extension to handle the block storage movement commands. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10802) [SPS]: Add satisfyStoragePolicy API in HdfsAdmin
Uma Maheswara Rao G created HDFS-10802: -- Summary: [SPS]: Add satisfyStoragePolicy API in HdfsAdmin Key: HDFS-10802 URL: https://issues.apache.org/jira/browse/HDFS-10802 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is to track the work of adding a user/admin API for calling satisfyStoragePolicy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10801) [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN
Uma Maheswara Rao G created HDFS-10801: -- Summary: [SPS]: Protocol buffer changes for sending storage movement commands from NN to DN Key: HDFS-10801 URL: https://issues.apache.org/jira/browse/HDFS-10801 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Uma Maheswara Rao G Assignee: Rakesh R This JIRA is for tracking the work of protocol buffer changes for sending the storage movement commands from NN to DN -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438128#comment-15438128 ] Sean Mackrory commented on HDFS-10797: -- To reproduce the discrepancy, follow this procedure. I put a 100 MB file into HDFS and snapshot it (hadoop fs -du -s reports 100 MB * replication factor after both operations), and then append another 100 MB onto it (hadoop fs -du -s will report 200 MB * replication factor at that point). If I move the file to trash or simply rename it, hadoop fs -du -s starts reporting 300 MB * replication factor in the second column. I believe at this point it is counting some of the blocks shared between the snapshot and the regular file twice, because it views the move operation the same as a delete, but since the file wasn't actually deleted it gets counted again.
{quote}
dd if=/dev/zero of=100MB.zero bs=1M count=100
bin/hadoop fs -mkdir -p /user/sean
bin/hadoop fs -chown sean /user/sean
bin/hadoop fs -put 100MB.zero /user/sean/HDFS-10797
bin/hdfs dfsadmin -allowSnapshot /user/sean
bin/hdfs dfs -createSnapshot /user/sean s1
bin/hadoop fs -appendToFile 100MB.zero /user/sean/HDFS-10797
bin/hadoop fs -du -s /user/sean
bin/hadoop fs -rm /user/sean/HDFS-10797 # or simply rename with mv
bin/hadoop fs -du -s /user/sean
{quote}
> Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). 
However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
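The arithmetic behind the discrepancy can be sketched as follows. This is a toy model under stated assumptions, not the logic of DirectoryWithSnapshotFeature: it pretends each 100 MB written is one block with a made-up ID (real HDFS would append into the existing last block), ignores replication, and uses hypothetical class and method names. It only shows how tallying the snapshot's "deleted" list and the live file independently yields 300 MB where 200 MB is expected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SnapshotDuDemo {
  static final long BLOCK_MB = 100L; // simplification: one block per 100 MB written

  /** Naive tally: blocks on the snapshot's deleted list plus live blocks, counted independently. */
  static long naiveMb(List<String> snapshotDeleted, List<String> live) {
    return (snapshotDeleted.size() + live.size()) * BLOCK_MB;
  }

  /** Tally that counts each distinct block only once. */
  static long dedupedMb(List<String> snapshotDeleted, List<String> live) {
    Set<String> distinct = new HashSet<>(snapshotDeleted);
    distinct.addAll(live);
    return distinct.size() * BLOCK_MB;
  }

  public static void main(String[] args) {
    // After the rename, the diff records the old path as deleted (first 100 MB) ...
    List<String> snapshotDeleted = Arrays.asList("blk_1");
    // ... while the renamed file is still live and holds both 100 MB pieces.
    List<String> live = Arrays.asList("blk_1", "blk_2");
    System.out.println("naive:   " + naiveMb(snapshotDeleted, live) + " MB");   // 300 MB
    System.out.println("deduped: " + dedupedMb(snapshotDeleted, live) + " MB"); // 200 MB
  }
}
```

The deduplicating set is only there to make the expected figure visible; how the actual fix should avoid the overlap is left to the issue.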
[jira] [Created] (HDFS-10800) [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks which were placed in wrong storages than what NN is expecting.
Uma Maheswara Rao G created HDFS-10800: -- Summary: [SPS]: Storage Policy Satisfier daemon thread in Namenode to find the blocks which were placed in wrong storages than what NN is expecting. Key: HDFS-10800 URL: https://issues.apache.org/jira/browse/HDFS-10800 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This JIRA is for implementing a daemon thread called StoragePolicySatisfier in the Namenode, which should scan the blocks of the requested files and find those that were placed in the wrong storage on the DNs. The idea is: # When a user calls satisfyStoragePolicy on some files/dirs, they should be tracked in the NN; the StoragePolicyDaemon thread will then pick up the files one by one and check for blocks that might have been placed in a different storage on the DN than the NN expects. # After checking all of them, it should also construct the data structures holding the information required to move a block from one storage to another. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
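The two steps in the description above can be sketched roughly as follows. Everything here is hypothetical (the class names, the Replica and Move types, the use of plain strings for storage types), and the scan runs inline rather than in a daemon thread; it only illustrates draining a queue of tracked files, comparing each block's actual storage with the expected one, and collecting the resulting move instructions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class SatisfierSketch {
  /** A block replica with the storage type it sits on and the one the policy expects. */
  static final class Replica {
    final String blockId, actual, expected;
    Replica(String blockId, String actual, String expected) {
      this.blockId = blockId; this.actual = actual; this.expected = expected;
    }
  }

  /** Information needed to move one block from one storage type to another. */
  static final class Move {
    final String blockId, from, to;
    Move(String blockId, String from, String to) {
      this.blockId = blockId; this.from = from; this.to = to;
    }
  }

  /** Step 1: take tracked files one by one. Step 2: build moves for misplaced blocks. */
  static List<Move> planMoves(Queue<List<Replica>> trackedFiles) {
    List<Move> moves = new ArrayList<>();
    List<Replica> file;
    while ((file = trackedFiles.poll()) != null) {
      for (Replica r : file) {
        if (!r.actual.equals(r.expected)) {
          moves.add(new Move(r.blockId, r.actual, r.expected));
        }
      }
    }
    return moves;
  }

  public static void main(String[] args) {
    Queue<List<Replica>> tracked = new ArrayDeque<>();
    tracked.add(Arrays.asList(
        new Replica("blk_1", "DISK", "SSD"),  // misplaced
        new Replica("blk_2", "SSD", "SSD"))); // already satisfied
    for (Move m : planMoves(tracked)) {
      System.out.println(m.blockId + ": " + m.from + " -> " + m.to);
    }
  }
}
```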
[jira] [Commented] (HDFS-9507) LeaseRenewer Logging Under-Reporting
[ https://issues.apache.org/jira/browse/HDFS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438051#comment-15438051 ] Sameer Abhyankar commented on HDFS-9507: Hello - I was looking through the code for this and it seems that DFSClient#renewLease() will only return a false when either the client is not running or there are no files being written. Otherwise DFSClient#renewLease will either renew the lease successfully and return "true" or it will throw an Exception (which will bubble up). The behavior of LeaseRenewer#renew to either log a warning or bubble up the Exception would be valid, correct? As for the LOG level, I agree it should be logged as a warn instead of a debug. > LeaseRenewer Logging Under-Reporting > > > Key: HDFS-9507 > URL: https://issues.apache.org/jira/browse/HDFS-9507 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: BELUGA BEHR >Priority: Minor > > Why is it that in LeaseRenewer#run() failures to renew a lease on a file are > reported with "warn" level logging, but in LeaseRenewer#renew() it is > reported with a "debug" level warn? > In LeaseRenewer#renew(), if the method renewLease() returns 'false' then the > problem is silently discarded (continue, no Exception is thrown) and the next > client in the list tries to renew. > {code:title=LeaseRenewer.java|borderStyle=solid} > private void run(final int id) throws InterruptedException { > ... > try { > renew(); > lastRenewed = Time.monotonicNow(); > } catch (SocketTimeoutException ie) { > LOG.warn("Failed to renew lease for " + clientsString() + " for " > + (elapsed/1000) + " seconds. Aborting ...", ie); > synchronized (this) { > while (!dfsclients.isEmpty()) { > DFSClient dfsClient = dfsclients.get(0); > dfsClient.closeAllFilesBeingWritten(true); > closeClient(dfsClient); > } > //Expire the current LeaseRenewer thread. 
> emptyTime = 0; > } > break; > } catch (IOException ie) { > LOG.warn("Failed to renew lease for " + clientsString() + " for " > + (elapsed/1000) + " seconds. Will retry shortly ...", ie); > } > } > ... > } > private void renew() throws IOException { > { >... > if (!c.renewLease()) { > LOG.debug("Did not renew lease for client {}", c); > continue; > } >... > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
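A minimal sketch of the logging change agreed on in the comment above, assuming a hypothetical Client interface in place of DFSClient and java.util.logging in place of the project's logger. It is not the LeaseRenewer code: it only shows a false return from renewLease() being reported at warn level while an exception is left to propagate to the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

final class RenewSketch {
  private static final Logger LOG = Logger.getLogger(RenewSketch.class.getName());

  /** Hypothetical stand-in for DFSClient; only the call discussed above is modeled. */
  interface Client {
    boolean renewLease() throws IOException;
  }

  /**
   * Tries to renew every client's lease. A false return is logged at warn
   * level and the loop moves on; an IOException propagates to the caller.
   * Returns the clients whose lease was not renewed.
   */
  static List<Client> renewAll(List<Client> clients) throws IOException {
    List<Client> notRenewed = new ArrayList<>();
    for (Client c : clients) {
      if (!c.renewLease()) {
        LOG.warning("Did not renew lease for client " + c);
        notRenewed.add(c);
      }
    }
    return notRenewed;
  }

  public static void main(String[] args) throws IOException {
    Client active = () -> true; // lease renewed
    Client idle = () -> false;  // e.g. not running, or no files being written
    List<Client> skipped = renewAll(Arrays.asList(active, idle));
    System.out.println("clients not renewed: " + skipped.size());
  }
}
```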
[jira] [Updated] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-10793: -- Attachment: HDFS-10793.002.patch Attaching v002 patch -- took care of styling issues. > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch, HDFS-10793.002.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-10798: - Component/s: logging > Make the threshold of reporting FSNamesystem lock contention configurable > - > > Key: HDFS-10798 > URL: https://issues.apache.org/jira/browse/HDFS-10798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: logging, namenode >Reporter: Zhe Zhang >Assignee: Erik Krogen > Labels: newbie > > Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. > In a busy system this might add too much overhead. We should make the > threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-10798 started by Erik Krogen. -- > Make the threshold of reporting FSNamesystem lock contention configurable > - > > Key: HDFS-10798 > URL: https://issues.apache.org/jira/browse/HDFS-10798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Zhe Zhang >Assignee: Erik Krogen > Labels: newbie > > Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. > In a busy system this might add too much overhead. We should make the > threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437863#comment-15437863 ] Kihwal Lee commented on HDFS-9145: -- I can reproduce it consistently. But when I tried the approach in the latest patch in HDFS-8915, it passes. If I undo the patch, the test failure is reproduced 100% of the time. Let's get HDFS-8915 moving. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437857#comment-15437857 ] Mingliang Liu edited comment on HDFS-9145 at 8/25/16 10:40 PM: --- [~kihwal] are you able to reproduce this failure consistently? How about the test without this patch? On my local machine, I can not reproduce the bug on Java8 against the {{branch-2.7}} specifying {{TestFSNamesystem}} class (9 test cases). was (Author: liuml07): [~kihwal] are you able to reproduce this failure consistently? How about the test without this patch? On my local machine, I can not reproduce the bug on Java8 against the {{branch-2}} specifying {{TestFSNamesystem}} class (9 test cases). > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437857#comment-15437857 ] Mingliang Liu commented on HDFS-9145: - [~kihwal] are you able to reproduce this failure consistently? How about the test without this patch? On my local machine, I can not reproduce the bug on Java8 against the {{branch-2}} specifying {{TestFSNamesystem}} class (9 test cases). > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9145: - Attachment: testlog.txt I am not sure how helpful this log will be. I am using openjdk 1.8.0_101 on my box. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437840#comment-15437840 ] Kihwal Lee commented on HDFS-9145: -- For me it fails when I run {{TestFSNamesystem}}, but passes when I specify {{TestFSNamesystem#testFSLockGetWaiterCount}}. I will get the test log and upload here. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437821#comment-15437821 ] Mingliang Liu commented on HDFS-10793: -- Looks good once Andrew's comment is addressed. Thanks. > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
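As background on the incompatibility described in the quoted summary: if an abstract method in a base class is replaced by one with an extra parameter, a subclass compiled against the old class no longer implements the method the caller invokes, and the call fails at runtime with `AbstractMethodError`. One common way to avoid that is to keep the old signature and add the new one as a non-abstract overload. The sketch below is an illustration only, not the HDFS-10793 patch; `AuditLoggerBase`, `LegacyLogger`, and the parameter lists are hypothetical and much simpler than the real `HdfsAuditLogger`.

```java
public class AuditCompatSketch {
  /** Hypothetical base class standing in for a Public/Evolving API class. */
  public abstract static class AuditLoggerBase {
    /** Original signature: existing subclasses were compiled against this. */
    public abstract void logAuditEvent(boolean succeeded, String user, String cmd);

    /**
     * Newer overload with one extra argument. It is not abstract and delegates
     * to the original method, so old subclasses keep working unchanged.
     */
    public void logAuditEvent(boolean succeeded, String user, String cmd,
        String callerContext) {
      logAuditEvent(succeeded, user, cmd);
    }
  }

  /** A subclass written before the extra argument existed. */
  public static class LegacyLogger extends AuditLoggerBase {
    final StringBuilder out = new StringBuilder();

    @Override
    public void logAuditEvent(boolean succeeded, String user, String cmd) {
      out.append(user).append(':').append(cmd).append(':').append(succeeded);
    }
  }

  /** Calls the new overload on an old-style subclass and returns what it logged. */
  public static String demo() {
    LegacyLogger logger = new LegacyLogger();
    AuditLoggerBase base = logger;
    base.logAuditEvent(true, "hdfs", "mkdirs", "someContext");
    return logger.out.toString();
  }
}
```

Subclasses that care about the extra argument can override the four-argument overload; everything else falls through to the original method.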
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437807#comment-15437807 ] Manoj Govindassamy commented on HDFS-10793: --- Sure [~andrew.wang]. Will make the suggested changes. I will wait for the others' reviews so that I can post the next patch with all comments incorporated. Thanks for the review. > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437796#comment-15437796 ] Zhe Zhang commented on HDFS-9145: - Thanks [~liuml07], it looks very likely. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437792#comment-15437792 ] Mingliang Liu edited comment on HDFS-9145 at 8/25/16 10:04 PM: --- Hm Is this an unrelated bug [HDFS-8915]? was (Author: liuml07): Hm Is this a unrelated bug [HDFS-8915]? > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437792#comment-15437792 ] Mingliang Liu commented on HDFS-9145: - Hm Is this a unrelated bug [HDFS-8915]? > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437781#comment-15437781 ] Zhe Zhang commented on HDFS-9145: - I can't reproduce the {{testFSLockGetWaiterCount}} failure locally. [~kihwal] [~liuml07] How about in your local environments? > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437773#comment-15437773 ] Zhe Zhang commented on HDFS-9145: - Thanks Kihwal. I'm trying to fix this. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It will be helpful that if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10475) Adding metrics for long FSD lock
[ https://issues.apache.org/jira/browse/HDFS-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437760#comment-15437760 ] Zhe Zhang commented on HDFS-10475: -- Thanks Xiaoyu for clarifying this. Do you still plan to work on it? Some of my colleagues might be interested in implementing this if you are OK with it. > Adding metrics for long FSD lock > > > Key: HDFS-10475 > URL: https://issues.apache.org/jira/browse/HDFS-10475 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > This is a follow up of the comment on HADOOP-12916 and > [here|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15310837=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15310837] > add more metrics and WARN/DEBUG logs for long FSD/FSN locking operations on > namenode similar to what we have for slow write/network WARN/metrics on > datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437742#comment-15437742 ]

Kihwal Lee commented on HDFS-9145:
--

There is a test failure in branch-2.7 after this.
{noformat}
---
 T E S T S
---
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem
testFSLockGetWaiterCount(org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem) Time elapsed: 0.004 sec <<< FAILURE!
java.lang.AssertionError: Expected number of blocked thread not found expected:<3> but was:<2>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.apache.hadoop.hdfs.server.namenode.TestFSNamesystem.testFSLockGetWaiterCount(TestFSNamesystem.java:244)
{noformat}
It seems to pass when this test case is run separately. It might be due to interactions with other tests in the suite.

> Tracking methods that hold FSNamesytemLock for too long
> ---
>
> Key: HDFS-9145
> URL: https://issues.apache.org/jira/browse/HDFS-9145
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Jing Zhao
> Assignee: Mingliang Liu
> Fix For: 2.8.0, 2.7.4
>
> Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch,
> HDFS-9145.002.patch, HDFS-9145.003.patch
>
>
> It will be helpful that if we can have a way to track (or at least log a msg)
> if some operation is holding the FSNamesystem lock for a long time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
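A general note on assertions like `expected:<3> but was:<2>`: a thread that has been started is not necessarily queued on the lock yet when the waiter count is read, so a count taken immediately can come up short. Whether that is what happens in this failure is not established in the thread. The sketch below is not the code in `TestFSNamesystem`; it uses a plain `ReentrantReadWriteLock` with hypothetical names, and only illustrates polling `getQueueLength()` with a deadline instead of reading it once.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WaiterCountSketch {
  /**
   * Starts the given number of threads that block on a held write lock and
   * returns the number of queued waiters observed before the lock is released.
   */
  public static int countWaiters(int threads) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    lock.writeLock().lock(); // the caller holds the lock, so the others must wait
    try {
      for (int i = 0; i < threads; i++) {
        Thread t = new Thread(() -> {
          lock.writeLock().lock();
          lock.writeLock().unlock();
        });
        t.setDaemon(true);
        t.start();
      }
      // Poll until every thread is queued, giving up after ten seconds.
      long deadline = System.nanoTime() + 10_000_000_000L;
      while (lock.getQueueLength() < threads && System.nanoTime() < deadline) {
        sleepQuietly(10);
      }
      return lock.getQueueLength();
    } finally {
      lock.writeLock().unlock();
    }
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

Note that the Javadoc describes `getQueueLength()` as an estimate intended for monitoring, so a test built on it has to tolerate the count changing while threads come and go.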
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437718#comment-15437718 ] Andrew Wang commented on HDFS-10793: Overall looks great to me, thanks for picking this up Manoj. One little nit, could we undo some of the whitespace changes? I think your IDE is configured to align parameters, but I think our normal convention is to double indent. I also would mildly prefer if you generate the patch without "--no-prefix", since then I can apply with just "git apply hdfs.patch". [~arpitagarwal] / [~liuml07] either of you want to review too? Should be a quick one. > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10768) Optimize mkdir ops
[ https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437708#comment-15437708 ] Kihwal Lee commented on HDFS-10768: --- I will review the latest patch by tomorrow. > Optimize mkdir ops > -- > > Key: HDFS-10768 > URL: https://issues.apache.org/jira/browse/HDFS-10768 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10768.1.patch, HDFS-10768.patch > > > Directory creation causes excessive object allocation: ex. an immutable list > builder, containing the string of components converted from the IIP's > byte[]s, sublist views of the string list, iterable, followed by string to > byte[] conversion. This can all be eliminated by accessing the component's > byte[] in the IIP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10768) Optimize mkdir ops
[ https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437701#comment-15437701 ] Kihwal Lee commented on HDFS-10768: --- {{TestEditLogJournalFailures}} was broken by HADOOP-13465. It has been reverted from trunk. {{TestRollingFileSystemSinkWithHdfs}} passes. Cannot reproduce the failure. > Optimize mkdir ops > -- > > Key: HDFS-10768 > URL: https://issues.apache.org/jira/browse/HDFS-10768 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10768.1.patch, HDFS-10768.patch > > > Directory creation causes excessive object allocation: ex. an immutable list > builder, containing the string of components converted from the IIP's > byte[]s, sublist views of the string list, iterable, followed by string to > byte[] conversion. This can all be eliminated by accessing the component's > byte[] in the IIP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437671#comment-15437671 ]

Hadoop QA commented on HDFS-10742:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 24s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 41s | trunk passed |
| +1 | compile | 0m 50s | trunk passed |
| +1 | checkstyle | 0m 31s | trunk passed |
| +1 | mvnsite | 0m 56s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 45s | trunk passed |
| +1 | javadoc | 1m 3s | trunk passed |
| +1 | mvninstall | 0m 53s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 25s | the patch passed |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 48s | the patch passed |
| +1 | javadoc | 0m 58s | the patch passed |
| -1 | unit | 76m 13s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 97m 2s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
| | hadoop.hdfs.server.namenode.TestEditLogJournalFailures |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10742 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825515/HDFS-10742.008.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux ab90c40c203a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1360bd2 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16542/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16542/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16542/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Measurement of lock held time in FsDatasetImpl
> --
>
> Key: HDFS-10742
> URL:
[jira] [Updated] (HDFS-8617) Throttle DiskChecker#checkDirs() speed.
[ https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8617: Resolution: Duplicate Fix Version/s: 3.0.0-beta1 2.9.0 Status: Resolved (was: Patch Available) Thanks [~arpitagarwal] for pointing this out. I will close this JIRA as a duplicate. > Throttle DiskChecker#checkDirs() speed. > --- > > Key: HDFS-8617 > URL: https://issues.apache.org/jira/browse/HDFS-8617 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: HDFS-8617.000.patch > > > As described in HDFS-8564, {{DiskChecker.checkDirs(finalizedDir)}} is > causing excessive I/O because {{finalizedDirs}} might have up to 64K > sub-directories (HDFS-6482). > This patch proposes to limit the rate of I/O operations in > {{DiskChecker.checkDirs()}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10792) RedundantEditLogInputStream should log caught exceptions
[ https://issues.apache.org/jira/browse/HDFS-10792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10792: --- Issue Type: Improvement (was: Bug) > RedundantEditLogInputStream should log caught exceptions > > > Key: HDFS-10792 > URL: https://issues.apache.org/jira/browse/HDFS-10792 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-10792.01.patch > > > There are a few places in {{RedundantEditLogInputStream}} where an > IOException is caught but never logged. We should improve the logging of > these exceptions to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
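The pattern requested in the quoted summary, logging an `IOException` before moving on to the next stream rather than swallowing it, can be sketched generically. This is not `RedundantEditLogInputStream` and not the attached patch: the `Source` interface, `readFirstAvailable`, and the use of `java.util.logging` are assumptions made for the illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FailoverReadSketch {
  private static final Logger LOG =
      Logger.getLogger(FailoverReadSketch.class.getName());

  /** Hypothetical stand-in for one of several redundant input streams. */
  public interface Source {
    String name();
    String read() throws IOException;
  }

  /** Returns the first successful read, logging every failure on the way. */
  public static String readFirstAvailable(List<Source> sources) throws IOException {
    IOException last = null;
    for (Source s : sources) {
      try {
        return s.read();
      } catch (IOException e) {
        // Log with the stack trace instead of dropping the exception silently.
        LOG.log(Level.WARNING, "Read from " + s.name() + " failed, trying next", e);
        last = e;
      }
    }
    throw new IOException("all sources failed", last);
  }

  private static Source source(String name, String value) {
    return new Source() {
      @Override public String name() { return name; }
      @Override public String read() throws IOException {
        if (value == null) {
          throw new IOException(name + " is unreadable");
        }
        return value;
      }
    };
  }

  /** First source fails (and is logged), second one succeeds. */
  public static String demo() {
    try {
      return readFirstAvailable(Arrays.asList(source("a", null), source("b", "op-42")));
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Keeping the last exception as the cause of the final failure preserves at least one underlying error even when every source fails.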
[jira] [Work started] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDFS-10799 started by Wei-Chiu Chuang.
--
> NameNode should use loginUser(hdfs) to serve iNotify requests
> -
>
> Key: HDFS-10799
> URL: https://issues.apache.org/jira/browse/HDFS-10799
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10799.001.patch
>
>
> When a NameNode serves iNotify requests from a client, it verifies that the
> client has superuser permission and then uses the client's Kerberos principal
> to read edits from the journal nodes.
> However, if the client does not renew its TGT, the connection from the
> NameNode to the journal nodes may fail. In that case, the NameNode thinks the
> edits are corrupt and prints a scary error message:
> "During automatic edit log failover, we noticed that all of the remaining
> edit log streams are shorter than the current one! The best remaining edit
> log ends at transaction 11577603, but we thought we could read up to
> transaction 11577606. If you continue, metadata will be lost forever!"
> However, the edits are actually good. NameNode _should not freak out when an
> iNotify client's TGT expires_.
> I think an easy solution to this bug is for the NameNode, after it verifies
> that the client has superuser permission, to call
> {{SecurityUtil.doAsLoginUser}} and then read the edits. This will make sure
> the operation does not fail due to an expired client ticket.
> Excerpt of related logs:
> {noformat}
> 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:h...@example.com (auth:KERBEROS) cause:java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!
> 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 112 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client IP:port] Call#73 Retry#0
> java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!
>     at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
>     at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
>     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
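The proposal in the description (check the caller's privilege first, then read the edits under the NameNode's own login identity) has roughly the shape sketched below. This is a self-contained toy, not the attached patch and not Hadoop code: the local `doAsLoginUser` only mimics the idea behind `SecurityUtil.doAsLoginUser`, and `CURRENT_USER`, `readEdits`, and the `callerIsSuperuser` flag are hypothetical stand-ins.

```java
import java.security.PrivilegedExceptionAction;

public class LoginUserReadSketch {
  // Stand-in for the identity attached to the current RPC call.
  private static final ThreadLocal<String> CURRENT_USER =
      ThreadLocal.withInitial(() -> "client");
  // Stand-in for the NameNode's own (login) identity.
  private static final String LOGIN_USER = "hdfs";

  /** Hypothetical stand-in: runs the action with the login identity, then restores. */
  static <T> T doAsLoginUser(PrivilegedExceptionAction<T> action) throws Exception {
    String previous = CURRENT_USER.get();
    CURRENT_USER.set(LOGIN_USER);
    try {
      return action.run();
    } finally {
      CURRENT_USER.set(previous);
    }
  }

  /** Pretend journal read that records which identity performed it. */
  static String readEdits(long fromTxid) {
    return "edits from " + fromTxid + " read as " + CURRENT_USER.get();
  }

  /** Privilege check uses the caller; the read itself uses the login identity. */
  public static String getEditsFromTxid(boolean callerIsSuperuser, long txid) {
    if (!callerIsSuperuser) {
      throw new SecurityException("superuser privilege required");
    }
    try {
      return doAsLoginUser(() -> readEdits(txid));
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The point of the ordering is that authorization still depends on the client, while the read no longer depends on the client's ticket being valid.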
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437625#comment-15437625 ] Manoj Govindassamy commented on HDFS-10793: --- -- All 3 checkstyle issues are reported because the number of arguments in the method exceeds 7. The patch does not introduce anything new; it is just a bit of refactoring. -- The TestEditLogJournalFailures failure is not related to this patch; the test fails even without this patch. > Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184 > -- > > Key: HDFS-10793 > URL: https://issues.apache.org/jira/browse/HDFS-10793 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy >Priority: Blocker > Attachments: HDFS-10793.001.patch > > > HDFS-9184 added a new parameter to an existing method signature in > HdfsAuditLogger, which is a Public/Evolving class. This breaks binary > compatibility with implementing subclasses. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437597#comment-15437597 ]

Hadoop QA commented on HDFS-10793:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 37s | trunk passed |
| +1 | compile | 0m 46s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 50s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 42s | trunk passed |
| +1 | javadoc | 0m 56s | trunk passed |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| -0 | checkstyle | 0m 25s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 182 unchanged - 2 fixed = 185 total (was 184) |
| +1 | mvnsite | 0m 48s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 47s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 58m 18s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 77m 8s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLogJournalFailures |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10793 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825511/HDFS-10793.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 29d00ef89787 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1360bd2 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16541/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16541/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16541/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16541/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Fix HdfsAuditLogger binary
[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437491#comment-15437491 ] Hadoop QA commented on HDFS-10754: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 38s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 9s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 35s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 41s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s{color} | {color:green} the patch passed with JDK v1.8.0_101 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 8s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 6s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Issue | HDFS-10754 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825510/HDFS-10754.HDFS-8707.010.patch | | Optional Tests | asflicense compile cc mvnsite javac unit javadoc mvninstall | | uname | Linux fc80ed71065d 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / 4a74bc4 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16540/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | |
[jira] [Updated] (HDFS-10742) Measurement of lock held time in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-10742:
------------------------------
    Attachment: HDFS-10742.008.patch

Fix checkstyle.

> Measurement of lock held time in FsDatasetImpl
> ----------------------------------------------
>
> Key: HDFS-10742
> URL: https://issues.apache.org/jira/browse/HDFS-10742
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.0.0-alpha2
> Reporter: Chen Liang
> Assignee: Chen Liang
> Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, HDFS-10742.006.patch, HDFS-10742.007.patch, HDFS-10742.008.patch
>
>
> This JIRA proposes to measure the time for which the lock of {{FsDatasetImpl}} is held by a thread. Doing so will allow us to collect lock statistics.
> This can be done by extending the {{AutoCloseableLock}} lock object in {{FsDatasetImpl}}. In the future we can also consider replacing the lock with a read-write lock.
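The idea described above, timing how long a try-with-resources lock is held, can be sketched in plain Java. This is only an illustration under stated assumptions: `TimedLock` and its methods are hypothetical names, it wraps a JDK `ReentrantLock` rather than extending Hadoop's `AutoCloseableLock`, and it is not the code from the attached patches.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: an AutoCloseable lock wrapper that records how long
 * the lock was held. Not the class from the HDFS-10742 patches.
 */
class TimedLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private long acquiredAtNanos;            // written only while holding the lock
  private volatile long lastHeldNanos;     // duration of the most recent hold
  private volatile long maxHeldNanos;      // longest hold seen so far

  /** Blocks until the lock is acquired, then starts the hold timer. */
  TimedLock acquire() {
    lock.lock();
    if (lock.getHoldCount() == 1) {        // time only the outermost acquisition
      acquiredAtNanos = System.nanoTime();
    }
    return this;
  }

  /** Stops the timer and releases the lock; invoked by try-with-resources. */
  @Override
  public void close() {
    if (lock.getHoldCount() == 1) {        // about to release the outermost hold
      long held = System.nanoTime() - acquiredAtNanos;
      lastHeldNanos = held;
      maxHeldNanos = Math.max(maxHeldNanos, held);
    }
    lock.unlock();
  }

  long getLastHeldNanos() {
    return lastHeldNanos;
  }

  long getMaxHeldNanos() {
    return maxHeldNanos;
  }

  public static void main(String[] args) throws InterruptedException {
    TimedLock datasetLock = new TimedLock();
    try (TimedLock held = datasetLock.acquire()) {
      Thread.sleep(20);                    // stand-in for work done under the lock
    }
    System.out.println("lock held for about "
        + TimeUnit.NANOSECONDS.toMillis(datasetLock.getLastHeldNanos()) + " ms");
  }
}
```

Timing only the outermost acquisition keeps re-entrant calls from resetting the clock; whether the real change treats re-entrancy this way is not stated in the issue.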
[jira] [Commented] (HDFS-10742) Measurement of lock held time in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437482#comment-15437482 ] Hadoop QA commented on HDFS-10742: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 109 unchanged - 0 fixed = 113 total (was 109) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 0s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 77m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestHASafeMode | | | hadoop.hdfs.server.namenode.TestEditLogJournalFailures | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | HDFS-10742 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825507/HDFS-10742.007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux c2ecb97868a7 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1360bd2 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/16539/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16539/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16539/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16539/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was
[jira] [Commented] (HDFS-10768) Optimize mkdir ops
[ https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437483#comment-15437483 ] Hadoop QA commented on HDFS-10768: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 58 unchanged - 1 fixed = 58 total (was 59) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 36s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 97m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | | | hadoop.hdfs.server.namenode.TestEditLogJournalFailures | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:9560f25 | | JIRA Issue | HDFS-10768 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825502/HDFS-10768.1.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 1576a6028302 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1360bd2 | | Default Java | 1.8.0_101 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16538/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16538/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16538/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Optimize mkdir ops > -- > > Key: HDFS-10768 > URL: https://issues.apache.org/jira/browse/HDFS-10768 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >
[jira] [Updated] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-10793:
--------------------------------------
    Status: Patch Available  (was: Open)

> Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
> ------------------------------------------------------------------
>
> Key: HDFS-10793
> URL: https://issues.apache.org/jira/browse/HDFS-10793
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Andrew Wang
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Attachments: HDFS-10793.001.patch
>
>
> HDFS-9184 added a new parameter to an existing method signature in HdfsAuditLogger, which is a Public/Evolving class. This breaks binary compatibility with implementing subclasses.
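The compatibility problem described above can be illustrated with a small, self-contained Java sketch. All class names, method names and parameters here are hypothetical stand-ins, not the real HdfsAuditLogger API, and the approach shown (keep the original abstract method and add the wider signature as a non-abstract overload that delegates to it) is one common way to avoid this kind of break, not necessarily what HDFS-10793.001.patch does.

```java
import java.util.ArrayList;
import java.util.List;

/** Demo entry point: calls the newer overload on a subclass that predates it. */
class AuditCompatDemo {
  public static void main(String[] args) {
    LegacyAuditLogger logger = new LegacyAuditLogger();
    // The extra argument is accepted even though the subclass never
    // declared a method that takes it.
    logger.logAuditEvent(true, "alice", "mkdirs", "ctx-1");
    System.out.println(logger.events); // prints [alice:mkdirs:true]
  }
}

/** Hypothetical stand-in for a public, extensible audit-logger base class. */
abstract class AuditLoggerBase {
  /** Original signature: existing subclasses were compiled against this. */
  public abstract void logAuditEvent(boolean succeeded, String user, String cmd);

  /**
   * Newer overload with one extra (hypothetical) parameter. Because it has a
   * default body that delegates to the original method, subclasses that only
   * implement the original signature keep working unchanged.
   */
  public void logAuditEvent(boolean succeeded, String user, String cmd,
      String callerContext) {
    logAuditEvent(succeeded, user, cmd);
  }
}

/** A "third-party" subclass written before the extra parameter existed. */
class LegacyAuditLogger extends AuditLoggerBase {
  final List<String> events = new ArrayList<>();

  @Override
  public void logAuditEvent(boolean succeeded, String user, String cmd) {
    events.add(user + ":" + cmd + ":" + succeeded);
  }
}
```

Had the extra parameter been added to the abstract method itself, `LegacyAuditLogger` would no longer implement every abstract method of its superclass, which is the kind of break the issue describes.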
[jira] [Updated] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10799: --- Description: When a NameNode serves iNotify requests from a client, it verifies the client has superuser permission and then uses the client's Kerberos principal to read edits from journal nodes. However, if the client does not renew its tgt tickets, the connection from NameNode to journal nodes may fail. In which case, the NameNode thinks the edits are corrupt, and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's tgt ticket expires_. I think that an easy solution to this bug, is that after NameNode verifies client has superuser permission, call {{SecurityUtil.doAsLoginUser}} and then read edits. This will make sure the operation does not fail due to an expired client ticket. Excerpt of related logs: {noformat} 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:h...@example.com (auth:KERBEROS) cause:java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! 
2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 112 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client IP:port] Call#73 Retry#0 java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) {noformat} was: When a NameNode serves iNotify requests from a client, it verifies the client has superuser permission and then uses the client's Kerberos principal to read edits from journal nodes. However, if the client does not renew its tgt tickets, the connection from NameNode to journal nodes may fail. In which case, the NameNode thinks the edits are corrupt, and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's tgt ticket
[jira] [Comment Edited] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437462#comment-15437462 ]

Yongjun Zhang edited comment on HDFS-10652 at 8/25/16 7:02 PM:
---------------------------------------------------------------

Thanks [~jojochuang] for reviewing.

I added those messages on purpose. My thinking is that they are worthwhile: they help when debugging the unit test, and they may add clarity in a real cluster too. Do you agree?

I will address your other comment soon.

Thanks.

was (Author: yzhangal):
Thanks [~jojochuang] for reviewing.

I added those messages on purpose. My thinking is that they are worthwhile: they help when debugging the unit test, and they may add clarity in a real cluster too.

Thanks.

> Add a unit test for HDFS-4660
> -----------------------------
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs
> Reporter: Yongjun Zhang
> Assignee: Vinayakumar B
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, HDFS-10652.006.patch
[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437462#comment-15437462 ]

Yongjun Zhang commented on HDFS-10652:
--------------------------------------

Thanks [~jojochuang] for reviewing.

I added those messages on purpose. My thinking is that they are worthwhile: they help when debugging the unit test, and they may add clarity in a real cluster too.

Thanks.

> Add a unit test for HDFS-4660
> -----------------------------
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, hdfs
> Reporter: Yongjun Zhang
> Assignee: Vinayakumar B
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, HDFS-10652.006.patch
[jira] [Updated] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10799: --- Attachment: HDFS-10799.001.patch v01: a quick fix. Need tests, and probably need to wrap NameNodeRpcServer#getCurrentEditLogTxid as well. > NameNode should use loginUser(hdfs) to serve iNotify requests > - > > Key: HDFS-10799 > URL: https://issues.apache.org/jira/browse/HDFS-10799 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10799.001.patch > > > When a NameNode serves iNotify requests from a client, it verifies the client > has superuser permission and then uses the client's Kerberos principal to > read edits from journal nodes. > However, if the client does not renew its tgt tickets, the connection from > NameNode to journal nodes may fail. In which case, the NameNode thinks the > edits are corrupt, and prints a scary error message: > "During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever!" > However, the edits are actually good. NameNode _should not freak out when an > iNotify client's tgt ticket expires_. > I think that an easy solution to this bug, is that after NameNode verifies > client has superuser permission, call {{SecurityUtil.doAsLoginUser}} and then > read edits. This will make sure the operation does not fail due to an expired > client ticket. 
> Expert of related logs: > {noformat} > 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: > PriviledgedActionException as:h...@example.com (auth:KERBEROS) > cause:java.io.IOException: We encountered an error reading > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, > > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever! > 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 112 on 8020, call > org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client > IP:port] Call#73 Retry#0 > java.io.IOException: We encountered an error reading > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, > > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever! 
> at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) >
[jira] [Updated] (HDFS-10793) Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
[ https://issues.apache.org/jira/browse/HDFS-10793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-10793:
--------------------------------------
    Attachment: HDFS-10793.001.patch

Attaching the v001 patch to address the binary compatibility issue with HdfsAuditLogger.

> Fix HdfsAuditLogger binary incompatibility introduced by HDFS-9184
> ------------------------------------------------------------------
>
> Key: HDFS-10793
> URL: https://issues.apache.org/jira/browse/HDFS-10793
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Andrew Wang
> Assignee: Manoj Govindassamy
> Priority: Blocker
> Attachments: HDFS-10793.001.patch
>
>
> HDFS-9184 added a new parameter to an existing method signature in HdfsAuditLogger, which is a Public/Evolving class. This breaks binary compatibility with implementing subclasses.
[jira] [Commented] (HDFS-4660) Block corruption can happen during pipeline recovery
[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437454#comment-15437454 ] Yongjun Zhang commented on HDFS-4660: - Thank you very much [~nroberts]! > Block corruption can happen during pipeline recovery > > > Key: HDFS-4660 > URL: https://issues.apache.org/jira/browse/HDFS-4660 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.3-alpha, 3.0.0-alpha1 >Reporter: Peng Zhang >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.7.1, 2.6.4 > > Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, > HDFS-4660.v2.patch, periodic_hflush.patch > > > pipeline DN1 DN2 DN3 > stop DN2 > pipeline added node DN4 located at 2nd position > DN1 DN4 DN3 > recover RBW > DN4 after recover rbw > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004 > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134144 > getBytesOnDisk() = 134144 > getVisibleLength()= 134144 > end at chunk (134144/512=262) > DN3 after recover rbw > 2013-04-01 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01 > 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134028 > getBytesOnDisk() = 134028 > getVisibleLength()= 134028 > client send packet after recover pipeline > offset=133632 len=1008 > DN4 after flush > 2013-04-01 21:02:31,779 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1063 > // meta end 
position should be ceil(134640/512)*4 + 7 == 1059, but now it is 1063. > DN3 after flush > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, > type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, > lastPacketInBlock=false, offsetInBlock=134640, > ackEnqueueNanoTime=8817026136871545) > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing > meta file offset of block > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from > 1055 to 1051 > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1059 > After checking the meta file on DN4, I found that the checksum of chunk 262 is duplicated, but the data is not. > Later, after the block was finalized, DN4's scanner detected the bad block and reported it to the NN. The NN sent a command to delete this block and to replicate it from another DN in the pipeline to satisfy the replication factor. > I think this is because BlockReceiver skips data bytes that were already written but does not skip checksum bytes that were already written. Also, the function adjustCrcFilePosition is only used for the last incomplete chunk, not for this situation.
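The offsets quoted in the report can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the layout implied by the numbers above (a 7-byte meta header, then one 4-byte checksum per 512-byte chunk, with a trailing partial chunk also getting a checksum), and the class and method names are made up for this example rather than taken from Hadoop.

```java
// Illustrative arithmetic only; names here are made up and are not Hadoop APIs.
// Assumes the layout implied by the report: a 7-byte meta header followed by
// one 4-byte checksum per 512-byte chunk (a trailing partial chunk also gets one).
final class MetaOffsetCheck {
    static final int HEADER_BYTES = 7;
    static final int CHECKSUM_BYTES = 4;
    static final int CHUNK_BYTES = 512;

    /** Number of chunks (full or partial) needed to hold dataLen bytes. */
    static long chunks(long dataLen) {
        return (dataLen + CHUNK_BYTES - 1) / CHUNK_BYTES; // integer ceiling
    }

    /** Expected meta file length for a replica holding dataLen bytes. */
    static long expectedMetaEnd(long dataLen) {
        return HEADER_BYTES + chunks(dataLen) * CHECKSUM_BYTES;
    }

    /**
     * Meta file length if every checksum carried by a packet is appended to the
     * existing meta file, even for chunks whose data was already on disk.
     */
    static long metaEndWithoutSkipping(long bytesOnDisk, long packetOffset, long packetLen) {
        long firstChunk = packetOffset / CHUNK_BYTES;
        long endChunk = chunks(packetOffset + packetLen);
        return expectedMetaEnd(bytesOnDisk) + (endChunk - firstChunk) * CHECKSUM_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(expectedMetaEnd(134640));                      // 1059
        System.out.println(metaEndWithoutSkipping(134144, 133632, 1008)); // 1063
    }
}
```

Under these assumptions the expected end offset of 1059 comes from rounding the chunk count up to 263, and the 1063 seen on DN4 matches appending both checksums of the packet on top of the 262 already in the meta file.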
[jira] [Updated] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10799: --- Description: When a NameNode serves iNotify requests from a client, it verifies that the client has superuser permission and then uses the client's Kerberos principal to read edits from the journal nodes. However, if the client does not renew its TGT, the connection from the NameNode to the journal nodes may fail. In that case, the NameNode thinks the edits are corrupt and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's TGT expires_. I think an easy solution to this bug is that, after the NameNode verifies the client has superuser permission, it calls {{SecurityUtil.doAsLoginUser}} and then reads the edits. This will make sure the operation does not fail due to an expired client ticket. Excerpt of related logs: {noformat} 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:h...@example.com (auth:KERBEROS) cause:java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! 
2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 112 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client IP:port] Call#73 Retry#0 java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) {noformat} was: When a NameNode serves iNotify requests from a client, it verifies the client has superuser permission and then uses the client's Kerberos principal to read edits from journal nodes. However, if the client does not renew its tgt tickets, the connection from NameNode to journal nodes may fail. In which case, the NameNode thinks the edits are corrupt, and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's tgt ticket
[jira] [Updated] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-10799: --- Description: When a NameNode serves iNotify requests from a client, it verifies that the client has superuser permission and then uses the client's Kerberos principal to read edits from the journal nodes. However, if the client does not renew its TGT, the connection from the NameNode to the journal nodes may fail. In that case, the NameNode thinks the edits are corrupt and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's TGT expires_. I think an easy solution to this bug is that, after the NameNode verifies the client has superuser permission, it calls {{SecurityUtil.doAsLoginUser}} and then reads the edits. This will make sure the operation does not fail due to an expired client ticket. Appendix: {noformat} 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:h...@example.com (auth:KERBEROS) cause:java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! 
2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 112 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client IP:port] Call#73 Retry#0 java.io.IOException: We encountered an error reading http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever! at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) {noformat} was: When a NameNode serves iNotify requests from a client, it verifies the client has superuser permission and then uses the client's Kerberos principal to read edits from journal nodes. However, if the client does not renew its tgt tickets, the connection from NameNode to journal nodes may fail. In which case, the NameNode thinks the edits are corrupt, and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's tgt ticket expires_. I
[jira] [Created] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
Wei-Chiu Chuang created HDFS-10799: -- Summary: NameNode should use loginUser(hdfs) to serve iNotify requests Key: HDFS-10799 URL: https://issues.apache.org/jira/browse/HDFS-10799 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0 Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang When a NameNode serves iNotify requests from a client, it verifies that the client has superuser permission and then uses the client's Kerberos principal to read edits from the journal nodes. However, if the client does not renew its TGT, the connection from the NameNode to the journal nodes may fail. In that case, the NameNode thinks the edits are corrupt and prints a scary error message: "During automatic edit log failover, we noticed that all of the remaining edit log streams are shorter than the current one! The best remaining edit log ends at transaction 11577603, but we thought we could read up to transaction 11577606. If you continue, metadata will be lost forever!" However, the edits are actually good. NameNode _should not freak out when an iNotify client's TGT expires_. I think an easy solution to this bug is that, after the NameNode verifies the client has superuser permission, it calls {{SecurityUtil.doAsLoginUser}} and then reads the edits. This will make sure the operation does not fail due to an expired client ticket.
[jira] [Updated] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
[ https://issues.apache.org/jira/browse/HDFS-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-10798: - Assignee: Erik Krogen > Make the threshold of reporting FSNamesystem lock contention configurable > - > > Key: HDFS-10798 > URL: https://issues.apache.org/jira/browse/HDFS-10798 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Zhe Zhang >Assignee: Erik Krogen > Labels: newbie > > Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. > In a busy system this might add too much overhead. We should make the > threshold configurable.
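As a rough illustration of what "configurable" could look like, the sketch below reads the threshold from a key with the current 1-second value as its default. It uses `java.util.Properties` as a stand-in for Hadoop's configuration object so that it runs on its own; the key name, the class, and the `>=` comparison are assumptions for this example, not something the issue specifies.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for Hadoop's configuration class.
final class LockReportingConfig {
    // Assumed key name for illustration only; the issue does not name one.
    static final String THRESHOLD_KEY = "dfs.namenode.write-lock-reporting-threshold-ms";
    // Mirrors the currently hard-coded value of 1 second.
    static final long THRESHOLD_DEFAULT_MS = 1000L;

    /** Returns the configured threshold in milliseconds, or the default if unset. */
    static long thresholdMs(Properties conf) {
        String raw = conf.getProperty(THRESHOLD_KEY);
        if (raw == null) {
            return THRESHOLD_DEFAULT_MS;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(THRESHOLD_KEY + " must be >= 0, got " + value);
        }
        return value;
    }

    /** A hold is reported once it reaches the threshold (assumed comparison). */
    static boolean shouldReport(long heldMs, long thresholdMs) {
        return heldMs >= thresholdMs;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(THRESHOLD_KEY, "5000"); // a busy cluster might raise the threshold
        System.out.println(shouldReport(1200, thresholdMs(conf)));
    }
}
```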
[jira] [Updated] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-10754: - Attachment: HDFS-10754.HDFS-8707.010.patch New patch is attached. Please review. > libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, > hdfs_chown, hdfs_chmod and hdfs_find. > --- > > Key: HDFS-10754 > URL: https://issues.apache.org/jira/browse/HDFS-10754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10754.HDFS-8707.000.patch, > HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, > HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, > HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, > HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, > HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch
[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437421#comment-15437421 ] Anatoli Shein commented on HDFS-10754: -- Yes, seems like hdfs_find.cpp did not get included into my last diff. Should be good in the new patch. > libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, > hdfs_chown, hdfs_chmod and hdfs_find. > --- > > Key: HDFS-10754 > URL: https://issues.apache.org/jira/browse/HDFS-10754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10754.HDFS-8707.000.patch, > HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, > HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, > HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, > HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, > HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch
[jira] [Comment Edited] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437416#comment-15437416 ] Anatoli Shein edited comment on HDFS-10754 at 8/25/16 6:33 PM: --- Thanks for the review, [~bobhansen]. I have addressed your comments as follows: * The new recursive methods should not use future/promises internally. That blocks one of the asio threads waiting for more data; if a consumer tried to do one of these with a single thread in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd all get queued up behind the initial handler. * Instead, whenever they're all done (request_count == 0), the last one out the door (the handler that dropped the request_count to 0) should call into the consumer's handler directly with the final status. If any of the other threads has received an error, all of the subsequent deliveries to the handler should return false, telling find "I'm going to report an error anyway, so don't bother recursing any more." It's good to wait until request_count==0, even in an error state, so the consumer doesn't have any lame duck requests queued up to take care of? * Also because all of this is asynchronous, you can't allocate the lock and state variables on the stack. When the consumer calls SetOwner('/', true, handler), the function is going to return as soon as the find operation is kicked off, destroying all of the elements on the stack. We'll need to create a little struct for SetOwner that is maintained with a shared_ptr and cleaned up when the last request is done. (/) I removed futures and promises from recursive methods, and created a small struct to keep the state. Now it is purely async. Minor points: * In find, perhaps recursion_counter is a bit of a misnomer at this point. It's more outstanding_requests, since for big directories, we'll have more requests without recursing. (/) Done. 
* Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better than just SharedState (since we might have many shared states in the FileSystem class). (/) Done. * In CurrentState, perhaps "depth" is more accurate than position? (/) Yes, I changed it to "depth" now. * Do we support a globbing find without recursion? Can I find "/dir?/path*/" "*.db", and not have it recurse to the sub-directories of path*? (/) POSIX find supports globbing without recursion by setting maxdepth to zero. I added maxdepth functionality for our tool also. * Can we push the shims and state into the .cpp file and keep them out of the interface (even if private)? (/) Done. was (Author: anatoli.shein): Thanks for the review, [~bobhansen]. I have addressed your comments as follows: * The new recursive methods should not use future/promises internally. That blocks one of the asio threads waiting for more data; if a consumer tried to do one of these with a single thread in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd all get queued up behind the initial handler. Instead, whenever they're all done (request_count == 0), the last one out the door (the handler that dropped the request_count to 0) should call into the consumer's handler directly with the final status. If any of the other threads has received an error, all of the subsequent deliveries to the handler should return false, telling find "I'm going to report an error anyway, so don't bother recursing any more." It's good to wait until request_count==0, even in an error state, so the consumer doesn't have any lame duck requests queued up to take care of? Also because all of this is asynchronous, you can't allocate the lock and state variables on the stack. When the consumer calls SetOwner('/', true, handler), the function is going to return as soon as the find operation is kicked off, destroying all of the elements on the stack. 
We'll need to create a little struct for SetOwner that is maintained with a shared_ptr and cleaned up when the last request is done. (/) I removed futures and promises from recursive methods, and created a small struct to keep the state. Now it is purely async. Minor points: * In find, perhaps recursion_counter is a bit of a misnomer at this point. It's more outstanding_requests, since for big directories, we'll have more requests without recursing. (/) Done. * Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better than just SharedState (since we might have many shared states in the FileSystem class). (/) Done. * In CurrentState, perhaps "depth" is more accurate than position? (/) Yes, I changed it to "depth" now. * Do we support a globbing find without recursion? Can I find "/dir?/path*/" "*.db", and not have it recurse to the sub-directories of path*? (/) POSIX find supports globbing without recursion by setting maxdepth to
[jira] [Commented] (HDFS-10754) libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, hdfs_chown, hdfs_chmod and hdfs_find.
[ https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437416#comment-15437416 ] Anatoli Shein commented on HDFS-10754: -- Thanks for the review, [~bobhansen]. I have addressed your comments as follows: * The new recursive methods should not use future/promises internally. That blocks one of the asio threads waiting for more data; if a consumer tried to do one of these with a single thread in the threadpool, it would deadlock waiting for the subtasks to complete, but they'd all get queued up behind the initial handler. Instead, whenever they're all done (request_count == 0), the last one out the door (the handler that dropped the request_count to 0) should call into the consumer's handler directly with the final status. If any of the other threads has received an error, all of the subsequent deliveries to the handler should return false, telling find "I'm going to report an error anyway, so don't bother recursing any more." It's good to wait until request_count==0, even in an error state, so the consumer doesn't have any lame duck requests queued up to take care of? Also because all of this is asynchronous, you can't allocate the lock and state variables on the stack. When the consumer calls SetOwner('/', true, handler), the function is going to return as soon as the find operation is kicked off, destroying all of the elements on the stack. We'll need to create a little struct for SetOwner that is maintained with a shared_ptr and cleaned up when the last request is done. (/) I removed futures and promises from recursive methods, and created a small struct to keep the state. Now it is purely async. Minor points: * In find, perhaps recursion_counter is a bit of a misnomer at this point. It's more outstanding_requests, since for big directories, we'll have more requests without recursing. (/) Done. 
* Perhaps FindOperationState is a better name than CurrentState, and SharedFindState is better than just SharedState (since we might have many shared states in the FileSystem class). (/) Done. * In CurrentState, perhaps "depth" is more accurate than position? (/) Yes, I changed it to "depth" now. * Do we support a globbing find without recursion? Can I find "/dir?/path*/" "*.db", and not have it recurse to the sub-directories of path*? (/) POSIX find supports globbing without recursion by setting maxdepth to zero. I added maxdepth functionality for our tool also. * Can we push the shims and state into the .cpp file and keep them out of the interface (even if private)? (/) Done. > libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, > hdfs_chown, hdfs_chmod and hdfs_find. > --- > > Key: HDFS-10754 > URL: https://issues.apache.org/jira/browse/HDFS-10754 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein >Assignee: Anatoli Shein > Attachments: HDFS-10754.HDFS-8707.000.patch, > HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, > HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, > HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, > HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, > HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch
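The review points about avoiding futures, counting outstanding requests, and keeping the state off the stack can be sketched as follows. This is a Java analogy of the pattern, not the libhdfs++ code (which is C++ and uses a shared_ptr for the state): the class, the in-memory directory map, and the handler signature are all hypothetical. Each request is counted before it is queued, no handler blocks, and whichever handler drops the counter to zero calls the consumer's callback.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

// Java analogy of the callback-only pattern discussed for libhdfs++ (which is C++).
// All names are hypothetical; the "namespace" is just an in-memory map.
final class AsyncFindSketch {
    /** Heap-allocated state shared by every in-flight request of one find call. */
    static final class SharedFindState {
        final AtomicInteger outstandingRequests = new AtomicInteger();
        final AtomicBoolean failed = new AtomicBoolean();
        final Queue<String> visited = new ConcurrentLinkedQueue<>();
        final BiConsumer<Boolean, Collection<String>> onDone;

        SharedFindState(BiConsumer<Boolean, Collection<String>> onDone) {
            this.onDone = onDone;
        }
    }

    private final Map<String, List<String>> tree; // directory -> child directories
    private final Executor pool;

    AsyncFindSketch(Map<String, List<String>> tree, Executor pool) {
        this.tree = tree;
        this.pool = pool;
    }

    /** Returns immediately; onDone is invoked once, by whichever request finishes last. */
    void find(String root, BiConsumer<Boolean, Collection<String>> onDone) {
        submit(root, new SharedFindState(onDone));
    }

    private void submit(String dir, SharedFindState state) {
        state.outstandingRequests.incrementAndGet(); // count the request before queueing it
        pool.execute(() -> handle(dir, state));
    }

    private void handle(String dir, SharedFindState state) {
        List<String> children = tree.get(dir);
        if (children == null) {
            state.failed.set(true);                  // simulated listing error
        } else if (!state.failed.get()) {            // after an error, stop recursing
            state.visited.add(dir);
            for (String child : children) {
                submit(child, state);
            }
        }
        // Last one out the door delivers the final status; no thread ever blocks.
        if (state.outstandingRequests.decrementAndGet() == 0) {
            state.onDone.accept(!state.failed.get(), state.visited);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("/a", "/b"));
        tree.put("/a", Collections.singletonList("/a/x"));
        tree.put("/a/x", Collections.<String>emptyList());
        tree.put("/b", Collections.<String>emptyList());

        ExecutorService pool = Executors.newSingleThreadExecutor(); // a single worker is enough
        CountDownLatch done = new CountDownLatch(1);
        new AsyncFindSketch(tree, pool).find("/", (ok, visited) -> {
            System.out.println("ok=" + ok + ", visited=" + visited.size());
            done.countDown();
        });
        done.await();
        pool.shutdown();
    }
}
```

Because a parent only decrements the counter after it has queued its children, the count cannot reach zero while work is still pending, and the single-threaded pool in `main` does not deadlock since only the caller waits on the latch.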
[jira] [Updated] (HDFS-10742) Measurement of lock held time in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-10742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-10742: -- Attachment: HDFS-10742.007.patch Thanks [~chris.douglas] for the comments! Uploaded a patch to fix the Jenkins findbugs complaints. I also reverted an unnecessary change from the previous patch that could be misleading. I will add a unit test later. Apart from this, I'm considering using Hadoop's built-in metrics system instead of maintaining and printing locally kept stats. What do you think about this? > Measurement of lock held time in FsDatasetImpl > -- > > Key: HDFS-10742 > URL: https://issues.apache.org/jira/browse/HDFS-10742 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.0.0-alpha2 >Reporter: Chen Liang >Assignee: Chen Liang > Attachments: HDFS-10742.001.patch, HDFS-10742.002.patch, > HDFS-10742.003.patch, HDFS-10742.004.patch, HDFS-10742.005.patch, > HDFS-10742.006.patch, HDFS-10742.007.patch > > > This JIRA proposes to measure the time the lock of {{FsDatasetImpl}} is > held by a thread. Doing so will allow us to measure lock statistics. > This can be done by extending the {{AutoCloseableLock}} lock object in > {{FsDatasetImpl}}. In the future we can also consider replacing the lock with > a read-write lock.
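One way to picture the proposal is a lock wrapper that timestamps the outermost acquire and records the held time on the matching release. The sketch below is a self-contained stand-in written for illustration; it does not extend or reproduce Hadoop's {{AutoCloseableLock}}, and the class name, the injectable clock, and the three counters are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Stand-in for a timed lock wrapper; this is NOT Hadoop's AutoCloseableLock,
// and the class and accessor names are hypothetical.
final class TimedAutoCloseableLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();
    private final LongSupplier nanoClock;
    private long acquiredAtNanos; // written and read only while the lock is held
    private final AtomicLong acquisitions = new AtomicLong();
    private final AtomicLong totalHeldNanos = new AtomicLong();
    private final AtomicLong maxHeldNanos = new AtomicLong();

    TimedAutoCloseableLock() {
        this(System::nanoTime);
    }

    TimedAutoCloseableLock(LongSupplier nanoClock) { // injectable clock for tests
        this.nanoClock = nanoClock;
    }

    /** Blocks until the lock is held; returns this for use in try-with-resources. */
    TimedAutoCloseableLock acquire() {
        lock.lock();
        if (lock.getHoldCount() == 1) { // start timing on the outermost acquire only
            acquiredAtNanos = nanoClock.getAsLong();
        }
        return this;
    }

    @Override
    public void close() {
        if (lock.getHoldCount() == 1) { // outermost release: record before unlocking
            long held = nanoClock.getAsLong() - acquiredAtNanos;
            acquisitions.incrementAndGet();
            totalHeldNanos.addAndGet(held);
            maxHeldNanos.accumulateAndGet(held, Math::max);
        }
        lock.unlock();
    }

    long acquisitions() { return acquisitions.get(); }
    long totalHeldNanos() { return totalHeldNanos.get(); }
    long maxHeldNanos() { return maxHeldNanos.get(); }

    public static void main(String[] args) {
        TimedAutoCloseableLock datasetLock = new TimedAutoCloseableLock();
        try (TimedAutoCloseableLock held = datasetLock.acquire()) {
            // critical section would go here
        }
        System.out.println("acquisitions=" + datasetLock.acquisitions()
            + ", totalHeldNanos=" + datasetLock.totalHeldNanos());
    }
}
```

The counters here are kept locally; feeding the same measurements into a metrics system instead, as the comment above suggests, would only change what happens inside `close()`.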
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437340#comment-15437340 ] Mingliang Liu commented on HDFS-9145: - Thanks [~zhz] for taking care of this. I also noticed that you backported [HDFS-10798]. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It would be helpful to have a way to track (or at least log a message) > when some operation holds the FSNamesystem lock for a long time.
[jira] [Comment Edited] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437340#comment-15437340 ] Mingliang Liu edited comment on HDFS-9145 at 8/25/16 5:57 PM: -- Thanks [~zhz] for taking care of this. I also noticed that you backported [HDFS-9467]. was (Author: liuml07): Thanks [~zhz] for taking care of this. I also noticed that you backported [HDFS-10798]. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It would be helpful to have a way to track (or at least log a message) > when some operation holds the FSNamesystem lock for a long time.
[jira] [Commented] (HDFS-10584) Allow long-running Mover tool to login with keytab
[ https://issues.apache.org/jira/browse/HDFS-10584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437327#comment-15437327 ] Rakesh R commented on HDFS-10584: - Hi [~zhz], I've tried to address your comments in the latest patch; please review it again when you get a chance. Thanks! > Allow long-running Mover tool to login with keytab > -- > > Key: HDFS-10584 > URL: https://issues.apache.org/jira/browse/HDFS-10584 > Project: Hadoop HDFS > Issue Type: New Feature > Components: balancer & mover >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10584-00.patch, HDFS-10584-01.patch, > HDFS-10584-02.patch, HDFS-10584-03.patch > > > The idea of this jira is to give the {{mover}} tool the ability to log in from > a keytab. That way, the RPC client would re-login from the keytab after > expiration, which means the process could remain authenticated indefinitely. > With some people wanting to run mover non-stop in "daemon mode", that might > be a reasonable feature to add. The balancer was recently enhanced with > this feature. > Thanks [~zhz] for the offline discussions.
[jira] [Commented] (HDFS-10791) Delete block meta file when the block file is missing
[ https://issues.apache.org/jira/browse/HDFS-10791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437326#comment-15437326 ] Tsz Wo Nicholas Sze commented on HDFS-10791: Hi [~linyiqun], thanks for pointing out the code. However, there are two deficiencies: # The code is only used by DirectoryScanner on FINALIZED blocks. RBW block meta files won't be cleaned up. # DirectoryScanner uses filters to list files, so it skips unidentified files that are not matched by the patterns. > Delete block meta file when the block file is missing > - > > Key: HDFS-10791 > URL: https://issues.apache.org/jira/browse/HDFS-10791 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Tsz Wo Nicholas Sze > > When the block file is missing, the block meta file should be deleted if it > exists. > Note that such a situation is possible: since the meta file is closed before the > block file, the datanode could be killed in between.
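The cleanup being asked for can be pictured as a scan that pairs each meta file with its block file and removes the meta file when the block file is absent. The sketch below is a hypothetical standalone helper, not DataNode code: it assumes the usual naming of `blk_<id>` for the block file and `blk_<id>_<genstamp>.meta` for the meta file in the same directory, and it ignores the locking a real datanode would need so that a replica still being written is not touched.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not DataNode code. Assumes the usual on-disk naming:
// block file "blk_<id>" and meta file "blk_<id>_<genstamp>.meta" in the same directory.
final class OrphanMetaCleaner {
    private static final Pattern META_NAME = Pattern.compile("(blk_-?\\d+)_\\d+\\.meta");

    /** Deletes meta files in dir whose block file is missing; returns what was deleted. */
    static List<Path> deleteOrphanMetaFiles(Path dir) throws IOException {
        // First pass: collect candidates while the directory stream is open.
        List<Path> orphans = new ArrayList<>();
        try (DirectoryStream<Path> metas = Files.newDirectoryStream(dir, "blk_*.meta")) {
            for (Path meta : metas) {
                Matcher m = META_NAME.matcher(meta.getFileName().toString());
                if (m.matches() && !Files.exists(dir.resolve(m.group(1)))) {
                    orphans.add(meta);
                }
            }
        }
        // Second pass: delete after the stream is closed.
        List<Path> deleted = new ArrayList<>();
        for (Path meta : orphans) {
            if (Files.deleteIfExists(meta)) {
                deleted.add(meta);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("orphan-meta-demo");
        Files.createFile(dir.resolve("blk_1001"));
        Files.createFile(dir.resolve("blk_1001_7.meta")); // has a block file: kept
        Files.createFile(dir.resolve("blk_1002_7.meta")); // block file missing: deleted
        System.out.println(deleteOrphanMetaFiles(dir));
    }
}
```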
[jira] [Updated] (HDFS-9467) Fix data race accessing writeLockHeldTimeStamp in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-9467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9467: Fix Version/s: 2.7.4 > Fix data race accessing writeLockHeldTimeStamp in FSNamesystem > -- > > Key: HDFS-9467 > URL: https://issues.apache.org/jira/browse/HDFS-9467 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9467.000.patch, HDFS-9467.001.patch > > > This is a followup of [HDFS-9145]. > Actually the value of {{writeLockInterval}} should be captured within the > lock. The current code has a race on {{writeLockHeldTimeStamp}}. Thanks to > [~jingzhao] for reporting this.
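The race described here comes down to where the interval is computed relative to the unlock. The sketch below is a simplified stand-in and not the FSNamesystem code: the names `writeLockHeldTimeStamp` and `writeLockInterval` follow the issue text, while the class, the injectable clock, and the list of reports are assumptions added so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

// Simplified stand-in for the write-lock bookkeeping; not the actual FSNamesystem code.
// The two field/variable names follow the issue text; everything else is hypothetical.
final class WriteLockTracker {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final long reportingThresholdMs;
    private final LongSupplier clockMs;
    private long writeLockHeldTimeStamp; // guarded by the write lock
    private final List<Long> reports = new ArrayList<>();

    WriteLockTracker(long reportingThresholdMs, LongSupplier clockMs) {
        this.reportingThresholdMs = reportingThresholdMs;
        this.clockMs = clockMs;
    }

    void writeLock() {
        lock.writeLock().lock();
        if (lock.getWriteHoldCount() == 1) { // outermost acquire starts the timer
            writeLockHeldTimeStamp = clockMs.getAsLong();
        }
    }

    void writeUnlock() {
        boolean outermost = lock.getWriteHoldCount() == 1;
        // Capture the interval while still holding the lock. Reading
        // writeLockHeldTimeStamp after unlock() would race with the next writer,
        // which may already have overwritten the timestamp.
        long writeLockInterval = clockMs.getAsLong() - writeLockHeldTimeStamp;
        lock.writeLock().unlock();
        if (outermost && writeLockInterval >= reportingThresholdMs) {
            report(writeLockInterval); // reporting happens outside the lock
        }
    }

    private synchronized void report(long heldMs) {
        reports.add(heldMs);
        System.out.println("write lock held for " + heldMs + " ms");
    }

    /** Snapshot of the reported hold times, in milliseconds. */
    synchronized List<Long> reports() {
        return new ArrayList<>(reports);
    }

    public static void main(String[] args) {
        WriteLockTracker tracker = new WriteLockTracker(0, System::currentTimeMillis);
        tracker.writeLock();
        tracker.writeUnlock(); // threshold 0, so this hold is always reported
    }
}
```

Only the interval is computed under the lock; the reporting itself stays outside so that it does not lengthen the hold time it is measuring.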
[jira] [Updated] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9145: Fix Version/s: 2.7.4 > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It would be helpful to have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9145) Tracking methods that hold FSNamesytemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437295#comment-15437295 ] Zhe Zhang commented on HDFS-9145: - Thanks [~liuml07], [~jingzhao]. This is a pretty good improvement. I just backported this change and HDFS-9467 to branch-2.7. > Tracking methods that hold FSNamesytemLock for too long > --- > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch > > > It would be helpful to have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10768) Optimize mkdir ops
[ https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-10768: --- Attachment: HDFS-10768.1.patch Fixed findbugs and the style issue. > Optimize mkdir ops > -- > > Key: HDFS-10768 > URL: https://issues.apache.org/jira/browse/HDFS-10768 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10768.1.patch, HDFS-10768.patch > > > Directory creation causes excessive object allocation: ex. an immutable list > builder, containing the string of components converted from the IIP's > byte[]s, sublist views of the string list, iterable, followed by string to > byte[] conversion. This can all be eliminated by accessing the component's > byte[] in the IIP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8883) NameNode Metrics : Add FSNameSystem lock Queue Length
[ https://issues.apache.org/jira/browse/HDFS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437259#comment-15437259 ] Zhe Zhang commented on HDFS-8883: - Thanks [~anu] for the work. This is a pretty good improvement. I just backported to branch-2.7. > NameNode Metrics : Add FSNameSystem lock Queue Length > - > > Key: HDFS-8883 > URL: https://issues.apache.org/jira/browse/HDFS-8883 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-8883.001.patch > > > FSNameSystemLock can have contention when NameNode is under load. This patch > adds LockQueueLength -- the number of threads waiting on FSNameSystemLock -- > as a metric in NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
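As a rough illustration of the kind of value a LockQueueLength metric could report, the sketch below samples java.util.concurrent's ReentrantReadWriteLock.getQueueLength(), which returns an estimate of the number of threads waiting for the read or write lock. The class LockQueueLengthDemo is hypothetical, and whether the HDFS-8883 patch obtains the number this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo, not NameNode code.
class LockQueueLengthDemo {
  /** Holds the write lock, starts reader threads that must wait, and samples the queue length. */
  static int waitersWhileHeld(int threads) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    List<Thread> readers = new ArrayList<>();
    int observed = 0;
    lock.writeLock().lock();
    try {
      for (int i = 0; i < threads; i++) {
        Thread t = new Thread(() -> {
          lock.readLock().lock();   // blocks until the writer releases
          lock.readLock().unlock();
        });
        readers.add(t);
        t.start();
      }
      // Poll for up to five seconds until all readers are queued.
      long deadline = System.nanoTime() + 5_000_000_000L;
      while ((observed = lock.getQueueLength()) < threads && System.nanoTime() < deadline) {
        Thread.sleep(10);
      }
    } finally {
      lock.writeLock().unlock();
    }
    for (Thread t : readers) {
      t.join();
    }
    return observed;
  }
}
```

Because getQueueLength() is documented as an estimate, a value like this is suited to monitoring rather than to synchronization decisions.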
[jira] [Updated] (HDFS-8883) NameNode Metrics : Add FSNameSystem lock Queue Length
[ https://issues.apache.org/jira/browse/HDFS-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8883: Fix Version/s: 2.7.4 > NameNode Metrics : Add FSNameSystem lock Queue Length > - > > Key: HDFS-8883 > URL: https://issues.apache.org/jira/browse/HDFS-8883 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-8883.001.patch > > > FSNameSystemLock can have contention when NameNode is under load. This patch > adds LockQueueLength -- the number of threads waiting on FSNameSystemLock -- > as a metric in NameNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8773) Few FSNamesystem metrics are not documented in the Metrics page
[ https://issues.apache.org/jira/browse/HDFS-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8773: Fix Version/s: (was: 2.7.4) > Few FSNamesystem metrics are not documented in the Metrics page > --- > > Key: HDFS-8773 > URL: https://issues.apache.org/jira/browse/HDFS-8773 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 2.8.0 > > Attachments: HDFS-8773-00.patch > > > This jira is to document missing metrics in the [Metrics > page|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem]. > Following are not documented: > {code} > MissingReplOneBlocks > NumFilesUnderConstruction > NumActiveClients > HAState > FSState > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8773) Few FSNamesystem metrics are not documented in the Metrics page
[ https://issues.apache.org/jira/browse/HDFS-8773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8773: Fix Version/s: 2.7.4 > Few FSNamesystem metrics are not documented in the Metrics page > --- > > Key: HDFS-8773 > URL: https://issues.apache.org/jira/browse/HDFS-8773 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-8773-00.patch > > > This jira is to document missing metrics in the [Metrics > page|https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Metrics.html#FSNamesystem]. > Following are not documented: > {code} > MissingReplOneBlocks > NumFilesUnderConstruction > NumActiveClients > HAState > FSState > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8721) Add a metric for number of encryption zones
[ https://issues.apache.org/jira/browse/HDFS-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437228#comment-15437228 ] Zhe Zhang commented on HDFS-8721: - This is a pretty good improvement. I just backported to branch-2.7. > Add a metric for number of encryption zones > --- > > Key: HDFS-8721 > URL: https://issues.apache.org/jira/browse/HDFS-8721 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: encryption >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-8721-00.patch, HDFS-8721-01.patch > > > Would be good to expose the number of encryption zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-8721) Add a metric for number of encryption zones
[ https://issues.apache.org/jira/browse/HDFS-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8721: Fix Version/s: 2.7.4 > Add a metric for number of encryption zones > --- > > Key: HDFS-8721 > URL: https://issues.apache.org/jira/browse/HDFS-8721 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: encryption >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 2.8.0, 2.7.4 > > Attachments: HDFS-8721-00.patch, HDFS-8721-01.patch > > > Would be good to expose the number of encryption zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10798) Make the threshold of reporting FSNamesystem lock contention configurable
Zhe Zhang created HDFS-10798: Summary: Make the threshold of reporting FSNamesystem lock contention configurable Key: HDFS-10798 URL: https://issues.apache.org/jira/browse/HDFS-10798 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Zhe Zhang Currently {{FSNamesystem#WRITELOCK_REPORTING_THRESHOLD}} is set at 1 second. In a busy system this might add too much overhead. We should make the threshold configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
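A minimal sketch of what a configurable threshold could look like, assuming a millisecond value read from configuration with the current 1 second as the default. The class LockReportingConfig and the key name are hypothetical, and java.util.Properties stands in for the Hadoop configuration object so the example is self-contained.

```java
import java.util.Properties;

// Hypothetical illustration, not the HDFS-10798 patch.
class LockReportingConfig {
  // Hypothetical key name chosen for this example only.
  static final String THRESHOLD_KEY = "example.write-lock-reporting-threshold-ms";
  // Mirrors the 1 second value mentioned in the issue description.
  static final long DEFAULT_THRESHOLD_MS = 1000L;

  /** Returns the configured threshold, or the default when the key is unset or invalid. */
  static long thresholdMs(Properties conf) {
    String raw = conf.getProperty(THRESHOLD_KEY);
    if (raw == null) {
      return DEFAULT_THRESHOLD_MS;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value >= 0 ? value : DEFAULT_THRESHOLD_MS;
    } catch (NumberFormatException e) {
      return DEFAULT_THRESHOLD_MS;
    }
  }

  /** A lock hold is reported only when it lasted at least thresholdMs. */
  static boolean shouldReport(long heldMs, long thresholdMs) {
    return heldMs >= thresholdMs;
  }
}
```

On a busy system an operator could then raise the value so that only unusually long holds are logged.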
[jira] [Created] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
Sean Mackrory created HDFS-10797: Summary: Disk usage summary of snapshots causes renamed blocks to get counted twice Key: HDFS-10797 URL: https://issues.apache.org/jira/browse/HDFS-10797 Project: Hadoop HDFS Issue Type: Bug Reporter: Sean Mackrory DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how much disk space is used by a snapshot by tallying up the files in the snapshot that have since been deleted (that way it won't overlap with regular files whose disk usage is computed separately). However, that is determined from a diff that shows moved (to Trash or otherwise) or renamed files as a deletion and a creation operation that may overlap with the list of blocks. Only the deletion operation is taken into consideration, and this causes those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
[ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437189#comment-15437189 ] Hudson commented on HDFS-10748: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10346 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10346/]) HDFS-10748. TestFileTruncate#testTruncateWithDataNodesRestart runs (xyao: rev 4da5000dd33cf013e7212848ed2c44f1e60e860e) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java > TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout > > > Key: HDFS-10748 > URL: https://issues.apache.org/jira/browse/HDFS-10748 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin > Fix For: 2.8.0 > > Attachments: HDFS-10748.001.patch, HDFS-10748.002.patch > > > This was fixed by HDFS-7886. But some recent [Jenkins > Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > started seeing this again: > {code} > Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate > testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) > Time elapsed: 43.861 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for > /test/testTruncateWithDataNodesRestart to reach 3 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10768) Optimize mkdir ops
[ https://issues.apache.org/jira/browse/HDFS-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437169#comment-15437169 ] Kihwal Lee commented on HDFS-10768: --- The test failure was already reported in HDFS-10498. I've added the log and initial analysis there. [~daryn], please take care of the findbugs warning. > Optimize mkdir ops > -- > > Key: HDFS-10768 > URL: https://issues.apache.org/jira/browse/HDFS-10768 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10768.patch > > > Directory creation causes excessive object allocation: ex. an immutable list > builder, containing the string of components converted from the IIP's > byte[]s, sublist views of the string list, iterable, followed by string to > byte[] conversion. This can all be eliminated by accessing the component's > byte[] in the IIP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
[ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-10748: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Thanks [~linyiqun] for the contribution. I've committed the fix to trunk, branch-2 and branch-2.8. > TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout > > > Key: HDFS-10748 > URL: https://issues.apache.org/jira/browse/HDFS-10748 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin > Fix For: 2.8.0 > > Attachments: HDFS-10748.001.patch, HDFS-10748.002.patch > > > This was fixed by HDFS-7886. But some recent [Jenkins > Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > started seeing this again: > {code} > Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate > testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) > Time elapsed: 43.861 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for > /test/testTruncateWithDataNodesRestart to reach 3 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10498) Intermittent test failure org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength
[ https://issues.apache.org/jira/browse/HDFS-10498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10498: -- Attachment: test_failure.txt Attaching a test log from a precommit build. The test calls {{append()}} and makes an assumption about when the block state changes on datanodes. In the failed test, {{getFileChecksum()}} reached a datanode right after {{append()}}. This caused the {{BLOCK_CHECKSUM}} op on the datanode to fail with an {{EOFException}} while trying to read the meta file. If the {{BLOCK_CHECKSUM}} op had reached the datanode a tiny bit later, it would have failed with the expected exception. > Intermittent test failure > org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength > --- > > Key: HDFS-10498 > URL: https://issues.apache.org/jira/browse/HDFS-10498 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, snapshots >Affects Versions: 3.0.0-alpha1 >Reporter: Hanisha Koneru > Attachments: test_failure.txt > > > Error Details > Per https://builds.apache.org/job/PreCommit-HDFS-Build/15646/testReport/, we > had the following failure. Local rerun is successful. 
> Error Details: > {panel} > Fail to get block MD5 for > LocatedBlock{BP-145245805-172.17.0.3-1464981728847:blk_1073741826_1002; > getBlockSize()=1; corrupt=false; offset=1024; > locs=[DatanodeInfoWithStorage[127.0.0.1:55764,DS-a33d7c97-9d4a-4694-a47e-a3187a33ed5a,DISK]]} > {panel} > Stack Trace: > {panel} > java.io.IOException: Fail to get block MD5 for > LocatedBlock{BP-145245805-172.17.0.3-1464981728847:blk_1073741826_1002; > getBlockSize()=1; corrupt=false; offset=1024; > locs=[DatanodeInfoWithStorage[127.0.0.1:55764,DS-a33d7c97-9d4a-4694-a47e-a3187a33ed5a,DISK]]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$ReplicatedFileChecksumComputer.checksumBlocks(FileChecksumHelper.java:289) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:206) > at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1731) > at > org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1482) > at > org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1479) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1490) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength.testSnapshotfileLength(TestSnapshotFileLength.java:137) > Standard Output 7 sec > {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10619) Cache path in InodesInPath
[ https://issues.apache.org/jira/browse/HDFS-10619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437129#comment-15437129 ] Kihwal Lee commented on HDFS-10619: --- [~daryn], you need to rebase the patch. > Cache path in InodesInPath > -- > > Key: HDFS-10619 > URL: https://issues.apache.org/jira/browse/HDFS-10619 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-10619.patch > > > INodesInPath#getPath, a frequently called method, dynamically builds the > path. IIP should cache the path upon construction. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
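The idea in HDFS-10619 can be sketched as follows: build the path string once in the constructor and return the stored value from getPath(). The class CachedPath is hypothetical and is not INodesInPath; it assumes the components are UTF-8 byte arrays and that the first component is the empty root component.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not INodesInPath.
final class CachedPath {
  private final int componentCount;
  // Built once at construction; getPath() no longer allocates.
  private final String path;

  CachedPath(byte[][] components) {
    this.componentCount = components.length;
    this.path = buildPath(components);
  }

  String getPath() {
    return path;
  }

  int length() {
    return componentCount;
  }

  // Assumption: {"", "a", "b"} maps to "/a/b" and {""} maps to "/".
  private static String buildPath(byte[][] components) {
    if (components.length == 1 && components[0].length == 0) {
      return "/";
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < components.length; i++) {
      if (i > 0) {
        sb.append('/');
      }
      sb.append(new String(components[i], StandardCharsets.UTF_8));
    }
    return sb.toString();
  }
}
```

The trade-off is one extra String per instance in exchange for not rebuilding the path on every call of a frequently used method.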
[jira] [Commented] (HDFS-10748) TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
[ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437123#comment-15437123 ] Xiaoyu Yao commented on HDFS-10748: --- Thanks [~linyiqun] for working on this. The patch v02 LGTM, +1. I will commit it shortly. > TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout > > > Key: HDFS-10748 > URL: https://issues.apache.org/jira/browse/HDFS-10748 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Xiaoyu Yao >Assignee: Yiqun Lin > Attachments: HDFS-10748.001.patch, HDFS-10748.002.patch > > > This was fixed by HDFS-7886. But some recent [Jenkins > Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > started seeing this again: > {code} > Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate > testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) > Time elapsed: 43.861 sec <<< ERROR! > java.util.concurrent.TimeoutException: Timed out waiting for > /test/testTruncateWithDataNodesRestart to reach 3 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751) > at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-4660) Block corruption can happen during pipeline recovery
[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated HDFS-4660: - Attachment: periodic_hflush.patch > Block corruption can happen during pipeline recovery > > > Key: HDFS-4660 > URL: https://issues.apache.org/jira/browse/HDFS-4660 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.3-alpha, 3.0.0-alpha1 >Reporter: Peng Zhang >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.7.1, 2.6.4 > > Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, > HDFS-4660.v2.patch, periodic_hflush.patch > > > pipeline DN1 DN2 DN3 > stop DN2 > pipeline added node DN4 located at 2nd position > DN1 DN4 DN3 > recover RBW > DN4 after recover rbw > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004 > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134144 > getBytesOnDisk() = 134144 > getVisibleLength()= 134144 > end at chunk (134144/512=262) > DN3 after recover rbw > 2013-04-01 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01 > 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134028 > getBytesOnDisk() = 134028 > getVisibleLength()= 134028 > client send packet after recover pipeline > offset=133632 len=1008 > DN4 after flush > 2013-04-01 21:02:31,779 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1063 > // meta end position should be floor(134640/512)*4 + 
7 == 1059, but now it is > 1063. > DN3 after flush > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, > type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, > lastPacketInBlock=false, offsetInBlock=134640, > ackEnqueueNanoTime=8817026136871545) > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing > meta file offset of block > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from > 1055 to 1051 > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1059 > After checking meta on DN4, I found checksum of chunk 262 is duplicated, but > data not. > Later after block was finalized, DN4's scanner detected bad block, and then > reported it to NM. NM send a command to delete this block, and replicate this > block from other DN in pipeline to satisfy duplication num. > I think this is because in BlockReceiver it skips data bytes already written, > but not skips checksum bytes already written. And function > adjustCrcFilePosition is only used for last non-completed chunk, but > not for this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
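The meta offsets in the HDFS-4660 log above can be checked with a small calculation. The sketch assumes a 7-byte meta file header and one 4-byte checksum per 512-byte chunk, which is what the numbers in the report suggest; the class MetaOffsetCheck is hypothetical. Note that the expected value of 1059 is reached when a partial trailing chunk also counts, that is with a ceiling rather than a floor: 134640 bytes span 263 chunks, and 263*4 + 7 = 1059, whereas floor(134640/512)*4 + 7 would give 1055.

```java
// Hypothetical helper for checking the offsets quoted in the report.
class MetaOffsetCheck {
  static final int HEADER_BYTES = 7;    // assumed meta file header size
  static final int CHECKSUM_BYTES = 4;  // assumed size of one chunk checksum

  /** Expected meta file length when every chunk, including a partial last one, has a checksum. */
  static long expectedMetaLength(long dataBytes, int bytesPerChunk) {
    long chunks = (dataBytes + bytesPerChunk - 1) / bytesPerChunk; // ceiling division
    return HEADER_BYTES + chunks * CHECKSUM_BYTES;
  }

  public static void main(String[] args) {
    System.out.println(expectedMetaLength(134640, 512)); // DN3 and DN4 after the flush
    System.out.println(expectedMetaLength(134144, 512)); // DN4 after RBW recovery
    System.out.println(expectedMetaLength(134028, 512)); // DN3 after RBW recovery
  }
}
```

Under these assumptions the expected length after the flush is 1059 on both nodes, so DN4's meta offset of 1063 is exactly one 4-byte checksum too long, which is consistent with the duplicated checksum for chunk 262 described in the report.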
[jira] [Commented] (HDFS-4660) Block corruption can happen during pipeline recovery
[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437036#comment-15437036 ] Nathan Roberts commented on HDFS-4660: -- Hi [~yzhangal]. Had to go back to an old git stash, but I'll attach a sample patch to TeraOutputFormat. > Block corruption can happen during pipeline recovery > > > Key: HDFS-4660 > URL: https://issues.apache.org/jira/browse/HDFS-4660 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.3-alpha, 3.0.0-alpha1 >Reporter: Peng Zhang >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.7.1, 2.6.4 > > Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, > HDFS-4660.v2.patch > > > pipeline DN1 DN2 DN3 > stop DN2 > pipeline added node DN4 located at 2nd position > DN1 DN4 DN3 > recover RBW > DN4 after recover rbw > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004 > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134144 > getBytesOnDisk() = 134144 > getVisibleLength()= 134144 > end at chunk (134144/512=262) > DN3 after recover rbw > 2013-04-01 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01 > 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134028 > getBytesOnDisk() = 134028 > getVisibleLength()= 134028 > client send packet after recover pipeline > offset=133632 len=1008 > DN4 after flush > 2013-04-01 21:02:31,779 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > 
offset:134640; meta offset:1063 > // meta end position should be floor(134640/512)*4 + 7 == 1059, but now it is > 1063. > DN3 after flush > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, > type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, > lastPacketInBlock=false, offsetInBlock=134640, > ackEnqueueNanoTime=8817026136871545) > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing > meta file offset of block > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from > 1055 to 1051 > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1059 > After checking meta on DN4, I found checksum of chunk 262 is duplicated, but > data not. > Later after block was finalized, DN4's scanner detected bad block, and then > reported it to NM. NM send a command to delete this block, and replicate this > block from other DN in pipeline to satisfy duplication num. > I think this is because in BlockReceiver it skips data bytes already written, > but not skips checksum bytes already written. And function > adjustCrcFilePosition is only used for last non-completed chunk, but > not for this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9038) DFS reserved space is erroneously counted towards non-DFS used.
[ https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437020#comment-15437020 ]

Hadoop QA commented on HDFS-9038:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 9 new or modified test files. |
| 0 | mvndep | 0m 6s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 17s | trunk passed |
| +1 | compile | 1m 48s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 2m 1s | trunk passed |
| +1 | mvneclipse | 0m 29s | trunk passed |
| +1 | findbugs | 3m 38s | trunk passed |
| +1 | javadoc | 1m 22s | trunk passed |
| 0 | mvndep | 0m 6s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 36s | the patch passed |
| +1 | compile | 1m 52s | the patch passed |
| +1 | cc | 1m 52s | the patch passed |
| -1 | javac | 1m 52s | hadoop-hdfs-project generated 1 new + 51 unchanged - 1 fixed = 52 total (was 52) |
| -0 | checkstyle | 0m 40s | hadoop-hdfs-project: The patch generated 1 new + 583 unchanged - 5 fixed = 584 total (was 588) |
| +1 | mvnsite | 1m 46s | the patch passed |
| +1 | mvneclipse | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 3m 59s | the patch passed |
| +1 | javadoc | 1m 12s | the patch passed |
| +1 | unit | 0m 57s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 77m 0s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 109m 7s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.mover.TestStorageMover |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12825464/HDFS-9038-010.patch |
| JIRA Issue | HDFS-9038 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc |
| uname | Linux 4f982e9d4706 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 525d52b |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/16537/artifact/patchprocess/diff-compile-javac-hadoop-hdfs-project.txt |
| checkstyle |
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xinwei Qin updated HDFS-7859:
-----------------------------
    Attachment: (was: HDFS-7859.008.patch)

> Erasure Coding: Persist erasure coding policies in NameNode
> -----------------------------------------------------------
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Xinwei Qin
> Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do
> Attachments: HDFS-7859-HDFS-7285.002.patch,
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch,
> HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch,
> HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we
> persist EC schemas in NameNode centrally and reliably, so that EC zones can
> reference them by name efficiently.